Finite state and Constraint Grammar based analysers, proofing tools and other resources
The Komi lexicon files are used both for dictionary creation and for the transducer.
The main Komi file is kt/kom/src/kom-lex.txt.
It contains the lexicon Root (the initial lexicon). The same src directory also
contains the directory working_files, which holds the XML dictionary files.
During compilation, the entries are extracted from the XML files in the dictionary and written
to the directory kt/tmp/out/
(two levels up from src, under kt).
To take an example:
The file working_files/PRON-PERS_kom-lex.xml contains the following entry:
<entry>
<lemma>ме</lemma>
<stem/>
<contlex>PRON-PERS-SG1-NOM</contlex>
<pos>PRON-PERS</pos>
<article>
<eng>
<choice>
<variant>I</variant>
</choice>
</eng>
<fin>
<choice>
<variant>minä</variant>
</choice>
</fin>
</article>
</entry>
From this file, the compilation process derives a lexc file in the directory
kt/tmp/out, named PRON-PERS_kom-lex.txt. The first
lines of that file are:
LEXICON PRON-PERS
ме PRON-PERS-SG1-NOM "I" ;
The prefix of the XML file name (PRON-PERS) is the name of the continuation lexicon. Each entry has a lemma (here ме) and a stem (here, the stem is identical to the lemma). Then comes a space, followed by the contlex (here PRON-PERS-SG1-NOM) and the English translation in quotes. The contlex is defined in the file kt/kom/src/pron-kom-morph.txt.
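The mapping from an XML entry to a lexc line described above can be sketched as follows. This is only an illustration, not the actual compilation script. The function names entry_to_lexc and lexicon_name are hypothetical, and the handling of a non-empty stem (written as lemma:stem) is an assumption, since the example entry has an empty stem element.

```python
import os
import xml.etree.ElementTree as ET

# The entry from working_files/PRON-PERS_kom-lex.xml, copied from the text above.
ENTRY_XML = """<entry>
<lemma>ме</lemma>
<stem/>
<contlex>PRON-PERS-SG1-NOM</contlex>
<pos>PRON-PERS</pos>
<article>
<eng><choice><variant>I</variant></choice></eng>
<fin><choice><variant>minä</variant></choice></fin>
</article>
</entry>"""


def lexicon_name(xml_path):
    """Take the part of the file name before the first underscore,
    e.g. 'PRON-PERS' from 'PRON-PERS_kom-lex.xml'."""
    return os.path.basename(xml_path).split("_", 1)[0]


def entry_to_lexc(entry):
    """Build one lexc line: lemma, space, contlex, English gloss in quotes."""
    lemma = entry.findtext("lemma")
    stem = entry.findtext("stem") or ""  # an empty <stem/> yields ""
    contlex = entry.findtext("contlex")
    gloss = entry.findtext("article/eng/choice/variant")
    # Assumption: an empty stem means the stem is identical to the lemma;
    # otherwise the pair is written as lemma:stem.
    form = lemma if not stem or stem == lemma else f"{lemma}:{stem}"
    return f'{form} {contlex} "{gloss}" ;'


entry = ET.fromstring(ENTRY_XML)
print("LEXICON " + lexicon_name("working_files/PRON-PERS_kom-lex.xml"))
print(entry_to_lexc(entry))
```

For the example entry, this prints the two lexc lines shown above: the LEXICON header and the line for ме.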
The Komi lexicon files are found here (you may have to choose “show source code” in the browser):