Finite state and Constraint Grammar based analysers, proofing tools and other resources
View the project on GitHub giellalt/lang-sme
This flowchart gives an overview of how the sme sourcefiles are related. In principle, the other lg files are arranged in the same way.
The main lex file Separate lex files for different POS (parts
of speech)
|----------------------| |------------------|
| sme-lex.txt | | noun-sme-lex.txt |
| | | viessu GOAHTI ; | From the Root lexicon, there
| Root -------------> | ... | are pointers to each POS.
| | | | | The files for nouns, verbs and
| LEXICON GOAHTI <---------------| | adjectives point back to the
| +N DEVNVCASE ; | | | sme-lex.txt file, and are di-
| ... | |------------------| rected to their respective
| | sublexica.
| | |-------------------|
| ---> | verb-sme-lex.txt | (the auxiliary verbs are
| <--------- ... | also found in the verb file)
| | |-------------------|
| |
| | |-------------------|
| ---> | adj-sme-lex.txt |
| <--------- ... |
| | |-------------------|
| |
| |
| ---> |-------------------| The other lex files contain
| <- - - - - closed-sme-lex.txt| closed classes. They are smal
|----------------------| | LEXICON Pronoun | ler, and all the sublexica
| Personal ; | are in the same file, not in
| | the sme-lex file (well, some
| LEXICON Personal | point to some sme-lex sub-
| ... | lexica). Other files are pp-
|-------------------| lex.txt, etc. All in all
there are ca. 10 lex files.
This is compiled together with the ||
twol rules. These rules contain the ||
(morpho)phonological processes, ||
consonant gradation, etc. \/
|------------| |------------| |------------| The sme.save file is
|twol-sme.txt| => |twol-sme.bin| => | sme.save | compiled in lexc, and
|------------| |------------| |------------| is the merger of the
lex files and the rule
Here are the After compi- file twol-sme.bin
rules them- lation in twolc ||
selves they are in this ||
binary file ||
||
Then comes preprosessor files: \/
|----------| |------------| ||=========|| This is the final morpho-
|case.regex| => |caseconv.fst| ======> || sme.fst || logical parser for
|----------| |------------| ||=========|| North Sami.
The case.regex file is com-
piled in xfst. The preprocessor
itself, tok.fst, is not shown here.