Status quo
This document gives a short status report of the
Komi grammatical analyser
and its source files.
Juy 15th, 2016
Status for the source files:
- Lexicon: The lexicon contains 35806 entries (19129 nouns, 12191 verbs,
4486 adjectives)
- Morphology: The morphology files are 3494 lines long, and contain 479
continuation lexica. Compared to 8234 lines and 1309 lexica for Erzya,
there is still work to do.
- Morphophonology: The file kpv-phon.twolc is 253 lines long. Compared to
the one for Erzya, 514 lines, the situation is not that bad for Komi.
Tasks ahead:
- Test and correct the morphology and morphophonology
- Integrate the Komi-Russian dictionary in the analyser (done)
- Add more words:
- Test the resulting analyser against text material, and add new words
- Systematically add Russian loanwords: names and technical terms (partly done)
- Work on the spellchecker
- TODO: Collect error corpus