mns meeting Nov 29th 2023
Present: Csilla, Jack, Trond
Agenda
- Status
- Issues
- Priorities
- Next meeting
Status
Csilla
- fixing typos
- collecting postpositions
- working on flagged verbal forms
Jack
work with prefixed verbs
@U.VPref.konyl
@U.VPref.nox
@U.VPref.tyg
We have flags in the code. Let us discuss with Shjur whether tags (converted into flags later on) would be better.
Trond
Has been updating test routines to take typos.txt into account for testing of missing words.
Testing results
Small changes to the better for coverage, we do 0.923 coverage for Luima Seripos.
242 ва̄рим # gramm, verb is not working
238 хотьют # typo хотъют
209 арыгтем # missing
196 арыгкем # missing lemma аригкем Adv
189 сыресыр # missing
173 места # missing loanword
160 ловиньтэлы̄н
159 ӯнттувес
...
Issues
Tag issues:
We go for Comp. The tags were (mostly) fixed, cleanup after the meeting.
Flag issue
@U.VPref.xot@хот-воратаӈкве+V:@U.VPref.xot@ворат V_A “” ;
This should work, but does not. We discuss with Sjur. The tag_test.sh claims @хот-воратаӈкве is an undeclared tag, which it of course not is. The problem is that @U.VPref.xot@ is declared.
We should check whether the tag_test.sh is ok.
maturity status
Trond has lifted mns from alpha to beta level (but it does not come up on the page, though).
Lexicon size
Our results are fantastic, taken into consideration the small lexica we do have:
3881 src/fst/stems/verbs.lexc
3460 src/fst/stems/nouns.lexc
1450 src/fst/stems/adjectives.lexc
714 src/fst/stems/abbreviations.lexc
529 src/fst/stems/Missing_words_20231006.txt
368 src/fst/stems/mns-propernouns.lexc
271 src/fst/stems/adverbs.lexc
197 src/fst/stems/pronouns.lexc
121 src/fst/stems/numerals.lexc
110 src/fst/stems/postpositions.lexc
30 src/fst/stems/conjunctions.lexc
24 src/fst/stems/interjections.lexc
11 src/fst/stems/participles.lexc
We understand why: No compound and not many loanwords in the newspaper. One way of improving our list would still be to look for dictionaries. This we do later, though
Priorities
- Adding words from missing lists (Csilla)
- Adjective Jack -> Csilla
- Look at grammatical issues (Jack, evt Trond, Sjur, with input from Csilla)
When we are down at few missing in the 10 and above we start seriously looking at the spellchecker.
Next meeting
Probably week 50, perhaps in Hki.