Mansi NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-mns

mns meeting Nov 29th 2023

Present: Csilla, Jack, Trond

Agenda

Status

Csilla

Jack

work with prefixed verbs

@U.VPref.konyl
@U.VPref.nox
@U.VPref.tyg

We have flags in the code. Let us discuss with Shjur whether tags (converted into flags later on) would be better.

Trond

Has been updating test routines to take typos.txt into account for testing of missing words.

Testing results

Small changes to the better for coverage, we do 0.923 coverage for Luima Seripos.

 242 ва̄рим # gramm, verb is not working
 238 хотьют # typo хотъют
 209 арыгтем # missing
 196 арыгкем # missing lemma аригкем Adv
 189 сыресыр # missing
 173 места # missing loanword
 160 ловиньтэлы̄н
 159 ӯнттувес
 ...

Issues

Tag issues:

We go for Comp. The tags were (mostly) fixed, cleanup after the meeting.

Flag issue

@U.VPref.xot@хот-воратаӈкве+V:@U.VPref.xot@ворат V_A “” ;

This should work, but does not. We discuss with Sjur. The tag_test.sh claims @хот-воратаӈкве is an undeclared tag, which it of course not is. The problem is that @U.VPref.xot@ is declared.

We should check whether the tag_test.sh is ok.

maturity status

Trond has lifted mns from alpha to beta level (but it does not come up on the page, though).

Lexicon size

Our results are fantastic, taken into consideration the small lexica we do have:

    3881 src/fst/stems/verbs.lexc
    3460 src/fst/stems/nouns.lexc
    1450 src/fst/stems/adjectives.lexc
     714 src/fst/stems/abbreviations.lexc
     529 src/fst/stems/Missing_words_20231006.txt
     368 src/fst/stems/mns-propernouns.lexc
     271 src/fst/stems/adverbs.lexc
     197 src/fst/stems/pronouns.lexc
     121 src/fst/stems/numerals.lexc
     110 src/fst/stems/postpositions.lexc
      30 src/fst/stems/conjunctions.lexc
      24 src/fst/stems/interjections.lexc
      11 src/fst/stems/participles.lexc

We understand why: No compound and not many loanwords in the newspaper. One way of improving our list would still be to look for dictionaries. This we do later, though

Priorities

  1. Adding words from missing lists (Csilla)
  2. Adjective Jack -> Csilla
  3. Look at grammatical issues (Jack, evt Trond, Sjur, with input from Csilla)

When we are down at few missing in the 10 and above we start seriously looking at the spellchecker.

Next meeting

Probably week 50, perhaps in Hki.