Mansi NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-mns

mns meeting 16.6. 2023

Present: Csilla, Jack, Trond

Agenda

Status

For the work

The version of Luima Seripos from the previous project has been checked into test/data, N = 672628.

For the fst

Coverage: 83% for LS, 86 % for the textbook.

Priorities onward

TODO for next meeting:

Missing list command:

cat test/data/filename.txt |hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |grep " ?" |cut -d'"' -f2 |sort|uniq -c|sort -nr > misc/filename.missing.DATE.freq
у with macron
cat test/data/Luima\ Seripos\ 2013-2017.txt |hfst-tokenise -g tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst|grep ' ?'|sort|uniq -c |sort -nr|wc
   23689   76434  702817
Jacks-MacBook-Pro:lang-mns jackrueter$ cat test/data/Luima\ Seripos\ 2013-2017.txt | perl -wpne 's/Ӯ/Ӯ/g; s/ӯ/ӯ/g; '|hfst-tokenise -g tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst|grep ' ?'|sort|uniq -c |sort -nr|wc
   22251   71687  657136

Next meeting

(NB! Csilla to find out by 21st)

Perhaps: Friday, June 23th at 11 Finnish time.