Finite state and Constraint Grammar based analysers, proofing tools and other resources
View the project on GitHub giellalt/lang-mns
Present: Csilla, Jack, Trond
The version of Luima Seripos from the previous project has been checked into test/data, N = 672628.
Coverage: 83% for Luima Seripos (LS), 86% for the textbook.
TODO for next meeting:
Missing list command:
cat test/data/filename.txt |hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |grep " ?" |cut -d'"' -f2 |sort|uniq -c|sort -nr > misc/filename.missing.DATE.freq
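The filtering half of the command above can be wrapped in a small shell function, so the same steps can be reused for each corpus file. This is only a sketch: the function name missing_freq is made up for illustration, and it assumes the tokeniser prints unknown words as a reading line of the form `"wordform" ?`, which is what the grep in the command relies on.

```shell
# Hypothetical helper (not part of the repository): reads hfst-tokenise
# CG output on stdin and prints unknown word forms with their frequencies,
# most frequent first.
missing_freq() {
    grep ' ?' |        # keep only reading lines marked as unknown
    cut -d'"' -f2 |    # extract the text between the first pair of quotes
    sort | uniq -c |   # count identical forms
    sort -nr           # highest count first
}

# Possible usage (requires hfst-tokenise and the compiled tokeniser;
# the date format in the file name is an assumption):
# hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst \
#     < test/data/filename.txt | missing_freq \
#     > "misc/filename.missing.$(date +%Y%m%d).freq"
```

Keeping the filter separate from the tokeniser call also makes it possible to check the counting step on a small hand-made sample.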
у with macron
cat test/data/Luima\ Seripos\ 2013-2017.txt |hfst-tokenise -g tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst|grep ' ?'|sort|uniq -c |sort -nr|wc
23689 76434 702817
Jacks-MacBook-Pro:lang-mns jackrueter$ cat test/data/Luima\ Seripos\ 2013-2017.txt | perl -wpne 's/Ӯ/Ӯ/g; s/ӯ/ӯ/g; '|hfst-tokenise -g tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst|grep ' ?'|sort|uniq -c |sort -nr|wc
22251 71687 657136
(NB! Csilla to find out by 21st)
Perhaps: Friday, June 23rd at 11 Finnish time.