mns meeting 16.6. 2023
Present: Csilla, Jack, Trond
Agenda
- Status
- for the work
- for the fst
- Priorities onward
- Next meeting
Status
For the work
The version of Luima Seripos from the previous project has been checked into test/data, N = 672628.
For the fst
Coverage: 83% for LS, 86 % for the textbook.
Priorities onward
TODO for next meeting:
- Check last meetings TODOS.
- Look at yaml results (steadlily decreasing)
- Look at LS missing lst and textbook missing list
Missing list command:
cat test/data/filename.txt |hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |grep " ?" |cut -d'"' -f2 |sort|uniq -c|sort -nr > misc/filename.missing.DATE.freq
у with macron
cat test/data/Luima\ Seripos\ 2013-2017.txt |hfst-tokenise -g tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst|grep ' ?'|sort|uniq -c |sort -nr|wc
23689 76434 702817
Jacks-MacBook-Pro:lang-mns jackrueter$ cat test/data/Luima\ Seripos\ 2013-2017.txt | perl -wpne 's/Ӯ/Ӯ/g; s/ӯ/ӯ/g; '|hfst-tokenise -g tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst|grep ' ?'|sort|uniq -c |sort -nr|wc
22251 71687 657136
Next meeting
(NB! Csilla to find out by 21st)
Perhaps: Friday, June 23th at 11 Finnish time.