Nenets NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-yrk

Test diary

Coverage

Test corpus: the Nenets Wikipedia word list (misc/yrkwiki.list) and Matt 4 (misc/Matt_4.txt).

Commands

Total words (76998):

cat misc/yrkwiki.list misc/Matt_4.txt | grep -v "[a-zA-Z]" |\
hfst-tokenise tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |\
grep "[а-я]" | grep -v '"."' | wc -l

Unknown words:

cat misc/yrkwiki.list misc/Matt_4.txt | grep -v "[a-zA-Z]" |\
hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |\
grep "[а-я]" | grep -v '"."' | huyrk | grep " ?" | wc -l
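The two counts above can be combined into a single coverage figure. The following is a minimal sketch, not part of the project's tooling: `coverage_pct` is a hypothetical helper name, and it assumes coverage is defined as the share of words that are not reported as unknown. The total of 76998 comes from the count above; the unknown count in the example call is a made-up placeholder, to be replaced with the actual output of the second command.

```shell
# Hypothetical helper (not part of lang-yrk): print coverage as a percentage.
# Usage: coverage_pct TOTAL UNKNOWN
# Assumption: coverage = (TOTAL - UNKNOWN) / TOTAL * 100
coverage_pct() {
    total=$1
    unknown=$2
    awk -v t="$total" -v u="$unknown" 'BEGIN {
        # Guard against division by zero or a nonsensical total.
        if (t <= 0) { exit 1 }
        printf "%.2f\n", 100 * (t - u) / t
    }'
}

# 76998 is the word count reported above; 7700 is a placeholder,
# not a measured result.
coverage_pct 76998 7700
```

Keeping the arithmetic in `awk` avoids the integer-only limitation of shell arithmetic, so the result can be reported with two decimals.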

Test results