Finite-state and Constraint Grammar-based analysers, proofing tools and other resources
GitHub project: giellalt/lang-yrk
Words in misc/yrkwiki.list and misc/Matt_4.txt (76998):
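# Drop lines containing Latin letters, tokenise, keep lines with lowercase Cyrillic, count them
cat misc/yrkwiki.list misc/Matt_4.txt | grep -v "[a-zA-Z]" | \
hfst-tokenise tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst | \
grep "[а-я]" | grep -v '"."' | wc -l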
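To see what the filters after hfst-tokenise do, the following sketch runs them on a few hand-made lines instead of real tokeniser output. It assumes the tokeniser prints one token per line; the function name count_cyrillic_tokens and the sample strings are hypothetical and are not corpus data.

```shell
#!/bin/sh
# Sketch only: the post-tokenisation filters of the word-count pipeline.
# ASSUMPTION: input is one token per line, standing in for hfst-tokenise output.

# Keep lines with a lowercase Cyrillic letter, drop lines matching '"."'
# (copied unchanged from the pipeline), and print the number of remaining lines.
count_cyrillic_tokens() {
  grep "[а-я]" | grep -v '"."' | wc -l | tr -d ' '
}

# Hypothetical sample tokens: two Cyrillic words, one comma, one number.
printf '%s\n' 'хасава' ',' 'ненэця' '2024' | count_cyrillic_tokens
```

Only the two Cyrillic lines survive, so the sample prints 2: punctuation and digit-only tokens are not counted as words.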
Words unknown to the analyser:
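# Same input with CG-format output (-cg); count lines containing " ?", i.e. words marked unknown
cat misc/yrkwiki.list misc/Matt_4.txt | grep -v "[a-zA-Z]" | \
hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst | \
grep "[а-я]" | grep -v '"."' | huyrk | grep " ?" | wc -l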
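The two counts can be combined into a coverage figure, the percentage of words not marked unknown. The following is a minimal sketch; the function name coverage is hypothetical, and it assumes both pipelines count the same units (one line per word), which may not hold exactly since one uses -cg output and the other does not.

```shell
#!/bin/sh
# Sketch only: coverage = (words - unknown) / words * 100, printed with 2 decimals.
# $1 = word count (Words pipeline), $2 = unknown count (Unknown pipeline).
# ASSUMPTION: both counts measure the same units.
coverage() {
  if [ "$1" -eq 0 ]; then
    echo "coverage: word count is zero" >&2
    return 1
  fi
  awk -v w="$1" -v u="$2" 'BEGIN { printf "%.2f\n", (w - u) / w * 100 }'
}
```

With the unknown count stored in a shell variable, for example by wrapping the Unknown pipeline in `unknown=$( ... )`, running `coverage 76998 "$unknown"` prints the coverage percentage for this corpus.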