Finite state and Constraint Grammar based analysers, proofing tools and other resources
This document records test statistics.
The overall test command: make check
The command:
sh test/yaml-check.sh
(data forthcoming)
rmn
Number of words (standing in lang-rmn):
cat test/data/freecorpus.txt | \
hfst-tokenise tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst | wc -l
Number of unknown words:
cat test/data/freecorpus.txt | \
hfst-tokenise tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst | \
preprocess --corr=test/data/typos.txt | \
hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst | \
grep " ?" | cut -d'"' -f2 | wc -l
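The two counts above can be combined into a coverage percentage. The following is a minimal sketch, not part of the repository: the helper name `coverage` is hypothetical, and it assumes that both counts refer to the same set of tokens and that coverage is defined as the share of tokens that are not unknown.

```shell
# coverage TOTAL UNKNOWN
# Hypothetical helper (an assumption, not part of lang-rmn).
# Prints the percentage of known tokens with two decimals.
coverage() {
    total=$1
    unknown=$2
    # Refuse a zero or negative total to avoid dividing by zero.
    if [ "$total" -le 0 ]; then
        echo "coverage: total must be positive" >&2
        return 1
    fi
    # awk does the floating-point arithmetic; plain sh only has integers.
    awk -v t="$total" -v u="$unknown" \
        'BEGIN { printf "%.2f\n", (t - u) / t * 100 }'
}
```

For example, `coverage 1000 50` prints `95.00`. The two arguments would be the outputs of the word-count and unknown-word commands above.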
Test with the full corpus (free + bound):
The file is test/data/freecorpus.txt.
Coverage: