Finite state and Constraint Grammar based analysers, proofing tools and other resources
This document records test statistics.
The overall test command: make check
The YAML test command:
sh test/yaml-check.sh
(data forthcoming)
Number of words (run from the lang-smn directory):
cat test/data/freecorpus.txt |\
hfst-tokenise tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |wc -l
Number of unknown words:
cat test/data/freecorpus.txt |\
hfst-tokenise tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |\
preprocess --corr=test/data/typos.txt|\
hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |\
grep " ?"|cut -d'"' -f2|wc -l
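The two counts above can be combined into a single coverage percentage. The following is a minimal sketch, not part of the lang-smn test suite: the function name coverage and the example counts are hypothetical, and it assumes that both pipelines count tokens in a comparable way.

```shell
# Hypothetical helper (not part of lang-smn): print coverage as a percentage,
# given the total word count and the unknown word count from the pipelines above.
coverage() {
    total=$1
    unknown=$2
    # Guard against division by zero on an empty corpus.
    if [ "$total" -eq 0 ]; then
        echo "total must be greater than 0" >&2
        return 1
    fi
    # coverage = 100 * (total - unknown) / total, printed with two decimals
    awk -v t="$total" -v u="$unknown" 'BEGIN { printf "%.2f\n", 100 * (t - u) / t }'
}

# Example with made-up counts (not real lang-smn figures):
coverage 1000 50
```

With the made-up counts above, 950 of 1000 words are known, so the function prints 95.00. In practice the two arguments would be the outputs of the word count and unknown word count commands.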
Test with the full corpus (free + bound):
The file is test/data/freecorpus.txt.
Coverage:
The table below shows the number of typos tested, along with figures on the quality of the suggestions.
To test: Clone divvunspell and install divvunspell and accuracy. Then, from the divvunspell directory, run:
accuracy -o support/accuracy-viewer/public/report.json ../../giellalt/lang-smn/test/data/typos.txt ../../giellalt/lang-smn/tools/spellcheckers/smn.zhfst
cd support/accuracy-viewer
npm i && npm run dev
At the end, the output reports a port, for example port: 35729. Take that 5-digit number and open the corresponding URL (here http://localhost:35729) in your browser.
Test results with divvunspell:
          typos    Avrg pos    % missp    % missp
          .txt     for corr    in 1st     in top-5
-----------------------------------------------------------------
240521:   904                  56.64      72.35
240522:   904                  68.14      84.96
-----------------------------------------------------------------