Inari Sámi NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources


Test diary

This document records test statistics.

The overall test command: make check

yaml

The command:

sh test/yaml-check.sh

(data forthcoming)

Lexical coverage

Number of words (run from the lang-smn directory):

cat test/data/freecorpus.txt |\
hfst-tokenise tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |wc -l

Number of unknown words:

cat test/data/freecorpus.txt |\
 hfst-tokenise tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |\
 preprocess --corr=test/data/typos.txt|\
 hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |\
 grep " ?"|cut -d'"' -f2|wc -l
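The two counts above can be combined into a coverage percentage. The following is a minimal sketch, assuming coverage is defined as the share of tokens that are not unknown. The `coverage` helper is hypothetical and not part of lang-smn, and the numbers in the example call are illustrative, not measured results.

```shell
#!/bin/sh
# coverage TOTAL UNKNOWN
# Hypothetical helper (not part of lang-smn): prints the percentage of
# tokens that are not unknown, with two decimals.
coverage() {
    total=$1
    unknown=$2
    # Guard against division by zero when the corpus is empty.
    if [ "$total" -le 0 ]; then
        echo "coverage: total must be positive" >&2
        return 1
    fi
    awk -v t="$total" -v u="$unknown" \
        'BEGIN { printf "%.2f\n", (t - u) / t * 100 }'
}

# Illustrative numbers only, not measured results:
coverage 1000 50   # prints 95.00
```

In practice the two arguments would be the outputs of the word-count and unknown-word pipelines above, for example captured with command substitution.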

Test with the full corpus (free + bound):

Lexical coverage of freecorpus

The file is test/data/freecorpus.txt.

Coverage:

Lexical coverage of free + bound

Coverage:

Speller suggestions

The table shows the number of typos tested and how often the correct form appears among the speller's suggestions.

To test: clone divvunspell and install divvunspell and accuracy. Then, from the divvunspell directory, run:

accuracy -o support/accuracy-viewer/public/report.json ../../giellalt/lang-smn/test/data/typos.txt ../../giellalt/lang-smn/tools/spellcheckers/smn.zhfst

cd support/accuracy-viewer

npm i && npm run dev

At the end, the output reports a port number (for example, port: 35729). Take that 5-digit number and open the corresponding address (for example, http://localhost:35729) in your browser.

Test results with divvunspell:

           Typos in     Avg. pos. of    % misspellings    % misspellings
           typos.txt    correction      corrected in      corrected in
                                        1st position      top 5
--------------------------------------------------------------------------
240521:    904                          56.64             72.35
240522:    904                          68.14             84.96
--------------------------------------------------------------------------