Kven Finnish NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-fkv

Test diary

This document records test statistics.

The overall test command: make check

yaml

The command:

sh test/yaml-check.sh

(data forthcoming)

Lexical coverage

Number of words in the fkv corpus (run from the lang-fkv directory):

cat test/data/freecorpus.txt |\
hfst-tokenise tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |wc -l

Number of unknown words:

cat test/data/freecorpus.txt |\
 hfst-tokenise tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |\
 preprocess --corr=test/data/typos.txt|\
 hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |\
 grep " ?"|cut -d'"' -f2|wc -l
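Once both counts are available, coverage is the share of tokens that are not unknown. The sketch below assumes the two numbers are comparable (both count tokens from the same corpus); the function name `coverage` is hypothetical and not part of the lang-fkv tooling.

```shell
# Hypothetical helper: print lexical coverage as a percentage with two decimals.
# $1 = total number of tokens, $2 = number of unknown tokens.
coverage() {
    awk -v total="$1" -v unknown="$2" 'BEGIN {
        # Avoid dividing by zero when the corpus is empty.
        if (total <= 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * (total - unknown) / total
    }'
}

# Example with made-up counts (not measurements from the fkv corpus):
coverage 2000 150   # prints 92.50
```

In practice the two arguments would be the outputs of the two pipelines above, for example captured with command substitution.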

Test with the full corpus (free + bound):

Lexical coverage of freecorpus

The file is test/data/freecorpus.txt.

Coverage:

Lexical coverage of free + bound

Coverage:

To be written.

Speller suggestions

Results

The table shows the number of typos tested, as well as some data for suggestions. For how to test, see below.

Test results with divvunspell (for older data, see below):

           Typos in    Avg. position   % of misspellings with the correction
Date       typos.txt   of correction   ranked 1st in top 5   anywhere
-----------------------------------------------------------------------------
240509     736                         67.39      86.01
240510     737                         67.30      85.89      90.09
-----------------------------------------------------------------------------

How to test

To test: clone divvunspell and install both divvunspell and accuracy. Then, from the divvunspell directory, run:

accuracy -o support/accuracy-viewer/public/report.json ../../giellalt/lang-fkv/test/data/typos.txt ../../giellalt/lang-fkv/tools/spellcheckers/fkv.zhfst

cd support/accuracy-viewer

npm i && npm run dev

When the server has started, its output ends with a port number, for example port: 35729. Open the corresponding address (in this example http://localhost:35729) in your browser.
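If you would rather not copy the port number by hand, it can be pulled out of the server output. This is a minimal sketch that assumes the output contains a line of the form `port: NNNNN`, as in the example above; the function name `viewer_url` is hypothetical.

```shell
# Hypothetical helper: read server output on stdin and print the local URL
# for the first line that contains "port: <digits>".
viewer_url() {
    sed -n 's|.*port: \([0-9][0-9]*\).*|http://localhost:\1|p' | head -n 1
}

# Example using the sample line from the text:
printf 'port: 35729\n' | viewer_url   # prints http://localhost:35729
```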