Tornedalen Finnish NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-fit

Test diary

This document records test statistics.

The overall test command is make check. Other commands are described below.

Lexical coverage

Number of words (run from the lang-fit directory):

cat test/data/freecorpus.txt |\
hfst-tokenise tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |wc -l

Number of unknown words:

cat test/data/freecorpus.txt |\
 hfst-tokenise tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |\
 preprocess --corr=test/data/typos.txt|\
 hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |\
 grep " ?"|cut -d'"' -f2|wc -l
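
The two counts above can be combined into a coverage percentage. The following is a minimal sketch only: the helper name coverage is hypothetical and not part of lang-fit, and it assumes that coverage means the share of words that are not unknown.

```shell
# Hypothetical helper, not part of lang-fit: lexical coverage in percent.
# $1 = total number of words, $2 = number of unknown words.
# Assumption: coverage = 100 * (total - unknown) / total.
coverage() {
    LC_ALL=C awk -v total="$1" -v unknown="$2" 'BEGIN {
        if (total <= 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * (total - unknown) / total
    }'
}
```

For example, coverage 1000 50 prints 95.00. The two arguments would be the outputs of the word-count and unknown-word pipelines above.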

Lexical coverage of freecorpus

The file is test/data/freecorpus.txt.

Coverage:

Lexical coverage of free + bound

Coverage:

Speller suggestions

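The table shows the number of typos tested (from typos.txt), the average position of the correct form among the suggestions, and the percentage of misspellings for which the correct form is the first suggestion or among the top five.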

Date         Typos in   Avg. pos. of    % misspellings  % misspellings
             typos.txt  correct form    fixed in 1st    fixed in top 5
-----------------------------------------------------------------------
240411:        11        1.40           72.73           90.91
240422:       150        1.35           66.67           78.52
240422b:      150        1.05           77.61           80.60
240424:       220        1.11           81.96           85.57
-----------------------------------------------------------------------

In the first run (240411) the number of typos was only 11, so that row is given as an illustration only.
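
The three figures in the table can be derived from a list of suggestion ranks. The sketch below is an illustration under stated assumptions, not the script actually used for the table: the function name suggestion_stats and the input format (one integer per line, giving the rank of the correct form for one typo, with 0 meaning it was not suggested) are both hypothetical.

```shell
# Hypothetical helper, not part of lang-fit.
# Assumed input on stdin: one integer per line, the rank of the correct
# form among the speller suggestions for one typo (0 = not suggested).
# Output: number of typos, average rank of the correct form (over the
# typos where it was suggested), % in 1st position, % in top 5.
suggestion_stats() {
    LC_ALL=C awk '
        { n++ }
        $1 > 0 { found++; sum += $1 }
        $1 == 1 { first++ }
        $1 >= 1 && $1 <= 5 { top5++ }
        END {
            if (n == 0 || found == 0) { print "no data"; exit }
            printf "%d %.2f %.2f %.2f\n", n, sum / found, 100 * first / n, 100 * top5 / n
        }'
}
```

With the four ranks 1, 1, 2 and 0 as input, the function prints 4 1.33 50.00 75.00. Whether the average should include typos without a correct suggestion is a design choice; this sketch excludes them.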

Lemma coverage

make check counts the lemmas for which the generator fails to generate the base form. The following table shows the number of failures per part of speech (A = adjectives, Prop = proper nouns, N = nouns, V = verbs).

Date      A   Prop    N    V
240411    0     17   32   19
240425    0     17   36   19
240503    0     17   43   17

The files counted are found in the directory test/src/morphology, and the files are:

missing_adjectives_lemmas.hfst.txt
missing_fit-propernouns_lemmas.hfst.txt
missing_nouns_lemmas.hfst.txt
missing_verbs_lemmas.hfst.txt
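
A minimal sketch for collecting these counts, assuming each file lists one missing lemma per line (the function name count_missing is hypothetical and not part of lang-fit):

```shell
# Hypothetical helper, not part of lang-fit.
# Prints "<file name><TAB><number of lines>" for each missing-lemma file.
# $1 = directory to scan (defaults to test/src/morphology).
# Assumption: one missing lemma per line in each file.
count_missing() {
    dir=${1:-test/src/morphology}
    for f in "$dir"/missing_*_lemmas.hfst.txt; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        printf '%s\t%s\n' "$(basename "$f")" "$(wc -l < "$f" | tr -d ' ')"
    done
}
```

Run count_missing from the lang-fit directory after make check to get one line per file.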

yaml

Test with the command: sh test/yaml-check.sh

230428:  gt-norm fst(s): PASSES: 3176 / FAILS:  4 / TOTAL: 3180
240425:  gt-norm fst(s): PASSES: 3180 / FAILS: 18 / TOTAL: 3198
240430:  gt-norm fst(s): PASSES: 3128 / FAILS: 70 / TOTAL: 3198
240501:  gt-norm fst(s): PASSES: 3170 / FAILS: 28 / TOTAL: 3198
240502:  gt-norm fst(s): PASSES: 3142 / FAILS: 56 / TOTAL: 3198
240503:  gt-norm fst(s): PASSES: 3180 / FAILS: 24 / TOTAL: 3204
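
To compare runs with different totals, the pass rate can be computed from lines in the format above. This is a sketch only; the function name pass_rate is hypothetical, and it assumes the input lines look exactly like the diary entries (date first, then PASSES: and TOTAL: each followed by a number).

```shell
# Hypothetical helper, not part of lang-fit.
# Reads diary lines on stdin and prints "<date> <pass rate in percent>".
pass_rate() {
    LC_ALL=C awk '
        {
            passes = total = ""
            for (i = 1; i < NF; i++) {
                if ($i == "PASSES:") passes = $(i + 1)
                if ($i == "TOTAL:")  total  = $(i + 1)
            }
            date = $1
            sub(/:$/, "", date)          # drop the colon after the date
            if (total > 0) printf "%s %.2f\n", date, 100 * passes / total
        }'
}
```

For the 240503 entry above, the function prints 240503 99.25. Lines without a TOTAL: field are skipped silently.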
