South Sámi NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-sma

Test results for the morphology and lexicon files

This page records the testing of the parser and disambiguator. Background information and the test plan are found in the test plan document. What is found here is an overview of what has been tested: vocabulary testing, testing of the disambiguator, and testing of the morphological analysis.

Test results for morphology and lexicon

Vocabulary testing

The following table records recall for word forms in various texts. Here we measure coverage of the vocabulary, by recording all word forms that are not recognised.

---------------------------------------------------------
Don jih daan bijre
Test    Wftot Wf-tkn %-recall   Tytot  Wf-typ  %-recall
090902  80546  68386   84.9 %   11430    6328   55.4 %

----------------------------------------------------------

Explaining the table

A lower token than type percentage indicates that the parser misses common words more often than rare ones.

A lower type than token percentage (which is the case here) indicates that the parser is good at the core vocabulary, but has weaker coverage of less frequent word forms.

Each text is given a separate section in the table, ordered chronologically, with the oldest test case (Test 1) at the bottom. The first line of each section gives the name of the file (note: the files of test cases 2 and 3 have changed so much that these two test cases are closed). Each line represents a test run. The first column gives the test date (in the format ddmmyy), the second (Wftot) the total number of word forms in the file in question, and the third (Wf-tkn) the number of recognised word form tokens, followed by the percentage of the total. The next columns do the same for word form types (cf. below for the commands used to calculate the numbers).
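To make the token/type distinction concrete, here is a minimal sketch that counts both on a made-up word list. The input is a placeholder with one word form per line (assumed to resemble what a tokenising step would emit), and the variable names are illustrative only; no analyser is involved.

```shell
# Hypothetical input: one word form per line. The "words" are placeholders.
words='a
b
a
c
a
b'

# Tokens: every line counts. $(( ... )) strips any padding printed by wc.
wftot=$(( $(printf '%s\n' "$words" | wc -l) ))

# Types: each distinct word form counts once.
tytot=$(( $(printf '%s\n' "$words" | sort | uniq | wc -l) ))

echo "tokens=$wftot types=$tytot"
```

A frequent word form thus weighs heavily in the token count but only once in the type count, which is why the two recall percentages can diverge.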

-------------------------------------------------------------------------
Wftot:
cat filename | preprocess | wc -l

Non_recognised_wf:
cat filename | preprocess | lookup -flags mbTT -utf8 bin/sma.fst |
 grep '\?' | grep -v CLB | wc -l

Wf-tkn = Wftot - Non_recognised_wf

%-recall = Wf-tkn * 100 / Wftot
-------------------------------------------------------------------------
Tytot (Total number of wordform types):
cat filename | preprocess | sort | uniq | wc -l

Non_recognised_wt (Number of non-analysed wordform types):
cat filename | preprocess | sort | uniq |
lookup -flags mbTT -utf8 bin/sma.fst | grep '\?' | grep -v CLB | wc -l

Wf-typ (Number of recognised wordform types):
Wf-typ = Tytot - Non_recognised_wt

%-recall = Wf-typ * 100 / Tytot
-------------------------------------------------------------------------
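As a worked check of the %-recall formula, the sketch below recomputes both percentages from the counts in the 090902 row of the table. The helper name recall is hypothetical and not part of the project's tooling; it only wraps the arithmetic in awk so that the result is printed with one decimal.

```shell
# Hypothetical helper: recall as a percentage with one decimal,
# computed as recognised * 100 / total.
recall() {
  awk -v ok="$1" -v total="$2" 'BEGIN { printf "%.1f\n", ok * 100 / total }'
}

# Counts copied from the 090902 row of the table.
tkn_recall=$(recall 68386 80546)   # Wf-tkn and Wftot
typ_recall=$(recall 6328 11430)    # Wf-typ and Tytot

echo "token recall: $tkn_recall %"
echo "type recall:  $typ_recall %"
```

With these counts the script prints 84.9 % for tokens and 55.4 % for types, matching the table.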
