Inari Sámi NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources


Testing is important, before you check in your work

The essence of testing: we want to check three things:

  1. that the FST is not broken (make)
  2. that what we want to achieve is achieved (analyse / generate the form(s) you want)
  3. that our work has not made the FST worse (yaml tests + check_lemmas.sh, and make sure the numbers are not worse)
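The automated parts of this checklist (steps 1 and 3) can be chained so that a later check only runs when the earlier one succeeded. The following is a minimal sketch, not a script from the repository: `run_step` is a hypothetical helper, and it assumes that `make` and the two test scripts signal failure through their exit status, which this page does not state.

```shell
#!/bin/sh
# Sketch only: run_step is a hypothetical helper, not part of the repository.
# It runs one command, reports the result, and passes on the exit status.
run_step() {
    label=$1
    shift
    printf '== %s\n' "$label"
    if "$@"; then
        printf 'ok: %s\n' "$label"
    else
        status=$?
        printf 'FAILED: %s (exit status %s)\n' "$label" "$status" >&2
        return "$status"
    fi
}

# Intended use from langs/smn, with the commands named on this page:
#   run_step "build" make &&
#   run_step "yaml tests" sh test/yaml-check.sh &&
#   run_step "lemma generation" sh test/check_lemmas.sh
```

Step 2, analysing and generating the forms you worked on, still has to be done by hand, and the reports of each step still need to be read.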

make

With make you check that there are no technical issues. Read the report.

Typical issues:

make check

With make check you also check the morphology:

yaml-tests

generating all stems, which gives a list of missing lemmas: nouns, adjectives, verbs, proper nouns

Test only yaml-tests

Test only yaml-tests with this command:

sh test/yaml-check.sh

Test only the generation of all stems

Test only the generation of nouns, adjectives, verbs and proper nouns (no yaml-tests) with this command:

sh test/check_lemmas.sh
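To make sure the numbers are not worse, it helps to keep a report from before your change and compare it with one from after. The sketch below assumes that you have saved two such reports as plain text files with one missing lemma per line. The file names and the one-lemma-per-line format are assumptions; the real output of check_lemmas.sh may look different.

```shell
#!/bin/sh
# Sketch only: count_lines and compare_reports are hypothetical helpers.
# Assumption: each line of a saved report is one missing lemma.
count_lines() {
    # wc -l pads its output on some systems; arithmetic expansion strips the padding.
    echo $(( $(wc -l < "$1") ))
}

# Succeeds when the second report has no more lines than the first.
compare_reports() {
    before=$(count_lines "$1")
    after=$(count_lines "$2")
    printf 'before: %s, after: %s\n' "$before" "$after"
    if [ "$after" -gt "$before" ]; then
        echo "worse"
        return 1
    fi
    echo "not worse"
}

# Intended use (before.txt and after.txt are hypothetical file names):
#   compare_reports before.txt after.txt
```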

Test if you have achieved what you were trying to achieve

Analyse the forms:

usmn and usmnNorm

analyse e.g. nieidáin

If you don’t get any analysis, only ?, then you should generate the word:

dsmn and dsmnNorm

generate the forms, e.g. nieidâ+N+Sg+Com

Are you not quite sure that you have a new analyser and generator? This is how to check the date and time when your analyser/generator was compiled:

ll src/
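Instead of comparing the dates by eye, the shell can check whether one file was modified after another. This is a minimal sketch: `is_newer` is a hypothetical helper, and the path of the compiled analyser is left as a placeholder because this page does not name the file.

```shell
#!/bin/sh
# Sketch only: is_newer is a hypothetical helper.
# It succeeds when the first file was modified after the second one.
is_newer() {
    # find prints the first path only if it is newer than the reference file.
    [ -n "$(find "$1" -prune -newer "$2" 2>/dev/null)" ]
}

# Intended use (replace the placeholder with the compiled file you see under src/):
#   if is_newer path/to/compiled-analyser src/fst/stems/nouns.lexc; then
#       echo "analyser is newer than the stem file"
#   else
#       echo "analyser is older than the stem file: recompile with make"
#   fi
```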

Scripts that help you look at the generated forms

When you are in langs/smn, these are the quick commands:

sh devtools/noun_minip.sh nieidâ

sh devtools/adj_minip.sh uánehâš

sh devtools/prop_minip.sh Aanaar

Get only the correct lemma and not compounds:

sh devtools/noun_minip.sh '^nieidâ[:+]' 
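The pattern works because `^` anchors the match at the start of the line and `[:+]` requires a colon or a plus sign directly after the lemma. The sketch below shows the same pattern with plain grep on invented entries. It assumes that the script matches the pattern as a regular expression against entries that begin with the lemma; the entries and the `filter_lemma` helper are hypothetical.

```shell
#!/bin/sh
# Sketch only: invented entries, not taken from the real stem files.
sample_entries() {
    printf '%s\n' \
        'nieidâ+N:nieidâ EXAMPLE ;' \
        'nieidâtest+N:nieidâtest EXAMPLE ;' \
        'testnieidâ+N:testnieidâ EXAMPLE ;'
}

# Keep only the lines where the lemma is followed directly by ':' or '+'.
filter_lemma() {
    grep "^$1[:+]"
}

sample_entries | filter_lemma 'nieidâ'
# prints: nieidâ+N:nieidâ EXAMPLE ;
```

The second entry is rejected because the lemma continues after nieidâ, and the third because the line does not start with nieidâ.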

Look at all lemmas going to the same continuation lexicon:

sh devtools/noun_minip.sh PIIVTAS | less 
sh devtools/adj_minip.sh KOOIDAS | less 

Test the miniparadigms in the stem files

Look at all forms:

grep '¢' src/fst/stems/nouns.lexc | cut -d '¢' -f2 | cut -d '!' -f1 | preprocess | grep '[a-z]' | usmnNorm | less

Get the forms which are not recognized by the analyser:

grep '¢' src/fst/stems/nouns.lexc | cut -d '¢' -f2 | cut -d '!' -f1 | preprocess | grep '[a-z]' | usmnNorm | grep '\?' | less
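When many forms fail, a deduplicated list of the failing word forms is easier to work through than the raw analyser output. The following is a sketch under assumptions: it expects tab-separated output with the input form in the first column and a `?` somewhere on the line for unrecognized forms. `unknown_forms` is a hypothetical helper, and the sample lines are invented.

```shell
#!/bin/sh
# Sketch only: unknown_forms is a hypothetical helper.
# Assumption: the analyser prints "form<TAB>analysis" and marks failures with '?'.
unknown_forms() {
    # -F matches a literal question mark; LC_ALL=C gives a stable sort order.
    grep -F '?' | cut -f1 | LC_ALL=C sort -u
}

# Invented sample output: one recognized form and three failures, one repeated.
printf 'nieidáin\tnieidâ+N+Sg+Com\nqwerty\tqwerty+?\nabcxyz\tabcxyz+?\nabcxyz\tabcxyz+?\n' |
    unknown_forms
# prints:
# abcxyz
# qwerty
```

In the pipeline above, the helper would take the place of the final `grep '\?'`.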