Inari Sámi NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-smn

Test your work before you check it in

The essence of testing: we want to check three things:

  1. that the FST is not broken (make)
  2. that what we want to achieve is achieved (analyse / generate the form(s) you want)
  3. that our work has not made the FST worse (yaml tests + check_lemmas.sh; make sure the numbers are not worse)
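
The three checks can be chained so that a later step only runs when the earlier one succeeded. This is a minimal sketch, assuming it is run from the language directory (for example langs/smn) and that the script paths are the ones used on this page. `run_checks` is a hypothetical helper name, not something shipped in the repository.

```shell
#!/bin/sh
# Hypothetical helper: run each command string given as an argument, in order,
# and stop at the first one that exits with a non-zero status.
run_checks() {
    for step in "$@"; do
        echo "== $step"
        # Each step is a plain command string, so it is run through sh -c.
        if ! sh -c "$step"; then
            echo "FAILED: $step" >&2
            return 1
        fi
    done
    echo "All steps finished without errors."
}

# Intended use from the language directory (not executed here):
#   run_checks "make" "sh test/yaml-check.sh" "sh test/check_lemmas.sh"
```

A zero exit status only tells you that a step ran through. Whether the scripts signal worse numbers through their exit status is an assumption, so still read the reports and compare the numbers yourself.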

make

With make you check that there are no technical issues. Read the report.

Typical issues:

make check

With make check you also check the morphology:

yaml-tests

generation of all stems, which gives a list of missing lemmas for nouns, adjectives, verbs and proper nouns

Test only yaml-tests

Test only yaml-tests with this command:

sh test/yaml-check.sh

Test only the generation of all stems

Test only the generation of nouns, adjectives, verbs and proper nouns (no yaml-tests) with this command:

sh test/check_lemmas.sh

Test if you have achieved what you were trying to achieve

Analyse the forms:

usmn and usmnNorm

analyse e.g. nieidáin

If you get no analysis, only a ?, then generate the word instead:

dsmn and dsmnNorm

generate the forms, e.g. nieidâ+N+Sg+Com

Not sure whether you have a freshly compiled analyser and generator? Check the date and time when your analyser/generator was compiled:

ll src/
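
Instead of comparing timestamps by eye, the shell can compare them for you. This is a minimal sketch using the `-nt` ("newer than") file test, which bash and most other sh implementations provide. The function name `is_up_to_date` and the analyser file name in the usage comment are assumptions for illustration; substitute the file names that the listing of src/ shows for your build.

```shell
# Hypothetical helper: succeed if the compiled file ($1) exists and is
# newer than the source file ($2).
is_up_to_date() {
    compiled=$1
    source_file=$2
    [ -e "$compiled" ] && [ "$compiled" -nt "$source_file" ]
}

# Intended use from the language directory (file names are assumptions;
# not executed here):
#   is_up_to_date src/analyser-gt-desc.hfstol src/fst/stems/nouns.lexc \
#       && echo "analyser is newer than nouns.lexc" \
#       || echo "analyser is older or missing: run make again"
```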

Helper scripts for looking at the generated forms

Quick commands to run from langs/smn:

sh devtools/noun_minip.sh nieidâ

sh devtools/adj_minip.sh uánehâš

sh devtools/prop_minip.sh Aanaar

Get only the correct lemma and not compounds:

sh devtools/noun_minip.sh '^nieidâ[:+]' 
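
Why the anchored pattern leaves out compounds can be seen with plain grep. This sketch assumes that the script matches its argument as a regular expression against stem entries in which the lemma is followed by `:` or `+`. The sample entries are made up for illustration (the `foo` strings and the `EXAMPLE` lexicon are not taken from the stem files).

```shell
# Made-up sample entries in the style "lemma:stem LEXICON ;".
sample_entries() {
    printf '%s\n' \
        'nieidâ:nieid EXAMPLE ;' \
        'nieidâ+Example:nieid EXAMPLE ;' \
        'nieidâfoo:nieidâfoo EXAMPLE ;' \
        'foonieidâ:foonieid EXAMPLE ;'
}

# Unanchored: counts every entry that contains the string anywhere.
sample_entries | grep -c 'nieidâ'

# Anchored at the start of the line and closed with ':' or '+':
# prints only the entries whose lemma is exactly nieidâ.
sample_entries | grep '^nieidâ[:+]'
```

The `^` rules out entries where the word is the last part of a longer lemma, and `[:+]` rules out entries where it is the first part.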

Look at all lemmas going to the same continuation lexicon:

sh devtools/noun_minip.sh PIIVTAS | less 
sh devtools/adj_minip.sh KOOIDAS | less 

Test the miniparadigms in the stem files

Look at all forms:

grep '¢' src/fst/stems/nouns.lexc | cut -d '¢' -f2 | cut -d '!' -f1 | preprocess | grep '[a-z]' | usmnNorm | less
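
Some `cut` implementations accept only a single-byte delimiter and may reject `¢`. If that happens, the `grep` and the two `cut` steps can be replaced with awk. This is a sketch, assuming the miniparadigm sits between `¢` and the next `!` on a line, as the pipeline above implies; `miniparadigm_forms` is a hypothetical helper name.

```shell
# Hypothetical helper: for every input line that contains '¢', print the
# text between the first '¢' and the next '!' (or the end of the line).
miniparadigm_forms() {
    awk -F'¢' 'NF > 1 { split($2, part, "!"); print part[1] }' "$@"
}

# Intended use from the language directory (not executed here):
#   miniparadigm_forms src/fst/stems/nouns.lexc | preprocess | grep '[a-z]' | usmnNorm | less
```

The function reads the files given as arguments, or standard input when none are given, so it can also sit in the middle of a pipeline.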

Get the forms which are not recognized by the analyser:

grep '¢' src/fst/stems/nouns.lexc | cut -d '¢' -f2 | cut -d '!' -f1 | preprocess | grep '[a-z]' | usmnNorm | grep '\?' | less
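
The output above still shows the analyser's full lines, with repeats. To get a sorted list of just the unrecognised word forms, the lines can be reduced to their first column. This is a sketch, assuming the analyser prints tab-separated lines with the input form first and marks a missing analysis with `+?` in the second column; check this against your own output, since the exact format depends on the lookup tool behind usmnNorm. `unknown_forms` is a hypothetical helper name.

```shell
# Hypothetical helper: read analyser output and print each unrecognised
# input form once, sorted.
unknown_forms() {
    # Assumed line format: "<form><TAB><analysis>"; unrecognised forms have
    # "+?" somewhere in the analysis column.
    awk -F'\t' 'index($2, "+?") > 0 { print $1 }' "$@" | sort -u
}

# Intended use from the language directory (not executed here):
#   grep '¢' src/fst/stems/nouns.lexc | cut -d '¢' -f2 | cut -d '!' -f1 \
#       | preprocess | grep '[a-z]' | usmnNorm | unknown_forms | less
```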