Finite state and Constraint Grammar based analysers, proofing tools and other resources
The lexc source can be written in SRO (the Standard Roman Orthography).
From this source, one can compile alternative FSTs with systematic variation, as a basis for generating spell checkers and ICALL programs.
For an analyser that is used to analyse running text, one can use spell relax to make the analyser accept all orthographies. With spell relax, the output contains no tags that tell which orthography was used.
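The effect of spell relax described above can be illustrated with a toy sketch in Python. This is not how the FST toolchain implements it; the equivalence table `RELAX` and the dictionary `ANALYSES` are hypothetical stand-ins invented for the illustration, and the variant shown (a decomposed "a" plus combining acute accent for "á") is only an assumed example of an orthographic variant.

```python
# Toy illustration of spell relax: variant spellings are accepted on input,
# and the analysis is the same as for the canonical spelling, with no tag
# telling which spelling was used.

# HYPOTHETICAL equivalence table: variant sequence -> canonical sequence.
RELAX = {
    "a\u0301": "\u00e1",  # 'a' + combining acute accent -> precomposed 'á'
}

# Stand-in for an analyser: canonical surface form -> analyses.
ANALYSES = {
    "b\u00e1zahusat": ["b\u00e1zahus+N+Pl+Nom"],
}


def relax(word):
    """Rewrite every known variant sequence to its canonical form."""
    for variant, canonical in RELAX.items():
        word = word.replace(variant, canonical)
    return word


def analyse(word):
    """Look up the relaxed form; unknown words give an empty list."""
    return ANALYSES.get(relax(word), [])


canonical_result = analyse("b\u00e1zahusat")
relaxed_result = analyse("ba\u0301zahusat")
```

Both calls return the same analysis list, so nothing in the output records which spelling the input used.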
Words with another orthography, from other dialects:
These are forms that do not follow the orthographic principles but still occur in the texts we want to analyse.
The normative form should be on the left-hand side of the colon, so that the lemma in the analysis is a normative form and can be looked up, e.g. in a dictionary.
bázahus:bázahuss JOHTOLAT "remainder" ;
bázahus+Err/Orth:bázáhuss JOHTOLAT "remainder" ;
The descriptive FST will inflect both “bázahus” and “bázáhus”, but the string with the tag Err/Orth is removed from the normative analyser/generator during the compilation process.
bázahusat
bázahusat bázahus+N+Pl+Nom
bázáhusat
bázáhusat bázahus+Err/Orth+N+Pl+Nom
The normative analyser:
bázahusat
bázahusat bázahus+N+Pl+Nom
bázáhusat
bázáhusat bázáhusat +?
Ex. “brillefutterála”, which is a slightly adapted loanword from Norwegian into North Saami. The normative word is “čalbmelássaskuohppu”.
brillefutterála+Err/Lex:brille#futterál SOSIAL "spectacle case" ;
The descriptive FST will inflect “brillefutterála”, but the line with the tag Err/Lex is removed from the normative analyser/generator during the compilation process.
brillefutterálat
brillefutterálat brillefutterála+N+Err/Lex+Pl+Nom
The normative analyser:
brillefutterálat
brillefutterálat brillefutterálat +?
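The difference between the descriptive and the normative lexicon can be sketched as a simple filter over lexc entries. This is a minimal illustration only: it assumes that error tags always start with `+Err/` and sit on the lemma side of the colon, and it works on text lines, whereas the real compilation may well remove the tagged strings inside the FST instead. The function name `normative_entries` is made up for this sketch.

```python
import re

# The three entries from the examples above (descriptive lexicon).
LEXC = '''\
bázahus:bázahuss JOHTOLAT "remainder" ;
bázahus+Err/Orth:bázáhuss JOHTOLAT "remainder" ;
brillefutterála+Err/Lex:brille#futterál SOSIAL "spectacle case" ;
'''

# ASSUMPTION: every error tag has the shape +Err/<Name>.
ERR_TAG = re.compile(r"\+Err/\w+")


def normative_entries(lexc_text):
    """Keep only entries whose lemma side (left of ':') has no +Err/ tag."""
    kept = []
    for line in lexc_text.splitlines():
        upper = line.split(":", 1)[0]
        if not ERR_TAG.search(upper):
            kept.append(line)
    return kept


descriptive = LEXC.splitlines()
normative = normative_entries(LEXC)

for entry in normative:
    print(entry)
```

With these three entries, only the untagged “bázahus” entry survives the filter, which matches the normative analyser returning `+?` for “bázáhusat” and “brillefutterálat”.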
Two lemmas can have identical base forms, but different paradigms and semantics.
Example from North Saami. G3 = grade 3 in consonant gradation.
beassi:beassi BEARRI "nest" ;
beassi+G3:beas'si AIGI "birchbark" ;
Analysis:
beassi
beassi beassi+N+G3+Sg+Nom
beassi beassi+N+G3+Sg+Acc
beassi beassi+N+G3+Sg+Gen
beassi beassi+N+Sg+Nom
beasi
beasi beassi+N+Sg+Gen
beasi beassi+N+Sg+Acc
Example from North Saami. NomAg is the tag for the derivation Nomen Agentis.
vuovdi+NomAg:vuovdi ACTOR "salesman" ;
vuovdi+G3:vuov'di AIGI "forest" ;
Analysis:
vuovdi
vuovdi vuovdi+N+NomAg+Sg+Nom
vuovdi vuovdi+N+NomAg+Sg+Acc
vuovdi vuovdi+N+NomAg+Sg+Gen
vuovdi vuovdi+N+Sg+Nom
vuovddi
vuovddi vuovdi+N+Sg+Gen
vuovddi vuovdi+N+Sg+Acc
Example from South Saami, two verbs:
govledh+Hom1:govl TJOEHPEDH_TV "hear" ;
govledh+Hom2:govl VÅÅJNEDH "sound" ;
Analysis:
gåvla
gåvla govledh+Hom1+V+TV+Ind+Prs+Sg3
govloe
govloe govledh+Hom2+V+IV+Ind+Prs+Sg3
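To make the role of the disambiguating tags concrete, the entries above can be split into their parts and grouped by lemma. The sketch below assumes the simple one-line entry shape used in these examples (`lemma+Tags:stem CONTLEX "gloss" ;`); the regular expression `ENTRY` and the function `parse_entry` are written for this illustration and do not cover full lexc syntax.

```python
import re
from collections import defaultdict

# ASSUMPTION: entries have the shape  upper:lower CONTLEX "gloss" ;
ENTRY = re.compile(
    r'^(?P<upper>[^:\s]+):(?P<lower>\S+)\s+(?P<contlex>\S+)\s+'
    r'"(?P<gloss>[^"]*)"\s*;$'
)


def parse_entry(line):
    """Split one lexc entry into lemma, tags, stem, continuation lexicon, gloss."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError("not a simple lexc entry: " + repr(line))
    lemma, *tags = match.group("upper").split("+")
    return {
        "lemma": lemma,
        "tags": tags,
        "stem": match.group("lower"),
        "contlex": match.group("contlex"),
        "gloss": match.group("gloss"),
    }


LINES = [
    'beassi:beassi BEARRI "nest" ;',
    'beassi+G3:beas\'si AIGI "birchbark" ;',
    'govledh+Hom1:govl TJOEHPEDH_TV "hear" ;',
    'govledh+Hom2:govl VÅÅJNEDH "sound" ;',
]

# Group parsed entries by lemma; the tags are what keeps homonyms apart.
by_lemma = defaultdict(list)
for line in LINES:
    entry = parse_entry(line)
    by_lemma[entry["lemma"]].append(entry)

for lemma, entries in by_lemma.items():
    print(lemma, [(e["tags"], e["gloss"]) for e in entries])
```

Each lemma maps to two entries here, and the tag list (empty, `G3`, `Hom1`, `Hom2`) is the only part of the lemma side that distinguishes them.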
One lemma can have orthographic variants for the base form and at least parts of the inflection paradigm. We can add a variant tag to help identify the correct base form for the paradigm.
Example from North Saami:
mandáhta+v2:mandáhtta GOAHTI-A "mandate" ;
mandáhta+v1:mandáhta STAHTA "mandate" ;
Generation with the normative generator gives:
mandáhta+v2+N+Ess
mandáhta+v2+N+Ess mandáhttan
mandáhta+v1+N+Ess
mandáhta+v1+N+Ess mandáhtan
If the base forms are identical, but there are variants in the inflection, we don’t use these tags.