Inari Sámi NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

Inari Sámi morphological analyser

This file documents the Inari Sámi morphological analyser. The tags in bold are the ones in use in the analyser.

Multichar_Symbols definitions

Parts of speech

Tags for sub-POS

Tags for governing abbreviations in preprocessing

Grammatical properties

Number

Person - number

Px

Cases

Adjectival forms

Adverb types

Tense - mood

Indefinite verb forms

Derivation tags

The derivation position tags.

The tag +Der

All non-positional derivations should be preceded by this tag, to make it possible to target regular expressions at all derivations in a language-independent way: just specify +Der|+Der1 .. +Der5 and you are set.
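As a rough illustration of that idea, the Python sketch below uses a regular expression to pick out the bare +Der tag and the positional tags +Der1 .. +Der5 from an analysis string. The pattern, the helper name `derivation_positions` and the sample analysis string are assumptions made for illustration; they are not taken from the analyser.

```python
import re

# Matches the bare +Der tag and the positional tags +Der1 .. +Der5,
# but not specific derivation tags such as +Der/Dimin.
DER_POSITION = re.compile(r"\+Der[1-5]?(?![0-9A-Za-z/])")


def derivation_positions(analysis: str) -> list[str]:
    """Return the generic and positional derivation tags found in `analysis`."""
    return DER_POSITION.findall(analysis)


# Hypothetical analysis string, for illustration only.
sample = "čälliđ+V+Der1+Der/tt+V+Inf"
print(derivation_positions(sample))
```

Because the pattern does not name any individual derivation, the same expression can be reused for any language that follows this tagging convention.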

Table for derivation tags

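| Pos1 | Pos2 | Pos3 | Pos4 | POS switches (from-to) | Explanation |
| --- | --- | --- | --- | --- | --- |
| +Der1 | | | | | Position tag, required |
| | +Der2 | | | | Position tag, required |
| | | +Der3 | | | Position tag, required |
| | | | +Der4 | | Position tag, required |
| +Der/lasj | | | | NA | |
| +Der/d | | | | VV | |
| +Der/tt | | | | VV | Causative: čälittiđ |
| +Der/Caus | | | | VV | 3-syllable causatives |
| +Der/l | | | | VV | |
| +Der/st | | | | VV | čälistiđ |
| +Der/Car | | | | NA | * +Der1+Der2 - only combine with Der3; caritive: peljittem |
| +Der/laakan | | | | AA | * +Der1+Der2 - only combine with Der3 |
| +Der/Pass | | | | VV | Passive |
| | +Der/Dimin | | | NN | (was: Der/aš & Der/š) |
| | +Der/NomAg | | | | |
| | +Der/NomAct | | | VN | Der/NomAct has two realisations, with different restrictions |
| | +Der/sasj | | | NA | |
| | +Der/alla | | | VV | |
| | +Der/adda | | | VV | |
| | +Der/AAdv | | | | adverb: pyeremusávt, pyeremusâht |
| | +Der/taa | | | | adverb: pyeremustáá (this may not be the best tag) |
| | | +Der/vuota | | AN | |
| | | | +Der/InchL | VV | |
| | | | +Der/upmi | VN | |
| | | | +Der/mas | VN | |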

Other derivations

Other/unclassified derivations, can appear in all positions:

Clitics

Error tags

All Err-tags must have a normative form as lemma except Err/Lex

Usage tags

Semantic tags to help disambiguation and syntactic analysis (placed before POS):

semtags to be checked

Multiple Semantic tags:

Punctuation

Morphophonemes

Archiphonemes

Triggers

Symbols that need to be escaped on the lower side (towards twolc):

Variants within the same paradigm

Compound tags

These tags describe the parts of the compound.

The prefix (before “/”) is Cmp.

These tags govern the parts of the compound

The prefix (before “/”) is CmpNP: (meaning: this is the normative position of this tag)

The prefix (before “/”) is CmpN: (meaning: this is the normative position of this tag). The tagged part of the compound should make a compound using:

Unmarked = default, i.e. +CmpN/SgN for SMN.

The second part of the compound may require that the previous (left part) is:

Language tagged names

Flag diacritics

We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics: compounds with verbs are only allowed if the verb is further derived into a noun again. P sets a positive value, D disallows a value, R requires a value, and C clears the flag.

Flag Explanation
@P.NeedNoun.ON@ Set NeedNoun to ON, to (dis)allow compounds with verbs unless nominalised
@D.NeedNoun.ON@ Disallow the path if NeedNoun is ON
@C.NeedNoun@ Clear the NeedNoun flag
@R.NeedNoun.ON@ Require that NeedNoun is ON
@D.ErrOrth.ON@ Disallow ErrOrth
@C.ErrOrth@ Clear ErrOrth flag
@P.ErrOrth.ON@ Set positive value for ErrOrth flag
@R.ErrOrth.ON@ Require that the ErrOrth flag is ON
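To make the interplay of the P, D, C and R operations concrete, here is a minimal Python sketch that walks a string of flag diacritics and reports whether the path survives. It follows the usual xfst/HFST semantics for these four operations only; the function name `path_survives` and the example flag sequences are assumptions for illustration and do not reflect where the flags actually sit in the lexicon.

```python
import re

# One flag diacritic: operation, feature, optional value.
FLAG = re.compile(r"@([PDCR])\.([^.@]+)(?:\.([^.@]+))?@")


def path_survives(path: str) -> bool:
    """Return True if every flag diacritic along `path` is satisfied."""
    state: dict[str, str] = {}
    for op, feature, value in FLAG.findall(path):
        current = state.get(feature)
        if op == "P":    # set the feature to the given value
            state[feature] = value
        elif op == "C":  # clear the feature
            state.pop(feature, None)
        elif op == "R":  # require the value (or any value if none is given)
            if current is None or (value and current != value):
                return False
        elif op == "D":  # disallow the value (or any value if none is given)
            if current is not None and (not value or current == value):
                return False
    return True


# Hypothetical sequences: a verb sets NeedNoun and the path tries to end.
print(path_survives("@P.NeedNoun.ON@@D.NeedNoun.ON@"))
# Same, but a nominalising step clears the flag first.
print(path_survives("@P.NeedNoun.ON@@C.NeedNoun@@D.NeedNoun.ON@"))
```

The first sequence is blocked because the D flag meets the value that P set; the second passes because C removed the value before the D flag was reached.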

For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.

Flag Explanation
@P.CmpFrst.FALSE@ Require that words tagged as such only appear first
@D.CmpPref.TRUE@ Block such words from entering ENDLEX
@P.CmpPref.FALSE@ Block these words from making further compounds
@D.CmpLast.TRUE@ Block such words from entering R
@D.CmpNone.TRUE@ Combines with the next tag to prohibit compounding
@U.CmpNone.FALSE@ Combines with the prev tag to prohibit compounding
@U.CmpNone.TRUE@ Combines with the two previous ones to block compounding
@P.CmpOnly.TRUE@ Sets a flag to indicate that the word has passed R
@D.CmpOnly.FALSE@ Disallow words coming directly from root.
@D.CmpHyph.TRUE@ Flag to control hyphenated compounds like proper nouns
@U.CmpHyph.FALSE@ Flag to control hyphenated compounds like proper nouns
@U.CmpHyph.TRUE@ Flag to control hyphenated compounds like proper nouns
@C.CmpHyph@ Flag to control hyphenated compounds like proper nouns
@P.CmpHyph.TRUE@ Flag to control hyphenated compounds like proper nouns
@N.CmpHyph.TRUE@ Flag to control hyphenated compounds like proper nouns

Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.

Flag Explanation
@U.Cap.Obl@ Allowing downcasing of derived names: deatnulasj.
@U.Cap.Opt@ Allowing downcasing of derived names: deatnulasj.
@U.NeedsVowRed.OFF@ is used to force hyphenation/non-reduction: samediggi-
@U.NeedsVowRed.ON@ is used to force reduction w/o hyphen: samedigge#xxx
@C.NeedsVowRed@ Clearing this feature, so that it doesn’t interfere with further compounding
@C.Px@ Clear px
@C.Nom3Px@  
@P.Px.add@  
@R.Px.add@  
@P.Px.block@  
@D.Px.block@  
@P.Nom12Px.add@  
@R.Nom12Px.add@  
@P.Nom3Px.add@  
@R.Nom3Px.add@  
@P.Vgen.add@  
@R.Vgen.add@  
@R.SpellRlx.ON@ Flag used to tag spell-relax-analysed strings (and only those).
@D.SpellRlx.ON@ Flag used to tag spell-relax-analysed strings (and only those).
@C.SpellRlx@ Flag used to tag spell-relax-analysed strings (and only those).
@R.SpaceCmp.ON@ Flag to tag compounds written with a space
@D.SpaceCmp.ON@ Flag to tag compounds written with a space
@C.SpaceCmp@ Flag to tag compounds written with a space

Use the following flag diacritics to control harmony in numeral case inflection

Flag Explanation
@U.Case.SgNom@ Unifies with case Nominative
@U.Case.PlNom@ Unifies with case Nominative
@U.Case.SgGen@ Unifies with case Genitive
@U.Case.PlGen@ Unifies with case Genitive
@U.Case.PlAcc@ Unifies with case Accusative
@U.Case.SgLoc@ Unifies with case Locative
@U.Case.PlLoc@ Unifies with case Locative
@U.Case.SgIll@ Unifies with case Illative
@U.Case.PlIll@ Unifies with case Illative
@U.Case.SgCom@ Unifies with case Comitative
@U.Case.Ess@ Unifies with case Essive
@U.Number.Sg@ Unifies with number Singular - perhaps not in use
@U.Number.Pl@ Unifies with number Plural - perhaps not in use

Flag diacritic Explanation
@U.number.one@ Flag used to give arabic numerals in smj different cases
@U.number.two@ Flag used to give arabic numerals in smj different cases
@U.number.three@ Flag used to give arabic numerals in smj different cases
@U.number.four@ Flag used to give arabic numerals in smj different cases
@U.number.five@ Flag used to give arabic numerals in smj different cases
@U.number.six@ Flag used to give arabic numerals in smj different cases
@U.number.seven@ Flag used to give arabic numerals in smj different cases
@U.number.eight@ Flag used to give arabic numerals in smj different cases
@U.number.nine@ Flag used to give arabic numerals in smj different cases
@U.number.zero@ Flag used to give arabic numerals in smj different cases
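The U (unification) flags above can be pictured with a small Python sketch: the first occurrence of a feature sets its value, and every later occurrence on the same path must carry the same value. This is a simplification that ignores negatively set values; the function name `unifies` and the sample paths are hypothetical.

```python
import re

# One unification flag: feature and value.
UNIFY = re.compile(r"@U\.([^.@]+)\.([^.@]+)@")


def unifies(path: str) -> bool:
    """Return True if all @U.feature.value@ flags on `path` agree per feature."""
    seen: dict[str, str] = {}
    for feature, value in UNIFY.findall(path):
        # The first occurrence sets the feature; later ones must match it.
        if seen.setdefault(feature, value) != value:
            return False
    return True


# Hypothetical paths: two parts of a numeral, each carrying a case flag.
print(unifies("A@U.Case.SgGen@B@U.Case.SgGen@"))  # same case on both parts
print(unifies("A@U.Case.SgGen@B@U.Case.SgLoc@"))  # conflicting cases
```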

Basic lexica, pointing to the other lexicon files

Lexicon Root where everything starts

We split off the 3 lexica defined above already here:

Lexicon ENDLEX

And this is the ENDLEX of everything:

**@D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@ # ;** 

The @D.CmpOnly.FALSE@ flag diacritic is used to prevent words tagged with +CmpNP/Only from ending here. The @D.NeedNoun.ON@ flag diacritic is used to block illegal compounds.
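The effect of the three D flags in the ENDLEX entry can be sketched in Python as a check on the flag values a path has accumulated when it reaches ENDLEX. The dictionary representation of the flag state and the function name `may_end` are assumptions for illustration; only the three feature/value pairs come from the entry above.

```python
# Feature/value pairs that the ENDLEX entry disallows.
ENDLEX_DISALLOWS = {"CmpOnly": "FALSE", "CmpPref": "TRUE", "NeedNoun": "ON"}


def may_end(state: dict[str, str]) -> bool:
    """Return True if a path with this flag state may pass through ENDLEX.

    A D flag only blocks when the feature currently holds exactly the
    disallowed value; unset features and other values pass.
    """
    return all(state.get(feature) != value
               for feature, value in ENDLEX_DISALLOWS.items())


print(may_end({}))                      # no flags set
print(may_end({"CmpOnly": "FALSE"}))    # compound-only word straight from root
print(may_end({"CmpOnly": "TRUE"}))     # same word after it has passed R
```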


This (part of) documentation was generated from src/fst/morphology/root.lexc