Pite Sámi morphological analyser

This file contains the tag definitions and references to the main lexica.

Multichar_Symbols definitions

POS

+N Noun
+V Verb
+A Adjective
+Adv Adverb
+CC Coordinating conjunction
+CS Subordinating conjunction
+Interj Interjection
+Pron Pronoun
+Num Numeral
+Pcle Particle
+Po Postposition
+Pr Preposition

Subclasses

+Pers Personal
+Dem Demonstrative
+Interr Interrogative
+Indef Indefinite
+Refl Reflexive
+Recipr Reciprocal
+Rel Relative
+NomAg Agent noun
+Attr Attributive
+Comp Comparative
+Superl Superlative

Morphosyntactic properties

Verbal MSP

Tense-mode

+Prs Present tense
+Prt Preterite (past) tense
+Ind Indicative mood
+Imprt Imperative mood
+Pot Potential mood

Person-number

+Sg1 First person singular
+Sg2 Second person singular
+Sg3 Third person singular
+Du1 First person dual
+Du2 Second person dual
+Du3 Third person dual
+Pl1 First person plural
+Pl2 Second person plural
+Pl3 Third person plural

Non-finite forms

+Inf Infinitive
+Neg Negation verb
+ConNeg Connegative verb
+GerI Gerund I
+GerII Gerund II
+PrfPrc Perfect participle
+PrsPrc Present participle
+VAbess Verb abessive
+Cmp Compound
+TV Transitive verb
+IV Intransitive verb
+Vsubst “actio” verb

Other tags

+ABBR Abbreviation
+Symbol = independent symbols in the text stream, like £, €, ©
+Coll Collocation
+Cmp/SgNom Compound component using Nominative Singular form
+Cmp/SgGen Compound component using Genitive Singular form
+Det Determiner
+Clt Clitic ‘l for some forms of copula/auxiliary verb following V-final word

Derivation tags

+Der/NomAg Derived agent noun
+Der/Dimin Derived diminutive
+Der/State Derived state noun
+Der/VAdv Derived deverbal adverb

Nominal MSP

+Sg Singular
+Pl Plural

Case

+Nom Nominative
+Acc Accusative
+Gen Genitive
+Ill Illative
+Ine Inessive
+Ela Elative
+Com Comitative
+Ess Essive
+Abe Abessive
+Ord Ordinal
+Card Cardinal

Semantic properties of names

Possessive suffixes

+PxSg1 First person singular possessive suffix
+PxSg2 Second person singular possessive suffix
+PxSg3 Third person singular possessive suffix
+PxDu1 First person dual possessive suffix
+PxDu2 Second person dual possessive suffix
+PxDu3 Third person dual possessive suffix
+PxPl1 First person plural possessive suffix
+PxPl2 Second person plural possessive suffix
+PxPl3 Third person plural possessive suffix

Other tags

+Err/Orth Not part of standard orthography
+Use/NG Found in reality, but not generated
+Use/Circ
+Cmp/Hyph
+Cmp/SplitR
+Use/-Spell
+Use/NGminip
+Use/TTS – only retained in the HFST Text-To-Speech disambiguation tokeniser
+Use/-TTS – never retained in the HFST Text-To-Speech disambiguation tokeniser
+Use/PMatch means that the following is only used in the analyser feeding the disambiguator
+Use/-PMatch Do not include in fst’s made for hfst-pmatch
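
The tags listed so far appear in analyses as a sequence of `+Tag` symbols following the lemma. As a minimal sketch of how a downstream script might consume them, the Python below splits an analysis string into lemma and tags and filters out analyses marked +Use/NG. The function names and the example strings are assumptions made for illustration; they are not output from the real analyser, and the sketch assumes the lemma itself contains no `+`.

```python
from typing import List, Tuple


def split_analysis(analysis: str) -> Tuple[str, List[str]]:
    """Split 'lemma+Tag1+Tag2' into the lemma and a list of '+Tag' symbols.

    Assumes the lemma contains no '+' character (an illustrative simplification).
    """
    lemma, *tags = analysis.split("+")
    return lemma, ["+" + tag for tag in tags]


def is_generatable(analysis: str) -> bool:
    """Reject analyses carrying +Use/NG ('found in reality, but not generated')."""
    _, tags = split_analysis(analysis)
    return "+Use/NG" not in tags


# Hypothetical analysis strings, used only to show the shape of the data.
lemma, tags = split_analysis("juolge+N+Sg+Nom")
print(lemma, tags)
print(is_generatable("juolge+N+Use/NG+Sg+Nom"))
```

Tags containing a slash, such as +Use/NG or +Err/Orth, survive the split unchanged because only `+` is treated as a separator.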

Compounding tags

The tags are of the following form:

+CmpNP/xxx - Normative (N), Position (P), i.e. the tag describes what position the tagged word can have in a compound
+CmpN/xxx - Normative (N) form, i.e. the tag describes what form the tagged word should use when making compounds
+Cmp/xxx - Descriptive compounding tags, i.e. tags that describe what form a word actually uses in a compound

Normative/prescriptive compounding tags (to govern compound behaviour for the speller, i.e. what a compound SHOULD BE):

The first part of the component may be ..

+CmpN/Sg = Singular
+CmpN/SgN = Singular Nominative
+CmpN/SgG = Singular Genitive
+CmpN/PlG = Plural Genitive
+CmpNP/All - … be in all positions, default, this tag does not have to be written
+CmpNP/First - … only be first part in a compound or alone
+CmpNP/Pref - … only first part in a compound, NEVER alone
+CmpNP/Last - … only be last part in a compound or alone
+CmpNP/Suff - … only last part in a compound, NEVER alone
+CmpNP/None - … not take part in compounds
+CmpNP/Only - … only be part of a compound, i.e. can never be used alone, but can appear in any position
+CmpN/SgLeft Singular to the left
+CmpN/SgNomLeft Singular nominative to the left
+CmpN/SgGenLeft Singular genitive to the left
+CmpN/PlGenLeft Plural genitive to the left
+Cmp/Sg Singular
+Cmp/SgNom Singular Nominative
+Cmp/SgGen Singular Genitive
+Cmp/PlGen Plural Genitive
+Cmp/PlNom Plural Nominative
+Cmp/Attr Attribute
+Cmp Dynamic compound - this tag should always be part of a dynamic compound. It is important for Apertium, and useful in other cases as well.
+Cmp/SplitR This is a split compound with the other part to the right: “Arbeids- og inkluderingsdepartementet” => Arbeids- = +Cmp/SplitR
+Cmp/SplitL This is a split compound with the other part to the left
+Cmp/Sh testing ShCmp
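
The positional +CmpNP/xxx tags above can be read as a lookup from tag to permitted positions. The Python sketch below encodes that reading; the position names ("alone", "first", "middle", "last") and the function name are assumptions introduced here, and the real restrictions are enforced with flag diacritics inside the transducer rather than by a table like this.

```python
from typing import Optional

# Permitted positions per tag, transcribed from the descriptions above.
# The position labels themselves are hypothetical names for this sketch.
ALLOWED_POSITIONS = {
    "+CmpNP/All": {"alone", "first", "middle", "last"},
    "+CmpNP/First": {"alone", "first"},
    "+CmpNP/Pref": {"first"},
    "+CmpNP/Last": {"alone", "last"},
    "+CmpNP/Suff": {"last"},
    "+CmpNP/None": {"alone"},
    "+CmpNP/Only": {"first", "middle", "last"},
}


def may_occupy(position: str, tag: Optional[str] = None) -> bool:
    """Return True if a word with the given +CmpNP tag may stand in `position`.

    A missing tag is treated as +CmpNP/All, the documented default.
    """
    return position in ALLOWED_POSITIONS[tag or "+CmpNP/All"]


print(may_occupy("alone", "+CmpNP/Pref"))
print(may_occupy("middle", "+CmpNP/Only"))
```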

Punctuation tags

+CLB Clause boundary
+PUNCT Punctuation
+LEFT
+RIGHT
+MIDDLE
+SENT

Morphophonological symbols

Symbols for regulating the twolc file

^WG * weak grade
^G3 * marks grade three for stems w/o Cgrad
^V2E2AA * e to á (before j), o to u before j in V2
^CDEL * Deleting final consonant, biednag
^VDEL * Deleting final V2 vowel in compounds or gájk
^MON * Monophthong in contract
^UAUML * uo to uä juolge / juällge
^IEUML * ie to ä etc. gielbar gællbara
^IUML * a to i, gallgat gillgin
^IJ * e to i in front of Plural j and Sg Com
^V2O2U * o to u in V2 (e.g. Ill.Sg, Dim, some N_ODD) etc.
^MONB4J * No rules for this one in twolc!

Archiphonemes

i2 * Variable vowel, does not trigger VH
u2 * Variable vowel, does not trigger VH
ä2 * Variable vowel, does not undergo (further) VH
b2 d2 g2 t2 j2 * Variable consonants, undergo final devoicing or other alternations
^O * o but ä in uä, a in ua

 »7       * »
 «7       * «
 %[%>%]   * >
 %[%<%]   * <

Flag diacritics

We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics. They only allow compounds with verbs if the verb is further derived into a noun again:

Flag diacritic	Explanation
@P.NeedNoun.ON@	(Dis)allow compounds with verbs unless nominalised
@D.NeedNoun.ON@	(Dis)allow compounds with verbs unless nominalised
@C.NeedNoun@	(Dis)allow compounds with verbs unless nominalised

For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.

Flag diacritic	Explanation
@P.CmpFrst.FALSE@	Require that words tagged as such only appear first
@D.CmpPref.TRUE@	Block such words from entering ENDLEX
@P.CmpPref.FALSE@	Block these words from making further compounds
@D.CmpLast.TRUE@	Block such words from entering R
@D.CmpNone.TRUE@	Combines with the next tag to prohibit compounding
@U.CmpNone.FALSE@	Combines with the prev tag to prohibit compounding
@P.CmpOnly.TRUE@	Sets a flag to indicate that the word has passed R
@D.CmpOnly.FALSE@	Disallow words coming directly from root.

Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.

Flag diacritic	Explanation
@U.Cap.Obl@	Allowing downcasing of derived names: deatnulasj.
@U.Cap.Opt@	Allowing downcasing of derived names: deatnulasj.

The following flag diacritics are used to control case inflection of numbers:

Flag diacritic	Explanation
@U.number.one@	Flag used to give Arabic numerals in smj different cases
@U.number.two@	Flag used to give Arabic numerals in smj different cases
@U.number.three@	Flag used to give Arabic numerals in smj different cases
@U.number.four@	Flag used to give Arabic numerals in smj different cases
@U.number.five@	Flag used to give Arabic numerals in smj different cases
@U.number.six@	Flag used to give Arabic numerals in smj different cases
@U.number.seven@	Flag used to give Arabic numerals in smj different cases
@U.number.eight@	Flag used to give Arabic numerals in smj different cases
@U.number.nine@	Flag used to give Arabic numerals in smj different cases
@U.number.zero@	Flag used to give Arabic numerals in smj different cases
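
These are unification (U) flags: a path through the transducer is only accepted if every `@U.number.X@` flag on it carries the same value. The Python sketch below imitates that check outside the transducer. It is an illustration under assumptions, not the analyser's implementation: the function name is hypothetical, and the idea that a digit and its matching case suffix each carry the same flag is an assumed usage pattern, since this page does not spell it out.

```python
import re

# Matches unification flags of the form @U.feature.value@.
U_FLAG = re.compile(r"@U\.([^.@]+)\.([^.@]+)@")


def unifies(path: str) -> bool:
    """Return True if all @U.feature.value@ flags on the path agree per feature."""
    seen = {}
    for feature, value in U_FLAG.findall(path):
        # The first occurrence sets the value; later ones must match it.
        if seen.setdefault(feature, value) != value:
            return False
    return True


# Hypothetical paths: a digit flag followed by a suffix flag.
print(unifies("@U.number.one@@U.number.one@"))
print(unifies("@U.number.one@@U.number.two@"))
```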

The following flag diacritic look-alikes are used in hfst-pmatch/hfst-tokenise to properly handle (possibly) multitoken single strings.

Flag	Explanation
@P.Pmatch.Loc@	Used on multi-token analyses; tell hfst-tokenise/pmatch where in the form/analysis the token should be split.
@P.Pmatch.Backtrack@	Used on single-token analyses; tell hfst-tokenise/pmatch to backtrack by reanalysing the substrings before and after this point in the form (to find combinations of shorter analyses that would otherwise be missed)

Key lexicon

Lexicon Root starts the analyser and directs paths to all POS.

Lexicon ENDLEX

And this is the ENDLEX of everything:

@D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@ # ;

The @D.CmpOnly.FALSE@ flag diacritic is used to prevent words tagged with +CmpNP/Only from ending here. The @D.NeedNoun.ON@ flag diacritic is used to block illegal compounds.
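
To make the effect of the ENDLEX flags concrete, the Python sketch below simulates the three flag operations involved: P (set a feature), C (clear it) and D (block the path if the feature currently has the given value). This is a simplified model of flag-diacritic semantics written for illustration; the function name is hypothetical, U, R and N flags are ignored, and the assumption that a +CmpNP/Only word enters with `@P.CmpOnly.FALSE@` set is inferred from the table descriptions rather than stated in the source.

```python
import re

# Matches @P.feature.value@, @D.feature.value@, @D.feature@ and @C.feature@.
FLAG = re.compile(r"@([PDC])\.([^.@]+)(?:\.([^.@]+))?@")

# The flag sequence from Lexicon ENDLEX.
ENDLEX = "@D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@"


def path_survives(path: str) -> bool:
    """Walk the flags left to right and report whether the path is accepted."""
    state = {}
    for op, feature, value in FLAG.findall(path):
        if op == "P":  # positive set
            state[feature] = value
        elif op == "C":  # clear the feature
            state.pop(feature, None)
        elif op == "D":  # disallow if the feature is set (to this value)
            current = state.get(feature)
            if current is not None and (value == "" or current == value):
                return False
    return True


# Assumed scenario: a +CmpNP/Only word going straight from Root to ENDLEX.
print(path_survives("@P.CmpOnly.FALSE@" + ENDLEX))
# The same word after passing R, where @P.CmpOnly.TRUE@ is set.
print(path_survives("@P.CmpOnly.FALSE@@P.CmpOnly.TRUE@" + ENDLEX))
```

In this model a verb that sets `@P.NeedNoun.ON@` is blocked at ENDLEX unless a nominalising derivation clears the feature with `@C.NeedNoun@` first.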

This (part of) documentation was generated from src/fst/morphology/root.lexc
