Multichar_Symbols and Root lexicon for Iñupiaq

Multichar_Symbols

Grammatical tags

+N +V +Part +Prop +Pron POS
+Sg +Du +Pl Number
+1Sg +2Sg +3Sg +4Sg Intransitive number Sg
+1Du +2Du +3Du +4Du Intransitive number Du
+1Pl +2Pl +3Pl +4Pl Intransitive number Pl
+1SgO +2SgO +3SgO +4SgO Objective conjugation
+1DuO +2DuO +3DuO +4DuO Objective conjugation
+1PlO +2PlO +3PlO +4PlO Objective conjugation
+Symbol = independent symbols in the text stream, like £, €, ©
+Abs +Rel +Trm +Loc +Abl +Mod Cases
+Prs +Prt Tenses
+Ind +Int +Cau +ConReal +ConUnreal Modes NB! No Imp
+Arch tags for archaic forms. In this pilot just used to indicate twin forms

4th person still missing in the transitive conjugation ľ !digraphs plus ľ for voiceless palatalized l Remember to check this letter, it was problematic on Linux

Boundary symbols

Symbols that need to be escaped on the lower side (towards twolc):

»7: Literal »
«7: Literal « %[%>%] - Literal > %[%<%] - Literal < %> morphemeborder

Derivational affixes

+LLATU +LLATU=NIAQ +NIAQ +NIAQ=ŊIT +ŊIT +SAAĠE +SAAĠE=ŊIT +TEQ verb elaborating +IT +QAQ
+VIK nominalizers
+LU +GUUQ +UNA clitics
Morphophonological dummy symbols examples:*
%^TRUNC truncation dummy
%^CVCTRUNC dummy for very long truncations
%^VCTRUNC dummy for long truncation
%^FRIC dummy for fricativizing stem-final consonants. Needed to avoid a general rule that also would affect unwantedly as in aaġagu for aaqagu. The alternative would have been to postulate truncating flexives with a fricative first consonant (aiviq -q +ġit) but that is hokus pokus
%^EBLOCK dummy to block schwa going to a (aŋutik not *aŋuttak)
%^C dummy for intermediate gemination
%^DEFRIC dummy when fricatives go stops (amaġuq -> amaqquk) as apposed to %C in niġi+VIK -> niġġivik
%^SCHWADEL !dummy with derivatives truncating semi-final schwa

Flag diacritics

These flag diacritics are there tounify IV/TV verbs and their person merophology across the derivational morphology.

@P.IV.ON@ Flag - sets value for transitivity to IV
@P.TV.ON@ Flag - sets value for transitivity to TV
@R.IV.ON@ Flag - reset value for transitivity to IV
@R.TV.ON@ Flag - reset value for transitivity to TV
@D.IV.ON@ Flag - delete if unsaturated IV flag (=Verb was not IV)
@D.TV.ON@ Flag - delete if unsaturated TV flag (=Verb was not TV)

We have manually optimised the structure of our lexicon using following flag diacritics to restrict morhpological combinatorics - only allow compounds with verbs if the verb is further derived into a noun again:

Flag	Explanation
@P.NeedNoun.ON@	(Dis)allow compounds with verbs unless nominalised
@D.NeedNoun.ON@	(Dis)allow compounds with verbs unless nominalised
@C.NeedNoun@	(Dis)allow compounds with verbs unless nominalised

For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.

Flag	Explanation
@P.CmpFrst.FALSE@	Require that words tagged as such only appear first
@D.CmpPref.TRUE@	Block such words from entering ENDLEX
@P.CmpPref.FALSE@	Block these words from making further compounds
@D.CmpLast.TRUE@	Block such words from entering R
@D.CmpNone.TRUE@	Combines with the next tag to prohibit compounding
@U.CmpNone.FALSE@	Combines with the prev tag to prohibit compounding
@P.CmpOnly.TRUE@	Sets a flag to indicate that the word has passed R
@D.CmpOnly.FALSE@	Disallow words coming directly from root.

Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.

Flag	Explanation
@U.Cap.Obl@	Allowing downcasing of derived names: deatnulasj.
@U.Cap.Opt@	Allowing downcasing of derived names: deatnulasj.

This file gives the start of the Iñupiaq lexicon. The lexicon Root points at the different parts of speech. Each POS has its own file stems/nouns.lexc, etc., which in turn points to affixes/nouns.lexc, etc. POS-changing nominalizers are found in affixes/verbs.lexc and verbalizers in affixes/nouns.lexc It might be a good idea to have noun-ipk-der.txt etc. as well. The common, final lexica, are found in clitics.lexc.

The Root lexicon

LEXICON Root

Nouns ;
Verbs ;
Determiners ;
Adverbs ;
prop ;
pron ;
part ;
Punctuation ;
Symbols ;

About lexica and continuations. Instead of separate lexica for words that can only be sing or only plur and others for words that can take all numbers, this is a better solution: Normal nouns are tagged tp, tup etc. whereas specials are tagged with the continuation lexicon

This (part of) documentation was generated from src/fst/morphology/root.lexc

Inupiaq NLP Grammar

Page Content

Multichar_Symbols and Root lexicon for Iñupiaq