Finite state and Constraint Grammar based analysers, proofing tools and other resources
View the project on GitHub giellalt/lang-ess
INTRODUCTION TO THE MORPHOLOGICAL ANALYSER OF THE Central Siberian Yupik LANGUAGE
+1PlO +2PlO +3PlO +4PlO Objective conjugation
4th person still missing in the transitive conjugation
+Arch tags archaic forms. In this pilot it is used only to indicate twin forms.
+LU +GUUQ +UNA clitics
We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics: compounds with verbs are allowed only if the verb is further derived into a noun again.

| @P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised
| @D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised
| @C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised
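The effect of the three NeedNoun flags can be illustrated with a small Python simulation of the standard flag diacritic operations P (set), D (disallow) and C (clear). This is only a sketch: the order in which the flags occur on each path and the placeholder symbols `noun`, `verb` and `nominaliser` are assumptions made for illustration, not taken from the actual lexicon files.

```python
def accepts(path):
    """Return True if the flag diacritics on `path` are mutually consistent.

    Only the P, D and C operations are modelled here.
    """
    flags = {}  # feature name -> current value
    for symbol in path:
        if not (symbol.startswith("@") and symbol.endswith("@")):
            continue  # ordinary symbol, no effect on the flag state
        parts = symbol.strip("@").split(".")
        op, feature = parts[0], parts[1]
        value = parts[2] if len(parts) > 2 else None
        if op == "P":
            flags[feature] = value       # set the feature to this value
        elif op == "C":
            flags.pop(feature, None)     # clear the feature
        elif op == "D":
            current = flags.get(feature)
            if value is None and current is not None:
                return False             # blocked: feature has some value
            if value is not None and current == value:
                return False             # blocked: feature has exactly this value
        else:
            raise ValueError(f"operation not modelled in this sketch: {op}")
    return True


# Hypothetical paths; the placement of the flags is an assumption.
plain_noun = ["noun", "@D.NeedNoun.ON@"]
bare_verb_compound = ["noun", "@P.NeedNoun.ON@", "verb", "@D.NeedNoun.ON@"]
nominalised_compound = ["noun", "@P.NeedNoun.ON@", "verb",
                        "@C.NeedNoun@", "nominaliser", "@D.NeedNoun.ON@"]

for name, path in [("plain noun", plain_noun),
                   ("compound ending in a verb", bare_verb_compound),
                   ("compound with nominalised verb", nominalised_compound)]:
    print(name, "->", "accepted" if accepts(path) else "blocked")
```

In this sketch the compound ending in a bare verb is blocked because NeedNoun is still ON when the final D check is reached, while the nominalised variant passes because the C flag has cleared the feature first.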
For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.

| @P.CmpFrst.FALSE@ | Require that words tagged as such only appear first
| @D.CmpPref.TRUE@ | Block such words from entering ENDLEX
| @P.CmpPref.FALSE@ | Block these words from making further compounds
| @D.CmpLast.TRUE@ | Block such words from entering R
| @D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding
| @U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding
| @P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R
| @D.CmpOnly.FALSE@ | Disallow words coming directly from root.
Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.

| @U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj.
| @U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj.
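Both Cap flags use the U (unification) operation: the first U flag on a path sets the feature, and every later U flag on the same feature must carry the same value. The Python sketch below models only that general behaviour. It does not reproduce the ready-made down-casing regex, and the function name `unify_path` is hypothetical.

```python
def unify_path(flag_symbols):
    """Apply U flags in order.

    Returns the final feature state, or None if unification fails.
    """
    state = {}  # feature name -> value fixed by the first U flag seen
    for symbol in flag_symbols:
        op, feature, value = symbol.strip("@").split(".")
        if op != "U":
            raise ValueError(f"only U flags are modelled in this sketch: {symbol}")
        if feature in state and state[feature] != value:
            return None  # conflicting values: the path is rejected
        state[feature] = value
    return state


print(unify_path(["@U.Cap.Obl@"]))                  # one flag: always succeeds
print(unify_path(["@U.Cap.Opt@", "@U.Cap.Opt@"]))   # same value twice: succeeds
print(unify_path(["@U.Cap.Obl@", "@U.Cap.Opt@"]))   # conflicting values: fails
```

Under these semantics a single path cannot carry both Obl and Opt for Cap, so each derived name is committed to one of the two behaviours.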
This file gives the start of the Iñupiaq lexicon. The lexicon Root points to the different parts of speech. Each POS has its own stem file (stems/nouns.lexc, etc.), which in turn points to an affix file (affixes/nouns.lexc, etc.). POS-changing nominalizers are found in affixes/verbs.lexc and verbalizers in affixes/nouns.lexc. It might be a good idea to have noun-ipk-der.txt etc. as well. The common final lexica are found in clitics.lexc.
LEXICON Root
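The way Root hands over to stem, affix and clitic lexica can be pictured as a graph of continuation lexica. The Python sketch below is only an illustration of that layout: the lexicon names other than Root (`NounStems`, `VerbStems`, `NounAffixes`, `VerbAffixes`, `Clitics`) and all entry strings are hypothetical placeholders, not contents of the actual files.

```python
# Each lexicon maps to a list of (entry text, continuation lexicon) pairs.
# "#" marks the end of a word, as in lexc.
LEXICA = {
    "Root":        [("", "NounStems"), ("", "VerbStems")],
    "NounStems":   [("nounstem", "NounAffixes")],        # cf. stems/nouns.lexc
    "VerbStems":   [("verbstem", "VerbAffixes")],        # cf. stems/verbs.lexc
    "NounAffixes": [("", "Clitics"),
                    ("+verbalizer", "VerbAffixes")],     # cf. affixes/nouns.lexc
    "VerbAffixes": [("", "Clitics"),
                    ("+nominalizer", "NounAffixes")],    # cf. affixes/verbs.lexc
    "Clitics":     [("", "#"), ("+clitic", "#")],        # cf. clitics.lexc
}


def expand(lexicon="Root", prefix="", max_entries=4):
    """Yield every string reachable from `lexicon` using at most `max_entries` entries.

    The bound is needed because verbalizers and nominalizers form a cycle.
    """
    if lexicon == "#":
        yield prefix
        return
    if max_entries == 0:
        return
    for text, continuation in LEXICA[lexicon]:
        yield from expand(continuation, prefix + text, max_entries - 1)


for word in expand(max_entries=5):
    print(word)
```

With `max_entries=4` only underived stems, with or without a clitic, are reached; raising the bound to 5 also admits one POS-changing derivation per word.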
This (part of) documentation was generated from src/fst/morphology/root.lexc