Cornish morphology
First, we declare multicharacter symbols
POS
- +N +V +A
- +Adv +CC +CS +Interj +Pron +Num +Pr
- +Smut +Mmut +Pmut +Amut for mutation
- +Symbol = independent symbols in the text stream, like £, €, ©
Verbal MSP
+Prs +Fut +Prt +Prf +Ipf +Plf
+Ind +Imp +Sbj
+Inf +Sit ! what is +Sit?
+Pos +Neg +ConNeg
+1Sg +2Sg +3Sg +1Pl +2Pl +3Pl
+Impers
+Ptc
+VN
Nominal MSP
+Sg +Pl +P
+Nom +Acc +Gen
+Ord
Pronominal MSP
+Suff +Emph +Indef
+Msc +Fem
Diacritical marks, to trigger morphophonological rules. The “%” symbol literalises the next symbol
%^S %^A %^P %^M %^V %^E
%> %^D
%^UML %^CD %^TRUNC %^STDEL
Archiphonemes
e3 e to o in plural a4 o4 Umlaut phonemes, changing to e
Symbols that need to be escaped on the lower side (towards twolc):
- »7: Literal »
- «7: Literal «
%[%>%] - Literal > %[%<%] - Literal <
Flag diacritics
We have manually optimised the structure of our lexicon using following flag diacritics to restrict morhpological combinatorics - only allow compounds with verbs if the verb is further derived into a noun again:
| Flag | Explanation |
|---|---|
| @P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
| @D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
| @C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised |
For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.
| Flag | Explanation |
|---|---|
| @P.CmpFrst.FALSE@ | Require that words tagged as such only appear first |
| @D.CmpPref.TRUE@ | Block such words from entering ENDLEX |
| @P.CmpPref.FALSE@ | Block these words from making further compounds |
| @D.CmpLast.TRUE@ | Block such words from entering R |
| @D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding |
| @U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding |
| @P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R |
| @D.CmpOnly.FALSE@ | Disallow words coming directly from root. |
Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.
| Flag | Explanation |
|---|---|
| @U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj. |
| @U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj. |
| @P.Pmatch.Backtrack@ | For preprocessing |
| @R.ErrOrth.ON@ | resetting ErrOrth flag |
| @C.ErrOrth@ | clearing ErrOrth Flag. |
- +Use/TTS – only retained in the HFST Text-To-Speech disambiguation tokeniser
- +Use/-TTS – never retained in the HFST Text-To-Speech disambiguation tokeniser
| Flag diacritic | Explanation |
|---|---|
| @U.number.one@ | Flag used to give arabic numerals in smj different cases ; |
| @U.number.two@ | Flag used to give arabic numerals in smj different cases ; |
| @U.number.three@ | Flag used to give arabic numerals in smj different cases ; |
| @U.number.four@ | Flag used to give arabic numerals in smj different cases ; |
| @U.number.five@ | Flag used to give arabic numerals in smj different cases ; |
| @U.number.six@ | Flag used to give arabic numerals in smj different cases ; |
| @U.number.seven@ | Flag used to give arabic numerals in smj different cases ; |
| @U.number.eight@ | Flag used to give arabic numerals in smj different cases ; |
| @U.number.nine@ | Flag used to give arabic numerals in smj different cases ; |
| @U.number.zero@ | Flag used to give arabic numerals in smj different cases ; |
Key lexicon
- LEXICON Root
- Adverb ;
- Noun ;
- Propernoun ;
- Preposition ;
- Conjunction ;
- Subjunction ;
- Verb ;
- Adjective ;
- Interjection ;
- Numeral ;
- Pronoun ;
- Punctuation ;
- Symbols ;
Adhoc lexica, to be fixed
- LEXICON ENDLEX
- LEXICON RNum
- LEXICON ARABICCOMPOUNDS
This (part of) documentation was generated from src/fst/morphology/root.lexc