Finite state and Constraint Grammar based analysers, proofing tools and other resources
View the project on GitHub giellalt/lang-vep
These are to be evaluated (are they in use?). TODO: have a look at these:
+Rom
+Use/-TTS – never retained in the HFST Text-To-Speech disambiguation tokeniser
+Hom3
@R.ErrOrth.ON@
The morphological analyses of wordforms of Veps are presented in this system in terms of the following symbols. (Following existing standards when adding new tags is strongly recommended.)
+Use/-Spell =
Derivations are classified under the morphophonetic form of the suffix, the source and target part-of-speech.
To represent phonological variation in word forms, we use the following symbols in the lexicon files:
We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics: compounds with verbs are allowed only if the verb is further derived into a noun again.
Flag | Explanation |
---|---|
@P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
@D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
@C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised |
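How the three NeedNoun flags interact can be illustrated with a small simulation of the standard flag diacritic operators (P = set, D = disallow, C = clear, R = require, U = unify). This is a minimal sketch, not the HFST implementation: the function name `path_is_valid`, the placeholder symbols such as `verb-stem`, and the order in which the flags occur on a path are assumptions made for illustration, and the N (negative set) operator is left out.

```python
def path_is_valid(symbols):
    """Return True if the flag diacritics along one path are consistent.

    Simplified model of flag diacritic semantics; symbols that are not
    of the form @OP.FEATURE[.VALUE]@ are treated as ordinary symbols.
    """
    features = {}  # feature name -> current value
    for sym in symbols:
        if not (sym.startswith("@") and sym.endswith("@")):
            continue  # ordinary symbol, no effect on flags
        parts = sym[1:-1].split(".")
        op, feat = parts[0], parts[1]
        val = parts[2] if len(parts) > 2 else None
        cur = features.get(feat)
        if op == "P":    # positive set: always succeeds
            features[feat] = val
        elif op == "C":  # clear: always succeeds
            features.pop(feat, None)
        elif op == "D":  # disallow: fail if feature is set (to val)
            if cur is not None and (val is None or cur == val):
                return False
        elif op == "R":  # require: fail unless feature is set (to val)
            if cur is None or (val is not None and cur != val):
                return False
        elif op == "U":  # unify: set if unset, else values must match
            if cur is None:
                features[feat] = val
            elif cur != val:
                return False
        else:
            raise ValueError(f"unsupported flag operator in {sym}")
    return True


# Hypothetical paths: a verb sets NeedNoun, a nominalising suffix
# clears it, and the compound boundary disallows NeedNoun=ON.
bare_verb = ["@P.NeedNoun.ON@", "verb-stem", "@D.NeedNoun.ON@"]
nominalised = ["@P.NeedNoun.ON@", "verb-stem", "@C.NeedNoun@",
               "noun-suffix", "@D.NeedNoun.ON@"]
plain_noun = ["noun-stem", "@D.NeedNoun.ON@"]

print(path_is_valid(bare_verb))    # blocked by @D.NeedNoun.ON@
print(path_is_valid(nominalised))  # flag was cleared before the check
print(path_is_valid(plain_noun))   # flag was never set
```

Under these assumptions, only the path where the verb compounds without being nominalised is rejected.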
For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.
Flag | Explanation |
---|---|
@P.CmpFrst.FALSE@ | Require that words tagged as such only appear first |
@D.CmpPref.TRUE@ | Block such words from entering ENDLEX |
@P.CmpPref.FALSE@ | Block these words from making further compounds |
@D.CmpLast.TRUE@ | Block such words from entering R |
@D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding |
@U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding |
@P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R |
@D.CmpOnly.FALSE@ | Disallow words coming directly from root. |
Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.
Flag | Explanation |
---|---|
@U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj. |
@U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj. |
Flag diacritic | Explanation |
---|---|
@U.number.one@ | Flag used to give Arabic numerals in smj different cases |
@U.number.two@ | Flag used to give Arabic numerals in smj different cases |
@U.number.three@ | Flag used to give Arabic numerals in smj different cases |
@U.number.four@ | Flag used to give Arabic numerals in smj different cases |
@U.number.five@ | Flag used to give Arabic numerals in smj different cases |
@U.number.six@ | Flag used to give Arabic numerals in smj different cases |
@U.number.seven@ | Flag used to give Arabic numerals in smj different cases |
@U.number.eight@ | Flag used to give Arabic numerals in smj different cases |
@U.number.nine@ | Flag used to give Arabic numerals in smj different cases |
@U.number.zero@ | Flag used to give Arabic numerals in smj different cases |
@P.number.one@ | Flag used to give Arabic numerals in smj different cases |
@P.number.two@ | Flag used to give Arabic numerals in smj different cases |
@P.number.three@ | Flag used to give Arabic numerals in smj different cases |
@P.number.four@ | Flag used to give Arabic numerals in smj different cases |
@P.number.five@ | Flag used to give Arabic numerals in smj different cases |
@P.number.six@ | Flag used to give Arabic numerals in smj different cases |
@P.number.seven@ | Flag used to give Arabic numerals in smj different cases |
@P.number.eight@ | Flag used to give Arabic numerals in smj different cases |
@P.number.nine@ | Flag used to give Arabic numerals in smj different cases |
@P.number.ten@ | Flag used to give Arabic numerals in smj different cases |
@P.number.zero@ | Flag used to give Arabic numerals in smj different cases |
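The `number` flags come in P (set) and U (unify) variants, which suggests that one part of a numeral sets a value and a later part must agree with it. The sketch below shows only the general P/U mechanism. The function name `numeral_path_ok`, the placeholder `case-suffix`, and the idea that the digit sets the flag while the case ending unifies with it are assumptions for illustration, not a description of the actual lexicon.

```python
def numeral_path_ok(symbols):
    """Check P (set) and U (unify) flags along one path.

    Only three-part flags of the form @P.feature.value@ and
    @U.feature.value@ are handled; other symbols are ignored.
    """
    features = {}  # feature name -> current value
    for sym in symbols:
        if not (sym.startswith("@") and sym.endswith("@")):
            continue  # ordinary symbol such as a digit or suffix
        op, feat, val = sym.strip("@").split(".")
        if op == "P":
            features[feat] = val  # set unconditionally
        elif op == "U":
            # unify: adopt val if unset, otherwise the values must match
            if features.setdefault(feat, val) != val:
                return False
    return True


# Hypothetical paths: the digit sets number=five, the ending unifies.
matching = ["5", "@P.number.five@", "@U.number.five@", "case-suffix"]
clashing = ["5", "@P.number.five@", "@U.number.two@", "case-suffix"]

print(numeral_path_ok(matching))  # values agree, path survives
print(numeral_path_ok(clashing))  # values differ, path is pruned
```

In this model, a case ending marked with a U flag survives only after a digit that set the same value, which is one way different digits could select different case forms.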
The word forms in Veps start from the lexeme roots of basic word classes.
CC_
CS_
INTERJ_
ADV_
ADV_MANNER
ADV_ADE ADV_ABL ADV_ALL ADV_ELA ADV_ILL ADV_INE ADV_LAT ADV_SPAT
ADV_TEMP
This (part of) documentation was generated from src/fst/morphology/root.lexc