Finite state and Constraint Grammar based analysers, proofing tools and other resources
View the project on GitHub giellalt/lang-est-x-utee
+N: nimisõna / substantive
+A: omadussõna / adjective
+Num: arvsõna / numeral
+Pron: asesõna / pronoun
+V: tegusõna / verb
+Adv: määrsõna / adverb
+Interj: hüüdsõna / interjection
+CC: rinnastav sidesõna / co-ordinating conjunction
+CS: alistav sidesõna / subordinating conjunction
+Adp: kaassõna / adposition, i.e. pre/postposition
+Pref: prefiks / prefix
Genitiivatribuut pole eraldi kategooria / No special tag for genitive attribute: angoora+N+Sg+Gen
+Prop: pärisnimi / proper name
+Card: põhiarvsõna / cardinal numeral
+Ord: järgarvsõna / ordinal numeral
+Comp: komparatiiv / comparative
+Superl: superlatiiv / superlative
asi+N+Sg+Nom (Eng. # thing)
Jaan+N+Prop+Sg+Nom
asjalik+A+Sg+Nom (Eng. # serious)
parem+A+Comp+Sg+Nom (Eng. # better)
parim+A+Superl+Sg+Nom (Eng. # best)
kaks+Num+Card+Sg+Nom (Eng. # two)
teine+Num+Ord+Sg+Nom (Eng. # second)
see+Pron+Sg+Nom (Eng. # that)
asjastama+V+Pers+Sup+Ill (Eng. # objectify)
asjatult+Adv (Eng. # in vain)
hei+Interj (Eng. # hi)
ja+CC (Eng. # and)
et+CS (Eng. # that)
kaudu+Adp (Eng. # via)
eba+Pref (Eng. # un-, non-)
+Sg: ainsus / singular
+Pl: mitmus / plural
+Nom: nimetav / nominative
part+N+Sg+Nom
maja+N+Sg+Nom
part+N+Pl+Nom
majad: maja+N+Pl+Nom
+Gen: omastav / genitive
part+N+Sg+Gen
maja+N+Sg+Gen
part+N+Pl+Gen
majade: maja+N+Pl+Gen
+Par: osastav / partitive
part+N+Sg+Par
maja+N+Sg+Par
part+N+Pl+Par
part+N+Pl+Par+Usage/Rare
maja+N+Pl+Par
majasid: maja+N+Pl+Par+Usage/Rare
+Ill: sisseütlev / illative
part+N+Sg+Ill
maja+N+Sg+Ill
part+N+Sg+Ill+Usage/Hyp
maja+N+Sg+Ill+Usage/Rare
part+N+Pl+Ill
majadesse: maja+N+Pl+Ill
Lühike sisseütlev e suunduv pole eraldi kääne (parti, majja) / Short illative or additive is not a separate case
+Ine: seesütlev / inessive
part+N+Sg+Ine
maja+N+Sg+Ine
part+N+Pl+Ine
majades: maja+N+Pl+Ine
+Ela: seestütlev / elative
part+N+Sg+Ela
maja+N+Sg+Ela
part+N+Pl+Ela
majadest: maja+N+Pl+Ela
+All: alaleütlev / allative
part+N+Sg+All
maja+N+Sg+All
part+N+Pl+All
majadele: maja+N+Pl+All
+Ade: alalütlev / adessive
part+N+Sg+Ade
maja+N+Sg+Ade
part+N+Pl+Ade
majadel: maja+N+Pl+Ade
+Abl: alaltütlev / ablative
part+N+Sg+Abl
maja+N+Sg+Abl
part+N+Pl+Abl
majadelt: maja+N+Pl+Abl
+Tra: saav / translative
part+N+Sg+Tra
maja+N+Sg+Tra
part+N+Pl+Tra
majadeks: maja+N+Pl+Tra
+Trm: rajav / terminative
part+N+Sg+Trm
maja+N+Sg+Trm
part+N+Pl+Trm
majadeni: maja+N+Pl+Trm
+Ess: olev / essive
part+N+Sg+Ess
maja+N+Sg+Ess
part+N+Pl+Ess
majadena: maja+N+Pl+Ess
+Abe: ilmaütlev / abessive
part+N+Sg+Abe
maja+N+Sg+Abe
part+N+Pl+Abe
majadeta: maja+N+Pl+Abe
+Com: kaasaütlev / comitative
part+N+Sg+Com
maja+N+Sg+Com
part+N+Pl+Com
maja+N+Pl+Com
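The analysis strings listed above all follow the pattern `lemma+Tag+Tag…`. As a minimal sketch of how such a string could be taken apart, the Python below splits it on `+` and sorts the tags into part of speech, number and case. The names `Analysis` and `parse_analysis` are hypothetical, and grouping the tags into Python sets is an assumption of this example, not something the lexc source does.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Tag inventories copied from the lists above; grouping them into
# sets is an assumption made for this sketch.
POS_TAGS = {"N", "A", "Num", "Pron", "V", "Adv", "Interj", "CC", "CS", "Adp", "Pref"}
NUMBER_TAGS = {"Sg", "Pl"}
CASE_TAGS = {"Nom", "Gen", "Par", "Ill", "Ine", "Ela", "All", "Ade",
             "Abl", "Tra", "Trm", "Ess", "Abe", "Com"}


@dataclass
class Analysis:
    lemma: str
    pos: Optional[str] = None
    number: Optional[str] = None
    case: Optional[str] = None
    other: List[str] = field(default_factory=list)


def parse_analysis(text: str) -> Analysis:
    """Split 'lemma+Tag+Tag...' into the lemma and categorised tags."""
    lemma, *tags = text.strip().split("+")
    result = Analysis(lemma)
    for tag in tags:
        if tag in POS_TAGS and result.pos is None:
            result.pos = tag
        elif tag in NUMBER_TAGS and result.number is None:
            result.number = tag
        elif tag in CASE_TAGS and result.case is None:
            result.case = tag
        else:
            # Sub-POS tags (Prop, Card, ...), verb tags, usage tags etc.
            result.other.append(tag)
    return result


print(parse_analysis("maja+N+Pl+Par+Usage/Rare"))
```

Tags that the sketch does not classify, such as `Prop` or `Usage/Rare`, are kept in `other` in their original order, so no information from the analysis string is lost.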
Verbil on finiitsed (pöördelised) ja infiniitsed (nn käänduvad) vormid. Pöördelistel vormidel on kategooriad: tegumood, aeg, kõneviis, isik, arv, kõneliik (tegumood ja aeg on ka mõnedel infiniitsetel vormidel).
Verbs have finite (conjugated) and non-finite (“declinable”) forms. The finite forms have the categories voice, tense, mood, person, number and negation; some non-finite forms also have voice and tense.
+Impers: umbisikuline / impersonal
+Pers: isikuline / personal
+Prs: olevik / present
+Prt: minevik / past (i.e. not present)
Lihtminevik = minevik / Past imperfect = past
Täisminevik / Past perfect: olema (pres) + nud/tud/dud (olen teinud)
Enneminevik / Past pluperfect: olema (impf) + nud/tud/dud (olin teinud)
+Ind: kindel / indicative
+Cond: tingiv / conditional
+Imprt: käskiv / imperative / jussive
+Quot: kaudne / quotative
+Sg1: ainsuse 1. pööre / singular 1st person
+Sg2: ainsuse 2. pööre / singular 2nd person
+Sg3: ainsuse 3. pööre / singular 3rd person
+Pl1: mitmuse 1. pööre / plural 1st person
+Pl2: mitmuse 2. pööre / plural 2nd person
+Pl3: mitmuse 3. pööre / plural 3rd person
+Aff: jaatav kõne / affirmative
+Neg: eitav kõne / negative
+Sup: ma-tegevusnimi / supine (ma-infinitive)
+Inf: da-tegevusnimi / infinitive (da-infinitive)
+Ger: des-vorm / gerund (des-form)
+Prc: kesksõna / participle
Allikas / Source: Heiki-Jaan Kaalep. Eesti verbi vormistik. Keel ja Kirjandus 1/2015, lk 1-15.
The categories are given in the order in which the allomorphs that represent them (where they can be distinguished) attach to the word stem; note that the treatment of allomorphs here is only approximate. The justification is that the categories are not equal but form a hierarchy: those closer to the end of the word tend to be more optional and are more often left unspecified.
The forms of the negative words pole and ära are included so that all combinations are covered. Note also that for olema, some combinations of categories are shared by a word form beginning with ole- and one beginning with pole-.
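The ordering described above (voice, then tense, mood, person and number, and finally affirmative or negative) can be checked mechanically. The sketch below is one possible way to do that for finite verb analyses: it walks through the tags and lets later categories be left out, but rejects tags that appear out of order. The function name `finite_verb_categories` and the slot names are hypothetical and used only for illustration.

```python
from typing import Dict

# Category slots in the order used by the finite verb analyses in this
# document. The slot names are labels chosen for this sketch.
VERB_SLOTS = [
    ("voice", {"Pers", "Impers"}),
    ("tense", {"Prs", "Prt"}),
    ("mood", {"Ind", "Cond", "Imprt", "Quot"}),
    ("person", {"Sg1", "Sg2", "Sg3", "Pl1", "Pl2", "Pl3"}),
    ("polarity", {"Aff", "Neg"}),
]


def finite_verb_categories(analysis: str) -> Dict[str, str]:
    """Map each tag of a finite verb analysis to its slot.

    Raises ValueError if the analysis is not a verb, or if a tag is
    unknown or appears after a slot that should follow it.
    """
    lemma, *tags = analysis.split("+")
    if not tags or tags[0] != "V":
        raise ValueError(f"not a verb analysis: {analysis}")
    result: Dict[str, str] = {}
    slot_index = 0
    for tag in tags[1:]:
        if tag.startswith("Usage/"):
            continue  # usage markers are not grammatical categories
        # Skip slots that this form leaves unspecified.
        while slot_index < len(VERB_SLOTS) and tag not in VERB_SLOTS[slot_index][1]:
            slot_index += 1
        if slot_index == len(VERB_SLOTS):
            raise ValueError(f"unexpected or misplaced tag +{tag} in {analysis}")
        result[VERB_SLOTS[slot_index][0]] = tag
        slot_index += 1
    return result


print(finite_verb_categories("elama+V+Pers+Prs+Ind+Sg1+Aff"))
print(finite_verb_categories("elama+V+Pers+Prs+Cond"))
```

Non-finite analyses such as `elama+V+Pers+Sup+Ill` are outside the scope of this check and would be rejected by it.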
elama+V+Pers+Prs+Ind+Sg1+Aff
elama+V+Pers+Prs+Ind+Sg2+Aff
elama+V+Pers+Prs+Ind+Sg3+Aff
elama+V+Pers+Prs+Ind+Pl1+Aff
elama+V+Pers+Prs+Ind+Pl2+Aff
elama+V+Pers+Prs+Ind+Pl3+Aff
elama+V+Pers+Prs+Ind+Neg
olema+V+Pers+Prs+Ind+Neg
olema+V+Pers+Prs+Ind+Neg
NB! ei ole = pole
elama+V+Pers+Prs+Cond+Sg1+Aff
elama+V+Pers+Prs+Cond+Sg2+Aff
elama+V+Pers+Prs+Cond
elama+V+Pers+Prs+Cond+Pl1+Aff
elama+V+Pers+Prs+Cond+Pl2+Aff
elama+V+Pers+Prs+Cond+Pl3+Aff
olema+V+Pers+Prs+Cond+Sg1+Neg
olema+V+Pers+Prs+Cond+Sg2+Neg
olema+V+Pers+Prs+Cond+Neg
olema+V+Pers+Prs+Cond+Pl1+Neg
olema+V+Pers+Prs+Cond+Pl2+Neg
olema+V+Pers+Prs+Cond+Pl3+Neg
elama+V+Pers+Prs+Imprt+Sg2
elama+V+Pers+Prs+Imprt
elama+V+Pers+Prs+Imprt+Pl1
elama+V+Pers+Prs+Imprt+Pl2
ära+V+Pers+Prs+Imprt+Sg2+Neg
ära+V+Pers+Prs+Imprt+Neg
ära+V+Pers+Prs+Imprt+Pl1+Neg
ära+V+Pers+Prs+Imprt+Pl1+Neg+Usage/Rare
ära+V+Pers+Prs+Imprt+Pl2+Neg
elama+V+Pers+Prs+Quot
olema+V+Pers+Prs+Quot+Neg
elama+V+Pers+Prt+Ind+Sg1+Aff
elama+V+Pers+Prt+Ind+Sg2+Aff
elama+V+Pers+Prt+Ind+Sg3+Aff
elama+V+Pers+Prt+Ind+Pl1+Aff
elama+V+Pers+Prt+Ind+Pl2+Aff
elama+V+Pers+Prt+Ind+Pl3+Aff
elama+V+Pers+Prt+Ind+Neg
olema+V+Pers+Prt+Ind+Neg
olema+V+Pers+Prt+Ind+Neg
NB! ei olnud = polnud
elama+V+Pers+Prt+Cond+Sg1+Aff
elama+V+Pers+Prt+Cond+Sg2+Aff
elama+V+Pers+Prt+Cond
elama+V+Pers+Prt+Cond+Pl1+Aff
elama+V+Pers+Prt+Cond+Pl2+Aff
elama+V+Pers+Prt+Cond+Pl3+Aff
olema+V+Pers+Prt+Cond+Sg1+Neg
olema+V+Pers+Prt+Cond+Sg2+Neg
olema+V+Pers+Prt+Cond+Neg
olema+V+Pers+Prt+Cond+Pl1+Neg
olema+V+Pers+Prt+Cond+Pl2+Neg
olema+V+Pers+Prt+Cond+Pl3+Neg
elanud: elama+V+Pers+Prt+Imprt: 1.11.2016 ei genereerita ega tunta ära / neither recognized nor generated as of Nov 1, 2016
ära+V+Pers+Prt+Imprt+Neg
elama+V+Pers+Prt+Quot
olema+V+Pers+Prt+Quot+Neg
elama+V+Pers+Prs+Prc
elama+V+Pers+Prt+Prc
(on, oli, …) + V+Pers+Prt+Prc = some analytical personal form
elama+V+Pers+Sup+Ill
elama+V+Pers+Sup+Ine
elama+V+Pers+Sup+Ela
elama+V+Pers+Sup+Tra
elama+V+Pers+Sup+Abe
elama+V+Impers+Prs+Ind+Aff
elata: elama+V+Impers+Prs+Ind+Neg
olema+V+Impers+Prs+Ind+Neg
olema+V+Impers+Prs+Ind+Neg
NB! ei olda = polda
elama+V+Impers+Prs+Cond
olema+V+Impers+Prs+Cond+Neg
elama+V+Impers+Prs+Imprt
ära+V+Impers+Prs+Imprt+Neg
elama+V+Impers+Prs+Quot
olema+V+Impers+Prs+Quot+Neg
elama+V+Impers+Prt+Ind+Aff
elatud: elama+V+Impers+Prt+Ind+Neg
olema+V+Impers+Prt+Ind+Neg
olema+V+Impers+Prt+Ind+Neg
NB! ei oldud = poldud
elama+V+Impers+Prt+Cond
olema+V+Impers+Prt+Cond+Neg
elama+V+Impers+Prs+Prc
elama+V+Impers+Prt+Prc
(on, oli, …) + V+Impers+Prt+Prc = some analytical personal form
elama+V+Impers+Sup
elama+V+Inf
elama+V+Ger
Personal present (Prs not implemented?), three words: kuulukse, tunnukse, näikse
kuulukse+V
ei+V+Neg
Analytical forms (olen elanud, olin elanud, oleksin elanud, ei olnud elanud, ei olnuks elanud etc) are not treated here…
+Foc/gi
pulk+N+Sg+Nom+Foc/gi
+Emph: long inflectional forms of personal pronouns mina, sina, tema, meie, teie, nemad
tema+Pron+Sg+All+Emph
+Usage/Rare: norm, but rare
puusid: puu+N+Pl+Par+Usage/Rare
+Usage/Hyp: norm, but so rare that the norm is probably wrong
tiivasse: tiib+N+Sg+Ill+Usage/Hyp
+Usage/NotNorm: not norm, but sometimes used
pöidlates: pöial+N+Pl+Ine+Usage/NotNorm
+Usage/CommonNotNorm: not norm, and used more than the norm
köömen+N+Sg+Par+Usage/CommonNotNorm
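A consumer of the analyser output might want to separate normative forms from non-normative ones using the usage tags above. The following is a minimal sketch of such a filter. The helper names `usage_tag`, `is_normative` and `normative_only` are hypothetical, and treating `Usage/Rare` and `Usage/Hyp` as normative simply follows the "norm, but …" wording of their descriptions.

```python
from typing import List, Optional

# Usage tags described as "not norm" in the list above.
NOT_NORM_USAGE = {"Usage/NotNorm", "Usage/CommonNotNorm"}


def usage_tag(analysis: str) -> Optional[str]:
    """Return the first Usage/... tag of an analysis, or None."""
    for tag in analysis.split("+")[1:]:
        if tag.startswith("Usage/"):
            return tag
    return None


def is_normative(analysis: str) -> bool:
    """True unless the analysis carries a 'not norm' usage tag."""
    return usage_tag(analysis) not in NOT_NORM_USAGE


def normative_only(analyses: List[str]) -> List[str]:
    """Keep only the analyses that are normative, preserving order."""
    return [a for a in analyses if is_normative(a)]


examples = [
    "puu+N+Pl+Par+Usage/Rare",
    "tiib+N+Sg+Ill+Usage/Hyp",
    "pöial+N+Pl+Ine+Usage/NotNorm",
    "köömen+N+Sg+Par+Usage/CommonNotNorm",
]
print(normative_only(examples))
```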
Oletatav analüüs / Guessed analysis
+Guess: guessed analysis
+Pref
taas+Pref#riigistama+V+Pers+Sup+Ill
re+Pref#investeerima+V+Pers+Sup+Ill
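In the two prefix examples above, `#` separates the prefix from the word it attaches to, and each side is itself a `lemma+Tag…` string. Assuming that reading of `#`, a small sketch for splitting such an analysis into its parts could look like this; `split_parts` is a hypothetical helper name.

```python
from typing import List, Tuple


def split_parts(analysis: str) -> List[Tuple[str, List[str]]]:
    """Split an analysis at '#' boundaries into (lemma, tags) pairs."""
    parts = []
    for chunk in analysis.split("#"):
        lemma, *tags = chunk.split("+")
        parts.append((lemma, tags))
    return parts


print(split_parts("taas+Pref#riigistama+V+Pers+Sup+Ill"))
```

An analysis without `#` comes back as a single pair, so the same helper can be applied to simplex words.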
V –> N
+Der/mine
jahumine: jahuma+V+Der/mine+N+Sg+Nom
+Der/ja
jahuja: jahuma+V+Der/ja+N+Sg+Nom
+Der/nu
elanu: elama+V+Der/nu+N+Sg+Nom
+Der/mus
küllastumus: küllastuma+V+Der/mus+N+Sg+Nom
+Der/vus
elavus: elama+V+Der/vus+N+Sg+Nom
+Der/ng
devalveering: devalveerima+V+Der/ng+N+Sg+Nom
+Der/is
arveldama+V+Der/is+N+Sg+Nom
V –> A
+Der/v
jahuv: jahuma+V+Der/v+A+Sg+Nom
+Der/tav
öeldav: ütlema+V+Der/tav+A+Sg+Nom
+Der/nud
elanud: elama+V+Der/nud+A+Sg+Nom
+Der/mata
elamata: elama+V+Der/mata+A
+Der/matu
segamatu: segama+V+Der/matu+A+Sg+Nom
+Der/tamatu
mõeldamatu: mõtlema+V+Der/tamatu+A+Sg+Nom
+Der/tu
elatu: elama+V+Der/tu+A+Sg+Nom
+Der/tud
elama+V+Der/tud+A
N –> A
+Der/lik
kotkalik: kotkas+N+Der/lik+A+Sg+Nom
+Der/line
põõsas+N+Der/line+A+Sg+Nom
onomastika+N+Der/line+A+Sg+Nom
apooriline: apooria+N+Der/line+A+Sg+Nom
+Der/ne
A –> Adv
+Der/lt
roosalt: roosa+A+Der/lt+Adv
+Der/sti
valusasti: valus+A+Der/sti+Adv
+Der/ini
parem+A+Comp+Der/ini+Adv
A –> A
+Der/m
valusam: valus+A+Der/m+A+Comp+Sg+Nom
+Der/im
valus+A+Der/im+A+Superl+Sg+Nom
N –> N
+Der/nna
õmblejanna: õmbleja+N+Der/nna+N+Sg+Nom
+Der/kond
vaatajaskond: vaatama+V+Der/ja+N+Der/kond+N+Sg+Nom
+Der/ist
kapitalism+N+Der/ist+N+Sg+Nom
N Prop –> N
+Der/lane
helveetslane: Helveetsia+N+Prop+Der/lane+N+Sg+Nom
Firenze+N+Prop+Der/lane+N+Sg+Nom
A –> N, V –> N
+Der/us
porine+A+Der/us+N+Sg+Nom
õpetama+V+Der/us+N+Sg+Nom
N –> Adv
+Der/ti
laud+N+Der/ti+Adv
N –> N, A –> A
+Dim/ke: diminutive
põõsake: põõsas+N+Dim/ke+Sg+Nom
+Der/minus: shortening stem
vaatamine+N+Der/minus
astraalne+A+Der/minus
Num –> N
+Der/kas
kolm+Num+Card+Der/kas+N+Sg+Nom
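In the derivation examples above, a `Der/…` tag is usually followed by the part of speech of the derived word, and derivations can be chained (as in vaatajaskond). The sketch below recovers that chain from an analysis string. The function name `derivation_steps` is hypothetical, and handling `Dim/…` like `Der/…` but without a following part of speech is an assumption based on the põõsake example.

```python
from typing import List, Optional, Tuple

POS_TAGS = {"N", "A", "Num", "Pron", "V", "Adv", "Interj", "CC", "CS", "Adp", "Pref"}

Step = Tuple[str, Optional[str]]


def derivation_steps(analysis: str) -> Tuple[str, Optional[str], List[Step], List[str]]:
    """Return (lemma, base POS, derivation steps, remaining tags).

    Each step is (derivation tag, resulting POS); the POS is None when
    the derivation tag is not followed by a part-of-speech tag.
    """
    lemma, *tags = analysis.split("+")
    base_pos = tags[0] if tags and tags[0] in POS_TAGS else None
    steps: List[Step] = []
    rest: List[str] = []
    i = 1 if base_pos else 0
    while i < len(tags):
        tag = tags[i]
        if tag.startswith(("Der/", "Dim/")):
            new_pos = None
            if i + 1 < len(tags) and tags[i + 1] in POS_TAGS:
                new_pos = tags[i + 1]
                i += 1  # consume the POS tag that belongs to this step
            steps.append((tag, new_pos))
        else:
            rest.append(tag)
        i += 1
    return lemma, base_pos, steps, rest


print(derivation_steps("vaatama+V+Der/ja+N+Der/kond+N+Sg+Nom"))
```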
Copied from Sami root.lexc
+Use/Circ: for Arabic and Roman numerals
+Use/PMatch: for tokenisation with pmatch
+Use/-PMatch: for tokenisation with pmatch
+ABBR: Lühend / Abbreviation
+ACR: Suurtähtlühend / Acronym
+CLB: Osalause piir (punkt, koma) / Clause border (full stop, comma..)
+PUNCT: Kirjavahemärk / Punctuation
+LEFT: Kirjavahemärgi asetus / Punctuation orientation
+RIGHT: Kirjavahemärgi asetus / Punctuation orientation
+Err/Orth: Ortograafiaviga (genereeritud failide sümbol) / orthography error (symbol in generated files)
Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.
| @U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj. |
| @U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj. |
These are used for limiting the number of components in a compound word (the compound transducer is cyclic, but in reality there is a limit to the length of words).
| @D.Part@ | No part of a compound should have been encountered yet |
| @P.Part.One@ | Indicate that this could be the first part of a compound |
| @R.Part.One@ | Require that the first part has been encountered; if a lemma has it, it means that the lemma cannot be part2 or part3 of a compound |
| @D.Part.One@ | Require that the first part has NOT been encountered |
| @P.Part.Two@ | Indicate that this could be the second part of a compound |
| @R.Part.Two@ | Require that the second part has been encountered |
| @P.Part.Three@ | Indicate that this could be the third part of a compound |
| @P.Part.Bad@ | Indicate that this cannot be a part of a compound |
| @R.Part.Bad@ | Require that the first part has been encountered; if a lemma has it, it means that the lemma cannot be part2 |
POS is used for filtering derivations and compounds.
| @R.POS.NumCardCompound@ | compound numeral (viis+sada - five hundred) |
Case is used for filtering derivations and compounds
Remember that there has been some derivation from A or N; used for filtering compounds. Derivation from V is called paradigmatic and does not result in Der, just a new POS…
Remember the stem type; for filtering compounds
A special condition that is used for filtering derivations and compounds
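To make the effect of these flags concrete, here is a small sketch that evaluates the flag diacritics along a single path, following the usual semantics of the P (set), R (require), D (disallow), U (unify) and C (clear) operations in finite-state toolkits. It is a simplified illustration only: the N (negative set) operation is left out, the function name `path_is_allowed` is hypothetical, and the example paths are made up from the flags listed above rather than taken from the actual lexicon.

```python
import re
from typing import Dict, Optional

# Matches @OP.Feature@ and @OP.Feature.Value@ for the operations handled here.
FLAG_RE = re.compile(r"@([PRDUC])\.([^.@]+)(?:\.([^@]+))?@")


def path_is_allowed(path: str) -> bool:
    """Evaluate the flag diacritics of one path from left to right."""
    features: Dict[str, str] = {}
    for op, feat, raw_value in FLAG_RE.findall(path):
        value: Optional[str] = raw_value or None  # "" means no value was given
        current = features.get(feat)
        if op == "P":        # set the feature to the value
            if value is not None:
                features[feat] = value
        elif op == "C":      # clear the feature
            features.pop(feat, None)
        elif op == "R":      # require the feature (with this value, if given)
            if current is None or (value is not None and current != value):
                return False
        elif op == "D":      # disallow the feature (with this value, if given)
            if current is not None and (value is None or current == value):
                return False
        elif op == "U":      # unify: set if unset, otherwise values must agree
            if current is None:
                if value is not None:
                    features[feat] = value
            elif current != value:
                return False
    return True


# Hypothetical paths: a first part followed by something requiring it ...
print(path_is_allowed("@D.Part@@P.Part.One@maja@R.Part.One@"))
# ... and a path where @D.Part@ is met after a part has already been seen.
print(path_is_allowed("@P.Part.One@maja@D.Part@uks"))
```

In this model a path is rejected as soon as one R, D or U check fails, which is how the Part flags can stop a cyclic compound transducer from accepting arbitrarily many parts.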
Tokeniser
Guesser
Sami GT convention
> (escaped with square brackets, to avoid collision with > as morpheme boundary)
< (escaped with square brackets, to avoid collision with < as morpheme boundary)
Legitimate strings that are not words: numbers, acronyms, …
LEXICON Root
is the starting point of everything.
For modelling compounds, the simplex-word FST is concatenated with itself using the Kleene star operation, i.e. the FST is concatenated zero or more times. For the lookup process, this makes infinitely many passes through the FST possible, and thus infinitely long words. Flag diacritics are used to limit and control the passes. The lookup process remembers which paths it has taken and counts the passes; to remember, it sets flags on the path:
Lexicon-based passes
strictly simplex word; cannot be a part of a compound
a simplex word, or the first part of a compound
Guesser assumes that there is only one pass, and that only the final part is important (out-of-vocabulary simplex words are treated elsewhere)
— end guesser
lexicon-based
strictly simplex words; cannot be a part of a compound
` AdverbsLast ; ` an adverb that may be either a simplex word, or the second part of a compound
` SymbolStrings ; ` .ee .com -ending strings
LEXICON First
a simplex word or the first part of a compound
` @P.Part.Two@ StartCompoundException ; ` samasuur, samakõrgusjoon, eneseabi etc
LEXICON FirstOpenClass
a simplex word or the first part of a compound
` @P.POS.Pref@ Prefixes ; `
LEXICON FirstClosedClass
a simplex word or the first part of a compound
` @P.POS.Pron@ Pronouns ; `
LEXICON FirstWordLike
a word-like string that may (or must) be the first part of a compound
` @P.POS.ACRMinus@@P.NeedPart.Two@ Acrominus ; `
LEXICON Latter
the latter part of a compound
` @R.POS.N@@R.Case.Gen@@R.Part.Two@ StartCompoundException ; ` noun + samasuur, samakõrgusjoon, eneseabi etc
` @P.POS.A@@C.Der@@C.Stem@ Adjectives ; `
` @P.POS.A@@C.Der@@C.Stem@ Adjectives_ne ; ` järguline järk+N+Der/line+A redellik redel+N+Der/lik+A NOT -autone
` @P.POS.A@@C.Der@@C.Stem@ Adjectives_v ; `
` @P.POS.A@@C.Der@@C.Stem@ PlainAdjectives ; `
` @P.POS.AComp@@C.Der@@C.Stem@ SuperlativeAdjectives ; `
` @P.Case.Gen@ LatterVerb ; `
This (part of) documentation was generated from src/fst/morphology/root.lexc