Finite state and Constraint Grammar based analysers, proofing tools and other resources
View the project on GitHub giellalt/lang-fit
Usage:
cat text.txt | hfst-tokenize -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst | vislcg3 -g src/cg3/disambiguator.cg3
This file documents the Meänkieli disambiguator file.
Sentence delimiters are the following: “<.>” “<…>” “<!>” “<?>” “<¶>”
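As a rough illustration of what the delimiters do, the following Python sketch cuts a stream of CG-style word forms into windows that each end at one of the delimiters listed above. The function name split_windows and the sample stream are assumptions made for this example; vislcg3 does its own windowing internally, and this is not its implementation.

```python
# Delimiter word forms copied from the list in this documentation.
DELIMITERS = {"<.>", "<…>", "<!>", "<?>", "<¶>"}


def split_windows(wordforms):
    """Group word forms into windows; each window ends at a delimiter."""
    windows, current = [], []
    for wf in wordforms:
        current.append(wf)
        if wf in DELIMITERS:
            windows.append(current)
            current = []
    if current:  # trailing material without a closing delimiter
        windows.append(current)
    return windows


# Hypothetical input stream, only for illustration.
stream = ["<Mutta>", "<se>", "<tuli>", "<.>", "<Anna>", "<se>", "<!>"]
windows = split_windows(stream)
```

Rules such as PropInit, which refer to the beginning of a sentence, are evaluated relative to such a window.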
Interj = interjection
Indef = indefinite pronoun
Neg = negation verb
COMMA = comma
WORD = all PoS
NPMODADV = NPMOD plus adverb
NOT-NPMOD = these cannot modify a noun
NOT-NPMODADV = these cannot modify a noun and are not adverbs
Boundaries
Verbs
person_test selects finite verb if there is a Pron Pers to the left
adv_after_V selects adverb if there is a verb to the right
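The effect of a SELECT rule such as person_test can be modelled in a few lines of Python. This is a minimal sketch, not the rule from src/cg3/disambiguator.cg3: the data layout, the helper names, the tag combination used for "finite verb" (V plus Ind) and the sample cohorts are all assumptions made for the example.

```python
def select(cohorts, i, wanted, condition):
    """Keep only the readings at position i that contain all `wanted` tags,
    provided the context condition holds and at least one reading matches."""
    readings = cohorts[i]["readings"]
    matching = [r for r in readings if wanted <= r]
    if matching and condition(cohorts, i):
        cohorts[i]["readings"] = matching


def pers_pron_to_left(cohorts, i):
    """True if any cohort to the left has a reading tagged Pron and Pers."""
    return any({"Pron", "Pers"} <= r for c in cohorts[:i] for r in c["readings"])


# Hypothetical two-word window: a personal pronoun followed by ambiguous "tuli".
cohorts = [
    {"form": "<pron>", "readings": [{"Pron", "Pers", "Sg3", "Nom"}]},
    {"form": "<tuli>", "readings": [{"N", "Sg", "Nom"}, {"V", "Ind", "Prt", "Sg3"}]},
]
select(cohorts, 1, {"V", "Ind"}, pers_pron_to_left)
```

In this model, adv_after_V would be the same call with an adverb target and a condition that looks for a verb to the right instead.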
prop_infrontof_kieli removes the proper noun reading in front of kieli if the word can be something else, e.g. Kainun kieli
Rule: PropInit removes the proper noun reading at the beginning of a sentence if the word can be a CC or a Pr (e.g. Mutta)
Rule: PropNotInit selects the proper noun reading if the word is not at the beginning of a sentence
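The interplay of the two sentence-position rules can be sketched as follows. This is a toy model under stated assumptions: a sentence is a list of reading lists, each reading is a set of tags, and the helper names are invented for the example. It also models the general CG behaviour that a REMOVE rule never deletes the last remaining reading of a word.

```python
def remove(readings, unwanted):
    """Drop readings containing all `unwanted` tags, but never the last one."""
    kept = [r for r in readings if not unwanted <= r]
    return kept if kept else readings


def prop_rules(sentence):
    """Apply PropInit-like logic to word 0 and PropNotInit-like logic elsewhere."""
    first = sentence[0]
    # PropInit: sentence-initial word that can also be a CC or a Pr.
    if any("CC" in r or "Pr" in r for r in first):
        sentence[0] = remove(first, {"Prop"})
    # PropNotInit: prefer the proper noun reading in non-initial position.
    for i in range(1, len(sentence)):
        props = [r for r in sentence[i] if "Prop" in r]
        if props:
            sentence[i] = props
    return sentence


# Hypothetical analyses for a two-word window such as "Mutta Anna".
sentence = [
    [{"N", "Prop"}, {"CC"}],
    [{"N", "Prop"}, {"V", "Imprt"}],
]
result = prop_rules(sentence)
```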
Possessive suffixes
First we put rules to choose Px forms… (forthcoming)
Then we remove the remaining Px
Numeral phrases
Rule: Prifgenpar selects preposition to the left of Gen or Par
Rule: Poifgenpar selects postposition to the right of Gen or Par
Rule: selects vasthaan rather than vasta if the word at position -1 is Par
Rule: CVP maps @CVP to CS and mutta
Rule: CNPifN maps @CNP to CC between two N
Rule: CNPifInf maps @CNP to CC between two Inf
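The two CNP mapping rules can be pictured with the toy model below. It is only a sketch: the data layout (a list of reading lists, each reading a set of tags) and the function names are assumptions, and the real rules may carry additional conditions.

```python
def has(cohort, tag):
    """True if any reading of the cohort carries the tag."""
    return any(tag in reading for reading in cohort)


def map_cnp(sentence):
    """Add @CNP to CC readings that stand between two N or between two Inf."""
    for i in range(1, len(sentence) - 1):
        left, here, right = sentence[i - 1], sentence[i], sentence[i + 1]
        for pos in ("N", "Inf"):
            if has(left, pos) and has(right, pos):
                for reading in here:
                    if "CC" in reading:
                        reading.add("@CNP")
    return sentence


# Hypothetical windows: noun CC noun, and noun CC finite verb.
between_nouns = map_cnp([[{"N", "Sg", "Nom"}], [{"CC"}], [{"N", "Sg", "Nom"}]])
before_verb = map_cnp([[{"N", "Sg", "Nom"}], [{"CC"}], [{"V", "Ind", "Sg3"}]])
```

Only the first window receives the @CNP tag; in the second, the conjunction is left for other rules.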
Genitive
ei negation verb
eli
Conjunctions
että
jos
ko
mutta
sillä
Imperative
Relative pronouns
Rule: Pl3ollaifplrelpronandplinterrpron selects Pl3 if olla
Rule: Sg3ollaifplrelpronandplinterrpron selects Sg3 if olla
Rule: Sg3ollainpretandperf selects Sg3 if COPULAS
Rule: Relpronandnotintterpron selects Rel Sg if Interr
Rule: interrpron selects Interr if there is a ? at the end of the sentence
Rule: DifferenceBetweenNiitäImprtAndNiitäDemAndPersIfSubj selects Pron Dem Pl or Pron Pers Pl3 when finite verb to the right
Rule: paljonadvandnotpaljonoun selects Adv if paljon
Rule: Relpronifitsanounoracommabeforeit selects Rel Pl if N to the left
Rule: annaimperativeandnotannaname removes Prop if Anna se
Rule: tulinounfromtuliprtsg3 selects V Sg
Rule: dempronandnotpronpers selects Dem if A or N to the right
Rule: Imperativefromconneg selects and removes ConNeg
Rule: ImperativeafterNeg removes Imprt if pronoun
Rule: interrel selects Interr or Rel if CS to the right
Rule: +FMAINV to the remaining finite verbs which are not AUX
Rule: @<ADVLcoor (@<ADVL) for ADVLCASEAdv if @CNP to the left and ADVL to the left of it
Rule: X maps X everywhere
Rule: REMOVE X removes X whenever there is any other tag.
WORDLEMMA = regex giving the lemma in question
Rule: errorth removes Err/Orth if there is an analysis without Err/Orth with the same lemma
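The errorth behaviour described above can be modelled as a small filter. This is a minimal sketch, assuming each reading is a (lemma, tags) pair; the function name and the sample readings are invented for the example and are not taken from the grammar.

```python
def remove_err_orth(readings):
    """Drop Err/Orth readings whose lemma also has a reading without Err/Orth."""
    clean_lemmas = {lemma for lemma, tags in readings if "Err/Orth" not in tags}
    return [
        (lemma, tags)
        for lemma, tags in readings
        if not ("Err/Orth" in tags and lemma in clean_lemmas)
    ]


# Hypothetical readings of one word form.
readings = [
    ("kieli", frozenset({"N", "Sg", "Nom"})),
    ("kieli", frozenset({"N", "Sg", "Nom", "Err/Orth"})),
    ("other", frozenset({"N", "Sg", "Nom", "Err/Orth"})),
]
filtered = remove_err_orth(readings)
```

The Err/Orth reading of the second lemma survives because that lemma has no error-free analysis to fall back on.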
This (part of) documentation was generated from src/cg3/disambiguator.cg3