Finite state and Constraint Grammar based analysers, proofing tools and other resources
View the project on GitHub giellalt/lang-fit
All doc-comment documentation in one large file.
This dep file is for sma, sme, smj, sje.
Sentence delimiters are the following: <.> <!> <?> <…> <¶>
N V A Adv CC CS Inf Sup Neg Num Po Pr
Pcle Prop
Pron IV TV COMMA DASH CITATION to keep colouring we add a “ HYPHEN QMARK PUNCT LEFT RIGHT CLB Ind Pot Impr ImprtII Cond ConNeg Caus causative eus VGen Interj ABBR ACR Prs Prt Cmpnd RCmpnd PrfPrc PrsPrc Actor Actio Ger Indef Nom Acc Ill Com Gen Ess
IM For fao
Correction rules
muitalit
XX
XX
XX
faoSumId=Rel
lgRemove removes the language tags
This (part of) documentation was generated from src/cg3/dependency.cg3
Usage:
cat text.txt | hfst-tokenize -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst | vislcg3 -g src/cg3/disambiguator.cg3
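The same two-stage pipeline can be driven from a script. The following is a minimal sketch, assuming `hfst-tokenize` and `vislcg3` are on the `PATH`; the helper names `build_pipeline`, `as_shell` and `run_pipeline` are hypothetical and not part of this repository.

```python
import shlex
import subprocess


def build_pipeline(tokeniser, grammar):
    """Return the two pipeline stages as argument lists (hypothetical helper)."""
    return [
        ["hfst-tokenize", "-cg", tokeniser],  # tokenise and analyse, CG output
        ["vislcg3", "-g", grammar],           # apply the disambiguation grammar
    ]


def as_shell(commands):
    """Render the stages as a single shell pipeline string."""
    return " | ".join(shlex.join(argv) for argv in commands)


def run_pipeline(text, commands):
    """Feed text through each stage in turn and return the final output."""
    data = text.encode("utf-8")
    for argv in commands:
        result = subprocess.run(argv, input=data, stdout=subprocess.PIPE, check=True)
        data = result.stdout
    return data.decode("utf-8")


pipeline = build_pipeline(
    "tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst",
    "src/cg3/disambiguator.cg3",
)
print(as_shell(pipeline))
```

`run_pipeline` only works in a checkout where both files have been built; `as_shell` can be used anywhere to inspect the command line.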
This file documents the Meänkieli disambiguator file.
Sentence delimiters are the following: “<.>” “<…>” “<!>” “<?>” “<¶>”
Interj = interjection
Indef = Indef pron
Neg = Negation verb
COMMA = comma
WORD = all PoS
NPMODADV = NPMOD plus adverb
NOT-NPMOD = these cannot modify a noun
NOT-NPMODADV = these cannot modify a noun, and is not adverb
Boundaries
Verbs
person_test selects finite verb if there is a Pron Pers to the left
adv_after_V selects adverb if there is a verb to the right
prop_infrontof_kieli removes propernoun in front of kieli, if it can be something else, e.g. Kainun kieli
Rule: PropInit removes propernoun in the beginning of a sentence if it can be a CC or a Pr (e.g. Mutta)
Rule: PropNotInit selects propernoun if it is not in the beginning of a sentence
Possessive suffixes
First we put rules to choose Px forms… (forthcoming)
Then we remove the remaining Px
Numeral phrases
Rule: PropNotInit selects propernoun if it is not in the beginning of a sentence
Rule: Prifgenpar selects preposition to the left of Gen or Par
Rule: Poifgenpar selects postposition to the right of Gen or Par
Rule: vasthaan not vasta if -1 Par
Rule: CVP maps @CVP to CS and mutta
Rule: CNPifN maps @CNP to CC between two N
Rule: CNPifInf maps @CNP to CC between two Inf
Genitive
ei negation verb
eli
Conjunctions
että
jos
ko
mutta
sillä
Imperative
Relative pronouns
Rule: Pl3ollaifplrelpronandplinterrpron selects Pl3 if olla
Rule: Sg3ollaifplrelpronandplinterrpron selects Sg3 if olla
Rule: Sg3ollainpretandperf selects Sg3 if COPULAS
Rule: Sg3ollainpretandperf selects Sg3 if COPULAS
Rule: Relpronandnotintterpron selects Rel Sg if Interr
Rule: Relpronandnotintterpron selects Rel Sg if Interr
Rule: interrpron selects Interr if ? in the end
Rule: DifferenceBetweenNiitäImprtAndNiitäDemAndPersIfSubj selects Pron Dem Pl or Pron Pers Pl3 when finite verb to the right
Rule: paljonadvandnotpaljonoun selects Adv if paljon
Rule: Relpronifitsanounoracommabeforeit selects Rel Pl if N to the left
Rule: annaimperativeandnotannaname removes Prop if Anna se
Rule: tulinounfromtuliprtsg3 selects V Sg
Rule: dempronandnotpronpers selects Dem if A or N to the right
Rule: Imperativefromconneg selects and removes ConNeg
Rule: ImperativeafterNeg removes Imprt if pronoun
Rule: interrel selects Interr of Rel if CS to the right
Rule: +FMAINV to the remaining finite verbs which are not AUX
Rule: @<ADVLcoor (@<ADVL) for ADVLCASEAdv if @CNP to the left and ADVL to the left of it
Rule: X maps X everywhere
Rule: REMOVE X removes X whenever there is any other tag.
WORDLEMMA = regex giving the lemma in question
Rule: errorth removes Err/Orth if there is an analysis without Err/Orth with the same lemma
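The effect of the errorth rule can be illustrated outside CG3. This is a minimal sketch, assuming a reading is represented as a dictionary with a `lemma` and a set of `tags`; that representation and the function name are hypothetical, and the actual rule is written in CG3, not Python.

```python
def remove_err_orth(readings):
    """Drop Err/Orth readings when a clean reading with the same lemma exists."""
    # Lemmas that have at least one reading without the Err/Orth tag.
    clean_lemmas = {r["lemma"] for r in readings if "Err/Orth" not in r["tags"]}
    return [
        r
        for r in readings
        if not ("Err/Orth" in r["tags"] and r["lemma"] in clean_lemmas)
    ]


# Hypothetical cohort: two readings of the same lemma, one flagged as an error.
cohort = [
    {"lemma": "vanha", "tags": {"A", "Sg", "Par", "Err/Orth"}},
    {"lemma": "vanha", "tags": {"A", "Sg", "Par"}},
]
print(remove_err_orth(cohort))
```

A reading that carries Err/Orth and has no clean competitor with the same lemma is left in place, since removing it would leave the word without that analysis.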
This (part of) documentation was generated from src/cg3/disambiguator.cg3
SYNTACTIC FUNCTIONS FOR SÁMI
Sámi language technology project 2003-2018, University of Tromsø
This file adds syntactic functions. It is common for all the Saami
LEFT RIGHT because of apertium
Sets for POS sub-categories
Sets for Semantic tags
Sets for Morphosyntactic properties
Sets for verbs
V is all readings with a V tag in them, REAL-V should be the ones without an N tag following the V. The REAL-V set thus awaits a fix to the preprocess V … N bug.
The set COPULAS is for predicative constructions
NP sets defined according to their morphosyntactic features
The PRE-NP-HEAD family of sets
These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression WORD - premodifiers.
The set NOT-NPMOD is used to find barriers between NPs. Typical usage: … (*1 N BARRIER NOT-NPMOD) … meaning: Scan to the first noun, ignoring anything that can be part of the noun phrase of that noun (i.e., “scan to the next NP head”).
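The scanning behaviour of a test such as `(*1 N BARRIER NOT-NPMOD)` can be sketched as follows. This is an illustration only: cohorts are reduced to sets of tags, the premodifier set is a hypothetical stand-in for the real PRE-NP-HEAD sets, and the sketch assumes the target is checked before the barrier on each cohort.

```python
def scan_right(cohorts, start, is_target, is_barrier):
    """Scan rightwards from start + 1; return the index of the first target.

    Returns None if a barrier cohort is met first or the window ends.
    """
    for i in range(start + 1, len(cohorts)):
        if is_target(cohorts[i]):
            return i
        if is_barrier(cohorts[i]):
            return None
    return None


# Hypothetical stand-in for "whatever can occur in front of the NP head".
PREMODIFIERS = {"A", "Num"}


def is_noun(tags):
    return "N" in tags


def is_not_npmod(tags):
    # WORD - premodifiers: anything that cannot be part of the NP.
    return not (tags & PREMODIFIERS)


sentence = [{"V"}, {"A"}, {"Num"}, {"N"}]
print(scan_right(sentence, 0, is_noun, is_not_npmod))
```

In the first sentence the adjective and numeral are skipped and the noun is reached; if a verb stood between the start position and the noun, the scan would stop without a match.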
Miscellaneous sets
Border sets and their complements
ADLVCASE
These were the set types.
hab1 hab aux leat
hab_numo1 hab copula comma comma N+Nom
hab_numo2 copula nu mo/go hab
leahab copula nu mo/go hab
hab2 hab auxv adv leat
hab3 (
hab3 (
hab3 (
hab3 (
hab_main (
habInf hab lea inf
habNomLeft Nom or Num + gen hab lea
habAdvl Ii han ovttasge du sogas leat dat namma.
hab4 hab cc hab leat
hab6 lea go hab – leago hab
hab7 lea go hab
hab5 This is not HAB Mánás gollot gieđat.
hab9 prop ord-hab leat
hab10 prop ord-hab leat
habDain2
habRel # before relative clause
habEllipse Buot gánddain lea dreassa, nieiddain fas gákti.
habGen (
habGenQst (
n<titel1 (@N<) for (“jr”) or (“sr”); if first one to the left is Prop
n<titel2 (@N<) for INITIAL; if first one to the left is a noun, or if to the left of you is a single letter which is part of a noun conjunction bustávas e ja f gáibiduvvo
n<:com (@N<) for (Sg Com); if first one to the left is Coll
>nAttr (@>N) for Attr; if there is a noun to your right
n>Indef (Pron Indef Attr); if eará is to the right
n>Indef (Pron Indef Com); if eará is to the right
>nNum (@>N) for numerals if; there is a noun to your right. You are not allowed to be (Sg Nom), (Sg Acc) or (Sem/Date)
noun>n (@>N) for Gen; if there is a noun to your right. Restrictions: Not if you are: a time related word. Not if you are OKTA with Pl Loc to your right. Not if CC is to your right followed by another Gen and then Po. Not if you are HUMAN and to your right is Actio Nom followed by a noun.
>nTime (@>N) for Gen TIME-N; if timenoun to your right. Restrictions: Not if you are an OKTA Nom with Pl Loc to your right. Not if CC followed by Gen, followed by Po to your right. Not if COMMA to your right
>ntittel (@>N) for (Sg Nom TIME-N) or (Nom Der/NomAg); if to your right is Sem/Mal, Sem/Fem, Sem/Sur
>nplc (@>N) for (Sg Nom Prop Sem/Plc), if to your right is Sem/Plc
>nALU (@>N) for Sg Acc numerals; when a measure-noun to the right
>NTime (@>N) for Gen; if you are TIME-N with BOC to your left, and PREGEN to your right
n<:Refl (@N<) for (Refl Nom); if to the left is (N Nom), or if first one to the left is a finite mainverb with a (N Nom) to the left
>pron1 (@>Pron) for GRADE-ADV, DUSSE, BUOT if; first one to the right is Pron
>pron2 (@>Pron) for (Refl Nom) if; first one to the right is Refl
>pron3 (@>Pron) for (Pron Recipr) if; first one to the right is (Pron Recipr)
vaikko (@>Pron) for vaikko if; first one to the right is Indef
vaikkoman (@>ADVL) for vaikko if; first one to the right is man
dasmaŋŋel (@>ADVL) for vaikko if; first one to the right is man
adv>advl (@>ADVL)
adv>advl (@>ADVL)
BOSvoc (@VOC) for HUMAN Nom; if sentence initial. To the right is comma. No nom-cased HUMAN followed by comma or CC is allowed to the right. There should not be a relative clause to the right, because then you are likely to be SUBJ
voc (@VOC) for Nom HUMAN; if comma to the left and a second person verb or pronoun to the left. To the right is the end of the sentence
Particle<subj (@PCLE)
spred<obj (@SPRED<OBJ) for Acc; the object of an SPRED. Not to be mistaken with OPRED. If SPRED is to the left, and copulas is to the left of it. Nom or Hab are found sentence initially.
Hab<subj (
Hab<subj (
Hab>Advlcase<subj (
Nom>Advlcase<subj (
<extSubj (
<extSubj (
<extSubjA (
<extSubj (
<extSubj (
loc<extSubj (
<spred (@<SPRED) for Nom; if Nom to the left, copulas to the left of Nom, and a time related word to the left of it.
<extQst1 (
<extQst2 (
extQst3> (
extQst3> (
<extsubjcoor (
Sem/Year
<spredQst (@<SPRED) for Nom; in a typical question sentence; You are not allowed to be Pers or human. The special part is that Nom is not allowed to your right
<spredQst2 (@<SPRED) for (A Nom); in a typical question sentence; You are SPRED if (N Nom) is to your left and leat + qst is to the left
<spredQst3 (@<SPRED) for (A Nom); you are SPRED when you are (A Nom) and to your right is (N Nom). This is a Qst-sentence, so copulas is found to your left
<spredQst4 (@<SPRED) for Nom; but only in a qst-sentence where there is no chance of you being the subj
<NomBeforeSpred (@<SPRED) for (A Nom) if; Nom to the left, and copulas is to the left of Nom. There is no Nom allowed to the right of copulas! To avoid messing with coordination: ja, dahje and comma are not allowed to your left. Comma is not allowed to your right; if so then you are likely to be coordinated
<spred (@<SPRED) for A Nom or N Nom if; the subject Nom is on the same side of copulas as you: on the right side of copulas
<spredVeara (@<SPRED) for veara + Nom; if genitive immediately to the right, and intransitive mainverb to the right of genitive
leftCop<spred (@<SPRED) for Nom; if copulas is the main verb to the left, and there is no Ess found to the left of cop (note that Loc is allowed between target and cop). OR: if you are Coll or Sem/Group with copulas to your left.
<spredLocEXPERIMENT (@<SPRED) for material Loc; if you are to the right of copulas, and the Nom to the left of copulas is not a hab-actor
NumTime (@<SPRED) for A Nom
<spredSg (@<SPRED) for Sg Nom
<spredPg (@<SPRED) for Pl Nom
<spred (@<SPRED) for Nom; if copulas to the left, and Nom or sentence boundary to the left of copulas. First one to the right is EOS.
COP<spredEss (@<SPRED) for N Ess
spredEss> (@SPRED>) for N Ess; if copulas to the right of you, and if an NP with nom-case first one to your left.
GalleSpred> (@SPRED>) for Num Nom; if sentence initial
spredSgMII> (@SPRED>)
spredšaddat> (@SPRED>)
r492> (@SPRED>) for Interr Gen; consisting only of negations. You are not allowed to be MII. You are not allowed to have an adjective or noun to your right. You are not allowed to have a verb to your right; the exception being an aux.
AdjSpredSg> (@SPRED>) for A Sg Nom; if copulas to the right, but not if A or @<SPRED are found to the right of copulas
Spred>SubjInf (@SPRED>) for Nom; if copulas to the right, and the subject of copulas is an Inf to the right
spredCoord (@<SPRED) coordination for Nom; only if there already is a SPRED to the left of CNP. Not if there is some kind of comparison involved.
subj>Sgnr1 (@SUBJ>) for Nom Sg, including Indef Nom if; VFIN + Sg3 or Pl3 to the right (VFIN not allowed to the left)
subj>Pl (@SUBJ>) for plural nominatives, including Coll and Sem/Group. VFIN + Pl3 to the right.
subj>Pl (@SUBJ>) for plural nominatives
subj>Sg (@SUBJ>) for Nom Sg; if VFIN + Sg3 to the right.
Sg<subj (@<SUBJ) for Nom Sg; if VFIN Sg3 or Du2 to the left (no HAB allowed to the left).
Du<subj (@<SUBJ) for Nom Coll if; a dual third person verb is found to the left
PlDu<subj (@<SUBJ) for (N Nom Pl), (Sem/Group Nom), (Coll Nom), (Pron Nom Pl) if; a verb is Pl3 or Du3 to your left. The verb is not allowed to be copulas with a place, Loc or time noun to its left
copPl3<subj (@<SUBJ) for Nom Pl; you don’t need to be a noun, only Nom Pl. To the left is copulas and first one to the right is @<SPRED
-fsubj> (@-FSUBJ>) for HUMAN Gen; in a NP-clause. To your right is Actio Nom followed by a noun
f<advl (@-F<ADVL) for infinite adverbials
f<advl (@-F<ADVL) for infinite adverbials
s-boundary=advl> (@ADVL>) for ADVL that resemble s-boundaries. Mainverb to the right.
diibmuadvl> (@ADVL>) for (diibmu Nom) if first one to the right is Num
-fsubj (@-FSUBJ>) for HUMAN Acc after DADJAT verbs
-fobj> (@-FOBJ>) for Acc if in front of abessive, gerundium, actio locative, perfectum participle or infinitive. First one to the right not allowed to be Acc though
-fobj> (@-FOBJ>) for Acc if human with ADVL-case to the left and transitive infinitive OBJ to the right. First one to the right not allowed to be Acc though
advl>mainV (@ADVL>) if; finite mainverb not found to the left, but the finite mainverb is found to the right.
V<advl (@<ADVL) if; finite mainverb found to the left. Not if a comma is found immediately to the left and a finite mainverb is located somewhere to the right of this comma.
advl>v (@ADVL>) if; you are ADVL, time-noun or Sem/Route and there is a finite verb to the right in the clause, or if to your right is: de followed by a finite verb. OR: if you are a time-noun and to your right is: go or sentenceboundary followed by a finite verb
advlPoPr> (@<ADVL) for Po or Pr; if mainverb to the right.
BOSPo> (@ADVL>) for Po; if trapped between BOS to the right and S-BOUNDARY OR COMMA to the left, because the main verb will then automatically be on your right side.
<advlComIll (@<ADVL) only if; you are Com OR Ill. To your left is a mainverb, and to your right a sentenceboundary, because we don’t want there to be another mainverb you potentially could belong to
<advlEOS (@<ADVL) for Po or Pr or Loc; if you are found at the very end of a sentence. A mainverb is needed to the left though.
<advlGen (@<ADVL) for (N Gen) if mainverb to the left and no noun to the right
<opredgohcodit (@<OPRED) for Ess
advlEss> (@<ADVL) for weather and time Ess, if FMAINV to the left.
comma<advlEOS (@<ADVL) for Adv if; mainverb is to the left. Comma to the left and mainverb to the right in the same clause is not allowed
advl>inbetween (@ADVL>) for Adv; if inbetween two sentenceboundaries where no mainverb is present.
comma<advlEOS (@<ADVL) for Adv if; comma found to the left and the finite mainverb to the left of comma. To the right is the end of the sentence.
BOSadvl> (@ADVL>) if; you are N Loc or N Ill and found sentence initially and there is a main verb somewhere to the right. No barrier for the mainverb; based on the thought that first one to your right is probably a sentenceboundary.
cleanupILL<advl (@<ADVL) for N Ill if; there are no boundarysymbols to your left, if you aren’t already @N< OR @APP-N<, and no mainverb is to your left.
cleanupPo (@ADVL) for Po: This rule tags all Po:s as ADVL if they haven’t gotten a tag somewhere along the way.
cleanupPr (@ADVL) for Pr: This rule tags all Pr:s as ADVL if they haven’t gotten a tag somewhere along the way.
-fsubj>asAcc (@-FSUBJ>) for HUMAN Acc; if there is a verb @-F<OBJ to your left
-f<obj (@-F<OBJ) for Acc if there is a transitive verb + SYN-V to your left
-fsubj>IV (@-FSUBJ>) for Acc; if there is an IV-verb acting as a @-F<OBJ to your right
-fsubj>IV (@-FSUBJ>) for Acc; if there is an TV-verb acting as a @-F<OBJ to your right followed by an Acc
-fsubj>asGen (@-FSUBJ>) for Gen;
f<subj (@-F<SUBJ) for Nom if; (V @-F<OBJ) to the left.
<opredAAcc (@<OPRED) for A Acc; if another accusative to the left, and a transitive verb to the left of it. OR: if a transitive verb to the left, and an accusative to the left of it.
<advlMeasr (@<ADVL) for (Num Acc); if finite IV-mainverb to the left, measure-noun to the right
<objMeasr (@<OBJ) for Num Acc; if finite TV-mainverb to the left, measure-noun to the right
<advlMeasr2 (@<ADVL) for MEASR-N + Acc; if (Num Pl) to the left and mainverb to the left of it
advlMeasr> (@ADVL>) for Num Acc;
Obj> (@OBJ>) for Acc; if there is a finite mainverb to the right in the clause. A really simple rule with no other restrictions..
s-boun<obj (@<OBJ) for Acc; if sentenceboundary to your left and a transitive mainverb further to the left
<objIV (@<OBJ) for Acc; if there is an intransitive mainverb in the clause. Not for Rel or Num. Not if you are a numeral followed by a measure-noun
<advlEss (@<ADVL) for ESS-ADVL if; FMAINV to the left
IV<spredEss (@<SPRED) for N Ess if; FMAINV to the left is intransitive or bargat
<opredEss (@<OPRED) for (N Ess), (A Ess) if; transitive mainverb to the left in the clause. If accusative to the left or to the right, or if Inf or ahte to the right, or if there is a noun to the right followed by an Inf
Acc<opredEss (@<OPRED) for (N Ess), (A Ess) if; transitive mainverb to the left in the clause, and an accusative cased Rel left to the verb
onlyV<opred (@<OPRED) for (N Ess) if; there is a transitive mainverb to the left. Usually there needs to be an Acc to the left, but here it is not needed
onlyV<opred2 (@<OPRED) for (N Ess) if;
subj>ifV (@SUBJ>) for NP-HEAD-NOM, DUPRON or (Num Nom) if; a finite mainverb is found to the right. This is a cleanup rule for subjects
hnoun>ifV (@SUBJ>) for NP-HEAD-NOM, DUPRON if. The counterpart of subj>ifV. You are HNOUN if there is a finite verb to your right, but NOT if there is a finite verb after a relative clause
The analysis gives double analyses because of optional semtags. We go for the one with the semtag.
This (part of) documentation was generated from src/cg3/functions.cg3
This file documents affixes/abbreviations.lexc, the file for Meänkieli abbreviation morphology
Now splitting according to POS, and according to dot or not
LEXICON ab-noun-itrab LEXICON ab-noun-trab LEXICON ab-noun-trnumab
LEXICON ab-noun
LEXICON ab-adj
LEXICON ab-adv
LEXICON ab-num
LEXICON ab-nodot-noun The bulk
LEXICON ab-nodot-adj
LEXICON ab-nodot-adv
LEXICON ab-nodot-num
LEXICON ab-dot-noun This is the lexicon for abbrs that must have a period.
LEXICON ab-dot-adj This is the lexicon for abbrs that must have a period.
LEXICON ab-dot-adv This is the lexicon for abbrs that must have a period.
LEXICON ab-dot-num This is the lexicon for abbrs that must have a period.
LEXICON ab-dot-cc
LEXICON ab-dot-verb
LEXICON nodot-attrnomaccgen-infl
LEXICON nodot-attr-infl
LEXICON nodot-nomaccgen-infl
LEXICON dot-attrnomaccgen-infl
LEXICON dot-attr
LEXICON dot-nomaccgen-infl
LEXICON DOT - Adds the dot to dotted abbreviations.
This (part of) documentation was generated from src/fst/morphology/affixes/abbreviations.lexc
This file documents affixes/acronyms.lexc, the file for Meänkieli acronym morphology
LEXICON Acronym-fit-suf for adding +ACR tag
LEXICON ACRONOUN_cons
LEXICON ACRONOUN_vow
LEXICON UNIT As acro, but without paradigm
LEXICON ACRO_ACCRA
LEXICON ACRO_BERN
LEXICON ACRO_LONDON
LEXICON ACRO_NYSTØ
LEXICON ACRO_cons
LEXICON ACRO_vow
This (part of) documentation was generated from src/fst/morphology/affixes/acronyms.lexc
This file documents the file affixes/adjectives.lexc for Meänkieli adjective morphology.
Most lexica here (a1, a_e, …) add +A, and thereafter redirect to the corresponding x1, x_e, … lexicon in affixes/nouns.lexc for case inflection.
The lexicon numbers correspond to the ones for nouns.
In addition, each lexicon also points to comparative and superlative sublexica.
LEXICON ax pointing to a1. It is for adjectives that have still not been classified.
LEXICON a1 adding +A and sending to x1, and to 3comp, 3sup.
LEXICON a1_e vanha, which has Err/Orth vanhee-, otherwise like a1
LEXICON a_vasen adding +A and sending to x1, and to 3comp, 3sup.
LEXICON a_e gets +A and goes to x_e.
LEXICON a3 kamala gets +A and points to x3
+A: x3_sg ;
LEXICON a4 has no comparative or superlative, just points to x4
LEXICON anen has no comparative or superlative, just points to xnen
LEXICON aas has no comparative or superlative, just points to xnas
LEXICON a_suuri has no comparative or superlative, just points to x4
LEXICON a1_ton
LEXICON x1_ton
LEXICON 3comp 2syll adj, 3syll comparative
LEXICON 4comp 3syll adj, 4syll comparative
LEXICON xcomp common for 2syll and 3syll
LEXICON 3sup 2syll adj, 3syll superlative
LEXICON 4sup 3syll adj, 4syll superlative
LEXICON xsup common for 2syll and 3syll
This (part of) documentation was generated from src/fst/morphology/affixes/adjectives.lexc
This file documents affixes/nouns.lexc, the file for Meänkieli noun morphology
n_äes = identical to 3n_ks except N+Sg+Nom (äes:äke)
LEXICON nx pointing to n1.
LEXICON n_nomorph for uninflected nouns
LEXICON nc for consonant-final nouns, structure CVC
LEXICON xc_sg
LEXICON xc_pl
LEXICON n0 for 1-syllabic: maa, suu, tie, …
LEXICON n0_pl for plurals of the same: häät
LEXICON x0 splitting to sg and pl
LEXICON x0_sg sg forms x0 point here
LEXICON x0_sg_oblique for oblique case forms in sg
LEXICON x0_pl for plural case forms
LEXICON n1 for 2-syll ordinary nouns (talo)
LEXICON n1_pl for the same plural words (urut)
LEXICON x1 for the bisyllabic, pointing to sg, pl
LEXICON x1_sg bisyllabic sg
LEXICON x1_sg_oblique gives the rest
LEXICON x1_pl the pl forms
LEXICON n_e vene, liike, säe
LEXICON n_e_pl vehkheet
LEXICON x_e splits in sg and pl
LEXICON x_e_sg the sg
LEXICON x_e_pl the pl
LEXICON x_e_pl urvakke etc, n_e words with -lle/-lla
LEXICON x_e_pl splits in sg and pl
LEXICON x_e_pl the sg
LEXICON x_e_pl the pl
LEXICON n3 odd-syllabic: kanava
LEXICON n3_pl haalarit
LEXICON x3
LEXICON x3_oblique
LEXICON x3_sg
LEXICON x3_oblique_sg
LEXICON x3_pl
LEXICON x3_pl
LEXICON 3nc
LEXICON xnc
LEXICON n4 kivi, stem kive
LEXICON x4 veri
LEXICON n4_pl
LEXICON x4_sg shared lexica for n4, n5, n5_lumi/loimi/lapsi EXCEPT SgNom, SgPar
LEXICON x4_pl
LEXICON n5 kieli, stem kiele
LEXICON n5 kieli, stem kiele
LEXICON n5_kieli kieli, stem kiele
LEXICON n5_lumi lumi, stem lu
LEXICON n5_loimi loimi, stem loi, like n5_lumi PLUS partitive loimea
LEXICON n5_vuosi vuosi> vuoessa/vuessa, stem OR vu
LEXICON n5_kasi käsi, stem kä
LEXICON n5_kasi_pl continuation for kasi_pl
LEXICON x5_kasi veri
LEXICON x5_kasi_pl
LEXICON n5_lapsi
LEXICON n5_ie_odd
LEXICON n5_ie_odd same as n5_ie except Pl+Part: takki>takkeja
LEXICON n5_nuoret_pl same as n1_pl except Pl+Gen: nuoret>nuorten
LEXICON n5_i_pl cont lexica for type n1-words ending with -i
LEXICON x5_i_pl cont lexica for type n1-words ending with -i
LEXICON nen bisyllabic nainen stem nai
LEXICON nen_sg
LEXICON nen_pl
LEXICON xnen
LEXICON xnen_sg +Sg:se 2cases ; for Ade, All, Ess lla, lle, nna
LEXICON xnen_pl
LEXICON 3nen odd-syllabic hevonen stem hevose
LEXICON x3nen
LEXICON x3nen_sg
LEXICON x3nen_pl
LEXICON xnen_common_sg
LEXICON xnen_common_pl
LEXICON 3cases
LEXICON 2cases
LEXICON 3n_ks
LEXICON 3n_ks_pl
LEXICON xn_ks
LEXICON xn_ks_sg
LEXICON xn_ks_pl
LEXICON n_äes
LEXICON x_äes
LEXICON 3n_ue
LEXICON 3x_ue
LEXICON 3x_ue_sg
LEXICON 3x_ue_pl
LEXICON 3n_ime
LEXICON 3n_ime_sg
LEXICON 3n_ime_pl
LEXICON x_ime_sg
LEXICON x_ime_pl
LEXICON nas
LEXICON xnas
LEXICON xnas_sg
LEXICON xnas_pl
LEXICON xnas_pl
LEXICON xnas_pl
LEXICON nas_h_pl
LEXICON 3mies
LEXICON n_ien
LEXICON n_ien_sg
LEXICON n_uus
LEXICON n_uus_odd
LEXICON 3n_lnr ahven - ahvenheen
LEXICON 3n_kymmen 3n_kymmen
LEXICON 30n_lnr askel - askelheesheen
LEXICON n_kasuven
LEXICON 3xn_lnr tyär, short and long Ill
LEXICON 3n_lnr_inteill not Ill, Ine, Ess, but all the others
LEXICON 4n_ks
LEXICON x4n_ks
LEXICON x4n_ks_sg
LEXICON x4n_ks_pl
LEXICON TRA
Px is now not in use, with one exception, comitative.
LEXICON n_PxK has either -n or goes to Px
LEXICON a_PxK has either -s or goes to Px with -a
LEXICON s_PxK has either -s or goes to Px
LEXICON sh_PxK has either -s or goes to Px with -he-
LEXICON st_PxK has either -s or goes to Px with -te- rakuaus, rakhauteni
LEXICON t_PxK has either -t or goes to Px
LEXICON i_PxK Tra: -i or -e and goes to Px
LEXICON PxK has only -nsA, compare PxxK
LEXICON PxxK has also -Vn, thus both ..llensa and ..lleen.
LEXICON Px
LEXICON Px-Vn
LEXICON n5_troppi troppi tropin troppia?
LEXICON n5_troppi_odd
This (part of) documentation was generated from src/fst/morphology/affixes/nouns.lexc
From fin via fkv.
Numeral inflection is like nominal inflection, except that numerals compound in all forms, which requires a great deal of care in the inflection patterns.
kaksi+Num+Sg+Nom
(Eng. two)
kaks: kaksi+Num+Sg+Nom
yksi+Num+Sg+Nom
(Eng. one)
yks: yksi+Num+Sg+Nom
(Eng. one)
kahet: kaksi+Num+Pl+Nom
yhet: yksi+Num+Pl+Nom
kaksi+Num+Sg+Gen
kaksi+Num+Sg+Ade
kaksi+Num+Sg+Abl
kaksi+Num+Sg+All
kaksi+Num+Sg+Ine
kaksi+Num+Sg+Ela
kaksi+Num+Sg+Tra
kahetta: kaksi+Num+Sg+Abe
yksi+Num+Sg+Gen
yksi+Num+Sg+Ade
yksi+Num+Sg+Abl
yksi+Num+Sg+All
yksi+Num+Sg+Ine
yksi+Num+Sg+Ela
yksi+Num+Sg+Tra
yhettä: yksi+Num+Sg+Abe
kahtena: kaksi+Num+Sg+Ess
yhtenä: yksi+Num+Sg+Ess
kaksi+Num+Pl+Ade
kaksi+Num+Pl+Abl
kaksi+Num+Pl+All
kaksi+Num+Pl+Ine
kaksi+Num+Pl+Ela
kaksi+Num+Pl+Tra
kaksitta: kaksi+Num+Pl+Abe
yksi+Num+Pl+Ade
yksi+Num+Pl+Abl
yksi+Num+Pl+All
yksi+Num+Pl+Ine
yksi+Num+Pl+Ela
yksi+Num+Pl+Tra
yksittä: yksi+Num+Pl+Abe
kaksi+Num+Pl+Ess
kaksine: kaksi+Num+Pl+Com
kaksi+Num+Pl+Ess
kaksine: kaksi+Num+Pl+Com
yksi+Num+Pl+Ess
yksine: yksi+Num+Pl+Com
yksi+Num+Pl+Ess
yksine: yksi+Num+Pl+Com
kaheksee: kaheksen+Num+Sg+Par
(Eng. eight)
kolmee: kolme+Num+Sg+Par
(Eng. three)
kuutta: kuusi+Num+Sg+Par
(Eng. six)
viittä: viisi+Num+Sg+Par
(Eng. five)
kaheksheen: kaheksen+Num+Sg+Ill
kolmheen: kolme+Num+Sg+Ill
viitheen: viisi+Num+Sg+Ill
miljardhiin: miljardi+Num+Sg+Ill
(Eng. billion)
kaksii: kaksi+Num+Pl+Par
miljardii: miljardi+Num+Pl+Par
kaksiin: kaksi+Num+Pl+Gen
kuusi+Num+Pl+Gen
kuutten: kuusi+Num+Pl+Gen
(kuussiin is more important)
viisi+Num+Pl+Gen
viitten: viisi+Num+Pl+Gen
(viissiin is more important)
Numeral plural genitive in back examples:
Numeral plural genitive in front examples:
miljardhiin: miljardi+Num+Pl+Ill
kakshiin: kaksi+Num+Pl+Ill
kahteni: kaksi+Num+Sg+Nom+PxSg1
(Possessive suffixes are used rather rarely in Kainun kieli. We leave them here for the time being.)
yhteni: yksi+Num+Sg+Nom+PxSg1
kolmeensa: kolme+Num+Sg+Par+PxSg3
kaksi+Num+Sg+Tra+PxSg3
kahekseen: kaksi+Num+Sg+Tra+PxSg3
nelje+Num+Sg+Tra+PxSg3
neljekseen: nelje+Num+Sg+Tra+PxSg3
viisi+Num+Sg+Par+PxSg3
viittään: viisi+Num+Sg+Par+PxSg3
kaksi+Num+Sg+Nom+Foc/han
kakshan: kaksi+Num+Sg+Nom+Foc/han
yksi+Num+Sg+Nom+Foc/han
ykshän: yksi+Num+Sg+Nom+Foc/han
LEXICON ARABICCASES adds +Arab
LEXICON ARABICCASE adds +Arab
This (part of) documentation was generated from src/fst/morphology/affixes/numerals.lexc
Meänkieli pronoun morphology
This file documents affixes/pronouns.lexc, the file for Meänkieli pronoun morphology
The pronouns are still only at an experimental stage.
LEXICON 12pronsg is for 1st and 2nd person singular
LEXICON 123pronpl
nuoitä
tuotä
This (part of) documentation was generated from src/fst/morphology/affixes/pronouns.lexc
This file documents affixes/propernouns.lexc, the file for Meänkieli propernoun morphology. The file pointing here is stems/fit-propernouns.lexc
The lexicon names look like this: p_mal_1
etc. They have 3 parts, divided by “_”
affixes/noun.lexc
file. Thus, _1 points to the lexicon x1, etc. We do not use _pl for names
LEXICON p_plc_0
LEXICON p_sur_0
LEXICON p_surplc_0
LEXICON p_sur_4
LEXICON p_surplc_4
LEXICON p_21ie
LEXICON p_22oi
LEXICON p_nen
LEXICON p_C
LEXICON p_ani_1
LEXICON p_ani_41
… and many more.
Vowel stems, odd and even stems
Consonant stems, odd and even stems
This (part of) documentation was generated from src/fst/morphology/affixes/propernouns.lexc
This file documents affixes/symbols.lexc, the file for the affixes added to language-independent symbols
+N+Symbol: SYMBOL_connector ;
+N+Symbol: # ;
+Sg+Nom: # ;
This (part of) documentation was generated from src/fst/morphology/affixes/symbols.lexc
This file documents affixes/verbs.lexc, the file for Meänkieli verb morphology
LEXICON OLLA olla-paradigm
LEXICON NEG negation verb
LEXICON v1_otta otta-lexicon
LEXICON v1_tietaa tietää-lexicon
LEXICON v1 sanoa, lukea, antaa
LEXICON v1_odd käsittää>käsittänny etc.
LEXICON v2 huomata, haluta, other forms
LEXICON v2_ata masinata etc
LEXICON v2_ata_odd huomata etc
LEXICON v2_uta haluta etc
LEXICON v2_havata havata-paradigm
LEXICON v3_syä syä, myä, lyä .#.
LEXICON v3_jua jua, lua, sua, tua .#.
LEXICON v3_ja for inf with ’a; saaja
LEXICON v3_ta maata
LEXICON v3_j contlex for viejä and others
LEXICON v3_viä
LEXICON v3_other contlex for v3-type (saaja, syöjä)
LEXICON v3_kaya käyä:kä
LEXICON v3_nahha nähä:nä
LEXICON v3_tehha tehä:te
LEXICON v4 tulla, mennä etc
LEXICON v4_syljästä julkasta etc, points to v4_julkas
LEXICON v4_julkasta julkasta etc
LEXICON v4_julkas julkasta etc
LEXICON v4_3la varjela:varjel
LEXICON v4_4lla lauleskella etc
LEXICON v5 kehitä:kehi
LEXICON v5_keritä keritä:kerki
LEXICON v6 = paeta:pake
LEXICON 2cond for -imm^A
from fkv
LEXICON v12pers Only sg12, pl12 so far
LEXICON PRFPRC_OBL is without nom sg from fkv
This (part of) documentation was generated from src/fst/morphology/affixes/verbs.lexc
This file documents the Meänkieli twolc file (the file governing gradation, gemination, vowel harmony and other morphophonological processes).
The first part of the file contains definitions, the second part contains rules.
This defines all symbols (letters, archiphonemes, triggers) to be used.
Here we group the symbols in convenient sets.
This defines strings used often in rules.
WeakGrade = ([l|n|r]) (%^AE:) %^WG:
This chapter gives the rules themselves.
For the gradation rules, each consonant deletion or change is given its own rule. Thus, both kk:k and k:0 are handled in the same k:0 rule. This is done to avoid rule conflicts. The change rules (k:g, k:j etc.) are restricted by context (k:g only after n, etc.).
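The division of labour between a single deletion rule and context-restricted change rules can be sketched at the string level. This is a simplified illustration, not a translation of the twolc rules: it covers only kk:k, k:g after n, and k:0 elsewhere, ignores k:j, k:v and the `^WG` trigger, and uses well-known standard Finnish stems (kukka, kenkä, reikä) as input rather than data from this lexicon.

```python
def weak_grade_k(stem):
    """Return the weak grade of the last k in stem (simplified sketch)."""
    i = stem.rfind("k")
    if i < 0:
        return stem  # no k, nothing to gradate
    if i > 0 and stem[i - 1] == "k":
        # kk:k is handled as deletion of one k, like plain k:0.
        return stem[:i] + stem[i + 1:]
    if i > 0 and stem[i - 1] == "n":
        # The change rule k:g applies only after n.
        return stem[:i] + "g" + stem[i + 1:]
    # Default case: k:0.
    return stem[:i] + stem[i + 1:]


for stem in ("kukka", "kenkä", "reikä"):
    print(stem, "->", weak_grade_k(stem))
```

Treating kk:k as the deletion of one k means the geminate and the single stop never need two competing rules for the same symbol pair, which is the conflict the paragraph above refers to.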
RULE: f:0
RULE: j:0
RULE: k:g
Tests:
RULE: k:0
Tests:
RULE: k:j
RULE: k4:j
Tests:
sylje0>n
(k3:j ?)
(k:0 ?)
RULE: k:v
Tests:
RULE: k:v
RULE: m:0
RULE: n:0
RULE: p:0
Tests:
RULE: p:v
Tests:
RULE: p:m
RULE: p:m
RULE: r:0
RULE: t:0
Tests:
RULE: t4:0 where t4 is t in rt that shall not become rr
Tests:
RULE: t:j
Tests:
RULE: t:l for lt:ll
Tests:
RULE: t:n for nt:nn
Tests:
RULE: t:r for rt:rr
Tests:
RULE: t:s
Tests:
RULE: v:0
The gemination rules insert the geminated consonant (thus 0:h if h to the left). There is one subrule for each vowel context, in order to avoid conflicts.
RULE: Gemination 0:h
RULE: Gemination 0:j
RULE: Gemination 0:k
Tests:
RULE: Gemination 0:l
Tests:
RULE: Gemination 0:m
RULE: Gemination 0:n
RULE: Gemination 0:p
RULE: Gemination 0:s
Tests:
RULE: h:0
RULE: h:0
RULE: h:0
kasva>hm^A^An kasva>mhaan
saarna>^A>hm^A^An saarna>a>hmaan
tule>hm^A^An tule>mhaan
RULE: Gemination 0:t
Tests:
RULE: Gemination 0:v
Tests:
These are assimilation rules for n on suffix borders of LNRS consonant stems. There is also a rule j:0 avoiding a lji sequence.
RULE: Alveolar assimilation for consonant stem l
Tests:
RULE: Alveolar assimilation for consonant stem r
RULE: Alveolar assimilation for consonant stem s in infinitives
Tests:
RULE: Alveolar assimilation for consonant stem s in participles
Here come the rules for stem vowel changes in front of suffix -i- (be it plural, present, comparative or conditional). Vowels are deleted or changed according to context. There are also some other vowel change rules.
RULE: a:e before the ^AE trigger
RULE: a:0 before metathesis h
Tests:
RULE: a:o when nonrounded root vowel and before i
Tests:
RULE: ä:0
Tests:
RULE: ä:e
RULE: e:0 deletes -e- in LNR stems as well as before -i-
Tests:
RULE: e:i
Tests:
RULE: i:0
Tests:
RULE: i:j
RULE: i2:j
RULE: i8:0
sano>0
alko>0
Tests:
RULE: i:e
RULE: o:0
Tests:
RULE: ö:0
Tests:
RULE: u:0
Tests:
RULE: y:0
Tests:
These are the rules connected to the Meänkieli -h- suffixes. The vowel must be copied from the stem to the right of the h and also deleted in the stem (cf. talo : talhoon)
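The talo : talhoon correspondence can be reproduced with a small string-level sketch. It is based only on that one example: the function name is hypothetical, and the shape of the suffix (h, the copied stem vowel, one further identical vowel, then n) is an assumption inferred from the surface form, not read from the twolc rules.

```python
VOWELS = set("aeiouyäö")


def h_suffix_form(stem):
    """Build a talhoon-type form from a vowel-final stem (illustrative only)."""
    if not stem or stem[-1] not in VOWELS:
        raise ValueError("expected a vowel-final stem")
    vowel = stem[-1]
    # Delete the vowel in the stem, then copy it to the right of the h.
    # The second vowel is assumed to belong to the suffix and to be identical.
    return stem[:-1] + "h" + vowel + vowel + "n"


print(h_suffix_form("talo"))
```

In the actual grammar the deletion and the copying are separate two-level rules, one per vowel, as listed below.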
RULE: a copying for h metathesis
Tests:
hint00>haan
RULE: o copying for h metathesis
Tests:
RULE: i copying for h metathesis
Tests:
RULE: ä copying for h metathesis
RULE: e copying for h metathesis
RULE: ö copying for h metathesis
RULE: y copying for h metathesis
RULE: u copying for h metathesis
All vowel harmony is taken care of with one rule.
RULE: Back harmony
Tests:
keskus>ta
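How a single harmony rule resolves the archiphoneme `^A` can be sketched as follows. This is a minimal illustration, assuming that `^A` surfaces as a when the stem contains a back vowel (a, o, u) and as ä otherwise; the function name is hypothetical and the real rule is a twolc rule with its own contexts.

```python
BACK_VOWELS = set("aou")


def realise_A(stem, suffix):
    """Attach suffix to stem, resolving each ^A by back/front harmony."""
    vowel = "a" if any(ch in BACK_VOWELS for ch in stem) else "ä"
    return stem + suffix.replace("^A", vowel)


# keskus>ta is the test form above; kahetta and yhettä are the
# abessive forms listed in the numerals section.
print(realise_A("keskus", "t^A"))
print(realise_A("kahe", "tt^A"))
print(realise_A("yhe", "tt^A"))
```

Because the choice depends only on the stem, every suffix can be written once with `^A` instead of being listed in a back and a front variant.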
This (part of) documentation was generated from src/fst/morphology/phonology.twolc
Beware of remnants from the Finnish and Kven files.
+Use/-TTS – never retained in the HFST Text-To-Speech disambiguation tokeniser
These three tags are not added in lexc. The POS tag before derivation is converted into this tag when compiling FST for disambiguation.
Tag
We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics. Compounds with verbs are only allowed if the verb is further derived into a noun again:
Flag | Explanation |
---|---|
@P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
@D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
@C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised |
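The following Python sketch shows how flag diacritics of the P (set), D (disallow), R (require), C (clear) and U (unify) types are generally evaluated along a path. It is a minimal model under the usual flag-diacritic semantics, not the HFST implementation; the function name `run_flags` and the order in which the NeedNoun flags are placed in the example paths are assumptions made for illustration.

```python
def run_flags(path):
    """Return True if the flag diacritics along the path are consistent."""
    state = {}
    for flag in path:
        parts = flag.strip("@").split(".")
        op, feature = parts[0], parts[1]
        value = parts[2] if len(parts) > 2 else None
        if op == "P":            # set the feature to the value
            state[feature] = value
        elif op == "C":          # clear the feature
            state.pop(feature, None)
        elif op == "D":          # fail if the feature has this value
            if feature in state and (value is None or state[feature] == value):
                return False
        elif op == "R":          # fail unless the feature has this value
            if feature not in state or (value is not None and state[feature] != value):
                return False
        elif op == "U":          # set if unset, otherwise the values must match
            if state.setdefault(feature, value) != value:
                return False
        else:
            raise ValueError(f"unsupported flag: {flag}")
    return True

# Assumed ordering: the verb sets the flag, a nominalisation clears it,
# and the end of the compound disallows a flag that is still set.
print(run_flags(["@P.NeedNoun.ON@", "@D.NeedNoun.ON@"]))
print(run_flags(["@P.NeedNoun.ON@", "@C.NeedNoun@", "@D.NeedNoun.ON@"]))
```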
For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.
Flag | Explanation |
---|---|
@P.CmpFrst.FALSE@ | Require that words tagged as such only appear first |
@D.CmpPref.TRUE@ | Block such words from entering ENDLEX |
@P.CmpPref.FALSE@ | Block these words from making further compounds |
@D.CmpLast.TRUE@ | Block such words from entering R |
@D.CmpSuff.TRUE@ | Block such words from entering R |
@P.CmpSuff.TRUE@ | Mark that we have passed R |
@D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding |
@U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding |
@P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R |
@D.CmpOnly.FALSE@ | Disallow words coming directly from root. |
Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.
Flag | Explanation |
---|---|
@U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj. |
@U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj. |
These tags are for handling erroneous forms:
Flag | Explanation |
---|---|
@D.ErrOrth.ON@ | tbw |
@P.ErrOrth.ON@ | tbw |
@C.ErrOrth@ | tbw |
@R.ErrOrth.ON@ | tbw |
This is for pronouns with multiple case suffixes (jommallekummalle)
Flag | Explanation |
---|---|
@U.pron.nom@ | tbw |
@U.pron.gen@ | tbw |
@U.pron.gen2@ | tbw |
@U.pron.ill@ | tbw |
@U.pron.par@ | tbw |
@U.pron.par2@ | tbw |
@U.pron.par3@ | tbw |
@U.pron.ess@ | tbw |
@U.pron.tra@ | tbw |
@U.pron.ine@ | tbw |
@U.pron.ela@ | tbw |
@U.pron.all@ | tbw |
@U.pron.ade@ | tbw |
@U.pron.abl@ | tbw |
@P.compound.block@ | tbw |
@D.compound.block@ | tbw |
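The explanations for these flags are still to be written, so the following is only a guess at the intent: with U (unification) flags, both parts of a doubly inflected pronoun such as jommallekummalle would have to carry the same case value. The Python sketch below checks that agreement; the function name `cases_agree` is hypothetical.

```python
def cases_agree(flags):
    """True if all @U.pron.X@ flags on a path carry the same value X."""
    seen = None
    for flag in flags:
        op, feature, value = flag.strip("@").split(".")
        if op != "U" or feature != "pron":
            continue  # other flags are ignored in this sketch
        if seen is None:
            seen = value
        elif seen != value:
            return False
    return True

print(cases_agree(["@U.pron.all@", "@U.pron.all@"]))
print(cases_agree(["@U.pron.all@", "@U.pron.ine@"]))
```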
These are for preprocessing
Flag | Explanation |
---|---|
@P.Pmatch.Loc@ | |
@P.Pmatch.Backtrack@ | |
+Use/PMatch | |
+Use/-PMatch | |
+Gram/TAbbr | Transitive abbreviation (it needs an argument) |
+Gram/NoAbbr | Intransitive abbreviations that are homonymous with more frequent words. They should only be considered abbreviations in the middle of a sentence. |
+Gram/TNumAbbr | Transitive abbreviation if the following constituent is numeric |
+Gram/NumNoAbbr | Transitive abbreviations for which numerals are complements and normal words. The abbreviation usage is less common and thus only the occurrences in the middle of the sentence can be considered as true cases. |
+Gram/TIAbbr | Both transitive and intransitive abbreviation |
+Gram/IAbbr | Intransitive abbreviation (it takes no argument) |
+Gram/3syll | trisyllabic verbs |
+Gram/Superl | superlative |
+Gram/Comp | comparative |
Here is the Root lexicon, pointing to all the parts of speech:
LEXICON Root
This (part of) documentation was generated from src/fst/morphology/root.lexc
This file documents the file for Meänkieli adjectives.
LEXICON AdjectiveRoot
This (part of) documentation was generated from src/fst/morphology/stems/adjectives.lexc
This file documents the file for Meänkieli adverbs.
The first part of the file adds tags, and the second lists the adverbs.
+Adv: K ;
+Adv: K ;
This (part of) documentation was generated from src/fst/morphology/stems/adverbs.lexc
This file documents the file for Meänkieli conjunctions.
It contains two parts, one for adding tags, and one for listing conjunctions.
+CC: # ;
+CC: # ;
This (part of) documentation was generated from src/fst/morphology/stems/conjunctions.lexc
This file documents the file for Meänkieli abbreviations.
The file contains 5-6 abbreviations, and is thus just a placeholder. Most fit abbreviations come from the common abbreviation file. Meänkieli-specific ones should be added here.
LEXICON ITRAB
e.Kr+Adv:e.Kr ab-dot-adv-itrab ;
LEXICON TRNUMAB
nro+N:nro ab-noun-trnumab ;
LEXICON TRAB
esim+A:esim ab-dot-adj-trab ;
This (part of) documentation was generated from src/fst/morphology/stems/fit-abbreviations.lexc
The file stems/fit-acronyms.lexc is a dummy file, with this content only:
This (part of) documentation was generated from src/fst/morphology/stems/fit-acronyms.lexc
This file documents the file for Meänkieli propernouns.
Contrary to other GiellaLT languages, the Meänkieli FST is not set up to use the language-independent name base found in the infrastructure.
The lexicon names look like this: p_mal_1
etc. They have 3 parts, divided by “_”
affixes/noun.lexc
file. Thus, _1 points to the lexicon x1, etc.
We do not use _pl for names (except for plural names).
32000 names
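The naming scheme can be illustrated with a small Python sketch that splits a lexicon name such as p_mal_1 on “_” and derives the continuation lexicon from the last part, following the statement that _1 points to x1. The function names are hypothetical, and the meaning of the first two parts is not modelled here.

```python
def split_lexicon_name(name: str):
    """Split a name like p_mal_1 into its three underscore-separated parts."""
    parts = name.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected three parts in {name!r}")
    return tuple(parts)

def continuation_lexicon(name: str) -> str:
    """The last part selects the noun lexicon: _1 points to x1, etc."""
    return "x" + split_lexicon_name(name)[2]

print(split_lexicon_name("p_mal_1"))
print(continuation_lexicon("p_mal_1"))
```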
This (part of) documentation was generated from src/fst/morphology/stems/fit-propernouns.lexc
This file documents the file for Meänkieli interjections.
LEXICON ijx +Interj: K ;
This (part of) documentation was generated from src/fst/morphology/stems/interjections.lexc
This file documents the file for Meänkieli nouns.
This is an overview of the continuation lexicon types.
n_äes = identical to 3n_ks except N+Sg+Nom (äes:äke)
The lemma list
häpy n1 ;
This (part of) documentation was generated from src/fst/morphology/stems/nouns.lexc
This file documents the file for Meänkieli numerals.
These are taken from fkv, but originally from fin, an FST with very different ways of doing things.
Numerals have been split in three sections, the compounding parts of cardinals and ordinals, and the non-compounding ones:
kaksi+Num+Sg+Nom#kymmenen+Num+Sg+Par#kolme+Num+Sg+Nom#tuhat+Num+Sg+Par (Eng. 23,000)
kaksi+Num+Sg+Nom#kymmenen+Num+Sg+Par#kolme+Num+Sg+Nom#tuhat+Num+Sg+Par
kahes+A+Ord+Sg+Nom#saas+A+Ord+Sg+Nom#neljes+A+Ord+Sg+Nom (Eng. 204th)
viitisen+Num#kymmentä (Eng. 50-ish)
The compounding parts of cardinals are the number multiplier words.
yksi+Num+Sg+Nom (Eng. one)
yksi+Num+Sg+Nom
viisi+Num+Sg+All (Eng. five)
tuhat+Num+Sg+Par (Eng. thousand)
The suffixes only appear after cardinal multipliers
viisi+Num+Sg+Nom#kymmentä
viisi+Num+Sg+Nom#kymmentä
nelje+Num+Sg+Nom#sata+Num+Sg+Par#tuhatta
The compounding parts of ordinals are the number multiplier words.
neljes+A+Ord+Sg+Nom
viies+A+Ord+Sg+All
tuhanes+A+Ord+Sg+Par
The suffixes only appear after cardinal multipliers
viies+A+Ord+Sg+Nom#kymmenes
neljes+A+Ord+Sg+Nom#saas+A+Ord+Sg+Nom#tuhanes
There is a set of numbers or corresponding expressions that work like them, but are not basic cardinals or ordinals:
viitisen+Num#kymmentä
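The analysis strings in this section use `#` between compound parts and `+` between a lemma and its tags. A minimal Python sketch for taking such a string apart is given below; the function name `parse_analysis` is hypothetical and is not part of the FST tooling.

```python
def parse_analysis(analysis: str):
    """Split an analysis into (lemma, [tags]) pairs, one per compound part."""
    result = []
    for part in analysis.split("#"):
        lemma, *tags = part.split("+")
        result.append((lemma, tags))
    return result

for lemma, tags in parse_analysis("viisi+Num+Sg+Nom#kymmentä"):
    print(lemma, tags)
```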
Numerals follow the same stem variation patterns as nouns, some of these being very rare to extinct for nouns.
yksi+Num+Sg+Nom
yksi+Num+Sg+Nom (Eng. we allow colloquial forms such as “yks” and “kaks”)
yksi+Num+Sg+Ill
yksi+Num+Sg+Ess
yksi+Num+Sg+Ine
yksi+Num+Sg+Ade
yksi+Num+Sg+Par
yksi+Num+Pl+Par
yksi+Num+Pl+Gen
yksi+Num+Pl+Ill
yksi+Num+Pl+Ess
yksissä: yksi+Num+Pl+Ine
kaksi+Num+Sg+Nom
kaksi+Num+Sg+Nom (Eng. we allow colloquial forms such as “yks” and “kaks”)
kaksi+Num+Sg+Ill
kaksi+Num+Sg+Ess
kaksi+Num+Sg+Ine
kaksi+Num+Sg+Ade
kaksi+Num+Sg+Par
kaksi+Num+Pl+Par
kaksi+Num+Pl+Gen
kaksi+Num+Pl+Ill
kaksi+Num+Pl+Ess
kaksissa: kaksi+Num+Pl+Ine
kolme+Num+Sg+Nom
kolme+Num+Sg+Ill
kolme+Num+Sg+Ill
kolme+Num+Sg+Ess
kolme+Num+Sg+Ine
kolme+Num+Sg+Ade
kolme+Num+Sg+Par
kolme+Num+Pl+Par
kolme+Num+Pl+Gen
kolme+Num+Pl+Ill
kolme+Num+Pl+Ill
kolme+Num+Pl+Ess
kolmissa: kolme+Num+Pl+Ine
nelje+Num+Sg+Nom
nelje+Num+Sg+Par
nelje+Num+Sg+Ill
nelje+Num+Sg+Ess
nelje+Num+Sg+Ine
nelje+Num+Sg+Ade
nelje+Num+Pl+Par
nelje+Num+Pl+Gen
nelje+Num+Pl+Gen (Eng. rare form)
nelje+Num+Pl+Ill
nelje+Num+Pl+Ess
neljissä: nelje+Num+Pl+Ine
viisi+Num+Sg+Nom
viisi+Num+Sg+Nom
viisi+Num+Sg+Ill
viisi+Num+Sg+Par
viisi+Num+Sg+Ine
viisi+Num+Sg+Ade
viisi+Num+Sg+Ess
viisi+Num+Pl+Ine
viisi+Num+Pl+Par
viisi+Num+Pl+Gen
viisi+Num+Pl+Gen (Eng. rare form)
viisi+Num+Pl+Ill
viisinä: viisi+Num+Pl+Ess
kuusi+Num+Sg+Nom
kuusi+Num+Sg+Nom
kuusi+Num+Sg+Par
kuusi+Num+Sg+Ess
kuusi+Num+Sg+Ine
kuusi+Num+Sg+Ade
kuusi+Num+Pl+Ess
kuusi+Num+Pl+Ine
kuusi+Num+Pl+Par
kuusi+Num+Pl+Gen
kuusi+Num+Pl+Gen (Eng. rare form)
kuushiin: kuusi+Num+Pl+Ill
kaheksen+Num+Sg+Nom
kaheksen+Num+Sg+Par
kaheksen+Num+Sg+Ill
kaheksen+Num+Sg+Ine
kaheksen+Num+Sg+Ade
kaheksen+Num+Sg+Ess
kaheksen+Num+Pl+Par
kaheksen+Num+Pl+Par
kaheksen+Num+Pl+Gen
kaheksen+Num+Pl+Ill
kaheksen+Num+Pl+Ine
kaheksinna: kaheksen+Num+Pl+Ess
yheksän+Num+Sg+Nom
yheksän+Num+Sg+Par
yheksän+Num+Sg+Ill
yheksän+Num+Sg+Ine
yheksän+Num+Sg+Ade
yheksän+Num+Sg+Ess
yheksän+Num+Pl+Par
yheksän+Num+Pl+Par
yheksän+Num+Pl+Gen
yheksän+Num+Pl+Ill
yheksän+Num+Pl+Ine
yheksinnä: yheksän+Num+Pl+Ess
kymmenen+Num+Sg+Nom
kymmenen+Num+Sg+Ill
kymmenen+Num+Sg+Ess
kymmenen+Num+Sg+Ine
kymmenen+Num+Sg+Ade
kymmenen+Num+Sg+Par
kymmenen+Num+Pl+Gen
kymmenen+Num+Pl+Gen
kymmenen+Num+Pl+Ill
kymmenen+Num+Pl+Ine
kymmeninnä: kymmenen+Num+Pl+Ess
sata+Num+Sg+Nom
sata+Num+Sg+Ess
sata+Num+Sg+Ine
sata+Num+Sg+Ill
sata+Num+Sg+Par
sata+Num+Pl+Gen
sata+Num+Pl+Ill
sata+Num+Pl+Ine
satoina: sata+Num+Pl+Ess
tuhat+Num+Sg+Nom
tuhat+Num+Sg+Ill
tuhat+Num+Sg+Ess
tuhat+Num+Sg+Ine
tuhat+Num+Sg+Par
tuhat+Num+Pl+Par
tuhat+Num+Pl+Gen
tuhat+Num+Pl+Gen (Eng. rare form)
tuhat+Num+Pl+Ill
tuhat+Num+Pl+Ess
tuhansissa: tuhat+Num+Pl+Ine
miljoona+Num+Sg+Nom
miljoona+Num+Sg+Ess
miljoona+Num+Sg+Ine
miljoona+Num+Sg+Par
miljoona+Num+Sg+Ill
miljoona+Num+Pl+Par
miljoona+Num+Pl+Gen
miljoona+Num+Pl+Ill
miljoona+Num+Pl+Ine
miljooninna: miljoona+Num+Pl+Ess
miljardi+Num+Sg+Nom
miljardi+Num+Sg+Ill
miljardi+Num+Sg+Par
miljardi+Num+Sg+Ine
miljardi+Num+Sg+Ess
miljardi+Num+Pl+Ill
miljardi+Num+Pl+Par
miljardi+Num+Pl+Ine
miljardi+Num+Pl+Gen
miljardiina: miljardi+Num+Pl+Ess
Googol: Googol+Num+Sg+Nom
pari+Num+Sg+Nom
pari+Num+Sg+Ill
pari+Num+Sg+Par
pari+Num+Sg+Ess
pari+Num+Sg+Ine
pari+Num+Pl+Ine
pari+Num+Pl+Ess
pari+Num+Pl+Par
pari+Num+Pl+Gen
parhiin: pari+Num+Pl+Ill
ensimäinen+A+Ord+Sg+Nom
ensimäinen+A+Ord+Sg+Ess
ensimäinen+A+Ord+Sg+Ine
ensimäinen+A+Ord+Sg+Par
ensimäinen+A+Ord+Pl+Gen
ensimäinen+A+Ord+Pl+Gen
ensimäinen+A+Ord+Pl+Par
ensimäinen+A+Ord+Pl+Ill
ensimäinen+A+Ord+Pl+Ess
ensimäisissä: ensimäinen+A+Ord+Pl+Ine
toinen+A+Ord+Sg+Nom
toinen+A+Ord+Sg+Ill
toinen+A+Ord+Sg+Par
toinen+A+Ord+Sg+Ine
toinen+A+Ord+Sg+Ade
toinen+A+Ord+Sg+Ess
toinen+A+Ord+Pl+Gen
toinen+A+Ord+Pl+Gen
toinen+A+Ord+Pl+Par
toinen+A+Ord+Pl+Ill
toinen+A+Ord+Pl+Ine
toisina: toinen+A+Ord+Pl+Ess
kolmas+A+Ord+Sg+Nom
kolmas+A+Ord+Sg+Ess
kolmas+A+Ord+Sg+Ine
kolmas+A+Ord+Sg+Ade
kolmas+A+Ord+Sg+Ill
kolmas+A+Ord+Sg+Par
kolmas+A+Ord+Pl+Par
kolmas+A+Ord+Pl+Gen
kolmas+A+Ord+Pl+Ine
kolmansinna: kolmas+A+Ord+Pl+Ess
neljes+A+Ord+Sg+Nom
neljes+A+Ord+Sg+Ill
neljes+A+Ord+Sg+Ess
neljes+A+Ord+Sg+Ine
neljes+A+Ord+Sg+Ade
neljes+A+Ord+Sg+Par
neljes+A+Ord+Pl+Par
neljes+A+Ord+Pl+Gen
neljes+A+Ord+Pl+Ill
neljes+A+Ord+Pl+Ine
neljensinnä: neljes+A+Ord+Pl+Ess
This (part of) documentation was generated from src/fst/morphology/stems/numerals.lexc
This file documents the file for Meänkieli postpositions.
+Po: K ;
This (part of) documentation was generated from src/fst/morphology/stems/postpositions.lexc
This file documents stems/prepositions.lexc, the file for Meänkieli prepositions.
+Pr: K ; prx
+Pr: K ;
+Pr: KK ;
This (part of) documentation was generated from src/fst/morphology/stems/prepositions.lexc
This file documents the file for Meänkieli pronouns.
se+Pron+Dem+Sg: se_pron ;
nämä+Pron+Dem+Pl:näi namaobl ;
joka+Pron+Interr+Sg:jo relkys ;
harva+Pron:pron pron_x1 ;
This (part of) documentation was generated from src/fst/morphology/stems/pronouns.lexc
This file documents the file for Meänkieli subjunctions.
+CS: # ;
This (part of) documentation was generated from src/fst/morphology/stems/subjunctions.lexc
This file documents the file for Meänkieli verb stems.
First, it gives an overview of the continuation lexica, and thereafter it sketches their actual content.
The rest of the file contains some 5500 verbs.
Irregular verbs
v1 sanoa, lukea
v2 tryykätä
v3 syödä, juoda
jua:ju v3_jua ;
v4 tulla, mennä
v5 tarvita
v6 paeta
Then comes the long list
This (part of) documentation was generated from src/fst/morphology/stems/verbs.lexc
retroflex plosive, voiceless t` ʈ 0288, 648 (` = ASCII 096)
retroflex plosive, voiced d` ɖ 0256, 598
labiodental nasal F ɱ 0271, 625
retroflex nasal n` ɳ 0273, 627
palatal nasal J ɲ 0272, 626
velar nasal N ŋ 014B, 331
uvular nasal N\ ɴ 0274, 628
bilabial trill B\ ʙ 0299, 665
uvular trill R\ ʀ 0280, 640
alveolar tap 4 ɾ 027E, 638
retroflex flap r` ɽ 027D, 637
bilabial fricative, voiceless p\ ɸ 0278, 632
bilabial fricative, voiced B β 03B2, 946
dental fricative, voiceless T θ 03B8, 952
dental fricative, voiced D ð 00F0, 240
postalveolar fricative, voiceless S ʃ 0283, 643
postalveolar fricative, voiced Z ʒ 0292, 658
retroflex fricative, voiceless s` ʂ 0282, 642
retroflex fricative, voiced z` ʐ 0290, 656
palatal fricative, voiceless C ç 00E7, 231
palatal fricative, voiced j\ ʝ 029D, 669
velar fricative, voiced G ɣ 0263, 611
uvular fricative, voiceless X χ 03C7, 967
uvular fricative, voiced R ʁ 0281, 641
pharyngeal fricative, voiceless X\ ħ 0127, 295
pharyngeal fricative, voiced ?\ ʕ 0295, 661
glottal fricative, voiced h\ ɦ 0266, 614
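The rows above pair an ASCII (X-SAMPA) symbol with an IPA character and its code point. A conversion based on such a table can be sketched in Python as a longest-match lookup, so that two-character symbols like `N\` win over `N`. This is an illustration only and not how txt2ipa.xfscript is implemented; the function name `to_ipa` is hypothetical, and the mapping contains just a handful of rows from the table.

```python
# A few rows from the table: X-SAMPA symbol -> IPA character
XSAMPA_TO_IPA = {
    "F": "ɱ", "J": "ɲ", "N": "ŋ", "N\\": "ɴ",
    "B": "β", "B\\": "ʙ", "R": "ʁ", "R\\": "ʀ",
    "T": "θ", "D": "ð", "S": "ʃ", "Z": "ʒ",
    "C": "ç", "G": "ɣ", "X": "χ",
}

def to_ipa(text: str) -> str:
    """Convert X-SAMPA to IPA, trying longer symbols before shorter ones."""
    symbols = sorted(XSAMPA_TO_IPA, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for sym in symbols:
            if text.startswith(sym, i):
                out.append(XSAMPA_TO_IPA[sym])
                i += len(sym)
                break
        else:
            out.append(text[i])  # characters not in the table pass through
            i += 1
    return "".join(out)

print(to_ipa("SaN"))
print(hex(ord("ŋ")), ord("ŋ"))
```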
alveolar lateral fricative, vl. K
alveolar lateral fricative, vd. K\
labiodental approximant P (or v\)
alveolar approximant r\
retroflex approximant r\`
velar approximant M\
retroflex lateral approximant l`
palatal lateral approximant L
velar lateral approximant L\
Clicks
bilabial O\ (O = capital letter)
dental |\
(post)alveolar !\
palatoalveolar =\
alveolar lateral |\|\
Ejectives, implosives
ejective _> e.g. ejective p p_>
implosive _< e.g. implosive b b_<
Vowels
close back unrounded M
close central unrounded 1
close central rounded }
lax i I
lax y Y
lax u U
close-mid front rounded 2
close-mid central unrounded @\
close-mid central rounded 8
close-mid back unrounded 7
schwa ə @
open-mid front unrounded E
open-mid front rounded 9
open-mid central unrounded 3
open-mid central rounded 3\
open-mid back unrounded V
open-mid back rounded O
ash (ae digraph) {
open schwa (turned a) 6
open front rounded &
open back unrounded A
open back rounded Q
Other symbols
voiceless labial-velar fricative W
voiced labial-palatal approx. H
voiceless epiglottal fricative H\
voiced epiglottal fricative <\
epiglottal plosive >\
alveolo-palatal fricative, vl. s\
alveolo-palatal fricative, voiced z\
alveolar lateral flap l\
simultaneous S and x x\
tie bar _
Suprasegmentals
primary stress "
secondary stress %
long :
half-long :\
extra-short _X
linking mark -
Tones and word accents
level extra high _T
level high _H
level mid _M
level low _L
level extra low _B
downstep !
upstep ^ (caret, circumflex)
contour, rising _R
contour, falling _F
contour, high rising _H_T
contour, low rising _B_L
contour, rising-falling _R_F
(NB Instead of being written as diacritics with _, all prosodic
marks can alternatively be placed in a separate tier, set off
by < >, as recommended for the next two symbols.)
global rise
voiceless _0 (0 = figure), e.g. n_0
voiced _v
aspirated _h
more rounded _O (O = letter)
less rounded _c
advanced _+
retracted _-
centralized _"
syllabic = (or _=) e.g. n= (or n_=)
non-syllabic _^
rhoticity `
breathy voiced _t
creaky voiced _k
linguolabial _N
labialized _w
palatalized ' (or _j) e.g. t' (or t_j)
velarized _G
pharyngealized _?\
dental _d
apical _a
laminal _m
nasalized ~ (or _~) e.g. A~ (or A_~)
nasal release _n
lateral release _l
no audible release _}
velarized or pharyngealized _e
velarized l, alternatively 5
raised _r
lowered _o
advanced tongue root _A
retracted tongue root _q
This (part of) documentation was generated from src/fst/phonetics/txt2ipa.xfscript
We describe here how abbreviations in Tornedalen Finnish are read out, e.g. for text-to-speech systems.
For example:
This (part of) documentation was generated from src/fst/transcriptions/transcriptor-abbrevs2text.lexc
This file is copied from the Finnish one. It should thus be Meänkielified. Transcribing numbers to words in Finnish is not completely trivial. One reason is that numbers in Finnish are written as compounds, regardless of length: 123456 is satakaksikymmentäkolmetuhattaneljäsataaviisikymmentäkuusi. Another complication is that inflection can be unmarked in running text: a digit expression is assumed to agree in case with the phrase it is in. For example, 27 is kaksikymmentäseittemän and 27:lle is kahdellekymmenelleseittemälle, but in the phrase “tarjosin 27 osanottajalle”, 27 takes the allative case without marking, and this is the preferred grammatical form in good writing.
Flag diacritics in number transcribing are used to control case agreement: in Finnish numeral compounds all words agree in case except in nominative singular where 10’s exponential multipliers are in singular partitive.
@U.CASE.SGNOM@ for singular nominative agreement
@U.CASE.SGALL@ for singular allative agreement
The morphotactics of numbers and their transcriptions require knowing the length of the whole digit string in order to know where to start reading; zeroes are not read out, but they affect the readout. The numerals are systematic and perfectly compositional: the implementation of 100 000–999 999 is almost exactly the same as that of 100 000 000–999 000 000 and everything afterwards, with the change of the word tuhat~tuhatta, miljoona~miljoonaa, miljardia, biljoonaa, biljardia and so forth, that is, following the long-scale British (French) system, where American billion = milliard etc. The numbers are built from roughly single-word-length blocks in decreasing order, with the exception of zig-zagging over the numbers 11–19, where the second digit comes before the first. The rest of this documentation describes the morphotactic implementation by the lexicon structure in descending order of magnitude, with examples.
yksi
kaksikymmentäyksi
kolmesataakaksikymmentäyksi
neljätuhattakolmesataakaksikymmentäyksi
viisikymmentäneljätuhattakolmesataakaksikymmentäyksi
kuusisataaviisikymmentäneljätuhattakolmesataakaksikymmentäyksi
seittemänmiljoonaakuusisataaviisikymmentäneljätuhattakolmesataakaksikymmentäyksi
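The compositional build-up shown in the examples above can be mimicked in Python. This is a sketch of the block structure only (millions, thousands, hundreds, tens, ones), not of the lexc implementation. The function names are hypothetical; the word for 8 (kaheksen) is taken from the numerals section, the teen forms follow the ykstoista/kakstoista examples given for Lexicon TEEN, and the range is limited to numbers below one milliard. All of these are assumptions.

```python
ONES = ["", "yksi", "kaksi", "kolme", "neljä", "viisi",
        "kuusi", "seittemän", "kaheksen", "yheksän"]
# Assumed teen stems, after the examples ykstoista, kakstoista, kolmetoista
TEEN_STEMS = ["", "yks", "kaks"] + ONES[3:]

def below_thousand(n: int) -> str:
    """Read out 0-999 as one compound (empty string for 0)."""
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds == 1:
        words.append("sata")
    elif hundreds > 1:
        words.append(ONES[hundreds] + "sataa")
    if rest == 10:
        words.append("kymmenen")
    elif 11 <= rest <= 19:
        # zig-zag: the second digit is read before the first
        words.append(TEEN_STEMS[rest - 10] + "toista")
    else:
        tens, ones = divmod(rest, 10)
        if tens:
            words.append(ONES[tens] + "kymmentä")
        if ones:
            words.append(ONES[ones])
    return "".join(words)

def number_to_words(n: int) -> str:
    """Read out 0-999 999 999 in descending order of magnitude."""
    if n == 0:
        return "nolla"
    if not 0 < n < 1_000_000_000:
        raise ValueError("this sketch stops below one milliard")
    millions, rest = divmod(n, 1_000_000)
    thousands, units = divmod(rest, 1000)
    parts = []
    if millions == 1:
        parts.append("miljoona")
    elif millions > 1:
        parts.append(below_thousand(millions) + "miljoonaa")
    if thousands == 1:
        parts.append("tuhat")
    elif thousands > 1:
        parts.append(below_thousand(thousands) + "tuhatta")
    parts.append(below_thousand(units))
    return "".join(parts)

print(number_to_words(123456))
```

Unlike the lexc description, which walks the digit string from the left and therefore needs to know its length in advance, the sketch works arithmetically from the numeric value.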
Lexicon HUNDREDSMRD
contains numbers 2-9 that need to be followed by exactly
11 digits: 200 000 000 000–999 999 999 999
this is to implement Nsataa…miljardia…
Lexicon CUODIMRD
contains numbers 2-9 that need to be followed by exactly
this is to implement Nsataa…miljardia…
kaksisataamiljardia
Lexicon HUNDREDMRD
is for numbers in range: 100 000 000 000–199 000 000 000
this is to implement sata…miljardia…
satamiljardia
Lexicon TEENSMRD
is for numbers with 11 000 000 000–19 000 000 000
this is to implement …Ntoista…miljardia…
Lexicon TEENMRD
is for numbers with 11 000 000 000–19 000 000 000
this is to implement …Ntoista…miljardia…
kaksitoistamiljardia
Lexicon TENSMRD
is for numbers with 20 000 000 000–90 000 000 000
this is to implement …Nkymmentä…miljardia…
Lexicon TENMRD
is for numbers with 10 000 000 000–10 999 999 999
this is to implement …kymmenenmiljardia…
kymmenenmiljardia
Lexicon LÅGEVMRD
is for numbers with 20 000 000 000–90 000 000 000
this is to implement …Nkymmentä…miljardia…
kaksikymmentämiljardia
Lexicon ONESMRD
is for numbers with 1 000 000 000–9 000 000 000
this is to implement …Nmiljardia…
Lexicon MILJARD
is for numbers with 1 000 000 000–9 000 000 000
this is to implement …Nmiljardia…
kaksimiljardia
Lexicon OVERMILLIONS
is for the millions part of numbers greater than 1 milliard
Lexicon HUNDREDSM
contains numbers 2-9 that need to be followed by exactly
8 digits: 200 000 000–999 999 999
this is to implement Nsataa…miljoonaa…
Lexicon CUODIM
contains numbers 2-9 that need to be followed by exactly
this is to implement Nsataa…miljoonaa…
kaksisataamiljoonaa
Lexicon HUNDREDM
is for numbers in range: 100 000 000–199 000 000
this is to implement sata…miljoonaa…
Lexicon TEENSM
is for numbers with 11 000 000–19 000 000
this is to implement …Ntoista…miljoonaa…
Lexicon TEENM
is for numbers with 11 000 000–19 000 000
this is to implement …Ntoista…miljoonaa…
kaksitoistamiljoonaa
Lexicon TENSM
is for numbers with 20 000 000–90 000 000
this is to implement …Nkymmentä…miljoonaa…
Lexicon TENM
is for numbers with 10 000 000–10 999 999
this is to implement …kymmenenmiljoonaa…
kymmenenmiljoonaa
Lexicon LÅGEVM
is for numbers with 20 000 000–90 000 000
this is to implement …Nkymmentä…miljoonaa…
kaksikymmentämiljoonaa
Lexicon ONESM
is for numbers with 1 000 000–9 000 000
this is to implement …Nmiljoonaa…
Lexicon MILJON
is for numbers with 1 000 000–9 000 000
this is to implement …Nmiljoonaa…
kaksimiljoonaa
Lexicon UNDERMILLION
is for numbers with 100 000–900 000 after milliards
Lexicon OVERTHOUSANDS
is for the thousands part of numbers greater than 1 million
Lexicon HUNDREDST
contains numbers 2-9 that need to be followed by exactly
5 digits: 200 000–999 999
this is to implement Nsataa…tuhatta…
Lexicon CUODIT
contains numbers 2-9 that need to be followed by exactly
this is to implement Nsataa…tuhatta…
kaksisataatuhatta
Lexicon HUNDREDT
is for numbers in range: 100 000–199 000
this is to implement sata…tuhatta…
Lexicon TEENST
is for numbers with 11 000–19 000
this is to implement …Ntoista…tuhatta…
Lexicon TEENT
is for numbers with 11 000–19 000
this is to implement …Ntoista…tuhatta…
kaksitoistatuhatta
Lexicon TENST
is for numbers with 20 000–90 000
this is to implement …Nkymmentä…tuhatta…
Lexicon TENT
is for numbers with 10 000–10 999
this is to implement …kymmenentuhatta…
kymmenentuhatta
Lexicon LÅGEVT
is for numbers with 20 000–90 000
this is to implement …Nkymmentä…tuhatta…
kaksikymmentätuhatta
Lexicon ONEST
is for numbers with 1 000–9 000
this is to implement …Ntuhatta…
Lexicon THOUSANDS
is for numbers with 1 000–9 000
this is to implement …Ntuhatta…
kaksituhatta
kolmetuhattaneljäsataaviisikymmentäkuusi
Lexicon THOUSAND
is for the ones-tens-hundreds of numbers greater than thousand
Lexicon UNDERTHOUSAND
is for numbers with 100–900 after thousands
Lexicon HUNDREDS
contains numbers 2-9 that need to be followed by exactly
2 digits: 200–999
this is to implement Nsataa…
Lexicon CUODI
contains numbers 2-9 that need to be followed by exactly
this is to implement Nsataa…
kaksisataa
kolmesataaneljäkymmentäviisi
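The HUNDREDS-type lexica described so far differ only in how many digits must follow the leading digit 2-9 (11, 8, 5 or 2). The Python sketch below makes that dispatch explicit. It is an illustration of the stated digit counts, not of the lexc continuation mechanism, and the function name `hundreds_lexicon` is hypothetical.

```python
# Number of digits that must follow the leading 2-9 -> lexicon name
HUNDREDS_LEXICA = {
    11: "HUNDREDSMRD",  # 200 000 000 000–999 999 999 999
    8: "HUNDREDSM",     # 200 000 000–999 999 999
    5: "HUNDREDST",     # 200 000–999 999
    2: "HUNDREDS",      # 200–999
}

def hundreds_lexicon(digits: str):
    """Return the HUNDREDS-type lexicon a digit string would start in, or None."""
    digits = digits.replace(" ", "")
    if not digits.isdigit():
        raise ValueError("expected a string of digits")
    if digits[0] not in "23456789":
        return None
    return HUNDREDS_LEXICA.get(len(digits) - 1)

print(hundreds_lexicon("345"))
print(hundreds_lexicon("200 000"))
```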
Lexicon HUNDRED
is for numbers in range: 100–999
Lexicon TEENS
is for numbers with 11–19
this is to implement …Ntoista
Lexicon TEEN
is for numbers with 11–19
this is to implement …Ntoista
ykstoista
kakstoista
kolmetoista
Lexicon TENS
is for numbers with 20–90
this is to implement …Nkymmentä…
Lexicon LÅGEV
is for numbers with 20–90
this is to implement …Nkymmentä…
kaksikymmentä
kolmekymmentäneljä
Lexicon JUSTTEN
is for number 10
this is to implement …kymmenen
kymmenen
Lexicon ONES
is for numbers with 1–9
this is to implement yksi, kaksi, kolme…, yheksän
yksi
kaksi
kolme
Lexicon ZERO
is for number 0
nolla
nolla
Lexicon LOPPU
is to implement potential case inflection with a colon.
yhdelle
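Case inflection marked with a colon, as in the 27:lle example earlier in this file's description, can be separated from the digit string before transcription. The Python sketch below shows one way to do that split; the function name `split_inflected_number` is hypothetical, and the sketch does not interpret the suffix.

```python
import re

# digits, optionally followed by a colon and a case suffix
DIGITS_WITH_SUFFIX = re.compile(r"^(\d+)(?::(\w+))?$")

def split_inflected_number(token: str):
    """Return (digits, suffix); suffix is None when no colon suffix is present."""
    match = DIGITS_WITH_SUFFIX.match(token)
    if match is None:
        raise ValueError(f"not a digit string: {token!r}")
    return match.group(1), match.group(2)

print(split_inflected_number("27:lle"))
print(split_inflected_number("27"))
```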
Note: accepting or rejecting case inflected digit strings without explicit suffix can be changed here.
This (part of) documentation was generated from src/fst/transcriptions/transcriptor-numbers-digit2text.lexc
[ L A N G U A G E ] G R A M M A R C H E C K E R
This section lists all the tags inherited from the fst, and used as tags in the syntactic analysis. The next section, Sets, contains sets defined on the basis of the tags listed here, those set names are not visible in the output.
BOS EOS
N A Adv V Pron CS CC CC-CS Po Pr Pcle Num Interj ABBR ACR CLB LEFT RIGHT WEB PPUNCT PUNCT
COMMA ¶
Pers Dem Interr Indef Recipr Refl Rel Coll NomAg Prop Allegro Arab Romertall
Nom Acc Gen Ill Loc Com Ess Ess Sg Du Pl Cmp/SplitR Cmp/SgNom Cmp/SgGen Cmp/SgGen PxSg1 PxSg2 PxSg3 PxDu1 PxDu2 PxDu3 PxPl1 PxPl2 PxPl3 Px
Comp Superl Attr Ord Qst IV TV Prt Prs Ind Pot Cond Imprt ImprtII Sg1 Sg2 Sg3 Du1 Du2 Du3 Pl1 Pl2 Pl3 Inf ConNeg Neg PrfPrc VGen PrsPrc Ger Sup Actio VAbess
Err/Orth
Sem/Act Sem/Ani Sem/Atr Sem/Body Sem/Clth Sem/Domain Sem/Feat-phys Sem/Fem Sem/Group Sem/Lang Sem/Mal Sem/Measr Sem/Money Sem/Obj Sem/Obj-el Sem/Org Sem/Perc-emo Sem/Plc Sem/Sign Sem/State-sick Sem/Sur Sem/Time Sem/Txt
HUMAN
PROP-ATTR PROP-SUR
TIME-N-SET
@+FAUXV @+FMAINV @-FAUXV @-FMAINV @-FSUBJ> @-F<OBJ @-FOBJ> @-FSPRED<OBJ @-F<ADVL @-FADVL> @-F<SPRED @-F<OPRED @-FSPRED> @-FOPRED> @>ADVL @ADVL< @<ADVL @ADVL> @ADVL @HAB> @<HAB @>N @Interj @N< @>A @P< @>P @HNOUN @INTERJ @>Num @Pron< @>Pron @Num< @OBJ @<OBJ @OBJ> @OPRED @<OPRED @OPRED> @PCLE @COMP-CS< @SPRED @<SPRED @SPRED> @SUBJ @<SUBJ @SUBJ> SUBJ SPRED OPRED @PPRED @APP @APP-N< @APP-Pron< @APP>Pron @APP-Num< @APP-ADVL< @VOC @CVP @CNP OBJ