Kven Finnish NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-fkv

Page Content

  • src-fst-morphology-affixes-adjectives.lexc.md
  • The base lexica
  • Basic paradigms
  • src-fst-morphology-affixes-nouns.lexc.md
  • Sublexica for NounRoot
  • src-fst-morphology-affixes-numerals.lexc.md
  • Kven numerals
  • Numeral inflection
  • src-fst-morphology-affixes-pronouns.lexc.md
  • Pronoun morphology
  • src-fst-morphology-affixes-propernouns.lexc.md
  • src-fst-morphology-affixes-symbols.lexc.md
  • Symbol affixes
  • src-fst-morphology-affixes-verbs.lexc.md
  • LEXICA FOR KVEN VERB INFLECTION
  • src-fst-morphology-phonology.twolc.md
  • Phonological rules for Kven
  • Rules
  • src-fst-morphology-root.lexc.md
  • Kven morphological transducer
  • src-fst-morphology-stems-adjectives.lexc.md
  • Kven language adjectives
  • src-fst-morphology-stems-adverbs.lexc.md
  • src-fst-morphology-stems-closed.lexc.md
  • Closed parts of speech
  • src-fst-morphology-stems-fkv-abbreviations.lexc.md
  • src-fst-morphology-stems-nouns.lexc.md
  • Nouns
  • src-fst-morphology-stems-numerals.lexc.md
  • Kven numerals
  • src-fst-morphology-stems-postpositions.lexc.md
  • Postposition stems
  • src-fst-morphology-stems-prepositions.lexc.md
  • Prepositions
  • src-fst-morphology-stems-pronouns.lexc.md
  • Pronoun stems
  • src-fst-morphology-stems-propernouns.lexc.md
  • Propernoun lexicon for Kven
  • src-fst-morphology-stems-verbs.lexc.md
  • Verb stems
  • Lexicon VerbRoot
  • src-fst-phonetics-txt2ipa.xfscript.md
  • src-fst-transcriptions-transcriptor-abbrevs2text.lexc.md
  • tools-grammarcheckers-grammarchecker.cg3.md
  • DELIMITERS
  • TAGS AND SETS
  • Kven Finnish language model documentation

    All doc-comment documentation in one large file.


    src-cg3-disambiguator.cg3.md

    Disambiguator for Kven

    Sets

    Sentence delimiters are the following: "<.>" "<...>" "<!>" "<?>" "<¶>"
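The delimiters split the input stream into the windows that the rules operate on. The following Python sketch is only a rough model of that behaviour. vislcg3 has further options, such as soft delimiters, and these are ignored here. The helper `split_windows` is hypothetical. Wordforms are written without the surrounding double quotes used in CG3 streams, and the sketch closes a window right after each delimiter wordform.

```python
# Delimiter wordforms from the disambiguator's list (the ellipsis is
# written here with three full stops).
DELIMITERS = {"<.>", "<...>", "<!>", "<?>", "<¶>"}


def split_windows(wordforms):
    """Group CG3-style wordforms into windows (hypothetical helper).

    A window is closed right after a delimiter, so the delimiter belongs
    to the sentence it ends.
    """
    windows, current = [], []
    for form in wordforms:
        current.append(form)
        if form in DELIMITERS:
            windows.append(current)
            current = []
    if current:  # trailing material without a final delimiter
        windows.append(current)
    return windows
```

For example, `split_windows(["<w1>", "<w2>", "<.>", "<w3>", "<?>"])` yields two windows, each ending in its delimiter.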

    Part-of-Speech

    Numerus

    Cases

    Types

    Sets with more members

    Boundaries

    Verbs

    Disambiguation rules

    Dialects

    Early rules

    Possessive suffixes

    Numeral phrases

    Preposition/postposition/adverb rules

    Rules for mapping @CVP and @CNP onto CC and CS

    Case rules

    Partitive

    Genitive

    Illative

    Number rules

    More disambiguation rules

    Elative

    Propernouns

    Verbs

    Specific verbs

    ei negation verb

    eli

    Adverbs

    paljon

    kerran

    jälkhiin

    Adjectives

    Conjunctions

    Subjunctions

    että

    jos

    ko

    sillä

    Pronouns

    Verb rules, Verbs

    Infinitive

    Present Sg3

    Present Pl3 or Passive

    Imperative

    HNOUN MAPPING


    This (part of) documentation was generated from src/cg3/disambiguator.cg3


    src-cg3-old_disambiguation.cg3.md


    This (part of) documentation was generated from src/cg3/old_disambiguation.cg3


    src-fst-morphology-affixes-abbreviations.lexc.md

    Lexicons without final period

    Lexicons with final period


    This (part of) documentation was generated from src/fst/morphology/affixes/abbreviations.lexc


    src-fst-morphology-affixes-adjectives.lexc.md

    Affix file for Kven adjectives

    The base lexica

    Each a_ lexicon adds the +A tag and then continues to a common x_ lexicon in the noun file; the comparative and superlative lexica are defined here.
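Continuation classes make this redirection work. An a_ lexicon emits +A and then hands over to an x_ lexicon that nouns can also use. The Python sketch below simulates this with a small dictionary of lexica. The lexicon names a_demo, n_demo and x_demo and their tag strings are hypothetical. Real lexc entries also carry surface forms, which the sketch leaves out.

```python
# Hypothetical lexica: name -> list of (analysis string, continuation).
# "#" ends the word, as in lexc.
LEXICA = {
    "a_demo": [("+A", "x_demo")],   # adjective: add +A, then continue
    "n_demo": [("+N", "x_demo")],   # noun: add +N, then continue
    "x_demo": [("+Sg+Nom", "#"), ("+Sg+Gen", "#")],  # shared endings
}


def expand(prefix, lexicon):
    """List every analysis reachable from `lexicon`, depth first."""
    if lexicon == "#":
        return [prefix]
    analyses = []
    for output, continuation in LEXICA[lexicon]:
        analyses.extend(expand(prefix + output, continuation))
    return analyses
```

Because both a_demo and n_demo point to x_demo, the shared endings are written only once. This is the benefit of routing adjectives through the noun file.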

    long_par ;

    LEXICON MATON ! TODO TODO TODO ajattelemattomalle lle lla nna

    LEXICON MATONodd ! käymättömäle le la na ! TODO Probably not in use at the moment

    Basic paradigms

    Most cases are directed to affixes/nouns.lexc

    Lexica for the non-uniform cases


    This (part of) documentation was generated from src/fst/morphology/affixes/adjectives.lexc


    src-fst-morphology-affixes-nouns.lexc.md

    Sublexica for NounRoot

    I have started adapting the analysis to Eira's grammar (Porsanki). The lexica are now (or should be) n11, n12, etc., i.e. Eira's noun types 1.1, 1.2, etc.

    2007, p. 87

    Noun types

    Incoming

    the same affix in sg and pl

    Eira’s classification

    the same affix in sg and pl

    LEXICON n_11_E2I is for ovi:ove, joki:joke, hyksi, suomi ! Not loanwords

    these two lexica for cases with the same affix in sg and pl

    LEXICON n_11_E2I_pl sakset, hykset

    the same affix in sg and pl

    LEXICON n_12 kieli

    LEXICON n_12_pl nuoret

    LEXICON n_12_hi lohi, tuohi, riihi

    LEXICON n_12_mi lumi lu

    LEXICON n_12_si käsi, hirsi

    LEXICON n_12_vuosi vuosi, vuona

    LEXICON x_12_vuosi

    LEXICON n_12_kusi kusi kuusi

    LEXICON x_12_kusi kusi kuusi

    these two lexica for cases with the same affix in sg and pl

    LEXICON n_12_lapsi lapsi la

    LEXICON x_12_lapsi lapsi la

    these two lexica for cases with the same affix in sg and pl

    LEXICON n_12_mies mies mie

    LEXICON x_12_mies mies mie

    these two lexica for cases with the same affix in sg and pl

    LEXICON n_22 tytär:tyttär, taival:taipal

    LEXICON n_22_pl tytär:tyttär, taival:taipal

    LEXICON n_22_m elläin elläi yđin ydin

    LEXICON x_22_m

    LEXICON n_22_m_pl

    LEXICON n_22_s sairhaus

    LEXICON n_22_s_even avaruus

    LEXICON n_32_as rakas, asukas

    LEXICON n_32_as_pl rakas, asukas

    LEXICON n_32_ae kevät

    LEXICON n_32_is ruvis, ruumis

    LEXICON n_32_et venet, hyljet, huonet

    LEXICON n_32_et_pl venet, hyljet, huonet

    LEXICON n_32_et_2 askel, kyynel, kannel

    LEXICON n_32_ut vantut:vantutta:vantthuut

    LEXICON n_32_ut_pl vantthuut

    LEXICON n_32_ts kirves

    LEXICON x_32_ts kirves

    these two lexica for cases with the same affix in sg and pl (check long)

    these two lexica for cases with the same affix in sg and pl

    LEXICON n_22_inen_odd ihminen

    these two lexica for cases with the same affix in sg and pl

    LEXICON n_22_inen_pl ! tervheiset tervhei the same affix in sg and pl

    LEXICON n_22_inen_pl_even ! olympialaiset the same affix in sg and pl

    LEXICON n_32_nu ! oppinu

    LEXICON n_21_odd_i_poengi ! poengi-poengissa-poengiissa

    LEXICON n_tuhat ! poengi-poengissa-poengiissa

    +N:se nomgen_px ;

    make+N+Pl+kom:sine K ;

    Basic paradigms

    Sublexica for the basic unified cases, with even and odd variations

    Sublexica for Gen, Par, Ill, Ess and Com.

    Sublexica for possessive suffixes

    Px is currently not in use, with one exception: the comitative.

    LEXICON n_PxK has either -n or goes to Px

    LEXICON i_PxK Tra: -i or -e and goes to Px

    LEXICON PxK has only -nsA and is currently not in use (to be checked).

    LEXICON PxxK also has -Vn, so both ...llensa and ...lleen.

    This file contains the closed parts of speech.

    The Px-Vn lexicon contains only six words.

    Basic paradigms

    Basic vowel stems

    !LEXICON strong_v_stem_even "All strong-grade case forms here"

    !LEXICON strong_v_stem_odd "All strong-grade case forms here"

    Sublexica for the vowel stems


    This (part of) documentation was generated from src/fst/morphology/affixes/nouns.lexc


    src-fst-morphology-affixes-numerals.lexc.md

    Kven numerals

    Numeral inflection

    Numeral inflection is like nominal inflection, except that numerals compound in all forms, which requires great care in the inflection patterns.
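Because numerals compound in all forms, every part of a compound numeral must carry the same case. The sketch below checks this on an analysis string. The '#'-separated layout and the tag order are assumptions made for illustration and are not the analyser's actual output format. The helpers `case_of` and `parts_agree` are hypothetical. The case tags are those listed for the disambiguator's Cases set.

```python
# Case tags as listed for the disambiguator's Cases set.
CASES = {"Nom", "Gen", "Acc", "Par", "Ine", "Ill", "Ela", "Ade",
         "Abe", "All", "Abl", "Ess", "Tra", "Ins", "Com"}


def case_of(part):
    """Return the last case tag in one '+'-separated analysis part."""
    found = [tag for tag in part.split("+")[1:] if tag in CASES]
    return found[-1] if found else None


def parts_agree(analysis):
    """True if every '#'-separated compound part has the same case tag.

    Assumption: compound parts are joined with '#' (hypothetical format).
    """
    cases = [case_of(part) for part in analysis.split("#")]
    return None not in cases and len(set(cases)) == 1
```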


    This (part of) documentation was generated from src/fst/morphology/affixes/numerals.lexc


    src-fst-morphology-affixes-pronouns.lexc.md

    Pronoun morphology

    The pronouns are still only at the experimental stage.

    LEXICON 12pronsg is 1st and 2nd person singular

    LEXICON 12pronsg_short is 1st and 2nd person singular

    LEXICON 123pronpl

    nuoitä

    tuotä


    This (part of) documentation was generated from src/fst/morphology/affixes/pronouns.lexc


    src-fst-morphology-affixes-propernouns.lexc.md

    2007, p. 87


    This (part of) documentation was generated from src/fst/morphology/affixes/propernouns.lexc


    src-fst-morphology-affixes-symbols.lexc.md

    Symbol affixes


    This (part of) documentation was generated from src/fst/morphology/affixes/symbols.lexc


    src-fst-morphology-affixes-verbs.lexc.md

    LEXICA FOR KVEN VERB INFLECTION

    This file documents affixes/verbs.lexc

    Auxiliaries

    LEXICON neg is split into three

    LEXICON indneg: indicative endings of the negation verb

    Note that lexicon indneg continues to K_NEG and not to K, since we have enkä but not enkin.
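You can think of the K_NEG restriction as two continuation classes that differ only in which clitics they allow. In the sketch below the clitic lists are hypothetical. Only the contrast between enkä and *enkin comes from the note above, and vowel harmony is ignored.

```python
# Hypothetical clitic continuations: the real K and K_NEG lexica in
# affixes/verbs.lexc are not reproduced here. "" means "no clitic".
K = ["", "kin", "kä"]
K_NEG = ["", "kä"]  # -kin is left out, so the negation verb gets no *enkin


def continue_with(form, continuation):
    """Attach every clitic allowed by a continuation class to `form`."""
    return [form + clitic for clitic in continuation]
```

With K the negation verb would also produce enkin. This is why indneg points to K_NEG instead.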

    LEXICON imprtneg: imperative endings of the negation verb

    LEXICON OLLA is a lexicon of its own; so far only the present tense

    Regular verbs

    Verb classes v1…

    Classes v1, v2… follow Eira's grammar. Subclasses are probably needed.

    Each lexicon contains the infinitive and the third-person endings, and points via the present and the preterite to lexicon v12pers, which has -n, -t, -mA, -ttA
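The present and the preterite can share v12pers because they use the same four endings. The minimal sketch below shows how those endings combine with a stem. It assumes the archiphoneme A is resolved by a simplified vowel-harmony test: any a, o or u in the stem selects a, otherwise ä. The helpers `resolve_a` and `v12pers` are hypothetical. The outputs are plain concatenations and have not been checked against the analyser.

```python
# Endings listed for lexicon v12pers: -n, -t, -mA, -ttA.
# Assumption: the archiphoneme A is resolved by a simplified harmony test.
V12PERS_ENDINGS = {"Sg1": "n", "Sg2": "t", "Pl1": "mA", "Pl2": "ttA"}


def resolve_a(stem, ending):
    """Replace archiphoneme A with a (back stem) or ä (front stem)."""
    back = any(vowel in stem for vowel in "aou")
    return ending.replace("A", "a" if back else "ä")


def v12pers(stem):
    """Attach each 1st/2nd person ending to `stem` (hypothetical helper)."""
    return {tag: stem + resolve_a(stem, ending)
            for tag, ending in V12PERS_ENDINGS.items()}
```

For example, `v12pers("saa")` gives saan, saat, saama and saatta.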

    LEXICON v1 saađa:saa

    LEXICON v1iđa Cond uiđa:uisin

    LEXICON v1kayda käyđä:kä

    LEXICON v1nahda nähđä:nä

    LEXICON v1tehda tehđä:te

    Verb classes v2…

    LEXICON v2 ! aikkoot:aiko, anttaat:anta, assuut:asu, kattoot:katt2o, kulkkeet:kulke, lähteet:lähte, lenttäät:lentä, lukkeet:luke, luottaat:luotta, näkkyyt:näky, pittäät:pitä, soppiit:sopi (this is not yaml)

    LEXICON v2_si ! pyyttäät:pyysi

    LEXICON v2_taittaat ! taisi, tainu / taitanu

    LEXICON v2_sauttaat ! sauttaat-sautti

    LEXICON v2_särkkyyt ! särkkyyt-särjyn

    LEXICON v2_tiettäät tiettäät

    LEXICON v2_odd kirjoittaat:kirjoitta

    LEXICON v2_odd_UUt hyväksyyt:hyväksy

    Verb classes v3…

    LEXICON v3_ele_short = nielä:niel

    LEXICON v3_ele ajatella:ajattel

    LEXICON v3_ele_odd kävelä:kävel

    LEXICON v3_ise aukaista:aukaise

    LEXICON v3_aise aukaista:aukaise

    LEXICON v3_lnr and the stem is pan-

    LEXICON v3_s kusta, nousta, pestä, päästä, and the stem is kus-

    LEXICON v3piera pierä:pie

    LEXICON v3juosta juosta:juo

    Verb classes v4…

    LEXICON v4 vanheta:vanhe, pajeta:pake

    LEXICON v4_itte and the stem is - kyyti

    LEXICON v4_oitte and the stem is haravoi-

    LEXICON v43 hantteerata:hantteera

    LEXICON v43_odd jatkata:jatka

    LEXICON v43_odd_II jatkata:jatka

    Verb personal endings

    This part gives the personal endings.

    The 1st and 2nd persons are handled separately, since their endings are the same in the present and the preterite.
    The 3rd person was already given in the stem lexica.

    LEXICON v12pers Only sg12, pl12 so far

    LEXICON PRFPRC_OBL is without nom sg

    LEXICON PRFPRC_OBL_nny is without nom sg

    LEXICON PRFPRC_OBL_nnu is without nom sg

    LEXICON PRFPRC_OBL_lly is without nom sg

    LEXICON PRFPRC_OBL_llu is without nom sg


    This (part of) documentation was generated from src/fst/morphology/affixes/verbs.lexc


    src-fst-morphology-phonology.twolc.md

    Phonological rules for Kven

    This file documents the phonology.twolc file

    We first define alphabets and sets. Thereafter come the rules.

    Alphabets and sets

    Alphabet

    The letters
    The archiphonemes
    Letters with deviant behaviour
    Triggers
    Dialect tags

    Literal quotes and angles

    These must be escaped (cf. the morpheme boundaries further down):

    Sets

    Development principles:

    No unclear sequences without an explanation (and tests): (Cns:) :Cns+ Cns: (:Cns)
    One trigger, one change! No ^AO that means a:o and a:0 and a:i; use 3 different triggers instead.

    Trigger order

    (to be completed). The triggers should be in this order both in lexc and here in twolc.

    stem Dial: WG {T0,TJ} {E0,I0,E2I,A2I,AO,AE,VDEL,UU} HMETA > {i2:,i3:,i4:,i5:} > suffixes
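Because the trigger order is fixed, a small check can flag lexc entries whose triggers appear out of order. The sketch below makes several assumptions. Trigger names are written without the ^/% prefixes used in the real files, "Dial" stands for any dialect tag, and the stem and suffixes are left out. The helper `in_trigger_order` is hypothetical. Symbols from the same group may repeat, and an unknown symbol makes the check fail.

```python
# Trigger groups in the order given above; ">" appears twice.
ORDER = [
    {"Dial"},
    {"WG"},
    {"T0", "TJ"},
    {"E0", "I0", "E2I", "A2I", "AO", "AE", "VDEL", "UU"},
    {"HMETA"},
    {">"},
    {"i2", "i3", "i4", "i5"},
    {">"},
]


def in_trigger_order(symbols):
    """True if `symbols` can be matched left to right against ORDER."""
    pos = 0
    for sym in symbols:
        # Move forward to the first group (at or after pos) containing sym.
        while pos < len(ORDER) and sym not in ORDER[pos]:
            pos += 1
        if pos == len(ORDER):
            return False  # out of order, or not a known trigger
    return True
```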

    Rules

    (Divided into consonant and vowel rules)

    Consonant rules

    Gemination rules

    After a long vowel, and also after an unstressed syllable, k, t, p and s geminate when followed by a long vowel (= the special gemination of the southwestern dialects), but other consonants geminate only after a short stressed syllable (= general gemination). (ES).
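The rule above can be read as a small decision procedure. The sketch below encodes only what the paragraph states, plus one assumption: general gemination also requires a following long vowel. A syllable counts as "short" here when it has no long vowel. Syllable structure is reduced to three booleans, so this is not a transcription of the twolc gemination rules. The helper `geminates` is hypothetical.

```python
# Consonants that undergo the special (southwestern) gemination.
SPECIAL = {"k", "t", "p", "s"}


def geminates(consonant, prev_long_vowel, prev_stressed, next_long_vowel):
    """Decide gemination from the prose rule (simplified, hypothetical).

    Assumption: both kinds of gemination require a following long vowel,
    and a syllable without a long vowel counts as short.
    """
    if not next_long_vowel:
        return False
    # Special gemination: k, t, p, s after a long vowel or an unstressed syllable.
    if consonant in SPECIAL and (prev_long_vowel or not prev_stressed):
        return True
    # General gemination: any consonant after a short stressed syllable.
    return prev_stressed and not prev_long_vowel
```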

    Rule: Gemination 0:h

    Rule: Gemination 0:j

    Rule: Gemination 0:k

    Tests:

    Rule: Gemination 0:l

    Rule: Gemination 0:m

    Rule: Gemination 0:n

    Rule: Gemination 0:p

    Rule: Gemination 0:r

    Rule: Gemination 0:s

    Rule: Gemination 0:t

    Rule: Gemination 0:v

    Gradation rules

    Rules for p gradation

    Rule: Gradation p:0 (pp:p)

    Rule: Gradation p:v

    Rule: Gradation mp:mm

    Tests:

    Rules for k gradation

    Rule: Gradation i6:0, in word poika: pojan

    Tests:

    TODO: When k:j and when k:0 between e and i.

    Rule: Gradation k:j

    Rule: Gradation k:0

    Tests:

    Tests:

    Rule: Gradation k3:0

    Tests:

    Rule: Gradation k:v

    Tests:

    Rule: Gradation nk:ng

    Tests:

    Gradation t

    Rule: Gradation Nt:NN in first syllable after short vowel

    Rule: Gradation t:0 for tt:t, Nt:N and vuote:vuoeksi

    Tests:

    Rule: ti:si

    Tests:

    Rule: t:j in Var variant vuojeksi

    Rule: o:u in vuosi vuote vuoet -> vuuet optional variant

    Tests:

    Rule: Gradation t:đ

    Tests:
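The rule titles above name several strong-to-weak correspondences: mp:mm, nk:ng, pp:p, tt:t, p:v and t:đ, plus Nt:NN, where N is taken here to be n only. These can be sketched as a lookup applied to the last intervocalic cluster of a stem. This is a simplification. It assumes the context already calls for the weak grade, leaves out k gradation, and skips the other conditions the twolc rules check. It only shows the substitution itself. The helper `weak_grade` is hypothetical.

```python
VOWELS = set("aeiouyäö")
# Strong -> weak pairs from the rule titles; longer clusters are tried first.
# Assumption: the N in Nt:NN is represented by n only.
WEAK = [("mp", "mm"), ("nk", "ng"), ("nt", "nn"), ("pp", "p"), ("tt", "t"),
        ("p", "v"), ("t", "đ")]


def weak_grade(stem):
    """Weaken the last intervocalic strong cluster of `stem`, if any."""
    for i in range(len(stem) - 1, 0, -1):
        for strong, weak in WEAK:
            end = i + len(strong)
            if (stem[i:end] == strong and stem[i - 1] in VOWELS
                    and end < len(stem) and stem[end] in VOWELS):
                return stem[:i] + weak + stem[end:]
    return stem
```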

    Assimilation rules

    Rule: Alveolar assimilation for consonant stem l

    Rule: Alveolar assimilation for consonant stem r

    Rule: Alveolar assimilation for consonant stem s

    Rule: j:0 in front of i

    Vowel rules

    Vowel harmony rules

    The idea of having, e.g., V:e always map to one specific vowel is to avoid conflicts in twolc compilation. This improves compilation time (we assume) and makes twolc behave more predictably; weird things sometimes happen with conflicts! The downside is that linguistic rules often apply to a whole group of similar vowels, which is the usual case, and now each rule has to be edited for every vowel separately, one by one. Hopefully more user feedback, especially on the paradigms in the dictionary, will help us not forget this too often.
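The effect the harmony rules aim for can be sketched outside twolc in a few lines. The sketch assumes a simplified whole-word test: any a, o or u makes the word back-harmonic. It writes the archiphonemes ^A, ^O and ^U as plain A, O and U. It ignores compounds and other subtleties, so it illustrates the effect and does not model the per-vowel rules in this file. The helper `harmonise` is hypothetical.

```python
BACK_VOWELS = set("aou")
# Archiphoneme -> (back variant, front variant); ^A, ^O, ^U written as A, O, U.
ARCHIPHONEMES = {"A": ("a", "ä"), "O": ("o", "ö"), "U": ("u", "y")}


def harmonise(word):
    """Resolve A, O, U to back or front vowels by a whole-word test."""
    back = any(ch in BACK_VOWELS for ch in word)
    return "".join(
        ARCHIPHONEMES[ch][0 if back else 1] if ch in ARCHIPHONEMES else ch
        for ch in word
    )
```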

    For each Vowel separately

    Rule: ^V:e

    Tests:

    Rule: ^V:a

    Tests:

    Rule: ^V:ä

    Tests:

    Rule: ^V:i

    Tests:

    Rule: ^V:o

    (the old system)

    with variables (Vx/Vy) instead of each vowel separately

    Rule: Back harmony for %^A: %^O: %^U:

    Tests:

    Vow copying and metathesis

    Rule: Vow copying in short h-illative and short partitive sg

    Tests:

    Rule: Vow copying in partitive of words ending in io, ia

    Rule: Vow copying in long h sg forms both part one and part two

    Rule: Vow copying in long h pl forms

    Rule: a to o and metathesis in h forms in pl of a-stems

    Rule: Stem deletion in h-illative

    Tests:

    Stem alternation rules

    e rules

    Rule: e:i in nom.sg. of e-stems and in n_23ia kauhia hopia in Var

    Rule: e:0 in consonant stems and illative plural

    Tests:

    Cns:0 in hoppe- hope-a in Var @RULENAME@ Jok

    i rules

    The -i- rules require different i-s for different POS.

    Rule: i:0

    a rules

    Rule: a:0 before Pret and Pl i when rounded root vowel

    Tests:

    Rule: a:o before Pl i and Pret i

    Rule: ä:ö before Pl i

    Tests:

    Rule: a:i in 3-syll stems with long a and i

    Tests:

    Rule: a:i in 3-syll stems with long a and i

    Shortening

    Rule: Shortening of long vowel in front of i

    Other Vowel rules

    (two A:e rules and one ä:0)

    Rule: a:e in comparative

    Tests:

    Rule: a:e in passives

    Rule: ä:0

    Tests:

    Rule: ö:0

    Rule: o:0

    Gemination tests

    Tests:

    Rule: o:0

    Rule: o:0

    Rule: o:0


    This (part of) documentation was generated from src/fst/morphology/phonology.twolc


    src-fst-morphology-root.lexc.md

    Kven morphological transducer

    Beware of remnants from the Finnish file. Take nothing at face value!

    Tags for POS

    Tags for grammar

    Pronoun types

    Number

    Number-person

    Case

    Comparatives

    Finite verbs

    Infinite verbs

    Punctuation

    Speller tags

    Usage tags

    Compounds

    Derivation

    Clitic tags

    Tokeniser tags

    Semantic tags

    Dialect tags

    Lexeme disambiguation tags

    Stem variant tags

    Phonological symbols

    Flag diacritics

    We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics, allowing compounds with verbs only if the verb is further derived into a noun again:

    Flag Explanation
    @P.NeedNoun.ON@ (Dis)allow compounds with verbs unless nominalised
    @D.NeedNoun.ON@ (Dis)allow compounds with verbs unless nominalised
    @C.NeedNoun@ (Dis)allow compounds with verbs unless nominalised

    For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.

    Flag Explanation
    @P.CmpFrst.FALSE@ Require that words tagged as such only appear first
    @D.CmpPref.TRUE@ Block such words from entering ENDLEX
    @P.CmpPref.FALSE@ Block these words from making further compounds
    @D.CmpLast.TRUE@ Block such words from entering R
    @D.CmpSuff.TRUE@ Block such words from entering R
    @P.CmpSuff.TRUE@ Mark that we have passed R
    @D.CmpNone.TRUE@ Combines with the next tag to prohibit compounding
    @U.CmpNone.FALSE@ Combines with the prev tag to prohibit compounding
    @P.CmpOnly.TRUE@ Sets a flag to indicate that the word has passed R
    @D.CmpOnly.FALSE@ Disallow words coming directly from root.
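To see how flags such as NeedNoun can block verb-initial compounds, here is a minimal evaluator for the P, C, R, D and U flag operations. It follows the usual flag-diacritic semantics and does not support negative (N) settings. The helper `flags_ok` is hypothetical. The positions of the flags along the paths in the test are assumptions, and the plain symbols are invented for illustration.

```python
import re

# @OP.FEATURE@ or @OP.FEATURE.VALUE@ for the operations handled here.
FLAG = re.compile(r"@([PCRDU])\.([^.@]+)(?:\.([^.@]+))?@")


def flags_ok(path):
    """Return True if the flag diacritics along `path` are satisfied."""
    env = {}
    for symbol in path:
        match = FLAG.fullmatch(symbol)
        if match is None:
            continue  # ordinary symbol, not a flag
        op, feature, value = match.groups()
        current = env.get(feature)
        if op == "P":    # positive set
            env[feature] = value
        elif op == "C":  # clear
            env.pop(feature, None)
        elif op == "R":  # require: feature set (to value, if given)
            if current is None or (value is not None and current != value):
                return False
        elif op == "D":  # disallow: feature set (to value, if given)
            if current is not None and (value is None or current == value):
                return False
        elif op == "U":  # unify: set if unset, else must match
            if current is None:
                env[feature] = value
            elif current != value:
                return False
    return True
```

In the test, a verb that sets NeedNoun is rejected at a final @D.NeedNoun.ON@ unless a nominalising step has cleared the flag with @C.NeedNoun@.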

    Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.

    Flag Explanation
    @U.Cap.Obl@ Allowing downcasing of derived names: deatnulasj.
    @U.Cap.Opt@ Allowing downcasing of derived names: deatnulasj.
    @C.ErrOrth@ tbw
    @D.ErrOrth.ON@ tbw
    @P.ErrOrth.ON@ tbw
    @R.ErrOrth.ON@ tbw
    @P.Pmatch.Loc@ Used on multi-token analyses; tell hfst-tokenise/pmatch where in the form/analysis the token should be split.
    @P.Pmatch.Backtrack@ Used on single-token analyses; tell hfst-tokenise/pmatch to backtrack by reanalysing the substrings before and after this point in the form (to find combinations of shorter analyses that would otherwise be missed)

    Pronoun flags

    Flag Explanation
    @U.pron.nom@ tbw
    @U.pron.gen@ tbw
    @U.pron.gen2@ tbw
    @U.pron.ill@ tbw
    @U.pron.par@ tbw
    @U.pron.par2@ tbw
    @U.pron.par3@ tbw
    @U.pron.ess@ tbw
    @U.pron.tra@ tbw
    @U.pron.ine@ tbw
    @U.pron.ela@ tbw
    @U.pron.all@ tbw
    @U.pron.ade@ tbw
    @U.pron.abl@ tbw
    @P.compound.block@ tbw
    @D.compound.block@ tbw
    Flag diacritic Explanation
    @U.number.one@ Flag used to give arabic numerals in smj different cases ;
    @U.number.two@ Flag used to give arabic numerals in smj different cases ;
    @U.number.three@ Flag used to give arabic numerals in smj different cases ;
    @U.number.four@ Flag used to give arabic numerals in smj different cases ;
    @U.number.five@ Flag used to give arabic numerals in smj different cases ;
    @U.number.six@ Flag used to give arabic numerals in smj different cases ;
    @U.number.seven@ Flag used to give arabic numerals in smj different cases ;
    @U.number.eight@ Flag used to give arabic numerals in smj different cases ;
    @U.number.nine@ Flag used to give arabic numerals in smj different cases ;
    @U.number.zero@ Flag used to give arabic numerals in smj different cases ;

    Basic lexica, pointing to the other lexicon files

    Here is the Root lexicon, pointing to all the parts of speech:

    LEXICON Root

    LEXICON Acronym pointing to:

    LEXICON Abbreviation pointing to:


    This (part of) documentation was generated from src/fst/morphology/root.lexc


    src-fst-morphology-stems-adjectives.lexc.md

    Kven language adjectives

    AdjectiveRoot is still at an early stage.

    TYPE 1: Two-syllable short-vowel stems

    Two-syllable short-vowel stems

    TYPE 2: Polysyllabic short-vowel stems

    Polysyllabic short-vowel stems

    Polysyllabic (consonant stems?)

    Polysyllabic (extra-consonant stems?)

    Polysyllabic -nen adjectives

    (Polysyllabic) -ton adjectives, odd number of syllables

    (Polysyllabic) -ton adjectives, even number of syllables

    TYPE 3: Long-vowel stems

    Long-vowel stems: (single-stem two-syllable words (contracted nominals?)?)

    Long-vowel consonant stems (contraction?)

    2. Vowel stems

    2.1 Two-syllable short-vowel stems

    2.2 Polysyllabic short-vowel stems

    2.3 Single-stem two-syllable words (contracted nominals?)

    3. Consonant stems

    3.1 Two-syllable

    3.2 Polysyllabic

    3.3 Contraction

    4. Extra-consonant stems

    4.1 Extra-consonant stems

    4.2 -nen adjectives

    4.3 -ton adjectives


    This (part of) documentation was generated from src/fst/morphology/stems/adjectives.lexc


    src-fst-morphology-stems-adverbs.lexc.md

    Following the Sannoi ja haamui list

    Adverbials too?

    TT: No. Only adverbs belong here. Adverbial is a syntactic category and gets its analysis in a different program (src/syntax/disambiguation.cg3)


    This (part of) documentation was generated from src/fst/morphology/stems/adverbs.lexc


    src-fst-morphology-stems-closed.lexc.md

    Closed parts of speech

    This file contains the closed parts of speech.

    The Particle lexicon contains only six words

    The Subjunction lexicon contains the most important words (koska, että, jos, ...)

    The Conjunction lexicon contains only a couple of words: joko - tai

    The Interjection lexicon contains only a couple of words: yäk, kääk, nono


    This (part of) documentation was generated from src/fst/morphology/stems/closed.lexc


    src-fst-morphology-stems-fkv-abbreviations.lexc.md

    File containing abbreviations

    This is a Sámi abbreviation list. It must be replaced.


    This (part of) documentation was generated from src/fst/morphology/stems/fkv-abbreviations.lexc


    src-fst-morphology-stems-nouns.lexc.md

    Nouns

    LEXICON NounRoot

    Nominal types, p. 147 (Eira's published book, 2014) (there are 3 of them)

    TYPE 1: Two-syllable short-vowel stems

    TYPE 2: Polysyllabic short-vowel stems

    TYPE 3: Long-vowel stems

    Lexicon names follow Eira

    2007, p. 87


    This (part of) documentation was generated from src/fst/morphology/stems/nouns.lexc


    src-fst-morphology-stems-numerals.lexc.md

    Kven numerals

    Numerals have been split into three sections: the compounding parts of cardinals, the compounding parts of ordinals, and the non-compounding ones:

    The compounding parts of cardinals are the number multiplier words.

    The suffixes only appear after cardinal multipliers

    The compounding parts of ordinals are the number multiplier words.

    The suffixes only appear after cardinal multipliers

    There is a set of numbers or corresponding expressions that work like them, but are not basic cardinals or ordinals:

    Numeral stem variation

    Numerals follow the same stem variation patterns as nouns, some of which are very rare or even extinct for nouns.


    This (part of) documentation was generated from src/fst/morphology/stems/numerals.lexc


    src-fst-morphology-stems-postpositions.lexc.md

    Postposition stems

    This file contains both the postpositions and their tag.

    LEXICON post is the +Po tag itself

    The postpositions themselves are in LEXICON Postposition

    Following Sannoi ja haamui (vesta/Varenki 2012).


    This (part of) documentation was generated from src/fst/morphology/stems/postpositions.lexc


    src-fst-morphology-stems-prepositions.lexc.md

    Prepositions

    Adpositions, i.e. postpositions and prepositions: examples from the Aikamatka word list

    pr for +Pr tag

    Preposition for +Pr tag


    This (part of) documentation was generated from src/fst/morphology/stems/prepositions.lexc


    src-fst-morphology-stems-pronouns.lexc.md

    Pronoun stems

    The inflection itself continues

    Personal pronouns

    Demonstrative pronouns

    Interrogative pronouns

    Relative pronouns

    Reflexive and reciprocal pronouns

    Indefinite pronouns/quantifier pronouns


    This (part of) documentation was generated from src/fst/morphology/stems/pronouns.lexc


    src-fst-morphology-stems-propernouns.lexc.md

    Propernoun lexicon for Kven

    LEXICON ProperNoun is an experimental lexicon


    This (part of) documentation was generated from src/fst/morphology/stems/propernouns.lexc


    src-fst-morphology-stems-verbs.lexc.md

    Verb stems

    Overview:

    Lexicon VerbRoot

    v_v2: now all v3_ise stems end in s, in order to have aukasevat as a possible form. Unexpected results: aukasseeva, aukaisseevat. Missing results: aukasevat. Unexpected results: aukasseevat.

    huokata:huokka v43_odd ; puheta:puhke v43_odd ;


    This (part of) documentation was generated from src/fst/morphology/stems/verbs.lexc


    src-fst-phonetics-txt2ipa.xfscript.md

    retroflex plosive, voiceless       t`    ʈ   0288, 648   (` = ASCII 096)
    retroflex plosive, voiced          d`    ɖ   0256, 598
    labiodental nasal                  F     ɱ   0271, 625
    retroflex nasal                    n`    ɳ   0273, 627
    palatal nasal                      J     ɲ   0272, 626
    velar nasal                        N     ŋ   014B, 331
    uvular nasal                       N\    ɴ   0274, 628

    bilabial trill                     B\    ʙ   0299, 665
    uvular trill                       R\    ʀ   0280, 640
    alveolar tap                       4     ɾ   027E, 638
    retroflex flap                     r`    ɽ   027D, 637
    bilabial fricative, voiceless      p\    ɸ   0278, 632
    bilabial fricative, voiced         B     β   03B2, 946
    dental fricative, voiceless        T     θ   03B8, 952
    dental fricative, voiced           D     ð   00F0, 240
    postalveolar fricative, voiceless  S     ʃ   0283, 643
    postalveolar fricative, voiced     Z     ʒ   0292, 658
    retroflex fricative, voiceless     s`    ʂ   0282, 642
    retroflex fricative, voiced        z`    ʐ   0290, 656
    palatal fricative, voiceless       C     ç   00E7, 231
    palatal fricative, voiced          j\    ʝ   029D, 669
    velar fricative, voiced            G     ɣ   0263, 611
    uvular fricative, voiceless        X     χ   03C7, 967
    uvular fricative, voiced           R     ʁ   0281, 641
    pharyngeal fricative, voiceless    X\    ħ   0127, 295
    pharyngeal fricative, voiced       ?\    ʕ   0295, 661
    glottal fricative, voiced          h\    ɦ   0266, 614

    alveolar lateral fricative, vl.    K
    alveolar lateral fricative, vd.    K\

    labiodental approximant            P (or v\)
    alveolar approximant               r\
    retroflex approximant              r\`
    velar approximant                  M\

    retroflex lateral approximant      l`
    palatal lateral approximant        L
    velar lateral approximant          L\

    Clicks

    bilabial                           O\ (O = capital letter)
    dental                             |\
    (post)alveolar                     !\
    palatoalveolar                     =\
    alveolar lateral                   |\|\

    Ejectives, implosives

    ejective                           _>   e.g. ejective p: p_>
    implosive                          _<   e.g. implosive b: b_<

    Vowels

    close back unrounded               M
    close central unrounded            1
    close central rounded              }
    lax i                              I
    lax y                              Y
    lax u                              U

    close-mid front rounded            2
    close-mid central unrounded        @\
    close-mid central rounded          8
    close-mid back unrounded           7

    schwa                              @     ə

    open-mid front unrounded           E
    open-mid front rounded             9
    open-mid central unrounded         3
    open-mid central rounded           3\
    open-mid back unrounded            V
    open-mid back rounded              O

    ash (ae digraph)                   {
    open schwa (turned a)              6

    open front rounded                 &
    open back unrounded                A
    open back rounded                  Q

    Other symbols

    voiceless labial-velar fricative   W
    voiced labial-palatal approx.      H
    voiceless epiglottal fricative     H\
    voiced epiglottal fricative        <\
    epiglottal plosive                 >\

    alveolo-palatal fricative, vl.     s\
    alveolo-palatal fricative, voiced  z\
    alveolar lateral flap              l\
    simultaneous S and x               x\
    tie bar                            _

    Suprasegmentals

    primary stress                     "
    secondary stress                   %
    long                               :
    half-long                          :\
    extra-short                        _X
    linking mark                       -

    Tones and word accents

    level extra high                   _T
    level high                         _H
    level mid                          _M
    level low                          _L
    level extra low                    _B
    downstep                           !
    upstep                             ^ (caret, circumflex)

    contour, rising                    _R
    contour, falling                   _F
    contour, high rising               _H_T
    contour, low rising                _B_L
    contour, rising-falling            _R_F

    (NB Instead of being written as diacritics with _, all prosodic marks can alternatively be placed in a separate tier, set off by < >, as recommended for the next two symbols.)

    global rise                        <R>
    global fall                        <F>

    Diacritics

    voiceless                          _0 (0 = figure), e.g. n_0
    voiced                             _v
    aspirated                          _h
    more rounded                       _O (O = letter)
    less rounded                       _c
    advanced                           _+
    retracted                          _-
    centralized                        _"
    syllabic                           = (or _=), e.g. n= (or n_=)
    non-syllabic                       _^
    rhoticity                          `

    breathy voiced                     _t
    creaky voiced                      _k
    linguolabial                       _N
    labialized                         _w
    palatalized                        ' (or _j), e.g. t' (or t_j)
    velarized                          _G
    pharyngealized                     _?\

    dental                             _d
    apical                             _a
    laminal                            _m
    nasalized                          ~ (or _~), e.g. A~ (or A_~)
    nasal release                      _n
    lateral release                    _l
    no audible release                 _}

    velarized or pharyngealized        _e
    velarized l, alternatively         5
    raised                             _r
    lowered                            _o
    advanced tongue root               _A
    retracted tongue root              _q
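Several X-SAMPA symbols consist of a base letter plus a backslash or backtick, for example N versus N\ and s versus s`. Converting X-SAMPA to IPA therefore needs longest-match lookup. The sketch below uses a handful of pairs copied from the table and illustrates only that lookup. It does not claim to reflect how txt2ipa.xfscript itself is written, and the helper `xsampa_to_ipa` is hypothetical.

```python
# A few X-SAMPA -> IPA pairs from the table above (not a full mapping).
XSAMPA_TO_IPA = {
    "N": "ŋ", "N\\": "ɴ", "S": "ʃ", "Z": "ʒ", "T": "θ", "D": "ð",
    "C": "ç", "s`": "ʂ", "z`": "ʐ", "h\\": "ɦ", "@": "ə",
}
LONGEST = max(len(key) for key in XSAMPA_TO_IPA)


def xsampa_to_ipa(text):
    """Convert X-SAMPA to IPA by greedy longest match (hypothetical helper)."""
    out, i = [], 0
    while i < len(text):
        # Try the longest possible symbol first, e.g. "N\" before "N".
        for size in range(min(LONGEST, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in XSAMPA_TO_IPA:
                out.append(XSAMPA_TO_IPA[chunk])
                i += size
                break
        else:
            out.append(text[i])  # unmapped characters pass through
            i += 1
    return "".join(out)
```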


    This (part of) documentation was generated from src/fst/phonetics/txt2ipa.xfscript


    src-fst-transcriptions-transcriptor-abbrevs2text.lexc.md

    We describe here how abbreviations in Kven Finnish are read out, e.g. for text-to-speech systems.

    For example:


    This (part of) documentation was generated from src/fst/transcriptions/transcriptor-abbrevs2text.lexc


    tools-grammarcheckers-grammarchecker.cg3.md

    KVEN GRAMMAR CHECKER

    DELIMITERS

    TAGS AND SETS

    Tags

    This section lists all the tags inherited from the fst and used as tags in the syntactic analysis. The next section, Sets, contains sets defined on the basis of the tags listed here; those set names are not visible in the output.

    Beginning and end of sentence

    BOS EOS

    Parts of speech tags

    N A Adv V Pron CS CC CC-CS Po Pr Pcle Num Interj ABBR ACR CLB LEFT RIGHT WEB PPUNCT PUNCT

    COMMA ¶

    Tags for POS sub-categories

    Pers Dem Interr Indef Recipr Refl Rel Coll NomAg Prop Allegro Arab Romertall

    Tags for morphosyntactic properties

    Nom Acc Gen Ill Ine Com Ess Ess Tra Sg Pl

    Cmp/SplitR Cmp/SgNom Cmp/SgGen Cmp/SgGen PxSg1 PxSg2 PxSg3 PxDu1 PxDu2 PxDu3 PxPl1 PxPl2 PxPl3 Px

    Comp Superl Attr Ord Qst IV TV Prt Prs Ind Pot Cond Imprt ImprtII Sg1 Sg2 Sg3 Pl1 Pl2 Pl3 Inf ConNeg Neg PrfPrc VGen PrsPrc Ger Sup Actio VAbess

    Err/Orth

    Semantic tags

    Sem/Act Sem/Ani Sem/Atr Sem/Body Sem/Clth Sem/Domain Sem/Feat-phys Sem/Fem Sem/Group Sem/Lang Sem/Mal Sem/Measr Sem/Money Sem/Obj Sem/Obj-el Sem/Org Sem/Perc-emo Sem/Plc Sem/Sign Sem/State-sick Sem/Sur Sem/Time Sem/Txt

    HUMAN

    PROP-ATTR PROP-SUR

    TIME-N-SET

    Syntactic tags

    @+FAUXV @+FMAINV @-FAUXV @-FMAINV @-FSUBJ> @-F<OBJ @-FOBJ> @-FSPRED<OBJ @-F<ADVL @-FADVL> @-F<SPRED @-F<OPRED @-FSPRED> @-FOPRED> @>ADVL @ADVL< @<ADVL @ADVL> @ADVL @HAB> @<HAB @>N @Interj @N< @>A @P< @>P @HNOUN @INTERJ @>Num @Pron< @>Pron @Num< @OBJ @<OBJ @OBJ> @OPRED @<OPRED @OPRED> @PCLE @COMP-CS< @SPRED @<SPRED @SPRED> @SUBJ @<SUBJ @SUBJ> SUBJ SPRED OPRED @PPRED @APP @APP-N< @APP-Pron< @APP>Pron @APP-Num< @APP-ADVL< @VOC @CVP @CNP OBJ

    -OTHERS SYN-V @X ### Sets containing sets of lists and tags This part of the file lists a large number of sets based partly upon the tags defined above, and partly upon lexemes drawn from the lexicon. See the sourcefile itself to inspect the sets, what follows here is an overview of the set types. #### Sets for Single-word sets INITIAL #### Sets for word or not WORD NOT-COMMA #### Case sets ADLVCASE CASE-AGREEMENT CASE NOT-NOM NOT-GEN NOT-ACC #### Verb sets NOT-V #### Sets for finiteness and mood REAL-NEG MOOD-V NOT-PRFPRC #### Sets for person SG1-V SG2-V SG3-V DU1-V DU2-V DU3-V PL1-V PL2-V PL3-V #### Pronoun sets #### Adjectival sets and their complements #### Adverbial sets and their complements #### Sets of elements with common syntactic behaviour #### NP sets defined according to their morphosyntactic features #### The PRE-NP-HEAD family of sets These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression **WORD - premodifiers**. 
#### Postposition sets

#### Border sets and their complements

Grammarchecker rules begin here

### Grammarchecker sets

### Grammarchecker rules

#### Speller rules

#### Agreement rules

##### Sg1

**Agreement rule:** msyn-agr-other-sg1, *Mun puátá/puáđám*

**Agreement rule:** msyn-agr-other-sg1

##### Pl3

**Agreement rule:** msyn-agr-other-pl3, subject to the left, *Toh puátá/puátih.*

**Agreement rule:** msyn-agr-other-pl3, subject to the left, *Toh puátá/puátih.*

**Agreement rule:** msyn-agr-other-pl3

**Agreement rule:** msyn-agr-other-pl3

#### Agreement rules

PrfPrc Pl > PrfPrc Sg

Relative clauses between the noun and the finite verb

Prs Sg3 > Prs Pl3

##### Regular congruence rules

#### Negation verb rules

#### Postposition rules

#### L2 rules

#### NP internal rules

#### Punctuation rules

#### Spacing errors

* * *

This (part of) documentation was generated from [tools/grammarcheckers/grammarchecker.cg3](https://github.com/giellalt/lang-fkv/blob/main/tools/grammarcheckers/grammarchecker.cg3)

---

## tools-grammarcheckers-grc-disambiguator.cg3.md

## Disambiguator for Kven

### Sets

Sentence delimiters are the following: "<.>" "<...>" "<!>" "<?>" "<¶>"

#### Part-of-Speech

* N = noun
* A = adjective
* Num = numeral
* V = verb
* Adv = adverb
* Pcle = particle
* Pr = preposition
* Po = postposition
* Pron = pronoun
* Interj = interjection

#### Number

* Sg = singular
* Pl = plural
* Sg1 = singular, 1st person
* Sg2 = singular, 2nd person
* Sg3 = singular, 3rd person
* Pl1 = plural, 1st person
* Pl2 = plural, 2nd person
* Pl3 = plural, 3rd person
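The person/number tags listed above are what subject-verb agreement checks compare, such as the grammar checker's msyn-agr-other-sg1 and msyn-agr-other-pl3 rules documented earlier. The Python sketch below is a hypothetical simplification and not the CG-3 rules themselves. Each reading is a plain set of tags, only a personal-pronoun subject and a verb are compared, and the function names are invented for illustration.

```python
# Hypothetical simplification of a subject-verb agreement check.
# Readings are plain sets of tags; context, clause boundaries and
# remaining ambiguity (handled by the real CG-3 rules) are ignored.
from typing import Optional

PERSON_NUMBER = ("Sg1", "Sg2", "Sg3", "Pl1", "Pl2", "Pl3")


def person_number(reading: set) -> Optional[str]:
    """Return the person/number tag of a reading, or None if it has none."""
    for tag in PERSON_NUMBER:
        if tag in reading:
            return tag
    return None


def agreement_error(subject: set, verb: set) -> bool:
    """Flag a personal-pronoun subject whose verb carries a different person/number."""
    if not {"Pron", "Pers"} <= subject or "V" not in verb:
        return False
    subj_pn = person_number(subject)
    verb_pn = person_number(verb)
    # A verb without a person tag (e.g. an infinitive) is not checked.
    return subj_pn is not None and verb_pn is not None and subj_pn != verb_pn


# Sg1 subject with a Sg3 verb, then with a Sg1 verb
print(agreement_error({"Pron", "Pers", "Sg1", "Nom"}, {"V", "Ind", "Prs", "Sg3"}))  # True
print(agreement_error({"Pron", "Pers", "Sg1", "Nom"}, {"V", "Ind", "Prs", "Sg1"}))  # False
```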
#### Cases

* Nom
* Gen
* Acc
* Par
* Ine
* Ill
* Ela
* Ade
* Abe
* All
* Abl
* Ess
* Tra
* Ins
* Com
* SUBJ-CASE = Nom Par

#### Types

* Prop = proper noun
* Interr = interrogative
* Dem = demonstrative pronoun
* Rel = relative pronoun
* Relpronpl = "mikkä" and "jokka"
* Relpronsg = "mikä" and "joka"
* Interrpronpl = "kuka" and "mikä"
* Pers = personal pronoun
* Indef = indefinite pronoun
* Inf = infinitive
* ConNeg = connegative form (the verb form used with the negation verb)
* PrfPrc = perfect participle
* Imprt = imperative
* Act = active
* Neg = negation verb
* COMMA = comma
* Foc/kaan = focus clitic -kaan
* Sem/Fem = feminine proper noun

### Sets with more members

* WORD = all PoS
* NPMOD = these can modify a noun
* NPMODADV = NPMOD plus adverbs
* NOT-NPMOD = these cannot modify a noun
* NOT-NPMODADV = these cannot modify a noun and are not adverbs
* QVANT-ADV = e.g. paljon, vähän
* KUNKA = e.g. kunka, missä (adverbs that start a sentence)

Boundaries

* S-BOUNDARY = words that start a sentence

Verbs

* SV-BOUNDARY = words that start a sentence, plus finite verbs

### Disambiguation rules

#### Dialects

#### Early rules

* __person_test__ selects a finite verb if there is a Pron Pers to the left
* __adv_after_V__ selects an adverb if there is a verb to the right
* __prop_infrontof_kieli__ removes the proper-noun reading in front of kieli if the word can be something else, e.g. Kainun kieli
* **PropInit** removes the proper-noun reading at the beginning of a sentence if the word can be a CC or a Pr (e.g. Mutta)
* **PropNotInit** selects the proper-noun reading if the word is not at the beginning of a sentence

Possessive suffixes

Numeral phrases

#### Preposition/postposition/adverb rules

* **Prifgenpar** selects the preposition reading to the left of Gen or Par
* **Poifgenpar** selects the postposition reading to the right of Gen or Par
* **vasthaan**

### Rules for mapping @CVP and @CNP onto CC and CS

* **CVP** maps @CVP to CS and mutta
* **CNPifN** maps @CNP to CC between two N
* **CNPifInf** maps @CNP to CC between two Inf

### Case rules

#### Partitive Genitive

#### Illative

### Number rules

### More disambiguation rules

* **SgNotPl**

#### Elative

### Propernouns

### Verbs

#### Specific verbs

ei: negation verb

eli

### Adverbs

#### paljon

#### kerran

#### jälkhiin

### Adjectives

Conjunctions

### Subjunctions

että

jos

ko

sillä

### Pronouns

### Verb rules, Verbs

#### Infinitive

### Present Sg3

### Present Pl3 or Passive Imperative

* **Pl3ollaifplrelpronandplinterrpron** selects Pl3 if olla
* **Sg3ollaifplrelpronandplinterrpron** selects Sg3 if olla
* **Sg3ollainpretandperf** selects Sg3 if COPULAS
* **Sg3ollainpretandperf** selects Sg3 if COPULAS
* **Relpronandnotintterpron** selects Rel Sg if Interr
* **Relpronandnotintterpron** selects Rel Sg if Interr
* **interrpron** selects Interr if the sentence ends in ?
* **DifferenceBetweenNiitäImprtAndNiitäDemAndPersIfSubj** selects Pron Dem Pl or Pron Pers Pl3 when there is a finite verb to the right
* **paljonadvandnotpaljonoun** selects Adv if paljon
* **Relpronifitsanounoracommabeforeit** selects Rel Pl if N to the left
* **annaimperativeandnotannaname** removes Prop in *Anna se*
* **tulinounfromtuliprtsg3** selects V Sg
* **dempronandnotpronpers** selects Dem if A or N to the right
* **Imperativefromconneg** selects and removes ConNeg
* **ImperativeafterNeg** removes Imprt if pronoun
* **interrel** selects Interr of Rel if CS to the right
* **+FMAINV** maps @+FMAINV to the remaining finite verbs that are not AUX

### HNOUN MAPPING

* **@<ADVLcoor** (@<ADVL) for ADVLCASEAdv if @CNP to the left and ADVL to the left of it
* **X** maps X everywhere
* **REMOVE X** removes X whenever there is any other tag.
* WORDLEMMA = regex giving the lemma in question
* **errorth** removes Err/Orth if there is an analysis without Err/Orth with the same lemma

* * *

This (part of) documentation was generated from [tools/grammarcheckers/grc-disambiguator.cg3](https://github.com/giellalt/lang-fkv/blob/main/tools/grammarcheckers/grc-disambiguator.cg3)

---

## tools-tokenisers-tokeniser-disamb-gt-desc.pmscript.md

## Tokeniser for fkv

Usage:

```
$ make
$ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

Pmatch documentation: <https://github.com/hfst/hfst/wiki/HfstPmatch>

Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

* Punct contains ASCII punctuation marks
* The symbol after m-dash is soft-hyphen `U+00AD`
* The symbol following {•} is byte-order-mark / zero-width no-break space `U+FEFF`.

Whitespace contains ASCII white space and the List contains some Unicode white space characters:

* En Quad U+2000 to Zero-Width Joiner U+200D
* Narrow No-Break Space U+202F
* Medium Mathematical Space U+205F
* Word joiner U+2060

Apart from what's in our morphology, there are:

1. unknown word-like forms, and
2. unmatched strings.

We want to give 1) a match, but let 2) be treated specially by `hfst-tokenise -a`.

Unknowns are made of:

* lower-case ASCII
* upper-case ASCII
* select extended Latin symbols
* ASCII digits
* select symbols
* combining diacritics as individual symbols
* various symbols from the Private Use Area (probably Microsoft), so far:
  * U+F0B7 for "x in box"

### Unknown handling

Unknowns are tagged ?? and treated specially with `hfst-tokenise`. hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG: they get a default baseform equal to the wordform, but no tag to check, so it's safer to let hfst-tokenise handle them.
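The unknown-handling policy above can be illustrated with a small Python sketch of what the text says `hfst-tokenise --giella-cg` does with empty analyses. It models only the described behaviour and is not hfst-tokenise's code. Representing a token's analyses as a list of strings, and the analysis string `ja+CC`, are assumptions made for the example.

```python
# Minimal sketch of the empty-analysis policy described above. Representing a
# token's analyses as a list of strings is an assumption for this example;
# it models the described behaviour, not hfst-tokenise's implementation.
UNKNOWN_TAG = "??"  # marker named in the tokeniser documentation


def resolve_analyses(analyses: list) -> list:
    """Drop empty analyses; if nothing else is left, mark the token unknown."""
    non_empty = [a for a in analyses if a.strip()]
    if non_empty:
        return non_empty      # known token: empty analyses are removed
    return [UNKNOWN_TAG]      # only empty analyses (or none): treat as unknown


print(resolve_analyses(["", "ja+CC"]))  # ['ja+CC']
print(resolve_analyses([""]))           # ['??']
```

Leaving this decision to the tokeniser, rather than passing empty readings into CG, matches the reasoning in the text: an empty CG reading has a baseform but no tag that later rules could test.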
Finally, we mark as a token any sequence making up a:

* known word in context
* unknown (OOV) token in context
* sequence of word and punctuation
* URL in context

* * *

This (part of) documentation was generated from [tools/tokenisers/tokeniser-disamb-gt-desc.pmscript](https://github.com/giellalt/lang-fkv/blob/main/tools/tokenisers/tokeniser-disamb-gt-desc.pmscript)

---

## tools-tokenisers-tokeniser-gramcheck-gt-desc.pmscript.md

## Grammar checker tokenisation for fkv

Requires a recent version of HFST (3.10.0 / git revision >= 3aecdbc). Then just:

```
$ make
$ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

More usage examples:

```
$ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

Pmatch documentation: <https://github.com/hfst/hfst/wiki/HfstPmatch>

Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

* Punct contains ASCII punctuation marks
* The symbol after m-dash is soft-hyphen `U+00AD`
* The symbol following {•} is byte-order-mark / zero-width no-break space `U+FEFF`.
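The character class listed above (ASCII punctuation, the soft hyphen `U+00AD` and the byte-order mark `U+FEFF`) is matched with no context conditions, so these characters can be tokenised even when written directly against a word. The Python sketch below imitates only that splitting behaviour with a regular expression. It is an illustration under that simplification, not the pmscript definition, and the names `ATTACHABLE` and `split_attached` are hypothetical.

```python
# Minimal sketch: split ASCII punctuation, soft hyphen (U+00AD) and
# byte-order mark (U+FEFF) off as tokens of their own, even when they are
# glued to a word. The real pmatch tokeniser also consults the lexicon
# (abbreviations, ordinals such as "3."), which this sketch ignores.
import re
import string

ATTACHABLE = string.punctuation + "\u00AD\uFEFF"
_TOKEN = re.compile(
    "[" + re.escape(ATTACHABLE) + "]"            # one attachable character
    + "|[^" + re.escape(ATTACHABLE) + r"\s]+"    # or a run of other non-space characters
)


def split_attached(chunk: str) -> list:
    """Return the tokens of a chunk, with attachable characters split off."""
    return _TOKEN.findall(chunk)


print(split_attached("(gáfe)"))  # ['(', 'gáfe', ')']
```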
Whitespace contains ASCII white space and the List contains some Unicode white space characters:

* En Quad U+2000 to Zero-Width Joiner U+200D
* Narrow No-Break Space U+202F
* Medium Mathematical Space U+205F
* Word joiner U+2060

Apart from what's in our morphology, there are 1) unknown word-like forms, and 2) unmatched strings. We want to give 1) a match, but let 2) be treated specially by `hfst-tokenise -a`.

* select extended Latin symbols
* select symbols
* various symbols from the Private Use Area (probably Microsoft), so far:
  * U+F0B7 for "x in box"

TODO: Could use something like this, but built-ins don't include šžđčŋ:

Simply give an empty reading when something is unknown: hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG: they get a default baseform equal to the wordform, but no tag to check, so it's safer to let hfst-tokenise handle them.

Finally, we mark as a token any sequence making up a:

* known word in context
* unknown (OOV) token in context
* sequence of word and punctuation
* URL in context

* * *

This (part of) documentation was generated from [tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript](https://github.com/giellalt/lang-fkv/blob/main/tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript)

---

## tools-tokenisers-tokeniser-tts-cggt-desc.pmscript.md

## TTS tokenisation for smj

Requires a recent version of HFST (3.10.0 / git revision >= 3aecdbc). Then just:

```sh
make
echo "ja, ja" \
    | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

More usage examples:

```sh
echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa \
boasttu olmmoš, man mielde lahtuid." \
    | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" \
    | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
echo "márffibiillagáffe" \
    | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

Pmatch documentation: <https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstPmatch>

Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

* Punct contains ASCII punctuation marks
* The symbol after m-dash is soft-hyphen `U+00AD`
* The symbol following {•} is byte-order-mark / zero-width no-break space `U+FEFF`.

Whitespace contains ASCII white space and the List contains some Unicode white space characters:

* En Quad U+2000 to Zero-Width Joiner U+200D
* Narrow No-Break Space U+202F
* Medium Mathematical Space U+205F
* Word joiner U+2060

Apart from what's in our morphology, there are 1) unknown word-like forms, and 2) unmatched strings. We want to give 1) a match, but let 2) be treated specially by `hfst-tokenise -a`.

* select extended Latin symbols
* select symbols
* various symbols from the Private Use Area (probably Microsoft), so far:
  * U+F0B7 for "x in box"

TODO: Could use something like this, but built-ins don't include šžđčŋ:

Simply give an empty reading when something is unknown: hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG: they get a default baseform equal to the wordform, but no tag to check, so it's safer to let hfst-tokenise handle them.

Needs hfst-tokenise to output things differently depending on the tag they get.

* * *

This (part of) documentation was generated from [tools/tokenisers/tokeniser-tts-cggt-desc.pmscript](https://github.com/giellalt/lang-fkv/blob/main/tools/tokenisers/tokeniser-tts-cggt-desc.pmscript)
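Each tokeniser section above lists the same whitespace inventory: ASCII white space plus En Quad `U+2000` through Zero-Width Joiner `U+200D`, Narrow No-Break Space `U+202F`, Medium Mathematical Space `U+205F` and Word Joiner `U+2060`. Below is a minimal Python sketch of splitting on exactly that inventory. It assumes that "ASCII white space" means Python's `string.whitespace`, and the function names are hypothetical. The ordinary no-break space `U+00A0` is not in the listed set, so the sketch does not split on it.

```python
# Minimal sketch: split text on the whitespace inventory listed in the
# fkv tokeniser documentation. "ASCII white space" is assumed to mean
# Python's string.whitespace (space, \t, \n, \r, \v, \f).
import string

ASCII_WS = frozenset(string.whitespace)
EXTRA_WS = frozenset(
    [chr(cp) for cp in range(0x2000, 0x200E)]  # En Quad U+2000 .. Zero-Width Joiner U+200D
    + ["\u202F", "\u205F", "\u2060"]           # NNBSP, Medium Mathematical Space, Word Joiner
)


def is_tokeniser_space(ch: str) -> bool:
    """True if ch belongs to the listed whitespace inventory."""
    return ch in ASCII_WS or ch in EXTRA_WS


def split_on_spaces(text: str) -> list:
    """Split text into chunks separated by any run of listed whitespace."""
    chunks, current = [], []
    for ch in text:
        if is_tokeniser_space(ch):
            if current:
                chunks.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        chunks.append("".join(current))
    return chunks


print(split_on_spaces("ja\u2009ja\u202Fja"))  # ['ja', 'ja', 'ja']
```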
