Viena Karelian NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and language resources for the Viena Karelian language

View the project on GitHub giellalt/lang-krl

  • Viena Karelian language model documentation

    All doc-comment documentation in one large file.


    src-cg3-functions.cg3.md

    These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression WORD - premodifiers.

    The set NOT-NPMOD is used to find barriers between NPs. Typical usage: … (*1 N BARRIER NOT-NPMOD) …, meaning: scan to the first noun, ignoring anything that can be part of the noun phrase of that noun (i.e., “scan to the next NP head”).
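The barrier scan described above can be pictured with a small Python sketch. It is not CG-3 code and does not use the real sets from functions.cg3: the `PREMODIFIERS` set, the helper names and the sample sentences are all assumptions made for illustration, and the sketch tests the target before the barrier at each position.

```python
from typing import Callable, List, Optional, Set

# Hypothetical premodifier tags; the real sets live in src/cg3/functions.cg3.
PREMODIFIERS: Set[str] = {"A", "Num"}


def is_noun(tags: Set[str]) -> bool:
    """Target test: the cohort carries the N tag."""
    return "N" in tags


def is_not_npmod(tags: Set[str]) -> bool:
    """Barrier test: WORD minus premodifiers, i.e. no premodifier tag present."""
    return not (tags & PREMODIFIERS)


def scan_right(
    cohorts: List[Set[str]],
    start: int,
    target: Callable[[Set[str]], bool],
    barrier: Callable[[Set[str]], bool],
) -> Optional[int]:
    """Imitate (*1 target BARRIER barrier): scan rightwards from start + 1.

    Returns the index of the first cohort matching the target, or None if a
    barrier cohort is met first or the sentence ends.
    """
    for position in range(start + 1, len(cohorts)):
        tags = cohorts[position]
        if target(tags):
            return position
        if barrier(tags):
            return None
    return None


# Invented tag sequences: verb + adjective + numeral + noun, and verb + conjunction + noun.
print(scan_right([{"V"}, {"A"}, {"Num"}, {"N"}], 0, is_noun, is_not_npmod))
print(scan_right([{"V"}, {"CC"}, {"N"}], 0, is_noun, is_not_npmod))
```

In the first sequence the adjective and numeral are skipped because they can belong to the noun phrase; in the second the conjunction acts as a barrier, so no NP head is found.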

    These were the set types.

    HABITIVE MAPPING

    sma object

    SUBJ MAPPING - leftovers

    OBJ MAPPING - leftovers

    HNOUN MAPPING


    This (part of) documentation was generated from src/cg3/functions.cg3


    src-fst-morphology-affixes-adjectives.lexc.md

    Adjective inflection

    Viena Karelian adjectives inflect for comparison.


    This (part of) documentation was generated from src/fst/morphology/affixes/adjectives.lexc


    src-fst-morphology-affixes-nouns.lexc.md

    Noun inflection

    This file documents Viena Karelian noun inflection.


    This (part of) documentation was generated from src/fst/morphology/affixes/nouns.lexc


    src-fst-morphology-affixes-prefixes.lexc.md

    Prefixes

    Prefixes in Viena Karelian are bound to the beginning of other words.


    This (part of) documentation was generated from src/fst/morphology/affixes/prefixes.lexc


    src-fst-morphology-affixes-propernouns.lexc.md

    Proper noun inflection

    Viena Karelian proper nouns inflect in the same cases as regular nouns, but


    This (part of) documentation was generated from src/fst/morphology/affixes/propernouns.lexc


    src-fst-morphology-affixes-symbols.lexc.md

    Symbol affixes


    This (part of) documentation was generated from src/fst/morphology/affixes/symbols.lexc


    src-fst-morphology-affixes-verbs.lexc.md

    Viena Karelian Verb inflection

    The verb lexicon contains two groups of continuation lexica. One group, with names like VERB_KUUL/UO (in capital letters and indicating the stem), has analyses like the Finnish fst (without twolc). The other group has continuation lexica with names like verb, verb_frekv, verb_intr, etc. They have analyses more like the Kven and Meänkieli ones (with gradation and harmony as twolc processes).

    TODO: Clean up this and go for one of the two.

    Intermediate lexica, for now pointing to present tense only.

    LEXICON verb, LEXICON verb_deskr, LEXICON verb_fakt, LEXICON verb_fakt.kaus, … etc.; some 20 similar lexica.

    Morphological lexica

    Lexica pointing to final lexica

    LEXICON vinfl going to strong (no trigger) and weak (^WG trigger)

    +Act+Ind:^WG verb_weak_pres ;
    +Act+Ind: verb_strong_pres ;
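How continuation lexica chain analysis tags and surface material together can be sketched in Python. Only the two vinfl entries come from the text above; the contents of verb_weak_pres and verb_strong_pres (the `+Sg1:n` and `+Sg3:u` pairs) are invented placeholders, not the real Viena Karelian endings, and the `^WG` trigger is left in the lower string because in the real setup it is consumed later by the twolc rules.

```python
from typing import Dict, Iterator, List, Tuple

# (upper side, lower side, continuation lexicon)
Entry = Tuple[str, str, str]

LEXICA: Dict[str, List[Entry]] = {
    # From the documentation: weak grade gets the ^WG trigger, strong grade none.
    "vinfl": [
        ("+Act+Ind", "^WG", "verb_weak_pres"),
        ("+Act+Ind", "", "verb_strong_pres"),
    ],
    # Hypothetical contents, for illustration only.
    "verb_weak_pres": [("+Sg1", "n", "K")],
    "verb_strong_pres": [("+Sg3", "u", "K")],
    # K adds clitics or goes to #; here it only goes to #.
    "K": [("", "", "#")],
}


def expand(lexicon: str, upper: str = "", lower: str = "") -> Iterator[Tuple[str, str]]:
    """Follow continuation lexica depth-first and yield (analysis, surface) pairs."""
    if lexicon == "#":
        yield upper, lower
        return
    for entry_upper, entry_lower, continuation in LEXICA[lexicon]:
        yield from expand(continuation, upper + entry_upper, lower + entry_lower)


for analysis, surface in expand("vinfl"):
    print(analysis, "->", surface)
```

The weak path carries `^WG` on the surface side and the strong path carries nothing, which is the only difference the vinfl lexicon introduces.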

    Final lexica (only pointing to K)

    LEXICON verb_weak_pres

    LEXICON verb_strong_pres

    LEXICON verb_3pl_pres

    LEXICON VERB_CONNEG_0

    LEXICON VERB_PAST_3SG_0

    LEXICON VERB_3SG_0_BACK

    LEXICON VERB_3SG_U

    LEXICON VERB_3SG_Y

    LEXICON VERB_PASSIVE_H

    LEXICON VERB_PASSIVE_H_BACK

    LEXICON VERB_PASSIVE_H_FRONT

    LEXICON VERB_PAST_PASSIVE_H

    LEXICON VERB_PAST_PASSIVE_TIH_BACK

    LEXICON VERB_PAST_PASSIVE_TIH_FRONT

    LEXICON VERB_IMPV_KAH

    LEXICON VERB_IMPVPL1_KA

    LEXICON VERB_IMPVPL2_KUA

    LEXICON VERB_IMPV_KÄH

    LEXICON VERB_IMPVPL1_KÄ

    LEXICON VERB_IMPVPL2_KYÄ

    LEXICON VERB_IMPV

    LEXICON VERB_INF_MÄ

    LEXICON VERB_INF_MA

    LEXICON VERB_INF_TA

    LEXICON VERB_INF_TÄ

    FIXME: not sure LEXICON VERB_INF_AS

    LEXICON VERB_INF_A

    FIXME: ger or sup or some other LEXICON VERB_INF_EN

    LEXICON VERB_INF_Ä

    LEXICON VERB_INF_Ö

    LEXICON VERB_INF_O

    LEXICON VERB_INF_E

    LEXICON VERB_PCP_TU

    LEXICON VERB_PCP_TY

    LEXICON VERB_PCP_TAVA

    LEXICON VERB_PCP_TÄVÄ

    LEXICON VERB_PCP_N

    LEXICON VERB_PCP_N_BACK

    LEXICON VERB_PCP_N_FRONT

    LEXICON VERB_PCP_UN

    LEXICON PCP_UN verbal adjective kuollun, kuollehet

    LEXICON VERB_PCP_YN

    LEXICON PCP_YN verbal adjective nähnyn, nähnehet

    LEXICON VERB_PRES_BACK

    LEXICON VERB_PRES_FRONT

    LEXICON VERB_PAST_BACK

    LEXICON VERB_PAST_FRONT

    LEXICON VERB_COND

    LEXICON VERB_COND_FRONT

    LEXICON VERB_COND_BACK

    LEXICON VERB_COND_PASSIVE_TAIS

    LEXICON VERB_COND_PASSIVE_TÄIS

    LEXICON AUX_IMPVSP3_KAH

    LEXICON AUX_IMPVPL1_KA

    LEXICON AUX_IMPVPL2_KUA

    LEXICON AUX_PCP FIXME

    LEXICON AUX_3SG_PI

    LEXICON AUX_3SG_0

    LEXICON AUX_3SG_Y

    LEXICON AUX_CONNEG_0

    LEXICON AUX_PRES_FRONT

    LEXICON AUX_PRES_BACK

    LEXICON AUX_PAST_WEAK_BACK

    LEXICON AUX_PAST_3SG_0

    LEXICON AUX_PCP_UN


    This (part of) documentation was generated from src/fst/morphology/affixes/verbs.lexc


    src-fst-morphology-phonology.twolc.md

    The Viena Karelian morphophonological/twolc rules file

    This file documents the phonology.twolc file

    Alphabets and sets

    Alphabet

    Sets

    Rules

    Rule: Vowel harmony basic

    Tests:


    This (part of) documentation was generated from src/fst/morphology/phonology.twolc


    src-fst-morphology-root.lexc.md

    Viena Karelian morphological analyser

    This file documents the Viena Karelian fst/root.lexc file

    Tags and other multicharacter symbols

    Definitions for Multichar_Symbols

    Analysis symbols

    The morphological analyses of wordforms for the Viena Karelian language are presented in this system in terms of the following symbols. (It is highly suggested to follow existing standards when adding new tags).

    The parts-of-speech are:

    The parts of speech are further split up into:

    The usage extents are marked using the following tags:

    The nominals are inflected in the following Case and Number

    The possession is marked as such:

    Other verb forms are

    Question and Focus particles:

    Semantics are classified with

    Derivations are classified under the morphophonetic form of the suffix, the source and target part-of-speech.

    Morphophonology

    To represent phonological variation in word forms we use the following symbols in the lexicon files:

    And the following triggers to control variation:

    Flag diacritics

    We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics. They only allow compounds with verbs if the verb is further derived into a noun again:

    Flag Explanation
    @P.NeedNoun.ON@ (Dis)allow compounds with verbs unless nominalised
    @D.NeedNoun.ON@ (Dis)allow compounds with verbs unless nominalised
    @C.NeedNoun@ (Dis)allow compounds with verbs unless nominalised
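The effect of such flags along a path can be sketched in Python, assuming the usual flag-diacritic semantics: P sets a feature, D blocks the path if the feature has the given value, C clears it, and U unifies. The `accepts` function and the example paths are assumptions made for illustration; where the flags actually sit in the lexicon is not shown in this documentation.

```python
import re
from typing import Dict, Iterable

# Matches flags such as @P.NeedNoun.ON@ or @C.NeedNoun@.
FLAG = re.compile(r"@([PDCU])\.([^.@]+)(?:\.([^.@]+))?@")


def accepts(path: Iterable[str]) -> bool:
    """Return True if the flag diacritics met along the path are compatible."""
    state: Dict[str, str] = {}
    for symbol in path:
        match = FLAG.fullmatch(symbol)
        if match is None:
            continue  # ordinary symbol, not a flag
        operator, feature, value = match.groups()
        if operator == "P":  # positive set
            state[feature] = value
        elif operator == "C":  # clear
            state.pop(feature, None)
        elif operator == "D":  # disallow
            if feature in state and (value is None or state[feature] == value):
                return False
        elif operator == "U":  # unify
            if feature in state and state[feature] != value:
                return False
            state[feature] = value
    return True


# Hypothetical paths: a verb sets NeedNoun, a nominalising suffix clears it,
# and the point where the word may end disallows NeedNoun.ON.
blocked = ["@P.NeedNoun.ON@", "verb", "@D.NeedNoun.ON@"]
allowed = ["@P.NeedNoun.ON@", "verb", "@C.NeedNoun@", "noun-suffix", "@D.NeedNoun.ON@"]
print(accepts(blocked), accepts(allowed))
```

The same function covers the U flags listed further down: two different values for one feature on the same path make the path fail.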

    For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.

    Flag Explanation
    @P.CmpFrst.FALSE@ Require that words tagged as such only appear first
    @D.CmpPref.TRUE@ Block such words from entering ENDLEX
    @P.CmpPref.FALSE@ Block these words from making further compounds
    @D.CmpLast.TRUE@ Block such words from entering R
    @D.CmpNone.TRUE@ Combines with the next tag to prohibit compounding
    @U.CmpNone.FALSE@ Combines with the prev tag to prohibit compounding
    @P.CmpOnly.TRUE@ Sets a flag to indicate that the word has passed R
    @D.CmpOnly.FALSE@ Disallow words coming directly from root.

    Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.

    Flag Explanation
    @U.Cap.Obl@ Allowing downcasing of derived names: deatnulasj.
    @U.Cap.Opt@ Allowing downcasing of derived names: deatnulasj.
    Flag diacritic Explanation
    @U.number.one@ Flag used to give arabic numerals in smj different cases ;
    @U.number.two@ Flag used to give arabic numerals in smj different cases ;
    @U.number.three@ Flag used to give arabic numerals in smj different cases ;
    @U.number.four@ Flag used to give arabic numerals in smj different cases ;
    @U.number.five@ Flag used to give arabic numerals in smj different cases ;
    @U.number.six@ Flag used to give arabic numerals in smj different cases ;
    @U.number.seven@ Flag used to give arabic numerals in smj different cases ;
    @U.number.eight@ Flag used to give arabic numerals in smj different cases ;
    @U.number.nine@ Flag used to give arabic numerals in smj different cases ;
    @U.number.zero@ Flag used to give arabic numerals in smj different cases ;
    @P.number.one@ Flag used to give arabic numerals in smj different cases ;
    @P.number.two@ Flag used to give arabic numerals in smj different cases ;
    @P.number.three@ Flag used to give arabic numerals in smj different cases ;
    @P.number.four@ Flag used to give arabic numerals in smj different cases ;
    @P.number.five@ Flag used to give arabic numerals in smj different cases ;
    @P.number.six@ Flag used to give arabic numerals in smj different cases ;
    @P.number.seven@ Flag used to give arabic numerals in smj different cases ;
    @P.number.eight@ Flag used to give arabic numerals in smj different cases ;
    @P.number.nine@ Flag used to give arabic numerals in smj different cases ;
    @P.number.ten@ Flag used to give arabic numerals in smj different cases ;
    @P.number.zero@ Flag used to give arabic numerals in smj different cases ;

    The Root and K lexica

    LEXICON Root is where it all begins. Word forms in Viena Karelian start from the lexeme roots of the basic word classes, or optionally from prefixes:

    LEXICON K adds clitics or goes to #


    This (part of) documentation was generated from src/fst/morphology/root.lexc


    src-fst-morphology-stems-adjectives.lexc.md

    Viena Karelian Adjectives

    This file documents the stems/adjectives.lexc file for adjective stems. The file points to the affixes/adjectives.lexc file.

    LEXICON Adjectives

    aito+A:ai ADJ_AI/TO ; etc.

    maybe like fin: eri, no infl.


    This (part of) documentation was generated from src/fst/morphology/stems/adjectives.lexc


    src-fst-morphology-stems-adpositions.lexc.md

    Viena Karelian adpositions

    adpositions


    This (part of) documentation was generated from src/fst/morphology/stems/adpositions.lexc


    src-fst-morphology-stems-adverbs.lexc.md

    Viena Karelian adverb stems

    ADV


    This (part of) documentation was generated from src/fst/morphology/stems/adverbs.lexc


    src-fst-morphology-stems-conjunctions.lexc.md

    Viena Karelian conjunctions

    conjunctions


    This (part of) documentation was generated from src/fst/morphology/stems/conjunctions.lexc


    src-fst-morphology-stems-interjections.lexc.md

    Viena Karelian interjections

    @LEXNAME*


    This (part of) documentation was generated from src/fst/morphology/stems/interjections.lexc


    src-fst-morphology-stems-nouns.lexc.md

    Viena Karelian Nouns

    This file documents the Viena Karelian noun stem file. The first part of the file contains stems, the second contains the intermediate morphology.

    The stem list

    Nouns

    afrikkalaine+N:afrikkalai NOUN_ELAVUTTAMI/NE ;
    aihe+N:aihe NOUN_AIH/E ;
    aikakaušlehti+N:aikakaus#leh NOUN_LEH/TI ;
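The anatomy of these stem entries (lemma plus tag on the upper side, stem on the lower side, then the continuation lexicon) can be shown with a short Python sketch. `parse_entry` is a hypothetical helper, not part of the GiellaLT tooling, and it deliberately ignores lexc `%` escapes and comments.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    upper: str         # analysis side, e.g. lemma plus part-of-speech tag
    lower: str         # surface side, here the stem handed to the continuation
    continuation: str  # name of the continuation lexicon


def parse_entry(line: str) -> Entry:
    """Split a simple lexc entry of the form 'upper:lower CONTINUATION ;'."""
    body = line.strip().rstrip(";").strip()
    pair, continuation = body.rsplit(None, 1)
    upper, separator, lower = pair.partition(":")
    if not separator:
        lower = upper  # no colon: both sides are identical
    return Entry(upper, lower, continuation)


for raw in (
    "afrikkalaine+N:afrikkalai NOUN_ELAVUTTAMI/NE ;",
    "aihe+N:aihe NOUN_AIH/E ;",
    "aikakaušlehti+N:aikakaus#leh NOUN_LEH/TI ;",
):
    print(parse_entry(raw))
```

The continuation name encodes the paradigm: the part before the slash in NOUN_AIH/E appears to mirror the stem that stays constant and the part after it the material supplied by the continuation lexicon, although that reading is an inference from the names.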

    The list of intermediate lexica

    These lexica point to the morphology in affixes/nouns.lexc

    LEXICON a_i_noun

    LEXICON a_i_u_noun

    LEXICON a_i_ä_noun

    Intermediate lexicon, approach 2 (todo: unify)

    LEXICON rihm/a__noun

    LEXICON NOUN_MÄT/ÄŠ

    LEXICON NOUN_KIN/NAŠ

    LEXICON NOUN_EHOK/AŠ

    LEXICON NOUN_KYNNY/Š


    This (part of) documentation was generated from src/fst/morphology/stems/nouns.lexc


    src-fst-morphology-stems-numerals.lexc.md

    Numerals

    Numerals are analysed in the same way as those for Finnish.

    LEXICON Numerals

    LEXICON cardinal

    LEXICON cardinal_vaill

    LEXICON ordinal

    LEXICON NUM_Y/KSI

    LEXICON NUM_KA/KŠI

    LEXICON NUM_KOLM/E

    … etc.


    This (part of) documentation was generated from src/fst/morphology/stems/numerals.lexc


    src-fst-morphology-stems-particles.lexc.md

    Viena Karelian particles

    LEXICON Particles gives the particles.

    LEXICON particle gives tag

    LEXICON particle_vahv gives the same tag, actually.


    This (part of) documentation was generated from src/fst/morphology/stems/particles.lexc


    src-fst-morphology-stems-pronouns.lexc.md

    Viena Karelian Pronouns

    This file lists pronoun stems.

    LEXICON Pronouns

    LEXICON PRON_MI/NÄ

    LEXICON PRON_MI/E

    LEXICON PRON_H/IÄN

    LEXICON PRON_M/YÖ

    LEXICON PRON_TÄ/MÄ

    LEXICON PRON_NÄ/MÄ

    LEXICON PRON_T/UO

    LEXICON PRON_N/UO

    LEXICON PRON_Š/E

    LEXICON PRON_N/E

    LEXICON PRON_IČ/E

    LEXICON PRON_KAI/KKI

    LEXICON PRON_KU/DAI

    LEXICON PRON_MOLOM/PI

    LEXICON PRON_JOKAHI/NI

    LEXICON PRON_KUMPA/INE

    LEXICON PRON_KE/N

    LEXICON PRON_MI

    LEXICON PRON_KU

    LEXICON PRON_JOKA

    LEXICON PRON_MON/I

    LEXICON PRON_MU/U

    LEXICON PRON_TOI/NI


    This (part of) documentation was generated from src/fst/morphology/stems/pronouns.lexc


    src-fst-morphology-stems-propernouns.lexc.md

    Viena Karelian Propernouns

    The file stems/propernouns.lexc lists just that.

    LEXICON PROPN


    This (part of) documentation was generated from src/fst/morphology/stems/propernouns.lexc


    src-fst-morphology-stems-verbs.lexc.md

    Documenting the Viena Karelian Verb lexicon.

    The verb lexicon contains two groups of continuation lexica. One group, with names like VERB_KUUL/UO (in capital letters and indicating the stem), has analyses like the Finnish fst (without twolc). The other group has continuation lexica with names like verb, verb_frekv, verb_intr, etc. They have analyses more like the Kven and Meänkieli ones (with gradation and harmony as twolc processes).

    TODO: Clean up this and go for one of the two.

    LEXICON Verbs contains the stem list

    The second list of verbs

    This contains just the infinitive and points to defective paradigms for now.

    The intermediate lexica

    These lexica redirect the stem to different person-number sublexica.

    LEXICON kavota_katuo_verb … This lexicon does not work, as both stems go to the same contlex.

    LEXICON proššai(k)koa_verb_vaill

    LEXICON sevota_verb

    LEXICON stavaikkoa_verb_vaill

    LEXICON tavai(k)koa_tavaite_verb_vaill

    LEXICON tuta_verb

    LEXICON viyhtie_verb

    LEXICON voulie_vuolie_verb

    LEXICON kirjut/tua__verb

    LEXICON VERB_J/IÄHÄ

    LEXICON VERB_V/IIJÄ

    LEXICON VERB_Š/YYVVÄ

    LEXICON VERB_L/UUVVA

    LEXICON VERB_Š/YYVÄ

    LEXICON VERB_J/UUVA

    LEXICON VERB_PIÄS/TÄ

    LEXICON VERB_KÄ/YVÄ

    LEXICON VERB_S/UAHA

    LEXICON VERB_MIET/TIE

    LEXICON VERB_LÄ/HTIE

    LEXICON VERB_T/UUVVA

    LEXICON VERB_TU/LLA

    LEXICON VERB_KUOL/LA

    LEXICON VERB_PA/ISSA

    LEXICON VERB_NOU/ŠŠA

    LEXICON VERB_PAN/NA

    LEXICON VERB_MÄN/NÄ

    LEXICON VERB_TARVI/TA

    LEXICON VERB_MERKI/TÄ

    LEXICON VERB_STARINOI/JA

    LEXICON VERB_IKÄVÖI/JÄ

    LEXICON VERB_ŠAN/OA

    LEXICON VERB_MUISTEL/EHTOA

    LEXICON VERB_KAŠV/OA

    LEXICON VERB_AL/KOA

    LEXICON VERB_AN/TOA

    LEXICON VERB_PAIS/TOA

    LEXICON VERB_KAČ/ČUO

    LEXICON VERB_KAČ/ČOA

    LEXICON VERB_KOROŠ/TOA

    LEXICON VERB_VALMIS/TOA

    LEXICON VERB_TAH/TOA

    LEXICON VERB_KARJ/UO

    LEXICON VERB_TAP/PUA

    LEXICON VERB_TAP/POA

    LEXICON VERB_SOIT/TOA

    LEXICON VERB_OT/TOA

    LEXICON VERB_TANŠŠI/E

    LEXICON VERB_EČ/ČIE

    LEXICON VERB_POIMI/E

    LEXICON VERB_IT/KIE

    LEXICON VERB_KITK/IE

    LEXICON VERB_LAŠ/KIE

    LEXICON VERB_KÄŠ/KIE

    LEXICON VERB_OP/PIE

    LEXICON VERB_ŠO/PIE

    LEXICON VERB_TUN/TIE

    LEXICON VERB_LUA/TIE

    LEXICON VERB_TI/ETEÄ

    LEXICON VERB_TÄYT/TYÄ

    LEXICON VERB_TYÖN/TYÄ

    LEXICON VERB_NÄYT/TYÄ

    LEXICON VERB_VIČER/TEÄ

    LEXICON VERB_PIÄT/TEÄ

    LEXICON VERB_TYÖN/TEÄ

    LEXICON VERB_LÖY/TEÄ

    LEXICON VERB_JÄRJEŠ/TEÄ

    LEXICON VERB_PI/TYÄ

    LEXICON VERB_PIÄŠ/TYÄ

    LEXICON VERB_OPAŠ/TUA

    LEXICON VERB_OPAŠ/TUO

    LEXICON VERB_TOIV/UO

    LEXICON VERB_VOIT/TUA

    LEXICON VERB_KAN/TUA

    LEXICON VERB_AUT/TUA

    LEXICON VERB_RUA/TUA

    LEXICON VERB_RUA/TUO

    LEXICON VERB_TAH/TUO

    LEXICON VERB_KUUL/UO

    LEXICON VERB_LOP/PUO

    LEXICON VERB_RYH/TYÖ

    LEXICON VERB_PISY/Ö

    LEXICON VERB_ILMEŠ/TYÖ

    LEXICON VERB_IS/TUO

    LEXICON VERB_RIK/KUO

    LEXICON VERB_ROIK/KUO

    LEXICON VERB_SAT/TUO

    LEXICON VERB_KER/TUO

    LEXICON VERB_ŠI/TUO

    LEXICON VERB_KUČ/ČUO

    LEXICON VERB_VAI/PUO

    LEXICON VERB_KER/ÄTÄ

    LEXICON VERB_KER/ITÄ

    LEXICON VERB_N/ÄHÄ

    LEXICON VERB_AV/ATA

    LEXICON VERB_RU/VETA

    LEXICON VERB_KERÄ/TÄ

    LEXICON VERB_LEIK/ATA

    LEXICON VERB_ŠAL/VATA

    LEXICON VERB_ŠAL/VATA

    LEXICON VERB_NIM/ETÄ

    LEXICON VERB_TYK/YTÄ CHECKME 20250831

    LEXICON VERB_TYK/ÄTÄ

    LEXICON VERB_HYREYTY/Ä

    LEXICON VERB_PUREŠKEL/LA

    LEXICON VERB_AJAT/ELLA

    LEXICON VERB_LEVÄHEL/LÄ

    LEXICON VERB_OM/MELLA

    LEXICON VERB_O/LLA

    LEXICON AUX_O/LLA

    LEXICON AUX_E/I

    LEXICON AUX_VO/IJA

    LEXICON AUX_PI/TEÄ


    This (part of) documentation was generated from src/fst/morphology/stems/verbs.lexc


    src-fst-phonetics-txt2ipa.xfscript.md

    Description  X-SAMPA  IPA  Unicode (hex, decimal)
    retroflex plosive, voiceless  t`  ʈ  0288, 648  (` = ASCII 096)
    retroflex plosive, voiced  d`  ɖ  0256, 598
    labiodental nasal  F  ɱ  0271, 625
    retroflex nasal  n`  ɳ  0273, 627
    palatal nasal  J  ɲ  0272, 626
    velar nasal  N  ŋ  014B, 331
    uvular nasal  N\  ɴ  0274, 628

    bilabial trill  B\  ʙ  0299, 665
    uvular trill  R\  ʀ  0280, 640
    alveolar tap  4  ɾ  027E, 638
    retroflex flap  r`  ɽ  027D, 637
    bilabial fricative, voiceless  p\  ɸ  0278, 632
    bilabial fricative, voiced  B  β  03B2, 946
    dental fricative, voiceless  T  θ  03B8, 952
    dental fricative, voiced  D  ð  00F0, 240
    postalveolar fricative, voiceless  S  ʃ  0283, 643
    postalveolar fricative, voiced  Z  ʒ  0292, 658
    retroflex fricative, voiceless  s`  ʂ  0282, 642
    retroflex fricative, voiced  z`  ʐ  0290, 656
    palatal fricative, voiceless  C  ç  00E7, 231
    palatal fricative, voiced  j\  ʝ  029D, 669
    velar fricative, voiced  G  ɣ  0263, 611
    uvular fricative, voiceless  X  χ  03C7, 967
    uvular fricative, voiced  R  ʁ  0281, 641
    pharyngeal fricative, voiceless  X\  ħ  0127, 295
    pharyngeal fricative, voiced  ?\  ʕ  0295, 661
    glottal fricative, voiced  h\  ɦ  0266, 614

    alveolar lateral fricative, vl.  K
    alveolar lateral fricative, vd.  K\

    labiodental approximant  P (or v\)
    alveolar approximant  r\
    retroflex approximant  r\`
    velar approximant  M\

    retroflex lateral approximant  l`
    palatal lateral approximant  L
    velar lateral approximant  L\

    Clicks

    bilabial  O\ (O = capital letter)
    dental  |\
    (post)alveolar  !\
    palatoalveolar  =\
    alveolar lateral  |\|\

    Ejectives, implosives

    ejective  _>  e.g. ejective p  p_>
    implosive  _<  e.g. implosive b  b_<

    Vowels

    close back unrounded  M
    close central unrounded  1
    close central rounded  }
    lax i  I
    lax y  Y
    lax u  U

    close-mid front rounded  2
    close-mid central unrounded  @\
    close-mid central rounded  8
    close-mid back unrounded  7

    schwa  ə  @

    open-mid front unrounded  E
    open-mid front rounded  9
    open-mid central unrounded  3
    open-mid central rounded  3\
    open-mid back unrounded  V
    open-mid back rounded  O

    ash (ae digraph)  {
    open schwa (turned a)  6

    open front rounded  &
    open back unrounded  A
    open back rounded  Q

    Other symbols

    voiceless labial-velar fricative  W
    voiced labial-palatal approx.  H
    voiceless epiglottal fricative  H\
    voiced epiglottal fricative  <\
    epiglottal plosive  >\

    alveolo-palatal fricative, vl.  s\
    alveolo-palatal fricative, voiced  z\
    alveolar lateral flap  l\
    simultaneous S and x  x\
    tie bar  _

    Suprasegmentals

    primary stress  "
    secondary stress  %
    long  :
    half-long  :\
    extra-short  _X
    linking mark  -\

    Tones and word accents

    level extra high  _T
    level high  _H
    level mid  _M
    level low  _L
    level extra low  _B
    downstep  !
    upstep  ^ (caret, circumflex)

    contour, rising  _R
    contour, falling  _F
    contour, high rising  _H_T
    contour, low rising  _B_L
    contour, rising-falling  _R_F

    (NB Instead of being written as diacritics with _, all prosodic marks can alternatively be placed in a separate tier, set off by < >, as recommended for the next two symbols.)

    global rise  <R>
    global fall  <F>

    Diacritics

    voiceless  _0 (0 = figure), e.g. n_0
    voiced  _v
    aspirated  _h
    more rounded  _O (O = letter)
    less rounded  _c
    advanced  _+
    retracted  _-
    centralized  _"
    syllabic  = (or _=), e.g. n= (or n_=)
    non-syllabic  _^
    rhoticity  `

    breathy voiced  _t
    creaky voiced  _k
    linguolabial  _N
    labialized  _w
    palatalized  ' (or _j), e.g. t' (or t_j)
    velarized  _G
    pharyngealized  _?\

    dental  _d
    apical  _a
    laminal  _m
    nasalized  ~ (or _~), e.g. A~ (or A_~)
    nasal release  _n
    lateral release  _l
    no audible release  _}

    velarized or pharyngealized  _e
    velarized l, alternatively  5
    raised  _r
    lowered  _o
    advanced tongue root  _A
    retracted tongue root  _q
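A conversion based on a table like this can be sketched in Python as a longest-match replacement. The mapping below covers only a handful of the rows that list an IPA character and code point; the retroflex keys are written with a trailing backtick (the ASCII 096 character mentioned in the table), following the usual X-SAMPA convention. `XSAMPA_TO_IPA` and `to_ipa` are illustrative names and are not taken from txt2ipa.xfscript.

```python
# A few X-SAMPA symbols and the IPA characters given in the table above.
XSAMPA_TO_IPA = {
    "t`": "\u0288",   # retroflex plosive, voiceless
    "d`": "\u0256",   # retroflex plosive, voiced
    "n`": "\u0273",   # retroflex nasal
    "F": "\u0271",    # labiodental nasal
    "J": "\u0272",    # palatal nasal
    "N": "\u014b",    # velar nasal
    "N\\": "\u0274",  # uvular nasal (X-SAMPA: N followed by a backslash)
    "S": "\u0283",    # postalveolar fricative, voiceless
    "Z": "\u0292",    # postalveolar fricative, voiced
    "T": "\u03b8",    # dental fricative, voiceless
    "D": "\u00f0",    # dental fricative, voiced
}

# Try longer symbols first so that "N\" wins over "N".
_KEYS = sorted(XSAMPA_TO_IPA, key=len, reverse=True)


def to_ipa(text: str) -> str:
    """Replace known X-SAMPA symbols with IPA, copying everything else as is."""
    output = []
    index = 0
    while index < len(text):
        for key in _KEYS:
            if text.startswith(key, index):
                output.append(XSAMPA_TO_IPA[key])
                index += len(key)
                break
        else:
            output.append(text[index])
            index += 1
    return "".join(output)


print(to_ipa("aNa"), to_ipa("aN\\a"), to_ipa("at`a"))
```

Longest match matters because several symbols are prefixes of others: without it, `N\` would be read as the velar nasal followed by a stray backslash.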


    This (part of) documentation was generated from src/fst/phonetics/txt2ipa.xfscript


    src-fst-transcriptions-transcriptor-abbrevs2text.lexc.md

    We describe here how abbreviations in Viena Karelian are read out, e.g. for text-to-speech systems.

    For example:


    This (part of) documentation was generated from src/fst/transcriptions/transcriptor-abbrevs2text.lexc


    src-fst-transcriptions-transcriptor-numbers-digit2text.lexc.md

    % komma% :, Root ;
    % tjuohkkis% :%. Root ;
    % kolon% :%: Root ;
    % sárggis% :%- Root ;
    % násti% :%* Root ;
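How these entries pair a spoken word with a punctuation symbol can be illustrated in Python. The sketch assumes standard lexc conventions: `%` escapes the next character (including a space), the first unescaped colon separates the upper and lower sides, and unescaped whitespace ends the pair. `parse_pair` and `read_out` are hypothetical helpers, and the direction of the mapping (symbol in, word out) is an assumption about how the transducer is used.

```python
from typing import Dict, List, Tuple

ENTRIES = [
    "% komma% :, Root ;",
    "% tjuohkkis% :%. Root ;",
    "% kolon% :%: Root ;",
    "% sárggis% :%- Root ;",
    "% násti% :%* Root ;",
]


def parse_pair(line: str) -> Tuple[str, str]:
    """Return the (upper, lower) strings of a lexc entry, resolving % escapes."""
    upper: List[str] = []
    lower: List[str] = []
    target = upper
    index = 0
    while index < len(line):
        char = line[index]
        if char == "%" and index + 1 < len(line):
            target.append(line[index + 1])  # escaped character, taken literally
            index += 2
            continue
        if char.isspace():
            break  # unescaped whitespace ends the upper:lower pair
        if char == ":" and target is upper:
            target = lower  # first unescaped colon switches sides
        else:
            target.append(char)
        index += 1
    return "".join(upper), "".join(lower)


# Map each symbol (lower side) to its spoken form (upper side).
SPOKEN: Dict[str, str] = {lower: upper for upper, lower in map(parse_pair, ENTRIES)}


def read_out(text: str) -> str:
    """Replace every listed symbol with its spoken form."""
    return "".join(SPOKEN.get(char, char) for char in text)


print(repr(read_out("3,5")))
```

The escaped spaces on the upper side are what keep the spoken word separated from the neighbouring digits in the output.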


    This (part of) documentation was generated from src/fst/transcriptions/transcriptor-numbers-digit2text.lexc


    tools-grammarcheckers-grammarchecker.cg3.md

    [ L A N G U A G E ] G R A M M A R C H E C K E R

    DELIMITERS

    TAGS AND SETS

    Tags

    This section lists all the tags inherited from the fst, and used as tags in the syntactic analysis. The next section, Sets, contains sets defined on the basis of the tags listed here, those set names are not visible in the output.

    Beginning and end of sentence

    BOS EOS

    Parts of speech tags

    N A Adv V Pron CS CC CC-CS Po Pr Pcle Num Interj ABBR ACR CLB LEFT RIGHT WEB PPUNCT PUNCT

    COMMA ¶

    Tags for POS sub-categories

    Pers Dem Interr Indef Recipr Refl Rel Coll NomAg Prop Allegro Arab Romertall

    Tags for morphosyntactic properties

    Nom Acc Gen Ill Loc Com Ess Ess Sg Du Pl Cmp/SplitR Cmp/SgNom Cmp/SgGen Cmp/SgGen PxSg1 PxSg2 PxSg3 PxDu1 PxDu2 PxDu3 PxPl1 PxPl2 PxPl3 Px

    Comp Superl Attr Ord Qst IV TV Prt Prs Ind Pot Cond Imprt ImprtII Sg1 Sg2 Sg3 Du1 Du2 Du3 Pl1 Pl2 Pl3 Inf ConNeg Neg PrfPrc VGen PrsPrc Ger Sup Actio VAbess

    Err/Orth

    Semantic tags

    Sem/Act Sem/Ani Sem/Atr Sem/Body Sem/Clth Sem/Domain Sem/Feat-phys Sem/Fem Sem/Group Sem/Lang Sem/Mal Sem/Measr Sem/Money Sem/Obj Sem/Obj-el Sem/Org Sem/Perc-emo Sem/Plc Sem/Sign Sem/State-sick Sem/Sur Sem/Time Sem/Txt

    HUMAN

    PROP-ATTR PROP-SUR

    TIME-N-SET

    Syntactic tags

    @+FAUXV @+FMAINV @-FAUXV @-FMAINV @-FSUBJ> @-F<OBJ @-FOBJ> @-FSPRED<OBJ @-F<ADVL @-FADVL> @-F<SPRED @-F<OPRED @-FSPRED> @-FOPRED> @>ADVL @ADVL< @<ADVL @ADVL> @ADVL @HAB> @<HAB @>N @Interj @N< @>A @P< @>P @HNOUN @INTERJ @>Num @Pron< @>Pron @Num< @OBJ @<OBJ @OBJ> @OPRED @<OPRED @OPRED> @PCLE @COMP-CS< @SPRED @<SPRED @SPRED> @SUBJ @<SUBJ @SUBJ> SUBJ SPRED OPRED @PPRED @APP @APP-N< @APP-Pron< @APP>Pron @APP-Num< @APP-ADVL< @VOC @CVP @CNP OBJ

    -OTHERS SYN-V @X

    ### Sets containing sets of lists and tags

    This part of the file lists a large number of sets based partly upon the tags defined above, and partly upon lexemes drawn from the lexicon. See the source file itself to inspect the sets; what follows here is an overview of the set types.

    #### Sets for Single-word sets

    INITIAL

    #### Sets for word or not

    WORD NOT-COMMA

    #### Case sets

    ADLVCASE CASE-AGREEMENT CASE NOT-NOM NOT-GEN NOT-ACC

    #### Verb sets

    NOT-V

    #### Sets for finiteness and mood

    REAL-NEG MOOD-V NOT-PRFPRC

    #### Sets for person

    SG1-V SG2-V SG3-V DU1-V DU2-V DU3-V PL1-V PL2-V PL3-V

    #### Pronoun sets

    #### Adjectival sets and their complements

    #### Adverbial sets and their complements

    #### Sets of elements with common syntactic behaviour

    #### NP sets defined according to their morphosyntactic features

    #### The PRE-NP-HEAD family of sets

    These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression **WORD - premodifiers**.

    #### Border sets and their complements

    #### Grammarchecker sets

    * * *

    This (part of) documentation was generated from [tools/grammarcheckers/grammarchecker.cg3](https://github.com/giellalt/lang-krl/blob/main/tools/grammarcheckers/grammarchecker.cg3)

    ---

    ## tools-tokenisers-tokeniser-disamb-gt-desc.pmscript.md

    ## Tokeniser for krl

    Usage:

    ```
    $ make
    $ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
    $ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
    $ echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
    $ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
    ```

    Pmatch documentation: <https://github.com/hfst/hfst/wiki/HfstPmatch>

    Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

    * Punct contains ASCII punctuation marks
    * The symbol after m-dash is soft-hyphen `U+00AD`
    * The symbol following {•} is byte-order-mark / zero-width no-break space `U+FEFF`.

    Whitespace contains ASCII white space and the List contains some unicode white space characters

    * En Quad U+2000 to Zero-Width Joiner U+200D
    * Narrow No-Break Space U+202F
    * Medium Mathematical Space U+205F
    * Word joiner U+2060

    Apart from what's in our morphology, there are

    1. unknown word-like forms, and
    2. unmatched strings

    We want to give 1) a match, but let 2) be treated specially by `hfst-tokenise -a`

    Unknowns are made of:

    * lower-case ASCII
    * upper-case ASCII
    * select extended latin symbols
    * ASCII digits
    * select symbols
    * Combining diacritics as individual symbols,
    * various symbols from Private area (probably Microsoft), so far:
      * U+F0B7 for "x in box"

    ### Unknown handling

    Unknowns are tagged ?? and treated specially with `hfst-tokenise`

    hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG, they get a default baseform equal to the wordform, but no tag to check, so it's safer to let hfst-tokenise handle them.
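The way unknowns surface downstream can be pictured with a small Python reader for CG-style cohort streams, where a wordform line of the form `"<...>"` is followed by tab-indented readings. The sample stream is hand-written for illustration and was not produced by running `hfst-tokenise`; the function names are hypothetical, and the `??` tag is taken from the description above.

```python
from typing import List, Tuple

Cohort = Tuple[str, List[str]]  # (wordform, list of reading lines)

# Hand-written sample in CG stream format (not real tool output).
SAMPLE = (
    '"<ja>"\n'
    '\t"ja" CC\n'
    '"<,>"\n'
    '\t"," CLB\n'
    '"<ukjend>"\n'
    '\t"ukjend" ??\n'
)


def parse_cg(stream: str) -> List[Cohort]:
    """Group tab-indented readings under the preceding wordform line."""
    cohorts: List[Cohort] = []
    for line in stream.splitlines():
        if line.startswith('"<') and line.endswith('>"'):
            cohorts.append((line[2:-2], []))
        elif line.startswith("\t") and cohorts:
            cohorts[-1][1].append(line.strip())
    return cohorts


def unknown_forms(cohorts: List[Cohort], unknown_tag: str = "??") -> List[str]:
    """Return wordforms that have at least one reading carrying the unknown tag."""
    return [
        form
        for form, readings in cohorts
        if any(unknown_tag in reading.split() for reading in readings)
    ]


print(unknown_forms(parse_cg(SAMPLE)))
```

A reader like this makes it easy to list out-of-vocabulary tokens after tokenisation, for example when extending the stem lexica.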
    Finally we mark as a token any sequence making up a:

    * known word in context
    * unknown (OOV) token in context
    * sequence of word and punctuation
    * URL in context

    * * *

    This (part of) documentation was generated from [tools/tokenisers/tokeniser-disamb-gt-desc.pmscript](https://github.com/giellalt/lang-krl/blob/main/tools/tokenisers/tokeniser-disamb-gt-desc.pmscript)

    ---

    ## tools-tokenisers-tokeniser-gramcheck-gt-desc.pmscript.md

    ## Grammar checker tokenisation for krl

    Requires a recent version of HFST (3.10.0 / git revision>=3aecdbc)

    Then just:

    ```
    $ make
    $ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
    ```

    More usage examples:

    ```
    $ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
    $ echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
    $ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
    ```

    Pmatch documentation: <https://github.com/hfst/hfst/wiki/HfstPmatch>

    Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

    * Punct contains ASCII punctuation marks
    * The symbol after m-dash is soft-hyphen `U+00AD`
    * The symbol following {•} is byte-order-mark / zero-width no-break space `U+FEFF`.
    Whitespace contains ASCII white space and the List contains some unicode white space characters

    * En Quad U+2000 to Zero-Width Joiner U+200D
    * Narrow No-Break Space U+202F
    * Medium Mathematical Space U+205F
    * Word joiner U+2060

    Apart from what's in our morphology, there are 1) unknown word-like forms, and 2) unmatched strings

    We want to give 1) a match, but let 2) be treated specially by hfst-tokenise -a

    * select extended latin symbols
    * select symbols
    * various symbols from Private area (probably Microsoft), so far:
      * U+F0B7 for "x in box"

    TODO: Could use something like this, but built-in's don't include šžđčŋ:

    Simply give an empty reading when something is unknown: hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG, they get a default baseform equal to the wordform, but no tag to check, so it's safer to let hfst-tokenise handle them.

    Finally we mark as a token any sequence making up a:

    * known word in context
    * unknown (OOV) token in context
    * sequence of word and punctuation
    * URL in context

    * * *

    This (part of) documentation was generated from [tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript](https://github.com/giellalt/lang-krl/blob/main/tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript)

    ---

    ## tools-tokenisers-tokeniser-tts-cggt-desc.pmscript.md

    ## TTS tokenisation for smj

    Requires a recent version of HFST (3.10.0 / git revision>=3aecdbc)

    Then just:

    ```sh
    make
    echo "ja, ja" \
    | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
    ```

    More usage examples:

    ```sh
    echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa \
    boasttu olmmoš, man mielde lahtuid." \
    | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
    echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" \
    | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
    echo "márffibiillagáffe" \
    | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
    ```

    Pmatch documentation: <https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstPmatch>

    Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

    * Punct contains ASCII punctuation marks
    * The symbol after m-dash is soft-hyphen `U+00AD`
    * The symbol following {•} is byte-order-mark / zero-width no-break space `U+FEFF`.

    Whitespace contains ASCII white space and the List contains some unicode white space characters

    * En Quad U+2000 to Zero-Width Joiner U+200D
    * Narrow No-Break Space U+202F
    * Medium Mathematical Space U+205F
    * Word joiner U+2060

    Apart from what's in our morphology, there are 1) unknown word-like forms, and 2) unmatched strings

    We want to give 1) a match, but let 2) be treated specially by hfst-tokenise -a

    * select extended latin symbols
    * select symbols
    * various symbols from Private area (probably Microsoft), so far:
      * U+F0B7 for "x in box"

    TODO: Could use something like this, but built-in's don't include šžđčŋ:

    Simply give an empty reading when something is unknown: hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG, they get a default baseform equal to the wordform, but no tag to check, so it's safer to let hfst-tokenise handle them.

    Needs hfst-tokenise to output things differently depending on the tag they get

    * * *

    This (part of) documentation was generated from [tools/tokenisers/tokeniser-tts-cggt-desc.pmscript](https://github.com/giellalt/lang-krl/blob/main/tools/tokenisers/tokeniser-tts-cggt-desc.pmscript)
