Võro NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-vro

Page Content

  • src-fst-morphology-affixes-numerals.lexc.md
  • src-fst-morphology-affixes-postpositions.lexc.md
  • src-fst-morphology-affixes-pronouns.lexc.md
  • src-fst-morphology-affixes-propernouns.lexc.md
  • src-fst-morphology-affixes-quantifiers.lexc.md
  • src-fst-morphology-affixes-symbols.lexc.md
  • Symbol affixes
  • src-fst-morphology-affixes-verbs.lexc.md
  • src-fst-morphology-clitics.lexc.md
  • src-fst-morphology-phonology.twolc.md
  • The Võro morphophonological/twolc rules file
  • Rules
  • src-fst-morphology-root.lexc.md
  • Võru tags and basic lexica
  • Definitions for Multichar_Symbols
  • Morphophonology
  • Oahpa Place names and case used
  • Flag diacritics
  • The Root lexicon
  • src-fst-morphology-stems-acronyms.lexc.md
  • src-fst-morphology-stems-adjectives_newwords.lexc.md
  • src-fst-morphology-stems-adpositions_newwords.lexc.md
  • src-fst-morphology-stems-adverbs_newwords.lexc.md
  • src-fst-morphology-stems-determiners_newwords.lexc.md
  • src-fst-morphology-stems-exceptions.lexc.md
  • src-fst-morphology-stems-interjections_newwords.lexc.md
  • src-fst-morphology-stems-nouns.lexc.md
  • src-fst-morphology-stems-nouns_newwords.lexc.md
  • src-fst-morphology-stems-verbs.lexc.md
  • src-fst-morphology-stems-verbs_newwords.lexc.md
  • src-fst-phonetics-txt2ipa.xfscript.md
  • src-fst-transcriptions-transcriptor-abbrevs2text.lexc.md
  • src-fst-transcriptions-transcriptor-numbers-digit2text.lexc.md
  • src-fst-transcriptions-transcriptor-symbols2text.lexc.md
  • tools-grammarcheckers-grammarchecker.cg3.md
  • DELIMITERS
  • TAGS AND SETS
  • Võro language model documentation

    All doc-comment documentation in one large file.


    src-cg3-dependency.cg3.md

    COMMON SÁMI DEPENDENCY GRAMMAR

    This dep file is for sma, sme, smj, sje.

    DELIMITERS

    Sentence delimiters are the following: <.> <!> <?> <…> <¶>

    TAGS AND SETS

    N V A Adv CC CS Inf Sup Neg Num Po Pr

    Pcle Prop

    Pron IV TV COMMA DASH CITATION to keep colouring we add a “ HYPHEN QMARK PUNCT LEFT RIGHT CLB Ind Pot Impr ImprtII Cond ConNeg Caus causative eus VGen Interj ABBR ACR Prs Prt Cmpnd RCmpnd PrfPrc PrsPrc Actor Actio Ger Indef Nom Acc Ill Com Gen Ess

    IM For fao

    POS sub-categories

    Syntactic tags and sets

    Syntactic tags in input to this file

    Syntactic tags added in this file

    fao syntags

    kal syntags

    eus syntags

    Syntactic set definitions

    Dep grammar

    Correction rules

    The finite verb

    Mapping rules

    lgRemove removes the language tags , , etc, before proceeding to the dep file.


    This (part of) documentation was generated from src/cg3/dependency.cg3


    src-cg3-disambiguator.cg3.md

    Disambiguator for Võro

    Sets

    Sentence delimiters are the following: “<.>” “<…>” “<!>” “<?>” “<¶>”
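The delimiter list above determines where one analysis window ends and the next begins. A minimal Python sketch of that idea follows; it assumes the input is already tokenised into a flat list of strings, and the function name `split_windows` and the sample tokens are illustrative only, not part of the grammar.

```python
# Delimiters taken from the list above: <.> <…> <!> <?> <¶>
DELIMITERS = {".", "…", "!", "?", "¶"}


def split_windows(tokens):
    """Group tokens into windows, closing a window after each delimiter."""
    windows = []
    current = []
    for token in tokens:
        current.append(token)
        if token in DELIMITERS:
            # The delimiter itself stays inside the window it closes.
            windows.append(current)
            current = []
    if current:
        # Trailing tokens without a final delimiter form a last window.
        windows.append(current)
    return windows


if __name__ == "__main__":
    # Placeholder tokens, chosen only to show the grouping.
    print(split_windows(["Tere", "!", "Kuis", "lätt", "?"]))
```

Rules in the disambiguator then only see the cohorts inside one such window at a time.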

    Part-of-Speech

    Numerus

    Cases

    Types

    Sets with more members

    Boundaries

    Verbs

    Disambiguation rules

    Dialects

    Early rules

    Possessive suffixes

    Numeral phrases

    Preposition/postposition/adverb rules

    Rules for mapping @CVP and @CNP on the CC and CS

    Case rules

    Partitive

    Genitive

    Illative

    Number rules

    More disambiguation rules

    Elative

    Propernouns

    Verbs

    Specific verbs

    ei negation verb

    eli

    Adverbs

    paljon

    kerran

    jälkhiin

    Adjectives

    Conjunctions

    Subjunctions

    että

    jos

    ko

    sillä

    Pronouns

    Verb rules, Verbs

    Infinitive

    Present Sg3

    Present Pl3 or PrsPrc

    Present Pl3 or Passive

    Imperative

    Past tense

    Prt Pl3 or Prt Sg2

    Negative verb

    Relative pronouns

    HNOUN MAPPING


    This (part of) documentation was generated from src/cg3/disambiguator.cg3


    src-cg3-functions.cg3.md

    SYNTACTIC FUNCTIONS FOR SÁMI

    Sámi language technology project 2003-2018, University of Tromsø

    This file adds syntactic functions. It is common for all the Saami

    LEFT RIGHT because of apertium

    Syntactic tags

    Tag sets

    These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression WORD - premodifiers.

    The set NOT-NPMOD is used to find barriers between NPs. Typical usage: … (*1 N BARRIER NOT-NPMOD) …, meaning: scan to the first noun, ignoring anything that can be part of the noun phrase of that noun (i.e., “scan to the next NP head”).
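The scanning behaviour described here can be sketched in Python. This is only an illustration of the idea behind a rightward scan with a barrier, not the CG-3 implementation: cohorts are modelled as plain sets of tags, the `PREMODIFIERS` set is a hypothetical stand-in for the real set definitions, and checking the target before the barrier on each cohort is an assumption of this sketch.

```python
# Hypothetical premodifier tags; the real sets live in functions.cg3.
PREMODIFIERS = {"A", "Num", "Pron"}


def is_noun(tags):
    return "N" in tags


def not_npmod(tags):
    """Stand-in for WORD - premodifiers: any cohort that is not a premodifier."""
    return not (tags & PREMODIFIERS)


def scan_right(cohorts, start, target, barrier):
    """Return the index of the first cohort right of `start` that matches
    `target`, or None if a `barrier` cohort is met first."""
    for i in range(start + 1, len(cohorts)):
        tags = cohorts[i]
        if target(tags):  # assumption: target is tested before the barrier
            return i
        if barrier(tags):
            return None
    return None


if __name__ == "__main__":
    sentence = [{"V"}, {"A"}, {"Num"}, {"N"}]
    print(scan_right(sentence, 0, is_noun, not_npmod))  # index of the NP head
```

With an adverb between the verb and the noun, the same call returns `None`, because the adverb cannot be part of the noun phrase and so acts as a barrier.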

    ADLVCASE

    These were the set types.

    Numeral outside the sentence

    HABITIVE MAPPING

    sma object

    SUBJ MAPPING - leftovers

    OBJ MAPPING - leftovers

    MAPPING for MT - experimental

    HNOUN MAPPING

    missingX adds @X to all missings

    therestX adds @X to everything that is left, often erroneously disambiguated forms

    For Apertium:

    The analyser gives double analyses because of optional semtags. We go for the one with the semtag.


    This (part of) documentation was generated from src/cg3/functions.cg3


    src-fst-morphology-affixes-adjectives.lexc.md

    Adjective inflection

    Võro adjectives inflect for comparison.

    LEXICON A_1HANS1A 1 hanśa:hanśa

    LEXICON A_1HERRAE 1 herrä:herrä

    LEXICON A_2ARTIKLI suhvli:suhvli

    LEXICON A_2KERGE 1 kerge:

    LEXICON A_3ALADU aladu:aladu

    LEXICON A_3PERAEDUE perädü:perädü

    LEXICON A_4AINUS ainus:ainus

    LEXICON A_11AINWQ ainõq:ainõ

    LEXICON A_11KELMEQ kelmeq:kelme

    LEXICON A_13ALONW alonõ:alo

    LEXICON A_13TAEHINE tähine:tähi

    LEXICON A_13TAEHINE_PL tähine:tähi

    LEXICON A_14RITS1KAS ritśkas:ritśka%{sØ%}

    LEXICON A_14HAMMAS rikas:ri%{kØ%}ka%{sØ%}

    LEXICON A_14IKAES rikas:ri%{kØ%}ka%{sØ%}

    LEXICON A_16ABILINW inemine:inemi

    LEXICON A_16INEMINE inemine:inemi

    LEXICON A_19ALOMANW alomanõ:aloma

    LEXICON A_19PEDAEJAENE pedäjäne:pedäjä

    LEXICON A_19PEDAEJAENE_PL pedäjäne:pedäjä

    LEXICON A_22VWROKWNW võrokõnõ:võrokõ

    LEXICON A_22VAEHAEKENE võrokõnõ:võrokõ

    gradation: no

    gradation: yes

    gradation: no
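The entries above follow a regular shape: the keyword LEXICON, a continuation lexicon name, and an example written as two sides separated by a colon. A small Python sketch for reading such documentation lines is given below. Reading the left side as the lemma and the right side as the stem is an interpretation of the examples, and the names `DocEntry` and `parse_doc_line` are made up for this illustration.

```python
from typing import NamedTuple, Optional


class DocEntry(NamedTuple):
    lexicon: str  # e.g. "A_14HAMMAS"
    lemma: str    # left side of the colon
    stem: str     # right side of the colon (may be empty)


def parse_doc_line(line: str) -> Optional[DocEntry]:
    """Parse a line such as 'LEXICON A_1HANS1A 1 hanśa:hanśa'.

    Returns None for lines that are not LEXICON entries with an example.
    """
    parts = line.split()
    if len(parts) < 3 or parts[0] != "LEXICON":
        return None
    # Take the first token after the lexicon name that contains a colon;
    # extra tokens such as the stray '1' in some entries are skipped.
    pair = next((p for p in parts[2:] if ":" in p), None)
    if pair is None:
        return None
    lemma, _, stem = pair.partition(":")
    return DocEntry(parts[1], lemma, stem)


if __name__ == "__main__":
    print(parse_doc_line("LEXICON A_14HAMMAS rikas:ri%{kØ%}ka%{sØ%}"))
```

Lines such as `gradation: no` are not LEXICON entries, so the function returns `None` for them.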


    This (part of) documentation was generated from src/fst/morphology/affixes/adjectives.lexc


    src-fst-morphology-affixes-adverbs.lexc.md

    Adverbs The VÕRO language adverbs…

    Spatial adverbs

    adjective modifiers

    What is this 2017-03-27


    This (part of) documentation was generated from src/fst/morphology/affixes/adverbs.lexc


    src-fst-morphology-affixes-nouns.lexc.md

    Noun inflection for Võro

    LEXICON N_1HANS1A 1 hanśa:hanśa

    LEXICON N_1VIU 1 viu:viu

    LEXICON N_1HERRAE 1 herrä:herrä

    LEXICON N_1PREI 1 prei:prei

    LEXICON N_3PERAEDUE perädü:perädü

    LEXICON N_3ALADU aladu:aladu

    kipõń:kipõn

    allaś:allas

    sinneĺ:sinnel

    veteĺ:vetel

    tukõv:tukõv

    verrev:verrev

    sallai:sallai

    elläi:elläi

    herre:herre

    villõ:villõ

    LEXICON N_10HWRAK hõrak:hõrak

    LEXICON N_10HAIDAK haidak:haidak

    LEXICON N_10ESAEK esäk:esäk

    LEXICON N_10RAAMAT raamat:raamat

    LEXICON N_10LEMBIT esäk:esäk

    LEXICON N_10AABITS aabits:aabits

    LEXICON N_10HEERITS heerits:heerits

    LEXICON N_10AADRWS1 aadrõś:aadrõs

    LEXICON N_10AMMAT1 ammat́:ammat

    LEXICON N_10HUEPAETS1 hüpätś:hüpä%{td%}s

    LEXICON N_11LAETEQ läteq:lä%{tØ%}te

    LEXICON N_11ANNWQ annõq:andõ

    LEXICON N_11AINWQ ainõq:ainõ

    LEXICON N_11KELMEQ kelmeq:ainõ

    LEXICON N_11VAIH vaih:vaih

    LEXICON N_12NWKWS1 nõkõś:nõ%{kg%}õ%{sś%}

    LEXICON N_13ALONW alonõ:alo

    LEXICON N_13TAEHINE tähine:tähi

    Gradation: No

    LEXICON N_14RITS1KAS ritśkas:ritśka%{sØ%}

    Gradation: No

    LEXICON N_14HAMMAS hammas:ham%{bm%}a%{sØ%}, saabas:saapa distinguished from 14RITS1KAS due to gradation

    LEXICON N_14IKAES ikäs:ikkä distinguished from 14RITS1KAS due to gradation

    LEXICON N_14NUMMWR1 nummõŕ:numbõr distinguished from 14RITS1KAS due to gradation

    vowel_harmony: front gradation: yes

    vowel_harmony: back gradation: yes

    kotus:kotus

    kotus:kotus

    LEXICON N_16INEMINE inemine:inemi

    LEXICON N_16ABILINW abilinõ:abili

    LEXICON N_16TERAEKENE inemine:inemi

    LEXICON N_16TSIRGUKWNW tsirgukõnõ:abili

    LEXICON N_19ALOMANW alomanõ:aloma

    LEXICON N_19PEDAEJAENE pedäjäne:pedäjä

    LEXICON N_20LATS1 latś:lat%{sś%}

    LEXICON N_20TAEUES1 täüś:täü

    LEXICON N_20VIIS1 täüś:täü

    LEXICON N_20ORS1 täüś:täü

    LEXICON N_20HIRS1 täüś:täü

    LEXICON N_20VAEITS1 väitś:väits

    LEXICON N_20KUEUEDS1 küüdś:küüds

    LEXICON N_20MIIS1 miiś:m

    LEXICON N_21HUEDSI hüdsi:hü

    LEXICON N_21KUSI kusi:kus

    LEXICON N_22VWROKWNW võrokõnõ:võrokõ

    LEXICON N_22NAANW naanõ:naa

    LEXICON N_22VAEHAEKENE vähäkene:vähäke

    gradation: yes

    tarõ:tar

    uma:uma

    pesä:pesä

    nimi:ni%{mØ%}m

    lumi:lum

    LEXICON N_36TUUM1 tuuḿ:t%{ou%}%{ou%}m

    LEXICON N_36HANG1 hanǵ:hang

    LEXICON N_36SAERG1 särǵ:sär%{gǵØ%}

    LEXICON N_36LAHT1 laht́:lah%{tt́Ø%}

    LEXICON N_36PAEIV päiv:päiv

    LEXICON N_36LEIB päiv:päiv

    kogõr:kogõr

    kokr:ko%{kg%}r

    sõbõr:sõbõr oblique plural in o

    kubõl:ku%{pb%}õl oblique plural in õ

    LEXICON N_37PINI pini:pini

    LEXICON N_37WLI õli:õli

    LEXICON N_37MUNA muna:mu%{nØ%}na

    LEXICON N_40TALO talo:ta%{lØ%}lo

    LEXICON N_40HELUE helü:helü

    LEXICON N_40UJA uja:u%{jØ%}ja

    LEXICON N_40IJAE ijä:i%{jØ%}jä

    LEXICON N_40SAVV savv:savvu

    LEXICON N_40TUEKK tükk:tü%{kØ%}kü

    LEXICON N_41JUHT1 juht́:juht

    LEXICON N_41AIG aig:a

    LEXICON N_41ASK aig:aig

    LEXICON N_41MAENG aig:aig

    LEXICON N_41VIIT aig:aig

    LEXICON N_43KANARIK usklik+A:%{ˋØ%}#uskli%{kØ%}%{kg%}

    LEXICON N_43ELAENIK elänik+N:eläni%{kØ%}%{kg%}

    LEXICON N_43SASLWK1

    LEXICON N_43APRIL1

    LEXICON N_43SEKRETAER1

    LEXICON N_43AASTAK

    LEXICON N_44SWDA sõda:sõ%{tØ%}%{tdØ%}a

    LEXICON N_45KANA kana:ka%{nØ%}na

    LEXICON N_45RIHAE rihä:ri%{hØ%}hä

    LEXICON N_46HAIN hain:hain

    LEXICON N_46TARK tark:tark

    LEXICON N_47ASI asi:asi

    LEXICON N_47VELI veli:ve%{lØ%}l

    LEXICON N_47KIRI kiri:kiri

    NOMINAL DECLENSIONS

    LEXICON NMN_1HANS1A 1 hanśa:hanśa

    in d

    LEXICON NMN_1HERRAE 1 herrä:herrä

    in d

    LEXICON NMN_3PERAEDUE perädü:perädü

    LEXICON NMN_3ALADU aladu:aladu

    ainus:ainus

    Secondary

    kuldnõ:kuld

    Secondary

    Secondary

    Secondary

    Secondary

    Secondary

    Secondary

    Secondary

    LEXICON NMN_9KIPWN1/ELLAEI kipõń:kipõń fixme 2016-08-27

    LEXICON NMN_9ALLWV1/XX allõv́:ki%{pb%}õ%{nń%}

    LEXICON NMN_9ALLAS1/SINNEL1 allaś:allas

    LEXICON NMN_9TUKWV/VERREV tukõv:tu%{kg%}õv

    LEXICON NMN_9SALLAI/ELLAEI elläi:e%{lØ%}lä%{ij%}

    SHOULD THIS BE HERE, c.f. yaml

    LEXICON NMN_9TAHHE/HERRE tahhe:ta%{hØ%}he

    LEXICON NMN_9VILLW/XX villõ:vi%{lØ%}lõ

    Noun (10) perit

    vowel_harmony: ONLY FRONT N-lembit10

    N-hwrak10

    LEXICON NMN_11AINWQ/KELMEQ ainõq:ainõ

    A-vaih11

    LEXICON NMN_11ANNWQ/LAETEQ läteq:lä%{tØ%}te

    A-ainwq11

    N-repaenj12

    N-suekues12

    N-suekues12

    LEXICON NMN_13ALONW/TAEHINE alonõ:alo

    A-alonw

    LEXICON NMN_13VAHTSWNW vahtsõnõ:vah

    A-vahtswnw

    LEXICON NMN_13XX/SAEAENE sääne:sää A-alonw

    Distinguished from 14RITS1KAS due to gradation Yaml: N-hammas_gt-norm.yaml

    Distinguished from 14RITS1KAS due to word final h vowel_harmony_variant: hamõh Yaml: N-pereh_gt-norm.yaml

    LEXICON NMN_16ABILINW/INEMINE inemine:inemi abilinõ:abili

    LEXICON NMN_16TSIRGUKWNW/TERAEKENE inemine:inemi tsirgukõnõ:abili

    LEXICON NMN_19ALOMANW/PEDAEJAENE alomanõ:aloma

    LEXICON NMN_22NAANW naanõ:naa

    LEXICON NMN_22VWROKWNW/VAEHAEKENE vähäkene:vähäke

    nimi:nim

    LEXICON NMN_46SWBWR sõbõr:sõbõr Oblique plural in o

    kubõl:ku%{pb%}õl Oblique plural in õ

    pini:pi%{nØ%}ni

    pini:pi%{nØ%}ni

    pung:pung

    kuld:kul%{dl%}

    kuld:kul%{dl%}

    Derived from PUHM, Gradation=”yes”, stem=”+Sg+Nom” stem_vowel=”o”

    LEXICON NMN_46HAIN jalg:jalg gradation: no

    LEXICON NMN_46TARK jalg:jal%{gØ%} gradation: yes

    SINGULAR GENITIVE STEMS

    PLURAL ALLATIVE STEMS

    TAGS THAT CAN BE FOLLOWED BY CLITICS “K”

    PLURAL TAGS

    SINGULAR TAGS

    LEXICON Harm_Neutr_SG_INE_hn RARE

    TAGS THAT CANNOT BE FOLLOWED BY CLITICS

    CASES ONLY

    TAGS THAT CAN BE FOLLOWED BY CLITICS

    TAGS WITH NO ADDED MORPHOLOGY THAT CANNOT BE FOLLOWED BY CLITICS

    digits


    This (part of) documentation was generated from src/fst/morphology/affixes/nouns.lexc


    src-fst-morphology-affixes-numerals.lexc.md

    Noun inflection for Võro

    kipõnʼ:kipõn

    allaś:allas

    veteĺ:vetel

    tukõv:tukõv

    elläi:elläi

    verrev:verrev

    gradation: no

    gradation: yes distinguished from 14RITS1KAS due to gradation

    distinguished from 14RITS1KAS due to word final h

    distinguished from 14RITS1KAS due to word final h

    kotus:kotus

    inemine:inemi

    abilinõ:abili

    LEXICON NUM_22VWROKWNW võrokõnõ:võrokõ

    LEXICON NUM_22NAANW naanõ:naa

    Gradation: No

    vowel_harmony: front

    Gradation: No

    tarõ:tar

    pesä:tar

    nimi:nim

    kokr:ko%{kg%}r

    sõbõr:sõbõr

    LEXICON NUM_43KANARIK

    LEXICON NUM_44SWDA sõda:sõda

    vro-digits


    This (part of) documentation was generated from src/fst/morphology/affixes/numerals.lexc


    src-fst-morphology-affixes-postpositions.lexc.md

    Postpositions The Võro language postpositions …

    POSTPOSITIONS WITH READY CASE ENDINGS


    This (part of) documentation was generated from src/fst/morphology/affixes/postpositions.lexc


    src-fst-morphology-affixes-pronouns.lexc.md

    Pronoun inflection

    The Võro language pronouns inflect in the same cases as regular nouns, but with a colon (‘:’) as separator.

    PERSONAL PRONOUN

    CHECKME vowel harmony

    LEXICON PERS_PL1 maq:m

    LEXICON PERS_PL2 saq:

    LEXICON PERS_PL3 timä:

    DEMONSTRATIVE PRONOUNS

    INDEFINITE PRONOUNS

    INTERROGATIVE PRONOUNS


    This (part of) documentation was generated from src/fst/morphology/affixes/pronouns.lexc


    src-fst-morphology-affixes-propernouns.lexc.md

    Proper noun inflection

    The Võro language proper nouns inflect in the same cases as regular nouns, but with a colon (‘:’) as separator.

    LEXICON PROP_1HANS1A 1 hanśa:hanśa

    LEXICON PROP_1VIU 1 viu:viu

    LEXICON PROP_1HERRAE 1 herrä:herrä

    LEXICON PROP_3ALADU aladu:aladu

    LEXICON PROP_VERE Rakvere:Rakv

    harmony: front

    kipõń:kipõń

    sallai:sallai

    elläi:elläi

    tukõv:tukõv

    LEXICON PROP_10AMEERIGA Ameeriga:Ameerik cf. _10HWRAK

    LEXICON PROP_10ESAEK esäk:esäk

    LEXICON PROP_10LEMBIT Lembit:Lembi%{td%}

    LEXICON PROP_10VIDRIK vidrik:vidrik gradation: no

    Gradation: No

    Gradation: No

    Gradation: No

    LEXICON PROP_14HAMMAS hammas:hamba, saabas:saapa gradation: yes distinguished from 14RITS1KAS due to gradation

    distinguished from 14RITS1KAS due to word final h

    distinguished from 14RITS1KAS due to word final h

    kotus:kotus

    kotus:kotus

    kotus:kotus

    Gradation: No

    LEXICON PROP_16ABILINW abilinõ:abili

    Gradation: No

    Gradation: No

    Gradation: No

    gradation: yes vowel_harmony: front

    gradation: yes vowel_harmony: front

    gradation: yes

    Gradation: No

    gradation: yes

    gradation: yes

    tarõ:tar

    nimi:nim

    pesä:pesä

    pesä:pesä

    LEXICON PROP_36TUUM1 tuuḿ:t%{ou%}%{ou%}m :%{back%} NMN_36TUUM1/XX1-SG_OBL ; This allows for place names, which, for the most part, have nominative singulars that are identical to their genitive singulars.

    LEXICON PROP_36SAERG1 särǵ:särgʼ

    LEXICON PROP_36PAEIV päiv:päiv

    kogõr:kogõr

    LEXICON PROP_37PINI pini:pini

    LEXICON PROP_37WLI pini:pini

    LEXICON PROP_40TALO talo:talo

    LEXICON PROP_40UJA uja:uja

    LEXICON PROP_41ASK ask:asko

    LEXICON PROP_44SWDA sõda:sõda

    LEXICON PROP_46HAIN hain:hain


    This (part of) documentation was generated from src/fst/morphology/affixes/propernouns.lexc


    src-fst-morphology-affixes-quantifiers.lexc.md

    Quantifier inflection

    The Võro language quantifiers inflect in cases.


    This (part of) documentation was generated from src/fst/morphology/affixes/quantifiers.lexc


    src-fst-morphology-affixes-symbols.lexc.md

    Symbol affixes


    This (part of) documentation was generated from src/fst/morphology/affixes/symbols.lexc


    src-fst-morphology-affixes-verbs.lexc.md

    Verb inflection

    Võro language verbs inflect for person and number.

    There are other verbs here, cf. V_ELAEMAE

    There are other verbs here, cf. V_ELAEMAE

    There are other verbs here, cf. V_ELAEMAE

    There are other verbs here, cf. V_ELAEMAE

    There are other verbs here, cf. V_ELAEMAE

    Pss_PrfPrc: sadat

    taplõma:tapõl

    võitlõma:võitõl

    kullõma+V:ku%{lØ%}l%{õØ%}%{lĺ%}

    +Pss+Ind+Prs+Sg1, +Pss+Ind+Prt+Sg1 +Pss+PrsPrc, +Pss+PrfPrc

    +Act+Ind+Prs+Sg1, +Act+Ind+ConNegII, +Act+Imprt+Sg2 +Act+Ind+Prs+Neg, +Act+Ind+Prt+Neg, +Act+Ind+ConNegI

    +Act+Ind+Prs+Sg3, +Act+Ind+Prs+Pl3

    +Act+Ind+Prs+Sg2, +Err/Dial+Act+Ind+Prs+Sg2, +Act+Ind+Prs+Pl1, +Act+Ind+Prs+Pl2

    +Act+Ind+Prt+Sg3

    argnõma:arg

    +Pss+Ind+Prt +Sg1-+Pl3, ConNeg

    THIS FAR 2016-08-27

    Act_Ind_Prs_Pl3: essüseq

    V_Inf/mA: miildümä

    Pss+PrfPrc, Pss+PrsPrc

    Retain consonant and stem vowel

    Weaken consonant and semi-retention of stem vowel

    Act+Ind+Prs+Sg1/Sg2/Pl1/Pl2, Ind+ConNegII, Ind+Prs+ConNeg Pss+Ind

    Retain consonant and stem vowel

    Weaken consonant and replace stem vowel with i

    Retain consonant remove stem vowel and add i

    +Jus

    Pss+PrfPrc, Pss+PrsPrc

    Retain consonant and stem vowel

    Weaken consonant and semi-retention of stem vowel

    Act+Ind+Prs+Sg1/Sg2/Pl1/Pl2, Ind+ConNegII, Ind+Prs+ConNeg Pss+Ind

    Retain consonant and stem vowel

    Weaken consonant and replace stem vowel with i

    Retain consonant remove stem vowel and add i

    +Jus

    Retain consonant and stem vowel

    Pss+PrfPrc, Pss+PrsPrc

    Weaken consonant and semi-retention of stem vowel

    Weaken consonant and semi-retention of stem vowel

    Act+Ind+Prs+Sg1/Sg2/Pl1/Pl2, Ind+ConNegII, Ind+Prs+ConNeg Pss+Ind

    Retain consonant and stem vowel

    Weaken consonant and replace stem vowel with i

    Retain consonant remove stem vowel and add i

    Remainder is in exceptions.lexc minemä to go/ mennä

    Retain consonant and stem vowel

    Retain consonant and stem vowel

    Strengthen consonant

    Retain consonant and stem vowel

    Retain consonant and add õ

    Retain consonant and stem vowel

    Strengthen consonant and replace stem vowel with i

    consonant and add i

    Retain consonant and stem vowel

    Strengthen consonant

    Retain consonant and stem vowel

    Retain consonant

    Retain consonant and add õ

    Act+Ind+Prs+Sg1/Sg2/Pl1/Pl2, Ind+ConNegII, Ind+Prs+ConNeg Pss+Ind

    Retain consonant and stem vowel

    Strengthen consonant and replace stem vowel with i

    Strengthen consonant and add ʼ

    tegemä to do/ tehdä

    nägemä to see/nähdä

    IS THIS RIGHT? 2015-09-02

    sõida

    IS THIS RIGHT? 2015-09-02

    sõida

    HERE is the distinction 2016-10-04

    IS THIS RIGHT? 2015-09-02

    IS THIS RIGHT? 2015-09-02

    IS THIS RIGHT? 2015-09-02

    sõida

    sõida

    SETS BY CONSONANT QUALITY

    INDICATIVE PRESENT ACTIVE CONJUGATION

    JUS

    CHECK THIS

    PASSIVE INDICATIVE PRESENT CONJUGATION

    INDICATIVE PRETERIT SUBJECT CONJUGATION

    PASSIVE INDICATIVE PRETERIT CONJUGATION

    NON-FINITES

    PASSIVE DISTRIBUTION


    This (part of) documentation was generated from src/fst/morphology/affixes/verbs.lexc


    src-fst-morphology-clitics.lexc.md

    Clitics in Võro


    This (part of) documentation was generated from src/fst/morphology/clitics.lexc


    src-fst-morphology-phonology.twolc.md

    The Võro morphophonological/twolc rules file

    This file documents the phonology.twolc file

    Special letters

    Vowel harmony with “(t)a/ä”

     %{aä%}:0    — Vowel harmony with "(t)a/ä" AÄ1:a AÄ1:ä AÄ1:0
     %{ae%}:a   — Vowel harmony with "a/e/õ" passive tahetu
     %{aõ%}:a   — Vowel harmony with "a/e/õ" passive sõidõtu
     %{äe%}:ä    — Vowel harmony with "ä/e/õ" passive
     %{eõ%}:0    — Vowel harmony with "e/õ"
     %{uü%}:0    — Vowel harmony with "u/ü"
     %{öü%}:ö    — Vowel raising
     %{ou%}:o    — Vowel raising
     %{ei%}:e    — Vowel raising
     %{õy%}:õ    — Vowel raising
     %{ao%}:a    — Vowel raising
    
     %{eØ%}:e    — ütlemä:üt%{eØ%}l  
     %{õØ%}:õ    — ütlemä:üt%{eØ%}l  
     %{Øõ%}:0    — juurdlõma:juur%{dØ%}%{0õ%}l
    
     %{dØ%}:d    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{dv%}:d    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{dn%}:d    — HJK and KimmoK ideas lammas:lam%{bm%}a%{sØ%}
     %{dl%}:d    — HJK and KimmoK ideas lammas:lam%{bm%}a%{sØ%}
    
     %{ij%}:i    ellä%{ij%}
     %{gv%}:g    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{gl%}:g    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{gØ%}:g    — HJK and KimmoK ideas argnõma:ar%{gØ%}
     %{uv%}:u    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{üv%}:ü    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{hØ%}:h    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{jØ%}:j    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{kØ%}:k    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{lØ%}:l    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{mØ%}:m    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{nØ%}:n    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{pØ%}:p    — HJK and KimmoK ideas oppama:o%{pØ%}pama
     %{rØ%}:r    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{sØ%}:s    — HJK and KimmoK ideas närväs:när%{bv%}ä%{sØ%}
     %{vØ%}:v    — HJK and KimmoK ideas kana:ka%{nØ%}na
    
     %{pØ%}:0    — häbü:häbü+N:hä%{pØ%}%{pbØ%}ü
     %{tØ%}:0    — koda:ko%{tØ%}%{tdØ%}a
     %{kØ%}:0    — nägo:nä%{kØ%}%{kgØ%}o
    
     %{bv%}:b    — HJK and KimmoK ideas närväs:när%{bv%}ä%{sØ%}
     %{dr%}:d    — HJK and KimmoK ideas parras:par%{dr%}a%{sØ%}
     %{bm%}:b    — HJK and KimmoK ideas lammas:lam%{bm%}a%{sØ%}
     %{pb%}:p    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{pb%}:b    — HJK and KimmoK ideas kana:ka%{nØ%}na
    
     %{tØ%}:t    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{t́Ø%}:t    — HJK and KimmoK ideas jaht́lõma:jah%{t́Ø%}%{eØ%}%{lĺ%}
     %{td%}:t    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{t́d́%}:t́    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{kg%}:k    — HJK and KimmoK ideas kaigas:kai%{kg%}as
    
     %{pbØ%}:p   — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{pbØ%}:b   — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{pbØ%}:0   — HJK and KimmoK ideas kana:ka%{nØ%}na
    
     %{pbv%}:p   %{pbv%}:b   %{pbv%}:v   — tõbi: tõvõ tõpõ tõppõ
    
     %{tdØ%}:d   — HJK and KimmoK ideas kana:ka%{nØ%}na
    
     %{kgØ%}:k   — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{kgØ%}:g   — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{kgØ%}:0   — HJK and KimmoK ideas kana:ka%{nØ%}na
    
     %{jiØ%}:i   — HJK and KimmoK ideas vari:var%{jiØ%}o
     %{qmn%}:q   — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{qn%}:q    — HJK and KimmoK ideas kana:ka%{nØ%}na
    
     %{dd́Ø%}:d   
     %{dd́n%}:d   
     %{dd́r%}:d   
     %{dd́v%}:d   
     %{dd́Ø%}:d   
     %{gǵv%}:g   
     %{gǵØ%}:g   
     %{kḱg%}:k    %{kḱg%}:ḱ    %{kḱg%}:g   
     %{kḱØ%}:k   
     %{pṕb%}:p   %{pṕb%}:ṕ    %{pṕb%}:b   
     %{tt́d%}:t    %{tt́d%}:t́    %{tt́d%}:d   
     %{tt́Ø%}:t    täh%{tt́Ø%}
     %{pṕØ%}:p   
    
    

    Palatalization of consonants

     %{bb́%}:b    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{cć%}:c    — HJK and KimmoK ideas Isaać:Isaa%{cć%}:ci
     %{dd́%}:d    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{ff́%}:f    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{gǵ%}:g    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{hh́%}:h    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{kḱ%}:k    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{lĺ%}:l     — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{lĺ%}:ĺ     — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{mḿ%}:m    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{nń%}:n    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{pṕ%}:p    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{rŕ%}:r    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{sś%}:s    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{sś%}:ś    — HJK and KimmoK ideas vaśma:va%{sØ%}%{sś%}
     %{tt́%}:t    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{vv́%}:v    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{ḱǵj%}:ḱ   — HJK and KimmoK ideas laǵa:la%{ḱǵj%}a
     %{zź%}:z    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{dd́n%}:d 
    

    Miscellaneous other symbols

     %{XV%}:0    — This is used for echoing the previous vowel
     %{XC%}:0    — This is used for lengthening a consonant
     %^I7:0      — This appears in stem vaoma:va%^I7o for vaio
     %^K7:0      — This appears in stem väemä:vä%^K7e for väkeq
     %^V7:0      — This appears in stem häömä:hä%^V7ö for hävvü
     %^T7:0      — This appears in stem kaoma:ka%^T7o for katoq
     %^Y7:õ      — This appears for syna = s%^Y7na and is rendered as õ in the norm
    

    Triggers

       %^OO2Õ:0    — joo%^OO2Õ%>i:j0õ0%>i
       %^CC2C:0    — att%^CC2C%>m%{aä%} atma
     %^PSS:0       vowel in passive tahetu, sõidõtu, eletü
     %^ÄI2ÄÄ:0    — päiv%^ÄI2ÄÄ%>ä: päävä
     %{front%}:0    — front harmony
     %{back%}:0    — back harmony
    %^ErrorBack:0  — +Err/Orth+Clt:%>kinaq in front harmony context BHARM disallowance
     %{PrsSg1%}:0  — this helps with %{eõ%}:i̬
    
    
    %^NoGrad:0     — This will be placed after a stem to break Gradation
    %^APOCH:0      — This causes apocope: puhksama vs puhastaq
    %^StrD2T:0     — This changes g,d,b => k,t,p
    
    %^G1:0	       — This is used with %{pØ%} %{pbØ%} for 0 0, also t, k
    %^G2:0	       — This is used with %{pØ%} %{pbØ%} for 0 b, also t, k
    %^G3:0	       — This is used with %{pØ%} %{pbØ%} for 0 p, also t, k
    %^G4:0	       — This is used with %{pØ%} %{pbØ%} for p p, also t, k
    
    %^WGStem:0     — This weakens "kipõń" to "kibõna", "ompel" to "ommel"
    %^StrGStem:0   — This strengthens "perädü" to "perätüt"
    %^ShortGStem:0   — This shortens "pu%{tØ%}tu" to "putma", an orthographic convention
    %^LongGStem:0     — This lengthens "pu%{tØ%}tu" to "puttuq"
    
    %^Pen:0        — This moves us to penultimate coda
    %^PAL:0	       — Palatalization
    %^NoPAL:0	       — NoPalatalization
    
    %^JI20:0	       — in vari: vaŕo
    %^JI2I:0	       — in vari vari
    %^JI2J:0	       — in vari: varjo
    
    %^PenWGStem:0  — This weakens "kipõń" to "kibõna"
    %^PenVowRM:0   — syncope tapõld : taplõma 
    %^D2S:0        — The ti => si
    %^TS2S:0       — The -ts- => -s-
    %^I2J:0        — The i => j change
    %^PLPRT:0      — The a:o attested in Plural kana:kanno and prt
    %^VOWRaise:0   — Raises vowel
    %^VOWLower:0   — Lowers vowel
    %^XLowerVow:0  — Lowers vowel two levels
    %^VOWLowerDelab:0   — Lowers vowel and delabializes it
    %^XLowerVowDelab:0  — Lowers vowel two levels and delabializes it
    %^U2E:0        — lowers u:õ and ü:e delabializes and lowers
    %^U2A:0        — lowers u:a and ü:ä delabializes and lowers
    %^VowRM:0      — this will remove stem final vowel
    %^CnsRM:0      — this will remove stem final consonant tervüs:tervü
    

    Onset consonant or word boundary

    Right context for gradation

    Rules

    VOWEL HARMONY

    Vowel harmony suffixes Front

    %{aä%}:a

    %{aä%}:ä

    %{uü%}:u

    %{uü%}:ü

    %{eõ%}:õ

    %{eõ%}:e

    %{ae%}:e tahtma+V+Pss+PrfPrc+Sg+Nom: want/haluta

    %{aõ%}:õ

    %{äe%}:e
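The harmony pairs listed above can be read as a lookup: each archiphoneme has a back and a front realisation, and the `%{back%}` or `%{front%}` marker on the stem selects one of them. The Python sketch below illustrates only that selection. It is not the twolc rule set: it covers just three of the symbols, ignores the `:0` realisations, and takes the harmony value as an explicit argument. The name `resolve_harmony` is hypothetical.

```python
# Back and front realisations, taken from the pairs listed above.
HARMONY = {
    "%{aä%}": {"back": "a", "front": "ä"},
    "%{uü%}": {"back": "u", "front": "ü"},
    "%{eõ%}": {"back": "õ", "front": "e"},
}


def resolve_harmony(suffix: str, harmony: str) -> str:
    """Replace every harmony archiphoneme in `suffix` with its
    realisation for `harmony` ('back' or 'front')."""
    if harmony not in ("back", "front"):
        raise ValueError("harmony must be 'back' or 'front'")
    for symbol, values in HARMONY.items():
        suffix = suffix.replace(symbol, values[harmony])
    return suffix


if __name__ == "__main__":
    # The infinitive suffix m%{aä%} after a back-harmony stem.
    print(resolve_harmony("m%{aä%}", "back"))
```

This mirrors the `hirnu` example later in this file, where a `{back}` stem followed by `>m{aä}` surfaces with `>ma`.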

    VOWEL LOWERING

    u:o

    ü:ö

    o2õ

    u2õ

    ö2e

    Delabializing o and ö

    VOWEL RAISING

    Delabializing o and ö

    PALATALIZATION

    n2ń palatalization all kestmä+V+Act+Ind+Prt+Sg3:

    akaŕ+A+Sg+Nom

    asi+N+Sg+Gen:

    alostama+V+Act+Ind+Prt+Sg3:

    %{kḱ%}:ḱ kakma

    n2n no palatalization all

    rehksämä+V+Inf/mA:

    {dd́n}:d́ palatalization for 3-way

    särǵ+N+Sg+Nom: roach/särki

    {dd́n}:n weaken 3-way

    andma+V+Act+Ind+Prs+Sg1

    püüdmä+V+Act+Ind+Prs+Sg1

    %{dd́v%}:v

    %{pṕb%}:p loroṕ+N+Sg+Par:

    %{tt́d%}:t

    hainatama+V+Inf/mA

    %{kḱg%}:k

    %{pṕb%}:ṕ loroṕ

    %{tt́d%}:t́

    %{kḱg%}:ḱ

    kõiḱ+Pron+Sg+Nom

    VOWEL CHANGE WITH PLURAL

    tegemä+V+Act+Ind+Prs+Sg1: do

    õ2õ̭

    o2u̬

    Vx%{ou%}:Vyo

    hoolas+A+Sg+Nom:

    Vx%{ou%}2Vyu̬ nuuĺ+N+Sg+Nom: arrow

    kiiĺ+N+Sg+Gen: tongue/kieli

    i2e pini+N+Pl+Par: dog/koira

    i:ä päiv+N+Sg+Gen: day/päivä

    a2o

    * *ka%{nØ%}na%{back%}%^Pen%^StrGStem%^PLPRT*
    * *kanno0000*
    

    {ao}o

    * *ka%{nØ%}n%{ao%}%{back%}%^G3%^PLPRT*
    * *kanno000*
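Each example pair above shows a lexical string and, below it, the surface string with one character per lexical symbol, where `0` stands for a symbol that is realised as nothing. The Python sketch below makes that alignment explicit. It assumes the multicharacter symbols are known in advance (here a short hand-picked list, in the spirit of lexc's Multichar_Symbols declarations) and that every surface symbol is a single character, which does not hold for letters written with combining marks. The function names are illustrative.

```python
# Multicharacter symbols needed for the two kana examples only.
MULTICHAR = [
    "%{nØ%}", "%{ao%}", "%{back%}",
    "%^Pen", "%^StrGStem", "%^G3", "%^PLPRT",
]


def tokenize(lexical, symbols=MULTICHAR):
    """Split a lexical string into symbols, preferring the longest
    declared multicharacter symbol at each position."""
    ordered = sorted(symbols, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(lexical):
        for sym in ordered:
            if lexical.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            # No multicharacter symbol starts here: take one character.
            tokens.append(lexical[i])
            i += 1
    return tokens


def align(lexical, surface):
    """Pair each lexical symbol with its surface character."""
    lex = tokenize(lexical)
    if len(lex) != len(surface):
        raise ValueError("lexical and surface strings differ in length")
    return list(zip(lex, surface))


def visible_form(surface):
    """Drop the 0 placeholders to obtain the written word form."""
    return surface.replace("0", "")


if __name__ == "__main__":
    for pair in align("ka%{nØ%}na%{back%}%^Pen%^StrGStem%^PLPRT", "kanno0000"):
        print(pair)
    print(visible_form("kanno0000"))
```

In the first example the fifth pair is `a:o`, which is the a2o change this rule describes; all trigger symbols are paired with `0`.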
    

    VOWEL LOSS

    a:0 a _ (HarmDummiesVar) %> i ;

    sõda+N+Pl+Par:

    ä:0 pügämä+V+Pss+PrfPrc:

    U:0 Vx

    * *hirnu{back}^Pen^CC2C^VowRM>m{aä}*
    * *hirn00000>ma*
    * *kut{sś}u{back}^Pen^VOWRaise^Pen^PAL^VowRM*
    * *kutś0000000*
    * *tervüs{front}^VowRM^CnsRM>i>t*
    * *terv00000>i>t*
    juusk+N+Sg+Nom: ____
    * *j{ou}{ou}s{kØ}u{back}^VOWRaise^VowRM*
    * *ju̬u̬sk0000*
    
    * *kuu{back}^VOWLower^VowRM>i>d*
    * *ku0000>i>d*
    

    [ Cns: |ArchCns:| Vow: ] _ (s:) (HarmDummiesVar) (%^Pen: %^CC2C:|%^Pen: %^G3:|%^Pen: %^G4:|PenVOWHite %^Pen: %^G1:) %^VowRM: ;

    e:0

    o:0 juuma+V+Inf

    Vx%{ou%}:0 juuma+V+Inf

    Vx%{äe%}:0 Passive stem vowel nõstma+V+Inf/mA

    ö:0

    i:0 hüdsi+N+Sg+Par:

    õ:0

    %{eØ%}: 0

    %{õØ%}: 0

    VOWEL LENGTHENING

    %{XV%}:u

    %{XV%}:ü

    %{XV%}:o

    %{XV%}:a

    %{XV%}:ä

    %{XV%}:õ kannõĺ+N+Sg+Gen: kantele

    %{XV%}:i
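The `%{XV%}` realisations above all copy a vowel, in line with the earlier description of this symbol as echoing the previous vowel. A minimal Python sketch of that behaviour follows. It assumes the vowel to be echoed is the character immediately before `%{XV%}`, with no trigger symbols in between, and the input string in the usage example is made up for illustration rather than taken from the lexicon.

```python
XV = "%{XV%}"
# The realisations listed above: u, ü, o, a, ä, õ, i.
ECHO_VOWELS = set("uüoaäõi")


def echo_previous_vowel(form: str) -> str:
    """Replace each %{XV%} with a copy of the vowel right before it."""
    out = []
    i = 0
    while i < len(form):
        if form.startswith(XV, i):
            if not out or out[-1] not in ECHO_VOWELS:
                raise ValueError("%{XV%} must directly follow a vowel")
            out.append(out[-1])
            i += len(XV)
        else:
            out.append(form[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical input, only to show the lengthening effect.
    print(echo_previous_vowel("ma%{XV%}"))
```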

    i2j

    %{ij%}:j

    %{jiØ%}:j

    %{jiØ%}:i

    %{jiØ%}:0 vari+N+Sg+Gen: shadow/varjo

    %{jØ%}:0 vari+N+Sg+Gen: shadow/varjo

    u2v deprecate to “%{uv%}:v”

    %{uv%}:v

    {üv}:v

    %^I7:i

    %^I7:i

    CONSONANT %{pṕØ%}:ṕ

    %{tt́Ø%}:t́

    %{tt́Ø%}:t

    täht́+N+Err/Orth-no-pal+Sg+Nom: star/tähti

    %{kḱØ%}:ḱ

    SECONDARY CONSONANT LENGTHENING

    %{pØ%}:p

    * *hä%{pØ%}%{pbØ%}ü%{front%}%^Pen%^G4*
    * *häppü000*
    * *tõ%{pØ%}%{pbv%}%{back%}%^G4%>%{eõ%}*
    * *tõpp00%>õ*
    * *se%{pØ%}p%{front%}%^StrGStem*
    * *sepp00*
    * *nu%{pØ%}pu%{back%}%^Pen%^VOWRaise%^Pen%^StrGStem%^VowRM*
    * *nupp0000000*
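The `häppü` example above uses the `%^G4` trigger, one of the four grade triggers described earlier in this file for the symbol pair `%{pØ%} %{pbØ%}` (G1: 0 0, G2: 0 b, G3: 0 p, G4: p p). The Python sketch below turns that description into a lookup table. It is an illustration only: harmony and position markers are left out of the input, and only the p series is covered, although the description says the same pattern applies to t and k.

```python
# Realisations of (%{pØ%}, %{pbØ%}) per grade trigger, as described
# for %^G1 to %^G4; an empty string stands for the 0 realisation.
GRADES = {
    "%^G1": ("", ""),
    "%^G2": ("", "b"),
    "%^G3": ("", "p"),
    "%^G4": ("p", "p"),
}


def realise(stem: str, grade: str) -> str:
    """Fill in %{pØ%} and %{pbØ%} according to the grade trigger."""
    first, second = GRADES[grade]
    return stem.replace("%{pØ%}", first).replace("%{pbØ%}", second)


if __name__ == "__main__":
    stem = "hä%{pØ%}%{pbØ%}ü"
    for grade in GRADES:
        print(grade, realise(stem, grade))
```

For this stem, `%^G4` gives `häppü` as in the example above, and `%^G2` gives `häbü`, the lemma cited for this stem in the symbol list.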
    

    {tØ}:t

    %{t́Ø%}:t́

    %{Øk%}:k igä+N+Sg+Ill

    %{XC%}:s

    %{XC%}:l

    %{XC%}:ĺ

    %{XC%}:k

    %{cć%}:ć

    %{cć%}:c

    Consonant weakening

    kToZero

    %{pṕØ%}:0

    %{tt́Ø%}:0

    %{kḱØ%}:0

    %{sØ%}:0

    %{vØ%}:0
    kruv́ma+V+Inf/mA

    %{rØ%}:0

    %{nØ%}:0

    %{lØ%}:0

    %{mØ%}:0

    %{kØ%}:0

    nätsk+A+Sg+Gen

    kakma:

    kõiḱ+Pron+Sg+Nom

    pToZero

    %{pØ%}:0

    XØToZero agras+A+Sg+Gen

    XØToSelf villui+A+Sg+Nom

    kevväi+N+Sg+Gen: spring

    %{sØ%}:s ratas+N+Sg+Nom

    %{hØ%}:h hamõh+N+Sg+Nom

    %{kØ%}:k rehksämä+V+Inf/mA:

    %{pb%}:p

    %{t́d́%}:d́

    %{t́d́%}:t́

    %{td%}:t

    %{kg%}:k akaŕ+A+Sg+Nom

    %{kg%}:g apteḱ+N+Sg+Gen:

    nõkõś+N+Sg+Ill

    %{td%}:d

    kaotama+V+Act+Ind+Prs+Sg1:

    %{tt́d%}:d kergütämä+V+Act+Ind+Prs+Sg1:

    tToZero hüdsi+N+Sg+Par:

    %{tØ%}:0

    sõda+N+Sg+Gen:

    %{t́Ø%}:0

    CONSONANT QUALITY CHANGE

    %{pṕb%}:b

    %{pb%}:b habras+A+Sg+Nom

    p2b

    b20

    %{pbØ%}:b

    %{dr%}:r murrõq+N+Sg+Nom

    %{dr%}:d murrõq+N+Sg+Gen

    %{ḱǵj%}:ǵ

    %{ḱǵj%}:ḱ

    %{ḱǵj%}:0

    %{tdØ%}:d

    %{dØ%}:d väärdlemä+V+Inf/mA

    kaardas+N+Sg+Nom

    %{kgØ%}:g jõgi+N+Sg+Nom: river / joki

    %{pbv%}:b

    hammas

    %{bm%}:m

    %{bm%}:b

    %{bv%}:v

    %{dn%}:n kannõĺ+N+Sg+Nom: kantele

    %{dl%}:l

    %{dv%}:v

    VdVToVtV

    dTos

    tTos

    tTod kaotama+V+Act+Ind+Prs+Sg1:

    There should always be a trigger

    ** %{dn%}:d**

    j2i

    **{kḱg}:g **

    kõiḱ+Pron+Sg+Gen

    k2g

    igä+N+Sg+Ill

    bTop

    %{pbv%}:p

    %{pbØ%}:p

    %{tdØ%}:t

    %{kgØ%}:k

    STEM-FINAL CONSONANT LOSS

    s20 kirotus+N+Pl+Gen:

    usś+N+Sg+Par door

    vaśma+V+Inf/mA

    %{bv%}:b närväs+A+Sg+Gen:

    %{gØ%}:g liig+A+Sg+Nom:

    d20

    %{dØ%}:0

    g20 (deprecated in favour of {gǵØ}:0)

    %{gØ%}:0

    {gǵØ}:0 särǵ+N+Sg+Gen: roach/särki

    {gǵØ}:g särǵ+N+Sg+Ill: roach/särki

    %{pbv%}:v

    %{pbØ%}:0

    %{tdØ%}:0

    %{kgØ%}:0

    püüdmä+V+Act+Ind+Prs+Sg3

    pereq

    naŕma

    Other marks

    Disallow %^ErrorBack:0 in BHARM

    Disallow %^ErrorBack:0 in BHARM


    This (part of) documentation was generated from src/fst/morphology/phonology.twolc


    src-fst-morphology-root.lexc.md

    Võru tags and basic lexica

    Definitions for Multichar_Symbols

    Analysis symbols

    The morphological analyses of wordforms for the Võro language are presented in this system in terms of the following symbols. (It is highly suggested to follow existing standards when adding new tags).

    The parts-of-speech are:

    The parts of speech are further split up into:

    The usage extents are marked using the following tags:

    The nominals are inflected in the following cases and numbers:

    Possession: there are no possessive markers.

    The comparative forms are:

    Verb personal forms are:

    Subject conjugation

    Passive conjugation

    Special symbols are classified with:

    Question and Focus particles:

    Tags distinguishing different versions of the same lemma (before POS)

    Derivations are classified under the morphophonetic form of the suffix, the source and target part-of-speech.

    Morphophonology

    To represent phonologic variations in word forms we use the following symbols in the lexicon files:

     %{aä%}    — Vowel harmony with "(t)a/ä" AÄ1:a AÄ1:ä AÄ1:0
     %{ae%}   — Vowel harmony with "a/e/õ" passive tahetu
     %{aõ%}   — Vowel harmony with "a/e/õ" passive sõidõtu
     %{äe%}    — Vowel harmony with "ä/e/õ" passive
     %{eõ%}    — Vowel harmony with "e/õ"
     %{uü%}    — Vowel harmony with "u/ü"
     %{öü%}    — Vowel raising
     %{ou%}    — Vowel raising
     %{ei%}    — Vowel raising
     %{õy%}    — Vowel raising
     %{ao%}    — Vowel raising
     %{eØ%}    — ütlemä:üt%{eØ%}l  
     %{õØ%}    — ütlemä:üt%{eØ%}l  
     %{Øõ%}    — juurdlõma:juur%{dØ%}%{0õ%}l
     %{XV%}    — This is used for echoing the previous vowel
     %{XC%}    — This is used for lengthening a consonant
     %{dØ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{tØ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{t́Ø%}    — HJK and KimmoK ideas jaht́lõma:jah%{t́Ø%}%{eØ%}%{lĺ%}
     %{dv%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{ij%}    ellä%{ij%}
     %{gv%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{gl%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{gØ%}    — HJK and KimmoK ideas argnõma:ar%{gØ%}
     %{uv%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{üv%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
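    The harmony archiphonemes in the list above surface as one of two vowels depending on the harmony of the stem. The sketch below shows that choice for three of them. The table covers only `{aä}`, `{uü}` and `{eõ}`, it ignores the zero realisation mentioned for `{aä}`, and the names `HARMONY` and `realise` are hypothetical.

```python
# Realisation per harmony class. Simplified assumption: two-way choice only.
HARMONY = {
    "{aä}": {"back": "a", "front": "ä"},
    "{uü}": {"back": "u", "front": "ü"},
    "{eõ}": {"back": "õ", "front": "e"},
}


def realise(symbols: list[str], harmony: str) -> str:
    """Spell out harmony archiphonemes for a 'back' or 'front' stem."""
    if harmony not in ("back", "front"):
        raise ValueError("harmony must be 'back' or 'front'")
    return "".join(
        HARMONY[s][harmony] if s in HARMONY else s for s in symbols
    )


# The suffix m{aä} after a back-harmony stem, as in the hirnu example
# of the phonology documentation.
print(realise(["m", "{aä}"], "back"))
```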
    

    Gemination

     %{hØ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{jØ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{kØ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{lØ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{mØ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{nØ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{pØ%}    — HJK and KimmoK ideas oppama:o%{pØ%}pama
     %{rØ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{sØ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{vØ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{Øp%}    — häbü:hä%{Øp%}%{pbØ%}ü
     %{Øt%}    — koda:ko%{Øt%}%{tdØ%}a
     %{Øk%}    — nägo:nä%{Øk%}%{kgØ%}o
    
    

    Strong and weak

     %{pb%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{td%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{t́d́%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{kg%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{bv%}    — HJK and KimmoK ideas närväs:när%{bv%}ä%{sØ%}
     %{dr%}    — HJK and KimmoK ideas parras:par%{dr%}a%{sØ%}
     %{bm%}    — HJK and KimmoK ideas lammas:lam%{bm%}a%{sØ%}
     %{dn%}    — HJK and KimmoK ideas lammas:lam%{bm%}a%{sØ%}
     %{dl%}    — HJK and KimmoK ideas lammas:lam%{bm%}a%{sØ%}
     %{pbØ%}   — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{pbv%}   — tõbi: tõvõ tõpõ tõppõ
     %{tdØ%}   — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{kgØ%}   — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{jiØ%}   — HJK and KimmoK ideas vari:var%{jiØ%}o
     %{qmn%}   — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{qn%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{dd́Ø%}   
     %{dd́n%}   
     %{dd́r%}   
     %{dd́v%}   
     %{dd́Ø%}   
     %{gǵv%}   
     %{gǵØ%}   
     %{tt́d%}   
     %{tt́Ø%}    täh%{tt́Ø%}
     %{kḱg%}   
     %{kḱØ%}   
     %{pṕb%}   
     %{pṕØ%}   
    
     %{dśtv%}    tä%{üv%}%{śtv%}
     %{djśt%}    vii%{jśt%}
     %{drśt%}    var%{rśt%}
    
    

    Palatalization

     %{bb́%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{dd́%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{ff́%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{gǵ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{hh́%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{kḱ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{lĺ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{mḿ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{nń%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{pṕ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{rŕ%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{sś%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{tt́%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{vv́%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{ḱǵj%}   — HJK and KimmoK ideas laǵa:la%{ḱǵj%}a
     %{zź%}    — HJK and KimmoK ideas kana:ka%{nØ%}na
     %{dd́n%}  
    

    **%^I7** This appears in stem vaoma:va%^I7o for vaio
    **%^K7** This appears in stem väemä:vä%^K7e for väkeq
    **%^V7** This appears in stem häömä:hä%^V7ö for hävvü
    **%^T7** This appears in stem kaoma:ka%^T7o for katoq
    **%^Y7** This appears for syna = s%^Y7na and is rendered as õ in the norm

    The following triggers control variation:

    %^ErrorBack +Err/Orth+Clt:%>kinaq in front harmony context (BHARM disallowance)

    %^CC2C att%^CC2C%>m%{aä%} atma
    %^OO2Õ joo%^OO2Õ%>i:j0õ0%>i
    %^PSS vowel in passive tahetu, sõidõtu, eletü

    %^ÄI2ÄÄ päiv%^ÄI2ÄÄ%>ä: päävä
    %{PrsSg1%} — this helps with %{eõ%}:i̬

    %^StrD2T This changes g,d,b, => k,t,p

    **%^VowRM** this will remove stem final vowel
    **%^CnsRM** this will remove stem final consonant tervüs:tervü
    **%^StrGStem** This strengthens “perädü” to “perätüt”
    **%^NoGrad**
    **%^WGStem** This weakens
    %^G1 — This is used with %{pØ%} %{pbØ%} for 0 0, also t, k
    %^G2 — This is used with %{pØ%} %{pbØ%} for 0 b, also t, k
    %^G3 — This is used with %{pØ%} %{pbØ%} for 0 p, also t, k
    %^G4 — This is used with %{pØ%} %{pbØ%} for p p, also t, k “sõda” to “sõtta”
    %^ShortGStem — This shortens “pu%{tØ%}tu” to “putma”, an orthographic convention
    %^LongGStem — This lengthens “pu%{tØ%}tu” to “puttuq”
    %^Pen This moves us to penultimate coda
    %^PAL — Palatalization
    %^NoPAL — NoPalatalization
    %^JI20 — in vari: vaŕo
    %^JI2I — in vari vari
    %^JI2J — in vari: varjo
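    The four grade triggers `%^G1` to `%^G4` select how the pair `%{pØ%} %{pbØ%}` is realised. The sketch below restates that table in code for the `p` series only. It writes the zero realisation as an empty string, and the names `GRADES` and `apply_grade` are hypothetical. The stem frame `hä…ü` comes from the phonology examples; the printed forms are simply what the table yields and are not checked against the analyser.

```python
# Realisation of the pair ({pØ}, {pbØ}) under each grade trigger,
# following the descriptions of ^G1..^G4; "" stands for 0.
GRADES = {
    "G1": ("", ""),
    "G2": ("", "b"),
    "G3": ("", "p"),
    "G4": ("p", "p"),
}


def apply_grade(before: str, after: str, grade: str) -> str:
    """Insert the graded consonant pair between two stem parts."""
    first, second = GRADES[grade]
    return before + first + second + after


for grade in ("G1", "G2", "G3", "G4"):
    print(grade, apply_grade("hä", "ü", grade))
```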

    %^PenWGStem This weakens “kipõń” to “kibõna”

    ** %^PenVowRM ** syncope tapõld : taplõma

    **%^D2S** käsi, susi
    %^TS2S The -ts- => -s-
    %^I2J The i => j change

    **%^PLPRT** The a:o attested in Plural kana:kanno and prt
    **%^VOWRaise** Raises vowel
    **%^VOWLower** Lowers vowel
    **%^XLowerVow** Lowers vowel two levels
    **%^VOWLowerDelab** Lowers vowel and delabializes it
    **%^XLowerVowDelab** Lowers vowel two levels and delabializes it
    %^U2E lowers u:õ and ü:e delabializes and lowers
    %^U2A lowers u:a and ü:ä delabializes and lowers

    = a symbol used in front of # to block backtracking and MWE reanalysis in hfst-tokenise (e.g. in dynamic compounds). Makes it possible to distinguish lexical and dynamic compounds in rules. It is converted to zero together with #.

    Flag Explanation
    @D.ErrOrth.ON@  
    @C.ErrOrth@  
    @P.ErrOrth.ON@  
    @R.ErrOrth.ON@  

    Oahpa Place names and case used

    The tagged part of the compound should make a compound using:

    Flag diacritics

    We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics. They only allow compounds with verbs if the verb is further derived into a noun again:

    | Flag diacritic | Explanation |
    |:---------------|:------------|
    | @P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
    | @D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
    | @C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised |
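    To make the interplay of the `P`, `D` and `C` flags concrete, the sketch below interprets a path of symbols using the usual flag-diacritic semantics: `P` sets a feature, `C` clears it, `D` blocks the path if the feature has the given value, `R` requires it, and `U` unifies. It is a simplified stand-in for what the finite-state runtime does, not code from this project. The function name `path_allowed` is hypothetical, and `verb` and `noun-suffix` are placeholders rather than lexicon symbols.

```python
import re

# @OP.Feature.Value@ or @OP.Feature@ for the operators handled here.
FLAG = re.compile(r"@([PDRUC])\.([^.@]+)(?:\.([^.@]+))?@")


def path_allowed(symbols: list[str]) -> bool:
    """Return True if the flag diacritics along the path are consistent."""
    state: dict[str, str] = {}
    for symbol in symbols:
        match = FLAG.fullmatch(symbol)
        if match is None:
            continue  # ordinary symbol, no effect on the flag state
        op, feature, value = match.groups()
        current = state.get(feature)
        if op == "P":      # positive set
            state[feature] = value
        elif op == "C":    # clear
            state.pop(feature, None)
        elif op == "D":    # disallow: fail if the feature has this value
            if current is not None and (value is None or current == value):
                return False
        elif op == "R":    # require: fail unless the feature has this value
            if current is None or (value is not None and current != value):
                return False
        elif op == "U":    # unify: set if unset, otherwise values must agree
            if current is None:
                state[feature] = value
            elif current != value:
                return False
    return True


# A verb sets NeedNoun; reaching the check without clearing it is blocked.
blocked = path_allowed(["@P.NeedNoun.ON@", "verb", "@D.NeedNoun.ON@"])
# Nominalisation clears the flag first, so the path goes through.
allowed = path_allowed(
    ["@P.NeedNoun.ON@", "verb", "noun-suffix", "@C.NeedNoun@", "@D.NeedNoun.ON@"]
)
print(blocked, allowed)
```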

    For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.

    | Flag diacritic | Explanation |
    |:---------------|:------------|
    | @P.CmpFrst.FALSE@ | Require that words tagged as such only appear first |
    | @D.CmpPref.TRUE@ | Block such words from entering ENDLEX |
    | @P.CmpPref.FALSE@ | Block these words from making further compounds |
    | @D.CmpLast.TRUE@ | Block such words from entering R |
    | @D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding |
    | @U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding |
    | @P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R |
    | @D.CmpOnly.FALSE@ | Disallow words coming directly from root. |

    Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.

    @U.Cap.Obl@ Allowing downcasing of derived names: deatnulasj.
    @U.Cap.Opt@ Allowing downcasing of derived names: deatnulasj.
    @U.Case.Abe@ Abessive
    @U.Case.Abl@ Ablative
    @U.Case.Ade@ Adessive
    @U.Case.All@ Allative
    @U.Case.Com@ Comitative
    @U.Case.Ela@ Elative
    @U.Case.Gen@ Genitive
    @U.Case.Ill@ Illative
    @U.Case.Ine@ Inessive
    @U.Case.Nom@ Nominative
    @U.Case.Par@ Partitive
    @U.Case.Ter@ Terminative
    @U.Case.Tra@ Translative
    @U.Number.Pl@ Plural
    @U.Number.Sg@ Singular

    The following flag diacritics are being applied for vowel harmony variation:

    | Flag diacritic | Explanation |
    |:---------------|:------------|
    | @U.VowHarm.B@ | Back harmony, used with subsequent Err/Orth-front |
    | @U.VowHarm.F@ | Front harmony, used with subsequent Err/Orth-back |

    | Flag diacritic | Explanation |
    |:---------------|:------------|
    | @U.number.one@ | Flag used to give arabic numerals in smj different cases |
    | @U.number.two@ | Flag used to give arabic numerals in smj different cases |
    | @U.number.three@ | Flag used to give arabic numerals in smj different cases |
    | @U.number.four@ | Flag used to give arabic numerals in smj different cases |
    | @U.number.five@ | Flag used to give arabic numerals in smj different cases |
    | @U.number.six@ | Flag used to give arabic numerals in smj different cases |
    | @U.number.seven@ | Flag used to give arabic numerals in smj different cases |
    | @U.number.eight@ | Flag used to give arabic numerals in smj different cases |
    | @U.number.nine@ | Flag used to give arabic numerals in smj different cases |
    | @U.number.zero@ | Flag used to give arabic numerals in smj different cases |

    The Root lexicon

    The word forms in the Võro language start from the lexeme roots of basic word classes, or optionally from prefixes:

    Incoming

    less complex word classes


    This (part of) documentation was generated from src/fst/morphology/root.lexc


    src-fst-morphology-stems-acronyms.lexc.md

    Acronyms Veps acronyms …


    This (part of) documentation was generated from src/fst/morphology/stems/acronyms.lexc


    src-fst-morphology-stems-adjectives_newwords.lexc.md

    This is where new words are added as lexc entries before they are added to the xml source files. kõhna+A:kõhna A_1HANS1A “” ;
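    The sample entry above has the shape `upper:lower CONTINUATION "gloss" ;`. As a rough illustration of that shape, the sketch below splits such a line into its four parts. It assumes straight double quotes and no `%`-escaped colons or spaces, so it is not a full lexc parser; `parse_entry` is a hypothetical helper.

```python
import re

# upper:lower  CONTINUATION_CLASS  "gloss"  ;
# Assumption: no %-escaped colons or spaces inside the entry.
ENTRY = re.compile(
    r'^(?P<upper>[^:\s]+):(?P<lower>\S+)\s+'
    r'(?P<cont>\S+)\s+"(?P<gloss>[^"]*)"\s*;\s*$'
)


def parse_entry(line: str) -> dict[str, str]:
    """Split one simple lexc entry into upper, lower, class and gloss."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"not a simple lexc entry: {line!r}")
    return match.groupdict()


entry = parse_entry('kõhna+A:kõhna A_1HANS1A "" ;')
print(entry["upper"], entry["lower"], entry["cont"])
```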

    ADD NOUNS BELOW



    This (part of) documentation was generated from src/fst/morphology/stems/adjectives_newwords.lexc


    src-fst-morphology-stems-adpositions_newwords.lexc.md

    This is where new words are added as lexc entries before they are added to the xml source files. perrä:perrä PO_ “(eng) /(est) /(fin) “ ;

    ADD NOUNS BELOW


    This (part of) documentation was generated from src/fst/morphology/stems/adpositions_newwords.lexc


    src-fst-morphology-stems-adverbs_newwords.lexc.md

    CHECKME


    This (part of) documentation was generated from src/fst/morphology/stems/adverbs_newwords.lexc


    src-fst-morphology-stems-determiners_newwords.lexc.md

    This is where new words are added as lexc entries before they are added to the xml source files. perrä:perrä PO_ “(eng) /(est) /(fin) “ ;

    ADD DETERMINERS BELOW


    This (part of) documentation was generated from src/fst/morphology/stems/determiners_newwords.lexc


    src-fst-morphology-stems-exceptions.lexc.md

    ADVERBS

    ADJECTIVES

    CONJUNCTIONS

    GENITIVE ATTRIBUTES

    NOUNS

    PROPER NOUNS

    PLURAL NOUNS

    NUMERALS

    POSTPOSITIONS

    PRONOUNS

    VERBS

    andma to give/antaa

    VERBS WITH FORMS TO STUDY

    kündma to plow/kyntää

    nakkama to begin/ alkaa

    olõma to be/ olla

    nakkama to start/ alkaa

    pandma to put/panna

    pidämä to keep/ pitää

    tundma to feel/tuntea


    This (part of) documentation was generated from src/fst/morphology/stems/exceptions.lexc


    src-fst-morphology-stems-interjections_newwords.lexc.md

    This is where new words are added as lexc entries before they are added to the xml source files.

    ADD INTERJECTIONS BELOW


    This (part of) documentation was generated from src/fst/morphology/stems/interjections_newwords.lexc


    src-fst-morphology-stems-nouns.lexc.md

    hanśa+N:hanśa N_1HANS1A “” ;


    This (part of) documentation was generated from src/fst/morphology/stems/nouns.lexc


    src-fst-morphology-stems-nouns_newwords.lexc.md

    This is where new words are added as lexc entries before they are added to the xml source files. hanśa+N:hanśa N_1HANS1A “” ;

    ADD NOUNS BELOW

    N_HAIDAK, N_10ESAEK in -gu
    N_10AABITS in -dsa, -ga
    N_10HWRAK in -ga ~ -gu
    N_10HEERITS in -dsä
    N_10RAAMAT, N_LEMBIT in -du/dü

    two-syllable

    Three-syllable words


    This (part of) documentation was generated from src/fst/morphology/stems/nouns_newwords.lexc


    src-fst-morphology-stems-verbs.lexc.md

    atma+V:atta, ikma+V:ikkõ petmä+V:pettä


    This (part of) documentation was generated from src/fst/morphology/stems/verbs.lexc


    src-fst-morphology-stems-verbs_newwords.lexc.md

    This is where new words are added as lexc entries before they are added to the xml source files.

    ADD VERBS BELOW

    verb type split

    atma+V:atta, ikma+V:ikkõ petmä+V:pettä


    This (part of) documentation was generated from src/fst/morphology/stems/verbs_newwords.lexc


    src-fst-phonetics-txt2ipa.xfscript.md

    retroflex plosive, voiceless t` ʈ 0288, 648 (` = ASCII 096)
    retroflex plosive, voiced d` ɖ 0256, 598
    labiodental nasal F ɱ 0271, 625
    retroflex nasal n` ɳ 0273, 627
    palatal nasal J ɲ 0272, 626
    velar nasal N ŋ 014B, 331
    uvular nasal N\ ɴ 0274, 628

    bilabial trill B\ ʙ 0299, 665
    uvular trill R\ ʀ 0280, 640
    alveolar tap 4 ɾ 027E, 638
    retroflex flap r` ɽ 027D, 637
    bilabial fricative, voiceless p\ ɸ 0278, 632
    bilabial fricative, voiced B β 03B2, 946
    dental fricative, voiceless T θ 03B8, 952
    dental fricative, voiced D ð 00F0, 240
    postalveolar fricative, voiceless S ʃ 0283, 643
    postalveolar fricative, voiced Z ʒ 0292, 658
    retroflex fricative, voiceless s` ʂ 0282, 642
    retroflex fricative, voiced z` ʐ 0290, 656
    palatal fricative, voiceless C ç 00E7, 231
    palatal fricative, voiced j\ ʝ 029D, 669
    velar fricative, voiced G ɣ 0263, 611
    uvular fricative, voiceless X χ 03C7, 967
    uvular fricative, voiced R ʁ 0281, 641
    pharyngeal fricative, voiceless X\ ħ 0127, 295
    pharyngeal fricative, voiced ?\ ʕ 0295, 661
    glottal fricative, voiced h\ ɦ 0266, 614
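    The table above pairs each X-SAMPA symbol with an IPA character and its code point. A conversion based on such a table has to try longer symbols first, so that `N\` is not read as `N` followed by a backslash. The sketch below shows that longest-match idea on six rows of the table. It is written in Python for illustration and is not the xfscript implementation; `xsampa_to_ipa` is a hypothetical name, and the retroflex entries assume the conventional X-SAMPA spelling with a trailing backtick.

```python
# A handful of rows from the table, keyed by X-SAMPA symbol.
XSAMPA_TO_IPA = {
    "t`": "\u0288",   # retroflex plosive, voiceless
    "d`": "\u0256",   # retroflex plosive, voiced
    "N\\": "\u0274",  # uvular nasal (X-SAMPA N followed by a backslash)
    "N": "\u014B",    # velar nasal
    "S": "\u0283",    # postalveolar fricative, voiceless
    "T": "\u03B8",    # dental fricative, voiceless
}

# Longest keys first, so two-character symbols win over their prefixes.
_KEYS = sorted(XSAMPA_TO_IPA, key=len, reverse=True)


def xsampa_to_ipa(text: str) -> str:
    """Convert the known X-SAMPA symbols in text, copying everything else."""
    out: list[str] = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(XSAMPA_TO_IPA[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # no table entry: keep the character
            i += 1
    return "".join(out)


print(xsampa_to_ipa("aN\\a"), xsampa_to_ipa("aNa"))
```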

    alveolar lateral fricative, vl. K
    alveolar lateral fricative, vd. K\

    labiodental approximant P (or v\)
    alveolar approximant r\
    retroflex approximant r\`
    velar approximant M\

    retroflex lateral approximant l`
    palatal lateral approximant L
    velar lateral approximant L\

    Clicks

    bilabial O\ (O = capital letter)
    dental |\
    (post)alveolar !\
    palatoalveolar =\
    alveolar lateral |\|\

    Ejectives, implosives

    ejective _> e.g. ejective p p_>
    implosive _< e.g. implosive b b_<

    Vowels

    close back unrounded M
    close central unrounded 1
    close central rounded }
    lax i I
    lax y Y
    lax u U

    close-mid front rounded 2
    close-mid central unrounded @\
    close-mid central rounded 8
    close-mid back unrounded 7

    schwa ə @

    open-mid front unrounded E
    open-mid front rounded 9
    open-mid central unrounded 3
    open-mid central rounded 3\
    open-mid back unrounded V
    open-mid back rounded O

    ash (ae digraph) {
    open schwa (turned a) 6

    open front rounded &
    open back unrounded A
    open back rounded Q

    Other symbols

    voiceless labial-velar fricative W
    voiced labial-palatal approx. H
    voiceless epiglottal fricative H\
    voiced epiglottal fricative <\
    epiglottal plosive >\

    alveolo-palatal fricative, vl. s\
    alveolo-palatal fricative, voiced z\
    alveolar lateral flap l\
    simultaneous S and x x\
    tie bar _

    Suprasegmentals

    primary stress "
    secondary stress %
    long :
    half-long :\
    extra-short _X
    linking mark -\
    Tones and word accents

    level extra high _T
    level high _H
    level mid _M
    level low _L
    level extra low _B
    downstep !
    upstep ^ (caret, circumflex)

    contour, rising _R
    contour, falling _F
    contour, high rising _H_T
    contour, low rising _B_L

    contour, rising-falling _R_F

    (NB Instead of being written as diacritics with _, all prosodic marks can alternatively be placed in a separate tier, set off by < >, as recommended for the next two symbols.)

    global rise <R>
    global fall <F>

    Diacritics

    voiceless _0 (0 = figure), e.g. n_0
    voiced _v
    aspirated _h
    more rounded _O (O = letter)
    less rounded _c
    advanced _+
    retracted _-
    centralized _"
    syllabic = (or _=) e.g. n= (or n_=)
    non-syllabic _^
    rhoticity `

    breathy voiced _t
    creaky voiced _k
    linguolabial _N
    labialized _w
    palatalized ' (or _j) e.g. t' (or t_j)
    velarized _G
    pharyngealized _?\

    dental _d
    apical _a
    laminal _m
    nasalized ~ (or _~) e.g. A~ (or A_~)
    nasal release _n
    lateral release _l
    no audible release _}

    velarized or pharyngealized _e
    velarized l, alternatively 5
    raised _r
    lowered _o
    advanced tongue root _A
    retracted tongue root _q


    This (part of) documentation was generated from src/fst/phonetics/txt2ipa.xfscript


    src-fst-transcriptions-transcriptor-abbrevs2text.lexc.md

    We describe here how abbreviations in Võro are read out, e.g. for text-to-speech systems.

    For example:


    This (part of) documentation was generated from src/fst/transcriptions/transcriptor-abbrevs2text.lexc


    src-fst-transcriptions-transcriptor-numbers-digit2text.lexc.md

    Ordinal numerals begin


    This (part of) documentation was generated from src/fst/transcriptions/transcriptor-numbers-digit2text.lexc


    src-fst-transcriptions-transcriptor-symbols2text.lexc.md

    This file contains mappings from abbreviations and some acronyms to full forms for text-to-speech purposes. This is a supplement to the analyser; the analyser must tag the strings as +ABBR or similar for the transcriptions to work. The resulting full form must be lemmas known to the analyser, for further processing.

    We describe here how abbreviations in Võro are read out, for text-to-speech systems.

    The file contains:


    This (part of) documentation was generated from src/fst/transcriptions/transcriptor-symbols2text.lexc


    tools-grammarcheckers-grammarchecker.cg3.md

    [ L A N G U A G E ] G R A M M A R C H E C K E R

    DELIMITERS

    TAGS AND SETS

    Tags

    This section lists all the tags inherited from the fst, and used as tags in the syntactic analysis. The next section, Sets, contains sets defined on the basis of the tags listed here, those set names are not visible in the output.

    Beginning and end of sentence

    BOS EOS

    Parts of speech tags

    N A Adv V Pron CS CC CC-CS Po Pr Pcle Num Interj ABBR ACR CLB LEFT RIGHT WEB PPUNCT PUNCT

    COMMA ¶

    Tags for POS sub-categories

    Pers Dem Interr Indef Recipr Refl Rel Coll NomAg Prop Allegro Arab Romertall

    Tags for morphosyntactic properties

    Nom Acc Gen Ill Loc Com Ess Ess Sg Du Pl Cmp/SplitR Cmp/SgNom Cmp/SgGen Cmp/SgGen PxSg1 PxSg2 PxSg3 PxDu1 PxDu2 PxDu3 PxPl1 PxPl2 PxPl3 Px

    Comp Superl Attr Ord Qst IV TV Prt Prs Ind Pot Cond Imprt ImprtII Sg1 Sg2 Sg3 Du1 Du2 Du3 Pl1 Pl2 Pl3 Inf ConNeg Neg PrfPrc VGen PrsPrc Ger Sup Actio VAbess

    Err/Orth

    Semantic tags

    Sem/Act Sem/Ani Sem/Atr Sem/Body Sem/Clth Sem/Domain Sem/Feat-phys Sem/Fem Sem/Group Sem/Lang Sem/Mal Sem/Measr Sem/Money Sem/Obj Sem/Obj-el Sem/Org Sem/Perc-emo Sem/Plc Sem/Sign Sem/State-sick Sem/Sur Sem/Time Sem/Txt

    HUMAN

    PROP-ATTR PROP-SUR

    TIME-N-SET

    Syntactic tags

    @+FAUXV @+FMAINV @-FAUXV @-FMAINV @-FSUBJ> @-F<OBJ @-FOBJ> @-FSPRED<OBJ @-F<ADVL @-FADVL> @-F<SPRED @-F<OPRED @-FSPRED> @-FOPRED> @>ADVL @ADVL< @<ADVL @ADVL> @ADVL @HAB> @<HAB @>N @Interj @N< @>A @P< @>P @HNOUN @INTERJ @>Num @Pron< @>Pron @Num< @OBJ @<OBJ @OBJ> @OPRED @<OPRED @OPRED> @PCLE @COMP-CS< @SPRED @<SPRED @SPRED> @SUBJ @<SUBJ @SUBJ> SUBJ SPRED OPRED @PPRED @APP @APP-N< @APP-Pron< @APP>Pron @APP-Num< @APP-ADVL< @VOC @CVP @CNP OBJ

    -OTHERS SYN-V @X

### Sets containing sets of lists and tags

This part of the file lists a large number of sets based partly upon the tags defined above, and partly upon lexemes drawn from the lexicon. See the sourcefile itself to inspect the sets, what follows here is an overview of the set types.

#### Sets for Single-word sets

INITIAL

#### Sets for word or not

WORD NOT-COMMA

#### Case sets

ADLVCASE CASE-AGREEMENT CASE NOT-NOM NOT-GEN NOT-ACC

#### Verb sets

NOT-V

#### Sets for finiteness and mood

REAL-NEG MOOD-V NOT-PRFPRC

#### Sets for person

SG1-V SG2-V SG3-V DU1-V DU2-V DU3-V PL1-V PL2-V PL3-V

#### Pronoun sets

#### Adjectival sets and their complements

#### Adverbial sets and their complements

#### Sets of elements with common syntactic behaviour

#### NP sets defined according to their morphosyntactic features

#### The PRE-NP-HEAD family of sets

These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression **WORD - premodifiers**.

#### Border sets and their complements

#### Grammarchecker sets

* * *

This (part of) documentation was generated from [tools/grammarcheckers/grammarchecker.cg3](https://github.com/giellalt/lang-vro/blob/main/tools/grammarcheckers/grammarchecker.cg3)

---

## tools-tokenisers-tokeniser-disamb-gt-desc.pmscript.md

## Tokeniser for vro

Usage:

```
$ make
$ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

Pmatch documentation: <https://github.com/hfst/hfst/wiki/HfstPmatch>

Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

* Punct contains ASCII punctuation marks
* The symbol after m-dash is soft-hyphen `U+00AD`
* The symbol following {•} is byte-order-mark / zero-width no-break space `U+FEFF`.

Whitespace contains ASCII white space and the List contains some unicode white space characters

* En Quad U+2000 to Zero-Width Joiner U+200d
* Narrow No-Break Space U+202F
* Medium Mathematical Space U+205F
* Word joiner U+2060

Apart from what's in our morphology, there are

1. unknown word-like forms, and
2. unmatched strings

We want to give 1) a match, but let 2) be treated specially by `hfst-tokenise -a`

Unknowns are made of:

* lower-case ASCII
* upper-case ASCII
* select extended latin symbols
* ASCII digits
* select symbols
* Combining diacritics as individual symbols,
* various symbols from Private area (probably Microsoft), so far:
  * U+F0B7 for "x in box"

### Unknown handling

Unknowns are tagged ?? and treated specially with `hfst-tokenise`

hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG, they get a default baseform equal to the wordform, but no tag to check, so it's safer to let hfst-tokenise handle them.

Finally we mark as a token any sequence making up a:

* known word in context
* unknown (OOV) token in context
* sequence of word and punctuation
* URL in context

* * *

This (part of) documentation was generated from [tools/tokenisers/tokeniser-disamb-gt-desc.pmscript](https://github.com/giellalt/lang-vro/blob/main/tools/tokenisers/tokeniser-disamb-gt-desc.pmscript)

---

## tools-tokenisers-tokeniser-gramcheck-gt-desc.pmscript.md

## Grammar checker tokenisation for vro

Requires a recent version of HFST (3.10.0 / git revision>=3aecdbc)

Then just:

```
$ make
$ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

More usage examples:

```
$ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

Pmatch documentation: <https://github.com/hfst/hfst/wiki/HfstPmatch>

Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

* Punct contains ASCII punctuation marks
* The symbol after m-dash is soft-hyphen `U+00AD`
* The symbol following {•} is byte-order-mark / zero-width no-break space `U+FEFF`.

Whitespace contains ASCII white space and the List contains some unicode white space characters

* En Quad U+2000 to Zero-Width Joiner U+200d
* Narrow No-Break Space U+202F
* Medium Mathematical Space U+205F
* Word joiner U+2060

Apart from what's in our morphology, there are

1. unknown word-like forms, and
2. unmatched strings

We want to give 1) a match, but let 2) be treated specially by hfst-tokenise -a

* select extended latin symbols
* select symbols
* various symbols from Private area (probably Microsoft), so far:
  * U+F0B7 for "x in box"

TODO: Could use something like this, but built-in's don't include šžđčŋ:

Simply give an empty reading when something is unknown: hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG, they get a default baseform equal to the wordform, but no tag to check, so it's safer to let hfst-tokenise handle them.

Finally we mark as a token any sequence making up a:

* known word in context
* unknown (OOV) token in context
* sequence of word and punctuation
* URL in context

* * *

This (part of) documentation was generated from [tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript](https://github.com/giellalt/lang-vro/blob/main/tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript)

---

## tools-tokenisers-tokeniser-tts-cggt-desc.pmscript.md

## TTS tokenisation for smj

Requires a recent version of HFST (3.10.0 / git revision>=3aecdbc)

Then just:

```sh
make
echo "ja, ja" \
  | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

More usage examples:

```sh
echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa \
boasttu olmmoš, man mielde lahtuid." \
  | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" \
  | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
echo "márffibiillagáffe" \
  | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

Pmatch documentation: <https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstPmatch>

Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

* Punct contains ASCII punctuation marks
* The symbol after m-dash is soft-hyphen `U+00AD`
* The symbol following {•} is byte-order-mark / zero-width no-break space `U+FEFF`.

Whitespace contains ASCII white space and the List contains some unicode white space characters

* En Quad U+2000 to Zero-Width Joiner U+200d
* Narrow No-Break Space U+202F
* Medium Mathematical Space U+205F
* Word joiner U+2060

Apart from what's in our morphology, there are

1. unknown word-like forms, and
2. unmatched strings

We want to give 1) a match, but let 2) be treated specially by hfst-tokenise -a

* select extended latin symbols
* select symbols
* various symbols from Private area (probably Microsoft), so far:
  * U+F0B7 for "x in box"

TODO: Could use something like this, but built-in's don't include šžđčŋ:

Simply give an empty reading when something is unknown: hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG, they get a default baseform equal to the wordform, but no tag to check, so it's safer to let hfst-tokenise handle them.

Needs hfst-tokenise to output things differently depending on the tag they get

* * *

This (part of) documentation was generated from [tools/tokenisers/tokeniser-tts-cggt-desc.pmscript](https://github.com/giellalt/lang-vro/blob/main/tools/tokenisers/tokeniser-tts-cggt-desc.pmscript)
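
The tokeniser descriptions above name the Unicode white space characters that are treated as whitespace in addition to ASCII white space. As a rough cross-check of that list, the sketch below collects the same code points in Python. It only mirrors the documented list and is not the pmscript definition; the names `EXTRA_WHITESPACE` and `is_whitespace_like` are hypothetical.

```python
# Code points named in the tokeniser documentation:
# En Quad U+2000 through Zero-Width Joiner U+200D, plus three single ones.
EXTRA_WHITESPACE = set(range(0x2000, 0x200D + 1)) | {0x202F, 0x205F, 0x2060}


def is_whitespace_like(ch: str) -> bool:
    """True for ASCII white space and for the listed Unicode characters."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    if ch.isascii() and ch.isspace():
        return True
    return ord(ch) in EXTRA_WHITESPACE


print(is_whitespace_like("\u202f"), is_whitespace_like("a"))
```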

    Sitemap