Livvi NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-olo


Livvi language model documentation

All doc-comment documentation in one large file.


src-cg3-dependency.cg3.md

COMMON SÁMI DEPENDENCY GRAMMAR

This dep file is for sma, sme, smj, sje.

DELIMITERS

Sentence delimiters are the following: <.> <!> <?> <…> <¶>
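The sketch below illustrates what the delimiter list does. It is written in Python and is not part of the grammar: it closes a sentence window after each delimiter wordform. Tokens are assumed to be bare wordforms without the CG3 "<…>" quoting. The names DELIMITERS and split_windows are hypothetical.

```python
from typing import Iterable, List

# Wordforms that close a sentence window, mirroring the delimiter list above.
DELIMITERS = {".", "!", "?", "…", "¶"}


def split_windows(tokens: Iterable[str]) -> List[List[str]]:
    """Group tokens into windows, closing a window after each delimiter."""
    windows: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        current.append(tok)
        if tok in DELIMITERS:
            windows.append(current)
            current = []
    if current:  # trailing tokens without a final delimiter form a last window
        windows.append(current)
    return windows
```

For example, split_windows(["a", "b", ".", "c", "?"]) yields two windows, each ending in its delimiter.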

TAGS AND SETS

N V A Adv CC CS Inf Sup Neg Num Po Pr

Pcle Prop

Pron IV TV COMMA DASH CITATION HYPHEN QMARK PUNCT LEFT RIGHT CLB Ind Pot Impr ImprtII Cond ConNeg Caus causative eus VGen Interj ABBR ACR Prs Prt Cmpnd RCmpnd PrfPrc PrsPrc Actor Actio Ger Indef Nom Acc Ill Com Gen Ess

IM For fao

POS sub-categories

Syntactic tags and sets

Syntactic tags in input to this file

Syntactic tags added in this file

fao syntags

kal syntags

eus syntags

Syntactic set definitions

Dep grammar

Correction rules

The finite verb

Mapping rules

lgRemove removes the language tags , , etc, before proceeding to the dep file.


This (part of) documentation was generated from src/cg3/dependency.cg3


src-cg3-disambiguator.cg3.md

Disambiguator for Olonets

Sets

Sentence delimiters are the following: “<.>” “<…>” “<!>” “<?>” “<¶>”

Part-of-Speech

Numerus

Cases

Types

Sets with more members

Boundaries

Verbs

Disambiguation rules

Dialects

Early rules

Possessive suffixes

Numeral phrases

Preposition/postposition/adverb rules

Rules for mapping @CVP and @CNP on the CC and CS

Case rules

Partitive

Genitive

Illative

Number rules

More disambiguation rules

Elative

Proper nouns

Verbs

Specific verbs

ei negation verb

eli

Adverbs

paljon

kerran

jälkhiin

Adjectives

Conjunctions

Subjunctions

että

jos

ko

sillä

Pronouns

Verb rules, Verbs

Infinitive

Present Sg3

Present Pl3 or PrsPrc

Present Pl3 or Passive

Imperative

Past tense

Prt Pl3 or Prt Sg2

Negative verb

Relative pronouns

HNOUN MAPPING


This (part of) documentation was generated from src/cg3/disambiguator.cg3


src-cg3-functions.cg3.md

These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression WORD - premodifiers.

The set NOT-NPMOD is used to find barriers between NPs. Typical usage: … (*1 N BARRIER NOT-NPMOD) …, meaning: scan to the first noun, ignoring anything that can be part of the noun phrase of that noun (i.e., “scan to the next NP head”).
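The scan described above can be sketched in Python. This is a simplified model and not CG3 itself. Each cohort is reduced to a single POS tag. PREMODIFIERS is a hypothetical stand-in for the grammar's own premodifier set, so NOT-NPMOD corresponds to every tag outside it. In this sketch the N test runs before the barrier test, so the head itself does not stop the scan.

```python
from typing import List, Optional

# Hypothetical premodifier set; the real grammar defines its own members.
PREMODIFIERS = {"A", "Num", "Pron"}


def scan_to_np_head(tags: List[str], start: int) -> Optional[int]:
    """Approximate (*1 N BARRIER NOT-NPMOD) from position `start`.

    Scan rightwards from start + 1 and return the index of the first "N".
    Return None at the first cohort that cannot be an NP premodifier.
    """
    for i in range(start + 1, len(tags)):
        if tags[i] == "N":
            return i  # found the NP head
        if tags[i] not in PREMODIFIERS:
            return None  # barrier: this cohort lies outside the NP
    return None  # reached the end of the window without a head
```

Real cohorts carry several readings, and careful-mode tests are not modelled here.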

These were the set types.

HABITIVE MAPPING

sma object

SUBJ MAPPING - leftovers

OBJ MAPPING - leftovers

HNOUN MAPPING


This (part of) documentation was generated from src/cg3/functions.cg3


src-fst-morphology-affixes-abbreviations.lexc.md

Lexicons without final period

Lexicons with final period


This (part of) documentation was generated from src/fst/morphology/affixes/abbreviations.lexc


src-fst-morphology-affixes-adjectives.lexc.md

Livvi adjective inflection

Temporary lexica

Somewhat open issues

LEXICON A_BAHUV

LEXICON A_UNDECL indeclinable (Finnish: taipumaton)

LEXICON A_IS-N-PL-GEN genitive plural attributes

LEXICON A_IS-N-SG-GEN genitive singular attributes

LEXICON A-DEM_NÄMÄ nämä:nämä

ONE-SYLLABLE VOWEL-FINAL STEMS ENDING IN LONG VOWEL SEGMENT

LEXICON A_KUU kuu:kuu

LEXICON A_MUA mua:maa

LEXICON A_PIÄ piä:piä

Ordinary inflection

TWO-SYLLABLE VOWEL-FINAL STEMS

LEXICON A_HYVÄ hyvä:hyvä the comparatives and superlatives are suppletive

LEXICON A_OZA oza:oza

LEXICON A_SÄYNÄ säynä:säynä

LEXICON A_KALA kala:kala

LEXICON A_KOIVU koivu:koivu

LEXICON A_HERY hery:hery

LEXICON A_LUGU lugu:lugu

LEXICON A_IDY idy:idy

LEXICON A_HÄKKI häkki:häkki

LEXICON A_ARTELI

LEXICON A_ALUS alus:aluks

LEXICON A_KUURNIS kuurnis:kuurne

LEXICON A_PUHTAHUS puhtahus:puhtahu

LEXICON A_OLUT olut:olu

TWO-SYLLABLE VOWEL-FINAL STEM WITH UNIQUE +Nom+Sg VOWEL

LEXICON A_PÄIVY päivy:päivä

LEXICON A_MUARJU muarju:muarja

LEXICON A_AKKU akku:akka

LEXICON A_VALGEI valgei:valge

LEXICON A_RAHMANNOI rahmannoi:rahmannoi

LEXICON A_PAGIZII pagizii:pagizi

LEXICON A_KESTÄY kestäy:kestä

TWO-SYLLABLE VOWEL-FINAL STEMS WITH CONSONANT-FINAL PARTITIVE STEM

THREE-SYLLABLE VOWEL-FINAL STEMS

LEXICON A_PAREMBI parembi:paremb

LEXICON A_JIÄTÖI jiätöi:jiät

LEXICON A_HUOLETOI huoletoi:huolet

LEXICON A_HUOLETOI/JIÄTÖI huoletoi:huolet

LEXICON A_KARJALAINE karjalaine:karjala

LEXICON A_LIYGILÄINE liygiläine:liygilä

LEXICON A_NAINE naine:nai

LEXICON A_KIELINE kieline:kieli

LEXICON A_NAINE/KIELINE_01 kieline:kieli naine:nai

LEXICON A_TOINE toine:to

LEXICON A_TOINE-PL toine:to

THREE-SYLLABLE STEMS WITH TWO-SYLLABLE NOMINATIVE SINGULAR

LEXICON A_MADAL madal:madal

LEXICON A_PIIRAI piirai:piira

LEXICON A_RAIŠ raiš:ra

LEXICON A_PEREH pereh:pereh

LEXICON A_TULLUH tulluh:tullu

LEXICON A_PESSYH pessyh:pessy

LEXICON A_ARMAS armas:arma

LEXICON A_VARVAS varvas:varva

LEXICON A_TUORES tuores:tuore

LEXICON A_SUARI suari:suar

LEXICON A_KIELI kieli:kiel

LEXICON A_SUARI/KIELI_01 suari:suar

LEXICON A_VUOZI vuozi:vuod

LEXICON A_VEZI vezi:ved

LEXICON A_NIMI nimi:nim

LEXICON A_JÄLGI jälgi:jälg front vowel gradation Yes

TWO-SYLLABLE WORD WITH CONSONANT-FINAL STEM

LEXICON A_VAŽEN važen:važe

LEXICON A_LÄMMIN lämmin:lämbi

LEXICON A_TAIGIN taigin:taigin

LEXICON A_SALBOIN salboin:salboi

LEXICON A_ENIN enin:eni

These cases are symmetrically marked for number. The next two share the same stem vowel.


This (part of) documentation was generated from src/fst/morphology/affixes/adjectives.lexc


src-fst-morphology-affixes-adverbs.lexc.md

Adverbs

Olonets-Karelian adverbs inflect for comparison.

LEXICON ADV-IS-ELA-WITH-PXSG3 e.g. levälleh


This (part of) documentation was generated from src/fst/morphology/affixes/adverbs.lexc


src-fst-morphology-affixes-clitics.lexc.md

Clitics

Livvi clitics


This (part of) documentation was generated from src/fst/morphology/affixes/clitics.lexc


src-fst-morphology-affixes-nouns.lexc.md

Noun inflection

Livvi nouns inflect for case. Vowel harmony has two values, Front and Back. Gradation does not affect all consonants, so there are three gradation values: Yes, No and NA (not applicable).

The file proper

ONE-SYLLABLE VOWEL-FINAL STEMS ENDING IN LONG VOWEL SEGMENT

LEXICON N_SUO suo:suo Gradation: No Harmony: Back

LEXICON N_VYÖ vyö:vyö Gradation: No Harmony: Front

LEXICON N_KUU kuu:kuu Gradation: No Harmony: Back

LEXICON N_PII pii:pii Gradation: No Harmony: Front

LEXICON N_MUA mua:mua Gradation: No Harmony: Back

LEXICON N_PIÄ piä:piä Gradation: No Harmony: Front

TWO-SYLLABLE VOWEL-FINAL STEMS

LEXICON N_PAPPI pappi:pappi Gradation: Yes Harmony: Back stem final i is retained

LEXICON N_HÄKKI häkki:häkki Gradation: Yes Harmony: Front stem final i is retained

LEXICON N_LEIRI leiri:leiri Gradation NA Harmony: Front stem final i is retained

LEXICON N_PADA pada:pada Gradation: Yes. Harmony: Back. Stem-final a changes to u in Sg Par; stem-final a changes to o before i in the Pl stem.

LEXICON N_KALA kala:kala Gradation: NA. Harmony: Back. Stem-final a changes to u in Sg Par; stem-final a changes to o before i in the Pl stem.

LEXICON N_OZA oza:oza Gradation: NA. Harmony: Back. Stem-final a changes to u in Sg Par; stem-final a changes to 0 before i in the Pl stem.

LEXICON N_SÄYNÄ säynä:säynä Gradation: NA. Harmony: Front. Stem-final ä changes to i in Sg Par; stem-final ä changes to 0 before i in the Pl stem.
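The stem alternations described for N_PADA, N_KALA, N_OZA and N_SÄYNÄ can be illustrated with the Python sketch below. It only restates those comments and does not reproduce how the lexc/twolc code implements them. Gradation is ignored, and the function names and the drop_vowel switch are hypothetical.

```python
# Stem-final vowel -> changed vowel + partitive vowel, per the comments above:
# a changes to u in the Sg Par, ä changes to i.
SG_PAR_ENDING = {"a": "ua", "ä": "iä"}


def sg_partitive(stem: str) -> str:
    """Singular partitive of a stem ending in a or ä (KeyError otherwise)."""
    return stem[:-1] + SG_PAR_ENDING[stem[-1]]


def plural_stem(stem: str, drop_vowel: bool) -> str:
    """Plural stem ending in the plural marker i.

    drop_vowel=True models the OZA/SÄYNÄ type (final vowel -> 0 before i);
    otherwise stem-final a becomes o before i (PADA/KALA type).
    """
    if drop_vowel:
        return stem[:-1] + "i"
    if stem.endswith("a"):
        return stem[:-1] + "oi"
    raise ValueError(f"no o-plural rule for {stem!r}")
```

With these definitions, sg_partitive("kala") gives "kalua" and plural_stem("oza", drop_vowel=True) gives "ozi".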

LEXICON N_KOIVU koivu:koivu Gradation: NA. Harmony: Back. Stem-final u does not change. Plural stem in loi.

LEXICON N_HERY hery:hery Gradation: NA. Harmony: Front. Stem-final y does not change. Plural stem in löi.

LEXICON N_IDY idy:idy Gradation: Yes. Harmony: Front. Stem-final y does not change. Plural stem in löi.

LEXICON N_LUGU lugu:lugu Gradation: Yes. Harmony: Back. Stem-final u does not change. Plural stem in loi.

LEXICON N_RUNO runo:runo Gradation: NA. Harmony: Back. Stem-final o changes to u in Sg Par. Plural stem in loi.

LEXICON N_HÖRÖ hörö:hörö Gradation NA Harmony: Front

LEXICON N_RUADO ruado:ruado Gradation Yes Harmony: Back

LEXICON N_KYNDÖ kyndö:kyndö Gradation Yes Harmony: Front

TWO-SYLLABLE VOWEL-FINAL STEM WITH UNIQUE +Nom+Sg VOWEL

LEXICON N_JÄLGI jälgi:jälg Gradation Yes Harmony: Front

LEXICON N_JOGI jogi:jog Gradation Yes Harmony: Back

LEXICON N_MUAMO muamo:muama Gradation NA Harmony: Back

LEXICON N_TUATTO tuatto:tuatta Gradation Yes Harmony: Back

LEXICON N_DIÄDÖ diädö:diädä Gradation No Harmony: Front

LEXICON N_MUARJU muarju:muarja Gradation No Harmony: Back. Two forms for the accusative; two forms for elative and ablative phrases.

LEXICON N_PIÄSTÄNDY piäständy:piäständä Gradation No Harmony: Front

LEXICON N_SUAJU suaju:suaja Gradation No Harmony: Back

LEXICON N_AKKU akku:akka Gradation Yes Harmony: Back

LEXICON N_KNIIGU kniigu:kniiga Gradation No Harmony: Back

LEXICON N_SULGU sulgu:sulga Gradation Yes Harmony: Back

LEXICON N_KOIRU koiru:koira Gradation NA Harmony: Back

LEXICON N_NIMI nimi:nim Gradation NA Harmony: Front

LEXICON N_HANGI hangi:hang Gradation NA Harmony: Back

LEXICON N_PÄIVY päivy:päivä Gradation NA Harmony: Front

LEXICON N_MEČČY meččy:meččä Gradation Yes Harmony: Front

LEXICON N_IŽÄNDY ižändy:ižändä Gradation No Harmony: Front

LEXICON N_LATE late:latte Gradation Yes Harmony: Back

LEXICON N_SIVE sive:side Gradation Yes Harmony: Front

LEXICON N_HARDIE hardie:hardie Gradation NA Harmony: Back

LEXICON N_KONDII kondii:kondi Gradation NA Harmony: Back

LEXICON N_STIPENDII stipendii:stipendi Gradation NA Harmony: Back

LEXICON N_REBOI reboi:reboi Gradation No Harmony: Back

LEXICON N_JÄNÖI jänöi:jänöi Gradation No Harmony: Back

LEXICON N_PÖČÖI pöčöi:pöččö Gradation Yes Harmony: Front

LEXICON N_VALGEI valgei:valge Gradation NA Harmony: Back

LEXICON N_LIBEI libei:libe Gradation NA Harmony: Back

LEXICON N_OSTAI ostai:osta Gradation NA Harmony: Back

LEXICON N_PEZII pezii:pezi Gradation NA Harmony: Front

LEXICON N_KESTÄY kestäy:kestä Gradation NA Harmony: Front

TWO-SYLLABLE VOWEL-FINAL STEMS WITH CONSONANT-FINAL PARTITIVE STEM

LEXICON N_UKSI uksi:uks Gradation NA Harmony: Back

LEXICON N_SUARI suari:suar Harmony: Back

LEXICON N_SUARI-PL suari:suar Harmony: Back

LEXICON N_SUARI/KIELI_01 kieli:kiel Gradation No

LEXICON N_KIELI kieli:kiel Harmony: Front

LEXICON N_KIELI-SG kieli:kiel

LEXICON N_KIELI-PL kieli:kiel Harmony: Front

LEXICON N_LAPSI lapsi:laps Gradation NA Harmony: Back

LEXICON N_VEZI vezi:ved Gradation NA Harmony: Front

LEXICON N_SUZI suzi:su Gradation NA Harmony: Back

LEXICON N_VUOZI vuozi:vuod Gradation NA Harmony: Back

THREE-SYLLABLE VOWEL-FINAL STEMS

LEXICON N_SYGYZY sygyzy:sygyzy Gradation NA Harmony: Front

LEXICON N_VASKIČČU vaskičču:vaskičča Gradation Yes Harmony: Back

THREE-SYLLABLE STEMS WITH TWO-SYLLABLE NOMINATIVE SINGULAR

LEXICON N_KARJAL karjal:karjal Gradation NA Harmony: Back

LEXICON N_KARJAL-SG karjal:karjal

LEXICON N_KARJAL-PL karjal:karjal

LEXICON N_MADAL madal:madal Gradation No Harmony: Back

LEXICON N_MADAL-SG madal:madal

LEXICON N_MADAL-PL madal:madal

LEXICON N_PIIRAI piirai:piira CHECK THIS Gradation NA Harmony: Back

LEXICON N_VERÄI veräi:verä CHECK THIS Gradation NA Harmony: Back

LEXICON N_KANDAI kandai:kanda Gradation NA Harmony: Back

LEXICON N_AVUAJU Gradation NA Harmony: Back

LEXICON N_KERIÄJY piästäjy:piästä Gradation NA Harmony: Front

LEXICON N_PAGIZII pagizii:pagizi Gradation NA Harmony: Back

LEXICON N_PAGIZII-SG

LEXICON N_PAGIZII-PL

LEXICON N_HUOLETOI huoletoi:huolet Gradation Yes Harmony: Back

LEXICON N_SAMMAL sammal:sammal Gradation No Harmony: Back

LEXICON N_ŠOUFER šoufer:šoufer Vowel Harmony: Back

LEXICON N_VANUIN vanuin:vanui Gradation Yes Harmony: Back Stem consonant n/m Singular stem vowel 0/0/e Plural stem vowel i

LEXICON N_SAMMUTIN sammutin:sammutti Gradation Yes Harmony: Back Stem consonant n/m Singular stem vowel 0/0/e Plural stem vowel i

LEXICON N_KEITIN keitin:keitti Gradation Yes Harmony: Front Stem consonant n/m Singular stem vowel 0/0/e Plural stem vowel i

LEXICON N_LÄMMIN lämmin:lämbi Gradation Yes Harmony: Front

LEXICON N_TAIGIN taigin:taigin Gradation No Harmony: Back

LEXICON N_KARJALAINE karjalaine:karjala Gradation NA Harmony: Back

LEXICON N_LIYGILÄINE liygiläine:liygilä Gradation NA Harmony: Front

LEXICON N_NAINE naine:nai Gradation NA Harmony: Back

LEXICON N_KIELINE kieline:kieli Gradation NA Harmony: Front

LEXICON N_TOINE toine:to Gradation NA Harmony: Back

LEXICON N_RAIŠ raiš:ra Gradation Yes Harmony: Back

LEXICON N_TOVESTEH tovesteh:tovesteh Gradation No Harmony: Back

LEXICON N_PEREH pereh:pereh Gradation No Harmony: Front

LEXICON N_HUIKEH huikeh:huikkeh Gradation Yes Harmony: Front

LEXICON N_LIIKEH liikeh:liikkeh Gradation Yes Harmony: Front

LEXICON N_PENGER penger:penger Gradation No Harmony: Front

LEXICON N_ARTELI arteli:arteli Gradation No Harmony: Back

LEXICON N_PUHTAHUS puhtahus:puhtahu Harmony: Back Gradation NA

LEXICON N_VÄHYS vähys:vähy Harmony: Back Gradation NA

LEXICON N_ALUS alus:aluks Gradation No Harmony: Back

LEXICON N_ALUS-SG alus:aluks

LEXICON N_ALUS-PL alus:aluks

LEXICON N_ILVES ilves:ilveks Gradation No Harmony: Front

LEXICON N_ILVES-SG ilves:ilveks

LEXICON N_ILVES-PL ilves:ilveks

LEXICON N_MIES mies:mie Gradation No Harmony: Front

LEXICON N_MIES-SG mies:mie

LEXICON N_MIES-PL mies:mie

LEXICON N_KUURNIS kuurnis:kuurne Gradation NA Harmony: Back

LEXICON N_SUALIŠ suališ:suali Gradation NA Harmony: Back

LEXICON N_SUALIŠ-SG suališ:suali

LEXICON N_SUALIŠ-PL suališ:suali

LEXICON N_OLUT olut:olu Gradation No Harmony: Back

LEXICON N_KEVÄT kevät:kevä Gradation No Harmony: Front

LEXICON N_TUHAT tuhat:tuha Gradation No Harmony: Back

LEXICON N_ARMAS armas:arma Gradation NA Harmony: Back

LEXICON N_VARVAS varvas:varba Gradation NA Harmony: Back

LEXICON N_PAREMBI parembi:paremb Gradation NA Harmony: Back

LEXICON N_PESSYH pessyh:pessy Gradation NA Harmony: Front

LEXICON N_BEMMEL bemmel:bembel Gradation Yes Harmony: Front

LEXICON N_SUURIM suurim:suurim Gradation NA Harmony: Back

LEXICON N_TUATINDAM tuatindam:tuatindam Gradation NA Harmony: Back

LEXICON N_TUATANDIM tuatandim:tuatandim Gradation NA Harmony: Back

LEXICON N_SIEMEN siemen:siemen Gradation NA Harmony: Front

LEXICON N_SALBOIN salboin:salboi Gradation No Harmony: Back

LEXICON N_UDAR udar:udar Gradation No Harmony: Back

LEXICON N_PIENAR pienar:piendar Gradation Yes Harmony: Back

NOMINAL DECLENSION BEGINS

Back vowel gradation Yes

LEXICON NMN_MUARJU/PIÄSTÄNDY muarju:muarja gradation NA

LEXICON NMN_AKKU akku:akka gradation Yes

LEXICON NMN_KNIIGU kniigu:kniiga gradation No

NMN = nominals: case marking shared by nouns, adjectives, proper nouns and numerals

Front Vowel Gradation Yes

LEXICON NMN_YKSI yksi:y

LEXICON NMN_VUOZI/VEZI vuozi:vuod

LEXICON NMN_PAREMBI/ENÄMBI parembi:paremb

LEXICON NMN_KUDAI kudai:kuda

LEXICON NMN_PIIRAI/VERÄI piirai:piira

LEXICON NMN_PAGIZII-SG

LEXICON NMN_PAGIZII-PL

LEXICON NMN_HUOLETOI/JIÄTÖI huoletoi:huolet

Gradation Yes Vowel Harmony Back

LEXICON NMN_KOIRU/PÄIVY koiru:koira Gradation NA Vowel Harmony Back

LEXICON NMN_AVUAJU/KERIÄJY päivy:päivä Gradation NA

gradation Yes

LEXICON NMN_KOIVU/HERY koivu:koivu Gradation NA Vowel Harmony Back

LEXICON NMN_LUGU/IDY lugu:lugu Gradation Yes Vowel Harmony Back

LEXICON NMN_RAHMANNOI rahmannoi:rahmannoi

LEXICON NMN_HARDIE hardie:hardie

LEXICON NMN_KONDII/STIPENDII kondii:kondi

LEXICON NMN_OSTAI ostua:osta%>j

Stem Vowel 0:a:0 kandai, kandajan, kandajua, kandajinnu

LEXICON NMN_PEZII pestä:pezi%>j

LEXICON NMN_KESTÄY kestäy:kestä

LEXICON NMN_RUNO/HÖRÖ runo:runo Gradation No

Gradation No

LEXICON NMN_RUADO/KYNDÖ ruado:ruado Gradation Yes

LEXICON NMN_KUU/PII kuu:kuu

LEXICON NMN_PIÄ piä:piä

LEXICON NMN_VYÖ vyö:vyö

LEXICON NMN_MUA mua:mua

LEXICON NMN_KALA kala:kala Gradation NA

LEXICON NMN_PADA pada:pada Yaml: pada Gradation Yes

LEXICON NMN_TULLUH/PESSYH pessyh

Nominative singular in “h”

LEXICON NMN_TULLUH tulluh:tullu

Nominative singular in “h”

LEXICON NMN_KARJAL karjal:karjal

LEXICON NMN_KARJAL-SG karjal:karjal

LEXICON NMN_KARJAL-PL karjal:karjal

LEXICON NMN_MADAL madal:madal

LEXICON NMN_MADAL-SG madal:madal

LEXICON NMN_MADAL-PL madal:madal

Nominative singular in “m”

Nominative singular in “n”

LEXICON NMN_ENIN enin:eni

Nominative singular in “r”

Nominative singular in “s”

LEXICON NMN_PUHTAHUS/VÄHYS puhtahus:puhtahu

LEXICON NMN_ALUS/ILVES alus:aluks

LEXICON NMN_ARMAS/EVAES armas:arma

LEXICON NMN_VARVAS varvas:varba

LEXICON NMN_VIDEL videl:videl Gradation No

LEXICON NMN_TUORES tuores:tuore

LEXICON NMN_RAIŠ raiš:ra

LEXICON NMN_KUURNIS kuurnis:kuurne

Gradation None

LEXICON NMN_TOVESTEH/PEREH pereh:pereh

LEXICON NMN_ARTELI/LEIRI arteli:arteli

LEXICON NMN_PAPPI/HÄKKI pappi:pappi

LEXICON NMN_REBOI/JÄNÖI reboi:reboi

LEXICON NMN_OZA/SÄYNÄ oza:oza

LEXICON NMN_TVER tver:tver Gradation NA Front Vowel Singular stem vowel 0/i Plural stem vowel il%{oö%}i

Gradation NA Back Vowel Singular stem vowel 0/i Plural stem vowel il%{oö%}i

LEXICON NMN_VAŽEN važen:važe

LEXICON NMN_LÄMMIN lämmin:lämbi

LEXICON NMN_TAIGIN taigin:taigin

Nominative singular in “v”

Singular suffixes

Plural suffixes

SINGULAR POSSESSA

LEXICON SGNOM/PXSP3 adding -h


This (part of) documentation was generated from src/fst/morphology/affixes/nouns.lexc


src-fst-morphology-affixes-numerals.lexc.md

Olonets numerals

Numeral inflection

Numeral inflection is like nominal inflection, except that numerals compound in all forms, which requires a great deal of care in the inflection patterns.


This (part of) documentation was generated from src/fst/morphology/affixes/numerals.lexc


src-fst-morphology-affixes-pronouns.lexc.md

Pronoun inflection

Livvi pronouns inflect for case.

DEMONSTRATIVE PRONOUNS

LEXICON PRON_TÄMÄ tämä:tä

LEXICON PRON-DEM_NÄMMÄ nämmä:nämmä

LEXICON PRON-DEM_NET net:n

INDEFINITE

LEXICON PRON-INDEF_ Still requires work

LEXICON PRON-INDEF_KEN kentahto:ke

LEXICON PRON-INDEF_KUDAI kudaitahto:kuda

LEXICON PRON-INDEF_MI mitahto:mi

LEXICON PRON-INDEF_MITTUINE mittuinetahto:mittu

LEXICON PRON-INDEF_NIMI nimi:ni«mi

LEXICON PRON-INDEF_NIKEN niken:ni«ke

LEXICON PRON-INDEF_NIMITTUINE nimittuine:ni«mittu

INTERROGATIVE

LEXICON PRON-INTERR_ Still requires work

LEXICON PRON-INTERR_MI mi:mi

LEXICON PRON-INTERR_MITTUINE mittuine:mittu

LEXICON PRON-INTERR_KEN ken:ke

REFLEXIVE PRONOUNS

LEXICON PRON-REFL_ iče:ičče iččiedäh, iččedäh, iččeh, ičes, iččenäh, iččiedäs, iččeni,

RELATIVE PRONOUNS

LEXICON PRON-REL_KUDAI kudai:kuda

LEXICON PRON_ARMAS armas:arma

LEXICON PRON_OZA oza:oza

LEXICON PRON-QNT_KIELI kieli:kiel

LEXICON PRON_TOINE toine:to

LEXICON PRON_KAI requires developing

LEXICON PRON_ENÄMBI enämbi:enämb


This (part of) documentation was generated from src/fst/morphology/affixes/pronouns.lexc


src-fst-morphology-affixes-propernouns.lexc.md

Proper noun inflection

Livvi-Karelian proper nouns inflect in the same cases as regular nouns, but sometimes with a colon (‘:’) as separator.

LEXICON PROP_

ONE-SYLLABLE LEMMA AND STEM

LEXICON PROP_VYÖ vyö:vyö

LEXICON PROP_MUA mua:maa

TWO-SYLLABLE LEMMA AND STEM

LEXICON PROP_OZA Gradation NA Back vowel Stem vowel a Plural stem in i

LEXICON PROP_OZA_FEM

LEXICON PROP-PLC_OZA

LEXICON PROP_KALA Gradation NA Back vowel Stem vowel a Plural stem in oi

LEXICON PROP_KALA_PATRFEM

LEXICON PROP-PLC_KALA

LEXICON PROP_KALA_SURFEM

LEXICON PROP_PAPPI Gradation Yes Back vowel Stem vowel i Plural stem in iloi

LEXICON PROP_PAPPI-SG

LEXICON PROP_PAPPI-PL

LEXICON PROP-PLC_PAPPI

LEXICON PROP_ARTELI Gradation NA Back vowel Stem vowel i Plural stem in iloi

LEXICON PROP_ARTELI-SG

LEXICON PROP_ARTELI-PL

LEXICON PROP_LEIRI Gradation NA Front vowel Stem vowel i Plural stem in iloi

LEXICON PROP_LEIRI-SG

LEXICON PROP_LEIRI-PL

LEXICON PROP_NIMI nimi:nim Gradation NA Front vowel Stem vowel i/e Plural stem in i

LEXICON PROP_JÄLGI Gradation Yes Vowel Harmony Front Stem Vowel i/0/e Plural stem in i

LEXICON PROP_JÄLGI-SG

LEXICON PROP_JÄLGI-PL

LEXICON PROP_JOGI Gradation Yes Vowel Harmony Back Stem Vowel i/0/e Plural stem in i

LEXICON PROP_JOGI-SG

LEXICON PROP_JOGI-PL

LEXICON PROP_SUARI suari:suar Gradation NA Vowel Harmony Back Stem Vowel i/0/e Plural stem in i

LEXICON PROP_REBOI reboi:reboi Gradation NA Vowel Harmony Back Stem Vowel oi/o Plural stem in oloi

LEXICON PROP_KOIRU koiru:koira Gradation NA Back vowel Stem vowel u/a Plural stem in i

LEXICON PROP-PLC_KOIRU koiru:koira

LEXICON PROP_PÄIVY päivy:päivä Gradation NA Vowel Harmony Front Stem Vowel y/ä Plural stem in i

LEXICON PROP-PLC_KNIIGU kniigu:kniiga Gradation No (looks like it should have gradation) Vowel Harmony Back Stem Vowel u/a Plural stem in oi

LEXICON PROP_MUARJU muarju:muarja Gradation NA Vowel Harmony Back Stem Vowel u:a Plural stem in o

LEXICON PROP-PLC_MUARJU muarju:muarja

LEXICON PROP_AKKU akku:akka Gradation Yes Vowel Harmony Back Stem Vowel u:a Plural stem in o

LEXICON PROP_KOIVU koivu:koivu Back vowel Gradation NA Stem vowel u Plural stem in loi Can be merged with _RUNO

LEXICON PROP_RUNO runo:runo Back vowel Gradation NA Stem vowel o Plural stem in loi

LEXICON PROP_RUADO ruado:ruado Back vowel Gradation Yes Stem vowel o Plural stem in loi

LEXICON PROP-PLC_RUADO ruado:ruado

LEXICON PROP_KYNDÖ kyndö:kyndö Front vowel Gradation Yes Stem vowel o Plural stem in loi

LEXICON PROP_VALGEI Back vowel Gradation NA

LEXICON PROP_VALGEI-SG

LEXICON PROP_VALGEI-PL

TWO-SYLLABLE LEMMA THREE-SYLLABLE STEM

LEXICON PROP_KARJAL karjal:karjal Back vowel Gradation NA Singular stem vowel 0/a Plural stem vowel o

LEXICON PROP-MAL_KARJAL

LEXICON PROP_KARJALAINE karjalaine:karjala

LEXICON PROP_KIELINE kieline:kieli

LEXICON PROP-PLC_TVER Tver:Tver

LEXICON PROP-PLC_TAIGIN

LEXICON PROP_PEREH pereh:pereh

LEXICON PROP_VIDEL videl:videl

LEXICON PROP-PLC_ALUS Alus:Aluks

LEXICON PROP_ALUS Alus:Aluks

LEXICON PROP_KONDII kondii:kondi

LEXICON PROP_STIPENDII kondii:kondi


This (part of) documentation was generated from src/fst/morphology/affixes/propernouns.lexc


src-fst-morphology-affixes-quantifiers.lexc.md

Quantifier inflection

Livvi quantifiers inflect for case.

LEXICON NUM_MUARJU muarju:muarja

LEXICON NUM_MILJOUNU muarju:muarja

LEXICON NUM_YKSI yksi:y

LEXICON NUM_KAKSI kaksi:ka

LEXICON NUM_SEIČČIE seiččie:seičče

LEXICON NUM_NELLI nelli:nell

LEXICON NUM_KAHEKSA kaheksa:kaheksa

LEXICON NUM_YHEKSÄ yheksä:yheksä

LEXICON NUM_TUHAT tuhat:tuha

LEXICON NUM_TUHAT_01 tuhat:tuha

LEXICON NUM_KUUZI kuuzi:kuud

LEXICON NUM_VIIZI viizi:viid

LEXICON NUM_KYMMENE kymmene:kymmen

LEXICON NUM_KYMMENE_01 kymmene:kymmen

LEXICON NUM_KOLME

LEXICON ORD_TOINE toine:to

LEXICON ORD_KARJALAINE enzimäine:ensimä

LEXICON ORD_LIYGILÄINE enzimäine:ensimä


This (part of) documentation was generated from src/fst/morphology/affixes/quantifiers.lexc


src-fst-morphology-affixes-rus-Cyrl-2-Lat-propernouns.lexc.md

Proper noun inflection

These proper nouns inflect in the same cases as regular nouns.

Vili:Vil

Russian-type surnames: Abdʼejev:Abdʼejev

Bagrij:Bagr

Amorskij:Amorsk

DECLENSION LIMITATIONS


This (part of) documentation was generated from src/fst/morphology/affixes/rus-Cyrl-2-Lat-propernouns.lexc


src-fst-morphology-affixes-symbols.lexc.md

Symbol affixes


This (part of) documentation was generated from src/fst/morphology/affixes/symbols.lexc


src-fst-morphology-affixes-verbs.lexc.md

Verb inflection

Temporary lexica

Auxiliary verbs

Regular verbs

Verbs of the Finnish type 1

V1a

V1o

V1u

V1i

VERBS OF FINNISH TYPE 2 in dA

Verbs of the Finnish type 3

Verbs of the Finnish type 3 but not identical

Verbs of the Finnish type 3 but not identical gradation: yes

Verbs of the Finnish type 4

gradation: no

gradation: no

gradation: no

gradation: no

gradation: yes

gradation: yes

gradation: no

gradation: no

gradation: no

gradation: no

gradation: yes

gradation: yes

gradation: yes

Verbs of the Finnish type 5

Reflexive verbs

V1 This verb type has two final vowels in the first infinitive

LEXICON V-3SYLL_KIRJUTTUA kirjuttua:kirjutta

REFLEXIVE CONJUGATION

HOW WILL THESE WORK

HOW WILL THESE WORK

HOW WILL THESE WORK

Nonfinites

Forthcoming

Finites

INDICATIVE PRESENT

INDICATIVE PRESENT REFLEXIVE

INDICATIVE PRETERITE

INDICATIVE PRETERITE REFLEXIVE

Conditional

CONDITIONAL PRETERITE

IMPERATIVE

IMPERATIVE REFLEXIVE

… and next chapter


This (part of) documentation was generated from src/fst/morphology/affixes/verbs.lexc


src-fst-morphology-phonology.twolc.md

The Livvi (Olonets) Karelian morphophonological/twolc rules file

This file documents the phonology.twolc file

Alphabet, sets

a b c č d e f g h i j k l m n o p r s š ş t u v w y z ž ƶ ü ä ö A B C Č D E F G H I J K L M N O P R S Š Ş T U V W Y Z Ž Ƶ Ü Ä Ö y Y

%{aä%}:a %{aä%}:ä

%{aoeInf%}:a Back vowel infinitive marker %{äöeInf%}:ä Front vowel infinitive marker

%{ui%}:i in imperative before %{aä%}:

%{oö%}:o %{oö%}:ö

%{oöØ%}:0 Used in present participle

%{uy%}:y

%{ijPRC%}:j participle

This appears in the illative: V1:a V1:e V1:i V1:o V1:u V1:y V1:ä V1:ö

These appear with the inessive and adessive V2:a V2:e V2:i V2:o V2:u V2:y V2:ä V2:ö

These reduplicate the preceding vowel if it in turn is preceded by a consonant V3:a V3:e V3:i V3:o V3:u V3:y V3:ä V3:ö V3:0

%^DEVOICE:0
%^VOICE:0 pestä : pezen

%^SyllBound:0

%^KS2S:0

No change. This example documents something that should not be done. We have two infinite (open-ended) sets: Olonets-Karelian words and incoming loanwords. The original idea was to write a rule changing all instances of adjacent double aa to ua. For this reason, a special trigger was to be inserted into the lexc stem of a word to prevent the rule from applying. The Olonets-Karelian set is more predictable and perhaps smaller than the incoming loanword set, so it is better to spell out literally the adjacent vowels that stay constant. (2019-09-02 JMR)

%^NONE:0 This trigger breaks the vowel change, e.g. sa%^NONEamelaine

%^Pen:0 pagisou paistah in combination with WGStem to trigger

%^D2Z:0 ! The ti => zi

%^E2O:0 the e => o, e => ö

%^Ä2I:0 ä => i %^A2UÄ2I:0 a => u, ä => i

%^ILoss:0 the i => 0: reboi, reboloi

%^V2U:0 v => y: kävvä, käydy

%^RVow:0 and %^RmVow:0: vowel removal, e.g. with the superlative

Triggers dictating right context phenomena

Sets

Right context for gradation

Rules

Rule: %{aä%}:a kuvitella+V+Inf: imagine/kuvitella

Rule: %{aä%}:ä ezitellä+V+Inf: present/esitellä
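The two rules above choose a or ä for the archiphoneme %{aä%} according to vowel harmony. The Python sketch below makes the same choice using a simplified harmony test. A stem counts as back-harmonic if it contains a, o or u. Stems with only i and e count as front, as with N_LEIRI and N_NIMI in the noun lexicons. The real rules rely on lexicon markers such as %{back%} and %{front%} instead of inspecting the stem. The segmentation kuvitel + l%{aä%} and the function names are hypothetical.

```python
# Archiphonemes from the twolc alphabet and their (back, front) realizations.
ARCHIPHONEMES = {
    "%{aä%}": ("a", "ä"),
    "%{oö%}": ("o", "ö"),
    "%{uy%}": ("u", "y"),
}
BACK_VOWELS = set("aouAOU")


def is_back(stem: str) -> bool:
    """Simplified harmony test: any a, o or u makes the stem back-harmonic."""
    return any(ch in BACK_VOWELS for ch in stem)


def realize(stem: str, suffix: str) -> str:
    """Attach the suffix, spelling out each archiphoneme by stem harmony."""
    back = is_back(stem)
    for arch, (back_form, front_form) in ARCHIPHONEMES.items():
        suffix = suffix.replace(arch, back_form if back else front_form)
    return stem + suffix
```

Compounds and mixed-harmony loanwords are outside the scope of this test.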

Rule: %{aäoeö%}:a

Rule: %{aäoeö%}:ä heittiä+V+Inf: throw/heittää

Rule: %{aäoeö%}:o

Rule: %{aäoeö%}:e

Rule: %{aäoeö%}:ö

Rule: %{aäoeöuiy%}:a

Rule: %{aäoeöuiy%}:ä

Rule: %{aäoeöuiy%}:o

Rule: %{aäoeöuiy%}:e

Rule: %{aäoeöuiy%}:ö

Rule: %{aäoeöuiy%}:u

Rule: %{aäoeöuiy%}:i

Rule: %{aäoeöuiy%}:y

Rule: e:0

Rule: %{ui%}:u

* %{front%}:0  ! imperative forms
* *kanda%{back%}%>kk%{ui%}%{aä%}mm%{oö%}*
* *kanda0%>kkuammo*

Rule: %{ui%}:i

* %{front%}:0  ! imperative forms
* *lʼykkä%{front%}%^WGStem%>kk%{ui%}%{aä%}mm%{oö%}*
* *lʼyk0ä00%>kkiämmö*

Rule: %{ui%}:i

Rule: %{ui%}:i

Rule: %{ui%}:i

Rule: %{aäuyiØ%}:a

Rule: %{aäuyiØ%}:u

Rule: %{aäuyiØ%}:ä

Rule: %{aäuyiØ%}:y

Rule: %{aäuyiØ%}:i

Rule: %{aäuyiØ%}:0

a:u before subsequent a Diphthong a+a => ua ostua+V+Inf: buy/ostaa

ä:i before subsequent A2 Diphthong ä+ä => iä kehittiä+V+Inf: develop/kehittää

o:u before subsequent o Diphthong o+o => uo erota:eruou

ö:y before subsequent ö Diphthong ö+ö, ö+%{oö%}:ö => yö

e:i before subsequent :e Diphthong e+e => ie
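The diphthongization rules above (a+a => ua, ä+ä => iä, o+o => uo, ö+ö => yö, and e:i before e) can be approximated with plain string replacement. The Python sketch below is deliberately naive. The %^NONE note explains why: applied blindly, such a change would also hit loanwords, so the lexicon spells out constant vowels and can block the change with %^NONE. In this sketch the trigger only keeps the two vowels apart and is then removed. The function name is hypothetical.

```python
# Long-vowel sequences and the diphthongs they become (first vowel raised).
DIPHTHONGS = {"aa": "ua", "ää": "iä", "oo": "uo", "öö": "yö", "ee": "ie"}
BLOCKER = "%^NONE"  # lexc trigger that keeps two adjacent vowels apart


def diphthongize(form: str) -> str:
    """Naively raise the first vowel of every long-vowel sequence."""
    for long_vowel, diphthong in DIPHTHONGS.items():
        form = form.replace(long_vowel, diphthong)
    # A blocked sequence such as a%^NONEa never matched above; drop the trigger.
    return form.replace(BLOCKER, "")
```

Under this sketch, ostaa becomes ostua and kehittää becomes kehittiä, matching the rule examples, while sa%^NONEamelaine keeps its aa.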


* valge%>e%^WGStem%>t examples:*

* valgi%>e00t examples:*

* hävi%>%{aäPar%}%>n examples:*

* hävi%>e0n examples:*

* tiedo%^WGStem%>n examples:*

* tiijo0%>n examples:*

* *väge%{front%}%^WGStem>n*
* *vä0i00%>n*

* luge>%{ijPRC%} examples:*

* lugi%>j examples:*
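Each pair in these examples gives the lexical (upper) string first and the aligned lower string second. In the lower string, 0 stands for an empty symbol and %> for the escaped morpheme boundary. The small Python helper below assumes exactly those two conventions. It turns such a lower string into the plain surface form. The name to_surface is hypothetical.

```python
EPSILON = "0"          # empty symbol in the aligned twolc examples
MORPH_BOUNDARY = "%>"  # escaped morpheme boundary


def to_surface(lower: str) -> str:
    """Strip morpheme boundaries and zeros from an aligned lower-side string."""
    return lower.replace(MORPH_BOUNDARY, "").replace(EPSILON, "")
```

The helper assumes the digit 0 never occurs as a real character in the word forms.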

e:o Vowel harmony suffixes Back


* luge%^E2O%>u examples:*

* lugo%>u examples:*

e:ö Vowel harmony

Rule: ä:y word final

Rule: ä:ö word final

Rule: a:0 in suaja:sai

*  a _ (%{back%}:)  %^RmVow:  ;  Vowel shortening before subsequent "i"

* koira%^RmVow%>i%>en examples:*

* koir00%>i%>en examples:*

* koira0%>j%>en examples:*

* vahna%^RmVow%>in examples:*

* vahn00%>in examples:*
* *otta%{back%}%^WGStem%^RmVow%>in*
* *ot00000%>in*

iToj between vowels

Rule: i:j

%{ijPRC%}:i

*  a _ (%{back%}:)  %^RmVow:  ;  +ActPrsPrc

Rule: a:o in the plural and preterite

a:e pidiä

ä:e piettih and in comparatives pidiä

Rule: ä:0 pidiä:pidi

Rule: i:0 reboi:reboloi

%{oö%}:o Vowel harmony suffixes Back

%{oö%}:ö Vowel harmony suffixes Front

%{oöØ%}:0 Vowel harmony suffixes Back

%{oöØ%}:o Vowel harmony suffixes Back

%{oöØ%}:ö Vowel harmony suffixes Front

%{uy%}:u Vowel harmony suffixes Back

%{uy%}:y Vowel harmony suffixes Front

Consonant change

Rule: g:j

* *poiga%^WGStem%>n*
* *poija0%>n*

Rule: g:v

g:l

g:r


* särge%^WGStem%>n examples:*

* särre0%>n examples:*

* kergi%^WGStem%>t%{aäPar%} examples:*

* kerri0%>tä examples:*

d:v

d:v <=> [ ö y: | o u: ] _ [ ä: | a ] (HarmDummy:) %^WGStem:0 ; 
          u a           _   o       (%{back%}:) %^WGStem:0 ; 
          a             _   u       (%{back%}:) %^WGStem:0 ; 
        [ u o: | u: ]   _  [ (%{back%}:) e | a: ]  (%{back%}:) ((%^RmVow:) %> i )    %^WGStem:0 ;  
        [ ä y: ]   _  [(%{front%}:) e | y ]  (%{front%}:) ((%^RmVow:) %> i )    %^WGStem:0 ;  
* *täydy%{front%}%^WGStem*
* *tävvy00*

Rule: v:y d:j

Rule: y:v

Rule: u:v juodu+N+Pl+Ade

o:v Lengthening with Ut:vv weakening

d:z

d:t in partitive

Rule: s:z

* s:z <=> _ (HarmDummy:) %^VOICE:0 ;  pestä: pezen

k:g pestä: peskäh

rd:rr weakening

ld:ll weakening

nd:nn weakening mennä+Ind+Prs+ScPl3: mennäh

rn:rr in prtprc

ln:ll in prtprc

sn:ss in prtprc

Rule: %{dtlnr%}:d

Rule: %{dtlnr%}:t

Rule: %{dtlnr%}:l tulla+Ind+Prs+ScPl3: tullah

Rule: %{dtlnr%}:n

Rule: %{dtlnr%}:r

CONSONANT LOSS

čToZero

* s:z <=> _ (HarmDummy:) %^VOICE:0 ;  suvaija:suvaičen

kToZero aika: ajan

* *lʼykkä%{front%}%^WGStem%>t%{AÄ%}*
* *lʼy0kä00%>tä*
* *abuniekka%{back%}%^A2O%>i%>l*
* *abunie0ko00%>i%>l*
* *liikkeh%{front%}%^WGStem*
* *lii0keh00*
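Several alternations in this file replace a strong-grade cluster with its weak grade. These are the kToZero examples above (lʼykkä → lʼykä-, liikkeh → liikeh, abuniekka → abunieka-), the tt of otta → ot- in the vowel-removal examples, and the rd:rr, ld:ll and nd:nn weakening entries. The Python sketch below models only that substitution, applied to the last matching cluster of a stem. It ignores the right-context conditions that the twolc rules actually check, such as the %^WGStem trigger. The names WEAK_GRADE and weak_grade are hypothetical.

```python
import re

# Strong grade -> weak grade, limited to the alternations cited above.
WEAK_GRADE = {"kk": "k", "tt": "t", "rd": "rr", "ld": "ll", "nd": "nn"}
_STRONG = re.compile("|".join(WEAK_GRADE))


def weak_grade(stem: str) -> str:
    """Replace the last strong-grade cluster in the stem with its weak grade."""
    matches = list(_STRONG.finditer(stem))
    if not matches:
        return stem  # no strong-grade cluster found
    last = matches[-1]
    return stem[: last.start()] + WEAK_GRADE[last.group()] + stem[last.end():]
```

Applying the change unconditionally is the main simplification. A stem marked Gradation: No can still contain one of these clusters, and it must not be weakened.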

gToZero aika: ajan

ezitellä+V+Inf: present/esitellä

b:v

Rule: b:m

b:m <=> m _ [ a | i ] (HarmDummy:) %^WGStem:0 ;

p:0 in lapsi

Consonant loss

d:0 pidiä:piän

s:0


This (part of) documentation was generated from src/fst/morphology/phonology.twolc


src-fst-morphology-root.lexc.md

The tags and root lexica of the morphological fst of Livvi

Multichar symbols

The morphological analyses of Livvi wordforms are presented in this system in terms of the following symbols. (When adding new tags, please follow the existing standards.)
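An analysis consists of a lemma followed by +-prefixed tags, as in juodu+N+Pl+Ade or ostua+V+Inf elsewhere in this documentation. The Python sketch below splits such a string. It assumes the lemma itself contains no "+", so compound analyses or a lemma like the plus sign would need more care. The helpers parse_analysis and has_tag are hypothetical.

```python
from typing import List, Tuple


def parse_analysis(analysis: str) -> Tuple[str, List[str]]:
    """Split 'lemma+Tag1+Tag2...' into the lemma and its list of tags."""
    lemma, *tags = analysis.split("+")
    return lemma, tags


def has_tag(analysis: str, tag: str) -> bool:
    """True if the analysis carries the given tag (given without the '+')."""
    return tag in parse_analysis(analysis)[1]
```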

The parts-of-speech are:

Pronouns

These tags describe the parts of the compound.

The prefix (before “/”) is Cmp.

Usage

The usage extents are marked using the following tags:

Nominals inflect for the following cases and numbers:

The possession is marked as such:

The comparative forms are:

Numerals and Quantifiers are classified under:

Verb tenses are:

| Tag | Meaning |
| --- | --- |
| +Prs | Present (non-past) tense |

Verb moods are:

Verb personal forms are:

+Rfl : This is a workaround for the Olo passive. Olo has a reflexive conjugation, whereas Finnish and Estonian do not.

Other verb forms are:

| Tag | Meaning |
| --- | --- |
| +Inf | Infinitive |
| +Act | Active voice |
| +Pss | Passive voice |
| +PrfPrc | Past participle |
| +PrsPrc | Present participle |
| +RcPrfPrc | Reflexive past participle |
| +Ger | Gerund |

+ConNeg +ConNegII +Neg +ImprtII +PrfPrcPl3 +Sup +VGen +VAbess

+ABBR +ACR

Non-dictionary words can be recognised with: +Guess

Question and Focus particles:

Pmatch 2021-03-13

semantic types of adverbs

Semantics are classified with

Derivations are classified under the morphophonetic form of the suffix, the source and target part-of-speech.

Morphophonology

To represent phonologic variations in word forms we use the following symbols in the lexicon files:

%{aoeInf%} Back vowel infinitive marker %{äöeInf%} Front vowel infinitive marker

And following triggers to control variation

Symbols that need to be escaped on the lower side (towards twolc):

These are for developing underlying morphology rules

Symbols that need to be escaped on the lower side (towards twolc):

Flag diacritics

We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics, only allowing compounds with verbs if the verb is further derived into a noun again:

@P.NeedNoun.ON@ : (Dis)allow compounds with verbs unless nominalised
@D.NeedNoun.ON@ : (Dis)allow compounds with verbs unless nominalised
@C.NeedNoun@ : (Dis)allow compounds with verbs unless nominalised

For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.

@P.CmpFrst.FALSE@ : Require that words tagged as such only appear first
@D.CmpPref.TRUE@ : Block such words from entering ENDLEX
@P.CmpPref.FALSE@ : Block these words from making further compounds
@D.CmpLast.TRUE@ : Block such words from entering R
@D.CmpNone.TRUE@ : Combines with the next tag to prohibit compounding
@U.CmpNone.FALSE@ : Combines with the prev tag to prohibit compounding
@P.CmpOnly.TRUE@ : Sets a flag to indicate that the word has passed R
@D.CmpOnly.FALSE@ : Disallow words coming directly from root.
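How these flags interact follows from the standard flag-diacritic operations: P sets a feature, R requires it, D disallows it, C clears it and U unifies with it. The Python sketch below checks one path of flags under those semantics and leaves out N (negative setting) for brevity. The example paths show one plausible NeedNoun arrangement (a verb stem sets the feature, a nominalising derivation clears it, a later check disallows it); where exactly the lexicon places each flag is an assumption, not taken from the source.

```python
import re

FLAG_RE = re.compile(r"@([PRDCU])\.([^.@]+)(?:\.([^@]+))?@")


def flags_ok(path: list) -> bool:
    """Return True if a sequence of flag diacritics is consistent."""
    state = {}  # feature name -> value currently set
    for flag in path:
        m = FLAG_RE.fullmatch(flag)
        if m is None:
            raise ValueError(f"unsupported flag diacritic: {flag}")
        op, feature, value = m.groups()
        current = state.get(feature)
        if op == "P":      # positive set: overwrite any earlier value
            state[feature] = value
        elif op == "C":    # clear the feature
            state.pop(feature, None)
        elif op == "R":    # require the feature to be set (to value, if given)
            if current is None or (value is not None and current != value):
                return False
        elif op == "D":    # disallow the feature being set (to value, if given)
            if current is not None and (value is None or current == value):
                return False
        elif op == "U":    # unify: set if unset, otherwise it must match
            if current is None:
                state[feature] = value
            elif current != value:
                return False
    return True


# Verb stem followed directly by the check: blocked.
print(flags_ok(["@P.NeedNoun.ON@", "@D.NeedNoun.ON@"]))                   # False
# Verb stem, nominalising derivation, then the check: allowed.
print(flags_ok(["@P.NeedNoun.ON@", "@C.NeedNoun@", "@D.NeedNoun.ON@"]))  # True
```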

Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.

@U.Cap.Obl@ : Allowing downcasing of derived names: deatnulasj.
@U.Cap.Opt@ : Allowing downcasing of derived names: deatnulasj.
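The flag names suggest obligatory (Obl) versus optional (Opt) downcasing; reading them that way is an assumption. The Python sketch below models only the resulting surface forms, using the Finnish example from the text, and does not reproduce the ready-made downcasing regex. The split into "Pariisi" and "lainen" is illustrative.

```python
def derived_forms(name: str, suffix: str, cap_flag=None) -> list:
    """Surface forms of a name + derivational suffix (hypothetical sketch).

    cap_flag "Obl": only the downcased form; "Opt": both forms;
    None: no downcasing. These readings of Obl/Opt are assumptions.
    """
    derived = name + suffix
    downcased = derived[:1].lower() + derived[1:]
    if cap_flag == "Obl":
        return [downcased]
    if cap_flag == "Opt":
        return [derived, downcased]
    return [derived]


print(derived_forms("Pariisi", "lainen", "Obl"))  # ['pariisilainen']
```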

Flags used to identify parts of speech

FLAGS USED WITH NUMERALS

cardinal vs ordinal

Orthographical errors? 2021-03-13

Pmatch 2021-03-13

Removal

Flag diacritic : Explanation

@U.number.one@ : Flag used to give Arabic numerals in smj different cases
@U.number.two@ : Flag used to give Arabic numerals in smj different cases
@U.number.three@ : Flag used to give Arabic numerals in smj different cases
@U.number.four@ : Flag used to give Arabic numerals in smj different cases
@U.number.five@ : Flag used to give Arabic numerals in smj different cases
@U.number.six@ : Flag used to give Arabic numerals in smj different cases
@U.number.seven@ : Flag used to give Arabic numerals in smj different cases
@U.number.eight@ : Flag used to give Arabic numerals in smj different cases
@U.number.nine@ : Flag used to give Arabic numerals in smj different cases
@U.number.zero@ : Flag used to give Arabic numerals in smj different cases

Lexicon Root

NEWWORDS FILES

A_NEWWORDS ; adjectives
ADV_NEWWORDS ; adverbs
N_NEWWORDS ; nouns
PROP_NEWWORDS ; proper nouns
V_NEWWORDS ; verbs

I INCLUDE SOME SMALL LEXICA HERE WAITING FOR OWN FILES, OR PERHAPS THEY COULD STAY HERE


This (part of) documentation was generated from src/fst/morphology/root.lexc


src-fst-morphology-stems-adjectives_newwords.lexc.md

This is where new words are added as lexc entries before they are added to the xml source files.

lyhyt+A:lyhy A_ "/(eng) short/(fin) lyhyt" ;
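Each new-word entry has the shape upper:lower CONTINUATION "gloss" ;, with the gloss holding English and Finnish translations separated by slashes. The Python sketch below parses simple entries of that shape, assuming straight double quotes as in the lexc source. It is a hypothetical helper, not part of the build; it does not handle escaped colons (%:) or comments.

```python
import re

# upper[:lower] CONTINUATION "gloss" ;   (the :lower part is optional)
ENTRY_RE = re.compile(
    r'^\s*(?P<upper>[^\s:]+)(?::(?P<lower>\S+))?\s+'
    r'(?P<cont>\S+)\s*(?:"(?P<gloss>[^"]*)")?\s*;'
)
GLOSS_RE = re.compile(r"\((\w+)\)\s*(.*)")


def parse_entry(line: str) -> dict:
    """Split a simple lexc entry into upper, lower, continuation and glosses."""
    m = ENTRY_RE.match(line)
    if m is None:
        raise ValueError(f"not a simple lexc entry: {line!r}")
    glosses = {}
    for part in (m.group("gloss") or "").split("/"):
        g = GLOSS_RE.match(part.strip())
        if g:  # "(eng) short" -> {"eng": "short"}
            glosses[g.group(1)] = g.group(2).strip()
    return {
        "upper": m.group("upper"),
        # without a colon, lexc uses the same string on both sides
        "lower": m.group("lower") or m.group("upper"),
        "cont": m.group("cont"),
        "gloss": glosses,
    }


print(parse_entry('lyhyt+A:lyhy A_ "/(eng) short/(fin) lyhyt" ;'))
```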

ADD ADJECTIVES BELOW!


This (part of) documentation was generated from src/fst/morphology/stems/adjectives_newwords.lexc


src-fst-morphology-stems-adverbs_newwords.lexc.md

This is where new words are added as lexc entries before they are added to the xml source files.

rounu+Adv:rounu ADV_ "/(eng) /(fin) tasan" ;

ADD ADVERBS BELOW!


This (part of) documentation was generated from src/fst/morphology/stems/adverbs_newwords.lexc


src-fst-morphology-stems-exceptions.lexc.md

Special verbal forms

ADJECTIVES

Adverbs incomplete

#päivy+N+Sg+Ess+Adv+Adv:#piän This should help in compound words

Essive

Conjunctors incomplete

MISC

POSTPOSITIONS

Pronouns

Proper given names female

Proper given names male

Place names

Nouns incomplete

identitiettu iänenando iänenandolippu julgavot Jougamoine kirjutannuh koudu kulʼtuurutego livgiläine nielenvaldu muasku noumer partnʼourat piäkirjutus politiekku pyhänpiän sananparzi sportu immigrantat valličendupäivänny valličenduvirguniekku valliiskoikieline Wikipedii

Väinämöine

Verbs INCOMPLETE

PROPER NOUNS FROM OLONETS

Unidentified Morph


This (part of) documentation was generated from src/fst/morphology/stems/exceptions.lexc


src-fst-morphology-stems-nouns_newwords.lexc.md

This is where new words are added as lexc entries before they are added to the xml source files.

pappi+N:pappi N_PAPPI "/(eng) priest/(fin) pappi" ;

ERRONEOUS FORMS

sluudielaine+N:sluudiela : stuudielaine
oldihanukselaine+N:oldihanuksela : oldih_anukselazet

ADD NOUNS BELOW!


This (part of) documentation was generated from src/fst/morphology/stems/nouns_newwords.lexc


src-fst-morphology-stems-numerals.lexc.md

Numerals

Numerals in the Livvi language are numbers.

Numerals have been split into three sections: the compounding parts of cardinals, the compounding parts of ordinals, and the non-compounding numerals:


This (part of) documentation was generated from src/fst/morphology/stems/numerals.lexc


src-fst-morphology-stems-prefixes.lexc.md

Prefixes

Prefixes in the Livvi language are bound to the beginning of other words.


This (part of) documentation was generated from src/fst/morphology/stems/prefixes.lexc


src-fst-morphology-stems-propernouns_newwords.lexc.md

This is where new words are added as lexc entries before they are added to the xml source files.

merki:merki PROP_ "/(eng) mark/(fin) merkki" ;

ADD NOUNS BELOW!


This (part of) documentation was generated from src/fst/morphology/stems/propernouns_newwords.lexc


src-fst-morphology-stems-rus-Cyrl-2-Lat-propernouns.lexc.md

Male given names that are used for deriving patronymics

Francʼ+N+Prop:Franc

Vili+N+Prop:Vil

FEMALE GIVEN NAMES


This (part of) documentation was generated from src/fst/morphology/stems/rus-Cyrl-2-Lat-propernouns.lexc


src-fst-morphology-stems-verbs_newwords.lexc.md

This is where new words are added as lexc entries before they are added to the xml source files.

kandua+V:kanda V_KANDUA "/(eng) carry/(fin) kantaa" ;

ADD VERBS BELOW!

These below exist in xml but lack a Finnish translation.


This (part of) documentation was generated from src/fst/morphology/stems/verbs_newwords.lexc


src-fst-phonetics-txt2ipa.xfscript.md

retroflex plosive, voiceless t` ʈ 0288, 648 (` = ASCII 096)
retroflex plosive, voiced d` ɖ 0256, 598
labiodental nasal F ɱ 0271, 625
retroflex nasal n` ɳ 0273, 627
palatal nasal J ɲ 0272, 626
velar nasal N ŋ 014B, 331
uvular nasal N\ ɴ 0274, 628

bilabial trill B\ ʙ 0299, 665
uvular trill R\ ʀ 0280, 640
alveolar tap 4 ɾ 027E, 638
retroflex flap r` ɽ 027D, 637
bilabial fricative, voiceless p\ ɸ 0278, 632
bilabial fricative, voiced B β 03B2, 946
dental fricative, voiceless T θ 03B8, 952
dental fricative, voiced D ð 00F0, 240
postalveolar fricative, voiceless S ʃ 0283, 643
postalveolar fricative, voiced Z ʒ 0292, 658
retroflex fricative, voiceless s` ʂ 0282, 642
retroflex fricative, voiced z` ʐ 0290, 656
palatal fricative, voiceless C ç 00E7, 231
palatal fricative, voiced j\ ʝ 029D, 669
velar fricative, voiced G ɣ 0263, 611
uvular fricative, voiceless X χ 03C7, 967
uvular fricative, voiced R ʁ 0281, 641
pharyngeal fricative, voiceless X\ ħ 0127, 295
pharyngeal fricative, voiced ?\ ʕ 0295, 661
glottal fricative, voiced h\ ɦ 0266, 614

alveolar lateral fricative, vl. K
alveolar lateral fricative, vd. K\

labiodental approximant P (or v)
alveolar approximant r\
retroflex approximant r\`
velar approximant M\

retroflex lateral approximant l`
palatal lateral approximant L
velar lateral approximant L\

Clicks

bilabial O\ (O = capital letter)
dental |\
(post)alveolar !\
palatoalveolar =\
alveolar lateral |\|\

Ejectives, implosives

ejective > e.g. ejective p p>
implosive < e.g. implosive b b<

Vowels

close back unrounded M
close central unrounded 1
close central rounded }
lax i I
lax y Y
lax u U

close-mid front rounded 2
close-mid central unrounded @\
close-mid central rounded 8
close-mid back unrounded 7

schwa ə @

open-mid front unrounded E
open-mid front rounded 9
open-mid central unrounded 3
open-mid central rounded 3\
open-mid back unrounded V
open-mid back rounded O

ash (ae digraph) {
open schwa (turned a) 6

open front rounded &
open back unrounded A
open back rounded Q

Other symbols

voiceless labial-velar fricative W
voiced labial-palatal approx. H
voiceless epiglottal fricative H\
voiced epiglottal fricative <\
epiglottal plosive >\

alveolo-palatal fricative, vl. s\
alveolo-palatal fricative, voiced z\
alveolar lateral flap l\
simultaneous S and x x\
tie bar _

Suprasegmentals

primary stress "
secondary stress %
long :
half-long :\
extra-short _X
linking mark -\

Tones and word accents

level extra high _T
level high _H
level mid _M
level low _L
level extra low _B
downstep !
upstep ^ (caret, circumflex)

contour, rising _R
contour, falling _F
contour, high rising _H_T
contour, low rising _B_L

contour, rising-falling _R_F
(NB Instead of being written as diacritics with _, all prosodic marks can alternatively be placed in a separate tier, set off by < >, as recommended for the next two symbols.)
global rise <R>
global fall <F>

Diacritics

voiceless _0 (0 = figure), e.g. n_0
voiced _v
aspirated _h
more rounded _O (O = letter)
less rounded _c
advanced _+
retracted _-
centralized _"
syllabic = (or _=) e.g. n= (or n_=)
non-syllabic _^
rhoticity `

breathy voiced _t
creaky voiced _k
linguolabial _N
labialized _w
palatalized ' (or _j) e.g. t' (or t_j)
velarized _G
pharyngealized _?\

dental _d
apical _a
laminal _m
nasalized ~ (or _~) e.g. A~ (or A_~)
nasal release _n
lateral release _l
no audible release _}

velarized or pharyngealized _e
velarized l, alternatively 5
raised _r
lowered _o
advanced tongue root _A
retracted tongue root _q
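One way to read the table above is as a mapping from X-SAMPA symbols to IPA characters, in which multi-character symbols such as N\ must be matched before their one-character prefixes such as N. The Python sketch below shows that greedy longest-match idea on a small subset of the table (code points as listed there, plus U+0259 for schwa). It is an illustration of the technique, not a translation of the xfscript itself.

```python
# A few X-SAMPA -> IPA pairs from the table above.
XSAMPA_TO_IPA = {
    "t`": "\u0288", "d`": "\u0256", "n`": "\u0273", "s`": "\u0282",
    "z`": "\u0290", "r`": "\u027D", "F": "\u0271", "J": "\u0272",
    "N": "\u014B", "N\\": "\u0274", "T": "\u03B8", "D": "\u00F0",
    "S": "\u0283", "Z": "\u0292", "X": "\u03C7", "X\\": "\u0127",
    "@": "\u0259",
}
# Try longer symbols first so that "N\" wins over "N".
KEYS = sorted(XSAMPA_TO_IPA, key=len, reverse=True)


def xsampa_to_ipa(text: str) -> str:
    """Convert X-SAMPA to IPA by greedy longest match; unknown chars pass through."""
    out, i = [], 0
    while i < len(text):
        for key in KEYS:
            if text.startswith(key, i):
                out.append(XSAMPA_TO_IPA[key])
                i += len(key)
                break
        else:  # no symbol matched: copy the character unchanged
            out.append(text[i])
            i += 1
    return "".join(out)


print(xsampa_to_ipa("N\\aN"))  # ɴaŋ
```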


This (part of) documentation was generated from src/fst/phonetics/txt2ipa.xfscript


src-fst-transcriptions-transcriptor-abbrevs2text.lexc.md

We describe here how abbreviations in Livvi are read out, e.g. for text-to-speech systems.

For example:


This (part of) documentation was generated from src/fst/transcriptions/transcriptor-abbrevs2text.lexc


tools-grammarcheckers-grammarchecker.cg3.md

O L O N E T S I A N G R A M M A R C H E C K E R

DELIMITERS

TAGS AND SETS

Tags

This section lists all the tags inherited from the fst and used as tags in the syntactic analysis. The next section, Sets, contains sets defined on the basis of the tags listed here; those set names are not visible in the output.

Beginning and end of sentence

BOS EOS

Parts of speech tags

N A Adv V Pron CS CC CC-CS Po Pr Pcle Num Interj ABBR ACR CLB LEFT RIGHT WEB PPUNCT PUNCT

COMMA ¶

Tags for POS sub-categories

Pers Dem Interr Indef Recipr Refl Rel Coll NomAg Prop Allegro Arab Romertall

Tags for morphosyntactic properties

Nom Acc Gen Ill Loc Com Ess Ess Sg Du Pl Cmp/SplitR Cmp/SgNom Cmp/SgGen Cmp/SgGen PxSg1 PxSg2 PxSg3 PxDu1 PxDu2 PxDu3 PxPl1 PxPl2 PxPl3 Px

Comp Superl Attr Ord Qst IV TV Prt Prs Ind Pot Cond Imprt ImprtII Sg1 Sg2 Sg3 Du1 Du2 Du3 Pl1 Pl2 Pl3 Inf ConNeg Neg PrfPrc VGen PrsPrc Ger Sup Actio VAbess

Err/Orth

Semantic tags

Sem/Act Sem/Ani Sem/Atr Sem/Body Sem/Clth Sem/Domain Sem/Feat-phys Sem/Fem Sem/Group Sem/Lang Sem/Mal Sem/Measr Sem/Money Sem/Obj Sem/Obj-el Sem/Org Sem/Perc-emo Sem/Plc Sem/Sign Sem/State-sick Sem/Sur Sem/Time Sem/Txt

HUMAN

PROP-ATTR PROP-SUR

TIME-N-SET

Syntactic tags

@+FAUXV @+FMAINV @-FAUXV @-FMAINV @-FSUBJ> @-F<OBJ @-FOBJ> @-FSPRED<OBJ @-F<ADVL @-FADVL> @-F<SPRED @-F<OPRED @-FSPRED> @-FOPRED> @>ADVL @ADVL< @<ADVL @ADVL> @ADVL @HAB> @<HAB @>N @Interj @N< @>A @P< @>P @HNOUN @INTERJ @>Num @Pron< @>Pron @Num< @OBJ @<OBJ @OBJ> @OPRED @<OPRED @OPRED> @PCLE @COMP-CS< @SPRED @<SPRED @SPRED> @SUBJ @<SUBJ @SUBJ> SUBJ SPRED OPRED @PPRED @APP @APP-N< @APP-Pron< @APP>Pron @APP-Num< @APP-ADVL< @VOC @CVP @CNP OBJ

-OTHERS SYN-V @X

## Sets containing sets of lists and tags

This part of the file lists a large number of sets based partly upon the tags defined above, and partly upon lexemes drawn from the lexicon. See the source file itself to inspect the sets; what follows here is an overview of the set types.

### Sets for Single-word sets

INITIAL

### Sets for word or not

WORD NOT-COMMA

### Case sets

ADLVCASE CASE-AGREEMENT CASE NOT-NOM NOT-GEN NOT-ACC

### Verb sets

NOT-V

### Sets for finiteness and mood

REAL-NEG MOOD-V NOT-PRFPRC

### Sets for person

SG1-V SG2-V SG3-V DU1-V DU2-V DU3-V PL1-V PL2-V PL3-V

### Pronoun sets

### Adjectival sets and their complements

### Adverbial sets and their complements

### Sets of elements with common syntactic behaviour

### NP sets defined according to their morphosyntactic features

### The PRE-NP-HEAD family of sets

These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression **WORD - premodifiers**.

### Border sets and their complements

### Grammarchecker sets

* * *

This (part of) documentation was generated from [tools/grammarcheckers/grammarchecker.cg3](https://github.com/giellalt/lang-olo/blob/main/tools/grammarcheckers/grammarchecker.cg3)

---

# tools-tokenisers-tokeniser-disamb-gt-desc.pmscript.md

# Tokeniser for olo

Usage:

```
$ make
$ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

Pmatch documentation: <https://github.com/hfst/hfst/wiki/HfstPmatch>

Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

* Punct contains ASCII punctuation marks
* The symbol after m-dash is soft-hyphen `U+00AD`
* The symbol following {•} is byte-order-mark / zero-width no-break space `U+FEFF`.

Whitespace contains ASCII white space, and the List contains some Unicode white space characters:

* En Quad U+2000 to Zero-Width Joiner U+200D
* Narrow No-Break Space U+202F
* Medium Mathematical Space U+205F
* Word joiner U+2060

Apart from what's in our morphology, there are 1) unknown word-like forms, and 2) unmatched strings. We want to give 1) a match, but let 2) be treated specially by `hfst-tokenise -a`.

Unknowns are made of:

* lower-case ASCII
* upper-case ASCII
* select extended Latin symbols
* ASCII digits
* select symbols
* combining diacritics as individual symbols
* various symbols from the Private Use Area (probably Microsoft), so far:
  * U+F0B7 for "x in box"

## Unknown handling

Unknowns are tagged ?? and treated specially with `hfst-tokenise`: `hfst-tokenise --giella-cg` will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG; they get a default baseform equal to the wordform, but no tag to check, so it's safer to let `hfst-tokenise` handle them.
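The whitespace classes and the unknown-word handling described above can be illustrated with a small Python sketch. It is not the pmatch script: it uses the character ranges listed in this section, a hypothetical toy lexicon, and the "??" label only to mirror how this page says unknowns are tagged; the real `hfst-tokenise --giella-cg` output format is different. Unlike the real tokeniser, it does not resolve cases such as "3." or multiword expressions from the lexicon and context.

```python
import re
import string

# ASCII whitespace plus the Unicode spaces listed above:
# U+2000..U+200D, U+202F, U+205F and U+2060.
UNICODE_SPACES = "".join(chr(c) for c in range(0x2000, 0x200E)) + "\u202F\u205F\u2060"
SPACE_RE = re.compile("[" + string.whitespace + UNICODE_SPACES + "]+")

# Symbols that may stand next to a word without a space: ASCII punctuation,
# soft hyphen U+00AD and byte-order mark U+FEFF.
ATTACHED = set(string.punctuation) | {"\u00AD", "\uFEFF"}


def tokenise(text: str, lexicon: set) -> list:
    """Split text into (token, label) pairs; label is 'known' or '??'."""
    tokens = []
    for chunk in SPACE_RE.split(text):
        start, end = 0, len(chunk)
        while start < end and chunk[start] in ATTACHED:   # leading symbols
            start += 1
        while end > start and chunk[end - 1] in ATTACHED:  # trailing symbols
            end -= 1
        tokens.extend(chunk[:start])          # each leading symbol is a token
        if start < end:
            tokens.append(chunk[start:end])   # the word-like core
        tokens.extend(chunk[end:])            # each trailing symbol is a token
    return [(t, "known" if t in lexicon or t in ATTACHED else "??") for t in tokens]


print(tokenise("ja, ja", {"ja"}))
```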
Finally we mark as a token any sequence making up a:

* known word in context
* unknown (OOV) token in context
* sequence of word and punctuation
* URL in context

* * *

This (part of) documentation was generated from [tools/tokenisers/tokeniser-disamb-gt-desc.pmscript](https://github.com/giellalt/lang-olo/blob/main/tools/tokenisers/tokeniser-disamb-gt-desc.pmscript)

---

# tools-tokenisers-tokeniser-gramcheck-gt-desc.pmscript.md

# Grammar checker tokenisation for olo

Requires a recent version of HFST (3.10.0 / git revision >= 3aecdbc). Then just:

```
$ make
$ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

More usage examples:

```
$ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

Pmatch documentation: <https://github.com/hfst/hfst/wiki/HfstPmatch>

Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

* Punct contains ASCII punctuation marks
* The symbol after m-dash is soft-hyphen `U+00AD`
* The symbol following {•} is byte-order-mark / zero-width no-break space `U+FEFF`.
Whitespace contains ASCII white space, and the List contains some Unicode white space characters:

* En Quad U+2000 to Zero-Width Joiner U+200D
* Narrow No-Break Space U+202F
* Medium Mathematical Space U+205F
* Word joiner U+2060

Apart from what's in our morphology, there are 1) unknown word-like forms, and 2) unmatched strings. We want to give 1) a match, but let 2) be treated specially by `hfst-tokenise -a`.

* select extended Latin symbols
* select symbols
* various symbols from the Private Use Area (probably Microsoft), so far:
  * U+F0B7 for "x in box"

TODO: Could use something like this, but built-ins don't include šžđčŋ:

Simply give an empty reading when something is unknown: `hfst-tokenise --giella-cg` will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG; they get a default baseform equal to the wordform, but no tag to check, so it's safer to let `hfst-tokenise` handle them.

Finally we mark as a token any sequence making up a:

* known word in context
* unknown (OOV) token in context
* sequence of word and punctuation
* URL in context

* * *

This (part of) documentation was generated from [tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript](https://github.com/giellalt/lang-olo/blob/main/tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript)

---

# tools-tokenisers-tokeniser-tts-cggt-desc.pmscript.md

# TTS tokenisation for smj

Requires a recent version of HFST (3.10.0 / git revision >= 3aecdbc). Then just:

```sh
make
echo "ja, ja" \
  | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

More usage examples:

```sh
echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa \
boasttu olmmoš, man mielde lahtuid." \
  | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" \
  | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
echo "márffibiillagáffe" \
  | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

Pmatch documentation: <https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstPmatch>

Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

* Punct contains ASCII punctuation marks
* The symbol after m-dash is soft-hyphen `U+00AD`
* The symbol following {•} is byte-order-mark / zero-width no-break space `U+FEFF`.

Whitespace contains ASCII white space, and the List contains some Unicode white space characters:

* En Quad U+2000 to Zero-Width Joiner U+200D
* Narrow No-Break Space U+202F
* Medium Mathematical Space U+205F
* Word joiner U+2060

Apart from what's in our morphology, there are 1) unknown word-like forms, and 2) unmatched strings. We want to give 1) a match, but let 2) be treated specially by `hfst-tokenise -a`.

* select extended Latin symbols
* select symbols
* various symbols from the Private Use Area (probably Microsoft), so far:
  * U+F0B7 for "x in box"

TODO: Could use something like this, but built-ins don't include šžđčŋ:

Simply give an empty reading when something is unknown: `hfst-tokenise --giella-cg` will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG; they get a default baseform equal to the wordform, but no tag to check, so it's safer to let `hfst-tokenise` handle them.

Needs `hfst-tokenise` to output things differently depending on the tag they get.

* * *

This (part of) documentation was generated from [tools/tokenisers/tokeniser-tts-cggt-desc.pmscript](https://github.com/giellalt/lang-olo/blob/main/tools/tokenisers/tokeniser-tts-cggt-desc.pmscript)