This part of the file lists a large number of sets based partly on the tags defined above and partly on lexemes drawn from the lexicon. See the source file itself to inspect the sets; what follows here is an overview of the set types.

Sets for Single-word sets

The set INITIAL for initial letters

Sets for word or not

Derivational affixes

Case sets

ADLVCASE

Verb sets

NOT-V

Sets for finiteness and mood

MOOD-V

Homonymy for subject conjugation and subject-object conjugation with Pl3 object

VFIN

VFIN-POS

Sets for person

Pronoun sets

кортамс мезде

words that go with эрьва

for кизэ homonymy PxSg2

for кизэ homonymy PxSg1

This will be expanded for homonymy at first

This will be expanded for homonymy at first, i.e., diminutives

verbs elative, illative, lative

these have homonyms

used with Dat PxSg1

Derivation tags

2VDerTag 2NDerTag

DerTag

Pl Nom Def is a homonym of the verb stem in тне-мс. This is relevant for Clt/Cop with ScPl1 and ScPl2

in SP Gen Indef the next word can be кель

2023_03_15 important part of regular inflection

This (part of) documentation was generated from src/cg3/disambiguator.cg3

src-cg3-functions.cg3.md

Sets for POS sub-categories
Sets for Semantic tags
Sets for Morphosyntactic properties

negation marker for fits between negation and conneg

@CMP-STD> : oblique comparative standard with adverb, adjective to right
@>ADVL : Modifier of an adverbial, with the adverbial to the right. ** vaikko: doppe leat vaikko man ollu studeanttat.
@ADVL< : Complement of adverbial. ** vahkus: Son málesta guktii vahkus. таргозонзо: Ломанесь пиди-пани кавксть таргозонзо.
@<ADVL : Adverbial following the main verb. ** dás: Eanet dieđuid gávnnat dás. тестэ: Седе ламо содамочи муят тестэ.
@ADVL> : Adverbial to the left of the main verb ** viimmat: Dál de viimmat asttan lohkat reivve. окойники: Ней окойники кенеринь ловномо сёрманть.
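
The arrows in the function tags above appear to point toward the word the tagged token attaches to: `>` for a head to the right, `<` for a head to the left. The following is a minimal Python sketch of that reading. The helper name `head_direction` is hypothetical and not part of the CG3 grammar. Tags without an arrow, such as `@+FAUXV`, are treated as having no direction.

```python
def head_direction(tag: str):
    """Return 'right', 'left', or None for a function tag such as '@ADVL>'.

    Assumption: an arrow at either edge of the label points toward the head.
    """
    label = tag.lstrip("@")
    if label.startswith(">") or label.endswith(">"):
        return "right"
    if label.startswith("<") or label.endswith("<"):
        return "left"
    return None


if __name__ == "__main__":
    for tag in ["@CMP-STD>", "@>ADVL", "@ADVL<", "@<ADVL", "@ADVL>", "@+FAUXV"]:
        print(tag, head_direction(tag))
```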

MOOD-V

Erzya and Moksha

this needs Moksha, too.

finite auxiliary verbs with

макссь чарькодемс, Deal with DATAUX separately; they also take MS

finite auxiliary taking supine MO/ME

@+FAUXV : finite auxiliary verbs
@-FAUXV : non-finite auxiliary verbs

finite supaux 2023_03_13

@+FMAINV : finite main verbs
@-FMAINV : non-finite main verb

This (part of) documentation was generated from src/cg3/functions.cg3

src-fst-morphology-affixes-adjectives.lexc.md

Adjective inflection

Adjectives and other parts of speech in ERZYA are compared by means of either a particle or ablative case marking on the standard of comparison

ordinals in -це

истямо:истя

кондямо:кондя

кодамо:кода кодамо:кода кодатнэ кодатне

This (part of) documentation was generated from src/fst/morphology/affixes/adjectives.lexc

src-fst-morphology-affixes-adpositions.lexc.md

The Erzya language postpositions can be broken into many subgroups according to morphological and semantic criteria

Some of the nouns have defective paradigms: € кудыкельганть

ало:ал alo-SPAT-1Arg

This allows for possessor indices, word end or focus e.g. вельде, вельдеяк, вельдензэ ?вельдензэль, вельдензэтне

This allows for word end, possessor indices, predication

postposition that is in ablative case алдо:алдо

postposition that is in elative case потсто:потсто

postposition that is in illative case эземс:эзем

postposition that is in illative case эйс:э

postposition that is in inessive case эйсэ:эйсэ, кисэ

postposition that is in lative case ютков:ютков

postposition that is in locative case ало:ало

postposition that has no continuation пачк

postposition that is in ablative case алдо:алдо

postposition that is in elative case потсто:потсто

postposition that is in illative case малас:мала

postposition that is in illative case потс:пот

postposition that is in illative case эйс:э

postposition that is in inessive case потсо:потсо

postposition that is in lative case алов:ало

postposition that is in locative case ало:ало

postposition that is in prolative case перька:перька

+Temp: K ; перть

+Ela+Temp: PO_POSS_OR_END_FOC ; пингстэ

This (part of) documentation was generated from src/fst/morphology/affixes/adpositions.lexc

src-fst-morphology-affixes-adverbs.lexc.md

Adverb inflection

The Erzya language adverbs do not compare.

Not a real particle; it can take a clitic седеяк

LEXICON ADV-SPAT_ пачк

LEXICON ADV_IS_LAT алов

LEXICON ADV_IS_LOC ало

LEXICON ADV/PO/PRON-SPAT_ALO ало:ал

LEXICON ADV-SPAT_ALO ало:ал

“стядо”

spatial adverbs dependent and independent case marking

This marking would indicate a word form that may be

This (part of) documentation was generated from src/fst/morphology/affixes/adverbs.lexc

src-fst-morphology-affixes-interjections.lexc.md

Interjections

The Erzya language interjections…

This (part of) documentation was generated from src/fst/morphology/affixes/interjections.lexc

src-fst-morphology-affixes-nonverbalConjugation.lexc.md

Non-Verbal conjugation

In the Erzya language nominals and adverbs also conjugate

Used with deverbals

This is where adjectives get their plural T.

used with infinitives

Conjugation

NON-VERB CONJUGATION

Conjugation

_KAL-NomSg-Conjugation-only

This allows Clt/Cop+Prs Sg1|Sg2|Pl1|Pl2 Clt/Cop+Prt2 Sg1|Sg2|Sg3|Pl1|Pl2|Pl3 K 2019-01-26

_KUDO-NomPl-Conjugation-only

_KUDO-NomPl-Conjugation-only-mutual

Are there copula verb combinations? 2024-08-06

This (part of) documentation was generated from src/fst/morphology/affixes/nonverbalConjugation.lexc

src-fst-morphology-affixes-nouns.lexc.md

Noun inflection

Nouns in ERZYA inflect for number, case and declension (definite, indefinite and possessive).

LEXICON N_PELE пеле:пель, ало:ал

KINSHIP

HUMAN

PLACE

LATIVE

VOCATIVE

NAMES OF MONTHS

COMMON NOUNS

LEXICON N_T1 кель:кель %^Ь2ZERO
LEXICON N_KEL1 кель:кель %^Ь2ZERO
LEXICON N_LOMAN1 ломань:ломань %^Ь2ZERO
LEXICON N_OZIM1 озимь:озимь %^Ь2ZERO
LEXICON N_RUF1 озимь:озимь %^Ь2ZERO
LEXICON N_RECH1 озимь:озимь %^Ь2ZERO
LEXICON N_VESHCH1 озимь:озимь %^Ь2ZERO
LEXICON N_PEJ кель:кель %^Ь2ZERO
LEXICON N_J кель:кель %^Ь2ZERO
LEXICON N_SODYJ сода%>{иы}й, содый

кардаз:карда

панго:панг

потмо:пот

Front vowel, non-palatal consonant before vowel

Front vowel, palatal consonant before vowel

Front vowel, non-palatal consonant before vowel

Does this need a diminutive?

NMN

LEXICON NMN_SAN сан:сан
LEXICON NMN_KEL1 кель:кель %^Ь2ZERO
LEXICON NMN_LOMAN1 ломань:ломань
LEXICON NMN_PEJ кель:кель %^Ь2ZERO
** TMP-INDEF ; ** Check this
**LEXICON NMN_KUDO-PL ** This needs checking 2013-03-27

harmony: front

DERIVATION

**+SP+Gen+Indef:%>{оеэØ}нь%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Gen+Def:%>{оеэØ}нть%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Ela+Def:%>ст{оэØ}%>нть%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Ine+Def:%>с{оэØ}%>нть%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Prl+Def:%>Г2а%>нть%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Cmpr+Def:%>{оеэØ}шка%>нть%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Abe+Def:%>вт{оеэ}мО1%>нть%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Abe+Def+Err/Orth-stem-soft-should-be-0:%^SoftRetain%>темО1%>нть%> N2Dem-SE ; ** ь retention through double %>%>
**+Use/-Spell+Sg+Gen+Def+Use/NG+Err/Orth+Dial/NW:%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+Use/-Spell+Sg+Ela+Def+Use/NG+Err/Orth+Dial/NW:%>ст{оэØ}%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+Use/-Spell+Sg+Ine+Def+Use/NG+Err/Orth+Dial/NW:%>с{оэØ}%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+Use/-Spell+Sg+Prl+Def+Use/NG+Err/Orth+Dial/NW:%>Г2а%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+Use/-Spell+Sg+Cmpr+Def+Use/NG+Err/Orth+Dial/NW:%>{оеэØ}шка%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+Use/-Spell+Sg+Abe+Def+Use/NG+Err/Orth+Dial/NW:%>вт{оеэ}мО1%>%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+Use/-Spell+Sg+Abe+Def+Use/NG+Err/Orth+Dial/NW+Err/Orth-stem-soft-should-be-0:%^SoftRetain%>темО1%>%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+Pl+Gen+Def:%>тнЕ3%>нь%> N2Dem-SE ; ** ь retention through double %>%>
**+SP+Gen+Indef:%^Ь2ZERO%>ень%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Gen+Def:%^Ь2ZERO%>енть%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Ela+Def:%>стэ%>нть%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Ine+Def:%>сэ%>нть%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Prl+Def:%>га%>нть%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Cmpr+Def:%^Ь2ZERO%>ешка%>нть%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Abe+Def:%>теме%>нть%> N2Dem-SE ; ** ь retention through double %>%>
**+Use/-Spell+Sg+Gen+Def+Use/NG+Err/Orth+Dial/NW:%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+Use/-Spell+Sg+Ela+Def+Use/NG+Err/Orth+Dial/NW:%>стэ%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+Use/-Spell+Sg+Ine+Def+Use/NG+Err/Orth+Dial/NW:%>сэ%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+Use/-Spell+Sg+Prl+Def+Use/NG+Err/Orth+Dial/NW:%>га%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+Use/-Spell+Sg+Cmpr+Def+Use/NG+Err/Orth+Dial/NW:%^Ь2ZERO%>ешка%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+Use/-Spell+Sg+Abe+Def+Use/NG+Err/Orth+Dial/NW:%>теме%>%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+Pl+Gen+Def:%>тне%>нь%> N2Dem-SE ; ** ь retention through double %>%>
**+SLoss+Sg+Ela+Def:%>SLossст{оэØ}%>нть%> N2Dem-SE ; ** ь retention through double %>%>
**+SLoss+Sg+Ine+Def:%>SLossс{оэØ}%>нть%> N2Dem-SE ; ** ь retention through double %>%>
**+SLoss+Sg+Ela+Def+Use/NG+Err/Orth+Dial/NW:%>SLossст{оэØ}%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+SLoss+Sg+Ine+Def+Use/NG+Err/Orth+Dial/NW:%>SLossс{оэØ}%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Ela+Def+Use/NG:%>{оеэØ}%>ст{оэØ}%>нть%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Ine+Def+Use/NG:%>{оеэØ}%>с{оэØ}%>нть%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Ela+Def+Use/NG+Err/Orth+Dial/NW:%>{оеэØ}%>ст{оэØ}%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+Sg+Ine+Def+Use/NG+Err/Orth+Dial/NW:%>{оеэØ}%>с{оэØ}%>сть%> N2Dem-SE ; ** ь retention through double %>%>
**+SP+Gen+Indef:%>{оеэØ}нь%> N2Dem-SE ; ** ь retention through double %>%>

pango:pang

N_KUDO-Def-Declension

Plurale tantum

DEFINITE SINGULAR TAGS

Noun singular nominative definite examples:*
калось: кал+N+Sg+Nom+Def
калоськак: кал+N+Sg+Nom+Def+Foc/Гак
★калосьгак: кал+N+Sg+Nom+Def+Foc/Гак (is not standard language)
Noun singular nominative definite examples:*
паксясось: пакся+N+SP+Ine+Indef+Der+Der/MWN+N+Sg+Nom+Def
паксясоськак: пакся+N+SP+Ine+Indef+Der+Der/MWN+N+Sg+Nom+Def+Foc/Add
★паксясосьгак: пакся+N+SP+Ine+Indef+Der+Der/MWN+N+Sg+Nom+Def+Foc/Add (is not standard language)
Noun singular genitive definite examples:*
калонть: кал+N+Sg+Gen+Def
калонтькак: кал+N+Sg+Gen+Def+Foc/Гак
**+Sg+Ine+Def:%>с{оэØ}%>н%> N2Dem-SE ; ** !коридорсонсесь 2022-02-10

INDEFINITE DECLENSION

SG-NOM-INDEF_LAK ;

SG-NOM-INDEF_KAL ;

SG-NOM-INDEF_OSH ;

** TMP-INDEF ; ** Check this

INDEFINITE TAGS

Noun ablative indefinite examples:*
калдо: кал+N+SP+Abl+Indef
калдояк: кал+N+SP+Abl+Indef+Foc/Гак
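
The example lines in this section pair a word form with its analysis, written as a lemma followed by `+`-separated tags. A minimal Python sketch of reading such a line follows. The function names are hypothetical, and it assumes the lemma itself contains no `+` and that the first `:` separates form from analysis.

```python
def parse_analysis(analysis: str):
    """Split 'lemma+Tag1+Tag2...' into the lemma and a list of tags."""
    lemma, *tags = analysis.split("+")
    return lemma, tags


def parse_example(line: str):
    """Split a documentation example 'form: analysis' into (form, lemma, tags)."""
    form, analysis = line.split(":", 1)
    lemma, tags = parse_analysis(analysis.strip())
    return form.strip(), lemma, tags


if __name__ == "__main__":
    print(parse_example("калдояк: кал+N+SP+Abl+Indef+Foc/Гак"))
```

Tags that contain a slash, such as `Foc/Гак`, are kept whole because only `+` is used as a separator.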

POSSESSIVE DECLENSION

CASES BEFORE POSSESSIVE TAGS

DEFINITE PLURAL

Cases for тнэ

NP head ellipsis declension, Modifiers without nouns = MWN

Nouns1S_A

POSSESSIVE marking followed by clitics

Possessor indices

The Erzya language possessor indices or possessive suffixes may be followed by a number of morpheme types

These are possessor indices that can be followed by predicate marking. In the present there is no distinction between ScSg3 and ScPl3. Possessor indices allowing (1) #, (2) Foc, (3) Der/Pr ()

This appears with kindred terminology

Is “_KAL” necessary ?

DAT-PXPL1 ;

POSSESSIVE TAGS

These are possessor Indices for non-nominative singular NonNomSg

word boundary or focus

This (part of) documentation was generated from src/fst/morphology/affixes/nouns.lexc

src-fst-morphology-affixes-pronouns.lexc.md

Pronoun inflection

Erzya pronouns inflect in many of the same cases as regular nouns.

Closed class personal pronouns

+Sem/Hum+Sg+Nom:е ENDLEX ; кие:ки

+Sem/Obj: CLT/COP_SG ; singular

мон:мо

тон:то

сон:со

минь:

тынь:ты

сынь:сы

Obligatory Possessor Index

Demonstrative

Interrogative

What should be done

кона:кона This is not the same as indefinite PronRel-kona

What should be done

LEXICON PRON-IS-INTERR-SPAT-INE косо

What should be done

Relative pronouns

ки:ки

ки

мезе+Pron:мезИ2 Misc_Pronouns1 ; мезе+Pron+Rel+Gen:мень K ; ки+Pron:ки Misc_Pronouns1 ;

Some pronoun continuations have been moved here out of TestLexc-noun.txt

This (part of) documentation was generated from src/fst/morphology/affixes/pronouns.lexc

src-fst-morphology-affixes-propernouns.lexc.md

Proper noun inflection

Erzya proper nouns inflect in the same cases as regular nouns.

Андрей:Андре

Вили:Вил

Russian type Surnames Абдеев:Абдеев

Багрий:Багр

Аморский:Аморск

Front-vowel stem

DECLENSION LIMITATIONS

This (part of) documentation was generated from src/fst/morphology/affixes/propernouns.lexc

src-fst-morphology-affixes-quantifiers.lexc.md

Quantifier inflection

Erzya quantifiers inflect in many of the same cases as regular nouns.

extra numerals

Now regular

кавонст

омбонст

кавонест is a pronoun like the Finnish molemmat. This means a radical increase in the Erzya pronoun inventory: 6 x for each numeral 2 and above.

кавксоненек

once, twice; весть, кавксть, аламоксть
twofold, threefold; веенькирда, кавонькирда, колмонькирда

васенцеде advmod:multimprf > advmod:ordimprf

васняяк ‘first of all’

Numeral with a range limitation to adnominal phrase

2012-08-09

This (part of) documentation was generated from src/fst/morphology/affixes/quantifiers.lexc

src-fst-morphology-affixes-symbols.lexc.md

Symbol affixes

This (part of) documentation was generated from src/fst/morphology/affixes/symbols.lexc

src-fst-morphology-affixes-verbs.lexc.md

Verb inflection

Erzya verbs inflect for mood and tense, and for the person and number of the subject and, in the object conjugation, of the object.

OBJECT FLAGS AND +V tags а+V:а

**LEXICON V-AUX-NEG-PRT1 ** а+V:эзь

**LEXICON TV_KADOMS **

**LEXICON TV_NEVTEMS_SUB **

LEXICON TV_NEVTEMS невтемс:невть

**LEXICON TV_SAVTOMS_SUB **

**LEXICON TV_SAVTOMS **

**LEXICON TV_SAVTOMS-SG3_SUBJ/ZERO **

**LEXICON TV_CHACHTOMS **

**LEXICON TV_KUNDAMS_SUB **

**LEXICON TV_KUNDAMS **

**LEXICON TV_SATOMS **

**LEXICON TV_TUEMS **

**LEXICON TV_TEEMS **

LEXICON TV_NEKSHNEMS некшнемс:некшн

VERBS WITH THIRD PERSON OBJECTS @U.CONJ-PX.13@

VERBS WITH INTRANSITIVE TAGS +V

AUXILIARY VERBS

DERIVATION

VERBS AFTER TRANSITIVITY Tags OBJECT FLAGS

муемс:му

теемс:тей

no deverbals

DERIVATION

LEXICON TV_NEKSHNEMS Alternates with TRA

This is fed by actors and participles in N_myv, A_myv and Prc_myv

CONJUGATION

Indicative Preterite I

INDICATIVE

Indicative NonPast

INDICATIVE PRETERITE 2

DESIDERATIVE

CONJUNCTIVE

redo conj 2012-11-07 begin

redo conj 2012-11-07 end

begin

end

OPTATIVE

IMPERATIVE

PRECATIVE

OPTATIVE

2012-11-09

Given in Grammar 2000

Used with deverbals

ваномс+V+Imprt+ScPl2+Clt/Ga: look/katsoa

ван%>{оеэØ}дО1%>Г4а
ван%>одо%>0я

This (part of) documentation was generated from src/fst/morphology/affixes/verbs.lexc

src-fst-morphology-clitics.lexc.md

Clitics

The Erzya language clitics…

END

This (part of) documentation was generated from src/fst/morphology/clitics.lexc

src-fst-morphology-phonology.twolc.md

The Erzya morphophonological/twolc rules file

This file documents the phonology.twolc file

Alphabet

ӓ Ӓ ҥ Ҥ і І ѳ Ѳ Pre-Soviet 1930s letters

Special letters in the root that might be useful in dialect research and etymology later

**Ь3:0 ** арсемс:арсе arśems vs арсемс:арЬ3се aŕśems
**Ӓ3:э ** эрямс:Ӓ3ря
**Ӓ4:е ** пелемс:пӒ4ль
%^Ӓ3:Э ^Ӓ3 :Э
%^Ӓ4:Е ^Ӓ4 :Е
%^ӓ3:э эрямс:^ӓ3ря
%^ӓ4:е пелемс:п^ӓ4ль
**%^Ь2ZERO:0 ** removes stem-final soft sign

идиса, идима ашоян disallow о:0

вт%{оеэ%}мО1

%{ОØ%}:0 Stem-final archiphoneme панго
%{ЕØ%}:ь Stem-final archiphoneme тинге
%{ЕØ%}:0 Stem-final archiphoneme тинге
%^H:0 used with stems in ч, ш, ж for hard plurals
%{дт%}:д in ablative
%{дт%}:т in ablative

%{frontHard%}:0 — front harmony hard
%{frontSoft%}:0 — front harmony soft
%{back%}:0 — back harmony
%{backHard%}:0 — back harmony

%{dialM%}:0 — for Shoksha and Drakino Dial/M morphology
%{ichPat%}:0 — for triggering colloquial patronymic forms
%^CnsRM:0 — Remove consonant

%^OldAE:0 — This allows Ӓ4 and Ӓ3 to be realized as я
%^NoLinkVow:0 — No linking vowel is used only after consonants for error

%^SoftRetain:0 — The soft sign is not lost when adding -тне
%^HardNoDent:0 — Hard non-dent followed by -тнэ потоктнэсэ

verbStemVowStrong:0

цёра%>%{АЯ%}н
цёр0%>ан

Ӓ3 Ӓ4 as я

A1:o

яка%>%{оеэØ%}мА1
яка%>0мо
★яка%>%{оеэØ%}мА1 (is not standard language)
★яка%>0ма (is not standard language)

Y2:yi

%{оеэ%}:е неемс+V+Ger+Ill+PxPl1: –see/nähdä–

ней%>%{оеэØ%}мО1%>%{оеэØ%}з%>%{оеэØ%}н%{оеэ%}к
не0%>еме%>0з%>энек
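
In the test pairs of this file the second line of each pair is the surface side, where `0` stands for a deleted symbol and `%>` or `>` for a morpheme boundary. A minimal Python sketch of turning such a line into a plain word form follows. The helper name `surface_form` is hypothetical, and the sketch assumes that no literal digit `0` occurs in the word itself.

```python
def surface_form(lower: str) -> str:
    """Strip alignment zeros and morpheme boundaries from the surface side of a pair."""
    # Remove the escaped boundary first so that no stray '%' is left behind.
    for marker in ("%>", ">", "0"):
        lower = lower.replace(marker, "")
    return lower


if __name__ == "__main__":
    print(surface_form("не0%>еме%>0з%>энек"))
    print(surface_form("ков0%>оз%>от"))
```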

%{оеэ%}:о псака+N+SP+Abe+PxSg3+Der+Der/MWN+N+SP+Tra+Indef: cat/kissa

псака%>втО1мО1%>%{оеэØ%}нз%{оэØ%}%>кс
псака%>втомо%>0нзо%>кс

%{оеэ%}:э

%{оеэØ%}:0 %{оеэØ%}:е панемс+V+Ind+ConNeg: drive/ajaa

пань%>%{оеэØ%}К3
пан0%>е0

вадемс+V+Der/Ovt+Prc/Telic+Sg+Nom+Def: the greased one/

вадь>{оеэØ}вт{ЬØ}>{оеэØ}сь
вад0>евт0>есь

%{оеэØ%}:э кев+N+SP+Ill+PxSg2: rock/kivi

к^ӓ4в{frontSoft}>з>{оеэØ}т{ЬØ}
кев0>з>эть
к^ӓ4в{frontSoft}>{оеэØ}з>{оеэØ}т{ЬØ}
кев0>ез>эть
пильге%{frontSoft%}%>%{оеэØ%}з%>%{оеэØ%}т%{ЬØ%}
пильге0%>0з%>эть

%{оеэØ%}:о ков+N+SP+Ill+PxSg2: moon/kuu

ков%{back%}%>з%>%{оеэØ%}т%{ЬØ%}
ков0%>з%>от
ков%{back%}%>%{оеэØ%}з%>%{оеэØ%}т%{ЬØ%}
ков0%>оз%>от
асфальт>{оеэØ}сь
асфальт>ось

%{уиыØ%}:и панемс+V+Inf+Dial/NW: drive/ajaa

пань%>%{уиыØ%}мс
пан0%>имс

%{уиыØ%}:ы кев+N+SP+Ill+PxSg2: rock/kivi

к^ӓ4в{frontSoft}>з>{уиыØ}т{ЬØ}
кев0>з>эть
к^ӓ4в{frontSoft}>{уиыØ}з>{уиыØ}т{ЬØ}
кев0>из>ыть

%{уиыØ%}:у ков+N+SP+Ill+PxSg2: moon/kuu

ков%{back%}%>з%>%{уиыØ%}т%{ЬØ%}
ков0%>з%>ут
ков%{back%}%>%{уиыØ%}з%>%{уиыØ%}т%{ЬØ%}
ков0%>уз%>ут
сай%{frontSoft%}%>О1%>дО1
са00%>е%>де

O1:e

O1:o

%{оэØ%}:e

тев+N+Sg+Nom+PxSg3+Err/Orth-no-linking-vowel: thing/juttu

тев>^NoLinkVow>з{оэØ}
тев>0>зэ

мель+N+Sg+Nom+PxSg3+Err/Orth-no-linking-vowel: wish/mieli

мель>^NoLinkVow>з{оэØ}
мель>0>зэ

%{оэØ%}:o

псака+N+SP+Abe+PxSg3+Der+Der/MWN+N+SP+Tra+Indef: cat/kissa

псака%>втО1мО1%>%{оеэØ%}нз%{оэØ%}%>кс
псака%>втомо%>0нзо%>кс

псака+N+SP+Gen+PxSg3+Der+Der/MWN+N+Sg+Gen+Def

псака%>%{оеэØ%}нз%{оэØ%}%>%{оеэØ%}нть
псака%>0нзо%>0нть

стувтомс+V+Opt+ScSg3+OcSg3

стувт>{оеэØ}сс{оэØ}
стувт>оссо

%{оэØ%}:0

O1:0

%{ое%}:е

%{ое%}:о

A2:a
путомс+V+Prec+ScSg2: put/laittaa

пут%>%{КТ%}%{АЯ%}
пут%>та
карь>{дт}О1>{АЯ}н
кар0>д0>ян

и:ы

j:0

сай%>%{оеэØ%}%>дО1
са0%>е%>де

**Е3:э always ** %> т н _ 2013-02-23

**Е3:э sometimes ** %> т н _ 2013-02-23

**ye:e always **
сыр

сыр>Н1е{frontSoft}>{оеэØ}нь
сыр>нэ0>0нь

Н1:н
Н1:к

а: и Dimin

о: ы Dimin

у: и Dimin

о regressive raising у

озномс+V+Ind+Prs+ScSg1+OcSg3+Dial/NW: bless/siunata

озно^RegrRaise>са
озну0>са

э: и Dimin

а: и Dimin

о: и Dimin

у: и Dimin

я: и Dimin

ё: и Dimin

ю: и Dimin

е: и Dimin

a:ya

n loss with plural

ведун+N+Pl+Indef: knower/tietäjä

ведунCnsRM
веду00

v:0

G1:0

G1:g

G1:k

G2:g

G2:k

G4:0
саемс+V+Ind+Prs+ConNeg+Clt/Ga:

сай>{КТ}{ЬØ}>Г4а
сай>т0>0я

G4:k

потмо+N+Relator+SP+Ela+Indef: inside/sisäosa

потм%{back%}%>ст%{оэØ%}
пот00%>сто

imperative suffix K1:t

лыказевемс+V+Imprt+ScSg2: have taken

лыказев%>%{КТ%}%{ЬØ%}
лыказев%>ть

K1:к
ливтемс+V+Prec+ScSg2: set out/laittaa esille

ливть%>%{КТ%}%{АЯ%}
ливт0%>тя

U4:y
кал+N+Sg+Nom+Def: fish/kala

кал>{dialM}с{ЬØ}
кал>0с0

пильге+N+Pl+Nom+Indef leg; foot/jalka

пильг%{frontSoft%}%>т%{ЬØ%}
пильг0%>ть

валдо+N+Pl+Nom+Indef light/valo

★валд%{backHard%}%>т%{ЬØ%} (is not standard language)
★валд0%>ть (is not standard language)

лыказевемс+V+Imprt+ScSg2: have taken

лыказев%>%{КТ%}%{ЬØ%}
лыказев%>ть

U4:0

вадемс+V+Der/Ovt+Prc/Telic+Sg+Nom+Def: the greased one/

вадь>{оеэØ}вт{ЬØ}>{оеэØ}сь
вад0>евт0>есь

валдо+N+Pl+Nom+Indef light/valo

валд%{backHard%}%>т%{ЬØ%}
валд0%>т0

t:d
ловомс+V+Ind+Prs+ScSg1+OcSg2: regard/pitää jonain

лов>^TD>т{АЯ}н
лов>0>дан

s:0

класс%>с
клас0%>с

d:t

кедь%>%{дт%}О1
кед0%>те
обед%{frontHard%}%>%{дт%}О1
обед0%>тэ
★обед%{frontHard%}%>%{дт%}О1 (is not standard language)
★обед0%>дэ (is not standard language)
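
Across the pairs in this file the archiphoneme О1 surfaces as о with back harmony (втО1мО1 as втомо), as е after a front-soft stem (кед0%>те) and as э after a front-hard stem (обед0%>тэ). The following Python sketch records only that observed pattern. The mapping and the function name are assumptions read off the examples, not the twolc rules themselves, which also take the surrounding context into account and allow О1 to be deleted (O1:0).

```python
# Assumed mapping, read off the example pairs in this file.
O1_BY_HARMONY = {
    "back": "о",       # псака%>втО1мО1 surfaces with втомо
    "frontSoft": "е",  # сай%{frontSoft%}%>О1%>дО1 surfaces with е and де
    "frontHard": "э",  # обед%{frontHard%}%>%{дт%}О1 surfaces with тэ
}


def realize_o1(upper: str, harmony: str) -> str:
    """Replace every О1 in an upper-side string by the vowel for the given harmony."""
    return upper.replace("О1", O1_BY_HARMONY[harmony])


if __name__ == "__main__":
    print(realize_o1("втО1мО1", "back"))
    print(realize_o1("тО1", "frontHard"))
```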

d:d

y:y

ведь{frontSoft}^SoftRetain>тне
ведь00>тне

y:0

кель^Ь2ZERO>енть
кел00>енть

ломань+N+Pl+Indef: person/ihminen

ломаньCnsRM>ть
лома000>ть

меремс+V+Ind+Prt1+ScSg3: say/sanoa

мерь>сь
мер0>сь
★мерь>сь (is not standard language)
★мерь>сь (is not standard language)

лисемс+V+Ind+Prt1+ScSg3: go out/mennä ulos

лись>сь
лис0>сь
★лись>сь (is not standard language)
★лись>сь (is not standard language)

Disallow TLoss after non-t

Disallow ^H before t and subsequent {ЬØ}

Disallow RegrRaise after A

Disallow vow loss before break

Disallow OldAE when no Ä

★раське>{уиыØ}нь (is not standard language)
★раське>0нь (is not standard language)

Disallow KLoss after non-k

Disallow SLoss after non-s

Disallow %^WLoss after non-v

Disallow Н1:н after Letters

[л

:Vows (HarmDummies:)] (ь:) %> _ %> %{оеэØ%}: ;

пильге{frontSoft}>з>{оеэØ}нз{оэØ}
пильге0>з>энзэ

Disallow soft loss

кирьпець^SoftRetain>тне
кирьпець0>тне

Disallow hard dent with soft error

Disallow SoftRetain

чувто+N+Pl+Nom+Def: tree/puu

чувт%{ОØ%}%>тнЕ3
чувт0%>тнэ

веле+N+SP+Tra+PxSg2

веле%>%{оеэØ%}кс%>%{оеэØ%}т%{ЬØ%}
веле%>0кс%>эть

псака+N+SP+Abe+PxSg2+Clt/Cop+Prt2+ScPl3+Clt/Gak

псака%>втО1мО1%>%{оеэØ%}т%{ЬØ%}%>%{оеэØ%}линек%>Г1ак
псака%>втомо%>0т0%>олинек%>как

ош+N+SP+Ill+PxSg2

ош%>%{оеэØ%}з%>%{оеэØ%}т%{ЬØ%}
ош%>оз%>от0
эряв%>^WLoss%>{ОЕЭØ}вО1ль
эряв%>0%>00оль

псака+N+SP+Abe+PxSg3+Der+Der/MWN+N+SP+Tra+Indef: cat/kissa

псака%>вт%{оеэ%}мО1%>%{оеэØ%}нз%{оэØ%}%>кс
псака%>втомо%>0нзо%>кс

веле+N+SP+Tra+PxSg2+Clt/Cop+Prt2+ScPl3: village/kylä

веле>{оеэØ}кс>{оеэØ}т{ЬØ}>{оеэØ}льть
веле>0кс>эт0>ельть

Disallow %^NoLinkVow after vowel

Disallow s for control of stems with inessive…

Disallow dano after non-voiced

★вечк>^TD>т{АЯ}н (is not standard language)
★вечк>0>дян (is not standard language)

Disallow k for control of comparative with stem types

This (part of) documentation was generated from src/fst/morphology/phonology.twolc

src-fst-morphology-root.lexc.md

Morphology

INTRODUCTION TO MORPHOLOGICAL ANALYSER OF ERZYA.

Analysis symbols

The morphological analyses of wordforms of ERZYA are presented in this system in terms of the following symbols. (It is highly suggested to follow existing standards when adding new tags).

+TYÄ WORK HAS TO BE DONE
%

The parts-of-speech are:

+A adjective
+Adp adposition
+Adv adverb
+CS subordinating conjunction
+CC coordinating conjunction
+Det determiner
+Descr descriptive
+Interj interjection
+N noun
+Num numerals
+Pcle particle
+Po postposition
+Pr preposition (in Russian loans)
+Pron pronoun
+Qnt quantifier
+V verb

Parts of speech are further split up into:

Adjectives

+Adn Adnominal (modifier) !! This is not an NP head like +Pron
+Bahuvrihi This is a nominative-case NP used as an adjective
+bahuvrihi get rid of these for upper-case

Adverbs

+Ideoph These are ideophonic descriptors used to modify the verb вырк ливтясь “flit and it flew off” “Ideophone: A vivid representation of an idea in sound. A word, often onomatopoeic, which describes a predicate, qualificative or adverb in respect to manner, colour, sound, smell, action, state or intensity.” (Doke 1935:118)
+Manner with reference to type of adverb
+Parenthetic parenthetic
+Spat spatial
+Iter Iterative form expressing number of times; myv: кавксть, kpv: кыкысь
+Mult Multiplicative, two-ply; myv: кавонькирда
+Deg Ad-adjective This is degree, deprecate + AdA
+Epist epistemic modality marker speaker’s evaluation/judgment of, degree of confidence in
+EvidNfh not first-hand келя
+EvidFh first-hand
+PerifMod peripheral modifier ськамонзо

Interjections

+Formulaic

Nouns

+Prop proper

Particles

Postpositions + Spat, + Temp

Pronouns

+Dem demonstrative
+Indef indefinite
+Dep dependent word requiring the presence of another, e.g. мень
+Exclusive: ськамонза
+Intensive: intensive pronoun
+Interr interrogative
+PerifMod: peripheral modifier ськамонза, кавонест
+Pers personal
+Recipr reciprocal
+Refl reflexive
+Rel relative
+Relat relator noun
+Sel selective, when selecting from a set of definites
+Short тень, теть; эстень
+Long монень, тонеть; монстень
+Sg1 first person singular
+Sg2 second person singular
+Sg3 third person singular
+Pl1 first person plural
+Pl2 second person plural
+Pl3 third person plural

Quantifiers (numerals)

Quantifiers and Numerals are classified under:

+Appr Approximative numeral кавто-колмо, колмошка two or three NB! do not confuse with Komi case +Apr
+AssocColl -ne- ; avide-
+Assoc +мезть
+Card cardinal + NCard
+Coll collective
+Distr Distributive
+Ord ordinal + NOrd
+Exclusive: ськамонзо

Nominals are inflected for Number and Case

Number

+Sg singular
+Pl plural
+SP ambiguous for number, general number

Case

+Abe abessive
+Abl ablative case
+Com Comitative “-нек/-нэк”
+Cmpr Comparative case form -шка
+Dat dative
+Ela elative case
+Gen genitive case
+Ill illative
+Ine inessive
+Lat lative
+Loc Locative “вить ён : вить ёно”
+Nom nominative case
+Prl prolative “га/ка/ва”
+Tra translative: used in similative and depictive constructions to mark what would be a secondary subject: –вармакс оргодсь тосто–
+Temp Temporalis case form “-не/-нэ” previously TempCx
+Voc Vocative

Possession and other declension types are marked with:

+PxSg1 first person singular
+PxSg2 second person singular
+PxSg3 third person singular
+PxSP3 third person singular or plural with dative only
+PxPl1 first person plural
+PxPl2 second person plural
+PxPl3 third person plural
+Def Definite

The comparative forms are:

+Comp comparative as opposed to superlative
+Superl superlative
+Attr Attribute

Verb moods are:

+Cond conditional Ындеря- (Derivational)
+Conj conjunctional “вОль”
+Des desiderative Ыксэль “was about to; wanted to”
+Ind indicative
+Imprt imperative
+Opt optative
+Prec precative
+Proh prohibitive; distinct from the negation of the imperative: Иля аварде! ‘Don’t cry’ (Proh); Аволь мелявтт, кецяк! ‘Don’t worry, be happy!’ (Neg + Imprt)

Infinitive moods

+Oblig modality: deontic/directive/obligative андомс: андома , якамс: якама
+Delib +Sugg modality: deontic/directive/deliberative I still need the right word for this андомс: андомсат

Tenses in the indicative and infrequently in the conditional

+Prs present (non-past); in Erzya there is no morphological distinction between present and future
+Prt1 Preterite 1
+Prt2 Preterite 2 (This is also used in predicate forms not involving a finite verb.)

Verb personal forms are:

+ScSg1 * subject conjugation first person singular
+ScSg2 * subject conjugation second person singular
+ScSg3 * subject conjugation third person singular
+ScPl1 * subject conjugation first person plural
+ScPl2 * subject conjugation second person plural
+ScPl3 * subject conjugation third person plural
Object conjugation
+OcSg1 * object conjugation first person singular
+OcSg2 * object conjugation second person singular
+OcSg3 * object conjugation third person singular
+OcPl1 * object conjugation first person plural
+OcPl2 * object conjugation second person plural
+OcPl3 * object conjugation third person plural
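
The subject and object conjugation tags listed above share one shape: a role prefix (`Sc` or `Oc`), a number (`Sg` or `Pl`) and a person digit. A minimal Python sketch of pulling these out of an analysis string follows. The function name and the returned tuple layout are hypothetical choices made for illustration.

```python
import re

# Matches +ScSg1 ... +OcPl3 when the tag is followed by another tag or the end.
_CONJ_TAG = re.compile(r"\+(Sc|Oc)(Sg|Pl)([123])(?=\+|$)")
_ROLE = {"Sc": "subject", "Oc": "object"}


def conjugation_tags(analysis: str):
    """Return (role, number, person) for each Sc/Oc tag found in an analysis."""
    return [
        (_ROLE[role], number, int(person))
        for role, number, person in _CONJ_TAG.findall(analysis)
    ]


if __name__ == "__main__":
    print(conjugation_tags("озномс+V+Ind+Prs+ScSg1+OcSg3+Dial/NW"))
```

An analysis with only a subject tag, such as `меремс+V+Ind+Prt1+ScSg3`, yields a single tuple.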

Other verb forms are

+Act * active voice (exo-tradition)
+PrsPrc * present participle (only non-contrastive usage)
+DemPrc * present participle (both contrastive and non-contrastive)
+ActPrcLong {иы}й (This is dealt with elsewhere as an active present participle)
+ActPrcShort {иы} (This is dealt with elsewhere as an active present participle)
+ActDemPrc {иы}ця (This is dealt with elsewhere as an active present participle)
+ConNeg * connegative, main verb complement to Neg, vow-stem
+ConNegII * connegative, main verb complement to Neg, cons-stem
+Ger * gerund This is used with Der/Ozj and VAbl
+Inf * infinitive
+Neg * verb of negation эзь, аволь, иля, апак
+ConvPrc * converb OR participle апак
+Prc * participle
+VGen * Verb Genitive, genitive form participle
+VAbl * Verb Ablative “озадо”
+Prc/Telic * Telic participle “саевть”
+Der/Abe * ВтОмО
+Der/Cmpr * шка
+Der/A * adjective derived from N or V
+Der/N2A * adjective derived from N
+Der/V2A * adjective derived from V
+Subst * deverbal nouns retaining verb arguments/gov
+PrfPrc

The Usage extents are marked using following tags:

+Err/Orth * Substandard
+Err/Sub * Substandard
+Err/Orth-no-hyphen * тетятават should be тетят-ават
+Err/Orth-back-should-be-hard-front * back should be hard front
+Err/Orth-cons-stem * пачт емс 2012 пачтямс
+Err/Orth-freq-le * пачтнемс:пачле
+Err/Orth-cons-stem * эзь эряв
+Err/Orth-front-linking-vowel * linking vowel is front уряжень
+Err/Orth-high-linking-vowel * linking vowel is high
+Err/Orth-mid-linking-vowel-should-be-high * linking vowel is mid вечкелизь should be вечкилизь
+Err/Orth-mid-onset-default-missing * should be скаломок, but this is скалмок, мелезэнек: мельзэнек
+Err/Orth-no-linking-vowel * linking vowel is missing
+Err/Orth-shib-hard * Иважнэнь
+Err/Orth-stem-a-should-be-o0 * чачтомс+V:чачта
+Err/Orth-stem-hard-e-should-be-je * Nekshnems
+Err/Orth-stem-ja-should-be-je0 * лемдемс+V:лемдя
+Err/Orth-stem-je-should-be-ja * мелямс:меле
+Err/Orth-stem-je-should-be-je0 * чудемс+V:чуде чуд емс (->)чуде мс
+Err/Orth-je-for-jo * should be ё
+Err/Orth-vowel-stem-je * пачтякшномс:пачтекшне
+Err/Orth-stem-soft-should-be-0 * кирпець:кирпецьтне
+Err/Orth-stem-nodent-hard-should-be-tnje * потоктнэсэ
+Err/Orth-missing-soft-in-stem * видме
+Err/Orth-missing-t-in-def-pl * область: областне
+Err/Orth-s-to-j * кайсь Modern: кассь
+Err/Orth-z-to-j * кардайсэ Modern: кардазсо
+Err/Orth-v-loss-before-lab * ольной
+Err/Orth-split-tween * гемень, кавтово
+Err/Orth-0-not-pal * no soft sign but should take soft sign
+Err/Orth-f * not v but instead f
+Err/Orth-s * not v but instead s
+Err/Orth-d * not t but instead d
+Err/Orth-colloq * colloquial, e.g. Минорыч
+Err/Orth-old1 * old1 like озимь, морковь
+Err/Orth-pre1880 * orthography preceding 1880
+Err/Orth-pre1978 * orthography preceding 1978
+Err/Orth-pre2012 * previous orthography
+Use/Marg * Marginal
+Use/-Spell * Exclude from speller
+Use/SpellNoSugg * recognized but not suggested in speller
+Use/Circ * Circular path
+Use/CircN * Circular number path
+Use/-Ped * Remove from pedagogical speller
+Use/NG * Do not generate, for isme-ped.fst and apertium
+Use/GC – only retained in the HFST Grammar Checker disambiguation analyser
+Use/-GC – never retained in the HFST Grammar Checker disambiguation analyser
+Use/TTS – only retained in the HFST Text-To-Speech disambiguation tokeniser
+Use/-TTS – never retained in the HFST Text-To-Speech disambiguation tokeniser
+Err/Lex * The lemma is not an Erzya word (Deprecating –+Src/F–)
+URL * For tagging URLs
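
The `+Use/...` tags above tell downstream tools which analyses to leave out. The real tools apply such restrictions when the transducers are built. The following Python sketch only illustrates the idea on analysis strings, and its names and sample strings are hypothetical.

```python
# Tags that exclude an analysis from a given tool, per the descriptions above.
SPELLER_EXCLUDED = {"Use/-Spell"}
GENERATION_EXCLUDED = {"Use/NG"}


def tags_of(analysis: str) -> set:
    """Return the set of tags in 'lemma+Tag1+Tag2...' (the lemma is dropped)."""
    return set(analysis.split("+")[1:])


def allowed(analysis: str, excluded: set) -> bool:
    """True if the analysis carries none of the excluded tags."""
    return not (tags_of(analysis) & excluded)


if __name__ == "__main__":
    # Illustrative strings only; the tag order follows the lexc lines in this file.
    samples = [
        "кал+N+Sg+Gen+Def",
        "кал+N+Use/-Spell+Sg+Gen+Def+Use/NG+Err/Orth+Dial/NW",
    ]
    for analysis in samples:
        print(analysis, allowed(analysis, SPELLER_EXCLUDED))
```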

Dialect tags

+Dial/SH * Short forms
+Dial/L * Long forms
+Dial * No specification Specific to some dialects Rueter 2010: 8
+Dial/-C * Not central standard
+Dial/C * 1 Central or Kozlovka-Mokshlei
+Dial/W * 2 Western or Insar
+Dial/W-NW * 2 Western or Insar, subgroup NW
+Dial/W-SW * 2 Western or Insar, subgroup SW
+Dial/NW * 3 North-Western or Alatyr
+Dial/SE * 4 South-Eastern or Sura
+Dial/M * 5 Mixed or Drakino-Shoksha

Orthography tags

+Orth/PhonDeriv * Derivation is phonetic but declension and conjugation morphologic
+Orth/PhonInfl * Entire inflection is phonetic 1821, 1920-30
+Orth/standard * described in 2008, dictionary 2012
+Orth/thirties * 1939–1955 phonetic, morphological
+Orth/fifties * 1955–1978 phonetic, morphological
+Orth/seventies * 1978–1993 phonetic, morphological
+Orth/nineties * 1993-2008 morphological, but phonetic compounding
+Orth/wiki * Regular-semantic deriving from 1993 and 2008
+Orth/-wiki * e.g. compound words written with white space
+Orth/standard_wiki * e.g. вайгельпе
+Orth/-thirties * e.g. таргсемс, студенттнэ
+Orth/Colloq Colloquial speech reflected in spelling

Abbreviated words are classified with:

+ABBR * Abbreviation
+Symbol = independent symbols in the text stream, like £, €, ©
+ACR * Acronym

Special symbols

Delimiter marks are classified with:

+CLB +PUNCT +LEFT +RIGHT +MIDDLE *
%^excl *

The verbs are syntactically split according to transitivity:

+TV * transitive verb
+IV * intransitive verb
+NomAg Actor Noun From Verb - Nomen Agentis (ready)
+NomAct Action Noun From Verb - Nomen Actio (ready)
+Dimin Diminutive

Auxiliary verbs

+Aux *

Special multiword units are analysed with:

+Multi

Non-dictionary words can be recognised with:

+Guess

Question and Focus particles:

+Qst +Foc
+Acc for Russian
+All for Russian
+AnIn for Russian animate
+Anim for Russian
+Cmpar for Russian
+Count for Russian
+Epenth for Russian
+Imp for Russian imperative
+Impf for Russian
+Inan for Russian inanimate
+Ins for Russian
+Fac for Russian
+Fem for Russian feminine
+MFN for Russian
+Msc for Russian masculine
+Neu for Russian neuter
+Perf for Russian
+PObj for Russian
+Pos for Russian
+Prb for Russian
+Pred for Russian predicate
+PrsAct for Russian
+Pst for Russian

Semantic tags

Semantic tags to help disambiguation and syntactic analysis (placed before POS). Borrowed from main/langs/sme/src/morphology/root.lexc

Simplex tags

+Sem/Act Activity
+Sem/Amount Amount
+Sem/Ani Animate
+Sem/Aniprod Animal Product
+Sem/Body Bodypart
+Sem/Body-abstr siellu, vuoig?a, jierbmi
+Sem/Build Building
+Sem/Build-part Part of Building, like the closet
+Sem/Cat Category
+Sem/Clth Clothes
+Sem/Clth-jewl Jewellery
+Sem/Clth-part part of clothes, boallu, sávdnji…
+Sem/Ctain Container
+Sem/Ctain-abstr Abstract container like bank account
+Sem/Ctain-clth
+Sem/Curr Currency like dollár, Not Money
+Sem/Dance Dance
+Sem/Dir Direction like GPS-kursa
+Sem/Domain Domain like politics, reindeerherding (a system of actions)
+Sem/Drink Drink
+Sem/Dummytag Dummytag
+Sem/Edu Educational event
+Sem/Event Event
+Sem/Feat Feature, like Árvu
+Sem/Feat-phys Physiological feature, ivdni, fárda
+Sem/Feat-psych Psychological feature
+Sem/Feat-measr Psychological feature
+Sem/Fem Female name
+Sem/Food Food
+Sem/Food-med Medicine
+Sem/Furn Furniture
+Sem/Game Game
+Sem/Geom Geometrical object
+Sem/Group Animal or Human Group
+Sem/Hum Human
+Sem/Hum-abstr Human abstract
+Sem/Ideol Ideology
+Sem/Kin Kinship term (special PxSg2 forms),
+Sem/Kin_Fem Kinship term (special PxSg2 forms), female
+Sem/Kin_Mal Kinship term (special PxSg2 forms), male
+Sem/Lang Language
+Sem/Mal Male name
+Sem/Mat Material for producing things
+Sem/Measr Measure
+Sem/Money Has to do with money, like wages, not Curr(ency)
+Sem/Obj Object
+Sem/Obj-clo Cloth
+Sem/Obj-cogn Cloth
+Sem/Obj-el (Electrical) machine or apparatus
+Sem/Obj-ling Object with something written on it
+Sem/Obj-rope flexible ropelike object
+Sem/Obj-surfc Surface object
+Sem/Org Organisation
+Sem/Part Feature, oassi, bealli
+Sem/Perc-cogn Cognitive perception
+Sem/Perc-emo Emotional perception
+Sem/Perc-phys Physical perception
+Sem/Perc-psych Physical perception
+Sem/Plant Plant
+Sem/Plant-part Plant part
+Sem/Plc Place
+Sem/Plc-abstr Abstract place
+Sem/Plc-elevate Place
+Sem/Plc-line Place
+Sem/Plc-water Place
+Sem/Pos Position (as in social position job)
+Sem/Process Process
+Sem/Prod Product
+Sem/Prod-audio Audio product
+Sem/Prod-cogn Cognition product
+Sem/Prod-ling Linguistic product
+Sem/Prod-vis Visual product
+Sem/Rel Relation
+Sem/Route Name of a Route
+Sem/Rule Rule or convention
+Sem/Semcon Semantic concept
+Sem/Sign Sign (e.g. numbers, punctuation)
+Sem/Sport Sport
+Sem/State
+Sem/State-sick Illness
+Sem/Substnc Substance, like Air and Water
+Sem/Sur Surname
+Sem/Fem-Sur Surname female
+Sem/Mal-Sur Surname male
+Sem/Symbol Symbol
+Sem/Time Time
+Sem/Tool Prototypical tool for repairing things
+Sem/Tool-catch Tool used for catching (e.g. fish)
+Sem/Tool-clean Tool used for cleaning
+Sem/Tool-it Tool used in IT
+Sem/Tool-measr Tool used for measuring
+Sem/Tool-music Music instrument
+Sem/Tool-write Writing tool
+Sem/Txt Text (girji, lávlla…)
+Sem/Veh Vehicle
+Sem/Wpn Weapon
+Sem/Wthr The Weather or the state of ground

Multiple Semantic tags:

+Sem/Act_Group
+Sem/Act_Plc
+Sem/Act_Route
+Sem/Amount_Build
+Sem/Amount_Semcon
+Sem/Ani_Body-abstr_Hum
+Sem/Ani_Build
+Sem/Ani_Build-part
+Sem/Ani_Build_Hum_Txt
+Sem/Ani_Group
+Sem/Ani_Group_Hum
+Sem/Ani_Hum
+Sem/Ani_Hum_Plc
+Sem/Ani_Hum_Time
+Sem/Ani_Plc
+Sem/Ani_Plc_Txt
+Sem/Ani_Time
+Sem/Ani_Veh
+Sem/Aniprod_Hum
+Sem/Aniprod_Obj-clo
+Sem/Aniprod_Perc-phys
+Sem/Aniprod_Plc
+Sem/Body-abstr_Prod-audio_Semcon
+Sem/Body_Body-abstr
+Sem/Body_Clth
+Sem/Body_Food
+Sem/Body_Group_Hum
+Sem/Body_Hum
+Sem/Body_Mat
+Sem/Body_Measr
+Sem/Body_Obj_Tool-catch
+Sem/Body_Plc
+Sem/Body_Time
+Sem/Build-part_Plc
+Sem/Build_Build-part
+Sem/Build_Clth-part
+Sem/Build_Edu_Org
+Sem/Build_Event_Org
+Sem/Build_Org
+Sem/Build_Route
+Sem/Clth-jewl_Curr
+Sem/Clth-jewl_Money
+Sem/Clth-jewl_Plant
+Sem/Clth_Hum
+Sem/Ctain-abstr_Org
+Sem/Ctain-clth_Plant
+Sem/Ctain-clth_Veh
+Sem/Ctain_Feat-phys
+Sem/Ctain_Furn
+Sem/Ctain_Tool
+Sem/Ctain_Tool-measr
+Sem/Curr_Org
+Sem/Dance_Org
+Sem/Dance_Prod-audio
+Sem/Domain_Food-med
+Sem/Domain_Prod-audio
+Sem/Edu_Event
+Sem/Edu_Group_Hum
+Sem/Edu_Mat
+Sem/Edu_Org
+Sem/Event_Food
+Sem/Event_Hum
+Sem/Event_Plc
+Sem/Event_Time
+Sem/Feat-phys_Tool-write
+Sem/Feat-phys_Veh
+Sem/Feat-phys_Wthr
+Sem/Feat-psych_Hum
+Sem/Feat_Plant
+Sem/Food_Perc-phys
+Sem/Food_Plant
+Sem/Game_Obj-play
+Sem/Geom_Obj
+Sem/Group_Hum
+Sem/Group_Hum_Org
+Sem/Group_Hum_Plc
+Sem/Group_Hum_Prod-vis
+Sem/Group_Org
+Sem/Group_Sign
+Sem/Group_Txt
+Sem/Hum_Lang
+Sem/Hum_Lang_Plc
+Sem/Hum_Lang_Time
+Sem/Hum_Obj
+Sem/Hum_Org
+Sem/Hum_Plant
+Sem/Hum_Plc
+Sem/Hum_Tool
+Sem/Hum_Veh
+Sem/Hum_Wthr
+Sem/Lang_Tool
+Sem/Mat_Plant
+Sem/Mat_Txt
+Sem/Measr_Time
+Sem/Money_Obj
+Sem/Money_Txt
+Sem/Obj-play
+Sem/Obj-play_Sport
+Sem/Obj_Semcon
+Sem/Clth-jewl_Org
+Sem/Org_Rule
+Sem/Org_Txt
+Sem/Org_Veh
+Sem/Part_Prod-cogn
+Sem/Perc-emo_Wthr
+Sem/Plant_Plant-part
+Sem/Plant_Tool
+Sem/Plant_Tool-measr
+Sem/Plc-abstr_Rel_State
+Sem/Plc-abstr_Route
+Sem/Plc_Pos
+Sem/Plc_Route
+Sem/Plc_Substnc
+Sem/Plc_Substnc_Wthr
+Sem/Plc_Time
+Sem/Plc_Tool-catch
+Sem/Plc_Wthr
+Sem/Prod-audio_Txt
+Sem/Prod-cogn_Txt
+Sem/Semcon_Txt
+Sem/Obj_State
+Sem/Substnc_Wthr
+Sem/Time_Wthr
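
The multiple semantic tags above appear to be built by joining simplex tag names with underscores. A small Python sketch of recovering the components is given below; `split_sem_tag` is a hypothetical helper, not something defined in the sources. Note that a few tags listed as simplex (e.g. +Sem/Kin_Fem) also contain an underscore, so the split is only a heuristic.

```python
def split_sem_tag(tag: str) -> list[str]:
    """Split a combined '+Sem/A_B_C' tag into its component names."""
    prefix = "+Sem/"
    if not tag.startswith(prefix):
        raise ValueError(f"not a semantic tag: {tag!r}")
    # Components are separated by '_'; hyphens belong to the component name.
    return tag[len(prefix):].split("_")
```

A disambiguation rule that targets, say, human referents could then test for membership of `"Hum"` in the returned list instead of enumerating every combined tag.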

Semantics are classified with

+Sem/Divinity Divinity (god personified),
+Sem/Constellation Constellation,
+Sem/Ant Anthroponym
+Sem/Fem Anthroponym female
+Sem/Mal Anthroponym male
+Sem/Patr Patronym
+Sem/Fem-Patr Patronym female
+Sem/Mal-Patr Patronym male
+Sem/Rvr name of river or water way, media of transportation,
+Sem/Mnth name of month
+Sem/Inanim Inanimate,

Semantic Fields

+Field/Agr agricultural
+Field/Anat anatomical
+Field/Bio biological
+Field/Bot botanical
+Field/Chem chemical
+Field/Geol geological
+Field/Gram grammatical
+Field/Hist historical
+Field/Law law
+Field/Mar maritime
+Field/Math mathematical
+Field/Med medical
+Field/Mus musical
+Field/Relig church
+Field/Tech technical
+Field/Zool zoological

Other tags

Verbal arguments

+Subj/Zero This is used to mark verbs without a semantic subject

Derivations are classified under the morphophonetic form of the suffix, the source and target part-of-speech.

+V→N +V→V +V→A

Homonymy

Der begin

+Der In front of every derivation to make it possible to target derivations as a class e.g. in regular expressions etc
+Der/VtOmO
+Der/AbeAttr
+Der/stO Deriving adverbs from adjectives A2Adv
+Der/ms эрзямс эрзя, истямс истя, вадрямс вадря
+Der/shka
+Der/GenAttr +Der/Onj genitive attribute derivation of non-nouns
+Der/aj vocative
+Der/kaj vocative
+Der/PatrMal Male patronymic
+Der/PatrFem Female patronymic
+Der/Ovt * telic deverbal noun also attr, resultative participle
+Der/Oms * infinitive illative
+Der/OmO * infinitive locative/nominative
+Der/OmstO * infinitive elative
+Der/OmsO * infinitive inessive
+Der/OmdO * infinitive ablative
+Der/Omga * infinitive prolative
+Der/Oma * modality: deontic/directive/obligative андомс: андома , якамс: якама
+Der/Omka * modality: deontic/directive/obligative андомс: андомка , якамс: якамка
+Der/Ycja * active (demonstrative) present participle takes copula person
+Der/Yj * active long present participle takes copula person
+Der/Y * active short present participle
+Der/Yks * active short present participle with ks derivation
+Der/Ozj * Gerund
+Der/Cond * conditional derivation +Der/Ynderja
+Der/NomAg Actor Noun From Verb - Nomen Agentis (derivation) default in Ыця
+Der/NomAct Action Noun From Verb - Nomen Actio (derivation)

Declaring noun derivations

+Der/pelj

Modifier without noun

+Der/MWN Modifier without Noun
+Der/Dem Speaker-Oriented Demonstrative

Conjugation of words other than finite verbs

+Der/Pr derivation to predicate head, e.g. nominal conjugation
+Der/Cop This is not a derivation
+Clt/Cop This will replace the nominal conjugation Der/Pr+V
+Clt/Cond

Declaring Indefinite Pronoun derivations

+Der/koj prefix +Indef in indefinite pronouns
+Der/ta prefix +Indef in indefinite pronouns
+Der/tago prefix +Indef in indefinite pronouns
+Der/Gak suffix +Indef in indefinite pronouns
+Der/buti suffix +Indef in indefinite pronouns
+Der/Yja suffix +Indef in indefinite pronouns ковия, зярыя

DECLARING NOUN DERIVATIONS

+Der/chi adjective-to-noun
the combinatory –Event– preceding the NP-final noun
+Der/OmA verb-to-noun

DECLARING NUMERAL DERIVATIONS

+Der/cje +A+Ord
+Der/tjks +A+Ord (non-contrastive)

DECLARING DEVERBAL DERIVATIONS OF VERBS

+Der/kshnO verb2verb derivation
+Der/OkshnOms verb2verb derivation
+Der/OvOms verb2verb derivation
+Der/OvkshnOms verb2verb derivation
+Der/OvtOms verb2verb derivation
+Der/Ovtnjems verb2verb derivation
+Der/Ozevems verb2verb derivation
+Der/Ozevtems verb2verb derivation
+Der/Ozevtnjems verb2verb derivation
+Der/Ozevkshnems verb2verb derivation
+Der/sje this in verb2verb derivation and also in denominal demonstrative –Der/Dem–
+Der/nje verb2verb derivation
+Der/njems verb2verb derivation
+Der/Oncje old orth кудонцесь
+Der/Dimin
+Der/ka diminutive
+Der/NJE This is used in ошке, калнэ and кудыне
+Der/nJE diminutive
+Der/Ynje diminutive
+Der/Ynjka diminutive
+Der/Ynjkinje diminutive
+Der/ke diminutive in –ке–
+Der/kinje diminutive
+Der/ks Adv›N
+OLang/SME - North Sámi
+OLang/SMJ - Lule Sámi
+OLang/SMA - South Sámi
+OLang/FIN - Finnish
+OLang/SWE - Swedish
+OLang/NOB - Norw. bokmål
+OLang/NNO - Norw. nynorsk
+OLang/ENG - English
+OLang/MYV - Erzya
+OLang/MDF - Moksha
+OLang/RUS - Russian
+OLang/TAT - Tatar
+OLang/UND - Undefined
+F - Foreign

Morphophonology

To represent phonologic variations in word forms we use the following symbols in the lexicon files:

And following triggers to control variation

{frontHard} — front harmony hard
{frontSoft} — front harmony soft
{back} — back harmony
{backHard} — back harmony
{dialM} — for Shoksha and Drakino Dial/M morphology
{ichPat} — for triggering colloquial patronymic forms
%^CnsRM — Remove consonant
Е3 testing тне тнэ
%^H used with stems in ч, ш, ж for hard plurals

Special letters in the root that might be useful in dialect research and etymology later

Ь3 арсемс:арсе arśems vs арсемс:арЬ3се aŕśems
Ӓ3 эрямс:Ӓ3ря
Ӓ4 пелемс:пӒ4ль
%^Ӓ3 ^Ӓ3 :Э
%^Ӓ4 ^Ӓ4 :Е
%^ӓ3 эрямс:^ӓ3ря
%^ӓ4 пелемс:п^ӓ4ль
%^Ь2ZERO removes stem-final soft sign
{дт} in ablative
{ое} inflectional suffix protovowel аволь аволинь
{оеэØ} Suffix-initial archiphoneme
{уиыØ} Suffix-initial archiphoneme in dialect
%^RegrRaise идиса, идима ! raising e:i, o:u before a in NW
%^Break ашоян disallow о:

вт{оеэ}мО1 suffix-internal archivowel

{оэØ} inessive, elative; this is the hard/broad s
{ОØ} Stem-final archiphoneme панго
{ЕØ} Stem-final archiphoneme тинге

%^OldAE — This allows Ӓ4 and Ӓ3 to be realized as я

%^NoLinkVow — No linking vowel is used only after consonants for error
%^SoftRetain — The soft sign is not lost when adding -тне
%^HardNoDent — Hard non-dent followed by -тнэ потоктнэсэ

MISC

+Cmp/Hyph A tag to indicate that a hyphen was used when compounding

Development tag

+WORK
+NoVowX
ZERO
%0
%-
+Dig1
+Dig2
+Dig3
+Dig4
+Rom Roman numerals

Compounding

+Cmp Dynamic compound - this tag should always be part of a dynamic compound. It is important for Apertium, and useful in other cases as well.
+Cmp/Hyph-Coll with nouns
+Cmp/Hyph-Redup with verbs
+Cmp/Hyph-Synonym with verbs
+Cmp/Hyph-Serial with verbs
+Cmp/Hyph-tejems with verbs

Imperative clitics

+Clt/Ga редяка Precative +Prec
+Clt/Gaja редякая
+Clt/Gajatj редякаять
+Clt/Gajatja редякаятя
+Clt/Gatja редякатя
+Clt/Gaka редякака ARE these real?
+Clt/Gakaja редякакая ARE these real?
+Pred2 secondary predicate. Examples: “Joe came in with his hat on.” “Joe came in Joe had his hat on.”

Tags distinguishing different versions of the same lemma (before POS)

+v1
+v2
+v3
+v4
+v5
+v6
+v7
+v8
+v9
+v10
+v11
+v12
+v13
+v14
+v15
+v16
+v17
+v18
+v19
+v20
+v21
+v22
+v23
+v24
+ACC +DAT +COM This marks a function not a morpheme
+NoPoss used with personal pronouns in oblique cases, where a possessor index is expected

Symbols that need to be escaped on the lower side (towards twolc):

»
«
> (written with square brackets, see the root.lexc file)
< (written with square brackets, see the root.lexc file)

Flag diacritics

We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics: compounds with verbs are only allowed if the verb is further derived into a noun again.

Flag diacritic	Explanation
@P.NeedNoun.ON@	(Dis)allow compounds with verbs unless nominalised
@D.NeedNoun.ON@	(Dis)allow compounds with verbs unless nominalised
@C.NeedNoun@	(Dis)allow compounds with verbs unless nominalised
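
To make the interplay of such flags concrete, here is a rough Python sketch that checks whether a sequence of flag diacritics along one path is consistent. It follows a simplified reading of the usual P (set), N (negative set), U (unify), R (require), D (disallow) and C (clear) operations; the function name `path_is_valid` and the test paths are assumptions made for illustration, and the sketch is not the implementation used by any finite-state toolkit.

```python
import re

# Matches @OP.FEATURE@ and @OP.FEATURE.VALUE@
FLAG_RE = re.compile(r"@([PNURDC])\.([^.@]+)(?:\.([^.@]+))?@")


def path_is_valid(path: str) -> bool:
    """Return True if the flag diacritics found in `path` are compatible."""
    state = {}  # feature -> (value, is_positive)
    for op, feat, val in FLAG_RE.findall(path):
        cur = state.get(feat)
        if op == "P":            # positive set
            state[feat] = (val, True)
        elif op == "N":          # negative set: "anything but val"
            state[feat] = (val, False)
        elif op == "C":          # clear the feature
            state.pop(feat, None)
        elif op == "R":          # require a value (or any value)
            if val:
                if cur != (val, True):
                    return False
            elif cur is None:
                return False
        elif op == "D":          # disallow a value (or any value)
            if val:
                if cur == (val, True):
                    return False
            elif cur is not None:
                return False
        elif op == "U":          # unify with the current value
            if cur is not None:
                cur_val, positive = cur
                if positive and cur_val != val:
                    return False
                if not positive and cur_val == val:
                    return False
            state[feat] = (val, True)
    return True
```

Under this reading, a path carrying `@P.NeedNoun.ON@` followed by `@D.NeedNoun.ON@` is blocked, whereas inserting `@C.NeedNoun@` between the two (as a nominalising derivation might) lets the path through.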

For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.

Flag diacritic	Explanation
@P.CmpFrst.FALSE@	Require that words tagged as such only appear first
@D.CmpPref.TRUE@	Block such words from entering ENDLEX
@P.CmpPref.FALSE@	Block these words from making further compounds
@D.CmpLast.TRUE@	Block such words from entering R
@D.CmpNone.TRUE@	Combines with the next tag to prohibit compounding
@U.CmpNone.FALSE@	Combines with the prev tag to prohibit compounding
@P.CmpOnly.TRUE@	Sets a flag to indicate that the word has passed R
@D.CmpOnly.FALSE@	Disallow words coming directly from root.

Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.

Flag diacritic	Explanation
@U.Cap.Obl@	Allowing downcasing of derived names: deatnulasj.
@U.Cap.Opt@	Allowing downcasing of derived names: deatnulasj.

Flags used to identify parts of speech

@P.POS.PRON@
@U.POS.N@
@U.POS.NUM@
@U.POS.A@
@P.POS.N@
@R.POS.N@
@P.POS.NUM@
@R.POS.NUM@
@P.POS.A@
@R.POS.A@
@P.POS.V@
@R.POS.V@
@C.POS@

Flags used with +Clt/Cop nonverbal predication

@U.PRED.NO@
@U.PRED.YES@
@C.PRED@

Flags used with transitivity

@U.TRANS.TV@
@U.TRANS.IV@
@P.TRANS.TV@
@P.TRANS.IV@

Flags used with serial verbs

@U.CONJ-INF.YES@
@U.CONJ-INF.NO@
@U.CONJ-TX.NONPAST@
@U.CONJ-TX.PRT1@
@U.CONJ-TX.PRT2@
@U.CONJ-MX.IND@
@D.CONJ-MX.IND@ 2012-11-04 should this be –D– or –N–
@U.CONJ-MX.IMP@
@U.CONJ-MX.OPT@
@U.CONJ-MX.PREC@
@U.CONJ-MX.DES@
@U.CONJ-MX.CONJ@
@U.CONJ-MX.COND@
@U.CONJ-CONNEG.YES@
@U.CONJ-CONNEG.NO@
@U.CONJ-NX.PL@
@U.CONJ-NX.SG@
@U.CONJ-POSS.1@
@U.CONJ-POSS.2@
@U.CONJ-POSS.3@
@U.CONJ-POSS.2ACC@
@U.CONJ-POSS.3ACC@
@U.CONJ-PX.10@
@U.CONJ-PX.12@
@U.CONJ-PX.13@
@U.CONJ-PX.15@
@U.CONJ-PX.16@
@U.CONJ-PX.20@
@U.CONJ-PX.21@
@U.CONJ-PX.23@
@U.CONJ-PX.24@
@U.CONJ-PX.26@
@U.CONJ-PX.30@
@U.CONJ-PX.31@
@U.CONJ-PX.32@
@U.CONJ-PX.33@
@U.CONJ-PX.34@
@U.CONJ-PX.35@
@U.CONJ-PX.36@
@U.CONJ-PX.40@
@U.CONJ-PX.42@
@U.CONJ-PX.43@
@U.CONJ-PX.45@
@U.CONJ-PX.46@
@U.CONJ-PX.50@
@U.CONJ-PX.51@
@U.CONJ-PX.53@
@U.CONJ-PX.54@
@U.CONJ-PX.56@
@U.CONJ-PX.60@
@U.CONJ-PX.61@
@U.CONJ-PX.62@
@U.CONJ-PX.63@
@U.CONJ-PX.64@
@U.CONJ-PX.65@
@U.CONJ-PX.66@
@R.CONJ-PX.13@
@R.CONJ-PX.16@
@R.CONJ-PX.23@
@R.CONJ-PX.26@
@R.CONJ-PX.33@
@R.CONJ-PX.36@
@R.CONJ-PX.43@
@R.CONJ-PX.46@
@R.CONJ-PX.53@
@R.CONJ-PX.56@
@R.CONJ-PX.63@
@R.CONJ-PX.66@
@P.CONJ.ObjAll@
@R.CONJ.ObjAll@
@C.CONJ@
@P.TLOSS.ON@
@R.TLOSS.ON@
@P.PossPx.Sg1@
@P.PossPx.Sg2@
@P.PossPx.Sg3@
@P.PossPx.Pl1@
@P.PossPx.Pl2@
@P.PossPx.Pl3@
@U.PossPx.S3@
@U.PossPx.SP3@
@U.PossPx.Sg1@
@U.PossPx.Sg2@
@U.PossPx.Sg3@
@U.PossPx.Pl1@
@U.PossPx.Pl2@
@U.PossPx.Pl3@
@D.PossPx@
@C.PossPx@
@P.TNUM.SG@
@P.TNUM.PL@
@D.TNUM.SG@
@D.TNUM.PL@
@C.TNUM@

problematic

@P.TPERS.1@
@P.TPERS.2@
@P.TPERS.3@
@N.TPERS.1@
@N.TPERS.2@
@N.TPERS.3@
@U.TPERS.1@
@U.TPERS.2@
@U.TPERS.3@
@C.TPERS@
@U.CX.ABE@
@U.CX.ABL@
@U.CX.CMP@
@U.CX.COM@
@U.CX.DAT@
@U.CX.ELA@
@U.CX.GEN@
@R.CX.ILL@
@D.CX.ILL@
@U.CX.ILL@
@U.CX.INE@
@U.CX.LAT@
@U.CX.LOC@
@U.CX.NOM@
@U.CX.PRL@
@U.CX.TRA@
@U.CX.PRL@
@U.CX.TEMP@
@N.CX.ILL@
@N.CX.INE@
@N.CX.LAT@
@N.CX.ELA@
@C.CX@
@P.DNUM.PL@
@P.DNUM.SG@
@U.DNUM.PL@
@U.DNUM.SG@
@C.DNUM@
@P.NUM.SG@
@P.NUM.PL@
@D.NUM.SG@
@D.NUM.PL@
@C.NUM@
@U.INDEF.KOI@
@U.INDEF.TA@
@U.INDEF.TAGO@
@U.INDEF.BUTI@
@U.INDEF.GAK@
@C.INDEF-PRON@
@P.INDEF.PREF@
@D.INDEF.PREF@
@R.INDEF.PREF@
@C.INDEF@

This allows or disallows combining with hyphen through loop especially for acronyms 2012-11-04

@U.HYPH-COMBO.ACRO@
@D.HYPH-COMBO.ACRO@
@C.HYPH-COMBO@

This disallows secondary compounding

@U.COMPOUND.YES@
@D.COMPOUND.YES@
@U.COMPOUND.NO@

Linking vowel for use with Translative

@P.LV.ON@
@P.LV.OFF@
@R.LV.ON@
@U.LV.ON@
@D.LV.ON@
@C.LV@
@C.CONJ-INF@
@C.CONJ-TX@
@C.CONJ-MX@
@C.CONJ-CONNEG@
@C.CONJ-NX@
@C.CONJ-PX@
@C.CONJ-POSS@
@C.KLOSS@
@C.TLOSS@

FLAGS USED WITH COLLECTIVE NOUNS

number

@U.DECL-NX.SG@
@U.DECL-NX.SP@
@U.DECL-NX.PL@
@R.DECL-NX.SG@
@R.DECL-NX.SP@
@R.DECL-NX.PL@

case

@U.DECL-CX.NOM@
@U.DECL-CX.ACC@
@U.DECL-CX.GEN@
@U.DECL-CX.DAT@
@U.DECL-CX.ABL@
@U.DECL-CX.ILL@
@U.DECL-CX.INE@
@U.DECL-CX.ELA@
@U.DECL-CX.LAT@
@U.DECL-CX.LOC@
@U.DECL-CX.TRA@
@U.DECL-CX.PRL@
@U.DECL-CX.COM@
@U.DECL-CX.TEMP@
@U.DECL-CX.ABE@
@U.DECL-CX.CMP@
@U.DECL-DX.DEF@
@U.DECL-DX.INDEF@
@U.DECL-DX.PX@

Removal

@C.DECL-NX@
@C.DECL-DX@
@C.DECL-CX@

Flag diacritic	Explanation
@U.number.one@	Flag used to give arabic numerals in smj different cases ;
@U.number.two@	Flag used to give arabic numerals in smj different cases ;
@U.number.three@	Flag used to give arabic numerals in smj different cases ;
@U.number.four@	Flag used to give arabic numerals in smj different cases ;
@U.number.five@	Flag used to give arabic numerals in smj different cases ;
@U.number.six@	Flag used to give arabic numerals in smj different cases ;
@U.number.seven@	Flag used to give arabic numerals in smj different cases ;
@U.number.eight@	Flag used to give arabic numerals in smj different cases ;
@U.number.nine@	Flag used to give arabic numerals in smj different cases ;
@U.number.zero@	Flag used to give arabic numerals in smj different cases ;

Russian letters from shared-urj_Cyrl а́ и́ о́ е́ у́

The word forms in ERZYA start from the lexeme roots of basic word classes, or optionally from prefixes. Here follow all continuation lexicons, approximately 20.

Hyphenated-nouns ; entire serial nouns
Hyphenated-verbs ; entire serial verbs

CyrillicFemaleName ; Emptied 2026-06-08, all moved to urj-Cyrl-propernouns.lexc HUNSPELL Type name derivation RussianMalenamesDerive ; ! RussianSurnamesDerive ;

увол-авол

alo-SPAT-1Arg ; >PO_KAL-LOC

This (part of) documentation was generated from src/fst/morphology/root.lexc

src-fst-morphology-stems-adjectives-russian-like_newwords.lexc.md

This is where new words are added as lexc entries before they are added to the xml source files. од:од A_KAL “(eng) /(fin)/(rus) “ ;

ADD ADJECTIVES BELOW

This (part of) documentation was generated from src/fst/morphology/stems/adjectives-russian-like_newwords.lexc

src-fst-morphology-stems-adjectives_newwords.lexc.md

This is where new words are added as lexc entries before they are added to the xml source files. эрзя-мокшонь:эрзя-мокшонь A_IS_GEN “(eng) /(fin) /(rus) “ ;

ADD ADJECTIVES BELOW

This (part of) documentation was generated from src/fst/morphology/stems/adjectives_newwords.lexc

src-fst-morphology-stems-adverbs_newwords.lexc.md

This is where new words are added as lexc entries before they are added to the xml source files. лембстэ:лембстэ ADV_ “(eng) /(fin) /(rus) “ ;

ADD ADVERBS BELOW

This (part of) documentation was generated from src/fst/morphology/stems/adverbs_newwords.lexc

src-fst-morphology-stems-exceptions.lexc.md

Exceptions are quite strange word forms: the ones that do not fit anywhere else. This file contains all enumerated word forms that cannot reasonably be created from lexical data by regular inflection. Usually there should be next to no exceptions. It is always better to have a paradigm that covers only one or a few words than an exception, since exceptions will not work nicely with e.g. the compounding scheme or possibly many end applications.

Verbs of negation have partial inflection: аволь, иля, эзь

The verb ярсамс has additional irregular forms: ярстано, ярстадо

The verb сеземс

Some of the nouns have archaic consonant stem forms left: ийть

Peripheral

Some random Russian elements:

Some of the nouns have special forms for Gen PxSg1 and PxSg2:

Reciprocal pronouns These might be done with flags

These two stems have м loss, but its presence can be observed in the choice of “тнэ” over “тне”. Each of them has a special hard variant after the lost consonant.

1930s phonetic transcription: дс » ц, гт » к
мекевлангт+Adv+Use/NG+Err/Orth:мекевланг K ;
Halfway between morphology and phonetics, with a Russian twist

ADPOSITIONS

IDEOPHONES

are dealt with as adverbs

PRONOUNS

QUANTIFIERS

сисем+Num+Ord:сисеме NUMORD_KUDO ; This is irregularly formed, cf. сисемце

NOUNS

NOUNS WRITTEN APART

PLACE NAMES

GEO

ANIMAL NAMES

FIRST NAMES

100 % homographs of Russian words

adjectives in ой Adj-od » A_RU-OJ with +Use/SpellNoSugg

+SP+Gen+Indef attributes as adjectives

Russian language words found in Erzya texts

Old Bible Names and words

RUSSIAN VERBS

unrecognized

Problems with synchronization missing lemmas

COLLECTIVE NOUNS

This (part of) documentation was generated from src/fst/morphology/stems/exceptions.lexc

src-fst-morphology-stems-genitive_attributes.lexc.md

This is where new words are added as lexc entries before they are added to the xml source files. Ботужале+N+Prop+SP+Gen+Indef:ботужале A_IS_PROP_GEN ;

ADD ADJECTIVES BELOW

This (part of) documentation was generated from src/fst/morphology/stems/genitive_attributes.lexc

src-fst-morphology-stems-hyphenated-nouns.lexc.md

These are nouns with parallel declension

ават%-тейтерть аванзо-тетянзо ават%-цёрат атявтт%-ававтт атят%-ават атят%-бабат атят%-сэрдят бабат%-нуцькат барант%-каткат боярт%-азорт боярт%-боярават

вирть%-лугат вирть%-паксят вирть%-укшторт ворт%-грабительть ворт%-розбойникть эрзят%-мокшот
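
Since both halves of these entries are declined in parallel, a tool that post-processes the list may want the two halves separately. A tiny Python sketch, assuming the halves are joined either by the lexc-escaped hyphen `%-` or by a plain hyphen; `split_parallel` is a hypothetical helper, not part of the sources.

```python
import re


def split_parallel(entry: str) -> list[str]:
    """Split a hyphenated entry such as 'ават%-тейтерть' into its halves."""
    # '%-' is the escaped hyphen used in lexc; a bare '-' is accepted too.
    return re.split(r"%?-", entry)
```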

This (part of) documentation was generated from src/fst/morphology/stems/hyphenated-nouns.lexc

src-fst-morphology-stems-hyphenated-verbs.lexc.md

These are verbs with parallel conjugation

REDUPLICATION

авардемс%-авардемс ардомс%-ардомс ардтневтемс%-ардтневтемс арсемс%-арсемс аштемс%-аштемс ванномс%-ванномс ваномс%-ваномс вешнемс%-вешнемс

%-And such

авардемс%-теемс арсемс%-теемс аштемс%-теемс ванномс%-теемс ваномс%-теемс

андомс%-симдемс аштемс%-учомс велямс%-чарамс вастомс%-дёлямс васькамс%-оймамс витнемс%-петнемс ёмавтомс%-аравтомс ярсамс%-симемс

SERIAL

витнемс%-ютавтомс

This (part of) documentation was generated from src/fst/morphology/stems/hyphenated-verbs.lexc

src-fst-morphology-stems-myv-propernouns.lexc.md

-kal

-osh

-kudo

-kal

-osh

-kudo

Place names, Settlements

Rivers

This (part of) documentation was generated from src/fst/morphology/stems/myv-propernouns.lexc

src-fst-morphology-stems-nouns_newwords.lexc.md

This is where new words are added as lexc entries before they are added to the xml source files. автор:автор N_KAL ;

ADD NOUNS BELOW

This (part of) documentation was generated from src/fst/morphology/stems/nouns_newwords.lexc

src-fst-morphology-stems-nouns_russian_100_newwords.lexc.md

This is where new Russian-equivalent nouns are added as lexc entries. This makes for a shared list in Mordvin analyser development автор:автор N_KAL_rus100 ;

ADD NOUNS BELOW

This (part of) documentation was generated from src/fst/morphology/stems/nouns_russian_100_newwords.lexc

src-fst-morphology-stems-propernouns_newwords.lexc.md

This is where new words are added as lexc entries before they are added to the xml source files. автор:автор N_KAL “(eng) /(fin) /(rus) “ ;

ADD NOUNS BELOW

This (part of) documentation was generated from src/fst/morphology/stems/propernouns_newwords.lexc

src-fst-morphology-stems-rusFemName.lexc.md

The derivable female given names have been moved to the template urj-Cyrl-propernouns.lexc.

This (part of) documentation was generated from src/fst/morphology/stems/rusFemName.lexc

src-fst-morphology-stems-rusMaleNameDer.lexc.md

The derivable male given names have been moved to the template urj-Cyrl-propernouns.lexc.

This (part of) documentation was generated from src/fst/morphology/stems/rusMaleNameDer.lexc

src-fst-morphology-stems-verbs_newwords.lexc.md

This is where new words are added as lexc entries before they are added to the xml source files. ливтевкшнемс+V:ливтевкшне TV_KUNDAMS “(eng) /(fin) /(rus) “ ;

ADD VERBS BELOW

These verbs just need Finnish translations A-M

N-End

This (part of) documentation was generated from src/fst/morphology/stems/verbs_newwords.lexc

src-fst-phonetics-txt2ipa.xfscript.md

retroflex plosive, voiceless t` ʈ 0288, 648 (` = ASCII 096)
retroflex plosive, voiced d` ɖ 0256, 598
labiodental nasal F ɱ 0271, 625
retroflex nasal n` ɳ 0273, 627
palatal nasal J ɲ 0272, 626
velar nasal N ŋ 014B, 331
uvular nasal N\ ɴ 0274, 628

bilabial trill B\ ʙ 0299, 665
uvular trill R\ ʀ 0280, 640
alveolar tap 4 ɾ 027E, 638
retroflex flap r` ɽ 027D, 637
bilabial fricative, voiceless p\ ɸ 0278, 632
bilabial fricative, voiced B β 03B2, 946
dental fricative, voiceless T θ 03B8, 952
dental fricative, voiced D ð 00F0, 240
postalveolar fricative, voiceless S ʃ 0283, 643
postalveolar fricative, voiced Z ʒ 0292, 658
retroflex fricative, voiceless s` ʂ 0282, 642
retroflex fricative, voiced z` ʐ 0290, 656
palatal fricative, voiceless C ç 00E7, 231
palatal fricative, voiced j\ ʝ 029D, 669
velar fricative, voiced G ɣ 0263, 611
uvular fricative, voiceless X χ 03C7, 967
uvular fricative, voiced R ʁ 0281, 641
pharyngeal fricative, voiceless X\ ħ 0127, 295
pharyngeal fricative, voiced ?\ ʕ 0295, 661
glottal fricative, voiced h\ ɦ 0266, 614

alveolar lateral fricative, vl. K
alveolar lateral fricative, vd. K\

labiodental approximant P (or v\)
alveolar approximant r\
retroflex approximant r\`
velar approximant M\

retroflex lateral approximant l`
palatal lateral approximant L
velar lateral approximant L\

Clicks

bilabial O\ (O = capital letter)
dental |\
(post)alveolar !\
palatoalveolar =\
alveolar lateral |\|\

Ejectives, implosives

ejective _> e.g. ejective p p_>
implosive _< e.g. implosive b b_<

Vowels

close back unrounded M
close central unrounded 1
close central rounded }
lax i I
lax y Y
lax u U

close-mid front rounded 2
close-mid central unrounded @\
close-mid central rounded 8
close-mid back unrounded 7

schwa ə @

open-mid front unrounded E
open-mid front rounded 9
open-mid central unrounded 3
open-mid central rounded 3\
open-mid back unrounded V
open-mid back rounded O

ash (ae digraph) {
open schwa (turned a) 6

open front rounded &
open back unrounded A
open back rounded Q

Other symbols

voiceless labial-velar fricative W
voiced labial-palatal approx. H
voiceless epiglottal fricative H\
voiced epiglottal fricative <\
epiglottal plosive >\

alveolo-palatal fricative, vl. s\
alveolo-palatal fricative, voiced z\
alveolar lateral flap l\
simultaneous S and x x\
tie bar _

Suprasegmentals

primary stress "
secondary stress %
long :
half-long :\
extra-short _X
linking mark -\

Tones and word accents

level extra high _T
level high _H
level mid _M
level low _L
level extra low _B
downstep !
upstep ^ (caret, circumflex)

contour, rising _R
contour, falling _F
contour, high rising _H_T
contour, low rising _B_L

contour, rising-falling _R_F
(NB Instead of being written as diacritics with _, all prosodic marks can alternatively be placed in a separate tier, set off by < >, as recommended for the next two symbols.)
global rise <R>
global fall <F>

Diacritics

voiceless _0 (0 = figure), e.g. n_0
voiced _v
aspirated _h
more rounded _O (O = letter)
less rounded _c
advanced _+
retracted _-
centralized _"
syllabic = (or _=) e.g. n= (or n_=)
non-syllabic _^
rhoticity `

breathy voiced _t
creaky voiced _k
linguolabial _N
labialized _w
palatalized ' (or _j) e.g. t' (or t_j)
velarized _G
pharyngealized _?\

dental _d
apical _a
laminal _m
nasalized ~ (or _~) e.g. A~ (or A_~)
nasal release _n
lateral release _l
no audible release _}

velarized or pharyngealized _e
velarized l, alternatively 5
raised _r
lowered _o
advanced tongue root _A
retracted tongue root _q
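
The symbol table above pairs ASCII (X-SAMPA style) notation with IPA characters and their code points. As a rough illustration of how such a table can be applied, the following Python sketch converts a string by greedy longest match, so that `N\` is tried before `N`. Only a handful of rows from the table are included, and the names `XSAMPA_CODEPOINTS` and `to_ipa` are assumptions for this example; the actual conversion in the source file is written as an xfst script, not in Python.

```python
# A few rows from the table above: ASCII symbol -> IPA code point (hex).
XSAMPA_CODEPOINTS = {
    "N\\": 0x0274,  # uvular nasal
    "N": 0x014B,    # velar nasal
    "J": 0x0272,    # palatal nasal
    "F": 0x0271,    # labiodental nasal
    "S": 0x0283,    # postalveolar fricative, voiceless
    "Z": 0x0292,    # postalveolar fricative, voiced
    "T": 0x03B8,    # dental fricative, voiceless
    "D": 0x00F0,    # dental fricative, voiced
}

# Longer symbols first, so that 'N\' wins over 'N'.
_KEYS = sorted(XSAMPA_CODEPOINTS, key=len, reverse=True)


def to_ipa(text: str) -> str:
    """Replace known ASCII symbols with IPA; copy everything else unchanged."""
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(chr(XSAMPA_CODEPOINTS[key]))
                i += len(key)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```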

This (part of) documentation was generated from src/fst/phonetics/txt2ipa.xfscript

src-fst-transcriptions-transcriptor-abbrevs2text.lexc.md

We describe here how abbreviations in Erzya are read out, e.g. for text-to-speech systems.

For example:

s.:syntynyt # ;
os.:omaa% sukua # ;
v.:vuosi # ;
v.:vuonna # ;
esim.:esimerkki # ;
esim.:esimerkiksi # ;
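
The entries above have the shape `abbreviation:expansion # ;`, where `%` escapes the following character (here a space) and one abbreviation may have several expansions. A minimal Python sketch of reading such lines into a lookup table is shown below. The helper names `parse_entry` and `build_expansions` are hypothetical, and the parser only covers this simple entry shape, not lexc in general.

```python
import re
from collections import defaultdict

ENTRIES = [
    "s.:syntynyt # ;",
    "os.:omaa% sukua # ;",
    "v.:vuosi # ;",
    "v.:vuonna # ;",
]


def parse_entry(line: str) -> tuple[str, str]:
    """Return (abbreviation, expansion) for one 'upper:lower CONT ;' line."""
    body = line.rsplit(";", 1)[0].strip()        # drop the closing ';'
    pair, _continuation = body.rsplit(None, 1)   # last token is the continuation lexicon
    upper, lower = pair.split(":", 1)

    def unescape(s: str) -> str:
        # '%x' stands for a literal 'x' (e.g. '% ' is a literal space).
        return re.sub(r"%(.)", r"\1", s)

    return unescape(upper), unescape(lower)


def build_expansions(lines: list[str]) -> dict[str, list[str]]:
    """Collect all expansions per abbreviation, keeping the listed order."""
    table = defaultdict(list)
    for line in lines:
        if line.strip():
            abbr, full = parse_entry(line)
            table[abbr].append(full)
    return dict(table)
```

Because `v.` maps to more than one reading, a text-to-speech front end would still need context to pick among the candidates returned for it.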

This (part of) documentation was generated from src/fst/transcriptions/transcriptor-abbrevs2text.lexc

src-fst-transcriptions-transcriptor-symbols2text.lexc.md

This file contains mappings from abbreviations and some acronyms to full forms for text-to-speech purposes. This is a supplement to the analyser; the analyser must tag the strings as +ABBR or similar for the transcriptions to work. The resulting full form must be lemmas known to the analyser, for further processing.

We describe here how abbreviations in Erzya are read out, for text-to-speech systems.

The file contains:

miscellaneous symbols
smileys
Clause boundary symbols
Single punctuation marks
Paired punctuation marks

This (part of) documentation was generated from src/fst/transcriptions/transcriptor-symbols2text.lexc

tools-grammarcheckers-grammarchecker.cg3.md

E R Z Y A G R A M M A R C H E C K E R

DELIMITERS

TAGS AND SETS

Upper and lower case

Sets for parts of speech
Sets for POS sub-categories
Sets for Semantic tags
Sets for Morphosyntactic properties
Sets for Derivation

This will be expanded for homonymy at first

This will be expanded for homonymy at first, i.e., diminutives

used with Dat PxSg1

Derivation tags

2VDerTag 2NDerTag

DerTag

Grammarchecker sets

This (part of) documentation was generated from tools/grammarcheckers/grammarchecker.cg3

tools-tokenisers-tokeniser-disamb-gt-desc.pmscript.md

Tokeniser for myv

Usage:

$ make
$ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst

Pmatch documentation: https://github.com/hfst/hfst/wiki/HfstPmatch

Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

Punct contains ASCII punctuation marks
The symbol after m-dash is soft-hyphen U+00AD
The symbol following {•} is byte-order-mark / zero-width no-break space U+FEFF.

Whitespace contains ASCII white space and the List contains some unicode white space characters

En Quad U+2000 to Zero-Width Joiner U+200D
Narrow No-Break Space U+202F
Medium Mathematical Space U+205F
Word joiner U+2060
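
As a rough illustration of the ranges listed above, the following Python sketch collects the code points into one set. The names `ASCII_WHITESPACE`, `UNICODE_WHITESPACE` and `is_tokeniser_whitespace` are hypothetical, and the choice of which ASCII characters count as white space (space, tab, newline, carriage return) is an assumption made for the example; neither is taken from the pmscript source.

```python
# Sketch only: the Unicode code points follow the list above;
# the ASCII part is an assumption made for this example.
ASCII_WHITESPACE = {" ", "\t", "\n", "\r"}

UNICODE_WHITESPACE = (
    {chr(cp) for cp in range(0x2000, 0x200D + 1)}  # En Quad .. Zero-Width Joiner
    | {"\u202F", "\u205F", "\u2060"}  # Narrow NBSP, Medium Math Space, Word Joiner
)

TOKENISER_WHITESPACE = ASCII_WHITESPACE | UNICODE_WHITESPACE


def is_tokeniser_whitespace(ch: str) -> bool:
    """Return True if ch is one of the white space characters listed above."""
    return ch in TOKENISER_WHITESPACE
```

An explicit set is used here because Python's `str.isspace()` does not cover the whole list: U+200B to U+200D and U+2060 are format characters, so `isspace()` returns `False` for them.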

Apart from what’s in our morphology, there are

1) unknown word-like forms, and
2) unmatched strings.

We want to give 1) a match, but let 2) be treated specially by hfst-tokenise -a.

Unknowns are made of:
- lower-case ASCII
- upper-case ASCII
- ASCII digits
- select symbols
- combining diacritics as individual symbols
- various symbols from Private area (probably Microsoft), so far:
  - U+F0B7 for “x in box”
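
The character classes listed above can be approximated in Python as follows. This is a sketch under stated assumptions, not the pmscript definition: the function names are hypothetical, "combining diacritics" is approximated by the Unicode mark categories (`Mn`, `Mc`, `Me`), and the "select symbols" class is left out because the text does not enumerate it.

```python
import unicodedata

# The one Private Use Area symbol named in the list above.
PRIVATE_USE_SYMBOLS = {"\uF0B7"}


def is_unknown_wordlike_char(ch: str) -> bool:
    """Return True if ch falls into one of the classes listed above."""
    if ch.isascii() and ch.isalnum():  # ASCII letters (both cases) and digits
        return True
    if unicodedata.category(ch).startswith("M"):  # combining marks (approximation)
        return True
    return ch in PRIVATE_USE_SYMBOLS


def is_unknown_wordlike(s: str) -> bool:
    """Return True if s is non-empty and consists only of such characters."""
    return bool(s) and all(is_unknown_wordlike_char(c) for c in s)
```

With these definitions, a string such as `ukjend` counts as word-like, while a Cyrillic letter on its own does not, since it belongs to none of the listed classes.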

Unknown handling

Unknowns are tagged ?? and treated specially with hfst-tokenise. hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG; they get a default baseform equal to the wordform, but no tag to check, so it’s safer to let hfst-tokenise handle them.
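
To make the idea concrete, here is a minimal Python sketch that reads CG-format text (a cohort line of the form `"<wordform>"` followed by tab-indented readings) and reports which wordforms look unknown. Everything in it is an assumption made for illustration: the function names, the `SAMPLE` text, the `CC` tag, and the exact shape of the unknown reading are hypothetical and may differ from what hfst-tokenise actually prints.

```python
def parse_cohorts(cg_text: str):
    """Split CG-format text into (wordform, [reading, ...]) pairs."""
    cohorts = []
    for line in cg_text.splitlines():
        if line.startswith('"<') and line.endswith('>"'):
            cohorts.append((line[2:-2], []))  # strip the "< and >" delimiters
        elif line.startswith("\t") and cohorts:
            cohorts[-1][1].append(line.strip())  # reading of the latest cohort
    return cohorts


def unknown_wordforms(cg_text: str, marker: str = "??"):
    """Return wordforms with no readings, or with a reading tagged `marker`."""
    result = []
    for form, readings in parse_cohorts(cg_text):
        # The first field of a reading is the quoted baseform; the rest are tags.
        if not readings or any(marker in r.split()[1:] for r in readings):
            result.append(form)
    return result


# Hypothetical sample, not real tokeniser output.
SAMPLE = '"<ja>"\n\t"ja" CC\n"<ukjend>"\n\t"ukjend" ??\n'
```

Calling `unknown_wordforms(SAMPLE)` on this hypothetical input yields only the second wordform. The sketch splits readings on white space, so it would need refinement for baseforms that contain spaces.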

Finally we mark as a token any sequence making up a:

known word in context
unknown (OOV) token in context
sequence of word and punctuation
URL in context

This (part of) documentation was generated from tools/tokenisers/tokeniser-disamb-gt-desc.pmscript

tools-tokenisers-tokeniser-gramcheck-gt-desc.pmscript.md

Grammar checker tokenisation for myv

Requires a recent version of HFST (3.10.0 / git revision>=3aecdbc). Then just:

$ make
$ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst

More usage examples:

$ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
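
The same pipeline can be driven from a script. The Python sketch below is one possible wrapper, assuming `hfst-tokenise` is on the `PATH` and the compiled `.pmhfst` file is in the current directory; the function names `tokenise_command` and `tokenise` are hypothetical. Only the `--giella-cg` flag and the file name are taken from the commands above.

```python
import subprocess

DEFAULT_MODEL = "tokeniser-disamb-gt-desc.pmhfst"  # file name used in the usage lines


def tokenise_command(model: str = DEFAULT_MODEL) -> list:
    """Build the argument list equivalent to the shell commands above."""
    return ["hfst-tokenise", "--giella-cg", model]


def tokenise(text: str, model: str = DEFAULT_MODEL) -> str:
    """Pipe text through hfst-tokenise and return its standard output."""
    proc = subprocess.run(
        tokenise_command(model),
        input=text + "\n",  # echo appends a newline, so do the same here
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if the tool exits non-zero
    )
    return proc.stdout
```

For example, `tokenise("ja, ja")` corresponds to the first `echo` command, provided the tool and the model file are available.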

Pmatch documentation: https://github.com/hfst/hfst/wiki/HfstPmatch

Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

Punct contains ASCII punctuation marks
The symbol after m-dash is soft-hyphen U+00AD
The symbol following {•} is byte-order-mark / zero-width no-break space U+FEFF.
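
Both characters named above are invisible in most editors, which makes them easy to lose when editing the source. A small Python sketch (the helper name `describe` is hypothetical) shows how to confirm which code point a character really is, using the standard `unicodedata` module:

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in ("\u00AD", "\uFEFF"):
        print(describe(ch))
```

Pasting a suspicious character from the source file into `describe` is a quick way to check that the invisible symbol is still the intended one.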

Whitespace contains ASCII white space and the List contains some unicode white space characters

En Quad U+2000 to Zero-Width Joiner U+200D
Narrow No-Break Space U+202F
Medium Mathematical Space U+205F
Word joiner U+2060

Apart from what’s in our morphology, there are 1) unknown word-like forms, and 2) unmatched strings. We want to give 1) a match, but let 2) be treated specially by hfst-tokenise -a.

select extended latin symbols
select symbols
various symbols from Private area (probably Microsoft), so far:
U+F0B7 for “x in box”

TODO: Could use something like this, but built-ins don’t include šžđčŋ:

Simply give an empty reading when something is unknown: hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG; they get a default baseform equal to the wordform, but no tag to check, so it’s safer to let hfst-tokenise handle them.

Finally we mark as a token any sequence making up a:

known word in context
unknown (OOV) token in context
sequence of word and punctuation
URL in context

This (part of) documentation was generated from tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript

tools-tokenisers-tokeniser-tts-cggt-desc.pmscript.md

TTS tokenisation for smj

Requires a recent version of HFST (3.10.0 / git revision>=3aecdbc). Then just:

make
echo "ja, ja" \
| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst

More usage examples:

echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa \
boasttu olmmoš, man mielde lahtuid." \
| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" \
| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
echo "márffibiillagáffe" \
| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst

Pmatch documentation: https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstPmatch

Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

Punct contains ASCII punctuation marks
The symbol after m-dash is soft-hyphen U+00AD
The symbol following {•} is byte-order-mark / zero-width no-break space U+FEFF.

Whitespace contains ASCII white space and the List contains some unicode white space characters

En Quad U+2000 to Zero-Width Joiner U+200D
Narrow No-Break Space U+202F
Medium Mathematical Space U+205F
Word joiner U+2060

Apart from what’s in our morphology, there are 1) unknown word-like forms, and 2) unmatched strings. We want to give 1) a match, but let 2) be treated specially by hfst-tokenise -a.

select extended latin symbols
select symbols
various symbols from Private area (probably Microsoft), so far:
U+F0B7 for “x in box”

TODO: Could use something like this, but built-ins don’t include šžđčŋ:

Needs hfst-tokenise to output things differently depending on the tag they get

This (part of) documentation was generated from tools/tokenisers/tokeniser-tts-cggt-desc.pmscript

Last updated: May 7, 2021

Erzya language model documentation

src-cg3-disambiguator.cg3.md

DELIMITERS

TAGS AND SETS

Tags

Beginning and end of sentence

Parts of speech tags

Tags for POS sub-categories

Tags for morphosyntactic properties

Derivation tags

Semantic tags

Syntactic tags

Sets containing sets of lists and tags

Sets for Single-word sets

Sets for word or not

Derivational affixes

Case sets

Verb sets

Sets for finiteness and mood

Sets for person

Pronoun sets

Derivation tags

src-cg3-functions.cg3.md

src-fst-morphology-affixes-adjectives.lexc.md

Adjective inflection

src-fst-morphology-affixes-adpositions.lexc.md

src-fst-morphology-affixes-adverbs.lexc.md

Adverb inflection

src-fst-morphology-affixes-interjections.lexc.md

Interjections

src-fst-morphology-affixes-nonverbalConjugation.lexc.md

NON-VERB CONJUGATION

src-fst-morphology-affixes-nouns.lexc.md

Noun inflection

KINSHIP

HUMAN

PLACE

LATIVE

VOCATIVE

NAMES OF MONTHS

COMMON NOUNS

Plurale tantum

DEFINITE SINGULAR TAGS

INDEFINITE DECLENSION

INDEFINITE TAGS

POSSESSIVE DECLENSION

CASES BEFORE POSSESSIVE TAGS

DEFINITE PLURAL

Cases for тнэ

POSSESSIVE marking followed by clitics

POSSESSIVE TAGS

src-fst-morphology-affixes-pronouns.lexc.md

Pronoun inflection

Closed class personal pronouns

src-fst-morphology-affixes-propernouns.lexc.md

src-fst-morphology-affixes-quantifiers.lexc.md

Now regular

src-fst-morphology-affixes-symbols.lexc.md

Symbol affixes

src-fst-morphology-affixes-verbs.lexc.md

Verb inflection

AUXILIARY VERBS

DERIVATION

VERBS AFTER TRANSITIVITY Tags OBJECT FLAGS

DERIVATION

CONJUGATION

INDICATIVE

INDICATIVE PRETERITE 2

DESIDERATIVE

CONJUNCTIVE

OPTATIVE

IMPERATIVE

PRECATIVE

OPTATIVE

src-fst-morphology-clitics.lexc.md

Clitics

src-fst-morphology-phonology.twolc.md

The Erzya morphophonological/twolc rules file

Alphabet

Special letters in the root that might be useful in dialect research and etymology later