Woods Cree NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-cwd

Woods Cree language model documentation

All doc-comment documentation in one large file.


src-cg3-functions.cg3.md

SYNTACTIC FUNCTIONS FOR (LANGUAGE NAME HERE)

Sámi language technology project 2003-2014, University of Tromsø

This file adds syntactic functions. It was copied from sme.

Syntactic sets

These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression WORD - premodifiers.

The set NOT-NPMOD is used to find barriers between NPs. Typical usage: … (*1 N BARRIER NOT-NPMOD) …, meaning: scan to the first noun, ignoring anything that can be part of the noun phrase of that noun (i.e., “scan to the next NP head”).
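The scanning behaviour described here can be pictured with a small Python sketch. It is only an illustration, not the CG-3 implementation: the function `scan_right`, the toy English sentence and the `PREMODIFIERS` tag set are assumptions made for this example, and the sketch assumes that the target test is tried before the barrier test at each cohort.

```python
# Hypothetical premodifier tags; the real set is defined in functions.cg3.
PREMODIFIERS = {"Det", "A", "Num"}


def is_noun(cohort):
    """Target test, standing in for the CG set N."""
    return "N" in cohort["tags"]


def is_not_npmod(cohort):
    """Barrier test, standing in for NOT-NPMOD = WORD - premodifiers."""
    return not (cohort["tags"] & PREMODIFIERS)


def scan_right(cohorts, start, is_target, is_barrier):
    """Return the index of the first target at or after `start`.

    Returns None if a barrier cohort is met before any target,
    or if the end of the sentence is reached.
    """
    for i in range(start, len(cohorts)):
        if is_target(cohorts[i]):
            return i
        if is_barrier(cohorts[i]):
            return None
    return None


# Toy sentence (English tokens, purely for illustration).
sentence = [
    {"form": "sees", "tags": {"V"}},
    {"form": "the", "tags": {"Det"}},
    {"form": "big", "tags": {"A"}},
    {"form": "dog", "tags": {"N"}},
]

# Position 1 relative to the first cohort, as in (*1 N BARRIER NOT-NPMOD).
head_index = scan_right(sentence, 1, is_noun, is_not_npmod)
```

In this toy sentence the scan skips the determiner and the adjective because both are premodifiers, and stops at the noun. If a verb stood between the determiner and the noun, the barrier would fire and the scan would fail.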

HNOUN MAPPING

The leftovers are tagged @X

missingX adds @X to all missings

therestX adds @X to everything that is left, often erroneously disambiguated forms


This (part of) documentation was generated from src/cg3/functions.cg3


src-fst-morphology-affixes-noun_affixes.lexc.md

Here we continue to lexical prenouns (in prenouns.lexc)

Starting with the general morphosyntactic features in a standardized order

Suffixation for animate nouns

Potentially obsolete code

Suffixation for inanimate nouns

Likely obsolete code

Irregular nouns

NOUN_ENDLEXs for wrapping up various things

End of noun affixes LEXC code


This (part of) documentation was generated from src/fst/morphology/affixes/noun_affixes.lexc


src-fst-morphology-affixes-prenouns.lexc.md

Woods Cree verb morphology

Prenouns


This (part of) documentation was generated from src/fst/morphology/affixes/prenouns.lexc


src-fst-morphology-affixes-preverbs.lexc.md

Woods Cree verb morphology

Preverbs


This (part of) documentation was generated from src/fst/morphology/affixes/preverbs.lexc


src-fst-morphology-affixes-verb_affixes.lexc.md

Woods Cree verb morphology

Woods Cree verbs are divided into four groups:

  1. AI: Animate intransitive
  2. II: Inanimate intransitive
  3. TA: Transitive animate
  4. TI: Transitive inanimate

Prefixes

LEXICON VerbPrefixes divides the lexicon into four modes: independent, conjunct, imperative and future conditional

LEXICON INDEPENDENT gives flags and prefixes for personprefix Hypotheticals

LEXICON IND_TENSE gives flags and prefixes for tense

LEXICON FUTURE_CONDITIONAL gives flags for future conditional (no prefix)

LEXICON CONJUNCT gives flag for conjunct and combined tense preverbs

LEXICON CNJ_TENSE gives prefixes and flags for tense in conjunct

LEXICON IMPERATIVE gives flag for imperative (no prefixes)

Preverbs

LEXICON VERBPREFIXES just adds the prefix boundary

Here we will take care of lexical preverbs (in preverbs.lexc)

Now, LEXC directs us to the ../stems/verb_stems.lexc file, where we find all the verbal stems. The suffixes are then found in the section “Suffixes” right underneath.
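How LEXC walks from one lexicon to the next can be sketched in Python. This is a toy model, not the analyser itself: the lexicon names and strings are taken from this documentation, but the chain is heavily collapsed (the real path runs through prefix, order, tense and person lexicons and carries flag diacritics), the direct link from Root to VERBSTEMS and the `+V+AI` tag string are assumptions, and the helper `expand` is hypothetical. The pair it produces is an intermediate lexc string pair before phonology, not a verified wordform.

```python
# Each lexicon maps to a list of (upper, lower, continuation) entries.
# "#" marks the end of a word, as in lexc.
LEXICONS = {
    "Root": [("", "", "VERBSTEMS")],                    # assumed direct link
    "VERBSTEMS": [("pimisin", "pimisin3", "VAIn")],     # from the stem list
    "VAIn": [("+V+AI", "", "VAIn_FUT_CON_NULL_SG_SUFFIX")],  # collapsed, assumed tags
    # lexc writes the morpheme boundary as %> (an escaped literal ">").
    "VAIn_FUT_CON_NULL_SG_SUFFIX": [("+X+4Sg", ">iyiki", "#")],
}


def expand(lexicons, name="Root", upper="", lower=""):
    """Yield (upper, lower) string pairs for every path from `name` to '#'."""
    if name == "#":
        yield upper, lower
        return
    for up, low, continuation in lexicons[name]:
        yield from expand(lexicons, continuation, upper + up, lower + low)


pairs = list(expand(LEXICONS))
```

The point of the sketch is only the control flow: each entry contributes a piece of the analysis side and a piece of the surface side, then hands over to its continuation lexicon until `#` is reached.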

Suffixes

Inanimate intransitive (II)

LEXICON VIIn

LEXICON VIIn_SG

LEXICON VIIw_PL

= LEXICON VIIw_PL NO LONGER NEEDED FROM AROK +V+II: VIIw_PL_WICI ; NO LONGER NEEDED FROM AROK

LEXICON VIIw

LEXICON VIIw_SG

LEXICON VIIn_PL NO LONGER NEEDED FROM AROK NO LONGER NEEDED FROM AROK

 NO LONGER NEEDED FROM AROK @U.wici.NULL@ VIIw_PL_ORDER ; NO LONGER NEEDED FROM AROK

@U.wici.NULL@ VIIw_PL_ORDER ;

LEXICON VIIw_SGPL_ORDER

LEXICON VIIw_SG_ORDER singular only

LEXICON VIIw_PL_ORDER plural only

= LEXICON VIIw_PL_ORDER plural only @U.order.indep@+Ind:@U.order.indep@ VIIw_PL_IND_PERSON ; ! @U.order.cnj@+Cnj:@U.order.cnj@ VIIw_PL_CNJ_PERSON ; ! @U.order.FutCon@+Fut+Cond:@U.order.FutCon@ VIIw_PL_FUT_CON_PERSON ;!

LEXICON VIIn_SGPL_ORDER

LEXICON VIIn_SG_ORDER singular only

LEXICON VIIn_PL_ORDER plural only

LEXICON VIIw_SG_IND_TENSE singular only

LEXICON VIIw_SG_CNJ_TENSE singular only

LEXICON VIIw_PL_IND_TENSE plural only

LEXICON VIIw_PL_CNJ_TENSE plural only

= LEXICON VIIw_PL_CNJ_TENSE plural only @U.tense.Prs@+Prs:@U.tense.Prs@ VIIw_PL_IND_PERSON ; ! @U.tense.Prt@+Prt:@U.tense.Prt@ VIIw_PL_IND_PERSON ; ! @U.tense.FutDef@+Fut+Def:@U.tense.FutDef@ VIIw_PL_IND_PERSON ; ! @U.tense.FutInt@+Fut+Int:@U.tense.FutInt@ VIIw_PL_IND_PERSON ; !

= LEXICON VIIw_PL_CNJ_TENSE plural only @U.tense.Prs@+Prs:@U.tense.Prs@ VIIw_PL_CNJ_PERSON ; ! @U.tense.Prt@+Prt:@U.tense.Prt@ VIIw_PL_CNJ_PERSON ; ! @U.tense.FutInt@+Fut+Int:@U.tense.FutInt@ VIIw_PL_CNJ_PERSON ; !

LEXICON VIIn_SG_IND_TENSE singular only

LEXICON VIIn_SG_CNJ_TENSE singular only

LEXICON VIIn_PL_IND_TENSE plural only

LEXICON VIIn_PL_CNJ_TENSE plural only

LEXICON VIIw_SGPL_IND_PERSON

LEXICON VIIw_SGPL_CNJ_PERSON

LEXICON VIIw_SGPL_FUT_CON_PERSON

LEXICON VIIw_SG_IND_PERSON

LEXICON VIIw_SG_CNJ_PERSON

LEXICON VIIw_SG_FUT_CON_PERSON

LEXICON VIIw_PL_IND_PERSON

LEXICON VIIw_PL_CNJ_PERSON

LEXICON VIIw_PL_FUT_CON_PERSON

= LEXICON VIIw_PL_FUT_CON_PERSON plural only @U.person.NULL@ VIIw_IND_PL_SUFFIX ;

= LEXICON VIIw_PL_FUT_CON_PERSON plural only @U.person.NULL@ VIIw_CNJ_PL_SUFFIX ;

= LEXICON VIIw_PL_FUT_CON_PERSON plural only @U.person.NULL@ VIIw_FUT_CON_PL_SUFFIX ;

LEXICON VIIn_SGPL_IND_PERSON

LEXICON VIIn_SGPL_CNJ_PERSON

LEXICON VIIn_SGPL_FUT_CON_PERSON

LEXICON VIIn_SG_IND_PERSON

LEXICON VIIn_SG_CNJ_PERSON

LEXICON VIIn_SG_FUT_CON_PERSON

LEXICON VIIn_PL_IND_PERSON plural only

LEXICON VIIn_PL_CNJ_PERSON plural only

LEXICON VIIn_PL_FUT_CON_PERSON plural only

LEXICON VIIn_SGPL_IND_NULL

LEXICON VIIn_SG_IND_SUFFIX singular

LEXICON VIIn_PL_IND_SUFFIX plural

LEXICON VIIw_SGPL_IND_NULL

LEXICON VIIw_SG_IND_SUFFIX w final singular

LEXICON VIIw_PL_IND_SUFFIX w final plural

LEXICON VIIn_SGPL_CNJ_NULL

LEXICON VIIn_SG_CNJ_SUFFIX singular

LEXICON VIIn_PL_CNJ_SUFFIX plural

LEXICON VIIw_SGPL_CNJ_NULL

LEXICON VIIw_SG_CNJ_SUFFIX w final singular

LEXICON VIIw_PL_CNJ_SUFFIX w final plural

LEXICON VIIn_SGPL_FUT_CON_NULL

LEXICON VIIn_SG_FUT_CON_SUFFIX singular

LEXICON VIIn_PL_FUT_CON_SUFFIX plural

LEXICON VIIw_SGPL_FUT_CON_NULL

LEXICON VIIw_SG_FUT_CON_SUFFIX w final singular

LEXICON VIIw_PL_FUT_CON_SUFFIX w final plural

Animate intransitive (AI)

LEXICON VAIw_PL stems that end in â or ê

LEXICON VAIae stems that end in â or ê

LEXICON VAIio stems that end in i, î, o, ô

LEXICON VAIn

LEXICON VAIn_PL

LEXICON VAIm These are VTI3 in Arok’s database

LEXICON VAIn_ORDER

LEXICON VAIn_PL_ORDER plural only

LEXICON VAIae_ORDER

LEXICON VAIw_PL_ORDER plural only

LEXICON VAIio_ORDER

LEXICON VAIn_PL_IND_TENSE plural only

LEXICON VAIn_PL_CNJ_TENSE plural only

LEXICON VAIw_PL_IND_TENSE plural only

LEXICON VAIw_PL_CNJ_TENSE plural only

LEXICON VAIn_IND_PERSON

LEXICON VAIn_CNJ_PERSON

LEXICON VAIn_FUT_CON_PERSON

LEXICON VAIn_IMP_PERSON

LEXICON VAIn_PL_IND_PERSON plural only

LEXICON VAIn_PL_CNJ_PERSON plural only

LEXICON VAIn_PL_FUT_CON_PERSON plural only

LEXICON VAIn_PL_IMP_PERSON plural only

LEXICON VAIw_PL_IND_PERSON plural only

LEXICON VAIw_PL_CNJ_PERSON plural only

LEXICON VAIw_PL_FUT_CON_PERSON plural only

LEXICON VAIw_PL_IMP_PERSON plural only

LEXICON VAIae_IND_PERSON

LEXICON VAIae_CNJ_PERSON

LEXICON VAIw_FUT_CON_PERSON

LEXICON VAIw_IMP_PERSON

LEXICON VAIio_IND_PERSON

LEXICON VAIio_CNJ_PERSON

LEXICON VAIw_IND_NI

LEXICON VAIw_IND_NI_SG_SUFFIX

LEXICON VAIw_IND_NI_PL_SUFFIX

LEXICON VAIw_IND_KI

LEXICON VAIw_IND_KI_SG_SUFFIX

LEXICON VAIw_IND_KI_PL_SUFFIX

LEXICON VAIae_IND_NULL

LEXICON VAIio_IND_NULL

LEXICON VAIw_IND_NULL_PL_SUFFIX

LEXICON VAIn_IND_NI

LEXICON VAIn_IND_NI_SG_SUFFIX

LEXICON VAIn_IND_NI_PL_SUFFIX

LEXICON VAIn_IND_KI

LEXICON VAIn_IND_KI_SG_SUFFIX

LEXICON VAIn_IND_KI_PL_SUFFIX

LEXICON VAIn_IND_NULL

LEXICON VAIn_IND_NULL_SG_SUFFIX

LEXICON VAIn_IND_NULL_PL_SUFFIX

LEXICON VAIae_CNJ_NULL

LEXICON VAIio_CNJ_NULL

LEXICON VAIae_CNJ_NULL_SG_SUFFIX

LEXICON VAIio_CNJ_NULL_SG_SUFFIX

LEXICON VAIw_CNJ_NULL_PL_SUFFIX

LEXICON VAIn_CNJ_NULL

LEXICON VAIn_CNJ_NULL_SG_SUFFIX

LEXICON VAIn_CNJ_NULL_PL_SUFFIX

LEXICON VAIae_FUT_CON_NULL

LEXICON VAIw_FUT_CON_NULL_SG_SUFFIX

+X+4Sg:%>yiki # ;

LEXICON VAIw_FUT_CON_NULL_PL_SUFFIX

+X+4Pl:%>yikwâwi # ;

LEXICON VAIn_FUT_CON_NULL

LEXICON VAIn_FUT_CON_NULL_SG_SUFFIX

+X+4Sg:%>iyiki # ;

LEXICON VAIn_FUT_CON_NULL_PL_SUFFIX

+X+4Pl:%>iyikwâwi # ;

Transitive inanimate (TI)

LEXICON VTIm_ORDER

LEXICON VTIm_PL_ORDER plural only NOTE: imperative and fut con go straight to person lexica

LEXICON VTIm_PL_IND_TENSE plural only

LEXICON VTIm_PL_CNJ_TENSE plural only

LEXICON VTIm_IND_PERSON

LEXICON VTIm_CNJ_PERSON

LEXICON VTIm_FUT_CON_PERSON

LEXICON VTIm_IMP_PERSON

LEXICON VTIm_PL_IND_PERSON plural only

LEXICON VTIm_PL_CNJ_PERSON plural only

LEXICON VTIm_PL_FUT_CON_PERSON plural only

LEXICON VTIm_PL_IMP_PERSON plural only

LEXICON VTIm_IND_NI

LEXICON VTIm_IND_NI_SG_SUFFIX

LEXICON VTIm_IND_NI_PL_SUFFIX

LEXICON VTIm_IND_KI

LEXICON VTIm_IND_KI_SG_SUFFIX

LEXICON VTIm_IND_KI_PL_SUFFIX

LEXICON VTIm_IND_NULL

LEXICON VTIm_IND_NULL_SG_SUFFIX NOTE: X actor will eventually derive to VII, so it is not yet included as per Arok’s paradigm

Derives to VIIn

LEXICON VTIm_IND_NULL_PL_SUFFIX

Derives to VIIn

LEXICON VTIm_CNJ_NULL

LEXICON VTIm_CNJ_NULL_SG_SUFFIX

+X+4Sg:%>mihiyik # ;

LEXICON VTIm_CNJ_NULL_PL_SUFFIX

+X+4Pl:%>mihiyiki # ;

LEXICON VTIm_FUT_CON_NULL

LEXICON VTIm_FUT_CON_NULL_SG_SUFFIX

+X+4Sg:%>mihiyiki # ;

LEXICON VTIm_FUT_CON_NULL_PL_SUFFIX

+X+3Sg:%>mihkwâwi # ; +X+4Sg:%>mihiyikwâwi # ;

LEXICON VTA_ORDER Note: Imp and Fut Con don’t take tense

LEXICON VTA_PL_ORDER Note: Imp and Fut Con don’t take tense

LEXICON VTAi_ORDER Note: Imp and Fut Con don’t take tense ; Conjugates as TA regular except in 2sg IMM IMP

LEXICON VTAt_ORDER Note: Imp and Fut Con don’t take tense ; Conjugates as TA regular except in 2sg IMM IMP

LEXICON VTA_IND_TENSE plural only

LEXICON VTA_CNJ_TENSE plural only

LEXICON VTA_PL_IND_TENSE plural only

LEXICON VTA_PL_CNJ_TENSE plural only

LEXICON VTA_IND_PERSON

LEXICON VTA_CNJ_PERSON

LEXICON VTA_FUT_CON_PERSON

LEXICON VTA_IMP_PERSON

LEXICON VTA_PL_IND_PERSON

LEXICON VTA_PL_CNJ_PERSON

LEXICON VTA_PL_FUT_CON_PERSON

LEXICON VTA_PL_IMP_PERSON

LEXICON VTAt_IMP_PERSON no -i in 2sg+3SgO

LEXICON VTAi_IMP_PERSON

LEXICON VTA_IND_NI NOTE: No local, as local forms are always with ki-

LEXICON VTA_IND_NI_SG_SUFFIX

LEXICON VTA_IND_NI_PL_SUFFIX

LEXICON VTA_IND_KI

LEXICON VTA_IND_KI_SG_SUFFIX

LEXICON VTA_IND_KI_PL_SUFFIX

LEXICON VTA_IND_NULL NOTE: never local

LEXICON VTA_IND_NULL_SG_SUFFIX

LEXICON VTA_IND_NULL_PL_SUFFIX

~~~~~~~~~~~~~~~~~~~~~~

End of verb affixes LEXC code


This (part of) documentation was generated from src/fst/morphology/affixes/verb_affixes.lexc


src-fst-morphology-phonology.xfscript.md

Definitions

Rules

VG>i2 -> VV

OUTSIDE RULES

INITIAL CHANGE

Composing the rules together


This (part of) documentation was generated from src/fst/morphology/phonology.xfscript


src-fst-morphology-root.lexc.md

Woods Cree morphological analyser

Introduction to the morphological analyser of the Woods Cree language.

Definitions for Multichar_Symbols

Analysis symbols

The morphological analyses of Woods Cree wordforms are presented in this system in terms of the following symbols. (It is strongly recommended to follow existing standards when adding new tags.)

POS

Nominal morphology

Particles

ordinals

Verbal MSP

Person prefix fragment features

Nominal morphosyntactic features

Verb conjugation (transitivity + animacy classes)

Noun animacy and dependency classes

Preverbs

Auxiliary symbols

These symbols either shape or govern the morphophonological structure

Symbols that need to be escaped on the lower side (towards twolc):

Special characters for morphophonology

Triggers for various morphophonological phenomena. Mostly, these are not themselves realized as any grapheme/phoneme.

Usage tags

These tags distinguish different special-purpose analysers and generators from each other. Thus, for example, we have normative and descriptive analysers, and generators for different purposes.

Flagdiacritics

These are documented in Chapter 8 of Beesley/Karttunen (see e.g. p. 456).

For indicative, there are prefixes, so here we need one flag for each person-number combination. Note that for the inverse objective conjugation, the flag refers to the prefix, not to the subject. So indsg1 refers to either subject = 1Sg or object = 1Sg. The 3-3 forms are prefixless.

The conjunct form always has the ê- prefix, and future conditional never has a prefix.
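The effect of flag diacritics on a path can be illustrated with a short Python sketch. It models only the P (set), R (require), D (disallow), U (unify) and C (clear) operations and leaves out N (negative set); the function `path_is_valid` and the example paths are assumptions made for this illustration, and `@P.person.1Sg@` is a hypothetical flag. The other flag names (`@U.order.indep@`, `@U.order.cnj@`, `@U.person.NULL@`, `@R.person.NULL@`, `@D.person.NULL@`) appear in this documentation.

```python
import re

# Matches @OP.feature@ or @OP.feature.value@ for the operators modelled here.
FLAG = re.compile(r"@([PRDUC])\.([^.@]+)(?:\.([^.@]+))?@")


def path_is_valid(symbols):
    """Return True if the flag diacritics along a path are all satisfied."""
    features = {}
    for symbol in symbols:
        match = FLAG.fullmatch(symbol)
        if match is None:
            continue  # ordinary symbol, no constraint
        op, feature, value = match.groups()
        current = features.get(feature)
        if op == "P":            # set the feature unconditionally
            features[feature] = value
        elif op == "C":          # clear the feature
            features.pop(feature, None)
        elif op == "R":          # require the feature (or this exact value)
            if current is None or (value is not None and current != value):
                return False
        elif op == "D":          # disallow the feature (or this exact value)
            if value is None and current is not None:
                return False
            if value is not None and current == value:
                return False
        elif op == "U":          # unify: set if unset, otherwise must agree
            if current is None:
                features[feature] = value
            elif current != value:
                return False
    return True


# Independent order chosen early and confirmed later: accepted.
ok_order = path_is_valid(["@U.order.indep@", "stem", "@U.order.indep@"])
# Independent order clashing with conjunct: rejected.
bad_order = path_is_valid(["@U.order.indep@", "stem", "@U.order.cnj@"])
```

Under this model a stem guarded by `@R.person.NULL@` is only reachable when the person feature has been set to NULL earlier on the path, while a stem guarded by `@D.person.NULL@` is blocked in exactly that case.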

Prefixes with a certain phonological content:

Order

Tense

New multichar symbols for nouns

End of new and all Multichar_Symbols

LEXICON Root is where it all starts


This (part of) documentation was generated from src/fst/morphology/root.lexc


src-fst-morphology-stems-noun_header.lexc.md

Test lemma/stem set for nouns according to the new cwd FST

LEXICON NOUN_DEP_NONKINSHIP_STEMS ! Animate Dependent Noun stems masakay:asakay NAD “skin” ; mitâs:tâs NAD “pair of pants” ; ! AEW NDA-1 (Consonant initial regular stem) mispan:span NAD “lung” ; ! AEW NDA-1 (Vowel initial regular stem) note that in 3rd person the w- does not surface mitihtihkosiy:tihtihkosiy NAD “kidney” ; ! AEW NDA-2 (Consonant initial regular VG stem) N.B. also occurs as the stems tihtikosiw and tihtikos (the latter is an NDI-1 declension) maskasiy:askasiy NAD “finger nail/claw” ; ! AEW NDA-2 (Vowel Initial VG stem) ! Inanimate Dependent Noun stems mihtawakay:htawakay NID “ear” ; mispiton:spiton NID “arm” ; miyaw:iyaw2 NID ; mitîh:tîh NID “heart” ; ! AEW NDI-1 (Consonant Initial Regular stem) mîpit:îpit NID “tooth” ; ! AEW NDI-1 (Vowel Initial Regular stem) miskotâkay:skotâkay NID “coat/dress” ; ! AEW NDI-2 (Consonant Initial VW stem) mîstakay:îstakay NID “hair” ; ! AEW NDI-1 (Vowel Initial VW stem) miskîsik:skîsikw NID “eye” ; ! AEW NDI-3 (Consonant Initial Cw stem) mathakask:athakaskw NID “palate” ; ! AEW NDI-3 (Vowel Initial Cw stem) @P.number.SG@mîni:@P.number.SG@în NID_SG/I “bone-marrow” ; ! AEW NDI-4 (Vowel Initial Single Syllable NDI-4 stem) N.B. might cause issue with <î>, also is a mass noun that is never plural.

LEXICON NOUN_DEP_KINSHIP_STEMS nîwas:îwat3 NID “sacred bundle” ; ! AEW NID Irregular(?) VW single-syllable stem. N.B. T->s when word final. (can’t be PxX) nôhkom:ohkom NAD “grand-mother” ; nôhtawiy:ôhtawiy NAD “father” ; nitânis:tânis NAD “daughter, brother, sister” ; ninîkihik:nîkihikw NAD “parent” ; ! AEW NDA-3 (Consonant initial Vw stem) Deal later: can also be the stem or . and can occur in locative. The can not. nîtim:îtimw NAD "cross-cousin of opposite gender" ; ! AEW NDA-3 (Vowel initial Vw stem). nîwa:îw2 NAD_SG/A "wife" ; ! AEW NDA-4 (Vowel initial single syllable NDA-4 stem). N.B. Only one of its type. @P.loc.NULL@nîskwa:@P.loc.NULL@îskw NAD_SG/A "fellow wife; husband's ex" ; ! AEW NDA-4w (Vowel initial single-syllable-/w/ stem). N.B. Only one of its type. Cannot take the locative (marked here with a flag-diacritic, which could be incorporated in a new set of NON-LOC contlexes).

! Non-kinship dependent noun(s) which do not take PxX nîki:îk NID_SG/I “home” ; ! AEW NDI-4 (Vowel Initial Single Syllable NDI-4 stem) N.B. Doesn’t usually work unpossessed or plural unless distributive.

LEXICON NOUN_INDEP_STEMS ! ! Independent NA AEW to FST conversion ! NA-1, NA-2 -> NA ! NA-3 -> NA + stem-final -w ! NA-4, NA-4w -> NA_SG/A_POSS/IM -w2, y4 (immutable) ! ! Independent NI AEW to FST conversion ! NI-1, NI-2 -> NI ! NI-3 -> NI + -w (stem) ! NI-4, NI-4w -> NI_SG/I_POSS/IM, with w2, y4 for the immutable final glides ! Test case wug stems (marked with +Err/Dummy so that they may be eventually removed) mamam+Err/Dummy:mamam NA ; papap+Err/Dummy:papap NA_POSS/IM ; pipip+Err/Dummy:pipip NA_POSS/IM2 ; tatat+Err/Dummy:tatat NA_DIM/IS ; titit+Err/Dummy:titit NA_DIM/ISIS ; nanan+Err/Dummy:nanan NA_POSS/IM_DIM/IS ; ninin+Err/Dummy:ninin NA_POSS/IM_DIM/ISIS ; kikik+Err/Dummy:kikik NA_POSS/IM2_DIM/IS ; kakak+Err/Dummy:kakak NA_POSS/IM2_DIM/ISIS ; wawâs+Err/Dummy:wawâs NA_DIM/DIM ; ! mam+Err/Dummy:mam NA_SG/A ; ! mim+Err/Dummy:mim NA_SG/I ; nan+Err/Dummy:nan NA_SG/A_POSS/IM ; nin+Err/Dummy:nin NA_SG/I_POSS/IM ; pap+Err/Dummy:pap NA_SG/A_POSS/IM_DIM/IS ; pip+Err/Dummy:pip NA_SG/A_POSS/IM_DIM/ISIS ;

Animate Noun stems pahkwîsikan NA_POSS/IM “bannock” ; ! AEW NA-1 (Consonant-Initial Regular NA Stem) asikan NA “sock” ; ! AEW NA-1 (Vowel-Initial Regular NA Stem) kihc-ôkiniy NA “tomato” ; ! AEW NA-2 (Consonant-Initial Vowel-Glide NA Stem) athapiy NA “net” ; ! AEW NA-2 (Vowel-Initial Vowel-Glide NA Stem) kwâpahikan NA “ladle” ; ! AEW NA-1 (Consonant-Initial Regular NA Stem) masinahikanâhtik:masinahikanâhtikw NA “pencil” ; ! AEW NA-3 (Consonant-Initial Consonant-/w/ NA Stem) askihk:askihkw NA_POSS/IM “kettle, pail” ; ! AEW NA-3 (Vowel-Initial Consonant-/w/ NA Stem) niska:nisk NA_SG/A_POSS/IM “goose” ; ! AEW NA-4 (Consonant-Initial Single-Syllable NA Stem) sihti:siht NA_SG/I_POSS/IM “spruce” ; ! AEW NA-4 (Consonant-Initial Single-Syllable NA Stem) îsa:îs NA_SG/A_POSS/IM “clam; shell” ; ! AEW NA-4 (Vowel-Initial Single-Syllable NA Stem) wâhkwa:wâhkw NA_SG/A_POSS/IM “roe, fish eggs; lump of roe” ; ! AEW NA-4w (Consonant-Initial Single-Syllable-/w/ NA Stem) ihkwa:ihkw NA_SG/A_POSS/IM “louse” ; ! AEW NA-4w (Vowel-Initial Single-Syllable-/w/ NA Stem) ! ! Non-AEW NA test cases ! âmow NA_POSS/IM “bee” ; ! maskwa:maskw NA_SG/A_POSS/IM “bear” ; ! sîsîp NA_POSS/IM “duck” ; ! Inanimate Noun stems askiy NI “land” ; cîmân NI “canoe” ; ! AEW NI-1 (Consonant-Initial Regular NI Stem) astotin NI “hat” ; ! AEW NI-1 (Vowel-Initial Regular NI Stem) maskihkiy NI “medicine” ; ! AEW NI-2 (Consonant-Initial VW NI Stem) mîkisasâkay NI “beaded coat, beaded dress” ; ! AEW NI-2 (Consonant-Initial VW NI Stem) oskasâkay NI “new coat, new dress” ; ! AEW NI-2 (Vowel-Initial VW NI Stem) ! pahkîkin:pahkîkinw NI “leather, rawhide” ; ! AEW NI-3 (Consonant-Initial Cw NI Stem) kotawânâpisk:kotawânâpiskw NI ; ! AEW NI-3 (Consonant-Initial Cw NI Stem) nipîwikamik:nipîwikamikw NI ; ! AEW NI-3 (Consonant-Initial Cw NI Stem) ! askîkin:askîkinw NI “fresh rawhide” ; ! AEW NI-3 (Vowel-Initial Cw NI Stem) wâwi:wâw2 NI_SG/I_POSS/IM “egg” ; ! 
AEW NI-4 (Consonant-Initial Single-Syllable NI Stem) osk-âyi:osk-ây4 NI_SG/I_POSS/IM “new item, new thing” ; ! AEW NI-4 (Vowel-Initial Single-Syllable NI Stem) @P.number.SG@misko:@P.number.SG@miskw NI_SG/I_POSS/IM “blood” ; ! AEW NI-4w (Consonant-Initial Single-Syllable-/w/ NI Stem) ! Irregular stem cases ! Suppletive atimw-/-têm- @R.person.NULL@atim:@R.person.NULL@atimw NA “dog, beast of burden” ; ! Regular stem of ‘atim’: ‘atimw’ (cannot be possessed) @D.person.NULL@atim:@D.person.NULL@tîm NA “dog, beast of burden” ; ! Irregular suppletive stem of ‘atim’: ‘-tîm’ (must be possessed) Semi-suppletive kîhtî-aya/kîhcî-aya ! This is not part of the latest AEW noun paradigm sets, so probably should be excluded ! @R.person.NULL@kîhtî-aya:@R.person.NULL@kîhtî-ay4 NA_SG/A_POSS/IM “elder” ; ! AEW NA-4 (Consonant-Initial Single-Syllable NA Stem) ! @D.person.NULL@kîhtî-aya:@D.person.NULL@kîhcî-ayim NA “elder” ; ! AEW NA-4 (Consonant-Initial Single-Syllable NA Stem) kîhtî-aya:kîhtî-ay4 NA_SG/A_POSS/IM ; ! AEW NA-4 (Consonant-Initial Single-Syllable NA Stem) Regular/Irregular ôsi- both a regularly inflecting stem, and a number of irregular forms enumerated separately ôsi:ôs NI_SG/I_POSS/IM “canoe, boat” ; ! AEW NI-4 (Vowel-Initial Single-Syllable irregular NI Stem) ! Subset of lexicalized Diminutive Animate Independent stems @P.dim.DIM@ NOUN_INDEP_DIM_STEMS ;

! Sub-continuation lexicon for lexicalized diminutive stems LEXICON NOUN_INDEP_DIM_STEMS acimos NA “puppy” ; ! ôcisis NI “small canoe” ; !

Complete extraction of lemma:stem info from LLR dictionary 2022, according to LEXC structure in the new cwd FST.


This (part of) documentation was generated from src/fst/morphology/stems/noun_header.lexc


src-fst-morphology-stems-noun_stems.lexc.md

Test lemma/stem set for nouns according to the new cwd FST

LEXICON NOUN_DEP_NONKINSHIP_STEMS ! Animate Dependent Noun stems masakay:asakay NAD “skin” ; mitâs:tâs NAD “pair of pants” ; ! AEW NDA-1 (Consonant initial regular stem) mispan:span NAD “lung” ; ! AEW NDA-1 (Vowel initial regular stem) note that in 3rd person the w- does not surface mitihtihkosiy:tihtihkosiy NAD “kidney” ; ! AEW NDA-2 (Consonant initial regular VG stem) N.B. also occurs as the stems tihtikosiw and tihtikos (the latter is an NDI-1 declension) maskasiy:askasiy NAD “finger nail/claw” ; ! AEW NDA-2 (Vowel Initial VG stem) ! Inanimate Dependent Noun stems mihtawakay:htawakay NID “ear” ; mispiton:spiton NID “arm” ; miyaw:iyaw2 NID ; mitîh:tîh NID “heart” ; ! AEW NDI-1 (Consonant Initial Regular stem) mîpit:îpit NID “tooth” ; ! AEW NDI-1 (Vowel Initial Regular stem) miskotâkay:skotâkay NID “coat/dress” ; ! AEW NDI-2 (Consonant Initial VW stem) mîstakay:îstakay NID “hair” ; ! AEW NDI-1 (Vowel Initial VW stem) miskîsik:skîsikw NID “eye” ; ! AEW NDI-3 (Consonant Initial Cw stem) mathakask:athakaskw NID “palate” ; ! AEW NDI-3 (Vowel Initial Cw stem) @P.number.SG@mîni:@P.number.SG@în NID_SG/I “bone-marrow” ; ! AEW NDI-4 (Vowel Initial Single Syllable NDI-4 stem) N.B. might cause issue with <î>, also is a mass noun that is never plural.

LEXICON NOUN_DEP_KINSHIP_STEMS nîwas:îwat3 NID “sacred bundle” ; ! AEW NID Irregular(?) VW single-syllable stem. N.B. T->s when word final. (can’t be PxX) nôhkom:ohkom NAD “grand-mother” ; nôhtawiy:ôhtawiy NAD “father” ; nitânis:tânis NAD “daughter, brother, sister” ; ninîkihik:nîkihikw NAD “parent” ; ! AEW NDA-3 (Consonant initial Vw stem) Deal later: can also be the stem or . and can occur in locative. The can not. nîtim:îtimw NAD "cross-cousin of opposite gender" ; ! AEW NDA-3 (Vowel initial Vw stem). nîwa:îw2 NAD_SG/A "wife" ; ! AEW NDA-4 (Vowel initial single syllable NDA-4 stem). N.B. Only one of its type. @P.loc.NULL@nîskwa:@P.loc.NULL@îskw NAD_SG/A "fellow wife; husband's ex" ; ! AEW NDA-4w (Vowel initial single-syllable-/w/ stem). N.B. Only one of its type. Cannot take the locative (marked here with a flag-diacritic, which could be incorporated in a new set of NON-LOC contlexes).

! Non-kinship dependent noun(s) which do not take PxX nîki:îk NID_SG/I “home” ; ! AEW NDI-4 (Vowel Initial Single Syllable NDI-4 stem) N.B. Doesn’t usually work unpossessed or plural unless distributive.

LEXICON NOUN_INDEP_STEMS ! ! Independent NA AEW to FST conversion ! NA-1, NA-2 -> NA ! NA-3 -> NA + stem-final -w ! NA-4, NA-4w -> NA_SG/A_POSS/IM -w2, y4 (immutable) ! ! Independent NI AEW to FST conversion ! NI-1, NI-2 -> NI ! NI-3 -> NI + -w (stem) ! NI-4, NI-4w -> NI_SG/I_POSS/IM, with w2, y4 for the immutable final glides ! Test case wug stems (marked with +Err/Dummy so that they may be eventually removed) mamam+Err/Dummy:mamam NA ; papap+Err/Dummy:papap NA_POSS/IM ; pipip+Err/Dummy:pipip NA_POSS/IM2 ; tatat+Err/Dummy:tatat NA_DIM/IS ; titit+Err/Dummy:titit NA_DIM/ISIS ; nanan+Err/Dummy:nanan NA_POSS/IM_DIM/IS ; ninin+Err/Dummy:ninin NA_POSS/IM_DIM/ISIS ; kikik+Err/Dummy:kikik NA_POSS/IM2_DIM/IS ; kakak+Err/Dummy:kakak NA_POSS/IM2_DIM/ISIS ; wawâs+Err/Dummy:wawâs NA_DIM/DIM ; ! mam+Err/Dummy:mam NA_SG/A ; ! mim+Err/Dummy:mim NA_SG/I ; nan+Err/Dummy:nan NA_SG/A_POSS/IM ; nin+Err/Dummy:nin NA_SG/I_POSS/IM ; pap+Err/Dummy:pap NA_SG/A_POSS/IM_DIM/IS ; pip+Err/Dummy:pip NA_SG/A_POSS/IM_DIM/ISIS ;

Animate Noun stems pahkwîsikan NA_POSS/IM “bannock” ; ! AEW NA-1 (Consonant-Initial Regular NA Stem) asikan NA “sock” ; ! AEW NA-1 (Vowel-Initial Regular NA Stem) kihc-ôkiniy NA “tomato” ; ! AEW NA-2 (Consonant-Initial Vowel-Glide NA Stem) athapiy NA “net” ; ! AEW NA-2 (Vowel-Initial Vowel-Glide NA Stem) kwâpahikan NA “ladle” ; ! AEW NA-1 (Consonant-Initial Regular NA Stem) masinahikanâhtik:masinahikanâhtikw NA “pencil” ; ! AEW NA-3 (Consonant-Initial Consonant-/w/ NA Stem) askihk:askihkw NA_POSS/IM “kettle, pail” ; ! AEW NA-3 (Vowel-Initial Consonant-/w/ NA Stem) niska:nisk NA_SG/A_POSS/IM “goose” ; ! AEW NA-4 (Consonant-Initial Single-Syllable NA Stem) sihti:siht NA_SG/I_POSS/IM “spruce” ; ! AEW NA-4 (Consonant-Initial Single-Syllable NA Stem) îsa:îs NA_SG/A_POSS/IM “clam; shell” ; ! AEW NA-4 (Vowel-Initial Single-Syllable NA Stem) wâhkwa:wâhkw NA_SG/A_POSS/IM “roe, fish eggs; lump of roe” ; ! AEW NA-4w (Consonant-Initial Single-Syllable-/w/ NA Stem) ihkwa:ihkw NA_SG/A_POSS/IM “louse” ; ! AEW NA-4w (Vowel-Initial Single-Syllable-/w/ NA Stem) ! ! Non-AEW NA test cases ! âmow NA_POSS/IM “bee” ; ! maskwa:maskw NA_SG/A_POSS/IM “bear” ; ! sîsîp NA_POSS/IM “duck” ; ! Inanimate Noun stems askiy NI “land” ; cîmân NI “canoe” ; ! AEW NI-1 (Consonant-Initial Regular NI Stem) astotin NI “hat” ; ! AEW NI-1 (Vowel-Initial Regular NI Stem) maskihkiy NI “medicine” ; ! AEW NI-2 (Consonant-Initial VW NI Stem) mîkisasâkay NI “beaded coat, beaded dress” ; ! AEW NI-2 (Consonant-Initial VW NI Stem) oskasâkay NI “new coat, new dress” ; ! AEW NI-2 (Vowel-Initial VW NI Stem) ! pahkîkin:pahkîkinw NI “leather, rawhide” ; ! AEW NI-3 (Consonant-Initial Cw NI Stem) kotawânâpisk:kotawânâpiskw NI ; ! AEW NI-3 (Consonant-Initial Cw NI Stem) nipîwikamik:nipîwikamikw NI ; ! AEW NI-3 (Consonant-Initial Cw NI Stem) ! askîkin:askîkinw NI “fresh rawhide” ; ! AEW NI-3 (Vowel-Initial Cw NI Stem) wâwi:wâw2 NI_SG/I_POSS/IM “egg” ; ! 
AEW NI-4 (Consonant-Initial Single-Syllable NI Stem) osk-âyi:osk-ây4 NI_SG/I_POSS/IM “new item, new thing” ; ! AEW NI-4 (Vowel-Initial Single-Syllable NI Stem) @P.number.SG@misko:@P.number.SG@miskw NI_SG/I_POSS/IM “blood” ; ! AEW NI-4w (Consonant-Initial Single-Syllable-/w/ NI Stem) ! Irregular stem cases ! Suppletive atimw-/-têm- @R.person.NULL@atim:@R.person.NULL@atimw NA “dog, beast of burden” ; ! Regular stem of ‘atim’: ‘atimw’ (cannot be possessed) @D.person.NULL@atim:@D.person.NULL@tîm NA “dog, beast of burden” ; ! Irregular suppletive stem of ‘atim’: ‘-tîm’ (must be possessed) Semi-suppletive kîhtî-aya/kîhcî-aya ! This is not part of the latest AEW noun paradigm sets, so probably should be excluded ! @R.person.NULL@kîhtî-aya:@R.person.NULL@kîhtî-ay4 NA_SG/A_POSS/IM “elder” ; ! AEW NA-4 (Consonant-Initial Single-Syllable NA Stem) ! @D.person.NULL@kîhtî-aya:@D.person.NULL@kîhcî-ayim NA “elder” ; ! AEW NA-4 (Consonant-Initial Single-Syllable NA Stem) kîhtî-aya:kîhtî-ay4 NA_SG/A_POSS/IM ; ! AEW NA-4 (Consonant-Initial Single-Syllable NA Stem) Regular/Irregular ôsi- both a regularly inflecting stem, and a number of irregular forms enumerated separately ôsi:ôs NI_SG/I_POSS/IM “canoe, boat” ; ! AEW NI-4 (Vowel-Initial Single-Syllable irregular NI Stem) ! Subset of lexicalized Diminutive Animate Independent stems @P.dim.DIM@ NOUN_INDEP_DIM_STEMS ;

! Sub-continuation lexicon for lexicalized diminutive stems LEXICON NOUN_INDEP_DIM_STEMS acimos NA “puppy” ; ! ôcisis NI “small canoe” ; !

Complete extraction of lemma:stem info from LLR dictionary 2022, according to LEXC structure in the new cwd FST.


This (part of) documentation was generated from src/fst/morphology/stems/noun_stems.lexc


src-fst-morphology-stems-numerals.lexc.md

Plains Cree numerals

The file for numerals

Here start the 999 numbers


This (part of) documentation was generated from src/fst/morphology/stems/numerals.lexc


src-fst-morphology-stems-particles.lexc.md

Woods Cree particles

Full extraction of particles from LLR source (2022):


This (part of) documentation was generated from src/fst/morphology/stems/particles.lexc


src-fst-morphology-stems-particles_header.lexc.md

Woods Cree particles

Full extraction of particles from LLR source (2022):


This (part of) documentation was generated from src/fst/morphology/stems/particles_header.lexc


src-fst-morphology-stems-pronouns.lexc.md

Woods Cree pronouns

There are more pronouns to be added here.

LEXICON Pronoun

LEXICON Personal

nîtha+Pron+Pers+1Sg:nîtha # ;
kîtha+Pron+Pers+2Sg:kîtha # ;

LEXICON Interrogative

awîna+Pron+Interr+A+Sg:awîna # “who,whose” ;
awînak+Pron+Interr+A+Sg:awînak # “who,whose” ;
awîna+Pron+Interr+A+Pl:awîna # “who,whose” ;
awînak+Pron+Interr+A+Pl:awînak # “who,whose” ;
awîniki+Pron+Interr+A+Pl+Var/East:awîniki # “who” ;
awînikik+Pron+Interr+A+Pl+Var/East:awînikik # “who” ;
awîna+Pron+Interr+A+Obv:awîna # “who,whose” ;
awînithiwa+Pron+Interr+A+Obv+Var/East:awînithiwa # “who” ;
awînaka+Pron+Interr+A+Obv+Var/East:awînaka # “who” ;

LEXICON Indefinite

awiyak+Pron+Indef+A+Sg:awiyak # “someone” ;
awiyak+Pron+Indef+A+Pl:awiyakak # “some people” ;

LEXICON Definite

LEXICON Demonstrative

ANIMATE

awa+Pron+Dem+Prox+A+Sg:awa # “this” ;
ôko+Pron+Dem+Prox+A+Pl:ôko # “these” ;
ôho+Pron+Dem+Prox+A+Obv:ôho # “this/these” ;

INANIMATE

ôma+Pron+Dem+Prox+I+Sg:ôma # “this” ;
ôho+Pron+Dem+Prox+I+Pl:ôho # “these” ;
ômîthiw+Pron+Dem+Prox+I+Obv:ômîthiw # “this/these” ;

ôma+Pron+Def+Prox+I+Sg:ôma # “this one” ;
ôho+Pron+Def+Prox+I+Pl:ôho # “these one” ;
ômîthiw+Pron+Def+Prox+I+Obv:ômîthiw # “this/these one(s)” ;
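The pronoun entries pair an analysis string (lemma plus `+Tag` symbols) with a surface form. A minimal Python sketch of how such a pair can be taken apart is given below. The helper `split_analysis` is hypothetical and is not part of the GiellaLT tooling; it assumes the entry contains exactly one unescaped colon and that `+` only occurs as a tag separator.

```python
def split_analysis(entry):
    """Split 'lemma+Tag+Tag:surface' into (lemma, tags, surface)."""
    upper, _, surface = entry.partition(":")
    lemma, *tags = upper.split("+")
    return lemma, tags, surface


personal = split_analysis("nîtha+Pron+Pers+1Sg:nîtha")
regional = split_analysis("awîniki+Pron+Interr+A+Pl+Var/East:awîniki")
plural = split_analysis("awiyak+Pron+Indef+A+Pl:awiyakak")
```

The last example shows why both sides are kept: the lemma on the analysis side is awiyak, while the plural surface form is awiyakak.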


This (part of) documentation was generated from src/fst/morphology/stems/pronouns.lexc


src-fst-morphology-stems-verb_header.lexc.md

Model verb lemmas and stems for the new cwd FST

LEXICON VERBSTEMS

osâwâw:osâwâ VIIw ;
miskwâw:miskwâ VIIw ;
nîpin:nîpin3 VIIn ;
mispon VIIn_PL ; ! only occurs in plural
pimamon VIIn ;
mâthâtan:mâthâtan3 VIIn ;
mîthwâsin VIIn ;

apiw:api VAIw ;
atoskîw:atoskî VAIw ;
mâtow:mâto VAIio ;
mîcisow:mîciso VAIio ;
nîhithawîw:nîhithawî VAIw ;
nipâw:nipâ VAIae ;
pimisin:pimisin3 VAIn ;

kâtâw:kâtâ VTIw ;
kîsihtâw:kîsihtâ VTIw ;
mîciw:mîci VTIw ;
wâpahtam:wâpaht4a VTIm ; ! Check status of -a
nâtam:nâta VTIm ;

kîskiswîw:kîskisw VTA ; ! w:0 for collapsing cases
nitonawîw:nitonaw VTA ;
atoskahîw:atoskah VTA ;
miskawêw:miskaw VTA ;
mowîw:mow2 VTA ; ! w2:w for non-collapsing cases
nakatîw:nakat3 VTA ; ! t3:s in some cases
nâtîw:nât3 VTAt ;
itîw:it3 VTAi ;
wâpamîw:wâpam VTA ;
wîcihîw:wîcih VTA ;
mîstasîhkawîwak:mîstasîhkaw VTA_PL ; !’generally’ plural according to LLR
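Each stem entry has the shape `lemma:stem CONTLEX ;`, optionally followed by a `!` comment, and entries without a colon use the same string as lemma and stem. The following Python sketch shows one way to read such lines. The function `parse_stem_entry` is hypothetical, and the parsing is a simplification: it treats every `!` as the start of a comment and does not handle lexc escapes (`%`) or quoted glosses.

```python
import re

# entry (no whitespace), continuation lexicon, then the terminating semicolon
ENTRY = re.compile(r"^\s*(?P<entry>\S+)\s+(?P<cont>\S+)\s*;")


def parse_stem_entry(line):
    """Return (lemma, stem, continuation_lexicon), or None if no entry is found."""
    code = line.split("!", 1)[0]          # drop the trailing comment, if any
    match = ENTRY.match(code)
    if match is None:
        return None
    lemma, _, stem = match.group("entry").partition(":")
    return lemma, stem or lemma, match.group("cont")


lines = [
    "osâwâw:osâwâ VIIw ;",
    "mispon VIIn_PL ; ! only occurs in plural",
    "mowîw:mow2 VTA ; ! w2:w for non-collapsing cases",
]
parsed = [parse_stem_entry(line) for line in lines]
```

Grouping the resulting tuples by their third element gives a quick overview of which stems feed which continuation lexicon (VIIw, VIIn_PL, VTA and so on).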

Full incorporation of LLR 2022 verbs into new cwd FST


This (part of) documentation was generated from src/fst/morphology/stems/verb_header.lexc


src-fst-morphology-stems-verb_stems.lexc.md

Model verb lemmas and stems for the new cwd FST

LEXICON VERBSTEMS

osâwâw:osâwâ VIIw ;
miskwâw:miskwâ VIIw ;
nîpin:nîpin3 VIIn ;
mispon VIIn_PL ; ! only occurs in plural
pimamon VIIn ;
mâthâtan:mâthâtan3 VIIn ;
mîthwâsin VIIn ;

apiw:api VAIw ;
atoskîw:atoskî VAIw ;
mâtow:mâto VAIio ;
mîcisow:mîciso VAIio ;
nîhithawîw:nîhithawî VAIw ;
nipâw:nipâ VAIae ;
pimisin:pimisin3 VAIn ;

kâtâw:kâtâ VTIw ;
kîsihtâw:kîsihtâ VTIw ;
mîciw:mîci VTIw ;
wâpahtam:wâpaht4a VTIm ; ! Check status of -a
nâtam:nâta VTIm ;

kîskiswîw:kîskisw VTA ; ! w:0 for collapsing cases
nitonawîw:nitonaw VTA ;
atoskahîw:atoskah VTA ;
miskawêw:miskaw VTA ;
mowîw:mow2 VTA ; ! w2:w for non-collapsing cases
nakatîw:nakat3 VTA ; ! t3:s in some cases
nâtîw:nât3 VTAt ;
itîw:it3 VTAi ;
wâpamîw:wâpam VTA ;
wîcihîw:wîcih VTA ;
mîstasîhkawîwak:mîstasîhkaw VTA_PL ; !’generally’ plural according to LLR

Full incorporation of LLR 2022 verbs into new cwd FST


This (part of) documentation was generated from src/fst/morphology/stems/verb_stems.lexc


src-fst-phonetics-txt2ipa.xfscript.md

retroflex plosive, voiceless t` ʈ 0288, 648 (` = ASCII 096)
retroflex plosive, voiced d` ɖ 0256, 598
labiodental nasal F ɱ 0271, 625
retroflex nasal n` ɳ 0273, 627
palatal nasal J ɲ 0272, 626
velar nasal N ŋ 014B, 331
uvular nasal N\ ɴ 0274, 628

bilabial trill B\ ʙ 0299, 665
uvular trill R\ ʀ 0280, 640
alveolar tap 4 ɾ 027E, 638
retroflex flap r` ɽ 027D, 637
bilabial fricative, voiceless p\ ɸ 0278, 632
bilabial fricative, voiced B β 03B2, 946
dental fricative, voiceless T θ 03B8, 952
dental fricative, voiced D ð 00F0, 240
postalveolar fricative, voiceless S ʃ 0283, 643
postalveolar fricative, voiced Z ʒ 0292, 658
retroflex fricative, voiceless s` ʂ 0282, 642
retroflex fricative, voiced z` ʐ 0290, 656
palatal fricative, voiceless C ç 00E7, 231
palatal fricative, voiced j\ ʝ 029D, 669
velar fricative, voiced G ɣ 0263, 611
uvular fricative, voiceless X χ 03C7, 967
uvular fricative, voiced R ʁ 0281, 641
pharyngeal fricative, voiceless X\ ħ 0127, 295
pharyngeal fricative, voiced ?\ ʕ 0295, 661
glottal fricative, voiced h\ ɦ 0266, 614

alveolar lateral fricative, vl.  K
alveolar lateral fricative, vd.  K\

labiodental approximant          P (or v\)
alveolar approximant             r\
retroflex approximant            r\`
velar approximant                M\

retroflex lateral approximant    l`
palatal lateral approximant      L
velar lateral approximant        L\
Clicks

bilabial          O\  (O = capital letter)
dental            |\
(post)alveolar    !\
palatoalveolar    =\
alveolar lateral  |\|\
Ejectives, implosives

ejective   _>  e.g. ejective p   p_>
implosive  _<  e.g. implosive b  b_<
Vowels

close back unrounded           M
close central unrounded        1
close central rounded          }
lax i                          I
lax y                          Y
lax u                          U

close-mid front rounded        2
close-mid central unrounded    @\
close-mid central rounded      8
close-mid back unrounded       7

schwa                          ə  @

open-mid front unrounded       E
open-mid front rounded         9
open-mid central unrounded     3
open-mid central rounded       3\
open-mid back unrounded        V
open-mid back rounded          O

ash (ae digraph)               {
open schwa (turned a)          6

open front rounded             &
open back unrounded            A
open back rounded              Q
Other symbols

voiceless labial-velar fricative      W
voiced labial-palatal approx.         H
voiceless epiglottal fricative        H\
voiced epiglottal fricative           <\
epiglottal plosive                    >\

alveolo-palatal fricative, vl.        s\
alveolo-palatal fricative, voiced     z\
alveolar lateral flap                 l\
simultaneous S and x                  x\
tie bar                               _
Suprasegmentals

primary stress    "
secondary stress  %
long              :
half-long         :\
extra-short       _X
linking mark      -\
Tones and word accents

level extra high         _T
level high               _H
level mid                _M
level low                _L
level extra low          _B
downstep                 !
upstep                   ^  (caret, circumflex)

contour, rising          _R
contour, falling         _F
contour, high rising     _H_T
contour, low rising      _B_L

contour, rising-falling  _R_F

(NB Instead of being written as diacritics with _, all prosodic marks can alternatively be placed in a separate tier, set off by < >, as recommended for the next two symbols.)

global rise              <R>
global fall              <F>
Diacritics

voiceless                      _0  (0 = figure), e.g. n_0
voiced                         _v
aspirated                      _h
more rounded                   _O  (O = letter)
less rounded                   _c
advanced                       _+
retracted                      _-
centralized                    _"
syllabic                       =  (or _=)  e.g. n= (or n_=)
non-syllabic                   _^
rhoticity                      `

breathy voiced                 _t
creaky voiced                  _k
linguolabial                   _N
labialized                     _w
palatalized                    '  (or _j)  e.g. t' (or t_j)
velarized                      _G
pharyngealized                 _?\

dental                         _d
apical                         _a
laminal                        _m
nasalized                      ~  (or _~)  e.g. A~ (or A_~)
nasal release                  _n
lateral release                _l
no audible release             _}

velarized or pharyngealized    _e
velarized l, alternatively     5
raised                         _r
lowered                        _o
advanced tongue root           _A
retracted tongue root          _q
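
The symbol table above pairs ASCII (X-SAMPA style) symbols with IPA characters and their Unicode code points. As a small illustration of how such a table can be applied, here is a Python sketch that converts a string by greedy longest-match lookup. It covers only some of the rows that list a code point, the names `XSAMPA_TO_IPA` and `xsampa_to_ipa` are hypothetical, and it is not a description of how `txt2ipa.xfscript` itself is implemented.

```python
# Subset of the table above: ASCII symbol -> IPA character (by code point).
XSAMPA_TO_IPA = {
    "F": "\u0271",    # labiodental nasal
    "J": "\u0272",    # palatal nasal
    "N": "\u014B",    # velar nasal
    "N\\": "\u0274",  # uvular nasal
    "B\\": "\u0299",  # bilabial trill
    "R\\": "\u0280",  # uvular trill
    "4": "\u027E",    # alveolar tap
    "p\\": "\u0278",  # bilabial fricative, voiceless
    "B": "\u03B2",    # bilabial fricative, voiced
    "T": "\u03B8",    # dental fricative, voiceless
    "D": "\u00F0",    # dental fricative, voiced
    "S": "\u0283",    # postalveolar fricative, voiceless
    "Z": "\u0292",    # postalveolar fricative, voiced
    "C": "\u00E7",    # palatal fricative, voiceless
    "j\\": "\u029D",  # palatal fricative, voiced
    "G": "\u0263",    # velar fricative, voiced
    "X": "\u03C7",    # uvular fricative, voiceless
    "R": "\u0281",    # uvular fricative, voiced
    "X\\": "\u0127",  # pharyngeal fricative, voiceless
    "?\\": "\u0295",  # pharyngeal fricative, voiced
    "h\\": "\u0266",  # glottal fricative, voiced
}

# Try longer symbols first so that "N\" wins over "N".
_KEYS_LONGEST_FIRST = sorted(XSAMPA_TO_IPA, key=len, reverse=True)


def xsampa_to_ipa(text: str) -> str:
    """Replace known ASCII symbols with IPA; pass everything else through unchanged."""
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS_LONGEST_FIRST:
            if text.startswith(key, i):
                out.append(XSAMPA_TO_IPA[key])
                i += len(key)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(xsampa_to_ipa("SiN"))
```

Matching longest symbols first matters because several two-character symbols (those ending in a backslash) share their first character with a one-character symbol.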


This (part of) documentation was generated from src/fst/phonetics/txt2ipa.xfscript


src-fst-transcriptions-transcriptor-abbrevs2text.lexc.md

We describe here how abbreviations in Woods Cree are read out, e.g. for text-to-speech systems.

For example:


This (part of) documentation was generated from src/fst/transcriptions/transcriptor-abbrevs2text.lexc


src-fst-transcriptions-transcriptor-numbers-digit2text.lexc.md

% komma% :, Root ;
% tjuohkkis% :%. Root ;
% kolon% :%: Root ;
% sárggis% :%- Root ;
% násti% :%* Root ;


This (part of) documentation was generated from src/fst/transcriptions/transcriptor-numbers-digit2text.lexc


tools-grammarcheckers-grammarchecker.cg3.md

[ L A N G U A G E ] G R A M M A R C H E C K E R

DELIMITERS

TAGS AND SETS

Tags

This section lists all the tags inherited from the FST that are used as tags in the syntactic analysis. The next section, Sets, contains sets defined on the basis of the tags listed here; those set names are not visible in the output.

Beginning and end of sentence

BOS EOS

Parts of speech tags

N A Adv V Pron CS CC CC-CS Po Pr Pcle Num Interj ABBR ACR CLB LEFT RIGHT WEB PPUNCT PUNCT

COMMA ¶

Tags for POS sub-categories

Pers Dem Interr Indef Recipr Refl Rel Coll NomAg Prop Allegro Arab Romertall

Tags for morphosyntactic properties

Nom Acc Gen Ill Loc Com Ess Ess Sg Du Pl Cmp/SplitR Cmp/SgNom Cmp/SgGen Cmp/SgGen PxSg1 PxSg2 PxSg3 PxDu1 PxDu2 PxDu3 PxPl1 PxPl2 PxPl3 Px

Comp Superl Attr Ord Qst IV TV Prt Prs Ind Pot Cond Imprt ImprtII Sg1 Sg2 Sg3 Du1 Du2 Du3 Pl1 Pl2 Pl3 Inf ConNeg Neg PrfPrc VGen PrsPrc Ger Sup Actio VAbess

Err/Orth

Semantic tags

Sem/Act Sem/Ani Sem/Atr Sem/Body Sem/Clth Sem/Domain Sem/Feat-phys Sem/Fem Sem/Group Sem/Lang Sem/Mal Sem/Measr Sem/Money Sem/Obj Sem/Obj-el Sem/Org Sem/Perc-emo Sem/Plc Sem/Sign Sem/State-sick Sem/Sur Sem/Time Sem/Txt

HUMAN

PROP-ATTR PROP-SUR

TIME-N-SET

Syntactic tags

@+FAUXV @+FMAINV @-FAUXV @-FMAINV @-FSUBJ> @-F<OBJ @-FOBJ> @-FSPRED<OBJ @-F<ADVL @-FADVL> @-F<SPRED @-F<OPRED @-FSPRED> @-FOPRED> @>ADVL @ADVL< @<ADVL @ADVL> @ADVL @HAB> @<HAB @>N @Interj @N< @>A @P< @>P @HNOUN @INTERJ @>Num @Pron< @>Pron @Num< @OBJ @<OBJ @OBJ> @OPRED @<OPRED @OPRED> @PCLE @COMP-CS< @SPRED @<SPRED @SPRED> @SUBJ @<SUBJ @SUBJ> SUBJ SPRED OPRED @PPRED @APP @APP-N< @APP-Pron< @APP>Pron @APP-Num< @APP-ADVL< @VOC @CVP @CNP OBJ

-OTHERS SYN-V @X

## Sets containing sets of lists and tags

This part of the file lists a large number of sets based partly upon the tags defined above, and partly upon lexemes drawn from the lexicon. See the source file itself to inspect the sets; what follows here is an overview of the set types.

### Sets for Single-word sets

INITIAL

### Sets for word or not

WORD NOT-COMMA

### Case sets

ADLVCASE CASE-AGREEMENT CASE NOT-NOM NOT-GEN NOT-ACC

### Verb sets

NOT-V

### Sets for finiteness and mood

REAL-NEG MOOD-V NOT-PRFPRC

### Sets for person

SG1-V SG2-V SG3-V DU1-V DU2-V DU3-V PL1-V PL2-V PL3-V

### Pronoun sets

### Adjectival sets and their complements

### Adverbial sets and their complements

### Sets of elements with common syntactic behaviour

### NP sets defined according to their morphosyntactic features

### The PRE-NP-HEAD family of sets

These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression **WORD - premodifiers**.

### Border sets and their complements

### Grammarchecker sets

* * *

This (part of) documentation was generated from [tools/grammarcheckers/grammarchecker.cg3](https://github.com/giellalt/lang-cwd/blob/main/tools/grammarcheckers/grammarchecker.cg3)

---

# tools-tokenisers-tokeniser-disamb-gt-desc.pmscript.md

# Tokeniser for cwd

Usage:

```
$ make
$ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

Pmatch documentation: <https://github.com/hfst/hfst/wiki/HfstPmatch>

Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

* Punct contains ASCII punctuation marks
* The symbol after m-dash is soft-hyphen `U+00AD`
* The symbol following {•} is byte-order-mark / zero-width no-break space `U+FEFF`.

Whitespace contains ASCII white space, and the List contains some Unicode white space characters:

* En Quad U+2000 to Zero-Width Joiner U+200D
* Narrow No-Break Space U+202F
* Medium Mathematical Space U+205F
* Word joiner U+2060

Apart from what's in our morphology, there are

1. unknown word-like forms, and
2. unmatched strings

We want to give 1) a match, but let 2) be treated specially by `hfst-tokenise -a`.

Unknowns are made of:

* lower-case ASCII
* upper-case ASCII
* select extended latin symbols
* ASCII digits
* select symbols
* Combining diacritics as individual symbols,
* various symbols from Private area (probably Microsoft), so far:
  * U+F0B7 for "x in box"

## Unknown handling

Unknowns are tagged ?? and treated specially with `hfst-tokenise`. `hfst-tokenise --giella-cg` will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG; they get a default baseform equal to the wordform, but no tag to check, so it's safer to let hfst-tokenise handle them.
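
The rule described above (a token whose analyses are all empty counts as unknown, while empty analyses are dropped from tokens that also have real readings) can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the actual `hfst-tokenise` implementation. The function name `clean_readings`, the list-of-strings representation, and the sample analysis string `ja+CC` are all assumptions made for the example.

```python
from typing import List

UNKNOWN = "??"  # the tag the text above gives for unknowns


def clean_readings(readings: List[str]) -> List[str]:
    """Drop empty analyses; if nothing is left, mark the token as unknown."""
    non_empty = [reading for reading in readings if reading != ""]
    return non_empty if non_empty else [UNKNOWN]


if __name__ == "__main__":
    # Hypothetical analysis strings, for illustration only.
    print(clean_readings(["", "ja+CC"]))  # the empty reading is removed
    print(clean_readings([""]))           # only empty readings: unknown
```

Keeping this decision in the tokeniser, rather than passing empty readings on to CG, matches the reasoning given above: an empty reading carries no tag for a CG rule to test.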
Finally we mark as a token any sequence making up a:

* known word in context
* unknown (OOV) token in context
* sequence of word and punctuation
* URL in context

* * *

This (part of) documentation was generated from [tools/tokenisers/tokeniser-disamb-gt-desc.pmscript](https://github.com/giellalt/lang-cwd/blob/main/tools/tokenisers/tokeniser-disamb-gt-desc.pmscript)

---

# tools-tokenisers-tokeniser-gramcheck-gt-desc.pmscript.md

# Grammar checker tokenisation for cwd

Requires a recent version of HFST (3.10.0 / git revision>=3aecdbc)

Then just:

```
$ make
$ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

More usage examples:

```
$ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

Pmatch documentation: <https://github.com/hfst/hfst/wiki/HfstPmatch>

Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

* Punct contains ASCII punctuation marks
* The symbol after m-dash is soft-hyphen `U+00AD`
* The symbol following {•} is byte-order-mark / zero-width no-break space `U+FEFF`.

Whitespace contains ASCII white space, and the List contains some Unicode white space characters:

* En Quad U+2000 to Zero-Width Joiner U+200D
* Narrow No-Break Space U+202F
* Medium Mathematical Space U+205F
* Word joiner U+2060

Apart from what's in our morphology, there are

1. unknown word-like forms, and
2. unmatched strings

We want to give 1) a match, but let 2) be treated specially by `hfst-tokenise -a`.

* select extended latin symbols
* select symbols
* various symbols from Private area (probably Microsoft), so far:
  * U+F0B7 for "x in box"

TODO: Could use something like this, but built-ins don't include šžđčŋ:

Simply give an empty reading when something is unknown: `hfst-tokenise --giella-cg` will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG; they get a default baseform equal to the wordform, but no tag to check, so it's safer to let hfst-tokenise handle them.

Finally we mark as a token any sequence making up a:

* known word in context
* unknown (OOV) token in context
* sequence of word and punctuation
* URL in context

* * *

This (part of) documentation was generated from [tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript](https://github.com/giellalt/lang-cwd/blob/main/tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript)

---

# tools-tokenisers-tokeniser-tts-cggt-desc.pmscript.md

# TTS tokenisation for smj

Requires a recent version of HFST (3.10.0 / git revision>=3aecdbc)

Then just:

```sh
make
echo "ja, ja" \
| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

More usage examples:

```sh
echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa \
boasttu olmmoš, man mielde lahtuid." \
| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" \
| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
echo "márffibiillagáffe" \
| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
```

Pmatch documentation: <https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstPmatch>

Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

* Punct contains ASCII punctuation marks
* The symbol after m-dash is soft-hyphen `U+00AD`
* The symbol following {•} is byte-order-mark / zero-width no-break space `U+FEFF`.

Whitespace contains ASCII white space, and the List contains some Unicode white space characters:

* En Quad U+2000 to Zero-Width Joiner U+200D
* Narrow No-Break Space U+202F
* Medium Mathematical Space U+205F
* Word joiner U+2060

Apart from what's in our morphology, there are

1. unknown word-like forms, and
2. unmatched strings

We want to give 1) a match, but let 2) be treated specially by `hfst-tokenise -a`.

* select extended latin symbols
* select symbols
* various symbols from Private area (probably Microsoft), so far:
  * U+F0B7 for "x in box"

TODO: Could use something like this, but built-ins don't include šžđčŋ:

Simply give an empty reading when something is unknown: `hfst-tokenise --giella-cg` will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG; they get a default baseform equal to the wordform, but no tag to check, so it's safer to let hfst-tokenise handle them.

Needs hfst-tokenise to output things differently depending on the tag they get

* * *

This (part of) documentation was generated from [tools/tokenisers/tokeniser-tts-cggt-desc.pmscript](https://github.com/giellalt/lang-cwd/blob/main/tools/tokenisers/tokeniser-tts-cggt-desc.pmscript)