Plains Cree NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-crk

Work Plan with background for Cree FST:

(Potential) orthographical differences and addressing them

Modeling of morphology: background and modeling decisions

Frequency counts from Wolfart texts:

WEAK with ay-:

kâ-kî-ay-ohtinamihk	kâ-kî-ay-ohtinamihk	+?

WEAK with oy-:

ê-oy-oswâcik	ê-oy-oswâcik	+?
nitati-oy-otâpân	nitati-oy-otâpân	+?
kâ-kî-oy-ohpikihicik	kâ-kî-oy-ohpikihicik	+?

STRONG with âh-:

kî-âh-oskinîkiwak	kî-âh-oskinîkiwak	+?
k-âh-oskinîkicik	k-âh-oskinîkicik	+?
ê-kî-âh-ocawâsimisicik	RdplS+ocawâsimisiw+V+AI+Cnj+Prt+3Pl
kiwî-kakwê-âh-onâpêminâwâw	PV/kakwe+RdplS+onâpêmiw+V+AI+Ind+Fut+Int+2Pl
kâ-âh-otinahk	kâ-âh-otinahk	+?
kâ-kî-kakwê-âh-otinikoyâhk.	kâ-kî-kakwê-âh-otinikoyâhk.	+?
nikî-âh-otinâhtikwânân	nikî-âh-otinâhtikwânân	+?
âh-oyôsisimiw;	âh-oyôsisimiw;	+?
âh-oyôsisimiw,	âh-oyôsisimiw,	+?
nikî-pê-âh-otihtikonân,	nikî-pê-âh-otihtikonân,	+?
wî-âh-osâmêyihtam,	wî-âh-osâmêyihtam,	+?
ê-at[i]-âh-ocawâsimisit.	ê-at[i]-âh-ocawâsimisit.	+?

STRONG with -wâh- (or wâh- preverb):

ê-wâh-onâpêmicik.	ê-wâh-onâpêmicik.	+?
ê-wâh-ocihcihkwanapiyâhk.	ê-wâh-ocihcihkwanapiyâhk.	+?
kî-wâh-osîhcikâtêwa,	kî-wâh-osîhcikâtêwa,	+?
nitawi-wâh-ocipitêwak	PV/nitawi+PV/wah+ocipitêw+V+TA+Ind+Prs+3Pl+4Sg/PlO
ê-wâh-onâpêmicik,	ê-wâh-onâpêmicik,	+?
ê-pê-wâh-otihtinikoyâhk	PV/pe+PV/wah+otihtinêw+V+TA+Cnj+Prs+3Sg+1PlO
ê-pê-wâh-otihtinikoyâhk	PV/pe+PV/wah+otihtinêw+V+TA+Cnj+Prs+4Sg/Pl+1PlO
ê-wâh-ocêmikoyâhk	PV/wah+ocêmêw+V+TA+Cnj+Prs+3Sg+1PlO
ê-wâh-ocêmikoyâhk	PV/wah+ocêmêw+V+TA+Cnj+Prs+4Sg/Pl+1PlO
ê-kî-wâh-osîhtamawâcik	PV/wah+osîhtamawêw+V+TA+Cnj+Prt+3Pl+4Sg/PlO
ê-kî-papâmi-wâh-otinât	PV/papami+PV/wah+otinêw+V+TA+Cnj+Prt+3Sg+4Sg/PlO
ê-wâh-otinahk	PV/wah+otinam+V+TI+Cnj+Prs+3Sg
ê-pimi-wâh-ohpahtênaman	ê-pimi-wâh-ohpahtênaman	+?
ê-wâh-ohtohtêcik,	ê-wâh-ohtohtêcik,	+?
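
The coverage shown in the listings above can be tallied mechanically. The following is a minimal sketch, assuming each line is whitespace-separated with the surface form first and that an unanalysed form ends in `+?`, as in the listings. The names `classify` and `tally_coverage` and the prefix-to-label table are illustrative only and are not part of the repository.

```python
from collections import Counter

# Hypothetical mapping from the reduplication segment to the group labels used above.
REDUP_LABELS = {
    "ay": "WEAK ay-",
    "oy": "WEAK oy-",
    "âh": "STRONG âh-",
    "wâh": "STRONG wâh-",
}

def classify(form):
    """Return the reduplication label for a surface form, or None."""
    # Drop trailing punctuation left over from corpus tokenisation.
    segments = form.rstrip(".,;").split("-")
    # Assumption: the stem is the last hyphen-delimited segment and is o-initial.
    if len(segments) < 2 or not segments[-1].startswith("o"):
        return None
    # Only the segment directly before the stem counts as the reduplication.
    return REDUP_LABELS.get(segments[-2])

def tally_coverage(lines):
    """Count analysed vs. unanalysed forms per reduplication type."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank lines and one-word lines carry no analysis
        label = classify(fields[0])
        if label is None:
            continue  # headings and forms without a listed reduplication
        status = "unanalysed" if fields[-1] == "+?" else "analysed"
        counts[(label, status)] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "ê-oy-oswâcik\tê-oy-oswâcik\t+?",
        "ê-wâh-otinahk\tPV/wah+otinam+V+TI+Cnj+Prs+3Sg",
    ]
    for key, n in sorted(tally_coverage(sample).items()):
        print(key, n)
```

Splitting on hyphens rather than searching for substrings keeps wâh- and âh- apart, since `âh` is a suffix of `wâh`.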

Differences in animacy/transitivity types of nouns and verbs: some verbs seem to apply to both NI and NA objects (tomina vs. tominam: AW says that one is VTI and the other VTA; Maskwacîs says the opposite). To be looked into later.

Action items with priorities and assignments

Verbs:

  1. DONE! Implement 3rd person proximate/obviative features as 3rd, 4th, and 5th persons with number-wise ambiguity tags (for 4th and 5th persons), making sure changes to the LEXC and YAML files are in full agreement (Atticus).

  2. DONE! Add -wici- -m- forms (Atticus)
  3. DONE! Add -ikawi- unspecified actor suffix (Atticus)
  4. DONE! fully implement VTA-5 paradigm (instead of using VTA-1 as the default). According to Arok, VTA-5 is basically the same as VTA-1, with the addition of to the stem in the Immediate Imperative forms (Atticus)
  5. include reciprocals and ensure they show up only in the Singular forms (Atticus)
  6. Go through IICONJ stems and verify with Arok which are SG or PL only. Infrastructure for this is already implemented in affixes/verbs.lexc; one only needs to adjust the coding in the stems file to redirect to relevant continuation lexica. After II verbs, do this process systematically for AICONJ, TICONJ and TACONJ verbs. (Atticus)
  7. Implement reduplication for o-initial verb stems as discussed above (Antti)
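
As a rough illustration of item 7, the sketch below generates hyphen-joined reduplication candidates for an o-initial stem. It assumes the prefix inventory is exactly the four shapes attested in the corpus listings above (weak ay-, oy-; strong âh-, wâh-). Whether wâh- is better handled as a preverb, as the existing `PV/wah` analyses suggest, is left open. The function name `reduplicate` is hypothetical, and this is not the LEXC implementation.

```python
# Assumed prefix inventory, taken from the four corpus groups listed earlier.
REDUP_PREFIXES = {
    "weak": ("ay", "oy"),
    "strong": ("âh", "wâh"),
}

def reduplicate(stem, strength):
    """Return hyphen-joined reduplication candidates for an o-initial stem."""
    if not stem.startswith("o"):
        raise ValueError("this sketch only covers o-initial stems")
    if strength not in REDUP_PREFIXES:
        raise ValueError("strength must be 'weak' or 'strong'")
    # The hyphen joiner mirrors the corpus spellings, e.g. ê-wâh-otinahk.
    return [f"{prefix}-{stem}" for prefix in REDUP_PREFIXES[strength]]

if __name__ == "__main__":
    print(reduplicate("otinahk", "strong"))
    print(reduplicate("otinahk", "weak"))
```

The candidates over-generate on purpose: which prefix a given stem actually takes would still have to be checked against the corpus or with a speaker.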

The following will take more work and research to implement:

  1. DONE! Implementation of prefixed conjunct forms (line 68 in affixes/verbs.lexc). Research to be done to determine how kâ- (and other grammatical preverbs such as kâ-kî-) interacts with the various verb moods and functions (relativizer, infinitivizer), and how we could code this. One solution would be to mark explicitly the grammatical preverb preceding a conjunct form, allowing also for the absence of such a grammatical preverb (cf. above) (Atticus)
  2. Deal with the two/three-letter preverb problem (analysing the reduction of a potential -ta- as a reduced -t-, instead of as an epenthetic -t-). This can be partially solved by requiring hyphens as joiners, as well as by some restrictions on preverb combinatorics. Arok to provide some categorical restrictions, if possible; otherwise to be explored based on Wolfart corpus data (Antti, Arok)
  3. DONE! Change preverb tags to represent vowel length accurately (to distinguish e.g. maci- ‘bad’ from mâci- ‘start’) (Antti)
  4. Allow for analysis of forms with -h- joiners, but not their generation. (TBD)

Nouns:

  1. -ici- ‘fellow’ forms (Atticus)
  2. MOSTLY DONE! <-m-> in possession (some nouns are not coded for the right continuation lexicon to allow for the -m- suffix). Discuss the forms that do this with Arok. If fuzzy, allow for both possession options, but if it is categorically the case that a possessed noun must, or must not, use -m-, then code it as such (Atticus)

Particles:

  1. Incorporate common contractions of particles in the lexicon as +Err/Orth cases. (Atticus)

Other/General: