Wangkajunga language model documentation

All doc-comment documentation in one large file.

src-cg3-dependency.cg3.md

COMMON SÁMI DEPENDENCY GRAMMAR

This dep file is for sma, sme, smj, sje.

DELIMITERS

Sentence delimiters are the following: <.> <!> <?> <…> <¶>
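As a rough illustration of what the delimiter list does, the sketch below splits a stream of wordforms into sentence windows whenever it meets one of the delimiters listed above. It is plain Python rather than CG3, and the function name `split_windows` and the sample tokens are assumptions made for this example only.

```python
# Delimiter wordforms as listed in the DELIMITERS section.
DELIMITERS = {"<.>", "<!>", "<?>", "<…>", "<¶>"}


def split_windows(wordforms):
    """Group wordforms into windows, closing a window after each delimiter."""
    windows, current = [], []
    for form in wordforms:
        current.append(form)
        if form in DELIMITERS:
            windows.append(current)
            current = []
    if current:  # trailing material without a final delimiter
        windows.append(current)
    return windows


# Hypothetical input: wordforms written in angle brackets.
tokens = ["<Son>", "<oaidná>", "<.>", "<Hei>", "<!>", "<boađe>"]
windows = split_windows(tokens)
```

Each window is what a rule would see as one sentence; contextual tests in the sketch's model would not look past a window edge.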

TAGS AND SETS

N V A Adv CC CS Inf Sup Neg Num Po Pr

Pcle Prop

Pron IV TV COMMA DASH CITATION to keep colouring we add a “ HYPHEN QMARK PUNCT LEFT RIGHT CLB Ind Pot Impr ImprtII Cond ConNeg Caus causative eus VGen Interj ABBR ACR Prs Prt Cmpnd RCmpnd PrfPrc PrsPrc Actor Actio Ger Indef Nom Acc Ill Com Gen Ess

IM For fao

POS sub-categories

Syntactic tags and sets

Syntactic tags in input to this file

Syntactic tags added in this file

@FMV : finite main verb
oaidná: Son oaidná ollislaš gova. - She sees the whole picture
infinite main verb
@FAUX : finite auxiliary verb
ferte: Son ferte oaidnit ollislaš gova. - She must see the whole picture.
@FMVdic : finite main verb introducing direct speech
@IMVdic : infinite main verb introducing direct speech
@FS-IMV : infinite main verb of subclause
@FS-IAUX : infinite auxiliary verb in subclause
@FS-N<IAUX : infinite auxiliary verb of a relative subclause
@FS-N<IMV : infinite main verb of a relative subclause
@FS-OBJ : finite verb in subclause functioning as object
@FS-OBJ> : finite verb in subclause functioning as object
@FS-<OBJ : finite verb in subclause functioning as object
@FS-SUBJ : finite verb in subclause functioning as subject
@FS-SUBJ> : finite verb in subclause functioning as subject
@FS-<SUBJ : finite verb in subclause functioning as subject
@FS-ADVL> : finite verb in subclause functioning as adverbial to the left of the main clause
@FS-<ADVL : finite verb in subclause functioning as adverbial to the right of the main clause
@S< : a clause modifying a sentence to the right of it
@FS-ADVL : finite verb in subclause …
@-FS-<ADVL : infinite subclause - eus
@-FS-ADVL> : infinite subclause - eus
@FS-N< : relative clause to N
@FS->N : relative clause to N to the left side of it - eus
@FS-VFIN< : finite verb in sentence, statement
eai: Idja ii leat šat, eai ge sii dárbbaš lámppá dahje beaivváža čuovgga, dasgo Hearrá Ipmil lea sin čuovga. - The night is not anymore, they do not need the lamp- or day- light either, because God the Lord is their light.
@FS-<APP : finite subclause functioning as an apposition
@ICL-ADVL : non-finite subclause …
@ICL-AUX< : “right” argument of auxiliary (?)
@ICL-OBJ : infinitival clause object
@ICL-SUBJ : infinitival clause subject
@ICL-P< : infinitival clause complement of preposition
@IAUX : non-finite auxiliary
: main verb. A temporary tag, removed at the end of the file.
: auxiliary verb. A temporary tag, removed at the end of the file.

fao syntags

kal syntags

@INS :
@<INS :
@INS> :

eus syntags

@FS-SPRED : finite verb in subclause functioning as a subject predicate - eus, but not sure if in use

Syntactic set definitions

Dep grammar

Correction rules

muitalit
XX
XX
XX
faoSumId=Rel

The finite verb

Mapping rules

lgRemove removes the language tags , , etc, before proceeding to the dep file.

This (part of) documentation was generated from src/cg3/dependency.cg3

src-cg3-disambiguator.cg3.md

Start making a syntactic disambiguator

Sets

Sentence delimiters are the following: “<.>” “<…>” “<!>” “<?>” “<¶>”

Part-of-Speech

N = noun
A = adjective
Num = numeral
V = verb
CC = conjunction
CS = subjunction
Adv = adverb
Pr = preposition
Po = postposition
Pron = pronoun
Interj = interjection

Numerus

Sg = Singular
Pl = Plural
Sg1 = Singular 1.p.
Sg2 = Singular 2.p.
Sg3 = Singular 3.p.
Pl1 = Plural 1.p.
Pl2 = Plural 2.p.
Pl3 = Plural 3.p.

Cases

Nom
Gen
Acc
Par
Ine
Ill
Ela
Ade
Abe
All
Abl
Ess
Tra
Ins
Com
SUBJ-CASE = Nom Par
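A set such as SUBJ-CASE can be read as "any reading that carries one of these tags". The following Python sketch shows that membership test under this reading; the helper name `matches` and the two sample readings are hypothetical and not taken from the grammar.

```python
# SUBJ-CASE as documented above: nominative or partitive.
SUBJ_CASE = {"Nom", "Par"}


def matches(reading_tags, tag_set):
    """True if the reading carries at least one tag from the set."""
    return any(tag in tag_set for tag in reading_tags)


# Hypothetical readings, each a list of tags.
reading_a = ["N", "Sg", "Nom"]
reading_b = ["N", "Pl", "Ine"]
```

Here `matches(reading_a, SUBJ_CASE)` holds because of `Nom`, while `reading_b` carries neither member of the set.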

Types

Prop = Proper noun
Interr = Interrogative
Dem = demonstrative pron
Rel = Relative pron
Relpronpl = “mikkä” and “jokka”
Relpronsg = “mikä” and “joka”
Interrpronpl = “kuka” and “mikä”
Pers = Personal pron
Indef = Indef pron
Inf = Infinitive
ConNeg = Connegative form
PrfPrc = Perfect participle
Imprt = Imperative
Act = Active
Neg = Negation verb
COMMA = comma
Foc/kaan = focus clitic -kaan
Sem/Fem = feminine proper noun

Sets with more members

WORD = all PoS
NPMOD = these can modify a noun
NPMODADV = NPMOD plus adverb
NOT-NPMOD = these cannot modify a noun
NOT-NPMODADV = these cannot modify a noun, and are not adverbs
QVANT-ADV = e.g. paljon, vähän
KUNKA = e.g. kunka missä (adverbs that start a sentence)
S-BOUNDARY = words that start a sentence
VFIN = finite verb
COPULAS = olla
AUX = verbs which can be auxiliary
SV-BOUNDARY = words that start a sentence, and finite verbs

This (part of) documentation was generated from src/cg3/disambiguator.cg3

src-cg3-functions.cg3.md

SYNTACTIC FUNCTIONS FOR SÁMI

Sámi language technology project 2003-2018, University of Tromsø

This file adds syntactic functions. It is common for all the Saami

LEFT RIGHT because of apertium

Sets for POS sub-categories
Sets for Semantic tags
Sets for Morphosyntactic properties

Syntactic tags

@+FAUXV : finite auxiliary verb
ferte: Son ferte oaidnit ollislaš gova. - She must see the whole picture.
@+FMAINV : finite main verb
oaidná: Son oaidná ollislaš gova. - She sees the whole picture
@-FAUXV : infinite auxiliary verb
sáhte: In sáhte gáhku borrat. - I cannot eat cake.
@-FMAINV : infinite main verb
oaidnit: Son ferte oaidnit ollislaš gova. - She must see the whole picture.
@-FSUBJ> : Subject of infinite verb outside the verbal.
mu: Diet dáhpáhuvai mu dieđikeahttá. - It happened without me knowing about it.
@-F<OBJ : Object of infinite verb outside the verbal.
nuppi: Ulbmil lea oažžut nuppi boagustit. - The goal is to get the other one to laugh.
@-FOBJ> : Object of infinite verb outside the verbal.
váldovuoittuid: Sii vurde váldovuoittuid fasket. - They waited to grab the main prizes.
@SPRED<OBJ : Object of a subject predicative. (some adjectives are transitive)
guliid: Mánát leat oažžulat guliid.
@-FADVL : Adverbial complement of infinite verb outside the verbal.
várrogasat: Dihkkadeaddji rávve skohtervuddjiid várrogasat mátkkoštit. - The roadman warns snowscooter drivers to drive carefully.
@-F<PRED : Predicative complement of infinite verb outside the verbal.
ággan: Jáhkken kulturmáhtu leat oktan ággan.
@>ADVL : Modifier of an adverbial to the right.
vaikko: doppe leat vaikko man ollu studeanttat.
@ADVL< : Complement of adverbial.
vahkus: Son málesta guktii vahkus.
@<ADVL : Adverbial after the main verb.
dás: Eanet dieđuid gávnnat dás.
@ADVL> : Adverbial to the left of the main verb
viimmat: Dál de viimmat asttan lohkat reivve.
@ADVL>CS : Adverbial modifying subjunction.
‘beare’ pointing at ‘danin go’: Muhto dus ii leat riekti dearpat su beare danin go sáhtát.
: Habitive, specifying an adverbial, e.g. @ADVL>
Máhtes: Máhtes lea beana.
: Existential, specifying a subject, e.g. @<SUBJ
beana: Máhtes lea beana.
: logophoric pronouns, e.g. @>N (for MT)
:
@>N : Modifier of a noun to the right.
geavatlaš: Ráđđehussii lea geavatlaš politihkka deaŧalaš. - For the government, practical politics is important.
@N< : Complement of noun to the left.
vihtta: Mun boađán diibmu vihtta.
@>A : Modifier of an adjective to the right.
juohke: Seminára lágiduvvo juohke nuppi jagi.
@P< : Complement of preposition.
soađi: Dat dáhpáhuvai maŋŋel soađi.
@>P : Complement of postposition.
riegádeami: Seta riegádeami maŋŋel Áttán elii vel 800 jagi.
@HNOUN : Stray noun in sentence fragment.
muittut: Fidnokurssa muittut.
@INTERJ : Interjection.
Hei: Hei, boađe!
@>Num : Attribute of numeral to the right.
dušše: Mun ledjen dušše guokte mánu doppe.
@Pron< : Complement of pronoun to the left.
Birehiin: Moai Birehiin leimme doppe.
@>Pron : Modifier of pronoun to the right.
vaikko: Olmmoš sáhttá bargat vaikko maid.
@Num< : Complement of numeral to the left.
girjjiin: Dat lea okta min buoremus girjjiin.
@OBJ : Object, the verb is not in the sentence (ellipse)
@<OBJ : Object, the verb is to the left.
gávtti: Son goarru gávtti.
@OBJ> : Object, the verb is to the right.
filmma: Dán filmma leat Kárášjoga nuorat oaidnán.
@OPRED : Object predicative, the verb is not in the sentence (ellipse).
@<OPRED : Object predicative, the verb is to the left.
buriid: Son ráhkada gáhkuid hui buriid.
@OPRED> : Object predicative, the verb is to the right.
dohkkemeahttumin: Son oinnii dohkkemeahttumin bargat ášši nu.
@PCLE : Particle.
Amma: Amma mii eat leat máksán? - We have not paid, have we?
@COMP-CS< : Complement of subjunction.
vejolaš: Dat šaddá nu buorre go vejolaš.
@SPRED : Subject predicative, the verb is not in the sentence (ellipse).
@<SPRED : Subject predicative, the verb is to the left.
árgabivttas: Ovdal lei gákti árgabivttas.
@SPRED> : Subject predicative, the verb is to the right.
álbmogin: Sápmelaččaid historjá álbmogin lea duháhiid jagiid boaris.
@SUBJ : Subject, the finite verb is not in the sentence (ellipse).
@<SUBJ : Subject, the finite verb is to the left.
gákti: Ovdal lei gákti árgabivttas.
@SUBJ> : Subject, the finite verb is to the right.
Son: Son lea mu oabbá. - She is my sister.
@PPRED : Predicative for predicative.
@APP : Apposition
@APP-N< : Apposition to noun to the left.
oahpaheaddji: Oidnen Ánne, min oahpaheaddji.
@APP-Pron< : Apposition to pronoun to the left.
boazodoalloáirasat: Ja moai boazodoalloáirasat áigguime vaikko guovttá joatkit barggu.
@APP>Pron : Apposition to pronoun to the right.
@APP-Num< : Apposition to numeral to the left.
@APP-ADVL< : Apposition to adverbial to the left.
bearjadaga: Mun vuolggán ihttin, bearjadaga.
@VOC : Vocative
Miss Turner : Bures boahtin deike, Miss Turner! - Welcome here, Miss Turner!
@CVP : Conjunction or subjunction that conjoins finite verb phrases.
go : Leago guhkes áigi dassá go Máreha oidnet? - Is it a long time since you saw Máret?
@CNP : Local conjunction or subjunction.
vai : Leago nieida vai bárdni? - Is it a girl or a boy?
@CMPND
@X : The function is unknown, e.g. because the word is unknown

Tag sets

Sets for verbs
V is all readings with a V tag in them, REAL-V should be the ones without an N tag following the V. The REAL-V set thus awaits a fix to the preprocess V … N bug.
The set COPULAS is for predicative constructions
NP sets defined according to their morphosyntactic features
The PRE-NP-HEAD family of sets

These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression WORD - premodifiers.

The set NOT-NPMOD is used to find barriers between NPs. Typical usage: … (*1 N BARRIER NOT-NPMOD) … meaning: Scan to the first noun, ignoring anything that can be part of the noun phrase of that noun (i.e., “scan to the next NP head”)
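The barrier scan described above can be pictured with a small Python model. This is only a sketch: cohorts are reduced to one set of tags each, the function name `scan_right` is invented here, and the contents given to `NOT_NPMOD` are a hypothetical stand-in for the real set definition.

```python
# Hypothetical stand-in for the NOT-NPMOD set; the real members live in the grammar.
NOT_NPMOD = {"V", "CC", "CS", "COMMA"}


def scan_right(cohorts, start, target, barrier):
    """Return the index of the first cohort right of `start` that carries
    `target`, or None if a barrier cohort is met first or nothing is found."""
    for i in range(start + 1, len(cohorts)):
        tags = cohorts[i]
        if target in tags:  # the target is tested before the barrier
            return i
        if tags & barrier:  # something that cannot be part of the NP
            return None
    return None


# Hypothetical sentence: pronoun, adjective, noun, verb, noun.
cohorts = [{"Pron"}, {"A"}, {"N"}, {"V"}, {"N"}]
first_head = scan_right(cohorts, 0, "N", NOT_NPMOD)
blocked = scan_right(cohorts, 2, "N", NOT_NPMOD)
```

Starting from the pronoun, the adjective is skipped because it can belong to the noun phrase, so the scan reaches the noun at index 2. Starting from that noun, the verb acts as a barrier and the second noun is not reached.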

Miscellaneous sets
Border sets and their complements

ADLVCASE

Syntactic sets

These were the set types.

Numeral outside the sentence

HABITIVE MAPPING

hab1 hab aux leat
hab_numo1 hab copula comma comma N+Nom
hab_numo2 copula nu mo/go hab
leahab copula nu mo/go hab
hab2 hab auxv adv leat
hab3 ( @ADVL>) for asdf hab-actor and hab-case; if leat to the right, and Nom to the right of leat. Lots of restrictions.
hab3 ( @ADVL>) for asdf hab-actor and hab-case; if leat to the right, and Nom to the right of leat. Lots of restrictions.
hab3 ( @ADVL>) for asdf hab-actor and hab-case; if leat to the right, and Nom to the right of leat. Lots of restrictions.
hab3 ( @ADVL>) for hab-actor and hab-case; if leat to the right, and Nom to the right of leat. Lots of restrictions.
hab_main ( @ADVL>) for hab-actor and hab-case; if leat to the right, and Nom to the right of leat. Lots of restrictions.
habInf hab lea inf
habNomLeft Nom or Num + gen hab lea
habAdvl Ii han ovttasge du sogas leat dat namma.
hab4 hab cc hab leat
hab6 lea go hab – leago hab
hab7 lea go hab
hab8 This is not HAB Ellii šattai hoahppu.
hab5 This is not HAB Mánás gollot gieđat.
hab9 prop ord-hab leat
hab10 prop ord-hab leat
habDain ( @ADVL>) for (Pron Dem Pl Loc) if leat followed by Nom to the right
habDain2
habRel # before relative clause
habEllipse Buot gánddain lea dreassa, nieiddain fas gákti.
habGen ( @<ADVL) hab for Gen; if Gen is located in the end of the sentence and Nom is sentence initial
habGenQst ( @<ADVL) hab for Gen; in a question sentence. Gen is located sentence initially and SUBJ is found to the right. To the right of SUBJ is copulas
n<titel1 (@N<) for (“jr”) or (“sr”); if first one to the left is Prop
n<titel2 (@N<) for INITIAL; if first one to the left is a noun, or if to the left of you is a single letter which is part of a noun conjunction bustávas e ja f gáibiduvvo
n<:com (@N<) for (Sg Com); if first one to the left is Coll
>nAttr (@>N) for Attr; if there is a noun to your right
n>Indef (Pron Indef Attr); if eará is to the right
n>Indef (Pron Indef Com); if eará is to the right
>nNum (@>N) for numerals if; there is a noun to your right. You are not allowed to be (Sg Nom), (Sg Acc) or (Sem/Date)
noun>n (@>N) for Gen; if there is a noun to your right. Restrictions: Not if you are: a time related word. Not if you are OKTA with Pl Loc to your right. Not if CC is to your right followed by another Gen and then Po. Not if you are HUMAN and to your right is Actio Nom followed by a noun.
>nTime (@>N) for Gen TIME-N; if timenoun to your right. Restrictions: Not if you are a OKTA Nom with Pl Loc to your right. Not if CC followed by Gen, followed by Po to your right. Not if COMMA to your right
>ntittel (@>N) for (Sg Nom TIME-N) or (Nom Der/NomAg); if to your right is Sem/Mal, Sem/Fem, Sem/Sur
>nplc (@>N) for (Sg Nom Prop Sem/Plc), if to your right is Sem/Plc
>nALU (@>N) for Sg Acc numerals; when a measure-noun to the right
>NTime (@>N) for Gen; if you are TIME-N with BOC to your left, and PREGEN to your right
n<:Refl (@N<) for (Refl Nom); if to the left is (N Nom), or if first one to the left is a finite mainverb with a (N Nom) to the left
>pron1 (@>Pron) for GRADE-ADV, DUSSE, BUOT if; first one to the right is Pron
>pron2 (@>Pron) for (Refl Nom) if; first one to the right is Refl
>pron3 (@>Pron) for (Pron Recipr) if; first one to the right is (Pron Recipr)
vaikko (@>Pron) for vaikko if; first one to the right is Indef
vaikkoman (@>ADVL) for vaikko if; first one to the right is man
dasmaŋŋel (@>ADVL) for vaikko if; first one to the right is man
adv>advl (@>ADVL)
adv>advl (@>ADVL)
BOSvoc (@VOC) for HUMAN Nom; if sentence initial. To the right is comma. No nom-cased HUMAN followed by comma or CC is allowed to the right. There should not be a relative clause to the right, because then you are likely to be SUBJ
voc (@VOC) for Nom HUMAN; if comma to the left and a second-person verb or pronoun to the left. To the right is the end of the sentence
Particle<subj (@PCLE)
spred<obj (@SPRED<OBJ) for Acc; the object of an SPRED. Not to be confused with OPRED. If SPRED is to the left, and copulas is to the left of it. Nom or Hab are found sentence initially.
Hab<subj ( @<SUBJ) for Nom; if copulas, goallut or jápmit is FMAINV and habitive or human Loc is found to the left. OR: if Ill or @Pron< followed by HAB are found to the left.
Hab<subj ( @<SUBJ) with relative clause in between
Hab>Advlcase<subj ( @<SUBJ) for Nom; it allows adverbials with Ill/Loc/Com/Ess to be found inbetween HAB and .
Nom>Advlcase<subj ( @<SUBJ) for Nom; it allows adverbials with Ill/Loc/Com/Ess to be found inbetween Nom and @<SUBJ.
<extSubj ( @<SUBJ) for Nom; if copulas to the left, and some kind of adverb, N Loc, time related word or Po to the left of it. OR: if Ill or @Pron< to the left, followed by copulas and the before mentioned to the left of copulas.
<extSubj ( @<SUBJ) for sma Nom; if some kind of adverb to the left, N Loc, time related word or Po to the left of it.
<extSubjA ( @<SUBJ) for A - TEST WITHOUT THIS ONE
<extSubj ( @<SUBJ) for Nom; if leat to the left and sentenceboundary
<extSubj ( @<SUBJ) for Nom, but not for Pers. To the left boahtit or heaŋgát as MAINV, and further to the left is some kind of place related word, or time related word
loc<extSubj ( @<SUBJ) for Nom
<spred (@<SPRED) for Nom; if Nom to the left, copulas to the left of Nom, and a time related word to the left of it.
<extQst1 ( @<SUBJ) for Nom; in an existential sentence. To your left is hab, some kind of place or time-word or Po. This is a Qst-sentence so the qst-pcle is attached to leat or following leat
<extQst2 ( @<SUBJ) for Nom; in an existential sentence. To your left is leat and it is sentence initial. No attributes or other words are allowed inbetween (because then you are SPRED), except the attribute muhtun, muhtin
extQst3> ( @SUBJ>) for Nom; if habitive first one to the left, followed by copulas.
extQst3> ( @SUBJ>) for Nom; if habitive first one to the left, followed by copulas.
<extsubjcoor ( @<SUBJ) for Nom. Coordination
Sem/Year
<spredQst (@<SPRED) for Nom; in a typical question sentence; You are not allowed to be Pers or human. The special part is that Nom is not allowed to your right
<spredQst2 (@<SPRED) for (A Nom); in a typical question sentence; You are SPRED if (N Nom) is to your left and leat + qst is to the left
<spredQst3 (@<SPRED) for (A Nom); you are SPRED when you are (A Nom) and to your right is (N Nom). This is a Qst-sentence, so copulas is found to your left
<spredQst4 (@<SPRED) for Nom; but only in a qst-sentence where there is no chance of you being the subj
<NomBeforeSpred (@<SPRED) for (A Nom) if; Nom to the left, and copulas is to the left of Nom. There is no Nom allowed to the right of copulas! To avoid messing with coordination: ja, dahje and comma are not allowed to your left. Comma is not allowed to your right; if so then you are likely to be coordinated
<spred (@<SPRED) for A Nom or N Nom if; the subject Nom is on the same side of copulas as you: on the right side of copulas
<spredVeara (@<SPRED) for veara + Nom; if genitive immediately to the right, and intransitive mainverb to the right of genitive
leftCop<spred (@<SPRED) for Nom; if copulas is the main verb to the left, and there is no Ess found to the left of cop (note that Loc is allowed between target and cop). OR: if you are Coll or Sem/Group with copulas to your left.
<spredLocEXPERIMENT (@<SPRED) for material Loc; if you are to the right of copulas, and the Nom to the left of copulas is not a hab-actor
NumTime (@<SPRED) for A Nom
<spredSg (@<SPRED) for Sg Nom
<spredPg (@<SPRED) for Pl Nom
<spred (@<SPRED) for Nom; if copulas to the left, and Nom or sentence boundary to the left of copulas. First one to the right is EOS.
COP<spredEss (@<SPRED) for N Ess
spredEss> (@SPRED>) for N Ess; if copulas to the right of you, and if an NP with nom-case first one to your left.
GalleSpred> (@SPRED>) for Num Nom; if sentence initial
spredSgMII> (@SPRED>)
spredšaddat> (@SPRED>)
r492> (@SPRED>) for Interr Gen; consisting only of negations. You are not allowed to be MII. You are not allowed to have an adjective or noun to your right. You are not allowed to have a verb to your right; the exception being an aux.
AdjSpredSg> (@SPRED>) for A Sg Nom; if copulas to the right, but not if A or @<SPRED are found to the right of copulas
Spred>SubjInf (@SPRED>) for Nom; if copulas to the right, and the subject of copulas is an Inf to the right
spredCoord (@<SPRED) coordination for Nom; only if there already is a SPRED to the left of CNP. Not if there is some kind of comparison involved.
subj>Sgnr1 (@SUBJ>) for Nom Sg, including Indef Nom if; VFIN + Sg3 or Pl3 to the right (VFIN not allowed to the left)
subj>Du (@SUBJ>) for dual nominatives, including Coll Nom. VFIN + Du3 to the right.
subj>Pl (@SUBJ>) for plural nominatives, including Coll and Sem/Group. VFIN + Pl3 to the right.
subj>Pl (@SUBJ>) for plural nominatives
subj>Sg (@SUBJ>) for Nom Sg; if VFIN + Sg3 to the right.
Sg<subj (@<SUBJ) for Nom Sg; if VFIN Sg3 or Du2 to the left (no HAB allowed to the left).
Du<subj (@<SUBJ) for Nom Coll if; a dual third person verb is found to the left
PlDu<subj (@<SUBJ) for (N Nom Pl), (Sem/Group Nom), (Coll Nom), (Pron Nom Pl) if; a verb is Pl3 or Du3 to your left. The verb is not allowed to be copulas with a place, Loc or time noun to its left
copPl3<subj (@<SUBJ) for Nom Pl; you don’t need to be a noun, only Nom Pl. To the left is copulas and first one to the right is @<SPRED
-fsubj> (@-FSUBJ>) for HUMAN Gen; in a NP-clause. To your right is Actio Nom followed by a noun
f<advl (@-F<ADVL) for infinite adverbials
f<advl (@-F<ADVL) for infinite adverbials
s-boundary=advl> (@ADVL>) for ADVL that resemble s-boundaries. Mainverb to the right.
diibmuadvl> (@ADVL>) for (diibmu Nom) if first one to the right is Num
-fsubj (@-FSUBJ>) for HUMAN Acc after DADJAT verbs
-fobj> (@-FOBJ>) for Acc if in front of abessive, gerundium, actio locative, perfectum participle or infinitive. First one to the right not allowed to be Acc though
-fobj> (@-FOBJ>) for Acc if human with ADVL-case to the left and transitive infinitive OBJ to the right. First one to the right not allowed to be Acc though
advl>mainV (@ADVL>) if; finite mainverb not found to the left, but the finite mainverb is found to the right.
V<advl (@<ADVL) if; finite mainverb found to the left. Not if a comma is found immediately to the left and a finite mainverb is located somewhere to the right of this comma.
advl>v (@ADVL>) if; you are ADVL, time-noun or Sem/Route and there is a finite verb to the right in the clause, or if to your right is: de followed by a finite verb. OR: if you are a time-noun and to your right is: go or sentenceboundary followed by a finite verb
<advlPoPr (@<ADVL) for Po or Pr; if mainverb to the left.
advlPoPr> (@<ADVL) for Po or Pr; if mainverb to the right.
BOSPo> (@ADVL>) for Po; if trapped between BOS to the right and S-BOUNDARY OR COMMA to the left, because the main verb will then automatically be on your right side.
<advlComIll (@<ADVL) only if; you are Com OR Ill. To your left is a mainverb, and to your right a sentenceboundary, because we don’t want there to be another mainverb you potentially could belong to
<advlEOS (@<ADVL) for Po or Pr or Loc; if you are found at the very end of a sentence. A mainverb is needed to the left though.
<advlGen (@<ADVL) for (N Gen) if mainverb to the left and no noun to the right
<opredgohcodit (@<OPRED) for Ess
advlEss> (@<ADVL) for weather and time Ess, if FMAINV to the left.
comma<advlEOS (@<ADVL) for Adv if; mainverb is to the left. Comma to the left and mainverb to the right in the same clause is not allowed
advl>inbetween (@ADVL>) for Adv; if inbetween two sentenceboundaries where no mainverb is present.
comma<advlEOS (@<ADVL) for Adv if; comma found to the left and the finite mainverb to the left of comma. To the right is the end of the sentence.
BOSadvl> (@ADVL>) if; you are N Loc or N Ill and found sentence initially and there is a main verb somewhere to the right. No barrier for the mainverb; based on the thought that first one to your right is probably a sentenceboundary.
cleanupILL<advl (@<ADVL) for N Ill if; there are no boundarysymbols to your left, if you aren’t already @N< OR @APP-N<, and no mainverb is to your left.
cleanupPo (@ADVL) for Po: This rule tags all Po:s as ADVL if they haven’t gotten a tag somewhere along the way.
cleanupPr (@ADVL) for Pr: This rule tags all Pr:s as ADVL if they haven’t gotten a tag somewhere along the way.
-fsubj>asAcc (@-FSUBJ>) for HUMAN Acc; if there is a verb @-F<OBJ to your left
-f<obj (@-F<OBJ) for Acc if there is a transitive verb + SYN-V to your left
-fsubj>IV (@-FSUBJ>) for Acc; if there is an IV-verb acting as a @-F<OBJ to your right
-fsubj>IV (@-FSUBJ>) for Acc; if there is a TV-verb acting as a @-F<OBJ to your right followed by an Acc
-fsubj>asGen (@-FSUBJ>) for Gen;
f<subj (@-F<SUBJ) for Nom if; (V @-F<OBJ) to the left.
<opredAAcc (@<OPRED) for A Acc; if another accusative to the left, and a transitive verb to the left of it. OR: if a transitive verb to the left, and an accusative to the left of it.
TV<obj (@<OBJ) for Acc; if there is a transitive mainverb to the left in the clause. Not for Rel. Not if you are a numeral followed by a measure-noun
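To make the shape of these mapping rules concrete, here is a heavily simplified Python model of the TV<obj rule just described. It is a sketch under stated assumptions: the whole token list is treated as one clause, the measure-noun restriction is left out, the main verb is recognised by the tags `V`, `TV` and `@+FMAINV`, and the function name `map_tv_obj` and the tags assigned to the example words are hypothetical.

```python
def map_tv_obj(cohorts):
    """Return copies of the cohorts, adding @<OBJ to Acc cohorts (not Rel)
    that have a transitive main verb somewhere to their left."""
    result = []
    seen_tv_main_verb = False
    for tags in cohorts:
        tags = set(tags)
        if "Acc" in tags and "Rel" not in tags and seen_tv_main_verb:
            tags.add("@<OBJ")
        if {"V", "TV", "@+FMAINV"} <= tags:
            seen_tv_main_verb = True
        result.append(tags)
    return result


# "Son goarru gávtti." from the @<OBJ example above; the tags are assumed.
sentence = [
    {"Pron", "Pers", "Sg3", "Nom"},   # Son
    {"V", "TV", "@+FMAINV"},          # goarru
    {"N", "Sg", "Acc"},               # gávtti
]
mapped = map_tv_obj(sentence)
```

In this model only gávtti receives `@<OBJ`, which agrees with the example given for that tag earlier in the file. The real rule has more context conditions than this sketch shows.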

sma object

<advlMeasr (@<ADVL) for (Num Acc); if finite IV-mainverb to the left, measure-noun to the right
<objMeasr (@<OBJ) for Num Acc; if finite TV-mainverb to the left, measure-noun to the right
<advlMeasr2 (@<ADVL) for MEASR-N + Acc; if (Num Pl) to the left and mainverb to the left of it
advlMeasr> (@ADVL>) for Num Acc;
Obj> (@OBJ>) for Acc; if there is a finite mainverb to the right in the clause. A really simple rule with no other restrictions..
s-boun<obj (@<OBJ) for Acc; if sentenceboundary to your left and a transitive mainverb further to the left
<objIV (@<OBJ) for Acc; if there is an intransitive mainverb in the clause. Not for Rel or Num. Not if you are a numeral followed by a measure-noun
<advlEss (@<ADVL) for ESS-ADVL if; FMAINV to the left
IV<spredEss (@<SPRED) for N Ess if; FMAINV to the left is intransitive or bargat
<opredEss (@<OPRED) for (N Ess), (A Ess) if; transitive mainverb to the left in the clause. If accusative to the left or to the right, or if Inf or ahte to the right, or if there is a noun to the right followed by an Inf
Acc<opredEss (@<OPRED) for (N Ess), (A Ess) if; transitive mainverb to the left in the clause, and an accusative cased Rel left to the verb
onlyV<opred (@<OPRED) for (N Ess) if; there is a transitive mainverb to the left. Usually there needs to be an Acc to the left, but here it is not needed
onlyV<opred2 (@<OPRED) for (N Ess) if;

SUBJ MAPPING - leftovers

subj>ifV (@SUBJ>) for NP-HEAD-NOM, DUPRON or (Num Nom) if; a finite mainverb is found to the right. This is a cleanup rule for subjects
hnoun>ifV (@SUBJ>) for NP-HEAD-NOM, DUPRON if. The counterpart of subj>ifV. You are HNOUN if there is a finite verb to your right, but NOT if there is a finite verb after a relative clause

OBJ MAPPING - leftovers

MAPPING for MT - experimental

HNOUN MAPPING

@<ADVLcoor (@<ADVL) for ADVLCASEAdv if @CNP to the left and ADVL to the left of it

missingX adds @X to all missings

therestX adds @X to everything that is left, often erroneously disambiguated forms

For Apertium:

The analysis gives double readings because of optional semtags. We go for the one with the semtag.
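The preference for the reading with a semantic tag can be sketched in Python as follows. The helper name `prefer_semtag` and the sample readings are assumptions for illustration; the grammar itself expresses this with CG rules.

```python
def prefer_semtag(readings):
    """Keep the readings that carry a Sem/ tag when at least one exists;
    otherwise return the readings unchanged."""
    with_sem = [r for r in readings if any(t.startswith("Sem/") for t in r)]
    return with_sem if with_sem else readings


# Hypothetical double analysis that differs only in the optional semtag.
readings = [
    ["N", "Prop", "Sg", "Nom"],
    ["N", "Prop", "Sem/Plc", "Sg", "Nom"],
]
chosen = prefer_semtag(readings)
```

When no reading has a semtag, nothing is removed, so a cohort is never left without a reading.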

This (part of) documentation was generated from src/cg3/functions.cg3

src-fst-morphology-affixes-adjectives.lexc.md

Adjective inflection

The Wangkajunga language adjectives compare.

This (part of) documentation was generated from src/fst/morphology/affixes/adjectives.lexc

src-fst-morphology-affixes-nouns.lexc.md

Noun inflection

The Wangkajunga language nouns inflect in cases.

temporal and spatial nouns - have a limited set of specific case endings, and do not have pronoun clitics

demonstrative lexicons

This (part of) documentation was generated from src/fst/morphology/affixes/nouns.lexc

src-fst-morphology-affixes-prefixes.lexc.md

Prefixes

Prefixes in the Wangkajunga language are bound to the beginning of other words.

This (part of) documentation was generated from src/fst/morphology/affixes/prefixes.lexc

src-fst-morphology-affixes-propernouns.lexc.md

Proper noun inflection

The Wangkajunga language proper nouns inflect in the same cases as regular nouns, but with a colon (‘:’) as separator.

This (part of) documentation was generated from src/fst/morphology/affixes/propernouns.lexc

src-fst-morphology-affixes-symbols.lexc.md

Symbol affixes

This (part of) documentation was generated from src/fst/morphology/affixes/symbols.lexc

src-fst-morphology-affixes-verbs.lexc.md

Verb inflection

The Wangkajunga language verbs inflect in persons.

lexicon Verb_prefixes (above) -> lexicon Verb_stems (separate file) -> following lexicons, depending on relevant conjugation:

is positioning of +V here okay? or better with separate lexicon / before prefixes?

This (part of) documentation was generated from src/fst/morphology/affixes/verbs.lexc

src-fst-morphology-clitics.lexc.md

Pronoun clitics (quite long)

This (part of) documentation was generated from src/fst/morphology/clitics.lexc

src-fst-morphology-phonology.twolc.md

The Wangkajunga morphophonological/twolc rules file

pilyurr%>^P^A
pilyurr%>pa

This (part of) documentation was generated from src/fst/morphology/phonology.twolc

src-fst-morphology-root.lexc.md

Documenting the Wangkajunga root.lexc file

This file documents the Wangkajunga root.lexc file.

Analysis symbols

The morphological analyses of wordforms for the Wangkajunga language are presented in this system in terms of the following symbols. (It is highly suggested to follow existing standards when adding new tags).

The parts-of-speech are:

+N
+A
+Adv
+V
+Pron
+CS
+CC
+Adp
+Po
+Pr
+Interj
+Pcle
+Num

Transitivity:

+IV Intransitive (i.e. with Abs)
+TV Transitive (i.e. with Erg + Abs)

nominal cases

+Abs
+Erg
+Dat
+Abl
+Gen
+Loc
+Perl
+All
+Avoid

Derivational tags

+Der/Foc = derivational tags
+Der/SpatAbl
+Der/SpatAll
+Der/TempLoc

Other tags

+Inch NB from the reference book, inchoative is used as verbalisation

pronoun clitics

+Pron/Clt
+1Sg +2Sg +3Sg
+1Du +2Du +3Du
+1Pl +2Pl +3Pl
+Incl +Excl
+Acs +Refl Acs = Accessory = locative or allative
+Subj Subj = abs with intransitive verb, erg with transitive verb

other cases are declared elsewhere - Dat, Abs, Abl.

Verb affixes

tense inflections

+Prs Present Tense
+Perf +Imprt +Pst +PstNar +Fut imperfect tense inflections for Imperfective: Past, Past Habitual, Future, Imperative
+Imperf +PstHbt

irrealis tense inflections ! TODO: work on tags. Irrealis/Admon? But two separate morphophonemes

+Irr = Irrealis
+Admon = Admonitive
+Int = Intentive
+Unr = Unrealised
+Purp = Purposive
+Oblig = Obligative
+Hyp = Hypothetical
+Char = Characteristic (*payi may behave differently - nominalisation?)

affixes following from irrealis inflections

+Contr Contradictive
+Avoid Avoidance

serial and nominalised inflections

+Ser Serial
+Nomz Nominaliser

verb derivation affixes

+Act -ti nominal -> IT verb. changes position/stance meaning to action.
+Caus/Make -ma nominal -> T verb. (particularly for attributes)
+Caus/PutTo -ju nominal -> T verb.

temporal relative affix

+Trel

verb directional affixes

+Directional/towards ni (suffix / infix)
Directional/away+ maa (prefix)
Directional/around+ parra (prefix)
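The directional tags come in two shapes: `+Directional/towards` has its plus sign in front, while `Directional/away+` and `Directional/around+` have it at the end because they stand for prefixes. A Python sketch of how a consumer might separate the two kinds is given below. The analysis layout (prefix tags first, then lemma, then suffix tags), the placeholder lemma and the function name `split_analysis` are all assumptions.

```python
# Prefix-style tags from the list above (plus sign at the end).
PREFIX_TAGS = ("Directional/away+", "Directional/around+")


def split_analysis(analysis):
    """Return (prefix_tags, lemma, suffix_tags) for an assumed layout of
    prefix tags, then the lemma, then "+"-initial suffix tags."""
    prefixes = []
    rest = analysis
    found = True
    while found:  # peel off prefix tags until none is left at the front
        found = False
        for tag in PREFIX_TAGS:
            if rest.startswith(tag):
                prefixes.append(tag)
                rest = rest[len(tag):]
                found = True
    lemma, *suffixes = rest.split("+")
    return prefixes, lemma, ["+" + s for s in suffixes]


# Hypothetical analyses with a placeholder lemma.
with_prefix = split_analysis("Directional/away+lemma+V+IV+Prs")
with_suffix = split_analysis("lemma+V+Directional/towards")
```

Matching the known prefix tags first avoids mistaking `Directional/away` for the lemma, which a plain split on `+` would do.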

verb post-inflection affixes

+Grp = group (different to GROUP as derivational nominal suffix)
+Compl = completed action
+Warn = warning

verb compounds

+Compound/put = jurra
+Compound/hit = puwa
+Compound/have = kanyila
+Compound/carry = kati
+Compound/go = yarra
+Compound/lie = ngarrin
+Compound/get_up = pakala
+Compound/mouth_action = jarra

Reduplication

+Redpl Redpl+

Clitics

^P^A +Pa =
+Clt/Foc =
+Clt/Prob = kirli
+Clt/contrary_to_expectation lka
+Clt/really =
+Clt/Cert = ngulyu
+Clt/Rep = nyu
+Clt/Dub = pa
+Clt/Emph = kaja, rtuka, rtu
+Clt/while = kaji
+Clt/when = la
+Clt/then = yila, lta
+Voc = voc

Demonstrative affixes #TODO: add more meaning to tag names?

+SentMod
+Dem/ngula +Dem/pa only with yangka. In book +Rel +Pa
+Dem/na +Dem/janu in book +Foc +Abl
+Dem/janulu only with palunya

Flag diacritics for verb conjugations

@U.CONJ.Ø@
@U.CONJ.WA@
@U.CONJ.RRA@
@U.CONJ.LA@
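Flags of the form `@U.FEATURE.VALUE@` are unification flags: the first one on a path sets the feature, and every later one must agree with it. The Python sketch below models only that behaviour. The function name `u_flags_agree` and the sample paths are invented for the example, and how the lexicons actually place these flags is not shown in this file.

```python
def u_flags_agree(symbols):
    """Return True if all @U.FEATURE.VALUE@ flags on a path are consistent."""
    state = {}
    for sym in symbols:
        if not (sym.startswith("@U.") and sym.endswith("@")):
            continue  # ordinary symbol, no flag check
        _, feature, value = sym.strip("@").split(".")
        # setdefault stores the first value and returns the stored one.
        if state.setdefault(feature, value) != value:
            return False
    return True


# Hypothetical paths: a stem marked WA followed by WA or LA endings.
same_class = ["@U.CONJ.WA@", "s", "t", "e", "m", "@U.CONJ.WA@"]
mixed_class = ["@U.CONJ.WA@", "s", "t", "e", "m", "@U.CONJ.LA@"]
```

Under this model `same_class` is accepted and `mixed_class` is rejected, which is how a stem of one conjugation is kept away from the endings of another.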

Flag diacritics for noun cases

DCASE = derivational case

@U.DCASE.HAV@
@U.DCASE.THING@
@U.DCASE.PRIV@
@U.DCASE.INT@
@U.DCASE.WANT@
@U.DCASE.ASST@
@U.DCASE.TEMP@
@U.DCASE.DWELL@
@U.DCASE.SIDE@
@U.DCASE.TYPE@
@U.DCASE.SIM@
@U.DCASE.CONTR@
@U.DCASE.MOD@
@U.DCASE.BIG@
@U.DCASE.ANOTH@
@U.DCASE.VERY@
@U.DCASE.NUM@
@U.DCASE.DUAL@
@U.DCASE.FEW@
@U.DCASE.PL@
@U.DCASE.GRP@
@U.DCASE.PAIR@
@U.DCASE.ONLY@
@U.DCASE.FOC@

corresponding D-flags

@D.DCASE.HAV@
@D.DCASE.THING@
@D.DCASE.PRIV@
@D.DCASE.INT@
@D.DCASE.WANT@
@D.DCASE.ASST@
@D.DCASE.TEMP@
@D.DCASE.DWELL@
@D.DCASE.SIDE@
@D.DCASE.TYPE@
@D.DCASE.SIM@
@D.DCASE.CONTR@
@D.DCASE.MOD@
@D.DCASE.BIG@
@D.DCASE.ANOTH@
@D.DCASE.VERY@
@D.DCASE.NUM@
@D.DCASE.DUAL@
@D.DCASE.FEW@
@D.DCASE.PL@
@D.DCASE.GRP@
@D.DCASE.PAIR@
@D.DCASE.ONLY@
@D.DCASE.FOC@
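A `@D.FEATURE.VALUE@` flag is a disallow test: the path fails if the feature currently holds exactly that value, and passes otherwise. The following Python sketch combines it with the U-flag behaviour so that the interaction can be tried out. It models standard flag-diacritic semantics only; the function name `path_is_allowed` and the sample paths are assumptions, not excerpts from the lexicons.

```python
def path_is_allowed(symbols):
    """Check the U and D flag diacritics on a path, ignoring other symbols."""
    state = {}
    for sym in symbols:
        if not (sym.startswith("@") and sym.endswith("@")):
            continue  # ordinary symbol
        parts = sym.strip("@").split(".")
        if len(parts) != 3:
            continue  # not a three-part flag; outside this sketch
        op, feature, value = parts
        if op == "U":
            # Unify: set the feature if unset, otherwise values must match.
            if state.setdefault(feature, value) != value:
                return False
        elif op == "D":
            # Disallow: fail if the feature already has this value.
            if state.get(feature) == value:
                return False
    return True


repeated = ["@U.DCASE.HAV@", "@D.DCASE.HAV@"]
different = ["@U.DCASE.HAV@", "@D.DCASE.PRIV@"]
unset = ["@D.DCASE.HAV@"]
```

In this model `repeated` is blocked, while `different` and `unset` pass, because a D-flag only objects to the value it names.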

SCASE = semantic case

@U.SCASE.ABL@
@U.SCASE.GEN@
@U.SCASE.LOC@
@U.SCASE.PERL@
@U.SCASE.ALL@

corresponding D-flags

@D.SCASE.ABL@
@D.SCASE.GEN@
@D.SCASE.LOC@
@D.SCASE.PERL@
@D.SCASE.ALL@

Flag diacritics for clitics (to ensure the same clitic does not appear twice on a single word)

CLT = clitic

@U.CLT.FOC@
@U.CLT.KIRLI@
@U.CLT.LKA@
@U.CLT.YILTA@
@U.CLT.CERT@
@U.CLT.REP@
@U.CLT.DUB@
@U.CLT.EMPH@
@U.CLT.KAJI@
@U.CLT.LA@
@U.CLT.LTA@
@U.CLT.YILA@

corresponding D-flags

@D.CLT.FOC@
@D.CLT.KIRLI@
@D.CLT.LKA@
@D.CLT.YILTA@
@D.CLT.CERT@
@D.CLT.REP@
@D.CLT.DUB@
@D.CLT.EMPH@
@D.CLT.KAJI@
@D.CLT.LA@
@D.CLT.LTA@
@D.CLT.YILA@
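
As a rough illustration of how a U-flag and its corresponding D-flag can stop a clitic from repeating, the sketch below simulates the standard unify (U) and disallow (D) operations in Python. The function name `path_allowed` and the ordering of the flags (a D-flag before the clitic, a U-flag after it) are assumptions made for the example; the actual placement in the lexc source is not shown in this documentation.

```python
import re

# Matches flags such as @U.CLT.FOC@ or @D.CLT.FOC@.
FLAG = re.compile(r"@([UD])\.([^.@]+)\.([^.@]+)@")


def path_allowed(flags):
    """Return True if a sequence of U/D flag diacritics is accepted."""
    features = {}  # feature name -> value currently set on this path
    for flag in flags:
        match = FLAG.fullmatch(flag)
        if match is None:
            raise ValueError(f"not a U or D flag: {flag}")
        op, feature, value = match.groups()
        current = features.get(feature)
        if op == "D":
            # Disallow: block the path if the feature already has this value.
            if current == value:
                return False
        else:
            # Unify: set the feature if unset, otherwise demand the same value.
            if current is None:
                features[feature] = value
            elif current != value:
                return False
    return True


# Assumed guard order: D-flag before the clitic, U-flag after it.
once = ["@D.CLT.FOC@", "@U.CLT.FOC@"]
twice = once + once

print(path_allowed(once))   # True
print(path_allowed(twice))  # False
```

The second occurrence is rejected because the D-flag finds the value already set by the first U-flag.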

Flag diacritics for pronoun clitics (to ensure the same case is not used twice within a cluster).

CLCASE = pronoun clitic case TODO: consider changing name to PCCASE

@U.CLCASE.S@
@U.CLCASE.ABS@
@U.CLCASE.DAT@
@U.CLCASE.ACS@
@U.CLCASE.ABL@
@U.CLCASE.REFL@

corresponding D-flags

@D.CLCASE.S@
@D.CLCASE.ABS@
@D.CLCASE.DAT@
@D.CLCASE.ACS@
@D.CLCASE.ABL@
@D.CLCASE.REFL@

integrate the things to come:

Here are the tags from the template. These and the ones above should be merged.

The parts of speech are further split up into:

+Prop +Pers +Dem +Interr +Refl +Recipr +Rel +Indef +Temp +Spat

The usage extents are marked using the following tags:

+Err/Orth
+Use/-Spell

The nominals are inflected for the following cases and numbers:

+Sg +Du +Pl
+Ess +Nom +Gen +Acc +Ill +Loc +Com +Com/Sh

The possession is marked as such:

+PxSg1 +PxSg2 +PxSg3 +PxDu1 +PxDu2 +PxDu3 +PxPl1 +PxPl2 +PxPl3

The comparative forms are:

+Comp +Superl

Numerals are classified under:

+Attr +Card
+Ord

Verb moods are:

+Ind +Prs +Prt +Cond +Imprt

Other verb forms are:

+Inf +Ger +ConNeg +Neg +PrsPrc +PrfPrc +Sup +VGen +VAbess

Abbreviated words are classified with:

+ABBR +ACR
+Symbol = independent symbols in the text stream, like £, €, ©

Special symbols are classified with:

+CLB +PUNCT +LEFT +RIGHT +MIDDLE

The verbs are syntactically split according to transitivity:

+TV +IV

Special multiword units are analysed with:

+Multi

Non-dictionary words can be recognised with:

+Guess

Question and Focus particles:

+Qst +Foc

Semantics are classified with:

+Sem/Spat
+Sem/Temp
+Sem/Mal
+Sem/Fem
+Sem/Sur
+Sem/Plc
+Sem/Org
+Sem/Obj
+Sem/Ani
+Sem/Hum
+Sem/Plant
+Sem/Group
+Sem/Time
+Sem/Txt
+Sem/Route
+Sem/Measr
+Sem/Wthr
+Sem/Build
+Sem/Edu
+Sem/Veh
+Sem/Clth

Derivations are classified under the morphophonetic form of the suffix, the source and target part-of-speech.

Morphophonology

Flag diacritics

We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics: compounds with verbs are allowed only if the verb is further derived into a noun again.

| Flag | Explanation |
| --- | --- |
| @P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
| @D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
| @C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised |
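
The interplay of the three NeedNoun flags can be sketched with a small Python simulation of the standard P (set), C (clear) and D (disallow) operations. Where each flag sits in the lexicon is an assumption made for the example: P on a verb that enters a compound, C on a nominalising derivation, and D at the end of the word. The names `accepts`, `SET`, `CLEAR` and `CHECK` are hypothetical.

```python
def accepts(path):
    """Check a path of (operation, feature, value) flag triples."""
    state = {}  # feature name -> value currently set on this path
    for op, feature, value in path:
        if op == "P":
            # Positive set: assign the value unconditionally.
            state[feature] = value
        elif op == "C":
            # Clear: forget any value of the feature.
            state.pop(feature, None)
        elif op == "D":
            # Disallow: block the path if the feature has this value.
            if state.get(feature) == value:
                return False
        else:
            raise ValueError(f"unsupported operation: {op}")
    return True


SET = ("P", "NeedNoun", "ON")    # @P.NeedNoun.ON@
CLEAR = ("C", "NeedNoun", None)  # @C.NeedNoun@
CHECK = ("D", "NeedNoun", "ON")  # @D.NeedNoun.ON@

# Assumed placement: verb in compound sets, nominalisation clears, word end checks.
print(accepts([SET, CHECK]))         # False: verb compound, never nominalised
print(accepts([SET, CLEAR, CHECK]))  # True: verb compound, then nominalised
print(accepts([CHECK]))              # True: no verb compound involved
```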

For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.

| Flag | Explanation |
| --- | --- |
| @P.CmpFrst.FALSE@ | Require that words tagged as such only appear first |
| @D.CmpPref.TRUE@ | Block such words from entering ENDLEX |
| @P.CmpPref.FALSE@ | Block these words from making further compounds |
| @D.CmpLast.TRUE@ | Block such words from entering R |
| @D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding |
| @U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding |
| @P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R |
| @D.CmpOnly.FALSE@ | Disallow words coming directly from root. |

Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.

| Flag | Explanation |
| --- | --- |
| @U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj. |
| @U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj. |

LEXICON Root

The word forms in the Wangkajunga language start from the lexeme roots of the basic word classes, or optionally from prefixes:

This (part of) documentation was generated from src/fst/morphology/root.lexc

src-fst-morphology-stems-adjectives.lexc.md

Adjectives

Adjectives in the Wangkajunga language describe things.

This (part of) documentation was generated from src/fst/morphology/stems/adjectives.lexc

src-fst-morphology-stems-closed.lexc.md

Closed parts of speech

This file contains closed parts of speech. It might be split later on. Each POS first gets a lexicon for the tag, then a lexicon for the words, which points to the tag lexicon.

Interjections

Particles

Conjunctions

This (part of) documentation was generated from src/fst/morphology/stems/closed.lexc

src-fst-morphology-stems-nouns.lexc.md

Nouns

Nouns in the Wangkajunga language are things.

This (part of) documentation was generated from src/fst/morphology/stems/nouns.lexc

src-fst-morphology-stems-numerals.lexc.md

Numerals

Numerals in the Wangkajunga language are numbers.

This (part of) documentation was generated from src/fst/morphology/stems/numerals.lexc

src-fst-morphology-stems-pronouns.lexc.md

Pronouns

Pronouns in the Wangkajunga language are references to things.

This (part of) documentation was generated from src/fst/morphology/stems/pronouns.lexc

src-fst-morphology-stems-verbs.lexc.md

Verbs

Verbs in the Wangkajunga language are actions.

This (part of) documentation was generated from src/fst/morphology/stems/verbs.lexc

src-fst-phonetics-txt2ipa.xfscript.md

retroflex plosive, voiceless t` ʈ 0288, 648 (` = ASCII 096)
retroflex plosive, voiced d` ɖ 0256, 598
labiodental nasal F ɱ 0271, 625
retroflex nasal n` ɳ 0273, 627
palatal nasal J ɲ 0272, 626
velar nasal N ŋ 014B, 331
uvular nasal N\ ɴ 0274, 628

bilabial trill B\ ʙ 0299, 665
uvular trill R\ ʀ 0280, 640
alveolar tap 4 ɾ 027E, 638
retroflex flap r` ɽ 027D, 637
bilabial fricative, voiceless p\ ɸ 0278, 632
bilabial fricative, voiced B β 03B2, 946
dental fricative, voiceless T θ 03B8, 952
dental fricative, voiced D ð 00F0, 240
postalveolar fricative, voiceless S ʃ 0283, 643
postalveolar fricative, voiced Z ʒ 0292, 658
retroflex fricative, voiceless s` ʂ 0282, 642
retroflex fricative, voiced z` ʐ 0290, 656
palatal fricative, voiceless C ç 00E7, 231
palatal fricative, voiced j\ ʝ 029D, 669
velar fricative, voiced G ɣ 0263, 611
uvular fricative, voiceless X χ 03C7, 967
uvular fricative, voiced R ʁ 0281, 641
pharyngeal fricative, voiceless X\ ħ 0127, 295
pharyngeal fricative, voiced ?\ ʕ 0295, 661
glottal fricative, voiced h\ ɦ 0266, 614

alveolar lateral fricative, vl. K
alveolar lateral fricative, vd. K\

labiodental approximant P (or v\)
alveolar approximant r\
retroflex approximant r`
velar approximant M\

retroflex lateral approximant l`
palatal lateral approximant L
velar lateral approximant L\

Clicks

bilabial O\ (O = capital letter)
dental |\
(post)alveolar !\
palatoalveolar =\
alveolar lateral |\|\

Ejectives, implosives

ejective _> e.g. ejective p p_>
implosive _< e.g. implosive b b_<

Vowels

close back unrounded M
close central unrounded 1
close central rounded }
lax i I
lax y Y
lax u U

close-mid front rounded 2
close-mid central unrounded @\
close-mid central rounded 8
close-mid back unrounded 7

schwa ə @

open-mid front unrounded E
open-mid front rounded 9
open-mid central unrounded 3
open-mid central rounded 3\
open-mid back unrounded V
open-mid back rounded O

ash (ae digraph) {
open schwa (turned a) 6

open front rounded &
open back unrounded A
open back rounded Q

Other symbols

voiceless labial-velar fricative W
voiced labial-palatal approx. H
voiceless epiglottal fricative H\
voiced epiglottal fricative <\
epiglottal plosive >\

alveolo-palatal fricative, vl. s\
alveolo-palatal fricative, voiced z\
alveolar lateral flap l\
simultaneous S and x x\
tie bar _

Suprasegmentals

primary stress "
secondary stress %
long :
half-long :\
extra-short _X
linking mark -\

Tones and word accents

level extra high _T
level high _H
level mid _M
level low _L
level extra low _B
downstep !
upstep ^ (caret, circumflex)

contour, rising _R
contour, falling _F
contour, high rising _H_T
contour, low rising _B_L

contour, rising-falling _R_F
(NB Instead of being written as diacritics with _, all prosodic marks can alternatively be placed in a separate tier, set off by < >, as recommended for the next two symbols.)
global rise <R>
global fall <F>

Diacritics

voiceless _0 (0 = figure), e.g. n_0
voiced _v
aspirated _h
more rounded _O (O = letter)
less rounded _c
advanced _+
retracted _-
centralized _"
syllabic = (or _=) e.g. n= (or n_=)
non-syllabic _^
rhoticity `

breathy voiced _t
creaky voiced _k
linguolabial _N
labialized _w
palatalized ' (or _j) e.g. t' (or t_j)
velarized _G
pharyngealized _?\

dental _d
apical _a
laminal _m
nasalized ~ (or _~) e.g. A~ (or A_~)
nasal release _n
lateral release _l
no audible release _}

velarized or pharyngealized _e
velarized l, alternatively 5
raised _r
lowered _o
advanced tongue root _A
retracted tongue root _q
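
The table above pairs ASCII symbols (the notation matches X-SAMPA) with IPA characters and their code points. A minimal Python sketch of such a conversion is given below, using a handful of rows from the table. It is not the content of txt2ipa.xfscript; the names `ASCII_TO_IPA` and `to_ipa` and the input strings are made up for the example. Longer symbols are tried first so that `N\` is not read as `N` followed by a stray backslash.

```python
# A few rows copied from the table; the full table is much larger.
ASCII_TO_IPA = {
    "J": "ɲ",    # palatal nasal
    "N": "ŋ",    # velar nasal
    "N\\": "ɴ",  # uvular nasal (N followed by a backslash)
    "4": "ɾ",    # alveolar tap
    "S": "ʃ",    # postalveolar fricative, voiceless
    "Z": "ʒ",    # postalveolar fricative, voiced
    "z`": "ʐ",   # retroflex fricative, voiced
}


def to_ipa(text):
    """Replace known ASCII symbols with IPA, longest symbol first."""
    keys = sorted(ASCII_TO_IPA, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(ASCII_TO_IPA[key])
                i += len(key)
                break
        else:
            # Unknown characters are passed through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(to_ipa("aNa"))    # aŋa
print(to_ipa("aN\\a"))  # aɴa
print(hex(ord(ASCII_TO_IPA["N"])), ord(ASCII_TO_IPA["N"]))  # 0x14b 331
```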

This (part of) documentation was generated from src/fst/phonetics/txt2ipa.xfscript

src-fst-transcriptions-transcriptor-abbrevs2text.lexc.md

We describe here how abbreviations in Wangkajunga are read out, e.g. for text-to-speech systems.

For example:

s.:syntynyt # ;
os.:omaa% sukua # ;
v.:vuosi # ;
v.:vuonna # ;
esim.:esimerkki # ;
esim.:esimerkiksi # ;
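
To make the entry format concrete, the sketch below reads the example lines above into a Python dictionary from abbreviation to possible readings. It assumes the usual lexc conventions (`upper:lower Continuation ;`, with `%` escaping the following character); the names `read_expansions` and `unescape` are hypothetical, and the real transducer is compiled from the lexc file rather than parsed like this.

```python
import re
from collections import defaultdict

ENTRIES = """\
s.:syntynyt # ;
os.:omaa% sukua # ;
v.:vuosi # ;
v.:vuonna # ;
esim.:esimerkki # ;
esim.:esimerkiksi # ;
"""

# upper:lower continuation ;  where %x is an escaped character x.
ENTRY = re.compile(
    r"^(?P<upper>(?:%.|[^:%\s])+):(?P<lower>(?:%.|[^%\s])+)\s+(?P<cont>\S+)\s*;\s*$"
)


def unescape(text):
    """Drop the % escape marker, keeping the escaped character."""
    return re.sub(r"%(.)", r"\1", text)


def read_expansions(text):
    """Map each abbreviation to the list of readings given for it."""
    table = defaultdict(list)
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            table[unescape(match["upper"])].append(unescape(match["lower"]))
    return dict(table)


expansions = read_expansions(ENTRIES)
print(expansions["os."])  # ['omaa sukua']
print(expansions["v."])   # ['vuosi', 'vuonna']
```

An abbreviation may have more than one reading, as with `v.` here, so each key maps to a list.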

This (part of) documentation was generated from src/fst/transcriptions/transcriptor-abbrevs2text.lexc

src-fst-transcriptions-transcriptor-numbers-digit2text.lexc.md

% komma% :, Root ;
% tjuohkkis% :%. Root ;
% kolon% :%: Root ;
% sárggis% :%- Root ;
% násti% :%* Root ;

This (part of) documentation was generated from src/fst/transcriptions/transcriptor-numbers-digit2text.lexc

src-fst-transcriptions-transcriptor-symbols2text.lexc.md

This file contains mappings from abbreviations and some acronyms to full forms for text-to-speech purposes. This is a supplement to the analyser; the analyser must tag the strings as +ABBR or similar for the transcriptions to work. The resulting full form must be lemmas known to the analyser, for further processing.

We describe here how abbreviations in Wangkajunga are read out, for text-to-speech systems.

The file contains:

miscellaneous symbols
smileys
Clause boundary symbols
Single punctuation marks
Paired punctuation marks

This (part of) documentation was generated from src/fst/transcriptions/transcriptor-symbols2text.lexc

tools-grammarcheckers-grammarchecker.cg3.md

[ L A N G U A G E ] G R A M M A R C H E C K E R

Wangkajunga language model documentation

src-cg3-dependency.cg3.md

C O M M O N S Á M I D E P E N D E N C Y G R A M M A R

DELIMITERS

TAGS AND SETS

POS sub-categories

Syntactic tags and sets

Syntactic tags in input to this file

Syntactic tags added in this file

fao syntags

kal syntags

eus syntags

Syntactic set definitions

Dep grammar

The finite verb

Mapping rules

src-cg3-disambiguator.cg3.md

Start making a syntactic disambiguator

Sets

Part-of-Speech

Numerus

Cases

Types

Sets with more members

src-cg3-functions.cg3.md

Syntactic tags

Tag sets

Numeral outside the sentence

HABITIVE MAPPING

sma object

SUBJ MAPPING - leftovers

OBJ MAPPING - leftovers

MAPPING for MT - experimental

HNOUN MAPPING

missingX adds @X to all missings

therestX adds @X to everything that is left, often erroneously disambiguated forms

For Apertium:

src-fst-morphology-affixes-adjectives.lexc.md

src-fst-morphology-affixes-nouns.lexc.md

src-fst-morphology-affixes-prefixes.lexc.md

src-fst-morphology-affixes-propernouns.lexc.md

src-fst-morphology-affixes-symbols.lexc.md

Symbol affixes

src-fst-morphology-affixes-verbs.lexc.md

src-fst-morphology-clitics.lexc.md

src-fst-morphology-phonology.twolc.md

src-fst-morphology-root.lexc.md

Documenting the Wangkajunga root.lexc file

Analysis symbols

The parts-of-speech are:

Transitivity:

nominal cases

Derivational tags

Other tags

pronoun clitics

Verb affixes

tense inflections

irrealis tense inflections ! TODO: work on tags. Irrealis/Admon? But two separate morphophonemes

affixes following from irrealis inflections

serial and nominalised inflections

verb derivation affixes

temporal relative affix

verb directional affixes

verb post-inflection affixes

verb compounds

Reduplication

Clitics

Demonstrative affixes #TODO: add more meaning to tag names?

Flag diacritics for verb conjugations

Flag diacritics for noun cases

DCASE = derivational case

corresponding D-flags

SCASE = semantic case

corresponding D-flags

Flag diacritics for clitics (to ensure the same clitic does not appear twice on a single word)

CLT = clitic

corresponding D-flags

Flag diacritics for pronoun clitics (to ensure the same case is not used twice within a cluster).

CLCASE = pronoun clitic case TODO: consider changing name to PCCASE

corresponding D-flags