Inari Sámi NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-smn

INARI SAAMI GRAMMAR CHECKER

Development setup

Compiling the grammarchecker:

cd $GTLANGS/lang-smn
./autogen.sh
./configure --with-hfst --enable-syntax --enable-grammarchecker --enable-tokenisers --enable-alignment --enable-reversed-intersect
make
cd tools/grammarcheckers
make dev

Then edit/test as follows:

echo "Sun ij puátá." | sh modes/smngram.mode
echo "Sun ij puátá." | sh modes/smngram-release.mode

The modes folder contains many more modes; browse it to see what is available. If you use Emacs and have cg-mode installed, you can run the analysis with C-c C-i / C-c C-c. See also the documentation on grammarchecker testing.

Delimiters, tags and sets

Sentence delimiters are the following: <.> <!> <?> <…> <¶>
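As a rough illustration of what these delimiters do, the following Python sketch cuts a stream of wordforms into sentence windows, closing a window after each delimiter. The function name and the token list are made up for the example; the real segmentation happens inside the CG-3 engine and may differ in detail.

```python
# Hypothetical sketch: how sentence delimiters partition a token stream.
# The delimiter wordforms are taken from the list above; everything else
# (function name, input format) is an assumption made for illustration.
DELIMITERS = {"<.>", "<!>", "<?>", "<…>", "<¶>"}

def split_windows(wordforms):
    """Group wordforms into windows, ending each window at a delimiter."""
    windows, current = [], []
    for form in wordforms:
        current.append(form)
        if form in DELIMITERS:
            windows.append(current)
            current = []
    if current:  # trailing material without a final delimiter
        windows.append(current)
    return windows

tokens = ["<Sun>", "<ij>", "<puátá>", "<.>", "<Mun>", "<puáđám>", "<!>"]
windows = split_windows(tokens)
```

Rules in the grammar checker only see one such window at a time, which is why a missing final delimiter can change what a rule is able to match.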

Tags and sets

This section lists all the tags inherited from the FST and used as tags in the syntactic analysis. The next section, Sets, contains sets defined on the basis of the tags listed here; those set names are not visible in the output.

Beginning / end of sentence

BOS EOS

Parts of speech

- N A Adv V Pron CS CC CC-CS Po Pr Pcle Num Interj

- ABBR ACR WEB CLB LEFT RIGHT PPUNCT PUNCT COMMA EXCLMARK ¶ ? MWE Cmp, but two-word muotâsaanijd seems like an error

POS sub-categories

- Pers Dem Interr Indef Recipr Refl Rel Coll NomAg Prop Allegro Arab Rom

Morphosyntactic properties

Nom Acc Gen Ill Loc Com Ess Sg Du Pl Cmp/SplitR Cmp/SgNom Cmp/SgGen PxSg1 PxSg2 PxSg3 PxDu1 PxDu2 PxDu3 PxPl1 PxPl2 PxPl3 Px (= set of all Px)

Comp Superl

Attr Ord

Qst

IV TV Prt Prs Ind Pot Cond Imprt ImprtII

Sg1 Sg2 Sg3 Du1 Du2 Du3 Pl1 Pl2 Pl3

Inf ConNeg Neg PrfPrc VGen PrsPrc Ger Sup Actio VAbess

Clitic particles

Foc/ge Foc/gen Foc/gin Foc/ges Foc/gis Foc/naj Foc/ba Foc/be Foc/hal Foc/han Foc/bat Foc/son

Derivation

Der/Pass Der/NomAg Actor (= NomAg and Der/NomAg) Der/alla Der/d Der/Car Der/Caus Der/lasj Der/NomAct Der/st Der/upmi Der/vuota Der/InchL Der/Dimin Der/Aadv Der/Comp Der/Superl

Error tags

Secondary tags

Semantic tags

HUMAN

This ends the semtag list.

TIME-N-SET NOT-TIME TIME-N

Valency tags

See also the valency file in src/cg3

Syntactic tags

The following work after the mapping rules for verbs:

SETS

Sets containing sets of lists and tags

This part of the file lists a large number of sets based partly upon the tags defined above, and partly upon lexemes drawn from the lexicon. See the source file itself to inspect the sets; what follows here is an overview of the set types.

Sets for Single-word entities

The set go for ko, , and the set INITIAL for initial letters

Sets for word or not

Derivational affixes

Case sets

Verb sets

Sets for person

SG1-V SG2-V SG3-V DU1-V DU2-V DU3-V PL1-V PL2-V PL3-V

Finite verb sets

SG-V, DU-V, PL-V, DU-PL-V, 1-2-V, VNOTSG1 (for all persons other than Sg1), VNOTSG2, …

None so far

Copula sets

(these ones need to be rewritten)

LEDE, LEAN, LEAT, …

QUASI-TV

Mun vuolgim raapâid pajas. Verb accepts accusative in front of adverb.

Pronoun sets

MUN, DON, SON, MOAI, …

Adjective sets

LEX-A, A-CASE, …

This set was removed, for a good reason?

Adverbial sets

LEX-ADV, LEX-ADV-DE, …

Coordinator sets

Foc, NEGFOC, …

Adverbs that have lookalikes

Here come some adverbs that have identical twins in other POS. If these are found in Adv contexts, we treat them as adverbs.

LACCAT-ADV

MOD-NP-ADV

MOD-ADV-ADV

EASKKA

Sets of elements with common syntactic behaviour

Verb sets

V is the set of all readings that contain a V tag; REAL-V should be those without an N tag following the V. The REAL-V set thus awaits a fix to the preprocess V … N bug.
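The intended difference between V and REAL-V can be sketched in Python as follows. A reading is modelled here as a plain list of tags in output order; that representation, the function names and the two sample readings are assumptions made for the example, not something taken from the source file.

```python
# Hypothetical sketch of the V / REAL-V distinction described above.
# A reading is modelled as a list of tags in output order; this
# representation and the function names are assumptions.

def is_v(reading):
    """True if the reading contains a V tag anywhere."""
    return "V" in reading

def is_real_v(reading):
    """True if the reading has a V tag with no N tag after it."""
    if "V" not in reading:
        return False
    after_v = reading[reading.index("V") + 1:]
    return "N" not in after_v

finite = ["V", "IV", "Ind", "Prs", "Sg3"]
derived = ["V", "TV", "Der/NomAct", "N", "Sg", "Nom"]
```

Both sample readings are in V, but only the first would count as REAL-V, since the second ends up as a noun after derivation.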

TRANS-V is the set for verbs that actually take objects.

STRICT-TRANS-V is the set for verbs which don’t let a GenAcc be a modifier of anything other than an object, e.g. Mun organiseren eatni gievkkanis. Here eatni wants to be the object.

Valency sets

Adverb sets

Adjective sets

Lexical valency sets for adjectives. Here the adjectives are grouped according to their semantic properties.

Other adjective sets

A-N, A-N-CASE, …

NP sets defined according to their morphosyntactic features

The PRE-NP-HEAD family of sets

These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression WORD - premodifiers.

The set NOT-NPMOD is used to find barriers between NPs. Typical usage: … (*1 N BARRIER NOT-NPMOD) … meaning: scan to the first noun, ignoring anything that can be part of the noun phrase of that noun (i.e., “scan to the next NP head”).
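The scan can be pictured with a small Python sketch. Each cohort is modelled as a set of tags, and the NPMOD set below is a made-up stand-in for "anything that can be part of the noun phrase"; the real NOT-NPMOD set is defined in the source file. The sketch tests the target before the barrier, which is also an assumption about how the scan behaves.

```python
# Hypothetical sketch of the scan (*1 N BARRIER NOT-NPMOD).
# Each cohort is modelled as a set of tags. The contents of NPMOD below
# are a made-up stand-in; the real NOT-NPMOD set is defined in the source file.
NPMOD = {"A", "Num", "Pron", "Adv"}  # assumed premodifier tags

def scan_to_np_head(cohorts, start):
    """Return the index of the first N cohort to the right of `start`,
    or None if something that cannot modify an NP comes first."""
    for i in range(start + 1, len(cohorts)):
        tags = cohorts[i]
        if "N" in tags:          # target found: the NP head
            return i
        if not (tags & NPMOD):   # barrier: not a possible NP modifier
            return None
    return None

sentence = [{"V"}, {"Pron", "Dem"}, {"A", "Attr"}, {"N"}, {"V"}, {"N"}]
```

Starting from the first verb, the scan passes the demonstrative and the attributive adjective and stops at the noun; starting from that noun, it hits the second verb and gives up.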

Other negatively defined morphosyntactic noun sets

Noun sets

Nominal sets defined morphophonologically

Sets for lexeme homonymy (most of them have been moved to where the actual rules are).

The words in the set N-PO can be both N and Po, the set takes that into account.

Nouns that have dangerous homonyms

Nominal sets defined semantically

OHTA

GEN-ANIMAL, PREDATOR, BIRD, …

Miscellaneous sets

Border sets and their complements

Syntactic sets

These were the set types.

Grammarchecker sets

Naming convention for error tags: &errortype-errorsubtype-is-shouldbe
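To make the convention concrete, here is a minimal Python sketch that splits an error tag into its parts. It assumes that the first part is the error type, that the last two parts are the "is" and "shouldbe" values, and that whatever lies in between is the subtype (possibly empty, as in the real-… tags). That reading of the convention is an assumption, and not every tag in the file need fit it.

```python
# Hypothetical parser for error tags that follow the naming convention
# errortype-errorsubtype-is-shouldbe. Treating the first part as the type,
# the last two as is/shouldbe, and anything in between as the subtype is
# an assumption; not every tag in the file need fit this pattern.
from typing import NamedTuple

class ErrorTag(NamedTuple):
    errortype: str
    subtype: str
    found: str     # the "is" part: what the writer used
    expected: str  # the "shouldbe" part: what the rule suggests

def parse_error_tag(tag):
    parts = tag.lstrip("&").split("-")
    if len(parts) < 3:
        raise ValueError(f"too few parts in error tag: {tag!r}")
    return ErrorTag(parts[0], "-".join(parts[1:-2]), parts[-2], parts[-1])

agr = parse_error_tag("&msyn-agr-sg3-pl3")
real = parse_error_tag("real-pele-peeli")
```

Under these assumptions, msyn-agr-sg3-pl3 reads as a morphosyntactic agreement error where a Sg3 form was found and a Pl3 form is expected.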

RULE SECTION

Speller suggestions rule ADD @typo - make sure the suggestions survive the cg mangling:

Singleton words

Speller suggestions rule &real-love-lope

äigin > ääigi

äigi (Nom) > ääigi (Gen)

moadde kerdi > moddii

Noun phrase internal phenomena

Possessive pronouns > reflexive pronouns

Phrasal verbs

Demonstratives

Agreement rule: msyn-dem-locattr-gen

Noun phrase possessor

Agreement rule: msyn-posspl-acc-gen. Siijđoid/Siijđoi lehâstem tábáhtuvá itten.

Agreement rule: msyn-posspl-acc-gen. coordination: uásálistiđ párnáid/párnái já nuorâi leiráid

Agreement rule: msyn-posspl-acc-gen. coordination: uásálistiđ párnáid/párnái já nuorâi leiráid

Noun phrase complements

N + Ill

Agreement rule: msyn-posspl-ill-gen: uásálistiđ párnáid/párnái leiráid stuorrâhâžžân.

Agreement rule: msyn-ncompl-ess-sgill: Must lii tárbu toorjân/torjui.

Agreement rule: msyn-ncompl-ess-sgill: Maggaar toorjân/torjui sist ličij tárbu.

Agreement rule: msyn-ncompl-placc-plill: Motomeh suomâkielâ vaikuttâsah/tábáhtussáid tábáhtusâid láá tuhhiittum anarâškielân.

Agreement rule: msyn-ncompl-placc-plill

Double possessive rules

Number and case agreement

Det + N agreement

Agreement rule: msyn-det-nom-acc: Puurâ tuoh/tuoid rusinijd!

Attributive forms

mii + nominative should be mii + acc

Agreement rule: msyn-mii-sgnom-placc: Mii historjá/historjáid taat lii

Agreement rule:

Adjectives in attributive position

Agreement rule: msyn-adj-gen-nom (A.Gen + N.Nom)

Agreement rule: msyn-adj-nom-acc

Numeral phrases - case and number of nouns

Quantors in attributive positions

Agreement rule: msyn-quant-nom-gen (A.Gen + N.Nom) # add Ord

Relative pronoun in N + Rel

Quantor phrases

Numeral phrases

Agreement rule: msyn-num-par-gen: Must láá kyehti kyellid/kyele

Agreement rule: msyn-num-gen-par: 8 kaandâ/kandâd

Agreement rule: msyn-num-acc-par: 8 kaandâ/kandâd

Quantor adverbs

Quantor adverb complements

msyn-quant-gen-nom

Verb agreement rules

Indicative person agreement

Sg1

Agreement rule: msyn-agr-sg2-sg1,

Agreement rule: msyn-agr-other-sg1, Mun puátá/puáđám

Agreement rule: msyn-agr-other-sg1

Agreement rule: msyn-v-prfprc-sg1, Subject to the left

Agreement rule: msyn-v-actio-sg1, Subject to the left

Sg2

Agreement rule: msyn-agr-other-sg2, Subject to the left, Tun puátá/puáđah

Sg3

Agreement rule: msyn-agr-sg1-sg3, Subject to the left, Sun puáđam/puátá

Agreement rule: msyn-agr-sg2-sg3, Subject to the left, Sun puáđah/puátá

Agreement rule: msyn-agr-imprt-sg3, Subject to the left, Sun puáđah/puátá

Du1

Agreement rule: msyn-agr-other-du1

Du2

Agreement rule: msyn-agr-other-du2 Tuoi koolgâi/kolgáid tääl algâttiđ monnii sämmilâškampanja.

Du3

Agreement rule: msyn-agr-sg3-du3

Agreement rule: msyn-agr-sg3-du3

Pl1

Agreement rule: msyn-agr-other-pl1, Mij puátá/puáttip.

Pl2

Agreement rule: syn-agr-other-pl2, Tij puátá/puátivetteđ.

Pl3

Sg3/Pl3 errors: incongruence in the 3rd person plural, as in colloquial Finnish

Agreement rule: Subject to the right, msyn-agr-sg3-pl3

Agreement rule: msyn-agr-sg3-du3

Agreement rule: msyn-agr-sg3-pl3, Subject to the left, 80 puátá/puáđah

Agreement rule: msyn-agr-sg3-pl3, Subject to the left, 80 puátá/puáđah

Agreement rule: msyn-agr-other-pl3, Subject to the left, Toh puátá/puátih.

Agreement rule: msyn-agr-other-pl3

Agreement rule: msyn-agr-other-pl3

Imperative rules

Non-finite verb forms

Inf should be Actio Essive

Agreement rule: msyn-orrood-inf-actioess Mun orom leđe/lemin ennuv velgus anarâškielân.

ConNeg in present tense

Agreement rule: &msyn-negcompl-sg3-conneg

Agreement rule: msyn-v-sg3-conneg Sun ij lah/lii.

Agreement rule: msyn-v-du3-conneg Noomah iä vuáđuduv/vuáđuduu.

Agreement rule: msyn-v-sg3-conneg Sun ij puávtáččij/puávtáččii vyelgiđ.

ConNeg in past tense

Agreement rule: msyn-v-sg1-prfprc. Negative gives participle.

Sg1 should be Participle

Agreement rule: msyn-v-sg1-prfprc Sun lii huunjâm/huunnjâm.

Existential sentences

Verb should be plural.

The interference comes from Finnish e-sentences, where the verb is in the singular.

Agreement rule: msyn-extv-sg3-pl3 Must lii/láá uđđâ autoh.

Agreement rule: msyn-extv-sg3-pl3, Differences in the possessive construction: affirmative and negative forms.

Agreement rule: msyn-extv-pl3-sg3

Agreement rule: msyn-extv-pl3-sg3 Liihân/Lááhân must uđđâ autoh.

Agreement rule: msyn-extv-pl3-sg3 Liihân/Lááhân must uđđâ autoh.

Agreement rule: msyn-extv-pl3-sg3 Liihân/Lááhân must uđđâ autoh.

Agreement rule: msyn-extv-pl3-sg3 Lii/Láá must uđđâ autoh.

Agreement rule: msyn-extv-numeral-sg3-pl3

Agreement rule: msyn-extneg-sg3-pl3

Agreement rule: msyn-extneg-sg3-pl3

E-sentences and habitives

Agreement rule: msyn-extsubj-ill-nom

Agreement rule: msyn-extsubj-ill-nom Šiljoost láá poccuuh/poccuid.

Agreement rule: msyn-extsubj-acc-nom

Agreement rule: msyn-extv-sg3-pl3

Agreement rule: msyn-extsubj-ill-nom

Verb agreement outside of existential and habitive sentences

Agreement rule: msyn-sg3-pl3

Subjects

Subjects gen > nom

Finnish necessive construction (nesessiivi): acc > nom

Agreement rule: msyn-ness-acc-nom

Agreement rule: msyn-ness-acc-nom Muu ličij/liččim kolgâm porgâđ taam tállán. (???)

Agreement rule: msyn-ness-acc-nom Ij-uv/Jieh-uv tuu/tun kolgâm vyelgiđ suáluikuávlun Jennyin?

Agreement rule: msyn-ness-acc-nom

Agreement rule: msyn-ness-acc-nom Suu/Sun koolgâi forgâ porgâđ miärádâs.

Agreement rule: msyn-pass-accsubj-nomsubj: Sämikielâlijd/Sämikielâliih nomâttâsâid kiävttojeh uccáá.

Agreement rule: msyn-pass-accsubj-nomsubj Tävirijd/Tävireh láppojii ääitist.

Agreement rule: msyn-pass-accsubj-nomsubj: Páárnán iä adeluu talkkâsijd/talkkâseh tipšopeeivi ääigi.

Subjects in passive: acc > nom

b) The Finnish partitive in passive constructions; in Sámi, the passive takes the nominative, and the verb is in the plural.

Agreement rule: msyn-pass-accsubj-nomsubj

Agreement rule: msyn-pass-accsubj-nomsubj Sämikielâlijd nomâttâsâid iä jur kevttuu.

Agreement rule: syn-top-placc-plnom Anarâškielâlijd/Anarâškielâliih noomâid kiävttoo/kiävttojeh uccáá.

Agreement rule: msyn-top-placc-plnom tiäđuid/tiäđuh ij kavnuu.

Objects

Singular objects

Ordinary singular objects

Hmm, no rules for this, it seems.

LEDE + OBJ + Th-ADJ + TV

Object rule: Object in unexpected post-copula position: Onne lii kandâ/kaandâ älkkee uáiniđ

Topicalised objects

Agreement rule: msyn-top-nom-acc

Agreement rule: msyn-top-nom-acc

Plural objects

These are often put in nominative, due to Finnish plural objects.

Agreement rule: msyn-plobj-nom-acc

Agreement rule: msyn-plobj-nom-acc

Agreement rule: msyn-obj-sgnom-sgacc Mun juuvâm ain mielkki/mielhi.

Agreement rule: msyn-obj-sgnom-sgacc Mun lam ain juunâm mielkki/mielhi.

Agreement rule: msyn-obj-sgnom-sgacc

Agreement rule: msyn-obj-plnom-placc

Agreement rule: msyn-obj-plnom-placc

Agreement rule: msyn-obj-plnom-placc

Agreement rule: msyn-obj-plnom-placc

Agreement rule: msyn-obj-plnom-placc

Agreement rule: msyn-plobj-nom-acc Nubeh tobdeh kuobbâreh/kuobbârijd ivneest.

Agreement rule: msyn-obj-plnom-placc (6 rules) Lam valjim taah säänih/saanijd.

Imperative objects

Nom should be acc in imperative

The affirmative form of the Finnish imperative, in which the object's case form is the nominative plural, whereas Sámi uses the accusative:

Agreement rule: msyn-imp-nom-acc

Agreement rule: msyn-imp-nom-acc

Commented out…

Predicative

Acc > Nom in predicative

Agreement rule: msyn-pred-acc-nom Taah láá čielgâ aašijd/ááših.

Agreement rule: msyn-pred-acc-nom Lii-uv toos synonymáid/synonym

Agreement rule: msyn-pred-acc-nom as previous but -2 leđe

Agreement rule: msyn-pred-acc-nom as previous but -3 leđe

Agreement rule: msyn-pred-acc-nom complements of leđe should be Nom. As previous but -4 leđe

Agreement rule: msyn-pred-ill-nom Lii-uv toos synonymijd/synonym

Agreement error with predicative

The challenge is to avoid cases where the A is not part of the NP, like:

Agreement rule: msyn-predagr-pl3-sg3 Iä/Ij lah toorjâ.

Adjective errors

Agreement rule: msyn-adj-pred-attr Plural adjectives should be Attr in front of N.

Agreement rule: msyn-adj-attr-other

Agreement rule: msyn-adj-attr-pred Tot lii hirmâd

Agreement rule: msyn-adj-attr-pred Mun lam fiskis/fiskâd.

Dimin after ucca rules

Adverbial rules

These rules target adverbial cases, many of them the acc-ill lookalike -âid/-áid.

Adverbial case errors

Acc > Ill

Agreement rule: msyn-obj-acc-ill

Agreement rule: msyn-obj-acc-ill Ideologia kuáská kielâid/kieláid.

Agreement rule: msyn-obj-acc-ill Ideologia kuáská kielâid/kieláid.

Agreement rule: msyn-obj-acc-ill

Agreement rule: msyn-obj-acc-ill

Agreement rule: msyn-obj-acc-ill

PlLoc > SgCom

PlLoc > SgIll

PlGen > SgIll

Ess > SgIll

Gen > Loc laa pággu

Postposition internal case errors

Agreement rule: msyn-po-nom-gen

Agreement rule: msyn-po-nom-gen

Agreement rule: msyn-po-placc-plgen Vuoigâdvuotä nubástittiđ kielâ jieijâs táárbuid/táárbui mield.

Valency errors

Lexical rules

Realword error rule: real-pisso-pissood

Realword error rule: real-pele-peeli

Realword error rule: real-keesi-keessiv

Realword error rule: real-keessiv-keesi

Easteregg rule

Word order rules

Syntax rule: syn-OVS-OSV

Question particle rules

The grammarchecker file ends here.

AFTER-SECTIONS

SUBSTITUTE MWE (*)


This (part of) documentation was generated from tools/grammarcheckers/grammarchecker.cg3