Inari Sámi NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-smn

INARI SAAMI DISAMBIGUATOR

DELIMITERS

Sentence delimiters are the following: <.> <!> <?> <…> <¶>
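
In CG-3 source this corresponds to a DELIMITERS declaration. The following is a minimal sketch, assuming each delimiter is listed as a quoted word form; the actual line in disambiguator.cg3 may contain further entries.

```
# Sketch only: a cohort with one of these word forms closes the current
# window (sentence).
DELIMITERS = "<.>" "<!>" "<?>" "<…>" "<¶>" ;
```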

TAGS AND SETS

Tags

This section lists all the tags that are inherited from the FST and used as tags in the syntactic analysis. The next section, Sets, contains sets defined on the basis of the tags listed here; those set names are not visible in the output.

Beginning and end of sentence

BOS EOS

Parts of speech tags

N A Adv V Pron CS CC CC-CS Po Pr Pcle Num Interj ABBR ACR CLB LEFT RIGHT WEB PPUNCT

PUNCT

COMMA ¶

Tags for POS sub-categories

Tags for morphosyntactic properties

Case:

Compounding:

Possessives:

Adjectival features:

Qst IV TV Prt Prs Ind Pot Cond Imprt

Person and Number:

Verb features:

Tags for clitic particles

Derivation tags

<vdic>

Err/Orth

Numeral sets

OKTA

Semantic tags

HUMAN

PROP-ATTR PROP-SUR

TIME-N-SET NOT-TIME TIME-N

Syntactic tags

Tag structure: @self>mother or @mother<self or @self

Sets containing sets of lists and tags

This part of the file lists a large number of sets based partly upon the tags defined above, and partly upon lexemes drawn from the lexicon. See the source file itself to inspect the sets; what follows here is an overview of the set types.

Single-word sets

OKTA and go, and the set INITIAL for initial letters go INITIAL

Sets for word or not

WORD WORD-NOT-de NOT-COMMA

Derivational affixes

DER-V

DER-N

DER-A1 DER-A

A-V

A-NOT-V

Case sets

ADLVCASE

CASE-HALFAGREEMENT CASE-AGREEMENT CASE

NOT-NOM NOT-GEN NOT-ACC

Verb sets

NOT-V

Sets for finiteness and mood

REAL-NEG

MOOD-V

GC

VFIN

VFIN-POS

VFIN-NOT-IMPRT

VFIN-NOT-NEG

NOT-PRFPRC

Sets for person

SG1-V SG2-V SG3-V DU1-V DU2-V DU3-V PL1-V PL2-V PL3-V

Sets consisting of forms of “leđe” (these need to be rewritten)

Pronoun sets

Adjectival sets and their complements

Adverbial sets and their complements

Sets for coordinators

Sets for adverbs that have lookalikes

The following adverbs have homonyms in other parts of speech. If these are found in Adv contexts, we treat them as adverbs.

Sets of elements with common syntactic behaviour

Sets for verbs

V is the set of all readings with a V tag in them; REAL-V should be the ones without an N tag following the V.
The REAL-V set thus awaits a fix to the preprocess V … N bug.

TRANS-V is the set for verbs that actually take objects.

STRICT-TRANS-V is the set for verbs that do not let a GenAcc modify anything other than an object, e.g. Mun organiseren eatni gievkkanis, where eatni wants to be the object.

Valency sets

Adverb sets

Adjective sets

NP sets defined according to their morphosyntactic features

The PRE-NP-HEAD family of sets

These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression WORD - premodifiers.

The set NOT-NPMOD is used to find barriers between NPs. Typical usage: … (*1 N BARRIER NOT-NPMOD) … meaning: Scan to the first noun, ignoring anything that can be part of the noun phrase of that noun (i.e., “scan to the next NP head”)
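
The pattern described above can be sketched in CG-3 roughly as follows. This is an illustration only: the members of PRE-NP-HEAD and WORD shown here are an assumed, shortened selection, and the rule name exampleDemBeforeN is hypothetical. The real sets in the source file are larger.

```
# Sketch only; set members are assumed and shortened.
LIST N = N ;
LIST WORD = N A Adv V Pron CS CC Po Pr Pcle Num Interj ;
LIST PRE-NP-HEAD = (A Attr) (Pron Dem) Num (N Gen) ;   # assumed premodifiers

# Every word reading that is not a possible premodifier of an NP head.
SET NOT-NPMOD = WORD - PRE-NP-HEAD ;

# Hypothetical rule: choose the demonstrative reading if a noun follows
# to the right and nothing but possible premodifiers stands in between.
SELECT:exampleDemBeforeN (Pron Dem) IF (*1 N BARRIER NOT-NPMOD) ;
```

The design choice is to define the premodifiers positively and derive the barrier set by subtraction, so that extending PRE-NP-HEAD automatically narrows NOT-NPMOD.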

Other negatively defined morphosyntactic noun sets

Noun sets

Nominal sets defined according to their morphophonological properties

Sets for lexeme homonymy (most of them have been moved to where the actual rules are)

The words in the set N-PO can be both N and Po, the set takes that into account.

Nominal sets defined according to their semantic properties

OKTA

Miscellaneous sets

Border sets and their complements

Syntactic sets

These were the set types.

RULE SECTION

Here follow the rules.

Do not touch the speller suggestions.

This is the first section. Here we put safe rules with no or minimal context.

Removing unwanted names

SUSPICIOUSNAME for removing proper nouns: Ai Ain Lie Sun Ta Van Viste Ive

Numbers

Rule set for numbers, taken from sme (North Saami) and adjusted.

SELECT:SemYear SemYear if Sem/Date is Num

SELECT:SemYear Sem/Year Choose if not currency

Sem/ID if § to the left

REMOVE:dyn dyn Arab if Prop # temporary solution, until we have a new solution for numerals.lexc

Remove all Sem/ID

Focus clitics

SECTION 2, more context

Numerals

Imperatives

See also Imprt or Ind some sections down.

Partitive after numerals

Lexicalised derivations

Particular verbs

Propernouns

Removing or selecting proper nouns that are lookalikes

*Removes PropPl, but this causes problems with names such as Davviriikkaid Ráđi, where we want Prop Pl

MISC

der

*Removes derNEss if lexicalised, and both nouns are essive.

Verbs

Adjectives or nouns

Adjectives, nouns, not adverbs

Subjunctions

Conjunctions

Adverbs

Adverbs or postpositions

Adverbs or nouns

Specific adverbs

buoh

tuárviAdv SELECT

Adverbs and not Pronouns

Pronouns

Nouns, not verbs

puáttiđ, not pađđeeđ (Prt: Sg3 poođij, Du1 poođijm, Du2 poođijd, Du3 poođijn, Pl1 poođijm, Pl2 poođijd)

Lexical selection - nouns

Remove Imperative

Verb or Noun?

Px constraints

PX Number

From sme

First select Px, then remove all remaining Px

We end section 2 by removing all remaining Px
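
The two-step strategy for Px (select first, then remove what is left) can be sketched in CG-3 as below. The Px tag names, the context condition, and both rule names are assumptions made for illustration; they are not taken from the source file.

```
# Sketch only; tag names and context are assumed.
LIST Px = PxSg1 PxSg2 PxSg3 PxDu1 PxDu2 PxDu3 PxPl1 PxPl2 PxPl3 ;
LIST Refl = Refl ;

# Step 1 (hypothetical context): keep the Px readings of reflexive pronouns.
SELECT:examplePxRefl Px IF (0 Refl) ;

# Step 2: drop the Px readings that no earlier rule has selected.
REMOVE:exampleRestPx Px ;
```

By default a REMOVE rule will not leave a cohort without readings, so cohorts where step 1 left only Px readings are unaffected by step 2.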

Sg2 - early cycle, safe rules

Sg3 - early cycle, safe rules

Pl3 - early cycle, safe rules

Select…

NB: this one is not quite right

Adjectives and adverbs

Adv or not?

maid has many readings, and as Rel it is a member of S-BOUNDARY. Therefore we need to disambiguate it early in this file. The most important thing is to select Adv. Because A and N can still have Vfin readings at this point, it is difficult to make very general rules.

MAPPING OF CC AND CS

Mostly we map both @CNP and @CVP, then select @CNP where it applies, and after that remove the remaining @CNP so that @CVP remains.

Numerals

Indefinite pronouns

The rules are not documented yet

And now some rules for adverbs that modify adjectives

ConNeg forms

The numbers following the rule headers below refer to the number of hits in a corpus of 13 053 859 words.

Supinum vs. potential – no example found in large corpus

Perfect Participle

Topicalized version

It should be possible to unify the following chapter.

Actio

Present participle

*orrut vs. orrot

Rules for “addit” (which is an adjective, but more often a verb)

Actio Loc = N Loc

Nouns or verbs

The rules are not documented yet

Demonstrative pronouns, agreement in DP - should it be moved to after verbmappings?

The rules are not documented yet

Attribute disambiguation

Rules for Attr between Dem and N

VERB MAPPINGS

Lexical disambiguation of verbs

Verbs as predicatives (@SPRED>) and (@<OPRED)

The tags (@SPRED>) and (@<OPRED) target PrfPrc

The rules are not documented yet

Verbs as prenominal participles (@>N):

(@+FAUXV) and (@+FMAINV) target Neg, orrut

(@<SUBJ) target Inf

(@<SPRED) target Inf

(@<ADVL) target Inf, Actio Ess

@-F<OBJ target Inf

(@A<) target Inf

(@N<) target Inf, Actio Ess

(@<ADVL) target Inf, Actio Ess

(@<OBJ) target Inf, Actio Ess, PrfPrc

(@+FMAINV) and (@+FAUXV) and (@-FAUXV)

The big general @+FMAINV rule

(@-FMAINV) and (@-FAUXV)

NOUNS

CASE DISAMBIGUATION

Num as subject, tricky cases - the rule should be here because of the verb disambiguation

ACCUSATIVE-ILLATIVE DISAMBIGUATION

ACCUSATIVE-GENITIVE DISAMBIGUATION

Secure rules for choosing Acc

Semantics: Choosing accusative or genitive on semantic grounds

Other genitive rules

lassinIll Selects Ill if first one to the left is lassin *lassin Sarai

Gen and preposition/postposition

Genitive in place adverbials ROUTE

Temporal adverbials: Choosing accusative or genitive TIME

Accusative or Genitive

Reflexive pronouns: acc or gen

Accusative object

*topOBJPers Removes Gen if you are Acc, and to your right is a Pron followed by a transitive verb. You have to be sentence-initial.

*AccVAbess Selects Gen if to the right is abessive

Gen modifiers inside NP

Accusative in coordination

Intransitive verbs can sometimes be transitive

Numerals

Leftover accusatives

*COMPInfAcc Selects Acc if you are Gen and to the left is an Inf TV @COMP-CS<

Accusative or Illative

Nominative or accusative or genitive

Nominative

Vocatives, subjects of sentence fragments

Nominative in titles and sentence fragments

Nominative after “ko”, “mahte”, “dugo” and “nugo”

Preverbal subjects

Postverbal subjects

Nominative predicatives

Nominative as objects in existential clauses

Nominative in coordination and apposition

Nominative in parallel constructions

Not nominative

Comitative rules

These rules assume that there is Sg Com / Pl Gen homonymy in Inari Saami. There is, but it is far more marginal than in North Saami, so the following rules should be revised to account for that.

NP internal disambiguation of Com

Disambiguation based upon verb valency

Disambiguation of Com depending on Adv or certain verb or N

Animate nouns

HAB-ACTOR in habitive-constructions

Disambiguation based upon verb valency

COM-V

tools (concrete and abstract)

Dynamic-verbs

Event-tool-actio

Most actio can be both tool and event.

PLACE-V

Movement-verbs

Locative and comitative - Disambiguation based upon coordination

And then we remove the remaining Sg Com analyses

Final Com/Loc rule: Remove Com.

Essive

Finite or not

Finite

Not Finite

Infinitive

Indicative or imperative

Verbs according to person and number

Sg1 - First person singular

Du1 - First person dual

Pl3

Passive

Infinitive

Present Participle

Actio/Perfect Participle

NOMEN

Case rules

Other rules for nouns and pronouns

Determiners

Adverbs and adjectives

Adverbs not nouns

NOUNS

Adverb or Participle

Genitive not Nominative

Variant lemmas

And then we remove the verbs which didn’t get any syntactic tag, in favour of verbs with syntactic tags.

killifVinCohort This rule removes all other readings if there is a mapped V reading in the same cohort. Every case in which this goes wrong should be fixed in the mapping rules or in the preceding disambiguation rules.
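
The effect of such a rule can be sketched in CG-3 as follows. This is not the actual killifVinCohort rule: the set V-SYNTAG is an assumed subset of the verb tags mentioned in this document, and the names MAPPED-V and exampleKeepMappedV are hypothetical.

```
# Sketch only; V-SYNTAG is an assumed subset of the syntactic verb tags.
LIST V = V ;
LIST V-SYNTAG = @+FMAINV @+FAUXV @-FMAINV @-FAUXV ;

# A V reading that also carries one of the syntactic tags above.
SET MAPPED-V = V + V-SYNTAG ;

# Without context conditions, SELECT keeps the matching readings and
# discards all others in the cohort. Cohorts with no match are untouched.
SELECT:exampleKeepMappedV MAPPED-V ;
```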

Removing Err/Orth

This rule removes Err/Orth when the lemma is the same, even if the morphology differs.

Test: Go for minimal weight.

Substitute rules

These 12 substitute rules add the language code to all words, to govern their behaviour in the subsequent cg files. The rules are removed when this file is ported to Apertium.
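
A rule of this kind might look like the sketch below. The shape of the language tag (<smn>), the guard set, and the rule name are assumptions for illustration; the source file may express this differently, and only one of the 12 rules is shown.

```
# Sketch only; the tag <smn> and all names below are assumed.
LIST N = N ;
LIST LANG-TAG = <smn> ;

# Nouns that do not yet carry the language tag. The guard keeps the rule
# from applying to the same reading twice.
SET N-UNTAGGED = N - LANG-TAG ;

# Replace the N tag with N plus the language tag.
SUBSTITUTE:exampleLangN (N) (N <smn>) TARGET N-UNTAGGED ;
```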


This (part of) documentation was generated from src/cg3/disambiguator.cg3