North Sami NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-sme

DELIMITERS

Sentence delimiters are the following: <.> <!> <?> <…> <¶>
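In a CG-3 grammar, delimiters are declared with a `DELIMITERS` statement. A sketch of how the symbols above would be declared (the exact line in the source file may differ):

```cg3
# A cohort whose wordform matches one of these closes the current window.
DELIMITERS = "<.>" "<!>" "<?>" "<…>" "<¶>" ;
```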

TAGS AND SETS

Tags

This section lists all the tags inherited from the fst and used as tags in the syntactic analysis. The next section, Sets, contains sets defined on the basis of the tags listed here; those set names are not visible in the output.

Beginning and end of sentence

BOS EOS
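As a rough illustration of how tags become available to rules, the sketch below declares a few tags as `LIST`s and defines the two boundary sets. The selection of tags is arbitrary, and the source file may define BOS and EOS with additional members.

```cg3
# Tags coming from the fst are declared as LISTs so that rules can name them.
LIST N = N ;
LIST V = V ;
LIST Gen = Gen ;

# >>> and <<< are CG-3's built-in markers for the start and end of a window.
LIST BOS = (>>>) ;
LIST EOS = (<<<) ;
```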

Parts of speech tags

Semantic tags

Syntactic tags

Sets containing sets of lists and tags

This part of the file lists a large number of sets based partly upon the tags defined above, and partly upon lexemes drawn from the lexicon. See the source file itself to inspect the sets; what follows here is an overview of the set types.

Single-word sets

The sets OKTA and go, and the set INITIAL for initial letters: OKTA go INITIAL

Sets for word or not

WORD REAL-WORD WORD-NOT-de NOT-COMMA

Derivational affixes

DER-V

DER-N

DER-A1

DER-A

A-V

A-NOT-V

Case sets

ADLVCASE

CASE-HALFAGREEMENT CASE-AGREEMENT CASE

NOT-NOM NOT-GEN NOT-ACC
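Complement sets like these are typically built by subtracting one tag from a larger set. The sketch below shows one plausible way to write them; the inventory of case tags in `CASE` is an assumption, and the definitions in the source file may differ.

```cg3
LIST Nom = Nom ;
LIST Gen = Gen ;
LIST Acc = Acc ;
# Assumed inventory of case tags; the real CASE set may list more or fewer.
LIST CASE = Nom Acc Gen Ill Loc Com Ess ;

# Complement sets: any case reading except the named one.
SET NOT-NOM = CASE - Nom ;
SET NOT-GEN = CASE - Gen ;
SET NOT-ACC = CASE - Acc ;
```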

Verb sets

NOT-V

Sets for finiteness and mood

REAL-NEG

MOOD-V

GC

VFIN

VFIN-POS

VFIN-NOT-IMPRT

VFIN-NOT-NEG

NOT-PRFPRC

Sets for person

Sets consisting of forms of “leat” (these ones need to be rewritten)

Pronoun sets

Adjectival sets and their complements

Adverbial sets and their complements

Sets for coordinators

Sets for adverbs that have lookalikes

Here come some adverbs that have identical twins in other POS. If these are found in Adv contexts, we treat them as adverbs.

Sets of elements with common syntactic behaviour

Sets for verbs

V is all readings with a V tag in them; REAL-V should be the ones without an N tag following the V.
The REAL-V set thus awaits a fix to the preprocess V … N bug.
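One way such a set could be expressed is with set subtraction. This is a sketch only; the name `REAL-V-EX` is hypothetical and the source file may define REAL-V differently.

```cg3
LIST V = V ;
LIST N = N ;
# Hypothetical definition: verb readings that do not also carry an N tag.
SET REAL-V-EX = V - N ;
```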

TRANS-V is the set for verbs really taking objects

STRICT-TRANS-V is the set for verbs which don’t let a GenAcc be a modifier of anything other than an object, e.g. Mun organiseren eatni gievkkanis. Here eatni wants to be the object.

Valency sets

Adverb sets

Adjective sets

NP sets defined according to their morphosyntactic features

The PRE-NP-HEAD family of sets

These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression WORD - premodifiers.

The set NOT-NPMOD is used to find barriers between NPs. Typical usage: … (*1 N BARRIER NOT-NPMOD) … meaning: scan to the first noun, ignoring anything that can be part of the noun phrase of that noun (i.e., “scan to the next NP head”).
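The negation idea can be sketched as follows. The set names ending in `-EX`, their members, and the rule itself are hypothetical and much simpler than the sets in the source file.

```cg3
LIST N = N ;
LIST Gen = Gen ;
# Hypothetical, simplified list of what may precede an NP head.
LIST PRE-NP-HEAD-EX = A Num (Pron Dem) Gen ;
# Hypothetical, simplified stand-in for WORD.
LIST WORD-EX = N A V Adv Pron Num CC CS Po Pr Pcle ;

# Everything that is a word but cannot be a premodifier.
SET NOT-NPMOD-EX = WORD-EX - PRE-NP-HEAD-EX ;

# Scan right to the first noun; give up if a non-premodifier intervenes.
SELECT:exGenBeforeN Gen IF (*1 N BARRIER NOT-NPMOD-EX) ;
```

Defining the barrier negatively means that a premodifier type missing from the list makes the scan stop too early, rather than letting it run across an NP boundary.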

Other negatively defined morphosyntactic noun sets

Noun sets

Nominal sets defined according to their morphophonological properties

Sets for lexeme homonymy (most of them are moved to where the actual rules are.)

The words in the set N-PO can be both N and Po, the set takes that into account.

The LAHKA set family

Nominal sets defined according to their semantic properties

Miscellaneous sets

Border sets and their complements

Syntactic sets

ALLSYNTAG NON-APP

These were the set types.

Guessing: Rule for adding Sem/Date as a tag to readings which look like dates

Guessing: Rule for adding Adv Sem/Adr as a tag to readings which look like addresses
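A guessing rule of this kind could be written with CG-3's `ADD` rule and a regular-expression wordform tag. The pattern below (dates such as 12.05.2010) and the set name are assumptions for illustration; the source file may use a different rule type and pattern.

```cg3
# Hypothetical pattern: wordforms of the shape d.m.yyyy or dd.mm.yyyy.
LIST DATE-LIKE-EX = "<[0-9]{1,2}[.][0-9]{1,2}[.][0-9]{4}>"r ;

# Add the semantic tag to every reading of a matching cohort.
ADD:exDate (Sem/Date) TARGET DATE-LIKE-EX ;
```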

Removing or selecting proper nouns that are lookalikes

We don’t want a proper noun analysis of these words sentence-initially.

*Removes PropPl, but there are problems with names such as Davviriikkaid Ráđi, where we want Prop Pl

*Select PlcSur (Sem/Plc) (Sem/Sur)

Some proper nouns have two parts, and the first is not a genitive. We still have problems with abbr when these proper nouns are inflected or are part of a cmp. The copy rule adds an Attr reading to names which do not get it in the fst (Soria). The select rule selects Attr when the next word is e.g. Moria.

Rules for giving Attr to names, e.g. Ole Attr Kåven.
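The copy-then-select approach might look roughly like this. The two set names are hypothetical, and the contexts are reduced to the Soria Moria example from the text.

```cg3
LIST FIRST-PART-EX = "Soria" ;    # hypothetical set of first parts
LIST SECOND-PART-EX = "Moria" ;   # hypothetical set of second parts

# Copy the reading and add Attr when the fst gave no Attr reading.
COPY:exAttrCopy (Attr) TARGET FIRST-PART-EX IF (NOT 0 (Attr)) (1 SECOND-PART-EX) ;

# Keep only the Attr reading when the second part follows.
SELECT:exAttrSel (Attr) IF (0 FIRST-PART-EX) (1 SECOND-PART-EX) ;
```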

Remove unwanted analyses

Southern Locative vs. Essive

Numerals

Lexicalised derivations

Particular verbs

Propernouns

Some adjectives are never derived as Adv

Rules for Prop Attr, Sem/Sur and Plc

MISC

ONE-COHORT DISAMBIGUATION - CYCLE 0

The idea behind “cycle 0” is to have safe rules without context first. These rules typically choose lexicalisations over derivations, Saami words over marginal names, etc.

Lexicalised derivations

*Removes derN if lexicalised.

*Removes derNEss if lexicalised, and both nouns are essive.

*Removes derA or PrsPrc or VGen if lexicalised. VGen is a chance.

*Removes derAdv when Adv is lexicalised.

*Removes VAbess when Adv is lexicalised.
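A rule of the first type above, removing a derived noun reading when a lexicalised noun reading is present in the same cohort, could be sketched like this. The derivation tags and set names are assumptions for illustration.

```cg3
LIST N = N ;
# Hypothetical subset of noun-deriving tags.
LIST DER-N-EX = Der/NomAg Der/NomAct ;
# Noun readings without such a derivation tag, i.e. lexicalised ones.
SET LEX-N-EX = N - DER-N-EX ;

# The only condition is on the cohort itself, not on its neighbours.
REMOVE:exDerN DER-N-EX IF (0 LEX-N-EX) ;
```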

Fragments and headliners

Adjectives or nouns, not adverbs

Adjective plural, not comparative

Adverbs

Lexicalised adverbs

It is useful to select the adverbial reading early for potential nouns or verbs.

*aloGen removes állu Gen, álo Adv vs. N Gen

*bealisAdv

*bearreAdv beare vs bearri

*ilusAdv

*rámisA

Pronouns

Nouns, not verbs

Lexical selection - nouns

mánnu vs mánus

Not noun

Adposition or not

Not Qst

Interjections

Px-rules for special nouns

Some verb rules

Particular CS

Verb or Noun?

Adpositions

Adpositions, not verbs

Section 2: LOCAL DISAMBIGUATION - CYCLE 1

FAMILY pronouns

Pron Pers 1. p.

Pron Pers 2. p.

Pron Pers 3. p.

An early rule for “eanaš”/“eanas”

Px constraints

First select Px, then remove all remaining Px

We end section 2 by removing all remaining Px
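The select-then-remove pattern can be sketched as below. The list of Px tags and the selecting context are assumptions. The pattern relies on CG-3 not removing the last remaining reading of a cohort by default, so readings chosen by the first rule survive the second.

```cg3
# Assumed inventory of possessive suffix tags.
LIST Px = PxSg1 PxSg2 PxSg3 PxDu1 PxDu2 PxDu3 PxPl1 PxPl2 PxPl3 ;

# Step 1: select Px in a context that supports it (hypothetical context).
SELECT:exSelPx (PxSg1) IF (-1 ("mun" Pron Pers Gen)) ;

# Step 2: everywhere else, drop the Px readings that are still ambiguous.
REMOVE:exRemPx Px ;
```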

Section 3: Certain verb readings

verb or adv

All imperatives

For imperative disambiguation we need the following: pick imperative contexts, and thereafter remove the remaining imperatives. Such contexts are: an imperative verb sentence-initially, with an exclamation mark.
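A minimal sketch of that context, assuming the exclamation mark is analysed with the lemma "!" (the set name `EXCL-EX` is hypothetical):

```cg3
LIST BOS = (>>>) ;
LIST Imprt = Imprt ;
LIST EXCL-EX = "!" ;   # hypothetical: lemma of the exclamation mark

# Sentence-initial verb, and an exclamation mark somewhere to the right.
SELECT:exImprt Imprt IF (-1 BOS) (*1 EXCL-EX) ;

# Imperative readings not picked by a context rule are then removed.
REMOVE:exRemImprt Imprt ;
```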

Sg1 - early cycle, safe rules

Sg2 - early cycle, safe rules

Sg3 - early cycle, safe rules

Negative verb, not abbreviation or Roman numeral Ii.

Du1 - early cycle, safe rules

These Du1, Du2 rules are (almost) not in use in our corpus, but we keep them for completeness.

Du2 - early cycle, safe rules

The next two rules are not found in the corpus, but logically they belong, to cover the whole paradigm. There is no verb-internal homonymy here, but there is homonymy with e.g. Illative for certain verbs.

Du3 - early cycle, safe rules

The competitor to Du3 is -ba Foc.

Pl1 - early cycle, safe rules

The competitor here is obviously Inf, but also Pl3 and Prt Sg2.

Pl2 - early cycle, safe rules

These rules are not used when disambiguating the corpus

Pl3 - early cycle, safe rules

Select…

The following two may be joined:

Remove…

The following two may be joined:

PrsPrc

NB: this one is not quite right

*listInf in lists

Section 4: CYCLE 1B: REMOVING THE READINGS THAT WERE LEFT FROM THE 1A RULES

We don’t need more Px sections; it’s done already

Noun, adjective, PrsPrc or not?

Adjectives and adverbs

Adv or not?

maid has many readings, and as Rel it is a member of S-BOUNDARY. Therefore we need to disambiguate it early in this file. Most important is to select Adv. Because A and N can still have Vfin readings, it is difficult to make very general rules.

matPcle

The following two rules are omitted. They only affect the disambiguation of mat Pcle, a Wackernagel particle, which is done in the rule above, I think.

Disambiguating abbreviations

Disambiguating particles

Disambiguating rom attr

Disambiguating clitics

Disambiguating numerals

Disambiguating adpositions

čađa

Commented out some adp-rules we don’t need anymore:

geahčai

guovddaš

mađe

miehta

LIST LG-MATERIAL = Inf Adv Nom ;

Disambiguation Noun vs. Po or Pr:

Some particular subjunctions and Neg Sup

go as CS and Qst Pcle

First select all “go” Qst Pcle, then remove them so the rest will be “go” CS

Section 9 WORD-SPECIFIC RULES

Some particular subjunctions

Adverb rules

MAPPING OF COMP-CS< , COMPLEMENTS OF PARTICLES IN COMPARISON

First map all COMP-CS<, then remove the other readings

MAPPING OF CC AND CS

Mostly we map both @CNP and @CVP, then we select @CNP, after that we remove them so @CVP remains

*CVPoppramsing Lásse, Iŋgá ja mun

*CVPCmp/SplitR Cmp/SplitR @CNP
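The map, select, remove sequence for coordinators could look roughly like this. The selecting context (a noun on each side) is a simplification chosen for illustration, not the condition used in the source file.

```cg3
LIST CC = CC ;
LIST N = N ;

# Step 1: offer both functions as candidates on every coordinator.
MAP:exMapCC (@CNP @CVP) TARGET CC ;

# Step 2: choose @CNP where the coordinator clearly joins nominals.
SELECT:exSelCNP (@CNP) IF (-1 N) (1 N) ;

# Step 3: remove @CNP wherever it was not selected, leaving @CVP.
REMOVE:exRemCNP (@CNP) ;
```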

PRONOUNS

Plural?

Interrogative and relative pronouns

Emphatic ieš

Numerals

Indefinite pronouns

The rules are not documented yet

Demonstrative pronouns - should have a look at these

Disambiguating adjectives

Attribute disambiguation

Rules for Attr between Dem and N

Other attribute rules

Special rules for ‘buorre’ (the only adjective showing case agreement)

This block of rules is there to ensure case agreement for comparatives.

alit vs. allat Comp Attr

And now some rules for adverbs that modify adjectives

Proper nouns

VERBS

Disambiguating verbs - part 1

First ConNeg forms, which depend upon Neg verbs. Then Imperative (with their special syntax), infinitive, and other infinite forms. Person comes later (in part 2)

ConNeg forms

Numbers following the rule headers below refer to the number of hits in a corpus of 13 053 859 words.

Imperative

See also Imprt or Ind some sections down.

Infinitive

Rules that prevent later selection of Inf for a finite verb in the frame

INF-V…CC…

Verbgenitive

Supinum vs. potential – no example found in large corpus

Perfect Participle

Topicalized version

It should be possible to unify the following chapter.

Actio

Present participle

*orrut vs. orrot

Rules for “addit” (which is an adjective, but more often a verb)

Actio Loc = N Loc

Actio Nom = Ess

Imprt or Ind

Nouns or verbs

The rules are not documented yet

Demonstrative pronouns, agreement in DP - should it be moved to after verbmappings?

The rules are not documented yet

VERB MAPPINGS

Verbs as predicatives (@SPRED>) and (@<OPRED)

The tags (@SPRED>) and (@<OPRED) target PrfPrc

The rules are not documented yet

Passive verbs often have

Verbs as prenominal participles (@>N):

(@+FAUXV) and (@+FMAINV) target Neg, orrut

(@A<) target Inf

(@<SUBJ) target Inf

(@<SPRED) target Inf

(@<ADVL) target Inf, Actio Ess

@-F<OBJ target Inf

(@N<) target Inf, Actio Ess

(@<ADVL) target Inf, Actio Ess

(@<OBJ) target Inf, Actio Ess, PrfPrc

(@+FMAINV) and (@+FAUXV) and (@-FAUXV)

(@-FMAINV) and (@-FAUXV)

And then we remove the verbs which didn’t get any syntactic tag, in favour of verbs with syntactic tags.

killifVinCohort This rule removes all other readings if there is a mapped V reading in the same cohort. Every case in which this goes wrong should be fixed in the mapping rules or in earlier disambiguation rules.
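The effect described here can be sketched with a context-free `SELECT`. The set names ending in `-EX` and the list of syntactic tags are assumptions, and the actual rule may be formulated as a `REMOVE` instead.

```cg3
LIST V = V ;
# Hypothetical subset of the syntactic tags that verb mapping can assign.
LIST V-SYN-EX = @+FMAINV @+FAUXV @-FMAINV @-FAUXV ;

# Verb readings that carry one of those tags.
SET MAPPED-V-EX = V + V-SYN-EX ;

# If any reading in the cohort matches, all non-matching readings go.
SELECT:exKillIfV MAPPED-V-EX ;
```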

NOUNS

CASE DISAMBIGUATION

Num as subject, tricky cases - the rule should be here because of the verb disambiguation

ACCUSATIVE-GENITIVE DISAMBIGUATION

Secure rules for choosing Acc

Semantihkka: Choosing accusative or genitive semantically

Other genitive rules

Genlassin Selects Gen if first one to the right is lassin *bargostipeanddaid lassin

lassinIll Selects Ill if first one to the left is lassin *lassin Sarai

*GenAhkásaš Selects Gen
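The two lassin rules above translate almost directly into contextual tests. A sketch, with hypothetical rule names:

```cg3
LIST Gen = Gen ;
LIST Ill = Ill ;

# bargostipeanddaid lassin: genitive when lassin follows immediately.
SELECT:exGenLassin Gen IF (1 ("lassin")) ;

# lassin Sarai: illative when lassin precedes immediately.
SELECT:exLassinIll Ill IF (-1 ("lassin")) ;
```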

Gen and preposition/postposition

Genitive in place adverbials ROUTE

Adjectives take object

Temporal adverbials: Choosing accusative or genitive TIME

Reflexive pronouns: acc or gen

Accusative object

*topOBJPers Removes Gen if you are Acc, and to your right is a Pron followed by a transitive verb. You have to be sentence-initial.

*AccVAbess Selects Gen if to the right is abessive
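The topOBJPers description maps onto a rule with a linked context. The sketch below assumes that transitive verbs carry the tag TV; the set name `TRANS-V-EX` is hypothetical.

```cg3
LIST BOS = (>>>) ;
LIST Gen = Gen ;
LIST Acc = Acc ;
LIST Pron = Pron ;
LIST TRANS-V-EX = TV ;   # hypothetical stand-in for TRANS-V

# Sentence-initial Gen/Acc word, then a pronoun, then a transitive verb.
REMOVE:exTopObjPers Gen IF (0 Acc) (-1 BOS) (1 Pron LINK 1 TRANS-V-EX) ;
```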

Gen modifiers inside NP

Accusative in coordination

Intransitive verbs can sometimes be transitive

Accusative or genitive in front of ALU and in front of adjectives

Exceptional accusative attributes in front of ALU nouns.

Numerals

NumGenMeasure Genitive numerals in front of ruvdnosaš with friends

Leftover accusatives

*COMPInfAcc Selects Acc if you are Gen and to the left is an Inf TV @COMP-CS<

Accusative before @COMP-CS<

Accusative before some A

Accusative sentence-finally

Genitive

Nominative and accusative

*NomIFInitialThenSg3 Selects Nom if -1 BOS and 1 oblique / Sg3 lookalike. Works in fragments.

Nominative

Miscellaneous rules

Vocatives, subjects of sentence fragments

Nominative in titles and sentence fragments

Nominative after “go”, “dego”, “dugo” and “nugo”

Preverbal subjects

Postverbal subjects

Nominative predicatives

Nominative as objects in existential clauses

Nominative in coordination and apposition

Nominative in parallel constructions

Not nominative

Comitative rules

NP internal disambiguation of Com

Disambiguation based upon verb valency

Disambiguation of Com depending on Adv or certain verb or N

Animate nouns

HAB-ACTOR in habitive-constructions

váldit vára + Loc

dahkat earrodearvvuođat geainna nu

eallit mainna nu

Disambiguation based upon verb valency

COM-V

tools (concrete and abstract)

BODY as an instrument

Dynamic-verbs

Event-tool-actio

Most actio can be both tool and event.

PLACE-V

STATE-V (eallit)

Movement-verbs

The superset DYNAMIC-V: choosing (Pl Loc) or (Sg Com)

The idea is that the superset DYNAMIC-V is not connected to TOOL, ABSTR-TOOL or CONCEPT in (Pl Loc). This is the “least common multiple”. The subsets differ: for instance, many of them (but not all) are not connected to HUMAN in (Pl Loc), and one is not connected to ABSTR-ENTITY and ACTOR in (Pl Loc). We work with negation so that the rules don’t destroy analyses because of insufficient sets.

First come the general rules for selecting (Sg Com), then the more specific rules for selecting (Sg Com), and then we select (Pl Loc) for the rest under # Another round of locative rules.
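A general rule of the first kind might be sketched as follows. The semantic tag, the verb lemmas, and the set names are all hypothetical placeholders; the real sets are far larger and the real contexts more restrictive.

```cg3
LIST TOOL-EX = Sem/Tool ;                  # hypothetical semantic tag
LIST DYNAMIC-V-EX = "bargat" "čuohppat" ;  # hypothetical members

# A tool noun that is ambiguous between (Sg Com) and (Pl Loc), with a
# dynamic verb somewhere to the left: prefer the comitative reading.
SELECT:exSgComTool (Sg Com) IF (0 (Pl Loc)) (0 TOOL-EX) (*-1 DYNAMIC-V-EX) ;
```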

HUMAN-LOC-V

Locative and comitative - Disambiguation based upon coordination

And then we remove the remaining Sg Com analyses

Essive OBS

Late case rules (after other case rules have worked).

VERBS PART 2, Section #22

Finite or not

Finite

Not Finite

Indicative Negative

Infinitive

Indicative or imperative

Verbs according to person and number

Sg1 - First person singular

Sg2 - Second person singular

Sg3 - Third person singular

Infinitive and clausal subject

Rules that look backwards for a subject across a relative clause:

Rules that look backwards for a subject across a subordinate clause (CP boundary):

Extension possibilities: Coordination

Son oaidná du ja mu ovdal go boahtit…

Coordinated Sg3 verbs

Not V + Sg3

Du1 - First person dual

The previous two rules look marginal.

Du2 - Second person dual

Rules for leahppi = (“leahppi” N Sg Nom)

Du3 - Third person dual

Pl1 - First person plural

Pl2 - Second person plural

Pl3 - Third person plural

Rules for a special infinitive construction

More finite verbs

Passive

Infinitive

Present Participle

Actio/Perfect Participle

Actio

Selecting some more finite verbs

Lexical disambiguation of verbs

NOMEN

Case rules

Other rules for nouns and pronouns

Determiners

Adverbs and adjectives

NOUNS

Variant lemmas

VERBS

Test: Go for minimal weight.

Final removing rules

Removing Err/Orth


This (part of) documentation was generated from src/cg3/disambiguator.cg3