INTRODUCTION TO THE MORPHOLOGICAL ANALYSER OF THE IRISH LANGUAGE
The morphological analyses of wordforms for the Irish
language are presented in this system in terms of the following symbols.
(It is strongly recommended to follow existing standards when adding new tags.)
- **+Adv ** = Adverb
- **+Gn ** = General
- **+Its ** = intensifiers e.g. sách, ró- etc.
- **+Q ** = Interrogative
- **+Dir ** = Directional
- **+Loc ** = Locative
- **+Temp ** = Temporal e.g. inniu, amárach etc.
- **+Art ** = article
- **+Def ** = definite
- **+Masc ** = masculine gender
- **+Fem ** = feminine gender
- **+Com ** = nominative, accusative, dative case
- **+Gen ** = genitive case
- **+Sg ** = singular
- **+Pl ** = plural
- **+Conj ** = conjunction
- **+Coord ** = co-ordinate
- **+Subord ** = subordinate
- **+Det ** = Determiner
- **+Dem ** = Demonstrative
- **+Poss ** = Possessive
- **+Q ** = Interrogative
- **+Idf ** = Indefinite
- **+Def ** = Definite
- **+Qty ** = Quantifier
- **+Art ** = with article
- **+1P +2P +3P ** = first, second or third person
- **+Fem ** = feminine gender
- **+Masc ** = masculine gender
- **+Sg +Pl ** = singular or plural in number
- **+CM +CC +CU ** = dialects
- **+Itj ** = Interjection
- **+Filler ** = Filled Pause (eh, em)
- **+Cmc ** = Communicator (yeah, y’know)
- **+Event ** = Simple Event (laugh, sneeze etc.)
- **+Xxx ** = Indecipherable speech
- **+Num ** = Numeral
- **+Card ** = Cardinal (one, two, three, …)
- **+Ord ** = Ordinal (first, second, third, …) e.g. mo dhá lámh, an chéad dhá theach
- **+Def ** = after definite article etc. (an, na, aon, céad)
- **+Part ** = Particle (not +Vb) (U)
- **+Voc ** = Vocative (v)
- **+Nm ** = Numeral (m)
- **+Inf ** = Infinitive (i)
- **+Pat ** = Patronym (p) (e.g. Ó, Ní, Uí, le, de ..)
- **+Comp ** = Comparative degree (c)
- **+Sup ** = Superlative degree (s)
- **+Cp ** = cop rel part
- **+Deg ** = degree particle with Adj/Abstract Noun (so loud, so sharp etc.)
- **+Ad ** = Adverbial particle “go”
- **+Vb ** = Verbal (Q)
- **+Neg ** = Negative (n)
- **+Q ** = Interrogative verbal particle (q)
- **+NegQ ** = Negative interrogative verbal particle (q)
- **+Rel ** = Relative (r)
- **+Pro ** = rel part + pron
- **+Subj ** = subjunctive
- **+Imp ** = imperative
- **+Cmpl ** = complementizer
- **+Past ** = past tense verbal particle
- **+CU ** = canúint Uladh
- **+CM ** = canúint na Mumhan
- **+CC ** = canúint Chonnachta
- **+Prep ** = Preposition
- **+Simp ** = Simple
- **+Cmpd ** = Compound
- **+Rel ** = with Relative particle
- **+Poss ** = with Possessive
- **+Art ** = with Article
- **+Deg ** = with Degree Particle
- **+Obj ** = á = “do a” when obj of VN
- **+Pl ** = plural
- **+CU ** = Canúint Uladh
- **+Pron ** = Pronoun
- **+Pro ** = Pronoun with Copula
- **+Prep ** = Prepositional pronoun
- **+Pers ** = Personal
- **+Emph ** = Emphatic (Contrastive) form of personal pronoun
- **+Ref ** = Reflexive
- **+Idf ** = Indefinite
- **+Dem ** = Demonstrative
- **+1P +2P +3P ** = first, second or third person
- **+Fem ** = feminine gender
- **+Masc ** = masculine gender
- **+Sg +Pl ** = singular or plural in number
- **+Sbj ** = sí, sé and siad are used only when pron follows predicate verb in
- **+Punct ** = Abbreviation
- **+Int ** = sentence internal
- **+Fin ** = sentence final
- **+Brack ** = round, square and curly brackets
- **+St ** = start bracket, quote etc
- **+End ** = end bracket, quote etc
- **+Q ** = question mark ?
- **+Bar ** = hyphen, underscore, dash etc.
- **+Quo ** = all quotation marks double, single etc.
- **+Conj +Coord ** = &
- **+Symbol ** = independent symbols in the text stream, like £, €, ©
- **+XMLTag ** =
- **+VT +VI +VTI ** = transitive, intrans., both trans & intrans
- **+1P +2P +3P ** = First, second and third person
- **+Auto ** = Autonomous
- **+Sg +Pl ** = Singular and Plural
- **+PresInd ** = Present Indicative
- **+PastInd ** = Past Indicative
- **+FutInd ** = Future Indicative
- **+PastImp ** = Ghnáthchaite Past Habitual (Imperfect Indicative)
- **+PresImp ** = Ghnáthláithreach Pres Habitual (Verb bí only)
- **+Cond ** =
- **+PresSubj ** = Present Subjunctive
- **+PastSubj ** = Past Subjunctive
- **+Imper ** =
- **+Rel ** = relative forms - direct
- **+RelInd ** = rel. indirect
- **+Dep ** = dependent forms
- **+Cop ** = Copula
- **+Pres ** = copula present & future
- **+Past ** = copula past & conditional
- **+VF ** = - form used before a word starting with a vowel or f+vowel
- **+Pro ** = - copula - sea
- **+Emph ** = - emphatic forms
- **+CM ** = Canúint na Mumhan
- **+CC ** = Canúint Chonnachta
- **+CU ** = Canúint Uladh
- **^VN ** = verbal noun
- **+Adj ** = adjective
- **+Base ** = positive / base form (changed from +Pos to +Base 10/09/03)
- **+Comp ** = comparative
- **+Masc ** = masculine gender
- **+Fem ** = feminine gender
- **+Com ** = nominative case
- **+Gen ** = genitive case
- **+Voc ** = vocative case
- **+Sg ** = singular
- **+Pl ** = plural
- **+Weak +Strong ** = when an adj is qualifying a strong plural noun (i.e. noun plural is the same for all cases) the adj will also have the same form in all cases
- **+NotSlen ** = qualifies a plural noun ending in a broad consonant or a vowel
- **+hPref ** = prefix e.g. (h)iontach
- **+Ecl ** = eclipsis - urú; e.g. i ngach
- **+Len ** = e.g. ab fhearr, ba mhó
- **^Adj ** = Adjective- used in initial mutations
- **^Sé ** = Séimhiú (softening) Lenition - h added after certain initial
- **^Urú ** = eclipsis e.g. i ngach
- **^Ath ** = Athrú (change) word ending
- **^Caol ** = Caolú (slenderise)- Attenuation : ie slenderise the end of word
- **^Lea ** = Broaden
- **^Coim ** = Syncopate
- **^IM ** = initial mutation
- **^hv ** = h before vowel
- **+Noun ** = noun
- **+Prop ** = proper
- **+Masc ** = masculine gender
- **+Fem ** = feminine gender
- **+Com ** = nominative case
- **+Gen ** = genitive case
- **+Voc ** = vocative case
- **+Dat ** = dative (e.g. teach)
- **+Sg ** = singular
- **+Pl ** = plural
- **+DefArt ** = noun preceded by definite article (an)
- **+Def ** = noun preceded by definite article (an)
- **+Idf ** = noun without article (there is no indefinite article)
- **+Strong ** = strong plural
- **+Weak ** = weak plural
- **+Emph ** = emphasised - ár dteachsa, do theachsa, a teachsa
- **+Subst ** = substantive - functions like a noun, but lack noun inflections
- **+Len ** = (+Sé) lenite after simple prep. eg ar chat
- **+Ecl ** = (+Urú) e.g. after compound prep eg ar an gcat
- **+Poss ** = possessive e.g. haois, n-aoiseanna
- **+hPref ** = h before vowel
- **+CM ** = canúint na Mumhan, Munster dialect
- **+CC ** = canúint Chonnachta
- **+CU ** = canúint Uladh
- **+Part ** = see irregular nouns
- **+Num ** = see irregular nouns
- **^M ^F ** = masculine & feminine : initial mutations of singular nouns depend on whether the noun is masculine or feminine
- **^C ^G ** = nominative, genitive & vocative : initial mutations of plural nouns depend on the case
- **^Sé ** = Séimhiú (softening) Lenition - h added after certain initial
- **^tv ** = “t-” before a vowel (eg éan : Nom. Sg. Masc. an t-éan - the bird)
- **^hv ** = “h” before a vowel (eg éan : Nom. Pl. Masc. na héin - the birds)
- **^ts ** = “t” before “s”
- **^Def ** = dntls rule after definite article
- **^Caol ** = Caolú (slenderise)- Attenuation : ie slenderise the end of word
- **^Lea ** = Leathnú - Broadening eg an “i” is removed
- **^Coim ** = Coimriú - Syncopation - the last unstressed vowel is dropped
- **^Ath ** = Athrú (Change) - in certain plurals the ending changes : “e” -> “í”,
- **^VH ** = Maintains vowel harmony of broad and slender vowels
- **^Emph ** = emphatic forms
- **^IM ** = general initial mutation e.g. mo chat, ar an mballa
- **^CB ** = compound boundary
- **+CmpdNoGen ** =
- **+Noun ** = noun
- **+Prop ** = proper
- **+Masc ** = masculine gender
- **+Fem ** = feminine gender
- **+Com ** = nominative case
- **+Gen ** = genitive case
- **+Voc ** = vocative case
- **+Sg ** = singular
- **+Pl ** = plural
- **+hPref ** =
- **+DefArt ** = noun preceded by definite article (an)
- **+Def ** = noun preceded by definite article (an)
- **+Idf ** = noun without article (there is no indefinite article)
- **+Strong ** = strong plural
- **+Weak ** = weak plural
- **+Emph ** = emphasised - ár dteachsa, do theachsa, a teachsa
- **+Len ** = lenite after simple prep. eg ar chat
- **+Prep ** = prefix h before vowel
- **+Ecl ** = after compound prep eg ar an gcat
- **+CM ** = canúint na Mumhan, Munster dialect
- **+CC ** = canúint Chonnachta
- **+CU ** = canúint Uladh
- **+Place ** = Place name
- **+Fam ** = Family Name
- **+Pers ** = Personal Name
- **+Adj +Base ** = +Adj+Base+DeNom are used for adjectives derived from proper nouns, e.g. Spáinneach
- **^M ^F ** = masculine & feminine : initial mutations of singular nouns depend on whether the noun is masculine or feminine
- **^C ^G ** = nominative, genitive & vocative : initial mutations of plural nouns depend on the case
- **^Sé ** = Séimhiú (softening) Lenition - h added after certain initial
- **^Urú ** = Eclipsis - a letter placed before word initial letter (bcdfgpt)
- **^tv ** = “t-” before a vowel (eg éan : Nom. Sg. Masc. an t-éan - the bird)
- **^ts ** = “t” before “s”
- **^Def ** = dntls rule after definite article
- **^Caol ** = Caolú (slenderise)- Attenuation : ie slenderise the end of word
- **^Lea ** = Leathnú - Broadening eg an “i” is removed
- **^Coim ** = Coimriú - Syncopation - the last unstressed vowel is dropped
- **^Ath ** = Athrú (Change) - in certain plurals the ending changes : “e” -> “í”,
- **^VH ** = Maintains vowel harmony of broad and slender vowels
- **^Emph ** = emphatic forms
- **^IM ** = general initial mutation e.g. mo chat, ar an mballa
- **+VT +VI +VTI +VD ** = transitive, intrans., both trans & intrans, ditransitive
- **+Vow ** = vowel initial stem
- **+Suf ** = -s suffix e.g. a bhíonns
- **+Typo ** = ta/ata instead of tá/atá
- **+Var ** = variant spelling e.g. rabh instead of raibh or dheachaidh
- **+1P +2P +3P ** = First, second and third person
- **+Auto ** = Autonomous
- **+Sg +Pl ** = Singular and Plural
- **+PresInd ** = Present Indicative
- **+PastInd ** = Past Indicative
- **+FutInd ** = Future Indicative
- **+PastImp ** = Ghnáthchaite Past Habitual (Imperfect Indicative)
- **+PresImp ** = Ghnáthláithreach Pres Habitual (Verb bí only - and deireann (abair))
- **+PresSubj ** = Present Subjunctive
- **+PastSubj ** = Past Subjunctive
- **+Rel ** = relative forms - direct
- **+RelInd ** = rel. indirect
- **+Dep ** = dependent forms
- **+Cop ** = Copula
- **+Pres ** = copula present & future
- **+Past ** = copula past & conditional
- **+VF ** = - form used before a word starting with a vowel or f+vowel
- **+Pron ** = - copula+pron - sea
- **+Art ** = - copula+pron+art - sén
- **+Def ** = - copula+pron+art - sén
- **+Subst ** = - copula+pron+art+noun - séard (is é an rud)
- **+Noun ** = - copula+pron+art+noun - séard (is é an rud)
- **+Emph ** = - emphatic forms
- **+CM ** = Canúint na Mumhan
- **+CC ** = Canúint Chonnachta
- **+CU ** = Canúint Uladh
- **^VN ** = verbal noun
- **+VT ** = transitive
- **+VD ** = ditransitive
- **+VI ** = intransitive
- **+VTI ** = transitive & intransitive
- **+Vow ** = vowel-initial : used to allow past-tense Len e.g. d’ith
- **+1P +2P +3P ** = First, second and third person
- **+Auto ** = Autonomous
- **+Sg +Pl ** = Singular and Plural
- **+PresInd ** = Present Indicative
- **+PastInd ** = Past Indicative
- **+FutInd ** = Future Indicative
- **+PastImp ** = Past Imperfect Indicative
- **+Cond ** =
- **+PresSubj ** = Present Subjunctive
- **+PastSubj ** = Past Subjunctive
- **+Imper ** =
- **+Neg ** = Negative
- **+Q ** = Interrogative
- **+NegQ ** =
- **^Sé ** = Séimhiú (Lenite, soften)
- **^Caol ** = Caolaítear an deireadh (Slenderise the ending)
- **^Lea ** = Leathnaítear an tús (Broaden the root)
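- **^LeaS ** = Leathnaítear an tús mura dtosnaíonn an foirceann le “t” (broaden the root unless the ending begins with “t”)
- **^LC ** = leathan/Caol: Leathnaítear an tús mura dtosnaíonn an foirceann le “t” (broad/slender: broaden the root unless the ending begins with “t”)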
- **^igh ** = remove -igh ending
- **^aigh ** = remove -aigh ending
- **^Coim ** = Coimriú (syncopation)
- **^Fr ** = Fréamh (root) use root - i.e. don’t syncopate in these cases
- **^Do ** = d’ before Past, Past Imperfect (gnáthchaite) and conditional
- **^hv ** = h before vowel e.g. ná hólaigí …
- **+VD ** = ditransitive - not at present
- **+NStem ** = de-nominal verbal (action) noun
- **^IM ** = initial mutation
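
The symbols listed above are attached to a lemma one after another. As a rough illustration of how such a string can be taken apart, the following Python sketch splits it into a lemma and its symbols. The sample string `cat+Noun+Masc+Com+Sg` and the helper name `split_analysis` are assumptions made for this example, built only from tags in the list above; the analyser's real output may order or format its tags differently.

```python
import re

# Every symbol in the list above starts with "+" or "^" and runs
# up to the next "+" or "^".
TAG_RE = re.compile(r"[+^][^+^]+")


def split_analysis(analysis):
    """Split a hypothetical analysis string into (lemma, [symbols])."""
    match = TAG_RE.search(analysis)
    if match is None:
        # No symbols at all: the whole string is the lemma.
        return analysis, []
    lemma = analysis[:match.start()]
    tags = TAG_RE.findall(analysis[match.start():])
    return lemma, tags


lemma, tags = split_analysis("cat+Noun+Masc+Com+Sg")
print(lemma, tags)  # cat ['+Noun', '+Masc', '+Com', '+Sg']
```

Keeping the leading `+` or `^` on each symbol preserves the distinction the list makes between the two kinds of marker.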
We have manually optimised the structure of our lexicon using the following
flag diacritics to restrict morphological combinatorics - only allow compounds
with verbs if the verb is further derived into a noun again:

| Flag diacritic | Explanation |
| --- | --- |
| @P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
| @D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
| @C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised |

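
To make the interplay of these three flags concrete, here is a minimal Python sketch of how P (positive set), D (disallow) and C (clear) flag diacritics are conventionally evaluated along a path in xfst/HFST-style transducers. The function `path_allowed` and the two example paths are hypothetical, the placement of the clear flag on nominalisation is an assumption, and the sketch ignores all other flag types; it is not the lexicon's actual code.

```python
def path_allowed(flags):
    """Return True if a sequence of P/D/C flag diacritics is consistent."""
    features = {}  # feature name -> value currently set on this path
    for flag in flags:
        parts = flag.strip("@").split(".")
        op, feature = parts[0], parts[1]
        value = parts[2] if len(parts) > 2 else None
        if op == "P":
            # Positive set: remember the value for this feature.
            features[feature] = value
        elif op == "C":
            # Clear: forget whatever was set for this feature.
            features.pop(feature, None)
        elif op == "D":
            # Disallow: fail if the feature currently has this value
            # (or any value, when the flag names no value).
            current = features.get(feature)
            if current is not None and (value is None or current == value):
                return False
        else:
            raise ValueError("unsupported flag in this sketch: " + flag)
    return True


# Flag set and never cleared: the later D flag blocks the path.
print(path_allowed(["@P.NeedNoun.ON@", "@D.NeedNoun.ON@"]))  # False
# Flag cleared in between (assumed to happen on nominalisation).
print(path_allowed(["@P.NeedNoun.ON@", "@C.NeedNoun@", "@D.NeedNoun.ON@"]))  # True
```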
For languages that allow compounding, the following flag diacritics are needed
to control position-based compounding restrictions for nominals. Their use is
handled automatically if combined with +CmpN/xxx tags. If not used, they will
do no harm.

| Flag diacritic | Explanation |
| --- | --- |
| @P.CmpFrst.FALSE@ | Require that words tagged as such only appear first |
| @D.CmpPref.TRUE@ | Block such words from entering ENDLEX |
| @P.CmpPref.FALSE@ | Block these words from making further compounds |
| @D.CmpLast.TRUE@ | Block such words from entering R |
| @D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding |
| @U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding |
| @P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R |
| @D.CmpOnly.FALSE@ | Disallow words coming directly from root. |

Use the following flag diacritics to control downcasing of derived proper
nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use
these flags. There exists a ready-made regex that will do the actual down-casing
given the proper use of these flags.

| Flag diacritic | Explanation |
| --- | --- |
| @U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj. |
| @U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj. |
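
The `U` prefix marks unification flags. As a minimal sketch, assuming the usual xfst/HFST semantics and ignoring negatively set values: the first `U` flag on a path fixes the value of its feature, and any later `U` flag for the same feature must carry the same value or the path fails. The function name `unify_path` and the example paths are hypothetical; how the lexicon actually distributes these two flags is not shown here.

```python
def unify_path(flags):
    """Return True if all U flags on the path agree for each feature."""
    features = {}  # feature name -> value fixed by the first U flag
    for flag in flags:
        op, feature, value = flag.strip("@").split(".")
        if op != "U":
            raise ValueError("only U flags are handled in this sketch")
        # setdefault stores the value if the feature is still unset,
        # otherwise it returns the value recorded earlier.
        if features.setdefault(feature, value) != value:
            return False
    return True


print(unify_path(["@U.Cap.Obl@", "@U.Cap.Obl@"]))  # True: values agree
print(unify_path(["@U.Cap.Obl@", "@U.Cap.Opt@"]))  # False: values clash
```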