Irish morphological analyser
INTRODUCTION TO THE MORPHOLOGICAL ANALYSER OF THE IRISH LANGUAGE.
Definitions for Multichar_Symbols
Tag symbols for analysis
The morphological analyses of wordforms for the Irish
language are presented in this system in terms of the following symbols.
- **+Corr ** =
- **+Error ** =
- **+Start ** =
Tag list:
Flag diacritics
We have manually optimised the structure of our lexicon using the following
flag diacritics to restrict morphological combinatorics. Compounds with verbs
are only allowed if the verb is further derived into a noun again:

| Flag diacritic | Explanation |
| --- | --- |
| @P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
| @D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
| @C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised |

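The interplay of the three NeedNoun flags can be sketched with a small simulation of the standard flag diacritic operators: `P` sets a feature, `D` blocks the path if the feature has the given value, and `C` clears it. This is a minimal sketch, not the analyser's own code. The function name `path_is_allowed` is hypothetical, only the `P`, `D`, `C`, `R` and `U` operators are modelled, and the placement of the flags along each path is an assumption made for illustration rather than something taken from the lexc source.

```python
import re

# Matches flag diacritics such as @P.NeedNoun.ON@ or @C.NeedNoun@.
# The N operator is deliberately left out of this sketch.
FLAG_RE = re.compile(r"^@([PRDCU])\.([^.@]+)(?:\.([^.@]+))?@$")


def path_is_allowed(symbols):
    """Return True if the flag diacritics along a path are consistent.

    Symbols that are not (modelled) flag diacritics are ignored.
    """
    state = {}  # feature name -> current value
    for symbol in symbols:
        match = FLAG_RE.match(symbol)
        if match is None:
            continue  # ordinary symbol, no effect on the flag state
        op, feature, value = match.groups()
        current = state.get(feature)
        if op == "P":    # positive set
            state[feature] = value
        elif op == "C":  # clear
            state.pop(feature, None)
        elif op == "D":  # disallow: fail if the feature has this value
            if current is not None and (value is None or current == value):
                return False
        elif op == "R":  # require: fail unless the feature has this value
            if current is None or (value is not None and current != value):
                return False
        elif op == "U":  # unify: set if unset, otherwise values must agree
            if current is None:
                state[feature] = value
            elif current != value:
                return False
    return True


# Hypothetical paths: a verb stem sets the flag, a nominalising suffix
# clears it, and the entry point to compounding checks it.
verb_compound = ["@P.NeedNoun.ON@", "@D.NeedNoun.ON@"]
nominalised_compound = ["@P.NeedNoun.ON@", "@C.NeedNoun@", "@D.NeedNoun.ON@"]
noun_compound = ["@D.NeedNoun.ON@"]

print(path_is_allowed(verb_compound))         # False
print(path_is_allowed(nominalised_compound))  # True
print(path_is_allowed(noun_compound))         # True
```

Under these assumptions a bare verb is blocked from compounding, while a verb that has passed through a nominalising step is let through.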
For languages that allow compounding, the following flag diacritics are needed
to control position-based compounding restrictions for nominals. Their use is
handled automatically if combined with +CmpN/xxx tags. If not used, they will
do no harm.

| Flag diacritic | Explanation |
| --- | --- |
| @P.CmpFrst.FALSE@ | Require that words tagged as such only appear first |
| @D.CmpPref.TRUE@ | Block such words from entering ENDLEX |
| @P.CmpPref.FALSE@ | Block these words from making further compounds |
| @D.CmpLast.TRUE@ | Block such words from entering R |
| @D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding |
| @U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding |
| @P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R |
| @D.CmpOnly.FALSE@ | Disallow words coming directly from root. |

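Each flag diacritic in the table has the shape `@OPERATOR.FEATURE.VALUE@`. As a reading aid, the following sketch splits the eight compounding flags into those three parts and groups them by feature, which makes it easier to see which flags work on the same feature. The names `Flag`, `parse_flag` and `COMPOUND_FLAGS` are hypothetical helpers introduced here, not part of the analyser.

```python
import re
from collections import defaultdict
from typing import NamedTuple, Optional


class Flag(NamedTuple):
    operator: str         # P, D, U, ...: what the flag does
    feature: str          # e.g. CmpNone: which feature it acts on
    value: Optional[str]  # e.g. TRUE; absent in flags like @C.NeedNoun@


FLAG_RE = re.compile(r"^@([A-Z])\.([^.@]+)(?:\.([^.@]+))?@$")


def parse_flag(symbol: str) -> Flag:
    """Split a flag diacritic into operator, feature and value."""
    match = FLAG_RE.match(symbol)
    if match is None:
        raise ValueError(f"not a flag diacritic: {symbol!r}")
    return Flag(*match.groups())


# The flags exactly as listed in the table above.
COMPOUND_FLAGS = [
    "@P.CmpFrst.FALSE@",
    "@D.CmpPref.TRUE@",
    "@P.CmpPref.FALSE@",
    "@D.CmpLast.TRUE@",
    "@D.CmpNone.TRUE@",
    "@U.CmpNone.FALSE@",
    "@P.CmpOnly.TRUE@",
    "@D.CmpOnly.FALSE@",
]

# Group (operator, value) pairs under the feature they act on.
by_feature = defaultdict(list)
for symbol in COMPOUND_FLAGS:
    flag = parse_flag(symbol)
    by_feature[flag.feature].append((flag.operator, flag.value))

for feature, uses in by_feature.items():
    print(feature, uses)
```

Grouped this way, CmpPref, CmpNone and CmpOnly each appear with two flags, while CmpFrst and CmpLast appear with one.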
Use the following flag diacritics to control downcasing of derived proper
nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use
these flags. There exists a ready-made regex that will do the actual down-casing
given the proper use of these flags.

| Flag diacritic | Explanation |
| --- | --- |
| @U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj. |
| @U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj. |

The Root lexicon etc.
- **LEXICON Root ** =
- ** Abbrev; ** =
- ** Prepositions; ** = Adpositions = Prepositions in
- ** Adverb; ** =
- ** Articles; ** =
- ** Conjunctions; ** =
- ** Determiners; ** =
- ** Interjections; ** =
- ** Fillers; ** =
- ** Communicators; ** =
- ** Events; ** =
- ** Anonymous; ** =
- ** Numerals; ** =
- ** Particles; ** =
- ** Personal_Pronouns; ** =
- ** Englishlex; ** = English lexicon including all parts of speech
- ** Communicators-English; ** = English multi word communicators, e.g. d’ya know
- ** Bardiclex; ** = classical Irish lexicon from TCD Bardic corpus
- ** Latinlex; ** = Latin lexicon from RIA historical corpus
- ** !Tobar; ** = omitting this (non-standard older forms)
- ** Punctuation; ** =
- ** Punctuation_ga; ** =
- ** Symbols; ** =
- ** XMLTags; ** = XML tags, e.g. <p>, etc.
- ** AdjA; ** = ORIGINAL TEST LEXICON
- ** AdjIrregular; ** = ORIGINAL TEST LEXICON
- ** Adj-BaseOnly; !AdjBASE; ** = ORIGINAL TEST LEXICON
- ** Adj-IrregComp; !AI-COMP; ** = ORIGINAL TEST LEXICON
- ** AdjB; ** = punk adjs
- ** AdjC; ** = FP adjs - auto
- ** AdjDath; ** = colours
- ** AdjE; ** = FP adjs - manual
- ** Adj-FGB1; ** = Foclóir Gaeilge Béarla Uí Dhónaill
- ** Adj-FGB2; ** = Foclóir Gaeilge Béarla Uí Dhónaill
- ** AdjVariants; ** = Adj Variants in FGB
- ** AdjEqualVariants; ** = Adj Variants with Equal Sign in FGB
- ** AdjF; ** = Nationalities
- ** AdjG; ** = additions from gaois.ie bitex
- ** Nouns; ** = ORIGINAL TEST LEXICON
- ** Dative; ** = ORIGINAL TEST LEXICON
- ** Other; ** = ORIGINAL TEST LEXICON
- ** NounsB; ** = nouns
- ** NounsC; ** = FP nouns (automatic)
- ** NounsD; ** = FP nouns (manual Decl 1-3)
- ** NounsE; ** = FP nouns (manual Decl 4-5)
- ** NounsF; ** = FP nouns (manual Irregular)
- ** NounsH; ** = Various from corpora
- ** NounsIrregular; ** =
- ** Substantive; ** =
- ** NounsFGB1; ** = FGB (O Donaill) automatic (in NCI corpus)
- ** NounsFGB2; ** = FGB (O Donaill) automatic (additional)
- ** NounsVariants; ** = Variants extracted from FGB
- ** NounsEqualVariants; ** = Variants extracted from FGB (2011 EUD)
- ** NounsG; ** = Proper Nouns - MOVED from Nouns TO Proper Nouns Lexicons
- ** NP-LEX-FAM; ** = Family Names (Irish)
- ** NP-LEX-FAM-EN; ** = Family Names (English)
- ** NP-LEX-PERS; ** = Personal Names (Irish)
- ** NP-LEX-PERS-EN; ** = Personal Names (English)
- ** NP-LEX-EIRE; ** = Ireland - Counties, Cities and Towns (Irish)
- ** NP-LEX-EIRE-EN; ** = Ireland - Counties, Cities and Towns (English)
- ** NP-LEX-TIR; ** = Countries (Irish)
- ** NP-LEX-TIR-EN; ** = Countries (English)
- ** NP-Irregular; ** = Various Irregular Proper Nouns
- ** NP-LEX-ORG; ** = Organisations
- ** NP-LEX-LOGAINM; ** = Placenames - sample from logainm.ie
- ** NP-LEX-RIACORPAS1; ** = Various Proper nouns from RIA Historical Corpus of Irish
- ** VerbalNounsV; ** = Verbal nouns derived from verb roots
- ** VerbalNounsN; ** = Verbal nouns derived from nouns
- ** VerbalAdjs; ** = Verbal adjectives derived from verb roots
- ** VerbalNounsGenV; ** = Verbal nouns (genitive case) derived from verb roots
- ** VerbalNounsGenN; ** = Verbal nouns (genitive case) derived from nouns
- ** VN-Variants; ** = FGB VN variants (VN, VNG & VA included)
- ** VNEqualVariants; ** = FGB VN = variants (VN, VNG & VA included)
- ** Verbs; ** = Irregular verbs (11)
- ** VerbsC1A; ** = ORIGINAL TEST LEXICON
- ** VerbsC2A; ** = ORIGINAL TEST LEXICON
- ** VerbsB; ** = verbs
- ** VerbsC; ** = FP verbs
- ** VerbsD; ** = FP verbs
- ** Verbs-FGB1; ** = FGB verbs
- ** Verbs-FGB2; ** = FGB verbs
- ** Verb-Variants; ** = FGB verb variants
- ** VerbsEqualVariants; ** = FGB verb = variants
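The list above mirrors the continuation lexicons named under `LEXICON Root`, and entries such as `!Tobar;` are switched off by the lexc comment character `!`. As a rough sketch of that convention, the function below lists the active continuations of a Root lexicon from lexc text. The function name `root_continuations` and the sample text are hypothetical (the lexicon names come from the list, the layout and the `example` entry are invented for illustration), and escaped `%!` characters are not handled.

```python
def root_continuations(lexc_source):
    """Return the continuation lexicon names active under LEXICON Root."""
    names = []
    in_root = False
    for raw_line in lexc_source.splitlines():
        # '!' starts a comment in lexc; drop everything after it.
        line = raw_line.split("!", 1)[0].strip()
        if not line:
            continue
        if line.startswith("LEXICON "):
            in_root = line.split()[1] == "Root"
            continue
        if in_root and line.endswith(";"):
            parts = line[:-1].split()
            if parts:
                # The continuation class is the last token before ';'.
                names.append(parts[-1])
    return names


# Hypothetical sample in the style of a root.lexc file.
SAMPLE = """
LEXICON Root
Abbrev ;
Prepositions ;
!Tobar ;          ! omitted (non-standard older forms)
Adj-BaseOnly ; !AdjBASE ;
Punctuation ;

LEXICON Abbrev
example # ;
"""

print(root_continuations(SAMPLE))
# ['Abbrev', 'Prepositions', 'Adj-BaseOnly', 'Punctuation']
```

In this sketch `Tobar` and `AdjBASE` are skipped because they sit behind a `!`, which matches how they are described in the list.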
This (part of) documentation was generated from src/fst/morphology/root.lexc