Finite state and Constraint Grammar based analysers, proofing tools and other resources
All doc-comment documentation in one large file.
Sets for POS sub-categories
Sets for Semantic tags
Sets for Morphosyntactic properties
Sets for verbs
V is all readings with a V tag in them; REAL-V should
be the ones without an N tag following the V.
The REAL-V set thus awaits a fix to the preprocess V … N bug.
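The intended difference between V and REAL-V can be sketched in Python. This is only an illustration of the definition above, not the CG3 set definition itself: readings are modelled as plain tag lists, and the example readings and tags are hypothetical.

```python
def has_v(reading):
    """True for any reading that carries a V tag (the V set)."""
    return "V" in reading


def is_real_v(reading):
    """True if the reading has a V tag and no N tag after the first V tag."""
    if "V" not in reading:
        return False
    first_v = reading.index("V")
    return "N" not in reading[first_v + 1:]


# Hypothetical readings: lemma first, then tags.
readings = [
    ["keen", "V", "Prs", "Sg3"],      # plain verb reading
    ["keen", "V", "Der", "N", "Sg"],  # V followed by N: in V, not in REAL-V
    ["buug", "N", "Sg"],              # no V tag at all
]

v_set = [r for r in readings if has_v(r)]
real_v_set = [r for r in readings if is_real_v(r)]
```

With these example readings, `v_set` holds the first two readings and `real_v_set` only the first.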
The set COPULAS is for predicative constructions
NP sets defined according to their morphosyntactic features
The PRE-NP-HEAD family of sets
These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression WORD - premodifiers.
The set NOT-NPMOD is used to find barriers between NPs. Typical usage: … (*1 N BARRIER NOT-NPMOD) … meaning: scan to the first noun, ignoring anything that can be part of the noun phrase of that noun (i.e., “scan to the next NP head”).
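The barrier scan can be pictured with a small Python sketch. It is a simplification under stated assumptions: each token carries a single tag set (real CG3 cohorts can hold several readings), the premodifier tags are hypothetical, and the sketch tests the target before the barrier.

```python
PREMODIFIERS = {"A", "Det", "Num", "Pron"}  # hypothetical premodifier tags


def is_noun(token):
    return "N" in token["tags"]


def is_not_npmod(token):
    """WORD - premodifiers: any token that carries no premodifier tag."""
    return not (token["tags"] & PREMODIFIERS)


def scan_right(tokens, start, target, barrier):
    """Index of the first token right of `start` that satisfies `target`,
    or None if a `barrier` token comes first or the sentence ends."""
    for i in range(start + 1, len(tokens)):
        if target(tokens[i]):
            return i
        if barrier(tokens[i]):
            return None
    return None


inside_np = [{"tags": {"V"}}, {"tags": {"Det"}}, {"tags": {"A"}}, {"tags": {"N"}}]
blocked = [{"tags": {"V"}}, {"tags": {"Adv"}}, {"tags": {"A"}}, {"tags": {"N"}}]

head = scan_right(inside_np, 0, is_noun, is_not_npmod)    # 3
no_head = scan_right(blocked, 0, is_noun, is_not_npmod)   # None
```

In the first sentence the scan passes the determiner and adjective and lands on the noun. In the second, the adverb is not a possible premodifier, so it acts as a barrier and the scan fails.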
Miscellaneous sets
Border sets and their complements
Syntactic sets
These were the set types.
hab1
hab2
hab3 (
habNomLeft
hab4
hab6
hab7
hab5 This is not HAB
habDain (
habGen (
spred<obj (@SPRED<OBJ) for Acc; the object of an SPRED. Not to be confused with OPRED. If SPRED is to the left, and copulas is to the left of it. Nom or Hab are found sentence initially.
Hab<spred (@<SPRED) for Nom; if copulas, goallut or jápmit is FMAINV and habitive or human Loc is found to the left. OR: if Ill or @Pron< followed by HAB are found to the left.
Hab>Advlcase<spred (
Nom>Advlcase<spred (
<spred (
<spred (
<spredQst1 (
<spredQst2 (@<SPRED) for Nom; in a typical question sentence; differs from <spredQst1 by not being as restricted to the right. Though you are not allowed to be Pers or human.
Nom<spredQst (@<SPRED) for Nom; in a typical question sentence. Differs from <spredQst2 by letting Nom be found between SPRED and copulas
<spred (@<SPRED) for A Nom or N Nom if; the subject Nom is on the same side of copulas as you: on the right side of copulas
<spredVeara (@<SPRED) for veara + Nom; if genitive immediately to the right, and intransitive mainverb to the right of genitive
leftCop<spred (@<SPRED) for Nom; if copulas is the main verb to the left, and there is no Ess found to the left of cop (note that Loc is allowed between target and cop). OR: if you are Coll or Sem/Group with copulas to your left.
<spredLocEXPERIMENT (@<SPRED) for material Loc; if you are to the right of copulas, and the Nom to the left of copulas is not a hab-actor
NumTime (@<SPRED) for A Nom
<spredSg (@<SPRED) for Sg Nom
<spredPg (@<SPRED) for Pl Nom
<spred (@<SPRED) for Nom; if copulas to the left, and Nom or sentence boundary to the left of copulas. First one to the right is EOS.
<spred (@<SPRED) for N Ess
spredEss> (@SPRED>) for N Ess; if copulas to the right of you, and if an NP with nom-case first one to your left.
HABSpredSg> (@SPRED>) for Nom; if habitive first one to the left, followed by copulas.
GalleSpred> (@SPRED>) for Num Nom; if sentence initial
spredSgMII> (@SPRED>)
r492> (@SPRED>) for Interr Gen; consisting only of negations. You are not allowed to be MII. You are not allowed to have an adjective or noun to your right. You are not allowed to have a verb to your right; the exception being an aux.
AdjSpredSg> (@SPRED>) for A Sg Nom; if copulas to the right, but not if A or @<SPRED are found to the right of copulas
SpredSg>Hab (@SPRED>) for Nom; if you are sentence initial, copulas is located to the right, and there is a habitive to the right of copulas
Spred>SubjInf (@SPRED>) for Nom; if copulas to the right, and the subject of copulas is an Inf to the right
spredCoord (@<SPRED) coordination for Nom; only if there already is a SPRED to the left of CNP. Not if there is some kind of comparison involved.
subj>Sgnr1 (@SUBJ>) for Nom Sg, including Indef Nom if; VFIN + Sg3 or Pl3 to the right (VFIN not allowed to the left)
subj>Pl (@SUBJ>) for plural nominatives, including Coll and Sem/Group. VFIN + Pl3 to the right.
subj>Pl (@SUBJ>) for plural nominatives
subj>Sgnr2 (@SUBJ>) for Nom Sg; if VFIN + Sg3 to the right.
<subjSg (@<SUBJ) for Nom Sg; if VFIN Sg3 or Du2 to the left (no HAB allowed to the left).
f<advl (@-F<ADVL) for infinite adverbials
f<advl (@-F<ADVL) for infinite adverbials
s-boundary=advl> (@ADVL>) for ADVL that resemble s-boundaries. Mainverb to the right.
-fobj> (@-FOBJ>) for Acc
-fobj> (@-FOBJ>) for Acc
advl>mainV (@ADVL>) if; finite mainverb not found to the left, but the finite mainverb is found to the right.
<advl (@<ADVL) if; finite mainverb found to the left. Not if a comma is found immediately to the left and a finite mainverb is located somewhere to the right of this comma.
advlPoPr> (@<ADVL) if mainverb to the right.
advlEss> (@<ADVL) for weather and time Ess, if FMAINV to the left.
advl>inbetween (@ADVL>) for Adv; if in between two sentence boundaries where no mainverb is present.
comma<advlEOS (@<ADVL) if; comma found to the left and the finite mainverb to the left of comma. To the right is the end of the sentence.
advlBOS> (@ADVL>) if; you are N Ill and found sentence initially. First one to your right is a clause.
<advlPoEOS (@<ADVL) for Po; if you are found at the very end of a sentence. A mainverb is needed to the right though.
cleanupILL<advl (@<ADVL) for N Ill if; there are no boundary symbols to your left, if you aren't already @N< OR @APP-N<, and no mainverb is to your left.
This (part of) documentation was generated from src/cg3/functions.cg3
Adjective inflection
Somali adjectives inflect for comparison.
This (part of) documentation was generated from src/fst/morphology/affixes/adjectives.lexc
Irregular verbs
These are the “irregular” verbs, which are mostly prefixing or copular.
The copulas are mostly suffixing, and all the other verbs include person prefixes, and agreement on suffixes for person. Tense and mood are expressed with complex stem alternations that are no longer 100% productive, and progressive is formed from a derivational stem, with no person prefixes.
NB: After adding some additional morphological boundaries for some of the verbs, it should become obvious that the number of lexica can be reduced further. Prefixing verbs often have multiple stems for separate tenses, and get more or less the same person prefixes in full and reduced paradigms. The only catch is that this requires more flag diacritics, to make sure that the prefix matches the suffix.
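How flag diacritics can tie a prefix to a suffix is sketched below in Python. The sketch models only the P (set) and R (require) operators, and the feature name PERS and its values are hypothetical; they are not taken from the lexc source.

```python
def path_is_valid(flags):
    """Check a sequence of flag diacritics met along one analysis path.

    @P.FEATURE.VALUE@ sets a feature; @R.FEATURE.VALUE@ requires that the
    feature currently has exactly that value.
    """
    state = {}
    for flag in flags:
        op, feature, value = flag.strip("@").split(".")
        if op == "P":
            state[feature] = value
        elif op == "R":
            if state.get(feature) != value:
                return False
        else:
            raise ValueError(f"operator not covered by this sketch: {op}")
    return True


# The prefix sets the person, the suffix requires the same person.
matching = path_is_valid(["@P.PERS.2SG@", "@R.PERS.2SG@"])     # True
mismatched = path_is_valid(["@P.PERS.2SG@", "@R.PERS.1PL@"])   # False
```

A path whose suffix requires a person other than the one set by the prefix is rejected, so the stem lexica can be shared between persons.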
TODO: omg
LEXICON MA ma and related inflected forms.
Ah
ah is a verb meaning ‘to exist’, but can function as a copula. It is inflected in all tenses, but has long and short forms.
LEXICON Ah Inflections in tense.
This (part of) documentation was generated from src/fst/morphology/affixes/irregularverbs.lexc
Noun inflection
Somali nouns inflect for case and are marked for gender and number.
This (part of) documentation was generated from src/fst/morphology/affixes/nouns.lexc
Proper noun inflection
Somali proper nouns inflect for the same cases as regular nouns, but with a colon (‘:’) as separator.
This (part of) documentation was generated from src/fst/morphology/affixes/propernouns.lexc
This (part of) documentation was generated from src/fst/morphology/affixes/symbols.lexc
Verb inflection
Somali verbs inflect for person.
        Full    Reduced
1Sg     A       A
2Sg     B       A
3SgM    A       A
3SgF    B       B
1Pl     C       C
2Pl     D       A
3Pl     E       A
        Present     Past
1Sg     keenaa      keenay
2Sg     keentaa     keentay
3SgM    keenaa      keenay
3SgF    keentaa     keentay
1Pl     keennaa     keennay
2Pl     keentaan    keenteen
3Pl     keenaan     keeneen
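The pattern letters of the full paradigm can be read as classes of endings. The Python sketch below pairs each letter with a present-tense ending so that the present forms of keen listed above fall out by concatenation. This pairing is an assumption made for illustration; the source does not state it.

```python
# Person -> pattern letter, as in the Full column of the pattern table.
FULL = {
    "1Sg": "A", "2Sg": "B", "3SgM": "A", "3SgF": "B",
    "1Pl": "C", "2Pl": "D", "3Pl": "E",
}

# Assumed pairing of pattern letters with present-tense endings.
PRESENT_ENDINGS = {"A": "aa", "B": "taa", "C": "naa", "D": "taan", "E": "aan"}


def present(stem):
    """Present-tense forms of a regular stem, one per person."""
    return {person: stem + PRESENT_ENDINGS[cls] for person, cls in FULL.items()}


forms = present("keen")
# forms["1Sg"] == "keenaa", forms["2Sg"] == "keentaa", forms["1Pl"] == "keennaa"
```

Persons that share a letter share a form, which is why 1Sg and 3SgM, and 2Sg and 3SgF, coincide in the table.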
        Present       Past
1Sg     keénayaa      keénayay
2Sg     keénaysaa     keénaysay
3SgM    keénayaa      keénayay
3SgF    keénaysaa     keénaysay
1Pl     keénaynaa     keénaynay
2Pl     keénaysaan    keénayseen
3Pl     keénayaan     keénayeen
Apply post-root tones, and other root triggers
This (part of) documentation was generated from src/fst/morphology/affixes/verbs.lexc
Somali has several phonological alternations involving reduplication, lenition, vowel harmony and tone. The hope is that this documentation will either make the twolc rules clearer, or help if the time comes to completely redo all the rules.
The lenis stop series in Somali alternates with the fortis series
ilig ‘tooth’ ~ iligga ‘tooth (Def.)’ ~ ilko ‘teeth (Indef.)’
arag ‘see’ ~ aragtaa ‘2Sg/3SgF sees’ ~ arkaa ‘1Sg/3SgM sees’
Stops assimilate for voicing (or lenis/fortis), particularly across morpheme boundaries; however, they only assimilate if they share a place of articulation.
aragtaa ‘2Sg/3SgF sees’
wararka ‘the news’
buugga ‘the book’
naagta ‘the woman’
jaamacadda ‘the university’
This also holds for the retroflex segment
gabadh ‘(a) girl’
gabadha ‘the girl’
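The assimilation pattern in these examples can be sketched in Python. The underlying forms (stem plus a suffix beginning with k or t) are assumptions made for illustration, as the source lists only surface forms, and the treatment of the retroflex dh is a guess based on the single pair gabadh ~ gabadha. This is not a rendering of the twolc rules.

```python
# Suffix-initial stop -> its voiced counterpart at the same place of articulation.
VOICED_COUNTERPART = {"k": "g", "t": "d"}


def attach(stem, suffix):
    """Attach a suffix, assimilating its initial stop to a stem-final stop
    only when the two share place of articulation."""
    # Assumed retroflex case: the suffix-initial t merges into stem-final dh.
    if stem.endswith("dh") and suffix.startswith("t"):
        return stem + suffix[1:]
    if VOICED_COUNTERPART.get(suffix[0]) == stem[-1]:
        return stem + stem[-1] + suffix[1:]
    return stem + suffix


examples = [
    attach("buug", "ka"),      # "buugga": g + k share place, k voices
    attach("naag", "ta"),      # "naagta": g + t differ in place, no change
    attach("jaamacad", "ta"),  # "jaamacadda": d + t share place, t voices
    attach("warar", "ka"),     # "wararka": r is not a stop, no change
    attach("gabadh", "ta"),    # "gabadha": assumed retroflex case
]
```

The contrast between buugga and naagta shows the place condition: the same stem-final g triggers voicing of k but leaves t untouched.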
Vowels are subject to two main types of ablaut: (1) full ablaut across back consonants,
and (2) partial ablaut with a ~ e
waxaan ‘foc+1Sg’ magac rah
wuxuu ‘foc+3SgM’ magucu / magacu ruhu
magicii / magicii
Full ablaut appears to be optional in some words.
Partial ablaut occurs in verbal infinitives with almost any word of the pattern CaC. When
the infinitive ending is appended, a raises to e
tag ‘go’ tegi ‘to go’ tegeen ‘they went’
bax ‘leave’ bexi ‘to leave’ bexeen ‘they left’
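Partial ablaut in the infinitive, as seen in tag ~ tegi and bax ~ bexi, can be sketched in Python. The consonant inventory in the pattern is an assumption based on Somali orthography, and the sketch ignores the exceptions implied by the pattern applying to most, not all, CaC words.

```python
import re

# Assumed consonant inventory: digraphs first, then single letters.
CONS = r"(?:dh|kh|sh|[bcdfghjklmnqrstwxy'])"
CAC = re.compile(rf"^({CONS})a({CONS})$")


def infinitive(root):
    """Append the infinitive ending -i, raising a to e in CaC roots."""
    m = CAC.match(root)
    stem = f"{m.group(1)}e{m.group(2)}" if m else root
    return stem + "i"


go = infinitive("tag")     # "tegi"
leave = infinitive("bax")  # "bexi"
```

Roots that do not fit the CaC pattern pass through unchanged and only receive the ending.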