Somali language model documentation
All doc-comment documentation in one large file.
src-cg3-functions.cg3.md
-
Sets for POS sub-categories
-
Sets for Semantic tags
-
Sets for Morphosyntactic properties
-
Sets for verbs
-
V covers all readings that contain a V tag; REAL-V should cover only those without an N tag following the V.
The REAL-V set thus awaits a fix to the preprocess V … N bug.
-
The set COPULAS is for predicative constructions
-
NP sets defined according to their morphosyntactic features
-
The PRE-NP-HEAD family of sets
These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression WORD - premodifiers.
The set NOT-NPMOD is used to find barriers between NPs. Typical usage: … (*1 N BARRIER NOT-NPMOD) … meaning: scan to the first noun, ignoring anything that can be part of the noun phrase of that noun (i.e., “scan to the next NP head”).
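The scan described above can be sketched in Python. This is only an illustration of the idea, not the CG-3 implementation: the tag names in `PREMODIFIERS` are hypothetical stand-ins for the real PRE-NP-HEAD sets, and the sketch assumes the target (N) is tested before the barrier, so that a noun is found even though it is not itself a premodifier.

```python
# Hypothetical premodifier tags; the real sets are defined in functions.cg3.
PREMODIFIERS = frozenset({"Det", "A", "Num"})


def scan_to_np_head(cohorts, start, premodifiers=PREMODIFIERS):
    """Mimic (*1 N BARRIER NOT-NPMOD): scan rightwards from `start`.

    `cohorts` is a list of tag sets, one per word. Returns the index of the
    first noun, or None if something that cannot be part of the noun phrase
    is met first.
    """
    for i in range(start + 1, len(cohorts)):
        tags = cohorts[i]
        if "N" in tags:                 # target found: this is the NP head
            return i
        if not (tags & premodifiers):   # barrier: not a possible premodifier
            return None
    return None


sentence = [{"V"}, {"Det"}, {"A"}, {"N"}]
blocked = [{"V"}, {"Det"}, {"V"}, {"N"}]
head = scan_to_np_head(sentence, 0)      # the noun at index 3
no_head = scan_to_np_head(blocked, 0)    # stopped by the second verb
```

In the second sentence the intervening verb is neither a noun nor a premodifier, so the scan gives up before reaching the noun.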
-
Miscellaneous sets
-
Border sets and their complements
-
Syntactic sets
These were the set types.
HABITIVE MAPPING
-
hab1
-
hab2
-
hab3 (@ADVL>) for hab-actor and hab-case; if leat to the right, and Nom to the right of leat. Lots of restrictions.
-
habNomLeft
-
hab4
-
hab6
-
hab7
-
hab8 This is not HAB
-
hab5 This is not HAB
-
habDain (@ADVL>) for (Pron Dem Pl Loc) if leat followed by Nom to the right
-
habGen (@<ADVL) hab for Gen; if Gen is located at the end of the sentence and Nom is sentence initial
-
spred<obj (@SPRED<OBJ) for Acc; the object of an SPRED. Not to be mistaken for OPRED. If SPRED is to the left, and copulas is to the left of it. Nom or Hab are found sentence initially.
-
Hab<spred (@<SPRED) for Nom; if copulas, goallut or jápmit is FMAINV and habitive or human Loc is found to the left. OR: if Ill or @Pron< followed by HAB are found to the left.
-
Hab>Advlcase<spred (@<SUBJ) for Nom; it allows adverbials with Ill/Loc/Com/Ess to be found in between HAB and .
-
Nom>Advlcase<spred (@<SUBJ) for Nom; it allows adverbials with Ill/Loc/Com/Ess to be found in between Nom and @<SUBJ.
-
<spred (@<SUBJ) for Nom; if copulas to the left, and some kind of adverb, N Loc, time related word or Po to the left of it. OR: if Ill or @Pron< to the left, followed by copulas and the aforementioned to the left of copulas.
-
<spred (@<SUBJ) for Nom, but not for Pers. To the left boahtit or heaŋgát as MAINV, and further to the left some kind of place related word or time related word.
-
<spredQst1 (@<SUBJ) for Nom in a typical question sentence, if one of the following holds:
A) Hab, some kind of place word, Po or Nom to the left, and Qst followed by copulas to the left.
B) Same as A, only the Qst-pcle is attached to copulas.
C) Qst to the left, with copulas to its left, but not if two Nom:s are found somewhere to the right.
D) Copulas to the left, and BOS to the left.
E) Loc or Ill to the left, and Loc or Hab to the left of this, Qst and copulas to the left.
F) Num @>N to the left, Hab, some kind of place word, Po or Nom to the left, and Qst followed by copulas to the left.
NOTE) For all these rules: human, Loc or Sem/Plc not allowed to the right.
-
<spredQst2 (@<SPRED) for Nom; in a typical question sentence; differs from <spredQst1 by not being as restricted to the right. The target is still not allowed to be Pers or human.
-
Nom<spredQst (@<SPRED) for Nom; in a typical question sentence. Differs from <spredQst2 by letting Nom be found between SPRED and copulas.
-
<spred (@<SPRED) for A Nom or N Nom if; the subject Nom is on the same side of copulas as you: on the right side of copulas
-
<spredVeara (@<SPRED) for veara + Nom; if genitive immediately to the right, and intransitive mainverb to the right of genitive
-
leftCop<spred (@<SPRED) for Nom; if copulas is the main verb to the left, and there is no Ess found to the left of cop (note that Loc is allowed between target and cop). OR: if you are Coll or Sem/Group with copulas to your left.
-
<spredLocEXPERIMENT (@<SPRED) for material Loc; if you are to the right of copulas, and the Nom to the left of copulas is not a hab-actor
-
NumTime (@<SPRED) for A Nom
-
<spredSg (@<SPRED) for Sg Nom
-
<spredPg (@<SPRED) for Pl Nom
-
<spred (@<SPRED) for Nom; if copulas to the left, and Nom or sentence boundary to the left of copulas. First one to the right is EOS.
-
<spred (@<SPRED) for N Ess
-
spredEss> (@SPRED>) for N Ess; if copulas to the right of you, and if an NP with nom-case first one to your left.
-
HABSpredSg> (@SPRED>) for Nom; if habitive first one to the left, followed by copulas.
-
GalleSpred> (@SPRED>) for Num Nom; if sentence initial
-
spredSgMII> (@SPRED>)
-
r492> (@SPRED>) for Interr Gen; consisting only of negations. You are not allowed to be MII. You are not allowed to have an adjective or noun to your right. You are not allowed to have a verb to your right, the exception being an aux.
-
AdjSpredSg> (@SPRED>) for A Sg Nom; if copulas to the right, but not if A or @<SPRED are found to the right of copulas
-
SpredSg>Hab (@SPRED>) for Nom; if you are sentence initial, copulas is located to the right, and there is a habitive to the right of copulas
-
Spred>SubjInf (@SPRED>) for Nom; if copulas to the right, and the subject of copulas is an Inf to the right
-
spredCoord (@<SPRED) coordination for Nom; only if there already is a SPRED to the left of CNP. Not if there is some kind of comparison involved.
-
subj>Sgnr1 (@SUBJ>) for Nom Sg, including Indef Nom if; VFIN + Sg3 or Pl3 to the right (VFIN not allowed to the left)
-
subj>Du (@SUBJ>) for dual nominatives, including Coll Nom. VFIN + Du3 to the right.
-
subj>Pl (@SUBJ>) for plural nominatives, including Coll and Sem/Group. VFIN + Pl3 to the right.
-
subj>Pl (@SUBJ>) for plural nominatives
-
subj>Sgnr2 (@SUBJ>) for Nom Sg; if VFIN + Sg3 to the right.
-
<subjSg (@<SUBJ) for Nom Sg; if VFIN Sg3 or Du2 to the left (no HAB allowed to the left).
-
f<advl (@-F<ADVL) for infinite adverbials
-
f<advl (@-F<ADVL) for infinite adverbials
-
s-boundary=advl> (@ADVL>) for ADVL that resemble s-boundaries. Mainverb to the right.
-
-fobj> (@-FOBJ>) for Acc
-
-fobj> (@-FOBJ>) for Acc
-
advl>mainV (@ADVL>) if; finite mainverb not found to the left, but the finite mainverb is found to the right.
-
<advl (@<ADVL) if; finite mainverb found to the left. Not if a comma is found immediately to the left and a finite mainverb is located somewhere to the right of this comma.
-
<advlPoPr (@<ADVL) if mainverb to the left.
-
advlPoPr> (@<ADVL) if mainverb to the right.
-
advlEss> (@<ADVL) for weather and time Ess, if FMAINV to the left.
-
advl>inbetween (@ADVL>) for Adv; if in between two sentence boundaries where no mainverb is present.
-
comma<advlEOS (@<ADVL) if; comma found to the left and the finite mainverb to the left of comma. To the right is the end of the sentence.
-
advlBOS> (@ADVL>) if; you are N Ill and found sentence initially. First one to your right is a clause.
-
<advlPoEOS (@<ADVL) for Po; if you are found at the very end of a sentence. A mainverb is needed to the right though.
-
cleanupILL<advl (@<ADVL) for N Ill if; there are no boundary symbols to your left, if you aren't already @N< OR @APP-N<, and no mainverb is to your left.
-
<opredAAcc (@<OPRED) for A Acc; if another accusative to the left, and a transitive verb to the left of it. OR: if a transitive verb to the left, and an accusative to the left of it.
sma object
-
<advlEss (@<ADVL) for ESS-ADVL if; FMAINV to the left
-
<spredEss (@<SPRED) for N Ess if; FMAINV to the left is intransitive or bargat
SUBJ MAPPING - leftovers
OBJ MAPPING - leftovers
HNOUN MAPPING
This (part of) documentation was generated from src/cg3/functions.cg3
src-fst-morphology-affixes-adjectives.lexc.md
Adjective inflection
The Somali language adjectives compare.
This (part of) documentation was generated from src/fst/morphology/affixes/adjectives.lexc
src-fst-morphology-affixes-irregularverbs.lexc.md
Irregular verbs
These are the “irregular” verbs, which are mostly prefixing or copular.
The copulas are mostly suffixing, and all the other verbs include person prefixes, and agreement on suffixes for person. Tense and mood are expressed with complex stem alternations that are no longer 100% productive, and progressive is formed from a derivational stem, with no person prefixes.
NB: After adding some additional morphological boundaries for some of the verbs, it should become obvious that the number of lexica can be reduced further. Prefixing verbs often have multiple stems for separate tenses, and get more or less the same person prefixes in full and reduced paradigms. The only catch is that this requires more flag diacritics, to make sure that the prefix matches the suffix.
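As a rough illustration of how flag diacritics can keep a prefix and a suffix in agreement, the sketch below imitates unification flags of the form `@U.FEATURE.VALUE@`. It is an assumption-laden toy, not the lexc source: the feature name `PERS` and the placeholder morphs `PFX2`, `STEM`, `SFX2` and `SFX3` are hypothetical, and only U-type flags are handled.

```python
import re

# Matches unification flags such as @U.PERS.2@ (feature PERS, value 2).
FLAG = re.compile(r"^@U\.([^.@]+)\.([^.@]+)@$")


def path_is_valid(symbols):
    """Return True if all U-flags on the path agree per feature.

    The first flag for a feature sets its value; any later flag for the same
    feature must carry the same value, otherwise the path is rejected.
    """
    seen = {}
    for sym in symbols:
        m = FLAG.match(sym)
        if m is None:
            continue  # ordinary symbol, not a flag
        feature, value = m.groups()
        if seen.setdefault(feature, value) != value:
            return False
    return True


# Hypothetical paths: a 2nd person prefix with a matching and a clashing suffix.
matching = ["@U.PERS.2@", "PFX2", "STEM", "SFX2", "@U.PERS.2@"]
clashing = ["@U.PERS.2@", "PFX2", "STEM", "SFX3", "@U.PERS.3@"]
ok = path_is_valid(matching)
bad = path_is_valid(clashing)
```

The clashing path is filtered out because the suffix carries a different value for the same feature than the prefix did.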
TODO: omg
LEXICON MA
ma and related inflected forms.
Ah
ah is a verb meaning ‘to exist’, but it can also function as a copula. It is inflected in all tenses and has long and short forms.
LEXICON Ah
Inflections in tense.
This (part of) documentation was generated from src/fst/morphology/affixes/irregularverbs.lexc
src-fst-morphology-affixes-nouns.lexc.md
Noun inflection
Somali nouns inflect for case and are marked for gender and number.
This (part of) documentation was generated from src/fst/morphology/affixes/nouns.lexc
src-fst-morphology-affixes-prefixes.lexc.md
Prefixes
Prefixes in the Somali language are bound to the beginning of other words.
This (part of) documentation was generated from src/fst/morphology/affixes/prefixes.lexc
src-fst-morphology-affixes-propernouns.lexc.md
Proper noun inflection
The Somali language proper nouns inflect in the same cases as regular nouns, but with a colon (‘:’) as separator.
This (part of) documentation was generated from src/fst/morphology/affixes/propernouns.lexc
src-fst-morphology-affixes-symbols.lexc.md
Symbol affixes
This (part of) documentation was generated from src/fst/morphology/affixes/symbols.lexc
src-fst-morphology-affixes-verbs.lexc.md
Verb inflection
Somali verbs inflect for person.
      Full  Reduced
1Sg   A     A
2Sg   B     A
3SgM  A     A
3SgF  B     B
1Pl   C     C
2Pl   D     A
3Pl   E     A
      Present   Past
1Sg   keenaa    keenay
2Sg   keentaa   keentay
3SgM  keenaa    keenay
3SgF  keentaa   keentay
1Pl   keennaa   keennay
2Pl   keentaan  keenteen
3Pl   keenaan   keeneen
      Present     Past
1Sg   keénayaa    keénayay
2Sg   keénaysaa   keénaysay
3SgM  keénayaa    keénayay
3SgF  keénaysaa   keénaysay
1Pl   keénaynaa   keénaynay
2Pl   keénaysaan  keénayseen
3Pl   keénayaan   keénayeen
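The relation between the agreement classes A–E and the simple present forms of keen can be sketched as a lookup. This is a minimal illustration, not the lexc implementation: the segmentation of each form into the stem keen plus an ending is an assumption read off the present-tense column, and no forms are generated for the reduced paradigm because the source lists only its class pattern.

```python
# Agreement class per person, copied from the Full/Reduced table.
FULL = {"1Sg": "A", "2Sg": "B", "3SgM": "A", "3SgF": "B",
        "1Pl": "C", "2Pl": "D", "3Pl": "E"}
REDUCED = {"1Sg": "A", "2Sg": "A", "3SgM": "A", "3SgF": "B",
           "1Pl": "C", "2Pl": "A", "3Pl": "A"}

# Assumed simple present endings, segmented from keenaa, keentaa, etc.
PRESENT_ENDINGS = {"A": "aa", "B": "taa", "C": "naa", "D": "taan", "E": "aan"}


def present_full(stem):
    """Build the full simple present paradigm for a regular stem."""
    return {person: stem + PRESENT_ENDINGS[cls] for person, cls in FULL.items()}


paradigm = present_full("keen")
# Number of distinct agreement classes in each paradigm type.
full_classes = len(set(FULL.values()))
reduced_classes = len(set(REDUCED.values()))
```

Counting the distinct classes shows how much the reduced paradigm collapses: five classes in the full paradigm against three in the reduced one.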
Apply post-root tones, and other root triggers
This (part of) documentation was generated from src/fst/morphology/affixes/verbs.lexc
src-fst-morphology-phonology.twolc.md
The Somali morphophonological/twolc rules file
Morphophonological notes
Phonological Processes in Somali
Somali has several phonological alternations involving reduplication, lenition, vowel harmony and tone. The hope is that this documentation will either make the twolc rules clearer, or help if it becomes necessary to redo all the rules completely.
Spreading processes
Lenition
The lenis stop series in Somali alternates with the fortis series:
ilig ‘tooth’ ~ iligga ‘tooth (Def.)’ ~ ilko ‘teeth (Indef.)’
arag ‘see’ ~ aragtaa ‘2Sg/3SgF sees’ ~ arkaa ‘1Sg/3SgM sees’
Voicing assimilation
Stops assimilate for voicing (or lenis/fortis), particularly across morpheme boundaries. However, they only assimilate if they share place of articulation.
aragtaa ‘2Sg/3SgF sees’
wararka ‘the news’
buugga ‘the book’
naagta ‘the woman’
jaamacadda ‘the university’
The same holds for the retroflex segment:
gabadh ‘(a) girl’
gabadha ‘the girl’
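The examples above can be reproduced with a small sketch. It is deliberately simplified and covers only the patterns shown here; it assumes the underlying article forms -ka and -ta, and the function name `attach_article` is hypothetical. It is not a rendering of the actual twolc rules.

```python
# Voiced counterpart of each article onset (same place of articulation).
VOICED = {"k": "g", "t": "d"}


def attach_article(stem, article):
    """Attach -ka/-ta to a stem, assimilating the onset where place matches."""
    onset, rest = article[0], article[1:]
    # Retroflex dh absorbs the t of the article: gabadh + ta -> gabadha.
    if stem.endswith("dh") and onset == "t":
        return stem + rest
    # Same place of articulation: the onset takes on the stem-final voicing.
    if stem[-1] == VOICED.get(onset):
        return stem + VOICED[onset] + rest
    # Different place (or no stop): no assimilation.
    return stem + article


forms = [
    attach_article("buug", "ka"),      # g + k share place
    attach_article("jaamacad", "ta"),  # d + t share place
    attach_article("naag", "ta"),      # g + t differ in place
    attach_article("warar", "ka"),     # no stem-final stop
    attach_article("gabadh", "ta"),    # retroflex case
]
```

Keeping the place check explicit mirrors the statement that stops only assimilate when they share place of articulation, which is why naagta keeps its t.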
Vowel ablaut
Vowels are subject to two main types of ablaut: (1) full ablaut across back consonants, and (2) partial ablaut with a ~ e.
waxaan ‘foc+1Sg’ magac rah
wuxuu ‘foc+3SgM’ magucu / magacu ruhu
magicii / magicii
Full ablaut appears to be optional in some words.
Partial ablaut occurs in verbal infinitives with almost any word of the pattern CaC. When the infinitive ending is appended, a raises to e:
tag ‘go’ tegi ‘to go’ tegeen ‘they went’
bax ‘leave’ bexi ‘to leave’ bexeen ‘they left’
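The partial ablaut in infinitives can be sketched as follows. This is a toy illustration under stated assumptions: the infinitive ending is taken to be -i as in the examples, the CaC pattern is matched with a simple regular expression, and the helper name `infinitive` is hypothetical. Since the alternation is described as applying to most, not all, CaC words, exceptions are not modelled.

```python
import re

# A stem of exactly the shape consonant + a + consonant.
CAC = re.compile(r"^([^aeiou])a([^aeiou])$")


def infinitive(stem):
    """Append the infinitive ending -i, raising a to e in CaC stems."""
    m = CAC.match(stem)
    if m:
        stem = m.group(1) + "e" + m.group(2)
    return stem + "i"


go = infinitive("tag")
leave = infinitive("bax")
```

Both results match the infinitives listed above, tegi and bexi.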