Somali language model documentation

All doc-comment documentation in one large file.

src-cg3-functions.cg3.md

Sets for POS sub-categories
Sets for Semantic tags
Sets for Morphosyntactic properties
Sets for verbs
V covers all readings with a V tag in them; REAL-V should cover only those without an N tag following the V.
The REAL-V set thus awaits a fix to the preprocess V … N bug.
The set COPULAS is for predicative constructions
NP sets defined according to their morphosyntactic features
The PRE-NP-HEAD family of sets

These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression WORD - premodifiers.

The set NOT-NPMOD is used to find barriers between NPs. Typical usage: … (*1 N BARRIER NOT-NPMOD) … meaning: scan to the first noun, ignoring anything that can be part of the noun phrase of that noun (i.e., “scan to the next NP head”).
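The barrier scan described above can be sketched in Python. This is only an illustration under stated assumptions: the tag names in PREMODIFIERS are hypothetical stand-ins for the real PRE-NP-HEAD sets, each cohort is reduced to a flat set of tags, and the scan checks the target before the barrier.

```python
# Hypothetical premodifier tags; the real sets are defined in src/cg3/functions.cg3.
PREMODIFIERS = {"A", "Det", "Num"}


def is_not_npmod(cohort):
    """WORD - premodifiers: true when the cohort carries no premodifier tag."""
    return not (cohort & PREMODIFIERS)


def scan_to_np_head(cohorts, start):
    """Mimic (*1 N BARRIER NOT-NPMOD): index of the next noun, or None if blocked."""
    for i in range(start + 1, len(cohorts)):
        if "N" in cohorts[i]:          # target is checked before the barrier
            return i
        if is_not_npmod(cohorts[i]):   # anything that cannot be inside the NP blocks the scan
            return None
    return None


# Toy sentence given as tag sets only: V Det A N V N
sentence = [{"V"}, {"Det"}, {"A"}, {"N"}, {"V"}, {"N"}]
print(scan_to_np_head(sentence, 0))  # scans over Det and A to the noun
print(scan_to_np_head(sentence, 3))  # blocked by the verb
```

Starting from the first verb, the scan passes the premodifiers and stops at the noun; starting from that noun, the following verb acts as a barrier and the scan fails.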

Miscellaneous sets
Border sets and their complements
Syntactic sets

These were the set types.

HABITIVE MAPPING

hab1
hab2
hab3 ( @ADVL>) for hab-actor and hab-case; if leat to the right, and Nom to the right of leat. Lots of restrictions.
habNomLeft
hab4
hab6
hab7
hab8 This is not HAB
hab5 This is not HAB
habDain ( @ADVL>) for (Pron Dem Pl Loc) if leat followed by Nom to the right
habGen ( @<ADVL) hab for Gen; if Gen is located in the end of the sentence and Nom is sentence initial
spred<obj (@SPRED<OBJ) for Acc; the object of an SPRED. Not to be mistaken for OPRED. If SPRED is to the left, and copulas is to the left of it. Nom or Hab are found sentence initially.
Hab<spred (@<SPRED) for Nom; if copulas, goallut or jápmit is FMAINV and habitive or human Loc is found to the left. OR: if Ill or @Pron< followed by HAB are found to the left.
Hab>Advlcase<spred ( @<SUBJ) for Nom; it allows adverbials with Ill/Loc/Com/Ess to be found in between HAB and .
Nom>Advlcase<spred ( @<SUBJ) for Nom; it allows adverbials with Ill/Loc/Com/Ess to be found in between Nom and @<SUBJ.
<spred ( @<SUBJ) for Nom; if copulas to the left, and some kind of adverb, N Loc, time related word or Po to the left of it. OR: if Ill or @Pron< to the left, followed by copulas and the before mentioned to the left of copulas.
<spred ( @<SUBJ) for Nom, but not for Pers. To the left boahtit or heaŋgát as MAINV, and further to the left is some kind of place related word, or time related word.
<spredQst1 ( @<SUBJ) for Nom in a typical question sentence; if A) Hab, some kind of place word, Po or Nom to the left, and Qst followed by copulas to the left. B) same as A, only the Qst-pcle is attached to copulas. C) Qst to the left, with copulas to its left, but not if two Noms are found somewhere to the right. D) copulas to the left, and BOS to the left. E) Loc or Ill to the left, and Loc or Hab to the left of this, Qst and copulas to the left. F) Num @>N to the left, Hab, some kind of place word, Po or Nom to the left, and Qst followed by copulas to the left. NOTE) for all these rules: human, Loc or Sem/Plc not allowed to the right.
<spredQst2 (@<SPRED) for Nom; in a typical question sentence; differs from <spredQst1 by not being as restricted to the right. Though you are not allowed to be Pers or human.
Nom<spredQst (@<SPRED) for Nom; in a typical question sentence. Differs from <spredQst2 by letting Nom be found between SPRED and copulas.
<spred (@<SPRED) for A Nom or N Nom if; the subject Nom is on the same side of copulas as you: on the right side of copulas
<spredVeara (@<SPRED) for veara + Nom; if genitive immediately to the right, and intransitive mainverb to the right of genitive
leftCop<spred (@<SPRED) for Nom; if copulas is the main verb to the left, and there is no Ess found to the left of cop (note that Loc is allowed between target and cop). OR: if you are Coll or Sem/Group with copulas to your left.
<spredLocEXPERIMENT (@<SPRED) for material Loc; if you are to the right of copulas, and the Nom to the left of copulas is not a hab-actor
NumTime (@<SPRED) for A Nom
<spredSg (@<SPRED) for Sg Nom
<spredPg (@<SPRED) for Pl Nom
<spred (@<SPRED) for Nom; if copulas to the left, and Nom or sentence boundary to the left of copulas. First one to the right is EOS.
<spred (@<SPRED) for N Ess
spredEss> (@SPRED>) for N Ess; if copulas to the right of you, and if an NP with nom-case first one to your left.
HABSpredSg> (@SPRED>) for Nom; if habitive first one to the left, followed by copulas.
GalleSpred> (@SPRED>) for Num Nom; if sentence initial
spredSgMII> (@SPRED>)
r492> (@SPRED>) for Interr Gen; consisting only of negations. You are not allowed to be MII. You are not allowed to have an adjective or noun to your right. You are not allowed to have a verb to your right, the exception being an aux.
AdjSpredSg> (@SPRED>) for A Sg Nom; if copulas to the right, but not if A or @<SPRED are found to the right of copulas
SpredSg>Hab (@SPRED>) for Nom; if you are sentence initial, copulas is located to the right, and there is a habitive to the right of copulas
Spred>SubjInf (@SPRED>) for Nom; if copulas to the right, and the subject of copulas is an Inf to the right
spredCoord (@<SPRED) coordination for Nom; only if there already is a SPRED to the left of CNP. Not if there is some kind of comparison involved.
subj>Sgnr1 (@SUBJ>) for Nom Sg, including Indef Nom if; VFIN + Sg3 or Pl3 to the right (VFIN not allowed to the left)
subj>Du (@SUBJ>) for dual nominatives, including Coll Nom. VFIN + Du3 to the right.
subj>Pl (@SUBJ>) for plural nominatives, including Coll and Sem/Group. VFIN + Pl3 to the right.
subj>Pl (@SUBJ>) for plural nominatives
subj>Sgnr2 (@SUBJ>) for Nom Sg; if VFIN + Sg3 to the right.
<subjSg (@<SUBJ) for Nom Sg; if VFIN Sg3 or Du2 to the left (no HAB allowed to the left).
f<advl (@-F<ADVL) for infinite adverbials
f<advl (@-F<ADVL) for infinite adverbials
s-boundary=advl> (@ADVL>) for ADVL that resemble s-boundaries. Mainverb to the right.
-fobj> (@-FOBJ>) for Acc
-fobj> (@-FOBJ>) for Acc
advl>mainV (@ADVL>) if; finite mainverb not found to the left, but the finite mainverb is found to the right.
<advl (@<ADVL) if; finite mainverb found to the left. Not if a comma is found immediately to the left and a finite mainverb is located somewhere to the right of this comma.
<advlPoPr (@<ADVL) if mainverb to the left.
advlPoPr> (@<ADVL) if mainverb to the right.
advlEss> (@<ADVL) for weather and time Ess, if FMAINV to the left.
advl>inbetween (@ADVL>) for Adv; if in between two sentence boundaries where no mainverb is present.
comma<advlEOS (@<ADVL) if; comma found to the left and the finite mainverb to the left of comma. To the right is the end of the sentence.
advlBOS> (@ADVL>) if; you are N Ill and found sentence initially. First one to your right is a clause.
<advlPoEOS (@<ADVL) for Po; if you are found at the very end of a sentence. A mainverb is needed to the right though.
cleanupILL<advl (@<ADVL) for N Ill if; there are no boundary symbols to your left, if you aren't already @N< OR @APP-N<, and no mainverb is to your left.
<opredAAcc (@<OPRED) for A Acc; if another accusative to the left, and a transitive verb to the left of it. OR: if a transitive verb to the left, and an accusative to the left of it.
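As an illustration of how one of the simpler mapping rules above reads, here is a minimal Python sketch of the condition given for subj>Sgnr2 ("@SUBJ> for Nom Sg; if VFIN + Sg3 to the right"). It is an assumption-laden paraphrase, not the CG3 rule itself: cohorts are flat tag sets, and any barriers or extra restrictions the real rule may have are left out.

```python
def map_subj_sgnr2(cohorts):
    """Add @SUBJ> to every Nom Sg cohort that has a VFIN Sg3 cohort to its right."""
    mapped = []
    for i, tags in enumerate(cohorts):
        new_tags = set(tags)
        is_target = {"Nom", "Sg"} <= tags
        verb_to_right = any({"VFIN", "Sg3"} <= c for c in cohorts[i + 1:])
        if is_target and verb_to_right:
            new_tags.add("@SUBJ>")
        mapped.append(new_tags)
    return mapped


# Toy input as tag sets only: a singular nominative noun followed by a finite Sg3 verb.
clause = [{"N", "Nom", "Sg"}, {"V", "VFIN", "Sg3"}]
print(map_subj_sgnr2(clause)[0])
```

Reversing the order of the two cohorts leaves the noun unmapped, since the finite verb is then to its left.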

sma object

<advlEss (@<ADVL) for ESS-ADVL if; FMAINV to the left
<spredEss (@<SPRED) for N Ess if; FMAINV to the left is intransitive or bargat

SUBJ MAPPING - leftovers

OBJ MAPPING - leftovers

HNOUN MAPPING

This (part of) documentation was generated from src/cg3/functions.cg3

src-fst-morphology-affixes-adjectives.lexc.md

Adjective inflection

Somali adjectives inflect for comparison.

This (part of) documentation was generated from src/fst/morphology/affixes/adjectives.lexc

src-fst-morphology-affixes-irregularverbs.lexc.md

Irregular verbs

These are the “irregular” verbs, which are mostly prefixing or copular.

The copulas are mostly suffixing, and all the other verbs include person prefixes, and agreement on suffixes for person. Tense and mood are expressed with complex stem alternations that are no longer 100% productive, and progressive is formed from a derivational stem, with no person prefixes.

NB: After adding some additional morphological boundaries for some of the verbs, it should become obvious that the number of lexica can be reduced further. Prefixing verbs often have multiple stems for separate tenses, and get more or less the same person prefixes in the full and reduced paradigms. The only trick is that this requires more flag diacritics, to make sure that the prefix matches the suffix.
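The role of flag diacritics in keeping prefix and suffix in step can be sketched with a tiny path checker in Python. It handles only the P (set) and R (require) operations; the feature name PERS and the morph strings PFX2-, STEM and -SFX2 are hypothetical placeholders, not forms or flags from the actual lexc files.

```python
import re

# Matches flag diacritics of the form @P.FEATURE.VALUE@ or @R.FEATURE.VALUE@.
FLAG = re.compile(r"@([PR])\.([A-Za-z]+)\.([A-Za-z0-9]+)@")


def accept(path):
    """Return the surface string if every R flag agrees with an earlier P flag, else None."""
    features = {}
    surface = []
    for symbol in path:
        match = FLAG.fullmatch(symbol)
        if match is None:
            surface.append(symbol)         # ordinary symbol: goes to the surface
            continue
        op, feature, value = match.groups()
        if op == "P":
            features[feature] = value      # the prefix records its person
        elif features.get(feature) != value:
            return None                    # the suffix requires a different person
    return "".join(surface)


good = ["@P.PERS.2@", "PFX2-", "STEM", "@R.PERS.2@", "-SFX2"]
bad = ["@P.PERS.2@", "PFX2-", "STEM", "@R.PERS.1@", "-SFX1"]
print(accept(good))
print(accept(bad))
```

With this kind of check, one stem lexicon can be shared by all persons: mismatching prefix and suffix combinations exist as paths but are filtered out by the flags.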

TODO: omg

LEXICON MA ma and related inflected forms.

ah is a verb meaning ‘to exist’, but can function as a copula. It is inflected in all tenses, but has long and short forms.

LEXICON Ah Inflections in tense.

This (part of) documentation was generated from src/fst/morphology/affixes/irregularverbs.lexc

src-fst-morphology-affixes-nouns.lexc.md

Noun inflection

Somali nouns inflect for case and are marked for gender and number.

This (part of) documentation was generated from src/fst/morphology/affixes/nouns.lexc

src-fst-morphology-affixes-prefixes.lexc.md

Prefixes

Prefixes in Somali are bound to the beginning of other words.

This (part of) documentation was generated from src/fst/morphology/affixes/prefixes.lexc

src-fst-morphology-affixes-propernouns.lexc.md

Proper noun inflection

Somali proper nouns inflect in the same cases as regular nouns, but with a colon (‘:’) as separator.

This (part of) documentation was generated from src/fst/morphology/affixes/propernouns.lexc

src-fst-morphology-affixes-symbols.lexc.md

Symbol affixes

This (part of) documentation was generated from src/fst/morphology/affixes/symbols.lexc

src-fst-morphology-affixes-verbs.lexc.md

Verb inflection

Somali verbs inflect for person.

        Full    Reduced
1Sg     A       A
2Sg     B       A
3SgM    A       A
3SgF    B       B
1Pl     C       C
2Pl     D       A
3Pl     E       A
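One way to read the pattern table is that persons sharing a letter share a form within that paradigm. Under that assumption (the table itself does not spell it out), a short Python sketch can group the persons by pattern and show how much the reduced paradigm collapses.

```python
from collections import defaultdict

# Person -> (full pattern, reduced pattern), copied from the table above.
PATTERNS = {
    "1Sg": ("A", "A"),
    "2Sg": ("B", "A"),
    "3SgM": ("A", "A"),
    "3SgF": ("B", "B"),
    "1Pl": ("C", "C"),
    "2Pl": ("D", "A"),
    "3Pl": ("E", "A"),
}
FULL, REDUCED = 0, 1


def syncretic_groups(column):
    """Group persons by the pattern letter they carry in the given column."""
    groups = defaultdict(list)
    for person, letters in PATTERNS.items():
        groups[letters[column]].append(person)
    return dict(groups)


print(syncretic_groups(FULL))
print(syncretic_groups(REDUCED))
```

On this reading the full paradigm distinguishes five forms, while the reduced paradigm keeps only three, with 3SgF and 1Pl standing apart from all other persons.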

        Present     Past
1Sg     keenaa      keenay
2Sg     keentaa     keentay
3SgM    keenaa      keenay
3SgF    keentaa     keentay
1Pl     keennaa     keennay
2Pl     keentaan    keenteen
3Pl     keenaan     keeneen

        Present       Past
1Sg     keénayaa      keénayay
2Sg     keénaysaa     keénaysay
3SgM    keénayaa      keénayay
3SgF    keénaysaa     keénaysay
1Pl     keénaynaa     keénaynay
2Pl     keénaysaan    keénayseen
3Pl     keénayaan     keénayeen
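The progressive forms above are regular enough to be generated by concatenation. The Python sketch below assumes a segmentation that is not stated in the source: stem keén, progressive marker ay, a person consonant (none, s or n), and a tense ending that is longer for 2Pl and 3Pl. It is meant as a reading aid, not as a description of how verbs.lexc is organised.

```python
# Assumed segmentation: stem + "ay" + person consonant + tense ending.
PERSON_CONSONANT = {
    "1Sg": "", "2Sg": "s", "3SgM": "", "3SgF": "s",
    "1Pl": "n", "2Pl": "s", "3Pl": "",
}
ENDINGS = {
    "present": {"short": "aa", "long": "aan"},
    "past": {"short": "ay", "long": "een"},
}
LONG_ENDING_PERSONS = {"2Pl", "3Pl"}


def progressive(stem, person, tense):
    """Build a progressive form from the assumed pieces."""
    length = "long" if person in LONG_ENDING_PERSONS else "short"
    return stem + "ay" + PERSON_CONSONANT[person] + ENDINGS[tense][length]


for person in PERSON_CONSONANT:
    print(person, progressive("keén", person, "present"), progressive("keén", person, "past"))
```

The loop reproduces the fourteen cells of the table, which suggests that a single continuation lexicon per tense could cover the progressive.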

Apply post-root tones, and other root triggers

This (part of) documentation was generated from src/fst/morphology/affixes/verbs.lexc

src-fst-morphology-phonology.twolc.md

The Somali morphophonological/twolc rules file

Morphophonological notes

Phonological Processes in Somali

Somali has several phonological alternations involving reduplication, lenition, vowel harmony and tone. The hope is that this documentation will either make the twolc rules clearer or help if the time comes to redo all the rules completely.

Spreading processes

Lenition

The lenis stop series in Somali alternates with the fortis series , note that the does not actually participate. Lenis stops are found in coda positions, and fortis stops are found elsewhere. The alternation occurs in both nouns and verbs.

ilig ‘tooth’   ~   iligga ‘tooth (Def.)’     ~   ilko ‘teeth (Indef.)’
arag ‘see’     ~   aragtaa ‘2Sg/3SgF sees’   ~   arkaa ‘1Sg/3SgM sees’

Voicing assimilation

Stops assimilate for voicing (or lenis/fortis), particularly across morpheme boundaries. However, they only assimilate if they share place of articulation.

aragtaa            ‘2Sg/3SgF sees’
wararka            ‘the news’
buugga             ‘the book’
naagta            ‘the woman’
jaamacadda        ‘the university’

This also follows for the retroflex segment , however the sequence is shortened to .

gabadh            ‘(a) girl’
gabadha           ‘the girl’
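The voicing assimilation examples above can be reproduced with a small Python sketch. The morpheme splits (buug + ka, naag + ta, jaamacad + ta, warar + ka, arag + taa) and the place-of-articulation table are assumptions made for the illustration, and the retroflex case (gabadh ~ gabadha) is deliberately not handled.

```python
# Assumed place classes for the stops involved in the examples.
PLACE = {"g": "velar", "k": "velar", "d": "alveolar", "t": "alveolar"}
# Voiced counterpart of each suffix-initial voiceless stop.
VOICED = {"k": "g", "t": "d"}


def attach(stem, suffix):
    """Join stem and suffix, voicing the suffix-initial stop after a voiced stop of the same place."""
    last, first = stem[-1], suffix[0]
    if last in ("g", "d") and first in VOICED and PLACE[last] == PLACE[first]:
        return stem + VOICED[first] + suffix[1:]
    return stem + suffix


for stem, suffix in [("buug", "ka"), ("naag", "ta"), ("jaamacad", "ta"),
                     ("warar", "ka"), ("arag", "taa")]:
    print(stem, "+", suffix, "=", attach(stem, suffix))
```

The pair naag + ta shows the place restriction at work: the stem ends in a velar and the suffix begins with an alveolar, so nothing assimilates.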

Vowel ablaut

Vowels are subject to two main types of ablaut: (1) full ablaut across back consonants, and (2) partial ablaut with ~ preceding the high vowel . Full ablaut is constrained to morpheme boundaries: and most commonly occurs in the ‘waxa’ focus marker.

waxaan ‘foc+1Sg’            magac                        rah
wuxuu  ‘foc+3SgM’           magucu    / magacu           ruhu
                            magicii   / magicii

Full ablaut appears to be optional in some words. Partial ablaut occurs in verbal infinitives of almost any word of the pattern CaC. When the infinitive ending is appended, a raises to e. It also occurs around person suffixes and tense ending in

tag ‘go’        tegi ‘to go’            tegeen ‘they went’
bax ‘leave’     bexi ‘to leave’         bexeen ‘they left’
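The partial ablaut pattern in these examples can be sketched in Python. The trigger used here (a suffix beginning with i or e, applied to stems of the shape CaC) is an assumption read off the four forms above, and digraph consonants are not treated specially.

```python
import re

# A stem of the shape C(C)aC(C): consonants, the vowel a, consonants.
CAC = re.compile(r"^([^aeiou]+)a([^aeiou]+)$")


def with_ablaut(stem, suffix):
    """Attach suffix, raising the stem vowel a to e before a suffix starting with i or e."""
    match = CAC.match(stem)
    if match and suffix[:1] in ("i", "e"):
        stem = match.group(1) + "e" + match.group(2)
    return stem + suffix


for stem in ("tag", "bax"):
    print(stem, with_ablaut(stem, "i"), with_ablaut(stem, "een"))
```

Without a suffix the stem is returned unchanged, so the bare forms tag and bax keep their a.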

deletion
