GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started and our Privacy document.
Lexicon files are a part of the langs/sms/src/morphology infrastructure.
TODO: example sms entry
Man skiller mellom synonymer og meningsgrupper. Synonymer har samme
Elementet
TODO: example entry with <t />
TODO: example entry with <t />
TODO: example entry with <t />
TODO: example entry with <t />
TODO:
TODO:
Typically these words also include an
<e>
<lg>
<l pos="Pron" type="Pers">muʹnne</l>
<lemma_ref lemmaID="mon_Pron_Pers">mon</lemma_ref>
<analysis>Pron_Pers_Sg_Ill</analysis>
</lg>
Leads to …
<e>
<lg>
<l pos="Pron" type="Pers">mon</l>
</lg>
These are found in Pron_references_sms2x.xml.
TODO:
In sms these can come in under either
```TODO: example of reasons
### Files with static paradigms
Currently all sms files have a minimal miniparadigm, but in NDS we generate more.
In NDS we can tell the system to not use the static miniparadigm with the @exclude attribute:
<mini_paradigm exclude="NDS">
<analysis ms="Pron_Pers_Sg1_Gen"><wordform>muu</wordform></analysis>
<analysis ms="Pron_Pers_Sg1_Ill"><wordform>muʹnne</wordform></analysis>
</mini_paradigm> ```
If this attribute is not present as in the above, then the static paradigm will be displayed in NDS.
TODO:
Miniparadigms are generated in lexicon entries in order to help users. They vary from POS to POS and sometimes within POS.
TODO: are these the tags we use now in sms?
+Use/NGminip - remove inflectional forms that one does not want to present in the miniparadigm. One example, North Saami adjectives.
NB: judicious use of +Use/NGminip from sme to clean up many possibilities into one.
Inflection | Without +Use/NGminip | With +Use/NGminip |
---|---|---|
A+Sg+Nom | heittot | heittot |
A+Attr | heittogis heittohis (bivttas) | heittogis (bivttas) |
A+Pl+Nom | heittogat heittohat | heittogat |
A+Comp+Attr | heittogit heittogut heittoget heittogat heittohit heittohut heittohet heittohat | heittoget heittogat |
A+Comp+Sg+Nom | heittogit heittogut heittoget heittogeabbo heittogat heittogabbo heittohit heittohut heittohet heittoheabbo heittohat heittohabbo | heittogeabbo heittogabbo |
A+Superl+Sg+Nom | heittogeamos heittogamos heittoheamos heittohamos | heittogeamos heittogamos |
Display the whole paradigm in two columns for plural. In NDS, because there are case inflections that do not have +Sg or +Pl, we use a special tagset to separate these cases out to display them across one column.
TODO: Noun attributes that affect miniparadigms ?
Bøyning | Eksempel |
---|---|
Sg+Nom | võrr |
Sg+Gen | võõr |
Sg+Acc | võõr |
Sg+Ill | võʹrre |
Sg+Loc | võõrâst |
Sg+Com | võõrin |
Sg+Abe | võõrtää |
Sg+Abe | võõrtaa TODO: does this need an attribute to control? |
Pl+Nom | võõr |
Pl+Gen | võõri |
Pl+Acc | võõrid |
Pl+Ill | võõrid |
Pl+Loc | võõrin |
Pl+Com | võõrivuiʹm |
Pl+Abe | võõritää |
Pl+Abe | võõritaa TODO: does this need an attribute to control? |
Ess | võrrân |
Par | võrrâd |
For now, all proper nouns are not generated in Plural.
Sg+Nom Njuõttjokk Sg+Gen Njuõttjook Sg+Acc Njuõttjook Sg+Ill Njuõttjoʹǩǩe Sg+Loc Njuõttjookâst Sg+Abe Njuõttjooktää Sg+Par Njuõttjokkâd
EX: Äʹnnjääuʹraž
TODO: determine how to display these in sms
Form | Context | Example | Translation |
---|---|---|---|
- | - | ||
Sg+Gen | X pääiʹǩ | ||
Sg+Ill | - | ||
Sg+Loc | - |
TODO: Any plural-only proper nouns?
use räjja in context for e.g. eeʹjjpeeiʹv räjja
For adjectives we use context as an attribute on the lemma node, in order to provide an attributive adjective example with a noun.
TODO: determine some good contexts for adjs
Inflection | Context | Example |
---|---|---|
A+Pred+Sg | oođâs | |
A+Attr | context:”??” | ođđ (??) |
A+Comp | ođđsab | |
A+Superl | ođđsumus |
TODO: +A+Pred+Pl ?
TODO:
Most personal pronouns can be generated live from FSTs, depending on the analysis, but it may be easier to include the whole paradigm in a miniparadigm because of complexities in tags matching up with lemmas.
This also requires the type=”Pers” attribute on the
Inflection | Example |
---|---|
Sg+Nom | mon |
Sg+Gen | muu |
Sg+Acc | muu |
Sg+Ill | muʹnne |
Sg+Loc | muʹst |
Sg+Com | muin |
Sg+Abe | muutää |
Ess | muuʹnen |
Par | muuʹđed |
TODO:
måtam Måtmin
TODO:
Because the analyzer uses tags that make generation difficult, the thought was to include miniparadigms in the XML file that will contain the whole displayed paradigm.
TODO:
TODO:
Sg1
Sg2
Sg3
Pl1
Pl2
Pl3
Homonymous entries (lemma + POS) may be tricky for a combination of the lexicon and the analyzer. An additional way to deal with this is to mark these on an additional attribute, POS type, or something else. This is also problematic when generating the correct paradigm for the lexicon entry, or when lining up analyses with the meanings.
TODO: jokk is homonymous in sms, find examples for documentation from there.
TODO: