Lexicon files
Lexicon files are a part of the langs/sms/src/morphology infrastructure.
Entry structure
level
TODO: example sms entry
level
A distinction is made between synonyms and meaning groups. Synonyms have the same
level
The element
- a word
TODO: example entry with <t />
- a phrase
TODO: example entry with <t />
- An explanation: a sentence which explains the meaning of a word, but can’t be used in the translation.
TODO: example entry with <t />
- Restriction: gives a restriction for the translation, e.g. Norwegian _vest_ has the restriction _of clothes_, to separate it from the compass direction.
TODO: example entry with <t />
attribute documentation
TODO:
attribute documentation
TODO:
for references
Typically these words also include an
<e>
<lg>
<l pos="Pron" type="Pers">muʹnne</l>
<lemma_ref lemmaID="mon_Pron_Pers">mon</lemma_ref>
<analysis>Pron_Pers_Sg_Ill</analysis>
</lg>
Leads to …
<e>
<lg>
<l pos="Pron" type="Pers">mon</l>
</lg>
These are found in Pron_references_sms2x.xml.
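How a reference entry might be followed to its target entry can be sketched in Python. The wrapping `<r>` root, the closing `</e>` tags, and the choice to match on the `lemma_ref` text plus `pos` are assumptions made for this illustration; the real lookup may well use `lemmaID` or another key.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper around the two entries shown above (root and </e> added).
XML = """
<r>
  <e>
    <lg>
      <l pos="Pron" type="Pers">muʹnne</l>
      <lemma_ref lemmaID="mon_Pron_Pers">mon</lemma_ref>
      <analysis>Pron_Pers_Sg_Ill</analysis>
    </lg>
  </e>
  <e>
    <lg>
      <l pos="Pron" type="Pers">mon</l>
    </lg>
  </e>
</r>
"""


def build_index(root):
    """Map (lemma, pos) to the entry for every entry that is not a reference."""
    index = {}
    for entry in root.iter("e"):
        lg = entry.find("lg")
        if lg is None or lg.find("lemma_ref") is not None:
            continue  # skip malformed entries and reference entries
        lemma = lg.find("l")
        index[(lemma.text, lemma.get("pos"))] = entry
    return index


def resolve(entry, index):
    """Return the target entry of a reference, or the entry itself."""
    lg = entry.find("lg")
    ref = lg.find("lemma_ref")
    if ref is None:
        return entry
    # Assumption: the target shares the POS of the referring entry.
    return index.get((ref.text, lg.find("l").get("pos")))


root = ET.fromstring(XML)
index = build_index(root)
reference_entry = root.findall("e")[0]
target = resolve(reference_entry, index)
```

Here `target` is the entry whose `<l>` is mon, which is the behaviour described as "Leads to …" above.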
Example sentences
TODO:
In sms these can come in under either
TODO: example of reasons
### Files with static paradigms
Currently all sms files have a minimal miniparadigm, but in NDS we generate more.
In NDS we can tell the system to not use the static miniparadigm with the @exclude attribute:
```xml
<mini_paradigm exclude="NDS">
  <analysis ms="Pron_Pers_Sg1_Gen"><wordform>muu</wordform></analysis>
  <analysis ms="Pron_Pers_Sg1_Ill"><wordform>muʹnne</wordform></analysis>
</mini_paradigm>
```
If the attribute is absent, unlike in the example above, the static paradigm will be displayed in NDS.
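The effect of `exclude` can be sketched in Python. This is only an illustration of the rule as described here, not the NDS implementation: the function name `static_forms` and the exact string comparison against a single application name are assumptions.

```python
import xml.etree.ElementTree as ET

XML = """
<mini_paradigm exclude="NDS">
  <analysis ms="Pron_Pers_Sg1_Gen"><wordform>muu</wordform></analysis>
  <analysis ms="Pron_Pers_Sg1_Ill"><wordform>muʹnne</wordform></analysis>
</mini_paradigm>
"""


def static_forms(mini_paradigm, app="NDS"):
    """Return (ms, wordform) pairs, or an empty list if `app` is excluded."""
    # Assumption: exclude holds exactly one application name.
    if mini_paradigm.get("exclude") == app:
        return []
    return [
        (analysis.get("ms"), analysis.findtext("wordform"))
        for analysis in mini_paradigm.iter("analysis")
    ]


paradigm = ET.fromstring(XML)
excluded_result = static_forms(paradigm)   # exclude="NDS" is present

del paradigm.attrib["exclude"]             # simulate an entry without @exclude
shown_result = static_forms(paradigm)
```

With the attribute present nothing static is returned, so the application would fall back to generated forms; without it, both static word forms are returned.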
Other files
TODO:
Generated miniparadigms
Miniparadigms are generated in lexicon entries in order to help users. They vary from POS to POS and sometimes within POS.
Use/NGminip and Allegro in lexc
TODO: are these the tags we use now in sms?
+Use/NGminip removes inflectional forms that should not be presented in the miniparadigm. One example is North Saami adjectives.
NB: sme uses +Use/NGminip judiciously to reduce many possible forms to one.
Inflection | Without +Use/NGminip | With +Use/NGminip |
---|---|---|
A+Sg+Nom | heittot | heittot |
A+Attr | heittogis heittohis (bivttas) | heittogis (bivttas) |
A+Pl+Nom | heittogat heittohat | heittogat |
A+Comp+Attr | heittogit heittogut heittoget heittogat heittohit heittohut heittohet heittohat | heittoget heittogat |
A+Comp+Sg+Nom | heittogit heittogut heittoget heittogeabbo heittogat heittogabbo heittohit heittohut heittohet heittoheabbo heittohat heittohabbo | heittogeabbo heittogabbo |
A+Superl+Sg+Nom | heittogeamos heittogamos heittoheamos heittohamos | heittogeamos heittogamos |
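The filtering shown in the table can be sketched as follows. The input shape (pairs of word form and tag string) and the position of `+Use/NGminip` at the end of the tag string are assumptions for illustration; the actual generator output format is not specified here.

```python
NGMINIP = "+Use/NGminip"


def miniparadigm_forms(generated):
    """Keep only the forms whose tag string does not carry +Use/NGminip."""
    return [form for form, tags in generated if NGMINIP not in tags]


# Hypothetical generator output for A+Pl+Nom, based on the table row above.
generated_pl_nom = [
    ("heittogat", "A+Pl+Nom"),
    ("heittohat", "A+Pl+Nom+Use/NGminip"),
]

kept = miniparadigm_forms(generated_pl_nom)
```

Only heittogat remains in `kept`, matching the "With +Use/NGminip" column for A+Pl+Nom.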
Nouns
Display the whole paradigm in two columns, one for singular and one for plural. In NDS, because there are case inflections that do not have +Sg or +Pl, we use a special tagset to separate these cases out and display them across one column.
TODO: Noun attributes that affect miniparadigms ?
Inflection | Example |
---|---|
Sg+Nom | võrr |
Sg+Gen | võõr |
Sg+Acc | võõr |
Sg+Ill | võʹrre |
Sg+Loc | võõrâst |
Sg+Com | võõrin |
Sg+Abe | võõrtää |
Sg+Abe | võõrtaa TODO: does this need an attribute to control? |
Pl+Nom | võõr |
Pl+Gen | võõri |
Pl+Acc | võõrid |
Pl+Ill | võõrid |
Pl+Loc | võõrin |
Pl+Com | võõrivuiʹm |
Pl+Abe | võõritää |
Pl+Abe | võõritaa TODO: does this need an attribute to control? |
Ess | võrrân |
Par | võrrâd |
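A two-column layout of this kind can be sketched in Python. The sketch simply treats any tag that does not start with Sg or Pl as number-less; it does not reproduce the special tagset used in NDS, and the function name `layout` is hypothetical.

```python
def layout(forms):
    """Split (tag, form) pairs into Sg/Pl rows and number-less leftovers."""
    rows = {}        # case -> {"Sg": [...], "Pl": [...]}, in first-seen order
    unnumbered = []  # forms such as Ess and Par, shown across one column
    for tag, form in forms:
        number, _, case = tag.partition("+")
        if number in ("Sg", "Pl") and case:
            rows.setdefault(case, {"Sg": [], "Pl": []})[number].append(form)
        else:
            unnumbered.append((tag, form))
    table = [
        (case, ", ".join(cells["Sg"]), ", ".join(cells["Pl"]))
        for case, cells in rows.items()
    ]
    return table, unnumbered


# A subset of the forms from the table above.
forms = [
    ("Sg+Nom", "võrr"),
    ("Sg+Gen", "võõr"),
    ("Pl+Nom", "võõr"),
    ("Pl+Gen", "võõri"),
    ("Ess", "võrrân"),
    ("Par", "võrrâd"),
]

table, unnumbered = layout(forms)
```

Joining with a comma means that a case with two variants, such as the two Sg+Abe rows, would end up in the same cell.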
Proper nouns
For now, no proper nouns are generated in the plural.
Inflection | Example |
---|---|
Sg+Nom | Njuõttjokk |
Sg+Gen | Njuõttjook |
Sg+Acc | Njuõttjook |
Sg+Ill | Njuõttjoʹǩǩe |
Sg+Loc | Njuõttjookâst |
Sg+Abe | Njuõttjooktää |
Sg+Par | Njuõttjokkâd |
EX: Äʹnnjääuʹraž
TODO: determine how to display these in sms
Form | Context | Example | Translation |
---|---|---|---|
- | - | | |
Sg+Gen | X pääiʹǩ | | |
Sg+Ill | - | | |
Sg+Loc | - | | |
TODO: Any plural-only proper nouns?
Holidays?
Use räjja in context, e.g. eeʹjjpeeiʹv räjja.
Adjectives
For adjectives we use context as an attribute on the lemma node, in order to provide an attributive adjective example with a noun.
TODO: determine some good contexts for adjs
Inflection | Context | Example |
---|---|---|
A+Pred+Sg | | oođâs |
A+Attr | context="??" | ođđ (??) |
A+Comp | | ođđsab |
A+Superl | | ođđsumus |
TODO: +A+Pred+Pl ?
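How a context value could be combined with the attributive form for display can be sketched as below. The function name is hypothetical, and the parenthesised rendering is taken only from the A+Attr row of the table; the context value itself is still undetermined, so the placeholder ?? is used.

```python
def attributive_example(attr_form, context=None):
    """Render the attributive form, followed by its context in parentheses."""
    if not context:
        return attr_form  # no context available: show the bare form
    return f"{attr_form} ({context})"


with_context = attributive_example("ođđ", "??")
without_context = attributive_example("ođđsab")
```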
Numerals
TODO:
Pronouns
Personal pronouns
Most personal pronouns can be generated live from FSTs, depending on the analysis, but it may be easier to include the whole paradigm in a miniparadigm because of complexities in tags matching up with lemmas.
This also requires the type="Pers" attribute on the
Inflection | Example |
---|---|
Sg+Nom | mon |
Sg+Gen | muu |
Sg+Acc | muu |
Sg+Ill | muʹnne |
Sg+Loc | muʹst |
Sg+Com | muin |
Sg+Abe | muutää |
Ess | muuʹnen |
Par | muuʹđed |
TODO:
Indef pron
måtam Måtmin
TODO:
Pregenerated paradigms
Pronouns
Because the analyzer uses tags that make generation difficult, the plan is to include miniparadigms in the XML file that contain the whole displayed paradigm.
TODO:
Negative verb
TODO:
Sg1
Sg2
Sg3
Pl1
Pl2
Pl3
Homonymous entries
Homonymous entries (same lemma and POS) can be tricky to handle in the combination of lexicon and analyzer. One way to deal with them is to mark them with an additional attribute, a POS type, or something else. Homonymy is also problematic when generating the correct paradigm for a lexicon entry, or when lining up analyses with meanings.
TODO: jokk is homonymous in sms, find examples for documentation from there.
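As a starting point for finding such entries, a lexicon can be scanned for lemma and POS pairs that occur more than once. The sketch below assumes entries are available as (entry id, lemma, POS) triples; the ids, lemmas and POS values are placeholders, not real sms data.

```python
from collections import defaultdict


def find_homonyms(entries):
    """Group entry ids by (lemma, pos) and keep groups with several entries."""
    groups = defaultdict(list)
    for entry_id, lemma, pos in entries:
        groups[(lemma, pos)].append(entry_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Placeholder data: two entries share lemma and POS, one does not.
entries = [
    ("e1", "lemma_a", "N"),
    ("e2", "lemma_a", "N"),
    ("e3", "lemma_b", "V"),
]

homonyms = find_homonyms(entries)
```

Each reported group would then need an extra distinguishing mark, such as the additional attribute or POS type mentioned above.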
Non-systematic homonymy
TODO: