GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.
Procedure for bidix improvement:
To find the file, go to the language-pair directory:
cd apertium/nursery/apertium-sme-smn
and open apertium-sme-smn.sme-smn.dix.
After 75 initial lines of definitions, the bidix contains, in this order:
To do: Choose the right smn translation for each sme lemma in section C.
Use XML or XSL mode in SubEthaEdit.
Start at the top of section C.
Many sme lemmas will have more than one entry, one for each candidate smn translation, as follows:
<e><p><l>hiljážii<s n="adv"/></l><r>kuuloold<s n="adv"/></r></p></e>
<e><p><l>hiljážii<s n="adv"/></l><r>šiäđust<s n="adv"/></r></p></e>
<e><p><l>divttásmuvvat<s n="vblex"/><s n="iv"/></l><r>sovđâđ<s n="vblex"/></r></p></e>
<e><p><l>divttásmuvvat<s n="vblex"/><s n="iv"/></l><r>suáhuđ<s n="vblex"/></r></p></e>
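To get an overview of which sme lemmas still have several candidate translations, they can be listed mechanically. The following is a minimal Python sketch using only the standard library. It assumes the dix is well-formed XML and that entries look like the ones above (`<e><p><l>…</l><r>…</r></p></e>`). The names `lemma_of` and `ambiguous_entries` are hypothetical and not part of Apertium.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def lemma_of(side):
    """Surface lemma of an <l> or <r> element; a <b/> blank becomes a space."""
    text = side.text or ""
    for child in side:
        if child.tag == "b":
            text += " "
        text += child.tail or ""
    return text


def ambiguous_entries(dix_text):
    """Map each sme (lemma, tags) that has more than one entry to its smn lemmas."""
    root = ET.fromstring(dix_text)
    translations = defaultdict(list)
    for pair in root.iter("p"):
        left, right = pair.find("l"), pair.find("r")
        key = (lemma_of(left), tuple(s.get("n") for s in left.iter("s")))
        translations[key].append(lemma_of(right))
    return {key: smn for key, smn in translations.items() if len(smn) > 1}


# Hypothetical minimal input: three entries from this page inside a bare <section>.
sample = """<section>
<e><p><l>hiljážii<s n="adv"/></l><r>kuuloold<s n="adv"/></r></p></e>
<e><p><l>hiljážii<s n="adv"/></l><r>šiäđust<s n="adv"/></r></p></e>
<e><p><l>birget<s n="vblex"/><s n="iv"/></l><r>piergiđ<s n="vblex"/></r></p></e>
</section>"""

print(ambiguous_entries(sample))
# {('hiljážii', ('adv',)): ['kuuloold', 'šiäđust']}
```

On the real file, one would pass `open("apertium-sme-smn.sme-smn.dix", "rb").read()`. Entries written with `<i>` instead of `<p>` are not covered by this sketch.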
The procedure for editing is:
apertium-sme-smn.sme-smn.lrx, and write an explanation at the beginning of that file.

Correction of errors:
When a lemma consists of more than one word, the blank between the words is marked with <b/>, as in this entry, where the sme side is a multiword:
<e><p><l>ovddos<b/>guvlui<s n="adv"/></l><r>ovdâskuávlui<s n="adv"/></r></p></e>
In most cases, however, multiword translations should be handled in the transfer rules rather than in the bidix.
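When a multiword entry is nevertheless needed, it can be generated instead of typed by hand, which avoids unbalanced tags. A minimal Python sketch, assuming the words of a lemma are separated by single spaces. The helper names `side_xml` and `make_entry` are hypothetical, not part of Apertium.

```python
from xml.sax.saxutils import escape, quoteattr


def side_xml(name, lemma, tags):
    """Build an <l> or <r> element as text.

    Each space in `lemma` becomes <b/>, and each tag becomes <s n="..."/>.
    """
    body = "<b/>".join(escape(word) for word in lemma.split(" "))
    body += "".join(f"<s n={quoteattr(tag)}/>" for tag in tags)
    return f"<{name}>{body}</{name}>"


def make_entry(sme, sme_tags, smn, smn_tags):
    """Return one bidix entry on a single line, in the layout used in this file."""
    return f"<e><p>{side_xml('l', sme, sme_tags)}{side_xml('r', smn, smn_tags)}</p></e>"


print(make_entry("ovddos guvlui", ["adv"], "ovdâskuávlui", ["adv"]))
# <e><p><l>ovddos<b/>guvlui<s n="adv"/></l><r>ovdâskuávlui<s n="adv"/></r></p></e>
```

The same helper works for single-word lemmas with extra tags, e.g. `make_entry("ávvodoalut", ["n", "pl"], "juhlálâšvuotâ", ["n", "sg"])`.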
<!-- Checked until this line 1.11.15. TT -->
After editing, run
make
and look for error messages such as:
apertium-sme-smn.sme-smn.dix:10444: parser error : Opening and ending tag mismatch: section line 75 and e
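A well-formedness error like this can also be located without a full build. A minimal sketch using Python's standard ElementTree parser. `first_xml_error` is a hypothetical helper, and its messages are worded differently from the one above.

```python
import xml.etree.ElementTree as ET


def first_xml_error(dix_text):
    """Return (line, column, message) for the first well-formedness error, or None."""
    try:
        ET.fromstring(dix_text)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column 0-based
        return line, column, str(err)
    return None


# Hypothetical broken input: the entry on line 3 lacks its closing </e>.
broken = """<dictionary>
<section id="main" type="standard">
<e><p><l>birget<s n="vblex"/><s n="iv"/></l><r>piergiđ<s n="vblex"/></r></p>
</section>
</dictionary>"""

print(first_xml_error(broken))
```

The reported position is where the parser noticed the problem (here the `</section>` on line 4). The actual mistake, the missing `</e>`, is on the line before.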
Then look for the error at line 10444 (or the line before it).

Give the lemma for both sme and smn. To find it, check the analysis, e.g. of ávvudoalut:
ávvudoalut ávvodoalut+Err/Orth+N+Pl+Nom
Here the lemma is ávvodoalut; the form ávvudoalut is analysed as an orthographic error (+Err/Orth).
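When many words have to be checked, the lemma can be read off the analysis automatically. A minimal sketch, assuming each line holds the input form followed by its analysis, separated by whitespace, and that the lemma is everything before the first "+" (as in the line above). Compound analyses are not handled, and the function names are hypothetical.

```python
def split_analysis(analysis):
    """Split e.g. 'ávvodoalut+Err/Orth+N+Pl+Nom' into the lemma and a list of tags.

    Assumes the lemma is everything before the first '+'.
    """
    lemma, _, rest = analysis.partition("+")
    return lemma, (rest.split("+") if rest else [])


def lemma_from_lookup(line):
    """Return the lemma from a line of the form '<input form> <analysis>'."""
    fields = line.split()  # any further fields, e.g. a weight, are ignored
    return split_analysis(fields[1])[0]


print(lemma_from_lookup("ávvudoalut ávvodoalut+Err/Orth+N+Pl+Nom"))  # ávvodoalut
```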
Be aware that some verbs are intransitive (IV) and others transitive (TV). For the time being, we add this tag only to the sme lemma:
<e><p><l>birget<s n="vblex"/><s n="iv"/></l><r>piergiđ<s n="vblex"/></r></p></e>
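Since the iv/tv tag goes on the sme side only, sme verbs that lack it can be listed for review. A minimal sketch, assuming verbs are tagged `vblex` as in the entries above. `verbs_without_valency` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def verbs_without_valency(dix_text):
    """List sme verb lemmas whose <l> side has neither an iv nor a tv tag."""
    root = ET.fromstring(dix_text)
    missing = []
    for left in root.iter("l"):
        tags = {s.get("n") for s in left.iter("s")}
        if "vblex" in tags and not (tags & {"iv", "tv"}):
            missing.append(left.text or "")  # first word only for multiword lemmas
    return missing


# The second entry is a deliberately incomplete copy of an entry above (iv removed).
sample = """<section>
<e><p><l>birget<s n="vblex"/><s n="iv"/></l><r>piergiđ<s n="vblex"/></r></p></e>
<e><p><l>divttásmuvvat<s n="vblex"/></l><r>sovđâđ<s n="vblex"/></r></p></e>
</section>"""

print(verbs_without_valency(sample))  # ['divttásmuvvat']
```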
Some lemmas are lexicalised as plurals. As long as the number is the same in sme and smn, this is no problem. But if the number differs between the two languages, the number tags must be added to the bidix.
E.g. ávvodoalut+N+Pl vs. juhlálâšvuotâ+N+Sg. Add plural and singular tags to the bidix:
<e><p><l>ávvodoalut<s n="n"/><s n="pl"/></l><r>juhlálâšvuotâ<s n="n"/><s n="sg"/></r></p></e>
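Because the number tags are added to both sides when the number differs, a noun entry with a number tag on only one side is probably incomplete. A minimal sketch that lists such entries for review. It assumes the first tag of each side is the part of speech. The heuristic and the function names are assumptions, not an existing Apertium check.

```python
import xml.etree.ElementTree as ET

NUMBER_TAGS = {"sg", "pl"}


def tags_of(side):
    """Tag names of an <l> or <r> element, in order."""
    return [s.get("n") for s in side.iter("s")]


def one_sided_number(dix_text):
    """Return (sme, smn) lemmas of noun entries where only one side has sg/pl."""
    root = ET.fromstring(dix_text)
    suspicious = []
    for pair in root.iter("p"):
        left, right = pair.find("l"), pair.find("r")
        left_tags, right_tags = tags_of(left), tags_of(right)
        # Only entries that are nouns on both sides are checked.
        if left_tags[:1] != ["n"] or right_tags[:1] != ["n"]:
            continue
        if bool(NUMBER_TAGS & set(left_tags)) != bool(NUMBER_TAGS & set(right_tags)):
            suspicious.append((left.text or "", right.text or ""))
    return suspicious


# The second entry is a deliberately incomplete copy of the one above (sg removed).
sample = """<section>
<e><p><l>ávvodoalut<s n="n"/><s n="pl"/></l><r>juhlálâšvuotâ<s n="n"/><s n="sg"/></r></p></e>
<e><p><l>ávvodoalut<s n="n"/><s n="pl"/></l><r>juhlálâšvuotâ<s n="n"/></r></p></e>
</section>"""

print(one_sided_number(sample))  # [('ávvodoalut', 'juhlálâšvuotâ')]
```

Entries whose part of speech differs between the sides are skipped on purpose, since one side may legitimately be an inflected noun.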
Many adverbs are really inflected nouns, usually locatives, illatives or genitives. Sometimes a lemma is lexicalised as an adverb in one of the languages, but not in the other. One could consider whether the word should also be lexicalised in the other language. If the bidix worker is not responsible for the FST of the language in question, she should just leave a comment about it.
E.g. iđđes vs. iđedist. Give the correct tags and a comment:
<e><p><l>iđđes<s n="adv"/></l><r>iiđeed<s n="n"/><s n="sg"/><s n="loc"/></r></p></e> <!-- not same PoS -->
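Entries whose part of speech differs between the two sides are the ones that need such a comment, so they can be listed for review. A minimal sketch, assuming the first `<s>` tag on each side is the part of speech, as in all entries above. `pos_mismatches` is a hypothetical name. ElementTree drops XML comments by default, so the sketch lists every mismatch, whether or not it is already commented.

```python
import xml.etree.ElementTree as ET


def first_tag(side):
    """First <s> tag of an <l> or <r> element (taken to be the part of speech)."""
    first = side.find("s")
    return first.get("n") if first is not None else None


def pos_mismatches(dix_text):
    """Return (sme, sme_pos, smn, smn_pos) for entries whose parts of speech differ."""
    root = ET.fromstring(dix_text)
    result = []
    for pair in root.iter("p"):
        left, right = pair.find("l"), pair.find("r")
        if first_tag(left) != first_tag(right):
            result.append((left.text or "", first_tag(left), right.text or "", first_tag(right)))
    return result


sample = """<section>
<e><p><l>iđđes<s n="adv"/></l><r>iiđeed<s n="n"/><s n="sg"/><s n="loc"/></r></p></e>
<e><p><l>háldui<s n="po"/></l><r>haaldun<s n="po"/></r></p></e>
</section>"""

print(pos_mismatches(sample))  # [('iđđes', 'adv', 'iiđeed', 'n')]
```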
Sometimes a lemma is lexicalised as a postposition in one of the languages, but not in the other. Here too, one could consider whether the word should also be lexicalised in the other language. If the bidix worker is not responsible for the FST of the language in question, she should just leave a comment about it.
E.g. háldui+Po vs. haaldun+Po. Add a comment:
<e><p><l>háldui<s n="po"/></l><r>haaldun<s n="po"/></r></p></e> <!-- not in sme -->
Sometimes a postposition in one language corresponds to a case in the other, e.g. haga+Po vs. the abessive case in smn.
Give explanations and examples in the contrastive grammar (or another common file for such notes), and add a comment about it in the bidix:
<e><p><l>haga<s n="po"/></l><r><s n="po"/></r></p></e> <!-- abessive, explained in the contrastive grammar -->