GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.
sme: North Sámi, smn: Inari Sámi, smj: Lule Sámi, sma: South Sámi, sms: Skolt Sámi, nob: Norwegian Bokmål, nno: Norwegian Nynorsk, fin: Finnish, eng: English
cd main/langs/sme/
svn up
./configure --with-hfst --enable-apertium
make
cd main/langs/sma/
svn up
./configure --with-hfst --enable-apertium
make
This also gives you the usual norm and desc xfst transducers.
These compilations take a long time, especially for sme.
svn up
(always a good idea, even when it is not strictly necessary here)
make
alias apsmn="pushd ~/apertium/nursery/apertium-sme-smn"
alias apsma="pushd ~/apertium/nursery/apertium-sme-sma"
alias apsmj="pushd ~/apertium/nursery/apertium-sme-smj"
alias smn="pushd $GTHOME/langs/smn"
alias sma="pushd $GTHOME/langs/sma"
alias smj="pushd $GTHOME/langs/smj"
alias sme="pushd $GTHOME/langs/sme"
…
…
less gt2apertium.cg3r
On the Giellatekno side: usmedis
Here is the analyser output of the MT system:
echo "lohkan" | hfst-lookup sme-smn.automorf.hfst
echo "baakoem" | hfst-lookup sma-sme.automorf.hfst
Here is the analyser output before it has been restricted to the bidix words only:
echo "lohkan" | hfst-lookup .deps/sme.automorf.hfst
echo "baakoem" | hfst-lookup .deps/sma.automorf.hfst
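One way to use the two analysers together is to look for forms that the trimmed MT analyser rejects. The sketch below assumes the usual tab-separated `hfst-lookup` output, in which an unanalysed form comes back as the form followed by `+?`. The helper name `unknown_forms` and the sample lines are made up for illustration.

```shell
# Hypothetical helper: read hfst-lookup output on stdin
# (input form <TAB> analysis <TAB> weight) and print every input form
# that got no analysis, i.e. whose analysis ends in "+?".
unknown_forms() {
  awk -F '\t' 'NF >= 2 && $2 ~ /\+\?$/ { print $1 }' | sort -u
}

# Made-up sample in hfst-lookup's output shape; the analysis string is illustrative only.
printf 'lohkan\tlohkat<vblex><tv><pp>\t0.000000\n\nfoobar\tfoobar+?\tinf\n' | unknown_forms
```

With the real tools this would be `echo "lohkan" | hfst-lookup sme-smn.automorf.hfst | unknown_forms`. A form that is unknown there but analysed by `.deps/sme.automorf.hfst` is probably missing from the bidix.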
echo "sátni<n><sg><acc>" | hfst-lookup sma-sme.autogen.hfst
echo "baakoe<n><sg><acc>" | hfst-lookup sme-sma.autogen.hfst
echo "sáni" | apertium -d . sme-sma
echo "sánis" | apertium -d . sme-sma
echo "Don galggat boahtit skuvlii." | apertium -d . sme-sma-morph
echo "Don galggat boahtit skuvlii." | apertium -d . sme-sma-disam
echo "Don galggat boahtit skuvlii." | apertium -d . sme-sma-biltrans
echo "Don galggat boahtit skuvlii." | apertium -d . sme-sma-chunker
echo "Don galggat boahtit skuvlii." | apertium -d . sme-sma-interchunk3
echo "Don galggat boahtit skuvlii." | apertium -d . sme-sma-postchunk
echo "Don galggat boahtit skuvlii." | apertium -d . sme-sma
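The stage-by-stage commands above differ only in the mode suffix, so they can be driven from a loop. A minimal sketch, assuming the mode names listed above exist in the pair directory; `stage_modes` and `run_stages` are hypothetical helper names.

```shell
# Print the mode names for a language pair in pipeline order,
# ending with the full translation mode.
stage_modes() {
  pair=$1
  for stage in morph disam biltrans chunker interchunk3 postchunk; do
    printf '%s-%s\n' "$pair" "$stage"
  done
  printf '%s\n' "$pair"
}

# Run one sentence through every stage and label each output.
# Must be called from the pair directory, since it uses "apertium -d .".
run_stages() {
  sentence=$1
  stage_modes "$2" | while IFS= read -r mode; do
    printf '== %s ==\n' "$mode"
    printf '%s\n' "$sentence" | apertium -d . "$mode"
  done
}

# Example call (needs a compiled pair):
# run_stages "Don galggat boahtit skuvlii." sme-sma
```

Comparing the output of two neighbouring stages shows which module introduced a given error.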
cat texts/tarina.sme.txt | apertium -d . sme-sma | less
cat texts/tarina.sme.txt | apertium -d . sme-sma-dgen | less
The sme-sma-dgen mode is for debugging.
See apertium-sme-sma.sme-sma.dix:
<e><p><l>ruotabealde<s n="adv"/></l><r>Sveerjen<b/>raedtesne<s n="adv"/></r></p></e>
<e><p><l>davábealde<s n="adv"/></l><r>noerhtelen<s n="adv"/></r></p></e>
### `<s n="vblex"/>`: `<iv>` and `<tv>`
`iv` and `tv` are needed only on the sme side, as in this example: doallat+V+TV and toollâđ+V
`<e><p><l>doallat<s n="vblex"/><s n="tv"/></l><r>toollâđ<s n="vblex"/></r></p></e>`
### The tags are not the same, e.g. G3 - add them on the sme side
$ usme
ášši ášši+N+G3+Sg+Nom
<e><p><l>ášši<s n="n"/><s n="g3"/></l><r>ássje<s n="n"/></r></p></e>
$ usme
oahpaheaddji oahpaheaddji+N+NomAg+Sg+Nom
<e><p><l>oahpaheaddji<s n="n"/><s n="nomag"/></l><r>xxxxxxx<s n="n"/></r></p></e>
Some lemmas are lexicalised as plurals. As long as the number is the same in sme and smX, this is no problem. But if the number differs between the two languages, the number tags must be given in the bidix.
E.g. ávvodoalut+N+Pl vs. juhlálâšvuotâ+N+Sg. Add plural and singular tags to the bidix:
<e><p><l>ávvodoalut<s n="n"/><s n="pl"/></l><r>juhlálâšvuotâ<s n="n"/><s n="sg"/></r></p></e>
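The bidix entries in this section all share one shape: a left side (sme) and a right side (smX), each a lemma followed by `<s n="TAG"/>` elements. As a sketch, a small helper can print such an entry from dot-separated tag lists. `bidix_entry` and `tags_to_s` are hypothetical names, not part of the GiellaLT or Apertium tooling.

```shell
# Hypothetical helpers for illustration only.

# tags_to_s "n.pl"  prints  <s n="n"/><s n="pl"/>
tags_to_s() {
  printf '%s\n' "$1" | tr '.' '\n' | while IFS= read -r tag; do
    if [ -n "$tag" ]; then printf '<s n="%s"/>' "$tag"; fi
  done
}

# bidix_entry LEFT_LEMMA LEFT_TAGS RIGHT_LEMMA RIGHT_TAGS
# Spaces inside a lemma become <b/>, as in the multiword entries of the bidix.
bidix_entry() {
  left=$(printf '%s\n' "$1" | sed 's| |<b/>|g')
  right=$(printf '%s\n' "$3" | sed 's| |<b/>|g')
  printf '<e><p><l>%s%s</l><r>%s%s</r></p></e>\n' \
    "$left" "$(tags_to_s "$2")" "$right" "$(tags_to_s "$4")"
}

# Reproduces the ávvodoalut entry with explicit number tags on both sides.
bidix_entry 'ávvodoalut' 'n.pl' 'juhlálâšvuotâ' 'n.sg'
```

Writing the tags as `n.pl` and `n.sg` keeps the number mismatch visible at the call site, which is the point of tagging both sides explicitly.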
Many adverbs are really inflected nouns, usually locatives, illatives or genitives. Sometimes the lemma is lexicalised as an adverb in one of the languages but not in the other. One could consider whether the word should be lexicalised in the other language as well. If the bidix worker is not responsible for the FST of the language in question, she should just leave a comment about it.
E.g. iđđes vs. iđedist. Give the correct tags, and a comment:
<e><p><l>iđđes<s n="adv"/></l><r>iiđeed<s n="n"/><s n="sg"/><s n="loc"/></r></p></e> <!-- not same PoS -->
Sometimes the lemma can be lexicalised as a postposition in one of the languages, but not in the other language. One could consider if the word should be lexicalised also in the other language. If the bidix-worker is not responsible for the FST for the language in question, she should just leave a comment about it.
E.g. háldui+Po vs. haaldun+Po. Add a comment:
<e><p><l>háldui<s n="po"/></l><r>haaldun<s n="po"/></r></p></e> <!-- not in sme -->
E.g. haga+Po vs. the abessive case in smn.
Give explanations and examples on the wiki pages, pseudocode in the transfer file, and a comment about it in the bidix:
<e><p><l>haga<s n="po"/></l><r><s n="po"/></r></p></e> <!-- abessive -->
When a lemma has more than one translation, add all of them to the bidix, go to apertium-sme-smn.sme-smX.lrx, and make a rule. E.g.:
<e><p><l>láhčit<s n="vblex"/><s n="tv"/></l><r>orniđ<s n="vblex"/></r></p></e>
<e><p><l>láhčit<s n="vblex"/><s n="tv"/></l><r>lääččiđ<s n="vblex"/></r></p></e>
## Structure of lrx files
E.g. apertium-sme-smn.sme-smn.lrx
Here is an example of an lrx file. The default is láhčit = orniđ (1.0 > 0.5). If the verb láhčit
is followed by sem_furn, we get lääččiđ (0.5 + 0.6 = 1.1 > 1.0).
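A sketch of rules that would give the weights described above, written in the lrx format of apertium-lex-tools. The tag patterns (`vblex.*`, `n.sem_furn.*`) and the helper names `lrx_sketch` and `selected_weight` are assumptions for illustration, so check them against the actual apertium-sme-smn.sme-smn.lrx. The rules are wrapped in a shell function only so that the arithmetic can be checked.

```shell
# Print a hypothetical lrx fragment matching the weights described in the text.
# Tag patterns are assumptions, not copied from the real file.
lrx_sketch() {
cat <<'EOF'
<rules>
  <!-- default: láhčit = orniđ -->
  <rule weight="1.0">
    <match lemma="láhčit" tags="vblex.*"><select lemma="orniđ" tags="vblex.*"/></match>
  </rule>
  <!-- base weight for the alternative translation -->
  <rule weight="0.5">
    <match lemma="láhčit" tags="vblex.*"><select lemma="lääččiđ" tags="vblex.*"/></match>
  </rule>
  <!-- extra weight when a sem_furn noun follows the verb -->
  <rule weight="0.6">
    <match lemma="láhčit" tags="vblex.*"><select lemma="lääččiđ" tags="vblex.*"/></match>
    <match tags="n.sem_furn.*"/>
  </rule>
</rules>
EOF
}

# Add up the weights of all rules selecting one translation, i.e. the score it
# gets when every one of those rules matches (for lääččiđ: a sem_furn noun follows).
selected_weight() {
  awk -v want="$1" '
    /<rule /  { split($0, part, "\""); weight = part[2] }
    index($0, "<select lemma=\"" want "\"") { sum += weight }
    END { printf "%.1f\n", sum }'
}

lrx_sketch | selected_weight 'orniđ'     # prints 1.0
lrx_sketch | selected_weight 'lääččiđ'   # prints 1.1
```

Without a following sem_furn noun only the 0.5 rule applies to lääččiđ, so orniđ wins with 1.0.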
```
cat texts/*sme.txt | apertium -d . sme-smn | tr '\t' ' ' | tr ' ' '\n' |\
tr -d '.,():;?!' | grep '\*' | sort | uniq -c | sort -nr | tr -d '\*' > dev/missinglist.txt
```
How to know where to write a rule:
Another explanation:
```
if slword 1 = liikot (source language) and slword 2 = N+Ill (source language)
then tlword 1 = N+Acc (target language)
Example: liikon dutnje => datnem lyjhkem.
```
If the source language has the verb liikot followed by a noun in the illative, put the noun in the accusative in the target language.
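The missing-word pipeline in this section relies on apertium marking unknown words with an asterisk. To see what each step contributes, the counting part can be run on its own. The sketch below wraps it in a hypothetical function `missing_words` and feeds it made-up sample lines instead of real translator output.

```shell
# Hypothetical wrapper around the counting steps of the pipeline:
# one token per line, strip punctuation, keep tokens marked with "*",
# count them, and sort by frequency (most frequent first).
missing_words() {
  tr '\t' ' ' | tr ' ' '\n' | tr -d '.,():;?!' |
    grep '\*' | sort | uniq -c | sort -nr | tr -d '*'
}

# Made-up input; real input would come from: apertium -d . sme-smn
printf '%s\n' 'aa *alfa bb.' 'cc *alfa, *beta!' | missing_words
```

The most frequent unknown words come first, so the top of the list shows which bidix additions would improve coverage most.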
make
./t/regression-tests
./t/pending-tests