GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started and our Privacy document.
There are several kinds of missing lists:
The missing list collects every word that gets an asterisk when we translate all the texts in the `texts` folder. An asterisk means that the word is not in the bidix file, or that its bidix entry has the wrong part of speech or (for verbs) the wrong IV/TV marking. The words are sorted by frequency and analysed, so that you can see the lemma form.
If you want to see how a word is used in the texts, run `cat texts/*sme.txt | less` and then search for the word.
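As a non-interactive alternative to paging through the texts, the lookup can be sketched with `grep`. This is only an illustration: the temporary directory, the sample sentences and the search word are made up, and in the real setup the glob would be `texts/*sme.txt`.

```shell
# Minimal sketch: list the lines in which a given word occurs.
# The directory, file content and search word are hypothetical.
tmpdir=$(mktemp -d)
printf '%s\n' 'Mun oidnen beatnaga.' 'Beana lea stuoris.' 'Mun oidnen bussá.' \
  > "$tmpdir/sample.sme.txt"

word='oidnen'
# -w matches whole words only, -n prefixes each hit with its line number.
hits=$(cat "$tmpdir"/*sme.txt | grep -n -w "$word")
printf '%s\n' "$hits"

rm -rf "$tmpdir"
```

Adding `-C 1` to the `grep` call would also print one line of context on each side of every hit.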
Example:
less dev/missinglist.txt
Ijahis idja+N+Der/heapmi+A+Attr
Ijahis ijaheapme+A+Attr
tel tel+N+ABBR+Nom
tel tel+N+ABBR+Gen
tel tel+N+ABBR+Attr
tel tel+N+ABBR+Acc
Akwé Akwé +?
ONid ON+N+ACR+Err/Orth+Pl+Gen
ONid ON+N+ACR+Err/Orth+Pl+Gen+Err/Orth
ONid ON+N+ACR+Err/Orth+Pl+Acc
ONid ON+N+ACR+Err/Orth+Pl+Acc+Err/Orth
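A listing like the one above can be summarised per surface form. The following is a minimal sketch, assuming each line holds the word form first and the analysis last, separated by whitespace, and that `+?` marks a word with no analysis. The variable names are hypothetical, and the sample copies a few lines from the listing.

```shell
# Minimal sketch: summarise a missing list by surface form.
# Sample lines are copied from the listing above; names are hypothetical.
sample='Ijahis idja+N+Der/heapmi+A+Attr
Ijahis ijaheapme+A+Attr
tel tel+N+ABBR+Nom
tel tel+N+ABBR+Gen
Akwé Akwé +?'

# Count how many analyses each surface form received, keeping input order.
summary=$(printf '%s\n' "$sample" | awk '
  !($1 in n) { order[++k] = $1 }
  { n[$1]++ }
  END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }')
printf '%s\n' "$summary"

# Lines whose last field is "+?" belong to words without any analysis.
unknown=$(printf '%s\n' "$sample" | awk '$NF == "+?" { print $1 }')
printf '%s\n' "$unknown"
```

Words listed in `unknown` are missing from the analyser as well as from the bidix, so they would need a lexicon entry before a bidix entry can be checked.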
If you have run the Python script, the translated texts are in the `otpt_dir/` folder, and you can then use the script:
sh star.sh
If the texts have not been translated:
```
cat texts/*sme.txt | apertium -d . sme-smn | tr '\t' ' ' | tr ' ' '\n' | \
tr -d '.,():;?!' | grep '*' | sort | uniq -c | sort -nr | tr -d '*' | usme > dev/missinglist.txt
```
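The counting stage of this pipeline can be tried without Apertium installed. The sketch below is an illustration under assumptions: the `translated` string is a made-up stand-in for Apertium output, in which unknown words are prefixed with `*`, and the final analysis step (`usme`) is left out. The `awk` call is an addition that only normalises the padding produced by `uniq -c`.

```shell
# Minimal sketch of the counting stage only.
# "translated" is a made-up stand-in for Apertium output.
translated='*Akwé lea *ONid miellahttu, ja *Akwé lea stuoris.'

# 1. tr: put each token on its own line and strip punctuation.
# 2. grep -F: keep only tokens carrying the literal asterisk.
# 3. sort | uniq -c | sort -nr: count and rank by frequency.
# 4. tr -d and awk: drop the asterisk and the padding added by uniq -c.
counts=$(printf '%s\n' "$translated" \
  | tr '\t' ' ' | tr ' ' '\n' \
  | tr -d '.,():;?!' \
  | grep -F '*' \
  | sort | uniq -c | sort -nr \
  | tr -d '*' \
  | awk '{ print $1, $2 }')
printf '%s\n' "$counts"
```

The most frequent unknown word comes first, which is the order in which the missing list is meant to be worked through.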
From relevant texts in the whole corpus:
cat dev/sikor.sme.V.freq.noder.missing | hfst-proc sme-smj.automorf.hfst | less
When working with missing lists, you can use our dictionaries (example from the sma directory):
```
cat dev/missing_v_noder | smenob | see
```

If this fails with a message like `Lexicon file '...bin/smenob-all.fst' could not be found or opened`, build the dictionary first:

```
cd $GTHOME/words/dicts
see make-bildict
make -f make-bildict
```