GiellaLT provides rule-based language technology aimed at minority and indigenous languages
There are several kinds of missing lists:
The missing list collects every word that gets an asterisk (*) when we translate all the texts in the texts folder. An asterisk means that the word is not in the bidix file, or that the bidix lists it with the wrong part of speech or, for verbs, the wrong IV/TV (intransitive/transitive) value. The words are sorted by frequency and analysed so that you can see the lemma form.
If you want to see how a word is used in the texts:
cat texts/*sme.txt | less
and then search for the word (in less, type / followed by the word).
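As an alternative to paging through everything, the lookup can be sketched with grep. This is a minimal sketch, not part of the GiellaLT tooling: the function name show_usage is hypothetical, and it assumes a grep that supports -w and -H (GNU and BSD grep both do).

```shell
# show_usage WORD FILE...
# Print every line that contains WORD as a whole word,
# prefixed with the file name and the line number.
show_usage() {
    word=$1
    shift
    grep -n -w -H -- "$word" "$@"
}
```

For example, show_usage tel texts/*sme.txt | less lists only the lines where tel occurs as a separate word, so longer words that merely contain tel are skipped.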
Example:
less dev/missinglist.txt
Ijahis idja+N+Der/heapmi+A+Attr
Ijahis ijaheapme+A+Attr
tel tel+N+ABBR+Nom
tel tel+N+ABBR+Gen
tel tel+N+ABBR+Attr
tel tel+N+ABBR+Acc
Akwé Akwé +?
ONid ON+N+ACR+Err/Orth+Pl+Gen
ONid ON+N+ACR+Err/Orth+Pl+Gen+Err/Orth
ONid ON+N+ACR+Err/Orth+Pl+Acc
ONid ON+N+ACR+Err/Orth+Pl+Acc+Err/Orth
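Entries ending in +? (like Akwé above) are words the analyser could not analyse at all, so no lemma is shown for them. A small sketch for pulling just those words out of the list follows; the function name unanalysed is hypothetical, and it assumes the word form is the first whitespace-separated field on each line, as in the example output.

```shell
# unanalysed FILE...
# Print each distinct word form whose analysis is "+?",
# i.e. a word that received no analysis.
unanalysed() {
    grep -F '+?' "$@" | awk '{print $1}' | sort -u
}
```

For example, unanalysed dev/missinglist.txt gives one line per unanalysed word form.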
If you have run the Python script, the translated texts are in the otpt_dir/ folder, and you can then use the script:
sh star.sh
If the texts have not been translated:
cat texts/*sme.txt | apertium -d . sme-smn | tr '\t' ' ' | tr ' ' '\n' | \
tr -d '.,():;?!' | grep '\*' | sort | uniq -c | sort -nr | tr -d '\*' | usme > dev/missinglist.txt
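To see what the middle of this pipeline does without running apertium or the analyser, the counting stages can be tried on a made-up line. This is only a sketch under stated assumptions: the function name count_starred and the sample input are hypothetical, the asterisk is removed before counting rather than after, and the final awk step is added here only to normalise the padding that uniq -c prints.

```shell
# count_starred: read translated text on stdin and print "COUNT WORD"
# for every asterisk-marked word, most frequent first.
count_starred() {
    tr '\t' ' ' | tr ' ' '\n' |   # one token per line
    tr -d '.,():;?!' |            # strip punctuation
    grep '\*' |                   # keep only asterisk-marked tokens
    tr -d '*' |                   # drop the asterisk itself
    sort | uniq -c | sort -nr |   # frequency list, highest count first
    awk '{print $1, $2}'          # normalise the padding from uniq -c
}

# Made-up sample standing in for apertium output
printf 'x *tel y *ONid, *tel.\n' | count_starred
# prints:
# 2 tel
# 1 ONid
```

In the real command the resulting word list is then sent through the analyser, which is what adds the lemma and tags seen in dev/missinglist.txt.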
From relevant texts in the whole corpus:
cat dev/sikor.sme.V.freq.noder.missing | hfst-proc sme-smj.automorf.hfst | less
When working on the missing list, you can also use our dictionaries (this example is from the sma directory):
cat dev/missing_v_noder | smenob | see
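If this fails with an error like the following, the dictionary file has probably not been built yet:
Lexicon file '...bin/smenob-all.fst' could not be found or opened
In that case, build it from the dictionary directory: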
cd $GTHOME/words/dicts
see make-bildict
make -f make-bildict