GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.
cat texts/TextB_smj_eval1.txt |preprocess > list/smj_eval1.list
cat texts/TextB_smj_mt.txt |preprocess > list/smj_mt.list
diff -y list/smj_mt.list list/smj_eval1.list | grep '[><|]' | tr -s '\t' | tr -s ' ' | sed 's/^/ /' >> wer_analysis.csv
Buore | Vuogas
ållo | enap
merrasáme <
> merrasámij
> dáfojn
ulmusjlåhko | lassánimev
lassán | ulmusjlågon
Dav | Boados
guoradallama | boahtá
vuosedi | åvddån
maj | guoradallamin
> majt
Explanation:
< = deleted word
> = added word
| = replaced word (lexical selection)
1 = lexical selection
2 = difference in generation (same wordform, but different shape)
3 = difference in choice of form (different wordform selected) (ax-ay)
4 = word order changed (ab-ba)
5 = punctuation
6 = word added (0-a)
7 = word deleted (a-0)
Mark each change with the correct number on the same line as the change. If a single word has several changes, all the numbers are added on that same line:
1 Buore | Vuogas
1 ållo | enap
4,3 merrasáme <
- > merrasámij
6 > dáfojn
The diff list does not always give a correct picture of the changes. In that case it is a good idea to correct the diff list so that it shows more clearly exactly what has changed and how:
1 Dav | Boados
4,3 guoradallama <
1 vuosedi | boahtá
6 > åvddån
- > guoradallamin
3 maj | majt
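Once the list is annotated as above, the category numbers in the first column can be tallied. The following is a minimal sketch, not part of the original workflow: the function name `tally_categories` is hypothetical, and it assumes that the first field of each annotated row is either a comma-separated list of the numbers 1 to 7 or a `-` for a row that carries no number of its own. Rows whose first field is anything else are skipped.

```shell
# Tally the category numbers found in the first column of an annotated list.
# tally_categories is a hypothetical helper name; it reads files or stdin.
tally_categories() {
  awk '$1 ~ /^[1-7](,[1-7])*$/ {
         # A row such as "4,3 guoradallama <" contributes one count to 4 and one to 3.
         n = split($1, codes, ",")
         for (i = 1; i <= n; i++) count[codes[i]]++
       }
       END {
         # Print "category count" for every category that occurred, in order 1..7.
         for (c = 1; c <= 7; c++) if (c in count) print c, count[c]
       }' "$@"
}

# Demo on the corrected example rows shown above.
printf '%s\n' \
  '1 Dav | Boados' \
  '4,3 guoradallama <' \
  '1 vuosedi | boahtá' \
  '6 > åvddån' \
  '- > guoradallamin' \
  '3 maj | majt' | tally_categories
```

On a real file the call would be `tally_categories wer_analysis.csv`. Note that an unannotated row whose first word happens to be a single digit from 1 to 7 would be counted by mistake, so the sketch is only reliable once every row has been marked.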
Diff-lista divna báhkopáraj (aj daj ma ælla rievdaduvvam):
diff -y list/smj_mt.list list/smj_eval1.list|tr -s '\t'|tr -s ' '|sed 's/^/ /' | see
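The full list of word pairs also makes it possible to see how large a share of the rows changed. The sketch below is an illustration under stated assumptions, not the documented procedure: the function name `changed_ratio` is hypothetical, a row is treated as changed when it contains one of the markers `|`, `<` or `>`, and the result is a share of diff rows rather than a word error rate in the strict sense, since it is not normalised by the length of a reference text.

```shell
# Report how many rows of side-by-side diff output carry a change marker.
# changed_ratio is a hypothetical helper name; it reads files or stdin.
changed_ratio() {
  awk '/[|<>]/ { changed++ }   # row contains |, < or >
       { total++ }             # every row, changed or not
       END {
         if (total > 0)
           printf "%d/%d rows changed (%.1f%%)\n", changed, total, 100 * changed / total
       }' "$@"
}

# Demo on four rows, two of which carry a marker.
printf '%s\n' 'Buore | Vuogas' 'ja ja' 'merrasáme <' 'li li' | changed_ratio
```

With the word lists from above in place, the same function could be fed directly from the diff: `diff -y list/smj_mt.list list/smj_eval1.list | changed_ratio`. Tokens in the text that themselves contain `|`, `<` or `>` would be miscounted as changes.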