GiellaLT provides rule-based language technology aimed at minority and indigenous languages
cat texts/TextB_smj_eval1.txt |preprocess > list/smj_eval1.list
cat texts/TextB_smj_mt.txt |preprocess > list/smj_mt.list
diff -y list/smj_mt.list list/smj_eval1.list|grep '[><]()'|tr -s '\t'|tr -s ' '|sed 's/^/ /' >> wer_analysis.csv
Buore | Vuogas
ållo | enap
merrasáme <
> merrasámij
> dáfojn
ulmusjlåhko | lassánimev
lassán | ulmusjlågon
Dav | Boados
guoradallama | boahtá
vuosedi | åvddån
maj | guoradallamin
> majt
Čilgehus:
< = gádoduvvam báhko
> = laseduvvam báhko
| = målssum bágov (lexical selection)
1 = lexical selection
2 = difference in generation (same wordform, but different shape)
3 = difference in choice of form (different wordform selected) (ax-ay)
4 = word order changed (ab-ba)
5 = punctuation
6 = word added (0-a)
7 = word deleted (a-o)
Mierkki riekta lågujn sæmmi linjan gå rievddadus. Jus li avtan bágon moadda rievddadusá laseduvvi divna lågå dan sæmmi linnjaj:
1 Buore | Vuogas
1 ållo | enap
4,3 merrasáme <
- > merrasámij
6 > dáfojn
Diff lissta ij agev vatte riekta gåvåv rievddadusájs. Vuogas le de divodasstet diff listav vaj tjielggasap vuojnná jur mij la rievdaduvvam ja gåktu:
1 Dav | Boados
4,3 guoradallama <
1 vuosedi | boahtá
6 > åvddån
- > guoradallamin
3 maj | majt
Diff-lista divna báhkopáraj (aj daj ma ælla rievdaduvvam):
diff -y list/smj_mt.list list/smj_eval1.list|tr -s '\t'|tr -s ' '|sed 's/^/ /' | see