GiellaLT provides rule-based language technology aimed at minority and indigenous languages
Goal: We want to test our transducers better
Now and then multichar symbols slip through twolc and give “words” like Suome^Vn instead of the correct Suomeen.
How to test for this:
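Make multichar.fst, a filter that accepts any string containing one of the multichar symbols.
Compose the analyser with it, so that only strings containing one of them are left: LANG.fst .o. multichar.fst
Print the resulting words in xfst.
It should be possible to set up this test language-independently. In case we get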
Example, from fkv (this must be adjusted to a script):
regex [ ?* [a|e|i|o|u|ä|ö] [a|e|i|o|u|ä|ö] e ] ;
xfst -e "regex @\"src/analyser-gt-desc.xfst\" .o. @\"VVe.fst\" ; "
xfst[1]: print lower-words > lower-words.txt
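
Since the example above still has to be turned into a script, here is a minimal Python sketch of two helper steps. It is an illustration under stated assumptions, not the project's own tooling. The symbol list is hypothetical: only ^V comes from the text above and ^C is invented. The function names are also invented. The generated regex assumes xfst's double-quote notation for a single multichar symbol, and that the symbols contain no double quotes or backslashes. The second function is a plain substring scan over a word list such as lower-words.txt. It complements the transducer filter but does not replace it, because it cannot tell a leaked symbol from the same characters spelled out as ordinary letters.

```python
"""Sketch: build a multichar filter regex and scan a word list for leaks."""

# Hypothetical inventory: "^V" is the symbol from the example above, "^C" is
# invented for illustration. A real list would come from the language's sources.
MULTICHAR_SYMBOLS = ["^V", "^C"]


def filter_regex(symbols):
    """Return an xfst regex line accepting any string that contains a symbol.

    Assumption: each symbol can be written in double quotes as one xfst symbol
    and contains no double quote or backslash itself.
    """
    quoted = " | ".join(f'"{s}"' for s in symbols)
    return f"regex [ ?* [ {quoted} ] ?* ] ;"


def leaked_words(lines, symbols):
    """Return the non-empty lines that contain any symbol as a substring."""
    words = (line.strip() for line in lines)
    return [w for w in words if w and any(s in w for s in symbols)]


if __name__ == "__main__":
    # Regex that could be compiled and saved as multichar.fst.
    print(filter_regex(MULTICHAR_SYMBOLS))
    # Scan a small in-memory sample instead of a real lower-words.txt.
    sample = ["Suome^Vn\n", "Suomeen\n"]
    for word in leaked_words(sample, MULTICHAR_SYMBOLS):
        print("leaked:", word)
```

To check real output, the sample list can be replaced by the lines of lower-words.txt, for example `open("lower-words.txt", encoding="utf-8")`. An empty result from `leaked_words` is what a passing test would look like.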