GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started and our Privacy document.

Page Content

Existing tests
Ideas for new tests
Elaborating the test ideas
- Test for Multichar Symbols on the lower side
- Test for phonotactically illegal strings

Ideas for testing

Goal: We want to test our transducers better

Existing tests

Paradigm testing against predefined answers: yaml tests
Tests written in the lexc and twolc code
Testing whether we generate the lemma or not
Tests using the lemma list as gold standard (do we generate the lemma)

Ideas for new tests

Test for Multichar Symbols on the lower side
Test for phonotactically illegal strings

Elaborating the test ideas

Test for Multichar Symbols on the lower side

Now and then Multichar Symbols slip through twolc and produce “words” like
Suome^Vn instead of the correct Suomeen.

How to test for this:

Read the set of multichar symbols from root.lexc
Make a transducer multichar.fst that accepts any string containing at least one of them
Compose LANG.fst .o. multichar.fst in xfst
List the lower-side words of the result (the list should be empty)
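
The idea behind these steps can also be tried out without compiling anything. The following is a minimal Python sketch, not the xfst test itself: it assumes the multichar symbols are declared in a `Multichar_Symbols` section as in ordinary lexc source, and that a list of generated lower-side words is already available. The function names `read_multichar_symbols` and `find_leaks` and the sample lexc text are hypothetical, made up for illustration.

```python
import re

def read_multichar_symbols(lexc_text):
    """Collect the symbols declared under Multichar_Symbols in lexc source."""
    symbols = []
    in_section = False
    for line in lexc_text.splitlines():
        # Drop lexc comments (an unescaped '!' starts a comment)
        line = re.sub(r"(?<!%)!.*", "", line).strip()
        if line.startswith("Multichar_Symbols"):
            in_section = True
            line = line[len("Multichar_Symbols"):]
        elif line.startswith(("LEXICON", "Definitions")):
            in_section = False
        if in_section:
            # '%' escapes the next character in lexc; remove the escape
            symbols += [re.sub(r"%(.)", r"\1", s) for s in line.split()]
    return symbols

def find_leaks(words, symbols):
    """Return (word, symbol) pairs where a multichar symbol survives."""
    return [(w, s) for w in words for s in symbols if s in w]

# Invented sample input, standing in for root.lexc and generated output
SAMPLE_LEXC = """\
Multichar_Symbols
+N +Sg +Ill   ! tags (upper side)
%^V           ! vowel-lengthening trigger
LEXICON Root
Suomi N ;
"""

symbols = read_multichar_symbols(SAMPLE_LEXC)
leaks = find_leaks(["Suomeen", "Suome^Vn"], symbols)
print(leaks)  # [('Suome^Vn', '^V')]
```

A plain substring scan like this is cruder than composing transducers: it only sees the words it is given, and it would give false alarms if a multichar symbol happened to consist of ordinary letters.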

It should be possible to set up this test language-independently. In case we get

Test for phonotactically illegal strings

Example, from fkv (this must be adjusted to a script):

Make a regex accepting strings ending in Vowel + Vowel + e and save it as VVe.fst: `regex [ ?* [a|e|i|o|u|ä|ö] [a|e|i|o|u|ä|ö] e ] ;` followed by `save stack VVe.fst`
Compose it with the main fst: `xfst -e 'regex @"src/analyser-gt-desc.xfst" .o. @"VVe.fst" ;'`
Print the result with xfst: `xfst[1]: print lower-words > lower-words.txt`
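
As a step towards a script, the same check can be run on a word list such as the printed lower-words.txt. This is a minimal Python sketch and not part of the GiellaLT infrastructure; it assumes one word per line, and the name `illegal_words` and the sample words are invented for illustration.

```python
import re

VOWELS = "aeiouäö"
# Two vowels followed by a word-final e, mirroring the xfst regex in the text
VVE_FINAL = re.compile(rf"[{VOWELS}]{{2}}e$", re.IGNORECASE)

def illegal_words(words):
    """Return the words that end in Vowel + Vowel + e."""
    return [w.strip() for w in words if VVE_FINAL.search(w.strip())]

# Invented test strings; in practice read the lines of lower-words.txt
sample = ["talo", "maae", "tie", "kuue"]
flagged = illegal_words(sample)
print(flagged)  # ['maae', 'kuue']
```

Other phonotactic constraints could be tested the same way by swapping in a different pattern.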