Testing in GiellaLT infra
Most of the repositories in GiellaLT come with a lot of tests. Tests are used to
verify that the language models work correctly within the specified parameters:
every update to language data triggers an automatic build followed by automatic
checks. If the checks succeed, new versions of the built tools may be distributed
to users, e.g. via the nightly channel. The main command to run automated tests
is make check -j. This runs the tests (in parallel) until at least one fails. If
one or more tests fail, the language model is not good enough to be used in
tools or research. Similarly, when we update something for all language repos at
once, it is make check that shows whether or not the update was compatible
with all of the languages.
make check is good for automated testing: it runs fast, in parallel, and in the
background. It is not ideal for developers who are working on fixing things.
For development use, make devtest can be more useful: it runs all the tests one
by one, ignoring the results, and shows more output, in order. For some of the
tests, there are helpful scripts in the devtools/ folder. These can, for
example, provide more output and open relevant files in a graphical editor.
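The two ways of running the tests can be sketched as a small wrapper. This is only an illustration: the function name run_tests and the MAKE variable are hypothetical and not part of the GiellaLT infrastructure, and the sketch assumes it is run from the top of a language repository that has already been configured and built.

```shell
# Sketch only: run_tests and MAKE are hypothetical names.
# MAKE can be overridden, e.g. to point at gmake.
MAKE=${MAKE:-make}

run_tests() {
    case "$1" in
        ci)
            # Parallel run for automation; the exit status of make
            # tells whether any test failed.
            if "$MAKE" check -j; then
                echo "checks passed"
            else
                echo "checks failed"
                return 1
            fi
            ;;
        dev)
            # Sequential run with more output. Results are ignored,
            # so read the output rather than the exit status.
            "$MAKE" devtest
            ;;
        *)
            echo "usage: run_tests ci|dev" >&2
            return 2
            ;;
    esac
}
```

In automation one would call run_tests ci and act on its exit status. While fixing things, run_tests dev gives the fuller, ordered output.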
Langs tests
The following is a list of all tests enabled for all languages and what they do.
These tests are run by the make check command and by the automated testing in
the build infra.
src/cg3/test
No common tests yet.
src/fst/morphology/test
generate-{adjective,noun,propernoun,verb}-lemmas.sh
gets lemmas from
src/fst/morphology/stems/{adjectives,nouns,propernouns,verbs}.lexc files
respectively and ensures that all of them can be generated by the normative
generator using specific analysis tags. Use
devtools/generate-{adj,noun,prop,verb}-lemmas.sh to get more information about
lemmas that are not generated.
generate-{adjective,noun,verb}-paradigm.sh
gets lemmas from src/fst/morphology/stems/{adjectives,nouns,verbs}.lexc files
respectively and generates paradigms defined by
test{adj,noun,verb}paradigm.txt respectively, with the descriptive generator.
Use devtools/generate-{adj,noun,prop,verb}-wordforms.sh to get more
information about paradigms that are not generated.
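The brace notation in the lemma test names above is shorthand for one script per part of speech, each reading its own stem file. A minimal sketch of that pairing follows. The helper name list_lemma_tests is hypothetical, and the sketch assumes the scripts live directly in src/fst/morphology/test, as the heading suggests.

```shell
# Sketch only: list_lemma_tests is a hypothetical helper that spells out
# the brace shorthand as one line per test script and stem file.
list_lemma_tests() {
    for pos in adjective noun propernoun verb; do
        printf '%s\t%s\n' \
            "src/fst/morphology/test/generate-${pos}-lemmas.sh" \
            "src/fst/morphology/stems/${pos}s.lexc"
    done
}

list_lemma_tests
```

Each output line holds a test script and, after a tab, the lexc file it takes its lemmas from.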
src/fst/morphology/phonology/pair-test-hfst.sh
runs pair tests against twol rules.
src/fst/orthography/test/run-initcaps-genyaml-testcases.sh
runs YAML tests from the same directory for generating word-forms with initial caps or titlecasing if any.
src/fst/phonetics/tests/
no common tests.
src/fst/test/run-*-yaml-testscases.sh
runs YAML tests from subdirectories like: dict-gt-yamls, gt-desc-yamls,
gt-norm-yamls etc. if any are found.
src/fst/test/run-lexc-testcases.sh
runs YAML testcases from lexc doccomments if any are found.
tools/analysers/test/
no common tests
tools/grammarcheckers/tests/
runs the YAML checks in the directory, if any, against the grammar checker. Grammar checker testing is documented separately.
tools/hyphenators/test/
no common tests
tools/mt/apertium/test/run-mt-gt-desc-anayaml-tests.sh
runs the YAML checks in the same folder, if any, for the machine translation analyser.
tools/spellcheckers/test/accept-all-lemmas.sh
gets lemmas from src/fst/morphology/stems/*.lexc and ensures that they
are accepted by the spell-checker.
tools/spellcheckers/test/run-*-yaml-testcases.sh
runs YAML checks in the desktopspeller-gt-norm-yamls subdirectory if any.
tools/spellcheckers/test/suggestion-quality.sh
runs the spelling corrector against typos.tsv and ensures that the number of
correct suggestions is within the specified parameters.
tools/spellcheckers/test/test-zhfst-basic-sugg-speed.sh
verifies that the spell-checker produces suggestions within a reasonable time
tools/spellcheckers/test/test-zhfst-file.sh
checks that the spell-checker archive is usable at all
tools/tokenisers/tests/
no common tests
tools/tts/test
no common tests
devtools/
contains scripts for running tests manually, showing more results, and opening the result files in an editor.