GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.
This document presents the pipeline for adding an error-marked text to the corpus and running it through grammar checker testing for precision and recall.
Documents for testing should be representative of the grammar checker's target group and should be likely to contain errors. They should be stored in *corpus/orig/$LANG/catalogename/, where catalogename (and any subdirectories) is a directory reserved for annotated files for grammar checker testing.
1. Name the error-marked file filename.correct.txt (i.e. the filename must end in .correct.txt).
2. Run convert2xml filename.correct.txt and open the accompanying file filename.correct.txt.xsl. In this file, change conversion_status from standard to correct. Add other metadata. A reference to the original file may, for example, be given in the filename slot.
3. Run convert2xml --goldstandard filename.correct.txt. Given an original file orig/smn/testcorp/wiki/filename.correct.txt, this command stores the resulting file in goldstandard/converted/smn/testcorp/wiki/filename.correct.txt.xml.
4. Run the grammar checker test on the converted files (with smn as an example):
gtgramtool test -s $GTLANGS/lang-smn/tools/grammarcheckers/smn.zcheck xml goldstandard/converted/smn > <testfile-output>
gtgramtool test -c -s $GTLANGS/lang-smn/tools/grammarcheckers/smn.zcheck xml goldstandard/converted/smn
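Since the purpose of the test is to measure precision and recall, it may help to see how the two figures relate to the underlying counts. The following is a minimal sketch only: precision_recall is a hypothetical helper, not part of gtgramtool, and it makes no assumption about gtgramtool's output format. The counts are entered by hand.

```shell
# Hypothetical helper, not part of gtgramtool: compute precision and recall
# from hand-entered counts.
#   $1 = true positives  (marked errors that the checker flagged)
#   $2 = false positives (flagged text that was not marked as an error)
#   $3 = false negatives (marked errors that the checker missed)
precision_recall() {
  awk -v tp="$1" -v fpos="$2" -v fneg="$3" 'BEGIN {
    # Guard against division by zero when nothing was flagged or marked.
    precision = (tp + fpos > 0) ? tp / (tp + fpos) : 0
    recall    = (tp + fneg > 0) ? tp / (tp + fneg) : 0
    printf "precision=%.2f recall=%.2f\n", precision, recall
  }'
}

# 8 correct flags, 2 false alarms, 4 missed errors
precision_recall 8 2 4   # prints: precision=0.80 recall=0.67
```

Precision drops when the checker raises false alarms, while recall drops when it misses errors that were marked in the .correct.txt file.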
You may at any point reopen the file filename.correct.txt, add or revise the error marking, and run the procedure again.
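When rerunning the procedure, it can be useful to know which converted file a given original corresponds to. The sketch below mirrors the naming rule and the single path example given above (orig/... becomes goldstandard/converted/....xml). goldstandard_path is a hypothetical helper, not part of CorpusTools, and the actual output location is decided by convert2xml itself.

```shell
# Hypothetical helper, not part of CorpusTools: derive the expected
# goldstandard path from an original path, following the example above.
goldstandard_path() {
  case "$1" in
    orig/*.correct.txt)
      # Drop the leading "orig/" and append ".xml".
      printf 'goldstandard/converted/%s.xml\n' "${1#orig/}"
      ;;
    *)
      # Reject paths outside orig/ or without the required suffix.
      echo "expected a path of the form orig/.../name.correct.txt" >&2
      return 1
      ;;
  esac
}

goldstandard_path orig/smn/testcorp/wiki/filename.correct.txt
# prints: goldstandard/converted/smn/testcorp/wiki/filename.correct.txt.xml
```

The function returns a non-zero status for any path that does not end in .correct.txt, which doubles as a quick check of the naming requirement.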