GiellaLT Documentation

GiellaLT provides rule-based language technology aimed at minority and indigenous languages

The <Correct!> corpora

The directory gt/sme/corp/ now contains a separate subdirectory, correct/.

This directory is meant to contain files marked with <Correct!> tags. Corpus files included here should preferably be stable, with formatting and typographical errors already corrected, before we start marking them with <Correct!> tags.

The files are taken from the corpus files. Thus, for any file named filename.txt in the corp/ directory, we may make a file corr-filename.txt in the correct/ directory. If the original filename.txt is corrected after corr-filename.txt has been made, the same correction should probably be made in corr-filename.txt as well.
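
The naming convention above can be sketched as a small path mapping. This is only an illustration: the helper name corr_path is hypothetical and not part of any GiellaLT tool, and it assumes that correct/ sits directly under the directory holding the original file.

```python
from pathlib import PurePosixPath


def corr_path(original: str) -> str:
    """Return the assumed location of the corr-* copy of a corpus file.

    Hypothetical helper: prefixes the file name with "corr-" and places
    it in a correct/ subdirectory next to the original.
    """
    src = PurePosixPath(original)
    return str(src.parent / "correct" / f"corr-{src.name}")


print(corr_path("gt/sme/corp/filename.txt"))
# prints gt/sme/corp/correct/corr-filename.txt
```

Keeping the original file name after the corr- prefix makes it easy to find which corpus file a tagged copy came from when a later correction has to be made in both.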

Documentation for using the corr-* files can be found on the documentation pages for vislcg.