OmegaT is one of many computer-assisted translation (CAT) programs, i.e. programs that help you translate documents from one language to another. OmegaT is open source, so it can be adapted for translation to and from Saami languages. Most of [our resources for computer-assisted translation](../../tm/TranslationMemory.html) can be used by all CAT programs, though; the exception is machine translation, which works only with OmegaT.
The user documentation page for OmegaT refers to installation and user documentation, and can be found here:
What follows are our thoughts for developing CAT for Saami.
The idea is to offer a set of ready-made folders, perhaps in two different formats:
For the time being, the folders are at [https://gtsvn.uit.no/biggies/trunk/mt/omegat/].
The idea is to put the following resources into the following subdirectories:
dictionary
: our StarDict dictionary smenob (OmegaT documentation) (todo)

glossary
: term lists, partly fad-marked pairs, partly from satni.org, cf. documentation (done)

tm
: our parallel texts, all files fused into one .tmx file (or one per theme), cf. documentation (done)

omegat
: a file segmentation.conf, for doing sentence level segmentation, cf. documentation (done for sme)

The source and target folders are given svn ignore status. As we develop the folders, we should determine what other files to ignore and what to share.
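The folder layout and the "all files fused into one .tmx file" step can be sketched roughly as follows. This is a minimal illustration, not the script actually used for the repository: the function names `make_project_skeleton` and `merge_tmx` are hypothetical, and the merge assumes that all input files share the source language declared in the header of the first file.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from pathlib import Path

# Subdirectories named in the list above, plus the source and target folders.
SUBDIRS = ["dictionary", "glossary", "tm", "omegat", "source", "target"]


def make_project_skeleton(root: Path) -> list[Path]:
    """Create the project subdirectories under root and return their paths."""
    created = []
    for name in SUBDIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def merge_tmx(inputs: list[Path], output: Path) -> int:
    """Copy every <tu> of the input TMX files into one file; return the count.

    Assumption: the header of the first input file is valid for all inputs
    (same source language), so it is kept unchanged.
    """
    if not inputs:
        raise ValueError("need at least one TMX file")
    merged = ET.parse(str(inputs[0]))
    body = merged.getroot().find("body")
    if body is None:
        raise ValueError(f"{inputs[0]} has no <body> element")
    for path in inputs[1:]:
        other_body = ET.parse(str(path)).getroot().find("body")
        if other_body is not None:
            # Append the translation units of this file to the merged body.
            body.extend(other_body.findall("tu"))
    merged.write(str(output), encoding="utf-8", xml_declaration=True)
    return len(body.findall("tu"))
```

A call such as `merge_tmx(sorted(Path("parallel").glob("*.tmx")), Path("tm/all.tmx"))` would then write one fused file into the tm folder; the directory name `parallel` and the file name `all.tmx` are placeholders only.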
The language pairs are of three types:
Adding more resources:
You can get the HFST tokenizer precompiled. You need to download:
Put them into ~/Library/Preferences/OmegaT/plugins (create the directory if it does not exist). Tested and works with OmegaT 4.x.
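The installation step can also be scripted. The sketch below only copies files you have already downloaded into the plugin directory given above, creating it if needed. The function name `install_plugins` is hypothetical, and the names of the downloaded files are not assumed here: pass in whatever files you actually downloaded.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Plugin directory given in the text above (macOS location).
DEFAULT_PLUGIN_DIR = Path.home() / "Library" / "Preferences" / "OmegaT" / "plugins"


def install_plugins(files: list[Path], plugin_dir: Path = DEFAULT_PLUGIN_DIR) -> list[Path]:
    """Copy the downloaded plugin files into plugin_dir, creating it if needed."""
    plugin_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for src in files:
        if not src.is_file():
            raise FileNotFoundError(src)
        # copy2 returns the path of the copied file inside plugin_dir.
        installed.append(Path(shutil.copy2(src, plugin_dir)))
    return installed
```

For example, `install_plugins(list(Path.home().joinpath("Downloads").glob("*.jar")))` would copy every .jar file from the Downloads folder; restrict the list to the tokenizer files in practice.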
The HFST tokenizer source is on GitHub.