GiellaLT provides rule-based language technology aimed at minority and indigenous languages
HfstTokenizer can be compiled together with OmegaT and bundled into the Mac app. Follow these instructions:
The Mac build requires JavaAppLauncher and jre-mac-root to be defined in the OMEGAT_ASSETS_DIR folder, which is looked up from the environment variables. If they are not found in this folder, the build process looks one folder down from where you installed the OmegaT sources. jre-mac-root is a soft link to the folder where the Java Runtime libraries are found.
hfst-ol.jar goes in OMEGAT_SRC_FOLDER/lib, where OMEGAT_SRC_FOLDER is the folder where you just installed the OmegaT source files.
here
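The lookup described above (environment variable first, then a fallback folder) can be sketched as a small pre-build check. This is only an illustration, not part of the OmegaT build: the class name MacAssetsLocator and its methods are hypothetical, and the fallback folder is passed in as a parameter because the text does not name it exactly.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical helper: finds the folder that holds the Mac build assets. */
final class MacAssetsLocator {

    /** True if dir contains both JavaAppLauncher and jre-mac-root. */
    static boolean hasMacAssets(Path dir) {
        // Files.exists follows soft links, so a jre-mac-root link
        // pointing at an existing JRE folder counts as present.
        return dir != null
                && Files.exists(dir.resolve("JavaAppLauncher"))
                && Files.exists(dir.resolve("jre-mac-root"));
    }

    /**
     * Tries the folder named by the environment value first (may be null),
     * then the fallback folder. The fallback is an assumption of this sketch.
     */
    static Optional<Path> locate(String envValue, Path fallback) {
        if (envValue != null && !envValue.isEmpty()) {
            Path fromEnv = Paths.get(envValue);
            if (hasMacAssets(fromEnv)) {
                return Optional.of(fromEnv);
            }
        }
        return hasMacAssets(fallback) ? Optional.of(fallback) : Optional.empty();
    }

    public static void main(String[] args) {
        // The fallback folder is taken from the first argument, defaulting to "..".
        Path fallback = Paths.get(args.length > 0 ? args[0] : "..");
        Optional<Path> found = locate(System.getenv("OMEGAT_ASSETS_DIR"), fallback);
        System.out.println(found
                .map(p -> "Assets found in " + p.toAbsolutePath())
                .orElse("JavaAppLauncher / jre-mac-root not found"));
    }
}
```

Running it before ant mac gives a quick hint whether the assets folder is set up the way the build expects.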
1- Copy HfstTokenizer.java and HfstStemFilter.java to OMEGAT_SRC_FOLDER/src/org/omegat/tokenizer, where OMEGAT_SRC_FOLDER is the folder where you just installed the OmegaT source files.
2- Remove throws IOException from the getTokenStream method and correct the StandardTokenizer constructor call.
3- Add hfst-ol.jar to manifest-template.mf (details below).
4- Add a lib/hfst-ol.jar entry to manifest.mf's Class-Path variable.
5- Run ant mac in the OmegaT source folder, the one where you installed OmegaT.

Diffs:
1c1
< package org.omegat.tokenizer;
---
> package no.divvun.tokenizer;
16a17
> import org.omegat.tokenizer.BaseTokenizer;
17a19
> import org.omegat.tokenizer.Tokenizer;
60,63c62,64
< final boolean stopWordsAllowed) {
< StandardTokenizer tokenizer = new StandardTokenizer(getBehavior(),
< new StringReader(strOrig));
< // tokenizer.setReader(new StringReader(strOrig));
---
> final boolean stopWordsAllowed) throws IOException {
> StandardTokenizer tokenizer = new StandardTokenizer();
> tokenizer.setReader(new StringReader(strOrig));
71,72c72
< return new HfstStemFilter(new StandardTokenizer(getBehavior(),
< new StringReader(strOrig)), transducer);
---
> return new HfstStemFilter(tokenizer, transducer);
1c1
< package org.omegat.tokenizer;
---
> package no.divvun.tokenizer;
11a12
> import org.apache.lucene.util.AttributeSource.State;
47,49c48,49
< for (String s : res) {
< // res.forEach(anal -> {
< String stem = s.substring(0, s.indexOf("+"));
---
> res.forEach(anal -> {
> String stem = anal.substring(0, anal.indexOf("+"));
53c53
< }
---
> });
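The second diff only swaps a for loop for a forEach lambda; the stem extraction itself is the same in both versions. A minimal, self-contained sketch of that extraction follows. The class StemExtraction and the sample analysis strings are made up for illustration, and the guard for strings without a "+" is an addition of this sketch (the substring call in the diff would throw StringIndexOutOfBoundsException in that case).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical demo of the stem extraction used in HfstStemFilter. */
final class StemExtraction {

    /** Returns the text before the first '+', or the whole string if there is none. */
    static String stemOf(String analysis) {
        int plus = analysis.indexOf('+');
        return plus < 0 ? analysis : analysis.substring(0, plus);
    }

    /** Loop form, as in the lines marked "<". */
    static List<String> stemsWithLoop(List<String> analyses) {
        List<String> stems = new ArrayList<>();
        for (String s : analyses) {
            stems.add(stemOf(s));
        }
        return stems;
    }

    /** Lambda form, as in the lines marked ">". */
    static List<String> stemsWithForEach(List<String> analyses) {
        List<String> stems = new ArrayList<>();
        analyses.forEach(anal -> stems.add(stemOf(anal)));
        return stems;
    }

    public static void main(String[] args) {
        // Illustrative analysis strings in "lemma+tag+tag" form.
        List<String> analyses = Arrays.asList("guolli+N+Sg+Nom", "guolli+N+Sg+Acc");
        System.out.println(stemsWithLoop(analyses));
        System.out.println(stemsWithForEach(analyses));
    }
}
```

Both forms produce the same list, so the choice between them is a matter of style rather than behaviour.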
Add the following entry for hfst-ol.jar to manifest-template.mf:
Name: org.omegat.tokenizer.HfstTokenizer
OmegaT-Plugin: tokenizer
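After editing the manifests, the two additions can be checked with the standard java.util.jar.Manifest class. The sketch below is an assumption-laden illustration: the class ManifestCheck is hypothetical, and the sample text combines the Class-Path entry and the plugin section into one manifest purely for demonstration (lib/other.jar is a made-up neighbour entry).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical checker for the manifest entries described above. */
final class ManifestCheck {

    /** Illustrative manifest text; a real manifest holds more entries. */
    static final String SAMPLE =
            "Manifest-Version: 1.0\n"
            + "Class-Path: lib/hfst-ol.jar lib/other.jar\n"
            + "\n"
            + "Name: org.omegat.tokenizer.HfstTokenizer\n"
            + "OmegaT-Plugin: tokenizer\n"
            + "\n";

    /** Parses manifest text, wrapping I/O problems in an unchecked exception. */
    static Manifest parse(String text) {
        try {
            return new Manifest(
                    new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new IllegalArgumentException("Not a valid manifest", e);
        }
    }

    /** True if the main Class-Path attribute lists the given jar. */
    static boolean classPathContains(Manifest mf, String jar) {
        String cp = mf.getMainAttributes().getValue("Class-Path");
        if (cp == null) {
            return false;
        }
        for (String entry : cp.trim().split("\\s+")) {
            if (entry.equals(jar)) {
                return true;
            }
        }
        return false;
    }

    /** True if the named section declares "OmegaT-Plugin: tokenizer". */
    static boolean declaresTokenizerPlugin(Manifest mf, String className) {
        Attributes section = mf.getAttributes(className);
        return section != null && "tokenizer".equals(section.getValue("OmegaT-Plugin"));
    }

    public static void main(String[] args) {
        Manifest mf = parse(SAMPLE);
        System.out.println(classPathContains(mf, "lib/hfst-ol.jar"));
        System.out.println(
                declaresTokenizerPlugin(mf, "org.omegat.tokenizer.HfstTokenizer"));
    }
}
```

To check a real file, the same methods can be fed the contents of manifest.mf or manifest-template.mf instead of the sample string.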