OmegaT developer info
Mac App Bundling
HfstTokenizer can be compiled together with OmegaT and bundled into the Mac app. Follow these instructions:
- Download the OmegaT 3.x source code (not 4.x) here
- Get the appbundler used by OmegaT from here. It needs Java 1.7
- Install it into ~/.ant/lib/
- This appbundler needs JavaAppLauncher and jre-mac-root to be defined in the OMEGAT_ASSETS_DIR folder, which is looked up from the environment variables. If they are not found in this folder, the build process looks one folder down from where you installed the OmegaT sources. jre-mac-root is a soft link to the folder where the Java Runtime libraries are found
- Download the thread-safe version of the hfst lookup library from here and put it in OMEGAT_SRC_FOLDER/lib, where OMEGAT_SRC_FOLDER is the folder where you just installed the OmegaT source files
- Copy HfstTokenizer.java and HfstStemFilter.java to OMEGAT_SRC_FOLDER/src/org/omegat/tokenizer
  - Modify the files' package name if needed
  - Remove throws IOException from the getTokenStream method and correct the StandardTokenizer constructor call
  - Diff HfstTokenizer.java against the 4.x HfstTokenizer.java (see diffs below)
- Add hfst-ol.jar to manifest-template.mf (details below)
- Add a lib/hfst-ol.jar entry to the Class-Path variable in manifest.mf
- Run ant mac in the OmegaT source folder, the one where you installed OmegaT
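The asset requirement in the list above can be checked before starting the build. The following is a minimal sketch, not part of OmegaT or appbundler: the class name AssetsCheck and the method missingAssets are made up for illustration. It only tests that JavaAppLauncher and jre-mac-root exist in a given folder; it reads OMEGAT_ASSETS_DIR from the environment and does not reproduce the build's own fallback location.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OmegaT: a pre-flight check for the assets folder.
class AssetsCheck {
    // The two names the appbundler step expects inside the assets folder.
    static final List<String> REQUIRED = Arrays.asList("JavaAppLauncher", "jre-mac-root");

    // Returns the required names that do not exist inside assetsDir.
    // File.exists() follows symbolic links, so a dangling jre-mac-root
    // link is reported as missing.
    static List<String> missingAssets(File assetsDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!new File(assetsDir, name).exists()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Only the environment variable is consulted here; the fallback
        // location used by the build is deliberately left out.
        String dir = System.getenv("OMEGAT_ASSETS_DIR");
        if (dir == null) {
            System.out.println("OMEGAT_ASSETS_DIR is not set");
            return;
        }
        List<String> missing = missingAssets(new File(dir));
        if (missing.isEmpty()) {
            System.out.println("All assets found in " + dir);
        } else {
            System.out.println("Missing in " + dir + ": " + missing);
        }
    }
}
```

If the check reports jre-mac-root as missing even though the link exists, the link target is probably wrong, since a soft link pointing nowhere counts as absent.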
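Diffs (lines marked < are the 3.x version to use here, lines marked > are the 4.x version):
diff HfstTokenizer.java against 4.x HfstTokenizer.java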
1c1
< package org.omegat.tokenizer;
---
> package no.divvun.tokenizer;
16a17
> import org.omegat.tokenizer.BaseTokenizer;
17a19
> import org.omegat.tokenizer.Tokenizer;
60,63c62,64
< final boolean stopWordsAllowed) {
< StandardTokenizer tokenizer = new StandardTokenizer(getBehavior(),
< new StringReader(strOrig));
< // tokenizer.setReader(new StringReader(strOrig));
---
> final boolean stopWordsAllowed) throws IOException {
> StandardTokenizer tokenizer = new StandardTokenizer();
> tokenizer.setReader(new StringReader(strOrig));
71,72c72
< return new HfstStemFilter(new StandardTokenizer(getBehavior(),
< new StringReader(strOrig)), transducer);
---
> return new HfstStemFilter(tokenizer, transducer);
diff HfstStemFilter.java against 4.x HfstStemFilter.java
1c1
< package org.omegat.tokenizer;
---
> package no.divvun.tokenizer;
11a12
> import org.apache.lucene.util.AttributeSource.State;
47,49c48,49
< for (String s : res) {
< // res.forEach(anal -> {
< String stem = s.substring(0, s.indexOf("+"));
---
> res.forEach(anal -> {
> String stem = anal.substring(0, anal.indexOf("+"));
53c53
< }
---
> });
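The loop in the HfstStemFilter.java diff takes everything before the first + of each analysis string as the stem. Below is a small standalone sketch of that step in the lambda-free form used on the 3.x side. The class name StemDemo and the sample analysis strings are made up for illustration, and the guard for strings without a + is an addition: the expression in the diff would throw a StringIndexOutOfBoundsException in that case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the tokenizer sources.
class StemDemo {
    // Takes everything before the first '+' as the stem.
    static String stemOf(String analysis) {
        int plus = analysis.indexOf("+");
        // Assumption: an analysis without '+' is returned unchanged.
        // The diff calls substring(0, indexOf("+")) directly, which
        // fails when indexOf returns -1.
        return plus < 0 ? analysis : analysis.substring(0, plus);
    }

    // Plain for-each loop, matching the 3.x side of the diff (no lambdas).
    static List<String> stems(List<String> analyses) {
        List<String> out = new ArrayList<>();
        for (String s : analyses) {
            out.add(stemOf(s));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up analysis strings in a stem+tag+tag layout.
        List<String> res = Arrays.asList("viessu+N+Sg+Nom", "boahtit+V+Inf");
        System.out.println(stems(res));
    }
}
```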
Add the following entry for hfst-ol.jar to the template (manifest-template.mf):
Name: org.omegat.tokenizer.HfstTokenizer
OmegaT-Plugin: tokenizer
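Both manifest edits can be verified with the standard java.util.jar.Manifest class. The sketch below is only an illustration: the class name ManifestCheck is made up, the sample text combines the Class-Path line and the plugin entry in one manifest for brevity, and lib/example.jar stands in for whatever entries the real Class-Path already contains. Note that a per-entry section such as the one above must be separated from the preceding section by an empty line.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper, not part of OmegaT.
class ManifestCheck {
    // Made-up sample manifest; lib/example.jar is a placeholder entry.
    static final String SAMPLE =
            "Manifest-Version: 1.0\n"
          + "Class-Path: lib/example.jar lib/hfst-ol.jar\n"
          + "\n"
          + "Name: org.omegat.tokenizer.HfstTokenizer\n"
          + "OmegaT-Plugin: tokenizer\n"
          + "\n";

    // Parses manifest text with the JDK's own manifest reader.
    static Manifest parse(String text) {
        try {
            return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the OmegaT-Plugin value of the entry named className, or null.
    static String pluginType(Manifest mf, String className) {
        Attributes attrs = mf.getAttributes(className);
        return attrs == null ? null : attrs.getValue("OmegaT-Plugin");
    }

    // True if the main Class-Path attribute lists the given jar path.
    static boolean classPathContains(Manifest mf, String jar) {
        String cp = mf.getMainAttributes().getValue("Class-Path");
        return cp != null && Arrays.asList(cp.trim().split("\\s+")).contains(jar);
    }

    public static void main(String[] args) {
        Manifest mf = parse(SAMPLE);
        System.out.println("plugin type: " + pluginType(mf, "org.omegat.tokenizer.HfstTokenizer"));
        System.out.println("hfst-ol.jar on Class-Path: " + classPathContains(mf, "lib/hfst-ol.jar"));
    }
}
```

If the package name of the copied files was changed, the Name value has to use that package instead of org.omegat.tokenizer.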