Apertium needs three components:
We assume you have already installed the giellalt infrastructure. The languages are found in their respective folders in $GTHOME/langs/.
Fetching the Apertium files is done in the same way as for the giellalt files. In the download path, replace giellalt with apertium, and change the language (or language pair) string accordingly. You probably want to keep the giellalt and apertium repositories in separate directories, one for giellalt and one for apertium.
Here we show how to fetch the files. First we fetch lang-sme from giellalt, and then the sme-sma pair and the nob language model from apertium. The following commands are for those who access the git repositories with svn-style commands:
svn co https://github.com/giellalt/lang-sme.git/trunk lang-sme
svn co https://github.com/apertium/apertium-sme-sma.git/trunk apertium-sme-sma
svn co https://github.com/apertium/apertium-nob.git/trunk apertium-nob
Here are the same examples, fetched with git commands:
git clone git@github.com:giellalt/lang-sme.git
git clone git@github.com:apertium/apertium-sme-sma.git
git clone git@github.com:apertium/apertium-nob.git
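The only things that change between the clone commands above are the organisation (giellalt or apertium) and the repository name. As a minimal sketch of that pattern, the following prints one clone command per repository instead of running it; the helper name clone_url is hypothetical and not part of the giellalt or Apertium tooling.

```shell
# Hypothetical helper: print the SSH clone URL for a repository on GitHub.
# $1 is the organisation (giellalt or apertium), $2 the repository name.
clone_url() {
    printf 'git@github.com:%s/%s.git\n' "$1" "$2"
}

# Print one "git clone" command per repository needed for sme-sma and sme-nob.
for spec in giellalt/lang-sme apertium/apertium-sme-sma apertium/apertium-nob; do
    org=${spec%%/*}     # part before the slash
    repo=${spec#*/}     # part after the slash
    echo "git clone $(clone_url "$org" "$repo")"
done
```

Run the printed commands from the directory where you keep the giellalt or apertium repositories, respectively.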
Apertium is documented on its github page and on its wiki. Released apertium language pairs can be used on apertium.org.
For each language pair you first compile each of the two languages. Note that some languages are compiled in Apertium, others in the Giellalt infrastructure. Norwegian Bokmål and German, for example, are compiled in Apertium. Saami and other northern languages are compiled in Giellalt.
Go to the relevant language folder, here e.g. sme, and set up the configuration for MT:
cd $GTLANGS/lang-sme/
./configure --enable-apertium
Now compile (typically by running make) and be prepared to wait, from 15 minutes to several hours depending upon the language and your computer. The compilation procedure will store the binary files in tools/mt/apertium in each language folder, and the apertium compilers will read them from that location.
While waiting, do the same for the other language(s) you want. Go to the folder of the other language you want to translate to or from (sma, smj, smn) and repeat the steps there. For sme-sma, for example, you must compile both sme and sma.
Remember to reset the configure options afterwards, e.g. to
./configure
if that is what you use for FST work.
To check that you have compiled the relevant files, write:
ls -l tools/mt/apertium/*.gz
If everything went well, you have new .gz files in the tools/mt/apertium folder.
Remember that you must have compiled BOTH the languages you want to translate between.
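The check above can be repeated for both languages of a pair in one go. The following is a minimal sketch, assuming the language folders are named lang-XXX and live side by side under one base directory (taken here from $GTLANGS, falling back to the current directory); the function names has_mt_binaries and missing_langs are hypothetical.

```shell
# Hypothetical helper: succeed if a language folder contains compiled
# Apertium binaries (tools/mt/apertium/*.gz), fail otherwise.
has_mt_binaries() {
    for f in "$1"/tools/mt/apertium/*.gz; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Print the languages that still lack compiled binaries.
# Usage: missing_langs BASEDIR LANG1 LANG2 ...
missing_langs() {
    base=$1
    shift
    for lang in "$@"; do
        has_mt_binaries "$base/lang-$lang" || echo "$lang"
    done
}

# Example: list what is still missing for sme-sma
# (prints nothing if both languages are compiled).
missing_langs "${GTLANGS:-.}" sme sma
```

Any language code printed by missing_langs still needs the configure and compile step described above.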
For language pairs involving Giellalt languages, we take Norwegian Bokmål and German from Apertium. In addition to that, Apertium contains more than 100 languages (see the documentation on the Apertium github page or the Apertium wiki).
In the apertium-nob folder you fetched above, simply do:
./autogen.sh
make -j
Note that all Apertium folders contain a README file.
All Apertium language pairs (also the giellalt ones, e.g. sme-sma) are stored on the Apertium github.
We assume you fetched your language pair folder as explained above. For each language pair you must, in the folder of the language pair, set up a pointer to each of the two languages in the pair.
For sme-sma (which is a pair with two giellalt languages), do this in the Apertium folder, e.g. apertium-sme-sma:
./autogen.sh --with-lang1=$GTLANGS/lang-sme/tools/mt/apertium --with-lang2=$GTLANGS/lang-sma/tools/mt/apertium
make -j
For pairs with one Apertium language, e.g. sme-nob, do this in apertium-sme-nob:
./autogen.sh --with-lang1=$GTLANGS/lang-sme/tools/mt/apertium --with-lang2=/path/to/apertium-nob
make -j
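The two autogen.sh calls above differ only in where each language is found: a giellalt language points to its tools/mt/apertium folder, an Apertium language to its repository folder. The sketch below prints the call for a given pair rather than running it. The helper names giella_mt_dir and autogen_cmd are hypothetical, and /path/to/giellalt is a placeholder used only when $GTLANGS is not set.

```shell
# Placeholder default; replace with your own setup if GTLANGS is unset.
GTLANGS=${GTLANGS:-/path/to/giellalt}

# Hypothetical helper: folder with the compiled MT files of a giellalt language.
giella_mt_dir() {
    printf '%s/lang-%s/tools/mt/apertium' "$GTLANGS" "$1"
}

# Hypothetical helper: print the autogen.sh call for a language pair,
# given the folder of each language.
autogen_cmd() {
    printf './autogen.sh --with-lang1=%s --with-lang2=%s\n' "$1" "$2"
}

# Two giellalt languages (sme-sma):
autogen_cmd "$(giella_mt_dir sme)" "$(giella_mt_dir sma)"
# One giellalt and one Apertium language (sme-nob):
autogen_cmd "$(giella_mt_dir sme)" /path/to/apertium-nob
```

Run the printed command in the folder of the language pair, followed by make -j as shown above.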
To test that everything works, run the following in the folder of each language pair:
echo ja | apertium -d. sme-sma
echo ja | apertium -d. sme-nob
etc.
You may get this type of error message:
$ echo ja | apertium -d . fin-est
Error: Grammar revision is 9705, but this loader requires 10043 or later!
This may mean that you have an old vislcg3 / cg-proc compiler. To check, run:
vislcg3 --version
cg-proc --version
If the number you get (e.g. 0.9.9.10195) is lower than what the error message requires, you should update vislcg3. It may be, however, that the version number is ok but you still get the error message. In that case, you have old binary files even though you have updated your compiler. If so, run make clean in the $GTLANGS/lang-<LANG>/ folder and delete the tools/mt/apertium/*.gz files. Thereafter, repeat the installation procedure.
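The comparison between the installed version and the revision named in the error message can be sketched as follows. This assumes that the revision is the last dot-separated field of the version number (10195 in 0.9.9.10195), and that you pass in the version number yourself after reading it from the vislcg3 --version output; the function name revision_ok is hypothetical.

```shell
# Hypothetical helper: succeed if the revision in a version number
# (assumed to be its last dot-separated field) is at least the required one.
# Usage: revision_ok VERSION REQUIRED
revision_ok() {
    revision=${1##*.}          # strip everything up to the last dot
    [ "$revision" -ge "$2" ]
}

# Example with the numbers from the text above.
if revision_ok 0.9.9.10195 10043; then
    echo "compiler is new enough: clean and rebuild the binary files"
else
    echo "compiler is too old: update vislcg3"
fi
```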
… to be written, when reported.