Plan for setup of machine translation

**NOTE: This documentation is old. It is kept because it may contain methodological points that are still valid.**

We plan to look at least at Apertium (a rule-based system, cf. its wiki) and Moses (a statistical system). This document discusses the setup of Moses.

Overview

The programs should be installed on the Xserve machine to facilitate long runs, which may last for days.

Files

We need five different programs, cf. the download information on each page:

alignment.jar, our Bergen-Tromsø sentence aligner
Mosesdecoder (the MT program itself)
GIZA++ (word alignment)
SRILM (language model)
mkcls (word class training)

They should be installed on the Xserve, in standard paths.
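Whether the tools actually ended up in standard paths can be checked with a short script. The following is a minimal sketch, not part of the original plan. The executable names (`moses`, `GIZA++`, `mkcls`, and `ngram-count` for SRILM) are assumptions based on the usual upstream builds and may differ on the local machine. alignment.jar is a Java archive rather than an executable, so it is left out.

```python
import shutil

# Executable names are assumptions based on the usual upstream builds;
# adjust them to match the local installation.
REQUIRED_TOOLS = {
    "Moses decoder": "moses",
    "GIZA++": "GIZA++",
    "mkcls": "mkcls",
    "SRILM": "ngram-count",
}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the labels of all tools whose executable is not found on PATH."""
    return [label for label, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All tools found on PATH.")
```

The `which` parameter exists only so that the lookup can be replaced in tests; in normal use the default `shutil.which` searches the current `PATH`.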

The process

The input is a set of parallel sentences.
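Parallel sentences are commonly stored as two plain-text files with one sentence per line, where line n of one file translates line n of the other. The sketch below assumes that layout and checks it before any long run is started. The function name and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def read_parallel(src_path, tgt_path):
    """Read two line-aligned files and return a list of (source, target) pairs.

    Raises ValueError if the files differ in line count, since the
    alignment would then be unreliable. Pairs in which either side is
    empty are dropped.
    """
    src = Path(src_path).read_text(encoding="utf-8").splitlines()
    tgt = Path(tgt_path).read_text(encoding="utf-8").splitlines()
    if len(src) != len(tgt):
        raise ValueError(
            f"line count mismatch: {len(src)} source vs {len(tgt)} target lines"
        )
    pairs = []
    for s, t in zip(src, tgt):
        s, t = s.strip(), t.strip()
        if s and t:  # skip pairs with an empty side
            pairs.append((s, t))
    return pairs
```

A call such as `read_parallel("corpus.sme", "corpus.nob")` would return the usable sentence pairs, or fail early if the two files have drifted out of alignment.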

Setup

Put the files where they belong
Set up paths and access
Modify the makefiles

Make directories in gtsvn/mt

Today we have the following directories:

courses
dev
doc
giza
grantapplications
script

Needed:

Rename giza to wordalign, and create one for sentencealign.
Create directories for the language pairs and for the machine runs.
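The planned layout can be sketched as a small script. This is an illustration under stated assumptions, not an agreed structure: the language pair names are the ones listed under "MT systems, usage" in this document, while `runs` is a hypothetical name for the machine-run directory. The script only creates directories; renaming the existing giza directory would still have to be done in the version control system.

```python
from pathlib import Path

# Language pairs taken from the "MT systems, usage" section.
LANGUAGE_PAIRS = ["smenob", "nobsme", "engsme"]


def planned_dirs(root):
    """Return the directories of the planned layout below root."""
    root = Path(root)
    dirs = [
        root / "wordalign",      # planned new name for the giza directory
        root / "sentencealign",
        root / "runs",           # hypothetical name for the machine runs
    ]
    dirs += [root / pair for pair in LANGUAGE_PAIRS]
    return dirs


def create_layout(root):
    """Create all planned directories, leaving existing ones untouched."""
    created = planned_dirs(root)
    for directory in created:
        directory.mkdir(parents=True, exist_ok=True)
    return created
```

Running `create_layout("gtsvn/mt")` from the parent of the checkout would add the missing directories next to the existing ones.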

MT systems, usage

smenob

A gisting system, i.e. one meant to give an idea of what a text says.

nobsme

engsme

Only KDE input
