GiellaLT provides rule-based language technology aimed at minority and indigenous languages.
We look at alternatives to our tca2 aligners.
The sentence aligner used to align the Europarl parallel corpus is a Perl script based on the Gale and Church algorithm. Here is the README file.
The script can be downloaded from the Europarl site. At Giellatekno, it is placed under the
To run it, first add the se and no abbreviation files to the nonbreaking_prefixes directory (this has already been done). Then put the files to be aligned in the directories se and no. The filenames in se and no must be identical. The command is then:
./sentence-align-corpus.perl se no
This might not work. For some hints, see:
Feel free to add.
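
One precondition mentioned above, that the filenames in se and no must be identical, can be checked before running the aligner. The following is a minimal sketch in Python, not part of the Europarl tools: the function name unmatched_files is hypothetical, and it assumes that se and no are flat directories in the current working directory.

```python
from pathlib import Path


def unmatched_files(dir_a, dir_b):
    """Return (only_in_a, only_in_b): sorted filenames found in one
    directory but missing from the other. Subdirectories are ignored."""
    names_a = {p.name for p in Path(dir_a).iterdir() if p.is_file()}
    names_b = {p.name for p in Path(dir_b).iterdir() if p.is_file()}
    return sorted(names_a - names_b), sorted(names_b - names_a)


if __name__ == "__main__":
    # Assumption: the script is started from the directory that holds se/ and no/.
    if Path("se").is_dir() and Path("no").is_dir():
        only_se, only_no = unmatched_files("se", "no")
        for name in only_se:
            print(f"only in se: {name}")
        for name in only_no:
            print(f"only in no: {name}")
        if not only_se and not only_no:
            print("se and no contain the same filenames")
```

If the script reports files that exist on only one side, those files have no counterpart to align against, which may be one reason for the aligner not to work.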