GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.


Building and installing hunspell

This procedure was carried out in a virtual machine running Kubuntu inside Parallels. Hunspell is crash-prone under Mac OS X, and not all versions compile there. The crashes are probably due to erroneous dictionary files.

Building hunspell dictionaries and testing

To build hunspell dictionaries, we first have to build and install a hunspell-specific transducer, then build and run the Java program that generates the hunspell dictionaries.

Building and installing the transducer

Building and installing the Java program

Generate dictionaries and debugging output

While generating dictionaries, the program can also produce debugging output by building full paradigms of the words it tries to generate for hunspell. If the --debug option is used, the debugging output is placed in gt/src/lexc2xspell, in files named debug-$POS-$LANG.txt. The dictionary files are written to sme.dic and sme.aff.
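The naming scheme for the debugging files can be used to collect them per part of speech and language. The following is a minimal sketch, not part of the GiellaLT tooling: the function names are hypothetical, the POS value "noun" in the comments is only an illustration, and it assumes that neither $POS nor $LANG contains a hyphen.

```python
import re
from pathlib import Path

# Assumed file name pattern: debug-<POS>-<LANG>.txt, where neither
# <POS> nor <LANG> contains a hyphen (an assumption, not confirmed
# by the documentation).
DEBUG_NAME = re.compile(r"^debug-(?P<pos>[^-]+)-(?P<lang>[^-]+)\.txt$")


def parse_debug_name(filename):
    """Return (pos, lang) for a debug file name, or None if it does not match."""
    match = DEBUG_NAME.match(filename)
    if match is None:
        return None
    return match.group("pos"), match.group("lang")


def find_debug_files(directory):
    """Map (pos, lang) to the path of each debug file found in directory."""
    found = {}
    for path in sorted(Path(directory).glob("debug-*.txt")):
        parsed = parse_debug_name(path.name)
        if parsed is not None:
            found[parsed] = path
    return found


if __name__ == "__main__":
    # Directory taken from the text above; adjust it to the local checkout.
    for (pos, lang), path in find_debug_files("gt/src/lexc2xspell").items():
        print(pos, lang, path)
```

Run from the directory that contains gt, this lists one line per debugging file, which makes it easy to see which parts of speech have produced output.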

Testing the dictionaries

To test the quality of the generation, build one POS at a time, then do the following.

To test the speller for real, build all the POSes, then run: hunspell -d ./sme -l <test-text>
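With -l, hunspell prints the words it does not accept, one per line. When testing on a longer text it can help to count how often each rejected word occurs. The sketch below wraps the command shown above; the function names are hypothetical, it assumes that hunspell is on the PATH and that sme.dic and sme.aff are in the current directory, and it passes the test text on standard input.

```python
import subprocess
from collections import Counter


def hunspell_list_command(dictionary="./sme"):
    """Build the hunspell invocation that lists unrecognised words."""
    return ["hunspell", "-d", dictionary, "-l"]


def count_rejected(hunspell_output):
    """Count each rejected word in the output of `hunspell -l`."""
    words = (line.strip() for line in hunspell_output.splitlines())
    return Counter(word for word in words if word)


def rejected_words(text, dictionary="./sme"):
    """Run hunspell on text and return a Counter of the rejected words."""
    result = subprocess.run(
        hunspell_list_command(dictionary),
        input=text,           # the test text is sent on standard input
        capture_output=True,
        text=True,            # decode using the locale's encoding
        check=True,           # raise if hunspell exits with an error
    )
    return count_rejected(result.stdout)


if __name__ == "__main__":
    import sys

    # Usage (script name is hypothetical): python rejected.py < test-text
    for word, count in rejected_words(sys.stdin.read()).most_common(20):
        print(count, word)
```

Sorting by frequency puts systematic gaps in the generated dictionaries at the top of the list, ahead of one-off typos in the test text.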