GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.
This procedure is offered as a workaround while waiting for our hyphenators.
At present (2023) the GiellaLT framework cannot offer hyphenation integrated in its proofing tools. What we can offer, however, is a hyphenation component, either based upon the phonological structure of the language or upon both phonological and morphological cues.
In some cases, e.g. when publishing a book, correct hyphenation becomes important. This procedure shows how a book manuscript can be hyphenated even when the hyphenation tools are not (yet) integrated in the spellcheckers. It is a bit cumbersome, but compared to manual hyphenation it will literally save you days of work on a book manuscript.
We assume that the manuscript is available in plain text format, in a file here called manuscript.txt, and that you have downloaded the lang-xxx catalogue from GitHub (xxx being the ISO code for your language), as found here, and explained here.
If this is in place, do the following (steps 1-3 are done only once; steps 4-7 are repeated for each new document):
1. cd lang-xxx
2. ./configure --enable-fst-hyphenator
3. make -j
4. cat manuscript.txt | tr '\-' '‰' | hfst-lookup -q tools/hyphenators/hyphenator-gt-desc.hfstol | cut -f2 | uniq > hyph-manuscript.txt
5. Open hyph-manuscript.txt in Microsoft Word

That’s it! In 7 simple steps (!), you now have a book manuscript with hyphen boundaries exactly where you want to have them.
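To make the tail of the lookup pipeline above (`cut -f2 | uniq`) easier to follow, the sketch below runs it on a few lines shaped like hfst-lookup output: input, output and weight separated by tabs, with a blank line after each input. The sample words, the repeated result line and the `-` break marker are made up for illustration; they are not real hyphenator output.

```shell
# Mock of hfst-lookup output: input<TAB>output<TAB>weight, blank line after each input.
# The words and the "-" break marker are illustrative assumptions.
mock_lookup() {
  printf 'giella\tgiel-la\t0.000000\n'
  printf 'giella\tgiel-la\t0.000000\n'
  printf '\n'
  printf 'teknologiija\ttek-no-lo-gii-ja\t0.000000\n'
  printf '\n'
}

# cut -f2 keeps only the second column (the hyphenated form);
# uniq then drops adjacent duplicate lines.
hyphenated=$(mock_lookup | cut -f2 | uniq)
printf '%s\n' "$hyphenated"
```

Note that the blank separator lines pass through `cut` unchanged, so the resulting file may contain more empty lines than the manuscript did.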
The example was done for Microsoft Word. You can probably figure out how to repeat it in your favourite editor (if possible). Needless to say, we would have preferred for this to be integrated in your favourite publishing tool. Check back now and then for updates and developments.
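For editors other than Word, one possible approach is to do the replacements on the command line before importing the text. The sketch below is only that, a sketch: it assumes that the hyphenator marks possible break points with a plain `-` (which would explain why real hyphens are swapped for `‰` before the lookup) and that your editor honours the Unicode soft hyphen (U+00AD). Check both against your own output and editor. The function name `to_soft_hyphens` and the sample line are hypothetical.

```shell
# Assumption: "-" in the hyphenator output marks a possible break point, and
# "‰" stands for a hyphen that was in the original manuscript.
soft_hyphen=$(printf '\302\255')   # U+00AD SOFT HYPHEN as UTF-8 bytes

to_soft_hyphens() {
  # Order matters: turn break marks into soft hyphens first,
  # then restore the original hyphens from "‰".
  sed -e "s/-/${soft_hyphen}/g" -e 's/‰/-/g'
}

# Made-up sample line; with real data you would run something like:
#   to_soft_hyphens < hyph-manuscript.txt > soft-manuscript.txt
result=$(printf 'tek-no-lo-gii-ja‰giel-la\n' | to_soft_hyphens)
printf '%s\n' "$result"
```

Soft hyphens are invisible unless a line actually breaks at them, so the printed result will look like the sample line with its break marks removed and the original hyphen restored.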