GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.


Development tools

This page links to (documentation on) editors, compilers and some other tools.

Editors

Text editors

To edit our source files we need a text editor that supports UTF-8 and can save the edited result as plain text. You may use emacs and its modes, or vim. Graphical editors for all platforms are Atom and Sublime Text; for Linux there is also Gedit.

On a Mac you may use e.g. SubEthaEdit, for which we have also made modes for the relevant programming tools, or TextMate. On Windows (Ubuntu on Windows), you may use e.g. EditPad Lite.
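Whether an editor really saved a source file as UTF-8 plain text can be checked outside the editor. The following is a minimal sketch, not part of the GiellaLT infrastructure; the function names `check_plain_utf8` and `check_file` are hypothetical, and treating NUL bytes as a sign of a non-text file is an assumption made for illustration.

```python
from pathlib import Path


def check_plain_utf8(data):
    """Return a list of problems found in the raw bytes `data` (empty if none)."""
    problems = []
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"invalid UTF-8 at byte offset {err.start}")
    # Assumption: plain-text source files never contain NUL bytes.
    if b"\x00" in data:
        problems.append("contains NUL bytes, so it is probably not plain text")
    return problems


def check_file(path):
    """Read the file at `path` as bytes and check it."""
    return check_plain_utf8(Path(path).read_bytes())


if __name__ == "__main__":
    import sys

    # Usage: python check_utf8.py file1 file2 ...
    for name in sys.argv[1:]:
        for problem in check_file(name):
            print(f"{name}: {problem}")
```

A file saved in a legacy encoding such as Latin-1 will normally be reported as invalid UTF-8 as soon as it contains a non-ASCII letter.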

Dictionary editors

Documentation editors

We publish our documentation with Markdown (cf. our documentation on the Markdown format). Many text editors, such as SubEthaEdit, can show Markdown pages. For Mac, we recommend the MacDown editor.

The language-specific documentation is written either in the source files themselves or in the lang-xxx/docs folder. Language-independent documentation (such as the page you are reading now) is written in the repository giellalt.github.io, and Tromsø-specific documentation is written in the repository giellalt.uit.no.

Compilers for morphology and morphophonology

The project uses a set of morphological compilers that exists in two versions: the Xerox tools and the hfst tools. As of April 2025, our infrastructure no longer supports the original Xerox tools and relies mainly on the open source hfst tools.

A third compiler, foma, can also compile source files written for xfst and lexc, but not for twolc.

The hfst compilers

The hfst tools are downloaded as described on the Getting started page. Documentation is found at the hfst wiki. For installation, see also our hfst3 installation page. Note that the documentation is mainly technical; for a pedagogical introduction, we still recommend the Beesley and Karttunen book.

The Xerox compilers

The Xerox tools are robust and well documented. They are freely available for research, but they are not open source. As of 2025, they are only available for download through the Internet Archive. Our infrastructure no longer supports these tools, but they can still be used for compiling and testing single files. This is useful for twolc, for example, where the Xerox tools have an interactive debugging interface that hfst lacks.

The Xerox tools are: twolc (for morphophonology), lexc (for morphology), xfst (for compiling the final transducer), and lookup (for analysis and generation). Hfst has the same tools (called hfst-twolc, hfst-xfst, etc.) as well as a long list of other tools.

The Xerox tools can be found at fsmbook.com (archived version). They are documented in the book referred to on that page (Beesley and Karttunen). We strongly recommend that anyone working on morphological transducers, whether with Xerox or hfst, buy the book.

twolc, for phonological and morphophonological rules (cf. a shorter and a longer documentation).
lexc, for representing the Saami stems and the affix lexica.
xfst, the finite-state transducer tool, for integrating the different parts of the program, and for compiling the preprocessor.
tokenize, for tokenization and processing (cf. documentation). Note that at the moment we use perl, not tokenize, for preprocessing.
lookup, an interface to the morphological analyser (cf. documentation and our lookup notes).

The programs are started by typing e.g. lexc and then pressing the enter key. The tools are documented in Beesley and Karttunen, Finite-State Morphology: Xerox Tools and Techniques (archived version). The tools may also be installed on your own machine, whether it runs Mac OS X, Linux or Windows. One version of the software is found on the CD accompanying the book; for the latest version, ask Trond for reference.
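To make the roles of lexc (stems and affix lexica) and lookup (analysis) more concrete, here is a toy illustration in Python. It is a sketch only: the real tools compile the lexica into a finite-state transducer rather than enumerating word forms, and it leaves out twolc rules entirely. The lexicon contents and the names `LEXICON`, `expand` and `build_lookup` are hypothetical.

```python
# Toy lexicon in the spirit of lexc continuation classes (hypothetical data).
# Each entry is (analysis side, surface side, continuation lexicon);
# "#" marks the end of a word, as it does in lexc.
LEXICON = {
    "Root": [("cat", "cat", "NounInfl"), ("dog", "dog", "NounInfl")],
    "NounInfl": [("+N+Sg", "", "#"), ("+N+Pl", "s", "#")],
}


def expand(lexicon, start="Root"):
    """Yield every (analysis, surface) pair the lexicon licenses.

    Only suitable for acyclic toy lexica: a lexicon that continues
    back into itself would make this recursion run forever.
    """
    def walk(name, upper, lower):
        if name == "#":
            yield upper, lower
            return
        for up, low, cont in lexicon[name]:
            yield from walk(cont, upper + up, lower + low)

    yield from walk(start, "", "")


def build_lookup(lexicon):
    """Map each surface form to the list of its analyses."""
    table = {}
    for analysis, surface in expand(lexicon):
        table.setdefault(surface, []).append(analysis)
    return table


if __name__ == "__main__":
    table = build_lookup(LEXICON)
    print(table["cats"])  # prints ['cat+N+Pl']
```

The stems live in one lexicon and the inflection in another, which every noun stem continues into; this sharing of affix lexica is the point of continuation classes.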

The foma compiler

Måns Huldén’s Foma may be downloaded at bitbucket.org/mhulden/foma. See our Foma documentation.

Disambiguation tools

Morphological disambiguation

Analysis and testing

The easiest and most effective way to analyse and test (although a little scary at first) is to use command-line tools. We have made a short introduction in English and a longer document in Norwegian on this topic. The introduction on how to use our parser is also an excellent introduction to combining the individual tools.

Our home-made tools, and adjustments of public tools

Other tools

tca2, the corpus alignment program.
Evaluating other sentence alignment programs.
Obsolete documentation on UTF8 for older operating systems: setup

Obsolete documentation

lookup2cg, a script to transform Xerox output to CG input. Nowadays, we use hfst-tokenise instead.
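The kind of transformation such a script performs can be sketched as follows. This is not the lookup2cg script itself but a simplified illustration under stated assumptions: the input is taken to be lookup-style lines of the form surface form, tab, lemma+Tag+Tag (any further columns, such as a weight, are ignored), with blank lines between words, and the output is the usual CG cohort layout. The function name `lookup_to_cg` and the sample data are hypothetical.

```python
def lookup_to_cg(lookup_output):
    """Convert lookup-style analyses to CG cohorts (simplified sketch).

    Assumes every non-blank line holds at least two tab-separated
    fields, and that lemmas contain no "+" themselves.
    """
    out = []
    current = None  # surface form of the cohort being built
    for line in lookup_output.splitlines():
        if not line.strip():
            current = None  # a blank line ends the cohort
            continue
        surface, analysis = line.split("\t")[:2]
        if surface != current:
            out.append(f'"<{surface}>"')  # word-form line opens a cohort
            current = surface
        lemma, *tags = analysis.split("+")
        # One reading per line: tab, quoted lemma, space-separated tags.
        out.append("\t" + " ".join([f'"{lemma}"'] + tags))
    return "".join(item + "\n" for item in out)


if __name__ == "__main__":
    sample = "cats\tcat+N+Pl\n\nrun\trun+V+Inf\nrun\trun+N+Sg\n\n"
    print(lookup_to_cg(sample), end="")
```

For the sample, the result is one cohort for cats with a single reading and one cohort for run with two readings.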
