GiellaLT provides rule-based language technology aimed at minority and indigenous languages.
The project processes text in many ways, with the linguistic data organized in lexicons.
To edit our source files we need a text editor that supports UTF-8 and can save the edited result as plain text. You may use Emacs and its modes, or Vim. Graphical editors available on all platforms include Atom and Sublime Text.
On a Mac you may use e.g. SubEthaEdit, for which we have also made modes for the relevant programming tools, or TextMate. On Windows (Ubuntu on Windows), you may use e.g. EditPad Lite.
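Whichever editor you choose, it can be useful to confirm that a saved file really is valid UTF-8. The following is a minimal sketch in Python, not part of the GiellaLT infrastructure; the function names `is_valid_utf8` and `check_source_file` are hypothetical and introduced only for illustration.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_source_file(path: str) -> bool:
    """Read a file as raw bytes and report whether it is valid UTF-8."""
    return is_valid_utf8(Path(path).read_bytes())
```

A file saved in a legacy encoding such as Latin-1 will typically fail this check as soon as it contains a non-ASCII letter, which is the kind of problem an editor without proper UTF-8 support tends to introduce.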
We publish our documentation with Markdown. We also have documentation on Markdown. The language-specific documentation is written either in the source files themselves or in the lang-xxx/docs folder. Language-independent documentation (the pages you are reading now) is written in the repository giellalt.github.io, and Tromsø-specific documentation is written in the repository giellalt.uit.no.
The project uses a set of morphological compilers which exist in two versions: the Xerox tools and the hfst tools. The Xerox tools are the original ones. They are robust and well documented, and they are freely available for research, but they are not open source. The hfst tools are open source with no restrictions. Both compilers compile the same source files, and at Giellatekno and Divvun we use both interchangeably. We compile practical applications with hfst, since several useful features are available in hfst only. In daily use, the Xerox tools compile somewhat faster.
A third compiler, Foma, is also able to compile source files written for xfst and lexc.
The Xerox tools are: twolc (for morphophonology), lexc (for morphology), xfst (for compiling the final transducer), and lookup (for analysis and generation). Hfst has the same tools (called hfst-twolc, hfst-xfst, etc.) as well as a long list of other tools.
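To make the terms analysis and generation concrete, here is a toy sketch in Python. It is only an illustration of the two lookup directions, not how twolc, lexc or xfst work internally: a real transducer is compiled from lexc and twolc sources, while this sketch uses a hand-written list of pairs. The entries, the tag names and the function names are all invented for the example.

```python
# Toy "transducer": each pair links an analysis string (lemma plus tags)
# to a surface form. Entries and tag names are invented for illustration.
PAIRS = [
    ("cat+N+Sg", "cat"),
    ("cat+N+Pl", "cats"),
    ("walk+V+Inf", "walk"),
    ("walk+N+Sg", "walk"),
]


def analyse(surface: str) -> list[str]:
    """Analysis: map a surface form to every matching analysis string."""
    return [analysis for analysis, form in PAIRS if form == surface]


def generate(analysis: str) -> list[str]:
    """Generation: map an analysis string to every matching surface form."""
    return [form for candidate, form in PAIRS if candidate == analysis]
```

In this sketch, `analyse("walk")` returns two analyses because the form is ambiguous between a verb and a noun reading, and `generate("cat+N+Pl")` returns the single form `cats`. The same relation is simply read in opposite directions.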
The Xerox tools can be found at fsmbook.com. They are documented in the book referred to on that page (Beesley and Karttunen). We strongly recommend that anyone working on morphological transducers, with either Xerox or hfst, buy the book.
The programs are started by typing e.g. lexc and then pressing the enter key. The tools are documented in Beesley and Karttunen, Finite-State Morphology: Xerox Tools and Techniques. The tools may also be installed on your own machine, be it on Mac OS X, Linux or Windows. One version of the software is found on the CD accompanying the book; for the latest version, ask Trond for reference.
The hfst tools are downloaded as described on the Getting started page. Documentation is found at the hfst wiki. For installation, see also our hfst3 installation page. Note that the documentation is mainly technical; for a pedagogical introduction, we still recommend the Beesley and Karttunen book.
Måns Huldén’s Foma may be downloaded from bitbucket.org/mhulden/foma. See our Foma documentation.
The easiest and most effective way to do this (although a little scary at first) is to use command-line tools. We have made a short introduction in English and a longer document in Norwegian on this topic. The introduction on how to use our parser is also an excellent introduction to combining the individual tools.
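Combining command-line tools means feeding the output of one program into the next. As a rough sketch of that idea, the Python helpers below do what a shell pipe does. The function names `run_tool` and `pipeline` are hypothetical, and nothing here is part of the GiellaLT infrastructure.

```python
import subprocess


def run_tool(command: list[str], text: str) -> str:
    """Send text to a command's standard input and return its standard output."""
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        encoding="utf-8",  # the source files and tool output are UTF-8
        check=True,        # raise an error if the tool exits with a failure
    )
    return result.stdout


def pipeline(commands: list[list[str]], text: str) -> str:
    """Run the commands in order, feeding each one the previous output."""
    for command in commands:
        text = run_tool(command, text)
    return text
```

Assuming an hfst installation and a compiled analyser at the hypothetical path `analyser.hfst`, a call such as `run_tool(["hfst-lookup", "analyser.hfst"], "word\n")` would correspond to piping one word per line into the lookup tool. The exact commands to use are described in the introductions mentioned above.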