GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.

Getting started with the GiellaLT infrastructure on Windows

Since the Windows 10 Anniversary Update (2016), it has been possible to install a Linux system on Windows. Follow the instructions below to install Linux/bash on Windows 10.

Note that if you only want to use the ready-made grammatical analysers (as explained on the Linguistic analysis page), you do not need to follow these instructions. This documentation is relevant when you want to participate in building and developing the grammatical tools yourself.

Linux on Windows

Install Linux

Then return here.

Set up the work environment

Find a place for your files

When you open the Linux terminal window, you are in /home/yourlinuxusername/. To see your home folder on Windows, write: ls /mnt/c/Users/YourWindowsUserName/

A good idea would be to make an alias in the .profile file of your Linux home folder, e.g. something along the lines of:

alias lgtech="pushd /mnt/c/Users/YourWindowsUserName/Documents/lgtech"

… where YourWindowsUserName should be replaced with your Windows user name (= the name of your home folder on Windows). Note that there must be no spaces around the = sign.
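
As a minimal sketch, the alias line can also be added from the terminal instead of with an editor. The function name add_lgtech_alias is hypothetical (it is not part of GiellaLT), and the sketch assumes the Documents/lgtech layout used above. It puts the path in double quotes inside the alias, so that a Windows user name containing a space still works, and it does not add the line a second time if it is already there.

```shell
# add_lgtech_alias PROFILE_FILE WINDOWS_USER
# Hypothetical helper: appends the lgtech alias to PROFILE_FILE
# unless exactly the same line is already present.
add_lgtech_alias() {
    profile_file="$1"
    win_user="$2"
    target="/mnt/c/Users/${win_user}/Documents/lgtech"
    line="alias lgtech='pushd \"${target}\"'"

    # -F: fixed string, -x: match the whole line, -q: no output.
    if ! grep -Fxq -- "$line" "$profile_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile_file"
    fi
}

# Example calls (commented out so that nothing changes by accident):
# mkdir -p "/mnt/c/Users/YourWindowsUserName/Documents/lgtech"
# add_lgtech_alias "$HOME/.profile" "YourWindowsUserName"
```

The alias becomes available the next time .profile is read, normally when you open a new Linux terminal window.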

Then writing lgtech when you open Linux will bring you directly to the relevant folder. You can then install all language technology files here. The advantage of installing them here, and not under the Linux home directory, is that you can also access the files with Windows programs (e.g. a text editor such as Notepad). Remember to use UTF-8 encoding!
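
If you are unsure whether a Windows program saved a file as UTF-8, one way to check from the Linux terminal is to let iconv try to decode it. This is only a sketch: the function name is_utf8 and the file name in the example are hypothetical. Plain ASCII files also pass the check, since ASCII is a subset of UTF-8.

```shell
# is_utf8 FILE
# Succeeds (exit status 0) if FILE can be decoded as UTF-8,
# fails if iconv finds an invalid byte sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical file name, for illustration only:
# if is_utf8 lexicon.lexc; then echo "valid UTF-8"; else echo "not UTF-8"; fi
```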

Install what is needed

Then follow the instructions for Linux to get the things you need for participating in the development of language technology tools. Remember that if you only want to use the tools, you may stop here and instead just download the analysers; see the page on linguistic analysis.

Installing required auxiliary programs

You need a number of tools for the build chain. We assume you installed Ubuntu as your Linux version. If you installed some other Linux version, look at its documentation for how to install programs like the ones below.

Install as follows (all this is one command):

sudo apt-get install autoconf automake libtool libsaxonb-java python3-pip \
python3-lxml  python3-bs4 python3-html5lib libxml-twig-perl antiword xsltproc \
poppler-utils wget python3-svn wv python3-feedparser subversion openjdk-11-jdk cmake \
python3-tidylib python3-yaml libxml-libxml-perl libtext-brew-perl
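
After the installation, a quick check like the following sketch can show whether some of the build tools are now found on your PATH. The function name missing_tools is hypothetical, and the list only covers a few commands that the packages above provide (for instance, libtoolize comes with libtool and svn with subversion). It does not check the Python or Perl libraries.

```shell
# missing_tools NAME...
# Prints each command name that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# No output means that all of the listed commands were found.
missing_tools autoconf automake libtoolize xsltproc wget svn cmake
```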

Installing our standard linguistic compilers

hfst, vislcg3 and apertium

You need tools to convert your linguistic source code (lexicons, morphology, phonology, syntax, etc.) into useful tools like analysers, generators, hyphenators and spellers.

To get them, run these two commands in the terminal (e.g. after typing cd and pressing ENTER to go to your home directory):

wget https://apertium.projectjj.com/apt/install-nightly.sh -O - | sudo bash

sudo apt-get -f install apertium-all-dev

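The first command downloads a shell script and passes it straight to bash, which runs it with administrator rights; no separate step to make the script executable is needed. The script registers the Apertium nightly package repository, and the second command then downloads and installs prebuilt binaries for the programs for morphology, syntax and machine translation (hfst, vislcg3 and apertium).

Rerun these commands at regular intervals, e.g. once a year, to get the latest updates.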

hfst is our default compiler, and it builds all our tools. It is open source, and it is needed for turning your morphology and lexicon into spellcheckers and other useful programs.

Troubleshooting

The following error message has been reported when using some hfst program:

hfst-lookup: symbol lookup error: /usr/lib/x86_64-linux-gnu/libhfst.so.55: undefined symbol: fsm_set_option

A solution may be to run:

sudo apt-get install libfoma0=0.10.0+s305-3~focal1

The reason for this seems to be an incongruence in (requirements for) foma versions in the nightly installer and the hfst program itself. The fix is to install libfoma directly, as above.
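
The version string in the command above ends in focal1, which indicates a package built for Ubuntu 20.04 ("focal"). The following sketch checks which release you are running before you pin that version. The function name os_codename is hypothetical. On other releases, apt-cache policy libfoma0 lists the versions that are actually available to you.

```shell
# os_codename [FILE]
# Prints the VERSION_CODENAME value from an os-release style file
# (default: /etc/os-release).
os_codename() {
    # Subshell: the variables read from the file do not leak out.
    ( . "${1:-/etc/os-release}" && printf '%s\n' "$VERSION_CODENAME" )
}

if [ "$(os_codename 2>/dev/null)" = "focal" ]; then
    echo "Ubuntu focal: the pinned libfoma0 version was built for this release."
else
    echo "Not focal: check available versions with: apt-cache policy libfoma0"
fi
```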

Two other compilers (alternatives to hfst)

The following two programs are not needed; we mention them only because the source code is compatible with them. If you don’t know whether you need them, just skip them.

Installing an editor

In order to participate in the development work, you need an editor, a program for editing text files. Here are some candidates:

Any other editor handling UTF-8 should be fine as well.

Now go back to the Getting Started page for the next step towards building, using and developing the linguistic analysers.