GiellaLT Documentation

GiellaLT provides rule-based language technology aimed at minority and indigenous languages


Page Content

  • Installing our standard linguistic compilers
  • Installing an editor
  • Now go back to the Getting Started page for the next step towards building, using and developing the linguistic analysers.
  • Getting started with the GiellaLT infrastructure on Windows

    Since the Windows 10 Anniversary Update (2016), it has been possible to install a Linux system on Windows. Follow the instructions below to install Linux/bash on Windows 10.

    Note that if you only want to use the ready-made grammatical analysers (as explained on the Linguistic analysis page), you do not need this setup. This documentation is relevant when you want to participate in building and developing the grammatical tools yourself.

    Installation

    Then return here.

    Set up the work environment

    To access Windows files from the Linux window, run ls /mnt/ and navigate from there. It is a good idea to add an alias to the .profile file in your Linux home folder, e.g. something along the lines of:

    alias lgtech="pushd /mnt/c/Users/YourUserName/Documents/lgtech"
    

    … where YourUserName should be replaced with your own Windows user name. The path starts with /mnt/; check that the rest of the path is what you want.

    Then writing lgtech will bring you directly to the relevant folder. You then may want to install all language technology files here.
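
    As a minimal sketch, the alias could also be added from the command line. The function name add_lgtech_alias is made up for this example and is not part of GiellaLT; the path in the usage line is the placeholder path from above. The function appends the alias only when the folder exists and the line is not already in the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GiellaLT): append the lgtech alias to a
# profile file, but only if the target folder exists and the alias is missing.
add_lgtech_alias() {
    local target_dir="$1"                  # e.g. /mnt/c/Users/YourUserName/Documents/lgtech
    local profile="${2:-$HOME/.profile}"   # file to edit, ~/.profile by default
    local line="alias lgtech=\"pushd $target_dir\""

    if [ ! -d "$target_dir" ]; then
        echo "No such folder: $target_dir" >&2
        return 1
    fi
    # -x matches whole lines only, -F treats the alias text literally
    if ! grep -qxF -- "$line" "$profile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Usage (replace YourUserName first):
#   add_lgtech_alias /mnt/c/Users/YourUserName/Documents/lgtech
```

    The alias becomes available after running . ~/.profile or opening a new Linux window.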

    The advantage of installing them here, rather than under the Linux home directory, is that you can access the files with Windows programs as well (but remember to use UTF-8 encoding!).
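
    One way to check that a file edited in a Windows program is still valid UTF-8 is sketched below. The function name is_utf8 and the file pattern in the usage line are assumptions made for this example. The check relies on iconv, which fails when it meets a byte sequence that is not valid UTF-8.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given file is valid UTF-8.
# iconv converts UTF-8 to UTF-8 and exits with an error on any invalid byte sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Usage: list files that may have been saved in another encoding
# (the *.txt pattern is only an example):
#   for f in *.txt; do is_utf8 "$f" || echo "not UTF-8: $f"; done
```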

    Then follow the instructions for Linux to get the things you need for participating in the development of language technology tools. Remember that if you only want to use the tools, you may stop here and instead just download the analysers; see the page on linguistic analysis.

    Installing required auxiliary programs

    You need a number of tools for the build chain. We assume you have installed Ubuntu on your Windows machine. If you installed some other Linux distribution, look at its documentation for how to install programs like the ones below.

    Ubuntu (all this in one command)

    sudo apt-get install autoconf automake libtool libsaxonb-java python3-pip \
    python3-lxml  python3-bs4 python3-html5lib libxml-twig-perl antiword xsltproc \
    poppler-utils wget python3-svn wv python3-feedparser subversion openjdk-11-jdk cmake \
    python3-tidylib python3-yaml libxml-libxml-perl libtext-brew-perl
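
    After the installation, a quick check along these lines can show whether the main commands are on the PATH. This is only a sketch: the function name missing_tools is made up, and the command names in the usage line are a selection of what the packages above should provide, not a complete list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every command name that is NOT found on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

# Usage: no output means that all the listed commands were found.
#   missing_tools autoconf automake libtoolize xsltproc antiword wget svn cmake java
```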
    

    Installing our standard linguistic compilers

    hfst, vislcg3 and apertium

    You need tools to convert your linguistic source code (lexicons, morphology, phonology, syntax, etc.) into useful tools like analysers, generators, hyphenators and spellers.

    To get them, run these two commands in the terminal (e.g. after typing cd and pressing ENTER):

    wget https://apertium.projectjj.com/apt/install-nightly.sh -O - | sudo bash
    
    sudo apt-get -f install apertium-all-dev
    

    The first command downloads a shell script and pipes it directly to bash, which runs it with root privileges. Together, the script and the second command download and install prebuilt binaries of programs for morphology, syntax and machine translation (hfst, vislcg3 and apertium).

    Rerun the commands at regular intervals, e.g. once a year, to get the latest updates.
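
    If you would rather look at the installer script before it runs as root, the same installation can be split into separate steps: download to a file, read it, then run it. The sketch below adds a rough sanity check in between. The function name script_looks_runnable is an assumption made for this example; it only checks that the file is non-empty and parses as bash, and does not replace reading the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough sanity check of a downloaded script.
# Fails if the file is missing or empty, or if bash cannot parse it.
script_looks_runnable() {
    local script="$1"
    if [ ! -s "$script" ]; then
        echo "Missing or empty: $script" >&2
        return 1
    fi
    bash -n "$script"   # -n: parse only, execute nothing
}

# Usage (the download step needs network access):
#   wget https://apertium.projectjj.com/apt/install-nightly.sh -O install-nightly.sh
#   less install-nightly.sh
#   script_looks_runnable install-nightly.sh && sudo bash install-nightly.sh
#   sudo apt-get -f install apertium-all-dev
```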

    hfst is our default compiler, and it builds all our tools. It is open source, and it is needed for turning your morphology and lexicon into spellcheckers and other useful programs.

    Two other compilers (alternatives to hfst)

    The following two programs are not needed; we mention them only because the source code is compatible with them. If you don’t know whether you need them, just skip them.

    Installing an editor

    In order to participate in the development work, you need an editor, a program for editing text files. Here are some candidates:

    Any other editor handling UTF-8 should be fine as well.

    Now go back to the Getting Started page for the next step towards building, using and developing the linguistic analysers.