GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started, and our Privacy document.
The GiellaLT tools described elsewhere on these pages may be downloaded, compiled and used for linguistic analysis. Linguistic analysis may also be done without compiling the tools.
A wide range of grammatical tools may be used via online tools:
You may also download ready-compiled analysers for text analysis; the instructions below explain how. If you have already compiled the tools on your machine, we recommend this page instead. If not, read on.
These commands will download the compilers hfst and vislcg3. They require a Unix system. For use on Windows, see below.
The commands differ between operating systems:
Download on Mac:
Run these three different commands, one by one:
curl http://apertium.projectjj.com/osx/install-nightly.sh > install-nightly.sh
chmod a+x install-nightly.sh
sudo ./install-nightly.sh
Download on Linux Ubuntu (and on Windows, if you installed Ubuntu there):
Run these two different commands, one by one:
wget https://apertium.projectjj.com/apt/install-nightly.sh -O - | sudo bash
sudo apt-get -f install apertium-all-dev
Download on Linux Fedora:
Run these two different commands, one by one:
curl https://apertium.projectjj.com/rpm/install-nightly.sh |sudo bash
sudo dnf install apertium-all-devel
Download on Arch Linux:
Forthcoming
You will need both morphology and syntax. We use North Sámi (ISO code: sme) as an example; use the language code you need (and contact us if your language is missing).
For each language, the pmhfst file gives a morphological analyser, and the cg3 file is the disambiguation grammar, which picks the analysis that is relevant in the sentence:
curl https://gtsvn.uit.no/biggies/trunk/bin/sme/tokeniser-disamb-gt-desc.pmhfst > sme.pmhfst
curl https://gtsvn.uit.no/biggies/trunk/bin/sme/disambiguator.cg3 > sme.cg3
NOTE! For North Sámi (but not for the other languages) you should also run this command:
curl https://gtsvn.uit.no/biggies/trunk/bin/sme/semsets.cg3 > semsets.cg3
The file semsets.cg3 should be in the same directory as the file sme.cg3.
The corresponding commands for South Sámi (sma) are:
curl https://gtsvn.uit.no/biggies/trunk/bin/sma/tokeniser-disamb-gt-desc.pmhfst > sma.pmhfst
curl https://gtsvn.uit.no/biggies/trunk/bin/sma/disambiguator.cg3 > sma.cg3
For other languages, replace the language code sme with the code of the language you want. Note that the language code occurs twice in each of the commands above; replace both occurrences.
More languages may be added upon request, from this list. Feel free to contact us if your language is missing.
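The replacement of the language code can also be scripted. The following is a minimal sketch rather than an official GiellaLT tool: the function names giella_url and giella_fetch are made up for this illustration, and the sketch assumes that the language you ask for follows the same URL pattern as the sme and sma commands above.

```shell
# Base URL taken from the download commands above.
GIELLA_BASE="https://gtsvn.uit.no/biggies/trunk/bin"

# Print the download URL for file $2 of language $1.
# (Hypothetical helper, not part of the GiellaLT tools.)
giella_url() {
    printf '%s/%s/%s\n' "$GIELLA_BASE" "$1" "$2"
}

# Download analyser and disambiguator for language $1,
# saving them as LANG.pmhfst and LANG.cg3 in the current directory.
# curl -f makes curl fail on HTTP errors instead of saving an error page.
giella_fetch() {
    lang="$1"
    curl -f -o "$lang.pmhfst" "$(giella_url "$lang" tokeniser-disamb-gt-desc.pmhfst)" &&
    curl -f -o "$lang.cg3" "$(giella_url "$lang" disambiguator.cg3)"
}
```

After loading these functions into your shell, giella_fetch sma would save sma.pmhfst and sma.cg3 in the current directory. For North Sámi, the file semsets.cg3 must still be downloaded separately, as described above.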
Summary: When you have downloaded the files (cf. the Download… links above), you will be able to run the following command in a terminal window (again with sme as an example; replace it with sma or whatever your language code is):
echo ja | hfst-tokenise -cg sme.pmhfst | vislcg3 -g sme.cg3
The result should be
"ja" CC <W:0.0> <sme> @CVP
If not, ask for help. If so, you can proceed to the next step and analyse whole texts. Note that the text must be in plain text format (Word files etc. must be saved as plain text). Then you can run the following command:
cat yourtextfile.txt | hfst-tokenise -cg sme.pmhfst | vislcg3 -g sme.cg3
The text file is sent through a two-step analysis. First it goes through the morphological analyser sme.pmhfst, by means of the support program hfst-tokenise. The flag -cg ensures morphological analysis in the required format. Thereafter the output is disambiguated with the disambiguator sme.cg3, by means of the support program vislcg3. The flag -g identifies the file sme.cg3 as the grammar file. To see more options, you may write hfst-tokenise -h and vislcg3 -h.
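If the pipeline does not work, the cause may be a missing program or a missing file. The sketch below wraps the two-step analysis in a shell function that checks for both first. The function names missing_tools and analyse_text are hypothetical, and the sketch assumes that the analyser and the grammar are saved as LANG.pmhfst and LANG.cg3 in the current directory, as in the examples above.

```shell
# Print the names of those commands among the arguments
# that are NOT found on the PATH (one name per line).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Run the two-step analysis after checking the prerequisites.
# Arguments: language code (e.g. sme) and the text file to analyse.
analyse_text() {
    lang="$1"
    textfile="$2"
    missing=$(missing_tools hfst-tokenise vislcg3)
    if [ -n "$missing" ]; then
        echo "Missing programs: $missing" >&2
        return 1
    fi
    for f in "$lang.pmhfst" "$lang.cg3" "$textfile"; do
        [ -r "$f" ] || { echo "Cannot read $f" >&2; return 1; }
    done
    # Same pipeline as above, reading the text file on standard input.
    hfst-tokenise -cg "$lang.pmhfst" < "$textfile" | vislcg3 -g "$lang.cg3"
}
```

With the functions loaded, analyse_text sme yourtextfile.txt is equivalent to the cat command above, except that it stops with an error message when something is missing.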
You may also conduct automatic dictionary lookup; see below.
You may also use the Neahttadigisánit dictionaries on the command line. Warning: the program to be downloaded here gives translation equivalents only, not explanations or example sentences. For ordinary dictionary lookup the online dictionaries are thus far better; the programs presented here are good for automatic lookup.
The dictionaries are found in the directory of the first language, i.e. the language to translate from. Each dictionary has the file name Lang1Lang2-all.hfst.
Here are two command examples for fetching various dictionaries.
North Saami:
curl https://gtsvn.uit.no/biggies/trunk/bin/sme/smenob-all.hfst > smenob.hfst
curl https://gtsvn.uit.no/biggies/trunk/bin/nob/nobsme-all.hfst > nobsme.hfst
curl https://gtsvn.uit.no/biggies/trunk/bin/fin/finsme-all.hfst > finsme.hfst
curl https://gtsvn.uit.no/biggies/trunk/bin/sme/smefin-all.hfst > smefin.hfst
South Saami:
curl https://gtsvn.uit.no/biggies/trunk/bin/sma/smanob-all.hfst > smanob.hfst
curl https://gtsvn.uit.no/biggies/trunk/bin/nob/nobsma-all.hfst > nobsma.hfst
For other dictionaries, replace sme/smenob-all.hfst above with smn/smnfin-all.hfst, fin/finsmn-all.hfst, sma/smanob-all.hfst, nob/nobsma-all.hfst etc., and change the name of the output file (smenob.hfst) correspondingly.
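The naming rule above (the directory of the source language, and the file name Lang1Lang2-all.hfst) can be written as a small shell function. This is only a sketch: the function names dict_url and dict_pair_urls are hypothetical, and the functions do not check whether a given language pair actually exists on the server.

```shell
# Base URL taken from the download commands above.
GIELLA_BASE="https://gtsvn.uit.no/biggies/trunk/bin"

# Print the URL of the dictionary from language $1 to language $2.
# The file lives in the directory of the source language ($1).
dict_url() {
    printf '%s/%s/%s%s-all.hfst\n' "$GIELLA_BASE" "$1" "$1" "$2"
}

# Print the URLs for both directions of a language pair.
# (Printing only; pass a URL to curl to actually download it.)
dict_pair_urls() {
    dict_url "$1" "$2"
    dict_url "$2" "$1"
}
```

For instance, curl "$(dict_url smn fin)" > smnfin.hfst would fetch the Inari Saami to Finnish dictionary, provided it is available under that name.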
The dictionaries may be used in two ways (use the file name you saved the dictionary under). Either send a list with one word per line through the dictionary:
cat smn-words.txt | hfst-lookup smnfin-all.hfst
or start the dictionary interactively:
hfst-lookup smnfin-all.hfst
and thereafter write Inari Saami words and press ENTER. Leave the program with ctrl C.
Here you find links to word analysers for South, North and Inari Saami. For more languages, replace the language code (which occurs three times in each command; change all three):
curl https://gtsvn.uit.no/biggies/trunk/bin/sma/sma.hfstol > sma.hfstol
curl https://gtsvn.uit.no/biggies/trunk/bin/sme/sme.hfstol > sme.hfstol
curl https://gtsvn.uit.no/biggies/trunk/bin/smn/smn.hfstol > smn.hfstol
Use the word analysers in two ways:
a. send lists with one word per line through them, with the command:
cat wordlist | hfst-lookup smn.hfstol
b. use the analyser interactively (put it on stand-by) with the command:
hfst-lookup smn.hfstol
Then write one word at a time and press ENTER. Leave the program with the command ctrl C.
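Running text must be turned into a list with one word per line before it can be sent through the analyser as in (a). A minimal sketch with standard Unix tools follows. The function name words_per_line is hypothetical, and the approach is deliberately simple: it splits on whitespace, strips punctuation at the edges of each word, and leaves capitalisation untouched.

```shell
# Read running text on standard input and print a sorted list
# of the unique words, one word per line.
words_per_line() {
    # 1. Turn every run of whitespace into a single newline.
    # 2. Strip punctuation at the start and end of each word,
    #    and drop lines that end up empty.
    # 3. Sort and remove duplicates (byte order, for reproducibility).
    tr -s '[:space:]' '\n' |
        sed -e 's/^[[:punct:]]*//' -e 's/[[:punct:]]*$//' -e '/^$/d' |
        LC_ALL=C sort -u
}
```

With the function loaded, words_per_line < yourtextfile.txt | hfst-lookup smn.hfstol analyses every distinct word of the text once.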
Note: The spellers need the hfst-ospell program (TODO: Document how to get hfst-ospell from nightly).
curl https://gtsvn.uit.no/biggies/trunk/bin/smn/smn.zhfst > smn.zhfst
Thereafter use them in the same way as explained for the hfstol files above. The core command (presuming you have the hfst-ospell program) is:
hfst-ospell -S -n 5 smn.zhfst
The flag -S means “present correction suggestions”, and the flag -n 5 specifies the number of suggestions (here: 5).
All the above works on Linux and Mac. In order to make it work on Windows, do the following (one or the other; with a new or updated computer you probably have Windows 11, check in the control panel if you are not sure):
It is not too complicated, but requires admin rights on your machine. Thereafter, execute the commands for Linux Ubuntu above.
After having installed Ubuntu on Windows, you have a terminal window with /home/yourusername/ as your home directory (where yourusername is just that). Now we want two things: to access Windows files via the terminal, and to access the Linux files via the Windows File Manager.
You can find the path to your Windows files by writing
ls /mnt/c/Users/
One of the folders (or subfolders!) shown will hopefully be your user name. Note that this path will find files on your computer, not any files in the cloud, such as OneDrive etc.
In the Ubuntu window, write
wslpath -w $HOME
The answer you get will help Windows find your Linux (Ubuntu) files. To do that, open the yellow folder symbol, which shows the files on your computer. In the address field (to the right of the arrows at the top of the window), paste the answer you got from wslpath, and press Enter.