GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started and our Privacy document.
This page documents updating lexicon and fsts on the server.
For the impatient: The short version:
ssh gtdict.uit.no
sudo su neahtta
nds update DICT
nds compile DICT
nds restart DICT
The longer explanation:
1.) Log in to the server, and become the neahtta user
ssh gtdict.uit.no
sudo su neahtta
The python virtual environment should be activated automatically, this is
indicated by the starting (venv)
in your prompt. You will also be placed
in the ~/neahttadigisanit/neahtta
folder. So, your prompt should look like:
(venv) neahtta@gtdict-02:~/neahttadigisanit/neahtta$
If this does not happen, then run
cd /home/neahtta/neahttadigisanit/neahtta
. venv/bin/activate
When you see (venv) in the command prompt, continue.
2.) Run the update command
nds update DICT
Replace DICT
with sanit
, baakoeh
, etc.
The output will tell you what happened, and which dictionaries were updated.
READ THIS (TEMPORARY ERRORS): As of November 2024, The nds update DICT
command DOES NOT WORK PROPERLY! You are going to have to go to each
individual ~/gut/giellalt/dict-xxx-yyy
folder, and run git pull
manually.
To make sure everything is correct, also run git status
in the dictionaries
after pulling, to see that the git status is in order. If anything is out of
place, for example, it may say that a rebase is in progress, then steps
will have to be taken to get the git status back to normal.
Hint: Use nds ls -d DICT
to see which dict-xxx-yyy
are a part of that
dictionary instance.
3.) Run the compile command
nds compile DICT
The output will show you which dictionaries got compiled, and how many entries
(<e>
nodes) it found in total. You may also look at the compiled
dicts/xxx-yyy.xml
files directly to see that they look like expected. For
example, that they contains roughly as many lines as all the corresponding
dict-xxx-yyy/src/*.xml
files combined.
If there is an error in an XML file used in compilation, the compile script
will give an error, and the existing compiled dicts/xxx-yyy.xml
file will
NOT be written to.
Notice that if you run the compile command again, then no dictionaries will
be compiled, because the nds
script detects that the already compiled
dictionary is newer than the sources. You can use nds compile DICT -f
to force
recompilation, for example to reset sme-nob
to not have stem information.
4.) Restart the instance
nds restart DICT
If it fails to start, it will print out information about what went wrong. The instance will not start unless everything in the configuration file is in order. If an instance is meant to run without an fst, for example, then simply comment-out the line specifying the fst in the config file, and run again.
There is a command that can also run these checks, and print out information.
nds test-configuration DICT
Running it will evaluate the config, test dictionaries, and then print FST paths and last updated date. If an FST is missing from its expected path, it will be listed as MISSING. If you see any errors at the end of the process, or worse, Python errors, something is wrong and you should avoid restarting until this is corrected.
As of November 2024, all FSTs running on the server are the ones from apertium nightly. They are updated through the operating system’s usual update mechanics, namely:
sudo apt-get update && sudo apt-get upgrade
You can see the paths to all FSTs used in all dictionaries, by running:
rg "file:" neahtta/configs/*.config.yaml
You will see all FST files are located in the apertium nightly folder, namely
/usr/share/giella/LANG/FILE
.
If, in the future, some dictionary uses an FST that is not from apertium nightly, then of course that will have to be updated manually.