GiellaLT Documentation

GiellaLT provides rule-based language technology aimed at minority and indigenous languages

How to use emacs for our purposes

Adding words, one by one

[to be written..]

Adding many words to the lexicon

Typically we have a long list of words (several tens of thousands). Start by sorting the list on the reversed words, so that words with the same ending end up next to each other. If the list is called newwords, give the command

rev newwords | sort | rev > r-newwords
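As a small illustration of what the reverse sort does, here is a sketch with a hypothetical four-word list. The sample words are made up for the example and are spelled in plain ASCII so that the result does not depend on locale settings; the real newwords file is of course much longer.

```shell
# Hypothetical sample list standing in for the real newwords file.
printf '%s\n' Stuoranjarga Guovdageaidnu Unnanjarga Alaheaidnu > newwords

# Reverse every line, sort the reversed lines, then reverse them back.
# Words that share an ending are now next to each other.
rev newwords | sort | rev > r-newwords

cat r-newwords
```

In the resulting file the two words ending in "njarga" come first and the two ending in "eaidnu" follow, which makes it easy to cut out one group at a time.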

Take out a set of lines that behave the same way, e.g. all words ending in “njárga”. Make a new file with C-x C-f njarga1, and another file called njarga2. The file njarga1 should contain everything before the “:” mark of the finished lexicon entry, and the file njarga2 everything after the “:” mark. Cf. the line:

Stuoranjár0ga:Stuoranjár’ga GOAHTI ;

The file njarga1 should contain entries like

Stuoranjár0ga

and the file njarga2 should contain entries like

Stuoranjár’ga GOAHTI ;

So, starting out with a list containing words like “Stuoranjárga”, change the entries in njarga1 to “Stuoranjár0ga”, and the entries in njarga2 to “Stuoranjár’ga GOAHTI ;”. Use the command M-x query-replace (type M-x que and press TAB to complete the command name).
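If you prefer to stay on the command line, the same two replacements can be sketched with sed instead of query-replace. This is an alternative to the emacs method described here, not a part of it. The input file name njarga-words and the second sample word are hypothetical, and the apostrophe is copied as it appears in the example line above; adjust it if your lexicon file uses a different character.

```shell
# Hypothetical input: one plain word per line, all ending in "njárga".
printf '%s\n' 'Stuoranjárga' 'Unnanjárga' > njarga-words

# Left-hand side of the entry: insert 0 before the final "ga".
sed 's/njárga$/njár0ga/' njarga-words > njarga1

# Right-hand side: insert the apostrophe and append the continuation lexicon.
sed 's/njárga$/njár’ga GOAHTI ;/' njarga-words > njarga2

cat njarga1 njarga2
```

Because both files are generated from the same input, they are guaranteed to have the same number of lines, in the same order.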

Save both files, and leave emacs.

Then you should paste these two files together. To do that, on the command line write the command

paste -d":" njarga1 njarga2 | less

The -d option says that you want to use a separator mark, and the mark you want to use (:) must be in quotation marks. Then come the files, leftmost file first. Since you want to check the output before saving it, pipe it to less by writing | less. If everything looks right, replace the last part with a redirection that creates a new file (e.g. > 2njarga), or, if you know what you are doing, append the result directly to the file where you want it, with the >> operator.

paste -d":" njarga1 njarga2 > 2njarga


paste -d":" njarga1 njarga2 >> propernoun-sme-lex.txt
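Since paste joins the files line by line, a missing or extra line in one of them would pair every following word with the wrong entry. The sketch below adds a line-count check before pasting. The two-line sample files are hypothetical stand-ins for the edited njarga1 and njarga2, and the check itself is a suggestion rather than part of the original procedure.

```shell
# Hypothetical sample files standing in for the edited njarga1 and njarga2.
printf '%s\n' 'Stuoranjár0ga' 'Unnanjár0ga' > njarga1
printf '%s\n' 'Stuoranjár’ga GOAHTI ;' 'Unnanjár’ga GOAHTI ;' > njarga2

# Only paste when both files have the same number of lines.
if [ "$(wc -l < njarga1)" -eq "$(wc -l < njarga2)" ]; then
    paste -d":" njarga1 njarga2 > 2njarga
else
    echo "line counts differ, check the files before pasting" >&2
fi

cat 2njarga
```

Writing to a separate file such as 2njarga first, and appending to the lexicon file only after looking at it, keeps a mistake from ending up in the lexicon itself.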