GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.

View GiellaLT on GitHub

Page Content

How to use emacs for our purposes

Adding words, one by one

[to be written..]

Adding many words to the lexicon

Typically, we have a long list of words, (several tens of thousands). Do a reverse sort on that list. If it is called newwords, give the command rev newwords | sort | rev > r-newwords.

Take out a set of lines that behave the same way, e.g. all ending in “njárga”. Make a new file, C-x C-f njarga1, and another file njarga2. The file njarga1 should contain everything before the “:” mark, and the file njarga2 everything after the “:” mark. Cf. the line:

Stuoranjár0ga:Stuoranjár’ga GOAHTI ;

The file njarga1 should contain entries like

Stuoranjár0ga

and the file njarga2 should contain entries like

Stuoranjár’ga GOAHTI ;

So, starting out with a list containing words like “Stuoranjárga”, change the entries in njarga1 to “Stuoranjár0ga”, and the entries in njarga2 to “Stuoranjar’ga GOAHTI ;”. Use the command M-x queTAB-TAB.

Save both files, and leave emacs.

Then you should paste these two files together. To do that, on the command line write the command

paste -d”:” njarga1 njarga2 | less

-d tells that you want to use a separator mark, and the mark you want to use (:) must be in quotation marks. Then comes the file (the leftmost file first), and since you want to control the output, you should write | less. If everything works, you should replace the last part with a command that creates a new file (e.g. named > 2njarga), or, if you know what you are doing, you should just add it directly to the file where you want it, with the >> operator.

paste -d”:” njarga1 njarga2 > 2njarga

or

paste -d”:” njarga1 njarga2 >> propernoun-sme-lex.txt