GiellaLT provides rule-based language technology aimed at minority and indigenous languages
[to be written..]
Typically, we have a long list of words, (several tens of thousands). Do a reverse sort on that list. If it is called newwords, give the command rev newwords | sort | rev > r-newwords.
Take out a set of lines that behave the same way, e.g. all ending in “njárga”. Make a new file, C-x C-f njarga1, and another file njarga2. The file njarga1 should contain everything before the “:” mark, and the file njarga2 everything after the “:” mark. Cf. the line:
Stuoranjár0ga:Stuoranjár’ga GOAHTI ;
The file njarga1 should contain entries like
and the file njarga2 should contain entries like
Stuoranjár’ga GOAHTI ;
So, starting out with a list containing words like “Stuoranjárga”, change the entries in njarga1 to “Stuoranjár0ga”, and the entries in njarga2 to “Stuoranjar’ga GOAHTI ;”. Use the command M-x queTAB-TAB.
Save both files, and leave emacs.
Then you should paste these two files together. To do that, on the command line write the command
paste -d”:” njarga1 njarga2 | less
-d tells that you want to use a separator mark, and the mark you want to use (:) must be in quotation marks. Then comes the file (the leftmost file first), and since you want to control the output, you should write | less. If everything works, you should replace the last part with a command that creates a new file (e.g. named > 2njarga), or, if you know what you are doing, you should just add it directly to the file where you want it, with the >> operator.
paste -d”:” njarga1 njarga2 > 2njarga
paste -d”:” njarga1 njarga2 >> propernoun-sme-lex.txt