GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started and our Privacy document.


How to update Ordbild, the WordPicture

Make sure you have CorpusTools installed.

Follow the instructions here.

Lemgram

A lemgram is Språkbanken's way of modeling what linguists and lexicographers refer to as lexemes and lemmas. Ordbild uses lemgrams.

Definitions of the concepts

Generation of lemgrams

Generation of lemgrams from lexc (note: this may be obsolete, read with care):

Use generator-dict-gt-norm.hfstol. We remove the tags v1, v2, ... from the FST, since it is better for the user that all variants of the same paradigm end up in the same lemgram. Many FST lemmas have more than one entry in lexc, so the list of lemmas should be deduplicated before generating forms. I suggest that we start with these files:

noun-sme-lex.txt:

For nouns, we pick three different lists: the ordinary nouns, the actors (NomAg), and the G3-marked nouns. For the other parts of speech, one command is enough. Commands to filter (ir)relevant forms:

Ordinary nouns:

egrep -v "(G3|ACTOR|CmpN/Only|ShCmp|RCmpnd|\+V\+|^\!)"

Actors (NomAg):

grep N+NomAg

G3-marked nouns:

grep N+G3
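
As a rough illustration of how the three noun commands divide up a file, here is a minimal sketch run on a handful of made-up entries. The entries, lemma names, and continuation-lexicon names are hypothetical and only mimic the general shape of lexc lines; the real noun-sme-lex.txt may look different. The sketch uses grep -E with single quotes instead of egrep with double quotes, which should select the same lines.

```shell
# Hypothetical sample standing in for noun-sme-lex.txt (not real data).
sample=$(mktemp)
cat > "$sample" <<'EOF'
! comment line
lemmaA+N:stemA NOUNLEX ;
lemmaB+N:stemB NOUNLEX ;
lemmaC+N+NomAg:stemC ACTOR ;
lemmaD+N+G3:stemD NOUNLEX ;
EOF

# Ordinary nouns: drop comments and every line the other two lists cover.
ordinary=$(grep -E -v '(G3|ACTOR|CmpN/Only|ShCmp|RCmpnd|\+V\+|^!)' "$sample")

# Actors and G3-marked nouns: in a basic regular expression "+" is literal.
actors=$(grep 'N+NomAg' "$sample")
g3=$(grep 'N+G3' "$sample")

rm -f "$sample"

printf 'ordinary:\n%s\nactors:\n%s\nG3:\n%s\n' "$ordinary" "$actors" "$g3"
```

In this sample each entry lands in exactly one of the three lists, and the comment line lands in none.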

verb-sme-lex.txt:

egrep -v "(ENDLEX|\+V|^\!)"

adj-sme-lex.txt:

egrep -v "(LEXICON|Der| Rreal | R |^\!)"

adv-sme-lex.txt:

egrep -v "(LEXICON| K |^\!)"
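
The deduplication step mentioned earlier (several lexc entries sharing one lemma) could be sketched roughly as follows. This is only an illustration under assumptions: the sample entries are made up, and the lemma is taken to be everything before the first "+" or ":" on the line, which ignores lexc escapes such as "%+" inside a lemma.

```shell
# Hypothetical filtered entries; lemmaA occurs twice with different stems.
entries='lemmaA+N:stemA NOUNLEX ;
lemmaA+N:stemA2 NOUNLEX ;
lemmaB+N:stemB NOUNLEX ;
lemmaC:stemC NOUNLEX ;'

# Keep only the lemma (text before the first "+" or ":"), then deduplicate.
lemmas=$(printf '%s\n' "$entries" | sed 's/[+:].*//' | sort -u)

printf '%s\n' "$lemmas"

# The unique lemmas could then be expanded into analysis strings and fed to a
# lookup tool together with generator-dict-gt-norm.hfstol (not run here).
```

Deduplicating on the lemma rather than on the whole line is what collapses entries that differ only in stem or continuation lexicon.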