GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.

1) alphabetically, 2) by probabilities

1) relative frequency in WP, 2) relative frequency in the actual corpus

Only words with a higher frequency in fo than in WP.
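
The filter described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes both corpora are already tokenized into plain word lists, and the function names `relative_frequencies` and `term_candidates` are hypothetical.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in the corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def term_candidates(corpus_tokens, reference_tokens):
    """Keep words whose relative frequency in the corpus is strictly
    higher than in the reference corpus (assumption: a word missing
    from the reference counts as frequency 0.0 there)."""
    corpus = relative_frequencies(corpus_tokens)
    reference = relative_frequencies(reference_tokens)
    return sorted(
        word for word, freq in corpus.items()
        if freq > reference.get(word, 0.0)
    )
```

Sorting the result alphabetically is only one option; the candidates could just as well be ranked by the size of the frequency difference.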

We are looking for terms.

could be but not so frequently

-6.146 = 50: the closer to zero, the more frequent. Confidence is the confidence for the pair.

Likelihood of these words being translations of each other.

sme = dynamic compounds, first part in nom, gen, pl

If it never changes, I can add it back. The reason they are removed is to get a smaller vocabulary size.

lemma for compound ok for sme

Updated, with all nouns, not only the high-frequency ones; also containing absolute frequency.

GIZA++ ??

n a v exit rest

árvalit+V+TV+Der2+Der/eapmi+N+SgCmp#
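
An analysis string like the one above can be taken apart with a few lines of code. This is only a sketch under two assumptions: that `+` separates the lemma from the tags that follow it, and that a trailing `#` marks a compound boundary. The function name `split_analysis` is hypothetical.

```python
def split_analysis(analysis):
    """Split an analysis string into (lemma, tags, is_compound_part).

    Assumptions: "+" separates the lemma from its tags, and a
    trailing "#" marks the string as the first part of a compound.
    """
    is_compound_part = analysis.endswith("#")
    core = analysis.rstrip("#")
    lemma, *tags = core.split("+")
    return lemma, tags, is_compound_part
```

For the string above this separates the lemma `árvalit` from the tags `V`, `TV`, `Der2`, `Der/eapmi`, `N` and `SgCmp`, and flags it as a compound part.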