GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.
1 alphabetically 2 by probabilities
1 rel frequency WP 2 rel freq actual corpus
only words with higher frequency in fo than in wp
we are looking for terms
could be but not so frequently
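The frequency comparison noted above can be sketched roughly as follows. This is only an illustration, not the actual pipeline: the function names, the token-list inputs and the ranking by frequency difference are all assumptions made for the example.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in the list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def term_candidates(domain_tokens, reference_tokens):
    """Keep only words that are relatively more frequent in the domain
    corpus than in the reference corpus (hypothetical helper)."""
    domain = relative_frequencies(domain_tokens)
    reference = relative_frequencies(reference_tokens)
    candidates = [
        (word, freq, reference.get(word, 0.0))
        for word, freq in domain.items()
        if freq > reference.get(word, 0.0)
    ]
    # Assumed ranking: largest frequency difference first, then alphabetically.
    return sorted(candidates, key=lambda c: (-(c[1] - c[2]), c[0]))


# Toy data, invented for the example.
domain_tokens = "term term lake and and".split()
reference_tokens = "and and and lake term".split()
result = term_candidates(domain_tokens, reference_tokens)
```

A word with the same relative frequency in both corpora is dropped, so only words that stand out in the domain corpus remain as term candidates.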
-6.146 = 50 – the closer to zero, the more frequent
confidence is the confidence for the pair:
likelihood of these words being translations of each other
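One way to read the "closer to zero, the more frequent" remark is as a log of a relative frequency, which is always negative or zero and rises towards zero as the count grows. The sketch below assumes that reading; the natural logarithm and the corpus size are assumptions, and it does not try to reproduce the -6.146 value from the notes.

```python
import math


def log_relative_frequency(count, corpus_size):
    """Natural log of count / corpus_size (assumed scoring, for illustration)."""
    if count <= 0 or corpus_size <= 0:
        raise ValueError("count and corpus_size must be positive")
    return math.log(count / corpus_size)


# Hypothetical corpus size, chosen only for the example.
corpus_size = 10_000
rare = log_relative_frequency(50, corpus_size)
common = log_relative_frequency(500, corpus_size)
```

With these inputs `rare` is further from zero than `common`, which matches the ordering described in the note.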
sme = dynamic compound first part nom, gen, pl
if it never changes I can add it back
the reason they are removed is to get a smaller vocabulary size
lemma for compound ok for sme
updated, with all nouns, not the ones with high containing also absolute freq
GIZA++ ??
n a v exit rest
árvalit+V+TV+Der2+Der/eapmi+N+SgCmp#