GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.
The word guesser game is built on the same idea as MasterMind. To build a word list for the game, you can filter an existing lemma frequency list:
cat ../../main/words/lists/smj/2021-11-03_smj_lemma.freq | # use an existing lemma list if available
grep -v ' Prop' | # Remove proper nouns - check the tag
tr -s ' ' | # squeeze spaces (check output of previous command)
cut -d' ' -f3 | # use the third field for further processing
grep -v -e '[-é\ /.]' -e '[A-Z]' -e '[0-9]' | # Remove lines containing noise characters, uppercase letters or digits
grep '^......$' | # Keep only words exactly 6 letters long - adjust if needed
hfst-lookup -q lang-smj/src/fst/analyser-gt-norm.hfstol | # analyse all extracted lemmas
grep -v 'inf$' | # Remove unrecognised lemmas
grep -v '^$' | cut -f1 | uniq | # clean up the analysis output
sort -R # randomise the list of words
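The length filter above is hard-coded as six dots. If you expect to adjust the word length, a small helper can take the length as an argument instead. This is only a sketch: the function name `keep_length` is hypothetical and not part of the GiellaLT scripts, and it assumes a UTF-8 locale so that letters such as á count as one character.

```shell
# Hypothetical helper (not part of GiellaLT): keep only the lines that
# are exactly N characters long, where N is the first argument.
# Assumes a UTF-8 locale so that non-ASCII letters count as one character.
keep_length() {
    grep -E "^.{$1}\$"
}

# Usage with placeholder words: only the five-letter word is printed.
printf '%s\n' dog horse rabbit | keep_length 5
```

In the pipeline, `keep_length 6 |` would then take the place of the `grep '^......$' |` step.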
Alternatively, you can grab the list of lemmas directly from the lexc
files:
./giella-core/scripts/extract-lemmas.sh \
lang-sje/src/fst/morphology/stems/*lexc | # Extract all lemmas from lexc
grep '^......$' | # Keep only words exactly 6 letters long
grep -v -e '[A-ZÁÆØÅÄÖ]' -e '\.' -e '[0-9]' | # Remove problem strings (uppercase letters, periods, digits)
sort -u | sort -R # clean and randomise
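Whichever pipeline you use, it can be worth checking the result before using it in the game. The following is a minimal sketch of such a check, assuming the word list has been saved to a file with one word per line. The function name `check_wordlist` and the file name `words.txt` are hypothetical, and the character classes are an assumption about what counts as noise.

```shell
# Hypothetical checker (not part of GiellaLT): print every line of a
# word list that looks unsuitable. No output means nothing was flagged.
# $1 = word list file, $2 = expected word length (defaults to 6).
check_wordlist() {
    file=$1
    len=${2:-6}
    # Lines that are not exactly $len characters long
    grep -v -E "^.{$len}\$" "$file"
    # Lines containing uppercase letters, digits, whitespace or punctuation
    grep -E '[[:upper:][:digit:][:space:][:punct:]]' "$file"
    # Words that occur more than once
    sort "$file" | uniq -d
    return 0
}
```

For example, `check_wordlist words.txt 6` prints the offending lines, if any. A line that fails more than one check is printed once per failed check.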