This page documents the Skolt Saami dictionary projects at Giellatekno.
The backbone of all dictionary projects, including the contlex files for the morphological analyzer, the Skolt Saami Oahpa, and the various user dictionaries, is the lexical data stored in a database called sms2X.
The aim of this common dictionary database is to create a rich structure in a single lexicon. We are working on a lexicographic structure that will later allow exporting data for different applications, e.g. descriptive dictionaries, bilingual learner dictionaries, Oahpa!-nuõrti, etc. Thus “sms2X” means both “to-X-languages” and “to-X-products”.
The database is the result of collaborative work carried out by Østsamisk museum Neiden, the Freiburg Research Group in Saami Studies, Giellatekno, and members of the Skolt Saami language communities.
The dictionary database sms2X is divided into several separate files, each representing one of the
Since most derivations are formed by regular, productive morphology and do not constitute lemmas of their own, they are stored in separate files for derived parts of speech, with the link to the respective root stored as a variable. Different kinds of dictionaries will later handle derivations differently:
OPEN PROBLEM: Which derivations are productive (non-lexicalized) and which are lexicalized, and how do we tag them?
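The storage scheme described above, where derived entries are kept apart from base lemmas and point back to their root, might be sketched roughly as follows. This is only an illustration under stated assumptions: the `Entry` class, the `root` field, the `resolve_root` and `export_lemmas` helpers, and the placeholder lemmas are all hypothetical and do not reflect the actual sms2X file format or real Skolt Saami data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    """Hypothetical dictionary entry; not the real sms2X schema."""
    lemma: str
    pos: str
    # Assumed link variable: the lemma of the root this entry is derived from.
    # None means the entry is a base (underived) lemma.
    root: Optional[str] = None


# Placeholder data only; these are not real Skolt Saami words.
base_entries: Dict[str, Entry] = {
    "ROOT_VERB": Entry(lemma="ROOT_VERB", pos="V"),
}
derived_entries: List[Entry] = [
    Entry(lemma="DERIVED_NOUN", pos="N", root="ROOT_VERB"),
]


def resolve_root(entry: Entry, base: Dict[str, Entry]) -> Optional[Entry]:
    """Follow the root link of a derived entry back to its base lemma."""
    if entry.root is None:
        return None
    return base.get(entry.root)


def export_lemmas(base: Dict[str, Entry], derived: List[Entry],
                  include_derivations: bool) -> List[str]:
    """List headwords for one product, optionally including derivations."""
    lemmas = sorted(base)
    if include_derivations:
        lemmas += sorted(e.lemma for e in derived)
    return lemmas
```

Keeping the derived entries in their own collection lets each exported product decide whether to list them as separate headwords or leave them out, which is one possible reading of handling derivations differently per dictionary type.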
These are the files for derived parts-of-speech: