There are a number of tag classes where the classes themselves are language independent, but the actual tags are language specific. Examples of such classes are error tags, dialect tags and semantic tags.
All such classes of tags are described below. New classes will probably be added in the future, but we’ll try to keep the document updated. See also the documentation for each language.
Each class is recognised by its tag prefix, a short string starting with “+” (for suffix tags; prefix tags for prefixing languages instead end with “+” as their last character) and ending with “/”. Examples of such tag prefixes are +Err/, +Dial/, etc.
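The prefix convention can be illustrated with a minimal Python sketch. It handles suffix-style tags only, and the regular expression and the function names `tag_class` and `group_by_class` are assumptions made for this illustration, not part of the GiellaLT infrastructure.

```python
import re

# Suffix-style tag: "+", a class name, "/", then a language-specific value.
# This pattern is an assumption derived from the description above.
TAG_RE = re.compile(r"^\+(?P<cls>[^/+]+)/(?P<value>.+)$")


def tag_class(tag):
    """Return the tag prefix (e.g. "+Err/") of a suffix tag, or None."""
    match = TAG_RE.match(tag)
    if match is None:
        return None
    return "+" + match.group("cls") + "/"


def group_by_class(tags):
    """Group tags by class prefix; tags without a class prefix are skipped."""
    groups = {}
    for tag in tags:
        prefix = tag_class(tag)
        if prefix is not None:
            groups.setdefault(prefix, []).append(tag)
    return groups
```

A tag such as `+N` has no class prefix, so `tag_class` returns `None` for it. The tag values used when trying this out (for instance `+Err/Orth`) are only examples.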
It is assumed, and required, that all tags described here (and all other tags, for that matter) are declared as multichar symbols in the root.lexc file of each language.
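One way to check this requirement is sketched below in Python. The sketch assumes that the declarations sit in a `Multichar_Symbols` section that runs until the first `LEXICON` line, and that `!` starts a comment. It does not handle an escaped `%!`. The helper names and the sample content are hypothetical.

```python
def declared_multichars(lexc_text):
    """Collect the symbols listed in the Multichar_Symbols section."""
    symbols = set()
    in_section = False
    for raw_line in lexc_text.splitlines():
        # Drop comments (simplification: an escaped "%!" is not handled).
        line = raw_line.split("!", 1)[0].strip()
        if line.startswith("Multichar_Symbols"):
            in_section = True
            line = line[len("Multichar_Symbols"):]
        elif line.startswith("LEXICON"):
            in_section = False
        if in_section:
            symbols.update(line.split())
    return symbols


def undeclared_tags(lexc_text, tags):
    """Return the tags that are not declared as multichar symbols."""
    declared = declared_multichars(lexc_text)
    return [tag for tag in tags if tag not in declared]


# Hypothetical root.lexc content, for illustration only.
SAMPLE = """Multichar_Symbols
+Err/Orth   ! an error tag
+Dial/XX
LEXICON Root
Nouns ;
"""

if __name__ == "__main__":
    print(undeclared_tags(SAMPLE, ["+Err/Orth", "+Sem/Hum"]))
```

Here `+Sem/Hum` is reported because it does not appear in the sample's declaration section.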
The error tag class is defined as follows:
+Err/
The dialect tag class is defined as follows:
+Dial/
When the DIALECTS variable is set in configure.ac, one filter for each dialect defined there is built automatically. Each filter removes all strings tagged with a dialect different from the one specific to the filter. Untagged strings are left as is.
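The filtering behaviour described here can be mimicked in a few lines of Python. This is only a sketch of the semantics, not the automatically built filter itself: it treats analyses as plain strings, considers plain `+Dial/` tags only, and uses the hypothetical dialect names `XX` and `YY`.

```python
import re

# Matches a dialect tag and captures the dialect name that follows the prefix.
DIAL_RE = re.compile(r"\+Dial/([^+]+)")


def dialect_filter(strings, dialect):
    """Keep untagged strings and strings tagged only with `dialect`."""
    kept = []
    for s in strings:
        dialects = DIAL_RE.findall(s)
        # all() is True for an empty list, so untagged strings pass through.
        if all(d == dialect for d in dialects):
            kept.append(s)
    return kept


# Hypothetical analysis strings, for illustration only.
FORMS = [
    "forma+N+Sg+Nom",
    "formb+N+Dial/XX+Sg+Nom",
    "formc+N+Dial/YY+Sg+Nom",
]

if __name__ == "__main__":
    print(dialect_filter(FORMS, "XX"))
```

With the `XX` filter, the untagged form and the `XX` form are kept, while the `YY` form is removed.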
The dialect tags are presently only made use of in Oahpa generators.
Other notes:
+ or –, denoting either inclusion (the entry/form is valid for the specified dialect) or exclusion (the entry/form is NOT valid for the specified dialect, but is valid for all others)
configure.ac for the variable DIALECTS.
The area/country tag class is defined as follows:
+Area/
Other notes:
The semantic tag class is defined as follows:
+Sem/
Other notes:
The derivation tag class is defined as follows:
+Der/
The originating language tag class is defined as follows:
+OLang/
+OLang/ language, after which it is possible to apply OLang-specific phonetic rules
Other notes:
So far the only speech synthesis system we have built is for North Sámi. It was furthermore built without using our text processing technology, and the features made possible by these tags (i.e. pronouncing «u» as /ʉː/ instead of the default /uː/) have so far not been put to use. But we expect that to change in the future, as the text processing is applied to open-source speech synthesis systems such as Festival and Simple4All.
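How an originating-language tag could select a pronunciation is sketched below in Python. As noted, nothing like this is in use yet, so the whole block is hypothetical: the language code `XXX`, the rule tables and the function `phoneme_for` are invented for the illustration, and only the «u» example is taken from the text.

```python
import re

# Captures the language code that follows the +OLang/ prefix.
OLANG_RE = re.compile(r"\+OLang/([^+]+)")

# Default letter-to-phoneme mapping, and per-language overrides.
# "XXX" is a placeholder language code, not a real tag value.
DEFAULT = {"u": "uː"}
OVERRIDES = {"XXX": {"u": "ʉː"}}


def phoneme_for(letter, analysis):
    """Pick a phoneme for `letter`, honouring any +OLang/ tag in `analysis`."""
    match = OLANG_RE.search(analysis)
    rules = OVERRIDES.get(match.group(1), {}) if match else {}
    # Fall back to the default mapping when no override applies.
    return rules.get(letter, DEFAULT.get(letter))
```

An analysis without a `+OLang/` tag, or with a language that has no override table, falls back to the default mapping.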