GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started, and our Privacy document.


Language Independent Tags In The Giella Infra

There are a number of classes of tags where the class itself is language independent, but the actual tags are language specific. The classes covered here are error tags, dialect tags, area tags, semantic tags, derivation tags and originating language tags.

All such classes of tags are described below. New classes will probably be added in the future, but we’ll try to keep the document updated. See also the documentation for each language.

Each class is recognised by its tag prefix: a short string that starts with “+” and ends with “/”. Examples of such tag prefixes are +Err/, +Dial/ etc. This applies to suffix tags; for prefixing languages, prefix tags instead end with “+” as their last character.
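As an illustration of the prefix convention, here is a minimal Python sketch that groups the tags of an analysis string by class prefix. It is not part of the Giella tooling: the function names are hypothetical, it handles suffix tags only (each tag starts with “+”), it assumes the lemma contains no “+”, and the analysis string and the tag names after the prefixes are made-up placeholders.

```python
from __future__ import annotations

import re

# A class prefix is "+" followed by a name and "/", e.g. "+Err/" or "+Dial/".
CLASS_PREFIX = re.compile(r"^(\+[^+/]+/)")


def split_tags(analysis: str) -> list[str]:
    """Split an analysis such as 'lemma+N+Sg' into its '+'-initial tags.

    Assumes suffix tags only and that the lemma itself contains no '+'.
    """
    # Everything before the first '+' is taken to be the lemma and dropped.
    _, sep, rest = analysis.partition("+")
    if not sep:
        return []
    return ["+" + t for t in rest.split("+")]


def tag_class(tag: str) -> str | None:
    """Return the class prefix of a tag (e.g. '+Err/'), or None for plain tags."""
    m = CLASS_PREFIX.match(tag)
    return m.group(1) if m else None


def group_by_class(analysis: str) -> dict[str, list[str]]:
    """Collect the tags of an analysis that belong to a prefixed class."""
    groups: dict[str, list[str]] = {}
    for tag in split_tags(analysis):
        cls = tag_class(tag)
        if cls is not None:
            groups.setdefault(cls, []).append(tag)
    return groups


if __name__ == "__main__":
    # Made-up analysis string, for illustration only.
    print(group_by_class("word+N+Sg+Nom+Err/Example+Dial/Example"))
```

Plain tags such as +N carry no “/” and therefore fall outside every class, while +Err/Example and +Dial/Example are grouped under +Err/ and +Dial/ respectively.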

It is assumed — and required — that all tags described here (and all other tags, for that matter) are declared as multichar symbols in the root.lexc file of each language.
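Since every tag must be declared as a multichar symbol, a test script could check this automatically. The following is a minimal Python sketch under simplifying assumptions: the lexc parsing is deliberately naive (it ignores %-escapes and assumes the Multichar_Symbols section ends at the first LEXICON line), the function names are hypothetical rather than part of the Giella build system, and the sample root.lexc fragment is made up.

```python
from __future__ import annotations


def multichar_symbols(lexc_source: str) -> set[str]:
    """Collect symbols from the Multichar_Symbols section of lexc source text.

    Naive parser (an assumption, not the real Giella tooling): the section runs
    from the 'Multichar_Symbols' keyword to the first 'LEXICON' line, and an
    unescaped '!' starts a comment that runs to the end of the line.
    """
    symbols: set[str] = set()
    in_section = False
    for raw in lexc_source.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("!", 1)[0].strip()
        if not line:
            continue
        if line.startswith("Multichar_Symbols"):
            in_section = True
            line = line[len("Multichar_Symbols"):]
        elif line.startswith("LEXICON"):
            break
        if in_section:
            symbols.update(line.split())
    return symbols


def undeclared_tags(tags: list[str], declared: set[str]) -> list[str]:
    """Return the tags that are not declared as multichar symbols."""
    return [t for t in tags if t not in declared]


if __name__ == "__main__":
    # Made-up fragment of a root.lexc file, for illustration only.
    sample = """
Multichar_Symbols
+N +Sg +Nom   ! basic tags
+Err/Example  ! an error-class tag

LEXICON Root
"""
    declared = multichar_symbols(sample)
    print(undeclared_tags(["+N", "+Err/Example", "+Dial/Example"], declared))
```

Here +Dial/Example would be reported as undeclared, since the sample fragment never lists it under Multichar_Symbols.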

Error tags

The error tag class is defined as follows:

Dialect tags

The dialect tag class is defined as follows:

Other notes:

Area tags

The area/country tag class is defined as follows:

Other notes:

Semantic tags

The semantic tag class is defined as follows:

Other notes:

Derivation tags

The derivation tag class is defined as follows:

Originating language tags

The originating language tag class is defined as follows:

Other notes:

So far, the only speech synthesis system we have built is for North Sámi. It was, furthermore, built without our text processing technology, so the features these tags make possible (i.e. pronouncing «u» as /ʉː/ instead of the default /uː/) have not yet been put to use. We expect this to change as our text processing is applied to open-source speech synthesis systems such as Festival and Simple4All.
