GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started and our Privacy document.
Linguistically, several of the orthographies listed below have internal variation; the grouping here is made from an OCR point of view.
Priority: The important dictionaries we have are in orthographies 1, 2, 3, 5 and 6. The important text-corpus orthographies are 2, 4 and 6.
TODO: Find out how many OCR models we need, and make them.
This orthography contains Danish letters plus accents marking palatalisation.
This was the dominant orthography until 1948. After 1949, its use was restricted to religious literature (“Nuorttanaste” and related texts).
A a B b C c Č č D d Đ đ E e F f
G g Ǥ ǥ H h I i J j K k L l M m
N n Ƞ ƞ O o P p R r S s Š š T t
Ŧ ŧ U u V v Ʒ ʒ Å å Æ æ Ø ø
There are training data for this orthography in tesstrain. TODO: The Stockfleth dictionary.
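When collecting more training material for this orthography, it can help to flag characters that fall outside the letter inventory above, for instance text that has been normalised to a later orthography, or OCR output with stray glyphs. The Python snippet below is a minimal sketch and not part of the GiellaLT tooling: the helper name `unexpected_letters` is hypothetical, the inventory is copied from the table above, and only alphabetic characters are checked (digits and punctuation are ignored).

```python
import unicodedata

# Letter inventory copied from the Friis table above.
FRIIS = set("""
A a B b C c Č č D d Đ đ E e F f
G g Ǥ ǥ H h I i J j K k L l M m
N n Ƞ ƞ O o P p R r S s Š š T t
Ŧ ŧ U u V v Ʒ ʒ Å å Æ æ Ø ø
""".split())


def unexpected_letters(text: str, alphabet: set[str]) -> set[str]:
    """Return alphabetic characters in `text` that are not in `alphabet`.

    Hypothetical helper for illustration. The text is NFC-normalised first,
    so a decomposed 'c' + combining caron compares equal to the letter 'č'.
    """
    text = unicodedata.normalize("NFC", text)
    return {ch for ch in text if ch.isalpha() and ch not in alphabet}


print(unexpected_letters("Ǥ ǥ Ʒ ʒ, 1948", FRIIS))  # → set()
print(unexpected_letters("ŋ z q", FRIIS))  # letters missing from the Friis table
```

Normalising before the lookup matters because scanned or converted text often mixes precomposed and decomposed forms of letters such as č and š.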
Nothing done so far.
Nothing done so far.
Most letters are the same as for Friis (ǥ is gone), but many glyphs differ from the 19th-century ones. Both the dialect basis and the orthographic rules are new, so the bigram pattern is new as well.
A a Á á B b C c Č č D d Đ đ E e
F f G g H h I i J j K k L l M m
N n Ƞ ƞ O o P p R r S s Š š T t
Ŧ ŧ U u V v Z z Ž ž Æ æ (Ø ø) Å å
TODO: Frette.
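These notes distinguish orthographies by bigram pattern as well as by letter inventory. As a rough illustration of what that means in practice, the Python sketch below counts character bigrams within words and compares two bigram profiles with cosine similarity. It is a toy under stated assumptions, not the method used for the GiellaLT OCR models: the helper names `bigrams` and `cosine` are hypothetical, cosine similarity is only one possible measure, and the input strings are placeholders where real text samples from each orthography would go.

```python
import math
import unicodedata
from collections import Counter


def bigrams(text: str) -> Counter:
    """Count adjacent character pairs inside each word (NFC-normalised, lower-cased)."""
    text = unicodedata.normalize("NFC", text).lower()
    counts = Counter()
    for word in text.split():
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bigram profiles; 0.0 if either profile is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Placeholder strings; substitute text samples from two orthographies.
    profile_1 = bigrams("ab ab bc")
    profile_2 = bigrams("ab cd cd")
    print(f"{cosine(profile_1, profile_2):.3f}")  # → 0.400
```

Two orthographies can share almost every letter and still score low here, which is one way to see why the same character set does not automatically mean the same OCR language model is suitable.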
The letters and the glyphs are the same as for Bergsland/Ruong, but the bigram pattern is different.
A a Á á B b C c Č č D d Đ đ E e
F f G g H h I i J j K k L l M m
N n Ŋ ŋ O o P p R r S s Š š T t
Ŧ ŧ U u V v Z z Ž ž
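As input to the TODO about how many OCR models are needed, the letter inventories on this page can be compared mechanically. The Python sketch below copies three alphabets as they appear above: the Friis table, the table following the note that ǥ is gone, and the final table. It prints the letters each pair does not share. The names `ALPHABET_B` and `ALPHABET_C` are placeholders for the last two tables, the optional (parenthesised) Ø ø is included in `ALPHABET_B`, and shared letters say nothing about shared glyph shapes or bigram patterns, which the notes above treat as separate factors.

```python
import unicodedata


def letters(table: str) -> set[str]:
    """Turn a whitespace-separated letter table into an NFC-normalised set."""
    return {unicodedata.normalize("NFC", tok) for tok in table.split()}


# Tables copied from this page; ALPHABET_B and ALPHABET_C are placeholder names.
FRIIS = letters("""
A a B b C c Č č D d Đ đ E e F f G g Ǥ ǥ H h I i J j K k L l M m
N n Ƞ ƞ O o P p R r S s Š š T t Ŧ ŧ U u V v Ʒ ʒ Å å Æ æ Ø ø
""")
ALPHABET_B = letters("""
A a Á á B b C c Č č D d Đ đ E e F f G g H h I i J j K k L l M m
N n Ƞ ƞ O o P p R r S s Š š T t Ŧ ŧ U u V v Z z Ž ž Æ æ Ø ø Å å
""")  # Ø ø is optional (parenthesised) in the original table
ALPHABET_C = letters("""
A a Á á B b C c Č č D d Đ đ E e F f G g H h I i J j K k L l M m
N n Ŋ ŋ O o P p R r S s Š š T t Ŧ ŧ U u V v Z z Ž ž
""")


def compare(name_x: str, x: set[str], name_y: str, y: set[str]) -> None:
    """Print the letters found only in x and only in y (sorted by code point)."""
    print(f"only in {name_x}: {' '.join(sorted(x - y))}")
    print(f"only in {name_y}: {' '.join(sorted(y - x))}")


compare("FRIIS", FRIIS, "ALPHABET_B", ALPHABET_B)
compare("ALPHABET_B", ALPHABET_B, "ALPHABET_C", ALPHABET_C)
```

With the tables as listed, the first comparison shows Ǥ ǥ and Ʒ ʒ only in Friis and Á á, Z z, Ž ž only in `ALPHABET_B`; the second shows Ƞ ƞ, Å å, Æ æ, Ø ø only in `ALPHABET_B` and Ŋ ŋ only in `ALPHABET_C`.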