GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started, and our Privacy document.


GiellaLT — Corpus Resources

Under construction.

This page contains a dynamically built list of all corpus repositories. For each language there are two corpus repositories, corpus-lang-orig and corpus-lang: the former contains the original files and their metadata, and the latter contains the corpus converted to text (XML) format.
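The naming pattern above can be illustrated with a minimal sketch. It assumes that the "lang" placeholder in the repository names is replaced by a lowercase language code; the helper name corpus_repo_names and the example code "sme" are hypothetical and chosen only for illustration.

```python
def corpus_repo_names(lang: str) -> tuple[str, str]:
    """Return (original-files repo, converted-text repo) for a language code.

    Assumption: the 'lang' placeholder in 'corpus-lang-orig' and
    'corpus-lang' is substituted by a lowercase language code.
    """
    code = lang.strip().lower()
    if not code:
        raise ValueError("language code must not be empty")
    # The first name holds original files and metadata,
    # the second holds the corpus in text (XML) format.
    return (f"corpus-{code}-orig", f"corpus-{code}")


# "sme" is an illustrative code, not taken from this page.
orig_repo, text_repo = corpus_repo_names("sme")
print(orig_repo)  # corpus-sme-orig
print(text_repo)  # corpus-sme
```

Whether a given pair of repositories actually exists, or is public, still has to be checked against the list on this page.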

Private repositories are not listed.

Overview

Grouped according to geography

Languages of the Nordic countries

Languages of Russia

Other European languages

Languages in North America

Languages in Africa

Languages in other parts of the world

Languages with no geography tag

Grouped according to language family

Eskimo-Aleut Languages

Indo-European Languages

Niger-Congo Languages

Turkic Languages

Uralic Languages

Languages of other language families, isolates, artificial languages

Languages with no language family tag
