GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started, and our Privacy document.

Our svn repositories

The GiellaLT linguistic code (language models, keyboards) is on git. We are gradually moving the remaining files to git, but there are still four svn repositories:

langtech - our main source code repository, with dictionaries, documentation and e-learning. We are currently migrating documentation files (like this one) to git.
biggies - large datasets like spell checker test results, recordings and test corpora
freecorpus - freely available corpus files (the non-free corpus data is available for research and development purposes upon request, and with a signed user agreement); corpus files are organised according to format, converted quality and purpose, then according to language, and then genre
speech - speech language technology data, presently speech synthesis recordings and accompanying text files

Details

langtech

browse online: [https://gtsvn.uit.no/langtech/trunk]
check out: svn co https://gtsvn.uit.no/langtech/trunk

biggies

browse online: [https://gtsvn.uit.no/biggies/trunk]
check out: svn co https://gtsvn.uit.no/biggies/trunk

freecorpus

browse online: [https://gtsvn.uit.no/freecorpus]
check out: svn co https://gtsvn.uit.no/freecorpus

speech

browse online: [https://gtsvn.uit.no/speech/trunk]
check out: svn co https://gtsvn.uit.no/speech/trunk
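
If you need more than one of the repositories, the check-out commands above can be wrapped in a small loop. The following is only a sketch: the target directory DEST, the DRY_RUN switch and the function name checkout_all are hypothetical names introduced for this example, not part of the GiellaLT tooling. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: check out the four svn repositories listed above into one parent directory.
# DEST, DRY_RUN and checkout_all are hypothetical names used only for illustration.
DEST="${DEST:-$HOME/giellalt-svn}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; set to 0 to actually run svn

base="https://gtsvn.uit.no"

checkout_all() {
    # freecorpus is checked out from its root, the others from trunk,
    # matching the URLs given in the Details section.
    for path in langtech/trunk biggies/trunk freecorpus speech/trunk; do
        name="${path%%/*}"   # repository name, e.g. "langtech"
        if [ "$DRY_RUN" = "1" ]; then
            echo "svn co $base/$path $DEST/$name"
        else
            mkdir -p "$DEST" && svn co "$base/$path" "$DEST/$name"
        fi
    done
}

checkout_all
```

Since biggies holds large datasets, you may prefer to remove it from the list unless you need it. To perform the check-outs rather than print them, run the script with DRY_RUN=0.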
