GiellaLT Documentation

GiellaLT provides rule-based language technology aimed at minority and indigenous languages

This page documents the conventions, standards and workflows used for the ELAN annotations in the Freiburg-Tromsø Speech Corpora.


ELAN is a GUI tool for creating annotations of video and audio resources. It is used by many documentary linguists and by several language documentation projects in DOBES, HRELP and similar programs.
The program supports complex corpus searches using regular expressions, including multi-tier searches and multi-corpus searches (i.e. across several ELAN files), as well as visualization of search results (concordance, frequency, etc.). For ELAN files stored at The Language Archive (TLA), these features also work with the online tool Trova.
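As a rough, non-authoritative illustration of what a regular-expression search across several tiers and several files involves, the Python sketch below scans the ANNOTATION_VALUE elements of .eaf files (which are XML) using only the standard library. It is not how ELAN or Trova implement search. The function name `search_eaf`, the tier names `orth@A`, `ft@A` and `orth@B`, and the two trimmed sample documents are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, trimmed EAF documents standing in for files on disk.
# Real .eaf files also contain HEADER, TIME_ORDER and LINGUISTIC_TYPE elements.
SAMPLE_FILES = {
    "recording1.eaf": """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="orth@A">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>the dog sleeps</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="ft@A" PARENT_REF="orth@A">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a2" ANNOTATION_REF="a1">
        <ANNOTATION_VALUE>a dog is sleeping</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>""",
    "recording2.eaf": """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="orth@B">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>two dogs run</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>""",
}


def search_eaf(
    files: dict[str, str], pattern: str, tier_pattern: str = ".*"
) -> list[tuple[str, str, str]]:
    """Return (file name, tier ID, annotation value) for every annotation value
    matching `pattern` on a tier whose ID fully matches `tier_pattern`."""
    value_re = re.compile(pattern)
    tier_re = re.compile(tier_pattern)
    hits = []
    for file_name, xml_text in files.items():
        # For files on disk, ET.parse(path).getroot() would be used instead.
        root = ET.fromstring(xml_text)
        for tier in root.iter("TIER"):
            tier_id = tier.get("TIER_ID", "")
            if not tier_re.fullmatch(tier_id):
                continue  # multi-tier search: skip tiers outside the tier pattern
            for value in tier.iter("ANNOTATION_VALUE"):
                text = value.text or ""
                if value_re.search(text):
                    hits.append((file_name, tier_id, text))
    return hits


if __name__ == "__main__":
    for hit in search_eaf(SAMPLE_FILES, r"\bdogs?\b", r"orth@.*"):
        print(hit)
```

Passing a tier pattern such as `r"orth@.*"` restricts the search to the (hypothetical) transcription tiers; with the default pattern every tier is searched, so the translation tier in `recording1.eaf` would match as well.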

We use ELAN for annotating our video and audio resources stored at TLA, as well as for annotating and presenting our purely written text corpora (without links to multimedia). ELAN's own documentation pages are hosted at TLA.


The file name extension for ELAN files is .eaf. These are XML files (and can be opened as such), but they are normally read by the ELAN program itself, which presents them in a GUI for further editing.
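Because an .eaf file is plain XML, it can also be read programmatically. The sketch below uses only Python's standard-library `xml.etree.ElementTree` to list the time-aligned annotations on each tier, resolving the `TIME_SLOT` references to millisecond values. It assumes the EAF element names `TIME_SLOT`, `TIER`, `ALIGNABLE_ANNOTATION` and `ANNOTATION_VALUE`; the function name `read_alignable_annotations`, the tier and type names and the trimmed sample document are hypothetical, and annotations on symbolically associated tiers (`REF_ANNOTATION`) are deliberately skipped to keep it short.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed EAF document (real files contain more header
# information and LINGUISTIC_TYPE definitions).
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <HEADER MEDIA_FILE="" TIME_UNITS="milliseconds"/>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1250"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="2900"/>
  </TIME_ORDER>
  <TIER TIER_ID="orth@A" LINGUISTIC_TYPE_REF="orth">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>first utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>second utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="ft@A" LINGUISTIC_TYPE_REF="ft" PARENT_REF="orth@A">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a3" ANNOTATION_REF="a1">
        <ANNOTATION_VALUE>free translation</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_alignable_annotations(xml_text: str) -> dict[str, list[tuple[int, int, str]]]:
    """Map each tier ID to a list of (start_ms, end_ms, value) tuples."""
    root = ET.fromstring(xml_text)
    # Time slots without a TIME_VALUE are unaligned and cannot be resolved.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }
    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        # Only ALIGNABLE_ANNOTATION elements carry their own time slot references.
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # skip annotations whose boundaries are not time-aligned
            rows.append((start, end, ann.findtext("ANNOTATION_VALUE", default="")))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


if __name__ == "__main__":
    for tier_id, rows in read_alignable_annotations(SAMPLE_EAF).items():
        for start, end, value in rows:
            print(f"{tier_id}\t{start}\t{end}\t{value}")
```

For files on disk, `ET.parse(path).getroot()` would replace `ET.fromstring(...)`. Resolving `REF_ANNOTATION` values would additionally require following each `ANNOTATION_REF` back to its parent annotation.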


Current practice

Planned extension

There is a script for this on the langdoc/elan-fst page on GitHub, maintained by Niko Partanen, Joshua Wilbur and Mihael Rießler. The pipeline has been used for Komi (the Freiburg project), Pite Saami (Joshua Wilbur) and North Saami (in Oulu).

Planned external project (Zhivotova)

Annotation Conventions


*Documentation page for the ELAN tier structures used by our projects, with links to ELAN tier template files (XML files in ELAN’s .etf format)
*Documentation page for the transcription conventions applied by our projects

Related tools

*WebLicht, a web-based tool for semi-automatically annotating texts for linguistics and humanities research. Interaction between ELAN and WebLicht is still under development.