GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started and our Privacy document.

This page documents the ELAN tier structures used by our projects.

Intro

The following presents an inventory of both the linguistic types and the tier types used by our projects. Note that adherence to these structures is necessary to use the ELAN-FST script, which automatically adds annotations for word, lemma, morphological categories, and part of speech.

ELAN Linguistic Types

| Name | Stereotype | Controlled Vocabulary | Purpose |
|---|---|---|---|
| refT | - | - | used for ref-tiers; no stereotype because ref-tiers are independent, time-alignable root nodes |
| orthT | symbolic association | - | used for orth-tiers, exact time-aligned copy of the superordinate ref-tier |
| wordT | symbolic subdivision | - | used for word-tiers, overall time-aligned copy of the orth-tier (and thus the ref-tier), but able to be divided into multiple equal parts |
| lemmaT | symbolic subdivision | - | used for lemma-tiers, overall equally-spaced, time-aligned copy of the word-tier, but able to be divided into multiple equal parts |
| morphT | symbolic subdivision | - | used for morph-tiers, overall equally-spaced, time-aligned copy of the word-tier, but able to be divided into multiple equal parts |
| posT | symbolic subdivision | pos | used for pos-tiers, overall equally-spaced, time-aligned copy of the word-tier, but able to be divided into multiple equal parts |
| ftT | symbolic association | - | used for free translation tiers, overall time-aligned copy of the orth-tier (and thus the ref-tier) |
| noteT | symbolic association | - | used for tiers adding notes to a given parent tier, overall time-aligned copy of the parent tier |
| langT | symbolic subdivision | languages | used for lang-tiers to indicate the language(s) being used in the utterance |
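
As an illustration only, the inventory above can be written out as `LINGUISTIC_TYPE` elements of the kind found in ELAN's XML files. This is a minimal sketch, not the content of the project's template files: the attribute names (`LINGUISTIC_TYPE_ID`, `CONSTRAINTS`, `CONTROLLED_VOCABULARY_REF`, `TIME_ALIGNABLE`, `GRAPHIC_REFERENCES`) follow the EAF format as we understand it, and the assumption that only the stereotype-free refT is time-alignable is ours.

```python
import xml.etree.ElementTree as ET

# (name, stereotype, controlled vocabulary) rows copied from the table above.
# None means the table shows "-" for that column.
LINGUISTIC_TYPES = [
    ("refT", None, None),
    ("orthT", "Symbolic_Association", None),
    ("wordT", "Symbolic_Subdivision", None),
    ("lemmaT", "Symbolic_Subdivision", None),
    ("morphT", "Symbolic_Subdivision", None),
    ("posT", "Symbolic_Subdivision", "pos"),
    ("ftT", "Symbolic_Association", None),
    ("noteT", "Symbolic_Association", None),
    ("langT", "Symbolic_Subdivision", "languages"),
]


def linguistic_type_elements():
    """Build one LINGUISTIC_TYPE element per row of the inventory."""
    elements = []
    for name, stereotype, vocabulary in LINGUISTIC_TYPES:
        attrs = {
            "LINGUISTIC_TYPE_ID": name,
            # Assumption: types without a stereotype are time-alignable,
            # symbolic types are not.
            "TIME_ALIGNABLE": "true" if stereotype is None else "false",
            "GRAPHIC_REFERENCES": "false",
        }
        if stereotype is not None:
            attrs["CONSTRAINTS"] = stereotype
        if vocabulary is not None:
            attrs["CONTROLLED_VOCABULARY_REF"] = vocabulary
        elements.append(ET.Element("LINGUISTIC_TYPE", attrs))
    return elements


if __name__ == "__main__":
    for element in linguistic_type_elements():
        print(ET.tostring(element, encoding="unicode"))
```

The template files offered for download remain the authoritative definition; the sketch is only meant to show how the three columns of the table relate to each other.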

ELAN Tiers and Tier Hierarchy

Required for each speaker:

| Level | Name | Parent Tier | Linguistic Type | Language | Purpose |
|---|---|---|---|---|---|
| 0 | ref | - | refT | - | (numbered) root node, time-aligned annotation units are set here, each annotation is provided with a unique number here |
| -1 | orth | ref | orthT | vernacular | an orthographic transcription is provided here; this provides the input for the FST engine |
| -2 | word | orth | wordT | vernacular | tokenized version of the orth-tier; automatically created by ELAN-FST-script |
| -3 | lemma | word | lemmaT | vernacular | lemma (or lemmata in case of ambiguities) for word form listed in parent tier; automatically created by ELAN-FST-script |
| -3 | morph | word | morphT | English (linguistics) | morphological category (or categories in case of ambiguities) for word form listed in parent tier; automatically created by ELAN-FST-script |
| -3 | pos | word | posT | English (linguistics) | part of speech (or parts of speech in case of ambiguities) for word form listed in parent tier; adheres to ‘pos’-list of controlled vocabulary; automatically created by ELAN-FST-script |
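
The required hierarchy can also be expressed as a small data structure and checked programmatically. The following is a hypothetical sketch, not part of the ELAN-FST script: the names `REQUIRED_TIERS`, `tier_level`, and `find_problems` are invented for illustration, and tier names are shown without the speaker suffix for brevity.

```python
# (name, parent, linguistic type, level) copied from the required-tier table.
REQUIRED_TIERS = [
    ("ref", None, "refT", 0),
    ("orth", "ref", "orthT", -1),
    ("word", "orth", "wordT", -2),
    ("lemma", "word", "lemmaT", -3),
    ("morph", "word", "morphT", -3),
    ("pos", "word", "posT", -3),
]


def tier_level(name, parents):
    """Return the level of a tier: 0 for a root node, minus one per ancestor."""
    level = 0
    while parents[name] is not None:
        name = parents[name]
        level -= 1
    return level


def find_problems(tiers):
    """Compare a mapping {name: (parent, linguistic_type)} with the table.

    Returns a list of human-readable problems; an empty list means all
    required tiers are present with the expected parent and type.
    """
    problems = []
    for name, parent, linguistic_type, _level in REQUIRED_TIERS:
        if name not in tiers:
            problems.append(f"missing tier: {name}")
        elif tiers[name] != (parent, linguistic_type):
            problems.append(f"unexpected parent or type for tier: {name}")
    return problems


if __name__ == "__main__":
    parents = {name: parent for name, parent, _type, _level in REQUIRED_TIERS}
    for name, _parent, _type, level in REQUIRED_TIERS:
        # The level in the table is simply the depth below the ref-tier.
        assert tier_level(name, parents) == level
    print(find_problems({"ref": (None, "refT"), "orth": ("ref", "orthT")}))
```

The Level column is therefore redundant with the Parent Tier column: it can be derived by counting the steps from a tier up to its ref-tier.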

Optional for each speaker:

| Level | Name | Parent Tier | Linguistic Type | Language | Purpose |
|---|---|---|---|---|---|
| -2 | ft-XYZ | orth | ftT | a relevant lingua franca | provides a free translation of the annotated text; XYZ is replaced with a language code (e.g. eng, rus, etc.); can occur multiple times for multiple lingua francas |
| -2 | lang | orth | langT | English | indicates the language being used in an annotated utterance or part of an annotated utterance; the language name is in English; adheres to ‘languages’-list of controlled vocabulary |

Optional for any tier or as a root node with its own time-alignment:

| Level | Name | Parent Tier | Linguistic Type | Language | Purpose |
|---|---|---|---|---|---|
| *** | note-XYZ | XYZ | noteT | anything | provides unstructured text-based notes for any given parent tier XYZ |

Note: all tiers for a given speaker are named using the tier name, the @ symbol, and a short form referring to the relevant speaker, for example ref@JKW or lemma@JKW.
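
The naming convention is easy to apply and to undo in code. The helpers below are a hypothetical sketch (the function names `tier_id` and `split_tier_id` are invented here and are not part of ELAN or the ELAN-FST script). Applying the speaker suffix to `ft-eng` is our reading of the rule for all tiers of a speaker, not something the page states explicitly.

```python
def tier_id(tier_name, speaker):
    """Build a full tier name such as 'ref@JKW' from its two parts."""
    return f"{tier_name}@{speaker}"


def split_tier_id(full_name):
    """Split 'lemma@JKW' into ('lemma', 'JKW').

    Raises ValueError if the name lacks the '@' separator or either part.
    """
    name, separator, speaker = full_name.rpartition("@")
    if not separator or not name or not speaker:
        raise ValueError(f"not a speaker-specific tier name: {full_name!r}")
    return name, speaker


if __name__ == "__main__":
    print(tier_id("ref", "JKW"))
    print(tier_id("ft-eng", "JKW"))
    print(split_tier_id("lemma@JKW"))
```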

ELAN Tier Template Files for Download

Template files (in ELAN .etf format):