GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.
This page documents the ELAN tier structures used by our projects.
The following presents an inventory of both the linguistic types and the tiers used by our projects. Note that adherence to these structures is necessary to use the ELAN-FST script, which automatically adds annotations for word, lemma, morphological categories, and part of speech.
Name | Stereotype | Controlled Vocabulary | Purpose |
---|---|---|---|
refT | - | - | used for ref-tiers; no stereotype because ref-tiers are independent root nodes and time-alignable |
orthT | symbolic association | - | used for orth-tiers, exact time-aligned copy of the superordinate ref-tier |
wordT | symbolic subdivision | - | used for word-tiers, overall time-aligned copy of the orth-tier (and thus the ref-tier), but able to be divided into multiple equal parts |
lemmaT | symbolic subdivision | - | used for lemma-tiers, overall equally-spaced, time-aligned copy of the word-tier, but able to be divided into multiple equal parts |
morphT | symbolic subdivision | - | used for morph-tiers, overall equally-spaced, time-aligned copy of the word-tier, but able to be divided into multiple equal parts |
posT | symbolic subdivision | pos | used for pos-tiers, overall equally-spaced, time-aligned copy of the word-tier, but able to be divided into multiple equal parts |
ftT | symbolic association | - | used for free translation tiers, overall time-aligned copy of the orth-tier (and thus the ref-tier) |
noteT | symbolic association | - | used for tiers adding notes to a given parent tier, overall time-aligned copy of the parent tier |
langT | symbolic subdivision | languages | used for lang-tiers, to indicate the language(s) being used in the utterance |
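
The linguistic types in the table above can also be written down as data and turned into XML elements. The following is a minimal sketch, not the project's own tooling: the attribute names (`LINGUISTIC_TYPE_ID`, `CONSTRAINTS`, `CONTROLLED_VOCABULARY_REF`, `TIME_ALIGNABLE`, `GRAPHIC_REFERENCES`) are assumed from the ELAN annotation file format, and the helper `linguistic_type_element` is a hypothetical name introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# (name, stereotype, controlled vocabulary), copied from the table above.
# None means the table shows "-" in that column.
LINGUISTIC_TYPES = [
    ("refT", None, None),
    ("orthT", "Symbolic_Association", None),
    ("wordT", "Symbolic_Subdivision", None),
    ("lemmaT", "Symbolic_Subdivision", None),
    ("morphT", "Symbolic_Subdivision", None),
    ("posT", "Symbolic_Subdivision", "pos"),
    ("ftT", "Symbolic_Association", None),
    ("noteT", "Symbolic_Association", None),
    ("langT", "Symbolic_Subdivision", "languages"),
]


def linguistic_type_element(name, stereotype, vocabulary):
    """Build one LINGUISTIC_TYPE element (attribute names are assumptions)."""
    attrs = {
        "LINGUISTIC_TYPE_ID": name,
        # Only the type without a stereotype (refT) is treated as time-alignable.
        "TIME_ALIGNABLE": "true" if stereotype is None else "false",
        "GRAPHIC_REFERENCES": "false",
    }
    if stereotype is not None:
        attrs["CONSTRAINTS"] = stereotype
    if vocabulary is not None:
        attrs["CONTROLLED_VOCABULARY_REF"] = vocabulary
    return ET.Element("LINGUISTIC_TYPE", attrs)


for row in LINGUISTIC_TYPES:
    print(ET.tostring(linguistic_type_element(*row), encoding="unicode"))
```

Comparing the printed elements with the linguistic types in an existing template file is one way to check whether a project still follows the structure documented here.
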

Level | Name | Parent Tier | Linguistic Type | Language | Purpose |
---|---|---|---|---|---|
0 | ref | - | refT | - (numbered) | root node; time-aligned annotation units are set here, and each annotation is given a unique number |
-1 | orth | ref | orthT | vernacular | an orthographic transcription is provided here; this provides the input for the FST engine |
-2 | word | orth | wordT | vernacular | tokenized version of the orth-tier; automatically created by the ELAN-FST script |
-3 | lemma | word | lemmaT | vernacular | lemma (or lemmata in case of ambiguities) for the word form listed in the parent tier; automatically created by the ELAN-FST script |
-3 | morph | word | morphT | English (linguistics) | morphological category (or categories in case of ambiguities) for the word form listed in the parent tier; automatically created by the ELAN-FST script |
-3 | pos | word | posT | English (linguistics) | part of speech (or parts of speech in case of ambiguities) for the word form listed in the parent tier; adheres to the ‘pos’ list of the controlled vocabulary; automatically created by the ELAN-FST script |
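
The Level column in the table above follows directly from the Parent Tier column: the root is level 0 and each parent link goes one level down. A small sketch of that relationship, with the tier names, parents, and linguistic types copied from the table; the names `CORE_TIERS` and `level` are hypothetical and exist only for this illustration.

```python
# tier name -> (parent tier, linguistic type), copied from the table above
CORE_TIERS = {
    "ref": (None, "refT"),
    "orth": ("ref", "orthT"),
    "word": ("orth", "wordT"),
    "lemma": ("word", "lemmaT"),
    "morph": ("word", "morphT"),
    "pos": ("word", "posT"),
}


def level(tier, tiers=CORE_TIERS):
    """Return 0 for the root tier and one step lower per parent link."""
    depth = 0
    parent = tiers[tier][0]
    while parent is not None:
        depth -= 1
        parent = tiers[parent][0]
    return depth


levels = {name: level(name) for name in CORE_TIERS}
print(levels)
# {'ref': 0, 'orth': -1, 'word': -2, 'lemma': -3, 'morph': -3, 'pos': -3}
```
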

Level | Name | Parent Tier | Linguistic Type | Language | Purpose |
---|---|---|---|---|---|
-2 | ft-XYZ | orth | ftT | a relevant lingua franca | provides a free translation of the annotated text; XYZ is replaced with a language code (e.g. eng, rus, etc.); can occur multiple times for multiple lingua francas |
-2 | lang | orth | langT | English | indicates the language being used in an annotated utterance or part of an annotated utterance; the language name is in English; adheres to ‘languages’-list of controlled vocabulary |

Level | Name | Parent Tier | Linguistic Type | Language | Purpose |
---|---|---|---|---|---|
*** | note-XYZ | XYZ | noteT | anything | provides unstructured text-based notes for any given parent tier XYZ |
* All tiers for a given speaker are named using the tier name, the @ symbol, and a short form referring to the relevant speaker, such as ref@JKW or lemma@JKW.
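
The naming convention can be sketched as a few string helpers. All function names here are hypothetical. Combining the `ft-XYZ` and `note-XYZ` patterns with the `@` speaker suffix (for example `note-orth@JKW`) is an assumption based on a plain reading of the tables above, not something the ELAN-FST script is confirmed to require.

```python
def tier_id(name, speaker):
    """Join a tier name and a speaker short form, e.g. 'ref' + 'JKW'."""
    return f"{name}@{speaker}"


def split_tier_id(full_id):
    """Split 'lemma@JKW' into ('lemma', 'JKW'); reject malformed names."""
    name, sep, speaker = full_id.partition("@")
    if not sep or not name or not speaker:
        raise ValueError(f"not a 'name@speaker' tier name: {full_id!r}")
    return name, speaker


def ft_tier_id(lang_code, speaker):
    """Free translation tier; lang_code replaces XYZ (e.g. 'eng', 'rus')."""
    return tier_id(f"ft-{lang_code}", speaker)


def note_tier_id(parent_name, speaker):
    """Note tier; parent_name replaces XYZ with the annotated parent tier."""
    return tier_id(f"note-{parent_name}", speaker)


print(tier_id("ref", "JKW"))
print(split_tier_id("lemma@JKW"))
print(ft_tier_id("eng", "JKW"))
print(note_tier_id("orth", "JKW"))
```
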
Template files (in ELAN .etf format):