GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.
The new infrastructure produces transducers with a different naming scheme from that of the old infrastructure. The names are longer and more descriptive.
The strings that represent the possible values for each info class are given first. The order of the info classes represents the order of the fields in the filename.
Based on the above list, we get this:
basictype-application-tagset-normativity[-dialect][-orthography].fsttype
This should give transducers with names like:
analyser-oahpa-gt-desc.hfst
generator-mt-apertium-single.hfst
analyser-gt-desc.xfst
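The pattern above can be sketched as a small helper that joins the fields in filename order. This is only an illustration, not part of the GiellaLT build system: the function name `transducer_filename` and its parameters are hypothetical, and treating the application field as optional is an assumption based on the third example name, which has none.

```python
def transducer_filename(basictype, tagset, normativity, fsttype,
                        application=None, dialect=None, orthography=None):
    """Join the naming-scheme fields in filename order, skipping unset ones.

    Hypothetical helper: the field order follows the pattern
    basictype-application-tagset-normativity[-dialect][-orthography].fsttype
    """
    fields = [basictype, application, tagset, normativity, dialect, orthography]
    # Unset (None or empty) fields are dropped before joining with hyphens.
    stem = "-".join(field for field in fields if field)
    return f"{stem}.{fsttype}"


# The three example names from the text, rebuilt from their fields.
print(transducer_filename("analyser", "gt", "desc", "hfst", application="oahpa"))
print(transducer_filename("generator", "apertium", "single", "hfst", application="mt"))
print(transducer_filename("analyser", "gt", "desc", "xfst"))
```

The helper does not validate the field values, since the list of allowed strings for each info class is not reproduced here.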
Some transducers are not lexical transducers; instead, they convert between one representational format and another, for example between dates written as text and dates written with digits. For this type of transducer there is a separate basic transducer type: transcriptor. The application name specifies what type of transcription the transducer performs, and the direction is specified using either digit2text or text2digit.
Possible application names are:
Conversion to IPA is a variant of this type of transducer, and is named:
Since no tags are involved in these types of transducers, the tagset is left out.
The transcriptors are sometimes filtered (i.e. some forms are excluded in one direction or the other), which is indicated by the suffix .filtered. Because of the issues with Xerox lookup, a further suffix indicates whether the fst is intended for lookup or for composition.
transcriptor-date-text2digit.filtered.lookup.xfst
transcriptor-clock-digit2text.filtered.lookup.xfst
transcriptor-text2ipa-desc.hfst
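As a rough sketch, the date and clock names above could be assembled as follows. The function `transcriptor_filename` and its parameters are hypothetical and chosen for illustration only. The `usage` argument is passed through as-is, because only the `lookup` suffix appears in the examples. The IPA variant is not covered, since its name follows a slightly different shape.

```python
def transcriptor_filename(application, direction, fsttype,
                          filtered=False, usage=None):
    """Build a transcriptor filename (hypothetical helper).

    application: what is being transcribed, e.g. "date" or "clock"
    direction:   "digit2text" or "text2digit"
    filtered:    append the ".filtered" suffix when True
    usage:       optional suffix such as "lookup" (assumed free-form here)
    """
    if direction not in ("digit2text", "text2digit"):
        raise ValueError(f"unknown direction: {direction!r}")
    # No tagset field: transcriptors involve no tags.
    name = f"transcriptor-{application}-{direction}"
    if filtered:
        name += ".filtered"
    if usage:
        name += f".{usage}"
    return f"{name}.{fsttype}"


print(transcriptor_filename("date", "text2digit", "xfst",
                            filtered=True, usage="lookup"))
print(transcriptor_filename("clock", "digit2text", "xfst",
                            filtered=True, usage="lookup"))
```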