GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.


Dictionary for administrative language

Status quo

Plan forward, dates

Conversion

Done

Parallelisation

Fix file names (B, C)?

To be done by the end of 18.2 (as discussed by B and C).

Sentence alignment – tca2

To be done by 22.2 (next week).

Word alignment

The previous steps must be finished before word alignment starts.

Starting 22.2, deadline 1.3.

Lexicography

Notes

FMT: Word alignment takes quite a bit of manual work: the text has to be processed with the analysers, the unnecessary formatting removed, and the appropriate tags stripped. Ideally this is done only once. The actual time spent is not huge, about a day. But we won’t get useful results until we have most of the text anyway (at the moment I have something like 3-4 files), and it doesn’t make sense for the lexicographer to look at half-finished output.
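The clean-up step mentioned in the note could be scripted so that it really is done only once and can be rerun when the remaining files arrive. The following is only a rough sketch, not the procedure actually used: it assumes the sentence-aligned text is available as pairs of strings and that the unwanted formatting consists of XML-like inline tags. The names `clean_sentence` and `clean_pairs` are hypothetical, and running the analysers is not covered.

```python
import re

# Assumption: formatting appears as XML-like inline tags, e.g. <em>...</em>.
TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")


def clean_sentence(text: str) -> str:
    """Delete inline tags and collapse whitespace runs to single spaces."""
    without_tags = TAG_RE.sub("", text)
    return WS_RE.sub(" ", without_tags).strip()


def clean_pairs(pairs):
    """Clean both sides of each aligned pair.

    Pairs in which either side is empty after cleaning are dropped,
    since they give the word aligner nothing to work with.
    """
    cleaned = []
    for source, target in pairs:
        source_clean = clean_sentence(source)
        target_clean = clean_sentence(target)
        if source_clean and target_clean:
            cleaned.append((source_clean, target_clean))
    return cleaned
```

Keeping the clean-up in one function applied to both languages means that newly parallelised files go through exactly the same treatment as the 3-4 files already processed.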