mns meeting
- Time: Aug 14th 2024
- Present: Csilla, Jaska, Trond
Agenda
- Speller
- Corpus
- Next meeting
Speller
Testing
Look at the typos data. Prepare test documents as documented here. Note: for Mansi we only look at non-word orthographic errors (the $ type), unannotated, thus:
{лустас}${лусытас}
лустас$лусытас # may be written without {}, we will script them in
{нас ка̄ссыг}${наска̄ссыг} # must be written with {}
When there are no multiword issues, the annotator will write лустас$лусытас and we will then script in the braces.
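The scripting step for the {} could look roughly like the sketch below. It is only an illustration, not an existing tool: the function name add_braces is hypothetical, and it assumes that an unbraced pair is a single whitespace-delimited token with no punctuation attached. Pairs that already have {} (including multiword ones) are left untouched.

```python
import re

# An unbraced pair: one token of the form error$correction, where neither
# side contains whitespace, braces or a further "$".
UNBRACED_PAIR = re.compile(r"(?<!\S)([^\s{}$]+)\$([^\s{}$]+)(?!\S)")


def add_braces(line: str) -> str:
    """Rewrite every unbraced error$correction token as {error}${correction}."""
    return UNBRACED_PAIR.sub(r"{\1}${\2}", line)


if __name__ == "__main__":
    for example in (
        "лустас$лусытас",
        "{лустас}${лусытас}",
        "{нас ка̄ссыг}${наска̄ссыг}",
    ):
        print(add_braces(example))
```

Only the first example line is changed by the sketch; the two braced lines pass through as they are.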
We should
Before release
- Missing down to a certain threshold
- precision/recall up to a certain threshold
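As a reminder of what the precision/recall criterion means for a speller, here is a minimal sketch using the standard definitions. The function names and the threshold values in the usage example are hypothetical; the notes do not fix any thresholds. A true positive is taken to be a misspelled word that the speller flags, a false positive a correct word that it flags, and a false negative a misspelled word that it accepts.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def meets_thresholds(tp: int, fp: int, fn: int,
                     min_precision: float, min_recall: float) -> bool:
    """Check both measures against release thresholds (values are up to us)."""
    precision, recall = precision_recall(tp, fp, fn)
    return precision >= min_precision and recall >= min_recall


if __name__ == "__main__":
    # Made-up counts and thresholds, for illustration only.
    print(precision_recall(8, 2, 2))
    print(meets_thresholds(8, 2, 2, min_precision=0.9, min_recall=0.7))
```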
divvun.org and the online speller should be localised into Russian and perhaps updated as well.
Corpus
Files
Last one is from 2024. https://github.com/giellalt/corpus-mns-orig
https://github.com/giellalt/corpus-mns-orig-x-closed
We do not have the new LS numbers in corpus-mns.
https://gtweb.uit.no/u_korp/
Conversion
Same problems as before. Let us have a meeting with Börre.
Next meeting
To be discussed with Jack. In Helsinki? At some Divvun week, in Tromsø.