Mansi NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-mns

mns meeting

Agenda

Speller

Testing

Look at the typos data. Prepare testing documents as documented here. Note: for Mansi we only look at non-word orthographic errors (the $ type), unannotated, thus:

{лустас}${лусытас}
лустас$лусытас # may be written without {}, we will script them in
{нас ка̄ссыг}${наска̄ссыг} # must be written with {}

When there are no multiword issues, the annotator will write лустас$лусытас and we will then script in the braces.
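The scripting step could look roughly like the sketch below. It is only an illustration under assumptions, not the project's actual script: the function name `add_braces` is hypothetical, and it assumes that an unbraced pair is a single whitespace-delimited token of the form error$correction. Pairs that already carry braces, including multiword ones, are left untouched. Punctuation attached to a token would be pulled into the braces, so the input is assumed to be tokenised.

```python
import re

# An unbraced pair: starts at a token boundary and contains no whitespace,
# no braces and no extra "$" on either side of the single "$".
UNBRACED_PAIR = re.compile(r"(?<!\S)([^\s{}$]+)\$([^\s{}$]+)")


def add_braces(line: str) -> str:
    """Rewrite error$correction as {error}${correction}.

    Already braced pairs, such as {нас ка̄ссыг}${наска̄ссыг}, do not match
    the pattern and are returned unchanged.
    """
    return UNBRACED_PAIR.sub(r"{\1}${\2}", line)


if __name__ == "__main__":
    for sample in ("лустас$лусытас", "{нас ка̄ссыг}${наска̄ссыг}"):
        print(add_braces(sample))
```

Running the script over a whole annotated file would then be a matter of applying `add_braces` to each line.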

We should

Before release

  1. Missing down to a certain threshold
  2. Precision/recall up to a certain threshold
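As a rough illustration of the two release criteria, the sketch below computes precision and recall from counts and checks them against thresholds. Everything here is an assumption: the notes do not fix any threshold values, the function names are hypothetical, and the usual definitions are assumed (precision = flagged true errors / all flagged words, recall = flagged true errors / all true errors), with "missing" taken to be the share of correct words the speller does not recognise.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true positives, false positives
    and false negatives. Empty denominators give 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def ready_for_release(missing_rate: float, precision: float, recall: float,
                      max_missing: float, min_precision: float,
                      min_recall: float) -> bool:
    """Check both criteria: missing at or below its threshold,
    precision and recall at or above theirs.
    The thresholds are supplied by the caller; none are set in the notes."""
    return (missing_rate <= max_missing
            and precision >= min_precision
            and recall >= min_recall)


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    p, r = precision_recall(tp=8, fp=2, fn=8)
    print(p, r)
```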

divvun.org and the online speller should be localised into Russian and perhaps updated as well.

Corpus

Files

Last one is 2024. https://github.com/giellalt/corpus-mns-orig

https://github.com/giellalt/corpus-mns-orig-x-closed

We do not have the new LS numbers in corpus-mns.

https://gtweb.uit.no/u_korp/

Conversion

Same problems as before. Let us have a meeting with Börre.

Next meeting

To be discussed with Jack. In Helsinki? At some Divvun week, in Tromsø.