Mansi NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-mns

Common meeting, sjd, mns

Agenda

Status

We summarised the situation for the fsts, both with coverage above 95%. The 20% drop in coverage for sjd checked in by Trond Tr yesterday looked suspicious, and Trond Ty’s test of today indeed confirmed the good results from previous testing.

Grammatically, there are some open issues, but from a speller point of view they are marginal. Loanword integration is still an issue.

Less work has been done on suggestion, especially for mobile phones.

Report to SDWG

Trond Tr will have a meeting with the project lead on this next week. We discussed the 3 issues of the report:

  1. Anticipated Deliverables and Accomplishments
  2. Anticipated Recommendations
  3. Anticipated Follow-on work.

The report itself will be done jointly with the project lead, but we looked into the following:

Anticipated Deliverables and Accomplishments

This is laid out in the plan (keyboards for relevant platforms, grammatical language models and with that spellcheckers), infrastructure for repeating this for other languages. We will be able to deliver on this.

The largest open issue is one on technical difficulties with integration into various operating systems. We will succeed in delivering most of the promised program + operative system combinations, whether we will be able to deliver all remains to be seen. A third factor is backwards compatibility: We may be able to deliver more on recent versions of the operating systems but less on older systems.

Anticipated Recommendations

The discussion consentrated on implementation issues: How to ensure the programs would be known and available to the key users and the speech community at large.

Anticipated Follow-on work.

For follow-ups we discussed 3 dimensions:

  1. Improve what we have done (improve the spellcheckers)
  2. Extend the work to new domains (e.g. grammar checkers)
  3. Extend the work to new languages.

Improve what we have done (improve the spellcheckers)

Key issues here would be

Even though the spellers will be more than good enough to be helpful, they are still not good enough (if people would have to pay for it they would have filed complaints).

Extend the work to new domains (e.g. grammar checkers)

In general, proofing tools have two target groups: Fluent speakers/writers and learners (revitalisers). For majority languages and even for large minority languages the former group is dominant, whereas especially for indigenous languages the latter group is large and in some cases dominant.

This latter group will, in addition to a spellchecker, need a grammar checker, a program that not only finds typographical errors in words but also find words spelled correctly but used wrong, in the wrong case or person form, etc.

In order to make it possible to use minority languages, dictinary and terminology projects are also important.

(For completeness: A further topic, not mentioned in the meeting, is speech technology, both synthetic speech and speech recognition)

Extend the work to new languages.

We discussed the role of locomotive languages as well as the possibility of extention.

The two languages of this project in essence function as locomotives, showing what is possible for languages with few (Mansi) and very few (Kildin Saami) resources.

The project is still conducted in a neutral and open context (github.com), and can thus be ported to other languages. There are, in fact already grammatical models (of various quality) for several RAIPON languages, e.g. Nenets, Evenki and Chukchi. Also languages for with the grammatical groundwork has been done may be turned into machine-readable grammatical models.

Next meeting

There will not be meetings in September. After that we will have separate meetings in October, but try to get a physical meeting in Helsinki during SAALS for the ones attending there.