Finite state and Constraint Grammar based analysers, proofing tools and other resources
We summarised the situation for the two FSTs, both of which have coverage above 95%. The 20% drop in coverage for sjd that Trond Tr checked in yesterday looked suspicious, and Trond Ty’s test today indeed confirmed the good results from previous testing.
Grammatically, there are some open issues, but from a speller's point of view they are marginal. Loanword integration is still an issue.
Less work has been done on suggestions, especially for mobile phones.
Trond Tr will have a meeting with the project lead on this next week. We discussed the 3 issues of the report:
The report itself will be done jointly with the project lead, but we looked into the following:
This is laid out in the plan (keyboards for relevant platforms, grammatical language models and with them spellcheckers), along with infrastructure for repeating this for other languages. We will be able to deliver on this.
The largest open issue is the technical difficulty of integration into various operating systems. We will succeed in delivering most of the promised program + operating system combinations; whether we will be able to deliver all of them remains to be seen. A third factor is backwards compatibility: we may be able to deliver more on recent versions of the operating systems but less on older systems.
The discussion concentrated on implementation issues: how to ensure the programs would be known and available to the key users and the speech community at large.
For follow-ups we discussed 3 dimensions:
Key issues here would be
Even though the spellers will be more than good enough to be helpful, they are still not good enough (if people had to pay for them, they would have filed complaints).
In general, proofing tools have two target groups: fluent speakers/writers and learners (revitalisers). For majority languages and even for large minority languages the former group is dominant, whereas especially for indigenous languages the latter group is large and in some cases dominant.
This latter group will, in addition to a spellchecker, need a grammar checker: a program that not only finds typographical errors in words but also finds words that are spelled correctly but used wrongly, e.g. in the wrong case or person form.
In order to make it possible to use minority languages, dictionary and terminology projects are also important.
(For completeness: a further topic, not mentioned in the meeting, is speech technology, both synthetic speech and speech recognition.)
We discussed the role of locomotive languages as well as the possibility of extension.
The two languages of this project in essence function as locomotives, showing what is possible for languages with few (Mansi) and very few (Kildin Saami) resources.
The project is still conducted in a neutral and open context (github.com), and can thus be ported to other languages. There are, in fact, already grammatical models (of varying quality) for several RAIPON languages, e.g. Nenets, Evenki and Chukchi. Languages for which the grammatical groundwork has been done may also be turned into machine-readable grammatical models.
There will not be meetings in September. After that we will have separate meetings in October, but will try to arrange an in-person meeting in Helsinki during SAALS for those attending.