Eastern Mari NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources


Ghent workshop, June 5-9, 2018

Present: Anna, Jack, Jeremy, Sasha, Trond

The workshop

Content

Evaluation:

Morphology and syntax must be better coordinated before next time

TODO

Planning ahead

Linguistic issues

Expand the corpus from 4 to 40 million words

This will happen in the summer / early autumn

  1. Trond, Jeremy: Read the corpus documentation, collect texts, and schedule their addition to Rusbound.
  2. Discuss CorpusTools with Børre
  3. Add the texts to the corpus.

Put things into use

First and foremost, the spellcheckers

Things to do before the next meeting

  1. Jack to fix the lexicon issues on his table
  2. Jeremy to look at the result
  3. Adjust the FST and CG (which tags do we want?)
  4. Collect all available corpus texts
  5. Trond to look at non-linguistic tagging issues
  6. Us all to look at the CG
  7. Us all to look at the corpus

Evaluation to be done:

Next meeting time

Possible time: Last week of September

TODO (all): Check calendars.