Ghent workshop June 5.-9. 1018
Present: Anna, Jack, Jeremy, Sasha, Trond
The workshop
Content
- Topic for the workshop: Improving the morphology and syntax of mhr.
- Result: Much was done, but much remains.
Evaluation:
Morphology and syntax must be better coordinated before next time.
TODO
- Make benchmarks at three points: after this week, before the next workshop week, and after it.
Planning ahead
Linguistic issues
- Improving morphology
- Improving syntax
- Getting the corpus in place
  - Realistic goal: 50 million words
  - Further goal:
Corpus issues
Move from 4 to 40 million words.
This will happen in the summer or early autumn.
- Trond, Jeremy: Read the corpus documentation, collect texts, and schedule their addition to Rusbound.
- Discuss CorpusTools with Børre
- Add the texts to the corpus.
Put things into use
First and foremost: the spellcheckers.
Things to do before the next meeting
- Jack to fix the lexicon issues on his table
- Jeremy to look at the result
- Adjustment of FST and CG (what tags do we want?)
- All available corpus texts to be collected
- Trond to look at non-linguistic tagging issues
- All of us to look at the CG
- All of us to look at the corpus
Evaluation to be done:
- What is the coverage of the FST?
- What is the disambiguation rate?
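The two evaluation questions above (and the benchmarks in the TODO list) could be answered with a small script. Below is a minimal sketch in Python, not an existing tool. It assumes a hypothetical per-token format holding the FST readings and the readings left after CG, and it assumes "disambiguation rate" means the share of analysed tokens left with exactly one reading after CG. The forms and tags in the example are placeholders, not real Mari data.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One running-text token (hypothetical format, assumed for illustration)."""
    form: str
    fst_readings: list[str]  # analyses from the FST; empty if the word is unknown
    cg_readings: list[str]   # readings left after CG disambiguation


def coverage(tokens: list[Token]) -> float:
    """Share of tokens that get at least one FST analysis."""
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t.fst_readings)
    return known / len(tokens)


def disambiguation_rate(tokens: list[Token]) -> float:
    """Share of analysed tokens left with exactly one reading after CG (assumed definition)."""
    analysed = [t for t in tokens if t.fst_readings]
    if not analysed:
        return 0.0
    unique = sum(1 for t in analysed if len(t.cg_readings) == 1)
    return unique / len(analysed)


def readings_per_token(tokens: list[Token], after_cg: bool) -> float:
    """Average number of readings per analysed token, before or after CG."""
    analysed = [t for t in tokens if t.fst_readings]
    if not analysed:
        return 0.0
    total = sum(len(t.cg_readings if after_cg else t.fst_readings) for t in analysed)
    return total / len(analysed)


# Toy example with placeholder forms and tags (not real corpus data).
tokens = [
    Token("w1", ["N Sg Nom", "V Imprt Sg2"], ["N Sg Nom"]),
    Token("w2", ["Adv"], ["Adv"]),
    Token("w3", ["N Sg Gen", "N Sg Acc"], ["N Sg Gen", "N Sg Acc"]),
    Token("w4", [], []),  # unknown to the FST
]

print(f"{coverage(tokens):.2f}")                            # 0.75
print(f"{disambiguation_rate(tokens):.2f}")                 # 0.67
print(f"{readings_per_token(tokens, after_cg=False):.2f}")  # 1.67
print(f"{readings_per_token(tokens, after_cg=True):.2f}")   # 1.33
```

Running the same functions on the same text at each benchmark point would make the before/after comparison directly visible.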
Next meeting time
Possible time: last week of September
TODO (all): Check calendars.