Meeting Jan 16
Participants: Antti, Conor, Trond - later: Sjur
Issues
Presentation
- web page
- Online analyser
- Form generator
- Paradigm generator
- pikiskwewina
- as dictionary and click-in-text
- Spellchecker
- for the same text
- learning tools based on Saami slides
syllabics conversion
Goal:
- to have two spellcheckers (or one spellchecker understanding both)
- Romanian as Roman
- Bulgarian as Syllabics
- Syllabics conversion online
- demo syllabics
TODO
Trond to:
- Send a list of relevant Oahpa presentations (Lene)
- Look at cgi-bin: conversion to/from syllabics
- look at the wrapped syllabic fst
Sjur to:
- look at the wrapped syllabic fst
- Take a trip to Bulgaria
Conor to:
- Add words, look at the linguistics
Speller
How to have a speller
go to langs/crk, and issue these commands:
./configure --with-hfst --enable-spellers
make
sudo make install
- Download LibreOffice if you do not have it already
(must be version 4.1 or newer). - download & install oxt (Mac version):
[http://divvun.no/static_files/voikko-2013-06-26.oxt]- restart LibreOffice if it was already open
- Open a new text document, write some text, select all
- e.g. in the lower margin of your document it says: Default Style US English
- Click on the language name, and choose More…
- set the language to Romanian (or soon: Bulgarian). Romanian (and Bulgarian) should have a blue “ABC” in front of it in the language drop down menu).
To update the speller:
- add a new noun
- make
- sudo make install
- restart LibreOffice
To repeat this on Windows:
- Download and install LibreOffice (at least 4.1)
- download and install the Windows beta oxt: [http://www.puimula.org/htp/testing/voikko-sma-fi.oxt]
- build the speller on a mac or linux machine, final file can be found in:
crk/tools/spellcheckers/fstbased/hfst/crk.zhfst - copy this zhfst file to the Windows machine, place it in:
to-be-added
Text for the presentation
[http://www.ualberta.ca/~arppe/PlainsCree.html]
The text:
nitêminân nipâw sisone iskwatemihk.
dog+N+Pl1Ex sleeps+V beside+PART door+N+LOC
waniskâw kîkisepâ.
wakes.up+V morning+N
waniskâw ekwa nohtekatew.
wakes.up+V and+PART is.hungry+V
wâpahtam ôskanisis wiyâkanihk ekwa mîciw.
sees+V bone+N+DIM bowl+N+LOC and+PART eats+V
ekota-ohci nôhkwâtam wiyâkan.
here.from+PART licks+V bowl+N
keyâpic nohtekatew ekwa kâwe nipâw.
still+PART is.hungry+V and+PART again+PART sleeps+V
Issues to fix:
- waniskâw
- Trond to fix the waniskâw x 14 (due to Trond’s flag diacritic experiments).
- nohtekatew nohtekatew +?
- VAI, to be added
- nôhkwâtam nôhkwâtam +? Conor: Check with Dorothy
- VTA ??
- mîciw mîciw +?
- VTA to be added
- ekota-ohci ekota-ohci +?
Diminutives
Working with a productive analyser for now.
yaml files
Conor has checked files and added verb types. Different verb types are to be checked in.
Verbs
Subjunctive prefix ê-
In writing:
- Either: always ê-
- Or: êh in front of vowels, no mark (thus, just an ê) in front of consonants
Solution:
Two prefixes in lexc:
- ê-
- êh^%eh
and then to h deletion before %^eh in twolc
The tag is now +Sbj, an alternative is +Conjunct mode, +Cnj, so we could do that.
Decided: Use +Cnj
Dictionary
More words in the dictionary, especially the words of the text.
Input here: comma separated stuff to crkeng/inc/:
main/words/langs/crkeng/inc/nouns.csv
cat inc/nouns.csv
atim n dog
inini n man
nâpês n boy
apiscacihkos n antelope
mâyatihk n bighorn sheep
atihk n caribou
apisimôsnos n deer
wâwaskêsniw n elk
mistatim n horse
môswa n moose
maskwa n bear
okistatowân n grizzly bear
wâpask n polar bear
sîsîp n duck
môhkomân n knife
sakâw n soup
mîcimâpo n soup
sîsîpâwi n duck egg
wâwi n egg
iskwêw n woman
nipâw v sleep
wâpam v see
<e>
<lg>
<l pos="N">sakâw</l>
</lg>
<mg>
<tg>
<t pos="N">soup</t>
</tg>
</mg>
</e>
Results to be added to
main/words/langs/crkeng/src/N_crkeng.xml
Procedure for updating the dictionary (needed: account on the gtweb machine):
[/dicts/nds/NDSUpdatingDictionaries.html]
(with pikiskwewina or guusaaw as the variable for DICT)
- Log in to the server via SSH as the NDS user
in the following way: \ssh neahtta@gtweb.uit.no\ with a password - Then follow the instructions on the page
Length on ê
Many writers do not write ê. The analyser handles both, but we should consistently always write e.g. with macron in the code.