Meeting Jan 16

Participants: Antti, Conor, Trond - later: Sjur

Issues

Presentation

web page
- Online analyser
- Form generator
- Paradigm generator
pikiskwewina
- as dictionary and click-in-text
Spellchecker
- for the same text
learning tools based on Saami slides

syllabics conversion

Goal:

to have two spellcheckers (or one spellchecker understanding both)
Romanian as Roman
Bulgarian as Syllabics
Syllabics conversion online
demo syllabics

TODO

Trond to:

Send a list of relevant Oahpa presentations (Lene)
Look at cgi-bin: conversion to/from syllabics
look at the wrapped syllabic fst

Sjur to:

look at the wrapped syllabic fst
Take a trip to Bulgaria

Conor to:

Add words, look at the linguistics

Speller

How to have a speller

go to langs/crk, and issue these commands:

./configure --with-hfst --enable-spellers
make
sudo make install

Download LibreOffice if you do not have it already
(must be version 4.1 or newer).
download & install oxt (Mac version):
[http://divvun.no/static_files/voikko-2013-06-26.oxt]
- restart LibreOffice if it was already open
Open a new text document, write some text, select all
- e.g. in the lower margin of your document it says: Default Style US English
Click on the language name, and choose More…
Set the language to Romanian (or soon: Bulgarian). Romanian (and Bulgarian) should have a blue “ABC” in front of it in the language drop-down menu.

To update the speller:

add a new noun
make
sudo make install
restart LibreOffice

To repeat this on Windows:

Download and install LibreOffice (at least 4.1)
download and install the Windows beta oxt: [http://www.puimula.org/htp/testing/voikko-sma-fi.oxt]
build the speller on a Mac or Linux machine; the final file can be found in: crk/tools/spellcheckers/fstbased/hfst/crk.zhfst
copy this zhfst file to the Windows machine, place it in: to-be-added
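
Before copying the file to another machine, it can be worth checking that the build produced a usable archive. The sketch below assumes that a .zhfst file is an ordinary zip archive (as is the case for HFST-based spellers); the function name describe_zhfst is made up for this note and is not part of any tool.

```python
import zipfile


def describe_zhfst(path):
    """Return the member names of a .zhfst file.

    Assumption: the .zhfst file is a plain zip archive. Raises ValueError
    if the file cannot be read as one.
    """
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a zip archive")
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Calling describe_zhfst("crk/tools/spellcheckers/fstbased/hfst/crk.zhfst") from langs/ after the build should list the files packed into the speller; an error here suggests the build did not finish.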

Text for the presentation

[http://www.ualberta.ca/~arppe/PlainsCree.html]

The text:

nitêminân nipâw sisone iskwatemihk.
dog+N+Pl1Ex sleeps+V beside+PART door+N+LOC

waniskâw kîkisepâ.
wakes.up+V morning+N 

waniskâw ekwa nohtekatew.
wakes.up+V and+PART is.hungry+V

wâpahtam ôskanisis wiyâkanihk ekwa mîciw.
sees+V bone+N+DIM bowl+N+LOC and+PART eats+V

ekota-ohci nôhkwâtam wiyâkan.
here.from+PART licks+V bowl+N

keyâpic nohtekatew ekwa kâwe nipâw.
still+PART is.hungry+V and+PART again+PART sleeps+V

Issues to fix:

waniskâw
- Trond to fix waniskâw, which now comes out 14 times (due to Trond’s flag diacritic experiments).
nohtekatew nohtekatew +?
- VAI, to be added
nôhkwâtam nôhkwâtam +? Conor: Check with Dorothy
- VTA ??
mîciw mîciw +?
- VTA to be added
ekota-ohci ekota-ohci +?
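
A list like the one above can be pulled out of the analyser output automatically. The following is a minimal sketch, assuming the output has the shape shown here (input word, the word repeated, then +? for an unknown word); the function name and the nipâw+V line in the sample are illustrative only.

```python
def unknown_words(lookup_output):
    """Return the input words marked '+?', in first-seen order.

    Assumes whitespace-separated lines of the form 'word word +?';
    anything after the third field (e.g. a note) is ignored.
    """
    missing = []
    for line in lookup_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "+?" and fields[0] not in missing:
            missing.append(fields[0])
    return missing


sample = """\
nohtekatew nohtekatew +?
nipâw nipâw+V
mîciw mîciw +?
nohtekatew nohtekatew +?
"""
print(unknown_words(sample))  # ['nohtekatew', 'mîciw']
```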

Diminutives

Working with a productive analyser for now.

yaml files

Conor has checked files and added verb types. Different verb types are to be checked in.

Verbs

Subjunctive prefix ê-

In writing:

Either: always ê-
Or: êh in front of vowels, no mark (thus, just an ê) in front of consonants

Solution:

Two prefixes in lexc:

ê-
êh%^eh

and then do h deletion before %^eh in twolc (when a consonant follows)
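
The intended effect of the second prefix can be sketched outside lexc/twolc as plain string rewriting. This is only an illustration of the rule as described, not the twolc rule itself: it assumes the trigger symbol disappears on the surface, that no hyphen is written, and that the vowel inventory is the one listed in VOWELS. The stems in the usage note are placeholder strings, not checked conjunct forms.

```python
VOWELS = set("aâiîoôeê")  # assumed vowel inventory
TRIGGER = "^eh"           # stands in for the %^eh symbol from lexc


def realise(intermediate):
    """Map a lexc-style string such as 'êh^ehnipâ' to its surface form.

    The h before the trigger is dropped when a consonant follows, and
    the trigger itself never surfaces.
    """
    prefix, trigger, rest = intermediate.partition(TRIGGER)
    if not trigger:
        return intermediate  # no trigger present, nothing to do
    if prefix.endswith("h") and rest and rest[0] not in VOWELS:
        prefix = prefix[:-1]
    return prefix + rest
```

With these assumptions, realise("êh^ehnipâ") gives ênipâ (h deleted before a consonant) and realise("êh^ehapi") gives êhapi (h kept before a vowel).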

The tag is now +Sbj. An alternative, following the term "conjunct mode", is +Cnj, so we could use that.

Decided: Use +Cnj

Dictionary

More words are needed in the dictionary, especially the words of the text. Input goes here, as separated-value rows (word, part of speech, translation; tab-separated in the listing below) in crkeng/inc/:

main/words/langs/crkeng/inc/nouns.csv

cat inc/nouns.csv 
atim	n	dog
inini	n	man	
nâpês	n	boy	
apiscacihkos	n	antelope
mâyatihk	n	bighorn sheep
atihk	n	caribou
apisimôsnos	n	deer
wâwaskêsniw	n	elk
mistatim	n	horse
môswa	n	moose
maskwa	n	bear
okistatowân	n	grizzly bear
wâpask	n	polar bear
sîsîp	n	duck
môhkomân	n	knife
sakâw	n	soup
mîcimâpo	n	soup
sîsîpâwi	n	duck egg
wâwi	n	egg
iskwêw	n	woman
nipâw	v	sleep
wâpam	v	see

  <e>
    <lg>
      <l pos="N">sakâw</l>
    </lg>
    <mg>
      <tg>
        <t pos="N">soup</t>
      </tg>
    </mg>
  </e>

Results to be added to

main/words/langs/crkeng/src/N_crkeng.xml
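
The step from the inc/ rows to entries like the sakâw one can be sketched as follows. This is not the project's actual conversion script: it assumes tab-separated rows of word, part of speech and translation, and the mapping of "v" to "V" is a guess by analogy with the attested "n" to "N".

```python
import csv
import io
import xml.etree.ElementTree as ET

# Lowercase POS codes from the inc/ file mapped to the XML values.
# Only "n" -> "N" is attested in the example; "v" -> "V" is assumed.
POS_MAP = {"n": "N", "v": "V"}


def row_to_entry(lemma, pos, gloss):
    """Build one <e> element shaped like the sakâw example."""
    xml_pos = POS_MAP[pos]
    e = ET.Element("e")
    lg = ET.SubElement(e, "lg")
    ET.SubElement(lg, "l", pos=xml_pos).text = lemma
    mg = ET.SubElement(e, "mg")
    tg = ET.SubElement(mg, "tg")
    ET.SubElement(tg, "t", pos=xml_pos).text = gloss
    return e


def convert(text):
    """Turn tab-separated rows into a list of <e> elements."""
    entries = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        fields = [f.strip() for f in row if f.strip()]
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        entries.append(row_to_entry(*fields))
    return entries


sample = "sakâw\tn\tsoup\ninini\tn\tman\t\n"
for entry in convert(sample):
    print(ET.tostring(entry, encoding="unicode"))
```

The output is unindented XML; trailing tabs, as in the inini and nâpês rows, are ignored.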

Procedure for updating the dictionary (needed: account on the gtweb machine):

[/dicts/nds/NDSUpdatingDictionaries.html]

(with pikiskwewina or guusaaw as the value of the DICT variable)

Log in to the server via SSH as the NDS user in the following way:

ssh neahtta@gtweb.uit.no

with a password
Then follow the instructions on the page

Length on ê

Many writers do not mark length on ê and write plain e instead. The analyser handles both, but in the code we should consistently write ê, with the length mark.
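
A quick check for unmarked forms in word lists could look like the sketch below. It assumes that every e in a Plains Cree word form is long, so that a plain e is always a missing length mark; it should only be run on word forms, not on tags or symbols such as %^eh. Both function names are made up for this note.

```python
import unicodedata


def mark_e_length(word):
    """Rewrite every plain e as ê (assumes each e is long)."""
    composed = unicodedata.normalize("NFC", word)  # e + combining mark -> ê
    return composed.replace("e", "ê")


def unmarked(words):
    """List the words that still contain a plain e."""
    return [w for w in words if "e" in unicodedata.normalize("NFC", w)]
```

For example, unmarked(["ekwa", "iskwêw"]) returns only ekwa, and mark_e_length("ekwa") gives êkwa.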
