FAD and corpus meeting, 20.11.2012

Present: Berit Merete, Børre, Marja, Ciprian, Sjur, Trond

Agenda:

Status

The gold digging

Ciprian has made a run (above 0.1) that M & BM have looked at, with lemma and POS+.

Status: A couple of days' work remaining.

Problem: Compounds.

$67 0 -5.798 0.0 0.2087912 språk+regle giella+njuolggadus
$167 0 -4.885 0.0 0.5604396 språk+regle giella+njuolggadus

11 0 -7.605 0.0 0.1212121 sovemedisin oađđit+dálkkas
$9 0 -7.805 0.0 1.0 sosial+satsing sosiála+áŋgiruššat
$3 0 -8.904 0.0 0.2222222 sommerhalvår geassi+jahkebealli

$3 0 -8.904 0.0 0.2222222 sommerhalvår geassi+jahkebealli
11 0 -7.605 0.0 0.1212121 sovemedisin oađđit+dálkkas
$9 0 -7.805 0.0 1.0 sosial+satsing sosiála+áŋgiruššat

NB: vuohta has disappeared (+Der/vuohta)

7 0 -8.057 0.0 0.25 handel+avtale efta+gávpi+šiehtadus

$66 0 -5.813 0.0 0.25 spesialist+helse+tjeneste<n><m> erenoamáš+dearvvas+bálvalus<n>
=> spesialist+helse+tjeneste<n><m> erenoamáš+dearvvas+Der/vuohta+bálvalus<n>
erenoamášdearvvasvuohtabálvalus
dhcp806-ans:~ ttr000$ echo erenoamášdearvvasvuohtabálvalus | usme
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
erenoamášdearvvasvuohtabálvalus	erenoamáš+A+SgGenCmp+Cmp#dearvvas+A+Attr+Der/vuohta+N+SgNomCmp+Cmp#bálvalus+N+Sg+Nom
...


dhcp806-ans:~ ttr000$ echo erenoamášdearvvasvuohtabálvalus | usme | lookup2cg
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
"<erenoamášdearvvasvuohtabálvalus>"
	 "erenoamáš#dearvvasvuohta#bálvalus" N Sg Nom

We get the lemma, but lose the parts. Cf.:

dhcp806-ans:~ ttr000$ echo sátneheasta | usme
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
sátneheasta	sátni+N+SgNomCmp+Cmp#heasta+Ani+N+Sg+Nom


dhcp806-ans:~ ttr000$ echo sátneheasta | usme | lookup2cg
"<sátneheasta>"
	 "sátne#heasta" Ani N Sg Nom
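One way to keep the parts is to read them from the raw analysis string before it goes through lookup2cg. The following is only a minimal sketch, assuming that compound parts are separated by `#`, that each part starts with its lemma, and that derivation tags start with `Der/`, as in the transcripts above. The function names `compound_parts` and `parse_lookup_line` are hypothetical and not part of any existing tool.

```python
def compound_parts(analysis: str) -> list:
    """Split a raw analysis string into compound parts.

    Assumption: parts are separated by '#', each part is 'lemma+Tag+Tag...',
    and derivation tags begin with 'Der/'. All other tags are dropped.
    """
    parts = []
    for segment in analysis.split("#"):
        fields = segment.split("+")
        lemma = fields[0]
        derivations = [f for f in fields[1:] if f.startswith("Der/")]
        parts.append("+".join([lemma] + derivations))
    return parts


def parse_lookup_line(line: str):
    """Parse one 'surface<TAB>analysis' line into (surface, parts)."""
    surface, analysis = line.rstrip("\n").split("\t", 1)
    return surface, compound_parts(analysis)


if __name__ == "__main__":
    raw = ("erenoamášdearvvasvuohtabálvalus\t"
           "erenoamáš+A+SgGenCmp+Cmp#dearvvas+A+Attr+Der/vuohta"
           "+N+SgNomCmp+Cmp#bálvalus+N+Sg+Nom")
    surface, parts = parse_lookup_line(raw)
    print(surface, "+".join(parts))
```

Joining the parts with `+` gives the same shape as the corrected candidate entry shown earlier, with `Der/vuohta` preserved.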

These words are marked.

63 0 -5.822 0.0 0.1111111 rik<adj> sátnerikkis<a>  = gt
34 0 -6.476 0.0 0.2 lære+verk<n><nt> sátni+oassi<n>  = ap


hum-tf4-ans161:second_run ttr000$ cat *candidates_ap* | grep sátnerikkis
hum-tf4-ans161:second_run ttr000$ cat *candidates_ap* | grep 'sátni+rikkis'


354 0 -4.096 0.0 0.025 samiskspråklig<adj> priváhtarievttálaš<a> gt
22 0 -6.911 0.0 0.5625 privatrettslig<adj> priváhtarievttálaš<a>
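To go through candidate lines like the ones above in bulk, they can be parsed into records. This is a rough sketch only: the notes do not say what the numeric columns mean, so the field names `count` and `scores` are placeholders, and the handling of the leading `$` and the trailing `= gt` / `ap` mark is a guess based on the examples shown here. `parse_candidate` and `parts` are hypothetical helper names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

TAG = re.compile(r"<[^>]*>")  # tags such as <n>, <adj>, <m>


@dataclass
class Candidate:
    count: int            # first column (meaning assumed)
    scores: List[float]   # the next four numeric columns (meaning unknown)
    source: str           # Norwegian side
    target: str           # North Sámi side
    mark: Optional[str]   # trailing mark such as 'gt' or 'ap', if present


def parse_candidate(line: str) -> Candidate:
    """Parse one whitespace-separated candidate line."""
    tokens = line.split()
    count = int(tokens[0].lstrip("$"))
    scores = [float(t) for t in tokens[1:5]]
    source, target = tokens[5], tokens[6]
    rest = [t for t in tokens[7:] if t != "="]
    return Candidate(count, scores, source, target, rest[0] if rest else None)


def parts(term: str) -> List[str]:
    """Strip <...> tags and split a term on '+' into its parts."""
    return TAG.sub("", term).split("+")


if __name__ == "__main__":
    c = parse_candidate("7 0 -8.057 0.0 0.25 handel+avtale efta+gávpi+šiehtadus")
    # Pairs whose two sides have different numbers of parts may deserve a look.
    if len(parts(c.source)) != len(parts(c.target)):
        print("part count differs:", c.source, c.target)
```

Comparing part counts on the two sides is just one possible way to pick out compound pairs for manual review.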

l ~/big/st/nob/nowac/nowac-1.1.lemmas_repaired.freq ## l ~/big/st/nob/nowac/nowac-1.1.lemmas.freq
l ~/big/st/nob/00_readme.txt

TODO:

Corpus conversion

Later and skip tags (on hold); the corpus work is to proceed continuously.

Corpus bugs

ACTIONS

Work ahead

See above.

Autshomato

Postponed until the next meeting.

Next meeting

Wednesday 19.12. at 09:30