GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.

FAD and corpus meeting, 20.11.2012

Present: Berit Merete, Børre, Marja, Ciprian, Sjur, Trond

Agenda:

Status

The gold digging

Ciprian has made a run (above 0.1), with lemma and POS+, that M & BM have looked at.

Status: A couple of days' work left.

Problem: Compounds.

$67 0 -5.798 0.0 0.2087912 språk+regle giella+njuolggadus
$167 0 -4.885 0.0 0.5604396 språk+regle giella+njuolggadus

11 0 -7.605 0.0 0.1212121 sovemedisin oađđit+dálkkas
$9 0 -7.805 0.0 1.0 sosial+satsing sosiála+áŋgiruššat
$3 0 -8.904 0.0 0.2222222 sommerhalvår geassi+jahkebealli
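
For checking the compound candidates, a minimal sketch of reading one candidate line into its columns, assuming the layout seen above: five numeric columns, then the source-side and target-side entries with `+` between compound parts. The notes do not say what the numeric columns or the leading `$` mean, so the names `Candidate`, `marked` and `numbers` are hypothetical and the values are kept uninterpreted.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    marked: bool                 # line starts with "$" (meaning of the marker is an assumption)
    numbers: Tuple[float, ...]   # the five numeric columns, left uninterpreted
    source: List[str]            # source-side entry, split on "+"
    target: List[str]            # target-side entry, split on "+"


def parse_candidate(line: str) -> Candidate:
    """Parse one whitespace-separated candidate line."""
    fields = line.split()
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 columns, got {len(fields)}")
    marked = fields[0].startswith("$")
    first = int(fields[0].lstrip("$"))
    rest = tuple(float(x) for x in fields[1:5])
    # Anything after column 7 (for example a trailing "= gt") is ignored here.
    return Candidate(marked, (first,) + rest, fields[5].split("+"), fields[6].split("+"))


c = parse_candidate("7 0 -8.057 0.0 0.25 handel+avtale efta+gávpi+šiehtadus")
# A differing number of parts on the two sides is one cheap signal to inspect.
print(len(c.source), len(c.target))
```

Comparing the number of parts on each side is only a heuristic for finding lines worth a manual look, not a correctness check.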

NB! vuohta has disappeared: +Der/vuohta

7 0 -8.057 0.0 0.25 handel+avtale efta+gávpi+šiehtadus

$66 0 -5.813 0.0 0.25 spesialist+helse+tjeneste<n><m> erenoamáš+dearvvas+bálvalus<n>
=> spesialist+helse+tjeneste<n><m> erenoamáš+dearvvas+Der/vuohta+bálvalus<n>
erenoamášdearvvasvuohtabálvalus
dhcp806-ans:~ ttr000$ echo erenoamášdearvvasvuohtabálvalus | usme
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
erenoamášdearvvasvuohtabálvalus	erenoamáš+A+SgGenCmp+Cmp#dearvvas+A+Attr+Der/vuohta+N+SgNomCmp+Cmp#bálvalus+N+Sg+Nom
...


dhcp806-ans:~ ttr000$ echo erenoamášdearvvasvuohtabálvalus | usme | lookup2cg
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
"<erenoamášdearvvasvuohtabálvalus>"
	 "erenoamáš#dearvvasvuohta#bálvalus" N Sg Nom

We get the lemma, but lose the parts. Cf.:

dhcp806-ans:~ ttr000$ echo sátneheasta | usme
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
sátneheasta	sátni+N+SgNomCmp+Cmp#heasta+Ani+N+Sg+Nom


dhcp806-ans:~ ttr000$ echo sátneheasta | usme | lookup2cg
"<sátneheasta>"
	 "sátne#heasta" Ani N Sg Nom
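
The parts are still present in the raw analysis string before `lookup2cg`, so one option is to read them from there. A minimal sketch, assuming the tag layout shown in the analyses above (`#` between compound parts, `+` between tags, derivations tagged `Der/...`); the function names are hypothetical and this is not part of the existing tool chain.

```python
from typing import List, Tuple


def compound_parts(analysis: str) -> List[Tuple[str, List[str]]]:
    """Return (lemma, derivation tags) for each compound part of an analysis string."""
    parts = []
    for chunk in analysis.split("#"):      # "#" marks the compound boundary
        tags = chunk.split("+")
        lemma = tags[0]                    # the first element is the part's lemma
        ders = [t for t in tags[1:] if t.startswith("Der/")]
        parts.append((lemma, ders))
    return parts


def plus_form(analysis: str) -> str:
    """Join lemmas and derivation tags with "+", as in the candidate lists."""
    pieces = []
    for lemma, ders in compound_parts(analysis):
        pieces.append(lemma)
        pieces.extend(ders)
    return "+".join(pieces)


print(plus_form("sátni+N+SgNomCmp+Cmp#heasta+Ani+N+Sg+Nom"))
```

Applied to the analysis of erenoamášdearvvasvuohtabálvalus, this keeps `Der/vuohta` as a separate element between dearvvas and bálvalus.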

These words are marked.

63 0 -5.822 0.0 0.1111111 rik<adj> sátnerikkis<a>  = gt
34 0 -6.476 0.0 0.2 lære+verk<n><nt> sátni+oassi<n>  = ap


hum-tf4-ans161:second_run ttr000$ cat *candidates_ap* | grep sátnerikkis
hum-tf4-ans161:second_run ttr000$ cat *candidates_ap* | grep 'sátni+rikkis'


354 0 -4.096 0.0 0.025 samiskspråklig<adj> priváhtarievttálaš<a> gt
22 0 -6.911 0.0 0.5625 privatrettslig<adj> priváhtarievttálaš<a>

l ~/big/st/nob/nowac/nowac-1.1.lemmas_repaired.freq # l ~/big/st/nob/nowac/nowac-1.1.lemmas.freq
l ~/big/st/nob/00_readme.txt

TO DO:

Corpus conversion

Later and skip tags (on hold). The corpus work is to go on continuously.

Corpus bugs

ACTIONS

Work ahead

See above.

Autshomato

Postponed to the next meeting.

Next meeting

Wednesday 19.12., 09:30