GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.
Present: Berit Merete, Børre, Marja, Ciprian, Sjur, Trond
Ciprian has made a run (above 0.1) that M & BM have looked at, with lemma & POS+.
Status: A couple of days' work left.
Problem: Compounds.
$67 0 -5.798 0.0 0.2087912 språk+regle
11 0 -7.605 0.0 0.1212121 sovemedisin
$3 0 -8.904 0.0 0.2222222 sommerhalvår
NB: vuohta has disappeared (+Der/vuohta)
7 0 -8.057 0.0 0.25 handel+avtale
$66 0 -5.813 0.0 0.25 spesialist+helse+tjeneste<n><m> erenoamáš+dearvvas+bálvalus<n>
=> spesialist+helse+tjeneste<n><m> erenoamáš+dearvvas+Der/vuohta+bálvalus<n>
erenoamášdearvvasvuohtabálvalus
dhcp806-ans:~ ttr000$ echo erenoamášdearvvasvuohtabálvalus | usme
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
erenoamášdearvvasvuohtabálvalus erenoamáš+A+SgGenCmp+Cmp#dearvvas+A+Attr+Der/vuohta+N+SgNomCmp+Cmp#bálvalus+N+Sg+Nom
...
dhcp806-ans:~ ttr000$ echo erenoamášdearvvasvuohtabálvalus | usme | lookup2cg
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
"<erenoamášdearvvasvuohtabálvalus>"
"erenoamáš#dearvvasvuohta#bálvalus" N Sg Nom
We get the lemma, but lose the parts. Cf.:
dhcp806-ans:~ ttr000$ echo sátneheasta | usme
0%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>100%
sátneheasta sátni+N+SgNomCmp+Cmp#heasta+Ani+N+Sg+Nom
dhcp806-ans:~ ttr000$ echo sátneheasta | usme | lookup2cg
"<sátneheasta>"
"sátne#heasta" Ani N Sg Nom
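To illustrate what is lost, here is a minimal sketch in Python (the language is an assumption, since the notes contain no code of their own). It splits a usme analysis line at the `#` compound boundaries and keeps each part's lemma and tags, including `Der/...` tags. This differs from lookup2cg, which collapses the parts into a single `#`-joined lemma. The function names `parse_lookup_line` and `derivations` are hypothetical. The input format (wordform, whitespace, analysis; `#` between parts; `+` between lemma and tags) is assumed from the output above.

```python
def parse_lookup_line(line: str) -> tuple[str, list[dict]]:
    """Split one lookup line ("wordform analysis") into the wordform and a
    list of compound parts, each with its own lemma and tags.

    Hypothetical helper; the format is assumed from the usme output above.
    """
    wordform, analysis = line.split(maxsplit=1)
    parts = []
    for chunk in analysis.strip().split("#"):  # '#' marks a compound boundary
        lemma, *tags = chunk.split("+")        # first field is the part's lemma
        parts.append({"lemma": lemma, "tags": tags})
    return wordform, parts


def derivations(parts: list[dict]) -> list[list[str]]:
    """Collect the Der/... tags per part, so derivations are not lost."""
    return [[t for t in p["tags"] if t.startswith("Der/")] for p in parts]


if __name__ == "__main__":
    line = ("erenoamášdearvvasvuohtabálvalus "
            "erenoamáš+A+SgGenCmp+Cmp#dearvvas+A+Attr+Der/vuohta"
            "+N+SgNomCmp+Cmp#bálvalus+N+Sg+Nom")
    form, parts = parse_lookup_line(line)
    print([p["lemma"] for p in parts])  # ['erenoamáš', 'dearvvas', 'bálvalus']
    print(derivations(parts))           # [[], ['Der/vuohta'], []]
```

Unlike the lookup2cg output (`"sátne#heasta"`), this keeps the dictionary lemma of each part (`sátni`, `heasta`) and their tags. As a result, the parts and the derivation remain available for matching against the candidate list.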
These words are marked.
63 0 -5.822 0.0 0.1111111 rik<adj> sátnerikkis<a> = gt
34 0 -6.476 0.0 0.2 lære+verk<n><nt> sátni+oassi<n> = ap
hum-tf4-ans161:second_run ttr000$ cat *candidates_ap* | grep sátnerikkis
hum-tf4-ans161:second_run ttr000$ cat *candidates_ap* | grep 'sátni+rikkis'
354 0 -4.096 0.0 0.025 samiskspråklig<adj> priváhtarievttálaš<a> gt
22 0 -6.911 0.0 0.5625 privatrettslig<adj> priváhtarievttálaš<a>
l ~/big/st/nob/nowac/nowac-1.1.lemmas_repaired.freq ##
l ~/big/st/nob/nowac/nowac-1.1.lemmas.freq
l ~/big/st/nob/00_readme.txt
TO DO:
Later and skip tags (on hold)
Corpus work is to continue on an ongoing basis
ACTIONS
See above.
Postpone to the next meeting.
Wednesday 19.12., 09.30