North Sami NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-sme

Page Content

GRAMDIVVUN-møte 19.05.2021 10:30-12:00

Til stades: Linda, Sjur, Duommá

Regeltyper

real-adnui-atnui.yaml real-johttui-johtui.yaml

real-Derh-Inf.yaml real-DerNomActSgGen-PrfPrc.yaml real-DerNomActSgGen-PrsSg1.yaml real-DerNomActSgGen-PrtSg1.yaml real-Ess-PrfPrc.yaml real-ImprtDu1-DerPassPrsSg3.yaml real-ImprtDu1-NSgNom.yaml real-ImprtPl2-Inf.yaml real-ImprtPl2-PrsPl3.yaml —- Precision 80,6 % real-NomAgIll-PrtSg3.yaml real-PrtPl3-PrsSg3.yaml

msyn-congruence_subj-verb.yaml syn-compound.yaml

evaluering av regeltyper

real-adnui-atnui.yaml

Total passes: 807, Total fails: 140, Total: 947 True positive: 749 True negative: 58 False positive 1: 2 False positive 2: 31 False negative 1: 7 False negative 2: 100 Precision: 95.8% Recall: 87.5% F₁ score: 91.5%

real-johttui-johtui.yaml

Total passes: 355, Total fails: 39, Total: 394 True positive: 353 True negative: 2 False positive 1: 2 False positive 2: 3 False negative 1: 5 False negative 2: 29 Precision: 98.6% Recall: 91.2% F₁ score: 94.8%

real-Derh-Inf.yaml

Total passes: 873, Total fails: 72, Total: 945 True positive: 423 True negative: 450 False positive 1: 5 False positive 2: 41 False negative 1: 5 False negative 2: 21 Precision: 90.2% Recall: 94.2% F₁ score: 92.2%

real-DerNomActSgGen-PrfPrc.yaml

Total passes: 1262, Total fails: 301, Total: 1563 True positive: 711 True negative: 551 False positive 1: 14 False positive 2: 126 False negative 1: 5 False negative 2: 156 Precision: 83.5% Recall: 81.5% F₁ score: 82.5%

— evtl. sjekke

real-DerNomActSgGen-PrsSg1.yaml Total passes: 98, Total fails: 34, Total: 132 True positive: 78 True negative: 20 False positive 1: 0 False positive 2: 22 False negative 1: 0 False negative 2: 12 Precision: 78.0% Recall: 86.7% F₁ score: 82.1%

— sjekke

real-DerNomActSgGen-PrtSg1.yaml Total passes: 1217, Total fails: 153, Total: 1370 True positive: 437 True negative: 780 False positive 1: 8 False positive 2: 115 False negative 1: 6 False negative 2: 24 Precision: 78.0% Recall: 93.6% F₁ score: 85.1%

—- sjekke

real-Ess-PrfPrc.yaml Total passes: 1217, Total fails: 153, Total: 1370 True positive: 437 True negative: 780 False positive 1: 8 False positive 2: 115 False negative 1: 6 False negative 2: 24 Precision: 78.0% Recall: 93.6% F₁ score: 85.1%

—- sjekke — ser bra ut

real-ImprtDu1-DerPassPrsSg3.yaml Total passes: 273, Total fails: 77, Total: 350 True positive: 266 True negative: 7 False positive 1: 3 False positive 2: 47 False negative 1: 0 False negative 2: 27 Precision: 84.2% Recall: 90.8% F₁ score: 87.4%

real-ImprtDu1-NSgNom.yaml Total passes: 526, Total fails: 111, Total: 637 True positive: 370 True negative: 156 False positive 1: 9 False positive 2: 70 False negative 1: 1 False negative 2: 31 Precision: 82.4% Recall: 92.0% F₁ score: 87.0%

msyn-number_congruence-subj-verb) Test 632 - Passes: 1, Fails: 1, Total: 2

Total passes: 675, Total fails: 149, Total: 824 True positive: 448 True negative: 227 False positive 1: 11 False positive 2: 70 False negative 1: 7 False negative 2: 61 Precision: 84.7% Recall: 86.8% F₁ score: 85.7%

—- sjekke

real-ImprtPl2-PrsPl3.yaml Total passes: 83, Total fails: 57, Total: 140 True positive: 76 True negative: 7 False positive 1: 13 False positive 2: 17 False negative 1: 1 False negative 2: 26 Precision: 71.7% Recall: 73.8% F₁ score: 72.7%

— sjekke – ser greit nok ut

tools/grammarcheckers/tests/real-NomAgIll-PrtSg3.yaml Total passes: 323, Total fails: 28, Total: 351 True positive: 147 True negative: 176 False positive 1: 3 False positive 2: 14 False negative 1: 1 False negative 2: 10 Precision: 89.6% Recall: 93.0% F₁ score: 91.3%

real-ImprtPl2-Inf.yaml Total passes: 1147, Total fails: 285, Total: 1432 True positive: 954 True negative: 193 False positive 1: 19 False positive 2: 68 False negative 1: 3 False negative 2: 195 Precision: 91.6% Recall: 82.8% F₁ score: 87.0%

tools/grammarcheckers/tests/real-PrtPl3-PrsSg3.yaml Total passes: 2111, Total fails: 300, Total: 2411 True positive: 799 True negative: 1312 False positive 1: 5 False positive 2: 269 False negative 1: 7 False negative 2: 19 Precision: 74.5% Recall: 96.8% F₁ score: 84.2%

syn-compound.yaml -c Total passes: 4885, Total fails: 543, Total: 5428 True positive: 3432 True negative: 1453 False positive 1: 20 False positive 2: 211 False negative 1: 9 False negative 2: 303 Precision: 93.7% Recall: 91.7% F₁ score: 92.7%

Linda Wiechetek: cat real-Ess-PrfPrc-eval-2021-05-18.txt | grep ‘errorortreal’ | grep ‘tp’ | wc -l 473 cat real-Ess-PrfPrc-eval-2021-05-18.txt | grep ‘errorortreal’ | grep ‘fn’ | wc -l 84 cat real-Ess-PrfPrc-eval-2021-05-18.txt | grep ‘real-Ess-PrfPrc’ | grep ‘fp’ | wc -l 5 Precision: TP/TP+FP = 473/(473+5)=0,9895 Recall: TP/TP+FN=473/(473+84)=0,85

cat real-ImprtPl2-Inf-eval-2021-05-18.txt | grep ‘errorortreal’ | grep ‘fn’ | wc -l 188 cat real-ImprtPl2-Inf-eval-2021-05-18.txt | grep ‘errorortreal’ | grep ‘tp’ | wc -l 467 cat real-ImprtPl2-Inf-eval-2021-05-18.txt | grep ‘real-ImprtPl2-Inf’ | grep ‘fp’ | wc -l 39

Precision: TP/TP+FP = 467/(467+39)=467/506=0.92 Recall: TP/TP+FN=467/(467+188)=467/655=0.71

cat real-PrtPl3-PrsSg3-eval-2021-05-18.txt | grep ‘errorortreal’ | grep ‘fn’ | wc -l 4 cat real-PrtPl3-PrsSg3-eval-2021-05-18.txt | grep ‘errorortreal’ | grep ‘tp’ | wc -l 32 cat real-PrtPl3-PrsSg3-eval-2021-05-18.txt | grep ‘real-PrtPl3-PrsSg3’ | grep ‘fp’ | wc -l 190 Precision: TP/TP+FP = 32/(32+190)=32/222=0.14 Recall: TP/TP+FN=32/(32+4)=32/36=0.88

cat real-NomAgIll-PrtSg3-eval-2021-05-18.txt | grep ‘real-NomAgIll-PrtSg3’ | grep ‘fp’ | wc -l 5 cat real-NomAgIll-PrtSg3-eval-2021-05-18.txt | grep ‘errorortreal’ | grep ‘tp’ | wc -l 21 cat real-NomAgIll-PrtSg3-eval-2021-05-18.txt | grep ‘errorortreal’ | grep ‘fn’ | wc -l 7

Precision: TP/TP+FP = 21/(21+5)=21/26=0.807 Recall: TP/TP+FN=21/(21+4)=21/25=0.84

Slepp-prosedyre:

  1. grammatikkontrollen/cg-filene er klar(e) - alt utkommentert som ikkje skal vera med
  2. sjekk ein siste gong at alt fungerer: make + make check (Duommá og Linda)
  3. etter det gi beskjed til Sjur og ikkje rør filan til han har kompilert og kjørt make check
  4. tagg gramcheck-versjon: GramDivvun-v.1.1.0, inkl. giella-core og giella-shared (Sjur)
  5. kopier zcheck-fila til server => (Gøteborg/Sjur)
  6. sjekk at ting fungerer i GDocs/MS Word (Linda, Duommá)
  7. SoMe-meldingar: ta med dei viktigaste nye feiltypane (Linda, Duommá)

Planlagt sleppdato: 25.5.

Feil som Linda har meldt til Sjur

Manglande analyse

echo "Smihtten, ahte leagoson dat maid váikkuhan dasa ahte mo olggobeale olbmot oidnet gávtti ja gávttegeavaheami eará láhkai go mii sámit. " \
| tools/grammarcheckers/modes/trace-smegramrelese-dev.mode | less

leagoson (auxiliar + partikler) får ingen analyse:

"<leagoson>"
        "leagoson" ? &typo #4->4 ADD:9530:uncorrected-typos

icke enklitiske partiklar uppför sej enklitisk. Sjå i Bugzilla eller https://github.com/giellalt/lang-sme/issues

Manglande retteforslag

Ádelhearrá healkkehii nu ahte goasii nisttihii luhkkárgirjii láhttái, muhto son oaččut fas árjjaid go fas jurdilii ledjona mearkka birra.

også luhkkárgirjii:luhkkárgirjji blir ikkje generert sjøl om det et får en feil

"<luhkkárgirjii>"
        "girje" N Sem/Ani Sg Ill <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> ADD:2210 ADD:2210 @<ADVL MAP:23106 &real-girjji #6->6 ADD:8293:real-girjji
                "luhkkár" N Sem/Hum Cmp/SgGen Cmp <W:0.0> #6->6
real-girjji
        "girji" Sem/Ani <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> ADD:2210 ADD:2210 @<ADVL MAP:23106 N Sg Gen &SUGGEST #6->6 ADD:8293:real-girjji COPY:8301:real-girjjii
                "luhkkár" Sem/Hum Cmp/SgGen Cmp <W:0.0> #6->6
                luhkkár+Cmp/SgGen+Cmp#girji+N+Sg+Gen ?
        "girje" N Sem/Ani Sg Ill <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> ADD:2210 ADD:2210 @<ADVL MAP:23106 &real-girjji #6->6 ADD:8293:real-girjji
                "luhkkár" N Sem/Hum Cmp/SgNom Cmp <W:0.0> #6->6
real-girjji
        "girji" Sem/Ani <W:0.0> <cohort-with-dynamic-compound> <cohort-with-dynamic-compound> ADD:2210 ADD:2210 @<ADVL MAP:23106 N Sg Gen &SUGGEST #6->6 ADD:8293:real-girjji COPY:8301:real-girjjii
                "luhkkár" Sem/Hum Cmp/SgNom Cmp <W:0.0> #6->6
luhkkár+Cmp/SgNom+Cmp#girji+N+Sg+Gen ?

har det noe med retting av underlesinger å gjøre?

Problem: retteforslag blir ikkje generert, jf inputdata - ordklassetag manglar. Kvifor?

Ikkje stor bokstav i forslag

problem - ikkje stor bokstav ved forslag, sjølv om feilen har stor bokstav (Test 376/616):

Finnmárkku dearvvašvuohta galgá geahččat mot sii doaibmet iešguđege suorggis ja leat bidjan johtui prošeavtta "Psykalaš dearvvašvuođasuddjema ja fágaidrasttideaddji spesialiserejuvvon gárrenmirkodivššodeami ovdánahttin ja ođđasit organiseren".

Psykalaš:Psyhkalaš er feilen. Den blir retta, men vi får ikkje store bokstaver:

"<">"
""" PUNCT <W:0.0> &punct-aistton-left #15->15 ID:15 R:RIGHT
:16 ADD:9170:punct-aistton-left ADDRELATION(RIGHT):9183:punct-aistt
on-left-rel ADD:9170:punct-aistton-left ADD:9170:punct-aistton-left
punct-aistton-left
""" PUNCT <W:0.0> "”Psykalaš"S &punct-aistton-left &SUGGESTWF #15->15 ID:15 R:RIGHT:16 ADD:9170:punct-aistton-left ADDRELATION(RIGHT):9183:punct-aistton-left-rel COPY:9197:punct-aistton-left-sugg
punct-aistton-left
"”" PUNCT RIGHT Err/Orth <W:0.0> #15->15 ID:15 R:RIGHT:16
"<Psykalaš>"
"psyhkalaš" A Err/Orth Sem/Dummytag Attr <W:0.0> @>N MAP:22170:r86 &LINK &punct-aistton-left &typo #16->16 ID:16 ADD:9178:punct-aistton-left-link ADD:9178:punct-aistton-left-link ADD:9427:Err/Orth-any ADD:9178:punct-aistton-left-link
punct-aistton-left
typo
"psyhkalaš" A Sem/Dummytag Attr <W:0.0> @>N MAP:22170:r86 &typo &SUGGEST #16->16 ID:16 ADD:9178:punct-aistton-left-link ADD:9178:punct-aistton-left-link ADD:9427:Err/Orth-any COPY:9436:Err/Orth-any
psyhkalaš+A+Attr psyhkalaš
"psyhkalaš" A Err/Orth Sem/Dummytag Sg Nom <W:0.0> @<SUBJ MAP:23652 &LINK &punct-aistton-left &typo #16->16 ID:16 ADD:9178:punct-aistton-left-link ADD:9178:punct-aistton-left-link ADD:9427:Err/Orth-any ADD:9178:punct-aistton-left-link
punct-aistton-left
typo
"psyhkalaš" A Sem/Dummytag Sg Nom <W:0.0> @<SUBJ MAP:23652 &typo &SUGGEST #16->16 ID:16 ADD:9178:punct-aistton-left-link ADD:9178:punct-aistton-left-link ADD:9427:Err/Orth-any COPY:9436:Err/Orth-any
psyhkalaš+A+Sg+Nom psyhkalaš