Finite state and Constraint Grammar based analysers, proofing tools and other resources
This file is now abandoned, as our bugs are reported and solved in our Bugzilla bug report system. This file is kept here for nostalgic reasons.
The analyser accepts girkudáidda but not girkodáidda. Vowel shortening in compounds thus does not quite work. Other examples:
Muitosuodjalus
oskkoldat
oskkoldat oskkoldat+N+Sg+Nom
diehtaga
diehtaga died1a+N+Sg+Gen
diehtaga died1a+N+Sg+Acc
oskkoldatdiehtaga
oskkoldatdiehtaga oskkoldatdiehtaga +?
oahpaheaddjeoahpus
oahpaheaddjeoahpus oahpaheaddjeoahpus +?
Answer: Nom + Nom compounds are not accepted for this type of word.
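Unanalysed forms such as the ones above can be pulled out of a longer lookup log automatically. The following is a minimal Python sketch, assuming the output format shown above (word form, then analysis, with `+?` marking a failed lookup). The function name `unanalysed_words` is hypothetical and not part of the project's tools.

```python
def unanalysed_words(lookup_output):
    """Return the word forms whose analysis ends in '+?', in order of first appearance."""
    missing = []
    for line in lookup_output.splitlines():
        fields = line.split()
        # A result line has a word form followed by an analysis;
        # bare input lines (a single field) are skipped.
        if len(fields) >= 2 and fields[-1].endswith("+?"):
            if fields[0] not in missing:
                missing.append(fields[0])
    return missing


# Sample lines copied from the session above.
SAMPLE = """oskkoldat
oskkoldat oskkoldat+N+Sg+Nom
oskkoldatdiehtaga
oskkoldatdiehtaga oskkoldatdiehtaga +?
oahpaheaddjeoahpus
oahpaheaddjeoahpus oahpaheaddjeoahpus +?
"""

for word in unanalysed_words(SAMPLE):
    print(word)
```

Run over a whole test corpus, a list like this gives the compounds that still need lexicon work.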
dárkil works but dárkileappot does not.
dohppema
dohppema dohppema +?
beastima
beastima beastit+V+n+N+Actio+Sg+Gen
beastima beastit+V+n+N+Actio+Sg+Acc
besten, dohppen and beastin are analysed, but bestema and dohppema are not, unlike beastima. This is a problem in the DOHPPE lexicon.
goappas1 (west)
goappas1iid
goappas1iid
goappas1iidda
goappas1iin
goappas1iiguin
goappas1in
guktot (east)
guktuid
guktuid
guktuide
guktuin
guktuiguin
gukton (?)
HAVE A LOOK.
The parser gives bealkálas from bealkit, which is correct, but it overgenerates to joavdálas for joavdit, where the correct form should be jovdelas. Look into this.
There are two documented patterns:
Lene -> Lenii
Manasse -> Manassei
The question is: can any generalisations be made?
Missing POS
otnás1
otnás1 otnás1+Sg+Gen
otnás1 otnás1+Sg+Acc
otnás1 otnás1+Sg+Nom
otnás1 otnás1+N+Sg+Gen
otnás1 otnás1+N+Sg+Acc
otnás1 otnás1+N+Sg+Nom
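Analyses that lack a part-of-speech tag, like the first three otnás1 readings above, can be flagged mechanically. The sketch below is only an illustration: the helper `missing_pos` is hypothetical, and the POS set contains only the tags that occur in this file (N, A, V, Pron), so it would need extending for real use.

```python
# POS tags seen in this file; an assumption, not the full tag set.
POS_TAGS = {"N", "A", "V", "Pron"}


def missing_pos(analyses):
    """Return the analyses whose first tag after the lemma is not a known POS tag."""
    flagged = []
    for analysis in analyses:
        lemma, *tags = analysis.split("+")
        if not tags or tags[0] not in POS_TAGS:
            flagged.append(analysis)
    return flagged


# Analyses copied from the output above.
READINGS = [
    "otnás1+Sg+Gen",
    "otnás1+Sg+Acc",
    "otnás1+Sg+Nom",
    "otnás1+N+Sg+Gen",
    "otnás1+N+Sg+Acc",
    "otnás1+N+Sg+Nom",
]

for reading in missing_pos(READINGS):
    print(reading)
```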
mánnálas1
mánnálas1 mánná+N+las1+Sg+Nom
mánnálas1 mánnálas1+N+Sg+Nom
mánnálas1 mánnálas1+A+Sg+Nom
mánnálas1 mánnálas1+A+Attr
The first analysis (mánná+N+las1+Sg+Nom) does not say “+A”.
apply down> issoras+A+Comp+Sg+Nom
issorasat
issorat
issorabbu
issoreabbo
issoreabbu
issoret
issorit
issorut
apply down> fargat+A+Sg+Gen
fargat
fargada
suovat
suovada
deaivat > deives1 (missing)
jeagadit > jeagolas1 (missing)
Weak grade is not recognised for máhli, duihmi, c1áihmi; -hl-, -hm-, -hn- also in weak grade.
MUSH
has defective Acc and Gen forms, and ‘apply down’ does not work
LASIS
is not found in the lexicon list at all. TODO: Write a lexicon for LASIS
All CG cases of series II E are checked. The ihx ones do not work (cf. above), but the other ones do.
At one stage, Acc/Gen forms were accompanied by several strange additional forms (Gen#vuoign1an/vuoignám). These are now commented out of the noun lexicon with a ! mark.
TODO: Check with the original lexicon, to ensure that nothing crucial has been lost in the conversion process.
Correct:
apply down> giella+N+Pl+Com+PxSg3
gielaidisguin
apply down> giella+N+Pl+Com+PxPl3
gielaideasetguin
Erroneous:
apply down> beana+N+Pl+Com+PxPl3
beatnagiiddiset
apply down> beana+N+Pl+Com+PxSg3
beatnagiiddis
The contracted nouns luomi and gahpir behave the same way as beana, so this appears to be an error for all contracted nouns.
TODO: Go through the Px paradigm, and see if beana shows errors in other parts of the paradigm, and if there are other words that have problems in the Comitative Plural paradigm.
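To go through the paradigm systematically, the analysis strings for ‘apply down’ can be generated rather than typed by hand. This is a minimal Python sketch. Only +PxSg3 and +PxPl3 and the cases Nom, Gen, Acc, Ill and Com are attested in this file; the remaining Px tags and the Loc case are assumptions made by analogy and should be checked against the actual tag set. The function name `px_queries` is hypothetical.

```python
from itertools import product

NUMBERS = ["Sg", "Pl"]
# Loc is an assumption; the other cases appear elsewhere in this file.
CASES = ["Nom", "Gen", "Acc", "Ill", "Loc", "Com"]
# Only PxSg3 and PxPl3 are attested here; the rest are assumed by analogy.
PX_TAGS = [f"Px{num}{person}" for num in ("Sg", "Du", "Pl") for person in (1, 2, 3)]


def px_queries(lemma, numbers=NUMBERS, cases=CASES, px_tags=PX_TAGS):
    """Build one analysis string per number, case and possessive suffix combination."""
    return [
        f"{lemma}+N+{number}+{case}+{px}"
        for number, case, px in product(numbers, cases, px_tags)
    ]


# Restrict to the Comitative Plural, the part of the paradigm reported above.
for lemma in ("giella", "beana", "luomi", "gahpir"):
    for query in px_queries(lemma, numbers=["Pl"], cases=["Com"]):
        print(query)
```

Each printed line can be given to ‘apply down’ and the output compared between a regular noun such as giella and the contracted nouns.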
apply down> jearaldat+N+Pl+Ill
jearaldahkaide
Also for servodat
Missing:
dihto (Finnish tietty, ‘certain, particular’)
apply up> buorre
buorri1+A+Sg+Nom <== what is buorri1 ?
buorre+A+Sg+Nom
POS tag missing
duogás1 duogás1+Sg+Gen
duogás1 duogás1+Sg+Acc
duogás1 duogás1+Sg+Nom
Have a look at this:
apply up> goappa
goabbá+Pron+Interr+Sg+Acc
goabbá+Pron+Interr+Sg+Gen
apply up> goappá
goabbá+Pron+Interr+Sg+Acc
goabbá+Pron+Interr+Sg+Gen
It seems the first one (goappa) is erroneous.
bienasta bitnii must be included in a list of multiword expressions in the preprocessor.
This preprocessor is located in gt/script/. It has two main problems:
(find examples)
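The multiword handling asked for above (bienasta bitnii as one token) can be sketched as a longest-match merge over the token stream. This is an illustration in Python only, not the preprocessor in gt/script/; the function name `merge_multiwords` and the placeholder tokens in the usage example are hypothetical.

```python
def merge_multiwords(tokens, expressions):
    """Join known multiword expressions into single tokens, longest match first."""
    known = {tuple(expr.split()) for expr in expressions}
    max_len = max((len(expr) for expr in known), default=1)
    merged = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in known:
                merged.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multiword expression starts here: keep the token as it is.
            merged.append(tokens[i])
            i += 1
    return merged


# "x" and "y" are placeholder tokens standing in for surrounding words.
print(merge_multiwords(["x", "bienasta", "bitnii", "y"], ["bienasta bitnii"]))
```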
Cf. this example:
"<girkoás1s1it>"
"girku" N Sg Nom # ás1s1i N Pl Nom
"girku" N Sg Gen # ás1s1i N Pl Nom
"<gulle>" S:1314, 1573, 1573, 1530
"gullat" V Ind Prs Du1
"gullet" V VGen
"gullet" V Ind Prs Sg3
Here, the correct reading “gullet V Ind Prt Pl3” is removed by rule 1314, which says:
REMOVE Pl3 IF (0 Sg3) (-1 (N Sg Nom)); ## Dokumeanta c1ilge, mo mii eallit.
But the Sg Nom in the preceding word is the first part of the compound, not the second, so it should be disregarded when the context of rule 1314 is evaluated.
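The intended behaviour can be pictured with a small Python sketch. It does not describe how the CG compiler evaluates contexts; it only contrasts a naive tag match with one that looks at the last compound part alone. Treating `#` as the compound boundary is an assumption based on the output above, and the function names are hypothetical.

```python
def head_tags(reading):
    """Tokens of the last compound part only, i.e. the text after the final '#'."""
    return reading.split("#")[-1].split()


def cohort_matches(readings, wanted, compound_aware=True):
    """True if any reading of the cohort carries all the wanted tags."""
    for reading in readings:
        tags = head_tags(reading) if compound_aware else reading.split()
        if set(wanted) <= set(tags):
            return True
    return False


# Readings of <girkoás1s1it>, copied from the example above.
COHORT = [
    '"girku" N Sg Nom # ás1s1i N Pl Nom',
    '"girku" N Sg Gen # ás1s1i N Pl Nom',
]
CONTEXT = ["N", "Sg", "Nom"]  # the (-1 (N Sg Nom)) test of rule 1314

print(cohort_matches(COHORT, CONTEXT, compound_aware=False))  # naive match
print(cohort_matches(COHORT, CONTEXT, compound_aware=True))   # last part only
```

With the naive match the context test succeeds because of the first compound part, which is what triggers the unwanted REMOVE; restricting the match to the last part makes the test fail, so Pl3 would be kept.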
Possible solutions:
Todo: Evaluate this.