Bug reports, errors
This file is now abandoned, as our bugs are reported and solved in our Bugzilla bug report system. This file is kept here for nostalgic reasons.
Morphophonological (twol) errors
The parser accepts girkudáidda but not girkodáidda, so the vowel shortening in compounds does not quite work. Other examples:
Muitosuodjalus
oskkoldat
oskkoldat oskkoldat+N+Sg+Nom
diehtaga
diehtaga died1a+N+Sg+Gen
diehtaga died1a+N+Sg+Acc
oskkoldatdiehtaga
oskkoldatdiehtaga oskkoldatdiehtaga +?
oahpaheaddjeoahpus
oahpaheaddjeoahpus oahpaheaddjeoahpus +?
Answer: Nom + Nom compounding is not accepted for this type of word.
Morphological errors (errors in the rule file)
comparative
dárkil works but dárkileappot does not.
No paradigm for the actio form of DOHPPE
dohppema
dohppema dohppema +?
beastima
beastima beastit+V+n+N+Actio+Sg+Gen
beastima beastit+V+n+N+Actio+Sg+Acc
besten, dohppen and beastin are recognised, but bestema and dohppema are not, unlike beastima. This is a problem for the DOHPPE lexicon.
Definite pronouns
goappas1 (west)
goappas1iid
goappas1iid
goappas1iidda
goappas1iin
goappas1iiguin
goappas1in
guktot (east)
guktuid
guktuid
guktuide
guktuin
guktuiguin
gukton (?)
HAVE A LOOK.
e in front of -las in deverbal adjectives
The parser derives bealkálas from bealkit, which is correct, but it overgenerates joavdálas from joavdit, where the correct form is jovdelas. Look into this.
Illative of foreign names in -e
There are two documented patterns:
Lene -> Lenii
Manasse -> Manassei
The question is: can any generalisations be made?
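While the question is open, a list of names in -e can at least be expanded into both documented candidates for manual checking. The Python sketch below only encodes the two patterns above (drop -e and add -ii, as in Lenii; keep -e and add -i, as in Manassei); the function name illative_candidates is hypothetical, and the code does not decide which pattern a given name actually takes.

```python
def illative_candidates(name: str) -> list[str]:
    """Return both candidate illatives for a foreign name ending in -e.

    Pattern A (as in Lene -> Lenii): drop the final -e and add -ii.
    Pattern B (as in Manasse -> Manassei): keep the -e and add -i.
    This is not a rule: it only lists both options for manual review.
    """
    if not name.endswith("e"):
        raise ValueError(f"expected a name ending in -e, got {name!r}")
    return [name[:-1] + "ii", name + "i"]


for name in ["Lene", "Manasse"]:
    print(name, illative_candidates(name))
```

Collecting the attested choice for each name from such a list would give the data needed to look for a generalisation.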
Questions, open issues
Missing POS
otnás1
otnás1 otnás1+Sg+Gen
otnás1 otnás1+Sg+Acc
otnás1 otnás1+Sg+Nom
otnás1 otnás1+N+Sg+Gen
otnás1 otnás1+N+Sg+Acc
otnás1 otnás1+N+Sg+Nom
Missing POS in derivatives
mánnálas1
mánnálas1 mánná+N+las1+Sg+Nom
mánnálas1 mánnálas1+N+Sg+Nom
mánnálas1 mánnálas1+A+Sg+Nom
mánnálas1 mánnálas1+A+Attr
The first analysis has no “+A” tag for the derived adjective.
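Analyses like these (and duogás1 under Words) could be found mechanically by scanning lookup output for analyses without a POS tag. The Python sketch below assumes tab-separated lookup output (surface form, then analysis); the tag sets POS_TAGS and DERIV_TAGS are hypothetical and would have to be filled in from the actual tag inventory. After a derivational tag such as las1, a new POS tag is required.

```python
# Hypothetical tag inventories; replace with the real ones from the lexicon.
POS_TAGS = {"N", "A", "V", "Adv", "Pron"}
DERIV_TAGS = {"las1"}  # derivational tags that must be followed by a new POS


def missing_pos(analysis: str) -> bool:
    """True if the analysis lacks a POS tag where one is expected."""
    tags = analysis.split("+")[1:]  # drop the lemma
    # Only look at the tags after the last derivational tag, if any.
    start = 0
    for i, tag in enumerate(tags):
        if tag in DERIV_TAGS:
            start = i + 1
    return not any(tag in POS_TAGS for tag in tags[start:])


lookup_output = """\
otnás1\totnás1+Sg+Gen
otnás1\totnás1+N+Sg+Gen
mánnálas1\tmánná+N+las1+Sg+Nom
mánnálas1\tmánnálas1+A+Sg+Nom
duogás1\tduogás1+Sg+Nom"""

for line in lookup_output.splitlines():
    surface, analysis = line.split("\t")
    if missing_pos(analysis):
        print("missing POS:", surface, analysis)
```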
Diacritical marks in the lexicon forms
The forest of comparatives
apply down> issoras+A+Comp+Sg+Nom
issorasat
issorat
issorabbu
issoreabbo
issoreabbu
issoret
issorit
issorut
apply down> fargat+A+Sg+Gen
fargat
fargada
suovat
suovada
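Overgeneration of this kind shows up as a single generation query with many outputs. The Python sketch below groups a pasted xfst transcript in the format shown above by its "apply down>" queries and reports those with more than one output; it is a hypothetical helper that only parses transcript text and does not call xfst.

```python
def outputs_per_query(transcript: str) -> dict[str, list[str]]:
    """Group the lines of an xfst 'apply down>' transcript by query."""
    results: dict[str, list[str]] = {}
    query = None
    for raw in transcript.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("apply down>"):
            query = line[len("apply down>"):].strip()
            results.setdefault(query, [])  # keep queries with no output too
        elif query is not None:
            results[query].append(line)
    return results


transcript = """\
apply down> issoras+A+Comp+Sg+Nom
issorasat
issorat
issorabbu
issoreabbo
issoreabbu
issoret
issorit
issorut
"""

for query, forms in outputs_per_query(transcript).items():
    if len(forms) > 1:
        print(f"{query}: {len(forms)} forms: {', '.join(forms)}")
```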
Missing declension forms (?)
deaivat > deives1 (missing)
jeagadit > jeagolas1 (missing)
Gradation error for certain nouns
The weak grade is not recognised for máhli, duihmi and c1áihmi; -hl-, -hm- and -hn- also occur in the weak grade.
Errors in the lexicon files (missing words)
MUSH
has defective Acc and Gen forms, and ‘apply down’ does not work
LASIS
is not found in the lexicon list at all. TODO: Write a lexicon for LASIS
Checking diary
All CG cases of series II E have been checked. The ihx ones do not work (cf. above), but the others do.
The multiple genitive forms
At one stage, the Acc/Gen forms were accompanied by several strange additional forms (Gen#vuoign1an/vuoignám). These have now been commented out of the noun lexicon with a ! mark.
TODO: Check with the original lexicon, to ensure that nothing crucial has been lost in the conversion process.
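One way to start this check is to compare the active (uncommented) entries of the two lexicon files. The Python sketch below assumes both files are line-based lexc with the entry in the first field, and treats everything after ! as a comment (escaped %! is not handled); the function names and the continuation-class names in the example are hypothetical.

```python
def active_entries(lines: list[str]) -> set[str]:
    """First field of every line that is not empty after removing '!' comments."""
    entries = set()
    for line in lines:
        body = line.split("!", 1)[0].strip()
        if body:
            entries.add(body.split()[0])
    return entries


def lost_entries(original: list[str], converted: list[str]) -> list[str]:
    """Entries active in the original lexicon but absent or commented out after conversion."""
    return sorted(active_entries(original) - active_entries(converted))


original = ["beana NOUN_A ;", "giella NOUN_B ;"]
converted = ["beana NOUN_A ;", "! giella NOUN_B ;"]
print(lost_entries(original, converted))
```

Entries reported this way would still need to be checked by hand, since the conversion may legitimately have renamed or merged them.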
Miscellanea
Comitative plural and Px
Correct:
apply down> giella+N+Pl+Com+PxSg3
gielaidisguin
apply down> giella+N+Pl+Com+PxPl3
gielaideasetguin
Erroneous:
apply down> beana+N+Pl+Com+PxPl3
beatnagiiddiset
apply down> beana+N+Pl+Com+PxSg3
beatnagiiddis
The contracted nouns luomi and gahpir behave the same way as beana, so this appears to be an error affecting all contracted nouns.
TODO: Go through the Px paradigm, and see if beana shows errors in other parts of the paradigm, and if there are other words that have problems in the Comitative Plural paradigm.
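For this check, the full list of Px analyses for a noun can be generated and fed to the analyser with apply down. The Python sketch below assumes the tag order seen above (lemma+N+Number+Case+Px); the case list and the nine Px tags are an assumed inventory, and the essive is left out because it may not take a number tag.

```python
from itertools import product

# Assumed tag inventory; check it against the actual tag set.
NUMBERS = ["Sg", "Pl"]
CASES = ["Nom", "Acc", "Gen", "Ill", "Loc", "Com"]
PX = [f"Px{num}{person}" for num in ("Sg", "Du", "Pl") for person in (1, 2, 3)]


def px_analyses(lemma: str, pos: str = "N") -> list[str]:
    """All Number+Case+Px analysis strings for one lemma."""
    return [f"{lemma}+{pos}+{num}+{case}+{px}"
            for num, case, px in product(NUMBERS, CASES, PX)]


# One analysis per line, e.g. to save to a file for generation.
for lemma in ["beana", "luomi", "gahpir", "giella"]:
    print("\n".join(px_analyses(lemma)))
```

Comparing the generated forms for beana, luomi and gahpir with those for a non-contracted noun such as giella should show whether the errors are limited to the comitative plural.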
Words
apply down> jearaldat+N+Pl+Ill
jearaldahkaide
The same holds for servodat.
Missing:
dihto tietty
apply up> buorre
buorri1+A+Sg+Nom <== what is buorri1?
buorre+A+Sg+Nom
POS tag missing
duogás1 duogás1+Sg+Gen
duogás1 duogás1+Sg+Acc
duogás1 duogás1+Sg+Nom
Compounds
Closed classes
goappa
Have a look at this:
apply up> goappa
goabbá+Pron+Interr+Sg+Acc
goabbá+Pron+Interr+Sg+Gen
apply up> goappá
goabbá+Pron+Interr+Sg+Acc
goabbá+Pron+Interr+Sg+Gen
It seems the first analysis (of goappa) is erroneous.
The tokenizer
bienasta bitnii must be added to the list of multiword expressions in the preprocessor.
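As an illustration of what such a list does, the Python sketch below joins adjacent tokens that match a listed multiword expression, trying longer expressions first. It is not the actual preprocessor code; the function merge_mwes and the list format are hypothetical, only bienasta bitnii comes from this report, and case variants are not handled.

```python
# Hypothetical MWE list; only "bienasta bitnii" is taken from the report.
MWE_LIST = [("bienasta", "bitnii")]


def merge_mwes(tokens: list[str], mwes=MWE_LIST) -> list[str]:
    """Greedy longest-match merging of listed multiword expressions."""
    by_length = sorted(mwes, key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for mwe in by_length:
            if tuple(tokens[i:i + len(mwe)]) == mwe:
                out.append(" ".join(mwe))  # emit the MWE as one token
                i += len(mwe)
                break
        else:
            out.append(tokens[i])  # no MWE starts here
            i += 1
    return out


print(merge_mwes(["bienasta", "bitnii", "."]))
```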
The vislcg preprocessor lookup2cg
This preprocessor is located in gt/script/. It has two main problems:
- The quotation marks are not always in place
- The grammatical tags are kept on non-final elements in compounds.
The quotation marks are not always in place
(find examples)
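Examples could be collected by scanning the preprocessor output for lines whose quotation marks do not follow the pattern in the examples below: wordform lines start with "<...>", reading lines start with a quoted lemma. The Python sketch below is a hypothetical checker built on that assumption; anything after the opening quoted part (tags, rule traces) is not checked.

```python
import re

# Wordform lines: "<word>" optionally followed by more material.
WORDFORM = re.compile(r'^"<.+>"(\s|$)')
# Reading lines: "lemma" followed by tags.
READING = re.compile(r'^"[^"]+"(\s|$)')


def bad_quote_lines(cg_output: str) -> list[tuple[int, str]]:
    """Return (line number, text) for lines with misplaced quotation marks."""
    bad = []
    for lineno, line in enumerate(cg_output.splitlines(), start=1):
        text = line.strip()
        if not text:
            continue
        pattern = WORDFORM if text.startswith('"<') else READING
        if not pattern.match(text):
            bad.append((lineno, text))
    return bad


sample = '"<gulle>"\n\t"gullet" V Ind Prs Sg3\n\tgullat" V Ind Prs Du1\n\t"gullet V VGen'
for lineno, text in bad_quote_lines(sample):
    print(lineno, text)
```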
The grammatical tags are kept on non-final elements in compounds.
Cf. this example:
"<girkoás1s1it>"
"girku" N Sg Nom # ás1s1i N Pl Nom
"girku" N Sg Gen # ás1s1i N Pl Nom
"<gulle>" S:1314, 1573, 1573, 1530
"gullat" V Ind Prs Du1
"gullet" V VGen
"gullet" V Ind Prs Sg3
Here, the correct reading “gullet V Ind Prt Pl3” is removed due to rule 1314, saying
REMOVE Pl3 IF (0 Sg3) (-1 (N Sg Nom)); ## Dokumeanta c1ilge, mo mii eallit.
But the Sg Nom in the preceding word belongs to the first part of the compound, not the second, and it should be disregarded when the context of rule 1314 is evaluated.
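The difference can be modelled in a few lines. The Python sketch below is a simplified model, not how vislcg itself matches tags: the context (-1 (N Sg Nom)) is taken to match if any reading of the preceding cohort has N, Sg and Nom, once among all tokens of the reading and once among the tags of the final compound part only. The function names are hypothetical.

```python
def final_part_tags(reading: str) -> list[str]:
    """Tags of the last '#'-separated part; its first token is a lemma and is dropped."""
    return reading.split("#")[-1].split()[1:]


def all_tags(reading: str) -> list[str]:
    """Every token after the quoted lemma, ignoring compound boundaries."""
    return reading.replace("#", " ").split()[1:]


def has_n_sg_nom(cohort: list[str], final_part_only: bool) -> bool:
    """Simplified model of the context test (-1 (N Sg Nom)) of rule 1314."""
    wanted = {"N", "Sg", "Nom"}
    for reading in cohort:
        tags = final_part_tags(reading) if final_part_only else all_tags(reading)
        if wanted <= set(tags):
            return True
    return False


# Readings of the preceding cohort "<girkoás1s1it>"
girkoassit = [
    '"girku" N Sg Nom # ás1s1i N Pl Nom',
    '"girku" N Sg Gen # ás1s1i N Pl Nom',
]

print(has_n_sg_nom(girkoassit, final_part_only=False))  # True: rule 1314 may fire
print(has_n_sg_nom(girkoassit, final_part_only=True))   # False: context not met
```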
Possible solutions:
- Remove all grammatical information before the # symbol.
  This is a clean solution. One marginal problem is that the initial tag, the “word” itself, is kept, and this may act as a tag in its own right.
- Change the grammatical tags before the # symbol into something else, e.g. by wrapping them in < > brackets.
  The output becomes cumbersome to read, but it may still be the best solution.
- Include the # symbol in the set definitions, so that for each tag, the corresponding tags followed by a # are disregarded, e.g. SET NSGNOM = (N Sg Nom) - (N Sg Nom #);
  This looks cumbersome, though, as all tag combinations must be declared as sets.
TODO: Evaluate these options.
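To help with the evaluation, the first two options can be tried on single readings. The Python sketch below is not part of lookup2cg; the function rewrite_compound_reading is hypothetical, it expects a reading without its leading indentation, and wrapping each tag separately in < > is an assumption about how the second option would look.

```python
def rewrite_compound_reading(reading: str, mode: str = "strip") -> str:
    """Rewrite the tags of non-final compound parts in a CG reading.

    mode="strip": keep only the lemma of each non-final part (first option).
    mode="wrap":  wrap each non-final tag in < > (second option).
    Readings without '#' come back unchanged (apart from whitespace).
    """
    parts = [part.split() for part in reading.split("#")]
    out = []
    for part in parts[:-1]:
        lemma, tags = part[0], part[1:]
        if mode == "strip":
            out.append(lemma)
        elif mode == "wrap":
            out.append(" ".join([lemma] + [f"<{tag}>" for tag in tags]))
        else:
            raise ValueError(f"unknown mode {mode!r}")
    out.append(" ".join(parts[-1]))  # the final part keeps its tags
    return " # ".join(out)


reading = '"girku" N Sg Nom # ás1s1i N Pl Nom'
print(rewrite_compound_reading(reading, mode="strip"))
print(rewrite_compound_reading(reading, mode="wrap"))
```

The "strip" output still contains the lemma "girku", which is the marginal problem noted under the first option.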