Finite state and Constraint Grammar based analysers, proofing tools and other resources
240422: 1-(27055/707586) = 0.96176
240429: 1-(26573/707554) = 0.96244
240422: 547 1.09 46.75 49.54
240429: 579 1.16 45.71 48.86
old command for coverage:
cat test/data/Luima_Seripos_2013-2017.txt |\
hfst-tokenise tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |\
preprocess --corr=test/data/typos.txt|\
hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |\
grep " ?"|cut -d'"' -f2|wc -l
New command for coverage, removing Russian words (from Russian phrases in text):
cat test/data/Luima_Seripos_2013-2017.txt |\
hfst-tokenise tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |\
preprocess --corr=test/data/typos.txt|\
hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |\
grep " ?"|cut -d'"' -f2|\
hfst-lookup -q ../lang-rus/src/fst/analyser-gt-desc.hfstol |\
grep "+?"|wc -l
Has worked with Børre. The question of columns is still open.
Work with the missing list, added verbs and some other words.
Added to the missing list (the “to be solved” entries).
Worked on the reflexive pronouns (+Emph still to be done).
Has been adding words to foreign_words.lexc
.
Russian words should be expelled from foreign_words.lexc.
We now have 581 typos. More typos are welcome.
Emphatic and refelxive personal pronouns
ам+Pron+Emph+Sg1: амки = emph = Type 1 below
ам+Pron+Refl+Sg1+Nom: амки = Types 2, 3 below
ам+Pron+Pers+Sg1+Nom: ам =
ам+Pron+Refl+Sg1+Acc: [амкинам, амкина̄м, амким] = refl = minä pesin itseni
ам+Pron+Refl+Sg1+Dat: [амкинамн, амкимн]
ам+Pron+Refl+Sg1+Abl: [амкинамныл, акимныл]
ам+Pron+Refl+Sg1+Ins: [акинамтыл, амкимтыл]
General discussion:
Arg1 V Arg2 = refl Peter känner sig själv, Jag känner mig själv
| | Pron.Refl.Sg1
R <-----+
Arg1 V Arg2 = non-refl Peter känner honom, Peter känner mig
| |
R1 R2
Peter tränade mig själv (Peter trained me himself)
Pron.Acc Emph
hun
Látom magam mirrorban/futureben. 'I see myself.'
..., látom magam (is). 'I myself see (too).'
látod magad?
itse itse+Adv 0,000000
itse itse+Pcle 0,000000
itse itse+Pron+Sg+Nom 0,000000
The fst and yaml files should be updated accordingly.
Mansi discussion:
I.
1. According to literature, nominative forms are emphatic. This is supported by the data as well.
Амки о̄с ты па̄выл-т школа а̄стла-сум.
Sg1Emph too this village+Loc school finish+V+Act+Ind+Prt+ScSg1
I myself graduated from that school too.
Амки ня̄врам ат о̄ньщигл-асум, о̄йка-м тав хӯрум пыганэ, хӯрум а̄ги-янэ янмалт-асанум.
Sg1Emph child no own+V+Act+Ind+Prt+ScSg1 husband+PxSg1 he three boy+N+Pl+Nom+PxSg3 three girl+N+Pl+Nom+PxSg3 grow(tr)+V+Act+Ind+Prt+ScSg1+OcPl
I didn't have children myself, I was raising my husband's three sons and three daughters.
2. According to the data, nominative forms can be understood in reflexive-like meanings, especially when followed by postpoisitions. It doesn't seem to be a grammatical difference, but I still feel it to be different from the previous analysis.
Ты ня̄врам-акве-г ам [PoP амки нупыл-ум] хас-сагум, то̄нт нэ̄мхотьют а̄нум ат ла̄в-ыс, тыщир ва̄руӈкве ат кос э̄р-ыс.
thus child+Dim+Du Sg1 Sg1Refl above+PxSg1 write+V+Act+Ind+Prt+ScSg1+OcDu then nobody no say+V+Act+Ind+Prt+ScSg3 this.way do+Inf no atlhough may+V+Act+Ind+Prt+ScSg3
So I adopted the two children, I wasn't told that I would not have been doing so.
Ты а̄ги-рищ-иг-пыг-рищ-иг янмалт-ым ам о̄с нила ня̄врам-акве амки пал-т-ум ви-санум.
this girl+Dim+Du-boy+Dim+Du grow(tr)+Ger Sg1 too four child+Dim Sg1Refl side-Loc-PxSg1 bring+V+Act+Ind+Prt+ScSg1+OcPl
While I was raising this little girl and boy, I took in four other children as well.
Ты ня̄врам-ыт нэ̄пак-аныл хунь ва̄р-санум, то̄нт ла̄в-ве̄сум, амки нупыл-ум воссыг ул вос хансы-янум.
this child+Pl book+Nom+PxPl3 when do-+V+Act+Ind+Prt+ScSg1+OcPl then say-+V+Act+Ind+Prt+ScSg1+OcPl Sg1Refl no.nore write+V+Act+Ind+Prt+ScSg1+OcPl
When I was creating these children's documents, only then was I told not to adopt them.
3. In some situations, I'd translate the nominative form in the meaning of Russian "свой"
Ам тав-е̄н амки та̄рвитыӈ ва̄рмал-юм урыл потыртасум.
Sg1 Sg3 Sg1Refl difficult thing+PxSg1 about speak+V+Act+Ind+Prt+ScSg1
I told him about my(own) difficult situation.
Ам амки яныг-м-ам пора-м ёмащакв номи-лум, ма̄нь па̄выл-кве-ныл яныг ӯс-н интернат-ын о̄луӈкве ке̄т-вес-ӯв.
Sg1 Sg1Refl grow+V+PrfPrc+PxSg1 time+PxSg1 well think+V+Act+Ind+Prs+ScSg1+OcSg small village+Dim+Abl big town+Lat boarding.school+Lat live-Inf send+V+Pass+Ind+Prt+ScPl1
I well remember my own childhood, we were sent to the boarding school in the big town from our small village.
Ам амки о̄парищнам-ум Енизорова о̄л-ыс.
Sg1 Sg1Refl surname+PxSg1 Yenizarova be+V+Act+Ind+Prt+ScSg3
My maiden name (my own family name) was Yenizarova.
II.
1. According to literature, accusative, dative, ablative and instrumental forms are reflexive. It is supported by the data as well.
Ольга Александровна Фомичёва, Силава па̄выл-т о̄л-нэ ма̄ньщи нэ̄, таквина̄тэ̄н ос а̄гитэ̄н тамле суп тот ёвт-ыс.
O.A.F. Silava village-Loc live+V+PrsPrc Mansi woman Sg3.Refl+Dat and girl+PxSg3+Lat this.kind dress there buy+V+Act+Ind+Prt+ScSg3
A Mansi woman from Silawa village, O.A.F. bought herself and her daughter costumes like that there.
2. Some sentences do not fit to this understanding.
Ма̄хум-акве-т, на̄н ос на̄нкина̄н ёмас ва̄р-е̄н, пӯльница-н ва̄тихал уральтахтуӈкве ял-э̄н.
people+Dim+Pl Pl2 too Pl2Refl good do+V+Act+Imprt+ScPl2 hospital+Loc often guard(refl)+Inf go(freq)+V+Act+Imprt+ScPl2
Dear people, do yourselves a favour and go to the hospital often to have yourselves checked.
на̄н ос на̄нкина̄н ёмас ва̄р-е̄н
you(pl) too you(pl)?refl+Acc good(adj) do+Imp
ам+Pron+Pers+Sg1+Nom: [амкем, амккем, амке̄мт, амкке̄мт]
ам+Pron+Pers+Du1+Nom: [ме̄нккеме̄н, ме̄нкеме̄н, ме̄нкемнт]
ам+Pron+Pers+Pl1+Nom: [ма̄нкке̄в, ма̄нке̄в, ма̄нке̄вт]
Both, the two of (them) = сас
Па̄вл-э̄н-т тэ̄н ханты мир культура центр-ыт рӯпит-э̄г, сас ма̄щтыр нам-ыл ма-им о̄л-э̄г.
village+PxDu3+Loc Du3 Khanty people culture centre+Loc work+V+Du3, both master name+Instr give-Gerund be+Du3
Also found what might be a new prefix: ёла-
!@U.VPref.jola@ёла-тохруӈкве+V:@U.VPref.jol@тохр V_U “21odd” ;
$ grep jola src/fst/morphology/root.lexc
$ grep jol src/fst/morphology/root.lexc
+Pref/jol
@U.VPref.jol@ !!≈ * @CODE@
@R.VPref.jol@ !!≈ * @CODE@
Last meeting’s todo list still not done:
Preliminary time: 15.00 Monday 6th Finnish time (06.00 Edmonton time).