Mansi NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-mns

mns meeting 9.6. 2023

Present: Csilla, Jack, Trond

Agenda

Status

For the work

Lexicon

Problematic words in src/fst/incoming/from-oppikirja.230504b.lexc:

нама̄ныл N       нам
Дуня̄гум-Света̄гум        N       Дуня Света
ащирма̄ныл       N       ащирма
костера̄нылт     N       костер

Problem: in Luima Seripos these words do not have long vowels.

The length is in the morphology. We do not want them.

TODO: Csilla to find out whether the error has a Mansi source.

yaml

Csilla opened for both short and long suffixes. It seems we have to do it that way.

participles:

Csilla has been looking at verb morphology. Our morphology does not have all morphophonological patterns.

Csilla to do this: See and edit the document docs/participles.md in Macdown.app or in Subethaedit. If in see, do cmd-R to see what it looks like.

incoming

The incoming folder

New files in src/fst/incoming/. Jack and Csilla have gone through the list (see last minute).

Still to do

  1. Add the missing two proper nouns (and others?)
  2. Delete the files in incoming

Here are the missing ones:

Ӣльям   Ӣльям+?   inf
А̄ля    А̄ля+?    inf

Status:

a file

Ӣльям    N
ва̄ӈкве    V

b file

А̄лям    N    А̄ля
нама̄ныл    N    нам
Дуня̄гум-Света̄гум    N    Дуня Света
ащирма̄ныл    N    ащирма
костера̄нылт    N    костер
ла̄тӈынувл    N    ла̄тыӈ
е̄мтнэ̄ныл    V    е̄мтэ̄ныл
йӣквыт    V    йӣквуӈкве   <========= N.pl and not V
лаквылтамыт    V    лаквылтаӈкве
пантхатамыт    V    пантхатуӈкве

The duplication forms

Verbs are duplicated with hyphen: sleeps-dreams where the two verbs form a compound wordform with the same morphosyntactic description. Jaska has fixed this.

Open issue: The tag issue: +CollN is not in our repertoire, we now consider +Coll.

Светагум-Дунягум Света+N+Prop+Du+Nom+PxSg1+CollN+Cmp#Дуня+N+Prop+Du+Nom+PxSg1

Let us consider going for +Cmp/Coll+Cmp

Some examples to illuminate:

guovttis    guovttis+N+Coll+Sg+Nom    0,000000

Света+N+Prop+Du+Nom+PxSg1+Coll+Cmp#Дуня+N+Prop+Du+Nom+PxSg1

Cmp/Coll Cmp

viisát    viissis+A+Der/AAdv+Adv    0,000000

"<šlámaideaddji>"
        "eaddji" N Sem/Hum Sg Nom <W:10.0> <sme>
                "šlápma" N Sem/Dummytag Cmp/PlGen Cmp <W:10.0>  <====
                
šlámaideaddji    šlápma+N+Cmp/PlGen+Cmp#eadjit+V+TV+PrsPrc

"<viisát>"
        "viissis" Ex/A Der/AAdv Adv <W:0.0> <sme>

"<fievrriduvvui>"
        "fievrridit" Ex/V Ex/TV Gram/3syll Der/PassL V IV Ind Prt Sg3 <W:0.0> <sme> @+FMAINV

make check

We generate:

ӯнлуӈкве+V+Inf    ӯнлаӈкве    0

ӯнттуӈкве+V+Inf    ӯнттаӈкве    0

ва̄нталаӈкве+V+Inf    вантлуӈкве    0
ва̄нталаӈкве+V+Inf    ва̄нтлуӈкве    0 <<<<< These now work as ва̄нталаӈкве etc, the contlex was incorrect

Jack got:

lang-mns jackrueter$ hfst-lookup src/analyser-gt-norm.hfstol 
> ӯнлуӈкве
ӯнлуӈкве    ӯнлуӈкве+V+Inf    0,000000

> ^C
Jacks-MacBook-Pro:lang-mns jackrueter$ hfst-lookup src/generator-gt-norm.hfstol 
> ӯнлуӈкве+V+Inf
ӯнлуӈкве+V+Inf    ӯнлуӈкве    0,000000

This is a bug in the testing routine. Trond to have a look.

corpus-mns-orig

This concerns only Wikipedia. Trond to ensure it is not included.

For the fst

Yaml test

SUMMARY for the gt-norm fst(s): PASSES: 5500 / FAILS: 2819 / TOTAL: 8319
SUMMARY for the gt-desc fst(s): PASSES: 6 / FAILS: 430 / TOTAL: 436

Text coverage test

We recognise 82.2 % of the test/data/Olvasmanyok.txt.

cat test/data/Olvasmanyok.txt |hfst-tokenise tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |wc -l
cat test/data/Olvasmanyok.txt |hfst-tokenise -cg tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |grep  ' ?'|wc -l

Formula: 1 - (number of words / number of missing words)

Olvasmanyok:

Oppikirja:

Priorities onwards

  1. Fix the things mentioned in these minutes (all)
  2. Work with the forthcoming missing list from the new Oppikirja
  3. Look at short/long vowel discrepancies in textbook
  4. (Work with the missing list from the forthcoming 0.5m corpus)

The typos.txt file is in test/data/typos.txt. Format:

speling<tab>spelling

fst

Nouns

Jack to look at lative (with Csilla?)

The issue is still open. (?)

corpus

Working corpus

Planning

Milestones

We return to that in the next meeting. (again)

Textbook

Today metalanguage is planned to be English. Making it Russian should be within reach:

Lesson 1 
Паща ōлэн! 
 
Greetings and phrases 
Паща ōлэн!                Hello!  
Паща, паща!                Hello! (When answering the previous one.)  
Хумус ōлэ̄гын?        How are you?  
Ёмас.                        Fine.  
Ōс ёмас ōлэн!                Good bye! 
Наӈ намын ма̄ныр?        What is your name?  
Ам намум Анна.        My name is Anna. 
Пӯмащипа!                Thank you!  
Я̄тилы!                Please! 

The Mansi do not traditionally use so much wishes or even apologies: using these expressions is of Russian influence. Still, also the following phrases exist: 

Са̄литэлн!                 Excuse me!  
Ōс ёмас ӯлум!        Good night!  
Ёмас тэ̄п!                Enjoy your meal!  
Атыӈ тэ̄п!                Enjoy your meal!  
Та яныт ёмас ва̄рмаль ва̄рен!        You are welcome! (Sg)  
Та яныт ёмас ва̄рмаль ва̄рэ̄н!        You are welcome! (Pl) 
 
Asking and answering:  
Наӈ хотыл о̄лэ̄гын?                                Where are you from? 
Наӈ хотыл ёхтысын?                                Where are you from? 
Ам Ханты-Мансийск усныл о̄лэ̄гум.        I am from Khanty-Mansiysk.  
Маныр вāрēгын?                                 What are you doing? 
Ам рӯпитэ̄гум.                                I am working. 
Наӈ ма̄ньщи?                                        Are you a Mansi? 
А-а.                                                Yes.  
Остыгыл.                                         Yes / Of course / Sure.  
А̄ти.                                                No. 

Exercises 
I How would you answer to the following greetings or questions? 

II Provide a brief oral discussion with your partner: start with greeting, then ask who (s)he is, and where (s)he comes from. Finally say goodbye. Then, write your discussion down here. 

Урок 1
 
приветствия и фразы
         Привет!
          Привет! (При ответе на предыдущий.)
     Как вы?
            отлично.
            До свидания!
        Как вас зовут?
     Меня зовут Анна.
        Спасибо!
      Пожалуйста!
Манси традиционно не используют так много пожеланий или даже извинений: использование этих выражений имеет русское влияние. Тем не менее, также существуют следующие фразы:
Прошу прощения!
доброй ночи!
Наслаждайтесь едой!
Наслаждайтесь едой!
          Пожалуйста! (сержант)
          Пожалуйста! (мн.ч.)
 
Спрашиваем и отвечаем:
             Откуда ты?
            Откуда ты?
   Я из Ханты-Мансийска.
         Что ты делаешь?
        Я работаю.
              Вы манси?
              Да.
           Да / Конечно / Конечно.
               нет.

Упражнения
Как бы вы ответили на следующие приветствия или вопросы?

II Проведите краткую устную беседу со своим партнером: начните с приветствия, затем спросите, кто он(а) такой и откуда он(а). Наконец попрощайтесь. Затем напишите свое обсуждение здесь.

Next meeting

Friday, June 16th at 11 Finnish time.