Inari Sámi NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

View the project on GitHub giellalt/lang-smn

General tasks 2015

In the autumn we will make a plan for the MT work. This document covers only the dictionary and the FST.

August

Workers in August

Work to be done:

Time allocation

Tasks

verbs

nouns

adjectives <== get a clear picture of the landscape

other, closed POS <== get the POS right

Lexicon

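For reference, analyser coverage on the bound corpus is measured with these commands (the first counts word tokens, the second counts the tokens the analyser does not recognise):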
cat misc/boundsmn.txt |preprocess|grep '[a-z]'|wc -l
cat misc/boundsmn.txt |preprocess|grep '[a-z]'|usmn|grep '?'|wc -l
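The two counts can be combined into a single coverage figure, i.e. the share of word tokens the analyser recognises. Here is a minimal sketch. It assumes the first command gives the total number of tokens and the second gives the number of unknown ones. The function name `coverage` is hypothetical:

```shell
# Hypothetical helper: print analyser coverage in percent,
# given $1 = total tokens and $2 = unknown tokens (lines with '?').
coverage() {
  awk -v total="$1" -v unknown="$2" 'BEGIN {
    # refuse to divide by zero when there are no tokens
    if (total + 0 == 0) exit 1
    # recognised tokens as a share of all tokens, one decimal place
    printf "%.1f\n", 100 * (total - unknown) / total
  }'
}

# Usage, reusing the commands above (preprocess and usmn must be available):
# coverage "$(cat misc/boundsmn.txt | preprocess | grep '[a-z]' | wc -l)" \
#          "$(cat misc/boundsmn.txt | preprocess | grep '[a-z]' | usmn | grep '?' | wc -l)"
```

For example, 200 tokens with 10 unknown gives `95.0`.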

Dictionaries:

smn-fin-smn dictionary launching

Dictionary: smn-fin-smn - deadline August 25?

A test version of the dictionary is online

TODO:

Dictionary: sme-smn transfer - deadline for items 1-2 is August 5 (?), for the cifu presentation

The tool itself could be launched much later, as a separate tool from smn-fin-smn

TODO:

  1. correct columns in input excel dict files (Ciprian; ML, Miina, Trond)
  2. make transfer sme-fin + fin-smn (Ciprian)
  3. improve coverage
    1. Find holes (lacunas) in the dictionary (Trond)
    2. add missing translations (ML, Miina)
  4. put the dictionary online (Ryan)
  5. improve the interface (ML)
  6. launch the dictionary, when? how?

Testing FST

  1. Automatic testing (make check)
    1. yaml-files
    2. generation of lemmas
    3. generation of miniparadigms
  2. Analysis
    1. Analysis of texts (Erika)
    2. Coverage: creating lists of missing words, adding them to the analyser
  3. Testing of analyser and dictionary (ML, Miina)
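One way to check the generation of lemmas is to send every lemma with its base-form tags through the generator and list the ones that come back unrecognised. Below is a minimal sketch of the filtering step. It assumes `hfst-lookup`-style output, where input, output and weight are separated by tabs and a failed lookup appears as the input followed by `+?`. The function name `failed_lookups`, the tag string and the generator file name in the usage comment are hypothetical:

```shell
# Hypothetical helper: read hfst-lookup output on stdin and print
# each input that the transducer could not handle (output ends in "+?").
failed_lookups() {
  awk 'BEGIN { FS = "\t" }
       # NF >= 2 skips the blank lines that separate lookup results
       NF >= 2 && $2 ~ /[+][?]$/ { print $1 }'
}

# Usage (tags and file name are placeholders):
# sed 's/$/+N+Sg+Nom/' lemmas.txt | hfst-lookup GENERATOR.hfstol | failed_lookups
```

An empty result means every lemma in the list was generated.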

Morphology

nouns.lexc - first priority

TODO:

verbs.lexc - second priority

TODO:

adjectives.lexc - third priority

TODO:

smn-propernouns.lexc:

TODO:

abbreviations, acronyms - copy from sme

numerals, pronouns

adverbs, adpositions, conjunctions, subjunctions, particles, interjections

punctuation.lexc - should be ok

Dependencies

POS internal dependencies

For all FST work the following dependencies hold (for words without morphology several steps may be skipped):

  1. Linguistic groundwork
  2. YAML files and other test setup
  3. Plan of attack
  4. lexc and twolc work for the words in the YAML files
  5. YAML testing and refinement until all YAML tests pass
  6. go through the lexicon file for all members of each continuation lexicon (contlex)

Dependencies between POS within the FST

Apart from the POS-internal steps above, there are no dependencies between the POS.

Dependencies between FST and dictionary

  1. FST good enough to generate a substantial part of N, V, A paradigms
  2. a useful Neahttadigisánit (NDS) with click-in-text
  3. FST with all POS done (but errors and holes here and there)
  4. good NDS with paradigm generation

Dependencies between FST and MT

  1. FST good enough to generate a substantial part of N, V, A paradigms
  2. alpha version of MT
  3. FST with all POS done (but errors and holes here and there)
  4. start working on MT transfer rules

The bidix (bilingual dictionary) and the FST do not depend on each other, but collecting data for the bidix is easier with a good FST for text analysis.
