South Sámi NLP Grammar

Finite state and Constraint Grammar based analysers, proofing tools and other resources

Project on GitHub: giellalt/lang-sma

TTS tokenisation for sma

Requires a recent version of HFST (3.10.0 / git revision >= 3aecdbc). Then just run:

make
echo "ja, ja" \
| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst

More usage examples:

echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa \
boasttu olmmoš, man mielde lahtuid." \
| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" \
| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
echo "márffibiillagáffe" \
| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
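
The version requirement stated above can be checked before building. The following is a minimal sketch, not part of the project's build system: the helper name `version_at_least` is hypothetical, and it assumes a `sort` that supports `-V` (version sort). How the installed HFST version string is obtained is left to the caller.

```shell
# Return success when dotted version $1 is at least dotted version $2.
# Assumption: `sort -V` (version sort) is available, as in GNU coreutils.
version_at_least() {
  # The smaller of the two versions sorts first; if that is $2, then $1 >= $2.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical usage: pass in the version your HFST installation reports.
# version_at_least "3.16.0" "3.10.0" && echo "HFST is recent enough"
```

Version sort is used rather than a plain string comparison because, compared as strings, "3.9" would rank above "3.10".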

Pmatch documentation: https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstPmatch

Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:

Whitespace contains ASCII white space, and the List contains some Unicode white space characters.

Apart from what’s in our morphology, there are 1) unknown word-like forms, and 2) unmatched strings. We want to give 1) a match, but let 2) be treated specially by hfst-tokenise -a.

TODO: Could use something like this, but the built-ins don’t include šžđčŋ:

Simply give an empty reading when something is unknown: hfst-tokenise --giella-cg will treat such empty analyses as unknowns and remove empty analyses from other readings. Empty readings are also legal in CG: they get a default baseform equal to the wordform but no tag to check, so it is safer to let hfst-tokenise handle them.
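
To see which word forms ended up as unknowns, the tokeniser output can be filtered. The sketch below is an illustration under stated assumptions, not part of the project: the function name `list_unknowns` is hypothetical, and it assumes a CG stream in which each cohort starts with `"<form>"` at the beginning of a line, readings are tab-indented, and an unknown form has either no reading line at all or only readings ending in ` ?`. Check these assumptions against the actual output of your HFST version.

```shell
# Print the word forms in a CG stream (on stdin) that have no real analysis.
# Assumptions (not confirmed by this documentation): cohort lines look like
# "<form>", readings are tab-indented, and unknown readings end in " ?".
list_unknowns() {
  awk '
    # Print the previous cohort if none of its readings was a real analysis.
    function emit() { if (form != "" && !known) print form }
    # Cohort line: remember the form between "< and >".
    /^"<.*>"$/ { emit(); form = substr($0, 3, length($0) - 4); known = 0; next }
    # Reading line: anything not ending in " ?" counts as a real analysis.
    /^\t/      { if ($0 !~ / \?$/) known = 1 }
    END        { emit() }
  '
}

# Hypothetical usage with the tokeniser from the examples above:
# echo "ja ukjend" \
# | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst \
# | list_unknowns
```

Lines that are neither cohorts nor readings (for example blank or whitespace markers) are ignored by the filter.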

This needs hfst-tokenise to output things differently depending on the tag they get.


This (part of) documentation was generated from tools/tokenisers/tokeniser-tts-cggt-desc.pmscript
