Finite state and Constraint Grammar based analysers, proofing tools and other resources
View the project on GitHub giellalt/lang-sma
All doc-comment documentation in one large file.
"<.>" "<!>" "<?>" "<...>" "<¶>" sent
(>>>) (<s>)
(<<<) (</s>)
Nom Acc Gen Ine Ela Ill Com Ess
PxSg1 PxSg2 PxSg3 PxPl1 PxPl3 PxPl3
Der/A
Der/Car
Der/Dimin
Der/InchL
Der/NomAct
Der/NomAg
Der/PassL
Der/PassS
Der/Rec
Der/adte
Der/ahtje
Der/alla
Der/d
Der/eds
Der/ht
Der/htalle
Der/htj
Der/ihks
Der/ijes
Der/l
Der/laakan
Der/ldahke
Der/ldh
Der/ldihkie
Der/les
Der/lg
Der/st
Der/vuota
We define two lists for Err/xxx
tags:
Err/Orth
:
Err/Orth
Err/Orth-a/á
Err/Orth-nom/gen
Err/Orth-nom/acc
Err/DerSub
Err/CmpSub
Err/UnspaceCmp
Err/HyphSub
Err/SpaceCmp
Err/Spellrelax
err_orth_mt
Err/Orth-spes
:
Err/Orth-a/á
Err/Orth-nom/gen
Err/Orth-nom/acc
Err/DerSub
Err/CmpSub
Err/UnspaceCmp
Err/HyphSub
Err/SpaceCmp
Err/Spellrelax
err_orth_a_á_mt
err_orth_nom_acc_mt
err_orth_nom_gen_mt
Cmp/Hyph
<vdic>
REAL-TITLE OFFICE TITLE
CASES ADVLCASE NUMBER
INSTITUTION ORGANIZATION EDUCATION CURRENCY CURRENCY LESSON
REALCOPULAS
COPULAS
V-NOT-COP
MOD-ASP
GUKTIEGOSSE
DAESTIE
ILLADV
INEADV1
ELAADV1
INEADV
ELAADV
DV-MOD-ADV
ILLPO
REALCLB
SV-BOUNDARY
NP-BOUNDARY
V-DER
V-DER-SUF
N-DER N-DER-SUF
A-DER A-DER-SUF
PASS
LEX-V LEX-N LEX-A LEX-ADV
VERB-FORMS 2-PERS
BEFORE-SECTIONS
Rule for adding Sem/Date as a tag to readings that look like dates (to be removed once we get a common numeral file from shared)
Rules for adding
SECTION
Removing non-lexicalised forms when lexicalised
Remove Px if not family
INITIAL
Selecting postpositions when preceded by genitives, etc.
Rel or Interr OR Indef
Selecting adverbs in local contexts
Selecting verbs in local contexts, based upon agreement patterns
Selecting imperative sentence-initially with appropriate right context
Remove verb readings
Select Inf
Mapping CNP to CC and CS.
Mapping @CVP to all CS
Attributes or not
Select PrfPrc if DerNomAct
Mapping verbs
This rule removes all other readings if there is a mapped V reading in the same cohort. Every case where this goes wrong should be fixed in the mapping rules or in earlier disambiguation rules.
leah Prs Sg2 = Pl3
Select Inf If Infv
Remove Prop Attr if not 1 Prop
Ger or Der/NomAct
Adj or Indef
Num
Adv or Po/Pr
Illative or genitive
Essive
Comitative
Accusative or illative
Indef or Adv
special lemmas
Adverb context prefers Adv
Verb person vs. Inf – moved here in order to have the pronouns disambiguated first.
Rule set taken from sme
gellie as numeral, not pronoun
This (part of) documentation was generated from src/cg3/disambiguator.cg3
"<.>" "<!>" "<?>" "<...>" "<¶>" sent
(>>>) (<s>)
(<<<) (</s>)
Number and person tags:
Der/A
Der/Car
Der/Dimin
Der/InchL
Der/NomAct
Der/NomAg
Der/PassL
Der/PassS
Der/Rec
Der/adte
Der/ahtje
Der/alla
Der/d
Der/eds
Der/ht
Der/htalle
Der/htj
Der/ihks
Der/ijes
Der/l
Der/laakan
Der/ldahke
Der/ldh
Der/ldihkie
Der/les
Der/lg
Der/st
Der/vuota
We define two lists for Err/xxx
tags:
Err/Orth
:
Err/Orth
Err/Orth-a/á
Err/Orth-nom/gen
Err/Orth-nom/acc
Err/DerSub
Err/CmpSub
Err/UnspaceCmp
Err/HyphSub
Err/SpaceCmp
Err/Spellrelax
err_orth_mt
Err/Orth-spes
:
Err/Orth-a/á
Err/Orth-nom/gen
Err/Orth-nom/acc
Err/DerSub
Err/CmpSub
Err/UnspaceCmp
Err/HyphSub
Err/SpaceCmp
Err/Spellrelax
err_orth_a_á_mt
err_orth_nom_acc_mt
err_orth_nom_gen_mt
Cmp/Hyph
<vdic>
REAL-TITLE OFFICE TITLE
CASES ADVLCASE NUMBER
INSTITUTION ORGANIZATION EDUCATION CURRENCY CURRENCY LESSON
REALCOPULAS
COPULAS
V-NOT-COP
MOD-ASP
GUKTIEGOSSE
DAESTIE
ILLADV
INEADV1
ELAADV1
INEADV
ELAADV
DV-MOD-ADV
ILLPO
REALCLB
SV-BOUNDARY
NP-BOUNDARY
V-DER
V-DER-SUF
N-DER N-DER-SUF
A-DER A-DER-SUF
PASS
LEX-V LEX-N LEX-A
VERB-FORMS 2-PERS
This (part of) documentation was generated from src/cg3/valency.cg3
**LEXICON ab-noun **
**LEXICON ab-adj **
**LEXICON ab-adv **
**LEXICON ab-num **
**LEXICON ab-nodot-noun ** The bulk
**LEXICON ab-nodot-adj **
**LEXICON ab-nodot-adv **
**LEXICON ab-nodot-num **
**LEXICON ab-dot-noun ** This is the lexicon for abbrs that must have a period.
**LEXICON ab-dot-adj ** This is the lexicon for abbrs that must have a period.
**LEXICON ab-dot-adv ** This is the lexicon for abbrs that must have a period.
**LEXICON ab-dot-num ** This is the lexicon for abbrs that must have a period.
**LEXICON ab-dot-cc **
**LEXICON ab-dot-verb **
**LEXICON ab-nodot-verb **
**LEXICON ab-dot-IVprfprc **
**LEXICON nodot-attrnomaccgen-infl **
**LEXICON nodot-attr-infl **
**LEXICON nodot-nomaccgen-infl **
**LEXICON dot-attrnomaccgen-infl **
**LEXICON dot-attr **
**LEXICON dot-nomaccgen-infl **
**LEXICON DOT ** - Adds the dot to dotted abbreviations.
This (part of) documentation was generated from src/fst/morphology/affixes/abbreviations.lexc
This is one of two parallel files containing adjective affixes. The files represent two alternative interpretations of the same data (South Saami adjectives). This file is used for spellchecking; the alternative file adjectives-oahpa.lexc is used for dictionary and ICALL applications. This file is compiled by default; the other one is compiled by running ./configure --with-oahpa in langs/sma before compiling.
LEXICON PRED_S
The PRED_S lexicon is used for predicative forms of adjectives.
+Sg+Nom:%>s FINAL1 ;
LEXICON PRED_0
The PRED_0 lexicon is used for predicative forms of adjectives.
+Sg+Nom: FINAL1 ;
LEXICON PRED_H
The PRED_H lexicon is used for predicative forms of adjectives.
+Sg+Nom:%>h FINAL1 ;
LEXICON PRED_NE_ODD
The PRED_NE_ODD lexicon is used for predicative forms of adjectives.
+Sg+Nom:%>ne FINAL1 ;
:n ODDCASEOBL ;
:n ODDCOMP ;
LEXICON PRED_N
The PRED_N lexicon is used for predicative forms of adjectives.
+Sg+Nom:%>n FINAL1 ;
LEXICON e_E_EVEN
The e_E_EVEN lexicon is used for adjectives on –e and –e. In attributes and predicatives. With EVEN-NOCOMP.
:e ATTR_0 ;
:e PRED_0 ;
+Sg: NIEJTESGOBL ;
+Pl: NIEJTE_PL ;
NIEJTEREST ;
:e EVENCOMP ;
LEXICON e_E_EVENNOCOMP1
The e_E_EVENNOCOMP1 lexicon is used for adjectives on –e and –e stem. In attributes and predicatives. With EVEN-NOCOMP.
:e ATTR_0 ;
:e PRED_0 ;
+Sg: NIEJTESGOBL ;
+Pl: NIEJTE_PL ;
NIEJTEREST ;
LEXICON a_A_EVEN1
The a_A_EVEN1 lexicon is used for adjectives on –a and –a. In attributes and predicatives. With EVEN-COMP.
:a ATTR_0 ;
+Sg: MAANASGNOM ;
MAANAOBL ;
:a EVENCOMP ;
LEXICON as_AS_EVEN1
The as_AS_EVEN1 lexicon is used for adjectives on –as and –as. In attributes and predicatives. With EVEN-COMP.
:a ATTR_S ;
+Sg+Nom:as FINAL1 ;
+Cmp/SgNom:as R ;
LEXICON ie_IE_EVEN1
The ie_IE_EVEN1 lexicon is used for adjectives on –ie and –ie. In attributes and predicatives. With EVEN-COMP.
:ie ATTR_0 ;
N_IE_FORMS ;
:ie EVENCOMP ;
LEXICON ie_IE_EVENNOCOMP
The ie_IE_EVENNOCOMP lexicon is used for adjectives on –ie and –ie. In attributes and predicatives. With EVEN-COMP.
:ie ATTR_0 ;
N_IE_FORMS ;
LEXICON a_A_EVEN1_NOCOMP
The a_A_EVEN1_NOCOMP lexicon is used for adjectives on –ie and –ie. In attributes and predicatives. With EVEN-COMP.
:a ATTR_0 ;
+Sg: MAANASGNOM ;
MAANAOBL ;
LEXICON es_ES_EVEN
The es_ES_EVEN lexicon is used for adjectives on –es and –es. In attributes and predicatives. With EVEN-COMP.
:e ATTR_S ;
:e PRED_S ;
:e EVENCOMP ;
LEXICON es_ES_EVENNOCOMP1
The es_ES_EVENNOCOMP1 lexicon is used for adjectives on –es and –es. In attributes and predicatives. With EVEN-NOCOMP.
:e ATTR_S ;
:e PRED_S ;
:es ODDCASEOBL ;
LEXICON ies_IES_EVEN1
The ies_IES_EVEN1 lexicon is used for adjectives on –ies and –ies. In attributes and predicatives. With EVEN-COMP.
ies_IES_EVENNOCOMP1 ;
:ie EVENCOMP ;
LEXICON ies_IES_EVENNOCOMP1
The ies_IES_EVENNOCOMP1 lexicon is used for adjectives on –ies and –ies. In attributes and predicatives. With EVEN-NOCOMP.
:ie ATTR_S ;
:ie PRED_S ;
LEXICON eh_EH_ODDNOCOMP1
guektiengïeleh+A+Attr
guektiengïeleh+A+Attr
guektiengïeleh+A+Attr
guektiengïeleh+A+Attr
guektiengïeleh+A+Sg+Nom
guektiengïeleh+A+Sg+Nom
guektiengïeleh+A+Sg+Nom
guektiengïeleh+A+Sg+Nom
guektiengïeleh+A+Sg+Nom
guektiengïeleh+A+Sg+Nom
guektiengïeleh+A+Sg+Nom
guektiengïeleh+A+Sg+Nom
LEXICON BAERIES
(BÅERIES) UNEVEN adjective, attr = pred. Uneven-syllable comparison. Presently only used for the båeries adjective.
:båerie ATTR_S ;
:båerie PRED_S ;
:båaras ODDCOMP ;
ÅEHPIES
ODD adjective, attr = pred. Uneven-syllable comparison.
LEXICON GIERIES
Umlaut from attr to pred. Uneven-syllable comparison. Presently only used for the “gieries-gearehke” adjective. This lexicon covers the ies - ehke + umlaut change.
:gierie ATTR_S ;
:gearahk ODDCASE ;
:gearahk ODDCOMP ;
+Use/NG:gearahtj ODDCOMP ;
+Use/NG:gearahg ODDCOMP ;
BUERIE_UMLAUT_IE_STAMME
EVEN adjective with EVEN-UMLAUT comparison for -ie-stems.
:buer ie_IE_EVENNOCOMP ;
:buerie EVENCOMPONLY ;
:bööre MES ;
+Der1+Der/Dimin+A:buaratj diminODDCOMP ;
+Der1+Der/Dimin+A:bööretj diminODDCOMP ;
Check this one!
ihks_IHKS_igs_IGS_EVENNOCOMP
Adjective with no comp.
isvelihks+A+Attr
isvelihks+A+Attr
isvelihks+A+Sg+Nom
isvelihks+A+Sg+Nom
isvelihks+A+Sg+Nom
isvelihks+A+Sg+Nom
isvelihks+A+Sg+Nom
isveligke+Adv
isvelihke+Adv
+Use/NG:ihk%>s ATTRCONT ;
:ig ATTR_S ;
+Err/Orth:igks ATTR_H ; , cf onterligksh
+Sg+Nom+Use/NG:ihk%>s FINAL1 ;
+Sg+Nom+Use/NG:ig%>s FINAL1 ;
:ihk X_NIEJTE ;
+Use/NG:igk X_NIEJTE ;
+Use/NG:igke PRED_0 ;
:ihke PRED_0 ;
+Use/NG:ig N_IE_FORMS ;
e_ES_EVENNOCOMP2
This is for the adjective “jaame”
jaame+A+Attr
jaame+A+Sg+Nom
:e ATTR_0 ;
:e PRED_S ;
eCASEOBL ;
ODDEVEN2
This one gives EVEN and ODD comparison.
:es ODDCASEOBL ;
:e EVENCOMP ;
+Cmp/SgNom:es R ;
+Use/NG:es ODDCOMP ;
es_E_EVEN3
This one gives EVEN comparison, with -s in the attributive form and a vowel in the predicative form, which gives EVEN-COMP.
:e ATTR_S ;
:e EVENCOMP ;
as_oes_A_OE_EVEN3
This one gives EVEN comparison, with -s in the attributive form and a vowel in the predicative form, which gives EVEN-COMP.
+Use/NG:a ATTR_S ;
:oe ATTR_S ;
:oe EVENCOMP_oe ;
+Use/NG:a EVENCOMP ;
oeh_ah_OE_A_EVEN3
This one gives EVEN comparison, with -s in the attributive form and a vowel in the predicative form, which gives EVEN-COMP.
:oe ATTR_H ;
+Use/NG:a ATTR_H ;
N_OE ;
+Use/NG: MAANA ;
:oe EVENCOMP_oe ;
+Use/NG:e EVENCOMP ;
ies_IE_EVEN3
This one gives EVEN comparison, with -s in the attributive form and a vowel in the predicative form, which gives EVEN-COMP.
:ie ATTR_S ;
N_IE_FORMS ;
:ie EVENCOMP ;
ies_IE_EVEN3NOCOMP
This one gives EVEN comparison, with -s in the attributive form and a vowel in the predicative form.
:ie ATTR_S ;
N_IE_FORMS ;
These 6 adjectives belong to group 4 of the South Saami adjectives, the group that contains all umlaut adjectives. These adjectives, which have -as as the attributive form and -an as the predicative form, are southern South Saami adjectives, and they have no comparison. This group, which covers the ies -> an, as -> an and oes -> an + umlaut changes, is a small subgroup of group 4.
+A:a ATTR_S ;
These 5 adjectives belong to group 4 of the South Saami adjectives, the group that contains all umlaut adjectives. These adjectives, which have -oes as the attributive form and -an as the predicative form, are northern South Saami adjectives, and they have no comparison. This group, which covers the ies -> an, as -> an and oes -> an + umlaut changes, is a small subgroup of group 4.
+A:oe ATTR_S ;
+A: N_OE_OBL ;
+A:oe ATTR_H ;
+A:oe ATTR_H ;
MAST
The MAST
lexicon is used for adjectives on –masten
and masth
with an
used with the stem masten
ATTR_S ;
+Use/NG:e ATTR_S ;
+Use/NG: ATTR_H ;
+Use/NG:e ATTR_N ;
:e PRED_N ;
EVEN adjective, EVEN comparison. Used for all loan adjectives in “ijve”.
:ijv e_E_EVEN ;
+Use/NG:ïjv e_E_EVEN ;
+Err/Orth:iv e_E_EVEN ;
JELLE
The JELLE
lexicon is used for loanadjectives on jelle
and –jelle
with an
used with the stem jelle This one should be ‘jeelle’? SGM?
+Err/Orth:^ell e_ES_LOAN ;
:jell e_ES_LOAN ;
UELLE
:^ell e_ES_LOAN ;
+Err/Orth:vell e_ES_LOAN ;
:ijl e_E_EVEN ;
This (part of) documentation was generated from src/fst/morphology/affixes/adjectives.lexc
The default inflectional lexicon for odd-syllable nouns is N_ODD.
Words like gierehtse are inflected using this lexicon. Other words inflected
like this are: iehkede (evening), guehpere (nail), tjaeleme (writing).
Many words in this class will have vowel changes in the second syllable,
between a reduced vowel in odd-syllable forms and a full vowel or diphthong
in even-syllable forms, as displayed in the paradigm below. This alternation
is regulated by two-level rules, but the rules require that the full
vowel is spelled out in the lexical entry as follows:
gierehtse+N+Sem/Veh:gieriehts N_ODD "pulk" ; ! gieriehtsisnie
That is, in the stem of the entry it says -rieht-, where ie is the
diphthong that is realised in even-syllable word forms. Another example
word is darjome:
darjome+N+Sem/Feat:darjoem N_ODD ;
with -oe- as the stem vowel to get a vowel change o => oe in even-syllable
word forms.
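The alternation described above can be approximated outside the FST for quick sanity checks. The following is a minimal Python sketch, not the two-level rule itself. It assumes that syllables can be counted as vowel clusters and that only a second-syllable `ie` or `oe` alternates; the function name `realise` and the vowel inventory are illustrative choices, not names taken from the source files.

```python
import re

# Assumed vowel inventory for counting syllable nuclei (illustrative only).
VOWELS = "aeiouyæøöåäï"
NUCLEUS = re.compile(f"[{VOWELS}]+")

# Full diphthong in the lexical stem -> reduced vowel in odd-syllable forms.
REDUCE = {"ie": "e", "oe": "o"}


def realise(lexical: str) -> str:
    """Return an approximate surface form for a lowercase lexical form.

    If the form has an odd number of syllables, the diphthong in the
    second syllable is reduced; even-syllable forms are left unchanged.
    """
    nuclei = list(NUCLEUS.finditer(lexical))
    if len(nuclei) < 2 or len(nuclei) % 2 == 0:
        return lexical
    second = nuclei[1]
    reduced = REDUCE.get(second.group())
    if reduced is None:
        return lexical
    return lexical[:second.start()] + reduced + lexical[second.end():]


print(realise("gieriehtse"))      # three syllables: diphthong reduced
print(realise("gieriehtsisnie"))  # four syllables: diphthong kept
print(realise("darjoeme"))        # three syllables: oe reduced to o
```

The sketch only mirrors the examples given in this section; the actual contexts are defined by the rules in phonology.twolc.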
gierehtse+N+Sg+Nom
gierehtse+N+Sg+Gen
gierehtse+N+Sg+Acc
gierehtse+N+Sg+Ill
gierehtse+N+Sg+Ine
gierehtse+N+Sg+Ela
gierehtse+N+Sg+Com
gierehtse+N+Pl+Nom
gierehtse+N+Pl+Acc
gierehtse+N+Pl+Gen
gierehtse+N+Pl+Ill
gierehtse+N+Pl+Ine
gierehtse+N+Pl+Ela
gierehtse+N+Pl+Com
gieriehtsinie: gierehtse+N+Ess
Odd-syll loanwords: lexicon N_ODD_MEETERE
Odd-syll loanwords on -ihtele, such as kapihtele: lexicon IHTELE
Odd-syll loanwords: lexicon N_ODD_LOAN
Odd-syll sg: lexicon N_ODD_SG
Odd-syll pl: lexicon N_ODD_PL
Odd-syll loanwords: lexicon N_ODD_ESS
LEXICON TJE_LASSJE_RESIPR
beetnege+N+Sg+Nom
beetnege+N+Sg+Nom
beetnege+N+Sg+Gen
beetnege+N+Sg+Ill
beetnege+N+Sg+Ine
beetnege+N+Sg+Ela
beetnege+N+Sg+Com
beetnege+N+Pl+Nom
beetnege+N+Pl+Acc
beetnege+N+Pl+Gen
beetnege+N+Pl+Ill
beetnege+N+Pl+Ine
beetnege+N+Pl+Ela
beetnege+N+Pl+Com
beetneginie: beetnege+N+Ess
åeruve+N+Sg+Nom
åeruve+N+Sg+Gen
åeruve+N+Sg+Acc
åeruve+N+Sg+Ill
åeruve+N+Sg+Ine
åeruve+N+Sg+Ela
åeruve+N+Sg+Com
åeruve+N+Pl+Nom
åeruve+N+Pl+Acc
åeruve+N+Pl+Gen
åeruve+N+Pl+Ill
åeruve+N+Pl+Ine
åeruve+N+Pl+Ela
åeruve+N+Pl+Com
åeruve+N+Ess
åeruve+N+Der/Dimin+N+Sg+Nom
åeruve+N+Der/Dimin+N+Sg+Gen
åeruve+N+Der/Dimin+N+Sg+Acc
åeruve+N+Der/Dimin+N+Sg+Ill
åeruve+N+Der/Dimin+N+Sg+Ine
åeruve+N+Der/Dimin+N+Sg+Ela
åeruve+N+Der/Dimin+N+Sg+Com
åeruve+N+Der/Dimin+N+Ess
åeruve+N+Der/Dimin+N+Pl+Nom
åeruve+N+Der/Dimin+N+Pl+Gen
åeruve+N+Der/Dimin+N+Pl+Acc
åeruve+N+Der/Dimin+N+Pl+Ill
åeruve+N+Der/Dimin+N+Pl+Ine
åeruve+N+Der/Dimin+N+Pl+Ela
åeruve+N+Der/Dimin+N+Pl+Com
åerievadtjine: åeruve+N+Der/Dimin+N+Ess
seerije+N+Sg+Nom
seerije+N+Sg+Acc
seerije+N+Sg+Gen
seerije+N+Sg+Ill
seerije+N+Sg+Ine
seerijistie: seerije+N+Sg+Ela
daktere+N+Sg+Nom
daktere+N+Sg+Gen
daktere+N+Sg+Acc
daktere+N+Sg+Ill
daktere+N+Sg+Ine
daktere+N+Sg+Ela
daktere+N+Sg+Com
daktere+N+Pl+Nom
daktere+N+Pl+Acc
daktere+N+Pl+Gen
daktere+N+Pl+Ill
daktere+N+Pl+Ine
daktere+N+Pl+Ela
daktere+N+Pl+Com
daktere+N+Ess
daktere+N+Der/Dimin+N+Sg+Nom
daktere+N+Der/Dimin+N+Sg+Gen
daktere+N+Der/Dimin+N+Sg+Acc
daktere+N+Der/Dimin+N+Sg+Ill
daktere+N+Der/Dimin+N+Sg+Ine
daktere+N+Der/Dimin+N+Sg+Ela
daktere+N+Der/Dimin+N+Sg+Com
daktere+N+Der/Dimin+N+Ess
daktere+N+Der/Dimin+N+Pl+Nom
daktere+N+Der/Dimin+N+Pl+Gen
daktere+N+Der/Dimin+N+Pl+Acc
daktere+N+Der/Dimin+N+Pl+Ill
daktere+N+Der/Dimin+N+Pl+Ine
daktere+N+Der/Dimin+N+Pl+Ela
daktere+N+Der/Dimin+N+Pl+Com
daktaradtjine: daktere+N+Der/Dimin+N+Ess
australijeenere+N+Sg+Nom
australijeenere+N+Sg+Acc
australijeenere+N+Sg+Gen
australijeenaristie: australijeenere+N+Sg+Ela
sisilijaanere+N+Sg+Nom
sisilijaanere+N+Sg+Acc
sisilijaanere+N+Sg+Gen
sisilijaanaristie: sisilijaanere+N+Sg+Ela
radijaatore+N+Sg+Nom
radijaatore+N+Sg+Acc
radijaatore+N+Sg+Gen
radijaatore+N+Sg+Ine
radijaatore+N+Sg+Ela
radijaatorinie: radijaatore+N+Sg+Com
pijaanove+N+Sg+Nom
pijaanove+N+Sg+Acc
pijaanove+N+Sg+Gen
pijaanove+N+Sg+Ine
pijaanove+N+Sg+Ela
pijaanovinie: pijaanove+N+Sg+Com
administraatore+N+Sg+Nom
administraatore+N+Sg+Acc
administraatore+N+Sg+Gen
administraatore+N+Sg+Ine
administraatore+N+Sg+Ela
administraatorinie: administraatore+N+Sg+Com
faktore+N+Sg+Nom
faktore+N+Sg+Acc
faktore+N+Sg+Gen
faktore+N+Sg+Ine
faktore+N+Sg+Ela
faktore+N+Sg+Com
The -oe nouns with umlaut generate the umlauted forms and have the non-umlauted ones as +Use/NG.
The -oe nouns without umlaut generate only the non-umlauted forms, naturally without +Use/NG.
Lexicon N_OE_OBL is for the -oe nouns without umlaut. The illative is lifted out in order to allow for Use/NG for the umlauted ones.
LEXICON EETE_LOAN loanwords with -eete -
universiteete+N+Sg+Nom
universiteete+N+Sg+Acc
universiteete+N+Sg+Ine
universiteete+N+Der/Dimin+N+Sg+Nom
universiteete+N+Der/Dimin+N+Sg+Acc
universiteete+N+Der/Dimin+N+Sg+Ine
NIEJTE_SG
Short description of this lexicon, and its purpose.
vïelle+N+Sg+Nom
vïelle+N+Sg+Ill
vïelle+N+Sg+Com
vïelle+N+Der/Dimin+N+Sg+Nom
vïelle+N+Der/Dimin+N+Sg+Ill
vïelle+N+Sg+Acc+PxSg1
vïelle+N+Sg+Acc+PxSg1
vïelle+N+Sg+Acc+PxSg2
vïelle+N+Sg+Nom+PxSg3
vïelle+N+Sg+Nom+PxSg3
LEXICON KONTO
Lexicon for vowel-final words with invariant stems
+Sg: KONTO_SG ;
+Pl: KONTO_PL ;
EVEN_ESS ;
+Cmp/SgNom: R ;
+Cmp/SgGen:%>n R ;
+Cmp/PlGen:%>j R ;
+Der1+Der/Dimin+N:%»tj GÅATETJE ;
This (part of) documentation was generated from src/fst/morphology/affixes/nouns.lexc
Divvun & Giellatekno - open source grammars for Sámi and other languages
This (part of) documentation was generated from src/fst/morphology/affixes/possessive-suffixes.lexc
Place names
Tunturi+N+Prop+Sem/Plc+Attr
Tunturi+N+Prop+Sem/Plc+Sg+Acc
Tunturi+N+Prop+Sem/Plc+Sg+Ill
Tunturi+N+Prop+Sem/Plc+Sg+Ine
Tunturi+N+Prop+Sem/Plc+Sg+Ela
Tunturi+N+Prop+Sem/Plc+Sg+Com
Tunturinie: Tunturi+N+Prop+Sem/Plc+Ess
Eira+N+Prop+Sem/Sur+Attr (Eng. ! Inflects like MAANA ! PL+Nom Eirah)
Eira+N+Prop+Sem/Sur+Sg+Acc (Eng. ! Inflects like MAANA ! PL+Acc Eiride -> does not compile)
Eira+N+Prop+Sem/Sur+Sg+Ill (Eng. ! Inflects like MAANA ! PL+Ill Eiride -> does not compile)
Eira+N+Prop+Sem/Sur+Sg+Ine (Eng. ! Inflects like MAANA ! PL+Ine Eirine)
Eira+N+Prop+Sem/Sur+Sg+Ela (Eng. ! Inflects like MAANA ! PL+Ela Eirijste -> does not compile)
Eira+N+Prop+Sem/Sur+Sg+Com (Eng. ! Inflects like MAANA ! PL+Com Eirajgujmie)
Eirine: Eira+N+Prop+Sem/Sur+Ess (Eng. ! Inflects like MAANA ! PL+Ess Eirine)
Anu+N+Prop+Sem/Fem+Attr (Eng. ! Inflects like gaalloe ! Arkivfoto ! Pl+Nom Anuh)
Anu+N+Prop+Sem/Fem+Sg+Acc (Eng. ! Inflects like gaalloe ! Arkivfotom ! Pl+Acc Anujde)
Anu+N+Prop+Sem/Fem+Sg+Ill (Eng. ! Inflects like gaalloe ! Arkivfotose ! Pl+Ill Anujde)
Anu+N+Prop+Sem/Fem+Sg+Ine (Eng. ! Inflects like gaalloe ! Arkivfotosne ! Pl+Ine Anujne)
Anu+N+Prop+Sem/Fem+Sg+Ela (Eng. ! Inflects like gaalloe ! Arkivfotoste ! Pl+Ela Anujste)
Anu+N+Prop+Sem/Fem+Sg+Com (Eng. ! Inflects like gaalloe ! Arkivfotojne ! Pl+Com Anujgujmie)
Anune: Anu+N+Prop+Sem/Fem+Ess (Eng. ! Inflects like gaalloe ! Arkivfotojne ! Pl+Ess Anujne)
Ane+N+Prop+Sem/Fem+Attr (Eng. ! Inflects like nïejte)
Ane+N+Prop+Sem/Fem+Sg+Acc (Eng. ! Inflects like nïejte)
Ane+N+Prop+Sem/Fem+Sg+Ill (Eng. ! Inflects like nïejte)
Ane+N+Prop+Sem/Fem+Sg+Ine (Eng. ! Inflects like nïejte)
Ane+N+Prop+Sem/Fem+Sg+Ela (Eng. ! Inflects like nïejte)
Ane+N+Prop+Sem/Fem+Sg+Com (Eng. ! Inflects like nïejte)
Anine: Ane+N+Prop+Sem/Fem+Ess (Eng. ! Inflects like nïejte)
Ane+N+Prop+Sem/Fem+Pl+Acc (Eng. ! Inflects like nïejte ! -> does not compile)
Ane+N+Prop+Sem/Fem+Pl+Ill (Eng. ! Inflects like nïejte ! -> does not compile)
Ane+N+Prop+Sem/Fem+Pl+Ine (Eng. ! Inflects like nïejte ! -> does not compile)
Ane+N+Prop+Sem/Fem+Pl+Ela (Eng. ! Inflects like nïejte ! -> does not compile)
Anigujmie: Ane+N+Prop+Sem/Fem+Pl+Com (Eng. ! Inflects like nïejte ! -> does not compile)
Ally+N+Prop+Sem/Fem+Attr (Eng. ! Inflects like nïejte)
Ally+N+Prop+Sem/Fem+Sg+Acc (Eng. ! Inflects like nïejte)
Ally+N+Prop+Sem/Fem+Sg+Gen (Eng. ! Inflects like nïejte)
Ally+N+Prop+Sem/Fem+Sg+Ill (Eng. ! Inflects like nïejte)
Ally+N+Prop+Sem/Fem+Sg+Ine (Eng. ! Inflects like nïejte)
Ally+N+Prop+Sem/Fem+Sg+Ela (Eng. ! Inflects like nïejte)
Ally+N+Prop+Sem/Fem+Sg+Com (Eng. ! Inflects like nïejte)
Ally+N+Prop+Sem/Fem+Ess (Eng. ! Inflects like nïejte)
Ally+N+Prop+Sem/Fem+Pl+Acc (Eng. !)
Ally+N+Prop+Sem/Fem+Pl+Gen (Eng. !)
Ally+N+Prop+Sem/Fem+Pl+Ill (Eng. !)
Ally+N+Prop+Sem/Fem+Pl+Ine (Eng. !)
Ally+N+Prop+Sem/Fem+Pl+Ela (Eng. !)
Allyjgujmie: Ally+N+Prop+Sem/Fem+Pl+Com (Eng. !)
Aunio+N+Prop+Sem/Sur+Attr (Eng. ! Inflects like)
Aunio+N+Prop+Sem/Sur+Sg+Acc (Eng. !)
Aunio+N+Prop+Sem/Sur+Sg+Ill (Eng. !)
Aunio+N+Prop+Sem/Sur+Sg+Ine (Eng. !)
Aunio+N+Prop+Sem/Sur+Sg+Ela (Eng. !)
Aunio+N+Prop+Sem/Sur+Sg+Com (Eng. !)
Aunio+N+Prop+Sem/Sur+Ess (Eng. !)
LEXICON LONDON-obj Objects. ODD-syllable
Windows+N+Prop+Sem/Obj+Attr
Windows+N+Prop+Sem/Obj+Sg+Nom
Windows+N+Prop+Sem/Obj+Sg+Ill
Windowsistie: Windows+N+Prop+Sem/Obj+Sg+Ela
Courtrai+N+Prop+Sem/Plc+Attr
Courtrai+N+Prop+Sem/Plc+Sg+Acc
Courtrai+N+Prop+Sem/Plc+Sg+Ill
Courtrai+N+Prop+Sem/Plc+Sg+Ine
Courtrai+N+Prop+Sem/Plc+Sg+Ela
Courtrai+N+Prop+Sem/Plc+Sg+Com
Courtrajjine: Courtrai+N+Prop+Sem/Plc+Ess
Haukilahti+N+Prop+Sem/Plc+Sg+Nom
Haukilahti+N+Prop+Sem/Plc+Sg+Acc
Haukilahti+N+Prop+Sem/Plc+Sg+Ill
Haukilahti+N+Prop+Sem/Plc+Sg+Ill
Haukilahti+N+Prop+Sem/Plc+Sg+Ine
Haukilahti+N+Prop+Sem/Plc+Sg+Ela
Haukilahti+N+Prop+Sem/Plc+Sg+Com
Haukilahti+N+Prop+Sem/Plc+Ess
NB! Mâki and Järvi could really be merged! - MAJA
Hautajärvi+N+Prop+Sem/Plc+Sg+Nom
Hautajärvi+N+Prop+Sem/Plc+Sg+Acc
Hautajärvi+N+Prop+Sem/Plc+Sg+Ill
Hautajärvi+N+Prop+Sem/Plc+Sg+Ine
Hautajärvi+N+Prop+Sem/Plc+Sg+Ela
Hautajärvi+N+Prop+Sem/Plc+Sg+Com
Hautajärvine: Hautajärvi+N+Prop+Sem/Plc+Sg+Ess
Akersgata+N+Prop+Sem/Plc+Attr
Akersgata+N+Prop+Sem/Plc+Sg+Acc
Akersgata+N+Prop+Sem/Plc+Sg+Ill
Propernoun
Abia+N+Prop+Sem/Plc+Sg+Nom
Abia+N+Prop+Sem/Plc+Sg+Gen
Abia+N+Prop+Sem/Plc+Sg+Acc
Abia+N+Prop+Sem/Plc+Sg+Ill
Abia+N+Prop+Sem/Plc+Sg+Ine
Abia+N+Prop+Sem/Plc+Sg+Ela
Abia+N+Prop+Sem/Plc+Sg+Com
Abia+N+Prop+Sem/Plc+Ess
The sne / snie business remains to be sorted out.
+Pl+Nom:e%>h FINAL1 ;
+Pl+Acc:e%>ide FINAL1 ;
+Pl+Gen:e%>i FINAL1 ;
+Pl+Ill:e%>ide FINAL1 ;
+Pl+Ine:e%>ine FINAL1 ;
+Pl+Ela:e%>iste FINAL1 ;
+Pl+Com:e%>igujmie FINAL1 ;
+Pl: N_ODD_PL ; ! normal noun
Propernoun
Ahoniemi+N+Prop+Sem/Plc+Sg+Nom
Ahoniemi+N+Prop+Sem/Plc+Sg+Gen
Ahoniemi+N+Prop+Sem/Plc+Sg+Acc
Ahoniemi+N+Prop+Sem/Plc+Sg+Ill
Ahoniemi+N+Prop+Sem/Plc+Sg+Ine
Ahoniemi+N+Prop+Sem/Plc+Sg+Ela
Ahoniemi+N+Prop+Sem/Plc+Sg+Com
Ahoniemi+N+Prop+Sem/Plc+Sg+Ess
+N+Prop+Sem/Plc+Sg+Ill:%>an FINAL1 ; !SUB - is this possible? IllSg without Uml in -ie?
+N+Prop+Sem/Plc+Pl: NIEJTE_PL ;
+N+Prop+Sem/Plc+Pl+Com+Err/Orth:%>igyjmie FINAL1 ; !
+N+Prop+Sem/Plc+Pl: CNAME_ODD_PL ; ! name special
This (part of) documentation was generated from src/fst/morphology/affixes/propernouns.lexc
This (part of) documentation was generated from src/fst/morphology/affixes/symbols.lexc
This is the file for the South Saami verb inflection and derivation.
First we just list the auxiliaries and their inflection.
LEXICON LEA the copula
LEXICON LEA-PRES
LEXICON LEA-PRET
LEXICON LEA-IMP
LEXICON NEG
LEXICON OLLE
LEXICON NEGIMP
LEXICON IJ-PRES
LEXICON EDTJEDH
LEXICON ED-PRES
LEXICON ED-PRET
LEXICON ED-IMP
Odd syllable verbs differ in Prt Sg3. This form is treated separately, and the rest of the paradigm is conflated.
LEXICON TJOEVERIDH_IV
LEXICON GOLTELIDH_TV
LEXICON AALHTEDIDH_TV
LEXICON GOLTELIDH_IV
LEXICON GOLTELIDH, odd-syll with -adte- as Der2
LEXICON BALVEDIDH
LEXICON RIHPESIDH, -nidh and -sidh
LEXICON AAJVESTIDH_TV, for stems ending -t-: dåajvoeht-, odd-syll with -alle- as Der2 and passive -sovvedh
LEXICON DÅAJVOEHTIDH_TV
LEXICON DÅAJVOEHTIDH_IV
LEXICON DÅAJVOEHTIDH for stems ending -t-: dåajvoeht-, odd-syll with -alle- as Der2
LEXICON COMMON-ODD
Finite forms
Infinite forms
Derivations
LEXICON MAEHTEDH_TV
LEXICON BÅETEDH_TV
LEXICON BÅETEDH_TV_ePRET
LEXICON BÅETEDH_IV
LEXICON BÅETEDH_IV_ePRET
LEXICON BÅETEDH row A - Group I
LEXICON BÅETEDH_NOTVGEN row A - Group Ixxf
LEXICON SEVTEDH row A - Group I IMPERSONALS!
LEXICON ÅEREDH row A - Group I Hasselbrink: “öörim.”- (Thomassen) Qvigstad: “vöörtim”
LEXICON ÅEREDH_TV row A - Group I NO -øø-UMLAUT!!!!
LEXICON TJEARODH_TV
LEXICON TJEARODH_IV
LEXICON TJEARODH row C - Group II
LEXICON ABRODH row C - Group II
LEXICON TSEAHKODH_TV
LEXICON TSEAHKODH_IV
LEXICON TSEAHKODH row C - Group II these have (lexicalized) diminutives on -estit, and passives on -algidh
LEXICON GUARKEDH_TV
LEXICON GUARKEDH_IV
LEXICON GUARKEDH row B - Group III
Received feedback on this one that “jarkah” is +Ind+Prs+Sg2, and “Jarkh!” is +Imprt. For now this has been entered as Err/Orth.
LEXICON SIJHTEDH_TV
LEXICON TJOEHPEDH_TV
LEXICON GALKEDH_IV
LEXICON TJOEHPEDH_IV
LEXICON TJOEHPEDH row D - Group IV
LEXICON TJOEHPEDH_NOTVGEN
LEXICON GALKEDH_CONT row D - Group IV
LEXICON BIEGKEDH row D - Group IV !impersonals
LEXICON BÅÅHKEDH_TV
LEXICON SÅÅJHTEDH_IV
LEXICON BÅÅHKEDH_IV
LEXICON BÅÅHKEDH row E - Group V
LEXICON SÅÅJHTEDH_CONT row E - Group V
LEXICON VÅÅJNEDH
LEXICON GÖÖLEDH_TV
LEXICON GÖÖLEDH_IV
LEXICON GÖÖLEDH row F - Group VI
LEXICON BÖÖVTEDH row F - Group VI
LEXICON EEREDH_TV
LEXICON EEREDH_IV
LEXICON ÅARAJEHTEDH_TV
LEXICON ÅARAJEHTEDH_IV
LEXICON ÅARAJEHTEDH row A - Group I
LEXICON BUARADEHTEDH
LEXICON GOEGKERDADTEDH_TV
LEXICON GOEGKERDADTEDH_IV
LEXICON GOEGKERDADTEDH row D - Group IV
LEXICON OBREDADTEDH
LEXICON GÅETEDH_TV
LEXICON GÅETEDH_IV
LEXICON GÅETEDH from Der/InchL
LEXICON STIEHPEGÅETEDH
LEXICON AHTJE_TV
LEXICON AHTJE_IV
LEXICON OBRIJAHTJEDH
LEXICON AHTJE row D - Group IV
LEXICON SOVVEDH row D - Group IV
LEXICON IV_PASSIVE_L - Passive of intransitive verbs => impersonal verbs, like “dïjvelduvvieh” = “(those matters) were discussed”, from “dïjveldidh” = “discuss” (IV), only used in 3rd person Sg and Pl.
+V+IV+Act:%>eme FINAL1 ;
+V+IV+PrsPrc:%>ije FINAL1 ;
+V+IV+PrsPrc:%>ijes FINAL1 ;
Derivations
Nominal derivation sublexica
LEXICON LAAHKOEH_ODD
LEXICON LAAHKOEH_ÅBPOE
LEXICON LAAHKOEH_OMMES
LEXICON IGENSUFF
LEXICON V-I-PRS-SG Merge with V-EVEN-PRS if nothing special here.
LEXICON V-II-PRS-SG
LEXICON V-III-PRS-SG
LEXICON VSUF-V-EVEN-PRS
LEXICON VSUF-V-EVEN-PRS-DUPL
LEXICON VSUF-EVEN-PRS-DUPL
LEXICON VSUF-VI-EVEN-PRS
LEXICON V-IV-EVEN-PRS
LEXICON VSUF-EVEN-IMP
LEXICON VSUF-II-EVEN-IMP
LEXICON VSUF-III-EVEN-IMP
LEXICON VSUF-ODD-PRS
LEXICON V-ODD-PRS-SG
LEXICON V-ODD-PRS-DUPL
LEXICON ODD_PRS_NON_DU3
LEXICON ODD_PRS_DU3
LEXICON V-PRS-SG-12 Cut this one if it is not referenced
LEXICON V-PRS-SG-1
LEXICON V-PRS-SG-2
LEXICON V-PRS-SG-3
LEXICON VSUF-PRT
LEXICON VSUF-PRT-SG-12
LEXICON VSUF-PRT-SG-3
LEXICON VSUF-PRT-DUPL
LEXICON V-EVEN-PRS V-PRS-SG-12 ; V-PRS-SG-3 ; V-EVEN-PRS-DUPL ;
This (part of) documentation was generated from src/fst/morphology/affixes/verbs.lexc
Prefixes
It contains only one entry:
Noerhte- ProperNoun ;
R
This lexicon is the main entry for regular compounding. All entries NOT requiring a hyphen should point to it.
The whole content of it is a list of flag diacritics to control compounding.
After the flags, we continue to the Rreal lexicon, for the real compounding action.
It should be noted that some of the flags above require a corresponding flag in the lexicon ENDLEX to work properly.
Rreal
This is where the actual compounding happens.
RNum
For compounds of the type Num+Noun. We can’t allow Num+Num, thus we use a separate compounding lexicon, since the regular RHyph lexicon below contains a continuation pointing back to the numerals.
RHyph
This lexicon is used for compounds requiring a hyphen before the next part. As for the regular compounds, we first add a number of flag diacritics to restrict certain combinations, before we continue to the real compounding lexicon.
RHyphReal
This is where the actual hyphen compounding happens. The hyphen is added here.
This (part of) documentation was generated from src/fst/morphology/compounding.lexc
This file documents the phonology.twolc file
e deletion before i-initial suffix
Diphthong simplification ie:e
dåer0ed%>0em
★dåeried%>0em (is not standard language)
gier0ehtse%>0m
★gieriehtse%>0m (is not standard language)
gijm0e%>0be
Diphthong simplification oe:o
daaro0st%>0em
★daaroest%>0em (is not standard language)
gaalo0hke%>0m
★gaaloehke%>0m (is not standard language)
jeark0e%>0be
a/e alternation
aatsked%>0em
★aatskad%>0em (is not standard language)
daktere%>0m
★daktare%>0m (is not standard language)
gæhtje%>0be
a/i alternation
jåhtij0%>em
a/0 alternation
Even syllabic verbs Du3 e/i alternation V
Proper PlGen, PlCom
**Even syllabic verbs Du2, Du3, Pl1, Pl2 e/i class V **
vååjn>i0jibie
★vååjn>e0jibie (is not standard language)
vååjn>i0jægan
★vååjn>e0jægan (is not standard language)
juht»i0je%>0m
★juht»ieje%>0m (is not standard language)
klæhte»tje0
klæhta»tj%>asse
japtse»tje0
japtsa»tj%>asse
gålle»tje0
gålla»tj%>asse
bæss%>0am
balt%>0am
båhtj%>0a
paak%>0a
båat%>0a
bual%>0a
klæht%>0an
japts%>0an
gåll%>0an
gaavl%>0an
gåat%>0an
njuasl%>0an
jeaht%>0a
★jieht%>0a (is not standard language)
sjeall%>0an
gåate»tje0
gylj%>0e
fyrhtje%>0se
hohtje%>0se
ronhtje%>0se
★færhtje%>0se (is not standard language)
★hahtje%>0se (is not standard language)
★rånhtje%>0se (is not standard language)
tjyör%>0e
★tjear%>0e (is not standard language)
byörke%>0se
★bearke%>0se (is not standard language)
myörhtje%>0se
★mearhtje%>0se (is not standard language)
hååre%>0se
★haare%>0se (is not standard language)
rååfe%>0se
★råafe%>0se (is not standard language)
mænn%>0a
jåvk%>0a
tjeal%>0a
ruaht%>0a
minn%>0ien
berk%>0ien
juvk%>0ien
tjiek%>0ien
dïjveld»uvv0ieh
★dïjveld»ovv0ieh (is not standard language)
jeeht%>0im
tjeel%>0im
bööt%>0im
vööjn%>0im
maane»tje0
★maana»tje0 (is not standard language)
maana»tj%>asse
★maane»tj%>asse (is not standard language)
byss»0edh
syrr»0edh
★sïrr»0edh (is not standard language)
gylj»0edh
★gælj»0edh (is not standard language)
bost»0edh
★best»0edh (is not standard language)
dorj»0edh
★darj»0edh (is not standard language)
joht»0edh
★juht»0edh (is not standard language)
gohp»0edh
★gåhp»0edh (is not standard language)
govl»0edh
vyödt»0edh
★viedt»0edh (is not standard language)
tjyör»0edh
★tjear»0edh (is not standard language)
byöpm»0edh
★bïepm»0edh (is not standard language)
dååjr»0edh
gååt»0edh
gååt»0edh
ååst»0edh
vååjn»0edh
gåårk»0edh
vååj»0edh
vååssj»0edh
boel»0ehtjidh
★buel»0ehtjidh (is not standard language)
paak»0ehtjidh
★paek»0ehtjidh (is not standard language)
vïej»0edidh
★veaj»0edidh (is not standard language)
goerk»0edidh
★guark»0edidh (is not standard language)
skïlk»0edidh
★skælk»0edidh (is not standard language)
plotjk»0edidh
★plåtjk»0edidh (is not standard language)
båat»0ast»0alledh
★bået»0ast»0alledh (is not standard language)
★båat»iest»0alledh (is not standard language)
tjyör»0el»0adtedh
★tjear»0el»0adtedh (is not standard language)
★tjyör»oel»0adtedh (is not standard language)
dåeriedi%>dh
bisse%>dh
belte%>dh
buhtje%>dh
paeke%>dh
båete%>dh
buele%>dh
æbjo%>dh
hajko%>dh
gåhpo%>dh
tjearo%>dh
baajsko%>dh
gåaro%>dh
skælke%>dh
skajke%>dh
plåtjke%>dh
sleapke%>dh
snjåarke%>dh
sïrre%>dh
sarje%>dh
sodte%>dh
skïerke%>dh
slaapke%>dh
snjåare%>dh
tjoehpe%>dh
skylle%>dh
aalhteroste%>dh
skyöre%>dh
vååjne%>dh
skilhte%>dh
sijle%>dh
snjurme%>dh
snjeere%>dh
Special rule for ‘soptsesovvedh’ < soptsestidh. No other verbs have st > s before passive derivation.
soptses0»ovvedh
laajhna-aaltoe
aerpie-eeke
★laajhna#aaltoe (is not standard language)
★aerpie#eeke (is not standard language)
Aevjie#aesie
This (part of) documentation was generated from src/fst/morphology/phonology.twolc
Error tag | Explanation |
---|---|
+Err/Orth | Substandard, non-normative form of a word |
+Err/Hyph | Substandard, non-normative |
+Err/SpaceCmp | Substandard, non-normative |
+Err/Attr | Substandard, non-normative Attr form of a word |
+Err/Lex | The lemma and its word forms are outside the norm. Not a normative lemma, but grammatically correct. |
+Err/Der | Errors in derivations |
+Err/Spellrelax | Used to tag spellrelaxed typos (tag is inserted via flag diacritics) |
+Err/MissingSpace | in use in smi lexc |
Usage tag | Explanation |
---|---|
+Use/Marg | Marginal: correct, existing forms that are rare. These words can be removed from, for example, the speller, because they are so rare and so little used that they may be confused with closely related lemmas. |
+Use/-Spell | Excluded from speller |
+Use/-PLX | Excluded in PLX speller |
+Use/SpellNoSugg | Recognized but not suggested in speller |
+Use/Circ | Circular path |
+Use/CircN | Circular number path? |
+Use/Ped | Remove from pedagogical speller |
+Use/NG | Do not generate for isme-ped.fst and apertium |
+Use/MT | Generate for apertium only |
+Use/NotDNorm | For (spellings of) words that do not follow the orthographic principles of sma. Divvun suggests that these should not be normative, even though they have been decided upon by GG. Included in speller. |
+Use/DNorm | For words without formal normalization. Divvun suggests that these should be normative. Included in speller. Based on the 2010 normative decision & Ove Lorentz’ suggestions for the norm. |
+Use/PMatch | Include only in FSTs for hfst-pmatch |
+Use/-PMatch | Do not include in FSTs made for hfst-pmatch |
+Use/GC | only retained in the HFST Grammar Checker disambiguation analyser |
+Use/-GC | never retained in the HFST Grammar Checker disambiguation analyser |
+Use/TTS | only retained in the HFST Text-To-Speech disambiguation tokeniser |
+Use/-TTS | never retained in the HFST Text-To-Speech disambiguation tokeniser |
Dialect tag | Explanation |
---|---|
+Dial/-S | Not in the South |
+Dial/S | Only in the South |
+Dial/-N | Not in the North |
+Dial/N | Only in the North |
+Dial/-NOR | Words not in Norway |
+Dial/NOR | Words only in Norway |
+Dial/-SW | Words not in Sweden |
+Dial/SW | Words only in Sweden |
+Dial/SH | Short forms |
+Dial/L | Long forms |
(These tags govern compound behaviour for the speller, i.e. what a compound SHOULD BE.)
The default is +CmpN/SgN, so when nothing is specified, that will be used. To override it, specify one or more of the following tags. +CmpN/SgN must also be specified if other tags are listed, unless +CmpN/SgN should not be used, of course.
Normative compounding tag | Explanation |
---|---|
+CmpN/Sg | Singular |
+CmpN/SgN | Singular Nominative |
+CmpN/SgG | Singular Genitive |
+CmpN/PlG | Plural Genitive |
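The default rule described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `effective_cmpn_tags` and the representation of a lexicon entry as a list of tag strings are hypothetical; only the four tag names are taken from the table.

```python
# The four normative compounding tags listed in the table above.
NORMATIVE_CMPN_TAGS = ("+CmpN/Sg", "+CmpN/SgN", "+CmpN/SgG", "+CmpN/PlG")
DEFAULT_CMPN_TAG = "+CmpN/SgN"


def effective_cmpn_tags(entry_tags):
    """Return the normative compounding tags that apply to an entry.

    entry_tags is a hypothetical list of tag strings attached to a lemma.
    If none of the normative tags is present, the default +CmpN/SgN applies.
    If any is present, exactly the listed ones apply, so +CmpN/SgN must be
    listed explicitly to stay active.
    """
    listed = [tag for tag in entry_tags if tag in NORMATIVE_CMPN_TAGS]
    return listed if listed else [DEFAULT_CMPN_TAG]
```

For instance, an entry listing only +CmpN/SgG no longer allows nominative singular compounding under this reading, while an entry listing no such tag falls back to +CmpN/SgN.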
These tags overrule the regular tags defined above. One or more can be specified.
Normative left-governing tag | Explanation |
---|---|
+CmpN/SgLeft | Sg to the left |
+CmpN/SgNomLeft | etc. |
+CmpN/SgGenLeft | ” |
+CmpN/PlGenLeft | ” |
Normative position tag | Explanation |
---|---|
+CmpNP/All | … be in all positions, default, this tag does not have to be written |
+CmpNP/First | … only be first part in a compound or alone |
+CmpNP/Pref | … only be first part in a compound, NEVER alone |
+CmpNP/Last | … only be last part in a compound or alone |
+CmpNP/Suff | … only be last part in a compound, NEVER alone |
+CmpNP/None | … not take part in compounds |
+CmpNP/Only | … only be part of a compound, i.e. can never be used alone, but can appear in any position |
Tags for compound analysis - this is what a compound actually is. We use this to research compounding patterns in the corpus.
Descriptive compounding tag | Explanation |
---|---|
+Cmp/Sg | Compounding using an unspecified singular stem |
+Cmp/SgNom | Compounding using nominative singular |
+Cmp/SgGen | Compounding using genitive singular |
+Cmp/PlGen | Compounding using genitive plural |
+Cmp/Attr | Compounding using attribute form |
+Cmp/eh | Compound stem in –eh, as in gaameh-gåaroje, from gaamege |
+Cmp/ege | Compound stem in –ege, as in gaamege-gåaroje |
+Cmp/FinEDel | Deletion of final e, as in voelem-gaaroeh, from voeleme |
+Cmp/ShH | Compounding using a short stem + h: –biejjh– (from biejjie), cf reakedsbiejjhvadtese |
+Cmp/Sh | Compounding using a short stem: –biejj– (from biejjie) |
+Cmp/SplitR | This is a split compound with the other part to the right: “Arbeids- og inkluderingsdepartementet” => Arbeids– = +Cmp/SplitR |
+Cmp/SplitL | This is a split compound with the other part to the left; this is the opposite of the previous case |
+Cmp | Dynamic compound - this tag should always be part of a dynamic compound. It is important for Apertium and the speller (to give extra weights to compounds), and useful in other cases as well. |
+Cmp/XForm | All Cmp without a clear classification |
+Cmp/AttrH | All Cmp that have an attr-h |
+Du = Dual
Tense tag | Explanation |
---|---|
+Prs | Present |
+Prt | Preterite |
Person & Number tag | Explanation |
---|---|
+Sg1 | Singular, 1st person |
+Sg2 | Singular, 2nd person |
+Sg3 | Singular, 3rd person |
+Du1 | Dual, 1st person |
+Du2 | Dual, 2nd person |
+Du3 | Dual, 3rd person |
+Pl1 | Plural, 1st person |
+Pl2 | Plural, 2nd person |
+Pl3 | Plural, 3rd person |
Verbal tag | Explanation |
---|---|
+Neg | negation verb ij |
+ConNeg | main verb complement to Neg, form identical to Imp |
+VAbess | Verb Abessive |
+Inf | Infinitive |
+PrfPrc | Perfect participle |
+PrsPrc | Present participle |
+Ger | Gerund |
+VGen | Verb genitive |
+Ind | Indicative |
+Imprt | Imperative |
+ImprtII | Imperative, for Neg: ollem ollh … |
+Cond | Conditional, for one form: lidtjie. To be looked at. Also lidtjim, lidtjih |
+Act | -eme, could be changed to +Actio |
Semantic tags help disambiguation and syntactic analysis. All tags used are defined and listed below.
Multiple semantic tags are written as one tag, with the different semantic values separated by an underscore (_).
All used combinations must be declared below, and the list must be manually maintained. The tags are ordered alphabetically, both the list and the semantic values within one tag.
Tag | Explanation |
---|---|
+MWE | multi word expressions, goes to abbr |
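The convention for combined semantic tags described above (one tag, values in alphabetical order, joined by an underscore, and every combination declared by hand) can be sketched as follows. This is a minimal illustration: the helper names, the `+Sem/` prefix as a default argument, and the example values are assumptions, not taken from the lexicon source.

```python
def combine_sem_values(values, prefix="+Sem/"):
    """Build one combined semantic tag from several semantic values.

    Values are de-duplicated and sorted alphabetically, then joined with
    an underscore, following the convention described in the text.
    The prefix is an assumption made for this sketch.
    """
    unique = sorted(set(values))
    if not unique:
        raise ValueError("at least one semantic value is required")
    return prefix + "_".join(unique)


def undeclared_tags(used_tags, declared_tags):
    """Return the used tags missing from the manually maintained list."""
    declared = set(declared_tags)
    return sorted(tag for tag in set(used_tags) if tag not in declared)
```

A check like `undeclared_tags` is one way to keep the manually maintained declaration list in step with the tags actually used in the lexicon.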
Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.
Flag | Explanation |
---|---|
@P.Px.add@ | Gives the possibility of Px suffixes (all except Nom 3.p) |
@R.Px.add@ | Requires the P.Px.add flag for Px suffixes (all except Nom 3.p) |
@P.Nom3Px.add@ | Gives the possibility of Px suffixes in Nom 3.p |
@R.Nom3Px.add@ | Requires the P.Nom3Px.add flag for Px suffixes in Nom 3.p |
@P.Pmatch.Backtrack@ | Used on single-token analyses; tells hfst-tokenise/pmatch to backtrack by reanalysing the substrings before and after this point in the form (to find combinations of shorter analyses that would otherwise be missed) |
@D.ErrOrth.ON@ | asdf |
@C.ErrOrth@ | asdf |
@P.ErrOrth.ON@ | asdf |
Derivations in the same position are mutually exclusive (can not be combined), whereas tags in different positions can be combined, so that position 1 derivations must precede position 2 derivations, and so on.
Pos1 | Pos2 | Pos3 | POS switches (from-to) | Explanation |
---|---|---|---|---|
+Der1 | Position tag, required | |||
+Der2 | Position tag, required | |||
+Der3 | Position tag, required | |||
+Der/htalle | VV | Passive, frequentative | ||
+Der/lg | VV | Passive | ||
+Der/ijes | NA | Nomen agentis | ||
+Der/ihks | VA | (Agent noun: inclined to carry out the action denoted by the base word) | ||
+Der/les | VA | Intensive | ||
+Der/ldihkie | VA | |||
+Der/ldahke | VA | Result noun (?) | ||
+Der/ldh | VA | Attribute | ||
+Der/ht | VV | Causative | ||
+Der/l | VV | Subitive | ||
+Der/st | VV | Diminutive, Subitive | ||
+Der/d | VV | Continuative, Conative, Frequentative, Reflexive, Momentaneous | ||
+Der/Car | -hts, Caritive, was Der/heapmi in sme | |||
+Der/htj | NN | Dim-cont, Frequentative | ||
+Der/Dimin | NN | Diminutive | ||
+Der/Rec | NN | Relational forms | ||
+Der/laakan | AAdv | adverb | ||
+Der/laaketje | AA | adjective | ||
+Der/Comp | AA | adjective | ||
+Der/Superl | AA | adjective | ||
+Der/vuota | AN | Noun | ||
+Der/adte | VV | Frequentative, Continuative | ||
+Der/alla | VV | Frequentative | ||
+Der/eds | NA | Attribute | ||
+Der/PassL | VV | long only | ||
+Der/NomAg | VN | Nomen Agentis | ||
+Der/NomAct | VN | Nomen Actionis | ||
+Der/ahtje | VV | Inchoative | ||
+Der/InchL | VV | Inchoative |
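The ordering constraint on positional derivations (at most one derivation per position, and lower positions before higher ones) can be sketched as a small check. This is an illustration only: the function name and the (tag, position) pair representation are assumptions, and the tags in the usage note are placeholders, since the position of each derivation is not recoverable from the table as shown here.

```python
def valid_derivation_chain(chain):
    """Check a chain of positional derivations.

    chain is a hypothetical list of (tag, position) pairs in the order the
    derivations are applied, with position 1, 2 or 3. The chain is valid
    when positions are strictly increasing: two derivations from the same
    position cannot combine, and position 1 must precede position 2, etc.
    """
    positions = [position for _tag, position in chain]
    return all(a < b for a, b in zip(positions, positions[1:]))
```

With placeholder tags, `[("+Der/x", 1), ("+Der/y", 2)]` is accepted, whereas `[("+Der/x", 1), ("+Der/z", 1)]` and `[("+Der/y", 2), ("+Der/x", 1)]` are rejected.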
All non-positional derivations should be preceded by the following tag,
to make it possible to target regular expressions in all derivations in a
language-independent way:
just specify
[+Der](+Der1 .. +Der5)
and you are set.
Derivation tag | POS switch | Explanation |
---|---|---|
+Der/PassS | VV | short passive only |
+Der/A | NA | comparison of nouns |
The following tags are used to guide conversion to IPA: loan words and foreign names are usually pronounced (approximately) as in the originating (majority) language. Instead of trying to identify the correct pronunciation based on phonotactics (orthotactics actually), we tag all words that can’t be correctly transcribed using the SMA transcriber with source language codes. Once tagged, it is possible to apply different IPA conversions to each of them. The principle of tagging is that we only tag to the extent needed, and following a priority:
Originating language tag | Originating language |
---|---|
+OLang/SME | North Sámi |
+OLang/SMA | South Sámi |
+OLang/SMJ | Lule Sámi |
+OLang/FIN | Finnish |
+OLang/SWE | Swedish |
+OLang/NOB | Norw. bokmål |
+OLang/NNO | Norw. nynorsk |
+OLang/ENG | English |
+OLang/RUS | Russian |
+OLang/UND | Undefined |
+OLang/PARA | Parallel names; the name should not be carried over to other Sámi languages |
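The idea of choosing an IPA conversion per source language can be sketched as below. This is not how the transcriber is implemented; it only shows, under assumptions, how a +OLang tag could be read off an analysis string. The function name, the regular expression and the fallback to SMA for untagged words are all hypothetical.

```python
import re

# Matches an originating-language tag such as +OLang/NOB in an analysis string.
OLANG_RE = re.compile(r"\+OLang/([A-Z]+)")


def source_language(analysis, default="SMA"):
    """Return the originating language code of an analysis string.

    Words without a +OLang tag are assumed (for this sketch) to be
    transcribable with the regular SMA transcriber.
    """
    match = OLANG_RE.search(analysis)
    return match.group(1) if match else default
```

A caller could then pick a language-specific conversion from the returned code, for example one rule set for NOB and another for the default SMA case.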
A multichar that usually just goes to zero:
|»
Trigger | Explanation |
---|---|
%^DISIMP | diphthong simplification |
%^COMPDISIMP | diphthong simplification in comparatives |
%^COMPDISIMP2 | diphthong simplification in comparatives, type 2 |
%^COMPDISIMP3 | diphthong simplification |
%^PLCDISIMP | diphthong simplification in ACCRA-names |
%^NOMAGieDISIMP | diphthong simplification for NomAg ie stems |
%^1UML | a-uml, like 1sg prs, perf.part of båetedh/V-I, and ill sg of -ie nouns |
%^2UML | dark e, as 3sg prs & perf.part of tjearodh/V-II, and ill sg of -oe nouns |
%^3UML | adj Umlaut oeh:an |
%^3sUML | a-uml in 3sg prs of V-IV (roehtedh - ruahta) |
%^3dUML | ie-uml in 1du & 3pl prs of V-IV (roehtedh - ruehtien) |
%^iæUML | not used |
%^iUML | i-uml in pret of V-I (båetedh - böötim) |
%^PASSUML | Short passive Umlaut Rx->R5 |
%^didhUML | Der/d Umlaut for GUARKEDH-words |
%^htjidhUML | Umlaut for Der/htjidh derivations |
%^adteUML | Umlaut for Der/adte and Der/alla derivations |
%^aLATUS | Latus-Umlaut for -ie stems |
%^uLATUS | Latus-Umlaut for -oe stems |
%^ConsDel | Stem consonant deletion in front of Der/PassL |
%^ILLELA | Stem vowel changes in Illative and Elative |
%^PLGENPLCOM | Stem vowel changes in final from e -> i, and without -j- |
%^COMESS | Stem vowel changes in ACCRA-names |
∑ | Symbol used before # and - in dynamic compounds, and only there. Used to block optional conversion of word boundaries to spaces for error detection in grammar checkers. That is, dynamic compounds are not allowed to be written apart for error detection, only lexicalised ones. This is done to reduce the amount of ambiguity in the raw analyses to an amount we can cope with. |
We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics - only allow compounds with verbs if the verb is further derived into a noun again:
Flag | Explanation |
---|---|
@P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
@D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
@C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised |
@R.ErrOrth.ON@ |
For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.
Flag | Explanation |
---|---|
@P.CmpFrst.FALSE@ | Require that words tagged as such only appear first |
@D.CmpPref.TRUE@ | Block such words from entering ENDLEX |
@P.CmpPref.FALSE@ | Block these words from making further compounds |
@D.CmpLast.TRUE@ | Block such words from entering R |
@D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding |
@U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding |
@U.CmpNone.TRUE@ | Combines with the two previous ones to block compounding |
@P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R |
@D.CmpOnly.FALSE@ | Disallow words coming directly from root. |
@U.CmpHyph.FALSE@ | Flag to control hyphenated compounds like proper nouns |
@U.CmpHyph.TRUE@ | Flag to control hyphenated compounds like proper nouns |
@C.CmpHyph@ | Flag to control hyphenated compounds like proper nouns |
Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.
Flag | Explanation |
---|---|
@U.Cap.Obl@ | Allowing downcasing of derived names: deatnulasj. |
@U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj. |
The following flag diacritics are used by the grammar checker.
Flag | Explanation |
---|---|
@R.SpellRlx.ON@ | Flag used to tag spell-relax-analysed strings (and only those). |
@D.SpellRlx.ON@ | Flag used to tag spell-relax-analysed strings (and only those). |
@C.SpellRlx@ | Flag used to tag spell-relax-analysed strings (and only those). |
@P.Pmatch.Loc@ | Used on multi-token analyses; tell hfst-tokenise/pmatch where in the form/analysis the token should be split. |
@P.Pmatch.Backtrack@ | Used on single-token analyses; tell hfst-tokenise/pmatch to backtrack by reanalysing the substrings before and after this point in the form (to find combinations of shorter analyses that would otherwise be missed) |
Flag diacritic | Explanation |
---|---|
@U.number.one@ | Flag used to give arabic numerals in smj different cases ; |
@U.number.two@ | Flag used to give arabic numerals in smj different cases ; |
@U.number.three@ | Flag used to give arabic numerals in smj different cases ; |
@U.number.four@ | Flag used to give arabic numerals in smj different cases ; |
@U.number.five@ | Flag used to give arabic numerals in smj different cases ; |
@U.number.six@ | Flag used to give arabic numerals in smj different cases ; |
@U.number.seven@ | Flag used to give arabic numerals in smj different cases ; |
@U.number.eight@ | Flag used to give arabic numerals in smj different cases ; |
@U.number.nine@ | Flag used to give arabic numerals in smj different cases ; |
@U.number.zero@ | Flag used to give arabic numerals in smj different cases ; |
This is the beginning of everything. The Root lexicon is reserved in the LexC language, and must be the first lexicon defined.
Here is the list of top-level lexica in the South Sámi analyser:
Abbreviation ;
Acronym ;
Adjective ;
Adposition ;
Adverb ;
Conjunction ;
Interjection ;
NounRoot ;
Numeral ;
Particle ;
Prefixes ;
Pronoun ;
ProperNoun ;
Punctuation ;
Subjunction ;
Symbols ;
Verb ;
And this is the ENDLEX of everything:
@D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@ ENDLEX2 ;
The @D.CmpOnly.FALSE@ flag diacritic is used to disallow words tagged with +CmpNP/Only to end here.
The @D.NeedNoun.ON@ flag diacritic is used to block illegal compounds.
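How the ENDLEX flags filter paths can be sketched with a small flag-diacritic checker. It follows the standard P (set), R (require), D (disallow), C (clear) and U (unify) operations, and leaves out the N operator for brevity. The function name and the flag sequences in the usage note are assumptions for illustration; in particular, that a +CmpNP/Only word starts with @P.CmpOnly.FALSE@ is assumed here, not stated in the text.

```python
import re

# @OP.Feature.Value@ or @OP.Feature@ (value is optional for R, D and C).
FLAG_RE = re.compile(r"@([PRDCU])\.([^.@]+)(?:\.([^.@]+))?@")


def path_is_allowed(symbols):
    """Return True if the flag diacritics along a path are consistent."""
    state = {}  # feature name -> current value
    for symbol in symbols:
        match = FLAG_RE.fullmatch(symbol)
        if match is None:
            continue  # ordinary symbol, no flag semantics
        op, feature, value = match.groups()
        if op == "P":        # positive set
            state[feature] = value
        elif op == "C":      # clear
            state.pop(feature, None)
        elif op == "R":      # require a given value, or any value
            ok = feature in state if value is None else state.get(feature) == value
            if not ok:
                return False
        elif op == "D":      # disallow a given value, or any value
            bad = feature in state if value is None else state.get(feature) == value
            if bad:
                return False
        elif op == "U":      # unify: fail on a conflicting value
            if feature in state and state[feature] != value:
                return False
            state[feature] = value
    return True
```

Under these assumptions, the path `["@P.CmpOnly.FALSE@", "@D.CmpOnly.FALSE@"]` (a compound-only word used alone) is blocked at ENDLEX, while `["@P.CmpOnly.FALSE@", "@P.CmpOnly.TRUE@", "@D.CmpOnly.FALSE@"]` (the same word after passing R) is accepted.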
This (part of) documentation was generated from src/fst/morphology/root.lexc
This is one of two parallel files containing adjective stems. The files represent two alternative interpretations of the same data (South Saami adjectives). This file is used for spellchecking; the alternative file adjectives-oahpa.lexc is used for dictionary and icall applications. This file is compiled by default; the other one is compiled by running ./configure --with-oahpa in langs/sma before compiling.
etnihke+A+OLang/NOB:etnihke IHKE_IHKELES_LOAN ; !samediggediedahus 2012 - etnisk - etnisiteete+les
This (part of) documentation was generated from src/fst/morphology/stems/adjectives.lexc
Should these really be placed in the noun-adv lexicon if they are to be adverbs?
<== why no case?
laakte bïejedh - to put too close together
This (part of) documentation was generated from src/fst/morphology/stems/adverbs.lexc
NounRoot
This lexicon is the start of all noun lemmas. It splits the nouns in three classes as follows:
NounRoot --> FirstComponent
NounRoot --> HyphNouns
NounRoot --> Noun
Splitting nouns in NounNoPx, NounPx (with a P.Px.add flag) and NounPxKin (with a P.Nom3Px.add flag)
https://satni.uit.no/termwiki/index.php?title=Huksenteknihkka:borettslag
https://satni.uit.no/termwiki/index.php?title=Huksenteknihkka:frittstående_borettslag
https://satni.uit.no/termwiki/index.php?title=Education:embetsstudium
Not according to umlautsystem
—Ije-
Lemma from GG: mark as DNorm
pp, tt, kk --> hp, ht, hk or bp, dt, gk? bp, dt, gk conflict with the orthographic principles, cf.
6 koreen 5 tyrkijen 20 Bottleneck-hypotesen —- dynamisk sammensetning - how? 17 direkte
This (part of) documentation was generated from src/fst/morphology/stems/nouns.lexc
what about
NAMAT ; ! duhatjienat, logigielat, etc. NAMAT derivs are SAS ; !viđajahkásaš
This (part of) documentation was generated from src/fst/morphology/stems/numerals.lexc
The Pronoun lexicon points to all the subgroups, presented below in this order:
Splitting in 1st, 2nd, 3rd
New lemma form, now number as baseform, due to Oahpa
the firstperspronsg for first pers has special consonantism
for nonfirstperspronsg the 2nd and 3rd are identical
LEXICON firstpersprondu
LEXICON nonfirstpersprondu
DIHTE is a personal pronoun, demonstrative dïhte is treated below.
This is for: the attributive forms of dïhte; all forms of the other pronouns
LEXICON Demonstrative
Same as above, with exceptions in
Sg Ill, Sg Ine, Sg Ela, Pl Com
Still open: analyse morphologically or not…
LEXICON DAGKERES
LEXICON Indefinite
LEXICON indeven-e
LEXICON indeven-a
LEXICON muvhtiecase
LEXICON muvhtiesg
LEXICON muvhtiepl
LEXICON naakenlex
LEXICON indodd
LEXICON indsg_odd
LEXICON indpl_odd
LEXICON indess_odd
LEXICON ind_noninfl
LEXICON indsg-e
LEXICON indsg-a
LEXICON indpl-e
LEXICON indpl-a
LEXICON indess
LEXICON indcoll
LEXICON Reflexive
LEXICON OBLREFL
LEXICON OBLREFL-NONPAL
LEXICON OBLREFL-NONPAL2
This (part of) documentation was generated from src/fst/morphology/stems/pronouns.lexc
(Söderhamn. Gävleb))
This (part of) documentation was generated from src/fst/morphology/stems/sma-propernouns.lexc
contlex stem umlaut dict class
LEXICON Verb splits to AUX and Regular_verbs
LEXICON AUX lemma for edtjedh, ij and lea, each with their own contlex in affixes.
LEXICON Regular_verbs here comes the whole list, appr. 11000.
This (part of) documentation was generated from src/fst/morphology/stems/verbs.lexc
This is one of two parallel files containing adjective affixes. The files represent two alternative interpretations of the same data (South Saami adjectives). This file is used for dictionary and icall applications; the alternative file adjectives.lexc is used for spellchecking. This file is compiled by running ./configure --with-oahpa in langs/sma before compiling. The other file (adjectives.lexc) is compiled by default.
Lexical exceptions
**LEXICON A_LDH **
**LEXICON NOERE **
**LEXICON BUERIE **
**LEXICON LEEVLES **
**LEXICON SOOKES **
**LEXICON SOOJMES **
**LEXICON SMAAVE **, both smaave and plaave
**LEXICON SNAARE **
**LEXICON ORRE **
**LEXICON STOERE **
**LEXICON STOERE_COMP **
**LEXICON NAAKE **
**LEXICON GISSE **
**LEXICON GAMTE **
**LEXICON GIEVTE **
**LEXICON KRUANA **
**LEXICON VEELKES **
**LEXICON ROOPSES **
**LEXICON SNEEHPES ** should GEEHPES -> SNEEHPES? geehpebe is not documented. Remove s_S_ODD?
**LEXICON GEEHPES **
**LEXICON GEERVE **
**LEXICON TJOEVKES **
**LEXICON SAETNIES **
**LEXICON SUVHTJIES **
**LEXICON SAEBRIES **
**LEXICON MUJVIES **
**LEXICON STAERIES **
**LEXICON GIERIES **
**LEXICON BAERIES **
**LEXICON GÆHTJOES **
**LEXICON AAREH **
**LEXICON MOOREH **
**LEXICON EVTEBE **
**LEXICON ATTR_EVEN **
**LEXICON ATTR_ODD **
**LEXICON ø_Ø_EVEN **
**LEXICON IJVEadj **
**LEXICON LES ** Should contain only loanwords (?)
**LEXICON IJAALE_A_LOAN **
**LEXICON AATE_adj_LOAN **
**LEXICON AALE **
**LEXICON AALEFORMS **
**LEXICON oe_OE_EVEN **
**LEXICON e_E_EVEN **
**LEXICON e_E_EVENNOCOMP **
**LEXICON a_A_EVENNOCOMP **
**LEXICON a_A_EVEN **
**LEXICON ie_IE_EVEN **
**LEXICON es_ES_EVEN **
**LEXICON ies_IES_EVEN **
**LEXICON Cs_CS_EVEN **
**LEXICON ihks_IHKS_EVENNOCOMP **
**LEXICON RAARH ** Attr= h, Comp =Even( Jïjtje-raarh)
**LEXICON MAST **
**LEXICON JELLE **
**LEXICON UELLE **
**LEXICON e_ES_LOAN ** LOAN - comp removed (LA) - we can consider whether some of these should have comp.
**LEXICON ÆRE ** LOAN
**LEXICON ENTE ** LOAN
**LEXICON LES_LASSE_ie ** derivation from verbs
**LEXICON LES_LASSE_NOM_ie **
**LEXICON LES_LASSE_OBL_ie **
**LEXICON e_ES_EVEN **
**LEXICON e_ES_ODDEVEN **
**LEXICON es_E_EVEN **
**LEXICON en_E_EVEN **
**LEXICON as_A_EVEN ** attr= s, pred= e, comp=EVEN(ebe,emes), Case/Substantvien hammoe= EVEN
**LEXICON oes_OE_EVEN ** attr= s, pred= Ø, Comp= EVEN(ebe, emes) case/substantiven hammoe= Even
**LEXICON ies_IE_EVENNOCOMP ** attr= s, pred. = Ø, comp jih case: even (ebe/emes/esne)
**LEXICON ies_(ehke)_IE_EHKE_ODDCOMPe_EVEN **
**LEXICON ies_(ehke)_IES_EHKE_ODDCOMPe_EVEN **
**LEXICON ies_(ehke)_EHKE_ODDCOMPe_EVEN **
**LEXICON ies_IES_IE_EVEN ** Comp & case = even (ebe/emes/esne)
**LEXICON ies_IES_IE_EVENNOCOMP ** Attr=s, Pred. = s jih Ø, Comp jih caese= even (ebe/emes/esne)
**LEXICON as_AS_EVEN ** Attr=s, Pred. = s jih Ø, Comp jih caese= even (ebe/emes/esne)
**LEXICON IJLE_LOAN_A **
**LEXICON ø_Ø_ODD **
**LEXICON ah_AH_ODD ** XXX CASE is missing - what is correct - perhaps EVEN?
**LEXICON ah_AH_ODDNOCOMP **
**LEXICON hth_HTH_ODD **
**LEXICON Ce_CE_ODD ** bårreske
**LEXICON ege_EGE_ODDEVEN ** rudtjege
**LEXICON s_S_Ø_ODD ** hamhpas
**LEXICON s_E_ODD **
**LEXICON as_AS_ODD **
**LEXICON s_S_ODD **
**LEXICON es_ES_ODDhk **
**LEXICON oes_OES_ODD **
**LEXICON oes_OES_ODDahk **
**LEXICON oes_OES_ODDas **
**LEXICON Ces_CES_ODDNOCOMP **
**LEXICON les_LES_ODD **
**LEXICON Cs_CS_CE_ODD **
**LEXICON Cs_CS_ODD **
**LEXICON hts_HTS_ODD **
**LEXICON an_AN_ODDNOCOMP **
**LEXICON AABELE **
**LEXICON IJBELE **
**LEXICON Ce_CES_ODD **
**LEXICON ø_S_ODDEVEN ** , Comp=even jih ODD, Case= ODD
**LEXICON ø_S_ODD ** , Comp=even jih ODD, Case= ODD
**LEXICON JEASIEGOELKIJE **
__LEXICON BÅETIJE ! __ bårreske
**LEXICON jes_js_JES_JS_ODD ** two adjectives: bualijes, fååmijes
**LEXICON ijes_ijs_IJE_ODD ** GUALIJES
**LEXICON ijes_ijs_IJE_ODDNOCOMP ** DÅAJMIJES
**LEXICON Cs_CE_ODD **
**LEXICON Ces_CE_ODD **
**LEXICON Ces_Ce_CES_CE_ODD ** as Ces_CES_CE_ODD, but with ATTR_0
**LEXICON Ce_Ces_CES_CE_ODD **
**LEXICON Ces_CES_CE_ODD **
**LEXICON hks_hke_HKS_HKE_ODD **
**LEXICON as_AN_ODD **
**LEXICON oes_AN_ODD **
**LEXICON s_N_ODD **
**LEXICON ah_AN_ODDNOCOMP **
**LEXICON oeh_OEH_ODDNOCOMP **
**LEXICON oeh_OEN_ODD **
**LEXICON oeh_AN_ODD **
**LEXICON oeh_OEH_AN_ODD **
**LEXICON ø_N_ODD **
**LEXICON e_AN_ODD **
**LEXICON ies_EME **
**LEXICON HKE_ODD_NGCOMP **
**LEXICON HKE_ODD_COMP **
**LEXICON GEERUVE **
**LEXICON JAEDTUVES **
**LEXICON ATTR_0_PRED_0 **
**LEXICON ATTR_0 **
**LEXICON ATTR_S **
**LEXICON ATTR_H **
**LEXICON ATTR_N **
**LEXICON ATTRCONT **
**LEXICON PRED_0 **
**LEXICON PRED_S **
**LEXICON PRED_H **
**LEXICON PRED_N **
**LEXICON PRED_AN **
**LEXICON ODDCASE **
**LEXICON ODDCASENOM **
**LEXICON ODDCASEOBL **
**LEXICON ije_ODDCASE **
**LEXICON ije_ODDCASENOM **
**LEXICON ije_ODDCASEOBL **
**LEXICON eCASE **
**LEXICON eCASENOM **
**LEXICON eCASEOBL **
**LEXICON aCASE **
**LEXICON aCASENOM **
**LEXICON aCASEOBL **
**LEXICON ieCASE **
**LEXICON oeCASE **
**LEXICON oeCASE_NOMSG **
**LEXICON oeCASE_OBL **
**LEXICON A_OE_SGILL_UML **
**LEXICON A_OE **
**LEXICON A_OE_SG **
**LEXICON A_OE_PL **
**LEXICON A_OE_ESS **
**LEXICON EVENCOMP **
**LEXICON EVENCOMPONLY **
**LEXICON EVENSUPONLY **
**LEXICON EVENCOMP_oe **
**LEXICON ODDCOMP **
**LEXICON EVENCOMPCASE **
**LEXICON EVENCOMPCASE_oe **
**LEXICON DIMCOMP **
**LEXICON ÅBPOE **
**LEXICON ÅBPOE_N_OE **
**LEXICON ÅBPOE_N_OE_SG **
**LEXICON ÅBPOE_N_OE_PL **
**LEXICON ÅBPOE_N_OE_ESS **
**LEXICON ABPA **
**LEXICON ABPA_SG **
**LEXICON ABPA_PL **
**LEXICON MES **
**LEXICON MES_oe **
**LEXICON OMMES **
**LEXICON LAAKAN **
**LEXICON LEEJNES **
This (part of) documentation was generated from src/fst/oahpa-filer/aff-adjectives-oahpa.lexc
This is one of two parallel files containing adjective stems. The files represent two alternative interpretations of the same data (South Saami adjectives). This file is used for dictionary and icall applications; the alternative file adjectives.lexc is used for spellchecking. This file is compiled by running ./configure --with-oahpa in langs/sma before compiling. The other file (adjectives.lexc) is compiled by default.
The file starts as follows:
TG-grammatihkeles:TG-grammatihkel LES ;
aajmoes:aajmoe s_S_ODD ;
aajne:aajne ATTR_0 ; \ … \
This (part of) documentation was generated from src/fst/oahpa-filer/stems-adjectives-oahpa.lexc
retroflex plosive, voiceless t` ʈ 0288, 648 (` = ASCII 096)
retroflex plosive, voiced d` ɖ 0256, 598
labiodental nasal F ɱ 0271, 625
retroflex nasal n` ɳ 0273, 627
palatal nasal J ɲ 0272, 626
velar nasal N ŋ 014B, 331
uvular nasal N\ ɴ 0274, 628
bilabial trill B\ ʙ 0299, 665
uvular trill R\ ʀ 0280, 640
alveolar tap 4 ɾ 027E, 638
retroflex flap r` ɽ 027D, 637
bilabial fricative, voiceless p\ ɸ 0278, 632
bilabial fricative, voiced B β 03B2, 946
dental fricative, voiceless T θ 03B8, 952
dental fricative, voiced D ð 00F0, 240
postalveolar fricative, voiceless S ʃ 0283, 643
postalveolar fricative, voiced Z ʒ 0292, 658
retroflex fricative, voiceless s` ʂ 0282, 642
retroflex fricative, voiced z` ʐ 0290, 656
palatal fricative, voiceless C ç 00E7, 231
palatal fricative, voiced j\ ʝ 029D, 669
velar fricative, voiced G ɣ 0263, 611
uvular fricative, voiceless X χ 03C7, 967
uvular fricative, voiced R ʁ 0281, 641
pharyngeal fricative, voiceless X\ ħ 0127, 295
pharyngeal fricative, voiced ?\ ʕ 0295, 661
glottal fricative, voiced h\ ɦ 0266, 614
alveolar lateral fricative, vl. K
alveolar lateral fricative, vd. K\
labiodental approximant P (or v\)
alveolar approximant r\
retroflex approximant r\`
velar approximant M\
retroflex lateral approximant l`
palatal lateral approximant L
velar lateral approximant L\
Clicks
bilabial O\ (O = capital letter)
dental |\
(post)alveolar !\
palatoalveolar =\
alveolar lateral |\|\
Ejectives, implosives
ejective _> e.g. ejective p p_>
implosive _< e.g. implosive b b_<
Vowels
close back unrounded M
close central unrounded 1
close central rounded }
lax i I
lax y Y
lax u U
close-mid front rounded 2
close-mid central unrounded @\
close-mid central rounded 8
close-mid back unrounded 7
schwa ə @
open-mid front unrounded E
open-mid front rounded 9
open-mid central unrounded 3
open-mid central rounded 3\
open-mid back unrounded V
open-mid back rounded O
ash (ae digraph) {
open schwa (turned a) 6
open front rounded &
open back unrounded A
open back rounded Q
Other symbols
voiceless labial-velar fricative W
voiced labial-palatal approx. H
voiceless epiglottal fricative H\
voiced epiglottal fricative <\
epiglottal plosive >\
alveolo-palatal fricative, vl. s\
alveolo-palatal fricative, voiced z\
alveolar lateral flap l\
simultaneous S and x x\
tie bar _
Suprasegmentals
primary stress "
secondary stress %
long :
half-long :\
extra-short _X
linking mark -\
Tones and word accents
level extra high _T
level high _H
level mid _M
level low _L
level extra low _B
downstep !
upstep ^ (caret, circumflex)
contour, rising _R
contour, falling _F
contour, high rising _H_T
contour, low rising _B_L
contour, rising-falling _R_F
(NB Instead of being written as diacritics with _, all prosodic marks can alternatively be placed in a separate tier, set off by < >, as recommended for the next two symbols.)
global rise <R>
global fall <F>
Diacritics
voiceless _0 (0 = figure), e.g. n_0
voiced _v
aspirated _h
more rounded _O (O = letter)
less rounded _c
advanced _+
retracted _-
centralized _"
syllabic = (or _=) e.g. n= (or n_=)
non-syllabic _^
rhoticity `
breathy voiced _t
creaky voiced _k
linguolabial _N
labialized _w
palatalized ' (or _j) e.g. t' (or t_j)
velarized _G
pharyngealized _?\
dental _d
apical _a
laminal _m
nasalized ~ (or _~) e.g. A~ (or A_~)
nasal release _n
lateral release _l
no audible release _}
velarized or pharyngealized _e
velarized l, alternatively 5
raised _r
lowered _o
advanced tongue root _A
retracted tongue root _q
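The symbol table above pairs each X-SAMPA symbol with an IPA character and its Unicode code point (hexadecimal, then decimal). A small lookup built from a few of those rows could look like the sketch below. It is an illustration only and not the conversion performed by txt2ipa.xfscript: the dictionary name, the function name and the pre-tokenised input are assumptions.

```python
# A handful of rows from the table above: X-SAMPA symbol -> Unicode code point.
XSAMPA_TO_CODEPOINT = {
    "J": 0x0272,    # palatal nasal
    "N": 0x014B,    # velar nasal
    "R\\": 0x0280,  # uvular trill (X-SAMPA R followed by a backslash)
    "T": 0x03B8,    # dental fricative, voiceless
    "D": 0x00F0,    # dental fricative, voiced
    "S": 0x0283,    # postalveolar fricative, voiceless
    "Z": 0x0292,    # postalveolar fricative, voiced
    "G": 0x0263,    # velar fricative, voiced
}


def xsampa_tokens_to_ipa(tokens):
    """Convert a list of X-SAMPA tokens to an IPA string.

    Tokens without an entry in the (deliberately partial) table are passed
    through unchanged. Splitting a raw X-SAMPA string into tokens is not
    handled in this sketch.
    """
    return "".join(
        chr(XSAMPA_TO_CODEPOINT[token]) if token in XSAMPA_TO_CODEPOINT else token
        for token in tokens
    )
```

Keeping the code points rather than the literal characters in the table makes it easy to cross-check each entry against the hexadecimal and decimal values given in the listing.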
This (part of) documentation was generated from src/fst/phonetics/txt2ipa.xfscript
We describe here how abbreviations in South Sámi are read out, e.g. for text-to-speech systems.
For example:
Copy from smj: same name as this file:
SMJ NOAB ! Abbreviations that are not treated as abbreviations at the end of the sentence = * **esim.:esimerkiksi # ; ** contains abbreviations who are transitive in front of numerals = * **esim.:esimerkiksi # ; ** contains transitive abbreviations = * **esim.:esimerkiksi # ; ** su, dii ============ SMI abbrevisations: ============ smi_ITRAB smi_TRAB smi_TRNUMAB
This (part of) documentation was generated from src/fst/transcriptions/transcriptor-abbrevs2text.lexc
S O U T H S A A M I G R A M M A R C H E C K E R
This section lists all the tags inherited from the fst, and used as tags in the syntactic analysis. The next section, Sets, contains sets defined on the basis of the tags listed here, those set names are not visible in the output.
BOS EOS
N A Adv V Pron CS CC CC-CS Po Pr Pcle Num Interj ABBR ACR CLB LEFT RIGHT WEB PPUNCT PUNCT MWE
COMMA ¶
Pers Dem Interr Indef Recipr Refl Rel Coll NomAg Prop Allegro Arab Romertall
Nom Acc Gen Ill Ela Ine Loc Com Ess Ess Sg Du Pl Cmp/SplitR Cmp/SgNom Cmp/SgGen Cmp/SgGen PxSg1 PxSg2 PxSg3 PxDu1 PxDu2 PxDu3 PxPl1 PxPl2 PxPl3 Px
Comp Superl Attr Ord Qst IV TV Prt Prs Ind Pot Cond Imprt ImprtII Sg1 Sg2 Sg3 Du1 Du2 Du3 Pl1 Pl2 Pl3 Inf ConNeg Neg PrfPrc VGen PrsPrc Ger Sup Actio VAbess
Der/A
Der/Car
Der/Dimin
Der/InchL
Der/NomAct
Der/NomAg
Der/PassL
Der/PassS
Der/Rec
Der/adte
Der/ahtje
Der/alla
Der/d
Der/eds
Der/ht
Der/htalle
Der/htj
Der/ihks
Der/ijes
Der/l
Der/laakan
Der/ldahke
Der/ldh
Der/ldihkie
Der/les
Der/lg
Der/st
Other semantic sets:
PROP-ATTR
PROP-SUR
HUMAN
TIME-N-SET
@+FAUXV
@+FMAINV
@-FAUXV
@-FMAINV
@-FSUBJ>
@-F<OBJ
@-FOBJ>
@-FSPRED<OBJ
@-F<ADVL
@-FADVL>
@-F<SPRED
@-F<OPRED
@-FSPRED>
@-FOPRED>
FOBJ
FMAINV
FAUXV
@>ADVL
@ADVL<
@<ADVL
@ADVL>
@ADVL
@HAB>
@<HAB
@HAB
@>N
@Interj
@N<
@>A
@P<
@>P
@HNOUN
@INTERJ
@>Num
@Pron<
@>Pron
@Num<
@OBJ
@<OBJ
@OBJ>
@OPRED
@<OPRED
@OPRED>
@PCLE
@COMP-CS<
@SPRED
@<SPRED
@SPRED>
@SUBJ
@<SUBJ
@SUBJ>
SUBJ
SPRED
OPRED
@PPRED
@APP
@APP-N<
@APP-Pron<
@APP>Pron
@APP-Num<
@APP-ADVL<
@VOC
@CVP
@CNP
OBJ
<OBJ
OBJ>
<OBJ-OTHERS
OBJ>-OTHERS
SYN-V
@X
This part of the file lists a large number of sets based partly upon the tags defined above, and partly upon lexemes drawn from the lexicon. See the sourcefile itself to inspect the sets, what follows here is an overview of the set types.
INITIAL
WORD NOT-COMMA
ADLVCASE
CASE-AGREEMENT CASE
NOT-NOM NOT-GEN NOT-ACC
NOT-V
REAL-NEG
MOOD-V
NOT-PRFPRC
SG1-V SG2-V SG3-V DU1-V DU2-V DU3-V PL1-V PL2-V PL3-V
These sets model noun phrases (NPs). The idea is to first define whatever can occur in front of the head of the NP, and thereafter negate that with the expression WORD - premodifiers.
Naming convention &errorclass-errortype-wrong-correct: So far only one errorclass: msyn.
RULE SECTION
VERB agreement
Ensure preceding nominal agrees with the verb
This (part of) documentation was generated from tools/grammarcheckers/grammarchecker.cg3
"<.>" "<!>" "<?>" "<...>" "<¶>" sent
(>>>) (<s>)
(<<<) (</s>)
Nom Acc Gen Ine Ela Ill Com Ess
PxSg1 PxSg2 PxSg3 PxPl1 PxPl3 PxPl3
Der/A
Der/Car
Der/Dimin
Der/InchL
Der/NomAct
Der/NomAg
Der/PassL
Der/PassS
Der/Rec
Der/adte
Der/ahtje
Der/alla
Der/d
Der/eds
Der/ht
Der/htalle
Der/htj
Der/ihks
Der/ijes
Der/l
Der/laakan
Der/ldahke
Der/ldh
Der/ldihkie
Der/les
Der/lg
Der/st
Der/vuota
We define two lists for Err/xxx
tags:
Err/Orth
:
Err/Orth
Err/Orth-a/á
Err/Orth-nom/gen
Err/Orth-nom/acc
Err/DerSub
Err/CmpSub
Err/UnspaceCmp
Err/HyphSub
Err/SpaceCmp
Err/Spellrelax
err_orth_mt
Err/Orth-spes
:
Err/Orth-a/á
Err/Orth-nom/gen
Err/Orth-nom/acc
Err/DerSub
Err/CmpSub
Err/UnspaceCmp
Err/HyphSub
Err/SpaceCmp
Err/Spellrelax
err_orth_a_á_mt
err_orth_nom_acc_mt
err_orth_nom_gen_mt
Cmp/Hyph
<vdic>
REAL-TITLE OFFICE TITLE
CASES ADVLCASE NUMBER
INSTITUTION ORGANIZATION EDUCATION CURRENCY CURRENCY LESSON
REALCOPULAS
COPULAS
V-NOT-COP
MOD-ASP
GUKTIEGOSSE
DAESTIE
ILLADV
INEADV1
ELAADV1
INEADV
ELAADV
DV-MOD-ADV
ILLPO
REALCLB
SV-BOUNDARY
NP-BOUNDARY
V-DER
V-DER-SUF
N-DER N-DER-SUF
A-DER A-DER-SUF
PASS
LEX-V LEX-N LEX-A LEX-ADV
VERB-FORMS 2-PERS
BEFORE-SECTIONS
Rule for adding Sem/Date as a tag to readings which look like dates (to be removed when we get a common numeral file from shared)
Rules for adding
SECTION
Removing non-lexicalised forms when lexicalised
Remove Px if not family
INITIAL
Selecting postpositions when preceded by genitives, etc.
Rel or Interr OR Indef
Selecting adverbs in local contexts
Selecting verbs in local contexts, based upon agreement patterns
Selecting imperative sentence-initially with appropriate right context
Remove verb readings
Select Inf
Mapping CNP to CC and CS.
Mapping @CVP to all CS
Attributes or not
Select PrfPrc if DerNomAct
Mapping verbs
This rule removes all other readings if there is a mapped V reading in the same cohort. Every case in which this goes wrong should be fixed in the mapping rules or in earlier disambiguation rules.
leah Prs Sg2 = Pl3
Select Inf If Infv
Remove Prop Attr if not 1 Prop
Ger or Der/NomAct
Adj or Indef
Num
Adv or Po/Pr
Illative or genitive
Essive
Comitative
Accusative or illative
Indef or Adv
special lemmas
Adverb context prefers Adv
Verb person vs. Inf – moved here in order to have the pronouns disambiguated first.
Rule set taken from sme
gellie as numeral, not pronoun
This (part of) documentation was generated from tools/grammarcheckers/grc-disambiguator.cg3
Usage:
$ make
$ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
Pmatch documentation: https://github.com/hfst/hfst/wiki/HfstPmatch
Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:
U+00AD
U+FEFF
.Whitespace contains ASCII white space and the List contains some unicode white space characters
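The two code points listed above are invisible in most editors, which makes them easy to overlook in test input. The following Python sketch, using only the standard `unicodedata` module, prints their Unicode names and locates them in a string; the helper names `describe` and `find_invisible` are made up for this illustration and are not part of the tokeniser.

```python
import unicodedata

# The two code points named in the documentation above.
INVISIBLE = ("\u00ad", "\ufeff")


def describe(ch):
    """Return code point, Unicode name and general category of a character."""
    return "U+%04X %s (category %s)" % (
        ord(ch), unicodedata.name(ch), unicodedata.category(ch))


def find_invisible(text):
    """Return (index, code point) pairs for each listed character in text."""
    return [(i, "U+%04X" % ord(ch))
            for i, ch in enumerate(text) if ch in INVISIBLE]


for ch in INVISIBLE:
    print(describe(ch))

# Neither character counts as white space in Python, so str.split()
# leaves them attached to the neighbouring word.
print(find_invisible("\ufeffja\u00adja"))
```

Both characters are format characters rather than white space, which is consistent with the documentation treating them separately from the white space definitions.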
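Apart from what’s in our morphology, there are 1) unknown word-like forms, and 2) unmatched strings. We want to give 1) a match, but let 2) be treated specially by hfst-tokenise -a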
Unknowns are made of:
Unknowns are tagged ?? and treated specially with hfst-tokenise
hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG: they get a default baseform equal to the wordform, but no tag to check, so it’s safer to let hfst-tokenise handle them.
Finally we mark as a token any sequence making up a:
This (part of) documentation was generated from tools/tokenisers/tokeniser-disamb-gt-desc.pmscript
Requires a recent version of HFST (3.10.0 / git revision>=3aecdbc). Then just:
$ make
$ echo "ja, ja" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
More usage examples:
$ echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa boasttu olmmoš, man mielde lahtuid." | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
$ echo "márffibiillagáffe" | hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
Pmatch documentation: https://github.com/hfst/hfst/wiki/HfstPmatch
Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:
U+00AD
U+FEFF
.Whitespace contains ASCII white space and the List contains some unicode white space characters
Apart from what’s in our morphology, there are 1) unknown word-like forms, and 2) unmatched strings. We want to give 1) a match, but let 2) be treated specially by hfst-tokenise -a
TODO: Could use something like this, but the built-ins don’t include šžđčŋ:
Simply give an empty reading when something is unknown: hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG: they get a default baseform equal to the wordform, but no tag to check, so it’s safer to let hfst-tokenise handle them.
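As an illustration of how unknowns can be picked out downstream, here is a minimal Python sketch that scans CG stream text for cohorts whose only tag is `?`. The sample text is hand-written, and the assumption that an unknown is rendered as a reading with the wordform as baseform followed by `?` should be checked against the actual output of hfst-tokenise; the function name `unknown_wordforms` is hypothetical.

```python
# Hand-written sample in CG stream format (an assumption, not real output):
# a cohort line "<wordform>" followed by tab-indented reading lines.
SAMPLE = '"<ja>"\n\t"ja" CC\n"<ukjend>"\n\t"ukjend" ?\n'


def unknown_wordforms(cg_text):
    """Return wordforms that have a reading whose only tag is '?'."""
    unknown = []
    wordform = None
    for line in cg_text.splitlines():
        if line.startswith('"<') and line.endswith('>"'):
            # Cohort line: strip the surrounding "< and >".
            wordform = line[2:-2]
        elif line.startswith("\t") and wordform is not None:
            # Reading line: everything after the closing quote of the
            # baseform is taken to be the tag list (a simplification).
            reading = line.strip()
            tags = reading[reading.rfind('"') + 1:].split()
            if tags == ["?"] and wordform not in unknown:
                unknown.append(wordform)
    return unknown


print(unknown_wordforms(SAMPLE))
```

The sketch ignores subreadings and quoted tags inside readings, which a real consumer of the stream would need to handle.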
Finally we mark as a token any sequence making up a:
This (part of) documentation was generated from tools/tokenisers/tokeniser-gramcheck-gt-desc.pmscript
Requires a recent version of HFST (3.10.0 / git revision>=3aecdbc). Then just:
make
echo "ja, ja" \
| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
More usage examples:
echo "Juos gorreválggain lea (dárbbašlaš) deavdit gáibádusa \
boasttu olmmoš, man mielde lahtuid." \
| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
echo "(gáfe) 'ja' ja 3. ja? ц jaja ukjend \"ukjend\"" \
| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
echo "márffibiillagáffe" \
| hfst-tokenise --giella-cg tokeniser-disamb-gt-desc.pmhfst
Pmatch documentation: https://kitwiki.csc.fi/twiki/bin/view/KitWiki/HfstPmatch
Characters which have analyses in the lexicon, but can appear without spaces before/after, that is, with no context conditions, and adjacent to words:
U+00AD
U+FEFF
.Whitespace contains ASCII white space and the List contains some unicode white space characters
Apart from what’s in our morphology, there are 1) unknown word-like forms, and 2) unmatched strings. We want to give 1) a match, but let 2) be treated specially by hfst-tokenise -a
TODO: Could use something like this, but the built-ins don’t include šžđčŋ:
Simply give an empty reading when something is unknown: hfst-tokenise --giella-cg will treat such empty analyses as unknowns, and remove empty analyses from other readings. Empty readings are also legal in CG: they get a default baseform equal to the wordform, but no tag to check, so it’s safer to let hfst-tokenise handle them.
Needs hfst-tokenise to output things differently depending on the tag they get
This (part of) documentation was generated from tools/tokenisers/tokeniser-tts-cggt-desc.pmscript