Finite state and Constraint Grammar based analysers, proofing tools and other resources
View the project on GitHub giellalt/lang-smj
All doc-comment documentation in one large file.
Rules for removing some Props which are identical to common nouns
IfonlyVerb selects the FMAINV reading in the cohort
Go for minimal weight (requires --with-backend-format=openfst-tropical)
This (part of) documentation was generated from src/cg3/disambiguator.cg3
**LEXICON ab-noun**
**LEXICON ab-adj**
**LEXICON ab-adv**
**LEXICON ab-num**
**LEXICON ab-nodot-noun** The bulk
**LEXICON ab-nodot-adj**
**LEXICON ab-nodot-adv**
**LEXICON ab-nodot-num**
**LEXICON ab-dot-noun** This is the lexicon for abbrs that must have a period.
**LEXICON ab-dot-adj** This is the lexicon for abbrs that must have a period.
**LEXICON ab-dot-adv** This is the lexicon for abbrs that must have a period.
**LEXICON ab-dot-num** This is the lexicon for abbrs that must have a period.
**LEXICON ab-dot-cc**
**LEXICON ab-dot-verb**
**LEXICON ab-nodot-verb**
**LEXICON ab-dot-IVprfprc**
**LEXICON nodot-attrnomaccgen-infl**
**LEXICON nodot-attr-infl**
**LEXICON nodot-nomaccgen-infl**
**LEXICON dot-attrnomaccgen-infl**
**LEXICON dot-attr**
**LEXICON dot-nomaccgen-infl**
**LEXICON DOT** - Adds the dot to dotted abbreviations.
This (part of) documentation was generated from src/fst/morphology/affixes/abbreviations.lexc
LEXICON GIEVRRA Adjectives with attribute in WeG and -s. As 1a in Spiik. Sg Acc: gievrav, Attr: gievras.
gárttje+A+Sg+Nom
gárttje+A+Sg+Acc
gárttje+A+Attr
gárttje+A+Der/Comp+A+Sg+Nom
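The test entries in this file are analysis strings: a lemma followed by `+`-separated tags. As a minimal sketch of how such a line can be split into its parts, the Python below uses a hypothetical helper `parse_analysis` (not part of the lang-smj tooling) and assumes the lemma itself never contains `+`.

```python
from typing import NamedTuple, Tuple


class Analysis(NamedTuple):
    lemma: str
    tags: Tuple[str, ...]


def parse_analysis(line: str) -> Analysis:
    """Split 'lemma+Tag1+Tag2' into a lemma and its tags (hypothetical helper)."""
    # Assumption: the first '+' always ends the lemma.
    lemma, *tags = line.strip().split("+")
    return Analysis(lemma, tuple(tags))


if __name__ == "__main__":
    for entry in ("gárttje+A+Sg+Nom", "gárttje+A+Der/Comp+A+Sg+Nom"):
        print(parse_analysis(entry))
```

Tags such as `Der/Comp` stay intact because only `+` is treated as a separator.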
LEXICON NUORRA Adjectives with attribute same as pred. As 1b in Spiik. Sg Acc: nuorav, Attr: nuorra.
visská+A+Sg+Nom
visská+A+Sg+Acc
visská+A+Attr
visská+A+Der/Comp+A+Sg+Nom
LEXICON GALLJE Adjectives on -e, the attribute is in WeG and e > a. As 1d in Spiik. Sg Acc: galjev, Attr: galja.
uhttse+A+Sg+Nom
uhttse+A+Sg+Acc
uhttse+A+Attr
uhttse+A+Attr (Eng. # gets this attr from)
uhttse+A+Der/Comp+A+Sg+Nom
LEXICON TJÁBBE Adjectives on -e, the attribute is in WeG and e > a. Same as GALLJE only different adv derivation. Sg Acc: tjáppev, Attr: tjáppa.
njálgge+A+Sg+Nom
njálgge+A+Sg+Acc
njálgge+A+Attr
njálgge+A+Der/Comp+A+Sg+Nom
LEXICON VILLDA Adjectives with attribute same as pred, without CG. As 1b in Spiik. Sg Acc: nuorav, Attr: nuorra.
frisska+A+Sg+Nom
frisska+A+Sg+Acc
frisska+A+Attr
frisska+A+Der/Comp+A+Sg+Nom
LEXICON HÁVSSKE Adjectives with attribute -s, without WeG. As 1c in Spiik. Sg Acc: hávsskev, Attr: hávsskes.
hoallá+A+Sg+Nom
hoallá+A+Sg+Acc
hoallá+A+Attr
hoallá+A+Der/Comp+A+Sg+Nom
LEXICON TJUODDJE Adjectives with attribute -is, without WeG. Presently only “Tjuoddje”. Sg Acc: tjuoddjev, Attr: tjuoddjis.
tjuoddje+A+Sg+Nom
tjuoddje+A+Sg+Acc
tjuoddje+A+Attr
tjuoddje+A+Der/Comp+A+Sg+Nom
Present participles
LEXICON SÁVADAHTTE Causative participles. No attribute. No comparison. As 1e in Spiik. Sg Acc: sávadahttev. PrsPrc of causative verbs “express that the action can be done or is worth doing” (Kintel 1991).
vuojedahtte+A+Sg+Nom
vuojedahtte+A+Sg+Acc
LEXICON JUHKKE Participles with -s attributive. No comparison. As 1e in Spiik. Sg Acc: juhkkev, Attr: juhkkes. Spiik: with the attributive form in -s, the present participle has the meaning “someone who is good at, quick to, inclined to perform the action”.
vuohttje+A+Sg+Nom
vuohttje+A+Sg+Acc
vuohttje+A+Attr
LEXICON BÅRRE Participles without the -s attributive. As 1e in Spiik. Sg Acc: bårrev, Attr: bårre. Spiik: with the attributive form without -s, the present participle has the meaning “those who perform the action”.
ednabårre+A+Sg+Nom
ednabårre+A+Sg+Acc
ednabårre+A+Attr
Test data:
Loan word lexica
LEXICON METÅVDÅLASJ LOAN! Foreign -isk adjectives adapted in the updated normative way. With the smj ending -alasj, the adjective is truly derived from a noun: mekanisk-mekanihkka-mekanihkalasj, instead of mekánalasj, which goes to MEKÁNALASJ_BADASS. Pred and attr are both -alasj. Attr same as pred. With comparatives.
LEXICON METÅVDÅLASJ_CMP_INFL
kapitalismalasj+A+Sg+Nom
kapitalismalasj+A+Sg+Acc
kapitalismalasj+A+Attr
kapitalismalasj+A+Der/Comp+A+Sg+Nom
LEXICON MEKANIHKA_MEKANIJKA_LASJ LOAN! Same type of adjectives as METÅVDÅLASJ, only for adjectives that become mekanihkalasj in Norway and mekanijkalasj in Sweden, because of the differences mekanik vs mekanikk > mekanijkka vs mekanihkka. Attr same as pred. With comparatives.
LEXICON IJJALASJ Just like METÅVDÅLASJ, only for words ending on ijjalasj/iddjalasj, so that we don’t need a lot of Area and Err tags in the stems file.
LEXICON IJJALASJ_CMP_INFL
LEXICON OGIJJALASJ Just like IJJALASJ, only for words ending on ogijjalasj/ogiddjalasj, so that we don’t need a lot of err tags in the stems files. For words like “pedagogijjalasj”, which also have “pedagåvgålasj” (not really a wrong derivation, but doesn’t mean pedagogisk) and the err-tagged “pedagogalasj”.
LEXICON OGIJJALASJ_CMP_INFL
LEXICON SJÅNÅLASJ_SJONAL -sjonal/sjonell and -tional/tionel loanwords. Only for words that work as nouns, so that they are REAL derivations, as nasjonal-nasjåvnnå-nasjåvnålasj. NOT for words like “rasjonell”, with no real noun. Words such as “rasjonell>rasjonálla-rasjonálalasj” go to lexicon ÁLLA. The fake derivation “nasjonálalasj” is err-tagged, and so is the strange “nasjonálla/nasjunálla”.
LEXICON SJÅNÅLASJ_SJONAL_CMP_INFL
nasjåvnålasj+A+Sg+Nom
nasjåvnålasj+A+Sg+Acc
nasjåvnålasj+A+Attr
LEXICON SJÅNÅLASJ_SJONELL -sjonal/sjonell and -tional/tionel loanwords. Only for words that work as nouns, so that they are REAL derivations, as nasjonal-nasjåvnnå-nasjåvnålasj. NOT for words like “rasjonell”, with no real noun. Words such as “rasjonell>rasjonálla-rasjonálalasj” go to lexicon ÁLLA. The fake derivation “nasjonálalasj” is err-tagged, and so is the strange “nasjonálla/nasjunálla”.
LEXICON SJÅNÅLASJ_SJONELL_CMP_INFL
konstitusjåvnålasj+A+Sg+Nom
konstitusjåvnålasj+A+Sg+Acc
konstitusjåvnålasj+A+Attr
LEXICON MEKÁNALASJ_BADASS LOAN! Wrongly assimilated -lasj adjectives from SE/NO -isk. Looks derived but isn’t, since there is no real noun to derive it from. Like mekanisk-mekánalasj, but “mekádna” is no real noun! Like METÅVDÅLASJ, but gives the Err/Der tag, so it’s only for these wrongly derived or non-derived loan adjectives.
LEXICON ARKTALASJ_CMP_INFL Foreign -isk adjectives that are not real derivations. Same as MEKÁNALASJ_BADASS, but no +Use/-Spell tag since there is no “right” way to assimilate these. This is a question for GG. Adapted to smj by simply adding -alasj in place of -isk. These are not real derivations, but citation-borrowed loan adjectives. Only words without a noun base, like arktisk and syntetisk. Pred and attr are both -lasj. No comparatives.
syntetalasj+A+Sg+Nom
syntetalasj+A+Sg+Acc
syntetalasj+A+Attr
LEXICON ORÁNSSJA Loan adjectives, not -isk. Used without the -lasj. Adjectives with attribute same as pred. So far only for oránssja.
LEXICON DEMONSTRATIJVA_LASJ_NO_NORM Loan adjectives from Norwegian/Swedish (not adjectives ending on -isk). Words like demonstrativ, transitiv, dupleks, informativ, analog, privat. Gives both “demonstratijvva” and “demonstratijvalasj”. Two ways of adapting these adjectives are in use. Adding -lasj isn’t okay, because that’s a false derivation, but GG hasn’t decided how these should be handled. Looks like a noun instead of an adjective when adapted without the -lasj ending. Attr is in weak grade; strong grade is used as pred, even though this seems a little odd: “Værbba l transitijvva”.
LEXICON DEMONSTRATIJVA_LASJ_CMP_INFL
aktijvva+A+Attr
aktijvva+A+Sg+Nom
aktijvva+A+Sg+Acc
LEXICON ÁLA_LASJ_NO_NORM Same as DEMONSTRATIJVA_LASJ_NO_NORM, only for adjectives ending on -al. Words like digital, liberal, lokal. Gives both “eksponentiálla” and “eksponentiálalasj”. Different lexicon for these -al adjectives because of Err/Orth tags. OBS, “dialektal” is assimilated as “dialevtalasj”, and goes to lexicon METÅVDÅLASJ.
LEXICON ÁLA_LASJ_INFL_CMP
LEXICON ELLA_LASJ_NO_NORM Loanwords, same as ÁLA_LASJ_NO_NORM and DEMONSTRATIJVA_LASJ_NO_NORM. For NO and SE adjectives ending on -ell: eksperimentell, ideell, parallell. The short form is nom parallælla, attr parallella. The long form: paralellalasj, attr parallellalasj. Different lexicon for these -ell adjectives because of err/orth tags. OBS, “individuell” is assimilated as “indivijdalasj”, and goes to lexicon METÅVDÅLASJ.
LEXICON ELLA_LASJ_INFL_CMP
LEXICON ÁLLA-ÆLLA
LEXICON MEKÁNALASJ_CMP_INFL Same as METÅVDÅLASJ only without vuohta.
Inherent comparatives and superlatives lexica
LEXICON OANEP Inherent comparatives, gives comp and superl. Adjectives that are lexicalized in their comparative (and superlative) forms, like sisŋep, bárep. Some entries are likely incorrect compared forms of other adjectives, like ådåp and ruvvap (more research needed).
lagáp+A+Sg+Nom
lagáp+A+Der/Superl+A+Sg+Nom
LEXICON TJAVGGÁMUS Inherent superlatives, only gives superl. Some words are lexicalized in their superlative forms, like dájvvámus. Some are likely incorrect superlative forms, like tjábbámus (more research is needed)
dájvvámus+A+Sg+Nom
4-syllable miscellaneous stems
LEXICON ÁRMMOGIS Adjectives on -is, attribute same as pred. Odd-syllable comparison. As 2 in Spiik. Sg Acc: ármmogisáv, Attr: ármmogis.
bahágis+A+Sg+Nom
bahágis+A+Sg+Acc
bahágis+A+Attr
bahágis+A+Der/Comp+A+Sg+Nom
LEXICON SÆHKÁLAK Adjectives on -álak, attribute same as pred. Odd-syllable comparison. So far only for “sæhkálak”.
sæhkálak+A+Sg+Nom
sæhkálak+A+Sg+Nom
sæhkálak+A+Sg+Acc
sæhkálak+A+Sg+Acc
sæhkálak+A+Attr
sæhkálak+A+Attr
sæhkálak+A+Der/Comp+A+Sg+Nom
sæhkálak+A+Der/Comp+A+Sg+Nom
LEXICON ÅLLAGSJ_CMP_INFL Adjectives on -asj, attribute same as pred. No comparatives. 2 in Spiik. Sg Acc: ållagattjav, Attr: ållagasj.
belulasj+A+Sg+Nom
belulasj+A+Sg+Acc
belulasj+A+Attr
LEXICON DÁRBULASJ_CMP_INFL Adjectives on -asj, attribute same as pred. Odd-syllable comparison. Sg Acc: dárbulattjav, Attr: dárbulasj. Essive -attjan, -adtjan is sub-tagged. Err/Orth also -ahttja.
dábálasj+A+Sg+Nom
dábálasj+A+Sg+Acc
dábálasj+A+Attr
dábálasj+A+Attr
dábálasj+A+Der/Comp+A+Sg+Nom
LEXICON ASIDASJ_CMP_INFL Adjectives on -asj, -is attr. Odd-syllable comparison. Sg Acc: asidattjav, Attr: asidis.
gågulasj+A+Sg+Nom
gågulasj+A+Sg+Acc
gågulasj+A+Attr
gågulasj+A+Der/Comp+A+Sg+Nom
LEXICON UDNODIBME Adjectives on -dibme, attribute on -is. Odd-syllable comparison. Sg Acc: udnodimev, Attr: udnodis.
gælvodibme+A+Sg+Nom
gælvodibme+A+Sg+Nom
gælvodibme+A+Sg+Acc
gælvodibme+A+Sg+Acc
gælvodibme+A+Attr
gælvodibme+A+Attr
gælvodibme+A+Der/Comp+A+Sg+Nom
gælvodibme+A+Der/Comp+A+Sg+Nom
LEXICON TJALMEDIBME Like UDNODIBME but no comparatives. Sg Acc: tjalmedimev, Attr: tjalmedis.
huvsodibme+A+Sg+Nom
huvsodibme+A+Sg+Acc
huvsodibme+A+Attr
LEXICON SUOLASIEHKE -siehke. Sg Acc: suolasiegev, attr: suolasiek
hánessiehke+A+Sg+Nom
hánessiehke+A+Sg+Acc
hánessiehke+A+Attr
LEXICON TJIEGOS Attr same as pred. For adjectives with -e in second syllable e>á: divtes>diktásav in StrG. As a. in Spiik. Sg Acc: tjiehkusav, Attr: tjiegos. Consonant gradation.
måskas+A+Sg+Nom
måskas+A+Sg+Acc
måskas+A+Attr
måskas+A+Der/Comp+A+Sg+Nom
bihtja+A+Sg+Nom
bihtja+A+Sg+Acc
bihtja+A+Attr
bihtja+A+Der/Comp+A+Sg+Nom
LEXICON LINES Attr ending on -a. Adjectives ending on -es. Does the same as TJIEGOS, but with attr -a. As g. in Spiik. lines, Sg Acc: lidnásav, attr: lidna. Consonant gradation.
lines+A+Sg+Nom
lines+A+Sg+Acc
lines+A+Attr
lines+A+Der/Comp+A+Sg+Nom
LEXICON GALMAS Attr ending on -a or -å. Adjectives on -as, -ås and -ás. As e. in Spiik. Sg Acc: galmmasav, attr: galmma. Consonant gradation.
njuoskas+A+Sg+Nom
njuoskas+A+Sg+Acc
njuoskas+A+Attr
njuoskas+A+Der/Comp+A+Sg+Nom
LEXICON OAMES Attr ending on -e. Adjectives on -es with attribute -e. As g2. in Spiik. Sg Acc: oabmásav, Attr: oabme. Consonant gradation.
goastes+A+Sg+Nom
goastes+A+Sg+Acc
goastes+A+Attr
goastes+A+Der/Comp+A+Sg+Nom
LEXICON SUOHKAT Attr III -is, not suohkkadis but SUOHKKIS. With CG to attr, but without CG between nom and acc. Same as JALGGAT, only with this CG. Adjectives on -at and -åt, with attribute III -is. As f. in Spiik. Sg Acc: suohkadav, attr: suohkkis.
rávvat+A+Sg+Nom
rávvat+A+Sg+Acc
rávvat+A+Attr
rávvat+A+Der/Comp+A+Sg+Nom
LEXICON MÅJDÅS Adjectives with no attr. With CG. Sg Acc: måjddåsav. If there is an attribute that doesn’t fit any lexicon, it must be hardcoded.
rávdes+A+Sg+Nom
rávdes+A+Sg+Acc
rávdes+A+Der/Comp+A+Sg+Nom
Without CG
LEXICON VIEKSES Attr same as pred. Without CG, but with vowel changes. Sg Acc: væksásav, Attr: viekses. Like TJIEGOS, only without the CG but with vowel changes. Maybe change this to a lexicon without attr and then hardcode attr?
LEXICON ALEK Attr same as pred. Without CG, without any vowel changes. Like TJIEGOS, only without the CG and vowel changes.
purpur+A+Sg+Nom
purpur+A+Sg+Acc
purpur+A+Attr
purpur+A+Der/Comp+A+Sg+Nom
LEXICON BASSTEL Attr ending on -is. Without CG. Adjs on -et, -l, -r, some on -k, -sj with attr -is and no consonant gradation. As b. in Spiik. Sg Acc: basstelav, Attr: basstelis. Many of these entries might be instances of derivations, like belak, deblak, and maybe also basstel, bargán.
goavrret+A+Sg+Nom
goavrret+A+Sg+Acc
goavrret+A+Attr
goavrret+A+Der/Comp+A+Sg+Nom
LEXICON MUTTÁK Two attr endings: -is and same as pred. Without CG. Adjs on -ák/-ak/-ek, two attr: -is and same as pred. As c. in Spiik. Sg Acc: muttágav, Attr: muttágis and mutták. These seem to be instances of the adjectival -k derivation. It is unclear whether such derivations have different attr forms or not, and that’s maybe why some of these derivations are found in the BASSTEL lexicon.
bárvak+A+Sg+Nom
bárvak+A+Sg+Acc
bárvak+A+Attr
bárvak+A+Attr
bárvak+A+Der/Comp+A+Sg+Nom
LEXICON JALGGAT Attr III -is, not jalggadis but JALGGIS. Without CG. Adjectives on -at, with attribute III -is. As f. in Spiik. Sg Acc: jalggadav, attr: jalggis.
russjkat+A+Sg+Nom
russjkat+A+Sg+Acc
russjkat+A+Attr
russjkat+A+Der/Comp+A+Sg+Nom
LEXICON TJÅRGGÅT Attr III -is, not tjårggådis but tjårggis. Without CG. Same as JALGGAT, only for adjectives ending on -åt. Adjectives on -åt, with attribute III -is. As f. in Spiik. Sg Acc: jalggadav, attr: jalggis.
russjkat+A+Sg+Nom
russjkat+A+Sg+Acc
russjkat+A+Attr
russjkat+A+Der/Comp+A+Sg+Nom
LEXICON RIHTSOK No attr, without CG and also without any vowel changes. The lexicon gives no attribute, either because the adjective doesn’t have attr, because there is a stem vowel change in attr that the lexicon can’t handle, or because there are strange attributes that don’t fit any other lexicon (these attributes are hardcoded). Sg Acc: rihtsogav.
rihtsok+A+Sg+Nom
rihtsok+A+Sg+Acc
rihtsok+A+Der/Comp+A+Sg+Nom
Exception lexicons for odd-syll
LEXICON IENNILS No comparatives, attr same as pred.
ieŋŋils+A+Sg+Nom
ieŋŋils+A+Sg+Acc
LEXICON RÁDAS Presently only used for “rádas”. This word has special consonant gradation d>dd. Attr same as pred. Sg Acc: ráddasav, Attr: rádas. Consonant gradation.
rádas+A+Sg+Nom
rádas+A+Sg+Acc
rádas+A+Attr
rádas+A+Attr (Eng. # from LEXATTR)
rádas+A+Der/Comp+A+Sg+Nom
LEXICON LUOBES Err/Orth lexicon! Does the same as TJIEGOS, only e>a instead of the usual e>á; must be some err/orth. Sg Acc: luohpasav, Attr: luobes. Consonant gradation. NO Attr, must be hardcoded.
LEXICON LÅSSÅT Two attr, two comp. As f3. in Spiik. So far the only word in this lexicon is “låssåt”, because both låssis and låsså are attr and the comparative is both låsep (hybrid?) and låssådabbo.
låssåt+A+Sg+Nom
låssåt+A+Sg+Acc
låssåt+A+Attr
låssåt+A+Attr
låssåt+A+Der/Comp+A+Sg+Nom
låssåt+A+Der/Comp+A+Sg+Nom
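LÅSSÅT is described as having two attributive forms and two comparatives, and its test data lists the matching analyses twice. Assuming each repeated line stands for one expected surface form (an assumption, since the surface forms are not shown in this listing), a small Python sketch can count how many variants each analysis is tested with. The name `variant_counts` is hypothetical.

```python
from collections import Counter

# Test lines copied from the LÅSSÅT block above.
TEST_LINES = [
    "låssåt+A+Sg+Nom",
    "låssåt+A+Sg+Acc",
    "låssåt+A+Attr",
    "låssåt+A+Attr",
    "låssåt+A+Der/Comp+A+Sg+Nom",
    "låssåt+A+Der/Comp+A+Sg+Nom",
]


def variant_counts(lines):
    """Count how often each analysis string occurs (hypothetical helper)."""
    return Counter(line.strip() for line in lines if line.strip())


if __name__ == "__main__":
    for analysis, count in variant_counts(TEST_LINES).items():
        print(count, analysis)
```

The same count can be run over other blocks with doubled lines, such as MUTTÁK or STUORAK.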
LEXICON STUORAK Only for stuorak. It has two attributes. Has even-syllable comparison: stuoráp and stuorámus. Sg Acc: stuoragav, attr: stuor and stuorra. This might be a -k derivation of the adjective stuorre, attr stuor(ra). The comparison is thus based on the original adjective, so it is naturally an even-syll comparison.
stuorak+A+Sg+Nom
stuorak+A+Sg+Acc
stuorak+A+Attr
stuorak+A+Attr
stuorak+A+Der/Comp+A+Sg+Nom
LEXICON ALLAK Adjs on -ak, attr. on -a. Have both gasep/gaggagabbo and alep/allagabbo as comparatives. As d. in Spiik. So far only the adjectives “allak” and “gassak” go to this lexicon.
gassak+A+Sg+Nom
gassak+A+Sg+Acc
gassak+A+Attr
gassak+A+Der/Comp+A+Sg+Nom
gassak+A+Der/Comp+A+Sg+Nom
LEXICON GÅBDDÅK Adjs on -åk, attr. on -å. Has even-syllable comparison: gåbdep and gåbdemus. So far “gåbddåk” is the only word in this lexicon. As d2. in Spiik. Sg Acc: gåbddågav, Attr: gåbddå.
gåbddåk+A+Sg+Nom
gåbddåk+A+Sg+Acc
gåbddåk+A+Attr
gåbddåk+A+Der/Comp+A+Sg+Nom
Inherent comparatives and superlatives
LEXICON NUORTTALABBO Inherent comparatives, gives both comp and superl. Most of the words are the compared forms of -el(a) words, like nuorttal, lullel.
guddnelabbo+A+Sg+Nom
guddnelabbo+A+Der/Superl+A+Sg+Nom
guddnelabbo+A+Attr
guddnelabbo+A+Attr
LEXICON GASSKALAMOS Inherent superlatives, gives only superl. Words that are lexicalized in their superlative forms.
ájtodamos+A+Sg+Nom
LEXICON SÁDNES Attr same as pred. Sg Acc: sáddnáv, Attr: sádnes.
hávres+A+Sg+Nom
hávres+A+Sg+Acc
hávres+A+Attr
hávres+A+Der/Comp+A+Sg+Nom
LEXICON GOAVSOS Attr same as pred. Sg Acc: goaksuv, Attr: goavsos. (goavsos is so far the only word in this lexicon)
goavsos+A+Sg+Nom
goavsos+A+Sg+Acc
goavsos+A+Sg+Acc (Eng. # From lexicon TJIEGOS)
goavsos+A+Attr
goavsos+A+Der/Comp+A+Sg+Nom
goavsos+A+Der/Comp+A+Sg+Nom (Eng. # from lexicon TJIEGOS)
LEXICON SUVRES Sg Acc: suvrráv, Attr: suvra.
suvres+A+Sg+Nom
suvres+A+Sg+Acc
suvres+A+Sg+Acc (Eng. # From lexicon SJÆVNNJAT)
suvres+A+Attr
suvres+A+Der/Comp+A+Sg+Nom
suvres+A+Der/Comp+A+Sg+Nom (Eng. # from LINES)
LEXICON GÅLMAKTES Attr same as pred. Without CG but with vowel changes. Sg Acc: gålmaktáv, Attr: gålmaktes. VIEKSES does the same thing for odd-syll stems.
LEXICON BU/MUS Comparison for even-syll adjectives. Also derives diminutives and adverbs from the comparisons.
LEXICON ABBO/AMOS Comparison for odd-syll adjectives. Also derives diminutives and adverbs from the comparisons.
LEXICON BUStem Comparative even-syll, case and attr.
LEXICON ABBO Comparative odd-syll, get case and attr. With the dialect differences “-ubbo” and “-æbbo”.
LEXICON BUOREMUS Superlative even-syll, get attr and nom case.
LEXICON AMOS Superlative odd-syll, get case and attr. With the dialect differences “-umos” and “-æmos”.
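The split between BU/MUS (even-syll) and ABBO/AMOS (odd-syll) can be illustrated with a rough Python sketch. This is only an illustration: in the lexc source the routing is presumably fixed by each stem's continuation lexicon, not computed. The vowel inventory, the rule that a run of adjacent vowels counts as one syllable, and the function names are all assumptions.

```python
# Assumption: rough vowel inventory, used only for counting syllables.
VOWELS = set("aáeiouyåäæøö")


def count_syllables(word: str) -> int:
    """Count vowel runs; each run is treated as one syllable (assumption)."""
    count = 0
    in_vowel_run = False
    for ch in word.lower():
        if ch in VOWELS:
            if not in_vowel_run:
                count += 1
            in_vowel_run = True
        else:
            in_vowel_run = False
    return count


def comparison_lexicon(stem: str) -> str:
    """Pick the comparison lexicon by syllable parity (hypothetical helper)."""
    return "BU/MUS" if count_syllables(stem) % 2 == 0 else "ABBO/AMOS"


if __name__ == "__main__":
    for stem in ("stuorre", "ármmogis"):
        print(stem, count_syllables(stem), comparison_lexicon(stem))
```

Here `ármmogis` comes out odd, matching the "Odd-syllable comparison" note on LEXICON ÁRMMOGIS, and `stuorre` comes out even, matching the even-syll comparison described under STUORAK.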
Comparative and Superlative sub-lexica
LEXICON CompSup-EVEN
LEXICON CompSup-EVENWEAKSTEM
LEXICON ATTR Sends attributes to
LEXICON ATTR_PrsPrc Attr without -vuohta derivation.
LEXICON DenominalAdjsV1 ! even noun stems are sent here
LEXICON DenominalAdjsV1_1 ! even noun stems without grade alternation are sent here
LEXICON DenominalAdjsV2 ! even noun stems are sent here. -asj derivation
LEXICON DenominalAdjsKINO ! unassimilated nouns are sent here
LEXICON DenominalAdjsODD ! gives derivation -ahtes
LEXICON DenominalAdjsContr
Derivations to adjectives, hardcoded in adjectives stems file
LEXICON DIEHTEMAHTES ! odd syllable. For hardcoded -ahtes words. Derived from odd-syll NomAct (Bårråt>bårråm-bårråmahtes), or from odd-syll verbs as buorránit>buorránahtes. Might want to split the lexicon in two.
LEXICON LÁGÁSJ
LEXICON BÁJNUK ! hardcoded denominal derivations, latus has changed from o>u, a>a, e>á (Bájnno>bájnuk, juolgge>juolgák, giella>gielak). Attr same as pred, no comp in this lexicon.
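The final-vowel change named for BÁJNUK (o>u, a>a, e>á, followed by -k) can be sketched in Python. The function `derive_k_adjective` is hypothetical, and it takes a weak-grade stem as input: the stems `bájno`, `juolge` and `giela` used below are inferred from the derived forms given in the text, and consonant gradation itself is not modelled.

```python
# Vowel changes as stated in the BÁJNUK description: o>u, a>a, e>á.
FINAL_VOWEL_CHANGE = {"o": "u", "a": "a", "e": "á"}


def derive_k_adjective(weak_stem: str) -> str:
    """Apply the final-vowel change and add -k (hypothetical helper).

    Assumption: the caller supplies the weak-grade stem.
    """
    final = weak_stem[-1:]
    if final not in FINAL_VOWEL_CHANGE:
        raise ValueError(f"unsupported final vowel in {weak_stem!r}")
    return weak_stem[:-1] + FINAL_VOWEL_CHANGE[final] + "k"


if __name__ == "__main__":
    for stem in ("bájno", "juolge", "giela"):
        print(stem, "->", derive_k_adjective(stem))
```

Stems whose derived form shows an unexpected vowel are the ones the text sends to GIEVLEK and SJERVAK instead.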
LEXICON TSÅHPÅK ! hardcoded denominal derivations latus has changed from o>u, a>a, e>á AND -GIS attr. Attr same as pred is err/orth taged. no comp in this lexicon.
LEXICON GIEVLEK ! hardcoded derivations, not the same as BÁJNUK since latus has an unexpected vowel. Latus hasn’t changed o>u, a>a, e>á. Goes directly to BÁJNUK, only made to sort these different kinds of derivations. Many of these may be derived from verbs or other adjectives.
LEXICON SJERVAK ! hardcoded derivations, not the same as TSÅHPÅK since latus has an unexpected vowel. Latus hasn’t changed o>u, a>a, e>á. Goes directly to TSÅHPÅK, only made to sort these different kinds of derivations. Many of these may be derived from verbs or other adjectives.
LEXICON DIBME ! even and contracted
LEXICON LIS ! Agent nouns on -is?
LEXICON Ahkásasj ! lexicalized and denominal -asj derivations
LEXICON STÁVVALIS ! Must be “stávvalis” in both pred and attr, as “guovddelis”. OK& Kintel 2012: stávval, attr stávvalis; this is err/orth-tagged, also as second compound. No comparison.
Derivations to adjectives, continuation lexicon not for hardcoded adjectives
LEXICON AHTES ! odd syllable, only a continuation lexicon for words that are not in adjectives stems. Just as DIEHTEMAHTES, only with the +A tag that adjectives already get in the stems file.
LEXICON AHKES
LEXICON AGAdj ! denominal derivations go here, attr same as pred, no comp in this lexicon
This (part of) documentation was generated from src/fst/morphology/affixes/adjectives.lexc
LEXICON MUORRA Standard even stems with cg (note Q1). OBS: Nouns with invisible 3>2 cg (as busºsa) go to this lexicon.
kártta+N+Sg+Nom
kártta+N+Sg+Com
LEXICON TÁLLA Same as MUORRA, but for words with º (extra length). Not in MUORRA because of other err/orths
LEXICON ALMME Same as MUORRA, but with special -LASJ derivation. For nouns that have strong grade -lasj. “Almmelasj” instead of “almálasj”, which is Err/Orth-tagged.
LEXICON NOADE Even stem without cg. OBS: No nouns with invisible 3>2 cg (as busºsa) in this lexicon. OBS: Because of denominal nouns taking a weak grade stem, entries in grade 3 are given the gradation mark º in order to prevent alternation to weak grade. We should consider creating a separate denominal nouns lexicon for NOADE instead.
låda+N+Sg+Nom
låda+N+Sg+Ela
LEXICON KÁFFA For even-syll words with cg III-I: káf’fa-káfav, jáf’fo-jáfo. No vowel changes yet, need new twolc code.
káffa+N+Sg+Nom
káffa+N+Sg+Nom
káffa+N+Sg+Ela
káffa+N+Sg+Ela
LEXICON LINNJA Only for the loan word “linnja”. Because it’s a loan word, the “nnj” is pronounced “nn-j” and does not behave like the regular Lule Sami “nj” sound. It therefore doesn’t follow the rule that makes a:á in grade 1 with a short vowel in the first syllable (it isn’t like linnja-linjáv or birás-birrasav). This word is therefore sub-tagged. Norwegian/Swedish words with a short “i” followed by two different consonants are assimilated to Lule Sami in different ways according to the consonants in question, but the word is always in grade III (Morén-Duolljá 2014). Both the err/orth form and the correct form are part of this lexicon.
LEXICON BOAKSA Only for word “boaksa”. Both boaksa-båvsa and Err/Orth boaksa-båksa are part of lexicon.
LEXICON SÁMEGIEL Compounds on -giella, with short -giel as middle compound (sámegielåhpadiddje)
rievsakgiella+N+Sg+Nom
LEXICON AHKA Words like tjerastahka, with short compound form
báládahka+N+Sg+Nom
báládahka+N+Sg+Nom
báládahka+N+Sg+Acc
LEXICON DARRHA Only for “darrha” or compounds that end on “darrha”.
báktedarrha+N+Sg+Nom
báktedarrha+N+Sg+Acc
LEXICON GÁDDE 2 syllable stems with cg (note Q1) with comparatives
boassjo+N+Sg+Nom
boassjo+N+Sg+Com
boassjo+N+Sg+Com
boassjo+N+Der/Comp+A+Sg+Nom
LEXICON SJIEVNNJET Like GAHPER but with comparatives. Odd-syllable C-final noun without cg, no vowel change, no short Ess.
sjievnnjet+N+Sg+Nom
sjievnnjet+N+Sg+Ela
sjievnnje+N+Der/Comp+A+Sg+Nom
sjievnnje+N+Der/Superl+A+Sg+Nom
LEXICON ÅLGGO Like MUORRA, but with comparatives. This lexicon was previously without sg ill/ine/elat, but these nouns can be inflected for the regular location cases. However, “adverbs” like ålggot (from outside), nuorttan (at north), oarjas (to south), etc., are more commonly used to denote location/direction (we should therefore maybe consider sub-tagging the regular location case forms).
lulle+N+Sg+Nom
lulle+N+Sg+Acc
lulle+N+Der/Comp+A+Sg+Nom
LEXICON MIEHTE Like MUORRA but no locative/elative/illative sg. Presently no words in this lexicon except for the err-subbed nuortto.
nuortto+N+Sg+Nom
nuortto+N+Sg+Acc
LEXICON BÅVSÅ Like MUORRA, only in plural. All, except ganta, juvdá and ávta, have regular, singular stem counterparts.
båvså+N+Pl+Nom
båvså+N+Pl+Acc
LEXICON LÅHTSASA Like GAHPER, only in plural. Without derivations, these should maybe be added.
LEXICON MUORRA_LOAN For loan words that do not fit in a loan word lexicon because of a wrong short cmp, or partially assimilated loanwords without separate lexica (medállja), or for Err/Orths assimilated with cg but with other errors. This lexicon gives no short compound forms. Potential short cmps must therefore be hardcoded into the FirstComponent lexicon. This also applies to compounds with partially assimilated loan words. Examples of problem words: sirup>siráhppa and stetoskop>stetoskoahppa.
LEXICON MUORRA_LOAN_NO_LASJ Like MUORRA_LOAN without -lasj derivation. This lexicon is made for Sem/Hum words like økonåvmmå, biolåvggå, agronåvmmå and so on. We don’t want agronåvmålasj since it means something other than “agronomisk”; the meaning of agronåvmålasj is barely used and gets mixed up with “agronomijjalasj”.
LEXICON MUORRA_LOAN_EXTRA_LENGTH Same as MUORRA_LOAN just for words with º (extra length).
LEXICON KAFIEDJA_CMP_INFL Recent loanwords on -edja. Ends on -é in Norwegian. Short and long cmp. “Kafea” and “kaféa” are sub-tagged. See comments about the -ie/-e dial tags in ALFABIEHTTA.
LEXICON ALLEGORIJJA_CMP_INFL Recent loanwords ending on -i in NOR/SWE, with long and short compound form. Standardized as -iddja (SWE) and -ijºja (NOR). Previously often assimilated as -ija (or just -ia), but both forms are ungrammatical: short vowels cannot both precede and follow a single intervocalic consonant. -ija is thus ungrammatical, as the short a would be lengthened to á, like “idja-ijá”.
akademijja+N+Sg+Nom
akademijja+N+Sg+Nom
akademijja+N+Sg+Ela
akademijja+N+Sg+Ela
LEXICON TEKSTIJLLA_CMP_INFL Recent loanwords on -ijlla with long and short compound-form. Frequent typos that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rule, but are found because of earlier standardization rules and dictionaries.
tekstijlla+N+Sg+Nom
tekstijlla+N+Sg+Ela
LEXICON ASIJLLA_CMP_INFL Recent loanwords on -ijlla, from NOR and SWE words ending on -yl. With long and short compound-form. Frequent typos that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rule, but are found because of earlier standardization rules and dictionaries.
asijlla+N+Sg+Nom
asijlla+N+Sg+Ela
LEXICON BENSIJNNA Recent loanwords on -ijnna with long and short compound-form
LEXICON BENSIJNNA_CMP_INFL Recent loanwords on -ijnna with long and short compound-form. Frequent typos that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rule, but are found because of earlier standardization rules and dictionaries.
tamburijnna+N+Sg+Nom
tamburijnna+N+Sg+Ela
LEXICON MASJIJNNA_CMP_INFL Recent loanwords on -sjijnna with long and short compound-form: -SKIN
bivtasmasjijnna+N+Sg+Nom
bivtasmasjijnna+N+Sg+Ela
LEXICON ADJEKTIJVVA_CMP_INFL Recent loanwords on -ijvva with long and short compound-form
datijvva+N+Sg+Nom
datijvva+N+Sg+Ela
LEXICON PARADIJSSA_CMP_INFL Recent loanwords on -ijssa with long and short compound-form. Frequent typos that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rule, but are found because of earlier standardization rules and dictionaries.
servijssa+N+Sg+Nom
servijssa+N+Sg+Ela
LEXICON TELEFÅVNNÅ_CMP_INFL Recent loanwords on -åvnnå with long and short compound-form. Frequent typos that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rule, but are found because of earlier standardization rules and dictionaries.
persåvnnå+N+Sg+Nom
persåvnnå+N+Sg+Ela
LEXICON INSTITUSJÅVNNÅ_CMP_INFL Recent loanwords on -sjåvnnå with long and short compound-form: -TION IN SWEDISH. Frequent typos that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rule, but are found because of earlier standardization rules and dictionaries.
populasjåvnnå+N+Sg+Nom
populasjåvnnå+N+Sg+Ela
LEXICON MISJÅVNNÅ_CMP_INFL Recent loanwords on -sjåvnnå with long and short compound-form: -SSION IN SWEDISH. Frequent typos that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rule, but are found because of earlier standardization rules and dictionaries.
sesjåvnnå+N+Sg+Nom
sesjåvnnå+N+Sg+Ela
LEXICON PENSJÅVNNÅ_CMP_INFL Recent loanwords on -sjåvnnå with long and short compound-form: -SION IN SWEDISH. Frequent typos that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rule, but are found because of earlier standardization rules and dictionaries.
suspensjåvnnå+N+Sg+Nom
suspensjåvnnå+N+Sg+Ela
LEXICON PARTISIHPPA_CMP_INFL Recent loanwords from SWE -cip and NOR -sipp (particip vs partisipp). These become -sihppa in Norway; both -sijppa and -sihppa are used in Sweden. Short and long compound-form.
partisihppa+N+Sg+Nom
partisihppa+N+Sg+Ela
partisihppa+N+Sg+Nom
partisihppa+N+Sg+Ela
LEXICON ALKOHÅVLLÅ_CMP_INFL Recent loanwords on -åvllå with long and short compound-form. The old standardization form “alkohola” is sub-tagged. Frequent typos that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rule, but are found because of earlier standardization rules and dictionaries.
parabåvllå+N+Sg+Nom
parabåvllå+N+Sg+Ela
LEXICON AGRONÅVMMÅ_CMP_INFL Recent loanwords on -åvmmå with long and short compound-form. -lasj derivation is error-tagged. The old standardization form -oma, which does not follow Lule Sami rules, is sub-tagged.
agronåvmmå+N+Sg+Nom
agronåvmmå+N+Sg+Ela
LEXICON DEMAGÅVGGÅ_CMP_INFL Recent loanwords ending on -og with long and short compound form. Assimilated to smj as -åvggå. -lasj derivation is error-tagged. The old standardization -oga, which does not follow Lule Sami rules, is sub-tagged.
pedagåvggå+N+Sg+Nom
pedagåvggå+N+Sg+Nom
pedagåvggå+N+Sg+Ela
LEXICON LAKTÅVSSÅ_CMP_INFL Recent loanwords ending on -ose in Norwegian and -os in Swedish, with long and short compound form. Assimilated to smj as -åvsså. The old standardization -oga, which does not follow Lule Sami rules, is sub-tagged.
laktåvsså+N+Sg+Nom
laktåvsså+N+Sg+Ela
LEXICON FAKTÅVRRÅ_CMP_INFL Recent loanwords on -åvrrå with long and short compound-form.
LEXICON MIKROSKÅVPPÅ_CMP_INFL Recent loanwords on -åvppå (-op in NOB/SWE) with long and short compound-form. Long vowel and short consonant is assimilated with njuoban, but somehow a lot of -op words are assimilated as -oahppa (biskop is pronounced as -opp, so that’s different; maybe some have used “biskop” as a template), so this is Err/Orth-tagged.
oajvvekontåvrrå+N+Sg+Nom
oajvvekontåvrrå+N+Sg+Ela
LEXICON KULTUVRRA_CMP_INFL Recent loanwords on -vrra with long and short compound-form. Frequent typos that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rule, but are found because of earlier standardization rules and dictionaries.
muvrra+N+Sg+Nom
muvrra+N+Sg+Com
LEXICON TERAPÆVTTA_CMP_INFL Recent loanwords on -ævtta/-ievtta with long and short compound-form. No -lasj derivation. Frequent typos that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rule, but are found because of earlier standardization rules and dictionaries.
terapævtta+N+Sg+Nom
terapævtta+N+Sg+Nom
terapævtta+N+Sg+Nom
terapævtta+N+Sg+Com
terapævtta+N+Sg+Com
LEXICON ADVÆRBBA_CMP_INFL Recent loanwords on -ærbba with long and short compound-form
detransitijvvaværbba+N+Sg+Nom
detransitijvvaværbba+N+Sg+Nom
detransitijvvaværbba+N+Sg+Ela
LEXICON SUBSTÁNSSA_CMP_INFL Recent loanwords on -ánssa with long and short compound-form. Originally -ans in SWE and NOR. Frequent typos that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rule, but are found because of earlier standardization rules and dictionaries.
instánssa+N+Sg+Nom
instánssa+N+Sg+Ela
LEXICON VALÆNSSA_CMP_INFL Recent loanwords on -ænssa with long and short compound-form. Frequent typos that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rule, but are found because of earlier standardization rules and dictionaries.
intelligænssa+N+Sg+Nom
intelligænssa+N+Sg+Nom
intelligænssa+N+Sg+Acc
LEXICON ADVOKÁHTTA_CMP_INFL Recent loanwords on -áhtta with long and short compound-form. Frequent typos that do not follow Lule Sami rules are sub-tagged. These forms go against the standardization rule, but occur because of earlier standardization rules and dictionaries.
klimáhtta+N+Sg+Nom
klimáhtta+N+Sg+Ela
LEXICON ALFABIEHTTA_CMP_INFL Recent loanwords originally on -et in both Norway and Sweden. Differences in assimilation, however, create two Lule Sami forms: -iehtta in NOR and -æhtta in SWE. Long -e is assimilated differently in the two countries: in Norway it becomes -ie, and in Sweden -e (tiedja/tedja, systiebma/systebma and so on). This is especially apparent in assimilated words with long e in grade three: e becomes æ in grade three, so we get “universitæhtta” in SWE, which looks very strange to people on the Norwegian side of the border, who want “universitiehtta”. Both -ie and -e are dial-tagged in the lexicons HYDROGIEDNA, APOTIEHKKA, SYSTIEBMA and KAFÉ. People in Norway previously often wrote -ehtta, but this is incorrect, as e always becomes æ in grade three.
mobilitiehtta+N+Sg+Nom
mobilitiehtta+N+Sg+Nom
mobilitiehtta+N+Sg+Acc
mobilitiehtta+N+Sg+Acc
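The -ie/-e area variation described for ALFABIEHTTA can be illustrated with a small sketch. The suffix pairs below are read off the lexicon descriptions in this section (ALFABIEHTTA, APOTIEHKKA, SYSTIEBMA, HYDROGIEDNA, KARAKTIERRA); the function name and the table are hypothetical and are not part of the lexc source, which handles the variation with dial-tags rather than string rewriting.

```python
# Hypothetical illustration only: the real analyser encodes this variation
# in lexc lexicons with dial-tags, not with a suffix table like this one.
NOR_TO_SWE_SUFFIX = [
    ("iehtta", "æhtta"),  # ALFABIEHTTA: grade three, e becomes æ
    ("iehkka", "æhkka"),  # APOTIEHKKA
    ("iebma", "ebma"),    # SYSTIEBMA
    ("iedna", "edna"),    # HYDROGIEDNA
    ("ierra", "erra"),    # KARAKTIERRA
]


def swedish_side_variant(norwegian_form: str) -> str:
    """Return the Swedish-side spelling for a Norwegian-side -ie form.

    Words that match none of the listed suffixes are returned unchanged.
    """
    for nor_suffix, swe_suffix in NOR_TO_SWE_SUFFIX:
        if norwegian_form.endswith(nor_suffix):
            return norwegian_form[: -len(nor_suffix)] + swe_suffix
    return norwegian_form
```

For example, `swedish_side_variant("universitiehtta")` yields `universitæhtta`, the pair quoted in the ALFABIEHTTA description.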
LEXICON INTERNÆHTTA_CMP_INFL Recent loanwords on -æhtta with long and short compound-form: -et in Swedish, -ett in Norwegian. Differs from ALFABIEHTTA because -ehtta isn’t used in NOR.
intranæhtta+N+Sg+Nom
intranæhtta+N+Sg+Nom
intranæhtta+N+Sg+Ela
LEXICON TABLÆHTTA_CMP_INFL Recent loanwords on -æhtta with long and short compound-form. -ett in both Norwegian and Swedish.
kvartæhtta+N+Sg+Nom
kvartæhtta+N+Sg+Nom
kvartæhtta+N+Sg+Ela
LEXICON INSTITUHTTA_CMP_INFL Recent loanwords on -uhtta, with long and short compound-form on -utt (NOR)/-ut (SWE). The Swedish -ut also gets -uvtta, as in ANTIHKKA-antijkka, but instituhtta is also used in Sweden, so no Area/NO tag.
minuhtta+N+Sg+Nom
minuhtta+N+Sg+Nom
minuhtta+N+Sg+Ela
minuhtta+N+Sg+Ela
LEXICON SATELIHTTA_CMP_INFL Recent loanwords on -ihtta, with long and short compound-form on -itt (NOR)/-it (SWE). The Swedish -it also gets -ijtta, as in ANTIHKKA-antijkka, but satelihtta is also used in Sweden, so no Area/NO tag.
inuihtta+N+Sg+Nom
inuihtta+N+Sg+Nom
inuihtta+N+Sg+Ela
inuihtta+N+Sg+Ela
LEXICON APOTIEHKKA_CMP_INFL Recent loanwords on -iehkka in NOR, -æhkka in SWE; -ehkka is sub-tagged. With long and short compound-form on -k. See the comments about the -ie/-e dial-tags in ALFABIEHTTA.
The old form “apotehkka” is wrong even though it is found in dictionaries: long e is not allowed in grade III.
kartotiehkka+N+Sg+Nom
kartotiehkka+N+Sg+Ela
kartotiehkka+N+Sg+Nom
kartotiehkka+N+Sg+Ela
LEXICON ANTIHKKA_CMP_INFL Recent loanwords on -hkka in Norway; both -ijkka and -hkka are used in Sweden (antik vs antikk). With long and short compound-form on -kk/-k. The Swedish forms were earlier added to the stems for the Swedish version, but are now added here.
dialektihkka+N+Sg+Nom
dialektihkka+N+Sg+Ela
dialektihkka+N+Sg+Nom
dialektihkka+N+Sg+Ela
LEXICON SEMINÁRRA_CMP_INFL Recent loanwords on -árra with long and short compound-form. Frequent typos that do not follow Lule Sami rules are sub-tagged. These forms go against the standardization rule, but occur because of earlier standardization rules and dictionaries.
hektárra+N+Sg+Nom
hektárra+N+Sg+Ela
LEXICON AREÁLLA_CMP_INFL Recent loanwords on -álla with long and short compound-form. Frequent typos that do not follow Lule Sami rules are sub-tagged. These forms go against the standardization rule, but occur because of earlier standardization rules and dictionaries.
gasskavokálla+N+Sg+Nom
gasskavokálla+N+Sg+Ela
LEXICON AMBASSADERRA_CMP_INFL Recent loanwords on -ør with long and short compound-form. Standardized by Giellagálldo on 05.05.14 as -erra; -ørra is sub-tagged.
observaterra+N+Sg+Nom
observaterra+N+Sg+Ela
LEXICON VETERINERRA_CMP_INFL Recent loanwords on -erra. Words ending in -ær in both SWE and NOR. Both long and short compound-form. The old standardization form -æra, without cg, is sub-tagged, as are -ær’ra and -ærra.
LEXICON ATMOSFERRA_CMP_INFL Recent loanwords on -erra, but with different source endings in SE and NO: -ære, -ær in NOR and -är, -ära in SWE (ingefær in NO, ingefära in SE). Only long compound-form; the short form must be hardcoded in the firstcomponent lexicon. The old standardization forms -æra and -era, without cg, are sub-tagged, as are -ær’ra and -ærra.
atmosferra+N+Sg+Nom
atmosferra+N+Sg+Ela
LEXICON KARAKTIERRA_CMP_INFL Recent loanwords on -ierra in NOR and -erra in SWE, because long e is assimilated in different ways. Words ending in -er in NOR, and -er or -är in SWE. Only long compound-form; the short form must be hardcoded in the firstcomponent lexicon.
karaktierra+N+Sg+Nom
karaktierra+N+Sg+Ela
karaktierra+N+Sg+Nom
karaktierra+N+Sg+Ela
LEXICON TABÆLLA_CMP_INFL Recent loanwords on -älºla with long and short compound-form. Frequent typos that do not follow Lule Sami rules are sub-tagged. These forms go against the standardization rule, but occur because of earlier standardization rules and dictionaries.
flotælla+N+Sg+Nom
flotælla+N+Sg+Nom
flotælla+N+Sg+Ela
LEXICON TELEGRÁMMA_CMP_INFL Recent loanwords on -ámºma with long and short compound-form
grámma+N+Sg+Nom
grámma+N+Sg+Ela
LEXICON TOPOGRÁFFA_CMP_INFL Recent loanwords on -áfºfa with long and short compound-form, no -lasj derivation since most of these words are humans.
telegráffa+N+Sg+Nom
telegráffa+N+Sg+Ela
LEXICON SYSTIEBMA_CMP_INFL Recent loanwords on -ebma/-iebma with long and short compound-form. -em in NOR and SWE. See the comments about the -ie/-e dial-tags in ALFABIEHTTA. Frequent typos that do not follow Lule Sami rules are sub-tagged. These forms go against the standardization rule, but occur because of earlier standardization rules and dictionaries.
vokalsystiebma+N+Sg+Nom
vokalsystiebma+N+Sg+Nom
vokalsystiebma+N+Sg+Ela
vokalsystiebma+N+Sg+Ela
LEXICON ORGÁDNA_CMP_INFL Recent loanwords on -ádna with long and short compound-form
doarjjaorgádna+N+Sg+Nom
doarjjaorgádna+N+Sg+Nom
doarjjaorgádna+N+Sg+Acc
LEXICON KOLLÆKTA_CMP_INFL Recent loanwords on -ækta with long and short compound-form
subjækta+N+Sg+Nom
subjækta+N+Sg+Nom
subjækta+N+Sg+Ela
LEXICON HYDROGIEDNA_CMP_INFL Recent loanwords on -iedna in NOR and -edna in SWE. Both long and short compound-form. Norwegian/Swedish -en. The old standardization form -ena, without cg, is sub-tagged. See the comments about the -ie/-e dial-tags in ALFABIEHTTA.
LEXICON PATÆNNTA_CMP_INFL Recent loanwords on -ænnta with long and short compound-form. The -ennta form (used in “Ådå testamennta”) is sub-tagged (e always becomes æ in grade three).
patænnta+N+Sg+Nom
patænnta+N+Sg+Nom
patænnta+N+Sg+Ela
LEXICON VARIÁNNTA_CMP_INFL Recent loanwords on -ánnta with long and short compound-form. Frequent typos that do not follow Lule Sami rules are sub-tagged. These forms go against the standardization rule, but occur because of earlier standardization rules and dictionaries.
praktikánnta+N+Sg+Nom
praktikánnta+N+Sg+Ela
LEXICON FANATISSMA_CMP_INFL Recent loanwords on -ssma with long and short compound-form.
kabbalissma+N+Sg+Nom
kabbalissma+N+Sg+Ela
LEXICON TURISSTA_CMP_INFL Recent loanwords on -ssta with long and short compound-form. The -lasj derivation is error tagged. Frequent typos that do not follow Lule Sami rules are sub-tagged. These forms go against the standardization rule, but occur because of earlier standardization rules and dictionaries.
journalissta+N+Sg+Nom
journalissta+N+Sg+Ela
LEXICON PRIEMIJ_CMP_INFL Assimilated loanwords on -ie/-y, like premie and bandy. They become odd-syllable loanwords with cg, like “riebij”. Nom: premij, Gen: prebmiha. Long and short essive.
priemij+N+Sg+Nom
priemij+N+Sg+Nom
priemij+N+Sg+Ela
priemij+N+Sg+Ela
priemij+N+Ess
priemij+N+Ess
priemij+N+Ess
priemij+N+Ess
LEXICON A_CMP_INFL Sub-forms. Lexicon that gives sub-variant inflection by simply adding an -a to the Norwegian/Swedish word. No cg. Like “alkohola” and “agronoma”. These forms go against the standardization rule, but occur because of earlier standardization rules and dictionaries.
LEXICON ERR/ORTH_EVEN_WEAK_CASES Even stem Err/Orth lexicon without nominative, illative and essive. Only for entries with the ERR/ORTH tag. Made so that we don’t get entries that are both normative and error tagged. Entries like “ålggo” have no grade alternation; a common error is to write them as if they had: ålggo>ålgov, tálla>tálav, klimáksa>klimáksav, prefiksa>prefiksav, barggo>barggov.
LEXICON ERR/ORTH_EVEN_WEAK_CASES2 Even stem Err/Orth lexicon without nominative, illative and essive, AND ALSO without Sg+Gen, Sg+Ine, Pl+Nom, Pl+Com and Pl+Gen (to avoid homonymies).
LEXICON ERR/ORTH_EVEN_STRONG_CASES Even stem Err/orth lexicon with only nominative, illative and essive. Only for entries with ERR/ORTH tag. Made so that we don’t get entries that are both norm and with error tag. Hydrogena is used as nom and is err/orth, hydrogena>hydrogenav is not err/orth. marináda-nom, banána-nom
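The division of labour between the three ERR/ORTH_EVEN lexicons can be sketched as set arithmetic over case tags. This is a minimal sketch under stated assumptions: the full inventory of forms below (seven cases in two numbers plus Ess) is assumed for illustration, and "nominative" and "illative" in the descriptions are read as the singular forms, since Pl+Nom is listed separately for ERR/ORTH_EVEN_WEAK_CASES2. The actual lexc lexicons may cover more forms (for example abessive or derivations).

```python
# Assumed inventory of inflected forms; not taken from the lexc source.
NUMBERED_CASES = ["Nom", "Gen", "Acc", "Ill", "Ine", "Ela", "Com"]
ALL_FORMS = {
    f"{number}+{case}" for number in ("Sg", "Pl") for case in NUMBERED_CASES
} | {"Ess"}

# ERR/ORTH_EVEN_STRONG_CASES: only nominative, illative and essive.
STRONG_CASES = {"Sg+Nom", "Sg+Ill", "Ess"}

# ERR/ORTH_EVEN_WEAK_CASES: everything except the strong-grade forms.
WEAK_CASES = ALL_FORMS - STRONG_CASES

# ERR/ORTH_EVEN_WEAK_CASES2: additionally drops forms that would be homonymous.
WEAK_CASES2 = WEAK_CASES - {"Sg+Gen", "Sg+Ine", "Pl+Nom", "Pl+Com", "Pl+Gen"}


def lexicons_covering(form: str) -> list:
    """List which of the three sketched lexicons would generate a given form."""
    table = {
        "ERR/ORTH_EVEN_STRONG_CASES": STRONG_CASES,
        "ERR/ORTH_EVEN_WEAK_CASES": WEAK_CASES,
        "ERR/ORTH_EVEN_WEAK_CASES2": WEAK_CASES2,
    }
    return [name for name, forms in table.items() if form in forms]
```

Under these assumptions no form is produced by both the strong and the weak lexicon, which is the point of the split: an entry never ends up both normative and error tagged for the same form.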
LEXICON ERR/ORTH_ODD Err/Orth lexicon doing the opposite of what odd-syllable nouns do: strong grade in nom and weak grade in all other forms.
dálkas+N+Err/Orth+Sg+Nom
dálkas+N+Err/Orth+Sg+Acc
dálkas+N+Err/Orth+Der/Dimin+N+Sg+Nom
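The example lines in this documentation are analysis strings of the form lemma+tag+tag+... A minimal sketch of how such a string can be split and checked for error tags follows. The names `Analysis`, `parse_analysis` and `is_error_tagged` are hypothetical helpers introduced here, and the sketch assumes that the lemma itself contains no `+`.

```python
from typing import List, NamedTuple


class Analysis(NamedTuple):
    lemma: str
    tags: List[str]


def parse_analysis(line: str) -> Analysis:
    """Split 'lemma+Tag1+Tag2...' into the lemma and its list of tags."""
    lemma, *tags = line.strip().split("+")
    return Analysis(lemma, tags)


def is_error_tagged(analysis: Analysis) -> bool:
    """True if any tag belongs to the Err/ family (Err/Orth, Err/Lex, ...)."""
    return any(tag.startswith("Err/") for tag in analysis.tags)
```

With this, `parse_analysis("dálkas+N+Err/Orth+Sg+Nom")` gives the lemma `dálkas`, and `is_error_tagged` reports that the reading carries an error tag, while a normative reading such as `stiebil+N+Sg+Nom` does not.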
LEXICON NOADE_BADASS 2-syllable stems without cg. Badly or wrongly assimilated words, i.e. assimilated in a way that isn’t Lule Sami. (Same as NOADE.) Most of the words are Err/Orth tagged with a standardized lemma. Some are Err/Lex tagged. 5.9.2019, EJP/SNM: removed +Use/-Spell; even though we do not like these words, we want to make sure they are written correctly according to the smj orthography! Most of the words are marked with +Err/Orth anyway :)
balláda+N+Sg+Nom
balláda+N+Sg+Ela
LEXICON C_ILL_IJ_BADASS Badly or wrongly assimilated words. Last letter is a consonant, no cg, no vowchange, with illative -ij. (Same as GAHPER.) Assimilated in a way that isn’t Lule Sami. Most of the words are Err/Orth tagged with a standardized lemma. Some are Err/Lex tagged, and some only receive the +Use/-Spell tag from the lexicon.
sentimehter+N+Sg+Nom
sentimehter+N+Sg+Ela
sentimehter+N+Sg+Ill
LEXICON C_ILL_AJ_BADASS Badly or wrongly assimilated words. Last letter is a consonant, no cg, no vowchange, with illative -aj. These should have been assimilated to even-syll, but are used as odd-syll, and are mostly assimilated just by changing a letter to á. So almost the same as CELSIUS_UNASS.
kálsium+N+Sg+Nom
kálsium+N+Sg+Ela
kálsium+N+Sg+Ill
LEXICON KINO_UNASS_CMP_INFL V-final unassimilated loanwords. Not lulesami. No diacritics whatsoever. Words that aren’t assimilated at all. Really just norwegian words with a kind of sami inflection. Get even syllable case marking. Are part of the spell checker.
netto+N+Sg+Nom
netto+N+Sg+Ela
LEXICON C_ILL_IJ_UNASS C-final unassimilated loanwords, gives illative -ij. Not Lule Sami. No diacritics whatsoever. Really just foreign words with a kind of Sami inflection. Odd syllable case marking (like GAHPER). Are part of the spell checker.
sirkus+N+Sg+Nom
sirkus+N+Sg+Ill
sirkus+N+Sg+Ela
LEXICON C_ILL_AJ_UNASS C-final unassimilated loanwords, gives illative -aj. Also odd-syll words ending in the letter i, such as selleri. Not Lule Sami. No diacritics whatsoever. Really just Norwegian words with a kind of Sami inflection. Case marking like standard even 4-syllable stems (see the proper nouns file on the case marking of foreign words with stressed last syllable). Are part of the spell checker.
aids+N+Sg+Nom
aids+N+Sg+Ill
aids+N+Sg+Ela
aids+N+Ess
aids+N+Abe
aids+N+Abe
aids+N+Der/Dimin+N+Sg+Nom
+Der4+Der/ahtes:e»g AHTES ; Only for odd-syllable stems
LEXICON GÅNÅGIS Standard C-final 4-syllabic stems
rahtjamus+N+Sg+Nom
rahtjamus+N+Sg+Ill
rahtjamus+N+Sg+Ela
LEXICON BERULASJ For words ending in -asj. Same as GÅNÅGIS but with strong essive and illative -adjtan and -adtjaj sub-tagged, and likewise PX “-adjtam”. These forms are barely used today. -lahttja is also Err/Orth-tagged.
LEXICON BEDNAGASJ Like BERULASJ, but for derived nouns in diminutive. No cg, no vowchange, no short Ess. Has only one dimin derivation since these words already are dimin, i.e. no double dim as for GAHPER. No abessive; not totally sure about this, but I think we must use the postposition dagi when the noun is diminutive.
bednagasj+N+Sg+Nom
bednagasj+N+Sg+Ela
LEXICON HÁVSAGUSJ Like BEDNAGASJ, but not diminutive. No cg, no vowchange, no short Ess. Has only one dimin derivation. No abessive; not totally sure about this, but I think we must use the postposition dagi when it’s diminutive.
LEXICON JIHPELIJ gen:jihpelahá
gehtsulij+N+Sg+Nom
gehtsulij+N+Sg+Acc
LEXICON OARJJILIJ gen:oarjjilihá
allilij+N+Sg+Nom
allilij+N+Sg+Ela
LEXICON VIESSOMUJ gen:viessumuhá
bårråmuj+N+Sg+Nom
bårråmuj+N+Sg+Ill
LEXICON OADÁDAGÁ Plural forms of words like tjerastahka with short compound-form
látjádagá+N+Pl+Nom
látjádagá+N+Pl+Ela
LEXICON BERRAHATTJA Plural stems. Like IEDNITJA, these do not have corresponding singular stems. Most stems here have the same form as the pl nom form of diminutive derivations, but (while it may have originated as a diminutive derivation) it is not the same derivation today, and it does not have a singular form.
gahpanisá+N+Pl+Nom
gahpanisá+N+Pl+Ill
gahpanisá+N+Pl+Ela
LEXICON SIJDDALAHÁ Plurals
lullelahá+N+Pl+Nom
lullelahá+N+Pl+Acc
LEXICON SISSNELUHÁ Plurals. Presently only for sissŋeluhá.
sissŋeluhá+N+Pl+Nom
sissŋeluhá+N+Pl+Ill
LEXICON DAGI_SINGULAR Earlier we generated “bijladagi” and “bijlajdagi” as abessive. This has been fixed, but to be able to analyse what we earlier generated, we need this lexicon. Only singular. Gives an Err/ tag to “bijladagi” and yields the correct “bijla dagi”.
LEXICON DAGI_PLURAL Earlier we generated “bijladagi” and “bijlajdagi” as abessive. This has been fixed, but to be able to analyse what we earlier generated, we need this lexicon. Only plural. Gives an Err/ tag to “bijlajdagi” and yields the correct “bijlaj dagi”.
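The correction that the two DAGI lexicons support (fused abessive rewritten as noun plus postposition) can be sketched as a plain string operation. The helper below is hypothetical and deliberately naive: it splits any word ending in `dagi`, whereas the lexc lexicons only apply along the paths that lead to them.

```python
def split_fused_dagi(word: str) -> str:
    """Rewrite a fused abessive such as 'bijladagi' as 'bijla dagi'.

    Naive sketch: any word ending in 'dagi' is split; the bare
    postposition 'dagi' itself is left untouched.
    """
    postposition = "dagi"
    if word.endswith(postposition) and len(word) > len(postposition):
        return word[: -len(postposition)] + " " + postposition
    return word
```

Both pairs quoted in the lexicon descriptions come out as expected: `bijladagi` becomes `bijla dagi` and `bijlajdagi` becomes `bijlaj dagi`.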
LEXICON SURGULASJ-EVEN
LEXICON N-EVENWEAKSTEM-NO-ABE Same as N-EVENWEAKSTEM but without abessive (abessive is Err/Infl-tagged). Used for 4-syll nouns.
LEXICON GAHPER Odd-syllable C-final noun without cg, no vowchange, no short Ess. Spiik A3
stiebil+N+Sg+Nom
stiebil+N+Sg+Ela
LEXICON ÅRES Odd-syllable C-final noun with CG, 2ndsyll vowchange. Long and short essive. Spiik A1
sjattos+N+Sg+Nom
sjattos+N+Sg+Ela
sjattos+N+Ess
sjattos+N+Ess
LEXICON SÅHKÅR Odd-syllable C-final noun with CG and 2ndsyll vowelchange. Has only long essive. Spiik 2b
spiger+N+Sg+Nom
spiger+N+Sg+Ela
spiger+N+Ess
LEXICON MIEHTAR Only for word “miehtar”. Same as SÅHKÅR but with Area-differences and a lot of Err/Orths.
miehtar+N+Sg+Nom
miehtar+N+Sg+Nom
miehtar+N+Sg+Ela
miehtar+N+Sg+Ela
miehtar+N+Ess
miehtar+N+Ess
LEXICON GÁMAS Odd-syllable C-final noun with CG, no 2ndsyll vowchange (OBS: a does not change). Long and short essive. Spiik A2
sjábtjas+N+Sg+Nom
sjábtjas+N+Sg+Ela
LEXICON BENA Odd-syllable V-final noun with cg, no 2ndsyll vowchange. Deletes g. Long and short essive. Spiik 2a
galma+N+Sg+Nom
galma+N+Sg+Ela
LEXICON SUOBDE gen: suobddega. Presently only for “suobde”. For some reason -e doesn’t become á, so it is not in lexicon BENA. Long and short essive.
ságe+N+Sg+Nom
ságe+N+Sg+Acc
LEXICON SÁGE gen: sáhkaha. Presently only for “ságe”. Long and short essive.
ságe+N+Sg+Nom
ságe+N+Sg+Acc
LEXICON BAVSEV Ends on -v and last vowel changes to i: bavsev:baksIma. Not like gierkav gierkkAma and birev birEma.
sievtev+N+Sg+Nom
sievtev+N+Sg+Ela
LEXICON RÁBEV rábev:ráhpuga. Presently only for “rábev”.
rábev+N+Sg+Nom
rábev+N+Sg+Ela
LEXICON RITJAS Like GÁMAS but without stem a-lengthening for grade I (underlying long -i-). Presently only for “ritjas”.
ritjas+N+Sg+Nom
ritjas+N+Sg+Ela
LEXICON SÅGAS gen: sågaska. Presently only for “sågas”.
sågas+N+Sg+Nom
sågas+N+Sg+Acc
LEXICON SJUVÁJ sjuváj-sjuvvaga. Presently only for “sjuváj”.
sjuváj+N+Sg+Nom
sjuváj+N+Sg+Ela
LEXICON BØSOJ Because of bösoj in O.Korhonen, and bæsoj-bessuga. Only for these two words. J becomes g.
LEXICON GUOVSOJVUOJOJ vuojoj:vuodjom. Presently only for “guovsojvuojoj”.
guovsojvuojoj+N+Sg+Nom
guovsojvuojoj+N+Sg+Acc
LEXICON BUTJES butjes-buttjása. Presently only for “butjes”. This is a sub form. Korhonen has this form, but Grundström has buttjes-budtjasa. It must be a typo in Korhonen, because ttj-tj doesn’t exist in smj. This form is tagged as an error/sub form in the stems file.
LEXICON TJÅLKES tjålkes:tjoalkkas-. Presently only for “tjålkes” and “tsålkes”. This must be wrong, and it doesn’t exist in Grundström. Å in the 1st syllable isn’t possible with e in the 2nd syllable. It must be tjoalkes-tjoalkkása or tjålkas-tjoalkkasa. This form is tagged as an error/sub form in the stems file.
tsålkes+N+Sg+Nom (is not standard language)
tsålkes+N+Sg+Acc (is not standard language)
LEXICON VÁJES vájes:vádjas-. Presently only for “báhkovájes”. It’s a sub form: 2nd-syllable e doesn’t become a. It must be vájes-vádjása or vájas-vádjasa. The second is used in NT, so I believe that’s the right one. This form is tagged as an error/sub form in the stems file.
Derived stems
LEXICON BADJEL Derived nouns with acc -elav, ill -elij, elat -elas, etc. These were previously categorized as adpositions and adverbs, but according to Bruce Morén-Duolljá (2014) they are actually case forms of nouns derived from certain location nouns. Derived from even strong stems (badje -> badjel). Odd syllable inflection, but only singular nominative-elative (not clear if they take comitative and essive case). With comparatives. No Px.
allel+N+Sg+Nom
allel+N+Sg+Ela
allel+N+Der/Comp+A+Sg+Nom
LEXICON BÁRNEP bárnep:bárnebu-. Comparison of nouns. No -ahtá abessive.
iednep+N+Sg+Nom
iednep+N+Sg+Acc
LEXICON OAPPÁSJ Like GAHPER, but for derived nouns in diminutive that have an underived form. Doesn’t get abessive -ahtá or the -ahtes derivation. Oddsyll, no cg, no vowchange, no short Ess. Has only one dimin derivation since these words already are dimin, i.e. no double dim as in GAHPER.
oappásj+N+Sg+Nom
oappásj+N+Sg+Ela
LEXICON FIERUN Like GAHPER, but instruments derived from verbs. Fierrot>fierun. No short essive.
fierun+N+Sg+Nom
fierun+N+Sg+Ela
LEXICON GUOLLÁR Like GAHPER, but actor derived from contracted verbs (ACTOR for evensyll verbs). Guollit>guollár. No short essive.
LEXICON IELLEM Nomen actionis derived from even verbs. Earlier these went directly to VSBST-ODD; now they get the tag Gram/NomAct before going there. The tag can’t be put in the VSBST-ODD lexicon because of the paths from the verb lexicons.
LEXICON TJIEKTJAMA Pl Nomen actionis derived from even verbs. Earlier these went directly to VSBST-ODD-PL; now they get the tag Gram/NomAct before going there. The tag can’t be put in the VSBST-ODD-PL lexicon because of the paths from the verb lexicons.
LEXICON AKTIDIBME Nomen actionis derived from uneven verbs, ending in DIBME. Earlier these went directly to VSBST-EVEN; now they get the tag Gram/NomAct before going there. The tag can’t be put in the VSBST-ODD lexicon because of the paths from the verb lexicons.
LEXICON BERUSTIBME Nomen actionis derived from uneven verbs, ending in STIBME; DIBME is Err/Orth-tagged. Earlier these went directly to VSBST-EVEN; now they get the tag Gram/NomAct before going there. The tag can’t be put in the VSBST-ODD lexicon because of the paths from the verb lexicons.
LEXICON DÁRBBAGA Like BENA, but plural. Presently only for “dárbbaga”, has singular stem counterpart.
dárbbaga+N+Pl+Nom
dárbbaga+N+Pl+Acc
LEXICON BÆLLJASA Like GÁMAS, but plural. These have corresponding singular stems.
jiednabælljasa+N+Pl+Nom
jiednabælljasa+N+Pl+Nom
jiednabælljasa+N+Pl+Acc
jiednabælljasa+N+Pl+Acc
LEXICON IEDNITJA Odd-syllable plural forms only. These do not have a singular form.
jáhkoguojmitja+N+Pl+Nom
jáhkoguojmitja+N+Pl+Acc
LEXICON SNJIERÁGA Odd-syllable plural forms only. These have corresponding singular stems.
guovlloådåsa+N+Pl+Nom
guovlloådåsa+N+Pl+Acc
LEXICON MANEBU Odd-syllable plural only. Presently only for “maŋebu”.
maŋebu+N+Pl+Nom
maŋebu+N+Pl+Acc
LEXICON SUOLOJ C-final with cg II-III: ålmåj:ålmmå
njurgoj+N+Sg+Nom
njurgoj+N+Sg+Acc
LEXICON ÅLMÅJ_LOAN Same as SUOLOJ, only for loan words. Follows Ráhka/Mikkelsen’s Bårjås 2014. C-final with cg II-III: ålmåj:ålmmå
bistroj+N+Sg+Nom
bistroj+N+Sg+Acc
bistroj+N+Sg+Acc
LEXICON GUOMOJ C-final with cg I-III: guomoj:guobbmu
ænoj+N+Sg+Nom
ænoj+N+Sg+Acc
ænoj+N+Sg+Nom
ænoj+N+Sg+Acc
LEXICON SARVES C-final with cg II-III. sarves:sarvvá
moarmes+N+Sg+Nom
moarmes+N+Sg+Acc
LEXICON SVÁLES C-final with cg I-III. sváles:svállá (lºl)
sváles+N+Sg+Nom
sváles+N+Sg+Acc
LEXICON GÅHKES C-final with cg II-III with vowel harmony (a/á=å). gåhkes:gåhkkå. Presently only for “gåhkes”.
gåhkes+N+Sg+Nom
gåhkes+N+Sg+Acc
LEXICON SJUOKKAJ sjuokkaj:sjuoggá. Presently only for “sjuokkaj”.
sjuokkaj+N+Sg+Nom
sjuokkaj+N+Sg+Acc
LEXICON GISTÁ gistá:gisstá. Presently only for “gistá”.
gistá+N+Sg+Nom
gistá+N+Sg+Acc
LEXICON DUOLMUN Fierrot>fierun, instruments derived from verbs, used only for verb derivation, not for lexicalized lemmas. No short essive.
This (part of) documentation was generated from src/fst/morphology/affixes/nouns.lexc
+Sg+Com:%>jn K ;
+Ess:n K ;
+Sg+Com:jn K ;
+Sg+Com+Attr:jn K ;
+Pl+Nom: K ;
+Pl+Gen:j K ;
+Pl+Acc:jt K ;
+Pl+Ill:jda K ;
+Pl+Ine:jn K ;
+Pl+Ela:js K ;
+Pl+Com:j K ;
+Ess:a%>n K ;
+Ess+Use/NG:e%>n K ; ! not sure about the e-ending, don’t think it’s standardized
+Sg+Gen:%- K ; ! ???
+Sg+Com:jn K ;
+Ess:n K ;
+Sg+Com:jn K ;
+Ess:n K ;
+Sg+Nom+Use/NG:v K ;
+Sg+Com:jn K ;
+Sg+Com+Attr:jn K ;
+Sg+Nom:%>a K ;
+Sg+Nom+Use/NG:%>e K ;
+Ess:a%>n K ;
+Ess+Use/NG:e%>n K ;
+Sg+Nom:e%> K ;
+Ess:e%>n K ;
+Pl+Nom: K ;
+Sg+Com:jn K ;
+Sg+Com+Attr:jn K ;
+Pl+Com:j K ;
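The lines above are lexc entries of the shape `upper:lower CONTINUATION ;`, optionally followed by a `!` comment; `%` is the lexc escape character, so `%>` stands for a literal `>` on the surface side. A minimal sketch of a parser for this simple shape follows. The names `LexcEntry` and `parse_lexc_entry` are hypothetical, and the regular expression does not handle escaped colons or whitespace inside the upper or lower string.

```python
import re
from typing import NamedTuple, Optional


class LexcEntry(NamedTuple):
    upper: str                # analysis side, e.g. "+Sg+Com"
    lower: str                # surface side, may be empty
    continuation: str         # name of the continuation lexicon
    comment: Optional[str]    # text after "!", if any


ENTRY_RE = re.compile(
    r"""
    ^\s*(?P<upper>[^:\s]*)          # tags on the analysis side
    :(?P<lower>\S*)                 # surface string after the colon
    \s+(?P<cont>\S+)\s*;            # continuation lexicon, then the terminator
    \s*(?:!\s*(?P<comment>.*))?$    # optional trailing comment
    """,
    re.VERBOSE,
)


def parse_lexc_entry(line: str) -> Optional[LexcEntry]:
    """Parse one simple lexc entry line; return None if it does not match."""
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    return LexcEntry(
        match.group("upper"),
        match.group("lower"),
        match.group("cont"),
        match.group("comment"),
    )
```

Applied to `+Sg+Com:%>jn K ;`, this yields the upper side `+Sg+Com`, the lower side `%>jn` and the continuation lexicon `K`. An entry with an empty surface side, such as `+Pl+Nom: K ;`, gets an empty `lower` string.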
LEXICON ARABICCOMPOUNDS ! arabic as first part,
LEXICON ARABICCASES adds +Arab
LEXICON ARABICCASE adds +Arab
LEXICON ARABICCASE0 adds +Arab
LEXICON DIGITCASES to distinguish between 0 and oblique
LEXICON DIGITCASE0
+Num: ROMNUMTAGOBL ;
This (part of) documentation was generated from src/fst/morphology/affixes/numerals.lexc
+Use/NG+Gen:n NAMÁK ; ! adjectival -k derivation does not take pronouns
+Use/NG+Ela:sstága K ; ! Can’t find this anywhere. Maybe this is really dástága/dastagá? in “dáhtakcas”
+Use/NG+Gen: NAMÁK ; ! adjectival -k derivation does not take pronouns
+Use/NG+Gen:aj NAMÁK ; ! adjectival -k derivation does not take pronouns
+Ine:a%>jna K-s ;
+Abe+Use/NG:a%>jdak K ; ! covered in non-idiosync
+Abe+Use/NG:a%>jdagi K ; ! covered in non-idiosync
+Abe+Use/NG:a%>jdagá K ; ! covered in non-idiosync
+Abe+Use/NG:a%>jtagá K ; ! covered in non-idiosync
This (part of) documentation was generated from src/fst/morphology/affixes/pronouns.lexc
Unstressed last syllable
Words in ACCRA lexicons end in a vowel, have no CG and get “even-syllable” case marking, where case suffixes are added directly. Illative e:i, but not o:u. Last syllable is unstressed. Both non-assimilated and assimilated stems (although not all are fully, or correctly, assimilated).
LEXICON ACCRA-ani Vowel-final names where case endings are added directly, no cg. Illative e changes to i. Animals.
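The ACCRA pattern (suffix added directly to the name, with e changing to i in the illative but o staying o) can be sketched as follows. The suffix shapes are assumptions: illative -j and elative -s are inferred from the RONDANE description further down ("elative -s, ill -ij") and are not stated for ACCRA itself; the function `accra_forms` is a hypothetical illustration, not the lexc implementation.

```python
def accra_forms(name: str) -> dict:
    """Sketch of ACCRA-style case marking for a vowel-final name.

    Assumed suffixes: illative -j, elative -s (inferred, see lead-in).
    A final e changes to i before the illative suffix; o does not change.
    """
    illative_stem = name[:-1] + "i" if name.endswith("e") else name
    return {
        "Sg+Nom": name,
        "Sg+Ill": illative_stem + "j",
        "Sg+Ela": name + "s",
    }
```

Under these assumptions `accra_forms("Janne")["Sg+Ill"]` is `Jannij`, while `Antonio` keeps its final vowel and gets `Antonioj`.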
Tjuorri+N+Prop+Sem/Ani+Sg+Nom
Tjuorri+N+Prop+Sem/Ani+Sg+Ill
Tjuorri+N+Prop+Sem/Ani+Sg+Ela
LEXICON ACCRA-obj Vowel-final names where case endings are added directly, no cg. Object names
Gestapo+N+Prop+Sem/Obj+Sg+Nom
Gestapo+N+Prop+Sem/Obj+Sg+Ill
Gestapo+N+Prop+Sem/Obj+Sg+Ela
LEXICON ACCRA-org Vowel-final names where case endings are added directly, no cg. Organizations
Giellatekno+N+Prop+Sem/Org+Sg+Nom
Giellatekno+N+Prop+Sem/Org+Sg+Ill
Giellatekno+N+Prop+Sem/Org+Sg+Ela
LEXICON ACCRA-mal Vowel-final names where case endings are added directly, no cg. Male names
Antonio+N+Prop+Sem/Mal+Sg+Nom
Antonio+N+Prop+Sem/Mal+Sg+Ill
Antonio+N+Prop+Sem/Mal+Sg+Ela
LEXICON ACCRA-fem Vowel-final names where case endings are added directly, no cg. Female names
Barbro+N+Prop+Sem/Fem+Sg+Nom
Barbro+N+Prop+Sem/Fem+Sg+Ill
Barbro+N+Prop+Sem/Fem+Sg+Ela
LEXICON ACCRA-femsur Vowel-final names where case endings are added directly, no cg. Female names also used as surnames
Sara+N+Prop+Sem/Fem+Sg+Nom
Sara+N+Prop+Sem/Fem+Sg+Ill
Sara+N+Prop+Sem/Sur+Sg+Ill
Sara+N+Prop+Sem/Fem+Sg+Ela
LEXICON ACCRA-malfem Vowel-final names where case endings are added directly, no cg. Names that can be both female and male names
Janne+N+Prop+Sem/Mal+Sg+Nom
Janne+N+Prop+Sem/Fem+Sg+Nom
Janne+N+Prop+Sem/Mal+Sg+Ill
Janne+N+Prop+Sem/Mal+Sg+Ela
LEXICON ACCRA-objplc Vowel-final names where case endings are added directly, no cg. Names that can be both objects and place names
Soria-Moria+N+Prop+Sem/Obj+Sg+Nom
Soria-Moria+N+Prop+Sem/Obj+Sg+Ill
Soria-Moria+N+Prop+Sem/Obj+Sg+Ela
Soria-Moria+N+Prop+Sem/Plc+Sg+Ill
Soria-Moria+N+Prop+Sem/Plc+Der/k+N+Sg+Nom
Soria-Moria+N+Prop+Sem/Plc+Der/lasj+N+Sg+Nom
LEXICON ACCRA-femplc Vowel-final names where case endings are added directly, no cg. Names that can be both female and place names
Salla+N+Prop+Sem/Fem+Sg+Nom
Salla+N+Prop+Sem/Plc+Pl+Nom
Salla+N+Prop+Sem/Fem+Sg+Ill
Salla+N+Prop+Sem/Fem+Sg+Ela
Salla+N+Prop+Sem/Plc+Der/k+N+Sg+Nom
Salla+N+Prop+Sem/Plc+Der/lasj+N+Sg+Nom
LEXICON ACCRA-sur Vowel-final names where case endings are added directly, no cg. Surnames
Tønne+N+Prop+Sem/Sur+Sg+Nom
Tønne+N+Prop+Sem/Sur+Sg+Ill
Tønne+N+Prop+Sem/Sur+Sg+Ela
LEXICON ACCRA-malsur Vowel-final names where case endings are added directly, no cg. Names that can be both male- and surnames
Valio+N+Prop+Sem/Sur+Sg+Nom
Valio+N+Prop+Sem/Mal+Sg+Nom
Valio+N+Prop+Sem/Sur+Sg+Ill
Valio+N+Prop+Sem/Sur+Sg+Ela
LEXICON ACCRA-plc Vowel-final names where case endings are added directly, no cg. Place names
Burma+N+Prop+Sem/Plc+Sg+Nom
Burma+N+Prop+Sem/Plc+Sg+Ill
Burma+N+Prop+Sem/Plc+Sg+Ela
Burma+N+Prop+Sem/Plc+Der/k+N+Sg+Nom
Burma+N+Prop+Sem/Plc+Der/lasj+N+Sg+Nom
LEXICON ACCRA_MWE-plc Vowel-final names where case endings are added directly, no cg. Place names
LEXICON GIRUNA-plc For proper Kiruna. Same as ACCRA. Different lexicon because of sma.
Veitsiluoto+N+Prop+Sem/Plc+Sg+Nom
Veitsiluoto+N+Prop+Sem/Plc+Sg+Nom
Veitsiluoto+N+Prop+Sem/Plc+Sg+Ill
Veitsiluoto+N+Prop+Sem/Plc+Sg+Ela
Veitsiluoto+N+Prop+Sem/Plc+Der/k+N+Sg+Nom
Veitsiluoto+N+Prop+Sem/Plc+Der/lasj+N+Sg+Nom
LEXICON ACCRA-LOAN-org Only nominatives. Vowel-final names where case endings are added directly, no cg. organizations
Samenes Idrettsforbund-Norge+N+Prop+Sem/Org+Sg+Nom
Samenes Idrettsforbund-Norge+N+Prop+Sem/Org+Sg+Ela (is not standard language; negative test)
LEXICON ACCRA-LOAN-obj Only nominatives. Vowel-final names where case endings are added directly, no cg. Object names
The Norwegian Sami Experience+N+Prop+Sem/Obj+Sg+Nom
The Norwegian Sami Experience+N+Prop+Sem/Obj+Sg+Ill (is not standard language; negative test)
LEXICON ACCRA-LOAN-plc Only nominatives. Vowel-final names where case endings are added directly, no cg. Place names
Kautokeino+N+Prop+Sem/Plc+Sg+Nom
Kautokeino+N+Prop+Sem/Org+Sg+Ela (is not standard language; negative test)
In smj, RONDANE is the same as ACCRA; it is in use in smi because of differences in sme. No -lasj or -k. Last syllable is unstressed. Non-assimilated stems.
LEXICON RONDANE-plc E-final names, with no cg. elative -s, ill -ij. Place names
Bakkane+N+Prop+Sem/Plc+Sg+Nom
Bakkane+N+Prop+Sem/Plc+Sg+Gen
Bakkane+N+Prop+Sem/Plc+Sg+Acc
Bakkane+N+Prop+Sem/Plc+Sg+Ine
Bakkane+N+Prop+Sem/Plc+Sg+Ill
Bakkane+N+Prop+Sem/Plc+Sg+Ela
Bakkane+N+Prop+Sem/Plc+Sg+Com
Bakkane+N+Prop+Sem/Plc+Pl+Ill
Bakkane+N+Prop+Sem/Plc+Pl+Ela
Bakkane+N+Prop+Sem/Plc+Der/k+N+Sg+Nom (is not standard language)
Bakkane+N+Prop+Sem/Plc+Der/lasj+N+Sg+Nom (is not standard language)
LEXICON RONDANE-SG-plc E-final names, with no cg. elative -s, ill -ij. Place names
Bakkane+N+Prop+Sem/Plc+Sg+Nom
Bakkane+N+Prop+Sem/Plc+Sg+Gen
Bakkane+N+Prop+Sem/Plc+Sg+Acc
LEXICON RONDANE-LOAN Only nominative. Place names
Azorene+N+Prop+Sem/Plc+Sg+Nom
Azorene+N+Prop+Sem/Plc+Sg+Ill (is not standard language; negative test)
LEXICON RONDANE-SG-LOAN Only nominative. Place names
LEXICON RONDANE-sur Surnames
Benneche+N+Prop+Sem/Sur+Sg+Nom
Benneche+N+Prop+Sem/Sur+Sg+Ill
Benneche+N+Prop+Sem/Sur+Sg+Ela
LEXICON RONDANE-obj Objects
Office+N+Prop+Sem/Obj+Sg+Nom
Office+N+Prop+Sem/Obj+Sg+Gen
Office+N+Prop+Sem/Obj+Sg+Acc
Office+N+Prop+Sem/Obj+Sg+Ine
Office+N+Prop+Sem/Obj+Sg+Ill
Office+N+Prop+Sem/Obj+Sg+Ela
Office+N+Prop+Sem/Obj+Sg+Com
LEXICON RONDANE-org Organizations
Picture+N+Prop+Sem/Org+Sg+Nom
Picture+N+Prop+Sem/Org+Sg+Ill
Picture+N+Prop+Sem/Org+Sg+Ela
LEXICON RONDANE-mal Male names
Lawrence+N+Prop+Sem/Mal+Sg+Nom
Lawrence+N+Prop+Sem/Mal+Sg+Ill
Lawrence+N+Prop+Sem/Mal+Sg+Ela
LEXICON RONDANE-fem Female names
Jannike+N+Prop+Sem/Fem+Sg+Nom
Jannike+N+Prop+Sem/Fem+Sg+Ill
Jannike+N+Prop+Sem/Fem+Sg+Ela
These sublexica are irrelevant for ACCRA, but added for the sake of the lexicon MARJA
GATA lexicons are for Norwegian place names that end in -gata. They get even-syllable case marking. Last syllable is unstressed. Non-assimilated stems.
LEXICON GATA-plc Norwegian place names that end in -gata. They get even-syllable case marking. Last syllable is unstressed.
Munkegata+N+Prop+Sem/Plc+Sg+Nom
Munkegata+N+Prop+Sem/Plc+Sg+Ill
Munkegata+N+Prop+Sem/Plc+Sg+Ela
Words in MARJA end in a vowel, with CG and even-syllable case marking. In the illative, e changes to á, while i stays i. Last syllable is unstressed. Real Lule Sami stems.
LEXICON MARJA-fem Odd-syllable with cg. Female names
Gáddjá+N+Prop+Sem/Fem+Sg+Nom
Gáddjá+N+Prop+Sem/Fem+Sg+Ill
Gáddjá+N+Prop+Sem/Fem+Sg+Ela
LEXICON MARJA-ani Animal names
Gávrásski+N+Prop+Sem/Ani+Sg+Nom
Gávrásski+N+Prop+Sem/Ani+Sg+Ill
Gávrásski+N+Prop+Sem/Ani+Sg+Ela
LEXICON MARJA-mal Male names
Biet-Ánnda+N+Prop+Sem/Mal+Sg+Nom
Biet-Ánnda+N+Prop+Sem/Mal+Sg+Ill
Biet-Ánnda+N+Prop+Sem/Mal+Sg+Ela
LEXICON MARJA-obj Objects
Bern-Konvensjåvnnå+N+Prop+Sem/Obj+Sg+Nom
Bern-Konvensjåvnnå+N+Prop+Sem/Obj+Sg+Ill
Bern-Konvensjåvnnå+N+Prop+Sem/Obj+Sg+Ela
LEXICON MARJA-org Organizations
Domænadoajmma+N+Prop+Sem/Org+Sg+Nom
Domænadoajmma+N+Prop+Sem/Org+Sg+Ill
Domænadoajmma+N+Prop+Sem/Org+Sg+Ela
LEXICON MARJA-plc Vowel final names with Gradation and Ill change (place names)
Dundarevuobme+N+Prop+Sem/Plc+Sg+Nom
Dundarevuobme+N+Prop+Sem/Plc+Sg+Ill
Dundarevuobme+N+Prop+Sem/Plc+Sg+Ela
LEXICON MARJA-sur Surnames
Skálltje+N+Prop+Sem/Sur+Sg+Nom
Skálltje+N+Prop+Sem/Sur+Sg+Ill
Skálltje+N+Prop+Sem/Sur+Sg+Ela
LEXICON MARJA-plc-der = place name derivations and corresponding flag. Presently not used in SMJ.
LEXICON SUOBMA-plc Placenames. Like MARJA but no derivation
Suobma+N+Prop+Sem/Plc+Sg+Nom
Suobma+N+Prop+Sem/Plc+Sg+Ill
Suobma+N+Prop+Sem/Plc+Sg+Ela
LEXICON SUOBMA-org Organizations. Like MARJA but no derivation
Stressed last syllable
These proper nouns are in essence partly assimilated loanwords. Foreign words with a stressed last syllable are assimilated to Sami by (often adapting the vowel of the stressed syllable and) adding an unstressed syllable that consists of adapted (or, if necessary, added) consonants and ends in the vowel a (Morén-Duolljá 2014). Proper nouns are only partly assimilated: the vowel of the stressed syllable is not adapted in any way and no consonants are inserted; only the final “a” remains. These proper nouns therefore behave like regular a-stem nouns and get even-syllable case marking.
Words in lexicon NYSTØ end in a vowel, no cg. Non-assimilated stems.
LEXICON NYSTØ-fem Female names
Britney+N+Prop+Sem/Fem+Sg+Nom
Britney+N+Prop+Sem/Fem+Sg+Acc
Britney+N+Prop+Sem/Fem+Sg+Ill
Britney+N+Prop+Sem/Fem+Sg+Ela
LEXICON NYSTØ-mal Male name
Taneli+N+Prop+Sem/Mal+Sg+Nom
Taneli+N+Prop+Sem/Mal+Sg+Acc
Taneli+N+Prop+Sem/Mal+Sg+Ill
Taneli+N+Prop+Sem/Mal+Sg+Ela
LEXICON NYSTØ-obj Objects
Infiniti+N+Prop+Sem/Obj+Sg+Nom
Infiniti+N+Prop+Sem/Obj+Sg+Acc
Infiniti+N+Prop+Sem/Obj+Sg+Ill
Infiniti+N+Prop+Sem/Obj+Sg+Ela
LEXICON NYSTØ-org Organizations
Kulturby+N+Prop+Sem/Org+Sg+Nom
Kulturby+N+Prop+Sem/Org+Sg+Acc
Kulturby+N+Prop+Sem/Org+Sg+Ill
Kulturby+N+Prop+Sem/Org+Sg+Ela
LEXICON NYSTØ-LOAN-org Organizations loan
Sameby+N+Prop+Sem/Org+Sg+Nom
Sameby+N+Prop+Sem/Org+Sg+Ill (is not standard language)
Sameby+N+Prop+Sem/Org+Sg+Ela (is not standard language)
LEXICON NYSTØ-sur Surnames
Sandoz+N+Prop+Sem/Sur+Sg+Nom
Sandoz+N+Prop+Sem/Sur+Sg+Acc
Sandoz+N+Prop+Sem/Sur+Sg+Ill
Sandozas: Sandoz+N+Prop+Sem/Sur+Sg+Ela
Teigmo+N+Prop+Sem/Plc+Sg+Nom
Teigmo+N+Prop+Sem/Plc+Sg+Acc
Teigmo+N+Prop+Sem/Plc+Sg+Ill
Teigmo+N+Prop+Sem/Plc+Sg+Ela
LEXICON NYSTØ-LOAN-plc Place names loan
Bodø+N+Prop+Sem/Plc+Sg+Nom
Bodø+N+Prop+Sem/Plc+Sg+Ill (is not standard language)
Bodø+N+Prop+Sem/Plc+Sg+Ela (is not standard language)
LEXICON NYSTØ-plc Place names
Borgå+N+Prop+Sem/Plc+Sg+Nom
Borgå+N+Prop+Sem/Plc+Sg+Acc
Borgå+N+Prop+Sem/Plc+Sg+Ill
Borgå+N+Prop+Sem/Plc+Sg+Ela
LEXICON NYSTØ_MWE-plc Place names
Words in DUBAI lexicon end on vowel+vowel and have no cg. Last syllable is stressed. Get even syllable case marking. Non-assimilated stems. Not sure if this lexicon is necessary, at least for smj’s sake.
LEXICON DUBAI-fem I-final names. No cg. Female names
Mai+N+Prop+Sem/Fem+Sg+Nom
Mai+N+Prop+Sem/Fem+Sg+Ill
Mai+N+Prop+Sem/Fem+Sg+Ela
LEXICON DUBAI-obj I-final names. No cg. Object names
Hyundai+N+Prop+Sem/Obj+Sg+Nom
Hyundai+N+Prop+Sem/Obj+Sg+Ill
Hyundai+N+Prop+Sem/Obj+Sg+Ela
LEXICON DUBAI-org Organizations
Khoi+N+Prop+Sem/Org+Sg+Nom
Khoi+N+Prop+Sem/Org+Sg+Ill
Khoi+N+Prop+Sem/Org+Sg+Ela
LEXICON DUBAI-mal Male names
Kublai+N+Prop+Sem/Mal+Sg+Nom
Kublai+N+Prop+Sem/Mal+Sg+Ill
Kublai+N+Prop+Sem/Mal+Sg+Ela
LEXICON DUBAI-sur Surnames
Maarthai+N+Prop+Sem/Sur+Sg+Nom
Maarthai+N+Prop+Sem/Sur+Sg+Ill
Maarthai+N+Prop+Sem/Sur+Sg+Ela
LEXICON DUBAI-plc Place names
Madurai+N+Prop+Sem/Plc+Sg+Nom
Madurai+N+Prop+Sem/Plc+Sg+Ill
Madurai+N+Prop+Sem/Plc+Sg+Ela
Madurai+N+Prop+Sem/Plc+Der/k+N+Sg+Nom
Words in lexicon BERN end on a consonant, have no cg, and get even-syllable case marking with -av, -aj, -as, etc. The last syllable is stressed. Both assimilated and non-assimilated stems.
LEXICON BERN-ani Animals
Lillemor+N+Prop+Sem/Ani+Sg+Nom
Lillemor+N+Prop+Sem/Ani+Sg+Ill
Lillemor+N+Prop+Sem/Ani+Sg+Ela
LEXICON BERN-mal Male names
Eystein+N+Prop+Sem/Mal+Sg+Nom
Eystein+N+Prop+Sem/Mal+Sg+Ill
Eystein+N+Prop+Sem/Mal+Sg+Ela
LEXICON BERN-surmal Names that are both surnames and male names
Pipin+N+Prop+Sem/Sur+Sg+Nom
Pipin+N+Prop+Sem/Sur+Sg+Ill
Pipin+N+Prop+Sem/Sur+Sg+Ela
LEXICON BERN-fem Female name
Ragnfrid+N+Prop+Sem/Fem+Sg+Nom
Ragnfrid+N+Prop+Sem/Fem+Sg+Ill
Ragnfrid+N+Prop+Sem/Fem+Sg+Ela
Different lexicon for female persons. Audhild.
LEXICON BERN-sur Surnames
Lind+N+Prop+Sem/Sur+Sg+Nom
Lind+N+Prop+Sem/Sur+Sg+Ill
Lind+N+Prop+Sem/Sur+Sg+Ela
LEXICON BERN-plc Placenames
Beijing+N+Prop+Sem/Plc+Sg+Nom
Beijing+N+Prop+Sem/Plc+Sg+Ill
Beijing+N+Prop+Sem/Plc+Sg+Ela
LEXICON BERN_MWE-plc Placenames
LEXICON BERN-objsur Names used as both objects and surnames.
Stenbukk+N+Prop+Sem/Obj+Sg+Nom
Stenbukk+N+Prop+Sem/Obj+Sg+Ill
Stenbukk+N+Prop+Sem/Obj+Sg+Ela
LEXICON BERN-orgsur Names used for both organizations and surnames.
Nord+N+Prop+Sem/Org+Sg+Nom
Nord+N+Prop+Sem/Org+Sg+Ill
Nord+N+Prop+Sem/Org+Sg+Ela
LEXICON BERN-obj Objects. Obs: Different lexicon for organisations. Microsoft.
Sult+N+Prop+Sem/Obj+Sg+Nom
Sult+N+Prop+Sem/Obj+Sg+Ill
Sult+N+Prop+Sem/Obj+Sg+Ela
LEXICON BERN-org Organizations
Laks+N+Prop+Sem/Org+Sg+Nom
Laks+N+Prop+Sem/Org+Sg+Ill
Laks+N+Prop+Sem/Org+Sg+Ela
LEXICON BERN-LOAN-org Organizations loan.
Reinsamelag+N+Prop+Sem/Org+Sg+Nom
Reinsamelag+N+Prop+Sem/Org+Sg+Ill (is not standard language)
Reinsamelag+N+Prop+Sem/Org+Sg+Ela (is not standard language)
LEXICON BERN-LOAN-plc Placenames loan.
Mehamn+N+Prop+Sem/Plc+Sg+Nom
Mehamn+N+Prop+Sem/Plc+Sg+Ill (is not standard language)
Mehamn+N+Prop+Sem/Plc+Sg+Ela (is not standard language)
LEXICON BERN-LOAN-obj Objects loan.
Verneplan+N+Prop+Sem/Obj+Sg+Nom
Verneplan+N+Prop+Sem/Obj+Sg+Ill (is not standard language)
Verneplan+N+Prop+Sem/Obj+Sg+Ela (is not standard language)
Different lexicon for names that are both surnames and places.
Lexicons OY work as BERN lexicons
Words in LONDONBERN are sent to both LONDON and BERN lexicons. Non-assimilated stems.
4-syllable stems
Words in lexicon BASUDIS are trisyllabic in sg nom, and work like standard 4-syllable nouns. They end on a consonant and have cg. Even-syllable case marking with acc -áv, ill -áj, ela -ás, etc. Real Lule Sámi stems.
LEXICON BASUDIS-org Only singular. Placenames
LEXICON BASUDIS-mal Male names
Ájluhasj+N+Prop+Sem/Mal+Sg+Nom
Ájluhasj+N+Prop+Sem/Mal+Sg+Ill (Eng. !should add dummy to prevent unusual dtj-stem?)
Ájluhasj+N+Prop+Sem/Mal+Sg+Ela
LEXICON BASUDIS-plc Place names
Ulldevis+N+Prop+Sem/Plc+Sg+Nom
Ulldevis+N+Prop+Sem/Plc+Sg+Ill
Ulldevis+N+Prop+Sem/Plc+Sg+Ela
Plurals
Words in lexicon VARGGAT are even-syllable Sámi plurals.
LEXICON VARGGAT-plc Plural stems, sáme names. Place names
LEXICON VARGGAT-org Plural stems, sáme names.
Bieva+N+Prop+Sem/Plc+Sg+Nom (is not standard language)
Bieva+N+Prop+Sem/Plc+Pl+Nom
Bieva+N+Prop+Sem/Plc+Pl+Ill
Bieva+N+Prop+Sem/Plc+Pl+Ela
Words in lexicon ALEUHTAT are even-syllable assimilated plurals.
LEXICON ALEUHTAT-plc Plural names, not sami names. like -váre, -gårtje
Words in lexicon LONDON end on a consonant, have no cg, and get case marking with -av, -ij, -is, etc. The last syllable is unstressed, so they get regular odd-syllable case marking. Real Lule Sámi stems, assimilated stems and non-assimilated stems.
LEXICON LONDON-sur Odd-syllable. Surnames. Final foot structure (X.) and (X..) => Loc:%>is
Åstot+N+Prop+Sem/Sur+Sg+Nom
Åstot+N+Prop+Sem/Sur+Sg+Ill
Åstot+N+Prop+Sem/Sur+Sg+Ela
LEXICON LONDON-ani Animals
Jubmel+N+Prop+Sem/Ani+Sg+Nom
Jubmel+N+Prop+Sem/Ani+Sg+Ill
Jubmel+N+Prop+Sem/Ani+Sg+Ela
LEXICON LONDON-org Only singular Organizations
Klassekampen+N+Prop+Sem/Org+Sg+Nom
Klassekampen+N+Prop+Sem/Org+Sg+Ill
Klassekampen+N+Prop+Sem/Org+Sg+Ela
LEXICON LONDON-mal Male names
Matteus+N+Prop+Sem/Mal+Sg+Nom
Matteus+N+Prop+Sem/Mal+Sg+Ill
Matteus+N+Prop+Sem/Mal+Sg+Ela
LEXICON LONDON-malsur Names that can be both male- and surnames. Not used in smj-propernouns
Timeus+N+Prop+Sem/Mal+Sg+Nom
Timeus+N+Prop+Sem/Mal+Sg+Ill
Timeus+N+Prop+Sem/Mal+Sg+Ela
LEXICON LONDON-fem Female names
Luhták+N+Prop+Sem/Fem+Sg+Nom
Luhták+N+Prop+Sem/Fem+Sg+Ill
Luhták+N+Prop+Sem/Fem+Sg+Ela
LEXICON LONDON-malfem Names that can be both male and female names. Not used in smj-propernouns
Robin+N+Prop+Sem/Fem+Sg+Nom
Robin+N+Prop+Sem/Fem+Sg+Ill
Robin+N+Prop+Sem/Fem+Sg+Ela
LEXICON LONDON-malplc Names that can be both male names and placenames. Not used in smj-propernouns
Jergol+N+Prop+Sem/Mal+Sg+Nom
Jergol+N+Prop+Sem/Mal+Sg+Ill
Jergol+N+Prop+Sem/Mal+Sg+Ela
LEXICON LONDON-plc Only singular. Placenames
Njierek+N+Prop+Sem/Plc+Sg+Nom
Njierek+N+Prop+Sem/Plc+Sg+Ill
Njierek+N+Prop+Sem/Plc+Sg+Ela
LEXICON TJIERREK-plc Only singular. Placenames. Same as LONDON, but does not get the Sem/Sur tag, since it is not usual for SMJ place names to become surnames.
Njierek+N+Prop+Sem/Plc+Sg+Nom
Njierek+N+Prop+Sem/Plc+Sg+Ill
Njierek+N+Prop+Sem/Plc+Sg+Ela
LEXICON LONDON-orgsur Names that can be both organizations and surnames. Not used in Smj-propernouns
Rieser+N+Prop+Sem/Sur+Sg+Nom
Rieser+N+Prop+Sem/Sur+Sg+Ill
Rieser+N+Prop+Sem/Sur+Sg+Ela
LEXICON LONDON-obj Objects.
Rovdjursutredningen+N+Prop+Sem/Obj+Sg+Nom
Rovdjursutredningen+N+Prop+Sem/Obj+Sg+Ill
Rovdjursutredningen+N+Prop+Sem/Obj+Sg+Ela
LEXICON LONDON-LOAN-obj Objects loan. Not used in smj-propernouns
Sameloven+N+Prop+Sem/Obj+Sg+Nom
Sameloven+N+Prop+Sem/Obj+Sg+Ill (is not standard language)
Sameloven+N+Prop+Sem/Obj+Sg+Ela (is not standard language)
LEXICON LONDON-LOAN-plc Only nominatives. Placenames loan. Not used in Smj-propernouns
Jordandalen+N+Prop+Sem/Plc+Sg+Nom
Jordandalen+N+Prop+Sem/Plc+Sg+Ill (is not standard language)
Jordandalen+N+Prop+Sem/Plc+Sg+Ela (is not standard language)
LEXICON LONDON-LOAN-org Only nominative. Organizations loan. Not used in smj-propernouns
Samfunnsavdelingen+N+Prop+Sem/Org+Sg+Nom
Samfunnsavdelingen+N+Prop+Sem/Org+Sg+Ill (is not standard language)
Samfunnsavdelingen+N+Prop+Sem/Org+Sg+Ela (is not standard language)
JOKULL-plc are placenames. Lexicon added to make the code compile (?)
+N+Prop+Sem/Plc: LONDONDECL-PLC-SUR ; Placenames. NB added to make the code compile, needs revision. Gets an odd syllable case marking. Non-assimilated stems.
Drangajökull+N+Prop+Sem/Plc+Sg+Nom
Drangajökull+N+Prop+Sem/Plc+Sg+Ill
Drangajökull+N+Prop+Sem/Plc+Sg+Ela
Words in lexicon ANAR end on a consonant, have no cg, and get case marking with ill -ij, ela -is, i.e. odd-syllable case marking. Lule Sámi stems.
LEXICON ANAR-mal Male names.
LEXICON ANAR-plc Place names
Guhttás+N+Prop+Sem/Plc+Sg+Nom
Guhttás+N+Prop+Sem/Plc+Sg+Ill
Guhttás+N+Prop+Sem/Plc+Sg+Ela
Words in PIPPI lexicons are i-final, have no cg, no second syllable vowel change, and get odd syllable case marking with acc -hav, ill -hij, elat -his, etc. Works as “riebij”, but without the -j in nominative (it should maybe be Sirij and Pippij in nom?) and without cg. The last syllable is unstressed. Non-assimilated stems.
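As a rough illustration of the PIPPI pattern described above, the following Python sketch attaches the listed endings directly to an i-final name, with no consonant gradation and no stem vowel change. The function name `pippi_form` and the suffix table are assumptions made for this example only; the real inflection is defined in propernouns.lexc, and only the acc, ill and ela endings named above are covered.

```python
# Hypothetical sketch: PIPPI-style case marking by plain concatenation.
# The endings are the ones named in the text (acc -hav, ill -hij, ela -his).
PIPPI_ENDINGS = {
    "Nom": "",
    "Acc": "hav",
    "Ill": "hij",
    "Ela": "his",
}


def pippi_form(name: str, case: str) -> str:
    """Return the singular form of an i-final name for the given case tag."""
    if not name.endswith("i"):
        raise ValueError(f"{name!r} is not i-final, so PIPPI does not apply")
    return name + PIPPI_ENDINGS[case]


if __name__ == "__main__":
    for name in ("Kari", "Sammallahti", "Lapinlampi"):
        print(name, "->", pippi_form(name, "Ela"))
```

The elative forms built this way (Karihis, Sammallahtihis, Lapinlampihis) agree with the forms given in the PIPPI test examples.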
LEXICON PIPPI-ani Vowel-final names where case endings are added directly, no cg. Animals.
Rullahuuli+N+Prop+Sem/Ani+Sg+Nom
Rullahuuli+N+Prop+Sem/Ani+Sg+Ill
Rullahuuli+N+Prop+Sem/Ani+Sg+Ela
LEXICON PIPPI-obj Vowel-final names where case endings are added directly, no cg. Object names
Audi+N+Prop+Sem/Obj+Sg+Nom
Audi+N+Prop+Sem/Obj+Sg+Ill
Audi+N+Prop+Sem/Obj+Sg+Ela
LEXICON PIPPI-org Vowel-final names where case endings are added directly, no cg. Organizations
Kon-Tiki+N+Prop+Sem/Org+Sg+Nom
Kon-Tiki+N+Prop+Sem/Org+Sg+Ill
Kon-Tiki+N+Prop+Sem/Org+Sg+Ela
LEXICON PIPPI-mal Vowel-final names where case endings are added directly, no cg. Male names
Gianni+N+Prop+Sem/Mal+Sg+Nom
Gianni+N+Prop+Sem/Mal+Sg+Ill
Gianni+N+Prop+Sem/Mal+Sg+Ela
LEXICON PIPPI-fem Vowel-final names where case endings are added directly, no cg. Female names
Guri+N+Prop+Sem/Fem+Sg+Nom
Guri+N+Prop+Sem/Fem+Sg+Ill
Guri+N+Prop+Sem/Fem+Sg+Ela
LEXICON PIPPI-femsur Vowel-final names where case endings are added directly, no cg. Female names also used as surnames
Turi+N+Prop+Sem/Fem+Sg+Nom
Turi+N+Prop+Sem/Fem+Sg+Ill
Turi+N+Prop+Sem/Sur+Sg+Ill
Turi+N+Prop+Sem/Fem+Sg+Ela
LEXICON PIPPI-malfem Vowel-final names where case endings are added directly, no cg. Names that can be both female and male names
Kari+N+Prop+Sem/Mal+Sg+Nom
Kari+N+Prop+Sem/Fem+Sg+Nom
Kari+N+Prop+Sem/Mal+Sg+Ill
Karihis: Kari+N+Prop+Sem/Mal+Sg+Ela
LEXICON PIPPI-sur Vowel-final names where case endings are added directly, no cg. Surnames
Sammallahti+N+Prop+Sem/Sur+Sg+Nom
Sammallahti+N+Prop+Sem/Sur+Sg+Ill
Sammallahtihis: Sammallahti+N+Prop+Sem/Sur+Sg+Ela
LEXICON PIPPI-plc Vowel-final names where case endings are added directly, no cg. Place names
Lapinlampi+N+Prop+Sem/Plc+Sg+Nom
Lapinlampi+N+Prop+Sem/Plc+Sg+Ill
Lapinlampihis: Lapinlampi+N+Prop+Sem/Plc+Sg+Ela
LEXICON PIPPI-LOAN-plc Only nominatives. Vowel-final names where case endings are added directly, no cg. Place names
Haltiatunturi+N+Prop+Sem/Plc+Sg+Nom
Haltiatunturi+N+Prop+Sem/Plc+Sg+Ill (is not standard language)
★Haltiatunturijis: Haltiatunturi+N+Prop+Sem/Plc+Sg+Ela (is not standard language)
Words in lexicon DUORTNUS end on a consonant, have cg and second syllable vowel change o:u, e:á. Odd-syllable case marking. Real Lule Sámi stems or one non-assimilated stem.
LEXICON DUORTNUS-mal Male names
Mihkal+N+Prop+Sem/Mal+Sg+Nom
Mihkal+N+Prop+Sem/Mal+Sg+Ill
Mihkal+N+Prop+Sem/Mal+Sg+Ela
LEXICON DUORTNUS-sur Surnames
Vándar+N+Prop+Sem/Sur+Sg+Nom
Vándar+N+Prop+Sem/Sur+Sg+Ill
Vándar+N+Prop+Sem/Sur+Sg+Ela
LEXICON DUORTNUS-org Odd-syllable ending on consonant, with cg. Organizations
LEXICON DUORTNUS-plc Odd-syllable ending on consonant, with cg. Placenames
Hardangerduottar+N+Prop+Sem/Plc+Sg+Nom
Hardangerduottar+N+Prop+Sem/Plc+Sg+Gen
Hardangerduottar+N+Prop+Sem/Plc+Sg+Ill
Hardangerduottar+N+Prop+Sem/Plc+Sg+Ela
LEXICON TIEMPEL-obj Same as DUORTNUS, only without second syllable vowel change. Odd-syllable case marking. Lexicon presently only for two -tiempel-final words. Lule Sámi stems.
Artemistiempel+N+Prop+Sem/Obj+Sg+Nom
Artemistiempel+N+Prop+Sem/Obj+Sg+Ill
Artemistiempel+N+Prop+Sem/Obj+Sg+Ine
Artemistiempel+N+Prop+Sem/Obj+Sg+Ela
LEXICON TIEMPEL-org Same as DUORTNUS, only without second syllable vowel change. Odd-syllable case marking. Lexicon presently only for two -tiempel-final words. Lule Sámi stems.
Samovarteáhtar+N+Prop+Sem/Org+Sg+Nom
Samovarteáhtar+N+Prop+Sem/Org+Sg+Ill
Samovarteáhtar+N+Prop+Sem/Org+Sg+Ine
Samovarteáhtar+N+Prop+Sem/Org+Sg+Ela
Lexicon HEANDARAT is not in use in smj
+Pl+Nom:aQ1 K ; +Pl+Gen:aQ1j K ; +Pl+Gen:aQ1j RHyph ; +Pl+Acc:aQ1jt K ; +Pl+Ill:aQ1jda K ; +Pl+Ine:aQ1jn K ; +Pl+Ela:aQ1js K ; +Pl+Com:aQ1j K ;
Words in lexicon EATNAMAT are odd-syllable plurals. Lule sami stems and non-assimilated stems.
LEXICON EATNAMAT-plc Place names. Presently only for Vuolleednama
Vuolleednama+N+Prop+Sem/Plc+Sg+Nom (is not standard language)
Vuolleednama+N+Prop+Sem/Plc+Pl+Nom
Vuolleednama+N+Prop+Sem/Plc+Pl+Ill
Vuolleednama+N+Prop+Sem/Plc+Pl+Ela
LEXICON EATNAMAT-org Organizations
Words in lexicon DAVVISUOLLU are contracted proper nouns ending on -åj/-oj. Lule Sámi stems.
LEXICON DAVVISUOLU-plc Contracted stems ending on -oj. Place names.
Victoriasuoloj+N+Prop+Sem/Plc+Sg+Nom
Victoriasuoloj+N+Prop+Sem/Plc+Sg+Ill
Victoriasuoloj+N+Prop+Sem/Plc+Sg+Ela
Words in lexicon GEAVNNIS are contracted proper nouns ending on -s.
LEXICON GEAVNNIS-plc Contracted stems ending on -es. Place names. Lule sami stems.
Gaza-Sárges+N+Prop+Sem/Plc+Sg+Nom
Gaza-Sárges+N+Prop+Sem/Plc+Sg+Ill
Gaza-Sárges+N+Prop+Sem/Plc+Sg+Ela
Words in lexicon SUOLLOT are contracted plurals. Lule sami stems.
LEXICON SULLOT-plc Plural names, only names ending on -suollu.
Falklandsuollu+N+Prop+Sem/Plc+Sg+Nom (is not standard language)
Falklandsuollu+N+Prop+Sem/Plc+Pl+Nom
Falklandsuollu+N+Prop+Sem/Plc+Pl+Ill
Falklandsuollu+N+Prop+Sem/Plc+Pl+Ela
ERVASTI is only used in smi-propernouns. Ervasti names are 3-syllable and are needed as a separate lexicon because of sma. ERVASTI is the same as ACCRA in smj and gets even-syllable case marking.
MAKI and NIEMI are only used in smi-propernouns. Maki names are even-syllable Finnish names and are needed as a separate lexicon because of sma. MÄKI is the same as ACCRA in smj and gets even-syllable case marking.
HANNOLA is the same as ACCRA
This (part of) documentation was generated from src/fst/morphology/affixes/propernouns.lexc
This (part of) documentation was generated from src/fst/morphology/affixes/symbols.lexc
Table of content:
IV means intransitive verbs, TV means transitive verbs.
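The test examples in this section are analysis strings of the form lemma+tag+tag+… . A minimal Python sketch of how such a string could be split, and how the IV/TV tag could be read off, is given below. The helper names `split_analysis` and `transitivity` are hypothetical and are not part of the GiellaLT tools.

```python
from typing import List, Optional, Tuple

# Hypothetical helpers for reading analysis strings such as
# "soajttet+V+IV+Ind+Prs+Sg1"; not part of the actual toolchain.


def split_analysis(analysis: str) -> Tuple[str, List[str]]:
    """Split an analysis string into its lemma and its list of tags."""
    lemma, *tags = analysis.split("+")
    return lemma, tags


def transitivity(analysis: str) -> Optional[str]:
    """Return 'intransitive', 'transitive', or None if neither tag occurs."""
    _, tags = split_analysis(analysis)
    if "IV" in tags:
        return "intransitive"
    if "TV" in tags:
        return "transitive"
    return None


if __name__ == "__main__":
    print(split_analysis("soajttet+V+IV+Ind+Prs+Sg1"))
    print(transitivity("jáhkket+V+TV+Ind+Prs+Sg1"))
```

The sketch assumes that "+" never occurs inside a lemma, which holds for the examples shown here.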
LEXICON NEG
LEXICON ÅRROT
LEXICON LIEHKET
LEXICON LULU
LEXICON GALGGAT_IV even-syllable modal verbs.
soajttet+V+IV+Ind+Prs+Sg1
soajttet+V+IV+Ind+Prt+Sg1
soajttet+V+IV+Ind+Prt+Pl1
LEXICON VIERTTIT_IV Contracted modal verbs.
hæhttut+V+IV+Inf
hæhttut+V+IV+Inf
hæhttut+V+IV+Ind+Prs+Sg1
hæhttut+V+IV+Ind+Prs+Sg1
hæhttut+V+IV+Ind+Prt+Pl1
hæhttut+V+IV+Ind+Prt+Pl1
Intransitives
LEXICON GALSSJOT_IV Impersonal o-verbs
hærmmot+V+IV+Ind+Prs+Sg3
hærmmot+V+IV+Ind+Prs+Sg3
hærmmot+V+IV+Ind+Prt+Sg3
hærmmot+V+IV+Ind+Prt+Sg3
LEXICON BÅRSSJOT_IV o-verbs with
hæssot+V+IV+Ind+Prs+Sg1
hæssot+V+IV+Ind+Prs+Sg1
hæssot+V+IV+Ind+Prt+Sg1
hæssot+V+IV+Ind+Prt+Pl1
hæssot+V+IV+Ind+Prt+Pl1
LEXICON VILSSJOT_IV o-verbs as BÅRSSJOT but without derivations -stit, -stallat, -stahttet, -stasstet. With dim -astit forms that are hardcoded
libjjot+V+IV+Ind+Prs+Sg1
libjjot+V+IV+Ind+Prt+Sg1
libjjot+V+IV+Ind+Prt+Pl1
LEXICON BUOLLET_IV e-verbs
liddet+V+IV+Ind+Prs+Sg1
liddet+V+IV+Ind+Prt+Sg1
liddet+V+IV+Ind+Prt+Pl1
LEXICON BOAHTET_IV e-verbs like BUOLLET_IV without passive
boahtet+V+IV+Ind+Prs+Sg1
boahtet+V+IV+Ind+Prt+Sg1
boahtet+V+IV+Ind+Prt+Pl1
LEXICON VIEDJET_IV e-verbs GRADE II-I WITH IE DIPHT.
biehket+V+IV+Ind+Prs+Sg1
biehket+V+IV+Ind+Prs+Sg1
biehket+V+IV+Ind+Prt+Sg1
biehket+V+IV+Ind+Prt+Pl1
LEXICON ASSTAT_IV only for asstat, no passive
asstat+V+IV+Ind+Prs+Sg1
asstat+V+IV+Ind+Prt+Sg1
asstat+V+IV+Ind+Prt+Pl1
LEXICON RAVGGAT_IV a- and å-verbs only Sg3 passive.
bivvat+V+IV+Ind+Prs+Sg1
bivvat+V+IV+Ind+Prt+Sg1
bivvat+V+IV+Ind+Prt+Pl1
LEXICON BIEGGAT_IV Impersonals
dednjat+V+IV+Ind+Prs+Sg3
dednjat+V+IV+Ind+Prs+Sg3
dednjat+V+IV+Ind+Prt+Sg3
LEXICON RAVGGALASSTET_IV Like RAVGGAT for already derived words (except words ending -uššat) - no actio as first part of compounds, but reintroduced
dehpudallat+V+IV+Ind+Prs+Sg1
dehpudallat+V+IV+Ind+Prt+Sg1
dehpudallat+V+IV+Ind+Prt+Pl1
LEXICON BIEKKASTALLAT_IV Already derived impersonals
duhpárasstet+V+IV+Ind+Prs+Sg3
duhpárasstet+V+IV+Ind+Prt+Sg3
LEXICON GUOTTEDALLAT_IV passives on -allat - no actio as first part of compounds, but reintroduced
duolmudallat+V+IV+Ind+Prs+Sg1
duolmudallat+V+IV+Ind+Prt+Sg1
duolmudallat+V+IV+Ind+Prt+Pl1
LEXICON HIEBADUVVAT_IV passives on -uvvat - no actio as first part of compounds, but reintroduced
duostoduvvat+V+IV+Ind+Prs+Sg1
duostoduvvat+V+IV+Ind+Prt+Sg1
duostoduvvat+V+IV+Ind+Prt+Pl1
Transitives
LEXICON MÁHTTET_TV verbs without personal passive
jáhkket+V+TV+Ind+Prs+Sg1
jáhkket+V+TV+Ind+Prt+Sg1
jáhkket+V+TV+Ind+Prt+Pl1
LEXICON BASSAT_TV a- and å-verbs. Three passives
jåksåt+V+TV+Ind+Prs+Sg1
jåksåt+V+TV+Ind+Prt+Sg1
jåksåt+V+TV+Ind+Prt+Pl1
LEXICON BASSALASSTET_TV Like BASSAT for already derived words (except words ending -uššat) - no actio as first part of compounds, but reintroduced. Three passives
jårgudallat+V+TV+Ind+Prs+Sg1
jårgudallat+V+TV+Ind+Prt+Sg1
jårgudallat+V+TV+Ind+Prt+Pl1
LEXICON HIEJTEDAHTTET_TV Like BASSALASSTET_TV, but for words ending on -ahttet. The difference is Use/NG and Use/-Spell for NomAg “hiejedahttijn”, since this is rarely used and is mixed up with the gerund “hiejtedattijn”. Like BASSAT for already derived words (except words ending -uššat) - no actio as first part of compounds, but reintroduced. Three passives
jårgudallat+V+TV+Ind+Prs+Sg1
jårgudallat+V+TV+Ind+Prt+Sg1
jårgudallat+V+TV+Ind+Prt+Pl1
LEXICON JUHKAT_TV a-verbs like BASSAT_TV but without derivations -stit, -stallat, -stahttet, -stasstet. Dim -istit forms that are hardcoded. Three passives
njammat+V+TV+Ind+Prs+Sg1
njammat+V+TV+Ind+Prt+Sg1
njammat+V+TV+Ind+Prt+Pl1
LEXICON LÁHPPET_TV e-verbs. Three passives
oajttet+V+TV+Ind+Prs+Sg1
oajttet+V+TV+Ind+Prt+Sg1
oajttet+V+TV+Ind+Prt+Pl1
LEXICON JIEHKET_TV e-verbs GRADE II-I WITH IE DIPHT. Three passives
sievvet+V+TV+Ind+Prs+Sg1
sievvet+V+TV+Ind+Prs+Sg1
sievvet+V+TV+Ind+Prt+Sg1
sievvet+V+TV+Ind+Prt+Pl1
LEXICON DIEHTET_TV Only this one word, unusual diphthong behavior. No passive
diehtet+V+TV+Ind+Prs+Sg1
diehtet+V+TV+Ind+Prt+Sg1
diehtet+V+TV+Ind+Prt+Pl1
LEXICON GÁDJOT_TV o-verbs. only duvvat passive.
sjpædtjot+V+TV+Ind+Prs+Sg1
sjpædtjot+V+TV+Ind+Prs+Sg1
sjpædtjot+V+TV+Ind+Prt+Sg1
sjpædtjot+V+TV+Ind+Prt+Sg1
sjpædtjot+V+TV+Ind+Prt+Pl1
sjpædtjot+V+TV+Ind+Prt+Pl1
LEXICON JÅRGGOT_TV o-verbs with dim -astit that are hardcoded. Duvvat and dallat passive.
boarkkot+V+TV+Ind+Prs+Sg1
boarkkot+V+TV+Ind+Prt+Sg1
boarkkot+V+TV+Ind+Prt+Pl1
This is just awaiting a manual classification
LEXICON BIEKKASTIT_IV Impersonals, only Sg3
LEXICON JÅRGESTIT_IV At the moment IV, we may perhaps change IV/TV.
doalvestit+V+IV+Ind+Prs+Sg1
doalvestit+V+IV+Ind+Prt+Sg1
doalvestit+V+IV+Ind+Prt+Pl1
LEXICON BEGATJIT_IV Words ending -tjit, -jdit, reciprocals on -dit, momentatives on -dit, -edit, continuatives on -ldit, -nit, essives on -hit and 5-syllables - no actio cmps, but only Sg3 passive reintroduced
duojkkuhit+V+IV+Ind+Prs+Sg1
duojkkuhit+V+IV+Ind+Prt+Sg1
duojkkuhit+V+IV+Ind+Prt+Pl1
LEXICON BALÁDIT_IV continuatives on -dit, frequentatives on -odit, reciprocals, momentatives and frequentatives ending -alit - actio cpms, only Sg3 passive
lihtudit+V+IV+Ind+Prs+Sg1
lihtudit+V+IV+Ind+Prt+Sg1
lihtudit+V+IV+Ind+Prt+Pl1
LEXICON SUOGNALIT_IV Trisyllabic Verbs ending -lit. only Sg3 passive
loavkkalit+V+IV+Ind+Prs+Sg1
loavkkalit+V+IV+Ind+Prt+Sg1
loavkkalit+V+IV+Ind+Prt+Pl1
LEXICON LASSÁNIT_IV verbs ending -nit, -sit, no passive
rievddánit+V+IV+Ind+Prs+Sg1
rievddánit+V+IV+Ind+Prt+Sg1
rievddánit+V+IV+Ind+Prt+Pl1
LEXICON BÁHTARIT_IV verbs ending -rit. only Sg3 passive
sjtávttjurit+V+IV+Ind+Prs+Sg1
sjtávttjurit+V+IV+Ind+Prt+Sg1
sjtávttjurit+V+IV+Ind+Prt+Pl1
LEXICON UNNEDIT_TV All -uvvat passives.
nuoledit+V+TV+Ind+Prs+Sg1
nuoledit+V+TV+Ind+Prt+Sg1
nuoledit+V+TV+Ind+Prt+Pl1
LEXICON MUJTATJIT_TV Words ending -tjit, -jdit, reciprocals on -dit, momentatives on -dit, -edit, continuatives on -ldit, -nit, essives on -hit and 5-syllables - no actio cmps, but reintroduced. All -uvvat passives
nårddådit+V+TV+Ind+Prs+Sg1
nårddådit+V+TV+Ind+Prt+Sg1
nårddådit+V+TV+Ind+Prt+Pl1
LEXICON BÅNJÅDIT_TV continuatives on -dit, frequentatives on -odit, reciprocals, momentatives and frequentatives ending -alit - actio cpms. All -uvvat passives.
tsirggalit+V+TV+Ind+Prs+Sg1
tsirggalit+V+TV+Ind+Prt+Sg1
tsirggalit+V+TV+Ind+Prt+Pl1
LEXICON VUORDDELIT_TV Trisyllabic Verbs ending -lit. All -uvvat passives
tsåggålit+V+TV+Ind+Prs+Sg1
tsåggålit+V+TV+Ind+Prt+Sg1
tsåggålit+V+TV+Ind+Prt+Pl1
LEXICON SJIERRIT_IV Impersonals
boavddit+V+IV+Ind+Prs+Sg3
boavddit+V+IV+Ind+Prt+Sg3
LEXICON BASSUT_IV Passives
buvvut+V+IV+Ind+Prs+Sg1
buvvut+V+IV+Ind+Prt+Sg1
buvvut+V+IV+Ind+Prt+Pl1
LEXICON OADDÁT_IV Inchoative (doarrut, jåhttåt). Only Sg3 passive. Does not make nouns via -ár derivation.
bæhkkát+V+IV+Ind+Prs+Sg1
bæhkkát+V+IV+Ind+Prs+Sg1
bæhkkát+V+IV+Ind+Prt+Sg1
bæhkkát+V+IV+Ind+Prt+Sg1
bæhkkát+V+IV+Ind+Prt+Pl1
bæhkkát+V+IV+Ind+Prt+Pl1
LEXICON DULLUT_IV Does not make nouns via -ár derivation. Only Sg3 passive.
dussut+V+IV+Ind+Prs+Sg1
dussut+V+IV+Ind+Prt+Sg1
dussut+V+IV+Ind+Prt+Pl1
LEXICON TJUOLLÁT_TV Inchoative (gullát, bårråt). All passives. Does not make nouns via -ár derivation.
gajkkát+V+TV+Ind+Prs+Sg1
gajkkát+V+TV+Ind+Prt+Sg1
gajkkát+V+TV+Ind+Prt+Pl1
LEXICON STRÁFFUT_TV Does not make nouns via -ár derivation. All duvvat passives.
gáhpput+V+TV+Ind+Prs+Sg1
gáhpput+V+TV+Ind+Prt+Sg1
gáhpput+V+TV+Ind+Prt+Pl1
LEXICON TSIEGGIT_TV Makes nouns via -ár derivation. All duvvat passives.
gámmpit+V+TV+Ind+Prs+Sg1
gámmpit+V+TV+Ind+Prt+Sg1
gámmpit+V+TV+Ind+Prt+Pl1
gámmpit+V+TV+Der/r+N+Sg+Nom
LEXICON VALLIT_TV Makes nouns via -ár derivation. Gets only passive Sg3
hinnit+V+TV+Ind+Prs+Sg1
hinnit+V+TV+Ind+Prt+Sg1
hinnit+V+TV+Ind+Prt+Pl1
hinnit+V+TV+Der/r+N+Sg+Nom
Contracted verbs, assimilated and outside the main pattern.
LEXICON PLÁNIT_TV Transitive two-syllable contracted words that are not in third grade, as contracted verbs have been. Two-syllable transitive NEW loan verbs. Makes nouns via -ár derivation. All passives.
bloaggit+V+TV+Ind+Prs+Sg1
bloaggit+V+TV+Ind+Prs+Sg1
bloaggit+V+TV+Ind+Prt+Sg1
bloaggit+V+TV+Ind+Prt+Sg1
bloaggit+V+TV+Ind+Prt+Pl1
bloaggit+V+TV+Ind+Prt+Pl1
bloaggit+V+TV+Der/r+N+Sg+Nom
bloaggit+V+TV+Der/r+N+Sg+Nom
LEXICON SLEDUT_IV Intransitive two-syllable contracted words that are not in third grade, as contracted verbs have been. Only Sg3 passive.
håŋŋlit+V+IV+Ind+Prs+Sg1
håŋŋlit+V+IV+Ind+Prt+Sg1
håŋŋlit+V+IV+Ind+Prt+Pl1
LEXICON BADASS_TV NEW badly assimilated two-syllable transitive loan verbs. Makes nouns via -ár derivation. All passives. Err/orth tagged in stem file.
LEXICON BADASS_IV NEW badly assimilated two-syllable intransitive loan verbs. Makes nouns via -ár derivation. Only Sg3 passive. Err/orth tagged in stem file.
LEXICON ABBONERE_TV Transitive loan words with more than two syllables with -erit/ierit endings. Duvvat passives. Does not make nouns via -ár derivation. Only the last two syllables are assimilated to Sámi. Long -e is assimilated in different ways in Norway and Sweden: in Norway it becomes -ie, and in Sweden -e.
LEXICON BRILJERE_IV Intransitive loan words with more than two syllables with -erit/ierit endings. Does not make nouns via -ár derivation. Only the last two syllables are assimilated to Sámi. Long -e is assimilated in different ways in dialects in Norway and Sweden: in Norway it often becomes -ie, while in Sweden it is usually -e.
LEXICON BRILJERE_IV_INFL
briljierit+V+IV+Ind+Prs+Sg1
briljierit+V+IV+Ind+Prs+Sg1
briljierit+V+IV+Ind+Prt+Sg1
briljierit+V+IV+Ind+Prt+Sg1
briljierit+V+IV+Ind+Prt+Pl1
briljierit+V+IV+Ind+Prt+Pl1
LEXICON ABBONERE_TV_INFL
abbonierit+V+TV+Ind+Prs+Sg1
abbonierit+V+TV+Ind+Prs+Sg1
abbonierit+V+TV+Ind+Prt+Sg1
abbonierit+V+TV+Ind+Prt+Sg1
abbonierit+V+TV+Ind+Prt+Pl1
abbonierit+V+TV+Ind+Prt+Pl1
This (part of) documentation was generated from src/fst/morphology/affixes/verbs.lexc
This (part of) documentation was generated from src/fst/morphology/compounding.lexc
This file documents the phonology.twolc file
The file contains the rule set for the non-segmental Lule Sámi morphophonological rules
The file is modeled upon the corresponding file for North Sámi, but has been revised and differs from it on several issues. The grammatical sources are Spiik 1989: Lulesamisk grammatik and Nystø and Johnsen 2001: Sámásta 2.
The rule file has the sections Alphabet, Sets, Definition and Rules. The rules are ordered thematically, with 3 main sections: Consonant alternations (except CG), vowel alternations, and consonant gradation.
All Lule Sámi letters are listed. The Lule Sámi ENG sound is represented as ñ. The Lule Sámi letter repertoire is not fully standardised. In the source code we write (and you shall write!) æ; ø; ŋ, but the parser tolerates input written with the letters ä; ö; ń, ñ (cf. the 4 rules in the file smj/src/orthography/spellrelax.regex).
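To make the tolerated spellings concrete, here is a small Python sketch that maps the tolerated letters back to the ones used in the source code. It assumes the pairing ä→æ, ö→ø, ń→ŋ and ñ→ŋ, covers lowercase only, and uses the hypothetical name `normalise`; the actual tolerance is implemented as regex rules in spellrelax.regex, not in Python.

```python
import unicodedata

# Hypothetical normalisation sketch. The pairing below is an assumption
# read off the text: tolerated letter -> letter used in the source code.
TOLERATED_TO_STANDARD = str.maketrans({
    "ä": "æ",
    "ö": "ø",
    "ń": "ŋ",
    "ñ": "ŋ",
})


def normalise(text: str) -> str:
    """Compose combining characters (NFC), then replace tolerated letters."""
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(TOLERATED_TO_STANDARD)


if __name__ == "__main__":
    for word in ("hässot", "håññlit"):
        print(word, "->", normalise(word))
```

The NFC step is there so that input typed as a base letter plus a combining diacritic is treated the same way as the precomposed letter.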
small letters = a á b c d e f g h i j k l m n ñ ń ŋ o p q r s t u v w x y z æ:æä ä:æä ø ö å %- é ó ú í à è ò ù ì ë ü ï â ê ô û î ã ý ç č đ ð š ŧ þ ß ª
capital letters = A Á B C D E F G H I J K L M N Ñ Ń Ŋ O P Q R S T U V W X Y Z Æ:ÆÄ Ä:ÆÄ Ø Ö Å É Ó Ú Í À È Ò Ù Ì Ë Ü Ï Â Ê Ô Û Î Ã Ý Ç Č Đ Ð Š Ŧ þ
The 3rd degree mark º is never realized, hence declared as º:0.
º:0 = Gradation mark
%/ = Literal /, not the TWOLC reserved symbol
‘:’ = Apostrophe
Literal quotes and angles must be escaped (cf morpheme boundaries further down):
h2, g2 etc. are consonants deleted in the Nom. m3, d3 etc. (?) are consonants that undergo certain processes word-finally. This issue should be looked into. Perhaps the two sets can be unified. The reason why there are more distinctions than for sme, is that the cns deletion process is more phonological in sme.
’:’ = Morphophonemes in sme, here temporarily due to common propernoun file
The Dummy symbols are taken from the sme file for convenience. Only a small part of them are actually used; they are defined in the Sets section along the way, and included there as soon as they are used. The set of actually used Dummy symbols is thus the set declared in “Dummy”. The Dummy symbols trigger morphophonological rules. X is used for nouns and adjectives, Y for verbs and Q for processes common to all. The symbols themselves are used in the following way:
OBS: the definitions are not all correct or sufficiently specific
**Z4:0**: weak grade trigger fºf:f and e:á, e:å, o:á, o:u in front of diminutives, e:å in -lasj der
These are the sets:
WeG: the dummy symbols that trigger weak grade
Vow = a á e i o u y æ ä ø ö å æä
A Á E I O U Y Æ Ä Ø Ä Å ÆÄ
é ó ú í à è ò ù ì ë ü ï â ê ô û î ã ý
É Ó Ú Í À È Ò Ù Ì Ë Ü Ï Â Ê Ô Û Î Ã Ý
a9 e9 o9 æ9 ä9
a9 e9 o9 æ9 ä9
É Ó Ú Í À È Ò Ù Ì Ë Ü Ï Â Ê Ô Û Î Ã Ý ;
CapCns = B C D F G H J K L M N Ñ Ń Ŋ P Q
R S T V W X Z Ç Č Đ Ð Š Ŧ þ ;
In this section, the consonants are defined. This includes consonant clusters in the various grades and consonant alternations.
The alternation patterns follow Spiik’s alternation series, here named S4, S5, … for “Spiik alternation series 4, 5, etc.”, as they are presented in his grammar.
| Class | Alternation | Series |
| --- | --- | --- |
| S7 | kkn:k0n | series 1 |
| S8 | fºf:f0f | series 2 |
| S9 | jgg:j0g | series 3 |
| S4 | hkk:h0k | series 4 |
| S5 | xy:zy (no zeros) | series 5 |
| S6 | xx:yy (no zeros) | series 6 |
| S7 | xy:zy (no zeros) | series 7 |
| S8 | —– (no cg) | series 8 |
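The alternations in the table above are written as lexical:surface pairs of equal length, with 0 marking a deleted symbol. The Python sketch below shows one way to read such a pair; the function name `realise` is hypothetical, and the sketch only handles the plain zero-padded pairs (kkn:k0n, jgg:j0g, hkk:h0k), not the schematic x/y/z rows or the row containing the gradation mark º.

```python
from typing import Tuple

# Hypothetical reader for "lexical:surface" pairs such as "hkk:h0k".
# "0" is taken to mean that the aligned lexical symbol is deleted.


def realise(pair: str) -> Tuple[str, str]:
    """Return (lexical cluster, surface cluster with zeros removed)."""
    lexical, surface = pair.split(":")
    if len(lexical) != len(surface):
        raise ValueError(f"sides of {pair!r} are not aligned symbol by symbol")
    return lexical, surface.replace("0", "")


if __name__ == "__main__":
    for pair in ("kkn:k0n", "jgg:j0g", "hkk:h0k"):
        lexical, surface = realise(pair)
        print(f"{lexical} ~ {surface}")
```

Keeping both sides the same length is what lets each rule address a single symbol position, which is the reason for the zero padding.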
Definition of gradation symbols:
LowerG12: A definition of Grade 1 or 2 consonant sequences
G32: A definition of Grade 3 or 2 consonant sequences
G31: A definition of Grade 3 or 1 consonant sequences, which is not part of SMJ grammar and is only for err/orths
The rules section has the following chapters: Consonant alternations in certain pos, vowel lengthening, diphthong simplification, stem vowel alternations, consonant gradation rules
All rules deal with word-final position.
**Word Final Devoicing of Certain Single Consonants d9 etc. **
**Word Final Devoicing of Certain Single Consonants m9-v ** ! Split up because of err/orths ending on v, gierkav> we want err/orth gierkkam
gier0kav>0
gierkkam>a
gierkkam>ij
**Err/Orths. **
Word final weakening -tj and -ttj to -sj part 1
Word final weakening -tj and -ttj to -sj part 2
jågåsj
Word Final Deletion of n8 m8 g8 h8
Word Final Neutralization of g8, h8, m8
Deleting Final h9 in Short Essive of Uneven Syllables
Deleting Final l9 in Short Essive of Uneven Syllables
Deleting Final m9 in Short Essive of Uneven Syllables
Deleting Final n9 in Short Essive of Uneven Syllables
Deleting Final r9 in Short Essive of Uneven Syllables
The second syllable vowel a is lengthened to á whenever the stem consonants are in grade 1 and the first syllable vowel is short. Short vowels cannot both precede and follow a single intervocalic consonant.
Compulsatory lengthening in grade I even-syllables
The diphthong simplification handles oa:å and æ:e. Phonologically, these are identical processes, but since the diphthong is written with two letters in the former case and with one letter in the latter, the alternations must be handled separately. This section also handles ie:æ, which is in principle the same as oa:å, but the alternation does not occur in as many contexts.
**oa:å Diphtong Simplification Part I **
oa:å Diphtong Simplification Part II
toahkki00jn
★t0åhkki00jn (is not standard language)
b0ållu0j
★r0åvggu0j (is not standard language)
t0ås0su00jn
★toas0su00jn (is not standard language)
★m0ås0su0jn (is not standard language)
moas0su0jn
goar0ru00jn
goarru00jn
★g0år0ru00jn (is not standard language)
★g0år0ru0 (is not standard language)
g0årru0
doaddje0
★d0åddje0 (is not standard language)
g0år0ru0dit
★goar0ru0dit (is not standard language)
toabbmu00j
t0åbmu0j
★t0åbbmu00j (is not standard language)
★toa0mu0dallat (is not standard language)
oaddu00j
b0å0sjku00jn
★boas0jku00jn (is not standard language)
b0åj0stu00jn
★boaj0stu00jn (is not standard language)
b0åkku00jn
★boakku00jn (is not standard language)
**æ:e Diphthong Simplification **
hæärránis
hæärránis#gæähttjalibme>
pasien0ta>0
paten0ta>0
kvotien0ta>0
klien0ta>0
Lev0nja>0
a^dræässa#sáhtso>
★a^dressa#sáhtso> (is not standard language)
**ie:æä Diphthong Simplification Part I **
0æälvv00ut
0æähtts00up
ie:æä Diphthong Simplification Part II The multichar æä is always the only option
jæähttse>0
jæähttse>0
g0æä0rá»0dalla>t
Vowel-change oa:å for verbs part I
Vowel-change oa:å for verbs part II
hoallá0
goaddne0
★hållá0 (is not standard language)
This section is divided according to stem vowels: a-, e-, o-, å-stems.
For a-stems, there is a:e and a:i. Each alternation is triggered by a combination of phonological content and dummy symbols.
a:e in Present Participle of even-syllable verbs
a:i in Prs Prc of even-syllable verbs
a-stem vowel deletion
For e-stems, there is e:i, e:á, e:å, e:u and e:a. Each alternation is triggered by a combination of phonological content and dummy symbols.
e:i in e-stems
manassi0j
bie0si0j
boahtti0j
gá0li0sj
gá0li0tjav
gá0li0tjin
gá0li0tjihpit
gá0li0tjibá
gá0li0tjip
gá0li0tja
gie0ri>0tja
The following two rules constitute a <= / => rule pair.
e:á in certain stem types 1
e:á in certain stem types 2
bárnná0m
★bárnne0m (is not standard language)
bálggá0v
gállá0m#
gá0lá0v#
bá0gu0sj#
goa0dá0sj#
e:å in certain stem types with å as root vowel
jå0då0v
jåhtå0
jåhtå0m
e-stem vowel deletion
For i-stems, there is i:á. The alternation is triggered by a combination of phonological content and dummy symbols.
i:á in Verb Derivation
The duplicates of the three lines of the two following rules are there to resolve the => conflict between the two rules.
o:u in certain stem types 1
o:u in certain stem types 2
u:o in contracted nouns
o-stem vowel deletion
å:e in Present Participle of even-syllable verbs
å:i in Actor nouns of even-syllable verbs
å-stem vowel deletion
Stem vowel deletion in even-syllable verbs, imp 3sg, 3du, 2pl, 3pl
0æälvv00up
giess00up
The consonant gradation rules differ considerably from the corresponding rules for North Sámi. Instead of generalizing over sets of consonants (Cx:Cy <=> …), each rule contains the alternation for one consonant only, and to the right of the <=> arrow are listed all the contexts where the relevant alternation appears. The disadvantage of this method is that the same context must be written several times: if e.g. p, t and k are all deleted in the same contexts, each of these contexts must be written once for each consonant. The advantage is that there are no conflicts during compilation, and compilation takes 10 seconds rather than 3 minutes. The earlier North-Sámi-style rule set was ordered according to CG pattern. This pattern is still visible in the new rules, via the reference S1-3 etc. (Spiik’s Series 1, 3-letter pattern, etc.) behind each subrule.
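The trade-off described above can be pictured with a toy Python sketch: one set-based rule is expanded into one rule per consonant, each carrying its own copy of the context list. The function name and the context strings are placeholders invented for this illustration; they are not real twolc contexts and nothing here is generated by the actual build.

```python
from typing import Dict, List

# Hypothetical illustration of per-consonant rules; "context A" and
# "context B" are placeholders, not contexts from phonology.twolc.


def expand_per_consonant(consonants: List[str], contexts: List[str]) -> Dict[str, List[str]]:
    """Give every consonant its own rule with its own copy of the contexts."""
    return {consonant: list(contexts) for consonant in consonants}


rules = expand_per_consonant(["p", "t", "k"], ["context A", "context B"])
# Total number of context lines that have to be written out by hand.
written = sum(len(contexts) for contexts in rules.values())

if __name__ == "__main__":
    for consonant, contexts in rules.items():
        print(f"{consonant}:0 in", contexts)
    print("contexts written:", written)
```

With three consonants and two shared contexts, six context lines are written instead of two, which is the duplication the text accepts in exchange for conflict-free, faster compilation.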
This actually opens the way for a migration to an xfst rule file instead of the current twolc format, since the one thing xfst really cannot do is generalize over sets (Cx:Cy etc.). This is an issue for future revisions to decide.
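The trade-off described above can be made concrete with a minimal Python sketch. The context strings and the helper name `expand_per_consonant` are hypothetical placeholders and are not taken from phonology.twolc; the sketch only illustrates how one set-based statement turns into one rule per consonant, each repeating the full context list.

```python
# Hypothetical illustration: the context strings are placeholders,
# not contexts from phonology.twolc.
SHARED_CONTEXTS = ["context A", "context B", "context C"]


def expand_per_consonant(consonants, contexts):
    """Return one deletion rule per consonant, each with its own copy of the contexts."""
    return {f"{c}:0": list(contexts) for c in consonants}


rules = expand_per_consonant(["p", "t", "k"], SHARED_CONTEXTS)

# Each context is now written once per consonant instead of once overall.
total_context_lines = sum(len(ctxs) for ctxs in rules.values())
print(sorted(rules))         # ['k:0', 'p:0', 't:0']
print(total_context_lines)   # 9
```

With three consonants and three shared contexts, nine context lines have to be maintained instead of three, which is the repetition cost that is traded for conflict-free compilation.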
The rules are divided in two subsections, deletion rules and change (alternation) rules.
The b, d, g deletion rules are similar: via the optional ( b ) etc. in front of the “_” symbol, both the bm:m and the bbm:bm alternations are covered. The contexts differ to a certain extent. For b and d, the III-I special gradation bbm:m is covered by two separate rules and a special dummy (X6) that is not part of the ordinary WeG set.
Note that one of the rules for t:0 refers to #: as part of its context. As soon as clitics are added to the word form, this rule will thus not be triggered. Look into this when the clitics are added.
Consonant gradation b:0 deletes b in S7 and S9 contexts
Consonant gradation d:0 … etc.
Consonant gradation g:0
Consonant gradation k:0
Consonant gradation l:0
Consonant gradation m:0
Consonant gradation n:0
Consonant gradation p:0
Consonant gradation s:0
ru0sjpe0
Consonant gradation ŋ:0
Consonant gradation f:0
Consonant gradation r:0
Consonant gradation v:0
Consonant gradation j:0
Consonant gradation t:0
Gradation Series 4, II-I, tj and ts
The Cx:Cy format was kept for hk:g, hp:b, ht:d, since the left context h:0 was unique, and no compilation conflict thus arose.
The bb:pp, gg:kk, dd:tt alternations were split into three rules, since keeping them in one Cx:Cy rule created compilation conflicts. Also, d:t contains a rule not found for the other two…
Gradation Series 4, II-I
bb:pp
gg:kk
vákke0
g:k change for clitic -ge
dd:tt and dtj, dts
Gradation Series 7, III-II, ks(t), kt, ktj, kts
Exceptional II-III inverse gradation in present participles
This gradation is only for II-I syllable verbs that get III as present participles.
Candidates:
ddj - dj - dj
hpp - hp - b
Strategy: Do insertion rule for the initial element.
Consonant insertion as II-III strengthening gradation with bm, gŋ
Consonant insertion as II-III strengthening gradation with dn/j + as I-III strengthening gradation with d
Consonant insertion as II-III strengthening gradation with hk, hp,
Consonant insertion as II-III strengthening gradation with htt(j/s)
Debugging of twol-rules
All rule conflicts have been successfully resolved. The rule file should be kept that way. Look out for conflicts in the compilation process, and resolve them as they appear!
This (part of) documentation was generated from src/fst/morphology/phonology.twolc
= telephone number (beta testing)
All Err-tags must have a normative form as lemma except Err/Lex
hfst-pmatch
The tags are of the following form:
These govern compound behaviour for normative tools like the speller, i.e. what a compound SHOULD BE.
The first part of the component may be ..
This part of the component can ..
The second part of the compound may require that the previous (left part) is (and thus overrides the regular CmpN tags):
But these tags can again be overridden by the first word in a compound, if this part of the compound is tagged with a def tag:
Tags for compound analysis - this is what a compound actually is. Some of these tags are also used in combination with the above normative tags to actually enforce compound restrictions in the fst.
These tags should always be located just before the POS tag.
Not sure which section this goes in: (before POS)
The following tags are used to describe the dynamic derivational system in Lule Sámi as encoded in this lexical description. The tags are classified according to a positional system, where each tag can be in one and only one position, and can only combine with tags from an earlier / lower position. This is done to avoid possible overgeneration in the derivational system.
+Der/gusj Prop -I
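The positional constraint on derivation tags can be sketched as a simple check. This is an illustration only: the tag names and position numbers below are hypothetical placeholders, not the actual derivation tags of the Lule Sámi lexicon, and the sketch reads "combine only with tags from an earlier position" as requiring strictly increasing positions along a derivation chain.

```python
# Hypothetical tag-to-position table (placeholder names, not real smj tags).
POSITION = {
    "+DerA": 1,
    "+DerB": 2,
    "+DerC": 3,
}


def is_valid_chain(tags):
    """Accept a derivation chain only if the positions strictly increase."""
    positions = [POSITION[t] for t in tags]
    return all(a < b for a, b in zip(positions, positions[1:]))


print(is_valid_chain(["+DerA", "+DerC"]))   # True
print(is_valid_chain(["+DerC", "+DerA"]))   # False
print(is_valid_chain(["+DerB", "+DerB"]))   # False
```

Because every tag has exactly one position, a chain can never revisit a position, which is what keeps the derivational system from overgenerating.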
There are no such tags in SMJ, but for symmetry and code coherence with SME the class is still kept.
The following tags are used to guide conversion to IPA: loan words and foreign names are usually pronounced (approximately) as in the originating (majority) language. Instead of trying to identify the correct pronunciation based on phonotactics (orthotactics actually), we tag all words that can’t be correctly transcribed using the SME transcriber with source language codes. Once tagged, it is possible to split the lexical transducer into smaller ones according to language, and apply a different IPA conversion to each of them. The principle of tagging is that we only tag to the extent needed, and following a priority:
Tags from SME, coming to smj by propernouns.
We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics, so that compounds with verbs are only allowed if the verb is further derived into a noun again:
Flag diacritic | Explanation |
---|---|
@P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
@D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
@C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised |
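As a rough illustration of how the three NeedNoun flags interact, here is a minimal Python sketch of the standard P (set), D (disallow) and C (clear) flag diacritic operations. The function name `path_allowed` is hypothetical, and the placement of the flags in the example paths (set on a verb stem, cleared by a nominalising derivation, tested at the end of the word) is an assumption made for illustration, not a description of the actual lexicon.

```python
import re

# Matches @P.Feature.Value@, @D.Feature.Value@, @D.Feature@ and @C.Feature@.
FLAG = re.compile(r"@([PDC])\.([^.@]+)(?:\.([^.@]+))?@")


def path_allowed(flags):
    """Run P (set), D (disallow) and C (clear) flags left to right.

    Returns False as soon as a D flag matches the current feature value.
    """
    state = {}
    for flag in flags:
        match = FLAG.fullmatch(flag)
        if match is None:
            raise ValueError(f"unsupported flag: {flag}")
        op, feature, value = match.groups()
        if op == "P":
            state[feature] = value
        elif op == "C":
            state.pop(feature, None)
        elif op == "D":
            current = state.get(feature)
            # @D.F.V@ blocks when F == V; @D.F@ blocks when F has any value.
            if current is not None and (value is None or current == value):
                return False
    return True


# Assumed path 1: the flag is set and never cleared, so the D test blocks it.
print(path_allowed(["@P.NeedNoun.ON@", "@D.NeedNoun.ON@"]))                  # False
# Assumed path 2: the flag is cleared before the D test, so the path survives.
print(path_allowed(["@P.NeedNoun.ON@", "@C.NeedNoun@", "@D.NeedNoun.ON@"]))  # True
```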
Flag diacritic | Explanation |
---|---|
@P.Pmatch.Loc@ | Used on multi-token analyses; tells hfst-tokenise/pmatch where in the form/analysis the token should be split. Used e.g. in bijladagi to split bijla from dagi, or before the full stop in abbreviations that end in one, to allow an alternate +CLB analysis of the full stop when the abbreviation is sentence-final. NB! This gives a faulty lemma for the abbreviation, as it will not include the full stop. This can lead to other issues, but presently we have no other solution if we want to keep the full stop as a separate token. We could also leave a full stop at the end of the abbreviation lemma (but not on the input side, since there is only one full stop in the input). That must be tested; it could work, but it would require special attention when generating suggestions in e.g. grammar checkers, which should not generate two full stops. |
@P.Pmatch.Backtrack@ | Used on single-token analyses; tell hfst-tokenise/pmatch to backtrack by reanalysing the substrings before and after this point in the form (to find combinations of shorter analyses that would otherwise be missed) |
Flag diacritic | Explanation |
---|---|
@D.ErrOrth.ON@ | To be written |
@R.ErrOrth.ON@ | To be written |
@C.ErrOrth@ | To be written |
@P.ErrOrth.ON@ | To be written |
For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.
Flag diacritic | Explanation |
---|---|
@P.CmpFrst.FALSE@ | Require that words tagged as such only appear first |
@D.CmpPref.TRUE@ | Block such words from entering ENDLEX |
@P.CmpPref.FALSE@ | Block these words from making further compounds |
@D.CmpLast.TRUE@ | Block such words from entering R |
@D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding |
@U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding |
@U.CmpNone.TRUE@ | Combines with the two previous ones to block compounding |
@P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R |
@D.CmpOnly.FALSE@ | Disallow words coming directly from root. |
@U.CmpHyph.FALSE@ | Flag to control hyphenated compounds like proper nouns |
@U.CmpHyph.TRUE@ | Flag to control hyphenated compounds like proper nouns |
@C.CmpHyph@ | Flag to control hyphenated compounds like proper nouns |
Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual down-casing given the proper use of these flags.
Flag diacritic | Explanation |
---|---|
@U.Cap.Obl@ | Disallow downcasing of names when not derived: Deatnu |
@U.Cap.Opt@ | Allowing downcasing of derived names: deatnulasj. |
@P.Px.add@ | Giving possibility for Px-suffixes (all except from Nom 3.p) |
@R.Px.add@ | Requiring P.Px.add-flag for Px-suffixes (all except from Nom 3.p) |
@P.Nom3Px.add@ | Giving possibility for Px-suffixes Nom 3.p |
@R.Nom3Px.add@ | Requiring P.Nom3Px.add flag for Px-suffixes Nom 3.p |
@C.SpellRlx@ | Flag used to tag spell-relax-analysed strings (and only those). |
Flag diacritic | Explanation |
---|---|
@U.number.one@ | Flag used to give Arabic numerals in smj different cases |
@U.number.two@ | Flag used to give Arabic numerals in smj different cases |
@U.number.three@ | Flag used to give Arabic numerals in smj different cases |
@U.number.four@ | Flag used to give Arabic numerals in smj different cases |
@U.number.five@ | Flag used to give Arabic numerals in smj different cases |
@U.number.six@ | Flag used to give Arabic numerals in smj different cases |
@U.number.seven@ | Flag used to give Arabic numerals in smj different cases |
@U.number.eight@ | Flag used to give Arabic numerals in smj different cases |
@U.number.nine@ | Flag used to give Arabic numerals in smj different cases |
@U.number.zero@ | Flag used to give Arabic numerals in smj different cases |
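The `@U.number.…@` flags are unification flags: a path survives only if every U flag for the same feature carries the same value. A minimal Python sketch of that behaviour follows. The function name `unify_flags` is hypothetical, and the idea that a numeral and its case ending each carry one of these flags is an assumption made for illustration; the actual lexicon may distribute them differently.

```python
def unify_flags(flags):
    """Apply @U.feature.value@ flags in order; fail on a conflicting value."""
    state = {}
    for flag in flags:
        op, feature, value = flag.strip("@").split(".")
        if op != "U":
            raise ValueError(f"only U flags are handled here: {flag}")
        # The first U flag sets the value; later ones must agree with it.
        if state.setdefault(feature, value) != value:
            return False
    return True


print(unify_flags(["@U.number.one@", "@U.number.one@"]))   # True
print(unify_flags(["@U.number.one@", "@U.number.two@"]))   # False
```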
Root
The beginning of everything. Every FST defined in LexC must start with the reserved lexicon name Root.
LEXICON Acronym
LEXICON ProperNoun
And this is the ENDLEX of everything:
@D.CmpOnly.FALSE@@D.CmpPref.TRUE@@D.NeedNoun.ON@ ENDLEX2 ;
The @D.CmpOnly.FALSE@ flag diacritic is used to disallow words tagged with +CmpNP/Only to end here.
The @D.NeedNoun.ON@ flag diacritic is used to block illegal compounds.
ENDLEX2
ENDLEX3
ENDLEX4
This (part of) documentation was generated from src/fst/morphology/root.lexc
vájnno vájnno vájnno
This (part of) documentation was generated from src/fst/morphology/stems/adjectives.lexc
sme mojonjálmmiid
This (part of) documentation was generated from src/fst/morphology/stems/adverbs.lexc
LOAN LOAN LOAN LOAN SWE altar
This (part of) documentation was generated from src/fst/morphology/stems/nouns.lexc
XXXtuvsánat
XXXtuvsánat
This (part of) documentation was generated from src/fst/morphology/stems/numerals.lexc
Reciprocal pronouns as multiword expression
This (part of) documentation was generated from src/fst/morphology/stems/pronouns.lexc
Splitting in 4 + 1 groups, because of the preprocessor
**LEXICON ITRAB ** are intransitive abbreviations, A.S. etc.
**LEXICON NOAB ** du, gen, jur
This class contains homonyms, which are both intransitive abbreviations and normal words. The abbreviation usage is less common, and thus only the occurrences in the middle of the sentence (when the next word starts with a lower-case letter) can be considered true cases.
For abbreviations that take numerals as complements, but not necessarily other words. This group is treated as transitive before Arabic numerals and as intransitive before letters.
This lexicon is for abbrs that always have a constituent following them.
This class contains homonyms, which are both abbrs for which numerals are complements and normal words. The abbreviation usage is less common, and thus only the occurrences in the middle of the sentence can be considered true cases.
This (part of) documentation was generated from src/fst/morphology/stems/smj-abbreviations.lexc
Converts ACROS to IPA. Intended for use with TTS.
`>` marks the underlying morpheme boundary between lemma and inflectional suffix; `:` is the same, but in the surface orthography. The idea is that the pronunciation of the last letter sound (like `e:` when reading the letter `P`) can be different when followed by a case ending compared to when not. If that is not true, the system can be simplified.
Default, letter by letter pronunciation
This (part of) documentation was generated from src/fst/phonetics/acro2ipa.xfscript
Note: ` = ASCII 096.

Description | X-SAMPA | IPA | Unicode (hex, decimal) |
---|---|---|---|
retroflex plosive, voiceless | t` | ʈ | 0288, 648 |
retroflex plosive, voiced | d` | ɖ | 0256, 598 |
labiodental nasal | F | ɱ | 0271, 625 |
retroflex nasal | n` | ɳ | 0273, 627 |
palatal nasal | J | ɲ | 0272, 626 |
velar nasal | N | ŋ | 014B, 331 |
uvular nasal | N\ | ɴ | 0274, 628 |
bilabial trill | B\ | ʙ | 0299, 665 |
uvular trill | R\ | ʀ | 0280, 640 |
alveolar tap | 4 | ɾ | 027E, 638 |
retroflex flap | r` | ɽ | 027D, 637 |
bilabial fricative, voiceless | p\ | ɸ | 0278, 632 |
bilabial fricative, voiced | B | β | 03B2, 946 |
dental fricative, voiceless | T | θ | 03B8, 952 |
dental fricative, voiced | D | ð | 00F0, 240 |
postalveolar fricative, voiceless | S | ʃ | 0283, 643 |
postalveolar fricative, voiced | Z | ʒ | 0292, 658 |
retroflex fricative, voiceless | s` | ʂ | 0282, 642 |
retroflex fricative, voiced | z` | ʐ | 0290, 656 |
palatal fricative, voiceless | C | ç | 00E7, 231 |
palatal fricative, voiced | j\ | ʝ | 029D, 669 |
velar fricative, voiced | G | ɣ | 0263, 611 |
uvular fricative, voiceless | X | χ | 03C7, 967 |
uvular fricative, voiced | R | ʁ | 0281, 641 |
pharyngeal fricative, voiceless | X\ | ħ | 0127, 295 |
pharyngeal fricative, voiced | ?\ | ʕ | 0295, 661 |
glottal fricative, voiced | h\ | ɦ | 0266, 614 |
alveolar lateral fricative, voiceless K
alveolar lateral fricative, voiced K\
labiodental approximant P (or v\)
alveolar approximant r\
retroflex approximant r\`
velar approximant M\
retroflex lateral approximant l`
palatal lateral approximant L
velar lateral approximant L\
Clicks
bilabial O\ (O = capital letter)
dental |\
(post)alveolar !\
palatoalveolar =\
alveolar lateral |\|\
Ejectives, implosives
ejective _> e.g. ejective p p_>
implosive _< e.g. implosive b b_<
Vowels
close back unrounded M
close central unrounded 1
close central rounded }
lax i I
lax y Y
lax u U
close-mid front rounded 2
close-mid central unrounded @\
close-mid central rounded 8
close-mid back unrounded 7
schwa @
open-mid front unrounded E
open-mid front rounded 9
open-mid central unrounded 3
open-mid central rounded 3\
open-mid back unrounded V
open-mid back rounded O
ash (ae digraph) {
open schwa (turned a) 6
open front rounded &
open back unrounded A
open back rounded Q
Other symbols
voiceless labial-velar fricative W
voiced labial-palatal approx. H
voiceless epiglottal fricative H\
voiced epiglottal fricative <\
epiglottal plosive >\
alveolo-palatal fricative, voiceless s\
alveolo-palatal fricative, voiced z\
alveolar lateral flap l\
simultaneous S and x x\
tie bar _
Suprasegmentals
primary stress "
secondary stress %
long :
half-long :\
extra-short _X
linking mark -
Tones and word accents
level extra high _T
level high _H
level mid _M
level low _L
level extra low _B
downstep !
upstep ^ (caret, circumflex)
contour, rising _R
contour, falling _F
contour, high rising _H_T
contour, low rising _B_L
contour, rising-falling _R_F
(NB Instead of being written as diacritics with _, all prosodic marks can alternatively be placed in a separate tier, set off by < >, as recommended for the next two symbols.)
global rise <R>
global fall <F>
voiceless _0 (0 = figure), e.g. n_0
voiced _v
aspirated _h
more rounded _O (O = letter)
less rounded _c
advanced _+
retracted _-
centralized _"
syllabic = (or _=) e.g. n= (or n_=)
non-syllabic _^
rhoticity `
breathy voiced _t
creaky voiced _k
linguolabial _N
labialized _w
palatalized ' (or _j) e.g. t' (or t_j)
velarized _G
pharyngealized _?\
dental _d
apical _a
laminal _m
nasalized ~ (or _~) e.g. A~ (or A_~)
nasal release _n
lateral release _l
no audible release _}
velarized or pharyngealized _e
velarized l, alternatively 5
raised _r
lowered _o
advanced tongue root _A
retracted tongue root _q
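The chart can be used directly as a lookup table. The following Python sketch is an illustration only and is not the conversion implemented in the xfscript source: it copies a handful of rows from the chart (using X-SAMPA's backtick convention for retroflex symbols), checks that the hexadecimal and decimal code points agree, and converts a string by greedy longest match. The function name `xsampa_to_ipa` and the sample inputs are hypothetical.

```python
# (X-SAMPA, IPA, hex code point, decimal code point), copied from the chart.
CHART = [
    ("t`", "ʈ", "0288", 648),
    ("d`", "ɖ", "0256", 598),
    ("N", "ŋ", "014B", 331),
    ("S", "ʃ", "0283", 643),
    ("s`", "ʂ", "0282", 642),
    ("T", "θ", "03B8", 952),
]

# Sanity check: the hex and decimal columns must name the same character.
for xsampa, ipa, hex_code, dec_code in CHART:
    assert int(hex_code, 16) == dec_code == ord(ipa), xsampa

TO_IPA = {xsampa: ipa for xsampa, ipa, _, _ in CHART}


def xsampa_to_ipa(text):
    """Greedy longest-match conversion; unknown characters pass through."""
    longest = max(len(key) for key in TO_IPA)
    out = []
    i = 0
    while i < len(text):
        for size in range(longest, 0, -1):
            chunk = text[i:i + size]
            if chunk in TO_IPA:
                out.append(TO_IPA[chunk])
                i += size
                break
        else:
            # No symbol starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(xsampa_to_ipa("aNSa"))   # aŋʃa
print(xsampa_to_ipa("at`a"))   # aʈa
```

Longest match matters because several X-SAMPA symbols are prefixes of others (for example `s` and `` s` ``), so the two-character symbols must be tried first.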
This (part of) documentation was generated from src/fst/phonetics/smj2sampa-from-old-infra.xfscript
Converts to IPA. Mainly intended for use with TTS.
This (part of) documentation was generated from src/fst/phonetics/txt2ipa.xfscript
At some points we will need the genitives, for approximate numbers. Here they are.
avta guovte gålmå nielje vidá gudá gietja gávtse avtse låge lågenanavta lågenanguovte
This (part of) documentation was generated from src/fst/transcriptions/clock-from-old-infra.lexc
We describe here how abbreviations in Lule Sami are read out, e.g. for text-to-speech systems.
This class contains homonyms, which are both intransitive abbreviations and normal words. The abbreviation usage is less common, and thus only the occurrences in the middle of the sentence (when the next word starts with a lower-case letter) can be considered true cases.
For abbreviations that take numerals as complements, but not necessarily other words. This group is treated as transitive before Arabic numerals and as intransitive before letters.
This lexicon is for abbrs that always have a constituent following them.
This class contains homonyms, which are both abbrs for which numerals are complements and normal words. The abbreviation usage is less common, and thus only the occurrences in the middle of the sentence can be considered true cases.
This (part of) documentation was generated from src/fst/transcriptions/transcriptor-abbrevs2text.lexc
We describe here how abbreviations in Lule Sami are read out, e.g. for text-to-speech systems.
This (part of) documentation was generated from src/fst/transcriptions/transcriptor-acro2text.lexc
This is still a dummy file.
This (part of) documentation was generated from src/fst/transcriptions/transcriptor-date-digit2text.lexc
This (part of) documentation was generated from src/fst/transcriptions/transcriptor-numbers-digit2text.lexc
We describe here how abbreviations in Lule Sami are read out, e.g. for text-to-speech systems.
Miscellaneous symbols
Smileys
Emojis
Clause boundary symbols
Single punctuation marks
Paired punctuation marks
This (part of) documentation was generated from src/fst/transcriptions/transcriptor-symbols2text.lexc
LULE SAAMI GRAMMAR CHECKER
This section lists all the tags inherited from the fst, and used as tags in the syntactic analysis. The next section, Sets, contains sets defined on the basis of the tags listed here; those set names are not visible in the output.
BOS EOS
N A Adv V Pron CS CC CC-CS Po Pr Pcle Num Interj ABBR ACR CLB
LEFT RIGHT WEB
PPUNCT PUNCT
COMMA
Pers Dem Interr Indef Recipr Refl Rel Coll NomAg Prop Allegro Arab Romertall
Nom Abe Acc Gen Ine Ela Ill Loc Com Ess Ess Sg Du Pl Cmp/SplitR Cmp/SgNom Cmp/SgGen Cmp/SgGen PxSg1 PxSg2 PxSg3 PxDu1 PxDu2 PxDu3 PxPl1 PxPl2 PxPl3 Px
Comp, both for adverbs and adjectives Superl, both for adverbs and adjectives Attr Ord Qst IV TV Prt Prs Ind Pot Cond Imprt ImprtII Sg1 Sg2 Sg3 Du1 Du2 Du3 Pl1 Pl2 Pl3
Inf ConNeg Neg PrfPrc VGen PrsPrc Ger Sup Actio VAbess
Err/Orth
PROP-ATTR PROP-SUR
TIME-N-SET
@+FAUXV @+FMAINV @-FAUXV @-FMAINV @-FSUBJ> @-F<OBJ @-FOBJ> @-FSPRED<OBJ @-F<ADVL @-FADVL> @-F<SPRED @-F<OPRED @-FSPRED> @-FOPRED> @>ADVL @ADVL< @<ADVL @ADVL> @ADVL @HAB> @<HAB @>N @Interj @N< @>A @P< @>P @HNOUN @INTERJ @>Num @Pron< @>Pron @Num< @OBJ @<OBJ @OBJ> @OPRED @<OPRED @OPRED> @PCLE @COMP-CS< @SPRED @<SPRED @SPRED> @SUBJ @<SUBJ @SUBJ> SUBJ SPRED OPRED @PPRED @APP @APP-N< @APP-Pron< @APP>Pron @APP-Num< @APP-ADVL< @VOC @CVP @CNP OBJ