Finite state and Constraint Grammar based analysers, proofing tools and other resources
LEXICON MUORRA Standard even stems with cg (note Q1). Note: nouns with invisible 3>2 cg (such as busºsa) go to this lexicon.
kártta+N+Sg+Nom
kártta+N+Sg+Com
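The test entries listed under each lexicon follow a lemma+tag+tag layout. A minimal sketch of splitting such a string, assuming the first `+`-separated field is always the lemma; the function name `split_analysis` is hypothetical and not part of the project:

```python
def split_analysis(analysis: str) -> tuple[str, list[str]]:
    """Split an analysis string such as 'kártta+N+Sg+Nom' into lemma and tags."""
    # Assumption: the lemma itself never contains a '+' character.
    lemma, *tags = analysis.split("+")
    return lemma, tags


lemma, tags = split_analysis("kártta+N+Sg+Com")
print(lemma, tags)
```

Derivation tags such as Der/Comp stay intact because only `+` is treated as a separator.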
LEXICON TÁLLA Same as MUORRA, but for words with º (extra length). Kept out of MUORRA because of other Err/Orth forms.
LEXICON ALMME Same as MUORRA, but with a special -LASJ derivation. For nouns that take the strong grade before -lasj: “almmelasj” instead of “almálasj”, which is Err/Orth-tagged.
LEXICON NOADE Even stems without cg. Note: no nouns with invisible 3>2 cg (such as busºsa) in this lexicon. Note: because denominal nouns take a weak-grade stem, entries in grade 3 are given the gradation mark º to prevent alternation to the weak grade. We should consider creating a separate denominal-noun lexicon for NOADE instead.
låda+N+Sg+Nom
låda+N+Sg+Ela
LEXICON KÁFFA For even-syllable words with cg III-I: káf’fa-káfav, jáf’fo-jáfo. No vowel changes yet; needs new twolc code.
káffa+N+Sg+Nom
káffa+N+Sg+Nom
káffa+N+Sg+Ela
káffa+N+Sg+Ela
LEXICON LINNJA Only for the loanword “linnja”. Because it is a loanword, “nnj” is pronounced “nn-j” and does not behave like the regular Lule Sami “nj” sound. It therefore does not follow the rule that gives a:á in grade 1 with a short vowel in the first syllable (it does not behave like linnja-linjáv or birás-birrasav). This word is therefore sub-tagged. Norwegian/Swedish words with a short “i” followed by two different consonants are assimilated into Lule Sami in different ways depending on the consonants in question, but the word is always in grade III (Morén-Duolljá 2014). Both the Err/Orth form and the correct form are part of this lexicon.
LEXICON BOAKSA Only for the word “boaksa”. Both boaksa-båvsa and the Err/Orth form boaksa-båksa are part of the lexicon.
LEXICON SÁMEGIEL Compounds on -giella, with short -giel as middle compound (sámegielåhpadiddje)
rievsakgiella+N+Sg+Nom
LEXICON AHKA Words like tjerastahka, with short compound form
báládahka+N+Sg+Nom
báládahka+N+Sg+Nom
báládahka+N+Sg+Acc
LEXICON DARRHA Only for “darrha” or compounds that end on “darrha”.
báktedarrha+N+Sg+Nom
báktedarrha+N+Sg+Acc
LEXICON GÁDDE 2 syllable stems with cg (note Q1) with comparatives
boassjo+N+Sg+Nom
boassjo+N+Sg+Com
boassjo+N+Sg+Com
boassjo+N+Der/Comp+A+Sg+Nom
LEXICON SJIEVNNJET Like GAHPER but with comparatives. Odd-syllable C-final noun without cg, no vowchange, no short Ess.
sjievnnjet+N+Sg+Nom
sjievnnjet+N+Sg+Ela
sjievnnje+N+Der/Comp+A+Sg+Nom
sjievnnje+N+Der/Superl+A+Sg+Nom
LEXICON ÅLGGO Like MUORRA, but with comparatives. This lexicon previously lacked Sg Ill/Ine/Ela, but these nouns can be inflected for the regular locative cases. However, “adverbs” like ålggot (from outside), nuorttan (at north), oarjas (to south), etc., are more commonly used to denote location/direction (we should therefore perhaps consider sub-tagging the regular locative case forms).
lulle+N+Sg+Nom
lulle+N+Sg+Acc
lulle+N+Der/Comp+A+Sg+Nom
LEXICON MIEHTE Like MUORRA, but without locative/elative/illative singular. Presently there are no words in this lexicon except for the err/sub-tagged nuortto.
nuortto+N+Sg+Nom
nuortto+N+Sg+Acc
LEXICON BÅVSÅ Like MUORRA, only in plural. All, except ganta, juvdá and ávta, have regular, singular stem counterparts.
båvså+N+Pl+Nom
båvså+N+Pl+Acc
LEXICON LÅHTSASA Like GAHPER, only in plural. Without derivations; these should perhaps be added.
LEXICON MUORRA_LOAN For loanwords that do not fit in a loanword lexicon because of a wrong short compound form, for partially assimilated loanwords without separate lexicons (medállja), or for Err/Orth forms assimilated with cg but with other errors. This lexicon gives no short compound forms. Potential short compound forms must therefore be hard-coded in the FirstComponent lexicon. The same applies to compounds containing partially assimilated loanwords. Examples of problem words: sirup>siráhppa and stetoskop>stetoskoahppa.
LEXICON MUORRA_LOAN_NO_LASJ Like MUORRA_LOAN, but without the -lasj derivation. This lexicon is made for Sem/Hum words like økonåvmmå, biolåvggå, agronåvmmå and so on. We do not want agronåvmålasj, since it means something other than “agronomisk”; agronåvmålasj is barely used in its own meaning but gets confused with “agronomijjalasj”.
LEXICON MUORRA_LOAN_EXTRA_LENGTH Same as MUORRA_LOAN just for words with º (extra length).
LEXICON KAFIEDJA_CMP_INFL Recent loanwords on -edja, ending in -é in Norwegian. Short and long compound-form. “Kafea” and “kaféa” are sub-tagged. See the comments on the -ie/-e dialect tags in ALFABIEHTTA.
LEXICON ALLEGORIJJA_CMP_INFL Recent loanwords ending in -i in NOR/SWE, with long and short compound-form. Standardized as -iddja (SWE) and -ijºja (NOR). Previously often assimilated as -ija (or just -ia), but both forms are ungrammatical: short vowels cannot both precede and follow a single intervocalic consonant. -ija is thus ungrammatical, as the short a would be lengthened to á, like “idja-ijá”.
akademijja+N+Sg+Nom
akademijja+N+Sg+Nom
akademijja+N+Sg+Ela
akademijja+N+Sg+Ela
LEXICON TEKSTIJLLA_CMP_INFL Recent loanwords on -ijlla with long and short compound-form. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
tekstijlla+N+Sg+Nom
tekstijlla+N+Sg+Ela
LEXICON ASIJLLA_CMP_INFL Recent loanwords on -ijlla, from NOR and SWE words ending in -yl, with long and short compound-form. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
asijlla+N+Sg+Nom
asijlla+N+Sg+Ela
LEXICON BENSIJNNA Recent loanwords on -ijnna with long and short compound-form
LEXICON BENSIJNNA_CMP_INFL Recent loanwords on -ijnna with long and short compound-form. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
tamburijnna+N+Sg+Nom
tamburijnna+N+Sg+Ela
LEXICON MASJIJNNA_CMP_INFL Recent loanwords on -sjijnna with long and short compound-form: -SKIN
bivtasmasjijnna+N+Sg+Nom
bivtasmasjijnna+N+Sg+Ela
LEXICON ADJEKTIJVVA_CMP_INFL Recent loanwords on -ijvva with long and short compound-form
datijvva+N+Sg+Nom
datijvva+N+Sg+Ela
LEXICON PARADIJSSA_CMP_INFL Recent loanwords on -ijssa with long and short compound-form. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
servijssa+N+Sg+Nom
servijssa+N+Sg+Ela
LEXICON TELEFÅVNNÅ_CMP_INFL Recent loanwords on -åvnnå with long and short compound-form. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
persåvnnå+N+Sg+Nom
persåvnnå+N+Sg+Ela
LEXICON INSTITUSJÅVNNÅ_CMP_INFL Recent loanwords on -sjåvnnå with long and short compound-form: -tion in Swedish. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
populasjåvnnå+N+Sg+Nom
populasjåvnnå+N+Sg+Ela
LEXICON MISJÅVNNÅ_CMP_INFL Recent loanwords on -sjåvnnå with long and short compound-form: -ssion in Swedish. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
sesjåvnnå+N+Sg+Nom
sesjåvnnå+N+Sg+Ela
LEXICON PENSJÅVNNÅ_CMP_INFL Recent loanwords on -sjåvnnå with long and short compound-form: -sion in Swedish. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
suspensjåvnnå+N+Sg+Nom
suspensjåvnnå+N+Sg+Ela
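The three -sjåvnnå lexicons above are distinguished only by the Swedish source ending (-tion, -ssion, -sion). A rough sketch of that selection, assuming the Swedish source word is known; the table and the function `lexicon_for_swedish` are hypothetical helpers for illustration, not part of the lexc source:

```python
from typing import Optional

# Hypothetical lookup: Swedish source ending -> lexicon named in the text above.
# '-ssion' is listed before '-sion' so that the longer ending is tested first.
SJAVNNA_LEXICONS = [
    ("ssion", "MISJÅVNNÅ_CMP_INFL"),
    ("tion", "INSTITUSJÅVNNÅ_CMP_INFL"),
    ("sion", "PENSJÅVNNÅ_CMP_INFL"),
]


def lexicon_for_swedish(word: str) -> Optional[str]:
    """Return the lexicon matching the Swedish ending, or None if none matches."""
    for ending, lexicon in SJAVNNA_LEXICONS:
        if word.endswith(ending):
            return lexicon
    return None


for swedish in ("population", "session", "suspension"):
    print(swedish, lexicon_for_swedish(swedish))
```

The Swedish words used here are illustrative counterparts of the test lemmas populasjåvnnå, sesjåvnnå and suspensjåvnnå.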
LEXICON PARTISIHPPA_CMP_INFL Recent loanwords from swe -cip and nor -sipp, becoming -sihppa in Norway, both -sijppa and -sihppa are used in Sweden (Particip vs partisipp). Short and long compound-form.
partisihppa+N+Sg+Nom
partisihppa+N+Sg+Ela
partisihppa+N+Sg+Nom
partisihppa+N+Sg+Ela
LEXICON ALKOHÅVLLÅ_CMP_INFL Recent loanwords on -åvllå with long and short compound-form. The old standardization form “alkohola” is sub-tagged. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
parabåvllå+N+Sg+Nom
parabåvllå+N+Sg+Ela
LEXICON AGRONÅVMMÅ_CMP_INFL Recent loanwords on -åvmmå with long and short compound-form. The -lasj derivation is error-tagged. The old standardization form -oma, which does not follow Lule Sami rules, is sub-tagged.
agronåvmmå+N+Sg+Nom
agronåvmmå+N+Sg+Ela
LEXICON DEMAGÅVGGÅ_CMP_INFL Recent loanwords ending in -og with long and short compound-form. Assimilated to smj as -åvggå. The -lasj derivation is error-tagged. The old standardization form -oga, which does not follow Lule Sami rules, is sub-tagged.
pedagåvggå+N+Sg+Nom
pedagåvggå+N+Sg+Nom
pedagåvggå+N+Sg+Ela
LEXICON LAKTÅVSSÅ_CMP_INFL Recent loanwords ending in -ose in Norwegian and -os in Swedish, with long and short compound-form. Assimilated to smj as -åvsså. The old standardization form -oga, which does not follow Lule Sami rules, is sub-tagged.
laktåvsså+N+Sg+Nom
laktåvsså+N+Sg+Ela
LEXICON FAKTÅVRRÅ_CMP_INFL Recent loanwords on -åvrrå with long and short compound-form.
LEXICON MIKROSKÅVPPÅ_CMP_INFL Recent loanwords on -åvppå (-op in NOB/SWE) with long and short compound-form. Long vowel and short consonant is assimilated with njuoban, but for some reason many -op words have been assimilated as -oahppa (biskop is pronounced with -opp, so that case is different; perhaps some have used “biskop” as a template), so this is Err/Orth-tagged.
oajvvekontåvrrå+N+Sg+Nom
oajvvekontåvrrå+N+Sg+Ela
LEXICON KULTUVRRA_CMP_INFL Recent loanwords on -vrra with long and short compound-form. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
muvrra+N+Sg+Nom
muvrra+N+Sg+Com
LEXICON TERAPÆVTTA_CMP_INFL Recent loanwords on -ævtta/-ievtta with long and short compound-form. No -lasj derivation. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
terapævtta+N+Sg+Nom
terapævtta+N+Sg+Nom
terapævtta+N+Sg+Nom
terapævtta+N+Sg+Com
terapævtta+N+Sg+Com
LEXICON ADVÆRBBA_CMP_INFL Recent loanwords on -ærbba with long and short compound-form
detransitijvvaværbba+N+Sg+Nom
detransitijvvaværbba+N+Sg+Nom
detransitijvvaværbba+N+Sg+Ela
LEXICON SUBSTÁNSSA_CMP_INFL Recent loanwords on -ánssa with long and short compound-form. Originally -ans in SWE and NOR. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
instánssa+N+Sg+Nom
instánssa+N+Sg+Ela
LEXICON VALÆNSSA_CMP_INFL Recent loanwords on -ænssa with long and short compound-form. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
intelligænssa+N+Sg+Nom
intelligænssa+N+Sg+Nom
intelligænssa+N+Sg+Acc
LEXICON ADVOKÁHTTA_CMP_INFL Recent loanwords on -áhtta with long and short compound-form. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
klimáhtta+N+Sg+Nom
klimáhtta+N+Sg+Ela
LEXICON ALFABIEHTTA_CMP_INFL Recent loanwords originally ending in -et in both Norway and Sweden. Differences in assimilation, however, create two Lule Sami forms: -iehtta in NOR and -æhtta in SWE. Long -e is assimilated differently in Norway and Sweden: in Norway it becomes -ie, and in Sweden -e, as in tiedja/tedja, systiebma/systebma and so on. This is especially apparent in assimilated words with long e in grade three: e becomes æ in grade three, so we get “universitæhtta” in SWE, but this looks very strange to people on the Norwegian side of the border, who want “universitiehtta”. Both -ie and -e are dialect-tagged in the lexicons HYDROGIEDNA, APOTIEHKKA, SYSTIEBMA, KAFÉ. Previously people often wrote -ehtta in Norway, but this is incorrect, as e always becomes æ in grade three.
mobilitiehtta+N+Sg+Nom
mobilitiehtta+N+Sg+Nom
mobilitiehtta+N+Sg+Acc
mobilitiehtta+N+Sg+Acc
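The area split described for ALFABIEHTTA can be sketched as a small lookup. This is only an illustration under stated assumptions: the area keys "NO"/"SE", the bare stem "universit" and both function names are hypothetical, and the real analyser handles this through dialect tags rather than string concatenation.

```python
# Endings as described above: -iehtta on the Norwegian side, -æhtta on the Swedish side.
ENDING_BY_AREA = {"NO": "iehtta", "SE": "æhtta"}


def assimilate_et(stem: str, area: str) -> str:
    """Attach the area-specific ending to a loanword stem (illustrative only)."""
    return stem + ENDING_BY_AREA[area]


def is_old_ehtta_spelling(word: str) -> bool:
    """Flag the older -ehtta spelling, which the text describes as incorrect
    because e becomes æ in grade three."""
    return word.endswith("ehtta") and not word.endswith("iehtta")


print(assimilate_et("universit", "SE"))
print(assimilate_et("universit", "NO"))
print(is_old_ehtta_spelling("universitehtta"))
```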
LEXICON INTERNÆHTTA_CMP_INFL Recent loanwords on -æhtta with long and short compound-form: -et in Swedish, -ett in Norwegian. Differs from ALFABIEHTTA because -ehtta isn’t used in NOR.
intranæhtta+N+Sg+Nom
intranæhtta+N+Sg+Nom
intranæhtta+N+Sg+Ela
LEXICON TABLÆHTTA_CMP_INFL Recent loanwords on -æhtta with long and short compound-form. -ett in both Norwegian and Swedish.
kvartæhtta+N+Sg+Nom
kvartæhtta+N+Sg+Nom
kvartæhtta+N+Sg+Ela
LEXICON INSTITUHTTA_CMP_INFL Recent loanwords on -uhtta, with long and short compound-form on -utt (NOR)/-ut (SWE). The Swedish -ut also gives -uvtta, as in ANTIHKKA-antijkka, but instituhtta is also used in Sweden, so there is no Area/NO tag.
minuhtta+N+Sg+Nom
minuhtta+N+Sg+Nom
minuhtta+N+Sg+Ela
minuhtta+N+Sg+Ela
LEXICON SATELIHTTA_CMP_INFL Recent loanwords on -ihtta, with long and short compound-form on -itt (NOR)/-it (SWE). The Swedish -it also gives -ijtta, as in ANTIHKKA-antijkka, but satelihtta is also used in Sweden, so there is no Area/NO tag.
inuihtta+N+Sg+Nom
inuihtta+N+Sg+Nom
inuihtta+N+Sg+Ela
inuihtta+N+Sg+Ela
LEXICON APOTIEHKKA_CMP_INFL Recent loanwords on -iehkka in NOR, -æhkka in SWE; -ehkka is sub-tagged. With long and short compound-form on -k. See the comments on the -ie/-e dialect tags in ALFABIEHTTA.
The old form “apotehkka” is wrong even though it appears in dictionaries: long e is not allowed in grade III.
kartotiehkka+N+Sg+Nom
kartotiehkka+N+Sg+Ela
kartotiehkka+N+Sg+Nom
kartotiehkka+N+Sg+Ela
LEXICON ANTIHKKA_CMP_INFL Recent loanwords on -hkka in Norway, both -ijkka and -hkka are used in Sweden (Antik vs antikk). With long and short compound-form on -kk/-k. The swedish forms were earlier added to stems for the Swedish version, but now added here.
dialektihkka+N+Sg+Nom
dialektihkka+N+Sg+Ela
dialektihkka+N+Sg+Nom
dialektihkka+N+Sg+Ela
LEXICON SEMINÁRRA_CMP_INFL Recent loanwords on -árra with long and short compound-form. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
hektárra+N+Sg+Nom
hektárra+N+Sg+Ela
LEXICON AREÁLLA_CMP_INFL Recent loanwords on -álla with long and short compound-form. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
gasskavokálla+N+Sg+Nom
gasskavokálla+N+Sg+Ela
LEXICON AMBASSADERRA_CMP_INFL Recent loanwords on -ør with long and short compound-form. Standardized by Giellagálldo on 05.05.14 as -erra; -ørra is sub-tagged.
observaterra+N+Sg+Nom
observaterra+N+Sg+Ela
LEXICON VETERINERRA_CMP_INFL Recent loanwords on -erra. Words ending in -ær in both SWE and NOR. Both long and short compound-form. The old standardization form -æra, without cg, is sub-tagged, as are -ær’ra and -ærra.
LEXICON ATMOSFERRA_CMP_INFL Recent loanwords on -erra, but with different endings in SE and NO: -ære, -ær in NOR and -är, -ära in SWE (ingefær in NO, ingefära in SE). Only long compound-form; the short form must be hard-coded in the FirstComponent lexicon. The old standardization forms -æra and -era, without cg, are sub-tagged, as are -ær’ra and -ærra.
atmosferra+N+Sg+Nom
atmosferra+N+Sg+Ela
LEXICON KARAKTIERRA_CMP_INFL Recent loanwords on -ierra in NOR and -erra in SWE, because long e is assimilated in different ways. Words ending in -er in NOR, and -er or -är in SWE. Only long compound-form; the short form must be hard-coded in the FirstComponent lexicon.
karaktierra+N+Sg+Nom
karaktierra+N+Sg+Ela
karaktierra+N+Sg+Nom
karaktierra+N+Sg+Ela
LEXICON TABÆLLA_CMP_INFL Recent loanwords on -älºla with long and short compound-form. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
flotælla+N+Sg+Nom
flotælla+N+Sg+Nom
flotælla+N+Sg+Ela
LEXICON TELEGRÁMMA_CMP_INFL Recent loanwords on -ámºma with long and short compound-form
grámma+N+Sg+Nom
grámma+N+Sg+Ela
LEXICON TOPOGRÁFFA_CMP_INFL Recent loanwords on -áfºfa with long and short compound-form; no -lasj derivation, since most of these words denote humans.
telegráffa+N+Sg+Nom
telegráffa+N+Sg+Ela
LEXICON SYSTIEBMA_CMP_INFL Recent loanwords on -ebma/-iebma with long and short compound-form. -em in NOR and SWE. See the comments on the -ie/-e dialect tags in ALFABIEHTTA. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
vokalsystiebma+N+Sg+Nom
vokalsystiebma+N+Sg+Nom
vokalsystiebma+N+Sg+Ela
vokalsystiebma+N+Sg+Ela
LEXICON ORGÁDNA_CMP_INFL Recent loanwords on -ádna with long and short compound-form
doarjjaorgádna+N+Sg+Nom
doarjjaorgádna+N+Sg+Nom
doarjjaorgádna+N+Sg+Acc
LEXICON KOLLÆKTA_CMP_INFL Recent loanwords on -ækta with long and short compound-form
subjækta+N+Sg+Nom
subjækta+N+Sg+Nom
subjækta+N+Sg+Ela
LEXICON HYDROGIEDNA_CMP_INFL Recent loanwords on -iedna in NOR and -edna in SWE. Both long and short compound-form. Norwegian/Swedish -en. The old standardization form -ena, without cg, is sub-tagged. See the comments on the -ie/-e dialect tags in ALFABIEHTTA.
LEXICON PATÆNNTA_CMP_INFL Recent loanwords on -ænnta with long and short compound-form. The -ennta form (used in “Ådå testamennta”) is sub-tagged (e always becomes æ in grade three).
patænnta+N+Sg+Nom
patænnta+N+Sg+Nom
patænnta+N+Sg+Ela
LEXICON VARIÁNNTA_CMP_INFL Recent loanwords on -ánnta with long and short compound-form. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
praktikánnta+N+Sg+Nom
praktikánnta+N+Sg+Ela
LEXICON FANATISSMA_CMP_INFL Recent loanwords on -ssma with long and short compound-form.
kabbalissma+N+Sg+Nom
kabbalissma+N+Sg+Ela
LEXICON TURISSTA_CMP_INFL Recent loanwords on -ssta with long and short compound-form. The -lasj derivation is error-tagged. Frequent misspellings that do not follow Lule Sami rules are sub-tagged; these forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
journalissta+N+Sg+Nom
journalissta+N+Sg+Ela
LEXICON PRIEMIJ_CMP_INFL Assimilated loanwords on -ie/-y, like premie and bandy. They become odd-syllable loanwords with cg, like “riebij”. Nom: premij, gen: prebmiha. Long and short essive.
priemij+N+Sg+Nom
priemij+N+Sg+Nom
priemij+N+Sg+Ela
priemij+N+Sg+Ela
priemij+N+Ess
priemij+N+Ess
priemij+N+Ess
priemij+N+Ess
LEXICON A_CMP_INFL Sub-forms. Lexicon giving the sub-variant inflection formed by simply adding -a to the Norwegian/Swedish word. No cg. Like “alkohola” and “agronoma”. These forms go against the standardization rules, but occur because of earlier standardization rules and dictionaries.
LEXICON ERR/ORTH_EVEN_WEAK_CASES Even-stem Err/Orth lexicon without nominative, illative and essive. Only for entries with the Err/Orth tag. Made so that we do not get entries that are both normative and error-tagged. Entries like “ålggo” have no grade alternation; a common error is to write them as if they had: ålggo>ålgov, tálla>tálav, klimáksa>klimáksav, prefiksa>prefiksav, barggo>barggov.
LEXICON ERR/ORTH_EVEN_WEAK_CASES2 Even-stem Err/Orth lexicon without nominative, illative and essive, AND ALSO without Sg+Gen, Sg+Ine, Pl+Nom, Pl+Com and Pl+Gen (to avoid homonymies).
LEXICON ERR/ORTH_EVEN_STRONG_CASES Even stem Err/orth lexicon with only nominative, illative and essive. Only for entries with ERR/ORTH tag. Made so that we don’t get entries that are both norm and with error tag. Hydrogena is used as nom and is err/orth, hydrogena>hydrogenav is not err/orth. marináda-nom, banána-nom
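The three ERR/ORTH_EVEN lexicons above partition the case forms between them. A minimal sketch of that partition, assuming that “nominative” and “illative” in the descriptions mean the singular forms (Pl+Nom is listed separately for WEAK_CASES2); the tag strings and the function `allowed` are illustrative, not taken from the lexc source:

```python
# Forms reserved for the strong-grade lexicon (assumption: singular Nom/Ill plus Ess).
STRONG = {"Sg+Nom", "Sg+Ill", "Ess"}
# Forms additionally left out of WEAK_CASES2, as listed in its description.
EXTRA_WEAK2 = {"Sg+Gen", "Sg+Ine", "Pl+Nom", "Pl+Com", "Pl+Gen"}


def allowed(lexicon: str, case_tags: str) -> bool:
    """Return True if the given lexicon would produce the given case form."""
    if lexicon == "ERR/ORTH_EVEN_STRONG_CASES":
        return case_tags in STRONG
    if lexicon == "ERR/ORTH_EVEN_WEAK_CASES":
        return case_tags not in STRONG
    if lexicon == "ERR/ORTH_EVEN_WEAK_CASES2":
        return case_tags not in (STRONG | EXTRA_WEAK2)
    raise ValueError(f"unknown lexicon: {lexicon}")


# ålggo>ålgov (Sg+Acc) is the kind of form the weak-case lexicon covers.
print(allowed("ERR/ORTH_EVEN_WEAK_CASES", "Sg+Acc"))
# hydrogena used as nominative belongs to the strong-case lexicon.
print(allowed("ERR/ORTH_EVEN_STRONG_CASES", "Sg+Nom"))
```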
LEXICON ERR/ORTH_ODD Err/Orth lexicon doing the opposite of what odd-syllable nouns do: strong grade in nominative and weak grade in all other forms.
dálkas+N+Err/Orth+Sg+Nom
dálkas+N+Err/Orth+Sg+Acc
dálkas+N+Err/Orth+Der/Dimin+N+Sg+Nom
LEXICON NOADE_BADASS 2-syllable stems without cg. Badly or wrongly assimilated words, i.e. assimilated in a way that isn’t Lule Sami. (Same as NOADE.) Most of the words are Err/Orth-tagged with a standardized lemma. Some are Err/Lex-tagged. 5.9.2019: EJP/SNM: removed +Use/-Spell. Even though we do not like the words, we want to make sure they are written correctly according to the smj orthography! Most of the words are marked with +Err/Orth anyway :)
balláda+N+Sg+Nom
balláda+N+Sg+Ela
LEXICON C_ILL_IJ_BADASS Badly or wrongly assimilated words. Last letter is a consonant, no cg, no vowchange, with illative -ij. (Same as GAHPER.) Assimilated in a way that isn’t Lule Sami. Most of the words are Err/Orth-tagged with a standardized lemma. Some are Err/Lex-tagged, and some only receive the +Use/-Spell tag from the lexicon.
sentimehter+N+Sg+Nom
sentimehter+N+Sg+Ela
sentimehter+N+Sg+Ill
LEXICON C_ILL_AJ_BADASS Badly or wrongly assimilated words. Last letter is a consonant, no cg, no vowchange, with illative -aj. These should have been assimilated as even-syllable words, but are used as odd-syllable words, mostly assimilated only by changing a letter to á. So almost the same as CELSIUS_UNASS.
kálsium+N+Sg+Nom
kálsium+N+Sg+Ela
kálsium+N+Sg+Ill
LEXICON KINO_UNASS_CMP_INFL V-final unassimilated loanwords. Not Lule Sami. No diacritics whatsoever. Words that aren’t assimilated at all; really just Norwegian words with a kind of Sami inflection. They get even-syllable case marking. They are part of the spell checker.
netto+N+Sg+Nom
netto+N+Sg+Ela
LEXICON C_ILL_IJ_UNASS C-final unassimilated loanwords, giving illative -ij. Not Lule Sami. No diacritics whatsoever. Really just foreign words with a kind of Sami inflection. Odd-syllable case marking (like GAHPER). They are part of the spell checker.
sirkus+N+Sg+Nom
sirkus+N+Sg+Ill
sirkus+N+Sg+Ela
LEXICON C_ILL_AJ_UNASS C-final unassimilated loanwords, giving illative -aj. Also odd-syllable words ending in the letter i, such as selleri. Not Lule Sami. No diacritics whatsoever. Really just Norwegian words with a kind of Sami inflection. Case marking like standard even 4-syllable stems (see the proper nouns file on the case marking of foreign words with a stressed last syllable). They are part of the spell checker.
aids+N+Sg+Nom
aids+N+Sg+Ill
aids+N+Sg+Ela
aids+N+Ess
aids+N+Abe
aids+N+Abe
aids+N+Der/Dimin+N+Sg+Nom
+Der4+Der/ahtes:e»g AHTES ; Only for odd-syllable stems
LEXICON GÅNÅGIS Standard C-final 4-syllabic stems
rahtjamus+N+Sg+Nom
rahtjamus+N+Sg+Ill
rahtjamus+N+Sg+Ela
LEXICON BERULASJ For words ending in -asj. Same as GÅNÅGIS, but with the strong essive and illative forms -adjtan and -adtjaj sub-tagged, as well as the Px form “-adjtam”. These forms are barely used today. -lahttja is also Err/Orth-tagged.
LEXICON BEDNAGASJ Like BERULASJ, but for derived nouns in the diminutive. No cg, no vowchange, no short Ess. Has only one dimin derivation since these words already are diminutives, i.e. no double diminutive as for GAHPER. No abessive (not entirely certain about this; the postposition dagi probably has to be used with diminutives).
bednagasj+N+Sg+Nom
bednagasj+N+Sg+Ela
LEXICON HÁVSAGUSJ Like BEDNAGASJ, but not diminutive. No cg, no vowchange, no short Ess. Has only one dimin derivation. No abessive (not entirely certain about this; the postposition dagi probably has to be used with diminutives).
LEXICON JIHPELIJ gen:jihpelahá
gehtsulij+N+Sg+Nom
gehtsulij+N+Sg+Acc
LEXICON OARJJILIJ gen:oarjjilihá
allilij+N+Sg+Nom
allilij+N+Sg+Ela
LEXICON VIESSOMUJ gen:viessumuhá
bårråmuj+N+Sg+Nom
bårråmuj+N+Sg+Ill
LEXICON OADÁDAGÁ Plural forms of words like tjerastahka with short compound-form
látjádagá+N+Pl+Nom
látjádagá+N+Pl+Ela
LEXICON BERRAHATTJA Plural stems. Like IEDNITJA, these do not have corresponding singular stems. Most stems here have the same form as the Pl Nom form of diminutive derivations, but (while it may have originated as a diminutive derivation) it is not the same derivation today and it does not have a singular form.
gahpanisá+N+Pl+Nom
gahpanisá+N+Pl+Ill
gahpanisá+N+Pl+Ela
LEXICON SIJDDALAHÁ Plurals
lullelahá+N+Pl+Nom
lullelahá+N+Pl+Acc
LEXICON SISSNELUHÁ Plurals. Presently only for sissŋeluhá.
sissŋeluhá+N+Pl+Nom
sissŋeluhá+N+Pl+Ill
LEXICON DAGI_SINGULAR Earlier we generated “bijladagi” and “bijlajdagi” as abessive. This has been fixed, but we need this lexicon to be able to analyse what we generated earlier. Only singular. Gives an error tag to “bijladagi” and gives the correct “bijla dagi”.
LEXICON DAGI_PLURAL Earlier we generated “bijladagi” and “bijlajdagi” as abessive. This has been fixed, but we need this lexicon to be able to analyse what we generated earlier. Only plural. Gives an error tag to “bijlajdagi” and gives the correct “bijlaj dagi”.
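The correction performed by the two DAGI lexicons amounts to separating the postposition dagi from the preceding word. A naive string-level sketch of that rewrite follows; the function name `split_dagi` is hypothetical, and unlike the lexicons it would also split any unrelated word that happens to end in “dagi”:

```python
def split_dagi(token: str) -> str:
    """Rewrite a fused form such as 'bijladagi' as 'bijla dagi'."""
    suffix = "dagi"
    # Only rewrite when something precedes the postposition.
    if token.endswith(suffix) and len(token) > len(suffix):
        return token[: -len(suffix)] + " " + suffix
    return token


print(split_dagi("bijladagi"))
print(split_dagi("bijlajdagi"))
```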
LEXICON SURGULASJ-EVEN
LEXICON N-EVENWEAKSTEM-NO-ABE Same as N-EVENWEAKSTEM, but without abessive (the abessive is Err/Infl-tagged). Used for 4-syllable nouns.
LEXICON GAHPER Odd-syllable C-final noun without cg, no vowchange, no short Ess. Spiik A3
stiebil+N+Sg+Nom
stiebil+N+Sg+Ela
LEXICON ÅRES Odd-syllable C-final noun with CG, 2ndsyll vowchange. Long and short essive. Spiik A1
sjattos+N+Sg+Nom
sjattos+N+Sg+Ela
sjattos+N+Ess
sjattos+N+Ess
LEXICON SÅHKÅR Odd-syllable C-final noun with CG and 2ndsyll vowelchange. Has only long essive. Spiik 2b
spiger+N+Sg+Nom
spiger+N+Sg+Ela
spiger+N+Ess
LEXICON MIEHTAR Only for word “miehtar”. Same as SÅHKÅR but with Area-differences and a lot of Err/Orths.
miehtar+N+Sg+Nom
miehtar+N+Sg+Nom
miehtar+N+Sg+Ela
miehtar+N+Sg+Ela
miehtar+N+Ess
miehtar+N+Ess
LEXICON GÁMAS Odd-syllable C-final noun with CG, no 2ndsyll vowchange (OBS: a does not change). Long and short essive. Spiik A2
sjábtjas+N+Sg+Nom
sjábtjas+N+Sg+Ela
LEXICON BENA Odd-syllable V-final noun with cg, no 2ndsyll vowchange. Deletes g. Long and short essive. Spiik 2a
galma+N+Sg+Nom
galma+N+Sg+Ela
LEXICON SUOBDE gen: suobddega. Presently only for “suobde”. For some reason -e doesn’t become á, so it is not in lexicon BENA. Long and short essive.
ságe+N+Sg+Nom
ságe+N+Sg+Acc
LEXICON SÁGE gen: sáhkaha. Presently only for “ságe”. Long and short essive.
ságe+N+Sg+Nom
ságe+N+Sg+Acc
LEXICON BAVSEV Ends on -v and last vowel changes to i: bavsev:baksIma. Not like gierkav gierkkAma and birev birEma.
sievtev+N+Sg+Nom
sievtev+N+Sg+Ela
LEXICON RÁBEV rábev:ráhpuga. Presently only for “rábev”.
rábev+N+Sg+Nom
rábev+N+Sg+Ela
LEXICON RITJAS Like GÁMAS, but without stem a-lengthening for grade I (underlying long -i-). Presently only for “ritjas”.
ritjas+N+Sg+Nom
ritjas+N+Sg+Ela
LEXICON SÅGAS gen: sågaska. Presently only for “sågas”.
sågas+N+Sg+Nom
sågas+N+Sg+Acc
LEXICON SJUVÁJ sjuváj-sjuvvaga. Presently only for “sjuváj”.
sjuváj+N+Sg+Nom
sjuváj+N+Sg+Ela
LEXICON BØSOJ Because of bösoj in O.Korhonen, and bæsoj-bessuga. Only for these two words. J becomes g.
LEXICON GUOVSOJVUOJOJ vuojoj:vuodjom. Presently only for “guovsojvuojoj”.
guovsojvuojoj+N+Sg+Nom
guovsojvuojoj+N+Sg+Acc
LEXICON BUTJES butjes-buttjása. Presently only for “butjes”. This is a sub-form. Korhonen has this form, but Grundström has buttjes-budtjasa. It must be a typo in Korhonen, because ttj-tj doesn’t exist in smj. This form is err/sub-tagged in the stems file.
LEXICON TJÅLKES tjålkes:tjoalkkas-. Presently only for “tjålkes” and “tsålkes”. This must be wrong, and it doesn’t exist in Grundström. Å in the 1st syllable isn’t possible with e in the 2nd syllable. Must be tjoalkes-tjoalkkása or tjålkas-tjoalkkasa. This form is err/sub-tagged in the stems file.
tsålkes+N+Sg+Nom (is not standard language)
tsålkes+N+Sg+Acc (is not standard language)
LEXICON VÁJES vájes:vádjas-. Presently only for “báhkovájes”. It is a sub-form: 2nd-syllable e doesn’t become a. Must be vájes-vádjása or vájas-vádjasa. The second is used in NT, so I believe that is the right one. This form is err/sub-tagged in the stems file.
Derived stems
LEXICON BADJEL Derived nouns with acc -elav, ill -elij, elat -elas, etc. These were previously categorized as adpositions and adverbs, but according to Bruce Morén-Duolljá (2014) they are actually case forms of nouns derived from certain location nouns. Derived from even strong stems (badje -> badjel). Odd syllable inflection, but only singular nominative-elative (not clear if they take comitative and essive case). With comparatives. No Px.
allel+N+Sg+Nom
allel+N+Sg+Ela
allel+N+Der/Comp+A+Sg+Nom
LEXICON BÁRNEP bárnep:bárnebu-. Comparison of nouns. No -ahtá abessive.
iednep+N+Sg+Nom
iednep+N+Sg+Acc
LEXICON OAPPÁSJ Like GAHPER, but for derived nouns in the diminutive that have an underived form. Doesn’t get the abessive -ahtá or the -ahtes derivation. Odd-syllable, no cg, no vowchange, no short Ess. Has only one dimin derivation since these words already are diminutives, i.e. no double diminutive as in GAHPER.
oappásj+N+Sg+Nom
oappásj+N+Sg+Ela
LEXICON FIERUN Like GAHPER, but instruments derived from verbs. Fierrot>fierun. No short essive.
fierun+N+Sg+Nom
fierun+N+Sg+Ela
LEXICON GUOLLÁR Like GAHPER, but actor derived from contracted verbs (ACTOR for evensyll verbs). Guollit>guollár. No short essive.
LEXICON IELLEM Nomen actionis derived from even verbs. Earlier these went directly to VSBST-ODD; now they get the tag Gram/NomAct before going there. It cannot be put in the VSBST-ODD lexicon because of paths from the verb lexicons.
LEXICON TJIEKTJAMA Plural nomen actionis derived from even verbs. Earlier these went directly to VSBST-ODD-PL; now they get the tag Gram/NomAct before going there. It cannot be put in the VSBST-ODD-PL lexicon because of paths from the verb lexicons.
LEXICON AKTIDIBME Nomen actionis derived from uneven verbs, ending in DIBME. Earlier these went directly to VSBST-EVEN; now they get the tag Gram/NomAct before going there. It cannot be put in the VSBST-ODD lexicon because of paths from the verb lexicons.
LEXICON BERUSTIBME Nomen actionis derived from uneven verbs, ending in STIBME; DIBME is Err/Orth-tagged. Earlier these went directly to VSBST-EVEN; now they get the tag Gram/NomAct before going there. It cannot be put in the VSBST-ODD lexicon because of paths from the verb lexicons.
LEXICON DÁRBBAGA Like BENA, but plural. Presently only for “dárbbaga”, has singular stem counterpart.
dárbbaga+N+Pl+Nom
dárbbaga+N+Pl+Acc
LEXICON BÆLLJASA Like GÁMAS, but plural. These have corresponding singular stems.
jiednabælljasa+N+Pl+Nom
jiednabælljasa+N+Pl+Nom
jiednabælljasa+N+Pl+Acc
jiednabælljasa+N+Pl+Acc
LEXICON IEDNITJA Odd-syllable plural forms only. These do not have a singular form.
jáhkoguojmitja+N+Pl+Nom
jáhkoguojmitja+N+Pl+Acc
LEXICON SNJIERÁGA Odd-syllable plural forms only. These have corresponding singular stems.
guovlloådåsa+N+Pl+Nom
guovlloådåsa+N+Pl+Acc
LEXICON MANEBU Odd-syllable plural only. Presently only for “maŋebu”.
maŋebu+N+Pl+Nom
maŋebu+N+Pl+Acc
LEXICON SUOLOJ C-final with cg II-III: ålmåj:ålmmå
njurgoj+N+Sg+Nom
njurgoj+N+Sg+Acc
LEXICON ÅLMÅJ_LOAN Same as SUOLOJ, only for loan words. Follows Ráhka/Mikkelsen’s Bårjås 2014. C-final with cg II-III: ålmåj:ålmmå
bistroj+N+Sg+Nom
bistroj+N+Sg+Acc
bistroj+N+Sg+Acc
LEXICON GUOMOJ C-final with cg I-III: guomoj:guobbmu
ænoj+N+Sg+Nom
ænoj+N+Sg+Acc
ænoj+N+Sg+Nom
ænoj+N+Sg+Acc
LEXICON SARVES C-final with cg II-III. sarves:sarvvá
moarmes+N+Sg+Nom
moarmes+N+Sg+Acc
LEXICON SVÁLES C-final with cg I-III. sváles:svállá (lºl)
sváles+N+Sg+Nom
sváles+N+Sg+Acc
LEXICON GÅHKES C-final with cg II-III with vowel harmony (a/á=å). gåhkes:gåhkkå. Presently only for “gåhkes”.
gåhkes+N+Sg+Nom
gåhkes+N+Sg+Acc
LEXICON SJUOKKAJ sjuokkaj:sjuoggá. Presently only for “sjuokkaj”.
sjuokkaj+N+Sg+Nom
sjuokkaj+N+Sg+Acc
LEXICON GISTÁ gistá:gisstá. Presently only for “gistá”.
gistá+N+Sg+Nom
gistá+N+Sg+Acc
LEXICON DUOLMUN Fierrot>fierun, instruments derived from verbs, used only for verb derivation, not for lexicalized lemmas. No short essive.
This (part of) documentation was generated from src/fst/morphology/affixes/nouns.lexc