German language
- +SP2 = second person, singular or plural, formal
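- +NoArt = Stark (strong) inflection, no preceding article
- Schwach (weak) inflection, after a definite article (Def)
- gemischt (mixed) inflection, after an indefinite article (Indef)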
- +Symbol = independent symbols in the text stream, like £, €, ©
Homonyms
Semantic tags to help disambiguation and syntactic analysis (placed before the POS tag):
- +Sem/Act = Activity
- +Sem/Adr = Webadr
- +Sem/Amount = Amount
- +Sem/Ani = Animate
- +Sem/Aniprod = Animal Product
- +Sem/Body = Bodypart
- +Sem/Body-abstr = Abstract bodypart, like siellu, vuoigŋa, jierbmi
- +Sem/Build = Building
- +Sem/Build-part = Part of a building, like the closet
- +Sem/Cat = Category
- +Sem/Clth = Clothes
- +Sem/Clth-jewl = Jewellery
- +Sem/Clth-part = Part of clothes, like boallu, sávdnji…
- +Sem/Ctain = Container
- +Sem/Ctain-abstr = Abstract container like bank account
- +Sem/Ctain-clth = Soft container, like a rucksack
- +Sem/Curr = Currency, like dollár (not Money)
- +Sem/Dance = Dance
- +Sem/Dir = Direction like GPS-kursa
- +Sem/Domain = Domain, like politics or reindeer herding (a system of actions)
- +Sem/Drink = Drink
- +Sem/Dummytag = Dummytag
- +Sem/Edu = Educational event
- +Sem/Event = Event
- +Sem/Feat = Feature, like árvu
- +Sem/Feat-phys = Physiological feature, like ivdni, fárda
- +Sem/Feat-psych = Psychological feature
- +Sem/Feat-measr = Measurable feature
- +Sem/Fem = Female name
- +Sem/Food = Food
- +Sem/Food-med = Medicine
- +Sem/Furn = Furniture
- +Sem/Game = Game
- +Sem/Geom = Geometrical object
- +Sem/Group = Animal or Human Group
- +Sem/Hum = Human
- +Sem/Hum-abstr = Human abstract
- +Sem/Ideol = Ideology
- +Sem/Lang = Language
- +Sem/Mal = Male name
- +Sem/Mat = Material for producing things
- +Sem/Measr = Measure
- +Sem/Money = Has to do with money, like wages, not Curr(ency)
- +Sem/Obj = Object
- +Sem/Obj-clo = Cloth
- +Sem/Obj-cogn = Cognitive object
- +Sem/Obj-el = (Electrical) machine or apparatus
- +Sem/Obj-ling = Object with something written on it
- +Sem/Obj-rope = flexible ropelike object
- +Sem/Obj-surfc = Surface object
- +Sem/Org = Organisation
- +Sem/Part = Part, like oassi, bealli
- +Sem/Perc-cogn = Cognitive perception
- +Sem/Perc-emo = Emotional perception
- +Sem/Perc-phys = Physical perception
- +Sem/Perc-psych = Psychological perception
- +Sem/Plant = Plant
- +Sem/Plant-part = Plant part
- +Sem/Plc = Place
- +Sem/Plc-abstr = Abstract place
- +Sem/Plc-elevate = Elevated place
- +Sem/Plc-line = Linear place
- +Sem/Plc-water = Body of water
- +Sem/Pos = Position (social position, as in a job)
- +Sem/Process = Process
- +Sem/Prod = Product
- +Sem/Prod-audio = Audio product
- +Sem/Prod-cogn = Cognition product
- +Sem/Prod-ling = Linguistic product
- +Sem/Prod-vis = Visual product
- +Sem/Rel = Relation
- +Sem/Route = Route
- +Sem/Rule = Rule or convention
- +Sem/Semcon = Semantic concept
- +Sem/Sign = Sign (e.g. numbers, punctuation)
- +Sem/Sport = Sport
- +Sem/State = State
- +Sem/State-sick = Illness
- +Sem/Substnc = Substance, like Air and Water
- +Sem/Sur = Surname
- +Sem/Symbol = Symbol
- +Sem/Time = Time
- +Sem/Tool = Prototypical tool for repairing things
- +Sem/Tool-catch = Tool used for catching (e.g. fish)
- +Sem/Tool-clean = Tool used for cleaning
- +Sem/Tool-it = Tool used in IT
- +Sem/Tool-measr = Tool used for measuring
- +Sem/Tool-music = Music instrument
- +Sem/Tool-write = Writing tool
- +Sem/Txt = Text (girji, lávlla…)
- +Sem/Veh = Vehicle
- +Sem/Wpn = Weapon
- +Sem/Wthr = Weather or the state of the ground
Multiple Semantic tags:
- +Sem/Act_Group = Activity and Group
- +Sem/Act_Plc = Activity and Place: a person's job is both an activity and a place
- +Sem/Act_Route = Activity and Route, i.e. johtolat
- +Sem/Amount_Build = Amount and Building
- +Sem/Amount_Semcon
- +Sem/Ani_Body-abstr_Hum
- +Sem/Ani_Build
- +Sem/Ani_Build-part
- +Sem/Ani_Build_Hum_Txt
- +Sem/Ani_Group
- +Sem/Ani_Group_Hum
- +Sem/Ani_Hum
- +Sem/Ani_Hum_Plc
- +Sem/Ani_Hum_Time
- +Sem/Ani_Plc
- +Sem/Ani_Plc_Txt
- +Sem/Ani_Time
- +Sem/Ani_Veh
- +Sem/Aniprod_Hum
- +Sem/Aniprod_Obj-clo
- +Sem/Aniprod_Perc-phys
- +Sem/Aniprod_Plc
- +Sem/Ant_Fem = Anthroponym, female
- +Sem/Ant_Mal = Anthroponym, male
- +Sem/Body-abstr_Prod-audio_Semcon
- +Sem/Body_Body-abstr
- +Sem/Body_Clth
- +Sem/Body_Food
- +Sem/Body_Group_Hum
- +Sem/Body_Hum
- +Sem/Body_Mat
- +Sem/Body_Measr
- +Sem/Body_Obj_Tool-catch
- +Sem/Body_Plc
- +Sem/Body_Time
- +Sem/Build-part_Plc
- +Sem/Build_Build-part
- +Sem/Build_Clth-part
- +Sem/Build_Edu_Org
- +Sem/Build_Event_Org
- +Sem/Build_Org
- +Sem/Build_Route
- +Sem/Clth-jewl_Curr
- +Sem/Clth-jewl_Money
- +Sem/Clth-jewl_Plant
- +Sem/Clth_Hum
- +Sem/Ctain-abstr_Org
- +Sem/Ctain-clth_Plant
- +Sem/Ctain-clth_Veh
- +Sem/Ctain_Feat-phys
- +Sem/Ctain_Furn
- +Sem/Ctain_Tool
- +Sem/Ctain_Tool-measr
- +Sem/Curr_Org
- +Sem/Dance_Org
- +Sem/Dance_Prod-audio
- +Sem/Domain_Food-med
- +Sem/Domain_Prod-audio
- +Sem/Edu_Event
- +Sem/Edu_Group_Hum
- +Sem/Edu_Mat
- +Sem/Edu_Org
- +Sem/Event_Food
- +Sem/Event_Hum
- +Sem/Event_Plc
- +Sem/Event_Time
- +Sem/Feat-measr_Plc
- +Sem/Feat-phys_Tool-write
- +Sem/Feat-phys_Veh
- +Sem/Feat-phys_Wthr
- +Sem/Feat-psych_Hum
- +Sem/Feat_Plant
- +Sem/Food_Perc-phys
- +Sem/Food_Plant
- +Sem/Game_Obj-play
- +Sem/Geom_Obj
- +Sem/Group_Hum
- +Sem/Group_Hum_Org
- +Sem/Group_Hum_Plc
- +Sem/Group_Hum_Prod-vis
- +Sem/Group_Org
- +Sem/Group_Sign
- +Sem/Group_Txt
- +Sem/Hum_Lang
- +Sem/Hum_Lang_Plc
- +Sem/Hum_Lang_Time
- +Sem/Hum_Obj
- +Sem/Hum_Org
- +Sem/Hum_Plant
- +Sem/Hum_Plc
- +Sem/Hum_Tool
- +Sem/Hum_Veh
- +Sem/Hum_Wthr
- +Sem/Lang_Tool
- +Sem/Mat_Plant
- +Sem/Mat_Txt
- +Sem/Measr_Time
- +Sem/Money_Obj
- +Sem/Money_Txt
- +Sem/Obj-play
- +Sem/Obj-play_Sport
- +Sem/Obj_Semcon
- +Sem/Clth-jewl_Org
- +Sem/Org_Rule
- +Sem/Org_Txt
- +Sem/Org_Veh
- +Sem/Part_Prod-cogn
- +Sem/Perc-emo_Wthr
- +Sem/Plant_Plant-part
- +Sem/Plant_Tool
- +Sem/Plant_Tool-measr
- +Sem/Plc-abstr_Rel_State
- +Sem/Plc-abstr_Route
- +Sem/Plc_Pos
- +Sem/Plc_Route
- +Sem/Plc_Substnc
- +Sem/Plc_Substnc_Wthr
- +Sem/Plc_Time
- +Sem/Plc_Tool-catch
- +Sem/Plc_Wthr
- +Sem/Prod-audio_Txt
- +Sem/Prod-cogn_Txt
- +Sem/Semcon_Txt
- +Sem/Obj_State
- +Sem/Substnc_Wthr
- +Sem/Time_Wthr
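The combined tags above join single semantic tags with an underscore, while a hyphen only marks a subtag (as in Body-abstr). A minimal Python sketch, assuming exactly that convention, decomposes a combined tag and reports components that are missing from the single-tag list. The function names and the abbreviated inventory are hypothetical, not part of root.lexc.

```python
"""Sketch: decompose combined +Sem tags into single-tag components.

Assumption: "_" joins single tags and "-" only marks a subtag
(e.g. Body-abstr), so the tag body is split on "_" alone.
"""

# Abbreviated, hypothetical subset of the single-tag inventory above.
SINGLE_SEM_TAGS = {
    "Act", "Ani", "Body", "Body-abstr", "Build", "Build-part", "Fem", "Game",
    "Group", "Hum", "Mal", "Org", "Plc", "Route", "Time", "Txt",
}

PREFIX = "+Sem/"


def split_sem_tag(tag: str) -> list[str]:
    """Return the components of a (possibly combined) semantic tag."""
    if not tag.startswith(PREFIX):
        raise ValueError(f"not a semantic tag: {tag!r}")
    return tag[len(PREFIX):].split("_")


def unknown_components(tag: str, inventory: set[str] = SINGLE_SEM_TAGS) -> list[str]:
    """List components that do not occur in the given single-tag inventory."""
    return [part for part in split_sem_tag(tag) if part not in inventory]


if __name__ == "__main__":
    print(split_sem_tag("+Sem/Ani_Body-abstr_Hum"))  # ['Ani', 'Body-abstr', 'Hum']
    print(unknown_components("+Sem/Game_Obj-play"))   # ['Obj-play']
    print(unknown_components("+Sem/Ant_Fem"))         # ['Ant']
```

Run against the full single-tag list, such a check would flag Obj-play and Ant, which occur only inside the combined tags.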
Derivation
Valency tags, i.e. tags assigned to verbs or prepositions for denoting their arguments
- +%<acc%> = accusative argument (case tag)
- +%<dat%> = dative argument (case tag)
- +%<gen%> = genitive argument (case tag)
- +%<nom%> = nominative argument (case tag)
TRIGGER
- %^A2E
- %^A2I
- %^A2O
- %^A2Ö
- %^A2U
- %^A2Ü
- %^A2IE
- %^O2E
- %^O2I
- %^O2IE
- %^EI2IE
- %^EI2I
- %^CC2C = simplify a double consonant, mm:m0 (kommen, kommst vs. kamst)
- %^DeV g0:ch
- %^LowerVow o:a
- %^Pen penultimate syllable
- %^RmVow remove vowel
- %^Umlaut umlaut a:ä, o:ö, u:ü
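The triggers are handed on to the twolc rules, which carry out the actual alternation. To illustrate what %^Umlaut asks for (a:ä, o:ö, u:ü), here is a Python sketch that consumes a literal ^Umlaut written after the stem. The rightmost-vowel heuristic and the au → äu case are assumptions made for the sketch, not a transcription of the German twolc rules.

```python
"""Sketch: the effect requested by the %^Umlaut trigger.

Assumptions (not taken from the actual twolc rules): the trigger is
the literal symbol "^Umlaut" after the stem, and umlaut targets the
rightmost a, o or u of the stem, with "au" becoming "äu".
"""

UMLAUT = {"a": "ä", "o": "ö", "u": "ü", "A": "Ä", "O": "Ö", "U": "Ü"}
TRIGGER = "^Umlaut"


def apply_umlaut(stem: str) -> str:
    """Umlaut the rightmost a/o/u in the stem (au -> äu)."""
    for i in range(len(stem) - 1, -1, -1):
        ch = stem[i]
        if ch in UMLAUT:
            # "au" is a diphthong: umlaut its first element (Haus -> Häus-)
            if ch in "uU" and i > 0 and stem[i - 1] in "aA":
                return stem[: i - 1] + UMLAUT[stem[i - 1]] + stem[i:]
            return stem[:i] + UMLAUT[ch] + stem[i + 1:]
    return stem  # no umlautable vowel


def realise(form: str) -> str:
    """Remove the trigger symbol, umlauting the stem that precedes it."""
    if TRIGGER not in form:
        return form
    stem, rest = form.split(TRIGGER, 1)
    return apply_umlaut(stem) + rest


if __name__ == "__main__":
    print(realise("Vogel^Umlaut"))    # Vögel
    print(realise("Haus^Umlauter"))   # Häuser
    print(realise("Tag"))             # Tag
```

Scanning from the right lets the schwa in Vogel or Apfel be skipped while the stressed stem vowel is umlauted.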
Escaped symbols
Symbols that need to be escaped on the lower side (towards twolc):
The usage extent is marked using the following tags:
- +Err/Orth
- **+Use/Marg** marginal (?)
- **+Use/-Spell** excluded from the speller
- **+Use/SpellNoSugg** recognized by the speller but not suggested
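As a rough illustration of the intended effect of the two speller tags, the hypothetical helper below classifies a word form from the analyses it receives. It assumes the tags appear as plain substrings of each analysis string. The real filtering presumably happens when the speller transducer is compiled, not in code like this.

```python
"""Sketch of the intended speller effect of the +Use/ tags above.

Hypothetical helper, not GiellaLT code: tags are matched as plain
substrings of each analysis string.
"""


def speller_status(analyses: list[str]) -> str:
    """Classify a word form from all analyses the analyser returns."""
    # Drop analyses that are excluded from the speller altogether.
    usable = [a for a in analyses if "+Use/-Spell" not in a]
    if not usable:
        # Unknown word, or every reading carries +Use/-Spell.
        return "rejected"
    # One reading without +Use/SpellNoSugg is enough for the form
    # to be offered as a suggestion.
    if any("+Use/SpellNoSugg" not in a for a in usable):
        return "accepted and suggested"
    return "accepted, never suggested"


if __name__ == "__main__":
    print(speller_status(["Foo+N+Use/-Spell"]))                # rejected
    print(speller_status(["Foo+N+Use/SpellNoSugg"]))           # accepted, never suggested
    print(speller_status(["Foo+N+Use/SpellNoSugg", "Foo+V"]))  # accepted and suggested
```

Letting any permitted reading make a form suggestable is one possible policy; a stricter build could require all readings to allow it.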
COMPOUNDING
- +Cmp compound
- +Use/Circ FIXME, (where I am)
Flag diacritics
We have manually optimised the structure of our lexicon using the following
flag diacritics to restrict morphological combinatorics: compounds with verbs
are only allowed if the verb is further derived into a noun again.
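| Flag | Purpose |
| --- | --- |
| @P.NeedNoun.ON@ | Sets NeedNoun to ON: a compound with a verb still needs a nominalisation |
| @D.NeedNoun.ON@ | Fails while NeedNoun is ON, disallowing compounds whose verb was not nominalised |
| @C.NeedNoun@ | Clears NeedNoun once the verb has been nominalised, allowing the compound |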
For languages that allow compounding, the following flag diacritics are needed
to control position-based compounding restrictions for nominals. Their use is
handled automatically if combined with +CmpN/xxx tags. If not used, they will
do no harm.
| Flag | Purpose |
| --- | --- |
| @P.CmpFrst.FALSE@ | Require that words tagged as such only appear first |
| @D.CmpPref.TRUE@ | Block such words from entering ENDLEX |
| @P.CmpPref.FALSE@ | Block these words from making further compounds |
| @D.CmpLast.TRUE@ | Block such words from entering R |
| @D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding |
| @U.CmpNone.FALSE@ | Combines with the previous tag to prohibit compounding |
| @P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R |
| @D.CmpOnly.FALSE@ | Disallow words coming directly from root |
Use the following flag diacritics to control downcasing of derived proper
nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use
these flags. There exists a ready-made regex that will do the actual down-casing
given the proper use of these flags.
| Flag | Purpose |
| --- | --- |
| @U.Cap.Obl@ | Obligatory downcasing of derived names, e.g. deatnulasj |
| @U.Cap.Opt@ | Optional downcasing of derived names, e.g. deatnulasj |
- @U.Cap.Obl@ Nouns ; This is stated once since R also leads directly to Nouns
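The flag diacritics above follow the usual Xerox lexc semantics: P sets a feature, C clears it, R requires it, D fails if the feature is set (to the given value, if one is named), and U sets an unset feature but fails on a conflicting value. Values such as TRUE, FALSE, ON, Obl and Opt are compared as plain strings. A minimal Python sketch of that evaluation along a single path follows. It leaves out the N (negation) operator, and the example paths, including where each check sits, are hypothetical rather than taken from root.lexc.

```python
"""Sketch: evaluating Xerox-style flag diacritics along one path.

Covers P, C, R, D and U; the N (negated value) operator is left out.
"""
import re

# @OP.FEATURE@ or @OP.FEATURE.VALUE@
FLAG = re.compile(r"@([PCRDU])\.([^.@]+)(?:\.([^.@]+))?@")


def flags_ok(path: list[str]) -> bool:
    """Return True if every flag diacritic on the path succeeds."""
    state = {}  # feature -> current value
    for symbol in path:
        m = FLAG.fullmatch(symbol)
        if m is None:
            continue  # ordinary symbol, not a flag
        op, feat, val = m.groups()
        current = state.get(feat)
        if op == "P":  # positive set: overwrite the value
            state[feat] = val
        elif op == "C":  # clear: feature becomes unset
            state.pop(feat, None)
        elif op == "R":  # require: set (to val, if given)
            if current is None or (val is not None and current != val):
                return False
        elif op == "D":  # disallow: fail if set (to val, if given)
            if current is not None and (val is None or current == val):
                return False
        elif op == "U":  # unify: set if unset, else must already match
            if current is None:
                state[feat] = val
            elif current != val:
                return False
    return True


if __name__ == "__main__":
    # Hypothetical path: verb in a compound, never nominalised.
    print(flags_ok(["@P.NeedNoun.ON@", "lauf", "@D.NeedNoun.ON@"]))  # False
    # Same verb nominalised before the check.
    print(flags_ok(["@P.NeedNoun.ON@", "lauf", "@C.NeedNoun@", "en",
                    "@D.NeedNoun.ON@"]))  # True
    # Unification with conflicting Cap values fails.
    print(flags_ok(["@U.Cap.Obl@", "@U.Cap.Opt@"]))  # False
```

Because the state travels with the path, one flag can block or license a word several morphemes later without duplicating sublexica. That is what makes these flags useful for compact lexicon structure.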
This (part of) documentation was generated from src/fst/morphology/root.lexc