INTRODUCTION TO MORPHOLOGICAL ANALYSER OF Gunwinggu LANGUAGE.
Definitions for Alphabets
Alphabets
The alphabet used to write surface word forms in the Gunwinggu language is:
a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 1 2 3 4 5 6 7 8 9 %0
The following characters are always escaped in lexc files: % %# %: %; %! %< %> %% %"
Other common punctuation marks in the Gunwinggu language are:
- , . | ? … ¿ ¶ ❡ ¬ • ● · · ‒ – — ― − _ = ≈ @CODE@ ‘ * + ± ` ´ / ~ ‐ ° ( ) [ ] { } « » ‹ › “ ” „ ‟ ‘ ’ ‚ ‛ ❛ ❜ ❝ ❞ ❟ ❠ ❮ ❯ 〝 〞 〟 § € £ ¥ ® © √ ◊ ♦ ☐ ⚬ № ‰ ¢ ¦ ª × ‡ ™ → ■ □ ▲ ► ▼ ★ ☆ ☺ ✓ ❖ 😄 🙂 " ּ U+05BC HEBREW POINT DAGESH OR MAPIQ U+00AD SOFT HYPHEN U+00A0 NO-BREAK SPACE U+202F NARROW NO-BREAK SPACE
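When lexc entries are generated from a word list, the escaping rule above can be applied mechanically. The Python helper below is a minimal, hypothetical sketch and is not part of the GiellaLT tooling. It assumes that the escaped set is exactly the characters listed above, with the leading `% ` read as an escaped space, plus the digit `0`. A bare `0` means epsilon in lexc, which is why the alphabet lists it as `%0`.

```python
# Characters that must be %-escaped in lexc: the list above, read as
# space, '#', ':', ';', '!', '<', '>', '%' and '"', plus '0', which lexc
# otherwise reads as epsilon (hence the %0 in the alphabet).
LEXC_SPECIAL = frozenset(' #:;!<>%"0')


def escape_lexc(text: str) -> str:
    """Prefix every lexc-special character in `text` with '%'."""
    return "".join("%" + ch if ch in LEXC_SPECIAL else ch for ch in text)


print(escape_lexc("a:b 100%"))  # a%:b% 1%0%0%%
```

Characters outside the special set, including all the punctuation in the second list, pass through unchanged.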
Lexical analysis symbols
The morphological analyses of wordforms for the Gunwinggu language are presented in this system in terms of the following symbols.
Tags for parts-of-speech (POS)
Error (non-standard language) tags
- +Err/Orth substandard form
- +Err/Lex substandard word
- +Err/DerSub substandard for derivation
- +Err/CmpSub substandard for compounding
- +Err/MissingSpace missing a space
- +Err/MissingHyph missing a hyphen
- +Err/Hyph unnecessary extra hyphen
- +Err/SpaceCmp extra space in compound
- +Err/Spellrelax typos under spell relax
- +Err/Confused confusion pairs
Usage tags
- +Use/-Spell excluded from speller
- +Use/SpellNoSugg recognized but not suggested in speller
- +Use/Circ circular paths
- +Use/NG excluded from generators
- +Use/PMatch included in tokenisers only
- +Use/-PMatch excluded from tokenisers
- +Use/GC included in grammar checker only
- +Use/-GC excluded from grammar checker
- +Use/TTS included in text-to-speech only
- +Use/-TTS excluded from text-to-speech
- +Dial/-XYZ forms not in use in dialect XYZ
- +Dial/XYZ forms in use in dialect XYZ
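The usage tags above decide which tools an analysis ends up in. The Python sketch below models that decision for a single analysis string. It is only an illustration of the tag descriptions, not how the FSTs are actually filtered at build time. The tool names (`speller`, `generator`, `tokeniser`, `grammarchecker`, `tts`) and the placeholder analysis `lemma` are hypothetical labels.

```python
import re

# Usage tags that remove an analysis from one tool (per the list above).
EXCLUDED_FROM = {
    "-Spell": "speller",
    "NG": "generator",
    "-PMatch": "tokeniser",
    "-GC": "grammarchecker",
    "-TTS": "tts",
}
# Usage tags that restrict an analysis to one tool only.
ONLY_IN = {"PMatch": "tokeniser", "GC": "grammarchecker", "TTS": "tts"}

USE_TAG = re.compile(r"\+Use/([^+]+)")


def allowed(analysis: str, tool: str) -> bool:
    """Return True if `analysis` would be kept for `tool` under this model."""
    for tag in USE_TAG.findall(analysis):
        if EXCLUDED_FROM.get(tag) == tool:
            return False  # explicitly excluded from this tool
        if tag in ONLY_IN and ONLY_IN[tag] != tool:
            return False  # reserved for a different tool
    return True


print(allowed("lemma+Use/-Spell", "speller"))    # False
print(allowed("lemma+Use/-Spell", "generator"))  # True
```

+Use/SpellNoSugg and +Use/Circ appear in neither table because, in this model, they do not remove a form from any tool.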
Tags for indicating the orthography used
- +Orth/Strd standard orthography
- +Orth/IPA IPA transcription
Tags for indicating alternative orthographies, cf. configure.ac
- +AltOrth/standard standard orthography
- +AltOrth/-standard not standard orthography
Morphophonology
To represent phonologic variations in word forms we use the following symbols in the lexicon files:
Tags for common files
Symbols used to control hyphenation:
- ∑ = trigger for hyphen that does not work like hyphenation
Flag diacritics
We have manually optimised the structure of our lexicon using the following flag diacritics to restrict morphological combinatorics: compounds with verbs are only allowed if the verb is further derived into a noun again.
| Flag | Explanation |
|---|---|
| @P.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
| @D.NeedNoun.ON@ | (Dis)allow compounds with verbs unless nominalised |
| @C.NeedNoun@ | (Dis)allow compounds with verbs unless nominalised |
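Flag diacritics are checked left to right along a path. `P` sets a feature, `R` requires a value, `D` disallows one, `C` clears the feature and `U` unifies. The Python evaluator below is a simplified sketch of these standard operators and omits the `N` (negative set) operator. The example paths are an assumption about where the three NeedNoun flags sit in the lexicon: `@P.NeedNoun.ON@` on a verb entering a compound, `@C.NeedNoun@` on a nominalising derivation, and `@D.NeedNoun.ON@` at the end of the word.

```python
def apply_flag(state: dict, flag: str) -> bool:
    """Apply one flag diacritic (e.g. '@P.NeedNoun.ON@') to `state` in place.

    Returns False if the flag blocks the path. Supports the P (set),
    R (require), D (disallow), C (clear) and U (unify) operators.
    """
    op, feature, *rest = flag.strip("@").split(".")
    value = rest[0] if rest else None
    current = state.get(feature)
    if op == "P":
        state[feature] = value
    elif op == "C":
        state.pop(feature, None)
    elif op == "R":  # with a value: must equal it; without: must be set
        return current == value if value is not None else current is not None
    elif op == "D":  # with a value: must not equal it; without: must be unset
        return current != value if value is not None else current is None
    elif op == "U":  # set if unset, otherwise must already agree
        if current is None:
            state[feature] = value
        elif current != value:
            return False
    else:
        raise ValueError(f"unsupported operator in {flag}")
    return True


def path_ok(flags: list) -> bool:
    """Check a whole path left to right, starting from an empty state."""
    state: dict = {}
    return all(apply_flag(state, f) for f in flags)


# Assumed placement: verb in compound, (optional) nominaliser, word end.
print(path_ok(["@P.NeedNoun.ON@", "@D.NeedNoun.ON@"]))                  # False
print(path_ok(["@P.NeedNoun.ON@", "@C.NeedNoun@", "@D.NeedNoun.ON@"]))  # True
```

The same evaluator applies unchanged to the other flag tables on this page, for example `@P.TYPE.V@` followed by `@R.TYPE.N@` is rejected.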
The following flags are used to control tokenisation and analyses in certain corner cases (e.g. when the full stop at the end of a sentence is also the full stop of an abbreviation).
| Flag | Explanation |
|---|---|
| @P.Pmatch.Loc@ | split point for multitoken word |
| @P.Pmatch.Backtrack@ | merge point for multiword token |
For languages that allow compounding, the following flag diacritics are needed to control position-based compounding restrictions for nominals. Their use is handled automatically if combined with +CmpN/xxx tags. If not used, they will do no harm.
| Flag | Explanation |
|---|---|
| @P.CmpFrst.FALSE@ | Require that words tagged as such only appear first |
| @D.CmpPref.TRUE@ | Block such words from entering ENDLEX |
| @P.CmpPref.FALSE@ | Block these words from making further compounds |
| @D.CmpLast.TRUE@ | Block such words from entering R |
| @D.CmpNone.TRUE@ | Combines with the next tag to prohibit compounding |
| @U.CmpNone.FALSE@ | Combines with the prev tag to prohibit compounding |
| @P.CmpOnly.TRUE@ | Sets a flag to indicate that the word has passed R |
| @D.CmpOnly.FALSE@ | Disallow words coming directly from root |
Use the following flag diacritics to control downcasing of derived proper nouns (e.g. Finnish Pariisi -> pariisilainen). See e.g. North Sámi for how to use these flags. There exists a ready-made regex that will do the actual downcasing given the proper use of these flags.
| Flag | Explanation |
|---|---|
| @U.Cap.Obl@ | Allow downcasing of derived names |
| @U.Cap.Opt@ | Allow downcasing of derived names |
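The actual downcasing is done by the ready-made regex mentioned above. Purely to illustrate the intended effect, the hypothetical Python function below builds a derived form from a name and a suffix and returns the accepted spellings. Treating `Obl` as obligatory and `Opt` as optional downcasing is an assumption read off the flag names, not something this page states.

```python
def derived_forms(name: str, suffix: str, cap: str = "") -> set:
    """Return the accepted spellings of `name` + `suffix`.

    Assumption: cap == "Obl" means downcasing is obligatory and
    cap == "Opt" means both spellings are accepted; any other value
    leaves the capitalisation of `name` untouched.
    """
    form = name + suffix
    lowered = form[:1].lower() + form[1:]
    if cap == "Obl":
        return {lowered}
    if cap == "Opt":
        return {form, lowered}
    return {form}


print(sorted(derived_forms("Pariisi", "lainen", "Opt")))
# ['Pariisilainen', 'pariisilainen']
```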
gup flags
These flags are used in the gup fst.
| Flag | Explanation |
|---|---|
| @P.PERS.T@ | |
| @P.TENSE.NP@ | |
| @P.TENSE.P@ | |
| @P.TYPE.N@ | |
| @P.TYPE.V@ | |
| @P.VALENCE.AUGM@ | |
| @P.VALENCE.INTR@ | |
| @P.VALENCE.TRNS@ | |
| @R.PERS.T@ | |
| @R.TENSE.NP@ | |
| @R.TENSE.P@ | |
| @R.TYPE.N@ | |
| @R.TYPE.V@ | |
| @R.VALENCE.AUGM@ | |
| @R.VALENCE.INTR@ | |
| @R.VALENCE.TRNS@ | |
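The explanation column is empty, but each flag name already encodes an operator, a feature and a value. The small Python helper below is a hypothetical sketch, not part of the gup sources. It parses the flags listed in the table and groups the values by feature, and as a consistency check it lists any value that is not both set with `P` and tested with `R`.

```python
import re
from collections import defaultdict

# The gup flags from the table above, copied verbatim.
GUP_FLAGS = """
@P.PERS.T@ @P.TENSE.NP@ @P.TENSE.P@ @P.TYPE.N@ @P.TYPE.V@
@P.VALENCE.AUGM@ @P.VALENCE.INTR@ @P.VALENCE.TRNS@
@R.PERS.T@ @R.TENSE.NP@ @R.TENSE.P@ @R.TYPE.N@ @R.TYPE.V@
@R.VALENCE.AUGM@ @R.VALENCE.INTR@ @R.VALENCE.TRNS@
"""

# Matches @OP.FEATURE@ or @OP.FEATURE.VALUE@.
FLAG = re.compile(r"@([A-Z])\.([^.@\s]+)(?:\.([^@\s]+))?@")


def inventory(text: str) -> dict:
    """Map each feature to {value: set of operators used with it}."""
    feats = defaultdict(lambda: defaultdict(set))
    for op, feature, value in FLAG.findall(text):
        feats[feature][value].add(op)
    return {f: dict(vals) for f, vals in feats.items()}


inv = inventory(GUP_FLAGS)
for feature, values in sorted(inv.items()):
    print(feature, sorted(values))
# PERS ['T']
# TENSE ['NP', 'P']
# TYPE ['N', 'V']
# VALENCE ['AUGM', 'INTR', 'TRNS']

# Values not paired as P (set) + R (require); empty for the table above.
print([(f, v) for f, vals in inv.items() for v, ops in vals.items() if ops != {"P", "R"}])
# []
```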
The word forms in the Gunwinggu language start from the lexeme roots of basic word classes, or optionally from prefixes:
This (part of) documentation was generated from src/fst/morphology/root.lexc