Finite state and Constraint Grammar based Text-to-Speech processing
View the project on GitHub giellalt/speech-sme
Morphophonological rules to decide upon:
Words with long latus vowels can be allegro shortened. This applies to (disyllabic) words with a final /ii/, /aa/ or /uu/, which shorten to /e/, /a/ and /o/ respectively.
Morphological environments for allegro shortening:
- Disyllabic verbs (contracted and trisyllabic verbs do not have allegro shortening):
  - present tense connegative: in loga -> in loka (currently we get /in lokaa/, with latus lengthening)
  - 2sg imperative: loga -> loka (currently we get /lokaa/)
- Compound nouns: sometimes the first part is shortened. The rules for á-a shortening are not clear. Suggestion: let the orthography decide. If there is /e/ or /o/ in the latus and a diphthong in the centre, then allegro; otherwise, largo.
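The orthography-based suggestion for compound first parts could be prototyped as follows. This is a minimal sketch, not the converter's implementation. Several things are assumed:
- The function name is made up.
- The input is the orthographic first part on its own.
- Only the first two vowel clusters are inspected, as centre and latus.
- The diphthong set is the standard North Sámi orthographic one (ea, ie, oa, uo).

```python
import re

# Hypothetical sketch of the suggestion "if /e/ or /o/ in latus and a diphthong
# in the centre, then allegro; otherwise largo" for compound first parts.
DIPHTHONGS = {"ie", "uo", "oa", "ea"}  # North Sámi orthographic diphthongs


def compound_first_part_tempo(first_part: str) -> str:
    """Return 'allegro' or 'largo' for the orthographic first part of a compound."""
    clusters = re.findall(r"[aáeiou]+", first_part.lower())
    if len(clusters) < 2:
        return "largo"  # no latus vowel to inspect
    centre, latus = clusters[0], clusters[1]
    if centre in DIPHTHONGS and latus in {"e", "o"}:
        return "allegro"
    return "largo"
```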
We already have allegro shortening for diphthongs preceding /e/ and /o/. We need morphological tags to trigger allegro shortening when there is a monophthong in the vowel centre, because allegro shortening in some cases also shortens consonants: some consonant centres shorten from QIII to QII. In allegro shortening environments, certain rules, such as latus lengthening and secondary lengthening, should not apply, so we need these tags to block them.
Example: orthographic form čolgga receives several analyses:
čolgga čolga+N+Sg+Gen
čolgga čolga+N+Sg+Acc
čolgga čolgat+V+TV+VGen
čolgga čolgat+V+TV+Imprt+ConNeg
čolgga čolgat+V+TV+Imprt+Sg2
čolgga čolgat+V+TV+Ind+Prs+ConNeg
The three topmost analyses are not allegro shortening environments, so latus lengthening and secondary lengthening apply:
/tʃolkː.kɑː/
The rest are all allegro shortening environments, and this shortening should block other rules: /tʃolk.kɑ/.
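The following is a minimal sketch, under assumptions, of how such tags could block rules. The allegro tag combinations are read off the čolgga example: any +ConNeg, or +Imprt together with +Sg2. The two lengthening rules are toy string operations on a syllabified transcription, not the project's real rules. The disyllabicity condition (contracted and trisyllabic verbs are excluded) is not checked.

```python
# Hypothetical sketch: allegro environments are detected from the analysis
# and block secondary lengthening and latus lengthening.
ALLEGRO_TAG_SETS = ({"ConNeg"}, {"Imprt", "Sg2"})  # assumed from the čolgga example


def is_allegro(analysis: str) -> bool:
    """True if the analysis contains one of the allegro tag combinations."""
    tags = set(analysis.split("+")[1:])  # drop the lemma, keep the tags
    return any(required <= tags for required in ALLEGRO_TAG_SETS)


def secondary_lengthening(s: str) -> str:
    # Toy rule: lengthen the consonant just before the first syllable boundary.
    return s.replace(".", "ː.", 1)


def latus_lengthening(s: str) -> str:
    # Toy rule: lengthen the word-final (latus) vowel.
    return s + "ː"


def transcribe(analysis: str, base: str) -> str:
    """Apply the toy lengthening rules unless the analysis is an allegro environment."""
    if is_allegro(analysis):
        return base  # allegro shortening blocks both lengthening rules
    return latus_lengthening(secondary_lengthening(base))
```

With base "tʃolk.kɑ", the three topmost analyses yield "tʃolkː.kɑː" and the other three keep "tʃolk.kɑ".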
While long and short clusters are differentiated in orthography, long geminates are written with two letters, exactly like short geminates:
| orthographic | phonological |
|---|---|
| guossi.nom.sg | /kuos:sii/ |
| guossi.acc.sg | /kuossii/ |
| guossa.nom.sg | /kuossa/ |
Today we only have QIII geminates when primary lengthening applies. QIII and QII are otherwise only differentiated when there is a difference in the orthography.
Possible tags:

/pes:seht/ vs besset.1pl.present: /peesseht/ - beassat.
This means verbs with -e, -o or -á in the latus. Note that this means only when the centre is a geminate. Words like málet are obviously QI.

/por:roht/ vs borrot.1pl.imp: /porroht/
/kul:loht/ vs gullot.1pl.imp: /kulloht/

/por:rojuv:voht/
This is just a special case of the stem class rule.

/por:rii/
/por:ruu/, borri: /por:rii/
/sul:lo/
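Once a tag has told the converter that a form needs a QIII centre, producing the transcription is mechanical in these examples. The length mark goes between the two halves of the geminate (/porroht/ vs /por:roht/).

Below is a hypothetical helper that illustrates this. It assumes transcriptions without syllable dots. It marks only the first geminate after a vowel, so a form with two marks such as /por:rojuv:voht/ is not covered. Forms without a geminate centre, such as those of málet, are left unchanged.

```python
# Hypothetical helper: turn a QII geminate centre into QIII by inserting the
# length mark ":" between the two halves of the first geminate after a vowel.
VOWELS = set("aeiouɑ")


def mark_qiii(transcription: str) -> str:
    """Insert ':' inside the first post-vocalic geminate; unchanged if there is none."""
    for i in range(1, len(transcription) - 1):
        prev, cur, nxt = transcription[i - 1], transcription[i], transcription[i + 1]
        if prev in VOWELS and cur not in VOWELS and cur == nxt:
            return transcription[: i + 1] + ":" + transcription[i + 1 :]
    return transcription
```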
Problems: Nouns and verbs with QIII strong grade, such as vuoššat and golli:
/vuoš:šaht/ vs vuoššat.2sg.present: /vuoššaht/
/kol:lii/ vs golli.acc.sg: /kollii/
In the lexicon, G3 is specified for nouns: when G3 occurs with Nom.sg, Ill.sg or essive, the centre is QIII. G3 is not specified for verbs, since it has not been necessary so far.
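The G3 condition can be stated as a check on the analysis. This is a sketch only. The tag spelling and order (lemma+N+G3+Sg+Nom) are assumptions for illustration; the check only looks at which tags are present.

```python
# Hypothetical sketch of the G3 rule: a noun marked G3 in the lexicon gets a
# QIII centre in the nominative singular, illative singular and essive.
QIII_CASES = ({"Sg", "Nom"}, {"Sg", "Ill"}, {"Ess"})


def g3_needs_qiii(analysis: str) -> bool:
    """True if a G3 noun analysis is in one of the QIII case environments."""
    tags = set(analysis.split("+")[1:])  # drop the lemma
    if "G3" not in tags or "N" not in tags:
        return False  # G3 is only specified for nouns
    return any(case <= tags for case in QIII_CASES)
```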
Monophthongs that stem from diphthong simplification are long before QI and QII.
Diphthong simplification environments:
- /o/ (except when o is allegro shortened)
- /e/ (except when e is allegro shortened)
- /ij/ or orthographic ii
- /uj/ in illative singular (There are also some -uj adjectives with diphthong simplification, but these are all QIII, so the monophthong is short anyway.)
A monophthong /i, o, e, u/ before /e/, /o/ and /ij/, and before illative /uj/, can thus be a monophthongized diphthong. If it is, it will be long before a QI or QII consonant centre. However, the monophthong could be either a simplified diphthong or an original monophthong. Compare the -at and -it verbs:
oađđit - mun ođđen: /oođđen/
gođđit - mun gođđen: /kođđen/
dan dihte: /tihte/
diehtit - moai dihte: /tiihte/
biđđit - son biđii: /piđij/
diehtit - son diđii: /diiđij/
goahti - gođiid: /koođiiht/
gođđit - son gođii: /kođij/
Some orthographic forms are also ambiguous between two readings. To differentiate, we need to use the same disambiguation as in translations:
moai sohpe: /soohpe/ from soahpat and /sohpe/ from sohpat
moai biđđe: /piiđđe/ from bieđđat and /piđđe/ from biđđit
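One way to encode this is sketched below, under assumptions. Once disambiguation has picked a lemma, the first vowel of the transcription is lengthened if the lemma's first syllable has a diphthong. The function is hypothetical and makes two further assumptions:
- The form is already known to be in a diphthong-simplification environment.
- A flag marks a QIII centre, before which the monophthong stays short.

```python
import re

# Hypothetical sketch: lengthen a monophthong only if it comes from a diphthong,
# deciding this from the (disambiguated) lemma rather than from the surface form.
DIPHTHONGS = {"ie", "uo", "oa", "ea"}  # North Sámi orthographic diphthongs


def lengthen_if_simplified(transcription: str, lemma: str, centre_qiii: bool = False) -> str:
    """Double the first vowel of `transcription` if the lemma has a first-syllable diphthong."""
    if centre_qiii:
        return transcription  # simplified diphthongs are long only before QI and QII
    lemma_vowels = re.search(r"[aáeiou]+", lemma.lower())
    if lemma_vowels is None or lemma_vowels.group()[:2] not in DIPHTHONGS:
        return transcription  # original monophthong: no lengthening
    first = re.search(r"[aeiouɑ]", transcription)
    if first is None:
        return transcription
    i = first.start()
    return transcription[: i + 1] + transcription[i:]
```

For example, sohpe is /soohpe/ when the lemma is soahpat and /sohpe/ when it is sohpat.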
Some derived words have long monophthongs throughout their paradigm, and no diphthong anywhere:
firon /fiiron/, from 'fierrut'
doron /tooron/, from 'doarrut'
geso /keeso/, another form of 'geaŧŧu'
bures /puures/, from 'buorre'
We currently have the following vowel lengthening rule:
`define VowelLengthening e -> e e , i -> i i , o -> o o , u -> u u || .#. (Cns*) _ (h) %^ Cns Vow [Cns ]( .#. ) ;`
This takes care of the four words above. However, the same environment also contains short monophthongs (especially in allegro forms):
visot: /visoht/ vs. visoš /viisoš/
spoađđu - spođoš /spoođoš/
nođđu - nođoš /nođoš/
bođii /poođij/ - gođii /kođij/
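A rough Python emulation of the VowelLengthening context shows why the rule over-applies. This is an approximation, not a translation of the xfst rule, and it rests on several assumptions:
- It uses a toy lowercase representation in which ^ marks the boundary before the consonant centre.
- It allows any number of final consonants, so that geso is covered.
- `blocked` stands for a hypothetical morphological or lexical flag, for example on allegro forms, that would prevent lengthening.

```python
import re

# Rough emulation of the VowelLengthening context on a toy representation
# (assumptions: "^" marks the boundary before the consonant centre, input is
# lowercase, and the final consonant is optional).
VOWELS = "aeiouɑ"
LENGTHENING = re.compile(
    rf"^([^{VOWELS}^]*)([eiou])(h?\^[^{VOWELS}^][{VOWELS}][^{VOWELS}^]*)$"
)


def vowel_lengthening(form: str, blocked: bool = False) -> str:
    """Lengthen the first-syllable e/i/o/u in the rule's context unless `blocked`."""
    if blocked:
        return form  # hypothetical flag: e.g. allegro forms or lexical exceptions
    return LENGTHENING.sub(r"\1\2\2\3", form)
```

Without a flag, vi^sot becomes vii^sot just like fi^ron becomes fii^ron. That is the over-application described above, so the information has to come from outside the phonological context.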
Loanwords from Norwegian monosyllabic words with a long vowel and a short consonant are represented with a long monophthong in the centre. They should be marked somehow. For instance, for toga we now get /thoga:/ (with latus lengthening, which is wrong); we want /tho:ga/. (Note: a short vowel + long consonant is represented as QIII: buss: /bus:se/, penn: /phen:na/.) We might also want /tho:ga:/ from Norwegian toga.
When monosyllabic words are written together with a particle, they look like disyllabic words. Our converter interprets these as disyllabic and gives them a consonant centre. This is obviously wrong, because what looks like a consonant centre is actually finis + initium. In addition, several rules might apply, such as allegro shortening. The sentence In dovdda geange is currently transcribed as:
in toβtː.tɑː kĕæŋŋ.ke.
We want the converter to recognize the allegro-shortening environment after the negation, and to recognize the monosyllabic pronoun and particle -ge. We want:
in toβt.tɑ keæn ke
Currently, latus lengthening also applies to the particle, as in the sentence Miiba de: /mijː.pɑː te/. Compare with the separate pronoun and particle mii ba de: /mij pɑ te/.
Many particles end with e or o, thus creating a potential allegro shortening environment. We have different transcriptions of moaige and moai ge:
/mŏɑjː.ke/ vs. /moɑj ke/
This also affects other words. Word-final consonants are not recognized as such, and /t/ remains when it should disappear:
manatgo: /ma.naat.go/ vs. manat go: /ma.naah ko/
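One way to let the converter see the monosyllabic host and the particle is to split written-together forms before transcription. This is a minimal sketch:
- `PARTICLES` and `HOSTS` are small illustrative lists, not the project's lexicon. A real system would take this information from the morphological analysis.
- Splitting only when the remainder is a known host keeps words like oba (not o + ba) intact.

```python
# Hypothetical sketch: split host + enclitic particle so the host can be
# transcribed as its own word (finis + initium instead of a consonant centre).
PARTICLES = ("ge", "ba", "go")               # illustrative subset
HOSTS = {"gean", "moai", "mii", "manat"}     # illustrative subset


def split_clitic(word: str) -> list[str]:
    """Return [host, particle] for a known host + particle, else [word]."""
    lower = word.lower()
    for particle in PARTICLES:
        host = lower[: -len(particle)]
        if lower.endswith(particle) and host in HOSTS:
            return [host, particle]
    return [lower]
```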
Some small but important words are not subject to the usual phonological rules. Oba is not subject to latus lengthening: /o.pa/, not /o.paa/. Oppa is not subject to latus lengthening or secondary lengthening: /op.pa/, not /op.paa/ (unlike the noun oppas, which is correctly /op:paas/).
Once allegro shortening of consonants is up and running, certain words that were originally allegro are now largo: mielde and guokte. They have a diphthong + e, i.e. an allegro shortening environment, but their consonant centre should still be QIII.
Movt, govt and nuvt do not have t -> h word-finally. We now have /movh/, /kovh/ and /nuvh/; we want /movht/, /kovht/ and /nuvht/. The /movh/, /kovh/ and /nuvh/ pronunciations can work utterance-internally, but not utterance-finally. Perhaps the best thing to do is to have t -> ht apply utterance-finally or to words in isolation, and t -> h word-finally elsewhere. This has already been pointed out by Helsinki.
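The proposed split can be sketched as a rule over a whole utterance: t -> ht on the last word, t -> h on other word-final t. This assumes the input is a list of transcribed words before this rule has applied. The function name is hypothetical.

```python
# Hypothetical sketch of the proposal: word-final t becomes ht at the end of
# an utterance (or in isolation) and h elsewhere.
def final_t(words: list[str]) -> list[str]:
    """Apply t -> ht utterance-finally and t -> h word-finally elsewhere."""
    out = []
    for i, word in enumerate(words):
        if word.endswith("t"):
            is_last = i == len(words) - 1
            word = word[:-1] + ("ht" if is_last else "h")
        out.append(word)
    return out
```

A word in isolation is a one-word utterance, so it gets ht, as in /movht/.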
Orthographic vuodjá has two pronunciations:
/vuoccaa/ if 3sg of /vuodjat/
/vuoc:caa/ if 3sg of /vuodjit/
This means that secondary lengthening does not apply to the first, even though it meets the environment description. Instead, the application of the rule depends on the latus vowel of the infinitive, which is short in vuodjat. (Extremely interesting (morpho)phonologically, I could go on and on about it, but hopefully there is a practical solution to this.)
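Since the decision cannot be read off the surface form, it has to come from the lemma. This is a minimal sketch, and its names are hypothetical. It assumes two things:
- The lexicon records which infinitives have a short latus vowel. The set below contains only vuodjat.
- Secondary lengthening here simply puts ":" after the first consonant of the centre.

```python
# Hypothetical sketch: secondary lengthening of a 3sg form depends on the
# lemma's infinitive, not only on the surface environment.
SHORT_LATUS_INFINITIVES = {"vuodjat"}  # assumed lexical marking
VOWELS = set("aeiouɑ")


def secondary_lengthening_3sg(qii_form: str, lemma: str) -> str:
    """Return the QIII form unless the infinitive has a short latus vowel."""
    if lemma in SHORT_LATUS_INFINITIVES:
        return qii_form  # e.g. vuodjá from vuodjat stays /vuoccaa/
    for i in range(1, len(qii_form)):
        if qii_form[i - 1] in VOWELS and qii_form[i] not in VOWELS:
            # Insert the length mark after the first consonant of the centre.
            return qii_form[: i + 1] + ":" + qii_form[i + 1 :]
    return qii_form
```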