North Sami Text-to-Speech

Finite state and Constraint Grammar based Text-to-Speech processing

View the project on GitHub giellalt/speech-sme

Morphophonological rules to decide upon:

  1. allegro shortening environments
  2. long and short geminates, QIII vs QII
  3. long and short monophthongs
  4. monosyllabic word and particle written together vs. disyllabic words

Allegro Shortening Environments

Words with long latus vowels can be allegro shortened. This applies to (disyllabic) words whose final vowel is /ii/, /aa/ or /uu/. These vowels shorten to /e/, /a/ and /o/, respectively.
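The vowel mapping alone can be sketched as follows. This is a minimal illustration, assuming a broad transcription in which long vowels are written doubled (as in /lokaa/ on this page). The function name is hypothetical, and the sketch ignores consonant shortening and morphology.

```python
# Long latus vowels and their allegro-shortened counterparts (from the text).
ALLEGRO_VOWELS = {"ii": "e", "aa": "a", "uu": "o"}

def shorten_final_latus(transcription: str) -> str:
    """Replace a word-final long latus vowel with its allegro counterpart.

    Hypothetical helper: assumes long vowels are written as doubled letters.
    """
    for long_vowel, short_vowel in ALLEGRO_VOWELS.items():
        if transcription.endswith(long_vowel):
            return transcription[: -len(long_vowel)] + short_vowel
    return transcription  # no long final latus vowel: leave unchanged
```

For example, `shorten_final_latus("lokaa")` gives `"loka"`, the shortened connegative/imperative form discussed next.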

Morphological environments for allegro shortening:

  1. Disyllabic verbs, present tense connegative: in loga -> in loka (currently we produce /in lokaa/, with latus lengthening).
  2. Disyllabic verbs, 2sg imperative: loga -> loka (currently we produce /lokaa/).
  3. Compound nouns: sometimes the first part is shortened.

Contracted and trisyllabic verbs do not have allegro shortening. The rules for á-a shortening are not clear. Suggestion: let the orthography decide. If the latus has /e/ or /o/ and the centre has a diphthong, then allegro; otherwise, largo.
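The orthography-based suggestion could be sketched like this for disyllabic words. It assumes the four orthographic diphthongs ie, uo, oa and ea. It also assumes that each run of vowel letters is one syllable nucleus. The function name is hypothetical.

```python
import re

# North Sami orthographic diphthongs (assumed set for this sketch).
DIPHTHONGS = {"ie", "uo", "oa", "ea"}
VOWEL_RUN = re.compile(r"[aáeiou]+")

def orthographic_tempo(word: str) -> str:
    """Classify a disyllabic word as 'allegro' or 'largo' from spelling alone.

    Diphthong in the vowel centre plus e or o in the latus gives allegro;
    everything else is largo.
    """
    nuclei = VOWEL_RUN.findall(word.lower())
    if len(nuclei) != 2:
        raise ValueError(f"expected a disyllabic word, got {word!r}")
    centre, latus = nuclei
    if centre in DIPHTHONGS and latus in {"e", "o"}:
        return "allegro"
    return "largo"
```

Under this heuristic, mielde and guokte come out as allegro and vuodjá as largo. However, loga also comes out as largo, even though its 2sg imperative reading is allegro. Spelling alone cannot catch that case, which is why the morphological tags below are needed.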

We already have allegro shortening for diphthongs preceding /e/ and /o/. We need morphological tags to trigger allegro shortening when the vowel centre is a monophthong, because in some cases allegro shortening shortens consonants as well: some consonant centres shorten from QIII to QII. In allegro shortening environments, certain rules, such as latus lengthening and secondary lengthening, should not apply, so we also need these tags to block those rules.

Example: the orthographic form čolgga receives several analyses:

čolgga	čolga+N+Sg+Gen
čolgga	čolga+N+Sg+Acc
čolgga	čolgat+V+TV+VGen
čolgga	čolgat+V+TV+Imprt+ConNeg
čolgga	čolgat+V+TV+Imprt+Sg2
čolgga	čolgat+V+TV+Ind+Prs+ConNeg

The three topmost analyses are not allegro shortening environments, so latus lengthening and secondary lengthening apply:

/tʃolkː.kɑː/

The rest are all allegro shortening environments, and this shortening should block other rules: /tʃolk.kɑ/.
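The tag logic in the čolgga example can be sketched as a lookup on analysis strings. This is a sketch under stated assumptions:

- the tag combinations are read off the example above;
- the verb is assumed to be disyllabic and not contracted;
- the rule names are hypothetical labels, not the project's actual rule identifiers.

```python
# Tag combinations marking an allegro shortening environment, read off the
# čolgga example (assumes a disyllabic, non-contracted verb).
ALLEGRO_TAG_PATTERNS = [
    {"V", "ConNeg"},        # present tense and imperative connegative
    {"V", "Imprt", "Sg2"},  # 2sg imperative
]

def is_allegro(analysis: str) -> bool:
    """True if an analysis like 'čolgat+V+TV+Imprt+Sg2' is an allegro environment."""
    tags = set(analysis.split("+")[1:])
    return any(pattern <= tags for pattern in ALLEGRO_TAG_PATTERNS)

def rules_to_apply(analysis: str) -> list[str]:
    """Pick between allegro shortening and the lengthening rules it blocks.

    The rule names are hypothetical labels.
    """
    if is_allegro(analysis):
        return ["allegro_shortening"]
    return ["latus_lengthening", "secondary_lengthening"]
```

With this sketch, `rules_to_apply("čolga+N+Sg+Gen")` selects the two lengthening rules (giving /tʃolkː.kɑː/). Any of the three verb readings with ConNeg or Imprt+Sg2 selects only allegro shortening (giving /tʃolk.kɑ/).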

Long And Short Geminates

While long and short clusters are differentiated in orthography, long geminates are written with two letters, exactly like short geminates:

orthographic        phonological
guossi.nom.sg       /kuos:sii/
guossi.acc.sg       /kuossii/
guossa.nom.sg       /kuossa/

Currently, we only produce QIII geminates when primary lengthening applies. Otherwise, QIII and QII are only differentiated when the orthography differs.

Possible tags:

Problems: Nouns and verbs with QIII strong grade, such as vuoššat and golli:

In the lexicon, G3 is specified for nouns: when G3 combines with Nom.sg, Ill.sg or essive, the consonant centre is QIII. G3 is not specified for verbs, since it has not been necessary so far.
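The noun rule can be sketched as a tag check. The tag spellings (`G3`, `Sg`, `Nom`, `Ill`, `Ess`) and their order in the analysis string are assumptions for this sketch. Verbs never match, because verbs carry no G3 tag, so vuoššat stays unresolved, as noted above.

```python
# Case/number combinations in which a G3 noun gets a QIII centre (from the
# note above). Tag spellings are assumptions for this sketch.
QIII_CONTEXTS = [{"Sg", "Nom"}, {"Sg", "Ill"}, {"Ess"}]

def has_qiii_centre(analysis: str) -> bool:
    """True if a G3 noun analysis (e.g. 'golli+N+G3+Sg+Nom') calls for QIII."""
    tags = set(analysis.split("+")[1:])
    return (
        "N" in tags
        and "G3" in tags
        and any(context <= tags for context in QIII_CONTEXTS)
    )
```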

Long And Short Monophthongs

Monophthongs that stem from diphthong simplification are long before QI and QII consonant centres.

Diphthong simplification environments:

(There are also some -uj adjectives with diphthong simplification, but these are all QIII, so the monophthong is short anyway.)

A monophthong /i, o, e, u/ before /e/, /o/, /ij/ or illative /uj/ can thus be a monophthongized diphthong, or it can be an original monophthong. If it is a simplified diphthong, it is long before a QI or QII consonant centre. Check out the -at and -it verbs:

oađđit  - mun ođđen:  /oođđen/
gođđit  - mun gođđen: /kođđen/
          dan dihte:  /tihte/
diehtit - moai dihte: /tiihte/
biđđit  - son biđii:  /piđij/
diehtit - son diđii:  /tiiđij/
goahti  - gođiid:     /koođiiht/
gođđit  - son gođii:  /kođij/
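The decision illustrated by these pairs can be sketched as a comparison between the lemma and the inflected form. The sketch makes three assumptions:

- the lemma comes from the analyser, already disambiguated (see the sohpe/biđđe cases);
- the caller supplies the quantity of the consonant centre;
- a diphthong is one of ie, uo, oa, ea.

All function names are hypothetical.

```python
import re

# Orthographic diphthongs assumed for this sketch.
DIPHTHONGS = {"ie", "uo", "oa", "ea"}
VOWEL_RUN = re.compile(r"[aáeiou]+")

def first_nucleus(word: str) -> str:
    """Return the first run of vowel letters, e.g. 'oa' in 'oađđit'."""
    match = VOWEL_RUN.search(word.lower())
    if match is None:
        raise ValueError(f"no vowel in {word!r}")
    return match.group()

def has_long_monophthong(lemma: str, form: str, centre_quantity: str) -> bool:
    """Is the first-syllable monophthong of `form` long?

    Long only if it is a simplified diphthong (the lemma has a diphthong,
    the form a single vowel) and the consonant centre is QI or QII.
    """
    simplified = first_nucleus(lemma) in DIPHTHONGS and len(first_nucleus(form)) == 1
    return simplified and centre_quantity in {"QI", "QII"}
```

For instance, `has_long_monophthong("oađđit", "ođđen", "QII")` is true (/oođđen/), while the same call with lemma gođđit is false (/kođđen/).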

Some orthographic forms are also ambiguous between two readings. To differentiate, we need to use the same disambiguation as in translations:

moai sohpe: /soohpe/ from soahpat and /sohpe/ from sohpat
moai biđđe: /piiđđe/ from bieđđat and /piđđe/ from biđđit

Derived words

Some derived words have long monophthongs throughout their paradigm, and no diphthong anywhere:

firon /fiiron/, from 'fierrut'
doron /tooron/, from 'doarrut'
geso /keeso/, another form of 'geaŧŧu'
bures /puures/, from 'buorre'

We already have a vowel lengthening rule:

define VowelLengthening e -> e e , i -> i i , o -> o o , u -> u u
                        || .#. (Cns*) _ (h) %^ Cns Vow [Cns] (.#.) ;
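To make the rule's effect easy to check, here is a rough Python emulation using a regular expression. It rests on several assumptions:

- `^` marks the syllable boundary before the consonant centre;
- the consonant and vowel sets are hypothetical stand-ins for the project's `Cns` and `Vow` definitions;
- the right edge of the rule, which is partly garbled in extraction, is read as "a consonant or the end of the word". This is the reading under which geso is covered, as the text says.

```python
import re

# Hypothetical stand-ins for the xfst Cns and Vow symbol classes.
CNS = "bcdfghjklmnprstvzčđŋšŧž"
VOW = "aáeiou"

# .#. (Cns*) _ (h) %^ Cns Vow, with the right edge read as
# "a consonant or the end of the word" (an assumption).
LENGTHENING = re.compile(
    rf"^([{CNS}]*)([eiou])(h?)\^(?=[{CNS}][{VOW}](?:[{CNS}]|$))"
)

def vowel_lengthening(syllabified: str) -> str:
    """Double a first-syllable e, i, o or u in a '^'-syllabified word."""
    return LENGTHENING.sub(r"\1\2\2\3^", syllabified, count=1)
```

This gives `fii^ron`, `doo^ron`, `gee^so` and `buu^res` for the four derived words. It also lengthens `vi^sot` to `vii^sot`, although visot has a short vowel (/visoht/).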

This takes care of the four words above. However, the same environment also contains short monophthongs (especially in allegro forms):

visot   /visoht/   -  visoš   /viisoš/
spoađđu            -  spođoš  /spoođoš/
nođđu              -  nođoš   /nođoš/
bođii   /poođij/   -  gođii   /kođij/

Loanwords from Norwegian

Norwegian monosyllabic words with a long vowel and a short consonant are represented with a long monophthong in the centre. They should be marked somehow. For instance, for toga we currently produce /thoga:/ (with latus lengthening, which is wrong); we want /tho:ga/. (Note: a short vowel + long consonant is represented as QIII: buss /bus:se/, penn /phen:na/.) We might also want /tho:ga:/ from Norwegian toga.

Monosyllabic Word And Particle

When monosyllabic words are written together with a particle, they look like disyllabic words. Our converter interprets these as disyllabic and gives them a consonant centre. This is wrong, because what looks like a consonant centre is actually finis + initium (the end of the word plus the beginning of the particle). In addition, several other rules might apply, such as allegro shortening. The sentence In dovdda geange is currently transcribed as:

in toβtː.tɑː kĕæŋŋ.ke.

We want the converter to recognize the allegro-shortening environment after the negation, and to recognize the monosyllabic pronoun and particle -ge. We want:

in toβt.tɑ keæn ke

Currently, latus lengthening also applies to the particle, as in the sentence Miiba de: /mijː.pɑː te/. Compare this with the separate pronoun and particle mii ba de: /mij pɑ te/.

Many particles end with e or o, thus creating a potential allegro shortening environment. We have different transcriptions of moaige and moai ge:

/mŏɑjː.ke/ vs. /moɑj ke/

This also affects other words. Word-final consonants are not recognized as such, so /t/ remains where it should disappear:

manatgo: /ma.naat.go/   vs.   manat go: /ma.naah ko/
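One way to recognize host + particle spellings is to split off a known particle before transcription. Once the host is a separate word, word-final rules can apply to it. This is a sketch only: the particle list and the host lexicon are small hypothetical inventories built from the examples above, and a real system would take them from the morphological analyser.

```python
# Hypothetical inventories built from the examples in this section.
PARTICLES = ("ge", "go", "ba")
KNOWN_HOSTS = {"gean", "mii", "moai", "manat"}

def split_particle(word: str) -> list[str]:
    """Split a host+particle spelling into two words when the host is known."""
    lower = word.lower()
    for particle in PARTICLES:
        host = lower[: -len(particle)]
        if lower.endswith(particle) and host in KNOWN_HOSTS:
            return [host, particle]
    return [lower]

def final_t_to_h(word: str) -> str:
    """Word-final t -> h, applied once the host is a separate word."""
    return word[:-1] + "h" if word.endswith("t") else word
```

Splitting geange, Miiba and moaige yields the monosyllabic pronoun plus the particle. For manatgo, the host manat then undergoes t -> h, giving `["manah", "go"]`.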

Miscellaneous

Some small but important words are not subject to the usual phonological rules:

  1. oba is not subject to latus lengthening: /o.pa/, not /o.paa/.
  2. oppa is subject to neither latus lengthening nor secondary lengthening: /op.pa/, not /op.paa/ (unlike the noun oppas, which is correctly /op:paas/).

Once allegro shortening of consonants is up and running, it must not affect words that were originally allegro forms but are now largo, such as mielde and guokte. They have diphthong + e, which is an allegro shortening environment, but their consonant centre should still be QIII.
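These lexical exceptions could be kept in a small table that every rule consults before it applies. The sketch below makes that concrete. The table contents come from the notes above, while the rule names and the function are hypothetical. In the actual pipeline this might instead be done with lexicon tags.

```python
# Hypothetical exception lexicon built from the notes above: words that a
# rule must skip even though they meet its environment.
RULE_EXCEPTIONS = {
    "oba": {"latus_lengthening"},
    "oppa": {"latus_lengthening", "secondary_lengthening"},
    "mielde": {"allegro_consonant_shortening"},
    "guokte": {"allegro_consonant_shortening"},
}

def rule_applies(rule: str, word: str, environment_matches: bool) -> bool:
    """A rule applies when its environment matches and the word is not exempt."""
    exempt = RULE_EXCEPTIONS.get(word.lower(), set())
    return environment_matches and rule not in exempt
```

With this table, the noun oppas still gets latus lengthening, because only the exact word oppa is listed.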

Movt, govt and nuvt should not undergo word-final t -> h. We currently produce /movh/, /kovh/ and /nuvh/; we want /movht/, /kovht/ and /nuvht/. The pronunciations /movh/, /kovh/ and /nuvh/ can work utterance-internally, but not utterance-finally. Perhaps the best solution is to apply t -> ht utterance-finally and to words in isolation, and t -> h word-finally elsewhere. This has already been pointed out by Helsinki.
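The proposed split between utterance-final and utterance-internal positions can be sketched on a list of orthographic words. The sketch treats a word in isolation as a one-word utterance. It operates on spelling rather than on the real transcription pipeline, and the function names are hypothetical.

```python
def final_t(word: str, utterance_final: bool) -> str:
    """Word-final t: -> ht utterance-finally (or in isolation), -> h elsewhere."""
    if not word.endswith("t"):
        return word
    return word[:-1] + ("ht" if utterance_final else "h")

def transcribe_final_ts(words: list[str]) -> list[str]:
    """Apply final_t to each word; only the last word is utterance-final."""
    last = len(words) - 1
    return [final_t(word, i == last) for i, word in enumerate(words)]
```

For example, `transcribe_final_ts(["govt", "nuvt"])` gives `["govh", "nuvht"]`, and movt in isolation gives `movht`.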

Orthographic vuodjá has two pronunciations:

/vuoccaa/ if 3sg of /vuodjat/
/vuoc:caa/ if 3sg of /vuodjit/

This means that secondary lengthening does not apply to the first reading, even though it meets the environment description. Instead, whether the rule applies depends on the latus vowel of the infinitive, which is short in vuodjat. (Extremely interesting (morpho)phonologically; I could go on and on about it, but hopefully there is a practical solution to this.)
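One practical approach would be to let the analyser pass the infinitive along and condition secondary lengthening on that infinitive's latus vowel. The sketch below illustrates the idea. It is only a hypothesis generalized from this one vuodjat/vuodjit pair. It treats orthographic i, á and u as the long latus vowels (/ii/, /aa/, /uu/, as in the allegro section). The function names are hypothetical.

```python
# Latus vowels treated as long in this sketch: i, á, u (/ii/, /aa/, /uu/).
LONG_LATUS = {"i", "á", "u"}

def infinitive_latus(infinitive: str) -> str:
    """Latus vowel of an -at/-it/-ut infinitive, e.g. 'a' in 'vuodjat'."""
    if len(infinitive) < 3 or not infinitive.endswith("t"):
        raise ValueError(f"not an infinitive in -t: {infinitive!r}")
    return infinitive[-2]

def secondary_lengthening_applies(infinitive: str) -> bool:
    """Hypothesis: lengthen the centre only if the infinitive latus is long."""
    return infinitive_latus(infinitive) in LONG_LATUS
```

Here vuodjat gives no lengthening (/vuoccaa/) and vuodjit gives lengthening (/vuoc:caa/).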