North Sami Text-to-Speech

Finite state and Constraint Grammar based Text-to-Speech processing

This document is an initial set of reading instructions for voice talents.

Neutralizations in Guovdageaidnu dialect

gŋ, kŋ, ŋ       -> dnj, tnj, nj
ddj, dj         -> žž, čč       (in some speakers)
tk, tkk         -> sk, skk
tm, tmm         -> sm, smm
dn              -> tn           (in some speakers)
ŧ:ŧ, ŧŧ, ŧ      -> s:s, ss, s

hr:r, hrr, hr   -> r:r, rr, r   (in some speakers)
hl:l, hll, hl   -> l:l, ll, l   (in some speakers)
hm              -> m            (in some speakers)
ih:l, ihll      -> il, ill      (in some speakers)
ih:m, ihmm      -> im, imm      (in some speakers)
ih:n, ihnn      -> in, inn      (in some speakers)
vh:l, vhll      -> vl, vll      (in some speakers)
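
To help spot script words where these neutralizations could erase a contrast, the table above can be encoded as data. The following is a minimal sketch, not part of the project's tooling. The names `NEUTRALIZATIONS`, `neutralizable_clusters` and `neutralized_spelling` are hypothetical. It assumes plain orthography without length marks, covers only some rows of the table, and works on spelling alone, so it may flag false positives (for example `dn` inside `dnj`).

```python
import re

# A subset of the table above, written as orthographic clusters mapped to
# their neutralized counterparts. Length-marked forms such as "ŧ:ŧ" are
# left out (assumption: the script uses plain orthography).
NEUTRALIZATIONS = {
    "ddj": "žž",
    "dj": "čč",
    "tk": "sk",
    "tm": "sm",
    "dn": "tn",
    "ŧ": "s",
    "hr": "r",
    "hl": "l",
    "hm": "m",
}

# Try the longest clusters first so that "ddj" wins over "dj".
_PATTERN = re.compile(
    "|".join(sorted(map(re.escape, NEUTRALIZATIONS), key=len, reverse=True))
)


def neutralizable_clusters(word):
    """Return the clusters in `word` that some speakers might neutralize."""
    return _PATTERN.findall(word.lower())


def neutralized_spelling(word):
    """Return a rough spelling of `word` with the substitutions applied."""
    return _PATTERN.sub(lambda m: NEUTRALIZATIONS[m.group(0)], word.lower())


if __name__ == "__main__":
    for w in ["gotka", "fátmi", "eadni", "muoŧŧá", "skuhrrat", "guossa"]:
        print(w, neutralizable_clusters(w), neutralized_spelling(w))
```

A word whose neutralized spelling differs from its written form is one where the reader needs to take particular care to pronounce it as written.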

In reading, the following contrasts should be maintained:

voiceless sonorants:    skuhrrat vs. skurrat, liehmu vs. liema
palatalized stops:      moddját vs. gožžat, bidjat vs. gičču
t-clusters:             gotka vs. goaski, fátmi vs. leasmi
voiced nasal clusters:  eadni vs. eatni
ŧ:                      muoŧŧá vs. guossa

Notice also the difference between muoŧŧái (to maternal aunt) and muoŧ’ŧái (having many maternal aunts). They are spelt the same way, but pronounced differently. Anticipating some disambiguation in the future, we should maintain the difference in reading. Muoŧ’ŧái is the only word with QIII quantity of ŧ.

(This leaves one neutralization, velar vs. coronal nasal clusters. I don’t think it is realistic to expect speakers to be able to pronounce the velar clusters while at the same time maintaining good reading fluency.)

Choice of words/suffixes

Some words have been chosen because they contain infrequent consonant centres. One example is gieđbmi - gieđmmi. The variant used in Guovdageaidnu is gievdni - gievnni. To ensure enough occurrences of đbm-đmm, gieđbmi is used in the text, and must not be replaced by gievdni. Another example is guđju. It should not be read as gulju.

Some words are represented with all their different consonant centres: bispa and bisma are both used in the texts, and should be read as written, to ensure enough occurrences of sp and sm. The word for cloth can be pronounced in a number of ways, all of which are represented in the text: limsku, livsku and linsku. These should be read as they are written. All of these consonant centres are infrequent and occur in few words.

The eastern suffix -smit is used three times, instead of western -smuvvat. This is to ensure enough occurrences of –sm– in the consonant margin. The same goes for -rmit instead of -rmuvvat.

The word vuobirs is used instead of vuobis, which is the more common variant. Voice talents should nevertheless read the word as it is spelled.

The suffix -lmas has a different variant -lvas, which is used in the word buozalvas.
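
The goal of having enough occurrences of each rare consonant centre can be checked mechanically against a draft script. The sketch below is illustrative only: the function names and the `minimum` threshold are hypothetical, since this document does not state how many occurrences count as enough. It also matches clusters as plain substrings of the spelling, which is only an approximation of a real consonant-centre analysis.

```python
import re
from collections import Counter


def cluster_counts(text, clusters):
    """Count how many word tokens in `text` contain each cluster."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter({c: 0 for c in clusters})
    for word in words:
        for c in clusters:
            if c in word:
                counts[c] += 1
    return counts


def under_represented(text, clusters, minimum):
    """Return the clusters found in fewer than `minimum` word tokens."""
    counts = cluster_counts(text, clusters)
    return sorted(c for c in clusters if counts[c] < minimum)


if __name__ == "__main__":
    # Toy script built from words mentioned in this document.
    script = "gieđbmi bispa bisma limsku livsku linsku"
    targets = ["đbm", "sp", "sm", "msk", "vsk", "nsk", "đj"]
    print(cluster_counts(script, targets))
    print(under_represented(script, targets, minimum=1))
```

Clusters reported by `under_represented` would be candidates for adding further words or suffix variants to the script.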

Onsets

Certain onsets are typical of the Eastern dialects, such as gl–. The word glássa is láse in Guovdageaidnu. However, in order to ensure enough occurrences of gl in onset position, glása occurs in the text and should be read as it is written. Other such words are flágga and flásku. The onset sr– is rare in the Guovdageaidnu dialect. In the text, some words will occur with these onsets, such as sroba and srubistit (šlubistit in Guovdageaidnu), and the words should be pronounced as written.

The texts have been adapted to Guovdageaidnu reading, so the few “alien” things remaining should be read as they are written.