Lule Sami Text-to-Speech

Finite state and Constraint Grammar based Text-to-Speech processing

View the project on GitHub giellalt/speech-smj


TTS synthesis notes by KHA

General pipeline / workflow for training data for synthesis:

Text data -> cleaning, correcting -> tokenization, normalization -> POS tagging, morphological analysis etc -> G2P -> Prosody modeling -> Baseline for training the synthesis.

Commented version:

Text data
-> cleaning, correcting # (manual) correction of the text
-> tokenization, normalization # Normalization: digits, abbreviations, etc. What about parentheses?
-> POS tagging, morphological analysis etc # GiellaLT stuff
-> G2P # This is the real meat
-> Prosody modeling # intonation, stress, extract features from Praat, perhaps using explicit symbols from the analysis step?
-> Baseline for training the synthesis.
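
As a rough illustration of the tokenization/normalization step, the sketch below only flags the tokens that would need attention (digits, abbreviations, parentheses) and does not try to expand them. Everything in it is an assumption made for illustration: the function names, the token categories, and the abbreviation list are hypothetical, and the abbreviation entries are English placeholders, not Lule Sami data. Actual expansion would come from the language resources.

```python
import re

# Hypothetical placeholder list; real entries would come from the language resources.
ABBREVIATIONS = {"etc.", "e.g."}

# Digit runs, letter runs, or single punctuation characters.
PIECE_RE = re.compile(r"\d+|[^\W\d_]+|[^\w\s]|_")


def tokenize(text):
    """Split on whitespace, keep known abbreviations whole, split the rest."""
    tokens = []
    for chunk in text.split():
        if chunk in ABBREVIATIONS:
            tokens.append(chunk)
        else:
            tokens.extend(PIECE_RE.findall(chunk))
    return tokens


def classify(token):
    """Assign a coarse category so later steps know what needs expansion."""
    if token.isdigit():
        return "digits"
    if token in ABBREVIATIONS:
        return "abbrev"
    if token in ("(", ")"):
        return "paren"
    if token.isalpha():
        return "word"
    return "punct"


def flag_for_normalization(text):
    """Return (token, category) pairs for one line of cleaned text."""
    return [(tok, classify(tok)) for tok in tokenize(text)]


if __name__ == "__main__":
    for pair in flag_for_normalization("Page 12 (draft) etc."):
        print(pair)
```

One known limitation of this sketch: an abbreviation directly followed by punctuation (for example "etc.,") is not recognized, because the lookup is done on whole whitespace-separated chunks. Flagging parentheses as their own category leaves open the question raised above of how they should be treated.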

Candidate technologies/engines

WaveNet

Pros and cons:

Facebook VoiceLoop

Pros and cons:

Mozilla TTS

Pros and cons:

(Mozilla) Tacotron

Other systems