Finite state and Constraint Grammar based Text-to-Speech processing
View the project on GitHub giellalt/speech-smj
General pipeline / workflow for training data for synthesis:
Text data -> cleaning, correcting -> tokenization, normalization -> POS tagging, morphological analysis etc -> G2P -> Prosody modeling -> Baseline for training the synthesis.
Commented version:
Text data
-> cleaning, correcting # (manual) correction of the text
-> tokenization, normalization # Normalization: digits, abbreviations, etc. What about parentheses?
-> POS tagging, morphological analysis etc # GiellaLT stuff
-> G2P # This is the real meat
-> Prosody modeling # intonation, stress, extract features from Praat, perhaps using explicit symbols from the analysis step?
-> Baseline for training the synthesis.
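The stages above can be sketched as a chain of small functions, each consuming the output of the previous one. This is only an illustration of the data flow, not the project's implementation: every function name, the `PLACEHOLDER_EXPANSIONS` table and the `UNKNOWN` tag are hypothetical. The analysis and G2P steps are stubs standing in for the GiellaLT tools and the real grapheme-to-phoneme rules, and prosody modeling is left out entirely.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical expansion table; real entries would be written-out
# forms of digits and abbreviations in the target language.
PLACEHOLDER_EXPANSIONS: Dict[str, str] = {"2": "TWO", "etc": "ETCETERA"}


def clean(text: str) -> str:
    """Collapse whitespace; manual correction would happen before this."""
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> List[str]:
    """Split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def normalize(tokens: List[str], expansions: Dict[str, str]) -> List[str]:
    """Replace digits and abbreviations found in the expansion table."""
    return [expansions.get(tok, tok) for tok in tokens]


def analyse(tokens: List[str]) -> List[Tuple[str, str]]:
    """Stub for POS tagging and morphological analysis (assumption)."""
    return [(tok, "UNKNOWN") for tok in tokens]


def g2p(analysed: List[Tuple[str, str]]) -> List[Tuple[str, str, List[str]]]:
    """Stub G2P: uses the letters themselves as stand-in symbols."""
    return [(tok, tag, list(tok)) for tok, tag in analysed]


def run_pipeline(text: str) -> List[Tuple[str, str, List[str]]]:
    """Run the stages in the order given in the workflow."""
    tokens = tokenize(clean(text))
    tokens = normalize(tokens, PLACEHOLDER_EXPANSIONS)
    return g2p(analyse(tokens))


if __name__ == "__main__":
    for row in run_pipeline("  Test   2 etc. "):
        print(row)
```

Keeping each stage as a separate function means a stub can later be swapped for a call to the real tool without touching the other stages.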
Pros and cons: