GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started and our Privacy document.

Developing TTS

Recording

Finding voice talents and working with them

Technical setup

Microphones & room acoustics

Sound card, sampling rate & DAW

Soundfiles and backups

Building a manuscript/Preparing a TTS text corpus

Text prompting

Post-processing of recordings

All of these can be done with an AI-based “resynthesis” tool called Resemble-Enhance, which is available on GitHub. It removes echo and noise very well, even from very poor-quality material. How this affects the synthesis output is, however, still not well understood. Running Resemble-Enhance can require a computing cluster because it needs substantial computing power. We used our Sigma2 computing cluster for this.

Transcribing the recordings

Creating the final speech corpus with sound files and corresponding text transcripts

Splitting the recordings and text transcripts to approx. sentence-long individual files

Some tips for using WebMAUS:

* Audio files over 200 MB / 30 minutes should be split into smaller chunks first; otherwise the aligner will not work or will run very slowly.
* A tip for very long audio files: use the "Pipeline without ASR" service with the G2P -> Chunker -> MAUS options.
* There is no Sámi model available in WebMAUS, but the Finnish model works for Sámi. Note that any numbers etc. in the text input would be normalized as Finnish, so make sure numbers are normalized before using WebMAUS!
* First, you need to upload identically named .txt and .wav pairs.
* To retain the original punctuation, choose this Pipeline name: G2P->MAUS->SUBTITLE!
* WebMAUS automatically outputs a Praat .TextGrid annotation file with 4 annotation layers (tiers): boundaries on the phoneme and word levels, and additionally a tier named “TRN” that contains the original sentences with the original punctuation retained!
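
Two of the tips above (identically named .txt and .wav pairs, and the 200 MB / 30 minute limit) can be checked locally before uploading. The following is a minimal sketch, not part of the GiellaLT tooling. It assumes uncompressed PCM WAV files that Python's standard `wave` module can read, and it treats exceeding either the size or the duration limit as "too large". The function names and the report layout are hypothetical.

```python
import wave
from pathlib import Path

# Limits taken from the tips above; treat them as approximate.
MAX_BYTES = 200 * 1024 * 1024
MAX_SECONDS = 30 * 60


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_upload_dir(directory: Path,
                     max_bytes: int = MAX_BYTES,
                     max_seconds: float = MAX_SECONDS) -> dict:
    """Report unpaired files and paired WAV files that exceed the limits."""
    wavs = {p.stem: p for p in directory.glob("*.wav")}
    txts = {p.stem: p for p in directory.glob("*.txt")}
    report = {
        # WAV files without a transcript of the same name
        "missing_txt": sorted(wavs.keys() - txts.keys()),
        # Transcripts without a WAV file of the same name
        "missing_wav": sorted(txts.keys() - wavs.keys()),
        # Paired recordings that should be split before alignment
        "too_large": [],
    }
    for stem in sorted(wavs.keys() & txts.keys()):
        path = wavs[stem]
        too_big = path.stat().st_size > max_bytes
        too_long = wav_duration_seconds(path) > max_seconds
        if too_big or too_long:
            report["too_large"].append(stem)
    return report
```

Calling `check_upload_dir(Path("recordings"))` (the directory name is only an example) returns a dictionary listing file stems with a missing partner and recordings that should be split first. Splitting itself is left out here, because the matching transcript has to be divided at the same points as the audio.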

Text processing / normalisation

Using rule-based technologies (Sjur writes this)

Building a voice

Combining parts

CI/CD & package distribution