GiellaLT

GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology.

Developing TTS

Recording

Finding voice talents and working with them

Technical setup

Microphones & room acoustics

Sound card, sampling rate & DAW

Soundfiles and backups

Building a manuscript/Preparing a TTS text corpus

Text prompting

Post-processing of recordings

All of these steps can be done with an AI-based “resynthesis” tool called Resemble-Enhance, which is available on GitHub. It removes echo and noise very well, even from very poor-quality material. The consequences of this for the synthesis output are, however, still not well known. Running Resemble-Enhance can require a computing cluster, because it needs substantial computing power. We used our Sigma2 computing cluster for this.

Transcribing the recordings

Creating the final speech corpus with sound files and corresponding text transcripts

Splitting the recordings and text transcripts to approx. sentence-long individual files
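
As a rough illustration of the text side of this step, the following minimal sketch splits a transcript into approximately sentence-long chunks and writes each chunk to its own numbered file, so that it can later be paired with an audio file of the same name. The splitting rule, the function names (`split_sentences`, `write_segments`) and the file naming scheme (`utt_0001.txt`) are assumptions made for this example, not the procedure actually used in the project.

```python
import re
from pathlib import Path

# Assumed rule for this sketch: a sentence ends at ".", "!" or "?"
# followed by whitespace. Real corpora usually need language-specific rules.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split a transcript into approximately sentence-long chunks."""
    # Collapse line breaks and repeated spaces before splitting.
    normalised = " ".join(text.split())
    return [s for s in SENTENCE_END.split(normalised) if s]


def write_segments(text: str, out_dir: Path, prefix: str = "utt") -> list[Path]:
    """Write one numbered text file per sentence (hypothetical naming scheme)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, sentence in enumerate(split_sentences(text), start=1):
        path = out_dir / f"{prefix}_{i:04d}.txt"
        path.write_text(sentence + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

A punctuation-based rule like this will split incorrectly after abbreviations such as "approx.", so the output should be checked by hand or replaced with a proper tokeniser. The corresponding audio segments would have to be cut separately and saved under matching names.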

Text processing / normalisation

Using rule-based technologies (Sjur writes this)

Building a voice

Combining parts

CI/CD & package distribution