Preparing text for TTS
Collect enough text to be read aloud and recorded as training material. A model can be built from 3-12 hours of speech; a good target is 10 hours, which corresponds to approximately 45000-50000 words. Collecting and especially preparing the text may take several months.
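The relation between word count and recording time can be checked with a few lines of code. The following is a rough sketch, not part of any existing tool: it assumes the rule of thumb given above (10 hours of speech for 45000-50000 words, i.e. about 4500-5000 words per hour), and the names `count_words` and `estimated_hours` are hypothetical. Actual speaking rate varies by speaker and text type.

```python
import re

# Runs of letters only (Unicode-aware, so á, š, đ etc. count as letters).
# Digits and punctuation are not counted as words in this rough sketch.
WORD_RE = re.compile(r"[^\W\d_]+")

# Assumption taken from the rule of thumb above:
# 10 hours of speech is roughly 45000-50000 words.
WORDS_PER_HOUR = (4500, 5000)


def count_words(text):
    """Count letter-only tokens in the text."""
    return len(WORD_RE.findall(text))


def estimated_hours(word_count):
    """Return a (low, high) estimate of speech hours for a word count."""
    slow_rate, fast_rate = WORDS_PER_HOUR
    return word_count / fast_rate, word_count / slow_rate


if __name__ == "__main__":
    low, high = estimated_hours(30000)
    print(f"30000 words is roughly {low:.1f}-{high:.1f} hours of speech")
```

Running the estimate on the collected text from time to time gives an early indication of how far the collection is from the 10-hour target.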
Keep in mind:
- The text should cover digraph sequences, consonant gradation strings, etc.
- The text should be balanced topic-wise
- It should contain numbers of different types (ordinals, cardinals, years, dates, clock expressions)
- It should also contain loan words
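Some of the points above can be checked automatically. The sketch below is only an illustration under stated assumptions: the list of letter sequences to look for is supplied by the user (the example sequences are placeholders, not a linguistically motivated inventory), numbers are detected simply as runs of digits, and the function names `missing_sequences` and `count_digit_tokens` are hypothetical. Topic balance and loan words still need manual review.

```python
import re

# Runs of letters only (Unicode-aware); digits and punctuation are excluded.
WORD_RE = re.compile(r"[^\W\d_]+")
# Any run of digits, used as a crude indicator of numeric expressions.
DIGITS_RE = re.compile(r"\d+")


def missing_sequences(text, wanted):
    """Return the wanted letter sequences that occur in no word of the text."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    return sorted(
        seq for seq in wanted
        if not any(seq.lower() in word for word in words)
    )


def count_digit_tokens(text):
    """Count runs of digits (years, dates, clock times all count here)."""
    return len(DIGITS_RE.findall(text))


if __name__ == "__main__":
    sample = "Mun lean Sámi. Dat lea 2025 ja 12 beaivvi."
    # Placeholder sequences; replace with the inventory for your language.
    print(missing_sequences(sample, ["ea", "hk", "pm"]))
    print(count_digit_tokens(sample))
```

A report of missing sequences can guide the search for additional texts before recording starts.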
Links to text collections
So far, TTS systems have been built for North, Lule and South Sámi. Currently (2025), projects for Inari Sámi, Kven and Meänkieli are being planned or are under way. The text collections are kept in closed corpora, which are counterparts to the open GitHub repositories for the languages for which TTS systems are being made.
Here are the open git repositories: