North Sami Text-to-Speech

Finite state and Constraint Grammar based Text-to-Speech processing

Project repository on GitHub: giellalt/speech-sme


Preprocessing is our term for converting orthographic text into a string of IPA (or similar) symbols suitable as input to the speech synthesis engine.
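To make the idea of orthography-to-IPA conversion concrete, here is a toy C++ sketch that rewrites text by greedy longest match against a table. It is not how the project implements preprocessing (the real rules are meant to run as HFST transducers); the `rewriteLongestMatch` function and the table are hypothetical, and the table covers only a handful of North Sami letters with approximate values:

```cpp
#include <algorithm>
#include <map>
#include <string>

// Hypothetical rewrite table: orthographic letter -> IPA (UTF-8 strings).
// Deliberately incomplete: a few North Sami letters with approximate values.
using RewriteTable = std::map<std::string, std::string>;

const RewriteTable kToyNorthSamiTable = {
    {"č", "tʃ"},
    {"š", "ʃ"},
    {"đ", "ð"},
    {"ŧ", "θ"},
};

// Greedy longest-match rewriting: at each position, try the longest key first.
// Bytes that start no key are copied unchanged, so unlisted letters
// (including multi-byte UTF-8 ones) pass through as-is.
std::string rewriteLongestMatch(const std::string& input, const RewriteTable& table) {
    std::size_t maxKey = 0;
    for (const auto& entry : table) maxKey = std::max(maxKey, entry.first.size());

    std::string out;
    std::size_t pos = 0;
    while (pos < input.size()) {
        bool matched = false;
        const std::size_t limit = std::min(maxKey, input.size() - pos);
        for (std::size_t len = limit; len > 0; --len) {
            const auto it = table.find(input.substr(pos, len));
            if (it != table.end()) {
                out += it->second;
                pos += len;
                matched = true;
                break;
            }
        }
        if (!matched) out += input[pos++];
    }
    return out;
}
```

A flat table like this cannot express context-dependent rules (the same letter pronounced differently in different environments), which is exactly what finite-state rewrite rules handle well.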

Technologies

The preprocessing will be done using only the following two technologies: HFST (finite-state transducers) and VISLCG3 (Constraint Grammar).

Both HFST and VISLCG3 provide runtime C or C++ libraries. Scripting languages such as Perl or Python will not be used, as they are poorly suited to distribution as binary packages.

Architecture

The basic architecture of the preprocessing can be illustrated with the following picture:

Preprocessing architecture

Modifications compared to our regular development environment

In the regular processing of text we have the following pipeline:

text -> preprocess.pl -> morph. analysis -> lookup2cg.pl -> disamb.
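Each arrow in this pipeline passes a text stream from one program to the next. As a rough C++ sketch (the `Stage` type and `runPipeline` function are hypothetical, not part of HFST or VISLCG3), the chain can be modelled as an ordered list of string-to-string stages, so that one stage, such as a Perl script, can be swapped for a compiled replacement without touching the others:

```cpp
#include <functional>
#include <string>
#include <vector>

// A stage reads the whole text stream produced by the previous stage and
// returns a new stream, mirroring one program in a Unix pipe.
using Stage = std::function<std::string(const std::string&)>;

// Run the stages in order, feeding each one the output of the previous one.
std::string runPipeline(const std::vector<Stage>& stages, std::string data) {
    for (const Stage& stage : stages) data = stage(data);
    return data;
}
```

In a real binary each stage would wrap a library call rather than a lambda, but the data flow stays the same.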

Because we cannot rely on Perl being available, we can use neither preprocess.pl nor lookup2cg.pl. Instead, we need a replacement based on hfst-proc. This entails a couple of tasks:
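For orientation, the core of the conversion that lookup2cg.pl performs can be sketched in C++. This is a simplified, hypothetical reimplementation, assuming tab-separated lookup output (`form<TAB>lemma+Tag1+Tag2`, with a blank line between tokens) and the VISL CG3 stream format as the target; it ignores compounds, multi-word tokens and any other special cases the Perl script may handle, and the function names are illustrative only:

```cpp
#include <sstream>
#include <string>

// Turn one analysis, e.g. "lemma+N+Sg+Nom", into a CG3 reading line,
// e.g. "\t\"lemma\" N Sg Nom". Simplification: the lemma is assumed to be
// everything before the first '+', and every '+' starts a new tag.
std::string analysisToReading(const std::string& analysis) {
    std::size_t plus = analysis.find('+');
    std::string reading = "\t\"" + analysis.substr(0, plus) + "\"";
    while (plus != std::string::npos) {
        const std::size_t next = analysis.find('+', plus + 1);
        const std::string tag = (next == std::string::npos)
            ? analysis.substr(plus + 1)
            : analysis.substr(plus + 1, next - plus - 1);
        if (!tag.empty()) reading += " " + tag;
        plus = next;
    }
    return reading;
}

// Convert lookup-style output ("form<TAB>analysis" lines, a blank line
// between tokens) into the CG3 stream format: one "<form>" cohort line
// followed by one indented reading per analysis.
std::string lookupToCg(const std::string& lookupOutput) {
    std::istringstream in(lookupOutput);
    std::string line;
    std::string currentForm;
    std::string out;
    while (std::getline(in, line)) {
        const std::size_t tab = line.find('\t');
        if (tab == std::string::npos) {  // blank line: the current token is done
            currentForm.clear();
            continue;
        }
        const std::string form = line.substr(0, tab);
        if (form != currentForm) {  // first analysis of a new token
            out += "\"<" + form + ">\"\n";
            currentForm = form;
        }
        out += analysisToReading(line.substr(tab + 1)) + "\n";
    }
    return out;
}
```

Unknown words marked with a trailing `+?` come out as a reading with the single tag `?`, which keeps them visible to the disambiguation stage.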

Another change is that, to ensure round-trip consistency, we need to use +v1 tags and to augment the morphological analysis with tags for all variation in the morphology, e.g. locatives ending in -n. The speech synthesis must render the input text as closely as possible, including misspellings and alternative (including non-standard) inflections and word forms.
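One way to check round-trip consistency is to generate a surface form from each analysis and keep only the analyses that reproduce the input exactly. The minimal sketch below assumes a hypothetical generator modelled as a lookup table from analysis to a single surface form (in practice this would be a generating transducer, which may yield several forms); `roundTripConsistent` and the placeholder forms in the test are illustrative only:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical generator: maps a full analysis string to its surface form.
using Generator = std::map<std::string, std::string>;

// Keep only the analyses whose generated form equals the original surface
// form, i.e. the analyses that survive an analysis -> generation round trip.
std::vector<std::string> roundTripConsistent(const std::string& surface,
                                             const std::vector<std::string>& analyses,
                                             const Generator& generate) {
    std::vector<std::string> kept;
    for (const std::string& analysis : analyses) {
        const auto it = generate.find(analysis);
        if (it != generate.end() && it->second == surface) kept.push_back(analysis);
    }
    return kept;
}
```

If a variant form only received the standard analysis (no +v1 tag), this check would return an empty list, signalling that the information needed to render the input faithfully has been lost.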

Benefits

With the proposed architecture, we get a couple of benefits:

Drawbacks and their counteractions

The main drawback is that we get a much more complex system, which increases the risk of introducing bugs and errors. We must compensate for this by thoroughly testing each component of the pipeline, as well as the complete package.
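Component testing of this kind is often done with golden input/expected-output pairs. As a minimal sketch (the `TestCase` struct and `failingInputs` function are hypothetical, not an existing test framework), each pipeline stage can be run over such a list and the mismatches collected:

```cpp
#include <functional>
#include <string>
#include <vector>

// One golden test: a component input and the output we expect for it.
struct TestCase {
    std::string input;
    std::string expected;
};

using Component = std::function<std::string(const std::string&)>;

// Run every test case through the component and collect the inputs whose
// actual output differs from the expected output.
std::vector<std::string> failingInputs(const Component& component,
                                       const std::vector<TestCase>& cases) {
    std::vector<std::string> failures;
    for (const TestCase& tc : cases) {
        if (component(tc.input) != tc.expected) failures.push_back(tc.input);
    }
    return failures;
}
```

The same harness can be pointed at the whole pipeline, treated as a single component, to test the complete package end to end.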