Finite state and Constraint Grammar based Text-to-Speech processing
View the project on GitHub giellalt/speech-sme
Preprocessing is our term for converting orthographic text to a string of IPA (or similar) symbols, suitable as input to the speech synthesis engine.
The preprocessing will be done with the following two technologies only: HFST and VISLCG3.
Both provide runtime C or C++ libraries. No scripting languages such as Perl or Python will be used, as they are poorly suited to distribution as binary packages.
The basic architecture of the preprocessing can be illustrated with the following picture:
In the regular processing of text we have the following pipeline:
text -> preprocess.pl -> morph. analysis -> lookup2cg.pl -> disamb.
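To make the lookup2cg.pl step of this pipeline concrete, here is a minimal C++ sketch of the kind of reformatting it performs: turning tab-separated lookup-style analyses into CG cohorts. It is an illustration under stated assumptions, not the actual behaviour of lookup2cg.pl or hfst-proc. The function names `to_cg_reading` and `lookup_to_cg` are hypothetical, and the input is assumed to have one `wordform<TAB>lemma+Tag+Tag` analysis per line, with a blank line between word forms.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of HFST or VISLCG3): turn one analysis
// string such as "lemma+Tag1+Tag2" into a CG reading line body,
// i.e. a tab, the quoted lemma, then the tags separated by spaces.
std::string to_cg_reading(const std::string& analysis) {
    std::string out;
    std::size_t start = 0;
    bool first = true;
    while (start <= analysis.size()) {
        std::size_t end = analysis.find('+', start);
        if (end == std::string::npos) end = analysis.size();
        const std::string field = analysis.substr(start, end - start);
        if (first) {
            out = "\t\"" + field + "\"";   // the first field is the lemma
            first = false;
        } else if (!field.empty()) {
            out += " " + field;            // the remaining fields are tags
        }
        start = end + 1;
    }
    return out;
}

// Hypothetical converter: reads lookup-style lines and writes CG cohorts.
// Assumed input: "wordform<TAB>analysis[<TAB>weight]", blank line between words.
void lookup_to_cg(std::istream& in, std::ostream& out) {
    std::string line;
    std::string current;       // word form of the cohort being written
    bool in_cohort = false;
    while (std::getline(in, line)) {
        if (line.empty()) {    // a blank line closes the current cohort
            in_cohort = false;
            continue;
        }
        const std::size_t tab = line.find('\t');
        if (tab == std::string::npos) continue;   // skip lines without an analysis
        const std::string wordform = line.substr(0, tab);
        const std::size_t tab2 = line.find('\t', tab + 1);
        const std::string analysis =
            (tab2 == std::string::npos) ? line.substr(tab + 1)
                                        : line.substr(tab + 1, tab2 - tab - 1);
        if (!in_cohort || wordform != current) {
            out << "\"<" << wordform << ">\"\n";  // cohort header: "<wordform>"
            current = wordform;
            in_cohort = true;
        }
        out << to_cg_reading(analysis) << '\n';
    }
}
```

Calling `lookup_to_cg(std::cin, std::cout)` would let the function sit between an analyser and the disambiguator in a shell pipeline. In the proposed setup the same effect would have to come from hfst-proc itself rather than from a separate program.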
Now, because we can’t rely on the availability of Perl, we can use neither preprocess.pl nor lookup2cg.pl. Instead we need something based on hfst-proc. This entails a couple of things to be done:
- Make sure the output of hfst-proc is correctly formatted as input to CG.
- [...] hfst-proc for this.
- lookup2cg.pl [...] hfst-proc to only output the simplest analyses (local disambiguation).
- hfst-proc will do segmentation for us, so we need to make sure all multiword expressions are covered as such in our lexicons.
Another change is that, to ensure round-trip consistency, we need to make use of +v1 tags and also augment the morphological analysis with tags for all variation in the morphology, e.g. locatives ending in –n. The speech synthesis needs to render the input text as closely as possible, including misspellings and alternate (including non-standard) inflections and word forms.
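The requirement above that only the simplest analyses be output (local disambiguation) can be sketched as a small filter in C++. This is a minimal illustration, not the behaviour of hfst-proc or lookup2cg.pl. It assumes that "simplest" means the fewest compound and derivation markers, and that those markers appear in the analysis string as `+Cmp#` and `+Der`. Both the measure and the function names (`count_occurrences`, `complexity`, `simplest_analyses`) are assumptions made for the example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Count non-overlapping occurrences of marker in text.
std::size_t count_occurrences(const std::string& text, const std::string& marker) {
    if (marker.empty()) return 0;
    std::size_t count = 0;
    for (std::size_t pos = text.find(marker); pos != std::string::npos;
         pos = text.find(marker, pos + marker.size())) {
        ++count;
    }
    return count;
}

// Assumed complexity measure: number of compound boundaries ("+Cmp#")
// plus number of derivation tags (anything starting with "+Der").
std::size_t complexity(const std::string& analysis) {
    return count_occurrences(analysis, "+Cmp#") +
           count_occurrences(analysis, "+Der");
}

// Keep only the analyses that share the lowest complexity score.
// Ties are all kept, in input order, so that CG can choose between them later.
std::vector<std::string> simplest_analyses(const std::vector<std::string>& analyses) {
    std::vector<std::string> kept;
    if (analyses.empty()) return kept;
    std::size_t best = complexity(analyses.front());
    for (const auto& a : analyses) best = std::min(best, complexity(a));
    for (const auto& a : analyses) {
        if (complexity(a) == best) kept.push_back(a);
    }
    return kept;
}
```

Given a lexicalised reading and a compound reading of the same word form, the filter keeps only the lexicalised one. Readings that differ only in inflectional tags get the same score and are all passed on to the disambiguator.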
With the proposed architecture we get a couple of benefits:
The main drawback is that we get a much more complex system, which increases the risk of introducing bugs and errors. We thus need to compensate for this by thoroughly testing each component of the pipeline, as well as the package as a whole.