Conversion to IPA
We now have a working setup. Below are first instructions for getting started, followed by the original specifications, and a list of open issues.
Specification
Input is a VISLCG3 cohort usually containing a Phon element (string with Phon attached at the end):
"<Dr.>"
"dåktår" Area/NO N Sem/Hum Sg Acc "dåktår>av"Phon <W:0.0>
"dr" Area/NO N Sem/Hum ABBR Gram/TAbbr Sg Acc <W:0.0>
The conversion is done using one or more rule transducers. That is, the output should be something like this (with huge reservations for the actual IPA!):
"<Dr.>"
"dåktår" Area/NO N Sem/Hum Sg Acc "dɔkʰtɔrav"Phon <W:0.0>
"dr" Area/NO N Sem/Hum ABBR Gram/TAbbr Sg Acc <W:0.0>
Compounds
"<álggosijddo>"
"sijddo" N Sem/Obj-surfc Sg Nom "álggo#sijddo"Phon <W:0.0> @SUBJ> #7->16
"álggo" N Sem/Time Cmp/SgNom Cmp "álggo#sijddo"Phon <W:0.0> #7->16
Fake three tape fst — abstract phonemic representation
There is now support for building TTS specific analysers that do the analysis in two steps:
- step:
echo åludallabehtit | hfst-lookup -q src/analyser-tts-gt-input.hfstol åludallabehtit oallo»X7dalla>behti>t 0,000000 - step:
echo 'oallo»X7dalla>behti>t' | hfst-lookup -q src/analyser-tts-gt-output.hfstol oallo»X7dalla>behti>t oallot+V+TV+Der/dalla+V+Ind+Prs+Pl2+Err/Orth 0,000000
And hfst-pmatch and hfst-tokenise have been extended such that given the above input, the output should be:
echo åludallabehtit | hfst-tokenise -g tools/tokenisers/tokeniser-tts-cggt-desc.pmhfst
"<åludallabehtit>"
"oallot" V TV Der/dalla V Ind Prs Pl2 Err/Orth "oallo»X7dalla>behti>t"MIDTAPE <W:0.0>
:\n
Using this, we have access to underlying abstractions over phonemes, and more detailed consonant gradation information, which should help us produce more accurate and better speech synthesis.
To use this setup, please configure as follows:
MISSING! MIDTAPE is always printed, also in cases where the MIDTAPE is identical to the surface string. It should not, since that will interfere with phon data from other steps.
Input data priority and IPA conversion
The processing goes as follows:
hfst-tokenisegives a MIDTAPE representation if different from surface form- normalisation turns abbreviations into words, and store them in
phonstrings - in IPA conversion, MIDTAPE takes presedence over
phonstrings - in IPA conversion, if neither MIDTAPE nor
phonstring is present, use word form as starting point (also handles cases of no analysis) - use different conversion fst’s for MIDTAPE and
phonstrings, and possibly also for different OLang tags
Given that we retain the VISLCG3 cohort stream even after the IPA conversion, it is possible to do further disambiguation afterwords, if needed or wanted.
Outputting IPA
We need a very simple tool that will take a VISLCG3 stream, and for each cohort only output the Phon strings (possibly with the plain-text wordform as comment, for debugging).
Given this input:
"<Dr.>"
"dåktår" Area/NO N Sem/Hum Sg Acc "dɔkʰtɔrav"Phon <W:0.0>
"dr" Area/NO N Sem/Hum ABBR Gram/TAbbr Sg Acc <W:0.0>
return this (or something similar, must be compatible with the synthesiser engines we want to use):
dɔkʰtɔrav # Dr.
This string is then fed to the synthesiser.
Ambiguity handling
In cases where there is still ambiguity left in the CG stream, proceed as follows:
- if the resulting IPA strings are identical, give a warning to
stderr, and proceed - if the resulting IPA strings are different AND in debug/verbose mode, print the amboiguous strings to
stderr, and stop - if the resulting IPA strings are different AND NOT in debug/verbose mode AND they have different weights: pick the one with the lowest weight, print a warning on
stderr, and proceed - if the resulting IPA strings are different AND NOT in debug/verbose mode AND they have identical weights: pick the first one, print a warning on
stderr, and proceed
Open issues
- punctuation conversion: they are presently left as is, but should possibly be converted to symbols for various breaks, intonation modulation etc
- exceptional pronunciation: we have not yet decided, but it could either be specified in
src/phonetics/txt2ipa.xfscriptor in the lexc files. Using thetxt2ipafile will work right out of the box, lexc requires some hard thinking - make use of syntactic parsing to govern prosody