North Sami Text-to-Speech

Finite state and Constraint Grammar based Text-to-Speech processing



Sámi Speech Synthesis / TTS planning meeting

Topics:

Financing:

Political view

Meetings

The Divvun project board meeting

Based upon the 2003 report.

Less than a year, and approx. NOK 3.8 / 4.2 million

Arbeids- og inkluderingsdepartementet (Ministry of Labour and Social Inclusion), samisk avdeling (Sami department)

Forthcoming meeting.

Political setting:

These political premises lead to the conclusion in 20.5.3 [1]

20.5.3 Speech synthesis (Talesyntese)

“More and more services are becoming electronic, and much information is being made available in digital form, among other places in libraries. To give everyone access, several parties have pointed to a need for further public effort, and many actors have pointed in particular to access to speech synthesis tools.”

“Speech has proven to be an effective way of presenting information, whether it is used alone, together with the text, or synchronized with the text so that the cursor shows which word is being read. Speech software can give people with reading and writing difficulties access to various kinds of text, from textbooks to newspapers. People who are learning Norwegian will also be able to benefit from such functionality. Sametinget (the Sami Parliament) has investigated whether it is possible to develop speech synthesis for Sami languages. The speech synthesis is intended to be used as a supplementary tool in combination with ordinary proofing programs. Experience shows that the use of speech synthesis supports both the reading and the writing process. In addition, speech synthesis could be used as a basis for developing modern services in many fields.”

Two views:

1 - we’re done

2 - we have just started

Form:

The company will be reluctant to open up the system. Bringing the two together might be harder.

For speech synthesis we will need a person who is capable of developing a TTS system.

Our view

Strategic goal

Establish a production line that is as independent as possible. Be able to scale up the number of languages.

Where will we stop?

Maximal version:

Within our scope: Production line for open platforms

The linguistic part, and the basic (to be specified) part of the production line for proprietary platforms

Embedding applications into proprietary platforms is outside our scope

  1. do the linguistics
  2. see what the company does for Microsoft
  3. learn from that
  4. do “the same” for Unix platforms

Prerequisites

Parsers:

Languages:

Work

Difference before and after 186

Needing:

For Finnish: 1 h of speech, approx. 500-1000 sentences. Recording is cheap; the cost is in the annotation (labeling accentuation and breaks). 1 day = 100 sentences.
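As a rough check of what these figures imply for the annotation budget, the arithmetic can be sketched as follows. The function name is made up for illustration, and the only inputs are the numbers in the note above (500 to 1000 sentences, 100 sentences labeled per day).

```python
def annotation_days(sentences: int, sentences_per_day: int = 100) -> float:
    """Annotator-days needed to label accentuation and breaks."""
    return sentences / sentences_per_day


# Corpus size range from the notes: roughly 500 to 1000 sentences for 1 h of speech.
low = annotation_days(500)
high = annotation_days(1000)
print(f"{low:.0f} to {high:.0f} annotator-days")  # prints "5 to 10 annotator-days"
```

So one hour of recorded speech corresponds to about one to two working weeks of labeling for a single annotator, assuming the Finnish rate carries over.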

What is the input from our intonation team?

Work on the prediction side: Make a grammar for predicting the prosodic unit

The synthesis makes the contours as soon as we have the prosodic units.

Putting in the breaks was important, but after that the HMM worked.

Autosegmental break

Predicting abstract prominence:

  1. Syntactic analysis =>
  2. information and intonation structure, accent prediction =>
  3. F0
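The first two steps of this chain could look roughly like the toy sketch below. It is purely illustrative: the tag names, the rule "content words get an accent", and the rule "a clause-boundary token puts a break after the preceding word" are assumptions made for the example, not the grammar discussed at the meeting. The third step (F0) is left to the synthesizer, which builds the contours from the prosodic units.

```python
from dataclasses import dataclass

# Assumed coarse tags; a real system would take these from the parser output.
CONTENT_TAGS = {"N", "V", "A", "Adv"}
BOUNDARY_TAG = "CLB"  # hypothetical clause-boundary tag on punctuation


@dataclass
class Unit:
    word: str
    accented: bool
    break_after: bool


def predict_prosody(tagged):
    """Map (word, tag) pairs to words with abstract accent and break labels."""
    units = []
    for word, tag in tagged:
        if tag == BOUNDARY_TAG:
            # Punctuation is not spoken; it marks a break after the previous word.
            if units:
                units[-1].break_after = True
            continue
        units.append(Unit(word, accented=tag in CONTENT_TAGS, break_after=False))
    return units


def to_labels(units):
    """Render the symbolic labels: '*' marks an accent, '|' marks a break."""
    out = []
    for u in units:
        out.append(("*" if u.accented else "") + u.word)
        if u.break_after:
            out.append("|")
    return " ".join(out)


# Toy English input standing in for parser output.
sentence = [("the", "Det"), ("dog", "N"), ("barked", "V"), (",", "CLB"),
            ("and", "CC"), ("it", "Pron"), ("ran", "V"), (".", "CLB")]
print(to_labels(predict_prosody(sentence)))  # the *dog *barked | and it *ran |
```

Keeping the output symbolic (accent and break labels rather than pitch values) mirrors the split in the notes: the grammar predicts abstract prominence, and the synthesis produces the actual contour.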

Product from our proof-of-concept demo.

Important development goals:

Tasks for making the voice:

Labeling

Syntactic analysis => accent and phrasing

Voice-building:

Tasks for making TTS out of the voice:

Tasks for making a product:

Try to build in an option for an open-source future. Much of the HMM work is open source.

Most HMM projects: HTS toolkit, Festival, etc.

The future of speech synthesis is in HMM synthesis.

2003: 2/3 unit selection, 1/3 diphone, small fraction doing HMM

2008: 1/3 => 2/3 HMM, 1/3 unit selection, nobody doing diphone

Persons:

Project-internal training:

24 man-months vs 48 man-months??

Presentations on Thursday:

Sjur to integrate his former report and notes from this meeting

Demo:

HMM examples for Finnish: [http://homepages.inf.ed.ac.uk/jyamagis/]

[1] [http://www.regjeringen.no/nb/dep/aid/dok/regpubl/stmeld/2007-2008/stmeld-nr-28-2007-2008-/20/5.html?id=513086]