IVONA

Developer(s)	IVONA Software
Initial release	2005
Written in	C/C++
Operating system	Cross-platform
Available in	Polish / English / Romanian / German / Castilian Spanish / American Spanish / French / Welsh / Italian / Icelandic / Brazilian Portuguese (more coming soon)
Type	Text-To-Speech
License	Commercial
Website	www.ivona.com

IVONA is a multilingual speech synthesis system developed by the Polish IT company IVONA Software. It offers a full text-to-speech system with various APIs. IVONA Software was acquired by Amazon.com in January 2013^[1] for its Kindle product range.

Inside IVONA

The IVONA text-to-speech system was described at Blizzard Challenge 2006^[2] and at Blizzard Challenge 2007 (in a special version built for the challenge).^[3] It is composed of two parts: a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols such as numbers and abbreviations into the equivalent written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns a phonetic transcription to each word, and divides and marks the text into prosodic units such as phrases, clauses, and sentences. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that the front-end outputs. The back-end, often referred to as the synthesizer, then converts the symbolic linguistic representation into sound.
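The text normalization step described above can be illustrated with a minimal sketch. This is not IVONA's front-end: the abbreviation table, the function names `number_to_words` and `normalize`, and the restriction to English integers below 1000 are all assumptions made for illustration.

```python
import re

# Hypothetical abbreviation table; a real front-end would use a much larger,
# language-specific lexicon and context-sensitive rules.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "No.": "Number"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 in English."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text: str) -> str:
    """Expand known abbreviations and small integers into written-out words."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    # Replace standalone runs of 1-3 digits; longer numbers are left untouched.
    return re.sub(r"\b\d{1,3}\b",
                  lambda m: number_to_words(int(m.group())), text)


print(normalize("Dr. Smith lives at 42 Main St."))
# prints: Doctor Smith lives at forty-two Main Street
```

A production front-end would also have to disambiguate by context (for example, "St." as "Saint" or "Street"), which plain string replacement cannot do.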

Unit selection synthesis

IVONA uses Unit Selection with Limited Time-scale Modification (USLTM), as described in the company's Blizzard Challenge 2006 paper.^[2] Unit selection synthesis uses large databases of recorded speech. During database creation, each recorded utterance is segmented into some or all of the following: individual phones, syllables, morphemes, words, phrases, and sentences. The segmentation is done using a specially modified speech recognizer.^[4] An index of the units in the speech database is then created, based on the segmentation and on acoustic parameters such as fundamental frequency (pitch), duration, position in the syllable, and neighboring phones. At runtime, the desired target utterance is created by determining the best chain of candidate units from the database (unit selection).
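Finding the "best chain of candidate units" is commonly framed as a shortest-path search over a lattice of candidates, scored by a target cost (how well a unit matches the specification) and a join cost (how smoothly two adjacent units connect). The sketch below shows that generic formulation with a Viterbi-style dynamic program. It is not the USLTM algorithm itself; the function `select_units` and both cost functions are hypothetical, and every candidate list is assumed to be non-empty.

```python
from typing import Callable, List, Sequence, TypeVar

U = TypeVar("U")  # a candidate unit from the speech database
T = TypeVar("T")  # a target specification for one position


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    join_cost: Callable[[U, U], float],
) -> List[U]:
    """Return the candidate chain with the lowest total target + join cost."""
    if not targets:
        return []
    # best[i][j]: cheapest total cost of any chain ending in candidates[i][j].
    best = [[target_cost(targets[0], u) for u in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        prev = candidates[i - 1]
        row_cost, row_back = [], []
        for u in candidates[i]:
            # Cheapest predecessor for this unit, including the join penalty.
            k = min(range(len(prev)),
                    key=lambda p: best[i - 1][p] + join_cost(prev[p], u))
            row_cost.append(best[i - 1][k] + join_cost(prev[k], u)
                            + target_cost(targets[i], u))
            row_back.append(k)
        best.append(row_cost)
        back.append(row_back)
    # Trace back from the cheapest final unit.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    chain: List[U] = []
    for i in range(len(targets) - 1, -1, -1):
        chain.append(candidates[i][j])
        j = back[i][j]
    chain.reverse()
    return chain


# Toy example: units and targets are just pitch values in Hz (made-up data).
chain = select_units(
    targets=[100.0, 100.0],
    candidates=[[100.0, 109.0], [110.0, 120.0]],
    target_cost=lambda t, u: abs(t - u),
    join_cost=lambda a, b: 2 * abs(a - b),
)
print(chain)  # prints: [109.0, 110.0]
```

In the toy example the first position does not pick its closest match (100.0), because the 109.0 unit joins far more smoothly with what follows. That trade-off between matching the target and joining cleanly is the core idea of unit selection.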

Unit selection provides the greatest naturalness, because it applies digital signal processing (DSP) to the recorded speech only at concatenation points. DSP often makes recorded speech sound less natural.
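To make "processing only at concatenation points" concrete, the sketch below joins two recorded segments and blends only a short overlap region with a linear crossfade, leaving every other sample untouched. A crossfade is used here purely as a simple stand-in; the source does not say which smoothing IVONA applies at joins, and the function name `crossfade_join` is hypothetical.

```python
from typing import List


def crossfade_join(left: List[float], right: List[float],
                   overlap: int) -> List[float]:
    """Concatenate two sample sequences, blending only `overlap` samples."""
    if overlap < 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must fit inside both segments")
    if overlap == 0:
        return left + right
    start = len(left) - overlap
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # fade-in weight for the right segment
        blended.append((1 - w) * left[start + i] + w * right[i])
    # Samples outside the overlap are copied through unmodified.
    return left[:start] + blended + right[overlap:]


joined = crossfade_join([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], overlap=1)
print(joined)  # prints: [1.0, 1.0, 1.0, 0.5, 0.0, 0.0, 0.0]
```

Only the single overlapping sample is modified; the rest of both recordings passes through exactly as recorded.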

Generated speech quality

The IVONA text-to-speech system received the highest mean opinion score (MOS) at the Blizzard Challenge 2007 scientific contest in Bonn, Germany. The sentences read out by IVONA were evaluated by experts, a group of British and American students, and volunteers recruited via the Internet. IVONA's average MOS of 3.9 points was the highest of all the speech synthesizers evaluated. A real person’s recording scored 4.7.^[5]

IVONA was also evaluated at Blizzard Challenge 2006 in Pittsburgh, USA, where it received the best mean opinion score (MOS) from speech experts and undergraduates in the full-database results.^[6]
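A mean opinion score is the arithmetic mean of listener ratings, conventionally given on a 1 to 5 scale. The sketch below shows the calculation. The system names and ratings are made-up illustration data, not Blizzard Challenge results, and the function name `mean_opinion_scores` is hypothetical.

```python
from statistics import fmean
from typing import Dict, List


def mean_opinion_scores(ratings: Dict[str, List[int]]) -> Dict[str, float]:
    """Average each system's listener ratings (integers from 1 to 5)."""
    scores = {}
    for system, values in ratings.items():
        if not values:
            raise ValueError(f"no ratings for {system}")
        if any(v < 1 or v > 5 for v in values):
            raise ValueError(f"ratings for {system} must be between 1 and 5")
        scores[system] = fmean(values)
    return scores


# Hypothetical ratings from four listeners per system.
mos = mean_opinion_scores({
    "system_a": [4, 4, 3, 5],
    "system_b": [2, 3, 3, 2],
})
print(mos)  # prints: {'system_a': 4.0, 'system_b': 2.5}
```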

Voices and languages

According to IVONA's voice portfolio, the software currently offers 53 voices in 23 languages.^[7] The languages listed are:

English (American, Australian, British, Indian, Welsh)
Welsh
Danish
Dutch
French (and Canadian French)
German
Icelandic
Italian
Polish
Portuguese (and Brazilian Portuguese)
Romanian
Russian
Spanish (Castilian, American)
Swedish
Turkish
Norwegian

System compatibility

IVONA is compatible with Windows, Unix, Android, Tizen, and iOS-based systems.

References

External links

Official website

This article is issued from Wikipedia - version of the Sunday, April 24, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.