Prosodically Rich Speech Synthesis Interface Using Limited Data of Celebrity Voice - Journal of Computer and Communications

Journal of Computer and Communications

Volume 4, Issue 16 (December 2016)

ISSN Print: 2327-5219 ISSN Online: 2327-5227

PP. 79-94

DOI: 10.4236/jcc.2016.416006

Author(s)

Takashi Nose¹, Taiki Kamei²

Affiliation(s)

¹Department of Communication Engineering, Graduate School of Engineering, Tohoku University, Sendai, Japan.
²Department of Applied Information Sciences, Graduate School of Information Sciences, Tohoku University, Sendai, Japan.

ABSTRACT

To enhance communication between humans and robots at home in the future, speech synthesis interfaces that can generate expressive speech are indispensable. In addition, synthesizing celebrity voices is commercially important. To address these issues, this paper proposes techniques for synthesizing natural-sounding speech with a rich prosodic personality using a limited amount of data in a text-to-speech (TTS) system. As the target speaker, we chose Shinzo Abe, a well-known prime minister of Japan, who has a distinctive prosodic personality in his speeches. To synthesize natural-sounding and prosodically rich speech, accurate phrasing, robust duration prediction, and rich intonation modeling are important. For these purposes, we propose pause position prediction based on conditional random fields (CRFs), phone-duration prediction using random forests, and mora-based emphasis context labeling. We examine the effectiveness of these techniques through objective and subjective evaluations.

KEYWORDS

Parametric Speech Synthesis, Hidden Markov Model (HMM), Prosodic Personality, Prosody Modeling, Conditional Random Field (CRF), Random Forest, Emphasis Context

Share and Cite:

Nose, T. and Kamei, T. (2016) Prosodically Rich Speech Synthesis Interface Using Limited Data of Celebrity Voice. Journal of Computer and Communications, 4, 79-94. doi: 10.4236/jcc.2016.416006.
