Emotional Speech Synthesis Based on Prosodic Feature Modification

Ling He; Hua Huang; Margaret Lech

doi:10.4236/eng.2013.510B015

Engineering, Vol. 5 No. 10B, October 2013

School of Electrical and Computer Engineering, RMIT University, Melbourne, Australia.
School of Electrical Engineering and Information, Sichuan University, Chengdu, China.

Abstract

Emotional speech synthesis has wide applications in fields such as human-computer interaction, medicine, and industry. This work proposes an emotional speech synthesis system based on prosodic feature modification and the Time Domain Pitch Synchronous Overlap Add (TD-PSOLA) waveform concatenation algorithm. The system produces synthesized speech in four emotions: angry, happy, sad, and bored. Experimental results show that the synthesized utterances convey clear emotional expression, and a subjective test reaches high classification accuracy across the four types of synthesized emotional speech.
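
The TD-PSOLA idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fully voiced signal with a constant, known pitch period (a real system would first estimate pitch marks), and the function name `td_psola`, its parameters, and the scale factors in the usage lines are hypothetical choices for illustration only. Two-period Hann-windowed grains are cut around each analysis pitch mark and overlap-added at a new spacing, which changes pitch and duration independently.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def td_psola(signal, period, pitch_scale=1.0, time_scale=1.0):
    """Illustrative TD-PSOLA for a signal with a constant pitch period.

    signal      -- list of float samples (assumed fully voiced)
    period      -- pitch period in samples (assumed known and constant)
    pitch_scale -- > 1 raises the pitch, < 1 lowers it
    time_scale  -- > 1 lengthens the utterance, < 1 shortens it
    """
    if pitch_scale <= 0 or time_scale <= 0:
        raise ValueError("scale factors must be positive")
    n = len(signal)
    # Analysis pitch marks: one per period, keeping a full period on each side.
    marks = list(range(period, n - period, period))
    if not marks:
        raise ValueError("signal must be longer than two pitch periods")

    window = hann(2 * period)
    out_len = int(round(n * time_scale))
    out = [0.0] * out_len
    norm = [0.0] * out_len  # accumulated window weight, used for normalization

    # Synthesis marks are spaced by the modified pitch period.
    new_period = max(1, int(round(period / pitch_scale)))
    t = new_period
    while t < out_len - period:
        # Map the synthesis mark back to analysis time; take the nearest mark.
        src = t / time_scale
        m = min(marks, key=lambda a: abs(a - src))
        # Overlap-add the windowed two-period grain centred on the mark.
        for k in range(2 * period):
            j = t - period + k
            if 0 <= j < out_len:
                out[j] += signal[m - period + k] * window[k]
                norm[j] += window[k]
        t += new_period

    # Compensate for the overlap of the shifted windows.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


# Usage: a 4000-sample sine with a 100-sample period, pitch raised by 1.5.
# The factor 1.5 is an arbitrary illustrative value, not taken from the paper.
tone = [math.sin(2 * math.pi * i / 100) for i in range(4000)]
raised = td_psola(tone, period=100, pitch_scale=1.5)
```

In this sketch the output repeats every 67 samples (100 / 1.5, rounded) instead of every 100, while the overall length is unchanged. An emotional speech system of the kind described would choose such pitch and duration factors, along with energy changes, per target emotion.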

Keywords

Emotional Speech Synthesis; Prosodic Features; Time Domain Pitch Synchronous Overlap Add

Share and Cite:

He, L., Huang, H. and Lech, M. (2013) Emotional Speech Synthesis Based on Prosodic Feature Modification. Engineering, 5, 73-77. doi: 10.4236/eng.2013.510B015.

Conflicts of Interest

The authors declare no conflicts of interest.

