Design of Multilingual Speech Synthesis System

DOI: 10.4236/iim.2010.21008


The main objective of this paper is to convert written multilingual text into machine-generated synthetic speech. We propose a complete multilingual speech synthesizer for three languages: Indian English, Tamil, and Telugu. A text-to-speech (TTS) system is helpful for blind and mute people, who can have text read to them by a computer. It can also help retrieve information from sites that contain content in different languages, and it can be used in educational institutions for teaching the pronunciation of different languages. We use concatenative speech synthesis, in which segments of recorded speech are concatenated to produce the desired output. We apply prosody to make the synthesized speech sound more like human speech, and we smooth the transitions between segments to produce continuous output. The Optimal Coupling algorithm is enhanced to improve the performance of the speech synthesis system.
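
As a rough illustration of the concatenation and smoothing steps mentioned above, the sketch below joins recorded segments (represented as plain lists of samples) and blends a short overlap at each join with a linear crossfade. This is not the paper's enhanced Optimal Coupling algorithm; a linear crossfade is swapped in as a simple stand-in, and the function name `crossfade_concat` and the `overlap` parameter are assumptions made for the example.

```python
def crossfade_concat(segments, overlap):
    """Concatenate sample lists, blending up to `overlap` samples at each join.

    Hypothetical helper: a linear crossfade stands in for the smoothing
    step; it is not the Optimal Coupling algorithm from the paper.
    """
    if not segments:
        return []
    out = list(segments[0])
    for seg in segments[1:]:
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(out), len(seg))
        start = len(out) - n
        for i in range(n):
            # Weight of the incoming segment rises linearly across the overlap.
            w = (i + 1) / (n + 1)
            out[start + i] = (1 - w) * out[start + i] + w * seg[i]
        # Append the part of the incoming segment that was not blended.
        out.extend(seg[n:])
    return out


# Two constant "segments" joined with a two-sample overlap.
joined = crossfade_concat([[1.0] * 4, [0.0] * 4], overlap=2)
print(len(joined))
```

With `overlap=0` the function reduces to plain concatenation, which makes the audible discontinuity at the join easy to compare against the smoothed version.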

Share and Cite:

S. SARASWATHI and R. VISHALAKSHY, "Design of Multilingual Speech Synthesis System," Intelligent Information Management, Vol. 2 No. 1, 2010, pp. 58-64. doi: 10.4236/iim.2010.21008.

Conflicts of Interest

The authors declare no conflicts of interest.




Copyright © 2020 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.