Real Time Prosody Modification
Krothapalli Sreenivasa Rao
DOI: 10.4236/jsip.2010.11006


Real-time prosody modification involves changing prosody parameters such as the pitch, duration, and intensity of speech in real time without affecting its intelligibility and naturalness. In this paper, prosody modification is performed using the instants of significant excitation (ISE) of the vocal tract system during speech production. In the conventional prosody modification system, the ISE are computed using the group delay function, which is a computationally intensive task. We propose computationally efficient methods for determining the ISE that are suitable for prosody modification in interactive (real-time) applications. The overall computation time for prosody modification using the proposed method is compared with that of the conventional method, which uses the group delay function to compute the ISE.
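To make the role of the ISE concrete, the following is a minimal sketch of how a sequence of epoch locations might be rescaled for pitch and duration modification. It is not the method proposed in the paper, and it does not show how the ISE are detected (neither the group delay approach nor the proposed efficient methods are described in this abstract). The function name `modify_epochs`, the parameters `pitch_scale` and `duration_scale`, and the nearest-epoch mapping rule are all assumptions made for illustration.

```python
from bisect import bisect_right


def modify_epochs(epochs, pitch_scale=1.0, duration_scale=1.0):
    """Illustrative epoch rescaling (hypothetical helper, not the paper's algorithm).

    epochs         -- increasing sample indices of the original ISE (assumed given)
    pitch_scale    -- > 1 raises pitch by shortening the epoch intervals
    duration_scale -- > 1 lengthens the utterance

    Returns (new_epochs, nearest), where nearest[k] is the index of the
    original epoch closest to new_epochs[k] on the original time axis.
    """
    if len(epochs) < 2:
        raise ValueError("need at least two epochs")
    if pitch_scale <= 0 or duration_scale <= 0:
        raise ValueError("scale factors must be positive")

    start, end = epochs[0], epochs[-1]
    new_end = start + (end - start) * duration_scale

    def to_original_time(t_new):
        # Undo the duration scaling to find the matching original instant.
        return start + (t_new - start) / duration_scale

    def nearest_index(t_orig):
        # Linear search keeps the sketch simple; ties go to the earlier epoch.
        return min(range(len(epochs)), key=lambda k: abs(epochs[k] - t_orig))

    new_epochs = [float(start)]
    nearest = [0]
    while True:
        t_orig = to_original_time(new_epochs[-1])
        # Find the original epoch interval that contains t_orig.
        i = bisect_right(epochs, t_orig) - 1
        i = min(max(i, 0), len(epochs) - 2)
        # Scale the local pitch period to get the next new epoch.
        interval = (epochs[i + 1] - epochs[i]) / pitch_scale
        nxt = new_epochs[-1] + interval
        if nxt > new_end:
            break
        new_epochs.append(nxt)
        nearest.append(nearest_index(to_original_time(nxt)))
    return new_epochs, nearest
```

In a complete system, the signal (or residual) segment around each mapped original epoch would be copied to the corresponding new epoch location before synthesis; that step, like epoch detection itself, is outside the scope of this sketch.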

Share and Cite:

K. Rao, "Real Time Prosody Modification," Journal of Signal and Information Processing, Vol. 1 No. 1, 2010, pp. 50-62. doi: 10.4236/jsip.2010.11006.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.