MAP-based Audio Coding Compensation for Speaker Recognition

DOI: 10.4236/jsip.2011.23021


The performance of a speaker recognition system declines when the audio codecs used for training and testing are mismatched. Based on an analysis of how mismatched audio codecs affect the linear prediction cepstral coefficients, this paper proposes a MAP-based audio coding compensation method for speaker recognition. The method first sets a standard codec as a reference and trains the speaker models in that codec format. It then learns the deviation distributions between the standard codec format and the other formats, estimates the current bias from a small amount of adaptation data using MAP-based adaptation, and adjusts the model parameters according to the incoming audio codec format and its associated bias. During testing, the features of the incoming speaker are matched against the adjusted model. Experimental results show that accuracy reaches 82.4% with just one second of adaptation data, 5.5% higher than the baseline system.
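The bias-estimation and model-adjustment steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a diagonal Gaussian prior over the per-dimension codec bias (standing in for the learned deviation distribution), Gaussian observation noise, and a simple additive shift of the model means. The function names `map_bias` and `compensate_means` and all parameter names are hypothetical.

```python
import numpy as np


def map_bias(prior_mean, prior_var, observed_dev, noise_var):
    """MAP estimate of a per-dimension bias (assumed Gaussian prior and noise).

    prior_mean, prior_var: assumed prior over the bias for one codec pair.
    observed_dev: (n_frames, dim) deviations measured on the adaptation data,
        e.g. adaptation features minus a reference-codec mean (an assumption).
    noise_var: assumed per-dimension variance of the observed deviations.
    """
    observed_dev = np.atleast_2d(observed_dev)
    n_frames = observed_dev.shape[0]
    sample_mean = observed_dev.mean(axis=0)
    prior_precision = 1.0 / prior_var
    data_precision = n_frames / noise_var
    # Precision-weighted average of the prior mean and the sample mean.
    return (prior_precision * prior_mean + data_precision * sample_mean) / (
        prior_precision + data_precision
    )


def compensate_means(model_means, bias):
    """Shift every component mean of a speaker model by the estimated bias."""
    return np.asarray(model_means) + bias


# Toy usage with made-up numbers: 3 adaptation frames, 2 feature dimensions.
prior_mean = np.zeros(2)
prior_var = np.ones(2)
noise_var = np.ones(2)
observed_dev = np.full((3, 2), 2.0)

bias = map_bias(prior_mean, prior_var, observed_dev, noise_var)
adjusted = compensate_means(np.zeros((4, 2)), bias)
```

With little adaptation data the estimate stays close to the prior mean, and as the number of frames grows it moves toward the sample mean of the observed deviations. That behavior is why a MAP estimate suits the short adaptation segments mentioned in the abstract.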

Share and Cite:

T. Jiang and J. Han, "MAP-based Audio Coding Compensation for Speaker Recognition," Journal of Signal and Information Processing, Vol. 2 No. 3, 2011, pp. 165-169. doi: 10.4236/jsip.2011.23021.

Conflicts of Interest

The authors declare no conflicts of interest.




Copyright © 2020 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.