Applying Score Reliability Fusion to Bi-Model Emotional Speaker Recognition
H. B. Zhang, T. Wang, T. Huang, X. Yang
MicroStrategy Software (Hangzhou) Co., Ltd., Hangzhou, Zhejiang, 310012, China.
School of Physics & Electrical Information Engineering, Ningxia University, Yinchuan, 750021, China.
DOI: 10.4236/jsip.2013.43B001
Abstract
Emotion mismatch between training and testing is one of the major factors causing performance degradation in speaker recognition systems. In our previous work, a bi-model emotional speaker recognition (BESR) method based on synthesizing virtual HD (High Different from neutral, with large pitch offset) speech was proposed to deal with this problem. It improved system performance under mismatched emotion states on MASC, but still suffered from the risk introduced by fusing the scores from the unreliable VHD model and the neutral model with equal weights. In this paper, we propose a new BESR method based on score reliability fusion. Two strategies, one utilizing the identification rate and the other the average relative loss difference of the scores, are presented to estimate the weights for the two groups of scores. The results on both MASC and EPST show that, with the weights generated by either strategy, the BESR method achieves better performance than with equal weights, and the better of the two even achieves a result comparable to that obtained with the best weights selected by exhaustive search.
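The fusion scheme described above can be sketched in a few lines. This is an illustrative assumption, not the paper's exact formulation: the function names are hypothetical, and the weighting rule shown (weights proportional to each model's identification rate, normalized to sum to one) stands in for the two estimation strategies the abstract names.

```python
# Hypothetical sketch of reliability-weighted score fusion for a
# bi-model (neutral + VHD) speaker recognition system.
# The proportional-to-identification-rate weighting is an illustrative
# assumption; the paper's actual weight-estimation strategies may differ.

def reliability_weights(id_rate_neutral, id_rate_vhd):
    """Map each model's identification rate to a fusion weight (sums to 1)."""
    total = id_rate_neutral + id_rate_vhd
    return id_rate_neutral / total, id_rate_vhd / total

def fuse_scores(scores_neutral, scores_vhd, w_neutral, w_vhd):
    """Weighted sum of per-speaker scores from the two models.

    scores_neutral, scores_vhd: dict mapping speaker id -> model score.
    """
    return {spk: w_neutral * scores_neutral[spk] + w_vhd * scores_vhd[spk]
            for spk in scores_neutral}

def identify(fused_scores):
    """Decide on the speaker with the highest fused score."""
    return max(fused_scores, key=fused_scores.get)
```

The equal-weight baseline criticized in the abstract corresponds to calling `fuse_scores` with `w_neutral = w_vhd = 0.5`; the proposed method replaces those constants with weights estimated from score reliability.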
Share and Cite:
H. Zhang, T. Wang, T. Huang and X. Yang, "Applying Score Reliability Fusion to Bi-Model Emotional Speaker Recognition,"
Journal of Signal and Information Processing, Vol. 4 No. 3B, 2013, pp. 1-6. doi:
10.4236/jsip.2013.43B001.
Conflicts of Interest
The authors declare no conflicts of interest.