Speaker Recognition System Based on the Baseband Correlation Score Reliability Fusion
Qi He,
Ting Huang,
Hongbo Zhang
MicroStrategy Software (Hangzhou) Co., Ltd., Hangzhou, China.
School of Physics & Electrical Information Engineering, Ningxia University, Yinchuan, China.
Science and Technology Department of Ningxia, Yinchuan, China.
DOI: 10.4236/cn.2013.53B2107
Abstract
An emotion mismatch between training and test speech causes the performance
of a speaker recognition system to decline sharply; this is the problem of
emotional speaker recognition. One important approach to this problem is
emotion normalization of the test speech. The method starts from an analysis
of the differences between each kind of emotional speech and neutral speech,
and takes the baseband mismatch caused by emotional changes as its main
thread. It gives corresponding algorithms for four technical points:
emotional expansion, emotional shield, emotional normalization, and score
compensation. Compared with the traditional GMM-UBM method, the recognition
rate on the MASC corpus and the EPST corpus increased by 3.80% and 8.81%,
respectively.
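The abstract names the traditional GMM-UBM method as its baseline without describing it. As a rough illustration only, the sketch below shows the standard log-likelihood-ratio score used in GMM-UBM speaker verification: the average frame log-likelihood under a speaker model minus that under a universal background model (UBM). It does not implement the paper's emotional expansion, emotional shield, emotional normalization, or score compensation steps. All function names, the diagonal-covariance assumption, and the toy numbers are hypothetical and are not taken from the paper.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of frames under a diagonal GMM."""
    total = 0.0
    for x in frames:
        comps = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        peak = max(comps)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comps))
    return total / len(frames)

def llr_score(frames, speaker_gmm, ubm):
    """GMM-UBM score: speaker log-likelihood minus UBM log-likelihood."""
    return gmm_avg_loglik(frames, *speaker_gmm) - gmm_avg_loglik(frames, *ubm)

# Toy 2-D models as (weights, means, variances); hypothetical values.
ubm = ([0.5, 0.5], [[0.0, 0.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])
speaker = ([0.5, 0.5], [[1.0, 1.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])

target_frames = [[1.1, 0.9], [0.8, 1.2], [4.1, 3.9]]
impostor_frames = [[-0.5, -0.4], [0.1, -0.2], [3.9, 4.2]]

print(llr_score(target_frames, speaker, ubm))    # positive: frames fit the speaker model better
print(llr_score(impostor_frames, speaker, ubm))  # negative: frames fit the UBM better
```

In a real system the speaker model is usually adapted from the UBM and the score is compared with a tuned threshold; the toy speaker model here simply shifts one UBM component to mimic that.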
Share and Cite:
He, Q., Huang, T. and Zhang, H. (2013) Speaker Recognition System Based on the Baseband Correlation Score Reliability Fusion. Communications and Network, 5, 596-600. doi: 10.4236/cn.2013.53B2107.
Conflicts of Interest
The authors declare no conflicts of interest.