Robust Speech Recognition System Using Conventional and Hybrid Features of MFCC, LPCC, PLP, RASTA-PLP and Hidden Markov Model Classifier in Noisy Conditions

Abstract

In recent years, improving the accuracy of speech recognition (SR) has been one of the most active areas of research. Although SR systems work reasonably well in quiet conditions, they still suffer severe performance degradation in noisy conditions or over distorted channels. More robust feature extraction methods are therefore needed to achieve better performance in adverse conditions. This paper investigates the performance of conventional and new hybrid speech feature extraction algorithms based on Mel-Frequency Cepstral Coefficients (MFCC), Linear Prediction Coding Coefficients (LPCC), Perceptual Linear Prediction (PLP), and RASTA-PLP in noisy conditions, using a multivariate Hidden Markov Model (HMM) classifier. The behavior of the proposed system is evaluated on the TIDIGITS speech corpus, using recordings from 208 different adult speakers for both training and testing. The theoretical basis of the speech processing and classification procedures is presented, and recognition results are reported in terms of word recognition rate.

Share and Cite:

Këpuska, V. and Elharati, H. (2015) Robust Speech Recognition System Using Conventional and Hybrid Features of MFCC, LPCC, PLP, RASTA-PLP and Hidden Markov Model Classifier in Noisy Conditions. Journal of Computer and Communications, 3, 1-9. doi: 10.4236/jcc.2015.36001.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.