A new approach for HIV-1 protease cleavage site prediction combined with feature selection


Acquired immunodeficiency syndrome (AIDS) is a fatal disease that seriously threatens human health, and human immunodeficiency virus (HIV) is its causative agent. Investigating HIV-1 protease cleavage sites can help researchers find or develop protease inhibitors that restrain the replication of HIV-1 and thereby combat AIDS. Feature selection is a new approach to the HIV-1 protease cleavage site prediction task and is a key point of our research. Compared with previous work, our work has several advantages. First, a filter method is used to eliminate redundant features. Second, besides traditional orthogonal encoding (OE), we use two kinds of newly proposed features, extracted by applying principal component analysis (PCA) and non-linear Fisher transformation (NLF) to the AAindex database; both are shown to perform better than OE. Third, the data set is substantially expanded, to 1922 samples. To further improve prediction performance, we optimize the parameters of the support vector machine (SVM) classifier, and we fuse the three kinds of features to obtain a comprehensive feature representation. To evaluate our method thoroughly, we use five evaluation measures, more than in previous work, for a complete comparison. The experimental results show that our method achieves better performance than the state-of-the-art method. This indicates that feature selection combined with feature fusion and classifier parameter optimization can effectively improve HIV-1 protease cleavage site prediction. Moreover, our work may provide useful help for the future development of HIV-1 protease inhibitors.
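
As a rough illustration of the orthogonal encoding (OE) baseline mentioned in the abstract, the sketch below maps each residue of a peptide to a 20-bit one-hot block and concatenates the blocks. It is a minimal sketch and not the authors' implementation: the residue ordering, the function name `orthogonal_encode`, and the example octapeptide are assumptions made here for illustration only.

```python
# Standard 20 amino acids by one-letter code.
# The ordering is an arbitrary choice for this sketch (assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def orthogonal_encode(peptide):
    """Return a flat 0/1 list with one 20-bit one-hot block per residue."""
    vector = []
    for residue in peptide.upper():
        if residue not in AA_INDEX:
            raise ValueError(f"unknown residue: {residue!r}")
        block = [0] * len(AMINO_ACIDS)
        block[AA_INDEX[residue]] = 1  # mark the position of this residue
        vector.extend(block)
    return vector


# Example octapeptide chosen for illustration; not taken from the paper's data set.
features = orthogonal_encode("SQNYPIVQ")
print(len(features))  # 8 residues x 20 bits per residue
```

Under this scheme every peptide of the same length yields a vector of the same dimension, which is what lets OE features be concatenated with other fixed-length descriptors before classification.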


Share and Cite:

Yuan, Y., Liu, H. and Qiu, G. (2013) A new approach for HIV-1 protease cleavage site prediction combined with feature selection. Journal of Biomedical Science and Engineering, 6, 1155-1160. doi: 10.4236/jbise.2013.612144.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.