Using Chou’s Pseudo Amino Acid Composition for Protein Remote Homology Detection

Abstract

Protein remote homology detection is a key problem in bioinformatics. Discriminative methods such as the Support Vector Machine (SVM) currently achieve the best performance. The most effective way to improve SVM-based methods is to find a general protein representation that converts proteins of different lengths into fixed-length vectors while capturing the properties that discriminate between them. The main bottleneck in designing such a representation is that native proteins vary in length. Motivated by the success of the pseudo amino acid composition (PseAAC) proposed by Chou, we apply this approach to protein remote homology detection. New indices derived from the amino acid index (AAindex) database are incorporated into PseAAC to improve the generalization ability of the method. Experiments on a well-known benchmark show that this method achieves performance superior or comparable to that of current state-of-the-art methods.
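To illustrate how PseAAC maps proteins of different lengths onto vectors of one fixed length, the following is a minimal sketch that follows the general form of Chou's 2001 formulation: 20 normalized composition terms plus λ sequence-order correlation terms. It is not the authors' implementation. The function names `normalize` and `pseaac`, the parameter values `lam=3` and `w=0.05`, and the example sequences are assumptions chosen for illustration, and the Kyte-Doolittle hydropathy scale stands in for the AAindex-derived indices, which the abstract does not list.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here only as a stand-in property.
# ASSUMPTION: the abstract does not say which AAindex indices were used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def normalize(index):
    """Rescale a property to zero mean and unit variance over the 20 amino acids."""
    values = [index[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {a: (index[a] - mean) / std for a in AMINO_ACIDS}


def pseaac(sequence, indices, lam=3, w=0.05):
    """Return a (20 + lam)-dimensional PseAAC-style vector for one protein.

    lam (number of correlation tiers) and w (weight of the sequence-order
    terms) are illustrative defaults, not values taken from the paper.
    """
    # Simplification: residues outside the 20 standard amino acids are dropped.
    seq = [c for c in sequence.upper() if c in AMINO_ACIDS]
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    props = [normalize(ix) for ix in indices]

    # First 20 components: plain amino acid composition.
    freqs = [seq.count(a) / len(seq) for a in AMINO_ACIDS]

    # Next lam components: j-th tier sequence-order correlation factors,
    # the mean squared property difference between residues j positions apart.
    thetas = []
    for j in range(1, lam + 1):
        total = 0.0
        for i in range(len(seq) - j):
            total += sum((p[seq[i]] - p[seq[i + j]]) ** 2 for p in props) / len(props)
        thetas.append(total / (len(seq) - j))

    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Two hypothetical sequences of different lengths give vectors of equal length.
short_vec = pseaac("GSHMLEDPVAGK", [KYTE_DOOLITTLE])
long_vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ", [KYTE_DOOLITTLE])
print(len(short_vec), len(long_vec))  # 23 23
```

Because every protein yields a vector of length 20 + λ regardless of its sequence length, such vectors can be passed directly to a fixed-input classifier such as an SVM. Adding further indices only changes the correlation terms, not the dimensionality.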

Cite as:

Liu, B. and Wang, X. (2013) Using Chou’s Pseudo Amino Acid Composition for Protein Remote Homology Detection. Engineering, 5, 149-153. doi: 10.4236/eng.2013.510B032.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.