Semantic Similarity over Gene Ontology for Multi-Label Protein Subcellular Localization

Abstract

Protein subcellular localization is an essential topic in proteomics and molecular biology and has been studied extensively over the past decades. However, most existing methods are limited to predicting single-location proteins; in many studies, multi-location proteins are either excluded or assumed not to exist. This paper proposes a novel multi-label subcellular-localization predictor based on the semantic similarity between Gene Ontology (GO) terms. Given a query protein, the accession numbers of its homologs are retrieved by a BLAST search. These accession numbers are then used as keys to search the Gene Ontology annotation database, yielding a set of GO terms for the protein. The semantic similarity between GO terms is used to construct semantic similarity vectors, which are classified by a support vector machine (SVM) classifier with a new decision scheme designed for multi-label data. Experimental results show that the proposed multi-label predictor significantly outperforms state-of-the-art predictors such as iLoc-Plant and Plant-mPLoc.
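
The feature-construction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity measure, so Lin's information-content measure is assumed here, the ontology and annotation counts are toy values, and the names `PARENTS`, `DIRECT_COUNTS`, `protein_similarity`, and `decide` are hypothetical. The BLAST search, the GO annotation lookup, and SVM training are omitted; GO term sets are assumed to be already retrieved, and `decide` only shows one plausible multi-label decision rule applied to per-class scores.

```python
import math

# Toy "is-a" hierarchy (child -> parents). Term names are illustrative, not real GO IDs.
PARENTS = {
    "root": [],
    "membrane": ["root"],
    "organelle": ["root"],
    "plastid": ["organelle"],
    "chloroplast": ["plastid"],
    "mitochondrion": ["organelle"],
}

# Hypothetical number of proteins directly annotated with each term.
DIRECT_COUNTS = {
    "root": 0, "membrane": 4, "organelle": 2,
    "plastid": 2, "chloroplast": 6, "mitochondrion": 6,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(N / n_t), where n_t counts annotations to t or its descendants."""
    totals = {t: 0 for t in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            totals[anc] += count
    n = totals["root"]
    return {t: math.log(n / totals[t]) for t in PARENTS}


def lin_similarity(x, y, ic):
    """Lin's measure: 2 * IC(most informative common ancestor) / (IC(x) + IC(y))."""
    denom = ic[x] + ic[y]
    if denom == 0.0:  # both terms carry no information (e.g. root vs. root)
        return 1.0 if x == y else 0.0
    mica_ic = max(ic[t] for t in ancestors(x) & ancestors(y))
    return 2.0 * mica_ic / denom


def protein_similarity(terms_a, terms_b, ic):
    """Symmetric best-match average between two GO term sets (an assumed choice)."""
    def best_match(src, dst):
        return sum(max(lin_similarity(x, y, ic) for y in dst) for x in src) / len(src)
    return 0.5 * (best_match(terms_a, terms_b) + best_match(terms_b, terms_a))


def similarity_vector(query_terms, training_term_sets, ic):
    """One feature per training protein: its GO similarity to the query."""
    return [protein_similarity(query_terms, t, ic) for t in training_term_sets]


def decide(scores):
    """Assumed multi-label rule: keep every class with a positive score,
    falling back to the single best class when none is positive."""
    positive = [k for k, s in enumerate(scores) if s > 0]
    return positive or [max(range(len(scores)), key=scores.__getitem__)]


ic = information_content()
training = [{"chloroplast"}, {"mitochondrion"}, {"membrane"}]
vec = similarity_vector({"chloroplast", "plastid"}, training, ic)
print([round(v, 3) for v in vec])
print(decide([-1.2, 0.3, 0.8]))
```

In this sketch the query is most similar to the chloroplast-annotated training protein and has zero similarity to the membrane-annotated one, because their only common ancestor is the root term. In a full pipeline, one such vector per protein would be passed to a set of one-vs-rest SVMs, whose scores a rule like `decide` would turn into one or more predicted locations.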

Cite as:

Wan, S., Mak, M. and Kung, S. (2013) Semantic Similarity over Gene Ontology for Multi-Label Protein Subcellular Localization. Engineering, 5, 68-72. doi: 10.4236/eng.2013.510B014.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.