Computational Approaches for Biomarker Discovery


Computational biology plays a significant role in the discovery of new biomarkers, the analysis of disease states, and the validation of candidate biomarkers. Biomarkers are used to measure the progression of disease or the physiological effects of therapeutic intervention. They also serve as early warning signs for diseases such as cancer and inflammatory diseases. In this review, we outline recent progress in applying computational biology to biomarker discovery. We first briefly discuss the necessary preliminaries on machine learning techniques commonly used in biomarker discovery (e.g., clustering and support vector machines, SVMs), and then describe the biological background on biomarkers. We further examine the integration of computational biology approaches with biomarker research. Finally, we conclude with a discussion of the key challenges facing computational biology in biomarker discovery.

Share and Cite:

Yousef, M. , Najami, N. , Abedallah, L. and Khalifa, W. (2014) Computational Approaches for Biomarker Discovery. Journal of Intelligent Learning Systems and Applications, 6, 153-161. doi: 10.4236/jilsa.2014.64012.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.