A Localized-Statistic-Based Approach for Biomarker Identification of Omics Data

Abstract

Omics data give molecular biology and systems biology an essential means of capturing the systematic properties of the inner activities of cells. One of the greatest challenges facing biological researchers is finding methods to discover biomarkers for tracking the progression of diseases such as cancer. Feature selection methods have therefore been widely used for biomarker discovery. However, omics data usually contain a large number of features but a small number of samples, and some omics data are distributed over a wide range, so feature selection on omics data remains difficult. To overcome these problems, we present a computational method called localized statistic of abundance distribution based on Gaussian window (LSADBGW) to test the significance of each feature. Experiments on three datasets, including gene and protein datasets, showed the accuracy and efficiency of LSADBGW for feature selection.
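
The abstract names the method but does not give its formulas, so the following is only a rough sketch of what a Gaussian-window localized statistic could look like, not the published LSADBGW definition. It assumes each feature has an abundance value and a log-ratio between two conditions, and that a feature is scored against other features of similar abundance. The function names, the `bandwidth` parameter, and the normal-tail p-value are all illustrative assumptions.

```python
import math


def localized_z_scores(abundance, log_ratio, bandwidth=1.0):
    """Score each feature against neighbours of similar abundance.

    Assumption (not taken from the paper): weights are Gaussian in the
    abundance difference, and the score is the feature's log-ratio
    standardised by the weighted local mean and standard deviation.
    """
    scores = []
    for a_i, r_i in zip(abundance, log_ratio):
        # Gaussian window centred on this feature's abundance.
        weights = [
            math.exp(-0.5 * ((a_j - a_i) / bandwidth) ** 2) for a_j in abundance
        ]
        total = sum(weights)
        local_mean = sum(w * r for w, r in zip(weights, log_ratio)) / total
        local_var = (
            sum(w * (r - local_mean) ** 2 for w, r in zip(weights, log_ratio))
            / total
        )
        local_sd = math.sqrt(local_var)
        # Guard against a flat neighbourhood, where no spread can be estimated.
        scores.append((r_i - local_mean) / local_sd if local_sd > 1e-12 else 0.0)
    return scores


def two_sided_p(z):
    """Two-sided tail probability of a standard normal for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical toy data: quiet low-abundance features with one outlier
# (index 4), and noisier high-abundance features.
abundance = [1.0, 1.1, 1.2, 1.3, 1.4, 5.0, 5.1, 5.2, 5.3, 5.4]
log_ratio = [0.1, -0.1, 0.05, -0.05, 2.0, 1.0, -1.0, 0.8, -0.8, 0.9]
z = localized_z_scores(abundance, log_ratio, bandwidth=0.5)
p = [two_sided_p(v) for v in z]
```

Because the mean and spread are estimated locally, the score depends on how variable the features of comparable abundance are, which is one way to cope with data spread over a wide range. The choice of `bandwidth` is left open here and would need tuning for real data.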

Share and Cite:

Zhang, K., Chen, H. and Li, Y. (2013) A Localized-Statistic-Based Approach for Biomarker Identification of Omics Data. Engineering, 5, 433-439. doi: 10.4236/eng.2013.510B089.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.