Identification of a 12-Gene Signature for Lung Cancer Prognosis through Machine Learning
Erin Bard, Wei Hu
DOI: 10.4236/jct.2011.22017


Personalized medicine is critical for lung cancer treatment. Several gene signatures that classify lung cancer patients as high- or low-risk for cancer recurrence have been reported. The aim of this study is to identify a novel gene signature that predicts recurrence risk for non-small cell lung cancer patients more accurately than previous research and that clearly differentiates the high- and low-risk groups. To accomplish this, we employed an ensemble of feature selection algorithms, an ensemble of classification algorithms, and a genetic algorithm (an evolutionary search algorithm). Compared to one previous study, our 12-gene signature classifies patients more accurately in the training set (n = 256), 57.32% compared to 50.78%, and in the two test sets (n = 104 and n = 82), 67.07% compared to 54.9% and 57.32% compared to 54.8%, where prediction accuracy is the average over the four classifiers. Kaplan-Meier analysis of high- and low-risk patients showed statistically significant risk differentiation by our 12-gene signature in each data set: the log-rank p-value was less than 0.001 for the training set and less than 0.05 for each of the two test sets. Analysis of the posterior probabilities revealed a strong correlation between 5-year survival and the 12-gene signature. In addition, functional pathway analysis uncovered associations between the 12-gene signature and cancer-causing genes reported in the literature.
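The abstract names a genetic algorithm scored by the average accuracy of several classifiers but gives no implementation details. The following is a minimal sketch of that general idea, not the authors' method: the toy data, the two stand-in classifiers (nearest centroid and 1-nearest-neighbor, both with leave-one-out evaluation), the signature size `k`, and all function names and parameters are assumptions made for illustration.

```python
import random

def make_toy_data(n_samples=40, n_genes=20, informative=(2, 7, 11), seed=1):
    """Hypothetical expression matrix: a few genes shift with the risk label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # 0 = low risk, 1 = high risk (toy labels)
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        for g in informative:
            row[g] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y

def nearest_centroid_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid rule on the chosen genes."""
    correct = 0
    for i in range(len(X)):
        dists = {}
        for label in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroid = [sum(r[g] for r in rows) / len(rows) for g in genes]
            dists[label] = sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroid))
        pred = 0 if dists[0] <= dists[1] else 1
        correct += pred == y[i]
    return correct / len(X)

def one_nn_accuracy(X, y, genes):
    """Leave-one-out accuracy of a 1-nearest-neighbor rule on the chosen genes."""
    correct = 0
    for i in range(len(X)):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum((X[i][g] - X[j][g]) ** 2 for g in genes))
        correct += y[nearest] == y[i]
    return correct / len(X)

def evolve(X, y, n_genes, k=3, pop_size=12, generations=15, seed=0):
    """Search for a k-gene subset; fitness is the mean accuracy over the scorers."""
    rng = random.Random(seed)
    scorers = [nearest_centroid_accuracy, one_nn_accuracy]  # stand-in ensemble

    def fitness(genes):
        return sum(s(X, y, genes) for s in scorers) / len(scorers)

    pop = [tuple(sorted(rng.sample(range(n_genes), k))) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism: the best subset always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(set(a) | set(b))
            child = set(rng.sample(pool, k))  # crossover: draw from both parents
            if rng.random() < 0.3:  # mutation: swap one gene for a random one
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([g for g in range(n_genes) if g not in child]))
            children.append(tuple(sorted(child)))
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best), history

X, y = make_toy_data()
best, score, history = evolve(X, y, n_genes=20)
print("selected genes:", best, "mean accuracy:", round(score, 3))
```

Because the best subset is carried over unchanged each generation, the best fitness recorded in `history` never decreases. A real analysis would score candidate signatures on held-out data rather than on the same samples used for the search.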

Share and Cite:

E. Bard and W. Hu, "Identification of a 12-Gene Signature for Lung Cancer Prognosis through Machine Learning," Journal of Cancer Therapy, Vol. 2 No. 2, 2011, pp. 148-156. doi: 10.4236/jct.2011.22017.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.