TITLE:
A Novel Treatment Optimization System and Top Gene Identification via Machine Learning with Application on Breast Cancer
AUTHORS:
Yuhang Wu, Yang Chen
KEYWORDS:
Machine Learning, Genomics, Treatment Selection, Dimension Reduction, Gene Selection, Cross Validation, Breast Cancer
JOURNAL NAME:
Journal of Biomedical Science and Engineering, Vol.11 No.5, May 30, 2018
ABSTRACT: Traditional cancer treatment selection relies mainly on clinical observation and physicians' judgment, yet most outcomes remain hard to predict. Through Genomics Topology, we use the clinical and gene information of 272 breast cancer patients as an example to propose a treatment optimization and top gene identification system. The data pose challenges such as collinearity and the Curse of Dimensionality; following the idea of Analysis of Variance (ANOVA), Principal Component Analysis (PCA) is applied to address them. Several genes, for example SLC40A1 and ACADSB, are found to be both statistically significant and supported by biological studies. The model developed predicts breast cancer mortality, recurrence time, and survival time with an average MSE of 3.697, an accuracy rate of 88.97%, and an F1 score of 0.911. The results and methodology of this study offer a route toward more precise machine-learning prediction of other cancer outcomes and may assist in the discovery of targetable pathways for next-generation cancer treatment methods.
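For readers unfamiliar with the three metrics reported above, the following is a minimal sketch of how MSE, accuracy, and F1 score are conventionally computed. It is not the authors' code: the function names and the toy values are hypothetical, and it assumes that MSE is applied to the continuous outcomes (recurrence and survival time) while accuracy and F1 are applied to a binary outcome such as mortality.

```python
# Illustrative definitions only; all data below are invented toy values,
# not results from the study.

def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical binary outcome (1 = event, 0 = no event).
labels_true = [1, 0, 1, 1, 0, 1, 0, 0]
labels_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Hypothetical continuous outcome (e.g. time in years).
times_true = [2.0, 5.0, 3.0]
times_pred = [3.0, 5.0, 1.0]

toy_accuracy = accuracy(labels_true, labels_pred)
toy_f1 = f1_score(labels_true, labels_pred)
toy_mse = mean_squared_error(times_true, times_pred)
```

Accuracy alone can look high when one class dominates, so reporting F1 alongside it, as the abstract does, gives a view that also accounts for missed and falsely flagged positive cases.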