TITLE:
Gene Expression Model for the Disease Prediction with Auto-Encoder Model with Classifiers
AUTHORS:
Arjun Kunwar, Shulin Wang
KEYWORDS:
Auto-Encoder, Stacked Voting, Classification, Cancer Diagnosis, Gene Expression, Prediction
JOURNAL NAME:
Journal of Biosciences and Medicines, Vol.13 No.3, March 14, 2025
ABSTRACT: Gene expression is the process through which genetic information in DNA is converted into functional products, primarily proteins. It involves two main steps: transcription, in which DNA is copied into messenger RNA (mRNA), and translation, in which ribosomes decode the mRNA to synthesize proteins. Gene expression is tightly regulated to ensure proper cellular function, and its analysis is vital in fields such as cancer research, drug development, and genetic engineering. This paper therefore proposes a Voting-based Stacked Denoising Auto-encoder (VSDA) for the prediction of diseases. The VSDA model uses a stacked architecture within the auto-encoder for accurate prediction on gene expression data. The paper investigates the performance of four machine learning classifiers, namely Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbours (KNN), and Multi-Layer Perceptron (MLP), on a cancer diagnosis dataset, reporting Precision, Recall, F1-Score, and Support across multiple cancer types. Our results show that MLP achieves the highest overall performance, with an average Precision of 0.92, Recall of 0.75, and F1-Score of 0.74. SVM follows closely with an average Precision of 0.89, Recall of 0.78, and F1-Score of 0.79, demonstrating strong reliability, particularly for cancers such as LUAD, KIRC, and THCA. RF achieves an average Precision of 0.75, Recall of 0.68, and F1-Score of 0.66, indicating balanced performance but slightly lower accuracy than SVM and MLP. KNN, while performing well on certain cancer types, has the lowest overall F1-Score (0.60) and Precision (0.71), and shows greater variability across cancer types. These results underscore the superiority of MLP in most scenarios, with SVM offering a competitive alternative for specific cancers.
The study highlights the importance of classifier selection based on specific cancer datasets, with the goal of improving diagnostic accuracy and supporting clinical decision-making.
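
The classifier comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses a synthetic matrix from `make_classification` as a stand-in for gene expression data, picks arbitrary hyperparameters, and omits the denoising auto-encoder stage entirely. The hard-voting ensemble is likewise an assumption, since the abstract does not say how votes are combined.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (samples x genes).
# Assumption: this is NOT the paper's dataset; the sizes are arbitrary.
X, y = make_classification(
    n_samples=300, n_features=50, n_informative=15,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def build_classifiers():
    """Return fresh, unfitted instances of the four classifiers named in the abstract.

    Hyperparameters are illustrative assumptions, not the paper's settings.
    """
    return {
        "SVM": make_pipeline(StandardScaler(), SVC(random_state=0)),
        "RF": RandomForestClassifier(n_estimators=100, random_state=0),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "MLP": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
        ),
    }


def evaluate(models, X_train, y_train, X_test, y_test):
    """Fit each model and return macro-averaged precision, recall, and F1."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_test, model.predict(X_test), average="macro", zero_division=0
        )
        scores[name] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


models = build_classifiers()
# Hypothetical majority-vote ensemble over fresh copies of the same four classifiers.
models["Voting"] = VotingClassifier(
    estimators=list(build_classifiers().items()), voting="hard"
)

scores = evaluate(models, X_train, y_train, X_test, y_test)
for name, s in scores.items():
    print(f"{name}: P={s['precision']:.2f} R={s['recall']:.2f} F1={s['f1']:.2f}")
```

Macro averaging weights every class equally, which matches the abstract's habit of quoting one average per classifier across cancer types. The scores printed here come from synthetic data and say nothing about the values reported in the paper.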