Advancing Early Detection of Colorectal Adenomatous Polyps via Genetic Data Analysis: A Hybrid Machine Learning Approach
1. Introduction
Globally, colorectal cancer (CRC) is the third most common type of cancer and the second leading cause of cancer deaths in adults [1]-[4]. In 2020, the documented worldwide incidence of CRC was 1,931,590 cases (men: 1,065,960; women: 865,630) [5], and total CRC mortality was 935,173 (men: 515,637; women: 419,536) [5]. The Global Cancer Observatory estimates that by 2040 this number will be close to 1,919,534 [6]. Accurate diagnosis and treatment of CRC patients is an enormous challenge because of the disease’s complexity and variability [7]. CRC arises through a linear progression from normal colonic epithelium to adenoma, followed by carcinoma transformation and metastasis. The risk of CRC can be greatly diminished if malignant polyps are appropriately identified and promptly removed and treated [1] [8] [9]. Several studies have demonstrated a correlation between a high adenoma detection rate and a reduced risk of invasive CRC, which is the main cause of mortality. If precancerous polyps (adenomas) are detected early and removed, CRC can be prevented [10]-[12]. Therefore, it is crucial to spot precancerous lesions such as adenomatous polyps, as well as CRC, as early as feasible. Current diagnostic procedures include stool-based screening, colonoscopy, and histology, each of which has its own limitations.
Stool-based screening is currently the test most frequently used worldwide to detect CRC early [13] [14]. This kind of test looks for blood in the stool or analyzes stool DNA for indications of a colorectal polyp or CRC. These tests have the appealing merit of being less intrusive and simple to perform; however, they must be carried out more frequently [15]. Furthermore, this screening exhibits poor sensitivity to adenoma lesions, so these assays are insufficient for adenoma screening [16]. With its superior sensitivity, specificity, and direct visualization, colonoscopy is regarded as the gold standard for CRC screening and is currently seen as crucial for the detection, diagnosis, and removal of cancerous and precancerous lesions [17]. M. Tharwat et al. [8] surveyed the use of artificial intelligence, including deep learning (DL) and machine learning (ML), in the diagnosis of CRC and found that most research in this field is based on colonoscopy and histology. However, colonoscopy comes with certain limitations. It requires expert manual examination, which is subject to a variety of errors [9]. In addition, a colonoscopy may miss some tiny polyps that may develop into CRC in the future [14].
Several studies have explored other methods to overcome these limitations. Ying Su et al. [17] used gene expression data with ML, including random forest (RF) and support vector machines (SVM), for colon cancer staging. Their method classifies CRC and colon metastasis into five distinct stages (0, I, II, III, and IV) according to the American Cancer Society CRC survivorship care guidelines [18]. Koppad S. et al. [19] created a predictive strategy employing several ML techniques to find a set of genes that may serve as potential CRC diagnostic biomarkers. Following a similar route, Lacalamita et al. [20] applied AI algorithms such as Linear Model, RF, k-Nearest Neighbors (k-NN), and Artificial Neural Networks to a Gene Expression Omnibus (GEO) dataset [21] to distinguish adenoma tissue from primary CRC. Their best classification algorithm was k-NN. These studies share a limitation: the data are used as is, without addressing the imbalance among the three classes of CRP. Raghav et al. [22] applied an unsupervised learning methodology that used hierarchical clustering and feature selection (FS) to identify distinct molecular subtypes. Using gene expression data from patients with CRC, their model achieved an accuracy of 89% after feature selection. Chen et al. [23] developed a functional evolution network to examine the dysfunctions occurring during CRC stages. By investigating gene modules and their molecular functions, they identified cellular functions that shed light on the evolution of CRC staging. P Sun et al. [7] presented a deep neural architecture search model to diagnose consensus molecular subtypes from gene expression data. Their model searches and optimizes the neural network architecture using the ant colony algorithm, a heuristic swarm intelligence method.
CRC remains a significant public health concern, and early detection and diagnosis are crucial for improving patient treatment outcomes. Previous studies have not adequately addressed the issue of imbalanced data among the different classes of colorectal polyps (CRP). Tackling this data imbalance is necessary to improve the performance and accuracy of ML methods for CRC detection and diagnosis. Furthermore, although existing methods for CRC detection and diagnosis have shown promise, there is still room to improve their performance. Refining these methods would enhance their applicability and reliability in clinical settings.
The reviewed literature shows that previous studies have primarily focused on detecting CRC, staging CRC, and identifying genes associated with CRC diagnosis. This study takes a proactive approach: it aims at the early detection of precancerous colorectal polyps (CRP) by developing an advanced ML approach that incorporates genetic data to analyse CRP, thereby enhancing the possibility of CRC prevention through early detection and diagnosis of precancerous CRP as well as diagnosis of CRC. While this study builds on the common ground of the reviewed studies, it makes several notable contributions:
The introduction of a method that combines RF and SVM and achieves high accuracy in classifying CRP. This approach effectively distinguishes between normal instances and adenomatous CRP (ACRP).
The use of genetic data analysis, which enhances the precision of detection and provides a more comprehensive understanding of CRP diagnosis for the prevention of CRC.
The validation of the proposed method on a larger publicly available dataset, which has previously been confirmed by physicians [24] as relevant to CRC.
The identification of specific genes associated with the detection and classification of CRP, which sheds light on the underlying molecular mechanisms involved in this process.
The study successfully narrowed down the list of genes associated with CRP classification from an initial count of 13,670 genes to a more focused set of 186 genes. These genes are identified as the most relevant and closely linked to the detection and classification of CRPs.
This paper is organized as follows: Section 2 describes the data acquisition, Section 3 details the methodology and approach used in this study, Section 4 presents the experimental results and discusses the findings, and Section 5 concludes the study.
2. Data Acquisition
A public dataset of 705 microarray samples was obtained from GEO data available online. The dataset is aggregated across 12 independent studies and comprises 231 normal, 132 adenoma, and 342 CRC tissue samples. To overcome the imbalance of the dataset, the Synthetic Minority Over-sampling Technique (SMOTE) algorithm [25] is applied to oversample the minority classes until each has as many samples as the largest class. After SMOTE, all categories contain 342 samples (342 normal, 342 adenoma, and 342 CRC), giving a total of 1026 instances. The resulting dataset is partitioned into 820 training and 206 testing samples, corresponding to an 80%/20% split.
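The oversampling and split arithmetic above can be illustrated with a minimal sketch of SMOTE's core idea (interpolating between a minority sample and one of its nearest minority neighbours). This is not the authors' implementation: the function name `smote_oversample`, the choice of k = 3, and the two-dimensional toy vectors are assumptions made for illustration; only the class sizes come from the text.

```python
import math
import random

def smote_oversample(minority, n_target, k=3, seed=0):
    """Grow `minority` (a list of feature vectors) to n_target samples by
    interpolating between a sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    while len(minority) + len(synthetic) < n_target:
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other original minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return minority + synthetic

# Class sizes as reported in the text; the feature vectors are toy placeholders.
counts = {"normal": 231, "adenoma": 132, "CRC": 342}
target = max(counts.values())
rng = random.Random(1)
data = {c: [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
        for c, n in counts.items()}

balanced = {c: smote_oversample(v, target) for c, v in data.items()}
total = sum(len(v) for v in balanced.values())
n_train = int(0.8 * total)   # 80% for training
n_test = total - n_train     # remaining 20% for testing
print({c: len(v) for c, v in balanced.items()}, total, n_train, n_test)
```

With the reported class sizes, balancing every class to 342 samples gives 1026 instances, and an 80/20 split yields the 820 training and 206 testing samples stated above.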
3. Methodology
This section covers feature selection, random forest, support vector machine, the proposed hybrid ML technique, and performance evaluation techniques. Figure 1 illustrates an overview of the proposed methodology for classifying CRPs as normal, adenoma, or carcinoma using gene expression (GE) data.
3.1. Feature Selection
The FS process is one of the most robust pre-processing methods. It reduces the dimensionality of the classification data by eliminating redundant and irrelevant features. By selecting a relevant subset from a large set of attributes, FS enhances classification accuracy and reduces CPU time and memory needs. In ML, FS takes three forms [26]: 1) wrapper, 2) filter, and 3) intrinsic or hybrid methods. In filter methods, every feature subset is assessed using a general evaluation function. Wrapper methods use a learning algorithm or classifier to evaluate how important the selected feature subset is. Wrapper-based techniques sometimes show superiority over filter approaches [27]. Hybrid methods are computationally efficient. In this study, we applied the supervised wrapper FS method [28] to select important features of the microarray gene expression data for CRC classification. Further, the genetic algorithm (GA) optimizer [29] is employed as a meta-heuristic feature selector, and a classification algorithm is used to evaluate the selected attributes.
Figure 1. A flowchart of the proposed methodology overview.
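To make the wrapper idea concrete, the following sketch runs a small genetic algorithm over binary feature masks and scores each mask with a classifier. It is an illustration under stated assumptions, not the procedure of [28] or [29]: the nearest-centroid classifier, the toy data from `make_toy_data`, the one-point crossover, and the mutation rate `p_mut` are all hypothetical choices. The population size of 20 and the 50 iterations follow the settings reported later in the text.

```python
import random

def make_toy_data(rng, n=60, n_features=8):
    """Two classes; only features 0 and 1 carry signal (toy stand-in for GE data)."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3.0 * label
        row[1] -= 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X, y, mask):
    """Wrapper fitness: accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    half = len(X) // 2
    train, test = range(half), range(half, len(X))
    centroids = {}
    for label in (0, 1):
        rows = [X[i] for i in train if y[i] == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for i in test:
        dists = {lab: sum((X[i][j] - c) ** 2 for j, c in zip(cols, cen))
                 for lab, cen in centroids.items()}
        correct += min(dists, key=dists.get) == y[i]
    return correct / len(test)

def ga_select(X, y, pop_size=20, iterations=50, p_mut=0.1, seed=0):
    """Evolve binary masks (1 = include feature, 0 = exclude) towards higher fitness."""
    rng = random.Random(seed)
    n = len(X[0])
    fitness = lambda m: centroid_accuracy(X, y, m)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(iterations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the fitter half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation with probability p_mut per position
            children.append([bit ^ (rng.random() < p_mut) for bit in child])
        pop = parents + children
    return max(pop, key=fitness)

rng = random.Random(42)
X, y = make_toy_data(rng)
mask = ga_select(X, y)
X_selected = [[v for v, bit in zip(row, mask) if bit] for row in X]
print(mask, centroid_accuracy(X, y, mask))
```

The returned mask plays the role of the selected-features vector: columns whose bit is 1 are kept for the classification stage.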
3.2. Random Forest Classifier
The RF classifier is an effective ensemble classifier that combines a set of classification and regression trees (CART) to make a prediction [30]-[32]. For the k-th tree in the forest, a random vector θ_k is generated, and each tree is grown using the training data and its vector θ_k [33]. A new instance is classified by majority vote over the results of the combined decision trees. The generalization error is computed as follows:

$$PE^{*} = P_{X,Y}\big(mg(X, Y) < 0\big)$$

where the subscripts X, Y indicate that the probability is taken over the X, Y space, and mg denotes the margin function, which measures the extent to which the average number of votes for the right class exceeds the average vote for any other class. The mg function is defined as follows:

$$mg(X, Y) = \mathrm{av}_k\, I\big(h_k(X) = Y\big) - \max_{j \neq Y} \mathrm{av}_k\, I\big(h_k(X) = j\big)$$

where h_k denotes the k-th tree, I(·) is the indicator function, and av_k is the average over the trees.
The RF method has two hyperparameters, strength and correlation. The former indicates the accuracy of the individual classification trees, while the latter measures the dependence between the classification trees.
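As a small worked example of the margin described above, the sketch below computes the margin of one instance from the votes of a ten-tree forest. The vote counts and the helper name `margin` are hypothetical; a positive margin means the majority vote is correct, a negative one means the instance is misclassified.

```python
from collections import Counter

def margin(votes, true_label):
    """Fraction of trees voting for the true class minus the largest
    fraction of votes received by any other class."""
    counts = Counter(votes)
    n = len(votes)
    right = counts.get(true_label, 0) / n
    wrong = max((c / n for lab, c in counts.items() if lab != true_label),
                default=0.0)
    return right - wrong

# Hypothetical votes of ten trees for a single sample.
votes = ["adenoma"] * 6 + ["normal"] * 3 + ["CRC"] * 1

print(margin(votes, "adenoma"))  # true class wins the vote: positive margin
print(margin(votes, "normal"))   # true class loses the vote: negative margin
```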
3.3. Support Vector Machine
Support vector machine (SVM) is a binary and multi-class classification algorithm. SVM employs a discriminant hyperplane to separate data classes. The hyperplane is chosen to maximize the margin and to reduce the prediction error on new instances, based on the support vectors identified in the training data. The SVM can learn both linearly separable and non-linear data by applying kernel functions [34]. The SVM algorithm has two main hyperparameters that need to be tuned properly to obtain good performance [35]. The regularization hyperparameter C controls the trade-off between the width of the hyperplane margin and the number of misclassified samples. The smaller the C value, the larger the margin. A large C leads to a small hyperplane margin and a smaller number of misclassified points. The C hyperparameter is defined as follows:
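A standard form in which C appears, assumed here to be the one intended, is the soft-margin primal objective. The symbols w, b, ξ_i, and φ are the conventional ones and are not taken from the source text:

$$\min_{w,\, b,\, \xi}\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i\big(w^{\top}\phi(x_i) + b\big) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; i = 1, \dots, n$$

Here the slack variables ξ_i measure margin violations, so a larger C penalizes misclassified points more heavily at the cost of a narrower margin, in line with the description above.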
The second SVM hyperparameter is the kernel function, which maps the input space to a high-dimensional feature space so that non-linear data can be separated [36].
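The role of the kernel can be sketched with two common choices, the linear and the radial basis function (RBF) kernel. The text does not state which kernel was used, so both functions, the value `gamma=0.5`, and the XOR-style toy points are assumptions for illustration only.

```python
import math

def linear_kernel(x, z):
    """Plain inner product: no implicit feature mapping."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - z||^2); equals 1 when x == z."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# XOR-style points: not linearly separable in the input space.
points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]

# Gram matrix of pairwise similarities under the RBF kernel.
gram = [[rbf_kernel(p, q) for q in points] for p in points]
for row in gram:
    print([round(v, 3) for v in row])
```

An SVM only ever accesses the data through such pairwise kernel values, which is what allows it to separate non-linear data without constructing the high-dimensional feature space explicitly.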
3.4. Classification Ensemble Model Based on the Tuned Hyperparameters
A classification ensemble model based on tuned hyperparameters (CEM-TH) is proposed. This method ensembles two classifiers, SVM and RF, through a tuned combination of weights and the internal hyperparameters of each classifier. The combination of weights is optimized over k-fold cross-validation (KFCV) using a heuristic optimization algorithm. The hyperparameters are tuned separately for each base classifier using the meta-heuristic Grey Wolf optimizer (GWO) [37]. In the training stage, each base classifier is trained on the training data and its hyperparameters are tuned using the KFCV approach. After evaluation, the prediction of each learner is weighted by the corresponding weight obtained by GWO. This weighted prediction step strengthens knowledge sharing between the trained base learners: specific prediction samples are promoted to the final prediction vector (Y). The selection is controlled by an exploration vector (A), generated randomly with the same length as the prediction samples. The higher the weight assigned to a learner, the more of its prediction samples are selected by the vector (A) and kept in the final prediction (Y). The process concludes by evaluating the CEM-TH predictions held in vector (Y). These steps are repeated, evolving the weights of each base learner, to reach maximum accuracy. The procedure is given in Algorithm 1.
![]()
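One possible reading of the weighted selection step is sketched below. It is not Algorithm 1 itself: the rule "take the RF prediction when A[i] falls below RF's share of the total weight" is an assumed interpretation, the plain random search in `search_weights` is a stand-in for the GWO step, and all labels and names are hypothetical.

```python
import random

def combine(pred_rf, pred_svm, w_rf, w_svm, rng):
    """Build the final prediction vector Y. The exploration vector A holds one
    uniform random number per sample and decides which learner supplies it."""
    share_rf = w_rf / (w_rf + w_svm)
    A = [rng.random() for _ in pred_rf]
    return [r if a < share_rf else s for r, s, a in zip(pred_rf, pred_svm, A)]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def search_weights(y_true, pred_rf, pred_svm, trials=200, seed=0):
    """Stand-in for GWO: random search over weight pairs that sum to 1."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        w_rf = rng.random()
        w_svm = 1.0 - w_rf
        y_final = combine(pred_rf, pred_svm, w_rf, w_svm, rng)
        acc = accuracy(y_true, y_final)
        if best is None or acc > best[2]:
            best = (w_rf, w_svm, acc)
    return best

# Hypothetical validation labels and base-learner predictions.
y_true   = ["normal", "adenoma", "CRC", "adenoma", "normal", "CRC"]
pred_rf  = ["normal", "adenoma", "CRC", "normal",  "normal", "CRC"]
pred_svm = ["normal", "CRC",     "CRC", "adenoma", "normal", "CRC"]

w_rf, w_svm, acc = search_weights(y_true, pred_rf, pred_svm)
print(round(w_rf, 3), round(w_svm, 3), acc)
```

Under this reading, a learner with a larger weight contributes proportionally more entries to Y, which matches the description that a higher weight leads to more of that learner's prediction samples being kept.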
The resulting dataset of 1026 microarray samples is partitioned into training and testing sets with an 80/20 split (820 training and 206 testing samples). The training data is fed into the classifiers RF, SVM, and CEM-TH. A set of metrics is employed to evaluate the performance of each classification model. To ensure a fair comparison, FS with the GA optimizer is applied with all classifiers under the same hyperparameter settings. The termination criteria are as follows: the GA runs for a maximum of 50 iterations with a population size of 20 agents. Moreover, the GA is employed to optimize the hyperparameters of each classifier so that each classification algorithm approaches its best performance. Each classifier is trained on the training dataset, and its hyperparameters are then tuned using k-fold cross-validation (CV). As a validation approach, CV enables the classifier to explore the features of the training data effectively. The classification algorithm is trained on (k-1) folds, while the remaining fold is used to evaluate the training results. In this study, the training data is cross-validated with k set to five folds.
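The five-fold scheme can be sketched with a small index generator. The helper `k_fold_indices` and the shuffling seed are hypothetical; only the 820 training samples and k = 5 come from the text.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs: each fold is held out once for
    validation while the remaining k-1 folds are used for training."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, val

splits = list(k_fold_indices(820, k=5))
for fold_no, (train, val) in enumerate(splits, start=1):
    print(fold_no, len(train), len(val))
```

With 820 training samples and five folds, each iteration fits on 656 samples and validates on the remaining 164.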
3.5. Performance Evaluation Techniques
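To determine which technique performed best, we employed several evaluation methods, including the confusion matrix [38], the area under the curve (AUC), and the paired two-sample t-test for means. These evaluation measures were applied to all the techniques under consideration. The equations used to calculate these metrics are as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives for the class under consideration.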
To ensure a comprehensive assessment of both the proposed methodology and other techniques, we calculated conventional evaluation metrics for the results of each technique. These metrics encompassed accuracy, precision, recall, F1-Score, and specificity. The evaluation was conducted using the “caret” package in R. Through the rigorous application of these evaluation techniques, our objective was to identify the technique that demonstrated superior performance among the alternatives.
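For a three-class problem, these metrics are usually computed per class in a one-vs-rest fashion from the confusion matrix. The sketch below shows that calculation in Python rather than with the caret package; the confusion matrix values are hypothetical and are not the results of this study.

```python
def one_vs_rest_counts(cm, k):
    """Return (TP, TN, FP, FN) for class index k of a square confusion matrix
    whose rows are true classes and whose columns are predicted classes."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                 # class k samples predicted as another class
    fp = sum(row[k] for row in cm) - tp  # other classes predicted as class k
    tn = total - tp - fn - fp
    return tp, tn, fp, fn

def class_metrics(cm, k):
    tp, tn, fp, fn = one_vs_rest_counts(cm, k)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
    }

# Hypothetical confusion matrix; class order: normal, adenoma, CRC.
cm = [
    [50, 2, 1],
    [3, 40, 4],
    [0, 5, 60],
]

for k, name in enumerate(["normal", "adenoma", "CRC"]):
    print(name, {m: round(v, 3) for m, v in class_metrics(cm, k).items()})
```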
4. Experimental Results and Performance Evaluations
4.1. Results of Feature Selection
The proposed approach used the publicly available GEO dataset. FS is applied to obtain the vector of selected features. The GA encodes this vector as a binary feature vector of length 13,670, equal to the number of attributes in the input gene expression dataset. Each vector value is either 1 for “include” or 0 for “exclude” for the corresponding attribute. The vector of selected features is then passed to the classification stage. A total of 186 genes were chosen from the pool of 13,670 genes. Table 1 presents the names of the selected genes, which were found to be most closely associated with the detection and classification of CRPs. These genes play a crucial role in understanding the molecular mechanisms and biological processes involved in CRP classification.
4.2. Results of Applying the ML Classifiers with Feature Selection
In this section, we present the outcomes obtained with the proposed methodology and with the other ML classifiers, RF and SVM, with feature selection applied to all classifiers. Our objective is to determine the most effective approach. To this end, we validate the performance of the classifiers using a range of classification metrics: accuracy, precision, recall, F1-score, specificity, confusion matrix, t-test, and AUC.
The classification performance of each classifier in normal, adenoma, and CRC cases is displayed in Tables 2-4, respectively. Table 2 presents the performance for normal cases. Among the classifiers tested, the proposed CEM-TH exhibited the highest performance, followed by RF and then SVM. The CEM-TH classifier achieved an accuracy of 98.6%, followed by RF with an accuracy of 97.9%, while SVM achieved an accuracy of 96.5%. On precision, recall, and F1-score, CEM-TH gave the best results, and it matched the other classifiers on specificity (98%).
The classification performance for adenoma cases is presented in Table 3. The proposed CEM-TH classifier achieved the highest performance across all evaluation metrics, with accuracy, precision, recall, F1-score and specificity rates of
Table 1. Names (probe set identifiers) of genes associated with detecting and classifying CRPs.
1552263_at | 1552680_a_at | 1555058_a_at | 1555935_s_at | 1565951_s_at | 1568623_a_at |
200790_at | 200831_s_at | 200982_s_at | 201088_at | 201152_s_at | 201516_at |
201773_at | 202226_s_at | 202450_s_at | 202636_at | 202813_at | 203295_s_at |
203370_s_at | 203881_s_at | 203968_s_at | 203997_at | 204072_s_at | 204235_s_at |
204559_s_at | 205089_at | 205141_at | 205238_at | 206153_at | 206656_s_at |
207112_s_at | 207223_s_at | 207509_s_at | 207620_s_at | 207705_s_at | 208018_s_at |
208688_x_at | 208891_at | 209016_s_at | 209082_s_at | 209198_s_at | 209379_s_at |
209496_at | 209616_s_at | 209652_s_at | 209780_at | 209822_s_at | 209832_s_at |
209901_x_at | 209925_at | 210115_at | 210467_x_at | 210754_s_at | 210935_s_at |
211302_s_at | 211367_s_at | 211656_x_at | 211734_s_at | 211996_s_at | 212276_at |
212316_at | 212398_at | 212601_at | 212801_at | 213012_at | 213610_s_at |
213766_x_at | 213959_s_at | 214155_s_at | 214431_at | 214567_s_at | 214792_x_at |
214866_at | 214975_s_at | 215099_s_at | 215633_x_at | 216022_at | 216247_at |
216250_s_at | 216973_s_at | 217179_x_at | 217232_x_at | 217884_at | 218145_at |
218284_at | 218418_s_at | 218455_at | 219155_at | 219476_at | 219856_at |
219890_at | 219908_at | 219909_at | 220074_at | 220182_at | 220206_at |
220413_at | 221019_s_at | 221088_s_at | 221896_s_at | 222642_s_at | 222695_s_at |
222790_s_at | 223274_at | 223452_s_at | 223679_at | 224176_s_at | 224516_s_at |
224590_at | 224759_s_at | 224796_at | 224990_at | 225012_at | 225030_at |
225291_at | 225507_at | 225544_at | 225568_at | 225664_at | 225667_s_at |
225829_at | 225872_at | 225898_at | 225943_at | 226187_at | 226223_at |
226269_at | 226384_at | 226930_at | 227433_at | 227569_at | 227624_at |
227657_at | 227725_at | 227926_s_at | 227962_at | 228003_at | 228090_at |
228155_at | 228245_s_at | 228262_at | 228333_at | 228355_s_at | 228937_at |
228990_at | 229061_s_at | 229674_at | 230099_at | 230204_at | 230333_at |
230895_at | 231399_at | 231829_at | 231906_at | 232103_at | 232213_at |
232465_at | 233700_at | 233857_s_at | 235076_at | 235190_at | 235456_at |
235740_at | 235783_at | 235948_at | 236216_at | 236894_at | 237459_at |
238017_at | 238142_at | 238625_at | 238673_at | 239069_s_at | 239761_at |
239811_at | 241036_at | 241956_at | 242814_at | 243140_at | 243303_at |
243386_at | 244261_at | 45297_at | 91826_at | 213424_at | AFFX.PheX.5_at |
Table 2. Comparison of classification results for normal cases in CRP examination using various classifiers with the modified FS methodology.
Method | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | Specificity (%) |
RF | 97.9 | 93 | 98 | 96 | 98 |
SVM | 96.5 | 91 | 98 | 94 | 98 |
Proposed | 98.6 | 95 | 99 | 98 | 98 |
Table 3. Comparison of classification results for adenoma cases using the same classifiers.
Method | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | Specificity (%) |
RF | 94.3 | 86 | 86 | 86 | 96 |
SVM | 93.6 | 86 | 83 | 84 | 96 |
Proposed | 96.5 | 89 | 93 | 91 | 98 |
96.5%, 89%, 93%, 91% and 98%, respectively. The RF and SVM classifiers also demonstrated strong performance, achieving accuracies of 94.3% and 93.6%, respectively, for predicting adenoma samples. In terms of precision, recall, F1-score and specificity, the proposed CEM-TH classifier likewise outperforms the other classifiers.
Table 4 displays the performance of classifying CRC cases. The proposed CEM-TH classifier with FS achieved the highest or joint-highest value on every evaluation metric, with accuracy, precision, recall, F1-score, and specificity rates of 97.9%, 99%, 96%, 97.9%, and 96%, respectively. SVM also demonstrated strong performance, with an accuracy of 97.2% compared with 96.5% for RF. The proposed CEM-TH classifier yielded higher results than the other classifiers and the existing literature. Furthermore, applying the proposed method improved on the individual classifiers in terms of precision and F1-score while matching the best recall and specificity.
Figure 2 compares the results of the three classifiers in classifying CRP into normal, adenoma, and CRC categories. It shows that the proposed method, which builds on RF and SVM, achieves higher classification accuracy than either classifier used alone.
Figures 3(a)-(c) provide a graphical comparison of the accuracies of the classifiers in analyzing normal, adenoma, and CRC cases, using confusion matrices. These results align with the statistical accuracy findings presented in Tables 2-4. Additionally, the Receiver Operating Characteristic (ROC) curves of these classifiers, illustrating their performance in classifying normal, adenoma, and CRC cases, can be observed in Figures 4(a)-(c), respectively. These ROC curves provide further insights and reinforce the efficacy of the proposed method.
Table 4. Comparison of classification results for CRC cases using the same classifiers.
Method | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | Specificity (%) |
RF | 96.5 | 98 | 94 | 96.5 | 94 |
SVM | 97.2 | 98.1 | 96 | 97.1 | 96 |
Proposed | 97.9 | 99 | 96 | 97.9 | 96 |
Figure 2. Classification accuracy for normal, adenoma, and cancerous types of CRP.
Figure 3. Confusion matrices obtained by RF, SVM and the proposed approach for normal, adenoma, and cancerous types of CRPs.
Figure 4. ROC curves of the competing methods for classifying CRP cases into normal, adenoma, and CRC.
These comparisons show that the proposed CEM-TH classifier matches or surpasses the other classifiers on every evaluation metric. Its performance is particularly notable for normal tissues, where it achieves an accuracy of 98.6%. The CEM-TH classifier also demonstrates the highest performance for adenoma and CRC tissues, with accuracies of 96.5% and 97.9%, respectively. These results confirm that the proposed CEM-TH classifier achieves the highest performance among the compared classifiers.
To further validate the results, a paired two-sample t-test for means was conducted to assess the significance of the proposed method compared with the alternative approaches. For normal tissues, the proposed method demonstrated a significant improvement over RF and SVM (P-value < 0.05), as depicted in Figure 5(a). For adenoma and CRC tissues, the proposed method also exhibited a significant improvement over the other approaches (P-value < 0.05), as shown in the tables presented in Figure 5(b) and Figure 5(c). These findings provide additional evidence of the effectiveness of the proposed method for classifying adenoma and CRC cases.
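The paired t statistic behind such a test can be sketched as follows. The paired scores below are hypothetical (the text does not state which paired measurements were used), and the helper `paired_t_statistic` is illustrative; obtaining the P-value additionally requires the t distribution with n-1 degrees of freedom, for example through a library routine such as SciPy's `scipy.stats.ttest_rel`.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d = a - b.
    Returns the statistic and its degrees of freedom (n - 1)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Hypothetical paired accuracies of two methods on the same five folds.
proposed = [0.98, 0.97, 0.99, 0.96, 0.98]
baseline = [0.96, 0.95, 0.97, 0.95, 0.96]

t_stat, dof = paired_t_statistic(proposed, baseline)
print(round(t_stat, 3), dof)
```

Pairing matters because both methods are scored on the same samples or folds, so the test is applied to the per-pair differences rather than to the two sets of scores independently.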
Figure 5. Comparison of the proposed method with other approaches using p-values in (a) normal, (b) adenoma, and (c) CRC cases.
4.3. Comparison of the Proposed Methodology to the Literature
The study conducted by Lacalamita et al. [20] used a collection of four datasets downloaded from the GEO repository [21], comprising a total of 465 samples. These samples were grouped into three cohorts: 105 normal samples, 155 adenoma samples, and 205 CRC samples. In contrast, the proposed approach examined a dataset of 705 array samples obtained from the GEO datasets [24]. This dataset was aggregated from 12 independent studies and encompassed 231 normal samples, 132 adenoma samples, and 342 CRC tissue samples.
Using a k-NN model, Lacalamita et al. achieved an accuracy of 91.11% and an AUC of 97.6%. P Sun et al. [7] applied a deep neural architecture to gene expression data from eight CRC datasets and achieved an accuracy of 95.48%, a specificity of 98.07%, and an average sensitivity of 96.24%. The proposed method in the current study outperformed these results, achieving an accuracy of 97.7% ± 1.1%, precision of 94.3% ± 5%, recall of 96% ± 3%, F1-score of 95.7% ± 4%, specificity of 97.3% ± 1.2%, average AUC of 97.3% ± 1%, and average p-value of 0.0425 ± 0.0715. A comparison of the remaining performance metrics indicates that the improvement of the proposed method over the existing literature is significant (P = 0.00007).
These comparisons highlight the effectiveness of the proposed method in accurately identifying normal and adenomatous CRP, as well as CRC cases, with the highest accuracy. The results demonstrate how the proposed method can aid in the early detection of ACRP and in the prevention of CRC by identifying adenomatous CRP. Early identification of ACRP can guide physicians in devising strategies for CRC treatment and prevention, ultimately contributing to a reduction in CRC mortality.
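The aggregated figures appear consistent with the mean and sample standard deviation of the proposed method's per-class values in Tables 2-4. The following sketch checks this under that assumption; the text does not state how the aggregates were computed, so the interpretation is ours.

```python
import statistics

# Proposed-method values (%) for normal, adenoma, and CRC cases (Tables 2-4).
per_class = {
    "accuracy":    [98.6, 96.5, 97.9],
    "precision":   [95, 89, 99],
    "recall":      [99, 93, 96],
    "f1":          [98, 91, 97.9],
    "specificity": [98, 98, 96],
}

summary = {
    name: (statistics.mean(vals), statistics.stdev(vals))
    for name, vals in per_class.items()
}

for name, (mean, sd) in summary.items():
    print(f"{name}: {mean:.1f} ± {sd:.1f}")
```

Under this assumption the computed accuracy, precision, recall, and specificity round to the reported 97.7 ± 1.1, 94.3 ± 5, 96 ± 3, and 97.3 ± 1.2, and the F1-score agrees with the reported 95.7 ± 4 to within about 0.1.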
5. Conclusions and Suggestions
In this study, an approach using genetic data analysis (GDA) and hybrid ML techniques, combined with FS, has been proposed for early detection of CRP and early diagnosis of CRC. The integration of RF, SVM, and FS in the proposed technique gave the highest performance for early detection of ACRP.
The FS process played a crucial role in identifying relevant features associated with CRC, contributing significantly to the high performance of the proposed method. The proposed CEM-TH classifier demonstrated outstanding performance in accurately identifying adenomatous CRP and diagnosing CRC, surpassing other methods in terms of performance.
The remarkable accuracies of 96.5% and 98.6% in identifying pre-cancerous adenoma polyps and normal samples respectively highlight the potential impact of this method on CRC prevention through early detection and timely treatment of adenomatous polyps.
The early identification of CRP holds great significance in guiding physicians’ strategies for CRC treatment and prevention. The proposed approach, which combines GDA, FS, and hybrid ML techniques, shows promise for clinical application in the early detection of ACRP and treatment planning strategies of CRC. The advantages of this approach are expected to contribute to a reduction in CRC mortality.
The study successfully narrowed down the list of genes associated with CRP classification from 13,670 genes to a more focused set of 186 genes. These genes were identified as the most relevant to the classification of CRP into normal, ACRP or CRC. This reduction in gene numbers allows for a more targeted and efficient analysis of the genetic factors involved in CRP classification.
Future research directions should focus on further improving the method’s performance by evaluating it on additional datasets and expanding its application for CRC staging using diverse datasets. Building upon the success of this study, further advancements can be made in developing more effective strategies for early detection of ACRP, CRC prevention, and treatment planning strategies of CRC.
Acknowledgements
The authors extend their appreciation to the Deputyship for Research and Innovation, Ministry of Education in Saudi Arabia for funding this research work through project number 442-162.
Data and Source Code Availability Statement
In this study, publicly accessible datasets were examined. The data can be downloaded from https://figshare.com/collections/A_merged_microarray_meta-dataset_for_transcriptionally_profiling_colorectal_neoplasm_formation_and_progression/5328719. The source code is available upon request to the corresponding author.
Disclosure Statement
The authors declare no conflicts of interest.
Funding
This research work was funded by the Deputyship for Research and Innovation, Ministry of Education in Saudi Arabia, through project number 442-162.