Screening of Colon Cancer Apoptosis-Related Genes Based on Bioinformatics and Construction of Survival Prognostic Model ()
1. Introduction
The incidence of colorectal cancer (CRC) has been increasing year by year [1] . According to statistics in 2020, CRC has become the third most common cancer in the world, and the second leading cause of cancer-related deaths (9.4%) [2] . Due to the lack of simple early diagnosis methods, CRC is usually diagnosed in advanced stages. Although there are various treatment options for colorectal cancer, the lack of potential prognostic factors for early diagnosis is one of the reasons for poor prognosis [3] , and a large amount of research and practice has shown that screening and early diagnosis of CRC can effectively reduce the mortality rate [4] . Therefore, it is urgent to explore potential prognostic biomarkers for clinical diagnosis and treatment of CRC.
Apoptosis is a special programmatic cell death process that allows normal cells to maintain dynamic balance during proliferation, differentiation, and apoptosis. Resistance to apoptosis in invasive tumor cells has been identified as a key factor in tumor progression [5] . Cancer cells cannot survive after detachment from the primary site without resistance. It has been found that tumor cells resist apoptosis in multiple ways, such as by secreting growth factors and activating expression patterns of survival signaling pathways [6] . However, there is currently limited research on the relationship between apoptosis and distant metastasis in colorectal cancer.
Therefore, this study focuses on exploring the prognostic value of ANRGs in CRC and constructing a predictive prognostic model for CRC patients.
2. Materials
1) Clinical Data Collection
The RNA sequencing data and related clinical information of CRC samples are obtained from the official website of The Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov/). A total of 452 CRC patients with complete clinical information are used for analysis. The RNA sequencing data and clinical data GSE103479 are extracted from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). 515 ANRGs are downloaded from the GeneCard database (https://www.genecards.org/) and Harmonizome (https://maayanlab.cloud/Harmonizome/).
Methods
1) Differential expression analysis of ANRGs
Differential expression analysis of ANRGs. The limma R package was used to identify DEGs in the TCGA-CRC cohort and to compare the expression of ANRGs between tumor tissues and adjacent normal tissues. Differences in genetic screening standard are: found error rate (FDR) < 0.05 and |log2 (FC)| 1 or more. Mapping heat maps and volcanic maps using the “pheatmap” R package for visualization of differential expression of ANRGs in colorectal cancer.
2) Cluster of differentially expressed genes related to lost-nest apoptosis
Consensus clustering was applied to identify different Anoikis correlation patterns associated with k-means representation. The Unified manifold Approximation and Projection (UMAP) was used to verify the reliability of R package “ggplot2” clustering.
3) Functional Enrichment Analysis
The “c2.cp.kegg.symbols.gmt” was downloaded to perform GSVA analysis. The “GSVA” R package was used to perform enrichment analysis.
4) Development and Validation of Prognostic Features Based on Apoptosis-Related Genes Single-factor and multi-factor Cox regression and LASSO analysis were used to identify DEGs closely related to overall survival (OS) for constructing risk features. The predictive ability of the model was evaluated by Kaplan-Meier survival curve and time-dependent receiver operating characteristic (ROC) curve analysis.
5) Relationship Between Risk Score and Immune Cell Infiltration CIBERSORT and ssGSEA R scripts were used to quantify the relative proportions of infiltrating immune cells. CIBERSORT was used to estimate the proportions of immune cell types between low-risk and high-risk groups. The sum of all estimated immune cell type scores in each sample equals 1. Rank correlation analysis was used to explore the relationship between risk score values and immune infiltrating cells.
6) Construction and Evaluation of Predictive Nomogram Clinical pathological features and risk score were used to construct a predictive nomogram. The calibration plot was performed for internal validation to verify accuracy. Decision curve analysis (DCA) was performed to evaluate clinical net benefits.
7) Statistical Analysis
Statistical analysis was conducted using R software v4.2.2. A p-value < 0.05 was considered statistically significant, and FDR (false discovery rate) q < 0.05 was considered statistically significant.
3. Results
1) Identification of Prognostic Apoptosis-Related Genes
A total of 515 ANRGs were obtained from GeneCard and Harmonizome. 161 DEGs were identified in TCGA-CRC samples compared with adjacent normal tissue (Figure 1(a)). A volcano plot showed 112 upregulated ANRGs and 49 downregulated ANRGs (Figure 1(b)). A total of 156 DEGs were finally obtained from the TCGA-CRC and GSE103479 cohorts, and 14 of these DEGs were found to be associated with survival and showed statistical significance (p < 0.05) (Figure 1(c)). TIMP1, CDKN2A, NAT1 were significantly associated with prognosis (Figure 1(d)).
2) To better understand the role of ANRGs in CRC, Consensus Cluster Plus R package was used to perform consensus clustering on 14 prognosis-related DEGs (p < 0.05) (Figure 2(a)). The overall survival analysis showed that the prognosis differences among the three subtypes were statistically significant (p < 0.01) (Figure 2(b)). UMAP and tSNE were used to test the accuracy of the clustering (Figure 2(c), Figure 2(d)). The corresponding clinical and pathological characteristics of the three subtypes and the heatmap of ANRG expression were shown (Figure 2(e)). In addition to exploring the overall distribution of 14 ANRGs,
attention was also paid to the differential enrichment of KEGG pathways between group A and group B (Figure 2(f)).
3) Building and validating prognostic features related to apoptosis of honeybee colonies. Exploring the clinical value of apoptosis-related genes and using 14 ANRGs to participate in Lasso algorithm (Figure 3(a), Figure 3(b)). Seven ANRGs were identified as independent prognostic factors by multivariate Cox analysis. According to their coefficients, the risk score = the sum of the expressions of the seven ANRGs * their respective coefficients was calculated. There was a significant difference in risk scores among the three groups, with group B showing
a higher risk score (p < 0.01) (Figure 3(c)). The forest plot showed the changes in risk and survival status related to apoptosis of honeybee colonies (Figure 3(d)). The K-M survival analysis showed that the high-risk group had a worse prognosis (Figures 3(e)-(g)). ROC curve analysis of OS at 1 year, 3 years, and 5 years showed that ANRGs features had satisfactory discriminatory performance (Figures 3(h)-(j)).
4) Gene set enrichment analysis and immune activity of different risk scores in CRC patients’ TME. The TME plays an important role in tumor occurrence, development, and response to immunotherapy. Further exploration of the distribution of TME in high-risk and low-risk populations of CRC patients was conducted. The CIBERSORT R script was used to quantify the relative proportions of infiltrating immune cells, showing the correlation between different immune cells and risk scores (Figure 4(a)). The proportion of activated macrophages M0 gradually increased as the risk score increased (R = 0.24) (Figure 4(b)). Activated macrophages M0 had significant differences in high-risk patients, indicating that activated macrophages M0 may be an important factor in the prognosis of CRC patients (Figure 4(c)). The correlation between immune cells in CRC patients may provide clues for a better understanding of the composition of TME in specific types of tumors (Figure 4(d)). The 7 ANRGs used to
build the ANRGs risk model had different expression patterns between high-risk and low-risk populations and were closely related to the infiltration of various immune cells (Figure 4(e), Figure 4(f)). Estimated scores of expression profiles for high-risk and low-risk groups were obtained (Figure 4(g)).
5) Establishing a Prognostic Nomogram for Colorectal Cancer Patients
Taking into account the influence of clinical and pathological factors on the predictive model, we combined the ANRGs-related risk model with clinical information to construct a prognostic nomogram (Figure 5(a)). The calibration plot demonstrated good accuracy of the nomogram (Figure 5(b)). The forest plot showed that risk score, N2 stage, and age were the main influencing factors in the nomogram (Figure 5(c)). The cumulative risk curve revealed that CRC patients with higher scores had gradually increasing total survival risk (Figure 5(d)). The decision curve analysis (DCA) results indicated that the nomogram could effectively predict the short-term and long-term survival of CRC patients (Figures 5(e)-(g)). These results suggest that the ANRGs-based risk score nomogram can serve as an effective method to predict patient prognosis in clinical practice.
4. Discussion
Colorectal cancer (CRC) is one of the common malignant tumors in the digestive system. Despite significant progress in various treatment modalities in recent years, the prognosis of patients is still not ideal. The lack of reliable biomarkers makes the diagnosis and treatment of CRC difficult, resulting in poor survival rates for patients. Therefore, it is necessary to establish an effective prognostic model for CRC patients. Cancer cells promote tumor proliferation, invasion, and metastasis by resisting anoikis-induced apoptosis, but the role of
anoikis in CRC is unclear [7] . In this study, the relationship between ANRGs expression and prognosis of CRC patients was studied using bioinformatics methods, and a prognostic risk model containing 7 ANRGs was constructed. Furthermore, the relationship between ANRGs and TME was investigated.
TIMP1, TNFRSF10B, CDKN2A, LTB4R2, CRYAB, RAC3 and UCHL1 are involved in the construction of prognostic models, and these genes are closely related to the occurrence and development of tumors.
Tissue inhibitor of metalloproteinase-1 (TIMP1): a glycoprotein expressed in various tissues of the organism, is a member of the TIMP family. It has been found that TIMP1 can promote the progression of multiple tumors such as gastric cancer and pancreatic cancer [8] . The upregulation of TIMP1 in CRC is associated with poor prognosis and can promote tumor cell proliferation and metastasis through the FAK/Akt signaling pathway [9] . Tumor necrosis factor receptor superfamily member 10B (TNFRSF10B) is a protein-coding gene that can mediate the exogenous apoptosis pathway of various cancer cells [10] . Studies have shown that up-regulation of TNFRSF10B in CRC cells can improve the efficiency of cancer treatment [11] . Cyclin dependent kinase inhibitor 2A (CDKN2A) encodes the P16 gene involved in a series of cellular pathways, including promoting tumor cell proliferation, inhibiting tumor cell apoptosis, inducing tumor stromal angiogenesis and reducing cancer cell sensitivity to chemotherapy. CDKN2A plays a regulatory role in cell proliferation and apoptosis, and is associated with poor prognosis of liver cancer and renal cell carcinoma [12] [13] . Studies have found that the expression of CDKN2A is up-regulated in CRC and is closely related to the proliferation of tumor cells [14] . Leukotriene B4 receptor 2 (LTB4R2), a G-protein-coupled receptor regulates chemotaxis and wound healing. Leukotriene B4 receptor 2 (LTB4R2), a g-protein-coupled receptor regulates chemotaxis and wound healing. Its expression has been associated with invasion and metastasis of lung cancer and breast cancer [15] [16] . It was also confirmed that high expression of LTB4R2 promoted CRC cell proliferation [17] . The Alpha-B crystallin (CRYAB), a member of the small molecular heat shock protein family, was first identified as a major structural protein in the lens of the eye. CRYAB is widely believed to function primarily as a molecular cocrystallin to promote cell survival. In addition to acting as a molecular chaperone, it has been shown to be involved in the occurrence of multiple tumors [18] [19] . However, CRYAB has been shown to inhibit the proliferation and metastasis of CRC cells [20] . In this study, high CRYAB expression was associated with poor prognosis in CRC patients, so the mechanism of CRYAB in colorectal cancer needs further study. Ras-related C3 botulinum toxin substrate 3 (RAC3) is a member of the Rho GTPase family. Rac is a subset of Rho-GTPase. Rho and rac-GTase are associated with human carcinogenesis, cancer cell proliferation, migration and invasiveness. Studies have found that RAC3 is overexpressed in bladder cancer and endometrial cancer and promotes tumor progression [21] . RAC3 increases resistance to chemotherapeutic drugs by inhibiting CRC apoptosis and autophagy [22] . Ubiquitin C-terminal hydrolase L1 (UCHL1) is expressed in normal tissues such as neurons, but there is increasing evidence that UCHL1 is up-regulated in some human cancers and plays a key role in cell proliferation, migration, invasion and anti-apoptosis. It is highly expressed in a variety of tumors and is associated with poor prognosis of lung cancer and breast cancer [23] [24] . Studies have shown that UCHL1 is up-regulated in CRC cells and is associated with overall survival of patients [25] .
In this study, CRC patients were divided into three subtypes (cluster A, B, and C) using consensus clustering analysis, and patients in cluster B had poor prognosis. To further explore its mechanism, GSVA analysis was performed, and the results showed that the main pathways involved in cluster B were ECM receptor interaction and focal adhesion, which are key pathways for tumor cells to escape their original growth environment and settle in new anchoring sites [26] . The risk score of the 7 ANRGs was an important prognostic factor for the survival of CRC patients. The survival rate of high-risk group CRC patients was significantly lower than that of low-risk group. The nomogram prognostic model constructed by integrating multiple risk factors can effectively predict the prognosis of CRC patients. Immune infiltration analysis of different risk groups showed that the proportion of regulatory T cells, monocytes, M0 macrophages, activated dendritic cells, mast cells, and neutrophils was higher in the high-risk group, while the proportion of plasma cells, resting memory T-CD4 cells, M1, and M2 macrophages was higher in the low-risk group. These differences may be important factors influencing patient prognosis and immune therapy response and require further research.
5. Conclusion
In summary, many ANRGs can regulate the occurrence and development of tumors, and the expression of ANRGs in tumor tissue has great prospects in predicting survival outcomes. These ARGs can serve as new molecular targets. In this study, a prognostic risk scoring model for CRC based on 7 ANRGs was constructed using LASSO and COX regression analysis. The predictive performance of the model is stable and the modeling and risk genes can also serve as potential therapeutic targets and research foundations for the pathogenesis of colorectal cancer. Therefore, this prognostic risk model can assist clinicians in developing accurate and personalized treatment plans for patients. Secondly, this study verified the reliability of the constructed risk model through two external datasets. The predictive efficacy verification of the risk model (mean AUC > 0.600) demonstrated that the model also has a moderate degree of predictive performance in other independent datasets. However, there are still limitations in this study, and the above conclusions are based on the TCGA and GEO databases. Large-scale clinical trials are needed to evaluate the predictive efficacy of the model.