Bioinformatics Analysis of the Association between Osteosarcoma and Ewing Sarcoma Comorbidity
1. Research Background
Osteosarcoma (OS), which originates from mesenchymal cells, is the most common primary bone malignancy [1]. It can occur at any age, but most patients are teenagers. It is highly malignant and disabling, with a reported 5-year survival rate of only 10% - 20% [2]. Early disease often presents as intermittent pain; as the disease progresses, the pain may become persistent and lower limb function may be impaired [3] [4]. The incidence of osteosarcoma is high and its prognosis is poor. It usually arises in the distal femur and metastasizes to internal organs and tissues [5]. Although treatment options have diversified, survival in patients with distant metastasis remains low [6] [7].
Ewing’s sarcoma (ES) is a rare, aggressive small round cell malignancy of bone and soft tissue, characterized by poor differentiation, high invasiveness, a high local recurrence rate and distant metastasis. It occurs mainly in children and adolescents [8] [9], and its incidence is second only to that of osteosarcoma [10]. At present, the main prognostic indicator for Ewing’s sarcoma is whether the patient has metastasis [11]. With the gradual optimization of treatment, the 5-year survival rate has risen from 10% to 70%; once distant metastasis occurs, however, the 5-year survival rate is below 30% [12]. Among long-term survivors, 10% to 20.5% develop secondary malignancies, including osteosarcoma, lymphoma and leukemia, which further worsen their condition [13] [14].
OS and ES are the two most common malignant bone tumors in children and adolescents. Their mechanisms of occurrence and development are complex, and their mortality and disability rates are high. The prognosis of patients with metastatic or recurrent sarcoma remains particularly poor, which seriously affects patients’ health and quality of life. Although considerable effort has gone into targeted therapy, prognosis has not improved significantly, so new therapeutic approaches are urgently needed.
Bioinformatics is an interdisciplinary field that has developed rapidly in recent years, and its applications in biology and medicine have attracted increasing attention [15]. Applying bioinformatics methods to the large volume of tumor-related data generated by high-throughput technologies helps identify key tumor targets and opens new avenues for early diagnosis, treatment and drug development in cancer. With these methods, researchers can analyze the OS and ES gene expression data held in public databases to provide a reference for exploring the pathogenesis of the two diseases and their comorbidity.
In this study, bioinformatics methods were used to analyze the genetic correlation between osteosarcoma and Ewing’s sarcoma and to screen for marker molecules shared by the two diseases, with the aim of providing indicators for early diagnosis. Specifically, we analyzed the gene chip data of ES and OS patients and their controls in the Gene Expression Omnibus (GEO) public database, identified the key genes and their molecular networks, and explored the molecular biological functions in which they may be involved. The results provide a theoretical reference for revealing the potential association between the two diseases and its molecular mechanism.
1.1. Data Acquisition
The raw data of the ES gene chip dataset GSE17674 and the OS gene chip dataset GSE16088 were obtained by searching the Gene Expression Omnibus (GEO) database of the National Center for Biotechnology Information with “Ewing’s Sarcoma” and “Osteosarcoma” as keywords. GSE17674 included 44 ES patients and 18 non-ES patients; GSE16088 included 14 OS patients and 6 non-OS patients.
Differentially expressed genes were screened with the limma package in R. Differential expression analysis was performed on the OS and ES mRNA chip data, and the test statistic P2 was used as the screening condition to identify differentially expressed mRNAs (DEmRNAs). The common target genes of the two diseases were then obtained by intersecting the OS and ES DEmRNA sets.
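The screening and intersection steps can be illustrated with a minimal Python sketch. It is not the limma workflow used in the study: the cutoffs, the `GeneStat` record, the function names and the placeholder genes `GENE_A` to `GENE_E` are all hypothetical and serve only to show the logic of thresholding two result tables and intersecting the resulting gene sets.

```python
from typing import Dict, NamedTuple, Set


class GeneStat(NamedTuple):
    log_fc: float  # log2 fold change, disease vs. control
    adj_p: float   # multiple-testing adjusted P value


# Hypothetical cutoffs chosen for illustration; they are NOT the study's values.
ADJ_P_CUTOFF = 0.05
ABS_LOG_FC_CUTOFF = 1.0


def screen_degs(table: Dict[str, GeneStat],
                adj_p_cutoff: float = ADJ_P_CUTOFF,
                abs_log_fc_cutoff: float = ABS_LOG_FC_CUTOFF) -> Dict[str, str]:
    """Return {gene: 'up' or 'down'} for genes passing both cutoffs."""
    degs: Dict[str, str] = {}
    for gene, stat in table.items():
        if stat.adj_p < adj_p_cutoff and abs(stat.log_fc) > abs_log_fc_cutoff:
            degs[gene] = "up" if stat.log_fc > 0 else "down"
    return degs


def common_degs(first: Dict[str, str], second: Dict[str, str],
                same_direction: bool = False) -> Set[str]:
    """Intersect two DEG sets, optionally keeping only same-direction genes."""
    shared = set(first) & set(second)
    if same_direction:
        shared = {gene for gene in shared if first[gene] == second[gene]}
    return shared


# Toy tables with placeholder gene names and made-up statistics.
os_table = {
    "GENE_A": GeneStat(2.1, 0.001),
    "GENE_B": GeneStat(1.8, 0.01),
    "GENE_C": GeneStat(3.0, 0.2),    # fails the adjusted-P cutoff
    "GENE_D": GeneStat(-1.5, 0.02),
}
es_table = {
    "GENE_A": GeneStat(1.6, 0.003),
    "GENE_B": GeneStat(0.4, 0.001),  # fails the fold-change cutoff
    "GENE_D": GeneStat(1.2, 0.01),
    "GENE_E": GeneStat(-2.2, 0.0001),
}

os_degs = screen_degs(os_table)
es_degs = screen_degs(es_table)
print(sorted(common_degs(os_degs, es_degs)))                       # ['GENE_A', 'GENE_D']
print(sorted(common_degs(os_degs, es_degs, same_direction=True)))  # ['GENE_A']
```

The `same_direction` switch is included because a gene can be differentially expressed in both datasets while changing in opposite directions; whether such genes should count as common is an analysis choice.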
1.2. GO/KEGG Signaling Pathway Enrichment Analysis
The clusterProfiler package in R was used for GO and KEGG enrichment analysis. GO functional enrichment was performed on the DEmRNAs common to the two diseases to identify the biological processes (BP), cellular components (CC) and molecular functions (MF) in which they participate, and KEGG signaling pathway enrichment analysis was performed in parallel. The enrichment results were then exported and sorted.
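Over-representation analysis of this kind is commonly based on a hypergeometric test: given a background of N genes, of which K are annotated to a term, how likely is it to see at least k annotated genes in a list of n genes? The sketch below shows that calculation in plain Python. It is an illustration of the statistic only, not the clusterProfiler implementation, and the function name and the toy numbers are assumptions.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric probability P(X >= k).

    N: number of genes in the background
    K: number of background genes annotated to the term
    n: number of genes in the query list
    k: number of query genes annotated to the term
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy numbers (hypothetical): 20 background genes, 5 annotated to the term,
# 6 genes in the query list, 3 of them annotated.
p = enrichment_p_value(k=3, n=6, K=5, N=20)
print(f"P(X >= 3) = {p:.4f}")
```

In practice the P values of all tested terms are also adjusted for multiple testing before terms are reported as enriched.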
1.3. PPI Protein Interaction Diagram and Core Module Analysis
The differentially expressed genes were analyzed with the STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) 11.0 online tool. The STRING results were imported into Cytoscape 3.9.1, and the CytoNCA, CytoHubba and MCODE plug-ins were used to draw the protein-protein interaction (PPI) network and to identify the core genes and co-expression modules.
2. Results
2.1. Screening of Common mRNAs
Differential expression analysis of the gene chips for the two diseases was performed with the limma package in R. Most gene expression levels were broadly consistent, indicating that the data were suitable for further analysis (Figure 1).
Figure 1. PCA of differentially expressed genes. Note: (a) PCA plot of the GSE16088 dataset; (b) PCA plot of the GSE17674 dataset.
Figure 2. Volcano plots of differentially expressed genes. Note: (a) volcano plot of the GSE16088 dataset (OS); (b) volcano plot of the GSE17674 dataset (ES). Blue indicates down-regulated genes and red indicates up-regulated genes.
Figure 3. Venn diagram of differentially expressed genes in OS and ES.
A total of 933 differentially expressed genes were identified between OS patients and controls, and 1482 between ES patients and controls (Figure 2). Intersecting the two sets yielded 335 common differentially expressed genes, which are shown in the Venn diagram (Figure 3).
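The region sizes of the Venn diagram follow directly from these three counts. The short calculation below works them out; the Jaccard index is added here only as a convenient summary of overlap and is not a statistic reported in the study.

```python
# Counts reported in the text.
os_total = 933    # DEGs in OS vs. controls
es_total = 1482   # DEGs in ES vs. controls
shared = 335      # DEGs common to both comparisons

os_only = os_total - shared          # genes differentially expressed only in OS
es_only = es_total - shared          # genes differentially expressed only in ES
union = os_only + es_only + shared   # genes differentially expressed in either disease
jaccard = shared / union             # overlap relative to the union

print(os_only, es_only, union)       # 598 1147 2080
print(f"shared fraction of OS DEGs: {shared / os_total:.3f}")
print(f"shared fraction of ES DEGs: {shared / es_total:.3f}")
print(f"Jaccard index: {jaccard:.3f}")
```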
2.2. GO and KEGG Enrichment Analysis of Common Differentially Expressed Genes
The clusterProfiler package in R was used to perform GO and KEGG pathway enrichment analysis on the common target genes of OS and ES, covering biological processes (BP), cellular components (CC), molecular functions (MF) and KEGG pathways. The results are shown in Figure 4.
Figure 4. GO/KEGG enrichment analysis.
BP terms were mainly enriched in sister chromatid segregation, collagen fibril organization, ossification, extracellular structure organization and extracellular matrix organization (Figure 4(a)). CC terms were mainly enriched in focal adhesion, extracellular matrix organization, complex of collagen trimers, endoplasmic reticulum lumen and collagen-containing extracellular matrix (Figure 4(b)). MF terms were mainly enriched in collagen binding, platelet-derived growth factor binding, structural constituent of cytoskeleton, extracellular matrix structural constituent conferring tensile strength and extracellular matrix structural constituent (Figure 4(c)). KEGG pathways were mainly enriched in cell cycle, ECM-receptor interaction, collagen-containing extracellular matrix, endoplasmic reticulum lumen, extracellular structure organization, ossification and extracellular matrix organization (Figure 4(d)).
2.3. PPI Protein Interaction Map Construction and Core Gene Screening
The STRING online tool was used to analyze the 335 common DEmRNAs with the minimum interaction score set to 0.4, yielding an interaction network of the common differentially expressed genes. The STRING results were imported into Cytoscape 3.9.1 to draw the PPI network, and the betweenness centrality (BC) algorithm of the CytoNCA plug-in was used to identify the genes with higher BC; the genes were arranged clockwise by score. Co-expression modules were identified with the MCODE plug-in, which gave 3 core modules with scores greater than 4 (Figure 5(a), Figure 5(b), Figure 5(c)). Five hub genes, NCAPG, MAD2L1, CDK1, RRM2 and RFC4, were obtained with 10 algorithms of the CytoHubba plug-in, including MCC, DMNC and MNC (Figure 6).
Figure 5. Core modules (a), (b) and (c).
Figure 6. Core differentially expressed genes.
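To make the network step concrete, the following sketch filters a scored edge list at a minimum interaction score of 0.4 and ranks nodes by betweenness centrality using Brandes' algorithm for unweighted, undirected graphs. It is an independent illustration rather than a reimplementation of STRING or CytoNCA: the edge list, the node names `P1` to `P4`, the 0 to 1 score scale and the unnormalized form of the centrality are all assumptions.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, float]  # (protein_a, protein_b, interaction score in [0, 1])


def build_graph(edges: Iterable[Edge], min_score: float = 0.4) -> Dict[str, Set[str]]:
    """Keep edges whose score reaches min_score and build an undirected graph."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score and a != b:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def betweenness_centrality(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted)."""
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        order = []                                 # nodes in order of BFS discovery
        predecessors = {node: [] for node in graph}
        n_paths = {node: 0 for node in graph}      # number of shortest paths from source
        distance = {node: -1 for node in graph}
        n_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if distance[w] < 0:                # first time w is reached
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v lies on a shortest path to w
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        dependency = {node: 0.0 for node in graph}
        for w in reversed(order):                  # accumulate from the farthest nodes
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair of endpoints was counted twice in an undirected graph.
    return {node: value / 2.0 for node, value in centrality.items()}


# Toy edge list with hypothetical proteins; the P2-P4 edge falls below the cutoff.
toy_edges = [
    ("P1", "P2", 0.90),
    ("P2", "P3", 0.70),
    ("P3", "P4", 0.45),
    ("P2", "P4", 0.20),
]
ppi = build_graph(toy_edges, min_score=0.4)
bc = betweenness_centrality(ppi)
for node, value in sorted(bc.items(), key=lambda item: (-item[1], item[0])):
    print(node, value)
```

After filtering, the toy network is the chain P1-P2-P3-P4, so the two inner nodes carry all shortest paths between the others and rank highest.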
3. Discussion
Osteosarcoma is the most common primary malignant tumor of bone, accounting for 15% of all malignant tumors in children. The 5-year survival rate of patients without metastasis is about 60% - 70%, whereas that of patients with metastatic osteosarcoma is only 19% - 30% [16] [17]. In patients with sarcoma, tumor metastasis and recurrence remain the main causes of death, and nearly a quarter of patients with osteosarcoma have lung metastasis [18]. Despite continuous progress in surgical treatment and neoadjuvant therapy in recent years, the progression-free survival and overall survival of osteosarcoma patients have remained largely unchanged [19]. Ewing’s sarcoma is the second most common bone and soft tissue malignancy in children and adolescents, with an incidence second only to that of osteosarcoma. Previous studies have shown that the 5-year survival rate of patients with localized disease is 65% - 75%, while that of patients with multiple systemic metastases is less than 30%, because these tumors have a strong tendency to metastasize and respond poorly to conventional chemotherapy and radiotherapy [20] [21]. Although multidisciplinary treatment options for Ewing’s sarcoma are evolving rapidly, including surgery, systemic or local chemotherapy and radiotherapy, the prognosis of patients with advanced Ewing’s sarcoma remains very poor [22].
Because clinical OS and ES specimens are relatively scarce, sample collection is difficult and so is the screening of common genes. With the development of genomics, however, gene databases have emerged in large numbers. Screening large-scale datasets in these databases can reveal important changes in cancer etiology, and such findings can be translated into effective treatment. This study therefore used bioinformatics methods to mine the OS-related and ES-related genes in the GEO database and obtained the common differentially expressed genes associated with the comorbidity of OS and ES. Regulated target genes were obtained with online prediction tools and analyzed for related biological processes, signaling pathways and gene-protein interactions, in order to explore the similarities and differences between OS and ES in pathogenesis.
We selected the GSE17674 and GSE16088 datasets from the GEO database to identify the differentially expressed genes shared by OS and ES and their related biological functions. Differentially expressed genes associated with OS and ES were screened in R, volcano plots were drawn for each dataset, and the two gene sets were intersected in a Venn diagram, giving 335 common differentially expressed genes. GO and KEGG enrichment analysis showed that the occurrence and development of OS and ES are mostly related to the cell cycle, sister chromatid segregation and ECM-receptor interaction, which is consistent with literature reports and provides a theoretical basis for the similarities and differences between OS and ES at the molecular level. These genes also change in the same direction in both diseases (up-regulated or down-regulated), suggesting that the two diseases are likely to share certain steps or pathways in their occurrence and development. These results help to explain and supplement the possible mechanisms of OS and ES from different perspectives, provide direction and a theoretical basis for basic research on the comorbidity, and reduce the preliminary work that such research requires. Finally, we obtained five core common differentially expressed genes, NCAPG, MAD2L1, CDK1, RRM2 and RFC4, with the CytoHubba plug-in, which provides a new theoretical direction for the diagnosis and identification of OS and ES at the gene level.
This study suggests that NCAPG, MAD2L1, CDK1, RRM2 and RFC4 are involved in the pathogenesis of OS and ES. NCAPG is significantly overexpressed in a total of 10 human sarcomas and affects the development and prognosis of different cancers through different pathways. It is involved in the pathogenesis of a variety of tumors, including prostate cancer, glioma, liver cancer, breast cancer, lung adenocarcinoma, ovarian cancer, colorectal cancer and gastric cancer, and its high expression is significantly associated with OS and significantly negatively correlated with disease-free survival [23]. GO and KEGG functional enrichment also showed that NCAPG is mainly involved in cell division during the sarcoma cell cycle, which may be related to the sustained proliferation of sarcoma cells. NCAPG promotes tumor development through abnormal regulation of tumor cell proliferation, invasion, metastasis, apoptosis and the cell cycle, which makes it a potential diagnostic and prognostic biomarker [24].
MAD2L1 is an important component of the mitotic checkpoint complex and is located on human chromosome 4. Its dysfunction may lead to chromosome instability and aneuploidy, and abnormal mitosis and errors in gene replication may promote the formation of human cancer [25]. MAD2L1 has been found to be overexpressed in several cancers, including breast cancer, lung cancer, rectal cancer, liver cancer and gastric cancer. However, its role in osteosarcoma and Ewing’s sarcoma is still poorly understood and needs further study [26].
Among the cell cycle regulatory proteins, CDK1 usually acts after binding to cyclins; it promotes the transition from G2 phase to M phase and regulates the progression of G1 phase and the transition from G1 phase to S phase. CDK1 is an important regulator of cell proliferation, differentiation and apoptosis. It is highly expressed in many tumors and promotes cell proliferation, abnormal division and abnormal differentiation [27] [28]. GO functional enrichment indicated that CDK1 is mainly involved in the biological processes of mitosis, cell division and the cell cycle.
Ribonucleotide reductase subunit M2 (RRM2) is a subunit of ribonucleotide reductase, the enzyme that catalyzes the conversion of ribonucleotides to deoxyribonucleotides, and it participates in DNA repair and cell cycle processes [29]. Studies have found that the level of RRM2 plays an important role in cell cycle regulation. RRM2 expression is increased in malignant tumors, where it can promote tumor cell proliferation and immune escape and thereby drive tumor development, making it a potential prognostic marker [30] [31]. Studies have also shown that patients with positive RRM2 expression have a lower survival rate, suggesting that RRM2 is useful for evaluating prognosis [32].
The RFC family plays an important role in DNA replication and DNA repair. RFC4 (replication factor C subunit 4), the fourth largest subunit of the RFC complex, is also involved in these processes. It is located on the long arm of chromosome 3 and may be involved in the elongation of primed DNA templates. Current studies have shown that the RFC4 gene is significantly associated with poor prognosis in a variety of tumors [33] [34].
In summary, these genes are related to the cell cycle, cell division and abnormal differentiation, and their dysregulation may disrupt the cell cycle and lead to uncontrolled proliferation of cancer cells. They are involved in the development of ES and OS comorbidity through different molecular mechanisms and provide potential targets and strategies for future therapeutic interventions. The diagnosis and treatment of OS patients with comorbid ES can therefore be studied starting from the core genes NCAPG, MAD2L1, CDK1, RRM2 and RFC4. The expression of these genes may become a reliable molecular marker for assessing the development of OS with comorbid ES, which provides a new theoretical direction for the diagnosis and identification of OS and ES at the gene level. In the future, bioinformatics can be used to combine multiple data sources and obtain more accurate and reliable predictions of pathway genes.
4. Limitations
A limitation of this study is that the risk assessment currently focuses on high-risk genes and has not yet been combined with low-risk genes in a comprehensive analysis, so the results are somewhat one-sided. In addition, the analysis relies on the osteosarcoma and Ewing’s sarcoma tissue datasets in the public GEO database and lacks corresponding clinical specimens. Because the number of samples included is limited, the results may be biased. Future studies should collect clinical samples to obtain more representative results, and the conclusions require further experimental verification.
Funding
Postgraduate Innovation Project of Youjiang Medical University for Nationalities (YXCXJH2023004/YZCXJH2023011); Guangxi Medical and Health Appropriate Technology Development and Promotion and Application Project (S201917).
NOTES
*Co-first authors.
#Corresponding author.