Identification of Hub Genes Associated with Hepatocellular Carcinoma Prognosis by Bioinformatics Analysis

Objective: This study aimed to identify hub genes that are associated with hepatocellular carcinoma (HCC) prognosis by bioinformatics analysis. Methods: Data were collected from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) liver HCC datasets. The robust rank aggregation algorithm was used in integrating the data on differentially expressed genes (DEGs). Online databases DAVID 6.8 and REACTOME were used for gene ontology and pathway enrichment analysis. R software version 3.5.1, Cytoscape, and Kaplan-Meier plotter were used to identify hub genes. Results: Six GEO datasets and the TCGA liver HCC dataset were included in this analysis. A total of 151 upregulated and 245 downregulated DEGs were identified. The upregulated DEGs most significantly enriched in the functional categories of cell division, chromosomes, centromeric regions, and protein binding, whereas the downregulated DEGs most significantly enriched in the epoxygenase P450 pathway, extracellular region, and heme binding, with respect to biological process, cellular component, and molecular function analysis, respectively. Upregulated DEGS most significantly enriched the cell cycle pathway, whereas downregulated DEGs most significantly enriched the metabolism pathway. Finally, 88 upregulated and 40 downregulated genes were identified as hub genes. The top 10 upregulated hub DEGs were CDK1, CCNB1, CCNB2, CDC20, CCNA2, AURKA, MAD2L1, TOP2A, BUB1B and BUB1. The top 10 downregulated hub DEGs were ESR1, IGF1, FTCD, CYP3A4, SPP2, C8A, CYP2E1, TAT, F9 and CYP2C9. Conclusions: This study identified several upregulated and downregulated hub genes that are associated with the prognosis of HCC patients. Verification of these results using in vitro and in vivo studies is warranted.


Introduction
Hepatocellular carcinoma (HCC) is one of the six most common cancers and the third leading cause of cancer-related death [1]. The most effective treatment for HCC is curative resection, which includes liver transplantation and hepatectomy [2]. However, only 20% of the HCC patients are candidates for curative hepatectomy and HCC often recurs shortly after surgery [2]. Palliative treatment is the main treatment modality for most HCC patients [2]. However, transarterial chemoembolization is only a regional therapy for intrahepatic tumors [2], and it cannot be used for extrahepatic metastatic tumors, including lung metastasis, bone metastasis, or circulating tumor cells. Hence, local therapy combined with systemic therapy may be an ideal scheme. Although most HCC are caused by viral infections, anti-viral therapy is somehow effective [3], but it does not directly prevent the occurrence and development of liver tumors. Therefore, target therapy for tumorigenesis may be a promising systematic treatment. However, current target therapies have limited effects on the prognosis of HCC patients. Although sorafenib could improve the survival time of advanced HCC patients, it has not substantially changed the outcome of patients with HCC, and the median survival time of patients treated with sorafenib is only three months longer than with palliative treatment [4]. Additionally, the application of sorafenib posthepatectomy does not achieve the expected benefits [5]. The median disease-free survival time was 8.5 (2.9 -19.5) and 8.4 (2.9 -19.8) months in the sorafenib and placebo groups, respectively [5].
Although many researchers focused on studying the mechanism of HCC tumorigenesis [6] [7] [8], the mechanism of hepatocellular tumorigenesis is still not been fully elucidated, and an effective therapeutic target is still needed to improve the prognosis of HCC. In recent decades, due to the rapid development of highthroughput sequencing and the wide application of gene chips, extensive studies elucidating the mechanism of HCC tumorigenesis have been conducted, and bioinformatics analysis has helped in identifying key genes in hepatocellular carcinogenesis [7] [9] [10]. Shen et al. reported that TOP2A, NDC80, FOXM1, HMMR, KNTC1, PTTG1, FEN1, RFC4, SMC4, and PRC1 are the top 10 core genes associated with HCC tumorigenesis [10]. Chen et al. reported that TOP2A, CDC20, MAD2L1, BUB1B, RFC4, CCNB1, CDKN3, CCNB2, TPX2, and FEN1 are the 10 hub genes in hepatitis B virus-related HCC [9]. However, some datasets in these studies consisted of less than 10 cases, and the Venn diagram was used to integrate their gene expression data. These two factors may miss several important genes during extraction of DEGs [9] [10]. Therefore, to identify the hub gene of HCC and to achieve more reliable results, this study also conducted bioinformatics analysis based on several datasets and each dataset consisting of >30 cases. In addition, robust rank aggregation (RRA) algorithm was used to integrate the expression data of various datasets to identify differentially expressed genes (DEGs).

Microarray Datasets
Gene expression data were retrieved from Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo) and The Cancer Genome Atlas (TCGA) liver HCC datasets (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/t cga). Only datasets with cases > 30 were included in the analysis. Only datasets consisting of HCC tissues and adjacent non-tumor tissues were included in the analysis.

Identification of the DEGs
Identification of DEGs was performed with R software ver. 3.5.1 using the "limmapackage". The log 2 |fold change (FC)| of gene expression in tumor tissues compared to adjacent non-tumor tissues was calculated. Gene expression difference with |logFC| > 1.0 and adjusted p value < 0.05 were regarded as DEGs for each included dataset. However, DEGs among these datasets were calculated using R software based on "RobustRankAggreg" package [11]. Integrate genes with significance score lower than 0.05 was regarded as DEGs [11]. These DEGs were verified using Oncomine (https://www.oncomine.org/resource/main.html) and GEPIA (http://gepia.cancer-pku.cn/).

Gene Ontology and Pathway Enrichment Analysis of DEGs
Online database DAVID 6.8 (https://david.ncifcrf.gov/) was used for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of DEGs. GO analysis consists of biological process (BP), cellular component (CC), and molecular function (MF) analysis. The online database Reactome was also used for pathway enrichment analysis of DEGs (http://www.reactome.org).

Identification of Potential Hub DEGs by Protein-Protein Interaction (PPI) Network Analysis
A PPI network was constructed using online STRING database (http://string-db.org). Cytoscape software version 3.7.1 was used to identify potential hub genes. DEGs with interaction degrees ≥ 10 were regarded as potential hub genes. MCODE plug-ins were used to explore closely related functional modules in the PPI networks. The degree cutoff of module network scoring was set to 2. The node score cutoff, K-core, and maximum depth of module finding were set to 0.2, 2, and 100, respectively.

Identification of Hub DEGs by Survival Analysis
The clinical data were collected from the TCGA liver HCC dataset. Clinical data from a total of 364 patients were included in the survival analysis. The survival results were calculated using Kaplan Meier plotter (http://kmplot.com/analysis/). The Kaplan Meier method was used in survival analysis. These results were verified using the online database GEPIA (http://gepia.cancer-pku.cn/).

Data Characteristics
Six GEO datasets (GSE14520, GSE25097, GSE36376, GSE57957, GSE76427, and GSE121248) and the TCGA liver HCC dataset were analyzed in this study. A total of 1328 tumor tissues and 834 adjacent non-tumor tissues were included in the analysis. The characteristics of the included datasets were detailed in Table  1. All datasets included tumor tissues and adjacent non-tumor tissues.

Identification of DEGs
Using the R software, DEGs in each dataset were identified. The gene expression data in each dataset are shown using a volcano map ( Figure 1). The RRA algorithm identified significant DEGs among these datasets. A total of 151 upregulated and 245 downregulated genes were identified ( Table 2). The top 20 upregulated and downregulated DEGs are shown in Figure 2.

GO and Pathway Enrichment Analysis of DEGs
After GO enrichment analysis for upregulated DEGs, a total of 74, 37, and 26 items with P values <0.05 were identified in BP, CC, and MF enrichment analysis, respectively, while for downregulated DEGs, a total of 87, 21, and 34 items with P values <0.05 were identified in BP, CC, and MF enrichment analysis, respectively ( Figure 3). The upregulated DEGs significantly enriched in the categories of cell division, mitotic nuclear division, and sister chromatid cohesion, whereas the downregulated DEGs significantly enriched in the categories of epoxygenase P450 pathway, oxidation-reduction processes, and drug metabolic processes, with respect to BP analysis. The upregulated DEGs were mostly involved in the chromosome, centromeric region, midbody, and kinetochore, whereas the downregulated DEGs were mostly involved in the extracellular region, organelle membranes, and extracellular exosomes with respect to CC analysis. The upregulated DEGs mostly enriched in the categories of protein binding, ATP binding, and protein kinase binding, whereas the downregulated DEGs most enriched in the categories of heme binding, oxidoreductase activity, acting on paired donors with incorporation or reduction of molecular oxygen, and oxygen binding with respect to MF analysis. After KEGG and REACTOME pathway enrichment analyses of DEGs, a total of 51 and 44 enrichment pathways with P values <0.05 were identified for upregulated and downregulated DEGs, respectively. The upregulated DEGs mostly enriched in the categories of cell cycle, oocyte meiosis, and progesterone-mediated oocyte maturation, while the downregulated DEGs mostly enriched in the categories of metabolic pathways, retinol metabolism, and tryptophan metabolism with respect to KEGG pathway analysis ( Figure 4). The upregulated DEGs mostly enriched in the categories of resolution of sister chromatid cohesion, separation of sister chromatids, and mitotic prometaphase, while the downregulated DEGs mostly enriched in the categories of metallothioneins bind-metals, CYP2E1 reactions, and xenobiotics with respect to REACTOME pathway analysis ( Figure 4).

Identification of Potential Hub Genes by PPI Network Analysis
A total of 357 DEGs were filtered into the PPI network analysis ( Figure 5). By taking the interaction degree of ≥10, a total of 165 DEGs that included 104 upregulated DEGs and 61 downregulated DEGs were identified as potential hub genes (Table 3).  A total of 15 functional modules were found in the MCODE plug-in in Cytoscape, and three significant modules are shown in Figure 5. Among these, module 1 included 78 nodules and 2724 edges. The genes in module 1 mainly enriched in the categories of cell division, nucleus, and protein binding with respect to BP, CC, and MF analysis, respectively ( Figure S1). These genes mostly enriched in the categories of the cell cycle, oocyte meiosis, and progesterone-mediated oocyte maturation with respect to KEGG pathway analysis ( Figure S2) and enriched in the categories of resolution of sister chromatid cohesion, separation of sister chromatids, and mitotic prometaphase with respect to REACTOME pathway analysis. All DEGs in module 1 were upregulated. Module 2 included 16 nodules and 69 edges. The genes in module 2 mainly enriched in the categories of complement activation, classical pathway, membrane attack complex, serinetype endopeptidase activity, and complement and coagulation cascades pathway after BP, CC, MF and pathway enrichment analysis, respectively ( Figure S1 and Figure S2). Module 3 included 10 nodules and 37 edges. The genes in module 3 mainly enriched in the categories of the epoxygenase P450 pathway, organelle membrane, oxygen binding, and xenobiotics pathway with BP, CC, MF and pathway enrichment analysis, respectively ( Figure S1 and Figure S2). All DEGs in modules 2 and 3 were downregulated.
The results of survival analysis of the top 10 upregulated and top 10 downregulated hub genes are shown in Figure 6.

Discussion
The identification of hub genes by bioinformatics analysis is widely used in tumorigenesis research, including HCC [9] [10]. However, most studies employ the Venn diagram to identify hub genes [9] [10], and thus some important DEGs may be missed. In addition, not all research studies use the same gene chip platform and not all gene chip platforms have all the gene probes. In addition, over time, more genes are being discovered, so that some genes that have not previously been detected by microarrays may be differentially expressed genes and have important functions. Robust rank aggregation algorithms can avoid these discrepancies and integrate data from different platforms [11]. Another difference from previous studies is that the minimum sample size of the dataset included in this study is 39, whereas that of previous studies is <10 [9] [10]. The small sample size of datasets may lead to biased or statistically insignificant results. This study identified more hub genes than previous studies because this study utilized datasets with a larger sample size and integrated DEG data using an RRA algorithm.
Our study identified 151 upregulated DEGs, which is relatively higher in number than previous studies [9] [10]. Through GO analysis, these genes were determined to mostly enrichedin the functional categories of cell division, chromosome, centromeric region, and protein binding using BP, CC and MF analysis, respectively. Pathway analysis revealed that these genes enriched in the categories of the cell cycle, resolution of sister chromatid cohesion, and separation of sister chromatids pathway. These results are similar to those of previous studies [10] [12]. Module 1 is the most significant network complex that was filtered by PPI network analysis, and all genes are upregulated. The DEGs in Module 1 is similar to that described by Shen [10]. These genes play an important role in tumor cell cycle. Blocking the expression of any gene may lead to a disruption of the cell cycle, thereby delaying the occurrence and development of tumors. In addition, we also identified a total of 245 downregulated DEGs. Through GO analysis, these genes were shown to mostly enriched in the categories of epoxygenase P450 pathway, extracellular region, and heme binding with BP, CC, and MF analysis, respectively. Pathway analysis showed that these genes enriched in metabolic pathways, metal-binding metallothioneins bind-metals, and retinol metabolism. These results have also been described in previous studies [10] [13]. Module 2 is the second most significant network complex that was filtered using PPI network analysis, and the genes in module 2 are all downregulated. However, the genes in module 2 differ from that described by Shen [10]. Furthermore, the genes in module 2 enriched in the complement and coagulation cascades pathway. The genes in module 3 enriched in the xenobiotics pathway, which is similar to module 2 of Shen's study [10], but the genes in module 3 in our study significantly vary from that in module 2 of the same study [10]. These discrepancies may be attributed to the sample size of the datasets employed in the analysis and variations in the integration algorithm employed [10].
PPI network analysis was used to identify potential hub genes. Because not all DEGs are associated with HCC prognosis, we identified hub genes that may be related to HCC prognosis. We identified 141 hub genes, which is relatively higher than that reported by previous studies [9] [16]. Wu et al. reported that the CDK1 inhibitor RO3306 can improve the efficacy of sorafenib in the treatment of HCC patient-derived xenograft tumor models by blocking the CDK1/PDK1/ β-catenin signaling pathway [14]. CCNB1 and CCNB2 encode proteins that are associated with CDK1 and are essential to the cell cycle by regulating the G2/M transition. CCNA2 controls G1/S and G2/M phase transitions by forming a serine/threonine protein kinase complex with CDK1 or CDK2. Gu et al. reported that microRNA-144 can inhibit cell proliferation, migration, and invasion of HCC by targeting CCNB1 [17]. Chai et al. reported that FOXM1 can promote cell proliferation in HCC by directly binding and activating the CCNB1 gene at the transcriptional level [18]. Li et al. reported that the overexpression of CCNB2 promotes cell proliferation and migration of HCC via the CCNB2/PLK1 pathway, and CCNB2 is associated with poor prognosis [19]. Yue et al. reported that knocking down CCNA2 reduces cell proliferation, which is caused by ZHX2 knockdown [20]. CDC20 act as a regulatory protein at multiple points in the cell cycle by interacting with other proteins. CDC20 is required for the activity of the anaphase-promoting complex/cyclosome, and the complex is modulated by MAD2L1, which is a component of the mitotic spindle assembly checkpoint. Li et al. reported that silencing CDC20 can inhibit cell proliferation by G2/M-phase arrest [21]. Li et al. and colleagues reported that miR-200c-5p can suppress cell proliferation, migration, and invasion of HCC by downregulating MAD2L1 [22]. AURKA is a mitotic serine/threonine kinase that regulates cell cycle progression. Chen et al. reported that the overexpression of AURKA induces entry into the epithelial-mesenchymal transition and cancer stem cell behaviors via the PI3K/ AKT pathway and silences AURKA-suppressed radiation-enhanced cell invasiveness in HCC [23]. Gao et al. reported that downregulation of AURKA results in G2/M phase cell arrest and induces apoptosis of HepG2 cells [24]. Also, Zhang et al. reported that AURKA promotes chemoresistance in HCC by targeting the NF-kappaB/microRNA-21/PTEN signaling pathway [25]. TOP2A is essential for proper segregation of daughter chromosomes during mitosis and meiosis by controlling the topological states of DNA. Sudan et al. reported that quercetin-3-Oglucoside induces cell cycle arrest in HCC by suppressing TOP2A [26]. Both BUB1B and BUB1 encode serine/threonine-protein kinases that are involved in spindle checkpoint functions during mitosis [27]. Xu et al. reported that MiR-490-5p could inhibit cell proliferation, invasion, and migration as well as increase apoptosis by regulating the TGFβ/Smad signaling pathways by inhibiting BUB1 [28]. The top 10 upregulated DEGs are involved in mitosis and cell cycle regulation.
This study also identified several downregulated hub genes that enriched in various metabolic pathways. However, these genes are also associated with HCC prognosis, indicating that these genes play an important role in HCC proliferation and progression. The top 10 downregulated DEGs were ESR1, IGF1, FTCD, CYP3A4, SPP2, C8A, CYP2E1, TAT, F9, and CYP2C9. ESR1 encodes an estrogen receptor that binds with a ligand steroid hormone to regulate gene expression and cell proliferation and differentiation in eukaryotic tissues. Tu et al. re-ported that the overexpression of ESR1 induces apoptosis in ESR1-negative Hep3B cells by binding ESR1 to SP1 protein [29]. Additionally, this ESR1-SP1 complex binds to the TNFα gene promoter and active caspase 3 in a ligand-dependent manner [29]. Liu et al. reported that microRNA-18a stimulates cell proliferation in HCC by suppressing ESR1 expression [30]. Therefore, microRNA-18a could block the protective effects of estrogen and promote the development of HCC in women [30]. Additionally, Dai et al. reported that ESR1 may be a tumor suppressor gene in HCC, and ESR1 may be repressed by promoter hypermethylation [31]. IGF1 encodes a protein that is similar to insulin and is involved in mediating growth and development. This study has also shown that IGF1 is positively correlated with HCC prognosis. In addition, Shen et al. reported that preoperative low serum levels of IGF-1 indicate poor prognosis in HCC patients who have undergone hepatectomy [32], and Cho et al. reported that low baseline serum IGF-1 levels are independently correlated with shorter time to progression and poorer overall survival in patients with HCC who underwent TACE [33]. Conversely, several studies have found that the IGF1 is positively correlated with HCC tumorigenesis [34] [35]. Liu reported that IGF-1 promotes the invasive and migratory ability of HCC cells by epithelial-mesenchymal transition via activating survivin [35]. Lei et al. reported that IGF-1 induces the growth and metastasis of HCC by inhibiting proteasome-mediated CTSB degradation [34]. Therefore, the function and mechanism of IGF1 in HCC tumorigenesis still need to be verified in future studies. FTCD encodes a protein that participates in histidine catabolism. Naama and colleagues reported that histidine catabolism drains the cellular pool of tetrahydrofolate, which is an essential cofactor in nucleotide synthesis [36]. Thus, upregulating FTCD may help inhibit cell proliferation. CYP3A4, CYP2E1, and CYP2C9 encode several members of the cytochrome P450 superfamily of enzymes. Cytochrome P450 proteins can catalyze many reactions that are involved in the synthesis of cholesterol and steroids and in drug metabolism. Ashida et al. identified that CYP3A4 is a novel tumor suppressor gene, and its downregulation is related to poor HCC prognosis [37]. Liu et al. reported that HBx promotes cell growth by inhibiting CYP2E1 expression by downregulating HNF4α [38]. Yu et al. reported that CYP2C9 is suppressed in HCC cells by hsa-miR-128-3p [39]. These studies indicate that CYP2C9, CYP2E1, and CYP3A4 are protective biomarkers for HCC patients, but their underlying mechanisms remain unclear [37] [38] [39]. These possibly inhibit tumorigenesis by regulating metabolic pathways [40] [41], as indicated by the results of pathway enrichment analysis in this study. C8A encodes protein that is a component of the complement cascade system. F9 encodes a protein that is involved in the blood coagulation cascade. However, these two genes also positively correlated to overall survival in HCC, although the underlying mechanisms are unknown. These may involve regulation of the complement and coagulation cascades pathway in which the two genes are involved, as demonstrated in the results of KEGG pathway enrichment analysis. Lao et al. reported that phosphoprotein SPP2 can inhibit the growth and bone metastasis of BMP2-induced prostate cancer [42]. However, the mechanism of lower SPP2 expression that is associated with poor prognosis of HCC remains unclear. Fu and colleagues reported that the downregulation of TAT contributes to the development and progression of HCC [43]. However, the mechanism of TAT in the tumorigenesis inhibition of HCC is not clear. These genes may also become novel targets for the treatment of HCC, and the mechanisms of these genes may be elucidated in future studies.
The hub genes identified in this study are much more reliable than hub genes identified in previous studies although this study identified much more hub DEGs than previous studies. This may be due to the use of a larger sample size for the included datasets, thereby reducing selection bias. The second reason is that the RRA algorithm in integrating DEGs may decrease data omissions. The third reason is that the DEGs were associated with the prognosis of HCC. However, the study also has several limitations. First, our prognostic analysis is based on TCGA data only. The sample size is relatively small and there is a certain bias risk. Second, these DEGs still need to be validated by in vivo and in vitro studies, as there were only identified by bioinformatics analysis. Also, the mechanism of several DEGs still need to be elucidated.

Conclusion
This study identified a significantly higher number of DEGs that are associated with HCC prognosis. Upregulated DEGs enriched in the cell cycle, whereas downregulated DEGs enriched in the metabolism pathway. These results needed to be verified by in vivo and in vitro studies.

Data Availability
The gene expression data supporting this bioinformatics analysis are from the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo) and The Cancer Genome Atlas liver hepatocellular carcinoma (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomi cs/tcga) datasets. Journal of Cancer Therapy Figure S2. The KEGG and REACTOME pathway enrichment analysis of genes in module 1 (a), 2 (b) and 3 (c).