Integrated Analysis of the Gene Expression Profiling and Copy Number Aberration of the Ovarian Cancer

Objective: DNA copy number alterations and differential gene expression are frequently observed in ovarian cancer. The purpose of this study was to pinpoint gene expression changes that are associated with alterations in DNA copy number, and that could therefore highlight potential oncogenes and stability genes with functional roles in cancer, and to investigate the bioinformatic significance of these correlated genes. Method: We obtained DNA copy number and mRNA expression data from The Cancer Genome Atlas and identified the most statistically significant copy number alteration regions using GISTIC. We then identified the significant genes within the copy number alteration regions among the tumor samples and analyzed the correlation using a binary matrix. The selected genes were subjected to bioinformatic analysis using the GSEA tool. Results: GISTIC analysis identified 45 significant copy number amplification regions in ovarian cancer. SAM and Fisher's exact test identified 40 genes located in the amplification regions that can affect expression levels; that is, we obtained 40 genes showing a correlation between copy number amplification and drastic up- and down-expression. Conclusion: The combination of copy number and expression data has provided a short list of candidate genes that are consistent with tumor-driving roles. These genes may offer new ideas for the early diagnosis and therapeutic targeting of ovarian cancer.


Introduction
The mortality and incidence of ovarian cancer rank first and second, respectively, among cancers of the female reproductive system [1]. Early detection and diagnosis are difficult; by the time symptoms manifest, pelvic or distant metastases are usually present [2], which makes complete removal of the tumor difficult, and 5-year survival remains at 30% [3] [4] [5]. Understanding the molecular pathways of this disease may be critical for improving early detection, and thus survival, and may open new therapeutic strategies.
Copy number aberration (CNA) is a structural variation in the copy number of a specific region of the genome. Identifying "driver CNAs" in the genetic material of tumor tissue is a main objective of tumor diagnosis and cytogenetic studies. Such alterations can indicate the genomic instability of a tumor and result from somatic mutations acquired during the evolution of tumor cells from a normal state to a neoplastic state. Therefore, characterization of genomic abnormalities may help elucidate the molecular pathogenesis of ovarian cancer as well as reveal genetic markers of progression.
Many differentially expressed genes have been identified in gene expression profiles, but most of them may be "passenger genes" that have a limited effect on tumor development [6] [7]. The key challenge has been to identify the driver oncogenes or tumor suppressor genes that play important roles during tumor initiation and progression [8]. Genomic DNA copy number aberration is an important type of genetic alteration observed in tumor cells, and it contributes to tumor evolution by altering the expression of genes within the affected region. Identifying genes that are both amplified and over-expressed may therefore be beneficial, because these genes may represent driver gene aberrations [9].
Many studies have reported CNAs and differential expression profiles in ovarian cancer separately [10]- [18], but few have gone on to explore the correlation between CNA amplification and gene expression. In this paper, we aimed to study the correlation between CNA amplification and gene expression, and we analyzed only the genes located in chromosomal regions with recurrent aberrations. The purpose was to pinpoint gene expression changes that are associated with alterations in DNA copy number, and that could therefore highlight potential oncogenes and stability genes with functional roles in cancer, and to investigate the bioinformatic significance of these correlated genes.

Material
In this study, we used data from The Cancer Genome Atlas (TCGA) project. We downloaded Level 2 copy number data and Level 3 mRNA expression data for 200 ovarian cancer patients from the DCC data portal, all measured on the same platform. We also downloaded mRNA expression data for normal ovarian tissue from 50 patients.

Array-Based CGH Analysis
To identify possible regions of amplification, we segmented the Level 2 copy number data using the Circular Binary Segmentation (CBS) algorithm [19] [20], which is included in the Bioconductor package DNAcopy (http://www.bioconductor.org). To identify significant regions of common aberration across all hybridizations, the Genomic Identification of Significant Targets in Cancer (GISTIC) approach was applied to the data [21].

Integrated Analysis of the Copy Number Data and Expression Data
The purpose of this study was to pinpoint gene expression changes that are associated with alterations in DNA copy number and that could therefore highlight potential oncogenes and stability genes with functional roles in cancer [22]. First, two-class unpaired Significance Analysis of Microarrays (SAM) was used to find the genes located in the copy number amplification regions that were differentially expressed between the tumor and normal samples. Genes with an FDR < 0.05 were considered to have significant differential expression and were passed to the next stage.
Next, we built two matrices, one for expression and one for CNA, each arranged as gene (row) by sample (column). At this stage, the CNA matrix is binary: if copy number amplification occurs in a particular gene in a particular sample, the element is one; otherwise the element is zero. Then, two-class unpaired SAM was used to find genes that are differentially expressed with respect to the copy number amplification status of a particular gene across all the tumor samples. Genes with an FDR < 0.05 were considered to have significant amplification-correlated differential expression and were passed to further analysis.
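As an illustration only, the construction of the binary CNA matrix and the grouping of tumor samples by amplification status can be sketched as follows. The sketch is written in Python rather than the R used in this study. The function names, the gene labels, the toy values and the log2-ratio cutoff of 0.3 are assumptions made for the example and are not taken from the analysis, which does not state the cutoff used to call an amplification.

```python
from typing import Dict, List, Tuple

# Hypothetical cutoff on the log2 copy-number ratio (an assumption;
# the text does not state the value used to call an amplification).
AMP_THRESHOLD = 0.3


def binarize_cna(log2_ratio: Dict[str, List[float]],
                 threshold: float = AMP_THRESHOLD) -> Dict[str, List[int]]:
    """Gene-by-sample binary matrix: 1 if the gene is amplified in the sample, else 0."""
    return {gene: [1 if value > threshold else 0 for value in values]
            for gene, values in log2_ratio.items()}


def split_by_status(expression: List[float],
                    cna_row: List[int]) -> Tuple[List[float], List[float]]:
    """Split one gene's expression values into (amplified, non-amplified) sample groups."""
    amplified = [e for e, c in zip(expression, cna_row) if c == 1]
    neutral = [e for e, c in zip(expression, cna_row) if c == 0]
    return amplified, neutral


# Toy data: two hypothetical genes measured in five tumor samples.
log2_ratio = {
    "GENE_A": [0.9, 0.1, 0.5, -0.2, 0.0],
    "GENE_B": [0.0, 0.0, 0.4, 0.1, 0.8],
}
expression = {"GENE_A": [8.1, 5.0, 7.6, 4.9, 5.2]}

cna = binarize_cna(log2_ratio)
amp, neu = split_by_status(expression["GENE_A"], cna["GENE_A"])
print(cna["GENE_A"])  # amplification status of GENE_A per sample
print(amp, neu)       # expression in amplified vs. non-amplified samples
```

The two groups returned by `split_by_status` correspond to the two classes that would then be compared by a two-class unpaired test such as SAM; the test itself is not reproduced here.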
Last, an expression matrix was created, this time containing only the genes deemed to have significant amplification-correlated differential expression in the previous step. The matrix was then converted to a differential-expression binary matrix as follows: 1) the z-score for each expression matrix element was calculated with respect to that element's row (i.e., gene-specific), and this was repeated for each row (gene); 2) any element with a z-score > 2.0 or a z-score < −2.0 was set to 1, and otherwise to 0. Then, Fisher's exact p-value was calculated for each gene in the expression matrix by populating a two-by-two contingency table with a binary expression vector (category one) and the CNA vector (category two); this process was repeated for each binary expression vector in the binary expression matrix. This calculation allowed us to recover only the genes that had drastic amplification-correlated expression and to assign each correlation an exact p-value. The entire process was repeated once for each amplified gene [23] and was implemented in the R language [24].

Results
Figure 1 shows the copy number segments for two samples; there was an average of 444 segments per sample. To identify the significant regions of aberration among the large number of copy number segments, we applied the GISTIC method to the segment data. The significant regions are listed in Table 1, including the frequency, the possible target genes, the chromosome position and the q-value. Several oncogenes previously known to have copy number changes in human ovarian cancer, such as CCNE, EVI1, MYC, FGFR3 and KRAS, were readily identified by GISTIC.

Integrated Analysis of Copy Number Aberration and Gene Expression
The gene expression probes within these CNARs were mapped to 139 unique genes. To evaluate whether these 139 genes were differentially expressed, we applied SAM to the gene expression data from tumor and normal tissues. We identified 55 genes with significant differential expression (Figure 3). Among them, 45 genes showing a concordant direction of change in both CNA and gene expression were selected for further exploration.
To further analyze the genes with a concordant direction of change in both CNA and gene expression, patients were divided into two groups as described in the methods: the "copy number varied" group and the "copy number neutral" group. Next, for each such gene, an unpaired two-class SAM was applied to the two groups, by which we found 44 genes that can influence expression levels between tumor tissues with and without copy number alterations. To confirm that this impact resulted from copy number alteration, we performed Fisher's exact test as described in the methods and identified 40 genes that each led to differential expression of at least one of the 17,765 genes. That is, we obtained 40 genes showing a correlation between copy number amplification and drastic up- and down-expression, with a p-value < 0.05 (Fisher's exact test) and an FDR < 0.05. These results indicate that CNAs are important elements in driving downstream gene signaling in ovarian tumorigenesis.
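The z-score binarization and Fisher's exact test described in the methods can be sketched as follows, for illustration only. The sketch is written in Python rather than the R used in this study, and it is not the code that produced the results above. The function names and toy vectors are hypothetical, and two details are assumptions that the text does not specify: the z-score uses the sample standard deviation, and the Fisher's exact p-value is two-sided.

```python
from math import comb
from statistics import mean, stdev
from typing import List, Tuple


def zscore_binarize(row: List[float], cutoff: float = 2.0) -> List[int]:
    """Return 1 where the gene-specific |z-score| exceeds the cutoff, else 0."""
    mu = mean(row)
    sd = stdev(row)  # assumption: sample standard deviation
    if sd == 0:
        return [0] * len(row)
    return [1 if abs((x - mu) / sd) > cutoff else 0 for x in row]


def contingency(expr_bin: List[int], cna_bin: List[int]) -> Tuple[int, int, int, int]:
    """Two-by-two table counts (a, b, c, d) for expression status vs. CNA status."""
    pairs = list(zip(expr_bin, cna_bin))
    a = pairs.count((1, 1))  # drastic expression, amplified
    b = pairs.count((1, 0))  # drastic expression, not amplified
    c = pairs.count((0, 1))  # no drastic expression, amplified
    d = pairs.count((0, 0))  # no drastic expression, not amplified
    return a, b, c, d


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value (assumption: two-sided) from the hypergeometric law."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Toy expression row: only the last sample deviates strongly from the row mean.
flags = zscore_binarize([5.0] * 9 + [12.0])

# Toy binary vectors for one expressed gene and one amplified gene.
expr_bin = [1, 1, 1, 0, 0, 0, 0, 1]
cna_bin = [1, 1, 1, 1, 0, 0, 0, 0]
table = contingency(expr_bin, cna_bin)
p_value = fisher_exact_two_sided(*table)
print(flags, table, round(p_value, 4))
```

In the procedure described in the methods, this test would be repeated for every binary expression vector against the CNA vector of each amplified gene.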

Gene Set Enrichment Analysis (GSEA)
To explore the function of the 40 genes in cancer progression and development, we used the Molecular Signatures Database v4.0 and the GSEA "investigate gene sets" tool. We found several gene sets that overlapped with our genes [15]- [20]. Most of these gene sets were associated with various kinds of cancer. A detailed description of the gene sets is given in Table 2.

Discussion
It is well known that many causative elements contribute to cancer progression and tumorigenesis, such as transcriptional dysregulation, sequence mutations and genetic variations. Among these complicated factors, copy number alterations (CNAs) have been widely reported to serve as a key driver of genetic variation [25]. In this study, we analyzed CNAs by array CGH and identified chromosomal regions with frequent high-level amplifications and deletions. Additionally, to account for the complex relationship between copy number and gene expression, we performed an integrated analysis of ovarian cancer to identify differentially expressed genes with concordant genomic alterations and explored their impact on gene expression. Finally, gene set enrichment analysis was used to obtain bioinformatic information on these driver genes.   [30]. Notably, gain at 3q26.2 was detected at the highest frequency (85%) and gain at 8q24.3 at the second highest (80%). Taken together, we speculate that the identified CNAs, especially gains at 3q26.2 and 8q24.3, together with the candidate genes they contain (EVI1, NFKBIL2, FOXH1, FBXL6, CPSF1, CYHR1, VPS28, SLC39A4, GPR172A, KIFC2, ADCK5), may play an important biological role in the pathogenesis of ovarian cancer. Indeed, a detailed genomic analysis of EVI1 has been performed in ovarian cancer cells [31] [32]. Furthermore, we also found some putative oncogenes in these CNARs, such as KRAS, CCNE1 and MYC.
Regarding the 174 genes residing in CNARs, significantly different expression associated with CNAs was detected in 55 genes (40% of the 139 genes to which the expression probes were mapped). Among these, 45 genes (82%) showed a positive correlation between CNA and mRNA expression and 10 genes (18%) showed a negative correlation. The most positively correlated gene was TACC3, for which no functional study is available at this time. The second gene, CCNE1, however, has been shown to play an important role in the development and progression of ovarian cancer [33] [34] [35]. In addition, the elevated correlations of the 45 concordantly changed genes provide further evidence that our statistical approach can efficiently identify dysregulated genes based on CNA.
To further explore whether these 45 CNA-driven genes can affect mRNA expression levels, SAM and Fisher's exact test were applied as described in the methods. After excluding the genes that did not affect mRNA expression levels (that is, retaining those with p-value < 0.05, Fisher's exact test), only 40 (80%) genes remained for further analysis. The results show that most copy number alterations can affect mRNA expression levels, especially for putative oncogenes previously reported in ovarian cancer research, such as CCNE1, KRAS and EVI1.

Conclusion
Based on these analyses, we believe that the identification of driver genes in tumor amplicons can be greatly facilitated by studying gene expression patterns in conjunction with gene network data. The combination of copy number and expression data has provided a short list of candidate genes that are consistent with tumor-driving roles.

Funding
The study was partially supported by grants from the Guangdong Province innovation school project (No. 2018KQNCX128) and the Xiamen Medical and Health Guidance Project (No. 3502Z20209111).