In Silico Identification of Genes and Molecular Pathways during Aging in Drosophila Brain
1. Introduction
Aging is accompanied by cognitive decline and is a basic risk factor for neurodegenerative disorders. Many toxic proteins are involved in brain aging and give rise to disorders such as Alzheimer’s disease (AD) and Parkinson’s disease, as well as memory loss and impaired synaptic plasticity. Aggregation of toxic proteins is an influential factor in neurodegeneration [1], and aging may therefore lead to the progression of age-related disorders such as AD. Amyloid beta plaques and neurofibrillary tangles are major hallmarks of AD; they increase as these toxic proteins accumulate, and pathways such as oxidative stress and low antioxidant levels are associated with the disease [2]. A recent publication showed that AD is not only a neurodegenerative disorder but also a systemic one [3], which points to a repertoire of genes involved in neurodegeneration that are expressed not only in the brain but also in other tissues. Therefore, before the distinctive tangles and plaques appear, it may be possible to identify the hallmark regulators responsible for this accumulation.
Global transcriptome analysis provides important information about brain aging in various animal models, including Drosophila, which is relatively easy to handle because of its small brain size and its amenability to genetic manipulation. Fly brain studies are useful for untangling neurological disorders such as Alzheimer’s [4] and Parkinson’s [5] disease that occur with aging, and for understanding the roles of the pathways and regulators that act or interact in aging and neurodegeneration. Several groups have demonstrated modulation of gene expression during aging in mammals [6] [7] [8] across many tissue types, including brain, but they used microarray data, which carry more contamination and more false positives than RNA sequencing. Single-cell RNA sequencing has also been used to study brain aging, but without reporting the gene expression changes and specific pathway information related to aging [9]. Recently, whole-transcriptome profiling of the brains of aging flies was published, covering both sexes and several ages and reporting co-expression modules of genes affecting learning and memory [10], but it did not show the correlation of neurodegenerative disorders with aging.
Here, we present a differential gene expression study of whole fly brains of both sexes, comparing young and old flies, and identify the pathways and genes common to aging and dementia.
2. Materials and Methods
In this study we used eight paired-end RNA-seq samples of fly heads. Four samples are from the brain tissue of young flies aged 5 days (1 male and 1 female) and 10 days (1 male and 1 female). The other four samples are from the brain tissue of old flies aged 20 days (1 male and 1 female) and 30 days (1 male and 1 female). All samples were taken from the Sequence Read Archive (SRA) database at NCBI (http://www.ncbi.nlm.nih.gov.in/SRA) under accession no. PRJNA418957. The samples were divided into young and old groups, with one male and one female at each age. Raw RNA-seq reads contain experimental and sequencing errors as well as adapter sequences, all of which create noise and need to be removed or trimmed. The reads were therefore filtered with the cutadapt program, which removes adapter sequences and sequencing errors from the reads.
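The adapter-removal step described above can be illustrated with a minimal Python sketch. It is not cutadapt's algorithm: cutadapt uses error-tolerant alignment, whereas this sketch assumes exact matching only. The function name `trim_adapter`, the `min_overlap` parameter and the adapter sequence in the usage note are illustrative assumptions, not details taken from the study.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching only.

    Simplifying assumption: no mismatches or indels are tolerated,
    unlike real adapter trimmers.
    """
    # Case 1: the full adapter occurs in the read; cut at its first occurrence.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: only the start of the adapter is present at the 3' end.
    # Try the longest adapter prefix first, down to min_overlap bases.
    longest = min(len(adapter) - 1, len(read))
    shortest = max(min_overlap, 1)
    for k in range(longest, shortest - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter found: return the read unchanged.
    return read
```

For example, `trim_adapter("ACGTACGTAGATC", "AGATCGGAAGAGC")` removes the five trailing bases that match the start of the hypothetical adapter. The `min_overlap` threshold avoids trimming very short matches that are likely to occur by chance.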
2.1. Mapping and Alignment of Reads
The filtered reads were mapped against the whole Drosophila genome to identify the regions expressed in young and old fly heads. Mapping was done with Bowtie 2 and produced BAM files containing the header, the mapping quality and the sequenced reads. Sixteen sets of aligned reads were obtained, covering forward and reverse reads, together with a set of reads that did not align.
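As a rough illustration of how the aligned and unaligned fractions can be read off an alignment file, the sketch below computes the alignment rate from SAM-format text lines (the uncompressed text equivalent of BAM). It relies only on the SAM flag bits for unmapped (0x4), secondary (0x100) and supplementary (0x800) records. The function name `alignment_rate` and the example records are hypothetical, and this is not the reporting code of Bowtie 2 itself.

```python
def alignment_rate(sam_lines):
    """Return the fraction of primary records that are mapped.

    sam_lines: iterable of SAM text lines (header lines start with '@').
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # second SAM column is the bitwise FLAG
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        total += 1
        if not flag & 0x4:  # 0x4 set means the read is unmapped
            mapped += 1
    return mapped / total if total else 0.0


# Hypothetical, truncated records: only the first columns are shown.
example = [
    "@HD\tVN:1.6",
    "read1\t99\tX\t100\t42",
    "read2\t4\t*\t0\t0",
    "read3\t147\tX\t250\t42",
    "read4\t256\t2L\t10\t1",
]
rate = alignment_rate(example)
```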
2.2. Counting the Reads to Annotate against the Respective Genes
The aligned reads were taken from the BAM files generated by Bowtie 2 and counted against protein-coding genes to annotate them. The featureCounts program was used to obtain the counts: reads that overlap a gene are counted and assigned to that gene. The number of overlapping reads also depends on the sequencing depth and the quality of the sequencing. The resulting count file is a simple tabular file containing gene IDs and the number of reads assigned to each.
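The idea of counting reads that overlap annotated genes can be sketched in a few lines of Python. This is a simplified illustration, not the featureCounts implementation: it treats each gene as a single interval, ignores strand and exon structure, and skips reads that overlap more than one gene (an assumption made here for simplicity). All gene names and coordinates are hypothetical.

```python
from collections import Counter


def count_reads_per_gene(reads, genes):
    """Count reads that overlap exactly one gene.

    reads: iterable of (chrom, start, end) tuples, inclusive coordinates.
    genes: dict mapping gene_id -> (chrom, start, end).
    """
    counts = Counter({gene_id: 0 for gene_id in genes})
    for chrom, start, end in reads:
        hits = [
            gene_id
            for gene_id, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start <= g_end and end >= g_start
        ]
        if len(hits) == 1:  # ambiguous or unassigned reads are not counted
            counts[hits[0]] += 1
    return dict(counts)


# Hypothetical annotation and reads.
genes = {"geneA": ("X", 100, 200), "geneB": ("X", 150, 300), "geneC": ("2L", 50, 80)}
reads = [("X", 90, 110), ("X", 160, 180), ("X", 250, 260), ("2L", 60, 70), ("3R", 1, 50)]
table = count_reads_per_gene(reads, genes)
```

The result is a small gene-by-count table of the same shape as the tabular count file described above.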
2.3. Differential Gene Expression Analysis and Gene Annotation
The count file was used as input for differential expression analysis [11] [12]; that is, the reads counted against each gene were compared between young and old flies to determine how genes are regulated in the brains of old flies relative to young flies. The limma-voom package [13] [14] was used to detect differential expression. The tool models the mean-variance relationship of the samples and normalises the count file of each sample. The mean-variance analysis and the statistical tests provide a measure of the expression of each gene in each sample, which can be filtered further to obtain the most differentially expressed genes [15] [16] [17]. The differentially expressed genes were then subjected to gene ontology analysis to obtain the biological function of each gene and the pathways in which it is involved. Pathways were also drawn to identify the connections between genes, yielding a gene regulatory network of the most highly expressed genes.
3. Results
3.1. Identification of Genes Mapped with Young and Old Fly Heads of Male and Female
The sequences obtained from the SRA dataset for aging male and female flies are raw reads containing noise and sequencing errors, so they must pass a quality check to give reliable results. The Trimmomatic and cutadapt [18] programs were used to remove noise and adapter sequences. Around 60% of the reverse reads and 48.5% of the forward reads were found to be unique in the paired-end RNA-seq data from the heads of aging flies. The filtered data were mapped with Bowtie 2 against the reference genome of Drosophila melanogaster to identify the genes affected in old flies compared with young flies. The mapping statistics are shown in Figure 1: 40% - 60% of reads aligned uniquely, and around 20% - 40% of reads did not align to the Drosophila genome.
Mapping produces a BAM file, a compressed format containing the mapping information, which was inspected and visualised with IGV (Integrative Genomics Viewer) [19] [20]. IGV shows the mapped reads at each position, and connecting lines between aligned segments indicate reads that span introns (Figure 2).
3.2. Counting Number of Reads per Annotated Genes
Many programs are available for counting the number of reads corresponding to each annotated gene [21] [22] [23]. We used the featureCounts [15] program, which takes the mapped BAM files and the strandedness of the library as input. Because our RNA-seq data are unstranded, counts were obtained in both the forward and reverse directions. A GTF file was also supplied as input to assign reads to annotated genes. Only 20% - 30% of reads were assigned to genes; the numbers are low because reads from a single tissue type were counted against the annotation of the whole fly genome (Figure 3).
Figure 2. Mapped regions on chromosome X for all 8 samples. The regions are covered differently in each sample, indicating differential expression of these regions.
Figure 3. Basic statistics of the reads assigned to genes. The X-axis shows the percentage of reads assigned to genes and the Y-axis shows the sample number.
3.3. Identification of Differentially Expressed Features
To identify differential gene expression [24] in old flies, the full dataset of 8 samples, each with forward and reverse reads, was analysed: 4 samples of young flies (5 and 10 days old) and 4 samples of old flies (30 and 40 days old), giving 8 files of forward and reverse counts for each gene. Some samples have more reads than others, which reflects a higher sequencing depth. The number of reads mapped to a gene therefore depends on the sequencing depth as well as on the length of the gene: the longer the gene, the more reads are assigned to it. To deal with this, the data were normalised, and limma-voom [13] [14] was used to run the differential expression analysis. An R script calling the limma package normalised the count table obtained from the different samples. This step equalises the relative abundance of each gene across RNA samples, because a small number of genes that are highly expressed in one sample compared with another may otherwise cause false positives. We included a single factor, age, with two levels: young flies and old flies. The resulting data describe the differential regulation of genes in old flies compared with young flies. Of approximately 17,555 genes, 141 were upregulated and 262 were downregulated in old flies compared with young flies. The summary table contains the gene identifier, the mean of normalised counts averaged over all samples, the log2 fold change, the standard error estimate for the log2 fold change, the Wald statistic, the p value, and the p value adjusted for multiple testing to control the FDR. A graphical summary of the results was also obtained, comprising a mean-variance plot, MDS and box plots, and a volcano plot (Figures 4-6), which show the variance and similarity in gene expression between samples.
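To make the depth normalisation concrete, the sketch below computes log2 counts per million (log-CPM) in the form used by voom, log2((count + 0.5) / (library size + 1) x 10^6). It covers only this transformation: voom's mean-variance trend estimation and observation weights, and any scaling of library sizes by normalisation factors, are not reproduced. The study itself used an R script, so this Python version is purely illustrative, and the sample names and counts are hypothetical.

```python
import math


def log2_cpm(counts_by_sample):
    """Convert raw counts to log2 counts per million for each sample.

    counts_by_sample: dict mapping sample -> dict mapping gene -> raw count.
    The 0.5 and 1.0 offsets avoid taking the logarithm of zero.
    """
    result = {}
    for sample, counts in counts_by_sample.items():
        lib_size = sum(counts.values())  # total assigned reads in this sample
        result[sample] = {
            gene: math.log2((count + 0.5) / (lib_size + 1.0) * 1e6)
            for gene, count in counts.items()
        }
    return result


# Hypothetical samples with the same composition but tenfold different depth.
raw = {
    "young_1": {"geneA": 10, "geneB": 90},
    "old_1": {"geneA": 100, "geneB": 900},
}
normalised = log2_cpm(raw)
```

Although the raw counts of `geneA` differ tenfold between the two hypothetical samples, their log-CPM values are nearly equal, which is the effect of normalising for sequencing depth.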
3.4. Extraction and Annotation of Differentially Expressed Genes
These results were filtered to extract and annotate the differentially expressed genes, defined by an absolute fold change greater than 2. First, genes with an adjusted p value below 0.05 were retained, which gave 430 significantly differentially expressed genes. This set was then narrowed by keeping genes with an absolute log2 fold change greater than 1, which left 350 differentially expressed genes with a significant adjusted p value. The mean variance of the table and Z scores were calculated, and the genes were annotated using DAVID. The top 32 genes with their Z scores are shown in a heatmap (Figure 6).
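The two filtering thresholds and the Z-score scaling can be sketched as follows. The cut-offs (adjusted p value below 0.05, absolute log2 fold change above 1) come from the text; the record layout, the function names and the example genes are hypothetical, and the Z score is assumed here to be the usual row-wise scaling by the sample standard deviation.

```python
import statistics


def select_de_genes(results, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Split significant genes into upregulated and downregulated lists.

    results: iterable of dicts with keys 'gene', 'log2fc' and 'padj'.
    """
    up, down = [], []
    for row in results:
        if row["padj"] < padj_cutoff and abs(row["log2fc"]) > lfc_cutoff:
            (up if row["log2fc"] > 0 else down).append(row["gene"])
    return up, down


def row_z_scores(values):
    """Scale one gene's expression values to mean 0 and unit sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant row: no variation to scale
    return [(v - mean) / sd for v in values]


# Hypothetical result rows.
rows = [
    {"gene": "gene1", "log2fc": 2.3, "padj": 0.001},
    {"gene": "gene2", "log2fc": -1.8, "padj": 0.01},
    {"gene": "gene3", "log2fc": 0.4, "padj": 0.0005},
    {"gene": "gene4", "log2fc": 3.0, "padj": 0.2},
]
up_genes, down_genes = select_de_genes(rows)
```

Applying `row_z_scores` to each selected gene's normalised expression values gives the kind of matrix that a heatmap displays.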
3.5. Gene Ontology Analysis
The genes were analysed further to identify the biological pathways in which they take part and to understand the process of aging in more depth. GO analysis reduces complexity by linking each gene directly to its biological and molecular function. The goseq program was used to identify the pathways related to these genes. The Wallenius method was used to rank categories and to calculate, for each GO term, the over-represented p value, the under-represented p value, the number of differentially expressed genes, the number of genes in the category and the details of the term. The top 10 over-represented GO terms are plotted in Figure 7, which shows genes involved in stress response, apoptotic process, histone modification, covalent chromatin modification, protein modification by small protein and protein ubiquitination, and regulation of immune and defence responses. KEGG pathways for the topmost differentially expressed genes were also plotted using pathview [25]; they show the genes and their roles in pathways involved in aging and dementia (Figure 8).
Figure 4. (a) Fold change values: red dots denote genes with positive values, i.e. upregulated genes, and blue dots denote genes with negative log fold values, i.e. downregulated genes. (b) Mean-variance model showing the genes that are outliers and the genes that fall along the mean-variance trend. (c) Log fold change of each gene: negative values are downregulated genes and positive values are upregulated genes.
Figure 6. Heatmap of the topmost differentially expressed genes. Rows represent genes and columns represent samples. Blue shows negative scores, i.e. downregulated genes, and red shows upregulated genes in the head samples of old and young flies.
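The over-represented p value reported for each GO term in the gene ontology analysis can be illustrated with a plain hypergeometric tail probability. This is a simplification: goseq uses the Wallenius approximation to correct for gene-length bias, which the sketch below does not attempt. The function name and the category sizes in the usage example are hypothetical; only the totals of 17,555 genes and 350 differentially expressed genes are taken from the text.

```python
from math import comb


def overrepresentation_p(n_total, n_category, n_de, n_de_in_category):
    """Probability of seeing at least n_de_in_category DE genes in a category.

    Assumes DE genes are drawn at random without replacement
    (hypergeometric model, with no correction for gene length).
    """
    upper = min(n_category, n_de)
    tail = sum(
        comb(n_category, k) * comb(n_total - n_category, n_de - k)
        for k in range(n_de_in_category, upper + 1)
    )
    return tail / comb(n_total, n_de)


# 17,555 genes and 350 DE genes as in the text; a hypothetical GO category
# of 200 genes, 15 of which are differentially expressed.
p_over = overrepresentation_p(17555, 200, 350, 15)
```

A small `p_over` indicates that the category contains more differentially expressed genes than expected by chance; the under-represented p value is the corresponding lower tail.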
Figure 7. Topmost over-represented GO terms for the genes differentially expressed in old flies compared with young flies.
Our results provide a link between the aging brain and dementia, which can be used further to establish therapeutics against neurodegeneration and brain aging. The data show that the two run in parallel and that the factors affecting brain aging contribute strongly to neurodegeneration and dementia.
4. Conclusion
Figure 8. Metabolic and biological pathways containing genes that are differentially expressed in aging flies. A red star in a pathway marks a gene found to be differentially expressed. (a) Oxidative phosphorylation pathway and metabolic pathway. (b) Glycan degradation pathway. (c) Glycerophospholipid metabolism. (d) Neuroactive ligand-receptor interaction.
Mitochondrial DNA damage and oxidative stress run in parallel during aging and lead to dementia, because the production of reactive oxygen species surpasses the cellular antioxidant defence system, which includes antioxidant enzymes; this decline is readily seen in the AD brain [8]. There is also evidence that AD affects both the brain and the periphery, where many causative pathways act and interact, and that many factors raise the risk of AD, such as diabetes, obesity, hypertension, stroke and other cardiovascular risk factors [26]. Neurodegeneration and aging are inseparable, but which drives the other is still debated, because brain function changes and the aggregation of toxic proteins increases with aging. At the same time, apoptosis occurs, leading to a reduction in brain volume [27]. Mitochondrial energy production in turn decreases as oxidative stress increases, and the resulting decline in mitochondrial function is a cause of aging.
The brain transcriptome changes in aging flies were recently published [9], showing the pathways and genes involved in aging. In our study, we found 262 genes downregulated in old flies; these are associated with many pathways required for the normal functioning of brain and body and for promoting neuronal growth and neurogenesis in the aging brain. A progressive increase in the expression of genes involved in nucleic acid oxidation in mitochondrial DNA has recently been reported with aging and in AD cases; it leads to oxidative stress [28] [29] [30] and contributes strongly to aging and dementia. Consistent with this, our gene ontology analysis identified CG10211, CG13280, CG16761 and ND6 in Drosophila, indicating high expression of nucleic acid oxidation genes present in mitochondrial DNA. Dementia is a common consequence of diabetes, as insulin levels may affect neurotransmission, cell survival and amyloid trafficking [31] [32]. The differential expression data in our study also include many genes, such as CG7985, Glucosidase 2 alpha subunit, that are involved in insulin modulation. Inflammatory proteins in plasma are also associated with the severity of dementia [33]. Lectin-galC1 and a Drosomycin-like gene, which provides defence against fungal infection, were also found to be downregulated. Okouchi M et al. showed that changes in the regulation of apoptosis lead to many neurodegenerative disorders, such as Alzheimer’s, Parkinson’s and Huntington’s (HD) diseases, amyotrophic lateral sclerosis (ALS), spinal muscular atrophy (SMA) and diabetic encephalopathy. In flies, our results show that Drep3, DNA pol alpha and Psn (presenilin) are differentially regulated with aging. Aberrant regulation of signal transduction pathways, epigenetic regulation, the immune system, the vascular system and angiogenesis is a major factor affecting neurogenesis, and we found various genes, receptors and hormones related to these pathways that change in expression in adult flies [34].
Many genes, peptides and receptors mapped by the reads of adult (aging) and young flies show changes in expression related to neurogenesis and neurotransmitter decline, such as chemosensory protein, calcineurin, Pyrokinin 2 receptor, mucin, Adipokinetic hormone, spatzle, suppressor of zeste, pleiohomeotic and mthl. Hence, an in-depth study of these genes, receptors, peptides and regulators will help to identify specific genes that can be controlled during aging, through medical therapeutic techniques, to keep the brain functioning normally.