Compilation and Analysis of Atherosclerosis Gene Expression Data

The objective of this project was to search for consensus in differential gene expression data and in regulation of differentially expressed genes among DNA microarray studies of atherosclerotic vessels and plaque. Seventeen DNA microarray studies of atherosclerosis were analyzed. Only 19 genes were found to be differentially expressed in 3 or more of the studies. The nineteen genes belong to classic gene ontologies known to be involved in atherosclerosis: immunity and defense, metabolism, proteases, receptors, and signal transduction. Four bioinformatics programs (TRED, rVISTA, JASPAR, and Ariadne Pathways) were used to further analyze the promoter regions and common upstream regulators of the 19 genes. Twelve of the genes shared nine common upstream regulators, many of them known to affect atherosclerosis, and one possible new pathway was identified that may be involved in this disease.


Introduction
A number of DNA microarray studies of atherosclerosis from human tissue specimens, mouse models, and non-human primate models have been published over the past decade. These publications established lists of genes that were significantly changed in vessels containing atherosclerotic plaques as well as in the atherosclerotic plaques themselves. While DNA microarray technology is a powerful tool, there may be significant variability among studies [1], leading to gene lists that may not agree. The tools of bioinformatics can be used to search for a consensus among these lists from DNA microarray studies of established atherosclerosis.
This project was undertaken to search for similarities in differential gene expression data among DNA microarray studies of atherosclerotic vessels and plaque. We hypothesized that DNA microarray data from a variety of atherosclerotic vascular beds would provide a consensus list of differentially expressed genes that would uncover molecular mechanisms underlying the pathology of atherosclerosis. In order to identify prominent gene expression changes in established atherosclerosis, we searched for genes that were differentially expressed in microarray publications of atherosclerotic vessels and plaque. We employed several bioinformatics programs to identify molecular data related to these differentially expressed genes and their relationship to atherosclerosis.

Methods
Seventeen DNA microarray studies pertaining to atherosclerosis and published between 2002 and 2013 were examined [2]-[18]. Fourteen of the seventeen publications probed human vasculature, one examined apolipoprotein E knock-out mice [9], and one investigated a non-human primate model [17]. In addition, a review article that characterized genes that may be involved in plaque rupture, drawn from several human and mouse specimens, was also included [2]. Many of the raw datasets had not been deposited in a publicly available database, so this analysis could not use Gene Set Enrichment Analysis with careful statistical analysis. Rather, this study relied upon the statistical analysis carried out by the original authors, who identified significant differences in gene expression in their studies. Our study correlated and compiled those lists of differentially expressed genes. The lists of differentially expressed genes from each DNA microarray publication were compared to identify those genes that were significantly different in three or more publications. The consensus list of differentially expressed genes was then analyzed with Ariadne Pathways software (Version 9, Ariadne Genomics Inc., Rockville, MD) to determine whether the genes shared common upstream regulators. Transcription factors activated by the common upstream regulators were identified in the literature.
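The comparison step described above (keeping genes reported in three or more publications) can be illustrated with a minimal sketch. This is not the procedure the authors ran; the function name `consensus_genes`, the study labels, and the placeholder gene symbols are assumptions made purely for illustration, and the sketch assumes each publication's list has already been reduced to official gene symbols.

```python
from collections import Counter

def consensus_genes(study_gene_lists, min_studies=3):
    """Return genes reported as differentially expressed in at least
    `min_studies` of the supplied per-study gene lists."""
    counts = Counter()
    for genes in study_gene_lists.values():
        # Count each gene once per study; normalise case and whitespace.
        counts.update({g.strip().upper() for g in genes})
    return sorted(g for g, n in counts.items() if n >= min_studies)

# Hypothetical toy input: placeholder symbols, not data from the reviewed studies.
toy_lists = {
    "study_1": ["GENE_A", "GENE_B", "GENE_C"],
    "study_2": ["gene_a", "GENE_B", "GENE_D"],
    "study_3": ["GENE_A", "GENE_C", "GENE_E"],
    "study_4": ["GENE_B", "GENE_D"],
}

print(consensus_genes(toy_lists))  # ['GENE_A', 'GENE_B']
```

Because the counting relies on each original study's own significance calls, a sketch like this inherits whatever thresholds those authors used; it does not re-test the raw expression data.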
Three bioinformatics programs were used to predict transcription factor binding to the promoter regions of genes in the consensus list of the differentially expressed genes. These were the JASPAR database (http://jaspar.genereg.net/), rVISTA (http://rvista.dcode.org/), and TRED (Transcriptional Regulatory Element Database) (http://rulai.cshl.edu/cgi-bin/TRED/tred.cgi?process=home). The JASPAR and rVISTA databases use position weight matrices to predict transcription factor binding within given DNA sequences. Both programs contain position weight matrices for the relevant transcription factors STAT1, STAT3, ELK, HIF1, AP1, EGR1, CREB, and SP1. JASPAR also includes the position weight matrix for RelA, whereas rVISTA includes those for SMAD, PPARA, and PPARG. An additional function in rVISTA is the ability to compare the predicted binding site sequences between two species; human and mouse sequences were compared in this analysis. TRED contains a curated list of experimentally defined and published transcription factors shown to bind to the regulatory regions of genes, as well as the promoter sequences for each of the genes. The transcription factors listed in TRED that were pertinent to this analysis were STAT1, STAT3, JUN, HIF1, SP1, SMAD3, RelA, PPARA, and PPARG.
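For readers unfamiliar with position weight matrices, the following sketch shows the general idea of scoring a DNA sequence against one. It is a generic log-odds scan, not the scoring scheme of JASPAR or rVISTA, whose internal thresholds and background models may differ. The 4-bp motif, the pseudocount, the uniform background, and the threshold are all hypothetical values chosen for illustration, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"

def counts_to_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds weights."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((col[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def scan(sequence, pwm, threshold):
    """Return (position, score) for each forward-strand window whose
    summed log-odds score is at least `threshold`."""
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(ch not in BASES for ch in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[j][ch] for j, ch in enumerate(window))
        if score >= threshold:
            hits.append((i, round(score, 2)))
    return hits

# Hypothetical count matrix for an invented 4-bp motif "TGAC" (10 observations).
toy_counts = [{b: (10 if b == c else 0) for b in BASES} for c in "TGAC"]
toy_pwm = counts_to_pwm(toy_counts)

hits = scan("AATGACGG", toy_pwm, threshold=5.0)
print(hits)  # one hit, at position 2
```

A real analysis would also scan the reverse complement and would choose the threshold from the matrix's score distribution rather than a fixed number.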
The results from the bioinformatics analyses were organized into a chart matrix. When all of the bioinformatics programs predicted the same upstream regulator, transcription factor, and differentially expressed gene, the pathway was further analyzed via literature searches to determine its relevance to atherosclerosis.

Results
Nineteen genes were found to be differentially expressed in three or more of the DNA microarray studies. (ontology: signal transduction). Eighteen of the genes were upregulated and one, IGFBP6, was downregulated. Four of the original seventeen DNA microarray publications did not include any genes in the consensus list [2], [7], [8], [16], although the number of genes measured in these studies ranged from 205 to 47,000. Fifteen of the nineteen consensus genes were represented in the gene list from the non-human primate model [17], and ten differentially expressed genes were found in the list from human femoral artery [14]. The remaining publications each contained five or fewer genes from the consensus list.
The nineteen genes were analyzed with Ariadne Pathways software to determine whether the genes shared upstream regulators. Twelve of the nineteen genes shared nine common upstream regulators (Table 1), which were categorized into three types: ligands (IFNG, IL6, TGFB, and TNF), signal transduction pathway components (MAPK1/ERK and MAPK14/p38), and regulators that act directly as transcription factors (SP1, PPARA, and PPARG). The six regulators involved in signal transduction pathways (ligands and components) have the ability to activate more than one transcription factor in order to alter gene expression. Literature searches were performed to search for the preferred transcription factors for each upstream regulator. The JASPAR, rVISTA, and TRED databases were used to predict binding sites for those transcription factors in the promoter regions of the differentially expressed genes.
The data from each of the three transcription factor binding site prediction programs were organized into a chart matrix (Table 2). The JASPAR database detected 119 transcription factor binding sites in the promoter regions of the twelve genes that shared upstream regulators. Only 7 of the promoters of the 12 genes contained conserved sequences when analyzed in rVISTA, which reduced the number of transcription factor binding sites to 25. TRED added further stringency because the data in this database rely on published transcription factor binding data, but TRED contained data for only 9 of the 12 genes. TRED identified 25 binding sites within the promoter regions of those differentially expressed genes. Of particular interest were the transcription factor binding sites detected by all three programs in the same gene (Table 2). Only four distinct transcription factor binding sites in three differentially expressed genes were detected by all three bioinformatics programs. By combining the data from the three programs, definitive signal transduction pathways were predicted. For the purpose of this analysis, a signal transduction pathway is defined as an upstream regulator, a transcription factor, and a differentially expressed gene from the consensus list. The four predicted pathways were IFNG→STAT1→CCL2, IL6→STAT3→CCL2, ERK→SP1→MMP9, and ERK→SP1→BAX.
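The combination step described above can be sketched as a set intersection followed by a join against the regulator-to-transcription-factor assignments. This is an illustrative sketch only, assuming each program's output has been reduced to (transcription factor, gene) pairs; the function names and all regulator, factor, and gene labels below are hypothetical placeholders, not the study's data.

```python
def consensus_sites(*tool_predictions):
    """Return the (transcription_factor, gene) pairs predicted by every tool."""
    sets = [set(p) for p in tool_predictions]
    return set.intersection(*sets) if sets else set()

def build_pathways(regulator_to_tfs, shared_sites):
    """Join upstream regulators to shared binding sites, yielding
    (regulator, transcription_factor, gene) triples."""
    return sorted(
        (regulator, tf, gene)
        for regulator, tfs in regulator_to_tfs.items()
        for tf, gene in shared_sites
        if tf in tfs
    )

# Hypothetical placeholder predictions from three tools.
tool_1 = {("TF1", "GENE_A"), ("TF1", "GENE_B"), ("TF2", "GENE_A")}
tool_2 = {("TF1", "GENE_A"), ("TF2", "GENE_A")}
tool_3 = {("TF1", "GENE_A"), ("TF2", "GENE_C")}

# Hypothetical literature-derived mapping of regulators to their transcription factors.
regulators = {"REG_X": {"TF1"}, "REG_Y": {"TF2"}}

shared = consensus_sites(tool_1, tool_2, tool_3)
pathways = build_pathways(regulators, shared)
print(pathways)  # [('REG_X', 'TF1', 'GENE_A')]
```

Requiring agreement among all tools trades sensitivity for stringency: a binding site missing from any one source, for example because a curated database has no entry for that gene, drops out of the result.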

Discussion
The goal of this study was to identify a list of genes whose expression was affected by atherosclerosis and to search for regulatory commonalities among the genes on the consensus list. The list of genes was developed by noting consensus among genes that appeared on the differential expression gene lists of three or more DNA microarray publications on established atherosclerotic vessels and plaque. Regulatory commonalities were ascertained through the application of four bioinformatics programs/databases: Ariadne Pathways, TRED, rVISTA, and JASPAR. It was necessary to use all of these databases because, although each contributed to a critical aspect of the analysis, none was individually able to provide a complete analysis. The 19 differentially expressed genes identified by consensus in this study were categorized into five functional categories: immunity and defense, metabolism, proteases, receptors, and signal transduction. Because atherosclerosis is an inflammatory disease, genes in the immunity and defense ontology play a role in it. Oxidized low density lipoproteins engage the immune system and activate antigen presenting cells, which in turn stimulate the formation of T cells, thereby exacerbating atherosclerosis [19]. Monocytes and macrophages take up the oxidized low density lipoproteins in order to protect the vessel from the toxic effects of the lipoproteins [20]. CCL2, one of the genes identified in this study, has been shown to mediate infiltration of monocytes and macrophages into the vessel wall by trafficking these cells to areas of inflammation in atherosclerosis [21]. CCL2 has also been linked to increased endothelial and smooth muscle cell proliferation [22].
GAPDH, APOE, and FABP5 are involved in metabolism. The finding that GAPDH was changed in three DNA microarray studies was surprising. GAPDH is widely used as a reference gene for molecular biology experiments. However, GAPDH may play other roles in addition to its metabolic housekeeping functions. GAPDH has been shown to interact with the cytoskeleton, regulate phosphotransferase and kinase activity, control mRNA, manage tRNA export, and be involved in DNA replication and repair [23]. Moreover, GAPDH has been shown to control the functions of the macrophage scavenger receptor (MSR), a mediator in atherosclerosis development [23]. In atherosclerosis, APOE plays a protective role in the vasculature. When APOE activates its receptor on macrophages, the macrophages revert to a less inflammatory phenotype [24]. APOE also regulates cholesterol efflux [25]. FABP5 has been linked to ER stress and is a proatherogenic protein. Knockout studies of FABP5 have shown reduced atherosclerosis formation [26], and both macrophages and adipocytes are sensitive to the effects of FABP5 [27].
Members of the protease ontology found in our study, including CTSS and MMP9, play roles in plaque instability and rupture. Increased levels of CTSS have been associated with the presence of atherosclerosis in both mice and humans [28]. The expression of CTSS leads to disruption of the vessel wall, which aids in the migration of both monocytes and smooth muscle cells into the area; both of these cell types are involved in the formation of atherosclerosis [29]. MMPs can contribute to plaque instability and rupture [30]. MMP9 is protective to the vascular wall when expressed normally but leads to increased inflammation and plaque destabilization when overexpressed [31]. Caspases, which are involved in apoptosis signaling, are involved in cell death in the atherosclerotic plaque [32]. The inhibition of caspase 8, identified in our study, protects macrophages from apoptosis [32].
Three receptors were identified in this analysis: CD163, CD63, and CSF2RB. CD163 may act as a protective molecule by upregulating heme oxygenase-1, which guards the vessel from atherosclerosis [33]. However, in symptomatic plaque, increased expression of both CD163 and heme oxygenase-1 has been found, which suggests that these may have been upregulated in response to plaque hemorrhages [34]. CD63 was increased in a rabbit model of atherosclerosis as well as in plaque isolated from human vessels [35]. The role of CSF2RB is largely unstudied in atherosclerosis.
Four signaling proteins, MAP2K1, RARRES1, IGFBP6, and BAX, were identified as changed in atherosclerotic plaque in this study. The mitogen activated protein kinase signaling pathways, which include MAP2K1, have all been shown to be involved in atherosclerosis. Activation of the pathway which includes MAP2K1 by IL6 can destabilize atherosclerotic plaques [36]. Activation of this pathway can also lead to macrophage proliferation as well as smooth muscle cell activation [37]. The functions in atherosclerosis of RARRES1 and of IGFBP6, the only downregulated gene in the consensus list, have yet to be defined.
Many of the genes shared the common upstream regulators IFNG, IL6, and MAPK1. The cytokine IFNG is secreted by all of the immune cells found in atherosclerotic plaque [21]. Not only does IFNG act upon macrophages, it also regulates the expression of adhesion molecules on endothelial cells [38]. Because of its variety of roles in plaque formation and rupture, it is not surprising that IFNG has the ability to increase the expression of many of the atherosclerosis-associated genes found in this study. IL6 was another upstream regulator that can affect the expression of genes on the consensus list. IL6 has been shown to induce the release of other inflammatory cytokines as well as prothrombotic mediators, activate MMPs, and lead to plaque development and destabilization [39]. As stated above, the mitogen activated protein kinase pathway is involved in atherosclerosis. MAPK1, identified in this analysis as capable of controlling the expression of seven of the twelve genes, can phosphorylate a multitude of protein targets [21]. It has also been shown to stimulate and induce migration of smooth muscle cells [40].
When the data were examined from the perspective of intact pathways composed of an upstream regulator, a transcription factor, and a differentially expressed gene, four distinct pathways were identified that may play a role in atherosclerosis. The expression of CCL2 was predicted to be controlled by IFNG and IL6, and the expression of MMP9 and BAX by ERK. The transcription factors involved were STAT1, STAT3, and SP1. The involvement of the pathways that activate CCL2 and MMP9 has been described elsewhere [41]-[43]. However, the pathway MAPK1→SP1→BAX has yet to be characterized in established atherosclerosis.
BAX is one member of a large family of proteins involved in apoptosis. Activated BAX homodimerizes, inserts itself into the mitochondrial membrane, and causes the release of cytochrome C [44]. These actions induce apoptosis in the cell. The loss of BAX-dependent apoptosis in macrophage cells increases the size of atherosclerotic plaque due to the increased presence of apoptosis-resistant macrophages [45]. In the data presented here, BAX expression was increased in atherosclerotic plaque. Endothelial cells, smooth muscle cells, and macrophages are affected by increased apoptosis in the presence of atherosclerotic plaque [46]-[48]. Since aberrant apoptosis occurs in all of the cell types involved in atherosclerosis, increased BAX expression is an important finding.
While the relationship between MAPK1 and BAX gene expression has been studied in atherosclerosis, there are no publications reporting that MAPK1 can induce BAX expression specifically through the transcription factor SP1. However, it has been shown that SP1 can bind to the BAX promoter [49]. The study herein identified a new pathway through which this gene, BAX, may be regulated in atherosclerosis. Further research is needed in order to support the concept that MAPK1 can increase the expression of BAX in vascular cells via the transcription factor SP1.

Conclusion
The bioinformatics analysis described herein provides more insight into the underlying molecular biology of atherosclerosis than a simple survey of available microarray data of this disease. These data have shown that the differentially expressed genes represent classic gene ontologies involved in atherosclerosis. We observed a degree of consensus among the differentially expressed genes from the microarray studies regardless of the vascular bed studied or the microarray platform used. However, the consensus was limited. Only nineteen genes out of the thousands measured achieved accordance in three or more studies. On the other hand, we did observe general agreement in many of the upstream regulators of those genes with factors known to affect atherosclerosis. Moreover, the application of three transcription factor site prediction programs identified established pathways in atherosclerosis as well as the possibility of a new signal transduction pathway involved in this disease.

Table 1.
Upstream Regulators Identified by Ariadne Pathways Software. Twelve of the nineteen differentially expressed genes shared nine common upstream regulators.

Table 2.
Transcription factor binding predictions derived from bioinformatics analysis of atherosclerosis gene expression using the JASPAR, rVISTA, and TRED databases. All three of the databases identified four distinct pathways involved in atherosclerosis. The column labels denote the upstream regulators and transcription factors. The row labels name the differentially expressed genes.