An in silico Analysis of Upstream Regulatory Modules (URMs) of Tapetum Specific Genes to Identify Regulatory cis-Elements and Transcription Factors

The present work describes an in silico analysis of the Upstream Regulatory Modules (URMs) of genes expressed in a tapetum-specific manner in dicotyledonous and monocotyledonous plants. Using tools and databases such as MEME, PLACE, MAST and TFSEARCH, we identified several motifs conserved in these URMs, and ten known cis-elements were found within them. We also found that binding sites for two transcription factors, DOF and WRKY71, were present in the majority of the URMs.


Introduction
The tapetum is the innermost layer of the anther wall in plants. It functions as a nourishing tissue and remains in continuity with the pollen mother cells through plasmodesmatal connections until meiocytes are formed in the young anther. The tapetum varies from a single layer to multiple layers in different plant species, and its cells can be uninucleate or multinucleate. Although tapetal cells form only one or at most a few cell layers in the anther, several studies have been carried out to understand how these layers develop and what roles they play in pollen development [1] [2] [3] [4]. These studies have led to the identification of several genes expressed in a tapetum-specific manner. Such genes have mainly been identified by comparing cDNA libraries, subtractive hybridization, microarray analysis and in situ hybridization, and in recent years by laser microdissection followed by RNA sequencing [1] [5]-[16].
TA29 from Nicotiana tabacum [1] and A9 from Arabidopsis thaliana [6] are examples of tapetum-specific genes identified in the early years of this research. Their promoters, known as the TA29 and A9 promoters, have been used extensively to express transgenes such as barnase and barstar from Bacillus amyloliquefaciens in pollination control systems for hybrid seed production [5] [17] [18] [19] [20] [21]. The tight regulation of these promoters, which restricts expression to the tapetum, was key to the success of these systems. Achieving robust tissue specificity in a promoter may require the combinatorial interplay of positive and negative regulators, i.e. transcription factors (TFs). TFs exert their effects by binding to specific motifs, or cis-elements, in the promoter. Several tapetum-specific promoters have been identified to date; examples are summarized in Table 1. However, little is known about the transcription factors or the promoter cis-elements that are important for regulating these promoters. Although some tapetum-specific promoters, e.g. OsLTP6 from rice [22] and A9 from Arabidopsis [23], have recently been characterized in detail, in most studies characterization has been limited to identifying the minimum promoter length needed for tapetum-specific expression.
The present work is an attempt to identify conserved motifs/cis-elements in genes expressed in the tapetum of dicotyledonous and monocotyledonous plants. In addition, putative TFs that may bind to these elements have been predicted. The information generated in this work can guide experimental validation.

Method
The MEME Suite of motif-based sequence analysis tools, ver. 4.9.1 [24], was used to find conserved motifs in the different datasets. The PLACE database [25] was used to identify known cis-elements within the conserved motifs so obtained. MAST (Motif Alignment & Search Tool) ver. 4.9.1 [26] was used to obtain the consensus sequences of the conserved motifs identified by MEME. TFSEARCH ver. 1.3 [27] was used to find putative transcription factor binding sites (TFBS) and the corresponding transcription factors.

Results and Discussion
A literature survey was carried out to identify genes expressed in a tapetum- or anther-specific manner. A total of 34 genes, 24 from dicot and 10 from monocot plants, were identified and used in the present analysis (Table 1). From these, two datasets of 600 bp Upstream Regulatory Modules (URMs) were developed, one from dicot and the other from monocot species. A URM [40] is defined as the region of a gene upstream of the translational start site, including the 5' UTR. Analyzing the URM was necessary because, in most cases, the transcriptional start site has not been experimentally identified. The sequences for the respective URMs were downloaded from the NCBI website.
P. A. Sharma, P. K. Burma
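The URM definition above can be made concrete with a short Python sketch. It assumes that a genomic sequence and the position of the start codon are already in hand (the text only states that sequences were downloaded from NCBI). The function names and the 0-based, plus-strand coordinate convention are hypothetical choices for illustration. NCBI coordinates are 1-based, so a value read from an NCBI record would need 1 subtracted.

```python
# Sketch: extract a fixed-length URM that ends just before the start codon,
# so the 5' UTR is included. Coordinate convention (hypothetical): 0-based,
# plus-strand index of the leftmost base of the start-codon triplet.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_urm(genome: str, codon_start: int, strand: str = "+", length: int = 600) -> str:
    """Return up to `length` nt immediately upstream of the start codon.

    For '+' genes the ATG occupies genome[codon_start:codon_start + 3].
    For '-' genes the same slice reads CAT on the plus strand, and the
    upstream region lies to its right on the plus strand.
    """
    codon = genome[codon_start:codon_start + 3].upper()
    if strand == "+":
        if codon != "ATG":
            raise ValueError(f"expected ATG at {codon_start}, found {codon!r}")
        return genome[max(0, codon_start - length):codon_start]
    if strand == "-":
        if codon != "CAT":
            raise ValueError(f"expected CAT (ATG on minus strand) at {codon_start}, found {codon!r}")
        upstream = genome[codon_start + 3:codon_start + 3 + length]
        return reverse_complement(upstream)
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    # Toy plus-strand gene: ATG starts at index 8.
    print(extract_urm("GGGGCCCCATGAAA", 8, "+", 4))  # CCCC
    # Toy minus-strand gene: CAT (ATG on the minus strand) at index 3.
    print(extract_urm("AAACATCCGA", 3, "-", 4))  # TCGG
```

Anchoring the window on the start codon, rather than on a transcriptional start site, allows the same 600 bp window to be taken for every gene, including those whose transcriptional start site is unknown.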
The dicot and monocot URM sequence files thus generated were submitted separately to the online MEME tool for the analysis of conserved motifs. To identify the conserved motifs, MEME was run with different motif-width settings (5-13, 6-10 or 6-14 nucleotides), with the number of motifs to be reported fixed at 10. On comparing the motifs obtained with the different widths, we observed that in most cases the motifs generated with a width of 6-14 encompassed those generated with widths of 5-13 or 6-10. Thus, the 10 motifs generated with a width of 6-14 were taken for further analysis. The positions of the motifs in the different URMs, as generated by MEME for both datasets, and the motif sequences, as identified by MAST, are presented in Figure 1 and Figure 2.
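The width sweep above was run on the MEME web server. For readers who want to reproduce it locally, the following Python sketch only assembles the equivalent `meme` command lines. It assumes a local MEME Suite installation and hypothetical FASTA file names (`dicot_urms.fasta`, `monocot_urms.fasta`). Settings the text does not report, such as the site-distribution model and strand handling, are left at MEME's defaults.

```python
import shlex
from typing import List, Sequence, Tuple

# Width settings (minimum, maximum motif width) tried in the analysis.
WIDTH_SETTINGS: List[Tuple[int, int]] = [(5, 13), (6, 10), (6, 14)]
N_MOTIFS = 10  # number of motifs requested per run


def meme_command(fasta_path: str, out_dir: str, minw: int, maxw: int,
                 nmotifs: int = N_MOTIFS) -> List[str]:
    """Build (but do not run) a MEME command line for one width setting."""
    return [
        "meme", fasta_path,
        "-dna",                # DNA alphabet
        "-oc", out_dir,        # output directory (overwritten if present)
        "-nmotifs", str(nmotifs),
        "-minw", str(minw),
        "-maxw", str(maxw),
    ]


def all_commands(datasets: Sequence[str] = ("dicot", "monocot")) -> List[List[str]]:
    """One command per dataset and width setting; file names are hypothetical."""
    commands = []
    for name in datasets:
        for minw, maxw in WIDTH_SETTINGS:
            commands.append(meme_command(f"{name}_urms.fasta",
                                         f"meme_{name}_w{minw}-{maxw}", minw, maxw))
    return commands


if __name__ == "__main__":
    for cmd in all_commands():
        # Each list could be passed to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Keeping the widths in one list makes it easy to rerun the comparison of the three settings on both datasets. Choosing the 6-14 run for downstream analysis remains a manual judgement, as described above.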
After identifying the conserved motifs in the two datasets of anther/tapetum-specific genes, the next step was to determine whether these motifs corresponded to any known cis-elements of plant promoters. To do this, strings of the identified motifs were created and submitted to the PLACE database, which identified several known cis-elements. These results were then manually curated, yielding the 10 known cis-elements listed in Table 2. Of the 10 motifs identified in dicots (Figure 1), 6 contained cis-elements that have already been reported in the literature; no known cis-elements were identified for motifs 1, 2, 8 and 10. In monocots, known cis-elements could be identified only for motifs 1, 5 and 7. This may reflect the fact that more information is generally available for dicot promoters than for monocot promoters.
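The PLACE step amounts to looking for short, often degenerate, consensus sequences on both strands of each query. The Python sketch below illustrates that idea using IUPAC codes. It is not PLACE's own search code. The two example elements (the Dof core AAAG and the W-box core TGAC) are used only for illustration and are not the contents of Table 2. The hypothetical helper `fraction_with_element` shows how one might count the share of URMs carrying a given element, the kind of tally behind a statement that a binding site occurs in most URMs.

```python
import re
from typing import Dict, List, Tuple

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of every IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W, N map to themselves).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Illustrative elements only (not the contents of Table 2).
EXAMPLE_ELEMENTS = {"DOF_core": "AAAG", "W-box_core": "TGAC"}


def revcomp(seq: str) -> str:
    """Reverse complement that also handles IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def to_regex(consensus: str):
    """Compile a consensus into a regex; the lookahead reports overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile(f"(?=({body}))")


def scan(sequence: str, elements: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Return (element, strand, start) hits; starts are 0-based plus-strand positions."""
    seq = sequence.upper()
    hits = []
    for name, consensus in elements.items():
        plus, minus = consensus.upper(), revcomp(consensus)
        # A palindromic element is scanned once to avoid double counting.
        patterns = [("+", plus)] if plus == minus else [("+", plus), ("-", minus)]
        for strand, pattern in patterns:
            for match in to_regex(pattern).finditer(seq):
                hits.append((name, strand, match.start()))
    return sorted(hits, key=lambda hit: (hit[2], hit[0], hit[1]))


def fraction_with_element(urms: List[str], name: str, consensus: str) -> float:
    """Share of URMs carrying at least one hit for the element on either strand."""
    carriers = sum(1 for urm in urms if scan(urm, {name: consensus}))
    return carriers / len(urms)


if __name__ == "__main__":
    print(scan("CTTTGACAAAG", EXAMPLE_ELEMENTS))
    # [('DOF_core', '-', 0), ('W-box_core', '+', 3), ('DOF_core', '+', 7)]
```

Searching for the reverse complement of each element on the plus strand, rather than reversing the query, keeps every reported position in the same coordinate system as the URM itself.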
We then examined whether any information was available about TFs that bind to these cis-elements. To do so, we first analyzed the presence of transcription factor binding sites using the TFSEARCH tool. TFSEARCH searches a query sequence for fragments that correlate highly with TFMATRIX, a transcription factor binding site profile database derived from the TRANSFAC database [27]. Strings of the motifs listed in Figure 1 and Figure 2 were submitted as query sequences to the online TFSEARCH tool. This led to the identification of four transcription factors that could bind to these URMs. These were DOF [ … ]
Table 2. Known cis-elements corresponding to the motifs identified in Figure 1 and Figure 2.
American Journal of Molecular Biology
… across lower and higher plants, which possess multiple genes encoding Dof domain-containing proteins [57]. A Dof cDNA was first isolated from maize [58] [62]. In maize, Dof is involved in regulating genes of a specific carbon-metabolism pathway, where it regulates C4PEPC (C4 photosynthetic phosphoenolpyruvate carboxylase), cyPPDK (cytosolic pyruvate orthophosphate dikinase) and non-photosynthetic PEPC [45].
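TFSEARCH scores each window of a query against TRANSFAC-derived position weight matrices and reports hits above a score threshold. The following Python sketch shows a generic version of that idea. A count matrix is turned into log-odds weights, and each window's score is rescaled to 0-100 between the matrix's minimum and maximum possible scores. This is an illustration under stated assumptions, not TFSEARCH's exact scoring. The toy matrix, the pseudocount, the uniform background and the 85.0 cut-off are all hypothetical choices.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def log_odds(counts: List[Dict[str, float]], background: float = 0.25,
             pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Turn per-position base counts into log2-odds weights vs. a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((column[b] + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def scan_pwm(seq: str, matrix: List[Dict[str, float]],
             threshold: float = 85.0) -> List[Tuple[int, str, float]]:
    """Report windows whose score, rescaled to 0-100, reaches the threshold."""
    width = len(matrix)
    min_score = sum(min(col.values()) for col in matrix)
    max_score = sum(max(col.values()) for col in matrix)
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        raw = sum(matrix[j][base] for j, base in enumerate(window))
        relative = 100.0 * (raw - min_score) / (max_score - min_score)
        if relative >= threshold:
            hits.append((i, window, round(relative, 1)))
    return hits


# Hypothetical toy profile favouring AAAG (not a TRANSFAC matrix).
TOY_COUNTS = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
]

if __name__ == "__main__":
    matrix = log_odds(TOY_COUNTS)
    print(scan_pwm("CCAAAGTT", matrix))  # [(2, 'AAAG', 100.0)]
```

With this toy matrix, a single mismatch drops the rescaled score to 75, so at an 85-point cut-off only exact AAAG windows are reported. Real TRANSFAC profiles are less uniform, and the threshold then trades sensitivity against false positives.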
WRKY71 belongs to the WRKY family of transcription factors. WRKY proteins have been reported in organisms ranging from lower eukaryotes (protists) to ferns (pteridophytes) and higher plants [63]. Members of the WRKY family are identified by a conserved region of about 60 amino acid residues containing a zinc-finger motif. Promoters of genes carrying the W-box are potential targets of WRKY factors [55]. WRKY factors are key components of plant innate immunity and bind to the W-box of pathogenesis-related genes [55] [64]. They are also involved in seed and trichome development and in embryogenesis [63]. They can function as both activators and repressors through protein-protein interactions and autoregulation [65]. WRKY71 is expressed in the aleurone layer of rice, where it is reported to function as a repressor of the gibberellic acid (GA) signalling pathway in aleurone cells. The GA pathway is involved in plant growth and development [66].
The present analysis has identified certain cis-elements and TFs that could regulate tapetum-specific promoters. However, their roles need to be analysed experimentally. This can be done by a "loss-of-function" strategy, in which the cis-elements in a given URM are mutated and changes in promoter activity, if any, are analysed. In a second, "gain-of-function" strategy, a given TF can be ectopically expressed and its influence on the activity of a given URM recorded.
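For the loss-of-function strategy, designing the mutant URM is a small sequence-editing task. The Python sketch below assumes that every exact occurrence of a core element on either strand is to be disrupted by transversion (A↔C, G↔T). This is one common choice of substitution, but it is an assumption here. The sketch then checks that the element is really gone. Because short cores such as AAAG occur frequently in a 600 bp URM, in practice one would probably target selected instances rather than all of them.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")
TRANSVERSION = str.maketrans("ACGT", "CATG")  # A->C, C->A, G->T, T->G


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, core: str) -> List[int]:
    """0-based plus-strand starts of exact core matches on either strand."""
    starts = set()
    for pattern in {core, revcomp(core)}:
        i = seq.find(pattern)
        while i != -1:
            starts.add(i)
            i = seq.find(pattern, i + 1)  # step by 1 so overlapping sites are found
    return sorted(starts)


def mutate_sites(urm: str, core: str) -> str:
    """Transversion-mutate every base covered by a site; fail if any site survives."""
    urm, core = urm.upper(), core.upper()
    mutant = list(urm)
    for start in find_sites(urm, core):
        for j in range(start, start + len(core)):
            # Mutate from the original base, so overlapping sites are changed only once.
            mutant[j] = urm[j].translate(TRANSVERSION)
    result = "".join(mutant)
    if find_sites(result, core):
        raise ValueError("substitution recreated the element; choose another scheme")
    return result


if __name__ == "__main__":
    print(mutate_sites("CTTTGACAAAG", "AAAG"))  # AGGGGACCCCT
```

The final check matters: transversion turns the W-box core TGAC into GTCA, which is TGAC on the opposite strand, so `mutate_sites("CTTTGACAAAG", "TGAC")` raises an error and a different substitution would be needed.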