Plant Long ncRNAs: A New Frontier for Gene Regulatory Control

Long non-coding RNA (lncRNA) refers to a functional RNA molecule longer than 200 nt that is not translated into protein. Previously regarded as the dark matter of the genome, lncRNAs have gradually been recognized as crucial gene regulators. Although tremendous progress has been made in animals and humans, the study of lncRNAs in plants is still in its infancy. Here, we review the biogenesis and regulatory mechanisms of lncRNAs and summarize the progress made in plant lncRNA identification and functional characterization. Genome-wide identification has uncovered large numbers of lncRNAs in Arabidopsis, rice, maize and wheat, and more information from other plant species is expected with the aid of deep sequencing technologies. As in other species, lncRNA-mediated gene regulation is widespread in plants, even though only a few functionally characterized examples are available. To date, at least four distinct lncRNA-mediated regulatory mechanisms have been unraveled: target mimicry, transcriptional interference, PRC2-associated histone methylation and DNA methylation. lncRNAs may be involved in the regulation of flowering, male sterility, nutrient metabolism, and biotic and abiotic stress responses in plants.


Introduction
The classic central dogma describes a flow of genetic information from DNA to RNA to protein. In this view, RNA molecules are merely the messengers that pass information from DNA to protein, which ultimately determines cellular function and phenotype. With the discovery of non-coding RNAs in the past decades, the classic central dogma has been greatly extended to encompass the expanding roles of RNAs. A non-coding RNA (ncRNA) is a functional RNA molecule that is not translated into protein. Transcriptomic analyses by whole-genome tiling arrays or transcriptome sequencing have revealed that 70%-90% of the mammalian genome is transcriptionally active, but only 1%-2% codes for proteins, suggesting that a large proportion of mammalian RNAs are ncRNAs [1][2][3]. In the model plant Arabidopsis (Arabidopsis thaliana), less than 50% of the genome is capable of coding for proteins [4]. In addition to structural ncRNAs such as transfer RNAs, ribosomal RNAs, small nuclear RNAs and small nucleolar RNAs, some ncRNAs are believed to play regulatory roles in eukaryotes. Based on length, the regulatory ncRNAs can be further divided into small ncRNAs (sncRNAs, shorter than 200 nt) and long ncRNAs (lncRNAs, longer than 200 nt) [2,5,6]. Small regulatory RNAs, such as microRNAs and small interfering RNAs, have been extensively studied and are well known for their important roles in post-transcriptional and transcriptional regulation. However, the regulatory functions of lncRNAs, which account for 80% of the ncRNAs, remain largely unknown.

Biogenesis and Regulation Mechanism of lncRNA
The biogenesis of lncRNAs is very similar to that of protein-coding mRNAs and sncRNAs; most lncRNAs are transcribed by RNA polymerase II in eukaryotes. However, many novel lncRNAs have been found to be transcribed by RNA polymerase III, which was previously thought to transcribe only infrastructural RNAs such as tRNA and 5S RNA [7]. Large-scale complementary DNA (cDNA) sequencing projects such as FANTOM (Functional Annotation of Mammalian cDNA) have revealed that lncRNAs share many features with mRNAs, including 5' capping, splicing, and polyadenylation [1]. Most lncRNAs are localized in the nucleus, with a few exceptions that are found in cytosolic fractions [8,9]. lncRNAs can originate from intronic, exonic, intergenic, intragenic and promoter regions, 3' and 5' UTRs, and enhancer sequences, and can be transcribed in either the sense or the antisense direction [10]. Natural antisense transcripts (NATs) are the antisense transcripts of protein-coding transcripts. NATs are broadly grouped into two categories based on whether they act in cis or in trans [11]. The so-called cis-NATs are transcribed from the same loci as the sense transcripts and are therefore perfectly complementary to them. In contrast, trans-NATs are transcribed from different genomic loci and usually display only partial complementarity with the sense transcript. NATs are widespread in eukaryotes, with evidence showing that 5%-30% of transcriptional units in diverse eukaryotes have cis-NATs [12]. Conley et al. (2008) found that each protein-coding locus is associated with an average of 6 cis-NATs in human [13]. Some reports also show that NATs represent a significant portion of the transcriptome in plants [14][15][16]. Recent studies have implied important roles for NATs in the regulation of gene transcription. For example, an inherited form of human α-thalassemia is caused by the silencing of the hemoglobin α-2 gene through cis-NAT action [17].
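The definition of cis-NATs given above (transcripts arising from the same locus on opposite strands) can be illustrated with a minimal sketch. It assumes transcript annotations are already available as simple coordinate records; the `Transcript` class, the function names, the `min_overlap` parameter and the toy coordinates are hypothetical and are not taken from any of the cited studies.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Transcript:
    """Hypothetical annotation record (1-based, inclusive coordinates)."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlap_length(a: Transcript, b: Transcript) -> int:
    """Number of genomic positions shared by two transcripts (0 if none)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def find_cis_nat_pairs(
    transcripts: List[Transcript], min_overlap: int = 1
) -> List[Tuple[str, str, int]]:
    """Return pairs on opposite strands of the same chromosome that overlap.

    min_overlap is an illustrative threshold, not a published criterion.
    A simple all-against-all scan is used for clarity.
    """
    pairs = []
    for i, a in enumerate(transcripts):
        for b in transcripts[i + 1:]:
            shared = overlap_length(a, b)
            if a.strand != b.strand and shared >= min_overlap:
                pairs.append((a.name, b.name, shared))
    return pairs


# Toy annotations (invented for illustration only).
transcripts = [
    Transcript("geneA_sense", "chr1", 1000, 3000, "+"),
    Transcript("geneA_antisense", "chr1", 2500, 4000, "-"),
    Transcript("geneB_sense", "chr1", 5000, 6000, "+"),
    Transcript("geneC_antisense", "chr2", 1000, 3000, "-"),
]
pairs = find_cis_nat_pairs(transcripts)
print(pairs)
```

Trans-NATs cannot be found this way, because they are defined by sequence complementarity between transcripts from different loci rather than by positional overlap.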
Various mechanisms have been proposed for lncRNA-mediated regulation of gene transcription [18]. The regulatory mechanism is to a large extent determined by the genomic location from which the lncRNA is transcribed. Cis-NATs or exonic lncRNAs derived from protein-coding loci usually act through "transcriptional interference". Because the promoters of the target gene and the lncRNA are close to each other, transcription initiated from the two promoters may be co-regulated. This co-regulation may occur via competition for RNA polymerase II and other initiation factors, or via premature termination of the elongation complexes from the sense and antisense promoters due to their interaction with each other [19]. However, there is also a report showing that an lncRNA can enhance the accessibility of target genes to RNA polymerase rather than competing for it [20]. Some lncRNAs can bind to the promoter DNA of the target gene to form an RNA-dsDNA triplex, thus physically blocking the preinitiation complex from the target gene promoter [21]. Other lncRNAs may regulate target genes at the transcriptional level by controlling the subcellular localization of transcription factors or by inhibiting RNA polymerase activity [22][23][24].
In addition to regulation at the transcriptional level, lncRNAs also post-transcriptionally modulate various aspects of mRNA processing, including pre-mRNA alternative splicing, transport, translation, and degradation. For instance, translation of Zeb2 mRNA requires the retention of its long 5' UTR intron, which contains an internal ribosome entry site. A natural antisense transcript (NAT) that overlaps the 5' splice site of the intron can prevent the intron from being spliced out, thus allowing Zeb2 to be translated [25]. Besides alternative splicing, lncRNAs may also regulate target mRNA stability in a trans-acting manner. Similar to microRNAs, this requires complementary base pairing with the target mRNA so that a double-stranded RNA duplex can form; the duplex can then be processed into endo-siRNAs that degrade the target mRNA [26].
Recently, emerging evidence has shown that lncRNAs may play essential roles in the epigenetic control of gene expression, with examples including Tsix/RepA/Xist, HOTAIR and COLDAIR [27][28][29][30]. X-inactive specific transcript (Xist) RNA is one of the earliest examples of an lncRNA that regulates gene transcription by modifying chromatin status [30]. In female mammals, one of the two copies of the X chromosome is inactivated to maintain the same dosage of gene products as in males. Xist, an lncRNA of 17 kb in mouse or 19 kb in human, is a major effector of X chromosome silencing. Xist is expressed from the inactive X chromosome during early embryonic stem cell differentiation and coats the chromosome. Subsequently, the coating Xist recruits Polycomb repressive complex 1 (PRC1), which mediates histone H2A lysine 119 ubiquitination (H2AK119ub1), and PRC2, which mediates histone H3 lysine 27 trimethylation (H3K27me3), to enrich the repressive chromatin modifications. It is believed that RNA coating, together with histone modification and DNA methylation, builds the epigenetic memory for mitotically heritable X chromosome inactivation [18].

lncRNAs in Plants
Compared with human and animals, the transcriptomic identification of lncRNAs in plants is still in its infancy. Fortunately, dramatic progress has been made in the last few years with the aid of novel high-throughput sequencing techniques. By sequencing full-length cDNAs from various tissues and hybridizing RNA populations to whole genome arrays (WGA), Yamada et al. (2003) analyzed the transcriptional activity of the Arabidopsis genome [4]. About 7600 (30%) of the annotated genes in their study were found to have significant antisense RNA expression. Interestingly, the sense and antisense RNAs of many genes displayed tissue-specific expression patterns, indicating that these transcripts are biologically functional. In another study, 76 Arabidopsis ncRNAs were identified through in silico genome-wide analysis of full-length cDNA databases, including 5 siRNA precursors and 14 natural antisense transcripts of protein-coding genes [31]. According to their stress expression profiles, 22 lncRNAs were found to be related to abiotic stress responses. Ectopic expression of two lncRNAs affected Arabidopsis differentiation and growth responses to abiotic stresses. Very recently, deep sequencing of mRNA to assess sense and antisense transcripts under different abiotic stresses and normal conditions uncovered 3819 putative rice cis-NATs, of which 2292 are potential small RNA precursors and 503 may be related to abiotic stress responses. Deep sequencing data from isolated epidermal cells of rice seedlings, a special type of homogeneous cells, showed that 54.0% of cis-NATs were expressed simultaneously [32]. lncRNAs appear to be widespread in maize as well. Boerner and McGinnis (2012) attempted to identify, classify and localize potential lncRNAs in the maize genome using a computational pipeline [33]. In total, 1802 lncRNAs were identified from 18,668 full-length cDNA sequences over 200 bp in length, 60% of which are annotated as small RNA precursors. Cis-NATs account for 20% of the lncRNAs derived from genic regions. In wheat, Xin et al. (2011) identified 71 powdery mildew infection-responsive lncRNAs and 77 heat shock-responsive lncRNAs [34].
Despite the identification of lncRNAs in plant genomes, our understanding of their functions remains rather poor. So far, most of our knowledge of lncRNA regulation comes from studies of FLC, a master repressor of flowering in Arabidopsis. Vernalization is the process by which certain plants acquire the ability to flower or germinate in spring through exposure to the prolonged cold of winter. Prolonged cold treatment induces the expression of VIN3 (Vernalization insensitive 3), which can recruit the chromatin remodeling complex PRC2 to deposit the repressive H3K27me3 modification on the FLC locus, thus stably maintaining the FLC gene in a "shut down" state. Using a single-nucleotide resolution array covering both strands of FLC and 50 kb of adjacent sequence, Swiezewski et al. (2009) identified a series of cold-induced antisense transcripts covering the entire FLC locus and named them COOLAIR (cold induced antisense intragenic RNA) [35]. Like many lncRNAs in mammals, COOLAIR transcripts are capped, have poly(A) tails and are alternatively spliced. Because COOLAIR responds to cold even earlier than VIN3, the earliest known factor in the polycomb silencing mechanism, COOLAIR is believed to negatively regulate FLC sense transcription in a polycomb-independent manner. This conclusion is reinforced by the fact that COOLAIR is still transcribed in the absence of VIN3. However, the repressive effect of COOLAIR is reversible once plants are moved from cold back to warm conditions, suggesting that COOLAIR only transiently suppresses FLC and that the polycomb machinery is indispensable for establishing the epigenetic memory of FLC inactivation. Because COOLAIR and nascent FLC expression levels are negatively correlated, and reduced occupancy of RNA polymerase II at FLC was observed in the presence of COOLAIR, COOLAIR appears to suppress FLC through transcriptional interference. Thus, a model of FLC regulation during vernalization was proposed with the following possible path: COOLAIR induction during cold, transient suppression of FLC sense transcription, up-regulation of VIN3, recruitment of polycomb repressive complex 2 by VIN3, and establishment of the epigenetic memory of FLC inactivation (Figure 1(A)). Although COOLAIR appears to suppress FLC through transcriptional interference, in particular promoter interference, the possible involvement of post-transcriptional machinery still needs to be explored in future studies.
COLDAIR is another class of lncRNA derived from the FLC locus [27]. Unlike COOLAIR, COLDAIR is a sense transcript starting from the first intron of FLC. It is approximately 1100 bases long and has a 5' cap but no obvious poly(A) tail at the 3' end, features typical of lncRNAs transcribed by RNA polymerases V and IV. Nevertheless, COLDAIR is transcribed by RNA polymerase II, like many lncRNAs in mammals. COLDAIR expression is induced by cold exposure, peaks after 20 days of treatment, and returns to the pre-vernalization level after 30 days of cold. This expression pattern is very similar to that of COOLAIR, except that the peak comes 10 days later than that of COOLAIR [35]. COLDAIR knock-down plants showed late flowering after vernalization, which can be explained by the lack of H3K27me3 enrichment on FLC chromatin. More interestingly, an RNA binding assay showed that COLDAIR specifically interacts with CLF (Curly Leaf), a key component of the PRC2 complex for H3K27me3 modification. Hence, it is very likely that COLDAIR negatively modulates FLC via a polycomb-dependent mechanism. COLDAIR may play a role in recruiting PRC2 to FLC chromatin to trigger the establishment of the epigenetic memory of FLC silencing by vernalization (Figure 1(B)).
Plant lncRNAs have also been reported to regulate gene expression via a target mimicry mechanism [36]. The non-protein-coding gene IPS1 (Induced By Phosphate Starvation1) from Arabidopsis thaliana is highly induced by phosphate starvation. Accumulation of IPS1 can reduce the phosphate content in plant shoots by up-regulating the expression of PHO2, a negative regulator of phosphate uptake. Interestingly, miR399, which specifically targets PHO2 mRNA for degradation, was found to be partially complementary to a conserved 23-nt motif of IPS1. The imperfect complementarity allows IPS1 to form an RNA duplex with miR399 without being cleaved by it, thus sequestering miR399 away from its normal function in phosphate translocation. Meanwhile, a modified, miR399-cleavable IPS1 did not show any inhibitory activity on miR399. Therefore, it seems that the uncleavable IPS1 competes with PHO2 for miR399 to keep a balance of functional miR399 in phosphate absorption (Figure 1(C)).
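The target mimicry logic described above can be sketched as a toy pairing check. Everything in this sketch is an assumption made for illustration: the 21-nt sequence is invented (it is not miR399 or the IPS1 motif), only Watson-Crick pairs are counted (G:U wobble pairs are ignored), the imperfect complementarity is modeled as plain mismatches in an ungapped alignment, and the cleavability rule relies on the general observation, not stated in this review, that miRNA-guided cleavage occurs opposite positions 10 and 11 of the miRNA.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the perfectly complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def paired_positions(mirna: str, target_site: str) -> List[bool]:
    """For each miRNA position (5'->3'), is it Watson-Crick paired?

    Both sequences are written 5'->3'; the duplex is antiparallel, so the
    target site is read in reverse. Ungapped alignment only.
    """
    if len(mirna) != len(target_site):
        raise ValueError("this toy model needs equal-length sequences")
    return [COMPLEMENT[m] == t for m, t in zip(mirna, reversed(target_site))]


def is_cleavable(
    mirna: str, target_site: str, cleavage_positions: Tuple[int, int] = (10, 11)
) -> bool:
    """Toy rule: the site is cleavable only if the miRNA positions flanking
    the cleavage point are paired."""
    paired = paired_positions(mirna, target_site)
    return all(paired[p - 1] for p in cleavage_positions)


def make_mimic(mirna: str, positions: Tuple[int, int] = (10, 11)) -> str:
    """Build a decoy site that pairs everywhere except at `positions`."""
    site = list(reverse_complement(mirna))
    for p in positions:
        # A base never Watson-Crick pairs with an identical base.
        site[len(mirna) - p] = mirna[p - 1]
    return "".join(site)


toy_mirna = "UGACCUAGGAUCAGUCGGAUC"            # invented 21-nt sequence
perfect_site = reverse_complement(toy_mirna)  # stands in for a cleavable target
mimic_site = make_mimic(toy_mirna)            # stands in for an uncleavable decoy

print(is_cleavable(toy_mirna, perfect_site))        # cleavable target
print(is_cleavable(toy_mirna, mimic_site))          # decoy escapes cleavage
print(sum(paired_positions(toy_mirna, mimic_site)))  # decoy still pairs widely
```

In this toy model the decoy keeps most of its base pairs, so it can still bind the miRNA, but it loses pairing at the cleavage site, which mirrors the contrast between the natural IPS1 and the engineered cleavable version described above.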
In rice, the lncRNA LDMAR was cloned as a regulator of PSMS (photo-sensitive male sterility). Nongken 58S (NK58S), derived from the elite japonica rice variety Nongken 58N (NK58N), is a spontaneous mutant exhibiting PSMS, i.e. its pollen becomes completely sterile when grown under long-day conditions, whereas the pollen is viable under short-day conditions. Using a positional cloning strategy, Ding et al. (2011) successfully cloned the LDMAR gene, in which a C-to-G mutation causes PSMS in NK58S [37]. LDMAR is 1236 bases in length and non-protein-coding, and is thus an lncRNA. Pollen fertility under long-day conditions requires a high dosage of LDMAR in both NK58N and NK58S. However, the transcript abundance of LDMAR in NK58S is lower than in NK58N under long days, possibly because the DNA in the promoter region of LDMAR is more heavily methylated in NK58S than in NK58N. Based on these findings, the authors proposed that the C-to-G mutation altered the secondary structure of LDMAR in NK58S, that the structural alteration brought about DNA methylation in the promoter region, which suppressed LDMAR expression, and that the insufficient LDMAR eventually led to the sterility of NK58S under long days (Figure 1(D)). Later, Psi-LDMAR, an siRNA derived from the sense strand of the LDMAR promoter region, was also found to be responsible for the regulation of DNA methylation in this region [38]. Closely following the work of Ding et al. (2011), another group narrowed the essential sequence for PSMS in the LDMAR locus to a 136-nt small RNA [39]. Nevertheless, the data presented so far have not excluded the possibility that either the 1236-nt lncRNA or the 136-nt small RNA, or both, is the functional form of LDMAR [40].
In addition to the lncRNAs reviewed above, a few other plant lncRNAs have also been reported, including Enod40, which is related to soil bacteria-plant interactions in Medicago and soybean [41,42], and IPS1, which is related to phosphate metabolism in tomato, Medicago and rice [43][44][45] (Table 1).

Epigenetic Regulation of Genes by Polycomb Associated LncRNAs
Polycomb-group (PcG) proteins are a family of proteins that regulate the expression of target genes by remodeling the structure of the chromatin associated with those genes. PcG proteins function by forming three principal types of heteromultimeric complexes: polycomb repressive complex 1 (PRC1), polycomb repressive complex 2 (PRC2) and PhoRC. The PRC2 complex is largely conserved among eukaryotes and has been extensively studied [46]. It has been demonstrated that PRC2 catalyzes trimethylation of lysine 27 of histone H3 (H3K27me3) to suppress the expression of chromatin-associated genes, while PRC1 represses gene expression by catalyzing the monoubiquitination of histone H2A at lysine 119 (H2AK119ub).
It should be noted that PRC2 is recruited by lncRNAs in several reported cases, implying that the lncRNA-PRC2 gene silencing mechanism is widespread across species [47]. So far, the question of how PRC2 is recruited to the right chromatin region has not been fully addressed. Tsai et al. (2010) suggested that lncRNAs may serve as scaffolds by providing binding surfaces for polycomb complex assembly, thereby specifying the pattern of histone modifications on target genes in a temporally and spatially specific manner [29].
Genome-wide identification of lncRNAs associated with polycomb complexes, especially PRC2, will be extremely helpful for understanding lncRNA-mediated epigenetic regulation and the molecular functions of lncRNAs. Some research has already been done in this field. Khalil et al. (2009) sought to identify human PRC2-associated lncRNAs by RIP (RNA co-immunoprecipitation) followed by hybridization to tiling arrays [48]. They found that around 20% of lncRNAs were bound by PRC2, and that additional lncRNAs were bound by other chromatin-modifying complexes. Using RNA immunoprecipitation sequencing (RIP-Seq), over 9000 PRC2-interacting RNAs, half of which are non-coding RNAs, were identified in mouse [49]. It is very likely that Ezh2 mediates the interaction between these RNAs and PRC2. Using a novel method, Chromatin Isolation by RNA Purification (ChIRP), Chu et al. (2011) revealed focal and sequence-specific distribution patterns of the genomic occupancy sites of three lncRNAs in the Drosophila and human genomes [50]. Drosophila roX2 RNA favors male X-linked gene regions, human telomerase RNA TERC is distributed mainly on telomeres, and HOTAIR lncRNA preferentially occupies a GA-rich DNA motif. These characteristic distribution patterns suggest that lncRNAs regulate chromatin status in a sequence-specific manner. Unfortunately, to the best of our knowledge, no work on the identification of PRC2-associated lncRNAs in plants has been reported so far.

Conclusions and Perspectives
Although the study of plant lncRNAs started much later than that of human and animal lncRNAs, tremendous progress has been made in the past few years. With the aid of new, affordable deep sequencing technologies and bioinformatics, we can anticipate an explosion of knowledge on plant lncRNA identification in the coming years. Currently, we still rely on the bioinformatic prediction of open reading frames (ORFs) to determine whether an RNA is coding or non-coding. It is extremely complicated to clearly distinguish coding RNAs from non-coding RNAs, since a coding RNA can be short with a small ORF, while a non-coding RNA derived from a genic region may contain a long ORF with homology to known proteins [51]. Therefore, proteomic identification of plant proteins will be necessary, at the least to exclude coding RNAs from the transcriptome. Meanwhile, how to remove transcriptional noise from the transcriptome remains a great challenge in lncRNA identification.
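The ORF-based prediction mentioned above can be sketched as follows. This is a minimal illustration rather than any published pipeline: the function names are hypothetical, only the forward strand is scanned, the 200 nt length threshold follows the lncRNA definition used in this review, and the 300 nt (100 codon) ORF cutoff is an assumed, commonly used heuristic that is not taken from the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length in nt (start codon through stop codon) of the longest
    ATG-initiated ORF in the three forward reading frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == "ATG":
                orf_start = i            # first in-frame start codon
            elif orf_start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - orf_start)
                orf_start = None         # look for the next ORF in this frame
    return best


def looks_noncoding(seq: str, min_length: int = 200, max_orf_nt: int = 300) -> bool:
    """Heuristic lncRNA candidate filter (thresholds are assumptions)."""
    return len(seq) > min_length and longest_orf_length(seq) <= max_orf_nt


# Toy sequence: a 2-nt leader, ATG, four GCT codons, a TAA stop, 2-nt trailer.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
print(longest_orf_length(toy))
print(looks_noncoding("A" * 250), looks_noncoding("A" * 100))
```

As the paragraph above notes, such a filter misclassifies both short coding RNAs with small ORFs and non-coding RNAs that happen to carry long ORFs, which is why additional evidence such as proteomic data is needed.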
With only a few lncRNAs functionally characterized so far, the function of lncRNAs in plants is largely a mystery. The emerging roles of lncRNAs in human and animal as well as plant development indicate that lncRNA-mediated gene regulation is widespread and conserved among eukaryotes. Thus, the knowledge gained from other organisms may be applied to plants. For example, since some cell- or tissue-specific expression patterns are epigenetically controlled by lncRNAs through a PRC2 recruitment mechanism, it is rational to speculate that some plant developmental processes are modulated via the same mechanism. In addition, new techniques such as RIP-Seq and ChIRP, which have been successfully used to identify lncRNA-interacting DNA, RNA and proteins in other species, can be applied to plants to facilitate the functional characterization of plant lncRNAs.

Figure 1. Schematic representation of four types of lncRNA regulation mechanism in plants. (A) COOLAIR regulates FLC in a transcription interference model. Top: FLC is transcribed by RNA polymerase II before prolonged cold exposure; Middle: early cold treatment induces the expression of COOLAIR, which interferes with the binding of RNA polymerase II to the FLC promoter and thus transiently suppresses FLC expression; Bottom: after longer cold exposure, VIN3 recruits the PRC2 complex to deposit the H3K27me3 modification on the FLC locus; (B) COLDAIR regulates FLC in a PRC2-associated histone modification model. Top: COLDAIR is induced by cold treatment; Middle: COLDAIR recruits the PRC2 complex to the FLC locus; Bottom: the PRC2 complex deposits H3K27me3 on the FLC locus; (C) IPS1 regulates PHO2 in a target mimicry model. Top: under normal growth conditions, miR399 specifically binds to PHO2 mRNA and degrades it; Bottom: IPS1 competitively binds miR399 to block its degradation of PHO2; (D) LDMAR regulates its own transcription in a DNA methylation model. In NK58N, LDMAR is normally expressed. In NK58S, the C-to-G mutation alters the secondary structure of LDMAR and leads to promoter DNA methylation, which reduces LDMAR expression and is responsible for PSMS in NK58S.