Genetic variation may play a crucial role in non-coding RNA biogenesis

Transcription, post-transcriptional modification, translation, post-translational modification, DNA replication, and signaling interactions among intra- and extracellular components are the relevant mechanisms of gene regulation. Transcription is one of the most important mechanisms in the control of gene expression. After transcription, post-transcriptional modifications play a crucial role in determining whether a transcript functions as a coding RNA or a non-coding RNA (ncRNA). Genome-wide analyses of RNA provide information about coding RNAs, whereas the status of ncRNAs remains largely unresolved and must be discussed in detail, because variation in ncRNAs can lead to different phenotypes. In this short article, we discuss genetic variation in ncRNA genes and how this variation may play a crucial role in ncRNA biogenesis, eventually leading to phenotypic variation and thus speciation.


DARK MATTERS IN GENE EXPRESSION
Our understanding of the principal mechanisms that orchestrate the central dogma of life is limited by the lack of appropriate physical approaches [1,2], i.e. accurate and advanced experimental facilities for characterizing genetic variation, and by limitations in the bioinformatics tools that predict DNA/RNA secondary structure, interactions, and control elements, among others. Because of these limitations, the exact mechanisms underlying transcription and translation remain obscure. Transcription is a complex and tightly regulated step in gene expression in which genes can be turned on or off in response to internal or external signals [3]. Post-transcriptional modifications of the transcripts designate them as coding (protein-coding) RNAs or non-coding RNAs (ncRNAs) [4][5][6]. Extensive research on RNA has demonstrated its significant role in gene expression [7,8]. Recently, a pilot study reported the properties of loss-of-function (LoF) variants in human protein-coding genes [9], showing that each human individual carries ~100 LoF variants, with ~20 genes completely lost. Such loci can be described as "dark matter", because intronic sequences hide crucial information that affects gene expression. There is evidence that variation in the exonic or intronic regions of DNA can affect the structure and function of the encoded mRNAs or ncRNAs, and thus the cellular signaling pathways that control, for example, the birth and death of cells. It is therefore important to discuss genetic variation not only in coding RNAs but also in ncRNAs. Although ncRNAs are important in the control of gene expression, it is unclear how they originate. Below, we discuss how genetic variation creates variation in ncRNAs during biogenesis and how it affects ncRNA structure.

ROLE OF GENETIC VARIATION IN NCRNA BIOGENESIS
Each individual differs in gene expression patterns from other members of its species, and expression patterns also differ between species [10]. Using a Markov model, one recent report suggests that humans and chimpanzees speciated 4.1 million years ago. Analysis of the complete human genome showed that <2% of the sequence has protein-coding capacity [11]. This implies that the remaining ~98% of the genome, which includes the introns of coding genes as well as ncRNA-producing loci, might show expression variation in any given environment. Such variation could have two major consequences: 1) it could affect the structure of the encoded protein (generating LoF variant proteins, as shown by MacArthur et al.), and 2) it could cause ncRNAs to mis-target by relaxing the consistency of base pairing [12]. Such mispairing would likely affect the function of particular RNAs, and could also disturb cellular signaling pathways controlling gene expression, since ncRNAs regulate the expression of many genes [13]. It would be valuable if genome sequencing studies also reported LoF variants in intronic sequences, as this would help us understand whether the predicted LoF variants arise from intronic variation. Recent evidence also suggests that certain sequences can generate ncRNAs with different functions (possibly through non-canonical ncRNA biogenesis pathways [14]) that are important for cellular functions. These include sno-microRNA [15] and pi-sno-microRNA (x-ncRNA) [16,17], which are evolutionarily conserved. This raises an intriguing question: how does a particular gene sequence produce structurally and functionally different ncRNAs (see Figure 1)? The answer is currently unknown, and approaches to address it are an active area of investigation. Whether single-nucleotide polymorphism (SNP) variants of ncRNAs play crucial roles also remains to be clarified. The primary ncRNA sequences are transcribed from DNA [2], but it is not known how deletions, insertions, repeats, and other LoF variation in the DNA sequence of ncRNAs affect the structure of the primary transcript [18]. Genetic variation may thus play a critical role in the LoF not only of mRNAs but also of ncRNAs. Although experimental evidence indicates that several proteins (e.g. Dicer, RISC) are involved in producing a particular ncRNA from its precursor [19,20], this evidence does not elucidate the secondary or tertiary structure of the primary transcripts. This is because tertiary RNA structure forms stochastically, and the factors that govern it, including the free-energy landscape of folding (Gibbs free energy) and the chelation of divalent cations or interactions with multiply charged ligands, remain poorly understood [21,22]. Analyses of LoF variants of coding genes confirm that sequence changes in RNA alter its structure and function. If the RNA folding models above hold, this hints that ncRNAs might be more susceptible to structural and functional changes caused by sequence changes [23]. It is therefore also important to characterize ncRNA variants under the same environmental conditions in human genome data, since doing so might offer a way to predict the biogenesis pathways of ncRNA families. For example, in cystic fibrosis, LoF variants of non-coding genes have deleterious effects [24]. Since the human genome accumulates mutations over time and transmits them to offspring, a detailed understanding of non-coding sequence mutations, alongside the genetic variation of coding sequences, might help us overcome deleterious effects [25] and allow us to observe ongoing genetic evolution in the present era.
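The dependence of RNA structure on sequence discussed in this section can be illustrated with a small sketch. It uses the classic Nussinov base-pair maximization algorithm, which is a deliberately simplified stand-in for the free-energy-based folding mentioned above: it ignores stacking energies, loop penalties, and metal ions, and simply counts compatible base pairs (Watson-Crick plus G-U wobble). The function name `nussinov`, the assumed minimum hairpin loop of 3 nucleotides, and the two toy sequences are illustrative assumptions, not data from this article. Structures are reported in dot-bracket notation.

```python
# Hedged sketch: Nussinov base-pair maximization, a simplified stand-in for
# free-energy RNA folding. It ignores stacking, loop energies and metal ions.

# Allowed pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases closing a hairpin


def nussinov(seq: str) -> tuple[int, str]:
    """Return (maximum number of base pairs, one optimal dot-bracket structure)."""
    n = len(seq)
    # dp[i][j] = maximum number of nested pairs within seq[i..j] (inclusive)
    dp = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - MIN_LOOP):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best

    structure = ["."] * n

    def trace(i: int, j: int) -> None:
        """Recover one structure that achieves dp[i][j]."""
        if j - i <= MIN_LOOP:
            return  # too short to contain any pair
        if dp[i][j] == dp[i][j - 1]:
            trace(i, j - 1)  # base j unpaired
            return
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + dp[k + 1][j - 1] + 1:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        trace(i, k - 1)
                    trace(k + 1, j - 1)
                    return

    if n:
        trace(0, n - 1)
    return (dp[0][n - 1] if n else 0), "".join(structure)


wild_type = "GGGGAAAACCCC"  # hypothetical hairpin: 4-bp G-C stem, AAAA loop
variant = "GGAGAAAACCCC"    # same hairpin with a hypothetical G>A SNP at position 3

print(nussinov(wild_type))  # (4, '((((....))))')
print(nussinov(variant))    # (3, '((.(....))).')
```

On these toy hairpins, a single G-to-A substitution in the stem reduces the maximum number of pairs from 4 to 3 and shifts the predicted structure, a crude analogue of the SNP-induced structural variation depicted in Figure 1. Realistic predictions would require an energy-based folding tool rather than this pair-counting sketch.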

THEORIES OF GENETIC VARIATION, LOF AND SPECIATION
Theories of genetic variation have been discussed extensively for several decades. Recently, a report reviewed several proposed theories of genetic variation [26]: 1) the "less is more" hypothesis, describing advantageous effects of LoF variants; 2) the "less is less" hypothesis, describing deleterious effects of LoF variants; and 3) the "less is nothing" hypothesis, describing tolerated LoF variants. These theories describe the LoF of particular characteristics based on comparisons between populations or on tracking an individual trait. The adaptation of a sub-population can lead to the evolution of a new species: adapted characteristics are transferred to offspring and undergo natural selection, which preserves favorable traits, allowing the sub-population to survive and possibly to undergo speciation. Charles Darwin proposed that "each slight variation, if useful, is preserved", which determines how a sub-population adapts and evolves into a new species through natural selection [27].
A systematic survey of LoF variant genes using clinical sequencing data reports the status of individuals affected by a particular disease and infers that these individuals experience the deleterious effects of LoF variants in that particular environment [28]. When discussing the deleterious effects of genetic variation, more information is needed. For example, it is not known whether these effects may become advantageous to the offspring if transmitted. Further, the position of a mutation within an intronic sequence might play a significant role in certain diseases. Most genome-wide datasets lack intronic single-nucleotide polymorphisms (SNPs), although several reports provide evidence that SNPs can affect the structure and function of ncRNAs and can thereby cause diseases such as cancer [29]. Providing these data would substantially improve our understanding of how the "less is less" hypothesis plays a role in the selection of cells in a population by coordinating gene expression patterns via coding RNAs and ncRNAs.
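As a toy illustration of how a SNP might relax base pairing between a small ncRNA and its target, the sketch below counts Watson-Crick matches between a microRNA seed and a 7-nucleotide target site. It assumes the common convention that the seed spans nucleotides 2-8 of the microRNA and that the site is written 5' to 3' on the target strand, pairing antiparallel with the seed; G-U wobble and pairing outside the seed are ignored. The sequences and the function names `seed` and `seed_matches` are hypothetical and chosen only for illustration.

```python
# Hedged sketch: seed complementarity between a microRNA and a target site.
# Assumptions: seed = nucleotides 2-8 of the microRNA (1-based); only strict
# Watson-Crick pairs count; G-U wobble and non-seed pairing are ignored.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed(mirna: str) -> str:
    """Return nucleotides 2-8 (1-based) of a microRNA, read 5'->3'."""
    return mirna[1:8]


def seed_matches(mirna: str, site: str) -> int:
    """Count Watson-Crick pairs between the seed and a 7-nt target site.

    The site is given 5'->3' on the target strand; because the duplex is
    antiparallel, seed nucleotide 2 pairs with the last base of the site.
    """
    s = seed(mirna)
    if len(site) != len(s):
        raise ValueError("site must be exactly as long as the seed (7 nt)")
    return sum(COMPLEMENT[a] == b for a, b in zip(s, reversed(site)))


mirna_ref = "UCAGGAUCAUGCAUAGCAUCGA"  # hypothetical 22-nt microRNA
mirna_snp = "UCAAGAUCAUGCAUAGCAUCGA"  # same sequence with a G>A SNP at nucleotide 4
site = "GAUCCUG"                      # hypothetical fully complementary target site

print(seed_matches(mirna_ref, site))  # 7
print(seed_matches(mirna_snp, site))  # 6
```

Here the single substitution leaves 6 of 7 seed pairs intact. Real target-prediction methods also weigh site type, sequence context, and pairing energetics, none of which this count captures.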
LoF variants in ncRNAs may play a significant role in the biogenesis of a particular type of ncRNA and hence lead to the production of multiple types of ncRNA from the same sequence. LoF variants in coding RNAs and ncRNAs may thus combinatorially control gene expression in cells in a given environment, supporting their survival. In human genome data, understanding the function of LoF variants in ncRNAs might therefore help us explore the actual state of the gene expression system in a particular environment. This will allow us to study how the "less is less" phenomenon (the role of deleterious effects in speciation) occurs in natural systems.

Figure 1. SNP- and metal-induced RNA structural variations produce different types of non-coding RNAs from the same locus.