Evolutionarily conserved features of the retained intron in alternative transcripts of the nxf1 (nuclear export factor) genes in different organisms

One of the features of intron-containing genes of the nxf (nuclear export factor) family in different organisms is the presence of an evolutionarily conserved exon-intron block: exon 110 nt-intron-exon 37 nt. The intron in this block, which we call a “cassette” intron, can be excised or retained in alternative transcripts of nxf1. It corresponds to intron 10-11 in the genes that are orthologous to nxf1 in vertebrates, and to intron 5-6 in the genes that are orthologous to nxf1 in Drosophilidae. The alignment of cassette intron sequences of nxf1 genes in vertebrates has revealed four evolutionarily conserved sequences: 1) the 5’ flanking sequence, 2) a region containing the CTE (constitutive transport element), 3) a third conserved sequence, and 4) the 3’ flanking sequence. Introns 5-6 of nxf1 in Drosophilidae have no similar conserved sequences. Instead, sequence alignment shows that cassette introns of nxf1 in Drosophilidae share two poly(A) sequences. The prevalence of Dm nxf1 transcripts containing cassette intron 5-6 over completely spliced transcripts in the heads of adult Drosophila melanogaster suggests a functional importance of transcripts that contain a retained intron. Evolutionary conservation, which in Drosophilidae is evident in the presence of poly(A) sequences in cassette introns of the nxf1 genes, is an adaptive feature: the poly(A) sequences are capable of mimicking the 3’-end of transcripts, promoting transport from the nucleus to the cytoplasm, or participating in NMD control. The ability to form characteristic secondary structures is a common feature of nxf1 cassette introns.


INTRODUCTION

The nxf Gene Family and Functions of the nxf1 Gene
The nxf (nuclear export factor) gene family was named after the function of the universal gene nxf1, which is responsible for the nuclear-cytoplasmic transport of most mRNAs [1][2][3]. This path of mRNA export is RanGTP-independent [4]. Several proteins involved in mRNA export pathways are less conserved and presumably appeared later in the evolution of eukaryotes [5]. The blocking of the transport path enabled by the NXF1 protein results in the accumulation of polyadenylated RNAs within the nucleus [4,6,7]. Genes of the nxf family have been found in eukaryotic organisms of the Opisthokonta group, and are characterized by evolutionary conservation [1,5,8]. Genomes of various fungi have only one gene, Mex67, which belongs to this family; animals usually have two to five paralogous genes (see Table 1) [1,9,10,11]. Plants and some protozoa lack genes of the nxf family [5]. The nxf1 genes are the most evolutionarily conserved in the nxf family. What is even more interesting is that nxf1 genes across different species of mammals exhibit a much greater degree of similarity to one another than Mm nxf1 and Mm nxf2, or Hs nxf1 and Hs nxf2, exhibit between themselves (Figure 1).
In the yeast S. cerevisiae, the factor Mex67 (mRNA EXport factor of 67 kDa), which is orthologous to the NXF1 proteins of other eukaryotes of the Opisthokonta group, is also involved in the nuclear-cytoplasmic transport of ribosomal RNAs [5,12,13].
Initially, the NXF1 protein in humans was identified as a potential cytoplasmic cofactor for Tip (tyrosine kinase interacting protein) encoded by the herpesvirus saimiri, and was named TAP (Tip-associated protein) [14]. Later, TAP was demonstrated to be involved in the nuclear-cytoplasmic transport of the unspliced or partially spliced RNA of retroviruses. TAP directly recognizes only one sequence, the CTE (constitutive transport element), which was initially discovered in RNAs of retroviruses [15][16][17][18]. In metazoans, adaptor proteins mediate the interaction between NXF1 and cellular mRNAs [19][20][21]. Nuclear mRNA export is connected with transcription, splicing, processing, and mRNA quality control [12].

Modular Principle of Organization of NXF Factors
Proteins of the NXF family have a modular domain organization consisting of an RNA-binding domain (RBD), four leucine-rich repeats (LRRs), a domain exhibiting similarity to nuclear transport factor 2 (NTF2-like domain), and a C-terminal ubiquitin-associated (UBA)-like domain (Figure 2) [22,23]. These proteins are considered RNA transport receptors because they combine receptor and transport functions. The N-terminus of the protein is predominantly responsible for interacting with mRNA, while the C-terminus enables transport of the RNP complex through nuclear pores by interacting with the partner protein p15 and with nucleoporins (proteins of nuclear pore complexes) [24,25]. The RBD belongs to the RRM (RNA recognition motif) family, which has a distinctive βαββαβ structure characteristic of many RNA-binding proteins, such as PAB (poly(A)-binding protein), Sex-lethal protein (Sxl), and the U1A and U2B″ splicing factors [16]. The receptor function is much older: RRMs have been discovered in proteins of prokaryotes [26]. While the RRM domain alone is sufficient for binding mRNA in vitro, both the RRM and the LRRs are necessary for exporting mRNA in vivo [16,23]. The C-terminal end of NXF proteins is represented by the NTF2-like and UBA-like domains. The NTF2-like domain is thought to be derived from a prokaryotic precursor [27]. The C-terminal end in proteins paralogous to NXF1 often differs from the corresponding part of NXF1 proteins. Consequently, many paralogous proteins, such as Dm NXF3, Dm NXF4, Ce NXF2, Hs NXF3, Hs NXF5, Mm NXF3, and Mm NXF7, are incapable of binding nucleoporins [1,8,11,28,29,30]. In most cases this is due to the loss of the UBA-like domain in paralogous proteins, or the absence of both domains (as in the case of the Dm NXF4 protein) [1,7,31].
The unification of the transport and receptor functions probably took place simultaneously with the origin of the eukaryotic cell. In eukaryotes, transcription and mRNA processing are physically and temporally separated from translation by the nuclear membrane, which required the formation of mechanisms for the active transport of macromolecules, including mRNAs, from the nucleus to the cytoplasm. Most transcripts begin translation soon after they exit the nucleus. The specific class of transcripts capable of staying for extended periods of time in the cytoplasm in a state unavailable for translation presumably required specialization of transport receptors. There are paralogous genes that, unlike the universal gene nxf1, usually exhibit an organ-specific character of expression in mammals [1,8]. The products of paralogous genes are usually localized in the cytoplasm [8,11,28,30,32].
Several processes of transformation of genetic material underlie the formation of gene families: duplication, separation of certain genes, or fusion of different parts of genes [33][34][35][36]. The existence of clusters of linked genes in different organisms seems to support the hypothesis that the nxf gene family was formed via duplication [1,8].
Integration of cDNA copies of transcripts into a genome may also produce gene copies that contain no introns [36][37][38]. Perhaps this mechanism eventually produced genes Dm nxf3 and Dm nxf4, which contain no introns in D. melanogaster. It is important to note, however, that the lack of introns in genes that are orthologous to Dm nxf4 is not characteristic of all species of the genus Drosophila for which it has been described (FlyBase, 2012).
The LRR and NTF2-like domains, which are responsible for the interaction of NXFs with various partner proteins, are the most highly conserved [39].
Most genes of eukaryotes have introns that are removed from pre-mRNA during splicing. As a rule, cellular mRNAs with introns do not leave the nucleus [16,46]. Because of the presence of a premature termination codon, translation of incompletely spliced mRNA may produce truncated proteins, which are potentially deleterious to a cell. In the nucleus, transcripts undergo quality control, and normally only completely spliced mRNAs are exported [47,48].
Genes that correspond to transcripts with intron retention are not uncommon in humans [49]. The expression of intron-containing messages has been shown to occur in a variety of diseases including several cancers [50], and also as a response to vascular injury in rats [51]. Transcripts with retained introns may serve as sources of alternative protein products with an independent function that may, among other things, influence the function of a full-length product [52].
If mRNAs with retained introns are abundant in the cytoplasm, intron retention is probably regulated by factors involved in both splicing and mRNA export. Some retroviruses have been shown to export unspliced RNA by means of cis-acting RNA elements, termed constitutive transport elements (CTEs), which interact directly with cellular export proteins [53]. In simple retroviruses, export of mRNA with retained introns is carried out with the help of NXF1 (TAP), which binds directly to the CTE sequence in the retroviral genome [15,16,17,18,24,53,54]. Microinjection experiments performed on Xenopus oocytes have demonstrated that TAP (Hs NXF1) directly interacts with the CTE, allowing the export of CTE-containing RNAs [15]. Moreover, TAP remains bound to CTE-containing RNAs in polyribosomes and may be present inside the nucleus or at the nuclear rim, as well as in the cytoplasm [55].

Alternative Intron-Retaining Transcripts of nxf1 Genes
Both the domain structure of the NXF family proteins and the intron-exon structure of their genes are evolutionarily conserved. Among the known nxf genes, most have an intron-exon structure. Mex67 in fungi, nxf4 in some species of the genus Drosophila, and Dm nxf3 in D. melanogaster have no introns. A feature specific to nxf1 genes is the existence of alternative transcripts with a retained intron flanked by evolutionarily conserved exons: 110 nt upstream and 37 nt downstream of the respective intron. We named this intron from the evolutionarily conserved block, which can be excised or retained in alternative transcripts of nxf1, a "cassette" intron. Such an evolutionarily conserved exon-intron block is characteristic of most of the nxf family genes that exhibit an intron-exon structure. There are also nxf genes in which the sequences of the 110 nt and 37 nt exons are not separated by an intron and are represented by a single 147 nt exon: Dm nxf2 and Ce nxf2, for example. Transcripts with a cassette intron between the 110 nt and 37 nt exons have been demonstrated for nxf1 genes in M. musculus [8], H. sapiens [56], and D. melanogaster [57]. Such nxf1 transcripts have been found in other species as well: their ESTs (expressed sequence tags) include a portion of one of the aforementioned exons and a portion of the cassette intron (GenBank, UCSC) (see Table 1).
We have demonstrated that, in the head tissues of adult fruit flies, the transcript with cassette intron 5-6 exceeds the universal, completely spliced transcript of the gene Dm nxf1 in relative content (Figure 3) [57]. Usually, intron-containing transcripts stay in the nucleus [46,48]. Even if they enter the cytoplasm, they are subject to the cytoplasmic mRNA quality control (surveillance) mechanism termed NMD (nonsense-mediated mRNA decay) [58,59]. The large number of transcripts of nxf1 genes with a retained intron in different organisms raises the following questions:
1) How do intron-containing transcripts manage to exit the nucleus, bypassing the mRNA quality control mechanism?
2) What makes the intron-containing transcripts immune to NMD in the cytoplasm?
3) What evolutionarily conserved features characterize cassette introns in alternative intron-containing transcripts of nxf1 genes?
Analyzing specific features of the intron retained in alternative transcripts of nxf1 genes may help answer these questions. A sequence about 100 nt long that resembles the CTE of the MPMV virus has been observed in cassette introns 10-11 of the genes Hs nxf1 and Mm nxf1 [56]. This sequence is highly conserved in cassette introns of nxf1 in various animals (Figure 4). Hs nxf1 mRNA with the retained intron is exported from the nucleus and is present in the polysomal fraction [56]. Due to the presence of a premature stop codon at the beginning of cassette intron 10-11, translation results in a truncated protein. A similar truncated protein has been discovered in human cells [56]. The CTE sequence is an important element affecting the expression of intron-retaining mRNAs in mammals [60,61]. The CTE has not been discovered in the corresponding introns of paralogous genes of the nxf family in mice and humans.
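The idea of a premature stop codon appearing once the reading frame runs from an exon into a retained intron can be illustrated with a minimal Python sketch. The function name, the toy sequences, and the assumption that the reading frame is known are all hypothetical and introduced here for illustration only; they are not taken from the nxf1 sequences discussed above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_in_frame_stop(seq, frame=0):
    """Return the 0-based index of the first in-frame stop codon, or None.

    `frame` is the index of the first nucleotide of the first complete codon.
    RNA input (U instead of T) is accepted and converted to DNA letters.
    """
    seq = seq.upper().replace("U", "T")
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None

# Toy example (hypothetical sequences): an open exon followed by a retained intron.
toy_exon = "ATGGCTGAA"       # ATG GCT GAA, no stop codon in frame 0
toy_intron = "GTAAGTTGACC"   # contains TAA out of frame and TGA in frame
stop_index = first_in_frame_stop(toy_exon + toy_intron)
intron_offset = stop_index - len(toy_exon)   # position of the stop within the intron
```

In this toy case the out-of-frame TAA inside GTAAGT is skipped and the first in-frame stop is the TGA that begins 6 nt into the intron, which is why the reading frame inherited from the upstream exon has to be taken into account.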
When comparing introns 5-6 of nxf1 genes in different species of Drosophila, we did not find extended homologous sequences (such as those found in cassette introns of genes orthologous to nxf1 in vertebrates). However, we have found a common feature of introns 5-6 of the nxf1 gene in different species of Drosophila: the presence of two poly(A) sequences, each with a length of around 100 nt (Figure 5). This feature distinguishes cassette introns 5-6 of nxf1 genes in Drosophilidae from cassette introns 10-11 of nxf1 genes in vertebrates.
The cassette introns of nxf1 in Drosophilidae form complex secondary structures. The poly(A) sequences of introns 5-6 of nxf1 in Drosophilidae are usually located in loops (Figure 6). Because the same RNA sequence can form several secondary structure variants, Figure 6 depicts only the most probable structures of introns 5-6 of nxf1 genes in some Drosophila species. The species were chosen according to the relationships between Drosophila species, taking into consideration the phylogenetic tree constructed from the divergence of intron 5-6 sequences (Figure 7).
What is the role of the poly(A) sequence in RNA? The presence of a poly(A) tail is an important element in RNA export [62]. Export efficiency depends on the length of this sequence. The poly(A) sequence is responsible for binding the corresponding proteins [63]. The poly(A) sequence is believed to be a recognition target for export factors [62].
Taking into account the secondary structure of cassette introns of nxf1 genes in Drosophilidae, poly(A) sequences are probably open for interaction with proteins that may recognize them. Poly(A) sequences may not only serve as nuclear export markers, but may also protect the corresponding transcript from degradation in the cytoplasm. It has been demonstrated that the protein PABPC1 (cytoplasmic poly(A)-binding protein) suppresses NMD in D. melanogaster [64].
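A scan for poly(A) tracts of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the minimum run length, and the toy sequence are hypothetical, and the sketch looks for uninterrupted runs of A, whereas the intronic poly(A) sequences described here may be A-rich rather than perfectly homopolymeric.

```python
import re

def find_poly_a_tracts(seq, min_len=10):
    """Return (start, end) half-open intervals of uninterrupted A runs
    that are at least `min_len` nucleotides long."""
    pattern = re.compile("A{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]

# Toy intron (hypothetical): two A runs separated by a short spacer.
toy_intron = "GTAAGT" + "A" * 12 + "CGC" + "A" * 8 + "TTAG"
tracts = find_poly_a_tracts(toy_intron, min_len=8)
```

For real introns, a sliding window that scores the fraction of A within each window would tolerate interruptions better than an exact-run search.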

Specific Features of Retained Introns in Alternative Transcripts of nxf Genes
Cassette introns of nxf1 genes in different species of mammals form complex secondary structures, as do cassette introns of nxf1 in Drosophilidae (Figure 8). It is possible that this feature of cassette introns determines the fate of intron-retaining transcripts in the cytoplasm. It is also possible that cassette introns of nxf1 genes have an independent function both inside the intron-retaining transcript and as a product of splicing of the nxf1 pre-mRNA. RNAs transcribed from introns are known to participate in a number of processes related to post-transcriptional control of gene expression [65]. Cassette introns of nxf1 genes in vertebrates have four evolutionarily conserved regions (Figure 4). The first, with a length of around 130 nt, is the 5'-end of cassette introns (Figure 4). This sequence includes 17 complete

The first conserved region of cassette introns (Figure 4(a)) is partially complementary to the fourth conserved region (Figure 4(d)), which is at the 3'-end of the intron. In secondary RNA structures the first and fourth regions form a "stem", thus "closing" the secondary structure of cassette introns 10-11 (Figure 8). The putative proteins that correspond to the open reading frame in the third evolutionarily conserved sequence of introns 10-11 of the genes Hs nxf1 and Mm nxf1 exhibit a great degree of similarity (data not shown). If the open reading frame is significant, it should be preserved in other cassette introns of nxf1 genes in vertebrates. This, however, we did not observe when evaluating the third evolutionarily conserved region of nxf1 genes in other vertebrates for the presence of a long evolutionarily conserved ORF. In all the depicted secondary structures of cassette introns, region three is a hairpin (Figure 8), which supports the assumption of a structural role for this evolutionarily conserved region.
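The partial complementarity between the 5' and 3' ends of an intron, which allows them to form a closing stem, can be estimated crudely as follows. This is a hypothetical sketch: the function name and toy sequences are assumptions, the alignment is ungapped, and G-U wobble pairs are counted alongside Watson-Crick pairs. It is not the procedure used to produce Figure 8.

```python
# Base pairs counted as complementary: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def paired_fraction(five_prime, three_prime):
    """Fraction of positions that can pair when the 5' region is laid
    antiparallel against the 3' region (ungapped, end to end)."""
    a = five_prime.upper().replace("T", "U")
    b = three_prime.upper().replace("T", "U")
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    # a[i] is paired with the i-th base counted from the 3' end of b.
    hits = sum((a[i], b[-1 - i]) in PAIRS for i in range(n))
    return hits / n

# Toy example (hypothetical): perfectly complementary ends versus an unrelated 3' end.
full = paired_fraction("GGGAUC", "GAUCCC")
poor = paired_fraction("GGGAUC", "AAAAAA")
```

A value close to 1 indicates that the two ends could form a continuous stem; real introns would require an alignment that allows bulges and internal loops.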
Long non-coding RNAs often contain ORFs. It is speculated that such sequences can be translated, but with very low efficiency, or only at a specific stage of development [66]. Little is known about ncRNAs that code for functionally significant short proteins [67]. Potentially, the possibility of synthesizing short peptides also exists for cassette introns of nxf1 genes. Short peptides, according to some researchers [68], may be a new class of bioactive signaling molecules.
Many transcripts, including those that code for a protein, also carry out regulatory functions. They are capable of interactions via a specific nucleotide sequence, thus playing a structural role or serving as catalysts [66,69]. Evolutionarily conserved sequences in the cassette intron of nxf1 genes raise the questions of what adaptive advantages these sequences bring, and what the functional significance of intron-retaining transcripts is. The evolutionary path of forming adaptive features may vary depending on the characteristics of a particular taxon.
The existence of sequences that may facilitate export of transcripts containing these introns from the nucleus to the cytoplasm is a feature shared between introns 10-11 in nxf1 genes in vertebrates and introns 5-6 in nxf1 genes of different species of Drosophila. This, along with the ability of these introns to form secondary structures, suggests the functional significance of such transcripts. It is possible that the retention of the cassette intron in transcripts of nxf1 genes facilitates the synthesis of the truncated protein NXF1. We have shown that in the heads of adult flies the transcript with cassette intron 5-6 is more abundant than completely spliced transcripts (Figure 3) [57]. It would be interesting to test the hypothesis that a truncated Dm NXF1 protein specific to the brain tissues of the fruit fly exists.

CONCLUSION
A great variety of alternative transcripts, including intron-retaining and non-coding RNAs, is characteristic of the nervous system. Thus, it is not surprising that transcripts with cassette intron 5-6 in Dm nxf1 are found primarily in the head of fruit flies. Insects that are characterized by complex behavior are model organisms, which facilitate the study of the molecular mechanisms of the nervous system. The large number of transcripts with retained intron 5-6 of the Dm nxf1 gene in the head of adult fruit flies points to the functional significance of these transcripts. The existence of evolutionarily conserved sequences in the cassette introns of nxf1 genes in animals within some taxonomic groups can be regarded as the acquisition of adaptive properties by the corresponding intron-containing transcripts. These sequences may provide benefits in nuclear-cytoplasmic transport and resistance to NMD, and may be involved in the regulation of gene expression. The ability of cassette introns to form complex secondary structures suggests that these introns may have an independent, possibly structural, function and also be a source of non-coding RNAs.
To predict the secondary structures of nucleotide sequences we used the UNAFold v3.8 suite (http://mfold.rna.albany.edu/). Secondary structure prediction was based on minimum free energy calculation [74].
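The dynamic-programming idea behind secondary structure prediction can be sketched with the Nussinov base-pair maximization algorithm. This is a deliberately simplified stand-in and not the method used by UNAFold, which minimizes free energy with a thermodynamic model. Here every allowed pair simply scores 1, and the function name, the minimum loop size, and the toy sequence are assumptions made for illustration.

```python
def nussinov_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov algorithm).

    Each A-U, G-C, or G-U pair scores 1; at least `min_loop` unpaired
    bases must separate the two partners of a pair.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                      # case 1: j is unpaired
            for k in range(i, j - min_loop):         # case 2: j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]

# Toy hairpin (hypothetical): a 3 bp stem closing a 4 nt loop.
stem_pairs = nussinov_pairs("GGGAAAACCC")
```

Energy-based tools replace the uniform score of 1 with stacking and loop energies, so their predicted structures can differ substantially from the pair-maximizing structure.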

Figure 1. Phylogenetic tree of NXF family sequences, constructed for species of the phyla Nematoda, Arthropoda, and Chordata. The tree was constructed for sequences of ubiquitous transcripts using the MrBayes program (2,000,000 generations). Numbers denote posterior clade probabilities.

Figure 2. Intron-exon structure of the Hs nxf1 and Dm nxf1 genes. The arrow indicates the cassette intron. The color of each exon corresponds to the color of the protein domain it encodes.

Figure 3. Northern blot analysis of total RNA from different tissues of the adult Drosophila female. 1-ovaries, 2-heads. The sizes (in kb) of molecular weight RNA markers are indicated on the left (Ivankova et al., 2010).

Figure 6. Secondary structures of cassette introns 5-6 of the nxf1 genes in five species of Drosophila. Loops formed by poly(A) sequences are marked.

Figure 7. Phylogenetic tree constructed using divergences of the sequences of cassette intron 5-6 of nxf1 genes in eleven species of Drosophila.

truncated proteins corresponding to the intron-containing transcripts in other vertebrates (Figure 9). The first stop codon in intron 10-11 is backed up by a second one 12 nt downstream. Evolutionary conservation of the first region (Figure 4(a)) suggests the functional significance of the truncated NXF1. The only species that is not present in the list is M. musculus. Deletion of one of the five nucleotides (C) at positions 40-44 of the Mm nxf1 intron shifts the open reading frame compared to other species. The second evolutionarily conserved sequence (Figure 4(b)) contains the CTE and was identified by Li and co-authors [56]. Whereas this sequence does not contain an ORF (open reading frame), the third evolutionarily conserved sequence (Figure 4(c)) in introns 10-11 of the genes Hs nxf1 and Mm nxf1 is a long open reading frame.

Figure 8. Secondary structures corresponding to the minimum energy state of the cassette intron of the nxf1 gene in different species of mammals. A-Callithrix jacchus; B-Canis lupus familiaris; C-Monodelphis domestica; D-Loxodonta africana; E-Rattus norvegicus; F-Equus caballus. Roman numerals denote evolutionarily conserved regions: I-5'-end of intron, II-CTE sequence, III-third conserved sequence, IV-3'-end of intron.

Figure 9. C-terminal end of the short isoform of the NXF1 protein in mammals, which is presumably translated from an alternative transcript containing unspliced intron 10-11.

Table 1. Characteristics of the genes of the nxf family in different animals: information about the introns retained in alternative transcripts of a number of genes of this family.