Identification and Characterization of Reverse Transcriptase Fragments of Long Interspersed Nuclear Elements (LINEs) in the Morus notabilis Genome

Reverse transcriptase (rt) fragments from LINE retrotransposons in the mulberry genome were analyzed in terms of heterogeneity, phylogeny, and chromosomal distribution. We amplified and characterized conserved domains of the rt using degenerate primer pairs. Sequence analyses indicated that the rt fragments were highly heterogeneous and rich in A/T bases, with pairwise sequence identities ranging from 31.8% to 99.4%. Based on sequence similarity, the rt fragments were classified into eight groups. Furthermore, similar stop codon distribution patterns among clones in the same group suggested that they underwent a similar evolutionary process. Interestingly, phylogenetic analyses of the rt fragments isolated from mulberry and 13 other plant species revealed that rt sequences from two distantly related taxa (mulberry and Paeonia suffruticosa) grouped together. The available evidence does not support horizontal transposable element transfer as the cause of this clustering. Fluorescence in situ hybridization analysis revealed that the rt fragments were concentrated mainly in the subtelomeric and pericentromeric regions of the mulberry chromosomes, but that these elements were not abundant in the mulberry genome. Future studies will focus on the potential roles of these elements in the subtelomeric and pericentromeric regions of the mulberry genome.


Introduction
Transposable elements (TEs), first discovered in maize by Barbara McClintock, are also known as "jumping genes" because of their ability to replicate and move to new genomic locations [1]. They are ubiquitous and abundant components of all eukaryotic genomes and play important roles in the structural organization and evolution of genes and genomes [2]-[7]. Based on their mechanism of transposition, TEs are classified as retrotransposons (Class I) or DNA transposons (Class II) [8] [9] [10]. Class I retrotransposons move to new chromosomal locations via an RNA intermediate (a "copy and paste" mechanism). In contrast, Class II DNA transposons move via a DNA intermediate (a "cut and paste" mechanism) [6] [8]. Depending on whether they are flanked by long terminal repeats (LTRs), retrotransposons are further classified as LTR or non-LTR retrotransposons [10]. Non-LTR retrotransposons are usually divided into long and short interspersed nuclear elements (LINEs and SINEs, respectively) [10].
LINE retrotransposons are ubiquitous and vary greatly in structure and size [11] [12] [13]. A large body of knowledge on mammalian LINEs has accumulated [14] [15], whereas LINEs in plants have been poorly investigated. The first plant LINE retrotransposon identified was Cin4 in Zea mays, which inactivates the A1 gene following its insertion into the A1 3'-untranslated region [16]. Since that pioneering study, numerous other LINEs have been identified in taxa such as Lilium speciosum (del2) [17], Arabidopsis thaliana (Tal 1-1) [18], Chlorella vulgaris (Zepp) [19], Hordeum vulgare (BLIN) [20], Oryza sativa (Karma) [21], Ipomoea batatas (LIb) [22], and Beta vulgaris (BNR) [13]. Full-length LINEs have one or two open reading frames (ORFs) encoding proteins required for reverse transcription. ORF1 contains the gag gene, while ORF2 contains the genes for an endonuclease (en), a reverse transcriptase (rt), and a cysteine-rich domain (Cys) encoding a putative RNA-binding motif [12] [23]. The rt is a key enzyme for retrotransposition and shares several conserved domains typical of retroviral RNA-directed DNA polymerases [24]. Previous studies have suggested that amplifying rt fragments with degenerate oligonucleotide primers complementary to conserved rt domains is a feasible and efficient approach for characterizing LINE retrotransposons in various plant species [18] [25] [26] [27] [28].
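The degenerate primers used to amplify the rt fragments in this study (forward 5'-GGGATCCNGGNCCNGAYGGNWT-3' and reverse 5'-SWNARNGGRTCNCCYTG-3', from [18]) are really pools of oligonucleotides, because each IUPAC ambiguity code (N, Y, W, S, R) stands for two or more bases. As an illustration only, the minimal Python sketch below counts how many distinct sequences each primer encodes, using the standard IUPAC nucleotide code table; the helper names `degeneracy` and `expand` are hypothetical and not part of the original workflow.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide codes and the bases each one represents
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(len(IUPAC[code]) for code in primer.upper())


def expand(primer: str) -> list[str]:
    """Enumerate every concrete sequence in the primer pool (fine for small pools)."""
    choices = [IUPAC[code] for code in primer.upper()]
    return ["".join(bases) for bases in product(*choices)]


FORWARD = "GGGATCCNGGNCCNGAYGGNWT"  # forward rt primer quoted above
REVERSE = "SWNARNGGRTCNCCYTG"       # reverse rt primer quoted above

print(degeneracy(FORWARD))  # 1024: 4^4 for the four N, times 2 for Y and 2 for W
print(degeneracy(REVERSE))  # 2048: 4^3 for the three N, times 2^5 for S, W, R, R, Y
```

A highly degenerate pool can anneal to divergent copies of the conserved rt motifs, at the cost of each individual sequence making up only a small fraction of the total primer concentration.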
Morus (mulberry) is a representative genus of the cosmopolitan family Moraceae (Rosales) and comprises more than 13 species (over 1000 cultivars), which are widely distributed in Asia, Africa, Europe, and the United States [29] [30]. Mulberry is valued for its delicious fruit and as a rich source of medicines against certain serious diseases [31] [32]. The relationship between mulberry and the silkworm is one of the best examples of "plant defense-insect adaptation" [33] [34]. The mulberry species Morus notabilis has a relatively small genome (estimated at 357 Mb), and cytogenetic data indicate that M. notabilis has 14 chromosomes (2n = 14) [35]. Previous studies from our laboratory are the only reports describing LINE retrotransposons in the mulberry genome [35] [36], and a detailed characterization of LINE retrotransposable elements in mulberry has not yet been carried out.
In the present work, we amplified and cloned rt fragments of LINE retrotransposons from the M. notabilis genome using degenerate primers, with the aim of characterizing their diversity, heterogeneity, and phylogenetic relationships. In addition, fluorescence in situ hybridization (FISH) was used to determine the distribution of these elements on the chromosomes. These results should improve our understanding of the roles of LINE retrotransposons in the structural, functional, and evolutionary dynamics of the mulberry genome.

Plant Materials and DNA Isolation
Young leaves of M. notabilis C.K.Schn (Taxonomy ID: 981085) (2n = 14) were collected from mulberry trees growing in Ya'an, Sichuan Province, China. The leaves were stored in liquid nitrogen until use. Total genomic DNA used as the template for cloning was extracted from the young leaves using a standard cetyltrimethylammonium bromide (CTAB) protocol [37].

Polymerase Chain Reaction (PCR) and Cloning of Amplicons
The rt sequences of LINEs were amplified from mulberry genomic DNA using degenerate primers (forward: 5'-GGGATCCNGGNCCNGAYGGNWT-3'; reverse: 5'-SWNARNGGRTCNCCYTG-3') [18]. The primers were synthesized by BGI (Shenzhen, China). Each PCR reaction mixture contained 20 ng DNA, 10 pmol of each primer, 0.25 mM of each dNTP (Takara, Japan), 10× PCR buffer (containing 3.5 mM MgCl2; Takara, Japan), and 1 U rTaq polymerase (Takara, Japan). PCR amplification was carried out in a 96-well thermal cycler (Applied Biosystems, USA) with the following program: 94 °C for 5 min; 35 cycles of 94 °C for 1 min, 50 °C for 1 min, and 72 °C for 1 min; and a final extension at 72 °C for 7 min. PCR products were analyzed on 1.5% agarose gels and purified using the Agarose Gel DNA Extraction Kit (TaKaRa, Japan) according to the manufacturer's instructions. Purified products were cloned into the pMD19-T vector (TaKaRa, Japan) following the manufacturer's instructions. Two independent rounds of PCR amplification and cloning were carried out. Positive clones were verified by PCR and sequenced in both directions using M13 universal primers at Sangon Biotech (Shanghai, China). Clones were named as follows: Mno stands for Morus notabilis, L denotes the element type (L for LINE), and the serial number identifies the clone (e.g., MnoL_10).
Cloned sequences were compared with previously characterized plant retroelement sequences in the National Center for Biotechnology Information (NCBI) (http://blast.ncbi.nlm.nih.gov/) and Genetic Information Research Institute (GIRI) (http://www.girinst.org/) databases using BLAST [38]. The nucleotide and protein sequences were aligned using MUSCLE (version 3.8.31) with default parameters [39]. Sequence identities were calculated with BioEdit (version 7.2.5) using the BLOSUM62 matrix [40]. The locations of stop codons in the rt sequences were displayed using the Gene Structure Display Server (GSDS, http://gsds.cbi.pku.edu.cn/) [41]. Nucleotide sequences of the isolated rt fragments and 29 LINE rt sequences from 13 other plant species (Supplementary material 1) were aligned with MUSCLE (version 3.8.31) using default parameters [39]. For phylogenetic analysis, MEGA6 was used with default parameters to find the best-fit substitution model for each dataset [42]. The best-fit model (Tamura 3-parameter + G, T92 + G) was used to construct a phylogenetic tree by the maximum-likelihood method with pairwise deletion in MEGA6 [42] [43] [44]. Tree topology was assessed by bootstrap analysis with 1000 replicates.
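Tree support in this study was assessed in MEGA6 with 1000 bootstrap replicates. As a conceptual sketch only (not MEGA6's implementation), the Python code below shows the two ingredients of a nonparametric bootstrap: building a pseudo-replicate alignment by resampling alignment columns with replacement, and scoring a clade as the percentage of replicate trees that contain it. Inferring a tree for each replicate (maximum likelihood under T92 + G in the study) is omitted, so the clade sets passed to the hypothetical `clade_support` helper stand in for trees built elsewhere; the toy sequences are invented.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Return one pseudo-replicate: alignment columns resampled with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # Draw n_sites column indices with replacement; a column may appear several times
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def clade_support(replicate_clades: list[set[frozenset[str]]], clade: frozenset[str]) -> float:
    """Percentage of replicate trees (each given as its set of clades) containing `clade`."""
    hits = sum(clade in clades for clades in replicate_clades)
    return 100.0 * hits / len(replicate_clades)


# Toy example with hypothetical aligned fragments
toy = {"seq1": "ACGTTA", "seq2": "ACGTTG", "seq3": "TCGATG"}
replicate = bootstrap_alignment(toy, random.Random(42))
print(replicate)
```

With 1000 replicates, a support value of 50% means the clade appeared in 500 replicate trees; the figure captions report only values above that threshold.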

Chromosome Preparation and FISH
Mulberry chromosome spreads were prepared from young leaves pretreated with 2 mM 8-hydroxyquinoline in darkness for 3 h at room temperature (24 °C). Samples were fixed in methanol/glacial acetic acid (3:1, v/v) for 2 h at 4 °C, incubated in 1/15 M KCl for 30 min, and digested with an enzyme mixture (5% cellulase and 2.5% pectinase) at 37 °C for 3 h. After the cell walls were completely degraded, samples were spread onto slides. Probes were labeled with digoxigenin-11-dUTP by PCR with the degenerate primers, using the PCR DIG Probe Synthesis Kit (Roche) according to the manufacturer's instructions.
Fluorescence in situ hybridization was performed according to a modified procedure [45]. Briefly, the chromosome preparations on slides were treated with 100 μg/ml RNase for 15 min at 37 °C and then digested with 1 μg/ml proteinase K for 10 min at 37 °C. Samples were denatured in 70% (v/v) formamide for 10 min at 72 °C and then immediately treated for 5 min each in a 70%, 90%, and 100% (v/v) ethanol series precooled to −20 °C. The hybridization mixture, consisting of 2× SSC, 0.25 μg salmon sperm DNA, 10% (w/v) SDS, 50% (w/v) dextran sulfate (DS), 50% (v/v) formamide, and 400 ng labeled DNA probe, was denatured for 6 min at 96 °C. The slides and hybridization mixture were incubated at 80 °C for 10 min and then at 37 °C for 16 h. The slides were washed with 10% (v/v) formamide for 10 s, with 2× SSC at 37 °C for 5 min (five times), and with 0.2% (v/v) Tween-20 at room temperature for 5 min. Digoxigenin was detected using FITC-conjugated anti-digoxigenin antibody (Roche), and chromosomes were counterstained with 4',6-diamidino-2-phenylindole (DAPI). Slides were viewed with a Leica DM2500 fluorescence microscope (Leica, Germany). Images were captured using a CV-M4+CL progressive scan charge-coupled device camera (DM2500, Leica) and analyzed using CytoVision software (version 7.3.1).

Identification of rt Fragments
The sequences of the expected 580 bp amplicons [18] were compared with sequences available in the NCBI and GIRI (http://www.girinst.org/) databases using BLAST [38]. In total, two independent rounds of PCR and cloning yielded 43 clones with homology to known retroelements in these databases, and all of them were selected for further analysis. All clones have been deposited in GenBank under accession numbers KT900650-KT900692 (Supplementary material 2).

Characterization of LINE rt Sequences
Nucleotide sequences of the isolated rt fragments were aligned and used to construct a phylogenetic tree by the maximum-likelihood method in MEGA6. The rt sequences were classified into eight groups (Figure 1). Group I contained the most rt clones (41.9%, 18/43), followed by Group VI (30.2%, 13/43). Together, these two groups accounted for 72.1% of the 43 clones and were further divided into several subfamilies. Clones from the same group had nearly identical lengths and were highly similar (> 97%, except for Group II) (Table 1 and Supplementary material 2). The isolated LINE rt fragments ranged in length from 557 bp (MnoL_11, MnoL_13, and MnoL_15) to 595 bp (MnoL_31), with an average of 579 bp. The AT/GC ratio ranged from 1.26 (MnoL_20 and MnoL_37) to 1.98 (MnoL_42), with an average of 1.37, indicating that the rt sequences are AT-rich (Table 1 and Supplementary material 2). Pairwise comparisons showed that similarity among the 43 rt nucleotide sequences ranged from 31.8% (MnoL_20 vs. MnoL_31, and MnoL_31 vs. MnoL_36) to 99.4% (MnoL_17 vs. MnoL_29) (Supplementary material 3). These results indicate that the isolated rt sequences are highly heterogeneous.
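The AT/GC ratios and pairwise identities above are simple counts over sequences. The minimal Python sketch below shows one way to compute them; it assumes the AT/GC ratio is (A+T)/(G+C) with ambiguous bases ignored, and it computes percent identity over aligned columns in which neither sequence has a gap. BioEdit, which was used in the study, may treat gaps differently, so this is an approximation rather than a reimplementation, and the function names are hypothetical.

```python
def at_gc_ratio(seq: str) -> float:
    """(A+T)/(G+C); ambiguous bases such as N and alignment gaps are ignored."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return at / gc


def percent_identity(a: str, b: str) -> float:
    """Percent identical bases over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


print(at_gc_ratio("AATTGC"))                 # 2.0
print(percent_identity("ACGT-A", "ACCTTA"))  # 80.0 (4 matches in 5 gap-free columns)
```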
The amino acid alignment of the isolated mulberry LINE rt fragments revealed that all sequences except MnoL_10 contained several premature stop codons (Figure 2). MnoL_42 had the most premature stop codons of any clone (16). Premature stop codons occurred at similar positions in sequences from the same group (Figure 2). Furthermore, all sequences except MnoL_10, MnoL_23, and MnoL_31 carry frameshift mutations (Supplementary material 4).
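In the study, premature stop codons were located and displayed with GSDS. To make the idea concrete, the minimal Python sketch below scans a nucleotide fragment codon by codon in each of the three forward reading frames and reports the positions of TAA, TAG, and TGA; the function names are hypothetical. Because the fragments are internal pieces of the rt gene, any in-frame stop codon counts as premature. The sketch assumes the standard genetic code and removes alignment gaps before reading codons; it does not detect frameshifts directly.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def stop_codon_positions(seq: str, frame: int = 0) -> list[int]:
    """1-based codon numbers of stop codons when `seq` is read from offset `frame`."""
    s = seq.upper().replace("-", "")  # drop alignment gaps before reading codons
    positions = []
    # Stop at len(s) - 2 so that only complete codons are examined
    for i in range(frame, len(s) - 2, 3):
        if s[i:i + 3] in STOP_CODONS:
            positions.append((i - frame) // 3 + 1)
    return positions


def stops_by_frame(seq: str) -> dict[int, list[int]]:
    """Stop-codon positions in each of the three forward reading frames."""
    return {frame: stop_codon_positions(seq, frame) for frame in range(3)}


print(stops_by_frame("ATGTAAGGGTGA"))  # {0: [2, 4], 1: [], 2: []}
```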

Phylogenetic Analysis of LINE rt Clones
Phylogenetic analyses indicated that the mulberry LINE rt sequences are homologous to rt sequences from other species (Figure 3). Notably, PTLRT19 from Paeonia suffruticosa grouped with mulberry rt sequences (Groups V, VI, and VII) rather than with the other P. suffruticosa sequences (PTLRT12, PTLRT4, and PTLRT14), even though P. suffruticosa and M. notabilis are distantly related taxa.
The LINE rt phylogeny was incongruent with the host species phylogeny (APG, The Angiosperm Phylogeny Group, http://www.theplantlist.org).

Figure 3. Phylogenetic analysis of reverse transcriptase fragments from mulberry and thirteen other plant species. All nucleotide sequences of reverse transcriptase fragments from mulberry and representative members of the thirteen other plant species were aligned with MUSCLE (version 3.8.31) using default parameters. MEGA6 was first used with default parameters to find the best-fit substitution model for the dataset. The best-fit model (Tamura 3-parameter + G) was then used to construct a phylogenetic tree by the maximum-likelihood method with pairwise deletion in MEGA6. Bootstrap values (the percentage of 1000 replicate trees in which the associated taxa clustered together) are shown when above 50%. Details of the thirteen other plant species used in the present research are given in Supplementary material 1.
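The phylogenetic trees in this study were built under the Tamura 3-parameter (T92) model, which extends the Kimura two-parameter model by allowing GC content to differ from 50%, a relevant property for AT-rich rt fragments. As an illustration of what the model corrects for, the Python sketch below computes the closed-form pairwise T92 distance, d = -h ln(1 - P/h - Q) - (1/2)(1 - h) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences and h = θ1 + θ2 - 2θ1θ2 for the GC contents θ1 and θ2 of the two sequences. It is only a sketch: it omits the gamma (+G) rate correction and the maximum-likelihood tree search performed in MEGA6, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def t92_distance(a: str, b: str) -> float:
    """Tamura (1992) distance between two aligned nucleotide sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    # Keep only columns where both sequences have an unambiguous base (pairwise deletion)
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in "ACGT" and y in "ACGT"]
    n = len(cols)
    diffs = [(x, y) for x, y in cols if x != y]
    transitions = sum(1 for x, y in diffs if {x, y} <= PURINES or {x, y} <= PYRIMIDINES)
    P = transitions / n                 # proportion of transitional differences
    Q = (len(diffs) - transitions) / n  # proportion of transversional differences
    theta1 = sum(x in "GC" for x, _ in cols) / n  # GC content of sequence a
    theta2 = sum(y in "GC" for _, y in cols) / n  # GC content of sequence b
    h = theta1 + theta2 - 2 * theta1 * theta2
    # math.log raises ValueError when the sequences are too divergent (saturation)
    return -h * math.log(1 - P / h - Q) - 0.5 * (1 - h) * math.log(1 - 2 * Q)


print(round(t92_distance("ACGTACGTAC", "GCGTACGTAC"), 4))  # 0.1116
```

When both sequences have a GC content of exactly 0.5, h = 0.5 and the expression reduces to the Kimura two-parameter distance.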

Distribution of LINEs in the Mulberry Genome
Fluorescence in situ hybridization (FISH) was performed to study the distribution of these sequences along the mulberry chromosomes. The LINE elements were localized using a heterogeneous probe cocktail containing all isolated clones. FISH with this cocktail showed that hybridization signals were concentrated mainly in subtelomeric and pericentromeric regions (Figure 4).

Characterization of LINEs
All cloned rt fragments could be classified into eight groups (Figure 1). Groups I and VI contained most of the rt clones (72.1%, 31/43), whereas Group VIII contained only one clone (MnoL_42) (Figure 1). Nucleotide sequence similarity between MnoL_42 and the other clones was only 32.8% to 37.2% (Supplementary material 3). In addition, MnoL_42 contained 16 premature stop codons, far more than any other clone (Supplementary material 2). These results suggest that mutations have accumulated progressively in MnoL_42 over a long evolutionary period. Combined with the phylogenetic analysis (Figure 1), these data suggest that MnoL_42 is an ancient LINE in mulberry.
The rt fragments amplified from the mulberry genome were highly heterogeneous. Almost all of the 43 rt sequences described here contained frameshifts and premature stop codons (Figure 2 and Supplementary material 5), which were the main sources of the observed heterogeneity. These results are consistent with observations in other plants, including Hordeum species [20] and Vicia species [27]. Furthermore, rt fragments from clones within the same group differed only slightly, suggesting that base substitutions, deletions, and insertions also contribute to the heterogeneity among rt sequences. As shown in Figure 2, the similar stop codon distribution patterns among rt fragments from the same group suggest that they went through a similar evolutionary process.
As indicated in Table 1 and Supplementary material 2, the AT/GC ratio ranged from 1.26 to 1.98, with an average of 1.37. These results show that the rt sequences are rich in AT bases, which is important for the LINE copy-and-paste replication mechanism [12] [46]. An intact LINE element contains two open reading frames, ORF1 and ORF2. ORF2 encodes the reverse transcriptase, a critical enzyme in LINE replication [12] [23]. A critical step in the LINE life cycle is cleavage of the first DNA strand at the target site by the ORF2 protein. Because the target sites are AT-rich and usually resemble the consensus TTAAAA [47] [48], a high AT content in the rt sequences may help ensure that target sites are recognized efficiently during LINE replication.
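Because the AT/GC ratio r and the AT content carry the same information, the reported ratios can be converted into AT fractions for easier comparison with other studies (ambiguous bases ignored). Dividing numerator and denominator by (G+C) gives:

```latex
\begin{aligned}
f_{AT} &= \frac{A+T}{A+T+G+C} = \frac{r}{1+r}, \qquad r = \frac{A+T}{G+C} \\
f_{AT}(1.26) &= 1.26/2.26 \approx 0.558 \\
f_{AT}(1.37) &= 1.37/2.37 \approx 0.578 \\
f_{AT}(1.98) &= 1.98/2.98 \approx 0.664
\end{aligned}
```

The fragments are therefore roughly 56% to 66% A+T. Converting the mean ratio (1.37) gives only an approximate mean AT content, because the mean of the per-clone ratios is not the ratio of the pooled base counts.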
Interestingly, MnoL_10 contained no frameshifts or premature stop codons (Figure 2 and Supplementary material 5). All rt fragments from mulberry except MnoL_10 represent potential pseudogenes (they possess stop codons or frameshifts). Further research on MnoL_10, which may be a potentially active transposable element, would be worthwhile.

Phylogenetic Analysis of LINEs
Interestingly, PTLRT19 from Paeonia suffruticosa grouped with mulberry rt sequences (Groups V, VI, and VII) rather than with the other P. suffruticosa sequences (PTLRT12, PTLRT4, and PTLRT14) (Figure 3), even though the two species (P. suffruticosa and M. notabilis) are distantly related taxa according to the APG (The Angiosperm Phylogeny Group, http://www.theplantlist.org). Similar phenomena have been reported in other studies. For example, twenty-six plant genomes were found to harbor at least one case of horizontal TE transfer (HTT), which may be important in TE-driven genome evolution, and these HTTs involve species as distantly related as palm and grapevine, tomato and bean, and poplar and peach [49]. We therefore considered whether HTT could explain the phenomenon observed here. Three criteria have been defined for detecting HTT: (i) a patchy distribution of the TEs in phylogenies, (ii) high sequence similarity of the TE between distantly related taxa, and (iii) phylogenetic incongruence between the TE and host species trees [50] [51] [52]. In the present work, although the TE tree is incongruent with the phylogeny of the two host species, nucleotide sequence similarity between PTLRT19 and the mulberry rt sequences (Groups V, VI, and VII) is only 52.0% to 56.2% (Supplementary material 4), so criterion (ii) is not met. Therefore, whether PTLRT19 represents a horizontal transfer event remains uncertain.

Chromosomal Localization of LINE Retrotransposons
The chromosomal distribution of LINEs can be associated with their function. For example, FISH experiments in Cannabis sativa suggested that differential accumulation of LINE retrotransposons on the Y chromosome leads to sex chromosome heteromorphism [53]. Although the chromosomal distribution of LINEs has been analyzed in only a few plant species, the FISH results here revealed that the distribution of LINEs on mulberry chromosomes is similar to that on sugar beet and peanut chromosomes [28] [54]. Furthermore, the weak hybridization signals observed in this study indicated that LINEs are not abundant in the mulberry genome (Figure 4). Most hybridization signals were concentrated in subtelomeric and pericentromeric regions. These regions are generally considered to correspond to constitutive heterochromatin and are extensively DNA-methylated [55]. We therefore hypothesize that the tendency of LINEs to insert into these regions may be related to DNA methylation in mulberry.
In addition, reports suggest that LINE insertion into promoters can influence promoter function and gene regulation, resulting in up- or down-regulation of reporter genes [56]. Insertion of a LINE into a gene can induce alternative splicing or change gene expression patterns, which can alter the function of the gene [57]. Although we currently have no evidence that the mulberry LINEs described here are active and functional, previous studies have indicated that some LINEs are active and functional in other species [58]. Our future studies will therefore attempt to characterize the functions of mulberry LINEs more comprehensively, considering their localization in subtelomeric and pericentromeric regions.

Figure 1. Phylogenetic analysis of reverse transcriptases from mulberry LINE retrotransposons. All cloned nucleotide sequences of reverse transcriptase fragments from mulberry were aligned with MUSCLE (version 3.8.31) using default parameters. The best-fit substitution model identified by MEGA6 (Tamura 3-parameter + G) was used to construct a phylogenetic tree by the maximum-likelihood method with pairwise deletion in MEGA6. Only bootstrap values above 50% (1000 replicates) are shown. The sequences were classified into eight groups, Group I to Group VIII.

Figure 2. Premature stop codon positions in reverse transcriptase sequences from mulberry LINE retrotransposons. The locations of premature stop codons in the rt sequences were displayed using the Gene Structure Display Server (GSDS, http://gsds.cbi.pku.edu.cn/). Each straight red line represents one of the eight groups of reverse transcriptase sequences (from top to bottom: Group I to Group VIII). Blue lines represent reverse transcriptase fragments, and red blocks on the blue lines mark premature stop codons.

Table 1. Length, AT/GC ratio, and similarity [range (average)] of reverse transcriptase fragments from mulberry LINE retrotransposons.