Molecular Variability and Genetic Structure of IYMV in Burkina Faso

Imperata yellow mottle virus (IYMV, Sobemovirus) was first described in 2008 in the south-western region of Burkina Faso (West Africa). The genetic diversity of IYMV has not been documented to date. In this study, the variability of the coat protein (CP) gene of IYMV was evaluated through the molecular characterization of 38 isolates collected in the western part of Burkina Faso. Comparison of the sequences of these new isolates with the one IYMV sequence available in GenBank revealed that the average nucleotide diversity was low. The ratio of nonsynonymous to synonymous nucleotide substitutions per site was low, indicating CP diversification under strong purifying selection. Despite the low nucleotide diversity, phylogenetic analyses revealed segregation of IYMV isolates into six major clades. The phylogenetic grouping of isolates did not correlate with geographical location. This is the first study of the genetic diversity of IYMV.


Introduction
The perennial grass Imperata cylindrica (L.) P. Beauv., a common and persistent weed in many food crops such as cassava, maize, sorghum and rice is consi-observed and demonstrated conclusively in Zea mays and I. cylindrica [4]. Experimentally, the virus has a crop host range including two cereals (Sorghum bicolor, Pennisetum glaucum) [5] and three wild grasses (Rottboellia exaltata, Setaria verticillata, Brachiaria xantholeuca) [4] [5]. Unlike for other sobemoviruses, it remains unknown whether insects such as beetles, or even I. cylindrica seeds themselves, can transmit IYMV. IYMV is a positive-sense single-stranded RNA virus with particles 32 nm in diameter. Its genome is 4,447 nucleotides long and comprises five ORFs [6]. [6]. Such a putative fifth ORF is also present in the IYMV genome and overlaps the 5' end of ORF2a in the +2 reading frame (Figure 1).
Until now, only one complete IYMV genomic sequence, from the western region of Burkina Faso (West Africa), had been published [4]. The molecular diversity of the virus is therefore undocumented, and several important aspects of IYMV epidemiology, such as alternative hosts in fields, are still poorly understood. Knowledge of IYMV genetic diversity is nevertheless essential for a better description of its aetiology, pathogenicity and ecology, and for developing appropriate strategies to counteract the spread of IYMV and the disease it causes. The coat protein (CP) gene is the most common molecular marker for investigating the genetic diversity of sobemoviruses and other plant viruses [8] [9]. In addition, various viruses have been grouped on the basis of their coat protein gene sequences [10]. The aim of this study was therefore to investigate the genetic variability of IYMV through molecular analyses of CP gene sequences from isolates obtained from different locations in Burkina Faso.

RNA Extraction, RT-PCR Amplifications and Sequencing
Total RNA was extracted from frozen infected Imperata cylindrica leaves using the RNeasy Plant Mini Kit (Qiagen), according to the manufacturer's instructions. Slight modifications were made to the protocol to optimize the quality and quantity of the total RNA. The quality of the RNA extraction was assessed by measuring the RNA concentration. Reverse transcription (RT) was performed using the primer IYMV-R4438-4454, while polymerase chain reaction (PCR) was performed using the primers IYMV-F3483-3502 and IYMV-R4385-4394 described by Koala et al. (2017) [5]. All steps and conditions, including RT and PCR, followed the protocol of Koala et al. (2017) [5]. All PCR products of the correct size were purified from 1% agarose gels using GENECLEAN Turbo columns before being sent to Genewiz (Essex, UK) for sequencing.

Recombination and Genetic Diversity Analysis
The sequence contigs obtained in this study were assembled using the SeqMan II program in DNASTAR 10.0 (DNAStar Inc., Madison, USA). The 38 sequences were then compared and analyzed together with the available GenBank sequence, accession NC-011536 (Table 1). Multiple nucleotide sequence alignments were performed using CLUSTAL W with default parameters [11].
Alignments were also adjusted manually to guarantee correct reading frames. Noncoding sequences were removed before alignment.
Because frequent recombination can produce false-positive signals of positive selection in codon-specific analytical methods, recombinant sequences had to be identified and removed before assessing the selection pressure acting on the CP gene. Recombination events were screened with the detection methods of the RDP4 program [12]- [18]. The default detection thresholds were used. Only events supported by three methods were retained.
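The retention rule for recombination events can be sketched as a simple filter. This is a minimal illustration, not the RDP program's own logic: the event records, isolate labels and method names below are hypothetical placeholders, and "supported by three methods" is read here as "at least three distinct methods", which is an assumption.

```python
def retain_supported_events(events, min_methods=3):
    """Keep only events flagged by at least `min_methods` distinct methods.

    `events` is a list of dicts with a "methods" list (hypothetical layout).
    """
    return [e for e in events if len(set(e["methods"])) >= min_methods]


# Hypothetical example records; names are placeholders, not study data.
candidate_events = [
    {"recombinant": "isolate_A", "methods": ["method_1", "method_2"]},
    {"recombinant": "isolate_B", "methods": ["method_1", "method_2", "method_3"]},
]

accepted = retain_supported_events(candidate_events)
print([e["recombinant"] for e in accepted])
```

Under this rule, an event detected by only one or two methods is discarded regardless of its individual p-values.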
Pairwise genetic distances were calculated for nucleotide sequences using the Kimura two-parameter model [19] and for amino acid sequences using the Jones-Taylor-Thornton (JTT) model, both as implemented in MEGA v.6.0 [20]. To evaluate variation in selection pressure during CP evolution, the direction and degree of the selective constraints operating on the coding region were assessed by the ratio between the nucleotide diversities at nonsynonymous and synonymous positions (dN/dS).
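For readers unfamiliar with the Kimura two-parameter distance, the following sketch shows the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. It is an illustration only: MEGA's own implementation and its handling of gaps and ambiguous bases may differ, and the function name and the choice to skip non-ACGT positions are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: positions with gaps or ambiguous bases are skipped.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

The model corrects the observed proportion of differences for multiple substitutions at the same site while allowing transitions and transversions to occur at different rates.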
The extent of IYMV variation among these sequences was evaluated using the index π, computed with DnaSP version 5.0 using a sliding window of 100 nt and a step size of 25 nt. The parameter π measures nucleotide diversity as the mean number of nucleotide differences per site between two sequences. The value of each window was assigned to the nucleotide at the window midpoint.
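The sliding-window calculation of π can be sketched as follows. This is a simplified illustration under stated assumptions, not DnaSP's implementation: it treats every alignment column as comparable (no special handling of gaps), uses 0-based positions, and the function names are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of differences per site over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    total = 0.0
    n_pairs = 0
    for s1, s2 in combinations(seqs, 2):
        diffs = sum(1 for a, b in zip(s1, s2) if a != b)
        total += diffs / length
        n_pairs += 1
    return total / n_pairs


def sliding_window_pi(seqs, window=100, step=25):
    """Return (midpoint, pi) pairs; each value is assigned to the window midpoint."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        midpoint = start + window // 2  # 0-based offset of the window centre
        results.append((midpoint, nucleotide_diversity(chunk)))
    return results
```

Plotting the returned midpoints against their π values gives a diversity profile along the gene, from which peaks of variability can be read off.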

Construction of Phylogenetic Trees
Phylogenetic relationships between isolates were inferred by maximum-likelihood (ML) methods. The best fitting nucleotide substitution model with the lowest BIC score was determined using MEGA v.6.0 [20].
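Model selection by BIC can be illustrated with the usual definition BIC = -2 lnL + k ln(n), where lnL is the maximized log-likelihood, k the number of free parameters and n the sample size. The sketch below is an assumption-laden illustration rather than MEGA's procedure: the candidate names, log-likelihoods and parameter counts are placeholders, and the alignment length is used as n.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion; lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def best_model(candidates, n_sites):
    """Return the name of the candidate with the lowest BIC score."""
    return min(candidates, key=lambda m: bic(m["lnL"], m["k"], n_sites))["name"]


# Placeholder candidates; values are illustrative only, not results of the study.
candidates = [
    {"name": "model_simple", "lnL": -1000.0, "k": 10},
    {"name": "model_complex", "lnL": -990.0, "k": 20},
]
print(best_model(candidates, n_sites=822))
```

The penalty term k ln(n) means that a more complex model is preferred only if its gain in likelihood outweighs the cost of its extra parameters.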

Imperata cylindrica Harvest Campaigns Identify Up to 38 IYMV Isolates in South Western Burkina Faso
Within the frame of a 3-year harvest campaign in distinct areas of South-Western Burkina Faso (Figure 2), a total of 38 samples of I. cylindrica leaves were analyzed for Imperata yellow mottle virus detection.
As expected, RT-PCR on total RNA from infected plant material resulted in the amplification of DNA fragments of about 1000 bp for all samples listed in Table 1. PCR amplifications representative of different plants are shown in Figure 3.

Recombination Analysis
These four potential recombinants were detected by only one or two methods of the RDP program, with a low degree of confidence. In addition, one of the parental isolates was often unknown. Based on the selection criteria for recombination events, these potential recombination events were not accepted. No evidence of potential recombination events was found among the other isolates using RDP4.

Sequence Analysis
The average genetic diversity among the 39 sequences listed in Table 1 was 4.6% at the nucleotide level, with a peak (7.6%) of nucleotide substitutions per site located in the 5' half of the gene, which encodes the N-terminal region of the protein (Figure 5). The average number of nucleotide substitutions per synonymous site was high (πs = 0.164), 18 times higher than that per nonsynonymous site (πa = 0.009), i.e. a ω ratio (πa/πs) of 0.07. The maximum nonsynonymous and synonymous diversity between any two sequences was 2.1% and 25%, respectively.
As ω < 1, the CP sequences appear to be under strong purifying selective constraints. The Z test was highly significant (P < 0.001) and confirmed that diversification of the CP gene of the Burkina Faso (BF) isolates occurred under strong purifying selection. Fisher's codon-based exact test implemented in MEGA v.6.0 [20] gave no evidence of positive selection (data not shown). The alignment of the 39 IYMV sequences comprised 822 nt encoding 273 amino acids. The 273 aa residues were dominated by hydrophobic amino acids.
Analyses of the polymorphic sites among the sequences of the Burkina Faso isolates revealed 136 variable sites at the nucleotide level and 24 at the amino acid level. Indeed, 13% and 10% of the amino acid changes resulted from mutations at the 1st and 2nd nt positions of codons, respectively. We also noted that the conserved amino acid sequence of the IYMV CP exhibits several features common to sobemoviruses. The N-terminal region is rich in basic amino acids and contains an arginine

Phylogenetic Analysis
A total of 39 CP gene sequences were analyzed. The phylogenetic relationships among the sequences were inferred using maximum-likelihood methods (Figure 6). The 39 CP nt sequences segregated into six clades.

Discussion
To assess the genetic diversity of IYMV, we compared CP sequences of 39 iso-  [24]. Because Imperata cylindrica is a perennial, nonfood grass designated as a noxious weed in agricultural and nonagricultural fields in West Africa, its propagation material is not exchanged. Spread via rhizomes is the main mechanism of dissemination of Imperata cylindrica, although some research has indicated spread by seed dispersal [25]. Therefore, Imperata cylindrica cannot spread over very long distances. The sequence heterogeneity among IYMV isolates could be explained by the great potential for genetic variation in Imperata cylindrica reported recently [26].
Indeed, perennial grasses survive for a long time in nature, adapting to different environmental conditions, which sometimes leads to the development of new ecotypes. This is also true of viruses that infect perennial plants, which must maintain themselves and adapt to new environmental conditions [26]. During adaptation to new conditions, the multiplication of viruses is accompanied by various mutations because their RNA-dependent RNA polymerase lacks a repair mechanism [21]. In addition, it has been reported that purifying selection often results in amino acid changes that modify functions or structures involved in genome protection, cell-to-cell movement, transmission between plants, and interactions with the host and/or vector [22].
Although IYMV CP sequences are under strong purifying selective constraints, our analysis revealed that the structuring into six phylogenetic clades was associated with amino acid changes, particularly in the R-domain region of the coat protein (positions 1-66) (Table 2).

Conclusion
This is the first study of the genetic diversity of IYMV in Burkina Faso, and we expect it to contribute to a better understanding of IYMV evolution and epidemiology in the country. In addition, diagnosis using the specific primers will be a useful tool for population structure studies of IYMV in Burkina Faso. Although these results are a prerequisite for the further management of Imperata yellow mottle disease, it would be interesting to study the genetic diversity in neighboring countries such as Mali and Benin, where the presence of IYMV has been suspected (data not shown). As the overall diversity of IYMV is low, it would be interesting to obtain the complete sequences of other viral proteins from different isolates representative of the six clades. In this context, it will be particularly interesting to sequence the P1 protein, as it has been demonstrated for Rice yellow mottle virus (RYMV) [29] that P1 displays the highest diversity in the RYMV genome, and the VPg protein, which is the major determinant of resistance breaking in RYMV [30].