Solanaceae Evolutionary Dynamics of the I2-NBS Domain

In the Solanaceae family, several plant genes for resistance to pathogens (R genes) have been mapped and cloned. Most of them encode nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins. However, little is known about the pattern of variability of resistance genes and the evolutionary processes acting on different species belonging to the same family. The aim of the present work was to genotype fifty wild tomato accessions and study their evolutionary relationships using I2 resistance gene sequences. Thirty-three new candidate I2 homologue nucleotide sequences were obtained from wild tomato species. Nucleotide polymorphisms in the I2-NBS domain were detected in wild tomato species: diversity could have accumulated over a long time, and species sorting could have produced new variants. To study the variability of the NBS-LRR domain, we analyzed the evolutionary process acting on the amino acid sequence. The FEL method (a codon model) based on dN/dS was used to estimate the presence of positive, negative and neutral selection acting on each codon. The I2-NBS domain sequences studied seem to be under general purifying selection. However, sites under intermittent bouts of positive selection were detected in highly variable regions. Phylogenetic analysis conducted within the Solanaceae family shows that the Solanum genus is undergoing rapid adaptive divergence, while Nicotiana and Capsicum cluster separately; Solanum peruvianum, in particular, proved to be the most polymorphic species. These results might be important for the identification of new sources of resistance genes to tomato pathogens.


Introduction
Resistance to pathogen attack in plants is regulated by genes that confer resistance to a specific pathogen. The relationship between plant and pathogen involves plant resistance genes (R genes) and pathogen avirulence genes (Avr genes) [1]. R genes enable the plant to recognize the presence of specific pathogens and initiate defense responses [2]. The R gene structure plays a key role in resistance, in some cases encoding specialized receptors that recognize the corresponding avirulence (Avr) products of pathogens [3,4]. Most R genes belong to an ancient family that encodes proteins with nucleotide-binding site (NBS) and leucine-rich repeat (LRR) domains [5][6][7] and can therefore also be detected in silico [8]. The NBS has proved quite useful for localizing, identifying and studying resistance genes through PCR-based cloning strategies, and for retrieving them from public databases [9,10].
The ability of plants to evolve under biotic pressure requires an in-depth understanding of R gene organization and evolution. Domestication and early breeding led to the loss of some important features, including resistance to diseases and pests. Therefore, the attention of breeders turned to the wild relatives of cultivated plants [11]. Important sources of resistance genes to tomato pathogens are reported in wild species [12]. The I2 gene, identified in S. pimpinellifolium (L. pimpinellifolium), encodes an NBS-LRR protein that confers resistance to race 2 of Fusarium oxysporum f. sp. lycopersici, the causative agent of vascular wilt disease in many Solanum species [13]. The I2 locus is located on the long arm of chromosome 11 and contains at least 7 members. One member (I2C-1) confers partial resistance to race 2 of the pathogen and another member (I2C-K) confers complete resistance [13,14]. It is likely that other Solanaceae species harbor R genes related to I2. For example, the R3a gene on S. demissum chromosome 11 is an I2 homolog conferring resistance to Phytophthora infestans [15]. The exploration of natural biodiversity in wild tomato species could play a critical role in discovering other genetic sources of resistance genes to overcome new pathogen variants. This work was designed to achieve several aims: first, to identify resistance gene homologs (RGHs) in wild tomato species and analyze the polymorphism level of the I2-NBS domain; second, to evaluate models of I2 gene evolution and the selection acting on specific amino acids; finally, to identify essential residues that might be involved in the evolution of new specificities or in regulating defense activation.

PCR and Sequencing of the I2-NBS Domain in Wild Tomato Species
The I2 gene (AF118127) was analyzed with the InterProScan tool (http://www.ebi.ac.uk/InterProScan/) to identify resistance gene domains. NBS domain-specific primers were designed with Primer3 (http://frodo.wi.mit.edu/cgibin/primer3/primer3_www.cgi): I2F: CTGAAGGATTTGATGCTTTG and I2R: GTCTTCCGACCTCTTCAAGT. PCR was performed with 25 ng of genomic DNA, 10 pmol primers, 1 U of Taq DNA polymerase (Invitrogen), 10 pmol dNTPs and 2 mM MgCl2 in 25 μl reaction volumes. Amplification was performed using the following cycling conditions: 1 min at 94°C, followed by 30 cycles of 1 min at 94°C, 1 min 30 s at 60°C and 2 min at 72°C, and a final 7 min at 72°C. The PCR products were separated on 1.5% agarose gels and purified using the High Pure PCR Product Purification Kit (Roche). Sequencing reactions were performed according to the ABI PRISM® BigDye™ Terminator v3.1 Ready Reaction Cycle Sequencing Kit protocol. The samples were purified using 1/10 vol of 3 M sodium acetate and 2.5 vol of ethanol (95%). An ABI PRISM® 3100-Avant Genetic Analyzer was used for sequencing. All sequencing runs were repeated three times to avoid any PCR artifacts. Low-quality sequences were excluded. Identical sequences were recorded and one representative was used for phylogenetic analysis. All the unique sequences have been deposited in GenBank with accession numbers HM101253-HM101274.
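As a quick sanity check on the primer pair reported above, the following minimal Python sketch computes length, GC content and a rough melting temperature for I2F and I2R. The Wallace rule (2°C per A/T, 4°C per G/C) is an assumption introduced here for illustration only; it is not the thermodynamic model used by Primer3, and the function names are hypothetical.

```python
# Rough primer statistics for the I2F / I2R pair given in the text.
# Assumption: the Wallace rule is used as a crude Tm estimate; Primer3
# itself relies on a more detailed thermodynamic model.

PRIMERS = {
    "I2F": "CTGAAGGATTTGATGCTTTG",
    "I2R": "GTCTTCCGACCTCTTCAAGT",
}


def gc_content(seq):
    """Fraction of G and C bases in the primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Crude melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement, e.g. to locate the reverse primer on the template strand."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), f"GC={gc_content(seq):.2f}", f"Tm~{wallace_tm(seq)}C")
```

Under this crude estimate both primers fall within a few degrees of the 60°C annealing step used in the cycling conditions.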

Analysis of Intraspecific/Interspecific Level of Variability
Divergence between the different populations/species and the I2 reference gene was evaluated using DnaSP 3.0 [18]. The divergence is expressed as the number of net nucleotide substitutions per site between populations (Da) along the nucleotide sequence alignment.
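The net divergence Da is commonly defined as the average between-population distance minus the mean of the two within-population distances, Da = Dxy - (πx + πy)/2. The sketch below illustrates that definition in Python on toy sequences. It is a simplified illustration under stated assumptions: it uses uncorrected p-distances and skips gapped positions, whereas DnaSP may apply a substitution correction; the function names and the example sequences are hypothetical.

```python
# Net nucleotide divergence between two groups of aligned sequences.
# Assumptions: uncorrected p-distance, positions with a gap in either
# sequence are ignored, all sequences have the same aligned length.
from itertools import combinations, product


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def mean_within(pop):
    """Average pairwise distance inside one group (0 for a single sequence)."""
    pairs = list(combinations(pop, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop_x, pop_y):
    """Average distance over all pairs drawn from the two groups (Dxy)."""
    pairs = list(product(pop_x, pop_y))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def net_divergence(pop_x, pop_y):
    """Da = Dxy - (pi_x + pi_y) / 2."""
    return mean_between(pop_x, pop_y) - (mean_within(pop_x) + mean_within(pop_y)) / 2


if __name__ == "__main__":
    species = ["AAAA", "AAAT"]   # toy accessions of one species
    reference = ["TTAA"]         # toy stand-in for the reference gene
    print(net_divergence(species, reference))
```

Applying the same function to successive windows of the alignment would give a per-position divergence profile of the kind plotted against the reference gene.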

Dataset Building and Multiple Alignments
A

Multiple Alignments
All multiple alignments were generated using ClustalW [19] with the default settings and manually edited with BioEdit 7.0 [20].
The nucleotide multiple alignment derived from the I2 homologs identified in wild tomato species was used as a backbone to align the corresponding amino acid datasets. Pairwise comparisons and multiple alignments were performed using MEGA3 [21] and BioEdit 7.0. A dataset of unique sequences was generated for both nucleotide and amino acid sequences. DnaSP 4.0 [18,22] was used to explore nucleotide polymorphisms in our wild tomato sequences.

Phylogenetic Analysis
Nucleotide phylogenetic analysis was conducted on the I2 dataset. A dataset of unique sequences was obtained through the analysis of nucleotide phylogenetic trees. The GTR evolutionary model [23] was applied using the PHYML v2.4.4 program [24]. Non-parametric bootstrap analysis [23] was performed to test the robustness of the tree topologies (1000 replicates). Trees were visualized with the Geneious software (Copyright © 2005-2007 Biomatters Ltd.). The JTT + T model was selected by ProtTest [25] and was used as the evolutionary model setting for the PHYML v2.4.4 program [24] to create the amino acid phylogenetic tree (1000 bootstrap replicates).

Intraspecific and Interspecific I2 Variability in Wild Tomato Species
A specific PCR approach targeting the I2-NBS domain was chosen to identify homologous genes in wild tomato species. A total of 33 wild tomato accessions were successfully sequenced and the data were used for the following analyses. Twenty-two unique sequences were submitted to the GenBank database, as reported in Materials and Methods. A total of 11,187 nucleotides of the I2 gene were obtained. BlastN analysis revealed at least 96% identity with the tomato reference gene for each wild tomato species. Multiple alignment of the single 339-bp fragments was performed, revealing several polymorphic sites (data not shown). The initial alignment required manual editing in order to minimize gaps. Figure 1 shows the nucleotide variability trend of each species compared to the I2 reference gene. All species (except S. neorickii and S. pimpinellifolium) showed a low level of variability from nucleotide 0 to 143 and a generally hypervariable region (from nucleotide 163 to 323). In particular, two maximum peaks at positions 163 and 263 were identified. The intraspecific polymorphism level was evaluated for 4 species. Table 2 reports the following data: number of haplotypes, haplotype diversity (Hd), Eta (number of single nucleotide polymorphisms), number of segregating sites (S) and number of nucleotide differences per site (Pi). According to our results, S. peruvianum showed the highest intraspecific variability, and S. habrochaites showed the highest variability expressed as number of segregating sites and number of single nucleotide polymorphisms.
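The summary statistics listed for Table 2 can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not a reimplementation of DnaSP: the alignment is assumed to be gap-free, haplotype diversity follows the usual formula Hd = n/(n-1) · (1 - Σ p_i²), Eta is counted as the total number of mutations (distinct nucleotides minus one, summed over sites), and the function name and toy sequences are hypothetical.

```python
# Basic polymorphism statistics for a set of aligned, gap-free sequences.
# Assumptions: equal-length sequences, no gaps or ambiguity codes; DnaSP's
# handling of such sites may differ from this sketch.
from collections import Counter
from itertools import combinations


def polymorphism_summary(seqs):
    """Return haplotype count, Hd, S, Eta and Pi (per site) for an alignment."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])

    # Haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies).
    counts = Counter(seqs)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

    # Segregating sites (S) and total number of mutations (Eta).
    columns = [set(col) for col in zip(*seqs)]
    s = sum(len(col) > 1 for col in columns)
    eta = sum(len(col) - 1 for col in columns)

    # Nucleotide diversity: mean pairwise differences divided by alignment length.
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    pi = diffs / len(pairs) / length

    return {"haplotypes": len(counts), "Hd": hd, "S": s, "Eta": eta, "Pi": pi}


if __name__ == "__main__":
    toy_alignment = ["AAAA", "AAAA", "AAAT", "CAAG"]  # invented sequences
    print(polymorphism_summary(toy_alignment))
```

In the toy alignment the last column carries three different nucleotides, so it counts once toward S but twice toward Eta, which is why the two values can differ for the same species.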

Solanaceae I2-NBS Sequences Catalogues
BlastN analysis against the NCBI database was conducted to collect the Solanaceae sequences with the highest level of identity to the I2-NBS reference gene. A total of 84 published Solanaceae RGAs were retrieved.

Phylogenetic Analysis of Nucleotide Sequences
Phylogenetic analysis based on nucleotide sequences was conducted on NBS-I2-ALN using a likelihood approach. Nucleotide phylogenetic trees were inferred using the GTR evolutionary model. A low level of distance in terms of nucleotide substitutions was found among all accessions (as indicated by the scale bar). The reliability of the tree was then assessed with 1000 bootstrap replicates. The resulting phylogenetic tree can be divided into 9 main groups (A-I) (Figure 2). Most of the wild tomato sequences cluster in group A. This group contains only Solanum sequences (including the reference functional gene I2). Group B is defined by five wild tomato sequences obtained in this work. Cluster C includes the potato late blight resistance protein R3a gene (STAY849382) and the tomato I2 paralogue (LEAF 004879), which cannot be easily positioned. Cluster D contains only pepper (C. annuum) sequences (bootstrap value 999). Group E includes a few Solanum sequences that are difficult to cluster and one N. tabacum sequence. Finally, groups F, G, H and I contain only N. tabacum sequences.

Analysis of Selection on Individual Codons within the NBS Domain
In order to study the variability of the NBS-LRR domain, we analyzed the type of evolutionary process acting on the amino acid sequence. The FEL method (a codon model) was used to estimate the presence of positive, negative and neutral selection acting on each codon. The analysis of selection on individual codons suggested a general process of purifying selection. Figure 3 reports a graphical representation of the ω trend. Most sites are under negative selection, and only a few sites, within this background of purification, had a normalized dN/dS value > 0. In particular, using a 0.05 significance level, we found 65 sites under negative selection (ω < 1) and a few codons under moderate positive selection. The amino acid sequence between positions 32 and 52 seems to be characterized by a large number of codons under negative selection. The next region, instead, is more variable. Using a more stringent threshold (a probability value of 0.1%), three amino acids under positive selection were found at positions 16, 55 and 80.

Copyright © 2012 SciRes. AJPS

Discussion
The growing number of R genes cloned in the last 20 years offers the opportunity to study the evolutionary dynamics of this gene class. The ability of plant species to survive over evolutionary time might depend on their ability to maintain and usefully generate diversity at resistance loci [29]. R genes are often members of tightly linked multigene families, which can be functionally diversified. There has been speculation on the forces that play a key role in the evolution of R genes, allowing plants to generate novel resistance to match the changing patterns of pathogen virulence. Sequence comparisons among these genes have revealed remarkable similarities in general structure, and variability in specific domains that participate in protein-protein interaction and signal transduction [30]. The evolution of resistance genes remains largely unexplored, but useful information has recently been gained from molecular and genetic analysis [10,31,32]. Slight nucleotide variation at some strategic positions could have a very dramatic effect on protein function in intra- or inter-molecular activity and hence on the resistance response to pathogens and pests [33].
In our work we explored the putative presence of I2 alleles in 6 tomato species. The PCR approach showed that at least one amplicon is reproducible in all analyzed species. Considering the geographic distribution and the mating system of each species, we were not surprised to detect nucleotide polymorphism among the species tested and versus the reference gene. S. neorickii and S. pimpinellifolium displayed the highest net nucleotide substitution per site. The nucleotide diversity in S. habrochaites, S. peruvianum, S. chilense and S. corneliomulleri, instead, showed a conserved nucleotide sequence core and a more variable core. The biological function of the conserved nucleotide sequence, as well as of the hypervariable nucleotide spot, could be an important basis for further plant-pathogen research [34]. The heterogeneity of nucleotide polymorphism along the I2 sequences implies that diversity is also important in the evolutionary history of single species.
Our investigation revealed a high level of diversity within S. peruvianum accessions. This population displayed the highest variability in terms of mutational sites and diversity (substitutions, insertions, deletions) and haplotype diversity (Hd), whereas S. habrochaites showed the highest value of Theta (S, Pi, Eta), which is probably related to the mating system. S. peruvianum is a self-incompatible species and S. habrochaites a facultative outcrossing species; both are hence subject to greater genetic shuffling. The high level of polymorphism identified in these species for R genes is consistent with molecular phylogenetic data previously reported [35]. Molecular analyses of wild tomato species have shown that genetic variation within species decreases with an increasing degree of selfing [36][37][38]. S. peruvianum is reported to be the ancestral species of wild S. lycopersicon species [39]. Consequently, the huge potential for maintaining R gene diversity within a population could have accumulated over a long period of time. Lineage sorting of the polymorphism of S. peruvianum and the emergence of new variants after speciation events could have shaped the R gene diversity found in other species. The I2 locus originating from S. pimpinellifolium consists of a cluster of seven paralogous sequences on chromosome 11 [14]. An orthologous member from S. pennellii that confers partial resistance to F. oxysporum f. sp. lycopersici race 2 was also identified [40].
I2 paralogous sequences were also found on chromosomes 8 and 9 and in three regions along chromosome 11 [10,13]. Recombination hot spots are reported in several R gene loci. This may be partially responsible for the scatter of nucleotide polymorphism among loci and species [41]. R gene polymorphism is an important component of variation for resistance to pathogens, and new insights can be gained by investigating the genealogies of these genes. Hence our next efforts were aimed at increasing the amount of information on the architecture of diversity in I2 genes in the Solanaceae family. We identified putative I2 orthologues in several Solanaceae species that are hosts of F. oxysporum (S. demissum, S. tuberosum, S. caripense, S. melongena, C. annuum, N. tabacum) by in silico analysis. The multiple alignments obtained for both nucleotide and amino acid sequences presented enough sequence similarity to build phylogenetic trees. In general, the phylogenetic relationships shown by the I2 tree reflect the division of Solanaceae species [42]. N. tabacum is the most distant Solanaceae species, followed by Capsicum (a separate group) and by tomato and potato, which always cluster together. The R3a gene may have originated slightly before the I2 functional gene, which is evolving rapidly in all Solanum species. The few S. melongena sequences were spread along the dendrogram. The fact that incompatible species generally consist of numerous heterogeneous populations [43] could explain this behavior. Moreover, a small group of five accessions, including two S. peruvianum, two S. habrochaites and one S. pimpinellifolium, showed specific nucleotide features.
Previous studies on the phylogenetic analysis of I2 genes reported that evolutionary forces act on the NBS domain of this gene in agreement with the "birth-and-death hypothesis" pattern of evolution [7], showing gene clustering between related species as well as between distant species [34]. Our data support this hypothesis, as clearly shown in the phylogenetic tree, where three main clades are evident: one clade containing only ancestral Nicotiana sequences, one containing only Capsicum sequences and a wider clade containing sequences from different Solanum species, indicating that a more recent divergent selection is shaping the evolution of the I2 domain. The identification of genes undergoing adaptation plays a key role in understanding evolutionary biology [44]. Sequence variability could be the result of random drift or could involve an "evolutionary selection process". In our paper we applied all the strategies required to avoid false positive results and we applied models that maximize the detection of sites subject to selection. The ω ratio is, for its simplicity and robustness, one of the most widely used evolutionary tests [32,[45][46][47]. The portion of the I2 protein characterized in this study generally presents an ω below 1. This indicates that purifying selection acts as a functional constraint on the evolution of the DNA sequence. Against this global background of purifying selection, three residues with a positive value of ω were found. Couch et al. (2006) [34] identified, in a different Solanaceae dataset, a higher number of codons under positive selection in the I2-NBS domain. In the three-dimensional conformation, the three positively selected sites are located in external areas. They could be involved in inter- or intra-molecular interactions necessary for appropriate binding activities or in the negative regulation of defences in the absence of a pathogen effector [32]. Purifying selection may be operating to remove deleterious substitutions or keep them at low frequencies; single amino acid variation could be maintained by neutral selection and by intermittent bouts of positive selection. The translated amino acid sequences did not contain stop codons, suggesting that all the variants could be functional if a recognition event acts as a switch.
In conclusion, R gene allelic diversity in wild plant populations is part of a complex evolutionary process for species survival. Selection for novel or diverse pathogen recognition capabilities is an important factor in species success. In the evolution of wild tomato species, other important factors such as environmental conditions and mating system should be taken into account. Wild tomato species show peculiarities at the level of single accessions. The pattern of evolved differences suggests that the I2 gene is undergoing a process of rapid adaptive divergence in the Solanum genus. Purifying selection serves to keep the NBS cores stable, and neutral or positive selection ensures single amino acid variation. Elucidation of the selection mechanism acting on this domain could help to design new crop improvement strategies for the future.
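The per-codon interpretation of ω discussed above (ω below 1 read as purifying selection, dN exceeding dS read as positive selection, at a chosen significance level) can be sketched in Python as follows. This is only an illustration of how per-site estimates might be labelled once they are available; it does not perform the FEL likelihood ratio test itself. The record layout (codon index, dN, dS, p-value), the function names and all numeric values are hypothetical.

```python
# Label codons as positive / negative / neutral from per-site dN, dS and a p-value.
# Assumption: the estimates and p-values come from an external codon-model
# analysis; this sketch only applies the decision rule.


def classify_site(dn, ds, p_value, alpha=0.05):
    """Return the selection label for one codon at significance level alpha."""
    if p_value >= alpha:
        return "neutral"      # no significant departure from dN = dS
    if dn > ds:
        return "positive"     # omega > 1
    if dn < ds:
        return "negative"     # omega < 1, purifying selection
    return "neutral"


def summarize(sites, alpha=0.05):
    """Group codon indices by selection label."""
    summary = {"positive": [], "negative": [], "neutral": []}
    for codon, dn, ds, p_value in sites:
        summary[classify_site(dn, ds, p_value, alpha)].append(codon)
    return summary


if __name__ == "__main__":
    # Invented example values: (codon index, dN, dS, p-value)
    toy_sites = [(1, 2.1, 0.3, 0.0005), (2, 0.1, 1.8, 0.01), (3, 0.5, 0.6, 0.40)]
    print(summarize(toy_sites, alpha=0.05))
    print(summarize(toy_sites, alpha=0.001))
```

Tightening the threshold moves weakly supported sites into the neutral class, which is why fewer selected sites are reported at the stricter probability value than at the 0.05 level.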

Figure 1. Analysis of DNA divergence among wild tomato species sequences. Each species is compared with the I2 reference gene sequence. The divergence is expressed as the number of net nucleotide substitutions per site between populations (Da). The nucleotide positions are shown in ordinate.

(AF408704) and S. lycopersicum I2C-2 resistance (AF004879) showed 99% identity with the reference gene previously described as members of the I2 cluster locus, as did the potato late blight resistance protein R3a gene (AY849382). A nucleotide alignment (NBS-I2-ALN) of the unique NBS domain sequences, 25 derived from our work and 56 from the NCBI database, was created. The NBS-I2-ALN is 336 positions long with 22 gaps. In particular, 6 gaps (2 codons), from position 38 to 44, are found only in 4 C. annuum genotypes (DQ205996; DQ206012; DQ205998; DQ205980). Two tomato accessions (AF004879 and AF534287) showed 3 gaps at the same positions (51-53); one N. tabacum and 2 S. melongena genotypes (DQ206210, DQ206026 and DQ206073, respectively) displayed 6 gaps at positions 69-78 (data not shown). Nucleotide sequences were translated into amino acid sequences according to the frame of the reference resistance gene. The new dataset of 67 unique protein sequences was used for further analysis.

Figure 3. Evolutionary process in the I2-NBS domain in Solanaceae species. The trend of ω (normalized dN/dS) calculated according to the FEL model using the 0.05 significance level. The I2 resistance gene amino acid sequence was used as the reference sequence.

Table 2. Intra-specific polymorphism analysis of the I2-NBS domain. Variability is expressed as N (number of haplotypes), Hd (haplotype diversity), Eta (number of single nucleotide polymorphisms), S (number of segregating sites), and Pi (nucleotide differences per site). The standard deviation is also indicated (D.S.). S. neorickii and S. pimpinellifolium were not included in this analysis because not enough sequences were collected.
ST = Solanum tuberosum, SC = Solanum caripense). The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) was used to translate all nucleotide sequences into amino acid sequences. The I2 amino acid sequence (reference gene) was used as a template to identify the correct reading frame.
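The translation step described above can be illustrated with a small Python sketch using the standard genetic code. It is not the ExPASy tool, and it swaps in a simpler criterion for frame choice: instead of comparing against the I2 reference protein, it picks the forward frame with the fewest stop codons. That criterion, the function names and the example sequence are assumptions made for illustration.

```python
# Translate a nucleotide sequence in a chosen forward frame (standard genetic code).
# Assumptions: alignment gaps are removed before translation, unknown codons
# are reported as "X", and only the three forward frames are considered.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=0):
    """Translate seq starting at offset frame (0, 1 or 2); '*' marks a stop codon."""
    seq = seq.upper().replace("-", "")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def best_frame(seq):
    """Forward frame with the fewest stop codons (ties go to the lowest offset)."""
    return min(range(3), key=lambda frame: translate(seq, frame).count("*"))


if __name__ == "__main__":
    toy = "TAAATGGCC"  # invented sequence: frame 0 starts with a stop codon
    frame = best_frame(toy)
    print(frame, translate(toy, frame))
```

A frame that yields no stop codons across all variants is consistent with the observation that the translated sequences contained none, although matching against the reference protein remains the more reliable check.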