Tracking Selection Signatures Based on Variation in OsLEA27 within Myanmar Landraces of Upland and Dryland Rice

To track the selection evident along the genome segment of OsLEA27 gene, a member of dehydrin gene family, 2.9-kbp nucleotide sequence containing the promoter 5’ upstream and transcribed region of OsLEA27 was determined for 35 upland and dryland Myanmar landraces from droughtprone areas. Nucleotide diversity, neutrality tests, haplotype network analysis, and linkage disequilibrium (LD) analysis were performed to infer the impact of selection and to investigate nonrandom associations of SNPs within all or part of the entire OsLEA27 region. The evidence for LD, the presence of two distinct haplotype groups across four different geographical regions, and the significant values obtained in a sliding-window analysis of mutation-drift tests all suggest the effects of selection on OsLEA27 in a set of 30 landraces. The neutrality test values for 5’ upstream region of OsLEA27 were significantly negative (p < 0.05) within the set of 30 accessions. The entire OsLEA27 region was significantly negative in accessions of the northern group, indicating a recent increase in population size or selection pressure. This evidence for selection signatures at OsLEA27 in this study sample provides insight into the roles of selection, crop adaptation, and genetic diversity in establishing present-day variation at the OsLEA27 locus.


Introduction
Rice (Oryza sativa L.) is well adapted to semiaquatic conditions but highly sensitive to water shortage during its growth and pre-reproductive period.Adaptation to water-shortage conditions is a desirable trait, especially for farmers from drought-prone areas.Studies of rice from drought-prone regions would open avenues to understanding of the influence of selection on the genetic variation found in such materials.
In Myanmar, rice landraces are generally grown in two different ecosystems: upland and dryland.The term "dryland rice" refers to landraces or local varieties that have high tolerance to water deficiency whereas "upland rice" refers to those landraces or local varieties having fair-to-good levels of drought tolerance.Since each of these ecotypes has distinct characteristics, landraces or local varieties from both upland and dryland regions are valuable resources for future utilization as germplasm and for study of drought tolerance.Ecosystems, morphological variations and complex physiological responses, local conditions, and poor cultural practices and management have played a vital role in shaping the diversity patterns of drought tolerance genes through natural selection [1]- [3].
During dehydration, cells undergo great physical and chemical changes, which may become irrecoverable and ultimately lead to cell death.Desiccation tolerance is a trait associated with LEA proteins [4].Desiccation-tolerant species synthesize group II LEA proteins known as dehydrin proteins [5] [6], RAB (responsive to ABA), or D-11.Dehydrins are induced by abiotic stress.Under conditions of dehydration, the presence of dehydrin proteins causes deformation of the cell and prevents it from rupturing during drastic fluctuations in plant water content [7].
Wang et al. identified eight dehydrin genes, OsLEA22 to OsLEA29, with bioinformatics approaches and performed expression analysis in rice.Expression of OsLEA27 and OsLEA28 was up-regulated by drought treatment as revealed by microarray analysis.However, the expression of OsLEA28 under both the normal conditions and the treatments could not be detected by sqRT-PCR [8].Therefore, we used the OsLEA27 gene to study the role of genetic variation in Myanmar upland and dryland rice landraces.
The regulation of dehydrine gene expression in rice has been extensively studied, and cis elements in the dehydrine promoters have been identified because dehydrin genes are important for tolerance to abiotic stress.An understanding of the nucleotide diversity and selection signatures (i.e., evidence of selection) in the OsLEA27 gene within upland and dryland rice landraces would provide information to know the molecular nature of adaptation from the view point of population and evolutionary genetic.The contribution of selection in shaping the genetic variation among individuals in landraces would provide valuable information because some kind of selection events associated with adaptation and the evolution of the new profile of species [9]- [11].For these reasons, focusing on the role of selection in shaping the patterns of diversity in OsLEA27 may be also one approach to know an evolution of drought tolerance trails (adapting to domestication or strong selection pressure) within Myanmar upland and dry-land rice landrace.The objectives of this study were to investigate nucleotide diversity and linkage disequilibrium (LD) and to examine the possible factors that influence selection on the OsLEA27 gene within Myanmar upland and dryland rice landraces.Additionally, tracking along the genome segment of OsLEA27 gene to detect signs that potentially indicate the action of natural or artificial selection within specific drought faced population is the main theme of this study.

Materials and Phenotyping
A total of 35 Myanmar rice landraces were used in the present study.These included 27 locally wide adapted landraces from upland and dryland areas (seven accessions from the central dry zone regions and the remaining 20 from northern and western upland regions) and accepted from Department of Agriculture, Myanmar.Phenotyping of drought adaptable traits was conducted in Plant Biotechnology Center, Myanmar at summer season.These 27 accessions were averagely assumed as an intermediate level of tolerance to drought accordingly to their panicle harvest index; 60 and above 60 percentage and yield per plant; about 8.4 g, by comparing with the lowland cultivar during the treatment of drought condition.Moreover, eight rainfed and irrigated local cultivars; five high-eating quality rice cultivars (panicle harvest index; 40) and three traditional local cultivars (panicle harvest index; less than 40) from the Ayeyarwadee delta were added (Supplementary Table 1).

Genomic DNA Extraction, PCR Amplification and Sequencing
The seeds were acclimatized at room temperature for 2 days, and then seed dormancy was broken by placing the seeds in an oven at 50˚C for 5 days.Plants of each accession were grown in a controlled environment of 28˚C and 14-h light conditions.At three week old seedling stage, DNA was extracted by using a modified CTAB method [12] and used for PCR amplification of the genomic region of OsLEA27.To amplify a 2.9-kbp region containing the promoter and transcribed region of OsLEA27, the six primer sets (Supplementary Table 2) were designed based on Os11g26760.1 locus (genbank NC008404.2,17154224 to 17157109) using Primer 3 (http://bioinfo.ut.ee/primer3-0.4.0/primer3/).Amplifications were carried out as one initial denaturation step at 95˚C for 5 min, followed by 40 cycles of denaturation step at 95˚C for 30 s, primer annealing at 55˚C -60˚C for 30 s, and primer extension at 72˚C for 30 s.
Sub-cloning of PCR products and transformation of competent E. coli were done according to the pGEM-T Easy Vector System Technical Manual (TM042) by using a T-Vector cloning kit (A1360; Promega Corporation, Madison, WI, USA).DNA sequencing was performed using an ABI PRISM Genetic Analyzer 3130 (Applied Biosystems Japan, Tokyo, Japan) according to the manufacturer's instructions.

Sequence Alignment and Statistical Analysis
The OsLEA27 gene sequences from 35 rice accessions, Nipponbare and Kasalath were aligned and compared using BioEdit software v.7.0.9.0 [13].Based on the pairwise differences, a neighbor-joining bootstrap consensus tree was constructed using MEGA v.4.1 [14].The proportion of segregating sites (θ = 4Neμ, where Ne is the effective population size and μ is the per-generation mutation rate [15], the average number of pairwise nucleotide differences among sequences (k) [16], the average number of pairwise nucleotide substitutions per site between two sequences (i.e., the nucleotide diversity π) [16], the number of polymorphic sites(S), the number of different haplotypes (H), and the haplotype diversity (Hd) [17] were computed by DnaSP v5 [18].Haplotype networks, representing unique alleles separated by mutational steps, were constructed using the Network 4.5.1.6program (http://www.fluxus-technology.com/sharenet.htm) with the median-joining method [19].LD within and upstream of the OsLEA27 gene was measured as r 2 (squared allele frequency correlation) [20] using DnaSP v5 and TASSEL v2.1 [21].Fisher's exact test and χ 2 test implemented in the DnaSP v5 program were used to calculate the P values for all pairwise comparisons of polymorphic sites within and upstream of the OsLEA27 gene.Tajima's D [22] and Fu & Li's D * and F * [23] were also calculated with DnaSP for testing deviation from neutrality.To understand whether different regions were subjected to different evolutionary forces, the neutrality tests were applied by a sliding-window method.Test statistics were analyzed on windows of 250 bp with a step size of 25 bp by using the function implemented in DnaSP v5.

Patterns of DNA Polymorphism in Various Regions of OsLEA27
We began by attempting to sequence the entire OsLEA27 region (i.e., the gene locus region plus the 5' promoter region) in 35 rice accessions.Five of the 35 accessions failed to show amplification of the 5' promoter region.Therefore, 35 sequences were used for gene locus region (890 bp) analysis but only the 30 sequences representing the entire OsLEA27 region (2885 bps) were used to search for selection signatures and for LD analysis.
In the 5' upstream region of OsLEA27 (1.9 kbp; NC008404.2(17,155,115 to 17,157,109)), which could be analyzed in 30 of the 35 accessions, there were 30 variable sites and π of 0.0029.Hd was 0.825 and H was 12.The average SNP frequency distribution was 1 SNP per 66 bp.Within the entire OsLEA27 region, 78 SNPs were found, with π = 0.00592.A higher level of nucleotide variation was found in the OsLEA27 gene locus (π = 0.0136, θ = 0.01481) than in the 5' upstream region (π = 0.00292, θ = 0.00387) (Table 1).In a sliding-window  analysis along the entire OsLEA27 region, one prominent peak of nucleotide variation was found in the intron region.No notable peaks were detected in the 5' upstream region (Supplementary Figure 1).When the landraces were grouped by geographical region into four subpopulations, the western upland (hereafter, "western") group showed the highest value of nucleotide diversity (π = 0.0127), followed by the central dryland ("central"), northern upland ("northern"), and delta (rainfed and irrigated groups) (Table 3).Similarly, the average number of pairwise nucleotide differences (k) within each subpopulation followed the same trend.The peak position for π was identical in the 30 accessions for which the entire OsLEA27 region could be analyzed, and divergence between accessions from the western group (highest value of nucleotide diversity (π); Table 3) and a combined group consisting of the remaining accessions in sliding-window analysis (Supplementary Figure 1).This finding implied that there were no significant nucleotide variations among landraces.Thus, the 30 landraces could not be classified by geographic region based on nucleotide sequence variation in the OsLEA27 region.  1 and Table 2).Although the frequency of two observed alleles was greater than the expected frequency within 15 segregating sites (Supplementary Figure 2), the Tajima's D, Fu & Li's D * test and Fu & Li's F * test did not detect any deviation from neutrality in the entire OsLEA27 region.

Tests of Neutrality in the Entire OsLEA27 Region
In contrast, sliding-window analysis of the mutation-drift equilibrium tests also showed that value of three neutrality tests were significant and were detected in the 5' upstream region.Moreover, positive values of D * that were significant at the 0.05 and 0.01 levels were detected in the intron region (Figure 1).The trends of D, D * , and F * were not uniform throughout the entire sequence because they are each calculated in slightly different ways.When analyzed by geographic location, accessions from the northern groups significantly deviated from selective neutrality (p < 0.05) for all three test statistics (Table 3).

Haplotype Diversity and Haplotype Network Analyses of the Entire OsLEA27 Region
Analysis of the entire OsLEA27 region revealed 16 haplotypes (Figure 2).Thirteen (81%) of the haplotypes were represented by a single landrace; these were designated as "private haplotypes".The highest Hd was found in landraces from the central region (Hd = 1, indicating that each one had a unique haplotype) and the lowest diversity was observed in accessions from the delta region (Hd = 0.524) (Table 3).Overall, the 16 haplotypes contained a total of 78 variable sites across all the accessions, which we further subjected to network analysis.The 16 haplotypes were categorized into two groups (I and II) by three parsimony informative sites (T55356G, C54628G, and C54567G).Group I was again divided into subgroups Ia (seven haplotypes) and Ib(Hap 8 only).Group II also comprised two subgroups (IIa and IIb) on the basis of four parsimony informative sites (C55543A, A55057G, T54983C, and T54258A) (Figure 3(a)).Group II contained six private haplotypes originating from three different regions along with one unique haplotype (Hap 9) represented only by accessions from the delta region.Hap 16 formed a separate group, distinguished from the two major haplotype groups by a high number of polymorphic sites.Approximately 50% and 37% of all the accessions had haplotypes from groups I and II, respectively, with the remaining 13% belonging to Hap 16 (Figure 2   performance of drought adaptable trait (panicle harvesting index; PHI% = about 70% or over 70%) except two accession AC-33 and AC-31 from delta region (Supplementary Table 1).Members of both major haplotype groups were found in all four geographic regions.

Linkage Disequilibrium in the Entire OsLEA27 Region
Linkage disequilibrium (LD) was studied by calculating the p value of Fisher's exact test for all pairwise comparisons of polymorphic sites.Figure 3 shows the decay of LD with increasing nucleotide pairwise distance for 30 rice accessions in the 5' upstream region (Figure 3(a)), the OsLEA27 gene locus (Figure 3(b)), and the entire OsLEA27 region (Figure 3(c)).For the 78 polymorphic sites in the entire OsLEA27 region, there were 2926 pairwise comparisons: 1318 (45%) showed significant LD between pairs of polymorphic sites.This percentage was higher than for the upstream region (27%) and lower than for the OsLEA27 gene locus (59%) (Table 4).The higher percentage of significant pairwise comparisons in the OsLEA27 gene locus relative to other regions is noteworthy and might represent a real difference from normal LD decay.Even more striking was the low frequency of significant comparisons in the 5' upstream region, normal decay of LD was detected along the entire OsLEA27 region.
The value r 2 is the squared allele frequency correlation to fit the logarithm curve across distances for all SNPs loci examined.In the 5' upstream region, the r 2 value for LD appeared to increase slightly with increasing distance (Figure 3(a)).In the OsLEA27 gene locus (890 bp), rapid decay of LD was found between 400 bp and 450 bp.Almost all of the significant pairwise comparisons (640 of the 1081 tests) were scattered within exon 1 and the intron region (Figure 3(b)).
The occurrence of rapid decay LD in OsLEA27 locus depends on the sample size and its population history.To investigate LD decay along the complete OsLEA27 region, 32 rice accessions (the 30 with available 5' region sequence, japonica control; Nipponbare and indica control; Kasalath) were used for phylogenetic analysis.
Four accessions were grouped with Nipponbare as a separate group.Fifteen accessions from group I haplotypes were clustered with Kasalath (data not shown).LD along the 5' upstream and OsLEA27 locus was determined by χ 2 tests for the 30 accessions.In the group I haplotypes composed with fifteen accessions, the numbers of polymorphic sites in the 5' upstream and OsLEA27 locus region were seven and six, respectively.Six of the 21 tests within the 5' upstream region were significant, whereas one out of 15 in the OsLEA27 gene locus region was significant (Figure 4).These results indicated that the large number of significant pairwise comparisons within the OsLEA27 locus region was caused by the dimorphic pattern (i.e., the clustering into two distinct groups).

Nucleotide Diversity, Phylogenetic Analysis, and Haplotype Networks in a Group of Myanmar Landraces
In general, selection, demography, effective population size, and mating are the factors that affect SNP frequency and nucleotide diversity, two important measures of genetic variation [17].In Myanmar upland and dryland rice landraces, a lower level of nucleotide variation was found in the 5' upstream region of the OsLEA27 gene (π = 0.00292, θ = 0.00387) than in the OsLEA27 gene locus region.Human selection may act on regulatory elements in the 5' upstream region that affect the transcription of OsLEA27, whereas purifying selection may affect  the coding region (which had a low level of variation and πnonsyn/πsyn < 1).High nucleotide variation in the intron region was found in the OsLEA27 gene locus and consequently affected the phylogenetic analysis.These segregating sites in the intron played a role in grouping the studied samples and shaped the genetic variation towards selection.The observed phylogenetic relationships and haplotype network analysis indicated mixing of the japonica subspecies into the predominantly indica gene pool of upland and dryland landraces.
Considering the entire OsLEA27 region, two distinct haplotype groups are found in rice from several distinct geographic areas, providing further evidence for directional selection.Hap 1 is the predominant haplotype due to its high frequency in accessions from northern upland, central dryland, and western upland regions.Accessions from Hap 1 showed high performance in testing the drought adaptable trait (Panicle harvesting index; PHI% = above 70%) (Supplementary Table 1).The predominance of this haplotype seems to be the consequence of strong selection.
Even if clustering were totally independent of geographic origin, we would need to consider an uneven level in adaptation to stress conditions within geographic groups.Generally, rice has adapted to humans extending its range of cultivation and changes in several environmental features, and the effects of selection by humans would be one possible and important factor [24].

Tracking Linkage Disequilibrium along the Entire OsLEA27 Region
The LD curve representing the 5' upstream region suggests that there may have been a decrease in effective population size, leading to an increase in inbreeding.These two possible events can be correlated with domestication and a population bottleneck.It is interesting to note that the increase in LD in this region pointed to consider impact of selection pressure.The lower level of nucleotide diversity in the 5' upstream region and the higher LD showed the possible effects of selection, associated with environmental factors and/or human interaction.These results depict the correlation between nucleotide diversity and LD, which may provide a means to search for evidence of selection in particular parts of the OsLEA27; 5' upstream regions [25].Because the actual force of selection may depend on the expression level of OsLEA27, the promoter of which is active under abiotic stress [26].
When we analyzed the complete set of 35 rice accessions, dimorphism from wide geographical regions also showed that LD decay occurred rapidly.Rapid decay of LD within the OsLEA27 gene locus depends on its populationhistory.A wide geographic sample of germplasm has more chances to decay rapidly in LD [27].

Factors Affecting the Evidence of Selection Detected with Neutrality Tests
In a standard neutrality test, Tajima D, Fu & Li's D * and Fu & Li's F * tests in the entire OsLEA27 region failed but 5´upstream region was significant with negative value which indicate a recent expand in population size or selection pressure.That significant value area was 5' upstream region where cis-regulatory element be there; such regions facilitate transcription of a particular gene for functional adaptation.
Significant positive values of Fu & Li's D * test were found at an intron.The D * test statistic is based on the distribution of mutations in the genealogy and compares the number of old and new mutations [28].These results showed an excess of intermediate allelic frequency in intron-splicing sites.We support here that recent admixture between two highly divergent subpopulations may shape the diversifying selection model.In sliding-window analysis, sites responsible for significant negative and positive test values were identified in the promoter region and intron, respectively.
Accessions from the northern group deviated from mutation-drift equilibrium because of recent population size expansion or a recent selective sweep, as evidenced by the significant negative value of Tajima's D. We speculate here that individuals from several subpopulations intermated and that new haplotypes appeared in their progeny.The adaptation of rice, typically a semiaquatic species, to mountainous areas with limited water may be due in part the effect of the OsLEA27 gene, its promoter region, and/or the effects of these regions on one another.

Detailed Analysis of Selection Signatures within Myanmar Upland and Dryland Landraces from Different Geographic Groups
One target of this study was to analyze the selection signatures within landrace accessions from diverse geo-graphical regions.Even though no significant deviations from neutrality were initially observed with three neutrality tests in the entire OsLEA27 region, purifying selection was observed in the coding region.In fact, most protein-coding genes have evolved under purifying selection of varying strength [29].Therefore, purifying selection in the transcribed region was not sufficient evidence of artificial selection.We were interested to learn whether analysis of the promoter region would give results opposite to those obtained for the transcribed region.Sequence variation in promoter regions results in differential regulation of transcription [30] [31].Therefore, we assume that the increased LD and significant of neutrality tests in the promoter region of OsLEA27 has resulted from the effects of artificial selection on that area, resulting in decreased recombination in that region.Furthermore, the presence of highly variable sites in the intron with positive significant values of Fu & Li's D * indicates mixing of two highly divergent subpopulations.A dimorphism effect, i.e., mixing of the japonica and indica gene pools, was detected in the intron-splicing region.In my unpublished works of NCBI BLAST search showed that 11% of intron-splicing region sequences were nearly identical to japonicaOsLEA27 and 88% were similar to indicaOsLEA27.If Paw San group accessions from the delta region were the same accession identified as japonica type (Supplementary Table 1), which have similar fragment as indicaOsLEA27 according to a BLAST search, these results might suggest that there is diversifying selection between the japonica and indica sequence types.That haplotype dimorphism effect related with the selection events on candidate genes during the domestication of one or both of two subspecies [32].
The selection signatures in landraces from the northern group revealed significant positive selection.The mountainous ecosystem is natural selector for upland landraces, which remain under continuous selection pressure on slope areas that are unable to retain water.We speculate that natural selection was followed by artificial selection, which fixed the alleles of OsLEA27 that were common to the landraces from the northern group.Therefore, the characteristics of this study sample may result from intermating of individuals from different subpopulations and/or artificial selection imposed on this landrace group.For landraces from the western and central regions, more detailed studies are still needed to make firmer conclusion.

Conclusion
In this study, we have analyzed nucleotide diversity and selection signatures by using trait-oriented sequence analysis.The pattern of variation within OsLEA27 revealed weak selection pressure along the entire region.However, the significant excess of low allelic frequency in the northern group suggested that the haplotype structure and population differentiation had been caused by the action of positive natural selection.The high variability observed in the intron region may represent a signature of diversifying selection within these upland and dryland landraces that are very unique genotypes.Additionally, although this study was based on the 2.9 kbp genome segment of one specific gene, outcome information provided the powerful proof of concept for evidence of selection within specific population.The results of selection signature analysis of the OsLEA27 locus enhance our understanding of the roles of selection, crop adaption and genetic diversity on the variability of OsLEA27 in Myanmar landraces and will motivate further study of this locus and its association with drought tolerance.

Figure 1 .
Figure 1.Sliding-window analysis of neutrality tests along the entire Os-LEA27 gene: (a) Fu & Li's D * , (b) Tajima's D, and (c) Fu & Li's F * .The window size is 250 bp, and step size is 25 bp.Confidence intervals at 95% significance levels for each of the three tests are shown for a sample size of 30.Stars indicate the center of windows showing a significant statistic.Significance levels are shown as * (0.05 level) and ** (0.01 level).
(b)).Accessions from group I have high

Figure 2 .
Figure 2. Nucleotide polymorphisms among haplotypes and haplotype network.(a) Position of Nucleotide polymorphisms among haplotypes at the entire region of OsLEA27.The exact position of substitutions on NC008404.2(17154224 -17157109 bp) are indicated on the top base.Upper horizontal line represents gene structure of OsLEA27; short black and red vertical lines represent SNPs and replacement sites, respectively.Colored shading indicates parsimony informative sites, which differentiated the haplotype groups.Haplotypes were based on 78 SNPs.(b) Haplotype network at the entire region of OsLEA27.Each haplotype is represented by a circle; the number in each circle is the haplotype number, and its size reflects the number of accessions with that haplotype.Dashed shapes encircle the major haplotype groupings.Roman numerals on the haplotype network figure indicate the number of SNP sites between the connected haplotypes.Number of landraces from each geographic region is shown intable at right.Yellow color in the table represent Group I haplotypes and blue is Group II.

Figure 3 .
Figure 3. Scatterplot diagram of pairwise distance and LD in three regions of OsLEA27: (a) 5' upstream region, (b) OsLEA27 gene locus, and (c) entire region of OsLEA27 gene.LD based on r2 between all SNPs in the 5' upstream region, OsLEA27 gene locus, and entire region of OsLEA27 in 30 landraces.The logarithmic trend lines represent the average LD decay.

Figure 4 .
Figure 4. Tests of linkage disequilibrium between upstream region and OsLEA27 gene locus of 30 landraces and 15 accessions from haplotype group I with chi-square (χ 2 ) test.Number and percent of significant comparisons tests at three regions of OsLEA27 gene between the accessions from Group I haplotypes and whole accessions are shown table.

Table 1 .
Comparison of population and genetic parameters between the 5' upstream region, OsLEA27 locus, and entire Os-LEA27 region.

Table 2 .
Summary of DNA variation in the entire region of OsLEA27 gene locus of 30 landraces.

Table 3 .
Population variability parameters and genetic parameters for the entire OsLEA27 region.

Table 4 .
Assessment of linkage disequilibrium with Fisher's test.