Detection of Adaptive Genetic Diversity in Wild Potato Populations and Its Implications in Conservation of Potato Germplasm

A better understanding of how genetic diversity is structured in natural habitats can help guide the exploration and acquisition of plant germplasm. Historically, studies have relied on DNA markers to elucidate potato genetic diversity. Current advances in genomics are broadening these applications by allowing the identification of markers linked to genomic regions under selection. Those markers, known as adaptive markers, unlock additional ways to value and organize germplasm diversity. For example, conservation priority could be given to germplasm units containing markers associated with unique geographic identity and/or linked to traits for tolerance to abiotic stresses. This study investigated whether adaptive marker loci could be identified in a large AFLP marker dataset of ninety-four populations of the wild potato species S. fendleri. These populations originated from six different mountain ranges in southern Arizona, USA. A total of 2094 polymorphic AFLP markers were used to conduct genetic diversity analyses of populations and mountain ranges. Adaptive markers were detected using Bayesian methods that distinguished marker loci departing significantly from the frequencies expected under neutral models of genetic differentiation. This identified 16 AFLP loci that were considered to be adaptive. To contrast the diversity parameters generated with each set of markers, analyses were conducted with all 2094 AFLP markers and with only the 16 adaptive markers. The results showed that both sets were efficient for establishing genetic associations among populations and mountain ranges. However, adaptive markers were better at revealing geographic patterns and identity, which suggests that these markers were linked to selection at the natural sites. An additional test of whether adaptive markers were associated with climate variables found two such loci. An analysis of the distribution of adaptive markers among populations revealed that only two populations were needed to build a core subset able to keep all the markers.
This preliminary assessment shows that adaptive genetic diversity could offer an additional way to measure diversity in potato germplasm and to set up options for conservation and research.


Introduction
Using the valuable genetic diversity found in wild relatives is a traditional approach in plant breeding to add sources of resilience to pests and diseases and of adaptation to abiotic stresses. From that perspective, ex situ crop germplasm repositories have played a key role in collecting, preserving, organizing and, very importantly, making that diversity quickly available to breeding and research groups. Repositories therefore require efficient work at all phases of the conservation process [1] [2]. Most important is to implement methods that do not affect the genetic diversity preserved at genebanks. Knowing how much diversity is in the collections is a way to identify whether undesirable changes are taking place and to propose mitigation plans if needed [3].
Genebanks have relied on DNA or molecular markers as tools to quantify genetic variation and to answer questions in diverse areas of the germplasm conservation process [4] [5]. Many of these studies have used PCR fragment-based techniques such as RAPDs, SSRs and AFLPs. These techniques assume that most of the DNA fragment sequences are neutral, that is, that they originate in genomic regions without gene expression or fitness advantage. In recent years, however, advances in population genomics and computational methods have revealed that these markers are not totally neutral and that some of them are actually in regions experiencing selection [6]. Markers of this type can theoretically identify genetic diversity associated with local selection and adaptation and thus unlock additional options for the characterization and organization of germplasm. For example, markers associated with local selection could identify germplasm units with unique environmental-geographical structure and/or with positive expression of genetic traits resulting from selective pressure from pests and diseases [7]. Previous works [8] [9] [10] [11] have indicated that robust genome scans done with dominant markers, like AFLPs, can provide enough markers and genome coverage to search for marker loci with adaptive signals. This approach uses extensive simulations to gauge population differentiation and identify marker loci with frequencies departing from those expected under neutral models of genetic differentiation. Significant departures in frequency would imply that loci were subjected to selection and are therefore potentially adaptive.
A. H. del Rio, J. B. Bamberg
Bayesian methods and FST-outlier tests have proved to be effective when assessing populations from the same taxa but adapted to different environmental conditions. Markers influenced by selection are anticipated to show unexpectedly high or low departures relative to the null distribution of FST for markers not under selection. This method has become very popular because it is easy and fast to implement [10] [12].
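The idea behind an FST-outlier scan can be illustrated with a minimal Python sketch. It assumes a biallelic locus with known allele frequencies in each population and uses the heterozygosity-based definition FST = (HT - HS) / HT. The function names and the cutoff arguments are hypothetical; in a real scan the cutoffs would come from neutral simulations rather than being set by hand, and this is not the procedure implemented in any particular program.

```python
def locus_fst(freqs):
    """FST for one biallelic locus from per-population allele frequencies."""
    n = len(freqs)
    p_bar = sum(freqs) / n
    # Expected heterozygosity of the pooled populations (HT).
    h_t = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within populations (HS).
    h_s = sum(2 * p * (1 - p) for p in freqs) / n
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def flag_outliers(fst_by_locus, lower_cutoff, upper_cutoff):
    """Return loci whose FST falls outside the assumed neutral range.

    lower_cutoff and upper_cutoff are hypothetical bounds that would
    normally be derived from simulations under a neutral model.
    """
    return sorted(
        locus
        for locus, fst in fst_by_locus.items()
        if fst < lower_cutoff or fst > upper_cutoff
    )
```

A locus with the same frequency everywhere gives FST = 0, while a locus fixed for alternative alleles in two populations gives FST = 1; loci near either extreme are the ones such a scan would single out.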
The US Potato Genebank (USPG) holds more than 6000 accessions that include collections of potato wild relatives. These species are distributed from southern Chile to the southwestern USA and have been shown to be excellent sources of genetic traits that can be transferred to enhance the crop [4] [13]. Research at the USPG has used molecular markers to determine the quantity and quality of the genetic diversity acquired and preserved in the collections. These studies have offered scientific foundations to, for example, support strategies for plant exploration and collecting [14], validate conservation approaches [15] and assemble core subsets [16]. An important goal at the USPG has been to create well-informed scientific knowledge to assist decisions on potato germplasm conservation [4]. Another significant result from these studies was that genetic differentiation in natural populations, estimated with neutral markers, was neither proportionally correlated with geographic separation nor linked to variation in environmental conditions [17]. Moreover, additional studies reported that DNA markers also failed to predict associations of phenotypic diversity (likely originating from local evolutionary processes) with geographically divergent taxa [18].
Here we expand our previous study [19], which applied richness of unique AFLP alleles to determine value for conservation in populations of the potato species S. fendleri. That study revealed that populations collected in the Pinaleño and Chiricahua mountain ranges captured most of the unique markers, so genebanks should prioritize those regions to add diversity. This prompted the idea of assessing adaptive genetic diversity, since it would offer an additional way to define germplasm value for conservation priorities. Therefore, this study evaluated patterns of genetic diversity in a group of geographically and environmentally diverse populations of S. fendleri collected in southern Arizona to investigate the possibility of detecting adaptive markers and to test their potential use in conservation efforts.

Plant materials and sampling sites
The study was intended to identify levels of genetic differentiation in 94 populations of the wild potato species S. fendleri from six different mountain ranges in southern Arizona, USA (Figure 1). These populations were collected in different USPG exploration trips [19] [20]. Solanum fendleri is a tetraploid (2n = 4x = 48) with disomic segregation. It is able to reproduce sexually, either by selfing or outcrossing, and asexually by tuber propagation. The populations used in this study inhabit a particular set of mountain ranges known as the Sky Islands.
As the name implies, these mountain ranges are isolated by large extensions of desert, valleys or grassland with distinct topographic and climatic conditions. It is believed that this setting restricts gene flow, resulting in strong genetic isolation and differentiation among plant populations. The mountain ranges sampled are listed in Table 1 (see also Figure 1). Specifics of the geographic coordinates and habitat descriptions for each population at each mountain range are available in the USPG databases (https://npgsweb.ars-grin.gov/gringlobal/search.aspx). To assign patterns of genetic identity and variation, each mountain range was assumed to represent all the diversity of the populations it contained. That increased the odds of identifying marker loci that were uniquely expressed at each mountain range.
American Journal of Plant Sciences

DNA extraction and generation of AFLP markers
Extraction of DNA is described in [14]. In short, genomic DNA was isolated from individual young leaves in a bulk of 27 individual plants for each population using the DNeasy Plant Mini kit (Qiagen Inc., Valencia, CA, USA). The sampling strategy and extraction methods were set to counter the effect of genetic heterogeneity and to provide an adequate representation of the total diversity of the population [21]. The generation of AFLP markers and the estimation of genetic diversity followed USPG methods shown to be effective for wild potato species. Only loci that were polymorphic within S. fendleri populations were included in the analysis, as they were truly informative of genetic variation across genotypes and populations.

Estimation of genetic diversity and identification of adaptive AFLP markers
To determine levels of genetic diversity in the populations, polymorphic AFLP markers were used to calculate genetic relationships among populations as well as other standard estimates of population genetic structure using GenAlEx version 6.5 [22]. Constellation plots [23] based on hierarchical and multivariate cluster analysis, together with Principal Component (PC) analysis, were used to visualize genetic diversity and differentiation among populations and mountain ranges.
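As a rough illustration of the kind of summary statistics involved, the following Python sketch computes the percentage of polymorphic loci within a group of band profiles and a simple pairwise distance between profiles. It assumes that each population is represented by one binary AFLP profile (1 = band present, 0 = band absent) and uses the proportion of mismatched bands as the distance; the function names are hypothetical and the formulas applied by GenAlEx may differ.

```python
from itertools import combinations


def percent_polymorphic(profiles):
    """Percentage of loci showing both band states across the profiles."""
    n_loci = len(profiles[0])
    polymorphic = sum(
        1 for j in range(n_loci) if len({row[j] for row in profiles}) > 1
    )
    return 100.0 * polymorphic / n_loci


def mismatch_distance(a, b):
    """Proportion of loci at which two band profiles differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pairwise_distances(named_profiles):
    """Distance for every unordered pair of named profiles."""
    return {
        (m, n): mismatch_distance(named_profiles[m], named_profiles[n])
        for m, n in combinations(sorted(named_profiles), 2)
    }
```

The resulting dictionary of pairwise distances is the kind of matrix that cluster and PC analyses take as input.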
To detect adaptive genetic diversity, the AFLP marker dataset was analyzed for signatures of selection. The approach assumed that frequencies at some marker loci were not random and exhibited unusual patterns of differentiation not explained by chance, probably reflecting selection. A multinomial-Dirichlet statistical model was applied to identify these loci using the program BayeScan [24]. BayeScan is presented as a hierarchical Bayesian method [...] 100,000 total iterations). The analysis was run three times to validate the outcome. The significance of adaptive loci detection was assessed using posterior odds (i.e., the ratio of the posterior probability of the model with selection to that of the neutral model) interpreted on Jeffreys' scale, with a false discovery rate threshold (FDR ≤ 0.05) [24], which allowed identifying marker loci that can be reliably regarded as adaptive across populations.
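The decision rule described above can be sketched in Python as follows. The sketch assumes that each locus already has a posterior probability for the model with selection (here a plain list), converts it to log10 posterior odds, and computes a q-value per locus as the expected false discovery rate among all loci with equal or stronger support. Function names are hypothetical and this is an illustration of the general approach, not a reimplementation of BayeScan.

```python
import math


def log10_posterior_odds(p):
    """log10 of the posterior odds p / (1 - p) for the selection model."""
    if p >= 1.0:
        return math.inf
    if p <= 0.0:
        return -math.inf
    return math.log10(p / (1.0 - p))


def q_values(post_probs):
    """Expected FDR if a locus and all better-supported loci are called."""
    order = sorted(
        range(len(post_probs)), key=lambda i: post_probs[i], reverse=True
    )
    q = [0.0] * len(post_probs)
    running_error = 0.0
    for rank, i in enumerate(order, start=1):
        # 1 - posterior probability is the chance that this call is false.
        running_error += 1.0 - post_probs[i]
        q[i] = running_error / rank
    return q


def call_outliers(post_probs, fdr=0.05):
    """Indices of loci whose q-value does not exceed the FDR threshold."""
    return [i for i, q in enumerate(q_values(post_probs)) if q <= fdr]
```

For example, a posterior probability of 10/11 corresponds to posterior odds of 10, that is, log10(PO) = 1.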

Estimation of climate variables at collecting sites
To determine associations of DNA marker variation with climate variables observed at the sites, climate data were collected based on the geographical location and elevation of each population site. The climate profiles were extracted from the WorldClim databases (https://www.worldclim.org/). These databases contain archives of global historical weather and climate data. The variables used included monthly averages of maximum, minimum and mean temperature and of rainfall. In addition, variables known as bioclimatic variables were added. They are derived from the monthly temperature and rainfall values and are intended to provide more biologically meaningful parameters. Bioclimatic variables involve annual trends (e.g., mean annual temperature and annual precipitation), seasonality (e.g., annual range in temperature and precipitation), and extreme or limiting environmental factors (e.g., temperature of the coldest and warmest month, and precipitation of the wet and dry quarters). A total of 55 climate variables were applied in multivariate cluster analysis to establish climate-based associations and use them to test for adaptive marker associations.
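To make the relationship between monthly data and bioclimatic variables concrete, the Python sketch below derives four of them from twelve monthly values. It assumes that a quarter is any run of three consecutive months and that the run may wrap around from December to January; the function names are hypothetical and the values in the usage note are invented for illustration only.

```python
def annual_mean_temperature(monthly_tmean):
    """Mean of the twelve monthly mean temperatures."""
    return sum(monthly_tmean) / 12


def annual_precipitation(monthly_prec):
    """Sum of the twelve monthly precipitation totals."""
    return sum(monthly_prec)


def precipitation_of_driest_month(monthly_prec):
    """Smallest monthly precipitation total."""
    return min(monthly_prec)


def precipitation_of_wettest_quarter(monthly_prec):
    """Largest total over three consecutive months (wrapping Dec to Jan)."""
    return max(
        monthly_prec[i] + monthly_prec[(i + 1) % 12] + monthly_prec[(i + 2) % 12]
        for i in range(12)
    )
```

With an invented monsoon-like profile such as `[20, 18, 15, 8, 5, 10, 90, 95, 40, 25, 15, 22]` (mm, January to December), the driest month is May and the wettest quarter is July to September.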

Results and Discussion
Genetic diversity among populations
All ninety-four populations of S. fendleri were effectively genotyped with 2094 AFLP markers generated using the USPG standard protocol. When all of these markers were used for genetic diversity assessments, the average percentage of polymorphic AFLP loci across populations was 69.5%. The highest and lowest percentages of polymorphic loci were found in CHI (77.08%) and RIN (60.17%) (Table 2).
Pairwise genetic distances between populations ranged from a low of 9% to a high of 42%, with an average of 23.5%. A hierarchical cluster analysis presented in a Constellation plot shows that all the populations could be distinguished individually and that no duplicate populations were found (Figure 2). The plot also shows that, most of the time, populations from the same mountain range do not cluster together. In fact, only a few clusters of populations sharing the same geographic origin are generated; e.g., some CAT populations cluster in one unique large group (Figure 2). From a practical standpoint, however, this is an important result: it shows that AFLP markers can differentiate the populations of S. fendleri used in the study as separate and unique germplasm units.
Genetic diversity and structure among mountain ranges
The levels of within-population genetic diversity (h) at each mountain range ranged from 0.180 to 0.236, with an average of 0.204. The populations from the PIN mountain range exhibited the highest genetic diversity, while HUA and CAT revealed the lowest (Table 2). This was particularly interesting because PIN contained the smallest number of populations among all the ranges, yet it showed the highest variation. A previous study [19] reported that PIN populations were also set apart by having the highest number of unique AFLP markers when compared with other mountain ranges. The additional number of unique loci could have been responsible here for increasing h levels.
A multivariate Principal Component (PC) analysis based on the covariance of genetic distances confirmed that S. fendleri populations did not exhibit consistent association patterns based on their regional origins (Figure 3). On the contrary, populations belonging to different geographic ranges clustered together. To determine genetic relationships among mountain ranges, a distance matrix using Nei's genetic distance coefficients [22] was generated.
Nei's coefficients took into account the frequency variation of AFLP marker alleles within populations at each mountain range and the variation over all loci. This analysis revealed that, genetically, the mountain ranges were very closely related. The average genetic distance between two mountain ranges was only 5%, ranging from 2% to 12%. Since PIN had the highest number of unique AFLP alleles, this mountain range was the most differentiated from the rest, with an average of 10.5%. Other studies [19] had also reported that PIN populations contained the highest number of unique alleles among all the mountain ranges. On the other hand, RIT showed the least genetic differentiation overall, with an average of 3.7%. In general, genetic differentiation levels were quite small, which can explain why populations from different mountain ranges failed to resolve into a coherent pattern based on genetic and geographic origin. This also corroborates previous results showing that genetic differentiation in natural potato populations cannot be correlated with geographic separation or associated with specific ecological or environmental variables at the sites [17] [25].
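For readers unfamiliar with these statistics, the Python sketch below shows how gene diversity (h) and Nei's standard genetic distance can be computed from allele frequencies. It assumes biallelic loci where each group is described by the frequency p of one allele per locus (the other allele has frequency 1 - p); how those frequencies are estimated from dominant AFLP bands is outside the sketch, and the function names are hypothetical.

```python
import math


def gene_diversity(freqs):
    """Mean expected heterozygosity, h = 1 - p^2 - q^2 = 2pq, over loci."""
    return sum(2 * p * (1 - p) for p in freqs) / len(freqs)


def nei_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy))."""
    j_x = sum(p * p + (1 - p) ** 2 for p in freqs_x)
    j_y = sum(p * p + (1 - p) ** 2 for p in freqs_y)
    j_xy = sum(a * b + (1 - a) * (1 - b) for a, b in zip(freqs_x, freqs_y))
    if j_xy == 0:
        # No shared alleles at any locus: the distance is unbounded.
        return math.inf
    return -math.log(j_xy / math.sqrt(j_x * j_y))
```

Two groups with identical frequencies have a distance of zero, and the distance grows as their allele frequencies diverge.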

Detection of adaptive genetic markers
We investigated whether Bayesian methods were useful to identify AFLP marker loci departing from neutral expectations of genetic differentiation in geographically (and environmentally) diverse potato populations. Previous work [26] had indicated that robust coverage of AFLP markers across genomes would enhance the odds of finding non-neutral markers near gene regions and/or segments linked to differentiation in the genome. From that standpoint, 2094 was a sufficiently large number of AFLP markers to allow searching for outlier marker loci. With a detection confidence level of 95%, the BayeScan run discovered a total of 16 outlier loci possibly under selection. The parameters used followed the threshold recommendations given by the program for very strong evidence of selection (FDR < 0.05; log10(PO) = 1.2) (Figure 4). [...] (Figure 5). For instance, it was noted that in 14 marker loci the alleles were not found at all in a given mountain range (e.g., the allele at locus 677 was not present in CAT, and the allele at locus 378 was not found in CHI). At the other extreme, three cases showed adaptive marker loci with an allele frequency of 1.00 (see Figure 5). The adaptive AFLP locus most frequently observed was locus 689, with a frequency of 0.698 across mountain ranges, while the least frequent was locus 378, with a frequency of 0.167. In fact, locus 378 was almost unique; it was detected in only two mountain ranges (CAT and PIN) (Figure 5).
It is worth noting that the ability to identify adaptive genetic diversity with a Bayesian method depends on some key factors, in particular marker coverage and the magnitude of linkage disequilibrium. At the time we wrote this paper, there was no specific data on genome size for S. fendleri, but according to the C-values database from Kew Gardens (http://data.kew.org/cvalues/), the largest genome size for potatoes was that of S. tuberosum, about 860 Mb. Assuming this value as a possible upper estimate for the genome size of S. fendleri and taking into account the total number of AFLP markers used here, a conservative estimate of genome coverage was adequate: about one marker every 750 kb on average. On the other hand, the level of linkage disequilibrium in S. fendleri is unknown, but it was probably low because this species has some level of outcrossing. Therefore, each detected outlier marker was probably closely linked to an adaptive gene or to a mutation truly responsible for the atypical pattern of genetic diversity [27].
Genetic diversity and geographic structure based on adaptive markers
One question was whether adaptive markers were superior to neutral [...] within mountain ranges were higher in adaptive markers (~14% more on average). Likewise, the average percentage of polymorphic loci increased significantly from 69.5% to 81.2% (Table 3).
A multivariate cluster analysis based on a genetic distance matrix built with adaptive markers showed stronger genetic similarities among populations from the same mountain range and enhanced genetic differentiation among mountain ranges (Figure 6). This improved discriminating power can be explained by the higher genetic distances found among mountain ranges. The average Nei's genetic distance between mountain ranges with adaptive markers was 37% (compared to only 5% when all the AFLP markers were included). The genetic distance between the HUA and CAT mountain ranges was the highest (63%), while the lowest was found between RIN and RIT (5%). The PC graph clearly showed that adaptive markers had better power to group populations from the same site and to set apart the geographic ranges (Figure 6). Other than some exceptions [...] distances and make habitats very different even when they are located nearby, or very similar when they are far apart [28].
Two clusters containing populations from different mountain ranges revealed associations with specific adaptive loci. One cluster, which included two populations of PIN (585115 and 655251) and three CAT populations (578234, 632327 and 658176) (Figure 7), shared the allele from locus 378. Moreover, within PIN this allele was exclusive to these two populations. A detailed inspection of all of the climate variables found that all five populations had the same value for Precipitation of Wettest Quarter. A second example was found in a cluster of two populations of CHI (564026 and 592406) and four from HUA (275167, 283100, 641028 and 641037). All of them shared the allele from adaptive locus 21. Within CHI, this allele was exclusive to these two populations (Figure 7). In this case, the climate variable Precipitation of Driest Month was similar for all the populations in the cluster.
Additional climate-based clusters associating populations from different mountain ranges were examined, but no further significant associations with markers were detected. Though this analysis was very preliminary, the results offer a glimpse of how adaptive markers can be applied to germplasm screening and trait discovery. For example, the association of the adaptive allele at locus 21 with a variable related to low rainfall could be the starting point to investigate whether that germplasm offers genetic sources for adaptation to drought. In recent years, tolerance to abiotic stresses has become a key genetic trait to identify, as it offers chances for developing varieties resilient to climate change [29].
Core subset for adaptive markers
An analysis of the presence of adaptive markers across individual populations at all mountain ranges showed that only two populations were needed to capture all of the 16 adaptive markers. Populations 658179 (from CAT) and 641032 (from HUA), when combined, included all the adaptive markers. However, if representation of every mountain range is sought in the core subset, so as to capture the maximum number of adaptive markers at each of the mountain ranges, then the total number of populations needed is 18 (see Table 4).
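Selecting a small set of populations that together carry every adaptive marker is an instance of the set cover problem. The Python sketch below shows one common heuristic, a greedy selection that repeatedly picks the population adding the most markers not yet captured. The source does not state how the two-population subset was found, so this is only an assumed illustration; the function name and the population labels in the usage note are hypothetical, and the greedy result is not guaranteed to be the smallest possible subset.

```python
def greedy_core_subset(markers_by_population):
    """Pick populations until every marker seen in the input is captured."""
    remaining = set().union(*markers_by_population.values())
    chosen = []
    while remaining:
        # Sorting the names first makes tie-breaking deterministic.
        best = max(
            sorted(markers_by_population),
            key=lambda pop: len(markers_by_population[pop] & remaining),
        )
        chosen.append(best)
        remaining -= markers_by_population[best]
    return chosen
```

For instance, with `{"pop1": {1, 2, 3}, "pop2": {3, 4}, "pop3": {4, 5, 6}, "pop4": {2}}` the function selects `pop1` and then `pop3`, which together carry all six markers.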

Conclusions
The prospect of using adaptive markers in germplasm is that they could add another way to value, screen and characterize germplasm for conservation and use. Because of their broad distribution, potato wild relatives and native landraces are good materials for mining useful genetic variants associated with environmental adaptation, particularly as a response to abiotic stresses or to diseases and pests. In recent years, the expansion of high-throughput sequencing has been facilitating genome-wide assessments in plant species. Methods like genotyping by sequencing (GBS) can potentially accelerate the identification of adaptive genetic diversity in potato species. The results from this study showed that adaptive genetic diversity has enhanced capacity for detecting patterns of geographic structure in S. fendleri. This has the potential to support planning for future germplasm exploration and collecting. Another important finding was the detection of marker associations with specific environmental variables at the sites, which gives genebanks options for identifying evolutionarily significant units, in other words, chances of detecting germplasm units or groups with unique genetic distinctiveness due to environmental adaptation. Ultimately, understanding the mechanisms that shape genetic variation could help to predict the responses of populations to future environmental conditions.