The Use of Sequence-Based SSR Mining for the Development of a Vast Collection of Microsatellites in Aquilegia formosa

Numerous microsatellite markers were developed for Aquilegia formosa from sequences deposited within the Expressed Sequence Tag (EST), Genomic Survey Sequence (GSS), and Nucleotide databases in NCBI. Microsatellites (SSRs) were identified and primers were designed for 9 SSR-containing sequences in the Nucleotide database, 3803 sequences in the EST database, and 2226 sequences in the GSS database. For validation purposes, 45 primer pairs were used to amplify DNA from 16 A. formosa individuals from the H. J. Andrews experimental forest in Oregon, a Long Term Ecological Research (LTER) site. Genetic polymorphisms were identified at 30 of the 45 microsatellite loci with an average of 13.6 alleles per locus, and the observed level of heterozygosity was greater than 0.8 for 21 of the 30 loci. These polymorphic loci were sufficient to separate the 16 individuals from one another in a principal coordinate analysis. This comprehensive collection of primers significantly increases the availability of microsatellite primers for Aquilegia spp. and provides ample material for future studies that require highly variable SSRs, such as mapping and association studies and investigations of plant mating systems and gene flow.


Introduction
The western columbine, Aquilegia formosa, is a perennial herb from the family Ranunculaceae. It belongs to the genus Aquilegia, which includes approximately 75 species, with similar numbers of species occurring in North America, Asia and Europe [1]. The phylogenetic relationships among species within the genus have been difficult to resolve, possibly due to rapid speciation in the group [2]-[4]. Two independent radiation events have emerged, however, one leading to the North American and the other to the European clades [1] [4].
The ecology of many Aquilegia species has been well studied, and there is ample variation in pollinators, floral traits and mating systems among species, among populations and even within populations of the same species, with evidence of some causal relationships between pollinators, floral traits and mating systems [3] [5]-[10]. Phenotypic plasticity exists for many of the floral traits, with potential adaptive responses to global warming [11] [12]. Habitats range from alpine to lowland and include desert springs, temperate forests and rocky outcrops [1] [13]. Moreover, habitat specialization may have played a greater role in the radiation of columbines in Europe, while pollinator specialization has been more important in the diversification of North American species [1] [3].
The presence of unusual floral organs in Aquilegia species permits the evolutionary study of floral novelties [14] [15]. These organs include sepals that have the appearance of petals (petaloid sepals), the staminodium (an additional whorl of a novel type of organ placed between the carpels and stamens), and petals bearing spurs with nectaries at their tips. As a result of its variation in ecology, floral morphology, pollination, floral development and mating system, the genus Aquilegia is ascending to the status of model system for studies in ecology, evolution and plant development [16] [17]. To improve its usefulness as a model system, genomic tools are being developed for this genus [18], including the recent release of the A. coerulea Goldsmith whole genome sequence (http://www.phytozome.net). The plant A. coerulea Goldsmith is, however, not the wild A. coerulea species, but represents an unknown mixture of different Aquilegia species of both North American and European origins (S. A. Hodges, personal communication).
Microsatellites remain important genetic markers in many studies, including genetic mapping, association mapping, investigations of plant mating systems and paternity analyses. Despite the recent development of nuclear tools for Aquilegia, the suitability of the primers developed for currently existing microsatellite loci [19]-[21] remains limited because some of the loci are redundant [22] and others display low levels of polymorphism [21]. In this study, we identified microsatellites (SSRs) and designed primers from the Nucleotide database, the Expressed Sequence Tag (EST) database, and the Genomic Survey Sequence (GSS) database through NCBI, and validated the screening by testing 45 of the primer pairs in a wild A. formosa population. Our goal was to develop a larger number of microsatellite loci with a wide range of motif lengths and repeat numbers from this data mining process, in order to facilitate much-needed studies of the population genetic structure of Aquilegia species [22] [23] and to promote further investigations of the role of pollinators in plant reproductive success [24], mating system [8], floral selection and gene movement within and among populations of Aquilegia species.

Data Mining of SSRs
The SSR mining was performed on 128,082 Aquilegia formosa nucleotide sequences publicly available online through NCBI. These sequences included 106 sequences from the Nucleotide database and 12,310 sequences from the Genomic Survey Sequence (GSS) database for A. formosa, and an additional 115,666 sequences in the EST database from A. formosa × A. pubescens pooled RNA. The presence of simple sequence repeats (SSRs) within these sequences was detected using SSR Locator [25]. The program parameters were set to identify and localize microsatellite motifs ranging from 1 to 10 bp, and only the sequences containing motifs with repeat numbers of mono ≥ 12, di ≥ 6, tri ≥ 4, and tetra through nona ≥ 3 were considered to be SSRs.
We chose the program WebSat [26] to design the oligonucleotide primers from the SSR flanking sequences because it offers the unique ability to visualize the SSR-containing sequences. This feature ensured that the primer sequence did not overlap with the SSR of interest and that no primer pair contained more than one SSR. The major parameters selected for primer design were a primer length of 19 to 25 bp (optimum 22 bp), a PCR product size between 120 and 325 bp, a GC content between 40% and 80%, and an optimum melting temperature of 55°C.
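The repeat thresholds above can be illustrated with a short scan for perfect repeats. This is only a minimal sketch under stated assumptions: it is not the SSR Locator algorithm, the names `MIN_REPEATS`, `is_primitive` and `find_ssrs` are hypothetical, and it covers motifs of 1 to 9 bp because those are the sizes for which the text gives a minimum repeat number.

```python
import re

# Minimum repeat numbers per motif length, taken from the text:
# mono >= 12, di >= 6, tri >= 4, tetra through nona >= 3.
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3, 7: 3, 8: 3, 9: 3}


def is_primitive(motif):
    """True if the motif is not a repeat of a shorter unit (e.g. 'AGAG' is not)."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_number) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        pos = 0
        while pos < len(seq):
            m = pattern.match(seq, pos)
            if m and is_primitive(m.group(1)):
                hits.append((m.start(), m.group(1), len(m.group(0)) // size))
                pos = m.end()  # continue after this repeat tract
            else:
                pos += 1  # try the next start position
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical test sequence, not taken from the NCBI data.
    demo = "CCGT" + "A" * 12 + "GC" + "AG" * 6 + "TTC" + "GAA" * 4 + "CT" + "AGAA" * 3 + "C"
    for start, motif, repeats in find_ssrs(demo):
        print(start, motif, repeats)
```

Skipping non-primitive motifs keeps a tract such as (AG)6 from being reported a second time as (AGAG)3.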
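The length, GC content and product size limits listed above can be expressed as a simple filter. The sketch below is illustrative only and does not reproduce WebSat: the `PrimerPair` class and both function names are hypothetical, and melting temperature is left out because the text gives only an optimum value and does not state how it was calculated.

```python
from dataclasses import dataclass


@dataclass
class PrimerPair:
    forward: str        # forward primer sequence, 5' to 3'
    reverse: str        # reverse primer sequence, 5' to 3'
    product_size: int   # expected PCR product length in bp


def gc_percent(primer):
    """Percentage of G and C bases in a primer sequence."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def meets_design_criteria(pair):
    """Check a primer pair against the limits stated in the text."""
    for primer in (pair.forward, pair.reverse):
        if not 19 <= len(primer) <= 25:            # primer length 19 to 25 bp
            return False
        if not 40.0 <= gc_percent(primer) <= 80.0:  # GC content 40% to 80%
            return False
    return 120 <= pair.product_size <= 325          # product size 120 to 325 bp


if __name__ == "__main__":
    # Hypothetical sequences used only to exercise the filter.
    pair = PrimerPair("ACGTACGTACGTACGTACGTAC", "TGCATGCATGCATGCATGCA", 200)
    print(meets_design_criteria(pair))
```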

Plant Material and DNA Isolation
Leaf tissue was collected from 16 A. formosa individuals from a population located in the H. J. Andrews experimental forest in Oregon, a Long Term Ecological Research (LTER) site. Permission to collect was obtained from Mark Schulze, the Director of the H. J. Andrews experimental forest. The plant species, A. formosa, is not an endangered or protected species, and the coordinates for the population were 44.12656 latitude and −122.07827 longitude. Total genomic DNA from 0.1 g of leaf tissue was extracted using a Macherey-Nagel (MN) Plant II kit (Düren, Germany) following the manufacturer's instructions.
Forty-five SSR primer pairs were used to conduct a survey of genetic polymorphism using 16 A. formosa individuals. These primer pairs were selected based on the length of their motif and the number of repeats. Nine of these primer pairs came from the Nucleotide database, 18 from the GSS database, and 18 from the EST database. We attempted to redesign primers for specific use with M13 sequences and PIG tails for the 32 SSR loci that were previously available for A. formosa [19] [20]. However, because of the high stringency of our parameters (length, GC content, melting temperatures, and annealing temperatures), the observed redundancy of loci, and the SSR position in the sequences obtained from NCBI, we were only able to redesign acceptable primer pairs for eight loci. Therefore, eight of the 9 SSR-containing sequences from the Nucleotide database in this study were previously identified [19], although the current study used different parameters and protocols for primer-pair design and genotyping.
The observed number of alleles (Na), the levels of observed (Ho) and expected (He) heterozygosity, the Shannon Information Index (I), and the major allele frequency (MAF), or frequency of the most common allele, were calculated for each polymorphic locus in GenAlEx 6.4 [29]. In addition, Hardy-Weinberg Equilibrium (HWE), null allele frequency, and non-exclusion probability of identity (NE-I) were calculated for each polymorphic locus using Cervus 3.0 [30]. One-way ANOVAs were performed to identify differences in diversity statistics among the three databases (Nucleotide, EST and GSS). Finally, we performed a principal coordinate analysis (PCoA) with all polymorphic loci in GenAlEx 6.4 [29], based on the genetic distances between pairs of individuals, in order to demonstrate the markers' ability to discriminate between unique individuals.
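For readers unfamiliar with these per-locus statistics, the sketch below computes them from diploid genotypes using the standard textbook definitions (He as one minus the sum of squared allele frequencies, I as the Shannon index with natural logarithms). It is an illustration and not the GenAlEx implementation; the function name `locus_stats` and the example genotypes are hypothetical.

```python
import math
from collections import Counter


def locus_stats(genotypes):
    """Per-locus diversity statistics from a list of diploid genotypes.

    Each genotype is a pair of allele labels, e.g. fragment sizes in bp.
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(alleles)
    total = len(alleles)
    freqs = [count / total for count in counts.values()]
    return {
        "Na": len(counts),                                          # number of alleles
        "Ho": sum(1 for a, b in genotypes if a != b) / len(genotypes),  # observed heterozygosity
        "He": 1.0 - sum(p * p for p in freqs),                      # expected heterozygosity
        "I": -sum(p * math.log(p) for p in freqs),                  # Shannon Information Index
        "MAF": max(freqs),                                          # major allele frequency
    }


if __name__ == "__main__":
    # Hypothetical genotypes at one locus for four individuals.
    example = [(150, 150), (150, 154), (154, 158), (150, 158)]
    for name, value in locus_stats(example).items():
        print(name, round(value, 4))
```

With these four hypothetical genotypes the allele frequencies are 0.5, 0.25 and 0.25, so He = 1 − (0.25 + 0.0625 + 0.0625) = 0.625, while three of the four individuals are heterozygous (Ho = 0.75).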

Results and Discussion
A screen of the 128,082 A. formosa sequences deposited in NCBI identified a total of 65 SSRs in the Nucleotide database, 4372 SSRs in the GSS database, and 11,519 SSRs in the EST database. The most frequent motifs identified among the 15,956 ungapped SSR loci were trinucleotides (48.9%), followed by mononucleotides (19.1%), tetranucleotides (13.2%) and finally dinucleotides (9.5%) (Figure 1; Table S1). The most frequent mononucleotide motif was A (17.2% of the total), the most frequent dinucleotide was AG (4.6%), the most frequent trinucleotide was GAA (7.6%) and the most frequent tetranucleotide was AGAA (2.0%). The 15,956 SSR-containing sequences were inspected for suitable flanking sites for primer design, and primer pairs were designed for 9 of the SSRs from the Nucleotide database, 3803 of the EST SSRs, and 2226 of the GSS SSRs using WebSat [26] (Table S2 and Table S3).
Thirty of the 45 SSR primer pairs tested amplified in the expected size range (Table 1) and displayed polymorphism across individuals (Nucleotide: 6, GSS: 10, and EST: 14 primer pairs; Table 2). Of the remaining 15 primer pairs tested, 9 did not amplify at all and 6 amplified allelic patterns inconsistent with single-locus segregation in a diploid species (Table S4). Of the 8 SSR loci from the Nucleotide database that were redesigned for this study [19], only 6 amplified in a diploid inheritance pattern (Table 2, Table S4). Therefore, of the SSR loci previously available for Aquilegia, 20 have low allelic diversity [21] and only 6 of the 32 SSR loci with higher allelic diversity [19] [20] met the more stringent criteria used for primer development in this study.
The 30 polymorphic loci yielded 408 alleles, with an average of 13.6 alleles and a range of 5 to 25 alleles (Na) per locus (Table 2). These values are consistent with previous primer discoveries in Aquilegia species from North America where, for 28 individuals and 16 primers, an average of 14.4 alleles per locus has been reported for A. formosa, and 13.9 alleles per locus for A. pubescens [19]. In addition, Gallagher et al. (2004) [20] reported 15.5 alleles per locus in A. chrysantha from 16 primers in 165 individuals. Lower numbers of alleles per locus have been detected in two Asian columbine species, with 2 to 4 alleles per locus in A. flabellata and 2 to 5 in A. oxysepala [21]. For most of the loci examined, the level of heterozygosity was high, a pattern that has been previously observed for another columbine species, A. coerulea [11].
The ANOVAs revealed no statistically significant differences among the three databases (GSS, EST and Nucleotide) in the total number of alleles per locus (GSS = 14.2). This lack of difference is present despite the fact that the microsatellite loci identified in the EST database are transcript-based SSRs and are therefore considered to be more evolutionarily conserved and potentially less polymorphic than SSRs developed from the genome [28]. Similar results have been observed in germplasm screens of potato and watermelon using EST-SSR loci [31] [32]. The high levels of polymorphism observed in transcript-based SSRs for A. formosa adults in the H. J. Andrews experimental forest may arise because homozygotes are eliminated from the adult population as a result of the strong inbreeding depression that exists in the species [33].
Table 1. Primer sequences and their relevant characteristics. These primers were designed for 30 polymorphic SSR-containing loci from A. formosa identified in sequences retrieved from the Nucleotide, Expressed Sequence Tag (EST), and Genomic Survey Sequences (GSS) NCBI databases.
The 30 polymorphic SSRs were more than sufficient to separate all 16 A. formosa individuals from the H. J. Andrews experimental forest in a principal coordinate analysis (PCoA) (Figure 2). The extremely low non-exclusion probability of identity for the 30 markers (9.33E−49) indicates that these markers are suitable for genetic fingerprinting and other studies. For example, genotyping using only two primer pairs, DR51797 and ER968006.2, discriminated all 16 individuals because the combined non-exclusion probability of identity for the two markers was 2E−5. These results indicate the potential of these SSRs for examining population-level processes, such as paternity analyses and studies where individuals must be identified [24]. Researchers can achieve the same genetic information content using one microsatellite with k alleles as with (k−1) biallelic markers such as SNPs [34]. Furthermore, SSRs are relatively evenly distributed across genomes, which makes them useful for conducting comparative genomic analyses, linkage and QTL mapping, and genome-wide association studies [35]. Finally, the transcript-based SSRs for which primers were designed in this species are likely to be associated with functional genes and could be an important resource for studies searching for and identifying genes in Aquilegia. The 6038 primer pairs designed for SSR loci in A. formosa and the 30 polymorphic SSRs that were tested and validated (Tables S2 & S3) complement the molecular tools currently available for Aquilegia research [18].
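The probability-of-identity reasoning can be made concrete with a short calculation. The sketch below assumes the standard single-locus formula for unrelated individuals (the sum of squared expected genotype frequencies) and multiplies across loci under the assumption of independence; whether Cervus applies exactly this form is an assumption here, and all function names are hypothetical. Only the sample size (16) and the combined two-locus value (2E−5) come from the text.

```python
from math import comb, prod


def probability_of_identity(freqs):
    """Chance that two unrelated individuals share a genotype at one locus."""
    n = len(freqs)
    both_homozygous = sum(p ** 4 for p in freqs)
    both_heterozygous = sum(
        (2 * freqs[i] * freqs[j]) ** 2 for i in range(n) for j in range(i + 1, n)
    )
    return both_homozygous + both_heterozygous


def combined_probability_of_identity(per_locus_values):
    """Multi-locus value, assuming the loci are independent."""
    return prod(per_locus_values)


def expected_matching_pairs(n_individuals, combined_pi):
    """Expected number of individual pairs with identical multi-locus genotypes."""
    return comb(n_individuals, 2) * combined_pi


if __name__ == "__main__":
    # Hypothetical locus with 10 equally frequent alleles.
    print(probability_of_identity([0.1] * 10))
    # Values reported in the text: 16 individuals, combined value of 2E-5.
    print(expected_matching_pairs(16, 2e-5))
```

Sixteen individuals form 120 distinct pairs, so a combined value of 2E−5 corresponds to about 0.0024 expected matching pairs, which is consistent with all 16 individuals being distinguished by the two loci.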

Conclusion
An extensive number of microsatellite primers located in transcribed and genomic regions were developed for A. formosa. This collection is the most comprehensive set of microsatellite primer pairs developed in Aquilegia to date. It illustrates how sequence-based SSR mining can produce many microsatellites of diverse motif types and repeat lengths, which add to the existing microsatellite libraries, often based on enrichment methods, to provide a larger marker resource for scientists. The collection of microsatellites presented here contributes to the various molecular tools being developed in Aquilegia. Given the extensive knowledge of the ecology, pollination biology, floral biology and mating system of some Aquilegia species, these molecular tools can facilitate the mapping of functional genes and promote studies of population genetic structure, selection and gene flow in diverse species.

Figure 1. Frequency of repeat numbers within mononucleotide through nonanucleotide microsatellite (SSR) motif levels. The nonanucleotides for GSS and EST have only 3 repeats, while the octanucleotides for GSS have 3 and 4 repeats. These repeat numbers were examined within 11,519 SSRs identified in the NCBI EST database from A. formosa × A. pubescens pooled RNA and within 4372 SSRs identified in the NCBI GSS database for A. formosa.

a Sequences from the NCBI Nucleotide Database; b Sequences from the NCBI GSS Database; c Sequences from the NCBI EST Database; d M13 tail and PIG tail included in the observed allele size range; e PCR performed with 5' FAM fluorescently labeled primers; f PCR performed with 5' HEX fluorescently labeled primers.

Figure 2. Principal Coordinate Analysis (PCoA) showing the genetic relationships of 16 A. formosa individuals. Leaf tissue for these individuals was collected from the H. J. Andrews experimental forest in Oregon, USA. The PCoA used alleles from 30 polymorphic microsatellite loci identified in sequences retrieved from the NCBI Nucleotide, Expressed Sequence Tag (EST), and Genomic Survey Sequence (GSS) databases.

Table 2. Diversity statistics for 30 polymorphic SSR loci tested in 16 A. formosa individuals.