New Interpretations about Clonal Architecture for the Sour Pitaya ( Stenocereus gummosus , Cactaceae ) , Arising from Microsatellite Markers of de novo Isolation and Characterization

The use of molecular markers has improved the ecological and evolutionary research in the case of clonal reproduction species, allowing the identification of boundaries among clonal modules (ramets), genetic individuals (genets), and populations. Microsatellite markers were developed for Stenocereus gummosus, a columnar cactus with both sexual and clonal reproduction which is distributed in the Sonoran Desert, Mexico. 454-Pyrosequencing reads were analyzed to detect microsatellite markers. Forty primer pairs were screened to look for polymorphism. Nine loci were genotyped for two S. gummosus localities. Sampling strategy was intended to avoid collecting a genetic individual more than once, considering that clonal architecture for this cactus was previously deduced as clumped. Polymorphic loci exhibited low allele count, ranging from 2 7 (mean of 3.78 ± 0.62 SE); mean heterozygosity values were 0.221 and 0.234 HO and 0.408 and 0.306 HE, with FIS of 0.383 and 0.299, for peninsular and continental localities respectively. Unexpectedly, some multilocus genotypes were found repeated within locality, which were assumed as clones since data was evaluated as sufficient (clonal richness R of 0.966 and 0.897). These results were different from those previously reported: the distribution of clones might as well be intermingled, having a minimum ramet dispersion distance of 30 m. This characteristic was also consistent with Corresponding author. O. A. Lozano Garza et al. 2 the recent colonization proposed for this and other arid lands plants. A wider genetic neighborhood, due to clone dispersion might affect diversity indexes while increasing the chance of geitonogamy and mating among relatives. The markers isolation and its population characterization allowed addressing new questions about S. gummosus ecology, clonal reproduction and reproductive biology.


Introduction
The use of molecular markers has increased the opportunity to address ecological and evolutionary questions about complex lifestyle organisms.Models originally developed for exclusively sexual reproduction life forms are currently implemented for molecular research on organisms capable of clonal reproduction [1].An adequate identification of clonal membership is a crucial step for studying populations of clonal organisms, and a task for which highly polymorphic microsatellites have shown to gather enough statistical power [2].
Sexual reproduction is believed to be more important than clonal propagation in columnar cacti [3], despite of several mechanisms of clonal fragmentation and growth have been described for this group [4]- [7].Stenocereus gummosus (Engelm.ex.K. Brandegee) A.C.Gibson & K.E.Horak, is not an exception, since the energy expenditure for flower and fruit development, along with its highly viable seed production, is quite remarkable [8] [9], in contrast to the low seed setting registered for S. eruca [10], a species derived from S. gummosus [6].
Stenocereus gummosus is a columnar cactus with shrubby growth form of decumbent branching stems (Figure 1(a) and Figure 1(b)) [11], and it's notable as a co-dominant species in most plant communities in the Baja California Peninsula.The species is endemic to the Sonoran Desert [9].It exhibits a disjunct distribution between Baja California Peninsula and a restricted coastal area of mainland Mexico, with presence in the Gulf islands [12].Sphingidae moths and small beetles of the Nitidulidae and Cantharidae families mainly pollinate S. gummosus self-incompatible nocturnal flowers (Figure 1(c) and Figure 1(d)) [9] [13].
Previous allozyme evaluations found heterozygote deficiency under Hardy-Weinberg expectation, for populations throughout the distribution of S. gummosus [13], with mating among relatives, genetic subdivision, homozygote advantage, clonal structure, and geographic variations in the mating system, were proposed as possible causes for large inbreeding coefficient (F IS ) levels; and of these hypotheses, only the clonal structure have been directly evaluated.The proportion of clonal growth was determined as less frequent than the one from sexual recruitment [14], with R ranging from 0.633 to 0.959 (R estimated according to Dorken and Eckert [15]).This spatial distribution of identical genotypes was only explored for the population with the highest proportion of unique genotypes, where the distance between ramets was less than 9 meters, describing a clumped distribution of the clonal units, according to Charpentier [16].
Nonetheless, it has been pointed out that the clonal architecture may have strong impact on geitonogamy (pollination among flowers of the same plant either from the same or different ramet) [4] [16], a characteristic that is difficult to discriminate from outcrossing events among relatives [17].Both of these reproductive events may affect F IS and interpretations about genetic population structure.
When clumped clonal architecture occurs for species with self-incompatible flowers, a sampling strategy can be designed to avoid collecting ramets of the same genetic individual, taking advantage of multilocus genotype (MLG) and clonal patch extension.With this strategy, the effect of geitonogamy can be excluded from F IS , which may result exclusively from the effect of outcrossing among relatives, provided that the species is selfincompatible and that the distance among the samples collected exceeds the genetic neighborhood size driven by pollinator's dispersal limitation [4] [7] [18] [19].
Besides S. gummosus, other columnar cacti's clonal identification has been accomplished through allozyme and RAPD markers (e.g.Lophocereus schottii, Ferocactus robustus, S. eruca) [4] [7] [20].These markers have been criticized because of their low resolution for identifying clonal membership, having low polymorphism as a compromising characteristic [2].Parra et al. [21] determined the clonal identity of Stenocereus pruinosus individuals, using 4 cross amplification microsatellite markers originally described for Polaskia chichipe [22].Al- though these markers may be adequate for the evaluation of clonal identity, their available number and its allelic diversity may not provide enough statistical power to that end [23].This issue could be solved by designing species or genus specific microsatellite markers.The focus of this study is to isolate and describe specific hypervariable neutral markers and to test its statistical power for clonal membership analysis.This will provide the possibility to assess the causes of the high F IS, which are previously found by allozyme markers in Stenocereus gummosus.These loci will be useful for further genetic diversity estimations, clonal group determination through MLG, and the measure of population structure in S. gummosus and other members of the genus.

Plant Material and Sample DNA Preparation
A sample of 40 S. gummosus individuals from Bahía Sargento (29.212369˚N, 112.202189˚W), and 30 individuals from Piedra Blanca (28.229661˚N, 113.159389˚W) was collected for population characterization.The first locality was chosen nearby the one from previous analysis for spatial clonal distribution by Molina Freaner and Clark-Tapia [14], allocated on mainland Mexico; the second locality was intended to represent the Peninsula populations, which may be reproductively isolated from the Sonora ones since the Gulf of California permanently separates them.Each sampled individual was distanced on-site at least for 30 -35 m from the rest, leaving the unsampled individuals in between.Assuming in a clumped clonal architecture, as found by Molina Freaner and Clark-Tapia [14], each individual sample might belong to a different genet.The collected individuals describe two inverted semicircles to opposite sides of each coordinate previously specified.Sampled tissue fragments were taken from the stem chlorenquima to be stored in 96˚ ethanol on 2 mL tubes, and deposited at the tissue collection of the Laboratorio de Genética para la Conservación (CIBNOR, La Paz, BCS, Mexico).The source specimens were left living on-site, for which no voucher specimen was collected.
Genomic DNA was extracted using an optimized protocol of de la Cruz et al. [24].A sample of 50 ng of RNA-free, high quality DNA from an individual collected at El Comitán (24.135385˚N, 110.42504˚W); voucher specimen HCIB27832 deposited at the Herbarium of the CIBNOR (La Paz, Baja California Sur, Mexico) was used for shotgun sequencing by using a 454 GS-FLX Titanium instrument (Roche Applied Sciences, Indianapolis, Indiana, USA) at the UCSC Genome Sequencing Center (Santa Cruz, California, USA); other seven barcoded species were included in the same plate.

Microsatellites Identification and Primer Design
Msatcommander [25] was used to filter sequences among 454 reads containing perfect microsatellites with 2 -6 bp repeat units, counting at least five repeats for di-nucleotides and four for tri-, tetra-, penta-, and exa-nucleotides.Primers were designed for filtered microsatellite hit results using Primer3 software [26] embedded in QDD [27] when hits fulfilled the following parameters: (1) 50˚C -70˚C melting temperatures with a 2˚C maximum difference between paired primers; (2) PCR product of 90 -320 bp in length; (3) GC content > 40%; and (4) A primer length between 17 and 27 nucleotides.Default parameter values were maintained in Primer3 for primer self-complementarities and complement between pairs.Favoring larger motifs (repetition numbers) and lower melting temperature differs between primers [28], 40 primer pairs were chosen for PCR amplification screening and polymorphism test.The names for the loci were designated with the prefix "Sgum" and a consecutive number (1 -40).Polymorphic loci are listed in Table 1, describing microsatellite motifs, PCR conditions, and fluorescent dye-labels (Applied Biosystems, Foster City, California, EUA) incorporated to forward primer sequences.

PRC Procedures
PCR reactions were carried out in 10 μL volume, using an iCycler thermocycler (BIORAD Laboratories, Hercules, California, USA), containing approximately 45 ng of genomic DNA, 1 × PCR buffer (20 mM Tris-HCl, pH 8.4, 50 mM KCl), 0.25 mM of each dNTP, 0.4 μM of each primer, 100 mM of BSA (New England Biolabs, Beverly, Massachusetts, USA), and 0.3 -0.4 units of TaqDNA polymerase (Invitrogen, Carlsbad, California, USA), testing 1.5, 2.0 and 2.5 mM concentrations of MgCl 2 ; reactions for Sgum24 also included 5% glycerol.Forward primers for polymorphic loci were fluorescent-labeled (Table 1).Cycling conditions were conducted as Table 1.Summary data for 9 polymorphic microsatellite loci amplified by PCR for Stenocereus gummosus: forward (F) and reverse (R) primer pairs, fluorescent dye-labeled forward primer, product and repeat sizes as found on 454 reads, and repeat motif.The allelic range found (Range), optimized magnesium chloride concentration (MgCl 2 ), and annealing temperatures (Tm) are indicated for each locus.follows: 95˚C for 5 min as initial denaturation, 35 cycles at 94˚C for 60 s each, 35 s at the locus-specific annealing temperature (Table 1), extension at 72˚C for 15 s, and a final extension at 72˚C for 15 min.Amplification products were visualized in ABI 310-3730 PRISM ® (Life Technologies, Carlsbad, California, USA), using LIZ600 as size standard (Applied Biosystems).

Data Analysis
Allele assignation was automated with GeneMapper 4.1 (Applied Biosystems).The minimum peak height acceptance was set to 100 Relative Fluorescence Units (RFU), and the minimum height ratio between peaks for heterozygotes to one third.All peaks assigned were manually checked for allele shape (Figure 2).To know if the set of loci genotyped provides enough power to discriminate all MLG present in the sample, a Monte Carlo procedure, resampling 5000 permutations, incrementing loci number for each locality was implemented in GENCLONE 2.0 [23] [29], and its corresponding results were plotted using the R package ggplot2 [30].Clones were identified among samples by analyzing the MLG composition, calculating the genotypic richness index (R) and identifying one-allele difference MLG.Both identical MLG and one-allele difference MLG (23 individuals) were confirmed by PCR re-amplification and visualization as described above to reduce the probability of scoring errors.Of the individual genotypes, 10% were reassessed starting from DNA extraction to approach an error rate.The probability of identity (p id ) trough all loci (average probability that two unrelated individuals, drawn from the same randomly mating population, will by chance have the same multilocus genotype) was estimated using GenAlEx 6.5 [31].Repeated MLG samples were assumed to correspond to ramets from the same individual, thus excluding repetitions for the following estimations.
Presence of null alleles, scoring errors due to stuttering and allelic drop out was explored implementing MICRO-CHEKER software [32].Frequency for null alleles was computed through 5000 bootstrap resampling steps over loci for global fixation index (F ST ) values by the Expectation Maximization algorithm [33] with FreeNA software [34].Effective alleles number (A E ) and inbreeding coefficient (F IS ) were estimated with standard methods implemented in GenAlEx 6.5 [31].Expected and observed heterozygosity (H E and H O , respectively), deviations from the Hardy-Weinberg equilibrium, and linkage disequilibrium between markers, were calculated or tested using default parameters in GENEPOP 4.2 [35].Benjamini & Hotchberg [36] False Discovery Rate correction was used for multiple hypothesis tests.

Microsatellite Identification, Primer Design and PCR Screening
Genomic sequencing has been chosen over other techniques for microsatellite markers development due to the high sequencing throughput and the possibility of selecting microsatellites and motif length.The resulting number of markers and the polymorphic loci proportion are expected to be higher with this technique [37].Unfortunately, no previous information of any microsatellite development technique for neither the Stenocereus gummosus species nor genus is available to compare against the results reported here.
The 454 pyrosequence reaction resulted in 42,554 reads with a mean length of 311.11 nucleotides, where 1539 (3.6%) contained a microsatellite motif, regardless of their suitability for primer design.A total of 179 reads (11.6% of microsatellite loci found) were suitable for primer design.Out of the 40 chosen loci for PCR and polymorphism screening, 28 were monomorphic, 10 were polymorphic, and 2 exhibited inconsistent amplification patterns.Only polymorphic loci with more than 50% of the genotypes successfully recorded (Table 1) were used for population characterization.The polymorphic loci proportion (30%) was greater than reported for genomic libraries (15% and 21%, Table 2), but lower than enriched genomic libraries (53%, 91% and 70%, Table 2) and cross amplification (57%, Table 2); these differences may rely on the species reproduction system, demographic issues, mutation rates on microsatellite loci, specific genomic composition, or even on the markers isolation technique implemented [37] [38].

Population Characterization
Micro-Checker software [32] detected homozygote excess in 4 loci for Piedra Blanca locality (Sgum05, Sgum12, Sgum31 and Sgum36) and in 3 loci for Bahía Sargento (Sgum05, Sgum31 and Sgum36), scoring error due to stuttering were only predicted to happen in both localities for Sgum05 and Sgum31, and no signal of allelic dropout was detected.The null alleles frequencies calculated for two loci within Piedra Blanca (0.33 for Sgum05 and 0.27 for Sgum31), and one for Bahía Sargento (0.21 for Sgum05), are likely to hinder population differentiation analysis with F ST , and may overestimate F IS [34]; the rest of the calculated frequencies were left  a, b, c, d,  e and S. gummosus) and one population structure (f) studies, all them using microsatellite loci on Cactaceae species.Species, tribe, geographic region where the study has been carried out, the technique use for marker isolation, the number of populations analyzed (Populations), and the sample size (Sample) are specified.[41]; c from Solórzano et al. [44]; d from Pérez et al. [43]; e from Otero-Arnaiz et al. [22]; f wild populations from Parra et al. [21].

Species
aside since they were very small (mean frequency of 0.069 ± 0.02 S.E.).When PCR amplification and product visualization were repeated, 5 alleles assignment were modified for the loci Sgum05, Sgum06 and Sgum24 (error ratio of 0.018 accounting for 270 PCR reactions repeated); these scoring errors were always identified as mistakes at the electrophoresis peak reading for allele size assignment, which allowed improving the automated analysis.An increment on sample size [39], or a PCR primers redesign based on individuals' sequences for those loci [40], could be useful to avoid bias due to null alleles on further population genetics studies.In this case, for which the goal was to determine clonal membership, achieving precision and reproducibility on genotyping lets know that the loci are useful, provided that null alleles and stuttering behave evenly at the population level.Significant linkage disequilibrium was detected between Sgum05 and Sgum31 when analyzed globally over both localities individuals.This condition persisted only in Piedra Blanca when sample was analyzed by locality.The number of alleles per locus (A) among polymorphic loci ranged from 2 -7 (Table 3), H O ranged from 0.000 (Sgum25 in Piedra Blanca, and Sgum05 in Bahía Sargento) to the maximums of 0.552 and 0.556 (Sgum06), and H E from 0.034 (Sgum39) and 0.055 (Sgum25) to 0.673 (Sgum36) and 0.557 (Sgum06) (Table 3), with mean values of 0.221 and 0.234 for H O and 0.408 and 0.306 for H E , in Piedra Blanca and Bahía Sargento respectively.The mean F ST was 0.082 ± 0.39 S.E.Significant deviations from Hardy-Weinberg Equilibrium (HWE) were detected in 4 loci in Piedra Blanca (Sgum05, Sgum12, Sgum31 and Sgum36), and three in Bahía Sargento (Sgum05, Sgum31 and Sgum36); deviations are due to heterozygote deficiency in all cases (Table 3), but this departure from HWE may as well be null alleles artifact, as predicted by Micro-Checker software.

Clonal Membership and Geitonogamy
Both diversity indexes, mean allele number for polymorphic loci (A = 3.78 ± 0.062 SE), and expected and observed heterozygosity (H O = 0.228 ± 0.047 SE, H E = 0.357 ± 0.048 SE), reached the lowest value when compared to other cacti (Table 2), having mean values (±SD) of A = 5.05 ± 1.84, H O = 0.461 ± 0.153, and H E = 0.559 ± 0.138 [21] [22] [41]- [44].These low diversity indexes are intuitively explained in terms of inbreeding coefficient on a low effective population size (N e ) scenario [45], coherent to the F IS of 0.608 ± 0.025 SE previously reported [13], albeit inconsistent with the Molina Freaner and Clark-Tapia [14] conclusion, where sexual recruitment is found as more frequent than clonal growth.The average F IS here reported (0.341 ± 0.122 SE) may be less biased by selection than allozyme results [46], but null alleles can still be artificially increasing it in a small magnitude in respect to the actual inbreeding [47].Though attenuated, the inbreeding coefficient is yet remarkable, so that reproduction system aspects like pollination syndrome, mating among relatives and the possibility of geitonogamy, cannot be discarded as causes for these low diversity indices.
The MLG resampling procedure shown that 7 loci were enough to determine clonal membership for Piedra Blanca locality (Figure 3(a)), but 8 or 9 loci might be required in the case of Bahía Sargento (Figure 3(b)); the p id values of each locality accounting the 9 loci were 2 × 10 −4 and 2.6 × 10 −3 , respectively.Piedra Blanca accounted 29 unique genotypes with R = 0.966, and 36 unique genotypes were detected for Bahía Sargento with R = 0.897.These values slightly increase the ones found by Molina Freaner and Clark Tapia [14] with allozyme markers, where the maximum was R = 0.959 for a locality nearby Bahía Sargento from this study.One-allele difference MLG within locality were found only for Bahía Sargento, with 12 individuals into at least 2 putative multilocus lineages (clonal ramets carrying a somatic mutation identified as one-allele difference) [23], where 2 of the 3 repeated MLG of that locality are members.This finding provides a scenario that is supposed to be eliminated by collecting individuals distanced by at least 30 meters, in which ramets are now somehow able to spread away from the source plant, farther than it has been previously found [13].
Identical MLG of Lophocereus schottii individuals, distanced in field by 200 or more meters, have been defined as long-distance transportation events, where detached stems are dispersed downstream by floodwaters [7].This clone dispersion mechanism is likely to occur along the distribution range of S. gummosus (shared with L. schotti), species mostly found on alluvial, arroyo margins and rocky hillsides [8] [12].The latter results in an intermingled distribution of the clones and a subsequent increase of the genetic neighborhood [7], where ramets disperse not only to the space surrounding the source specimen but, in this case, also for at least 30 meters away.Intermingled clonal architecture is a characteristic that gives a clonal plant the capability to spread more quickly [48].It has been proposed that L. schottii and S. gummosus populations along the Baja California Peninsula have occupied their current distribution range through recent colonization events, reaching such large areas [13] [49].
This scenario implies that no discrimination can be made about the effect of geitonogamy from that of endogamy, because the genetic neighborhood size was not exceeded by the sampling strategy followed [17] [19].Though the impact of geitonogamy is expected to be less severe as clones dispersion capability increases on insect pollinated plants [16], it may increment the probability of mating among relatives as it increases neighborhood size.
A population analysis of wider geographical range, employing these markers, might be useful to enhance accuracy for linkage disequilibrium tests.These 9 isolated microsatellite loci are the first identified and characte- rized ones for the genus Stenocereus.The availability of these molecular tools allows assessing questions integrating population genetics, ecology, conservation, and reproductive biology for S. gummosus and for other species phylogenetically related.

Figure 3 .
Figure 3. Boxplot describing the genotypic resolution for the 9 loci characterized in each population data set: (a) Piedra Blanca; and (b) Bahía Sargento.The box edges indicate the minimum and the maximum, while the inner line shows the mean number of distinct MLG found for x loci.

Table 2 .
Number and percentage of polymorphic loci (Polym.),genetic diversity measured as mean allele number per locus (A), observed (H O ) and expected (H E ) heterozygosity, ± its respective Standard Error (SE) of six characterizations (

Table 3 .
Total number of samples analyzed (N) and number of alleles per locus (A) found through both populations are listed first; population sample size (n), effective alleles number (A E ), inbreeding coefficient (F IS ), observed heterozygosity (H O ) and expected heterozygosity under Hardy-Weinberg Equilibrium (H E ) at Piedra Blanca and Bahía Sargento are specified.The H E and F IS estimations do not apply for loci with only 1 allele, depicted with (-).