A Simplified Approach for Olive (Olea europaea L.) Genotyping and Cultivar Traceability

The Tubulin Based Polymorphism (TBP) method was used to genotype olive cultivars of different origins and to produce short, cultivar-specific molecular probes. Both the first and the second intron of the members of the olive β-tubulin gene family were exploited as sources of DNA polymorphism. Compared with the data obtained using a set of 11 SSR markers selected from an Olea europaea L. database, TBP is shown to provide similar, if not better, information about the polymorphic content of olive genomes, while also yielding a simple and discriminatory DNA barcode specific to each of the analyzed cultivars. This barcode is the source for the preparation of variety-specific molecular probes.


Introduction
Native to the Mediterranean basin, olive (Olea europaea L.) is one of the most ancient and valued agricultural trees, for both cultural and economic reasons. The present richness and diffusion of olive germplasm reflect, on the one hand, the longevity of the tree and its limited breeding improvement and, on the other, a mode of propagation that, being largely based on cutting and grafting, has promoted a vast and multidirectional dispersal of the plant's genetic material. This worldwide distribution has ultimately contributed to substantial mixing of genotypes, as clearly documented by the high level of heterozygosity found in well-investigated olive collections such as that of the USDA National Clonal Germplasm Repository (NCGR-USA). This type of propagation has always made it hard to retrace distribution events and to agree upon a consensus classification of cultivars [1]. In addition, the socio-economic changes of recent decades have driven significant technological improvements in olive cultivation, increasing the risk of genetic erosion of local olive germplasm through the selection of a few cultivars more suitable for intensive cultivation systems [2]. For these reasons, the identification and characterization of olive germplasm is a high-priority challenge if the great genetic diversity of the worldwide-distributed olive resources is to be appreciated. In fact, several studies performed with different sets of molecular markers, typically SSRs, have had considerable difficulty in providing consistent information about the extant genetic variation and reciprocal genetic relationships, as well as in assigning cultivars to their original area of cultivation [1]-[6].
Unlike what has been achieved for the classification of grape germplasm [7], and despite commendable efforts by different laboratories, an internationally shared set of SSR markers capable of high discrimination of olive cultivars has been neither conclusively defined nor accepted. Because olive SSRs are essentially based on dinucleotide repeats, irregular series of alleles are often observed at several loci owing to the presence of unexpected compound motifs, as revealed by nucleotide sequencing [8]. This disrupts the regular stepwise increments between alleles and causes discrepancies in allele sizes that make it difficult to group the data into reliable bins. To minimize this effect and make data comparison between laboratories feasible, allelic ladders obtained by sequencing reference cultivars are required [1]. Like the reference alleles used in grape, allelic ladders help to define a homogeneous reference system. Even so, a robust and widely shared cluster analysis of olive cultivars is still missing. Very recently, a new approach using 10 standard simple sequence repeats (SSRs) and 9 olive expressed sequence tag markers (OLEST) based on trinucleotide EST-SSRs has been proposed as a reference combinatorial method of analysis that could overcome the well-known limits of olive genotyping based on dinucleotide SSR motifs [9].
Because of these uncertainties, the laboriousness of the proposed methods and the lack of agreement on commonly shared molecular markers, we assessed the feasibility of the TBP (Tubulin-Based Polymorphism) approach for the genotyping and classification of olive germplasm. TBP is a nuclear molecular marker with high discrimination power. It uncovers DNA length polymorphisms within the two introns that occupy conserved positions in the coding region of every member of each plant β-tubulin gene family. TBP is highly effective for the molecular authentication and genomic profiling of any plant species to which a distinct Tubulin-based Barcode (TbB) can be assigned. Depending on the mode of crossing and propagation, TBP can also be very effective in identifying varieties and hybrids, as already reported for grape and passionflower [10] [11]. Given the genetic history of olive and its mode of propagation, which has carried a substantially preserved, ancient germplasm into modern times, it was worth verifying whether TBP could provide a new, alternative, easy and reliable tool for characterizing the existing genomic variability. Accordingly, we used TBP to genotype 15 olive varieties of Greek, Italian and Spanish origin and compared the results with those obtained using 11 published microsatellite markers retrieved from the OLEA database: http://www.oleadb.it/ [12]. Data on DNA barcoding, phylogenetic relationships and the production of variety-specific molecular probes are reported.

Plant Material and DNA Extraction
In the present study, 15 olive cultivars of different origin were analysed (Table 1): 10 Greek accessions were provided by the Agricultural University of Athens and the remaining five were obtained from the private collection of the F.lli Buccelletti Nurseries in Tuscany.
Young fresh leaves were ground with mortar and pestle in liquid nitrogen. Genomic DNA (gDNA) was isolated from 100 mg of fine powder using the DNeasy Plant Mini Kit (Qiagen) according to the standard procedure. For each accession, gDNA was extracted individually from three distinct leaves and then bulked to obtain a representative sample. DNA quality and quantity were evaluated both spectrophotometrically and by Qubit™ fluorometric quantitation (Thermo Fisher Scientific). Genomic DNA samples were stored at −20 °C.

SSR
Eleven microsatellite loci were selected from three groups of published SSR primers, DCA [13], GAPU [14] and EMO [15]; all but one (DCA 4) are among those included in the OLEA database. Twenty nanograms of gDNA were used as template for each PCR, following the protocol described by Huang et al. [16] for the amplification of microsatellite fragments with a fluorescently labelled M13 tail, so that the products could be separated by capillary electrophoresis on the Applied Biosystems® 3500 Genetic Analyser (Thermo Fisher Scientific). M13-tailed primers were synthesized and labelled with one of three fluorescent dyes: 6-FAM, VIC or NED (Thermo Fisher Scientific). The use of fluorescent dye-labelled SSR markers provided greater precision in allele resolution and sizing. Sample preparation and capillary electrophoresis conditions have been described previously [17].
Microsatellite data were analysed with GeneMapper v5.0 software (Thermo Fisher Scientific), then manually checked and rescored as necessary. A negative (no-template) PCR control was included in each SSR reaction. All amplifications were repeated at least twice to ensure the consistency of the analysis.

TBP Amplification and Capillary Electrophoresis (CE)
Amplification of the TBP 1st and 2nd introns was performed essentially according to Breviario et al. [18], except that both forward primers were labelled at their 5' end with the FAM fluorophore (Thermo Fisher Scientific), as described by Gavazzi et al. 2012 [17].
Thirty ng of genomic DNA were used as a template for PCR amplification.

Data Analysis
TBP and SSR peaks (alleles) obtained for the different olive accessions were scored and converted into binary data by assigning 1 for presence and 0 for absence.
The genetic similarity matrices were estimated with the NTSYSpc 2.1 software [19] using Dice's coefficient, and a Principal Coordinate Analysis (PCoA) was performed with the DCENTER and EIGEN procedures to explore associations among accessions. The Mantel test [20], computed with the COPH and MXCOMP procedures (NTSYSpc 2.1 software), was used to assess the level of agreement between the TBP and SSR similarity matrices.
Polymorphism Information Content (PIC) values were calculated for both marker systems using the formula of Anderson et al. [21]. In addition, for the SSR markers, the number of alleles per locus (k), observed heterozygosity (HObs), expected heterozygosity (HExp) and the frequency of null alleles were computed with Cervus 3.0.7 [22].
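The scoring step can be illustrated with a minimal Python sketch. It assumes that peak sizes have already been binned, so that the same allele carries the same integer size (bp) in every cultivar; the function name `score_binary`, the cultivar labels and the sizes are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def score_binary(profiles: Dict[str, Set[int]]) -> Tuple[List[int], Dict[str, List[int]]]:
    """Turn per-cultivar allele (peak) sizes into a 1/0 presence-absence matrix.

    Columns are the union of all alleles seen in any cultivar, sorted by size (bp).
    """
    alleles = sorted(set().union(*profiles.values()))
    matrix = {
        name: [1 if size in found else 0 for size in alleles]
        for name, found in profiles.items()
    }
    return alleles, matrix


# Hypothetical binned peak sizes (bp) for three cultivars.
profiles = {"CvA": {412, 530, 611}, "CvB": {412, 598}, "CvC": {530, 598, 611}}
alleles, matrix = score_binary(profiles)
print(alleles)        # [412, 530, 598, 611]
print(matrix["CvB"])  # [1, 0, 1, 0]
```

Each row of `matrix` is one accession's binary profile, the input expected by the similarity calculations described next.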
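For readers without NTSYSpc, the two core computations can be sketched in plain Python: Dice's coefficient, S = 2a/(2a + b + c), where a counts bands shared by two accessions and b + c counts bands present in only one of them, and a Mantel-type comparison of two similarity matrices. This is an illustrative sketch, not a reimplementation of the COPH/MXCOMP procedures; the permutation p-value is a generic choice and NTSYSpc's own significance test may differ. All names are hypothetical.

```python
import random
from typing import List, Sequence


def dice(x: Sequence[int], y: Sequence[int]) -> float:
    """Dice similarity of two 0/1 profiles: 2a / (2a + b + c)."""
    a = sum(1 for u, v in zip(x, y) if u and v)        # bands present in both
    b_plus_c = sum(1 for u, v in zip(x, y) if u != v)  # bands present in only one
    denom = 2 * a + b_plus_c
    return 2 * a / denom if denom else 0.0  # two empty profiles: treated as 0.0 here


def similarity_matrix(rows: List[Sequence[int]]) -> List[List[float]]:
    return [[dice(r, s) for s in rows] for r in rows]


def _pearson(xs: List[float], ys: List[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mantel(m1, m2, permutations: int = 999, seed: int = 0):
    """Matrix correlation r of two symmetric matrices and a one-sided permutation p-value."""
    n = len(m1)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]  # upper triangle only
    v1 = [m1[i][j] for i, j in pairs]
    r_obs = _pearson(v1, [m2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of m2 together
        v2 = [m2[order[i]][order[j]] for i, j in pairs]
        if _pearson(v1, v2) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical binary profiles for four accessions.
tbp = similarity_matrix([[1, 1, 0, 1], [1, 0, 1, 0], [0, 1, 1, 1], [1, 1, 1, 1]])
r, p = mantel(tbp, tbp, permutations=99)
print(round(r, 3))  # 1.0 (a matrix compared with itself)
```

In practice the second argument to `mantel` would be the SSR similarity matrix computed over the same accessions, in the same order.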
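The per-locus statistics can be sketched as follows, assuming SSR genotypes are stored as pairs of allele sizes. The PIC follows the Anderson et al. form, 1 − Σ p², and HExp uses Nei's unbiased n/(n − 1) correction; whether Cervus 3.0.7 reports exactly this estimator should be checked against its documentation. Names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]  # two allele sizes (bp) at one SSR locus; equal sizes = homozygote


def allele_freqs(genotypes: List[Genotype]) -> Dict[int, float]:
    counts = Counter(size for g in genotypes for size in g)
    total = sum(counts.values())
    return {size: c / total for size, c in counts.items()}


def pic_anderson(freqs: Dict[int, float]) -> float:
    """Anderson et al. PIC = 1 - sum(p_j^2) over the alleles j of one locus."""
    return 1.0 - sum(p * p for p in freqs.values())


def h_obs(genotypes: List[Genotype]) -> float:
    """Observed heterozygosity: fraction of heterozygous individuals."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def h_exp(genotypes: List[Genotype]) -> float:
    """Unbiased expected heterozygosity: n/(n-1) * (1 - sum p^2), n = 2N sampled alleles."""
    n = 2 * len(genotypes)
    return n / (n - 1) * (1.0 - sum(p * p for p in allele_freqs(genotypes).values()))


# Hypothetical locus scored in four individuals.
locus = [(150, 150), (150, 156), (156, 162), (162, 162)]
print(pic_anderson(allele_freqs(locus)))  # 0.65625
print(h_obs(locus), h_exp(locus))         # 0.5 and ~0.75
```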

PCR Probe Design and Amplification Test
Intron nucleotide sequence information was obtained by cloning and sequencing the amplification products of the TBP 2nd intron from both the "Frantoio" and "Arbequina" varieties. The isolated target sequences were aligned with Vector NTI

TBP Effectively Discriminates among Different Olive Varieties
The TBP method, based on the detection of intron length polymorphisms
The tubulin identity of the amplified fragments was verified by nucleotide sequencing, and this information was further exploited to prepare a specific molecular probe (see below). On average, the 1st intron of the olive β-tubulin members, proximal to the 5' end of the gene, was longer and more broadly distributed in size than the 2nd intron. First intron sizes range from about 80 bp to more than 1000 bp, after subtracting the 305 bp of amplified flanking exon sequence. By contrast, the vast majority of the bands amplified from the 2nd intron are short, less than 100 bp in length. The number of detected alleles is 26 for the 1st intron and 30 for the 2nd, with an overall mean of 28 (Table 2). The 1st intron amplification yielded the higher percentage of polymorphic alleles (80%) despite its lower number of peaks, whereas 63% of the alleles amplified from the 2nd intron were polymorphic. The lower number of peaks detected from the 1st intron reflects the upper size limit of the CE analysis, which cannot resolve products larger than 1400 bp. The different size distributions of the alleles clearly suggest that the two groups of introns have evolved independently of each other. The polymorphism information content (PIC) calculated for the 1st and 2nd introns is also high (0.973), reflecting good discrimination power among the 15 analyzed genotypes. Repeatability and reproducibility were ascertained by verifying the consistency of the barcodes across multiple independent extractions and analyses. In one case, 100 distinct accessions of the same variety, "Arbequina", showed an identical TBP profile (data not shown).
As shown in Figure 2, the Principal Coordinate Analysis (PCoA) based on the combined TBP data from both introns explains 51% of the total variation among the analyzed samples. With the exception of "Adramitini", the plot reveals three major groups that correspond to the different geographical origins of the analyzed cultivars, Greece, Spain and Italy (black, green and red circles, respectively, in Figure 2).
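Two derived quantities in this paragraph, intron length after removing the flanking exon sequence and the percentage of polymorphic bands, reduce to simple arithmetic. A minimal sketch, assuming a band counts as polymorphic when it is present in at least one but not all cultivars (the text does not spell out the criterion); the 305 bp value is taken from the text, while the function names and data are hypothetical:

```python
from typing import Dict, List

# 305 bp of flanking exon sequence in 1st-intron amplicons, as stated in the text;
# the corresponding value for the 2nd intron is not given here, so it is a parameter.
EXON_FLANK_1ST_INTRON_BP = 305


def intron_length(amplicon_bp: int, exon_flank_bp: int = EXON_FLANK_1ST_INTRON_BP) -> int:
    """Intron length = amplicon size minus the co-amplified exon boundaries."""
    return amplicon_bp - exon_flank_bp


def percent_polymorphic(matrix: Dict[str, List[int]]) -> float:
    """Share of bands present in some, but not all, cultivars."""
    columns = list(zip(*matrix.values()))  # one tuple per band
    polymorphic = sum(1 for col in columns if 0 < sum(col) < len(col))
    return 100.0 * polymorphic / len(columns)


print(intron_length(385))  # 80
# Hypothetical 0/1 matrix: the first band is present everywhere, the other two are not.
print(percent_polymorphic({"CvA": [1, 1, 0], "CvB": [1, 0, 1], "CvC": [1, 1, 1]}))  # ~66.7
```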
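How a PCoA yields per-axis percentages such as the 20%, 16% and 15% quoted for Figure 2 can be sketched with NumPy. This is classical (Gower) PCoA under the assumption that distances are taken as 1 − Dice similarity; NTSYSpc's DCENTER/EIGEN route may transform the similarity matrix differently, so the numbers need not match exactly. The function name is hypothetical.

```python
import numpy as np


def pcoa(similarity: np.ndarray, n_axes: int = 3):
    """Classical (Gower) principal coordinate analysis of a similarity matrix.

    Returns (coordinates, percent of variation explained by each retained axis).
    """
    d = 1.0 - similarity                    # assumption: distance = 1 - Dice similarity
    a = -0.5 * d ** 2
    n = a.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    b = j @ a @ j                           # double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(b)    # b is symmetric; eigenvalues come out ascending
    order = np.argsort(eigvals)[::-1]       # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    total = eigvals[eigvals > 1e-12].sum()  # ignore zero/negative eigenvalues
    percent = 100.0 * eigvals[:n_axes] / total
    coords = eigvecs[:, :n_axes] * np.sqrt(np.clip(eigvals[:n_axes], 0.0, None))
    return coords, percent


# Hypothetical similarities for three accessions lying on a line (distances 0.3, 0.2, 0.5).
s = 1.0 - np.array([[0.0, 0.3, 0.5], [0.3, 0.0, 0.2], [0.5, 0.2, 0.0]])
coords, percent = pcoa(s, n_axes=1)
print(round(float(percent[0]), 1))  # ~100.0: collinear points, so one axis carries everything
```

With real data, the first three entries of `percent` would be the shares summed to obtain the 51% quoted above.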

SSR Data
To assess the reliability of the TBP-based data, DNA samples from the same 15 olive varieties were analyzed with a battery of 11 microsatellite markers previously developed for Olea europaea L. and included in the OLEA database (Table 3). Loci carrying both complex motifs and dinucleotide repeats were included. Deviations of up to ±1 bp between the estimated allele sizes and the published values were considered acceptable. All estimated size ranges fell within the expected published values.
Unlike the TBP analysis, 11 individual reactions were run, one for each marker. A total of 81 alleles were detected, with an average of 7.4 alleles per locus. Details of the analyzed SSR loci and of the detected polymorphisms are reported in Table 4.
Figure 2. Three-dimensional projection of the Principal Coordinate Analysis (PCoA) inferred from the TBP data (1st and 2nd intron). The three axes explain 20%, 16% and 15% of the total variation, respectively. The red circle includes Italian cultivars, the green circle cultivars from Spain, except for "Adramitini", and all other samples, from Greece, are highlighted by the grey circle.
The set of analyzed loci revealed good PIC values, ranging from 0.577 to 0.842 with a mean of 0.742, indicating an adequate level of informativeness, though lower than that obtained with TBP. The mean observed heterozygosity (HObs) was slightly lower than the mean expected heterozygosity (HExp). Accordingly, the estimated null allele frequency (f) was positive for each locus showing HObs < HExp. Null alleles can arise when mutations prevent primer annealing at the target sites, although this may not be the only explanation for the reduced heterozygosity observed: higher homozygosity could also result from variable levels of inbreeding in the analyzed varieties. The highest HObs values were found for the loci (GAPU103a, DCA18, DCA16 and GAPU71B) that are not assigned to any linkage group according to published mapping data [1]. This is consistent with the observation that only independent (non-co-segregating) loci can provide correct information about the genetic evolution and differentiation of a population.
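The link between a heterozygote deficit and a positive null allele estimate can be illustrated with Brookfield's (1996) simple estimator, r = (HExp − HObs)/(1 + HExp). This is a swapped-in textbook estimator used only for illustration; Cervus computes its null allele frequency with its own iterative method, so values will differ. The function name and the input values are hypothetical.

```python
def null_allele_brookfield(h_obs: float, h_exp: float) -> float:
    """Brookfield (1996) estimator 1: r = (HExp - HObs) / (1 + HExp).

    Positive whenever HObs < HExp (heterozygote deficit), negative for an excess.
    """
    return (h_exp - h_obs) / (1.0 + h_exp)


# Hypothetical locus values, not taken from Table 4.
for h_o, h_e in [(0.50, 0.75), (0.80, 0.70)]:
    print(h_o, h_e, round(null_allele_brookfield(h_o, h_e), 3))
```

The sign of the estimate follows directly from the sign of HExp − HObs, which is why every locus with HObs < HExp shows a positive f.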
A comparison between the SSR allele sizes registered in the OLEA database and those detected in our study, limited to the 5 varieties and 10 loci shared by both datasets (100 alleles), revealed an unmatched allele rate of 12% and an unmatched locus rate (both alleles not matching) of 6% (data not shown).
The three-dimensional PCoA projection based on the SSR data (Figure 3) explains 30% of the total variation among the analyzed cultivars.
Despite the absence of well-distinguishable groups, the plot displays a clear separation of the Italian cultivars along the second axis (red circles in Figure 3) and of the Spanish cultivars ("Arbequina" and "Arbosana", green circles in Figure 3) along the third. Specifically, these two Spanish cultivars share one allele at 9 of the 11 loci and both alleles at the remaining two, showing the highest similarity values.
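The two rates reported above can be computed as sketched below, assuming genotypes are keyed by (variety, locus) and applying the ±1 bp tolerance used for allele sizing. Names and data are hypothetical; with 50 variety-locus pairs (100 alleles), 12 unmatched alleles and 3 fully unmatched pairs would correspond to the reported 12% and 6%.

```python
from typing import Dict, Tuple

Genotype = Tuple[int, int]
Key = Tuple[str, str]  # (variety, locus)
TOLERANCE_BP = 1       # sizes within ±1 bp of the database value count as a match


def _close(a: int, b: int, tol: int) -> bool:
    return abs(a - b) <= tol


def allele_matches(obs: Genotype, ref: Genotype, tol: int = TOLERANCE_BP) -> int:
    """How many of the two observed alleles match the reference (0, 1 or 2).

    Both pairings are tried because allele order within a genotype is arbitrary.
    """
    (o1, o2), (r1, r2) = obs, ref
    straight = int(_close(o1, r1, tol)) + int(_close(o2, r2, tol))
    swapped = int(_close(o1, r2, tol)) + int(_close(o2, r1, tol))
    return max(straight, swapped)


def unmatched_rates(observed: Dict[Key, Genotype], database: Dict[Key, Genotype]) -> Tuple[float, float]:
    """(unmatched allele rate, unmatched locus rate) over pairs present in both sources."""
    shared = sorted(observed.keys() & database.keys())
    matches = [allele_matches(observed[k], database[k]) for k in shared]
    unmatched_alleles = sum(2 - m for m in matches)
    unmatched_loci = sum(1 for m in matches if m == 0)  # both alleles fail to match
    return unmatched_alleles / (2 * len(matches)), unmatched_loci / len(matches)


# Hypothetical observed vs. database genotypes.
obs = {("V1", "L1"): (150, 156), ("V1", "L2"): (200, 210), ("V2", "L1"): (148, 148)}
db = {("V1", "L1"): (151, 156), ("V1", "L2"): (204, 214), ("V2", "L1"): (148, 152)}
print(unmatched_rates(obs, db))  # allele rate 0.5, locus rate ~0.333
```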

Preparation of Variety-Specific Molecular Probes
An additional advantage of the TBP molecular marker and its amplification pattern is the possibility of producing variety-specific probes. Some of the PCR fragments obtained from the amplification of either the 1st or the 2nd intron may contain enough sequence information to allow the design of cultivar-specific probes. This is shown here for the development and use of a primer pair that specifically recognizes β-tubulin target sequences of the "Arbequina" variety. The primers were designed after isolating a 409 bp fragment, an amplification product of the second intron (red arrow in Figure 4).
Once this fragment had been isolated and sequenced, a primer pair was designed to target an internal region containing a 9-nucleotide insertion specific to "Arbequina", allowing the selective amplification of a 205 bp fragment (data not shown). The specificity of this probe was checked on DNA extracted from
A similar strategy could be adopted to prepare any other cultivar-specific probe by exploiting the presence of specific alleles in one of the two introns. Alternatively, if a specific allele is shared with other cultivars for one intron, an additional discriminating probe can be developed from the amplification pattern of the other intron. In this case, variety recognition will be achieved with a limited, cost-effective combination of probes.
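The candidate-selection logic described above, first looking for an allele private to one cultivar and otherwise combining bands from the two introns, can be sketched in Python. The band sizes and cultivar labels are hypothetical, and band presence is treated as a simple yes/no; an actual probe additionally requires sequence differences that primers can target.

```python
from typing import Dict, Optional, Set, Tuple

Profiles = Dict[str, Set[int]]  # cultivar -> band sizes (bp) for one intron


def private_alleles(profiles: Profiles) -> Dict[str, Set[int]]:
    """Bands found in exactly one cultivar: candidate targets for a cultivar-specific probe."""
    result = {}
    for name, bands in profiles.items():
        others = set().union(*(b for other, b in profiles.items() if other != name))
        result[name] = bands - others
    return result


def discriminating_pair(name: str, intron1: Profiles, intron2: Profiles) -> Optional[Tuple[int, int]]:
    """First (1st-intron band, 2nd-intron band) combination carried by `name` alone, if any."""
    for b1 in sorted(intron1[name]):
        for b2 in sorted(intron2[name]):
            carriers = [c for c in intron1 if b1 in intron1[c] and b2 in intron2.get(c, set())]
            if carriers == [name]:
                return b1, b2
    return None


# Hypothetical band profiles for three cultivars.
intron1 = {"CvA": {700}, "CvB": {700, 820}, "CvC": {820}}
intron2 = {"CvA": {410, 515, 600}, "CvB": {515}, "CvC": {410}}
print(private_alleles(intron2))                       # CvA has a private 600 bp band
print(discriminating_pair("CvB", intron1, intron2))   # (820, 515)
```

Here "CvB" has no private band in either intron, but the combination of its 820 bp 1st-intron band and 515 bp 2nd-intron band is unique, which mirrors the two-probe strategy in the text.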

Discussion
The lack of a commonly shared set of microsatellite markers that could be consistently used for olive cultivar genotyping, together with the laboriousness of many current approaches, led us to investigate the possibility of using the TBP method as an alternative, reliable and simplified approach. TBP is a molecular marker based principally on intron length polymorphism that can also be exploited for its inherent nucleotide sequence diversity. Used like any other set of molecular markers (mainly SSRs) developed and applied in olive so far [2] [5] [23], TBP can provide similar, if not better, information. Further advantages are the straightforward release of a simple and discriminatory DNA barcode (TbB) and the production of variety-specific probes that can also be used in olive oil traceability. Compared with SSRs, TBP is less laborious, requiring just one PCR reaction for each intron instead of the 11 used in our study, and shows better discrimination power, with a PIC value markedly higher than that obtained with the set of SSRs (0.973 versus 0.742). In addition, TBP does not depend on sequencing, except for probe production, and is not influenced by homoplasy or by nucleotide changes at the primer hybridization site, which can also contribute to discrepancies in the final results [1] [24]. Moreover, TBP requires neither reference nor ladder alleles created ad hoc, and it is easily transferable among species and varieties. Indeed, despite their widespread use, some SSR markers may be difficult to transfer among cultivars when they rely on compound motifs with irregular incremental steps [25]. In our case, we could not find the expected SSR-based allelic profile reported in the database in 6% of our analyses, which may also depend on differences in laboratory equipment and reagents, as described by Baldoni et al. [1].
American Journal of Plant Sciences
As already reported in the Results section, TBP may also offer an advantage over SSRs with respect to the possible co-segregation of target loci, because the different members of the β-tubulin gene family are typically scattered on different chromosomes, as can be seen from the data of several plant whole-genome projects. In this regard, it is also important to emphasize the differences in size and number of the alleles obtained from the amplification of the two β-tubulin introns. This indicates that whatever mechanism introduces nucleotide variability within the introns, often insertions and deletions (INDELs) due to transposable elements, acts independently on the two introns. The last feature of the TBP method that deserves a few lines of discussion is the preparation of specific molecular probes, in practice primer pairs capable of selectively amplifying cultivar-specific regions. Short amplicons should be favored because the probes may be used on DNA extracted from olive oil, a widely consumed product known to be subject to counterfeiting and fraud. For this reason, as we have shown for "Arbequina", fragments resulting from the amplification of the second intron should be preferred. Even so, they may not always correspond to unique alleles containing discriminatory target sequences. In a large collection of cultivars, an allelic variant and its target sequence may be shared by a few of them, especially among closely related varieties, but differences useful for diagnostic purposes may be found in additional amplicons originating from either of the two introns. Ultimately, databases storing the sequences of the β-tubulin introns of all olive cultivars will be a definitive and reliable resource for any kind of study and analysis.

Conclusions
In conclusion, we have shown that the TBP method is a promising experimental alternative for the easy and simplified classification of olive germplasm. Based on a single PCR reaction per intron, it can distinguish different olive varieties, assigning to each of them a specific DNA barcode. In addition, it allows the preparation of variety-specific molecular probes that could be used in traceability assays.
The amplicons were checked on a 2% (w/v) agarose gel stained with Atlas ClearSight DNA Stain (1 μg·mL⁻¹; Bioatlas). PCR products were diluted at two ratios (1:10 and 1:20); therefore, two distinct CE runs were performed for each analysed sample. Two microliters containing 100-200 pg of diluted FAM-labelled PCR product were mixed with 0.18 µl of GeneScan™ 1200 LIZ™ Size Standard (Thermo Fisher Scientific) and 17.82 µl of Hi-Di Formamide to a final volume of 20 µl. After denaturation at 95 °C for 5 min, the samples were loaded on the Applied Biosystems® 3500 Genetic Analyser. CE separation was performed on a 50 cm 8-capillary array with POP-7™ polymer (Thermo Fisher Scientific), using the following instrument settings: 60 °C, injection time 5 s, run time 1680 s, injection voltage 10 kV and run voltage 15 kV. A negative (no-template) and a positive (30 ng of Triticum durum L. gDNA) PCR control were included in all TBP reactions. All amplifications were repeated at least twice to ensure the consistency of the analysis.
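The loading arithmetic above (2 µl product + 0.18 µl size standard + 17.82 µl formamide = 20 µl per well, with each product run at two dilutions) can be scaled to a full run with a small helper. The 10% overage default and the function names are hypothetical conveniences, not part of the described protocol.

```python
PRODUCT_UL = 2.0              # diluted FAM-labelled PCR product per well
LIZ_UL = 0.18                 # GeneScan 1200 LIZ size standard per well
FORMAMIDE_UL = 17.82          # Hi-Di formamide per well
DILUTIONS = ("1:10", "1:20")  # each PCR product is run at both dilutions


def wells_needed(n_pcr_products: int) -> int:
    """One CE well per product per dilution."""
    return n_pcr_products * len(DILUTIONS)


def premix_volumes(n_wells: int, overage: float = 0.1) -> dict:
    """LIZ + formamide premix for n wells; the overage is a hypothetical pipetting margin."""
    factor = n_wells * (1.0 + overage)
    return {"LIZ_uL": round(LIZ_UL * factor, 2), "formamide_uL": round(FORMAMIDE_UL * factor, 2)}


assert abs(PRODUCT_UL + LIZ_UL + FORMAMIDE_UL - 20.0) < 1e-9  # 20 µl per well, as in the text
print(premix_volumes(wells_needed(8), overage=0.0))  # 16 wells: 2.88 µl LIZ, 285.12 µl formamide
```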

Advance 9 (Thermo Fisher Scientific) and verified against the sequence information available in the NCBI database (NCBI; http://www.ncbi.nlm.nih.gov/). The primers (forward Arb 5'-AGG CGG TCT CTC GAC TCT CT-3' and reverse Arb 5'-GGC CAG GAA ATC TCA GAC AA-3') were designed manually to selectively recognize an allele specific to the "Arbequina" accession. End-point PCR reactions on 10 ng of gDNA were performed in a total volume of 20 µl containing 2X Taq DNA Polymerase Master Mix providing 2.0 mM MgCl2 (VWR International PBI, Milan, Italy), 0.5 µM of each primer and deionized water (Merck KGaA, Darmstadt, Germany). PCR conditions were: initial denaturation at 94 °C for 3 min, followed by 35 amplification cycles (94 °C for 30 s, 66 °C for 40 s and 72 °C for 40 s) and a final extension at 72 °C for 5 min. Four microliters of the PCR products were checked on a 2% (w/v) agarose gel. Positive ("Arbequina" gDNA), negative (Triticum durum L. gDNA) and no-template controls were included in all tests. The cross-reactivity of the isolated probe was tested against 10 ng of gDNA from each of the varieties included in this study. The detection limit of the probe was estimated by amplifying serial dilutions of "Arbequina" gDNA added to a Triticum durum L. gDNA solution used as background.
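As a quick sanity check on the two Arbequina primers listed above, their GC content and a rough Wallace-rule melting temperature, Tm ≈ 2(A + T) + 4(G + C) °C, can be computed in a few lines. The Wallace rule is only a crude estimate for short oligonucleotides and is not claimed to be how the authors chose the annealing temperature; the function names are hypothetical.

```python
FORWARD_ARB = "AGG CGG TCT CTC GAC TCT CT"  # forward Arb primer, 5'->3', from the text
REVERSE_ARB = "GGC CAG GAA ATC TCA GAC AA"  # reverse Arb primer, 5'->3', from the text


def clean(seq: str) -> str:
    """Drop the spaces used for readability and normalise case."""
    return seq.replace(" ", "").upper()


def gc_fraction(seq: str) -> float:
    s = clean(seq)
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature in °C: 2*(A+T) + 4*(G+C)."""
    s = clean(seq)
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


for name, primer in [("forward", FORWARD_ARB), ("reverse", REVERSE_ARB)]:
    # Prints length, GC fraction and Wallace Tm: 20, 0.6, 64 (forward); 20, 0.5, 60 (reverse).
    print(name, len(clean(primer)), round(gc_fraction(primer), 2), wallace_tm(primer))
```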
present in the members of the β-tubulin gene family, was applied to 15 different olive varieties, mainly of Greek origin, provided by the Agricultural University of Athens within the context of an Erasmus exchange program. Both β-tubulin introns were evaluated for their polymorphic content. Figure 1 shows that each of the 15 varieties releases a specific Tubulin-based Barcode (TbB) resulting from the presence of diverse alleles of different intron size. Each analyzed accession is thus characterized by its own specific and distinctive DNA profile.
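Whether every accession yields a distinct TbB can be checked mechanically by treating each barcode as the pair of allele-size sets from the two introns and grouping identical pairs. A minimal sketch with hypothetical names and data:

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

# A TbB here = (1st-intron allele sizes, 2nd-intron allele sizes), in bp.
Barcode = Tuple[Set[int], Set[int]]


def identical_barcodes(profiles: Dict[str, Barcode]) -> List[List[str]]:
    """Group accessions with identical TbB profiles; an empty result means all are distinct."""
    groups: Dict[Tuple[FrozenSet[int], FrozenSet[int]], List[str]] = defaultdict(list)
    for name, (intron1, intron2) in profiles.items():
        groups[(frozenset(intron1), frozenset(intron2))].append(name)
    return [names for names in groups.values() if len(names) > 1]


# Hypothetical profiles: two replicates of one accession and one different accession.
profiles = {
    "CvA": ({700, 820}, {410}),
    "CvB": ({700}, {410, 515}),
    "CvA_rep2": ({820, 700}, {410}),
}
print(identical_barcodes(profiles))  # [['CvA', 'CvA_rep2']]
```

Applied to replicate accessions of one variety, such as the 100 "Arbequina" accessions mentioned earlier, the same check should return a single group containing all of them.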

Figure 1. TBP amplification diagram of the analyzed olive cultivars, identified by the name codes reported in Table 1. Each fragment represents an intron allele with its specific size. Molecular sizes (base pairs) are reported on the left. Panel (a): TBP 1st intron amplification. Panel (b): TBP 2nd intron amplification.

Figure 3. Three-dimensional PCoA representation of the 15 olive accessions based on the SSR markers. Italian and Spanish cultivars are indicated by red and green points, respectively, and grouped according to their origin.

Figure 4. Section of the CE-TBP 2nd intron profile for the cultivars "Arbequina" and "Frantoio". Peak size (base pairs) is shown on the x-axis and peak height (RFU, Relative Fluorescence Units) on the y-axis. A red arrow indicates a 409 bp peak specific to the "Arbequina" cultivar.

Figure 5. (a) Cross-reactivity test performed by amplifying the isolated Arbequina-specific probe (205 bp) against all the cultivars included in this study; cultivar name codes are reported according to Table 1. (b) Limit of detection of the isolated Arbequina-specific probe, estimated by testing different amounts of target gDNA (10 to 0.0005 ng). Molecular size markers are shown on both sides.

Table 1. List of the 15 olive cultivars included in this study. Use: T, table olive; O, olive oil; T/O, dual use.

Table 2. Genetic parameters estimated for the 15 olive cultivars from the TBP 1st and 2nd intron analysis.

Table 3. Codes and motifs of the selected SSR loci.

Table 4. Diversity parameters estimated for the SSR loci analyzed in the 15 olive cultivars.
k, number of alleles; N, number of individuals; HObs, observed heterozygosity; HExp, expected heterozygosity; f, null allele frequency.