DNA Barcodes in Fig Cultivars (Ficus carica L.) Using ITS Regions of Ribosomal DNA, the psbA-trnH Spacer and the matK Coding Sequence

Molecular markers provide a useful method for genotype characterization and allow the genetic relationships between cultivars and varieties to be determined with high precision. A system based on DNA sequences, known as DNA barcoding, selects one or several standard loci that can be sequenced and compared to distinguish between species. In this research, the ITS, matK, and trnH-psbA sequences were evaluated for the molecular identification of seven F. carica genotypes. Complete sequences were generated for the first two loci, but no bidirectional sequences could be obtained for trnH-psbA. The ITS sequence presented the highest variation rates, while the phylogeny constructed with the matK sequence yielded the highest percentage of resolved monophyletic groups. Pearson's correlation analysis revealed a significant correlation between the ITS region and psbA-trnH, and between the matK and psbA-trnH sequences, but not between ITS and matK. The phylogenies constructed with the ITS + matK and ITS + matK + psbA-trnH barcodes showed the highest resolution. However, considering cost efficiency and ease of recovery by PCR, the matK + ITS combination is recommended.


Introduction
Fig (Ficus carica L.) has been an emblematic fruit tree since ancient times, associated with the beginnings of horticulture in the Mediterranean basin. Fig is known to have been domesticated from a group of species from the southern and eastern regions of the Mediterranean, possibly in the same period as cereals, which suggests that it has been cultivated since early Neolithic times [1].
Morphological, physiological, and agronomic performance parameters are frequently used in fig trees to establish the descriptors required to identify existing genotypes, with leaf and fruit traits being the most discriminating. However, these traits are highly sensitive to environmental conditions; hence the number of useful descriptors is limited and, in some cases, they do not allow the phenotypes to be separated into distinct groups [2] [3].
Currently, different DNA analysis methods allow a precise identification of genotypes, and they can be classified into three basic types [4]. The first category comprises methods not based on the polymerase chain reaction (PCR), such as restriction fragment length polymorphism (RFLP) analysis and variable number tandem repeats (VNTRs). The second includes techniques that employ arbitrary or semi-arbitrary primers, for instance multiple arbitrary amplicon profiling (MAAP) methods such as RAPD and RAMPO. The third category uses PCR directed at specific targets and includes microsatellite markers (SSR) and inter-simple sequence repeats (ISSR).
Recently, owing to the development of fast, efficient, and accessible DNA sequencing technologies, a new system based on DNA sequences has been developed for the identification of living organisms. Known as "DNA barcoding", it consists of choosing one or several standard loci that can be sequenced routinely and reliably, producing easily comparable data that allow species to be distinguished from one another. Proponents of this initiative state that DNA barcoding facilitates species identification, contributes to the renewal of biological collections, and accelerates biodiversity inventories [5].
In the genus Ficus, different molecular markers have been used to characterize germplasm; however, the results are limited compared with other plant species [6]. In F. carica, studies report the use of the nuclear ITS, the trnH-psbA intergenic spacer, or the matK sequence as DNA barcodes for phylogeny reconstruction in different population groups [7] [8].
The aim of this research was to evaluate the use of ITS regions of ribosomal DNA, the psbA-trnH spacer, and the matK sequence as markers for molecular identification through DNA barcoding of different Ficus carica L. accessions present at the Centro de Investigación en Biotecnología of Instituto Tecnológico de Costa Rica.

DNA Extraction, Amplification and DNA Sequencing
Total DNA was extracted from young leaf tissue of seven F. carica accessions (Table 1) using the DNeasy Plant Mini Kit (Qiagen Inc.), following the manufacturer's protocol. DNA concentration and purity were assessed spectrophotometrically by measuring absorbance at 260 nm and 280 nm.
PCR reactions were performed in a final volume of 50 μL containing DreamTaq™ Master Mix 2X (Fermentas), 1.5 ng of the primers, and 0.5 µL of genomic DNA. The ITS region was amplified with the primers 5'-AAGGTTTCCGTAGGTGAAC-3' and 5'-TATGCTTAAACTCAGCGGG-3', matK with the KIM3F and KIM1R primers, and the psbA-trnH spacer with the psbA3'f and trnH primers. The PCR was optimized through an annealing temperature gradient (∆T) from 50 °C to 57.5 °C in 2.5 °C steps for ITS and matK, and from 62 °C to 65 °C in 1 °C steps for psbA-trnH, using in all cases the following thermal profile: 94 °C for 2 min, followed by 30 cycles of 94 °C for 30 s, ∆T for 30 s, and 72 °C for 50 s. The effect of adding dimethyl sulfoxide (DMSO) at a final concentration of 5% (v/v) as a PCR additive was also assessed.
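The gradient settings and the DMSO addition above reduce to simple arithmetic. The Python sketch below lays it out; the helper names are hypothetical, and it assumes evenly spaced gradient positions and that the 5% (v/v) DMSO refers to the 50 μL final reaction volume with a neat DMSO stock.

```python
from __future__ import annotations


def gradient_temperatures(start_c: float, stop_c: float, step_c: float) -> list[float]:
    """Evenly spaced annealing temperatures (deg C) from start_c to stop_c, inclusive."""
    n_steps = round((stop_c - start_c) / step_c)
    return [start_c + i * step_c for i in range(n_steps + 1)]


def additive_volume_ul(final_fraction_v_v: float, reaction_volume_ul: float) -> float:
    """Volume (uL) of a neat additive needed to reach a final v/v fraction."""
    return final_fraction_v_v * reaction_volume_ul


# Gradients described in the text (ITS/matK and psbA-trnH).
its_matk_gradient = gradient_temperatures(50.0, 57.5, 2.5)
psba_trnh_gradient = gradient_temperatures(62.0, 65.0, 1.0)

# 5% (v/v) DMSO in a 50 uL reaction, assuming a neat (100%) DMSO stock.
dmso_ul = additive_volume_ul(0.05, 50.0)

print(its_matk_gradient)   # [50.0, 52.5, 55.0, 57.5]
print(psba_trnh_gradient)  # [62.0, 63.0, 64.0, 65.0]
print(dmso_ul)             # 2.5
```

In practice, a gradient thermocycler may place its wells at slightly uneven temperatures, so the block's evenly spaced values are a nominal plan rather than measured well temperatures.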
The amplified products were sequenced by Macrogen Inc. (USA), to which 20 μL of each sample was sent.

Editing, Sequence Alignment and Phylogenetic Reconstruction
The DNA sequences were edited with the BioEdit program [9], assembled with the CAP3 tool [10], and aligned to their homologous sequences with the EMBL-EBI MUSCLE tool [11]. The multiple alignment was analyzed with MEGA v5.0 [12] to calculate pairwise genetic distances between accessions for each locus, based on the number of base substitutions between sequences under Kimura's two-parameter (K2P) model and eliminating all positions with missing data. Additionally, Pearson's correlation analysis was performed between the distance matrices of the three loci. Phylogenetic trees were constructed with the maximum likelihood (ML) method and 500 bootstrap replicates. To estimate the resolution of each DNA barcode, the percentage of monophyletic groups was calculated, using bootstrap support above 50% as the criterion for defining nodes, following the recommendation of Tripathi et al. (2013) [13].
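Kimura's two-parameter model separates transitions (A and G, or C and T, exchanged) from transversions (purine exchanged with pyrimidine); with P and Q the proportions of each, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The following is a minimal Python sketch of that calculation, not MEGA's implementation. It assumes the aligned sequences are held in a dictionary keyed by accession name, applies complete deletion of missing-data columns across the whole alignment before computing distances, and uses hypothetical toy sequences.

```python
from __future__ import annotations

import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def complete_deletion(alignment: dict[str, str]) -> dict[str, str]:
    """Drop every column containing a gap or ambiguous base in any sequence."""
    upper = {name: s.upper() for name, s in alignment.items()}
    length = len(next(iter(upper.values())))
    keep = [i for i in range(length) if all(s[i] in NUCLEOTIDES for s in upper.values())]
    return {name: "".join(s[i] for i in keep) for name, s in upper.items()}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned A/C/G/T-only sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    p = transitions / len(seq1)
    q = transversions / len(seq1)
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def k2p_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise K2P distances after complete deletion of missing-data columns."""
    clean = complete_deletion(alignment)
    return {(x, y): k2p_distance(clean[x], clean[y]) for x, y in combinations(clean, 2)}


# Hypothetical toy alignment; the gap in acc3 removes the fifth column for all sequences.
toy = {"acc1": "ACGTACGTAC", "acc2": "ACGTGCGTAC", "acc3": "ACGT-CGTCC"}
distances = k2p_matrix(toy)
for pair, d in distances.items():
    print(pair, round(d, 4))
```

Complete deletion is applied to the whole alignment rather than pair by pair, so every distance is computed over the same set of sites; a pairwise-deletion variant would keep more sites but make distances less comparable.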

DNA Extraction, Amplification and Sequencing
The Qiagen DNeasy Plant Mini Kit yielded total DNA from the seven fig accessions at an average concentration of 31.70 μg/mL, with an average A260/A280 ratio of 1.24. During PCR optimization, an annealing temperature of 55 °C produced the most intense bands for the ITS region and the matK sequence, while for psbA-trnH the best results were achieved at 64 °C. Moreover, adding 5% (v/v) DMSO improved amplification of the ITS region, whereas it had an inhibitory effect on matK and psbA-trnH (Figure 1).
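Concentration and purity values like these come from two absorbance readings. A minimal Python sketch, assuming the conventional factor of 50 μg/mL per A260 unit for double-stranded DNA, a 1 cm path length, and the commonly cited 1.8 to 2.0 window for pure DNA; the helper names and the readings are hypothetical, not the study's measurements.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # conventional factor: 1 A260 unit ~ 50 ug/mL dsDNA


def dna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration from absorbance at 260 nm (1 cm path length)."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio; values well below ~1.8 suggest protein or phenol carry-over."""
    return a260 / a280


def is_within_purity_range(ratio: float, low: float = 1.8, high: float = 2.0) -> bool:
    """Assumed acceptance window for 'pure' DNA; laboratories use slightly different limits."""
    return low <= ratio <= high


# Hypothetical readings for one undiluted extract.
a260, a280 = 0.5, 0.4
concentration = dna_concentration_ug_per_ml(a260)  # 25.0 ug/mL
ratio = purity_ratio(a260, a280)                    # 1.25
print(concentration, round(ratio, 2), is_within_purity_range(ratio))  # 25.0 1.25 False
```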
Of the analyzed loci, matK yielded the longest sequence and psbA-trnH the shortest, while the highest G-C content was found in ITS and the lowest in psbA-trnH. Because no bidirectional sequences were obtained for psbA-trnH, the longest sequence obtained for this locus is the one reported (Table 2).
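G-C content, as summarized in Table 2, is the fraction of G and C bases in a sequence. A minimal Python sketch follows; it assumes that gaps and ambiguous bases (such as N) are excluded from the denominator, a convention other tools may not share, and the reads are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases; gaps and N are excluded."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def mean_gc_content(sequences: dict[str, str]) -> float:
    """Average G + C percentage across accessions for one locus."""
    return mean(gc_content(s) for s in sequences.values())


# Hypothetical reads for two accessions of one locus.
reads = {"acc1": "GGCCATAT", "acc2": "gggc-nnat"}
print(gc_content(reads["acc1"]))        # 50.0
print(round(mean_gc_content(reads), 2))  # 58.33
```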

DNA Extraction, Amplification and Sequencing
In this study, the A260/A280 absorbance ratio of the DNA extracts was always below 1.57, indicating the presence of impurities that absorb strongly at or near 280 nm [14]. Latex and endogenous phenols in the fig leaf tissue were possibly not completely removed by the kit. In this regard, Weiblen (2000) [15] notes that modified extraction procedures have been required to obtain DNA of higher purity because of the latex present in F. carica tissue.

The amplified ITS region, with an average length of 614 bp, was similar to that reported by Baraket et al. (2009) [7], who found ITS regions of 697 bp in 31 fig cultivars. The G-C content of the ITS sequence in the evaluated accessions was 63.72%, comparable to the values obtained by Baraket et al. (2009) [7], which ranged from 53.1% to 64.1%. Because of the high G + C content of the ITS region, its PCR amplification was hindered, since the strong interstrand forces created by the triple hydrogen bonds of G-C pairs generate structures that are difficult to denature [16]. DMSO interferes with the formation of DNA secondary structure and destabilizes the double helix, thereby aiding the PCR [17]. These results are consistent with those of Razafimandimbison et al. (2004) [18], who recommended adding DMSO or BSA together with TMACl to the PCR reaction mix. Similarly, Kress et al. (2009) [8] added DMSO at a final concentration of 5% to all PCR reactions.
The amplified matK sequence of 894 bp was comparable to that reported by Kress et al. (2009) [8], who obtained an average of 850 bp in a plant community from Panamá that included samples of the genus Ficus. In contrast to the ITS region, the G + C content of this locus was below 40%, which explains why adding DMSO inhibited the PCR instead of improving it. Although low recovery percentages have been reported for matK, these amplification problems have been attributed not to sequence composition but to the efficiency and specificity of the primers used [19] [20]. In this study, however, the KIM3F and KIM1R primers amplified the locus effectively in the F. carica varieties analyzed.
In this study, it was not possible to generate a complete bidirectional sequence for psbA-trnH, and only a 267 bp segment was obtained, shorter than the 386 bp sequence amplified by Roy et al. (2010) [21] for the genus Ficus and the 488 bp sequence reported for F. carica by Kirin et al. (GenBank KC584953.1; unpublished). Difficulties in generating bidirectional sequences for this locus have been documented by several research groups. Li et al. (2011) [22] explained that sequence reads are interrupted by mononucleotide repeats, while Fazekas et al. (2008) [23] argued that the problems in producing bidirectional sequences are caused mainly by homopolymers in the electrophoresis runs.
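Since both explanations point to runs of a single nucleotide, a quick screen for homopolymers can flag reads at risk of interruption. The Python sketch below does this with a regular expression backreference; the 8-base default threshold is an assumption for illustration, not a value from the cited studies, and the fragment is hypothetical.

```python
from __future__ import annotations

import re


def homopolymer_runs(seq: str, min_length: int = 8) -> list[tuple[int, str, int]]:
    """Return (start, base, length) for every single-nucleotide run >= min_length."""
    if min_length < 2:
        raise ValueError("min_length must be at least 2")
    # ([ACGT]) captures one base; \1{n,} requires at least n further copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.group(1), len(m.group(0))) for m in pattern.finditer(seq.upper())]


# Hypothetical fragment with a 9-base poly-A and a 6-base poly-T.
fragment = "ACGT" + "A" * 9 + "GC" + "T" * 6 + "CG"
print(homopolymer_runs(fragment))                # [(4, 'A', 9)]
print(homopolymer_runs(fragment, min_length=5))  # [(4, 'A', 9), (15, 'T', 6)]
```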

Editing, Sequence Alignment and Phylogenetic Reconstruction
The percentage of variable sites and the intraspecific distances of the loci revealed polymorphisms among the analyzed F. carica cultivars, with the ITS region being the most variable of the three loci. Fu et al. (2011) [24] found similar results when comparing the ITS, rbcL, matK, and trnH-psbA sequences, since the ITS region showed both the highest average genetic distance and the highest number of variable sites. On the other hand, Tripathi et al. (2013) [13] report that, at the intraspecific level, rbcL and trnH-psbA are the least divergent loci, which matches the present research, in which psbA-trnH had the lowest average K2P distance.
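The percentage of variable sites is the share of alignment columns that hold more than one nucleotide. A minimal Python sketch, assuming (as with the distance calculation) that columns with a gap or ambiguous base in any sequence are excluded before counting; the function name and the alignment are hypothetical.

```python
from __future__ import annotations

NUCLEOTIDES = set("ACGT")


def variable_sites(alignment: list[str]) -> tuple[int, int, float]:
    """Return (variable sites, sites analysed, % variable).

    Columns with a gap or ambiguous base in any sequence are skipped
    (complete deletion); a column is variable if it holds more than one base.
    """
    seqs = [s.upper() for s in alignment]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    analysed = variable = 0
    for column in zip(*seqs):
        if not set(column) <= NUCLEOTIDES:
            continue
        analysed += 1
        if len(set(column)) > 1:
            variable += 1
    percent = 100.0 * variable / analysed if analysed else 0.0
    return variable, analysed, percent


# Hypothetical three-accession alignment; the fourth column is dropped because of the gap.
print(variable_sites(["ACG-A", "ACGTT", "ACCTA"]))  # (2, 4, 50.0)
```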
Regarding the resolution of monophyletic groups, the ITS region on its own had the lowest resolution (25.57%), contrary to the findings of Roy et al. (2010) [21], who achieved the best species resolution in the genus Ficus with this marker. A strong common genetic background among the accessions could explain these results [25]. In contrast, the matK sequence showed the highest individual resolution (85.71%), followed by psbA-trnH (50%). Although matK has often been considered efficient for differentiating species, reports on the use of trnH-psbA as a DNA barcoding locus are conflicting [13].
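The resolution criterion can be expressed as a simple proportion. The sketch below is one possible reading of it, not the authors' procedure: it assumes each group is scored from the bootstrap support of the clade that recovers it and that the threshold is strict (above 50%). How groups were defined in this study is not specified here, so the function name, group names, and support values are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def percent_resolved(clade_support: dict[str, Optional[float]], threshold: float = 50.0) -> float:
    """Percentage of groups recovered as monophyletic with bootstrap > threshold.

    clade_support maps each group to the bootstrap value (%) of the clade that
    recovers it, or None if the tree does not recover it as monophyletic.
    """
    if not clade_support:
        raise ValueError("no groups supplied")
    resolved = sum(1 for s in clade_support.values() if s is not None and s > threshold)
    return 100.0 * resolved / len(clade_support)


# Hypothetical supports for seven groups: six exceed 50%, one is unresolved.
support = {"g1": 98, "g2": 87, "g3": 64, "g4": 71, "g5": 55, "g6": 92, "g7": None}
print(round(percent_resolved(support), 2))  # 85.71
```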
The individual phylogenies of the three loci showed two main inconsistencies. First, some individuals grouped together with the ITS sequence were split into two or more groups with the plastome sequences; second, accessions grouped together with psbA-trnH were separated with matK, even though both loci belong to the plastid genome. These patterns suggest hybridization and introgression between closely related varieties, shared ancestral polymorphisms, and/or different mutation rates among some of the sequences [22].
These results also indicate that the analyzed sequences contribute in different ways to distinguishing fig genotypes. This interpretation is supported by the fact that the polymorphisms in each sequence have different molecular origins and may therefore differ in their usefulness for assessing genetic diversity and establishing relationships between genotypes.
The strong and statistically significant correlations observed between ITS and psbA-trnH, and between matK and psbA-trnH, suggest that these locus pairs carry closely related phylogenetic information, whereas the lack of a significant correlation between ITS and matK indicates that the two regions reflect different evolutionary events and can therefore complement each other in distinguishing genotypes [7].
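A correlation between two loci's distance matrices is computed on their off-diagonal entries, taken in the same accession order. The Python sketch below does this for square matrices; the function names and the 3-accession matrices are hypothetical, and it computes only Pearson's r, not the significance test used in the study.

```python
from __future__ import annotations

import math


def upper_triangle(matrix: list[list[float]]) -> list[float]:
    """Off-diagonal upper-triangle entries of a square distance matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 3-accession K2P matrices for two loci (same accession order in both).
locus_a = [[0.0, 0.01, 0.02], [0.01, 0.0, 0.03], [0.02, 0.03, 0.0]]
locus_b = [[0.0, 0.03, 0.02], [0.03, 0.0, 0.01], [0.02, 0.01, 0.0]]
print(round(pearson_r(upper_triangle(locus_a), upper_triangle(locus_b)), 6))  # -1.0
```

Because the entries of a distance matrix are not independent of one another, the significance of such a correlation is often assessed with a Mantel permutation test rather than the standard test for Pearson's r.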
The multilocus barcodes allowed a clearer phylogenetic separation of the fig varieties and, in general, produced similar grouping patterns for the accessions, especially when the ITS region was combined with either of the plastome sequences.
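The text does not state how the multilocus barcodes were assembled; one common approach is to concatenate the per-locus alignments into a single supermatrix per accession. A minimal Python sketch under that assumption, with a hypothetical helper name and hypothetical sequences:

```python
from __future__ import annotations


def concatenate_loci(*loci: dict[str, str]) -> dict[str, str]:
    """Join per-locus alignments accession by accession, in the order given."""
    if not loci:
        raise ValueError("at least one locus is required")
    names = sorted(loci[0])
    for locus in loci:
        if sorted(locus) != names:
            raise ValueError("every locus must contain the same accessions")
        if len({len(seq) for seq in locus.values()}) != 1:
            raise ValueError("sequences within a locus must be aligned to equal length")
    return {name: "".join(locus[name] for locus in loci) for name in names}


# Hypothetical aligned fragments: ITS first, then matK.
its = {"acc1": "ACGT", "acc2": "ACGA"}
matk = {"acc1": "TTG", "acc2": "TTC"}
print(concatenate_loci(its, matk))  # {'acc1': 'ACGTTTG', 'acc2': 'ACGATTC'}
```

The sketch rejects mismatched accession sets instead of padding missing loci with gaps, so an accession lacking a locus is caught before tree building rather than silently entering the matrix as missing data.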
While the greatest K2P distance was observed with the ITS + psbA-trnH multilocus barcode, the highest percentage of monophyletic groups was achieved only when the ITS + matK combination was present. Li et al. (2011) [22] reported similar results, finding that matK allows greater species differentiation. In addition, Fu et al. (2011) [24] reported high monophyly percentages when combining ITS with the matK + rbcL barcode.
Additionally, the multilocus phylogenetic analysis revealed the divergence of the cultivars from Colombia, El Salvador, Negro San Juan, and Pacayas from the traditional varieties Brown Turkey and Brogiotto Bianco, a relationship that is not fully evident from the analysis based only on the plastome sequences (Figure 2).
For identifying F. carica varieties, the most efficient DNA barcodes were the matK + ITS and matK + psbA-trnH + ITS combinations, since both allowed 100% monophyletic differentiation of the accessions. However, considering cost efficiency and the difficulty in obtaining bidirectional psbA-trnH sequences, a two-locus barcode is recommended over a three-locus one. These results are consistent with those of Li et al. (2011) [22], who found a higher species differentiation percentage with the matK + psbA-trnH + ITS multilocus barcode but recommended the matK + ITS combination on the basis of cost efficiency.
Table 4 shows the genetic distance matrix of the analyzed accessions obtained with the proposed barcode, while the ITS + matK multiple sequence alignment is given in Appendix 6. Genetic distances between cultivars ranged from 0.0020 to 0.0285, with the Guarinta accession showing the highest divergence from the rest. Related results were obtained by Baraket et al. (2009) [7], who reported that the genetic dis- […] [25] found a genetic distance ranging from 0.004 to 0.170 in 31 F. carica cultivars with the ITS + trnL-trnF multilocus barcode.
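Summaries like the distance range and the most divergent accession can be read off a square distance matrix. A minimal Python sketch, assuming that "most divergent" means the largest mean distance to all other accessions (the text does not define the criterion); the function names, accession names, and values are hypothetical, not those of Table 4.

```python
from __future__ import annotations


def distance_range(matrix: list[list[float]]) -> tuple[float, float]:
    """Minimum and maximum off-diagonal distance."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    return min(values), max(values)


def most_divergent(names: list[str], matrix: list[list[float]]) -> tuple[str, float]:
    """Accession with the largest mean distance to all other accessions."""
    n = len(names)
    means = {names[i]: sum(matrix[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)}
    best = max(means, key=means.get)
    return best, means[best]


names = ["acc_A", "acc_B", "acc_C", "acc_D"]  # hypothetical accessions
matrix = [
    [0.0, 0.002, 0.004, 0.025],
    [0.002, 0.0, 0.003, 0.028],
    [0.004, 0.003, 0.0, 0.027],
    [0.025, 0.028, 0.027, 0.0],
]
print(distance_range(matrix))            # (0.002, 0.028)
print(most_divergent(names, matrix)[0])  # acc_D
```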

Conclusion
The data showed that the local fig germplasm is genetically diverse. Despite the low levels of genetic variation among the F. carica accessions, the multilocus ITS + matK barcode was not only the most reliable and cost-efficient alternative for identifying the cultivars but was also useful for explaining phylogenetic relationships at this taxonomic level. Furthermore, each of the three analyzed loci showed particular strengths and weaknesses for genotype identification in F. carica at the intraspecific level, which should be taken into account before designating them as universal DNA barcodes for plants.

Figure 2. Phylogeny reconstruction based on the nrITS + matK sequences using the maximum likelihood (ML) method.

Table 1. F. carica accessions found at the Centro de Investigación en Biotecnología of Instituto Tecnológico de Costa Rica.

Table 2. Size and G-C content (%) of the ITS region and the matK and psbA-trnH sequences.

Table 3. Efficiency of the DNA barcodes for individual loci and their combinations.