Emergence of Plastidial Intergenic Spacers as Suitable DNA Barcodes for Arid Medicinal Plant Rhazya stricta

The desert plant Rhazya stricta has anticancer and antimicrobial properties, and is widely used in indigenous medicines of Saudi Arabia. However, the therapeutic benefits rely on an accurate identification of this species. The authenticity of R. stricta and other medicinal plants and herbs procured from local markets can be questionable due to a lack of clear phenotypic traits. DNA barcoding is an emerging technology for rapid and accurate species identification. In this study, six candidate chloroplast barcodes were investigated for the authentication of R. stricta. We compared the DNA sequences from fifty locally collected and five market samples of R. stricta with database sequences of R. stricta and seven closely related species. We found that the coding regions matK, rbcL, rpoB, and rpoC1 were highly similar among the taxa. By contrast, the intergenic spacers psbK-psbI and atpF-atpH were variable loci distinct for the medicinal plant R. stricta. psbK-psbI clearly discriminated R. stricta samples as an efficient single locus marker, whereas a two-locus marker combination comprising psbK-psbI + atpF-atpH was also promising according to results from the Basic Local Alignment Search Tool and a maximum likelihood gene tree generated using PhyML. Two-dimensional DNA barcodes (i.e., QR codes) for the psbK-psbI and psbK-psbI + atpF-atpH regions were created for the validation of fresh or dried R. stricta samples.


Introduction
Rhazya stricta of the Apocynaceae family is an important folkloric medicinal plant of Arabia [1]. The genus Rhazya includes only two species, Rhazya stricta and Rhazya orientalis (syn. Amsonia orientalis) [2], and their distributions are clearly separated: the natural habitats of R. stricta are the coastlines and arid regions of the Arabian Peninsula and the Indian subcontinent [3], whereas those of R. orientalis are northwest Turkey and northeast Greece [4]. Nevertheless, the two sister species are alike in possessing therapeutically significant secondary metabolites. On the order of 100 alkaloids [2], together with flavonoids, from R. stricta exhibit a wide range of pharmacological properties [5] [6] [7]. R. stricta is considered a potential chemopreventive [8] [9], antifungal [10], and antidiabetic [11] agent and also produces analgesic and sedative effects [7]. Likewise, the metabolites originating from R. orientalis have various anti-cancer and anti-tumor properties [12]. This study targeted locally available samples of R. stricta; hence, its sole sister species, R. orientalis, was not included.
Like other medicinal plants, R. stricta is procured either from the vast deserts or from herbal markets, where it goes by the vernacular name "Harmal", and is found in a dried or powdered form. Although authenticity is among the most significant aspects related to pharmacological products [13] the absence of distinct phenotypic traits for R. stricta impedes the identification of this plant. Unfortunately, contamination and adulteration of medicinal plant products are quite common in the herbal markets, which have even caused severe diseases and death in some cases [14]. It is essential to carry out molecular level identification of medicinal herbs before they are used as a therapeutic agent [15]. Thus, there is a dire need for accurately identifying R. stricta at the molecular level in addition to the traditional taxonomical examination to protect consumers.
Within the past two decades, we have witnessed advances in molecular level techniques for accurately discerning specimens, even in the absence of diagnostic morphological characteristics. These techniques have proved unambiguous for the authentication of medicinal plants [16]. One such molecular tool is DNA barcoding, which can offer quick authentication compared with traditional taxonomy [17]. This technology identifies unknown species by using standardized DNA segments as universal product codes. It targets highly conserved DNA sequences in which minor nucleotide polymorphisms have evolved [18]. These nucleotide variations aid in the creation of unique DNA barcode markers, which can be used to validate a sample from a given species [19].
Whereas the 5' end of the mitochondrial gene for cytochrome c oxidase 1 is used for the standard barcode for animals [20], this region is unsuitable for use in identifying plants owing to a low substitution rate [21]. A variety of complex evolutionary processes in plants (e.g., hybridization and polyploidy) complicate the distinctions for defining species boundaries [22]. Consequently, after years of tremendous efforts, not one single region has worked as an identifier for all of the plant species tested [23] [24], and it is unlikely that a single universal plant DNA barcoding marker exists [25]. Several candidate regions have therefore been evaluated [29], including the coding and noncoding regions of plastid genomes (rbcL, psbA-trnH, trnL-trnF, and matK) and nuclear regions (5S, 16S, 18S, and ITS) [30]. These efforts have led to six plastid DNA regions as leading candidates for a suitable plant barcode, namely, matK, rbcL, rpoB, rpoC1, and the psbK-psbI and atpF-atpH intergenic spacers [29]. Indeed, some of the loci from chloroplast DNA, including matK, rbcL, atpF-atpH, psbK-psbI, rpoB, and rpoC1, have shown promising results for identifying medicinal plants. For example, the matK region, which was proposed as a standard barcode for flowering plants after studying 1,084 plant species [28], produced positive results for medicinal plants of the Rauvolfioideae (a subfamily of Apocynaceae) [31]. rbcL was recommended for species discrimination [29], owing to its reliable amplification and sequencing [27]. Moreover, taxonomic confusions among Cupressaceae, Cornaceae, Ericaceae, and Geraniaceae specimens were resolved via rbcL sequences [32]. Sequences for rpoB and rpoC1 were informative in identifying Ochradenus arabicus [19], and Chase et al. [26] suggested they be used in combination with matK for DNA barcoding.
Promising results have also been obtained with the noncoding intergenic spacers, which have evolved comparatively rapidly and exhibit sequence divergence and high rates of insertion/deletion [27]. The intergenic spacer atpF-atpH could be used for validating medicinal plant material [33] and for distinguishing all three species in the genera Landoltia and Spirodela [34]. The intergenic spacer psbK-psbI shows potential for barcoding of the flora of Kruger National Park, South Africa [28] and its use has been endorsed over that of matK and other loci by the CBOL Plant Working Group, 2009 [29] due to its capability for discriminating species.
Nevertheless, a single locus DNA barcode has been proposed to lack sufficient variation for discriminating closely related taxa [35]. Hence, to enhance the identification capabilities of barcode markers, we investigated a multi-locus DNA barcode [27] [28]. We also determined the abilities of the six markers proposed by the CBOL Plant Working Group for discriminating R. stricta at the molecular level.

Ethics Testimony
The R. stricta plant is indigenous to the region and was found growing in the surrounding deserts and on roadsides; hence, no permission was required for sample collection. Samples were immersed in liquid nitrogen and stored at -80˚C until DNA extraction. Five raw, dried and powdered samples of R. stricta were obtained from the local herbal market to check the efficiency of the proposed DNA barcode.

DNA Extraction, Amplification, and Sequencing
R. stricta is a poisonous medicinal plant with high levels of alkaloids and flavonoids that can make DNA extraction challenging. To overcome this, genomic DNA was extracted from young leaf tissues [36] using a DNeasy plant mini kit  Table 1. As initial amplifications failed for some loci (e.g., matK), which has been reported for samples in the order Gentianales [37], the thermocycling conditions were optimized using touchdown PCR (Table 1). This resulted in successful amplification of target regions.
The amplicons were analyzed on a 1.5% agarose gel as described above for genomic DNA analysis using a 100-bp DNA ladder (Promega Corp., USA) as a molecular marker. Gel images were analyzed on a gel documentation system (Biospectrum 410; UVP). PCR products were sent to Macrogen, Inc. (Seoul, South Korea) for bidirectional sequencing using the same primers to resolve ambiguities [38].

Data Analysis
Sequence chromatograms were viewed in FinchTV 1.4.0 (Geospiza, Inc., Seattle, WA, USA) to analyze base calls and quality values. Further analysis and alignments were carried out using Geneious 9.1 software [39]. Sequence similarities were identified using the BLAST algorithm (http://blast.ncbi.nlm.nih.gov/blast.cgi) to judge the identification capability of the six barcode regions. High bit scores, Grades (a weighted score for the hit comprising the E-value, pairwise identity, and coverage), and a cutoff E-value of 1e-20 were taken into account. Sequences of the barcode regions for R. stricta and for the best-scoring hits were downloaded from the GenBank database to assess their discriminatory efficiency (Table 2). We studied species from different genera, as no data from the chloroplast regions of other species in the genus Rhazya (other than R. stricta) appeared in our GenBank database search. These sequences were edited and aligned in Geneious Pro 9.1 [39] using the MUSCLE multiple alignment plug-in [40]. For alignment, the sequence order from the BLAST search was preserved. GC contents and the percentages of identical nucleotide sites were analyzed with the same software.
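The two alignment summaries mentioned above, GC content and the percentage of identical sites, can be illustrated with a minimal Python sketch. It is not the Geneious implementation: the function names are hypothetical, and the rule that a column counts as identical only when every sequence carries the same non-gap character is an assumption that may differ from how Geneious treats gaps and ambiguity codes.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def percent_identical_sites(alignment: list[str]) -> float:
    """Percentage of alignment columns in which all sequences agree.

    Assumption: a column containing a gap is never counted as identical.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if length == 0 or any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must share one non-zero length")
    identical = 0
    for column in zip(*alignment):
        symbols = {char.upper() for char in column}
        if len(symbols) == 1 and "-" not in symbols:
            identical += 1
    return 100.0 * identical / length


if __name__ == "__main__":
    # Toy alignment, not data from this study.
    toy = ["ACGT-A", "ACGTTA", "ACCTTA"]
    print(round(percent_identical_sites(toy), 2))
    print(gc_content(toy[0]))
```

A low percentage of identical sites across taxa, as reported here for the intergenic spacers, is what makes a locus informative for discrimination.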
We used the tree-based method of identification with phylogenetic distance (via gene tree) to study the divergences of different barcode markers. Here, a query was allocated to the species with which it clustered. In this study, phylogenetic distances were analyzed using PhyML 3.0 software [41]. On the basis of the HKY85 model of nucleotide substitution, this software generated a maximum likelihood gene tree with a 1000-replicate bootstrap test to authenticate R. stricta. After these analyses, all the sequences were deposited into the GenBank nucleotide database (Table 3).
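To make the bootstrap test concrete, the following Python sketch shows how nonparametric bootstrap replicates of an alignment are generally produced: columns are resampled with replacement until the replicate is as long as the original. It illustrates the resampling step only. Tree inference under HKY85 is left to PhyML, and the function names and the toy alignment are hypothetical.

```python
import random
from typing import Dict, Iterator


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one bootstrap replicate by resampling columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    # The same column indices are applied to every sequence so that
    # each resampled column stays intact across taxa.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Dict[str, str], n: int = 1000, seed: int = 0) -> Iterator[Dict[str, str]]:
    """Yield n replicates; a tree would be inferred from each one."""
    rng = random.Random(seed)
    for _ in range(n):
        yield bootstrap_alignment(alignment, rng)


if __name__ == "__main__":
    toy = {"query": "ACGTAC", "reference": "ACGTTC"}  # not data from this study
    for replicate in bootstrap_replicates(toy, n=3):
        print(replicate)
```

The support value of a clade is then the fraction of replicate trees in which that clade appears.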
In addition to the single locus barcodes, a double locus combination was investigated by concatenating the candidate barcode regions using Geneious Pro 9.1 software. The mean intra- and interspecific genetic distances were calculated using MEGA 6.0 [42] with the Kimura 2-parameter model. The proposed DNA barcodes were validated on five dried powdered market samples of R. stricta.
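The Kimura 2-parameter distance is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The Python sketch below illustrates this formula together with a simple concatenation of two loci. It is not the MEGA or Geneious implementation: the function names are hypothetical, and skipping any site with a gap or ambiguity code (pairwise deletion) is an assumption about gap handling.

```python
import math
from typing import Dict

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (the argument becomes non-positive).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def concatenate_loci(locus_a: Dict[str, str], locus_b: Dict[str, str]) -> Dict[str, str]:
    """Join two loci per sample, keeping only samples present in both."""
    shared = sorted(set(locus_a) & set(locus_b))
    return {name: locus_a[name] + locus_b[name] for name in shared}


if __name__ == "__main__":
    # Toy sequences, not data from this study.
    print(k2p_distance("AAAAAAAAAA", "AAAAAAAAAG"))
    print(concatenate_loci({"s1": "AC", "s2": "GG"}, {"s1": "TT"}))
```

Averaging such pairwise distances within a species and between species gives the mean intraspecific and interspecific distances that are compared when assessing a barcode.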

Two-Dimensional DNA Barcoding
The nucleotide sequences of the candidate barcode markers, which validated R. stricta species efficiently, were converted to two-dimensional DNA barcode images using an open-source PHP QR coding method [43].
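Before a sequence is converted, it can be useful to check that it fits into a single QR symbol. The Python sketch below is not the PHP method cited above and does not render an image. It relies only on two properties of the QR code standard: uppercase letters belong to the 45-character alphanumeric set, and the largest symbol (version 40) holds at most 4,296, 3,391, 2,420, or 1,852 alphanumeric characters at error correction levels L, M, Q, and H. The function name is hypothetical.

```python
# The 45 characters permitted in QR alphanumeric mode.
QR_ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")

# Maximum alphanumeric characters in a version 40 symbol,
# keyed by error correction level.
QR_V40_ALPHANUMERIC_CAPACITY = {"L": 4296, "M": 3391, "Q": 2420, "H": 1852}


def fits_in_single_qr(sequence: str, ec_level: str = "L") -> bool:
    """True if the sequence can be stored in one QR code in alphanumeric mode."""
    payload = sequence.strip().upper()
    if not payload or not set(payload) <= QR_ALPHANUMERIC:
        return False
    return len(payload) <= QR_V40_ALPHANUMERIC_CAPACITY[ec_level]


if __name__ == "__main__":
    toy_barcode = "ACGT" * 100  # 400 bases, not a sequence from this study
    print(fits_in_single_qr(toy_barcode))
    print(fits_in_single_qr(toy_barcode, ec_level="H"))
```

Plastid intergenic spacers of a few hundred bases fall well within these limits; a longer concatenated barcode would need a lower error correction level or more than one symbol.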

Success of DNA Isolation and Amplification
Although the extraction of DNA was challenging from some samples, DNA was successfully extracted using the modifications described in the Methods. All six of the chloroplast barcode loci were successfully amplified after optimizing the thermocycling conditions. The properties of these six loci are given in Table 4.
There were no variations in the intraspecific alignments of sequencing products for the six loci (from fifty R. stricta samples). The mean intraspecific distances (calculated by MEGA 6) were zero, verifying that all of the fifty R. stricta samples belong to the same species with no genotypic variation.
The interspecific divergences (investigated after carrying out a multiple alignment between R. stricta and the seven species from Apocynaceae) for matK, rbcL, rpoB, and rpoC1 regions were low due to the high percentages of identical sites (Table 4). Nevertheless, interspecific divergences were high for atpF-atpH and psbK-psbI loci due to variations in R. stricta sample sequences (Table 4).
The percentages of identical sites (calculated using Geneious Pro 9.1) for these two loci were much lower than for the coding loci (Table 4). Figure 1 highlights the variable informative sites in the psbK-psbI locus from R. stricta sample sequences. The mean interspecific distance was highest for psbK-psbI (7.80%) compared with those of the other loci.

Maximum Likelihood Tree Identification
The maximum likelihood tree [41] for the psbK-psbI region clustered sample sequences of R. stricta (query sequences) with the GenBank R. stricta sequences into a single independent clade that was highly supported (Figure 2(a)). The case was similar for the double locus (psbK-psbI + atpF-atpH) sample sequences (Figure 2(b)). Hence, the similarity-based method and tree-based identification indicate that psbK-psbI is an appropriate DNA barcode region for identifying R. stricta. The psbK-psbI intergenic region aided by atpF-atpH also displayed promising results, supporting its use as a double locus barcode for R. stricta. These two proposed DNA barcodes (psbK-psbI and psbK-psbI + atpF-atpH) clearly identified dried powdered R. stricta samples acquired from the market.

Generation of Two-Dimensional Barcodes
The purpose of DNA barcoding is to accurately and quickly identify medicinal plants and their products at the molecular level to benefit the consumer. With this aim, the nucleotide sequences of the DNA barcode markers (psbK-psbI and psbK-psbI + atpF-atpH) were considered analogous to the products' barcodes in a supermarket and were converted to two-dimensional barcodes (i.e., QR codes).

Discussion
Approximately 80% of the world's population are inclined to use herbal medicines for their primary care [44]. Similarly, the Saudi population exhibits a keen interest in the medicinal species of the local flora [45]. For example, the numerous pharmacological properties of R. stricta make this a significant medicinal plant in this part of the world [46]. Although it is easily available from its local natural habitat, identifying this plant in herbal markets is difficult due to the lack of distinct morphological traits and the variability in storage conditions and product ages. In an effort to safeguard consumers' health, we investigated a method of DNA barcoding for validating R. stricta at the molecular level, irrespective of its physical state.
Figure 3. DNA barcodes and two-dimensional DNA barcodes of psbK-psbI (a) and psbK-psbI + atpF-atpH (b) sequences for R. stricta.
S. A. Khan et al.
DNA barcoding is considered the renaissance of taxonomy [47]. This molecular tool emerged rapidly over the past two decades for species discrimination [25], and it has outperformed other diagnostic tools for identifying and authenticating species [48]. Numerous studies have not only affirmed its competency for species identification, but also highlighted its strengths in defining species boundaries and flagging new species [20] [49]. Owing to its ease and swiftness, this tool has been used extensively for distinguishing medicinal plants [50] [51] [52]. It is predicted to play a significant role in identifying medicinal plant products in the future [53].
It was proposed that an ideal DNA barcode (i) can be easily amplified and sequenced, (ii) does not exceed 1 kb in size, and (iii) has higher interspecific than intraspecific variation [54]. We selected six barcoding loci to examine with these criteria in mind. The rbcL, rpoB, rpoC1, psbK-psbI, and atpF-atpH loci were easily amplified and sequenced. The amplification of matK was challenging initially, as has also been reported for medicinal Uncaria species [55] and for a temperate flora comprising 436 species in 269 genera of land plants [56]. By contrast, the psbK-psbI region amplified easily, as was found for Panax species (ginsengs) [57]. The successful amplification of intergenic spacers (psbK-psbI and atpF-atpH) results from the ability to use universal primers for the highly conserved coding sequences on either side of the locus [58].
In this study, the sequences of R. stricta for matK, rbcL, rpoC1, and rpoB were similar to those for the other taxa. Consistent with this, matK did not aid species identification in other studies, suggesting that it may be nonfunctional in some taxa [29] [17]. The rpoC1, rpoB [59] [60], and rbcL [61] loci similarly have shown very low species discrimination in other investigations. The noncoding intergenic spacer regions, psbK-psbI and atpF-atpH, are promising barcode markers for R. stricta due to their faster divergence than genes in otherwise slow-evolving plant species [62] [27]. Indeed, the psbK-psbI region exhibited efficient barcode recovery and species discrimination in a number of studies [29] [63]. Zuo et al. [58] endorsed psbK-psbI over coding regions, as it is a more informative and powerful marker for species identification. This locus discriminates the Kruger National Park flora [28] and taxa of Orchidaceae in Korea [64]. Similarly, the atpF-atpH intergenic spacer has been endorsed as a supplementary locus [65] as it was found to successfully authenticate medicinal plant materials in a study involving 17 barcode regions [33]. Another proposed option was to combine psbK-psbI with other loci, such as matK and atpF-atpH [66] [28] [65]. DNA barcoding takes advantage of polymorphisms that lead to gaps in sequences. Polymorphisms in the psbK-psbI and atpF-atpH regions were observed as gaps upon amplicon alignment that were found to be distinct to R. stricta and might serve for barcoding this species. In addition, the psbK-psbI and concatenated psbK-psbI + atpF-atpH regions displayed 64.1% and 72.4% similarity (on the basis of nucleotide sites), respectively, to the other species sequences on alignment. Hence, these loci are highly distinctive for R. stricta. Indeed, a 100% monophyletic differentiation of the R. stricta species was found for the single locus (psbK-psbI) and double locus combination (psbK-psbI + atpF-atpH) as shown by the maximum likelihood trees. Moreover, the psbK-psbI and psbK-psbI + atpF-atpH loci successfully identified market samples of R. stricta.

Conclusion
In this study, we found that matK, rbcL, rpoC1, and rpoB coding regions are not effective markers for R. stricta, as they were similar to the sequences of other species. By contrast, intergenic spacer regions psbK-psbI and atpF-atpH contained sequence polymorphisms that are unique to R. stricta. We propose the use of psbK-psbI as an optimal single locus barcode and psbK-psbI + atpF-atpH as a complementary double locus combination for creating DNA barcodes to identify the medicinal plant R. stricta. We generated two-dimensional barcodes (QR codes) for these loci to facilitate molecular authentication of R. stricta via an easy and inexpensive method for consumers. This method would also greatly benefit market supervision of other medicinal plants of Saudi Arabia and its neighboring countries. It can be a preliminary step towards generating a databank of barcode markers for economically significant flora of this arid region.