Development of a Molecular Marker to Identify a Candidate Line of Turmeric (Curcuma longa L.) with a High Curcumin Content

Dried and fresh rhizomes of the spice turmeric (Curcuma longa L.) are well known in traditional medicine, and curcumin is widely used in various geographic regions. Although there are differences in the amount of curcumin within this species, identification of the candidate line by rhizome is difficult because of the relative simplicity of its morphological characteristics. To accurately identify lines of C. longa with a high content of curcumin, we analysed several sequences of chloroplast DNA. First, to determine the appropriate outgroup taxa in which to conduct infraspecific analyses of C. longa, we reconstructed the molecular phylogenetic tree of C. longa and its allied species. The results showed that C. aromatica and C. zedoaria are closely related to C. longa. Next, to develop a molecular marker for identifying lines of C. longa with a high content of curcumin, a network analysis using chloroplast microsatellite regions was performed. Results showed that a unique haplotype within C. longa corresponds to the high curcumin content line. Therefore, the chloroplast microsatellite regions used for the analysis allowed us to determine the lines of this species with high curcumin content.


Introduction
Traditional medicine is known to be a fertile source of modern medicines [1]. One medicine in that category is curcumin, a yellow coloring agent present in the spice turmeric (Curcuma longa L.), which belongs to the ginger family (Zingiberaceae). Besides its use in cooking to add color and as a preservative, turmeric is used in Indian traditional medicine to treat various common ailments including stomach upset, flatulence, dysentery, ulcers, jaundice, arthritis, sprains, wounds, acne, and skin and eye infections [2]. Curcumin is thus widely used in various geographic regions. However, the morphologies of the dried or fresh rhizomes of Curcuma plants are highly similar, and it is therefore difficult to correctly identify individual Curcuma species.
Recent molecular phylogenetic studies have revealed new aspects of the relationships among Curcuma species. For example, to identify the rhizomes of C. longa, Sasaki et al. [3] developed a molecular identification method for Curcuma species using an amplification-refractory mutation system analysis of 18S ribosomal ribonucleic acid (rRNA) and the trnK gene. Although this method can identify species in a line of samples with high accuracy, the process is complicated. The marker is also sensitive to the experimental conditions of the polymerase chain reaction (PCR) because judgment is based on the presence or absence of PCR products. In addition, Minami et al. [4] reported that Curcuma species can be identified using an intergenic spacer between trnS and trnfM of chloroplast deoxyribonucleic acid (cpDNA). Although this method is useful for identifying C. longa from various Curcuma rhizomes, the study by Minami et al. [4] may require re-evaluation because few species were sampled and appropriate outgroup taxa were not employed. The same can be said of the study by Sasaki et al. [3]. In our view, outgroups are important for assessing plesiomorphic or apomorphic states for each characteristic and for interpreting the phylogenetic relationships within C. longa. Separately, Aoi et al. [5] reported differences in the amount of curcumin among individuals of C. longa. Therefore, infraspecific genetic markers are needed to identify individuals of this species by curcumin content so that curcumin can be acquired efficiently.
Large-scale sequencing of a predefined region of approximately 1500 base pairs (bp) of the cpDNA matK has one main goal in monocotyledonous plants including C. longa: to clarify the phylogenetic position and closely related species in order to determine the outgroup taxa of unknown individuals [6]. Although the sequence of the matK gene plays an important role in identifying the outgroup taxa of C. longa, it cannot be used to identify lines in this species containing high amounts of curcumin because of its low substitution rate [7]. Thus, determining more reliable genetic markers within C. longa requires more informative DNA regions. In recent years, cpDNA sequence data have been used frequently to reconstruct the phylogenetic relationships of a wide range of land plants [8]. Some regions were reported to have nucleotide substitution rates that are sufficiently high to facilitate the elucidation of relationships among closely related species [7,9]. Moreover, the utility of non-coding cpDNA regions within species-level phylogeny has been demonstrated in many taxonomic groups of land plants [10]. However, phylogenetic analysis using regions with high substitution rates, such as microsatellites, is often biased by multiple substitutions at a single site. In addition, because algorithms of phylogenetic analysis can cause problems when data do not represent a tree-like structure [11], we also analysed the cpDNA sequences with the statistical parsimony network approach. This approach reflects the genealogical relationships of the sequences used; that is, single-mutation steps separate adjacent haplotypes in the network, and older haplotypes are placed at internal branching points, whereas younger haplotypes occur towards the tip position. In the current study, we show that this analysis of the infraspecific relationships within C. longa can result in identification of clear relationships not only among young and rapidly speciating groups but also among old lineages reaching deep into the divergence of this group.
Thus, the aim of our study was to develop infraspecific genetic primers, using a phylogenetic approach, for identifying lines of C. longa with high curcumin content.

Cultivation and Measurements
The experiments were carried out in the fields (sandy soil) of the Faculty of Agriculture, Kochi University, Japan, for four years (2006-2009). Table 1 lists the sample lines used for the cultivation experiments in this study: 2 C. aromatica Salisb., 12 C. longa L., and 1 C. zedoaria Rosc. Rows were constructed with a width of 70 cm and height of 20 cm. Rhizomes were transplanted 8 cm below the soil surface in one-row ridges, on hills separated by a distance of 30 cm, in late May for four years. For fertilizer dressing, a total of 1.5 kg/a of N, 0.6 kg/a of P2O5, and 1.4 kg/a of K2O was applied over four years. In addition, 200 kg/a of compost fertilizer, 15 kg/a of magnesia lime, and 30 kg/a of chicken droppings were applied (chicken droppings were not used in 2006). The experimental plots were arranged in a randomized complete design with two replicates, which formed three rows. Due to a lack of seed rhizomes, some lines of C. longa were examined without replicates or with two rows. Rows were covered by silver mulch. Irrigation was performed using a sprinkler, with observation of the amount of precipitation and the condition of the plants. Four or six hills containing average-sized shoots per line were sampled in early December each year for four years.
For curcumin analysis, the content of curcumin 1, curcumin 2, and curcumin 3 (curcumin, demethoxycurcumin, and bis-demethoxycurcumin, respectively) in the primary branch rhizome was measured by high performance liquid chromatography (HPLC; HITACHI, L-2420), according to the method described by Sato et al. [12]. Curcumin was extracted with 80% ethanol from 0.3 g of the dried and ground samples. HPLC was performed on a YMC-Pack Pro C18 column. The mobile phase was a 50:50 mixture of acetonitrile and 0.1% phosphoric acid. The flow rate was 1 mL/min, and the injection volume was 10 µL.

DNA Analysis
Total DNA was isolated from fresh root with a Plant Genomic DNA Mini Kit (Viogene, Sunnyvale, CA, USA), according to the manufacturer's protocol. The PCR mix contained approximately 100 to 200 ng of total DNA, 1 µM of each primer, 200 mM of each deoxynucleotide, 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM of MgCl2, and 1.25 units of Taq polymerase. Double-stranded DNA was amplified, after incubation at 94°C for 2 min, by 45 cycles of incubation at 94°C for 1.5 min, 48°C for 2 min, and 72°C for 3 min, with final extension at 72°C for 15 min. We amplified four regions from cpDNA, namely matK, rpl16 intron 2, petB intron 1, and petB intron 2, with primers designed by Johnson and Soltis [13] and Nishizawa and Watano [9]. After amplification, reaction mixtures were subjected to electrophoresis in 1% low melting-temperature agarose gels for separation of specific amplified products. We sequenced the purified PCR products using a Big Dye Terminator Cycle Sequencing Kit (ABI PRISM, Perkin Elmer Applied Biosystems, Foster City, CA, USA) and an ABI PRISM 3100-Avant Genetic Analyzer according to the manufacturer's instructions. The primers used for sequencing were the same as those used for amplification.
1) Four or six hills containing average-sized shoots per line were harvested; 2) Not cultivated.
To construct a phylogenetic tree based on matK sequences of Curcuma and its allied species, the amplified regions were aligned using ClustalW [14] and were improved manually using MEGA 4 [15]. Phylogenetic relationships were analysed using the neighbour-joining (NJ) method with PAUP* 4.08b [16]. The NJ analyses were performed using MEGA 4 with Kimura's two-parameter model. For the NJ analyses, bootstrapping with 1000 pseudo-replicates was chosen to examine the robustness of the clades and their phylogenetic relationships. In addition, a network tree of microsatellite regions of cpDNA was constructed with Network 1.4.1.2 (Fluxus Technology Ltd., Suffolk, UK) using the median-joining method [17].
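The pairwise distance that underlies an NJ tree built with Kimura's two-parameter model can be sketched as follows. This is a minimal illustration, not the MEGA or PAUP* implementation: the function name `k2p_distance` and the pairwise-deletion handling of gaps are assumptions made for the example. With P the proportion of transitions and Q the proportion of transversions among compared sites, the distance is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)].

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration; gaps and ambiguous bases are
    skipped (pairwise deletion), which is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Because the model corrects for multiple substitutions at the same site, one transition in ten compared sites gives a distance slightly above the uncorrected proportion of 0.1.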

Curcumin Content
We mainly describe results of the cultivation experiments for 2009, since data for the previous years of the study (2006, 2007, and 2008) showed similar trends.
Table 4 shows the curcumin content in the maturity period for the years 2006 through 2009. With respect to the curcumin content of primary branch rhizomes in the maturity period, total curcuminoid content, in order from highest to lowest, was as follows: South Asian C. longa (2315.1-3198.4 mg/100 g), with the exception of Indonesia B > domestic C. longa (305.0-392.2 mg/100 g), with the exception of Wakayama B > Indonesia B (309.5 mg/100 g) > C. aromatica (121.9-126.9 mg/100 g). Indonesia A and B and Vietnam A and B showed a high content of accumulated curcuminoid in the rhizome. However, C. longa (Wakayama B) and C. zedoaria showed little curcuminoid content. The inside color of the rhizome in C. longa (Wakayama B) and C. zedoaria was white and purple, respectively, whereas C. longa, which showed an accumulation of curcuminoid, had a yellow-colored rhizome (Figure 1).
The rank order of the contents of curcumin 1, 2, and 3 followed the same trend as that of total curcuminoid content. In domestic C. longa (with the exception of Wakayama B), the ratio of curcumin 1, 2, and 3 was 69%-71%, 21%-23%, and 8%-9%, respectively. However, the ratio in South Asian C. longa (with the exception of Indonesia B) was 46%-52% (curcumin 1), 26%-28% (curcumin 2), and 22%-26% (curcumin 3), indicating a lower percentage of curcumin 1 and a higher percentage of curcumin 3 than in domestic C. longa. The ratio of curcumin 1, 2, and 3 in Indonesia B differed from that in Indonesia A and C and more closely resembled domestic C. longa, despite the fact that Indonesia A, B, and C originated in the same country. In C. aromatica, the ratio of curcumin 1, 2, and 3 was 47%-53%, 47%-52%, and 0-1%, respectively. C. aromatica thus also had a high percentage of curcumin 2.
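The composition ratios reported here are each curcuminoid's share of the total curcuminoid content. A minimal sketch of that arithmetic is given below; the function name `curcuminoid_composition` and the input values are hypothetical and are not taken from Table 4.

```python
def curcuminoid_composition(curcumin_1: float, curcumin_2: float,
                            curcumin_3: float) -> tuple:
    """Return the percentage of curcumin 1, 2, and 3 in total curcuminoids.

    Inputs are contents in mg/100 g. Hypothetical helper for illustration.
    """
    total = curcumin_1 + curcumin_2 + curcumin_3
    if total <= 0:
        raise ValueError("total curcuminoid content must be positive")
    return tuple(round(100 * value / total, 1)
                 for value in (curcumin_1, curcumin_2, curcumin_3))


# Hypothetical contents (mg/100 g), chosen only to illustrate the calculation.
shares = curcuminoid_composition(210.0, 66.0, 24.0)
print(shares)
```

A line with these hypothetical contents would fall in the composition range described for domestic C. longa, in which curcumin 1 dominates.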
The relative order of curcuminoid content in the primary branch rhizomes was similar throughout the four years of the study. Morishita et al. [18] reported that year-to-year variation in curcuminoid content is greater than the varietal differences in turmeric. In the present study, curcuminoid content in Curcuma species showed both varietal differences and year-to-year variation.

DNA Analysis
To construct the molecular phylogenetic tree of Curcuma and its allied species, we determined sequences of the matK gene of Curcuma cpDNA and seven outgroups of Boesenbergia rotunda, Cautleya spicata, Cornukaempferia aurantiflora, Curcumorpha longiflora, Hedychium greenei, Scaphochlamys biloba, and Zingiber mioga. The lengths of the matK gene of Curcuma species varied from 1831 bp (C. longa (Wakayama B)) to 1846 bp (C. thorelii).
The results of the phylogenetic analysis indicated that Curcuma species form a monophyletic group (Figure 2), with the genus Curcuma primarily divided into two monophyletic groups: clade 1 and clade 2. Clade 1, located at the most basal monophyletic position of Curcuma [...] However, because there was no synapomorphic character in this group, we were unable to determine whether C. aromatica was monophyletic or paraphyletic. In addition, 13 samples of C. longa did not share a synapomorphic character that would place them in the monophyletic group; therefore, this species is also considered to show polytomy. We were unable to determine whether C. longa was monophyletic or paraphyletic.
To establish the genetic polymorphisms within C. longa, we determined the chloroplast microsatellite regions of rpl16 intron 2, petB intron 1, and petB intron 2 of C. aromatica, C. longa, and C. zedoaria. The sequence lengths of rpl16 intron 2, petB intron 1, and petB intron 2 were 42-46 bp, 168-169 bp, and 47-62 bp, respectively. Network relationships of each individual by statistical parsimony showed that C. longa included two haplotypes: C and D (Figure 3). The genetic relationship between haplotype C and D was distant. It is notable that haplotype C was closely related to C. aromatica and C. zedoaria despite haplotype C being the most distinct of the C. longa haplotypes.
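The first step behind a haplotype network of this kind, collapsing identical sequences into haplotypes and counting the differences between them, can be sketched as follows. This is only an illustration under stated assumptions: the sample names and sequences are hypothetical, each alignment gap is counted as one difference, and the code does not reproduce the median-joining or statistical parsimony algorithms themselves.

```python
from collections import defaultdict
from itertools import combinations


def collapse_haplotypes(samples: dict) -> dict:
    """Group sample names by identical aligned sequence (one group per haplotype)."""
    groups = defaultdict(list)
    for name, sequence in samples.items():
        groups[sequence].append(name)
    return dict(groups)


def site_differences(seq_a: str, seq_b: str) -> int:
    """Count aligned positions that differ; a gap is treated as a fifth state."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Hypothetical aligned mononucleotide-repeat region; not data from this study.
samples = {
    "line_1": "A" + "T" * 8 + "-GC",
    "line_2": "A" + "T" * 8 + "-GC",
    "line_3": "A" + "T" * 9 + "GC",
    "line_4": "A" + "T" * 9 + "GA",
}

haplotypes = collapse_haplotypes(samples)
for (seq_a, members_a), (seq_b, members_b) in combinations(haplotypes.items(), 2):
    print(members_a, members_b, site_differences(seq_a, seq_b))
```

In this toy data set, lines 1 and 2 share one haplotype, and the other two lines each carry their own; a network program would then connect haplotypes through the smallest numbers of mutational steps.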

Phylogenetic Relationship of Curcuma and Its Allied Species
To clarify the phylogenetic relationships of Curcuma and its allied species, we determined the nucleotide sequences of the cpDNA matK region for all species of the genus examined in this study. Only a few characters were considered in the phylogenetic analysis. We suggest that the degree of genetic differentiation among species of the genus Curcuma is small, despite the fact that these species are morphologically diverse, as evidenced by shoot/root architecture and floral coloration. Therefore, it is likely that the genus lineage has undergone recent, rapid radiation. To confirm the scenario of rapid radiation in the genus Curcuma, we also determined the high-substitution regions of the rps16 intron (accession number AB557658) and trnG (UCC) intron (accession numbers AB557659 and AB557660), and the two intergenic regions of trnV (UAC)-trnM (CAU) (accession number AB55766) and petD-rpoA (accession number AB557662). No variable characters were detected even though the utility of these regions for intraspecific phylogeny was well determined [9]. This result also supports the hypothesis that the evolutionary history of the Curcuma genus underwent recent diversification. Several reports of the rapid radiation of continental plant species have been published [19,20], although most of these examples are reported to have occurred on oceanic islands [21,22]. In most oceanic examples, rapid radiation is hypothesized to have been driven by low levels of competition in new habitats [23]. However, none of the samples used in the present study occur on oceanic islands. Alternatively, the concept of key innovations often is employed to explain the rapid radiation of a lineage [20,23,24]; therefore, the evolution of any key innovations in Curcuma may lead to rapid radiation. In the future, more studies should compare the morphologic and/or physiologic characters of Curcuma and its allied species to determine the nature of any key innovations.

Genetic Markers to Identify Lines with High Curcumin Content
Simple molecular marker assays that do not rely on specialized equipment or reagents underpin routine research activities in many laboratories worldwide. Recent research has shown that the non-coding portion of the plastid genome is more variable than previously anticipated [9]. Mutational hot spots have been reported from intronic or intergenic cpDNA regions of a number of organisms. Such hot spots are usually characterized by the presence of mononucleotide repeats [25] and a high incidence of duplications and other types of indels [26,27].
Short inversions [28] and minisatellites [29,30] have also been identified but seem to occur less frequently. In the present study, we were able to determine the sequence repeats of three regions in Curcuma species, rpl16 intron 2, petB intron 1, and petB intron 2, providing evidence for additional and previously unrecognised types of polymorphic regions in the cpDNA of this group. Identifying mutational hot spots in cpDNA is of utmost interest for studies below the species level. These so-called microsatellites have been used to detect intraspecific chloroplast polymorphisms in various species from a number of plant families [7]. In many cases, the amplified fragments also contained size-variable sites in the target species. Thus, microsatellites may provide reasonable genetic markers to amplify the polymorphic chloroplast regions from any grasses, crops, fruits, and vegetables of interest. As the position of mutational hot spots in general is usually not conserved across species, these primer pairs will have to be tested in any new taxon on the basis of trial and error. However, the present report could extend the application of microsatellites to the Curcuma species. In fact, we observed three instances of haplotype sharing within C. longa, together with a considerable degree of intraspecific variation. Moreover, it is interesting that C. longa exhibits unique chloroplast haplotypes (e.g., haplotype C) that are involved in lines with high curcumin content (Figure 3). This result indicates that the microsatellite regions are useful for identifying C. longa candidate lines with high curcumin content. Further studies will determine whether more comprehensive sampling and additional evidence will support the results of the present study regarding the identification of C. longa lines with high curcumin content.
In this study, we showed that the cpDNA of C. longa can be divided into two groups, which may point to a cryptic species with the appropriate biologic characteristics. Cryptic species are closely related species that differ critically in genetic, physiologic, behavioral, and ecological traits [31], and the abundance of morphologically cryptic or unrecognized species, even in well-known taxa, suggests that there are more species than are currently recognized or estimated [32]. Recent studies have reported that cryptic species seem to be especially common in certain taxa [33-36], and similar evidence for cryptic species was found in our results. Therefore, to verify this hypothesis with respect to C. longa, further reciprocal crosses among the haplotypes discussed in the present study may lead to a conclusion of cryptic diversity. In the future, additional research will be needed to determine if the various cryptic species differ in their ecological requirements and tolerance to potential environmental stresses. There is also the need to re-examine the alpha taxonomy of the four species showing cryptic diversity. Without this basic taxonomic research, these cryptic species will remain in taxonomic crypsis [37].
Our limited data set does not allow us to decide the most appropriate genetic marker at the present time. To accurately map the distribution of C. longa, and to monitor phylogeographic patterns within this species in more detail, we are currently applying the chloroplast markers developed in the present study to a large number of specimens sampled throughout the distributional range of this species. In addition, genetic and phylogenetic studies in Curcuma species have lagged behind those of other crop species. The network analysis presented here, along with the polymorphic information presented, provides valuable genetic resources for future studies of C. longa and related species.

Figure 2. Phylogenetic tree of Curcuma and its allied species using the neighbour-joining (NJ) method. The numbers below the branches indicate the bootstrap value. Black squares indicate high curcumin content. Black circles indicate medium curcumin content. White circles indicate low curcumin content.

Figure 3. Parsimony network of C. longa, C. aromatica, and C. zedoaria based on cpDNA rpl16 intron 2, petB intron 1, and petB intron 2. Haplotype letters correspond to those in Table 3. White circles indicate the taxa in which haplotypes were found. Black circles between haplotypes represent a mutational step.

Table 4. Curcumin content at maturity, 2006-2009.
1) Curcumin content in primary branch rhizomes (mg/100 g). 2) Percentage of curcumin 1, 2, and 3 content to curcuminoids content. Values followed by the same letter in a column are not significantly different at 5% level by one-way ANOVA.