Discrimination of Wild-Grown and Cultivated Ganoderma lucidum by Fourier Transform Infrared Spectroscopy and Chemometric Methods

Wild-grown Ganoderma lucidum (G. lucidum), a traditional Chinese herbal medicine, is highly prized and expensive because of its medicinal efficacy. This study aims to develop an accurate and effective analytical method to distinguish wild-grown G. lucidum from cultivated G. lucidum, which is essential for quality assurance and for estimating medicinal value. Different parts of G. lucidum were also studied to examine the differences between wild-grown and cultivated samples. Fourier transform infrared (FTIR) diffuse reflectance spectroscopy combined with an appropriate chemometric method proved to be a rapid and powerful tool for discriminating wild-grown from cultivated G. lucidum, with a classification accuracy of 98%. The informative spectral absorption bands emphasized by the linear diagnostic rule provide a quantitative interpretation of the chemical constituents of wild-grown G. lucidum that relate to its anticancer effects.


Introduction
Ganoderma lucidum (G. lucidum), a traditional Chinese herbal medicine called "lingzhi" in Chinese, has been widely used as a medical remedy in China and other East Asian countries for centuries. According to the Chinese medical classic Herbal Compendium of Shen Nong, this edible mushroom is one of the most esteemed and potent herbal medicines used for maintaining good health and preventing disorders and diseases. It has also been considered a potential candidate for the treatment of various diseases, including cancer [1] [2].
High-quality wild-grown G. lucidum is rare in nature and has therefore always been highly prized and expensive for its medicinal value. Cultivated G. lucidum has been in commercial demand, particularly in China, during the past several decades. Because of their different growing conditions, wild-grown and cultivated G. lucidum may contain different levels of active chemical components, which affect their quality and medicinal efficacy. Since many G. lucidum products now come in various formulations such as capsules and powder, it is difficult to identify a wild-grown product by physical appearance, smell, or taste. Therefore, an accurate and effective analytical method to determine the differences between wild-grown and cultivated G. lucidum in their unprocessed states is essential for quality assurance and for estimating medicinal value before the material is converted into the final product.
G. lucidum, like other Chinese herbal medicines, is a complicated system of compounds. Currently, herbal medicines are commonly investigated using high performance liquid chromatography (HPLC), thin layer chromatography (TLC), and colorimetry. These methods are expensive, time-consuming, and labour-intensive, and they require large quantities of organic solvents. In addition, the results are inadequate for classification purposes because only a limited number of active chemical components can be detected in what is a very complex system [3] [4].
Fourier transform infrared (FTIR) spectroscopic methods have many advantages for the classification of herbal medicines: the technique is easy and direct to use, it is non-destructive, it needs only a small quantity of sample, and data acquisition is fast. Studies on herbal medicines using the FTIR technique are still in their infancy [5] [6].
Furthermore, FTIR spectra of herbal medicines consist of many overlapping absorption bands representing the different modes of vibration of a large number of molecular constituents in the compounds. These vibrational bands are sensitive to the physical and chemical states of the compounds, and they can be detected at low levels [4]. However, the differences in the FTIR spectra within the same herbal species may be subtle and may not even be visible to the naked eye. Even for experienced analysts, judging slight differences between samples at particular absorption bands by simple visual inspection is subjective, and the results may vary between analysts. Therefore, suitable chemometric methods were applied in our study to analyze the FTIR spectra.
For the analysis of herbal medicine, only a limited number of studies have quantified the main constituents of herbal medicine samples, such as ginseng, semen cassia, and G. lucidum, using FTIR spectroscopy [3]-[6]. Moreover, to the best of our knowledge, there have been no reports on the discrimination of wild-grown G. lucidum samples from cultivated ones using FTIR spectroscopy.
This paper investigates the feasibility of discriminating between wild-grown and cultivated G. lucidum, as well as between different parts of G. lucidum, by FTIR diffuse reflectance spectroscopy combined with chemometric methods. The multivariate methods explored in this paper are based on linear discriminant analysis (LDA) and are simple, robust, and computationally efficient. In particular, the directions of the linear discriminant vectors are potentially interpretable as directions in which the informative spectral bands for discrimination are emphasized, which is useful for exploring the correlations between spectral features and the major chemical compounds of wild-grown G. lucidum related to its anticancer effects.

Sample Preparation
The cultivated and the wild-grown G. lucidum both originated from Taishan, China. The fruiting body (or pileus) of both types of G. lucidum was cross-sectioned into thin slices. A cross-section of a G. lucidum slice showed three structured layers, with colours growing lighter from the top to the bottom of the pileus: the upper crust (skin), the mid-context layer (flesh), and the lower tubular layer (fine channels), as shown in Figure 1. A total of 15 cultivated sliced samples and 15 wild-grown sliced samples were used for data collection. For each sample, four parts (top surface, upper middle area, lower middle area, and bottom surface) were studied. From each part, three spectra were collected at different positions: one from the centre, one from the right-hand side, and one from the left-hand side of the cross-section, as illustrated in Figure 1. In total, 360 G. lucidum spectra (180 from cultivated samples and 180 from wild-grown samples) were collected and used for our analysis.
Before the FTIR diffuse reflectance spectrum was collected, fine powder from each raw sample was transferred onto a circular (1-cm diameter) silicon carbide (SiC) disc by rubbing the disc on the sample. The diffuse reflectance spectrum of the G. lucidum powder coating on the disc was recorded directly without further processing of the sample.
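The sampling design described above can be checked with a short enumeration. This is only an illustrative sketch: the variable names are hypothetical, and a "site" is taken here to mean one part of one sample, in line with how the term is used later in the Statistical Analysis section.

```python
from itertools import product

# Labels follow the text; the variable names are illustrative only.
TYPES = ["cultivated", "wild-grown"]
N_SAMPLES_PER_TYPE = 15
PARTS = ["top surface", "upper middle", "lower middle", "bottom surface"]
POSITIONS = ["centre", "left", "right"]

# One record per spectrum: (type, sample index, part, position).
spectra = list(product(TYPES, range(N_SAMPLES_PER_TYPE), PARTS, POSITIONS))

# A site is one part of one sample; its three positions act as replicates.
sites = {(t, s, p) for (t, s, p, _) in spectra}

n_total = len(spectra)
n_per_type = {t: sum(1 for rec in spectra if rec[0] == t) for t in TYPES}
n_sites = len(sites)
print(n_total, n_per_type, n_sites)
```

With 15 samples, 4 parts, and 3 positions per type, this reproduces the 180 spectra per type and 360 spectra overall stated in the text.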

FTIR Spectroscopic Measurement
An FTIR spectrometer (Perkin Elmer Model 100) equipped with a diffuse reflectance accessory was used to record the diffuse reflectance spectra of the G. lucidum powder coated on SiC discs. The spectra were recorded in the mid-IR region of 4000–400 cm⁻¹ at a resolution of 4 cm⁻¹ with 16 scans per spectrum. Each spectrum, with a high signal-to-noise ratio, was obtained by averaging these 16 scans. The background spectrum, which was the diffuse reflectance spectrum of the SiC disc without the sample powder, was recorded with the same parameters. The sample spectrum was then ratioed against the background spectrum to obtain a transmittance or absorbance spectrum with the unwanted absorption bands of water and carbon dioxide removed [7]. In this way, a diffuse reflectance absorption spectrum with strong absorption bands was collected for each G. lucidum sample.
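The ratioing step can be sketched as follows. The text does not state which conversion the instrument software applied, so the log10(1/R) pseudo-absorbance used here is an assumption (it is one common choice for diffuse reflectance data; the Kubelka-Munk transform is another). The function name and the toy intensities are hypothetical.

```python
import numpy as np

def pseudo_absorbance(sample, background, eps=1e-12):
    """Ratio a single-beam sample spectrum against the background and
    convert the resulting relative reflectance R to log10(1/R)."""
    sample = np.asarray(sample, dtype=float)
    background = np.asarray(background, dtype=float)
    # eps guards against division by zero and log of zero.
    reflectance = sample / np.maximum(background, eps)
    return -np.log10(np.maximum(reflectance, eps))

# Toy single-beam intensities at four wavenumbers (made-up numbers).
background = np.array([100.0, 100.0, 100.0, 100.0])
sample = np.array([100.0, 10.0, 50.0, 1.0])
absorbance = pseudo_absorbance(sample, background)
print(absorbance)
```

Because water vapour and carbon dioxide contribute to both the sample and the background single-beam spectra, their bands largely cancel in the ratio.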

Spectral Pre-Treatment
FTIR spectra are affected by both the concentration of the chemical constituents and the physical properties of the analyzed product. The physical properties account for the majority of the variance among spectra, while the variance due to chemical composition is considered to be small. It is therefore necessary to apply mathematical pretreatments that reduce the variation due to physical effects, such as baseline variation, light scattering, and path length differences, so as to enhance the contribution of the chemical composition [4] [6] [8].
The spectra were first smoothed using the Savitzky-Golay algorithm [9] with a 10-point window. To speed up subsequent manipulation, the smoothed data were then reduced by keeping only every third point. To remove the regions of the spectra with low signal-to-noise ratios arising from the lower system response, only the wavenumbers from 4000 to 450 cm⁻¹, comprising 593 spectral points at 5.987 cm⁻¹ intervals, were used in the analysis. The standard normal variate (SNV) method [10], a mathematical transformation for spectra, was used to remove slope variation and to correct for light scatter due to different particle sizes. The spectra were thereby normalized by setting the mean intensity of each spectrum to zero and its variance to one. The mean spectra of cultivated and wild-grown G. lucidum after pretreatment are presented in Figure 2.
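The data reduction and SNV steps described above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the original R implementation: the Savitzky-Golay smoothing step is omitted, the 2 cm⁻¹ input grid is an assumption, and the function names are hypothetical.

```python
import numpy as np

def decimate_and_crop(wavenumbers, spectra, step=3, lo=450.0, hi=4000.0):
    """Keep every `step`-th point, then keep only lo <= wavenumber <= hi."""
    wn = np.asarray(wavenumbers, dtype=float)[::step]
    sp = np.asarray(spectra, dtype=float)[:, ::step]
    keep = (wn >= lo) & (wn <= hi)
    return wn[keep], sp[:, keep]

def snv(spectra):
    """Standard normal variate: centre each spectrum (row) to mean zero
    and scale it to unit sample variance."""
    sp = np.asarray(spectra, dtype=float)
    mean = sp.mean(axis=1, keepdims=True)
    std = sp.std(axis=1, ddof=1, keepdims=True)
    return (sp - mean) / std

rng = np.random.default_rng(0)
wavenumbers = np.linspace(4000.0, 400.0, 1801)  # synthetic 2 cm^-1 grid (assumed)
# Five synthetic "spectra": noise plus a sloping baseline.
raw = rng.normal(size=(5, wavenumbers.size)) + np.linspace(0.0, 3.0, wavenumbers.size)

wn, cropped = decimate_and_crop(wavenumbers, raw)
pretreated = snv(cropped)
print(pretreated.shape, pretreated.mean(axis=1).round(6))
```

SNV operates on each spectrum independently, so it needs no reference spectrum and can be applied to new samples without refitting.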

Statistical Analysis
FTIR spectra of herbal medicines consist of many overlapping absorption bands, which are the product of complex patterns of biochemical components. Multivariate statistical methods including principal component analysis (PCA), partial least squares (PLS), and linear discriminant analysis (LDA) [11]-[13] were therefore employed in this study to investigate the differences between spectra from wild-grown and cultivated G. lucidum. PCA and PLS were used to reduce the dimension of the original spectral data matrix, X, with little loss of information. From a large number of variables measured on a given set of samples, PCA extracts a small to moderate number of new variables that account for most of the variability between samples. The new variables, called principal components (PCs), are linear combinations of all the original spectral measurements and are uncorrelated with each other. Alternatively, PLS seeks a small to modest number of latent variables, called PLS components, each of which is obtained by maximizing the covariance between the response y and all possible linear functions of X. LDA then finds a linear combination of the new variables, provided by either PCA or PLS, to construct the canonical variate that best separates the two groups.
Using the pretreated spectral data described in Section 2.3, classification rules were derived using principal component discriminant analysis (PCDA) [14] [15] and partial least squares discriminant analysis (PLSDA) [15] [16]. PCDA involved an initial PCA on the pretreated spectra followed by an LDA performed on the scores of the first k PCs. PLSDA involved a PLS regression on the pretreated spectra followed by an LDA on the scores of the first k PLS components. Both PCDA and PLSDA were carried out with k ranging from 2 to 20.
Leave-one-out cross-validation was used to assess each classification rule: the PCDA or PLSDA rule was built on all the data except one site, which was then tested. This was repeated until all sites had been tested, and an overall model accuracy was determined. To ensure that the results were not specific to the training set, a repeated holdout validation was used as an alternative analysis, with 60% of the data used to train the model and 40% to test it. To ensure statistical robustness, this process was repeated 50 times with different randomly resampled training and test sets, and the average accuracies with their standard deviations were used to assess the classification performance.
Furthermore, since different parts of G. lucidum showed very different internal structures, the samples collected from different parts of the fruiting body were believed to have different constituent properties [17]. The same PCDA and PLSDA procedures and validation analyses were carried out on wild-grown and cultivated G. lucidum for discrimination between spectra from different parts of the pileus.
A preliminary study showed that the spectra collected from the different positions (central, left, or right) within the same part of a sample did not differ significantly for either cultivated or wild-grown G. lucidum, which is also supported by the results in Section 3.3. We therefore treated the spectra from the same part of a sample as spectra from the same site. All the above analyses were carried out on a per-site basis, since this usually gives more reliable results than a per-spectrum analysis. In the per-site analysis, the linear discrimination rule was based on the average canonical score of the spectra from one particular site, for both leave-one-out cross-validation and repeated holdout validation. All the algorithms for computations and analyses were implemented in the R statistical programming language [18].
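The PCDA procedure described above (PCA followed by a two-group LDA on the first k PC scores) can be sketched as follows. This is a minimal NumPy illustration on synthetic data, not the authors' R code; the function names, the equal-prior midpoint threshold, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def fit_pcda(X, y, k):
    """PCA on X (rows = spectra), then two-class Fisher LDA on the
    first k PC scores. y holds 0/1 class labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean = X.mean(axis=0)
    # PCA via SVD of the centred data; rows of Vt are the PC loadings.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[:k].T                      # loadings, shape (n_variables, k)
    T = (X - mean) @ P                # PC scores, shape (n_spectra, k)
    T0, T1 = T[y == 0], T[y == 1]
    m0, m1 = T0.mean(axis=0), T1.mean(axis=0)
    # Pooled within-class covariance of the scores.
    S = ((T0 - m0).T @ (T0 - m0) + (T1 - m1).T @ (T1 - m1)) / (len(y) - 2)
    w = np.linalg.solve(S, m1 - m0)   # Fisher discriminant direction
    threshold = 0.5 * (m0 + m1) @ w   # midpoint rule (equal priors assumed)
    return {"mean": mean, "P": P, "w": w, "threshold": threshold}

def canonical_scores(model, X):
    """Project spectra onto the canonical variate."""
    return (np.asarray(X, dtype=float) - model["mean"]) @ model["P"] @ model["w"]

def predict_pcda(model, X):
    return (canonical_scores(model, X) > model["threshold"]).astype(int)

# Synthetic two-group data: 40 "spectra" with 50 variables each.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 50))
X[y == 1, 10:20] += 3.0               # group difference in one "band"

model = fit_pcda(X, y, k=5)
train_accuracy = np.mean(predict_pcda(model, X) == y)
print(train_accuracy)
```

A PLSDA rule has the same structure, with the PC scores replaced by PLS component scores before the LDA step.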
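The two validation schemes can be sketched generically. In this illustration each row of X is assumed to represent one site, a simple nearest-class-mean rule stands in for the PCDA or PLSDA rule, and the holdout split is unstratified; all of these are simplifying assumptions, and the function names are hypothetical.

```python
import numpy as np

def fit_nearest_mean(X, y):
    """Stand-in classifier: store the mean vector of each class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_nearest_mean(model, X):
    classes = np.array(sorted(model))
    d = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes], axis=1)
    return classes[d.argmin(axis=1)]

def leave_one_out_accuracy(X, y, fit, predict):
    """Train on all rows but one, test on the held-out row, repeat for every row."""
    n = len(y)
    hits = 0
    for i in range(n):
        train = np.arange(n) != i
        model = fit(X[train], y[train])
        hits += int(predict(model, X[i:i + 1])[0] == y[i])
    return hits / n

def repeated_holdout_accuracy(X, y, fit, predict,
                              train_fraction=0.6, repeats=50, seed=0):
    """Random train/test splits; returns mean and standard deviation of accuracy."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_fraction * n))
    accs = []
    for _ in range(repeats):
        perm = rng.permutation(n)
        tr, te = perm[:n_train], perm[n_train:]
        model = fit(X[tr], y[tr])
        accs.append(np.mean(predict(model, X[te]) == y[te]))
    accs = np.asarray(accs)
    return accs.mean(), accs.std(ddof=1)

# Synthetic data: 40 sites, 10 features, two well-separated groups.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 10))
X[y == 1, :3] += 4.0

loo = leave_one_out_accuracy(X, y, fit_nearest_mean, predict_nearest_mean)
ho_mean, ho_sd = repeated_holdout_accuracy(X, y, fit_nearest_mean, predict_nearest_mean)
print(loo, ho_mean, ho_sd)
```

Passing `fit` and `predict` as arguments keeps the validation loops independent of the classifier, so the same loops can wrap a PCDA or PLSDA rule.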
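The per-site rule can be illustrated with a small sketch: the canonical scores of the replicate spectra from one site are averaged, and the average is compared with the decision threshold. The scores, site labels, and threshold below are made-up numbers, and the function name is hypothetical.

```python
import numpy as np

def classify_sites(scores, site_ids, threshold):
    """Average the canonical scores within each site and compare the
    average with the decision threshold. Returns {site_id: 0 or 1}."""
    scores = np.asarray(scores, dtype=float)
    site_ids = np.asarray(site_ids)
    result = {}
    for site in np.unique(site_ids):
        site_mean = scores[site_ids == site].mean()
        result[site.item()] = int(site_mean > threshold)
    return result

# Three replicate spectra (centre, left, right) for each of two sites.
scores = [-1.2, -0.8, 0.3, 1.1, 0.9, -0.2]
site_ids = ["A", "A", "A", "B", "B", "B"]
labels = classify_sites(scores, site_ids, threshold=0.0)
print(labels)
```

In this toy example one spectrum in each site falls on the wrong side of the threshold, but the site averages do not, which illustrates why a per-site decision can be more stable than a per-spectrum one.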

Absorption Band Assignments of FTIR Spectra of G. lucidum
The typical FTIR spectrum of G. lucidum after pretreatment is presented for the region of 4000–450 cm⁻¹. The major peaks of the absorption bands are labelled on the mean spectrum shown in Figure 2. Table 1 provides the wavenumbers and the corresponding assignments of the absorption bands in the FTIR spectrum of G. lucidum based on the literature [19]-[22].
Polysaccharides and triterpene compounds (also known as triterpenoids) have long been established as the most biologically active substances in G. lucidum [23]. The bioactive polysaccharides in the forms of glucomannan and arabinan, identified by the absorption bands at 1064 cm⁻¹ and 1035 cm⁻¹ respectively (listed in Table 1), have been demonstrated to exhibit strong anti-tumor activities, including preventing oncogenesis and tumor metastasis [24] [25]. Furthermore, the triterpene compounds in G. lucidum, identified by the absorption bands at 1377 cm⁻¹ and 1145 cm⁻¹ (given in Table 1), have been shown to inhibit primary solid-tumor growth in the spleen and secondary metastatic tumor growth in the liver [24].

Discrimination between Wild-Grown and Cultivated G. lucidum
As shown in Figure 2, the mean spectra of wild-grown and cultivated G. lucidum after pretreatment had very similar patterns. It was difficult to distinguish between wild-grown and cultivated G. lucidum through visual inspection, which indicated that the major components of the two types of G. lucidum are similar. Thus, the multivariate chemometric methods described in Section 2.4 were applied to discern the differences between these two types of G. lucidum.
For the PCDA model, the number of PCs chosen is crucial to the discrimination performance. The discrimination results of cross-validation and repeated holdout were used to optimize the number of PCs. The first fifteen PCs, representing 98% of the total variance in the spectral data, were used to construct the PCDA model for discriminating between wild-grown and cultivated G. lucidum. Results of the leave-one-out cross-validation and the repeated holdout validation analyses for assessing the PCDA model are shown in Table 2(a). The leave-one-out cross-validation analysis gave a discrimination accuracy of 98%. The repeated holdout validation analysis produced an average discrimination accuracy of 96% with a standard deviation of 2%.
With the relationship between the spectral variables and the responses taken into account in the design of the latent variables, the PLSDA model appeared to perform better than the PCDA model, as shown in Table 3(a): it gave comparable discrimination results while using fewer latent variables (only four) to construct the canonical variate.
The results from both models suggested that there may be some inherent compositional differences between cultivated and wild-grown G. lucidum, caused by their different growing environments, even though they belong to the same species.
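The statement that a given number of PCs represents a certain share of the total variance can be illustrated with a short sketch. The data below are synthetic (three strong components plus noise), so the numbers do not correspond to the study; in the paper the number of PCs was chosen from the validation results, and the cumulative variance is shown here only as a descriptive quantity.

```python
import numpy as np

def cumulative_explained_variance(X):
    """Fraction of total variance captured by the first 1, 2, ... PCs."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2
    return np.cumsum(var) / var.sum()

rng = np.random.default_rng(0)
# Synthetic data: three underlying components of decreasing strength plus noise.
scores = rng.normal(size=(60, 3)) * np.array([10.0, 5.0, 2.0])
loadings = rng.normal(size=(3, 40))
X = scores @ loadings + 0.1 * rng.normal(size=(60, 40))

cumvar = cumulative_explained_variance(X)
# Smallest number of PCs whose cumulative share reaches 98%.
k98 = int(np.searchsorted(cumvar, 0.98) + 1)
print(k98, cumvar[:5].round(4))
```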
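The way PLS uses the response when building its latent variables can be sketched with the textbook NIPALS algorithm for a single response (PLS1). This is an illustrative sketch on synthetic data and not necessarily the PLS variant used in the study; the function name and the 0/1 coding of the response are assumptions.

```python
import numpy as np

def pls1_scores(X, y, k):
    """Scores of the first k PLS components (NIPALS, single response)."""
    Xr = np.asarray(X, dtype=float)
    Xr = Xr - Xr.mean(axis=0)
    yr = np.asarray(y, dtype=float)
    yr = yr - yr.mean()
    T = np.empty((Xr.shape[0], k))
    for a in range(k):
        # Weight vector: direction in X-space with maximal covariance with y.
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                    # component scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        q = (yr @ t) / tt             # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - q * t               # deflate y
        T[:, a] = t
    return T

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)             # 0/1 class coding (assumed)
X = rng.normal(size=(40, 30))
X[y == 1, :5] += 1.5

T = pls1_scores(X, y, k=4)
G = T.T @ T                           # off-diagonal entries should vanish
print(np.round(G, 6))
```

Because each weight vector is driven by the covariance with the class labels, a few PLS components can carry the class separation that PCA may spread over many PCs, which is consistent with the smaller number of latent variables reported for PLSDA.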

Discrimination between Different Parts of G. lucidum Slice
To study the differences between different parts of the G. lucidum slice, we combined the spectral data of the upper middle area with those of the lower middle area. Three different parts of the pileus of G. lucidum (top, middle, and bottom) were thus studied to examine their differences.
The PCDA model was first applied to discriminate between the top and middle parts of the pileus for cultivated and wild-grown G. lucidum separately. The two groups, cultivated and wild-grown G. lucidum, showed comparably good discrimination results, with above 98% accuracy for leave-one-out cross-validation and 98% accuracy for repeated holdout validation analysis, as shown in Table 2(b) and Table 2(c).
When the PCDA model was then applied to discriminate between the bottom and middle parts of the pileus for cultivated and wild-grown G. lucidum separately, the two groups showed different discrimination performances. The wild-grown group achieved a high accuracy of 99% for leave-one-out cross-validation and 96% for repeated holdout validation, while the cultivated group gave a lower accuracy of 87% for leave-one-out cross-validation and 83% for repeated holdout validation, as presented in Table 2(b) and Table 2(c).
The PLSDA model gave results fairly consistent with those of the PCDA model, but used a very small number of optimal latent variables (only two or three) for discrimination, as shown in Table 3(b) and Table 3(c).
The results suggested that the differences between the top and middle parts were prominent for both wild-grown and cultivated G. lucidum. However, the differences between the middle and bottom parts may be better detected in wild-grown G. lucidum than in cultivated G. lucidum. These findings can also be presented in a 3D space diagram provided by the PCDA or the PLSDA model. Figure 3(a) and Figure 3(c) show that, in the 3D space represented by the first three PCs, a clear separation plane can be found to discriminate the top part from the middle part for both the cultivated and the wild-grown group. When distinguishing between the middle and bottom parts, there was a neat separation between the two parts for the samples from the wild-grown group, as illustrated in Figure 3(d). However, for the samples from the cultivated group, the separation between the middle and bottom parts was not clearly displayed, as shown in Figure 3(b). The discrimination between different parts of G. lucidum is displayed much better in a 3D space by the PLSDA model, as shown in Figure 4. With the first four PLS components used in constructing the optimal PLSDA model, the 3D scatter plot of the first three PLS components in Figure 4(d) illustrates a much clearer separation between the middle and bottom parts of wild-grown G. lucidum than the corresponding 3D scatter plot of the first three PCs in Figure 3(d), which explained 90% of the total variance of the spectral data.
The mean spectra from the middle and bottom parts of wild-grown G. lucidum also showed larger between-group differences than those of cultivated G. lucidum, especially in certain spectral regions such as ~2900 cm⁻¹, ~1600 cm⁻¹, and ~1000 cm⁻¹, as shown in Figure 5.
The differences between the central and side (left-side or right-side) positions within the same part of the pileus were also explored using a PCDA model with the number of PCs varying from 7 to 15. No obvious difference was found for either cultivated or wild-grown G. lucidum, as shown in Table 4. This also verified that it was reasonable to treat the spectra from the different positions of the same part of the pileus as replicate spectra collected from the same site, as mentioned in Section 2.4.
With different growing environments, different parts of G. lucidum may contain different levels of the major chemical components and thus show somewhat different internal structures. Being completely exposed to the natural or cultivated environment, the upper part of G. lucidum often changes quickly. However, the bottom part usually takes longer to change with the environment, and thus wild-grown G. lucidum, which grows more slowly and is harvested at an older age, showed more differences between the bottom and middle parts than cultivated G. lucidum. The internal structure of the fruiting body of G. lucidum therefore seems to be very important for distinguishing the wild-grown group from the cultivated one.

Correlation between Spectral Absorption Bands and Chemical Components of G. lucidum and Its Medicine Effect
Discrimination performance may be explained by the correlation between the spectral features and the chemical constituents of G. lucidum. Combining the loadings from the PCA and the LDA gives the PCDA loading of the original variables in constructing a canonical variate. In the same way, combining the loadings from the PLS and the LDA gives the PLSDA loading of the original variables. Both the PCDA loading and the PLSDA loading show the contribution at each wavenumber to the linear diagnostic rule and can thus be related easily to the spectral features, which permits interpretation of its spectral basis. In Figure 6, the most obvious feature was a large PCDA loading in the regions of 1150–1000 cm⁻¹ and 1760–1600 cm⁻¹ (peaking at around 1000 cm⁻¹ and 1600 cm⁻¹ respectively), corresponding to some slight differences between the mean spectra of wild-grown and cultivated G. lucidum. The prominent absorption peaks observed in the region of 1150–1000 cm⁻¹ were very characteristic of triterpenoids and polysaccharides, owing to the C-O and C-C vibrations. The other prominent absorption peaks at around 1760–1600 cm⁻¹ were consistent with a C=O stretching vibration in carbonyl compounds, which may reflect the high content of terpenoids and protein in the mixture of G. lucidum. The sharp peak at ~2900 cm⁻¹ was due to the C-H stretching vibration [26]-[28].
A comparison between the PCDA loadings and the PLSDA loadings for discrimination between cultivated and wild-grown G. lucidum can be seen in Figure 7. The loading features observed from the PLSDA model were very similar to those from the PCDA model. The major features of these loadings can also be explained by the assignments of the corresponding absorption bands in the FTIR spectrum listed in Table 1. The high consistency between the loadings of the linear diagnostic rules and the chemical features of the FTIR spectrum may provide a quantitative, chemometric explanation of the major chemical constituents of wild-grown G. lucidum. It is known that G. lucidum contains approximately 400 different bioactive compounds [23]. Among these ingredients, triterpenoids, polysaccharides, and protein are the major chemical constituents of G. lucidum [25]. These biologically active compounds have been demonstrated to prevent oncogenesis and tumor metastasis, and thus have anticancer effects [24] [25] [29]. Some comparative studies also reported that the different parts of the fruiting body of G. lucidum differed in their antitumor effects in human breast cancer cells and in their immunomodulatory activities [27].
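The combination of PCA and LDA loadings described above amounts to a matrix-vector product: if P holds the first k PCA loadings as columns and w is the LDA weight vector on the PC scores, the loading on the original variables is P w. The sketch below, on synthetic data with hypothetical variable names, checks that applying this combined loading to the centred spectra reproduces the canonical scores.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 15)
X = rng.normal(size=(30, 12))          # 30 synthetic "spectra", 12 "wavenumbers"
X[y == 1, 4:7] += 2.0                  # group difference in one "band"

k = 3
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:k].T                           # PCA loadings, shape (12, k)
T = Xc @ P                             # PC scores, shape (30, k)

# Two-class Fisher LDA on the PC scores.
T0, T1 = T[y == 0], T[y == 1]
m0, m1 = T0.mean(axis=0), T1.mean(axis=0)
S = ((T0 - m0).T @ (T0 - m0) + (T1 - m1).T @ (T1 - m1)) / (len(y) - 2)
w = np.linalg.solve(S, m1 - m0)        # LDA weights on the PCs

# Combined loading: one coefficient per original wavenumber.
pcda_loading = P @ w

scores_via_pcs = T @ w
scores_via_loading = Xc @ pcda_loading
print(np.allclose(scores_via_pcs, scores_via_loading))
```

Plotting such a loading vector against wavenumber gives the kind of curve shown in Figure 6; the PLSDA loading is obtained analogously by mapping the LDA weights on the PLS component scores back to the original variables.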

Conclusions
Wild-grown G. lucidum is a rare herb that is prized for its many therapeutic effects and medicinal value. Since many G. lucidum products are sold in various formulations, distinguishing between wild-grown and cultivated G. lucidum products by morphological means is difficult. Quality assurance of wild-grown G. lucidum is therefore essential.
In this study, FTIR spectroscopy combined with multivariate analysis after appropriate spectral data pretreatment proved to be a very powerful tool for distinguishing wild-grown G. lucidum from cultivated G. lucidum. The great advantages of FTIR spectroscopy are its easy sample preparation, its non-destructiveness, and its rapid identification of natural products. The results of this study showed that excellent classification performance can be obtained by linear discrimination models, with accuracy of up to 98%. The PCDA and PLSDA models achieved comparable classification accuracy, but the PLSDA model was simpler because it used a smaller number of latent variables, which is advantageous for algorithm implementation and model interpretation. Furthermore, different parts of G. lucidum were studied to investigate the differences between wild-grown and cultivated samples. Wild-grown G. lucidum showed more differences than cultivated G. lucidum in the internal structure of the fruiting body, particularly between the bottom and middle parts, which seemed to be very important for identifying the wild-grown product.
These results suggest that this multivariate analysis method may have commercial and regulatory potential, since it avoids time-consuming recalibration work and costly, laborious chemical and visual analysis of each sample. Although it uses the entire spectral range, the classification algorithm based on linear discriminant analysis is computationally efficient. Most importantly, the directions of the discriminant vectors used here are physically interpretable as directions in which the informative spectral bands for classification are emphasized, which revealed some correlation between the spectral absorption bands and certain important chemical components of wild-grown G. lucidum with anticancer effects. To our knowledge this is a novel finding, as no previous study on G. lucidum has shown this correlation from the viewpoint of chemometrics. This work therefore provides a quantitative interpretation of, and scientific support for, the claims on the health benefits of G. lucidum and on the antitumor properties of wild-grown G. lucidum. Further studies will focus on the medicinal value of the different parts of G. lucidum using FTIR spectroscopy.

Figure 1. A cross-section of the fruiting body of G. lucidum.

Figure 2. Mean spectra from cultivated (solid line) and wild-grown (dotted line) G. lucidum after standard pre-treatment.

Figure 3. The 3D scatter plot of the first three PCs' scores of the spectra from (a) top and middle parts of cultivated G. lucidum; (b) middle and bottom parts of cultivated G. lucidum; (c) top and middle parts of wild-grown G. lucidum; (d) middle and bottom parts of wild-grown G. lucidum (top or bottom part: circle (○); middle part: triangle (▲)).

Figure 4. The 3D scatter plot of the first three PLS components' scores of the spectra from (a) top and middle parts of cultivated G. lucidum; (b) middle and bottom parts of cultivated G. lucidum; (c) top and middle parts of wild-grown G. lucidum; (d) middle and bottom parts of wild-grown G. lucidum (top or bottom part: circle (○); middle part: triangle (▲)).

Figure 5. Mean spectra from middle part (in solid line) and bottom part (in dotted line) of (a) cultivated G. lucidum, and (b) wild-grown G. lucidum.

Figure 6. PCDA loadings for discrimination between the spectra from cultivated and wild-grown G. lucidum. The PCDA loading is shown in grey, with the mean spectra for the two types superimposed (cultivated in solid line, wild-grown in dotted line).

Table 4. Discrimination results of leave-one-out cross-validation for differentiating between central and side positions within the same part of pileus of (a) cultivated G. lucidum or (b) wild-grown G. lucidum, using seven to fifteen PCs for LDA.

Figure 7. Comparison of PCDA loadings (green line) and PLSDA loadings (blue line) for discrimination between the spectra from cultivated and wild-grown G. lucidum.

Table 1. Absorption band assignments of the FTIR spectrum of G. lucidum.

Table 2. Discrimination results of PCDA model using leave-one-out cross-validation and repeated holdout validation for differentiating between (a) spectra of cultivated and wild-grown G. lucidum; (b) spectra of different parts of wild-grown G. lucidum; (c) spectra of different parts of cultivated G. lucidum using optimal number of PCs for LDA.

Table 3. Discrimination results of PLSDA model using leave-one-out cross-validation and repeated holdout validation for differentiating between (a) spectra of cultivated and wild-grown G. lucidum; (b) spectra of different parts of wild-grown G. lucidum; (c) spectra of different parts of cultivated G. lucidum using optimal number of PLS components for LDA.