Near-Infrared Spectroscopy Combined with Partial Least Squares Discriminant Analysis Applied to Identification of Liquor Brands

The identification of liquor brands is very important for food safety. Most fake liquors are made to have the same flavor and alcohol content as the regular brand, so identifying liquor brands with the same flavor and alcohol content is essential. It is also difficult, because the components of such liquor samples are very similar. Near-infrared (NIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) was applied to the identification of liquor brands with the same flavor and alcohol content. A total of 160 samples of Luzhou Laojiao liquor and 200 samples of non-Luzhou Laojiao liquor with the same flavor and alcohol content were used. Samples of each type were randomly divided into modeling and validation sets. The modeling samples were further divided into calibration and prediction sets using the Kennard-Stone algorithm to achieve uniformity and representativeness. With the PLS-DA method, the recognition rates in the modeling and validation processes were 99.1% and 98.6%, respectively. These results show high prediction performance for the identification of liquor brands and were clearly better than those obtained with the principal component linear discriminant analysis (PCA-LDA) method. NIR spectroscopy combined with the PLS-DA method provides a quick and effective means of discriminant analysis of liquor brands, and is a promising tool for large-scale inspection of liquor food safety.


Introduction
Chinese liquor is a distilled spirit made mainly from grain and obtained using distiller's yeast. This type of liquor also contains abundant micronutrients and active ingredients. Moderate drinking can have positive effects in various respects. China is the leading country in liquor production and consumption. Unfortunately, many fake products are sold in the market because liquor occupies a substantial market share and is highly profitable. These fake liquors are generally low-cost inferior liquors that counterfeit famous liquor brands and are prepared by simple dilution of industrial ethanol. They not only cause economic losses to the producers of famous liquor brands but also pose a serious threat to the health of consumers. As an important part of liquor quality inspection, the identification of liquor brands is attracting increasing attention.
Chinese liquor is a complex mixture composed mainly of water and ethanol; the remainder consists of hundreds of trace components at various contents. Identification of liquor brands usually requires the determination of various feature components and their content ratios using traditional instrumental analysis methods (e.g., high-performance liquid chromatography). Such detection is complicated and costly, and cannot meet the needs of large-scale applications. Currently, identification of liquor products relies mainly on the sensory judgment of tasters. This approach suffers from problems such as strong subjectivity and low precision, and is difficult to apply on a large scale. Thus, developing a simple and effective method for identifying liquor brands is valuable.
Chemometric developments have demonstrated the significant potential of near-infrared (NIR) spectroscopy for rapid and reagent-less measurement. The method is a powerful tool for quantitative analysis in various fields, such as agriculture [1] [9], environment [10], and medicine [11] [12] [13] [14]. NIR quantitative analysis has been used to determine the main components of liquor, including ethanol [7], ethyl acetate [8], and aldehydes [9]. However, the components and their contents differ among liquors of various brands, which have diverse raw materials and production processes. Therefore, it is still difficult to identify liquor brands through quantitative analysis of the above-mentioned conventional components.
Spectral discriminant analysis uses computer pattern recognition to identify and classify samples on the basis of collected spectral data. Instead of quantitatively analyzing particular components of the samples, it relies on overall spectral features, including the spectral similarities among samples of the same type and the spectral differences between samples of different types. Chinese liquor brands use three main flavors, namely, strong fragrant, mild fragrant, and sauce fragrant flavors. Compared with liquor samples of the same flavor, the differences in components and in the corresponding spectra are more obvious between liquors of different flavors [15]. Water and ethanol are the main components of liquor, so the spectra of liquors with different ethanol contents are also remarkably diverse.
The identification of liquor brands is very important for food safety. Most fake liquors are made to have the same flavor and alcohol content as the regular brand, so identifying liquor brands with the same flavor and alcohol content is essential. It is also difficult, because the components of such liquor samples are very similar. As far as we know, there is currently hardly any effective discriminant method for liquor samples with the same flavor and alcohol content.
The present study focused on an identification method for liquor brands with the same flavor and ethanol content. Although difficult, such identification is important. The partial least squares discriminant analysis (PLS-DA) method combined with NIR spectroscopy was employed for the spectral discriminant analysis of liquor brands, and the principal component linear discriminant analysis (PCA-LDA) method was also performed for comparison.
Spectral measurement was performed using a VERTEX 70 FT-NIR Spectrometer (Bruker Co., Germany) equipped with a transmission accessory and a 1 mm cuvette. Twelve scans of symmetrical interferograms at a resolution of 8 cm⁻¹ were added for each spectrum. The scanning range was 14994 cm⁻¹ to 3996 cm⁻¹ at a wavenumber interval of 3.857 cm⁻¹, giving 2852 wavenumbers. An InGaAs detector was used for the entire scanning region. Each liquor sample was measured twice, and the mean of the two measurements was used for modeling and validation. The spectra were obtained at 25 °C ± 1 °C and 45% ± 1% RH. Acquisition of one NIR spectrum took about 0.5 min.

Calibration, Prediction, and Validation Process
The Kennard-Stone (K-S) algorithm [16] [17] is an effective method for sample division in experimental planning. Its objective is to select a maximally diverse subset from a large set of candidate samples, so that the subset uniformly and sufficiently represents the entire sample space. The algorithm assumes that a "distance" between two samples can be defined, whose value is low when the two samples are similar and high when they are dissimilar.
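The selection rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the text does not state which distance was used, so Euclidean distance between spectra is assumed here, and the function name `kennard_stone` is hypothetical. The sketch starts from the two most distant samples and then repeatedly adds the candidate whose nearest already-selected sample is farthest away.

```python
from math import dist  # Euclidean distance (assumed metric, not stated in the text)


def kennard_stone(samples, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Pairwise distance table between all candidate samples.
    d = [[dist(a, b) for b in samples] for a in samples]

    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: d[ij[0]][ij[1]],
    )
    selected = [first, second]
    remaining = set(range(n)) - {first, second}

    while len(selected) < n_select:
        # For each candidate, find the distance to its nearest selected sample,
        # then pick the candidate for which that distance is largest.
        nxt = max(sorted(remaining), key=lambda c: min(d[c][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Toy one-variable "spectra": the extremes are picked first, then the sample
# that best fills the gap between them.
toy = [[0.0], [1.0], [2.0], [9.0], [10.0]]
chosen = kennard_stone(toy, 3)
```

In a two-class setting such as this one, one plausible use is to run the selection separately within the negative and the positive modeling samples, so that the calibration set keeps both classes; the unselected samples would then form the prediction set.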
A framework of calibration, prediction, and validation was used to produce objective models. To ensure modeling representativeness and integrity, the calibration, prediction, and validation sets must all contain negative and positive samples. First, 60 negative and 80 positive samples were randomly selected for validation. The remaining samples (100 negative and 120 positive) were used for modeling. Using the K-S algorithm, the modeling samples were further divided into calibration (50 negative and 60 positive) and prediction (50 negative and 60 positive) sets to achieve uniformity and representativeness. Then, all models were established on the calibration set and evaluated on the prediction set, and the modeling parameters were optimized on the basis of the prediction recognition rate. Finally, the selected model was revalidated against the validation samples, which were excluded from the modeling process.

PCA-LDA Method
PCA-LDA is a commonly used and well-performing method for spectral discriminant analysis [1] [2]. When the number of principal components is selected according to the cumulative variance contribution rate, the first three principal components usually represent most of the information provided by the original variables. Two-dimensional PCA models, formed from any two of the first three principal components, are usually passed to the subsequent LDA procedure. The optimal principal component combination was selected according to the maximum prediction recognition rate (P_REC). The detailed procedure can be found in previous studies [1] [2].
Based on principal component analysis, the PCA-LDA method uses the principal component vector of the spectral matrix to achieve qualitative discrimination of samples.
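As an illustration of this selection over principal component pairs, the following Python sketch uses NumPy with a plain two-class Fisher discriminant. It is a sketch under stated assumptions rather than the procedure of [1] [2]: the midpoint decision threshold (equal class priors), the function names, and the synthetic data in `demo` are all hypothetical.

```python
from itertools import combinations

import numpy as np


def pca_scores(X_cal, X_new, n_components=3):
    """Project both sets onto the leading principal components of X_cal."""
    mean = X_cal.mean(axis=0)
    _, _, vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    loadings = vt[:n_components].T
    return (X_cal - mean) @ loadings, (X_new - mean) @ loadings


def lda_fit(T, y):
    """Two-class Fisher discriminant: direction w and a midpoint threshold."""
    t0, t1 = T[y == 0], T[y == 1]
    m0, m1 = t0.mean(axis=0), t1.mean(axis=0)
    # Pooled within-class scatter matrix.
    sw = (t0 - m0).T @ (t0 - m0) + (t1 - m1).T @ (t1 - m1)
    w = np.linalg.solve(sw, m1 - m0)
    threshold = w @ (m0 + m1) / 2  # assumes equal class priors
    return w, threshold


def select_pc_pair(X_cal, y_cal, X_pred, y_pred):
    """Try every pair among the first three PCs; keep the pair with maximum P_REC."""
    T_cal, T_pred = pca_scores(X_cal, X_pred)
    best_pair, best_rate = None, -1.0
    for pair in combinations(range(3), 2):
        cols = list(pair)
        w, thr = lda_fit(T_cal[:, cols], y_cal)
        predicted = (T_pred[:, cols] @ w > thr).astype(int)
        rate = float(np.mean(predicted == y_pred))  # P_REC as a fraction
        if rate > best_rate:
            best_pair, best_rate = pair, rate
    return best_pair, best_rate


def demo():
    """Synthetic, well-separated two-class data (hypothetical, not liquor spectra)."""
    rng = np.random.default_rng(0)
    y_cal, y_pred = np.repeat([0, 1], 20), np.repeat([0, 1], 15)
    X_cal = rng.normal(size=(40, 30)) + 3.0 * y_cal[:, None]
    X_pred = rng.normal(size=(30, 30)) + 3.0 * y_pred[:, None]
    return select_pc_pair(X_cal, y_cal, X_pred, y_pred)
```

Pairs are indexed from zero, so `(0, 1)` corresponds to PC1-PC2. Ties are resolved in favor of the first pair tested, which is a design choice of this sketch only.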

PLS-DA Method
Unlike the PCA-LDA method, the PLS-DA method classifies the results of PLS quantitative analysis by means of an assignment rule, and thereby achieves qualitative discrimination of samples [18] [19]. In the PLS-DA method, the process for calibration and prediction is as follows.
(1) The category variable of the calibration samples was defined, with the value 1 (or 0) assigned to each positive (or negative) sample.
(2) The number of PLS factors (F) was set from 1 to 20, and the PLS regression coefficients for each F were calculated on the basis of the spectra and category variables of all calibration samples.
(3) On the basis of the obtained PLS coefficients and the spectrum of each prediction sample, the corresponding predicted value ($\tilde{V}_P$) of the category variable was calculated for each F. When $\tilde{V}_P > 0.5$, the category variable ($V_P$) of the prediction sample was assigned 1 and the sample was determined as positive; otherwise, $V_P$ was assigned 0 and the sample was determined as negative.
(4) From the genuine brand type of each prediction sample and the number of correctly recognized prediction samples, the prediction recognition rate was calculated and denoted as P_REC. The optimal number of PLS factors (F) was selected according to the maximum P_REC.
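The four steps above can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' code: the text does not specify the PLS algorithm, so a NIPALS-style PLS1 regression with mean-centering is assumed, and the function names and the synthetic data in `demo` are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_factors):
    """PLS1 regression (NIPALS-style, assumed here); returns centering terms and coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)      # weight vector
        t = Xr @ w                     # scores
        tt = t @ t
        p = Xr.T @ t / tt              # X loadings
        qa = yr @ t / tt               # y loading
        Xr = Xr - np.outer(t, p)       # deflate spectra
        yr = yr - qa * t               # deflate category variable
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients for centered spectra
    return x_mean, y_mean, b


def select_factors(X_cal, y_cal, X_pred, y_pred, max_factors=20):
    """Steps (2)-(4): scan F, threshold predictions at 0.5, keep the F with maximum P_REC."""
    best_f, best_rate = None, -1.0
    for f in range(1, max_factors + 1):
        x_mean, y_mean, b = pls1_fit(X_cal, y_cal, f)
        v_pred = (X_pred - x_mean) @ b + y_mean   # predicted category values
        v_class = (v_pred > 0.5).astype(int)      # 1 = positive, 0 = negative
        rate = float(np.mean(v_class == y_pred))  # P_REC as a fraction
        if rate > best_rate:
            best_f, best_rate = f, rate
    return best_f, best_rate


def demo():
    """Synthetic, well-separated two-class data (hypothetical, not liquor spectra)."""
    rng = np.random.default_rng(0)
    y_cal = np.repeat([0.0, 1.0], 20)             # step (1): 0 = negative, 1 = positive
    y_pred = np.repeat([0.0, 1.0], 15)
    X_cal = rng.normal(size=(40, 30)) + 3.0 * y_cal[:, None]
    X_pred = rng.normal(size=(30, 30)) + 3.0 * y_pred[:, None]
    return select_factors(X_cal, y_cal, X_pred, y_pred, max_factors=5)
```

When several values of F give the same P_REC, this sketch keeps the smallest one; the text does not say how such ties were handled.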

Model Validation
The validation samples, which were excluded from the modeling optimization process, were used to validate the optimal PLS-DA model. From the genuine brand type of each validation sample and the number of correctly recognized validation samples, the validation recognition rate was calculated and denoted as V_REC. The corresponding validation recognition rates of negative and positive samples were also calculated and denoted as V_REC⁻ and V_REC⁺, respectively.

Results and Discussion
The NIR spectra over the entire scanning region (14994-3996 cm⁻¹) of the 160 Luzhou Laojiao (negative) and 200 non-Luzhou Laojiao (positive) liquor samples are shown in Figure 1. The absorption bands at 5128 and 6896 cm⁻¹ are related to the OH stretch first overtone and second overtone of water, respectively [20].
The spectral features in Figure 1(a) and Figure 1(b) were compared. Because the spectra of negative and positive samples overlapped, no obvious spectral differences were available for direct discriminant analysis.

PCA-LDA Model
Using the method in Section 2.3, the PCA-LDA model was first established. The optimal principal component combination was PC1-PC2, and the corresponding P_REC was 92.4%. The corresponding modeling parameters and effects are summarized in Table 1.

PLS-DA Model
The PLS-DA model was established according to the method in Section 2.4. The optimal number of PLS factors (F) was 7, and the corresponding P_REC was 99.1%. The corresponding modeling parameters and effects are also summarized in Table 1. The result indicates that NIR spectroscopy combined with the PLS-DA method achieved good performance for the discriminant analysis of liquor brands, clearly better than that of the PCA-LDA model (P_REC of 92.4%).

Validation
The randomly selected validation samples, which were excluded from the modeling optimization process, were used to validate the optimal PLS-DA model (F = 7). The corresponding validation recognition rates V_REC⁻, V_REC⁺, and V_REC were 96.7%, 100%, and 98.6%, respectively. As shown in Figure 2, the validation samples were clearly divided into two groups by their predicted class variables. Only two of them were wrongly discriminated.
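The reported validation rates can be checked against the sample counts with a short Python calculation. The validation set contained 60 negative and 80 positive samples; the split of the two errors between classes is not stated explicitly, but V_REC⁺ = 100% implies that both misclassified samples were negative, which is the assumption used below. The helper name `recognition_rate` is hypothetical.

```python
def recognition_rate(n_correct, n_total):
    """Recognition rate in percent, rounded to one decimal place."""
    return round(100.0 * n_correct / n_total, 1)


n_neg, n_pos = 60, 80          # validation set sizes from the text
wrong_neg, wrong_pos = 2, 0    # inferred: both misclassified samples were negative

v_rec_neg = recognition_rate(n_neg - wrong_neg, n_neg)   # 58 of 60
v_rec_pos = recognition_rate(n_pos - wrong_pos, n_pos)   # 80 of 80
v_rec = recognition_rate(
    n_neg + n_pos - wrong_neg - wrong_pos, n_neg + n_pos  # 138 of 140
)
```

Under this assumption the three values come out as 96.7, 100.0, and 98.6, matching the rates given above.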

Conclusions
Chinese liquor is a popular alcoholic beverage and occupies a huge market share in China. The identification of liquor brands is of great significance for liquor food safety. There is currently hardly any effective discriminant method for liquor samples with the same flavor and alcohol content, because their chemical components are very similar.
In the present study, the PLS-DA method was successfully applied to the NIR spectral discriminant analysis of liquor brands with the same flavor and ethanol content. The experimental results indicate that the optimal PLS-DA model achieved a high prediction recognition rate for the identification of liquor brands, clearly better than that obtained with the PCA-LDA method.
NIR spectroscopy combined with the PLS-DA method provides a quick and effective means of identifying liquor brands, and is a promising tool for large-scale inspection of liquor food safety.
Further wavelength selection can usually improve the spectral prediction effect and narrow the waveband used; this will be the direction of future research.

Figure 2. Validation recognition of the optimal PLS-DA model.

Table 1. Modeling parameters and effects of PCA-LDA and PLS-DA models.
Note: PCC: principal component combination; F: number of PLS factors; P_REC: prediction recognition rate.