Visible and Near-Infrared Spectroscopic Discriminant Analysis Applied to Brand Identification of Wine

High-end wine brands are made from high-quality grape varieties and yeast strains through a unique process. They are rich in nutrients and have a unique taste and a fragrant scent. Brand identification of wine is difficult and complex because different brands are highly similar. In this paper, visible and near-infrared (NIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) was used to explore the feasibility of wine brand identification. Chilean Aoyo wine (2016 vintage) was selected as the identification brand (negative, 100 samples), and various other brands of wine were used as interference brands (positive, 373 samples). Samples of each type were randomly divided into calibration, prediction and validation sets. For comparison, PLS-DA models were established in three independent and two composite wavebands: visible (400-780 nm), short-NIR (780-1100 nm), long-NIR (1100-2498 nm), whole NIR (780-2498 nm) and whole scanning (400-2498 nm). In independent validation, all five models achieved good discriminant effects. Among them, the visible region model achieved the best effect: the recognition-accuracy rates in validation for negative, positive and total samples were 100%, 95.6% and 97.5%, respectively. The results indicate the feasibility of wine brand identification with Vis-NIR spectroscopy.


Introduction
Wine is an alcoholic beverage with mild alcohol content, diversified taste and high popularity among consumers. High-end wine brands are made from high-quality grape varieties and yeast strains through a unique process. They are rich in nutrients and have a unique taste and a fragrant scent.
Authentication of a high-quality wine brand can effectively avoid wine adulteration and fraud. It is beneficial to protect the intellectual property rights of producers and the interests of consumers.
The traditional identification methods for wine brands mainly include wine tasting and composition analysis. The former relies on human experience, which is subjective and inefficient; the latter requires quantitative analyses of multiple characteristic components followed by classification according to concentration ranges, which is complex, costly and of low accuracy.
Near-infrared (NIR) spectroscopy primarily reflects the absorption of overtones and the combinations of the vibrations of hydrogen-containing functional groups (X-H). It has the advantages of fast, real-time and online measurement and has been effectively used in numerous fields, such as the agricultural [ [16].
Spectral discriminant analysis is a pattern-recognition method based on the spectral data of samples. It achieves classification from the spectral similarity of samples of the same type and the spectral dissimilarity of samples of different types. Discriminant analysis based on visible-NIR spectroscopy is a simple and effective qualitative analysis method. It has been successfully applied in many areas, such as classification of wine and liquor [17]-[21], discrimination of milk powder adulteration [22] [23], authenticity identification of multi-grain rice seeds [24] and identification of transgenic sugarcane leaves [7] [25]. Among these, the identification of wine is performed mainly according to grape variety and origin. However, there have been no reports on the identification of the wine brand.
Identifying the wine brand protects the interests of producers and consumers more directly, and therefore has greater market value and social benefit.
In the present study, one well-known brand of red wine of a single vintage was used as the identification subject, and various other brands of red wine were used as interference samples. Partial least squares discriminant analysis (PLS-DA), an effective spectral discriminant analysis method [26] [27], was used to establish discriminant analysis models for wine brands in the visible-NIR region (400-2498 nm). Further, the whole scanning region was naturally divided into visible (400-780 nm), short-NIR (780-1100 nm), long-NIR (1100-2498 nm) and whole NIR (780-2498 nm) regions. PLS-DA models were established in each of the five wavebands and then compared.

Samples and Division of Calibration, Prediction and Validation
Chilean Aoyo wine (2016 vintage) was selected as the identification brand (negative). Twenty bottles were purchased from formal commercial channels, so their authenticity can be guaranteed. They were randomly divided into 8, 6 and 6 bottles as the calibration, prediction and validation sets, respectively. A small amount of wine (approximately 2 mL) was taken five times from each bottle as five samples, giving 40, 30 and 30 negative samples for calibration, prediction and validation, respectively. In total, there were 100 negative samples.  (25), prediction (24) and validation (24) samples, respectively.
Finally, 145, 114 and 114 positive samples for calibration, prediction and validation were obtained, respectively. In total, there were 373 positive samples.
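The bottle-level division of the negative samples can be sketched as follows. This is a minimal illustration, not the authors' procedure: the bottle identifiers, the function names `split_bottles` and `expand_to_samples`, and the fixed random seed are all assumptions introduced here. Only the 8/6/6 split and the five samples per bottle come from the text.

```python
import random


def split_bottles(bottle_ids, sizes, seed=0):
    """Randomly partition bottle IDs into consecutive groups of the given sizes."""
    if sum(sizes) != len(bottle_ids):
        raise ValueError("sizes must add up to the number of bottles")
    shuffled = list(bottle_ids)
    random.Random(seed).shuffle(shuffled)  # seed is a hypothetical choice
    groups, start = [], 0
    for size in sizes:
        groups.append(shuffled[start:start + size])
        start += size
    return groups


def expand_to_samples(bottles, per_bottle=5):
    """Each bottle yields `per_bottle` samples, labelled (bottle, draw index)."""
    return [(bottle, draw) for bottle in bottles for draw in range(per_bottle)]


# Hypothetical identifiers for the 20 bottles of the identification brand.
bottles = [f"neg-{i:02d}" for i in range(1, 21)]
cal_bottles, pred_bottles, val_bottles = split_bottles(bottles, (8, 6, 6))
cal_samples = expand_to_samples(cal_bottles)
pred_samples = expand_to_samples(pred_bottles)
val_samples = expand_to_samples(val_bottles)
print(len(cal_samples), len(pred_samples), len(val_samples))
```

Splitting before sampling keeps all five samples of a bottle in the same set, so no bottle contributes to both modeling and validation.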

Instruments and Measurement
The instrument used in this study was the XDS Rapid Content™ Liquid Grating Spectrometer (FOSS, Denmark) with a 1 mm cuvette. Spectra were acquired over 400-2498 nm at 2 nm intervals, which covers the entire NIR region and a large part of the visible region. Si and PbS detectors were used for the 400-1100 nm and 1100-2498 nm wavebands, respectively.
Every sample was measured three times and the average spectrum was calculated. The spectra were measured at 25 °C ± 1 °C and 46% ± 1% relative humidity.
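The sampling grid and the averaging step can be sketched as follows. The sketch builds the 2 nm wavelength grid, counts the points falling in each of the five wavebands named in the Introduction, and averages repeated scans point by point. Treating both band limits as inclusive (so that 780 nm and 1100 nm belong to two adjacent bands) is an assumption made here, as are all function and variable names.

```python
def wavelength_grid(start=400, stop=2498, step=2):
    """Wavelengths in nm from start to stop inclusive."""
    return list(range(start, stop + 1, step))


# Band limits in nm, taken from the text; inclusive limits are an assumption.
BANDS = {
    "visible": (400, 780),
    "short-NIR": (780, 1100),
    "long-NIR": (1100, 2498),
    "whole NIR": (780, 2498),
    "whole scanning": (400, 2498),
}


def band_indices(grid, low, high):
    """Indices of the grid points lying inside [low, high]."""
    return [i for i, wavelength in enumerate(grid) if low <= wavelength <= high]


def average_spectrum(scans):
    """Point-by-point mean of repeated scans of one sample."""
    n_scans = len(scans)
    return [sum(values) / n_scans for values in zip(*scans)]


grid = wavelength_grid()
points_per_band = {
    name: len(band_indices(grid, low, high)) for name, (low, high) in BANDS.items()
}
print(len(grid), points_per_band)
```

Under these assumptions the full grid has 1050 points, of which 191 fall in the visible band, 161 in the short-NIR band and 700 in the long-NIR band.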

PLS-DA Method
In any particular spectral region, PLS-DA calibration and prediction models were established as follows. 1) Each positive and negative sample was assigned the categorical variable (C) value 1 or 0, respectively. 2) The number of PLS latent variables (LV) was varied from 1 to 15. Based on the spectra and categorical variables of the samples in the calibration set, the PLS coefficients corresponding to each LV were calculated. 3) Based on the spectrum of each calibration (prediction) sample and the obtained PLS coefficients, the predicted value ($\tilde{C}$) of the categorical variable for the sample was calculated (for each LV). 4) When $\tilde{C} \geq 0.5$, the sample was deemed positive, and when $\tilde{C} < 0.5$, the sample was deemed negative.
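The four steps can be illustrated with a small dependency-free sketch. It is not the authors' MATLAB implementation: it fits a single-response PLS regression with the standard NIPALS iteration, predicts the categorical value of a spectrum, and applies the 0.5 threshold. The synthetic spectra, all function names, the choice of two latent variables, and the assignment of a predicted value of exactly 0.5 to the positive class are assumptions made for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(spectra, classes, n_lv):
    """Fit PLS1 by NIPALS; returns the means and one (w, p, q) triple per LV."""
    n, m = len(spectra), len(spectra[0])
    x_mean = [sum(row[j] for row in spectra) / n for j in range(m)]
    y_mean = sum(classes) / n
    X = [[row[j] - x_mean[j] for j in range(m)] for row in spectra]
    y = [c - y_mean for c in classes]
    components = []
    for _ in range(n_lv):
        # Weight vector: covariance direction between spectra and class values.
        w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(dot(w, w))
        w = [v / norm for v in w]
        t = [dot(row, w) for row in X]          # scores
        tt = dot(t, t)
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = dot(y, t) / tt                      # regression of y on the scores
        # Deflate X and y before extracting the next latent variable.
        X = [[X[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        y = [y[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, spectrum):
    """Predicted categorical value, computed by the same deflation sequence."""
    x_mean, y_mean, components = model
    residual = [a - b for a, b in zip(spectrum, x_mean)]
    value = y_mean
    for w, p, q in components:
        t = dot(residual, w)
        value += q * t
        residual = [r - t * pj for r, pj in zip(residual, p)]
    return value


def classify(value, threshold=0.5):
    """1 = positive, 0 = negative; a value equal to the threshold counts as positive."""
    return 1 if value >= threshold else 0


# Synthetic stand-in for spectra: positives carry an offset on the first 4 points.
rng = random.Random(0)


def fake_spectrum(label, n_points=12):
    return [rng.gauss(0.0, 0.05) + (1.0 if label == 1 and j < 4 else 0.0)
            for j in range(n_points)]


labels = [1] * 20 + [0] * 20
calibration = [fake_spectrum(label) for label in labels]
model = pls1_fit(calibration, labels, n_lv=2)
predicted = [classify(pls1_predict(model, s)) for s in calibration]
print(sum(p == c for p, c in zip(predicted, labels)), "of", len(labels), "recognized")
```

In the study the number of latent variables was scanned from 1 to 15; the same `pls1_fit` call could be repeated for each value and the resulting models compared on the prediction set.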
PLS-DA is a binary classification method based on quantitative regression. To keep the intermediate value (the 0.5 threshold) stable, the numbers of negative and positive spectra in the calibration set were made approximately equal.
Thus, for each negative sample, the first two measured spectra and the average spectrum were used, whereas for each positive sample only the average spectrum was used, in calibration, prediction and validation.
In summary, a total of 673 spectra (negative 300 and positive 373) were used for calibration (negative 120 and positive 145, a total of 265), prediction (negative 90 and positive 114, a total of 204) and validation (negative 90 and positive 114, a total of 204).
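The spectrum counts follow directly from the sample counts, as the short check below shows. It assumes, as the text describes, three spectra per negative sample (two individual measurements plus their average) and one spectrum per positive sample; the variable names are illustrative.

```python
SPECTRA_PER_NEGATIVE = 3  # first two measured spectra + the average spectrum
SPECTRA_PER_POSITIVE = 1  # average spectrum only

negative_samples = {"calibration": 40, "prediction": 30, "validation": 30}
positive_samples = {"calibration": 145, "prediction": 114, "validation": 114}

spectra = {}
for name in negative_samples:
    negative = negative_samples[name] * SPECTRA_PER_NEGATIVE
    positive = positive_samples[name] * SPECTRA_PER_POSITIVE
    spectra[name] = (negative, positive, negative + positive)
    print(name, spectra[name])

total_spectra = sum(total for _, _, total in spectra.values())
print("all sets:", total_spectra)
```

The calibration set then holds 120 negative and 145 positive spectra, which is the approximate balance the threshold of 0.5 relies on.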

Model Evaluation Indicators
According to the real category (positive or negative) of the samples, nine recognition-accuracy rates (RARs) for the positive and negative samples in calibration and prediction were proposed and calculated. These rates evaluate the discrimination model more comprehensively, from the perspectives of both sample categories and both sample sets. Moreover, the evaluation method is also applicable to the discriminant analysis of other objects.
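The defining formulas for the RARs are not reproduced in this text. The sketch below therefore assumes the usual reading of a recognition-accuracy rate: the percentage of spectra in a group whose predicted class matches the real class. It returns the positive, negative and total rates for one sample set; how the nine rates are composed across sets is not shown here, and the function name and the toy labels are illustrative.

```python
def recognition_accuracy_rates(true_labels, predicted_labels):
    """Percent correctly recognized among positives (1), negatives (0) and all."""
    pairs = list(zip(true_labels, predicted_labels))

    def rate(selected):
        if not selected:
            return float("nan")  # no samples of this category in the set
        return 100.0 * sum(t == p for t, p in selected) / len(selected)

    return {
        "positive": rate([pair for pair in pairs if pair[0] == 1]),
        "negative": rate([pair for pair in pairs if pair[0] == 0]),
        "total": rate(pairs),
    }


# Toy example: 4 positives (one misrecognized) and 6 negatives (all correct).
true_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
rars = recognition_accuracy_rates(true_labels, predicted_labels)
print(rars)
```

Applying the same function separately to the calibration, prediction and validation sets gives three rates per set.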
The computer algorithms for the abovementioned method were implemented in MATLAB version 7.6.

PLS-DA Models and Comparison
The Vis-NIR spectra of the negative and positive wine samples over the whole scanning region (400-2498 nm) are shown in Figure 1(a) and Figure 1(b). Figure 1 shows that in the visible region (400-780 nm), the amplitude of spectral variation among the negative samples was small, while that among the positive samples was significantly larger. In addition, spectral differences between the two types of samples were also observed in the combination region (2100-2300 nm).
Based on the spectra throughout the visible, short-NIR, long-NIR, whole NIR and whole scanning region, the PLS-DA models were established, respectively.
In the visible region (400-780 nm), the corresponding RARs for the positive and negative samples of calibration and prediction (

[Table: RAR of five wavebands]
S. X. Liao et al.

Model Validation
The samples that were not involved in calibration were used to validate the PLS-DA models in the above five regions. The obtained validation RARs are shown in Table 2. In addition, the predicted values of the categorical variables for the PLS-DA models of the five wavebands are shown in Figure 2.
The results showed that all five models achieved good discriminant effects in validation. Among them, the visible model achieved the best effect (the total validation RAR reached 97.5%), which is significantly better than the three NIR models. These results provide an experimental basis for selecting a more suitable analytical waveband and spectral instrument.
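The reported validation rates of the visible model (100%, 95.6% and 97.5% for negative, positive and total samples) can be checked for internal consistency against the validation set of 90 negative and 114 positive spectra. The sketch below searches for whole-number counts of correctly recognized spectra that round to the reported percentages. The resulting counts are an inference from the published percentages, not figures reported in the paper, and the helper name is illustrative.

```python
def counts_matching(rate_percent, n_spectra, decimals=1):
    """All counts k for which 100*k/n_spectra rounds to the reported rate."""
    return [k for k in range(n_spectra + 1)
            if round(100 * k / n_spectra, decimals) == rate_percent]


N_NEGATIVE, N_POSITIVE = 90, 114  # validation spectra per class

negative_correct = counts_matching(100.0, N_NEGATIVE)
positive_correct = counts_matching(95.6, N_POSITIVE)
print("negative:", negative_correct, "positive:", positive_correct)

# If each rate admits a single count, the total rate follows from their sum.
total_correct = negative_correct[0] + positive_correct[0]
total_rate = round(100 * total_correct / (N_NEGATIVE + N_POSITIVE), 1)
print("total:", total_correct, "of", N_NEGATIVE + N_POSITIVE, "=", total_rate, "%")
```

Under this reading, each class rate admits exactly one count, and the total rate computed from those counts agrees with the reported 97.5%.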
In addition, different brands of wine can be produced from the same grape variety and origin. Therefore, compared with wine classification based on variety or origin, brand identification is more difficult because of the high similarity between samples, and its exploration here is novel. It is worth mentioning that the conventional method requires quantitative analyses of multiple characteristic components followed by classification according to concentration ranges, whereas the method in this article is simpler and more effective.

Conclusions
The identification of a high-quality wine brand is significant because it can effectively prevent wine adulteration and fraud and protect the interests of producers and consumers. The traditional identification methods are complex and inefficient.
Discriminant analysis based on visible-NIR spectroscopy is a simple and effective qualitative analysis method.
In the present study, PLS-DA models were established in the visible, short-NIR, long-NIR, whole NIR and whole scanning regions. The experimental results indicate that all five PLS-DA models achieved good discriminant effects for the brand identification of wine. Among them, the visible model achieved the best effect (the total validation RAR reached 97.5%), which is significantly better than the three NIR models. The article provides a basis for further wavelength selection work and a valuable reference for selecting a suitable spectral instrument. The proposed method is also expected to be applicable to brand identification of other objects.