American Journal of Analytical Chemistry
Vol. 11 No. 02 (2020), Article ID: 98422, 10 pages
10.4236/ajac.2020.112008

Visible and Near-Infrared Spectroscopic Discriminant Analysis Applied to Brand Identification of Wine

Sixia Liao¹, Jiemei Chen¹, Tao Pan²*

¹Department of Biological Engineering, Jinan University, Guangzhou, China

²Department of Optoelectronic Engineering, Jinan University, Guangzhou, China

Copyright © 2020 by author(s) and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).

http://creativecommons.org/licenses/by/4.0/

Received: January 18, 2020; Accepted: February 21, 2020; Published: February 24, 2020

ABSTRACT

High-end wine brands are made from high-quality grape varieties and yeast strains through a unique process. Such wine is not only rich in nutrients but also has a unique taste and a fragrant scent. Brand identification of wine is difficult because wines of different brands are highly similar. In this paper, visible and near-infrared (NIR) spectroscopy combined with partial least squares discriminant analysis (PLS-DA) was used to explore the feasibility of wine brand identification. Chilean Aoyo wine (2016 vintage) was selected as the identification brand (negative, 100 samples), and various other brands of wine were used as interference brands (positive, 373 samples). Samples of each type were randomly divided into calibration, prediction and validation sets. For comparison, PLS-DA models were established in three independent wavebands, namely visible (400 - 780 nm), short-NIR (780 - 1100 nm) and long-NIR (1100 - 2498 nm), and in two composite wavebands, namely whole NIR (780 - 2498 nm) and whole scanning (400 - 2498 nm). In independent validation, all five models achieved good discrimination. Among them, the visible-region model performed best: its validation recognition-accuracy rates for negative, positive and total samples were 100%, 95.6% and 97.5%, respectively. The results indicate the feasibility of wine brand identification with Vis-NIR spectroscopy.

Keywords:

Wine, Brand Identification, Visible-Near Infrared Spectroscopy, Partial Least Squares Discriminant Analysis, Waveband Selection

1. Introduction

Wine is an alcoholic beverage with mild alcohol content, diversified taste and high popularity among consumers. High-end wine brands are made from high-quality grape varieties and yeast strains through a unique process. Such wine is not only rich in nutrients but also has a unique taste and a fragrant scent. Authentication of a high-quality wine brand can effectively prevent wine adulteration and fraud, which helps to protect the intellectual property rights of producers and the interests of consumers.

The traditional identification methods for wine brands are mainly tasting by a wine taster and composition analysis. The former relies on human experience, which is subjective and inefficient; the latter requires quantitative analyses of multiple characteristic components followed by classification according to their concentration ranges, which is complex, costly and of low accuracy.

Near-infrared (NIR) spectroscopy primarily reflects the absorption of overtones and the combinations of the vibrations of hydrogen-containing functional groups (X-H). It has the advantages of fast, real-time and online measurement and has been effectively used in numerous fields, such as the agricultural [1] [2] [3] [4], food [5] [6] [7] [8], environmental [9] [10] and biomedical fields [11] - [16].

Spectral discriminant analysis is a pattern-recognition method based on the spectral data of samples. It achieves classification by exploiting the spectral similarity of samples of the same type and the spectral dissimilarity of samples of different types. Discriminant analysis based on visible-NIR spectroscopy is a simple and effective qualitative analysis method. It has been successfully applied in many areas, such as classification of wine and liquor [17] - [21], discrimination of milk powder adulteration [22] [23], authenticity identification of multi-grain rice seeds [24] and identification of transgenic sugarcane leaves [7] [25]. Among these, the identification of wine has been performed mainly according to grape variety and origin, and there have been no reports on the identification of the wine brand. Identifying the wine brand protects the interests of producers and consumers more directly and therefore has more immediate market value and social benefit.

In the present study, one well-known brand of red wine of a single vintage was used as the identification subject, and various other brands of red wine were used as interference samples. Partial least squares discriminant analysis (PLS-DA), an effective spectral discriminant analysis method [26] [27], was used to establish discriminant analysis models for wine brands in the visible-NIR region (400 - 2498 nm). Further, the whole scanning region was naturally divided into visible (400 - 780 nm), short-NIR (780 - 1100 nm), long-NIR (1100 - 2498 nm) and whole NIR (780 - 2498 nm) regions. PLS-DA models were established in each of the five wavebands and compared in order to select the most suitable one.

2. Materials and Methods

2.1. Samples and Division of Calibration, Prediction and Validation

Chilean Aoyo wine (2016 vintage) was selected as the identification brand (negative). Twenty bottles were purchased through formal commercial channels, so their authenticity can be guaranteed. They were randomly divided into 8, 6 and 6 bottles as the calibration, prediction and validation sets, respectively. A small amount of wine (approximately 2 mL) was taken five times from each bottle as five samples, giving 40, 30 and 30 negative samples for calibration, prediction and validation, respectively. In total, there were 100 negative samples.

Three well-known Chinese wine brands, Great Wall (2018 vintage), Changyu (2018 vintage) and Dynasty (2004 vintage), were selected as the interference brands (positive). Twenty bottles per brand were also purchased through formal commercial channels. Similarly, the 20 bottles of each brand were divided into calibration (8), prediction (6) and validation (6) sets, and each bottle was sampled five times. This gave 120, 90 and 90 positive samples for calibration, prediction and validation, respectively. In order to widen the interference range, 21 other imported wine brands (one bottle each) and homemade wines from four sources (52 bottles in total) were also collected as interference samples (positive), for a total of 73 bottles. They were randomly divided into calibration (25), prediction (24) and validation (24) samples. Finally, 145, 114 and 114 positive samples for calibration, prediction and validation were obtained, respectively. In total, there were 373 positive samples.

2.2. Instruments and Measurement

The instrument used in this study was the XDS Rapid Content™ Liquid Grating Spectrometer (FOSS, Denmark) with a 1 mm cuvette. Spectra were acquired over 400 - 2498 nm at 2 nm intervals, which covers the entire NIR region and a large part of the visible region. Si and PbS detectors were used for the 400 - 1100 nm and 1100 - 2498 nm wavebands, respectively. Every sample was measured three times and the average spectrum was calculated. The spectra were measured at 25˚C ± 1˚C and 46% ± 1% relative humidity.
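To illustrate how the five wavebands map onto the measured wavelength grid, the following minimal sketch builds the 2 nm grid and counts the grid points in each waveband. It assumes that the waveband limits quoted in the Introduction are inclusive at both ends, so adjacent wavebands share their boundary wavelength; the names `WAVEBANDS` and `band_indices` are illustrative and do not come from the original work.

```python
# Wavelength grid of the scan: 400 - 2498 nm sampled every 2 nm.
wavelengths = list(range(400, 2498 + 1, 2))

# Waveband limits in nm (assumption: inclusive at both ends).
WAVEBANDS = {
    "visible": (400, 780),
    "short-NIR": (780, 1100),
    "long-NIR": (1100, 2498),
    "whole NIR": (780, 2498),
    "whole scanning": (400, 2498),
}


def band_indices(grid, low, high):
    """Return the indices of grid points with low <= wavelength <= high."""
    return [i for i, wl in enumerate(grid) if low <= wl <= high]


# Number of wavelengths available to a model built on each waveband.
band_sizes = {
    name: len(band_indices(wavelengths, low, high))
    for name, (low, high) in WAVEBANDS.items()
}

for name, size in band_sizes.items():
    print(f"{name}: {size} wavelengths")
```

Under these assumptions the index lists returned by `band_indices` can be used to cut the columns of a spectral matrix before a model is fitted on a single waveband.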

2.3. PLS-DA Method

In any particular spectral region, PLS-DA calibration and prediction models were established as follows. 1) Each positive sample was assigned the categorical variable value $C = 1$ and each negative sample the value $C = 0$. 2) The number of PLS latent variables (LV) was varied from 1 to 15. Based on the spectra and categorical variables of the samples in the calibration set, the PLS coefficients corresponding to each LV were calculated. 3) Based on the spectrum of each calibration (prediction) sample and the obtained PLS coefficients, the predicted value $\tilde{C}$ of the categorical variable was calculated for the sample (for each LV). 4) When $\tilde{C} \geq 0.5$, the sample was deemed positive, and when $\tilde{C} < 0.5$, the sample was deemed negative.

PLS-DA is a binary classification method based on quantitative regression. In order to keep the decision threshold (0.5) stable, the numbers of negative and positive spectra in the calibration set were made approximately equal. Thus, for each negative sample the first two measured spectra and the average spectrum were used (three spectra per sample), whereas for each positive sample only the average spectrum was used, in calibration, prediction and validation alike.

In summary, a total of 673 spectra (300 negative and 373 positive) were used: 265 for calibration (120 negative and 145 positive), 204 for prediction (90 negative and 114 positive) and 204 for validation (90 negative and 114 positive).
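The four steps above can be sketched in a few lines of code. The example below is a minimal pure-Python PLS1 (NIPALS) regression followed by the 0.5 threshold. It is not the authors' MATLAB implementation; the function names (`pls1_fit`, `pls1_predict`, `classify`) and the two-wavelength toy spectra are hypothetical and serve only to make the decision rule concrete.

```python
import math


def pls1_fit(X, y, n_lv):
    """Fit a PLS1 model by NIPALS. X: list of spectra, y: list of 0/1 labels."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered spectra
    f = [v - y_mean for v in y]                                # centered labels
    comps = []
    for _ in range(n_lv):
        # Weight vector: covariance direction between spectra and labels.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain, stop early
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate spectra and labels before extracting the next latent variable.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predicted categorical value for one spectrum x."""
    x_mean, y_mean, comps = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    c_hat = y_mean
    for w, load, q in comps:
        t = sum(e[j] * w[j] for j in range(len(e)))
        c_hat += q * t
        e = [e[j] - t * load[j] for j in range(len(e))]
    return c_hat


def classify(c_hat, threshold=0.5):
    """Step 4: positive when the predicted value reaches the threshold."""
    return "positive" if c_hat >= threshold else "negative"


# Hypothetical toy spectra with two wavelengths; labels 0 = negative, 1 = positive.
X_cal = [[0.1, 0.5], [0.1, 0.7], [0.9, 0.5], [0.9, 0.7]]
y_cal = [0, 0, 1, 1]
model = pls1_fit(X_cal, y_cal, n_lv=1)
labels = [classify(pls1_predict(model, x)) for x in [[0.8, 0.6], [0.2, 0.6]]]
print(labels)
```

In a real analysis the loop over LV = 1 to 15 would call `pls1_fit` once per LV and compare the resulting recognition-accuracy rates; that selection step is left out of this sketch.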
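The spectra counts above follow directly from the bottle and sample numbers in Section 2.1. The short sketch below reproduces the arithmetic; the constant and function names are illustrative, and the only inputs are the figures stated in the text.

```python
SETS = ("calibration", "prediction", "validation")

# Bottles per set for the identification brand and for each of the three
# Chinese interference brands.
BOTTLES_PER_BRAND = {"calibration": 8, "prediction": 6, "validation": 6}
SAMPLES_PER_BOTTLE = 5
N_CHINESE_BRANDS = 3

# Additional interference samples (imported and homemade wines).
EXTRA_POSITIVE = {"calibration": 25, "prediction": 24, "validation": 24}

# Spectra kept per sample: two measured spectra plus the average for
# negatives, only the average for positives.
SPECTRA_PER_NEGATIVE = 3
SPECTRA_PER_POSITIVE = 1


def spectra_counts():
    """Return the number of negative and positive spectra in each set."""
    counts = {}
    for name in SETS:
        neg_samples = BOTTLES_PER_BRAND[name] * SAMPLES_PER_BOTTLE
        pos_samples = (N_CHINESE_BRANDS * BOTTLES_PER_BRAND[name]
                       * SAMPLES_PER_BOTTLE + EXTRA_POSITIVE[name])
        counts[name] = {
            "negative": neg_samples * SPECTRA_PER_NEGATIVE,
            "positive": pos_samples * SPECTRA_PER_POSITIVE,
        }
    return counts


counts = spectra_counts()
total_spectra = sum(c["negative"] + c["positive"] for c in counts.values())
print(counts, total_spectra)
```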

2.4. Model Evaluation Indicators

According to the real category (positive and negative) of the samples, the nine recognition-accuracy rates (RARs) of the positive and negative samples in calibration and prediction were proposed and calculated as follows:

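$$\mathrm{RAR}^{+}_{\mathrm{Calibration}} = \frac{\tilde{N}^{+}_{C}}{N^{+}_{C}}, \quad \mathrm{RAR}^{-}_{\mathrm{Calibration}} = \frac{\tilde{N}^{-}_{C}}{N^{-}_{C}} \tag{1}$$

$$\mathrm{RAR}^{+}_{\mathrm{Prediction}} = \frac{\tilde{N}^{+}_{P}}{N^{+}_{P}}, \quad \mathrm{RAR}^{-}_{\mathrm{Prediction}} = \frac{\tilde{N}^{-}_{P}}{N^{-}_{P}} \tag{2}$$

$$\mathrm{RAR}^{+}_{\mathrm{Total}} = \frac{\tilde{N}^{+}_{C} + \tilde{N}^{+}_{P}}{N^{+}_{C} + N^{+}_{P}}, \quad \mathrm{RAR}^{-}_{\mathrm{Total}} = \frac{\tilde{N}^{-}_{C} + \tilde{N}^{-}_{P}}{N^{-}_{C} + N^{-}_{P}} \tag{3}$$

$$\mathrm{RAR}_{\mathrm{Calibration}} = \frac{\tilde{N}^{+}_{C} + \tilde{N}^{-}_{C}}{N^{+}_{C} + N^{-}_{C}}, \quad \mathrm{RAR}_{\mathrm{Prediction}} = \frac{\tilde{N}^{+}_{P} + \tilde{N}^{-}_{P}}{N^{+}_{P} + N^{-}_{P}} \tag{4}$$

$$\mathrm{RAR}_{\mathrm{Total}} = \frac{\tilde{N}^{+}_{C} + \tilde{N}^{-}_{C} + \tilde{N}^{+}_{P} + \tilde{N}^{-}_{P}}{N^{+}_{C} + N^{-}_{C} + N^{+}_{P} + N^{-}_{P}} \tag{5}$$

where $N^{+}_{C}$, $N^{-}_{C}$, $N^{+}_{P}$ and $N^{-}_{P}$ are the numbers of real positive and negative samples in the calibration and prediction sets, respectively, and $\tilde{N}^{+}_{C}$, $\tilde{N}^{-}_{C}$, $\tilde{N}^{+}_{P}$ and $\tilde{N}^{-}_{P}$ are the numbers of correctly recognized positive and negative samples in the calibration and prediction sets. The standard deviation ($\mathrm{RAR}_{\mathrm{SD}}$) of the above nine RARs was further calculated to evaluate their equilibrium. From the negative, positive, calibration and prediction aspects, these 10 indicators comprehensively evaluated the prediction effect of the DA model and the equilibrium between indicators. The optimal LV was determined according to the maximum total RAR ($\mathrm{RAR}_{\mathrm{Total}}$).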
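As a worked illustration of Equations (1) to (5), the sketch below computes the nine RARs and their standard deviation from counts of correctly recognized samples. The counts are hypothetical toy numbers, not results from this study, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not state which form of standard deviation was used.

```python
from statistics import pstdev


def recognition_accuracy_rates(correct, total):
    """Nine RARs plus their standard deviation.

    correct / total: dicts keyed by (set, class), where set is "C"
    (calibration) or "P" (prediction) and class is "+" or "-".
    """
    def rate(keys):
        return sum(correct[k] for k in keys) / sum(total[k] for k in keys)

    cp, cn, pp, pn = ("C", "+"), ("C", "-"), ("P", "+"), ("P", "-")
    rars = {
        "calibration+": rate([cp]), "calibration-": rate([cn]),   # Eq. (1)
        "prediction+": rate([pp]), "prediction-": rate([pn]),     # Eq. (2)
        "total+": rate([cp, pp]), "total-": rate([cn, pn]),       # Eq. (3)
        "calibration": rate([cp, cn]), "prediction": rate([pp, pn]),  # Eq. (4)
        "total": rate([cp, cn, pp, pn]),                          # Eq. (5)
    }
    # Assumption: population standard deviation over the nine RARs.
    sd = pstdev(list(rars.values()))
    rars["sd"] = sd
    return rars


# Hypothetical toy counts (not data from the paper).
total = {("C", "+"): 10, ("C", "-"): 10, ("P", "+"): 5, ("P", "-"): 5}
correct = {("C", "+"): 9, ("C", "-"): 10, ("P", "+"): 4, ("P", "-"): 5}
rars = recognition_accuracy_rates(correct, total)
for name, value in rars.items():
    print(f"{name}: {value:.3f}")
```

Choosing the optimal LV then amounts to repeating this calculation for each LV and keeping the one with the largest `rars["total"]`.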

Next, the validation samples were treated as prediction samples and identified by following the previous steps. Referring to the real category of the validation samples, the positive and negative validation RARs ($\mathrm{RAR}^{+}_{\mathrm{Validation}}$ and $\mathrm{RAR}^{-}_{\mathrm{Validation}}$) and the total validation RAR ($\mathrm{RAR}_{\mathrm{Validation}}$) were calculated as follows:

$$\mathrm{RAR}^{+}_{\mathrm{Validation}} = \frac{\tilde{N}^{+}_{V}}{N^{+}_{V}}, \quad \mathrm{RAR}^{-}_{\mathrm{Validation}} = \frac{\tilde{N}^{-}_{V}}{N^{-}_{V}}, \quad \mathrm{RAR}_{\mathrm{Validation}} = \frac{\tilde{N}^{+}_{V} + \tilde{N}^{-}_{V}}{N^{+}_{V} + N^{-}_{V}} \tag{6}$$

where $N^{+}_{V}$ and $N^{-}_{V}$ are the numbers of real positive and negative samples in the validation set, respectively, and $\tilde{N}^{+}_{V}$ and $\tilde{N}^{-}_{V}$ are the numbers of correctly recognized positive and negative samples in the validation set, respectively.

The nine recognition-accuracy rates are proposed from the perspectives of the positive and negative samples in calibration and prediction, which can evaluate the discrimination model more comprehensively. Moreover, the evaluation method is also applicable to the discriminant analysis of other objects.

The computer algorithms for the abovementioned method were designed using MATLAB version 7.6 software.

3. Results and Discussion

3.1. PLS-DA Models and Comparison

The Vis-NIR spectra of the negative and positive wine samples over the whole scanning region (400 - 2498 nm) are shown in Figure 1(a) and Figure 1(b), respectively. It can be seen from Figure 1 that in the visible region (400 - 780 nm) the spectra of the negative samples varied little, whereas the variation among the positive samples was considerably larger. In addition, spectral differences between the two types of samples were also observed in the combination region (2100 - 2300 nm).

PLS-DA models were established separately on the spectra of the visible, short-NIR, long-NIR, whole NIR and whole scanning regions.

In the visible region (400 - 780 nm), the RARs for the positive and negative samples in calibration and prediction ($\mathrm{RAR}^{+}_{\mathrm{Calibration}}$, $\mathrm{RAR}^{-}_{\mathrm{Calibration}}$, $\mathrm{RAR}^{+}_{\mathrm{Prediction}}$ and $\mathrm{RAR}^{-}_{\mathrm{Prediction}}$) were 97.9%, 100.0%, 95.6% and 100.0%, respectively; in the short-NIR region (780 - 1100 nm), the corresponding RARs were 97.9%, 99.2%, 94.7% and 94.4%, respectively; in the long-NIR region (1100 - 2498 nm), the corresponding RARs were 97.2%, 100.0%, 93.9% and 90.0%, respectively; in the whole NIR region (780 - 2498 nm), the corresponding RARs were 95.9%, 100.0%, 94.7% and 90.0%, respectively; in the whole scanning region (400 - 2498 nm), the corresponding RARs were 97.9%, 100.0%, 96.5% and 100.0%, respectively. In addition, the five other RARs ($\mathrm{RAR}_{\mathrm{Calibration}}$, $\mathrm{RAR}_{\mathrm{Prediction}}$, $\mathrm{RAR}^{+}_{\mathrm{Total}}$, $\mathrm{RAR}^{-}_{\mathrm{Total}}$ and $\mathrm{RAR}_{\mathrm{Total}}$) of the five wavebands are summarized in Table 1. As can be seen from Table 1, the total RARs ($\mathrm{RAR}_{\mathrm{Total}}$) of the five wavebands were all relatively high. Moreover, the nine RAR values fluctuated little. Compared with the short- and long-NIR regions, the visible region achieved a better discriminant effect and equilibrium.

Figure 1. Vis-NIR spectra of wine samples for (a) negative, (b) positive.

Table 1. Recognition-accuracy rates of the PLS-DA models of the five wavebands in modelling.

Table 2. Recognition-accuracy rates of the PLS-DA models of the five wavebands in validation.

3.2. Model Validation

The samples that were not involved in calibration or prediction (the validation set) were used to validate the PLS-DA models of the above five regions. The obtained validation RARs ($\mathrm{RAR}^{-}_{\mathrm{Validation}}$, $\mathrm{RAR}^{+}_{\mathrm{Validation}}$ and $\mathrm{RAR}_{\mathrm{Validation}}$) are summarized in Table 2. In addition, the predicted values of the categorical variable for the PLS-DA models of the five wavebands are shown in Figure 2.

The results showed that all five models achieved good discrimination in validation. Among them, the visible model achieved the best effect ($\mathrm{RAR}_{\mathrm{Validation}}$ reached 97.5%), which is significantly better than the three NIR models. The above results provide an experimental basis for selecting a more suitable analytical waveband and spectral instrument.

In addition, different brands of wine can be produced from the same grape variety and origin. Therefore, compared with wine classification based on variety or origin, brand identification of wine is more difficult because of the high similarity between samples, and its exploration is novel. It is worth mentioning that the conventional method requires quantitative analyses of multiple characteristic components followed by classification according to their concentration ranges, whereas the method in this article is simpler and more effective.
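As a quick consistency check, the total validation RAR of the visible model can be recovered from the class-wise validation RARs reported in the Abstract (100% negative, 95.6% positive) and the validation set sizes given in Section 2.3 (90 negative and 114 positive spectra). The sketch below assumes only that the total RAR is the count-weighted mean of the class-wise rates, which follows from Equation (6); the function name is illustrative.

```python
N_NEGATIVE = 90       # validation spectra of the identification brand
N_POSITIVE = 114      # validation spectra of the interference brands
RAR_NEGATIVE = 1.000  # reported as 100%
RAR_POSITIVE = 0.956  # reported as 95.6%


def total_validation_rar(rar_neg, n_neg, rar_pos, n_pos):
    """Total RAR as the count-weighted mean of the class-wise RARs."""
    return (rar_neg * n_neg + rar_pos * n_pos) / (n_neg + n_pos)


rar_total = total_validation_rar(RAR_NEGATIVE, N_NEGATIVE,
                                 RAR_POSITIVE, N_POSITIVE)
print(f"Total validation RAR: {100 * rar_total:.1f}%")
```

The value rounds to 97.5%, in agreement with the reported total validation RAR.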

Figure 2. Validation samples classified as positive and negative with PLS-DA models for (a) Visible, (b) Short-NIR, (c) Long-NIR, (d) Whole NIR and (e) Whole scanning region.

4. Conclusions

The identification of a high-quality wine brand is significant because it can effectively prevent wine adulteration and fraud and protect the interests of producers and consumers. The traditional identification methods are complex and inefficient. Discriminant analysis based on visible-NIR spectroscopy is a simple and effective qualitative analysis method.

In the present study, PLS-DA models were established in the visible, short-NIR, long-NIR, whole NIR and whole scanning regions. The experimental results indicate that all five PLS-DA models achieved good discriminant effects for the brand identification of wine. Among them, the visible model achieved the best effect ($\mathrm{RAR}_{\mathrm{Validation}}$ reached 97.5%), which is significantly better than the three NIR models. The article provides a basis for further wavelength selection work and a valuable reference for selecting a suitable spectral instrument. The proposed method is also expected to be applicable to the brand identification of other products.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61078040) and the Science and Technology Project of Guangdong Province of China (No. 2014A020212445).

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

Cite this paper

Liao, S.X., Chen, J.M. and Pan, T. (2020) Visible and Near-Infrared Spectroscopic Discriminant Analysis Applied to Brand Identification of Wine. American Journal of Analytical Chemistry, 11, 104-113. https://doi.org/10.4236/ajac.2020.112008

References

1. Cozzolino, D. and Morón, A. (2006) A Potential of Near-Infrared Reflectance Spectroscopy and Chemometrics to Predict Soil Organic Carbon Fractions. Soil & Tillage Research, 85, 78-85. https://doi.org/10.1016/j.still.2004.12.006

2. Rossel, R.A.V., Walvoort, D.J.J., McBratney, A.B., et al. (2006) Near Infrared, Mid Infrared or Combined Diffuse Reflectance Spectroscopy for Simultaneous Assessment of Various Soil Properties. Geoderma, 131, 59-75. https://doi.org/10.1016/j.geoderma.2005.03.007

3. Chen, H.Z., Pan, T., Chen, J.M. and Lu, Q.P. (2011) Waveband Selection for NIR Spectroscopy Analysis of Soil Organic Matter Based on SG Smoothing and MWPLS Methods. Chemometrics and Intelligent Laboratory Systems, 107, 139-146. https://doi.org/10.1016/j.chemolab.2011.02.008

4. Pan, T., Li, M.M. and Chen, J.M. (2014) Selection Method of Quasicontinuous Wavelength Combination with Applications to the Near-Infrared Spectroscopic Analysis of Soil Organic Matter. Applied Spectroscopy, 68, 263-271. https://doi.org/10.1366/13-07088

5. Chen, J.Y., Zhang, H. and Matsunaga, R. (2006) Rapid Determination of the Main Organic Acid Composition of Raw Japanese Apricot Fruit Juices Using Near-Infrared Spectroscopy. Journal of Agricultural and Food Chemistry, 54, 9652-9657. https://doi.org/10.1021/jf061461s

6. Liu, Z.Y., Liu, B., Pan, T. and Yang, J.D. (2013) Determination of Amino Acid Nitrogen in Tuber Mustard Using Near-Infrared Spectroscopy with Waveband Selection Stability. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 102, 269-274. https://doi.org/10.1016/j.saa.2012.10.006

7. Guo, H.S., Chen, J.M., Pan, T., Wang, J.H. and Cao, G. (2014) Vis-NIR Wavelength Selection for Non-Destructive Discriminant Analysis of Breed Screening of Transgenic Sugarcane. Analytical Methods, 6, 8810-8816. https://doi.org/10.1039/C4AY01833H

8. Lyu, N., Chen, J.M., Pan, T., Yao, L.J., Han, Y. and Yu, J. (2016) Near-Infrared Spectroscopy Combined with Equidistant Combination Partial Least Squares Applied to Multi-Index Analysis of Corn. Infrared Physics & Technology, 76, 648-654. https://doi.org/10.1016/j.infrared.2016.01.022

9. Sousa, C., Lucio, M.M.L., Neto, M.O.F.B., Marcone, G.P.S., Pereira, A.F.C., Dantas, E.O., Fragoso, W.D. and Araujo, M.C.U. (2007) A Method for Determination of COD in a Domestic Wastewater Treatment Plant by Using Near-Infrared Reflectance Spectrometry of Seston. Analytica Chimica Acta, 588, 231-236. https://doi.org/10.1016/j.aca.2007.02.022

10. Pan, T., Chen, Z.H., Chen, J.M. and Liu, Z.Y. (2012) Near-Infrared Spectroscopy with Waveband Selection Stability for the Determination of COD in Sugar Refinery Wastewater. Analytical Methods, 4, 1046-1052. https://doi.org/10.1039/c2ay05856a

11. Pan, T., Xie, J., Chen, J.M., Chen, H.Z., et al. (2010) Joint Optimization of Savitzky-Golay Smoothing Modes and PLS Factors Was Applied to Near Infrared Spectral Analysis of Serum Cholesterol. 2010 4th International Conference on Bioinformatics and Biomedical Engineering, Chengdu, China, 18-20 June 2010, 1-4. https://doi.org/10.1109/ICBBE.2010.5514789

12. Pan, T., Liu, J.M., Chen, J.M., Zhang, G.P. and Zhao, Y. (2013) Rapid Determination of Preliminary Thalassaemia Screening Indicators Based on Near-Infrared Spectroscopy with Wavelength Selection Stability. Analytical Methods, 5, 4355-4362. https://doi.org/10.1039/c3ay40732b

13. Han, Y., Chen, J.M., Pan, T. and Liu, G.S. (2015) Determination of Glycated Hemoglobin Using Near-Infrared Spectroscopy Combined with Equidistant Combination Partial Least Squares. Chemometrics and Intelligent Laboratory Systems, 145, 84-92. https://doi.org/10.1016/j.chemolab.2015.04.015

14. Yao, L.J., Lv, N., Chen, J.M., Pan, T. and Yu, J. (2016) Joint Analyses Model for Total Cholesterol and Triglyceride in Human Serum with Near-Infrared Spectroscopy. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 159, 53-59. https://doi.org/10.1016/j.saa.2016.01.022

15. Chen, J.M., Yin, Z.W., Tang, Y. and Pan, T. (2017) Vis-NIR Spectroscopy with Moving-Window PLS Method Applied to Rapid Analysis of Whole Blood Viscosity. Analytical and Bioanalytical Chemistry, 409, 2737-2745. https://doi.org/10.1007/s00216-017-0218-9

16. Yang, Y.H., Lei, F.F., Zhang, J., Yao, L.J., Chen, J.M. and Pan, T. (2019) Equidistant Combination Wavelength Screening and Step-by-Step Phase-out Method for the Near-Infrared Spectroscopic Analysis of Serum Urea Nitrogen. Journal of Innovative Optical Health Sciences, 12, Article ID: 1950018. https://doi.org/10.1142/S1793545819500184

17. Cozzolino, D., Smyth, H.E. and Gishen, M. (2003) Feasibility Study on the Use of Visible and Near-Infrared Spectroscopy Together with Chemometrics to Discriminate between Commercial White Wines of Different Varietal Origins. Journal of Agricultural and Food Chemistry, 51, 7703-7708. https://doi.org/10.1021/jf034959s

18. dos Santos, T., Cláudia, A., et al. (2017) Merging Vibrational Spectroscopic Data for Wine Classification According to the Geographic Origin. Food Research International, 102, 504-510. https://doi.org/10.1016/j.foodres.2017.09.018

19. Yu, J., Zhan, J.C. and Huang, W.D. (2017) Identification of Wine According to Grape Variety Using Near-Infrared Spectroscopy Based on Radial Basis Function Neural Networks and Least-Squares Support Vector Machines. Food Analytical Methods, 10, 3306-3311. https://doi.org/10.1007/s12161-017-0887-1

20. Hu, X.Z., Liu, S.Q., Li, X.H., et al. (2019) Geographical Origin Traceability of Cabernet Sauvignon Wines Based on Infrared Fingerprint Technology Combined with Chemometrics. Scientific Reports, 9, Article No. 8256. https://doi.org/10.1038/s41598-019-44521-8

21. Zhong, J., Chen, J.M., Yao, L.J. and Pan, T. (2018) Discriminant Analysis of Liquor Brands Based on Moving-Window Waveband Screening Using Near-Infrared Spectroscopy. American Journal of Analytical Chemistry, 9, 124-133. https://doi.org/10.4236/ajac.2018.93011

22. Capuano, E., Boerrigter-Eenling, R., Koot, A., et al. (2015) Targeted and Untargeted Detection of Skim Milk Powder Adulteration by Near-Infrared Spectroscopy. Food Analytical Methods, 8, 2125-2134. https://doi.org/10.1007/s12161-015-0100-3

23. Xu, L.L., Li, W.Q., Zhu, H., et al. (2016) Detection of Adulteration of Milk Powder by Near Infrared Spectroscopy. Journal of Food Safety and Quality, 7, 3133-3137.

24. Chen, J.M., Li, M.L., Pan, T., Pang, L.W., Yao, L.J. and Zhang, J. (2019) Rapid and Non-Destructive Analysis for the Identification of Multi-Grain Rice Seeds with Near-Infrared Spectroscopy. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 219, 179-185. https://doi.org/10.1016/j.saa.2019.03.105

25. Yao, L.J., Xu, W.Q., Pan, T. and Chen, J.M. (2018) Moving-Window Bis-Correlation Coefficients Method for Visible and Near-Infrared Spectral Discriminant Analysis with Applications. Journal of Innovative Optical Health Sciences, 11, Article ID: 1850005. https://doi.org/10.1142/S1793545818500050

26. Chiang, L.H., Russell, E.L. and Braatz, R.D. (2000) Fault Diagnosis in Chemical Processes Using Fisher Discriminant Analysis, Discriminant Partial Least Squares, and Principal Component Analysis. Chemometrics and Intelligent Laboratory Systems, 50, 243-252. https://doi.org/10.1016/S0169-7439(99)00061-1

27. Barker, M. and Rayens, W. (2003) Partial Least Squares for Discrimination. Journal of Chemometrics, 17, 166-173. https://doi.org/10.1002/cem.785