Application of Extended Multiplicative Signal Correction to Short-Wavelength near Infrared Spectra of Moisture in Marzipan

Short-wavelength near infrared (SW-NIR) spectroscopy is a rapid, versatile and precise technique that can be used in many different situations and for many types of products and chemical compounds. Extended multiplicative signal correction (EMSC) is a modification of the standard MSC pre-processing method that allows the separation of physical light scattering effects from chemical (vibrational) light absorbance effects in spectra. In this paper, EMSC is applied and compared with the first derivative, second derivative, MSC and SNV, each in combination with PLSR, to obtain models that are robust in terms of accuracy and predictive ability with a reduced calibration data set, using SW-NIR spectra of moisture in marzipan. EMSC and its combinations with derivatives provide the best results in terms of calibration and prediction ability for SW-NIR spectra of moisture in marzipan. The best results were obtained by EMSC followed by the second derivative.


Introduction
Multivariable electromagnetic spectrophotometry in the near or mid-infrared region offers great practical and economical advantages for analysis of large sample series, as demonstrated by diffuse NIR reflectance or transmittance spectroscopy in areas such as agriculture, food technology, pharmaceutics, and petrochemistry [1]. Today, such high-speed instruments are routinely designed to yield precise quantitative determination for a variety of chemical and physical properties, using multivariate calibration to solve the selectivity problems caused by the lack of sample preparation and for automatic detection of outliers [2]. Preprocessing of the spectral measurements is used for optimizing the subsequent multivariate calibration.
When analyzing more or less intact complex samples by diffuse reflectance or transmittance spectroscopy, uncontrolled variations in light scattering are often a dominating artifact that complicates subsequent quantitative chemical analysis. This undesired scattering variation is due to uncontrolled physical variations in the measured samples: particle size and shape, sample packing, sample surface, etc. If the light scattering could be modeled and corrected for mathematically in a more elaborate preprocessing stage, these problems could be reduced or eliminated. The cost of NIR analysis could then be reduced, because the need for controlled sample preparation could be further reduced, the number of calibration samples could be reduced, and the statistical calibration modeling process could be simplified. Moreover, a pragmatic but reasonably accurate model-based light scatter correction may shed new light on the light scattering processes themselves [3].
One of the main sources of error in quantitative determinations by short-wavelength near infrared (SW-NIR) spectroscopy (700-1100 nm) is the scattering of light, caused by the inhomogeneity of the sample, mainly through differences in particle size, geometry, packing and orientation of the particles [4,5]. Light scattering alters the functional relationship between the measured reflectance intensity and the concentration of the absorbing species present in the sample. Modeling light scattering and reflectance simultaneously is an extremely difficult task, because the geometry and orientation of the particles vary randomly from sample to sample. Thus, to construct precise and robust models, it is necessary to minimize the effect of scattering [4,5].
Short-wavelength near infrared (SW-NIR) spectroscopy (700-1100 nm) is a relatively new analytical technique with a high potential for food analysis [6-10]. SW-NIR presents several advantages over conventional near-infrared methods (900-2500 nm): 1) absorption in this wavelength region arises from the third and fourth overtone vibrational (CH, OH, NH) transitions, which are characterized by low extinction coefficients; 2) signal-to-noise ratios on the order of 10,000:1 can be obtained, thus making very subtle changes in the spectra useful for analysis; 3) good quantitative results can be obtained for highly scattering samples [8]. These absorbances are overtones and combination bands from fundamental molecular vibration bands in the IR region. The IR absorbances themselves are often too strong to allow simple, representative analysis of complex samples, but the NIR bands are sufficiently weakened to allow the light to penetrate anywhere from a few millimetres to a couple of centimetres through the samples. The NIR measurements can be taken as reflectance or transmittance, depending on what is most practical. Finally, since NIR instruments require both multivariate calibration chemometrics and spectroscopic insight, this multidisciplinary technique may fall between the traditionally specialized academic chairs. One disadvantage of SW-NIR spectroscopy is that spectral features often overlap considerably and require sophisticated data analysis techniques, such as partial least-squares (PLS) calibration methods, to obtain meaningful correlations [8,9].
This paper presents a new application of spectral preprocessing methods for improving the multivariate calibration of multichannel analytical instruments based on spectroscopic background knowledge: extended multiplicative signal correction (EMSC) is designed to improve the separation of light scattering and light absorbance. Conventional projection on latent structures regression (PLSR) is then used for the subsequent empirical "soft modelling" calibration. Near infrared (NIR) data are used to illustrate the application. Before the spectral data set is modeled by chemometric techniques, it is preprocessed in order to minimize the effects caused by the difficulty of obtaining an ideal spectrum. This study compares EMSC with the first derivative, second derivative, SNV (standard normal variate) and MSC (multiplicative signal correction) methods in terms of the robustness and prediction ability of the final PLS models for SW-NIR spectra of marzipan.

PLS Regression
PLSR is today probably the most widely applied multivariate calibration method in chemometrics [11,12]. It is commonly used in quantitative spectroscopy to correlate spectroscopic data (X) with related physico-chemical data (Y). It is based on so-called latent variables, like PCA and PCR [12], but for PLSR the decomposition of X during regression is guided by the variation in Y: the explained covariance between X and Y is maximized, so that the variation in X directly correlating with Y is extracted.
An important feature of PLSR is that, as mentioned, it is based on latent variables and can therefore handle the usually highly collinear spectroscopic data, in contrast to MLR [13]. The linear model between the vector Yc containing the centered reference data and the matrix Xc containing the centered spectral data can be described by:

$$Y_c = X_c\, b + e$$

where b is a vector which contains the regression coefficients to be determined during the calibration, and e is the residual. In order to obtain a good estimate of b, the PLSR model needs to be calibrated on samples that span the variation in Y well and that are, in general, representative of future samples. Depending on the complexity of the future samples, this may require a very large number of samples [14].
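As an illustration of how the coefficients of such a latent-variable model can be estimated, the following is a minimal sketch of single-response PLS regression using the NIPALS algorithm. It is not the implementation used in this study; the function names `pls1_fit` and `pls1_predict` and the toy data are hypothetical, and the code assumes a single response variable (for example moisture) and plain Python lists.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (NIPALS); returns means and components."""
    n, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    # Mean-center the spectra (Xc) and the reference values (yc).
    Xc = [[row[j] - x_mean[j] for j in range(k)] for row in X]
    yc = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [dot([row[j] for row in Xc], yc) for j in range(k)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # nothing left in y to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in Xc]            # scores
        tt = dot(t, t)
        p = [dot([row[j] for row in Xc], t) / tt for j in range(k)]  # X loadings
        q = dot(yc, t) / tt                        # y loading
        # Deflate X and y before extracting the next latent variable.
        Xc = [[row[j] - t[i] * p[j] for j in range(k)] for i, row in enumerate(Xc)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one spectrum by sequential deflation."""
    x_mean, y_mean, components = model
    xc = [a - m for a, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = dot(xc, w)
        y_hat += q * t
        xc = [a - t * b for a, b in zip(xc, p)]
    return y_hat


# Toy data (hypothetical): three "spectra" with two channels each.
toy_X = [[0.10, 0.20], [0.15, 0.32], [0.22, 0.41]]
toy_y = [10.0, 12.5, 15.0]
toy_model = pls1_fit(toy_X, toy_y, n_components=1)
toy_prediction = pls1_predict(toy_model, [0.18, 0.35])
```

Predicting by deflation avoids forming the coefficient vector b explicitly; the number of components would normally be chosen by validation rather than fixed in advance.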

Extended Multiplicative Signal Correction
A number of chemometric preprocessing methods have been proposed to explicitly model the effect of multiplicative light scattering [15]. One of the most frequently reported techniques in the literature is multiplicative signal correction (MSC) [16]. The methodology involves regressing each spectrum in a set of related samples, i.e., samples that comprise the same chemical components, on a reference spectrum (for example, the mean spectrum) to estimate the intercept and slope of the regression equation, which will theoretically capture the information relating to the effect of multiplicative light scattering. Each individual spectrum is then corrected by subtracting the intercept and dividing by the slope. An alternative procedure proposed to correct for multiplicative light scattering, similar to MSC, is inverted scatter correction (ISC).
More recently, extended multiplicative signal correction (EMSC) was introduced as a modification of the standard MSC pre-processing method [4,5,15-18] that allows the separation of physical light scattering effects from chemical (vibrational) light absorbance effects in spectra. It was developed by Martens and Stark [19,20] as a methodology to identify and separate various effects in multichannel measurements, making the measurements suitable for multivariate calibration and improving robustness and predictive ability [4,5,19]. This approach is able to estimate and separate multiplicative physical effects (path length, light scattering, sample thickness, etc.) from additive chemical effects (absorbance of analytes and interferents) and additive physical effects (temperature shifts, baseline variations, etc.) [4,5,14,21]. It can also be used to remove identified but undesired "physical" and "chemical" interference effects, while retaining identified but desired effects as well as unidentified effects in the data. For these purposes, EMSC allows the use of previous knowledge about the system and its components (constituents' spectra) in the correction, which can sometimes be very useful and yield good calibration results [4,5]. EMSC appears to be applicable to different types of spectroscopic data (UV, VIS, NIR, IR, Raman), chromatography, electrophoresis and sensory data [14].

Copyright © 2013 SciRes. JDAIP
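The MSC regression step described above, and one common basic form of EMSC, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the EMSC variant shown models each spectrum as an offset, a multiple of the reference spectrum, and linear and quadratic wavelength terms, and it does not include known constituent spectra. The function names (`emsc`, `msc`, `mean_spectrum`, `solve_least_squares`) are hypothetical.

```python
def solve_least_squares(columns, target):
    """Least-squares coefficients for target ~ sum_j c_j * columns[j]."""
    m = len(columns)
    # Augmented normal-equation system [A^T A | A^T target].
    aug = [
        [sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(m)]
        + [sum(a * b for a, b in zip(columns[i], target))]
        for i in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * pv for v, pv in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def emsc(spectra, reference, wavelengths, poly_order=2):
    """Basic EMSC: subtract offset and polynomial baseline, divide by the slope."""
    lo, hi = min(wavelengths), max(wavelengths)
    # Rescale wavelengths to [-1, 1] so the polynomial terms stay well conditioned.
    scaled = [2.0 * (w - lo) / (hi - lo) - 1.0 for w in wavelengths]
    # Design columns: constant offset, reference spectrum, wavelength polynomials.
    columns = [[1.0] * len(reference), list(reference)]
    columns += [[s ** k for s in scaled] for k in range(1, poly_order + 1)]
    corrected = []
    for x in spectra:
        coef = solve_least_squares(columns, x)
        slope = coef[1]  # multiplicative coefficient on the reference spectrum
        additive = [
            sum(coef[j] * columns[j][i] for j in range(len(columns)) if j != 1)
            for i in range(len(x))
        ]
        corrected.append([(xi - ai) / slope for xi, ai in zip(x, additive)])
    return corrected


def msc(spectra, reference):
    """Standard MSC: the special case with intercept and slope only."""
    return emsc(spectra, reference, list(range(len(reference))), poly_order=0)


def mean_spectrum(spectra):
    """Mean spectrum of a set, a typical choice of reference."""
    n = len(spectra)
    return [sum(channel) / n for channel in zip(*spectra)]
```

With `poly_order=0` the design matrix reduces to an intercept and the reference spectrum, so the correction is exactly "subtract the intercept and divide by the slope". The mean spectrum of the calibration set is a typical reference, e.g. `msc(spectra, mean_spectrum(spectra))`.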

Data Sets
The data set comprises spectral measurements and compositional analysis of 32 marzipan samples. Traditional moisture determination was performed on all samples. The spectral data matrix was obtained with an Infratec 1255 instrument, which is based on the dispersive scanning optical principle; the available spectral range is 850-1050 nm and the spectral sampling interval is 2 nm. The data set was produced by J. Christensen et al. [22]. The data file was obtained from the public data sets for multivariate data analysis on the home page of Quality and Technology, Department of Food Science, Faculty of Science, University of Copenhagen, as a 32 × 100 matrix; more information is available at http://www.models.kvl.dk/research/data/Marzipan/index.asp.

Model of Calibration and Prediction
The spectra of the 32 marzipan samples were used to construct the calibration and prediction models, using full cross validation (CV). One calibration and prediction model was constructed for each preprocessing method: raw, first derivative (1st), second derivative (2nd), SNV, MSC, EMSC, EMSC + 1st, 1st + EMSC, EMSC + 2nd and 2nd + EMSC. All preprocessing techniques were evaluated in terms of the robustness and prediction ability of the final PLSR model. For the construction of the models, the data set was mean-centered. The robustness of the preprocessing techniques was evaluated by RMSEC, RMSEP and the correlation coefficients (calibration and prediction). The models were constructed and the preprocessing techniques compared using The Unscrambler v9.2. Table 1 presents the performance statistics of the PLSR models for quantification of marzipan moisture using SW-NIR (850-1050 nm) spectra, for calibration and prediction (32 samples), using full cross validation. CV is cross validation, RMSEC is the root mean square error of calibration and RMSEP is the root mean square error of prediction.
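The validation statistics described above can be sketched as follows, assuming that "full cross validation" means leave-one-out cross validation, in which each sample is predicted by a model fitted to the remaining samples. The helper names are hypothetical, and a one-variable straight-line fit stands in for the PLSR model purely to keep the example short; the toy numbers are not data from this study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def pearson_r(u, v):
    """Correlation coefficient between reference and predicted values."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def full_cross_validation(X, y, fit, predict):
    """Leave-one-out: predict each sample with a model fitted without it."""
    predictions = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        predictions.append(predict(model, X[i]))
    return predictions


def fit_line(X, y):
    """Stand-in model (assumption): least-squares line on the first variable."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, y)) / sum((a - mx) ** 2 for a in xs)
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x[0] + intercept


# Toy data (hypothetical): one spectral variable and a reference value per sample.
toy_X = [[0.11], [0.19], [0.32], [0.41], [0.48]]
toy_y = [8.1, 9.9, 13.2, 15.0, 16.9]

fitted = fit_line(toy_X, toy_y)
rmsec = rmse(toy_y, [predict_line(fitted, x) for x in toy_X])                      # calibration error
rmsep = rmse(toy_y, full_cross_validation(toy_X, toy_y, fit_line, predict_line))   # cross-validated error
```

The calibration error is computed from a model that has seen every sample, so it is normally smaller than the cross-validated error; comparing the two gives a rough indication of overfitting.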

Results and Discussion
The calibration and prediction results for all the tested pre-transformation methods are summarized in Table 1 in terms of RMSEC (CV) and RMSEP (CV), where CV denotes cross validation, and of the correlation coefficients based on them.
Compared to the untransformed raw data, the basic second derivative (2nd), SNV and MSC did not affect the results very much.
The first derivative (1st) and EMSC pre-transformations ...
The best calibration and prediction results were obtained with extended multiplicative signal correction followed by the second derivative (EMSC + 2nd). This combination usually reduces the scatter-related offset and light scattering and reveals more spectral features than the raw spectra.
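To illustrate why a derivative step removes additive baseline effects, the sketch below computes first and second derivatives by simple central finite differences on an equally spaced wavelength grid. This is an assumption made for illustration: the derivative algorithm actually used in this study is not stated, and smoothing derivatives such as Savitzky-Golay are common in practice. The function names are hypothetical.

```python
def first_derivative(spectrum, step):
    """Central first difference; the result is two points shorter than the input."""
    return [
        (spectrum[i + 1] - spectrum[i - 1]) / (2.0 * step)
        for i in range(1, len(spectrum) - 1)
    ]


def second_derivative(spectrum, step):
    """Central second difference; removes constant offsets and linear baselines."""
    return [
        (spectrum[i - 1] - 2.0 * spectrum[i] + spectrum[i + 1]) / step ** 2
        for i in range(1, len(spectrum) - 1)
    ]


# Toy example (hypothetical): one absorption band on a 2 nm grid,
# with and without an added offset and sloping baseline.
wavelengths = [850 + 2 * i for i in range(20)]
band = [1.0 / (1.0 + ((w - 870) / 5.0) ** 2) for w in wavelengths]
with_baseline = [b + 0.4 + 0.01 * w for b, w in zip(band, wavelengths)]

d2_band = second_derivative(band, step=2.0)
d2_with_baseline = second_derivative(with_baseline, step=2.0)  # same as d2_band up to rounding
```

In a combination such as EMSC + 2nd, the scatter correction would be applied first and the derivative taken on the corrected spectra; 2nd + EMSC reverses that order.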

Conclusion
The results for the different preprocessing methods show that the new, extended MSC, that is, the extended multiplicative signal correction (EMSC) methods, provide the best overall performance in terms of calibration and prediction ability for short-wavelength near infrared (SW-NIR) spectra of moisture in marzipan. The success is presumably due to the ability of spectral modeling to separate chemical light absorbance and physical light scatter effects. The best results were obtained by EMSC followed by the second derivative (EMSC + 2nd). The application of EMSC is therefore important for estimating and separating multiplicative physical effects (light scattering) and additive chemical effects (light absorbance) in near infrared spectroscopy.

Figure 1
Figure 1 presents the different preprocessing methods applied to the short-wavelength near infrared (SW-NIR) spectra (850-1050 nm) of moisture in marzipan. Figure 1(a) shows the SW-NIR spectra without any preprocessing, i.e., the raw spectra of the 32 marzipan samples in the calibration set. Large additive offsets, multiplicative scaling effects and light scattering are readily observed in this panel. Figure 1(b) displays the same SW-NIR spectra after SNV pre-transformation. In comparison, Figure 1(c) shows the MSC pre-transformed spectra. Figure 1(d) displays the same SW-NIR spectra after EMSC pre-transformation.