Near-Infrared Spectroscopy Coupled with Kernel Partial Least Squares-Discriminant Analysis for Rapid Screening of Water Containing Malathion

Near-infrared spectroscopy coupled with kernel partial least squares-discriminant analysis (KPLS-DA) was used to rapidly screen water containing malathion. In the wavenumber range 4348 to 9091 cm⁻¹, the overall correct classification rate of KPLS-DA was 100% for both the training set and the test set, and the lowest malathion concentration detected in water was 1 μg·ml⁻¹. KPLS-DA performed well in classifying data from nonlinear systems. It is inferred that near-infrared spectroscopy coupled with KPLS-DA has potential for the rapid screening of other pesticide residues in water.


Introduction
Malathion, S-(1,2-dicarbethoxyethyl)-O,O-dimethyl dithiophosphate (its structural formula is shown in Figure 1), is one of the most commonly used organophosphate insecticides. It is extensively applied to control the motile stages of mites and some other insects on fruits and vegetables. Like that of all organophosphates, malathion toxicity stems from the inhibition of acetylcholinesterase. This causes acetylcholine to accumulate within synapses and consequently overstimulates postsynaptic receptors [1].
Reported methods for determining malathion include high-performance liquid chromatography [2], atomic absorption [3], a carbon-nanotube-modified gold electrode [4], capillary electrophoresis [5], ion mobility spectrometry [6], dual fluorescence and electrochemical detection [7] and CO₂ laser [8]. However, these methods require expensive instrumentation or complicated pretreatment procedures, which limits their application to the real-time detection of malathion. It is therefore worthwhile to seek fast, reliable and economical methods for analyzing malathion with simple and relatively inexpensive instrumentation.
Near-infrared spectroscopy (NIRS) [9,10] is a spectroscopic method whose spectra carry information on the vibrations of -CH, -OH, -NH and -SH bonds. Some NIR instruments are portable. They can therefore perform analytical tasks outside the laboratory, offering low cost, accuracy and speed [11,12]. The purpose of this study is merely to establish a rapid detection method for examining malathion residues in water.

Sample Preparation
Malathion (98.2% purity) was purchased from the Institute for the Control of Agrochemicals, Ministry of Agriculture (ICAMA). Bottled water (Hangzhou Wahaha Group Co., Ltd., China), obtained from a local supermarket, was used to prepare the aqueous solutions. A stock solution of malathion (100 μg·ml⁻¹) was prepared in water. The malathion-free samples were pure bottled water. The malathion-containing samples were obtained by adding the standard stock solution to bottled water to give concentrations from 1 to 100 μg·ml⁻¹.
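The spiked samples follow the usual dilution relation C₁V₁ = C₂V₂. The paper does not state the volumes used. The minimal sketch below assumes a hypothetical final volume of 10 ml per sample and computes how much 100 μg·ml⁻¹ stock to add for a given target concentration. The function name `stock_volume_ml` is illustrative only.

```python
STOCK_UG_PER_ML = 100.0   # stock concentration stated in the text
FINAL_VOLUME_ML = 10.0    # hypothetical final sample volume (not given in the paper)


def stock_volume_ml(target_ug_per_ml, final_volume_ml=FINAL_VOLUME_ML):
    """Stock volume V1 such that C_stock * V1 = C_target * V_final."""
    if not 0 < target_ug_per_ml <= STOCK_UG_PER_ML:
        raise ValueError("target must lie in (0, stock concentration]")
    return target_ug_per_ml * final_volume_ml / STOCK_UG_PER_ML


for c in (1, 10, 50, 100):
    v = stock_volume_ml(c)
    # The remainder of the final volume is made up with bottled water
    print(f"{c:>3} ug/ml: {v:.2f} ml stock + {FINAL_VOLUME_ML - v:.2f} ml water")
```

At 100 μg·ml⁻¹ the sample is simply undiluted stock, which matches the upper end of the range in the text.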

Collection of the NIR Spectra
A YDZ1-1 NIR spectrometer (light path shown in Figure 2) from Nanjing Instrument Co., Ltd. (Nanjing, China) was used in this study. The liquid sample was placed above an integrating sphere and covered by a gold-coated reflector. Incident light was transmitted through the sample and then reflected back by the gold-coated reflector, which was compatible with the reflection characteristics of the instrument. The reflected light then passed through the sample again and entered the integrating sphere for detection, so the light passed through the sample twice. Each spectrum was the average of 2 scans collected at a resolution of 2 nm over the wavelength range 1100-2300 nm (wavenumber 9091-4348 cm⁻¹). Spectra were acquired at 25 ± 1 °C. The original NIR spectra of water containing 100 μg·ml⁻¹ malathion and of pure water are shown in Figure 3.
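The wavenumber limits quoted above follow from ν (cm⁻¹) = 10⁷ / λ (nm). The sketch below checks the conversion and shows the averaging of replicate scans. It assumes one data point every 2 nm; the paper states a 2 nm resolution, which is not necessarily the sampling interval. The helper `average_scans` is hypothetical.

```python
import numpy as np

# Assumption: one data point every 2 nm from 1100 to 2300 nm (inclusive).
wavelength_nm = np.arange(1100, 2301, 2)
# 1 cm = 1e7 nm, so wavenumber (cm^-1) = 1e7 / wavelength (nm).
wavenumber_cm = 1e7 / wavelength_nm


def average_scans(scans):
    """Average replicate scans (one scan per row) into a single spectrum."""
    return np.asarray(scans, dtype=float).mean(axis=0)


print(wavelength_nm.size)                                   # 601
print(f"{wavenumber_cm[0]:.0f} {wavenumber_cm[-1]:.0f}")    # 9091 4348
```

Because wavenumber is inversely proportional to wavelength, the short-wavelength end (1100 nm) maps to the high-wavenumber end (9091 cm⁻¹).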

Software
Chemometric analysis, including the qualitative determination of malathion, was performed in MATLAB 7.6.0 (The MathWorks Inc., Natick, MA, USA).

Methods
Partial least squares (PLS) regression [13][14][15] is a multivariate linear projection method used to find the fundamental relations between the predictor matrix X and the response matrix Y. PLS decomposes the zero-mean matrix X and the zero-mean matrix Y into the form:
$$\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathsf T} + \mathbf{E} \qquad (1)$$
$$\mathbf{Y} = \mathbf{U}\mathbf{Q}^{\mathsf T} + \mathbf{F} \qquad (2)$$
where T is the X score matrix, P is the X loading matrix, E is the X residual matrix, U is the Y score matrix, Q is the Y loading matrix and F is the Y residual matrix. T and U represent the information that remains after most of the noise has been removed. Based on the correlation between them, the linear regression model can be given by:
$$\mathbf{U} = \mathbf{T}\mathbf{B} + \mathbf{H} \qquad (3)$$
where B contains the inner-relation regression coefficients and H is the residual matrix.
In practice, the relationship between the predictor matrix and the response matrix obtained from experimental data is often not linear. Lambert-Beer's law [16] holds only under certain conditions: monochromatic radiation, a system that is not saturated with light, absorbers that behave independently and are distributed homogeneously, and low concentrations. Apparent deviations from Lambert-Beer's law may be caused by chemical and/or physical effects, by instrumental effects, or by both. Non-linearity in NIR spectra may therefore arise from factors such as highly absorbing samples, the multiplicative effect of particle-size differences among samples, nonlinear detector responses and interactions between analytes. In a system such as ours, optical scattering in the instrument, nonlinear detector responses and high concentrations may cause nonlinear behavior.
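The following minimal sketch shows Equation (1) in action. It decomposes a centered predictor matrix with the NIPALS algorithm for a single response (PLS1). The paper does not say which PLS algorithm underlies its work, so this is an illustration rather than the authors' code. The function name `pls1_nipals` and the random data are hypothetical.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Linear PLS for one response via NIPALS; returns T, P, q, E and centered X.

    By construction the centered X equals T @ P.T + E (Equation (1)).
    """
    Xc = X - X.mean(axis=0)            # zero-mean predictors
    E = Xc.copy()
    f = y - y.mean()                   # zero-mean response
    n, m = X.shape
    T = np.zeros((n, n_components))
    P = np.zeros((m, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)         # unit weight vector maximizing covariance with y
        t = E @ w                      # X score vector
        tt = t @ t
        P[:, a] = E.T @ t / tt         # X loading vector
        q[a] = f @ t / tt              # y loading (inner regression coefficient)
        E = E - np.outer(t, P[:, a])   # deflate X
        f = f - q[a] * t               # deflate y
        T[:, a] = t
    return T, P, q, E, Xc


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))                          # hypothetical "spectra"
y = X[:, 0] - 0.5 * X[:, 3] + 0.1 * rng.normal(size=30)
T, P, q, E, Xc = pls1_nipals(X, y, n_components=3)
print(np.allclose(Xc, T @ P.T + E))                   # True: Equation (1)
```

The score vectors produced this way are mutually orthogonal. This is the property that lets each latent variable be regressed on independently.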
Kernel partial least squares (KPLS) is a kernel method developed by Rosipal et al. [17,18]. Briefly, kernel methods work in two successive steps. First, the original data in the input space are embedded into a much higher-dimensional feature space via a nonlinear mapping Φ(x). Second, a linear algorithm is used to discover linear relationships in that feature space (see Figure 4).
KPLS is a nonlinear extension of linear PLS in which the input data are transformed into a high-dimensional feature space via the nonlinear mapping Φ(x). For example, a mapping Φ(x) can transform 2-D data points into a new 3-D space (Figure 5). Data points that are not linearly separable in the original 2-D input space become linearly separable in the 3-D feature space. The PLS algorithm can then be carried out in the feature space [19][20][21], which removes the restriction of PLS to linear systems.
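The exact mapping used for Figure 5 is not recoverable from the text. The sketch below therefore uses a hypothetical, commonly cited degree-2 map, Φ(x₁, x₂) = (x₁², √2·x₁x₂, x₂²), to illustrate the same idea. Two concentric rings cannot be split by a straight line in 2-D, but after the mapping a plane separates them. The sketch also shows that the feature-space dot product can be computed directly in the input space, which is the "kernel trick" behind Equation (4).

```python
import numpy as np


def phi(x):
    """Hypothetical degree-2 feature map R^2 -> R^3 (not the mapping of Figure 5)."""
    x1, x2 = x[..., 0], x[..., 1]
    return np.stack([x1**2, np.sqrt(2) * x1 * x2, x2**2], axis=-1)


rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 50)
inner = np.c_[0.5 * np.cos(angles), 0.5 * np.sin(angles)]   # radius 0.5: class A
outer = np.c_[2.0 * np.cos(angles), 2.0 * np.sin(angles)]   # radius 2.0: class B

# In 3-D the plane z1 + z3 = 1 separates the rings, since z1 + z3 = x1^2 + x2^2 = r^2.
w, b = np.array([1.0, 0.0, 1.0]), -1.0
print((phi(inner) @ w + b < 0).all(), (phi(outer) @ w + b > 0).all())  # True True

# Kernel trick: phi(x) . phi(y) equals (x . y)^2, so phi never has to be formed.
x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.isclose(phi(x) @ phi(y), (x @ y) ** 2))  # True
```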
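The nonlinear transformation in KPLS can be carried out using dot products alone, as described in Equation (4):
$$\Phi(\mathbf{x}_i)^{\mathsf T}\Phi(\mathbf{x}_j) = k(\mathbf{x}_i, \mathbf{x}_j) \qquad (4)$$
where k(·,·) denotes the kernel function, which satisfies Mercer's theorem [17,18]. Several kernel functions are in common use. In this study we used the Radial Basis Function (RBF) [22][23][24]:
$$k(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2}{\sigma^2}\right) \qquad (5)$$
where σ² is the kernel parameter. Once the kernel function and kernel parameter are determined, the kernel matrix K of the training set is computed and centered using Equations (6)-(8):
$$\mathbf{X}_t = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n]^{\mathsf T} \in \mathbb{R}^{n \times m} \qquad (6)$$
$$K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j), \quad 1 \le i, j \le n \qquad (7)$$
$$\mathbf{K} \leftarrow \mathbf{K} - \mathbf{1}_n\mathbf{K} - \mathbf{K}\mathbf{1}_n + \mathbf{1}_n\mathbf{K}\mathbf{1}_n \qquad (8)$$
Here X_t denotes the training set, with rows x_i (1 ≤ i ≤ n), where n is the number of training samples and m is the number of wavenumber variables. K is an n × n matrix in which each element is the kernel function evaluated between two training samples. 1_n is the n × n matrix whose elements all equal 1/n.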
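A minimal NumPy sketch of the training kernel matrix and its centering follows. It assumes the RBF is parameterized as exp(−‖xᵢ − xⱼ‖²/σ²), as written in Equation (5); some texts use 2σ² in the denominator instead. The centering is K − 1ₙK − K1ₙ + 1ₙK1ₙ, with 1ₙ the n × n matrix of entries 1/n. The helper names `rbf_kernel` and `center_train_kernel` are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma2):
    """Kernel matrix with K[i, j] = exp(-||a_i - b_j||^2 / sigma2)."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma2)  # clip tiny negative round-off


def center_train_kernel(K):
    """Center K in feature space: K - 1n K - K 1n + 1n K 1n, 1n = n x n matrix of 1/n."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n


rng = np.random.default_rng(0)
X_train = rng.normal(size=(6, 4))      # 6 hypothetical samples x 4 variables
K = rbf_kernel(X_train, X_train, sigma2=2.0)
Kc = center_train_kernel(K)
print(np.allclose(np.diag(K), 1.0), np.allclose(Kc.sum(axis=0), 0.0))  # True True
```

The centering is equivalent to (I − 1ₙ)K(I − 1ₙ). It makes every row and column of the centered kernel sum to zero, which corresponds to mean-centering the mapped data Φ(x) in feature space.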
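The KPLS algorithm can be summarized as follows:
Step 1: Compute and center the training kernel matrix K (Equations (6)-(8)), and set $\mathbf{E} = \mathbf{K}$ and $\mathbf{F} = \mathbf{Y}$;
Step 2: Randomly initialize the score vector $\mathbf{u}$;
Step 3: $\mathbf{t} = \mathbf{E}\mathbf{u}$, $\mathbf{t} \leftarrow \mathbf{t}/\lVert\mathbf{t}\rVert$;
Step 4: $\mathbf{c} = \mathbf{F}^{\mathsf T}\mathbf{t}$;
Step 5: $\mathbf{u} = \mathbf{F}\mathbf{c}$, $\mathbf{u} \leftarrow \mathbf{u}/\lVert\mathbf{u}\rVert$;
Step 6: Repeat Steps 3-5 until convergence;
Step 7: Compute the residual (deflated) matrices E and F:
$$\mathbf{E} \leftarrow (\mathbf{I} - \mathbf{t}\mathbf{t}^{\mathsf T})\,\mathbf{E}\,(\mathbf{I} - \mathbf{t}\mathbf{t}^{\mathsf T}), \qquad \mathbf{F} \leftarrow \mathbf{F} - \mathbf{t}\mathbf{t}^{\mathsf T}\mathbf{F}$$
where I is the n-dimensional identity matrix;
Step 8: Return to Step 2 with the deflated E and F until the required number of latent vectors (factors) has been extracted.
The predicted values of the training set are evaluated using Equation (9):
$$\hat{\mathbf{Y}} = \mathbf{K}\mathbf{U}\left(\mathbf{T}^{\mathsf T}\mathbf{K}\mathbf{U}\right)^{-1}\mathbf{T}^{\mathsf T}\mathbf{Y} = \mathbf{T}\mathbf{T}^{\mathsf T}\mathbf{Y} \qquad (9)$$
where T is formed by the columns of the latent vectors t, and U is formed by the columns of the latent vectors u. For the test set, the kernel matrix $\mathbf{K}_t$ between the $n_t$ test samples and the n training samples, $(\mathbf{K}_t)_{ij} = k(\mathbf{x}^{\mathrm{test}}_i, \mathbf{x}_j)$, is centered with the training statistics:
$$\mathbf{K}_t \leftarrow \left(\mathbf{K}_t - \mathbf{1}_{n_t}\mathbf{K}\right)\left(\mathbf{I} - \mathbf{1}_n\right)$$
where $\mathbf{1}_{n_t}$ is the $n_t \times n$ matrix and $\mathbf{1}_n$ the $n \times n$ matrix whose elements all equal 1/n. The predicted values of the test set are evaluated using Equation (13):
$$\hat{\mathbf{Y}}_t = \mathbf{K}_t\mathbf{U}\left(\mathbf{T}^{\mathsf T}\mathbf{K}\mathbf{U}\right)^{-1}\mathbf{T}^{\mathsf T}\mathbf{Y} \qquad (13)$$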
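Below is a compact NumPy sketch of Steps 1-8 and of the predictions in Equations (9) and (13). It is not the authors' MATLAB code. Several choices are assumptions not stated in the paper:

- the RBF is parameterized as exp(−‖a − b‖²/σ²);
- the response is mean-centered before fitting, and the mean is added back afterwards;
- the inner loop stops when u changes by less than a tolerance.

The function name `kpls_fit_predict` and the synthetic data are hypothetical.

```python
import numpy as np


def rbf(A, B, sigma2):
    """RBF kernel, assuming k(a, b) = exp(-||a - b||^2 / sigma2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma2)


def kpls_fit_predict(Xtr, ytr, Xte, n_factors, sigma2, tol=1e-10, max_iter=500, seed=0):
    """NIPALS-style kernel PLS (Steps 1-8); returns (train predictions, test predictions)."""
    n, nt = Xtr.shape[0], Xte.shape[0]
    Y = np.asarray(ytr, float).reshape(n, -1)
    y_mean = Y.mean(axis=0)
    Yc = Y - y_mean                                   # centered responses (assumption)
    # Step 1: kernel matrices, centered with training statistics
    K, Kt = rbf(Xtr, Xtr, sigma2), rbf(Xte, Xtr, sigma2)
    C = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = C @ K @ C
    Ktc = (Kt - np.full((nt, n), 1.0 / n) @ K) @ C
    E, F = Kc.copy(), Yc.copy()
    rng = np.random.default_rng(seed)
    T, U = np.zeros((n, n_factors)), np.zeros((n, n_factors))
    for a in range(n_factors):
        u = rng.standard_normal(n)                    # Step 2: random start
        for _ in range(max_iter):
            t = E @ u
            t /= np.linalg.norm(t)                    # Step 3
            c = F.T @ t                               # Step 4
            u_new = F @ c
            u_new /= np.linalg.norm(u_new)            # Step 5
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:                             # Step 6
                break
        T[:, a], U[:, a] = t, u
        D = np.eye(n) - np.outer(t, t)
        E = D @ E @ D                                 # Step 7: deflate kernel ...
        F = F - np.outer(t, t @ F)                    # ... and responses
    # Equations (9) and (13) share the dual coefficients U (T' K U)^-1 T' Y
    alpha = U @ np.linalg.solve(T.T @ Kc @ U, T.T @ Yc)
    return Kc @ alpha + y_mean, Ktc @ alpha + y_mean


rng = np.random.default_rng(1)
Xtr = np.vstack([rng.normal(-3, 0.5, (10, 5)), rng.normal(3, 0.5, (10, 5))])
ytr = np.r_[-np.ones(10), np.ones(10)]
Xte = np.vstack([rng.normal(-3, 0.5, (5, 5)), rng.normal(3, 0.5, (5, 5))])
yhat_tr, yhat_te = kpls_fit_predict(Xtr, ytr, Xte, n_factors=3, sigma2=25.0)
print(np.sign(yhat_te.ravel()))
```

With a single response column, the inner loop converges after two passes: u becomes F/‖F‖, so the iteration only matters for multi-column Y.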
If Y is an indicator vector coding two classes (−1 for members of Class A and 1 for members of Class B), the model becomes a kernel partial least squares-discriminant analysis (KPLS-DA) model. The KPLS-DA model is developed by regressing the predictor matrix X against the response matrix Y. Built from experimental data, the model assigns unknown samples to one of the previously defined classes according to the pattern of their measured features. A threshold is set to an assigned value. A sample is considered correctly classified if its predicted value lies on the same side of the threshold as its class code.
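The decision rule and the per-class correct classification rate can be sketched as follows. The code assumes a threshold of 0, with values above it assigned to the +1 class (malathion-containing) and values at or below it to the −1 class (malathion-free). The function names and the five predicted values are hypothetical.

```python
import numpy as np


def classify(y_pred, threshold=0.0):
    """+1 (malathion-containing) above the threshold, -1 (malathion-free) otherwise."""
    return np.where(np.asarray(y_pred) > threshold, 1, -1)


def correct_rates(y_true, y_pred, threshold=0.0):
    """Per-class and overall correct classification rates, in percent."""
    y_true = np.asarray(y_true)
    labels = classify(y_pred, threshold)
    rates = {cls: 100.0 * np.mean(labels[y_true == cls] == cls) for cls in (-1, 1)}
    rates["overall"] = 100.0 * np.mean(labels == y_true)
    return rates


# Hypothetical KPLS-DA outputs for five samples
y_true = np.array([-1, -1, 1, 1, 1])
y_pred = np.array([-0.8, 0.1, 0.9, 0.4, -0.2])
print(correct_rates(y_true, y_pred))  # class -1: 50%, class 1: 66.7%, overall: 60%
```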
Because the aim is only to detect whether malathion residues are present in water, a strictly linear relationship between absorbance and concentration is not required. The KPLS-DA method was therefore used to build the model in this study.
The KPLS-DA code was written by the author according to the algorithm described above.

Selection of Training and Test Sets
Two thirds of the spectra were used for training, and the remaining third was kept for testing. Model accuracy was reported as the number of misclassified samples. A total of 140 prepared samples (68 malathion-free and 72 malathion-containing) were used as the training set. The remaining 70 prepared samples (34 malathion-free and 36 malathion-containing) were used as the test set.
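The paper does not say how samples were assigned to the two sets. The sketch below assumes a random split stratified by class, taking two thirds of each class for training. With the class sizes implied above (102 malathion-free and 108 malathion-containing), this reproduces the stated 68/72 training and 34/36 test counts. The function name `stratified_split` is hypothetical.

```python
import numpy as np


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Return (train_idx, test_idx), taking train_fraction of each class for training."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, test = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))  # shuffle within the class
        n_train = int(round(train_fraction * idx.size))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.array(sorted(train)), np.array(sorted(test))


labels = np.r_[np.full(102, -1), np.full(108, 1)]  # -1: malathion-free, 1: containing
tr, te = stratified_split(labels)
print(len(tr), len(te))                                      # 140 70
print((labels[tr] == -1).sum(), (labels[tr] == 1).sum())     # 68 72
```

Stratifying keeps the class proportions of both sets close to those of the whole sample pool, so that neither class is under-represented in the test set.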

Results of KPLS-DA Model
The Radial Basis Function kernel was used in this research. In the class indicator vector, −1 denoted water samples without malathion and 1 denoted water samples containing malathion (1-100 μg·ml⁻¹). The threshold was set to 0 to decide whether a water sample contained malathion. A malathion-containing sample was classified correctly if its predicted value was above 0, and a pure water sample if its value was below 0. The number of factors and the value of σ² for the final KPLS-DA model were selected by observing the correct classification rate of each class.
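The text says only that the number of factors and σ² were chosen "by observing the correct classification rate of each class". One possible way to formalize this is sketched below. It searches a grid and keeps the combination that maximizes the worse of the two per-class rates, breaking ties in favor of fewer factors and then smaller σ². Both the criterion and the callable `fit_predict` are assumptions. `fit_predict` stands in for any model (for example a KPLS-DA fit) that returns predicted values for the samples being scored. Ideally those samples are not the ones used to report final accuracy, for example through cross-validation.

```python
import itertools

import numpy as np


def select_parameters(fit_predict, y_true, factor_grid, sigma2_grid, threshold=0.0):
    """Pick (n_factors, sigma2) maximizing the lower of the two per-class rates.

    fit_predict(n_factors, sigma2) is a hypothetical callable returning predicted
    values for the samples in y_true. Ties go to fewer factors, then smaller sigma2.
    """
    y_true = np.asarray(y_true)
    best_key, best_params = None, None
    for n_factors, sigma2 in itertools.product(sorted(factor_grid), sorted(sigma2_grid)):
        labels = np.where(np.asarray(fit_predict(n_factors, sigma2)) > threshold, 1, -1)
        rate_free = np.mean(labels[y_true == -1] == -1)
        rate_malathion = np.mean(labels[y_true == 1] == 1)
        key = min(rate_free, rate_malathion)
        if best_key is None or key > best_key:   # strict '>' keeps the earlier (simpler) model
            best_key, best_params = key, (n_factors, sigma2)
    return best_params, 100.0 * best_key


# Toy stand-in for a model: correct only once it has at least 3 factors
y_val = np.array([-1, -1, 1, 1])


def toy_fit_predict(n_factors, sigma2):
    return 0.5 * y_val if n_factors >= 3 else np.ones(len(y_val))


print(select_parameters(toy_fit_predict, y_val, [1, 2, 3, 4], [0.01, 0.045]))
```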
For the final KPLS-DA model, the number of factors was 15 and σ² was 0.045. In the wavenumber range 4348 to 9091 cm⁻¹, the correct classification rates were 100% for both the training set and the test set. The predicted results for the training and test sets are shown in Figures 6 and 7. No sample was misclassified, including those at the lowest prepared concentration of 1 μg·ml⁻¹. The lowest malathion concentration detected in water by the KPLS-DA method was therefore 1 μg·ml⁻¹.
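The reasoning behind the lowest detected concentration can be made explicit with a simple rule. The rule is an assumption and is not stated in the paper: report the smallest spiked concentration c such that every malathion-containing sample at or above c was classified correctly. The function name and the example arrays are hypothetical.

```python
import numpy as np


def lowest_reliable_concentration(conc, correct):
    """Smallest level c such that every spiked sample with concentration >= c was
    classified correctly; None if the highest level already contains errors.

    conc: concentrations in ug/ml (0 marks malathion-free samples, which are ignored)
    correct: booleans, True where the sample was classified correctly
    """
    conc, correct = np.asarray(conc, float), np.asarray(correct, bool)
    spiked = conc > 0
    lowest = None
    for c in np.unique(conc[spiked])[::-1]:          # highest level first
        if not correct[spiked & (conc == c)].all():
            break
        lowest = c
    return lowest


conc = [0, 0, 1, 1, 5, 10, 100]
print(lowest_reliable_concentration(conc, [True] * 7))                           # 1.0
print(lowest_reliable_concentration(conc, [True, True, True, False, True, True, True]))  # 5.0
```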

Conclusion
Using the KPLS-DA method, malathion in water samples could be detected by NIR spectroscopy. In the wavenumber range 4348 to 9091 cm⁻¹, classification accuracies of 100% were obtained for both the training set and the test set, and the lowest malathion concentration detected in water was 1 μg·ml⁻¹. Compared with other qualitative analysis methods (for example, cluster analysis), KPLS-DA presents its results more directly in the form of a scatter plot. It could also serve as a "concentration sieve" by setting different thresholds. If the threshold is set at the maximum concentration permitted in water, samples with malathion concentrations below it pass and the others are judged non-compliant, enabling rapid on-site screening. If necessary, samples that fail can then be submitted to quantitative analysis by HPLC, GC, etc., saving considerable labor, materials and money. The main advantages of this near-infrared method are convenient sampling, no pretreatment, no consumption of organic solvents and a short measurement time (5 min). The proposed spectrometric method is therefore a fast and environmentally friendly alternative to classic chromatographic procedures for the rapid screening of water containing malathion. Although only malathion was examined in this study, NIR spectroscopy coupled with KPLS-DA may have potential for the rapid screening of other pesticide residues in water.
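The "concentration sieve" idea amounts to recoding the classes around a permitted limit instead of around zero concentration, then routing flagged samples to confirmatory analysis. In the sketch below, the limit of 5 μg·ml⁻¹ is a hypothetical value chosen for illustration, not a regulatory figure. The function names are likewise hypothetical.

```python
import numpy as np

PERMITTED_LIMIT = 5.0   # hypothetical limit in ug/ml, for illustration only


def sieve_labels(conc, limit=PERMITTED_LIMIT):
    """Class coding for training a sieve model: +1 above the limit, -1 at or below it."""
    return np.where(np.asarray(conc, float) > limit, 1, -1)


def screen(y_pred, threshold=0.0):
    """Samples predicted above the threshold fail and are sent for quantitative analysis."""
    return ["refer for HPLC/GC" if v > threshold else "pass" for v in y_pred]


print(sieve_labels([0, 1, 5, 20, 100]))   # [-1 -1 -1  1  1]
print(screen([-0.7, 0.3]))                # ['pass', 'refer for HPLC/GC']
```

Only the samples flagged at the screening stage need the slower chromatographic measurement. This is where the labor and cost savings described above would come from.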
Figure 4. The mapping Φ(x) embeds the data points into a feature space where the nonlinear relationship appears linear.
Figure 5. The mapping Φ(x) transforms the 2-D data points into a new 3-D space; panel (a) shows the data.

Figure 7. Predicted results of KPLS-DA in the test set.