Global Optimization of Norris Derivative Filtering with Application for Near-Infrared Analysis of Serum Urea Nitrogen
1. Introduction
Near-infrared (NIR) spectroscopy primarily reflects the absorption of overtones and combinations of vibrations of X-H functional groups (such as C-H, O-H and N-H). It has clear advantages for rapid, reagent-free analysis and has been used effectively in agriculture [1] [2] [3] [4] , food [5] [6] [7] [8] , environment [9] [10] , biomedicine and other fields [11] - [16] . NIR spectra of most sample types can usually be measured directly without physical or chemical treatment. Complex liquid samples containing multiple components (e.g. blood and milk) can be measured directly by transmission. However, such spectra contain both instrumental noise and interference from other unknown components. Spectral pretreatment that improves the signal-to-noise ratio is therefore necessary.
In spectral preprocessing, appropriate smoothing and differentiation can effectively suppress noise. The well-known Norris derivative filter (NDF) comprises two steps: moving average smoothing and difference derivation. It uses three parameters: the derivative order (d), the number of smoothing points (s) and the number of differential gaps (g). NDF is thus a family of algorithms with many parameter combinations (modes) and is often used in NIR analysis [17] [18] . The appropriate Norris mode should be chosen according to the analytical object, which calls for large-scale optimization over the modes. Owing to the heavy workload, such work has not yet been reported.
Serum urea nitrogen (SUN) is an important indicator for the clinical evaluation of renal function and protein metabolism. It plays an important role in the diagnosis of renal diseases in acute myocardial infarction [19] and acute heart failure [20] . It is also an adjunct predictor of acute pancreatitis [21] , hypertension and depression. The normal value of SUN is 2.9 to 7.5 mmol∙L−1. The conventional analytical methods for SUN include gas chromatography, ion conductivity and urease methods. These methods require chemical reagents and complex operations, which makes them inconvenient for rapid, routine screening of large populations. Therefore, developing a simple and rapid analytical method for SUN in routine testing is of great significance.
Urea (CO(NH2)2) contains the hydrogen-containing group (-NH2), which has characteristic absorption in the NIR region. At the molecular level, NIR spectroscopy therefore has the potential to quantify urea nitrogen. Some work has been done in this area, such as using NIR spectroscopy to analyze the urea nitrogen content in simulated serum solutions or urine [15] [16] . Recently, silver mirror enhanced NIR diffuse reflectance spectroscopy has been proposed for the analysis of SUN [22] . However, the analytical accuracy achieved so far does not meet the requirements of clinical application, so further methodological innovation is necessary. Chemometric studies for model optimization are also crucial.
In this study, a large-scale parameter-cycling global optimization platform for the NDF algorithm, combined with partial least squares (PLS) regression, was constructed to select the globally optimal mode. The NIR spectroscopic analysis of SUN was taken as an example to evaluate the performance of the proposed NDF platform.
2. Methods and Materials
2.1. Experimental Materials, Instruments and Measurement Methods
A total of 210 serum samples were collected from a hospital, and the clinically measured SUN values of the samples were obtained. The SUN values ranged from 2.1 to 41.6 mmol∙L−1, with a mean and standard deviation (SD) of 8.0 and 7.0 mmol∙L−1, respectively. The experiment was carried out according to relevant laws and institutional guidelines and was approved by local medical institutions; informed consent was obtained from all individual participants.
The instrument was an XDS Rapid Content™ Liquid Grating Spectrometer (FOSS, Denmark) equipped with a 2 mm. The spectra spanned 780 to 2498 nm with a 2 nm wavelength gap, covering the entire NIR region. The detectors for the wavebands of 780 - 1100 and 1100 - 2498 nm were Si and PbS detectors, respectively. Every sample was measured three times, and the average spectrum of each sample was calculated and used for modeling. The spectra were measured at 25˚C ± 1˚C and 46% ± 1% relative humidity.
2.2. Multi-Partition Modeling
Seventy-five samples were randomly selected from the 210 samples as an independent validation set (not involved in modeling). The remaining 135 samples were used as the modeling set, which was further divided into calibration (80 samples) and prediction (55 samples) sets 10 times to obtain stable parameters. The root-mean-square error (SEP) and correlation coefficient (RP) for prediction were determined for each partition. The mean values (SEPAve, RP,Ave) and standard deviations (SEPSD, RP,SD) over all partitions were then determined. The comprehensive indicator SEP+ = SEPAve + SEPSD was used to select stable parameters, since it takes both the prediction accuracy (SEPAve) and the stability (SEPSD) of modeling into account. The selected models were validated using the validation set. The corresponding SEP, RP, and ratio of performance to deviation (RPD) were further determined, respectively.
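The RPD was calculated as $\mathrm{RPD} = C_{\mathrm{SD}} / \mathrm{SEP}$, where $C_{\mathrm{SD}}$ was the SD of the actual values of the 75 validation samples (mean: 8.6 mmol∙L−1; SD: 7.8 mmol∙L−1).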
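The indicators of this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, SEP is taken as a plain root-mean-square error, the sample standard deviation (`ddof=1`) is an assumption, and RPD is taken as the SD of the actual values divided by SEP.

```python
import numpy as np

def sep(y_true, y_pred):
    """Root-mean-square error of prediction (assumed plain RMSE)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def sep_plus(sep_values):
    """SEP+ = SEP_Ave + SEP_SD over the partitions.

    Assumption: the SD is the sample standard deviation (ddof=1).
    """
    sep_values = np.asarray(sep_values, dtype=float)
    return float(sep_values.mean() + sep_values.std(ddof=1))

def rpd(reference_sd, sep_value):
    """Ratio of performance to deviation: SD of actual values over SEP."""
    return reference_sd / sep_value

# Validation figures reported in this paper: SD = 7.8, SEP = 1.66.
print(round(rpd(7.8, 1.66), 1))  # 4.7
```

With the reported validation values, the ratio reproduces the RPD of 4.7 given in Section 3.3.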
2.3. Norris Derivative Filter Algorithm
The NDF algorithm first applies moving average smoothing to the spectra using a symmetric window of wavelengths. The number of wavelengths in the smoothing window is called the number of smoothing points (s, odd), which is set to $s = 1, 3, \ldots, S$; S can be at most the largest odd number not exceeding the total number of wavelengths (N0) in the entire spectral waveband. Because absorbances at distant wavelengths are weakly correlated, an overly wide smoothing window is unreasonable. In this study, the upper limit S was set as 99. In addition, when s = 1, no moving average smoothing is performed, so the original spectra are left unpretreated.
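Let $x_k$ denote the absorbance at the $k$th wavelength. The absorbances in the smoothing window centered at $x_k$ are $x_i$, $i = k-\frac{s-1}{2}, \ldots, k+\frac{s-1}{2}$. The smoothed value of $x_k$ is
$$\tilde{x}_k = \frac{1}{s}\sum_{i=k-\frac{s-1}{2}}^{k+\frac{s-1}{2}} x_i \tag{1}$$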
Note that symmetric smoothing cannot be performed for the leftmost $\frac{s-1}{2}$ and rightmost $\frac{s-1}{2}$ wavelengths. Based on the idea of data continuity, for the leftmost $\frac{s-1}{2}$ wavelengths, the smoothed values of the absorbance $x_k$ were as follows:
(2)
For the rightmost $\frac{s-1}{2}$ wavelengths, the smoothing method is analogous and is omitted here.
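The smoothing step can be illustrated with a short NumPy sketch. The function name `moving_average` is hypothetical. The interior points use the symmetric s-point mean described above; the edge treatment (shrinking the window symmetrically so that it stays inside the spectrum) is an assumption made for this sketch and may differ from Equation (2).

```python
import numpy as np

def moving_average(x, s):
    """Smooth a 1-D spectrum with a centred window of s points (s odd)."""
    x = np.asarray(x, dtype=float)
    if s < 1 or s % 2 == 0:
        raise ValueError("s must be a positive odd integer")
    if s > x.size:
        raise ValueError("s must not exceed the number of wavelengths")
    m = (s - 1) // 2          # half-width of the full window
    n = x.size
    out = np.empty(n)
    for k in range(n):
        # Assumption: near either end, the window shrinks symmetrically
        # to the largest half-width that still fits inside the spectrum.
        h = min(m, k, n - 1 - k)
        out[k] = x[k - h:k + h + 1].mean()
    return out

spectrum = np.array([0.0, 0.0, 3.0, 0.0, 0.0])
print(moving_average(spectrum, 3))  # [0. 1. 1. 1. 0.]
```

With s = 1 the function returns the spectrum unchanged, matching the remark that no smoothing is performed in that case.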
Difference derivation: The spectra pretreated by moving average smoothing were then used for derivation. The 1st derivative of absorbance was calculated using the center difference method. Since NIR spectra are relatively flat and the spectral resolution differs between objects, the original spectral data gap is not necessarily suitable as the differential gap of the derivative. The Norris derivative therefore uses a variable number of wavelength gaps as the number of differential gaps (g), $g = 1, 2, \ldots, G$. A large g is unreasonable because the correlation between distant wavelengths is low. In this study, the upper limit G was set as 50.
For xk, the 1st derivative of absorbance was calculated using the following center difference:
(3)
Similarly, for the leftmost and the rightmost g wavelengths respectively, the center difference cannot be performed. Based on the idea of data continuity, for the g leftmost wavelengths, the 1st derivative value of the absorbance was calculated by the forward difference method as follows:
(4)
For the rightmost g wavelengths, the calculation method for 1st derivative value of the absorbance was similar and omitted.
The 2nd derivative value can be obtained by derivation on the basis of the above 1st derivative value, and the process was not described again.
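A gap derivative of this kind can be sketched as follows. The function name `gap_derivative` and the `step` argument are hypothetical, and the normalization (dividing by the gap length times the wavelength step) is an assumption, since the exact scaling of Equations (3) and (4) is not reproduced here. Interior points use a center difference over g gaps, the leftmost g points a forward difference, and the rightmost g points a backward difference.

```python
import numpy as np

def gap_derivative(x, g, step=1.0):
    """First derivative of a 1-D spectrum with a differential gap of g points."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if g < 1 or 2 * g >= n:
        raise ValueError("g must satisfy 1 <= g < n / 2")
    d = np.empty(n)
    # Interior points k = g .. n-g-1: centre difference x[k+g] - x[k-g].
    d[g:n - g] = (x[2 * g:] - x[:n - 2 * g]) / (2 * g * step)
    # Leftmost g points: forward difference x[k+g] - x[k].
    d[:g] = (x[g:2 * g] - x[:g]) / (g * step)
    # Rightmost g points: backward difference x[k] - x[k-g].
    d[n - g:] = (x[n - g:] - x[n - 2 * g:n - g]) / (g * step)
    return d

line = 2.0 * np.arange(10)
print(gap_derivative(line, g=2))  # every entry equals 2.0

# The 2nd derivative is obtained by differentiating the 1st derivative again.
parabola = np.arange(10, dtype=float) ** 2
second = gap_derivative(gap_derivative(parabola, g=1), g=1)
```

For the parabola, the interior values of `second` equal 2, while the values near both ends are affected by the one-sided differences.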
Optimization: Because the absolute values of derivatives of 3rd order or higher are small and carry little spectral information, derivatives of 3rd order or higher are generally not recommended. In this study, the derivative order was set as d = 0, 1, 2. In particular, when d = 0, only the moving average smoothing described above is applied.
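As above, based on the parameter combinations (d, s, g), with d = 0, 1, 2; $s = 1, 3, \ldots, S$; $g = 1, 2, \ldots, G$, a total of 5050 NDF modes were obtained (50 modes for d = 0, where only s varies, and 50 × 50 modes for each of d = 1 and d = 2). Each mode was used separately to pretreat the sample spectra. The corrected spectra were then used to establish PLS models, named Norris-PLS models. Finally, the optimal parameters were selected according to the prediction performance as follows:
$$\mathrm{SEP}^{+}(d^{*}, s^{*}, g^{*}) = \min_{d,\,s,\,g} \mathrm{SEP}^{+}(d, s, g) \tag{5}$$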
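The size of the search space can be checked by enumerating the modes directly. The sketch below is illustrative only: the names `norris_modes` and `select_global_optimum` are hypothetical, and the SEP+ values are assumed to be supplied by a separate modeling step that is not shown.

```python
from itertools import product

S_MAX, G_MAX = 99, 50
smoothing_points = range(1, S_MAX + 1, 2)   # s = 1, 3, ..., 99
gaps = range(1, G_MAX + 1)                  # g = 1, 2, ..., 50

def norris_modes():
    """List every (d, s, g) mode; g is None when d = 0 (smoothing only)."""
    modes = [(0, s, None) for s in smoothing_points]
    modes += [(d, s, g) for d, s, g in product((1, 2), smoothing_points, gaps)]
    return modes

def select_global_optimum(sep_plus_by_mode):
    """Return the mode with the smallest SEP+ from a {mode: SEP+} mapping."""
    return min(sep_plus_by_mode, key=sep_plus_by_mode.get)

print(len(norris_modes()))  # 5050
```

The count is 50 + 2 × 50 × 50 = 5050, in agreement with the number of NDF modes stated above.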
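Both s and g are important parameters of NDF. When d = 0, s is the only variable parameter, and the corresponding prediction error is $\mathrm{SEP}^{+}(0, s)$, $s = 1, 3, \ldots, S$. When d = 1, 2, there are two variable parameters (s, g). The corresponding single-parameter local optimal models were as follows:
$$\mathrm{SEP}^{+}(d, s) = \min_{g} \mathrm{SEP}^{+}(d, s, g), \quad s = 1, 3, \ldots, S \tag{6}$$
$$\mathrm{SEP}^{+}(d, g) = \min_{s} \mathrm{SEP}^{+}(d, s, g), \quad g = 1, 2, \ldots, G \tag{7}$$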
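One way to read the single-parameter local optima is as minima of SEP+ over the other parameter, which is what the curves in Figure 2 appear to show. The sketch below makes that assumption explicit; the function name and the toy grid are hypothetical and do not reproduce results from this study.

```python
import numpy as np

def single_parameter_profiles(sep_plus_grid):
    """Profiles of SEP+ for a fixed derivative order d.

    sep_plus_grid[i, j] holds SEP+ for the i-th value of s and the
    j-th value of g (assumed layout).
    """
    grid = np.asarray(sep_plus_grid, dtype=float)
    best_per_s = grid.min(axis=1)   # minimum over g for each s
    best_per_g = grid.min(axis=0)   # minimum over s for each g
    i, j = np.unravel_index(grid.argmin(), grid.shape)
    return best_per_s, best_per_g, (int(i), int(j))

# Toy 2 x 3 grid for illustration only.
toy = [[5.0, 4.0, 6.0],
       [3.5, 2.5, 4.5]]
per_s, per_g, best = single_parameter_profiles(toy)
print(per_s, per_g, best)  # [4.  2.5] [3.5 2.5 4.5] (1, 1)
```

The smallest value of either profile coincides with the minimum of the whole grid for that derivative order.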
3. Results and Discussion
3.1. Direct PLS Model without Pretreatment
The NIR spectra of the 210 serum samples in the entire scanning region (780 - 2498 nm) are illustrated in “Figure 1(a)”. For comparison, a PLS model over the whole region (780 - 2498 nm) without spectral pretreatment was established. The modeling results (SEPAve, RP,Ave, SEPSD, RP,SD, and SEP+) are summarized in “Table 1”. The model showed a high prediction error (SEP+ = 7.07 mmol∙L−1) and a low correlation (RP = 0.535).
3.2. Norris-PLS Models
Norris-PLS models corresponding to all 5050 NDF modes were also established. For each derivative order, the optimal SEP+ corresponding to each single parameter (number of smoothing points s or number of differential gaps g) is shown in “Figure 2”. The global optimal NDF mode was the 2nd derivative with 33 smoothing points and 15 differential gaps (d = 2, s = 33 and g = 15). The corresponding SEP+ and RP improved markedly to 2.62 mmol∙L−1 and 0.930, respectively. In particular, the prediction error (SEP+) was only 37% of that of the model without pretreatment. “Figure 2” also shows that the prediction results differ considerably across the parameters (d, s, g). Thus, an exhaustive search over the parameters was necessary to achieve better modeling performance. The prediction results of the global optimal Norris-PLS model are also summarized in “Table 1”. The spectra pretreated with the optimal NDF mode (d = 2, s = 33, g = 15) are shown in “Figure 1(b)”.
Figure 1. NIR spectra of 210 serum samples (a) raw spectra and (b) NDF spectra (d = 2, s = 33, g = 15).
Table 1. Prediction effects of modelling for the analysis of SUN (mmol∙L−1).
Figure 2. SEP+ of the local optimal Norris-PLS models for SUN corresponding to each single-parameter: (a) Numbers of smoothing points and (b) Numbers of differential gaps.
In fact, the raw spectra showed saturated absorbance around 1900 nm and 2400 nm (see “Figure 1(a)”), which introduces noise and affects modeling. Therefore, wavelengths with absorbance higher than 3 (corresponding to a 99.9% absorption rate) were excluded, leaving 780 - 1880 & 2082 - 2344 nm. On the basis of the NDF spectra, a PLS model was established in this unsaturated region. The modeling results are also summarized in “Table 1”. The SEP+ and RP further improved to 1.87 mmol∙L−1 and 0.966, respectively; the SEP+ declined by a further 29%. Therefore, eliminating the interference wavebands with high absorption does help to improve the modeling results.
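The waveband restriction can be illustrated with a boolean mask over the wavelength axis. An absorbance of 3 corresponds to a transmittance of 10^-3, i.e. 99.9% of the light absorbed. The sketch below is illustrative: the function name `band_mask` is hypothetical, and the wavelength grid is built from the 780 - 2498 nm range and 2 nm gap stated in Section 2.1.

```python
import numpy as np

# Wavelength axis: 780, 782, ..., 2498 nm.
wavelengths = np.arange(780, 2500, 2)

def band_mask(wavelengths, bands):
    """True for wavelengths inside any of the inclusive (low, high) bands."""
    mask = np.zeros(wavelengths.shape, dtype=bool)
    for low, high in bands:
        mask |= (wavelengths >= low) & (wavelengths <= high)
    return mask

unsaturated = band_mask(wavelengths, [(780, 1880), (2082, 2344)])
print(wavelengths.size, int(unsaturated.sum()))  # 860 683

# Applying the mask to a (samples x wavelengths) array keeps only the
# unsaturated columns, e.g. spectra[:, unsaturated].
```

On this grid, 683 of the 860 wavelengths remain after the saturated wavebands are removed.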
3.3. Models Validation
The 75 validation samples not involved in modeling were used to evaluate the Norris-PLS model that was established in the unsaturated region (780 - 1880 & 2082 - 2344 nm) with the optimal NDF mode. The PLS regression coefficients were determined using the spectra and actual SUN values of all modeling samples. The predicted values were then determined using the resulting regression coefficients and the spectra of the validation samples.
For this model, the relationship between the predicted and actual SUN values is shown in “Figure 3”. The evaluation values for validation (SEP, RP, and RPD) were 1.66 mmol∙L−1, 0.977 and 4.7, respectively. The model achieved good prediction performance: the predicted values were close to the actual values, with high precision and correlation. The results showed that the prediction performance differed considerably across parameters, so the parameters cannot be preset by experience. Global optimization of the parameters was necessary to achieve better modeling performance.
Figure 3. Relationship between the predicted and actual SUN values based on the optimal models.
4. Conclusions
The NDF algorithm is an effective spectral preprocessing method. The appropriate Norris mode should be chosen according to the analytical object, and global optimization over the Norris modes is necessary to achieve optimal modeling performance.
Serum urea nitrogen is an important clinical blood screening index and has reference value for the diagnosis and treatment of many major diseases. Using NIR spectroscopy to establish a rapid and reagent-free detection method for serum urea nitrogen can provide new technical support for health screening of large populations. With the optimal Norris-PLS model for SUN, the modeling prediction error (SEP+) decreased to 2.62 mmol∙L−1 (decline rate 63%). After the saturated wavebands with high absorption were removed, the SEP+ decreased further to 1.87 mmol∙L−1 (decline rate 29%). These results indicate the good performance of the global optimization of the NDF algorithm.
This study developed a large-scale parameter-cycling global optimization platform for the NDF algorithm and applied it successfully to SUN analysis. It also provided a systematic study of the parameters in the Norris-PLS method. We believe that this approach can provide a valuable reference for the NIR analysis of complex objects.
Acknowledgements
This work was supported by the Science and Technology Project of Guangdong Province of China (No. 2014A020213016, No. 2014A020212445) and the University-Enterprise Joint Research Project “Intelligent detection network technology joint research centre” (No. 40115031).