Global Optimization of Norris Derivative Filtering with Application for Near-Infrared Analysis of Serum Urea Nitrogen

Near-infrared (NIR) spectroscopy combined with chemometric methods was applied to the rapid, reagent-free analysis of serum urea nitrogen (SUN). Multi-partition modeling was performed to achieve parameter stability. A large-scale cyclic and global optimization platform was developed, with partial least squares (PLS) regression, for the three parameters of the Norris derivative filter (NDF): the derivative order (d), the number of smoothing points (s) and the number of differential gaps (g). The parameter adaptability of the NDF algorithm was also analyzed, and the optimized filter achieved a significantly better modeling effect than the model without spectral preprocessing. After the interfering wavebands of saturated absorption were eliminated, the modeling performance improved further. In validation, the root-mean-square error (SEP) and correlation coefficient ($R_P$) for prediction and the ratio of performance to deviation (RPD) were 1.66 mmol·L⁻¹, 0.966 and 4.7, respectively. The results showed that high-precision analysis of SUN is feasible based on NIR spectroscopy and Norris-PLS. The global optimization method for NDF is also expected to be applicable to other analytical objects.

NIR spectroscopy has been effectively used in agriculture [1] [2] [3] [4], food [5] [6] [7] [8], environment [9] [10], biomedicine and other fields [11]-[16]. The NIR spectra of most sample types can usually be measured directly, without physical or chemical treatment. Complex liquid samples containing multiple components (e.g. blood, milk) can be measured directly by transmission. However, such spectra contain both instrumental noise and interference from other unknown components. Spectral pretreatment that raises the signal-to-noise ratio is therefore necessary.
In spectral preprocessing, appropriate smoothing and differentiation can effectively eliminate noise. The well-known Norris derivative filter (NDF) includes two steps: moving average smoothing and difference derivation. It uses three parameters: the derivative order (d), the number of smoothing points (s) and the number of differential gaps (g). NDF is a family of algorithms with various parameters and modes and is often used in NIR analysis [17] [18]. The appropriate Norris mode should be chosen according to the analytical object, so a large-scale optimization over Norris modes is necessary. Owing to the heavy workload, such work has not yet been reported.
Serum urea nitrogen (SUN) is an important indicator for the clinical evaluation of renal function and protein metabolism. It plays an important role in the diagnosis of renal diseases in acute myocardial infarction [19] and acute heart failure [20], and it is also an adjunct predictor of acute pancreatitis [21], hypertension and depression. The normal range of SUN is 2.9 to 7.5 mmol·L⁻¹. The conventional analytical methods for SUN include gas chromatography, ion conductivity and urease methods. These methods require chemical reagents and complex operations, which makes them inconvenient for rapid, routine screening of large populations. Therefore, the development of a simple and rapid analytical method for routine SUN testing is of great significance.
Urea, CO(NH₂)₂, contains the hydrogen-containing group -NH₂, which has characteristic absorption in the NIR region. Therefore, at the molecular level, NIR spectroscopy has the potential to quantify urea nitrogen. There has been some research and development in this area, such as using NIR spectroscopy to analyze the urea nitrogen content of simulated serum solutions or urine [15] [16]. Recently, silver mirror enhanced NIR diffuse reflectance spectroscopy has been proposed for the analysis of SUN [22]. However, the existing analytical accuracy does not meet the requirements of clinical application, so further methodological innovation is necessary. It is also crucial to conduct chemometric studies for model optimization.
In this study, a large-scale cyclic and global optimization platform for the NDF algorithm was constructed in combination with partial least squares (PLS) regression to achieve globally optimal parameter selection. The NIR spectroscopic analysis of SUN was taken as an example to evaluate the performance of the proposed NDF platform.
Every sample was measured three times, and the average spectrum of each sample was calculated and used for modeling. The spectra were measured at 25 °C ± 1 °C and 46% ± 1% relative humidity.

Multi-Partition Modeling
A total of 75 samples were randomly selected from the 210 samples as an independent validation set (not involved in modeling). The remaining 135 samples were used as the modeling set, which was divided into calibration (80 samples) and prediction (55 samples) sets 10 times to achieve parameter stability. The root-mean-square error (SEP) and correlation coefficient ($R_P$) for prediction were determined for each partition, and their mean values ($\mathrm{SEP}_{\mathrm{Ave}}$, $R_{P,\mathrm{Ave}}$) and standard deviations ($\mathrm{SEP}_{\mathrm{SD}}$, $R_{P,\mathrm{SD}}$) over all partitions were then calculated. The comprehensive indicator $\mathrm{SEP}^{+} = \mathrm{SEP}_{\mathrm{Ave}} + \mathrm{SEP}_{\mathrm{SD}}$ was used to select stable parameters, since it takes both the modeling prediction accuracy ($\mathrm{SEP}_{\mathrm{Ave}}$) and the stability ($\mathrm{SEP}_{\mathrm{SD}}$) into account. The selected models were validated using the validation set, for which SEP, $R_P$ and the ratio of performance to deviation (RPD) were determined.
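The partitioning scheme and the stability indicator described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, SEP is taken as the plain root-mean-square error, the standard deviation is the sample (n − 1) form, and RPD is taken as the standard deviation of the reference values divided by SEP. The text does not specify these details.

```python
import math
import random
import statistics


def random_partitions(n_samples, n_calibration, n_partitions, seed=0):
    """Yield (calibration, prediction) index lists for repeated random splits."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(n_partitions):
        shuffled = rng.sample(indices, n_samples)  # a random permutation
        yield shuffled[:n_calibration], shuffled[n_calibration:]


def sep(reference, predicted):
    """Root-mean-square error of prediction (ASSUMPTION: no bias correction)."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)


def correlation(reference, predicted):
    """Pearson correlation coefficient between reference and predicted values."""
    mean_r = statistics.fmean(reference)
    mean_p = statistics.fmean(predicted)
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_r * var_p)


def sep_plus(sep_values):
    """SEP+ = SEP_Ave + SEP_SD over partitions (ASSUMPTION: sample SD)."""
    return statistics.fmean(sep_values) + statistics.stdev(sep_values)


def rpd(reference, predicted):
    """Ratio of performance to deviation (ASSUMPTION: SD of reference / SEP)."""
    return statistics.stdev(reference) / sep(reference, predicted)
```

With the sample sizes given in the text, `random_partitions(135, 80, 10)` produces ten splits of 80 calibration and 55 prediction samples. Fitting a PLS model on each split and passing the ten SEP values to `sep_plus` gives the indicator for one parameter setting.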

Norris Derivative Filter Algorithm
The NDF algorithm first applies moving average smoothing to the spectra using a symmetric window of wavelengths. The number of wavelengths in the smoothing window is called the number of smoothing points ($s$, odd), with $s = 1, 3, \ldots, S$; $S$ can be as large as the maximum odd number not exceeding the total number of wavelengths ($N_0$) in the entire spectral waveband. Because absorbances at widely separated wavelengths are only weakly correlated, an overly wide smoothing window is unreasonable. In this study, the upper limit $S$ was set to 99. When $s = 1$, no moving average smoothing is performed and the original spectra are left unchanged.
Let $x_k$ be the absorbance at the $k$th wavelength, and let $x_i$, $i = k - \frac{s-1}{2}, \ldots, k + \frac{s-1}{2}$, be the absorbances in the smoothing window centered at $x_k$. The smoothed value of $x_k$ was the mean over the window:
$$\bar{x}_k = \frac{1}{s} \sum_{i = k - \frac{s-1}{2}}^{k + \frac{s-1}{2}} x_i .$$
Note that symmetric smoothing with the full window cannot be performed for the leftmost $\frac{s-1}{2}$ and rightmost $\frac{s-1}{2}$ wavelengths. Based on the idea of data continuity, the absorbances at the leftmost $\frac{s-1}{2}$ wavelengths were smoothed by a separate edge formula; the rightmost $\frac{s-1}{2}$ wavelengths were treated similarly and the details are omitted.
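The smoothing step can be illustrated with a short Python sketch. The interior rule is the plain s-point mean described above. The edge rule here is an assumption and may differ from the authors' formula: near either end the window shrinks to the widest symmetric window that still fits. The function name `moving_average` is hypothetical.

```python
def moving_average(spectrum, s):
    """Smooth one spectrum with a symmetric s-point moving average (s odd).

    ASSUMPTION: at the leftmost and rightmost (s - 1) / 2 points the window
    is shrunk to the widest symmetric window that fits inside the spectrum.
    """
    if s < 1 or s % 2 == 0:
        raise ValueError("s must be a positive odd integer")
    n = len(spectrum)
    half = (s - 1) // 2
    smoothed = []
    for k in range(n):
        # Half-width actually usable at position k (equals `half` in the interior).
        h = min(half, k, n - 1 - k)
        window = spectrum[k - h : k + h + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

With `s = 1` the function returns the spectrum unchanged, which matches the statement that no smoothing is performed in that case. Because every window is symmetric, a linear baseline passes through unaltered.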
Difference derivation: The spectra pretreated by moving average smoothing were then differentiated. The 1st derivative of absorbance was calculated using the center difference method. Since NIR spectra are relatively flat and the spectral resolution usually differs between objects, the original spectral data gap is not necessarily suitable as the differential gap of the derivative. The Norris derivative therefore uses a variable number of wavelength gaps as the number of differential gaps ($g$), with $g = 1, 2, \ldots, G$. A large $g$ is unreasonable because the correlation is low. In this study, the upper limit $G$ was set to 50. For $x_k$, the 1st derivative was calculated by the center difference between the smoothed absorbances at the $(k+g)$th and $(k-g)$th wavelengths. For the leftmost and rightmost $g$ wavelengths, the center difference cannot be performed. Based on the idea of data continuity, the 1st derivative at the $g$ leftmost wavelengths was calculated by the forward difference method; the calculation for the rightmost $g$ wavelengths was similar and is omitted.
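A gap derivative of this kind can be sketched as follows. The choice of difference at each position (center in the interior, forward at the left edge) follows the description above. Two details are assumptions: the backward difference at the right edge, and the scaling, which divides by the number of index steps spanned (2g or g) rather than by a wavelength interval. The function name `gap_derivative` is hypothetical.

```python
def gap_derivative(spectrum, g):
    """First derivative of a (smoothed) spectrum using a differential gap g.

    ASSUMPTIONS: differences are scaled by the number of index steps spanned,
    and the rightmost g points use a backward difference.
    """
    n = len(spectrum)
    if g < 1 or n <= 2 * g:
        raise ValueError("need g >= 1 and more than 2 * g points")
    derivative = []
    for k in range(n):
        if k < g:
            # Leftmost g points: forward difference.
            derivative.append((spectrum[k + g] - spectrum[k]) / g)
        elif k >= n - g:
            # Rightmost g points: backward difference.
            derivative.append((spectrum[k] - spectrum[k - g]) / g)
        else:
            # Interior points: center difference over 2 * g index steps.
            derivative.append((spectrum[k + g] - spectrum[k - g]) / (2 * g))
    return derivative
```

On a straight line every branch returns the same slope, so the edge rules do not introduce a discontinuity for linear baselines. On curved data the one-sided edge values are less accurate than the interior ones.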
The 2nd derivative can be obtained by differentiating the 1st derivative in the same way.
Optimization: Because the absolute values of derivatives of 3rd or higher order are small and carry little spectral information, derivatives of 3rd or higher order are generally not recommended. In this study, the derivative order was set as $d = 0, 1, 2$. In particular, when $d = 0$, only the preceding moving average smoothing is applied.
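As above, the parameter combinations $(d, s, g)$ with $d = 0, 1, 2$; $s = 1, 3, \ldots, 99$; $g = 1, 2, \ldots, 50$ gave a total of 5050 NDF modes: 50 smoothing-only modes for $d = 0$, where $g$ is not used, and $50 \times 50 = 2500$ modes for each of $d = 1$ and $d = 2$. Each mode was used to pretreat the sample spectra, and the pretreated spectra were then used to establish PLS models, called Norris-PLS models. Finally, according to the prediction effect, the optimal parameters $(d^{*}, s^{*}, g^{*})$ were selected as those minimizing the comprehensive indicator:
$$\mathrm{SEP}^{+}(d^{*}, s^{*}, g^{*}) = \min_{d,\, s,\, g} \mathrm{SEP}^{+}(d, s, g).$$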
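The size of the search grid and the selection step can be checked with a small Python sketch. It only enumerates the modes and picks the one with the smallest score. The scoring function `sep_plus_of` is a hypothetical placeholder for whatever routine pretreats the spectra, fits the PLS models over all partitions and returns SEP+. Representing the unused gap of the d = 0 modes as `None` is also an assumption made for illustration.

```python
from itertools import product


def ndf_modes(S=99, G=50):
    """Enumerate the (d, s, g) modes searched: d in {0, 1, 2}, odd s <= S, g <= G."""
    smoothing_points = range(1, S + 1, 2)
    gaps = range(1, G + 1)
    # d = 0 is smoothing only, so the gap g plays no role (stored as None).
    modes = [(0, s, None) for s in smoothing_points]
    modes += [(d, s, g) for d, s, g in product((1, 2), smoothing_points, gaps)]
    return modes


def select_optimal(modes, sep_plus_of):
    """Return the mode with the smallest SEP+ according to the supplied scorer."""
    return min(modes, key=sep_plus_of)


modes = ndf_modes()
print(len(modes))  # 50 + 2 * 50 * 50 modes
```

An exhaustive search like this is straightforward to parallelize, since each mode is scored independently of the others.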

Direct PLS Model without Pretreatment
The NIR spectra of the 210 serum samples over the entire scanning region (780-2498 nm) are illustrated in Figure 1. The modeling effects ($\mathrm{SEP}_{\mathrm{Ave}}$, $R_{P,\mathrm{Ave}}$, $\mathrm{SEP}_{\mathrm{SD}}$, $R_{P,\mathrm{SD}}$ and $\mathrm{SEP}^{+}$) of the direct PLS model are summarized in Table 1. The results showed a high prediction error ($\mathrm{SEP}^{+}$ = 7.07 mmol·L⁻¹) and a low correlation ($R_P$ = 0.535).

Norris-PLS Models
All Norris-PLS models corresponding to the 5050 NDF modes were also established.
For each derivative order, the optimal $\mathrm{SEP}^{+}$ corresponding to each single parameter (number of smoothing points $s$ or number of differential gaps $g$) is shown in Figure 2. The global optimal NDF mode was the 2nd derivative with 33 smoothing points and 15 differential gaps ($d = 2$, $s = 33$, $g = 15$). The corresponding $\mathrm{SEP}^{+}$ and $R_P$ improved markedly, to 2.62 mmol·L⁻¹ and 0.930, respectively. In particular, the prediction error ($\mathrm{SEP}^{+}$) was only 37% of that obtained without pretreatment. Figure 2 also shows that the prediction effects differ significantly between parameter combinations $(d, s, g)$.
Thus, an exhaustive search over the parameters was necessary to achieve better modeling performance. The prediction effects of the global optimal Norris-PLS model are also summarized in Table 1. The spectra pretreated with the optimal NDF mode ($d = 2$, $s = 33$, $g = 15$) are shown in Figure 1.
Therefore, eliminating the interference wavebands with high absorption does help to improve the modeling effect.

Models Validation
The

Conclusions
The NDF algorithm is an effective spectral preprocessing method. The appropriate Norris mode should be chosen according to the analytical object, and global optimization over Norris modes is necessary to achieve optimal modeling performance.
Serum urea nitrogen is an important clinical blood screening index and has clinical reference value for the diagnosis and treatment of many major diseases.
The use of NIR spectroscopy to establish a rapid, reagent-free detection method for serum urea nitrogen can provide new technical support for health screening of large populations. Using the optimal Norris-PLS model for SUN, the modeling prediction error ($\mathrm{SEP}^{+}$) decreased to 2.62 mmol·L⁻¹ (decline rate 63%). After the saturated wavebands with high absorption were removed, $\mathrm{SEP}^{+}$ decreased further to 1.87 mmol·L⁻¹ (decline rate 29%).
These results indicate the good performance of the global optimization of the NDF algorithm.
This study developed a large-scale cyclic and global optimization platform for the NDF algorithm and successfully applied it to SUN analysis. The parameter adaptability of the Norris-PLS method was also studied. We believe that this approach is significant and can provide a valuable reference for the NIR analysis of complex objects.