This study aimed to establish the best hyperspectral estimation model of soil total nitrogen (TN), providing a basis for rapid and accurate estimation of soil TN content, scientific and rational fertilization, and soil information management. A total of 92 brown soil samples were collected from orchards in Qixia County, Yantai City, Shandong Province. After drying and grinding, the hyperspectral reflectance of the soil was measured in the laboratory with an ASD FieldSpec 3, and the TN content was measured by the Kjeldahl method. Sensitive wavelengths were selected by multiple linear stepwise regression. Hyperspectral estimation models of TN were then established with Random Forest (RF) and Support Vector Machines (SVM) and validated with independent samples to identify the best model. The sensitive wavelengths were 956 nm, 995 nm, 1020 nm, 1410 nm, 1659 nm and 2020 nm. The coefficients of determination (R^{2}) of the two estimation models were 0.8011 and 0.8283, the root mean square errors (RMSE) were 0.022 and 0.025, and the relative errors (RE) were 0.1422 and 0.1639, respectively. Both the RF model and the SVM model are feasible for estimating TN content, but the SVM model performs better.

Soil total nitrogen, one of the essential nutrients for crop growth, has a significant effect on crop growth and development. The traditional chemical determination of soil nitrogen is time-consuming and laborious. Near-earth hyperspectral technology, which has developed rapidly in recent years and offers high spectral resolution, makes rapid, real-time estimation of soil total nitrogen content possible, which is of practical significance for scientific and rational fertilization. Scholars in China and abroad have reported results on estimating soil nitrogen content. Reeves et al. [

This study was based on the brown soil of apple orchards in Qixia County, Yantai City, Shandong Province. The spectral reflectance of the soil samples was obtained with an ASD FieldSpec 3 under controlled indoor conditions, and the TN content of the samples was analyzed. The soil sample data were preprocessed, the variation of TN content and its correlation with spectral reflectance were analyzed, and a hyperspectral estimation model of soil total nitrogen content was established.

The soil samples were collected in Qixia County, Yantai City, Shandong Province (120˚33'E to 121˚15'E, 37˚05'N to 37˚32'N). Qixia County is located in the center of the Jiaodong Peninsula. It has a warm temperate, semi-humid monsoon climate, with an average annual temperature of 11.4˚C and an average annual rainfall of 640 - 846 mm. The terrain is mountainous and hilly. Orchard soil is mostly brown soil, which is thin, soft and acidic, rich in mineral elements, and has good permeability.

A total of 92 brown soil samples were collected from 23 orchards in Qixia on October 20 - 23, 2010. Four trees were randomly selected at each sampling point, and soil was collected to the east, west, south and north of each fruit tree at a depth of 0 - 20 cm. After mixing the soil, the final sample was obtained by the quartering method. The location of the sampling area is shown in

The hyperspectral reflectance of the soil was measured with an ASD FieldSpec 3. The spectral range of the spectrometer is 350 - 2500 nm. In the 350 - 1000 nm range the sampling interval is 1.4 nm and the spectral resolution is 3 nm; in the 1000 - 2500 nm range the sampling interval is 2 nm and the spectral resolution is 10 nm. The resampling interval is 1 nm, giving 2151 output bands. The treated soil sample was placed in a vessel with a diameter of 15 cm and a depth of 2 cm, and the soil surface was flattened after filling. Spectral measurements were carried out under the same conditions in a darkroom. The light source was a 50 W halogen lamp placed 30 cm from the center of the soil sample. The optical fiber probe, with a 25˚ field of view, was fixed on a tripod 15 cm from the soil surface. During measurement the vessel was rotated three times, by about 90˚ each time, so that spectra were obtained in four directions. These were averaged to give the reflectance of each soil sample.

The soil TN content was determined by Kjeldahl method. In the presence of a catalyst, the soil samples were digested with concentrated sulfuric acid to convert the organic nitrogen into an inorganic ammonium salt. The ammonium salt was converted to ammonia under alkaline conditions, distilled with water vapor and absorbed by excess acid. The soil TN content was calculated by the standard alkaline titration.

Calculation formula:

$$\text{TN content}\,(\%) = \frac{C \times (V - V_0) \times 0.014 \times D \times 100}{m} \qquad (1)$$

Here, $C$ is the concentration of the standard acid solution (0.01 mol/L); $V$ is the volume of standard acid solution used in the titration (mL); $V_0$ is the volume of standard acid solution used in the blank titration (mL); 0.014 is the molar mass of nitrogen (kg/mol); $m$ is the sample mass (g); and $D$ is the aliquot ratio, that is, the volume of the digestion solution divided by the volume taken for measurement.
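As a rough illustration of Equation (1), the calculation can be sketched in a few lines of Python. The function name, argument names and the titration volumes in the example are hypothetical and are not taken from the study; only the formula itself follows the text.

```python
def tn_content_percent(c_acid, v_sample, v_blank, sample_mass, aliquot_ratio=1.0):
    """Soil TN content (%) following Equation (1).

    c_acid        -- concentration of the standard acid solution (mol/L)
    v_sample      -- acid volume used to titrate the sample (mL)
    v_blank       -- acid volume used to titrate the blank (mL)
    sample_mass   -- mass of the soil sample (g)
    aliquot_ratio -- D in Equation (1)
    """
    if sample_mass <= 0:
        raise ValueError("sample_mass must be positive")
    return c_acid * (v_sample - v_blank) * 0.014 * aliquot_ratio * 100 / sample_mass


# Hypothetical titration: 1.2 mL for the sample, 0.2 mL for the blank, 1 g of soil.
example_tn = tn_content_percent(0.01, 1.2, 0.2, 1.0)
```

With these made-up inputs the result is about 0.014%, which only shows the order of magnitude the formula produces, not a measured value.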

The pretreatment of soil spectral data is a necessary and effective means to improve the precision of hyperspectral modeling [

Select a smoothing window with a width of (2w + 1). For the center wavelength point a, calculate the spectral mean $\bar{x}_a$ over a and the w wavelength points on each side of it within the window, and substitute $\bar{x}_a$ for the measured value at wavelength point a. Then change a to move the window along the spectrum until all wavelength points have been smoothed. Polynomial least squares fitting is applied to the data in the moving window to achieve the smoothing [

$$x_{sg} = \bar{x}_a = \frac{1}{H} \sum_{i=-w}^{+w} x_{a+i}\, h_i \qquad (2)$$

In the formula, $h_i$ is the smoothing coefficient, obtained by least squares fitting, and $H$ is the normalization factor.
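Equation (2) can be illustrated with a small pure-Python sketch. The window width is not stated in the text, so the 5-point window (w = 2) and its standard tabulated quadratic coefficients (−3, 12, 17, 12, −3 with H = 35) are assumptions made only for this example, as are the function and constant names.

```python
# Assumed 5-point quadratic smoothing coefficients h_i for i = -2..2 and
# their normalization factor H; the study's actual window is not given.
SG_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG_NORM = 35.0


def sg_smooth(spectrum, coeffs=SG_COEFFS, norm=SG_NORM):
    """Apply Equation (2) at every point that has a full window around it."""
    w = len(coeffs) // 2
    smoothed = list(spectrum)  # edge points without a full window stay unchanged
    for a in range(w, len(spectrum) - w):
        window = spectrum[a - w:a + w + 1]
        smoothed[a] = sum(h * x for h, x in zip(coeffs, window)) / norm
    return smoothed


# A single spike is damped, while a smooth (quadratic) curve passes through unchanged.
spike = sg_smooth([0.0, 0.0, 1.0, 0.0, 0.0])
curve = sg_smooth([float(i * i) for i in range(10)])
```

Leaving the edge points untouched is a simplification; a production implementation would normally treat the spectrum ends with a dedicated rule.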

Multiplicative scatter correction (MSC) is mainly used to eliminate the scattering effects caused by uneven particle size distribution and differing particle sizes. Like standardization, the MSC algorithm operates on the spectral array of a group of samples. The average of all the NIR spectra is calculated first and used as the standard spectrum. The near-infrared spectrum of each sample is then fitted to the standard spectrum by one-variable linear regression, giving the linear translation (regression constant) and the tilt offset (regression coefficient) of each spectrum relative to the standard spectrum. Finally, the linear translation is subtracted from the original spectrum of each sample and the result is divided by the regression coefficient to give the corrected spectrum.

Specific formula:

Calculate the average spectra,

$$\bar{A} = \frac{\sum_{j=1}^{M} A_j}{M} \qquad (3)$$

Using the mean spectra to calculate the regression coefficients,

$$A_j = m_j \bar{A} + b_j \qquad (4)$$

Using regression coefficients to calculate the corrected spectra of MSC,

$$A_{j(\mathrm{MSC})} = \frac{A_j - b_j}{m_j} \qquad (5)$$

Here, $A_j$ ($j = 1, 2, \cdots, M$) is the spectrum of sample $j$ after SG smoothing, $\bar{A}$ is the mean spectrum, $m_j$ and $b_j$ are the regression coefficients, and $A_{j(\mathrm{MSC})}$ is the spectrum corrected by MSC.
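The three MSC steps in Equations (3) to (5) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation; the helper names and the toy spectra are hypothetical.

```python
def mean_spectrum(spectra):
    """Equation (3): band-wise mean over all sample spectra."""
    count = len(spectra)
    return [sum(band) / count for band in zip(*spectra)]


def fit_line(x, y):
    """Least squares fit y = slope * x + intercept (Equation (4))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def msc(spectra):
    """Equation (5): remove each spectrum's offset b_j and slope m_j."""
    reference = mean_spectrum(spectra)
    corrected = []
    for spectrum in spectra:
        m_j, b_j = fit_line(reference, spectrum)
        corrected.append([(value - b_j) / m_j for value in spectrum])
    return corrected


# Toy data: three spectra that differ only by a scale factor and an offset.
base = [0.10, 0.20, 0.40, 0.30]
toy_spectra = [
    [1.0 * v for v in base],
    [1.5 * v + 0.05 for v in base],
    [0.8 * v - 0.02 for v in base],
]
toy_corrected = msc(toy_spectra)
```

Because the toy spectra differ only by scale and offset, all three corrected spectra collapse onto the mean spectrum, which is the effect MSC is meant to have on scatter-induced differences.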

The first order differential of the reflectance spectrum is obtained by differential technique. The formula is as follows:

$$f'(X_i) = \frac{f(X_{i+1}) - f(X_{i-1})}{2\Delta\lambda} \qquad (6)$$

In the formula, $X_i$ is the wavelength of band $i$, $f'(X_i)$ is the first-order differential spectrum at $X_i$, and $\Delta\lambda$ is the wavelength interval from $X_{i-1}$ to $X_i$.
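Equation (6) is a central difference, which can be sketched as below. The function name is hypothetical, and the default 1 nm step is an assumption that matches the resampling interval reported for the spectrometer.

```python
def first_derivative(reflectance, step_nm=1.0):
    """Central-difference first derivative following Equation (6).

    Returns one value per interior band; the first and last bands have no
    neighbour on one side and are therefore skipped.
    """
    return [
        (reflectance[i + 1] - reflectance[i - 1]) / (2.0 * step_nm)
        for i in range(1, len(reflectance) - 1)
    ]


# For a quadratic test curve f(x) = x^2 the central difference is exact: f'(x) = 2x.
test_curve = [float(x * x) for x in range(6)]
test_derivative = first_derivative(test_curve)
```

The output is two bands shorter than the input, so band indices need to be shifted by one when the derivative is matched back to wavelengths.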

Logarithmic transformation of soil reflectance enhances spectral differences in the visible region and reduces the random effects caused by changes in illumination conditions and topography. The formula is as follows:

$$Y(X_i) = \log X_i \qquad (7)$$

In the formula, $X_i$ is the soil reflectance at each band and $Y(X_i)$ is its logarithmic transformation.

Multiple linear stepwise regression is used to screen sensitive bands: variables are entered or removed according to a preset F value, and the best model containing only a few variables is finally established [

Random Forest (RF) is an algorithm based on classification trees that improves prediction accuracy without a significant increase in computation. RF can explain the effect of several independent variables on the dependent variable Y. Suppose the dependent variable Y has n observations and k independent variables associated with it. When constructing each classification tree, RF randomly resamples n observations from the original data with replacement, so some observations are selected several times and some are not selected at all; this is the Bootstrap resampling method. At the same time, RF randomly selects a subset of the k independent variables to determine the nodes of the tree, so each tree that is built may be different. In general, RF randomly generates hundreds to thousands of classification trees and then takes the result that recurs most often among them as the final result [
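The two sources of randomness described above, Bootstrap resampling of observations and random selection of variables, can be sketched with the Python standard library. This only illustrates the sampling step, not tree construction; the function names and the fixed seed are hypothetical, while n = 69, k = 6 and 3 variables per node echo the calibration set size, the number of sensitive wavelengths and the node setting reported in the paper.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n observation indices with replacement (Bootstrap resampling)."""
    return [rng.randrange(n) for _ in range(n)]


def out_of_bag(n, indices):
    """Observations that were never drawn in this bootstrap sample."""
    chosen = set(indices)
    return [i for i in range(n) if i not in chosen]


def pick_features(k, per_node, rng):
    """Randomly choose per_node of the k independent variables for one node."""
    return sorted(rng.sample(range(k), per_node))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_obs, n_vars = 69, 6
sample = bootstrap_indices(n_obs, rng)
unused = out_of_bag(n_obs, sample)
node_vars = pick_features(n_vars, 3, rng)
```

Each tree would be grown from its own `sample`, and the observations in `unused` are the ones that tree never saw.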

The Support Vector Machine (SVM) was first proposed by Corinna Cortes and Vapnik in 1995. It is based on the VC dimension theory of statistical learning and the principle of structural risk minimization, and seeks the best compromise between model complexity and learning ability on limited sample information in order to obtain the best generalization ability. It has many unique advantages in small-sample, nonlinear and high-dimensional pattern recognition, and can be extended to other machine learning problems such as function fitting. Support vector machine regression is a good way to realize the idea of structural risk minimization. It draws on machine learning theory and techniques, including the learning algorithms of neural networks, within the framework of kernel methods [

The estimation performance of the models is evaluated by the coefficient of determination (R^{2}), the root mean square error (RMSE) and the average relative error (RE).

Calculation formula:

$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (8)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \qquad (9)$$

$$\mathrm{RE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{y_i} \times 100\% \qquad (10)$$

Here, $y_i$ is the measured value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the measured values, $\bar{y}_i$ is the mean of the predicted values, and $n$ is the number of samples. The larger the R^{2}, the more stable the model; the smaller the RMSE and the RE, the higher the estimation accuracy of the prediction model [
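The three evaluation measures can be sketched directly from Equations (8) to (10). The function names and the small test vectors are hypothetical; `r_squared` follows Equation (8) as written in the text, using the mean of the predicted values in the numerator.

```python
import math


def r_squared(measured, predicted):
    """Equation (8): spread of predictions relative to spread of measurements."""
    mean_measured = sum(measured) / len(measured)
    mean_predicted = sum(predicted) / len(predicted)
    explained = sum((p - mean_predicted) ** 2 for p in predicted)
    total = sum((y - mean_measured) ** 2 for y in measured)
    return explained / total


def rmse(measured, predicted):
    """Equation (9): root mean square error."""
    n = len(measured)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / n)


def relative_error_percent(measured, predicted):
    """Equation (10): mean absolute relative error, expressed in percent."""
    n = len(measured)
    return sum(abs(y - p) / y for y, p in zip(measured, predicted)) / n * 100.0


# Hypothetical values, only to exercise the functions.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
demo_rmse = rmse(y_true, y_pred)
demo_re = relative_error_percent(y_true, y_pred)
```

Equation (10) divides by each measured value, so it assumes that no measured TN content is zero.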

Samples | Observations | Maximum (g∙kg^{−1}) | Minimum (g∙kg^{−1}) | Mean (g∙kg^{−1}) | Standard deviation (g∙kg^{−1}) |
---|---|---|---|---|---|
Total samples | 92 | 0.352 | 0.073 | 0.145 | 0.052 |
Calibration | 69 | 0.352 | 0.079 | 0.147 | 0.055 |
Validation | 23 | 0.247 | 0.073 | 0.140 | 0.044 |

absorption valleys at 1400 nm, 1920 nm and 2200 nm, showing typical soil spectrum characteristics.

Correlation analysis was carried out between the SG-smoothed spectral reflectance and the soil total nitrogen content. The results are shown in

As shown in

The differential spectra provide higher resolution and clearer spectral contours than the original spectra. They also eliminate the influence of background interference and improve the correlation between soil total nitrogen content and spectral reflectance. Therefore, we adopted the first-order differential of the MSC-corrected spectra, based on a comparison of the correlations of the four transformation modes in

The first-order differential reflectance of the MSC-corrected spectra was taken as the independent variable and the soil TN content as the dependent variable. The significance level for entering a variable was set to 0.05 and that for removing a variable to 0.10. Multiple linear stepwise regression analysis was then performed. The sensitive wavelengths for total nitrogen were 956 nm, 995 nm, 1020 nm, 1410 nm, 1659 nm and 2020 nm.

The RF regression model of soil total nitrogen content was established with the screened sensitive wavelengths as the independent variables. The number of trees (Ntrees) was set to 300, the training sample proportion to 50%, and the number of variables at each node to 3. The results for the modeling samples and validation samples of the random forest regression model are shown in

The screened sensitive wavelengths were taken as the condition attributes of support vector machine regression and the soil total nitrogen content as the decision attribute. After parameter optimization, regression modeling and verification, the support vector machine type was set to epsilon-SVR with an RBF kernel function. The model parameters are shown in

Degree | Gamma | Coef 0 | Epsilon | C | Nu | Shrinking | p |
---|---|---|---|---|---|---|---|
3 | 0.5 | 0.001 | 0.001 | 1 | 0.5 | 1 | 0.01 |
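To make two of the tabulated parameters concrete, the sketch below evaluates an RBF kernel and the epsilon-insensitive loss used by epsilon-SVR. It assumes LIBSVM-style conventions, which the text does not confirm: the kernel is taken as exp(−Gamma·‖x − z‖²), and the table's p = 0.01 is read as the width of the insensitive tube. The function names and input values are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - z||^2), with Gamma = 0.5 from the table."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive_loss(measured, predicted, tube=0.01):
    """Errors inside the tube cost nothing; larger errors cost the excess.

    The default tube width uses p = 0.01 from the table (assumed meaning).
    """
    return max(0.0, abs(measured - predicted) - tube)


same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical inputs
unit_apart = rbf_kernel([0.0], [1.0])             # inputs one unit apart
inside_tube = epsilon_insensitive_loss(0.145, 0.150)
outside_tube = epsilon_insensitive_loss(0.200, 0.100)
```

Identical inputs give a kernel value of 1, and the value decays toward 0 as inputs move apart, with Gamma controlling how quickly.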

Modeling method | R^{2} | RMSE | RE (%) |
---|---|---|---|
RF | 0.7058 | 0.0251 | 0.1698 |
SVM | 0.7754 | 0.0221 | 0.1525 |

The estimation models were tested with independent sample data in order to analyze the ability of the RF model and the SVM model to predict soil TN content. The test results are shown in the table above. The coefficients of determination (R^{2}) of the two methods were 0.7058 and 0.7754, the root mean square errors (RMSE) were 0.0251 and 0.0221, and the relative errors (RE) were 0.1698 and 0.1525, respectively. The comparison shows that the SVM model is more accurate after verification.

The correlation between spectral reflectance and soil TN content was improved by applying SG smoothing, MSC and first-order differentiation together to the spectral reflectance data. A series of sensitive wavelengths (956 nm, 995 nm, 1020 nm, 1410 nm, 1659 nm and 2020 nm) were screened out by multiple linear stepwise regression analysis, which provides an important basis for improving the stability and reliability of the model.

For the RF and SVM models, the coefficients of determination (R^{2}) were 0.7058 and 0.7754, the RMSE were 0.0251 and 0.0221, and the relative errors (RE) were 0.1698 and 0.1525, respectively. The results show that, after verification, the SVM regression model estimates soil total nitrogen content more accurately than the RF regression model.

This paper was supported by the National Natural Science Foundation of China (41671346, 41271369), the Funds of Shandong “Double Tops” Program (SYL2017XTTD02) and the agriculture big data project of Shandong Agricultural University (75016).

Cao, S.J., Zhu, X.C., Li, C., Wei, Y., Guo, X.Y., Yu, X.Y. and Chang, C.Y. (2017) Estimating Total Nitrogen Content in Brown Soil of Orchard Based on Hyperspectrum. Open Journal of Soil Science, 7, 203-215. https://doi.org/10.4236/ojss.2017.79015