Estimating the Texture of Purple Soils Using Vis-NIR Spectroscopy and Optimized Conversion Models
1. Introduction
Soil texture describes soil particle composition, expressed as the mass percentage of particles in each size class [1]. It is a relatively stable natural attribute of the soil and affects the fertilization response, water-holding capacity, aeration, and ease of tillage [2]. More specifically, soil texture regulates crop growth and development by determining runoff infiltration, soil water, air, and heat conditions, and the transformation rate and existing state of soil nutrients, thereby affecting root development and crop yield and quality [3]. Different soil textures have distinct agricultural production traits; therefore, it is necessary to understand the soil texture when using, managing, and improving soil [4]. Moreover, soil texture modulates land surface and atmospheric processes, such as soil-plant-atmosphere interaction, soil erosion, and soil solute transport [5]. Therefore, rapid acquisition of soil texture information is of great significance for land management aimed at improving agricultural productivity and ecosystem services.
Soil texture is conventionally determined by laboratory methods and in-situ field measurements. Laboratory measurements generally adopt specific gravity and suction tube methods, and field measurements employ dry test and wet test methods. These methods require large-scale field sampling and pretreatment efforts [6]. Thus, they are time-consuming and labor-intensive and cause irrecoverable damage to samples [7]. The laser diffraction method offers an alternative option [8]. In recent years, soil Vis-NIR reflectance spectroscopy has been increasingly used to resolve the trade-off between the need for large-scale soil information and its high cost [9]. Its appeal is that a single measurement can be used to assess a wide range of soil properties, thus facilitating the analysis of many samples in a short time [10]. It can predict chemical and physical soil properties such as soil organic matter, texture, and clay minerals [11].
It is difficult to predict soil texture directly from the raw spectrum. Spectral preprocessing is an important initial step that improves the accuracy of the prediction results [12]. Approaches used for spectral preprocessing include Savitzky-Golay (SG) smoothing [13] and multiplicative scattering correction (MSC) [14]. Because soil components interact with each other in a complex way to produce their spectra, there is a strong correlation between the adjacent bands of the spectral data. However, not all bands have an equal impact on the subsequent spectral processing [15]. Therefore, the importance of characteristic band selection in spectral data processing and analysis has been widely recognized, and it has become a key step in hyperspectral data modeling and analysis [16]. It is also necessary to select an appropriate calibration model for quantitative measurement. Using appropriate calibration methods is key to the success of reflectance spectrum calibration. Common calibration methods include linear and nonlinear approaches. A linear regression model establishes simple or multivariate linear regression equations based on the variables and auxiliary information of known points in order to predict unknown points [17] [18]. Commonly used linear methods include stepwise multiple linear regression (SMLR) [19] and partial least squares regression (PLSR) [20]. Among nonlinear methods, machine learning is well suited to multi-dimensional, nonlinear, massive data and can improve the generalization ability of the model [21] [22]. Common models include random forest (RF) [23], support vector machine (SVM) [24], and neural networks (NNs) [25].
Purple soil, one of the soil types defined within the genetic soil classification of China, is the most important and widely distributed type of arid land soil in southern China. Purple soil is formed by weathering of Mesozoic and Cenozoic purple sandstone (including Triassic, Jurassic, Cretaceous, and Tertiary) [26]. It weathers rapidly and has a high mineral content, but it also has a shallow soil layer and low aggregate content. Coupled with the influence of human farming activities, these defects make purple soil susceptible to environmental disturbances such as reservoir fluctuations and heavy rainfall [27], which seriously threatens the safety of village buildings and roads and hinders the development of agricultural production [28]. However, there are few studies on the estimation of purple soil texture, especially across the three types of purple soil. Therefore, it is necessary to study the Vis-NIR reflectance spectra of calcareous (pH > 7.5), neutral (6.5 ≤ pH ≤ 7.5), and acidic (pH < 6.5) purple soils and their relationship with soil texture, and to assess the suitability of Vis-NIR spectroscopy coupled with SMLR, PLSR, and back-propagation neural network (BPNN) methods for quickly obtaining soil textural information in Chongqing.
2. Materials and Methods
2.1. Site Description
The sampling sites are located in the Tongnan and Beibei Districts of Chongqing. Tongnan District is characterized by a shallow hill landform of the basin. It lies between 105°31' - 106°00'E and 29°47' - 30°26'N at altitudes of 300 to 400 m. The purple soil in this area accounts for 46.30% of the total cultivated land, and the main soil type is calcareous purple soil. Beibei District belongs to the parallel ridgeline area of Chongqing. It lies between 106°18' - 106°40'E and 29°27' - 30°05'N at altitudes of 500 to 900 m, with steeper terrain than that in Tongnan. Paddy soil and purple soil account for 81.7% of the soil area in Beibei District, and the main soil types are neutral and acidic purple soils. Both districts have a humid subtropical monsoon climate, with annual average temperatures ranging from 16°C to 18°C, annual rainfall ranging from 1000 to 1350 mm, and total annual sunshine of 1000 to 1200 hours.
2.2. Soil Sampling and Analysis
Soil sampling was carried out using a five-point mixing method. A total of 62 calcareous purple soil samples were collected from the Gaohe catchment in Tongnan, and 63 neutral purple soil samples and 65 acidic purple soil samples were collected from the Jigongshan and Baihelin catchments in Beibei. All samples were air-dried, disaggregated, and sieved using a 2-mm mesh after removal of stones, plant roots, and litter. Soil texture was determined using laser particle size analysis (Malvern MS2000) and grouped into three fractions: clay (<0.002 mm), silt (0.002 - 0.05 mm), and sand (0.05 - 2 mm). The wide difference in the percentage contents of clay and sand in soil samples leads to varied spectral properties [29]; therefore, this paper studied only the contents of clay and sand. Two-thirds of the clay and sand content data were used for calibration and one-third for validation according to a stratified sampling approach.
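The two-thirds/one-third stratified split can be sketched as follows. This is a minimal sketch under the assumption that stratification means ranking the samples by measured content and sending every third ranked sample to validation; the function name `stratified_split` and the choice of which sample in each triplet goes to validation are illustrative and are not taken from this study.

```python
def stratified_split(contents):
    """Return (calibration_idx, validation_idx) for a list of contents (%).

    Assumption: samples are ranked by content and every third ranked sample
    is assigned to validation, so both subsets span the full value range.
    """
    ranked = sorted(range(len(contents)), key=lambda i: contents[i])
    validation = [idx for rank, idx in enumerate(ranked) if rank % 3 == 1]
    calibration = [idx for rank, idx in enumerate(ranked) if rank % 3 != 1]
    return calibration, validation


if __name__ == "__main__":
    clay = [12.1, 8.4, 19.7, 15.3, 22.0, 10.9]  # made-up clay contents, in %
    cal, val = stratified_split(clay)
    print("calibration indices:", cal)
    print("validation indices:", val)
```

Ranking before splitting keeps low, medium, and high contents represented in both subsets, which a purely random split of about 60 samples does not guarantee.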
2.3. Spectra Measurements
The spectral reflectance of the soil samples was measured using an ASD FieldSpec 4 spectroradiometer (Analytical Spectral Devices Inc., USA). We used a wavelength range of 350 - 2500 nm, with a sampling interval of 1.4 nm between 350 and 1000 nm and 2 nm between 1000 and 2500 nm, covering a total span of 2150 nm. A 50 W halogen light source providing parallel light was placed at a zenith angle of 45° and at a distance of 30 cm from the soil, which was held in a black sample dish with a diameter of 6.5 cm and a depth of 2 cm. To reduce the diffuse reflection of light, the soil surface was scraped flat, and a calibration was performed before each measurement. The spectral reflectance of each sample was measured at four angles (rotating the dish 90° clockwise each time) with five replications, and the average of these 20 measurements was taken as the standard spectral reflectance of the sample.
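The averaging of the 20 scans per sample (four dish orientations with five replicates each) amounts to a band-by-band mean. The sketch below is a minimal illustration; the function name `mean_spectrum` and the list-of-lists data layout are assumptions.

```python
def mean_spectrum(scans):
    """Average replicate scans band by band.

    `scans` is a list of equally long reflectance lists, for example
    4 rotation angles x 5 replicates = 20 scans for one soil sample.
    """
    if not scans:
        raise ValueError("at least one scan is required")
    n_bands = len(scans[0])
    if any(len(scan) != n_bands for scan in scans):
        raise ValueError("all scans must cover the same bands")
    return [sum(scan[b] for scan in scans) / len(scans) for b in range(n_bands)]
```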
2.4. Spectrum Preprocessing
The SG convolution smoothing method [30] was used to pretreat the spectral reflectance data, and the smoothed spectra were then transformed to increase the correlation between spectral reflectance and the soil's physical and chemical properties [31]. Three spectral transformations were used: continuum removal (CR), first-order differential reflectivity (R'), and second-order differential reflectivity (R"). All three effectively highlight the reflection and absorption characteristics of the spectral curve. In addition, R' and R" can be used to quickly locate inflection points and the positions of maximum and minimum reflectance within the wavelength range. The calculation method is as follows:
(1)
(2)
(3)
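Here, $\lambda_i$ is the wavelength band at i nm, $\lambda_{i+1}$ is the wavelength band at i + 1 nm, and $R(\lambda_i)$ is the spectral reflectance of band $\lambda_i$. $R'(\lambda_i)$ and $R''(\lambda_i)$ are the first- and second-order differential reflectivity of band $\lambda_i$, where $\Delta\lambda = \lambda_{i+1} - \lambda_i$ = 10 nm.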
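As an illustration of the differential transformations, the following Python sketch computes R' and R" by forward differences. It assumes the smoothed spectrum has already been resampled to a uniform 10 nm spacing; the forward-difference form and the function names are illustrative assumptions rather than the exact expressions used in this study.

```python
def first_derivative(reflectance, step_nm=10.0):
    """Forward difference: (R at the next band - R at this band) / step_nm.

    Assumes uniformly spaced bands; the result has one element fewer
    than the input.
    """
    return [(nxt - cur) / step_nm for cur, nxt in zip(reflectance, reflectance[1:])]


def second_derivative(reflectance, step_nm=10.0):
    """Apply the forward difference twice (two elements fewer than the input)."""
    return first_derivative(first_derivative(reflectance, step_nm), step_nm)


if __name__ == "__main__":
    smoothed = [0.10, 0.12, 0.17, 0.25, 0.30]  # made-up reflectance values
    print(first_derivative(smoothed))
    print(second_derivative(smoothed))
```

Because differencing amplifies high-frequency noise, it is applied after smoothing, which matches the order of operations described in this section.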
2.5. Model Description
2.5.1. Linear Modeling
In this study, we used the SMLR and PLSR methods for linear modeling. SMLR predicts the dependent variable from the best combination of several independent variables. The key is that the independent variables are selected so as to retain the bands most significantly related to the dependent variable while keeping the number of independent variables as small as possible. PLSR is a linear regression model that combines the methods of multivariate regression analysis and principal component analysis. It can handle multicollinearity among independent variables and cases in which the number of samples is smaller than the number of variables [32], and it can also effectively separate spectral information from noise, thereby reducing the spectral dimension and data redundancy in spectral modeling [33].
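To illustrate how PLSR copes with collinear bands, the sketch below fits a PLS1 model with a single latent variable in plain Python. It is a simplified, assumed implementation (one component, no cross-validation); the function names are hypothetical, and the models in this study were not necessarily built this way.

```python
import math


def fit_pls1_one_component(X, y):
    """Fit a one-latent-variable PLS1 model; X is a list of rows (samples x bands)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in band space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0:
        raise ValueError("X and y are uncorrelated; no component can be extracted")
    w = [v / norm for v in w]
    # Scores: projection of each centered spectrum onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centered response on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "q": q}


def predict_pls1(model, row):
    """Predict the response for one spectrum (list of band values)."""
    score = sum((x - m) * wj for x, m, wj in zip(row, model["x_mean"], model["w"]))
    return model["y_mean"] + model["q"] * score


if __name__ == "__main__":
    # Two perfectly collinear bands (made-up data): band 2 = 2 x band 1.
    X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
    y = [1.0, 2.0, 3.0]
    model = fit_pls1_one_component(X, y)
    print(predict_pls1(model, [4.0, 8.0]))
```

With two perfectly collinear bands, ordinary least squares has no unique solution, whereas the single latent variable still yields a prediction.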
2.5.2. Non-Linearity Modeling
A back-propagation neural network (BPNN) is an information processing system modeled on the structure and function of the brain's neural network [34]. It can fit complex nonlinear functions by learning from a large number of samples and has strong nonlinear mapping and generalization abilities [35]. In this study, the BPNN was implemented in MATLAB, and the network consisted of an input layer, a hidden layer, and an output layer. The soil samples were divided into a model set, a test set, and a verification set in the proportion 4:2:3, and the data were normalized during the learning process.
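The 4:2:3 partition and the normalization step can be sketched as below. The rounding rule (leftover samples go to the model set) and the linear rescaling to [-1, 1] are assumptions made for illustration; the text does not state which normalization was applied, and the function names are hypothetical.

```python
def split_counts(n_samples, ratio=(4, 2, 3)):
    """Sizes of the model, test, and verification sets for a given ratio.

    Assumption: test and verification sizes are rounded down and any
    leftover samples are assigned to the model set.
    """
    total = sum(ratio)
    test = n_samples * ratio[1] // total
    verification = n_samples * ratio[2] // total
    model = n_samples - test - verification
    return model, test, verification


def min_max_scale(values, low=-1.0, high=1.0):
    """Rescale values linearly to [low, high] (assumed normalization scheme)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [(low + high) / 2.0 for _ in values]
    span = v_max - v_min
    return [low + (high - low) * (v - v_min) / span for v in values]


if __name__ == "__main__":
    for n in (62, 63, 65):  # sample counts of the three purple soils
        print(n, split_counts(n))
```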
2.5.3. Prediction Accuracy
The three models were evaluated using the coefficient of determination (R2), the root mean square error (RMSE), and the ratio of performance to inter-quartile distance (RPIQ). A larger RPIQ indicates a better model fit. The values of R2, RMSE, and RPIQ were calculated as follows:
(4)
(5)
(6)
Here, $y_i$ is the measured value, $\hat{y}_i$ is the predicted textural value, and n is the number of samples. IQ is the difference between the third and the first quartile of the sample.
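The three indicators can be computed as in the sketch below, which uses their commonly cited definitions: R2 as one minus the ratio of residual to total sum of squares, RMSE as the root of the mean squared residual, and RPIQ as IQ divided by RMSE. These forms, the quartile interpolation method, and the function name are assumptions; the study may have used slightly different variants (for example, a squared correlation coefficient for R2).

```python
import math
import statistics


def r2_rmse_rpiq(measured, predicted):
    """Return (R2, RMSE, RPIQ) for paired measured and predicted values."""
    n = len(measured)
    mean_y = sum(measured) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    # IQ = Q3 - Q1 of the measured values; "inclusive" interpolation is an
    # assumed choice and other quartile methods give slightly different IQ.
    q1, _, q3 = statistics.quantiles(measured, n=4, method="inclusive")
    rpiq = (q3 - q1) / rmse
    return r2, rmse, rpiq


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0, 4.0, 5.0]      # made-up values
    predicted = [1.5, 2.5, 2.5, 4.5, 4.5]
    print(r2_rmse_rpiq(measured, predicted))
```

Because RPIQ scales the error by the inter-quartile range rather than the standard deviation, it remains meaningful for skewed distributions of clay or sand content.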
3. Results
3.1. Descriptive Statistics of the Soil Texture
The descriptive statistics of clay and sand contents in the three purple soils are shown in Figure 1 and Figure 2. The clay content of all soil samples ranged from 4.40% to 27.12%, and the sand content ranged from 0.34% to 36.57%. According to the USDA soil texture classification triangle, the soil samples fell into three textural classes (silt, silt loam, and silty clay loam). The skewness and kurtosis values of the clay content varied from −0.17 to 1.88, roughly fitting a normal distribution. The mean and standard deviation of clay content in neutral purple soil were 19.18% and 2.14%, respectively, and the difference in clay content was larger than that of calcareous and acidic purple soils. The mean and standard deviation of sand content in acidic purple soil were 18.26% and 8.71%, respectively, and the difference in sand content was larger than that of the other two soils. In addition, the coefficients of variation (CV) of the clay and sand contents of the three purple soils fell within the range of 10% to 100%, indicating moderate variation.
Figure 1. Descriptive statistics of soil clay content. SD: standard deviation, CV: variation coefficient.
Figure 2. Descriptive statistics of soil sand content. SD: standard deviation, CV: variation coefficient.
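As a quick arithmetic check of the variation coefficient, CV is the standard deviation divided by the mean, expressed in percent. The sketch below applies this to the two mean/SD pairs reported in the text; the function names are illustrative, and the "weak" and "strong" labels follow a common convention that is assumed here (only the 10% to 100% moderate range is stated in this paper).

```python
def coefficient_of_variation(mean, sd):
    """CV in percent: 100 * SD / mean."""
    return 100.0 * sd / mean


def variability_class(cv_percent):
    """Classify variability; only the 'moderate' range comes from the text."""
    if cv_percent < 10.0:
        return "weak"      # assumed conventional label
    if cv_percent <= 100.0:
        return "moderate"
    return "strong"        # assumed conventional label


if __name__ == "__main__":
    # Mean and SD values reported in Section 3.1.
    for label, mean, sd in [("clay, neutral soil", 19.18, 2.14),
                            ("sand, acidic soil", 18.26, 8.71)]:
        cv = coefficient_of_variation(mean, sd)
        print(f"{label}: CV = {cv:.1f}% ({variability_class(cv)})")
```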
3.2. Correlation of Transformed Reflectance Spectra and Soil Texture
The correlation between the spectral reflectance of the three purple soils and their clay content in different bands is shown in Figure 3. In the original spectrum, the clay content of the calcareous and neutral purple soils had a low correlation with spectral reflectance, whereas the clay content of the acidic purple soil had a very significant correlation with spectral reflectance, and all of its bands were sensitive bands. After the CR, R', and R" transformations, the correlation between reflectance and clay content increased markedly. The sensitive bands corresponding to the highest correlation coefficients after the CR and R' transformations were concentrated at 2417 - 2437 nm and 433 - 566 nm, respectively, and the absolute values of the correlation coefficients were 0.72 and 0.83, respectively.
The correlation between the spectral reflectance of the three purple soils, together with its various transformations, and the sand content over a wavelength range of 350 - 2500 nm is shown in Figure 4. The original reflectance of calcareous and acidic purple soils showed a very significant correlation with their sand content, with sensitive bands located at 879 - 1364 nm and 350 - 2500 nm, respectively. However, the correlation between the original reflectance of neutral purple soil and the sand content was not significant. After the CR transformation, the spectral reflectance of calcareous purple soil had the highest correlation coefficient with sand content (0.79), and the sensitive bands shifted to 2417 - 2437 nm. The correlation coefficient between the R'-transformed reflectance and sand content was lower than that of the CR transformation, at 0.70. Compared with the CR and R' transformations, the reflectance after the R" transformation had the highest correlation coefficient with sand content, with a value up to 0.85, and its sensitive bands were concentrated between 448 and 467 nm.
Figure 3. Correlation between the spectral reflectance and clay content under different spectral transformations for calcareous, neutral, and acidic purple soils.
Figure 4. Correlation between sand content and the spectral reflectance under different spectral transformations for the calcareous, neutral, and acidic purple soils.
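The band-wise correlation analysis, and the selection of the bands with the highest absolute correlation used later for modeling, can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up data; it assumes the Pearson correlation coefficient, which the text does not name explicitly.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def top_bands(spectra, contents, wavelengths, k=10):
    """Return the k wavelengths whose reflectance correlates most strongly
    (in absolute value) with the measured clay or sand content.

    `spectra` is a list of rows (samples x bands) aligned with `wavelengths`.
    """
    scored = []
    for j, wl in enumerate(wavelengths):
        band = [row[j] for row in spectra]
        scored.append((abs(pearson_r(band, contents)), wl))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [wl for _, wl in scored[:k]]


if __name__ == "__main__":
    spectra = [[1.0, 5.0, 0.0], [2.0, 3.0, 0.0], [3.0, 4.0, 0.0]]  # made-up
    sand = [10.0, 20.0, 30.0]
    print(top_bands(spectra, sand, [450, 460, 470], k=2))
```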
3.3. Model Calibration and Validation
3.3.1. SMLR and PLSR Modeling
Table 1 summarizes the SMLR estimation results of the clay and sand contents in the three purple soils using the 10 bands with the highest correlations. Across the four spectral indices, the R2 of clay and sand contents after the CR, R', and R" transformations was higher than that of the original reflectance, except that the R2 of clay content in acidic purple soil after the CR transformation was lower than that of the original reflectance. The modeling effect of clay and sand contents in the three purple soils after the R" transformation was better than that of the other two transformations. Among them, the acidic purple soil after the R" transformation had the best prediction effect, with R2 of 0.822 and 0.675, validation RMSE (RMSEV) of 0.868 and 4.772, and RPIQ of 2.135 and 1.871 for clay and sand contents, respectively. Overall, the modeling effect of clay content was better than that of sand content. In short, the clay content in acidic purple soil after the R" transformation had the best modeling effect.
Table 1. SMLR model results for clay and sand contents in the three purple soils after applying the four transformations.
Notes: R: reflectivity; CR: continuum-removal; R': first-order differential reflectivity; R": second-order differential reflectivity; RMSE: root mean square error; RPIQ: the ratio of performance to inter-quartile distance.
The calibration and validation sets of the PLSR model were obtained by extracting the 10 characteristic bands with the highest correlation coefficients for the clay and sand contents of the three purple soils (Table 2). The modeling effect after mathematical transformation was better than that of the original reflectance, and the prediction effect after the R" transformation was better than that after the other transformations. The modeling results of clay and sand contents in acidic purple soil were better than those of neutral and calcareous purple soils, with R2 > 0.736 for the modeling set and R2 > 0.744 for the verification set. Among them, the prediction accuracy for clay content was higher than that for sand content (R2 of 0.832, RMSEV of 0.851, and RPIQ of 2.056), whereas the modeling and validation sets of the sand content had a larger RMSE of 4.772, indicating a large deviation between the predicted and measured values. In conclusion, the clay content of acidic purple soil after the R" transformation had the best model prediction ability.
Table 2. PLSR model results for clay and sand contents in the three purple soils after four transformations.
Notes: R: reflectivity; CR: continuum-removal; R': first-order differential reflectivity; R": second-order differential reflectivity; RMSE: root mean square error; RPIQ: the ratio of performance to inter-quartile distance.
3.3.2. BPNN Modeling
As mentioned above, soil texture and spectral reflectance had the highest correlation coefficient and the best modeling effect after the R" transformation. Thus, the R"-transformed spectral data over the 350 - 2500 nm waveband were used as the input layer, and tansig and purelin were employed as the transfer functions of the hidden and output layers, respectively. In addition, the learning rate, the maximum training time, and the expected error of the model were set to 0.02, 1000, and 0.0001, respectively. The results of the modeling and verification sets are shown in Table 3. The clay and sand contents of the three purple soils were predicted with high accuracy (R2 > 0.5), indicating that the model after the R" transformation has good generalization ability and a high degree of fit. The modeling effect for acidic purple soil was better than that for calcareous and neutral purple soils (R2 > 0.7, RPIQ > 1.9), but the sand content of acidic purple soil had a larger RMSE of 4.114. By comparison, the BPNN model of clay content in acidic purple soil had a better prediction effect.
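The forward pass of such a three-layer network can be sketched in Python as below. MATLAB's tansig is mathematically equivalent to the hyperbolic tangent and purelin is the identity, so the sketch uses `math.tanh` directly. The weights in the example are made up, the function and dictionary names are hypothetical, and reading the "maximum training time" of 1000 as a number of training iterations is an assumption.

```python
import math

# Training settings reported in the text; the key names are illustrative and
# "max_iterations" reflects an assumed reading of "maximum training time".
SETTINGS = {"learning_rate": 0.02, "max_iterations": 1000, "expected_error": 0.0001}


def tansig(x):
    """Hidden-layer transfer function; equivalent to 2 / (1 + exp(-2x)) - 1."""
    return math.tanh(x)


def purelin(x):
    """Output-layer transfer function (identity)."""
    return x


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: inputs -> tansig hidden layer -> purelin output."""
    hidden = [
        tansig(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return purelin(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


if __name__ == "__main__":
    # Two input bands, two hidden neurons, one output (made-up weights).
    prediction = forward(
        inputs=[0.3, -0.1],
        hidden_weights=[[0.5, -0.2], [0.1, 0.4]],
        hidden_biases=[0.0, 0.1],
        output_weights=[1.2, -0.7],
        output_bias=0.05,
    )
    print(prediction)
```

Training would adjust the weights and biases by back-propagating the prediction error until the expected error or the iteration limit is reached; that step is omitted here for brevity.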
4. Discussion
4.1. Effect of Mathematical Transformation
Table 3. BPNN modeling results of clay and sand contents for the three purple soils following the R" transformation.
Notes: RMSE: root mean square error; RPIQ: the ratio of performance to inter-quartile distance.
There is a certain correlation between soil texture and spectral reflectance. The bands respond more sensitively to texture as the correlation increases, which is conducive to the determination of characteristic bands. To improve the correlation between soil texture and spectral reflectance, mathematical transformation of the original spectral reflectance is necessary. In this study, the correlation improved after the CR, R', and R" transformations of the original reflectance; in particular, the correlation coefficient rose to 0.8 with the R" transformation. This is consistent with the finding that the correlation between soil moisture content and spectral reflectance of the three purple soils was highest after the R" transformation [36]. These results indicate that the second derivative effectively eliminates the baseline effect and enhances micro-absorption characteristics [37]. Mathematical transformation can also improve the accuracy of the model [32]. In this study, the prediction model after the R" transformation had the best effect (R2 > 0.5), indicating that the model after the R" transformation has a high degree of fit and good generalization ability.
4.2. The Performance of the Three Models
The linear (SMLR and PLSR) and nonlinear (BPNN) prediction models of the texture (clay and sand contents) in the three purple soils were established, and the adaptability of the three models was evaluated by calculating R2, RMSE, and RPIQ. For calcareous purple soil, the R2 of the SMLR model (R2 = 0.682) was slightly higher than that of the PLSR and BPNN models, and the SMLR model had the lowest RMSE (RMSE = 0.932). For neutral and acidic purple soils, the prediction effect of the PLSR model was better than that of SMLR and BPNN. This may be because the three purple soils have different physical and chemical properties, resulting in different spectral characteristics, so that different prediction models suit the three purple soils [38].
Moreover, the linear models had a better degree of fit than the nonlinear model in this study. This may be because the linear models used the characteristic bands for modeling, whereas the nonlinear model requires training on large data sets and used the full band. The full band (350 - 2500 nm) contains redundant information (including noise and repeated variables), and including these variables when constructing estimation models of soil physical and chemical properties may reduce the prediction accuracy and reliability of the model [39]. Compared with the nonlinear model, the linear model can overcome the multicollinearity problem directly by applying statistical rotation to simulate the relationship [40], and it is simple to operate and stable. It is therefore recommended to use the linear method to quickly estimate the soil texture of the three purple soils. However, although the linear model is less time-consuming and less costly than traditional measurement methods, it has a lower prediction accuracy in estimating soil texture.
4.3. Limitations of This Study
Soil spectral reflectance is affected by texture, moisture, and other physical and chemical factors. The different percentage contents of clay, sand, and silt cause scattering effects during spectral measurement, which mask the correlation between the soil components of interest and the spectrum [41]. Therefore, in this study, soil samples were ground and sieved to ensure the uniformity of soil particle size. Moreover, the influence of soil moisture on soil surface spectral reflectance is complex, which makes it difficult to directly interpret the spectral parameters corresponding to soil characteristics from the measured spectral data [42]. Therefore, in this study, the effect of soil moisture on spectral reflectance was eliminated by using air-dried soil, for which clay content can be estimated better than for wet soil [43].
A limitation is that the estimation accuracy of soil texture in the field could be negatively affected by several factors, such as spectral mixture, atmospheric conditions, and spatial variability of soil moisture content, whereas our soil spectra were collected indoors [44] [45]. In particular, soil moisture may vary widely in the field, which will reduce the prediction accuracy of soil texture [46]. Therefore, the models of this study need to be further optimized before they are applied in the field. Future research should focus on integrating spectra obtained in the field or in the laboratory with spectra from airborne or satellite imagery [47] [48], and should evaluate the external parameter orthogonalisation (EPO) method, which projects all soil spectra orthogonal to the space of unwanted variation (i.e., moisture) to eliminate the effects of moisture on the spectra [49].
5. Conclusion
In this study, different pre-processing methods were used to estimate the soil texture of the three purple soils based on Vis-NIR spectroscopy. The soil samples fell into three textural classes (silt, silt loam, and silty clay loam). The CR, R', and R" pre-processing methods had a strong positive influence on the correlation analysis and on the performance of the three models, with R" performing best. On this basis, the SMLR, PLSR, and BPNN models were established and their accuracy was compared. The results show that the SMLR model is more suitable for estimating the texture of calcareous purple soil, and the PLSR model is the optimum approach for neutral and acidic purple soils. The SMLR and PLSR models also provided better estimations than the BPNN model in the calibration and verification steps. However, these models were developed under well-controlled laboratory conditions; future research will consider extending them to field and large-scale applications.
Acknowledgements
This research was supported by the Chongqing Talent Program (CQYC201905009), the Science Fund for Distinguished Young Scholars of Chongqing (cstc2019jcyjjqX0025), the Sichuan Science and Technology Program (2020YJ0202, 2021YFS0288), and the Technology Research Project of Chongqing Municipal Education Commission (JQN201800531).