Capability of Visible-Near Infrared Spectroscopy in Estimating Soils Carbon, Potassium and Phosphorus
1. Introduction
For the large majority of countries, agriculture is an area of particular interest from social, economic and environmental points of view [1]. It plays a key role in the Malian economy and in the country's food security. Quantifying the soil nutrient levels available to plants contributes to the success of productive cropping systems and to a healthy environment [2]. In Mali, chemical methods are widely used to measure soil physical and chemical properties. However, these methods require chemical extractants that are expensive and harmful to humans and the environment. In this context, an alternative that can replace or reduce the use of conventional methods for soil property analysis is needed.
Spectroscopic methods are attractive for several reasons. For example, sample preparation involves only drying and grinding, so the chemical properties of the sample are not affected by the analysis. The measurement is also fast, and several soil properties can be estimated from a single scan. Moreover, the technique can be performed in the laboratory or directly in situ.
The first publications on the potential of visible and near-infrared (VIS-NIR) spectroscopy for soil analysis appeared in the early 1990s [3] [4]. Since then, many studies have used this technique, especially during the last decades [5]. Particular emphasis has been placed on classical soil properties such as soil organic matter (SOM), clay content, mineralogy, chemical nutrients, structure and microbial activity [5]. According to [6], calibrations for total organic carbon and total soil nitrogen are the most likely to be successful. Previous findings indicated coefficients of determination ranging from 0.23 to 0.92 for P and from 0.11 to 0.55 for exchangeable K [7] [8] [9]. However, predictions of P and K by spectroscopy remained unacceptable [10] [11].
Few studies on the development of spectroscopic methods have been carried out in African countries such as Mali. In Kenya, [12] used soil samples from across the Lake Victoria basin to investigate the potential of near infrared diffuse reflectance spectroscopy to estimate some soil properties. However, they recommended further calibrations using more diverse soil types, as well as tests of alternative methods based on infrared diffuse reflectance.
The interest in the VIS-NIR spectroscopy method is justified for several reasons. For instance, sample preparation involves only drying and grinding, the sample is not affected by the analysis in any way, and no chemicals (an environmental hazard) are required. In addition, the measurement is fast because it takes only a few seconds, and several soil properties can be estimated simultaneously from a single scan. Moreover, the technique can be used in the laboratory or directly in situ.
A network of infrared spectroscopy laboratories is supported by the World Agroforestry Center (ICRAF) in national institutions in Africa, currently in Côte d’Ivoire, Kenya, Malawi, Mali, Mozambique, Nigeria, and Tanzania. Despite this effort, the spectroscopy method is still rarely used for soil analysis in research institutions. The present study helps to document and demonstrate the potential of diffuse reflectance spectroscopy to estimate some soil properties in Mali.
The general objective of this study is to evaluate the performance of VIS-NIR spectroscopy by comparing the estimates of two regression models, namely Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR), for the determination of total carbon (C), available phosphorus (P) and exchangeable potassium (K).
2. Material and Methods
2.1. Sample Collection
Soil sampling and preparation were carried out by the soil laboratory “Laboratoire Sol-Eau-Plante” in Mali. The areas covered by the study are the administrative regions of Koulikoro and Sikasso, where 755 and 122 samples were collected, respectively. The soil samples were collected from the 0 - 10 cm depth. Figure 1 shows the geographical distribution of the sampling sites.
2.2. Measurements of Soil Reference Data
Soil reference data measurements were carried out in the soil laboratory “Laboratoire Sol-Eau-Plante” (LSEP) using standard laboratory methods. Soil samples were first air-dried, crushed and sieved to 2 mm. Total carbon was measured with an automatic titrator using the modified Anne method, which is based on the oxidation of soil carbon by potassium dichromate. Available phosphorus was extracted with a combined solution of 0.1 M HCl and 0.03 M NH4F, and the measurements were made using an ultraviolet (UV) spectrophotometer or a colorimeter. For exchangeable K, the soil was leached with a 1 M ammonium acetate solution at pH 7, and K was determined directly in the ammonium acetate percolate using a flame photometer.
2.3. Sample Selection Method
The 877 soil samples were partitioned into two subsets, a calibration set and a validation set. The selection was made so that all sampling sites are represented in both sets. Approximately 2/3 of the samples from each sampling site were selected to form the calibration set (587 samples), and the remaining 1/3 formed the validation set (290 samples).
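The site-wise partition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how samples were picked within a site, so the random draw, the seed, the function name `split_by_site` and the site labels are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_site(site_labels, calibration_fraction=2 / 3, seed=0):
    """Return (calibration_indices, validation_indices), stratified by site.

    Assumption: samples are drawn at random within each site; the paper
    does not state the actual selection rule.
    """
    rng = random.Random(seed)
    by_site = defaultdict(list)
    for index, site in enumerate(site_labels):
        by_site[site].append(index)

    calibration, validation = [], []
    for site in sorted(by_site):
        indices = by_site[site]
        rng.shuffle(indices)
        n_cal = round(len(indices) * calibration_fraction)
        # Keep at least one sample on each side when the site has two or more.
        if len(indices) >= 2:
            n_cal = min(max(n_cal, 1), len(indices) - 1)
        calibration.extend(indices[:n_cal])
        validation.extend(indices[n_cal:])
    return sorted(calibration), sorted(validation)


# Hypothetical labels: one well-sampled site and one sparsely sampled site.
labels = ["site_A"] * 60 + ["site_B"] * 4
cal, val = split_by_site(labels)
print(len(cal), len(val))
```

Splitting within each site, rather than over the pooled samples, is what guarantees that even a site with only a handful of samples contributes to both sets.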
2.4. Spectral Measurements
Spectral measurements and their processing were carried out at the Laboratory of Optics, Spectroscopy and Atmospheric Sciences (LOSSA). These measurements consist of recording the soil reflectance over the wavelength range of 342 - 1060 nm. Soil samples were first air-dried and crushed to pass a 2-mm sieve. A miniature fiber optic spectrometer working in the UV-VIS-NIR spectral range (BLUE-Wave Miniature Fiber Optic Spectrometers for UV-VIS-NIR & OEM, StellarNet Inc.) was used to perform the spectral measurements. The spectrometer is connected to a PC on which the Spectra Wiz software is installed to control the data acquisition. The samples were scanned using a tungsten halogen SL1 lamp source manufactured by StellarNet Inc. This light source has a wide spectral range of 350 - 2500 nm and is suitable for color, reflectance, transmittance and absorbance measurements. A Y-shaped optical fiber was used to carry light from the source to the sample and from the sample to the spectrometer (Figure 2(a)) with the same incident angle of illumination.
Figure 1. Maps representing the distribution of soil sampling sites in Mali.
2.5. Pre-Treatment of Spectra
Before being used, the raw spectral data undergo various pretreatments. The most common strategy for pre-processing spectra is to apply one or more mathematical transformations to the raw data to make them suitable for modeling.
Figure 2. (a) Scheme describing the spectroscopic equipment used for scanning soil samples and spectrum acquisition; (b) Raw spectrum; (c) filtered spectrum.
Sample spectra were filtered using the runmean function of the “caTools” package for the R statistical software. The filtering aims to remove interference related to the experimental conditions and the electronic noise of the measuring instrument.
Spectral reflectance was measured in the wavelength range of 342 - 1060 nm. Since the UV band is of little interest for the study of soil spectra, we restricted the spectral band to 400 - 1000 nm, with a spectral resolution of 0.5 nm. Figure 2(b) and Figure 2(c) show the effect of filtering on the spectrum of a randomly chosen sample. The filtered spectra were then converted to spectral absorbance (A) using the following relationship:
$$A = \log\left(\frac{1}{R}\right) \tag{1}$$
where R is the actual spectral reflectance.
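The pre-treatment chain (band restriction, running-mean filtering, conversion to absorbance) can be illustrated with a short Python sketch. It is not the authors' R code: the window width of 5 points, the shrinking window at the spectrum edges, the base-10 logarithm and all function names are assumptions made for the example.

```python
import math


def crop(wavelengths, values, low=400.0, high=1000.0):
    """Keep only the points whose wavelength lies in [low, high] nm."""
    pairs = [(w, v) for w, v in zip(wavelengths, values) if low <= w <= high]
    return [w for w, _ in pairs], [v for _, v in pairs]


def running_mean(values, window=5):
    """Centered moving average; the window shrinks near the two ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def reflectance_to_absorbance(reflectance):
    """Apply A = log10(1/R) pointwise; assumes 0 < R and a base-10 log."""
    return [math.log10(1.0 / r) for r in reflectance]


# Hypothetical noisy reflectance values on a 0.5 nm grid starting at 399 nm.
wavelengths = [399.0 + 0.5 * i for i in range(8)]
reflectance = [0.20, 0.21, 0.35, 0.22, 0.21, 0.23, 0.22, 0.24]
kept_wl, kept_r = crop(wavelengths, reflectance)
absorbance = reflectance_to_absorbance(running_mean(kept_r, window=5))
print(len(kept_wl), [round(a, 3) for a in absorbance])
```

Filtering before taking the logarithm keeps isolated noise spikes from being amplified by the nonlinear transformation.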
2.6. Chemometric Analysis of the Soil Chemistry and Spectral Data
The first part of the analysis is the calibration, which consists of developing a mathematical model to determine the chemical properties Y (unknown concentration) from the available spectral measurements X (spectral absorbance). The model is set up using the X and Y information of the calibration sample set. Once the model is established, it can be used to estimate the chemical properties of unknown samples. Two regression models were used in this analysis: principal component regression (PCR) and partial least squares regression (PLSR).
Validation is the second step of the analysis, which consists of evaluating the performance of the calibrated model by comparing its estimates with the reference values. During this phase, the accuracy of the model is evaluated on a set of independent samples, meaning samples that did not take part in the calibration process. A good prediction on these samples therefore indicates a good calibration model. If the model appears satisfactory, it can be applied in routine analysis of unknown samples.
2.6.1. PCR Model
Principal component regression (PCR) is a two-step multivariate analysis method. The first step consists of performing a principal component analysis (PCA) of the explanatory data matrix X to decompose it into two new matrices: the matrix T (X-scores) and the matrix P' (X-loadings). PCA creates new orthogonal variables T (latent variables) that are linear combinations of the original variables X_1, ..., X_p with the coefficients a_i:
$$T = a_1 X_1 + a_2 X_2 + \dots + a_p X_p \tag{2}$$
In the second step, a multiple linear regression (MLR) is established between the scores obtained and the measured (known) variable Y.
Principal component analysis is a way of dealing with the problem of poorly conditioned matrices. The objective is to obtain a small number of components that capture the maximum variation in the variables of the matrix X while ensuring a certain quality of prediction for the model. PCR can thus be considered a linear regression method in which the response variable is regressed on the new components.
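The two steps of PCR can be sketched in plain Python as follows. This is an illustrative implementation under assumptions, not the software used in the study: the components are extracted one at a time with NIPALS-style iterations, a single response is modeled, and the names `pcr_fit` and `pcr_predict` are hypothetical. Because the scores are orthogonal, the multiple linear regression of the second step reduces to one coefficient per component.

```python
import math


def pcr_fit(X, y, n_components, n_iter=1000, tol=1e-20):
    """Step 1: PCA of centered X (NIPALS). Step 2: regress y on the scores."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    yc = [v - y_mean for v in y]
    loadings, coefs = [], []
    for _component in range(n_components):
        col_ss = [sum(E[i][j] ** 2 for i in range(n)) for j in range(p)]
        j0 = max(range(p), key=lambda j: col_ss[j])
        if col_ss[j0] < 1e-24:
            break  # no variance left in X
        t = [E[i][j0] for i in range(n)]  # initial score vector
        for _iteration in range(n_iter):
            tt = sum(v * v for v in t)
            load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
            norm = math.sqrt(sum(v * v for v in load))
            load = [v / norm for v in load]  # unit-length loading vector
            t_new = [sum(E[i][j] * load[j] for j in range(p)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        tt = sum(v * v for v in t)
        # Orthogonal scores: each regression coefficient is fitted separately.
        coefs.append(sum(t[i] * yc[i] for i in range(n)) / tt)
        loadings.append(load)
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
    return {"x_mean": x_mean, "y_mean": y_mean,
            "loadings": loadings, "coefs": coefs}


def pcr_predict(model, X_new):
    """Project new samples on the loadings and apply the score regression."""
    predictions = []
    for row in X_new:
        e = [row[j] - model["x_mean"][j] for j in range(len(row))]
        y_hat = model["y_mean"]
        for load, c in zip(model["loadings"], model["coefs"]):
            y_hat += c * sum(e[j] * load[j] for j in range(len(e)))
        predictions.append(y_hat)
    return predictions


# Toy data (hypothetical): y depends exactly linearly on two "bands".
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
y = [2 * a - b + 1 for a, b in X]
model = pcr_fit(X, y, n_components=2)
print(pcr_predict(model, [[5.0, 5.0]]))
```

With real spectra the number of components would be much smaller than the number of wavelengths and is usually chosen by cross-validation; the text does not state how many components were retained.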
2.6.2. PLS Model
Partial least squares regression (PLSR) is based on components extracted from both the independent variable X and the dependent variable Y. The PLS model links an unknown variable Y to a block of explanatory variables X through latent variables that are linear combinations of the initial explanatory variables [13]. The latent variables explain as much as possible of the covariance between X and Y. The approach is to calculate the scores of X and Y and to set up a regression model between these scores.
The equation system below (Equation (3)) shows how the independent data matrix X is decomposed into a matrix T (X-scores) and a matrix P' (X-loadings), plus an error matrix E. Similarly, the dependent data matrix Y is decomposed into a matrix U (Y-scores) and a matrix Q' (Y-loadings), plus the error term F. These decompositions are made so as to maximize the covariance between the score matrices T and U.
$$X = T P' + E, \qquad Y = U Q' + F \tag{3}$$
The X-scores (Ti) are orthogonal and they are estimated as linear combinations of the original variables Xi. Thus, the matrix of latent variables T is a linear transformation of X.
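The decomposition can be illustrated for the simplest case of a single response (for example C, P or K modeled one at a time), often called PLS1. The sketch below uses the NIPALS algorithm in plain Python; it is an assumption-laden illustration rather than the implementation used in the study, and the names `pls1_fit` and `pls1_predict` are hypothetical.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS-style deflation."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                # y residuals
    weights, loadings, q = [], [], []
    for _component in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # no covariance left to model
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q_k = sum(f[i] * t[i] for i in range(n)) / tt  # y-loading
        # Deflation: remove what this component explains from X and y.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q_k * t[i] for i in range(n)]
        weights.append(w)
        loadings.append(load)
        q.append(q_k)
    return {"x_mean": x_mean, "y_mean": y_mean,
            "weights": weights, "loadings": loadings, "q": q}


def pls1_predict(model, X_new):
    """Score each new sample component by component, deflating as in the fit."""
    predictions = []
    for row in X_new:
        e = [row[j] - model["x_mean"][j] for j in range(len(row))]
        y_hat = model["y_mean"]
        for w, load, q_k in zip(model["weights"], model["loadings"], model["q"]):
            t = sum(e[j] * w[j] for j in range(len(e)))
            y_hat += q_k * t
            e = [e[j] - t * load[j] for j in range(len(e))]
        predictions.append(y_hat)
    return predictions


# Toy data (hypothetical): y depends exactly linearly on two "bands".
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
y = [2 * a - b + 1 for a, b in X]
model = pls1_fit(X, y, n_components=2)
print(pls1_predict(model, [[5.0, 5.0]]))
```

The difference from PCR is visible in the first line of the loop: the direction of each component is driven by the covariance between X and y, not by the variance of X alone.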
2.7. Statistics Assessing Model Performance
Both multivariate models were validated with an independent data set representing about 1/3 of the total sample. Model performance was assessed using standard statistics: the coefficient of determination (R2), the Bias and the root mean square error (RMSE). These statistics were computed using the following formulations:
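$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}, \qquad \mathrm{Bias} = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i), \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2} \tag{4}$$
with $y_i$ the values of the measurements, $\hat{y}_i$ the predicted values, $\bar{y}$ the average of the measurements and $n$ the number of samples.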
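As a worked illustration, the three statistics can be computed as in the sketch below. It assumes the common definitions (R2 as one minus the ratio of residual to total sum of squares, Bias as the mean of predicted minus measured values, RMSE as the square root of the mean squared residual); the function name and the numbers are hypothetical.

```python
import math


def validation_statistics(measured, predicted):
    """Return R2, Bias and RMSE for paired measured and predicted values."""
    n = len(measured)
    mean_measured = sum(measured) / n
    residuals = [p - m for m, p in zip(measured, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "Bias": sum(residuals) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


# Hypothetical reference values and model estimates.
stats = validation_statistics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print({name: round(value, 4) for name, value in stats.items()})
```

A bias close to zero together with a sizeable RMSE indicates errors that scatter around the reference values rather than a systematic over- or under-estimation.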
3. Results and Discussion
Descriptive statistics of the three soil properties C, P and K are summarized in Table 1. The difference between the minimum and maximum concentrations of these elements demonstrates their strong variability across the sampling sites. In addition, the statistics of the validation sample set were within the range of those of the calibration sample set for all three soil properties (Table 1).
3.1. Performance of Models Cross-Validation
Judged by all the statistical criteria (R2, Bias and RMSE), both models show good calibration quality. They have high coefficients of determination and low bias for all elements. The PCR gives the best calibration quality, with R2 above 0.80 for all elements (Table 2). This performance is better than that found by [14] for carbon, with R2 = 0.68 in the NIR region (700 - 2498 nm). The PLSR model also has good calibration quality, with coefficients of 0.801 for potassium, 0.872 for carbon and 0.881 for phosphorus (Table 2).
Table 1. Descriptive statistics of the soil reference data, consisting of 587 calibration samples and 290 validation samples.
*STD: Standard deviation.
However, the coefficient of determination is not the only parameter to be considered for assessing the performance of a model. The root mean square error and the Bias between the measured and predicted values are also used as statistics to evaluate the robustness of a model.
The RMSE obtained with the PCR method is 0.22, 1.00 and 0.17 for carbon, phosphorus and potassium, respectively. For the PLSR method, the RMSE is 0.21 for C, 1 for P and 0.15 for K. The biases obtained with both models (0.0004 for C, 0.0017 for P and 0.0003 for K) are small relative to the respective average values (0.51, 1.33 and 0.25).
These results could be further improved by distributing the samples more evenly among the sampling sites, as some sites are represented by nearly 60 samples while other sites have only 4 samples. It has been argued that calibrating predictive models on a limited number of samples, or on fairly homogeneous samples, may limit the scope of the calibration model [15].
3.2. Independent Validation
The performance of the prediction models varies from one chemical property to another and from one model to another. An independent validation of both multivariate models calibrated for the three chemical properties (C, K and P) reveals lower prediction performance than the cross-validation. The PCR model has coefficients of determination of 0.17 for carbon, 0.34 for potassium and 0.50 for phosphorus (Figures 3(a)-(c)). These values are lower than 0.87 (C) and 0.64 (K), obtained respectively by [16] with the band 1100 - 2500 nm and by [17] with the band 400 - 2498 nm.
The independent validation of the PLSR method yielded R2 = 0.29 for C, R2 = 0.42 for K and R2 = 0.57 for P (Figures 3(d)-(f)). These performances are lower than R2 = 0.66 (C) and R2 = 0.61 (K), obtained respectively by Sorensen and [18] with the band 408 - 2492 nm and by [19] with the band 400 - 2498 nm. This low performance can be attributed to the wavelength band used and to the large extent of the study area. Indeed, predictions of soil components by spectroscopy may fail if the samples are collected over a very large geographical area or from different morpho-pedological contexts. Some previous findings [20] attribute such failures to differences between the soil parent materials. However, for potassium, our result is better than R2 = 0.40 obtained by [19] under the same conditions.
Table 2. Statistics showing the performance of the PCR and PLSR models for cross-validation (calibration) and for independent validation. All the coefficients of determination are significant at the 95% level.
Figure 3. Scatter plots with regression lines showing the statistics comparing the reference values of C, P and K with their respective estimates from the PCR model ((a)-(c)) and the PLSR model ((d)-(f)).
Although the estimates show low performance for some chemical properties with both models, the PLSR gives better predictions than the PCR. This is because the PLSR components capture the information carried by the explanatory variables while also taking into account their link with the response variable.
The RMSE and the bias found are very low compared to the average of the reference data. For PCR, the RMSE was 0.23 for C, 1 for P and 0.17 for K, and the bias was 0.0125 for total carbon, 0.0709 for phosphorus and 0.002 for potassium. The PLSR model shows RMSE values of 0.22, 0.95 and 0.16 for C, P and K, respectively; the corresponding biases were −0.0108, 0.2023 and 0.009.
4. Conclusion
This study documents the potential of VIS-NIR diffuse reflectance spectroscopy for soil studies. This method of analysis is a very promising tool: it is rapid and easy to use, requires no chemicals, and can even be envisaged for in situ measurements in the field. The results show that the PLSR estimates outperform the predictions of the PCR model. The independent validation reveals that VIS-NIR spectroscopy over 400 - 1000 nm has limited performance for estimating some soil properties. Using the entire VIS-NIR spectral band (400 - 2500 nm) for the analysis of soil properties may therefore be considered. Creating a spectral database for each selected zone, in order to limit the extent of the study area, could be a promising way to achieve good results. For instance, soil spectral databases could be built for key areas of Malian agriculture, such as the “Office du Niger” and the “zone CMDT”, dedicated respectively to rice and cotton cultivation. Faster and easier prediction of the soil properties of these areas would contribute to the development of agriculture there, since these areas constitute the major agricultural basins of Mali and have a considerable impact on the country's economy.
Acknowledgements
We acknowledge the International Science Program (ISP/IPPS) for supporting the Laboratory of Optics, Spectroscopy and Atmospheric Sciences (LOSSA) of the “Faculté des Sciences et Techniques de Bamako”. Our gratitude goes to the “Laboratoire Sol-Eau-Plante de l’IER” for providing the soil samples and the reference chemical properties.