Hyperspectral Analysis for a Robust Assessment of Soil Properties Using Adapted PLSR Method

Visible and near-infrared (Vis-NIR) spectroscopy is a promising tool for quantifying soil properties. After signal processing and model calibration, the information encoded in hyperspectral data can be used to estimate various soil properties through appropriate statistical models. However, one problem encountered with hyperspectral data is information redundancy between spectral bands. This redundancy causes multicollinearity in the explanatory variables, which makes the regression coefficients unstable and difficult to interpret. Moreover, in a hyperspectral spectrum, the information concerning a chemical specificity is spread over several wavelengths. Therefore, it is unwise to remove this redundancy, because the removal affects both relevant and irrelevant hyperspectral information. In this study, the challenge is to optimize the estimation of some soil properties by exploiting the full spectral richness of the hyperspectral data, so that the bands provide complementary rather than redundant information. To this end, a new, reliable approach based on hyperspectral data analysis and partial least squares regression (PLSR) is proposed.


Introduction
Soil is part of the natural environment and one of the most valuable natural resources. Given its importance in environmental management, sustainable agriculture and hydrology, the reliable and rapid assessment of soil properties is a crucial challenge in environmental monitoring. Various remote sensing data, including multispectral and hyperspectral data [1], have been widely used to identify and map soil properties (such as saline soils). Moreover, several studies have sought to exploit the reflectance spectrum across the visible, near-infrared and shortwave-infrared regions for the assessment of soil properties [2]. The use of hyperspectral data to estimate soil characteristics has gradually turned toward predicting soil physical and chemical properties, and soil spectral reflectance encapsulates the information needed to qualify and quantify all predictable properties. According to [3] [4], a soil property is predictable by Vis-NIR spectroscopy if it is correlated with a chemical specificity. The interaction between matter and electromagnetic waves within the soil sample is, however, not directly exploitable because of the great variability of the phenomenon [5] [6]. A large number of variables are involved, some of which are poorly known (such as incidence angles, light intensity and the distribution of soil components), so the phenomenon is best described by statistical modeling. Therefore, to build a predictive model of soil properties, it is more efficient to rely on learning-based methods such as the linear regression approach. An overview of these approaches and of how they provide optimal results under certain circumstances is given in [7] [8]. These approaches assume that the data satisfy a number of assumptions, such as linearity.
In practice, the data deviate to some extent from these assumptions, which makes the models inefficient in several situations. One problem encountered with hyperspectral data is information redundancy between spectral bands. This redundancy creates multicollinearity in the explanatory variables and makes the regression coefficients unstable and difficult to interpret. A hyperspectral spectrum is composed of a few hundred wavelengths, and the information concerning a given chemical specificity is spread over several of them. Therefore, it is inadvisable to remove this redundancy, since the removal severely affects the hyperspectral information. The challenge in this study is to exploit the full spectral richness of the data, so that the bands provide complementary rather than redundant information.
The idea is to weight each band according to its importance in the projection, so that important bands are favored over less important ones.
The PLSR model computes, for each band, a parameter called VIP (variable importance in the projection), which we use for this weighting.
The new model based on this weighting is better suited to the data and has a better coefficient of determination. This paper is organized as follows. In Section 2, the materials and methods are

Investigation Area
The investigated site is located in the Governorate of Zaghouan in Tunisia; it is part of the SMINJA and BOURBIAA plain (Figure 1). The area covers 34,000 ha and is bounded to the east by the locality of Zaghouan, to the north by Bir Mchergua and to the west by the locality of Jabbes. It has a semi-arid climate with mild winters. The average annual rainfall, recorded at the station of the Agricultural Development Corporation [9] over a period of 11 years, is 390 mm. This rainfall is distributed as follows: 75% during the autumn-winter season, and 21% and 4% during spring and summer, respectively [9]. It shows significant inter-seasonal and inter-annual variation. The average annual temperature is 17.8˚C. The minimum and maximum are 10.9˚C and 24.6˚C, and the temperature of the coldest month (January) is 9.6˚C [9]. The prevailing winds are in the northwest direction during the wet season and in the southwest direction during the dry season. The synthesis of data from the agricultural map of Zaghouan governorate makes it possible to identify the main soil types in the SMINJA sector (Figure 2). The soil cover consists mainly of:
-Vertisols: these soils occupy about 30% of the studied area. They develop on clay alluvium, are healthy and have a fine "argilo-silty" texture. They are swelling clays with a high exchange capacity. Their main characteristics are a high content of fine elements, a strong retention capacity and a relatively high level of limestone. From an agronomic point of view, these soils have good suitability for arable crops and mediocre suitability for irrigated fruit growing (olive and almond trees). For annual crops, they perform well when organic and mineral amendments are applied [10].
-Poorly developed soils: these soils, in turn, cover around 30% of the studied area and are mainly located in the SMINJA plain. They are deep, healthy and not very moist. Their surface horizons are coarse-textured (sandy-loam-sandy) and devoid of organic matter.
Brown Limestone Soils (SBC) and Isohumic Soils are also present, distributed at different places in the studied area.

Field Sampling and Laboratory Spectral Measurement
The field-sampled database consists of 82 soil samples collected as ground-truth data in the investigated region (Figure 3). Soil samples were measured using a portable ASD FieldSpec 3 spectroradiometer designed for field environ-

Soil Properties Modeling and Assessment
The goal of developing a regression model is primarily to map the soil properties of a geographic area. Indeed, the problem lies in the fact that if the investigation area is relatively large, it is difficult, or even impossible, to "visit" the whole area.
The regression model expresses the predicted property as a linear combination of reflectances:
$$ Y = \sum_{i} b_i R_{\lambda_i} $$
where $Y$ is the predicted property, $b_i$ are the coefficients of the model and $R_{\lambda_i}$ is the reflectance corresponding to the wavelength $\lambda_i$. Since wavelengths do not all have the same importance for prediction, we compute the VIP (Variable Importance in Projection), which estimates the importance of each variable in the projection used by the PLS model. The larger the VIP coefficient of a variable, the more important the variable is considered. Selection methods keep only a small number of variables that best describe the model and eliminate the rest, thereby ensuring parsimony in the description of the model. One of the most popular is the VIP method proposed in [11] [12] [13]. It selects the spectral bands for which $\mathrm{VIP} \geq \mathrm{VIP}_0$, where $\mathrm{VIP}_0$ is a threshold generally set to 1 or 0.8. This method assumes that the eliminated bands contain no information or carry only noise.
However, spectral bands for which $\mathrm{VIP} < \mathrm{VIP}_0$ can contain non-negligible additional information and can therefore help increase the coefficient of determination $R^2$. Classical PLSR gives the same importance to all spectral bands, whereas the VIP selection method gives importance to the retained bands and none to the eliminated bands [14].
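The VIP computation and the thresholding rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text does not state the VIP formula, so the sketch assumes the commonly used definition $\mathrm{VIP}_j = \sqrt{p \sum_a SS_a w_{ja}^2 / \sum_a SS_a}$ with $SS_a = q_a^2\, t_a^{\top} t_a$, fits a single-response PLS model with the NIPALS algorithm, and runs on synthetic data. The names `pls1_fit` and `vip_scores` are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (NIPALS) on mean-centered data.

    Returns the weights W (p x A), the scores T (n x A) and the
    y-loadings q (A,), which is all the VIP computation needs.
    """
    Xk = np.asarray(X, dtype=float)
    yk = np.asarray(y, dtype=float)
    Xk = Xk - Xk.mean(axis=0)
    yk = yk - yk.mean()
    n, p = Xk.shape
    W = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = Xk.T @ yk                  # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # scores of the current component
        tt = t @ t
        p_a = Xk.T @ t / tt            # X-loadings, used only for deflation
        q_a = (yk @ t) / tt            # y-loading
        Xk = Xk - np.outer(t, p_a)     # deflate X
        yk = yk - q_a * t              # deflate y
        W[:, a], T[:, a], q[a] = w, t, q_a
    return W, T, q

def vip_scores(W, T, q):
    """VIP of each variable, assuming the definition given in the lead-in."""
    p = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)      # y-variance explained per component
    w_norm = W / np.linalg.norm(W, axis=0)    # unit-norm weight vectors
    return np.sqrt(p * (w_norm ** 2 @ ss) / ss.sum())

# Synthetic example (assumption): 82 samples, 20 "bands", only band 5 is informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(82, 20))
y = 3.0 * X[:, 5] + 0.1 * rng.normal(size=82)

W, T, q = pls1_fit(X, y, n_components=3)
vip = vip_scores(W, T, q)
selected = np.flatnonzero(vip >= 1.0)   # VIP selection with threshold VIP_0 = 1
print(selected)
```

Under this definition the squared VIP values average to 1 over all bands, which is one reason thresholds close to 1 are commonly chosen.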
We propose to keep all spectral bands while giving less importance to bands with a low VIP. The weighting coefficient of a band is its VIP (Table 1). Consequently, a new spectrum is derived from the original spectrum:
$$ (R_i)' = \mathrm{VIP}_i \cdot R_i $$
where $(R_i)'$ is the weighted reflectance, $R_i$ is the reflectance at wavelength $\lambda_i$ and $\mathrm{VIP}_i$ is the VIP corresponding to $\lambda_i$. In this way, a minor variable is down-weighted but not set to zero.
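The difference between VIP selection and VIP weighting can be illustrated with a short sketch. The function names `select_bands` and `weight_bands` and the reflectance and VIP values are made up for illustration; in practice the VIP vector would come from a PLSR model fitted on the original spectra.

```python
import numpy as np

def select_bands(X, vip, vip0=1.0):
    """VIP selection: keep only the bands with VIP >= vip0."""
    keep = vip >= vip0
    return X[:, keep], keep

def weight_bands(X, vip):
    """VIP weighting: multiply each band by its own VIP, keeping all bands."""
    return X * vip   # broadcasting applies vip[i] to column i

# Illustrative values (assumption): 2 spectra, 3 bands.
X = np.array([[0.2, 0.4, 0.6],
              [0.1, 0.3, 0.5]])
vip = np.array([0.5, 1.0, 1.5])

X_sel, keep = select_bands(X, vip, vip0=1.0)   # the first band is discarded
X_w = weight_bands(X, vip)                     # the first band is halved, not removed
```

One practical caveat when refitting PLSR on the weighted spectra: the regression should not rescale each band to unit variance, since that would cancel the weights. For example, scikit-learn's `PLSRegression` standardizes columns by default (`scale=True`), so `scale=False` would be needed in that setting.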

Descriptive Soils Properties Prediction Using PLSR Based on VIP Selection Method
For each property considered, all samples in the data set are used to calibrate the prediction models for Clay, Carbon, Organic Matter and Sand. Figure 4 shows histograms of these properties. The PLSR method is applied to build a prediction model for each property. The relationships between measured and predicted soil property values are shown in Figure 5. To determine the optimal calibrated model, three evaluation criteria are considered: the highest $R^2$, the highest adjusted $R^2$ and the lowest RMSE (Root Mean Square Error). The results are presented in Tables 2-5.
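The three evaluation criteria can be computed as in the sketch below. The function name `regression_metrics` is hypothetical, and the adjusted $R^2$ uses the usual correction $1 - (1 - R^2)(n - 1)/(n - k - 1)$. What counts as the number of predictors $k$ for a PLSR model (for example, the number of latent components) is an assumption, since the text does not specify it.

```python
import numpy as np

def regression_metrics(y_true, y_pred, n_predictors):
    """Return (R2, adjusted R2, RMSE) for measured vs. predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    rmse = np.sqrt(ss_res / n)
    return r2, adj_r2, rmse

# Illustrative values (assumption), not taken from the paper's data set.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, adj_r2, rmse = regression_metrics(measured, predicted, n_predictors=1)
```

A model is preferred when it has higher $R^2$ and adjusted $R^2$ and lower RMSE; the adjusted value penalizes models that reach the same fit with more predictors.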
The main statistical parameters for clay, carbon, organic matter and sand data are given in Table 6. There is a significant difference between the minimum and maximum values of each property. This difference is more marked for some properties, such as sand, than for others. It reflects a spatial variation of these components.

Soil Properties Prediction Using Adapted PLSR Based on VIP Weighted Method
One problem with the selection method is determining the threshold above which bands are selected. A threshold that is too small gives the same importance to noise and information, which significantly degrades the prediction model. Conversely, a threshold that is too large tends to sacrifice some important spectral bands and therefore neglects information. The weighting method has the advantage of requiring no random or iterative search; instead, each band is weighted by its own VIP.

Conclusions
In this paper, a comprehensive experimental analysis of a new soil property prediction model based on a VIP weighting process is provided. Not all wavelengths are equally important. For this reason, we calculate the VIP (Variable Importance in Projection), which estimates the importance of each variable in the projection used for the PLSR model. The larger the VIP coefficient of a variable, the more important the variable is considered. Unlike the classical PLSR prediction model, the VIP selection method gives importance to the retained bands and none to the eliminated spectral bands.
However, in this study we have shown that the adapted PLSR based on the VIP weighted method offers potential for soil property prediction. We propose to keep all spectral bands but give less importance to bands that have a low VIP. The weighting coefficient of a band is its VIP, and a new spectrum is derived from the original spectrum. The experimental results show that the novel approach outperforms the standard PLSR model. Future studies will address weighting techniques that use statistical strategies to improve the prediction model.