Prediction of Soil Salinity Using Remote Sensing Tools and Linear Regression Model

Soil salinity is one of the most damaging environmental problems worldwide, especially in arid and semi-arid regions. Sentinel_2 multispectral data were used to study saline soils in southern Tunisia, and 34 soil samples were collected in the investigated region as ground truth data. Several spectral indices were derived from the original Sentinel_2 bands. A moderate correlation was found between electrical conductivity (EC) and the spectral indices from the SWIR. Statistical correlation between ground measurements of EC, the spectral indices and the original Sentinel_2 bands showed that the SWIR bands (b11 and b12) and the salinity index SI have the highest correlation with EC. Combining these remotely sensed variables in a regression model yielded a coefficient of determination R2 = 0.48 and an RMSE of 4.8 dS/m.


Introduction
Soil salinity is widespread in the southern part of Tunisia, from the east coast to the desert in the south. It is considered an important component of ecosystem degradation in the world's dry lands and can lead to desertification [1] [2] [3].
According to [4], roughly 20% of irrigated agriculture worldwide is affected by salinization. The FAO/UNESCO soil map of the world ranks the areas of the world affected by salt. In this ranking, Australia holds first place [5]. Salt-affected soils can be found on every continent, at elevations ranging from 5000 m (Tibetan plateau) to below sea level (Dead Sea), and over 10 percent of the total surface of dry land is salt-affected [6] [7]. Soil salinity in southern Tunisia has several negative effects, such as limiting plant growth, reducing crop productivity and degrading soil quality. Monitoring and mapping salt-affected areas are required to fully describe this phenomenon. Similar studies combining remote sensing, statistical analysis and ground truth measurements have been carried out, and this combination was found to be the most efficient [8] [9]. Various remote sensing data are widely used to identify and map saline soils, including aerial photographs and multispectral and hyperspectral remote sensing data [10].
In recent decades, remote sensing data have been widely applied to map soil salinity, either directly from bare soil or indirectly from vegetation, in a real-time and cost-effective manner at various scales [11]. Besides, the spatial modelling of soil salinity, that is, the use of numerical equations to simulate and predict real phenomena and processes, has followed several approaches. These range from artificial neural networks [12] [13], to classification and regression trees [14], to fuzzy logic [15], to generalized Bayesian analysis [14], to geostatistics (e.g., Kriging, CoKriging and regression Kriging) [16] [17] and statistical analysis (e.g., regression, ordinary least squares) [18] [19]. An overview of these techniques and how they provide optimal results under certain circumstances is given in the review papers of McBratney et al. [20] and Scull et al. [21]. An integrated approach using remote sensing (RS) in addition to various statistical methods has great potential for developing soil prediction models. In the case of soil salinity, statistical analysis, in particular linear regression, has shown great potential compared with other techniques for improving the way soil salinity is modelled, because it is rapid, practical and cost-effective [22]. A variety of statistical models based on remote sensing data have been developed and have revealed reasonable predictors of soil salinity in the literature [...] Sentinel_2, were used [20]. The arid and semi-arid zones of Tunisia, and especially agricultural regions such as Gabes-Ghannouch, are seriously threatened by soil salinity. Thus, predicting the variability of soil salinity and mapping its spatial distribution are becoming increasingly important in order to implement or support effective soil reclamation programs that minimize or prevent future increases in soil salinity.
The overall aim of this study was to develop effective combined spectral-based statistical regression models using Sentinel_2 images to predict and map the spatial variation in soil salinity in the region of Gabes-Ghannouch.

Investigation Area
Gabes-Ghannouch is both a Mediterranean and a Saharan region. It is located in south-eastern Tunisia and extends from the Jeffara plain to the Gulf of Gabes (Figure 1(a)).
The study area was chosen not only because of the important agricultural interests in this region, but also because of its soil-related environmental problems, such as salinization. Its geographic location corresponds to a latitude and longitude of about 33˚42' and 10˚30', respectively (Figure 1(b)). It has a typical Mediterranean climate in which maximum temperatures are reached between June and August (48˚C), while the coldest temperatures are measured between December and February. Due to its proximity to the sea, the climate of the study area differs slightly from the typical arid or semi-arid climate. Rainfall is irregular and ranges between 150 and 240 mm per year, with a six-month dry season (April-September) during which rainfall does not exceed 4 mm per month. According to [32], the investigated area has an arid climate, with an annual evaporation of ~1950 mm as measured by the Piche and Bac methods. Evaporation in this region is very high due to the dry climate conditions; therefore, the salt left on the topsoil after water evaporates accumulates rapidly and accelerates the soil salinization process. This leads to salt accumulation in the upper layers of the Chott sediments and to crust formation [33].
The study area includes wetlands and steppe plains as well as areas used for agriculture.

Data Used and Statistical Data Processing
A Sentinel 2 satellite image was used to map soil salinity. This image was acquired in May 2018 by the multi-spectral imager (MSI), which provides 13 spectral bands from the visible to the infrared with a spatial resolution varying from 10 to 60 meters. The Sentinel 2 spectral bands cover a range from 443 nm (blue) to 2190 nm in the SWIR.
Band reflectances, treated here as spectral indices, and the spectral salinity indices derived from the blue, green, red and near-infrared bands were used to predict soil salinity from the satellite image. After these indices were extracted from the Sentinel_2 image at the sampling sites, correlation analyses between the EC measurements and the indices were performed using the Pearson correlation coefficient. The indices are described in Table 1.
The statistical method applied is multiple regression modelling. Its purpose is to understand the relationship between the spectral indices and the electrical conductivity of the sampled soil.
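The Pearson correlation step described above can be sketched as follows. This is a minimal illustration only: the function name `pearson_r` and the sample values for `ec` and `swir` are hypothetical and are not the measurements of this study.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centre each variable on its mean
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Hypothetical example values (NOT the field data of this study):
ec = np.array([0.5, 1.2, 3.4, 8.0, 15.6])         # EC in dS/m at five sites
swir = np.array([0.21, 0.24, 0.26, 0.33, 0.41])   # SWIR reflectance at the same sites
r = pearson_r(ec, swir)
print(f"Pearson r = {r:.2f}")
```

In practice, the same function would be applied to each band or index column in turn in order to rank them by their correlation with EC.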

Linear Regression Model
A linear regression was used to establish the relationship between the NIR and SWIR spectra and the reference data from the analysis of EC, based on the statistical analysis [...] and there is an autocorrelation in the residuals, which is contrary to one of the assumptions of parametric linear regression.
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Z^{*}(x_i) - Z(x_i)\right)^{2}} \tag{1}$$
where N is the number of points, Z*(x_i) is the estimated value at point x_i and Z(x_i) is the observed value at point x_i.
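The error measure defined by N, Z*(x_i) and Z(x_i) is read here as the root mean square error (RMSE) reported in the results; that reading is an assumption. Under it, the computation can be sketched as below, where the function name `rmse` and the sample values are hypothetical.

```python
import numpy as np

def rmse(predicted, observed):
    """Root mean square error between estimated values Z*(x_i) and observed values Z(x_i)."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    if predicted.shape != observed.shape:
        raise ValueError("predicted and observed must have the same shape")
    # Square the pointwise errors, average over the N points, take the square root
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

# Hypothetical EC values in dS/m (not the data of this study)
print(rmse([2.0, 5.5, 9.0], [1.0, 6.0, 12.0]))
```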

Result and Discussion
Based on the data set collected during the fieldwork, the investigation area is considered highly affected by salinity according to the results of the Department of Primary Industries in Australia [35]. The study area is also dominated by gypsic soil [9]. The areas of high and extreme soil salinity are completely degraded, and plant growth there is suppressed. Even halophyte plants are very rare, because it is very hard for them to grow in soil with such a high gypsum content [4].

Descriptive Analysis
The main statistical parameters of the EC data are given in Table 2. The distribution of the EC values is characterized by an average of 3.81 dS/m and a standard deviation of 6.40 dS/m. There is a large difference between the minimum of 0.25 dS/m (EC of healthy soils) and the maximum of 31.7 dS/m (EC of saline soils), which reflects a significant spatial variability of this component [36].

Correlation between Spectral Indices and EC from the Ground Truth
A Pearson correlation analysis between the electrical conductivity values and the Sentinel 2 spectral bands (Table 1) was conducted to evaluate which spectral interval reveals the most about the salt-affected area. The correlation between the Sentinel 2 spectral bands and the ground truth EC is highest in the SWIR region of the spectrum, as shown in Table 3.
The most correlated bands are SWIR bands 11 and 12. The empirical equation A = log(1/R), which transforms reflectance into absorbance, improves the correlation by 3%; for this reason, band absorbances are treated as spectral indices and integrated into the model. The salinity index SI provides the highest correlation, 49% (Table 4), not only among the salinity indices but among all the spectral indices computed in this work.
Color indices showed a low correlation with EC, varying between 13% and 21%. Salinity indices showed a low to moderate correlation with EC, varying between 8% and 49%. The most correlated variables are the spectral salinity index SI (Table 4) and the SWIR bands (b11 and b12).
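The reflectance-to-absorbance transform A = log(1/R) can be sketched as follows. The text does not state the base of the logarithm, so base 10 is assumed here, and reflectance is assumed to be scaled to the interval (0, 1]. The function name `reflectance_to_absorbance` is hypothetical.

```python
import numpy as np

def reflectance_to_absorbance(reflectance):
    """Apply A = log10(1 / R) element-wise (base-10 logarithm is an assumption)."""
    r = np.asarray(reflectance, dtype=float)
    if np.any(r <= 0):
        raise ValueError("reflectance must be strictly positive")
    return np.log10(1.0 / r)

# Hypothetical SWIR reflectance values (not the data of this study)
b11 = np.array([0.10, 0.25, 0.40])
print(reflectance_to_absorbance(b11))
```

Because the transform is monotonic but non-linear, it can change the Pearson correlation with EC, which is a linear measure.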

Regression Analysis Modelling
The linear regression is used to predict the spatial variability of soil salinity based on remote sensing and ground truth measurements.
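A multiple linear regression of this kind can be sketched with ordinary least squares as below. This is an illustration under stated assumptions, not the fitted model of this study: the helper names `fit_linear_model`, `predict` and `r_squared`, and all predictor and EC values, are hypothetical.

```python
import numpy as np

def fit_linear_model(X, y):
    """Ordinary least squares fit; returns [intercept, slope_1, ..., slope_p]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, X):
    """Predicted response for predictor matrix X using fitted coefficients."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

def r_squared(y, y_hat):
    """Coefficient of determination R2 = 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical predictors: columns could stand for b11 absorbance, b12 absorbance and SI
X = np.array([[0.60, 0.70, 0.20],
              [0.55, 0.62, 0.25],
              [0.48, 0.58, 0.31],
              [0.40, 0.45, 0.38],
              [0.35, 0.41, 0.45],
              [0.30, 0.33, 0.52]])
ec = np.array([0.4, 1.1, 2.9, 6.5, 9.8, 14.2])  # hypothetical EC in dS/m

coef = fit_linear_model(X, ec)
print("R2 =", round(r_squared(ec, predict(coef, X)), 3))
```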
The root mean square error (RMSE) of the estimation is about 4.8 dS/m. This error decreases with increasing soil salinity, which means that the higher the electrical conductivity is, the closer the predicted conductivity lies to the measured value. The empirical relationship between measured and estimated EC values shows an overestimation of the predicted electrical conductivity: Figure 3 shows that the predicted values are often higher than the ground truth measurements.
The plot of the standardized residuals versus the predicted EC values in Figure 4 shows no specific trend; therefore, the proposed regression model is considered acceptable.
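A residual check of this kind can be sketched as follows. The standardization used here simply divides each residual by the residual standard error and ignores leverage, which is an assumption about how the residuals were standardized. The function name `standardized_residuals` and the sample values are hypothetical.

```python
import numpy as np

def standardized_residuals(observed, predicted, n_predictors):
    """Residuals divided by the residual standard error (leverage is ignored)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = observed - predicted
    dof = len(observed) - n_predictors - 1  # degrees of freedom with an intercept
    if dof <= 0:
        raise ValueError("not enough observations for this number of predictors")
    std_error = np.sqrt(np.sum(residuals ** 2) / dof)
    return residuals / std_error

# Hypothetical measured and predicted EC values in dS/m (not the data of this study)
measured = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
estimated = np.array([1.5, 1.5, 3.0, 4.5, 4.5])
z = standardized_residuals(measured, estimated, n_predictors=1)
print(z)
```

Plotting `z` against the predicted values and looking for a funnel shape or a curve is the usual visual test; a structureless cloud around zero is consistent with the assumptions of linear regression.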

Conclusions
The present study demonstrates that combining the Sentinel_2 SWIR bands and the salinity index into a regression model offers a potentially quick and inexpensive method to model the spatial variation in soil salinity. The combination of these remotely sensed variables into one model was able to explain 48% of the spatial variation in the soil salinity of the study area.
Although this study demonstrates that soil salinity mapping and modelling can be undertaken with good accuracy based on high spatial resolution multispectral images, further research is needed to investigate the potential of hyperspectral data for mapping and modelling soil salinity.