Prediction of Soil Salinity Using Remote Sensing Tools and Linear Regression Model
1. Introduction
Soil salinity is widespread in southern Tunisia, from the east coast to the desert in the south. It is considered an important component of ecosystem degradation in the world's drylands and can lead to desertification [1] [2] [3]. According to [4], roughly 20% of irrigated agriculture worldwide is affected by salinization. The FAO/UNESCO soil map of the world provides a ranking of the salt-affected areas of the world. In this ranking, Australia holds first place with 84.7 × 10^6 ha and Africa second with 69.5 × 10^6 ha, followed by Latin America and the Middle East in third and fourth place with 59.4 × 10^6 ha and 53.1 × 10^6 ha, respectively [5]. Salt-affected soils can be found on every continent and at elevations ranging from 5000 m (Tibetan plateau) to below sea level (Dead Sea), with over 10 percent of the total surface of dry land being salt-affected [6] [7]. In southern Tunisia, soil salinity has several negative effects, such as limiting plant growth, reducing crop productivity and degrading soil quality. Monitoring and mapping salt-affected areas are required to fully describe this phenomenon. Similar studies have combined remote sensing, statistical analysis and ground truth measurements, and found this combination to be the most efficient approach [8] [9]. Various remote sensing data are widely used to identify and map saline soils, including aerial photographs and multispectral and hyperspectral data [10].
In recent decades, remote sensing data have been widely applied to map soil salinity, either directly from bare soil or indirectly from vegetation, in a real-time and cost-effective manner at various scales [11]. In addition, the spatial modelling of soil salinity, that is, the use of numerical equations to simulate and predict real phenomena and processes, has followed several approaches. These range from artificial neural networks [12] [13], to classification and regression trees [14], to fuzzy logic [15], to generalized Bayesian analysis [14], to geostatistics (e.g., Kriging, CoKriging and regression Kriging) [16] [17] and statistical analysis (e.g., regression, ordinary least squares) [18] [19]. An overview of these techniques and of the circumstances under which they provide optimal results is given in the review papers of McBratney et al. [20] and Scull et al. [21]. An integrated approach that combines remote sensing (RS) with various statistical methods has great potential for developing soil prediction models. In the case of soil salinity, statistical analysis, in particular linear regression, has shown considerable potential for improving the way soil salinity is modelled because it is rapid, practical and cost-effective [22]. A variety of statistical models based on remote sensing data have been developed and have identified reasonable predictors of soil salinity [23] [24] [25] [26] [27]. In Thailand, Shrestha [28] developed several salinity prediction models containing spectral variables, including the Normalized Difference Vegetation Index (NDVI), the Normalized Difference Salinity Index (NDSI), the eight original bands of the Landsat Enhanced Thematic Mapper Plus (Landsat ETM+) and soil properties. The results indicated that mid-infrared (band 7) and near-infrared (band 4) had the highest association with the measured EC. Combining these variables yielded salinity prediction models that can be used to infer soil salinity over a large area. In contrast, Mehrjardi et al. [29] found that among the Landsat ETM+ bands 1 - 5 and 7, band 3 (red band) had the highest correlation with EC. Based on that result, a regression model was fitted to relate EC to band 3, and an exponential relation was found to be the best type of model.
Regression models based on image enhancement techniques (spectral indices, Principal Components Analysis (PCA) and Tasseled Cap Transformation (TCT)) have also been extensively used to predict soil salinity and to better characterise its variability. For example, Tajgardan et al. [30] combined PCA and regression analysis to predict and map soil salinity from data collected by the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) in the north of the Aq-Qala region in northern Iran. In that study, a suitable regression model relating the image data to electrical conductivity (EC) was developed to predict soil salinity. Similarly, Afework [31] built a reliable model to predict soil salinity in the Metehara sugarcane farms in Ethiopia by relating EC to the Normalized Difference Salinity Index (NDSI) using linear regression. Other researchers found that combining the spectral bands of satellite images with enhanced images holds great promise for soil salinity modelling and mapping. Bouaziz et al. [19] conducted a study to detect soil salinity based on the Moderate Resolution Imaging Spectroradiometer (MODIS) and multiple linear regression. They found that incorporating the Salinity Index SI2 together with the near-infrared (NIR) band (band 3) into a statistical model gave considerable insight into the spatial spread of soil salinity. More recently, Judkins and Myint [25] found that Landsat band 7, the Transformed Normalized Difference Vegetation Index (TNDVI) and Tasseled Cap components 3 and 5, derived from TCT, were highly correlated with the variation in soil salinity. Combining these spectral variables into a multiple linear regression model enabled them to predict and map surface soil salinity levels efficiently.
Most of the reviewed studies, and others found in the literature, modelled soil salinity using statistical analysis and multispectral images with moderate spatial resolution (e.g., Landsat, MODIS), while only a limited number of studies used multispectral images with high spatial resolution, such as Sentinel-2 [20]. The arid and semi-arid zones of Tunisia, and especially agricultural regions such as Gabes-Ghannouch, are seriously threatened by soil salinity. Predicting the variability of soil salinity and mapping its spatial distribution are therefore becoming increasingly important in order to implement or support effective soil reclamation programs that minimize or prevent future increases in soil salinity. The overall aim of this study was to develop effective combined spectral-based statistical regression models using Sentinel-2 images to predict and map the spatial variation in soil salinity in the region of Gabes-Ghannouch.
2. Materials and Methods
2.1. Investigation Area
Gabes-Ghannouch is both a Mediterranean and a Saharan region. It is located in south-eastern Tunisia, extending from the Jeffara plain to the Gulf of Gabes (Figure 1(a)). The study area was chosen not only because of the important agricultural interests in this region, but also because of its soil-related environmental problems, such as salinization. Its latitude and longitude are about 33˚42' and 10˚30', respectively (Figure 1(b)). It has a typical Mediterranean climate, with maximum temperatures (48˚C) reached between June and August and the coldest temperatures measured between December and February. Due to its proximity to the sea, the climate of the study area differs slightly from the typical arid or semi-arid climate. Rainfall is irregular and ranges between 150 and 240 mm per year, with a six-month dry season (April-September) during which rainfall does not exceed 4 mm per month.
Figure 1. Location of the study area: (a) composite MODIS image of the Mediterranean Sea from 2005; (b) Sentinel-2 image of southern Tunisia, 2018.
According to [32], the investigated area has an arid climate, with an annual evaporation of ~1950 mm as measured by the Piche and Bac methods. Evaporation in this region is very high due to the dry climate conditions; the salt left on the topsoil after the water evaporates therefore accumulates rapidly and accelerates the soil salinization process. This leads to salt accumulation in the upper layers of the Chott sediments and to crust formation [33].
The study area includes wetlands and steppe plains as well as areas used for agriculture.
2.2. Soil Sampling Method
Soil samples were collected from the upper ~10 cm of the soil. The sampling campaign was carried out in May 2018, which corresponds to the acquisition date of the multispectral data. The dry season was chosen deliberately in order to enhance the detection of the spectral characteristics of salt at the surface: in the dry season, salt in the soil rises by capillarity and accumulates at the surface, so the signal of salty soil is stronger and easier to detect with optical sensors [34]. The sample locations were selected so as to minimize any noise that could affect the spectral signature of the soil. Thus, all samples used in this study are at least 60 m away from objects that are not soil (e.g., trees, houses, streets).
At each sample location, the same procedure was used to collect the soil. Each analysed sample is a mix of four sub-samples collected from the four corners of a 60 × 60 m square whose center is taken as the sample location; this composite is the soil sample used for chemical analysis (Figure 2). These steps were applied to all samples in order to make each sample representative of the corresponding pixel of the Sentinel-2 image [3]. The 60 × 60 m square was chosen to match the spatial resolution of the multispectral image.
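As an illustration of the sampling geometry described above, the following minimal sketch computes the four corner positions of a 60 × 60 m square around a sample center. It assumes that coordinates are expressed in metres in a projected coordinate system; the function name `corner_points` and the example coordinates are hypothetical and are not taken from the study.

```python
def corner_points(center_x, center_y, side=60.0):
    """Return the four corners of a square centred on (center_x, center_y).

    Assumption: coordinates are in metres in a projected system,
    so that adding or subtracting half the side length is meaningful.
    """
    half = side / 2.0
    return [
        (center_x - half, center_y - half),  # south-west corner
        (center_x + half, center_y - half),  # south-east corner
        (center_x + half, center_y + half),  # north-east corner
        (center_x - half, center_y + half),  # north-west corner
    ]


# Hypothetical sample centre (easting, northing in metres).
corners = corner_points(640000.0, 3750000.0)
for easting, northing in corners:
    print(f"{easting:.1f}, {northing:.1f}")
```

The four sub-samples taken at these positions would then be mixed into the single composite sample that is analysed in the laboratory.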
Salinity of the topsoil was determined by measuring electrical conductivity (EC). The 1:5 soil/water extract is a convenient method [1] and was used in this study to estimate soil salt content. The EC of the samples was measured in the following steps: 1) drying the samples, 2) sieving (soil particle size <2 mm), 3) agitation, and 4) measuring the EC values. EC is usually expressed in decisiemens per metre (dS/m) at 25˚C.
2.3. Data Used and Statistical Data Processing
A Sentinel-2 satellite image was used to map soil salinity. The image was acquired in May 2018 by the Multi-Spectral Instrument (MSI), which records 13 spectral bands from the visible to the shortwave infrared (SWIR) at spatial resolutions of 10 to 60 m. The Sentinel-2 spectral bands span a range from 443 nm (blue) to 2190 nm in the SWIR.
Figure 2. Mixing of the samples from the 4 corners to represent one soil sample.
Band reflectances, treated here as spectral indices, and the spectral salinity indices derived from the blue, green, red and near-infrared bands were used to predict soil salinity from the satellite image. After extracting these indices from the Sentinel-2 image at the sampling sites, correlation analyses between the EC measurements and the indices were performed using the Pearson correlation coefficient. The indices are described in Table 1. The statistical analysis was then carried out with multiple regression models, with the aim of understanding the relation between the spectral indices and the electrical conductivity of the sampled soil.
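The Pearson correlation between EC and a spectral index can be sketched as follows. This is a minimal standard-library illustration, not the software used in the study; the numerical values of `ec` and `swir_reflectance` are hypothetical and serve only to make the example runnable.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the centred values.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical EC measurements (dS/m) and SWIR reflectances at five sites.
ec = [0.3, 1.2, 4.5, 9.8, 20.1]
swir_reflectance = [0.41, 0.38, 0.35, 0.30, 0.22]

r = pearson_r(ec, swir_reflectance)
print(f"Pearson r = {r:.2f}")
```

Repeating this calculation for every band and index gives a correlation table from which the most informative predictors can be selected.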
2.4. Linear Regression Model
Table 1. Formulas used to generate the indices.
A linear regression was used to establish the relationship between the NIR and SWIR spectra and the reference EC data from the laboratory analysis. The optimal calibrated model was selected as the one with the highest R2 and the lowest RMSE (root mean square error); the smaller the RMSE, the more accurate the prediction. The RMSE was computed according to Equation (1). The model was also assessed graphically by analysing the standardized residuals versus the predicted values of EC. If a trend appears when the residuals are plotted against the explanatory variable, the model is not accurate and the residuals are autocorrelated, which violates one of the assumptions of parametric linear regression.
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[Z^{*}(x_i) - Z(x_i)\right]^{2}} \qquad (1)$$
where N is the number of points, Z*(xi) is the estimated value at point xi, and Z(xi) is the observed value at point xi.
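The two selection criteria named in this section, RMSE and R2, can be computed as in the sketch below. It assumes the usual definitions (RMSE as the square root of the mean squared difference between estimated and observed values, R2 as one minus the ratio of residual to total sum of squares); the function names and the example numbers are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    squared_errors = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted EC values (dS/m).
ec_observed = [0.5, 2.0, 6.0, 12.0]
ec_predicted = [1.0, 2.5, 5.0, 11.0]

print(f"RMSE = {rmse(ec_observed, ec_predicted):.2f} dS/m")
print(f"R2   = {r_squared(ec_observed, ec_predicted):.2f}")
```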
3. Result and Discussion
Based on the data set collected during the fieldwork, the investigation area is considered highly affected by salinity according to the classification of the Department of Primary Industries in Australia [35]. The study area is also dominated by gypsic soil [9]. The areas of highly and extremely saline soil are completely degraded, and plant growth there is suppressed. Even halophytes are very rare, because it is very hard for them to grow in soil with a high gypsum content [4].
3.1. Descriptive Analysis
The main statistical parameters of the EC data are given in Table 2. The EC values have a mean of 3.81 dS/m and a standard deviation of 6.40 dS/m. The large difference between the minimum of 0.25 dS/m (EC of healthy soils) and the maximum of 31.7 dS/m (EC of saline soils) reflects a significant spatial variability of this property [36].
3.2. Correlation between Spectral Indices and EC from the Ground Truth
A Pearson correlation analysis between the electrical conductivity values and the Sentinel-2 spectral bands (Table 1) was conducted to evaluate which spectral interval reveals the most about the salt-affected area. The correlation between the Sentinel-2 spectral bands and the ground truth EC is highest in the SWIR region of the spectrum, as shown in Table 3.
The most correlated bands are the SWIR bands 11 and 12. The empirical equation A = log(1/R), which transforms reflectance (R) to absorbance (A), improves the correlation by 3%; band absorbances are therefore treated as spectral indices and integrated into the model. The salinity index SI provides the highest correlation (49%, Table 4), not only among the salinity indices but among all the spectral indices evaluated in this work.
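The reflectance-to-absorbance transform A = log(1/R) can be sketched as follows. The text does not state the base of the logarithm; this example assumes base 10, and the function name and reflectance values are hypothetical.

```python
import math


def reflectance_to_absorbance(reflectance):
    """Convert a reflectance in (0, 1] to absorbance, A = log10(1 / R).

    Assumption: base-10 logarithm; the source only writes "log".
    """
    if not 0.0 < reflectance <= 1.0:
        raise ValueError("reflectance must lie in the interval (0, 1]")
    return math.log10(1.0 / reflectance)


# Hypothetical SWIR reflectances for bands 11 and 12 at one sample site.
band_reflectance = {"b11": 0.32, "b12": 0.27}
band_absorbance = {
    name: reflectance_to_absorbance(value)
    for name, value in band_reflectance.items()
}
print(band_absorbance)
```

Because the transform is monotonically decreasing, a positive correlation between EC and reflectance becomes a negative correlation with absorbance of a slightly different magnitude, since the Pearson coefficient is sensitive to the nonlinearity.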
Color indices showed a low correlation with EC, varying between 13% and 21%. Salinity indices showed a low to moderate correlation with EC, varying between 8% and 49%.
Table 2. Descriptive statistics on the electrical conductivity of samples.
Table 3. Correlation matrix between Sentinel-2 spectral bands and EC values.
Table 4. Correlation matrix between salinity indices and EC values.
The variables most correlated with EC are the spectral salinity index SI (Table 4) and the SWIR bands (b11 and b12).
3.3. Regression Analysis Modelling
Linear regression was used to predict the spatial variability of soil salinity based on remote sensing and ground truth measurements. The prediction of the EC values from the Sentinel-2 bands and the spectral indices relies on the three variables shown in Equation (2). The coefficient of determination (R2 = 0.48) is significant and indicates that the predictor variables used in the model explain 48% of the total variation of the EC values. The empirical regression relationship is given by the following formula:
(2)
The RMSE (root mean square error) of the estimate is about 4.8 dS/m. This error decreases with increasing soil salinity: the higher the electrical conductivity, the closer the predicted value lies to the ground truth measurement.
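A multiple linear regression with three predictors can be fitted by ordinary least squares as in the dependency-free sketch below, which solves the normal equations by Gaussian elimination. This is an illustration only: the coefficients of the model in this study are not reproduced here, the choice of predictors (two SWIR band absorbances and a salinity index) is an assumption, and all numbers are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so that the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(predictors, response):
    """Return [intercept, b1, b2, ...] minimising the squared error."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, predictors):
    """Apply the fitted linear model to each row of predictors."""
    return [coefs[0] + sum(c * x for c, x in zip(coefs[1:], p)) for p in predictors]


# Hypothetical predictors per site: [absorbance b11, absorbance b12, SI].
x = [[0.1, 0.2, 0.3], [0.4, 0.1, 0.5], [0.3, 0.6, 0.2],
     [0.8, 0.3, 0.9], [0.5, 0.5, 0.1], [0.9, 0.7, 0.4]]
ec = [0.4, 2.1, 1.0, 9.5, 0.8, 6.2]  # hypothetical EC values (dS/m)

coefs = fit_ols(x, ec)
print("coefficients:", [round(c, 3) for c in coefs])
print("predicted EC:", [round(v, 2) for v in predict(coefs, x)])
```

In practice a numerical library would normally be used for the fit; the explicit normal equations are shown here only to keep the example self-contained.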
The relationship between measured and estimated EC values shows that the model tends to overestimate electrical conductivity: in Figure 3, the predicted values are often higher than the ground truth measurements.
The plot of the standardized residuals versus the predicted EC values (Figure 4) shows no specific trend; the proposed regression model is therefore considered acceptable.
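The residual check described above can be sketched as follows. This simplified version divides each residual by the residual standard error and ignores leverage, which is an assumption; the study does not state which definition of standardized residuals was used. The function name, the `n_params` argument and the example values are hypothetical.

```python
import math


def standardized_residuals(observed, predicted, n_params):
    """Residuals divided by the residual standard error.

    n_params is the number of fitted coefficients (intercept included).
    Simplification (assumption): leverage is ignored.
    """
    residuals = [o - p for o, p in zip(observed, predicted)]
    dof = len(residuals) - n_params  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    std_error = math.sqrt(sum(r ** 2 for r in residuals) / dof)
    return [r / std_error for r in residuals]


# Hypothetical observed and predicted EC values (dS/m) from a
# model with an intercept and three predictors (n_params = 4).
ec_observed = [0.5, 2.0, 6.0, 12.0, 3.0, 8.0]
ec_predicted = [1.0, 2.5, 5.0, 11.0, 3.5, 9.0]

z = standardized_residuals(ec_observed, ec_predicted, n_params=4)
for pred, value in zip(ec_predicted, z):
    print(f"predicted {pred:5.1f} dS/m -> standardized residual {value:+.2f}")
```

Plotting these values against the predicted EC gives the kind of diagnostic shown in Figure 4; a visible pattern in the points would indicate that the linear model is inadequate.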
Figure 3. Relationship between measured and estimated electrical conductivity values.
Figure 4. Relationship between estimated electrical conductivity values and standardized residuals.
4. Conclusions
The present study demonstrates that combining the Sentinel_2 SWIR bands and the salinity index into a regression model offers a potentially quick and inexpensive method to model the spatial variation in soil salinity. The combination of these remotely sensed variables into one model was able to explain 48% of the spatial variation in the soil salinity of the study area.
Although this study demonstrates that soil salinity mapping and modelling can be undertaken with reasonable accuracy based on high spatial resolution multispectral images, further research is needed to investigate the potential of hyperspectral data for mapping and modelling soil salinity.