Comparison and Evaluation of GIS-Based Spatial Interpolation Methods for Estimation Groundwater Level in AL-Salman District — Southwest Iraq

The aim of this research is to compare five spatial prediction methods, radial basis functions (RBF), inverse distance weighting (IDW), ordinary Kriging (OK), universal Kriging (UK), and simple Kriging (SK), for producing the groundwater level map and the prediction error map of the study area, and to set the foundations and criteria for choosing the most appropriate mathematical method for constructing statistical surfaces that represent the groundwater level in the study area. These methods were used to predict the spatial distribution of the groundwater level from data measured at 764 wells in May 2016. The study shows that comparing spatial interpolation models and evaluating their accuracy through statistical indicators and cross-validation is the best way to choose the optimal model for representing the data at any site. The statistical comparison of the five spatial interpolation models, validated by cross-validation, found that universal Kriging (UK) is the best method for representing the groundwater level in Salman district because this model has the lowest root mean square error (RMSE), the lowest mean error (ME), and the highest coefficient of determination ($R^2$). The groundwater level and prediction standard error maps produced in the geographic information system (GIS) give additional data and information that describe the aquifer system in the study area and will ultimately improve sustainable groundwater management.


Introduction
The sound application of hydrological methods is the basis for the management and development of water resources. The natural increase in the population of Iraq and the decline in the discharge of the Tigris and Euphrates, caused by the control exercised by the upper-basin countries, have increased the demand for water. This requires developing methods of searching for water and finding alternatives to surface water to meet this challenge in the future, particularly since the scarcity of usable water is one of the greatest challenges of the twenty-first century [1]. Groundwater is an important source for meeting rural and urban needs for economic and social development throughout the world [2] [3]. It is also the main source of water in areas with low rainfall, such as the study area, which is characterized by scarce surface water resources.
The groundwater aquifer in AL-Salman district (dry zone) is the only source for household and agricultural needs in the study area. It is therefore necessary to rationalize water use in these areas through proper planning, which depends on mathematical models that help the decision-maker take correct steps in planning and optimizing investment in water projects. Proper planning requires analyzing and studying the behavior of the aquifer and developing accurate digital maps that show the groundwater level and its development over time [4]. One of the problems facing hydrogeological studies is estimating data values at a given location, either because the data are missing, because the site has no measurements, or because it is impossible to take measurements over the entire study area owing to the cost in money or effort. The scientific approach in this case is to take scattered samples from the area and then predict the values at unknown points (locations where no samples were taken). These mathematical processes are called spatial interpolation [5]. Spatial interpolation models can therefore be defined as a set of statistical methods used to predict the values of phenomena at sites where measurements are not available, based on a limited number of measured points.
All phenomena with a spatial extent occupy a volume in space. To perceive this volume, we must see the outer surface surrounding it, which is called the statistical surface. The spatial variation of this surface can be represented on maps using a spatial interpolation model, so spatial statistical techniques are the main means of creating these maps. Since there is no universally approved method that can be adopted, this study compares different spatial interpolation methods: radial basis functions (RBF), inverse distance weighting (IDW), ordinary Kriging (OK), universal Kriging (UK), and simple Kriging (SK). Based on a set of statistical criteria, the best spatial interpolation model can be determined to represent the actual groundwater level in the study area. Moreover, the research aims at:
1) Setting the foundations and criteria for choosing the most appropriate mathematical method for the construction of statistical surfaces in the representation of the level of groundwater in the study area.
2) Comparing the prediction methods IDW, RBF, UK, OK, and SK for the production of the groundwater level map and the prediction standard error map in the study area.
Journal of Geographic Information System
The importance of this research stems from the importance of studying groundwater-related projects with models that keep pace with the development of GIS applications.

Material and Methodology
To reach the research objectives, the work was divided into three stages, as in Figure 1.
1) Collection, input, processing and analysis of data using the ArcGIS 10.5 program.
2) Comparison and evaluation of spatial interpolation models and selection of the best ones.
3) Production of the groundwater level map and the prediction standard error map to present and generalize the interpolation results.
covering an area of 17,462 km² and constituting 4% of Iraq's total area of 435,052 km². The study area is a part of the southern desert in AL-Salman
Eocene sediments are the most prevalent. The rock sequence is generally composed of carbonates with marl intercalations and lesser amounts of clastics [6].
The elevation of the study area ranges between 448 meters above sea level in the far south and 5 meters above sea level in the north [7].
The climate of the study area is dry and hot in summer and mild in winter. Temperatures range between 16˚C and 32˚C, the annual rainfall is less than 80 mm, the relative humidity is 39%, and the total annual evaporation is 3484 mm [8]. The groundwater in the study area, especially in its northern parts, is being depleted by the excessive drilling of artesian wells and inefficient irrigation methods, as well as by extreme evaporation and low rainfall. The study was based on field measurements taken from 764 artesian wells during May 2016 [9].

Spatial Interpolation Methods
The spatial interpolation methods assessed here, as implemented in ArcGIS 10.5, include the following:
• Inverse distance weighting (IDW)
This method derives the statistical surface from the values measured at a number of points belonging to that surface. A network of points is then completed, and the values of the phenomenon are calculated at these points according to a mathematical equation (Equation (1)). The value of the phenomenon at any point of the network (statistical surface) is predicted with weights that are inversely proportional to its distance from the measured points [11]. The method is closely tied to distance, since the influence of a measured value decreases with distance. In other words, the predicted values will not exceed the values of the samples, and the prediction is limited to the range of the known values [12].
This method is based on spatial correlation: data measured at specific points in the region are used to estimate values at points where no measurements are available [13]. This means that each measured point has more influence the closer it is to the point where no measurement is available, and less influence the farther it is from that point [14].
where $z_j$ is the estimated value for the unknown point at location $j$.
where $r$ is the distance from sample to estimation point and $c$ is the smoothing factor.
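As an illustration of the inverse distance weighting idea described above, the following is a minimal sketch in plain Python. It uses the common textbook form in which each sample is weighted by the inverse of its distance raised to a power; the function name `idw_predict`, the `power` parameter, and the sample coordinates are assumptions made for this example and are not taken from the study or from ArcGIS.

```python
import math

def idw_predict(samples, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    samples: list of (x, y, z) measured points; target: (x, y).
    `power` is an assumed distance exponent, not a value from the study.
    """
    num = 0.0
    den = 0.0
    for x, y, z in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return z  # target coincides with a measured point
        w = 1.0 / d ** power  # nearer samples receive larger weights
        num += w * z
        den += w
    return num / den

# Hypothetical wells: (x, y, groundwater level)
wells = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (0.0, 1.0, 30.0)]
estimate = idw_predict(wells, (0.5, 0.5))
```

Because the estimate is a weighted average with positive weights, it always stays between the smallest and largest sample values, which matches the statement that the prediction is limited to the known values.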
• Kriging
The Kriging model is one of the most complex and robust methods. It applies advanced statistical methods and requires knowledge of spatial statistics, because the data must be examined statistically before it is applied. It relies on distance and on the relationship between the known values to predict unknown values. It can predict values above or below the known values, but the surface does not pass through them as in the Spline method. The Kriging method is considered the best procedure for linear interpolation [16]. It is a linear interpolation procedure that provides unbiased linear estimates of values in space, an advanced geostatistical procedure that generates an estimated statistical surface from a scattered set of points with z values (Equation (3)) [18] [19]. In this model, the semivariogram plays a key role in the analysis and modeling of geostatistical data, and it takes into account the autocorrelation between spatial data to construct mathematical models of the spatial correlation structures expressed by the variables [11] [12]. It is calculated by the following Equation (3):
$$\hat{z}(s_0) = \sum_{i=1}^{n} \gamma_i \, z(s_i) \tag{3}$$
where $s_0$ is the prediction location, $\gamma_i$ is the unknown weight for the measured value at the $i$-th location, $z(s_i)$ is the measured value at the $i$-th site, and $n$ is the number of samples.
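To make the weighted-sum form of Equation (3) concrete, here is a minimal sketch of ordinary Kriging for a single prediction location in plain Python. It assumes an exponential semivariogram with zero nugget and hypothetical `sill` and `rng` (range) values; the study fitted its models in ArcGIS, so the model choice, parameters, and function names here are illustrative assumptions only.

```python
import math

def variogram(h, sill=1.0, rng=10.0):
    """Assumed exponential semivariogram with zero nugget."""
    return sill * (1.0 - math.exp(-3.0 * h / rng))

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ordinary_kriging(samples, target):
    """Return (estimate, weights) at `target` from (x, y, z) samples."""
    n = len(samples)
    # Semivariances between samples, bordered by the unbiasedness constraint.
    a = [[variogram(distance(samples[i], samples[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(distance(s, target)) for s in samples] + [1.0]
    weights = solve(a, b)[:n]  # last unknown is the Lagrange multiplier
    estimate = sum(w * s[2] for w, s in zip(weights, samples))
    return estimate, weights

# Hypothetical wells: (x, y, groundwater level)
wells = [(0.0, 0.0, 10.0), (1.0, 0.0, 20.0), (0.0, 1.0, 30.0), (1.0, 1.0, 40.0)]
estimate, weights = ordinary_kriging(wells, (0.5, 0.5))
```

The extra row and column of ones force the weights to sum to one, which is what makes the estimate unbiased when the mean is unknown but constant.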

Assessment Methods
The accuracy of the spatial interpolation methods was assessed on the basis of three statistical criteria: root mean square error (RMSE), mean error (ME), and coefficient of determination ($R^2$). A model is considered valid when RMSE is as low as possible and ME is near zero, and the closer $R^2$ is to one, the better the model.
1) RMSE is an important parameter that indicates the accuracy of spatial analysis in geographic information systems and remote sensing [20]. A lower RMSE indicates that an interpolator is likely to give the most reliable estimates in areas with no data. The minimum RMSE calculated by cross-validation can be used to find the best control parameters for a spatial interpolation model [21] [22].
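The root mean square error (RMSE) was calculated for each model prediction using the formula [23]:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(z(x_i) - \hat{z}(x_i)\right)^2}$$
where the sum is the sum of squared errors (observed minus estimated values), $z(x_i)$ is the observed value at point $x_i$, $\hat{z}(x_i)$ is the predicted value at point $x_i$, and $n$ is the number of pairs (errors).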
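2) The mean error (ME) is used to determine the degree of bias in the estimates, and it provides an absolute measure of the size of the error. Large values indicate larger discrepancies between predicted and observed values [24].
The ME formula is as below:
$$\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\left(z(x_i) - \hat{z}(x_i)\right)$$
where $z(x_i)$ is the observed value at point $x_i$, $\hat{z}(x_i)$ is the predicted value at point $x_i$, and $n$ is the number of samples.
3) Coefficient of determination: this is the square of the linear correlation coefficient ($R^2$). It is expressed as the ratio of the regression sum of squares to the total sum of squares. The value ranges between zero and one and is calculated by the following equation [25]:
$$R^2 = \left[\frac{\sum_{i=1}^{n}\left(Q_i - Q_{ave}\right)\left(P_i - P_{ave}\right)}{\sqrt{\sum_{i=1}^{n}\left(Q_i - Q_{ave}\right)^2 \sum_{i=1}^{n}\left(P_i - P_{ave}\right)^2}}\right]^2$$
where $Q_i$ and $P_i$ are the measured and predicted values at sample $i$, $Q_{ave}$ is the mean of the measured values, $P_{ave}$ is the mean of the predicted values, and $n$ is the number of samples used for prediction.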
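The three accuracy criteria can be sketched in a few lines of Python. The error is taken here as observed minus predicted, which is an assumed sign convention (it affects only the sign of ME), and $R^2$ is computed as the squared linear correlation between measured and predicted values. The function names and the sample numbers are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error of paired observed/predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mean_error(observed, predicted):
    """Mean error; sign convention assumed: observed minus predicted."""
    return sum(o - p for o, p in zip(observed, predicted)) / len(observed)

def r_squared(observed, predicted):
    """Squared linear correlation between observed and predicted values."""
    n = len(observed)
    q_ave = sum(observed) / n
    p_ave = sum(predicted) / n
    cov = sum((q - q_ave) * (p - p_ave) for q, p in zip(observed, predicted))
    var_q = sum((q - q_ave) ** 2 for q in observed)
    var_p = sum((p - p_ave) ** 2 for p in predicted)
    return cov ** 2 / (var_q * var_p)

# Hypothetical cross-validation pairs (metres)
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
scores = (rmse(observed, predicted),
          mean_error(observed, predicted),
          r_squared(observed, predicted))
```

Positive and negative errors cancel in ME, so a model can have ME near zero and still have a large RMSE; this is why the criteria are read together rather than individually.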

Exploratory Analysis
Before applying the spatial interpolation models, an exploratory analysis of the data should be performed using the geostatistical techniques supported by GIS programs. Spatial models give more representative results if the data are normally distributed and may give distorted results if they are not. The spatial statistics extension contains a set of tools for examining the distribution of the data, such as the histogram, the trend analysis tool, the semivariogram/covariance cloud, and some statistical indicators. Normally distributed data take the shape of the bell curve, in which the measures of central tendency (mean, median, and mode) coincide, the coefficient of skewness is equal to zero, and the coefficient of kurtosis is equal to 3.
A skewness coefficient close to zero and a kurtosis coefficient close to 3 indicate a normal distribution of the data [26] [27]. In this study, the skewness coefficient is 1.21 and the kurtosis coefficient is 2.9. The skewness coefficient shows that the distribution of the data is skewed to the right (positive skewness), which indicates that the majority of the samples have low values. The kurtosis coefficient shows that the distribution has average flatness, because its value is very close to 3.
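The two shape indicators discussed above can be computed from the central moments of the sample. The sketch below uses the population (moment) definitions, under which a normal distribution has skewness 0 and kurtosis 3; ArcGIS may apply slightly different sample corrections, so this is an assumption, and the function name and test data are hypothetical.

```python
def skewness_kurtosis(values):
    """Return (skewness, kurtosis) from central moments.

    Kurtosis is the non-excess form, so a normal distribution gives 3.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Symmetric data: skewness is exactly zero.
symmetric = skewness_kurtosis([1.0, 2.0, 3.0, 4.0, 5.0])
# Many low values and one high value: positive (right) skew.
right_skewed = skewness_kurtosis([1.0, 1.0, 1.0, 2.0, 10.0])
```

A positive skewness, as reported for the groundwater data, corresponds to the second case: most samples are low and a long tail extends toward high values.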
(Figure 3) The structure of the data and the extreme values and trends of the groundwater levels are described in three dimensions (X, Y, Z). The blue curve shows the
Spatial autocorrelation is used to detect and measure the similarity of contiguous phenomena, and it depends on comparing the value of the phenomenon with the average value of the structure (statistical value). If the difference between contiguous values is smaller than the difference between all values, the adjacent values are similar because of the similarity of the surrounding conditions. In this case, there is positive spatial autocorrelation. If the values of adjacent phenomena differ, there is negative spatial autocorrelation. Moran's Index is one of the important measures for detecting spatial autocorrelation between the elements of the phenomenon studied and for determining whether the pattern of the phenomenon is clustered, dispersed (regular), or random. The value of Moran's Index lies between +1 and −1. A value close to +1 indicates a clustered pattern, a value close to −1 indicates a dispersed pattern, and a value close to zero indicates a random pattern [24].
(Figure 4) The statistical analysis of the spatial autocorrelation of the groundwater samples in the study area shows a clustered pattern, with a Moran's Index of 0.71 and Z = 11, significant at the 0.01 level.
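For readers who want to see how a Moran's Index value arises, the following is a minimal sketch of the global Moran's I statistic in plain Python. It assumes inverse-distance spatial weights between every pair of points; the weighting scheme used in the study is not stated, so this choice, the function name, and the sample points are all assumptions.

```python
import math

def morans_i(points):
    """Global Moran's I for (x, y, value) points with inverse-distance weights."""
    n = len(points)
    mean = sum(p[2] for p in points) / n
    dev = [p[2] - mean for p in points]
    cross = 0.0  # weighted sum of products of deviations
    s0 = 0.0     # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(points[i][0] - points[j][0],
                              points[i][1] - points[j][1])
            w = 1.0 / dist
            s0 += w
            cross += w * dev[i] * dev[j]
    return (n / s0) * cross / sum(v * v for v in dev)

# Two tight groups with similar values inside each group: clustered pattern.
clustered = morans_i([(0, 0, 1.0), (0, 1, 1.0), (10, 0, 5.0), (10, 1, 5.0)])
# High and low values alternate between neighbours: dispersed pattern.
alternating = morans_i([(0, 0, 1.0), (1, 0, 5.0), (2, 0, 1.0), (3, 0, 5.0)])
```

The first configuration gives a clearly positive index and the second a negative one, which mirrors the interpretation of values near +1 and −1.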
In addition, the spatial autocorrelation relationship is detected by the semivariogram/covariance cloud tool, as in Figure 5, where each red dot represents a pair of groundwater samples with similar values. This confirms that the data in areas close to each other tend to be more similar and the correlation
H. S. Njeban
After confirming that the groundwater data are normally distributed and contain no abnormal values, and before producing the spatial interpolation map of groundwater levels, it is necessary to compare and assess the spatial interpolation methods and choose the best model for representing the groundwater level in the study area.
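The semivariogram cloud mentioned above plots one point per pair of samples. A minimal sketch of how such a cloud can be built is given here, using the usual definition of the pairwise semivariance as half the squared difference between the two values; the function name and the sample points are hypothetical and do not come from the study data.

```python
import math
from itertools import combinations

def semivariogram_cloud(points):
    """Return a (distance, semivariance) tuple for every pair of (x, y, z) points."""
    cloud = []
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        h = math.hypot(x1 - x2, y1 - y2)  # separation distance of the pair
        gamma = 0.5 * (z1 - z2) ** 2      # half the squared value difference
        cloud.append((h, gamma))
    return cloud

# Hypothetical wells: (x, y, groundwater level)
cloud = semivariogram_cloud([(0.0, 0.0, 1.0), (3.0, 4.0, 3.0), (6.0, 8.0, 9.0)])
```

When nearby pairs show small semivariance and distant pairs show larger semivariance, the data are spatially autocorrelated, which is the pattern the text reports for the groundwater samples.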

Comparison and Choice of Optimal Model
In this study, 64 tests were conducted to find the model that most closely represents the actual groundwater level in the study area. The best variant was then chosen for each of the five models: 1) inverse distance weighted (IDW), 2) radial basis functions (RBF), 3) ordinary Kriging (OK), 4) simple Kriging (SK), and 5) universal Kriging (UK).
To compare the spatial interpolation models and select the best, the cross-validation technique provided by ArcGIS 10.5 was used to judge statistically the accuracy of these models in representing groundwater levels in the study area, as shown in Figures 6-9. The statistical comparison of the five spatial interpolation models, validated by cross-validation, found that the UK method is the best method for representing the groundwater level in Salman district because this model has the lowest root mean square error (RMSE), the lowest mean error (ME), and the highest coefficient of determination ($R^2$), as in Table 1 and Figure 10. This process is the first step toward a high-quality representation of the groundwater data and the production of a spatial prediction map of the groundwater level. The results also indicated that the radial basis functions (RBF) model is the least accurate of the spatial interpolation models according to the statistical indicators. The models rank in the following order of preference: UK > IDW > OK > SK > RBF.
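The selection procedure described above can be sketched as leave-one-out cross-validation: each well is removed in turn, its value is predicted from the remaining wells, and the candidate with the lowest RMSE is kept. The sketch below compares only IDW variants with different powers, as a stand-in for the 64 tests run in ArcGIS; the candidate powers, function names, and synthetic wells are assumptions for illustration.

```python
import math

def idw(samples, target, power):
    """Inverse distance weighted estimate at `target` from (x, y, z) samples."""
    num = den = 0.0
    for x, y, z in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return z
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den

def loo_rmse(samples, power):
    """Leave-one-out cross-validation RMSE for one IDW power."""
    sq = 0.0
    for i, (x, y, z) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # hold out sample i
        sq += (z - idw(rest, (x, y), power)) ** 2
    return math.sqrt(sq / len(samples))

def best_power(samples, powers=(1.0, 2.0, 3.0)):
    """Return (power with lowest cross-validation RMSE, all scores)."""
    scores = {p: loo_rmse(samples, p) for p in powers}
    return min(scores, key=scores.get), scores

# Synthetic wells on a 3 x 3 grid with a simple planar trend.
wells = [(float(i), float(j), float(i + j)) for i in range(3) for j in range(3)]
best, scores = best_power(wells)
```

The same loop applies to any interpolator that can be called as a function of the remaining samples and the held-out location, so the Kriging variants could be compared in the same way.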

The Maps Groundwater Level and Prediction Standard Error
The geostatistical techniques made it possible to find the best spatial interpolation method and to produce the groundwater level map and the prediction standard error map.
1) Groundwater levels largely follow the surface topography in the study area, lying farther from the surface in the highlands, as in the southwestern part of the study area. The groundwater level approaches the surface in lowlands such as valleys and depressions.
2) The presence of the AL-Salman depression in the middle of the study area brings the groundwater closer to the surface than in the areas around the depression.
3) Where the groundwater level intersects the ground surface, the groundwater emerges as seepage or springs, as in the far north of the study area or near the AL-Sulibat depression.
4) The hydraulic gradient of the groundwater is largely consistent with the topographic slope in the study area, from the south-southwest to the north and north-east.
5) The hydraulic gradient rate was 1.411 m/km, while the topographic slope
Figure 11. Map of groundwater level using the optimal chosen model.

Conclusions
The novel contribution of this research is the determination of the best spatial interpolation models for representing groundwater levels in the Salman area, after conducting comparisons and evaluation according

Figure 3. Trend analysis of groundwater level in study area.

Figure 4. Spatial autocorrelation of groundwater level samples in the study area.

Figure 5. Semivariogram cloud of groundwater level in the study area.

Figure 10. Statistical accuracy indicators RMSE, ME, and $R^2$ for the optimal model of each method.

Figure 13 shows the spatial distribution of the uncertainty of the spatial predictions of groundwater level in the study area. The uncertainty varies between 1.3 and 2.2 meters in areas near the measured points, and the error gradually increases to reach 11.79 to 18.87 meters in areas far from the measured points. These maps can be used as a basis for finding the best locations for drilling wells in the study area: subtracting the groundwater surface layer from the digital elevation model in ArcGIS produces a map of the depth to groundwater, which can be used in the planning, development, and decision-making processes of the relevant actors in Iraq.
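The subtraction described above is a cell-by-cell raster operation. A minimal sketch follows, assuming that both layers share the same grid and that the groundwater level is expressed as an elevation above sea level in meters; the function name and the small 2 x 2 grids are hypothetical and stand in for the ArcGIS raster layers.

```python
def depth_to_groundwater(dem, water_level):
    """Cell-by-cell DEM elevation minus groundwater level (same grid, metres)."""
    return [[elev - level for elev, level in zip(dem_row, level_row)]
            for dem_row, level_row in zip(dem, water_level)]

# Hypothetical 2 x 2 rasters (metres above sea level)
dem = [[120.0, 118.0], [115.0, 110.0]]
level = [[100.0, 101.0], [102.0, 110.0]]
depth = depth_to_groundwater(dem, level)
```

A depth of zero marks a cell where the groundwater surface meets the ground surface, which is where seepage or springs would be expected.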
[17] high and low values that differ from the input data set and are highly sensitive to extreme values because of the inclusion of original data values in the sample [16] [17]. It is calculated by the following Equation (2):
[16] l of the hands, but predicts the values that lie above the maximum values is below the minimum [15] [16]. One of the advantages of this model is that it generates enough statistical surface from a few samples. The disadvantages of this method are to
, $n$ is the number of samples, (sum of squared errors) observed-estimated (values), $z(x_i)$ is the observed value at point $x_i$, and $\hat{z}(x_i)$ is the predicted value at point $x_i$.