Application of Artificial Neural Networks Model as Analytical Tool for Groundwater Salinity

The main source of water in the Gaza Strip is the shallow coastal aquifer. It is severely deteriorated in terms of salinity, which is influenced by many variables. The relation between these variables and salinity is often complex and nonlinear, making it suitable for modeling by Artificial Neural Networks (ANN). Initially, it was assumed that salinity (represented by chloride concentration, mg/l) may be affected by variables such as recharge rate, abstraction, abstraction average rate, life time and aquifer thickness. Data were extracted from 56 municipal wells covering the area of the Gaza Strip. After a number of modeling trials, the best neural network was determined to be a Multilayer Perceptron (MLP) network with four layers: an input layer of 6 neurons, a first hidden layer of 10 neurons, a second hidden layer of 7 neurons and an output layer of 1 neuron, which gives the final chloride concentration. The ANN model generated very good results, as shown by the high correlation between the observed and simulated values of chloride concentration: the correlation coefficient (r) was 0.9848. This high value of r showed that the chloride concentrations simulated by the ANN model were in very good agreement with the observed chloride concentrations, which means that the ANN model is useful and applicable for groundwater salinity modeling. The ANN model was successfully utilized as an analytical tool to study the influence of the input variables on chloride concentration. It showed that chloride concentration in groundwater is reduced by decreasing abstraction, abstraction average rate and life time. Furthermore, it is reduced by increasing recharge rate and aquifer thickness.


Introduction
The main source of water in the Gaza Strip is the shallow aquifer, which is part of the coastal aquifer. The quality of the groundwater is severely deteriorated in terms of salinity and nitrates. Salinity in the Gaza coastal aquifer is often described by the chloride concentration in groundwater. Depending on location and hydrochemical processes, rates of salinization may be gradual or sudden [1].
Salinization of groundwater may be caused by one or a combination of different processes, including seawater intrusion, migration of brines from the deeper parts of the aquifer, dissolution of soluble salts in the aquifer (water-rock interaction), and discharges from older formations surrounding the coastal aquifer. Potential man-induced (anthropogenic) sources include agricultural return flows, wastewater seepage, and disposal of industrial wastes [2]. In addition, water quality (e.g. salinization) is influenced by many factors such as flow rate, contaminant load, medium of transport, water levels, initial conditions and other site-specific parameters. The estimation of such variables is often a complex and nonlinear process, making it suitable for Artificial Neural Network (ANN) application [3].
The aim of this article is to develop an ANN model to study the relation between groundwater salinity (represented by chloride concentration, mg/l) and hydrological variables such as recharge rate (R), abstraction (Q), abstraction average rate (Qr), life time (Lt), and aquifer thickness (Th). Understanding the spatial relations between hydrological variables and groundwater salinity can contribute to integrated water resources management. Modeling groundwater salinity with traditional modeling software consumes a lot of effort and requires a large quantity of data, while ANN could provide an easy and efficient tool for modeling and prediction that helps in water resources management. This research might be considered one of the few contributions to quantitative modeling of the relation between groundwater salinity and hydrological variables at the spatial scale using ANN.

Groundwater Salinity in Gaza Strip
The Gaza Strip is a narrow strip of land on the Mediterranean coast. The area is bounded by the Mediterranean in the west, the 1948 cease-fire line in the north and east, and Egypt in the south. The total area of the Gaza Strip is 365 km²; it is approximately 40 km long, and its width varies from 8 km in the north to 14 km in the south [4]. Figure 1 shows the regional and location map of the Gaza Strip.
The Gaza Strip is one of the places where the level of resource exploitation exceeds the carrying capacity of the environment. This is especially true for the water and land resources, which are under high pressure and subject to severe over-exploitation, pollution and degradation. The quality of the groundwater is a major problem in the Gaza Strip. The aquifer is highly vulnerable to pollution. The domestic water is becoming more saline every year, and average chloride concentrations of 500 mg/l or more are no longer an exception. Most of the public water supply wells do not comply with drinking water quality standards: concentrations of chloride and nitrate exceed the World Health Organization (WHO) standards in most drinking water wells of the area and represent the main groundwater quality problem. Over-pumping of groundwater and salt water intrusion are the main reasons behind the high chloride concentration [2].
The chloride concentration has clearly increased significantly all over the Gaza Strip, especially in the south-eastern and middle areas. The best water quality is found in the sand dune areas in the north, mainly in the range of 50-250 mg/l. Figure 2 and Figure 3 present the average chloride concentration of pumped groundwater of the Gaza Strip for the years 2002 and 2007.

Brief description of Artificial Neural Networks
ANN refers to computing systems whose central theme is borrowed from the analogy of biological neural networks. They represent highly simplified mathematical models of biological neural networks. They include the ability to learn and generalize from examples to produce meaningful solutions to problems even when input data contain errors or are incomplete, to adapt solutions over time to compensate for changing circumstances, and to process information rapidly [8].
The brain consists of a large number of neurons, connected with each other by synapses. These networks of neurons are called neural networks, or natural neural networks. An ANN is a simplified mathematical model of a natural neural network. ANNs are a new information-processing and computing technique inspired by biological neuron processing [9]. The human brain is a collection of more than 10 billion interconnected neurons. Each neuron is a cell that uses biochemical reactions to receive, process, and transmit information [10]. Figure 4 presents a mammalian neuron. Treelike networks of nerve fibers called dendrites are connected to the cell body or soma, where the cell nucleus is located. Extending from the cell body is a single long fiber called the axon, which eventually branches into strands and sub-strands and is connected to other neurons through synaptic terminals or synapses. The transmission of signals from one neuron to another at synapses is a complex chemical process in which specific transmitter substances are released from the sending end of the junction [10].
Artificial neurons connected together form a network. The structure of an ANN is, as a rule, layered. Three functional groups can be distinguished in an ANN: the inputs, which receive signals from outside the network and introduce them into it; the neurons which process information; and the neurons which generate results. A model of the artificial neuron is shown in Figure 5.
An ANN is an informational system simulating the ability of a biological neural network by interconnecting many simple artificial neurons. The neuron accepts inputs from a single or multiple sources and produces outputs by simple calculations, processing them with a predetermined non-linear function [12].
Most ANNs have three layers or more: an input layer, which is used to present data to the network; an output layer, which is used to produce an appropriate response to the given input; and one or more intermediate layers, which act as a collection of feature detectors. Determination of an appropriate network architecture is one of the most important, but also one of the most difficult, tasks in the model-building process. Unless carefully designed, an ANN model can lead to over-parameterization, resulting in an unnecessarily large network [13]. Figure 6 shows a schematic description of a general ANN model of three layers.
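The behaviour of a single artificial neuron described above can be sketched in a few lines of Python. This is only an illustration: the logistic (sigmoid) function is assumed here as the non-linear function, and the input values, weights and bias are hypothetical, not taken from the study.

```python
import math

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs passed through a logistic (sigmoid) function.

    The sigmoid is an assumed choice of non-linear function.
    """
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-activation))

# Hypothetical values, for illustration only.
y = neuron_output(inputs=[0.5, 0.2, 0.9], weights=[0.4, -0.6, 0.1], bias=0.0)
```

The output always lies strictly between 0 and 1, which is why inputs and targets are usually scaled before training.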

Construction of Data Matrix for ANN Model
In order to model groundwater salinity in the Gaza Strip using ANN, it is necessary to gather data for training purposes. The training data must include a number of cases, each containing values for the input and output variables. The first decisions needed are which variables to use, and how many (and which) cases to gather. The choice of variables is guided, at least initially, by intuition. Understanding of and expertise in the problem domain and its conditions give an initial idea of which input variables are likely to be influential. Once in the ANN tool, variables can be selected and deselected; the ANN can also experimentally determine useful variables [15]. As a first pass, any variable which could have an influence on groundwater salinity should be included in the initial studies.
The required data were extracted mainly from the domestic wells in the Gaza Strip because these wells are usually tested for quality twice a year, in February and October. The quality test includes the chloride concentration test, which provides a good opportunity to monitor groundwater salinity in the Gaza Strip and its changes twice a year. The assumed variables were gathered, studied, validated and rearranged to create the training data matrix, which should contain many hundreds of cases, each containing values for the input variables and the output.
In this research, it is necessary to deal with regular time series data to construct the training data matrix, so many sources of data were neglected because they lacked the complete required data. Since detailed abstraction records could not be obtained for years prior to 1996, the model period, which includes modeling and calibration, runs from 1997 to 2006. There are an estimated 4000 wells within the Gaza Strip; almost all of these wells are privately owned and used for agricultural purposes. Approximately 100 wells are owned and operated by municipalities and are used for domestic supply [16]. In this research, data were extracted from 56 wells, most of which are municipal wells, and they cover almost the total area of the Gaza Strip, as shown in Figure 7. The choice of these wells depended only on the availability of the required data.

Selection of the Variables of ANN Model
Hydrogeologically, the change in chloride concentration (salinity) was assumed to depend on many variables such as infiltration, abstraction, life time of abstraction from the aquifer, groundwater level, aquifer depth, aquifer thickness, and distance from the sea shore line. The variables are described in Table 1.

Time Distribution Phases of ANN Model Data
The model data were extracted mainly from domestic wells in the Gaza Strip because they usually have records of chloride concentration twice a year, in February and October. The time distribution divides the year into two phases, A and B. Phase A runs from April to September and phase B runs from October to March of the next year. For example, time phase 1997-A extends from April 1997 to September 1997, time phase 1997-B extends from October 1997 to March 1998, time phase 1998-A extends from April 1998 to September 1998, and so on. All other factors were organized accordingly. As the current data were collected from limited sources (56 municipal wells), they may constitute clusters. Therefore, the distribution of each variable across its range in the database was examined. The mean, standard deviation and range of the different variables used to train the ANN are shown in Table 1. The frequency distributions of the variables across the range of the 499 cases are represented graphically as histograms with a normal distribution curve in Figure 8.
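The phase rule described above (A: April to September; B: October to March of the next year) can be expressed as a small helper. The function name and the label format are illustrative assumptions; only the month boundaries come from the text.

```python
from datetime import date

def time_phase(d):
    """Return the phase label (for example '1997-A') for a calendar date."""
    if 4 <= d.month <= 9:
        return f"{d.year}-A"
    # October-December belong to phase B of the same year;
    # January-March belong to phase B of the previous year.
    year = d.year if d.month >= 10 else d.year - 1
    return f"{year}-B"

example = time_phase(date(1998, 2, 15))
```

With this rule a chloride record taken in February 1998 falls in phase 1997-B, and a record taken in October 1997 falls in the same phase.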

Building ANN Model
The procedural steps in building and applying an ANN model vary according to the tool used to build the model. Using STATISTICA Neural Networks (SNN), the procedure involves the following steps.

Data Importation
This step feeds the data matrix to SNN to train the network, either by importing it or through the data entry process. The data must be in an acceptable format such as a spreadsheet. The input data are the cases that the network uses to train itself.

Problem Definition
The problem was defined by specifying the input (independent) variables and the output (dependent) variable for the ANN model. Initially, there were nine input variables and one output variable, as mentioned in Table 1. Organizing the ANN model data required constructing some hundreds of data cases of input and output variables. These cases form the data matrix. The data were organized using Microsoft Excel and Access. The data matrix is considered the raw material of the ANN model.

Network Design
Network design involves determining the appropriate network architecture among the available networks, based on the type of data and the problem. After many trials, the Multilayer Perceptron (MLP) network was chosen because of its high capability to generalize well in problems plagued with significant heterogeneity and nonlinearity.

Analysis of ANN Model Data
Considering only those cases that have complete numeric values for all variables, without any missing data, only 499 cases from 1997 to 2004 satisfy this criterion. An ANN model might perform well over an entire space only when the training data are evenly distributed in that space.
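The selection of complete cases can be sketched as a simple filter. The column names below are hypothetical labels for the variables named in the text, and the rows are invented for illustration; they are not records from the study.

```python
def complete_cases(rows, columns):
    """Keep only rows that have a numeric value for every listed column."""
    def is_number(v):
        # v == v is False for NaN, so NaN counts as missing.
        return isinstance(v, (int, float)) and not isinstance(v, bool) and v == v
    return [r for r in rows if all(is_number(r.get(c)) for c in columns)]

# Hypothetical rows, for illustration only.
rows = [
    {"Clo": 330.0, "R": 20.0, "Clf": 335.0},
    {"Clo": None, "R": 5.0, "Clf": 410.0},           # missing input
    {"Clo": float("nan"), "R": 1.0, "Clf": 500.0},   # NaN input
    {"R": 3.0, "Clf": 250.0},                        # column absent
]
kept = complete_cases(rows, ["Clo", "R", "Clf"])
```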

Network Training
Once the type of network had been chosen, the conditions for stopping the training process were set before the network was trained. Training was controlled by conditions such as the maximum number of iterations, the target performance (which specifies the tolerance between the neural network prediction and the actual output), the maximum run time and the minimum allowed gradient.
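The four stop conditions listed above can be combined in one training loop, as in the following sketch. It is not the SNN implementation: the `step` callable, the default thresholds and the toy problem are all hypothetical and serve only to show how the conditions interact.

```python
import time

def train(step, max_iterations=1000, target_error=1e-3,
          max_seconds=60.0, min_gradient=1e-6):
    """Run `step()` until one of the stop conditions is met.

    `step` is a hypothetical callable that performs one training
    iteration and returns (error, gradient_norm).
    """
    start = time.monotonic()
    for iteration in range(1, max_iterations + 1):
        error, grad_norm = step()
        if error <= target_error:
            return iteration, "target performance reached"
        if grad_norm <= min_gradient:
            return iteration, "gradient below minimum"
        if time.monotonic() - start >= max_seconds:
            return iteration, "maximum run time exceeded"
    return max_iterations, "maximum iterations reached"

def make_step(w0=1.0, learning_rate=0.1):
    """Toy problem: minimise f(w) = w**2 by gradient descent."""
    state = {"w": w0}
    def step():
        grad = 2.0 * state["w"]
        state["w"] -= learning_rate * grad
        return state["w"] ** 2, abs(grad)
    return step

iterations, reason = train(make_step())
```

In this toy run the error tolerance is reached long before the iteration or time limits, so the loop stops on the target performance condition.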

Network Calibration
A network is trained continuously in order to make the model perform as well as possible on the training set. However, after some time, it is very possible for the network to "memorize" the training set instead of learning from it. Calibration is used to prevent this memorization. Calibration is a criterion which indicates that the network has been trained enough, thus stopping the iteration process.
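One common way to implement such a criterion is early stopping on a validation set: training stops when the validation error has not improved for a given number of iterations. The text does not state how SNN performs calibration, so the sketch below is an assumption, and the `patience` parameter and error values are hypothetical.

```python
def early_stopping(validation_errors, patience=3):
    """Return the index of the iteration whose weights should be kept.

    Training is considered finished once the validation error has not
    improved for `patience` consecutive iterations.
    """
    best_index, best_error = 0, float("inf")
    for i, err in enumerate(validation_errors):
        if err < best_error:
            best_index, best_error = i, err
        elif i - best_index >= patience:
            break
    return best_index

# Hypothetical validation errors: they fall, then rise as memorization sets in.
errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.70]
best = early_stopping(errors)
```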

Testing of Network
After the network has been successfully trained, it is tested against a set of cases withheld from it during training. The ANN is then ready to be applied to any other values of the variables. The results are presented statistically: regression analysis is used to measure the degree of correlation between the actual output and the network output. A correlation coefficient (r) of 1 indicates a perfect model, while an r of 0 indicates a very poor model. Mathematically, the value of r is given by Equation (1).
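Equation (1) is not reproduced in this text. The correlation coefficient between actual and network outputs is commonly computed as the Pearson coefficient, which is assumed here; the symbols $O_i$ (observed output), $P_i$ (predicted output), their means $\bar{O}$ and $\bar{P}$, and the number of cases $n$ are introduced for illustration.

```latex
r = \frac{\sum_{i=1}^{n} (O_i - \bar{O})(P_i - \bar{P})}
         {\sqrt{\sum_{i=1}^{n} (O_i - \bar{O})^{2}}\;\sqrt{\sum_{i=1}^{n} (P_i - \bar{P})^{2}}}
```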

Performance of ANN
The progress of the training was checked by plotting the training and test mean square errors versus the number of iterations performed, as presented in Figure 10. Figure 11 presents a comparison of the chloride concentration simulated using the ANN and the observed chloride concentration. Figure 11 shows a very high correlation between the observed and predicted values of chloride concentration. The correlation coefficient (r) between the predicted and observed output values of the ANN model is 0.9848. This high value of r showed that the chloride concentration values simulated using the ANN model were in very good agreement with the observed chloride concentration, which gave an initial impression that the ANN model is useful and applicable. The chloride concentration simulated using the ANN model and the observed chloride concentration on 1/10/2000 are presented in Figure 12.

Regression Statistics of ANN Model
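In regression problems, the purpose of the neural network is to learn a mapping from the input variables to a continuous output variable. The values of the regression statistics for the final ANN model are given in Table 2.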
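The regression statistics named in this paper (Data Mean, Data S.D., Error Mean, Abs E. Mean, S.D. Ratio and the correlation coefficient) can be computed as in the sketch below. The definitions are assumptions where the text does not give them: in particular, the S.D. Ratio is taken as the standard deviation of the errors divided by the standard deviation of the observed data, and sample standard deviations are used. The numbers are hypothetical.

```python
import math

def regression_statistics(observed, predicted):
    """Summary statistics comparing predicted values with observed values."""
    def mean(xs):
        return sum(xs) / len(xs)

    def sd(xs):
        m = mean(xs)
        return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

    errors = [p - o for o, p in zip(observed, predicted)]
    mo, mp = mean(observed), mean(predicted)
    covariance = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    spread = math.sqrt(sum((o - mo) ** 2 for o in observed)
                       * sum((p - mp) ** 2 for p in predicted))
    return {
        "data_mean": mo,
        "data_sd": sd(observed),
        "error_mean": mean(errors),
        "abs_error_mean": mean([abs(e) for e in errors]),
        "sd_ratio": sd(errors) / sd(observed),  # assumed definition
        "correlation": covariance / spread,
    }

# Hypothetical chloride values (mg/l): every prediction is 10 mg/l too high.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 210.0, 310.0, 410.0]
stats = regression_statistics(observed, predicted)
```

In this example the correlation is 1 although every prediction is biased by 10 mg/l, which is why the error statistics are reported alongside r.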

Response Graph
A response graph shows the effect on the output variable prediction of adjusting one input (independent) variable. The ANN model was utilized to study the influence of the input variables on the output variable, which is chloride concentration. Figure 13 presents a response graph for each input variable of the final ANN model.
Figures 13(a,c-e) indicate that chloride concentration increases nonlinearly as initial chloride concentration, abstraction, abstraction average rate and life time increase. Figures 13(b,f) indicate that chloride concentration decreases nonlinearly as recharge rate and aquifer thickness increase.

Response Surface
A response surface is a figure that shows the effect on the output variable prediction of adjusting two input (independent) variables. The ANN model was utilized to study the combined influence of each pair of input variables on chloride concentration. Figure 14(a) indicates that the chloride concentration increases nonlinearly as recharge decreases and abstraction increases, and that the effect of recharge is stronger than the effect of abstraction. Figure 14(b) indicates that the chloride concentration increases nonlinearly as recharge decreases and abstraction average rate increases, and that the effect of recharge is similar to the effect of abstraction average rate. Figure 14(c) indicates that the chloride concentration increases nonlinearly as life time increases and recharge decreases, and that the effect of recharge is stronger than the effect of life time. Figure 14(d) indicates that the chloride concentration increases nonlinearly as recharge and aquifer thickness decrease, and that the effect of aquifer thickness is stronger than the effect of recharge.
Figure 14(e) indicates that the chloride concentration increases nonlinearly as abstraction and abstraction average rate increase, and that the effect of abstraction average rate is similar to the effect of abstraction. Figure 14(f) indicates that the chloride concentration increases nonlinearly as abstraction and life time increase, and that the effect of life time is similar to the effect of abstraction. Figure 14(g) indicates that the chloride concentration increases nonlinearly as abstraction increases and aquifer thickness decreases. In addition, the effect of aquifer thickness is stronger than the effect of abstraction.
Figure 14(h) indicates that the chloride concentration increases nonlinearly as abstraction average rate and life time increase, and that the effect of abstraction average rate is stronger than the effect of life time. Figure 14(i) indicates that the chloride concentration increases nonlinearly as abstraction average rate increases and aquifer thickness decreases, and that the effect of aquifer thickness is similar to the effect of abstraction average rate.
Figure 14(j) indicates that the chloride concentration increases nonlinearly as life time increases and aquifer thickness decreases. In addition, the effect of aquifer thickness is similar to the effect of life time.
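A response surface of this kind can be produced by evaluating a model on a grid over two inputs while the remaining inputs are held fixed. The sketch below uses a hypothetical linear stand-in (`toy_model`) in place of the trained network, and the base case recharge value is also an assumption; the other base values and the ranges are those quoted in the text.

```python
def response_surface(model, base_case, var_x, range_x, var_y, range_y, steps=10):
    """Evaluate `model` on a grid over two inputs; other inputs stay at base_case."""
    def linspace(lo, hi, n):
        return [lo + (hi - lo) * k / (n - 1) for k in range(n)]

    xs, ys = linspace(*range_x, steps), linspace(*range_y, steps)
    surface = []
    for x in xs:
        row = []
        for y in ys:
            case = dict(base_case, **{var_x: x, var_y: y})
            row.append(model(case))
        surface.append(row)
    return xs, ys, surface

def toy_model(case):
    # Hypothetical stand-in, NOT the trained ANN: chloride falls with
    # recharge (R) and rises with abstraction (Q).
    return case["Clo"] - 0.5 * case["R"] + 0.1 * case["Q"]

# R = 20.0 is an assumed value; the others are quoted in the text.
base_case = {"Clo": 330.0, "R": 20.0, "Q": 105.0, "Qr": 22.0, "Lt": 22.0, "Th": 65.0}
xs, ys, surface = response_surface(
    toy_model, base_case, "R", (0.0, 80.0), "Q", (0.0, 250.0)
)
```

Each row of `surface` corresponds to one recharge value and each column to one abstraction value, so the grid can be passed directly to a surface or contour plotting routine.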

Utilizing ANN Model as Analytical Tool
The ANN model was utilized to study the influence of the input variables on chloride concentration, using hypothetical cases of the input variables. Three analysis conditions were assumed. In the first (normal) condition, the input variables were fixed at their mean values while the studied variable was changed gradually from its minimum to its maximum value within its range.
In the second (increasing) condition, abstraction, abstraction average rate and life time were fixed at the mean plus one standard deviation, and recharge rate and aquifer thickness were fixed at the mean minus one standard deviation, which produces conditions that lead to an increase in chloride concentration in groundwater.
In the third (decreasing) condition, abstraction, abstraction average rate and life time were fixed at the mean minus one standard deviation, and recharge rate and aquifer thickness were fixed at the mean plus one standard deviation, which produces conditions that lead to a decrease in chloride concentration in groundwater.
To obtain the gradual change of the studied input variable from its minimum to its maximum value, the range was divided into ten steps and the value was increased step by step from the minimum to the maximum. Table 3 presents the hypothetical values of the gradual change of the input variables. The hypothetical values of the input variables for the three analysis conditions were computed as explained above and are presented in Table 4.
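The construction of the hypothetical cases can be sketched as follows. The variable statistics in `STATS` are illustrative numbers, not the values of Table 1, and the function and dictionary names are assumptions; only the rule (mean, mean plus or minus one standard deviation, ten steps across the range) comes from the text.

```python
def hypothetical_cases(stats, studied, condition="normal", steps=10):
    """Build the cases for one studied variable under one analysis condition.

    `stats` maps a variable name to a dict with 'mean', 'sd', 'min', 'max'.
    """
    raises_chloride = {"Q", "Qr", "Lt"}   # larger values increase chloride
    lowers_chloride = {"R", "Th"}         # larger values decrease chloride
    sign = {"normal": 0, "increasing": 1, "decreasing": -1}[condition]

    base = {}
    for name, s in stats.items():
        if name in raises_chloride:
            base[name] = s["mean"] + sign * s["sd"]
        elif name in lowers_chloride:
            base[name] = s["mean"] - sign * s["sd"]
        else:
            base[name] = s["mean"]   # e.g. initial chloride concentration

    lo, hi = stats[studied]["min"], stats[studied]["max"]
    cases = []
    for k in range(steps + 1):       # ten steps give eleven values
        case = dict(base)
        case[studied] = lo + (hi - lo) * k / steps
        cases.append(case)
    return cases

# Illustrative statistics only (not Table 1).
STATS = {
    "Clo": {"mean": 330.0, "sd": 100.0, "min": 50.0, "max": 1000.0},
    "R":   {"mean": 20.0,  "sd": 10.0,  "min": 0.0,  "max": 80.0},
    "Q":   {"mean": 105.0, "sd": 41.0,  "min": 0.0,  "max": 250.0},
    "Qr":  {"mean": 22.0,  "sd": 7.0,   "min": 5.0,  "max": 50.0},
    "Lt":  {"mean": 22.0,  "sd": 14.0,  "min": 1.0,  "max": 60.0},
    "Th":  {"mean": 65.0,  "sd": 25.0,  "min": 10.0, "max": 120.0},
}
cases = hypothetical_cases(STATS, studied="R", condition="increasing")
```

Each case can then be passed to the trained model to obtain the final chloride concentration for that value of the studied variable.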

Influence of Recharge Rate on Chloride Concentration
By applying the above-mentioned procedure and using the final ANN model to calculate the final chloride concentration for each hypothetical case, the effect of recharge rate on chloride concentration was studied. The results for the three conditions (normal, increasing and decreasing) are presented in Figure 15 and Table 5.
Increasing the recharge rate from 0 to 80 mm/m²/month had a large influence on the final chloride concentration. The stabilization point of chloride concentration occurred at a recharge rate of 22 mm/m²/month for the normal condition and at 52 mm/m²/month for the increasing condition, which means that the increasing condition requires a higher recharge rate to reach the stabilization point. In the decreasing condition, the final chloride concentration stayed below 330 mg/l, ranging from 316.89 mg/l to 268.75 mg/l, for all values of recharge rate, even when the recharge rate was small.
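If the stabilization point is read as the recharge rate at which the final chloride concentration equals the initial concentration (330 mg/l in these cases), it can be located on a simulated curve by linear interpolation. This reading and the function below are assumptions, and the sample curve is hypothetical rather than model output.

```python
def stabilization_point(recharge, final_cl, initial_cl):
    """Recharge rate at which final chloride first equals the initial value.

    Uses linear interpolation between neighbouring points and returns
    None when the curve never crosses the initial concentration.
    """
    points = list(zip(recharge, final_cl))
    for (r0, c0), (r1, c1) in zip(points, points[1:]):
        crosses = (c0 - initial_cl) * (c1 - initial_cl) <= 0
        if crosses and c0 != c1:
            return r0 + (initial_cl - c0) * (r1 - r0) / (c1 - c0)
    return None

# Hypothetical curve (mg/l) over recharge rates in mm/m2/month.
recharge = [0.0, 20.0, 40.0, 60.0, 80.0]
final_cl = [350.0, 335.0, 320.0, 305.0, 290.0]
point = stabilization_point(recharge, final_cl, initial_cl=330.0)
```

For a curve that stays below the initial concentration over the whole range, as reported for the decreasing condition, the function returns `None`.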

Influence of Abstraction on Chloride Concentration
By applying the above-mentioned procedure and using the final ANN model to calculate the final chloride concentration for each hypothetical case, the effect of abstraction on chloride concentration was studied. The results for the three conditions (normal, increasing and decreasing) are presented in Figure 16. Increasing abstraction from 0 to 250 m³/hr had only a small influence on the final chloride concentration.

Impacts of Abstraction Average Rate on Chloride Concentration
By applying the above-mentioned procedure and using the final ANN model to calculate the final chloride concentration for each hypothetical case, the effect of abstraction average rate on chloride concentration was studied. [...]

1) [...] chloride concentration in groundwater) and some related hydrological factors such as recharge rate (R), abstraction (Q), abstraction average rate (Qr), life time (Lt) and aquifer thickness (Th).
2) The best neural network was a Multilayer Perceptron (MLP) network with four layers: an input layer of 6 neurons, a first hidden layer of 10 neurons, a second hidden layer of 7 neurons and an output layer of 1 neuron, which gives the final chloride concentration (Cl_f).
3) The new approach generated very good results, as shown by the high correlation between the observed and predicted values of chloride concentration. The correlation coefficient (r) between the observed and predicted output values of the ANN model was 0.9848. This high value of r showed that the chloride concentration values simulated using the ANN model were in very good agreement with the observed chloride concentration, which means that the ANN model is useful and applicable.
4) The ANN model showed that chloride concentration in groundwater is reduced by decreasing abstraction (Q), abstraction average rate (Qr) and life time (Lt). Furthermore, it is reduced by increasing recharge rate (R) and aquifer thickness (Th).
5) Therefore, the current research showed that the ANN model can be used in groundwater quality management and that it is comparable to other approaches in use, such as groundwater modelling and statistical modelling.
6) It showed that effective remedial actions for solving the groundwater deterioration (salinity) problem in the aquifer of the Gaza Strip are reducing the abstraction rate and increasing the recharge quantities to the aquifer.

Figure 3. Average chloride concentration of pumped groundwater of Gaza Strip for the year 2007 [7].

Figure 6. Schematic description of a general ANN model of three layers.

Figure 7. Study wells location in Gaza Strip.

Figure 8. Frequency distribution of the variables across the range of 499 cases. (a) Frequency distribution of Cl_o; (b) Frequency distribution of R; (c) Frequency distribution of Q; (d) Frequency distribution of Qr; (e) Frequency distribution of Lt; (h) Frequency distribution of Th.

Topology of ANN
After a number of training trials, the best neural network was determined to be a Multilayer Perceptron (MLP) network with four layers: an input layer of 6 neurons, a first hidden layer of 10 neurons, a second hidden layer of 7 neurons and an output layer of 1 neuron, as shown in Figure 9. The six input neurons are: initial chloride concentration (Cl_o), recharge rate (R), abstraction (Q), abstraction average rate of area (Qr), life time (Lt) and aquifer thickness (Th). The output neuron gives the final chloride concentration (Cl_f).
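The 6-10-7-1 topology can be illustrated with a minimal forward pass in plain Python. Only the layer sizes come from the text: the tanh hidden activation, the linear output, the random weight initialisation and the scaled input values are assumptions, and the weights are untrained.

```python
import math
import random

LAYER_SIZES = [6, 10, 7, 1]  # input, hidden 1, hidden 2, output (from the text)

def init_network(sizes, seed=0):
    """Create one weight matrix and bias vector per pair of adjacent layers."""
    rng = random.Random(seed)
    return [
        {"W": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
         "b": [0.0] * n_out}
        for n_in, n_out in zip(sizes, sizes[1:])
    ]

def forward(network, inputs):
    """Propagate inputs through the network (tanh hidden units, linear output)."""
    a = inputs
    last = len(network) - 1
    for idx, layer in enumerate(network):
        z = [sum(w * x for w, x in zip(row, a)) + b
             for row, b in zip(layer["W"], layer["b"])]
        a = z if idx == last else [math.tanh(v) for v in z]
    return a

def count_parameters(network):
    """Total number of weights and biases."""
    return sum(len(l["W"]) * len(l["W"][0]) + len(l["b"]) for l in network)

network = init_network(LAYER_SIZES)
scaled_inputs = [0.1, 0.5, 0.3, 0.2, 0.4, 0.6]  # hypothetical scaled Cl_o, R, Q, Qr, Lt, Th
output = forward(network, scaled_inputs)
n_parameters = count_parameters(network)
```

With these layer sizes the network has 155 adjustable parameters (70 + 77 + 8), which is worth comparing with the 499 available cases when judging the risk of over-parameterization.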

Figure 11. Comparison of simulated chloride concentration using ANN model and the observed chloride concentration.

Figure 12. Comparison of simulated chloride concentration using ANN and the observed chloride concentration on 1/10/2000.

A network is successful at regression if it makes predictions with acceptable accuracy. SNN automatically calculates the correlation coefficient (r) between the actual and predicted outputs. A perfect prediction will have a correlation coefficient of 1.0. A correlation of 1.0 does not necessarily indicate a perfect prediction (only a prediction which is perfectly linearly correlated with the actual outputs), although in practice the correlation coefficient is a good indicator of performance. It also provides a simple and familiar way to compare the performance of neural networks with standard least squares linear fitting procedures. The degree of predictive accuracy needed varies from application to application. Regression statistics are listed as follows:
Data Mean: Average value of the target output variable.
Data S.D.: Standard deviation of the target output variable.

Figure 13. Response graph of each input variable of the ANN model. (a) Response graph of Qr; (b) Response graph of R; (c) Response graph of Q; (d) Response graph of Cl_o; (e) Response graph of Lt; (f) Response graph of Th.

Figure 14. Response surface of each pair of input variables of the ANN model. (a) Response surface of R & Q; (b) Response surface of R & Qr; (c) Response surface of R & Lt; (d) Response surface of R & Th; (e) Response surface of Q & Qr; (f) Response surface of Q & Lt; (g) Response surface of Q & Th; (h) Response surface of Qr & Lt; (i) Response surface of Th & Qr; (j) Response surface of Th & Lt.
Effect of increasing the recharge rate from 0 to 80 mm/m²/month on the final chloride concentration under the three conditions:
In the normal condition (initial chloride concentration = 330 mg/l, abstraction = 105 m³/hr, abstraction average rate = 22 mm/m²/month, life time = 22 years and aquifer thickness = 65 m), the final chloride concentration decreased from 346.00 mg/l to 290.91 mg/l. The final chloride concentration remained stable at 330 mg/l at a recharge rate of 28 mm/month.
In the increasing condition (initial chloride concentration = 330 mg/l, abstraction = 146 m³/hr, abstraction average rate = 29 mm/m²/month, life time = 36 years and aquifer thickness = 40 m), the final chloride concentration decreased from 363.06 mg/l to 309.93 mg/l. The final chloride concentration remained stable at 330 mg/l at a recharge rate of 52 mm/m²/month.
In the decreasing condition (initial chloride concentration = 330 mg/l, abstraction = 47 m³/hr, abstraction average rate = 16 mm/m²/month, life time = 8 years and aquifer thickness = 91 m), the final chloride concentration decreased from 316.89 mg/l to 268.75 mg/l. In this condition the final chloride concentration stayed below 330 mg/l for all values of recharge rate, even small ones, because of the favourable combination of low abstraction, low abstraction average rate, short life time and large aquifer thickness.

Figure 15. Impact of recharge rate on chloride concentration.

Figure 16. Effect of abstraction on chloride concentration.

Figure 18. Effect of life time on chloride concentration.

Figure 19. Effect of aquifer thickness on chloride concentration.

Table 2. The values of regression statistics for the final ANN model. Columns: Regression statistics; All model data; Training data set; Validation data set; Test data set.
Notes: Low values of Error Mean, Abs E. Mean and S.D. Ratio showed that the errors between the observed and simulated chloride concentration values using the ANN model are small; the high value of the correlation coefficient (r) showed that the chloride concentration values simulated using the ANN model are in good agreement with the observed chloride concentration.

Table 4. Hypothetical values of input variables for the three analysis conditions.