Groundwater Level Prediction Using Artificial Neural Networks: A Case Study in Tra Noc Industrial Zone, Can Tho City, Vietnam

The objective of this study is to predict groundwater levels (GWLs) under different impact factors using an Artificial Neural Network (ANN) for a case study in Tra Noc Industrial Zone, Can Tho City, Vietnam. This is achieved by evaluating the current state of groundwater resources (GWR) exploitation, use and dynamics; setting up, calibrating and validating the ANN; and then predicting GWLs at different lead times. The results show that GWLs in the study area declined rapidly from 2000 to 2015, especially in the Middle-upper Pleistocene (qp2-3) and upper Pleistocene (qp3) aquifers, due to over-withdrawal by enterprises for production purposes. In response to this problem, an Official Letter of the People’s Committee of Can Tho City was issued and came into force in 2012, resulting in a reduction of exploitation. The calibrated ANN structures demonstrate that GWLs can be predicted considering different impact factors. The predicted results will help to raise awareness and to draw the attention of local and central government to the need for a clear GWR management policy for the Mekong Delta, especially for industrial zones in urban areas such as Can Tho City.

domestic and production for millions of people in the Mekong Delta [1] [2]. In the context of contaminated surface water and fluctuating downstream water levels caused by the construction of hydroelectric projects and the expansion of cultivated area in the upper Mekong, the role of GWR has become increasingly important since the 1990s [3]. In addition, urbanization, population growth, land use changes and climate change will degrade GWR in terms of quantity, quality and dynamics [4].
Many studies have investigated GWR dynamics using hydrogeological or statistical models. For instance, Radu Goru et al. (2001) [5] utilized a geological geographic information system (GIS) database that offers facilities for groundwater-vulnerability analysis and hydrogeological modelling had been de-
In the Mekong River basin, So Kazama et al. (2007) [8] determined the variation of GWR caused by flooding over inundated areas in the lower part of the basin using numerical modeling and field observations. The research concluded that flood control, which reduced the area of inundation, resulted in a reduction of GWR in the area. Thus, while flood control activities were vital to reduce negative flood impacts in the Mekong River basin, they also negatively affected GWR in the area. Babel et al. (2006) [9] studied the various negative impacts on the environment and society caused by land subsidence, which has been a problem in Bangkok, Thailand, since the 1970s. Intensive groundwater extraction for industrial and domestic purposes since the 1950s, which led to a decline of GWLs, was the primary cause of land subsidence.
Nguyen Tieng Vang and Tran Van Ty (2017) [10] conducted research in the Tra Noc Industrial Zone, Can Tho City, to assess the current status of exploitation, GWL changes and management of GWR. From this assessment, the relationship between groundwater extraction, the water level in the Bassac River (CTH-039803 station) and GWLs at monitoring stations/wells was established. The results showed that groundwater extraction in the Tra Noc Industrial Zone was very large; over-exploitation of GWR might be a major cause of the decrease in GWLs of the Pleistocene and Holocene aquifers of 4 m and 1 m, respectively, from 2000 to 2015. Rainfall and the Bassac River were found to be the major sources of recharge to the Holocene aquifer. In addition, management of GWR was not effective, and close coordination between enterprises and local GWR management agencies/departments was lacking.

Study Area and Data
Can Tho City is the youngest and largest urban area in the Mekong Delta and includes 8 industrial zones with a total area of over 2366 ha. These industrial zones are located along the national highways and the Bassac River, one of the two branches of the Mekong River after it enters Vietnam. Industrial activities have caused serious environmental problems such as pollution of water sources, microbial contamination and subsidence. Tra Noc Industrial Zone has been established and developed since the 1990s and comprises Tra Noc 1 Industrial Zone (Tra Noc Ward, Binh Thuy District) and Tra Noc 2 Industrial Zone (Phuoc Thoi Ward, O Mon District), with a total planned area of 300 hectares (Figure 1). Currently, there are 16 groundwater resources (GWR) monitoring stations/wells in Can Tho City, of which two stations (QT08 and QT16) are located in the study area. Each station has 3 monitoring wells at different depths in 3 aquifers: Middle-Upper Pleistocene (qp2-3), Upper Pleistocene (qp3) and Holocene (qh). From 2000 to 2015, the GWLs of the Pleistocene aquifers (qp3 and qp2-3) in the Tra Noc Industrial Zone declined rapidly, whereas the groundwater levels (GWLs) of the Holocene aquifer remained relatively stable.
Rainfall data at Can Tho station, river water levels at two stations, the average withdrawal discharge for industrial use and observed GWLs in the Pleistocene aquifer (qp2-3 and qp3 layers) at different monitoring wells were collected. The data and their sources are presented in Table 1.

Methodology
An Artificial Neural Network (ANN) consists of input, hidden and output layers, and each layer includes an array of processing elements (nodes). An ANN is characterized by its structure, which represents the pattern of connection between nodes, by its connection weights, and by its activation function. ANN models were developed using different combinations of the input parameters, and the best combination was selected based on the performance statistics.
Observed groundwater levels (GWLs) at a given time were first used to initialize the ANN model, which reproduces water level variations from the input variables (rainfall, river water levels and withdrawal discharge from pumping). The ANN structures selected by trial and error were first calibrated on a training dataset to perform 1-, 2- and 3-month-ahead predictions of GWLs using past observed GWLs and the input variables. Simulations were then produced on another dataset by iteratively feeding back the predicted GWLs, along with real data.
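The iterative feedback of predicted GWLs can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the study's implementation: `one_step_model` and `toy_model` are hypothetical stand-ins for a trained network, the window of three past GWL values is assumed, and all numbers are illustrative.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    one_step_model: Callable[[Sequence[float], Sequence[float]], float],
    past_gwl: Sequence[float],
    future_drivers: Sequence[Sequence[float]],
    n_lags: int = 3,
) -> List[float]:
    """Predict one step per entry of future_drivers, feeding predictions back."""
    window = list(past_gwl[-n_lags:])      # most recent observed GWLs
    predictions = []
    for drivers in future_drivers:         # real (observed) inputs for each step
        y_hat = one_step_model(window, drivers)
        predictions.append(y_hat)
        window = window[1:] + [y_hat]      # drop the oldest value, append the prediction
    return predictions


# Hypothetical stand-in for a trained network:
# mean of the lagged GWLs minus a small pumping effect.
def toy_model(lagged_gwl: Sequence[float], drivers: Sequence[float]) -> float:
    pumping = drivers[0]
    return sum(lagged_gwl) / len(lagged_gwl) - 0.1 * pumping


forecast = recursive_forecast(
    toy_model,
    past_gwl=[-10.0, -11.0, -12.0],          # illustrative GWLs (m)
    future_drivers=[[1.0], [1.0], [1.0]],    # illustrative pumping values
)
```

From the second step onward each prediction depends on earlier predictions, so errors can accumulate as the lead time grows.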

ANN Model Setting-Up
To develop the ANN, the neural network toolbox of Visual Gene Developer (http://www.visualgenedeveloper.net/) [13] was used. This toolbox provides the capability to design many different kinds of neural systems for various applications.

Data Pre- and Post-Processing
Data pre-processing was carried out to analyze and transform the input and output variables, to minimize noise and to highlight important relationships. The raw data were normalized between zero and one (unitless).
Pre-processing:
$$y'_t = \frac{y_t - a}{b - a}$$
Post-processing:
$$y_t = y'_t\,(b - a) + a$$
where $y_t$ is the observed data; $a$ and $b$ are the minimum and maximum values of the observed data, respectively; and $y'_t$ is the normalized value of the observed data.
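The scaling to the range zero to one, using the minimum a and maximum b of the observed data, and its inverse can be illustrated with a short Python sketch. The function names and the sample values are assumptions made for illustration; they are not taken from the study.

```python
import numpy as np


def normalize(y, a=None, b=None):
    """Min-max scale y to [0, 1]; a and b default to the min and max of y."""
    y = np.asarray(y, dtype=float)
    a = float(y.min()) if a is None else a
    b = float(y.max()) if b is None else b
    return (y - a) / (b - a), a, b


def denormalize(y_norm, a, b):
    """Map normalized values back to the original units (post-processing)."""
    return np.asarray(y_norm, dtype=float) * (b - a) + a


gwl = np.array([-8.0, -9.5, -11.0, -12.0])   # illustrative values, not study data
gwl_norm, a, b = normalize(gwl)
gwl_back = denormalize(gwl_norm, a, b)
```

The values of a and b found on the calibration data would normally be stored and reused, so that predictions can be converted back to metres.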

ANN Structures
The structure of ANN is determined by trial and error. The number of nodes in the hidden layers and the stopping criteria were optimized in terms of obtaining precise and accurate output. The activation function of the hidden/output layers was set to a sigmoid function as this proved by trial and error to be the best in depicting the non-linearity of the modeled natural system, among a set of other options. There is no well-established direct method for selecting the number of hidden nodes for an ANN model for a given problem. Thus the common trial-and-error approach remains the most widely used method [14]. Variables in the input vector to ANN models are presented in Table 2.
There are many kinds of neural networks, differing in structure, function and training method. A typical feedforward neural network trained with a back propagation learning algorithm was used. A typical neural network node is described by:
$$N = \sum_i w_i x_i, \qquad O = f(N)$$
where $x_i$ is the input vector, $O$ is the output vector, $w_i$ is a weight factor between two nodes and $f(N)$ is an activation function. Among the different kinds of activation functions, the sigmoid was used in this study. The back propagation learning algorithm is based on a generalized delta rule accelerated by a momentum term [15].
To improve the performance of the network, the weight factors were adjusted using the following equations:
$$\Delta w_{ij}(p) = \eta\,\delta_j(p)\,O_i(p) + \alpha\,\Delta w_{ij}(p-1)$$
$$w_{ij}(p+1) = w_{ij}(p) + \Delta w_{ij}(p)$$
where $\eta$ is the learning rate; $\alpha$ is the momentum coefficient; $\Delta w_{ij}(p-1)$ is the previous weight factor change; $O$ is the output; $\delta$ is the gradient-descent correction term; and $p$ stands for pattern. The learning rate ($\eta$) and the momentum coefficient ($\alpha$) were randomly generated from 0.01 to 1 and from 0 to 1, respectively.
The back propagation algorithm is applied as follows: 1) normalize the training data and initialize all weights (normally small random values between minus one and one); 2) compute the output of neurons in the hidden layer and in the output layer; 3) compute the error and the weight corrections; 4) update all weights and repeat steps 2 and 3 for all training data; 5) repeat steps 2 to 4 until the error converges to an acceptable level.
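The five steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the Visual Gene Developer implementation: the logistic form of the sigmoid, the fixed learning rate and momentum values, the four hidden nodes, the bias handling and the synthetic data are all choices made for the example.

```python
import numpy as np


def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))


def train(X, y, n_hidden=4, eta=0.5, alpha=0.5, epochs=300, seed=0):
    """Pattern-by-pattern back propagation with a momentum term."""
    rng = np.random.default_rng(seed)
    # Step 1: small random weights in [-1, 1]; the last row acts as the threshold.
    W1 = rng.uniform(-1, 1, (X.shape[1] + 1, n_hidden))
    W2 = rng.uniform(-1, 1, (n_hidden + 1, 1))
    dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)
    history = []
    for _ in range(epochs):                      # Step 5: repeat until acceptable
        sq_err = 0.0
        for x, target in zip(X, y):              # Step 4: loop over all patterns
            # Step 2: outputs of the hidden and output layers
            x_b = np.append(x, 1.0)
            h = sigmoid(x_b @ W1)
            h_b = np.append(h, 1.0)
            out = sigmoid(h_b @ W2)
            # Step 3: error and gradient-descent correction terms
            err = target - out
            delta_out = err * out * (1.0 - out)
            delta_hid = (W2[:-1] @ delta_out) * h * (1.0 - h)
            # Weight change = eta * delta * output + alpha * previous change
            dW2 = eta * np.outer(h_b, delta_out) + alpha * dW2
            dW1 = eta * np.outer(x_b, delta_hid) + alpha * dW1
            W2 += dW2
            W1 += dW1
            sq_err += float(err[0] ** 2)
        history.append(float(np.sqrt(sq_err / len(X))))   # RMSE per epoch
    return W1, W2, history


def predict(X, W1, W2):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    Hb = np.hstack([sigmoid(Xb @ W1), np.ones((len(X), 1))])
    return sigmoid(Hb @ W2).ravel()


# Synthetic, already-normalized data (illustrative only).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, (40, 2))
y = 0.25 + 0.5 * (0.7 * X[:, 0] + 0.3 * X[:, 1])
W1, W2, history = train(X, y)
y_hat = predict(X, W1, W2)
```

Because the output node is a sigmoid, the targets must lie between zero and one, which is why normalization is the first step.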
The performance of the trained network was checked by determining the error between the predicted value and the observed one.

Calibration and Validation
The available data were divided into two distinct sets, namely the training/calibration set and the validation set. As the training set is used by the neural network to learn the patterns present in the data, 70% of the data were allocated to the calibration set (2004-2012) and 30% to the validation set (2013-2015). In this study, the networks were selected based on best performance on the training set, and a final check on the performance of the trained network was made using the validation set.
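A chronological split at the boundary between the calibration and validation periods can be sketched as below. The helper name `chronological_split`, the monthly time axis and the GWL values are assumptions made for illustration; the study's actual records are not reproduced.

```python
import numpy as np


def chronological_split(dates, values, first_validation):
    """Split a time series at a date; no shuffling, so time order is preserved."""
    dates = np.asarray(dates)
    values = np.asarray(values)
    is_calibration = dates < np.datetime64(first_validation)
    return (
        (dates[is_calibration], values[is_calibration]),
        (dates[~is_calibration], values[~is_calibration]),
    )


# Hypothetical monthly axis covering 2004 to 2015 with illustrative GWLs.
months = np.arange("2004-01", "2016-01", dtype="datetime64[M]")
gwl = np.linspace(-8.0, -12.0, len(months))

(cal_dates, cal_gwl), (val_dates, val_gwl) = chronological_split(
    months, gwl, "2013-01"
)
```

Splitting by date rather than at random keeps the validation period strictly after the calibration period, which matches how a forecast would be used in practice.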

Criteria of Evaluation
Three different criteria were used in order to evaluate the suitable networks and their abilities to produce accurate predictions.
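The Root Mean Square Error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - Y_i\right)^2}$$
Efficiency Index (EI):
$$\mathrm{EI} = 1 - \frac{\sum_{i=1}^{n}\left(X_i - Y_i\right)^2}{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$$
where $X_i$ is the observed data, $\bar{X}$ is the mean observed data, $Y_i$ is the calculated data and $n$ is the number of observations. RMSE indicates the difference between the observed and calculated (ANN) values. The lower the RMSE, the more accurate the prediction. The best fit between observed and calculated values is indicated by EI and $R^2$ values close to 1.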
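The evaluation criteria can be computed with a few lines of Python. This is a sketch under stated assumptions: the Efficiency Index is taken here in the Nash-Sutcliffe form, and R^2 is taken as the squared Pearson correlation between observed and calculated values; the sample numbers are illustrative, not results from the study.

```python
import numpy as np


def rmse(obs, calc):
    obs, calc = np.asarray(obs, dtype=float), np.asarray(calc, dtype=float)
    return float(np.sqrt(np.mean((obs - calc) ** 2)))


def efficiency_index(obs, calc):
    """Assumed Nash-Sutcliffe form: 1 - residual sum of squares / variance sum."""
    obs, calc = np.asarray(obs, dtype=float), np.asarray(calc, dtype=float)
    return float(1.0 - np.sum((obs - calc) ** 2) / np.sum((obs - obs.mean()) ** 2))


def r_squared(obs, calc):
    """Assumed definition: squared Pearson correlation coefficient."""
    r = np.corrcoef(np.asarray(obs, dtype=float), np.asarray(calc, dtype=float))[0, 1]
    return float(r ** 2)


observed = [1.0, 2.0, 3.0, 4.0]       # illustrative values
calculated = [1.1, 1.9, 3.2, 3.8]

scores = {
    "RMSE": rmse(observed, calculated),
    "EI": efficiency_index(observed, calculated),
    "R2": r_squared(observed, calculated),
}
```

For these values the squared errors sum to 0.10 and the observed variance sum is 5, so EI is 0.98 and RMSE is about 0.158.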

Current State of GWR
The total exploitation rate of groundwater resources (GWR) in Tra Noc Industrial Zone from 2004 to 2016 is shown in Figure 2. The enterprises in Tra Noc Industrial Zone have used a combination of different water sources for production and daily use. Only 18.18% of enterprises used GWR; those using tap water and GWR accounted for 63.64%; and the remainder used combined sources (data not shown). However, the exploitation of GWR for production showed an increasing trend again after 2012. Figure 3 and Figure 4 show that GWLs in the Pleistocene aquifer declined from 2000 to 2015. During this period, almost all of the enterprises in the area exploited GWR for production, especially from the Middle-upper Pleistocene (qp2-3) and upper Pleistocene (qp3) aquifers. From 2010 onwards, exploitation has been reduced thanks to the enforcement of Official Letter No. 2946/UBND-KT of the People's Committee of Can Tho City (2010) [16].
In addition, these two figures suggest possible GWR recharge from rainwater, as GWLs followed rainfall amounts with a short lag time. According to the DONRE of Can Tho City (2011) [17], the depth of the Pleistocene aquifer ranged from 35 m to 149 m (MSL); thus, this aquifer may receive some recharge from the Bassac River (river depth of 33 m (MSL) at Can Tho station).

Results of ANN Structure Selection, ANN Calibration and Validation
All training was carried out with the neural network toolbox of Visual Gene Developer. Different ANN structures were tested by trial and error: the input layer consisted of various input nodes, and a time lag of up to three months was included (time lags t, t-1, t-2 and t-3, where t is the value of a given variable at the present time step), until optimum ANN structures were obtained. The output of the network is a prediction of the GWLs at three lead times (1-, 2- and 3-month). The number of hidden neurons was also determined through trial and error. The results of ANN structure selection show that the optimum structures for qp2-3 and qp3 are 14-15-1 and 12-15-1 (input, hidden and output nodes), respectively. The number of nodes in the hidden layer has only a slight impact on the accuracy of prediction. These two structures were therefore selected for 1-, 2- and 3-month GWL prediction at QT08 and QT16, respectively. Figure 5 shows examples of the ANN structures (14-15-3 and 12-15-3) and weights. In the figure, red corresponds to a high positive number and violet to a high negative number; line width is proportional to the absolute value of the weight factor or threshold value.
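Building input vectors from the time lags t, t-1, t-2 and t-3 for a chosen lead time can be sketched as follows. The function `make_lagged_dataset`, the variable names and the toy series are hypothetical; the actual input variables of the study are those listed in Table 2, which is not reproduced here.

```python
import numpy as np


def make_lagged_dataset(series, target, n_lags=3, lead=1):
    """Build (X, y): each row holds lags t, t-1, ..., t-n_lags of every series;
    y is the target value `lead` steps after t."""
    names = list(series)
    rows, targets = [], []
    for t in range(n_lags, len(target) - lead):
        rows.append(
            [series[name][t - k] for name in names for k in range(n_lags + 1)]
        )
        targets.append(target[t + lead])
    return np.array(rows, dtype=float), np.array(targets, dtype=float)


# Toy monthly series (illustrative only).
rain = np.arange(10, dtype=float)
gwl = np.arange(10, dtype=float) * 10.0

X, y = make_lagged_dataset({"rain": rain, "gwl": gwl}, target=gwl, n_lags=3, lead=1)
```

With two variables and four lags each, every input vector has eight entries; calling the function with lead=2 or lead=3 gives the training pairs for the longer lead times.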
Comparisons between observed and 1-month predicted GWLs at QT08 in the qp2-3 and qp3 layers are presented in Figure 6 and Figure 7, respectively.
Qualitatively, the shape and character of the predicted GWLs fit the observations quite well. Although the peak GWLs are under- or overestimated, this is not considered to be a serious problem, since the objective of this study is to assess the mean monthly GWLs, which are well fitted to observed GWLs at the monitoring wells.
The correlations between GWLs and other impact factors such as rainfall, water levels in the Bassac River and GWR withdrawal for industrial uses were tested.
The results show high negative correlations between GWLs and GWR withdrawal for industrial uses. In contrast, there are low correlations between GWLs and rainfall and between GWLs and water levels in the Bassac River (data not shown). Therefore, further study should consider the future projection of GWR pumping for different purposes.
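A correlation test of this kind can be sketched in Python. The text does not state which correlation coefficient was used, so the Pearson coefficient is an assumption here, and the series below are synthetic values constructed for illustration rather than data from the study.

```python
import numpy as np


def correlations_with_gwl(gwl, factors):
    """Pearson correlation between GWLs and each impact factor (assumed measure)."""
    gwl = np.asarray(gwl, dtype=float)
    return {
        name: float(np.corrcoef(gwl, np.asarray(values, dtype=float))[0, 1])
        for name, values in factors.items()
    }


# Synthetic monthly series (illustrative only).
months = np.arange(12)
pumping = np.linspace(10.0, 20.0, 12)                 # steadily rising withdrawal
rainfall = np.tile([50.0, 200.0, 120.0, 30.0], 3)     # repeating seasonal pattern
gwl = -8.0 - 0.4 * pumping + 0.1 * np.sin(months)     # GWL falls as pumping rises

r = correlations_with_gwl(gwl, {"pumping": pumping, "rainfall": rainfall})
```

In this constructed example the coefficient for pumping is strongly negative because the GWL series was built to fall as pumping rises.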
Performance statistics are summarized in Table 3, and scatter plots of observed and predicted GWLs for 1-, 2- and 3-month lead times at QT08 and QT16 in the Pleistocene aquifer (qp2-3 and qp3) are depicted in Figure 8 and Figure 9. From Table 3, Figure 8 and Figure 9, it is clear that the calibrated and validated ANN predicted the GWLs with reasonable quality, so it can be used to evaluate the effects of different scenarios (rainfall, river water levels and GWR pumping) on GWLs in the study area. In general, the results indicate the potential of neural computing techniques (ANN) for predicting GWLs at observation wells at 1-, 2- and 3-month lead times.

Conclusions
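Greater demand for groundwater resources (GWR) for domestic and industrial production purposes causes widespread exploitation of the resource. GWLs in the study area declined rapidly from 2000 to 2015, especially in the Middle-upper Pleistocene (qp2-3) and upper Pleistocene (qp3) aquifers. An Official Letter of the People's Committee of Can Tho City was issued and came into force in 2012 to monitor the exploitation.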
Application of the Artificial Neural Network (ANN) has demonstrated that groundwater levels (GWLs) can be predicted by considering different impact factors. The predicted results will help draw the attention of local and central government to devising and formulating a clear GWR management policy for the Mekong Delta, especially for industrial zones in urban areas such as Can Tho City.
There are high negative correlations between GWLs and GWR withdrawal for industrial uses; therefore, further study should consider scenarios of GWR pumping for different purposes.