Characterization and Classification of Groundwater from Wells Using an Electronic Tongue (Kairouan, Tunisia)

A sensor array comprising 9 potentiometric chemical sensors, combined with pattern recognition tools for data processing, has been applied to characterize the groundwater in the plain of the Kairouan region (Tunisia). A total of 17 groundwater samples were collected from three different villages and analyzed for their chemical components. Nine chemical parameters were determined: Potassium, Sodium, Calcium, Ammonium, Cadmium, Chloride, Nitrate, Fluoride and pH. The multi-sensor responses measured in each water sample were analyzed by Principal Component Analysis (PCA) and Cluster Analysis (CA). PCA is a procedure for reducing data redundancy, and CA is used to detect spatial similarity among sampling sites. The methodology is simple and rapid, and the results demonstrate that the electronic tongue technique, based on a sensor array combined with pattern recognition methods, could be a useful tool for the characterization and classification of well water samples.


Introduction
Groundwater originates from the deep infiltration of rain and surface water. In general terms, groundwater flows slowly through geologic formations and remains in contact with minerals, often isolated from the atmosphere. The quality of groundwater is a matter of serious concern today and a critical point to be evaluated, because it determines suitability for human consumption, irrigation and industrial use. Constant monitoring of water quality is necessary for the protection of the natural environment as well as for public well-being. It is therefore worthwhile to use an appropriate control system able to point out any variation occurring in the water characteristics. Arrays of chemical sensors can serve this purpose; when combined with chemometric techniques, such arrays are called an electronic tongue.
Chemometric methods, namely cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA), are increasingly used. They have been applied to the analysis of environmental data, such as water samples from lakes and rivers [1]. Some examples of PCA and CA applications in environmental practice are described below.
The multivariate statistical techniques CA and PCA were applied to water quality data from Manchar Lake in Pakistan [2]. This was done to characterize and evaluate surface and freshwater quality, which is useful in verifying temporal and spatial variations caused by natural and anthropogenic factors linked to seasonality [3,4]. Shrestha et al. applied the PCA and CA techniques to the evaluation of temporal and spatial variations and to the interpretation of a large, complex water quality data set from the Fuji river basin [5]. In addition, the CA technique has been used to characterize the groundwater system of the southern plain of the Friuli-Venezia Giulia Region (Italy) on the basis of its physico-chemical composition. The aim was to detect multivariate patterns for unpolluted waters typical of specific areas in the plain, as well as for possible polluted zones [6]. These studies provide good examples of the effective application of PCA and CA methods.
In recent years, the PCA and CA techniques have been applied to a variety of environmental issues, including the monitoring of groundwater wells. For example, Simeonov et al. give a very comprehensive description of a multivariate statistical assessment of water quality in northern Greece, based on the evaluation of a large and complex database [7].
The present work aims at the characterization of well waters collected from three villages in the Kairouan plain, an area devoted to intensive agricultural practices and to animal and human needs. In each sample nine chemical parameters were determined: Calcium, Sodium, Potassium, Ammonium, Cadmium, Fluoride, Chloride, Nitrate and pH, which are commonly used for defining water quality. The experimental data obtained from the sensor array for the different water samples are treated using pattern recognition techniques, namely PCA and CA.

Sampling
The Kairouan plain covers more than 3000 km² and represents the most important part of the semi-arid central zone of Tunisia, with more than 5,000 shallow wells and more than 150 boreholes. This area is devoted to intensive agricultural practices, and the groundwater is heavily used for irrigation and for domestic purposes. The analytical data relate to 17 domestic wells with depths ranging from 40 to 140 m. The geographical distribution of the villages in which the wells are located is shown in Figure 1.
The sampling wells are distributed as follows: six wells in Hadjeb el Aioun, six wells in Chebika and five wells in Sidi Amor Bou Hadjila. All water samples were taken in autumn (October-November 2009) using polyethylene bottles and were analyzed as soon as possible.

Chemical Sensor Array
Experimental measurements were performed with an electronic tongue system comprising 9 ion-selective electrodes (ELIT electrodes, NICO 2000 Ltd, and Metrohm electrodes) and an Ag/AgCl reference electrode. The sensors and their attributes are shown in Table 1.
The potentiometric measurements were performed with an ELIT 9808 Ion-Analyser (8-Channel Ion-Analyser), a tool for measuring and monitoring ion concentrations. The Ion-Analyser is required to convert the electrical signal from the ion-selective electrode into a relevant unit of concentration (ppm or mol·L⁻¹). The electronic tongue architecture is shown in Figure 2. The principle of the method is to detect changes in the voltage (mV) between the chemical sensors (working electrodes) and the Ag/AgCl reference electrode (RE). Typical responses of two sensors for ion detection in a water sample are shown in Figure 3 as an example.
Each curve represents the potential response of a different sensor as a function of time (min).
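The conversion from a measured potential to a concentration can be illustrated with a short sketch. It assumes an idealized Nernstian electrode response, E = intercept + slope·log10(C), and a two-point calibration against standard solutions; the source does not describe the calibration procedure used by the Ion-Analyser, so the function names and the calibration values below are hypothetical, and activity coefficients are ignored.

```python
import math

def nernst_slope_mv(charge, temp_c=25.0):
    """Theoretical Nernstian slope in mV per decade for an ion of given charge."""
    gas_constant = 8.314462618   # J/(mol K)
    faraday = 96485.33212        # C/mol
    kelvin = temp_c + 273.15
    return 1000.0 * math.log(10) * gas_constant * kelvin / (charge * faraday)

def calibrate(c1, e1_mv, c2, e2_mv):
    """Two-point calibration of E = intercept + slope * log10(C)."""
    slope = (e2_mv - e1_mv) / (math.log10(c2) - math.log10(c1))
    intercept = e1_mv - slope * math.log10(c1)
    return slope, intercept

def concentration(e_mv, slope, intercept):
    """Invert the calibration line to obtain a concentration."""
    return 10 ** ((e_mv - intercept) / slope)

# Hypothetical calibration standards (mol/L) and readings (mV), not from the paper.
slope, intercept = calibrate(1e-3, 100.0, 1e-2, 159.0)
print(round(nernst_slope_mv(1), 2))             # ideal slope for a monovalent cation
print(concentration(129.5, slope, intercept))   # reading halfway between the standards
```

In practice the measured slope is compared with the theoretical one (about 59 mV per decade for a monovalent ion at 25 °C, about half that for a divalent ion) as a check on electrode condition.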

The Dataset and Statistical Procedures
In this study, 9 chemical parameters were determined for the water samples collected from 17 wells. The data matrix based on the chemical parameters was subjected to two multivariate statistical techniques, CA and PCA, in order to examine the relationships between variables (the chemical parameters in groundwater), to extract information about the similarities or dissimilarities between sampling sites, to classify the well waters into clusters and to identify the important discriminant variables. Statistical computations were executed using the statistical software package SPSS 11.0. The multivariate methods used are summarized below.

Dataset
Data were organized into a matrix of 17 rows (well waters) × 8 columns (chemical parameters). The output data were subsequently treated with multivariate analysis techniques. Chemometrics makes use of multivariate methods to analyze large amounts of chemical data. These methods extract information from data and produce a model that attempts to describe reality. Chemometric methods, also known as multivariate statistical techniques, have attracted considerable scientific interest in recent years and are now routinely used in most fields of application [8][9][10]. They identify the natural clustering pattern and group variables on the basis of similarities between the samples. The most common chemometric methods for classification, namely cluster analysis (CA) and principal component analysis (PCA), were applied to the multi-sensor output data sets. These multidimensional data analysis methods are becoming very popular in environmental studies dealing with measurements and monitoring [11].

Principal Components Analysis (PCA)
Principal component analysis is a technique widely used for reducing the dimensions of multivariate problems [12]. PCA provides an objective way of finding indices (linear combinations of the original variables) so that the variation in the data can be accounted for as concisely as possible [13]. It highlights the most meaningful parameters describing the whole data set, enabling data reduction and summarizing the statistical correlation among constituents in the water with minimum loss of original information [3].
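As an illustration of the technique, the following is a minimal sketch of PCA based on the correlation matrix, written with NumPy. It is not the SPSS procedure used in this study; the function name `pca_correlation` is hypothetical, and the 17 × 9 input matrix is random placeholder data, not the measured well water values.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = samples, columns = variables)."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable so that large concentrations do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # fraction of total variance per PC
    scores = Z @ eigvecs                       # sample coordinates (score plot)
    # Loadings: correlation of each variable with each component (loading plot).
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return explained, scores, loadings

# Placeholder data only: 17 samples x 9 variables drawn at random.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(17, 9))
explained, scores, loadings = pca_correlation(X_demo)
print(np.round(explained[:2], 3), round(float(explained[:2].sum()), 3))
```

The first two columns of `scores` and `loadings` correspond to the kind of information shown in a score plot and a loading plot, and the cumulative sum of `explained` gives the share of total variance retained by the leading components.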

Cluster Analysis (CA)
The CA technique is an unsupervised classification procedure that can be applied to data that exhibit "natural" grouping. The procedure involves measuring either the difference or the similarity between the objects to be clustered. The resulting clusters of objects should then exhibit high internal (within-cluster) homogeneity and high external (between-cluster) heterogeneity. Hierarchical agglomerative clustering is the most common approach; it provides intuitive similarity relationships between any one sample and the entire data set, and is typically illustrated by a dendrogram [14]. The dendrogram provides a visual summary of the clustering process, presenting a picture of the groups and their proximity, with a dramatic reduction in the dimensionality of the original data. The Euclidean distance usually gives the similarity between two samples, and this distance can be represented by the difference between analytical values from the samples [15]. Many applications of CA to water quality assessment have been reported [16][17][18][19]. In this study, hierarchical agglomerative CA was performed on the normalized data set by means of Ward's method, using squared Euclidean distances as a measure of similarity.
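The clustering procedure described above can be sketched as follows. This is a plain NumPy illustration of agglomerative clustering with Ward's criterion (at each step, merge the two clusters whose union gives the smallest increase in within-cluster sum of squared Euclidean distances); it is not the SPSS routine used in this study. The function name `ward_clusters` is hypothetical, and the data are synthetic: three artificial groups of 6, 6 and 5 samples that only mimic the number of wells per village.

```python
import numpy as np

def ward_clusters(X, n_clusters):
    """Agglomerative clustering with Ward's criterion; returns one label per row."""
    X = np.asarray(X, dtype=float)
    clusters = {i: [i] for i in range(len(X))}   # start: every sample is a cluster
    while len(clusters) > n_clusters:
        best = None
        keys = list(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                na, nb = len(clusters[a]), len(clusters[b])
                diff = X[clusters[a]].mean(axis=0) - X[clusters[b]].mean(axis=0)
                # Increase in within-cluster sum of squares if a and b are merged.
                cost = na * nb / (na + nb) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters.values()):
        labels[members] = label
    return labels

# Synthetic data only: three well-separated groups in 9 dimensions.
rng = np.random.default_rng(1)
groups = [rng.normal(center, 0.5, size=(n, 9)) for center, n in ((0, 6), (5, 6), (10, 5))]
X_demo = np.vstack(groups)
Z_demo = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0, ddof=1)  # normalize
labels = ward_clusters(Z_demo, 3)
print(labels.tolist())
```

For real data sets, `scipy.cluster.hierarchy.linkage` with `method="ward"` together with `scipy.cluster.hierarchy.dendrogram` is the usual library route for obtaining the full merge tree and its plot.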

Chemical Parameter Features of the Ground Water Wells
The following chemical parameters were determined: Potassium (K⁺), Sodium (Na⁺), Calcium (Ca²⁺), Ammonium (NH₄⁺), Cadmium (Cd²⁺), Chloride (Cl⁻), Nitrate (NO₃⁻), Fluoride (F⁻) and pH. The minimum and maximum values of all chemical parameters of the water samples collected from the three sampling villages are presented in Table 2. The results are compared to the maximum permissible limits recommended by the World Health Organization (WHO) [20]. The values that exceed the permissible limits are highlighted. The pH value is related to the amount of hydrogen and hydroxide ions in water. The pH of the water samples ranged from 7 to 7.74. During the autumn season, the pH shows only minor variation.
The amounts of Na⁺ and Cl⁻ in well water were found to be highest in Sidi Amor village, which lies in a plain exposed to agricultural pollution. These amounts exceeded the values proposed by the WHO water quality standards. Sodium concentrations in the water samples ranged from 380 mg/l to 786 mg/l, which is higher than the permissible limit (200 mg/l). According to Versari et al., chloride concentrations higher than 200 mg/l are considered a risk for human health and may cause an unpleasant taste of water [21].
It has been reported that high consumption of salts, NaCl in particular, may be crucial for the development of essential hypertension and may increase the risk of stroke, left ventricular hypertrophy, osteoporosis, renal stones and asthma [22]. However, McCarthy (Entry et Farmer, 2001) also points out that salt restriction may evoke detrimental counter-regulatory metabolic responses, such as increased production of renin and angiotensin II together with increased sympathetic activity, that are potentially inimical to vascular health.
The levels of NO₃⁻ were found to be higher in the Sidi Amor water samples (average 150 mg/l), mostly above the permissible limit for drinking water in the WHO guidelines (50 mg/l), indicating pollution in this region. Most cases of nitrate contamination in groundwater depend upon climate, differences in soil, fertilizer application, irrigation practices and farming systems. The main source of nitrate pollution in the water wells of Sidi Amor Bou Hadjila (which are situated adjacent to farm fields) is agriculture and the actions of farmers. High nitrate concentrations (>50 mg/l) can cause "blue baby" syndrome (a condition that prevents blood from carrying oxygen) and have been tentatively linked to increased rates of stomach cancer, birth defects, miscarriage, leukemia, non-Hodgkin's lymphoma, reduced body growth and slower reflexes, and increased thyroid size [23].

The concentrations of Cd²⁺ were found to be very low (<0.01 mg/l) in the water samples collected from the different villages. Elevated concentrations of Cd²⁺ can cause nausea, vomiting, salivation and renal failure as well as kidney, liver and blood damage [24]. Saleh et al. suggested that high concentrations of Cd²⁺ may even cause mutations [25].
The K⁺ concentration was below the permissible limit (12 mg/l) in all water samples. Moreover, the Ca²⁺ concentration was below the optimum level (100 mg/l) in all the samples studied. Sodium is considered a contributory cause of dietary cancer, whereas potassium may play a protective role [26]. The effect of calcium is less clear, as it may depend on the concentrations of both sodium and potassium [27].
The results revealed that the quality of the well water of Sidi Amor village falls outside the limits of the WHO standards, so that this water can be used for human drinking only after prior treatment.
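The comparison against permissible limits amounts to a simple ratio check, sketched below. The limits are only those quoted in this section (sodium 200 mg/l, nitrate 50 mg/l, potassium 12 mg/l), not a complete list of guideline values. The sodium and nitrate inputs are the maximum and average figures quoted above; the potassium value and the helper name `exceedances` are hypothetical.

```python
# Limits quoted in the text (mg/l); not a complete list of WHO guideline values.
LIMITS_MG_L = {"Na": 200.0, "NO3": 50.0, "K": 12.0}

def exceedances(sample, limits=LIMITS_MG_L):
    """Return {parameter: measured / limit} for every parameter above its limit."""
    return {
        name: sample[name] / limit
        for name, limit in limits.items()
        if name in sample and sample[name] > limit
    }

# Na and NO3 are figures quoted in the text; the K value is hypothetical.
sample = {"Na": 786.0, "NO3": 150.0, "K": 8.0}
report = exceedances(sample)
for name, ratio in report.items():
    print(f"{name}: {ratio:.2f} x limit")
```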

Multivariate Analysis
The pattern recognition methods PCA and CA were applied to the multi-sensor output data sets. PCA is used here as an unsupervised pattern recognition method; it can reveal the data trend in a low-dimensional space suitable for visualization (one, two or three dimensions).
In this work, PCA was applied to summarize the statistical correlation among components in the water samples. The concentration scales of the chemical parameters differ greatly, and the statistical results would be highly biased by any parameter with high concentration. The result of the PCA based on the correlation matrix of chemical components is given in Table 3 and shows high interdependence between particular variables, such as Na⁺, which was highly correlated with Cl⁻, K⁺, NO₃⁻ and NH₄⁺. On the basis of the score plot (Figure 5), most of the information is gathered in the first two significant factors (96.2%). Their loadings are presented in Table 4 and correspond to the correlation coefficient of a particular variable with each factor. Factor 1, with the highest grouping power, is highly correlated with K⁺, NH₄⁺, Cl⁻, Na⁺ and NO₃⁻. We have considered only the first two principal components, PC1 and PC2, which together explain 96.2% of the total variance in the groundwater data set (9 variables). The first component (PC1) accounts for 67.4% of the total variance in the data set of the well water samples, and the second component (PC2) explains 28.8%. The variables associated with Factor 1 behave in the opposite way to Cd²⁺ and F⁻, whereas Ca²⁺ appears more dispersed in the component space, showing a more individualized behavior. To visualize the clustering trend of the 17 well water samples, a two-dimensional scatter plot of the scores on the first two principal components (PC1, PC2) is shown in Figure 6(b). A clear clustering trend for the three sampling villages is found in this two-dimensional space.
In general, the wells corresponding to different towns are clustered in different regions of the components space, so the differential behavior of the fractions is clearly demonstrated.
The superposition of the loading and score plots of Figure 6 shows that sample wells No. 13, 14, 15, 16 and 17 were all characterized by a high content of K⁺, NH₄⁺, Cl⁻, Na⁺ and NO₃⁻. The 17 well waters with low-mineral content were grouped on the right side of the plot (circled). Continuous circles enclose wells belonging to the same cluster.
The redundancy of information suggests applying CA in order to reduce the dimensionality of the data set, as was done with PCA. Cluster analysis groups the objects (cases) into classes (clusters) on the basis of similarities within a class and dissimilarities between different classes. In this study, cluster analysis was used to detect similarity groups among the well water samples. The dendrogram grouping all 17 wells into three statistically significant clusters (groups) is presented in Figure 7. Cluster I is formed by wells No. 1, 2, 3, 4, 5 and 6.

Conclusions
An electronic tongue based on an array of 9 ion-selective electrodes and pattern recognition techniques for data processing was employed for qualitative analysis in order to characterize and classify groundwater samples collected from the Kairouan plain. It can be concluded that multivariate statistical techniques such as PCA and CA are powerful tools for the classification of a series of well waters as a function of their chemical composition. This approach of using a chemical multi-sensor array sensitive to various species might also be a viable option for distinguishing different water resources.

Figure 1. Map of study area in which wells are located.

Figure 3. Responses of seven sensors of the electronic tongue in water samples.
The values of the chemical parameters, including Fluoride (F⁻) and pH, are plotted together in Figure 4.

Figures 6(a) and 6(b) show respectively the loading and score plots for PC1 and PC2. From the interpretation of these figures we can deduce the pattern of behavior of the chemical elements. Figure 6(a) shows the behavior of the variables (chemical parameters). As can be seen, there is an association of K⁺, NH₄⁺, Cl⁻, Na⁺ and NO₃⁻ and, to a lesser extent, pH.

Figure 5. Score plot for the principal component model of the monitoring data.

Figure 6. Representation of the PC scores on the first two components for the Kairouan well waters. Waters with high-mineral content are located within the ellipse. (a) Loading plot of variables; (b) score plot of observations.

Figure 7. Dendrogram showing clustering of groundwater. Each cluster indicates groups of similar physico-chemistry.

Cluster II is formed by wells No. 7, 8, 9, 10, 11 and 12, and finally cluster III by wells No. 13, 14, 15, 16 and 17. It is clearly seen that cluster III is separated from the other clusters by the largest Euclidean distance (high significance of clustering). This cluster corresponds to wells situated in Sidi Amor Bou Hadjila village. The results indicate that the CA technique is useful in offering reliable classification of well waters. There are other reports where a similar approach has been successfully applied to water quality programs [4,7].