Modelling Nitrate Pollution Vulnerability in the Brussel’s Capital Region (Belgium) Using Data-Driven Modelling Approaches

Groundwater vulnerability to nitrate pollution in the Brussel’s Capital Region was modelled using data-driven modelling approaches. The land use in the study area is heterogeneous: the south-south-eastern part of the region is forested, while the remaining part is urbanised. Groundwater nitrate concentrations were determined at 48 measurement stations distributed over the study area. In addition, the oxygen and nitrogen isotope composition of the nitrates was determined. The data show that the groundwater body is degraded, particularly in the urbanised part of the study area. The contamination with nitrates at degraded stations is slightly decreasing, while the opposite is true for the nitrate contamination at the less degraded stations. We modelled the contamination and the trends of nitrate contamination using linear and non-linear statistical modelling techniques. In total, we defined 23 spatially distributed proxy variables that could explain nitrate contamination of the groundwater body. These proxy variables were defined at a grid size of 10 m and averaged over the influence zone of each measurement station. The influence zones were identified using a simplified particle tracking algorithm applied to the groundwater piezometric map. The calculated influence zones were consistent with results obtained from a detailed numerical groundwater flow and transport model. Stepwise regression explained 56% of the observed variability of nitrate contamination, while non-linear artificial neural network modelling explained nearly 60% of the variability. The dominant explanatory variables include the percentage of impermeable surface and the percentage of the sewage system that is degraded. The results point to a strong impact of urban infrastructure on groundwater degradation and are consistent with the isotopic signature of nitrates determined at the sampling stations. The overlay of the nitrate contamination data with the DRASTIC vulnerability model shows that this conceptual model partially captures the spatial signature of the observed contamination.


Introduction
Groundwater pollution vulnerability defines the sensitivity of a groundwater body to being adversely affected by an imposed contaminant load [1]. This concept entails two notions: intrinsic and specific vulnerability. Intrinsic vulnerability defines the vulnerability of groundwater to contaminants generated by human activities, depending on the inherent geological, hydrological and hydrogeological characteristics of an area (soil type, topography, recharge, vadose zone, etc.), but independent of the nature of the contaminants. Specific vulnerability additionally considers the physicochemical properties of the contaminants [2].
Groundwater pollution risk assessment can be defined as the process of estimating the possibility that a particular event may occur under a given set of circumstances [3]; the assessment is achieved by overlaying hazard and vulnerability [4].
Several approaches exist for assessing groundwater vulnerability. They can be grouped into methods based on 1) process-based simulation models, 2) statistical models and 3) overlay and index methods [2] [5]. Alternatively, they can be classified according to the degree to which monitoring data are integrated in the vulnerability assessment [6]. A distinction can then be made between vulnerability assessment methods based on generic data, methods based on groundwater monitoring data, and hybrid methods based on both monitoring and generic data.
The first and most straightforward method for modelling vulnerability consists of mapping the pollution as assessed from a monitoring program. In this approach, the pollution actually observed is used as a metric of vulnerability. The rationale of this model is that if pollution is observed, the groundwater body must be vulnerable. Yet the inverse is not necessarily true.
In Europe, groundwater sites are monitored on a regular basis for a broad set of physicochemical parameters. This monitoring is carried out to comply with current water-related regulations (e.g. the Water Framework Directive). The data collected in the different monitoring programs can therefore be used directly for a first assessment of vulnerability. Mapping of the pollution can be done using standard GIS software or more advanced geostatistical procedures. For instance, [7] illustrated how the time dynamics of pollution parameters can be integrated into the mapping approach.
Journal of Water Resource and Protection
However, monitoring programs suffer from many drawbacks, such as limited spatial support, low space-time resolution of the observed pollution, a limited number of pollution parameters and, in particular, a high cost [8]. In addition, the results of monitoring programs do not directly allow the origin of the pollution to be identified, and do not allow a clear distinction between specific and intrinsic vulnerability. The observed pollution is therefore often a biased metric of the real vulnerability, and groundwater pollution maps yield only a partial image of it.
Alternatively, vulnerability can be assessed by means of vulnerability models, which integrate the loading, transport, retention and attenuation processes of pollution towards the groundwater body in a mathematical formalism. Use can be made of process-based models [9], statistical models [10] [11] or generic parametric vulnerability models, such as the DRASTIC model [12]. Statistical models can be based on observation data or on the output of more complex system models, i.e. the so-called meta-models [13] [14]. The generic parametric methods are clearly the most widely used approach. The methods based on monitoring data or on modelling data have the potential to predict vulnerability. Most of them can even predict the full statistical distribution (i.e. the expected value and the higher moments) of the vulnerability, which allows the accuracy of the assessment to be evaluated (the precision cannot be assessed, as vulnerability cannot be "measured" in a direct way). The accuracy, and hence the uncertainty, will be region and case specific. Methods can therefore be selected and optimized in terms of the uncertainty associated with the vulnerability assessment.
Alternatively, methods can be combined, following the logic that each method allows a partial assessment of the vulnerability of a groundwater body. Such combined methods rely on operational data fusion techniques. Data fusion optimally combines various sources of information about groundwater quality and vulnerability into a consistent and accurate model prediction. Among the different data fusion techniques, a Bayesian Data Fusion (BDF) approach was recently proposed by [15]. It was especially designed for spatial prediction problems and provides a consistent framework for fusing an arbitrarily large number of information sources that are related to the same variable of interest in order to provide a unique spatial prediction. The main advantage of a Bayesian approach is that it puts the problem of data fusion into a clear probabilistic framework. The BDF method was recently applied successfully to map groundwater pollution in Belgium [16] and the Democratic Republic of Congo [17].
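The generic parametric class mentioned above can be illustrated with the DRASTIC index [12], which is a weighted sum of seven rated factors. The sketch below uses the standard weights of the generic DRASTIC scheme; the function name and the example ratings are hypothetical and are not taken from the data of this study.

```python
# Standard weights of the generic DRASTIC scheme.
DRASTIC_WEIGHTS = {
    "D": 5,  # Depth to water
    "R": 4,  # net Recharge
    "A": 3,  # Aquifer media
    "S": 2,  # Soil media
    "T": 1,  # Topography (slope)
    "I": 5,  # Impact of the vadose zone
    "C": 3,  # hydraulic Conductivity
}


def drastic_index(ratings):
    """Return the DRASTIC index of one grid cell.

    `ratings` maps each factor letter to a rating between 1 and 10.
    """
    missing = set(DRASTIC_WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for factor in DRASTIC_WEIGHTS:
        if not 1 <= ratings[factor] <= 10:
            raise ValueError(f"rating for {factor} must lie in [1, 10]")
    return sum(weight * ratings[factor]
               for factor, weight in DRASTIC_WEIGHTS.items())


# Hypothetical ratings for a single cell (illustration only).
example_cell = {"D": 7, "R": 6, "A": 8, "S": 6, "T": 10, "I": 6, "C": 4}
example_index = drastic_index(example_cell)
```

With these weights the index ranges from 23 (all ratings equal to 1) to 230 (all ratings equal to 10); higher values indicate higher intrinsic vulnerability.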
Most of the recent vulnerability assessment methodologies reviewed above have been developed at the regional, national or continental scale, focusing on rural and natural environments. Few of these methods have been tested in strongly urbanized and human-impacted environments. There is therefore scope to analyze in more detail the vulnerability assessment methodologies for urban and peri-urban environments. This is particularly important given the strong concentration of the global population in urban environments, and the many groundwater functions and services developed in urban settings, such as the provision of drinking water to the urban population. The study presented in this paper should be considered in this context.
In this study, we present a statistical modelling approach for assessing the vulnerability of the Brusselean groundwater body in the Brussel's Capital Region, Belgium. We focus on nitrate pollution, since the local authorities need to comply with the European Nitrates Directive and sufficient nitrate monitoring data are therefore available to implement statistical modelling approaches. We compare linear and non-linear models. We consider the loading of the independent explanatory variables in these models as indicators of possible pollution sources. The development of a statistical modelling technique should be considered as an initial step towards a hybrid model that combines process knowledge or data-based empirical models with monitoring data.

The Study Region
The Brussel's Capital Region (BCR) is situated in the center of Belgium and encompasses 19 municipalities, among them the City of Brussels (Figure 1).
The land use is dominated by urbanization: forty-eight percent of the surface area is urbanized. The non-urbanized area is dominated by forest cover (11% of total area), situated in the south-eastern part of the region, and by public parks and gardens (8% of total area). The river Zenne, a tributary of the Scheldt river, intersects the region and is covered over in the major part of the city. The relief slopes gently away from the river banks. Details on the land use of the BCR can be consulted at http://www.geo.irisnet.be/en/maps/new/.
The hydrogeology of the BCR is illustrated in Figure

The Nitrate Monitoring Data Set
The Brusselean groundwater body needs to comply with current EU environmental regulations, among others the EU Nitrates Directive. Given its unconfined nature, the water body is potentially subject to pollution from point sources (infiltration holes, animal storage facilities, cemeteries, waste water treatment facilities) or from diffuse sources (natural mineralization of soils, leaking sewage systems, fertilization of public parks, urban agriculture, …). A nitrate monitoring program was therefore initiated, encompassing 48 monitoring points. The positions of these monitoring stations are given in Figure 3 and Figure 4. In this study, we consider the yearly averaged nitrate data monitored since 2006. We mapped the mean of the annual average concentrations and analyzed the statistical trend. The trend analyses were made using the Kendall tau trend test [17]. A selection of water samples was also analyzed for the stable N and O isotopes. Isotope data were analyzed by means of the Bayesian source identification model SIAR (Stable Isotope Analysis in R) [19].
Figure 3. Nitrate contamination in the Brussels groundwater body as observed in 50 monitoring stations. In the background, data are projected on a DRASTIC vulnerability map. The latter has been parametrized using standard generic data.
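A Kendall-type trend test on a series of yearly mean concentrations can be sketched as follows. This is a minimal illustration of the Mann-Kendall form of the test, assuming no tied values and using the large-sample normal approximation for the p-value, which is only approximate for short series; the function name and the example series are hypothetical, not monitoring data from this study.

```python
import math


def mann_kendall(values):
    """Mann-Kendall trend test for a series ordered in time.

    Returns (S, tau, z, p): the S statistic, Kendall's tau (tau-a),
    the continuity-corrected z score and the two-sided p-value.
    Assumes there are no tied values (no tie correction is applied).
    """
    n = len(values)
    if n < 3:
        raise ValueError("at least 3 observations are required")
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise change
    tau = s / (n * (n - 1) / 2)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return s, tau, z, p


# Hypothetical yearly mean nitrate concentrations (mg/l) at one station.
yearly_means = [52.0, 50.0, 51.0, 48.0, 47.0, 45.0, 44.0, 43.0, 41.0, 40.0]
s_stat, tau, z_score, p_value = mann_kendall(yearly_means)
```

A negative tau with a small p-value would indicate a decreasing trend at that station; a positive tau would indicate an increasing trend.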

The Statistical Modelling Approach
Statistical models were developed to predict nitrate groundwater pollution from easily available spatial attributes. The dependent variable in these models was the mean of the annual average nitrate concentration at a given location. The independent variables were the values of representative ancillary variables within the influence zone of the monitoring station. In total, 23 spatially distributed ancillary attributes were defined on a regular grid with a resolution of 10 m. The selected attributes belonged to 4 categories. First, the natural hydrogeological environment category comprised variables that are directly related to the basic hydrogeological setting (depth of the aquifer, size of the influence zone of the monitoring well, part of the influence zone confined by the Middle Eocene clayey formation, etc.). Second, the urban density category included variables such as the population density and the percentage of impermeable surface in the influence zone. Third, the authorization category included variables related to specific urbanization permits (e.g. authorization for animal exploitation). The last category encompassed variables related to the status of the sewage system.
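Averaging a gridded ancillary attribute over the influence zone of a station amounts to a zonal mean. The sketch below shows one possible way to compute it, assuming the attribute grid and a boolean influence-zone mask share the same 10 m grid; the function name, the attribute name and the small example grids are hypothetical.

```python
def zonal_mean(grid, zone_mask):
    """Mean of the grid cells that fall inside the influence zone.

    `grid` and `zone_mask` are lists of rows with identical shape.
    Cells with value None are treated as no-data and skipped.
    """
    total, count = 0.0, 0
    for grid_row, mask_row in zip(grid, zone_mask):
        for value, inside in zip(grid_row, mask_row):
            if inside and value is not None:
                total += value
                count += 1
    if count == 0:
        raise ValueError("influence zone contains no valid cells")
    return total / count


# Hypothetical fraction of impermeable surface per cell.
impermeable = [
    [0.9, 0.8, 0.2, 0.1],
    [0.7, 0.6, 0.3, None],
    [0.5, 0.4, 0.0, 0.0],
]
# Hypothetical influence zone of one monitoring station (1 = inside).
zone = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

# One predictor value per attribute and per station.
attribute_grids = {"impermeable_fraction": impermeable}
station_predictors = {name: zonal_mean(grid, zone)
                      for name, grid in attribute_grids.items()}
```

Repeating this for each attribute and each station yields the table of independent variables used in the statistical models.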
Previous studies illustrated the sensitivity of statistical modelling results to the size and shape of the influence zone of each monitoring point [21]. Influence zones, and hence the independent parameters considered in the statistical model, should be defined using hydrogeological criteria, in a similar way to the methods used to delineate groundwater well protection zones. In this study, we used a simplified GIS-based approach to model the influence zones. We modelled particle transport from the monitoring wells, using the slope of the flipped aquifer depth map as a proxy for the hydraulic gradient. We considered generic data for the aquifer thickness, porosity and transmissivity, and modelled the steady-state flow magnitude and direction in the GIS environment. Subsequently, we injected conservative particles in the flow field and evaluated the particle displacement after 5, 10 and 20 years. We added a porous puff parameter corresponding to a longitudinal dispersivity of 30 m and a lateral dispersivity of 10 m. The envelopes of the particles displaced after 5, 10 and 20 years were superposed to define the influence zone of each monitoring well.
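The advective part of such a particle tracking step can be sketched as follows. This is a simplified illustration, not the GIS workflow used in the study: it assumes a steady head grid, a uniform hydraulic conductivity and porosity, nearest-cell gradients and explicit Euler steps, and it tracks the particle backward (up-gradient) from the well, which is an assumption about how the influence zone is traced. Dispersion is not included. All names and parameter values are hypothetical.

```python
def head_gradient(head, row, col, cell_size):
    """Central-difference head gradient (dh/dx, dh/dy) at an interior cell."""
    dhdx = (head[row][col + 1] - head[row][col - 1]) / (2 * cell_size)
    dhdy = (head[row + 1][col] - head[row - 1][col]) / (2 * cell_size)
    return dhdx, dhdy


def track_particle(head, start_xy, cell_size, conductivity, porosity,
                   years, dt_days=30.0, backward=True):
    """Advect one conservative particle through a steady head field.

    Coordinates are in metres with x = col * cell_size, y = row * cell_size.
    `conductivity` is in m/day. Returns the list of visited positions.
    """
    x, y = start_xy
    path = [(x, y)]
    n_rows, n_cols = len(head), len(head[0])
    sign = -1.0 if backward else 1.0
    n_steps = int(round(years * 365.25 / dt_days))
    for _ in range(n_steps):
        col = int(round(x / cell_size))
        row = int(round(y / cell_size))
        if not (1 <= row < n_rows - 1 and 1 <= col < n_cols - 1):
            break  # particle reached the edge of the grid
        dhdx, dhdy = head_gradient(head, row, col, cell_size)
        # Seepage velocity from Darcy's law (m/day).
        vx = -conductivity * dhdx / porosity
        vy = -conductivity * dhdy / porosity
        x += sign * vx * dt_days
        y += sign * vy * dt_days
        path.append((x, y))
    return path


# Hypothetical planar head field sloping towards +x with gradient 0.001.
CELL = 10.0
head = [[50.0 - 0.001 * col * CELL for col in range(101)] for _ in range(5)]

# End points after 5, 10 and 20 years, starting from a hypothetical well.
end_points = {
    years: track_particle(head, (500.0, 20.0), CELL, 10.0, 0.25, years)[-1]
    for years in (5, 10, 20)
}
```

The envelope of the positions reached after the three travel times, widened by the dispersivities, would then approximate the influence zone of the well.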
We implemented linear statistical models using multiple stepwise regression, and non-linear artificial neural networks with one neuron layer and 3 neurons,
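A network with a single hidden layer of three neurons can be sketched in plain Python as follows. The tanh activation, the linear output, the batch gradient descent training and all names and data are assumptions made for illustration; the source does not specify the activation function or the training algorithm.

```python
import math
import random


def init_network(n_inputs, n_hidden=3, seed=0):
    """Small random weights for one hidden layer and a linear output."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return (hidden activations, network output) for one input vector."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(net["w1"], net["b1"])]
    output = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]
    return hidden, output


def mse(net, xs, ys):
    """Mean squared error of the network on a data set."""
    return sum((forward(net, x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(net, xs, ys, epochs=500, lr=0.05):
    """Batch gradient descent on the mean squared error."""
    n, n_in = len(xs), len(xs[0])
    for _ in range(epochs):
        g_w1 = [[0.0] * n_in for _ in net["w1"]]
        g_b1 = [0.0] * len(net["b1"])
        g_w2 = [0.0] * len(net["w2"])
        g_b2 = 0.0
        for x, y in zip(xs, ys):
            hidden, out = forward(net, x)
            err = 2.0 * (out - y) / n
            g_b2 += err
            for j, h in enumerate(hidden):
                g_w2[j] += err * h
                d_hidden = err * net["w2"][j] * (1.0 - h * h)  # tanh derivative
                g_b1[j] += d_hidden
                for i, xi in enumerate(x):
                    g_w1[j][i] += d_hidden * xi
        net["b2"] -= lr * g_b2
        for j in range(len(net["w2"])):
            net["w2"][j] -= lr * g_w2[j]
            net["b1"][j] -= lr * g_b1[j]
            for i in range(n_in):
                net["w1"][j][i] -= lr * g_w1[j][i]
    return net


# Hypothetical scaled predictors and a synthetic non-linear response.
xs = [[i / 9.0, ((3 * i) % 10) / 9.0] for i in range(10)]
ys = [x0 ** 2 + 0.5 * x1 for x0, x1 in xs]

net = init_network(n_inputs=2)
loss_before = mse(net, xs, ys)
train(net, xs, ys)
loss_after = mse(net, xs, ys)
```

With so few monitoring stations, a small network of this kind keeps the number of fitted weights low, which limits the risk of overfitting.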

Results and Discussion
The mean annual average nitrate concentration in the monitoring wells significantly exceeds the WHO drinking water norm for nitrate in the northern, urbanized part of the groundwater body (Figure 3).
The simplified methodology for delineating the influence zones of the monitoring wells was compared with the results of a delineation using the numerical, mechanistic hydrogeological model code FEFLOW [20]. Results suggest that the simplified methodology yields a conservative estimate of the influence zone (Figure 6). The methodology can therefore be used to weigh the spatial attributes of the independent variables when constructing the statistical models.
The Akaike Information Criterion (AIC) and R² were used to identify the independent parameters to be included in the linear multiple regression and artificial neural network (ANN) models. Results of the AIC and R² as a function of the number of independent parameters are shown in Figure 7. A linear model with 5 parameters describes 56% of the observed variation in nitrate concentrations of the calibration data set (Figure 7, Figure 8). When more than 6 parameters are used, the increasing AIC suggests overfitting of the data.
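Selecting predictors by AIC can be sketched with a forward stepwise search. The example below is an illustration only: it assumes ordinary least squares with Gaussian errors, for which AIC can be written as n ln(RSS/n) + 2k up to a constant, and a purely forward search. The exact stepwise variant used in the study is not specified, and the predictor names and data are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(columns, y):
    """Least-squares fit with intercept; returns (coefficients, RSS)."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve_linear(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
              for row, yi in zip(design, y))
    return beta, rss


def aic(rss, n, n_params):
    """AIC for a Gaussian linear model, up to an additive constant."""
    return n * math.log(max(rss, 1e-12) / n) + 2 * n_params


def forward_select(predictors, y):
    """Add the predictor that lowers AIC most until no addition helps."""
    n, selected = len(y), []
    best_aic = aic(fit_ols([], y)[1], n, 1)
    while True:
        candidates = []
        for name in predictors:
            if name in selected:
                continue
            cols = [predictors[s] for s in selected + [name]]
            candidates.append((aic(fit_ols(cols, y)[1], n, len(cols) + 1), name))
        if not candidates:
            break
        cand_aic, cand_name = min(candidates)
        if cand_aic >= best_aic:
            break
        selected.append(cand_name)
        best_aic = cand_aic
    return selected, best_aic


# Hypothetical predictors for 12 stations and a synthetic response.
predictors = {
    "impermeable": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.15, 0.55, 0.85],
    "sewer_degraded": [0.5, 0.1, 0.4, 0.2, 0.6, 0.0, 0.3, 0.7, 0.2, 0.6, 0.1, 0.4],
    "elevation": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0],
}
residual = [1.0, -1.0, 0.5, -0.5] * 3
nitrate = [20 + 40 * a + 30 * b + e for a, b, e in
           zip(predictors["impermeable"], predictors["sewer_degraded"], residual)]

selected, final_aic = forward_select(predictors, nitrate)
```

The penalty term 2k is what makes AIC rise again once additional parameters no longer reduce the residual sum of squares enough, which is the overfitting signal described above.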
It is well known that many processes determining nitrate pollution of groundwater are often non-linear. To incorporate possible non-linearity, the model with  These models can be used to assess the degradation status in rural and peri-urban environments. In urban environments however, the sources and pathways of nitrate contamination are much more complex. Process based modelling approaches are therefore less appropriate for modelling vulnerability of groundwater systems in these environments.

Conclusions
In this study, we implemented statistical models to predict the groundwater nitrate contamination of the unconfined Brusselean groundwater body in the urban environment of the Brussel's Capital Region. This groundwater body is degraded, with nitrate concentration levels exceeding the drinking water norms in the northern part of the region. The spatial structure of the nitrate contamination is consistent with the vulnerability predicted by the DRASTIC model. The temporal structure exhibits small trends that are moderately significant, with an increasing trend for the points with low concentrations and a decreasing trend for the points with high concentrations. The explanatory variables suggest a strong impact of urban infrastructure on groundwater nitrate degradation. This is consistent with many other studies reported in the literature and with the dual isotopic analysis of the nitrate detected in this groundwater body. It also illustrates the importance of the maintenance and/or rehabilitation of urban infrastructure for preserving groundwater quality. Groundwater restoration in the studied urban environment should primarily focus on the rehabilitation of the degraded part of the sewage water network. The statistical models explained 56% of the total variation of observed nitrate contamination when a linear model structure was used. This level increased to 60% in validation mode when a non-linear model structure was used. The proposed models can therefore be used to predict spatially distributed nitrate contamination of the groundwater body from available spatially distributed ancillary variables. The predicted contamination can hence be considered as a proxy for the groundwater vulnerability. The statistical basis of the models developed in this study also allows the uncertainty of the contamination predictions to be assessed on a solid theoretical basis. Statistical models for predicting vulnerability therefore have a comparative advantage over parametric models, as they are data based and allow a quality label to be attached to the vulnerability prediction.
The statistical models presented in this study focus on nitrate contamination. It is expected that similar approaches can be developed for other contaminants. The statistical models can also be used to improve the overall mapping of groundwater quality by assimilating available and new groundwater monitoring data with predictions from the statistical models. Advanced data fusion techniques can be used to generate such models. The presented approaches can therefore support the monitoring program, by identifying locations that are currently poorly sampled, or by interpolating where monitoring data are missing. The resulting maps may be valuable for steering the many groundwater protection or restoration programs in such urban environments.