
Groundwater vulnerability to nitrate pollution in the Brussel’s Capital Region was modelled using data-driven modelling approaches. The land use in the study area is heterogeneous: the South-South-Eastern part of the region is forested, while the remaining part is urbanised. Groundwater nitrate concentrations were determined at 48 measurement stations distributed over the study area. In addition, the oxygen and nitrogen isotopic composition of the nitrate was determined. The data show that the groundwater body is degraded, particularly in the urbanised part of the study area. Nitrate contamination is slightly decreasing at the degraded stations, whereas it is increasing at the less degraded stations. We modelled the level of nitrate contamination and its trends using linear and non-linear statistical modelling techniques. In total, we defined 23 spatially distributed proxy variables that could explain nitrate contamination of the groundwater body. These proxy variables were defined on a 10 m grid and averaged over the influence zone of each measurement station. The influence zones were identified from the groundwater piezometric map using a simplified particle tracking algorithm, and were consistent with results obtained from a detailed numerical groundwater flow and transport model. Stepwise regression explained 56% of the observed variability in nitrate contamination, while non-linear artificial neural network modelling explained nearly 60%. The dominant explanatory variables are the percentage of impermeable surface, the percentage of the sewage system that is in a degraded state, the number of urban infrastructure construction permits with a high pollution risk, the size of the influence zone, and the depth of the groundwater sampling.
These results illustrate the important role of urban infrastructure in groundwater degradation and are consistent with the isotopic signature of the nitrate determined at the sampling stations. The overlay of the nitrate contamination data with the DRASTIC vulnerability model shows that this conceptual model only partially captures the spatial signature of the observed contamination.

Groundwater pollution vulnerability defines the sensitivity of a groundwater body to being adversely affected by an imposed contaminant load [

Several approaches exist for assessing groundwater vulnerability. They can be grouped into methods based on the use of 1) process-based simulation models, 2) statistical models and 3) overlay and index methods [
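The DRASTIC model used later in this paper is a classic overlay and index method. In its generic form, as commonly reported, the index is a weighted sum of seven hydrogeological factors, each rated from 1 to 10: D (depth to water), R (net recharge), A (aquifer media), S (soil media), T (topography), I (impact of the vadose zone) and C (hydraulic conductivity of the aquifer). The weights shown are the standard generic ones; which rating and weighting variant was applied to the BCR is not stated in this section.

```latex
\mathrm{DRASTIC} = 5D_r + 4R_r + 3A_r + 2S_r + 1T_r + 5I_r + 3C_r,
\qquad 1 \le X_r \le 10 \;\Rightarrow\; 23 \le \mathrm{DRASTIC} \le 230
```

Higher index values indicate a higher intrinsic vulnerability; the bounds follow from setting all ratings to 1 or to 10.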

The first and most straightforward method for modelling vulnerability consists of mapping pollution as assessed by a monitoring program. In this approach, the actually observed pollution is used as a metric of vulnerability. The rationale of this approach is that if pollution is observed, the groundwater body must be vulnerable. Yet the converse is not necessarily true: the absence of observed pollution does not prove that the groundwater body is not vulnerable.

In Europe, groundwater sites are monitored on a regular basis for a broad set of physicochemical parameters. This monitoring is deployed to comply with the various current water-related regulations (e.g. the Water Framework Directive in Europe). The data collected in the different monitoring programs can therefore be used directly to make a first assessment of vulnerability. Mapping of the pollution can be done using standard GIS software or more advanced geostatistical procedures. For instance, [

Alternatively, vulnerability can be assessed by means of vulnerability models, which integrate the loading, transport, retention and attenuation processes of pollution towards the groundwater body in a mathematical formalism. Use can be made of process-based models [

The methods based on monitoring data or on modelling data have the potential to predict vulnerability. Most of them can even predict the full statistical distribution (i.e. the expected value and the higher moments) of the vulnerability, which allows the accuracy of the assessment to be evaluated (the precision cannot be assessed, as vulnerability cannot be “measured” directly). The accuracy, and hence the uncertainty, will be region and case specific. Methods can therefore be selected and optimized in terms of the uncertainty associated with the vulnerability assessment. Alternatively, methods can be combined, following the logic that each method provides a partial assessment of the vulnerability of a groundwater body. Such combined methods rely on operational data fusion techniques. Data fusion can be used to optimally combine various sources of information about groundwater quality and vulnerability into a consistent and accurate model prediction. Among the different data fusion techniques, a Bayesian Data Fusion (BDF) approach was recently proposed by [
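To make the idea of data fusion concrete, the simplest Bayesian case can be sketched: two independent estimates of the nitrate concentration at one location, each with Gaussian errors and known variance, are combined under a flat prior. This is a textbook illustration under those stated assumptions; it is not the BDF approach of the cited work, and `fuse_gaussian` and the input numbers are hypothetical.

```python
import numpy as np


def fuse_gaussian(means, variances):
    """Fuse independent Gaussian estimates of the same quantity.

    With Gaussian errors and a flat prior, the posterior mean is the
    precision-weighted average and the posterior variance is the inverse
    of the summed precisions.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precisions = 1.0 / variances
    fused_var = 1.0 / precisions.sum()
    fused_mean = fused_var * np.sum(precisions * means)
    return float(fused_mean), float(fused_var)


# Hypothetical numbers (mg/L nitrate): a model prediction with variance 100
# and a monitoring-based estimate with variance 25 at the same location.
mean, var = fuse_gaussian([40.0, 60.0], [100.0, 25.0])
```

With these hypothetical inputs the fused estimate is 56 mg/L with variance 20: it is pulled towards the more precise estimate and is more precise than either input alone.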

Most of the recent vulnerability assessment methodologies reviewed in the previous sections have been developed at the regional, national or continental scale, focusing on rural and natural environments. Yet few of these methods have been tested in strongly urbanized and human-impacted environments. There is therefore scope to analyze vulnerability assessment methodologies for urban and peri-urban environments in more detail. This is particularly important given the strong concentration of the global population in urban environments, and the many groundwater functions and services provided in urban settings, such as the provision of drinking water to the urban population. The study presented in this paper should be seen in this context.

In this study, we present a statistically based modelling approach for assessing the groundwater vulnerability of the Brusselean groundwater body in the Brussel’s Capital Region, Belgium. We focus on nitrate pollution, since the local authorities need to comply with the European Nitrates Directive, and sufficient nitrate monitoring data are therefore available to implement statistical modelling approaches. We compare linear and non-linear models, and we consider the loading of the independent explanatory variables in these models as an indicator of possible pollution sources. The development of a statistically based modelling technique should be considered an initial step towards a hybrid model that combines process knowledge or data-based empirical models with monitoring data.

The Brussel’s Capital Region (BCR) is situated in the center of Belgium and encompasses 19 municipalities, among others the city of Brussels (

The land use is dominated by urbanization: forty-eight percent of the surface area is urbanized. The non-urbanized area is dominated by forest cover (11% of the total area), situated in the South-Eastern part of the region, and by public parks and gardens (8% of the total area). The river Zenne, a tributary of the Scheldt river, crosses the region. The river is covered over most of its course through the city. The relief slopes gently away from the river banks. Details on the land use of the BCR can be consulted at http://www.geo.irisnet.be/en/maps/new/.

The hydrogeology of the BCR is illustrated in

The Brussels sandy formation is partially covered by a less permeable Middle Eocene clayey formation (Maldegem Formation). The unconfined Brusselean groundwater body is situated within the Brussels sandy aquifer. This groundwater body is locally exploited for drinking water provision at an average rate of 2.5 million m³/year.

The Brusselean groundwater body needs to comply with current EU environmental regulations, among others the EU Nitrates Directive. Given its unconfined nature, the water body is potentially subject to pollution from point sources (infiltration holes, animal storage facilities, cemeteries, waste water treatment facilities) or from diffuse sources (natural mineralization of soils, leaking sewage systems, fertilization of public parks, urban agriculture, …). A nitrate monitoring program encompassing 48 monitoring points was therefore initiated. The positions of these monitoring stations are given in

analyzed the statistical trend. The statistical trend analyses were made using the Kendall tau trend test [
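A minimal sketch of a Kendall-type trend test on a series of annual mean concentrations, in the common Mann-Kendall form: the S statistic counts increasing minus decreasing pairs in time order, tau normalizes S by the number of pairs, and a normal approximation with continuity correction gives a two-sided p-value. The variance formula assumes no ties and no serial correlation; the exact test settings used in the study are not stated here, and the series below is hypothetical.

```python
import math

import numpy as np


def mann_kendall(series):
    """Mann-Kendall trend test (no tie correction) on a time-ordered series.

    Returns Kendall's S, tau, the normal score Z and a two-sided p-value.
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    s = 0
    for i in range(n - 1):
        # +1 for every later value above x[i], -1 for every later value below
        s += np.sign(x[i + 1:] - x[i]).sum()
    s = int(s)
    tau = s / (n * (n - 1) / 2)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, tau, z, p


annual_means = [52.0, 50.0, 49.0, 47.0, 48.0, 45.0, 44.0]  # hypothetical, mg/L
s, tau, z, p = mann_kendall(annual_means)
```

For this hypothetical series S is -19 and tau is about -0.90, a decreasing trend that is significant at the 5% level. Short series with small changes, as described for the BCR, often give non-significant results.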

Statistical models were developed to predict nitrate groundwater pollution in terms of easily available spatial attributes. The dependent variable in these models was the mean of the annual average nitrate concentrations at a given location. The independent variables were the values of the ancillary variables averaged over the influence zone of the monitoring station. In total, 23 spatially distributed ancillary attributes were defined on a regular 10 m resolution grid. The selected attributes belonged to four categories. First, the natural hydrogeological environment category comprised variables directly related to the basic hydrogeological setting (depth of the aquifer, size of the influence zone of the monitoring well, part of the influence zone confined by the Middle Eocene clayey formation, etc.). Second, the urban density category included variables such as the population density and the percentage of impermeable surface in the influence zone. Third, the authorization category included variables related to specific urbanization permits (e.g. authorization for animal exploitation). The last category encompassed variables related to the status of the sewage system.
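The averaging step can be sketched as follows, assuming each ancillary variable is stored as a 2-D array on the 10 m grid and each influence zone as a boolean mask on the same grid. The function names and the tiny grid are hypothetical. Encoding a binary layer as 0/100 makes the zone average directly a percentage (e.g. of impermeable surface), and the zone size follows from the cell count.

```python
import numpy as np

CELL_SIZE_M = 10.0  # grid resolution stated in the text


def zone_average(raster, zone_mask):
    """Mean of a gridded ancillary variable over the cells of one influence zone.

    NaN cells (no data) inside the zone are ignored.
    """
    values = np.asarray(raster, dtype=float)[np.asarray(zone_mask, dtype=bool)]
    values = values[~np.isnan(values)]
    return float(values.mean()) if values.size else float("nan")


def zone_area(zone_mask, cell_size=CELL_SIZE_M):
    """Area of the influence zone in m^2 (number of cells times cell area)."""
    return float(np.count_nonzero(zone_mask)) * cell_size ** 2


# Hypothetical 4 x 4 grid: 100 = impermeable cell, 0 = permeable cell
imperv = np.array([[100, 100, 0, 0],
                   [100, 0, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]], dtype=float)
mask = np.zeros(imperv.shape, dtype=bool)
mask[:2, :2] = True  # hypothetical influence zone of one well

pct_impermeable = zone_average(imperv, mask)  # 75.0
area_m2 = zone_area(mask)                     # 400.0
```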

Previous studies illustrated the sensitivity of statistical modelling results to the size and shape of the influence zones of each monitoring point [
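The influence zones were delineated with a simplified particle tracking algorithm applied to the groundwater piezometric map; its details are not given here. One possible minimal version is sketched below as an assumption: every grid cell releases a particle that moves to the neighbouring cell with the steepest head decline (an 8-neighbour steepest-descent rule), and the cell is assigned to the zone if its path passes within `well_radius` cells of the well. The function names, the capture-radius rule and the synthetic head field are hypothetical, not the authors' algorithm.

```python
import numpy as np

# 8-neighbour offsets (row, column)
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def steepest_descent_step(head, r, c, cell_size):
    """Neighbour with the steepest head decline from (r, c), or None if no
    neighbour is lower (local minimum or outflow boundary)."""
    best, best_slope = None, 0.0
    nrows, ncols = head.shape
    for dr, dc in NEIGHBOURS:
        rr, cc = r + dr, c + dc
        if 0 <= rr < nrows and 0 <= cc < ncols:
            slope = (head[r, c] - head[rr, cc]) / (cell_size * np.hypot(dr, dc))
            if slope > best_slope:
                best, best_slope = (rr, cc), slope
    return best


def influence_zone(head, well, well_radius=1, cell_size=10.0, max_steps=10_000):
    """Boolean mask of cells whose steepest-descent path passes within
    `well_radius` cells (Chebyshev distance) of the well cell."""
    wr, wc = well
    zone = np.zeros(head.shape, dtype=bool)
    for r in range(head.shape[0]):
        for c in range(head.shape[1]):
            pr, pc = r, c
            for _ in range(max_steps):
                if max(abs(pr - wr), abs(pc - wc)) <= well_radius:
                    zone[r, c] = True
                    break
                nxt = steepest_descent_step(head, pr, pc, cell_size)
                if nxt is None:
                    break
                pr, pc = nxt
    return zone


# Synthetic piezometric map: head (m) falls by 0.1 m per 10 m cell eastwards
head = np.tile(100.0 - 0.1 * np.arange(10), (5, 1))
zone = influence_zone(head, well=(2, 7))
```

With a uniform eastward gradient the zone is a band three cells wide upstream of the well, which shows how the capture rule controls the zone width and hence the size-of-zone variable.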

We implemented linear statistical models using multiple stepwise regression, and non-linear artificial neural networks with one hidden layer of three neurons, implemented in JMP (SAS™) (see e.g. https://www.jmp.com/support/help/14-2/neural-networks.shtml or https://www.jmp.com/support/help/14-2/multivariate-methods.shtml#). We used the Akaike Information Criterion (AIC) to analyze the robustness of the models and to identify the most appropriate model structure. We split the data set so that data from 38 stations were used for model identification (the model calibration data set) and data from 10 stations for model validation (the model validation data set).
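The calibration/validation split and a forward stepwise selection driven by AIC can be sketched with NumPy alone. This is a simplified stand-in for JMP's stepwise platform (whose direction, stopping rules and criterion variant may differ); the helper names are hypothetical and the data are synthetic, not the BCR measurements.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares fit with intercept; returns coefficients, AIC and R^2."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    k = A.shape[1] + 1  # regression coefficients plus the error variance
    aic = n * np.log(rss / n) + 2 * k
    r2 = 1.0 - rss / float(np.sum((y - y.mean()) ** 2))
    return beta, aic, r2


def forward_stepwise_aic(X, y):
    """Greedy forward selection: add the column that lowers AIC the most,
    stop when no remaining column lowers it further."""
    selected, remaining = [], list(range(X.shape[1]))
    best_aic = ols_fit(np.empty((len(y), 0)), y)[1]  # intercept-only model
    while remaining:
        aic, col = min((ols_fit(X[:, selected + [j]], y)[1], j) for j in remaining)
        if aic >= best_aic:
            break
        best_aic = aic
        selected.append(col)
        remaining.remove(col)
    return selected, best_aic


# Synthetic data: 48 stations, 6 candidate proxy variables, 2 of them active
rng = np.random.default_rng(0)
n_stations, n_vars = 48, 6
X = rng.normal(size=(n_stations, n_vars))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=n_stations)

idx = rng.permutation(n_stations)
calib, valid = idx[:38], idx[38:]  # 38 calibration, 10 validation stations

selected, aic = forward_stepwise_aic(X[calib], y[calib])
beta, _, r2_calib = ols_fit(X[calib][:, selected], y[calib])
A_val = np.column_stack([np.ones(len(valid)), X[valid][:, selected]])
resid = y[valid] - A_val @ beta
r2_valid = 1.0 - np.sum(resid ** 2) / np.sum((y[valid] - y[valid].mean()) ** 2)
```

Because the validation stations are never used in the selection, `r2_valid` gives an honest check of the calibrated model, which is the purpose of holding out 10 of the 48 stations.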

The mean annual average nitrate concentration in the monitoring wells significantly exceeds the WHO drinking water norm for nitrate in the Northern urbanized part of the groundwater body (

Generally, only small trends can be observed in the nitrate concentration time series, and only some of them are statistically significant (

Results of the dual isotope analysis are shown in

with the more detailed SIAR analysis (results not shown) suggest a strong contribution of manure or sewage, and of soil mineralization, to the overall nitrate loading of the groundwater. Given the small share of agriculture in the BCR land use, a contribution of manure to the overall loading is not likely. Hence, the isotope analysis suggests mainly a contribution from leaking sewage systems to the observed nitrate loading. Waste water in the BCR is mainly collected in a unitary (combined) sewage network, and the condition of this network is poor. Some of the sewage lines are more than 100 years old and are partially degraded. The waste water manager recently mapped the status of the sewage network, and it is estimated that more than 500 km of the total of 1900 km of sewage lines are in poor condition. Hence, leakage of waste water loaded with organic matter towards the groundwater body is likely.
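SIAR is a Bayesian mixing model that accounts for uncertainty in the source signatures. A much simpler deterministic alternative, shown here only to illustrate the mass-balance idea behind dual isotope source apportionment, solves for three source fractions from the δ¹⁵N and δ¹⁸O of nitrate plus the constraint that the fractions sum to one. The end-member values are hypothetical placeholders, not measured or literature values for the BCR, and `three_source_mixing` is a hypothetical helper.

```python
import numpy as np


def three_source_mixing(d15n_mix, d18o_mix, sources):
    """Solve the three-source, two-isotope linear mass balance.

    sources: three (d15N, d18O) end-member signatures (per mil).
    Returns the fractions f1, f2, f3; they sum to 1, and values outside
    [0, 1] mean the sample lies outside the end-member triangle.
    """
    d15n = [s[0] for s in sources]
    d18o = [s[1] for s in sources]
    A = np.array([[1.0, 1.0, 1.0], d15n, d18o], dtype=float)
    b = np.array([1.0, d15n_mix, d18o_mix], dtype=float)
    return np.linalg.solve(A, b)


# Hypothetical end-members (per mil), NOT values from this study:
# sewage/manure, soil nitrogen, synthetic fertilizer
sources = [(12.0, 5.0), (5.0, 2.0), (0.0, 22.0)]
fractions = three_source_mixing(8.7, 5.8, sources)  # approx. [0.6, 0.3, 0.1]
```

A deterministic solution like this ignores source variability and cannot handle more sources than isotopes plus one, which is why Bayesian tools such as SIAR are preferred in practice.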

The simplified methodology for delineating the influence zones of the monitoring wells was compared with the results of a delineation method using the numerical and mechanistic hydrogeological model code FEFLOW [
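How the two delineations were compared is not detailed here. One simple, hypothetical way is to rasterize both zones on the same grid and report their Jaccard overlap (intersection over union) and area ratio, as sketched below with made-up masks standing in for the two delineations.

```python
import numpy as np


def zone_agreement(zone_a, zone_b):
    """Jaccard overlap and area ratio (a / b) between two boolean zone masks."""
    a = np.asarray(zone_a, dtype=bool)
    b = np.asarray(zone_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    jaccard = np.logical_and(a, b).sum() / union if union else 1.0
    area_ratio = a.sum() / b.sum() if b.sum() else float("inf")
    return float(jaccard), float(area_ratio)


# Hypothetical masks on a 5 x 5 grid
zone_simple = np.zeros((5, 5), dtype=bool)
zone_simple[1:4, 1:4] = True   # 9 cells
zone_numeric = np.zeros((5, 5), dtype=bool)
zone_numeric[1:4, 2:5] = True  # 9 cells, shifted one column
jaccard, area_ratio = zone_agreement(zone_simple, zone_numeric)  # 0.5, 1.0
```

The two metrics are complementary: the area ratio checks the zone size (one of the explanatory variables), while the Jaccard index also penalizes zones of the right size in the wrong place.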

The Akaike Information Criterion (AIC) and R² were used to identify the independent parameters to be included in the linear multiple regression and ANN models. Results of the AIC and R² as a function of the number of independent parameters are shown in
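For least-squares models with Gaussian errors, the two criteria can be written as below (AIC up to an additive constant), with n the number of calibration stations, k the number of estimated parameters and RSS the residual sum of squares. R² never decreases when a parameter is added, whereas AIC penalizes each extra parameter, which is why both are inspected. With n = 38 the small-sample correction AICc is often preferred; which variant was reported is not stated in the text.

```latex
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k,
\qquad
\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1},
\qquad
R^2 = 1 - \frac{\mathrm{RSS}}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

For example, with n = 38 and k = 7 (five predictors, an intercept and the error variance), the AICc correction term is 2·7·8/30 ≈ 3.73.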

Many processes determining nitrate pollution of groundwater are known to be non-linear. To incorporate possible non-linearity, the model with five parameters was refined using a non-linear artificial neural network (ANN) model. With this ANN model, we were able to explain 79% of the variation in mean annual nitrate concentration in training mode, and 60% in validation mode. This is consistent with other studies showing that non-linear statistical models increase the explanatory power of the model [
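The non-linear model can be sketched as a feed-forward network with one hidden layer of three neurons. The tanh hidden activation, linear output, plain full-batch gradient descent and all hyperparameters below are assumptions for illustration; JMP's own fitting procedure (optimizer, penalty, validation options) is not reproduced, and the data are synthetic.

```python
import numpy as np


def train_ann(X, y, n_hidden=3, lr=0.05, epochs=5000, seed=0):
    """Fit a one-hidden-layer network (tanh hidden units, linear output)
    by full-batch gradient descent on half the mean squared error."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(scale=0.5, size=(p, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)  # hidden activations, shape (n, n_hidden)
        err = H @ w2 + b2 - y     # residuals, shape (n,)
        g_w2 = H.T @ err / n
        g_b2 = err.mean()
        dZ = np.outer(err, w2) * (1.0 - H ** 2)  # back-propagated through tanh
        g_W1 = X.T @ dZ / n
        g_b1 = dZ.mean(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return W1, b1, w2, b2


def predict(params, X):
    W1, b1, w2, b2 = params
    return np.tanh(X @ W1 + b1) @ w2 + b2


def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


# Synthetic, mildly non-linear data (not the study's data)
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(48, 2))
y = np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.05, size=48)
calib, valid = np.arange(38), np.arange(38, 48)

params = train_ann(X[calib], y[calib])
r2_train = r2(y[calib], predict(params, X[calib]))
r2_valid = r2(y[valid], predict(params, X[valid]))
```

Comparing `r2_train` with `r2_valid` mirrors the training versus validation R² reported above; a large gap signals over-fitting, which is a real risk with only 38 calibration stations.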

The five most significant parameters explaining the observed nitrate concentrations in the linear model are 1) the percentage of impermeable surface in the influence zone of the monitoring well; 2) the density of sewage water collectors classified as moderate in these influence zones; 3) the number of environmental permits considered to be at risk in the influence zone; 4) the size of the influence zone; and 5) the depth of the monitoring well. The individual effect of these parameters on the predicted mean nitrate concentration for both the linear and the non-linear model is illustrated in
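Individual effects of this kind can be computed as a one-at-a-time prediction profile: one explanatory variable is varied over its observed range while the others are held at their mean, and the fitted model is evaluated along that line. The sketch works with any prediction function; the linear model and data in the example are hypothetical. Holding the other variables at their mean is an assumption, and for a non-linear model the profile depends on where they are fixed.

```python
import numpy as np


def prediction_profile(predict_fn, X, column, n_points=20):
    """Vary one explanatory variable over its observed range, hold the
    others at their sample mean, and return (grid, predictions)."""
    X = np.asarray(X, dtype=float)
    grid = np.linspace(X[:, column].min(), X[:, column].max(), n_points)
    profile_inputs = np.tile(X.mean(axis=0), (n_points, 1))
    profile_inputs[:, column] = grid
    return grid, predict_fn(profile_inputs)


# Hypothetical fitted linear model: intercept 10, slopes 2, -1 and 0.5
coef = np.array([2.0, -1.0, 0.5])


def linear_model(A):
    return 10.0 + A @ coef


X_obs = np.array([[0.0, 1.0, 2.0],
                  [1.0, 0.0, 1.0],
                  [2.0, 3.0, 0.0],
                  [4.0, 1.0, 1.0]])
grid, pred = prediction_profile(linear_model, X_obs, column=0)
```

For a linear model the profile is a straight line with the fitted slope; for the ANN the same function reveals curvature, which is how linear and non-linear individual effects can be compared.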

Process-based models, indicator models and statistical models can be used to assess the vulnerability of groundwater to nitrate contamination. In this study, we preferred statistical models. Indeed, most of the approaches in the literature deal with nitrate contamination in rural and peri-urban environments, where nitrate of agricultural origin may contribute significantly to the degradation of groundwater quality. The processes of nitrate fate and transport in agricultural soils and subsoils are well known and are integrated in validated nitrate fate and transport models.

These models can be used to assess the degradation status in rural and peri-urban environments. In urban environments, however, the sources and pathways of nitrate contamination are much more complex. Process-based modelling approaches are therefore less appropriate for modelling the vulnerability of groundwater systems in these environments.

Statistical and machine learning approaches are appropriate for modelling nitrate contamination of groundwater in complex environments when sufficient data on the contamination and on the possible explanatory variables are available. The individual effects give some insight into the relevance of the explanatory variables in the overall process. The overall model structure identified in the present study may be rather generic, since it aligns with other studies on nitrate contamination in urban environments. Yet, the specific parameters of the statistical models are site specific and should preferably be recalibrated for each new study.

In this study, we implemented statistical models to predict the nitrate contamination of the unconfined Brusselean groundwater body in the urban environment of the Brussel’s Capital Region. This groundwater body is degraded, with nitrate concentrations exceeding the drinking water norm in the northern part of the region. The spatial structure of the nitrate contamination is consistent with the vulnerability predicted by the DRASTIC model. The temporal structure exhibits small, moderately significant trends, with an increasing trend at the points with low concentrations and a decreasing trend at the points with high concentrations. The explanatory variables suggest a strong impact of urban infrastructure on groundwater nitrate degradation. This is consistent with many other studies reported in the literature and with the dual isotope analysis of the nitrate detected in this groundwater body. It also illustrates the importance of maintaining and/or rehabilitating urban infrastructure to preserve groundwater quality. Groundwater restoration in the studied urban environment should primarily focus on the rehabilitation of the degraded part of the sewage network.

The statistical models were able to explain 56% of the total variation of the observed nitrate contamination when a linear model structure was used. This increased to 60% in validation mode when a non-linear model structure was used. The proposed models can therefore be used to predict the spatially distributed nitrate contamination of the groundwater body from available spatially distributed ancillary variables. The predicted contamination can hence be considered a proxy for groundwater vulnerability. The statistical basis of the models developed in this study also allows the uncertainty of the contamination predictions to be assessed on a solid theoretical basis. Statistical models for predicting vulnerability therefore have a comparative advantage over parametric models, as they are data-based and allow a quality label to be added to the vulnerability prediction.
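For the linear model, the uncertainty claim can be made concrete with the classical ordinary least squares prediction interval, whose standard error combines the residual variance and the leverage of the new point. The sketch assumes independent, homoscedastic Gaussian residuals; the Student-t critical value is passed in explicitly to keep the code NumPy-only, and the data are synthetic, not the study's.

```python
import numpy as np


def ols_prediction_interval(X, y, X_new, t_crit):
    """Ordinary least squares with intercept, plus prediction intervals.

    t_crit: Student-t critical value for n - p residual degrees of freedom
    (for example scipy.stats.t.ppf(0.975, n - p) for a 95% interval).
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]
    AtA_inv = np.linalg.inv(A.T @ A)
    beta = AtA_inv @ A.T @ y
    s2 = np.sum((y - A @ beta) ** 2) / (n - p)  # residual variance
    A_new = np.column_stack([np.ones(len(X_new)), X_new])
    y_hat = A_new @ beta
    # Leverage term a0^T (A^T A)^-1 a0 for each new row a0
    leverage = np.einsum("ij,jk,ik->i", A_new, AtA_inv, A_new)
    se_pred = np.sqrt(s2 * (1.0 + leverage))
    return y_hat, y_hat - t_crit * se_pred, y_hat + t_crit * se_pred


# Synthetic calibration data (38 stations, 2 predictors); not the study's data
rng = np.random.default_rng(1)
X = rng.normal(size=(38, 2))
y = 20.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=38)
X_new = np.array([[0.0, 0.0], [1.0, -1.0]])
y_hat, lower, upper = ols_prediction_interval(X, y, X_new, t_crit=2.03)  # ~t(0.975, 35)
```

Intervals widen for locations whose ancillary variables lie far from the calibration stations (higher leverage), which is one way such a quality label could flag poorly supported vulnerability predictions.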

The statistical models presented in this study focus on nitrate contamination. Similar approaches are expected to be feasible for other contaminants. The statistical models can therefore be used to improve the overall mapping of groundwater quality by assimilating available and new groundwater monitoring data with predictions from the statistical models. Advanced data fusion techniques can be used to build such models. The presented approaches can therefore support the monitoring program, by identifying locations that are currently poorly sampled, or by interpolating where monitoring data are missing. The resulting maps may be valuable for steering the many groundwater protection and restoration programs in such urban environments.

The authors declare no conflicts of interest regarding the publication of this paper.

Vanclooster, M., Petit, S., Bogaert, P. and Lietar, A. (2020) Modelling Nitrate Pollution Vulnerability in the Brussel’s Capital Region (Belgium) Using Data-Driven Modelling Approaches. Journal of Water Resource and Protection, 12, 416-430. https://doi.org/10.4236/jwarp.2020.125025