^{1}

^{*}

^{1}

^{*}

Characterizations of unknown groundwater pollution sources in terms of source location, source flux release history and sources activity initiation times, from sparse observation concentration measurements are a challenging task. Optimization-based methods are often applied to solve groundwater pollution source characterization problem. These methods are effective only when the starting times of activity of the sources are precisely known, or the possible time window within which the sources activity actually start is known with a fair degree of certainty. However, in real life scenarios, the starting time of the activity of the sources is either unknown or can lie anywhere within a time window of years or decades. Absence of any prior information about the span of time window, within which the sources become active, makes existing source identification methodologies inefficient. As an alternative, an optimization-based source identification model is proposed, to simultaneously estimate source flux release history and sources activity initiation times. The method considers source flux release history and sources activity initiation times as explicit decision variables, optimally estimated by the decision model. Performance of the developed methodology is evaluated for an illustrative study area having multiple sources with different source activity initiation times, missing observation data and transient flow conditions. These evaluation results demonstrate the potential applicability of the proposed methodology and its capability to correctly estimate the unknown source flux releasing history and sources activity initiation times.

In the event of detection of any pollutant in the groundwater system, the next step is to design an effective remediation strategy for reclaiming a polluted aquifer. This requires precise information of the source characteristics in terms of source location, source flux release history and sources activity initiation times. The process of flow and transport in the groundwater system is so slow, that many times pollution in an aquifer is detected much after the pollution sources have become active or may even have ceased to exist. Therefore, even a rough guess of the time span within which the source activity has started may be very difficult.

Optimization based methodologies developed so far for unknown source flux characterization which relies on some knowledge of the approximate starting time of the sources activities. In all real world problems, the starting time of the activity of the sources is either unknown or can lie anywhere within a time window of years or decades. Absence of any prior information about the starting time of activity of the pollution sources, or the span of time window, within which the sources becomes active, renders the existing source identification methodologies inefficient. To overcome this, a methodology is developed for simultaneously estimating source flux releasing history and sources activity initiation times, using a new optimal decision model that can efficiently solve the source characterization problem, when it is impossible to specify a small time span within which the source activity may have started.

Unknown groundwater pollution sources characterization is non-linear [

All existing optimization based source identification techniques implicitly assume that the starting time of the activity of the sources is precisely known, or the time span for the possible start of the sources activity is known with fair degree of certainty. This is because, in the existing methodologies, the starting time is not treated as an explicit decision variable, but generally estimated indirectly by solving for source fluxes magnitudes at defined discretised time intervals. The following scenarios represent the existing methods and their limitations.

1) The actual starting time of the source activity is assumed to be precisely known. This is an impractical assumption and in all real world scenarios, there is seldom any clue about the actual starting time of the sources. As a result most of these developed methods cannot be applied to such real world scenarios.

2) In scenarios where the time span for the possible start of the sources activities is known with fair degree of certainty, a typical range being of 1 to 5 years, the precise actual starting time of the sources is estimated indirectly by estimating the source fluxes magnitude for entire time span, discretised into smaller stress periods. As a result, large number of decision variables is added to the optimization problem. Each of these decision variables represent the source flux magnitude for a potential/actual source for a given stress period. Hence, the optimization problem needs to be solved for a large number of source flux magnitudes at different time intervals. Such a solution may theoretically seem possible but with every added decision variable, the dimension of the search space increases, leading to exponential rise in the computational time. This indirect approach does not represent an efficient formulation of the optimum decision model.

3) In the third scenario where there is no information of time span for the possible start of the sources activities, solution of source identification becomes highly challenging. If the solution approach as discussed for the previous scenario is applied to solve for the actual starting time, would mean solving for hundreds of source flux magnitudes for a very large number of stress periods, for a very large time span. This would not only increase the size of the optimization problem, but may render it computationally infeasible.

4) The solution methodology for all the above mentioned scenarios implicitly assumes that the actual starting time of the sources will always lie within a specified time span considered in the study. If this is untrue, the existing methods fail to give any meaningful result.

To address these limitations, a new methodology is proposed for simultaneously identifying the starting times of the activity of the sources and their flux release history. A new optimum decision model is formulated and solved for this purpose. SA is used for solving the optimization problem with starting times of the sources as the explicit decision variable. Performance of the proposed methodology is evaluated by solving an illustrative example problem for different sources activity initiation time scenarios, transient flow conditions and missing observation data to demonstrate its potential applicability.

The proposed methodology incorporates the starting time of the activity of the source as explicit unknown variable in the optimization based decision model. Simulated Annealing is used for solving the optimization problem with an objective of minimizing the residual error between the simulated and observed pollutant concentrations measurements at observed locations. The optimization model is solved such that the unknown source fluxes and the stating times of the activity of the sources are the direct solutions of the source identification problem.

A linked simulation optimization model simulates the physical process within the optimization model. Therefore the simulation model is treated as a set of binding constraints for the optimization model. Therefore any feasible solution of the optimization model needs to satisfy the flow and the transport simulation model. The advantage of this approach is that it is possible to link any complex numerical model to the optimization model. In this simultaneous optimal source flux and activity starting time identification model, the flow and transport simulation models are linked to the optimization model using the SA algorithm for solution.

A three dimensional numerical model MODFLOW [

where K_{xx}, K_{yy} and K_{zz} represent the values of hydraulic conductivity along the x, y and z coordinate axes (LT^{−}^{1})

h is the potentiometric head (L)

W is the volumetric flux per unit volume representing sources and/or sinks (T^{−}^{1})

S_{s} is the specific storage of the porous material (L^{−}^{1})

t is time (T)

x, y and z are the cartesian co-ordinates (L)

A three dimensional modular pollutant transport model MT3DMS [

where C is the concentration of pollutants dissolved in groundwater (ML^{−}^{3})

t is time (T)

x_{i} is the distance along the respective Cartesian coordinate axis (L)

D_{ij} is the hydrodynamic dispersion coefficient tensor (L^{2}T^{−}^{1})

v_{i} is the seepage or linear pore water velocity (LT^{−}^{1}); it it is related to the specific discharge or Darcy flux through the relationship, v_{i} = q_{i}/θ

q_{s} is volumetric flux of water per unit volume of aquifer representing fluid sources (positive) and sinks (negative) (T^{−}^{1})

C_{s} is the concentration of the sources or sinks (ML^{−}^{3})

θ is the porosity of the porous medium (dimension less)

^{−}^{3}T^{−}^{1}).

Simulated Annealing (SA) is used as an optimization algorithm to solve the optimization problem. Simulated Annealing was first introduced by [

In source identification problem where the starting times of the activity of the sources is known, temporal pollutant source fluxes from all the potential sources, represented by the term

If the starting time of the activity of the sources is assumed to be known, the simulated concentration measurements can be correctly mapped in time to the observed concentration measurements to calculate the residual error at the observed locations. However, in real life scenario, the starting time of the source is mostly unknown. Hence it becomes unclear which temporal concentration measurement values on the simulated breakthrough curve, corresponds in time to the observed concentration measurements, and can be used to calculate the residual error at the observed locations. This can be understood from the illustrative example shown below.

rrectly in time and space. Hence, the time lags between the start of the activity of the sources and the observed concentration measurement needs to be ascertained in order to estimate the sources activity starting times.

To overcome the limitation of the earlier proposed methodologies, an explicit decision variable, time lag

The following variables are used in the formulation of the source identification model:

N = number of concentration measurements available (assumed same for all measurement locations) such that

^{−}^{3});

^{−}^{3}).

The objective function is defined as:

Subject to

where

The constraint set in equation (6) essentially represents the linking of the optimization algorithm with the numerical groundwater flow and transport simulation model through the decision variables. The optimization algorithm searches for optimal values of unknown pollutant source fluxes,

(i is an integer variable varying from 1 to maximum number of sources) by generating candidate solutions of these decision variables in optimization algorithm. Candidate values of fluxes

This optimal search process with different candidate values of unknown variables results in an optimal solution that minimizes residual error between the simulated and observed pollutant concentrations. It is to be noted that

The performance of the proposed methodology is evaluated for an illustrative polluted aquifer study area as shown in figure 3. The performance is evaluated for different scenarios by varying source activity initiation times, varying the degree of spatial heterogeneity in the study area and different combination of observation locations. The study area is specified as comprising of heterogeneous, anisotropic, and confined aquifer. This study area has total dimension of 2100 meters in the x direction and 1950 meters in the y direction. The entire study area is discretised into smaller grids of size Δx, Δy and Δz in x, y and z direction respectively. The study area contains three different hydro-geological zones with different values of hydraulic conductivity K_{xx} and po-

rosity θ. Groundwater flow and solute transport processes are simulated using the value for saturated thickness of the aquifer b, longitudinal dispersivity αL, transverse dispersivity αT and horizontal anisotropyas given in table 1.

In the discretised study area, cells marked with red star represent the grid locations containing a pollutant source S(i) where i represents the source number. Cells marked with green and purple circles are the grid locations containing an observation well.

In the first scenario, all the three sources are assumed to have different activity initiation times (table 2) and activity duration of 900 days. The entire activity duration of the sources is divided into three equal stress periods of 300 days. The pollutant flux from the sources is assumed to be constant over a stress period. The pollutant flux from each of the sources is represented as S(i)(j), where i represents the source number and j represents the stress period number. A total of nine source fluxes S11, S12, S13, S21, S22, S23, S31, S32, S33, _{xx} and effective porosity θ values in the three zones are kept different to show heterogeneity in the aquifer system. The scenario is further complicated by considering irregular monitoring for pollutant concentration measurements and missing observation measurement data Cobs.

Since all flow and pollutant transport in groundwater system is inherently transient, the performance of the developed methodology was evaluated for transient flow and solute transport condition in the second scenario. Three different sources are considered and it is assumed that all the three sources start activity at the same time. It is also assumed that the starting time of the activity of the sources is same as the starting time of the simulation

stress period. A total of nine source fluxes S11, S12, S13, S21, S22, S23, S31, S32, S33 and ∆T are considered as explicit unknown variables in the optimization problem.

In actual field application, the concentration measurements are to be obtained from field data. However, for performance evaluation purpose, these measurements are synthetically generated by simulation for assumed actual pollutant sources. The observed aquifer responses are simulated by numerical simulation models, MODFLOW and MT3DMS in GMS7.0. Initial and boundary conditions i.e. initial heads, boundary heads are specified in the numerical simulation models. Actual source fluxes are utilized to simulate these observed concentration measurement data at specified measurement locations.

The time of start of the numerical simulation model for synthetically generating the observed concentration measurements is designated as the initial time

In the first evaluation scenario considered, the time lag between the time of first pollutant concentration measurement

In the second evaluation scenario all the sources are assumed to start at the same time and the time lag between the time of first pollutant concentration measurement ^{m}. A total of four temporal pollutant

concentration measurements from each of the six observation wells are utilized.

In order to reflect real life condition, where the pollutant measurements are erroneous, the numerically simulated concentrations were perturbed to incorporate measurement errors. The observed concentrations generated using MODFLOW and MT3DMS were perturbed by adding error terms to the simulated measurement data to represent the effect of random measurement errors. These errors were added in order to incorporate realistic measurement errors. The observed pollutant concentration data is perturbed with random measurement error with maximum deviation of 10 percent of the measured concentration value as shown in equation (7).

where

Latin hypercube distribution is chosen for generating random error data evenly distributed across all class intervals, thus eliminating any clustering of sample data in few of the class intervals.

The proposed methodology for source flux identification and estimation of source activity initiation time is evaluated using synthetically generated observed concentration measurements as explained in the previous section. The source identification model simulates the aquifer response for a period of 5000 days. Concentration measurements are recorded at interval of 200 days for scenario 1 and 182 days for Scenario 2, at specified observation locations. Since the actual starting time

Candidate values of unknown source fluxes and lag times are generated within the optimization model. The generated flux values are used for simulation of flow and transport process as a part of the linked simulation-optimization model. Value of time lag

The evaluation results of scenarios 1 using error free concentration measurement data and perturbed erroneous concentration measurement data is presented in figure 4. The solution results of identification using error free data and perturbed error data are compared to the actual source flux and lag time. Each of the unknown source flux variables S(i)(j) and lag time ∆T,

Results of source flux identification using error free data closely match with the actual source fluxes values for all source fluxes in both the scenarios. The time lag between the start of source activity and the first pollutant concentration measurement is precisely identified by the model in both the scenarios. In Scenario 1, the estimated time lags can be used to estimate the starting time of the activity of the sources by subtraction

Even while using perturbed concentration measurement data in the identification model, the time lag between the start of source activity and the first pollutant concentration measurement is accurately estimated leading to accurate identification of the starting time of the activity of the sources. The solution results for source flux estimates using erroneous concentration measurements data show large errors of estimation in comparison to error free measurement data. These deviations between the actual and the estimated value of the source fluxes show the effect of errors in concentration measurement data, which accounts for random measurement errors in a real world scenario.

It can be seen in figure 5 that while using perturbed concentration measurement the source fluxes estimated in Scenario 2 having transient flow condition shows large deviation in estimating S11, S21 and S31. On analysing the concentration measurements from the observation locations it was found that after a lag time of 1638 days, as was the case in Scenario 2, most of the pollutant plume had already passed over the observation well locations used in this scenario. As a result, only the tail end of the breakthrough curve was captured in the observed pollutant concentrations. It is also highly probable that the observed pollutant concentration from the tail end of the breakthrough curve was due to the later part of source activity, and the effect of the initial source activity was not captured by the tail end of the breakthrough curve. This could explain the reason for large deviations in estimating the flux values from all the three sources during stress period one using erroneous concentration measurement data. This also stresses the need for optimal monitoring well location for source identification.

This developed methodology for simultaneous identification of the source fluxes and their starting time appears to perform satisfactorily in estimating the unknown groundwater pollution source fluxes and their starting times for the illustrative example. These results show that this proposed methodology is capable of overcoming major limitations in the earlier methods. The main limitations of these earlier proposed source identification methods were: 1) It assumed that the starting time of the activity of the sources is precisely known; 2) In scenarios where the time span for the possible start of the sources activities is known with fair degree of certainty, actual starting time was estimated indirectly by solving for source flux magnitudes; 3) In scenarios, where there was no information of the time span for the possible start of the sources activities, solving indirectly for a large number source flux magnitudes over a very large time span, rendered the earlier methods computationally infeasible.

The proposed methodology is applicable to real world problems of source identification in polluted aquifer sites, where no information is available about the starting times of the sources activity and the uncertainty spans over a large time period. This methodology in its current form can estimate the activity starting times for all the potential sources both in steady state and transient conditions. It can also handle multiple sources having different source activity initiation times and missing observation data.

The main advantage of the proposed methodology is its capability to treat the starting time of source activity as an explicit decision variable. This capability has the potential to render many real life unknown groundwater pollution source identification problems computationally feasible. This will ultimately enhance the capability of addressing groundwater pollution remediation issue in complex contaminated aquifer study areas, where there is very little prior knowledge of the source location, source flux release history, and the source activity initiation time.

We acknowledge the financial support for this work provided by CRC for Contamination Assessment and Remediation of the Environment (CRC-CARE), Australia and James Cook University, Australia.