^{1}

^{*}

^{2}

^{3}

In this study, we introduce a numerical method to reduce the solute transport equation into a reduced form that can replicate the behavior of the model described by the original equation. The basic idea is to collect an ensemble of data of state variables (say, solute concentration), called snapshots, by running the original model, and then use the proper orthogonal decomposition (POD) techniques (or the Karhunen-Loeve decomposition) to create a set of basis functions that span the snapshot collection. The snapshots can be reconstructed using these basis functions. The solute concentration at any time and location in the domain is expressed as a linear combination of these basis functions, and a Galerkin procedure is applied to the original model to obtain a set of ordinary differential equations for the coefficients in the linear representation. The accuracy and computational efficiency of the reduced model have been demonstrated using several one-dimensional and two-dimensional examples

Accurate predictions of radionuclide transport in general come from process models, which are defined as detailed flow and contaminant transport model that best replicate the available data for a site. They usually are the most complex and sophisticated models of flow and transport at a particular site. A significant technical issue arises when one tries to use those model results in probabilistic systems modeling: it is impractical to directly solve the computationally intensive model for each Monte Carlo realization that would be necessary to properly span the range of uncertainty for every model parameter.

On contrast to process models, systems models have been proposed to represent complex process models with simplified models that are suitable for Monte Carlo analysis. Systems models incorporate streamlined versions of one or more process models, along with uncertain estimates of the contaminant source term. Systems models integrate knowledge of all of the processes relevant to assessing risk, and therefore are critical to support remedies or decisions to monitor groundwater. Process models synthesize knowledge of one component of the transport path; the systems models integrate knowledge of all aspects of contaminant transport.

The groundwater pathway in the systems model should be an abstraction model that we believe captures the essential features of the groundwater process model, although there is a risk that the simplification process will filter out something important. When the groundwater pathway is simulated in a process model, it is generally not practical to embed the entire model in the systems model-computational limitations intercede. Consequently, we often settle for one or a few simplified models. These simplified models are called abstraction models. Within these simplified or other types of transport modules, parameters that can be varied in a probabilistic analysis include the groundwater velocity, dispersion coefficient, and sorption parameters. In the best case, the process model is used to justify these parameter distributions, but often this is not done in great detail due to the complexity of the process models and the lack of a convenient tool to formally abstract the process model results. The result may be a systems model that does not fully consider, or indeed even contradicts, one or more of the process models. The credibility of the systems model is placed in jeopardy when this happens.

If an efficient and accurate model abstraction procedure can be implemented for groundwater contaminant transport, we largely avoid the issue of justifying the validity of our abstraction model: the original process model is effectively incorporated in the systems model. Alternatively, if a modeling analysis stops short of systems analysis, the reduced model can still be used to explore uncertainties much more efficiently, thereby allowing better sensitivity analyses to be conducted.

To date, methods for proceeding from complex process models to simplified systems models have not been formally determined or defined. In this study, a method is developed to obtain efficient and accurate reduced models, alternatively referred to as abstraction models. The technique, called the proper orthogonal decomposition (POD), also known as principal component analysis or Karhunen-Loeve expansion, uses the results from a process model as “data” that provides the basis for a reduced model. The procedure, in general terms, calls for the running of the process model, in this case a contaminant transport simulation, forward in time, recording “snapshots” of contaminant concentration. The mathematical technique transforms the results into a set of basis functions that span the behavior of the transport problem throughout the model domain. The reduced model is then constructed by assuming that the solution of the transport problem can be formulated as a linear combination of the basis functions. A Galerkin procedure using the basis functions results in a small set of ordinary differential equations that are solved in time.

There are a variety of applications in various fields of science for which this technique has been applied. For example, researchers in the field of fluid dynamics have used POD methods to discern so-called coherent structures within a turbulent flow field [

An excellent summary of the underpinning theoretical development of POD is presented by [

The governing equation for transport of a single solute in porous media can be written as [

subject to an initial condition and appropriate boundary conditions. Here f is the porosity, S_{l} is the liquid saturation, r_{l} is the liquid density, C is the solute concentration, D is the dispersion tensor, u_{l} is the Darcy velocity, C_{r} represents the adsorption of the solute onto the porous media, r_{r}dC_{r}/dt is an equilibrium sorption term, and q is sources or sinks. All parameters in Equation (1) can be space-dependent, but for simplicity the coordinate has been suppressed. The general equilibrium model for adsorption of species onto the porous media is given by [

where α_{1}, α_{2}, and β are parameters defining different sorption-isotherm models. In this study, the linear isotherm model (α_{1} = K_{d}, α_{2} = 0, and β = 1) is used: C_{r} = K_{d} C, where K_{d} is the partition or distribution coefficient.

If we assume that flow is in saturated porous media at the steady state condition, it follows that r_{l}, D, and u_{l} are independent of time, and saturation S_{l} = 1. In this case, Equation (1) can be simplified as

This is the full model we deal with in this study. Given appropriate initial and boundary conditions, as well as other parameter fields such as permeability, one can first solve for the velocity field and then the concentration field from Equation (3).

Let, denote a set of N_{s} observations (or snapshots) of a state variable (in this case, solute concentration) observed or simulated from the full model run at time t_{i},. The basic idea of the POD method is to find a set of functions, called basis functions that have a structure typical of the members of the ensemble. The basis functions are chosen to give the best representation of the ensemble of snapshots, which maximizes

where is the inner product of the basis function and the concentration field. It can be shown that the basis functions can be expressed as a linear combination of snapshots [

where f_{kn} is the kth component of the nth eigenvector of the kernel K that is computed from

By Equation (5), each snapshot can be reconstructed exactly using these basis functions as

To summarize, for a given ensemble of N_{s} snapshots, one first computes the kernel K from Equation (6) and solves the eigenproblem KF = lF, which gives N_{s} eigenvalues l_{n}, , sorted from largest to smallest, and their corresponding eigenvectors,. The basis functions are computed from Equation (5) and any snapshot can be reconstructed using Equation (7). Since the matrix K is real, symmetric, and positive semi-definite, all eigenvalues are non-negative. The importance of the nth eigenvectors depends on its relative “energy”, characterized by the ratio of the nth eigenvalue to the sum of all eigenvalues (total energy): . In many cases, the first few eigenvalues carry most of the total energy and thus concentration C(x) in Equation (7) can be approximated by truncating the first M terms ().

To find the spatial distribution of concentration at a time that is not in the ensemble of snapshots, one has to solve for the coefficients in Equation (7). In the following section, a reduced model is introduced, in which ordinary differential equations for these coefficients are derived from the original (partial differential) transport equations.

Suppose that the full model, i.e., Equation (3), is solved to obtain a set of snapshots of the concentration distribution C(x, t_{i}),. Based on the algorithm described in the previous section, one can find a set of basis functions Then the Galerkin’s method can be employed by seeking an approximation of concentration C(x, t) as

where M < N_{s}, and a_{m} are time-varying coefficients that are independent of spatial locations. The physical mean of (8) is that the concentration field is approximated by a linear combination of some pre-determined, space-dependent “structures”, weighted by some time-dependent coefficients. Since in (8) is an approximation of the true concentration, replacing in the full model, i.e., (3), by will produce a model error. In the Galerkin’s method, this error is forced to be orthogonal to all basis functions, i.e.,

for, where. Here the orthogonality of any two functions is defined as the integration of the product of two functions over the simulation domain being zero. Substituting (8) into (9) and recalling the orthogonality of basis functions yield

which can be written as

where, and

are defined as

and

The initial condition for Equation (11) is derived from the original initial condition C(x, t) = C_{0}(x):

Thus the partial differential equation (PDE) has been reduced to a system of M ordinary differential equations (ODE). Provided that M is fairly small, the reduction in computational time should be significant compared to the numerical solutions of the PDE in the original model.

In this section, several examples are presented to illustrate how the model reduction techniques can significantly reduce computational efforts in solute transport in porous media, while retaining accuracy. This is accomplished by comparing results from the reduced models against those from the full model runs. Note that, in solving the full model numerically, it is quite often that numerical errors may be introduced. To avoid this, in the first two examples, simple one-dimensional transport problems were chosen because analytical solutions for these simple cases are available, which made it easy to assess the accuracy of the reduced model.

In the first example, we consider a one-dimensional transport problem in a saturated column of 1 m in length, uniformly discretized into 200 elements. The hydraulic conductivity is a constant K_{s} = 1.0 m/day for the entire column, and the flow is driven by a hydraulic gradient of J = 0.001, which produces a uniform flow with Darcy velocity of 1.1813 × 10^{-8 m/s. Other transport parameters are given as: the dispersivity coefficient α}_{L} = 0.03333 m, partition coefficient K_{d} = 0.1, water density r_{l} = 1000.0 kg/m^{3}, rock density r_{r} = 2500 kg/m^{3}, and porosity f = 0.25. Under the given initial condition C(x, 0) = 0, a fixed concentration C(x, t) = 1.0 at the inlet (x = 0), and a zero concentration gradient ¶C(x, t)/¶x = 0 at x = ¥, this problem can be solved analytically [

where is the retardation factor, and erfc is the complementary error function. Twenty-five concentration snapshots are computed using Equation (15) at time t = nDt, where Dt = 20 days and n ranges from 1 to 25. Some selected snapshots are illustrated in

Using these 25 snapshots, the kernel K is computed from (6), and the eigenvalues and eigenvectors associated with this kernel are solved from KF = lF. These eigenvalues and eigenvectors depend significantly on the choice of snapshots. It is critical that each snapshot be significantly different from all others.

The eigenvalues for this set of snapshots are depicted in

The first few basis functions are illustrated in _{s} (=25) terms.

Based on these basis functions, all snapshots can be reconstructed using Equation (7). Comparisons of the true snapshots and reconstructed snapshots indicate that at most 6 basis functions are enough to reconstruct these snapshots with sufficient accuracy. Of course, in general, the number of basis functions required to obtain an accurate solution will depend on the parameters (K_{s}, and α_{L}, L) in the solute transport problem.

The spatial distribution of solute concentration at any given time can be derived from solving the reduced model. The ordinary differential Equation (11) with the initial condition (14) was solved using the fourth-order Runge-Kutta method with a time step of Dt = 1000 s. In particular, the ensemble of those snapshots can be solved from the reduced model (rather than reconstructed from (7). Four selected concentration distributions computed from the reduced model, as functions of the number of basis functions included, are illustrated in

It should be pointed out that, if the number of basis functions included in the reduced model is not enough, the results could be completely wrong. As illustrated in

It is interesting to investigate how well the reduced model can predict the concentration distribution at an elapsed time that is different from those times at which the ensemble of snapshots are taken.

In the second example, all transport parameters are the same as in the previous example except that (1) the partition coefficient K_{d} is zero in this case, and (2) instead of a fixed constant concentration at the inlet (x = 0), a unit pulse input is imposed at the inlet for a duration of 5 days. For this simple case, analytical solution is also available [

where t_{0} is the duration of the pulse input, which starts at zero. Using this equation, 25 snapshots were calculated from t = 10 to 250 days at an increment of 10 days. Several selected snapshots are illustrated in

Following the POD method, the kernel K was computed from these 25 snapshots using (6); eigenvalues and eigenvectors were solved from KF = lF; and then basis

functions were computed using (5). The set of eigenvalues as a function of the mode is depicted in

In the third example, solute transport is modeled in a two-dimensional rectangular domain of size 1 m × 0.5 m, uniformly discretized into 100 × 50 square elements. Nonflow conditions are prescribed at two lateral boundaries. The hydraulic head is fixed at the left and right boundaries as 10.001 m and 10.0 m, respectively, with a hydraulic gradient J = 0.001. The porous medium is heterogeneous in both the hydraulic conductivity K_{s} and the sorption coefficient K_{d}. It is assumed that the log hydraulic conductivity Y = ln(K_{s}) has a normal distribution and is second-order stationary following an isotropic, exponential covariance function with a correlation length of 0.3m. The statistics of the log hydraulic conductivity are given as = 0 (i.e., the geometric mean of the saturated hydraulic conductivity K_{G} = 1.0 m/day) and = 0.693 (the coefficient of variation CV = 100%).

For the transport problem, it is assumed that the initial concentration is zero in the domain and concentration of C(x, t) = 1.0 is fixed at the middle of the upstream boundary, (0.0 m, 0.25 m). It is also assumed the log partition

coefficient is uncorrelated with the log hydraulic conductivity and that ln(K_{d}) is also second-order stationary field following an isotropic, exponential covariance function with a correlation length of 0.3 m. The statistics of ln(K_{d}) are given as = -1.6094 (i.e., the geometric mean =0.2) and = 9.95 × 10^{-3} (the coefficient of variation CV = 10%). The spatial distribution of K_{d} generated using the sgsim code of GSLIB is illustrated in _{L} = 0.02 m, and the transverse dispersivity coefficient α_{T} = 10^{-4} m.

The full transport model was run for 200 days using the FEHM code and the concentration distribution was recorded at t = 10n days, where n ranges from 1 to 20, and these 20 concentration fields were taken as snapshots for the model reduction method. Basis functions were then computed from these snapshots using the POD method, and Equation (11) with initial condition (14) were solved numerically.

duced model can reproduce the full model results very well, even for a contour level as low as C = 0.01.

To compare the computational efficiency of these two methods, the CPU time required for the full model run was recorded, in which the maximum time step was set to 2 days while the actual size of the time step was automatically adjusted during the solution process by the program itself. For the transport problem as described above, the FEHM code takes 23.3 hours. The computation time needed for the reduced model depends on the time step used in the fourth-order Runge-Kutta method, which is fixed at Dt = 1000 seconds in this case. The CPU time for the reduced model run is only 78 seconds. Of course, this run time will increase if a smaller time step was used or a great number of snapshots and functions is required. However, the comparison of model results from the full model and the reduced model indicates that the time step of Dt = 1000 seconds used in the reduced model is small enough for this problem, and the overall comparison indicates that the number of snapshots and basis functions were sufficient.

In this study, we have demonstrated that the advectiondispersion equation can be cast in a reduced model form, and a reduced numerical model can be developed that replicates the behavior of the original model. We have derived, from the original model equations, a method for reducing the transport model, a partial differential equation with unknowns at each numerical grid point, to a small number of ordinary differential equations solved in time. The method consists of running the original model to obtain the snapshots of concentration in the model domain, computing the basis functions for the model using the POD technique, and using these basis functions in a Galerkin procedure to obtain the ordinary differential equations of the reduced model.

The accuracy and computational efficiency of the reduced model have been investigated using several onedimensional and two-dimensional examples with variable conductivity field K_{s} and sorption coefficient K_{d}. These examples demonstrate that the reduced model can reproduce the full model results very accurately while the computational time (in terms of the CPU time) required for the reduced model is much less than that required for the full model.