Inverse geochemical modeling of groundwater entails identifying a set of geochemical reactions that can explain observed changes in water chemistry between two samples that are spatially related in some sense, such as two points along a flow pathway. A common inversion approach is to solve a set of simultaneous mass and electron balance equations involving water-rock and oxidation-reduction reactions that are consistent with the changes in concentrations of various aqueous components. However, this mass-balance approach does not test the thermodynamic favorability of the resulting model and provides limited insight into the model uncertainties. In this context, a Monte Carlo-based forward-inverse modeling method is proposed that generates probability distributions for model parameters which best match the observed data using the Metropolis-Hastings search strategy. The forward model is based on the well-vetted PHREEQC geochemical model. The proposed modeling approach is applied to two test applications, one involving an inverse modeling example supplied with the PHREEQC code that entails groundwater interactions with a granitic rock mineral assemblage, and the other concerning the impact of fuel hydrocarbon bioattenuation on groundwater chemistry. In both examples, the forward-inverse approach is able to approximately reproduce observed water quality changes by invoking mass transfer reactions that are all thermodynamically favorable.

Inverse process models attempt to estimate model parameter values based on changes in observed data between two or more data sets (in contrast to forward process models, which predict the values of variables based on assumed model parameters). In this context, the compositional differences between two groundwater samples, an initial water and a final water, should contain the information necessary for an inverse model to unravel candidate causative reaction and mixing histories. The results of such an inverse model may not necessarily be unique, a problem exacerbated by uncertainties associated with the groundwater chemistry data itself (e.g., analytical uncertainty, representativeness of samples, incomplete analytical suites).

Different strategies can be employed to address the inverse problem and associated uncertainties. Simple mass and charge balance constraints can be used to determine which reactions are mathematically consistent with the data [_{2} and fluorite between two points along a groundwater flow path in an Indian aquifer.

An alternative strategy is to utilize a forward-inverse approach that entails running many forward models to identify those reactions that best match the observed data. Least-squares error minimization or similar approaches can be invoked for parameter estimation, a strategy employed by generic inverse models such as CXTFIT [

To address these issues, a modified Monte-Carlo-based geochemical modeling strategy is proposed to constrain the parameter space search. The strategy is based on the Metropolis-Hastings algorithm [

The MHMC strategy involves the sequential generation of sets of proposed parameter values, submission of those values as input into a forward model, and quantification of the resulting goodness-of-fit of the model output to the data through a scoring scheme. The approach is iterative in nature in that the current score for a given iteration is compared with the score from the prior run; if the goodness of fit using the current parameter set represents an improvement, the proposal is automatically accepted. Proposals that score poorly in comparison are not necessarily rejected, however, provided that the proposal scores above some threshold fraction (typically chosen at random) of the prior score. This strategy prevents the algorithm from prematurely converging on local minima or maxima, allowing for more complete exploration of the parameter space but with a bias towards more promising parameter combinations. The parameter set values themselves are selected as part of a Markov Chain sequence; each new proposal represents only relatively minor changes in postulated values in comparison to the prior set. Rejected proposals result in a subsequent new set of posited parameter values using the previous “seed” values, whereas the parameter values associated with accepted proposals are assigned as the new seed values for the next iteration.
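The iteration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_forward_model` and `toy_score` are hypothetical stand-ins for the geochemical forward model and its scoring scheme, the Gaussian perturbation size is arbitrary, and lower scores are taken to mean a better fit. The conditional acceptance test is the simplest form of comparing the score ratio with a random fraction.

```python
import random


def toy_forward_model(params):
    """Hypothetical stand-in for the forward model: returns one 'concentration'."""
    return 2.0 * params["a"] + params["b"]


def toy_score(prediction, observed=7.0):
    """Lower is better: squared error against a made-up observation."""
    return (prediction - observed) ** 2


def perturb(seed, step, rng):
    """Markov-chain step: each parameter moves only slightly from the seed value."""
    return {name: value + rng.gauss(0.0, step) for name, value in seed.items()}


def run_chain(seed, n_iter, step=0.1, rng=None):
    """Return the list of accepted (parameters, score) pairs."""
    rng = rng or random.Random(0)
    seed_score = toy_score(toy_forward_model(seed))
    accepted = []
    for _ in range(n_iter):
        proposal = perturb(seed, step, rng)
        score = toy_score(toy_forward_model(proposal))
        improved = score < seed_score
        # A worse proposal survives only if prior/current exceeds a random
        # fraction r; written as r * current < prior to avoid dividing by zero.
        tolerated = (not improved) and rng.random() * score < seed_score
        if improved or tolerated:
            # Accepted proposals become the seed for the next iteration.
            seed, seed_score = proposal, score
            accepted.append((proposal, score))
        # Rejected proposals leave the seed unchanged.
    return accepted
```

Calling `run_chain({"a": 0.0, "b": 0.0}, 2000)` returns the accepted proposals, whose parameter values can then be binned to approximate the distributions of interest.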

The overall algorithm for the application of the MHMC strategy to the generation of an inverse geochemical model is summarized in

After a proposal is run as a forward geochemical model, the resulting concentration predictions must be compared to the corresponding measured concentrations in the final water composition for each of the selected metrics. A useful definition of a proposal’s score is a lumped sum of the squares of the errors for each metric component, defined logarithmically:

where N is the number of water quality metrics, c_{sim} and c_{obs} the simulated and observed concentrations, respectively, and w a weighting factor introduced to provide some flexibility in emphasizing the importance of matching certain metrics with respect to others (any such intentional bias will be application-specific).
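As an illustration, one form of the score that is consistent with the description above is the weighted sum, over the N metrics, of w × (log c_{sim} − log c_{obs})^2. The sketch below assumes base-10 logarithms and a weight that multiplies each squared log error; both choices, and the function and argument names, are assumptions made for this example.

```python
import math


def proposal_score(simulated, observed, weights=None):
    """Weighted sum of squared log10 errors over all metrics (lower is better).

    simulated, observed: dicts mapping metric name -> positive concentration.
    weights: optional dict mapping metric name -> weighting factor (default 1.0).
    """
    weights = weights or {}
    total = 0.0
    for name, c_obs in observed.items():
        c_sim = simulated[name]
        w = weights.get(name, 1.0)
        total += w * (math.log10(c_sim) - math.log10(c_obs)) ** 2
    return total
```

With this form, a simulated concentration that misses the observation by a factor of ten contributes exactly w to the score, whether it is too high or too low.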

The MHMC algorithm accepts any proposal yielding a score that is less than that of the prior proposal, with scores that do not meet this requirement accepted only conditionally. Specifically, proposals with higher scores than the prior trial are accepted when the inequality,

is satisfied, where r is a random number selected between 0 and 1, and S_{j−1} and S_{j} are the prior and current scores, respectively. The exponent α serves as a means to throttle the proposal acceptance rate and is selected by trial and error for specific applications; an acceptance rate between 0.3 and 0.7 generally leads to quicker convergence to viable distributions of model parameter values.
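A minimal sketch of the conditional acceptance test follows. It assumes the inequality takes the form r < (prior score / current score)^α, which matches the description of α as a throttle: for a worse proposal the ratio is below 1, so a larger α lowers the chance of acceptance. The function name and signature are illustrative.

```python
import random


def accept(prior_score, current_score, alpha, rng=random):
    """Return True if the current proposal should replace the seed."""
    # Improvements are always accepted. Ties are accepted too, since the
    # assumed ratio would equal 1 and r is always below 1.
    if current_score <= prior_score:
        return True
    # Worse proposals: accept with probability (prior / current) ** alpha.
    return rng.random() < (prior_score / current_score) ** alpha
```

For a proposal whose score is twice the prior score, this rule accepts about half of the time with α = 1 and about one time in eight with α = 3, which is how the exponent can be tuned toward a target acceptance rate.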

PHREEQC was employed as the forward model in the MHMC algorithm. A Python script was used to automate the generation of posited parameter sets for each iteration, write the input file for PHREEQC, execute PHREEQC, and subsequently read, process, score, and record the resulting output.
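The write-execute-read cycle might look like the sketch below. It is not the script used in this study: the input template is a deliberately tiny, hypothetical example, the file names are placeholders, and the executable is assumed to be on the path as `phreeqc` and to accept input, output, and database file names as command-line arguments. The parser assumes a tab-delimited selected-output file with one header row; column headings vary between PHREEQC versions, so the keys should be checked against real output.

```python
import subprocess
from pathlib import Path

# Hypothetical, minimal input template; a real model needs a full analysis.
INPUT_TEMPLATE = """\
SOLUTION 1 Initial water
    units mol/kgw
    pH {pH}
    Na {Na}
    Ca {Ca}
EQUILIBRIUM_PHASES 1
    Kaolinite {si_kaolinite} {mass_kaolinite}
SELECTED_OUTPUT
    -file {selected_file}
    -reset false
    -pH true
    -totals Na Ca
END
"""


def render_input(params, selected_file):
    """Fill the template with one proposal's parameter values."""
    return INPUT_TEMPLATE.format(selected_file=selected_file, **params)


def _to_number(token):
    """Convert a field to float when possible, otherwise keep the text."""
    try:
        return float(token)
    except ValueError:
        return token


def parse_selected(text):
    """Return the last data row of a tab-delimited table as a dict."""
    rows = [
        [cell.strip() for cell in line.split("\t") if cell.strip()]
        for line in text.splitlines()
        if line.strip()
    ]
    header, last = rows[0], rows[-1]
    return {name: _to_number(value) for name, value in zip(header, last)}


def run_trial(params, workdir, database, exe="phreeqc"):
    """Write the input file, run PHREEQC, and read back the final state."""
    workdir = Path(workdir)
    selected = workdir / "trial.sel"
    input_path = workdir / "trial.pqi"
    input_path.write_text(render_input(params, selected))
    subprocess.run(
        [exe, str(input_path), str(workdir / "trial.pqo"), str(database)],
        check=True,
    )
    return parse_selected(selected.read_text())
```

Keeping the rendering and parsing steps as pure functions makes them easy to check without PHREEQC installed; only `run_trial` touches the file system and the external program.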

A geochemical inverse model example described by [_{0.62}Ca_{0.38}Al_{1.38}Si_{2.62}O_{8}), biotite mica (as KMg_{3}AlSi_{3}O_{10}(OH)_{2}), kaolinite, montmorillonite (as Ca_{0.17}Al_{2.33}Si_{3.67}O_{10}(OH)_{2}), calcite, halite, gypsum, and CO_{2}. The perennial spring sample is characterized by higher concentrations of major cations and anions (e.g., Na, Ca, Mg, HCO_{3}, SO_{4}) than the ephemeral spring sample, an observation consistent with presumed differences in residence time (_{2}, calcite, in the perennial spring water, coupled with the precipitation of kaolinite and silica. Because PHREEQC’s inverse model considers compositional uncertainty in the measured data, two different possible inverse models are indicated (

The MHMC inverse model for the Sierra Spring water problem assumes a set of phases included in PHREEQC’s thermodynamic database (phreeqc.dat) that approximately correspond to the mineralogy specified in the example problem. This assemblage includes calcium montmorillonite (characterized by the same stoichiometry as the PHREEQC inverse modeling example), kaolinite, and chalcedony as the silica phase. A K-mica phase, KAl_{3}Si_{3}O_{10}(OH)_{2}, was used as a surrogate for the biotite mica assumed by the PHREEQC inverse model example (for which an equilibrium constant was not specified). Unlike the example mica, this K-mica phase does not contain any magnesium, so magnesium was not included as a metric for scoring the proposal. Plagioclase was modeled as an ideal solid solution consisting of albite, NaAlSi_{3}O_{8}, and anorthite, CaAl_{2}Si_{2}O_{8}, end-members, with an initial Na:Ca ratio of approximately 2:1. In addition to this Na:Ca ratio, other uncertain model parameters included initial masses of kaolinite and K-mica as well as equilibrium constants for calcium montmorillonite and K-mica. Search ranges for the initial mineral masses were lognormally distributed between 10^{−6} and 10^{−3} moles/kgw, while uncertainties in the equilibrium constants were addressed by specifying a log saturation index constraint for mineral precipitation/dissolution in the PHREEQC input files between −0.25 and +0.25.
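For illustration, an initial parameter set could be drawn from these search ranges as sketched below. Several details are assumptions rather than values from the study: the lognormal is centered at the midpoint of the range in log space (10^{−4.5} mol/kgw) with an arbitrary spread of 0.75 log units and is truncated by rejection sampling, the saturation index targets are drawn uniformly, and the Na:Ca ratio is allowed to vary between 1.5 and 2.5. All names are hypothetical.

```python
import random

LOG10_MASS_MIN, LOG10_MASS_MAX = -6.0, -3.0   # search range, log10(mol/kgw)
SI_MIN, SI_MAX = -0.25, 0.25                  # log saturation index range


def draw_log10_mass(rng, mean=-4.5, sd=0.75):
    """Rejection-sample log10(mass) from a normal truncated to the range."""
    while True:
        x = rng.gauss(mean, sd)  # sd is an assumed spread
        if LOG10_MASS_MIN <= x <= LOG10_MASS_MAX:
            return x


def draw_parameters(rng):
    """Draw one set of uncertain parameters for the spring-water problem."""
    na_ca_ratio = rng.uniform(1.5, 2.5)  # assumed range around 2:1
    return {
        "kaolinite_mol": 10.0 ** draw_log10_mass(rng),
        "kmica_mol": 10.0 ** draw_log10_mass(rng),
        "si_montmorillonite": rng.uniform(SI_MIN, SI_MAX),
        "si_kmica": rng.uniform(SI_MIN, SI_MAX),
        # Mole fraction of the albite end-member in the solid solution.
        "albite_fraction": na_ca_ratio / (1.0 + na_ca_ratio),
    }
```

A ratio of exactly 2:1 corresponds to an albite fraction of 2/3, so the assumed range keeps the plagioclase composition close to the stated starting point.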

A total of 5000 trial proposals were evaluated, requiring several minutes of run time on a Windows-based personal computer, with the score contribution weighting factors for all components set equal to 1.0 except for Na and pH, for which weighting factors were set to 5.0 and 10.0, respectively, to improve the rate of convergence. Approximately 37 percent of the proposals were accepted. Matches of the MHMC inverse algorithm results to the water quality data are shown in _{2}, and plagioclase all dissolve into solution by approximately corresponding amounts. However, the remaining silicate phases exhibit somewhat different behavior in that Ca-montmorillonite is favored to dissolve. This dissolution reaction contributes to the precipitation of both kaolinite and chalcedony in quantities that exceed those yielded by the PHREEQC inverse model results.

While the results of only the best-fit proposal are shown in ^{−6} and 10^{−3} moles/kilogram of water, characterized by a lognormal distribution (log mean = 3.2 × 10^{−5} mol/kgw). The MHMC-derived distribution shown in

A second demonstration application concerns the impact of fuel hydrocarbon bioattenuation on groundwater chemistry at an anonymous site in Montana, USA, where impacts to groundwater by diesel from a leaking above-ground tank are known to have occurred. Inverse modeling entailed using a limited site quality data set to assess consistency of the data with bioattenuation and to quantify hydrocarbon mass reduction to support remedial planning. Groundwater quality data measured in site groundwater wells include manganese, ferrous iron, sulfate, sulfide, methane, bicarbonate alkalinity, chloride, and trace elements of potential environmental concern, in addition to dissolved-phase diesel fuel, reported as total petroleum hydrocarbons (TPH). In contrast, general indicators of groundwater quality, such as major cations and pH, have not been subject to monitoring.

Processes that are commonly observed to occur in association with bioattenuating organic matter or fuel hydrocarbons, such as reduction of ferric iron and sulfate as well as methanogenesis [

Both dissolved iron and manganese were detected in nearly all groundwater samples collected from the site, including background samples. Consequently, PHREEQC was used to set the initial redox potential to reflect equilibrium with goethite, FeOOH (manganese geochemistry was not included in the inverse modeling assessment). Ferrihydrite, Fe(OH)_{3}, is an alternative ferric oxide mineral choice, although trial simulations indicated it was much less effective in matching the observed data. Goethite was also designated as a reactive hydrous ferric oxide surface (HFO) vis-à-vis PHREEQC’s surface chemistry modeling feature. The HFO surface was assumed to be in equilibrium with background groundwater chemistry as an initial condition. This specification fixed the total amount of arsenic in the system, with the majority of the arsenic mass existing in a sorbed state on the HFO (arsenic was modeled as existing only in the arsenate form, i.e., the +V redox state). Additional phases included in the model were barite (BaSO_{4}), calcite (CaCO_{3}), and mackinawite (FeS). Diesel was represented in the model by a mean stoichiometry of C_{12}H_{23}.
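The C_{12}H_{23} stoichiometry fixes how many electrons each mole of diesel can donate, which in turn bounds the amounts of goethite, sulfate, or CO_{2} that complete oxidation could reduce. The bookkeeping below is a back-of-the-envelope check rather than part of the model (PHREEQC performs the electron balance internally); the half-reactions in the comments were balanced by hand for this example, and the function name is illustrative.

```python
from fractions import Fraction

# Oxidation half-reaction, balanced by hand:
#   C12H23 + 24 H2O -> 12 CO2 + 71 H+ + 71 e-
# Taking each carbon to CO2 releases 4 e-, plus 1 e- per hydrogen.
ELECTRONS_PER_DIESEL = 4 * 12 + 23  # = 71

# Electrons consumed per mole of electron acceptor.
ELECTRONS_PER_ACCEPTOR = {
    "goethite -> Fe(II)": 1,   # FeOOH + 3 H+ + e- -> Fe+2 + 2 H2O
    "sulfate -> sulfide": 8,   # SO4-2 + 9 H+ + 8 e- -> HS- + 4 H2O
    "CO2 -> methane": 8,       # CO2 + 8 H+ + 8 e- -> CH4 + 2 H2O
}


def acceptor_demand(moles_diesel=1):
    """Moles of each acceptor needed to fully oxidize the given diesel mass,
    if that acceptor alone took up all of the electrons."""
    total = Fraction(moles_diesel) * ELECTRONS_PER_DIESEL
    return {name: total / n for name, n in ELECTRONS_PER_ACCEPTOR.items()}
```

For one mole of C_{12}H_{23}, this gives 71 moles of goethite or 71/8 (about 8.9) moles of sulfate, which illustrates why even a small mass of degraded diesel can produce a large ferrous iron signal when iron reduction dominates.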

A total of 5000 proposals were generated and assessed using the MHMC algorithm, of which 1816 proposals, or approximately 36 percent, were accepted. A comparison of median measured background and impacted biodegradation constituents and the forward-model output generated by the best-scoring proposal is shown in

The sensitivity of the accepted proposal scores to two key parameters, the initial mass of diesel and the initial mass of goethite, is shown in

In addition to the posited initial masses, equilibrium dissolution constants (represented in proxy by fixing the logarithm of the saturation index for the mineral phase) for mineral phases with presumably uncertain composition, namely hydrous ferric oxide and iron sulfide, were also treated as unknowns and thus allowed to vary within certain ranges in the proposal parameter sets. Among the top ten scoring proposals, log saturation indices for the iron oxyhydroxide phase (goethite) ranged between +1.2 and +1.8; that of the best proposal was approximately +1.5. In effect, this represents an intermediate solubility between that of goethite and the more soluble phase Fe(OH)_{3}, as represented in PHREEQC’s database but not considered in the inversion. Saturation indices among the top ten proposals for the FeS phase (mackinawite) ranged between +1.0 and +1.4, with that of the best proposal at approximately +1.3. Analogous to the iron oxyhydroxide phase, this value corresponds to an intermediate solubility between mackinawite and the more soluble amorphous FeS precipitate phase represented in the phreeqc.dat database of the PHREEQC code but not included in the model.

The goal of the MHMC strategy is to identify approximate probability distributions of model parameters that best reproduce observed data. In principle, employing ever larger numbers of iterations should yield increasingly accurate approximations, assuming that the underlying forward model is itself applicable. For the geochemical inverse modeling approach proposed in this study, the forward model PHREEQC has already been well vetted elsewhere in the literature. Therefore, the best-scoring proposals generated by the MHMC algorithm, employing a large number of iterations, can be regarded as providing insights that are consistent with respect to mass balance, charge balance, electron transfer, and thermodynamic considerations. In this context, the MHMC approach serves to safeguard against inverse models that contain spurious mass transfer reaction sets that should not otherwise occur, a threat that exists with approaches such as PHREEQC’s built-in inverse modeling tool. However, some limitations of the MHMC approach, as proposed, should be considered. First, the validity of its application to any specific problem should always be assessed on a case-by-case basis, requiring consideration of mineral phase selection and the reliability of thermodynamic data. Second, the geochemical inverse modeling does not take into account the reaction path followed by the system as one water composition is transformed into another; only the initial and final solution states are addressed (i.e., reaction kinetics are not addressed).