Second-Order MaxEnt Predictive Modelling Methodology. I: Deterministically Incorporated Computational Model (2nd-BERRU-PMD)

Abstract

This work presents a comprehensive second-order predictive modeling (PM) methodology designated by the acronym 2nd-BERRU-PMD. The attribute “2nd” indicates that this methodology incorporates second-order uncertainties (means and covariances) and second-order sensitivities of computed model responses to model parameters. The acronym BERRU stands for “Best-Estimate Results with Reduced Uncertainties” and the last letter (“D”) in the acronym indicates “deterministic,” referring to the deterministic inclusion of the computational model responses. The 2nd-BERRU-PMD methodology is fundamentally based on the maximum entropy (MaxEnt) principle. This principle stands in contradistinction to the fundamental principle that underlies the extant data assimilation and/or adjustment procedures, which minimize in a least-square sense a subjective, user-defined functional meant to represent the discrepancies between measured and computed model responses. It is shown that the 2nd-BERRU-PMD methodology generalizes and extends current data assimilation and/or data adjustment procedures while overcoming the fundamental limitations of these procedures. In the accompanying work (Part II), the alternative framework for developing the “second-order MaxEnt predictive modelling methodology” is presented by incorporating the computed model responses probabilistically (as opposed to “deterministically”).


Cacuci, D. (2023) Second-Order MaxEnt Predictive Modelling Methodology. I: Deterministically Incorporated Computational Model (2nd-BERRU-PMD). American Journal of Computational Mathematics, 13, 236-266. doi: 10.4236/ajcm.2023.132013.

1. Introduction

The results of measurements and computations are never perfectly accurate. On the one hand, results of measurements inevitably reflect the influence of experimental errors, imperfect instruments, or imperfectly known calibration standards. Around any reported experimental value, therefore, there always exists a range of values that may also be plausibly representative of the true but unknown value of the measured quantity. On the other hand, computations are afflicted by errors stemming from numerical procedures, uncertain model parameters, boundary and initial conditions, and/or imperfectly known physical processes or problem geometry. Therefore, knowing just the nominal values of experimentally measured or computed quantities is insufficient for applications. The quantitative uncertainties accompanying measurements and computations are also needed, along with the respective nominal values. The discrepancies between experimental and computational results provide the basic motivation for performing quantitative model verification (meaning: “is the mathematical model solved correctly?”) and model validation (meaning: “does the model represent reality?”), which are essential components of “predictive modeling.”

Predictive modeling commences by identifying and characterizing the uncertainties involved in every step in the sequence of the numerical simulation processes that ultimately lead to a prediction. Predictive modeling comprises three key elements, namely model calibration, model extrapolation, and estimation of the validation domain. Model calibration addresses the combination of experimental and computational data and their uncertainties for the purpose of obtaining “best estimate” values for model parameters (to be used for updating the model’s parameters) and predicted model results, along with “best estimate” uncertainties (covariance/correlation matrices) for these “best-estimate” predicted parameters and results. Such a combination of computational and experimental information requires reasoning from incomplete, error-afflicted, and occasionally discrepant information, which includes: 1) errors and uncertainties in the data used in the simulation (e.g., input data, model parameters, initial conditions, boundary conditions, sources and forcing functions); 2) numerical discretization errors; and 3) uncertainties in (e.g., lack of knowledge of) the processes being modeled.

Under ideal circumstances, the result of predictive modeling is a probabilistic description of possible future outcomes based on all recognized errors and uncertainties. This probabilistic description enables the subsequent activity of “model extrapolation,” which aims at quantifying the uncertainties in predictions under new environments or conditions, including both untested regions of the parameter space and higher levels of system complexity in the validation hierarchy. The quantification of the validation domain underlying the models of interest requires estimation of contours of constant uncertainty in the high-dimensional space that characterizes the application of interest, including the identification of areas where the predictive estimation of uncertainty meets specified requirements for the performance, reliability, or safety of the system of interest.

The earliest activities aimed at extracting best-estimate values for model parameters by combining computational and experimental information using variational methods were initiated in the 1960s in the atmospheric and oceanographic sciences [1] [2] [3] and, in parallel, in the nuclear energy field [4] . In the earth, atmospheric and oceanographic sciences these activities were carried out under the name of “data assimilation” (DA) as described in well-known books [5] [6] [7] [8] , while in the nuclear sciences these activities reached conceptual maturity under the name of “nuclear data or cross section adjustment” [9] [10] [11] [12] . The “data adjustment” and “data assimilation” methodologies can thus be considered to have been the earliest systematic methodologies that embody the principles of “predictive modeling.”

The fundamental criterion used in data adjustment and data assimilation methods is the least squares criterion, which is employed in a variety of deterministic (variational) and/or statistical forms (including Bayesian minimization, maximum likelihood, and minimum variance methods). The fundamental tenet of data adjustment and data assimilation is the minimization of a user-defined “cost functional” which describes the squared departures between computational results and the observations/experiments of the respective results. In contradistinction to the least-squares tenet underlying data adjustment and/or assimilation, the “BERRU-PM” methodology developed by Cacuci [13] [14] employs the “maximum entropy” (MaxEnt) principle [15] to combine computational and experimental information for obtaining best-estimate predicted mean values for model responses and parameters, together with reduced predicted uncertainties for these best-estimate values, thereby eliminating the need for minimizing the user-chosen “quadratic cost functional representing the weighted errors between measured and computed responses.” BERRU-PM is an acronym for “Best-Estimate Results with Reduced Uncertainties – Predictive Modeling” and is a MaxEnt methodology that incorporates first-order sensitivities of model responses with respect to the model parameters. Such sensitivities are most efficiently computed by using the adjoint sensitivity analysis method for nonlinear systems originally developed by Cacuci [16] [17] . The first-order BERRU-PM methodology was recently extended by Cacuci [18] to include second-order sensitivities. The main differences between DA [5] [6] [7] [8] results and the BERRU-PM [16] [17] [18] results are as follows:

1) DA [5] [6] [7] [8] is formulated conceptually either just in the phase-space of measured responses (“observation-space formulation”) or just in the phase-space of the model’s dependent variables (“state-space formulation”). Hence, DA can calibrate initial conditions as “direct results” but cannot directly calibrate any other model parameters. In contradistinction, the BERRU-PM methodology is formulated conceptually in the most inclusive “joint-phase-space of parameters, computed and measured responses.” Consequently, the BERRU-PM methodology simultaneously calibrates responses and parameters, thus simultaneously providing results for forward and inverse problems.

2) If experiments are perfectly well known, the DA methodology fails fundamentally, but the BERRU-PM methodology does not. The DA methodology also fails fundamentally when the response measurements happen to coincide with the computed values of the responses, in which case DA yields only the trivial (null) result. In contradistinction, the BERRU-PM methodology yields a non-trivial result.

3) The BERRU-PM methodology is significantly more efficient computationally than DA since it only requires the inversion of a single matrix whose dimension equals the number of model responses, whereas DA requires the inversion of much larger matrices in the phase-space of dependent variables.

In this work, the methodology presented in [18] will be extended by including the full second-order representation of the computed responses, thereby creating the 2nd-BERRU-PMD methodology, where the attribute “2nd” indicates that this methodology incorporates second-order uncertainties (means and covariances) and second-order sensitivities of computed model responses to model parameters, and where the acronym BERRU stands, as before, for “Best-Estimate Results with Reduced Uncertainties;” the last letter (“D”) in the acronym indicates “deterministic,” referring to the deterministic inclusion of the computational model responses. The 2nd-BERRU-PMD methodology is fundamentally based on the MaxEnt principle.

This work is structured as follows: Section 2 presents the mathematical modeling of the physical system under consideration. Section 3 presents the development of the 2nd-BERRU-PMD methodology. It is shown that the results produced by the 2nd-BERRU-PMD methodology include and extend the results produced by the first-order BERRU-PM methodology and the extant data assimilation and/or data adjustment procedures while overcoming the fundamental limitations of these procedures. The concluding discussion presented in Section 4 lays the ground for the subsequent presentation [19] of the alternative framework for developing the “second-order MaxEnt predictive modelling methodology” which probabilistically (as opposed to “deterministically”) incorporates the computed model responses. This alternative methodology [19] will be designated by the acronym 2nd-BERRU-PMP, where the last letter (“P”) in the acronym indicates “probabilistic” inclusion of the computational model responses.

2. Mathematical Modeling of the Physical System

In general terms, the modeling of a physical system and/or the result of an indirect experimental measurement requires consideration of the following modeling components:

1) A mathematical model comprising linear and/or nonlinear equations that relate the system’s independent variables and parameters to the system’s state (i.e., dependent) variables;

2) Inequality and/or equality constraints that delimit the ranges of the system’s parameters;

3) One or several computational results, customarily referred to as system responses (or objective functions, or indices of performance), which are computed using the mathematical model; and

4) Experimentally measured responses, with their respective nominal (mean) values and uncertainties (variances, covariances, skewness, kurtosis, etc.).

A nonlinear physical system can be generally modeled by means of coupled equations which can be represented in operator form as follows:

$$\mathbf{N}^m[\mathbf{v}(\mathbf{x}), \boldsymbol{\alpha}] = \mathbf{Q}^m(\mathbf{x}, \boldsymbol{\alpha}), \quad \mathbf{x} \in \Omega_x(\boldsymbol{\alpha}). \tag{1}$$

The superscript “m” in Equation (1) indicates “model.” Boundary and/or initial conditions must also be provided if differential operators appear in Equation (1). In operator form, these boundaries and/or initial conditions are represented as follows:

$$\mathbf{B}^m[\mathbf{v}(\mathbf{x}); \boldsymbol{\alpha}; \mathbf{x}] - \mathbf{B}^s(\mathbf{x}, \boldsymbol{\alpha}) = \mathbf{0}, \quad \mathbf{x} \in \partial\Omega_x(\boldsymbol{\alpha}), \tag{2}$$

where the vector $\mathbf{B}^s(\mathbf{x}, \boldsymbol{\alpha})$ symbolically indicates “source terms on the boundary.” In this work, matrices will be denoted using capital bold letters while vectors will be denoted using either capital or lower-case bold letters. The symbol “$\triangleq$” will be used to denote “is defined as” or “is by definition equal to.” Transposition will be indicated by a dagger ($\dagger$) superscript. The equalities in this work are considered to hold in the weak (“distributional”) sense. The right-sides of Equations (1) and (2), as well as of other various equations to be derived in this work, may contain “generalized functions/functionals,” particularly Dirac-distributions and derivatives thereof. The quantities which appear in Equations (1) and (2) are defined below.

1) The quantity $\boldsymbol{\alpha} \triangleq (\alpha_1, \ldots, \alpha_{TP})^\dagger$ is a TP-dimensional vector, having components $\alpha_1, \ldots, \alpha_{TP}$, which denote the model’s imprecisely known parameters. The quantity “TP” denotes the “total number of model parameters.” Without loss of generality, the model parameters $\boldsymbol{\alpha} \in D_\alpha \subseteq \mathbb{R}^{TP}$ can be considered to be real scalars defined over a domain $D_\alpha$, which is included in a TP-dimensional subset of $\mathbb{R}^{TP}$. These model parameters usually stem from processes that are external to the physical system under consideration and their precise values are seldom, if ever, known. The known characteristics of the model parameters may include their nominal (expected/mean) values and, possibly, higher-order moments or cumulants (i.e., variance/covariances, skewness, kurtosis), which are usually determined from experimental data and/or processes external to the physical system under consideration. Occasionally, just the lower and the upper bounds may be known for some model parameters. The components of the TP-dimensional column vector $\boldsymbol{\alpha} \in \mathbb{R}^{TP}$ are considered to also include imprecisely known geometrical parameters that characterize the physical system’s boundaries in the phase-space of the model’s independent variables. Mathematically, the model parameters can be considered to be quasi-random scalar-valued quantities which follow an unknown multivariate distribution denoted as $p_\alpha(\boldsymbol{\alpha})$. The mean values (which in the context of computational modeling are called “nominal” values) of the model parameters will be denoted as $\alpha_i^0$; the superscript “0” will be used throughout this work to denote “nominal values.” These nominal values are formally defined as follows:

$$\alpha_i^0 \triangleq \int_{D_\alpha} \alpha_i\, p_\alpha(\boldsymbol{\alpha})\, d\boldsymbol{\alpha}, \quad i = 1, \ldots, TP. \tag{3}$$

The expected values of the measured parameters will be considered to constitute the components of a vector denoted as $\boldsymbol{\alpha}^0 \triangleq [\alpha_1^0, \ldots, \alpha_i^0, \ldots, \alpha_{TP}^0]^\dagger$. The covariance, $\operatorname{cov}(\alpha_i, \alpha_j)$, of two model parameters, $\alpha_i$ and $\alpha_j$, is defined as follows:

$$\operatorname{cov}(\alpha_i, \alpha_j) \triangleq \int_{D_\alpha} (\alpha_i - \alpha_i^0)(\alpha_j - \alpha_j^0)\, p_\alpha(\boldsymbol{\alpha})\, d\boldsymbol{\alpha}; \quad i, j = 1, \ldots, TP. \tag{4}$$

The covariances $\operatorname{cov}(\alpha_i, \alpha_j)$ are considered to constitute the components of the covariance matrix for the model parameters, which will be denoted as $\mathbf{C}_{\alpha\alpha} \triangleq [\operatorname{cov}(\alpha_i, \alpha_j)]_{TP \times TP}$. A numerical sketch of Equations (3) and (4) is provided after this list of definitions.

2) The generic nonlinear model is considered to comprise TI independent variables, which will be denoted as $x_i$, $i = 1, \ldots, TI$, and which are considered to be components of a TI-dimensional column vector denoted as $\mathbf{x} \triangleq (x_1, \ldots, x_{TI})^\dagger \in \mathbb{R}^{TI}$, where the sub/superscript “TI” denotes the “total number of independent variables.” The vector $\mathbf{x} \in \mathbb{R}^{TI}$ of independent variables is considered to be defined on a phase-space domain which will be denoted as $\Omega(\boldsymbol{\alpha})$ and which is defined as follows: $\Omega(\boldsymbol{\alpha}) \triangleq \{\lambda_i(\boldsymbol{\alpha}) \le x_i \le \omega_i(\boldsymbol{\alpha});\ i = 1, \ldots, TI\}$. The lower boundary-point of an independent variable is denoted as $\lambda_i(\boldsymbol{\alpha})$ and the corresponding upper boundary-point is denoted as $\omega_i(\boldsymbol{\alpha})$. The boundary $\partial\Omega(\boldsymbol{\alpha})$ is also considered to be imprecisely known since it may depend on both geometrical parameters and material properties. A typical example of boundaries that depend on both geometrical parameters and material properties are the “boundaries facing vacuum” in models based on diffusion theory, where conditions are imposed on the “extrapolated boundary” of the respective spatial domain. The “extrapolated boundary” depends both on the imprecisely known physical dimensions of the problem’s domain and also on the medium’s properties, such as atomic number densities and microscopic transport cross sections. The boundary of $\Omega(\boldsymbol{\alpha})$, denoted as $\partial\Omega(\boldsymbol{\alpha}) \triangleq \{\lambda_i(\boldsymbol{\alpha}) \cup \omega_i(\boldsymbol{\alpha}),\ i = 1, \ldots, TI\}$, includes the set of the endpoints $\lambda_i(\boldsymbol{\alpha})$, $\omega_i(\boldsymbol{\alpha})$, $i = 1, \ldots, TI$, of the respective intervals on which the components of $\mathbf{x}$ are defined.

3) The vector $\mathbf{v}(\mathbf{x}) \triangleq [v_1(\mathbf{x}), \ldots, v_{TD}(\mathbf{x})]^\dagger$ is a TD-dimensional column vector, having components $v_i(\mathbf{x})$, $i = 1, \ldots, TD$, which represent the model’s dependent variables (also called the model’s “state functions”). The abbreviation “TD” denotes “total number of dependent variables.” Without loss of generality, we can consider that $\mathbf{v}(\mathbf{x}) \in E_v$, where $E_v$ is a normed linear space over the scalar field $F$ of real numbers.

4) The quantity $\mathbf{N}^m[\mathbf{v}(\mathbf{x}); \boldsymbol{\alpha}] \triangleq [N_1^m(\mathbf{v}; \boldsymbol{\alpha}), \ldots, N_{TD}^m(\mathbf{v}; \boldsymbol{\alpha})]^\dagger$ is a TD-dimensional column vector comprising the left-sides of the equations underlying the mathematical/computational model; the superscript “m” indicates “model.” The components $N_i^m(\mathbf{v}; \boldsymbol{\alpha})$, $i = 1, \ldots, TD$, are operators (including differential, difference, integral, distributions, and/or finite or infinite matrices) acting nonlinearly on the dependent variables $\mathbf{v}(\mathbf{x})$, the independent variables $\mathbf{x}$ and the model parameters $\boldsymbol{\alpha}$. The mapping $\mathbf{N}^m(\mathbf{v}; \boldsymbol{\alpha})$ is defined on the combined domains of the model’s parameters and state functions, i.e., $\mathbf{N}: D \subset E \to E_Q$, where $D = D_u \oplus D_\alpha$, $D_u \subseteq E_u$, $D_\alpha \subseteq E_\alpha$, $E = E_u \oplus E_\alpha$.

5) The vector $\mathbf{Q}^m(\mathbf{x}, \boldsymbol{\alpha}) \triangleq [q_1^m(\mathbf{x}; \boldsymbol{\alpha}), \ldots, q_{TD}^m(\mathbf{x}; \boldsymbol{\alpha})]^\dagger \in E_Q$ is a TD-dimensional column vector which represents inhomogeneous source terms (i.e., the right-sides of the equations underlying the model), which usually depend nonlinearly on the uncertain parameters $\boldsymbol{\alpha}$. The vector $\mathbf{Q}^m(\mathbf{x}, \boldsymbol{\alpha})$ is defined on a normed linear space denoted as $E_Q$.

6) In Equation (2), the vector $\mathbf{B}^m(\mathbf{v}; \boldsymbol{\alpha}) \triangleq [B_1^m(\mathbf{v}; \boldsymbol{\alpha}), \ldots, B_{TB}^m(\mathbf{v}; \boldsymbol{\alpha})]^\dagger$, where the subscript “TB” denotes “total number of boundary conditions,” comprises components $B_i^m(\mathbf{v}; \boldsymbol{\alpha})$, $i = 1, \ldots, TB$, which are nonlinear operators in $\mathbf{v}(\mathbf{x})$ and $\boldsymbol{\alpha}$, and which are defined on the boundary $\partial\Omega_x(\boldsymbol{\alpha})$ of the model’s domain $\Omega_x(\boldsymbol{\alpha})$. The vector $\mathbf{B}^s(\mathbf{x}, \boldsymbol{\alpha}) \triangleq [B_1^s(\mathbf{x}; \boldsymbol{\alpha}), \ldots, B_{TB}^s(\mathbf{x}; \boldsymbol{\alpha})]^\dagger$ comprises inhomogeneous boundary sources (indicated by the superscript “s”) which are nonlinear functions of $\boldsymbol{\alpha}$. The column vector $\mathbf{0}$ has TB components, all of which are identically zero.
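As announced above, the following minimal Python sketch illustrates Equations (3) and (4) by estimating the nominal parameter values $\boldsymbol{\alpha}^0$ and the covariance matrix $\mathbf{C}_{\alpha\alpha}$ from a hypothetical parameter ensemble; all numerical values are assumptions chosen for this example, not data from any specific model.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical ensemble of TP = 3 imprecisely known parameters;
# rows are samples drawn from the (here assumed) distribution p_alpha.
alpha_samples = rng.multivariate_normal(
    mean=[1.0, 2.0, 0.5],
    cov=[[0.04, 0.01, 0.00],
         [0.01, 0.09, 0.00],
         [0.00, 0.00, 0.01]],
    size=10_000,
)

# Equation (3): nominal (mean) parameter values alpha^0.
alpha_0 = alpha_samples.mean(axis=0)

# Equation (4): parameter covariance matrix C_alpha_alpha (TP x TP).
C_aa = np.cov(alpha_samples, rowvar=False)

print("alpha^0 =", alpha_0)
print("C_alpha_alpha =\n", C_aa)
```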

Solving Equations (1) and (2) at the nominal parameter values, $\boldsymbol{\alpha}^0$, provides the “nominal solution” $\mathbf{v}^0(\mathbf{x})$, i.e., the vectors $\mathbf{v}^0(\mathbf{x})$ and $\boldsymbol{\alpha}^0$ satisfy the following equations:

$$\mathbf{N}^m[\mathbf{v}^0(\mathbf{x}); \boldsymbol{\alpha}^0] = \mathbf{Q}^m(\mathbf{x}, \boldsymbol{\alpha}^0), \quad \mathbf{x} \in \Omega_x, \tag{5}$$

$$\mathbf{B}^m[\mathbf{v}^0(\mathbf{x}); \boldsymbol{\alpha}^0; \mathbf{x}] - \mathbf{B}^s(\mathbf{x}, \boldsymbol{\alpha}^0) = \mathbf{0}, \quad \mathbf{x} \in \partial\Omega_x(\boldsymbol{\alpha}^0). \tag{6}$$

The results computed using a mathematical model are customarily called “model responses” (or “system responses” or “objective functions” or “indices of performance”). Consider that there are a total number of TR such model responses, each of which can be considered to be a component of the “vector of model responses” $\mathbf{r} \triangleq (r_1, \ldots, r_k, \ldots, r_{TR})^\dagger$. Each of these model responses is formally a function (implicit and/or explicit) of the dependent variables $\mathbf{v}(\mathbf{x})$ and model parameters $\boldsymbol{\alpha}$, which can be represented formally as follows:

$$r_k = r_k[\mathbf{v}(\mathbf{x}); \boldsymbol{\alpha}]; \quad k = 1, \ldots, TR. \tag{7}$$

In particular, a measurement of a physical quantity that depends on the model’s state functions and parameters can be considered to be a response denoted as $R_p[\mathbf{v}(\mathbf{x}); \boldsymbol{\alpha}]$, which is to be evaluated at $\mathbf{x} = \mathbf{x}^p(\boldsymbol{\alpha})$, where $\mathbf{x}^p(\boldsymbol{\alpha}) \triangleq [x_1^p(\boldsymbol{\alpha}), \ldots, x_k^p(\boldsymbol{\alpha}), \ldots, x_{TI}^p(\boldsymbol{\alpha})]^\dagger$ denotes the location in phase-space of the specific “measurement point.” Such a measurement (or measurement-like) response can be represented mathematically as follows:

$$R_p[\mathbf{v}(\mathbf{x}); \boldsymbol{\alpha}] \triangleq \int_{\lambda_1(\boldsymbol{\alpha})}^{\omega_1(\boldsymbol{\alpha})} \cdots \int_{\lambda_{TI}(\boldsymbol{\alpha})}^{\omega_{TI}(\boldsymbol{\alpha})} F[\mathbf{v}(\mathbf{x}); \boldsymbol{\alpha}; \mathbf{x}]\, \delta[x_1 - x_1^p(\boldsymbol{\alpha})] \times \cdots \times \delta[x_{TI} - x_{TI}^p(\boldsymbol{\alpha})]\, dx_1 \cdots dx_{TI}, \tag{8}$$

where the function $F[\mathbf{v}(\mathbf{x}); \boldsymbol{\alpha}; \mathbf{x}]$ comprises the mathematical dependence of the measurement device on the model’s dependent variable(s), and where the quantity $\delta[x_i - x_i^p(\boldsymbol{\alpha})]$ denotes the Dirac-delta functional. The measurement’s location in phase-space, $\mathbf{x}^p(\boldsymbol{\alpha})$, may itself be afflicted by measurement (experimental) uncertainties. Hence, the components of $\mathbf{x}^p(\boldsymbol{\alpha})$ must be included among the components of the vector $\boldsymbol{\alpha}$ of model parameters, even though the quantity $\mathbf{x}^p(\boldsymbol{\alpha})$ appears only in the definition of the response but does not appear in Equations (1) and (2), which mathematically define the physical model. Thus, the physical “system” is to be understood as being defined to comprise both the system’s computational model and the system’s responses. In most cases, the coordinates $x_k^p(\boldsymbol{\alpha})$, $k = 1, \ldots, TI$, will simply be independent (albeit uncertain) model parameters included among the components of the vector $\boldsymbol{\alpha}$, in which case $\partial x_k^p(\boldsymbol{\alpha})/\partial\alpha_n = 1$ if $\alpha_n \equiv x_k^p$, and $\partial x_k^p(\boldsymbol{\alpha})/\partial\alpha_n = 0$ if $\alpha_n \not\equiv x_k^p$.

The expression on the right-side of Equation (8) can evidently be computed using the model, thus producing the “computed response” value, which will be denoted as $r_k^c[\mathbf{v}(\mathbf{x}); \boldsymbol{\alpha}; \mathbf{x}]$, and which can be compared to the corresponding “experimentally measured” value; the superscript “c” indicates “computed.” The computed response $r_k^c[\mathbf{v}(\mathbf{x}); \boldsymbol{\alpha}; \mathbf{x}]$ depends (implicitly and/or explicitly) on the model’s parameters and dependent variables, which also depend, in turn, on the model’s parameters. Therefore, the uncertainties affecting the model parameters $\boldsymbol{\alpha}$ will “propagate” both directly and indirectly, through the model’s dependent variables, to induce uncertainties in the computed responses, which will therefore be denoted simply as $r_k^c(\boldsymbol{\alpha})$. The nominal value of the response will be denoted $r_k(\boldsymbol{\alpha}^0)$, indicating that this value is obtained by computing the model response using the expected/nominal parameter values $\boldsymbol{\alpha}^0 \triangleq (\alpha_1^0, \ldots, \alpha_{TP}^0)^\dagger$.

3. 2nd-BERRU-PMD: Second Order MaxEnt Predictive Modeling Methodology with Deterministically Included Computed Responses

This Section presents the mathematical and physical considerations leading to the development of the “second-order MaxEnt predictive modeling methodology for obtaining best-estimate results with reduced uncertainties” by deterministically incorporating the computational model. This methodology is designated using the acronym 2nd-BERRU-PMD, where the last letter (“D”) indicates “deterministic,” referring to the inclusion of the computed model responses. The posterior first-order moments (best-estimate predicted mean values) and posterior second-order moments (best-estimate predicted correlations) produced by the 2nd-BERRU-PMD methodology are compared with the results produced by the 1st-BERRU-PM methodology [14], as well as with the results produced by data assimilation procedures, indicating the specific ways in which the 2nd-BERRU-PMD methodology overcomes the limitations of these latter procedures.

3.1. MaxEnt Representation of Experimental Information for Responses and Parameters

Consider that the total number of experimentally measured responses is TR. The information usually available regarding the distribution of such measured responses comprises the first-order moments (mean values), which will be denoted as $r_i^e$, $i = 1, \ldots, TR$, and the second-order moments (variances/covariances), which will be denoted as $\operatorname{cov}(r_i, r_j)^e$, $i, j = 1, \ldots, TR$, for the measured responses. The letter “e” will be used either as a superscript or a subscript to indicate experimentally measured quantities. The expected values of the experimentally measured responses will be considered to constitute the components of a vector denoted as $\mathbf{r}^e \triangleq (r_1^e, \ldots, r_{TR}^e)^\dagger$. The covariances of the measured responses are considered to be components of the $TR \times TR$-dimensional covariance matrix of measured responses, which will be denoted as $\mathbf{C}^e_{rr} \triangleq [\operatorname{cov}(r_i, r_j)^e]_{TR \times TR}$. In principle, it is also possible to obtain correlations between some measured responses and some model parameters. When such correlations between measured responses and measured model parameters are available, they will be denoted as $\operatorname{cor}(\alpha_i, r_j)^e$, $i = 1, \ldots, TP$; $j = 1, \ldots, TR$, and they can formally be considered to be elements of a rectangular correlation matrix which will be denoted as $\mathbf{C}^e_{\alpha r} \triangleq [\operatorname{cor}(\alpha_i, r_j)^e]_{TP \times TR}$. As discussed in Section 2, cf. Equations (3) and (4), the model parameters are characterized by the vector of mean values $\boldsymbol{\alpha}^0 \triangleq [\alpha_1^0, \ldots, \alpha_i^0, \ldots, \alpha_{TP}^0]^\dagger$ and the covariance matrix $\mathbf{C}_{\alpha\alpha} \triangleq [\operatorname{cov}(\alpha_i, \alpha_j)]_{TP \times TP}$.

The MaxEnt principle can now be applied, as described in Appendix A, to construct the least informative (and hence, most conservative) distribution using the available experimental information, to obtain the following expression:

$$p_e(\mathbf{z} \mid \mathbf{z}^e, \mathbf{C}_e) = \frac{\exp\left[-\frac{1}{2}(\mathbf{z} - \mathbf{z}^e)^\dagger \mathbf{C}_e^{-1}(\mathbf{z} - \mathbf{z}^e)\right]}{(2\pi)^{(TR+TP)/2}\sqrt{\operatorname{Det}(\mathbf{C}_e)}}, \tag{9}$$

where $\operatorname{Det}(\mathbf{C}_e)$ denotes the determinant of the matrix $\mathbf{C}_e$ and where:

$$\mathbf{C}_e \triangleq \begin{pmatrix} \mathbf{C}^e_{rr} & \mathbf{C}^e_{r\alpha} \\ \mathbf{C}^e_{\alpha r} & \mathbf{C}_{\alpha\alpha} \end{pmatrix}; \quad \mathbf{z} = \begin{pmatrix} \mathbf{r} \\ \boldsymbol{\alpha} \end{pmatrix}; \quad \mathbf{z}^e = \begin{pmatrix} \mathbf{r}^e \\ \boldsymbol{\alpha}^0 \end{pmatrix}. \tag{10}$$
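As a concrete illustration of Equations (9) and (10), the sketch below assembles the block matrix $\mathbf{C}_e$ and evaluates the MaxEnt density $p_e$; the dimensions ($TR = 2$, $TP = 3$) and all numerical values are hypothetical.

```python
import numpy as np

TR, TP = 2, 3  # illustrative dimensions

# Hypothetical experimental information (Section 3.1).
r_e     = np.array([10.0, 5.0])           # measured response means r^e
C_rr    = np.array([[0.25, 0.02],
                    [0.02, 0.16]])        # response covariances C^e_rr
C_ra    = np.zeros((TR, TP))              # response-parameter correlations C^e_{r,alpha}
alpha_0 = np.array([1.0, 2.0, 0.5])       # nominal parameter values alpha^0
C_aa    = np.diag([0.04, 0.09, 0.01])     # parameter covariances C_{alpha,alpha}

# Equation (10): block covariance matrix C_e and mean vector z^e.
C_e = np.block([[C_rr, C_ra], [C_ra.T, C_aa]])
z_e = np.concatenate([r_e, alpha_0])

def p_e(z):
    """Equation (9): MaxEnt (Gaussian) density of the experimental information."""
    dz = z - z_e
    norm = (2.0 * np.pi) ** (-(TR + TP) / 2) / np.sqrt(np.linalg.det(C_e))
    return norm * np.exp(-0.5 * dz @ np.linalg.solve(C_e, dz))

print(p_e(z_e))  # density at the experimental mean point
```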

3.2. Construction of the Joint Posterior MaxEnt Probability Distribution 2nd-BERRU-PMD

Consistent with the consideration that only mean values and covariances (i.e., first- and second-order distributional moments) are available, the 2nd-BERRU-PMD methodology considers that only first- and second-order sensitivities of the computed responses with respect to the model parameters are available. In this case, the vector of model responses can be represented by the following second-order Taylor-series written in “vector/matrix” form:

$$\mathbf{r} = \mathbf{r}^c(\boldsymbol{\alpha}^0) + [\mathbf{S}(\boldsymbol{\alpha}^0)](\boldsymbol{\alpha} - \boldsymbol{\alpha}^0) + \mathbf{p}(\boldsymbol{\alpha}; \boldsymbol{\alpha}^0); \quad \mathbf{p}(\boldsymbol{\alpha}; \boldsymbol{\alpha}^0) \triangleq [p_1(\boldsymbol{\alpha}; \boldsymbol{\alpha}^0), \ldots, p_{TR}(\boldsymbol{\alpha}; \boldsymbol{\alpha}^0)]^\dagger; \quad p_k(\boldsymbol{\alpha}; \boldsymbol{\alpha}^0) \triangleq \frac{1}{2}(\boldsymbol{\alpha} - \boldsymbol{\alpha}^0)^\dagger[\mathbf{Q}_k(\boldsymbol{\alpha}^0)](\boldsymbol{\alpha} - \boldsymbol{\alpha}^0), \tag{11}$$

where the components of the vector $\mathbf{r}^c(\boldsymbol{\alpha}^0) \triangleq [r_1^c(\boldsymbol{\alpha}^0), \ldots, r_{TR}^c(\boldsymbol{\alpha}^0)]^\dagger$ represent the values of the model responses computed at the nominal parameter values $\boldsymbol{\alpha}^0$, while the other vectors and matrices contain first-order and second-order sensitivities of responses with respect to the model parameters, evaluated at the nominal parameter values $\boldsymbol{\alpha}^0$, and are defined below.

$$\mathbf{S}(\boldsymbol{\alpha}) \triangleq \begin{pmatrix} s_{11} & \cdots & s_{1,TP} \\ \vdots & \ddots & \vdots \\ s_{TR,1} & \cdots & s_{TR,TP} \end{pmatrix} \triangleq \begin{pmatrix} \dfrac{\partial r_1(\boldsymbol{\alpha})}{\partial\alpha_1} & \cdots & \dfrac{\partial r_1(\boldsymbol{\alpha})}{\partial\alpha_{TP}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial r_{TR}(\boldsymbol{\alpha})}{\partial\alpha_1} & \cdots & \dfrac{\partial r_{TR}(\boldsymbol{\alpha})}{\partial\alpha_{TP}} \end{pmatrix}; \quad \mathbf{Q}_k(\boldsymbol{\alpha}) \triangleq \begin{pmatrix} q_{11}^{(k)} & \cdots & q_{1,TP}^{(k)} \\ \vdots & \ddots & \vdots \\ q_{TP,1}^{(k)} & \cdots & q_{TP,TP}^{(k)} \end{pmatrix} \triangleq \begin{pmatrix} \dfrac{\partial^2 r_k^c(\boldsymbol{\alpha})}{\partial\alpha_1\partial\alpha_1} & \cdots & \dfrac{\partial^2 r_k^c(\boldsymbol{\alpha})}{\partial\alpha_1\partial\alpha_{TP}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 r_k^c(\boldsymbol{\alpha})}{\partial\alpha_{TP}\partial\alpha_1} & \cdots & \dfrac{\partial^2 r_k^c(\boldsymbol{\alpha})}{\partial\alpha_{TP}\partial\alpha_{TP}} \end{pmatrix}. \tag{12}$$
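The sensitivities entering Equation (12) are most efficiently computed with the adjoint methods cited in the Introduction; purely to illustrate the content of $\mathbf{S}$ and $\mathbf{Q}_k$, the sketch below approximates them by finite differences for a hypothetical two-response model (the model function and step sizes are assumptions of this example).

```python
import numpy as np

def model_responses(alpha):
    """Hypothetical two-response model r^c(alpha), standing in for Eqs. (1)-(2) and (7)."""
    a1, a2, a3 = alpha
    return np.array([a1 * a2 + a3 ** 2, np.exp(0.1 * a1) * a2])

alpha_0 = np.array([1.0, 2.0, 0.5])
TR, TP = 2, 3

# First-order sensitivities S (TR x TP) of Eq. (12), by central differences.
h = 1.0e-5
S = np.zeros((TR, TP))
for j in range(TP):
    e_j = np.eye(TP)[j]
    S[:, j] = (model_responses(alpha_0 + h * e_j)
               - model_responses(alpha_0 - h * e_j)) / (2.0 * h)

# Second-order sensitivities Q_k (one TP x TP Hessian per response k) of Eq. (12).
h2 = 1.0e-4
Q = np.zeros((TR, TP, TP))
r0 = model_responses(alpha_0)
for i in range(TP):
    for j in range(TP):
        e_i, e_j = np.eye(TP)[i], np.eye(TP)[j]
        Q[:, i, j] = (model_responses(alpha_0 + h2 * (e_i + e_j))
                      - model_responses(alpha_0 + h2 * e_i)
                      - model_responses(alpha_0 + h2 * e_j) + r0) / h2 ** 2

print("S =\n", S)
print("Q_1 =\n", Q[0])  # Hessian of the first response
```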

The merging of the experimental information with the computational information using the mathematical model is accomplished by seeking a mathematical representation of a “true” probability distribution $p(\mathbf{z})$, $\mathbf{z} \triangleq (\mathbf{r}^\dagger, \boldsymbol{\alpha}^\dagger)^\dagger$, which incorporates all of the second-order information provided in the foregoing about the model parameters, measured and computed responses. This “true” probability distribution must satisfy the following characteristics: 1) $p(\mathbf{z})$ is normalized to unity; 2) $p(\mathbf{z})$ is approximated as closely as possible by the experimental distribution $p_e(\mathbf{z} \mid \mathbf{z}^e, \mathbf{C}_e)$; and 3) $p(\mathbf{z})$ satisfies the following integral forms of the deterministic relationships among responses and parameters represented mathematically by Equation (11):

$$\int_{D_z} g_k(\mathbf{z})\, p(\mathbf{z})\, d\mathbf{z} = 0; \quad g_k(\mathbf{z}) \triangleq r_k - r_k^c(\boldsymbol{\alpha}^0) - \sum_{i=1}^{TP} s_{ki}\,\delta\alpha_i - \frac{1}{2}\sum_{i=1}^{TP}\sum_{j=1}^{TP} q_{ij}^{(k)}\,\delta\alpha_i\,\delta\alpha_j = 0; \quad \delta\alpha_i \triangleq \alpha_i - \alpha_i^0; \quad k = 1, \ldots, TR. \tag{13}$$

The “true” normalized distribution p ( z ) which minimizes the discrepancy between it and the distribution p e ( z | z e , C e ) while satisfying the constraints expressed by Equation (13) is obtained by applying the steps generally outlined in Appendix B, which yields the following expressions:

$$p(\mathbf{z}) = \frac{p_e(\mathbf{z} \mid \mathbf{z}^e, \mathbf{C}_e)}{Z}\exp\left\{\sum_{k=1}^{TR}\theta_k g_k(\mathbf{z})\right\}; \tag{14}$$

$$Z \triangleq \int_{D_z} p_e(\mathbf{z} \mid \mathbf{z}^e, \mathbf{C}_e)\exp\left\{\sum_{k=1}^{TR}\theta_k g_k(\mathbf{z})\right\} d\mathbf{z}. \tag{15}$$

In Equations (14) and (15), the quantities $\theta_1, \ldots, \theta_k, \ldots, \theta_{TR}$ are Lagrange multipliers which remain to be determined. In statistical mechanics, the normalization integral $Z$ defined in Equation (15) is called the partition function (or sum over states) and carries all of the information available about the possible states of the system. The quantity $\ln Z$ is called the “free energy” and plays an important role in determining the expressions of the Lagrange multipliers and hence of the probability distribution $p(\mathbf{z})$. In the present case, it follows from Equations (15) and (13) that:

$$\frac{\partial[\ln Z]}{\partial\theta_k} = \frac{1}{Z}\int_{D_z} g_k(\mathbf{z})\, p_e(\mathbf{z} \mid \mathbf{z}^e, \mathbf{C}_e)\exp\left\{\sum_{j=1}^{TR}\theta_j g_j(\mathbf{z})\right\} d\mathbf{z} = 0, \quad k = 1, \ldots, TR. \tag{16}$$

On the other hand, alternative expressions for the derivatives $\partial[\ln Z]/\partial\theta_k$ of the free energy with respect to the Lagrange multipliers can be obtained by evaluating the integral which defines the partition function $Z$ in Equation (15). Equating these alternative expressions to zero, in view of Equation (16), will provide the equations which are to be solved to obtain the explicit expressions of the Lagrange multipliers.

Integrals such as the one representing $Z$ in Equation (15) can be evaluated, to any desired order of accuracy (even exactly), using the well-known saddle-point (Laplace expansion) method, see e.g., [20]. Examining the exponent of the integrand in the normalization integral $Z$ defined in Equation (15) readily indicates that if the term involving the second-order sensitivities of the responses with respect to the parameters were retained in the exponent, the expression of the saddle-point would involve the vector of Lagrange multipliers, $\boldsymbol{\theta} \triangleq (\theta_1, \ldots, \theta_k, \ldots, \theta_{TR})^\dagger$, in a nonlinear equation which cannot be solved analytically to obtain a closed-form expression for these Lagrange multipliers. Furthermore, solving this nonlinear equation in $\boldsymbol{\theta}$ numerically would require the inversion of the (very large) parameter-covariance matrix $\mathbf{C}_{\alpha\alpha}$, which is impractical. On the other hand, as will be shown below, a closed-form expression for the saddle-point and, consequently, for the Lagrange multipliers can be obtained if the terms containing the second-order response sensitivities are treated by expanding the respective exponential term in a Taylor-series in powers of $\boldsymbol{\theta}$ or, equivalently, around the nominal parameter values. Expanding the second-order terms transforms the normalization integral $Z$ defined in Equation (15) into the following form:

$$Z \cong \int_{D_z} f(\mathbf{r}, \boldsymbol{\alpha})\, e^{h(\mathbf{r}, \boldsymbol{\alpha}; \boldsymbol{\theta})}\, d\mathbf{r}\, d\boldsymbol{\alpha}, \tag{17}$$

where $\boldsymbol{\theta} \triangleq (\theta_1, \ldots, \theta_k, \ldots, \theta_{TR})^\dagger$ and where:

$$h(\mathbf{r}, \boldsymbol{\alpha}; \boldsymbol{\theta}) \triangleq -\frac{1}{2}\begin{pmatrix} \mathbf{r} - \mathbf{r}^e \\ \boldsymbol{\alpha} - \boldsymbol{\alpha}^0 \end{pmatrix}^\dagger \begin{pmatrix} \mathbf{C}^e_{rr} & \mathbf{C}^e_{r\alpha} \\ \mathbf{C}^e_{\alpha r} & \mathbf{C}_{\alpha\alpha} \end{pmatrix}^{-1} \begin{pmatrix} \mathbf{r} - \mathbf{r}^e \\ \boldsymbol{\alpha} - \boldsymbol{\alpha}^0 \end{pmatrix} + \boldsymbol{\theta}^\dagger\left[\mathbf{r} - \mathbf{r}^c(\boldsymbol{\alpha}^0) - \mathbf{S}(\boldsymbol{\alpha} - \boldsymbol{\alpha}^0)\right], \tag{18}$$

$$f(\mathbf{r}, \boldsymbol{\alpha}) \triangleq 1 - \frac{1}{2}\sum_{k=1}^{TR}\theta_k(\boldsymbol{\alpha} - \boldsymbol{\alpha}^0)^\dagger[\mathbf{Q}_k(\boldsymbol{\alpha}^0)](\boldsymbol{\alpha} - \boldsymbol{\alpha}^0) + \text{hots}, \tag{19}$$

where the acronym “hots” stands for “higher-order terms in the expansion of the exponential $\exp\{\sum_{k=1}^{TR}\theta_k g_k(\mathbf{z})\}$.” Neglecting these higher-order terms, it follows from Equations (17)-(19) that

$$Z = I_1 - \frac{1}{2}\sum_{k=1}^{TR}\theta_k J_k, \tag{20}$$

where

$$I_1 \triangleq \int_{D_z} e^{h(\mathbf{r}, \boldsymbol{\alpha}; \boldsymbol{\theta})}\, d\mathbf{r}\, d\boldsymbol{\alpha}, \tag{21}$$

$$J_k \triangleq \int_{D_z} (\boldsymbol{\alpha} - \boldsymbol{\alpha}^0)^\dagger[\mathbf{Q}_k(\boldsymbol{\alpha}^0)](\boldsymbol{\alpha} - \boldsymbol{\alpha}^0)\, e^{h(\mathbf{r}, \boldsymbol{\alpha}; \boldsymbol{\theta})}\, d\mathbf{r}\, d\boldsymbol{\alpha}. \tag{22}$$

The saddle-point, denoted as $\mathbf{z}^{sp} \triangleq (\mathbf{r}^{sp}, \boldsymbol{\alpha}^{sp})$, of the function $h(\mathbf{r}, \boldsymbol{\alpha}; \boldsymbol{\theta})$ is defined as the point where the following partial gradients vanish:

$$\nabla_{\mathbf{r}}\, h(\mathbf{r}, \boldsymbol{\alpha}; \boldsymbol{\theta}) = \mathbf{0}, \quad \nabla_{\boldsymbol{\alpha}}\, h(\mathbf{r}, \boldsymbol{\alpha}; \boldsymbol{\theta}) = \mathbf{0}, \quad \text{at } \mathbf{z} = \mathbf{z}^{sp}. \tag{23}$$

The conditions represented by Equation (23) yield the following equation:

$$\begin{pmatrix} \mathbf{C}^e_{rr} & \mathbf{C}^e_{r\alpha} \\ \mathbf{C}^e_{\alpha r} & \mathbf{C}_{\alpha\alpha} \end{pmatrix}^{-1} \begin{pmatrix} \mathbf{r}^{sp} - \mathbf{r}^e \\ \boldsymbol{\alpha}^{sp} - \boldsymbol{\alpha}^0 \end{pmatrix} = \begin{pmatrix} \boldsymbol{\theta} \\ -\mathbf{S}^\dagger\boldsymbol{\theta} \end{pmatrix}. \tag{24}$$

Solving Equation (24) yields the following expressions for the components of the saddle point $\mathbf{z}^{sp} \triangleq (\mathbf{r}^{sp}, \boldsymbol{\alpha}^{sp})$ of the function $h(\mathbf{r}, \boldsymbol{\alpha}; \boldsymbol{\theta})$:

$$\mathbf{r}^{sp} = \mathbf{r}^e + (\mathbf{C}^e_{rr} - \mathbf{C}^e_{r\alpha}\mathbf{S}^\dagger)\boldsymbol{\theta}, \tag{25}$$

$$\boldsymbol{\alpha}^{sp} = \boldsymbol{\alpha}^0 + (\mathbf{C}^e_{\alpha r} - \mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger)\boldsymbol{\theta}. \tag{26}$$

Using the saddle point method (which is exact for the integral $I_1$) or “completing the square” in the exponent of $I_1$ yields the following exact expression for the integral $I_1$:

$$I_1 = (2\pi)^{(TR+TP)/2}\,|\mathbf{C}_e|^{1/2}\exp\left[-\boldsymbol{\theta}^\dagger\mathbf{d} + \frac{1}{2}\begin{pmatrix} \boldsymbol{\theta} \\ -\mathbf{S}^\dagger\boldsymbol{\theta} \end{pmatrix}^\dagger \mathbf{C}_e \begin{pmatrix} \boldsymbol{\theta} \\ -\mathbf{S}^\dagger\boldsymbol{\theta} \end{pmatrix}\right], \tag{27}$$

where $|\mathbf{C}_e|$ denotes the determinant of the matrix $\mathbf{C}_e$ and where $\mathbf{d}$ denotes the “vector of deviations between the nominal values of computed and measured responses,” which is defined below:

$$\mathbf{d} \triangleq \mathbf{r}^c(\boldsymbol{\alpha}^0) - \mathbf{r}^e; \quad \mathbf{d} \triangleq (d_1, \ldots, d_{TR})^\dagger; \quad d_i \triangleq r_i^c(\boldsymbol{\alpha}^0) - r_i^e; \quad i = 1, \ldots, TR. \tag{28}$$

The expression of the response in terms of the computational model, i.e., Equation (11), can be used to obtain the following representation of $\mathbf{d}$ in terms of the computed responses:

$$\mathbf{d} = \mathbf{r} - \mathbf{r}^e - \mathbf{S}(\boldsymbol{\alpha} - \boldsymbol{\alpha}^0) - \mathbf{p}(\boldsymbol{\alpha}; \boldsymbol{\alpha}^0). \tag{29}$$

Thus, the “vector of deviations” $\mathbf{d}$ can be considered to be a (quasi-)random variable. The expressions of the “vector of mean values of deviations” $\bar{\mathbf{d}} \triangleq [\bar{d}_1, \ldots, \bar{d}_{TR}]^\dagger$ and the covariance matrix $\mathbf{D} \triangleq [D_{ij}]_{TR \times TR}$ of the vector of deviations $\mathbf{d}$ are derived below:

$$\bar{\mathbf{d}} \triangleq \int_{D_z}\left[\mathbf{r} - \mathbf{r}^e - \mathbf{S}(\boldsymbol{\alpha} - \boldsymbol{\alpha}^0) - \mathbf{p}(\boldsymbol{\alpha}; \boldsymbol{\alpha}^0)\right] p_e(\mathbf{z} \mid \mathbf{z}^e, \mathbf{C}_e)\, d\mathbf{z} = [\bar{d}_1, \ldots, \bar{d}_{TR}]^\dagger, \tag{30}$$

$$\bar{d}_k = -\frac{1}{2}\sum_{i=1}^{TP}\sum_{j=1}^{TP} q_{ij}^{(k)}\operatorname{cov}(\alpha_i, \alpha_j); \quad k = 1, \ldots, TR; \tag{31}$$

$$\mathbf{D} \triangleq \int_{D_z}(\mathbf{d} - \bar{\mathbf{d}})(\mathbf{d} - \bar{\mathbf{d}})^\dagger p_e(\mathbf{z} \mid \mathbf{z}^e, \mathbf{C}_e)\, d\mathbf{z} = \int_{D_z} p_e(\mathbf{z} \mid \mathbf{z}^e, \mathbf{C}_e)\left[\mathbf{r} - \mathbf{r}^e - \mathbf{S}(\boldsymbol{\alpha} - \boldsymbol{\alpha}^0) - \mathbf{p}(\boldsymbol{\alpha}; \boldsymbol{\alpha}^0)\right]\left[\mathbf{r} - \mathbf{r}^e - \mathbf{S}(\boldsymbol{\alpha} - \boldsymbol{\alpha}^0) - \mathbf{p}(\boldsymbol{\alpha}; \boldsymbol{\alpha}^0)\right]^\dagger d\mathbf{z} \cong \mathbf{C}^e_{rr} + \mathbf{S}\mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger - \mathbf{C}^e_{r\alpha}\mathbf{S}^\dagger - \mathbf{S}\mathbf{C}^e_{\alpha r}. \tag{32}$$

The third- and fourth-order correlations were neglected in the final expression obtained for the covariance matrix $\mathbf{D} \triangleq [D_{ij}]_{TR \times TR}$ in Equation (32). Notably, the result obtained in Equations (30) and (31) indicates that if the second-order sensitivities of the responses with respect to the model parameters are neglected, then $\bar{d}_k = 0$, $k = 1, \ldots, TR$, in which case $\mathbf{d}$ would be a random vector with zero mean and covariance $\mathbf{D}$. Therefore, neglecting the second-order sensitivities would imply that the computed model responses are consistent with the measurements only if $\mathbf{d} \cong \mathbf{0}$, i.e., $\mathbf{r}^c(\boldsymbol{\alpha}^0) \cong \mathbf{r}^e$. Although such a situation would not be impossible to conceive, it would probably not occur often in practice. These considerations further underscore the significant impact which the second-order response sensitivities have on the correct interpretation of actual, physical systems.
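A minimal sketch of Equations (31) and (32), using illustrative values for $\mathbf{S}$, the Hessians $\mathbf{Q}_k$ and the covariance blocks (consistent with the earlier sketches), is as follows:

```python
import numpy as np

# Illustrative first- and second-order sensitivities and covariance blocks.
TR, TP = 2, 3
S    = np.array([[2.0, 1.0, 1.0], [0.22, 1.11, 0.0]])
Q    = np.zeros((TR, TP, TP)); Q[0, 2, 2] = 2.0   # Hessians Q_k of Eq. (12)
C_aa = np.diag([0.04, 0.09, 0.01])
C_rr = np.array([[0.25, 0.02], [0.02, 0.16]])
C_ra = np.zeros((TR, TP))

# Equation (31): mean deviations d_bar_k = -(1/2) sum_ij q^(k)_ij cov(alpha_i, alpha_j).
d_bar = np.array([-0.5 * np.sum(Q[k] * C_aa) for k in range(TR)])

# Equation (32): covariance matrix D of the deviation vector d
# (third- and fourth-order correlations neglected).
D = C_rr + S @ C_aa @ S.T - C_ra @ S.T - S @ C_ra.T

print("d_bar =", d_bar)
print("D =\n", D)
```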

Applying the saddle-point method to the expression provided in Equation (22) yields the following expression:

$$J_k = I_1(\boldsymbol{\alpha}^{sp} - \boldsymbol{\alpha}^0)^\dagger[\mathbf{Q}_k(\boldsymbol{\alpha}^0)](\boldsymbol{\alpha}^{sp} - \boldsymbol{\alpha}^0)\left[1 + O(\mathbf{C}_{\alpha\alpha})\right], \tag{33}$$

where the quantity $O(\mathbf{C}_{\alpha\alpha})$ denotes terms of the order of the components of the parameter covariance matrix $\mathbf{C}_{\alpha\alpha}$, which can therefore be neglected by comparison to the leading term of order unity.

Replacing the expressions obtained in Equations (27) and (33) into Equation (20) and neglecting the higher-order terms yields the following expression for the partition function:

$$Z = I_1\left\{1 - \frac{1}{2}\sum_{k=1}^{TR}\theta_k(\boldsymbol{\alpha}^{sp} - \boldsymbol{\alpha}^0)^\dagger[\mathbf{Q}_k(\boldsymbol{\alpha}^0)](\boldsymbol{\alpha}^{sp} - \boldsymbol{\alpha}^0)\right\}. \tag{34}$$

Taking the logarithm of Equation (34) yields the following expression for the “free energy”:

$$\ln Z = \frac{TR + TP}{2}\ln(2\pi) + \frac{1}{2}\ln|\mathbf{C}_e| - \boldsymbol{\theta}^\dagger\mathbf{d} + h_1(\boldsymbol{\theta}) + h_2(\boldsymbol{\theta}), \tag{35}$$

$$h_1(\boldsymbol{\theta}) \triangleq \frac{1}{2}\left[\boldsymbol{\theta}^\dagger\mathbf{C}^e_{rr}\boldsymbol{\theta} - \boldsymbol{\theta}^\dagger(\mathbf{C}^e_{r\alpha}\mathbf{S}^\dagger + \mathbf{S}\mathbf{C}^e_{\alpha r})\boldsymbol{\theta} + \boldsymbol{\theta}^\dagger\mathbf{S}\mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger\boldsymbol{\theta}\right], \tag{36}$$

$$h_2(\boldsymbol{\theta}) \triangleq \ln\left\{1 - \frac{1}{2}\sum_{k=1}^{TR}\theta_k(\boldsymbol{\alpha}^{sp} - \boldsymbol{\alpha}^0)^\dagger[\mathbf{Q}_k(\boldsymbol{\alpha}^0)](\boldsymbol{\alpha}^{sp} - \boldsymbol{\alpha}^0)\right\} = \ln\left\{1 - \frac{1}{2}\sum_{k=1}^{TR}\theta_k\left[\boldsymbol{\theta}^\dagger\mathbf{M}^{(k)}\boldsymbol{\theta}\right]\right\}, \tag{37}$$

$$\mathbf{M}^{(k)} \triangleq \left[M_{ij}^{(k)}\right]_{TR \times TR} \triangleq (\mathbf{C}^e_{\alpha r} - \mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger)^\dagger[\mathbf{Q}_k(\boldsymbol{\alpha}^0)](\mathbf{C}^e_{\alpha r} - \mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger) = \left[\mathbf{M}^{(k)}\right]^\dagger. \tag{38}$$

The expression for $h_2(\boldsymbol{\theta})$ presented in Equation (37) has been obtained by using Equation (26). As indicated in Equations (36) and (37), respectively, the quantity $h_1(\boldsymbol{\theta})$ contains only first-order response sensitivities (with respect to the model’s parameters) and second-order bilinear terms in the components of $\boldsymbol{\theta}$, while the quantity $h_2(\boldsymbol{\theta})$ comprises all of the second-order response sensitivities and terms higher than second-order in the components of $\boldsymbol{\theta}$.

In view of Equations (35)-(37), the derivatives of the free energy with respect to the Lagrange multipliers $\theta_k$ are provided by the expression below:

$$\frac{\partial[\ln Z]}{\partial\theta_k} = -d_k + \frac{\partial h_1(\boldsymbol{\theta})}{\partial\theta_k} + \frac{\partial h_2(\boldsymbol{\theta})}{\partial\theta_k}, \quad k = 1, \ldots, TR. \tag{39}$$

It follows from Equations (39) and (16) that the vector of Lagrange multipliers must be the solution of the following system of coupled nonlinear equations:

$$w_k(\boldsymbol{\theta}) \triangleq \frac{\partial h_1(\boldsymbol{\theta})}{\partial\theta_k} + \frac{\partial h_2(\boldsymbol{\theta})}{\partial\theta_k} - d_k = 0, \quad k = 1, \ldots, TR, \tag{40}$$

where:

$$\frac{\partial h_1(\boldsymbol{\theta})}{\partial\theta_k} = \sum_{j=1}^{TR} D_{kj}\theta_j; \quad [D_{kj}]_{TR \times TR} \triangleq \mathbf{D} \triangleq \mathbf{C}^e_{rr} + \mathbf{S}\mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger - \mathbf{C}^e_{r\alpha}\mathbf{S}^\dagger - \mathbf{S}\mathbf{C}^e_{\alpha r}; \tag{41}$$

$$\frac{\partial h_2(\boldsymbol{\theta})}{\partial\theta_k} = \frac{-\dfrac{1}{2}\dfrac{\partial}{\partial\theta_k}\displaystyle\sum_{i=1}^{TR}\theta_i\left[\boldsymbol{\theta}^\dagger\mathbf{M}^{(i)}\boldsymbol{\theta}\right]}{1 - \dfrac{1}{2}\displaystyle\sum_{i=1}^{TR}\theta_i\left[\boldsymbol{\theta}^\dagger\mathbf{M}^{(i)}\boldsymbol{\theta}\right]} \cong -\frac{1}{2}\boldsymbol{\theta}^\dagger\mathbf{M}^{(k)}\boldsymbol{\theta} - \sum_{i=1}^{TR}\theta_i\sum_{j=1}^{TR} M_{kj}^{(i)}\theta_j \triangleq u_k(\boldsymbol{\theta}). \tag{42}$$

The approximation $\left\{1 - \frac{1}{2}\sum_{j=1}^{TR}\theta_j\left[\boldsymbol{\theta}^\dagger\mathbf{M}^{(j)}\boldsymbol{\theta}\right]\right\}^{-1} \cong 1$ has been used to obtain the approximate expression on the right-side of Equation (42), which thus comprises just the bilinear terms in $\boldsymbol{\theta}$, neglecting the higher-order terms involving the components of $\boldsymbol{\theta}$, which are expected to be comparatively small.

Introducing the vectors $\mathbf{w}(\boldsymbol{\theta}) \triangleq [w_1(\boldsymbol{\theta}), \ldots, w_k(\boldsymbol{\theta}), \ldots, w_{TR}(\boldsymbol{\theta})]^\dagger$ and $\mathbf{u}(\boldsymbol{\theta}) \triangleq [u_1(\boldsymbol{\theta}), \ldots, u_k(\boldsymbol{\theta}), \ldots, u_{TR}(\boldsymbol{\theta})]^\dagger$ enables the compact writing, in matrix-vector form, of Equation (40), as follows:

$$\mathbf{w}(\boldsymbol{\theta}) \triangleq -\mathbf{d} + \mathbf{D}\boldsymbol{\theta} + \mathbf{u}(\boldsymbol{\theta}) = \mathbf{0}. \tag{43}$$

If only the first-order sensitivities of the responses with respect to the model parameters are considered (and thus the second- and higher-order sensitivities are neglected or are unavailable), the term $\mathbf{u}(\boldsymbol{\theta})$ is neglected and the remaining first-order term in Equation (43) yields the following first-order solution, denoted as $\boldsymbol{\theta}^{(1)}$, for the vector of Lagrange multipliers:

$$\boldsymbol{\theta}^{(1)} = \mathbf{D}^{-1}\mathbf{d}. \tag{44}$$

Notably, the matrix $\mathbf{D}$ has dimensions $TR \times TR$. Since $TR \ll TP$ in practical situations, it follows that the computation of the coordinates of the saddle-point $\mathbf{z}^{sp} \triangleq (\mathbf{r}^{sp}, \boldsymbol{\alpha}^{sp})$ involves the inversion of just the smallest-possible matrix (i.e., $\mathbf{D}$) that arises in the “predictive modeling” methodology. Replacing the result obtained for $\boldsymbol{\theta}^{(1)}$ in Equations (25) and (26) provides the corresponding expressions for the coordinates $\mathbf{r}^{sp}$ and $\boldsymbol{\alpha}^{sp}$ of the saddle-point $\mathbf{z}^{sp}$.
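A minimal sketch of this first-order computation, Equations (44), (25) and (26), with the illustrative matrices used in the preceding sketches; note that only the $TR \times TR$ matrix $\mathbf{D}$ is factored, never a matrix of parameter-space dimension:

```python
import numpy as np

# Illustrative quantities from the preceding sketches.
r_e     = np.array([10.0, 5.0])
alpha_0 = np.array([1.0, 2.0, 0.5])
r_c     = np.array([10.3, 4.8])        # hypothetical computed responses r^c(alpha^0)
d       = r_c - r_e                    # Equation (28)
S    = np.array([[2.0, 1.0, 1.0], [0.22, 1.11, 0.0]])
C_aa = np.diag([0.04, 0.09, 0.01])
C_rr = np.array([[0.25, 0.02], [0.02, 0.16]])
C_ra = np.zeros((2, 3))

D = C_rr + S @ C_aa @ S.T - C_ra @ S.T - S @ C_ra.T   # Equation (41)

# Equation (44): only the TR x TR matrix D is factored/solved.
theta_1 = np.linalg.solve(D, d)

# Equations (25)-(26): first-order calibrated saddle-point coordinates.
r_sp     = r_e     + (C_rr - C_ra @ S.T) @ theta_1
alpha_sp = alpha_0 + (C_ra.T - C_aa @ S.T) @ theta_1

print("theta^(1) =", theta_1)
print("r_sp =", r_sp)
print("alpha_sp =", alpha_sp)
```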

Using the result obtained for $\boldsymbol{\theta}^{(1)}$ in Equations (25) and (26) leads to the expressions already presented in the 1st-BERRU-PM methodology [14] for the saddle point $\mathbf{z}^{sp} \triangleq (\mathbf{r}^{sp}, \boldsymbol{\alpha}^{sp})$ and, subsequently, for the best-estimate (calibrated) mean values for the parameters and responses, as well as for the best-estimate covariance matrices for the corresponding best-estimate parameters and responses. These expressions will not be reproduced here, since they will also be obtained as particular cases of the 2nd-BERRU-PMD methodology to be presented in the remainder of this Section.

The goal of predictive modeling is to reduce, as much as possible, the discrepancies between the measured response values and the corresponding values predicted by the computational model. But if the nominal values of the measured responses happen, by nature, design or accident, to coincide with the corresponding computed values, the vector of deviations vanishes, i.e., $\mathbf{d} = \mathbf{0}$, in which case Equations (44), (25), (26) and (14) indicate that:

$$\mathbf{d} = \mathbf{0}; \quad \boldsymbol{\theta}^{(1)} = \mathbf{0}; \quad \mathbf{r}^{sp} = \mathbf{r}^e, \quad \boldsymbol{\alpha}^{sp} = \boldsymbol{\alpha}^0; \quad p(\mathbf{z}) = p_e(\mathbf{z} \mid \mathbf{z}^e, \mathbf{C}_e). \tag{45}$$

The results shown in Equation (45) indicate that if only first-order sensitivities are used and if the nominal values of the measured responses happen to coincide with the corresponding computed values, the model would have no impact on the initial (measured) distribution of responses and parameters, so there would be no calibration of either the predicted responses or parameters. Evidently this conclusion reveals a fundamental limitation of using just the first-order sensitivities for predictive modeling (and data assimilation and/or data adjustment methods). But this conclusion also provides the fundamental motivation for using second-order sensitivities, in addition to the natural expectation that using the second-order sensitivities will produce more accurate results, in general.

It is also important to note that if the response measurements were perfect and uncorrelated with the model parameters, i.e., if $\mathbf{C}^e_{rr} = \mathbf{0}$ and $\mathbf{C}^e_{r\alpha} = \mathbf{C}^e_{\alpha r} = \mathbf{0}$, the matrix $\mathbf{D}$ would reduce to $\mathbf{D} = \mathbf{S}\mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger$, which is nevertheless non-singular. Thus, the solution shown in Equation (44) would still exist; this is in contradistinction with the extant data assimilation schemes, which fail if the measurements are perfect. Notably, all extant data assimilation schemes consider that the response measurements and model parameters are uncorrelated.

If the second-order sensitivities, which are included in the term $\mathbf{u}(\boldsymbol{\theta})$, are retained in Equation (40), this equation becomes nonlinear in the components of $\boldsymbol{\theta} \triangleq (\theta_1, \ldots, \theta_k, \ldots, \theta_{TR})^\dagger$ and therefore requires an iterative solution method. The structure of $\mathbf{u}(\boldsymbol{\theta})$ indicates that the simplest (and probably the most efficient) method for solving Equation (43) would be Newton’s method, as can be surmised from the similarity between this equation and a simple quadratic algebraic equation. In this vein, consider the application of Newton’s method to the quadratic equation, as described below:

$$q(x) \triangleq ax^2 + bx + d = 0; \quad q'(x) = 2ax + b; \quad x^{(n+1)} = x^{(n)} - \left[q'(x^{(n)})\right]^{-1} q(x^{(n)}); \quad n = 0, 1, \ldots \tag{46}$$

If $d \neq 0$, the usual initial-guess for starting the Newton iteration is $x^{(0)} = -d/b$, which yields the first-iterate $x^{(1)} = -\left[1 + ad/(b^2 - 2ad)\right]d/b$. The correction term included in this first-iterate solution is of first order in the product $(ad)$, so the subsequent iterates become increasingly more accurate when this product is small; even the first-iterate becomes exact in the limit as $a \to 0$. On the other hand, if $d = 0$, the initial guess $x^{(0)} = -d/b = 0$ would yield solely the trivial solution $x = 0$. Therefore, if $d \cong 0$, the initial-guess for starting the Newton iteration would be $x^{(0)} = -b/a$, which would yield the non-zero first-iterate solution $x^{(1)} = -b/a + d/b$, and so on.
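The behavior just described can be checked directly; the short sketch below applies the Newton iteration of Equation (46) to an illustrative quadratic, using the two initial guesses discussed above:

```python
def newton_quadratic(a, b, d, x0, iters=8):
    """Newton iteration of Eq. (46) for q(x) = a*x**2 + b*x + d = 0."""
    x = x0
    for _ in range(iters):
        x = x - (a * x * x + b * x + d) / (2.0 * a * x + b)
    return x

a, b, d = 0.1, 2.0, 0.5
# d != 0: starting from x0 = -d/b converges to the root near zero.
print(newton_quadratic(a, b, d, x0=-d / b))
# d == 0: x0 = -d/b = 0 would only reproduce the trivial root x = 0,
# so the nontrivial root x = -b/a is reached from the initial guess x0 = -b/a.
print(newton_quadratic(a, b, 0.0, x0=-b / a))
```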

Applying Newton’s method to solve Equation (43) yields the following iterative equation for finding its solution(s) $\boldsymbol{\theta} \triangleq (\theta_1, \ldots, \theta_k, \ldots, \theta_{TR})^\dagger$:

$$\boldsymbol{\theta}^{(n+1)} = \boldsymbol{\theta}^{(n)} - \left[\mathbf{A}(\boldsymbol{\theta}^{(n)})\right]^{-1}\mathbf{w}(\boldsymbol{\theta}^{(n)}); \quad n = 0, 1, \ldots, \tag{47}$$

where the Jacobian matrix $\mathbf{A}(\boldsymbol{\theta})$ is defined as usual, namely:

$$\mathbf{A}(\boldsymbol{\theta}) \triangleq \frac{\partial(w_1, \ldots, w_k, \ldots, w_{TR})}{\partial(\theta_1, \ldots, \theta_n, \ldots, \theta_{TR})} \triangleq \begin{bmatrix} \partial w_1/\partial\theta_1 & \cdots & \partial w_1/\partial\theta_{TR} \\ \vdots & \ddots & \vdots \\ \partial w_{TR}/\partial\theta_1 & \cdots & \partial w_{TR}/\partial\theta_{TR} \end{bmatrix} \triangleq \begin{bmatrix} A_{1,1}(\boldsymbol{\theta}) & \cdots & A_{1,TR}(\boldsymbol{\theta}) \\ \vdots & \ddots & \vdots \\ A_{TR,1}(\boldsymbol{\theta}) & \cdots & A_{TR,TR}(\boldsymbol{\theta}) \end{bmatrix}. \tag{48}$$

Using Equations (42) and (43) yields the following expression for the components/elements $A_{kn}(\boldsymbol{\theta})$ of the Jacobian matrix $\mathbf{A}(\boldsymbol{\theta})$:

$$A_{kn}(\boldsymbol{\theta}) = \frac{\partial w_k}{\partial\theta_n} = \frac{\partial}{\partial\theta_n}\left[\sum_{j=1}^{TR} D_{kj}\theta_j - \frac{1}{2}\boldsymbol{\theta}^\dagger\mathbf{M}^{(k)}\boldsymbol{\theta} - \sum_{i=1}^{TR}\theta_i\sum_{j=1}^{TR} M_{kj}^{(i)}\theta_j\right] = D_{kn} - \sum_{j=1}^{TR} M_{nj}^{(k)}\theta_j - \sum_{j=1}^{TR} M_{kj}^{(n)}\theta_j - \sum_{i=1}^{TR}\theta_i M_{kn}^{(i)} \triangleq D_{kn} - V_{kn}(\boldsymbol{\theta}), \tag{49}$$

where:

$$V_{kn}(\boldsymbol{\theta}) \triangleq \sum_{j=1}^{TR}\theta_j\left[M_{nj}^{(k)} + M_{kj}^{(n)} + M_{kn}^{(j)}\right], \quad \mathbf{V}(\boldsymbol{\theta}) \triangleq [V_{kn}(\boldsymbol{\theta})]_{TR \times TR}. \tag{50}$$

As indicated in Equation (49), if the initial guess $\boldsymbol{\theta}^{(0)} = \mathbf{0}$ is used, the Jacobian matrix would take on the value $\mathbf{A}(\boldsymbol{\theta} = \mathbf{0}) = \mathbf{D}$. Consequently, the resulting solution of Equation (47) would be obtained in a single iteration and would be the same as the first-order solution obtained in Equation (44), i.e., $\boldsymbol{\theta}^{(1)} = \mathbf{D}^{-1}\mathbf{d}$. Similarly, if the second-order response sensitivities with respect to the model parameters are neglected, Equation (50) indicates that $\mathbf{A}(\mathbf{M}^{(k)} = \mathbf{0}) = \mathbf{D}$.

To obtain the nontrivial solution of Equation (47) when $\mathbf{d}$ is very small or actually vanishes ($\mathbf{d} = \mathbf{0}$), a non-zero initial guess ($\boldsymbol{\theta}^{(0)} \neq \mathbf{0}$) must be used for starting the Newton iterations, as was discussed in the foregoing by analogy with the solution of a quadratic algebraic equation using Newton’s method. The nontrivial solution of Equation (47) which would be valid even when $\mathbf{d} = \mathbf{0}$ will be denoted as $\boldsymbol{\theta}^{(h,n+1)}$, where the superscript “h” indicates that the expression of $\boldsymbol{\theta}^{(h,n+1)}$ will include higher-order terms, which will involve the second-order sensitivities of computed responses with respect to model parameters. The corresponding Newton iteration indicated by Equation (47) will thus take on the following form for finding the nontrivial root of Equation (43) when $\mathbf{d}$ is very small or actually vanishes:

$$\boldsymbol{\theta}^{(h,n+1)} = \boldsymbol{\theta}^{(h,n)} - \left[\mathbf{D} - \mathbf{V}(\boldsymbol{\theta}^{(h,n)})\right]^{-1}\mathbf{w}(\boldsymbol{\theta}^{(h,n)}); \quad n = 0, 1, \ldots \tag{51}$$

As has been discussed in the foregoing in conjunction with the algebraic quadratic equation, it would be advantageous to start the Newton iteration shown in Equation (51) with an initial guess $\boldsymbol{\theta}^{(h,0)}$ having components analogous to the initial guess $x^{(0)} = -b/a$ for the quadratic equation. It would be further advantageous if the computation of the initial guess $\boldsymbol{\theta}^{(h,0)}$ did not require the inversion of matrices. An expression for $\boldsymbol{\theta}^{(h,0)}$ which satisfies these considerations can be derived by setting $\mathbf{d} = \mathbf{0}$ in Equation (43) and by considering just the diagonal terms of the matrices involved in this equation, thus obtaining:

$$\boldsymbol{\theta}^{(h,0)} \triangleq \left(\theta_1^{(h,0)}, \ldots, \theta_j^{(h,0)}, \ldots, \theta_{TR}^{(h,0)}\right)^\dagger; \quad \theta_j^{(h,0)} = \frac{2D_{jj}}{3M_{jj}^{(j)}}; \quad j = 1, \ldots, TR. \tag{52}$$

Using the initial guess provided in Equation (52) in Equation (50) yields the following starting expressions for the components of the Jacobian matrix $\mathbf{A}(\boldsymbol{\theta})$:

$$A_{kn}(\boldsymbol{\theta}^{(h,0)}) = D_{kn} - \sum_{j=1}^{TR}\theta_j^{(h,0)}\left[M_{nj}^{(k)} + M_{kj}^{(n)} + M_{kn}^{(j)}\right]. \tag{53}$$

Furthermore, using the initial guess provided in Equation (52) in Equation (43) yields the following starting expressions for the components $w_k(\boldsymbol{\theta})$ of the vector $\mathbf{w}(\boldsymbol{\theta}) \triangleq [w_1(\boldsymbol{\theta}), \ldots, w_k(\boldsymbol{\theta}), \ldots, w_{TR}(\boldsymbol{\theta})]^\dagger$:

$$w_k(\boldsymbol{\theta}^{(h,0)}) = u_k(\boldsymbol{\theta}^{(h,0)}) + \sum_{j=1}^{TR} D_{kj}\theta_j^{(h,0)} - d_k, \quad k = 1, \ldots, TR; \tag{54}$$

$$u_k(\boldsymbol{\theta}^{(h,0)}) = -\frac{1}{2}\sum_{i=1}^{TR}\sum_{j=1}^{TR}\theta_i^{(h,0)} M_{ij}^{(k)}\theta_j^{(h,0)} - \sum_{i=1}^{TR}\theta_i^{(h,0)}\sum_{j=1}^{TR} M_{kj}^{(i)}\theta_j^{(h,0)} = -\frac{4}{9}\sum_{i=1}^{TR}\sum_{j=1}^{TR}\left(\frac{D_{ii}}{M_{ii}^{(i)}}\right)\left(\frac{D_{jj}}{M_{jj}^{(j)}}\right)\left[\frac{1}{2}M_{ij}^{(k)} + M_{kj}^{(i)}\right], \quad k = 1, \ldots, TR. \tag{55}$$

The impact of the second-order sensitivities through the components $M_{ij}^{(k)}$ of the matrices $\mathbf{M}^{(k)}$ is evident in Equations (53)-(55). The first-iterate $\boldsymbol{\theta}^{(h,1)}$ of the solution of Equation (51) has the following expression:

$$\boldsymbol{\theta}^{(h,1)} = \boldsymbol{\theta}^{(h,0)} - \left(\mathbf{D} - \mathbf{V}^{(h,0)}\right)^{-1}\left(\mathbf{u}^{(h,0)} + \mathbf{D}\boldsymbol{\theta}^{(h,0)} - \mathbf{d}\right); \quad \mathbf{V}^{(h,0)} \triangleq \mathbf{V}(\boldsymbol{\theta}^{(h,0)}); \quad \mathbf{u}^{(h,0)} \triangleq \mathbf{u}(\boldsymbol{\theta}^{(h,0)}). \tag{56}$$

It is evident from the expressions provided in Equations (53)-(55) that even if $\mathbf{d} = \mathbf{0}$ (i.e., if the vector of deviations between the nominally measured and computed values of the responses vanishes), the starting-values of the components $w_k(\boldsymbol{\theta}^{(h,0)})$ are non-zero because of the impact of the second-order response sensitivities. In addition, the initial Jacobian matrix $\mathbf{A}(\boldsymbol{\theta}^{(h,0)})$ is nonsingular (also because of the impact of the second-order response sensitivities), so the first-iterate $\boldsymbol{\theta}^{(h,1)}$ has a finite non-zero value. All subsequent iterates will also have non-zero values. Thus, using the second-order response sensitivities overcomes the limitation of the first-order predictive modeling methodology 1st-BERRU-PM, as well as the limitation of extant data adjustment and data assimilation procedures, which all fail when the measured response values coincide with the computed response values.
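The complete iteration of Equations (42), (43) and (50)-(52) is compact in matrix form. The sketch below is one possible implementation, exercised on an illustrative $TR = 2$ problem with $\mathbf{d} = \mathbf{0}$; the matrices $\mathbf{D}$ and $\mathbf{M}^{(k)}$ are hypothetical stand-ins for Equations (41) and (38), chosen so that a non-trivial root exists near the initial guess of Equation (52).

```python
import numpy as np

def solve_lagrange(D, M, d, iters=20):
    """Newton iteration of Eq. (51) for the Lagrange multipliers theta.

    D : (TR, TR) covariance matrix of deviations, Eq. (41).
    M : (TR, TR, TR) array with M[k] = M^(k) of Eq. (38) (each symmetric).
    d : (TR,) deviation vector of Eq. (28).
    """
    TR = len(d)
    # Equation (52): non-zero initial guess (usable even when d = 0).
    theta = np.array([2.0 * D[j, j] / (3.0 * M[j, j, j]) for j in range(TR)])
    for _ in range(iters):
        # Equation (42): u_k = -(1/2) theta' M^(k) theta - sum_i theta_i (M^(i) theta)_k.
        u = (-0.5 * np.einsum("i,kij,j->k", theta, M, theta)
             - np.einsum("i,ikj,j->k", theta, M, theta))
        w = -d + D @ theta + u                      # Equation (43)
        # Equation (50): V_kn = sum_j theta_j (M^(k)_nj + M^(n)_kj + M^(j)_kn).
        V = (np.einsum("j,knj->kn", theta, M)
             + np.einsum("j,nkj->kn", theta, M)
             + np.einsum("j,jkn->kn", theta, M))
        theta = theta - np.linalg.solve(D - V, w)   # Equation (51)
    return theta

# Illustrative TR = 2 problem with d = 0 (measured = computed nominal responses).
D = np.array([[0.5, 0.02], [0.02, 0.3]])
M = np.array([[[0.05, 0.005], [0.005, 0.01]],
              [[0.01, 0.005], [0.005, 0.04]]])
print("nontrivial theta =", solve_lagrange(D, M, d=np.zeros(2)))
```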

Replacing the solution of Equation (51) into Equations (25) and (26) yields the following expressions for the coordinates of the saddle point $\mathbf{z}^{sp} \triangleq (\mathbf{r}^{sp}, \boldsymbol{\alpha}^{sp})$ of the function $h(\mathbf{r}, \boldsymbol{\alpha}; \boldsymbol{\theta})$:

$$\mathbf{r}^{sp} = \mathbf{r}^e + (\mathbf{C}^e_{rr} - \mathbf{C}^e_{r\alpha}\mathbf{S}^\dagger)\boldsymbol{\theta}^{(h,n+1)}, \tag{57}$$

$$\boldsymbol{\alpha}^{sp} = \boldsymbol{\alpha}^0 + (\mathbf{C}^e_{\alpha r} - \mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger)\boldsymbol{\theta}^{(h,n+1)}. \tag{58}$$

The best-estimate (optimal) mean values (first-order moments) and correlations (second-order moments) of the posterior distribution of the combined computational model with measured responses and parameters are obtained by using the saddle-point method to evaluate the integrals that define these first- and second-order moments. These best-estimate values will be denoted using the superscript “bed” to indicate that these “best-estimate” values arise from deterministically incorporated/assimilated computed responses.

Thus, the predicted “best-estimated values with deterministically incorporated computed responses” for the predicted mean values of responses and parameters, respectively, have the following expressions:

$$\mathbf{r}^{bed} \triangleq \int_{D_z}\mathbf{r}\, p(\mathbf{z})\, d\mathbf{z} = \mathbf{r}^{sp} = \mathbf{r}^e + (\mathbf{C}^e_{rr} - \mathbf{C}^e_{r\alpha}\mathbf{S}^\dagger)\boldsymbol{\theta}^{(h,n+1)}; \tag{59}$$

$$\boldsymbol{\alpha}^{bed} \triangleq \int_{D_z}\boldsymbol{\alpha}\, p(\mathbf{z})\, d\mathbf{z} = \boldsymbol{\alpha}^{sp} = \boldsymbol{\alpha}^0 + (\mathbf{C}^e_{\alpha r} - \mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger)\boldsymbol{\theta}^{(h,n+1)}. \tag{60}$$

The expression of the “best-estimate with deterministically incorporated computed responses” covariance matrix for the responses, denoted as $\mathbf{C}^{bed}_{rr}$, is obtained from its definition in conjunction with Equation (59), which yields:

$$\mathbf{C}^{bed}_{rr} \triangleq \int_{D_z}(\mathbf{r} - \mathbf{r}^{bed})(\mathbf{r} - \mathbf{r}^{bed})^\dagger p(\mathbf{z})\, d\mathbf{z} = \int_{D_z}\left[\mathbf{r} - \mathbf{r}^e - (\mathbf{C}^e_{rr} - \mathbf{C}^e_{r\alpha}\mathbf{S}^\dagger)\boldsymbol{\theta}^{(h,n+1)}\right]\left[\mathbf{r} - \mathbf{r}^e - (\mathbf{C}^e_{rr} - \mathbf{C}^e_{r\alpha}\mathbf{S}^\dagger)\boldsymbol{\theta}^{(h,n+1)}\right]^\dagger p(\mathbf{z})\, d\mathbf{z}. \tag{61}$$

The expression of the “best-estimate with deterministically incorporated computed responses” covariance matrix for the parameters, denoted as $\mathbf{C}^{bed}_{\alpha\alpha}$, is obtained from its definition in conjunction with Equation (60), which yields:

$$\mathbf{C}^{bed}_{\alpha\alpha} \triangleq \int_{D_z}(\boldsymbol{\alpha} - \boldsymbol{\alpha}^{bed})(\boldsymbol{\alpha} - \boldsymbol{\alpha}^{bed})^\dagger p(\mathbf{z})\, d\mathbf{z} = \int_{D_z}\left[\boldsymbol{\alpha} - \boldsymbol{\alpha}^0 - (\mathbf{C}^e_{\alpha r} - \mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger)\boldsymbol{\theta}^{(h,n+1)}\right]\left[\boldsymbol{\alpha} - \boldsymbol{\alpha}^0 - (\mathbf{C}^e_{\alpha r} - \mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger)\boldsymbol{\theta}^{(h,n+1)}\right]^\dagger p(\mathbf{z})\, d\mathbf{z}. \tag{62}$$

The expression of the “best-estimate with deterministically incorporated computed responses” correlation matrix among parameters and responses, denoted as $\mathbf{C}^{bed}_{\alpha r}$, is obtained by using its definition in conjunction with Equations (59) and (60), which yields:

$$\mathbf{C}^{bed}_{\alpha r} \triangleq \int_{D_z}(\boldsymbol{\alpha} - \boldsymbol{\alpha}^{bed})(\mathbf{r} - \mathbf{r}^{bed})^\dagger p(\mathbf{z})\, d\mathbf{z} = \int_{D_z}\left[\boldsymbol{\alpha} - \boldsymbol{\alpha}^0 - (\mathbf{C}^e_{\alpha r} - \mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger)\boldsymbol{\theta}^{(h,n+1)}\right]\left[\mathbf{r} - \mathbf{r}^e - (\mathbf{C}^e_{rr} - \mathbf{C}^e_{r\alpha}\mathbf{S}^\dagger)\boldsymbol{\theta}^{(h,n+1)}\right]^\dagger p(\mathbf{z})\, d\mathbf{z}. \tag{63}$$

The impact of the second-order response sensitivities on the mean values and correlation/covariance matrices can be assessed already by using the first-iterate solution, $\boldsymbol{\theta}^{(h,1)}$, in Equations (59)-(63). Using the first-iterate, $\boldsymbol{\theta}^{(h,1)}$, in Equations (59) and (60), respectively, yields the following expressions for the “first-iterate best-estimate” values for the responses and parameters, respectively:

$$\mathbf{r}^{bed}_{(1)} = \mathbf{r}^e + (\mathbf{C}^e_{rr} - \mathbf{C}^e_{r\alpha}\mathbf{S}^\dagger)\left[\boldsymbol{\theta}^{(h,0)} - (\mathbf{D} - \mathbf{V}^{(h,0)})^{-1}(\mathbf{u}^{(h,0)} + \mathbf{D}\boldsymbol{\theta}^{(h,0)} - \mathbf{d})\right]; \tag{64}$$

$$\boldsymbol{\alpha}^{bed}_{(1)} = \boldsymbol{\alpha}^0 + (\mathbf{C}^e_{\alpha r} - \mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger)\left[\boldsymbol{\theta}^{(h,0)} - (\mathbf{D} - \mathbf{V}^{(h,0)})^{-1}(\mathbf{u}^{(h,0)} + \mathbf{D}\boldsymbol{\theta}^{(h,0)} - \mathbf{d})\right]. \tag{65}$$

As the expressions obtained in Equations (64) and (65) indicate, combining experimental and computational information calibrates the predicted mean values of both the responses and the parameters through couplings introduced by the sensitivities of the participating responses. The expressions obtained in Equations (64) and (65) possess the following new features by comparison to their counterparts provided by extant data assimilation and data adjustment procedures:

1) If the second-order sensitivities are neglected, then the expressions in (64) and (65) take on the following simplified forms:

$$\text{no 2nd-order sensitivities:} \quad \mathbf{r}^{bed}_{(1)} = \mathbf{r}^e + (\mathbf{C}^e_{rr} - \mathbf{C}^e_{r\alpha}\mathbf{S}^\dagger)\mathbf{D}^{-1}\mathbf{d}; \tag{66}$$

$$\text{no 2nd-order sensitivities:} \quad \boldsymbol{\alpha}^{bed}_{(1)} = \boldsymbol{\alpha}^0 + (\mathbf{C}^e_{\alpha r} - \mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger)\mathbf{D}^{-1}\mathbf{d}. \tag{67}$$

As expected, the expressions in Equations (66) and (67) are identical to the expressions provided by the 1st-BERRU-PM [14] methodology and therefore possess the same strengths (easy to implement) and weaknesses (trivial results, i.e., no calibration, if $\mathbf{d} = \mathbf{0}$) as the results produced by the 1st-BERRU-PM methodology.

2) If the vector of deviations between the nominally measured and computed values of the responses vanishes, i.e., if $\mathbf{d} = \mathbf{0}$, the expressions provided in Equations (64) and (65) take on the following simplified forms:

$$\text{if } \mathbf{d} = \mathbf{0}: \quad \mathbf{r}^{bed}_{(1)}(\mathbf{d} = \mathbf{0}) = \mathbf{r}^e + (\mathbf{C}^e_{rr} - \mathbf{C}^e_{r\alpha}\mathbf{S}^\dagger)\left[\boldsymbol{\theta}^{(h,0)} - (\mathbf{D} - \mathbf{V}^{(h,0)})^{-1}(\mathbf{u}^{(h,0)} + \mathbf{D}\boldsymbol{\theta}^{(h,0)})\right] \neq \mathbf{r}^e; \tag{68}$$

$$\text{if } \mathbf{d} = \mathbf{0}: \quad \boldsymbol{\alpha}^{bed}_{(1)}(\mathbf{d} = \mathbf{0}) = \boldsymbol{\alpha}^0 + (\mathbf{C}^e_{\alpha r} - \mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger)\left[\boldsymbol{\theta}^{(h,0)} - (\mathbf{D} - \mathbf{V}^{(h,0)})^{-1}(\mathbf{u}^{(h,0)} + \mathbf{D}\boldsymbol{\theta}^{(h,0)})\right] \neq \boldsymbol{\alpha}^0. \tag{69}$$

As evidenced by comparing Equations (68) and (69) with Equations (66) and (67), respectively, the contributions of the second-order sensitivities make the fundamental difference when $\mathbf{d} = \mathbf{0}$: the data adjustment, data assimilation and 1st-BERRU-PM methodologies produce no calibration [i.e., $\mathbf{r}^{bed}_{(1)}(\mathbf{d} = \mathbf{0}) = \mathbf{r}^e$ and $\boldsymbol{\alpha}^{bed}_{(1)}(\mathbf{d} = \mathbf{0}) = \boldsymbol{\alpha}^0$], cf. Equations (66) and (67), whereas the 2nd-BERRU-PMD expressions obtained in Equations (68) and (69) provide the calibration $\mathbf{r}^{bed}_{(1)}(\mathbf{d} = \mathbf{0}) \neq \mathbf{r}^e$ and $\boldsymbol{\alpha}^{bed}_{(1)}(\mathbf{d} = \mathbf{0}) \neq \boldsymbol{\alpha}^0$.

3) The components of the matrix $\mathbf{V}^{(h,0)}$ are usually smaller (since they contain higher-order quantities) than the corresponding components of the matrix $\mathbf{D}$, so the contributions provided by the term $\mathbf{V}^{(h,0)}$ can be considered to be “corrections” to the components of the matrix $\mathbf{D}$. If the contributions of the matrix $\mathbf{V}^{(h,0)}$ are neglected by comparison to the corresponding components of the matrix $\mathbf{D}$, the expressions provided in Equations (64) and (65) reduce to the following expressions:

$$\text{if } \mathbf{V}^{(h,0)} = \mathbf{0}: \quad \mathbf{r}^{bed}_{(1)}(\mathbf{V}^{(h,0)} = \mathbf{0}) = \mathbf{r}^e + (\mathbf{C}^e_{rr} - \mathbf{C}^e_{r\alpha}\mathbf{S}^\dagger)\mathbf{D}^{-1}(\mathbf{d} - \mathbf{u}^{(h,0)}); \tag{70}$$

$$\text{if } \mathbf{V}^{(h,0)} = \mathbf{0}: \quad \boldsymbol{\alpha}^{bed}_{(1)}(\mathbf{V}^{(h,0)} = \mathbf{0}) = \boldsymbol{\alpha}^0 + (\mathbf{C}^e_{\alpha r} - \mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger)\mathbf{D}^{-1}(\mathbf{d} - \mathbf{u}^{(h,0)}). \tag{71}$$

The above expressions underscore, through the term $\mathbf{u}^{(h,0)}$, the fundamental importance of including the second-order sensitivities, since they provide the basic contribution to maintaining the effects of calibration for both responses and parameters, even when $\mathbf{d} = \mathbf{0}$, i.e., when the values of the measured responses coincide with the corresponding values of the computed responses. This essential contribution stemming from the second-order response sensitivities is clearly highlighted by comparing the expressions obtained in Equations (70) and (71) with the corresponding first-order (1st-BERRU-PM) expressions provided in Equations (66) and (67), respectively.
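To make the preceding point concrete, the sketch below evaluates Equations (70) and (71) for an illustrative case with $\mathbf{d} = \mathbf{0}$, reusing the hypothetical matrices of the earlier sketches (the $\mathbf{M}^{(k)}$ are stand-ins for Equation (38)); the calibrated means differ from $\mathbf{r}^e$ and $\boldsymbol{\alpha}^0$ solely because of the second-order term $\mathbf{u}^{(h,0)}$.

```python
import numpy as np

# Illustrative quantities consistent with the earlier sketches.
r_e, alpha_0 = np.array([10.0, 5.0]), np.array([1.0, 2.0, 0.5])
S    = np.array([[2.0, 1.0, 1.0], [0.22, 1.11, 0.0]])
C_aa = np.diag([0.04, 0.09, 0.01])
C_rr = np.array([[0.25, 0.02], [0.02, 0.16]])
C_ra = np.zeros((2, 3))
D = C_rr + S @ C_aa @ S.T - C_ra @ S.T - S @ C_ra.T    # Equation (41)
M = np.array([[[0.05, 0.005], [0.005, 0.01]],          # hypothetical M^(k), Eq. (38)
              [[0.01, 0.005], [0.005, 0.04]]])
d = np.zeros(2)  # measured responses coincide with the computed responses

theta0 = np.array([2 * D[j, j] / (3 * M[j, j, j]) for j in range(2)])  # Eq. (52)
u0 = (-0.5 * np.einsum("i,kij,j->k", theta0, M, theta0)
      - np.einsum("i,ikj,j->k", theta0, M, theta0))                    # Eq. (42)

# Equations (70)-(71): with V^(h,0) neglected, the term u0 alone keeps the
# calibration non-trivial even though d = 0.
corr = np.linalg.solve(D, d - u0)
r_bed     = r_e     + (C_rr - C_ra @ S.T) @ corr
alpha_bed = alpha_0 + (C_ra.T - C_aa @ S.T) @ corr
print(r_bed, alpha_bed)
```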

Using the first-iterate, $\boldsymbol{\theta}^{(h,1)}$, in Equation (61) yields the following expression for the “first-iterate best-estimate” covariance matrix $\mathbf{C}^{bed}_{rr(1)}$ for the responses:

$$\mathbf{C}^{bed}_{rr(1)} = \int_{D_z}\left\{\mathbf{r} - \mathbf{r}^e - (\mathbf{C}^e_{rr} - \mathbf{C}^e_{r\alpha}\mathbf{S}^\dagger)\left[\boldsymbol{\theta}^{(h,0)} - (\mathbf{D} - \mathbf{V}^{(h,0)})^{-1}(\mathbf{u}^{(h,0)} + \mathbf{D}\boldsymbol{\theta}^{(h,0)} - \mathbf{d})\right]\right\}\left\{\mathbf{r} - \mathbf{r}^e - (\mathbf{C}^e_{rr} - \mathbf{C}^e_{r\alpha}\mathbf{S}^\dagger)\left[\boldsymbol{\theta}^{(h,0)} - (\mathbf{D} - \mathbf{V}^{(h,0)})^{-1}(\mathbf{u}^{(h,0)} + \mathbf{D}\boldsymbol{\theta}^{(h,0)} - \mathbf{d})\right]\right\}^\dagger p(\mathbf{z})\, d\mathbf{z} = \left\langle\left[\mathbf{r} - \mathbf{r}^e + \mathbf{X}_r\boldsymbol{\theta}^{(h,0)} - \mathbf{Y}_r\mathbf{d} + \mathbf{Y}_r\mathbf{u}^{(h,0)}\right]\left[\mathbf{r} - \mathbf{r}^e + \mathbf{X}_r\boldsymbol{\theta}^{(h,0)} - \mathbf{Y}_r\mathbf{d} + \mathbf{Y}_r\mathbf{u}^{(h,0)}\right]^\dagger\right\rangle. \tag{72}$$

The following definitions were used in Equation (72):

$$\langle\,\cdot\,\rangle \triangleq \int_{D_z}(\,\cdot\,)\, p(\mathbf{z})\, d\mathbf{z}, \tag{73}$$

$$\mathbf{C}^{(1)}_{rd} \triangleq \left\langle(\mathbf{r} - \mathbf{r}^e)\mathbf{d}^\dagger\right\rangle = \left\langle(\mathbf{r} - \mathbf{r}^e)\left[\mathbf{r} - \mathbf{r}^e - \mathbf{S}(\boldsymbol{\alpha} - \boldsymbol{\alpha}^0) - \mathbf{p}(\boldsymbol{\alpha}; \boldsymbol{\alpha}^0)\right]^\dagger\right\rangle \cong \mathbf{C}^e_{rr} - \mathbf{C}^e_{r\alpha}\mathbf{S}^\dagger; \tag{74}$$

$$\mathbf{X}_r \triangleq \mathbf{C}^{(1)}_{rd}\left[(\mathbf{D} - \mathbf{V}^{(h,0)})^{-1}\mathbf{D} - \mathbf{I}\right]; \tag{75}$$

$$\mathbf{Y}_r \triangleq \mathbf{C}^{(1)}_{rd}(\mathbf{D} - \mathbf{V}^{(h,0)})^{-1}. \tag{76}$$

Expanding the integrand in the second equality in Equation (72) yields the following expression for the “first-iterate best-estimate” response covariance matrix $\mathbf{C}^{bed}_{rr(1)}$:

$$\mathbf{C}^{bed}_{rr(1)} = \mathbf{C}^e_{rr} - \mathbf{C}^{(1)}_{rd}\mathbf{Y}_r^\dagger + \mathbf{X}_r\boldsymbol{\theta}^{(h,0)}(\boldsymbol{\theta}^{(h,0)})^\dagger\mathbf{X}_r^\dagger - \mathbf{X}_r\boldsymbol{\theta}^{(h,0)}\bar{\mathbf{d}}^\dagger\mathbf{Y}_r^\dagger + \mathbf{X}_r\boldsymbol{\theta}^{(h,0)}(\mathbf{u}^{(h,0)})^\dagger\mathbf{Y}_r^\dagger - \mathbf{Y}_r(\mathbf{C}^{(1)}_{rd})^\dagger - \mathbf{Y}_r\bar{\mathbf{d}}(\boldsymbol{\theta}^{(h,0)})^\dagger\mathbf{X}_r^\dagger + \mathbf{Y}_r\mathbf{D}\mathbf{Y}_r^\dagger - \mathbf{Y}_r\bar{\mathbf{d}}(\mathbf{u}^{(h,0)})^\dagger\mathbf{Y}_r^\dagger + \mathbf{Y}_r\mathbf{u}^{(h,0)}(\boldsymbol{\theta}^{(h,0)})^\dagger\mathbf{X}_r^\dagger - \mathbf{Y}_r\mathbf{u}^{(h,0)}\bar{\mathbf{d}}^\dagger\mathbf{Y}_r^\dagger + \mathbf{Y}_r\mathbf{u}^{(h,0)}(\mathbf{u}^{(h,0)})^\dagger\mathbf{Y}_r^\dagger. \tag{77}$$

Using the first-iterate, $\boldsymbol{\theta}^{(h,1)}$, in Equation (62) and evaluating the resulting integrals yields the following expression for the “first-iterate best-estimate” covariance matrix $\mathbf{C}^{bed}_{\alpha\alpha(1)}$ for the model parameters:

$$\mathbf{C}^{bed}_{\alpha\alpha(1)} = \left\langle\left[\boldsymbol{\alpha} - \boldsymbol{\alpha}^0 + \mathbf{X}_\alpha\boldsymbol{\theta}^{(h,0)} - \mathbf{Y}_\alpha\mathbf{d} + \mathbf{Y}_\alpha\mathbf{u}^{(h,0)}\right]\left[\boldsymbol{\alpha} - \boldsymbol{\alpha}^0 + \mathbf{X}_\alpha\boldsymbol{\theta}^{(h,0)} - \mathbf{Y}_\alpha\mathbf{d} + \mathbf{Y}_\alpha\mathbf{u}^{(h,0)}\right]^\dagger\right\rangle = \mathbf{C}_{\alpha\alpha} - \mathbf{C}^{(1)}_{\alpha d}\mathbf{Y}_\alpha^\dagger + \mathbf{X}_\alpha\boldsymbol{\theta}^{(h,0)}(\boldsymbol{\theta}^{(h,0)})^\dagger\mathbf{X}_\alpha^\dagger - \mathbf{X}_\alpha\boldsymbol{\theta}^{(h,0)}\bar{\mathbf{d}}^\dagger\mathbf{Y}_\alpha^\dagger + \mathbf{X}_\alpha\boldsymbol{\theta}^{(h,0)}(\mathbf{u}^{(h,0)})^\dagger\mathbf{Y}_\alpha^\dagger - \mathbf{Y}_\alpha(\mathbf{C}^{(1)}_{\alpha d})^\dagger - \mathbf{Y}_\alpha\bar{\mathbf{d}}(\boldsymbol{\theta}^{(h,0)})^\dagger\mathbf{X}_\alpha^\dagger + \mathbf{Y}_\alpha\mathbf{D}\mathbf{Y}_\alpha^\dagger - \mathbf{Y}_\alpha\bar{\mathbf{d}}(\mathbf{u}^{(h,0)})^\dagger\mathbf{Y}_\alpha^\dagger + \mathbf{Y}_\alpha\mathbf{u}^{(h,0)}(\boldsymbol{\theta}^{(h,0)})^\dagger\mathbf{X}_\alpha^\dagger - \mathbf{Y}_\alpha\mathbf{u}^{(h,0)}\bar{\mathbf{d}}^\dagger\mathbf{Y}_\alpha^\dagger + \mathbf{Y}_\alpha\mathbf{u}^{(h,0)}(\mathbf{u}^{(h,0)})^\dagger\mathbf{Y}_\alpha^\dagger. \tag{78}$$

The following definitions were used in Equation (78):

$$\mathbf{C}^{(1)}_{\alpha d} \triangleq \left\langle(\boldsymbol{\alpha} - \boldsymbol{\alpha}^0)\mathbf{d}^\dagger\right\rangle = \left\langle(\boldsymbol{\alpha} - \boldsymbol{\alpha}^0)\left[\mathbf{r} - \mathbf{r}^e - \mathbf{S}(\boldsymbol{\alpha} - \boldsymbol{\alpha}^0) - \mathbf{p}(\boldsymbol{\alpha}; \boldsymbol{\alpha}^0)\right]^\dagger\right\rangle \cong \mathbf{C}^e_{\alpha r} - \mathbf{C}_{\alpha\alpha}\mathbf{S}^\dagger; \tag{79}$$

$$\mathbf{X}_\alpha \triangleq \mathbf{C}^{(1)}_{\alpha d}\left[(\mathbf{D} - \mathbf{V}^{(h,0)})^{-1}\mathbf{D} - \mathbf{I}\right]; \tag{80}$$

$$\mathbf{Y}_\alpha \triangleq \mathbf{C}^{(1)}_{\alpha d}(\mathbf{D} - \mathbf{V}^{(h,0)})^{-1}. \tag{81}$$

Using the first-iterate, $\boldsymbol{\theta}^{(h,1)}$, in Equation (63) and evaluating the resulting integrals yields the following expression for the “first-iterate best-estimate” correlation matrix $\mathbf{C}^{bed}_{\alpha r(1)}$ for the parameters and responses:

$$\mathbf{C}^{bed}_{\alpha r(1)} = \left\langle\left[\boldsymbol{\alpha} - \boldsymbol{\alpha}^0 + \mathbf{X}_\alpha\boldsymbol{\theta}^{(h,0)} - \mathbf{Y}_\alpha\mathbf{d} + \mathbf{Y}_\alpha\mathbf{u}^{(h,0)}\right]\left[\mathbf{r} - \mathbf{r}^e + \mathbf{X}_r\boldsymbol{\theta}^{(h,0)} - \mathbf{Y}_r\mathbf{d} + \mathbf{Y}_r\mathbf{u}^{(h,0)}\right]^\dagger\right\rangle = \mathbf{C}^e_{\alpha r} - \mathbf{C}^{(1)}_{\alpha d}\mathbf{Y}_r^\dagger + \mathbf{X}_\alpha\boldsymbol{\theta}^{(h,0)}(\boldsymbol{\theta}^{(h,0)})^\dagger\mathbf{X}_r^\dagger - \mathbf{X}_\alpha\boldsymbol{\theta}^{(h,0)}\bar{\mathbf{d}}^\dagger\mathbf{Y}_r^\dagger + \mathbf{X}_\alpha\boldsymbol{\theta}^{(h,0)}(\mathbf{u}^{(h,0)})^\dagger\mathbf{Y}_r^\dagger - \mathbf{Y}_\alpha(\mathbf{C}^{(1)}_{rd})^\dagger - \mathbf{Y}_\alpha\bar{\mathbf{d}}(\boldsymbol{\theta}^{(h,0)})^\dagger\mathbf{X}_r^\dagger + \mathbf{Y}_\alpha\mathbf{D}\mathbf{Y}_r^\dagger - \mathbf{Y}_\alpha\bar{\mathbf{d}}(\mathbf{u}^{(h,0)})^\dagger\mathbf{Y}_r^\dagger + \mathbf{Y}_\alpha\mathbf{u}^{(h,0)}(\boldsymbol{\theta}^{(h,0)})^\dagger\mathbf{X}_r^\dagger - \mathbf{Y}_\alpha\mathbf{u}^{(h,0)}\bar{\mathbf{d}}^\dagger\mathbf{Y}_r^\dagger + \mathbf{Y}_\alpha\mathbf{u}^{(h,0)}(\mathbf{u}^{(h,0)})^\dagger\mathbf{Y}_r^\dagger. \tag{82}$$

As the expressions obtained in Equations (77), (78) and (82) indicate, combining experimental and computational information modifies the correlations among the parameters through couplings introduced by the sensitivities of the participating responses.

The expressions obtained in Equations (77), (78) and (82) for $\mathbf{C}_{rr}^{(1),be,d}$, $\mathbf{C}_{\alpha\alpha}^{(1),be,d}$ and $\mathbf{C}_{\alpha r}^{(1),be,d}$, respectively, will now be analyzed by considering the same simplifying conditions as were considered for the best-estimate response and parameter mean values $\mathbf{r}^{(1),be,d}$ and $\boldsymbol{\alpha}^{(1),be,d}$, respectively, as follows:

1) If all of the second-order sensitivities are neglected, then the following simplifications occur: $\mathbf{X}_r = \mathbf{X}_\alpha = 0$, $\mathbf{Y}_r = \mathbf{C}_{rd}^{(1)} \mathbf{D}^{-1}$ and $\mathbf{Y}_\alpha = \mathbf{C}_{\alpha d}^{(1)} \mathbf{D}^{-1}$. Therefore, the expressions in Equations (77), (78) and (82) take on the following simplified forms:

$$ \text{no 2nd-order sensitivities:} \quad \mathbf{C}_{rr}^{(1),be,d} = \mathbf{C}_{rr}^e - \mathbf{C}_{rd}^{(1)} \mathbf{D}^{-1} \left( \mathbf{C}_{rd}^{(1)} \right)^\dagger . \tag{83} $$

$$ \text{no 2nd-order sensitivities:} \quad \mathbf{C}_{\alpha\alpha}^{(1),be,d} = \mathbf{C}_{\alpha\alpha}^e - \mathbf{C}_{\alpha d}^{(1)} \mathbf{D}^{-1} \left( \mathbf{C}_{\alpha d}^{(1)} \right)^\dagger . \tag{84} $$

$$ \text{no 2nd-order sensitivities:} \quad \mathbf{C}_{\alpha r}^{(1),be,d} = \mathbf{C}_{\alpha r}^e - \mathbf{C}_{\alpha d}^{(1)} \mathbf{D}^{-1} \left( \mathbf{C}_{rd}^{(1)} \right)^\dagger . \tag{85} $$

As expected, the expressions obtained in Equations (83)-(85) are identical to the expressions provided by the first-order predictive modeling methodology 1st-BERRU-PM [14] and therefore possess the same strengths (ease of implementation) and weaknesses (i.e., trivial results and no calibration if $\mathbf{d} = 0$) as 1st-BERRU-PM.
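For orientation, the first-order limit in Equations (83)-(85) can be exercised numerically. The sketch below uses arbitrary test data (the sensitivity matrix $\mathbf{S}$ and the prior covariance blocks are randomly generated assumptions, not data from the paper), reads $\mathbf{D}$ as the second moment of the zero-mean linearized deviation $\mathbf{d} = (\mathbf{r} - \mathbf{r}^e) - \mathbf{S}(\boldsymbol{\alpha} - \boldsymbol{\alpha}^0)$ appearing in Equations (74) and (79), and confirms that the calibrated covariances are reduced, in the positive-semidefinite sense, relative to the priors:

```python
# Sketch of the first-order limit, Eqs. (83)-(85), with arbitrary test data.
import numpy as np

rng = np.random.default_rng(0)
TR, TP = 2, 4                                  # test numbers of responses/parameters
S = rng.normal(size=(TR, TP))                  # first-order sensitivities (assumed)

A = rng.normal(size=(TR + TP, TR + TP))
J = A @ A.T                                    # positive-definite joint prior covariance
Crr_e, Cra_e, Caa_e = J[:TR, :TR], J[:TR, TR:], J[TR:, TR:]

Crd = Crr_e - Cra_e @ S.T                      # Eq. (74)
Cad = Cra_e.T - Caa_e @ S.T                    # Eq. (79)
D = Crr_e - Cra_e @ S.T - S @ Cra_e.T + S @ Caa_e @ S.T   # <d d^T>
Dinv = np.linalg.inv(D)

Crr_be = Crr_e - Crd @ Dinv @ Crd.T            # Eq. (83)
Caa_be = Caa_e - Cad @ Dinv @ Cad.T            # Eq. (84)
Car_be = Cra_e.T - Cad @ Dinv @ Crd.T          # Eq. (85)

# The reductions C^e - C^be are Gram-type matrices: eigenvalues >= 0.
print(np.linalg.eigvalsh(Crd @ Dinv @ Crd.T).min() >= -1e-10)
print(np.linalg.eigvalsh(Cad @ Dinv @ Cad.T).min() >= -1e-10)
```

Because the reduction terms are Gram-type matrices, the calibrated covariances can never exceed the priors in this limit; as noted above, however, with $\mathbf{d} = 0$ the means remain uncalibrated at first order.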

2) If the vector of deviations between the nominally measured and computed values of the responses vanishes, i.e., if $\mathbf{d} = 0$, the expressions obtained in Equations (77), (78) and (82) for $\mathbf{C}_{rr}^{(1),be,d}$, $\mathbf{C}_{\alpha\alpha}^{(1),be,d}$ and $\mathbf{C}_{\alpha r}^{(1),be,d}$, respectively, will take on the following simplified forms:

$$ \text{if } \mathbf{d} = 0 : \quad \mathbf{C}_{rr}^{(1),be,d} = \mathbf{C}_{rr}^e - \mathbf{C}_{rd}^{(1)} \mathbf{Y}_r^\dagger + \mathbf{X}_r \boldsymbol{\theta}^{(h,0)} \left( \boldsymbol{\theta}^{(h,0)} \right)^\dagger \mathbf{X}_r^\dagger + \mathbf{X}_r \boldsymbol{\theta}^{(h,0)} \left( \mathbf{u}^{(h,0)} \right)^\dagger \mathbf{Y}_r^\dagger - \mathbf{Y}_r \left( \mathbf{C}_{rd}^{(1)} \right)^\dagger + \mathbf{Y}_r \mathbf{D} \mathbf{Y}_r^\dagger + \mathbf{Y}_r \mathbf{u}^{(h,0)} \left( \boldsymbol{\theta}^{(h,0)} \right)^\dagger \mathbf{X}_r^\dagger + \mathbf{Y}_r \mathbf{u}^{(h,0)} \left( \mathbf{u}^{(h,0)} \right)^\dagger \mathbf{Y}_r^\dagger ; \tag{86} $$

$$ \text{if } \mathbf{d} = 0 : \quad \mathbf{C}_{\alpha\alpha}^{(1),be,d} = \mathbf{C}_{\alpha\alpha}^e - \mathbf{C}_{\alpha d}^{(1)} \mathbf{Y}_\alpha^\dagger + \mathbf{X}_\alpha \boldsymbol{\theta}^{(h,0)} \left( \boldsymbol{\theta}^{(h,0)} \right)^\dagger \mathbf{X}_\alpha^\dagger + \mathbf{X}_\alpha \boldsymbol{\theta}^{(h,0)} \left( \mathbf{u}^{(h,0)} \right)^\dagger \mathbf{Y}_\alpha^\dagger - \mathbf{Y}_\alpha \left( \mathbf{C}_{\alpha d}^{(1)} \right)^\dagger + \mathbf{Y}_\alpha \mathbf{D} \mathbf{Y}_\alpha^\dagger + \mathbf{Y}_\alpha \mathbf{u}^{(h,0)} \left( \boldsymbol{\theta}^{(h,0)} \right)^\dagger \mathbf{X}_\alpha^\dagger + \mathbf{Y}_\alpha \mathbf{u}^{(h,0)} \left( \mathbf{u}^{(h,0)} \right)^\dagger \mathbf{Y}_\alpha^\dagger ; \tag{87} $$

$$ \text{if } \mathbf{d} = 0 : \quad \mathbf{C}_{\alpha r}^{(1),be,d} = \mathbf{C}_{\alpha r}^e - \mathbf{C}_{\alpha d}^{(1)} \mathbf{Y}_r^\dagger + \mathbf{X}_\alpha \boldsymbol{\theta}^{(h,0)} \left( \boldsymbol{\theta}^{(h,0)} \right)^\dagger \mathbf{X}_r^\dagger + \mathbf{X}_\alpha \boldsymbol{\theta}^{(h,0)} \left( \mathbf{u}^{(h,0)} \right)^\dagger \mathbf{Y}_r^\dagger - \mathbf{Y}_\alpha \left( \mathbf{C}_{rd}^{(1)} \right)^\dagger + \mathbf{Y}_\alpha \mathbf{D} \mathbf{Y}_r^\dagger + \mathbf{Y}_\alpha \mathbf{u}^{(h,0)} \left( \boldsymbol{\theta}^{(h,0)} \right)^\dagger \mathbf{X}_r^\dagger + \mathbf{Y}_\alpha \mathbf{u}^{(h,0)} \left( \mathbf{u}^{(h,0)} \right)^\dagger \mathbf{Y}_r^\dagger . \tag{88} $$

If the contributions of the second-order sensitivities are neglected in the expressions obtained in Equations (86)-(88), these expressions will reduce to the first-order expressions shown in Equations (83)-(85).

3) If the contributions of the matrix $\mathbf{V}^{(h,0)}$ are neglected by comparison to the corresponding components of the matrix $\mathbf{D}$, then the following simplifications occur: $\mathbf{X}_r = \mathbf{X}_\alpha = 0$, $\mathbf{Y}_r = \mathbf{C}_{rd}^{(1)} \mathbf{D}^{-1}$ and $\mathbf{Y}_\alpha = \mathbf{C}_{\alpha d}^{(1)} \mathbf{D}^{-1}$. Consequently, the expressions provided in Equations (77), (78) and (82) will reduce to the following forms:

$$ \text{if } \mathbf{V}^{(h,0)} = 0 : \quad \mathbf{C}_{rr}^{(1),be,d} \left( \mathbf{V}^{(h,0)} = 0 \right) = \mathbf{C}_{rr}^e - \mathbf{C}_{rd}^{(1)} \mathbf{D}^{-1} \left( \mathbf{C}_{rd}^{(1)} \right)^\dagger - \mathbf{Y}_r \bar{\mathbf{d}} \left( \mathbf{u}^{(h,0)} \right)^\dagger \mathbf{Y}_r^\dagger - \mathbf{Y}_r \mathbf{u}^{(h,0)} \bar{\mathbf{d}}^\dagger \mathbf{Y}_r^\dagger + \mathbf{Y}_r \mathbf{u}^{(h,0)} \left( \mathbf{u}^{(h,0)} \right)^\dagger \mathbf{Y}_r^\dagger ; \tag{89} $$

$$ \text{if } \mathbf{V}^{(h,0)} = 0 : \quad \mathbf{C}_{\alpha\alpha}^{(1),be,d} \left( \mathbf{V}^{(h,0)} = 0 \right) = \mathbf{C}_{\alpha\alpha}^e - \mathbf{C}_{\alpha d}^{(1)} \mathbf{D}^{-1} \left( \mathbf{C}_{\alpha d}^{(1)} \right)^\dagger - \mathbf{Y}_\alpha \bar{\mathbf{d}} \left( \mathbf{u}^{(h,0)} \right)^\dagger \mathbf{Y}_\alpha^\dagger - \mathbf{Y}_\alpha \mathbf{u}^{(h,0)} \bar{\mathbf{d}}^\dagger \mathbf{Y}_\alpha^\dagger + \mathbf{Y}_\alpha \mathbf{u}^{(h,0)} \left( \mathbf{u}^{(h,0)} \right)^\dagger \mathbf{Y}_\alpha^\dagger ; \tag{90} $$

$$ \text{if } \mathbf{V}^{(h,0)} = 0 : \quad \mathbf{C}_{\alpha r}^{(1),be,d} \left( \mathbf{V}^{(h,0)} = 0 \right) = \mathbf{C}_{\alpha r}^e - \mathbf{C}_{\alpha d}^{(1)} \mathbf{D}^{-1} \left( \mathbf{C}_{rd}^{(1)} \right)^\dagger - \mathbf{Y}_\alpha \bar{\mathbf{d}} \left( \mathbf{u}^{(h,0)} \right)^\dagger \mathbf{Y}_r^\dagger - \mathbf{Y}_\alpha \mathbf{u}^{(h,0)} \bar{\mathbf{d}}^\dagger \mathbf{Y}_r^\dagger + \mathbf{Y}_\alpha \mathbf{u}^{(h,0)} \left( \mathbf{u}^{(h,0)} \right)^\dagger \mathbf{Y}_r^\dagger . \tag{91} $$

The above expressions underscore, through the terms involving the vector $\mathbf{u}^{(h,0)}$, the fundamental importance of including second-order sensitivities, which provide non-zero contributions even when $\bar{\mathbf{d}} = 0$.

It is important to emphasize that the posterior distribution of responses and parameters is non-Gaussian since the “best estimate” triple correlations among the best-estimate responses and parameters are nonzero even if the experimentally measured responses and parameters were normally distributed initially. This fact can readily be shown by considering Equations (64) and (65), which can be written as follows:

$$ \mathbf{r}^{(1),be,d} = \mathbf{r}^e + \mathbf{r}^{(0)} ; \quad \mathbf{r}^{(0)} \equiv \left( \mathbf{C}_{rr}^e - \mathbf{C}_{r\alpha}^e \mathbf{S}^\dagger \right) \left[ \boldsymbol{\theta}^{(h,0)} - \left( \mathbf{D} - \mathbf{V}^{(h,0)} \right)^{-1} \left( \mathbf{u}^{(h,0)} + \mathbf{D} \boldsymbol{\theta}^{(h,0)} - \mathbf{d} \right) \right] ; \tag{92} $$

$$ \boldsymbol{\alpha}^{(1),be,d} = \boldsymbol{\alpha}^0 + \boldsymbol{\beta}^{(0)} ; \quad \boldsymbol{\beta}^{(0)} \equiv \left( \mathbf{C}_{\alpha r}^e - \mathbf{C}_{\alpha\alpha}^e \mathbf{S}^\dagger \right) \left[ \boldsymbol{\theta}^{(h,0)} - \left( \mathbf{D} - \mathbf{V}^{(h,0)} \right)^{-1} \left( \mathbf{u}^{(h,0)} + \mathbf{D} \boldsymbol{\theta}^{(h,0)} - \mathbf{d} \right) \right] . \tag{93} $$

Using Equations (92) and (93) in conjunction with the definition of triple correlations for multivariate random quantities, and noting that the deviations $\mathbf{r} - \mathbf{r}^e$ and $\boldsymbol{\alpha} - \boldsymbol{\alpha}^0$ have zero means, yields the following “lowest-order estimate” for the best-estimate triple correlations among responses and parameters, which are denoted below as $T_{rr,ijk}^{(1),be,d}$ and $T_{\alpha\alpha,ijk}^{(1),be,d}$, respectively:

$$ T_{rr,ijk}^{(1),be,d} \equiv \left\langle \left( r_i - r_i^e - r_i^{(0)} \right) \left( r_j - r_j^e - r_j^{(0)} \right) \left( r_k - r_k^e - r_k^{(0)} \right) \right\rangle = E\left[ \left( r_i - r_i^e \right) \left( r_j - r_j^e \right) \left( r_k - r_k^e \right) \right] - r_i^{(0)} \operatorname{cov}\left( r_j, r_k \right) - r_j^{(0)} \operatorname{cov}\left( r_i, r_k \right) - r_k^{(0)} \operatorname{cov}\left( r_i, r_j \right) - r_i^{(0)} r_j^{(0)} r_k^{(0)} ; \quad i, j, k = 1, \ldots, TR ; \tag{94} $$

$$ T_{\alpha\alpha,ijk}^{(1),be,d} \equiv \left\langle \left( \alpha_i - \alpha_i^0 - \beta_i^{(0)} \right) \left( \alpha_j - \alpha_j^0 - \beta_j^{(0)} \right) \left( \alpha_k - \alpha_k^0 - \beta_k^{(0)} \right) \right\rangle = E\left[ \left( \alpha_i - \alpha_i^0 \right) \left( \alpha_j - \alpha_j^0 \right) \left( \alpha_k - \alpha_k^0 \right) \right] - \beta_i^{(0)} \operatorname{cov}\left( \alpha_j, \alpha_k \right) - \beta_j^{(0)} \operatorname{cov}\left( \alpha_i, \alpha_k \right) - \beta_k^{(0)} \operatorname{cov}\left( \alpha_i, \alpha_j \right) - \beta_i^{(0)} \beta_j^{(0)} \beta_k^{(0)} ; \quad i, j, k = 1, \ldots, TP . \tag{95} $$

If the responses are initially normally distributed, then the initial triple correlations vanish, i.e., $E[ ( r_i - r_i^e ) ( r_j - r_j^e ) ( r_k - r_k^e ) ] = 0$ for all $i, j, k = 1, \ldots, TR$. Nevertheless, as indicated in Equation (94), the “best-estimate response triple correlations” $T_{rr,ijk}^{(1),be,d}$ are non-zero even in such a case, unless there is “no calibration,” i.e., unless $r_i^{(0)} = r_j^{(0)} = r_k^{(0)} = 0$ for all $i, j, k = 1, \ldots, TR$. Similarly, if the parameters are initially normally distributed, then the initial triple correlations vanish, i.e., $E[ ( \alpha_i - \alpha_i^0 ) ( \alpha_j - \alpha_j^0 ) ( \alpha_k - \alpha_k^0 ) ] = 0$ for all $i, j, k = 1, \ldots, TP$. Nevertheless, as indicated in Equation (95), the “best-estimate parameter triple correlations” $T_{\alpha\alpha,ijk}^{(1),be,d}$ are non-zero even in such a case, unless there is “no calibration,” i.e., unless $\beta_i^{(0)} = \beta_j^{(0)} = \beta_k^{(0)} = 0$ for all $i, j, k = 1, \ldots, TP$.
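The non-Gaussian character asserted above is easy to exhibit numerically. In the sketch below (a hypothetical three-response example with a randomly generated covariance and calibration shift, not data from the paper), the deviations are sampled from a Gaussian, so their prior triple correlations vanish; nevertheless, the triple correlations about the calibrated mean reproduce the nonzero value predicted by Equation (94):

```python
# Monte Carlo illustration of Eq. (94) with hypothetical test data: Gaussian
# deviations x = r - r^e (zero triple correlations) plus a constant
# calibration shift r^(0) yield nonzero "best-estimate" triple correlations.
import numpy as np

rng = np.random.default_rng(2)
TR = 3
A = rng.normal(size=(TR, TR))
C = A @ A.T                                    # prior response covariance
r0 = rng.normal(size=TR)                       # calibration shift r^(0)

x = rng.multivariate_normal(np.zeros(TR), C, size=2_000_000)   # r - r^e
y = x - r0                                     # r - r^e - r^(0)

i, j, k = 0, 1, 2
T_mc = np.mean(y[:, i] * y[:, j] * y[:, k])
T_eq94 = (-r0[i] * C[j, k] - r0[j] * C[i, k] - r0[k] * C[i, j]
          - r0[i] * r0[j] * r0[k])
print(T_mc, T_eq94)                            # agree up to sampling noise
```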

4. Discussion and Conclusions

This work has presented the 2nd-BERRU-PMD methodology, where: the attribute “2nd” indicates that this methodology incorporates second-order uncertainties (means and covariances) and second-order sensitivities of computed model responses to model parameters; the acronym BERRU stands for “best-estimate results with reduced uncertainties;” and the letter “D” indicates “deterministic,” referring to the deterministic inclusion of the computational model responses. The 2nd-BERRU-PMD methodology is fundamentally based on the maximum entropy (MaxEnt) principle. This principle is in contradistinction to the fundamental principle that underlies the extant data assimilation and/or adjustment procedures, which minimize in a least-square sense a subjective user-defined functional that is meant to represent the discrepancies between measured and computed model responses. It has been shown that the 2nd-BERRU-PMD methodology includes all of the previous results produced by the first-order BERRU-PM methodology, generalizing and extending current data assimilation and/or data adjustment procedures while overcoming the fundamental limitations of these procedures. Since the framework of the 2nd-BERRU-PMD methodology comprises the combined phase-space of parameters and responses, the expressions obtained for the predicted best-estimate responses and calibrated parameters can be used for solving both forward/direct and inverse problems.

The accompanying work [19] will present the alternative framework for developing the “second-order MaxEnt predictive modelling methodology” by incorporating the computed model responses probabilistically (as opposed to “deterministically”). This alternative methodology [19] will be designated by the acronym 2nd-BERRU-PMP, where the last letter (“P”) in the acronym indicates “probabilistic” inclusion of the computational model responses. It will be shown that although both the 2nd-BERRU-PMD and the 2nd-BERRU-PMP frameworks predict analogous 2nd-order expressions for the best-estimate posterior first- and second-order moments, the respective expressions for the predicted responses, calibrated predicted parameters and their predicted uncertainties (covariances) are not identical to each other. The additional considerations that make the corresponding expressions predicted by the 2nd-BERRU-PMD and the 2nd-BERRU-PMP become identical will also be analyzed in the accompanying Part II [19].

Appendix A: Construction of the Second-Order Maximum Entropy Distribution for Responses and Parameters

When an unknown distribution $p(\boldsymbol{\alpha}, \mathbf{r})$, defined on a domain $D \equiv D_\alpha \times D_r$ (where $D_\alpha$ denotes the domain of definition of the parameters and $D_r$ denotes the domain of definition of the responses), needs to be reconstructed from a finite number of its known moments, the principle of maximum entropy (MaxEnt) originally formulated by Jaynes [15] provides optimal compatibility with the available information, while simultaneously ensuring minimal spurious information content. In particular, when only the first-order and second-order moments of the joint distribution of model parameters and responses $p(\boldsymbol{\alpha}, \mathbf{r})$ are known, the corresponding second-order MaxEnt distribution is constructed by following the procedure outlined below.

1) Known means and covariances for responses:

$$ r_i^0 \equiv \int_D r_i\, p(\boldsymbol{\alpha}, \mathbf{r})\, d\mathbf{r}\, d\boldsymbol{\alpha} , \quad i = 1, \ldots, TR ; \tag{96} $$

$$ \operatorname{cov}\left( r_i, r_j \right) \equiv \int_D \left( r_i - r_i^0 \right) \left( r_j - r_j^0 \right) p(\boldsymbol{\alpha}, \mathbf{r})\, d\mathbf{r}\, d\boldsymbol{\alpha} ; \quad i, j = 1, \ldots, TR . \tag{97} $$

2) Known means and covariances for parameters:

$$ \alpha_j^0 \equiv \int_D \alpha_j\, p(\boldsymbol{\alpha}, \mathbf{r})\, d\mathbf{r}\, d\boldsymbol{\alpha} , \quad j = 1, \ldots, TP ; \tag{98} $$

$$ \operatorname{cov}\left( \alpha_i, \alpha_j \right) \equiv \int_D \left( \alpha_i - \alpha_i^0 \right) \left( \alpha_j - \alpha_j^0 \right) p(\boldsymbol{\alpha}, \mathbf{r})\, d\mathbf{r}\, d\boldsymbol{\alpha} ; \quad i, j = 1, \ldots, TP ; \tag{99} $$

$$ \operatorname{cov}\left( \alpha_i, r_j \right) \equiv \int_D \left( \alpha_i - \alpha_i^0 \right) \left( r_j - r_j^0 \right) p(\boldsymbol{\alpha}, \mathbf{r})\, d\mathbf{r}\, d\boldsymbol{\alpha} ; \quad i = 1, \ldots, TP ; \; j = 1, \ldots, TR . \tag{100} $$

According to the MaxEnt principle, the probability density $p(\boldsymbol{\alpha}, \mathbf{r})$ would satisfy the “available information” provided in Equations (96)-(100), without implying any spurious information or hidden assumptions, if:

1) $p(\boldsymbol{\alpha}, \mathbf{r})$ maximizes the Shannon [21] information entropy, $S$, as defined below:

$$ S = - \int_D p(\boldsymbol{\alpha}, \mathbf{r}) \ln \left[ p(\boldsymbol{\alpha}, \mathbf{r}) \right] d\boldsymbol{\alpha}\, d\mathbf{r} , \tag{101} $$

2) $p(\boldsymbol{\alpha}, \mathbf{r})$ satisfies the “moments constraints” defined by Equations (96)-(100);

3) $p(\boldsymbol{\alpha}, \mathbf{r})$ satisfies the normalization condition:

$$ \int_D p(\boldsymbol{\alpha}, \mathbf{r})\, d\boldsymbol{\alpha}\, d\mathbf{r} = 1 . \tag{102} $$

The MaxEnt distribution $p(\boldsymbol{\alpha}, \mathbf{r})$ is obtained as the solution of the constrained variational problem $\partial H(p) / \partial p = 0$, where the entropy (Lagrangian functional) $H(p)$ is defined as follows:

$$ H(p) = - \int_D p(\boldsymbol{\alpha}, \mathbf{r}) \ln \left[ p(\boldsymbol{\alpha}, \mathbf{r}) \right] d\boldsymbol{\alpha}\, d\mathbf{r} - \lambda_0 \left[ \int_D p(\boldsymbol{\alpha}, \mathbf{r})\, d\boldsymbol{\alpha}\, d\mathbf{r} - 1 \right] - \sum_{k=1}^{TR} \lambda_k^{(1)} \left[ \int_D r_k\, p(\boldsymbol{\alpha}, \mathbf{r})\, d\boldsymbol{\alpha}\, d\mathbf{r} - r_k^0 \right] - \sum_{i=1}^{TP} \lambda_i^{(2)} \left[ \int_D \alpha_i\, p(\boldsymbol{\alpha}, \mathbf{r})\, d\boldsymbol{\alpha}\, d\mathbf{r} - \alpha_i^0 \right] - \frac{1}{2} \sum_{k=1}^{TR} \sum_{l=1}^{TR} \lambda_{kl}^{(11)} \left[ \int_D r_k r_l\, p(\boldsymbol{\alpha}, \mathbf{r})\, d\boldsymbol{\alpha}\, d\mathbf{r} - \operatorname{cov}\left( r_k, r_l \right) - r_k^0 r_l^0 \right] - \sum_{k=1}^{TR} \sum_{i=1}^{TP} \lambda_{ki}^{(12)} \left[ \int_D r_k \alpha_i\, p(\boldsymbol{\alpha}, \mathbf{r})\, d\boldsymbol{\alpha}\, d\mathbf{r} - \operatorname{cov}\left( r_k, \alpha_i \right) - r_k^0 \alpha_i^0 \right] - \frac{1}{2} \sum_{i=1}^{TP} \sum_{j=1}^{TP} \lambda_{ij}^{(22)} \left[ \int_D \alpha_i \alpha_j\, p(\boldsymbol{\alpha}, \mathbf{r})\, d\boldsymbol{\alpha}\, d\mathbf{r} - \operatorname{cov}\left( \alpha_i, \alpha_j \right) - \alpha_i^0 \alpha_j^0 \right] . \tag{103} $$

In Equation (103), the quantities $\lambda_0$, $\lambda_k^{(1)}$, $\lambda_i^{(2)}$, $\lambda_{kl}^{(11)}$, $\lambda_{ki}^{(12)}$ and $\lambda_{ij}^{(22)}$ denote the respective Lagrange multipliers, and the factors 1/2 have been introduced for subsequent computational convenience.
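For completeness, the functional-differentiation step leading from Equation (103) to Equation (104) can be spelled out explicitly; setting the first functional derivative of $H(p)$ to zero gives

$$ \frac{\partial H(p)}{\partial p} = - \ln \left[ p(\boldsymbol{\alpha}, \mathbf{r}) \right] - 1 - \lambda_0 - \sum_{k=1}^{TR} \lambda_k^{(1)} r_k - \sum_{i=1}^{TP} \lambda_i^{(2)} \alpha_i - \frac{1}{2} \sum_{k=1}^{TR} \sum_{l=1}^{TR} \lambda_{kl}^{(11)} r_k r_l - \sum_{k=1}^{TR} \sum_{i=1}^{TP} \lambda_{ki}^{(12)} r_k \alpha_i - \frac{1}{2} \sum_{i=1}^{TP} \sum_{j=1}^{TP} \lambda_{ij}^{(22)} \alpha_i \alpha_j = 0 , $$

so that $p(\boldsymbol{\alpha}, \mathbf{r}) = \exp(-1 - \lambda_0) \exp\left( - \mathbf{b}^\dagger \mathbf{z} - \frac{1}{2} \mathbf{z}^\dagger \boldsymbol{\Lambda} \mathbf{z} \right)$, where the constant factor $\exp(-1 - \lambda_0)$ is fixed by the normalization condition (102) and is therefore identified with $1/Z(\mathbf{b}, \boldsymbol{\Lambda})$ below.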

Solving the equation $\partial H(p) / \partial p = 0$ yields the following expression for the resulting MaxEnt distribution $p(\mathbf{z})$:

$$ p(\mathbf{z}) = \frac{1}{Z(\mathbf{b}, \boldsymbol{\Lambda})} \exp \left( - \mathbf{b}^\dagger \mathbf{z} - \frac{1}{2} \mathbf{z}^\dagger \boldsymbol{\Lambda} \mathbf{z} \right) , \tag{104} $$

where the various vectors and matrices are defined as follows:

$$ \mathbf{z} = \begin{pmatrix} \mathbf{r} \\ \boldsymbol{\alpha} \end{pmatrix} ; \quad \mathbf{b} \equiv \begin{pmatrix} \boldsymbol{\lambda}^{(1)} \\ \boldsymbol{\lambda}^{(2)} \end{pmatrix} ; \quad \boldsymbol{\lambda}^{(1)} \equiv \left( \lambda_1^{(1)}, \ldots, \lambda_{TR}^{(1)} \right)^\dagger ; \quad \boldsymbol{\lambda}^{(2)} \equiv \left( \lambda_1^{(2)}, \ldots, \lambda_{TP}^{(2)} \right)^\dagger ; \tag{105} $$

$$ \boldsymbol{\Lambda} \equiv \begin{pmatrix} \boldsymbol{\Lambda}^{(11)} & \boldsymbol{\Lambda}^{(12)} \\ \left[ \boldsymbol{\Lambda}^{(12)} \right]^\dagger & \boldsymbol{\Lambda}^{(22)} \end{pmatrix} ; \quad \boldsymbol{\Lambda}^{(11)} \equiv \left[ \lambda_{kl}^{(11)} \right]_{TR \times TR} ; \quad \boldsymbol{\Lambda}^{(12)} \equiv \left[ \lambda_{ki}^{(12)} \right]_{TR \times TP} ; \quad \boldsymbol{\Lambda}^{(22)} \equiv \left[ \lambda_{ij}^{(22)} \right]_{TP \times TP} . \tag{106} $$

The normalization constant $Z(\mathbf{b}, \boldsymbol{\Lambda})$ in Equation (104) is defined as follows:

$$ Z(\mathbf{b}, \boldsymbol{\Lambda}) = \int_D \exp \left( - \mathbf{b}^\dagger \mathbf{z} - \frac{1}{2} \mathbf{z}^\dagger \boldsymbol{\Lambda} \mathbf{z} \right) d\mathbf{z} ; \quad d\mathbf{z} \equiv d\boldsymbol{\alpha}\, d\mathbf{r} . \tag{107} $$

In statistical mechanics, the normalization constant $Z$ is called the partition function (or sum over states) and carries all of the information available about the possible states of the system, while the MaxEnt distribution $p(\mathbf{z})$ is called the canonical Boltzmann-Gibbs distribution. The integral in Equation (107) can be evaluated explicitly by conservatively extending the computational domain $D$ to the entire multidimensional real vector space $\mathbb{R}^N$, where $N \equiv TR + TP$, to obtain the following expression:

$$ Z(\mathbf{b}, \boldsymbol{\Lambda}) = \int_{\mathbb{R}^N} \exp \left( - \mathbf{b}^\dagger \mathbf{z} - \frac{1}{2} \mathbf{z}^\dagger \boldsymbol{\Lambda} \mathbf{z} \right) d\mathbf{z} = \frac{(2\pi)^{N/2}}{\sqrt{\operatorname{Det}(\boldsymbol{\Lambda})}} \exp \left( \frac{1}{2} \mathbf{b}^\dagger \boldsymbol{\Lambda}^{-1} \mathbf{b} \right) . \tag{108} $$
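The closed form in Equation (108) is the standard multivariate Gaussian integral; it can be spot-checked numerically for small $N$. The following sketch (the values of $\mathbf{b}$ and $\boldsymbol{\Lambda}$ are arbitrary test choices, not taken from the paper) compares direct two-dimensional quadrature against the closed-form expression:

```python
# Numerical spot-check of Eq. (108) for N = 2 with arbitrary test data.
import numpy as np
from scipy.integrate import dblquad

Lam = np.array([[2.0, 0.3], [0.3, 1.0]])      # symmetric positive definite
b = np.array([0.4, -0.7])

def integrand(z2, z1):
    z = np.array([z1, z2])
    return np.exp(-b @ z - 0.5 * z @ Lam @ z)

Z_quad, _ = dblquad(integrand, -20, 20, lambda z1: -20, lambda z1: 20)
Z_closed = (2 * np.pi) / np.sqrt(np.linalg.det(Lam)) \
    * np.exp(0.5 * b @ np.linalg.inv(Lam) @ b)
print(Z_quad, Z_closed)                       # agree to quadrature accuracy
```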

The Lagrange multipliers are determined in terms of the known information (means and covariances of parameters and responses) by differentiating the “free energy” $F(\mathbf{b}, \mathbf{C}) \equiv - \ln Z(\mathbf{b}, \boldsymbol{\Lambda})$ with respect to the components of the vector $\mathbf{b} \equiv \left( \boldsymbol{\lambda}^{(1)}, \boldsymbol{\lambda}^{(2)} \right)^\dagger$ to obtain the following expressions:

$$ \frac{\partial F(\mathbf{b}, \mathbf{C})}{\partial \lambda_k^{(1)}} = \frac{1}{Z} \int_D r_k \exp \left( - \mathbf{b}^\dagger \mathbf{z} - \frac{1}{2} \mathbf{z}^\dagger \boldsymbol{\Lambda} \mathbf{z} \right) d\mathbf{z} = \int_D r_k\, p(\boldsymbol{\alpha}, \mathbf{r})\, d\mathbf{z} = r_k^0 ; \quad k = 1, \ldots, TR ; \tag{109} $$

$$ \frac{\partial F(\mathbf{b}, \mathbf{C})}{\partial \lambda_i^{(2)}} = \frac{1}{Z} \int_D \alpha_i \exp \left( - \mathbf{b}^\dagger \mathbf{z} - \frac{1}{2} \mathbf{z}^\dagger \boldsymbol{\Lambda} \mathbf{z} \right) d\mathbf{z} = \int_D \alpha_i\, p(\boldsymbol{\alpha}, \mathbf{r})\, d\boldsymbol{\alpha}\, d\mathbf{r} = \alpha_i^0 ; \quad i = 1, \ldots, TP . \tag{110} $$

The results obtained in Equations (109) and (110) can be collectively written in vector-matrix form as follows:

$$ \frac{\partial F(\mathbf{b}, \mathbf{C})}{\partial \mathbf{b}} = \mathbf{z}^0 ; \quad \mathbf{z}^0 \equiv \left( \mathbf{r}^0, \boldsymbol{\alpha}^0 \right)^\dagger ; \quad \mathbf{r}^0 \equiv \left( r_1^0, \ldots, r_{TR}^0 \right)^\dagger . \tag{111} $$

On the other hand, it follows from Equation (108) that:

$$ \frac{\partial F(\mathbf{b}, \mathbf{C})}{\partial \mathbf{b}} = - \frac{\partial \left[ \ln Z(\mathbf{b}, \boldsymbol{\Lambda}) \right]}{\partial \mathbf{b}} = - \boldsymbol{\Lambda}^{-1} \mathbf{b} . \tag{112} $$

The relations obtained in Equations (111) and (112) imply the following relation:

$$ \mathbf{b} = - \boldsymbol{\Lambda} \mathbf{z}^0 . \tag{113} $$

Differentiating a second time the relations provided in Equations (109) and (110) yields the following relations:

$$ \frac{\partial^2 F(\mathbf{b}, \mathbf{C})}{\partial \lambda_j^{(1)} \partial \lambda_k^{(1)}} = - \frac{1}{Z} \frac{\partial Z}{\partial \lambda_j^{(1)}}\, r_k^0 - \frac{1}{Z} \int_D r_j r_k \exp \left( - \mathbf{b}^\dagger \mathbf{z} - \frac{1}{2} \mathbf{z}^\dagger \boldsymbol{\Lambda} \mathbf{z} \right) d\mathbf{z} = - \operatorname{cov}\left( r_j, r_k \right) ; \quad j, k = 1, \ldots, TR ; \tag{114} $$

$$ \frac{\partial^2 F(\mathbf{b}, \mathbf{C})}{\partial \lambda_i^{(2)} \partial \lambda_k^{(1)}} = - \operatorname{cov}\left( \alpha_i, r_k \right) ; \quad i = 1, \ldots, TP ; \; k = 1, \ldots, TR ; \tag{115} $$

$$ \frac{\partial^2 F(\mathbf{b}, \mathbf{C})}{\partial \lambda_i^{(2)} \partial \lambda_j^{(2)}} = - \operatorname{cov}\left( \alpha_i, \alpha_j \right) ; \quad i, j = 1, \ldots, TP . \tag{116} $$

The results obtained in Equations (114)-(116) can be collectively written in vector-matrix form as follows:

$$ \frac{\partial^2 F(\mathbf{b}, \mathbf{C})}{\partial \mathbf{b}\, \partial \mathbf{b}^\dagger} = - \mathbf{C} ; \quad \mathbf{C} \equiv \begin{pmatrix} \mathbf{C}_{rr} & \mathbf{C}_{r\alpha} \\ \mathbf{C}_{\alpha r} & \mathbf{C}_{\alpha\alpha} \end{pmatrix} ; \quad \mathbf{C}_{rr} \equiv \left[ \operatorname{cov}\left( r_j, r_k \right) \right]_{TR \times TR} ; \quad \mathbf{C}_{r\alpha} \equiv \left[ \operatorname{cov}\left( r_k, \alpha_i \right) \right]_{TR \times TP} = \left( \mathbf{C}_{\alpha r} \right)^\dagger ; \quad \mathbf{C}_{\alpha\alpha} \equiv \left[ \operatorname{cov}\left( \alpha_i, \alpha_j \right) \right]_{TP \times TP} . \tag{117} $$

On the other hand, it follows from Equation (112) that:

$$ \frac{\partial^2 F(\mathbf{b}, \mathbf{C})}{\partial \mathbf{b}\, \partial \mathbf{b}^\dagger} = - \boldsymbol{\Lambda}^{-1} . \tag{118} $$

The relations obtained in Equations (117) and (118) imply the following relation:

$$ \boldsymbol{\Lambda}^{-1} = \mathbf{C} . \tag{119} $$

Introducing the results obtained in Equations (113) and (119) into Equations (104) and (108) yields the following expression for the MaxEnt distribution $p(\mathbf{z} \mid \mathbf{z}^0, \mathbf{C})$:

$$ p(\mathbf{z} \mid \mathbf{z}^0, \mathbf{C}) = (2\pi)^{-N/2} \left[ \operatorname{Det}(\mathbf{C}) \right]^{-1/2} \exp \left[ - \frac{1}{2} \left( \mathbf{z} - \mathbf{z}^0 \right)^\dagger \mathbf{C}^{-1} \left( \mathbf{z} - \mathbf{z}^0 \right) \right] . \tag{120} $$
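A quick numerical sanity check (with an arbitrary test mean and covariance; SciPy's multivariate normal is used only for comparison) confirms that substituting Equations (113) and (119) into Equations (104) and (108) indeed reproduces the Gaussian of Equation (120):

```python
# Sketch with test data: b = -Lambda z^0 and Lambda^{-1} = C, inserted into
# Eqs. (104) and (108), reproduce the Gaussian density of Eq. (120).
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
N = 4
A = rng.normal(size=(N, N))
C = A @ A.T                                  # given covariance matrix
z0 = rng.normal(size=N)                      # given mean vector (r^0, alpha^0)

Lam = np.linalg.inv(C)                       # Eq. (119)
b = -Lam @ z0                                # Eq. (113)

z = rng.normal(size=N)                       # arbitrary evaluation point
Z = (2 * np.pi) ** (N / 2) / np.sqrt(np.linalg.det(Lam)) \
    * np.exp(0.5 * b @ np.linalg.inv(Lam) @ b)           # Eq. (108)
p_maxent = np.exp(-b @ z - 0.5 * z @ Lam @ z) / Z        # Eq. (104)
print(p_maxent, multivariate_normal.pdf(z, mean=z0, cov=C))  # agree to machine precision
```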

Appendix B: Construction of the “Constrained Minimum Discrepancy Distribution with a Given Function”

The aim is to obtain a mathematical expression for a distribution $p(\mathbf{z})$, defined over a domain $\mathbf{z} \in D$, which is normalized to unity and satisfies $K$ independent constraints of the following form:

$$ \int_D g_k(\mathbf{z})\, p(\mathbf{z})\, d\mathbf{z} = c_k < \infty , \quad k = 1, \ldots, K ; \tag{121} $$

with

$$ \int_D p(\mathbf{z})\, d\mathbf{z} = 1 . \tag{122} $$

In addition, it is required that the normalized distribution $p(\mathbf{z})$ be “minimally discrepant from” (i.e., “fit as well as possible”) a known function $f(\mathbf{z})$, which is also defined over the same domain $\mathbf{z} \in D$.

The discrepancy (i.e., “lack of fit”) between an approximate distribution $f(\mathbf{z})$ and a true distribution $p(\mathbf{z})$, both defined over the same domain $\mathbf{z} \in D$, is denoted below as $\delta(f \mid p)$ and is defined (see, e.g., [22]) as follows:

$$ \delta(f \mid p) \equiv \int_D p(\mathbf{z}) \log \frac{p(\mathbf{z})}{f(\mathbf{z})}\, d\mathbf{z} . \tag{123} $$

The discrepancy $\delta(f \mid p)$ is also called (see, e.g., [22]) the “Kullback-Leibler divergence” or “relative entropy.” The normalized distribution $p(\mathbf{z})$ which minimizes the discrepancy defined in Equation (123), while satisfying the constraints expressed by Equations (121) and (122), is obtained by minimizing the Lagrangian functional $F(p)$ defined below:

$$ F(p) \equiv \int_D p(\mathbf{z}) \log \frac{p(\mathbf{z})}{f(\mathbf{z})}\, d\mathbf{z} + \sum_{k=1}^{K} \theta_k \left[ \int_D g_k(\mathbf{z})\, p(\mathbf{z})\, d\mathbf{z} - c_k \right] + c \left[ \int_D p(\mathbf{z})\, d\mathbf{z} - 1 \right] . \tag{124} $$

In Equation (124), the quantities $c$ and $\theta_1, \ldots, \theta_k, \ldots, \theta_K$ are Lagrange multipliers which remain to be determined. The stationary value of $F(p)$ is attained at the probability distribution $p(\mathbf{z})$ for which the first variation of $F(p)$ vanishes, i.e., $p(\mathbf{z})$ is the solution of the following equation:

$$ \delta F(p) \equiv \frac{d}{d\varepsilon} \left\{ F\left[ p(\mathbf{z}) + \varepsilon \tau(\mathbf{z}) \right] \right\}_{\varepsilon = 0} = \int_D \left[ \log \frac{p(\mathbf{z})}{f(\mathbf{z})} + \sum_{k=1}^{K} \theta_k g_k(\mathbf{z}) + (c + 1) \right] \tau(\mathbf{z})\, d\mathbf{z} = 0 . \tag{125} $$

The desired posterior distribution is the normalized solution of Equation (125), having the following expression:

$$ p(\mathbf{z}) = \frac{1}{Z} f(\mathbf{z}) \exp \left[ - \sum_{k=1}^{K} \theta_k g_k(\mathbf{z}) \right] ; \quad Z \equiv \int_D f(\mathbf{z}) \exp \left[ - \sum_{k=1}^{K} \theta_k g_k(\mathbf{z}) \right] d\mathbf{z} . \tag{126} $$

It is important to note that in the limiting case when the function $f(\mathbf{z})$ is the uniform distribution, $f(\mathbf{z}) = \text{constant}$, the distribution $p(\mathbf{z})$ becomes a MaxEnt distribution. In other words, the distribution which is least discrepant from (i.e., which “best fits”) the uniform distribution is the MaxEnt distribution.
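As a concrete one-dimensional illustration (a hypothetical example, not from the paper): take $f(z)$ to be a standard Gaussian and impose the single constraint $g(z) = z$ with prescribed mean $c$. Equation (126) then gives an exponentially tilted Gaussian, and the Lagrange multiplier $\theta$ can be found numerically from the constraint:

```python
# Minimal sketch of Eq. (126) in one dimension: the minimally discrepant
# distribution is p(z) = f(z) exp(-theta z)/Z, with theta chosen so that
# the constrained mean equals c. For a standard Gaussian f, the tilt
# merely shifts the mean: <z> = -theta, hence theta = -c.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

f = lambda z: np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)   # given function f(z)
c = 1.3                                                  # prescribed mean

def tilted_mean(theta):
    Z, _ = quad(lambda z: f(z) * np.exp(-theta * z), -30, 30)
    m, _ = quad(lambda z: z * f(z) * np.exp(-theta * z), -30, 30)
    return m / Z

theta = brentq(lambda t: tilted_mean(t) - c, -10, 10)
print(theta, tilted_mean(theta))     # theta = -c = -1.3; constrained mean = 1.3
```

For a non-Gaussian $f$, no closed form is generally available, but the same root-finding approach (one equation per Lagrange multiplier) still applies.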

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Sasaki, Y.K. (1955) A Fundamental Study of the Numerical Prediction Based on the Variational Principle. Journal of the Meteorological Society of Japan, 33, 262-275.
https://doi.org/10.2151/jmsj1923.33.6_262
[2] Sasaki, Y.K. (1958) An Objective Analysis Based on the Variational Method. Journal of the Meteorological Society of Japan, 36, 77-88.
https://doi.org/10.2151/jmsj1923.36.3_77
[3] Eckart, C. (1960) Hydrodynamics of Ocean and Atmosphere. Pergamon Press, Oxford.
[4] Humi, I., Wagschal, J.J. and Yeivin, Y. (1964) Multi-Group Constants from Integral Data. Proceedings of the 3rd International Conference on the Peaceful Uses of Atomic Energy, Geneva, 31 August-9 September 1964, 398.
[5] Lewis, J.M., Lakshmivarahan, S. and Dhall, S.K. (2006) Dynamic Data Assimilation: A Least Square Approach. Cambridge University Press, Cambridge.
https://doi.org/10.1017/CBO9780511526480
[6] Lahoz, W., Khattatov, B. and Ménard, R. (2010) Data Assimilation: Making Sense of Observations. Springer Verlag, Heidelberg.
https://doi.org/10.1007/978-3-540-74703-1
[7] Faragó, I., Havasi, Á. and Zlatev, Z. (2013) Advanced Numerical Methods for Complex Environmental Models: Needs and Availability. Bentham Science Publishers, Bussum.
https://doi.org/10.2174/97816080577881130101
[8] Cacuci, D.G., Navon, M.I. and Ionescu-Bujor, M. (2014) Computational Methods for Data Evaluation and Assimilation. Chapman & Hall/CRC, Boca Raton.
[9] Rowlands, J., et al. (1973) The Production and Performance of the Adjusted Cross-Section Set FGL5. Proceedings of the International Symposium on Physics of Fast Reactors, Tokyo.
[10] Gandini, A. and Petilli, M. (1973) A Code Using the Lagrange’s Multipliers Method for Nuclear Data Adjustment. CNEN-RT/FI(73)39, Comitato Nazionale Energia Nucleare, Rome.
[11] Kuroi, H. and Mitani, H. (1975) Adjustment to Cross-Section Data to Fit Integral Experiments by Least Squares Method. Journal of Nuclear Science and Technology, 12, 663-680.
https://doi.org/10.1080/18811248.1975.9733172
[12] Dragt, J.B., Dekker, J.W.M., Gruppelaar, H. and Janssen, A.J. (1977) Methods of Adjustment and Error Evaluation of Neutron Capture Cross Sections. Nuclear Science and Engineering, 62, 117-129.
https://doi.org/10.13182/NSE77-3
[13] Cacuci, D.G. (2014) Predictive Modeling of Coupled Multi-Physics Systems: I. Theory. Annals of Nuclear Energy, 70, 266-278.
https://doi.org/10.1016/j.anucene.2013.11.027
[14] Cacuci, D.G. (2019) BERRU Predictive Modeling: Best-Estimate Results with Reduced Uncertainties. Springer, Berlin.
https://doi.org/10.1007/978-3-662-58395-1
[15] Jaynes, E.T. (1957) Information Theory and Statistical Mechanics. Physical Review, 106, 620-630.
https://doi.org/10.1103/PhysRev.106.620
[16] Cacuci, D.G. (1981) Sensitivity Theory for Nonlinear Systems: I. Nonlinear Functional Analysis Approach. Journal of Mathematical Physics, 22, 2794-2812.
https://doi.org/10.1063/1.525186
[17] Cacuci, D.G. (2016) The Second-Order Adjoint Sensitivity Analysis Methodology for Nonlinear Systems—I: Theory. Nuclear Science and Engineering, 184, 16-30.
https://doi.org/10.13182/NSE16-16
[18] Cacuci, D.G. (2023) Perspective on Predictive Modeling: Current Status, New High-Order Methodology and Outlook for Energy Systems. Energies, 16, Article 933.
https://doi.org/10.3390/en16020933
[19] Cacuci, D.G. (2023) Second-Order MaxEnt Predictive Modeling Methodology. II: Probabilistically Incorporated Computational Model Responses (2nd-BERRU-PMP). American Journal of Computational Mathematics, 13, No. 2.
[20] Shun, Z. and McCullagh, P. (1995) Laplace Approximation of High Dimensional Integrals. Journal of the Royal Statistical Society: Series B (Methodological), 57, 749-760.
https://doi.org/10.1111/j.2517-6161.1995.tb02060.x
[21] Shannon, C.E. (1948) A Mathematical Theory of Communication. The Bell System Technical Journal, 27, 379-423.
https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
[22] Bernardo, J.M. and Smith, A.F.M. (2006) Bayesian Theory. John Wiley & Sons, Chichester.

Copyright © 2024 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.