
Bayesian model averaging (BMA) is a popular and powerful statistical method for accounting for uncertainty about model form or assumptions. The long-run (frequentist) performance of the resulting estimator is, however, usually hard to derive. This paper proposes a mixture of priors and sampling distributions as the basis for a Bayes estimator. The frequentist properties of the new Bayes estimator then follow automatically from Bayesian decision theory. It is shown that if all competing models have the same parametric form, the new Bayes estimator reduces to the BMA estimator. The method is applied to the daily Euro/US Dollar exchange rate.

Several models are typically a priori plausible in statistical modeling; it is thus quite common nowadays to apply some model selection procedure to select a single one. For an overview of frequentist model selection criteria, see Leeb and Poetscher [ ]. A Bayesian analysis requires the specification of:

1) a distribution family of the observation (sampling distribution),

2) a prior distribution for the parameter,

3) a loss function associated with a decision.

The posterior distribution of the parameter is obtained by combining the prior distribution and the sampling distribution via Bayes' theorem.

A posterior distribution and a loss function lead to an optimal decision rule (the Bayes rule), together with its risk function and hence its frequentist properties.

Consider a situation in which some quantity of interest, μ, is to be estimated from a sample of observations that can be regarded as realizations from some unknown probability distribution, and in order to do so it is necessary to specify a model for the distribution. There are usually many alternative plausible models available and, in general, they each lead to different estimates of μ. Consider a sample of data, x, and a set of K models $M_1, \ldots, M_K$. Each $M_k$ consists of a family of distributions. The prior probability that $M_k$ is the true model is denoted by $P(M_k)$, and the posterior probability of $M_k$ (given that one of the models is true) is given by

$$P(M_k \mid x) = \frac{P(x \mid M_k)\, P(M_k)}{\sum_{l=1}^{K} P(x \mid M_l)\, P(M_l)},$$

where

$$P(x \mid M_k) = \int f_k(x \mid \theta_k)\, \pi_k(\theta_k)\, d\theta_k$$

is the integrated likelihood under $M_k$, with $f_k$ the sampling density of $M_k$ and $\pi_k$ the prior for its parameter $\theta_k$. If all models are assigned equal prior probabilities, the posterior model probabilities are proportional to the integrated likelihoods.
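As an illustration, the posterior model probabilities can be computed from the log integrated likelihoods and the prior model probabilities. The following is a minimal Python sketch; the numeric values are hypothetical.

```python
import math

def posterior_model_probs(log_marglik, priors):
    """P(M_k | x) from log integrated likelihoods log P(x | M_k)
    and prior model probabilities P(M_k)."""
    # Work on the log scale and subtract the maximum for numerical stability.
    logw = [l + math.log(p) for l, p in zip(log_marglik, priors)]
    m = max(logw)
    w = [math.exp(v - m) for v in logw]
    total = sum(w)
    return [v / total for v in w]

# Hypothetical log integrated likelihoods for K = 3 models with equal priors.
probs = posterior_model_probs([-104.2, -103.1, -107.8], [1 / 3, 1 / 3, 1 / 3])
print([round(p, 3) for p in probs])
```

Working on the log scale avoids underflow, since integrated likelihoods of realistic sample sizes are far below machine precision when exponentiated directly.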

Bayesian model selection involves selecting the “best” model according to some selection criterion; most often the Bayesian information criterion (BIC), also known as the Schwarz criterion [ ], is used.

Let μ be a quantity of interest depending on x, for example a future observation from the same process that generated x. The idea is to use a weighted average of the estimates of μ obtained using each of the alternative models, rather than the estimate obtained using any single model. More precisely, the posterior distribution of μ is given by

$$P(\mu \mid x) = \sum_{k=1}^{K} P(\mu \mid x, M_k)\, P(M_k \mid x).$$

Note that $P(\mu \mid x, M_k)$ is the posterior distribution of μ under the assumption that $M_k$ is the true model.

The posterior distribution of μ, conditioned on $M_k$ being true, is given by

$$P(\mu \mid x, M_k) = \int P(\mu \mid \theta_k, x, M_k)\, P(\theta_k \mid x, M_k)\, d\theta_k.$$

The posterior mean and posterior variance are given by

$$E[\mu \mid x] = \sum_{k=1}^{K} E[\mu \mid x, M_k]\, P(M_k \mid x) \qquad (6)$$

$$\mathrm{Var}[\mu \mid x] = \sum_{k=1}^{K} \left( \mathrm{Var}[\mu \mid x, M_k] + E[\mu \mid x, M_k]^2 \right) P(M_k \mid x) \; - \; E[\mu \mid x]^2. \qquad (7)$$
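These combination rules are easy to compute once the per-model posterior moments and the posterior model probabilities are available. A small Python sketch, with hypothetical weights and moments:

```python
def bma_moments(weights, means, variances):
    """BMA posterior mean and variance from per-model posterior moments
    and posterior model probabilities (law of total variance over models)."""
    mean = sum(w * m for w, m in zip(weights, means))
    var = sum(w * (v + m ** 2)
              for w, m, v in zip(weights, means, variances)) - mean ** 2
    return mean, var

# Hypothetical: two models with posterior probabilities 0.7 and 0.3.
mean, var = bma_moments([0.7, 0.3], [1.2, 0.9], [0.10, 0.20])
print(round(mean, 3), round(var, 4))
```

Note that the combined variance exceeds the weighted average of the within-model variances (here 0.13) because the second term also accounts for the spread of the per-model means, i.e. model uncertainty.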

A classical reference is Hoeting et al. [ ].

Clyde and Iversen [ ] discuss model averaging when the true model is not necessarily among those considered.

An R [ ] package implementing BMA is available.

The purpose of this section is to define a new BMA method. The prior of the quantity of interest can be defined as the mixture

$$\pi(\mu) = \sum_{k=1}^{K} P(M_k)\, \pi_k(\mu) \qquad (8)$$

where $\pi_k$ is the prior for μ under $M_k$.

The parametric statistical model is likewise defined as the mixture

$$f(x \mid \mu) = \sum_{k=1}^{K} P(M_k)\, f_k(x \mid \mu) \qquad (9)$$

with $f_k$ the density of $M_k$ (i.e. the sampling distribution of $M_k$). The use of Bayes' rule leads to the posterior of the quantity of interest

$$\pi(\mu \mid x) = \frac{f(x \mid \mu)\, \pi(\mu)}{\int f(x \mid \mu)\, \pi(\mu)\, d\mu}. \qquad (10)$$

Defining a loss function, Bayesian estimates are then obtained with their long-run and short-run properties known. All the frequentist properties of Bayes rules now apply; in particular, one can find conditions under which they are consistent and admissible. This approach is referred to as Mixture-based Bayesian model averaging (MBMA).
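To make (8) and (9) concrete, the MBMA posterior can be evaluated numerically on a grid. The Python sketch below uses a hypothetical setup: a single observation with unknown location μ, a normal and a Laplace sampling model, and two normal priors; all numbers are illustrative, not the paper's application.

```python
import math

def norm_pdf(z, m, s):
    return math.exp(-0.5 * ((z - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def laplace_pdf(z, m, b):
    return math.exp(-abs(z - m) / b) / (2 * b)

# Hypothetical setup: one observation x with unknown location mu.
x = 0.4
w = [0.5, 0.5]                                   # P(M_1), P(M_2)
priors = [lambda mu: norm_pdf(mu, 0.0, 1.0),     # pi_1(mu)
          lambda mu: norm_pdf(mu, 0.0, 2.0)]     # pi_2(mu)
liks = [lambda mu: norm_pdf(x, mu, 1.0),         # f_1(x | mu)
        lambda mu: laplace_pdf(x, mu, 1.0)]      # f_2(x | mu)

def unnorm_post(mu):
    # Product of the mixture sampling density (9) and the mixture prior (8).
    lik = sum(wk * f(mu) for wk, f in zip(w, liks))
    pri = sum(wk * p(mu) for wk, p in zip(w, priors))
    return lik * pri

# Normalize on a grid (Riemann sum) to obtain the posterior.
h = 0.01
grid = [-8.0 + i * h for i in range(1601)]
dens = [unnorm_post(mu) for mu in grid]
Z = sum(dens) * h
post = [d / Z for d in dens]
post_mean = sum(mu * p for mu, p in zip(grid, post)) * h
print(round(post_mean, 3))
```

For a one-dimensional quantity of interest, a grid evaluation such as this is enough; in higher dimensions the same posterior would typically be explored by Monte Carlo methods.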

Proposition 1. Under (8) and (9), and assuming that for all k and j, $\int f_k(x \mid \mu)\, \pi_j(\mu)\, d\mu < \infty$, the posterior distribution of μ is

$$\pi(\mu \mid x) = \frac{\sum_{k=1}^{K} \sum_{j=1}^{K} P(M_k)\, P(M_j)\, f_k(x \mid \mu)\, \pi_j(\mu)}{\sum_{k=1}^{K} \sum_{j=1}^{K} P(M_k)\, P(M_j) \int f_k(x \mid \mu)\, \pi_j(\mu)\, d\mu}. \qquad (11)$$

Proof. Since

$$f(x \mid \mu)\, \pi(\mu) = \sum_{k=1}^{K} \sum_{j=1}^{K} P(M_k)\, P(M_j)\, f_k(x \mid \mu)\, \pi_j(\mu) \qquad (a)$$

and

$$\int f(x \mid \mu)\, \pi(\mu)\, d\mu = \sum_{k=1}^{K} \sum_{j=1}^{K} P(M_k)\, P(M_j) \int f_k(x \mid \mu)\, \pi_j(\mu)\, d\mu, \qquad (b)$$

dividing (a) by (b) yields the result.

Corollary 2. Suppose that all the models have identical sampling distributions, that is, $f_k(x \mid \mu) = f(x \mid \mu)$ for all k. Then the MBMA posterior coincides with the BMA posterior, so that the posterior mean and variance are given by Equations (6) and (7).

Proof.

In the numerator of (11), $\sum_{k=1}^{K} P(M_k)\, f_k(x \mid \mu) = f(x \mid \mu)$, since the sampling distributions are identical and the $P(M_k)$ sum to one.

The numerator of (11) is therefore $f(x \mid \mu) \sum_{j=1}^{K} P(M_j)\, \pi_j(\mu)$.

Therefore, the denominator of (11) is

$$\int f(x \mid \mu) \sum_{j=1}^{K} P(M_j)\, \pi_j(\mu)\, d\mu = \sum_{j=1}^{K} P(M_j)\, m_j(x),$$

a mixture of marginal distributions, where $m_j(x) = \int f(x \mid \mu)\, \pi_j(\mu)\, d\mu$ is the marginal (integrated) likelihood under $M_j$.

Therefore

$$\pi(\mu \mid x) = \sum_{j=1}^{K} \frac{P(M_j)\, m_j(x)}{\sum_{l=1}^{K} P(M_l)\, m_l(x)}\, \pi_j(\mu \mid x) = \sum_{j=1}^{K} P(M_j \mid x)\, \pi_j(\mu \mid x),$$

where $\pi_j(\mu \mid x) = f(x \mid \mu)\, \pi_j(\mu) / m_j(x)$ is the posterior under $M_j$; this is the BMA posterior.

Thus, in this special case, the posterior mean and variance using MBMA are those of BMA given in Equations (6) and (7).
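Corollary 2 can be checked numerically: with a common sampling density and two different priors, the grid-evaluated MBMA posterior coincides with the BMA mixture of per-model posteriors. A Python sketch with hypothetical normal ingredients:

```python
import math

def norm_pdf(z, m, s):
    return math.exp(-0.5 * ((z - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

x, w = 0.4, [0.5, 0.5]                     # observation, P(M_1), P(M_2)
f = lambda mu: norm_pdf(x, mu, 1.0)        # common sampling density
priors = [lambda mu: norm_pdf(mu, 0.0, 1.0),
          lambda mu: norm_pdf(mu, 0.0, 2.0)]

h = 0.01
grid = [-10.0 + i * h for i in range(2001)]

# MBMA posterior: f(x|mu) * sum_j w_j pi_j(mu), normalized on the grid.
mb = [f(mu) * sum(wj * p(mu) for wj, p in zip(w, priors)) for mu in grid]
Zmb = sum(mb) * h
mbma = [v / Zmb for v in mb]

# BMA posterior: marginals m_j(x), weights P(M_j|x), mixture of posteriors.
marg = [sum(f(mu) * p(mu) for mu in grid) * h for p in priors]
pw = [wj * mj for wj, mj in zip(w, marg)]
pw = [v / sum(pw) for v in pw]             # P(M_j | x)
bma = [sum(pwj * f(mu) * p(mu) / mj
           for pwj, p, mj in zip(pw, priors, marg))
       for mu in grid]

diff = max(abs(a - b) for a, b in zip(mbma, bma))
print(diff)  # essentially zero, up to floating-point error
```

The two curves agree identically because, with a common likelihood, the mixture prior factors out of the posterior exactly as in the proof above.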

Evaluating the long-run properties of MBMA involves studying frequentist issues, including asymptotic methods, consistency, efficiency, unbiasedness, and admissibility. Details about derivations for more general Bayes estimates can be found e.g. in Gelman [ ]. Denote by μ_{0} the value of the parameter that makes the model distribution closest (e.g. in the sense of Kullback–Leibler information) to the true distribution.

1) If the sample size is large and the posterior distribution is unimodal and roughly symmetric, it can be approximated by a normal distribution.

2) If the likelihood $f(x \mid \mu)$ is a continuous function of μ and μ_{0} is not on the boundary of the parameter space, then as the sample size n tends to ∞, the posterior distribution of μ approaches normality with mean μ_{0} and variance $(n J(\mu_0))^{-1}$, where J denotes the Fisher information.

3) Suppose the normal approximation for the posterior distribution holds. If the true data distribution is included in the class of models, then μ_{0} equals the true parameter value, and the posterior distribution concentrates around it as n tends to ∞.

4) When the truth is included in the family of models, the Bayes estimate is consistent and asymptotically efficient.

5) If a prior distribution is proper, the resulting Bayes rule is admissible.

One measure of predictive performance is Good's logarithmic scoring rule [ ].

Applying this to MBMA leads to

$$-E\!\left[\log \left( \sum_{k=1}^{K} P(M_k)\, f_k(x \mid \mu) \right)\right] \;\le\; -E\!\left[\log f_j(x \mid \mu)\right], \qquad j = 1, \ldots, K,$$

where the expectation is taken with respect to the mixture distribution (9); the inequality follows from the non-negativity of the Kullback–Leibler divergence.

MBMA thus provides better expected predictive performance than any single model.
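One ingredient of this argument is the concavity of the logarithm (Jensen's inequality): the log score of the mixture is never below the weighted average of the individual models' log scores. A quick numerical illustration with hypothetical weights and predictive density values:

```python
import math

# Hypothetical model weights and predictive densities at one observed point.
w = [0.83, 0.17]
p = [0.31, 0.12]

# Log score of the mixture predictive vs. the weighted average of scores.
mixture_score = math.log(sum(wk * pk for wk, pk in zip(w, p)))
average_score = sum(wk * math.log(pk) for wk, pk in zip(w, p))
print(mixture_score >= average_score)  # True, by Jensen's inequality
```

The inequality holds for any weights and densities; the stronger statement in the display above additionally takes expectations under the mixture distribution itself.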

Laplace distribution:

$$f(x \mid \mu, b) = \frac{1}{2b} \exp\!\left(-\frac{|x - \mu|}{b}\right), \qquad b > 0.$$

The Laplace distribution (the double exponential) is symmetric with fat tails (much fatter than the normal). It is not bell-shaped (it has a peak at its mean μ).
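The peak and the fat tails are easy to see by comparing the Laplace density, scaled to unit variance (b = 1/√2, since the Laplace variance is 2b²), with the standard normal; a small illustrative Python check:

```python
import math

def laplace_pdf(z, b):
    """Laplace density with mean 0 and scale b."""
    return math.exp(-abs(z) / b) / (2 * b)

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)

b = 1 / math.sqrt(2)  # Laplace variance is 2*b^2, so this gives variance 1
for z in [0.0, 1.0, 3.0, 4.0]:
    print(z, round(laplace_pdf(z, b), 5), round(norm_pdf(z), 5))
```

At the center the Laplace density is sharply peaked above the normal, it dips below it on the shoulders, and far in the tails it dominates again by orders of magnitude.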

Suppose that the mean is known and that the quantity of interest is the remaining unknown parameter of the distribution.

Equal prior probabilities were assigned to M_{1} and M_{2}, i.e. 0.5 each; after observing the data, M_{1} is more likely to be true (0.83) than M_{2} (0.17). While M_{1}, M_{2} and MBMA each have priors (over the parameter of interest) and statistical models, BMA does not. This implies that the frequentist properties of MBMA can be automatically derived from Bayesian decision theory (see the subsection above); this is not possible for BMA. The Bayesian estimates (conditional on the observations) of these models are very similar, with MBMA having the smallest conditional variance (0.03).

In general, as with any Bayes estimate, the forms of the posterior mean and variance for MBMA are not known in advance; in the special case of Corollary 2, the properties of MBMA are those of BMA and are given in Equations (6) and (7). Posterior distributions under MBMA are very complex, so a major challenge is computational. The MBMA estimate is thus computationally demanding (but feasible), since the posterior typically has to be evaluated by numerical methods.

| Model | P(M_k) | Prior | Sampling distribution | P(M_k \| x) | Estimate | Variance |
|---|---|---|---|---|---|---|
| M_{1} | 0.5 | π_1 | f_1 | 0.83 | 0.373 | 0.033 |
| M_{2} | 0.5 | π_2 | f_2 | 0.17 | 0.345 | 0.052 |
| MBMA | NA | Mixture | Mixture | NA | 0.323 | 0.030 |
| BMA | NA | NA | NA | NA | 0.368 | 0.036 |
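The BMA row can be reproduced from the M_{1} and M_{2} rows via Equations (6) and (7), using the posterior model probabilities as weights:

```python
# Values taken from the table above.
w = [0.83, 0.17]        # posterior model probabilities P(M_k | x)
m = [0.373, 0.345]      # per-model posterior means
v = [0.033, 0.052]      # per-model posterior variances

# Equation (6): weighted posterior mean.
mean = sum(wk * mk for wk, mk in zip(w, m))
# Equation (7): within-model variance plus between-model spread.
var = sum(wk * (vk + mk ** 2) for wk, mk, vk in zip(w, m, v)) - mean ** 2
print(round(mean, 3), round(var, 3))  # 0.368 0.036
```

This matches the BMA entries of the table, confirming that the BMA estimate and variance are exactly the (6)-(7) combination of the single-model results.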

This paper proposes a new method (with an application) for model averaging in a Bayesian context (MBMA) when the main focus of a data analyst is on the long-run (frequentist) performance of the Bayesian estimator. The method is based on using a mixture of priors and sampling distributions for model averaging. When conditioning only on the data at hand, the popular Bayesian model averaging (BMA) may be preferable, given the computational complexity of MBMA. MBMA is especially useful for exploiting well-known frequentist properties within the framework of Bayesian decision theory.

We thank the editor and the referee for their comments on earlier versions of this paper.

Nguefack-Tsague, G. and Zucchini, W. (2016) A Mixture-Based Bayesian Model Averaging Method. Open Journal of Statistics, 6, 220-228. doi: 10.4236/ojs.2016.62019