
It is quite common in statistical modeling to select a model and then make inference as if the model had been known in advance, i.e. ignoring model selection uncertainty. The resulting estimator is called the post-model-selection estimator (PMSE), whose properties are hard to derive. Conditionally on the data at hand (as is usually the case), Bayesian model selection is free of this phenomenon. This paper is concerned with the properties of the Bayesian estimator obtained after model selection when the frequentist (long-run) performance of that estimator is of interest. The proposed method, based on Bayesian decision theory, builds on the well-known machinery of Bayesian model averaging (BMA) and outperforms both the PMSE and BMA. It is shown that if the unconditional model selection probability is equal to the model prior, then the proposed approach reduces to BMA. The method is illustrated using Bernoulli trials.

Statistical modeling usually deals with situations in which some quantity of interest is to be estimated from a sample of observations that can be regarded as realizations of some unknown probability distribution. To do so, it is necessary to specify a model for the distribution. There are usually many plausible alternative models available and, in general, they all lead to different estimates. Model uncertainty refers to the fact that it is not known which model correctly describes the probability distribution under consideration. A discussion of the issue of model uncertainty can be found, e.g., in Clyde and George.

In the frequentist approach, the estimator obtained after model selection is referred to as the post-model-selection estimator (PMSE), whose properties are difficult to derive (Berk et al.).

Bayesian model selection involves selecting the “best” model according to some selection criterion, most often the Bayesian information criterion (BIC), also known as the Schwarz criterion.

Conditionally on the data at hand (as is usually the case), Bayesian model selection is free of model selection uncertainty. Since Bayesian inference is mostly concerned with conditional inference, this phenomenon is often overlooked; it becomes relevant as soon as one is concerned with unconditional inference. Hence the motivation of this paper: to raise awareness of the fact that model selection uncertainty is present in Bayesian modeling when interest is focused on the frequentist performance of the Bayesian post-model-selection estimator (BPMSE).

The present paper is organized as follows: Section 2 presents the problem, while Section 3 highlights the difficulties of assessing the frequentist properties of BPMSEs. The new method for taking model selection uncertainty into account is presented in Section 4, and an application to Bernoulli trials is given in Section 5. The paper ends with concluding remarks.

Bayesian model selection (formal or informal) can be summarized by the following main steps:

1. Specify the quantity of interest.

2. Collect the data x.

3. Use x for exploratory data analysis.

4. From (3), specify a set of candidate models.

5. Use a model selection criterion and the data x to select a model (model uncertainty).

6. Specify a prior distribution for the parameters of the selected model.

7. Compute the posterior distribution of the quantity of interest.

8. Define a loss function.

9. Find the optimal decision rule; e.g. for squared error loss, the Bayes estimator is the posterior mean.

More on Bayesian theory can be found in Gelman et al.
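As a concrete sketch of the steps above, consider Bernoulli data with two candidate beta priors. Everything in the block below (the two priors, the equal model probabilities, and the data of 7 successes in 10 trials) is a hypothetical choice for illustration, not taken from the paper:

```python
import math

def log_marginal_bernoulli(s, n, a, b):
    """Log marginal likelihood of s successes in n Bernoulli trials
    under a Beta(a, b) prior: B(a + s, b + n - s) / B(a, b)."""
    def log_beta(p, q):
        return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return log_beta(a + s, b + n - s) - log_beta(a, b)

# Hypothetical example: 7 successes in 10 trials; two candidate priors.
s, n = 7, 10
models = {"M1": (1.0, 1.0),   # uniform prior on theta
          "M2": (5.0, 5.0)}   # prior concentrated near 0.5
prior = {"M1": 0.5, "M2": 0.5}

# Posterior model probabilities P(M_k | x) via Bayes' theorem.
joint = {k: prior[k] * math.exp(log_marginal_bernoulli(s, n, a, b))
         for k, (a, b) in models.items()}
total = sum(joint.values())
post = {k: v / total for k, v in joint.items()}

# Select the model with the higher posterior probability, then report
# the Bayes estimate (posterior mean) under the selected model.
best = max(post, key=post.get)
a, b = models[best]
theta_hat = (a + s) / (a + b + n)   # posterior mean of Beta(a+s, b+n-s)
print(best, round(post[best], 3), round(theta_hat, 3))
```

Note that selecting the highest-posterior model and then estimating under it is exactly the select-then-estimate scheme whose long-run behavior is studied in the remainder of the paper.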

The Bayesian post-model-selection estimator (BPMSE) is the Bayes estimator obtained after a model selection procedure has been applied. Here a squared error loss is considered, but the main idea remains unchanged for any other loss function. Given the selection procedure $S$, the BPMSE can be written as

$$\hat{\theta}^{S}(x)=\sum_{k=1}^{K}\hat{\theta}_{k}(x)\,\mathbf{1}\{S(x)=M_{k}\}, \qquad (1)$$

where $\hat{\theta}_{k}$ is the Bayes estimator under model $M_{k}$ and $\mathbf{1}\{\cdot\}$ denotes the indicator function.

Long-run performance of Bayes estimators: Usually, the goal of the analysis is to select a model for inference using some selection procedure, and one is interested in evaluating the long-run (frequentist) performance of the selected model. In general, Bayes estimators have good frequentist properties (e.g. Carlin and Louis).

Interest is focused on studying the frequentist properties of the BPMSE $\hat{\theta}^{S}$.

The frequentist risk: The frequentist risk of the BPMSE is defined as

$$R\big(\theta,\hat{\theta}^{S}\big)=E_{\theta}\Big[L\big(\theta,\hat{\theta}^{S}(X)\big)\Big],$$

where $L$ is a loss function. One can now see that this risk is difficult to compute; it is hard to prove admissibility and minimaxity properties of BPMSEs, since their associated priors are not known.
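Although the risk is hard to derive analytically, it can be approximated by simulation for a concrete selection rule. The sketch below estimates the squared-error frequentist risk of a select-then-estimate procedure for Bernoulli data; the two beta priors, the true value θ = 0.3, and n = 20 are illustrative assumptions, not taken from the paper:

```python
import math, random

def log_beta(p, q):
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def log_marginal(s, n, a, b):
    # log of B(a + s, b + n - s) / B(a, b): the Bernoulli marginal likelihood
    return log_beta(a + s, b + n - s) - log_beta(a, b)

def bpmse(s, n, models, prior):
    """Select the highest-posterior model, return its posterior mean."""
    logp = {k: math.log(prior[k]) + log_marginal(s, n, a, b)
            for k, (a, b) in models.items()}
    best = max(logp, key=logp.get)
    a, b = models[best]
    return (a + s) / (a + b + n)

def frequentist_risk(theta, n, models, prior, reps=20000, seed=1):
    """Monte Carlo estimate of E_theta[(BPMSE - theta)^2]."""
    rng = random.Random(seed)
    sq = 0.0
    for _ in range(reps):
        s = sum(rng.random() < theta for _ in range(n))
        sq += (bpmse(s, n, models, prior) - theta) ** 2
    return sq / reps

models = {"M1": (1.0, 1.0), "M2": (5.0, 5.0)}  # hypothetical priors
prior = {"M1": 0.5, "M2": 0.5}
risk = frequentist_risk(theta=0.3, n=20, models=models, prior=prior)
print(round(risk, 4))
```

Repeating this over a grid of θ values traces out the risk function of the BPMSE, which is how the comparisons in Section 5 can be reproduced.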

Coverage probabilities: When the data have been observed, one can construct a confidence region.

Suppose that after observing the data, model $M_{k}$ is selected,

and one then derives an approximate region at the $(1-\alpha)$ level,

$$\hat{\theta}_{k}\ \pm\ z_{1-\alpha/2}\,\widehat{\mathrm{se}}\big(\hat{\theta}_{k}\big),$$

where $z_{1-\alpha/2}$ is the corresponding standard normal quantile and $\widehat{\mathrm{se}}$ the estimated standard error.

A stochastic version (assuming normality) is given by

$$\hat{\theta}^{S}\ \pm\ z_{1-\alpha/2}\,\widehat{\mathrm{se}}\big(\hat{\theta}^{S}\big).$$

The coverage probability of the stochastic form is given by

$$P_{\theta}\Big(\hat{\theta}^{S}-z_{1-\alpha/2}\,\widehat{\mathrm{se}}\big(\hat{\theta}^{S}\big)\ \le\ \theta\ \le\ \hat{\theta}^{S}+z_{1-\alpha/2}\,\widehat{\mathrm{se}}\big(\hat{\theta}^{S}\big)\Big),$$

which is now difficult to evaluate, as it involves computing the variance and expectation of the BPMSE.
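The coverage probability can nonetheless be approximated by simulation: fix a true θ, generate data, apply the selection procedure, build the naive interval from the selected model's posterior mean and standard deviation, and count how often it contains θ. All models and parameter values below are hypothetical illustrations:

```python
import math, random

def log_beta(p, q):
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def select_and_estimate(s, n, models, prior):
    """Posterior mean and sd under the highest-posterior model."""
    logp = {k: math.log(prior[k]) + log_beta(a + s, b + n - s) - log_beta(a, b)
            for k, (a, b) in models.items()}
    a, b = models[max(logp, key=logp.get)]
    a1, b1 = a + s, b + n - s
    mean = a1 / (a1 + b1)
    var = a1 * b1 / ((a1 + b1) ** 2 * (a1 + b1 + 1))  # Beta variance
    return mean, math.sqrt(var)

def coverage(theta, n, models, prior, z=1.96, reps=20000, seed=2):
    """Long-run frequency with which the naive interval mean +/- z*sd
    (built as if the selected model were fixed in advance) covers theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        s = sum(rng.random() < theta for _ in range(n))
        m, sd = select_and_estimate(s, n, models, prior)
        hits += (m - z * sd <= theta <= m + z * sd)
    return hits / reps

models = {"M1": (1.0, 1.0), "M2": (5.0, 5.0)}  # hypothetical priors
prior = {"M1": 0.5, "M2": 0.5}
cov = coverage(0.3, 20, models, prior)
print(round(cov, 3))
```

A gap between this estimate and the nominal 95% level is one symptom of ignored model selection uncertainty.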

Consistency: Another frequentist property of Bayes estimators is consistency. It has been shown that, under appropriate regularity conditions, Bayes estimators are consistent (Bayarri and Berger).

In this framework, interest is focused on the long-run performance of BPMSEs, not on posterior evaluation, since the model selection uncertainty problem does not arise in posterior evaluation. Under model selection uncertainty, a fundamental ingredient in Equation (1) is the selection procedure S. This procedure should depend on the objective of the analyst and should be taken into account in modeling uncertainty at two levels: prior and posterior to the data analysis. In the following, we define the posterior quantity and derive the Bayesian post-model-selection estimator in a coherent way. The new method is referred to as adjusted Bayesian model averaging (ABMA).

The initial representation of model uncertainty is captured by the parameter priors and the prior over the model space; the selection procedure is then used to update the model prior. Formally, consider the candidate models $M_{1},\dots,M_{K}$ with prior probabilities $P(M_{1}),\dots,P(M_{K})$, where $\sum_{k=1}^{K}P(M_{k})=1$ and $P(M_{k})$ denotes the prior probability of the $k$-th model.

The true state of nature is that one of the candidate models is true; the decision here is to select a model. Given that model $M_{j}$ is true, the procedure $S$ selects model $M_{k}$ with probability $P(S=M_{k}\mid M_{j})$, so that the unconditional model selection probability is

$$P(S=M_{k})=\sum_{j=1}^{K}P(S=M_{k}\mid M_{j})\,P(M_{j}).$$

The expectation is taken with respect to the true model. If the procedure always identifies the true model, it selects $M_{j}$, with probability one, whenever $M_{j}$ holds. That is, if the selection procedure is perfect, the unconditional selection probabilities coincide with the model priors, $P(S=M_{k})=P(M_{k})$; on the other hand, if the procedure is imperfect, they differ from the model priors.

When the data have been observed, the posterior model selection probability for each model $M_{k}$ can be computed as $P(M_{k}\mid x)\propto p(x\mid M_{k})\,P(M_{k})$,

[Table: Nature and Decision — the true model (nature) against the selected model (decision); the table entries are not legible in this version.]

where $p(x\mid M_{k})=\int p(x\mid\theta_{k},M_{k})\,\pi(\theta_{k}\mid M_{k})\,d\theta_{k}$

is the marginal likelihood of model $M_{k}$.

Posterior distribution: After the data x have been observed, and given the selection procedure S, the law of total probability gives the posterior distribution of $\theta$ as

$$p(\theta\mid x,S)=\sum_{k=1}^{K}p(\theta\mid x,M_{k})\,w_{k}(x), \qquad (8)$$

where $w_{k}(x)=P(M_{k}\mid x,S)$.

Posterior mean and variance:

Proposition 1. Under Equation (8), the posterior mean and variance are given by

$$E(\theta\mid x,S)=\sum_{k=1}^{K}w_{k}(x)\,E(\theta\mid x,M_{k}),$$

$$\operatorname{Var}(\theta\mid x,S)=\sum_{k=1}^{K}w_{k}(x)\Big[\operatorname{Var}(\theta\mid x,M_{k})+\big\{E(\theta\mid x,M_{k})-E(\theta\mid x,S)\big\}^{2}\Big],$$

where $w_{k}(x)=P(M_{k}\mid x,S)$.

Proof. Under Equation (8), the posterior mean follows by integrating $\theta$ against the mixture, which yields the weighted average of the model-specific posterior means.

The posterior variance under Equation (8) follows from $E(\theta^{2}\mid x,S)=\sum_{k}w_{k}(x)\big[\operatorname{Var}(\theta\mid x,M_{k})+E(\theta\mid x,M_{k})^{2}\big]$ after subtracting $E(\theta\mid x,S)^{2}$.
The method can then be summarised as follows:

1. Compute the unconditional model selection probability $P(S=M_{k})$ for each model.

2. Combine these with the marginal likelihoods to obtain the adjusted weights $w_{k}(x)$.

3. Form the posterior in Equation (8) with the adjusted weights and report its mean and variance.

Note that if the unconditional model selection probability is equal to the model prior, then the proposed weights are the same as the BMA weights, namely the probability that each model is true given the data, $P(M_{k}\mid x)$.

A basic property: From the non-negativity of the Kullback-Leibler information divergence, it follows that

$$E\big[\log p(\theta\mid x,S)\big]\ \geq\ E\big[\log p(\theta\mid x,M_{k})\big],\qquad k=1,\dots,K,$$

where the expectation is taken with respect to the posterior distribution in Equation (8). This logarithmic scoring rule was suggested by Good.

For computational purposes, the unconditional model selection probabilities can be approximated by Monte Carlo simulation, where data sets are generated from the prior predictive distribution of each model, the selection procedure is applied to each simulated data set, and $P(S=M_{k})$ is estimated by the relative frequency with which model $M_{k}$ is selected.
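A minimal Monte Carlo sketch of this estimation is given below. It assumes, as one reading of the adjustment consistent with the reduction-to-BMA property noted above, that the adjusted weights are obtained by replacing the model prior with the estimated P(S = M_k) in the usual BMA weight formula; the priors, sample size, and data are all hypothetical:

```python
import math, random

def log_beta(p, q):
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def log_marginal(s, n, a, b):
    return log_beta(a + s, b + n - s) - log_beta(a, b)

def select(s, n, models, prior):
    """Choose the model with the higher posterior probability."""
    logp = {k: math.log(prior[k]) + log_marginal(s, n, *models[k])
            for k in models}
    return max(logp, key=logp.get)

def selection_probs(n, models, prior, reps=5000, seed=3):
    """Estimate P(S = M_k) by simulating from the prior predictive
    distribution and applying the selection procedure."""
    rng = random.Random(seed)
    counts = {k: 0 for k in models}
    for _ in range(reps):
        # draw a model from the prior, then theta, then the data
        k = rng.choices(list(prior), weights=list(prior.values()))[0]
        a, b = models[k]
        theta = rng.betavariate(a, b)
        s = sum(rng.random() < theta for _ in range(n))
        counts[select(s, n, models, prior)] += 1
    return {k: c / reps for k, c in counts.items()}

models = {"M1": (1.0, 1.0), "M2": (5.0, 5.0)}  # hypothetical priors
prior = {"M1": 0.5, "M2": 0.5}
p_sel = selection_probs(20, models, prior)

# Adjusted weights: the model prior is replaced by P(S = M_k) in the
# BMA weight formula (an assumed reading of the adjustment above).
s, n = 7, 20
joint = {k: p_sel[k] * math.exp(log_marginal(s, n, *models[k]))
         for k in models}
tot = sum(joint.values())
w = {k: v / tot for k, v in joint.items()}
print({k: round(v, 3) for k, v in w.items()})
```

If the estimated selection probabilities happen to equal the model priors, the weights w reduce to the ordinary BMA weights, matching the property stated above.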

Let $X_{1},\dots,X_{n}$ be Bernoulli trials with success probability $\theta$, and let $s=\sum_{i=1}^{n}x_{i}$ denote the number of successes.

The Bayes risk of an estimator $\delta$ is $r(\pi,\delta)=\int R(\theta,\delta)\,\pi(\theta)\,d\theta$, the frequentist risk averaged over the prior.

For some models, a beta prior, $\theta\sim\mathrm{Beta}(a,b)$, will be used for $\theta$; the posterior is then $\mathrm{Beta}(a+s,\,b+n-s)$, and its mean,

$$\frac{a+s}{a+b+n},$$

is the Bayes estimate of $\theta$ under squared error loss.
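In the beta-Bernoulli case this Bayes estimate has a closed form, so it is a one-line computation (the counts below are hypothetical):

```python
# Beta-Bernoulli conjugate update: a Beta(a, b) prior with s successes
# in n trials yields a Beta(a + s, b + n - s) posterior; the posterior
# mean is the Bayes estimate under squared error loss.
def bayes_estimate(s, n, a=1.0, b=1.0):
    return (a + s) / (a + b + n)

print(bayes_estimate(7, 10))          # uniform prior: (1 + 7) / 12
print(bayes_estimate(7, 10, 5, 5))    # Beta(5, 5) prior: (5 + 7) / 20
```

The stronger Beta(5, 5) prior pulls the estimate toward 0.5, which is the mechanism by which the candidate models disagree in the examples that follow.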

The various results obtained in this section are not sensitive to the variation of the different parameters. The R software was used for all computations.

(a) Consider the following two models:

The posterior model probabilities $P(M_{1}\mid x)$ and $P(M_{2}\mid x)$ follow from Bayes' theorem.

Model 1 is selected if $P(M_{1}\mid x)>P(M_{2}\mid x)$.

BMA corresponds to weighting the models by their posterior probabilities; the corresponding estimator is $\hat{\theta}^{\mathrm{BMA}}(x)=\sum_{k}P(M_{k}\mid x)\,\hat{\theta}_{k}(x)$.
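For a hypothetical data set (7 successes in 10 trials) and two illustrative beta priors, the BMA estimate and the post-model-selection estimate can be computed side by side:

```python
import math

def log_beta(p, q):
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

# Hypothetical setup: 7 successes in 10 trials, two beta priors.
s, n = 7, 10
models = {"M1": (1.0, 1.0), "M2": (5.0, 5.0)}
prior = {"M1": 0.5, "M2": 0.5}

# P(M_k | x) from the marginal likelihoods B(a+s, b+n-s) / B(a, b).
joint = {k: prior[k] * math.exp(log_beta(a + s, b + n - s) - log_beta(a, b))
         for k, (a, b) in models.items()}
tot = sum(joint.values())
post = {k: v / tot for k, v in joint.items()}

# Model-specific Bayes estimates (posterior means).
est = {k: (a + s) / (a + b + n) for k, (a, b) in models.items()}

theta_bma = sum(post[k] * est[k] for k in models)   # BMA estimate
theta_pms = est[max(post, key=post.get)]            # BPMSE
print(round(theta_bma, 4), round(theta_pms, 4))
```

The BMA estimate lies between the two model-specific estimates, while the BPMSE jumps to whichever one the selected model supplies.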

The BPMSE

For illustration of the case

(b) Consider the following two models:

Let the selection procedure consist of choosing the model with the higher posterior probability.

The parameters for simulating

(c) Consider the following two models:

Estimators for

(a) Consider also a choice between the following models:

(b) Consider also a choice between the following models:

A good feature of the integrated risk is that it allows a direct comparison of estimators (since it is a single number). Consider a choice between the following models:

For each sample size (between 10 and 200), the integrated risk is computed and a comparison of the estimators is given in
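The integrated risk can itself be approximated by Monte Carlo: draw a model from its prior, draw θ from that model's prior, generate data, and average the squared error of each estimator. The sketch below compares the BMA estimator with the post-model-selection estimator under illustrative priors and a single sample size; the adjusted (ABMA) weights would be handled analogously:

```python
import math, random

def log_beta(p, q):
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def post_model_probs(s, n, models, prior):
    joint = {k: prior[k] * math.exp(log_beta(a + s, b + n - s) - log_beta(a, b))
             for k, (a, b) in models.items()}
    tot = sum(joint.values())
    return {k: v / tot for k, v in joint.items()}

def integrated_risks(n, models, prior, reps=20000, seed=4):
    """Monte Carlo integrated (Bayes) risk, under squared error loss,
    of the BMA and the post-model-selection estimators."""
    rng = random.Random(seed)
    r_bma = r_pms = 0.0
    for _ in range(reps):
        # draw the true model, then theta, then the data
        k = rng.choices(list(prior), weights=list(prior.values()))[0]
        a, b = models[k]
        theta = rng.betavariate(a, b)
        s = sum(rng.random() < theta for _ in range(n))
        post = post_model_probs(s, n, models, prior)
        est = {m: (aa + s) / (aa + bb + n) for m, (aa, bb) in models.items()}
        bma = sum(post[m] * est[m] for m in models)
        pms = est[max(post, key=post.get)]
        r_bma += (bma - theta) ** 2
        r_pms += (pms - theta) ** 2
    return r_bma / reps, r_pms / reps

models = {"M1": (1.0, 1.0), "M2": (5.0, 5.0)}  # hypothetical priors
prior = {"M1": 0.5, "M2": 0.5}
r_bma, r_pms = integrated_risks(20, models, prior)
print(round(r_bma, 5), round(r_pms, 5))
```

Under this hierarchical setup the BMA estimator is the exact Bayes rule, so its integrated risk cannot exceed that of the select-then-estimate rule; looping over several sample sizes reproduces the kind of comparison described above.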

This paper has proposed a new method of assigning weights for model averaging in a Bayesian approach when the frequentist properties of the estimator obtained after model selection are of interest. It was shown via Bernoulli trials that the new method performs better than the Bayesian post-model-selection and Bayesian model averaging estimators in terms of both the risk function and the integrated risk. The method needs to be applied in more realistic and varied situations before it can be fully validated. In addition, further investigation is necessary to derive its theoretical properties, including large-sample theory.

The authors thank the Editor and the referee for their comments on earlier versions of this paper.

Nguefack-Tsague, G. and Zucchini, W. (2016) Effects of Bayesian Model Selection on Frequentist Performances: An Alternative Approach. Applied Mathematics, 7, 1103-1115. doi: 10.4236/am.2016.710098