Bayes Factor with Lindley Paradox and Two Standard Methods in Model Selection

Model selection is necessary for almost any statistical analysis, and in many selection procedures the Bayes factor is one of the basic building blocks. For the one-sided hypothesis testing problem, the agreement between frequentist and Bayesian evidence extends to the generalized p-value, and we study the agreement between the generalized p-value and the posterior probability of the null hypothesis. For the point null hypothesis testing problem, the Bayesian evidence under the traditional testing method, namely the Bayes factor or the posterior probability of the point null hypothesis, can be at odds with the classical frequentist evidence of the p-value, a phenomenon known as the Lindley paradox. Many statisticians have worked on this problem from both the frequentist and the Bayesian perspective. In this paper, I focus on the Bayesian approach to model selection, starting from Bayes factors and moving to the Lindley paradox, with brief remarks on the partial and fractional Bayes factors; my aim is to consider this paradox in a simple way. In addition, a detailed derivation of BIC and AIC is given in Section 4. The guiding principle for selecting the optimal model is to weigh two aspects: one is to maximize the likelihood function, the other is to minimize the number of unknown parameters in the model. The larger the likelihood, the better the fit; but we cannot measure a model by fitting accuracy alone, since that drives more and more unknown parameters into the model, and the increasingly complex model would overfit. Therefore, a good model should balance fitting accuracy against the number of unknown parameters.


Introduction
Many statisticians are naturally involved in the question of model selection [1]. In order to define the "best model" to fit real data, different approaches have been proposed since the last century, including such well-known methods as the F-test [2], AIC, BIC [3], and Bayesian model averaging [4]. We focus on the Bayesian approach, in which we analyze data under some possible models $M_1, \ldots, M_k$. We denote by $\theta \in \Theta$ the parameter and by $\pi(\theta)$ its prior density; the likelihoods $p(D \mid \theta, M_i)$ then determine the marginal likelihood of each model. However, the Bayes factor has its own limitations: by itself it can only show how one hypothesized model fares against a null model [5]. Also, the Bayes factor is closely tied to the priors: if we change the width of the prior, the Bayes factor changes as well. At this point we need to consider the Lindley paradox.
In Section 2 we give a simple and general explanation of the Bayes factor. In Section 3 we discuss Lindley's paradox. Section 4 contains the main theoretical part, the derivations of AIC and BIC, together with a simple example of their use.

Bayes Factor
Before anything else, we first construct one of the most important quantities in Bayesian methods, the Bayes factor [6].
Suppose we have data $D$, a parameter $\theta$ with prior $\pi(\theta)$, and two different models $M_1$ and $M_2$.
By the rule of conditional probability we have

$$\frac{P(M_1 \mid D)}{P(M_2 \mid D)} = \frac{P(D \mid M_1)}{P(D \mid M_2)} \cdot \frac{P(M_1)}{P(M_2)}.$$

The Bayesian method fits many testing problems because, in contrast to p-values [7], which are usually regarded only as a measure of evidence against the null hypothesis [8], it can provide decisive evidence in favour of the null model. The Bayes factor (Jeffreys, 1961) [9] is the tool used in Bayesian hypothesis testing. Assuming that the marginal likelihood of model $M_i$ is

$$p(D \mid M_i) = \int p(D \mid \theta_i, M_i)\, \pi(\theta_i \mid M_i)\, d\theta_i,$$

the Bayes factor for $M_1$ against $M_2$ is the ratio $B_{12} = p(D \mid M_1) / p(D \mid M_2)$.
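As a minimal numerical sketch of this definition (not taken from the paper; the normal setup, function names, and values are illustrative), assume a single observation $x \sim N(\theta, \sigma^2)$, a point null $H_0: \theta = \theta_0$, and a $N(\theta_0, \tau^2)$ prior under the alternative, so both marginal likelihoods have closed forms:

```python
import numpy as np
from scipy import stats

def bayes_factor_01(x, theta0=0.0, sigma=1.0, tau=1.0):
    """Bayes factor B01 for H0: theta = theta0 against
    H1: theta ~ N(theta0, tau^2), given one observation x ~ N(theta, sigma^2)."""
    # Marginal likelihood under H0: x ~ N(theta0, sigma^2).
    m0 = stats.norm.pdf(x, loc=theta0, scale=sigma)
    # Marginal likelihood under H1: integrating theta out gives
    # x ~ N(theta0, sigma^2 + tau^2).
    m1 = stats.norm.pdf(x, loc=theta0, scale=np.sqrt(sigma**2 + tau**2))
    return m0 / m1

# An observation about two standard errors away from theta0:
print(bayes_factor_01(x=1.96))   # B01 < 1: mild evidence against H0
```

Values of $B_{12}$ above 1 favour $M_1$ and values below 1 favour $M_2$; Jeffreys [9] gives conventional thresholds for how strong such evidence is.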

Introduction to the Lindley Paradox
Lindley's paradox shows how a p-value (or the number of standard deviations) used in a frequentist test [12] can lead to a completely different inference from a Bayesian hypothesis test [13]. Problems arise when we face improper priors (priors that do not integrate to one) in null hypothesis testing and model selection. Such priors can be acceptable for other purposes, such as estimation, but not here. Consider testing the hypotheses $H_0: \theta = \theta_0$ against $H_1: \theta \neq \theta_0$ with an improper prior $\pi(\theta) \propto z$ under the alternative: since the constant $z$ is arbitrary, we can choose different values of $z$ to change the posterior odds arbitrarily.
Meanwhile, using proper but vague priors can cause similar problems, because the marginal probability of the data under a complex model with a diffuse prior will be very small. So one thing we must keep in mind when working with Bayes factors is that they automatically favour the clearer and simpler model. This behaviour is what is called the Lindley paradox.
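The effect of widening a proper prior can be seen directly in the illustrative normal setup sketched above (again, the numbers are only for demonstration):

```python
import numpy as np
from scipy import stats

# Fix a "significant" observation: x is 2.5 standard errors from theta0.
x, theta0, sigma = 2.5, 0.0, 1.0

for tau in [1.0, 10.0, 100.0, 1000.0]:
    m0 = stats.norm.pdf(x, theta0, sigma)
    m1 = stats.norm.pdf(x, theta0, np.sqrt(sigma**2 + tau**2))
    # For large tau, B01 grows roughly linearly in tau: the diffuse prior
    # "dilutes" the alternative, so the same data support H0 more and more.
    print(f"tau = {tau:7.1f}   B01 = {m0 / m1:8.2f}")
```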

A Simple Model of the Lindley Paradox
Many authors [14] have discussed this so-called paradox [15] in different ways [16], so I want to find a simple way to consider the problem. The usual point null hypothesis testing problem is to test

$$H_0: \theta = \theta_0 \quad \text{against} \quad H_1: \theta \neq \theta_0.$$

The Bayes factor is given by

$$B_{01} = \frac{p(x \mid \theta_0)}{\int p(x \mid \theta)\, \pi(\theta)\, d\theta}.$$

In order to consider the paradox, we can formalise it and compare the two following normal models. Consider a physical system in which a quantity $X$ may be measured, and assume $X \sim N(\theta, \sigma^2)$ with known $\sigma$, so that $H_0$ fixes $\theta = \theta_0$ while $H_1$ leaves $\theta$ free.
We use $\sigma$ to define both priors. The prior probability of the null hypothesis is $\rho_0 = \rho(\sigma)$, where $\rho_0$ is allowed to depend on $\sigma$.
Computing the posterior odds of the null hypothesis $H_0$ gives

$$\frac{P(H_0 \mid x)}{P(H_1 \mid x)} = \frac{p(x \mid H_0)}{p(x \mid H_1)} \cdot \frac{\rho_0}{1 - \rho_0}.$$

Then we centre the prior under the alternative at $\theta_0$ and spread the remaining prior probability as a normal distribution with variance $\tau^2$, so that $\theta \mid H_1 \sim N(\theta_0, \tau^2)$. Evaluating the conditional probabilities,

$$p(x \mid H_0) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x - \theta_0)^2}{2\sigma^2} \right), \qquad p(x \mid H_1) = \frac{1}{\sqrt{2\pi(\sigma^2 + \tau^2)}} \exp\!\left( -\frac{(x - \theta_0)^2}{2(\sigma^2 + \tau^2)} \right).$$

With these expressions in hand, we can return to the prior $\rho(\sigma)$. Our approach is to measure the value of the alternative hypothesis relative to the null. Asymptotically, if the model is incorrectly specified, the posterior accumulates on the model in the family that is closest to the true model in Kullback-Leibler divergence [17]; this divergence therefore represents the loss incurred. Because the prior is chosen before the data are seen, the expected loss is the prior expectation of this divergence. The model prior represents the loss attached to a probability statement through the self-information loss function $-\log \rho$, and this determines the prior on the alternative model. Taking the prior of the null hypothesis as $\rho(\sigma) \propto 1$, the posterior odds then follow from the expressions above.
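To see the paradox numerically, here is a short sketch under the assumptions above ($\sigma$, $\tau$, and $\rho_0$ are illustrative; the data are pinned exactly at the 5% significance boundary while the sample size grows):

```python
import numpy as np
from scipy import stats

theta0, sigma, tau, rho0 = 0.0, 1.0, 1.0, 0.5

for n in [10, 100, 10_000, 1_000_000]:
    se = sigma / np.sqrt(n)        # standard error of the sample mean
    xbar = theta0 + 1.96 * se      # two-sided p-value fixed at 0.05
    m0 = stats.norm.pdf(xbar, theta0, se)
    m1 = stats.norm.pdf(xbar, theta0, np.sqrt(se**2 + tau**2))
    post_h0 = rho0 * m0 / (rho0 * m0 + (1 - rho0) * m1)
    print(f"n = {n:9d}   p-value = 0.05   P(H0 | data) = {post_h0:.3f}")
```

The frequentist test rejects $H_0$ at the 5% level for every $n$, while the posterior probability of $H_0$ tends to one: exactly the disagreement described above.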

Derivation of BIC
In this section we discuss the basic idea [18] of how the BIC (Bayesian information criterion) is constructed and give a derivation of BIC [4].
As shown in Section 2, model comparison rests on the marginal likelihood $p(D \mid M_i) = \int p(D \mid \theta_i, M_i)\, \pi(\theta_i \mid M_i)\, d\theta_i$. For $n$ observations $y_1, \ldots, y_n$, a Laplace approximation of this integral around the maximum likelihood estimate $\hat{\theta}_i$ gives

$$\log p(D \mid M_i) \approx \log p(D \mid \hat{\theta}_i, M_i) - \frac{k_i}{2} \log n + \frac{k_i}{2} \log 2\pi - \frac{1}{2} \log \lvert I_{\theta_i} \rvert + \log \pi(\hat{\theta}_i),$$

where $I_{\theta_i}$ is the Fisher information matrix for a single data point $y_1$ and $k_i$ is the number of free parameters of $M_i$. Dropping the terms that remain bounded as $n \to \infty$ and substituting, we finally get for BIC:

$$\mathrm{BIC}_i = -2 \log p(D \mid \hat{\theta}_i, M_i) + k_i \log n.$$
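As a quick numerical check of this approximation (a sketch under the Section 3 normal model with known $\sigma$; the seed and sample size are arbitrary), the BIC difference should match $-2 \log B_{01}$ up to a bounded term:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, sigma, tau = 500, 1.0, 1.0
y = rng.normal(0.1, sigma, size=n)
ybar, se = y.mean(), sigma / np.sqrt(n)

# Exact log Bayes factor via the sufficient statistic ybar (the density of
# the data given ybar is the same under both models, so it cancels):
log_b01 = (stats.norm.logpdf(ybar, 0.0, se)
           - stats.norm.logpdf(ybar, 0.0, np.sqrt(se**2 + tau**2)))

# BIC difference: M0 fixes theta = 0 (k = 0), M1 frees theta (k = 1).
loglik0 = stats.norm.logpdf(y, 0.0, sigma).sum()
loglik1 = stats.norm.logpdf(y, ybar, sigma).sum()
delta_bic = (-2 * loglik0) - (-2 * loglik1 + np.log(n))

print(f"-2 log B01        = {-2 * log_b01:8.2f}")
print(f"BIC(M0) - BIC(M1) = {delta_bic:8.2f}   # agree up to an O(1) term")
```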

Derivation of AIC
We can measure the quality of $p_j(y)$ (as an estimate of the true density $p$) by the Kullback-Leibler distance [19]:

$$\mathrm{KL}(p, p_j) = \int p(y) \log \frac{p(y)}{p_j(y)}\, dy.$$
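The step from this distance to AIC can be sketched as follows (this is an outline of the standard argument rather than this paper's own algebra):

$$\mathrm{KL}(p, p_j) = \underbrace{\int p(y) \log p(y)\, dy}_{\text{the same for every model}} \;-\; \mathbb{E}_p\!\left[ \log p_j(y \mid \theta_j) \right],$$

so minimising the distance over candidate models is equivalent to maximising the expected log-likelihood. Its naive estimate, the maximised sample log-likelihood $\log p(y \mid \hat{\theta}_j)$, overestimates $n$ times the expected log-likelihood by approximately $k_j$, the number of parameters; correcting this bias and multiplying by $-2$ gives

$$\mathrm{AIC}_j = -2 \log p(y \mid \hat{\theta}_j) + 2 k_j.$$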

Example of Simple Model
Let us consider again the example in Section 3. If we take data $y_1, \ldots, y_n$ from the normal model there, both criteria can be evaluated directly; AIC estimates a constant plus the relative Kullback-Leibler distance between the fitted model and the unknown true likelihood function.
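The following sketch shows how AIC and BIC behave on this model (all numbers are illustrative; $\sigma$ is known, $M_0$ fixes $\theta = 0$ and $M_1$ estimates it):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, sigma = 1000, 1.0
y = rng.normal(0.05, sigma, size=n)   # a small, hard-to-detect effect
ybar = y.mean()

loglik0 = stats.norm.logpdf(y, 0.0, sigma).sum()    # M0: theta = 0,  k = 0
loglik1 = stats.norm.logpdf(y, ybar, sigma).sum()   # M1: theta free, k = 1

aic0, aic1 = -2 * loglik0, -2 * loglik1 + 2
bic0, bic1 = -2 * loglik0, -2 * loglik1 + np.log(n)

# AIC prefers M1 once n * ybar**2 > 2, a fixed threshold like a z-test at
# |z| > sqrt(2); BIC requires n * ybar**2 > log(n), mirroring the Bayes factor.
print(f"AIC: M0 = {aic0:.1f}, M1 = {aic1:.1f}")
print(f"BIC: M0 = {bic0:.1f}, M1 = {bic1:.1f}")
```

This makes the contrast with Section 3 concrete: AIC behaves like a significance test with a fixed critical value, while BIC, like the Bayes factor, raises the bar as $n$ grows.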

Conclusion
The question of how to choose a best model, and what a best model even is, is hard to define. More precisely, the controversy has existed for a long time, and no doubt it will continue. In this paper we have discussed the Bayes factor in hypothesis testing. The Bayes factor is clearly being used in more and more fields of statistical research. As standard methods related to the Bayes factor, AIC and BIC are what we would consider using for model selection. However, we should also notice that all of these methods have their own limitations, such as the sensitivity to priors seen in Lindley's paradox. Even though both frequentist and Bayesian statisticians have come up with different new ideas, these remain hard for everyone else to implement or understand. Moreover, from a statistical point of view, a method also needs to be general enough to apply widely. For Lindley's paradox, the partial Bayes factor avoids the sensitivity to priors by taking a minimal training sample from the data set to construct the prior and then applying it to the rest of the data. The partial Bayes factor does, to some extent, reduce the influence of prior sensitivity, but finding the minimal training sample can itself be a hard problem. The same holds for the fractional Bayes factor: even though it improves the way the training data are chosen for the partial Bayes factor, it still has many limitations that we need to consider.