Extending the Behrens-Fisher Problem to Testing Equality of Slopes in Linear Regression: The Bayesian Approach

Testing the equality of the means of two normally distributed random variables when their variances are unequal is known in the statistical literature as the “Behrens-Fisher problem”. Posterior distributions of the parameters of interest are the basic ingredient of Bayesian statistical inference, and routine implementation of procedures based on them requires simple and efficient approaches. Because the exact posterior distribution for the Behrens-Fisher problem can be computed only by numerical integration, several approximations are discussed and compared. Tests and Bayesian highest posterior density (H.P.D.) intervals based upon these approximations are discussed. We extend the proposed approximations to the test of parallelism in simple linear regression models.


Introduction
Suppose that $x_1$ and $x_2$ are two independent normal random variables with means $\mu_1$ and $\mu_2$, and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Samples of sizes $n_1$ and $n_2$ drawn from the corresponding populations are denoted by $x_{ij}$ ($i = 1, 2$ and $j = 1, 2, \ldots, n_i$). It is desired to test the hypothesis $H_0: \mu_1 = \mu_2$ when the variance ratio $\sigma_1^2/\sigma_2^2$ cannot be taken as known. Many solutions to this problem have been proposed, and it is not the purpose of this paper to list or survey these solutions. However, the Bayesian approach to this problem, viewed as one of the most fascinating approaches to statistical inference on the means of heterogeneous normal populations, will be the main focus of this paper. We shall also give special attention to the problem of testing parallelism of two linear regression lines when the variances of the error terms are not equal.
The exact Bayesian solution to the Behrens-Fisher (BF) problem was given by Jeffreys [1], and it is identical to the solution of Behrens [2] and Fisher [3]. The resulting posterior distribution of $U = \mu_1 - \mu_2$, which is the basic ingredient of a valid Bayesian procedure, has a form requiring an integration that cannot be performed analytically; hence, direct and routine implementation of this test is not possible. Although such an integral can be evaluated numerically, using the approaches described by Reilly [4] and Naylor and Smith [5], or by Monte Carlo integration as given in Robert and Casella [6], it is of interest to provide expressions for routine applications under the Bayesian approach.
The paper has two chief objectives. First, we present a comparison among several approximations to the tails of the posterior distribution of the variate $U = \mu_1 - \mu_2$, where $\mu_j$ is the mean of the $j$th population. Second, we extend the methodologies to address the question of parallelism, or equality of slopes, when the variances of the error terms of the two regression lines are not equal. Recommendations follow the examples in the last sections.

Testing Equality of Normal Means: Posterior Analysis
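Based on the sample outcomes, the usual sufficient statistics are defined as follows:

$$\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij} \quad\text{and}\quad (n_i - 1)\,s_i^2 = \sum_{j=1}^{n_i}\left(x_{ij} - \bar{x}_i\right)^2, \qquad i = 1, 2.$$

The likelihood function of the combined data from two samples drawn independently from two normal populations with means $\mu_1$ and $\mu_2$, and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, is proportional to:

$$\prod_{i=1}^{2}\sigma_i^{-n_i}\exp\left\{-\frac{(n_i - 1)s_i^2 + n_i(\mu_i - \bar{x}_i)^2}{2\sigma_i^2}\right\}. \tag{2.1}$$

Following Lee [7], we use the non-informative prior for the means and the standard deviations:

$$\pi(\mu_1, \mu_2, \sigma_1, \sigma_2) \propto \frac{1}{\sigma_1\sigma_2}. \tag{2.2}$$

From Box & Tiao [8], the joint posterior density of $(\mu_1, \mu_2, \sigma_1, \sigma_2)$ is the product of (2.1) and (2.2) and is then given by:

$$\pi(\mu_1, \mu_2, \sigma_1, \sigma_2 \mid x) \propto \prod_{i=1}^{2}\sigma_i^{-(n_i + 1)}\exp\left\{-\frac{(n_i - 1)s_i^2 + n_i(\mu_i - \bar{x}_i)^2}{2\sigma_i^2}\right\}. \tag{2.3}$$

Integrating $(\sigma_1, \sigma_2)$ out of (2.3) and transforming to $U = \mu_1 - \mu_2$, the marginal posterior density of $U$ is:

$$\pi(U \mid x) \propto \int_{-\infty}^{\infty}\left[1 + \frac{n_1(U + \mu_2 - \bar{x}_1)^2}{(n_1 - 1)s_1^2}\right]^{-n_1/2}\left[1 + \frac{n_2(\mu_2 - \bar{x}_2)^2}{(n_2 - 1)s_2^2}\right]^{-n_2/2} d\mu_2. \tag{2.4}$$

In Equation (2.4), $x$ represents the data vector on which the posterior density is conditioned.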
Equation (2.4) was obtained by Jeffreys [1]. In the Bayesian approach, inferences about $U$ are completely determined by (2.4), which is not amenable to simple manipulation that would allow tests on $U$ to be conducted in a routine fashion. Before applying some of the suggested approximations, we should understand the nature of the problem; to draw safe conclusions, we consider not only the marginal posterior density of $U$ but also the other components of (2.3). For this purpose, we state the following results without proof, since they are easily obtained by application of the calculus of probability:

1) The conditional posterior pdf for $U$, given $\mu_2$, is the univariate Student's $t$ density with location $\bar{x}_1 - \mu_2$, scale $s_1/\sqrt{n_1}$ and $n_1 - 1$ degrees of freedom, i.e.,

$$\frac{U - (\bar{x}_1 - \mu_2)}{s_1/\sqrt{n_1}} \,\Big|\, \mu_2, x \sim t_{n_1 - 1}. \tag{2.5}$$

2) The marginal posterior pdf for the variance ratio $\lambda = \sigma_1^2/\sigma_2^2$ is such that $(s_2^2/s_1^2)\lambda$ has the well-known Snedecor's $F$-distribution with $(n_2 - 1)$ and $(n_1 - 1)$ degrees of freedom, i.e.,

$$\frac{s_2^2}{s_1^2}\,\lambda \,\Big|\, x \sim F_{n_2 - 1,\; n_1 - 1}. \tag{2.6}$$

3) The conditional posterior pdf for $U$, given $\lambda$, is such that

$$t = \frac{U - (\bar{x}_1 - \bar{x}_2)}{w(\lambda)} \,\Big|\, \lambda, x \sim t_{n_1 + n_2 - 2}, \tag{2.7}$$

where

$$w^2(\lambda) = \left(\frac{\lambda}{n_1} + \frac{1}{n_2}\right)\frac{(n_1 - 1)s_1^2/\lambda + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$

When $s_1^2$ and $s_2^2$ do not differ much, the conditional distribution in (2.7) will remain nearly constant over a wide range of $\lambda$ values. In the next section we derive different approximations to deal with the situation when the range of $\lambda$ is wide enough to have a considerable effect on $t$.

Monte Carlo Integration (Exact Approximation)
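We shall write the posterior density of $U$ as follows:

$$\pi(U \mid x) = \int_0^\infty \pi(U \mid \lambda, x)\,\pi(\lambda \mid x)\,d\lambda. \tag{3.1a}$$

In (3.1a) we face the same problem as we have with (2.4); however, this equation will be the tool of discussion in the remainder of this section. As can be seen, evaluating the posterior density of $U$ amounts to evaluating the integral in (3.1a), which can be written as:

$$\pi(U \mid x) = E_{\lambda \mid x}\left[\pi(U \mid \lambda, x)\right]. \tag{3.1b}$$

Referring to Robert and Casella [6], the principle of the Monte Carlo method for approximating (3.1b) is to generate a sample $(\lambda_1, \lambda_2, \ldots, \lambda_n)$ from the density $\pi(\lambda \mid x)$ and to propose as an approximation the empirical average

$$\hat{\pi}(U \mid x) = \frac{1}{n}\sum_{k=1}^{n}\pi(U \mid \lambda_k, x).$$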
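The Monte Carlo average described above can be sketched in Python using only the standard library. This is a minimal illustration, not the authors' implementation: it assumes the convention $\lambda = \sigma_1^2/\sigma_2^2$, draws $\lambda$ through two independent chi-square variates, and uses the fact that, given $\lambda$, the posterior of $U$ is a scaled Student's $t$ with $n_1 + n_2 - 2$ degrees of freedom. The function names and the summary statistics are hypothetical.

```python
import math
import random


def t_pdf(z, df):
    """Density of the standard Student t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(z * z / df))


def draw_lambda(n1, s1_sq, n2, s2_sq, rng):
    """One posterior draw of lambda = sigma1^2 / sigma2^2 (assumed convention).

    Under the non-informative prior, sigma_i^2 is distributed a posteriori as
    (n_i - 1) * s_i^2 / chi-square(n_i - 1), independently for i = 1, 2.
    """
    v1, v2 = n1 - 1, n2 - 1
    chi1 = rng.gammavariate(v1 / 2, 2.0)  # chi-square draw with v1 d.f.
    chi2 = rng.gammavariate(v2 / 2, 2.0)  # chi-square draw with v2 d.f.
    return (v1 * s1_sq / chi1) / (v2 * s2_sq / chi2)


def conditional_scale(lam, n1, s1_sq, n2, s2_sq):
    """Scale of the Student t posterior of U given lambda (n1 + n2 - 2 d.f.)."""
    v1, v2 = n1 - 1, n2 - 1
    pooled = (v1 * s1_sq / lam + v2 * s2_sq) / (v1 + v2)
    return math.sqrt((lam / n1 + 1 / n2) * pooled)


def mc_density(u, diff, n1, s1_sq, n2, s2_sq, lambdas):
    """Empirical average of the conditional densities of U over the lambda draws."""
    df = n1 + n2 - 2
    total = 0.0
    for lam in lambdas:
        scale = conditional_scale(lam, n1, s1_sq, n2, s2_sq)
        total += t_pdf((u - diff) / scale, df) / scale
    return total / len(lambdas)


# Hypothetical summary statistics, for illustration only.
N1, S1_SQ, N2, S2_SQ, DIFF = 40, 100.0, 37, 170.0, -10.0
rng = random.Random(2024)
LAMBDAS = [draw_lambda(N1, S1_SQ, N2, S2_SQ, rng) for _ in range(500)]
print(mc_density(DIFF, DIFF, N1, S1_SQ, N2, S2_SQ, LAMBDAS))
```

Averaging conditional densities, rather than building a histogram of simulated values of $U$, gives a smooth estimate of the density at any point from the same set of $\lambda$ draws.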

Moments (Scale) Approximation
The second approximation to be considered is by the method of moments.
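While Patil [10] derived the posterior moments of $U$ using (2.4), they are identical to the posterior moments obtained by writing $U$ as a difference of two independent scaled Student's $t$ variates:

$$U = (\bar{x}_1 - \bar{x}_2) + \frac{s_1}{\sqrt{n_1}}\,t_{n_1 - 1} - \frac{s_2}{\sqrt{n_2}}\,t_{n_2 - 1}. \tag{3.4}$$

Denoting the $r$th central moment of $U$ by $m_r$, and its 4th cumulant by $l_4$, we can show from Equation (3.4) that the odd central moments vanish and that

$$m_2 = \sum_{i=1}^{2}\frac{n_i - 1}{n_i - 3}\,\frac{s_i^2}{n_i}.$$

The fourth cumulant $l_4$ is:

$$l_4 = m_4 - 3m_2^2 = \sum_{i=1}^{2}\frac{6(n_i - 1)^2}{(n_i - 3)^2(n_i - 5)}\left(\frac{s_i^2}{n_i}\right)^2.$$

Following Patil [10], we suggest the following approximation to the posterior distribution of $U$:

$$U - (\bar{x}_1 - \bar{x}_2) \sim a\,t_{[b]}. \tag{3.6}$$

Equating $m_2$ and $m_4$ to the second and fourth central moments of the right-hand side of (3.6) gives $b = 4 + 6m_2^2/l_4$ and $a^2 = m_2(b - 2)/b$. In the above equation $[b]$ means the smallest integer larger than $b$. Thus one may use tables of Student's $t$-distribution with $[b]$ degrees of freedom to make tests and construct approximate H.P.D. intervals for $U$. In other words:

$$\Pr\left\{(\bar{x}_1 - \bar{x}_2) - a\,t_{[b],\,\alpha/2} \le U \le (\bar{x}_1 - \bar{x}_2) + a\,t_{[b],\,\alpha/2} \,\Big|\, x\right\} \approx 1 - \alpha.$$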
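The moment matching behind the scale approximation can be sketched as follows. The sketch assumes that the posterior of $U$ is the difference of two independent scaled Student's $t$ variates with $n_i - 1$ degrees of freedom and scales $s_i/\sqrt{n_i}$, and it fits a scaled $t$, written here as $a\,t_b$, by equating second and fourth central moments. The function name and the input values are hypothetical.

```python
import math


def scaled_t_moment_fit(n1, s1_sq, n2, s2_sq):
    """Match a scaled Student t, a * t_b, to the posterior of U - (xbar1 - xbar2).

    Returns (a, b, m2, m4). Requires n1 > 5 and n2 > 5 so that the fourth
    moments of both t components exist.
    """
    m2 = 0.0  # second central moment of U
    l4 = 0.0  # fourth cumulant of U
    for n, s_sq in ((n1, s1_sq), (n2, s2_sq)):
        v = n - 1        # degrees of freedom of the t component
        c_sq = s_sq / n  # squared scale of the t component
        m2 += c_sq * v / (v - 2)
        l4 += 6.0 * c_sq ** 2 * v ** 2 / ((v - 2) ** 2 * (v - 4))
    m4 = l4 + 3.0 * m2 ** 2            # fourth central moment of U
    b = 4.0 + 6.0 * m2 ** 2 / l4       # degrees of freedom of the fitted t
    a = math.sqrt(m2 * (b - 2.0) / b)  # scale of the fitted t
    return a, b, m2, m4


# Hypothetical summary statistics, for illustration only.
a, b, m2, m4 = scaled_t_moment_fit(40, 100.0, 37, 170.0)
df_for_tables = math.floor(b) + 1  # smallest integer larger than b
print(a, b, df_for_tables)
```

When one sample variance is zero the fit returns the degrees of freedom and scale of the remaining $t$ component, which is a convenient sanity check on the formulas.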

Modal Approximation
Barnard [9] suggested that if the intention is to make inference on $U$ alone by integrating out $\lambda$, and it is found that $\pi(\lambda \mid x)$ is sharp, with most of its probability mass concentrated over a small region about its mode, denoted by $\lambda^*$, then performing the numerical integration in (3.1a) will be nearly equivalent to assigning the modal value to $\lambda$ in the conditional density $\pi(U \mid \lambda, x)$. Substituting $\lambda^*$ for $\lambda$ in (3.7), as a modal approximation to the posterior distribution of $U$ we take:

$$\pi(U \mid x) \approx \pi(U \mid \lambda^*, x). \tag{3.9}$$
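A small sketch of the modal plug-in follows. It assumes the convention $\lambda = \sigma_1^2/\sigma_2^2$, so that $\lambda$ is $s_1^2/s_2^2$ times an $F$ variate with $(n_2 - 1, n_1 - 1)$ degrees of freedom, and it takes the mode of the density of $\lambda$ itself (not of $\log\lambda$). The conditional scale is that of the Student's $t$ posterior of $U$ given $\lambda$ on $n_1 + n_2 - 2$ degrees of freedom. All names and numbers are hypothetical.

```python
import math


def lambda_mode(n1, s1_sq, n2, s2_sq):
    """Mode of the posterior density of lambda = sigma1^2 / sigma2^2.

    Assumed convention: lambda is (s1_sq / s2_sq) times an F variate with
    (n2 - 1, n1 - 1) degrees of freedom; the mode of F(d1, d2) is
    ((d1 - 2) / d1) * (d2 / (d2 + 2)) for d1 > 2.
    """
    d1, d2 = n2 - 1, n1 - 1
    return (s1_sq / s2_sq) * ((d1 - 2) / d1) * (d2 / (d2 + 2))


def conditional_scale(lam, n1, s1_sq, n2, s2_sq):
    """Scale of the Student t posterior of U given lambda (n1 + n2 - 2 d.f.)."""
    v1, v2 = n1 - 1, n2 - 1
    pooled = (v1 * s1_sq / lam + v2 * s2_sq) / (v1 + v2)
    return math.sqrt((lam / n1 + 1 / n2) * pooled)


# Hypothetical summary statistics, for illustration only.
N1, S1_SQ, N2, S2_SQ = 40, 100.0, 37, 170.0
lam_star = lambda_mode(N1, S1_SQ, N2, S2_SQ)
modal_scale = conditional_scale(lam_star, N1, S1_SQ, N2, S2_SQ)
# Scale of the usual large-sample normal approximation, shown for comparison.
large_sample_scale = math.sqrt(S1_SQ / N1 + S2_SQ / N2)
print(lam_star, modal_scale, large_sample_scale)
```

An interval for $U$ under this approximation is the difference of sample means plus or minus a Student's $t$ quantile on $n_1 + n_2 - 2$ degrees of freedom times `modal_scale`.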

Approximation Based on Averaging
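The following lemma due to Feller is quite appealing and may be used to approximate the integral in (3.1a): if $\phi$ is a bounded continuous function and the distribution of $\lambda$ concentrates about its mean, in the sense that its variance tends to zero, then the expectation of $\phi(\lambda)$ tends to $\phi$ evaluated at that mean. Since $\operatorname{Var}(\lambda \mid x) \to 0$ as $n_1, n_2 \to \infty$, then by taking $\phi(\lambda) = \pi(U \mid \lambda, x)$ the right-hand integral of (3.1a) can be approximated by:

$$\int_0^\infty \pi(U \mid \lambda, x)\,\pi(\lambda \mid x)\,d\lambda \approx \pi(U \mid \bar{\lambda}, x),$$

where $\bar{\lambda} = E(\lambda \mid x)$. Accordingly, one may take $\pi(U \mid \bar{\lambda}, x)$ as an approximating distribution to the posterior distribution of $U$.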
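The condition behind the averaging approximation, that the posterior variance of $\lambda$ shrinks as the sample sizes grow, can be checked numerically. The sketch below assumes the convention $\lambda = \sigma_1^2/\sigma_2^2$, so that $\lambda$ is $s_1^2/s_2^2$ times an $F$ variate with $(n_2 - 1, n_1 - 1)$ degrees of freedom, and uses the standard mean and variance of the $F$-distribution. The function name and inputs are hypothetical.

```python
def lambda_posterior_moments(n1, s1_sq, n2, s2_sq):
    """Posterior mean and variance of lambda = sigma1^2 / sigma2^2.

    Assumed convention: lambda = (s1_sq / s2_sq) * F(d1, d2) with
    d1 = n2 - 1 and d2 = n1 - 1. Requires n1 > 5 for a finite variance.
    """
    d1, d2 = n2 - 1, n1 - 1
    c = s1_sq / s2_sq
    mean = c * d2 / (d2 - 2)
    var = (c ** 2 * 2 * d2 ** 2 * (d1 + d2 - 2)
           / (d1 * (d2 - 2) ** 2 * (d2 - 4)))
    return mean, var


# Hypothetical sample variances; the variance of lambda falls as n grows.
for n in (10, 40, 160, 640):
    print(n, lambda_posterior_moments(n, 100.0, n, 170.0))
```

The posterior mean returned here is the value $\bar{\lambda}$ that would be plugged into the conditional density of $U$ given $\lambda$.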

Edgeworth Expansion
The Edgeworth expansion has been used extensively by many authors to approximate the density function of a statistic $\nu_n$. We refer to the paper by Barndorff-Nielsen and Cox [12] for an overview and comparison between these techniques. The Edgeworth expansion, due to Edgeworth [13] [14], for the density function of a standardized statistic $\nu_n$ at a point $c$ is given by the general formula:

$$f_{\nu_n}(c) \approx \varphi(c)\left[1 + \frac{\theta_1}{6}H_3(c) + \frac{\theta_2 - 3}{24}H_4(c) + \frac{\theta_1^2}{72}H_6(c)\right],$$

where $\varphi$ is the standard normal density, $H_3(c) = c^3 - 3c$, $H_4(c) = c^4 - 6c^2 + 3$ and $H_6(c) = c^6 - 15c^4 + 45c^2 - 15$ are Hermite polynomials, and $\theta_1$ and $(\theta_2 - 3)$ are respectively the coefficients of skewness and kurtosis for the density of $\nu_n$. As can be seen, when $\theta_1 = 0$ the skewness terms vanish and the density at $c$ is given by:

$$f_{\nu_n}(c) \approx \varphi(c)\left[1 + \frac{\theta_2 - 3}{24}\left(c^4 - 6c^2 + 3\right)\right]. \tag{3.12}$$

The Edgeworth expansion for the density of $U$ can easily be obtained from Equation (3.12) by using a linear transformation. Although we are not approximating the distribution function directly, in practice these approximations may be used for calculating tail areas. Thus it is of interest to see how these approximations for the posterior distribution of $U = \mu_1 - \mu_2$ perform at the tail.

Oncotype DX is a commercial assay used for making decisions regarding the treatment of breast cancer. The results are reported as a tumor recurrence score ranging from 0 to 100. Klein et al. [15] showed that the recurrence score correlated (among other genetic factors) with the tumor grade. That is, there was a significant difference between the mean recurrence score of patients with tumor grade 1 and that of patients with tumor grade 3. In this example we present summary data on the recurrence scores for samples of 40 and 37 women with breast cancer of tumor grade 1 and tumor grade 3, respectively. These data provide a good example for applying the proposed methodology. The following results were obtained: ... $s = 171.25$.

Approximations, discussed in Section 3, to the posterior density of $U = \mu_1 - \mu_2$ are applied to the above data. In addition, we consider the usual asymptotic normal approximation which, for large $n_1$ and $n_2$, is given, as suggested by Welch [16], as:

$$U \mid x \;\dot\sim\; N\left(\bar{x}_1 - \bar{x}_2,\; \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right).$$

The results of the data analyses based on the proposed approximations are presented in Table 1.
As we can see, all approximations have confidence limits close to the exact limits, probably because the sample sizes are moderately large. We provide the R code for the calculation of these limits in the Appendix, within the applications to linear regression.
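As a complement to the R code mentioned above, the large-sample normal limits and a kurtosis-corrected (Edgeworth-type) density can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: the posterior of $U$ is taken to be symmetric, so only the fourth-cumulant term of the expansion is kept, and its second moment and fourth cumulant are computed from two independent scaled Student's $t$ components. Function names and summary statistics are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def posterior_moments(n1, s1_sq, n2, s2_sq):
    """Second central moment and fourth cumulant of U (needs n1, n2 > 5)."""
    m2, l4 = 0.0, 0.0
    for n, s_sq in ((n1, s1_sq), (n2, s2_sq)):
        v, c_sq = n - 1, s_sq / n
        m2 += c_sq * v / (v - 2)
        l4 += 6.0 * c_sq ** 2 * v ** 2 / ((v - 2) ** 2 * (v - 4))
    return m2, l4


def normal_limits(diff, n1, s1_sq, n2, s2_sq, level=0.95):
    """Limits from the large-sample normal approximation to the posterior of U."""
    se = math.sqrt(s1_sq / n1 + s2_sq / n2)
    z = STD_NORMAL.inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


def edgeworth_density(u, diff, m2, l4):
    """Normal density for U with a fourth-cumulant (kurtosis) correction."""
    sd = math.sqrt(m2)
    z = (u - diff) / sd
    hermite4 = z ** 4 - 6 * z ** 2 + 3
    excess_kurtosis = l4 / m2 ** 2
    return STD_NORMAL.pdf(z) * (1 + excess_kurtosis * hermite4 / 24) / sd


# Hypothetical summary statistics, for illustration only.
m2, l4 = posterior_moments(40, 100.0, 37, 170.0)
print(normal_limits(-10.0, 40, 100.0, 37, 170.0))
print(edgeworth_density(-10.0, -10.0, m2, l4))
```

Tail areas under the corrected density can then be obtained by numerical integration over a grid of values of $U$.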

Extension of the Behrens-Fisher Problem to Testing Equality of Slopes of Two Independent Regression Lines
Linear and nonlinear regression models are ubiquitous in medical and biological research. Testing equality of slopes of two linear regression lines is of special interest. This is illustrated in the following application, which is a continuation of Example 1.
Table 1. Approximate 95% HPD for the difference in two means under heterogeneity using the breaking strength data.

Ki67 is a commonly used marker of cancer cell proliferation and has significant prognostic value for tumor recurrence in breast cancer. In this illustration, which is a continuation of Example 1, we use a sub-sample of women who had Oncotype DX testing performed and had available Ki67 indices, which correlated with tumor grade (1 versus 3). The literature documents that Ki67 scores contribute significantly to models that predict the risk of recurrence in breast cancer; see, for example, Cuzick et al. [17], Klein et al. [15], and, more recently, Thakur et al. [18]. In this example we examine Ki67 as a predictor of the tumor recurrence score in breast cancer for tumor grade 3 and grade 1 separately. Of interest is to test the equality of the slopes of the linear regressions of tumor recurrence score on Ki67 for the two grades. We use Bayesian methodology to answer this question.
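We shall use the following notation. The independent variable is denoted by $x$ and is used to predict values of the dependent variable, denoted by $y$.

Suppose that we have two linear regression lines; then, conditional on $x_{ij}$, we assume that

$$y_{ij} \mid x_{ij} \sim N\left(\alpha_j + \beta_j x_{ij},\; \phi_j\right), \qquad i = 1, \ldots, n_j,\quad j = 1, 2.$$

In general we assume that we have two conditions, and we would like to estimate the difference between the slopes, $\beta_1 - \beta_2$. In our Bayesian analysis we shall take a reference prior that is independently uniform in $\alpha_j$, $\beta_j$ and $\log\phi_j$, such that:

$$\pi(\alpha_j, \beta_j, \phi_j) \propto \frac{1}{\phi_j}, \qquad j = 1, 2.$$

Let us define the following quantities:

$$\bar{x}_j = \frac{1}{n_j}\sum_i x_{ij}, \quad \bar{y}_j = \frac{1}{n_j}\sum_i y_{ij}, \quad S_{xxj} = \sum_i (x_{ij} - \bar{x}_j)^2, \quad S_{xyj} = \sum_i (x_{ij} - \bar{x}_j)(y_{ij} - \bar{y}_j), \quad S_{yyj} = \sum_i (y_{ij} - \bar{y}_j)^2,$$

$$b_j = \frac{S_{xyj}}{S_{xxj}}, \quad a_j = \bar{y}_j - b_j\bar{x}_j, \quad S_{eej} = S_{yyj} - b_j S_{xyj}, \quad r_j = n_j - 2, \quad s_j^2 = \frac{S_{eej}}{r_j}.$$

For the two regression lines, the joint posterior of $(\alpha_1, \alpha_2, \beta_1, \beta_2, \phi_1, \phi_2)$ is:

$$\pi(\alpha_1, \alpha_2, \beta_1, \beta_2, \phi_1, \phi_2 \mid x, y) \propto \prod_{j=1}^{2}\phi_j^{-(n_j/2 + 1)}\exp\left\{-\frac{S_{eej} + n_j\left[(\alpha_j - a_j) + (\beta_j - b_j)\bar{x}_j\right]^2 + S_{xxj}(\beta_j - b_j)^2}{2\phi_j}\right\}.$$

Integrating out $\alpha_1$ and $\alpha_2$, the joint posterior of $(\beta_1, \beta_2, \phi_1, \phi_2)$ is thus given as:

$$\pi(\beta_1, \beta_2, \phi_1, \phi_2 \mid x, y) \propto \prod_{j=1}^{2}\phi_j^{-(n_j + 1)/2}\exp\left\{-\frac{S_{eej} + S_{xxj}(\beta_j - b_j)^2}{2\phi_j}\right\}.$$

Integrating out $\phi_1$ and $\phi_2$ we get:

$$\pi(\beta_1, \beta_2 \mid x, y) \propto \prod_{j=1}^{2}\left[1 + \frac{S_{xxj}(\beta_j - b_j)^2}{r_j s_j^2}\right]^{-(r_j + 1)/2}.$$

Under the transformation $\delta = \beta_1 - \beta_2$, one can show that the posterior density of $\delta$ is given by:

$$\pi(\delta \mid x, y) \propto \int_{-\infty}^{\infty}\left[1 + \frac{S_{xx1}(\delta + \beta_2 - b_1)^2}{r_1 s_1^2}\right]^{-(r_1 + 1)/2}\left[1 + \frac{S_{xx2}(\beta_2 - b_2)^2}{r_2 s_2^2}\right]^{-(r_2 + 1)/2} d\beta_2. \tag{4.4}$$

The Bayesian inferences on $\delta$ are completely determined by Equation (4.4), which we cannot easily manipulate in order to have statistical tests on $\delta$ conducted in a routine fashion. Moreover, it is clear from (4.4) that the exact marginal posterior distribution of $(\beta_j - b_j)\sqrt{S_{xxj}}/s_j$ is a Student's $t$-distribution with $r_j$ degrees of freedom.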
As in testing the equality of the means of two normal populations, shown in the first part of the paper, we use the suggested approximations to the integral in (4.4) to find the marginal posterior distribution of δ. However, it is helpful to examine not only the posterior density of δ but also the components of the joint posterior density that lead to (4.4).
For this purpose, we state the following results without proof, since they can be easily obtained by applications of the calculus of probability.
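In the results below, $b_j$, $S_{xxj}$, $s_j^2$ and $r_j = n_j - 2$ denote the least-squares slope, the corrected sum of squares of $x$, the residual mean square and its degrees of freedom for line $j$.

1) The conditional posterior pdf of $\delta$, given $\beta_2$, is a Student's $t$ density with location $b_1 - \beta_2$, scale $s_1/\sqrt{S_{xx1}}$ and $r_1$ degrees of freedom. The unconditional posterior mean and variance of $\delta$ are:

$$E(\delta \mid x, y) = b_1 - b_2, \qquad \operatorname{Var}(\delta \mid x, y) = \sum_{j=1}^{2}\frac{r_j}{r_j - 2}\,\frac{s_j^2}{S_{xxj}}.$$

2) The marginal posterior pdf of the variance ratio $\lambda = \phi_1/\phi_2$ is a scale multiple of the $F$-distribution. That is:

$$\frac{s_2^2}{s_1^2}\,\lambda \,\Big|\, x, y \sim F_{r_2,\; r_1}.$$

Therefore, the posterior moments of $\lambda$ are:

$$E(\lambda \mid x, y) = \frac{s_1^2}{s_2^2}\,\frac{r_1}{r_1 - 2}, \qquad \operatorname{Var}(\lambda \mid x, y) = \left(\frac{s_1^2}{s_2^2}\right)^2\frac{2r_1^2(r_1 + r_2 - 2)}{r_2(r_1 - 2)^2(r_1 - 4)}.$$

Using the inverse moments of the $F$-distribution we can show that:

$$E(\delta \mid \lambda, x, y) = b_1 - b_2, \qquad \operatorname{Var}(\delta \mid \lambda, x, y) = \frac{1}{n_1 + n_2 - 6}\left(\frac{\lambda}{S_{xx1}} + \frac{1}{S_{xx2}}\right)\left(\frac{r_1 s_1^2}{\lambda} + r_2 s_2^2\right).$$

These results are derived based on the fact that, conditional on $\lambda$, the posterior distribution of

$$\frac{\delta - (b_1 - b_2)}{\sqrt{\left(\lambda/S_{xx1} + 1/S_{xx2}\right)\left(r_1 s_1^2/\lambda + r_2 s_2^2\right)/(n_1 + n_2 - 4)}}$$

is that of a $t$-distribution with $(n_1 + n_2 - 4)$ degrees of freedom.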

Modal Approximation
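As suggested by Box and Tiao [8], we approximate the posterior density of $\delta$ by replacing $\lambda$ with its modal value, so that

$$\frac{\delta - (b_1 - b_2)}{\sqrt{\left(\lambda^*/S_{xx1} + 1/S_{xx2}\right)\left(r_1 s_1^2/\lambda^* + r_2 s_2^2\right)/(n_1 + n_2 - 4)}}$$

is treated as a $t$-statistic with $(n_1 + n_2 - 4)$ degrees of freedom, where $b_j$ is the least-squares slope, $S_{xxj}$ the corrected sum of squares of $x$ and $r_j s_j^2$ the residual sum of squares on $r_j = n_j - 2$ degrees of freedom for line $j$, and $\lambda^*$ is the modal value of $\lambda$.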
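The modal approximation for the difference of slopes can be sketched as follows. The sketch assumes the convention $\lambda = \phi_1/\phi_2$ for the ratio of error variances, so that $\lambda$ is a multiple of an $F$ variate with $(n_2 - 2, n_1 - 2)$ degrees of freedom, and that given $\lambda$ the posterior of $\delta$ is a scaled Student's $t$ on $n_1 + n_2 - 4$ degrees of freedom. The data and function names are hypothetical and are not the Ki67 data of the example.

```python
import math


def line_summary(x, y):
    """Return (n, least-squares slope, S_xx, residual sum of squares) for one line."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    s_xx = sum((xi - xbar) ** 2 for xi in x)
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    s_yy = sum((yi - ybar) ** 2 for yi in y)
    slope = s_xy / s_xx
    s_ee = s_yy - slope * s_xy
    return n, slope, s_xx, s_ee


def slope_difference_scale(lam, line1, line2):
    """Scale and degrees of freedom of the t posterior of delta given lambda."""
    n1, _, s_xx1, s_ee1 = line1
    n2, _, s_xx2, s_ee2 = line2
    df = n1 + n2 - 4
    pooled = (s_ee1 / lam + s_ee2) / df
    return math.sqrt((lam / s_xx1 + 1 / s_xx2) * pooled), df


def lambda_mode(line1, line2):
    """Mode of the posterior density of lambda = phi1 / phi2 (needs n2 > 4)."""
    n1, _, _, s_ee1 = line1
    n2, _, _, s_ee2 = line2
    r1, r2 = n1 - 2, n2 - 2
    c = (s_ee1 / r1) / (s_ee2 / r2)  # lambda = c * F(r2, r1)
    return c * ((r2 - 2) / r2) * (r1 / (r1 + 2))


# Hypothetical data for two groups, for illustration only.
line1 = line_summary([5, 10, 15, 20, 25, 30], [12, 14, 13, 17, 16, 19])
line2 = line_summary([10, 20, 30, 40, 50, 60], [20, 31, 35, 52, 55, 70])
lam_star = lambda_mode(line1, line2)
scale, df = slope_difference_scale(lam_star, line1, line2)
print(line1[1] - line2[1], scale, df)
```

An approximate interval for $\delta$ is the difference of the fitted slopes plus or minus a Student's $t$ quantile on `df` degrees of freedom times `scale`.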

Approximation Based on Averaging
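Here we find that the conditions of Feller's [8] lemma are satisfied by the first two moments of $\lambda$. This is because $\operatorname{Var}(\lambda \mid x, y) \to 0$ as $n_1, n_2 \to \infty$. By taking the function in the lemma to be $\pi(\delta \mid \lambda, x, y)$, the integral in (5.2) can be approximated by $\pi(\delta \mid \bar{\lambda}, x, y)$, where $\bar{\lambda} = E(\lambda \mid x, y)$.

As we can see, for tumor grade 1 the association between Ki67 and recurrence is quite weak, but the association is stronger for tumor grade 3.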

Remarks:
For all the proposed methods, our data analysis approach is simulation-based.
The number of replications used is sufficient. For the Monte Carlo integration, which we consider to be the exact solution, we monitored the simulation. As we can see in Figure 3, the sequence of simulations tends to stabilize as we approach the specified number of simulations.
We should note that the red bands in Figure 3 are not 95% confidence bands in the classical sense; rather, they correspond to the confidence assessment that would be produced at each number of iterations if we decided to stop at that number of iterations. The 2.5% and 97.5% quantiles for the methods are shown in Table 2.
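The kind of monitoring described above can be sketched with a running mean and a band around it. The sketch assumes that the band is the running mean plus or minus two standard errors of that mean, which is one common reading of "mean ± two standard deviations" for a Monte Carlo estimator; the function name and the simulated sequence are hypothetical.

```python
import math
import random


def running_mean_bands(values, width=2.0):
    """Running mean of a simulation sequence with mean +/- width standard errors.

    Returns one (mean, lower, upper) tuple per iteration. The band is undefined
    (NaN) at the first iteration, where no variance estimate exists.
    """
    mean, sq_dev_sum, out = 0.0, 0.0, []
    for k, value in enumerate(values, start=1):
        delta = value - mean
        mean += delta / k
        sq_dev_sum += delta * (value - mean)  # Welford's update
        if k > 1:
            se = math.sqrt(sq_dev_sum / (k - 1) / k)
        else:
            se = float("nan")
        out.append((mean, mean - width * se, mean + width * se))
    return out


# Hypothetical sequence of simulated values, for illustration only.
rng = random.Random(1)
draws = [rng.gauss(0.0, 1.0) for _ in range(1000)]
bands = running_mean_bands(draws)
print(bands[9], bands[-1])
```

Plotting the three components against the iteration number gives a display of the same type as Figure 3.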

Discussion
In the data analytic part, one may be interested in the shape of the density of the approximating distributions and how they deviate from the exact density. We did not discuss this issue since most of the time we are interested in the tail area of the distribution in order to construct confidence intervals on the mean difference or the difference of slopes. From Table 1 and Table 2, we can see that all the approximations perform well when compared to the exact limits. However, the scale approximation, which uses the first four central moments, provides more accurate confidence limits. The modal approximation seems to err in the upper limits of the difference between the target parameters. Our general conclusion is that when the sample sizes are reasonably large, as in the breast cancer example, all the approximations (except the modal) may be used. For smaller sample sizes and for routine implementation of the proposed test, the scale (four-moment) approximation is recommended.

Table 2. Approximate 95% HPD for the difference in two slopes of linear regression lines under heterogeneity using the tumor recurrence scores data.
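Equating $m_2$ and $m_4$ to the second and fourth central moments of the right-hand side of (3.6) determines the scale and the degrees of freedom of the approximating $t$-distribution.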

Example 1: "Mean tumor recurrence scores in breast cancer patients stratified by tumor grades." The example shows how the approximations to the posterior distribution of $U = \mu_1 - \mu_2$ perform at the tail.

Equating the second and fourth central moments of $\delta$ to the second and fourth central moments of the right-hand side of $\stackrel{d}{=}$ in (5.1) gives the constants of the approximating scaled $t$-distribution, where $[\xi]$ means the smallest integer larger than $\xi$. Thus one may use tables of Student's $t$-distribution with $[\nu]$ degrees of freedom to construct H.P.D. intervals on $\delta$. As can be seen, this result is identical to the moment (scale) approximation of the posterior distribution of the difference between means.

Figure 2. Scatter plot of recurrence score against Ki67 for tumor grade 3.

Figure 3. Monitoring the approximation of the function $\pi(\delta \mid \lambda, x, y)$ with mean ± two standard deviations against the iterations for a single sequence of simulations.