Estimation of a Linear Model in Terms of Intra-Class Correlations of the Residual Error and the Regressors

Juha Lappi

doi:10.4236/ojs.2022.122013

Open Journal of Statistics > Vol.12 No.2, April 2022

Estimation of a Linear Model in Terms of Intra-Class Correlations of the Residual Error and the Regressors

Juha Lappi
University of Eastern Finland, Joensuu, Finland.
DOI: 10.4236/ojs.2022.122013 PDF HTML XML 105 Downloads 493 Views

Abstract

Objectives: The objective is to analyze the interaction of the correlation structure and values of the regressor variables in the estimation of a linear model when there is a constant, possibly negative, intra-class correlation of residual errors and the group sizes are equal. Specifically: 1) How does the variance of the generalized least squares (GLS) estimator (GLSE) depend on the regressor values? 2) What is the bias in estimated variances when ordinary least squares (OLS) estimator is used? 3) In what cases are OLS and GLS equivalent. 4) How can the best linear unbiased estimator (BLUE) be constructed when the covariance matrix is singular? The purpose is to make general matrix results understandable. Results: The effects of the regressor values can be expressed in terms of the intra-class correlations of the regressors. If the intra-class correlation of residuals is large, then it is beneficial to have small intra-class correlations of the regressors, and vice versa. The algebraic presentation of GLS shows how the GLSE gives different weight to the between-group effects and the within-group effects, in what cases OLSE is equal to GLSE, and how BLUE can be constructed when the residual covariance matrix is singular. Different situations arise when the intra-class correlations of the regressors get their extreme values or intermediate values. The derivations lead to BLUE combining OLS and GLS weighting in an estimator, which can be obtained also using general matrix theory. It is indicated how the analysis can be generalized to non-equal group sizes. The analysis gives insight to models where between-group effects and within-group effects are used as separate regressors.

Keywords

Best Linear Unbiased Estimator, Ordinary Least-Squares, Generalized Least Squares, Singular Correlation Matrix, Between-Group Effects, Within-Group Effects

Share and Cite:

Lappi, J. (2022) Estimation of a Linear Model in Terms of Intra-Class Correlations of the Residual Error and the Regressors. Open Journal of Statistics, 12, 188-199. doi: 10.4236/ojs.2022.122013.

1. Introduction

Ordinary least squares (OLS) and generalized least squares (GLS) are basic methods to estimate coefficients of regressor variables in regression equations that are linear with respect to coefficients. Let OLSE and GLSE denote the corresponding estimators. OLS minimizes the sum of squared residuals (deviations from the estimated model), while GLS minimizes the estimation variances of the coefficient estimates if the residual errors (deviations from the true model) have non-equal variances and/or are correlated between different observations. OLS estimates are unbiased also for cases where GLS estimates have smaller variances, but the estimated variances of the coefficient estimates can be biased, leading to biased t- and F-tests. GLS estimation utilizes the inverse of the covariance matrix of residual errors. Thus, GLSE does not exist if the inverse does not exist, i.e., the covariance matrix is singular. If the inverse exists, then GLSE is the best linear unbiased estimator, BLUE. It requires deeper matrix algebra to derive BLUE when GLSE does not exist. Further complications are caused if there are linear dependencies between predictor variables, mathematically if the model matrix, to be presented shortly, does not have full column rank. In practice, such difficulties can be avoided by dropping linearly dependent predictors. It is assumed here that there are no such dependencies.

In the history of statistics, attention has been given to analyze what are the necessary and sufficient conditions that guarantee that OLSE is BLUE. Reference [1] describes the influential role of C. R: Rao in this question.

When analyzing grouped data, a common model for correlated residual errors is to assume that there is a constant correlation between all observations of each group, and members of different groups are uncorrelated. Let $ρ_{e}$ denote this intra-class correlation, and let i. denote group, and j denote an observation number within a group. A non-negative intra-class correlation of residual error is implied by assuming a variance component model $e_{i j} = v_{i} + ε_{i j}$ , where $v_{i}$ is the random group effect with mean zero and variance $σ_{v}^{2}$ and $ε_{i j}$ is a random individual effect with variance $σ_{ε}^{2}$ and uncorrelated with $v_{i}$ . Thus, $ρ_{e} = σ_{v}^{2} / (σ_{v}^{2} + σ_{ε}^{2})$ . The intra-class correlation of residual errors can be also negative. If $ρ_{e}$ is positive, then two observations taken from a given group are, on average, more alike than two observations taken from different groups. When $ρ_{e}$ is negative, then two observations taken from a given group are, on average, more different that two observations taken randomly from the whole population. Positive $ρ_{e}$ implies that group averages have a larger variance than a random sample of equal size from the whole population, while negative $ρ_{e}$ implies that group averages vary less than averages of samples from the whole population. On the extreme, all group means are equal, and all the variation is between individuals. While assuming a positive $ρ_{e}$ , it is not necessary to specify the group size (of course, when dealing with data, group size plays an important role). The effect of a negative $ρ_{e}$ is always dependent on the group size, and it does not make sense to assume constant negative $ρ_{e}$ for different group sizes. The possibility of a negative $ρ_{e}$ is too often ignored. Even if the overall $ρ_{e}$ is positive, due to natural variation between groups, the competition between individuals from limited resources is decreasing $ρ_{e}$ , and this competition should be described when analyzing grouped data. The lower limit of $ρ_{e}$ can be obtained from the condition $var (\sum_{j = 1}^{n} e_{i j}) = n σ_{e}^{2} + n (n - 1) ρ_{e} σ_{e}^{2} \geq 0$ , which implies that $ρ_{e} \geq - 1 / (n - 1)$ . The upper limit of $ρ_{e}$ is 1. This paper assumes a constant $ρ_{e}$ and a constant group size, which allows a straightforward treatment of negative $ρ_{e}$ .

Even if the matrix formulas of OLSE and GLSE are simple, it may remain intuitively unclear how the correlation structure of residual errors and the values of the regressor variables interact. It is shown that this interaction can be described in terms of the empirical intra-class correlations of the regressor variables. Closed form algebraic formulas are first derived for a special case where a simple linear model is estimated. These derivations are then generalized to a model with several regressors. They give a concrete and easily understood explanation and illustration for the matrix theory results which go beyond the mathematical expertise of most scientists who are just interested to apply statistical methods in research

This paper discusses the following questions: 1) How do the variances of the GLS estimates depend on the intra-class correlation of residuals and on the values of the x-variable in a simple linear model? 2) What are the ratios of the GLS variances and OLS variances? 3) What is the bias in the estimated variances of the coefficient estimates when OLS is used? 4) How does GLS utilize the group means of the regressors and the deviations from the group means? 5) How this formulation shows why OLS is sometimes equal to GLS? 6) How can a BLUE be constructed when the correlation matrix of residual errors is singular, i.e., $ρ_{e} = 1$ or $ρ_{e} = - 1 / (n - 1)$ and the intra-call correlations of regressors gets any values? Similarly, as with $ρ_{e}$ , the extreme values of the regressor correlations need to be considered separately. This last BLUE part may not be of interest to practicing scientists, but it may serve as an introduction to the BLUE problem for researchers, who cannot fluently read matrix theory. For matrix theory experts this part may be trivial or self-evident.

Reference [2] compared the GLS estimation of a simple regression line with OLS estimation using differences in the predictor variable. This work is here put into a general context. The results reported in [3] regarding the bias of F-tests when OLS is used, are exemplified and generalized. Algebraic formulas show how OLSE, GLSE, BLUE and singular correlation matrices are related, and thus illustrate the matrix results presented in [4] [5], and [6]. Reference [5] provides the matrix conditions that are used to show that a suggested estimator is BLUE.

Let us then move to more formal definitions. Let us assume a model $y = X b + e$ , where $y$ is a N dimensional random vector, $X$ is $N \times p$ model matrix, $b$ is p dimensional fixed coefficient vector, $E (e) = 0$ , and $var (e) = σ^{2} V$ . If $b$ is estimated with the GLS estimator ${\hat{b}}_{GLS} = {(X^{'} V^{- 1} X)}^{- 1} X^{'} V^{- 1} y$ , then $var ({\hat{b}}_{GLS}) = σ^{2} {(X^{'} V^{- 1} X)}^{- 1}$ . If $b$ is estimated with the OLS estimator ${\hat{b}}_{OLS} = {(X^{'} X)}^{- 1} X^{'} y$ then $E ({\hat{b}}_{OLS}) = b$ , and $var ({\hat{b}}_{OLS}) = σ^{2} {(X^{'} X)}^{- 1} X^{'} V X {(X^{'} X)}^{- 1}$ . In the OLS regression, $σ^{2}$ is estimated with ${\hat{σ}}_{OLS}^{2} = {(N - p)}^{- 1} u^{'} u$ , where $u$ is the residual vector $y - X {\hat{b}}_{OLS}$ . Then, $u^{'} u = e^{'} M e$ where $M = I - X {(X^{'} X)}^{- 1} X^{'}$ . The expected value of $e^{'} M e$ is $σ^{2} trace (M V)$ . Thus, $E ({\hat{σ}}_{OLS}^{2}) = {(N - p)}^{- 1} trace (M V) σ^{2}$ .

It is now assumed that that data consist of $K \geq 2$ groups, indexed with i, each group has $n \geq 2$ observations, indexed with j. It is denoted that $N = K n$ . $V$ is block diagonal so that each block $V_{i}$ has a compound symmetry correlation structure where the non-diagonal elements are equal to the intra-class correlation $ρ_{e}$ . The term “intra-class correlation” is used for historical reasons, but otherwise the term “group” is used. The intercept is the first parameter in $b$ , i.e., the first column in $X$ is vector of ones, $1_{N}$ . It is assumed further that other columns are centered, i.e., the sum of elements of the column is zero. This assumption makes the analysis simpler. Total and group averages are denoted as $\bar{x}$ , ${\bar{x}}_{i}$ , $\bar{y}$ and ${\bar{y}}_{i}$ .

Closed form algebraic equations are initially derived for the simple linear model $y_{i j} = a + b x_{i j} + e_{i j}$ . A model with several regressors is presented later. Centering means that the working model is $y_{i j} = a^{'} + b (x - \bar{x}) + e_{i j}$ . Estimates of $a^{'}$ and b are uncorrelated both in OLS and GLS. After estimating $a^{'}$ and b, an estimate of a is obtained from $\hat{a} = {\hat{a}}^{'} - \hat{b} \bar{x}$ . Thereafter $var (\hat{a}) = var ({\hat{a}}^{'}) + {\bar{x}}^{2} var (\hat{b})$ , and $cov (\hat{a}, \hat{b}) = var ({\hat{a}}^{'})$ .

2. GLS in the Simple Linear Model

First, let us assume that $- 1 / (n - 1) < ρ_{e} < 1$ , which means that $V$ is non-singular. Then $V_{i}^{- 1}$ is a matrix with diagonal elements:

$α = (1 + (n - 2) ρ_{e}) / ((1 - ρ_{e}) (1 + (n - 1) ρ_{e}))$ (1)

The non-diagonal elements are equal to:

$β = - ρ_{e} / ((1 - ρ_{e}) (1 + (n - 1) ρ_{e}))$ (2)

Thereafter $X^{'} V^{- 1} X$ is

$X^{'} V^{- 1} X = (\begin{matrix} N (α + (n - 1) β) & 0 \\ 0 & α \sum_{i = 1}^{K} \sum_{j = 1}^{n} {(x_{i j} - \bar{x})}^{2} + β \sum_{i = 1}^{K} \sum_{j = 1}^{n} \sum_{j^{'} \neq j}^{} (x_{i j} - \bar{x}) (x_{i j^{'}} - \bar{x}) \end{matrix})$ (3)

The second diagonal element in (3) can be expressed as $\sum_{i = 1}^{K} \sum_{j = 1}^{n} {(x_{i j} - \bar{x})}^{2} (α + β R_{x})$ using the empirical intra-class correlation $R_{x}$ , which can be presented in two equivalent forms:

$R_{x} = \sum_{i = 1}^{K} \sum_{j = 1}^{n} \sum_{j^{'} \neq j}^{} (x_{i j} - \bar{x}) (x_{i j^{'}} - \bar{x}) / (n - 1) \sum_{i = 1}^{K} \sum_{j = 1}^{n} {(x_{i j} - \bar{x})}^{2}$ (4)

$R_{x} = n^{2} \sum_{i}^{K} {({\bar{x}}_{i} - \bar{x})}^{2} / (n - 1) \sum_{i = 1}^{K} \sum_{j = 1}^{n} {(x_{i j} - \bar{x})}^{2} - \frac{1}{n - 1}$ (5)

Reference [7] explains how the intra-class correlation, dating back to Fisher, generalizes the correlation idea in order to measure the similarity of group members. Equation (4) appears as covariance divided by variance, and (5) is almost equal to the sample variance of group means divided by the total sample variance, which resembles the definition of $ρ_{e}$ . Equation (5) indicates that $R_{x} \geq - 1 / (n - 1)$ . The lower limit is obtained when all group means ${\bar{x}}_{i}$ are equal. The upper limit $R_{x} = 1$ is obtained when $x_{i j} = {\bar{x}}_{i}$ for all i and j. If $R_{x} = 0$ , then $\sum_{i}^{K} {({\bar{x}}_{i} - \bar{x})}^{2} / K$ , the population variance of ${\bar{x}}_{i}$ , is equal to the population variance $\sum_{i = 1}^{K} \sum_{j = 1}^{n} {(x_{i j} - \bar{x})}^{2} / N$ divided by n, resembling the variance of the mean of uncorrelated random variables.

Matrix ${(X^{'} V^{- 1} X)}^{- 1}$ is obtained by taking the reciprocals of the diagonal elements of (3). The variances of the estimates of $a^{'}$ considered here are proportional to $C_{a^{'}} = σ^{2} / N$ , and the variances of the estimate of b are proportional

to $C_{b} = σ^{2} / \sum_{i = 1}^{K} \sum_{j = 1}^{n} {(x_{i j} - \bar{x})}^{2}$ . When $ρ_{e} = 0$ , $var ({\hat{a}}^{'}) = C_{a}$ , and $var (\hat{b}) = C_{b}$ . First, we note that ${\hat{a}}^{'}_{GLS} = {\hat{a}}^{'}_{OLS} = \bar{y}$ . Thus, we directly derive that $var ({\hat{a}}^{'}) = C_{a} (1 + (n - 1) r_{e})$ . If $r_{e} = - 1 / (n - 1)$ , then $var ({\hat{a}}^{'}) = 0$ . For b, we obtain:

$var ({\hat{b}}_{GLS}) = C_{b} \frac{(1 - ρ_{e}) (1 + (n - 1) ρ_{e})}{1 + (n - 2) ρ_{e} - (n - 1) R_{x} ρ_{e}}$ (6)

If $ρ_{e} = 0$ or $ρ_{e} = R_{x}$ , then $var ({\hat{b}}_{GLS}) = C_{b}$ . It holds that $var ({\hat{b}}_{GLS}) > C_{b}$ , if $ρ_{e}$ is between 0 and $R_{x}$ . Equation (6) is an increasing function of $R_{x} ρ_{e}$ , an increasing function of $R_{x}$ when $ρ_{e} > 0$ , and a decreasing function of $R_{x}$ when $ρ_{e} < 0$ . When $R_{x} = 1$ , then $var ({\hat{b}}_{GLS}) = C_{b} (1 + (n - 1) ρ_{e})$ . When $R_{x} = - 1 / (n - 1)$ , then ${var}_{GLS} (\hat{b}) = C_{b} (1 - ρ_{e})$ . The dependency of $var ({\hat{b}}_{GLS}) / C_{b}$ on $ρ_{e}$ for different values of $R_{x}$ is shown in Figure 1.

3. OLS in Simple Linear Model When $ρ_{e} \neq 0$

What happens if OLS is used when $ρ_{e} \neq 0$ ? Using $var ({\hat{b}}_{OLS})$ , provided in the introduction section, we obtain:

$var ({\hat{b}}_{OLS}) = C_{b} (1 + (n - 1) R_{x} ρ_{e})$ (7)

When $ρ_{e} \neq 0$ and OLS are used to estimate the variances of the parameter estimates, the bias combines the bias of the estimate of $σ^{2}$ and the errors of ${var}_{OLS} ({\hat{a}}^{'}_{OLS}) / σ^{2}$ and ${var}_{OLS} ({\hat{b}}^{'}_{OLS}) / σ^{2}$ where ${var}_{OLS}$ indicates the variances implied by OLS assumptions (these are not biases, as they are not related to the expected values). In our case:

$E ({\hat{σ}}_{OLS}^{2}) = (1 - \frac{(n - 1) (1 + R_{x}) ρ_{e}}{N - 2}) σ^{2}$ (8)

It holds that $E ({\hat{σ}}_{OLS}^{2}) < σ^{2} \Leftrightarrow ρ_{e} > 0$ . Now,

Figure 1. The dependency of $var ({\hat{b}}_{GLS}) / C_{b}$ in (6) on $ρ_{e}$ , the intra-class correlation of residual errors, for different values of $R_{x}$ , the intra-class correlation of x (denoted on the curves) when n = 3. Blue lines are for negative values of $R_{x}$ and red lines for positive values. The thick line is for $R_{x} = 0$ . Note that the same variance is obtained for $ρ_{e} = 0$ and $ρ_{e} = R_{x}$ .

$\begin{matrix} v \hat{a} r_{OLS} ({\hat{a}}^{'}_{OLS}) = {\hat{σ}}_{OLS}^{2} / N = {\hat{σ}}_{OLS}^{2} var ({\hat{a}}^{'}_{OLS}) / (N var ({\hat{a}}^{'}_{OLS})) \\ = {\hat{σ}}_{OLS}^{2} var ({\hat{a}}^{'}_{OLS}) / (1 + (n - 1) ρ_{e}) \end{matrix}$ (9)

In (9), $v \hat{a} r_{OLS}$ denotes the variance estimate we obtain in OLS. If it is combined with (8), we can obtain $E (v \hat{a} r_{OLS} ({\hat{a}}^{'}))$ in terms of $var ({a^{'}}_{OLS})$ . The estimated variance is biased downwards if $ρ_{e} > 0$ . If we similarly combine (8) and (7) we obtain:

$E (v \hat{a} r_{OLS} ({\hat{b}}_{OLS})) = \frac{N - 2 - (n - 1) (1 + R_{x}) ρ_{e}}{(N - 2) (1 + (n - 1) R_{x} ρ_{e})} var ({\hat{b}}_{OLS})$ (10)

When K increases, the bias coefficient in (10) approaches $1 / (1 + (n - 1) ρ_{e} R_{x})$ , which again demonstrates the adverse effect of $ρ_{e} R_{x}$ . The direction of the bias can be seen using the difference between the numerator and denominator of (10). Thus

$E (v \hat{a} r_{OLS} ({\hat{b}}_{OLS})) < var ({\hat{b}}_{OLS}) \Leftrightarrow (N - 2) R_{x} ρ_{e} + (1 + R_{x}) ρ_{e} > 0$ (11)

When K increases (and thus N also increases), then the direction of bias is dependent on the sign of $R_{x} ρ_{e}$ . The condition can be stated in a simpler form as $(N - 1) R_{x} ρ_{e} + ρ_{e} > 0$ , but (11) is needed for comparison with [3] in which two inequalities are derived which taken together, imply that the F-test in OLS leads to p-values that are too small. In our case, conditions in [3] are $(n - 1) (1 + R_{x}) ρ_{e} > 0$ and $(n - 1) R_{x} ρ_{e} > 0$ , which together imply the validity of the inequality in (11). Moreover, (11) can solve the direction of the bias in the case where sub-conditions indicate opposing directions. When $v \hat{a} r_{OLS} ({\hat{b}}_{OLS})$ is used in a t-test, its bias also produces bias in the computed p-values. Reversing the inequality in (11), we obtain a condition for obtaining too large p-values.

4. BLUE for Several Regressors, Nonsingular V

Let us now assume a general linear model $y = X b + e$ , the first column of $X$ being $1_{N}$ and the other columns centered. It is here assumed that $V$ is nonsingular, i.e., $- 1 / (n - 1) < ρ_{e} < 1$ . Let x and z denote two regressors. Both $α$ and $β$ are inversely proportional to $(1 - ρ_{e}) (1 + (n - 1) ρ_{e})$ . Thus, $α$ and $β$ increase when $ρ_{e}$ approaches $- 1 / (n - 1)$ or 1, leading to singular $V_{i}^{}$ when either of these limits are obtained. Let us define $W_{i} = (1 - ρ_{e}) (1 + (n - 1) ρ_{e}) V_{i}^{- 1}$ . The diagonal elements of $W_{i}$ are equal to $(1 + (n - 2) ρ_{e})$ , the nondiagonal elements are equal to $- ρ_{e}$ . Let $W$ combine all blocks. Using $W$ , we obtain a linear estimator:

${\hat{b}}_{W} = {(X^{'} W X)}^{- 1} X^{'} W y$ (12)

Element $i j$ of $X^{'} W$ is for x:

$\begin{matrix} u_{i j} = (1 + (n - 2) ρ_{e}) (x_{i j} - \bar{x}) - ρ_{e} \sum_{j^{'} \neq j}^{n} (x_{i j^{'}} - \bar{x}) \\ = (1 + (n - 1) ρ_{e}) (x_{i j} - {\bar{x}}_{i}) + (1 - ρ_{e}) ({\bar{x}}_{i} - \bar{x}) \end{matrix}$ (13)

The first diagonal element of $X^{'} W X$ , corresponding to the intercept, is $N (1 - ρ_{e})$ , and an element for x is:

$X^{'} W X_{x x} = \sum_{i = 1}^{K} \sum_{j = 1}^{n} [(1 + (n - 1) ρ_{e}) {(x_{i j} - {\bar{x}}_{i})}^{2} + (1 - ρ_{e}) {({\bar{x}}_{i} - \bar{x})}^{2}]$ (14)

Equation (14) explains why (6) is an increasing function of $R_{x} ρ_{e}$ : with a large $R_{x} ρ_{e}$ GLS places large weighting on the component with small variation. The non-diagonal elements are zero for the first row and column, and generally:

$\begin{matrix} X^{'} W X_{x z} = \sum_{i = 1}^{K} \sum_{j = 1}^{n} (1 + (n - 1) ρ_{e}) (x_{i j} - {\bar{x}}_{i}) (z_{i j} - {\bar{z}}_{i}) \\ + \sum_{i = 1}^{K} \sum_{j = 1}^{n} (1 - ρ_{e}) ({\bar{x}}_{i} - \bar{x}) ({\bar{z}}_{i} - \bar{z}) \end{matrix}$ (15)

Note that on the second-row summation over n could be dropped by multiplying with n.

The first element of $X^{'} W y$ is $N (1 - ρ_{e}) \bar{y}$ , and the others are:

$\begin{matrix} X^{'} W y_{x} = \sum_{i = 1}^{K} \sum_{j = 1}^{n} (1 + (n - 1) ρ_{e}) (x_{i j} - {\bar{x}}_{i}) (y_{i j} - {\bar{y}}_{i}) \\ + \sum_{i = 1}^{K} \sum_{j = 1}^{n} (1 - ρ_{e}) ({\bar{x}}_{i} - \bar{x}) ({\bar{y}}_{i} - \bar{y}) \end{matrix}$ (16)

The averages ${\bar{y}}_{i}$ and $\bar{y}$ do not contribute, but they may increase (or decrease) understanding. They do not indicate centering of y because they are not involved in the first element of $X^{'} W y$ . In GLS, the same estimates are obtained with any scaling of $V^{- 1}$ , thus ${\hat{b}}_{W} = {\hat{b}}_{GLS}$ for a non-singular $V$ , but may be computationally more stable. Variances are then computed using

$var ({\hat{b}}_{W}) = ((1 - ρ_{e}) (1 + 1 / (n - 1))) {(X^{'} W X)}^{- 1}$ (17).

5. BLUE When V is Singular

5.1. BLUE in the Simple Linear Regression with Singular V

Let us introduce the estimation with a singular $V$ using the simple linear model. $y_{i j} = a + b x_{i j} + e_{i j}$ . If $ρ_{e} = 1$ and $R_{x} \neq 1$ we can estimate b in the model with zero variance with $\hat{b} = (y_{i j} - y_{i j^{'}}) / (x_{i j} - x_{i j^{'}})$ using two observations such that $x_{i j} \neq x_{i j^{'}}$ . As $ρ_{e} = 1$ implies that $e_{i j} = {\bar{e}}_{i}$ for all observations, a can be estimated by computing the arithmetic mean (which is OLS estimator) of $y_{i j} - \hat{b} x_{i j}$ or $y_{i} - \hat{b} {\bar{x}}_{i}$ , the latter computation implying correct variance estimator $σ^{2} / K$ .

If. $R_{x} \neq 1$ , we can find at least two observations in the same group having different x values. If there are more observations deviating from the group means, there is an infinite number of estimating equations which all produce this same estimate with zero variance.

If $ρ_{e} = 1$ and $R_{x} = 1$ , then all y and x values are equal to the group means in all groups (for y with probability one), and b needs to be estimated with OLS using group means. This means that $var (\hat{b})$ is n times larger than we would get for $ρ_{e} = 0$ , as could be anticipated from Figure 1 for $n = 3$ . The same estimates are produced using observation level OLS regression, which produces biased estimates for the variances $var (\hat{b})$ and $var (\hat{a})$ .

If $ρ_{e} = - 1 / (n - 1)$ and $R_{x} \neq - 1 / (n - 1)$ , then we can find at least two groups i and $i^{'}$ so that ${\bar{x}}_{i} \neq {\bar{x}}_{i^{'}}$ , and b can be estimated with $({\bar{y}}_{i} - {\bar{y}}_{i^{'}}) / ({\bar{x}}_{i} - {\bar{x}}_{i^{'}})$ with zero variance. If there are more than two differing group means, there is an infinite number of estimating equations producing same estimates. After estimating b, a can be estimated with the arithmetic mean of ${\bar{y}}_{i} - b {\bar{x}}_{i}$ with variance $σ^{2} / K$ . The same estimate is obtained with the arithmetic mean of $y_{i j} - b x_{i j}$ , but standard OLS computations would produce a biased estimate for the estimation variance.

If $ρ_{e} = - 1 / (n - 1)$ and $R_{x} = - 1 / (n - 1)$ , i.e. all group means of ${\bar{e}}_{i}$ and ${\bar{x}}_{i}$ are equal (for ${\bar{e}}_{i}$ with probability one), then BLUE of b and a can be obtained by dropping one observation from each group and doing OLS regression in the remaining data. This means that $var (\hat{b})$ is equal to $n / (n - 1)$ times the OLS variance obtained for $ρ_{e} = 0$ . For $n = 3$ , $n / (n - 1) = 1.5$ as could be anticipated from Figure 1.

5.2. BLUE in the Multiple Regression When $ρ_{e} = 1$

When $ρ_{e} = 1$ , then $V$ is singular. Also $W$ is singular, all of its diagonal elements are equal to $n - 1$ , and nondiagonal elements are equal to −1. The estimator cannot be applied directly because the first diagonal element of $X^{'} W X$ is $N (1 - ρ_{e}) = 0$ thus producing singular $X^{'} W X$ . According to (14) the diagonal elements are zero also for such predictors x having $R_{x} = 1$ , i.e. no variation in $x_{i j} - {\bar{x}}_{i}$ . If all predictors have $R_{x} = 1$ then OLSE is BLUE, because $C (V X) \subset C (X)$ , see [4].

A general BLUE can be obtained using the decomposition $X = (\begin{matrix} X_{1} & X_{2} \end{matrix})$ where $X_{1}$ contains $1_{N}$ and all columns having $R_{x} = 1$ and $X_{2}$ contains other columns. It is now required that K ≥ 3. Let $b = {(\begin{matrix} {b^{'}}_{1} & {b^{'}}_{2} \end{matrix})}^{'}$ decompose $b$ similarly. If we weight columns of $X_{1}$ as done in OLS, i.e. replace columns of ${X^{'}}_{1} W$ in Equation (12) with $X_{1}$ we get an estimator

${\hat{b}}_{GOLS} = B y = {(\begin{matrix} {X^{'}}_{1} X_{1} & {X^{'}}_{1} X_{2} \\ {X^{'}}_{2} W X_{1} & {X^{'}}_{2} W X_{2} \end{matrix})}^{- 1} (\begin{matrix} {X^{'}}_{1} \\ {X^{'}}_{2} W \end{matrix}) y$ (18)

I suggest that the estimator is called a GOLS estimator as it combines OLS and GLS estimation principles in the same estimator. Using the standard formula for computing the inverse of a partitioned matrix, the estimator can be presented as

${\hat{b}}_{GOLS} = B y = (\begin{matrix} {({X^{'}}_{1} X_{1})}^{- 1} {X^{'}}_{1} - {({X^{'}}_{1} X_{1})}^{- 1} {X^{'}}_{1} X_{2} {(X_{2}' W X_{2})}^{- 1} {X^{'}}_{2} W \\ {({X^{'}}_{2} W X_{2})}^{- 1} {X^{'}}_{2} W \end{matrix}) y$ (19)

Thus

${\hat{b}}_{2} = {({X^{'}}_{2} W X_{2})}^{- 1} {X^{'}}_{2} W y$ (20)

Thus, ${\hat{b}}_{2}$ is the GLSE of $b_{2}$ as estimated solely using $X_{2}$ . Now $var ({\hat{b}}_{2}) = {({X^{'}}_{2} W X_{2})}^{- 1} {X^{'}}_{2} W V W X_{2} {({X^{'}}_{2} W X_{2})}^{- 1} = 0$ . Noting that the second row of $B$ is part of the first row, we obtain:

${\hat{b}}_{1} = {({X^{'}}_{1} X_{1})}^{- 1} {X^{'}}_{1} y - {({X^{'}}_{1} X_{1})}^{- 1} {X^{'}}_{1} {\hat{b}}_{2}$ (21)

${\hat{b}}_{1} = {\hat{b}}_{1, OLS} - {({X^{'}}_{1} X_{1})}^{- 1} {X^{'}}_{1} X_{2} {\hat{b}}_{2}$ (22)

where ${\hat{b}}_{1, OLS}$ is the OLS estimator of $b_{1}$ obtained ignoring $X_{2}$ .

Using (22), we get $var ({\hat{b}}_{1}) = σ^{2} n {({X^{'}}_{1} X_{1})}^{- 1}$ , i.e. the same variance what we get when $b_{2} = 0$ and we regress group means of y on group means of all variables in $X_{1}$ . This is natural because all y values and x values are equal in all observations in each group.

Denote that $M = I - X {(X^{'} X)}^{- 1} X^{'}$ . An estimator $A y$ is BLUE for $X b$ , if and only if $A (\begin{matrix} X & V M \end{matrix}) = (\begin{matrix} X & 0 \end{matrix})$ , see [5]. If $X$ has full column rank (as we have), then $b$ is estimable and $B y$ is BLUE for $b$ if and only if $B (\begin{matrix} X & V M \end{matrix}) = (\begin{matrix} I & 0 \end{matrix})$ . Then, that $B X = I$ can be seen directly from taking into account that ${X^{'}}_{2} W X_{1} = 0$ . Noting that $W V = 0$ and ${X^{'}}_{1} V = n {X^{'}}_{1}$ we obtain that:

$B V M = {(n {({X^{'}}_{1} X_{1})}^{- 1} {X^{'}}_{1} - n {({X^{'}}_{1} X_{1})}^{- 1} {X^{'}}_{1} X {(X^{'} X)}^{- 1} X^{'}, 0)}^{'}$ (23)

Matrix $X {(X^{'} X)}^{- 1} X^{'}$ is a projector to the space spanned by $X$ . Now $C (n {({X^{'}}_{1} X_{1})}^{- 1} {X^{'}}_{1}) \subset C (X)$ , thus $n {({X^{'}}_{1} X_{1})}^{- 1} {X^{'}}_{1} X {(X^{'} X)}^{- 1} X^{'} = n {({X^{'}}_{1} X_{1})}^{- 1} {X^{'}}_{1}$ , which completes the demonstration that $B V M = 0$ .

The GOLS estimator can be put into general matrix theory context using [5]. Now $C (X_{1})$ and $C (X_{2})$ are disjoint and $C (X_{1}) \subset C (V)$ and $C (X_{2}) ⊄ C (V)$ . In such a case (20) provides BLUE for $b_{2}$ . The estimator $X_{1} {({X^{'}}_{1} X_{1})}^{- 1} {X^{'}}_{1} y$ is an unbiased for $X_{1} b_{1}$ . This unbiased estimator is updated into BLUE in (22), being an application of proposition 10.5 on p. 228 of [5].

5.3. BLUE When $ρ_{e} = - 1 / (n - 1)$

If $ρ_{e} = - 1 / (n - 1)$ then all elements of $W_{i}$ are equal to $1 / (n - 1)$ . Let us organize $X$ into $X = (\begin{matrix} X_{1} & X_{2} \end{matrix})$ , where $X_{1}$ contains all such predictors for which $R_{x} = - 1 / (n - 1)$ and $X_{2}$ contains all other columns. Equation provides again a BLUE with $var ({\hat{b}}_{2}) = 0$ and the same arguments can be used to prove it also for this case. Now $var ({\hat{b}}_{1}) = σ^{2} \frac{n}{n - 1} {({X^{'}}_{1} X_{1})}^{- 1}$ , which is the same variance we obtain if $b_{2} = 0$ and $b_{1}$ is estimated from data where one redundant observation is dropped from each group.

It may a useful exercise to get confidence in multiple regressor derivations to show how the estimators derived in Section 5.1 for the simple linear model are produced also with the matrix formulas. Matrix formulas use in computations of all observations while formulas in section 5.1 utilize only nonredundant information.

6. Discussion

When $y_{i j}$ is regressed on $x_{i j}$ , it is implicitly assumed that the regression of $y_{i j}$ on $x_{i j} - {\bar{x}}_{i}$ , and the regression of ${\bar{y}}_{i}$ on ${\bar{x}}_{i} - \bar{x}$ , have the same slope. GLS estimation provides different weights to $x_{i j} - {\bar{x}}_{i}$ and ${\bar{x}}_{i} - \bar{x}$ when attempting to utilize the correlation structure, although $(1 + (n - 1) ρ_{e}) (x_{i j} - {\bar{x}}_{i})$ and $(1 - ρ_{e}) ({\bar{x}}_{i} - \bar{x})$ are put into the same column of $X$ as shown in (14). When developing mixed models for grouped data, it is often necessary to consider both $x_{i j} - {\bar{x}}_{i}$ and ${\bar{x}}_{i}$ as separate predictors, see [8]. It is natural to assume that $y_{i j}$ is related to $x_{i j} - {\bar{X}}_{i}$ and $x_{i j} - {\bar{X}}_{i}$ , where ${\bar{X}}_{i}$ is the mean of x in the whole group. Reference [8] suggests a solution, based on a multivariate mixed model, which solves the measurement error bias problem that occurs when ${\bar{x}}_{i}$ computed from a sample from group i is used to “measure” ${\bar{X}}_{i}$ .

The mean of variable $x_{i j} - {\bar{x}}_{i}$ is zero in each group, thereby indicating a negative within-group correlation. If $y_{i j}$ is linearly related to $x_{i j} - {\bar{x}}_{i}$ , the negative within-group correlation of $x_{i j} - {\bar{x}}_{i}$ is also transmitted to $y_{i j}$ . Hence, the addition of predictor $x_{i j} - {\bar{x}}_{i}$ to the models makes the estimated variance of the random intercept larger, as the “negative variance” is subtracted from the residual variance, see [8] and [9].

The mixed model formulation always leads to non-negative intra-class correlations. However, negative intra-class correlations are needed in situations where the group means have a smaller variance than that implied by the assumption of uncorrelated individuals. In the marginal interpretation of mixed model equations, a negative definite variance matrix of random effects that maximize the likelihood is allowed, leading to negative correlations and negative variances, provided the marginal variance of the y-vector is positive semi-definite, see [10]. The significance of a negative intra-class coefficient is dependent on the group size.

When the x-variable is a random variable with a theoretical intra-class correlation $ρ_{x}$ , the empirical intra-class correlation $R_{x}$ is a random variable. The expected value of $R_{x}$ approaches $ρ_{x}$ when K increases. Simulations with normally distributed variables show that, for given values of n and K, $var (R_{x})$ can be well described with function $c {(ρ_{x} + 1 / (n - 1))}^{s} {(1 - ρ_{x})}^{q}$ . Negatively correlated variables can be simulated using principal components.

Reference [2] discussed the estimation of the slope in a simple linear model assuming that $n = 2$ , $x_{i 1} = 0$ and $x_{i 2} = t_{i}$ for all i. To compare the formulas presented above with the formulas in [2], denote that $p = K {\bar{t}}^{2} / \sum_{i = 1}^{K} t_{i}^{2}$ where $\bar{t}$ is the average of $t_{i}$ , then $0 < p \leq 1$ , and $p = 1$ when all $t_{i}$ ’s values are equal. It holds that $R_{x} = - p / (2 - p)$ . They compare the variance of GLSE to the variance of the OLS estimate obtained by regressing $y_{i 2} - y_{i 1}$ on $t_{i}$ . The standard OLSE of the slope has, however, a smaller variance than their OLSE based on differences when $ρ_{e} > 0.5$ . For $ρ_{e} = 0$ , the ratio of the variance of the standard OLSE to the variance of their OLSE is $2 - p$ . When all $t_{i}$ ’s are equal, then $p = 1$ , $R_{x} = - 1$ , and all three methods provide the same estimate.

If the group sizes $n_{i}$ vary in a data set, it is not reasonable to assume the same negative intra-class coefficient $ρ_{e}$ for all groups. Thus $ρ_{e}$ needs to be made group specific $ρ_{e i}$ . Reference [7] generalizes the intra-class correlation to non-equal group sizes, but this generalization does not provide meaningful analysis for the interaction of the intra-class correlation of the residual and the values of regressors. However, it is possible to define a measure for the between-group variation of x which gets value of zero if there is no variation between groups, and use this measure to analyze how the intra-class correlation of residual errors and the between-group and within-group variation of predictors interact. This analysis can be extended to auto-correlation structures., and is under development.

In experiments, (6) informs us that $R_{x}$ should be $- 1 / (n - 1)$ (all ${\bar{x}}_{i}$ should be equal) if it is anticipated that $ρ_{e} > 0$ . If it is anticipated that $ρ_{e} < 0$ , then the same treatment level should be applied to the whole group.

The derivations of this paper are fully covered with the general matrix theory. The purpose of the paper is to make general matrix formula understandable using algebraic derivations which show how between-group variation and within-group variations of residual errors are connected to the between-group and within-group variations of regressors in the estimation of a general linear model.

Acknowledgements

I thank Ronald Christensen for taking seriously the first version dealing with the simple linear model, Simo Puntanen for patient guidance to the conditions which prove that an estimator is BLUE, and Jarkko Isotalo for indicating how the GOLS estimator is related to the general BLUE theory. I take full responsibility if I have not understood their explanations correctly. They fly smoothly in linear spaces and subspaces where I need to walk watching each step.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1]	Markiewics, A., Puntanen, S. and Styan, P.H. (2022) The Legend of the Equality of OLSE and BLUE: Highlighted by C. R. Rao in 1967. In: Arnold, B.C., Balakrishnan, N. and Coelho, C.A., Eds., Methodology and Applications of Statistics, Springer, Cham.
[2]	Jeske, D.R. and Myhre, J.M. (2018) Regression Using Pairs vs. Regression on Differences: A Real-life Case Study for a Master’s Level Methods Class, The American Statistician, 72, 163-168. https://doi.org/10.1080/00031305.2017.1292956
[3]	Christensen, R. (2003) Significantly insignificant F tests. The American Statistician. 57, 27-32. https://doi.org/10.1198/0003130031108
[4]	Puntanen, S. and Styan, G.P.H. (1989) The Equality of the Ordinary Least Squares Estimator and the Best Linear Unbiased Estimator. The American Statistician, 43, 153-161. https://doi.org/10.1080/00031305.1989.10475644
[5]	Puntanen, S., Styan, G.P.H. and Isotalo, J. (2011) Matrix Tricks for Linear Statistical Models: Our Personal Top Twenty. Springer, Heidelberg. https://doi.org/10.1007/978-3-642-10473-2
[6]	Christensen, R. (2011) Plane Answers to Complex Questions. The Theory of Linear Models. 4th Edition, Springer, New York. https://doi.org/10.1007/978-1-4419-9816-3
[7]	Kendall, M. and Stuart, A. (1979) The Advanced Theory of Statistics: Volume 2: Inference and Relationship. 4th Edition, Charles Griffin, London.
[8]	Mehtätalo, L. and Lappi, J. (2020) Biometry for Forestry and Environmental Data with Examples in R. Chapman and Hall/CRC, New York. https://doi.org/10.1201/9780429173462
[9]	Snijders, T. and Bosker, R.J. (1994) Modeled Variance in Two-Level Models. Sociological Methods Research, 22, 342-363. https://doi.org/10.1177/0049124194022003004
[10]	Pryseley, A., Tchonlafi, C., Verbeke, G. and Molenberghs, G. (2011) Estimating Negative Variance Components from Gaussian and Non-Gaussian Data: A Mixed Models Approach. Computational Statistics and Data Analysis, 55, 1071-1085. https://doi.org/10.1016/j.csda.2010.09.002

Journals Menu

Follow SCIRP

	+1 323-425-8868
	customer@scirp.org
	+86 18163351462(WhatsApp)
	1655362766

	Paper Publishing WeChat

Journals Menu

Home

About SCIRP

Service

Policies