A New Test for Large Dimensional Regression Coefficients

This article considers hypothesis testing for coefficients in high dimensional regression models. I develop a simultaneous test statistic for the hypothesis test in both linear and partial linear models. The derived test is designed for growing p and fixed n, where the conventional F-test is no longer appropriate. The asymptotic distribution of the proposed test statistic under the null hypothesis is obtained.


Introduction
Some high dimensional data, such as gene expression datasets from microarrays, exhibit the property that the number of covariates greatly exceeds the sample size. The "large p, small n" paradigm brings challenges to many traditional statistical methods, and thus the asymptotic properties of various estimators when p goes to infinity much faster than n have been discussed (see [1][2][3]). Reference [1] considered uniform convergence for a large number of marginal discrepancy measures targeted on univariate distributions, means and medians. Reference [3] proposed a two sample test on high dimensional means. Both of these articles considered testing under "large p, small n" without a regression structure, whereas the present article concentrates on the regression setting.
Zhong and Chen in [4] proposed a test statistic for testing the regression coefficients in linear models when p/n → ρ in (0,1). In microarray data, however, the number of genes (p) is in the order of thousands whereas the sample size (n) is much smaller, usually less than 50 because of limits on replication. In such data p is very large while n stays small, so I consider the setting in which p goes to infinity and n remains constant to be more practical.
Covariate selection for high dimensional linear regression has received considerable attention in recent years. Penalizing methods are alternatives to the traditional least squares estimator for shrinkage estimation, as in [5,6]. Shao and Chow in [7] proposed a variable screening method using ridge estimators as both p and n go to infinity. In contrast to the assumptions in the literature, I consider the "large p, fixed n" setting in linear models for variable selection. Testing hypotheses on the regression coefficients is critical in determining the effects of covariates on a given outcome variable. Motivated by the recent need in biology to identify significant sets of genes, rather than individual genes, I aim to develop simultaneous tests for coefficients in linear regression models.
Partial linear models have been extensively studied. They have a wide range of applications, from statistics to biomedical sciences. In these models, some of the relations are believed to be of a certain parametric form while others are not easily parameterized. Several approaches have been developed to construct estimators. A profile likelihood approach was used in [8,9]. In this article, I apply a difference based estimation method in the partial linear models. The method of taking differences to eliminate the effect of the unknown nonparametric component has been used in both nonparametric and semiparametric settings. Rice in [10] first introduced a differencing estimator of the residual variance. Horowitz and Spokoiny in [11] used the differencing method to test between a parametric model and a nonparametric alternative. After taking the differences to eliminate the bias induced by the nonparametric term, I concentrate on estimating the linear component and then formulate the test statistic for testing the linear components.
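The idea behind differencing can be illustrated with a minimal numerical sketch. The function f = sin, the limit point c0 = 0.3, the spacing of the design points, and the helper name first_differences are all hypothetical choices made for illustration; none of them comes from the article.

```python
import numpy as np

def first_differences(v):
    """Return v[i+1] - v[i] for consecutive entries of a 1-D array."""
    return v[1:] - v[:-1]

f = np.sin                      # hypothetical smooth nonparametric component
c0 = 0.3                        # hypothetical point near which the z_i cluster
z = c0 + 1e-4 * np.arange(6)    # six design points spaced 1e-4 apart

# When f is continuous and the z_i are close together, consecutive
# differences of f(z_i) are small, so the nonparametric term nearly
# drops out of the differenced responses.
residual_f = first_differences(f(z))
max_residual = np.max(np.abs(residual_f))
print(max_residual)
```

In this toy setting what remains of the nonparametric term after differencing is of the same order as the spacing between design points, which is why the linear component can then be estimated as if the nonparametric part were absent.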
The article begins with the conventional F-test. I then discuss the efficiency of the ridge estimator and propose a new test statistic for the "large p, fixed n" setting. The asymptotic distribution of the proposed test statistic under the null hypothesis is established. Extensions to partial linear models are then made in Section 3.

F-Test for "Large n, Small p"
I start by reviewing the F-test for hypothesis (2) given by Rao in [12]. When the sample size is large, the least squares method can be used to estimate the coefficients.
The least squares estimator is $\hat{\beta} = (X'X)^{-1}X'Y$. As proven in [12], under $H_0$ the statistic $F_{n,p}$ in Equation (3) satisfies $F_{n,p} \sim F_{p,n-p}$. Hence, an α-level F-test rejects $H_0$ if $F_{n,p} > F_{p,n-p;\alpha}$, the upper α quantile of the $F_{p,n-p}$ distribution. The F statistic is a monotone function of the likelihood ratio statistic and follows a noncentral F distribution under the alternative (see [13]).
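For orientation, one standard textbook form of the F statistic is sketched below. It assumes that the linear model is $Y = X\beta + \epsilon$ with $\epsilon \sim N(0, \sigma^2 I_n)$ and that the null hypothesis is $H_0: \beta = \beta_0$ for a known vector $\beta_0$. Both are assumptions made here for illustration and may differ in detail from the article's own model and hypothesis.

```latex
\hat{\beta} = (X'X)^{-1}X'Y, \qquad
F_{n,p} = \frac{(\hat{\beta} - \beta_0)'\,X'X\,(\hat{\beta} - \beta_0)/p}
               {(Y - X\hat{\beta})'(Y - X\hat{\beta})/(n-p)}
\;\sim\; F_{p,\,n-p} \quad \text{under } H_0 .
```

The numerator and denominator are independent scaled chi-square variables with p and n - p degrees of freedom, which requires n > p and an invertible $X'X$.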

A New Test Statistic for "Large p, Fixed n"
The F-test defined in Equation (3) has a limitation: it cannot be applied when p is large and n is small. As more and more datasets have a dimension larger than the sample size, a test statistic suited to the "large p, small n" paradigm is needed. Because the least squares estimator $\hat{\beta}$ is inappropriate when p > n, I modify the F-statistic in two respects. The first is to replace the least squares estimator with an appropriate estimator of $\beta$. The second is to find the asymptotic distribution of the new test statistic.
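The reason least squares breaks down when p > n can be checked numerically. The sketch below uses a simulated design matrix with hypothetical sizes n = 10 and p = 200 (not data from the article) and computes the rank of $X'X$.

```python
import numpy as np

rng = np.random.default_rng(0)    # fixed seed so the sketch is reproducible
n, p = 10, 200                    # hypothetical sizes with p > n
X = rng.standard_normal((n, p))   # simulated design matrix, not real data

gram = X.T @ X                    # the p x p matrix X'X
rank = np.linalg.matrix_rank(gram)

# rank(X'X) = rank(X) <= min(n, p), so X'X is singular whenever p > n
# and the inverse required by the least squares estimator does not exist.
print(rank, p)
```

Since the rank cannot exceed n, at least p - n eigenvalues of $X'X$ are zero, which is what motivates replacing the least squares estimator.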
To overcome the singularity of $X'X$ when p > n in model (1), consider using penalizing methods. The ridge estimator $\hat{\beta}$ of $\beta$ in [14] is
$$\hat{\beta} = (X'X + h_p I_p)^{-1} X'Y \tag{4}$$
where $h_p$ is the regularization parameter. Luo in [15] proved that $\hat{\beta}$ in Equation (4) is a mean squared error consistent estimator of $\beta$ under certain conditions. More recently, Luo in [16] proved the mean squared error consistency under less restrictive conditions. The assumptions and the results in [16] are given below.
Assumption A. $1/h_p = o(1)$. For sufficiently large p, there is a vector $b_{p \times 1}$ such that $\beta = X'Xb$. Furthermore, there exists a constant $\varepsilon > 0$ such that each component of $b_{p \times 1}$ is [...].
It was proven in [16] that under Assumptions A and B, $\hat{\beta}$ in Equation (4) is mean squared error consistent for $\beta$ as $p \to \infty$. In this article, I explore more concise asymptotic results about $\hat{\beta}$ under Assumptions A and B. Because $X'X$ can have at most n positive eigenvalues, without loss of generality, let $\lambda_{ip}$ be the $i$th nonzero eigenvalue of $X'X$ and assume $\lambda_{ip} > 0$ for all i = 1, 2, ..., n. Let $\Gamma = (\tau_{ij})_{p \times p}$ be an orthogonal matrix such that
$$\Gamma' X'X\, \Gamma = \begin{pmatrix} \Lambda_{n \times n} & 0 \\ 0 & 0 \end{pmatrix}$$
where $\Lambda_{n \times n}$ is a diagonal matrix with elements $\lambda_{ip}$, i = 1, 2, ..., n.
Theorem 1. Under Assumptions A and B, given that the p covariates are uncorrelated, if $h_p$ is chosen such that $p^{-\varepsilon/2} h_p/\sigma_p = o(1)$ and $\lambda_{ip} = o(h_p)$ for all i = 1, 2, ..., n, I have [...], where $\mathrm{diag}(X'X)$ denotes the diagonal matrix with the diagonal elements of $X'X$.
Proof. Because the random error $\epsilon$ is multivariate normal, $\hat{\beta}$ is multivariate normal [...], where $\mathrm{diag}(X'X)_j$ denotes the $j$th diagonal element of $X'X$. Given that the p covariates are uncorrelated, I conclude [...]. As in [16], the bias of $\hat{\beta}_j$ [...], and the assumption that $p^{-\varepsilon/2} h_p/\sigma_p = o(1)$ guarantees [...]. Along with the result in (5), this completes the proof of Theorem 1.
Now I can modify $F_{n,p}$ into a test statistic for the "large p, fixed n" paradigm. Define the statistic $G_{n,p}$ as in Equation (6). Under Assumptions A and B, as $p \to \infty$, $\hat{\beta}$ in Equation (4) is mean squared error consistent for $\beta$, which implies that $\hat{\beta}$ converges in probability to $\beta$. By the continuous mapping theorem, [...] converges to 1. By Slutsky's theorem, I conclude that under $H_0$ the test statistic $G_{n,p}$ in Equation (6) converges in distribution to $\chi^2_n$ as $p \to \infty$ with n a fixed constant. Hence, an α-level test based on $G_{n,p}$ rejects $H_0$ if $G_{n,p} > \chi^2_{n;\alpha}$, the upper α quantile of the $\chi^2_n$ distribution.
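The ridge estimator $(X'X + h_p I_p)^{-1}X'Y$ stays computable when p > n, and a short sketch makes this concrete. The function name ridge_estimator, the simulated data, and the values n = 8, p = 500, h = 50 are hypothetical choices for illustration. The sketch also relies on the standard matrix identity $(X'X + hI_p)^{-1}X' = X'(XX' + hI_n)^{-1}$, which is not stated in the article but lets the computation run through an n × n system instead of a p × p one.

```python
import numpy as np

def ridge_estimator(X, Y, h):
    """Ridge estimator (X'X + h I_p)^{-1} X'Y, computed via an n x n system.

    Uses the identity (X'X + h I_p)^{-1} X' = X' (XX' + h I_n)^{-1},
    valid for any h > 0, to avoid forming and solving a p x p system.
    """
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + h * np.eye(n), Y)

rng = np.random.default_rng(1)
n, p, h = 8, 500, 50.0             # hypothetical sizes and regularization value
X = rng.standard_normal((n, p))    # simulated design, not data from the article
Y = rng.standard_normal(n)         # simulated response

beta_hat = ridge_estimator(X, Y, h)

# Direct evaluation of the p x p form for comparison (costly for large p).
beta_direct = np.linalg.solve(X.T @ X + h * np.eye(p), X.T @ Y)
agree = np.allclose(beta_hat, beta_direct)
print(agree)
```

With n fixed and p growing, the n × n form keeps the cost of each solve independent of p apart from the matrix products, which suits the "large p, fixed n" setting.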

Extension to Partial Linear Models
Partial linear models are more flexible than standard linear models. They can be a suitable choice when one suspects that the response $Y$ depends linearly on $X$ but is nonlinearly related to $Z$. Consider a fixed design version of the partial linear model, which has the matrix form
$$Y = X\beta + f(Z) + \epsilon \tag{7}$$
where $Y = (Y_1, Y_2, \ldots, Y_{n+1})'$, $X$ is an $(n+1) \times p$ matrix whose $i$th row is given by $x_i$, the p covariates of $x_i$ are uncorrelated, $f(Z) = (f(z_1), f(z_2), \ldots, f(z_{n+1}))'$ for an unknown function $f$, and $\epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_{n+1})'$ is normally distributed with mean vector 0 and covariance matrix $\sigma_p^2 I_{n+1}$. Estimators of the linear component for the n > p situation have been discussed in [17][18][19]. Because these methods are not applicable for p > n, I propose the following procedure to obtain a statistic for hypothesis (2) in the partial linear model (7).
Assume that the sequence $\{z_i\} \to c_0$ as $p \to \infty$ for all i = 1, 2, ..., n + 1, where $c_0$ is a finite constant, and that the unknown function $f$ is continuous at the point $c_0$. Consider the differenced model in Equation (8). Since $z_i \to c_0$ for all i = 1, 2, ..., n + 1, for any $\psi > 0$ there exists a large enough p such that [...]. The function $f$ is continuous at the point $c_0$, so for a large enough p we have [...], which implies that for a finite n the relation in Equation (9) holds.
Define the matrix $D$ as in Equation (10). The matrix form of Equation (8) is
$$DY = DX\beta + Df(Z) + D\epsilon \tag{11}$$
Because of Equation (9), I can ignore the nonparametric part in model (11). Thus, (11) becomes
$$DY = DX\beta + D\epsilon \tag{12}$$
where the matrix $D$ is given in (10). Luo in [20] examined the asymptotic distribution of the ridge estimator of $\beta$ in (12). The random errors $D\epsilon$ are not independent, so the following procedure is crucial for extending the previous results. Without loss of generality, assume the sample size n is even. Define $D_1$ and $D_2$ (see Equations (13) and (14)). Equation (12) then becomes
$$D_1 Y = D_1 X\beta + D_1\epsilon \tag{15}$$
and
$$D_2 Y = D_2 X\beta + D_2\epsilon \tag{16}$$
Notice that $D_1\epsilon \sim N(0, \sigma_p^2 I_{n/2})$ and $D_2\epsilon \sim N(0, \sigma_p^2 I_{n/2})$. Now I can apply the results of Section 2.2 to model (15) and model (16). It follows that the two statistics for testing hypothesis (2) in models (15) and (16) are given by [...], where the ridge estimator in model (16) is $(X'D_2'D_2X + h_p I_p)^{-1}X'D_2'D_2Y$, and $\sigma_p$ and $h_p$ are chosen such that $p^{-\varepsilon} h_p = o(1)$ and $\sigma_p = o(h_p^{0.5})$. When all assumptions for Theorem 1 hold, under $H_0$, both statistics converge in distribution to $\chi^2_n$ as $p \to \infty$. Hence, the de-