
Model averaging has attracted increasing attention in recent years for the analysis of high-dimensional data. By suitably weighting several competing statistical models, model averaging attempts to achieve stable and improved prediction. To give a better understanding of the available model averaging methods, their properties, and the relationships between them, this paper reviews recent progress in high-dimensional model averaging from the frequentist perspective. Some future research topics are also discussed.

With the advent of high-throughput technologies, high-dimensional data have been frequently generated for the understanding of biological processes such as disease occurrence and cancer study. Motivated by these important applications, there has been a dramatic development in the statistical analysis of high-dimensional data; see [

Model selection and model averaging are two approaches used to improve estimation and prediction in regression problems. Model selection assigns weight 1 to a single optimal model and weight 0 to all other candidate models, which yields a parsimonious and compact representation of the data. In recent years, shrinkage methods have become popular because they achieve model selection and parameter estimation simultaneously. Such methods include, but are not limited to, the least absolute shrinkage and selection operator (LASSO, Tibshirani [

However, model selection ignores the additional uncertainty introduced by the selection step itself, or even introduces bias, and therefore often underestimates variance [

Instead of relying on only one model, model averaging compromises across a set of competing models by assigning different weights. In doing so, model uncertainty is incorporated into the conclusions about the unknown parameters. Besides, if the weights can be properly determined, then prediction performance could be enhanced [

Regarding model averaging techniques, Frequentist Model Averaging (FMA) and Bayesian Model Averaging (BMA) are two different approaches in the literature. Compared with FMA, BMA has a much more extensive literature; it accounts for model uncertainty by placing a prior probability on each candidate model. For an overview of BMA, see [

The aim of this paper is to review current FMA methods for high-dimensional linear models. Methods for FMA estimation are surveyed in Section 2. Some future research topics are discussed in Section 3.

So far, most model averaging approaches have been developed for the classical setting in which the number of observations is greater than the number of predictors, with the main focus on determining the weights for the individual models. These approaches include Akaike information criterion model averaging (AIC, Akaike [

However, model averaging in the high-dimensional setting has only recently been studied. This setting is very different from the fixed-dimensional case: many fixed-dimensional model averaging procedures either do not work at all or require theoretical or computational adjustments before they can be implemented.

Given the dataset of n observations, a linear regression model takes the form of

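$$y_i = \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i, \quad i = 1, 2, \cdots, n, \tag{1}$$

where $y_i$ is the response in the $i$th trial, $x_{i1}, \cdots, x_{ip}$ are the predictors, $\beta_1, \cdots, \beta_p$ are the regression coefficients, and $\epsilon_i$ is the error term. Alternatively, in matrix form, model (1) can be written as

$$y = X\beta + \epsilon, \tag{2}$$

where

$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}, \quad X = \begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}.$$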
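To make model (2) concrete when $p > n$, the following minimal sketch simulates data from it and checks that two different coefficient vectors give exactly the same fitted values, so the least-squares estimate is not unique. The function name `simulate_linear_model`, the sparse choice of $\beta$, and the Gaussian design are illustrative assumptions, not part of any method reviewed here.

```python
import numpy as np

def simulate_linear_model(n, p, n_signal, seed=0):
    """Draw (X, y, beta) from y = X beta + eps (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))      # assumption: i.i.d. Gaussian design
    beta = np.zeros(p)
    beta[:n_signal] = 1.0                # assumption: a few nonzero coefficients
    eps = rng.standard_normal(n)
    return X, X @ beta + eps, beta

X, y, beta = simulate_linear_model(n=20, p=50, n_signal=3)

# Minimum-norm least-squares solution via the pseudoinverse.
beta_min_norm = np.linalg.pinv(X) @ y

# With p > n, X has a nontrivial null space; the last right singular
# vector lies in it, so adding it to a solution leaves X @ beta unchanged.
_, _, Vt = np.linalg.svd(X)
null_direction = Vt[-1]
beta_other = beta_min_norm + 5.0 * null_direction

same_fit = np.allclose(X @ beta_min_norm, X @ beta_other)
print("identical fitted values:", same_fit)
print("identical coefficients:", np.allclose(beta_min_norm, beta_other))
```

This non-uniqueness is why the high-dimensional procedures below first build low-dimensional candidate models and only then fit each one by least squares.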

Developing for the data in which the number of predictors p is much greater than the number of observations n, Ando and Li [

After the candidate models and their corresponding least-squares predicted values $\{\hat{u}_1, \cdots, \hat{u}_M\}$ are obtained, the second stage of the procedure of [

$$\hat{w} = \arg\min_{w} \mathrm{CV}(w), \qquad \mathrm{CV}(w) = (y - \tilde{u})'(y - \tilde{u}), \tag{3}$$

where $\tilde{u} = \sum_{k=1}^{M} w_k \tilde{u}_k$. Finally, the model averaging predicted value $\hat{u}$ is expressed as

$$\hat{u} = \sum_{k=1}^{M} \hat{w}_k \hat{u}_k. \tag{4}$$
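The weight step in (3) and the final combination in (4) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the candidate models are taken to be consecutive blocks of columns (a placeholder for whatever grouping the first stage produces), $\tilde{u}_k$ is taken to be the leave-one-out prediction of model $k$, the weights are restricted to $[0, 1]^M$, and the minimization uses a simple projected gradient loop. The names `loo_predictions` and `cv_weights` are hypothetical.

```python
import numpy as np

def loo_predictions(X_k, y):
    """Leave-one-out least-squares predictions for one candidate model.

    Uses the identity y_i - u_tilde_i = (y_i - fitted_i) / (1 - h_ii),
    which holds when X_k has full column rank (fewer columns than rows).
    """
    H = X_k @ np.linalg.pinv(X_k)        # hat matrix of the candidate model
    fitted = H @ y
    return y - (y - fitted) / (1.0 - np.diag(H))

def cv_weights(U_tilde, y, n_iter=5000):
    """Minimize (y - U w)'(y - U w) over w in [0, 1]^M (assumed constraint)."""
    M = U_tilde.shape[1]
    w = np.full(M, 1.0 / M)              # feasible starting point
    # Step size 1/L, where L = 2 * sigma_max(U)^2 bounds the gradient's slope.
    step = 1.0 / (2.0 * np.linalg.norm(U_tilde, 2) ** 2)
    for _ in range(n_iter):
        grad = -2.0 * U_tilde.T @ (y - U_tilde @ w)
        w = np.clip(w - step * grad, 0.0, 1.0)   # project back onto the box
    return w

# Toy data with p > n; only the first three predictors matter.
rng = np.random.default_rng(1)
n, p, M = 40, 60, 6
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.standard_normal(n)

# Placeholder grouping: M blocks of 10 consecutive columns each.
groups = np.array_split(np.arange(p), M)
U_hat = np.column_stack([X[:, g] @ np.linalg.pinv(X[:, g]) @ y for g in groups])
U_tilde = np.column_stack([loo_predictions(X[:, g], y) for g in groups])

w_hat = cv_weights(U_tilde, y)           # weights as in (3)
u_hat = U_hat @ w_hat                    # model averaging prediction, as in (4)
print(np.round(w_hat, 3))
```

Each candidate model has fewer predictors than observations, so its least-squares fit is well defined even though the full model is not.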

There are several contributions of Ando and Li [

Following [

Nevertheless, Lin et al. [

To reduce the variance of estimators, Lin et al. [

$$z_{ik} = \frac{\sum_{b \in I_i} x_i' \hat{\beta}(b, \hat{M}_{\lambda_k})}{|I_i|}, \tag{5}$$

where $I_i$ is the set of indexes of the test datasets $D_{\mathrm{test}}^{b}$ that contain observation $i$. After $z_{ik}$ is determined, the optimal weight vector $w$ is estimated by minimizing

$$\sum_{i=1}^{n} \left( y_i - \sum_{k=1}^{K} w_k z_{ik} \right)^2. \tag{6}$$

Finally, the model averaging predicted value takes the form of

$$\hat{u} = \frac{1}{B} \sum_{k=1}^{K} \left( \hat{w}_k \sum_{b=1}^{B} \hat{u}(b, \hat{M}_{\lambda_k}) \right). \tag{7}$$
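A rough sketch of the computations in (5) to (7) is given below. Several simplifications are assumptions made for illustration only: the $K$ candidate models are passed in as fixed column index sets instead of being selected at tuning parameters $\lambda_k$ within each split, the $B$ splits are plain random partitions, each model is fitted by least squares on the training part, the weights in (6) are left unconstrained, and $\hat{u}(b, \hat{M}_{\lambda_k})$ is taken to be the prediction for all $n$ observations. The function name `split_average` is hypothetical.

```python
import numpy as np

def split_average(X, y, models, B=50, test_frac=0.3, seed=0):
    """Repeated-split model averaging, loosely following (5)-(7)."""
    rng = np.random.default_rng(seed)
    n, K = X.shape[0], len(models)
    n_test = max(1, int(round(test_frac * n)))
    z_sum = np.zeros((n, K))    # running sums for the numerator of (5)
    counts = np.zeros(n)        # |I_i|: how often observation i was held out
    u_sum = np.zeros((n, K))    # running sums over b of u_hat(b, model k)
    for _ in range(B):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        counts[test] += 1
        for k, cols in enumerate(models):
            beta = np.linalg.lstsq(X[np.ix_(train, cols)], y[train], rcond=None)[0]
            pred = X[:, cols] @ beta
            z_sum[test, k] += pred[test]   # only held-out predictions enter (5)
            u_sum[:, k] += pred
    seen = counts > 0                       # guard against never-held-out rows
    Z = z_sum[seen] / counts[seen, None]    # z_ik as in (5)
    # Unconstrained least squares for (6); any weight constraint is omitted here.
    w_hat = np.linalg.lstsq(Z, y[seen], rcond=None)[0]
    u_hat = (u_sum @ w_hat) / B             # combination as in (7)
    return Z, w_hat, u_hat

rng = np.random.default_rng(2)
n, p = 60, 100
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.standard_normal(n)
models = [np.arange(m) for m in (2, 5, 10)]  # hypothetical nested candidates
Z, w_hat, u_hat = split_average(X, y, models)
print(np.round(w_hat, 3))
```

Because $z_{ik}$ averages only predictions made while observation $i$ was held out, the weights in (6) are fitted to out-of-sample predictions, and averaging over the $B$ splits is what reduces the variance of the resulting estimator.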

The procedure of [

In this paper, we have reviewed the development of the FMA approach for high-dimensional linear regression models. The performance of FMA procedures depends heavily on the choice of weights, since different weights lead to different risks and asymptotic properties. Consequently, much of the current work focuses on weight choice to achieve stable prediction. Another issue is how to handle high-dimensional settings, in which the least-squares estimates are not unique. The general idea is to reduce the dimension first and then to combine the resulting low-dimensional models using appropriate weights.

Although substantial progress has been made recently, research on the FMA approach is still relatively new, and many problems remain open.

One possible direction is the extension of the FMA approach to other modeling settings, including the generalized linear mixed model and the Cox proportional hazards model, both of which are widely used in biological and medical research. For example, Zhang and Zou [

We also note that missing values are quite common in high-dimensional data, which leaves space for further research in model averaging. Schomaker et al. [

Finally, much of the current research on weight choice focuses on non-negative weights; it would be interesting to explore whether the weights can be further relaxed to allow negative values. These and many other unsettled issues deserve further investigation.

The authors are grateful for a grant from Shandong University (IFYT18032).

Fu, P.P. and Pan, J. (2018) A Review on High-Dimensional Frequentist Model Averaging. Open Journal of Statistics, 8, 513-518. https://doi.org/10.4236/ojs.2018.83033