
In recent years, variable selection based on penalized likelihood methods has attracted considerable attention. Using a Gibbs sampling algorithm based on the asymmetric Laplace distribution, this paper considers quantile regression with adaptive Lasso and Lasso penalties from a Bayesian point of view. Several regularized quantile regression methods under the non-Bayesian and Bayesian frameworks are systematically compared for error terms with different distributions and with heteroscedasticity. The simulation results show that, at all quantiles, Bayesian regularized quantile regression performs better when the error term follows an asymmetric Laplace distribution than under the other error distributions. Moreover, based on the asymmetric Laplace distribution, the Bayesian regularized quantile regression approach outperforms the non-Bayesian approaches in both parameter estimation and prediction. Real data analyses confirm these conclusions.

Since the pioneering work of Koenker and Bassett in 1978, quantile regression (QR) has been studied in depth and widely applied to describe the detailed relationship between the dependent variable and the predictors [

In 2004, Koenker first added the Lasso regularization method to the mixed-effects quantile regression model, where the Lasso penalty shrinks the random effects toward zero [

Building on the existing literature, we implement Bayesian quantile regression by expressing the asymmetric Laplace distribution as a scale mixture of the standard normal and standard exponential distributions, and we use a Gibbs sampler to draw from the posterior distributions of the parameters. Regularized quantile regression under the Bayesian framework is then compared with non-Bayesian regularized quantile regression methods. Finally, the prostate cancer data set is used to illustrate the advantages and disadvantages of the two approaches.

Given data $\{(x_i, y_i),\ i = 1, \dots, n\}$, where $x_i' = (x_{i1}, x_{i2}, \dots, x_{ik})$ is the covariate vector and $y = (y_1, \dots, y_n)'$ is the response vector, the $\theta$th quantile regression model for the response $y_i$ given $x_i$ takes the form

$$Q_{y_i}(\theta \mid x_i) = x_i'\beta(\theta), \tag{1}$$

where $Q_{y_i}(\theta \mid x_i) = F_{y_i}^{-1}(\theta \mid x_i)$ is the inverse of the conditional cumulative distribution function of $y_i$, and $\beta(\theta)$ is the vector of unknown coefficients, which depends on the quantile level $\theta \in (0, 1)$.

The regression parameter $\beta$ can be estimated by minimizing the objective function

$$\min_{\beta} \sum_{i=1}^{n} \rho_\theta(y_i - x_i'\beta), \tag{2}$$

where $\rho_\theta(u) = u\,(\theta - I(u < 0))$ is the check loss function and $I(\cdot)$ denotes the indicator function.
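The check loss in (2) can be illustrated with a minimal sketch (Python with NumPy; the helper names `check_loss` and `intercept_only_qr` are hypothetical, not from the paper). For an intercept-only model, the objective (2) is convex and piecewise linear in the intercept with kinks at the observations, so a minimizer can be found by evaluating it at the data points; the result is a sample $\theta$-quantile. General designs are usually handled by linear programming, which this sketch does not attempt.

```python
import numpy as np

def check_loss(u, theta):
    """Quantile check loss rho_theta(u) = u * (theta - I(u < 0))."""
    u = np.asarray(u, dtype=float)
    return u * (theta - (u < 0))

def intercept_only_qr(y, theta):
    """Minimize sum_i rho_theta(y_i - b) over b for the model y_i = b + error.

    The objective is convex and piecewise linear with kinks at the y_i,
    so it is enough to evaluate it at the observed data points.
    """
    y = np.asarray(y, dtype=float)
    objective = [check_loss(y - b, theta).sum() for b in y]
    return y[int(np.argmin(objective))]

y = np.arange(1.0, 12.0)          # 1, 2, ..., 11
print(intercept_only_qr(y, 0.3))  # 4.0, the ceil(11 * 0.3) = 4th order statistic
```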

In 2001, Yu and Moyeed [

$$f(y \mid \mu, \sigma, \theta) = \frac{\theta(1-\theta)}{\sigma} \exp\left\{ -\frac{\rho_\theta(y - \mu)}{\sigma} \right\}, \tag{3}$$

where $\sigma$ is the scale parameter, $\mu$ is the location parameter and $\theta$ is the asymmetry parameter.

Then the likelihood function of the sample $y = (y_1, y_2, \dots, y_n)'$ can be expressed as

$$L(y \mid \mu, \sigma, \theta) = \frac{\theta^n (1-\theta)^n}{\sigma^n} \exp\left\{ -\frac{1}{\sigma} \sum_{i=1}^{n} \rho_\theta(y_i - \mu) \right\}. \tag{4}$$
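As a sanity check on (3) and (4), the following sketch (assuming NumPy; `ald_pdf` and `ald_loglik` are illustrative names) evaluates the ALD density, confirms numerically with the trapezoidal rule that it integrates to one, and computes the log of the likelihood (4).

```python
import numpy as np

def ald_pdf(y, mu, sigma, theta):
    """Asymmetric Laplace density (3): theta(1-theta)/sigma * exp(-rho_theta(y - mu)/sigma)."""
    u = np.asarray(y, dtype=float) - mu
    rho = u * (theta - (u < 0))
    return theta * (1 - theta) / sigma * np.exp(-rho / sigma)

def ald_loglik(y, mu, sigma, theta):
    """Log of the likelihood (4) for an i.i.d. sample y."""
    u = np.asarray(y, dtype=float) - mu
    rho = u * (theta - (u < 0))
    n = u.size
    return n * np.log(theta * (1 - theta)) - n * np.log(sigma) - rho.sum() / sigma

# The density should integrate to one (trapezoidal rule on a wide grid).
grid = np.linspace(-200.0, 200.0, 400001)
pdf = ald_pdf(grid, 0.0, 2.0, 0.3)
area = np.sum((pdf[1:] + pdf[:-1]) / 2 * np.diff(grid))
print(round(area, 4))  # 1.0
```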

Tsionas [

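For $\omega = \dfrac{1 - 2\theta}{\theta(1 - \theta)}$ and $\phi = \sqrt{\dfrac{2}{\theta(1 - \theta)}}$, if $u \sim \mathrm{ALD}(0, \sigma, \theta)$, the random variable $u$ can be represented as $u = \omega z + \phi\sigma^{-1/2}\sqrt{z}\,\upsilon$, where $z$ is exponentially distributed and $\upsilon \sim N(0, 1)$ is independent of $z$. Therefore, the response $y_i$ of the quantile regression model can be written equivalently as

$$y_i = x_i'\beta + \omega z_i + \phi\sigma^{-1/2}\sqrt{z_i}\,\upsilon_i. \tag{5}$$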
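The representation behind (5) can be checked by simulation. The sketch below (assuming NumPy; `rald_mixture` is a hypothetical helper) draws $z$ from the exponential prior $p(z \mid \sigma) = \sigma \exp\{-\sigma z\}$ used in the hierarchical model (14) and $\upsilon$ from $N(0, 1)$, and checks that the fraction of negative draws is close to $\theta$, i.e. that the $\theta$th quantile of the error term is zero.

```python
import numpy as np

def rald_mixture(n, sigma, theta, rng):
    """Draw u = omega*z + phi*sigma**(-1/2)*sqrt(z)*v as in (5) with x_i'beta = 0.

    Assumption (matching (14)): z ~ Exponential(rate=sigma), v ~ N(0, 1).
    """
    omega = (1 - 2 * theta) / (theta * (1 - theta))
    phi = np.sqrt(2 / (theta * (1 - theta)))
    z = rng.exponential(scale=1 / sigma, size=n)   # NumPy uses scale = 1 / rate
    v = rng.standard_normal(n)
    return omega * z + phi / np.sqrt(sigma) * np.sqrt(z) * v

rng = np.random.default_rng(0)
u = rald_mixture(200_000, sigma=2.0, theta=0.3, rng=rng)
print(np.mean(u < 0))  # close to theta = 0.3
```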

The Bayesian quantile regression parameter estimation model with Lasso penalty (Li and Zhu) [

$$\min_{\beta} \sum_{i=1}^{n} \rho_\theta(y_i - x_i'\beta) + \lambda \sum_{j=1}^{k} |\beta_j|, \tag{6}$$

Li et al. [

Bayesian quantile regression with adaptive Lasso penalty (BQR-AL) applies different penalty parameters to different regression coefficients, so its parameter estimation model is

$$\min_{\beta} \sum_{i=1}^{n} \rho_\theta(y_i - x_i'\beta) + \sum_{j=1}^{k} |\lambda_j \beta_j|, \tag{7}$$
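Objectives (6) and (7) differ only in the penalty term. A minimal sketch (assuming NumPy; the function names are hypothetical) makes this explicit: with every $\lambda_j$ equal to $\lambda$, (7) reduces to (6). It only evaluates the objectives; minimizing them (for example by linear programming) is not shown.

```python
import numpy as np

def qr_lasso_objective(beta, X, y, theta, lam):
    """Objective (6): sum_i rho_theta(y_i - x_i'beta) + lam * sum_j |beta_j|."""
    r = y - X @ beta
    return np.sum(r * (theta - (r < 0))) + lam * np.sum(np.abs(beta))

def qr_adaptive_lasso_objective(beta, X, y, theta, lams):
    """Objective (7): same check loss, but coefficient j has its own penalty lams[j]."""
    r = y - X @ beta
    return np.sum(r * (theta - (r < 0))) + np.sum(np.abs(lams * beta))
```

In the adaptive version a large `lams[j]` penalizes coefficient `j` heavily, which is how noise variables can be pushed toward zero while important coefficients are shrunk less.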

Alhamzawi and Yu [

$$p(\beta_j \mid \sigma, \lambda_j) = \frac{\sigma^{1/2}}{2\lambda_j} \exp\left\{ -\frac{\sigma^{1/2} |\beta_j|}{\lambda_j} \right\}, \tag{8}$$

Andrews and Mallows [

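$$\frac{\xi}{2} \exp\{ -\xi |t| \} = \int_0^\infty \frac{1}{\sqrt{2\pi s}} \exp\left\{ -\frac{t^2}{2s} \right\} \frac{\xi^2}{2} \exp\left\{ -\frac{\xi^2 s}{2} \right\} ds, \quad \xi > 0. \tag{9}$$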
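Identity (9) can be verified numerically. This sketch (assuming NumPy; the names are illustrative) integrates the right-hand side of (9) with the trapezoidal rule on a logarithmic grid in $s$ and compares it with the Laplace density on the left. The grid limits are an assumption that is adequate for moderate nonzero $t$ and $\xi$; at $t = 0$ the truncated lower tail of the integral is no longer negligible.

```python
import numpy as np

def laplace_lhs(t, xi):
    """Left side of (9): the Laplace density (xi/2) exp(-xi |t|)."""
    return xi / 2 * np.exp(-xi * abs(t))

def mixture_rhs(t, xi, n_grid=20001):
    """Right side of (9): N(t; 0, s) mixed over s ~ Exponential(rate = xi^2 / 2).

    Integrates over u = log s, since the integrand varies over many orders
    of magnitude in s.
    """
    u = np.linspace(-20.0, 6.0, n_grid)
    s = np.exp(u)
    normal = np.exp(-t**2 / (2 * s)) / np.sqrt(2 * np.pi * s)
    mixing = xi**2 / 2 * np.exp(-xi**2 * s / 2)
    f = normal * mixing * s                      # ds = s du
    return np.sum((f[1:] + f[:-1]) / 2 * np.diff(u))

print(laplace_lhs(0.7, 1.5), mixture_rhs(0.7, 1.5))  # the two values agree
```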

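Letting $\eta = \sigma^{1/2}/\lambda_j$, (8) can be written as

$$p(\beta_j \mid \sigma, \lambda_j) = \frac{\eta}{2} \exp\{ -\eta |\beta_j| \}, \tag{10}$$

which, by (9), is equivalent to

$$p(\beta_j \mid \sigma, \lambda_j) = \int_0^\infty \frac{1}{\sqrt{2\pi s_j}} \exp\left\{ -\frac{\beta_j^2}{2 s_j} \right\} \frac{\eta^2}{2} \exp\left\{ -\frac{\eta^2 s_j}{2} \right\} ds_j, \tag{11}$$

so that

$$p(\beta_j \mid \sigma, \lambda_j^2) = \int_0^\infty \frac{1}{\sqrt{2\pi s_j}} \exp\left\{ -\frac{\beta_j^2}{2 s_j} \right\} \frac{\sigma}{2\lambda_j^2} \exp\left\{ -\frac{\sigma s_j}{2\lambda_j^2} \right\} ds_j. \tag{12}$$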

We place an inverse gamma prior on $\lambda_j^2$, with density

$$p(\lambda_j^2 \mid \delta, \gamma) = \frac{\gamma^\delta}{\Gamma(\delta)} (\lambda_j^2)^{-1-\delta} \exp\left\{ -\frac{\gamma}{\lambda_j^2} \right\}, \tag{13}$$

where $\delta > 0$ and $\gamma > 0$ are two hyperparameters. Yi and Xu [

In summary, the Bayesian quantile regression hierarchical model with adaptive Lasso penalty is

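$$\begin{aligned}
& y_i = x_i'\beta + \omega z_i + \phi\sigma^{-1/2}\sqrt{z_i}\,\upsilon_i, \quad p(\beta_0) \propto 1, \quad p(\upsilon_i) = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{\upsilon_i^2}{2} \right\}, \quad p(z_i \mid \sigma) = \sigma \exp\{ -\sigma z_i \}, \\
& p(\beta_j, s_j \mid \sigma, \lambda_j^2) = \frac{1}{\sqrt{2\pi s_j}} \exp\left\{ -\frac{\beta_j^2}{2 s_j} \right\} \frac{\sigma}{2\lambda_j^2} \exp\left\{ -\frac{\sigma s_j}{2\lambda_j^2} \right\}, \quad p(\lambda_j^2 \mid \delta, \gamma) = \frac{\gamma^\delta}{\Gamma(\delta)} (\lambda_j^2)^{-1-\delta} \exp\left\{ -\frac{\gamma}{\lambda_j^2} \right\}, \\
& p(\sigma) \propto \sigma^{a-1} \exp\{ -b\sigma \}, \quad p(\gamma, \delta) \propto \gamma^{-1}.
\end{aligned} \tag{14}$$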

From the hierarchical model, the joint posterior density of all parameters is

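$$\begin{aligned}
p(\beta, z, s, \sigma, \lambda_1^2, \dots, \lambda_k^2, \gamma, \delta \mid y, X)
&\propto p(y \mid \beta, z, \sigma, X) \prod_{i=1}^{n} p(z_i \mid \sigma) \prod_{j=1}^{k} p(\beta_j, s_j \mid \sigma, \lambda_j^2)\, p(\lambda_j^2 \mid \gamma, \delta)\; p(\sigma)\, p(\gamma, \delta) \\
&\propto \prod_{i=1}^{n} \frac{\sigma}{\sqrt{\sigma^{-1}\phi^2 z_i}} \exp\left\{ -\frac{\sigma (y_i - x_i'\beta - \omega z_i)^2}{2\phi^2 z_i} - \sigma z_i \right\} \\
&\quad \times \prod_{j=1}^{k} \frac{1}{\sqrt{2\pi s_j}} \exp\left\{ -\frac{\beta_j^2}{2 s_j} \right\} \frac{\sigma}{2\lambda_j^2} \exp\left\{ -\frac{\sigma s_j}{2\lambda_j^2} \right\} \frac{\gamma^\delta}{\Gamma(\delta)} (\lambda_j^2)^{-1-\delta} \exp\left\{ -\frac{\gamma}{\lambda_j^2} \right\} \\
&\quad \times \gamma^{-1} \sigma^{a-1} \exp\{ -b\sigma \},
\end{aligned}$$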

where

$y = (y_1, y_2, \dots, y_n)$, $X = (x_1, x_2, \dots, x_n)$, $z = (z_1, z_2, \dots, z_n)$, $s = (s_1, s_2, \dots, s_k)$.

The full conditional posterior distributions of the parameters are

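$$\begin{aligned}
&\beta_0 \mid \cdot \sim N(\bar\beta_0, s_{\beta_0}^2), \quad \beta_j \mid \cdot \sim N(\bar\beta_j, s_{\beta_j}^2), \quad z_i \mid \cdot \sim \mathrm{GIG}\left(\tfrac{1}{2}, \alpha, l_i\right), \quad s_j \mid \cdot \sim \mathrm{GIG}\left(\tfrac{1}{2}, \frac{\sigma}{\lambda_j^2}, \beta_j^2\right), \\
&\sigma \mid \cdot \sim G(a_1, a_2), \quad \lambda_j^2 \mid \cdot \sim \mathrm{IG}\left(1 + \delta, \frac{\sigma s_j}{2} + \gamma\right), \quad \gamma \mid \cdot \sim G\left(k\delta, \sum_{j=1}^{k} \lambda_j^{-2}\right),
\end{aligned} \tag{15}$$

where $\mathrm{GIG}(p, a, b)$ denotes the generalized inverse Gaussian distribution with density proportional to $x^{p-1}\exp\{-(ax + b/x)/2\}$, $G(a_1, a_2)$ denotes the gamma distribution with shape $a_1$ and rate $a_2$, and $\mathrm{IG}$ denotes the inverse gamma distribution.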

Here

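$$\alpha = \sigma\omega^2\phi^{-2} + 2\sigma, \quad l_i = \sigma\phi^{-2}(y_i - x_i'\beta)^2, \quad a_1 = \frac{3}{2}n + k + a, \quad a_2 = \sum_{i=1}^{n}\left[\frac{(y_i - x_i'\beta - \omega z_i)^2}{2\phi^2 z_i} + z_i\right] + \sum_{j=1}^{k}\frac{s_j}{2\lambda_j^2} + b. \tag{16}$$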
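To make the sampling scheme (15)-(16) concrete, here is a minimal NumPy sketch of a Gibbs sampler for BQR-AL. It is an illustration under stated assumptions, not the authors' code. Assumptions: the intercept $\beta_0$ is unpenalized with a flat prior and is added internally; the hyperparameters $a = b = 1$, the starting values and the function name `bqr_al_gibbs` are hypothetical; $\beta$ is drawn in one block from its joint normal full conditional (implied by (14)) rather than coordinate by coordinate; each $\mathrm{GIG}(1/2, a, b)$ draw uses the fact that its reciprocal is inverse Gaussian with mean $\sqrt{a/b}$ and shape $a$ (NumPy's `Generator.wald`); and $\delta$, whose full conditional has no closed form (see the next paragraph), is drawn from a discretized grid ("griddy Gibbs") as a stand-in for the rejection sampler discussed there.

```python
import numpy as np
from math import lgamma

def bqr_al_gibbs(X, y, theta, n_iter=2000, burn=1000, a=1.0, b=1.0, seed=0):
    """Sketch of a Gibbs sampler for the BQR-AL hierarchy (14)-(16).

    X holds the k covariates without an intercept column; an intercept with a
    flat prior is added internally. Returns post-burn-in draws of
    (beta_0, beta_1, ..., beta_k).
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])
    omega = (1 - 2 * theta) / (theta * (1 - theta))
    phi2 = 2 / (theta * (1 - theta))
    beta, z, s, lam2 = np.zeros(k + 1), np.ones(n), np.ones(k), np.ones(k)
    sigma, gam, delta = 1.0, 1.0, 1.0
    delta_grid = np.linspace(0.05, 10.0, 200)          # hypothetical grid for delta
    lgamma_grid = np.array([lgamma(d) for d in delta_grid])
    draws = []
    for it in range(n_iter):
        # beta | . : y_i ~ N(x_i'beta + omega z_i, phi2 z_i / sigma), beta_j ~ N(0, s_j)
        w = sigma / (phi2 * z)
        prec = Xd.T @ (Xd * w[:, None])
        prec[1:, 1:] += np.diag(1.0 / s)               # no shrinkage on the intercept
        chol = np.linalg.cholesky(prec)                # prec = chol @ chol.T
        mean = np.linalg.solve(prec, Xd.T @ (w * (y - omega * z)))
        beta = mean + np.linalg.solve(chol.T, rng.standard_normal(k + 1))
        r = y - Xd @ beta
        # z_i | . ~ GIG(1/2, alpha, l_i)  <=>  1/z_i ~ InverseGaussian(sqrt(alpha/l_i), alpha)
        alpha = sigma * omega**2 / phi2 + 2 * sigma
        l = np.maximum(sigma * r**2 / phi2, 1e-12)
        z = 1.0 / rng.wald(np.sqrt(alpha / l), alpha)
        # s_j | . ~ GIG(1/2, sigma / lam2_j, beta_j^2)
        a_s = sigma / lam2
        s = 1.0 / rng.wald(np.sqrt(a_s / np.maximum(beta[1:]**2, 1e-12)), a_s)
        # sigma | . ~ Gamma(a1, rate a2); NumPy's gamma takes scale = 1 / rate
        a1 = 1.5 * n + k + a
        a2 = np.sum((r - omega * z)**2 / (2 * phi2 * z) + z) + np.sum(s / (2 * lam2)) + b
        sigma = rng.gamma(a1, 1.0 / a2)
        # lam2_j | . ~ InverseGamma(1 + delta, sigma s_j / 2 + gamma)
        lam2 = 1.0 / rng.gamma(1 + delta, 1.0 / (sigma * s / 2 + gam))
        # gamma | . ~ Gamma(k delta, rate sum_j 1 / lam2_j)
        gam = rng.gamma(k * delta, 1.0 / np.sum(1.0 / lam2))
        # delta | . : griddy-Gibbs stand-in for adaptive rejection sampling
        logp = -k * lgamma_grid + delta_grid * (k * np.log(gam) - np.sum(np.log(lam2)))
        p = np.exp(logp - logp.max())
        delta = rng.choice(delta_grid, p=p / p.sum())
        if it >= burn:
            draws.append(beta.copy())
    return np.array(draws)
```

For example, on simulated data with a known coefficient vector, the posterior means of the kept draws should lie near the true values. The block update of $\beta$ targets the same joint conditional as the coordinate-wise draws in (15), so it leaves the posterior unchanged while being simpler to code.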

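The full conditional posterior of $\delta$,

$$p(\delta \mid \cdot) \propto \Gamma(\delta)^{-k} \gamma^{k\delta} \prod_{j=1}^{k} \lambda_j^{-2\delta},$$

does not have a closed form, but it is log-concave in $\delta$: $-k\log\Gamma(\delta)$ is concave because $\log\Gamma$ is convex, and the remaining terms are linear in $\delta$. Gilks [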
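The log-concavity claim can be checked numerically in a quick sketch (assuming NumPy; the function name `log_p_delta` and the values of $\gamma$ and $\lambda_j^2$ are made up for illustration): the unnormalized log density of $\delta$ is evaluated on a grid, and all of its second differences are negative.

```python
import numpy as np
from math import lgamma

def log_p_delta(delta, gam, lam2):
    """Unnormalized log full conditional of delta:
    -k log Gamma(delta) + k delta log(gamma) - delta * sum_j log(lambda_j^2)."""
    k = len(lam2)
    return -k * lgamma(delta) + k * delta * np.log(gam) - delta * np.sum(np.log(lam2))

lam2 = np.array([0.5, 1.2, 3.0, 0.8])   # illustrative values only
grid = np.linspace(0.1, 20.0, 500)
vals = np.array([log_p_delta(d, 1.7, lam2) for d in grid])
second_diff = vals[2:] - 2 * vals[1:-1] + vals[:-2]
print(second_diff.max() < 0)  # True: the log density is concave on this grid
```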

Bayesian estimation of the models is carried out with the Gibbs-sampling MCMC algorithm described above. Simulation studies are used to compare regularized quantile regression under the Bayesian and non-Bayesian frameworks. The methods compared are Bayesian quantile regression with adaptive Lasso penalty (BQR-AL), Bayesian quantile regression with Lasso penalty (BQR-L), quantile regression with Lasso penalty (QR-L), quantile regression with SCAD penalty (QR-SCAD) and unpenalized quantile regression (QR).

Here, we follow the same simulation strategy introduced by Li, Xi and Lin [

We consider a linear model

$$y_i = x_i'\beta + \varepsilon_i, \quad i = 1, \dots, n,$$

where the $\varepsilon_i$ have $\theta$th quantile equal to zero.

For i.i.d. random errors, we consider the following four simulation settings:

Simulation 1: $\beta = (3, 1.5, 0, 0, 2, 0, 0, 0)$,

Simulation 2: $\beta = (0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85, 0.85)$,

Simulation 3: $\beta = (5, 0, 0, 0, 0, 0, 0, 0)$,

Simulation 4: $\beta = (\underbrace{3, \dots, 3}_{10}, \underbrace{0, \dots, 0}_{10}, \underbrace{3, \dots, 3}_{10})$.

In the first three simulations, the rows of $X$ are generated from a multivariate normal distribution $N(0, \Sigma)$ with $\Sigma_{ij} = 0.5^{|i-j|}$. In Simulation 4, we first generate $Z_1$ and $Z_2$ from $N(0, 1)$, and then set $x_j = Z_1 + \nu_j$ for $j = 1, \dots, 10$, $x_j \sim N(0, 1)$ for $j = 11, \dots, 20$, and $x_j = Z_2 + \nu_j$ for $j = 21, \dots, 30$, where $\nu_j \sim N(0, 0.01)$ for $j = 1, \dots, 10, 21, \dots, 30$.
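The two covariate designs can be generated as in the following sketch (assuming NumPy; the function names are hypothetical). Two readings are assumptions, not stated in the text: $Z_1$ and $Z_2$ are redrawn independently for every observation, and $N(0, 0.01)$ denotes a variance of $0.01$.

```python
import numpy as np

def design_sim1to3(n, p=8, rho=0.5, rng=None):
    """Rows of X drawn from N(0, Sigma) with Sigma_ij = rho**|i - j| (Simulations 1-3)."""
    if rng is None:
        rng = np.random.default_rng()
    idx = np.arange(p)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])
    return rng.multivariate_normal(np.zeros(p), cov, size=n)

def design_sim4(n, rng=None):
    """Grouped design of Simulation 4 (assumes Z1, Z2 are redrawn for every row)."""
    if rng is None:
        rng = np.random.default_rng()
    z1 = rng.standard_normal((n, 1))
    z2 = rng.standard_normal((n, 1))
    nu = rng.normal(0.0, np.sqrt(0.01), size=(n, 20))   # N(0, 0.01) read as variance 0.01
    block1 = z1 + nu[:, :10]                            # x_1..x_10 share Z1
    block2 = rng.standard_normal((n, 10))               # x_11..x_20 independent N(0, 1)
    block3 = z2 + nu[:, 10:]                            # x_21..x_30 share Z2
    return np.hstack([block1, block2, block3])
```

Within each of the first and third blocks the covariates are almost perfectly correlated (correlation about $1/1.01$), which makes this the hardest of the four designs.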

In each simulation, we consider the following error distributions:

1) A normal distribution $N(\mu, 1)$ with $\theta$th quantile equal to zero.

2) A Laplace distribution $\mathrm{Laplace}(\mu, 1)$ with $\theta$th quantile equal to zero.

3) A $t$ distribution with three degrees of freedom, $t(3)$.

4) A $\chi^2$ distribution with three degrees of freedom, $\chi^2(3)$.

5) An asymmetric Laplace distribution $\mathrm{ALD}(\mu, 0.5, 1)$ with $\theta$th quantile equal to zero.
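For the first two error distributions, the location $\mu$ is the negative of the $\theta$-quantile of the centered distribution. A sketch assuming NumPy and Python's `statistics.NormalDist` (function names hypothetical) is below; it covers only the normal and Laplace cases, since the text does not say how the $t(3)$ and $\chi^2(3)$ errors are centered.

```python
import numpy as np
from math import log
from statistics import NormalDist

def shifted_normal_errors(n, theta, rng):
    """N(mu, 1) errors with mu chosen so that the theta-quantile is zero."""
    mu = -NormalDist().inv_cdf(theta)
    return rng.normal(mu, 1.0, size=n)

def shifted_laplace_errors(n, theta, rng):
    """Laplace(mu, 1) errors with mu chosen so that the theta-quantile is zero."""
    # theta-quantile of Laplace(0, 1): log(2 theta) below the median, -log(2(1 - theta)) above
    q = log(2 * theta) if theta < 0.5 else -log(2 * (1 - theta))
    return rng.laplace(-q, 1.0, size=n)
```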

Each simulated sample has $n = 200$ observations, and the simulation is repeated 50 times for each error distribution. The evaluation criterion is the median mean absolute deviation (MMAD), i.e.

$$\mathrm{MMAD} = \operatorname{median}\left( \frac{1}{200} \sum_{i=1}^{200} \left| x_i'\hat\beta - x_i'\beta^{\mathrm{true}} \right| \right),$$

where the median is taken over the 50 replications.
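The criterion above can be computed as in this short sketch (assuming NumPy; the helper names are hypothetical): the mean absolute deviation of the fitted linear predictor is computed for each replication, and the median is taken across replications.

```python
import numpy as np

def mad_prediction(X, beta_hat, beta_true):
    """(1/n) sum_i |x_i'beta_hat - x_i'beta_true| for one simulated data set."""
    return np.mean(np.abs(X @ (beta_hat - beta_true)))

def mmad(X_list, beta_hats, beta_true):
    """Median over replications of the per-replication mean absolute deviation."""
    return np.median([mad_prediction(X, bh, beta_true)
                      for X, bh in zip(X_list, beta_hats)])
```

Using the median rather than the mean keeps a single badly fitted replication from dominating the comparison.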

The quantile regression model is estimated separately for each $\theta \in \{0.3, 0.5, 0.7\}$. The simulation results are shown in Figures 1-4.

In Simulation 4, because the number of variables is larger than the sample size, the design matrix is singular, so the QR and QR-SCAD methods cannot be run, whereas the other methods still operate normally. This illustrates an advantage of the regularization methods. The results are shown in

Figures 1-4 report the MMADs for Simulations 1-4 and show that BQR-AL and BQR-L perform better than the other regularized quantile regression methods. From these simulations we draw the following conclusions:

1) In all of the above simulations, BQR-AL and BQR-L tend to give lower MMADs than the non-Bayesian regularized quantile regression methods for every error distribution considered, which suggests that the Bayesian regularized quantile regression methods are more stable and repeatable.

2) When the regression coefficients are sparse (Simulation 1) or very sparse (Simulation 3), BQR-AL has the smallest MMAD; when they are dense (Simulation 2), BQR-L has a smaller MMAD. Overall, the two methods give similar estimation accuracy.

3) BQR-AL and BQR-L achieve good results under all error distributions. This suggests that regularized Bayesian quantile regression is robust to the assumed error distribution: both methods remain satisfactory even when the error term deviates from the ALD.

4) Across all simulation designs, when the error distribution is ALD, the regularized quantile regression methods under the Bayesian framework are highly accurate, especially BQR-AL, whose parameter estimates are closest to the true values.

Besides the MMADs, we can also examine the parameter estimates of each method. Owing to limited space, we report only the estimates for Simulation 1 with ALD errors:

It can be seen from the parameter estimates in

To study non-i.i.d. errors, consider the heteroscedastic model

$$y = 2 + x_1 + x_2 + x_3 + (1 + x_3)\varepsilon,$$

where $x_1 \sim N(0, 1)$, $x_3 \sim U(0, 1)$, $x_2 = x_1 + x_3 + z$ with $z \sim N(0, 1)$, and $\varepsilon \sim N(0, 1)$. The remaining five noise variables $x_4, x_5, x_6, x_7, x_8$ are generated independently from the standard normal distribution. The results are shown in
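Because $1 + x_3 > 0$, the true conditional $\theta$-quantile of this model is $2 + \Phi^{-1}(\theta) + x_1 + x_2 + (1 + \Phi^{-1}(\theta))\,x_3$, so only the intercept and the coefficient of $x_3$ change with $\theta$. The sketch below (assuming NumPy and `statistics.NormalDist`; function names hypothetical) generates the data and the true quantile line.

```python
import numpy as np
from statistics import NormalDist

def simulate_hetero(n, rng):
    """y = 2 + x1 + x2 + x3 + (1 + x3) * eps, plus five pure-noise covariates x4..x8."""
    x1 = rng.standard_normal(n)
    x3 = rng.uniform(0.0, 1.0, n)
    x2 = x1 + x3 + rng.standard_normal(n)
    noise = rng.standard_normal((n, 5))
    eps = rng.standard_normal(n)
    y = 2 + x1 + x2 + x3 + (1 + x3) * eps
    return np.column_stack([x1, x2, x3, noise]), y

def true_conditional_quantile(X, theta):
    """Q_y(theta | x) = 2 + x1 + x2 + x3 + (1 + x3) * Phi^{-1}(theta), valid since 1 + x3 > 0."""
    q = NormalDist().inv_cdf(theta)
    return 2 + X[:, 0] + X[:, 1] + X[:, 2] + (1 + X[:, 2]) * q
```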

It can be known from

This section mainly analyzes prostate cancer data in the “bayesQR” package [

| θ | Method | $\beta_1$ | $\beta_2$ | $\beta_3$ | $\beta_4$ | $\beta_5$ | $\beta_6$ | $\beta_7$ | $\beta_8$ |
|---|---|---|---|---|---|---|---|---|---|
| | $\beta^{\mathrm{true}}$ | 3 | 1.5 | 0 | 0 | 2 | 0 | 0 | 0 |
| 0.3 | BQR-AL | 3.0110 | 1.5013 | −0.0103 | 0.0035 | 1.9993 | 0.0101 | −0.0004 | −0.0088 |
| | BQR-L | 3.0092 | 1.5014 | −0.0113 | 0.0053 | 1.9948 | 0.0128 | 0.0002 | −0.0096 |
| | QR-SCAD | 3.0090 | 1.5094 | −0.0094 | 0.0174 | 1.9965 | 0.0122 | 0.0043 | −0.0007 |
| | QR-L | 3.0163 | 1.4942 | −0.0130 | 0.0079 | 1.9952 | 0.0083 | −0.0001 | −0.0091 |
| | QR | 3.0143 | 1.4997 | −0.0146 | 0.0063 | 2.0018 | 0.0067 | −0.0007 | −0.0148 |
| 0.5 | BQR-AL | 3.0088 | 1.5077 | −0.0105 | −0.0004 | 1.9979 | 0.0170 | −0.0032 | −0.0064 |
| | BQR-L | 3.0064 | 1.5074 | −0.0107 | 0.0013 | 1.9925 | 0.0202 | −0.0032 | −0.0066 |
| | QR-SCAD | 3.0062 | 1.5109 | −0.0005 | 0.0102 | 1.9931 | 0.0257 | −0.0020 | 0.0084 |
| | QR-L | 3.0167 | 1.4983 | −0.0101 | 0.0050 | 1.9883 | 0.0182 | −0.0043 | −0.0066 |
| | QR | 3.0124 | 1.5095 | −0.0137 | 0.0045 | 1.9939 | 0.0160 | −0.0027 | −0.0081 |
| 0.7 | BQR-AL | 3.0057 | 1.5142 | −0.0123 | −0.0051 | 1.9973 | 0.0234 | −0.0043 | −0.0035 |
| | BQR-L | 3.0025 | 1.5135 | −0.0122 | −0.0029 | 1.9907 | 0.0271 | −0.0048 | −0.0035 |
| | QR-SCAD | 3.0015 | 1.5242 | −0.0047 | 0.0079 | 1.9937 | 0.0311 | −0.0002 | 0.0063 |
| | QR-L | 3.0108 | 1.5042 | −0.0080 | 0.0014 | 1.9890 | 0.0228 | −0.0021 | −0.0078 |
| | QR | 3.0065 | 1.5127 | −0.0122 | −0.0011 | 1.9949 | 0.0231 | −0.0060 | −0.0050 |

| Method | θ = 0.3 | θ = 0.5 | θ = 0.7 |
|---|---|---|---|
| BQR-AL | 0.67640 | 1.58100 | 2.44680 |
| BQR-L | 0.65308 | 1.55980 | 2.41390 |
| QR-SCAD | 0.72930 | 1.57730 | 2.43000 |
| QR-L | 0.69427 | 1.57320 | 2.43920 |
| QR | 0.72090 | 1.59810 | 2.48700 |

record of 97 male patients who underwent radical prostatectomy, containing the log prostate-specific antigen level $y$ (lpsa) and eight predictors: log cancer volume (lcavol), log prostate weight (lweight), age, log of the amount of benign prostatic hyperplasia (lbph), seminal vesicle invasion (svi), log of capsular penetration (lcp), Gleason score (gleason) and percentage of Gleason scores 4 or 5 (pgg45). As in the numerical simulations, we again consider $\theta = 0.3, 0.5, 0.7$.

In

first 1000 samples are discarded. The results are shown in

It can be seen from

The results can also be examined graphically. For a more intuitive comparison, the estimates of the various methods for $\theta = 0.7$ are plotted (similar results are obtained for the other quantiles), with the estimates of each method shifted for visibility, as shown in

| θ | Variable | QR | BQR-L | BQR-AL |
|---|---|---|---|---|
| 0.3 | lcavol | 0.8310 (0.5078, 0.9594) | 0.6101 (0.4360, 0.7904) | 0.6232 (0.4515, 0.8133) |
| | lweight | 0.3166 (0.0679, 0.5257) | 0.2425 (0.0836, 0.4068) | 0.2444 (0.0833, 0.4216) |
| | age | −0.1282 (−0.2590, 0.1218) | −0.1523 (−0.2868, −0.0066) | −0.1628 (−0.2974, −0.0283) |
| | lbph | 0.1693 (0.0221, 0.2993) | 0.1897 (0.0315, 0.3510) | 0.1990 (0.0392, 0.3528) |
| | svi | 0.2687 (0.1134, 0.44230) | 0.2928 (0.0862, 0.4690) | 0.3071 (0.1107, 0.4882) |
| | lcp | −0.2632 (−0.5559, 0.0310) | −0.1089 (−0.3149, 0.0794) | −0.1279 (−0.3325, 0.0734) |
| | gleason | −0.0249 (−0.1910, 0.1988) | 0.0823 (−0.0756, 0.2363) | 0.0867 (−0.0737, 0.2500) |
| | pgg45 | 0.2430 (−0.0540, 0.5164) | 0.1071 (−0.0499, 0.2952) | 0.1134 (−0.0523, 0.3113) |
| 0.5 | lcavol | 0.6278 (0.4480, 0.8226) | 0.6083 (0.4292, 0.7917) | 0.6282 (0.4488, 0.8207) |
| | lweight | 0.2759 (0.0882, 0.4258) | 0.2420 (0.0744, 0.4183) | 0.2486 (0.0934, 0.4152) |
| | age | −0.1994 (−0.2879, −0.0667) | −0.1487 (−0.2888, −0.0034) | −0.1643 (−0.2961, −0.0190) |
| | lbph | 0.2319 (0.0745, 0.3454) | 0.1858 (0.0157, 0.3500) | 0.2011 (0.0443, 0.3513) |
| | svi | 0.3312 (0.1606, 0.4665) | 0.2900 (0.0958, 0.4709) | 0.3039 (0.0988, 0.4813) |
| | lcp | −0.1830 (−0.3249, −0.0517) | −0.1051 (−0.3043, 0.0788) | −0.1288 (−0.3363, 0.0611) |
| | gleason | 0.1467 (−0.0753, 0.2190) | 0.0843 (−0.0769, 0.2417) | 0.0835 (−0.0838, 0.2426) |
| | pgg45 | 0.1146 (0.0060, 0.3218) | 0.1055 (−0.0568, 0.3019) | 0.1129 (−0.0493, 0.3099) |
| 0.7 | lcavol | 0.6639 (0.3897, 0.8638) | 0.6068 (0.4278, 0.7969) | 0.6266 (0.4491, 0.8235) |
| | lweight | 0.0786 (−0.0389, 0.4159) | 0.2436 (0.0796, 0.4181) | 0.2467 (0.0825, 0.4175) |
| | age | −0.0977 (−0.3382, 0.0101) | −0.1503 (−0.2868, −0.0046) | −0.1632 (−0.2968, −0.0250) |
| | lbph | 0.1652 (0.0001, 0.4058) | 0.1878 (0.0335, 0.3423) | 0.2027 (0.0401, 0.3598) |
| | svi | 0.3157 (0.1816, 0.5286) | 0.2887 (0.0891, 0.4667) | 0.3077 (0.1121, 0.4902) |
| | lcp | −0.0888 (−0.3516, −0.0038) | −0.1008 (−0.2950, 0.0811) | −0.1270 (−0.3326, 0.0662) |
| | gleason | −0.0967 (−0.1385, 0.0368) | 0.0820 (−0.0750, 0.2393) | 0.0814 (−0.0900, 0.2466) |
| | pgg45 | 0.2324 (0.0755, 0.2866) | 0.1092 (−0.0549, 0.3104) | 0.1131 (−0.0575, 0.3198) |
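The estimates above are reported together with intervals. Assuming the Bayesian entries are posterior means with 95% equal-tailed credible intervals computed after discarding the first 1000 draws (the table itself does not state the interval type), they could be summarized from MCMC output as in this hypothetical sketch (assuming NumPy; `summarize_draws` is an illustrative name).

```python
import numpy as np

def summarize_draws(draws, burn_in=1000, level=0.95):
    """Posterior mean and equal-tailed credible interval for each column of draws.

    draws: array of shape (n_iter, p) from an MCMC run; the first burn_in rows
    are discarded before summarizing.
    """
    kept = np.asarray(draws)[burn_in:]
    tail = 100 * (1 - level) / 2
    lower, upper = np.percentile(kept, [tail, 100 - tail], axis=0)
    return kept.mean(axis=0), lower, upper
```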

It can be clearly seen from

Bayesian quantile regression with adaptive Lasso penalty extends and improves the Lasso method by applying different penalty parameters to different regression coefficients, which helps to eliminate the influence of noise variables and yields more accurate parameter estimates. Using the Gibbs sampling algorithm, this paper systematically compares regularized quantile regression under the non-Bayesian and Bayesian frameworks and finds that, whether the errors are i.i.d. or heteroscedastic, both BQR-AL and BQR-L are more accurate than the non-Bayesian methods. When the error follows the ALD, BQR-AL has the highest accuracy in terms of MMAD at each quantile, and its parameter estimates are generally the closest to the true values. The real data analysis leads to the same conclusions. Therefore, Bayesian penalized quantile regression performs well whether the coefficients are sparse or dense, describes the data fully across different quantile levels, and is promising for future high-dimensional data analysis.

This work was supported by the National Natural Science Foundation of China [grant numbers 61763008, 71762008]; Guangxi Science and Technology Plan Project [grant numbers 2018GXNSFAA294131, 2018GXNSFAA050005, 2016GXNSFAA380194].

The authors declare no conflicts of interest regarding the publication of this paper.

Tang, Q.Q., Zhang, H.M. and Gong, S.F. (2020) Bayesian Regularized Quantile Regression Analysis Based on Asymmetric Laplace Distribution. Journal of Applied Mathematics and Physics, 8, 70-84. https://doi.org/10.4236/jamp.2020.81006