
In this paper, we study the regression problem for time series data from heterogeneous populations on the basis of the finite mixture regression (FMR) model. We propose two finite mixture time-varying regression models to solve this problem, together with a regularization method for variable selection in these models that mixes appropriate penalty functions with an $l_2$ penalty. A block-wise minorization-maximization (MM) algorithm is used for maximum penalized log quasi-likelihood estimation of these models. The procedure is illustrated through simulations and an application to the behavior of urban vehicular traffic in the city of São Paulo over the period from 14 to 18 December 2009, which shows that the proposed models outperform the FMR models.

The problem of variable selection in FMR models has been widely discussed [

$$ f(y; x, \theta) = \sum_{k=1}^{K} \pi_k\, f\big(y; \eta_k(x), \phi_k\big), \qquad (1) $$

where $y$ is an independent and identically distributed (IID) response and $x$ is a $p \times 1$ vector of covariates. $\pi = (\pi_1, \cdots, \pi_K)^T$ denotes the mixing proportions, satisfying $0 < \pi_k < 1$ and $\sum_{k=1}^{K} \pi_k = 1$, and $f(y; \eta_k(x), \phi_k)$ is the $k$th mixture component density, with $\eta_k(x) = h(\alpha_k + x^T\beta_k)$ for $k = 1, \cdots, K$, for a given link function $h(\cdot)$ and a dispersion parameter $\phi_k$.
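To make Eq. (1) concrete, the sketch below evaluates a $K$-component FMR density for Gaussian components with the identity link. The Gaussian choice, the identity link, and the function name are illustrative assumptions, not the paper's code.

```python
import numpy as np

def fmr_density(y, x, pi, alpha, beta, phi):
    """Evaluate the K-component FMR density f(y; x, theta) of Eq. (1),
    assuming Gaussian components and the identity link h.

    pi:    (K,) mixing proportions, summing to one
    alpha: (K,) component intercepts
    beta:  (K, p) component regression coefficients
    phi:   (K,) dispersion (variance) parameters
    """
    eta = alpha + beta @ x                                   # (K,) component means
    comp = np.exp(-(y - eta) ** 2 / (2 * phi)) / np.sqrt(2 * np.pi * phi)
    return float(pi @ comp)                                  # mixture density at (y, x)
```

With $K = 1$, $\pi_1 = 1$, $\alpha_1 = \beta_1 = 0$, and $\phi_1 = 1$, this reduces to the standard normal density at $y$.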

However, in some situations, observations are not independent. As pointed out in [

The generalized autoregressive conditional heteroskedasticity (GARCH) model is widely used in time series analysis. A mixture generalized autoregressive conditional heteroscedastic (MGARCH) model was proposed in [

There have been extensive studies of variable selection methods. A recent review of the literature on the variable selection problem in FMR models can be found in [

The usual method for maximum penalized log-likelihood (MPL) estimation is the EM algorithm. [

The rest of the paper is organized as follows. In Section 2, the finite mixture of time-varying regression models is defined, and in Section 3, feature selection methods are discussed. In Section 4, the block-wise MM algorithm for estimation and the BIC for choosing the tuning parameters and the number of components are presented, and the example of the Gaussian distribution is derived. Simulation studies of the performance of the new variable selection methods are provided in Section 5. In Section 6, the analysis of a real data set illustrates the use of the procedure. Finally, conclusions are given in Section 7.

Let $\{y_t; t = 1, \cdots, n\}$ be a response variable which is a time series, and let $\{x_t; t = 1, \cdots, n\}$ be a $p$-dimensional vector of covariates, each of which is a time series. For an FM-AR($d$) model with $K$ components, the conditional density function for observation $t$ is given as follows:

$$ f(y_t; x_t, \theta) = \sum_{k=1}^{K} \pi_k\, f\big(y_t; \eta_k(x_t), \phi_k\big), \qquad (2) $$

where

$$ \eta_k(x_t) = h\big(\alpha_k + x_t^T\beta_{k1} + x_{t-1}^T\beta_{k2} + \cdots + x_{t-d}^T\beta_{kd}\big), \qquad (3) $$

for $k = 1, \cdots, K$, for a given link function $h(\cdot)$ and a dispersion parameter $\phi_k$.

The master vector of all parameters is given by θ = ( π T , α T , ϕ T , β T ) T , with

$$ \beta = \begin{pmatrix} \beta_{11} & \cdots & \beta_{1d} \\ \vdots & \ddots & \vdots \\ \beta_{K1} & \cdots & \beta_{Kd} \end{pmatrix}, \qquad (4) $$

where $\beta_{ki} = (\beta_{ki1}, \cdots, \beta_{kip})^T \in \mathbb{R}^p$, $i = 1, \cdots, d$. Let $\tilde{x}_t = (x_t^T, x_{t-1}^T, \cdots, x_{t-d}^T)$ and $\tilde{\beta}_k = (\beta_{k1}, \cdots, \beta_{kd})^T$; then (3) can be rewritten as $\eta_k(x_t) = h(\alpha_k + \tilde{x}_t \tilde{\beta}_k)$.
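The stacked covariate vectors $\tilde{x}_t$ can be built from the raw covariate matrix as follows; this is a minimal sketch, and the helper name is ours:

```python
import numpy as np

def stack_lags(X, d):
    """Build the extended covariate vectors (x_t^T, x_{t-1}^T, ..., x_{t-d}^T)
    from an (n, p) covariate matrix X. Only rows with a full lag history are
    kept, so the result has n - d rows and (d + 1) * p columns."""
    n, p = X.shape
    # column block i holds x_{t-i} for the usable rows t = d, ..., n-1 (0-based)
    return np.hstack([X[d - i : n - i] for i in range(d + 1)])
```

For example, with a single covariate series $(1, 2, 3)$ and $d = 1$, the usable rows are $(x_1, x_0) = (2, 1)$ and $(x_2, x_1) = (3, 2)$.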

Let $\{y_t; t = 1, \cdots, n\}$ be a response variable which is a time series, and let $\{x_t; t = 1, \cdots, n\}$ be a $p$-dimensional vector of covariates, each of which is a time series. For some distributions with unequal dispersion parameters $\phi_k$, we propose the FM-GARCH models. For an FM-GARCH($d, M, S$) model with $K$ components, the conditional density function for observation $t$ is given as follows:

$$ f(y_t; x_t, \theta) = \sum_{k=1}^{K} \pi_k\, f\big(y_t; \eta_k(x_t), \phi_{kt}\big), \qquad (5) $$

where $\eta_k(x_t) = h(\alpha_k + \tilde{x}_t\tilde{\beta}_k)$ for $k = 1, \cdots, K$, for a given link function $h(\cdot)$, and a conditional heteroscedastic dispersion parameter

$$ \phi_{kt} = \gamma_{0k} + \sum_{m=1}^{M} \gamma_{km}\, \epsilon_{k,t-m}^{2} + \sum_{s=1}^{S} \delta_{ks}\, \phi_{k,t-s}, \qquad (6) $$

where $\gamma_{0k} > 0$, $\gamma_{km} \ge 0$, $\delta_{ks} \ge 0$, and $\epsilon_{kt} = \sqrt{\phi_{kt}}\, e_{kt}$, where $e_{kt}$ is an independent and identically distributed series with mean zero and unit variance.
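A sketch of the recursion in Eq. (6) for a single component, with squared innovations as in a standard GARCH model and a simple start-up convention; the start-up rule and the function name are our assumptions:

```python
import numpy as np

def garch_dispersion(eps_sq, gamma0, gamma, delta, phi0=None):
    """Run the component-wise GARCH(M, S) recursion of Eq. (6):
    phi_t = gamma0 + sum_m gamma[m] * eps^2_{t-m} + sum_s delta[s] * phi_{t-s}.

    eps_sq : (n,) squared innovations eps^2_t for one component
    gamma  : (M,) ARCH coefficients; delta : (S,) GARCH coefficients
    phi0   : pre-sample value (defaults to the unconditional level, an assumption)
    """
    M, S = len(gamma), len(delta)
    n = len(eps_sq)
    if phi0 is None:
        phi0 = gamma0 / max(1.0 - gamma.sum() - delta.sum(), 1e-8)
    phi = np.full(n, float(phi0))
    for t in range(max(M, S), n):
        # reverse the slices so gamma[m-1] pairs with eps^2_{t-m}, etc.
        phi[t] = (gamma0
                  + gamma @ eps_sq[t - M:t][::-1]
                  + delta @ phi[t - S:t][::-1])
    return phi
```

The non-negativity constraints on $\gamma_{km}$ and $\delta_{ks}$ then guarantee $\phi_{kt} > 0$ along the whole recursion.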

The master vector of all parameters is given by $\theta = (\pi^T, \alpha^T, \gamma_0^T, \beta^T, \gamma^T, \delta^T)^T$, with $\gamma_0 = (\gamma_{01}, \cdots, \gamma_{0K})^T$, $\gamma = (\gamma_1^T, \cdots, \gamma_K^T)^T$, $\gamma_k = (\gamma_{k1}, \gamma_{k2}, \cdots, \gamma_{kM})^T$, and $\delta = (\delta_1^T, \cdots, \delta_K^T)^T$, $\delta_k = (\delta_{k1}, \delta_{k2}, \cdots, \delta_{kS})^T$.

Let { ( x t , y t ) ; t = 1, ⋯ , n } be a sample of observations from the FM-AR or FM-GARCH model. The quasi-likelihood function of the parameter θ is given by [

$$ L_n(\theta) = \prod_{t=1}^{n} f(y_t; x_t, \theta) = \prod_{t=1}^{n}\Big\{\sum_{k=1}^{K} \pi_k\, f\big(y_t; \eta_k(x_t), \phi_{kt}\big)\Big\}. \qquad (7) $$

The log quasi-likelihood function of the parameter θ is given by

$$ L_n(\theta) = \sum_{t=1}^{n} \log \sum_{k=1}^{K} \pi_k\, f\big(y_t; \eta_k(x_t), \phi_{kt}\big). \qquad (8) $$
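Eq. (8) can be computed stably with a log-sum-exp over components. The Gaussian component density below is an illustrative assumption, as is the function name:

```python
import numpy as np

def log_quasi_likelihood(y, eta, phi, pi):
    """Gaussian log quasi-likelihood L_n(theta) of Eq. (8), computed stably.

    y   : (n,) responses
    eta : (n, K) component means eta_k(x_t)
    phi : (K,) or (n, K) component dispersions phi_kt
    pi  : (K,) mixing proportions
    """
    # log of each weighted component density, shape (n, K)
    log_comp = (np.log(pi)
                - 0.5 * np.log(2 * np.pi * phi)
                - (y[:, None] - eta) ** 2 / (2 * phi))
    # log-sum-exp over components, then sum over t
    m = log_comp.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))).sum())
```

Subtracting the per-observation maximum before exponentiating avoids underflow when some components place negligible density on $y_t$.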

When the effect of a component of $x$ is not significant, the corresponding ordinary maximum quasi-likelihood estimate is often close to 0 but not exactly 0, so the covariate is not excluded from the model. Inspired by an idea of [

$$ F_n(\theta) = L_n(\theta) - P_n(\theta), \qquad (9) $$

with the mixture penalty (or regularization) function:

$$ P_n(\theta) = \sum_{k=1}^{K}\pi_k \sum_{i=1}^{d}\sum_{j=1}^{p} p_n(\beta_{kij}; \lambda_{nk}) + \frac{1}{2}\sum_{k=1}^{K}\pi_k \sum_{i=1}^{d}\sum_{j=1}^{p} \upsilon_{nk}\,\beta_{kij}^2, \qquad (10) $$

for some ridge tuning parameter $\upsilon_{nk} \ge 0$, where $p_n(\beta_{kij}; \lambda_{nk})$ is a nonnegative penalty function. In the penalty function $P_n(\theta)$, the amount of $l_2$ penalty imposed on the componentwise regression coefficients $\beta_{kij}$ is chosen proportional to $\pi_k$. The functions $p_n(\beta_{kij}; \lambda_{nk})$ are designed to identify the nonsignificant coefficients $\beta_{kij}$ in the mixture components $f(y_t; \eta_k(x_t), \phi_{kt})$. General regularity conditions on $p_n(\beta_{kij}; \lambda_{nk})$ are given in [

We implement the new method using the following well-known penalty (or regularization) functions:

• LASSO penalty: $p_n(\beta; \lambda_{nk}) = \lambda_{nk}|\beta|$.

• MCP penalty: $p_n'(\beta; \lambda_{nk}) = (\lambda_{nk} - n b_{nk}|\beta|)_+$.

• SCAD penalty: $p_n'(\beta; \lambda_{nk}) = \lambda_{nk}\, I(n|\beta| < \lambda_{nk}) + \dfrac{(a_{nk}\lambda_{nk} - n|\beta|)_+}{a_{nk} - 1}\, I(n|\beta| > \lambda_{nk})$.

Here, $I$ is the indicator function, and the constants $a_{nk} \ge 2$ and $b_{nk} \ge 0$ are as suggested in [. We refer to the resulting mixture penalties as the MIXLASSO-ML$_2$, MIXMCP-ML$_2$, and MIXSCAD-ML$_2$ penalties.
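The three penalties, via the derivatives displayed above, can be coded directly. The $n$-scaling follows the displays and the function names are ours:

```python
import numpy as np

def lasso_grad(beta, lam):
    # derivative of lam * |beta| with respect to |beta|
    return lam * np.ones_like(beta)

def mcp_grad(beta, lam, b, n=1.0):
    # MCP: p'(beta) = (lam - n*b*|beta|)_+
    return np.maximum(lam - n * b * np.abs(beta), 0.0)

def scad_grad(beta, lam, a, n=1.0):
    # SCAD: p'(beta) = lam                       if n|beta| <= lam
    #                  (a*lam - n|beta|)_+/(a-1)  if n|beta| >  lam
    ab = n * np.abs(beta)
    return np.where(ab <= lam, lam, np.maximum(a * lam - ab, 0.0) / (a - 1.0))
```

All three agree near the origin (slope $\lambda_{nk}$, which is what induces sparsity), while MCP and SCAD taper the penalty off for large coefficients to reduce bias.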

A new method for maximizing the penalized log quasi-likelihood function is the block-wise minorization-maximization (MM) algorithm, inspired by [

We follow the approach of [

$$ G_1(\theta; \theta^{(r)}) = -\frac{1}{2}\sum_{k=1}^{K}\pi_k\sum_{i=1}^{d}\sum_{j=1}^{p} p_n\!\Big(\frac{\beta_{kij}^2}{w_{kij}^{(r)}}; \lambda_{nk}\Big) - \frac{1}{2}\sum_{k=1}^{K}\pi_k\sum_{i=1}^{d}\sum_{j=1}^{p} \upsilon_{nk}\,\beta_{kij}^2 + C_1(\theta^{(r)}), \qquad (11) $$

where $w_{kij}^{(r)} = \sqrt{\big(\beta_{kij}^{(r)}\big)^2 + \varepsilon^2}$, for some $\varepsilon > 0$, and

$$ C_1(\theta^{(r)}) = -\frac{\varepsilon^2}{2}\sum_{k=1}^{K}\pi_k\sum_{i=1}^{d}\sum_{j=1}^{p} p_n\big(\big(w_{kij}^{(r)}\big)^{-1}; \lambda_{nk}\big) - \frac{1}{2}\sum_{k=1}^{K}\pi_k\sum_{i=1}^{d}\sum_{j=1}^{p} p_n\big(w_{kij}^{(r)}; \lambda_{nk}\big). \qquad (12) $$

Moreover, we minorize the log quasi-likelihood function $L_n(\theta)$ by

$$ G_2(\theta; \theta^{(r)}) = \sum_{k=1}^{K}\sum_{t=1}^{n} \tau_{kt}^{(r)}\log\pi_k + \sum_{k=1}^{K}\sum_{t=1}^{n} \tau_{kt}^{(r)}\log f\big(y_t; \eta_k(x_t), \phi_{kt}\big) - \sum_{k=1}^{K}\sum_{t=1}^{n} \tau_{kt}^{(r)}\log\tau_{kt}^{(r)}, \qquad (13) $$

where $\tau_{kt}^{(r)} = \pi_k^{(r)} f\big(y_t; \eta_k^{(r)}(x_t), \phi_{kt}^{(r)}\big) \big/ f\big(y_t; x_t, \theta^{(r)}\big)$.

Note that $\tau_{kt}^{(r)}$ and $G_2(\theta; \theta^{(r)})$ are analogous to the posterior probabilities and the expected complete-data log-likelihood function of the expectation-maximization (EM) algorithm, respectively.
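The quantities $\tau_{kt}^{(r)}$ are ordinary mixture responsibilities and can be computed stably in the log domain; Gaussian components are again an illustrative assumption:

```python
import numpy as np

def responsibilities(y, eta, phi, pi):
    """Posterior-probability analogues tau_kt of the block-wise MM algorithm:
    tau_kt = pi_k f(y_t; eta_k, phi_kt) / sum_j pi_j f(y_t; eta_j, phi_jt),
    here for Gaussian component densities.

    y: (n,), eta: (n, K), phi: (K,) or (n, K), pi: (K,).  Returns (n, K)."""
    log_comp = (np.log(pi)
                - 0.5 * np.log(2 * np.pi * phi)
                - (y[:, None] - eta) ** 2 / (2 * phi))
    log_comp -= log_comp.max(axis=1, keepdims=True)   # stabilize before exp
    w = np.exp(log_comp)
    return w / w.sum(axis=1, keepdims=True)           # each row sums to one
```

For instance, an observation equidistant from two equally weighted, equal-variance component means gets responsibility 1/2 from each component.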

The block-wise MM algorithm maximizes F n ( θ ) iteratively in the following two steps:

• Block-wise Minorization step. Conditional on the $r$th iterate $\theta^{(r)}$, the objective for the FM-GARCH model can be block-wise minorized in the coordinates of the parameter blocks $\pi$, $\alpha$, $\gamma_0$, $\gamma$, $\delta$, and $\beta$, via the minorizers

$$ G_\pi(\pi; \theta^{(r)}) = G_2\big(\pi, \alpha^{(r)}, \gamma_0^{(r)}, \beta^{(r)}, \gamma^{(r)}, \delta^{(r)}; \theta^{(r)}\big) - P_n\big(\pi, \beta^{(r)}\big), \qquad (14) $$

$$ G_{\alpha,\gamma_0}(\alpha, \gamma_0; \theta^{(r)}) = G_2\big(\pi^{(r)}, \alpha, \gamma_0, \beta^{(r)}, \gamma^{(r)}, \delta^{(r)}; \theta^{(r)}\big) - P_n\big(\theta^{(r)}\big), \qquad (15) $$

$$ G_{\gamma,\delta}(\gamma, \delta; \theta^{(r)}) = G_2\big(\pi^{(r)}, \alpha^{(r)}, \gamma_0^{(r)}, \beta^{(r)}, \gamma, \delta; \theta^{(r)}\big) - P_n\big(\theta^{(r)}\big), \qquad (16) $$

$$ G_\beta(\beta; \theta^{(r)}) = G_1\big(\pi^{(r)}, \beta; \theta^{(r)}\big) + G_2\big(\pi^{(r)}, \alpha^{(r)}, \gamma_0^{(r)}, \beta, \gamma^{(r)}, \delta^{(r)}; \theta^{(r)}\big), \qquad (17) $$

respectively. Similar block-wise minorizers can be constructed for the FM-AR model.

• Block-wise Maximization step. Upon finding the appropriate set of block-wise minorizers of $F_n(\theta)$, we can maximize (14) to compute the $(r+1)$th block-wise update of $\pi$. Solving for the appropriate root of the first-order condition (FOC) for the Lagrangian, we obtain the $(r+1)$th block-wise update

$$ \pi_k^{(r+1)} = \frac{\sum_{t=1}^{n} \tau_{kt}^{(r)}}{\zeta^* + z_k}, \qquad (18) $$

for each $k$, where $z_k = \sum_{i=1}^{d}\sum_{j=1}^{p} p_n(\beta_{kij}; \lambda_{nk}) + \frac{1}{2}\sum_{i=1}^{d}\sum_{j=1}^{p} \upsilon_{nk}\,\beta_{kij}^2$, and $\zeta^*$ is the unique root of

in the interval

The block-wise updates for
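Since the explicit equation and bracketing interval for $\zeta^*$ are not reproduced here, the following hedged sketch recovers it numerically: writing $S_k = \sum_t \tau_{kt}^{(r)}$, the normalization constraint $\sum_k S_k/(\zeta + z_k) = 1$ has a strictly decreasing left-hand side in $\zeta$, so bisection on an expanding bracket finds the unique root used in Eq. (18).

```python
import numpy as np

def update_pi(S, z, tol=1e-10):
    """Block-wise update of the mixing proportions as in Eq. (18):
    pi_k = S_k / (zeta + z_k), with zeta the unique root of
    g(zeta) = sum_k S_k / (zeta + z_k) - 1 = 0.
    g is strictly decreasing on (-min(z), inf), so bisection suffices;
    the bracket below is our own choice, not the paper's stated interval."""
    S, z = np.asarray(S, float), np.asarray(z, float)
    lo = -z.min() + 1e-12            # g(zeta) -> +inf as zeta -> -min(z)+
    hi = max(lo + 1.0, S.sum())
    while (S / (hi + z)).sum() > 1.0:
        hi *= 2.0                    # expand until g(hi) <= 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (S / (mid + z)).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    zeta = 0.5 * (lo + hi)
    return S / (zeta + z)
```

With $z_k = 0$ for all $k$ (no penalty), the root is $\zeta^* = \sum_k S_k$ and the update reduces to the familiar EM proportion update $\pi_k = S_k / n$.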

We now present an example of the Gaussian FM-GARCH model to make the procedure described above concrete, and give the following Lemma 1, a useful minorizer for the MPL estimation of the Gaussian FM-GARCH model, which can be found in [

Lemma 1 if

Example 1 We consider the Gaussian FM-GARCH Model,

where

Here,

According to [

where

The block-wise updates of

for each k. Moreover, the coordinate-wise updates for the

for each k, m, and s. Finally, substituting (22) into (17), the coordinate-wise updates for the

for each k and

Note that (15)-(17) for the Gaussian FM-GARCH model are concave in the alternative parameterization

To implement the methods described in Sections 3 and 4.1, we need to select the size of the tuning parameters

where

The block-wise MM algorithm is iterated until a convergence criterion is met. In this article, we use the absolute convergence criterion, where TOL > 0 is a small tolerance constant, as in [
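As an end-to-end illustration of the iteration with an absolute convergence criterion, the sketch below fits a Gaussian mixture of linear regressions with a MIXLASSO-ML$_2$-style penalty by block-wise MM. It deliberately simplifies the procedure: the $\pi$-update uses plain responsibility averages rather than the penalized root of Eq. (18), and the GARCH blocks are omitted, so this is a toy version under our own assumptions, not the authors' implementation.

```python
import numpy as np

def fit_fmr_lasso_mm(y, X, K, lam, ups=0.0, eps=1e-6, tol=1e-8, max_iter=500, seed=0):
    """Toy block-wise MM for a Gaussian mixture of linear regressions with a
    lasso + ridge penalty on the slopes. X already contains any stacked lags."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Z = np.hstack([np.ones((n, 1)), X])            # intercept + covariates
    coef = rng.normal(scale=0.1, size=(K, p + 1))
    pi = np.full(K, 1.0 / K)
    phi = np.full(K, y.var())
    old = -np.inf
    for _ in range(max_iter):
        # minorization: responsibilities tau_kt and current log quasi-likelihood
        resid = y[:, None] - Z @ coef.T            # (n, K)
        logc = np.log(pi) - 0.5 * np.log(2 * np.pi * phi) - resid ** 2 / (2 * phi)
        m = logc.max(axis=1, keepdims=True)
        ll = float((m[:, 0] + np.log(np.exp(logc - m).sum(axis=1))).sum())
        tau = np.exp(logc - m)
        tau /= tau.sum(axis=1, keepdims=True)
        # block-wise maximization (simplified, unpenalized pi-update)
        pi = tau.sum(axis=0) / n
        for k in range(K):
            w = tau[:, k]
            # the MM surrogate turns lam*pi_k*|b| into an adaptive ridge term
            d = lam * pi[k] / np.sqrt(coef[k, 1:] ** 2 + eps ** 2) + ups * pi[k]
            D = np.diag(np.concatenate([[0.0], d])) * phi[k]   # intercept unpenalized
            coef[k] = np.linalg.solve(Z.T @ (w[:, None] * Z) + D, Z.T @ (w * y))
            phi[k] = (w * (y - Z @ coef[k]) ** 2).sum() / w.sum()
        if abs(ll - old) < tol:                    # absolute convergence criterion
            break
        old = ll
    return pi, coef, phi
```

With $K = 1$ this reduces to an adaptive-ridge approximation of the lasso, and the inactive covariate's coefficient is shrunk toward zero while the active one is recovered.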

In this section, we evaluate the performance of the proposed method and algorithm via simulations. We consider the Gaussian FM-AR models and Gaussian FM-GARCH models. Following [

The first simulations are based on the Gaussian FM-AR (2) model. Assuming that K is known, the model for the simulation was a

where

The S2 of MIXSCAD-ML$_2$ is 100; however, the S2 values in com1 for MIXLASSO-ML$_2$ (S2 = 70.7) and MIXMCP-ML$_2$ (S2 = 51.3) are small, which indicates that MIXSCAD-ML$_2$ ensures that the non-zero coefficients are correctly identified, while some non-zero coefficients in the MIXLASSO-ML$_2$ and MIXMCP-ML$_2$ models are not estimated. The mean estimates over all falsely identified non-zero predictors for MIXSCAD-ML$_2$ are between 0.001 and 0.01.

The second simulations are based on the Gaussian FM-GARCH(2,1,1) model. Again assuming that K is known, the model for the simulation was a

for

| Method | n | Com | | | | | | |
|---|---|---|---|---|---|---|---|---|
| MIXSCAD-ML$_2$ | 2\*2\*10 | com1 | 86.0 | 99.5 | 0.097 | 90.0 | 99.7 | −0.012 |
| | 2\*2\*20 | com1 | 91.2 | 99.5 | 0.067 | 91.6 | 99.7 | −0.003 |
| | 2\*2\*100 | com1 | 81.7 | 100.0 | 0.016 | 82.6 | 100.0 | 0.009 |
| | 2\*2\*10 | com2 | 94.3 | 99.3 | 0.020 | 95.5 | 100.0 | −0.093 |
| | 2\*2\*20 | com2 | 94.2 | 99.3 | 0.013 | 96.1 | 100.0 | −0.018 |
| | 2\*2\*100 | com2 | 90.7 | 100.0 | −0.015 | 90.5 | 100.0 | 0.008 |
| MIXMCP-ML$_2$ | 2\*2\*10 | com1 | 80.1 | 100.0 | 0.040 | 87.6 | 100.0 | 0.005 |
| | 2\*2\*20 | com1 | 91.9 | 100.0 | 0.100 | 92.8 | 100.0 | 0.027 |
| | 2\*2\*100 | com1 | 98.1 | 81.0 | 0.304 | 98.1 | 51.3 | 0.205 |
| | 2\*2\*10 | com2 | 93.0 | 100.0 | 0.041 | 96.5 | 100.0 | −0.015 |
| | 2\*2\*20 | com2 | 96.8 | 100.0 | 0.055 | 98.4 | 100.0 | 0.084 |
| | 2\*2\*100 | com2 | 97.4 | 100.0 | 0.076 | 97.2 | 100.0 | 0.037 |
| MIXLASSO-ML$_2$ | 2\*2\*10 | com1 | 76.1 | 100.0 | 0.089 | 76.3 | 99.7 | −0.019 |
| | 2\*2\*20 | com1 | 81.6 | 100.0 | 0.066 | 81.4 | 100.0 | −0.011 |
| | 2\*2\*100 | com1 | 80.5 | 76.0 | 0.053 | 81.1 | 70.7 | 0.041 |
| | 2\*2\*10 | com2 | 85.1 | 100.0 | 0.015 | 88.3 | 100.0 | −0.001 |
| | 2\*2\*20 | com2 | 91.2 | 87.3 | 0.001 | 90.8 | 100.0 | −0.015 |
| | 2\*2\*100 | com2 | 79.1 | 99.3 | 0.048 | 87.1 | 100.0 | −0.039 |

From the simulation results, the S2 values of MIXSCAD-ML$_2$ are the largest, which indicates that MIXSCAD-ML$_2$ performs better than MIXLASSO-ML$_2$ and MIXMCP-ML$_2$ in correctly estimating zero coefficients. The mean estimate over all falsely identified non-zero predictors from MIXSCAD-ML$_2$ is smaller than those from MIXLASSO-ML$_2$ and MIXMCP-ML$_2$.

In this section, we evaluate the performance of the proposed method and algorithm through an analysis of the behavior of urban vehicular traffic in the city of São Paulo. This data set records notable occurrences of traffic in the metropolitan region of São Paulo in the period from 14 to 18 December 2009, and was acquired from the website http://archive.ics.uci.edu/ml/datasets.php. Observations were registered from 7:00 to 20:00 every 30 minutes. The data set contains 135 observations and 18

| Method | n | Com | | | | | | |
|---|---|---|---|---|---|---|---|---|
| MIXSCAD-ML$_2$ | 2\*2\*10 | com1 | 88.8 | 89.5 | 0.408 | 92.4 | 84.0 | −0.048 |
| | 2\*2\*20 | com1 | 89.9 | 84.5 | 0.432 | 91.5 | 79.0 | 0.168 |
| | 2\*2\*10 | com2 | 94.9 | 96.3 | 0.051 | 97.0 | 98.0 | −0.139 |
| | 2\*2\*20 | com2 | 96.3 | 92.0 | 0.076 | 95.7 | 95.0 | 0.008 |
| MIXMCP-ML$_2$ | 2\*2\*10 | com1 | 80.8 | 94.0 | 0.417 | 87.4 | 81.3 | 0.115 |
| | 2\*2\*20 | com1 | 85.8 | 78.5 | 0.540 | 87.3 | 68.0 | 0.031 |
| | 2\*2\*10 | com2 | 89.4 | 95.7 | 0.158 | 94.0 | 99.0 | 0.138 |
| | 2\*2\*20 | com2 | 93.4 | 91.0 | 0.269 | 95.6 | 95.5 | 0.118 |
| MIXLASSO-ML$_2$ | 2\*2\*10 | com1 | 73.9 | 84.5 | 0.426 | 79.9 | 76.0 | −0.015 |
| | 2\*2\*20 | com1 | 81.3 | 66.5 | 0.579 | 83.5 | 56.7 | −0.117 |
| | 2\*2\*10 | com2 | 76.7 | 96.0 | 0.080 | 83.6 | 99.5 | 0.018 |
| | 2\*2\*20 | com2 | 88.2 | 75.0 | 0.111 | 93.4 | 90.5 | −0.126 |

variables as well as one response variable. The covariate acronyms are hour (HO), immobilized bus (IB), broken truck (BT), vehicle excess (VE), accident victim (AV), running over (RO), fire vehicles (FV), occurrence involving freight (OIF), incident involving dangerous freight (IIDF), lack of electricity (LOE), fire (FI), point of flooding (POF), manifestations (MA), defect in the network of trolleybuses (DNT), tree on the road (TRR), semaphore off (SO), and intermittent semaphore (IS); the response is slowness in traffic (percent). To account for the effect of the date on the behavior of traffic, we add a new variable, day (DA).
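A sketch of the preprocessing implied here, assuming 27 half-hourly records per day (7:00 to 20:00 over five days gives the 135 observations); the covariate matrix `X` stands in for the columns read from the UCI file, whose exact file name and column order should be checked against the archive:

```python
import numpy as np

# 27 half-hourly records per day (7:00-20:00), 5 days (14-18 Dec 2009)
N_PER_DAY, N_DAYS = 27, 5

def add_day_and_lags(X, d=2):
    """Append the day indicator DA to the (135, 17) covariate matrix X and
    stack d lags for an FM-AR(d) fit; rows without a full lag history drop."""
    day = np.repeat(np.arange(1, N_DAYS + 1), N_PER_DAY)[:, None]  # DA = 1..5
    Xd = np.hstack([X, day])                                       # 18 covariates
    n = len(Xd)
    return np.hstack([Xd[d - i : n - i] for i in range(d + 1)])
```

For $d = 2$ this yields a $(135 - 2) \times (18 \cdot 3)$ design matrix of current and lagged covariates, matching the FM-AR(2) and FM-GARCH(2,1,1) fits reported below.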

The estimated covariate effects from the FMR, FM-AR(2), and FM-GARCH(2,1,1) models with the mixture-$l_2$ penalties are reported in the table below.

| Covariates | FMR com1 | FMR com2 | FM-AR com1 | | FM-AR com2 | | FM-GARCH com1 | | FM-GARCH com2 | |
|---|---|---|---|---|---|---|---|---|---|---|
| Intercept | 7.32 | −2.31 | 7.56 | - | −1.89 | - | 1.39 | - | 6.24 | - |
| π | 0.37 | 0.63 | 0.34 | - | 0.66 | - | 0.47 | - | 0.53 | - |
| DA | - | 1.47 | - | - | - | 1.54 | - | - | 0.99 | - |
| HO | 0.13 | 0.52 | 0.11 | 0.36 | - | 0.13 | 0.29 | 0.39 | - | −0.03 |
| IB | - | - | - | - | - | - | - | - | - | - |
| BT | - | - | - | - | - | - | - | - | - | - |
| VE | - | - | - | - | - | - | - | - | - | - |
| AV | - | - | - | - | - | - | - | - | - | - |
| RO | - | - | - | - | - | - | - | - | - | - |
| FV | - | - | - | - | - | - | - | - | - | - |
| OIF | - | - | - | - | - | - | - | - | - | - |
| IIDF | - | - | - | - | - | - | - | - | - | - |
| LOE | - | 1.75 | - | - | - | 1.88 | - | - | - | 1.80 |
| FI | - | - | - | - | - | - | - | - | - | - |
| POF | - | 0.61 | - | 1.25 | - | - | - | 1.41 | - | - |
| MA | - | - | - | - | - | - | - | - | - | - |
| DNT | - | - | - | −0.91 | - | - | - | −0.71 | - | - |
| TRR | - | - | - | - | - | - | - | - | - | - |
| SO | - | - | - | - | - | - | - | - | - | - |
| IS | - | - | - | - | - | - | - | - | - | - |

| Model | K | BIC | MSE | |
|---|---|---|---|---|
| FM-GARCH(2,1,1) | 2 | 622.90 | 1.93 | 0.90 |
| FM-AR(2) | 2 | 677.32 | 2.09 | 0.8 |
| FMR | 2 | 682.36 | 2.41 | 0.87 |

In this article, we discussed the modeling of a response variable which is a time series whose finite mixture distribution depends on covariates, along with the associated variable selection problem. We proposed the FM-AR models and FM-GARCH models for modeling time series data that arise from a heterogeneous population, and proposed a new regularization method (MIXLASSO-ML$_2$, MIXMCP-ML$_2$, MIXSCAD-ML$_2$) for variable selection in these models, which is composed of a mixture of the appropriate penalty functions and the $l_2$ penalty. The simulations and the real data analysis show that MIXSCAD-ML$_2$ is always superior to the other penalty methods.

The authors declare no conflicts of interest regarding the publication of this paper.

Liu, J. and Ye, W.Z. (2020) Variable Selection in Finite Mixture of Time-Varying Regression Models. Advances in Pure Mathematics, 10, 101-113. https://doi.org/10.4236/apm.2020.103007