Variable Selection in Finite Mixture of Time-Varying Regression Models

In this paper, we study the regression problem for time series data from heterogeneous populations on the basis of the finite mixture regression (FMR) model. We propose two finite mixture of time-varying regression models to address it. A regularization method for variable selection in these models is proposed, which combines an appropriate penalty function with an $l_2$ penalty. A block-wise minorization-maximization (MM) algorithm is used for maximum penalized log quasi-likelihood estimation of these models. The procedure is illustrated by simulations and by an application to the behavior of urban vehicular traffic in the city of São Paulo in the period from 14 to 18 December 2009, which shows that the proposed models outperform the FMR models.


Introduction
The problem of variable selection in FMR models has been widely discussed [1] [2] [3]. When a response variable $y$ with a finite mixture distribution depends on covariates $x$, we obtain a finite mixture of regression (FMR) model. The FMR model with $K$ components is defined in [3], where $y$ is an independent and identically distributed (IID) response and $x$ is a $p$-dimensional vector of covariates. However, in some situations the observations are not independent. As pointed out in [2], in the analysis of the PD data, observations from each patient over time were assumed to be independent to facilitate the analysis and the comparison with results from the literature, but the validity of this assumption may be questionable. We therefore consider the situation in which the observations form a time series.
The generalized autoregressive conditional heteroscedasticity (GARCH) model is widely used in time series analysis. A mixture generalized autoregressive conditional heteroscedastic (MGARCH) model was proposed in [4]. The MixN-GARCH model was generalized in [5] by relaxing the assumption of constant mixing weights. We therefore combine the GARCH model and the FMR model to address the above problem.
There have been extensive studies of variable selection methods. A recent review of the literature on the variable selection problem in FMR models can be found in [6]. A general family of penalty functions, including the least absolute shrinkage and selection operator (LASSO), the minimax concave penalty (MCP) and the smoothly clipped absolute deviation (SCAD), is considered in [2] and [7].
Maximum penalized log-likelihood (MPL) estimation is usually carried out with the EM algorithm. A new algorithm (block-wise MM) for the MPL estimation of the L-MLR model was proposed in [8]. It was proved to have some desirable features, such as coordinate-wise updates of parameters, monotonicity of the penalized likelihood sequence, and global convergence of the estimates to a stationary point of the penalized log-likelihood function, which are missing in the commonly used approximate-EM algorithm presented in [3].
The rest of the paper is organized as follows. Section 2 defines the finite mixture of time-varying regression models, and Section 3 discusses feature selection methods. Section 4 presents the block-wise MM algorithm for estimation and the BIC for choosing the tuning parameters and the number of components, and derives the example of the Gaussian distribution. Simulation studies on the performance of the new variable selection methods are then provided in Section 5. In Section 6, the analysis of a real data set illustrates the use of the procedure. Finally, conclusions are given in Section 7.

Finite Mixture of Time-Varying Regression Models
for $k = 1, \ldots, K$, for a given link function $h(\cdot)$ and a dispersion parameter $\phi_{kt}$.
The master vector of all parameters is given by $\theta = (\pi^{T}, \alpha^{T}, \phi^{T}, \beta^{T})^{T}$.

Feature Selection Method
Let the data be a sample of observations from the FM-AR or FM-GARCH model. The quasi-likelihood function of the parameter $\theta$ is given in [9], and the log quasi-likelihood function is its logarithm. When the effect of a component of $x$ is not significant, the corresponding ordinary maximum quasi-likelihood estimate is often close to 0 but not exactly 0, so the covariate is not excluded from the model. Inspired by an idea of [2], we estimate $\theta$ by maximizing the penalized log quasi-likelihood (MPLQ) function for the model, using a mixture penalty (or regularization) function that adds a ridge term, governed by a ridge tuning parameter, to a LASSO, MCP or SCAD penalty. In the penalty definitions, $I$ is the indicator function. The constants $a_{nk} \geq 2$ and $b_{nk} \geq 0$ are as in [2], and the LASSO tuning parameter $\lambda_{nk} \geq 0$ controls the amount of penalty. The asymptotic properties of these penalty functions can be derived analogously to [3] and [2]. We call the penalty functions in (10) constructed from LASSO, MCP and SCAD jointly with the mixed $L_2$-norm the MIXLASSO-ML2, MIXMCP-ML2 and MIXSCAD-ML2 penalties.
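For orientation, the three base penalties named above can be written down in their common textbook forms. The sketch below is an assumption-laden illustration: it uses the standard single-coefficient definitions with a generic tuning parameter `lam` and shape constants `a` and `b`, and it does not reproduce the sample-size scaling, the component-specific indices or the ridge term of the penalty in (10). The function names are hypothetical.

```python
def lasso_penalty(beta, lam):
    """LASSO penalty in its usual form: lam * |beta|."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty in its usual piecewise form (a is typically taken > 2)."""
    t = abs(beta)
    if t <= lam:
        # Linear near zero, identical to LASSO.
        return lam * t
    if t <= a * lam:
        # Quadratic transition region.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant for large coefficients, so they are not shrunk further.
    return lam * lam * (a + 1) / 2


def mcp_penalty(beta, lam, b=3.0):
    """MCP penalty in its usual piecewise form (b > 0 controls concavity)."""
    t = abs(beta)
    if t <= b * lam:
        return lam * t - t * t / (2 * b)
    return b * lam * lam / 2
```

Unlike LASSO, both SCAD and MCP level off for large coefficients, which is why they penalize strong effects less heavily.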

Numerical Solutions
A new method for maximizing the penalized log-likelihood function is the block-wise Minorization Maximization (MM) algorithm inspired by [8], which is also known as block successive lower-bound maximization (BSLM) algorithm in the language of [10]. At each iteration of the method, the function is maximized with respect to a single block of variables while the rest of the blocks are held fixed. We shall now proceed to describe the general framework of the algorithm.
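The block-wise idea can be sketched on a toy problem. The example below is only an illustration under stated assumptions: the two-parameter concave objective, the closed-form block updates and all names (`blockwise_maximize`, `toy_objective`, `toy_updates`) are hypothetical and stand in for the model-specific minorizers. Each update maximizes the objective over one block exactly while the other block is held fixed, so the objective sequence cannot decrease.

```python
def blockwise_maximize(objective, block_updates, theta, tol=1e-12, max_iter=1000):
    """Cycle through the block updates until the objective gain is below tol."""
    theta = dict(theta)
    history = [objective(theta)]
    for _ in range(max_iter):
        # One sweep: update each block in turn, holding the others fixed.
        for name, update in block_updates.items():
            theta[name] = update(theta)
        history.append(objective(theta))
        if abs(history[-1] - history[-2]) < tol:
            break
    return theta, history


def toy_objective(th):
    """A strictly concave toy objective in two scalar blocks, a and b."""
    a, b = th["a"], th["b"]
    return -(a - 1.0) ** 2 - (b + 2.0) ** 2 - a * b


# Closed-form maximizers of toy_objective in one block given the other,
# obtained by setting the partial derivative to zero.
toy_updates = {
    "a": lambda th: 1.0 - th["b"] / 2.0,
    "b": lambda th: -2.0 - th["a"] / 2.0,
}

theta_hat, values = blockwise_maximize(toy_objective, toy_updates, {"a": 0.0, "b": 0.0})
```

In the toy problem the iterates approach the joint maximizer at a = 8/3, b = -10/3.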

Maximization of the Penalized Log-Likelihood Function
We follow the approach of [8] and minorize an $\varepsilon$-approximation of the negative penalty function. Moreover, we minorize the log quasi-likelihood function. The block-wise updates for $\alpha$, $\gamma_0$, $\gamma$, $\delta$ and $\beta$ can then be obtained by solving (15)-(17), setting the first-order derivatives equal to 0.
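To make the $\varepsilon$-approximation step concrete, the following sketch applies it to a one-dimensional LASSO-type problem. It is an assumption, not the update for the models in this paper: the objective $-\tfrac{1}{2}(\beta - z)^2 - \lambda\sqrt{\beta^2 + \varepsilon^2}$ and the function name `mm_eps_lasso` are hypothetical. The smoothed absolute value is bounded above by a quadratic at the current iterate, using the concavity of the square root, which gives a closed-form update.

```python
import math


def mm_eps_lasso(z, lam, eps=1e-6, tol=1e-12, max_iter=1000):
    """MM iterations for max_beta -0.5*(beta - z)^2 - lam*sqrt(beta^2 + eps^2)."""

    def objective(beta):
        return -0.5 * (beta - z) ** 2 - lam * math.sqrt(beta * beta + eps * eps)

    beta = z
    history = [objective(beta)]
    for _ in range(max_iter):
        # Quadratic bound at the current iterate:
        # sqrt(b^2 + eps^2) <= w + (b^2 - beta^2) / (2 * w), with w below.
        w = math.sqrt(beta * beta + eps * eps)
        # Maximizer of the resulting quadratic surrogate (first-order condition).
        beta = z * w / (w + lam)
        history.append(objective(beta))
        if abs(history[-1] - history[-2]) < tol:
            break
    return beta, history
```

For `z = 3` and `lam = 1` the iterates settle close to the soft-thresholding solution 2, and for `z = 0.5` they shrink to a value of order `eps`, which is treated as zero in practice.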
We now present an example based on the Gaussian FM-GARCH model to illustrate the procedure described above, and give Lemma 1, taken from [11], which provides a useful minorizer for the MPL estimation of the Gaussian FM-GARCH model.
Here $e_{kt}$ is an independent and identically distributed series with mean zero and unit variance.
According to [8], and using Lemma 1, we can obtain a further minorizer for the Gaussian FM-GARCH model. The block-wise update of $\pi$ for the Gaussian FM-GARCH model comes from (18), and the block-wise updates for $\alpha$, $\gamma$ and $\delta$ can be obtained from (15)-(16) by setting the first-order derivatives equal to 0. By doing so, we obtain the coordinate-wise updates for the $\alpha$, $\gamma_0$ and remaining blocks; the update for the $\beta$ block involves the first derivative of (11) with respect to $\beta$.

Selection of Thresholding Parameters and Components
To implement the methods described in Sections 3 and 4.1, we need to select the tuning parameters and the number of components, for which we use the BIC. In the BIC, $p$ is the dimensionality of $\beta$ (i.e. the total number of non-zero regression coefficients in the model), and $q$ equals $3K$ (for FM-AR models) or $5K$ (for FM-GARCH models). The block-wise MM algorithm is iterated until a convergence criterion is met. In this article, we use the absolute convergence criterion of [8], where TOL > 0 is a small tolerance constant. Based on the discussion above, we summarise our algorithm in Algorithm 1.
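As a rough illustration of BIC-based selection, the sketch below uses the generic form of the criterion, $-2 \times$ log-likelihood plus the number of free parameters times $\log n$, with the parameter count taken as $p + q$ in the notation above. This generic form, the rule that smaller is better, the function names and the candidate fits are all assumptions for illustration; the exact criterion used in the paper may differ.

```python
import math


def bic(log_quasi_lik, p_nonzero, q_other, n_obs):
    """Generic BIC: -2 * log-likelihood + (p + q) * log(n); smaller is better."""
    return -2.0 * log_quasi_lik + (p_nonzero + q_other) * math.log(n_obs)


def select_by_bic(candidates, n_obs):
    """Return the candidate fit (a dict) with the smallest BIC."""
    return min(candidates, key=lambda c: bic(c["loglik"], c["p"], c["q"], n_obs))


# Hypothetical fits of a two-component FM-AR model (q = 3K = 6) for three
# tuning-parameter values; the log-likelihoods are made up for illustration.
fits = [
    {"lam": 0.1, "loglik": -120.0, "p": 10, "q": 6},
    {"lam": 0.5, "loglik": -123.0, "p": 4, "q": 6},
    {"lam": 1.0, "loglik": -140.0, "p": 2, "q": 6},
]
best = select_by_bic(fits, n_obs=200)
```

In this made-up example the middle candidate wins: it gives up little fit relative to the densest model while using far fewer non-zero coefficients.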

Simulated Data Analysis
In this section, we evaluate the performance of the proposed method and algorithm via simulations. We consider the Gaussian FM-AR models and the Gaussian FM-GARCH models. Following [2] and [8], we use as performance measures the correctly estimated zero coefficients (S1), the correctly estimated non-zero coefficients (S2) and the mean estimate over all falsely identified non-zero predictors ($M_{NZ}$). The selection of thresholding parameters and components is carried out with the simulated annealing (SA) algorithm. All simulations were evaluated for varying values of the dimension $p$, with 100 repetitions for each.
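The three performance measures can be computed per component along the following lines. This is a minimal sketch under assumptions: S1 and S2 are reported as percentages, an estimate counts as zero when its absolute value is at most a small tolerance, and $M_{NZ}$ is the plain mean of the estimates for coefficients that are truly zero but estimated as non-zero. The function name `selection_metrics` and the example vectors are hypothetical.

```python
def selection_metrics(beta_true, beta_hat, zero_tol=1e-8):
    """Return (S1, S2, M_NZ) for one vector of true and estimated coefficients."""
    if len(beta_true) != len(beta_hat):
        raise ValueError("beta_true and beta_hat must have the same length")
    zeros = [j for j, b in enumerate(beta_true) if b == 0]
    nonzeros = [j for j, b in enumerate(beta_true) if b != 0]
    est_zero = [abs(b) <= zero_tol for b in beta_hat]

    # S1: percentage of true zero coefficients estimated as zero (specificity).
    s1 = 100.0 * sum(est_zero[j] for j in zeros) / len(zeros) if zeros else float("nan")
    # S2: percentage of true non-zero coefficients estimated as non-zero (sensitivity).
    s2 = (100.0 * sum(not est_zero[j] for j in nonzeros) / len(nonzeros)
          if nonzeros else float("nan"))
    # M_NZ: mean estimate over falsely identified non-zero coefficients.
    false_pos = [beta_hat[j] for j in zeros if not est_zero[j]]
    m_nz = sum(false_pos) / len(false_pos) if false_pos else 0.0
    return s1, s2, m_nz


s1, s2, m_nz = selection_metrics([1.5, 0, 0, -2.0, 0], [1.2, 0, 0.3, 0, 0])
```

In a simulation study these values would be averaged over the repetitions for each component and each penalty.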
Columns of $x$ are drawn from a multivariate normal distribution with mean 0, variance 1 and two correlation structures. Table 1 reports the results. We can see that when the dimension $p = 100$, the S2 in com1 of $X_{t-1}$ from MIXSCAD-ML2 is 100; however, the S2 in com1 of

Simulated Data Analysis of Gaussian FM-GARCH
The second set of simulations is based on the Gaussian FM-GARCH(2,1,1) model. Again assuming that $K$ is known, the simulation model has $K = 2$ components.
Table 1. Summary of MIXLASSO-ML2, MIXMCP-ML2 and MIXSCAD-ML2-penalized FM-AR(2) models with the BIC method from the simulated scenario. Average correctly estimated zero coefficients (specificity; S1), average correctly estimated non-zero coefficients (sensitivity; S2), and the mean $\beta$ estimate over all incorrectly estimated non-zero coefficients (MNZ) are also reported.
From Table 2, we can see that in all simulations the values of S1 in com1 and com2 for $X_t$ and $X_{t-1}$ from MIXSCAD-ML2 are the largest, which indicates that MIXSCAD-ML2 performs better than MIXLASSO-ML2 and MIXMCP-ML2 in correctly estimating zero coefficients. The mean estimate over all falsely identified non-zero predictors ($M_{NZ}$) of $\beta$ from MIXSCAD-ML2 is smaller than those from MIXLASSO-ML2 and MIXMCP-ML2.
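The kind of conditionally heteroscedastic noise that underlies such a simulation can be sketched with a single standard GARCH(1,1) process with Gaussian innovations. This is an illustrative assumption only: it uses the textbook recursion $h_t = \omega + a\,\varepsilon_{t-1}^2 + b\,h_{t-1}$ with $\varepsilon_t = \sqrt{h_t}\,e_t$, hypothetical parameter values and a hypothetical function name, and it does not reproduce the two-component simulation design or its regression part.

```python
import math
import random


def simulate_garch11(n, omega=0.1, a=0.2, b=0.7, seed=0):
    """Simulate n values of a Gaussian GARCH(1,1) error series and its variances."""
    if not (omega > 0 and a >= 0 and b >= 0 and a + b < 1):
        raise ValueError("need omega > 0, a >= 0, b >= 0 and a + b < 1")
    rng = random.Random(seed)
    # Start the recursion at the unconditional variance omega / (1 - a - b).
    h = omega / (1.0 - a - b)
    series, variances = [], []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)       # IID innovation with mean 0 and variance 1
        eps = math.sqrt(h) * e        # error with conditional variance h
        series.append(eps)
        variances.append(h)
        h = omega + a * eps * eps + b * h  # conditional variance for the next step
    return series, variances


errors, cond_var = simulate_garch11(500)
```

A mixture version would draw a component label for each observation and use component-specific parameters in place of the single set used here.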

Discussion
In this article, we discussed the modeling of a time series response variable whose finite mixture distribution depends on covariates, together with the associated variable selection problem. We propose the FM-AR models and FM-GARCH models for modeling time series data that arise from a heterogeneous population, and propose a new regularization method