A Fast Iteration Method for Mixture Regression Problem

In this paper, we propose a Fast Iteration Method for solving the mixture regression problem, which can be treated as a form of model-based clustering. Compared to the EM algorithm, the proposed method is faster and more flexible, and it can solve mixture regression problems with different error distributions (e.g., the Laplace and t distributions). Extensive numeric experiments show that the proposed method performs well on both randomly simulated data and real data.


Introduction
In some situations, the data may not be suitable for the linear model; alternatives include nonlinear regression, nonparametric regression, and the generalized linear model. The mixture regression problem discussed in this paper concerns mixed data: among the observations, some data come from one model while others come from other models. As in [1], the mixture regression problem can be treated as a regression or a clustering problem and can be written as

y_i = x_i^T β_k + ε_{ik},  if observation i belongs to population Ω_k,  (1.1)

where β_1, ..., β_g and σ_1, ..., σ_g are unknown vectors of regression coefficients and unknown positive scalars, respectively. The random errors ε_{ik} are assumed to be independent of x_i, and π_{ik} is the probability that observation x_i belongs to population Ω_k (i.e., π_{ik} = P(i ∈ Ω_k)). Equation (1.1), a mixture regression model, can also be treated as model-based clustering [2], which can be solved by the EM algorithm [3]-[5]. The EM algorithm is a statistical method that maximizes the likelihood function iteratively. Each iteration consists of two steps: the E-step estimates the expectation of the latent quantities under the current parameters, and the M-step maximizes the likelihood function given the quantities computed in the E-step. The iteration continues until the change in the likelihood function is less than a given value (e.g., 10^-6).
In [6], Song used an EM algorithm to solve a robust mixture regression model. The algorithm can be outlined as follows (the exact update formulas are omitted here):
• Initialization: choose starting values for the parameters (π_k, β_k, σ_k), k = 1, ..., g.
• E-step: at the (k + 1)th iteration, compute each observation's posterior probability of belonging to every population under the current parameters.
• M-step: use these posterior probabilities to update (π_k, β_k, σ_k) by maximizing the expected log-likelihood.
Solving the mixture regression problem with the EM algorithm is complex, with a large amount of calculation in each iteration. In this paper, we propose a Fast Iteration Method inspired by k-means clustering [7] to solve the mixture regression problem. The aim of our method is to fit the data with several linear models; it can also be viewed as a model that uses several lines to explain the data (see Figure 1). We introduce this Fast Iteration Method in the next section.
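For reference, the two steps above can be sketched in code. The following is a minimal EM implementation for a g-component mixture of linear regressions with normal errors; it is an illustrative sketch, not Song's robust variant from [6], and the initialization strategy (fitting each component through p randomly chosen points, with random restarts) is our own assumption.

```python
import numpy as np

def em_mixture_regression(X, y, g=2, n_restarts=10, max_iter=200, tol=1e-6, seed=0):
    """EM sketch for a g-component mixture of linear regressions
    with normal errors. Returns (beta, sigma, pi) of the best restart."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best = (-np.inf, None)
    for _ in range(n_restarts):
        # Initialize each component's line through p randomly chosen points.
        beta = []
        for _ in range(g):
            idx = rng.choice(n, size=p, replace=False)
            beta.append(np.linalg.lstsq(X[idx], y[idx], rcond=None)[0])
        beta = np.array(beta)
        sigma = np.full(g, y.std() + 1e-6)
        pi = np.full(g, 1.0 / g)
        prev_ll = -np.inf
        for _ in range(max_iter):
            # E-step: responsibilities r[i, k] ∝ pi_k * N(y_i; x_i^T beta_k, sigma_k^2).
            resid = y[:, None] - X @ beta.T                       # n x g
            dens = pi * np.exp(-0.5 * (resid / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
            total = dens.sum(axis=1, keepdims=True) + 1e-300
            r = dens / total
            ll = np.log(total).sum()
            # M-step: weighted least squares per component, then sigma_k and pi_k.
            for k in range(g):
                w = r[:, k]
                Xw = X * w[:, None]
                beta[k] = np.linalg.solve(X.T @ Xw + 1e-10 * np.eye(p), Xw.T @ y)
                res_k = y - X @ beta[k]
                sigma[k] = np.sqrt((w * res_k ** 2).sum() / (w.sum() + 1e-12)) + 1e-8
                pi[k] = w.mean()
            if abs(ll - prev_ll) < tol:   # stop when the log-likelihood stabilizes
                break
            prev_ll = ll
        if ll > best[0]:
            best = (ll, (beta.copy(), sigma.copy(), pi.copy()))
    return best[1]
```

On well-separated two-line data this typically recovers both regression lines; the random restarts guard against poor local maxima of the likelihood.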

Existence of Parameter Matrix
The aim of the mixture regression problem is to find several linear models (i.e., every parameter vector β_k of the model). If we knew which population each observation belongs to, the mixture regression model could be treated as a simple linear model; that is, there exist β_1, ..., β_g that minimize the squared error. We give the following theorem.
Theorem 2.1 (Existence of the parameter matrix β). In the mixture regression problem, there exists a parameter matrix β that minimizes the squared error.

Proof
Let τ_{ik} ∈ {0, 1} indicate whether the ith observation belongs to Ω_k. If τ is fixed, the model can be written as a linear model: once every τ_{ik} is known, the problem reduces to linear regression. For example, suppose there are two populations Ω_1 and Ω_2 and one variable (p = 1, g = 2); given the elements τ_{ik} of the matrix τ, the full model is the same as a linear regression model. Since each τ_{ik} is 0 or 1 and each observation belongs to exactly one population, the matrix τ has only finitely many possible configurations (at most g^n).
Among these finitely many configurations, there is one that attains the minimum squared error, so a minimizing parameter matrix β exists.
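To make the finiteness argument concrete, the following brute-force sketch enumerates all g^n assignment configurations for a tiny data set and returns the one attaining the minimum squared error (illustrative only; the enumeration is exponential in n):

```python
import itertools
import numpy as np

def best_assignment(X, y, g):
    """Enumerate the finitely many assignment matrices tau (at most g**n)
    and return the assignment attaining the minimum total squared error."""
    n = X.shape[0]
    best_sse, best_tau = np.inf, None
    for tau in itertools.product(range(g), repeat=n):
        tau = np.asarray(tau)
        sse = 0.0
        for k in range(g):
            mask = tau == k
            if not mask.any():
                continue
            # OLS fit within population k, then accumulate its residual SSE.
            beta = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
            sse += ((y[mask] - X[mask] @ beta) ** 2).sum()
        if sse < best_sse:
            best_sse, best_tau = sse, tau
    return best_sse, best_tau
```

On noiseless data drawn from two lines, the minimum squared error is zero and is attained exactly at the true grouping, which is the content of Theorem 2.1 in miniature.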

Fast Iteration Method for Mixture Regression Problem
The EM algorithm is meaningful for the mixture regression problem, but there are other ways to solve it. The algorithm below is a fast iteration method for the mixture regression model that can handle data drawn from different populations. The method is inspired by k-means, the well-known clustering algorithm: it calculates the distance from each point to every model and moves the "worst" observation to a more suitable model. After repeating this calculation several times, the algorithm stops once moving any point to another model no longer improves the loss function, that is, once the change in the loss function falls below a threshold (e.g., 10^-6 or 10^-9). For small samples, this stopping rule finds a best classification: moving any observation to any other population would make things worse. Setting a threshold also prevents the algorithm from falling into an endless loop.
Our proposed Fast Iteration Method is similar to the k-means algorithm: compute the MSE of each point under every model and move the point whose reassignment decreases the MSE most. We summarize the method as follows.
Algorithm:
1) Compute initial values. Group information: randomly cut the data into g parts Ω_1, ..., Ω_g, each containing about [n/g] observations, and estimate the parameters of each part.
2) Move observation I to population K, choosing the pair (I, K) that decreases the loss most, and refresh Ω_1, Ω_2, ..., Ω_g accordingly.
3) Repeat step 2 until the change in the loss function falls below the threshold.
The parameter estimation method F(X, Y) used in the algorithm can be changed according to the situation. For example, F(X, Y) can be the OLS estimator; if we need a robust estimate, F(X, Y) can be a method such as median regression. Different estimation methods make the model much more flexible: the OLS estimate can be computed quickly, while median regression performs better when ε is drawn from a Laplace or t distribution.
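A minimal sketch of the Fast Iteration Method with a pluggable estimator F(X, Y) is given below. For simplicity it uses a batch variant that reassigns every point to its best-fitting model on each pass rather than moving one point at a time, and the IRLS-based `lad_fit` is one simple stand-in for median regression; these implementation details are our own assumptions, not the paper's exact procedure.

```python
import numpy as np

def ols_fit(X, y):
    # F(X, Y) as ordinary least squares.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def lad_fit(X, y, iters=50):
    # F(X, Y) as median (least-absolute-error) regression,
    # approximated here by iteratively reweighted least squares.
    beta = ols_fit(X, y)
    p = X.shape[1]
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(y - X @ beta), 1e-8)
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw + 1e-10 * np.eye(p), Xw.T @ y)
    return beta

def fast_iteration(X, y, g, fit=ols_fit, tol=1e-9, max_iter=200,
                   n_restarts=10, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best = (np.inf, None, None)
    for _ in range(n_restarts):
        labels = rng.integers(0, g, size=n)   # random initial partition
        loss = np.inf
        for _ in range(max_iter):
            # Keep every population non-empty.
            for k in range(g):
                if not np.any(labels == k):
                    labels[rng.integers(n)] = k
            # Fit each population with the plug-in estimator F(X, Y).
            beta = np.array([fit(X[labels == k], y[labels == k])
                             for k in range(g)])
            # Squared error of every point under every model (n x g),
            # then move each point to the model that fits it best.
            resid2 = (y[:, None] - X @ beta.T) ** 2
            new_labels = resid2.argmin(axis=1)
            new_loss = resid2[np.arange(n), new_labels].sum()
            if loss - new_loss < tol:         # stop once nothing improves
                break
            labels, loss = new_labels, new_loss
        if loss < best[0]:
            best = (loss, beta, labels)
    return best
```

Calling `fast_iteration(X, y, g, fit=lad_fit)` swaps OLS for the robust estimator without any other change, which is the flexibility the text describes.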

Numeric Simulation
To validate the model, we designed a numeric simulation and generated sample data (x_i, y_i) under three models.
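As an illustration of this simulation design, the following sketch generates data from a hypothetical two-component mixture under each of the error distributions discussed in the paper (normal, Laplace, and t); all parameter values here are our own assumptions, not the paper's actual simulation settings.

```python
import numpy as np

def simulate(n=100, error="normal", seed=0):
    """Generate (x_i, y_i) from a hypothetical two-component mixture:
    y = 1 + 2x or y = 5 - x, plus noise from the chosen error family."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 3, n)
    comp = rng.random(n) < 0.5              # population label, pi = (0.5, 0.5)
    mean = np.where(comp, 1 + 2 * x, 5 - x)
    if error == "normal":
        eps = rng.normal(0, 0.5, n)
    elif error == "laplace":
        eps = rng.laplace(0, 0.5, n)
    elif error == "t":
        eps = 0.5 * rng.standard_t(3, n)    # heavy-tailed errors
    else:
        raise ValueError(f"unknown error type: {error}")
    return x, mean + eps, comp
```

Repeating such a draw many times and recording the bias and MSE of each fitted parameter reproduces the structure of the comparison reported in the tables.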
We compared three methods: Fast Iteration with a linear model (FI-OLS), Fast Iteration with a median regression model (FI-LAE), and the EM algorithm. Each was used to solve the mixture regression problem for every model.
We repeated each simulation 1000 times and recorded the bias and MSE of every parameter (see Table 1 for model 1, Table 2 for model 2, and Table 3 for model 3). In Table 2, FI-OLS performs better under the normal distribution, with a smaller MSE than FI-LAE and the EM algorithm. Under the other distributions in model 2, FI-LAE is better than FI-OLS, and the EM algorithm has the largest MSE. In Table 3, FI-LAE also attains a small MSE under the mixture and t distributions, while under the Laplace distribution FI-LAE is only slightly better than the EM algorithm. As the name "Fast Iteration" suggests, FI-OLS and FI-LAE also run faster than the EM algorithm: for 100 observations with 2 populations, EM took about 0.07 s (R package: Mixreg), while FI-OLS took about 0.02 s (i5 CPU, 8 GB memory).

Real Data Simulation
In this section, we use the data from Cohen (1984) [8]: a pure fundamental tone was played to a trained musician, and the 150 observations record the tuned ratio and the stretch ratio produced by the same musician. We will see that the FI-LAE algorithm can handle mixture regression data with outliers.
We ran three algorithms in four situations. In the first situation, the original data are used for the regression. In the other three situations, 5 outliers are placed at different positions in the data: at (3, 5) in Situation 2, (1.5, 0) in Situation 3, and (0, 5) in Situation 4. The algorithms are FI-OLS and FI-LAE, as described in our Fast Iteration for Mixture Regression model, and the EM algorithm via the Mixreg package in R [9]. In Situation 1, the three algorithms give similar answers, and all perform well. In the other situations, FI-LAE shows that when there are outliers in the data, a robust regression leads to a closer answer, while EM and FI-OLS are affected by the outliers to different degrees.

Conclusion
From the discussion above, we can conclude that the Fast Iteration method solves the mixture regression problem efficiently and effectively. Compared with the ordinary EM algorithm, it solves the problem quickly and obtains good performance. By changing the parameter estimation method, the Fast Iteration Method can also solve mixture regression when ε follows different distributions.

Figure 1. Three-line example for the mixture regression problem.

Figure 2. Real data simulation for FI-OLS, FI-LAE, and the EM algorithm.

Table 1 (model 1), Table 2 (model 2), and Table 3 (model 3) report the simulation results. As the three tables show, the Fast Iteration for Mixture Regression with LAE performs better under the Laplace and t distributions. More specifically, FI-LAE has the smaller MSE under the Laplace, mixture normal, and t distributions, while the EM algorithm gives a better estimate under the normal distribution.