
Mixture regression is a regression problem with mixed data: some observations are generated by one model while others come from different models. EM and related algorithms can be applied only after the quantity of models is assumed to be known. In this paper we propose an information criterion for the mixture regression model. Compared with ordinary information criteria in data simulations, the results show that our criterion performs better at choosing the correct quantity of models.

Mixture regression is a special case of the regression problem. Rather than being drawn from a single distribution, the data in mixture regression come from multiple distributions, and which distribution each observation comes from is unknown, which degrades parameter estimation. The mixture regression problem can be described as follows [

For every mixture regression problem,

Furthermore,

Parameter estimation can be obtained by the EM algorithm. Fraley et al. [
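
As a concrete sketch, the EM updates for a mixture of linear regressions can be written as follows. This is a generic textbook formulation with illustrative variable names, not the paper's own (missing) display equations: the E-step computes posterior membership probabilities, and the M-step runs weighted least squares per component.

```python
import numpy as np

def em_mixture_regression(X, y, g, n_iter=200, seed=0):
    """Generic EM for a g-component mixture of linear regressions.
    E-step: responsibilities w[i, j] = P(observation i came from model j).
    M-step: weighted least squares for each component."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pi = np.full(g, 1.0 / g)              # mixing proportions
    beta = rng.normal(size=(g, p))        # per-model coefficients
    sigma2 = np.full(g, y.var() + 1e-6)   # per-model error variances
    for _ in range(n_iter):
        # E-step: Gaussian density of each residual under each model.
        resid = y[:, None] - X @ beta.T                        # (n, g)
        dens = np.exp(-0.5 * resid**2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
        w = pi * dens + 1e-300            # floor avoids 0/0 in normalization
        w /= w.sum(axis=1, keepdims=True)
        # M-step: update proportions, coefficients, and variances.
        pi = w.mean(axis=0)
        for j in range(g):
            Wj = w[:, j]
            XtW = X.T * Wj
            beta[j] = np.linalg.solve(XtW @ X, XtW @ y)
            r = y - X @ beta[j]
            sigma2[j] = (Wj * r**2).sum() / Wj.sum()
    return pi, beta, sigma2, w
```

The returned responsibilities `w` play the role of a soft classification matrix; hardening them (arg-max per row) gives the fixed classification matrix discussed in Section 2.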

Column vector

The E-step in the mixture regression model can be obtained by:

When

As every observation is independent, the covariance matrix can be defined as

Song et al. [

Moreover, all the algorithms mentioned above assume that the quantity of models g is known. However, this does not hold in every situation: the number of models g must be chosen before running the algorithm. When X is a low-dimensional matrix, a scatter plot can be drawn to choose g, but inspecting a scatter plot to determine the true quantity of models is not feasible in high-dimensional situations. It is therefore meaningful to discuss how to construct a proper method for choosing the right quantity of models in a mixture regression problem.

The rest of the paper is organized as follows. Section 2 discusses the equivalence between mixture regression and ordinary regression when the classification matrix is fixed. We derive a method based on information criteria in Section 3. Section 4 presents data simulations of the different information criteria. The proof of the theorem is in the Appendix.

Unsupervised learning has its own methods for choosing the quantity of clusters, such as the gap statistic in K-means [

To find a proper method for choosing the quantity of models, we need to find the relationship between mixture regression and other algorithms. Under some conditions, such as when the classification matrix Z is fixed and the random errors have the same variance, mixture regression can be written as a linear regression.

Theorem 1 (Equivalence between Mixture Regression and Linear Regression). If the estimator of

When the random errors in every model are independent and identically distributed from a normal distribution (

The proof can be found in the Appendix.
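
The equivalence can also be checked numerically: with the classification matrix fixed, placing each observation's covariates in the column block of its component turns the mixture into one ordinary least-squares problem. A minimal sketch (the labels `z`, coefficients `beta_true`, and the block layout are illustrative, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, g = 60, 2, 2
X = rng.normal(size=(n, p))
z = rng.integers(0, g, size=n)            # fixed classification labels
beta_true = np.array([[1.0, -2.0], [3.0, 0.5]])
y = np.einsum('ij,ij->i', X, beta_true[z]) + rng.normal(scale=0.3, size=n)

# Block-expanded design: row i carries X_i in the column block of its model.
Xb = np.zeros((n, g * p))
for i in range(n):
    Xb[i, z[i] * p:(z[i] + 1) * p] = X[i]

# One linear regression on the block design ...
beta_joint = np.linalg.lstsq(Xb, y, rcond=None)[0].reshape(g, p)

# ... equals separate OLS fits within each group.
beta_sep = np.vstack([np.linalg.lstsq(X[z == j], y[z == j], rcond=None)[0]
                      for j in range(g)])
print(np.allclose(beta_joint, beta_sep))   # True
```

The column blocks have disjoint supporting rows, so the joint normal equations decouple into the per-group ones, which is exactly the content of Theorem 1.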

Having proved this theorem, we can use the evaluation methodology of ordinary regression to solve the quantity-selection problem in mixture regression.

For a regression problem, the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) [

The best model is the one with the minimum AIC (BIC). L is the likelihood function, which measures the goodness of fit (expression (3)); k is the penalty term of the information criterion, standing for the number of unknown parameters in the model. In linear regression, k is the number of independent variables. For BIC the penalty is larger: the weight of the penalty becomes
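
For a Gaussian linear regression, both criteria can be computed directly from the residual sum of squares. A small sketch (the `+1` in `k` counts the error variance, a common but not universal convention):

```python
import numpy as np

def aic_bic_ols(X, y):
    """AIC and BIC for Gaussian linear regression:
    -2 ln L + 2 k  and  -2 ln L + ln(n) k,
    with k = number of coefficients + 1 for the error variance."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta) ** 2)
    sigma2 = rss / n                      # ML estimate of the variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = p + 1
    return -2 * loglik + 2 * k, -2 * loglik + np.log(n) * k
```

Since the two criteria share the likelihood term, their difference is exactly `(ln(n) - 2) * k`, which is why BIC penalizes extra parameters more heavily whenever n > e².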

In mixture regression, the parameters in the classification matrix should be counted as part of the estimated parameters. Without these parameters in the penalty, the criterion will tend to choose a larger quantity of models, which is again an overfitting problem.

For every observation,

The Akaike information criterion for mixture regression (AICM) and the Bayesian information criterion for mixture regression (BICM) are:

AICM and BICM can be used for quantity selection in the mixture regression problem. However, the penalty weight for g in BICM is
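
Since the display equations for AICM and BICM did not survive extraction, the sketch below reconstructs the penalty from the simulation tables, where the criteria behave as if the classification matrix contributes n(g−1) extra parameters (e.g., in Simulation I with n = 100, AICM at g = 2 is 490.10 = 290.10 + 2·100·1). Treat this parameter count as an inference, not a quotation:

```python
import numpy as np

def aicm_bicm(loglik, n, k, g):
    """AICM/BICM sketch. On top of the k ordinary regression parameters,
    the classification matrix adds n*(g-1) parameters (one row of g
    membership probabilities per observation, constrained to sum to 1).
    NOTE: this penalty count is inferred from the paper's simulation
    tables, since the original display equations are missing."""
    k_total = k + n * (g - 1)
    aicm = -2 * loglik + 2 * k_total
    bicm = -2 * loglik + np.log(n) * k_total
    return aicm, bicm
```

Under this count the BICM penalty grows like n·ln(n) per extra model, which matches the rapid growth of the BICM columns in the tables and explains its tendency toward small g.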

To validate the rationality of the model, we designed numerical simulations and generated sample data

• Simulation I: 100 samples from 2 distributions. (

• Simulation II: 200 samples from 2 distributions. (

• Simulation III: 150 samples from 3 distributions. (

The models in Simulation I are:

where

Table: Simulation I. Mean value of each information criterion at candidate quantities g = 1–4 ((**) marks the true quantity), and how often each quantity was selected out of 100 runs.

| Rules | Sample Size | Mean IC, g=1 | g=2 (**) | g=3 | g=4 | Sel. g=1 | g=2 | g=3 | g=4 |
|---|---|---|---|---|---|---|---|---|---|
| AIC | 50×2 | 883.99 | 290.10 | 235.36 | 193.81 | 0 | 0 | 0 | 100 |
| BIC | 50×2 | 891.80 | 295.31 | 243.17 | 204.23 | 0 | 0 | 0 | 100 |
| AICM | 50×2 | 883.99 | 490.10 | 635.36 | 793.81 | 0 | 98 | 2 | 0 |
| BICM | 50×2 | 891.80 | 755.82 | 1164.21 | 1585.78 | 2 | 98 | 0 | 0 |
| Std. | – | 14.68 | 67.45 | 18.04 | 22.75 | – | – | – | – |

The models in Simulation II are the same as in Simulation I, but with 100 samples from each distribution.

Simulation III has three distributions, with 50 samples from each distribution.

See

Table: Simulation II. Mean value of each information criterion at candidate quantities g = 1–4 ((**) marks the true quantity), and how often each quantity was selected out of 100 runs.

| Rules | Sample Size | Mean IC, g=1 | g=2 (**) | g=3 | g=4 | Sel. g=1 | g=2 | g=3 | g=4 |
|---|---|---|---|---|---|---|---|---|---|
| AIC | 100×2 | 1766.66 | 565.28 | 484.74 | 413.17 | 0 | 0 | 2 | 98 |
| BIC | 100×2 | 1776.55 | 571.87 | 494.64 | 426.36 | 0 | 0 | 2 | 98 |
| AICM | 100×2 | 1766.66 | 965.28 | 1284.74 | 1613.17 | 0 | 100 | 0 | 0 |
| BICM | 100×2 | 1776.55 | 1631.54 | 2613.96 | 3605.35 | 0 | 100 | 0 | 0 |
| Std. | – | 20.51 | 18.10 | 30.88 | 34.94 | – | – | – | – |

Table: Simulation III. Mean value of each information criterion at candidate quantities g = 1–5 ((**) marks the true quantity), and how often each quantity was selected out of 100 runs.

| Rules | Sample Size | Mean IC, g=1 | g=2 | g=3 (**) | g=4 | g=5 | Sel. g=1 | g=2 | g=3 | g=4 | g=5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AIC | 50×3 | 1264.38 | 908.35 | 429.91 | 368.35 | 326.90 | 0 | 0 | 0 | 1 | 99 |
| BIC | 50×3 | 1273.41 | 914.38 | 438.95 | 380.40 | 341.95 | 0 | 0 | 0 | 3 | 97 |
| AICM | 50×3 | 1264.38 | 1208.35 | 1029.91 | 1268.35 | 1526.90 | 1 | 2 | 97 | 0 | 0 |
| BICM | 50×3 | 1273.41 | 1665.97 | 1942.14 | 2635.18 | 3348.33 | 100 | 0 | 0 | 0 | 0 |
| Std. | – | 22.09 | 64.34 | 97.55 | 23.38 | 24.18 | – | – | – | – | – |

According to the results of the three simulations, AICM and BICM show good results for small g (

Dawei Lang, Wanzhou Ye (2016). Selecting the Quantity of Models in Mixture Regression. Advances in Pure Mathematics, 6, 555-563. doi: 10.4236/apm.2016.68044

Proof of Theorem 1

Proof. Linear regression has the form of:

To prove this theorem, mixture regression needs to be written in the form above. Moreover, when every random error has the same variance, the random error in mixture regression also follows a normal distribution.

In the mixture regression problem, the ith observation

We have:

Because the ith observation can be written as a product of vectors, the population of observations can be written as

For the observation

In the distribution of variable

so
