The EM algorithm is a popular maximum likelihood estimation method: an iterative algorithm for computing the maximum likelihood estimator when the observed data are incomplete, and also an effective algorithm for estimating the parameters of finite mixture models. However, the EM algorithm cannot guarantee finding the global optimum and often falls into a local optimum, so it is sensitive to the choice of the initial value of the iteration. The traditional EM algorithm selects the initial value at random; we propose an improved method for selecting the initial value. First, we use the k-nearest-neighbor method to delete outliers. Second, we use k-means to initialize the EM algorithm. Comparing this method with the original random initialization, numerical experiments show that the parameter estimates from the initialized EM algorithm are significantly better than those from the original EM algorithm.

Assessment of the performance of an algorithm generally concerns its efficiency, ease of use, and results. We are concerned with the efficiency of the iteration, and one of the factors affecting it is the selection of the initial value. The EM algorithm has a very important application in the Gaussian mixture model (GMM). In simple terms, if we know neither the parameters of the mixture model nor the classification of the observed data, the EM algorithm is a popular method for estimating the parameters of a finite mixture model. However, its performance is sometimes poor. The algorithm has an obvious shortcoming: it is very sensitive to the initial value. Therefore, to obtain parameter estimates closest to the true values, we need a method to initialize the EM algorithm. Several common initialization methods exist: random centers, hierarchical clustering, the k-means algorithm, and so on [

The mixture model is a useful tool for density estimation and can be viewed as a kind of kernel method [

$$p(x \mid \theta) = \sum_{i=1}^{k} \alpha_i p_i(x \mid \theta_i) = \sum_{i=1}^{k} \alpha_i (2\pi)^{-\frac{d}{2}} |\Sigma_i|^{-\frac{1}{2}} \exp\left( -\frac{1}{2} (x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) \right) \quad (1)$$

Also there is:

$$p_i(x \mid \theta_i) = (2\pi)^{-\frac{d}{2}} |\Sigma_i|^{-\frac{1}{2}} \exp\left( -\frac{1}{2} (x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) \right), \quad i = 1, 2, \cdots, k \quad (2)$$

This is the probability density function of the i-th component, which has mean $\mu_i$, covariance $\Sigma_i$, and mixing proportion $\alpha_i$, with $\sum_{i=1}^{k} \alpha_i = 1$ and $\theta_i = (\mu_i^T, \Sigma_i)^T$; $\theta = (\theta_1^T, \theta_2^T, \cdots, \theta_k^T, \alpha_1, \alpha_2, \cdots, \alpha_k)^T$ is the vector of all unknown parameters.
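As a concrete illustration, the mixture density of Equation (1) can be evaluated directly. The following is a minimal sketch in Python/NumPy; the function name and array layout are our own choices, not the paper's code:

```python
import numpy as np

def gmm_density(x, alphas, mus, sigmas):
    """Evaluate p(x|theta) = sum_i alpha_i * N(x; mu_i, Sigma_i) as in Eq. (1).

    x: point of shape (d,); alphas: k mixing proportions summing to 1;
    mus: (k, d) component means; sigmas: (k, d, d) covariance matrices.
    """
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    total = 0.0
    for alpha, mu, sigma in zip(alphas, mus, sigmas):
        diff = x - mu
        # normalizing constant (2*pi)^(-d/2) |Sigma_i|^(-1/2)
        norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(sigma) ** (-0.5)
        # quadratic form (x - mu_i)^T Sigma_i^{-1} (x - mu_i)
        quad = diff @ np.linalg.solve(sigma, diff)
        total += alpha * norm * np.exp(-0.5 * quad)
    return total
```

For example, with a single standard normal component the value at the origin is $(2\pi)^{-1/2} \approx 0.3989$.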

The classical and natural method for computing the maximum-likelihood estimates (MLEs) for mixture distributions is the EM algorithm (Dempster et al., 1977), which is known to possess good convergence properties. In other words, the EM algorithm is an iterative algorithm for solving the maximum likelihood estimate when the observed data are incomplete, and it has good practical value as well. The process of parameter estimation by the EM algorithm is given in reference [

Expectation Step: calculate the mathematical expectation of $p(\theta \mid Y, Z)$ or $\log p(\theta \mid Y, Z)$ with respect to the conditional distribution of $Z$, namely:

$$Q(\theta \mid \theta^{(t)}, Y) = E_Z\left[ \log p(\theta \mid Y, Z) \mid \theta^{(t)}, Y \right] = \int_Z \log\left[ p(\theta \mid Y, Z) \right] p(Z \mid \theta^{(t)}, Y) \, dZ \quad (3)$$

Maximization Step: maximize $Q(\theta \mid \theta^{(t)}, Y)$, that is, find a point $\theta^{(t+1)}$ such that:

$$Q(\theta^{(t+1)} \mid \theta^{(t)}, Y) = \max_{\theta} Q(\theta \mid \theta^{(t)}, Y) \quad (4)$$

After obtaining $\theta^{(t+1)}$, this completes one iteration $\theta^{(t)} \to \theta^{(t+1)}$. Iterate the Expectation step and the Maximization step until $\|\theta^{(t+1)} - \theta^{(t)}\|$ or $\|Q(\theta^{(t+1)} \mid \theta^{(t)}, Y) - Q(\theta^{(t)} \mid \theta^{(t)}, Y)\|$ is sufficiently small.
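The two steps above can be sketched for a one-dimensional Gaussian mixture. This is an illustrative implementation, not the paper's code; the function name is ours, and the stopping rule follows the parameter-change criterion described above:

```python
import numpy as np

def em_gmm_1d(x, mu, sigma, alpha, tol=1e-6, max_iter=500):
    """EM for a 1-D Gaussian mixture; mu, sigma, alpha are initial k-vectors.

    E-step: responsibilities gamma[j, i] proportional to alpha_i N(x_j; mu_i, sigma_i).
    M-step: weighted MLE updates of alpha, mu, sigma.
    Stops when the total parameter change falls below tol.
    """
    x = np.asarray(x, dtype=float)
    mu, sigma, alpha = (np.asarray(a, dtype=float).copy() for a in (mu, sigma, alpha))
    for _ in range(max_iter):
        # E-step: posterior probability of each component for each point
        dens = alpha * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
               / (sigma * np.sqrt(2 * np.pi))
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood updates
        nk = gamma.sum(axis=0)
        new_mu = (gamma * x[:, None]).sum(axis=0) / nk
        new_sigma = np.sqrt((gamma * (x[:, None] - new_mu) ** 2).sum(axis=0) / nk)
        new_alpha = nk / len(x)
        done = np.abs(new_mu - mu).sum() + np.abs(new_sigma - sigma).sum() < tol
        mu, sigma, alpha = new_mu, new_sigma, new_alpha
        if done:
            break
    return mu, sigma, alpha
```

With well-separated components and a reasonable starting point, the iteration converges quickly; with a poor starting point it may settle in a local optimum, which is exactly the sensitivity this paper addresses.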

Outliers are data objects that are inconsistent with the behavior or model of most of the data in the data set [

One of the easiest ways to measure whether an object is far from most points is the distance to its k-th nearest neighbor. The outlier score of an object is given by the distance between the object and its k-th nearest neighbor. The minimum outlier score is 0, and the maximum is the maximum value of the distance function, which is usually infinite.

The outlier score is highly sensitive to the value of k. If k is too small, a small number of nearby outliers can lead to a low outlier score. If k is too large, all objects in clusters with fewer than k points are likely to be flagged as outliers. To make the scheme more robust to the selection of k, we can use the average distance to the first k nearest neighbors.
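The averaged score just described might be computed as follows (a minimal sketch; the function name and brute-force pairwise distances are our own choices):

```python
import numpy as np

def knn_outlier_scores(X, k):
    """Outlier score of each point = average distance to its k nearest neighbors.

    Averaging over the first k neighbors (rather than taking only the k-th
    distance) makes the score more robust to the choice of k.
    """
    X = np.asarray(X, dtype=float)
    # brute-force pairwise Euclidean distances, shape (n, n)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)      # a point is not its own neighbor
    d.sort(axis=1)                   # ascending distances per row
    return d[:, :k].mean(axis=1)     # mean of the k smallest distances
```

Points far from every cluster receive scores much larger than points inside a cluster, which is the property the outlier-removal step relies on.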

The k-means algorithm is one of the most popular iterative descent clustering methods [

Step 1: Randomly choose $m$ samples $\bar{x}_1^{(k)}, \bar{x}_2^{(k)}, \cdots, \bar{x}_m^{(k)}$ as the cluster centers of the mixed data.

Step 2: Denote the remaining $n - m$ data points:

$$x_1, x_2, \cdots, x_j, \cdots, x_{n-m}, \qquad d_{ji}^{(k)} = \left| x_j - \bar{x}_i^{(k)} \right| \quad (5)$$

This denotes the distance from $x_j$ to $\bar{x}_i^{(k)}$, $j = 1, 2, \cdots, n-m$, $i = 1, 2, \cdots, m$. For fixed $x_j$, if $|x_j - \bar{x}_i^{(k)}| = \min_i d_{ji}^{(k)}$, then $x_j$ is classified into category $i$. Thus we divide the data into $m$ classes: $C_1^{(k)}, C_2^{(k)}, \cdots, C_m^{(k)}$.

Step 3: Use the formula:

$$\bar{x}_i^{(k+1)} = \frac{1}{n_i^{(k)}} \sum_{j=1}^{n_i^{(k)}} x_{ji}, \quad x_{ji} \in C_i^{(k)} \quad (6)$$

$$n_1^{(k)} + n_2^{(k)} + \cdots + n_m^{(k)} = n, \quad i = 1, 2, \cdots, m$$

That is, take the sample mean of the data of each class from Step 2 as the new cluster center.

Step 4: For a given number $\varepsilon > 0$, if $\sum_{i=1}^{m} |\bar{x}_i^{(k+1)} - \bar{x}_i^{(k)}| < \varepsilon$, stop the iteration and output the values.

Otherwise, substitute the new cluster centers from Step 3 into Step 1 in place of the old cluster centers, and repeat Steps 2 and 3.
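Steps 1 through 4 can be sketched compactly (an illustrative Lloyd-style implementation; the function signature is ours, and empty clusters are not handled in this sketch):

```python
import numpy as np

def kmeans(X, m, centers=None, eps=1e-6, max_iter=100, seed=0):
    """Steps 1-4 above. `centers` may be supplied explicitly; otherwise
    Step 1 draws m random samples from the data as initial centers."""
    X = np.asarray(X, dtype=float)
    if centers is None:  # Step 1: random initial centers drawn from the data
        centers = X[np.random.default_rng(seed).choice(len(X), m, replace=False)]
    centers = np.asarray(centers, dtype=float)
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center
        labels = np.linalg.norm(X[:, None] - centers[None], axis=-1).argmin(axis=1)
        # Step 3: new center = sample mean of the points in each class
        new_centers = np.array([X[labels == i].mean(axis=0) for i in range(m)])
        # Step 4: stop once the total center movement falls below eps
        if np.abs(new_centers - centers).sum() < eps:
            return new_centers, labels
        centers = new_centers
    return centers, labels
```

On two well-separated clusters the iteration converges in a handful of passes; like EM itself, the result depends on the initial centers chosen in Step 1.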

First, we compute the distance between each observation and its k-th nearest neighbor (called the k-NN distance) and delete those points with relatively large k-NN distances, namely abnormal observations far from most points. Second, we obtain a rough grouping of the mixed data by applying the k-means clustering method to the remaining data. Then, from the grouped data, rough parameter estimates are computed and used as the initial values of the iterative algorithm. Finally, the EM algorithm is executed until convergence to estimate the parameters of the Gaussian mixture model.
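The first stage of this procedure might look as follows. Note that the quantile cutoff is our own assumption for illustration; the text only requires deleting points with relatively large k-NN distances:

```python
import numpy as np

def remove_knn_outliers(X, k, quantile=0.95):
    """Delete points whose average distance to their k nearest neighbors
    exceeds the given score quantile (assumed cutoff rule)."""
    X = np.asarray(X, dtype=float)
    # pairwise Euclidean distances
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)       # exclude self-distance
    d.sort(axis=1)
    scores = d[:, :k].mean(axis=1)    # average k-NN distance per point
    return X[scores <= np.quantile(scores, quantile)]
```

The surviving points are then passed to k-means, and the resulting grouping supplies the EM starting values.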

In this section, we compare the parameter estimation performance of the improved initial value method and the original method on simulated data.

First, we generate a one-dimensional data set; its histogram is shown in

The original random initial value method of the EM algorithm is very unstable: sometimes it needs nearly 100 iterations and usually cannot get very close to the true values of the parameters. However, the number of iterations of the EM algorithm is greatly reduced by the improved initial value method, which is quite stable, and the final parameter estimates are closer to the true values compared with the original method. Here, we select one test result to draw a line graph (see

The average number of iterations of the original EM algorithm is approximately 33, and the final result is also highly unstable. We then compare the parameter estimation results of the original method and the improved method on one realization. We can draw some conclusions from

$$\hat{\mu}_1 = 2.829, \ \hat{\mu}_2 = -2.233, \quad \hat{\sigma}_1 = 1.112, \ \hat{\sigma}_2 = 1.795, \quad \hat{\alpha}_1 = 0.443, \ \hat{\alpha}_2 = 0.556 \quad (7)$$

In the improved initial value method, outliers are removed using the distances between each sample and its 10th nearest neighbor. Then the k-means method is used to classify the remaining points. The center of each class is used as the initial value of the mean, the sample variance of each class as the initial value of the variance, and the proportion of each class as the initial value of the mixing coefficient. After 7 iterations, the change in the parameters is less than $10^{-2}$. The results of the parameter estimation are as follows:
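The initialization described here — class means, class sample variances, and class proportions — can be computed from the k-means grouping in a few lines (an illustrative sketch for the one-dimensional case; the function name is ours):

```python
import numpy as np

def init_from_clusters(x, labels):
    """Turn a rough clustering of 1-D data into EM starting values:
    class mean -> mu, class sample std -> sigma, class proportion -> alpha."""
    x = np.asarray(x, dtype=float)
    ks = np.unique(labels)
    mu = np.array([x[labels == k].mean() for k in ks])
    sigma = np.array([x[labels == k].std() for k in ks])
    alpha = np.array([(labels == k).mean() for k in ks])
    return mu, sigma, alpha
```

These three vectors are then fed directly into the EM iteration in place of random starting values.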

$$\hat{\mu}_1 = 2.949, \ \hat{\mu}_2 = -2.034, \quad \hat{\sigma}_1 = 1.011, \ \hat{\sigma}_2 = 1.992, \quad \hat{\alpha}_1 = 0.404, \ \hat{\alpha}_2 = 0.596 \quad (8)$$

Obviously, the improved method outperforms the original method [

better parameter estimation results. What is more, its statistical meaning is easy to understand. Moreover, the improved initial value method can be used not only for the one-dimensional Gaussian mixture model but also for the multidimensional Gaussian mixture model.

However, how to choose k when using the k-NN distance to remove outliers requires further study. At the same time, if we can further optimize the process of initial value selection (reducing the complexity of deleting outliers and of k-means clustering), the method will bring greater practical value.

Li, Y. and Chen, Y.Y. (2018) Research on Initialization on EM Algorithm Based on Gaussian Mixture Model. Journal of Applied Mathematics and Physics, 6, 11-17. https://doi.org/10.4236/jamp.2018.61002