Modeling Methods in Clustering Analysis for Time Series Data

This paper studies model-based methods in cluster analysis that classify data elements into clusters, and then treats time series in light of this classification to choose the appropriate mixture model. The mixture-model cluster analysis technique is presented under different covariance structures of the component densities. This model captures the compactness, orientation, shape, and volume of the component clusters in one expert system to handle Gaussian, high-dimensional, heterogeneous data sets, and thereby achieves flexibility beyond currently practiced cluster analysis techniques. The Expectation-Maximization (EM) algorithm is used to estimate the parameters of the covariance matrix. To judge the goodness of the models, several criteria are applied to the covariance matrices produced by the simulation; these models have not been tackled in previous studies. The results showed the superiority of the ICOMP(PEU) criterion over the other criteria, in addition to the success of the Gaussian cluster-based model in prediction using the covariance matrices considered in this study. The study also found that the optimal number of clusters can be determined by choosing the number of clusters corresponding to the lowest values of the different criteria used in the study.


Introduction
Clustering analysis is one of the statistical methods that deal with the division and classification of data elements into several groups that are homogeneous within one group (cluster) and different from other groups (other clusters). Cluster analysis is defined as "a set of methods for constructing a (hopefully) sensible and informative classification of an initially unclassified set of data, using the variable values observed on each individual. All such methods essentially try to imitate what the eye-brain system does so well in two dimensions" (Everitt and Skrondal [1]). Because of this characteristic, cluster analysis has been used in many applied fields. It is used to divide and classify data into aggregates, which helps to properly select the appropriate statistical analysis of these data as a decision-making tool. The objective of this statistical method is to divide the data matrix, containing n samples and p variables, into a number k of homogeneous subgroups by assembling homogeneous and convergent sample items into clusters. Thereafter, criteria and measures must be used to distinguish between the different clustering results in order to address two main points: the similarity of the data elements within the different clusters and the optimal number of clusters. This is done through the use of criterion functions, known as cluster validity indices. In this paper, one of the most important cluster-based mixture models, Gaussian mixture model-based clustering, is used. Cluster-based mixture models are able to predict accurately if the appropriate covariance model is chosen; here it is applied through the use of four heterogeneity models. The covariance matrix of the Gaussian mixture model is unknown, so to estimate its parameters we need to maximize the log-likelihood function of the mixture model.
Direct maximization of the log-likelihood function is complicated, so the maximum likelihood estimator (MLE) of a finite mixture model is usually obtained via the EM algorithm (Dempster et al. [2]). Banfield and Raftery [3] proposed a model-based clustering method based on constraining the geometric features of the components using the eigenvalue decomposition of the covariance matrix.
Different constraints on the covariance matrix provide different models that are applicable to different data structures, which is another advantage of model-based clustering. In 1995, Celeux and Govaert [4] classified these models into three main families: spherical, diagonal, and general. They gave the definitions and derivations of all 14 available models, along with the covariance matrix update equations based on these models to be used in the EM algorithm. However, only nine of those have a closed-form solution to the covariance update equation, which is evaluated in the M-step of the EM algorithm.
Later, in 2016, Chi et al. [5] showed that the population likelihood function has bad local maxima even in the special case of equally weighted mixtures of well-separated, spherical Gaussians. They proved that the log-likelihood value of these bad local maxima can be arbitrarily worse than that of any global optimum, and showed that the EM algorithm with random initialization converges to bad critical points with high probability. They further established that the first-order variant of EM will not converge to strict saddle points almost surely. Cluster analysis is used in various fields of science. Tóth et al. [6] described gamma-ray bursts (GRBs) using clustering. They analyzed the Final BATSE Catalog using Gaussian mixture-model-based clustering methods for six variables (durations, peak flux, total fluence, and spectral hardness ratios) that contain information on clustering.
In 2000, Bozdogan [7] studied the basic idea of Akaike's [8] information criterion (AIC). Then, he presented some recent developments on a new entropic or information complexity (ICOMP) criterion of Bozdogan [9] for model selection.
The main contribution of the present paper is to propose the mixture-model cluster analysis technique under different covariance structures of the component densities, and to determine the optimal number of clusters by selecting the number of clusters corresponding to the lowest values of the different criteria.
Four models for covariance structures that have not been applied in previous studies are examined using three information-complexity criteria.
This paper is organized as follows: Section 1 is the introduction. In Section 2, Gaussian Mixture Model-based Clustering (GMMC) is discussed. In Section 3, the Expectation-Maximization (EM) algorithm is introduced. The model selection criteria are introduced in Section 4. Finally, Sections 5 and 6 contain the numerical results and the conclusion, respectively (Table 1).

The Gaussian Mixture Model-Based Clustering (GMMC)
The Gaussian mixture model is a powerful clustering algorithm used in cluster analysis; it is the most widely used clustering method of this kind, based on learning a mixture of Gaussians. It assumes that there is a certain number of Gaussian distributions, each of which represents a cluster. Hence, a Gaussian mixture model tends to group together the data points belonging to a single distribution. Gaussian mixture models are probabilistic models and use the soft clustering approach for distributing the points among clusters. Because it is difficult to determine the right model parameters directly, the Expectation-Maximization method is used to estimate them.
Suppose $X \in \mathbb{R}^{n \times p}$ is given ($p$-dimensional data of size $n$), and we are interested in estimating the number of clusters $K$. The observations $x_i$ ($i = 1, \ldots, n$) are assumed to be drawn from the following mixture of $K$ distributions, each corresponding to a different cluster:
$$f(x_i) = \sum_{k=1}^{K} \pi_k \, g_k(x_i \mid \mu_k, \Sigma_k),$$
where the $\pi_k$ are the mixing proportions with $\sum_{k=1}^{K} \pi_k = 1$. The mixture components (i.e., clusters) are ellipsoids centered at $\mu_k$, with other geometric features, such as volume, shape, and orientation, determined by the covariance matrix $\Sigma_k$ (Titterington et al. [10]).
In this case, the component densities $g_k$ are given by:
$$g_k(x \mid \mu_k, \Sigma_k) = (2\pi)^{-p/2} \lvert \Sigma_k \rvert^{-1/2} \exp\!\left\{-\tfrac{1}{2}(x - \mu_k)^{T} \Sigma_k^{-1} (x - \mu_k)\right\}.$$
Parsimonious parameterizations of the covariance matrices can be obtained by using the eigenvalue decomposition of the covariance matrix. The eigenvalue decomposition of the $k$th covariance matrix is given as:
$$\Sigma_k = \lambda_k D_k A_k D_k^{T},$$
where $\lambda_k$ is a scalar controlling the volume of the ellipsoid, $A_k$ is a diagonal matrix specifying the shape of the density contours (with $\lvert A_k \rvert = 1$), and $D_k$ is an orthogonal matrix that determines the orientation of the corresponding ellipsoid (Banfield and Raftery [3] and Celeux and Govaert [4]).
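To make the role of each factor concrete, the decomposition can be sketched numerically. The function below is a minimal illustration for a two-dimensional component (the names are ours, not from the paper's code): the eigenvalues of the resulting matrix recover $\lambda \cdot \mathrm{diag}(A)$, and the determinant equals $\lambda^{p}$ because $\lvert A \rvert = 1$.

```python
import numpy as np

# Illustrative sketch: build Sigma_k = lambda_k * D_k * A_k * D_k^T.
def make_covariance(volume, shape_diag, angle):
    """volume: scalar lambda; shape_diag: diagonal of A (det(A) = 1);
    angle: rotation angle defining the orthogonal matrix D."""
    A = np.diag(shape_diag)
    D = np.array([[np.cos(angle), -np.sin(angle)],
                  [np.sin(angle),  np.cos(angle)]])
    return volume * D @ A @ D.T

Sigma = make_covariance(volume=2.0, shape_diag=[4.0, 0.25], angle=np.pi / 6)

# Eigenvalues are lambda * diag(A) = {8.0, 0.5}; the determinant is
# lambda**p * det(A) * det(D)**2 = 2**2 = 4.
eigvals = np.linalg.eigvalsh(Sigma)
print(np.round(eigvals, 6))
print(round(np.linalg.det(Sigma), 6))
```

Changing only `volume`, `shape_diag`, or `angle` across components reproduces the constrained families discussed above (equal vs. variable volume, shape, and orientation).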
In one dimension, there are just two models: E for equal variance and V for varying variance. In the multivariate setting, the volume, shape, and orientation of the covariance can be constrained to be equal or variable across groups. Thus, 14 possible models with different geometric characteristics can be specified. Table 2 reports all such models with the corresponding distribution structure type, volume, shape, orientation, and associated model names. See (Erar [11], Gupta and Bhatia [12], Chi et al., [5], Scrucca et al., [13], Malsiner-Walli et al., [14] and Tóth, et al., [6]).
Approaching the clustering problem from this probabilistic standpoint reduces the whole problem to the parameter estimation of a mixture density. The unknown parameters of the Gaussian mixture density are the mixing proportions $\pi_k$, the mean vectors $\mu_k$, and the covariance matrices $\Sigma_k$. Therefore, to estimate these parameters, we need to maximize the log-likelihood given by:
$$\log L(\theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k \, g_k(x_i \mid \mu_k, \Sigma_k).$$
The estimates of the mixing proportion $\pi_k$, the mean vector $\mu_k$, and the covariance matrix $\Sigma_k$ for the $k$th population are given as:
$$\hat{\pi}_k = \frac{1}{n}\sum_{i=1}^{n} \hat{\tau}_{ik}, \qquad \hat{\mu}_k = \frac{\sum_{i=1}^{n} \hat{\tau}_{ik}\, x_i}{\sum_{i=1}^{n} \hat{\tau}_{ik}}, \qquad \hat{\Sigma}_k = \frac{\sum_{i=1}^{n} \hat{\tau}_{ik}\,(x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^{T}}{\sum_{i=1}^{n} \hat{\tau}_{ik}},$$
where $\hat{\tau}_{ik}$ is the posterior probability that observation $i$ belongs to component $k$. This estimation requires the non-linear optimization of the mixture likelihood for high-dimensional data sets. However, there is no closed-form solution to this maximization for any mixture density, so the likelihood has to be numerically maximized. For this numerical optimization, the Expectation-Maximization (EM) algorithm of Dempster et al. [2] is used, which treats the data as incomplete and the group labels $y_i$ as missing.
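As a concrete illustration, the mixture log-likelihood can be evaluated numerically. The NumPy sketch below is our own (the paper's results were produced in MATLAB) and evaluates $\log L$ for given mixing proportions, means, and covariances on simulated standard-normal data:

```python
import numpy as np

# Illustrative sketch: evaluate log L = sum_i log sum_k pi_k g_k(x_i).
def component_density(X, mu, Sigma):
    """Multivariate Gaussian density g_k evaluated at each row of X."""
    p = X.shape[1]
    diff = X - mu
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** p * np.linalg.det(Sigma))

def log_likelihood(X, pis, mus, Sigmas):
    dens = sum(pi * component_density(X, mu, S)
               for pi, mu, S in zip(pis, mus, Sigmas))
    return np.sum(np.log(dens))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))          # toy data from N(0, I)
ll = log_likelihood(X, [0.5, 0.5],
                    [np.zeros(2), np.ones(2)],
                    [np.eye(2), np.eye(2)])
```

Because this surface generally has no closed-form maximizer, the EM algorithm introduced next is used to climb it iteratively.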

The Expectation-Maximization (EM) Algorithm
The expectation-maximization (EM) algorithm is an iterative procedure used to find maximum likelihood estimates when data are incomplete or are treated as such. The standard citation for the EM algorithm is the famous paper by Dempster et al. [2]. In the EM algorithm, the E and M steps are iterated until convergence is reached. The EM algorithm is based on the "complete data"; i.e., the observations together with their missing group labels ([15]).
For initialization, hierarchical clusters can be obtained by recursively merging the two clusters that yield the smallest decrease in the classification likelihood for the Gaussian mixture model (Banfield and Raftery [3], Xu et al. [16]).
The EM algorithm is an iterative procedure consisting of two alternating steps, given some starting values for all parameters ($\hat{\pi}_k$, $\hat{\mu}_k$, and $\hat{\Sigma}_k$). The algorithm can be summarized as follows at iteration $(t+1)$:

1) In the E-step, the posterior probability $\hat{\tau}_{ik}$ of the $i$th observation belonging to the $k$th component is estimated, given the current parameter estimates:
$$\hat{\tau}_{ik}^{(t+1)} = \frac{\hat{\pi}_k^{(t)} \, g_k(x_i \mid \hat{\mu}_k^{(t)}, \hat{\Sigma}_k^{(t)})}{\sum_{k'=1}^{K} \hat{\pi}_{k'}^{(t)} \, g_{k'}(x_i \mid \hat{\mu}_{k'}^{(t)}, \hat{\Sigma}_{k'}^{(t)})}.$$

2) In the M-step, the parameter estimates of $\pi_k$, $\mu_k$, and $\Sigma_k$ are updated given the estimated posterior probabilities, using the update equations. The EM algorithm requires two issues to be addressed: determining the number of components $K$, and initialization of the parameters.
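The two alternating steps can be sketched as follows. This is a minimal NumPy illustration of one EM iteration for an unconstrained (VVV-type) covariance model on well-separated toy data; the function names and data are ours, not the paper's MATLAB implementation:

```python
import numpy as np

def gaussian_pdf(X, mu, Sigma):
    p = X.shape[1]
    diff = X - mu
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** p * np.linalg.det(Sigma))

def em_step(X, pis, mus, Sigmas):
    # E-step: posterior probabilities tau[i, k] of membership.
    dens = np.column_stack([pi * gaussian_pdf(X, mu, S)
                            for pi, mu, S in zip(pis, mus, Sigmas)])
    tau = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update pi_k, mu_k, Sigma_k from the responsibilities.
    nk = tau.sum(axis=0)
    new_pis = nk / len(X)
    new_mus = [tau[:, k] @ X / nk[k] for k in range(len(pis))]
    new_Sigmas = []
    for k in range(len(pis)):
        diff = X - new_mus[k]
        new_Sigmas.append((tau[:, k, None] * diff).T @ diff / nk[k])
    return new_pis, new_mus, new_Sigmas, tau

# Toy data: two well-separated Gaussian clusters in two dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               rng.normal(6, 1, size=(100, 2))])
pis = np.array([0.5, 0.5])
mus = [np.array([-1.0, -1.0]), np.array([7.0, 7.0])]
Sigmas = [np.eye(2), np.eye(2)]
for _ in range(20):
    pis, mus, Sigmas, tau = em_step(X, pis, mus, Sigmas)
```

After a few iterations the estimated means approach the true cluster centers at (0, 0) and (6, 6), and the mixing proportions approach 0.5 each.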

The Model Selection Criteria
After estimating the parameters of the covariance matrix, the next step in determining the optimal cluster structure is selecting the best model. Despite the vast number of model selection criteria in the literature, Schwarz's Bayesian Criterion (SBC) (Schwarz [17]) is no doubt the most widely used in model-based clustering. Besides this criterion, two other selection criteria are used, namely AIC (Akaike [8]) and the information complexity (ICOMP) criterion (Bozdogan [18]). When using any information criterion to perform model selection, the model corresponding to the lowest score is chosen as providing the best balance between goodness of fit and parsimony. Using the likelihood function, these criteria are given by:
$$\mathrm{AIC} = -2 \log L(\hat{\theta}) + 2m, \qquad \mathrm{SBC} = -2 \log L(\hat{\theta}) + m \log n,$$
where $m$ is the number of independent parameters to be estimated and $\hat{\theta}$ is the maximum likelihood estimate of the parameter vector $\theta$. The ICOMP criterion takes the form
$$\mathrm{ICOMP} = -2 \log L(\hat{\theta}) + 2\, C\big(\widehat{\mathrm{Cov}}(\hat{\theta})\big),$$
where $C$ is a real-valued complexity measure and $\widehat{\mathrm{Cov}}(\hat{\theta})$ is the estimated model covariance matrix. The covariance matrix is estimated by the estimated inverse Fisher information matrix (IFIM); that is, IFIM is the inverse of the negative expectation of the matrix of second partial derivatives of the maximized log-likelihood of the fitted model, evaluated at the maximum likelihood estimators $\hat{\theta}$. For a multivariate normal model, the general form of ICOMP uses the maximal entropic complexity
$$C_1(\hat{\Sigma}) = \frac{s}{2} \log\!\left(\frac{\mathrm{tr}(\hat{\Sigma})}{s}\right) - \frac{1}{2} \log \lvert \hat{\Sigma} \rvert, \qquad s = \mathrm{rank}(\hat{\Sigma}),$$
applied to the estimated IFIM. For all the above criteria, the decision rule is to select the model that gives the minimum score for the loss function.
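As an illustration, the scores above can be computed from a fitted model's maximized log-likelihood. The helper names below are ours; `loglik` and `m` are placeholders for a fitted model's log-likelihood and its number of free parameters, and `c1_complexity` implements the maximal entropic complexity $C_1$:

```python
import numpy as np

# Hedged sketch of the selection scores discussed above.
def aic(loglik, m):
    return -2 * loglik + 2 * m

def sbc(loglik, m, n):                     # Schwarz's Bayesian Criterion
    return -2 * loglik + m * np.log(n)

def c1_complexity(cov):
    """Maximal entropic complexity C1 of a covariance matrix:
    (s/2) log(tr(cov)/s) - (1/2) log|cov|, with s = rank(cov)."""
    s = np.linalg.matrix_rank(cov)
    return s / 2 * np.log(np.trace(cov) / s) - 0.5 * np.linalg.slogdet(cov)[1]

# For the identity matrix all eigenvalues are equal, so C1 is zero;
# unequal eigenvalues give a strictly positive complexity.
print(c1_complexity(np.eye(3)))
print(c1_complexity(np.diag([1.0, 10.0])))
```

In each case the model with the smallest score is selected, mirroring the decision rule stated above.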

The Numerical Results
All results were obtained using MATLAB. Gaussian mixture-model-based clustering, implementing the EM algorithm for inference, is applied to four simulated data sets.

1) From Table 3, the optimal number of clusters was determined as the number at which all the criteria simultaneously achieved their lowest values; it was found to be two clusters.
Table 4 below gives the parameter values estimated for the best simulation.
For the selected model, GMMC identifies the cluster labels with a misclassification rate of 1%. The misclassification rate is calculated as the proportion of observations assigned to the wrong cluster.

2) Model: VII with the covariance matrix ($\lambda_k I$) (Figure 3 & Figure 4). Using Table 5, the optimal number of clusters was two. GMMC achieves a misclassification rate of 2% for the model (VII). The resulting confusion matrix is shown in Table 6.
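The misclassification rate can be computed as sketched below. The labels are illustrative, not the paper's simulated data; because mixture cluster labels are arbitrary, they are first matched to the true classes over all label permutations before errors are counted:

```python
import numpy as np
from itertools import permutations

# Illustrative sketch: misclassification rate with label matching.
def misclassification_rate(true, pred):
    labels = np.unique(true)
    best = len(true)
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))      # relabel the clusters
        errors = sum(mapping[p] != t for t, p in zip(true, pred))
        best = min(best, errors)
    return best / len(true)

true = np.array([0] * 50 + [1] * 50)
pred = np.array([1] * 49 + [0] + [0] * 50)     # labels swapped, one error
rate = misclassification_rate(true, pred)
print(rate)   # 0.01, i.e. a 1% misclassification rate
```

Exhaustive permutation matching is fine for the small numbers of clusters considered here; for many clusters an assignment algorithm would be used instead.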
3) Model: VEE with the covariance matrix ($\lambda_k D A D^{T}$). From the results in Table 7, it was found that the optimal number of clusters is three, so the number of clusters was increased. To achieve greater clarity, the sample size was increased from 250 to 500 and divided into three groups (Table 8). For this model, the misclassification rate was 15%.

4) Model: VVE with the covariance matrix ($\lambda_k D A_k D^{T}$). The fitted number of clusters for this model was two (Table 9), and the data in Table 10 show that the misclassification rate was 0%.

Conclusion
In this paper, Gaussian mixture model-based clustering is used. Cluster-based mixture models are able to predict accurately if the appropriate covariance model is selected. It is applied using four models:

1) Model [EVV] ($\lambda D_k A_k D_k^{T}$) represents the case of equal volume and variable shape and orientation. It is shown that the optimal number of clusters equals two. From the values of the complexity criteria in Table 3, it is noted that the ICOMP(PEU) criterion attains the lowest value compared to the other two criteria, and the misclassification rate was 1%.
2) Model [VII] ($\lambda_k I$) represents the case of variable volume with spherical shape and orientation. Also, in this model, the optimal number of clusters equals two, and the ICOMP(PEU) criterion attains the lowest value compared to the other two criteria (the values in Table 5). The misclassification rate was 2%.

3) Model [VEE] ($\lambda_k D A D^{T}$) represents the case of variable volume and equal shape and orientation. From Table 7, the optimal number of clusters, taken as the number of clusters corresponding to the lowest values of the information complexity criteria, was found to equal three. The misclassification rate was 15%.

4) Model [VVE] ($\lambda_k D A_k D^{T}$) represents the case of variable volume and shape with equal orientation. As in the first and second models, the optimal number of clusters equals two, and the ICOMP(PEU) criterion attains the lowest value compared to the other two criteria (the values are found in Table 9), while the misclassification rate was 0%.
The results showed that the ICOMP(PEU) criterion was superior to the rest of the criteria considered.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.