Unsupervised Functional Data Clustering Based on Adaptive Weights
1. Introduction
Functional data analysis (FDA), proposed by J. O. Ramsay [1], is a high-dimensional data analysis method based on the functionalization of discrete data to describe statistical relationships from a continuous perspective. It is characterized by treating discrete data as a functional whole with an inherently uniform structure, rather than as a mere arrangement of individual observations. In recent years, the extension of classical multivariate statistical methods to functional data has become an important part of data mining research. In the scalar setting there are many mature multivariate clustering algorithms, such as Affinity Propagation clustering, hierarchical clustering, K-means clustering, DBSCAN clustering and Chameleon clustering. Most of these multivariate cluster analysis methods cannot be applied directly to functional data. The intuitive approach is to discretize the functional data by sampling and then use a multivariate clustering model to cluster the discrete data points [2] [3]. Although this solves the problem that functional data are difficult to feed into multivariate clustering algorithms, it also violates the main idea of “taking the whole function as the analysis object”. Tarpey et al. [4] proposed a clustering algorithm based on functional principal points; its clustering results are affected by the number and location of extreme points, and it therefore lacks robustness. Tokushige [5] extended the K-means clustering algorithm to functional data, where the cluster center is determined by the minimum value of the objective function. Antoniadis et al. [6] used wavelet analysis techniques in functional data clustering. Wang et al. [7] proposed a weighted clustering algorithm based on functional principal components, which can effectively make up for the limitations of ordinary functional principal component cluster analysis. Delaigle et al. [8] proposed a weighted K-means algorithm using the Haar basis function proposed by Hardle [9] and the unbalanced Haar basis function proposed by Fryzlewicz [10]; the algorithm selects the projection function by weighting the K-means clustering criterion, and they prove that nearly perfect clustering can be achieved by projecting the data onto the chosen finite-dimensional space. Sharp and Browne [11] imposed jointly generalized hyperbolic distributions on projections of basis expansion coefficients into group-specific subspaces, with parameter estimation carried out through a multicycle ECM algorithm. Wu et al. [12] assumed that the distribution of the cluster members around their projections onto the cluster’s principal curve is an isotropic Gaussian, and that the projections of cluster members onto the cluster’s principal curve are uniformly distributed; the proposed method employs the Bayesian Information Criterion to automatically and simultaneously find the appropriate number of features, the optimal degree of smoothing and the corresponding cluster members. It can be noticed that cluster analysis from the continuous perspective and cluster analysis of discrete data still have much in common. If infinite-dimensional functional clustering can be transformed into finite-dimensional clustering, then many excellent multivariate clustering methods can be used in functional data clustering.
2. Preliminaries
2.1. K-Means
The K-means is one of the most popular unsupervised learning algorithms for solving the well-known clustering problem. Let $X = \{x_1, \ldots, x_n\}$ be a data set in a $d$-dimensional Euclidean space $\mathbb{R}^d$. Let $A = \{a_1, \ldots, a_c\}$ be the cluster centers. Let $U = [u_{ik}]_{n \times c}$, where $u_{ik}$ is a binary variable indicating whether the data point $x_i$ belongs to the $k$-th cluster, $k = 1, \ldots, c$. The K-means objective function is

$$J(U, A) = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik} \left\| x_i - a_k \right\|^2 \quad (1)$$

The K-means algorithm iterates through the necessary conditions for minimizing the objective function $J(U, A)$, with updating equations for cluster centers and memberships, respectively, as

$$a_k = \frac{\sum_{i=1}^{n} u_{ik} x_i}{\sum_{i=1}^{n} u_{ik}} \quad (2)$$

$$u_{ik} = \begin{cases} 1, & \text{if } \left\| x_i - a_k \right\|^2 = \min_{1 \le l \le c} \left\| x_i - a_l \right\|^2, \\ 0, & \text{otherwise}, \end{cases} \quad (3)$$

where $\left\| x_i - a_k \right\|$ is the Euclidean distance between the data point $x_i$ and the cluster center $a_k$.
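As a minimal illustration of updates (2) and (3), the following Python sketch alternates the two steps until the memberships stop changing; the random initialization and the stopping rule are choices made here for concreteness, not prescribed by the text above.

```python
import numpy as np

def kmeans(X, c, max_iter=100, seed=0):
    """Minimal K-means: alternate updates (2) and (3) until memberships converge."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = X[rng.choice(n, size=c, replace=False)].copy()  # centers = c distinct points
    labels = np.full(n, -1)
    for _ in range(max_iter):
        # membership update (3): assign each point to its nearest center
        d2 = ((X[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)  # (n, c) squared distances
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # center update (2): mean of the points assigned to each cluster
        for k in range(c):
            if np.any(labels == k):
                A[k] = X[labels == k].mean(axis=0)
    return A, labels
```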
2.2. Functional Principal Component Analysis
Functional principal component analysis (FPCA) is a generalization of principal component analysis. Suppose $X(t)$ is a random process defined on a compact interval $T$. Let $\mu(t) = E[X(t)]$ be the mean function, and let $G(s,t) = \operatorname{Cov}\left( X(s), X(t) \right)$ be the covariance function. By using the spectral decomposition of the covariance function, we can obtain

$$G(s,t) = \sum_{p=1}^{\infty} \lambda_p \phi_p(s) \phi_p(t),$$

where $\lambda_p$ is a non-negative eigenvalue of $G$ and satisfies $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$, and $\phi_p$ is the corresponding eigenfunction. The eigenfunctions are regarded as a set of basis functions, called principal component basis functions. Combined with the Karhunen-Loève expansion, the principal component basis functions can represent the infinite-dimensional functional data in a finite manner. Projecting $X$ onto the principal component basis functions, the principal component score of the function is the corresponding coefficient

$$\xi_p = \int_T \left( X(t) - \mu(t) \right) \phi_p(t) \, dt. \quad (4)$$

Applying the Karhunen-Loève expansion, we can get

$$X(t) = \mu(t) + \sum_{p=1}^{\infty} \xi_p \phi_p(t). \quad (5)$$
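In practice the curves are observed on a grid, and (4)-(5) are approximated by diagonalizing the sample covariance matrix. Below is a minimal numpy sketch under the assumption of a common, equally spaced grid; the Riemann-sum quadrature is a simplification, and basis-expansion implementations are equally common.

```python
import numpy as np

def fpca_scores(Y, t, n_comp):
    """Approximate FPC scores for curves Y (n x m) observed on a common grid t (m,).

    The integral in (4) is approximated by a Riemann sum with weight dt, so
    eigenpairs of the covariance operator come from a scaled eigendecomposition
    of the sample covariance matrix."""
    n, m = Y.shape
    dt = t[1] - t[0]                       # equally spaced grid assumed
    mu = Y.mean(axis=0)                    # discretized mean function
    Yc = Y - mu
    G = (Yc.T @ Yc) / n                    # discretized covariance function G(s, t)
    evals, evecs = np.linalg.eigh(G * dt)  # operator eigenvalues lambda_p
    order = np.argsort(evals)[::-1][:n_comp]
    lam = evals[order]
    phi = evecs[:, order] / np.sqrt(dt)    # eigenfunctions with unit L2 norm
    xi = Yc @ phi * dt                     # scores: Riemann sum of (4)
    return xi, lam, phi, mu
```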
3. Unsupervised Functional Data Clustering Based on Adaptive Weights
Multivariate clustering analysis is difficult to extend directly to functional data clustering. But from the principal component analysis of the previous section, the functional datum $X_i(t)$ is equivalent to its vector of principal component scores $\xi_i = \left( \xi_{i1}, \ldots, \xi_{id} \right)$. Since the variance contribution rates of the principal component scores differ, the importance of the different principal components obviously differs as well. It is difficult to achieve an ideal classification effect if we ignore these objective differences in the efficiency of the different principal component scores and use the raw scores to represent the sample functions for clustering [13]. Moreover, K-means cluster analysis is not completely unsupervised learning, because it requires the number of clusters to be given first. Therefore, we use functional principal component analysis to map the functional data from the infinite-dimensional space to a finite-dimensional space, and then use unsupervised functional data clustering based on adaptive weights (UFDCAW) to cluster the scores on the finite-dimensional space. Finally, the clustering result of the functional data $X_1(t), \ldots, X_n(t)$ is obtained.
Let $\Xi = \{\xi_1, \ldots, \xi_n\}$, which is equivalent to $\{X_1(t), \ldots, X_n(t)\}$, be the data on the $d$-dimensional Euclidean space $\mathbb{R}^d$. Let $A = \{a_1, \ldots, a_c\}$ be the set of $c$ class centers. Let $Z = [z_{ik}]_{n \times c}$ be the degree-of-membership matrix, where $z_{ik}$ is the membership degree of $\xi_i$ belonging to the $k$-th class. For any $i$, there is $z_{ik} \in [0,1]$ and $\sum_{k=1}^{c} z_{ik} = 1$. Let $\alpha = \left( \alpha_1, \ldots, \alpha_c \right)$ be a set of mixing proportions, where $\alpha_k$ represents the probability that a data point belongs to the $k$-th class, and satisfies $\alpha_k \ge 0$ and $\sum_{k=1}^{c} \alpha_k = 1$.
$H(\alpha) = -\sum_{k=1}^{c} \alpha_k \ln \alpha_k$ is the entropy of the mixing proportions. The objective function of unsupervised functional data clustering based on adaptive weights is

$$J(Z, \alpha, A) = \sum_{i=1}^{n} \sum_{k=1}^{c} z_{ik} \, d^2(\xi_i, a_k) + \gamma \sum_{i=1}^{n} \sum_{k=1}^{c} z_{ik} \ln \frac{z_{ik}}{\alpha_k} + \beta \, n \, H(\alpha), \quad (6)$$

where $d^2(\xi_i, a_k)$ is the distance from the data point $\xi_i$ to the class center $a_k$. Considering the difference in the scoring efficiency of the different principal components, let

$$d^2(\xi_i, a_k) = \sum_{p=1}^{d} w_p \left( \xi_{ip} - a_{kp} \right)^2,$$

where $w_p$ is the adaptive weight of the $p$-th item of the principal component score.
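A concrete way to realize the weighted distance is sketched below. Taking $w_p$ proportional to the variance contribution rate $\lambda_p / \sum_l \lambda_l$ is an assumption made here for illustration; any weights reflecting the efficiency of the scores could be substituted.

```python
import numpy as np

def adaptive_weights(lam):
    """One plausible reading of the adaptive weights (assumed here):
    variance contribution rates w_p = lambda_p / sum(lambda)."""
    return lam / lam.sum()

def weighted_sq_dist(xi, A, w):
    """Weighted squared distances d^2(xi_i, a_k) = sum_p w_p (xi_ip - a_kp)^2.

    xi : (n, d) principal component scores, A : (c, d) centers, w : (d,) weights.
    Returns an (n, c) matrix of squared distances."""
    diff = xi[:, None, :] - A[None, :, :]
    return (w * diff ** 2).sum(axis=2)
```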
The Lagrangian function of Equation (6) is

$$L(Z, \alpha, A) = J(Z, \alpha, A) - \sum_{i=1}^{n} \lambda_i \left( \sum_{k=1}^{c} z_{ik} - 1 \right) - \delta \left( \sum_{k=1}^{c} \alpha_k - 1 \right). \quad (7)$$

First, we take the partial derivative of the Lagrangian (7) with respect to $z_{ik}$, and we obtain

$$\frac{\partial L}{\partial z_{ik}} = d^2(\xi_i, a_k) + \gamma \left( \ln \frac{z_{ik}}{\alpha_k} + 1 \right) - \lambda_i = 0. \quad (8)$$

Thus, we have $z_{ik} = \alpha_k e^{-d^2(\xi_i, a_k)/\gamma} e^{\lambda_i/\gamma - 1}$. From the constraint $\sum_{k=1}^{c} z_{ik} = 1$, we can get

$$z_{ik} = \frac{\alpha_k e^{-d^2(\xi_i, a_k)/\gamma}}{\sum_{l=1}^{c} \alpha_l e^{-d^2(\xi_i, a_l)/\gamma}}. \quad (9)$$

Setting the partial derivative with respect to $a_{kp}$ to zero, we then get the update equation for the center point $a_k$ as follows:

$$a_k = \frac{\sum_{i=1}^{n} z_{ik} \xi_i}{\sum_{i=1}^{n} z_{ik}}. \quad (10)$$

We next take the partial derivative of the Lagrangian (7) with respect to $\alpha_k$,

$$\frac{\partial L}{\partial \alpha_k} = -\gamma \sum_{i=1}^{n} \frac{z_{ik}}{\alpha_k} - \beta n \left( \ln \alpha_k + 1 \right) - \delta = 0, \quad (11)$$

and then we get the updating equation for $\alpha_k$ as follows:

$$\alpha_k^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} z_{ik} + \frac{\beta}{\gamma} \alpha_k^{(t)} \left( \ln \alpha_k^{(t)} - \sum_{l=1}^{c} \alpha_l^{(t)} \ln \alpha_l^{(t)} \right), \quad (12)$$
where $t$ is the number of iterations of the algorithm. $\sum_{l=1}^{c} \alpha_l^{(t)} \ln \alpha_l^{(t)}$ is the weighted mean of $\ln \alpha_k^{(t)}$ in Equation (12). For the $k$-th mixing proportion $\alpha_k$, if $\ln \alpha_k^{(t)}$ is less than the weighted mean, then the new $\alpha_k^{(t+1)}$ will be less than the proportion $\frac{1}{n} \sum_{i=1}^{n} z_{ik}$. That is, the smaller proportions will decrease, and then competition will occur. If $\alpha_k^{(t+1)} \le 0$ or $\alpha_k^{(t+1)} < 1/n$ for some $k$, they are considered to be illegitimate proportions. In this situation, we discard those clusters and then update the cluster number to be

$$c^{(t+1)} = c^{(t)} - \left| \left\{ k : \alpha_k^{(t+1)} < 1/n \right\} \right|, \quad (13)$$

where $|\cdot|$ denotes the cardinality of the set. After updating the number of clusters $c^{(t+1)}$, the remaining mixing proportions $\alpha_k^{(t+1)}$ and the corresponding $z_{ik}$ need to be re-normalized by

$$\alpha_k^{(t+1)} = \frac{\alpha_k^{(t+1)}}{\sum_{l=1}^{c^{(t+1)}} \alpha_l^{(t+1)}}, \quad (14)$$

$$z_{ik} = \frac{z_{ik}}{\sum_{l=1}^{c^{(t+1)}} z_{il}}. \quad (15)$$
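The update-discard-renormalize cycle of (12)-(15) can be sketched as follows; since several formulas above are reconstructions, treat this as an illustration of the mechanism rather than a definitive implementation.

```python
import numpy as np

def update_alpha_and_prune(Z, alpha, beta, gamma):
    """One pass of (12)-(15): update mixing proportions, discard clusters whose
    proportion falls below 1/n, and renormalize the survivors."""
    n = Z.shape[0]
    log_a = np.log(alpha)
    wmean = (alpha * log_a).sum()                # weighted mean of ln(alpha)
    alpha = Z.sum(axis=0) / n + (beta / gamma) * alpha * (log_a - wmean)  # (12)
    keep = alpha >= 1.0 / n                      # legitimate proportions, cf. (13)
    alpha, Z = alpha[keep], Z[:, keep]
    alpha /= alpha.sum()                         # (14)
    Z /= Z.sum(axis=1, keepdims=True)            # (15)
    return Z, alpha, keep
```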
We next concern ourselves with the learning of the parameters $\beta$ and $\gamma$ for the two terms $\beta n H(\alpha)$ and $\gamma \sum_{i=1}^{n} \sum_{k=1}^{c} z_{ik} \ln (z_{ik}/\alpha_k)$ of Equation (6). Based on learning rates that increase with the cluster number, such as $e^{-c^{(t)}/10}$ and $e^{-c^{(t)}/1000}$, it is seen that $e^{-c^{(t)}/10}$ decreases faster, but $e^{-c^{(t)}/1000}$ decreases more slowly [14]. We suppose that the parameter $\beta$ should decrease neither too slowly nor too quickly, and so we set the parameter $\beta$ as

$$\beta^{(t)} = e^{-c^{(t)}/100}. \quad (16)$$

To guarantee the condition $\alpha_k^{(t+1)} \ge 0$, the parameters $\gamma^{(t)}$ are defined as

$$\gamma^{(t)} = d \, \beta^{(t)} e^{\lfloor t/10 \rfloor}, \quad (17)$$

where $\lfloor a \rfloor$ represents the largest integer that is no more than $a$ and $t$ denotes the iteration number in the algorithm. In Equation (17), if $t$ is large, then the ratio $\beta^{(t)}/\gamma^{(t)} = e^{-\lfloor t/10 \rfloor}/d$ is small, to maintain stability. Although functional principal component analysis maps the functional data from infinite dimension to finite dimension, the dimension $d$ of the sample is still relatively high, so we add the adjustment parameter $d$ to Equation (17).
In our setting, we use all data points as initial means with $c^{(0)} = n$, i.e. $a_k^{(0)} = \xi_k$, $k = 1, \ldots, n$, and we use $\alpha_k^{(0)} = 1/n$ as the initial mixing proportions. Thus, the proposed unsupervised functional data clustering based on adaptive weights can be summarized as Algorithm 1.
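Putting the pieces together, Algorithm 1 can be sketched as below. The helpers fpca_scores, adaptive_weights, weighted_sq_dist and update_alpha_and_prune are the ones sketched earlier, and the β/γ schedule follows the reconstructed (16)-(17), so the exact constants are assumptions.

```python
import numpy as np

def ufdcaw(Y, t_grid, n_comp=5, max_iter=100, tol=1e-6):
    """Sketch of UFDCAW: FPCA scores -> adaptive-weight clustering with a
    self-adjusting number of clusters (all n points start as cluster means)."""
    xi, lam, _, _ = fpca_scores(Y, t_grid, n_comp)   # finite-dimensional scores
    w = adaptive_weights(lam)
    n, d = xi.shape
    A = xi.copy()                                    # all data points as initial means
    alpha = np.full(n, 1.0 / n)                      # uniform initial proportions
    c = n
    for t in range(1, max_iter + 1):
        beta = np.exp(-c / 100.0)                    # (16), reconstructed schedule
        gamma = d * beta * np.exp(t // 10)           # (17), reconstructed schedule
        # membership update (9), computed as a numerically stable softmax
        logits = np.log(alpha)[None, :] - weighted_sq_dist(xi, A, w) / gamma
        logits -= logits.max(axis=1, keepdims=True)
        Z = np.exp(logits)
        Z /= Z.sum(axis=1, keepdims=True)
        # proportion update with competition and discarding (12)-(15)
        Z, alpha, keep = update_alpha_and_prune(Z, alpha, beta, gamma)
        A = A[keep]
        c_new = len(alpha)
        # center update (10)
        A_new = (Z.T @ xi) / Z.sum(axis=0)[:, None]
        if c_new == c and np.max(np.abs(A_new - A)) < tol:
            A = A_new
            break
        A, c = A_new, c_new
    return A, Z, alpha
```

On the simulated curves of Section 5, one would call ufdcaw(Y, t_grid) with Y the matrix of noisy observations on the common grid t_grid.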
4. Theoretical Properties
Proposition 1. Given the iterative formula (12) that minimizes the objective function (6), if there exists $k$ such that $\ln \alpha_k^{(t)}$ is less than its weighted mean $\sum_{l=1}^{c} \alpha_l^{(t)} \ln \alpha_l^{(t)}$, then $\alpha_k^{(t+1)} < \frac{1}{n} \sum_{i=1}^{n} z_{ik}$.

Proof. From Equation (12), $\alpha_k^{(t+1)} - \frac{1}{n} \sum_{i=1}^{n} z_{ik} = \frac{\beta}{\gamma} \alpha_k^{(t)} \left( \ln \alpha_k^{(t)} - \sum_{l=1}^{c} \alpha_l^{(t)} \ln \alpha_l^{(t)} \right)$. That is, if there exists $k$ such that $\ln \alpha_k^{(t)} < \sum_{l=1}^{c} \alpha_l^{(t)} \ln \alpha_l^{(t)}$ is satisfied, the right-hand side is negative, and we can get $\alpha_k^{(t+1)} < \frac{1}{n} \sum_{i=1}^{n} z_{ik}$. □
Proposition 2. Given the iterative formula (12) that minimizes the objective function (6), if $\gamma^{(t)} \ge \beta^{(t)} n \alpha_k^{(t)} \left( \sum_{l=1}^{c} \alpha_l^{(t)} \ln \alpha_l^{(t)} - \ln \alpha_k^{(t)} \right) \big/ \sum_{i=1}^{n} z_{ik}$ for every $k$, then $\alpha_k^{(t+1)} \ge 0$.

Proof. From Equation (12), $\alpha_k^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} z_{ik} - \frac{\beta^{(t)}}{\gamma^{(t)}} \alpha_k^{(t)} \left( \sum_{l=1}^{c} \alpha_l^{(t)} \ln \alpha_l^{(t)} - \ln \alpha_k^{(t)} \right)$, and the stated condition implies $\frac{\beta^{(t)}}{\gamma^{(t)}} \alpha_k^{(t)} \left( \sum_{l=1}^{c} \alpha_l^{(t)} \ln \alpha_l^{(t)} - \ln \alpha_k^{(t)} \right) \le \frac{1}{n} \sum_{i=1}^{n} z_{ik}$. Therefore, when the condition holds, $\alpha_k^{(t+1)} \ge 0$ can be guaranteed. □
5. Simulation Study
In this section, we use simulation experiments to verify the performance of the proposed clustering algorithm and whether it can determine the optimal number of clusters $c$. At the same time, we compare it with fuzzy C-means (FCM) and adaptive weighting functional clustering (AWFC). The results are evaluated using cluster purity, calculated as

$$\text{Purity}(\Omega, C) = \frac{1}{n} \sum_{k} \max_{j} \left| \omega_k \cap c_j \right|, \quad (18)$$

where $\Omega = \{\omega_1, \omega_2, \ldots\}$ is the cluster division and $C = \{c_1, c_2, \ldots\}$ is the real class division. The value range of purity is $(0, 1]$; the larger the value, the better the clustering performance.
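Purity per (18) is straightforward to compute from two label vectors; a small sketch:

```python
import numpy as np

def purity(pred, truth):
    """Purity per (18): each found cluster is credited with its best-matching
    real class; pred and truth are non-negative integer label arrays of length n."""
    n = len(truth)
    total = 0
    for k in np.unique(pred):
        members = truth[pred == k]              # real labels inside cluster k
        total += np.bincount(members).max()     # size of the best-matching class
    return total / n
```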
Assume that the three functional data for the simulation experiment are $X_1(t)$, $X_2(t)$ and $X_3(t)$, observed as $Y_{ij} = X_i(t_j) + \varepsilon_{ij}$, where $Y_{ij}$ is the observation value of the functional data with error, and the error $\varepsilon_{ij}$ obeys a normal distribution with mean 0 and variance 0.02. In order to ensure that the FCM clustering algorithm is applicable in the simulation experiments, we assume that $t_1, \ldots, t_{100}$ are 100 equally spaced observation nodes on the observation interval. The observation sample size of each functional datum in the simulation experiment is the same, and total sample sizes of 30, 300 and 600 are used in order to compare the effectiveness of the clustering algorithms under different sample sizes. Figure 1 shows the clustering data for a sample size of 30: Figure 1(a) shows the observation values with noise, and Figure 1(b) shows the functional data generated using the B-spline basis.
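The generating mechanism can be sketched as follows. The three increasing curves used here are hypothetical stand-ins chosen only to mimic the description above (increasing functions with clearly different derivatives), since the simulation's exact formulas are not reproduced in this section.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

# hypothetical stand-ins for the paper's three increasing curves
CURVES = [lambda t: t, lambda t: t ** 2, lambda t: np.sqrt(t)]

def simulate(n_total, m=100, sd=np.sqrt(0.02), seed=0):
    """Noisy observations Y_ij = X_i(t_j) + eps_ij on m equally spaced nodes."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, m)
    labels = np.repeat([0, 1, 2], n_total // 3)        # equal group sizes
    Y = np.array([CURVES[g](t) for g in labels]) + rng.normal(0.0, sd, (n_total, m))
    return Y, t, labels

def smooth_bspline(y, t, n_knots=10, k=3):
    """Least-squares cubic B-spline fit of one noisy curve, as in Figure 1(b)."""
    interior = np.linspace(t[0], t[-1], n_knots)[1:-1]
    knots = np.r_[[t[0]] * (k + 1), interior, [t[-1]] * (k + 1)]
    return make_lsq_spline(t, y, knots, k=k)
```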
We can see from Figure 1(a) and Figure 1(b) that the three types of curves are all increasing functions, but that there are significant differences in their derivatives. Following the steps of UFDCAW, the scatter plots of the functional principal component scores at the 3rd and 5th iterations for a sample size of 300 are shown in Figure 2. From Figure 2(b), we can see that the number of clusters decreases rapidly from 300 to 165 after 3 iterations. The algorithm ends after five iterations and obtains the correct number of clusters $c = 3$ and the three class centers. The purity of the clustering is 0.993.
Table 1 reports the average purity over 100 replicate experiments using FCM, AWFC and UFDCAW on simulated data with sample sizes of 30, 300 and 600.
The data presented in Table 1 show that UFDCAW achieves higher cluster purity than FCM and AWFC for sample sizes of 30, 300 and 600. The results of the comparison show that: 1) UFDCAW has higher cluster purity than FCM and can display the sample distribution characteristics in a low-dimensional space. 2) FCM and AWFC require random initial class centers, so the cluster purity will be low when the algorithm selects an unsuitable class center; UFDCAW is more stable because there is no need to select initial cluster centers. 3) UFDCAW can be iterated to obtain the number of clusters.

Table 1. Comparison of clustering algorithm results.

Figure 1. (a) Observation values with noise; (b) functional data for a sample size of 30.

Figure 2. Scatter plots of the functional principal component scores of data with a sample size of 300: (a) initial data, (b) 3rd iteration, (c) 5th iteration.
6. Conclusion
At present, the data we collect are becoming more and more complicated. Using multivariate statistical models to deal with such data can run into problems such as the curse of dimensionality, and functional data analysis has emerged to meet this need. Based on the former studies, we extend the K-means clustering algorithm with adaptive weights to functional data. The proposed method not only reduces the influence of noise, but can also find the optimal number of clusters when the initialization parameters are missing. Simulation verifies that the algorithm proposed in this paper obtains more accurate clustering results and is feasible in practice. However, the algorithm is slightly more complicated than its competitors, and some information is lost in the basis-function fitting and the functional principal component analysis, which needs to be further improved.