A Fixed Suppressed Rate Selection Method for Suppressed Fuzzy C-Means Clustering Algorithm

The suppressed fuzzy c-means (S-FCM) clustering algorithm, which aims to combine the higher speed of the hard c-means clustering algorithm with the better classification performance of the fuzzy c-means clustering algorithm, has been studied by many researchers and applied in many fields. Selecting the suppression rate is a key step in the algorithm. In this paper, we give a method to select a fixed suppression rate from the structure of the data itself. The experimental results show that the proposed method is a suitable way to select the suppression rate in the suppressed fuzzy c-means clustering algorithm.


Introduction
With the development of computer and network technology, the world has entered the age of big data. As a basic data analysis method, cluster analysis groups data with similar characteristics in an unsupervised manner. After fuzzy set theory was introduced into cluster analysis, it took several important steps before Bezdek arrived at the alternating optimization (AO) solution of fuzzy clustering, named the fuzzy c-means (FCM) clustering algorithm [1]-[3]. FCM improved the partition performance of the previously existing hard c-means (HCM) clustering algorithm by extending the membership degree from {0, 1} to [0, 1]. FCM outperforms HCM in terms of partition quality, at the cost of slower convergence. In spite of this drawback, FCM is one of the most popular clustering algorithms, and many researchers have studied its convergence speed and parameter selection and elaborated various solutions to reduce its execution time [4]-[8].
As another way to speed up the FCM calculations, we proposed the suppressed fuzzy c-means (S-FCM) clustering algorithm [9], which reduces the execution time of FCM by improving its convergence speed while preserving its good classification accuracy. S-FCM establishes a relationship between HCM and FCM through the suppression rate α (0 ≤ α ≤ 1): S-FCM becomes HCM when α = 0 and FCM when α = 1.
The S-FCM algorithm is not optimal from a rigorous mathematical point of view, as it does not minimize the objective function. To address this problem, Szilágyi et al. defined a new objective function with parameter α and named the resulting algorithm optimally suppressed fuzzy c-means (Os-FCM) clustering [10]-[12]. The Os-FCM clustering algorithm is convergent. Based on numerical tests, they concluded that the optimality or non-optimality of S-FCM cannot be taken for granted, but that S-FCM behaves very similarly to an optimal clustering model (Os-FCM).
Selecting a suitable parameter α is an important part of implementing the S-FCM algorithm in real applications, since the performance of S-FCM may degrade significantly if α is not properly selected. It is therefore important to select α such that S-FCM combines the fast convergence of HCM with the superior partition performance of FCM. Huang et al. proposed a modified S-FCM, named MS-FCM, which determines α by type-driven learning; α is updated in each iteration, and MS-FCM has been successfully used in MRI segmentation [13]. Many other researchers have since focused on the selection of α: Huang et al. gave a Cauchy formula [14], Nyma et al. an exponent formula [15], Li et al. a fuzzy deviation exponent formula [16], and Saad et al. a clarity formula [17]. However, all of these strategies change α in each iteration. For the fixed-selection case, we simply set α = 0.5 in the original paper [9]. In this paper, we further study the fixed selection of α based on the structure of the data.
The remainder of the paper is organized as follows. Section 2 and Section 3 introduce the FCM and S-FCM clustering algorithms, respectively. In Section 4, a method to select the parameter α based on the data structure is presented. Section 5 reports an experimental comparison of the new selection method with related algorithms, and conclusions are presented in Section 6.

Fuzzy C-Means Clustering Algorithm
FCM is one of the most widely used fuzzy clustering algorithms. It can be formulated as the following mathematical program.
The traditional FCM partitions a set of object data $X=\{x_1,\dots,x_n\}$ into $c$ clusters by minimizing a quadratic objective function:

$$J_{FCM}=\sum_{i=1}^{c}\sum_{j=1}^{n}u_{ij}^{m}\,d_{ij}^{2},\tag{1}$$

where $x_j$ ($j=1,\dots,n$) represents the input data, $v_i$ ($i=1,\dots,c$) represents the prototype (center value or representative element) of cluster $i$, $u_{ij}\in[0,1]$ is the fuzzy membership function showing the degree to which vector $x_j$ belongs to cluster $i$, $m>1$ is the fuzzy factor, and $d_{ij}=\lVert x_j-v_i\rVert$ represents the distance between vector $x_j$ and cluster prototype $v_i$. According to the definition of fuzzy sets, the fuzzy memberships of any input vector $x_j$ satisfy the probability constraint

$$\sum_{i=1}^{c}u_{ij}=1,\qquad j=1,\dots,n.\tag{2}$$

The minimization of $J_{FCM}$ is achieved by alternately optimizing $J_{FCM}$ over the memberships $\{u_{ij}\}$ with the prototypes fixed and over the prototypes $\{v_i\}$ with the memberships fixed. During each cycle, the optimal values are deduced from the zero-gradient conditions and obtained as follows:

$$u_{ij}=\frac{d_{ij}^{-2/(m-1)}}{\sum_{k=1}^{c}d_{kj}^{-2/(m-1)}},\tag{3}$$

$$v_i=\frac{\sum_{j=1}^{n}u_{ij}^{m}\,x_j}{\sum_{j=1}^{n}u_{ij}^{m}}.\tag{4}$$

According to the AO scheme of the FCM clustering algorithm, Equations (3) and (4) are applied alternately until the cluster prototypes stabilize. The stopping criterion compares the sum of the norms of the variations of the prototype vectors $v_i$ within the latest iteration $T$ with a predefined small threshold $\varepsilon$, i.e., the algorithm stops when $\sum_{i=1}^{c}\lVert v_i^{(T)}-v_i^{(T-1)}\rVert<\varepsilon$.
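The alternating optimization of Equations (3) and (4), together with the prototype-shift stopping rule, can be sketched in plain Python. This is a minimal illustration, not the authors' Matlab implementation: the function name `fcm`, the optional `V_init` argument and the random fallback initialization are assumptions made for this sketch.

```python
import math
import random


def fcm(X, c, m=2.0, eps=1e-5, max_iter=200, V_init=None, seed=0):
    """Minimal FCM sketch (alternating optimization, hypothetical helper).

    X: list of n points (sequences of floats); c: number of clusters;
    m > 1: fuzzy factor. Returns (U, V, iterations), where U[i][j] is the
    membership of x_j in cluster i and V[i] is the prototype of cluster i.
    """
    n, dim = len(X), len(X[0])
    if V_init is not None:
        V = [list(v) for v in V_init]
    else:
        # Assumed default: start from c distinct, randomly chosen data points.
        V = [list(X[k]) for k in random.Random(seed).sample(range(n), c)]
    U = [[0.0] * n for _ in range(c)]
    it = 0
    for it in range(1, max_iter + 1):
        # Membership update (3): u_ij = d_ij^(-2/(m-1)) / sum_k d_kj^(-2/(m-1))
        for j in range(n):
            d = [math.dist(X[j], V[i]) for i in range(c)]
            if min(d) == 0.0:
                # x_j sits exactly on a prototype: give it full membership there.
                hit = d.index(0.0)
                for i in range(c):
                    U[i][j] = 1.0 if i == hit else 0.0
                continue
            w = [di ** (-2.0 / (m - 1.0)) for di in d]
            s = sum(w)
            for i in range(c):
                U[i][j] = w[i] / s
        # Prototype update (4): v_i = sum_j u_ij^m x_j / sum_j u_ij^m
        V_new = []
        for i in range(c):
            um = [U[i][j] ** m for j in range(n)]
            tot = sum(um)
            V_new.append([sum(um[j] * X[j][k] for j in range(n)) / tot
                          for k in range(dim)])
        # Stopping rule: sum of prototype shifts in this iteration below eps.
        shift = sum(math.dist(V[i], V_new[i]) for i in range(c))
        V = V_new
        if shift < eps:
            break
    return U, V, it
```

Pure Python keeps the sketch dependency-free; a vectorized version (for example with NumPy) would be the natural choice for the high-dimensional data sets used later.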

Suppressed Fuzzy C-Means Clustering Algorithm
The suppressed fuzzy c-means algorithm was introduced in [9] with the declared goal of improving the convergence speed of FCM while keeping its good classification accuracy. The algorithm modifies the AO scheme of FCM by inserting an extra computational step between the application of formulae (3) and (4). For each input vector $x_j$, let $p$ be the cluster to which $x_j$ has the largest membership, so that $u_{pj}=\max_{1\le i\le c}u_{ij}$ is the winner membership. After the modification, the memberships are:

$$u_{pj}\leftarrow 1-\alpha\sum_{i\neq p}u_{ij}=1-\alpha+\alpha\,u_{pj},\qquad u_{ij}\leftarrow\alpha\,u_{ij}\quad(i\neq p).$$

That is, all nonwinner memberships are decreased by multiplying them by the so-called suppression rate $\alpha$ ($0\le\alpha\le 1$), and the winner membership is increased accordingly, so that the modified memberships still satisfy the probability constraint $\sum_{i=1}^{c}u_{ij}=1$.
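The suppression step can be sketched as a standalone function that would be called between the membership update (3) and the prototype update (4) of an FCM loop. The name `suppress`, the c × n list-of-lists layout and the tie-breaking rule (lowest cluster index wins) are assumptions for illustration. Setting `alpha = 0` yields crisp, HCM-like memberships and `alpha = 1` leaves the FCM memberships unchanged, matching the relationship between S-FCM, HCM and FCM described in the Introduction.

```python
def suppress(U, alpha):
    """S-FCM suppression step (hypothetical helper) on a c x n matrix U.

    U[i][j] is the membership of x_j in cluster i; each column is assumed to
    satisfy the probability constraint sum_i U[i][j] = 1. Returns a new matrix.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("suppression rate alpha must lie in [0, 1]")
    c, n = len(U), len(U[0])
    out = [[0.0] * n for _ in range(c)]
    for j in range(n):
        # Winner p: cluster with the largest membership (ties -> lowest index).
        p = max(range(c), key=lambda i: U[i][j])
        others = 0.0
        for i in range(c):
            if i != p:
                # Non-winners are multiplied by the suppression rate.
                out[i][j] = alpha * U[i][j]
                others += U[i][j]
        # Winner takes the freed mass: u_pj = 1 - alpha * sum_{i != p} u_ij.
        out[p][j] = 1.0 - alpha * others
    return out
```

For example, a column (0.7, 0.2, 0.1) with α = 0.5 becomes (0.85, 0.1, 0.05), which still sums to 1.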

The Fixed Selection of Suppression Rate α
In the original S-FCM, the suppression rate α is set to the middle of the interval [0, 1], i.e., α = 0.5, which can be considered a compromise between FCM and HCM. We argue that a better approach is to select α based on the distribution structure of the data.
For the data set …, the proof is given in the Appendix.

Experimental Studies
In this section, we conduct experiments to evaluate the proposed fixed selection method for α. The S-FCM with α selected by the proposed method is denoted as S-FCM*.

Synthetic Datasets
In this section, we compare the performance of these algorithms on synthetic datasets. To examine and compare FCM, S-FCM and S-FCM*, the following criteria are used: the number of iterations, the running time until convergence, and the classification rate. All algorithms are started from the same initial values and stopped under the same condition.
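The paper does not spell out how the classification rate is computed. One common convention, assumed here, is to harden the fuzzy partition by assigning each point to its maximum-membership cluster and then count agreements with the true labels under the best one-to-one matching of clusters to classes. The helper names `defuzzify` and `classification_rate` are hypothetical.

```python
from itertools import permutations


def defuzzify(U):
    """Assign each x_j to the cluster with the largest membership U[i][j]."""
    c, n = len(U), len(U[0])
    return [max(range(c), key=lambda i: U[i][j]) for j in range(n)]


def classification_rate(true_labels, cluster_labels, c):
    """Fraction of points whose cluster maps to their true class under the
    best one-to-one mapping of c cluster indices to c class indices.

    Brute force over all c! mappings, which is cheap for small c (2 or 3).
    """
    n = len(true_labels)
    if n == 0 or n != len(cluster_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    best = 0
    for perm in permutations(range(c)):
        # perm[k] is the class assigned to cluster k under this mapping.
        hits = sum(1 for t, k in zip(true_labels, cluster_labels) if perm[k] == t)
        best = max(best, hits)
    return best / n
```

For instance, with the 300 points of one of the synthetic datasets below (three clusters of 100 points), a rate of 97.67% corresponds to 293 correctly assigned points (293/300 ≈ 0.9767).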
We compare the new approach S-FCM* with FCM and S-FCM. All experiments are run in Matlab version 8.0 on a computer with two Genuine Intel processors at 3.0 GHz, 1.0 GB of memory and a 500 GB hard disk. The parameters used for these algorithms are: maximal number of iterations T = 200, …

The three artificial datasets, named data 1, data 2 and data 3, each consist of three clusters of 100 points drawn from multivariate normal distributions. The parameters used to generate data 1 are …, and data 1 is shown in Figure 1. We then move the three cluster centers closer together by … to obtain data 2, shown in Figure 2, and move them still closer to each other by … to obtain data 3, shown in Figure 3. The fuzzy factors m = 2 and m = 10 are used in the comparison.

For data 1, the three clusters are well separated, so a small value of α is desirable; the proposed method gives α = 0.27 for S-FCM*. Table 1 shows that S-FCM* needs the fewest iterations and the shortest running time (s), and the classification rates of FCM, S-FCM and S-FCM* are all 100%. For data 2, the clusters are moved slightly closer, so a slightly larger value of α is desirable; we obtain α = 0.33. Again, S-FCM* needs the fewest iterations and the shortest running time, and the classification rates of FCM, S-FCM and S-FCM* are all 97.67%. For data 3, the clusters are moved closer still, so a larger value of α is desirable; we obtain α = 0.38. S-FCM* again needs the fewest iterations and the shortest running time, and S-FCM and S-FCM* both reach a classification rate of 89%, better than the 88.33% of FCM.

These experiments indicate that S-FCM* improves the convergence speed while preserving the good classification accuracy of S-FCM.

Figure 3. The plot of data 3.

UCI Machine Learning Datasets
In this section, we perform experiments on the following datasets from the UCI Machine Learning Repository [18]: Iris, Wine, Ionosphere, Sonar, GCM_efg and Leukemia. The Iris plants data set is the best-known data set in the pattern recognition literature; it consists of 150 labeled vectors of four dimensions. Wine consists of 178 labeled vectors of 13 dimensions, Ionosphere of 351 vectors of 34 dimensions, and Sonar of 208 vectors of 60 dimensions. GCM_efg and Leukemia are high-dimensional data sets: GCM_efg consists of 43 vectors of 16,063 dimensions, and Leukemia of 72 vectors of 7129 dimensions. Each algorithm is run 100 times, and the average results (number of iterations, running time and classification rate) are given in Table 2.
The proposed method gives α = 0.30 for the Iris data and α = 0.41 for the … data. On every dataset, S-FCM* needs the fewest iterations and the shortest running time (s) on average. For Iris, S-FCM and S-FCM* both reach a classification rate of 88.67%, better than the 88% of FCM. For Wine, S-FCM and S-FCM* both reach 69.54%, versus 68.54% for FCM. For Ionosphere, S-FCM and S-FCM* both reach 70.66%, versus 69.8% for FCM. For GCM_efg, S-FCM and S-FCM* both reach 74.42%, versus 69.77% for FCM. For Leukemia, S-FCM and S-FCM* both reach 87.5%, versus 69.44% for FCM. For Sonar, FCM and S-FCM* both reach 55.77%, better than the 55.29% of S-FCM, which shows that setting α = 0.5 is not always a good choice. The fuzzy factors m = 2 and m = 10 are used in the comparison.

Conclusion
In this paper, we proposed S-FCM*, a method that selects a fixed suppression rate for the suppressed fuzzy c-means clustering algorithm from the structure of the data itself. The experimental results show that the proposed method is a better way to select the suppression rate in the suppressed fuzzy c-means clustering algorithm: S-FCM* improves the convergence speed while preserving good classification accuracy on average.

Table 2. Average computational performance of FCM, S-FCM and S-FCM* over 100 runs.