
Classifying data into meaningful groups is one of the fundamental ways of understanding and learning valuable information. High-quality clustering methods are necessary for the effective and efficient analysis of ever-growing data. The Firefly Algorithm (FA) is a bio-inspired algorithm that has recently been used to solve clustering problems. In this paper, the Hybrid F-Firefly algorithm is developed by combining Fuzzy C-Means (FCM) with FA to improve the clustering accuracy with a global optimum solution. The Hybrid F-Firefly algorithm incorporates an FCM operator at the end of each iteration of FA. The proposed algorithm is designed to exploit the strengths of the existing algorithms: it enhances the original FA while addressing the shortcomings of FCM, such as trapping in local optima and sensitivity to the initial seed points. In this research work, the Hybrid F-Firefly algorithm is implemented and experimentally tested for various performance measures on six different benchmark datasets. The experimental results show that the Hybrid F-Firefly algorithm significantly improves the intra-cluster distance when compared with existing algorithms such as k-means, FCM and FA.

The spread of information via the World Wide Web has generated a continuously growing need for better techniques for discovering, accessing, and sharing knowledge from various domains. The increase in both the volume and the variety of data requires advanced technologies that automatically understand, process, and summarize the information meaningfully according to the requirements. Human analysis of this massive amount of data is tedious and no longer accurate, so intelligent techniques are needed to analyze it. Many data mining techniques are available to retrieve the hidden information from this vast amount of data. Unsupervised learning, or clustering, is a powerful data mining technique that groups objects into clusters based on the similarity between the objects [

The clustering process discovers meaningful and natural clusters in a dataset. The major obstacle in clustering is that no prior knowledge about the given dataset is available. Many data clustering algorithms are available in the literature; the most widely used categories are partitional and hierarchical algorithms. The most popular class of partitional clustering is the FCM clustering algorithm. However, FCM depends on the initial seed points and can converge to local optima. The FCM clustering algorithm is applied to a wide variety of geostatistical data analysis problems [

Optimization is the process of finding feasible solutions to a defined problem. Bio-inspired optimization algorithms have attracted many researchers to solve problems in diverse fields. The main objective of these algorithms is to efficiently determine a near-optimal solution for the given problem statement. FA is one of the recently introduced swarm intelligence techniques; it is a stochastic, nature-inspired, meta-heuristic algorithm used for solving complex problems. FA has been applied to clustering, and its performance has been compared with commonly used optimization methods. As in other population-based algorithms, the performance of FA depends on the population size, the attractiveness factor (β), the light absorption coefficient (γ) and the distance between two fireflies (r). These performance measures indicate that FA-based clustering is an efficient, reliable, and robust method that can be applied successfully to produce optimal cluster centers. Recently, the firefly algorithm has been utilized by Senthilnath [

It is observed from the literature that combining bio-inspired algorithms with traditional data clustering techniques will produce even better results and overcome the drawbacks in partitional clustering algorithms [

The literature shows that hybrid algorithms improve the performance of the clustering process. Hence, to overcome the shortcomings of the existing algorithms and to improve the clustering accuracy, a hybrid algorithm is proposed in this research work. The aim is to find the centroids of the clusters by minimizing the objective function, the sum of distances between the objects and their cluster centers. For example, given N objects, the problem is to allocate each object to one of the k cluster centers so that the sum of squared distances between each object and its center is minimized. The research work presented here focuses on developing a hybrid clustering algorithm by combining FA with the FCM algorithm. The proposed hybrid algorithm forms high-quality clusters by utilizing the advantages of both partitional clustering and bio-inspired algorithms.
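The objective described above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and hard assignment of each object to its nearest center; the function names and the toy data are hypothetical and are not taken from the authors' implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def clustering_objective(points, centers):
    """Sum over all objects of the squared distance to the nearest center."""
    return sum(min(squared_distance(x, c) for c in centers) for x in points)


# Toy data (hypothetical): two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_centers = [(0.0, 0.5), (10.0, 0.5)]   # one center per group
poor_centers = [(5.0, 0.5), (5.0, 5.0)]    # centers far from both groups

good_value = clustering_objective(points, good_centers)
poor_value = clustering_objective(points, poor_centers)
print(good_value, poor_value)
```

Centers placed inside the two groups give a much smaller objective value than centers placed between them, which is the quantity a clustering algorithm of this kind tries to minimize.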

This paper is organized as follows: Section 2 gives an overview of the Firefly Algorithm. The phases of the proposed Hybrid F-Firefly algorithm are described in Section 3. Section 4 describes the experimental study, and the experimental results under various performance metrics for the proposed and existing clustering algorithms are shown graphically and discussed in Section 5. Section 6 concludes this research work.

The common idea behind this algorithm is that fireflies glow brighter when they attract other nearby fireflies. The attraction between two fireflies decreases as the distance between them increases. If no nearby firefly is brighter than a particular firefly, that firefly moves randomly in the search space. The flashing characteristics of the fireflies follow three idealized rules [

・ All fireflies are unisex, so that one firefly is attracted to other fireflies irrespective of their sex.

・ Attractiveness is proportional to brightness; that is, the less bright firefly moves towards the brighter one. Both attractiveness and brightness decrease as the distance between the fireflies increases. A firefly moves randomly if no nearby firefly is brighter than it.

・ The brightness of the firefly shall be associated with the objective function.

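Two features are central to FA: the variation in light intensity and the formulation of attractiveness. The attractiveness of a firefly depends on its brightness, which in turn is associated with the objective function. FA is based on the rule that the light intensity $I$ decreases as the square of the distance, $r^2$, increases. For an optimization problem, the brightness $I$ of the firefly at a particular location $x$ can be chosen in proportion to the value of the objective function, $I(x) \propto f(x)$.

In the simplest form, the light intensity $I(r)$ follows the inverse square law:

$$I(r) = \frac{I_s}{r^2} \tag{1}$$

where $I_s$ is the intensity at the source.

For a given medium with absorption coefficient $\gamma$, the light intensity $I$ varies with the distance $r$ as:

$$I(r) = I_0 e^{-\gamma r} \tag{2}$$

where $I_0$ is the original light intensity.

The combined effect of the inverse square law and absorption can be approximated in Gaussian form:

$$I(r) = I_0 e^{-\gamma r^2} \tag{3}$$

A firefly's attractiveness is proportional to the light intensity seen by adjacent fireflies. The attractiveness $\beta$ of a firefly can be defined as:

$$\beta(r) = \beta_0 e^{-\gamma r^2} \tag{4}$$

where $\beta_0$ is the attractiveness at $r = 0$.

The Euclidean distance between firefly $i$ at $x_i$ and firefly $j$ at $x_j$ is:

$$r_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{d} (x_{i,k} - x_{j,k})^2} \tag{5}$$

where $x_{i,k}$ is the $k$-th component of the position $x_i$ and $d$ is the number of dimensions.

The movement of firefly $i$ when it is attracted to another, more attractive (brighter) firefly $j$ is given by:

$$x_i = x_i + \beta_0 e^{-\gamma r_{ij}^2} (x_j - x_i) + \alpha \left( \mathrm{rand} - \tfrac{1}{2} \right) \tag{6}$$

The second term represents the attraction and the third term the randomization, where $\alpha$ is the randomization parameter and "rand" is a random number generator uniformly distributed between 0 and 1.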
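The attractiveness and movement rule described above can be sketched in a few lines. This is an illustrative sketch only, assuming the standard Gaussian attractiveness and a uniform random perturbation centred on zero; the function names and the default parameter values are assumptions, not the authors' code.

```python
import math
import random


def distance(xi, xj):
    """Euclidean distance r between two firefly positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def attractiveness(r, beta0=1.0, gamma=1.0):
    """Attractiveness at distance r: beta0 * exp(-gamma * r^2)."""
    return beta0 * math.exp(-gamma * r * r)


def move_towards(xi, xj, beta0=1.0, gamma=1.0, alpha=0.2, rng=random):
    """Move firefly i towards the brighter firefly j.

    Each coordinate gets an attraction term plus a random term
    alpha * (rand - 0.5), with rand uniform in [0, 1).
    """
    beta = attractiveness(distance(xi, xj), beta0, gamma)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(xi, xj)]


# With alpha = 0 the step is purely the attraction term.
moved = move_towards([0.0, 0.0], [1.0, 0.0], alpha=0.0)
print(moved)
```

Because the attractiveness decays with the squared distance, distant fireflies barely influence each other, while nearby ones are pulled together strongly.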

In FA, the optimization of the objective function depends on the brightness and movement of the fireflies. A brighter firefly attracts nearby fireflies of lower brightness. The algorithm starts by initializing the population of fireflies. The brightness of a firefly determines the movement of the fireflies. In the iterative process, the intensity of the i^{th} firefly is compared with the intensity of the j^{th} firefly. Based on the difference in intensity, either the i^{th} firefly moves towards the j^{th} firefly or the j^{th} firefly moves towards the i^{th} firefly. The best solution obtained is continuously updated until a stopping criterion is satisfied. Once the iterative process ends, the best solution is determined.
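The iterative process just described can be sketched as a small minimization routine. This is a simplified illustration under stated assumptions: a lower objective value is treated as a brighter firefly, positions are clipped to a box, the brightest firefly stays in place rather than moving randomly, and the test function `sphere` and all parameter values are hypothetical.

```python
import math
import random


def firefly_minimize(f, dim, n_fireflies=20, iterations=100,
                     beta0=1.0, gamma=1.0, alpha=0.2,
                     bounds=(-5.0, 5.0), seed=0):
    """Basic firefly loop: lower f(x) is treated as brighter."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Initialization: random positions in the search space.
    swarm = [[rng.uniform(lo, hi) for _ in range(dim)]
             for _ in range(n_fireflies)]
    cost = [f(x) for x in swarm]
    best = min(range(n_fireflies), key=cost.__getitem__)
    best_cost, best_x = cost[best], list(swarm[best])
    for _ in range(iterations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter than i
                    r2 = sum((a - b) ** 2
                             for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    # Attraction towards j plus a small random step.
                    swarm[i] = [
                        min(hi, max(lo, a + beta * (b - a)
                                    + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    cost[i] = f(swarm[i])
                    if cost[i] < best_cost:  # keep the best solution so far
                        best_cost, best_x = cost[i], list(swarm[i])
    return best_x, best_cost


def sphere(x):
    """Hypothetical test objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_cost = firefly_minimize(sphere, dim=2)
print(best_x, best_cost)
```

Tracking the best solution outside the swarm is a design choice of this sketch; it guarantees that the returned value never gets worse from one iteration to the next.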

The literature shows that FA can outperform many other algorithms. FA has been extended, and new variants have emerged to solve many kinds of optimization problems. FA is a swarm-intelligence-based algorithm, so it shares the advantages of other swarm-intelligence-based algorithms. FA has two major advantages over other algorithms, namely automatic subdivision and the ability to deal with multimodality [

・ FA is based on attraction, and this attraction decreases as the distance increases. This leads the whole population to subdivide automatically into groups, each of which can search around a local optimum. Among all these local optima, the global solution can be found.

・ This subdivision allows the fireflies to find all optima simultaneously if the population size is sufficiently large.

FA algorithm has some limitations such as:

・ FA parameters are fixed and do not change over time.

・ FA does not memorize the history of the best solution for each firefly and moves regardless of it, so a firefly can miss its best solution.

The existing FA suffers from its inability to memorize the history of the best solution for each firefly, and a firefly moves randomly if no nearby firefly is brighter than it. This problem can be overcome by the proposed Hybrid F-Firefly algorithm, in which the FCM operator is incorporated at the end of each iteration of the existing FA. Incorporating the FCM algorithm provides a significant improvement in the performance of the firefly algorithm. The proposed Hybrid F-Firefly algorithm consists of four major phases: the initialization phase, the intensity calculation phase, the movement calculation phase and the FCM algorithm phase. Each phase is described in detail below.

In the F-Firefly algorithm, a population (S) of fireflies is created within the search space. Based on the objective function, all the agents (fireflies) are randomly distributed in the search space. The position of each agent represents a possible solution (centroids) for the clustering problem. Furthermore, this phase assigns the algorithm parameters, such as the attractiveness (β_{0}) and the light absorption coefficient (γ).

After the initialization phase, the intensity of each firefly is evaluated by measuring the distance between the position of the firefly and every data object in the dataset. For each data object, the minimum distance to the firefly is taken, and the intensity of the firefly is the sum of these minimum distances over the dataset:

$$I = \sum_{i=1}^{n} d_{i} \tag{7}$$

where $d_{i}$ is the minimum distance value for a particular firefly with respect to the $i$-th data object and $n$ is the number of data objects.
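The intensity calculation can be sketched as below. The sketch assumes that each firefly encodes a list of candidate centroids and that the "minimum distance" for a data object is its Euclidean distance to the nearest of those centroids; both the encoding and the function name are assumptions made for illustration.

```python
import math


def intensity(firefly, data):
    """Sum, over all data objects, of the distance to the nearest centroid.

    `firefly` is a list of centroids; `data` is a list of points.
    A smaller value means more compact clusters.
    """
    return sum(min(math.dist(x, c) for c in firefly) for x in data)


# Toy example (hypothetical data): two points, one or two centroids.
data = [(0.0, 0.0), (3.0, 4.0)]
one_centroid = [(0.0, 0.0)]
two_centroids = [(0.0, 0.0), (3.0, 4.0)]

print(intensity(one_centroid, data), intensity(two_centroids, data))
```

With a single centroid at the origin the second point contributes its full distance, while placing a centroid on each point drives the intensity to zero.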

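The brightness of a firefly determines the movement of the fireflies in the search space. After the intensity calculation, the fireflies are compared to find their new positions. During the iterative process, the intensity of one firefly is compared with that of the other fireflies in the swarm, and the difference in brightness triggers the movement. The distance travelled depends on the attractiveness between the two fireflies. For example, consider fireflies “i” and “j”: the intensity value of firefly “i” is compared with that of firefly “j”. If the intensity value of firefly “j” is greater than that of firefly “i”, then firefly “i” is moved towards firefly “j”. Based on the intensity values of the fireflies, the movement calculation is performed using Equation (8):

$$x_i = x_i + \beta_0 e^{-\gamma r_{ij}^2} (x_j - x_i) + \alpha \left( \mathrm{rand} - \tfrac{1}{2} \right) \tag{8}$$

where $x_i$ and $x_j$ are the positions of fireflies “i” and “j”, $r_{ij}$ is the Euclidean distance between them, $\beta_0$ is the attractiveness at $r = 0$, $\gamma$ is the light absorption coefficient, $\alpha$ is the randomization parameter and “rand” is a random number uniformly distributed between 0 and 1.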

Once firefly “i” has moved to its new position, its intensity value is updated. The above procedure is repeated for all the fireflies, keeping one of the fireflies constant each time. The intensity value is calculated for the new position of the firefly obtained after the movement calculation.

The proposed Hybrid F-Firefly algorithm modifies the existing FA by incorporating the FCM operator (one step of the FCM algorithm) to enhance the clustering performance. The main reason for incorporating the FCM operator at this stage is that the FCM algorithm finds the local optimal solution effectively. Based on the current intensity value, the new position of the entire population of fireflies is calculated by applying the FCM operator. The FCM operator updates the membership values $u_{ij}$ and the position of the firefly $f_{j}$ using Equations (9) and (10), respectively [

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{\lVert x_i - f_j \rVert}{\lVert x_i - f_k \rVert} \right)^{\frac{2}{m-1}}} \tag{9}$$

where $u_{ij}$ is the degree of membership of $x_{i}$ in the firefly $j$, $c$ is the number of cluster centers, the value of the fuzziness component is “m” = 2 and $x_{i}$ is the data associated with the firefly under study.

$$f_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} x_i}{\sum_{i=1}^{n} u_{ij}^{m}} \tag{10}$$

where $n$ is the number of data objects.
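One step of the FCM operator can be sketched with the standard fuzzy C-means update rules, which are assumed here to match the membership and position updates referred to above. The function name, the small `eps` guard against division by zero and the toy data are illustrative assumptions.

```python
import math


def fcm_step(data, centers, m=2.0, eps=1e-12):
    """One FCM update: memberships u[i][j], then new cluster centers."""
    c = len(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in data:
        # Distances to every center; eps avoids division by zero when a
        # data object coincides with a center.
        d = [max(math.dist(x, center), eps) for center in centers]
        memberships.append([
            1.0 / sum((d[j] / d[k]) ** exponent for k in range(c))
            for j in range(c)
        ])
    dim = len(data[0])
    new_centers = []
    for j in range(c):
        # Weighted mean of the data with weights u_ij^m.
        weights = [memberships[i][j] ** m for i in range(len(data))]
        total = sum(weights)
        new_centers.append([
            sum(w * x[t] for w, x in zip(weights, data)) / total
            for t in range(dim)
        ])
    return memberships, new_centers


# Toy one-dimensional data (hypothetical) with two obvious groups.
data = [(0.0,), (1.0,), (9.0,), (10.0,)]
u, centers = fcm_step(data, [(0.0,), (10.0,)])
print(centers)
```

Each row of the membership matrix sums to one, and each updated center is pulled towards the group of points that belong to it most strongly.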

The new position of each firefly is evaluated and its intensity value is updated. The fireflies are sorted by intensity value before moving to the next iteration. During the iterative process, the best solution obtained so far is continuously updated, and all the above phases of the F-Firefly algorithm are repeated until the stopping criteria are reached. After the iterative process ends, the best solution is determined by ranking the positions of the fireflies, and post-processing is initiated to select the final centroids. The overall process of the proposed Hybrid F-Firefly algorithm is represented in the flowchart and it is shown in
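The four phases can be put together in a compact sketch. This is not the authors' Java implementation: it assumes that each firefly encodes k centroids, that a lower intensity (sum of nearest-centroid distances) is better, that the initial centroids are drawn from the data, and that standard FCM updates serve as the FCM operator. All names and parameter values are hypothetical.

```python
import math
import random


def intensity(firefly, data):
    """Sum of each data object's distance to its nearest centroid."""
    return sum(min(math.dist(x, c) for c in firefly) for x in data)


def fcm_operator(firefly, data, m=2.0, eps=1e-12):
    """One FCM step applied to the centroids held by a firefly."""
    k, dim = len(firefly), len(data[0])
    exponent = 2.0 / (m - 1.0)
    sums = [[0.0] * dim for _ in range(k)]
    totals = [0.0] * k
    for x in data:
        d = [max(math.dist(x, c), eps) for c in firefly]
        for j in range(k):
            u = 1.0 / sum((d[j] / d[l]) ** exponent for l in range(k))
            w = u ** m
            totals[j] += w
            for t, v in enumerate(x):
                sums[j][t] += w * v
    return [[s / totals[j] for s in sums[j]] for j in range(k)]


def hybrid_f_firefly(data, k, n_fireflies=20, iterations=50,
                     beta0=1.0, gamma=1.0, alpha=0.1, seed=0):
    rng = random.Random(seed)
    # Phase 1: initialization (centroids drawn from the data, an assumption).
    swarm = [[list(rng.choice(data)) for _ in range(k)]
             for _ in range(n_fireflies)]
    # Phase 2: intensity calculation.
    cost = [intensity(f, data) for f in swarm]
    best = min(range(n_fireflies), key=cost.__getitem__)
    best_cost, best_pos = cost[best], [c[:] for c in swarm[best]]
    for _ in range(iterations):
        # Phase 3: movement towards better (lower-intensity) fireflies.
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:
                    r2 = sum((a - b) ** 2
                             for ci, cj in zip(swarm[i], swarm[j])
                             for a, b in zip(ci, cj))
                    beta = beta0 * math.exp(-gamma * r2)
                    swarm[i] = [[a + beta * (b - a)
                                 + alpha * (rng.random() - 0.5)
                                 for a, b in zip(ci, cj)]
                                for ci, cj in zip(swarm[i], swarm[j])]
                    cost[i] = intensity(swarm[i], data)
        # Phase 4: FCM operator at the end of the iteration.
        swarm = [fcm_operator(f, data) for f in swarm]
        cost = [intensity(f, data) for f in swarm]
        # Sort by intensity and update the best solution found so far.
        order = sorted(range(n_fireflies), key=cost.__getitem__)
        swarm = [swarm[i] for i in order]
        cost = [cost[i] for i in order]
        if cost[0] < best_cost:
            best_cost, best_pos = cost[0], [c[:] for c in swarm[0]]
    return best_pos, best_cost


# Toy data (hypothetical): two small groups in the plane.
data = [(0.1, 0.2), (0.0, 0.0), (0.2, 0.1), (0.15, 0.05),
        (1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (1.05, 1.0)]
centers, cost = hybrid_f_firefly(data, k=2, iterations=30)
print(centers, cost)
```

The sketch keeps the best solution separately from the swarm, so the FCM step can refine every firefly without any risk of losing the best centroids seen so far.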

Experiments are conducted to evaluate the performance of the proposed and existing algorithms. All the algorithms are implemented in Java and executed on a computer with a Core i5 processor, 2.1 GHz, and 4 GB RAM. The initial setup for the experiments is as follows: number of fireflies (S) = 20, attractiveness (β_{0}) = 1 and light absorption coefficient (γ) = 1. To present the effectiveness of the proposed Hybrid F-Firefly clustering algorithm, the most commonly used partitional clustering algorithms are used for comparison. The two major partitional clustering algorithms, k-means and FCM, are chosen because they are commonly used for hard and soft clustering, respectively, in a wide range of applications. FA is also included in the comparison because the Hybrid F-Firefly algorithm is a modification of the original FA.

The performance of the F-Firefly algorithm is compared with the existing algorithms based on the intra-cluster distance and cluster validity metrics. Here, internal validity measures, namely the Beta index, Distance index, and DB index, are used for evaluation and comparison. To perform the experiments on the Hybrid F-Firefly and existing algorithms, six benchmark datasets, namely Abalone, Zoo, Iris, Wine, Liver and Thyroid, are selected from the UCI machine learning repository.

The experimental results obtained for the proposed Hybrid F-Firefly algorithm are highlighted in this section. To show the effectiveness and the strength of the Hybrid F-Firefly algorithm, experiments have been conducted using the intra-cluster distance and the cluster validity measures. The Hybrid F-Firefly and existing algorithms are executed for a maximum of 200 iterations and the cluster size is fixed at 3. The results are tabulated for the Hybrid F-Firefly, k-means, FCM and FA algorithms, and comparisons are made under various performance measures for six different datasets.

A lower intra-cluster distance value indicates a better clustering algorithm. The average intra-cluster distance calculated for the FCM, k-means, FA, and proposed Hybrid F-Firefly algorithms is tabulated in

The experimental results of the beta index obtained by Hybrid F-Firefly and existing algorithms using six different datasets are shown graphically in

The performance of the Hybrid F-Firefly and existing algorithms, evaluated using the Davies-Bouldin index on the six benchmark datasets, is shown in

The distance index is calculated as the ratio of average intra-cluster distance and average inter-cluster distance.
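The ratio above can be sketched as follows. The text does not spell out how the two averages are computed, so this sketch assumes that the average intra-cluster distance is the mean distance of each object to its assigned center and that the average inter-cluster distance is the mean pairwise distance between centers; the function name and data are hypothetical.

```python
import math
from itertools import combinations


def distance_index(data, labels, centers):
    """Average intra-cluster distance divided by average inter-cluster distance."""
    # Mean distance from each object to the center of its own cluster.
    intra = sum(math.dist(x, centers[label])
                for x, label in zip(data, labels)) / len(data)
    # Mean distance over all pairs of cluster centers.
    pairs = list(combinations(centers, 2))
    inter = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return intra / inter


# Toy example (hypothetical): two compact clusters that are far apart.
data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centers = [(0.0, 1.0), (10.0, 1.0)]

index_value = distance_index(data, labels, centers)
print(index_value)
```

Compact clusters that lie far apart give a small ratio, which is why a lower distance index indicates a better clustering.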

Average intra-cluster distance for the existing algorithms and the proposed hybrid algorithm:

Datasets | FCM (existing) | k-means (existing) | Firefly (existing) | F-Firefly (proposed hybrid) |
---|---|---|---|---|
Abalone | 16,765.43 | 16,103.12 | 15,718.34 | 15,619.5 |
Zoo | 17,603.54 | 17,116.29 | 16,380.46 | 16,334.7 |
Liver | 7,581.02 | 7,309.69 | 6,520.27 | 6,508.23 |
Iris | 128.44 | 126.5 | 76.9 | 71.39 |
Thyroid | 11,356.37 | 11,062.62 | 10,088.94 | 9,943.71 |
Wine | 16,885.46 | 16,530.33 | 16,231.6 | 16,201.4 |

A lower distance index value indicates better performance of the clustering algorithm. To evaluate the impact of the Hybrid F-Firefly algorithm, comparisons are performed with the existing k-means, FCM and FA algorithms. In

In this research work, a Hybrid F-Firefly algorithm is developed by incorporating the FCM operator at the end of each iteration of FA. Various performance measures are evaluated for the proposed and existing algorithms on six different datasets. The results show that the average intra-cluster distance obtained by the Hybrid F-Firefly algorithm is the lowest on all six datasets, an improvement of 0.2% - 44.4% over the FCM, k-means, and FA algorithms. In addition, the proposed Hybrid F-Firefly algorithm provides a 20% - 40% improvement in beta index value, a 20% - 23% improvement in DB index value, and a 26% - 33% improvement in distance index value when compared with FA. These improvements in the performance of the Hybrid F-Firefly clustering algorithm are due to the combination of the local search capability of FCM with the global search property of FA. The experimental results show that the proposed Hybrid F-Firefly algorithm performs better than the original FA.
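The quoted range of intra-cluster distance improvements can be checked directly from the tabulated values. The sketch below assumes the improvement is measured as the relative reduction with respect to each existing algorithm; the dictionary layout and function name are illustrative.

```python
# Average intra-cluster distances copied from the results table.
results = {
    "Abalone": {"FCM": 16765.43, "k-means": 16103.12, "Firefly": 15718.34, "F-Firefly": 15619.5},
    "Zoo": {"FCM": 17603.54, "k-means": 17116.29, "Firefly": 16380.46, "F-Firefly": 16334.7},
    "Liver": {"FCM": 7581.02, "k-means": 7309.69, "Firefly": 6520.27, "F-Firefly": 6508.23},
    "Iris": {"FCM": 128.44, "k-means": 126.5, "Firefly": 76.9, "F-Firefly": 71.39},
    "Thyroid": {"FCM": 11356.37, "k-means": 11062.62, "Firefly": 10088.94, "F-Firefly": 9943.71},
    "Wine": {"FCM": 16885.46, "k-means": 16530.33, "Firefly": 16231.6, "F-Firefly": 16201.4},
}


def improvement_percent(baseline, proposed):
    """Relative reduction of the proposed value with respect to the baseline."""
    return 100.0 * (baseline - proposed) / baseline


improvements = [
    improvement_percent(row[name], row["F-Firefly"])
    for row in results.values()
    for name in ("FCM", "k-means", "Firefly")
]
smallest, largest = min(improvements), max(improvements)
print(f"{smallest:.1f}% to {largest:.1f}%")
```

The smallest gain comes from the comparison with FA on the Liver dataset and the largest from the comparison with FCM on the Iris dataset, which reproduces the 0.2% - 44.4% range stated above.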

Krishnamoorthi Murugasamy, Kalamani Murugasamy (2016) Hybrid Clustering Using Firefly Optimization and Fuzzy C-Means Algorithm. Circuits and Systems, 07, 2339-2348. doi: 10.4236/cs.2016.79204