TITLE:
An Experiment of K-Means Initialization Strategies on Handwritten Digits Dataset
AUTHORS:
Boyang Li
KEYWORDS:
K-means, Clustering Performance Evaluation, Machine Learning, Principal Component Analysis
JOURNAL NAME:
Intelligent Information Management, Vol.10 No.2, February 28, 2018
ABSTRACT: Clustering is an important unsupervised classification method that divides data into groups based on some similarity metric. K-means is an increasingly popular clustering method and is widely used in different applications. Centroid initialization is the key step in K-means clustering. In general, K-means has three efficient initialization strategies to improve its performance: Random, K-means++, and PCA-based K-means. In this paper, we design an experiment to evaluate these three strategies on the UCI ML handwritten digits dataset. The experimental results show that the three initialization strategies find almost identical cluster centroids and produce almost the same clustering results, but the PCA-based K-means strategy significantly reduces running time and is faster than the other two strategies.
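
To make the difference between the initialization strategies concrete, the following is a minimal sketch of two of them, Random and K-means++, followed by ordinary Lloyd iterations. It is not the paper's experimental code: the function names (`random_init`, `kmeans_pp_init`, `lloyd`, `demo`), the toy two-dimensional data, and the seeds are all assumptions made for illustration. The PCA-based strategy is left out because it needs a linear-algebra routine that the standard library does not provide.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_init(points, k, rng):
    """Random strategy: pick k distinct data points as starting centroids."""
    return rng.sample(points, k)


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest centroid so far.
    Assumes the data contain at least k distinct points."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


def lloyd(points, centroids, max_iter=100):
    """Lloyd iterations from the given start.
    Returns (centroids, inertia, iterations used)."""
    centroids = [tuple(c) for c in centroids]
    for it in range(1, max_iter + 1):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: sq_dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia, it


def demo(seed=0):
    """Compare both strategies on three hypothetical Gaussian blobs."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
              for cx, cy in centers for _ in range(50)]
    for name, init in (("random", random_init), ("k-means++", kmeans_pp_init)):
        _, inertia, iters = lloyd(points, init(points, 3, rng))
        print(f"{name:10s} inertia={inertia:.2f} iterations={iters}")


if __name__ == "__main__":
    demo()
```

Running `demo()` prints the final inertia and the number of iterations for each strategy, which are the kind of quantities an initialization comparison looks at. Results on this toy data say nothing about the handwritten digits dataset used in the paper.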