TITLE:
Parallel K-Means Algorithm for Shared Memory Multiprocessors
AUTHORS:
Tayfun Kucukyilmaz
KEYWORDS:
K-Means, Clustering, Data Mining, Shared Memory Systems, High Performance
JOURNAL NAME:
Journal of Computer and Communications, Vol. 2, No. 11, September 12, 2014
ABSTRACT:
Clustering is the task of assigning a set
of instances to groups in such a way that the dissimilarity of instances
within each group is minimized. Clustering is widely used in areas such
as data mining, pattern recognition, machine learning, image processing,
and computer vision. K-means is a popular clustering algorithm that
partitions instances into a fixed number of clusters in an iterative fashion.
Although k-means is considered a poor clustering algorithm in terms of
result quality, its simplicity, speed in practical applications, and
iterative nature led to its selection as one of the top 10 algorithms in data mining [1].
The parallelization of k-means has also been studied over the last two decades, and most
of this work concentrates on shared-nothing architectures. With recent advances in
GPU technology, implementations of the k-means algorithm on shared memory
architectures have started to attract attention. However, to the best of our
knowledge, no in-depth analysis of the performance of k-means on shared memory
multiprocessors has been reported in the literature. In this work, we aim to fill
this gap by providing a theoretical analysis of the performance of the k-means
algorithm and presenting extensive tests on a shared memory architecture.
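The abstract describes k-means only at a high level. The following is a minimal Python sketch of the standard Lloyd-style iteration, written under our own assumptions (initial centroids drawn at random from the data, squared Euclidean distance, stopping when the centroids no longer change); it is not the author's implementation. The points are split into chunks, and each chunk's per-cluster sums and counts are computed independently and then merged in a reduction step. That independent per-chunk work is the part a shared-memory implementation would typically give to separate threads; here the chunks are processed sequentially, since Python's global interpreter lock limits thread-level speedups for pure-Python code. The names `kmeans`, `partial_stats`, `nearest`, and `n_chunks` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the closest centroid (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def partial_stats(chunk: Sequence[Point], centroids: Sequence[Point]):
    """Per-chunk coordinate sums and member counts for each cluster.

    Each call writes only to its own accumulators, so separate chunks
    could be handled by separate threads without locking.
    """
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        j = nearest(p, centroids)
        counts[j] += 1
        for t in range(d):
            sums[j][t] += p[t]
    return sums, counts


def kmeans(points: List[Point], k: int, n_chunks: int = 4,
           max_iter: int = 100, seed: int = 0):
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct input points chosen at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    size = max(1, -(-len(points) // n_chunks))  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    d = len(centroids[0])
    for _ in range(max_iter):
        # Assignment step: independent per chunk (run sequentially here).
        results = [partial_stats(c, centroids) for c in chunks]
        # Reduction step: merge the per-chunk partial sums and counts.
        total_sums = [[0.0] * d for _ in range(k)]
        total_counts = [0] * k
        for sums, counts in results:
            for j in range(k):
                total_counts[j] += counts[j]
                for t in range(d):
                    total_sums[j][t] += sums[j][t]
        # Update step: each centroid becomes the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(s / total_counts[j] for s in total_sums[j])
            if total_counts[j] else centroids[j]
            for j in range(k)
        ]
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels
```

For example, calling `kmeans` with `k=2` on the six points (0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10) places the three points near the origin and the three points near (10, 10) in separate clusters. Merging per-chunk partial results, rather than having every worker update shared centroid accumulators directly, avoids concurrent writes to the same memory locations.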