TITLE:
CDV Index: A Validity Index for Better Clustering Quality Measurement
AUTHORS:
Jian-Hua Yeh, Fei-Jie Joung, Jia-Chi Lin
KEYWORDS:
Cluster Validity Index; Unsupervised Learning; K-Means Clustering; Intra Cluster Compactness; Inter Cluster Dispersedness
JOURNAL NAME:
Journal of Computer and Communications, Vol.2 No.4, March 18, 2014
ABSTRACT:
In this paper, a cluster validity index called the CDV index is presented. The
CDV index provides a quality measurement for the goodness of a clustering
result on a data set. It is composed of three major factors: a statistically
calculated external diameter factor, a restorer factor that reduces the effect
of data dimension, and a punishment factor related to the number of clusters.
The product of the three factors is calculated under various settings of the
number of clusters, and the best clustering result is found by searching for
the minimum value of the CDV curve. In
the empirical experiments presented in this research, the K-Means clustering
method is chosen for its simplicity and execution speed. To demonstrate the
effectiveness and superiority of the CDV index, several traditional cluster
validity indexes were implemented as the control group, including DI, DBI,
ADI, and the PBM index, the most effective index of recent years. The data
sets are also carefully selected to demonstrate the generality of the CDV
index: three real-world data sets and three artificial data sets that simulate
real-world data distributions. All of these data sets are tested to present
the superior features of the CDV index.
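
The selection procedure described in the abstract (run K-Means for several numbers of clusters, compute a validity index for each result, and take the minimum of the resulting curve) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the CDV formula, so the Davies-Bouldin index (DBI, one of the control-group indexes, also minimized) stands in for it. The function names, the farthest-first initialization (used here to keep the run deterministic), and the toy data are all hypothetical and are not taken from the paper.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption, not the paper's method)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=100):
    """Plain Lloyd's K-Means; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def davies_bouldin(points, centroids, labels):
    """Davies-Bouldin index (lower is better); stand-in for the CDV index.

    Assumes all centroids are distinct, so no centroid distance is zero.
    """
    k = len(centroids)
    scatter = []
    for c in range(k):
        dists = [math.dist(p, centroids[c]) for p, lab in zip(points, labels) if lab == c]
        scatter.append(sum(dists) / len(dists) if dists else 0.0)
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst_ratios) / k


def select_k(points, k_values):
    """Evaluate the index for each k and return the k at the curve's minimum."""
    scores = {}
    for k in k_values:
        centroids, labels = kmeans(points, k)
        scores[k] = davies_bouldin(points, centroids, labels)
    return min(scores, key=scores.get), scores


# Toy data: three well-separated 2-D groups (hypothetical, for illustration only).
demo_points = [
    (0, 0), (1, 0), (0, 1), (1, 1),
    (10, 0), (11, 0), (10, 1), (11, 1),
    (0, 10), (1, 10), (0, 11), (1, 11),
]
chosen_k, scores = select_k(demo_points, range(2, 6))
print(chosen_k, scores)
```

Swapping in a different index only requires replacing `davies_bouldin` with another function of the same signature, provided that index is also one whose minimum marks the best clustering.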