CDV Index: A Validity Index for Better Clustering Quality Measurement

Jian-Hua Yeh; Fei-Jie Joung; Jia-Chi Lin

doi:10.4236/jcc.2014.24022

Journal of Computer and Communications > Vol.2 No.4, March 2014

Department of Computer Science and Information Engineering, Aletheia University, Taipei, Chinese Taipei.
Department of Fashion and Administration Management, St. Johns University, Taipei, Chinese Taipei.

Abstract

This paper presents a cluster validity index, the CDV index, which measures the quality of a clustering result for a data set. The CDV index is the product of three factors: a statistically calculated external diameter factor, a restorer factor that reduces the effect of data dimensionality, and a punishment factor related to the number of clusters. The product is computed for a range of cluster-count settings, and the best clustering result is the one at which the resulting CDV curve reaches its minimum. The experiments use the K-Means clustering method for its simplicity and execution speed. To demonstrate the effectiveness and superiority of the CDV index, several traditional cluster validity indexes were implemented as the control group: DI, DBI, ADI, and the PBM index, which the authors regard as the most effective index of recent years. The data sets were selected to test how well the CDV index generalizes: three real-world data sets and three artificial data sets that simulate real-world data distributions. All six data sets are tested to show the advantages of the CDV index.
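
The selection procedure described in the abstract (cluster the data for several cluster counts, score each result, keep the count with the lowest score) can be sketched as follows. The abstract does not give the formulas for the three CDV factors, so this sketch does not implement the CDV index. As a stand-in it uses a common form of the Davies-Bouldin index (DBI), one of the control-group indexes named above, which is also minimized. The function names, the farthest-point seeding, and the synthetic three-blob data are all assumptions made for illustration, not taken from the paper.

```python
import math
import random


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from all seeds so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iters=100):
    """Plain Lloyd-style K-Means; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[j])  # leave an empty cluster where it is
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def davies_bouldin(points, centroids, labels):
    """Davies-Bouldin index (stand-in for CDV); lower is better.
    Requires k >= 2 and assumes no two centroids coincide."""
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        mean_dist = sum(math.dist(p, centroids[j]) for p in members) / len(members) if members else 0.0
        scatter.append(mean_dist)
    # For each cluster, take its worst (largest) similarity to any other cluster.
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


# Hypothetical data: three well-separated 2-D Gaussian blobs.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in blob_centers
    for _ in range(30)
]

# Sweep the number of clusters and keep the setting with the minimum index value.
scores = {}
for k in range(2, 7):
    centroids, labels = kmeans(points, k)
    scores[k] = davies_bouldin(points, centroids, labels)

best_k = min(scores, key=scores.get)
print("index value per k:", scores)
print("selected number of clusters:", best_k)
```

Swapping in another minimized index only requires replacing `davies_bouldin` with a function of the same signature; the sweep over cluster counts and the search for the minimum stay the same.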

Keywords

Cluster Validity Index; Unsupervised Learning; K-Means Clustering; Intra Cluster Compactness; Inter Cluster Dispersedness

Share and Cite:

Yeh, J., Joung, F. and Lin, J. (2014) CDV Index: A Validity Index for Better Clustering Quality Measurement. Journal of Computer and Communications, 2, 163-171. doi: 10.4236/jcc.2014.24022.

Conflicts of Interest

The authors declare no conflicts of interest.

