TITLE:
On Data-Independent Properties for Density-Based Dissimilarity Measures in Hybrid Clustering
AUTHORS:
Kajsa Møllersen, Subhra S. Dhar, Fred Godtliebsen
KEYWORDS:
Background Noise, Gaussian Mixture Distribution, Kullback-Leibler, Outliers, Subcluster Weight
JOURNAL NAME:
Applied Mathematics, Vol. 7, No. 15, September 12, 2016
ABSTRACT: Hybrid clustering combines partitional and
hierarchical clustering for computational effectiveness and versatility in
cluster shape. In such clustering, a dissimilarity measure plays a crucial role
in the hierarchical merging. The dissimilarity measure has a great impact on the
final clustering, and data-independent properties are needed to choose the
right dissimilarity measure for the problem at hand. Properties for
distance-based dissimilarity measures have been studied for decades, but
properties for density-based dissimilarity measures have so far received little
attention. Here, we propose six data-independent properties to evaluate density-based
dissimilarity measures associated with hybrid clustering, regarding equality,
orthogonality, symmetry, outlier and noise observations, and light-tailed
models for heavy-tailed clusters. The significance of the properties is
investigated, and we study some well-known dissimilarity measures based on
Shannon entropy, misclassification rate, Bhattacharyya distance and
Kullback-Leibler divergence with respect to the proposed properties. As none of
them satisfy all the proposed properties, we introduce a new dissimilarity
measure based on the Kullback-Leibler information and show that it satisfies
all proposed properties. The effect of the proposed properties is also
illustrated on several real and simulated data sets.
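Two of the density-based dissimilarity measures named in the abstract, the Bhattacharyya distance and the Kullback-Leibler divergence, have well-known closed forms for Gaussian components. The sketch below (not the paper's own implementation; the univariate Gaussian case is chosen purely for illustration) computes both, which also makes the symmetry property concrete: the Bhattacharyya distance is symmetric in its arguments, while the Kullback-Leibler divergence is not.

```python
import math

def kl_gauss(mu1, s1, mu2, s2):
    """Closed-form Kullback-Leibler divergence KL(N(mu1, s1^2) || N(mu2, s2^2))
    for univariate Gaussians; note the asymmetry in the two arguments."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5

def bhattacharyya_gauss(mu1, s1, mu2, s2):
    """Closed-form Bhattacharyya distance between two univariate Gaussians;
    symmetric under exchange of the two distributions."""
    v1, v2 = s1**2, s2**2
    return (0.25 * (mu1 - mu2)**2 / (v1 + v2)
            + 0.5 * math.log((v1 + v2) / (2 * s1 * s2)))

# Identical distributions: both measures vanish (the equality property).
print(kl_gauss(0.0, 1.0, 0.0, 1.0))            # 0 for identical Gaussians
print(bhattacharyya_gauss(0.0, 1.0, 0.0, 1.0)) # 0 for identical Gaussians

# Distinct distributions: Bhattacharyya is symmetric, KL is not.
print(bhattacharyya_gauss(0.0, 1.0, 2.0, 3.0) ==
      bhattacharyya_gauss(2.0, 3.0, 0.0, 1.0))
print(kl_gauss(0.0, 1.0, 2.0, 3.0) == kl_gauss(2.0, 3.0, 0.0, 1.0))
```

For Gaussian mixtures, as considered in the paper, neither measure has a closed form between whole mixtures, which is part of what motivates studying such measures through data-independent properties rather than formulas alone.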