Parallel and Hierarchical Mode Association Clustering with an R Package Modalclust

DOI: 10.4236/ojs.2014.410078


Modalclust is an R package that performs Hierarchical Mode Association Clustering (HMAC), along with a parallel version that runs over several processors. Modal clustering techniques are designed to extract clusters of arbitrary density shape efficiently, even in high dimensions. Clustering is performed at several resolutions and the results are summarized as a hierarchical tree, providing a model-based, multi-resolution cluster analysis. Finally, we introduce a novel parallel implementation of HMAC that distributes the clustering job over several processors, dramatically increasing the speed of the clustering procedure, especially for large data sets. The package also provides a number of functions for visualizing clusters in high dimensions, which can be used with other clustering software as well.
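
The mode-association idea behind HMAC can be illustrated with a small sketch. The Python below is not Modalclust code and does not mirror its R interface. It assumes, following the mode-identification approach of Li, Ray and Lindsay (2007), an isotropic Gaussian kernel density estimate, a modal EM fixed-point update that moves each point uphill to a local mode of that density, and a hierarchy built by repeating the ascent from the previous level's modes at each larger bandwidth. The names modal_em, group_modes and hmac_sketch, the tolerances, and the rule that end points closer than merge_tol share a cluster are all illustrative choices.

```python
import math


def modal_em(data, start, sigma, tol=1e-10, max_iter=1000):
    """Hill-climb from `start` to a local mode of the Gaussian kernel density
    estimate of `data` (isotropic bandwidth `sigma`) with the modal EM
    fixed-point update x <- sum_i p_i(x) * x_i."""
    x = list(start)
    for _ in range(max_iter):
        # E-step: posterior weight p_i(x) of each kernel component; shift the
        # exponents by their maximum so exp() does not underflow to all zeros
        logs = [-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * sigma ** 2)
                for p in data]
        top = max(logs)
        w = [math.exp(v - top) for v in logs]
        total = sum(w)
        # M-step: the new point is the p_i-weighted mean of the data
        new = [sum(wi * p[j] for wi, p in zip(w, data)) / total
               for j in range(len(x))]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


def group_modes(modes, merge_tol):
    """Label each ascent end point with the first representative lying within
    `merge_tol` of it, or open a new cluster if there is none."""
    reps, labels = [], []
    for m in modes:
        for k, r in enumerate(reps):
            if math.dist(m, r) < merge_tol:
                labels.append(k)
                break
        else:
            reps.append(m)
            labels.append(len(reps) - 1)
    return labels, reps


def hmac_sketch(data, sigmas, merge_tol=1e-3):
    """Return one (sigma, point_labels, modes) triple per level, smallest sigma first."""
    starts = [list(p) for p in data]
    point_labels = list(range(len(data)))  # before level 1: every point is its own cluster
    tree = []
    for sigma in sorted(sigmas):
        # climb the full-data density at this bandwidth from the previous level's modes
        ends = [modal_em(data, s, sigma) for s in starts]
        labels, reps = group_modes(ends, merge_tol)
        # each point inherits the new label of the mode it belonged to before
        point_labels = [labels[c] for c in point_labels]
        tree.append((sigma, point_labels, reps))
        starts = reps
    return tree
```

For example, hmac_sketch([(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)], [0.01, 1.0, 10.0]) gives 6, 2 and 1 clusters at the three levels: at the smallest bandwidth every point is its own mode, at sigma = 1 the two tight groups separate, and at sigma = 10 everything merges into one cluster. Every modal EM step in this sketch touches all n data points, so the work grows quickly with the sample size.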

Cite as:

Cheng, Y. and Ray, S. (2014) Parallel and Hierarchical Mode Association Clustering with an R Package Modalclust. Open Journal of Statistics, 4, 826-836. doi: 10.4236/ojs.2014.410078.

Conflicts of Interest

The authors declare no conflicts of interest.


Copyright © 2020 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.