Particle Swarm Optimized Optimal Threshold Value Selection for Clustering based on Correlation Fractal Dimension

Abstract

This paper concerns the use of fractal dimension in clustering evolving data streams. Anuradha et al. recently proposed an approach to clustering evolving data streams based on the Relative Change in Fractal Dimension (RCFD) and the damped window model. Observations on that work show that the quality of the clusters formed depends heavily on a suitable choice of threshold value. Anuradha et al. fixed the threshold value heuristically. Although the results of that approach are acceptable, the value was chosen essentially at random, which gives no basis for claiming that it is acceptable in general. This paper proposes a novel method that computes the threshold value optimally using particle swarm optimization (PSO), a population-based randomized search technique. Simulations are run on two large data sets, the KDD Cup 1999 data set and the Forest Covertype data set, and the resulting cluster quality is compared with that of the fixed-threshold approach. The comparison shows that the threshold value chosen by Anuradha et al. is robust and can be used with confidence.
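
The idea of searching for a threshold with PSO can be illustrated with a minimal sketch. It is not the authors' implementation: the function `pso_threshold`, its parameter values (inertia weight `w`, acceleration coefficients `c1` and `c2`, swarm size, iteration count), and the stand-in objective `toy_quality` are all assumptions made for illustration. In practice the objective would be a cluster-quality measure computed by running the RCFD-based clustering with the candidate threshold.

```python
import random


def pso_threshold(quality, lo, hi, n_particles=20, n_iters=50,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise quality(t) over t in [lo, hi] with a basic global-best PSO.

    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                        # personal best positions
    best_val = [quality(p) for p in pos]     # personal best scores
    g = max(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g], best_val[g]  # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Inertia term + pull toward personal best + pull toward global best.
            vel[i] = (w * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            # Keep the candidate threshold inside the search interval.
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))
            val = quality(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val > g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val


def toy_quality(t):
    # Hypothetical stand-in for a cluster-quality score; it peaks at an
    # arbitrary value (0.03) that is NOT taken from the paper.
    return -(t - 0.03) ** 2


best_t, best_q = pso_threshold(toy_quality, 0.0, 1.0)
print(f"best threshold found: {best_t:.4f}")
```

Because the search space is one-dimensional, each particle is a single candidate threshold. Replacing `toy_quality` with a function that runs the clustering and returns a quality score would turn this sketch into a threshold search of the kind the abstract describes.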

Share and Cite:

Yarlagadda, A., Murthy, J. and Prasad, M. (2014) Particle Swarm Optimized Optimal Threshold Value Selection for Clustering based on Correlation Fractal Dimension. Applied Mathematics, 5, 1615-1622. doi: 10.4236/am.2014.510155.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Aggarwal, C.C. (2006) Data Streams: Models and Algorithms (Advances in Database Systems). Springer, Secaucus.
[2] Gantz, J., Reinsel, D., Chute, C., Schlichting, W., McArthur, J., Minton, S., Xheneti, I., Toncheva, A. and Manfrediz, A. (2007) The Expanding Digital Universe: A Forecast of Worldwide Information Growth through 2010. Technical Report, 12, 634-638.
[3] Gaber, M.M., Zaslavsky, A. and Krishnaswamy, S. (2005) Mining Data Streams: A Review. SIGMOD Record, 34, 18-26. http://dx.doi.org/10.1145/1083784.1083789
[4] Aggarwal, C.C., Han, J., Wang, J. and Yu, P. (2003) A Framework for Clustering Evolving Data Streams. In: Proceedings of 29th International Conference on Very Large Data Bases (VLDB’03), Berlin, September 2003.
[5] Aggarwal, C.C., Han, J.W., Wang, J.Y. and Yu, P.S. (2006) On Clustering Massive Data Streams: A Summarization Paradigm. In: Aggarwal, C.C., Ed., Data Streams—Models and Algorithms, Springer, Boston, 11-38.
[6] Babcock, B., Datar, M., Motwani, R. and O’Callaghan, L. (2003) Maintaining Variance and k-Medians over Data Stream Windows. Proceedings of the 22nd ACM Symposium on Principles of Database Systems, San Diego, 234-243.
[7] Barbará, D. (2002) Requirements for Clustering Data Streams. SIGKDD Explorations Newsletter, 3, 23-27. http://dx.doi.org/10.1145/507515.507519
[8] Beringer, J. and Hüllermeier, E. (2006) Online Clustering of Parallel Data Streams. Data & Knowledge Engineering, 58, 180-204. http://dx.doi.org/10.1016/j.datak.2005.05.009
[9] Cao, F., Ester, M., Qian, W. and Zhou, A. (2006) Density-Based Clustering over Evolving Data Stream with Noise. Proceedings of the 6th SIAM International Conference on Data Mining (SIAM’06), Bethesda, 326-337.
[10] Charikar, M., O’Callaghan, L. and Panigrahy, R. (2003) Better Streaming Algorithms for Clustering Problems. Proceedings of the 35th Annual ACM Symposium on Theory of Computing (STOC’03), San Diego, 30-39.
[11] Chen, Y. and Li, T. (2007) Density-Based Clustering for Real-Time Stream Data. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’07), ACM, New York, 133-142.
[12] Guha, S., Meyerson, A., Mishra, N., Motwani, R. and O’Callaghan, L. (2003) Clustering Data Streams: Theory and Practice. IEEE Transactions on Knowledge and Data Engineering, 15, 515-528. http://dx.doi.org/10.1109/TKDE.2003.1198387
[13] Joao, G. (2009) An Overview on Mining Data Streams. Springer-Verlag, Berlin, Heidelberg, 29-45.
[14] Zhu, Y.Y. and Shasha, D. (2002) StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time. Proceedings of the 28th International Conference on Very Large Data Bases, Hong Kong, 358-369.
[15] Ester, M., Kriegel, H.-P., Sander, J. and Xu, X. (1996) A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of the 2nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’96), Portland, 373-382.
[16] Tu, L. and Chen, Y.X. (2009) Stream Data Clustering Based on Grid Density and Attractions. ACM Transactions on Knowledge Discovery from Data, 3, 12:1-12:27.
[17] Guha, S., Mishra, N., Motwani, R. and O’Callaghan, L. (2000) Clustering Data Streams. In: Proceedings of the Annual IEEE Symposium on Foundations of Computer Science, Redondo Beach, 12-14 November 2000, 359-366.
[18] O’Callaghan, L., Mishra, N., Meyerson, A., Guha, S. and Motwani, R. (2002) Streaming-Data Algorithms for High Quality Clustering. Proceedings of the 18th International Conference on Data Engineering (ICDE’02), San Jose, 685-694.
[19] Anuradha, Y., Murthy, J.V.R. and Krishnaprasad, M.H.M. (2014) Clustering Based on Correlation Fractal Dimension over an Evolving Data Stream. Communicated to IJAIT 2014, unpublished.
[20] Anuradha, Y., Murthy, J.V.R. and Krishnaprasad, M.H.M. (2013) Estimating Correlation Dimension Using Multi Layered Grid and Damped Window Model over Data Streams. Procedia Technology, 10, 797-804. http://dx.doi.org/10.1016/j.protcy.2013.12.424
[21] Belussi, A. (1995) Estimating the Selectivity of Spatial Queries Using the Correlation Fractal Dimension. Proceedings of 21st International Conference on Very Large Data Bases, Zurich, 11-15 September 1995.
[22] Li, G.L., et al. (2011) Fractal-Based Algorithm for Anomaly Pattern Discovery on Time Series Stream. Journal of Convergence Information Technology, 6, 181-187. http://dx.doi.org/10.4156/jcit.vol6.issue3.20
[23] Kennedy, J.F. and Eberhart, R.C. (1995) Particle Swarm Optimization. Proceedings of the IEEE International Conference on Neural Networks, 4, 1942-1948. http://dx.doi.org/10.1109/ICNN.1995.488968
[24] Shi, Y. and Eberhart, R.C. (1998) Parameter Selection in Particle Swarm Optimization. Evolutionary Programming VII, Springer. Lecture Notes in Computer Science, 1447, 591-600. http://dx.doi.org/10.1007/BFb0040810

Copyright © 2023 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.