Optimizing Query Results Integration Process Using an Extended Fuzzy C-Means Algorithm

Abstract

Cleaning duplicate data remains a major problem despite extensive prior work, because the volume of data being processed grows exponentially and calls for scalable, fast algorithms. The problem depends on the type and quality of the data and varies with the size of the data set being handled. In this paper we introduce a novel framework based on an extended fuzzy C-means algorithm that uses a topic ontology. The work aims to improve OLAP querying over heterogeneous data warehouses containing big data sets by improving the integration of query results, eliminating redundancies with the extended classification algorithm, and measuring the loss of information.
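For readers unfamiliar with the base technique, the following is a minimal sketch of the standard fuzzy C-means algorithm only. It is not the extended, ontology-based variant the paper proposes, whose details are not given in this abstract. The function name `fuzzy_c_means`, its parameters, and the toy numeric records are assumptions made for illustration.

```python
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy c-means (illustrative sketch, not the paper's extension).

    points: list of equal-length numeric tuples
    c: number of clusters, m: fuzzifier (> 1)
    Returns (centers, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Memberships are recomputed from distances to the new centers.
        new_u = []
        for p in points:
            dist = [
                sum((p[d] - ctr[d]) ** 2 for d in range(dim)) ** 0.5
                for ctr in centers
            ]
            if any(dv == 0.0 for dv in dist):
                # Point coincides with a center: full membership there.
                hits = [1.0 if dv == 0.0 else 0.0 for dv in dist]
                total = sum(hits)
                new_u.append([h / total for h in hits])
            else:
                new_u.append(
                    [1.0 / sum((dist[j] / dist[k]) ** exponent
                               for k in range(c))
                     for j in range(c)]
                )

        change = max(
            abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c)
        )
        u = new_u
        if change < tol:
            break

    return centers, u


# Hypothetical records already encoded as numeric feature vectors.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (10.0, 10.0), (10.1, 9.9), (9.8, 10.2)]
centers, memberships = fuzzy_c_means(points, c=2)
```

Because each record receives a graded membership in every cluster rather than a single hard label, near-duplicate query results can be grouped as candidates while borderline records remain visible for review. How the paper's extension uses the topic ontology in this step is not specified here.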

Share and Cite:

Mouhni, N., Elkalay, A. and Chakraoui, M. (2014) Optimizing Query Results Integration Process Using an Extended Fuzzy C-Means Algorithm. Journal of Software Engineering and Applications, 7, 354-359. doi: 10.4236/jsea.2014.75032.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2023 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.