TITLE:
Study on the Development and Implementation of Different Big Data Clustering Methods
AUTHORS:
Jean Pierre Ntayagabiri, Jérémie Ndikumagenge, Longin Ndayisaba, Boribo Kikunda Philippe
KEYWORDS:
Clustering, K-Means, Fuzzy c-Means, Expectation Maximization, BIRCH
JOURNAL NAME:
Open Journal of Applied Sciences, Vol.13 No.7, July 28, 2023
ABSTRACT:
Clustering is an unsupervised learning method that organizes raw data so that
items with similar characteristics fall in the same class and dissimilar items
fall in different classes. The very rapid growth in the amount of data being
produced brings new challenges for analyzing and storing that data. Recently,
there has been growing interest in key areas such as real-time data mining,
which reveals an urgent need to process very large data sets under strict
performance constraints. The objective of this paper is to survey four data
clustering algorithms, namely K-Means, Fuzzy c-Means (FCM), Expectation
Maximization (EM) and BIRCH, and to show their strengths and weaknesses.
A further objective is to compare the results obtained by applying each of
these algorithms to the same data and to draw a conclusion from these results.
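
To make the idea of running different clustering methods on the same data concrete, the following is a minimal sketch in pure Python of two of the surveyed methods, K-Means (hard assignment) and Fuzzy c-Means (soft membership). It is not the authors' implementation or data: the toy points, the hand-picked initial centers, the fuzzifier m = 2 and the function names are all assumptions made for illustration, and EM and BIRCH are left out for brevity.

```python
import math

# Hypothetical toy data: two well-separated groups of 2-D points.
POINTS = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, centers, iters=20):
    """Hard clustering: each point gets exactly one cluster label."""
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean(members) if members else centers[j])
        centers = new_centers
    return labels, centers


def fuzzy_cmeans(points, centers, m=2.0, iters=50):
    """Soft clustering: each point gets a membership degree per cluster."""
    c = len(centers)
    dims = len(points[0])
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership step: closer centers receive higher membership.
        memberships = []
        for p in points:
            # Clamp distances so a point sitting on a center does not
            # cause a division by zero.
            d = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            memberships.append([
                1.0 / sum((d[j] / d[k]) ** power for k in range(c))
                for j in range(c)
            ])
        # Center step: weighted mean with weights u_ij ** m.
        centers = []
        for j in range(c):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            centers.append(tuple(
                sum(w * p[dim] for w, p in zip(weights, points)) / total
                for dim in range(dims)
            ))
    return memberships, centers


if __name__ == "__main__":
    # Assumed initialization: one starting center taken from each group.
    init = [POINTS[0], POINTS[3]]
    labels, km_centers = kmeans(POINTS, init)
    memberships, fcm_centers = fuzzy_cmeans(POINTS, init)
    print("K-Means labels:", labels)
    print("FCM memberships of first point:", memberships[0])
```

On data this cleanly separated both methods should agree on the grouping; the difference is that K-Means returns a single label per point, while Fuzzy c-Means returns membership degrees that sum to 1 for each point.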