Document Clustering Using Semantic Cliques Aggregation
ABSTRACT
Search engines are indispensable tools
for finding information among massive collections of web pages and documents.
A good search engine must return results that are not only fast, but also
relevant to the user's query. Most search engines respond to queries quickly;
however, they offer little guarantee of precision, even for highly detailed
queries. In such cases, clustering documents by subject and content might
improve search results.
This paper presents a novel method of document clustering that uses semantic
cliques. First, we extracted features from the documents. Next, we defined the
associations between frequently co-occurring terms, which we call semantic
cliques. Each connected component in the semantic clique represented a theme.
The documents were then clustered by theme, for which we designed an
aggregation algorithm. We evaluated the effectiveness of the aggregation
algorithm on four kinds of datasets. The results showed that the semantic
clique based document clustering algorithm performed significantly better than
traditional clustering algorithms such as Principal Direction Divisive
Partitioning (PDDP), k-means, Auto-Class, and Hierarchical Clustering (HAC). We
found that Semantic Clique Aggregation is a potential model for representing
association rules in text and could be immensely useful for automatic document
clustering.
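
The pipeline described above (feature extraction, frequent term co-occurrence,
connected components as themes, documents grouped by theme) can be illustrated
with a minimal sketch. This is not the authors' aggregation algorithm. It
assumes whitespace tokenization as the feature extractor, a hypothetical
`min_support` threshold on document-level co-occurrence counts, and a simple
rule that assigns each document to the theme sharing the most terms with it.
The function name `cluster_by_themes` is also an assumption made for
illustration.

```python
from collections import defaultdict
from itertools import combinations


def cluster_by_themes(docs, min_support=2):
    """Group documents by connected components of a term co-occurrence graph."""
    # Feature extraction (assumed): each document becomes a set of lowercase tokens.
    term_sets = [set(doc.lower().split()) for doc in docs]

    # Count the number of documents in which each pair of terms co-occurs.
    pair_counts = defaultdict(int)
    for terms in term_sets:
        for pair in combinations(sorted(terms), 2):
            pair_counts[pair] += 1

    # Union-find over terms; only frequent pairs become edges.
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for (a, b), count in pair_counts.items():
        if count >= min_support:
            parent[find(a)] = find(b)

    # Each connected component of frequent terms is treated as one theme.
    themes = defaultdict(set)
    for term in list(parent):
        themes[find(term)].add(term)

    # Assumed assignment rule: pick the theme with the largest term overlap.
    clusters = defaultdict(list)
    for index, terms in enumerate(term_sets):
        best = max(themes.values(), key=lambda t: len(t & terms), default=None)
        if best is not None and best & terms:
            clusters[frozenset(best)].append(index)
    return dict(clusters)


docs = ["apple banana", "apple banana cherry", "dog wolf", "dog wolf fox"]
clusters = cluster_by_themes(docs, min_support=2)
```

In this toy input, the pairs (apple, banana) and (dog, wolf) each co-occur in
two documents, so they form two separate themes, and each document index is
placed under the theme whose terms it contains. Documents that share no term
with any theme are left unassigned in this sketch.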
Share and Cite:
Kumar, A. and Chiang, I. (2015) Document Clustering Using Semantic Cliques Aggregation.
Journal of Computer and Communications, 3, 28-40. doi: 10.4236/jcc.2015.312004.