TITLE:
An Efficient Agglomerative Clustering Algorithm for Web Navigation Pattern Identification
AUTHORS:
A. Anitha
KEYWORDS:
Agglomerative Clustering, Similarity Measure, Cluster Validity, Clickstream Sequence, Transaction
JOURNAL NAME:
Circuits and Systems, Vol.7 No.9, July 19, 2016
ABSTRACT: Web log mining is the analysis
of web log files containing web page sequences. Discovering user access patterns
from web access logs is necessary for building adaptive web servers, improving
e-commerce, carrying out cross-marketing, personalizing websites, and predicting
web access sequences. In this paper, a new agglomerative clustering
technique is proposed to identify users with similar interest, and to determine
the motivation for visiting a website. With this approach, web usage mining
proceeds through four stages: data cleaning, preprocessing, pattern
discovery, and pattern analysis. Results are given to show that this approach
produces tighter usage clusters than existing web usage mining techniques.
Rather than a traditional distance-based criterion, a similarity measure is
used during the clustering process in order to reduce computational
complexity. This paper also addresses the problem of assessing the quality of
user session clusters. Cluster validity is measured using a statistical
test, which measures the distances between cluster distributions to infer their
dissimilarity and level of distinction. Using such statistical measures, it is
shown that cluster accuracy improves to 0.83 with the proposed approach,
compared with a validity measure of 0.26 for existing k-means clustering, 0.56
for FCM (Fuzzy C-Means) clustering, and 0.54 for rough set based clustering.
Generation of dense clusters is essential for finding interesting patterns
needed for further mining and analysis.
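
As a rough illustration of similarity-driven agglomerative clustering of user sessions, the following Python sketch may help. It is not the algorithm proposed in the paper, whose similarity measure and merge rule are not given in the abstract. The Jaccard similarity over visited pages, the average-linkage merge rule, the stopping threshold, and all function names are assumptions made only for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sessions, treated as sets of pages (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def average_similarity(c1, c2, sessions):
    """Average pairwise similarity between the members of two clusters."""
    total = sum(jaccard(sessions[i], sessions[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(sessions, threshold=0.5):
    """Merge the most similar pair of clusters until no pair reaches the threshold.

    Returns a list of clusters, each a list of session indices.
    The threshold is a hypothetical stopping parameter.
    """
    clusters = [[i] for i in range(len(sessions))]
    while len(clusters) > 1:
        best_pair, best_sim = None, threshold
        for x, y in combinations(range(len(clusters)), 2):
            sim = average_similarity(clusters[x], clusters[y], sessions)
            if sim >= best_sim:
                best_pair, best_sim = (x, y), sim
        if best_pair is None:
            break  # no remaining pair is similar enough to merge
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters
```

Calling `agglomerate(sessions)` on a list of sessions, each given as a list of page identifiers, groups sessions that share many pages. This naive version recomputes all pairwise cluster similarities on every pass, so it is intended only to show the idea, not to be efficient.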