A Novel Approach for Clustering Periodic Patterns

The process of extracting patterns that are frequent from supermarket datasets is a well known problem of data mining. Nowadays, we have many approaches to resolve the problem. Association rule mining is one among them. Supermarket data are usually temporal in nature as they record all the transactions in the supermarket, with the time of occurrence. An algorithm has been proposed to find frequent itemsets, taking the temporal attributes in supermarket dataset. The best part of the algorithm is that each frequent itemset extracted by it is associated with a list of time intervals in which it is frequent. Taking time of transactions as calendar dates, we may get various types of periodic patterns viz. yearly, quarterly, monthly, etc. If the time intervals associated with a periodic itemset are kept in a compact manner, it turns out to be a fuzzy time interval. Clustering of such patterns can be a useful data mining problem. In this paper, we put forward an agglomerative hierarchical clustering algorithm which is able to extracts clusters among such periodic itemsets. Here we take two similarity measures, one on the itemsets of the clusters and others on the corresponding fuzzy time intervals. The efficiency of the proposed method is demonstrated through experimentation on real datasets.


Introduction
The most important data mining problems based on unsupervised learning approach is Clustering and it is very useful for the extraction of data distribution and patterns in the datasets.The clustering process is used to discover both the dense and sparse regions in a dataset.The two main broad approaches are parti-tioning approach and hierarchical approach.The hierarchical clustering creates a hierarchy of clusters from small to big or big to small and consequently it is named as agglomerative or divisive clustering techniques respectively.Clustering of numerical data has been studied in the past [1].However, practically we may have different types of data such as binary, categorical, spatial, ordinal, and temporal or mixture of these.New methods and interesting algorithms for clustering categorical and spatial data have been proposed in the recent times [2] [3] [4] [5].
Association rule mining is an important data mining problem which derives associations among data and was formulated by Agrawal et al. [6].Extracting association rules from temporal dataset is also an interesting data-mining problem.In [7], Ale et al. have proposed a method of extracting association rules which hold within an itemset's lifetime, where the lifetime of an itemset is the time-period between the first appearance and the last appearance of the itemset in the transactions.In [8], the work proposed by Ale and Rossi [7] is extended by considering time-gap between two consecutive appearance of an itemset in the transactions.The algorithm discussed in [8] mines all locally frequent itemsets along with the list of time-intervals.Each frequent itemsets is associated with a time-intervals lists, where it is frequent.From locally frequent itemsets, we can define various periodic patterns by considering the time-stamps as calendar dates.All such patterns are discussed in detail by the same authors in [9] [10].While extracting periodicity of a frequent pattern, if the associated timeintervals overlapping, then they can be piled up to form fuzzy interval [11].Thus, we can have some periodic patterns with each pattern is associated with a fuzzy time interval which describes its period.
In this paper, we devise an agglomerative hierarchical clustering method to explore clusters among such periodic patterns.We define the similarity measure on the corresponding fuzzy time intervals [12] associated with the periodic patterns.Then a merge function is defined in terms of the similarity as the union of the pair of periodic frequent itemsets.If the value of the similarity function of fuzzy time intervals is greater than some pre-assigned thresholds, then the corresponding frequent itemsets pairs are merge to form larger cluster/itemset and their corresponding fuzzy time intervals are also aggregated [13].Finally, in this paper we present an algorithm for the clustering of periodic patterns.
The rest of the paper is arranged as follows.Section 2 presents a brief literature review related to the existing clustering algorithms.In section 3, we present some basic definitions and results used in this paper.The proposed agglomerative clustering algorithm is discussed in section 4. In section 5, we discuss some analysis of experiments and results.Finally, we wind up the paper with possible future enhancements of the proposed work in section 6.

Related Works
In this segment, we present a brief assessment of the existing research findings related to our work.In [2], Gibson et al. have proposed an algorithm for clustering categorical data.Their approach is based dynamical system.In [14], authors have proposed a clustering algorithm called BIRCH.The algorithm proposed in [14] is a hierarchical agglomerative algorithm and it is an efficient algorithm for large datasets.In [4], authors have proposed a robust method for clustering categorical data using summaries.It is known as ROCK and it is an agglomerative hierarchical clustering.In [15], authors have done a survey on clustering timeseries data.In [16], authors have discussed algorithm for mining temporal data.In [17], the authors have done a survey on temporal data clustering.Concept of fuzzy sets [18] has been widely used in different areas including cluster analysis, association rule mining, pattern recognition and signal processing in the last couple of years.In [19], Dutta and Mahanta have proposed an algorithm for clustering large categorical database.The approached used in [19] is a fuzzy set based approach.In [20], author has made a survey on fuzzy clustering.In [21], authors have discussed about the applications of fuzzy sets in pattern recognition.
Finding associations among data has also attracted a large number of researchers.In [6], authors have presented a nice and efficient algorithm for association rules extractions.In [7], authors have presented a modified A-priori algorithm for the extractions of temporal association rules.In [1], authors have extended the work of [7] by adding the time-gap between two consecutive transactions containing an itemset.The algorithm [8], gives all locally frequent itemsets along with the lists of time intervals.Here each frequent itemset is linked with a time intervals list in which it is frequent.To compute the periodicity of such frequent itemsets if the time intervals associated with them have overlapping, then a method of redefining the time intervals is proposed in [11], which turns out to be fuzzy time intervals.

Problem Definition
In this section, we present a summarized view of some definitions and results on which our proposed algorithm is based.

Fuzzy Sets
Let X be the universe of discourse, then the fuzzy set A of X is characterized by , where ( )

α-Cut of Fuzzy Sets
An α-cut of the fuzzy set A of X is actually a crisp set A α with elements x of X having membership greater than or equal to α i.e.

Convex Fuzzy Sets
A fuzzy set is said to be convex if all it's αcuts are convex sets.

Fuzzy Numbers
A fuzzy number is a convex set defined in the real line whose membership value is 1 for at least one x ∈ X.

Trapezoidal Fuzzy Numbers or Fuzzy Intervals
A trapezoidal fuzzy numbers denoted by A = (a, b, c, d), where it's membership function is given by ( ) In short, we can express the above membership function as It is to be mentioned here that our fuzzy time intervals associated with periodic frequent patterns are actually trapezoidal fuzzy numbers.The fuzzy time intervals are formed using method [11] and is an L-R fuzzy intervals [22] [23].

Generalized Trapezoidal Fuzzy Numbers
A generalized trapezoidal fuzzy numbers is represented as A = (a, b, c, d, h), where its membership is given by ( ) In short, we can express the above membership function as

Similarity Measure
Let 0 ≤ a ≤ 1, 0 ≤ b ≤ 1, then the similarity measure between a and b is given by ( ) Let A = (a1, a2, a3, a4: h A ) and B = (b1, b2, b3, b4: h B ) be two generalized trapezoidal fuzzy numbers, then the similarity measure is defined in [12] as follows ( ) ( ) ( ) sim , sim , The larger the value of S(A, B) the more similarity between A and B. Obviously, 0 ≤ S(A, B) ≤ 1, A and B will be identical if S(A, B) = 1.

Merger of Periodic Frequent Itemsets Belonging to Two Different Clusters
Let A and B be two periodic frequent itemsets having periods T and S respectively.Then the merge function is defined ( ) merge , A B A B =  , if and only if S(T, S) ≥ θ, where, θ is pre-defined threshold (small positive numbers).It is to be mentioned here that corresponding fuzzy time intervals of T and S associated A and B will be aggregated [13] to form new fuzzy time interval.

Proposed Clustering Algorithm
In this segment, we describe our proposed clustering algorithm based on the notion explained in the previous section.The proposed algorithm takes as input, all periodic patterns with fuzzy time intervals describing their periods.The fuzzy time intervals are constructed using the methods discussed in [11].Assuming that each pattern is associated with exactly one fuzzy time interval, we want to do the clustering of periodic patterns in such a way that each cluster will contain similar type of periodic patterns.The similarity between periodic patterns is defined in terms their corresponding fuzzy time intervals.The similarity measure, S() function is discussed in section III.The itemset A and B having fuzzy time intervals T 1 and T 2 are said to be similar if and only if the value S(T 1 , T 2 ) is greater than some pre-defined threshold.
Initially, each pattern is assigned to a separate cluster.Thereafter, for each pair of clusters the similarity value S( ) is calculated and merge function is applied (to generate a new bigger cluster) if the S( ) is greater than the threshold.
And their corresponding periods/fuzzy time intervals are aggregated [13].The process of merging continues till no merger of clusters is possible or there is only one cluster at the top.In bellow we present the pseudo code for the proposed algorithm.

Experiment & Discussions
For experimentation, we have used a synthetic dataset T10I4D100K, available from FIMI1 website.As the dataset is non-temporal, we consider the temporal features, the calendar dates and execute the algorithm [8] to get the periodic patterns and then execute our clustering algorithm for the threshold value θ = 0.4.The clustering results along with the number of misclassified itemsets obtained are presented in the Table 1.We also represent trend of number of clusters with respect to the number of transactions in graphical form given in Figure 1 and with a bar diagram in Figure 2.

Conclusion & Future Works
In this paper, we have presented an agglomerative-hierarchical clustering algorithm to find clusters among periodic patterns with fuzzy time intervals.The    algorithm starts with as many clusters as the periodic patterns having fuzzy time intervals.Then, if their similarity value is greater than pre-defined threshold, the pairs of clusters are merged.The similarity is defined on fuzzy time intervals associated with periodic patterns.After each level the corresponding fuzzy time intervals are updated by aggregation.Although we have used the agglomerativehierarchical approach in this paper; any other approach can also be considered provided the similarity measure is properly defined.

Frequent 3 )
Pattern Clustering Algorithm (n, θ) Input: The number of frequent patterns n and threshold θ Output: A set of cluster S of with fuzzy time intervals Steps: Initially set of clusters is empty 1) S ← φ 2) read each frequent pattern A[i] with fuzzy time intervals T[i]To construct a cluster C with T if a cluster C 1 ∈ S with sim(T 1 , T)

Figure 1 .
Figure 1.Graph of no.transaction vs. no of clusters/itemsets.

Figure 2 .
Figure 2. Bar diagram cluster and misclassified items.

Table 1 .
Clustering results along with the number of misclassified itemsets for different sets of transactions.