
With the advent of the IoT era, the amount of real-time data processed in data centers has increased explosively. As a result, stream mining, which extracts useful knowledge from a huge amount of data in real time, is attracting more and more attention. It is said, however, that real-time stream processing will become more difficult in the near future, because the performance of processors continues to increase at a rate of only 10%-15% each year, while the amount of data to be processed is increasing exponentially. In this study, we focused on a promising stream mining algorithm, specifically a Frequent Itemset Mining (FIsM) algorithm, and improved its performance using an FPGA. FIsM algorithms are important, basic data-mining techniques used to discover association rules from transactional databases. We improved on a recently proposed approximate FIsM algorithm so that it would fit onto hardware architecture efficiently, and then ran experiments on an FPGA. As a result, our improved algorithm was approximately 400% faster than the original algorithm implemented on a CPU. Moreover, our FPGA prototype showed a 20 times speed improvement over the CPU version.

Frequent Itemset Mining (FIsM) is an important and fundamental problem in data mining with regards to association and correlation. FIsM finds frequent itemsets by counting the occurrence of subgroups called itemsets in a transactional database. It is possible to know the relationship between items from frequent itemsets because items contained in frequent itemsets have a high probability of occurring at the same time. The knowledge gained from this technique is widely used in data analytics, such as in the analysis of sensor networks, network traffic, stock markets, and fraud detection.

There is a similar process, Frequent Item Mining (FIM), which is used to discover frequently appearing items in a database. Though the processes sound similar, FIsM is a much more difficult problem compared to FIM because of combinatorial itemset explosion.

In 1994, Agrawal proposed the first Apriori algorithm [

Since stream data refers to large amounts of data arriving continuously, it is unrealistic to keep all the data in main memory. Therefore, the algorithm is not allowed to scan the database multiple times, and this memory capacity limitation prevents the method from accurately counting the number of occurrences of all itemsets. For this reason, single-scan approximation algorithms have been proposed, for example, lossy counting [

In the FPGA implementation of the FIM item-processing algorithm, Teubner proposed the use of a space saving algorithm [

Our contributions in this paper are as follows:

We propose a faster and more memory-efficient algorithm for hardware based on Skip LC-SS.

We explain a method to store itemsets on small BRAMs without a tree structure.

We verify the performance improvement with a standard evaluation tool capable of evaluating various FIsM algorithms.

This paper is organized as follows. The algorithm on which our algorithm is based is described in Section 2. Section 3 discusses the hardware designed to work with our proposed algorithm. Experimental and evaluation results are described in Section 4. We then summarize our research in Section 5.

In this paper, a data stream S is a sequence of transactions T_{1}, T_{2}, ..., T_{N}, where T_{i} is the transaction that arrives at time i and N is any huge number. Let e be an entry; the support of e, denoted sup(e), is the number of transactions in the stream S that include e.

Given a minimum support threshold σ ∈ (0, 1), if sup(e) ≥ σN, then e is a frequent itemset. In this algorithm, a counter called the frequent count is held for each mining target.

An entry table D is a table used to store the entries. K is the maximum number of entries that can be held in table D, and |D| denotes the number of entries in D. The minimum entry is the entry e with the minimum frequent count in D. Given a minimum support threshold σ (0 < ∆/N < σ), the Skip LC-SS algorithm outputs all entries e such that sup(e) ≥ σN (no false negatives), where ∆ is a lower limit such that the recall is guaranteed to be 1.
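As a toy illustration of these definitions (our own software sketch, not part of the proposed hardware), the following enumerates every itemset in a small stream, counts sup(e) exactly, and applies the σN threshold. Exact enumeration like this is feasible only for tiny inputs, which is precisely why approximate algorithms such as Skip LC-SS are needed:

```python
from itertools import combinations

def frequent_itemsets(stream, sigma):
    """Return every itemset e with sup(e) >= sigma * N for a stream
    S = T_1 ... T_N, by exact (exponential) counting."""
    support = {}
    for transaction in stream:
        # enumerate every non-empty subset (itemset) of the transaction
        for r in range(1, len(transaction) + 1):
            for itemset in combinations(sorted(transaction), r):
                support[itemset] = support.get(itemset, 0) + 1
    n = len(stream)
    return {e: c for e, c in support.items() if c >= sigma * n}

stream = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"a"}]
print(frequent_itemsets(stream, 0.75))   # only sup(a) = 4 reaches sigma*N = 3
```

Note that a transaction of |T| items already generates 2^{|T|} − 1 itemsets, which is the combinatorial explosion mentioned above.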

The Skip LC-SS algorithm was proposed by Yamamoto et al. in 2014. It extends FIsM to stream data by integrating the Lossy Counting algorithm with the Space Saving algorithm. It uses the features of the Space Saving algorithm to fix the number of entries to be saved, while transactions are processed in the manner of the Lossy Counting algorithm (LC-SS); to speed up the algorithm, an approximation process (skip) was added. As described above, this algorithm fixes the number of stored entries. Given a constant K, an entry table D, and a data stream containing plural transactions, the LC-SS algorithm operates in the following manner according to the state of the entry table. In this case, c(e_{i}) is the number of occurrences of the i-th itemset e_{i} and m is the entry with the minimum number of occurrences in the entry table.

• Case 1: |D| < K

1) if ⟨e_{i}, c(e_{i})⟩ ∈ D, increment c(e_{i}) by one.

2) else, store the new itemset ⟨e_{i}, 1⟩ in D.

3) if |D| = K, set the error count (∆) to one.

4) after checking all itemsets, process the next transaction.

We show the method used to process T_{1}. First, all itemsets are generated from T_{1}. Then, confirm whether each itemset appears in the entry table. Because T_{1} is the first transaction, the entry table is empty; therefore, all itemsets are stored. However, if the number of itemsets exceeds K, the method shown in Case 2 is used. In this case, after all itemsets are checked, the error count (∆) is incremented by one because the entry table has been filled. At this point the next transaction is processed.

• Case 2 : |D| = K

1) if ⟨e_{i}, c(e_{i})⟩ ∈ D, increment c(e_{i}) by one.

2) else, store e_{i} as a candidate.

3) after checking all itemsets, replace the minimal entry with the candidate set ⟨e_{i}, ∆ + 1⟩.

4) update error count (∆) to c(m).

5) process the next transaction.

The processing of T_{2} is shown in the figure.
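The two cases above can be sketched in software as follows. This is our simplified reading of the update rules, with the entry table held as a plain dictionary; it is not the authors' implementation, and ties in the minimum search are broken arbitrarily:

```python
class LCSSTable:
    """Simplified LC-SS entry-table update (Case 1: |D| < K, Case 2: |D| = K)."""

    def __init__(self, k):
        self.k = k        # maximum number of entries K
        self.d = {}       # entry table D: itemset -> frequent count
        self.delta = 0    # error count (delta)

    def process(self, itemsets):
        candidates = []
        for e in itemsets:
            if e in self.d:                  # known itemset: count up
                self.d[e] += 1
            elif len(self.d) < self.k:       # Case 1: store <e, 1> in D
                self.d[e] = 1
            else:                            # Case 2: remember e as a candidate
                candidates.append(e)
        for e in candidates:                 # Case 2: replace minimal entries
            m = min(self.d, key=self.d.get)  # minimal entry m
            c_m = self.d[m]
            del self.d[m]
            self.d[e] = self.delta + 1       # insert candidate as <e, delta + 1>
            self.delta = c_m                 # update error count to c(m)
        if not candidates and len(self.d) == self.k and self.delta == 0:
            self.delta = 1                   # table has just been filled

t = LCSSTable(k=2)
t.process([("a",), ("b",)])      # fills the table; delta becomes 1
t.process([("a",), ("c",)])      # ("c",) replaces the minimal entry ("b",)
print(t.d, t.delta)              # {('a',): 2, ('c',): 2} 1
```

The over-estimate ∆ + 1 on insertion is what guarantees no false negatives: a replaced itemset's true count can never exceed the error bound.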

In this algorithm, memory consumption is kept constant. However, due to the minimum-value search and replacement, processing time is negatively affected. Thus, Yamamoto et al. introduced approximation processes to simplify (skip) the replacement processing and the handling of large transactions, both of which reduce processing speed. The methods are termed t2-skip and r-skip. If 2^{|T_i|} − 1 > K, the entry table frequency and error counts are incremented by one, and the transaction is then completed (t2-skip). Thus, it is possible to efficiently process a huge number of transactions without causing significant performance degradation.

r-skip is also an approximation process, but its description is omitted because it is not used in this work, as will be described later. Finally, in order to prevent the accuracy degradation due to r-skip and t2-skip, Yamamoto et al. proposed stream reduction. Stream reduction is a pre-processing method performed on the input transactions. The method executes FIM on the transactions and compares the frequency of each item included in the incoming transaction with ∆. Items whose frequency is lower than ∆ are then removed from the transaction as infrequent items. This enables the method to perform frequent itemset mining only on relatively frequent items. Therefore, Skip LC-SS retains accuracy even when the r-skip and t2-skip techniques are used to accelerate processing.

For example, when transaction T_{3} = {a, b, f} is counted, the occurrences of the items become (3, 2, 1). However, after processing transaction T_{2}, ∆ = 1. Because the frequency of item f does not exceed ∆, f is removed, and the reduced transaction T_{3}' consisting of a and b is processed instead. Afterwards, for this transaction, it is determined whether to execute the t2-skip. Because 2^{2} − 1 < 7 (the condition of the t2-skip does not hold), this transaction is processed in the LC-SS part of the algorithm (details omitted).
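The worked example above can be reproduced with a short sketch (our own code; treating frequency equal to ∆ as infrequent is our assumption, chosen so that item f is dropped as in the example):

```python
def stream_reduce(transaction, item_counts, delta):
    """Stream reduction: count each item, then drop items whose running
    frequency does not exceed delta."""
    for item in transaction:
        item_counts[item] = item_counts.get(item, 0) + 1
    return [i for i in transaction if item_counts[i] > delta]

def t2_skip(transaction, k):
    """t2-skip condition: a transaction T generates 2^|T| - 1 itemsets;
    skip full LC-SS processing when that count exceeds k."""
    return 2 ** len(transaction) - 1 > k

# T_3 = {a, b, f}; running counts before T_3 are a=2, b=1, f=0, and delta = 1
counts = {"a": 2, "b": 1}
reduced = stream_reduce(["a", "b", "f"], counts, delta=1)
print(reduced)                  # f is dropped, leaving T_3' = ['a', 'b']
print(t2_skip(reduced, k=7))    # 2^2 - 1 = 3 < 7, so no skip: False
```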

The key point of implementing FIsM on hardware is the configuration of the entry table. In this study, we used a hash table to confirm whether itemsets are registered in the entry table with a small number of memory accesses. We identified the following three issues when considering a hardware implementation of this algorithm.

1) Limitation of the amount of BRAM. (3.1)

Because the size of the BRAM is much smaller than the amount of memory available on a server, the number of itemsets in the entry table is limited. This limitation lowers practicality for huge data sets. In custom chips, area is a similar issue.

2) Searching the minimal frequent count of the entry table is not suitable for hardware. (3.2)

The process of replacement requires that the minimal frequent count is found. However, sorting is not a problem well suited for FPGAs.

3) The lack of parallelism of the original algorithm. (3.3)

In the Skip LC-SS Algorithm, it is impossible to perform a count-up and replace at the same time because sequential processing is assumed.

In the Skip LC-SS algorithm, it is necessary to perform many processes for each transaction, such as registering an itemset, confirming a registered itemset, counting up, and conducting sorting and replacement in the entry table. Furthermore, these processes change depending on the state of the table. Therefore, frequent memory access becomes a bottleneck when DRAM is used instead of BRAM because of its latency. Considering the latency of referencing itemsets, our approach is to configure the hash table using BRAM. This allows us to confirm the existence of itemsets at high speed with low latency. However, the size of BRAM is too small, as mentioned above. Therefore, an itemset cannot be stored as a string as in software implementations; a compact data structure is necessary for an efficient hardware implementation.

Therefore, to obtain a more efficient data structure, we observed that an itemset can be identified not by its bit string but by its address: each itemset is stored at the address obtained by hashing it, and this address is a unique value. Specifically, the key of an itemset containing n + 1 items (length n + 1) consists of the new item (length 1) and the address of the prefix itemset containing n items. Based on this idea, we propose a hash table that enables efficient searching and saves memory.

Our proposed method is shown in the figure.

There are two major methods used to search itemsets efficiently: breadth-first search and depth-first search. In recent research, depth-first search has been preferred for its lower memory consumption. However, in our method, the key of an itemset with length n + 1 requires the address of the itemset with length n. Therefore, we use breadth-first search, whose increase in memory consumption is small when combined with a hash table.

First, packets that include transactions arrive via Ethernet, are decomposed, and the items are stored in a FIFO. The FIFO then provides items to the itemset generator. The itemset generator uses each item as a key to the hash function for an itemset of length one. The obtained value is then used to search the hash table. If the itemset already exists in the hash table, its address is stored in the n−1 hash memory. After processing all itemsets of length one, the itemset generator creates a key for an itemset of length two by combining items from the packet receiver with addresses from the n−1 hash memory. Itemsets of length two are searched in the same way. The search is repeated, incrementing the length one by one, until all itemsets have been processed.
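The address-based key scheme behind this pipeline can be sketched in software as follows. This is a behavioral model only: the class name, the Python dictionary standing in for the BRAM buckets, and the sequential address allocation are our assumptions, not the hardware design:

```python
class AddressHashTable:
    """Itemsets of length n+1 are keyed by (address of length-n prefix,
    new item), so the full item string is never stored."""

    def __init__(self):
        self.table = {}      # key -> address (stand-in for BRAM buckets)
        self.next_addr = 0

    def lookup_or_insert(self, key):
        # each distinct key gets a unique, stable address
        if key not in self.table:
            self.table[key] = self.next_addr
            self.next_addr += 1
        return self.table[key]

    def process_transaction(self, items):
        """Breadth-first generation: length-1 itemsets first, then
        length-2 built from the stored prefix addresses, and so on."""
        items = sorted(set(items))
        addrs = {}
        # length-1 itemsets use a sentinel prefix address
        level = {(it,): self.lookup_or_insert((None, it)) for it in items}
        addrs.update(level)
        while level:
            nxt = {}
            for prefix, addr in level.items():
                start = items.index(prefix[-1]) + 1
                for it in items[start:]:        # extend by a lexically later item
                    nxt[prefix + (it,)] = self.lookup_or_insert((addr, it))
            addrs.update(nxt)
            level = nxt
        return addrs

h = AddressHashTable()
addrs = h.process_transaction(["b", "a"])
print(sorted(addrs))   # [('a',), ('a', 'b'), ('b',)]
```

Re-submitting the same itemsets resolves to the same addresses, which is what lets a fixed-width BRAM word reference an itemset of arbitrary length.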

The Skip LC-SS algorithm uses sorting to replace infrequent itemsets in the entry table with new itemsets. However, sorting in the Skip LC-SS algorithm is different from ordinary sorting because each value in the entry table changes by at most one. In our research, we propose a sorting method based on stream summary [

In this section, we propose a method to replace the Skip LC-SS algorithm in the proposed hardware. In this algorithm, we assume that the hardware allows us to count and sort at the same time. However, these two processes cannot be executed simultaneously with the replacement process. Therefore, it is necessary to stop these processes and switch processing actions for each transaction.

Instead of processing replacements during each transaction, we propose a method for batch processing a plurality of transactions. We then explore frequent itemsets among the replacement candidates generated from each transaction by counting them. Promising replacement candidates and the current entry table are merged at the end of the batch. Here, the problem is the proportion of itemsets that miss the entry table to those that hit it. If the number of itemsets that miss is large, a bottleneck appears while replacement candidates for the frequent itemsets are still being found. Therefore, the replacement candidates generated through this process are generally itemsets whose length is the length of a frequent itemset in the entry table plus one, because such itemsets are likely to be frequent. That is, we explore frequent itemsets from among the itemsets that were not found in the hash table. A search of any itemset that contains an itemset that missed in the hash table is terminated.
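The batch-replacement step can be sketched as follows. This is only our reading of the scheme: candidates are counted over the batch, each is given the benefit of the error count ∆ when inserted, and the merge stops once the table minimum already beats the best remaining candidate. The authors' exact merge policy may differ:

```python
def batch_replace(table, candidates, delta, k):
    """Merge batch replacement candidates into the entry table.
    table, candidates: dict itemset -> count; delta: error count;
    k: table capacity. Returns the updated table."""
    table = dict(table)
    # visit candidates from most to least frequent within the batch
    for e, c in sorted(candidates.items(), key=lambda kv: -kv[1]):
        if e in table:
            table[e] += c                     # already tracked: just count up
        elif len(table) < k:
            table[e] = delta + c              # free slot: insert with delta bound
        else:
            m = min(table, key=table.get)     # current minimal entry
            if table[m] >= delta + c:
                break                         # remaining candidates are rarer still
            del table[m]
            table[e] = delta + c              # evict the minimum, insert candidate
    return table

merged = batch_replace({("a",): 5, ("b",): 2}, {("c",): 4, ("d",): 1},
                       delta=1, k=2)
print(merged)   # {('a',): 5, ('c',): 5} -- ("b",) evicted, ("d",) cannot win
```

Because replacements happen once per batch rather than once per transaction, the count and sort pipelines never have to stall mid-transaction.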

In order to obtain reliable data, we created evaluation data using the Synthetic Data Generator of the IBM Almaden Quest research group, which is widely used to evaluate the performance of FIsM. A dataset is generated using the following parameters: "T" is the average number of items included in the transactions, "I" is the average length of the frequent itemsets, "D" is the number of transactions included in the database, and "N" is the number of kinds of items included in the database. We use the label "TxIyDz (T = x, I = y, D = z)" to denote the characteristics of a dataset. Thereafter, in order to reveal the various characteristics of our algorithm, we conducted experiments by incrementally changing each parameter. Finally, to evaluate the algorithm more realistically, we used a dataset containing retail market basket data created by Tom Brijs. Most experimental results were obtained using an Intel Xeon E5-1620. Experiment 5-3 was evaluated using a Mac Pro with Mac OS 10.6, 3.33 GHz, and 16 GB of memory. Hardware (HW) results were evaluated based on RTL simulation at 156 MHz. The target device is the ZC706, and the resource usage is shown in the table below.

"Precision" is a metric indicating how often the frequent itemsets identified by the algorithm are truly frequent. "Recall" is a metric indicating how often the algorithm identifies the truly frequent itemsets.
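These two metrics can be computed directly from the reported and true frequent itemset sets (a straightforward sketch; the convention of returning 1.0 for empty sets is ours):

```python
def precision_recall(reported, truth):
    """Precision: fraction of reported itemsets that are truly frequent.
    Recall: fraction of truly frequent itemsets that were reported."""
    reported, truth = set(reported), set(truth)
    tp = len(reported & truth)                      # true positives
    precision = tp / len(reported) if reported else 1.0
    recall = tp / len(truth) if truth else 1.0
    return precision, recall

p, r = precision_recall({("a",), ("a", "b")},
                        {("a",), ("a", "b"), ("b",)})
print(p, r)   # precision 1.0 (no false positives), recall 2/3 (one itemset missed)
```

Skip LC-SS guarantees recall = 1 (no false negatives); precision can drop below 1 because of the ∆ over-estimate.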

These metrics indicate the accuracy of the FIsM results. The hash table has a depth of 12 bits and an associativity of 16. The size of the entry table is 10 K, the same as the size of the hash table.

| Parameter | Description |
|---|---|
| T | Average number of items included in a transaction |
| I | Average length of a frequent itemset |
| D | Number of transactions included in the database |
| N | Number of kinds of items included in the database |
| L | Maximum number of frequent itemsets |

| FF | LUT | BRAM | Clock freq. |
|---|---|---|---|
| 10,370/106,400 | 8,098/53,200 | 135/140 | 156 MHz |

| Min support | 0.5 | 0.1 | 0.005 |
|---|---|---|---|
| Precision | 1.00 | 1.00 | 1.00 |
| Recall | 1.00 | 1.00 | 1.00 |

In addition, in SW and HW, the processing speed remains at a realistic value, even though the number of itemsets increases explosively due to the exponential increase in combinations. This is because an increase in T does not lead to an increase in the processing target, thanks to the stream summary and the efficient search method.

We also evaluated the algorithm on real datasets: retail market basket information and a web click stream. The retail data consists of 88,162 transactions and 16,470 items; the web-log data stream consists of 19,466 transactions and 9,961 items. As mentioned above, the SW evaluation data is taken from the original Skip LC-SS paper by Yamamoto, using the same datasets. The limits of the stored itemsets are 500 K, 600 K, and 700 K, respectively. The configuration of the HW was the same as in the previous section (bucket size 10 K). As a result, despite the much smaller bucket size compared to the original algorithm, the processing speed was 100 times faster and the error count necessary to guarantee the minimum support was 100 times smaller. By introducing batch processing, we achieved a more efficient algorithm. Since the original algorithm executes the approximation process repeatedly on huge transactions, its memory efficiency is reduced. In contrast, the proposed algorithm uses memory more efficiently because it selects potential frequent itemsets and replaces them with frequent replacement candidates. Even on realistic datasets, this algorithm works well.

In this paper, we proposed a hardware-friendly algorithm that increases the parallelism of the FIsM process in the original Skip LC-SS algorithm. By identifying the bottleneck in the original algorithm, we successfully introduced a more efficient replacement process using our batch-replacement concept. The proposed algorithm also maintains an error count in order to guarantee the necessary minimum support, which in turn keeps memory consumption low. As a result, we achieved an algorithm that is 100 times faster and more memory-efficient. Because our algorithm limits the replacement target and prunes all itemsets except those whose length is the length of a current frequent itemset plus one, a frequent itemset that contains many items may be missed if the batch interval is not appropriate. In the future, we will improve our work by adding the capability to dynamically change the batch interval, aiming at a more accurate, fast, and hardware-friendly FIsM algorithm and its hardware implementation.