
Discovering complex and incomplete periodic patterns in logs of events is a complicated and time-consuming task. This work shows that it is possible to discover complex and incomplete periodic patterns by first finding simple patterns and then logically deriving complex and incomplete patterns from them. The paper defines the syntax and semantics of a class of periodic patterns that frequently occur in logs of events. A system of derivation rules proposed in the paper can be used to transform a set of periodic patterns into a logically equivalent set of patterns. The rules are used in algorithms that derive complex and incomplete periodic patterns. A prototype implementation of the algorithms that discover complex and incomplete periodic patterns in logs of events is presented.

It is well known that precise estimation of future workloads can be used to eliminate many performance-related problems in database systems [

The problem of finding periodic patterns in recorded workloads can be solved differently from the computationally intensive generation of candidate patterns and their subsequent verification in the logs of events. An important property of periodically repeated processes is that, no matter how long and how complex a process is, all of its elementary operations are also processed periodically. This leads to the idea that complex and incomplete periodic patterns can be discovered by first finding periodic patterns of individual operations and then composing these simple patterns into complex and incomplete ones. Simple and complete periodic patterns based on one operation or event can be discovered in a relatively simple way after partitioning historical data into subsets that record the activities of only one operation or event. The outcomes are elementary and complete periodic patterns. Then, such homogeneous and complete patterns are “stitched” into homogeneous and incomplete patterns with a certain predefined maximum number of missing cycles. In the next stage, the sets of homogeneous and incomplete periodic patterns are merged, and all pairs of patterns that satisfy predefined composition constraints, such as minimal length and maximal carrier length, are composed into complex and incomplete periodic patterns. The procedure is repeated until no new pairs can be found.

To implement the method described above, we need a system of derivation rules that transforms sets of periodic patterns into logically equivalent sets of patterns and that allows for the synthesis of longer patterns and the composition of more complicated ones. The main objective of this paper is to propose a system of derivation rules for complex and incomplete periodic patterns and to show how such a system can be used in algorithms and in a simple prototype implementation that discovers incomplete periodic patterns in logs of events.

The paper is organized as follows. The next section reviews previous research related to discovering periodic patterns in historical information. Section 3 defines multisets and time units and shows how a log of events is transformed into a workload trace. Section 4 defines the syntax and semantics of complete and incomplete periodic patterns. A system of derivation rules for incomplete periodic patterns is proposed in Section 5. Section 6 presents the algorithms that apply the system of derivation rules to find complex and incomplete periodic patterns. Section 7 describes a prototype implementation of the algorithms. Finally, Section 8 concludes the paper.

The works on frequent episodes [

Let be a unique identifier of an event, for example, the identifier of a query processing plan in a database system or of a flight booking routine in a flight reservation system. A log of events is a sequence of pairs

At the data preparation stage, a log of events is transformed into a workload trace in the following way. A period of time

in a reduced event table. A workload trace of a log
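The formal transformation is cut off in this copy. As a rough illustration only, the sketch below assumes that a log is a list of `(event, timestamp)` pairs, that time units are fixed-length intervals counted from the earliest timestamp, and that a workload trace is a Python list of `collections.Counter` multisets, one per time unit. The function name `workload_trace` and its parameters are hypothetical, not taken from the paper.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def workload_trace(log: Sequence[Tuple[str, float]], unit_length: float,
                   t0: Optional[float] = None) -> List[Counter]:
    """Group (event, timestamp) pairs into multisets, one per time unit.

    Time unit i covers timestamps in [t0 + i*unit_length, t0 + (i+1)*unit_length).
    By default t0 is the earliest timestamp in the log.
    """
    if not log:
        return []
    if t0 is None:
        t0 = min(ts for _, ts in log)
    last = max(ts for _, ts in log)
    n_units = int((last - t0) // unit_length) + 1
    trace = [Counter() for _ in range(n_units)]
    for event, ts in log:
        # Each occurrence of an event increases its multiplicity in its time unit.
        trace[int((ts - t0) // unit_length)][event] += 1
    return trace
```

For example, with `unit_length=10` the log `[("q1", 0), ("q2", 3), ("q1", 10), ("q1", 12), ("q2", 25)]` yields three multisets: `{q1, q2}`, `{q1, q1}`, and `{q2}`. Empty time units are kept as empty multisets so that positions in the list correspond directly to time units.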

In this work we consider periodic patterns that belong to a wider class of CRP periodic patterns defined as a triple

A carrier C defines a structure of periodically repeated events, computations, queries, etc.

A range R determines a time scope of periodic repetitions of a carrier measured in time units, for example from one time unit to another or starting in a given time unit and continuing over several cycles.

A periodicity P determines the location of the next cycle of a periodic pattern, for example, a given number of time units after the latest cycle, possibly delayed by one or more time units.

In the previous works, for example in [

In this work we consider a subclass of CRP periodic patterns defined as a triple

A carrier C is a nonempty sequence of nonempty multisets of events.

A range

A periodicity is a pair of natural numbers

The values of

If

It is possible, that

The following sequence of definitions leads to the validation of a periodic pattern in a workload trace. Let C be a sequence of multisets where

A trace of a complete periodic pattern

a complete periodic pattern is a union of traces of its carrier over n multisets such that each trace starts at the time units
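Since the formal definition is truncated here, the following is a minimal sketch of checking a complete pattern against a workload trace under a simplified reading: a carrier is a list of multisets occupying consecutive time units, cycle i starts at time unit `start + i * period`, and a cycle is present when each carrier multiset is included, with multiplicities, in the corresponding multiset of the trace. The delay component of the periodicity is ignored, and all function names are hypothetical.

```python
from collections import Counter
from typing import List


def contains(big: Counter, small: Counter) -> bool:
    """Multiset inclusion: each event of `small` occurs in `big` at least as often."""
    return all(big[event] >= count for event, count in small.items())


def carrier_occurs(trace: List[Counter], carrier: List[Counter], t: int) -> bool:
    """True if carrier[j] is included in trace[t + j] for every position j."""
    if t < 0 or t + len(carrier) > len(trace):
        return False
    return all(contains(trace[t + j], m) for j, m in enumerate(carrier))


def is_valid_complete(trace: List[Counter], carrier: List[Counter],
                      start: int, period: int, cycles: int) -> bool:
    """Every cycle i, beginning at time unit start + i * period, must be present."""
    return all(carrier_occurs(trace, carrier, start + i * period)
               for i in range(cycles))
```

Reading a complete pattern as "every cycle present" mirrors the description of its trace as a union of traces of the carrier, one per cycle.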

A trace of an incomplete periodic pattern

An incomplete periodic pattern

For example, a periodic pattern
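An incomplete pattern can be checked in a similar way. The sketch below assumes, as a simplification of the paper's definition, that the first and last cycles must be present and that no run of consecutive missing cycles may be longer than a hypothetical parameter `max_gap`. The helper functions repeat those of a complete-pattern check so that the block runs on its own.

```python
from collections import Counter
from typing import List


def contains(big: Counter, small: Counter) -> bool:
    return all(big[event] >= count for event, count in small.items())


def carrier_occurs(trace: List[Counter], carrier: List[Counter], t: int) -> bool:
    if t < 0 or t + len(carrier) > len(trace):
        return False
    return all(contains(trace[t + j], m) for j, m in enumerate(carrier))


def is_valid_incomplete(trace: List[Counter], carrier: List[Counter], start: int,
                        period: int, cycles: int, max_gap: int) -> bool:
    """Allow missing cycles, but at most `max_gap` of them in a row (assumption)."""
    present = [carrier_occurs(trace, carrier, start + i * period)
               for i in range(cycles)]
    # The pattern must begin and end with a cycle that actually occurs.
    if not present or not present[0] or not present[-1]:
        return False
    missing_run = 0
    for ok in present:
        missing_run = 0 if ok else missing_run + 1
        if missing_run > max_gap:
            return False
    return True
```

With `max_gap=0` the check reduces to the complete case, which matches the idea that a complete pattern is an incomplete one with no cycles missing.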

The system of derivation rules presented below allows for the creation of new periodic patterns valid in a workload trace

Let C be a multiset of events such that

If a periodic pattern

If a periodic pattern

If a periodic pattern

If

If

If

The first case of the split rule divides a pattern that consists of two cycles into two single-cycle patterns. The second case “cuts off” a single-cycle periodic pattern from either the left or the right side of a pattern that consists of more than two cycles. Finally, the last case splits a periodic pattern that has more than three cycles into two patterns with more than one cycle each.
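Reading the three cases as cuts of the same pattern at different positions, the split rule can be sketched as a single function that cuts a pattern after its k-th cycle. The `Pattern` tuple of carrier, start, period, and number of cycles is an assumed representation, not the paper's notation.

```python
from typing import NamedTuple, Tuple


class Pattern(NamedTuple):
    carrier: tuple  # sequence of multisets; treated as an opaque value here
    start: int      # time unit in which the first cycle begins
    period: int     # distance between consecutive cycles, in time units
    cycles: int     # number of cycles


def split(p: Pattern, k: int) -> Tuple[Pattern, Pattern]:
    """Cut p after its k-th cycle into two patterns with the same carrier and period.

    k == 1 or k == p.cycles - 1 cuts off a single-cycle pattern at one end;
    when p.cycles == 2, both parts are single-cycle patterns.
    """
    if not 1 <= k < p.cycles:
        raise ValueError("each part must keep at least one cycle")
    left = p._replace(cycles=k)
    right = p._replace(start=p.start + k * p.period, cycles=p.cycles - k)
    return left, right
```

Because both parts keep the carrier and period and cover disjoint cycles of the original, each part is valid in any workload trace in which the original pattern is valid.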

If the periodic patterns

If

If

If

If

The first case of the synthesis rule merges two single-cycle patterns into one pattern. In the next two cases, a single-cycle pattern is added at the left or right end of another pattern. The last case concatenates two patterns that both consist of more than one cycle.
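Under the same assumed carrier, start, period, and cycles representation, the four synthesis cases can be sketched as one function: two patterns with equal carriers are concatenated when the first cycle of the later pattern falls exactly one period after the last cycle of the earlier one. Because a single-cycle pattern has no meaningful period of its own, the sketch takes the period from the other pattern, or from the distance between the two cycles when both are single-cycle. This is an illustration, not the paper's formal rule.

```python
from typing import NamedTuple, Optional


class Pattern(NamedTuple):
    carrier: tuple
    start: int
    period: int  # ignored when cycles == 1
    cycles: int


def synthesize(p: Pattern, q: Pattern) -> Optional[Pattern]:
    """Concatenate two patterns with the same carrier, or return None."""
    if p.carrier != q.carrier:
        return None
    if q.start < p.start:
        p, q = q, p
    if p.cycles == 1 and q.cycles == 1:
        period = q.start - p.start      # two single cycles define the period
    elif p.cycles == 1:
        period = q.period               # single cycle added on the left
    elif q.cycles == 1:
        period = p.period               # single cycle added on the right
    elif p.period == q.period:
        period = p.period               # two multi-cycle patterns
    else:
        return None
    if period <= 0 or q.start != p.start + p.cycles * period:
        return None                     # q must continue p without a gap
    return Pattern(p.carrier, p.start, period, p.cycles + q.cycles)
```

Returning `None` instead of raising keeps the function convenient in loops that try all pairs of patterns.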

If a periodic pattern

If the periodic patterns

The process of discovering periodic patterns in workload traces is implemented through systematic application of the derivation rules. In each step, the rules transform a set of periodic patterns into an equivalent set of patterns. The transformations aim to find periodic patterns that have complex carriers, that are long, that have short periods, and that have the smallest allowed gap length.

We say that a periodic pattern

The process is controlled by the values of parameters

The process of finding homogeneous periodic patterns consists of four steps in which the derivation rules are applied to a workload

Step 1

We start from the application of discovery rule to
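The discovery of homogeneous and complete patterns for a single event, described in the introduction as relatively simple, could for instance look like the scan below: for every candidate period and offset, it records each maximal run of time units, spaced one period apart, in which the event occurs. The names `max_period` and `min_cycles` are hypothetical stand-ins for the paper's control parameters, and the scan does not remove patterns that are implied by others.

```python
from collections import Counter
from typing import List, Tuple


def discover_homogeneous(trace: List[Counter], event: str, max_period: int,
                         min_cycles: int = 2) -> List[Tuple[int, int, int]]:
    """Return (start, period, cycles) for every maximal run of time units
    start, start + period, ... in which `event` occurs at least once."""
    found = []
    n = len(trace)
    for period in range(1, max_period + 1):
        for offset in range(period):
            run_start, run_len = 0, 0
            for t in range(offset, n, period):
                if trace[t][event] > 0:
                    if run_len == 0:
                        run_start = t
                    run_len += 1
                else:
                    # The run ends here; keep it only if it is long enough.
                    if run_len >= min_cycles:
                        found.append((run_start, period, run_len))
                    run_len = 0
            if run_len >= min_cycles:
                found.append((run_start, period, run_len))
    return found
```

For a trace of nine time units in which event `a` occurs only at time units 0, 3, and 6, calling the function with `max_period=3` reports the single pattern `(0, 3, 3)`. Scanning each event separately corresponds to partitioning the historical data into subsets that record the activities of only one event.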

Step 2

For each

Step 3

The split and composition rules are used to transform the patterns like

Step 4

Finally, we apply the synthesis rule to the periodic patterns created so far to obtain longer, incomplete patterns whose gap length is limited by the value of parameter
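One possible reading of this step is sketched below under stated assumptions: complete patterns with the same carrier and period that lie on the same grid of time units are stitched together whenever the number of missing cycles between them does not exceed a hypothetical limit `max_gap`. Each result records its total number of cycle positions and how many of them are missing.

```python
from typing import Dict, List, Tuple


def stitch(patterns: List[Tuple[int, int, int]],
           max_gap: int) -> List[Tuple[int, int, int, int]]:
    """patterns: (start, period, cycles) of complete patterns with the same carrier.

    Returns (start, period, cycles, missing), where `cycles` counts all cycle
    positions of the incomplete pattern, including the `missing` ones.
    """
    # Only patterns with equal period and aligned starts can be stitched.
    by_grid: Dict[Tuple[int, int], List[Tuple[int, int]]] = {}
    for start, period, cycles in patterns:
        by_grid.setdefault((period, start % period), []).append((start, cycles))
    result = []
    for (period, _), runs in sorted(by_grid.items()):
        runs.sort()
        cur_start, cur_cycles, cur_missing = runs[0][0], runs[0][1], 0
        for start, cycles in runs[1:]:
            # Number of cycle positions between the current end and the next run.
            gap = (start - (cur_start + cur_cycles * period)) // period
            if 0 <= gap <= max_gap:
                cur_cycles += gap + cycles
                cur_missing += gap
            else:
                result.append((cur_start, period, cur_cycles, cur_missing))
                cur_start, cur_cycles, cur_missing = start, cycles, 0
        result.append((cur_start, period, cur_cycles, cur_missing))
    return result
```

For example, `(0, 3, 3)` and `(12, 3, 2)` are one missing cycle apart (time unit 9), so with `max_gap=1` they become the incomplete pattern `(0, 3, 6, 1)`.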

The complexity of the algorithm depends on the length n of a workload trace

The process of finding complex periodic patterns first applies the composition rule to the sets of homogeneous and incomplete patterns obtained in the previous steps. Then, the composition rule is applied to the results of the compositions until no new complex and incomplete patterns can be derived. The process is limited by a threshold value

Step 1

The sets

Step 2

Next, we find in G all pairs of periodic patterns

Step 3

For each pair of periodic patterns
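The repeated composition can be sketched as a fixpoint loop. The composition rule used here is an assumption: two complete patterns with the same period and number of cycles, whose starts differ by less than one period, are combined by aligning their carriers at the offset between the starts and taking the multiset union (maximum multiplicity) at each position, which keeps the result valid whenever both inputs are valid. A hypothetical `max_carrier` bound plays the role of the maximal carrier length constraint. The loop terminates because only finitely many patterns can be built from a finite input set.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Set


class Pattern(NamedTuple):
    carrier: tuple  # one multiset per time unit, each a sorted tuple of (event, count)
    start: int
    period: int
    cycles: int


def _freeze(multiset: Counter) -> tuple:
    return tuple(sorted(multiset.items()))


def compose(p: Pattern, q: Pattern, max_carrier: int) -> Optional[Pattern]:
    """Assumed composition rule; returns None when p and q cannot be composed."""
    if q.start < p.start:
        p, q = q, p
    offset = q.start - p.start
    if p.period != q.period or p.cycles != q.cycles or offset >= p.period:
        return None
    length = max(len(p.carrier), offset + len(q.carrier))
    if length > max_carrier:
        return None
    slots = [Counter() for _ in range(length)]
    for j, m in enumerate(p.carrier):
        slots[j] |= Counter(dict(m))
    for j, m in enumerate(q.carrier):
        slots[offset + j] |= Counter(dict(m))  # union keeps the larger multiplicity
    return Pattern(tuple(_freeze(s) for s in slots), p.start, p.period, p.cycles)


def compose_until_fixpoint(patterns: Iterable[Pattern], max_carrier: int) -> Set[Pattern]:
    """Compose pairs repeatedly until no new pattern can be derived."""
    known = set(patterns)
    frontier = set(known)
    while frontier:
        derived = set()
        for p in frontier:
            for q in known:
                r = compose(p, q, max_carrier)
                if r is not None and r not in known:
                    derived.add(r)
        known |= derived
        frontier = derived  # only new patterns can produce new pairs
    return known
```

Restricting each round to pairs that involve at least one newly derived pattern avoids recomposing pairs that were already tried, which matters because the number of candidate pairs grows quadratically with the number of patterns.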

The complexity of the algorithm is equal to

The algorithms described in the previous section are implemented in the environment of a commercial relational database management system. We save an audit trail from processing a sequence of SQL statements against a sample TPC-H benchmark database. Then, we apply the EXPLAIN PLAN statement to transform each SQL statement into an expression of extended relational algebra. The computations of individual relational algebra operations are considered as individual events in a log. Such a log of events, together with the timestamps, is transformed into a workload trace in which the individual operations are grouped within predefined time units. A synthetic workload generator is used to implement periodic processing of sequences of SQL statements. A number of randomly processed SQL statements are incorporated into the workload to evaluate the impact of such statements on the discovery of periodic patterns. All software is implemented in SQL embedded in the host language of the database management system used.

Using a synthetic workload generator allows for precise estimation of the quality of the results, through comparison of the pre-programmed iterative processing of SQL statements with the periodic patterns obtained from the algorithms. The algorithms are applied several times to the same log of events, partitioned each time into time units of a different size. In all cases where the period of iterative processing of an SQL statement was a multiple of the length of the time units, the algorithms returned almost perfect results and precisely detected the expected patterns. In the cases where the period of an iteratively processed sequence of SQL statements was not consistent with the length of the time units, the algorithms returned a larger number of shorter and simpler periodic patterns than expected; however, the results were still within an acceptable quality range. The quality of the results also strongly depends on a careful choice of the parameters that restrict carrier length, period length, etc. Choosing configuration parameters that are too large or too small leads to the identification of accidental patterns not planned within the synthetic workload.

This work describes a new approach to the discovery of complex and incomplete periodic patterns in logs of events. The method is based on the idea that complex and incomplete periodic patterns can be created through systematic discovery, transformation, and composition of simpler patterns. The new approach requires a system of derivation rules that transforms periodic patterns into equivalent ones. Such a system of rules is defined in the paper. We show how the rules can be applied in algorithms that transform a workload trace obtained from a log of events first into simple, homogeneous patterns and then into complex, incomplete ones. A prototype implementation of the algorithms is used to discover periodic patterns among events that correspond to the computations of individual extended relational algebra operations implementing SQL statements processed by a relational database system.

A number of interesting research problems remain to be solved. An important problem is an appropriate choice of the time units used to partition a log of events into multisets of events in a workload trace, because the quality of the discovered patterns depends on the length of the time units. Another interesting problem is the investigation of other systems of derivation rules that may lead to more efficient implementations. A further task is a more efficient and more general implementation of the algorithms that can be used to discover periodic patterns in many other domains. Finally, the class of periodic patterns considered in the paper can be extended to patterns with a more sophisticated specification of the period parameter, allowing for slight variations from cycle to cycle.

Janusz R. Getta, Marcin Zimniak (2015). Discovering Complex Incomplete Periodic Patterns through Logical Derivations. Open Journal of Social Sciences, 3, 8-15. doi: 10.4236/jss.2015.311002