Automated Performance Tuning of Data Management Systems with Materializations and Indices

Automated performance tuning of data management systems offer various benefits such as improved performance, declined administration costs, and reduced workloads to database administrators (DBAs). Currently, DBAs tune the performance of database systems with a little help from the database servers. In this paper, we propose a new technique for automated performance tuning of data management systems. Firstly, we show how to use the periods of low workload time for performance improvements in the periods of high workload time. We demonstrate that extensions of a database system with materialised views and indices when a workload is low may contribute to better performance for a successive period of high workload. The paper proposes several online algorithms for continuous processing of estimated database workloads and for the discovery of the best plan for materialised view and index database extensions and of elimination of the extensions that are no longer needed. We present the results of experiments that show how the proposed automated performance tuning technique improves the overall performance of a data management system.


Introduction
Database management systems used by the business organisations need expert database administrators (DBAs) to configure the systems before a startup time and to tune performance while the systems are running.Currently, to achieve acceptable performance of a database system an administrator has to tune the system by herself/ himself with her/his knowledge of the anticipated workload [1].A sudden increase in the total number of users accessing a database system during peak hours may cause the significant delays, and it may even block entire system.One of the solutions to increase efficiency and reliability of database systems is automated or selfperformance tuning [2].
Self-tuning database management systems automatically create and drop persistent storage structures like indices and materialised views and dynamically re-allocate transient storage resources such as data buffer cache, library cache, and so on.Materialzied views, indices, and better management of cache in transient storage can speed up a database system to reduce response time.
Automated performance tuning of database systems is a challenging problem.At the periods of high workload, the query optimizer has to assign different schedules to execute the query statements that contribute to delays and high processing costs [3].Ad hoc optimisation of query processing with materializations and indexing would reduce those costs.
Materialized views are created as precomputed joins, stored aggregated data, stored summarised data and so on.Materializations are less expensive for joins and aggregations for processing of complex queries of high importance.In-addition, materializations can also increase the speed of query processing in large database systems which include expensive operations such a complex aggregation with joins.Moreover, it improves the performance of query processing by pre-calculating expensive operations on the database before execution and sorting the results in the database.Appropriate use od the materialized view and indices allows for optimal adjustment of the size of data sets to a given collection of queries through partitioning and creating indices on the database.For example, a materialised view restricts relational tables vertically only to the columns needed by a query.An index restricts relational tables horizontally only to the rows that satisfy the conditions used in a query.
The main objective of this research is to invent the algorithms to generate performance tuning task that reduces high workload and arrange the job within limited time and limited I/O cost level (limited workload).The algorithm uses the predicted query workload [4] of the database system.The predicted workload is used to prepare a database system for better performance during future data processing.First, a database is extended with materializations discovered from the analysis queries.Next, the algorithms generate a set of operations and create indices on extended database system from the previous step.
We present the algorithms, which are used to reduce a high workload for a given set of queries.The given workload can be a high or low workload.To reduce the high workload, we choose the best preparation plan before that high workload occurs.Besides, the best plan cannot be greater than the limited workload, and it also execute within the short time.Figure 1 shows the high and low workload structure with limited time and workload level.The limited time t and workload level l located at the period of low-level workload.Our preparation plans will generate at the duration of the low-level workload with the limited t and l to reduce heavy workload.
In this paper, we present two group of algorithms for automated tuning of database management systems.In the first step, we accept a set of queries which suppose to be processed in high workload time.Then, we find the minimized projections for each query and store the projections as the schemas of materialized views.After that, we minimize a set of schemas of materialized views and arrange them from high to low priority.Next, we check whether the selected schemas can execute within the limited workload time or not.Finally, we generate the materialized views.In the second step, we analyse the queries and extract the operations from extended relational algebra expressions.Then, we eliminate the duplicated operations and arrange them by priority level of cost multiply by frequencies of the operations.Next, we estimate the cost of indices and check whether the cost fits into limited workload time.Finally, we create indices on materialized views.
The major contributions of our work are the following.• We show how to discover the schemas needed to execute the queries and eliminate the duplicated one from given collection of queries q 1 , •••, q n for extended database system with materialized view.• We show how to discover common operations from database extended with materialized view.
• We show how to apply indexing technique on database extended with materialized view.• We show how to generate tuning scripts (preparation scripts) within the limited time and workload level.
The paper consists of 6 sections.The previous works in related research areas will discuss in the next section.Sections 3 and 4 explain the algorithms.Experimental results are presented and discussed in Section 5. Finally, Section 6 concludes the paper.

Related Work
The manual processes are needed to achieve high-performance of the database system.It is an enormous effort and the complicated job which database management administrators (DBAs) or users are done [5].There are too many techniques to tune the database system which is proposed by [6].They discussed papers written over the past decade, which are related to self-tuning database systems.At first, they described papers that are using indexes [7] to tune the database.Next, they discussed materialized view and partitioning, which is another technique to tune the database system.Finally, they listed commercial DBMS tools for tuning database system and analysis the pros and cons of those tools.
There are some commercial databases tools like Oracle Database 11 g [3] providing self-tuning tools to tuning database system automatically.This tool accepts a set of SQL statements, then generates execution plans and allows users to choose the best plan.Once users have accepted the recommended plans, the performance tunning progress will complete and execute the tuning scripts.
Moreover, the authors from [8] show that how to self-tuning for the Database system and proposed Fuzzybased tuning.They proposed buffer cache size (BCS), Buffer-Hit-Ration (BHR), and Database size (DBSize) to show how to self-tuning the database system.Their methods might have the problem of limited buffer size.
Furthermore, there are many research works proposed indexing technique to tune the database system.Among them, the authors from [9] proposed two automatic on-line index selection algorithms and one semiautomatic on-line index algorithm.They proposed both automatic and semi-automatic tuning by using best selection of index technique.The semi-automatic on-line index provides the list of the index (recommends the set of indexes) and allows DBA to select to create or drop the index.
Materialized view is one of the best techniques for tuning database.In this book [10] shows how materialized views can speed up queries especially for table joins and the use of aggregates like SUM.Materialized view can reduce the queries cost because it has already had the summary information pre-computed in them.The queries cost means I/O, CPU, and memory costs which are related to processing queries.
Above researchers focused on how to achieve the best performance for given the set of queries, and they show the results that reduced cost and time for given the set of queries.Although, they did not describe the consuming cost and time to execute for tuning scripts or tuning processes.In our research, we discuss how tuning scripts occupy within limited time and cost before high workload occurs.Furthermore, we discuss how workload reduces by using our new techniques.

Materializations
This section presents the algorithms that find the schemas of materialized views to be created in a period of low workload time.The algorithms maximise the performance improvements during a high workload period and minimise the costs of materialised views creation at a low workload period.
Algorithm 1 takes on input a set of queries {q 1 , •••, q n } predicted for the nearest period of high workload.The output of the algorithm is a multiset of schemas of materialized views where r i is a relational table processed by one of the queries q j {q 1 , •••, q n } and {c i1 , •••, c in } is the smallest set of columns from r i needed to process a query q j .Algorithm 1. performs the following actions.1. Make a multiset of schemas of materialized views V empty.2. Iterate from q 1 to q n .Let current query be q i .Let r i1 , •••, r in be the relational tables processed by q i .2.1.Apply EXPLAIN PLAN statement to q i to get query processing plans.Then, for each relational table r i processed in q i find its smallest projection ri[c i1 , •••, c ik ] needed to process a query q i .2.2.Append each projection r i [c i1 , •••, c ik ] found to a multiset of schemas of materializations V. 3. Repeat until all the queries are processed.
As a simple example, we consider the queries q 1 : SELECT a FROM r WHERE b > 10; and q 2 : SELECT s.a, t.b FROM s JOIN t ON s.a = t.b; and q 3 : SELECT r.a, s.a FROM r JOIN s ON r.a = s.a.The output from Algorithm 1 is a multiset Algorithm 2 takes on input a multiset of schemas of materialized views V.The algorithm replaces each schema v i V with a pair v i :f i where f i is a counter how many times the respective materialised view will be used at high workload time when processing the queries {q 1 , •••, q n } and it eliminates the schemas that do not significantly improve performance and the schemas which are to expensive to be created.
Algorithm 2 performs the following actions.1. Replace each v i V with a pair v i :0.

Iterate from
Let current schema of materialised view be v j := r j [X j ] where X i is a set of columns in r j .2.1.1.If X i = X j then increase a frequency counter f i in v i :f i by one and eliminate the duplicated schema v j .2.1.2.If v i v j or vi v j , then estimate the costs cost(v i ) and cost(v j ).Let a limit workload level for materialised views be l v .2.1.2.1.If l v > (cost(v i ) + cost(v j )) then we calculate the profit cost by computing as p = |cost(v i ) − cost(v j )| where p is profit.2.1.2.2.If p (cost(v i ) + cost(v j )) * 0.5), i.e. profit must be greater than 50% of total cost for both schemas then we take both schemas of materialized views (no elimination).Else if cost(v i ) > cost(v j ) then eliminate the v j and extend frequency f i .

Assuming that cost(r[a]) in the previous example is almost the same as cost(r[a, b]) then Algorithm 2 applied to a multiset
Algorithm 3 takes on input a set of schemas of materialized views V' created by Algorithm 2. The algorithm replaces each schema v i :f i V with v i :f i :c i where c i is an estimated cost of creation of materialized view v i .Then, it allocates schemas of materialized views V'' = {v 1 , •••, v n } within the limited workload l v and time t v for materialized views and execute them.Finally, Algorithm 3 verifies whether the queries q 1 , •••, q n benefit from the existence of the views in V''.
Algorithm 3 performs the following actions.1. Replace each v i :f i V with v i :f i :0. 2. Iterate from v 1 to v n in V. Let current schema of materialised view be v i = r i [X i ]:f i :c i where X i is schema of materialized view in r i .2.1.Estimate the cost of materialized view c i and add such value to v i :f i :c i .3. Sort the v i :f i :c i in descending order of c i *f i and update the V. 4. Iterate from v 1 to v n in V. Let current schema of materialized view be v i = r i [X i ]:f i :c i .Let estimated creation time for materialized view be i v , limited workload level be l v , and limited time be t v .4.1.If l v > c i and t v > i v then we create a materialized view v i and remove Iterate until all schemas are allocated into limited workload.5.In the final step, EXPLAIN PLAN statement is used to find the query processing plans for q 1 , •••, q n and to verify whether all materialized views created in the previous step are used by the queries.

Indexing
This section presents the algorithms that find the best index to be created in a low workload time.The algorithms improve performance in a database system and reduce the high workload period.Algorithm 4 processes a set of queries {q 1 , •••, q n }.Then, we transform operations 1 , •••, n into a sequence of sets of statements S = <S i , •••, S n >.Each statement in a set S i , for I = 1, •••, n takes a form s := (x y) where x, y are the arguments of the operation, and s is a result of operation (x y).The arguments x and y can be the database relational tables or the results of operations which computed earlier.
Algorithm 4 performs the following actions.1.Let a sequence of sets of statements S be empty.2. Formulate set of queries {q 1 , •••, q n } as expressions <e 1 , •••, e n > of an extended relational algebra.3. Iterate from e 1 to e n and reduce expressions into a single name of the temporary result then stop the iteration.
3.1.Let the new current set of statements be S i .3.2.Find all operations like (x y) in the expressions e 1 , •••, e n .3.3.Take each operation from the previous step and transforms into a form like s ij := (x y) and added into the current set of statements S i .3.4.Get all operations like (x y) from expressions e i , •••, e n and store into temporary result like s ij .Then append into current set Si.
As a simple example, consider the queries are q 1 , q 2 and q 3 .Their processing plans expressed as the expressions of extended relational algebras are q 1 :(a 1 r), q 2 :(a 1 (s 2 t)), q 3 :(b 1 (s 2 t)) where 1 and 2 are the operations and a, b, r, t, and s are the relational tables or the results of operations computed earlier.The transformation results of sets of statements are <{s 11 := (a 1 r), s 12 := (s 2 t), s 13 := (s 2 t)}, {s 21 := (a 1 s 12 ), s 22 := (b 1 s 13 )}>.
Algorithm 5 processes a sequence of sets of statements S. The algorithm appends each operation with a frequency like (x i y):f i where fi is a counter of how many time appear i in S. Then algorithm finds the common operations and eliminates one from a sequence of sets of statements and increase the frequency.
Algorithm 5 performs the following actions.1. Make all frequency be 0. 2. Iterate over the sets of statements <S 1 , •••, S p > in S and let the current set statements be S i .
2.1.For each statements from S i and let current statement be s ii := (X i ):f i where X i is (x i y).
2.1.1.For each statemets from S i − {s ii } and let current statement be s ij := (X j ):f j .2.1.1.1.If X i = X j then eliminate s ij and incease the counter f i = f i + 1. 2.1.1.2.Get eliminated statement s ij and replace all s ij with s ii in S.
Algorithm 6 processes updated a sequence of sets of statements S. The algorithm changes each statement s ii :f i S i with s ii :f i :o i where o i is estimated cost for index of operation i .Then, it allocates the indexes if they can fit into low workload l i and time t i .Algorithm 6 performs the following actions.1. Replace each s ii :f i with s ii :f i :0 in S.

Experiment
In our experiments, we show that our results are better than original respond time and cost.We used a synthetic TPC-H 4 GB benchmark relational database [11] which ready built relational tables by the user applications.We used commercial database software which includes procedure and packages to estimate the size of materialized view to compute the cost and virtual index to estimate the complexity of the operation for indexes.As a high workload, we choose eight complex queries from TPC's Template Set.We start our experiment by analyzing the queries and getting the schemas for materialized views which are necessary for each query.Then, we remove the duplicated set of schemas and compute the estimated cost and time for materialized view.Next, we generate the materialized view if they fit into limited workload time l m .After that, we verify weather all created materialized views are used by queries or not.The second part of our experiment analyzing the queries.Then, we get the sequence of the set of statements of operations.Next, we remove the duplicated operations and append the frequencies.After that, we create indexes when they can fit into limited workload time l i .
We did our experiments for several times to get the reliable results.Our experiments show that our algorithms achieve the better result than the original query.Table 1 shows that average percentages profit results of our algorithm is the best by comparing with the original one.The profits we achieve for executions times are range between 20% -65% and the costs are range between 66% -89%.

Summary and Conclusions
In this paper, we present the algorithms for automated Performance Tuning of Database System by using Materialized View and Indexing.The input is a set of low and high workloads of queries.The algorithm focuses on reducing high workload.To reduce the high workload, we execute tuning script (preparation stage) on low workload time.There are two main steps in this paper.One is how to create materialized views within the limited workload l v and time t v .The other is how to execute indexes on materialized views within the limited workload l i and time t i .
In the first stage, we analysis the queries and extract schemas of materialized views then, remove the duplicated schemas.Next, we execute the materialized view when it is fitted into limited workload time l v .The materialized view can extract the database schemas into smaller pieces of database schemas.It can store summarized data, precomputed joins with or without aggregations.Besides, it is suitable for large or important queries because it can eliminate the overhead associated with expensive joins and aggregations.Furthermore, it allows us to create indexes with minimal creation time and cost.

Figure 1 .
Figure 1.Sample circle for low and high workload. 1 , •••, S n > in S. Let current set of statements be S i .Let estimated creation time for index be i i , limited workload level be l i and limited time be t i .2.1.For each statement in S i like s ii := (x i y):f i :o i use a specification (x i y) and the estimated cost of index o i for operation i then, add such value to s ii :f i :o i .2.2.Sort the s ii :f i :o i in descending order of o i *f i and update the S i .2.3.Iterate each statement in S i .Let current statement be s ii := (x i y):f i :o i .2.3.1.The algorithm discovers the indexing by searching the type of operations like SELECTION, PROJECTION, NATURAL JOIN, SEMIJOIN, and so on.If i is not projection then processes below because we no need to create index on projections.If l i > o i and t i > i i then create the index.Then update the l i = l i − o i .

Table 1 .
Compare original queries execution time and cost with tuning queries execution time and cost: experiemnt results for q1-q8.