
Automated performance tuning of data management systems offers various benefits such as improved performance, lower administration costs, and a reduced workload for database administrators (DBAs). Currently, DBAs tune the performance of database systems with little help from the database servers. In this paper, we propose a new technique for automated performance tuning of data management systems. First, we show how to use periods of low workload to improve performance in periods of high workload. We demonstrate that extending a database system with materialised views and indices while the workload is low may contribute to better performance in a subsequent period of high workload. The paper proposes several online algorithms for continuous processing of estimated database workloads, for the discovery of the best plan for extending a database with materialised views and indices, and for the elimination of extensions that are no longer needed. We present the results of experiments that show how the proposed automated performance tuning technique improves the overall performance of a data management system.

Database management systems used by business organisations need expert database administrators (DBAs) to configure the systems before startup and to tune performance while the systems are running. Currently, to achieve acceptable performance of a database system, an administrator has to tune the system manually, relying on his or her knowledge of the anticipated workload [

Self-tuning database management systems automatically create and drop persistent storage structures like indices and materialised views and dynamically re-allocate transient storage resources such as the data buffer cache, the library cache, and so on. Materialized views, indices, and better management of the cache in transient storage can speed up a database system and reduce response time.

Automated performance tuning of database systems is a challenging problem. In periods of high workload, the query optimizer has to assign different schedules to the query statements, which contributes to delays and high processing costs [

Materialized views are created as precomputed joins, stored aggregated data, stored summarised data, and so on. Materializations make joins and aggregations less expensive when processing complex queries of high importance. In addition, materializations can increase the speed of query processing in large database systems whose workloads include expensive operations such as complex aggregations with joins. Moreover, they improve the performance of query processing by pre-calculating expensive operations before execution and storing the results in the database. Appropriate use of materialized views and indices allows the size of the data sets to be adjusted to a given collection of queries through partitioning and creating indices on the database. For example, a materialised view restricts relational tables vertically only to the columns needed by a query. An index restricts relational tables horizontally only to the rows that satisfy the conditions used in a query.

The main objective of this research is to devise algorithms that generate performance tuning tasks which reduce a high workload, and that schedule those tasks within a limited time and a limited I/O cost level (limited workload). The algorithms use the predicted query workload [

We present the algorithms that reduce a high workload for a given set of queries. The given workload can be high or low. To reduce the high workload, we choose the best preparation plan before that high workload occurs. In addition, the cost of the best plan cannot exceed the limited workload, and the plan must also execute within a short time.

In this paper, we present two groups of algorithms for automated tuning of database management systems. In the first step, we accept a set of queries that are expected to be processed in a period of high workload. Then, we find the minimized projections for each query and store the projections as the schemas of materialized views. After that, we minimize the set of schemas of materialized views and arrange them from high to low priority. Next, we check whether the selected schemas can be created within the limited workload time. Finally, we generate the materialized views. In the second step, we analyse the queries and extract the operations from extended relational algebra expressions. Then, we eliminate the duplicated operations and arrange the remaining ones by priority, computed as the cost of an operation multiplied by its frequency. Next, we estimate the cost of the indices and check whether the cost fits into the limited workload time. Finally, we create indices on the materialized views.

The major contributions of our work are the following.

・ We show how to discover the schemas needed to execute the queries and how to eliminate the duplicated ones from a given collection of queries q_{1}, ∙∙∙, q_{n} for a database system extended with materialized views.

・ We show how to discover common operations in a database extended with materialized views.

・ We show how to apply an indexing technique to a database extended with materialized views.

・ We show how to generate tuning scripts (preparation scripts) within the limited time and workload level.

The paper consists of 6 sections. Previous work in related research areas is discussed in the next section. Sections 3 and 4 explain the algorithms. Experimental results are presented and discussed in Section 5. Finally, Section 6 concludes the paper.

Manual processes are needed to achieve high performance of a database system. This is an enormous and complicated job that has to be done by database administrators (DBAs) or users [

There are some commercial database tools like Oracle Database 11g [

Moreover, the authors from [

Furthermore, many research works have proposed indexing techniques to tune a database system. Among them, the authors from [

Materialized views are one of the best techniques for tuning a database. In this book [

The researchers above focused on how to achieve the best performance for a given set of queries, and they report results with reduced cost and time for that set of queries. However, they did not describe the cost and time consumed by executing the tuning scripts or tuning processes. In our research, we discuss how tuning scripts can be fitted within a limited time and cost before a high workload occurs. Furthermore, we discuss how the workload is reduced by using our new techniques.

This section presents the algorithms that find the schemas of materialized views to be created in a period of low workload. The algorithms maximise the performance improvements during a high workload period and minimise the costs of creating materialised views in a low workload period.

Algorithm 1 takes as input a set of queries {q_{1}, ∙∙∙, q_{n}} predicted for the nearest period of high workload. The output of the algorithm is a multiset of schemas of materialized views V = {v_{1}, ∙∙∙, v_{m}} that improve the performance of query processing. Each v_{i} ∈ V is a schema of a materialized view of the form v_{i} = r_{i}[c_{i1}, ∙∙∙, c_{ik}], where r_{i} is a relational table processed by one of the queries q_{j} ∈ {q_{1}, ∙∙∙, q_{n}} and {c_{i1}, ∙∙∙, c_{ik}} is the smallest set of columns from r_{i} needed to process the query q_{j}.

Algorithm 1 performs the following actions.

1. Make the multiset of schemas of materialized views V empty.

2. Iterate from q_{1} to q_{n}. Let the current query be q_{i} and let r_{i1}, ∙∙∙, r_{in} be the relational tables processed by q_{i}.

2.1. Apply the EXPLAIN PLAN statement to q_{i} to get its query processing plan. Then, for each relational table r_{i} processed by q_{i}, find its smallest projection r_{i}[c_{i1}, ∙∙∙, c_{ik}] needed to process the query q_{i}.

2.2. Append each projection r_{i}[c_{i1}, ∙∙∙, c_{ik}] found to the multiset of schemas of materializations V.

3. Repeat until all the queries are processed.

As a simple example, we consider the queries q_{1}: SELECT a FROM r WHERE b > 10; and q_{2}: SELECT s.a, t.b FROM s JOIN t ON s.a = t.b; and q_{3}: SELECT r.a, s.a FROM r JOIN s ON r.a = s.a. The output from Algorithm 1 is a multiset V = {r[a, b], r[a], s[a], t[b], s[a]}.
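The collection step of Algorithm 1 can be sketched in Python as follows. This is only an illustration under a simplifying assumption: the columns that each query needs from each table are supplied by hand, in a hypothetical `query_columns` structure, instead of being extracted from EXPLAIN PLAN output. The function name `collect_schemas` and the representation of a schema as a `(table, frozenset_of_columns)` pair are likewise assumptions of this sketch.

```python
from collections import Counter

def collect_schemas(query_columns):
    """Build the multiset V of materialized view schemas.

    query_columns holds one dict per query, mapping a table name to the
    smallest set of columns the query needs from that table (assumed to
    be known already; the paper derives it from EXPLAIN PLAN).
    """
    V = []  # step 1: start with an empty multiset
    for per_query in query_columns:  # step 2: iterate over the queries
        for table, columns in per_query.items():
            # step 2.2: append the projection table[columns] to V
            V.append((table, frozenset(columns)))
    return V

# The three example queries, analysed by hand.
query_columns = [
    {"r": {"a", "b"}},         # q1: SELECT a FROM r WHERE b > 10
    {"s": {"a"}, "t": {"b"}},  # q2: SELECT s.a, t.b FROM s JOIN t ON s.a = t.b
    {"r": {"a"}, "s": {"a"}},  # q3: SELECT r.a, s.a FROM r JOIN s ON r.a = s.a
]
V = collect_schemas(query_columns)
multiset = Counter(V)
```

Duplicates such as s[a] are kept on purpose, because Algorithm 2 counts them.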

Algorithm 2 takes as input a multiset of schemas of materialized views V. The algorithm replaces each schema v_{i} ∈ V with a pair v_{i}:f_{i}, where f_{i} counts how many times the respective materialised view will be used at high workload time when processing the queries {q_{1}, ∙∙∙, q_{n}}. It also eliminates the schemas that do not significantly improve performance and the schemas that are too expensive to create.

Algorithm 2 performs the following actions.

1. Replace each v_{i} ∈ V with a pair v_{i}:0.

2. Iterate from v_{1} to v_{m} in V. Let the current schema of a materialised view be v_{i} = r_{i}[X_{i}], where X_{i} is a set of columns in r_{i}.

2.1. Iterate from v_{1} to v_{m} in V - {v_{i}}. Let the current schema of a materialised view be v_{j} = r_{j}[X_{j}], where X_{j} is a set of columns in r_{j}.

2.1.1. If r_{i} = r_{j} and X_{i} = X_{j}, then increase the frequency counter f_{i} in v_{i}:f_{i} by one and eliminate the duplicated schema v_{j}.

2.1.2. If v_{i} ⊂ v_{j} or v_{i} ⊃ v_{j}, then estimate the costs cost(v_{i}) and cost(v_{j}). Let the limit on the workload level for materialised views be l_{v}.

2.1.2.1. If l_{v} > (cost(v_{i}) + cost(v_{j})), then calculate the profit p = |cost(v_{i}) − cost(v_{j})|.

2.1.2.2. If p > (cost(v_{i}) + cost(v_{j})) * 0.5, i.e. the profit is greater than 50% of the total cost of both schemas, then keep both schemas of materialized views (no elimination). Otherwise, if cost(v_{i}) > cost(v_{j}), then eliminate v_{j} and increase the frequency f_{i}.

Assuming that cost(r[a]) in the previous example is almost the same as cost(r[a, b]) then Algorithm 2 applied to a multiset V = {r[a, b], r[a], s[a], t[b], s[a]} returns a set V' = {r[a, b]:2, s[a]:2, t[b]:1}.
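A minimal Python sketch of Algorithm 2 is given below. It rests on several assumptions that are not in the text: the costs come from a hypothetical lookup table `cost`; frequency counters start at one, so that the output agrees with the worked example; when a pair of nested schemas does not fit within l_{v}, both schemas are left untouched; and the frequency of an eliminated schema is added to the one that is kept.

```python
from collections import Counter
from itertools import combinations

def reduce_schemas(V, cost, limit):
    """Merge duplicated and nested schemas of materialized views.

    V     : list of (table, frozenset_of_columns) pairs, with duplicates
    cost  : dict mapping a schema to its estimated cost (hypothetical)
    limit : workload limit l_v for materialized views
    """
    freq = Counter(V)  # step 2.1.1: duplicates collapse into a counter
    merged = True
    while merged:
        merged = False
        for vi, vj in combinations(list(freq), 2):
            same_table = vi[0] == vj[0]
            nested = vi[1] < vj[1] or vj[1] < vi[1]  # proper subset test
            if not (same_table and nested):
                continue
            ci, cj = cost[vi], cost[vj]
            if limit <= ci + cj:
                continue  # pair does not fit; left as is (assumption)
            profit = abs(ci - cj)
            if profit > 0.5 * (ci + cj):
                continue  # step 2.1.2.2: keep both schemas
            # otherwise drop the cheaper schema, transfer its frequency
            keep, drop = (vi, vj) if ci > cj else (vj, vi)
            freq[keep] += freq.pop(drop)
            merged = True
            break  # the dictionary changed, so restart the pair scan
    return dict(freq)

r_ab = ("r", frozenset({"a", "b"}))
r_a = ("r", frozenset({"a"}))
s_a = ("s", frozenset({"a"}))
t_b = ("t", frozenset({"b"}))

V = [r_ab, r_a, s_a, t_b, s_a]
cost = {r_ab: 100, r_a: 95, s_a: 40, t_b: 30}  # illustrative values only
reduced = reduce_schemas(V, cost, limit=1000)
```

With these illustrative costs, cost(r[a]) is almost the same as cost(r[a, b]), so r[a] is merged into r[a, b] and the result corresponds to V' = {r[a, b]:2, s[a]:2, t[b]:1}.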

Algorithm 3 takes as input the set of schemas of materialized views V' created by Algorithm 2. The algorithm replaces each schema v_{i}:f_{i} ∈ V' with v_{i}:f_{i}:c_{i}, where c_{i} is the estimated cost of creating the materialized view v_{i}. Then, it selects the schemas of materialized views V'' = {v_{1}, ∙∙∙, v_{n}} that fit within the limited workload l_{v} and time t_{v} for materialized views and creates them. Finally, Algorithm 3 verifies whether the queries q_{1}, ∙∙∙, q_{n} benefit from the existence of the views in V''.

Algorithm 3 performs the following actions.

1. Replace each v_{i}:f_{i} ∈ V' with v_{i}:f_{i}:0.

2. Iterate from v_{1} to v_{n} in V'. Let the current schema of a materialised view be v_{i} = r_{i}[X_{i}]:f_{i}:c_{i}, where X_{i} is a set of columns in r_{i}.

2.1. Estimate the cost c_{i} of the materialized view and store that value in v_{i}:f_{i}:c_{i}.

3. Sort the triples v_{i}:f_{i}:c_{i} in descending order of c_{i}*f_{i} and update V'.

4. Iterate from v_{1} to v_{n} in V'. Let the current schema of a materialized view be v_{i} = r_{i}[X_{i}]:f_{i}:c_{i}. Let the estimated creation time of the materialized view be i_{v}, the limited workload level be l_{v}, and the limited time be t_{v}.

4.1. If l_{v} > c_{i} and t_{v} > i_{v}, then create the materialized view v_{i} and remove v_{i} from V'. Next, update l_{v} = l_{v} − c_{i}.

4.2. Repeat until all schemas that fit have been allocated within the limited workload.

5. In the final step, use the EXPLAIN PLAN statement to find the query processing plans for q_{1}, ∙∙∙, q_{n} and to verify whether all materialized views created in the previous step are used by the queries.
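Steps 3 and 4 of Algorithm 3 amount to a greedy allocation, which the following Python sketch illustrates. The cost and time figures are invented for the example, the function name `allocate_views` is hypothetical, and the actual creation of a view is replaced by appending its name to a list. As in step 4.1, only the workload limit is decreased after a view is selected.

```python
def allocate_views(views, workload_limit, time_limit):
    """Pick the views to create within the limits l_v and t_v.

    views: list of (schema, frequency, cost, creation_time) tuples,
    where cost and creation_time are estimates (assumed given).
    """
    # step 3: highest cost * frequency first
    ranked = sorted(views, key=lambda v: v[1] * v[2], reverse=True)
    created = []
    for schema, freq, cost, creation_time in ranked:
        # step 4.1: the view must fit both the workload and time limits
        if workload_limit > cost and time_limit > creation_time:
            created.append(schema)   # stands in for creating the view
            workload_limit -= cost   # l_v = l_v - c_i
    return created, workload_limit

views = [
    ("t[b]", 1, 30, 2),
    ("r[a, b]", 2, 100, 5),
    ("s[a]", 2, 40, 2),
]
created, remaining = allocate_views(views, workload_limit=150, time_limit=10)
```

In this run r[a, b] and s[a] are selected, while t[b] is skipped because the remaining workload budget (10) is smaller than its cost (30).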

This section presents the algorithms that find the best indices to be created in a period of low workload. The algorithms improve performance in a database system and reduce the workload in the high workload period.

Algorithm 4 processes a set of queries {q_{1}, ∙∙∙, q_{n}}. It transforms the operations θ_{1}, ∙∙∙, θ_{n} used by the queries into a sequence of sets of statements S = <S_{1}, ∙∙∙, S_{n}>. Each statement in a set S_{i}, for i = 1, ∙∙∙, n, takes the form s := (x θ y), where x and y are the arguments of the operation θ and s is the result of the operation (x θ y). The arguments x and y can be relational tables of the database or the results of operations computed earlier.

Algorithm 4 performs the following actions.

1. Let the sequence of sets of statements S be empty.

2. Formulate the set of queries {q_{1}, ∙∙∙, q_{n}} as expressions <e_{1}, ∙∙∙, e_{n}> of an extended relational algebra.

3. Iterate over e_{1}, ∙∙∙, e_{n} until every expression is reduced to a single name of a temporary result, then stop the iteration.

3.1. Let the new current set of statements be S_{i}.

3.2. Find all operations of the form (x θ y) in the expressions e_{1}, ∙∙∙, e_{n}.

3.3. Transform each operation found in the previous step into a statement of the form s_{ij} := (x θ y) and add it to the current set of statements S_{i}.

3.4. Replace each such operation (x θ y) in the expressions e_{1}, ∙∙∙, e_{n} with the name of its temporary result s_{ij}. Then append the current set S_{i} to S.

As a simple example, consider the queries q_{1}, q_{2} and q_{3}. Their processing plans, expressed as expressions of the extended relational algebra, are q_{1}: (a θ_{1} r), q_{2}: (a θ_{1} (s θ_{2} t)), q_{3}: (b θ_{1} (s θ_{2} t)), where θ_{1} and θ_{2} are the operations and a, b, r, s, and t are relational tables or the results of operations computed earlier. The transformation yields the sequence of sets of statements <{s_{11} := (a θ_{1} r), s_{12} := (s θ_{2} t), s_{13} := (s θ_{2} t)}, {s_{21} := (a θ_{1} s_{12}), s_{22} := (b θ_{1} s_{13})}>.
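The transformation performed by Algorithm 4 can be sketched in Python as shown below. The sketch assumes that every expression is a nested tuple `(x, op, y)` with table names given as strings, that operations are binary, and that temporary results are named `s<i>_<j>` (the underscore only avoids ambiguous names once an index exceeds 9). The function name `to_statements` and the operation labels `op1` and `op2` are placeholders introduced here.

```python
def to_statements(expressions):
    """Turn algebra expressions into a sequence of sets of statements.

    Each round names every operation whose two arguments are already
    plain names, then substitutes that name into the expression.
    """
    exprs = list(expressions)
    levels = []  # the sequence S of sets of statements
    while any(isinstance(e, tuple) for e in exprs):
        i = len(levels) + 1
        level = []  # the current set S_i

        def reduce(e):
            if not isinstance(e, tuple):
                return e  # already a table or a temporary result
            x, op, y = e
            if isinstance(x, str) and isinstance(y, str):
                name = f"s{i}_{len(level) + 1}"
                level.append((name, (x, op, y)))  # s_ij := (x op y)
                return name
            # descend; the enclosing operation waits for a later round
            return (reduce(x), op, reduce(y))

        exprs = [reduce(e) for e in exprs]
        levels.append(level)
    return levels

e1 = ("a", "op1", "r")
e2 = ("a", "op1", ("s", "op2", "t"))
e3 = ("b", "op1", ("s", "op2", "t"))
S = to_statements([e1, e2, e3])
```

For the three example expressions the first set holds the statements for (a op1 r) and the two copies of (s op2 t), and the second set holds the two statements that consume those temporary results.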

Algorithm 5 processes a sequence of sets of statements S. The algorithm annotates each operation with a frequency, written (x θ_{i} y):f_{i}, where f_{i} counts how many times the operation appears in S. The algorithm then finds the common operations, eliminates the duplicates from the sequence of sets of statements, and increases the frequency of the operation that is kept.

Algorithm 5 performs the following actions.

1. Set all frequencies to 0.

2. Iterate over the sets of statements <S_{1}, ∙∙∙, S_{p}> in S and let the current set of statements be S_{i}.

2.1. For each statement in S_{i}, let the current statement be s_{ii} := (X_{i}):f_{i}, where X_{i} is (x θ_{i} y).

2.1.1. For each statement in S_{i} − {s_{ii}}, let the current statement be s_{ij} := (X_{j}):f_{j}.

2.1.1.1. If X_{i} = X_{j}, then eliminate s_{ij} and increase the counter f_{i} = f_{i} + 1.

2.1.1.2. Replace all occurrences of the eliminated statement s_{ij} with s_{ii} in S.

According to Algorithm 5, s_{12} and s_{13} are the same, so the algorithm eliminates s_{13} and increases the frequency of s_{12} by one. After that, the algorithm replaces s_{13} with s_{12} in S. The algorithm then returns S = <{s_{11} := (a θ_{1} r):1, s_{12} := (s θ_{2} t):2}, {s_{21} := (a θ_{1} s_{12}), s_{22} := (b θ_{1} s_{12})}>.
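The elimination of common operations in Algorithm 5 can be sketched in Python as follows. The input uses the same assumed representation as before, a list of levels where each statement is a pair `(name, (x, op, y))`. Frequencies start at one in this sketch so that the output agrees with the worked example, and the function name `merge_common` is hypothetical.

```python
def merge_common(levels):
    """Eliminate duplicated operations within each set of statements.

    Returns, per level, a list of (name, operation, frequency) triples.
    """
    rename = {}  # eliminated statement name -> surviving statement name
    result = []
    for level in levels:
        seen = {}  # operation -> [surviving name, frequency]
        for name, (x, op, y) in level:
            # step 2.1.1.2: refer to survivors of earlier eliminations
            key = (rename.get(x, x), op, rename.get(y, y))
            if key in seen:
                seen[key][1] += 1            # step 2.1.1.1: f_i = f_i + 1
                rename[name] = seen[key][0]  # the duplicate is dropped
            else:
                seen[key] = [name, 1]
        result.append([(n, k, f) for k, (n, f) in seen.items()])
    return result

S = [
    [("s1_1", ("a", "op1", "r")),
     ("s1_2", ("s", "op2", "t")),
     ("s1_3", ("s", "op2", "t"))],
    [("s2_1", ("a", "op1", "s1_2")),
     ("s2_2", ("b", "op1", "s1_3"))],
]
merged = merge_common(S)
```

Here the second copy of (s op2 t) is dropped, the surviving statement gets frequency 2, and the statement that used the dropped result now refers to the surviving one.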

Algorithm 6 processes the updated sequence of sets of statements S. The algorithm replaces each statement s_{ii}:f_{i} ∈ S_{i} with s_{ii}:f_{i}:o_{i}, where o_{i} is the estimated cost of an index for the operation θ_{i}. Then, it creates the indexes that fit into the limited workload l_{i} and time t_{i}.

Algorithm 6 performs the following actions.

1. Replace each s_{ii}:f_{i} with s_{ii}:f_{i}:0 in S.

2. Iterate over the sequence of sets <S_{1}, ∙∙∙, S_{n}> in S. Let the current set of statements be S_{i}. Let the estimated creation time of an index be i_{i}, the limited workload level be l_{i}, and the limited time be t_{i}.

2.1. For each statement in S_{i} of the form s_{ii} := (x θ_{i} y):f_{i}:o_{i}, use the specification (x θ_{i} y) to estimate the cost o_{i} of an index for the operation θ_{i}, and store that value in s_{ii}:f_{i}:o_{i}.

2.2. Sort the statements s_{ii}:f_{i}:o_{i} in descending order of o_{i}*f_{i} and update S_{i}.

2.3. Iterate over the statements in S_{i}. Let the current statement be s_{ii} := (x θ_{i} y):f_{i}:o_{i}.

2.3.1. The algorithm decides on an index by examining the type of the operation, such as SELECTION, PROJECTION, NATURAL JOIN, SEMIJOIN, and so on. If θ_{i} is a projection, the statement is skipped, because there is no need to create an index for a projection. Otherwise, if l_{i} > o_{i} and t_{i} > i_{i}, then create the index and update l_{i} = l_{i} − o_{i}.
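The index allocation of Algorithm 6 can be sketched in Python in the same spirit. The costs, creation times, and statement names are made up for illustration, the function name `allocate_indexes` is hypothetical, creating an index is replaced by recording the statement name, and the workload budget l_{i} is assumed to be shared across all sets of statements, which the text does not state explicitly.

```python
def allocate_indexes(levels, workload_limit, time_limit):
    """Choose the operations that get an index within l_i and t_i.

    levels: list of lists of
            (name, operation_type, frequency, cost, creation_time).
    """
    created = []
    for level in levels:
        # step 2.2: highest cost * frequency first
        ranked = sorted(level, key=lambda s: s[2] * s[3], reverse=True)
        for name, kind, freq, cost, creation_time in ranked:
            if kind == "PROJECTION":
                continue  # step 2.3.1: no index for projections
            if workload_limit > cost and time_limit > creation_time:
                created.append(name)    # stands in for CREATE INDEX
                workload_limit -= cost  # l_i = l_i - o_i
    return created, workload_limit

levels = [
    [("s11", "SELECTION", 1, 30, 2),
     ("s12", "NATURAL JOIN", 2, 50, 4),
     ("s13", "PROJECTION", 3, 20, 1)],
    [("s21", "SEMIJOIN", 1, 40, 3)],
]
indexed, remaining = allocate_indexes(levels, workload_limit=100, time_limit=5)
```

In this run the join s12 is indexed first, the projection s13 is skipped, the selection s11 still fits, and the semijoin s21 no longer fits within the remaining budget of 20.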

In our experiments, we show that our results are better than the original response time and cost. We used a synthetic TPC-H 4 GB benchmark relational database [

We start our experiment by analyzing the queries and obtaining the schemas of the materialized views needed by each query. Then, we remove the duplicated schemas and compute the estimated cost and time for each materialized view. Next, we generate the materialized views if they fit into the limited workload l_{v}. After that, we verify whether all created materialized views are used by the queries. In the second part of our experiment, we analyze the queries again and obtain the sequence of sets of statements of operations. Next, we remove the duplicated operations and record their frequencies. After that, we create indexes when they fit into the limited workload l_{i}.

We repeated our experiments several times to obtain reliable results. Our experiments show that our algorithms achieve better results than the original queries.

In this paper, we present algorithms for automated performance tuning of database systems by using materialized views and indexing. The input is a set of low and high workloads of queries. The algorithms focus on reducing the high workload. To reduce the high workload, we execute a tuning script (preparation stage) in a period of low workload. There are two main steps in this paper. One is how to create materialized views within the limited workload l_{v} and time t_{v}. The other is how to create indexes on materialized views within the limited workload l_{i} and time t_{i}.

In the first stage, we analyse the queries and extract the schemas of materialized views, and then remove the duplicated schemas. Next, we create a materialized view when it fits into the limited workload l_{v}. Materialized views split the database schemas into smaller pieces. They can store summarized data and precomputed joins with or without aggregations. Besides, they are suitable for large or important queries because they can eliminate the overhead associated with expensive joins and aggregations. Furthermore, they allow us to create indexes with minimal creation time and cost.

Name | Description | Execution Time | I/O Cost | Name | Description | Execution Time | I/O Cost |
---|---|---|---|---|---|---|---|
q_{1} | Original | 00:24.00 | 84,436 | q_{2} | Original | 00:38.10 | 82,436 |
q_{1} | Tuning | 00:07.70 | 8,835 | q_{2} | Tuning | 00:16.60 | 9,330 |
q_{1} | Profit% | 63.89% | 89.28% | q_{2} | Profit% | 56.54% | 88.68% |
q_{3} | Original | 00:34.30 | 124,928 | q_{4} | Original | 01:47.90 | 158,720 |
q_{3} | Tuning | 00:11.40 | 41,587 | q_{4} | Tuning | 00:40.10 | 41,597 |
q_{3} | Profit% | 66.81% | 66.71% | q_{4} | Profit% | 44.92% | 73.79% |
q_{5} | Original | 01:06.30 | 131,072 | q_{6} | Original | 02:03.60 | 159,744 |
q_{5} | Tuning | 00:44.90 | 41,594 | q_{6} | Tuning | 01:06.90 | 41,591 |
q_{5} | Profit% | 32.21% | 68.27% | q_{6} | Profit% | 45.83% | 73.96% |
q_{7} | Original | 01:32.70 | 155,648 | q_{8} | Original | 01:15.20 | 135,168 |
q_{7} | Tuning | 00:51.50 | 41,602 | q_{8} | Tuning | 00:59.40 | 41,596 |
q_{7} | Profit% | 44.31% | 73.27% | q_{8} | Profit% | 20.71% | 69.23% |

In the second stage, we extract the operations from the queries. Then, we eliminate duplicated operations and create indexes over the materialized views when they fit within the limited workload l_{i}. There are different ways to choose the best index and, generally, an index is created based on the WHERE clause. In this paper, we make the decision based on operations like SELECTION, PROJECTION, NATURAL JOIN, SEMIJOIN, and so on. Finally, we set up our experiments based on our algorithms, and the results show that our methods achieve better execution time and cost than the original queries.

Nan N. Noon, Janusz R. Getta (2016) Automated Performance Tuning of Data Management Systems with Materializations and Indices. Journal of Computer and Communications, 04, 46-52. doi: 10.4236/jcc.2016.45007