Stochastic Modeling of Database Backup Policy for a Computer System

As computer systems have developed in this highly information-oriented society, database security has become a very important problem, and backup strategies need to be made more efficient and safe. The image copy method has been used as the simplest and most dependable recovery mechanism for media failure. However, this method incurs high overhead costs for massive data transmission and requires much processing time during the normal operation of the database. To cover these weak points, incremental and full backup methods are adopted before the updated tracks reach a predetermined level. Moreover, when the number of full backup files exceeds a predetermined level, we stop the incremental and full backups and switch to the image copy. This paper applies the cumulative damage model to the backup of files in a database system, by regarding updates as damage shocks, database failures as failure shocks, and dumped files as damage. It considers the tradeoff among the overhead costs of the image copy, incremental and full backup methods, and discusses analytically an optimal policy for the image copy backup interval. Finally, numerical examples are given for the case of a Poisson process and exponential distributions.


Introduction
In this highly information-oriented society, database security [1] in computer systems has become a very important problem. Recovery techniques in a database management system have to be prepared in advance for emergencies, using backup, checkpointing, reorganization and fault-tolerant technologies [2,3]. Backup services and their protection techniques have been studied and implemented by many researchers and engineers, e.g., recent work on Peer-to-Peer (P2P) backup systems [4][5][6] and on backup and protection schemes for distributed systems [7,8].
Among the backup techniques for database systems, image copy [9] is the simplest and most dependable method to ensure the safety of data: backup copies of all files are kept in other places and are retrieved if the files in the original secondary media are broken. However, because it stores all files, this method requires much time, storage and cost when files become large. Frequent image copy backup is therefore not reasonable, and it is often restricted to a weekly or monthly schedule, although the increasing speed and capacity of backup media could make overnight backup a more realistic proposition. To make backup copies efficiently, we might dump only the files that have changed since the last backup, which is called an incremental backup [9]. This would significantly lessen both the time and the size of a backup. Further, recovery techniques for database failures and backup schemes for broken hard disks have been studied [2,10,11].
However, only a few studies focus on the scheduling problems of backup and recovery methods, although many related techniques are available. We propose that a backup schedule is a stochastic decision-making process from the viewpoint of management and can be analyzed by the theory of stochastic processes. In particular, with regard to backup modeling, very few research papers have studied optimal policies for a database system analytically. Most work has been concerned with ways of implementing backup methods. Optimal full backup policies and incremental backup schedules have been studied [12][13][14]. Even so, as noted in [9], image copy backup is necessary for the overall security strategy of any database system, although until now it has been restricted to a weekly or monthly schedule. Such a strictly periodic image copy backup policy is unreasonable and could be optimized in view of the following two points: 1) this backup technique has the advantage that it copies all files without any compression, which makes backup simpler and more reliable; 2) its performance needs to be combined with incremental and full backup schedules, i.e., when the total cost of incremental and full backups exceeds some level, it is reasonable to do an image copy backup to renew the database.
Thus, we propose the following backup policy, which ensures the safety of data and saves time: The image copy is carried out at scheduled times, and between these backups, the incremental backup or full backup, which copies the newly updated tracks since the image copy, is done for each file. That is, the image copy with large overhead is done at long intervals, and the incremental and full backups with small overhead are done at short intervals.
In this paper, we apply the cumulative damage model [15,16] to the backup of files in a database system, by regarding updates as damage shocks, database failures as failure shocks, and dumped files as damage. Such models, which play an important role in reliability theory, consider a sequence of shocks that occur randomly in time and give some amount of damage to a unit. The damage is added to the current damage level, weakens the unit gradually, and makes it fail when the total damage exceeds a failure level [15]. In computer science, cumulative damage models have been applied successfully to backup policies for a database system [12][13][14] and to garbage collection policies in memory management [17,18].
The following sections are organized as follows: Section 2 introduces the working schemes of incremental, full and image copy backups, taking a database system with n files as an example. Section 3 formulates the stochastic backup model, which combines the incremental, full and image copy backups: the incremental backup is done for an individual file when the number of its updated tracks does not exceed a certain threshold value; the full backup is done for an individual file when the number of its updated tracks exceeds that threshold value; and the image copy backup is done at a planned time and when the number of full backup files in the database exceeds a certain threshold value. We then introduce the overhead costs of the three backups and obtain the expected cost rate between image copies. Section 4 discusses an optimal interval of the image copy that minimizes the expected cost rate in Section 3, and Section 5 computes the model and its policies in a numerical example for the case of a Poisson process and an exponential distribution. Finally, Section 6 gives concluding remarks and further studies.

Backup Schedule
We consider a database system with n files that are composed of tracks of the same size. In this database system, we consider an incremental backup that copies all tracks newly updated since the previous image copy; its overhead increases with the number of newly updated tracks. For example, if all updated tracks are included in the previously updated ones, then the amount of transferred data is the same as at the previous backup. However, if some of the updated tracks differ from the previous ones, then the amount of transferred data increases by the difference. Retrieving copies from previous backups makes the recovery of a database easy and rapid when errors have occurred in the storage media.
For example, we consider a database with 6 files. In Figure 1, when the updated tracks exceed a threshold value Z in a file, e.g., files (a), (e) and (f), we do a full backup of these files. Files (b)-(d), which do not exceed the threshold value Z, are backed up incrementally. In Figure 2, when the number of full backup files in the database exceeds a threshold value, we do the image copy backup for this database. In this figure, the target is 5 of the 6 files: when files (a)-(c), (e) and (f) need a full backup, we stop the full and incremental backups and switch to the image copy backup.
It is well known that when the number of updated tracks in a file exceeds a threshold level Q, the overhead of incremental backup is larger than that of full backup [10]. The value of Q/m is about 60% [10] in a database system, where m is the total number of tracks in a file. Thus, if the number of updated tracks exceeds a level Z (0 < Z ≤ Q), we should make the full backup instead of the incremental backup. Moreover, when the number of updated files in a database exceeds a threshold value, the overhead of full backup is larger than that of image copy backup [9].
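The decision rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `plan_backups`, the action labels, and the convention that the image copy is triggered when the count of full backup files reaches `k` (rather than strictly exceeds it) are hypothetical choices, not taken from the paper.

```python
from typing import List


def plan_backups(updated_tracks: List[int], z: int, k: int) -> List[str]:
    """Choose a backup action for every file in the database.

    updated_tracks[i] is the number of tracks updated in file i since the
    last image copy. A file needs a full backup when this number exceeds z.
    Assumption: the whole database switches to an image copy once at least
    k files need a full backup.
    """
    needs_full = [u > z for u in updated_tracks]
    if sum(needs_full) >= k:
        # Too many files need a full backup: copy the whole database instead.
        return ["image_copy"] * len(updated_tracks)
    return ["full" if flag else "incremental" for flag in needs_full]
```

With six files, a per-file threshold of 60 updated tracks and `k = 5`, a call such as `plan_backups([70, 10, 20, 30, 80, 90], z=60, k=5)` mixes full and incremental backups in the manner of Figure 1, while five files over the threshold lead to an image copy as in Figure 2.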
Figure 3 shows that, in the incremental backup method, the storage volume of a single backup is usually small; however, it is necessary to preserve all backed-up files, so the accumulated incremental backup files occupy a large storage volume. In the full backup method, the storage volume is equal to the size of the object files. In the image copy backup method, the storage volume is the same as the size of the database.
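The storage behaviour described for Figure 3 can be illustrated with a small sketch. The function names and the use of abstract size units are assumptions made for this example; the paper gives no concrete numbers.

```python
from itertools import accumulate
from typing import List


def incremental_storage(increment_sizes: List[int]) -> List[int]:
    """Storage held after each incremental backup.

    Every incremental backup must be preserved, so the volume is the
    running sum of the individual backup sizes.
    """
    return list(accumulate(increment_sizes))


def full_storage(file_size: int, num_backups: int) -> List[int]:
    """Storage held after each full backup: always the size of the file."""
    return [file_size] * num_backups


def image_copy_storage(file_sizes: List[int], num_backups: int) -> List[int]:
    """Storage held after each image copy: always the size of the database."""
    return [sum(file_sizes)] * num_backups
```

For instance, three incremental backups of sizes 2, 3 and 1 occupy 2, 5 and 6 units in turn, whereas the full and image copy volumes stay constant.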
From the above discussion, when to create an image copy becomes an important problem in actual backup schemes. We want to reduce the number of image copies, which have large overhead; however, the overheads of the incremental and full backups increase with the number of newly updated tracks and files. From this point of view, we should decide the image copy interval by comparing the overheads of the three backups.

Expected Cost
We formulate the following stochastic backup model: The image copy backup is done at a planned time T, and the database is initialized at that scheduled time. The incremental backup is done for an individual file when the volume of its updated tracks does not exceed a threshold value Z. The full backup is done for an individual file when the volume of its updated tracks exceeds the threshold value Z.
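It is assumed that a database system with n files is updated according to a nonhomogeneous Poisson process with an intensity function $\lambda(t)$ and a mean-value function $R(t)$, i.e., $R(t) \equiv \int_0^t \lambda(u)\,du$ [15]. Then, the probability that exactly j updates occur during (0, t] is

$$H_j(t) \equiv \frac{[R(t)]^j}{j!} e^{-R(t)} \quad (j = 0, 1, 2, \ldots).$$

Further, let $W_j$ denote the amount of tracks of each file that are updated or newly created since the last backup at the j-th update. It is assumed that each $W_j$ has an identical probability distribution $G(x) \equiv \Pr\{W_j \le x\}$ $(j = 1, 2, \ldots)$. Then, the total amount of updated tracks up to the j-th update is $Z_j \equiv \sum_{i=1}^{j} W_i$, where $Z_0 \equiv 0$, and

$$\Pr\{Z_j \le x\} = G^{(j)}(x) \quad (j = 0, 1, 2, \ldots),$$

where $G^{(j)}(x)$ is the j-fold Stieltjes convolution of $G(x)$ with itself and $G^{(0)}(x) \equiv 1$ for $x \ge 0$. Then, the probability that the total amount of updated tracks exceeds exactly a threshold level K at the j-th update is $G^{(j-1)}(K) - G^{(j)}(K)$ [15].

Let $Z(t)$ be the total amount of updated tracks at time t. Then, the distribution of $Z(t)$ is

$$\Pr\{Z(t) \le x\} = \sum_{j=0}^{\infty} H_j(t)\, G^{(j)}(x).$$

Suppose that when the total amount of updated tracks exceeds a threshold level Z at time T, we do the full backup for this file. When the total amount of updated tracks does not exceed Z, we do the incremental backup, and this probability is

$$\Pr\{Z(T) \le Z\} = \sum_{j=0}^{\infty} H_j(T)\, G^{(j)}(Z).$$

Conversely, when the total amount of updated tracks exceeds Z, we do the full backup, and this probability is

$$\Pr\{Z(T) > Z\} = \sum_{j=0}^{\infty} H_j(T)\, [1 - G^{(j)}(Z)].$$

Suppose that there exist n files in the database. However, it would be useless to apply backup policies separately to each file. It is assumed that the files are updated independently and that if the total amount of updated tracks of j $(j = K, K+1, \ldots, n)$ files exceeds the threshold level Z at time T, then the image copy backup is done for the database. In this case, the probability that the image copy backup is done for this database is

$$\sum_{j=K}^{n} \binom{n}{j} [\Pr\{Z(T) > Z\}]^{j}\, [\Pr\{Z(T) \le Z\}]^{n-j}.$$

If the total amount of updated tracks of j $(j = 0, 1, \ldots, K-1)$ files exceeds the threshold level Z at time T, then the full backup is done for such j files. In this case, the probability that the full backup is done for j files of the database is

$$\binom{n}{j} [\Pr\{Z(T) > Z\}]^{j}\, [\Pr\{Z(T) \le Z\}]^{n-j} \quad (j = 0, 1, \ldots, K-1).$$

Next, we introduce the following costs:

$c_1$: Full and incremental backup cost per unit of time when the number of files whose updated tracks exceed Z is less than K.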
The first term on the right-hand side of (8) is the full and incremental backup cost, and the second term is the image copy backup cost. Note that $C(0) = \infty$. If $T = \infty$, then the policy corresponds to making the image copy only when the total amount of updated tracks exceeds the threshold level Z, and the expected cost rate $C(\infty)$ is expressed through the mean time until the total amount of updated tracks exceeds K.
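The probabilities used in this section can be evaluated numerically. The following is a minimal sketch, assuming a homogeneous Poisson process with rate `lam`, exponentially distributed update amounts with rate `mu`, and files that are updated independently and identically; the function names and the truncation parameter `max_updates` are hypothetical. Under these assumptions the probability that a file receives the incremental backup at time t is the sum over j of the Poisson probability of j updates times the probability that j exponential amounts total at most z.

```python
import math


def erlang_cdf(j: int, mu: float, x: float) -> float:
    """Pr{W_1 + ... + W_j <= x} for i.i.d. exponential W_i with rate mu."""
    if j == 0:
        return 1.0  # no updates, so the total is 0 <= x
    term = math.exp(-mu * x)  # (mu*x)^i / i! * exp(-mu*x) for i = 0
    total = 0.0
    for i in range(j):
        total += term
        term *= mu * x / (i + 1)
    return max(0.0, 1.0 - total)


def prob_incremental(lam: float, mu: float, z: float, t: float,
                     max_updates: int = 200) -> float:
    """Pr{total updated tracks at time t <= z} for one file.

    The infinite sum over the number of updates is truncated at
    max_updates, which is adequate while lam * t is well below that value.
    """
    h = math.exp(-lam * t)  # probability of exactly 0 updates
    p = 0.0
    for j in range(max_updates):
        p += h * erlang_cdf(j, mu, z)
        h *= lam * t / (j + 1)  # probability of exactly j + 1 updates
    return p


def prob_image_copy(n: int, k: int, p_full: float) -> float:
    """Pr{at least k of n independent files need a full backup}."""
    return sum(math.comb(n, j) * p_full ** j * (1.0 - p_full) ** (n - j)
               for j in range(k, n + 1))
```

A typical use is to compute `p_full = 1 - prob_incremental(lam, mu, z, t)` for a candidate interval t and pass it to `prob_image_copy(n, k, p_full)` to see how likely the database is to need an image copy at that time.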

Optimal Policy
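We obtain an optimal time $T^*$ that minimizes the expected cost rate $C(T)$. Differentiating $C(T)$ with respect to T and setting it equal to zero gives an equation whose left-hand side is strictly increasing from 0. If the limit of this left-hand side as $T \to \infty$ is large enough, then there exists a finite and unique $T^*$ that satisfies (12), and the resulting cost rate is $C(T^*)$.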

Numerical Example
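Suppose that the database is updated according to a Poisson process with rate $\lambda$, i.e., $R(t) = \lambda t$, and that each $W_j$ has an exponential distribution with mean $1/\mu$. Under these assumptions, (4) and the probabilities derived from it can be rewritten explicitly. We obtain the optimal value $T^*$ which satisfies (18), and the resulting expected cost rate is $C(T^*)$. When $T^*$ is about 30 days, it is necessary to do the image copy backup every month.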
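The Poisson and exponential setting of this example can also be explored by simulation. The sketch below is a Monte Carlo check, not the paper's computation: it estimates the probability that one file needs a full backup at time t, i.e., that its total updated tracks exceed z. The function names, the default number of runs and the fixed seed are assumptions made for reproducibility.

```python
import random


def simulate_updated_tracks(lam: float, mu: float, t: float,
                            rng: random.Random) -> float:
    """Draw one sample of the total updated tracks of a file by time t.

    Updates arrive as a Poisson process with rate lam (exponential
    inter-arrival times), and each update adds an exponential amount
    with rate mu.
    """
    clock = rng.expovariate(lam)  # time of the first update
    total = 0.0
    while clock <= t:
        total += rng.expovariate(mu)
        clock += rng.expovariate(lam)
    return total


def estimate_full_backup_prob(lam: float, mu: float, z: float, t: float,
                              runs: int = 20000, seed: int = 1) -> float:
    """Estimate Pr{total updated tracks at time t > z} by simulation."""
    rng = random.Random(seed)
    hits = sum(simulate_updated_tracks(lam, mu, t, rng) > z
               for _ in range(runs))
    return hits / runs
```

For a setting such as µZ = 10, one could call `estimate_full_backup_prob(lam, mu=1.0, z=10.0, t=t)` for several values of t and observe how the need for full backups grows as the image copy interval is lengthened.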

Conclusions
We have considered the problem of when to make the incremental, full and image copy backups, under the assumption that the overhead of backups depends on the total amount of newly updated files in a database. We have obtained the expected cost rate until the image copy backup, and have discussed the optimal interval of the image copy backup that minimizes it. It has been shown that the optimal interval is given by a finite and unique solution of an equation.
As a further problem, it would be necessary to consider the model where the backup of only newly updated files is made, and to compare it with the model studied in this paper.

Figure 1. Execution of full and incremental backups in a database.

Figure 3. Storage volume over time for incremental, full and image copy backups.

$c_2$: Image copy backup cost per unit of time when the number of files whose updated tracks exceed Z is more than K files.
$c_3$: Cost of database backup and initialization at time T.
The expected cost rate $C(T)$ is given by (8).

Tables 1 and 2 indicate the optimal interval $T^*$ and the resulting cost rates. The optimal interval of the image copy backup becomes long as $c_3$ becomes large. This shows that we should lengthen the image copy backup interval if the value of $c_3$ is comparatively large. From the tables, there are parameter settings for which $T^*$ becomes 30. When the unit of T is a day, we should then execute the image copy backup every 30 days.

Table 2. Optimal $T^*$ when µZ = 10.0.