New Synchronization Algorithm Based on Delta Synchronization for Compressed Files in the Mobile Cloud Environment

The fast growing market of mobile device adoption and cloud computing has led to exploitation of mobile devices utilizing cloud services. One major challenge facing the usage of mobile devices in the cloud environment is mobile synchronization to the cloud, e.g., synchronizing contacts, text messages, images, and videos. Owing to the expected high volume of traffic and high time complexity required for synchronization, an appropriate synchronization algorithm needs to be developed. Delta synchronization is one method of synchronizing compressed files that requires uploading the whole file, even when no changes were made or if it was only partially changed. In the present study, we proposed an algorithm, based on Delta synchronization, to solve the problem of synchronizing compressed files under various forms of modification (e.g., not modified, partially modified, or completely modified). To measure the efficiency of our proposed algorithm, we compared it to the Dropbox application algorithm. The results demonstrated that our algorithm outperformed the regular Dropbox synchronization mechanism by reducing the synchronization time, cost, and traffic load between clients and the cloud service provider.


Introduction
Mobile devices play an essential role in everyday life owing to their communication effectiveness and mobility features.Mobile companies compete to build business models based on developing mobile applications that can assist people Various mobile platforms (e.g., ios and Android) compete to enrich their users with powerful applications that might include cloud applications.
Cloud computing has been widely recognized as the next generation in computing infrastructure; it offers many advantages such as allowing users to employ infrastructure devices (i.e., servers, networks, and storages), platforms (i.e., middleware services and operating systems), and software (i.e., application programs) provided by cloud providers (e.g., Google, Amazon, Azure, and Oracle) at low cost.In addition, cloud computing enables users to utilize resources in an on-demand manner.As a result, both academia and the IT industry have been encouraged to improve and merge mobile computing and cloud computing together into mobile cloud computing (MCC) and provide possible solutions for any challenges associated with this new environment.One of these challenges is mobile cloud synchronization [1], which is the focus of the present study.For mobile cloud synchronization, many IT companies are competing to determine the best solutions for any associated problems and are trying to determine the type of data that the user requires to be portable.
MCC is a combination of mobile computing, cloud computing, and mobile internet, and it integrates the advantages of these three technologies.Therefore, it can be called cloud computing for mobiles.
At its simplest, MCC refers to an infrastructure where data storage and data processing are both performed from a mobile device, where applications transfer the computing power and data storage onto the cloud [2].Therefore, various business opportunities for mobile network operators, as well as cloud providers, arise as the large computing power of the cloud is harnessed to benefit different types of mobile users.
As shown in Figure 1, the architecture of mobile cloud computing consists of two main components: mobile computing and cloud computing [3].The mobile computing component consists of various mobile devices such as smart phones, PDAs, and laptops.These devices are normally connected to a network via different communication technologies that enable the mobile user to send requests to the cloud service provider so adequate resources can be allocated to the mobile devices throughout the established connection.By starting the web application, the monitoring and calculating functions of the system are implemented to guarantee that quality of service is maintained until the connection is completed and the task is performed.This process includes accomplishing tasks such as sending responses rapidly, synchronizing files, and load balancing to ensure that the resources are allocated fairly to the appropriate clients.
MCC has a major benefit of moving data storage and computing power from mobile devices to the cloud, with other benefits including solving the problem of short battery life via transferring the execution of extensive communications to the cloud and centralizing security that might be useful in detecting patterns of security threats.
However, many challenges arise for MCC such as synchronization, quality of communications (i.e., wired, wireless, 3G, and LTE), quality of service (i.e., latency, delay, and bandwidth), mobile device classification (i.e., high-end, lowend, battery, CPU, and RAM), and individualization of mobile for various available operating systems.
In this context, we need to introduce cloud storage.Figure 2 briefly illustrates the evolution of cloud storage that is based on the traditional network storage and on hosted storage [4].Applications employ traditional network storage to  The main responsibilities of these cloud storage providers are to maintain the data and ensure it is available and accessible to its clients, as well as to protect the physical environment and ensure it remains running efficiently.Organizations and people lease or buy storage capacity from these cloud providers to store data for users, organizations, and/or applications [5].
Cloud storage systems occur in many different varieties, i.e., several storage systems are specialized and have a specific focus (e.g., the storage of web email messages or the storage of digital pictures), while other cloud storage systems are utilized to store digital data of all forms.Certain cloud storage systems might be described as small operations, while others are so large that their physical equipment can fill up an entire warehouse.The combined facilities that house and compose the cloud storage systems are known as data centers.Each cloud storage system requires one data server and a connection to the Internet via an ISP.Any client (e.g., mobile user subscribing to any cloud storage service) sends their file(s) over the Internet to the cloud data server and then the cloud data server records the sent information.If the client requested action is to retrieve the information, he/she accesses the corresponding cloud data server via an interface that is web-based.The cloud server can then either send the requested file(s) back to the client who made the request or allow the client to manipulate and access the file(s) on the server itself.
Synchronization is a method to synchronize a single set of data (such as a user's contact list) by automatically copying any changes back and forth between the mobile device and the cloud storage.This process ensures that the user's data can be easily viewed from anywhere by any authorized mobile device.For example, mobile users of Google Drive and Dropbox can access and share their own data in mobile environments.In such applications, a user can add or modify a file in his/her local folder and this update is then automatically reflected (synchronized) by a cloud server.In the case of sharing a file with other users, the cloud server takes the responsibility of synchronizing this shared file with all sharing users.If this file has frequent updates, it generates a significant amount of traffic during the synchronization process, which leads to challenges with trying to determine ways of reducing the synchronization traffic and its required time.These challenges constitute one of the most important issues in applications that employ data sharing [6].
A major issue in MCC synchronization is how to guarantee the data that is required to be synchronized given the different uncertainties, e.g., disconnections before process completion and insufficient cloud storage space.The present study addresses the problems of synchronization, discusses a synchronization method utilized by one major cloud service provider, and proposes a new method of synchronization.This work proposes an algorithm, based on Delta synchronization, to solve the problem of synchronizing compressed files under various forms of modification (e.g., not modified, partially modified, or completely modified).To measure the efficiency of our proposed algorithm, we compared it to the Dropbox application algorithm.This paper is organized as follows.Section 2 covers a literature review related to the research, while section 3 explains the concept of Delta synchronization.In section 4, the research problem is defined, and a theoretical solution and our proposed algorithm design are described.The experimental setup is shown in section 5, while results are discussed in section 6. Section 7 provides the conclusions of the study.

Literature Review
Many mobile cloud applications share data among various users and synchronize with their corresponding cloud servers.In [7], an Update-triggered Delta Synchronization (UDS) algorithm was proposed, which is a popular solution for data synchronization in mobile cloud applications.
Delta encoding computes the difference (i.e., Delta) between the old and new versions of a file.When a shared file is modified, only the difference in the file computed by the algorithm is notified to the users and the cloud server for synchronization.If the shared file is frequently modified, and a large number of users are sharing the file, the proportion of synchronization traffic of the UDS algorithm is deemed significant.Therefore, a proxy-based aggregated synchronization scheme has been proposed to reduce the synchronization overhead when frequent updates were applied to a shared file being accessed (shared) by many users [8].
An efficient Delta synchronization (EDS) algorithm that improves the performance of the UDS has been proposed [9].The algorithm aggregates updated data (i.e., Delta) in a step aimed at reducing the network traffic, and then synchronization is applied to the aggregated updates in a periodic manner to reflect modifications and satisfy consistency of files.In MCC, the cloud service is utilized to resolve the resource constraint problem of mobile device offloading, which causes excessive mobile battery consumption.
In Min & Hei work [10], a response time-based model was proposed, where data synchronization was perceived to estimate the efficiency of the offloading scheme in terms of response time.The goal was to improve the accuracy of the response time estimation when the cloud processes the requested task received from any mobile device.Domingos et al. [11] proposed adapting the Synchronization Algorithms based Message Digest to the mobile environment to minimize the usage of mobile device resources (i.e., reducing the amount of data transmitted during the synchronization process and reducing the processing carried out by the mobile device) by using only standard Structure Query Language (SQL) queries.The major goal for this proposed model was to minimize the amount of data exchange.
Furthermore, the authors of [12] proposed a model to Improve Synchronization Algorithms based on Message Digest (ISAMD) to resolve inconsistencies between the server-side database and the mobile client database.
Asynchronous data synchronization model that blocks the user interface during synchronization to prohibit any updates during the synchronization process has also been proposed [13], although this model disables other processes during synchronization and is disliked by the majority of users.

Delta Synchronization
Delta synchronization is one of the available designs that can assist in solving the challenges facing synchronization [14].It is a new feature that increases the speed of transfer of updated files by only uploading/downloading the modified parts of the files, instead of sending/receiving the entire files whenever the files are changed.

Solution
A variant of Delta Synchronization called EDS is shown in Figure 4.In EDS, a given file F is shared by N+1 mobile nodes and the file is synchronized with the corresponding cloud server.Let F' be a newer version of F and d denotes the difference between F and F'.The nth update of d is denoted by d n .In this environment, a mobile node can be one of the following two types: a write node (i.e., W-node) or a read-only node (i.e., R-node).To avoid simultaneously updating the F file from more than one node, only one node is allocated as a W-node while the remaining M nodes are allocated as R-nodes.Regardless of whether the file is compressed or not, the W-node is able to update d of the file F onto the cloud server.During the writing connection, the update process requires a T time interval.At the end of the writing connection of the W-node (i.e., once the process is terminated) another new W-node is selected.On the other side of this process, the file F can be read by activating the R-nodes during the T time interval.Any activated R-node during the read-only operation of the file F is called an activated R-node.
The synchronization process commonly involves extra overhead owing to the setup and maintenance of the TCP/HTTP(S) and the delivery of metadata between the cloud server and the R-node.Aggregating the updates from the W-node in the cloud server and propagating them to all the R-nodes to complete synchronization later greatly assists in mitigating the traffic overhead.As a result, a d update from the W-node is determined to be either aggregated in the cloud server or synchronized with all R-nodes at each T time interval.Although update aggregation can mitigate the traffic of synchronization, it can also cause inconsistency between the activated R-nodes and the W-node; in such case and when the updates are not yet modified, the activated R-node could possibly use an F file that is outdated.

Motivation
Even though Delta synchronization assists in solving the problem of redundant data synchronization that decreases the cloud storage usage, it is problematic when dealing with new mobile device users who frequently access large files.
Generally, large files (e.g., video, music, digital photo, Photoshop files, among others) are stored compressed; therefore, a change in a small part of a compressed file requires re-synchronizing the whole file because Delta synchronization does not deal with file parts.Therefore, there is a need for the proposal of a new model to be developed to overcome this problem.

Proposed Method
Delta synchronization is a reasonable solution when handling regular files.
However, it significantly fails when handling large compressed files, with the main source of failure being that the whole file has to be re-synchronized instead of merely updating the sub-files.This causes network congestion and reduces the available bandwidth, especially when synchronizing mobile devices in a wireless environment.
One possible solution to the problem of synchronizing large compressed files is to synchronize only the altered sub-files contained within the compressed file.
Given that compressed files normally have many sub-files, a logical method to update the files contained within is to decompress the files, synchronize the updated sub-files, and then recompress the files.This process might take a considerable amount of time, especially when there is a large number of compressed files.Therefore, the development of a method to synchronize only the changed sub-files without having to send the whole folder and have to go through the lengthy process of extracting and compressing is required.Such a method would reduce the time, traffic overheads, and synchronizing costs.In the present study, we propose a method (an algorithm) that will only update the sub-files of a large compressed file.Specifically, the proposed algorithm employs the concepts of hashing or hash functions.Hashing is utilized to convert several data types into a relatively small number that could serve as digital "fingerprint" data.The algorithm used in hashing is the one that manipulates the data to create the fingerprints that are called hash values or hash codes.
Figure 5 illustrates a client who has a database that contains a hash code for each synchronized file and its sub-files (of the compressed file), i.e., a client who has a compressed file and needs to synchronize it.Our synchronizing algorithm works by first calculating the hash code and, based on it, assumes one of three cases: 1) the compressed file is new, 2) the compressed file is not updated, and 3) the compressed file is updated.In the third case, four different scenarios for the sub-files are possible: a) the sub-file is new (No Update is Needed), b) the sub-file exists but has not been updated (Inserting a New Sub-file), c) the sub-file exists and has been updated (Updating an Existing Sub-File), and d) the sub-file has been deleted (Deleting an Existing Sub-file).At the end, the algorithm applies the necessary synchronization and deletes the resulting lists.A detailed description of our proposed synchronization algorithm is described in Figure 6.

Experimental Design
The hardware and software utilized play an important role in the performance of any algorithm subject evaluation or study.For algorithm implementation and experiment design in the present study, a laptop with regular hardware (i.e., Core i5-5200U CPU @ 2.20 GHz, 8 GB RAM) and software (i.e., Windows 10 Pro x64) specifications was utilized.We implemented our proposed algorithm under Visual Studio 2010, with C#.Furthermore, MD5 hashing function was already embedded within the .Net 4.0.Even though MD5 has some security concerns and can be figured by simple reverse engineering technique such as Brute-force attacks, it was selected because of its calculation speed.The disadvantages of using MD5 fall outside the scope of the present study and should be discussed further in the future.
For simplicity, a small database architecture was created.The database contained just one table with three columns (Table 1).The File Parent column was utilized to hold the name of the compressed file, while the File FullName column was utilized to hold the name of the sub-file(s).Finally, the File Hash was utilized to hold the value of the Hash code of File FullName.
The main motive behind creating this database was to summarize files saved on the client's device.In addition, changes during the synchronization process were saved as there was no need to monitor each change in all of the client's files.

File FullName Text
File Hash Text

Results and Discussion
To analyze the results of our proposed algorithm, we conducted two different tests (Test 1 and Test 2).In Test 1, the time required to perform an operation was recorded and compared between various file sizes, while in Test 2 the proposed method was benchmarked in a live environment against a popular cloud service provider.Results of the Test 1 experiment are reported in Table 2, which summarizes the results we recorded running our proposed algorithm on a number of different compressed files, each of which had a different file size and file type (e.g., text, image, audio, and video).For each compressed file, five different synchronization times for five different cases were recorded.First, the required time needed for inserting a new compressed file was recorded.Then, the time necessary to determine that there was no need to update was calculated, i.e., the time required for calculating and comparing the hash codes, the time needed when a new sub-file had to be inserted, the time needed for updating that file, and the time needed to delete a sub-file.Regarding the state where No Update is Needed involved synchronizing the same file without any changes.It is shown in Table 2 that our proposed algorithm performed very well as it did not upload any file and took a relatively short time to ensure that no changes were available while calculating the HashCode.Inserting a New Sub-file was concerned with the state where a new file had been inserted into the compressed file.In our proposed algorithm, only the new sub-file was synchronized instead of updating the whole compressed file.Updating an Existing Sub-File case dealt with updating one of the sub-files in the compressed file; therefore, the updated sub-file was synchronized instead of synchronizing the whole compressed file.Finally, Deleting an Existing Sub-file was the state when the existing sub-file available inside the compressed file had been deleted.Our proposed algorithm was capable of deleting the targeted sub-file from the compressed file, and there was no need to synchronize the whole compressed file.
The result revealed that the time for calculating the HashCode (no update) was consistently the minimum for each of the nine files (Table 2, column 3) and the maximum time needed was for inserting a new compressed file (Table 2 column 2).As shown in Table 2 and by comparing the recorded times for all cases, it can be concluded that our proposed algorithm significantly reduced the time needed to perform a specific operation (e.g., insert, update, delete, and so on) resulting in a reduced cost.However, the sub-files that were subject to modifications in all cases were mainly small files; hence, the experiment was subject to file type/size adjustments.
To benchmark the efficiency of our algorithm, we compared our results with a popular live cloud synchronizing environment, the Dropbox application (Dropbox, Inc., headquartered in San Francisco, California, USA, [15]).In this test, called Test 2, Case 1, we developed an application to upload a file to Dropbox while measuring the time needed for uploading the file.Table 3 summarizes the performed experiments.In brief, five different groups, each of which contained three files of the same type (e.g., text, image, audio, video, and mixed type) with various file sizes were utilized.
As shown in Table 3, the original file size before and after compression was calculated before performing other operations.For file compression, we applied the free WinZip tool embedded within Windows.Further, the compression ratio (CR) was calculated and then reported as shown in Equation ( 1): where, OFS denotes the original file size and CFS represents the compressed file size.Table 3 reveals that the compression ratio for text files was higher than for other file types, and reports the time required to insert the compressed file completely, while the other reported results are similar operations to those explained for Table 2.It can be seen that the Dropbox application did not reduce the synchronization time when any change occurred during the synchronization process for all file sizes and types as the whole compressed file had to be re-synchronized.The slight change in time between the different cases (e.g., Table 3 columns 5 through 9) were expected and could be attributed to the load on the Internet or the bandwidth of the network hosting the mobile.
We applied our proposed algorithm to the same dataset used in the Dropbox experiment (Test 2, Case 1) and recorded the results in Table 4; we called this experiment Test 2, Case 2.
In Table 4, it can be seen that the time needed to perform the same operation was reduced by a great magnitude when applying our proposed algorithm compared to the time required by the Dropbox application reported in Table 3.
The changes we applied to the sub-files for each row of Table 3 and Table 4 are recorded in Table 5.These changes were applied for Case 1 and Case 2 of Test 2. Table 6 contains the performance gain achieved by our algorithm and shows only the performance gain percentage (PGP) for clear and useful comparison.
We calculated the PGP as shown in Equation ( 2): where DbxT is the time required by the Dropbox application and PrT is the time required by our proposed algorithm.
Table 6 shows the PGP we achieved by applying our proposed algorithm to the different cases.When no update occurred at all, the performance gain was slightly higher than 99.5% while the lowest reported performance gain was for the text files.On average, the performance gain for inserting the compressed text   files was only 58%.Even though this value was lower than that for the other file types, it still remains a significantly high performance gain.The performance gain for inserting a compressed file was greater than 96% for all other file types.
A possible reason for the relatively low performance gain reported for text files might be the high compression ratio compared to other file types.However, the problem of inspecting the details of this variability is not in the scope of this paper and requires future research.

Conclusion and Future Work
In the present study, we proposed a new algorithm based on Delta synchronization that greatly improved the synchronization process and solved the problem of synchronizing compressed files.Two tests to validate the proposed algorithm were designed that compared the required time for achieving only specific changes in the compressed file instead of updating the whole file, and that benchmarked our proposed algorithm by comparing its performance to the Dropbox application that synchronizes the whole file after any amendment.Our proposed algorithm proved to work more efficiently than the Dropbox application did and demonstrated great potential in the field of cloud synchronization.
However, there is stillroom for improvement, with testing of different scenarios and benchmarking with various providers required.

Figure 3
Figure 3 provides an example of Delta synchronization, showing six users on the left hand side, with two of these users (numbers 2 and 6) altering their data (different colors indicate a change) and the remaining users (1, 3, 4, and 5) not altering their data.For the latter users who have not changed their data, we avoid re-synchronizing the data and synchronization was only applied to the data of users 2 and 6 as their data had changed.

Table 5 .
Changes applied to all sub-files.

Table 6 .
Percentages of performance gain improvement.