A Workflow-Based Failure Recovery in Web Services Composition

Abstract

In previous researches in the field of supporting reliability and fault tolerance in web service composition, only low level programming constructs such as exception handling (for example in WSBPEL) were considered. However we believe that the reliability and fault tolerance for composite services must be handled at a higher level of abstraction, i.e. at the workflow level. Therefore a language and technology independent method for fault-tolerant composition of web services is needed. To do this, a fault tolerant workflow is built in which the execution order of the services is determined such that upon a service failure a recovery process with the lowest cost is started. The cost of a service failure includes the cost of failed service and the total costs of roll-baking the previously executed services which are dependent on the failed service. In this article a FSP language is applied to formally specify the workflow.

Share and Cite:

Bushehrian, O. , Zare, S. and Rad, N. (2012) A Workflow-Based Failure Recovery in Web Services Composition. Journal of Software Engineering and Applications, 5, 89-95. doi: 10.4236/jsea.2012.52014.

1. Introduction

Nowadays SOA architecture is used as a platform for accessing to data and services in distributed form. Service Composition in this architecture is a way to obtain more complicated services by combining the functionality of individual services [1,2]. The main problem here is the fault tolerance and recovery of failures while executing a composite service [3,4]. In the situation of using a composite service, different faults may occur that mainly causes a service to fail [5]. However a fault-tolerant service composition is the one that ends up the whole transaction in a safe state upon a service failure where the related services are also rolled-back appropriately. Consider a service composition in arranging an itinerary. If the “Flight Reservation” service is failed all other committed services such as “Hotel Reservation” should be roll-backed. Therefore in a fault-tolerant service composition each set of related services may form a transaction for which the atomicity property is a must and by failing one of them, others have to be rolled-back. However for some services the roll-back operation may not be available or only available partially. Therefore failing a service may cause the whole composition ended up in an inconsistent execution due to the violation of atomicity property. To model this limitation, each service within the composition is associated with a “roll-back cost” which its value is an indicator of the amount of impact on the whole composition resulted from requesting the “rollback” operation for that service. If a service supports the roll-back operation, its associated roll-back cost is zero otherwise some value is considered for this cost. For example if the “Flight Reservation” is not a roll-back supporting service, its rollback cost will be equal to the whole ticket price. In addition to the roll-back costs, for each service within the composition it should be determined on which services this service is dependent. The dependency here is the roll-back dependency and is defined as follows: Service Sj is dependent on service Si when failure of Si must start roll-backing of Sj (if already executed).

The main question of our research is how to automatically build a workflow, considering the roll-back dependencies among services, that sequences the execution of the services in a way that upon a service failure, the mean roll-back cost of the service composition becomes minimal. In this context, we call such a workflow a “fault-tolerant” workflow. In brief, we are trying to provide some degree of atomicity property in the execution of services within a service composition. Our solution to this problem should answer the following questions: 1) To what extent the workflow generation can be automated, 2) To what extent the solution can be language independent and 3) Is the solution founded on a rigorous theory (and hence it is easily verifiable)? In the previous works in this field these important questions are not addressed. There are many previous works that apply fault tolerance techniques at different levels of abstractions to create a faulttolerant web service composition [6-8]. For example in some languages such as WS-BPEL [1] low level programming constructs like exception handling is provided to support fault tolerance for service composition. However, we believe that the support for reliability and fault tolerance should be considered at the higher level of abstraction: the workflow level. In addition to this shortcoming, the previous studies poorly addressed the robust recovery management once a failure happens.

In this paper these shortcomings are addressed by proposing a method to automatically build a workflow considering the services transactional properties that not only is specified at the high degree of abstraction (and hence language independent) but also supports robust failure recovery.

2. Related Work

In the field of web services reliability and fault tolerance, the previous researches are divided into four main groups [6]:

1) Improvement of the web service reliability in the architectural definition level;

2) Assessment of the system fault tolerance with error injection;

3) Analysis of properties of the second generation web services;

4) Definition of the reliability assessing models of the web service-based systems.

Articles related to the first category generally have used the old reliability techniques for web services which increase the fault tolerance by using redundant services in the architecture; moreover none of them presented a formal method. In the second category error injections for the fault tolerance support have been used [9,10]. Studies in the third category basically include reliability evaluation of the second generation web services, i.e. WSReliable Messaging [11], WS Security and WS-Atomic Transaction [11]. In the fourth category, web service based systems generally are created by composing the simpler services according to a work-flow. The reliability of the workflow is evaluated considering the individual services reliability in the workflow. In some researches, the “Markov chain” has been used to model the system behaviour. The probability that the Markov chain comes to a final state from a start state with some limited state depends on the other movements, which means it depends on the reliability of the other states [12]. In [13] to automatically compute the overall QoS of a workflow, a mathematical model and an algorithm (SWR algorithm) are proposed. To support the composition of Web services, they also have presented an ontology-based solution in which a discovery mechanism is applied to find Web services with desired interfaces and operational metrics, and to assist designers in resolving heterogeneity issues among Web services. In [14] to achieve a higher reliability in a composite web service system, it is proposed to decrease the failure rate and increase the repair rate. In this paper a method for calculating the MTTF (Mean Time to Failure) of composite web based on the workflow composition pattern is presented. In [14] a formal verifycation approach of the workflow-based composite web services is presented. And it has been translated to the BPEL4WS primitives. In [8] they have used EXTRA (Exception handling + TRAnsaction), a hybrid fault-tolerant mechanism which combines exception handling and transaction techniques to improve the reliability of composite services. The first one (exception handling) tries to repair fault and let composite services to continue. The second one (transaction) ensures composite services to terminate in a consistent state when faults are not repairable. They have also presented FACTS framework, which present an integrated environment for specification, verification, and execution of fault tolerant composite services. However, in their work the termination cost of non-cancellable service in a service composition is not taken into account.

3. The Methodology

Our proposed methodology for creating a fault-tolerant workflow for web service composition is illustrated in Figure 1. The first step is to create a Rollback graph considering the service dependencies. Service Sj is dependent on service Si when failure of Si must start rollbacking of Sj (if already executed). The set of all dependent services on Si is depicted by Rollback [i]. In the Rollback graph each vertex Si represents a service and each edge (Si, Sj) represents the existence of a rollback relationship between Si and Sj. The second step is to remove

Figure 1. Transformation algorithm.

the useless edges from the Rollback graph to avoid defining unnecessary dependencies in the subsequent steps. If there is a path from A to C via other services in the Rollback graph and there is also a direct edge (A, C), this direct edge should be deleted (Figure 2). The third step is to remove the cycles within the rollback graph, first the cycles are located, and then for each cycle, the order of the services for which the average rollback cost is the lowest, is determined considering the failure probability and the rollback cost corresponding to each service. At the end of the third step a Rollback DAG (RBDAG) is obtained. The final step is to use the prerequisite dependencies in the RBDAG to create the fault-tolerant workflow which is specified in FSP language [15,16].

4. Cycle Elimination

Each cycle within the Rollback graph represents a set of services with the atomicity property, i.e. once one of them fails; the previously executed ones should be roll-backed. To minimize this rollback cost the order of services in the cycle with the minimum cost is selected and then the cycle is eliminated to form a DAG. For each permutation r of the services, the probability of the failure at position k is denoted by P (r, k) and is computed as follows:

(1)

where Pi is the failure probability of service Si. The average rollback cost for permutation r then is calculated as follows:

(2)

where Cm denotes the rollback cost corresponding to service Sm.

Now it is possible to determine the permutation ropt for which the value of Avg_Cost (ropt) is minimum. After the best order is specified we can remove that cycle from Rollback graph.

5. Workflow Creation RULES

In order to translate services to an FSP model, the following rules are used:

R1: Corresponding to each service in RBDAG create a FSP process.

Figure 2. Removing extra edges.

R2: If service wj is prerequisite for service wi in the RBDAG, create another process called lock. The lock process prevents process i from starting until j is finished.

For each service wk in RBDAG, If the immediate predecessor of k, denoted by Pred (wk) is a member of Rollback [wk], it must be roll-backed once k is failed:

R3: To start roll-backing of Pred (wk) only after wk enters its failed state, a lock process named Rlockk is created and added to the FSP model. This process prevents Pred (wk) from entering to its roll-back state unless wk enters its failed state.

If there are more than one successor for a given service wi, (Figure 3(a)) the corresponding lock process (step 2) should be created such that all the successors start theirs executions just after wi is finished successfully. If there is more than one predecessor for wi (Figure 3(b)), the lock process should be created to allow the execution of wi only when all its predecessors are finished successfully. In some cases, the rollback of wi is dependent on the failure of two or more services together. This concept is shown in RBDAG like Figure 3(c).

For each service (wk) an FSP process with four actions is defined as shown in Figure 4.

First the service is in state 0 and moves to state 1 by start_k action, and then if it fails, it will move to the final state 3 and if it succeeds, it goes to state 2. If after the successful termination of wk, another service wi which belongs to the same cycle as wk, fails, then wk should be roll-backed and goes from state 2 to state 3.

However if wk is not a member of any cycle in the Rollback graph or it is, but after removing the cycle it was placed at the last position in the optimal order, the rollback never happens after successful termination of wk because according to rule R2, it starts only after all its dependent services in the cycle are terminated successfully.

Figure 3. Prerequisites states.

Figure 4. Process actions.

This kind of service has actions as shown in Figure 5.

The lock process mentioned in rule R2 is a FSP process with two states as shown in Figure 6.

The Rlockk process mentioned in rule R3 is a FSP process with three states as shown in Figure 7. The Rlockk goes to state 1 once wk is failed or roll-backed. At state 1, the rollbacki action is ready to fire assuming that wi is the service which is dependent on wk (i.e. wi has to be roll-backed upon wk unsuccessful termination).

6. Translation Algorithm

Figure 8 shows the Translation Algorithm in which G is a Rollback Directed Acyclic Graph (RBDAG). This algorithm creates a process for each service. If service wi is prerequisite for service wj a lock process is created and added to the FSP model. For each service wk in RBDAG, a subset of Rollback [k] which is predecessors of wk in RBDAG, must be rollback. To do this, a Rlock process is also added to the set of FSP processes.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] D. Jordan and J. Evdemon, “Web Services Business Process Execution Language Version 2.0, OASIS Standard,” 2009. http://docs.oasis-open.org/wsbpel/2.0/serviceref
[2] Gartner, “Emerging SOA Patterns in the Enterprise,” 2009. http://www.infoq.com/news/2008/08/gartner-emerging-soa-patterns
[3] Q. Yu, X. Liu, A. Bouguetta and B. Medjahed, “Deploying and Managing Web Services: Issues, Solutions, and Directions,” The VLDB Journal, Vol. 17, No. 3, 2008, pp. 537-572. doi:10.1007/s00778-006-0020-3
[4] V. Issarny, F. Tartanoglu, A. Romanovsky and N. Levy, “Coordinated forward Error Recovery for Composite Web Services,” Proceedings of 22nd International Symposium on Reliable Distributed Systems, 6-18 October 2003, pp. 167-176. doi:10.1109/RELDIS.2003.1238066
[5] H. P. Chen and Z. Y. Wang, “A Fault Detection Mechanism for Fault-Tolerant SOA-Based Applications,” Machine Learning and Cybernetics, 2007 International Conference, Hong Kong, 19-22 August 2007, pp. 3777-3781.
[6] C. Luigi, R. Luigi, M. Nicola and S. Sergio, “Web Services Workflow Reliability Estimation Through Reliability Patterns,” Security and Privacy in Communications Networks and the Workshops, 2007, SecureComm 2007, 3rd International Conference, Hong Kong, 17-21 September 2007, pp. 107-110.
[7] T. Hu, M. Guo, S. Guo, H. Ozaki, L. Zheng, K. Ota and M. Dong, “TTF of Composite Web Services,” Parallel and Distributed Processing with Applications (ISPA), 2010 International Symposium, Taipei, 6-9 September 2010, pp. 130-137.
[8] A. Liu, Q. Li, L. Huang and M. Xiao, “FACTS: A Framework for Fault-Tolerant Composition of Transactional Web Services,” Services Computing, IEEE Transactions, Vol. 3, No. 1, 2010, pp. 46-59. doi:10.1109/TSC.2009.28
[9] N. Looker and J. Xu, “Assessing the Dependability of OGSA Middleware by Fault-Injection,” Proceedings of the 22nd International Symposium of Reliable Distributed Systems, 6-18 October 2003, pp. 293-302. doi:10.1109/RELDIS.2003.1238079
[10] N. Looker, M. Munro and J. Xu, “A Tool for Dependability Analysis of Web Services,” Proceedings of the 28th Annual International Conference of Computer Software and Applications, Hong Kong, 28-30 September 2004, pp. 120-123.
[11] A. L. Goel, “Software Reliability Models: Assumptions, Limitations, and Applicability,” IEEE Transactions on Software Engineering, Vol. SE-11, No. 12, 1985, 1411 - 1423. doi:10.1109/TSE.1985.232177
[12] V. Grassi, “Architecture-Based Dependability Prediction for Service-Oriented Computing,” Proceedings of WADS, Edinburgh, 25 May 2004, pp. 279-299.
[13] J. Antonio and S. Cardoso, “Quality of Service and Semantic Composition of Web Services,” Ph.D. Dissertation, Department of Computer Science, University of Georgia, Athens, 2002.
[14] D. Bianculli, C. Ghezzi and P. Spoletini, “A Model Checking Approach to Verify BPEL4WS Workflows,” IEEE International Conference on Service-Oriented Computing and Applications, Newport Beach, 19-20 June 2007, pp. 13-20.
[15] H. Foster, S. Uchitel, J. Magee and J. Kramer, “Model-Based Verification of Web Service Compositions,” Proceedings of the 18th IEEE International Conference of the Automated Software Engineering, Montreal, 6-10 October 2003, pp. 152-161.
[16] J. Magee and J. Kramer, “Concurrency: State Models and Java Programs,” John Wiley and Sons, Chichester, 1999.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.