BPEL is the de facto standard for business process modeling in today's enterprises and is a promising candidate for the integration of business and scientific applications that run in Grid or Cloud environments. In these distributed infrastructures, the occurrence of faults is quite likely. Without sophisticated fault handling, workflows are frequently abandoned due to software or hardware failures, leading to a waste of CPU hours.
The fault handling mechanisms provided by BPEL are well suited for handling faults of the business logic, but infrastructure-induced errors should be handled automatically to avoid over-complication of workflow design and keep concerns separated.
This presentation (and the corresponding publication) identifies classes of faults that can be resolved automatically by the infrastructure, and provides a policy-based approach to configure this automatic behavior without the need for adding explicit fault handling mechanisms to the BPEL process. The proposed approach provides automatic redundancy of services using a Cloud infrastructure to allow substitution of defective services. An implementation based on the ActiveBPEL engine and Amazon's Elastic Compute Cloud is presented.
1. Fault-Tolerant BPEL Workflow Execution via Cloud-Aware Recovery Policies. Ernst Juhnke, Tim Dörnemann, Bernd Freisleben, {ejuhnke,doernemt,freisleb}@informatik.uni-marburg.de. Presented by: Ernst Juhnke
2. Overview: Motivation; Fault Handling in BPEL; Bird's Eye View of the Architecture; Design; Implementation; Experimental Results; Conclusions
3. Motivation. What is a SOA, and why use a SOA? A service-oriented manner leads to higher flexibility, ease of reuse, and reduced development costs (at least theoretically), with web services as the implementing technology. Coupling of services: composed applications are often called business processes (workflows). Highly distributed nature of composed applications: failures are quite likely to occur (network failures, software failures on remote hosts, …).
4. Motivation (cont'd). Fault handling is very important. Many faults can be corrected by simply retrying, by substituting the failed component with an equivalent one, or by on-demand provisioning. Business Process Execution Language for Web Services: BPEL is the de facto standard for web service composition, enables the construction of complex web services, and offers a rich vocabulary of fault handling mechanisms. However, (infrastructural) fault handling clutters the composition logic with non-functional aspects.
5. Fault Handling in BPEL. Faults related to the logic of the process have to be handled explicitly within the process; the process must be able to react to these faults. Faults related to infrastructural errors have to be handled without interfering with the BPEL process. Example of explicit fault handlers in a process:
    <faultHandlers>
      <catch faultName="buy:CreditCardNotApproved" faultVariable="Fault">
        <!-- Make a callback to the client -->
        <invoke partnerLink="Client" portType="buy:ClientCallbackPT" operation="ClientCallbackFault" inputVariable="Fault"/>
      </catch>
      <catchAll>
        <sequence>
          <assign>
            <copy>
              <!-- Create the Fault variable -->
              <from expression="string('Other fault')"/>
              <to variable="Fault" part="error"/>
            </copy>
          </assign>
          <invoke partnerLink="Client" portType="buy:ClientCallbackPT" operation="ClientCallbackFault" inputVariable="Fault"/>
        </sequence>
      </catchAll>
    </faultHandlers>
6. Bird's Eye View of the Architecture. The proposed solution adds a policy-based fault handling mechanism to BPEL without making any changes to the language standard or changing the engine's implementation. It uses a Cloud infrastructure to provision spare machines on demand.
7. Design. The solution substitutes the default invocation mechanism of the BPEL engine. The Fault Tolerant Invoke Handler (FTIH) executes and monitors every invoke operation. The FTIH is the only component that interacts with the workflow engine.
10. Design (cont'd). The Dynamic Resolver schedules web service calls to underutilized machines dynamically. Reference: T. Dörnemann, E. Juhnke, and B. Freisleben. On-Demand Resource Provisioning for BPEL Workflows using Amazon's Elastic Compute Cloud. In Proceedings of the 9th IEEE International Symposium on Cluster Computing and the Grid, pages 140–147, Los Alamitos, CA, USA, 2009. IEEE Computer Society.
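The slide does not say how the Dynamic Resolver decides which machine is underutilized. As a minimal sketch only, assuming each machine reports a single load value and that a fixed threshold marks saturation, the selection could look like the following. The `Host` class, `pick_host`, the host names, and the 0.8 threshold are all hypothetical and not taken from the presented system.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """A candidate machine; both fields are hypothetical."""
    name: str
    load: float  # utilization in [0.0, 1.0]


def pick_host(hosts, threshold=0.8):
    """Return the least-loaded host below the threshold, or None.

    None signals that every known machine is saturated, which is the
    point where a new Cloud instance could be provisioned on demand.
    """
    candidates = [h for h in hosts if h.load < threshold]
    if not candidates:
        return None
    return min(candidates, key=lambda h: h.load)


hosts = [Host("node-a", 0.95), Host("node-b", 0.40), Host("node-c", 0.65)]
chosen = pick_host(hosts)
```

Returning `None` rather than the least-bad host keeps the provisioning decision separate from the scheduling decision, which matches the slide's split between the resolver and the Cloud infrastructure.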
11. Implementation. Integration of the FTIH into the engine is configured by using process deployment descriptors. The FTIH is realized as a custom invoke handler (ActiveBPEL), and the policy definitions are passed URL-encoded:
    <partnerLink name="ecgAnalyzerPL">
      <partnerRole endpointReference="dynamic" invokeHandler="java:FaultTolerantIH?GlobalPolicy='global.xml';LocalPolicy='local.xml'" />
    </partnerLink>
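To illustrate how the policy file names could be recovered from such an `invokeHandler` value, here is a small sketch. It assumes the layout `scheme:Handler?Key='value';Key='value'` seen in the descriptor above; `parse_invoke_handler` is a hypothetical helper and says nothing about how ActiveBPEL or the FTIH actually parse the string.

```python
def parse_invoke_handler(spec):
    """Split "scheme:Handler?Key='v';Key='v'" into (scheme, handler, params)."""
    target, _, query = spec.partition("?")
    scheme, _, handler = target.partition(":")
    params = {}
    for item in query.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';' or an empty query
        key, _, value = item.partition("=")
        # Drop surrounding whitespace and the single quotes around the value.
        params[key.strip()] = value.strip().strip("'")
    return scheme, handler.strip(), params


spec = "java:FaultTolerantIH?GlobalPolicy='global.xml';LocalPolicy='local.xml'"
scheme, handler, params = parse_invoke_handler(spec)
```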
13. Implementation (cont'd). The Dynamic Resolver is enhanced to cope with restrictions induced by policies. Policy schema:
    <Faults>
      <Fault name="NCName" final="true | false">
        <byCause name="FaultCategory"/>
        <OriginHost retry="true | false">
          <MaxTries value="int"/>
        </OriginHost>?
        <Substitute resources="NONE | PHYSICAL_ONLY | EXISTING | NEW | DIFFERENT | ALL">
          <MaxTries value="int"/>
        </Substitute>?
      </Fault>+
    </Faults>
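The policy elements above suggest a two-stage recovery: retry on the origin host, then substitute another resource. The following is a rough sketch of that control flow, not the FTIH implementation. It assumes that `MaxTries` counts additional attempts after the first failure, that the policy is selected by the name of the first fault, and that any `resources` value other than `NONE` permits substitution (the individual resource categories are not modeled). All class and function names are hypothetical.

```python
from dataclasses import dataclass


class ServiceFault(Exception):
    """Fault raised by a (simulated) web service call."""
    def __init__(self, name):
        super().__init__(name)
        self.name = name


@dataclass
class FaultPolicy:
    retry_origin: bool = True      # <OriginHost retry="...">
    origin_max_tries: int = 1      # <OriginHost><MaxTries value="..."/>
    substitute: str = "NONE"       # <Substitute resources="...">
    substitute_max_tries: int = 0  # <Substitute><MaxTries value="..."/>


def fault_tolerant_invoke(call, origin, substitutes, policies):
    """Run call(host); on a fault with a policy, retry, then substitute."""
    try:
        return call(origin)
    except ServiceFault as fault:
        last = fault
    policy = policies.get(last.name)
    if policy is None:
        raise last  # no policy: the fault reaches the BPEL process
    attempts = []
    if policy.retry_origin:
        attempts += [origin] * policy.origin_max_tries
    if policy.substitute != "NONE":
        attempts += substitutes[:policy.substitute_max_tries]
    for host in attempts:
        try:
            return call(host)
        except ServiceFault as fault:
            last = fault
    raise last  # every permitted attempt failed


calls = []


def flaky(host):
    """Simulated service that always fails on the origin host."""
    calls.append(host)
    if host == "origin":
        raise ServiceFault("Timeout")
    return "ok from " + host


policies = {"Timeout": FaultPolicy(True, 2, "EXISTING", 1)}
result = fault_tolerant_invoke(flaky, "origin", ["spare-1"], policies)
```

In this run the origin host is tried three times (one initial call plus two retries) before the call is redirected to `spare-1`; the invoking process only sees the final result.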
14. Experimental Results. Use case: sleep research. The workflow is based on components developed in cooperation with researchers of the MediGrid (Medical Grid) project of the German Grid initiative (D-Grid). It basically performs an ECG (electrocardiogram) analysis and uses the produced results for apnea detection. The implementation uses the PhysioToolkit, a set of open source tools that is widely used in biomedical sciences.
15. Experimental Results (cont'd). The beat and apnea detection workflow first converts the data format of the recorded vital signs (EDF), then performs beat and apnea detection in parallel. It is modeled using the Visual Grid Orchestrator (ViGO).
16. Experimental Results (cont'd). Web services are programmed to throw different SOAP faults with a certain probability: service faults are enforced randomly during the measurements, and the fault probability is set to 30%. The probability of successful execution of the workflow without any recovery is therefore about 0.7^5 = 0.16807. Policies for all fault categories are set to perform the same number of retries.
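The success probability quoted on this slide can be reproduced with a short calculation. The sketch below assumes five service invocations per workflow run (inferred from the exponent, not stated explicitly) and statistically independent faults; the retry count of 3 is purely illustrative and is not a figure from the experiments.

```python
def workflow_success(p_fault, invocations, retries=0):
    """Probability that every invocation eventually succeeds.

    Each invocation may be attempted (retries + 1) times and fails only
    if all of its attempts fail; faults are assumed to be independent.
    """
    per_invoke = 1.0 - p_fault ** (retries + 1)
    return per_invoke ** invocations


baseline = workflow_success(0.3, 5)         # 0.7**5, about 0.16807
with_retries = workflow_success(0.3, 5, 3)  # (1 - 0.3**4)**5, about 0.96
```

Under these assumptions, allowing three retries per invocation raises the chance of a completed run from roughly 17% to roughly 96%, which illustrates why simple retry policies already recover most infrastructure faults.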
21. Conclusions. A policy-based fault handling mechanism for BPEL was presented. The solution is embedded into the BPEL engine without changing the BPEL standard or the engine's implementation. The use case is based upon medical sleep research.
22. Finish … Thank you for your attention! Any questions or remarks?