SlideShare ist ein Scribd-Unternehmen logo
1 von 90
Downloaden Sie, um offline zu lesen
• Click to edit
 Master title
      style


Scientific workflow management a
 way to enable e-science on both
          Grids and Clouds
                         Adam Belloum
                    Institute of Informatics
                   University of Amsterdam
                     a.s.z.belloum@uva.nl
        SHIWA summer school MTA STAKI, Budapest HU July 2012
• Clck to edit Master title
           style




                                            Outline


                  •  Introduction
                  •  Life cycle of e-Science Workflow
                  •  Different approaches to workflow scheduling
                              –  Workflow Process Modeling & Management In
                                 Grid/Cloud
                              –  Workflow and Web services (intrusive/non
                                 intrusive)
                  •  Provenance
• Clck to edit Master title
            style




•  Objective of the group
       –  address the research issues related to building an
          e-Science framework which enables scientist to
          share, use knowledge add use geographically
          distributed resources (grids, clouds)


•  Keywords:
       –  Grid, Scientific workflow, SOA, provenance,
          interoperability

Collaborative e-Science experiments: from scientific workflow to knowledge
sharing A.S.Z. Belloum, Vladimir Korkhov, Spiros Koulouzis, Marcia A Inda, and Marian
Bubak JULY/AUGUST, IEEE Internet Computing, 2011
• Clck to edit Master title
           style




                              The project: COMMIT

•  COMMIT is a public-private research
   community solving grand challenges in
   information and communication science
   shaping tomorrow’s society.

•  COMMIT has 15 projects and 200 people
   in 80 organisations such as universities,
   TNO, Thales, Logica, Philips, AMC, and
   SME’s like DevLab, Hyves, Waag.

•  COMMIT delivers science, disseminates its
   results, measures its impact, generates
   synergy.

                                      www.Commit-nl.nl
• Clck to edit Master title
            style




                               Workflow management system

•  Workflow
   management system is
   a computer program that
   manages the execution of
   a workflow on a set of
   computing resources.


                                               The user interface of the WS-VLAM a
                                               workflow management system developed
                                               in the VL-e project to execute application
                                               workflow on geographically distributed
                                               computing resources



Deployed as service on Dutch super Computer (DAS3), and Dutch NGI (BigGrid) Clusters
• Clck to edit Master title
           style




                                               Workflow

    A workflow is a model to represent a
    reliably repeatable sequence of operations/
    tasks by showing explicitly the
    interdependencies among them.

                                                   Human transcriptome map




                        http://www.youtube.com/watch?v=R6bTFrzaR_w&feature=player_embedded

 SigWin-Detector workflow has been developed in the VL-e project to detect ridges in
 for instance a Gene Expression sequence or Human transcriptome map, BMC Research
 Notes 2008, 1:63 doi:10.1186/1756-0500-1-63.
• Clck to edit Master title
            style




List of applications developed using WS-VLAM

•  sigWin detector                [Micro-Array Dept-UvA]

•  Affymetrix Permutation         [Micro-Array Dept-UvA]

•  Omnimatch                      [UU/Leiden]

•    wave propagation             [TUE ]

•    Blast                        [AMC ]

•    gut microbiota               [CWI]

•    Smart Infrastructure         [SNE-UvA]

•    Dynamic network control      [SNE-UvA]

•    GridSFEA,                    [TU Munchen]



More applications www.science.uva.nl/~gvlam/wsvlam/Applications
• Clck to edit Master title
           style
• Clck to edit Master title
            style




                               Complex Scientific experiments model
    (1)  Problem                                          (2) Experiment
         investigation:                                       Prototyping:

    •         Look for relevant problems              •  Design experiment workflows
    •         Browse available tools                  •  Develop necessary components
    •         Define the goal
    •         Decompose into steps
                                             Shared
                                           repositories

   (4) Results                                            (3) Experiment
       Publication:                                           Execution:

  •  Annotate data                                        •  Execute experiment processes
  •  Publish data                                         •  Control the execution
                                                          •  Collect and analysis data


Collaborative e-Science experiments: from scientific workflow to knowledge
sharing A.S.Z. Belloum, Vladimir Korkhov, Spiros Koulouzis, Marcia A Inda, and Marian
Bubak JULY/AUGUST, IEEE Internet Computing, 2011
• Clck to edit Master title
           style




                              Targets

•  co-allocate resources needed for workflow
   enactment across multiple domains?
•  achieve QoS for data centric application
   workflows that have special requirements on
   network connections?
•  achieve Robustness and fault tolerance for
   workflow running across distributed resources?
•  increase re-usability of Workflow, workflow
   components, and refine workflow execution?
• Clck to edit Master title
           style




                                  Outline


        •  Introduction
        •  Lifecycle of an e-science workflow
        •  Different approach to workflow scheduling
                  –  Workflow Process Modeling & Management In
                     Grid/Cloud
                  –  Workflow and Web services Workflow and Web
                     services (intrusive/non intrusive)
        •  Provenance                              (1)
                                                 Problem
                                                                        (2)
                                                                    Experiment
                                                                    Prototyping
                                                investigation:
                                                              Shared
                                                            repositories
                                                    (4)                (3)
                                                  Results           Experiment
                                                Publication:         Execution:
• Clck to edit Master title
           style




                         A WSRF enabled workflow engine
                                                      Workflow composition

                     Workflow
                Distributed Workflow
           Management system                                  Workflow execution



                                     Web Service Interface                  Web Service Interface
                                                                                                 Workflow
                                                                                                Workflow
                                                                                    Grid
                                                                                     Grid
                                                                                   Grid            Engine
                                                                                                  Engine
                      Application Templates      OGSA DAI         Application    Services
                                                                                  Services
                                                                                    Services      Web service
                                                                                                  Web service
                           Web service                            web service


                              Grid Middleware:                              Grid Middleware:
                              Data management                       Process & resource management

                      Network & storage Resources                   Network & Computing Resources

                              Data Management Stack                          Process Management Stack




   Bob Hertberger keynote talk at 2nd IEEE Conf on eScience & grid computing , Amsterdam 2006
• Clck to edit Master title
           style




                              WS-VLAM Engine: architecture
                                       Service host(s)      compute element(s)

                                   GT4 Java Container



                                       GRAM
                                      RTSM                      pre-ws-GRAM
                                      services
                                     Factory
               Client




                                                   RTSM       Worker
                                                 Instance     nodes

                    Delegate         Delegation
                                                                  Workflow
                                      service                    components
• Clck to edit Master title
           style




                                           Sequence-diagram
                        WS-vlam                             GT4 Delegation                         RTSM      GT4
                                                                              RTSM Factory
                         Client                                Service                           Instance   GRAM


                                 1. Create: delegation credential

          Step 1               Get the delegation credential EPR



                              2. Submit workflow execution plan
                                                                             3. Submit workflow component


          Step 2
                                    Get the RTSM instance EPR                4. Create: RTSM instance




                              5. subscribe: to notification events

            Step 3                    Get the notification events
• Clck to edit Master title
           style



                                                           WSRF Services
                                                            - WS-VLAM engine
                                 Current deployment         - workflow component repository

VLe Studio
•    WS-VLAM composer
•    VBrowser
•    Semantic tools
 SAW: Semantic Annotation for Workflow
 CLAMP: Connecting LAnguage for Modules & Programs
 HAMMER: Hybrid-bAsed MatchMaker for e-Science                          Sara: National super
           Resources
                                                                        computing center
                                                                             Production Grid

                                                                                    SRB

                                                  Server host

                              Computing Nodes
                              •   Workflow components
                              •   Grid Middleware à GT4
                                                                              Experimental
                                                                              Environment
• Clck to edit Master title
           style




                              Model of computation


•  Model of computation: stream-based process
   network.
      –  Engine co-allocates all workflows.
      –  Components waste time idling.
      –  Co-allocation difficult.
•  Communication: time coupled
      –  Assumes components are running
      –  Simultaneously
      –  Synchronized p2p
      –  Fixed TCP/IP
• Clck to edit Master title
           style




                        WS-VLAM communication library




 V. Korkhov et al. VLAM-G: Interactive data driven workflow engine for Grid-enabled resources,
 Scientific Programming 15 (2007) 173–188 173 IOS Press
• Clck to edit Master title
            style




                           WS-VLAM communication library

•  Data transfer rate as a function
   of the data block size (average
   of 10 measurements per each
   data-block

•  with the deviation not
   exceeding 5 percent)




  V. Korkhov et al. VLAM-G: Interactive data driven workflow engine for Grid-enabled resources,
  Scientific Programming 15 (2007) 173–188 173 IOS Press
• Clck to edit Master title
           style




                              Model of computation


•  Model of computation: dataflow network
      –  components scheduled depending on data
      –  components only activated when data is available
      –  no need for co-allocation

•  Communication: time decouples
      –  messaging communication system.
      –  components not synchronized
      –  communication not strictly TCP/IP
• Clck to edit Master title
           style




                              Additional features-Farming

•  Task farming: task replication.
•  Increases data consumption and production.
•  Implements 3 types of farming:
      –  Auto Farming: The engine decides on farm size
         depending on port load.
      –  One-to-One Farming: A task replicated for every
         message received.
      –  Fixed Farming: Statically defined.
•  Allows parameter sweep studies.
•  A task becomes a parameter engine
• Clck to edit Master title
            style




Reginald Cushing, Spiros Koulouzis, Adam S. Z. Belloum, Marian Bubak, Prediction-based Auto-
scaling of Scientific Workflows, 7th IEEE International Conference on e-Science, December 2011,
Stockholm, Sweden
• Clck to edit Master title
            style




                               System Overview




Reginald Cushing, Spiros Koulouzis, Adam S. Z. Belloum, Marian Bubak, Prediction-based Auto-scaling
of Scientific Workflows, Proceedings of the 9th International Workshop on Middleware for Grids, Clouds
and e-Science, ACM/IFIP/USENIX December 12th, 2011, Lisbon, Portugal
• Clck to edit Master title
           style




                              Enactment Engine
• Clck to edit Master title
           style




                              Message Broker
• Clck to edit Master title
           style




                              Submission System
• Clck to edit Master title
           style




                              Task Harnessing
• Clck to edit Master title
           style




                              Task Auto-scaling
• Clck to edit Master title
           style




                              Scaling Concepts
• Clck to edit Master title
           style




                              Scaling Concepts
• Clck to edit Master title
           style




                              Scaling Concepts
• Clck to edit Master title
           style




                              Scaling Concepts
• Clck to edit Master title
           style




                              Scaling Concepts
• Clck to edit Master title
           style




                              Scaling Concepts
• Clck to edit Master title
           style




                              Load Prediction
• Clck to edit Master title
           style




                              Auto-scaling steps
• Clck to edit Master title
           style




                              Auto scaling steps
• Clck to edit Master title
           style




                              Auto scaling steps
• Clck to edit Master title
           style




                              Auto-scaling Steps
• Clck to edit Master title
           style




                              Auto-scaling Steps
• Clck to edit Master title
           style




                              Auto-scaling Steps
• Clck to edit Master title
            style




                               Auto-scaling Steps (summary)

•  Each task port is monitored to calculate the data processing rate
•  Data is parceled in messages. Tasks consume messages
•  Using the mean data processing rate and the amount of data
   queued on the port we extrapolate the proc. time for all data
•  Based on the current resource we can estimate how many clones
   are needed to process all data within a time quantum
•  Clones are submitted in bursts so not to flood resources
•  Port is continuously monitored and further bursts can be
   submitted
•  Once clones are active, message consumption is faster since
   clones share same queues
• Clck to edit Master title
           style




                              Queue sharing
• Clck to edit Master title
           style




                                       Use case


                              Matlab
• Clck to edit Master title
           style




                              Use case
• Clck to edit Master title
           style




                              Workflow Without Scaling
• Clck to edit Master title
           style




                              Workflow Without Scaling
• Clck to edit Master title
           style




                              Use Case
• Clck to edit Master title
           style




                          Workflow execution with Scaling
• Clck to edit Master title
           style




                          Workflow execution with Scaling
• Clck to edit Master title
           style




                              Auto Scaling Task -1
• Clck to edit Master title
           style




                              Auto Scaling Task -2
• Clck to edit Master title
           style




                              Other Scaled Task -1
• Clck to edit Master title
           style




                              Auto Scaling Task -2
• Clck to edit Master title
            style




                 Extension to support Cloud resources




Resource on-demand using multiple cloud providers, Super-computing 2010, and SCALE 2012
• Clck to edit Master title
           style




                                  Outline

        •  Introduction
        •  Lifecycle of an e-science workflow
        •  Different approach to workflow scheduling
                  –  Workflow Process Modeling & Management In
                     Grid/Cloud
                  –  Workflow and Web services (Workflow and
                     Web services (intrusive/non intrusive)
        •  Provenance
                                                   (1)                  (2)
                                                                    Experiment
                                                 Problem
                                                                    Prototyping
                                                investigation:
                                                              Shared
                                                            repositories
                                                    (4)                (3)
                                                  Results           Experiment
                                                Publication:         Execution:
• Clck to edit Master title
           style




                    Usage of Web Services in e-science


 •  WS offer interoperability and flexibility in a large
    scale distributed environment.
 •  WS can be combined in a workflow so that
    more complex operations may be achieved,
 •  but any workflow implementation is potentially
    faced with a data transport problem
• Clck to edit Master title
            style




                               Service Submission
                                                •  Tasks/Jobs can be queued on
                                                   the runqueue by any entity. The
                                                   service submission listens on
                                                   the runqueue and picks up new
                                                   tasks to submit

                                                •  Resources such as Grid or Cloud
                                                   are abstracted using submitters
                                                   plugins

                                                •  Enabling a new resource is a
                                                   matter of writing its submitter
                                                •  Service Submission performs
                                                   matchmaking between services
                                                   and resources to run on

Reginald Cushing, Spiros Koulouzis, Adam S. Z. Belloum, Marian Bubak, Dynamic Handling for
Cooperating Scientific Web Services, 7th IEEE International Conference on e-Science, December 2011,
Stockholm, Sweden
• Clck to edit Master title
           style




                              Service Container Module
                                         •  The service container (Axis2)
                                            is the actual task that is
                                            submitted to a resource.

                                         •  The service container acts as a
                                            pilot-job mechanism Once
                                            active it will pull a web service
                                            to host.

                                         •  Axis2 is heavily modified to
                                            invert web service invocation
                                            from passive to active.
                                         •  Scaling, orchestration and
                                            communication are all handled
                                            within the service container
• Clck to edit Master title
           style




                              Bootstrapping Workflows

                                           •  The architecture has no
                                              central coordinator to
                                              orchestrate a workflow.
                                              Hence a workflow is only
                                              bootstrapped i.e. submit
                                              the starting services. The
                                              rest are autonomously
                                              scheduled by the service
                                              containers on the resources.

                                           •  The bootstrap client submits
                                              the first service and waits
                                              for output of the last
                                              service.
• Clck to edit Master title
           style




                              Orchestration Steps

                                       1.  Workflow is bootstrapped by
                                           submitting the workflow entry points
                                           onto the queue
                                       2.  Submission service picks up the
                                           queued service and submits to a
                                           resource.
                                       3.  Service container starts executing on a
                                           resource
                                       4.  Service container pulls a web service
                                           and polls for data to be consumed by
                                           the service.
                                       5.  Service container outputs data to the
                                           next service
                                       6.  Service container queues the next
                                           service if none exist
• Clck to edit Master title
           style




                              Service Container - Transport

                                            •  Transport handler requests
                                               SOAP from message broker
                                               queues instead of passively
                                               listening for HTTP

                                            •  Pull model allows web
                                               services to “bypass” firewalls
                                               and thus can be deployed
                                               within networks

                                            •  Transport Sender picks up
                                               the return SOAP message and
                                               sends it to the message broker
• Clck to edit Master title
            style




                               Service Container - Control
                                             •  Fuzzy controller
                                                implements auto-scaling
                                                routines

                                             •  Workflow enactor
                                                implements autonomous
                                                orchestration which makes a
                                                central coordinator
                                                redundant




•  Message transformer transforms a SOAP output to SOAP input
   for other services in the workflow
•  The message transformer allow back-to-back service
   communication
• Clck to edit Master title
           style




                                   Resource management


•  Within a single workflow services are competing
   for resources.
•  Scaling one service without any regard to the whole
   workflow may starve parts of the workflow and
   hamper progress
•  It would be ideal to have a mechanism to greedily
   consume resources if no one is using them but
   donate back resources once they are requested.

                              Fuzzy controller tries to do just that.
• Clck to edit Master title
            style




                                     Fuzzy Controller

                                              •  Task (web service) load and
                                                 Resource load are inputs to
                                                 the fuzzy controller.
                               Rule Base
                               Inference
                                Engine

                                              •  The controller applies a
                                                 number of fuzzy rules to
                                                 determine the output which
                                                 is the replication factor.
•  IF taskLoad IS very_high AND resourceLoad IS very_low
   THEN replication IS positive_aggressive.
•  IF taskLoad IS very_low AND resourceLoad IS high THEN
   replication IS negative_aggressive.
• Clck to edit Master title
            style




                                    Fuzzy Controller


                               Rule Base
                               Inference
                               Engine


                                           •  Task (web service) load and Resource
                                              load are inputs to the fuzzy controller.
                                           •  The controller applies a number of
                                              fuzzy rules to determine the output
                                              which is the replication factor.

•  IF taskLoad IS very_high AND resourceLoad IS very_low
   THEN replication IS positive_aggressive.

•  IF taskLoad IS very_low AND resourceLoad IS high THEN
   replication IS negative_aggressive.
• Clck to edit Master title
           style




                              Fuzzy Rule Map
                                   •  Service load is based on the amount
                                      of data being queued on the service
                                      and the time quantum for the
                                      service to run

                                   •  The service container continuously
                                      monitors the data processing rate
                                      and estimates the computation time
                                      needed to process all the queued
                                      data within a time frame of the data
                                      and the processing time are directly
                                      proportionate. This might not be the
                                      case for all problems.

The estimated processing time and the time quantum given by the
resource for executing the service are used to derive the service load.
Thus a service load of 2 means that it will take twice as much time as the
allocated quantum to process the data.
• Clck to edit Master title
           style




                              Back-to-Back Communication

                                            •  Back-2-Back communication
                                               allows web services to
                                               communicate directly
                                               without the need for an
                                               intermediate client.

                                            •  This is achieved through the
                                               message broker which
                                               exposes dedicated
                                               connections queues.

Foreach connection in A.method1.connections
      SOAPTemplate = getTemplate(connection);
      destinationQueue = getDestination(connection);
      newSOAP = transformSOAP(A.method1.output, SOAPTEmplate);
      write(newSOAP, destinationQueue);
• Clck to edit Master title
           style




                              Autonomous Orchestration
                                          •  The service container can
                                             query the message broker to
                                             deduce if and instance of B is
                                             running.

                                          •  If no instance of B is running,
                                             the service container for A
                                             submits B to the runqueue.

                                          •  Service containers are myopic
 Foreach connection in A.method1.connections
       SOAPTemplate = getTemplate(connection);
       destinationQueue = getDestination(connection);
       newSOAP = transformSOAP(A.method1.output, SOAPTEmplate);
       write(newSOAP, destinationQueue);
       If not active(destinationQueue)
               submit( getService(connection) );
• Clck to edit Master title
            style




                               Use Case

                                        •  Workflow with 2 pipelines.
                                           The pipelines perform
                                           sequence alignments using
                                           data from UniProtKB

                                        •  Each pipeline performs 22500
                                           alignments i.e. 45100 total
                                           alignments in all
•  All modules are standard web services which are hosted in the modified
   Axis2 container
•  The alignments where performed using BioJava api
•  Source and sink are part of the bootstrapping sequence. Source submits
   the getSequenceId service while sink waits for output from the
   htmlRenderer
•  The Distributed ASCI Computer 3 (DAS3) was used as the resource pool.
     www.uniprot.org
• Clck to edit Master title
           style




                              Evaluating Auto-Scaling




                        Service Load
                                            Running Service instances
• Clck to edit Master title
            style




                               Scale up web services




Peaks in load(left) will result in peaks in instances(right).
The fuzzy controllers scale up the web services to meet
the demands
• Clck to edit Master title
           style




                              Scale up web services
• Clck to edit Master title
           style




                              Greedy Scale up
• Clck to edit Master title
           style




                              Scale down web services
• Clck to edit Master title
           style




                                  Outline

        •  Introduction
        •  Lifecycle of an e-science workflow
        •  Different approach to workflow scheduling
                  –  Workflow Process Modeling & Management In
                     Grid/Cloud
                  –  Workflow and Web services (Workflow and
                     Web services (intrusive/non intrusive)
        •  Provenance
                                                   (1)                  (2)
                                                                    Experiment
                                                 Problem
                                                                    Prototyping
                                                investigation:
                                                              Shared
                                                            repositories
                                                    (4)                (3)
                                                  Results           Experiment
                                                Publication:         Execution:
• Clck to edit Master title
            style




                     Usage of Web Services in e-science


•  In service orchestration, all data is passed to the workflow
   engine before delivered to a consuming WS
•  Data transfers are made through SOAP, which is unfit for
   large data transfers




Enabling web services to consume and produce
large distributed datasets Spiros Koulouzis, Reginald Cushing, Konstantinos Karasavvas,
Adam Belloum, Marian Bubak to be published JAN/FEB, IEEE Internet Computing, 2012
• Clck to edit Master title
            style




                                   ProxyWS

•  uses multitude of protocols to transport large data
       –  used as an interface for developing WSs able to stream data.
       –  Or as enabler for legacy web services to stretch their current potential
          by referencing data that would otherwise be delivered via SOAP




Enabling web services to consume and produce
large distributed datasets Spiros Koulouzis, Reginald Cushing, Konstantinos Karasavvas,
Adam Belloum, Marian Bubak to be published JAN/FEB, IEEE Internet Computing, 2012
• Clck to edit Master title
           style




                        Indexing Name Entry Recognition

•  AIDA provides a set of components which
   enable the indexing of text documents in
   various formats.
•  AIDA's Indexer component, called
   IndexerWS is a WS able to index document
   with the use of the Streaming library.
• Clck to edit Master title
            style




Results Indexing Web Services for Information
             Retrieval (Indexing)




Enabling web services to consume and produce large distributed datasets Spiros
Koulouzis, Reginald Cushing, Konstantinos Karasavvas, Adam Belloum, Marian Bubak to be
published JAN/FEB, IEEE Internet Computing, 2012
• Clck to edit Master title
            style




Results Indexing Web Services for Information
               Retrieval (NER)




Enabling web services to consume and produce large distributed datasets Spiros
Koulouzis, Reginald Cushing, Konstantinos Karasavvas, Adam Belloum, Marian Bubak to be
published JAN/FEB, IEEE Internet Computing, 2012
• Clck to edit Master title
           style




                                   References

           1.  A.S.Z. Belloum, V. Korkhov, S. Koulouzis, M. A Inda, and M. Bubak
               Collaborative e-Science experiments: from scientific workflow to
               knowledge sharing JULY/AUGUST, IEEE Internet Computing, 2011
           2.  Ilkay Altintas, Manish Kumar Anand, Daniel Crawl, Shawn Bowers, Adam
               Belloum, Paolo Missier, Bertram Ludascher, Carole A. Goble, Peter M.A.
               Sloot, Understanding Collaborative Studies Through Interoperable
               Workflow Provenance, IPAW2010, Troy, NY, USA
           3.  A. Belloum, Z. Zhao, and M. Bubak Workflow systems and applications ,
               Future Generation Comp. Syst. 25 (5): 525-527 (2009)
           4.  Z. Zhao, A.S.Z. Belloum, et al., Distributed execution of aggregated multi
               domain workflows using an agent framework The 1st IEEE International
               Workshop on Scientific Workflows, Salt Lake City, U.SA, 2007
           5.  Zhiming Zhao, Adam Belloum, Cees De Laat, Pieter Adriaans, Bob
               Hertzberger Using Jade agent framework to prototype an e-Science
               workflow bus Authors Cluster Computing and the Grid, 2007. CCGRID
               2007
• Clck to edit Master title
           style




                       http://www.commit-nl.nl/




http://www.science.n/~gvlam/wsvlam/
                                                  http://www.vle.nl/
• Clck to edit Master title
           style




                                  Outline

        •  Introduction
        •  Lifecycle of an e-science workflow
        •  Different approach to workflow scheduling
                  –  Workflow Process Modeling & Management In
                     Grid/Cloud
                  –  Workflow and Web services (intrusive/non-
                     intrusive)
        •  provenance
                                                   (1)                  (2)
                                                                    Experiment
                                                 Problem
                                                                    Prototyping
                                                investigation:
                                                              Shared
                                                            repositories
                                                    (4)                (3)
                                                  Results           Experiment
                                                Publication:         Execution:
• Clck to edit Master title
           style




                              Provenance/ reproducibility



•  “A complete provenance record for a data
   object allows the possibility to reproduce
   the result and reproducibility is a critical
   component of the scientific method”

•  Provenance: The recording of metadata
   and provenance information during the
   various stages of the workflow lifecycle
 Workflows and e-Science: An overview of workflow system features
 and capabilities Ewa Deelmana, Dennis Gannonb, Matthew Shields c, Ian Taylor, Future
 Generation Computer Systems 25 (2009) 528540
• Clck to edit Master title
            style




                          History-tracing XML (FH Aachen)

•  provides data/process                                <patternMatch>
                                                          <events>
   provenance following an                                  <PortResolved> provenance
   approach that                                        data</PortResolved>
                                                            <ConDone>provenance data
       –  maps the workflow graph                                     </ConDone>
          to a layered structure of                         ...
                                                          </events>
          an XML document.                                <fileReader2>
       –  This allows an intuitive                          <events> ... </events>
                                                            <sign-fileReader2> ...
          and easy processable                                     </signfileReader2>
          representation of the                           </fileReader2>
          workflow execution path,                        <sffToFasta>
                                                            Reference
       –  which can be, eventually,                       </sffToFasta>
          electronically signed.                          <sign-patternMatch> ...
                                                                 </sign-patternMatch>
                                                        </patternMatch>

M. Gerards, Adam S. Z. Belloum, F. Berritz, V. Snder, S. Skorupa, A History-tracing XML-base
Proveannce Framework for workflows, WORKS 2010, New Orleans, USA, November 2010
• Clck to edit Master title
           style




                                           Plier (UvA/BigGrid)

     •      PLIER is an implementation of the OPM 1.1 specifications.
     •      It’s API provides a set of functions to build, store, and share workflow experiments
            as graphs.
     •      It also implements an optimal relational database as back-end storage that captures
            the concepts of the OPM model, using the Java Persistence API (JPA 2.0) and
            Hibernate.
     •      In addition, the PLIER API provides specific interfaces, using JDO 3.1, to transform,
            or serialize, the provenance data into specific formats (e.g. RDF, XML, and DOT).

                                                                                                            Id_36813958
                                                                                        tagMichel200        1_input_resul
                                                      resultPatternM
                                                      atch                              80707.txt           t_fasta
            Ribosomal_H
            uman.gz                                                                                    parameter
                                                                          Id 1587433265
                                                                          input patternFile        input_file           input_file
                                Input sffinfo
                                Component.tar
                                                      parameter
            sffTo                               input_file                  input_file
            Fasta                                            Pattern                                            blast
                              TrigeredBy                     Match                                              all
                                                                          TrigeredBy
          Completed                                                                                        Completed
                                                             Completed
                    output_file        output_file                                 output_file                      output_file
        • Id 81081428                  Id_1587433265_outp                Id_1587433265_out             Id_368139581_outpu
       output sffOutput                ut_result_fasta                   put_result_txt                t_out_blast_tar
• Clck to edit Master title
            style




                 wave propagation model applications




                                                        User Interface to compose workflow (top
                                                        right), monitor the execution of the farmed
                                                        workflows (top left), and monitor each run
                                                        separately (bottom left) data
              [Biomedical engineering Cardiovascular
              biomechanics group TUE])
    wave propagation model of blood flow in large
    vessels using an approximate velocity profile
    function:

    a biomedical study for which 3000 runs were
    required to perform a global sensitivity analysis
    of a blood pressure wave propagation in arteries

                                                        Query interface for the provenance data
                                                        collected from 3000 simulations of the “wave
                                                        propagation model of blood flow in large
BigGrid project 2009, presented EGI/                    vessels using an approximate velocity profile
BigGrid technical forum 2010                            function”
• Clck to edit Master title
            style




                               Blast Application


                                                    [Department of Clinical
                                                    Epidemiology, Biostatistics and
                                                    Bioinformatics (KEBB), AMC ]


 The aim of the application is the alignment
 of DNA sequence data with a given
 reference database. A workflow approach is
 currently followed to run this application on
 distributed computing resources.

For Each workflow run
• The provenance data is collected an stored
following the XML-tracing system
• User interface allows to reproduce events that
occurred at runtime (replay mode)
• User Interface can be customized (User can
select the events to track)
• User Interface show resource usage               on-going work UvA-AMC-fh-aachen
• Clck to edit Master title
           style




                                             Outline

                  •  Objectives
                  •  Problem statement and Challenges
                  •  Research Track
                              –  Track 1: Workflow Process Modeling &
                                 Management In Grid/Cloud
                              –  Track 2: Workflow Sharing and Reproducibility
                                 Workflow Semantics and provenance
                              –  Track 3: Management of scientific data
                                 Scalable Data access
                  •  Applications
• Clck to edit Master title
           style




                       http://www.commit-nl.nl/




http://www.science.n/~gvlam/wsvlam/
                                                  http://www.vle.nl/

Weitere ähnliche Inhalte

Ähnlich wie Adam shiwa summerschool 2012

Adam e science_2013_1
Adam e science_2013_1Adam e science_2013_1
Adam e science_2013_1Adam Belloum
 
Build Java Web Application Using Apache Struts
Build Java Web Application Using Apache Struts Build Java Web Application Using Apache Struts
Build Java Web Application Using Apache Struts weili_at_slideshare
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsCarole Goble
 
Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud ComputingDavid Wallom
 
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery LabsIncquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery LabsIncQuery Labs
 
10 - Architetture Software - More architectural styles
10 - Architetture Software - More architectural styles10 - Architetture Software - More architectural styles
10 - Architetture Software - More architectural stylesMajong DevJfu
 
An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEWShiyong Lu
 
Software variability management - 2017
Software variability management - 2017Software variability management - 2017
Software variability management - 2017XavierDevroey
 
Tools. Techniques. Trouble?
Tools. Techniques. Trouble?Tools. Techniques. Trouble?
Tools. Techniques. Trouble?Testplant
 
Emulating cisco network laboratory topologies in the cloud
Emulating cisco network laboratory topologies in the cloudEmulating cisco network laboratory topologies in the cloud
Emulating cisco network laboratory topologies in the cloudronan messi
 
Development and Test on AWS - Pizette
Development and Test on AWS - PizetteDevelopment and Test on AWS - Pizette
Development and Test on AWS - PizetteAmazon Web Services
 
Software System Scalability: Concepts and Techniques (keynote talk at ISEC 2009)
Software System Scalability: Concepts and Techniques (keynote talk at ISEC 2009)Software System Scalability: Concepts and Techniques (keynote talk at ISEC 2009)
Software System Scalability: Concepts and Techniques (keynote talk at ISEC 2009)David Rosenblum
 
Model-Driven Cloud Data Storage
Model-Driven Cloud Data StorageModel-Driven Cloud Data Storage
Model-Driven Cloud Data Storagejccastrejon
 
Grid is Dead ? Nimrod on the Cloud
Grid is Dead ? Nimrod on the CloudGrid is Dead ? Nimrod on the Cloud
Grid is Dead ? Nimrod on the CloudAdianto Wibisono
 
Declarative Programming and a form of SDN
Declarative Programming and a form of SDN Declarative Programming and a form of SDN
Declarative Programming and a form of SDN Miya Kohno
 
Mk network programmability-03_en
Mk network programmability-03_enMk network programmability-03_en
Mk network programmability-03_enMiya Kohno
 
Tivoli Development Cloud Pennock Final Web
Tivoli Development Cloud Pennock Final WebTivoli Development Cloud Pennock Final Web
Tivoli Development Cloud Pennock Final WebKennisportal
 
Alex mang patterns for scalability in microsoft azure application
Alex mang   patterns for scalability in microsoft azure applicationAlex mang   patterns for scalability in microsoft azure application
Alex mang patterns for scalability in microsoft azure applicationCodecamp Romania
 
WebLogic Event Server - Alexandre Alves, BEA
WebLogic Event Server - Alexandre Alves, BEAWebLogic Event Server - Alexandre Alves, BEA
WebLogic Event Server - Alexandre Alves, BEAmfrancis
 
Workflow User Interfaces Patterns
Workflow User Interfaces PatternsWorkflow User Interfaces Patterns
Workflow User Interfaces PatternsJean Vanderdonckt
 

Ähnlich wie Adam shiwa summerschool 2012 (20)

Adam e science_2013_1
Adam e science_2013_1Adam e science_2013_1
Adam e science_2013_1
 
Build Java Web Application Using Apache Struts
Build Java Web Application Using Apache Struts Build Java Web Application Using Apache Struts
Build Java Web Application Using Apache Struts
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud Computing
 
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery LabsIncquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
Incquery Suite Models 2020 Conference by István Ráth, CEO of IncQuery Labs
 
10 - Architetture Software - More architectural styles
10 - Architetture Software - More architectural styles10 - Architetture Software - More architectural styles
10 - Architetture Software - More architectural styles
 
An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEW
 
Software variability management - 2017
Software variability management - 2017Software variability management - 2017
Software variability management - 2017
 
Tools. Techniques. Trouble?
Tools. Techniques. Trouble?Tools. Techniques. Trouble?
Tools. Techniques. Trouble?
 
Emulating cisco network laboratory topologies in the cloud
Emulating cisco network laboratory topologies in the cloudEmulating cisco network laboratory topologies in the cloud
Emulating cisco network laboratory topologies in the cloud
 
Development and Test on AWS - Pizette
Development and Test on AWS - PizetteDevelopment and Test on AWS - Pizette
Development and Test on AWS - Pizette
 
Software System Scalability: Concepts and Techniques (keynote talk at ISEC 2009)
Software System Scalability: Concepts and Techniques (keynote talk at ISEC 2009)Software System Scalability: Concepts and Techniques (keynote talk at ISEC 2009)
Software System Scalability: Concepts and Techniques (keynote talk at ISEC 2009)
 
Model-Driven Cloud Data Storage
Model-Driven Cloud Data StorageModel-Driven Cloud Data Storage
Model-Driven Cloud Data Storage
 
Grid is Dead ? Nimrod on the Cloud
Grid is Dead ? Nimrod on the CloudGrid is Dead ? Nimrod on the Cloud
Grid is Dead ? Nimrod on the Cloud
 
Declarative Programming and a form of SDN
Declarative Programming and a form of SDN Declarative Programming and a form of SDN
Declarative Programming and a form of SDN
 
Mk network programmability-03_en
Mk network programmability-03_enMk network programmability-03_en
Mk network programmability-03_en
 
Tivoli Development Cloud Pennock Final Web
Tivoli Development Cloud Pennock Final WebTivoli Development Cloud Pennock Final Web
Tivoli Development Cloud Pennock Final Web
 
Alex mang patterns for scalability in microsoft azure application
Alex mang   patterns for scalability in microsoft azure applicationAlex mang   patterns for scalability in microsoft azure application
Alex mang patterns for scalability in microsoft azure application
 
WebLogic Event Server - Alexandre Alves, BEA
WebLogic Event Server - Alexandre Alves, BEAWebLogic Event Server - Alexandre Alves, BEA
WebLogic Event Server - Alexandre Alves, BEA
 
Workflow User Interfaces Patterns
Workflow User Interfaces PatternsWorkflow User Interfaces Patterns
Workflow User Interfaces Patterns
 

Kürzlich hochgeladen

Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...raviapr7
 
HED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfHED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfMohonDas
 
KARNAADA.pptx made by - saransh dwivedi ( SD ) - SHALAKYA TANTRA - ENT - 4...
KARNAADA.pptx  made by -  saransh dwivedi ( SD ) -  SHALAKYA TANTRA - ENT - 4...KARNAADA.pptx  made by -  saransh dwivedi ( SD ) -  SHALAKYA TANTRA - ENT - 4...
KARNAADA.pptx made by - saransh dwivedi ( SD ) - SHALAKYA TANTRA - ENT - 4...M56BOOKSTORE PRODUCT/SERVICE
 
How to Solve Singleton Error in the Odoo 17
How to Solve Singleton Error in the  Odoo 17How to Solve Singleton Error in the  Odoo 17
How to Solve Singleton Error in the Odoo 17Celine George
 
Vani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
Vani Magazine - Quarterly Magazine of Seshadripuram Educational TrustVani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
Vani Magazine - Quarterly Magazine of Seshadripuram Educational TrustSavipriya Raghavendra
 
How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17Celine George
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxDr. Asif Anas
 
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRADUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRATanmoy Mishra
 
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptxSandy Millin
 
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...Nguyen Thanh Tu Collection
 
Optical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptxOptical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptxPurva Nikam
 
Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.EnglishCEIPdeSigeiro
 
Diploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdfDiploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdfMohonDas
 
Department of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdfDepartment of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdfMohonDas
 
How to Send Emails From Odoo 17 Using Code
How to Send Emails From Odoo 17 Using CodeHow to Send Emails From Odoo 17 Using Code
How to Send Emails From Odoo 17 Using CodeCeline George
 
SOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptx
SOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptxSOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptx
SOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptxSyedNadeemGillANi
 
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptxAUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptxiammrhaywood
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsEugene Lysak
 

Kürzlich hochgeladen (20)

Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...
 
HED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfHED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdf
 
March 2024 Directors Meeting, Division of Student Affairs and Academic Support
March 2024 Directors Meeting, Division of Student Affairs and Academic SupportMarch 2024 Directors Meeting, Division of Student Affairs and Academic Support
March 2024 Directors Meeting, Division of Student Affairs and Academic Support
 
KARNAADA.pptx made by - saransh dwivedi ( SD ) - SHALAKYA TANTRA - ENT - 4...
KARNAADA.pptx  made by -  saransh dwivedi ( SD ) -  SHALAKYA TANTRA - ENT - 4...KARNAADA.pptx  made by -  saransh dwivedi ( SD ) -  SHALAKYA TANTRA - ENT - 4...
KARNAADA.pptx made by - saransh dwivedi ( SD ) - SHALAKYA TANTRA - ENT - 4...
 
How to Solve Singleton Error in the Odoo 17
How to Solve Singleton Error in the  Odoo 17How to Solve Singleton Error in the  Odoo 17
How to Solve Singleton Error in the Odoo 17
 
Vani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
Vani Magazine - Quarterly Magazine of Seshadripuram Educational TrustVani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
Vani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
 
How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptx
 
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRADUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
DUST OF SNOW_BY ROBERT FROST_EDITED BY_ TANMOY MISHRA
 
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
 
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
 
Optical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptxOptical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptx
 
Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.Easter in the USA presentation by Chloe.
Easter in the USA presentation by Chloe.
 
Diploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdfDiploma in Nursing Admission Test Question Solution 2023.pdf
Diploma in Nursing Admission Test Question Solution 2023.pdf
 
Department of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdfDepartment of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdf
 
How to Send Emails From Odoo 17 Using Code
How to Send Emails From Odoo 17 Using CodeHow to Send Emails From Odoo 17 Using Code
How to Send Emails From Odoo 17 Using Code
 
Prelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quizPrelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quiz
 
SOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptx
SOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptxSOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptx
SOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptx
 
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptxAUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George Wells
 

Adam shiwa summerschool 2012

  • 1. • Click to edit Master title style Scientific workflow management a way to enable e-science on both Grids and Clouds Adam Belloum Institute of Informatics University of Amsterdam a.s.z.belloum@uva.nl SHIWA summer school MTA STAKI, Budapest HU July 2012
  • 2. • Clck to edit Master title style Outline •  Introduction •  Life cycle of e-Science Workflow •  Different approaches to workflow scheduling –  Workflow Process Modeling & Management In Grid/Cloud –  Workflow and Web services (intrusive/non intrusive) •  Provenance
  • 3. • Clck to edit Master title style •  Objective of the group –  address the research issues related to building an e-Science framework which enables scientist to share, use knowledge add use geographically distributed resources (grids, clouds) •  Keywords: –  Grid, Scientific workflow, SOA, provenance, interoperability Collaborative e-Science experiments: from scientific workflow to knowledge sharing A.S.Z. Belloum, Vladimir Korkhov, Spiros Koulouzis, Marcia A Inda, and Marian Bubak JULY/AUGUST, IEEE Internet Computing, 2011
  • 4. • Clck to edit Master title style The project: COMMIT •  COMMIT is a public-private research community solving grand challenges in information and communication science shaping tomorrow’s society. •  COMMIT has 15 projects and 200 people in 80 organisations such as universities, TNO, Thales, Logica, Philips, AMC, and SME’s like DevLab, Hyves, Waag. •  COMMIT delivers science, disseminates its results, measures its impact, generates synergy. www.Commit-nl.nl
  • 5. • Clck to edit Master title style Workflow management system •  Workflow management system is a computer program that manages the execution of a workflow on a set of computing resources. The user interface of the WS-VLAM a workflow management system developed in the VL-e project to execute application workflow on geographically distributed computing resources Deployed as service on Dutch super Computer (DAS3), and Dutch NGI (BigGrid) Clusters
  • 6. • Clck to edit Master title style Workflow A workflow is a model to represent a reliably repeatable sequence of operations/ tasks by showing explicitly the interdependencies among them. Human transcriptome map http://www.youtube.com/watch?v=R6bTFrzaR_w&feature=player_embedded SigWin-Detector workflow has been developed in the VL-e project to detect ridges in for instance a Gene Expression sequence or Human transcriptome map, BMC Research Notes 2008, 1:63 doi:10.1186/1756-0500-1-63.
  • 7. • Clck to edit Master title style List of applications developed using WS-VLAM •  sigWin detector [Micro-Array Dept-UvA] •  Affymetrix Permutation [Micro-Array Dept-UvA] •  Omnimatch [UU/Leiden] •  wave propagation [TUE ] •  Blast [AMC ] •  gut microbiota [CWI] •  Smart Infrastructure [SNE-UvA] •  Dynamic network control [SNE-UvA] •  GridSFEA, [TU Munchen] More applications www.science.uva.nl/~gvlam/wsvlam/Applications
  • 8. • Clck to edit Master title style
  • 9. • Clck to edit Master title style Complex Scientific experiments model (1)  Problem (2) Experiment investigation: Prototyping: •  Look for relevant problems •  Design experiment workflows •  Browse available tools •  Develop necessary components •  Define the goal •  Decompose into steps Shared repositories (4) Results (3) Experiment Publication: Execution: •  Annotate data •  Execute experiment processes •  Publish data •  Control the execution •  Collect and analysis data Collaborative e-Science experiments: from scientific workflow to knowledge sharing A.S.Z. Belloum, Vladimir Korkhov, Spiros Koulouzis, Marcia A Inda, and Marian Bubak JULY/AUGUST, IEEE Internet Computing, 2011
  • 10. • Clck to edit Master title style Targets •  co-allocate resources needed for workflow enactment across multiple domains? •  achieve QoS for data centric application workflows that have special requirements on network connections? •  achieve Robustness and fault tolerance for workflow running across distributed resources? •  increase re-usability of Workflow, workflow components, and refine workflow execution?
  • 11. • Clck to edit Master title style Outline •  Introduction •  Lifecycle of an e-science workflow •  Different approach to workflow scheduling –  Workflow Process Modeling & Management In Grid/Cloud –  Workflow and Web services Workflow and Web services (intrusive/non intrusive) •  Provenance (1) Problem (2) Experiment Prototyping investigation: Shared repositories (4) (3) Results Experiment Publication: Execution:
  • 12. • Clck to edit Master title style A WSRF enabled workflow engine Workflow composition Workflow Distributed Workflow Management system Workflow execution Web Service Interface Web Service Interface Workflow Workflow Grid Grid Grid Engine Engine Application Templates OGSA DAI Application Services Services Services Web service Web service Web service web service Grid Middleware: Grid Middleware: Data management Process & resource management Network & storage Resources Network & Computing Resources Data Management Stack Process Management Stack Bob Hertberger keynote talk at 2nd IEEE Conf on eScience & grid computing , Amsterdam 2006
  • 13. • Clck to edit Master title style WS-VLAM Engine: architecture Service host(s) compute element(s) GT4 Java Container GRAM RTSM pre-ws-GRAM services Factory Client RTSM Worker Instance nodes Delegate Delegation Workflow service components
  • 14. • Clck to edit Master title style Sequence-diagram WS-vlam GT4 Delegation RTSM GT4 RTSM Factory Client Service Instance GRAM 1. Create: delegation credential Step 1 Get the delegation credential EPR 2. Submit workflow execution plan 3. Submit workflow component Step 2 Get the RTSM instance EPR 4. Create: RTSM instance 5. subscribe: to notification events Step 3 Get the notification events
  • 15. • Clck to edit Master title style WSRF Services - WS-VLAM engine Current deployment - workflow component repository VLe Studio •  WS-VLAM composer •  VBrowser •  Semantic tools SAW: Semantic Annotation for Workflow CLAMP: Connecting LAnguage for Modules & Programs HAMMER: Hybrid-bAsed MatchMaker for e-Science Sara: National super Resources computing center Production Grid SRB Server host Computing Nodes •  Workflow components •  Grid Middleware à GT4 Experimental Environment
  • 16. • Clck to edit Master title style Model of computation •  Model of computation: stream-based process network. –  Engine co-allocates all workflows. –  Components waste time idling. –  Co-allocation difficult. •  Communication: time coupled –  Assumes components are running –  Simultaneously –  Synchronized p2p –  Fixed TCP/IP
  • 17. • Clck to edit Master title style WS-VLAM communication library V. Korkhov et al. VLAM-G: Interactive data driven workflow engine for Grid-enabled resources, Scientific Programming 15 (2007) 173–188 173 IOS Press
  • 18. • Clck to edit Master title style WS-VLAM communication library •  Data transfer rate as a function of the data block size (average of 10 measurements per each data-block •  with the deviation not exceeding 5 percent) V. Korkhov et al. VLAM-G: Interactive data driven workflow engine for Grid-enabled resources, Scientific Programming 15 (2007) 173–188 173 IOS Press
  • 19. • Clck to edit Master title style Model of computation •  Model of computation: dataflow network –  components scheduled depending on data –  components only activated when data is available –  no need for co-allocation •  Communication: time decouples –  messaging communication system. –  components not synchronized –  communication not strictly TCP/IP
  • 20. • Clck to edit Master title style Additional features-Farming •  Task farming: task replication. •  Increases data consumption and production. •  Implements 3 types of farming: –  Auto Farming: The engine decides on farm size depending on port load. –  One-to-One Farming: A task replicated for every message received. –  Fixed Farming: Statically defined. •  Allows parameter sweep studies. •  A task becomes a parameter engine
  • 21. • Clck to edit Master title style Reginald Cushing, Spiros Koulouzis, Adam S. Z. Belloum, Marian Bubak, Prediction-based Auto- scaling of Scientific Workflows, 7th IEEE International Conference on e-Science, December 2011, Stockholm, Sweden
  • 22. • Clck to edit Master title style System Overview Reginald Cushing, Spiros Koulouzis, Adam S. Z. Belloum, Marian Bubak, Prediction-based Auto-scaling of Scientific Workflows, Proceedings of the 9th International Workshop on Middleware for Grids, Clouds and e-Science, ACM/IFIP/USENIX December 12th, 2011, Lisbon, Portugal
  • 23. • Clck to edit Master title style Enactment Engine
  • 24. • Clck to edit Master title style Message Broker
  • 25. • Clck to edit Master title style Submission System
  • 26. • Clck to edit Master title style Task Harnessing
  • 27. • Clck to edit Master title style Task Auto-scaling
  • 28. • Clck to edit Master title style Scaling Concepts
  • 29. • Clck to edit Master title style Scaling Concepts
  • 30. • Clck to edit Master title style Scaling Concepts
  • 31. • Clck to edit Master title style Scaling Concepts
  • 32. • Clck to edit Master title style Scaling Concepts
  • 33. • Clck to edit Master title style Scaling Concepts
  • 34. • Clck to edit Master title style Load Prediction
  • 35. • Clck to edit Master title style Auto-scaling steps
  • 36. • Clck to edit Master title style Auto scaling steps
  • 37. • Clck to edit Master title style Auto scaling steps
  • 38. • Clck to edit Master title style Auto-scaling Steps
  • 39. • Clck to edit Master title style Auto-scaling Steps
  • 40. • Clck to edit Master title style Auto-scaling Steps
  • 41. • Clck to edit Master title style Auto-scaling Steps (summary) •  Each task port is monitored to calculate the data processing rate •  Data is parceled in messages. Tasks consume messages •  Using the mean data processing rate and the amount of data queued on the port we extrapolate the proc. time for all data •  Based on the current resource we can estimate how many clones are needed to process all data within a time quantum •  Clones are submitted in bursts so not to flood resources •  Port is continuously monitored and further bursts can be submitted •  Once clones are active, message consumption is faster since clones share same queues
  • 42. • Clck to edit Master title style Queue sharing
  • 43. • Clck to edit Master title style Use case Matlab
  • 44. • Clck to edit Master title style Use case
  • 45. • Clck to edit Master title style Workflow Without Scaling
  • 46. • Clck to edit Master title style Workflow Without Scaling
  • 47. • Clck to edit Master title style Use Case
  • 48. • Clck to edit Master title style Workflow execution with Scaling
  • 49. • Clck to edit Master title style Workflow execution with Scaling
  • 50. • Clck to edit Master title style Auto Scaling Task -1
  • 51. • Clck to edit Master title style Auto Scaling Task -2
  • 52. • Clck to edit Master title style Other Scaled Task -1
  • 53. • Clck to edit Master title style Auto Scaling Task -2
  • 54. • Clck to edit Master title style Extension to support Cloud resources Resource on-demand using multiple cloud providers, Super-computing 2010, and SCALE 2012
  • 55. • Clck to edit Master title style Outline •  Introduction •  Lifecycle of an e-science workflow •  Different approach to workflow scheduling –  Workflow Process Modeling & Management In Grid/Cloud –  Workflow and Web services (Workflow and Web services (intrusive/non intrusive) •  Provenance (1) (2) Experiment Problem Prototyping investigation: Shared repositories (4) (3) Results Experiment Publication: Execution:
  • 56. • Clck to edit Master title style Usage of Web Services in e-science •  WS offer interoperability and flexibility in a large scale distributed environment. •  WS can be combined in a workflow so that more complex operations may be achieved, •  but any workflow implementation is potentially faced with a data transport problem
  • 57. • Clck to edit Master title style Service Submission •  Tasks/Jobs can be queued on the runqueue by any entity. The service submission listens on the runqueue and picks up new tasks to submit •  Resources such as Grid or Cloud are abstracted using submitters plugins •  Enabling a new resource is a matter of writing its submitter •  Service Submission performs matchmaking between services and resources to run on Reginald Cushing, Spiros Koulouzis, Adam S. Z. Belloum, Marian Bubak, Dynamic Handling for Cooperating Scientific Web Services, 7th IEEE International Conference on e-Science, December 2011, Stockholm, Sweden
  • 58. • Clck to edit Master title style Service Container Module •  The service container (Axis2) is the actual task that is submitted to a resource. •  The service container acts as a pilot-job mechanism Once active it will pull a web service to host. •  Axis2 is heavily modified to invert web service invocation from passive to active. •  Scaling, orchestration and communication are all handled within the service container
  • 59. • Clck to edit Master title style Bootstrapping Workflows •  The architecture has no central coordinator to orchestrate a workflow. Hence a workflow is only bootstrapped i.e. submit the starting services. The rest are autonomously scheduled by the service containers on the resources. •  The bootstrap client submits the first service and waits for output of the last service.
  • 60. • Clck to edit Master title style Orchestration Steps 1.  Workflow is bootstrapped by submitting the workflow entry points onto the queue 2.  Submission service picks up the queued service and submits to a resource. 3.  Service container starts executing on a resource 4.  Service container pulls a web service and polls for data to be consumed by the service. 5.  Service container outputs data to the next service 6.  Service container queues the next service if none exist
  • 61. • Clck to edit Master title style Service Container - Transport •  Transport handler requests SOAP from message broker queues instead of passively listening for HTTP •  Pull model allows web services to “bypass” firewalls and thus can be deployed within networks •  Transport Sender picks up the return SOAP message and sends it to the message broker
  • 62. • Clck to edit Master title style Service Container - Control •  Fuzzy controller implements auto-scaling routines •  Workflow enactor implements autonomous orchestration which makes a central coordinator redundant •  Message transformer transforms a SOAP output to SOAP input for other services in the workflow •  The message transformer allow back-to-back service communication
  • 63. • Clck to edit Master title style Resource management •  Within a single workflow services are competing for resources. •  Scaling one service without any regard to the whole workflow may starve parts of the workflow and hamper progress •  It would be ideal to have a mechanism to greedily consume resources if no one is using them but donate back resources once they are requested. Fuzzy controller tries to do just that.
  • 64. • Clck to edit Master title style Fuzzy Controller •  Task (web service) load and Resource load are inputs to the fuzzy controller. Rule Base Inference Engine •  The controller applies a number of fuzzy rules to determine the output which is the replication factor. •  IF taskLoad IS very_high AND resourceLoad IS very_low THEN replication IS positive_aggressive. •  IF taskLoad IS very_low AND resourceLoad IS high THEN replication IS negative_aggressive.
  • 65. • Clck to edit Master title style Fuzzy Controller Rule Base Inference Engine •  Task (web service) load and Resource load are inputs to the fuzzy controller. •  The controller applies a number of fuzzy rules to determine the output which is the replication factor. •  IF taskLoad IS very_high AND resourceLoad IS very_low THEN replication IS positive_aggressive. •  IF taskLoad IS very_low AND resourceLoad IS high THEN replication IS negative_aggressive.
  • 66. • Clck to edit Master title style Fuzzy Rule Map •  Service load is based on the amount of data being queued on the service and the time quantum for the service to run •  The service container continuously monitors the data processing rate and estimates the computation time needed to process all the queued data within a time frame of the data and the processing time are directly proportionate. This might not be the case for all problems. The estimated processing time and the time quantum given by the resource for executing the service are used to derive the service load. Thus a service load of 2 means that it will take twice as much time as the allocated quantum to process the data.
  • 67. • Clck to edit Master title style Back-to-Back Communication •  Back-2-Back communication allows web services to communicate directly without the need for an intermediate client. •  This is achieved through the message broker which exposes dedicated connections queues. Foreach connection in A.method1.connections SOAPTemplate = getTemplate(connection); destinationQueue = getDestination(connection); newSOAP = transformSOAP(A.method1.output, SOAPTEmplate); write(newSOAP, destinationQueue);
  • 68. • Clck to edit Master title style Autonomous Orchestration •  The service container can query the message broker to deduce if and instance of B is running. •  If no instance of B is running, the service container for A submits B to the runqueue. •  Service containers are myopic Foreach connection in A.method1.connections SOAPTemplate = getTemplate(connection); destinationQueue = getDestination(connection); newSOAP = transformSOAP(A.method1.output, SOAPTEmplate); write(newSOAP, destinationQueue); If not active(destinationQueue) submit( getService(connection) );
  • 69. • Clck to edit Master title style Use Case •  Workflow with 2 pipelines. The pipelines perform sequence alignments using data from UniProtKB •  Each pipeline performs 22500 alignments i.e. 45100 total alignments in all •  All modules are standard web services which are hosted in the modified Axis2 container •  The alignments where performed using BioJava api •  Source and sink are part of the bootstrapping sequence. Source submits the getSequenceId service while sink waits for output from the htmlRenderer •  The Distributed ASCI Computer 3 (DAS3) was used as the resource pool. www.uniprot.org
  • 70. • Clck to edit Master title style Evaluating Auto-Scaling Service Load Running Service instances
  • 71. • Clck to edit Master title style Scale up web services Peaks in load(left) will result in peaks in instances(right). The fuzzy controllers scale up the web services to meet the demands
  • 72. • Clck to edit Master title style Scale up web services
  • 73. • Clck to edit Master title style Greedy Scale up
  • 74. • Clck to edit Master title style Scale down web services
  • 75. • Clck to edit Master title style Outline •  Introduction •  Lifecycle of an e-science workflow •  Different approach to workflow scheduling –  Workflow Process Modeling & Management In Grid/Cloud –  Workflow and Web services (Workflow and Web services (intrusive/non intrusive) •  Provenance (1) (2) Experiment Problem Prototyping investigation: Shared repositories (4) (3) Results Experiment Publication: Execution:
  • 76. • Clck to edit Master title style Usage of Web Services in e-science •  In service orchestration, all data is passed to the workflow engine before delivered to a consuming WS •  Data transfers are made through SOAP, which is unfit for large data transfers Enabling web services to consume and produce large distributed datasets Spiros Koulouzis, Reginald Cushing, Konstantinos Karasavvas, Adam Belloum, Marian Bubak to be published JAN/FEB, IEEE Internet Computing, 2012
  • 77. • Clck to edit Master title style ProxyWS •  uses multitude of protocols to transport large data –  used as an interface for developing WSs able to stream data. –  Or as enabler for legacy web services to stretch their current potential by referencing data that would otherwise be delivered via SOAP Enabling web services to consume and produce large distributed datasets Spiros Koulouzis, Reginald Cushing, Konstantinos Karasavvas, Adam Belloum, Marian Bubak to be published JAN/FEB, IEEE Internet Computing, 2012
  • 78. • Clck to edit Master title style Indexing Name Entry Recognition •  AIDA provides a set of components which enable the indexing of text documents in various formats. •  AIDA's Indexer component, called IndexerWS is a WS able to index document with the use of the Streaming library.
  • 79. • Clck to edit Master title style Results Indexing Web Services for Information Retrieval (Indexing) Enabling web services to consume and produce large distributed datasets Spiros Koulouzis, Reginald Cushing, Konstantinos Karasavvas, Adam Belloum, Marian Bubak to be published JAN/FEB, IEEE Internet Computing, 2012
  • 80. • Clck to edit Master title style Results Indexing Web Services for Information Retrieval (NER) Enabling web services to consume and produce large distributed datasets Spiros Koulouzis, Reginald Cushing, Konstantinos Karasavvas, Adam Belloum, Marian Bubak to be published JAN/FEB, IEEE Internet Computing, 2012
  • 81. • Clck to edit Master title style References 1.  A.S.Z. Belloum, V. Korkhov, S. Koulouzis, M. A Inda, and M. Bubak Collaborative e-Science experiments: from scientific workflow to knowledge sharing JULY/AUGUST, IEEE Internet Computing, 2011 2.  Ilkay Altintas, Manish Kumar Anand, Daniel Crawl, Shawn Bowers, Adam Belloum, Paolo Missier, Bertram Ludascher, Carole A. Goble, Peter M.A. Sloot, Understanding Collaborative Studies Through Interoperable Workflow Provenance, IPAW2010, Troy, NY, USA 3.  A. Belloum, Z. Zhao, and M. Bubak Workflow systems and applications , Future Generation Comp. Syst. 25 (5): 525-527 (2009) 4.  Z. Zhao, A.S.Z. Belloum, et al., Distributed execution of aggregated multi domain workflows using an agent framework The 1st IEEE International Workshop on Scientific Workflows, Salt Lake City, U.SA, 2007 5.  Zhiming Zhao, Adam Belloum, Cees De Laat, Pieter Adriaans, Bob Hertzberger Using Jade agent framework to prototype an e-Science workflow bus Authors Cluster Computing and the Grid, 2007. CCGRID 2007
  • 82. • Clck to edit Master title style http://www.commit-nl.nl/ http://www.science.n/~gvlam/wsvlam/ http://www.vle.nl/
  • 83. • Clck to edit Master title style Outline •  Introduction •  Lifecycle of an e-science workflow •  Different approach to workflow scheduling –  Workflow Process Modeling & Management In Grid/Cloud –  Workflow and Web services (intrusive/non- intrusive) •  provenance (1) (2) Experiment Problem Prototyping investigation: Shared repositories (4) (3) Results Experiment Publication: Execution:
  • 84. • Clck to edit Master title style Provenance/ reproducibility •  “A complete provenance record for a data object allows the possibility to reproduce the result and reproducibility is a critical component of the scientific method” •  Provenance: The recording of metadata and provenance information during the various stages of the workflow lifecycle Workflows and e-Science: An overview of workflow system features and capabilities Ewa Deelmana, Dennis Gannonb, Matthew Shields c, Ian Taylor, Future Generation Computer Systems 25 (2009) 528540
  • 85. • Clck to edit Master title style History-tracing XML (FH Aachen) •  provides data/process <patternMatch> <events> provenance following an <PortResolved> provenance approach that data</PortResolved> <ConDone>provenance data –  maps the workflow graph </ConDone> to a layered structure of ... </events> an XML document. <fileReader2> –  This allows an intuitive <events> ... </events> <sign-fileReader2> ... and easy processable </signfileReader2> representation of the </fileReader2> workflow execution path, <sffToFasta> Reference –  which can be, eventually, </sffToFasta> electronically signed. <sign-patternMatch> ... </sign-patternMatch> </patternMatch> M. Gerards, Adam S. Z. Belloum, F. Berritz, V. Snder, S. Skorupa, A History-tracing XML-base Proveannce Framework for workflows, WORKS 2010, New Orleans, USA, November 2010
  • 86. • Clck to edit Master title style Plier (UvA/BigGrid) •  PLIER is an implementation of the OPM 1.1 specifications. •  It’s API provides a set of functions to build, store, and share workflow experiments as graphs. •  It also implements an optimal relational database as back-end storage that captures the concepts of the OPM model, using the Java Persistence API (JPA 2.0) and Hibernate. •  In addition, the PLIER API provides specific interfaces, using JDO 3.1, to transform, or serialize, the provenance data into specific formats (e.g. RDF, XML, and DOT). Id_36813958 tagMichel200 1_input_resul resultPatternM atch 80707.txt t_fasta Ribosomal_H uman.gz parameter Id 1587433265 input patternFile input_file input_file Input sffinfo Component.tar parameter sffTo input_file input_file Fasta Pattern blast TrigeredBy Match all TrigeredBy Completed Completed Completed output_file output_file output_file output_file • Id 81081428 Id_1587433265_outp Id_1587433265_out Id_368139581_outpu output sffOutput ut_result_fasta put_result_txt t_out_blast_tar
  • 87. • Clck to edit Master title style wave propagation model applications User Interface to compose workflow (top right), monitor the execution of the farmed workflows (top left), and monitor each run separately (bottom left) data [Biomedical engineering Cardiovascular biomechanics group TUE]) wave propagation model of blood flow in large vessels using an approximate velocity profile function: a biomedical study for which 3000 runs were required to perform a global sensitivity analysis of a blood pressure wave propagation in arteries Query interface for the provenance data collected from 3000 simulations of the “wave propagation model of blood flow in large BigGrid project 2009, presented EGI/ vessels using an approximate velocity profile BigGrid technical forum 2010 function”
  • 88. • Clck to edit Master title style Blast Application [Department of Clinical Epidemiology, Biostatistics and Bioinformatics (KEBB), AMC ] The aim of the application is the alignment of DNA sequence data with a given reference database. A workflow approach is currently followed to run this application on distributed computing resources. For Each workflow run • The provenance data is collected an stored following the XML-tracing system • User interface allows to reproduce events that occurred at runtime (replay mode) • User Interface can be customized (User can select the events to track) • User Interface show resource usage on-going work UvA-AMC-fh-aachen
  • 89. • Clck to edit Master title style Outline •  Objectives •  Problem statement and Challenges •  Research Track –  Track 1: Workflow Process Modeling & Management In Grid/Cloud –  Track 2: Workflow Sharing and Reproducibility Workflow Semantics and provenance –  Track 3: Management of scientific data Scalable Data access •  Applications
  • 90. • Clck to edit Master title style http://www.commit-nl.nl/ http://www.science.n/~gvlam/wsvlam/ http://www.vle.nl/