SlideShare ist ein Scribd-Unternehmen logo
1 von 22
Downloaden Sie, um offline zu lesen
, reloaded



           Paolo Missier, Stian Soiland-Reyes, Stuart Owen, Alex Nenadic,
                        Ian Dunlop, Alan Williams, Carole Goble
                               School of Computer Science
                               University of Manchester, UK

         Wei Tan
                                                                       Tom Oinn
Argonne National Laboratory
     Argonne, IL, USA                                                   EBI, UK


                                     SSDBM
                               Heidelberg, Germany
                               June 30 - July 2, 2010
Taverna: scientific workflow management
Goal: perform cancer diagnosis using microarray analysis
- learn a model for lymphoma type prediction based on
samples from different lymphoma types


source: caGrid




http://www/myexperiment.org/workflows/746                  2
Taverna: scientific workflow management
Goal: perform cancer diagnosis using microarray analysis
- learn a model for lymphoma type prediction based on
samples from different lymphoma types

                                          lymphoma samples
source: caGrid                            ➔ hybridization data




http://www/myexperiment.org/workflows/746                        2
Taverna: scientific workflow management
Goal: perform cancer diagnosis using microarray analysis
- learn a model for lymphoma type prediction based on
samples from different lymphoma types

                                          lymphoma samples
source: caGrid                            ➔ hybridization data




   process microarray
 data as training dataset




http://www/myexperiment.org/workflows/746                        2
Taverna: scientific workflow management
Goal: perform cancer diagnosis using microarray analysis
- learn a model for lymphoma type prediction based on
samples from different lymphoma types

                                          lymphoma samples
source: caGrid                            ➔ hybridization data




   process microarray
 data as training dataset
                                     learn
                              predictive model


http://www/myexperiment.org/workflows/746                        2
Taverna: scientific workflow management
Goal: perform cancer diagnosis using microarray analysis
- learn a model for lymphoma type prediction based on
samples from different lymphoma types

                                          lymphoma samples
source: caGrid                            ➔ hybridization data




http://www/myexperiment.org/workflows/746                        2
In this (short) talk...




• Taverna for data-intensive science
  – architectural goals... and how they are being achieved


• Performance figures on benchmark workflows




                                                             3
Target -ilities
• Scalability
  – size of input data / size of input collections

• Configurability
  – each processor in the workflow individually tunable to
    adjust to the underlying platform

• Extensibility
  – plugin architecture to accommodate specific service
    groups
     • caGrid, BioMoby

  – extensible set of operators that define the semantics of the
    model


                                                                   4
Opportunities for parallel processing
• intra-processor: implicit iteration over list data
• inter-processor: pipelining
               [ id1, id2, id3, ...]




                                                5
Opportunities for parallel processing
• intra-processor: implicit iteration over list data
• inter-processor: pipelining
                  [ id1, id2, id3, ...]


           SFH1        SFH2          SFH3

            ...          ...              ...



         getDS1       getDS2         getDS3




                                                5
Opportunities for parallel processing
• intra-processor: implicit iteration over list data
• inter-processor: pipelining
                   [ id1, id2, id3, ...]


           SFH1         SFH2          SFH3

            ...           ...              ...



         getDS1        getDS2         getDS3


                  [ DS1, DS2, DS3, ...]




                                                 5
Configurable processor behaviour


                                    allocates concurrent
                                                  threads to list elements
             Default dispatch stack




Request
                   Parallelise

                   Provenance
                                                 captures service
                 Error bounce                    invocation events
                    Failover




                                      Response
                     Retry                       persistent server error:
                                                 tries each available
                  Stop/pause
                                                 service implementation
                     Invoke                      in turn

                                                 transient errors:
          interacts with                         retries one service
          underlying activity                    invocation
          - service operation
          - shell script execution
Extensible processor semantics
          

            Default dispatch stack
Request



                  Parallelise

                  Provenance

                Error bounce

                   Failover




                                     Response
                    Retry

                 Stop/pause

                    Invoke




                                                                         7
Extensible processor semantics
          

            Default dispatch stack                            Default dispatch stack                        Extended dispatch stack




                                                   Request




                                                                                                  Request
Request



                  Parallelise
                                                Enhancing dataflow model                                           Parallelise
                                                with limited form of loops to
                                                                Parallelise
                  Provenance                                                                                      Error bounce
                                                model busy waiting
                                                              Error bounce
                Error bounce                    (asynchronous service                                           Loop/Branching
                                                interaction)     Failover
                   Failover                                                                                         Failover




                                     Response
                                                                      Retry




                                                                                       Response




                                                                                                                                      Response
                    Retry                                                                                            Retry
                                                                     Invoke
                 Stop/pause                                                                                         Invoke
                    Invoke
                                                                       (a)                                           (b)




                                                                                                                                 7
Extensible processor semantics
          

            Default dispatch stack                                                     Default dispatch stack                               Extended dispatch stack




                                                                   Request




                                                                                                                                  Request
Request



                  Parallelise
                                                         Enhancing dataflow model                                                                  Parallelise
                                                         with limited form of loops to
                                                                         Parallelise
                  Provenance                                                                                                                      Error bounce
                                                         model busy waiting
                                                                       Error bounce
                Error bounce                             (asynchronous service                                                                  Loop/Branching
                                                         interaction)     Failover
                   Failover                                                                                                                         Failover




                                          Response
                                                                                                      Retry




                                                                                                                       Response




                                                                                                                                                                      Response
                    Retry                                                                                                                            Retry
                                                                                                      Invoke
                 Stop/pause                                                                                                                         Invoke
                                                         ."4&04&             7';'3%
                    Invoke                                    ;(4)4&0;=;"2.'4

                                                        '&&'./304&5,%&              +"#,-               (a)                                          (b)
                                !"#$#%&'()(*+,-%$.*//$0*1

                                                          2*(34/*56#%.8&9
                                                                                              +"#,-           &670

          Busy waiting                                         !"#$)*                          80&$9:5$;0%(<&


          using an
                                                                                            ;0%(<&    '&&'./304&5,%&


          explicit                                              +"#,-

          workflow                                        ./0.12&'&(%

                                                     %&'&(%     '&&'./304&5,%&
          pattern                2*(34/*5678&.8&9
                                                                    +"#$%&'&(%

                                      !"#$%&'&(%                        )%$-"40

                                                                        ,%$-"40


                                                                             &0%&

                                                                        2(..0%%



                                                                                                                                                                 7
Extensible processor semantics
          

            Default dispatch stack                                                     Default dispatch stack                                   Extended dispatch stack




                                                                   Request




                                                                                                                                     Request
Request



                  Parallelise
                                                         Enhancing dataflow model                                                                           Parallelise
                                                         with limited form of loops to
                                                                         Parallelise
                  Provenance                                                                                                                              Error bounce
                                                         model busy waiting
                                                                       Error bounce
                Error bounce                             (asynchronous service                                                                          Loop/Branching
                                                         interaction)     Failover
                   Failover                                                                                                                                      Failover




                                          Response
                                                                                                      Retry




                                                                                                                          Response




                                                                                                                                                                                            Response
                    Retry                                                                                                                                          Retry
                                                                                                      Invoke
                 Stop/pause                                                                                                                                       Invoke
                                                         ."4&04&             7';'3%
                    Invoke                                    ;(4)4&0;=;"2.'4

                                                        '&&'./304&5,%&              +"#,-               (a)                                                         (b)
                                !"#$#%&'()(*+,-%$.*//$0*1

                                                          2*(34/*56#%.8&9
                                                                                              +"#,-           &670                             (3/360      4"7&)7&

          Busy waiting                                         !"#$)*                          80&$9:5$;0%(<&                                    /1787&)/9/":437

          using an
                                                                                            ;0%(<&    '&&'./304&5,%&                           3&&3456)7&.$0&    !"#$%


          explicit                                              +"#,-                                                                                                     !"#$%

          workflow                                        ./0.12&'&(%

                                                     %&'&(%     '&&'./304&5,%&
                                                                                                                       Busy waiting                                  45)4;:&3&10

                                                                                                                                                                3&&3456)7&.$0&     0&3&10
          pattern                                                                                                      using an
                                 2*(34/*5678&.8&9
                                                                    +"#$%&'&(%                                         loop                                 !"#$%           &'()

                                      !"#$%&'&(%                        )%$-"40

                                                                        ,%$-"40
                                                                                                                       processor                            *)&+,-.+/)012&

                                                                                                                                                         /)012&     3&&3456)7&.$0&


                                                                             &0%&

                                                                        2(..0%%



                                                                                                                                                                                      7
Performance experimental setup
• previous version of Taverna engine used as baseline
• objective: to measure incremental improvement

                                 list generator
                                                      Parameters:
                                                      - byte size of list elements (strings)
                                                      - size of input list
                                                      - length of linear chain
   multiple parallel pipelines




                                          main insight: when the workflow is designed for
                                          pipelining, parallelism is exploited effectively




                                                                                               8
I - Memory usage

                                      shorter execution
                                      time due to pipelining
              T2 main memory
              data management


            T2 main embedded Derby back-end



                        T1 baseline




list size: 1,000 strings of 10K chars each
no intra-processor parallelism (1 thread/processor)


                                                                   9
I - Memory usage

                                      shorter execution
                                      time due to pipelining
              T2 main memory
              data management


            T2 main embedded Derby back-end



                        T1 baseline




list size: 1,000 strings of 10K chars each
no intra-processor parallelism (1 thread/processor)


                                                                   9
Available processors pool




pipelining in T2 makes up for smaller pools of threads/processor




                                                                   10
Bounded main memory usage
Separation of data and process spaces ensures scalable data management




                  varying data element size:
                  10K, 25K, 100K chars




                                                                         11
Summary
• Taverna 2 re-engineered for scalability on data-
  intensive pipelined dataflows

• separation of data and process spaces
• generic stack pattern for principled extensions
  – limited while loop, if-then-else
  – provenance capture
  – ... open plugin architecture accommodates further
    extensions



      For further details and a nice chat:
               please visit our poster!!


                                                           12

Weitere ähnliche Inhalte

Ähnlich wie Paper presentation: Taverna, reloaded

Naukri Search Team achievements, 2009-2010
Naukri Search Team achievements, 2009-2010Naukri Search Team achievements, 2009-2010
Naukri Search Team achievements, 2009-2010
Aditya Varun Chadha
 
Web Sphere Problem Determination Ext
Web Sphere Problem Determination ExtWeb Sphere Problem Determination Ext
Web Sphere Problem Determination Ext
Rohit Kelapure
 
Rails Application Optimization Techniques & Tools
Rails Application Optimization Techniques & ToolsRails Application Optimization Techniques & Tools
Rails Application Optimization Techniques & Tools
guest05c09d
 

Ähnlich wie Paper presentation: Taverna, reloaded (20)

Leveraging The Open Provenance Model as a Multi-Tier Model for Global Climate...
Leveraging The Open Provenance Model as a Multi-Tier Model for Global Climate...Leveraging The Open Provenance Model as a Multi-Tier Model for Global Climate...
Leveraging The Open Provenance Model as a Multi-Tier Model for Global Climate...
 
Naukri Search Team achievements, 2009-2010
Naukri Search Team achievements, 2009-2010Naukri Search Team achievements, 2009-2010
Naukri Search Team achievements, 2009-2010
 
All Aboard the Databus
All Aboard the DatabusAll Aboard the Databus
All Aboard the Databus
 
2013-01-17 Research Object
2013-01-17 Research Object2013-01-17 Research Object
2013-01-17 Research Object
 
[RAT資安小聚] Study on Automatically Evading Malware Detection
[RAT資安小聚] Study on Automatically Evading Malware Detection[RAT資安小聚] Study on Automatically Evading Malware Detection
[RAT資安小聚] Study on Automatically Evading Malware Detection
 
20111104 s4 overview
20111104 s4 overview20111104 s4 overview
20111104 s4 overview
 
Building Distributed Systems in Scala
Building Distributed Systems in ScalaBuilding Distributed Systems in Scala
Building Distributed Systems in Scala
 
Databus: LinkedIn's Change Data Capture Pipeline SOCC 2012
Databus: LinkedIn's Change Data Capture Pipeline SOCC 2012Databus: LinkedIn's Change Data Capture Pipeline SOCC 2012
Databus: LinkedIn's Change Data Capture Pipeline SOCC 2012
 
Wolstencroft K - Workflows on the Cloud: scaling for national service
Wolstencroft K - Workflows on the Cloud: scaling for national serviceWolstencroft K - Workflows on the Cloud: scaling for national service
Wolstencroft K - Workflows on the Cloud: scaling for national service
 
"Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications""Data Provenance: Principles and Why it matters for BioMedical Applications"
"Data Provenance: Principles and Why it matters for BioMedical Applications"
 
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and App...
 
Web Sphere Problem Determination Ext
Web Sphere Problem Determination ExtWeb Sphere Problem Determination Ext
Web Sphere Problem Determination Ext
 
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
A Tovchigrechko - MGTAXA: a toolkit and webserver for predicting taxonomy of ...
 
#lspe: Dynamic Scaling
#lspe: Dynamic Scaling #lspe: Dynamic Scaling
#lspe: Dynamic Scaling
 
Rails Application Optimization Techniques & Tools
Rails Application Optimization Techniques & ToolsRails Application Optimization Techniques & Tools
Rails Application Optimization Techniques & Tools
 
Genomics Is Not Special: Towards Data Intensive Biology
Genomics Is Not Special: Towards Data Intensive BiologyGenomics Is Not Special: Towards Data Intensive Biology
Genomics Is Not Special: Towards Data Intensive Biology
 
Testing concurrent java programs - Sameer Arora
Testing concurrent java programs - Sameer AroraTesting concurrent java programs - Sameer Arora
Testing concurrent java programs - Sameer Arora
 
C4Bio paper talk
C4Bio paper talkC4Bio paper talk
C4Bio paper talk
 
Service Mesh vs. Frameworks: Where to put the resilience?
Service Mesh vs. Frameworks: Where to put the resilience?Service Mesh vs. Frameworks: Where to put the resilience?
Service Mesh vs. Frameworks: Where to put the resilience?
 
Service Mesh vs. Frameworks: Where to put the resilience?
Service Mesh vs. Frameworks: Where to put the resilience?Service Mesh vs. Frameworks: Where to put the resilience?
Service Mesh vs. Frameworks: Where to put the resilience?
 

Mehr von Paolo Missier

Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
Paolo Missier
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Tracking trajectories of  multiple long-term conditions using dynamic patient...Tracking trajectories of  multiple long-term conditions using dynamic patient...
Tracking trajectories of multiple long-term conditions using dynamic patient...
Paolo Missier
 

Mehr von Paolo Missier (20)

Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance records
 
Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
 
Realising the potential of Health Data Science: opportunities and challenges ...
Realising the potential of Health Data Science:opportunities and challenges ...Realising the potential of Health Data Science:opportunities and challenges ...
Realising the potential of Health Data Science: opportunities and challenges ...
 
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
 
A Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewA Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overview
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Tracking trajectories of  multiple long-term conditions using dynamic patient...Tracking trajectories of  multiple long-term conditions using dynamic patient...
Tracking trajectories of multiple long-term conditions using dynamic patient...
 
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcare
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcare
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data Science
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
 
Data Science for (Health) Science: tales from a challenging front line, and h...
Data Science for (Health) Science:tales from a challenging front line, and h...Data Science for (Health) Science:tales from a challenging front line, and h...
Data Science for (Health) Science: tales from a challenging front line, and h...
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff University
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 

Paper presentation: Taverna, reloaded

  • 1. , reloaded Paolo Missier, Stian Soiland-Reyes, Stuart Owen, Alex Nenadic, Ian Dunlop, Alan Williams, Carole Goble School of Computer Science University of Manchester, UK Wei Tan Tom Oinn Argonne National Laboratory Argonne, IL, USA EBI, UK SSDBM Heidelberg, Germany June 30 - July 2, 2010
  • 2. Taverna: scientific workflow management Goal: perform cancer diagnosis using microarray analysis - learn a model for lymphoma type prediction based on samples from different lymphoma types source: caGrid http://www/myexperiment.org/workflows/746 2
  • 3. Taverna: scientific workflow management Goal: perform cancer diagnosis using microarray analysis - learn a model for lymphoma type prediction based on samples from different lymphoma types lymphoma samples source: caGrid ➔ hybridization data http://www/myexperiment.org/workflows/746 2
  • 4. Taverna: scientific workflow management Goal: perform cancer diagnosis using microarray analysis - learn a model for lymphoma type prediction based on samples from different lymphoma types lymphoma samples source: caGrid ➔ hybridization data process microarray data as training dataset http://www/myexperiment.org/workflows/746 2
  • 5. Taverna: scientific workflow management Goal: perform cancer diagnosis using microarray analysis - learn a model for lymphoma type prediction based on samples from different lymphoma types lymphoma samples source: caGrid ➔ hybridization data process microarray data as training dataset learn predictive model http://www/myexperiment.org/workflows/746 2
  • 6. Taverna: scientific workflow management Goal: perform cancer diagnosis using microarray analysis - learn a model for lymphoma type prediction based on samples from different lymphoma types lymphoma samples source: caGrid ➔ hybridization data http://www/myexperiment.org/workflows/746 2
  • 7. In this (short) talk... • Taverna for data-intensive science – architectural goals... and how they are being achieved • Performance figures on benchmark workflows 3
  • 8. Target -ilities • Scalability – size of input data / size of input collections • Configurability – each processor in the workflow individually tunable to adjust to the underlying platform • Extensibility – plugin architecture to accommodate specific service groups • caGrid, BioMoby – extensible set of operators that define the semantics of the model 4
  • 9. Opportunities for parallel processing • intra-processor: implicit iteration over list data • inter-processor: pipelining [ id1, id2, id3, ...] 5
  • 10. Opportunities for parallel processing • intra-processor: implicit iteration over list data • inter-processor: pipelining [ id1, id2, id3, ...] SFH1 SFH2 SFH3 ... ... ... getDS1 getDS2 getDS3 5
  • 11. Opportunities for parallel processing • intra-processor: implicit iteration over list data • inter-processor: pipelining [ id1, id2, id3, ...] SFH1 SFH2 SFH3 ... ... ... getDS1 getDS2 getDS3 [ DS1, DS2, DS3, ...] 5
  • 12. Configurable processor behaviour  allocates concurrent threads to list elements Default dispatch stack Request Parallelise Provenance captures service Error bounce invocation events Failover Response Retry persistent server error: tries each available Stop/pause service implementation Invoke in turn transient errors: interacts with retries one service underlying activity invocation - service operation - shell script execution
  • 13. Extensible processor semantics  Default dispatch stack Request Parallelise Provenance Error bounce Failover Response Retry Stop/pause Invoke 7
  • 14. Extensible processor semantics  Default dispatch stack Default dispatch stack Extended dispatch stack Request Request Request Parallelise Enhancing dataflow model Parallelise with limited form of loops to Parallelise Provenance Error bounce model busy waiting Error bounce Error bounce (asynchronous service Loop/Branching interaction) Failover Failover Failover Response Retry Response Response Retry Retry Invoke Stop/pause Invoke Invoke (a) (b) 7
  • 15. Extensible processor semantics  Default dispatch stack Default dispatch stack Extended dispatch stack Request Request Request Parallelise Enhancing dataflow model Parallelise with limited form of loops to Parallelise Provenance Error bounce model busy waiting Error bounce Error bounce (asynchronous service Loop/Branching interaction) Failover Failover Failover Response Retry Response Response Retry Retry Invoke Stop/pause Invoke ."4&04& 7';'3% Invoke ;(4)4&0;=;"2.'4 '&&'./304&5,%& +"#,- (a) (b) !"#$#%&'()(*+,-%$.*//$0*1 2*(34/*56#%.8&9 +"#,- &670 Busy waiting !"#$)* 80&$9:5$;0%(<& using an ;0%(<& '&&'./304&5,%& explicit +"#,- workflow ./0.12&'&(% %&'&(% '&&'./304&5,%& pattern 2*(34/*5678&.8&9 +"#$%&'&(% !"#$%&'&(% )%$-"40 ,%$-"40 &0%& 2(..0%% 7
  • 16. Extensible processor semantics  Default dispatch stack Default dispatch stack Extended dispatch stack Request Request Request Parallelise Enhancing dataflow model Parallelise with limited form of loops to Parallelise Provenance Error bounce model busy waiting Error bounce Error bounce (asynchronous service Loop/Branching interaction) Failover Failover Failover Response Retry Response Response Retry Retry Invoke Stop/pause Invoke ."4&04& 7';'3% Invoke ;(4)4&0;=;"2.'4 '&&'./304&5,%& +"#,- (a) (b) !"#$#%&'()(*+,-%$.*//$0*1 2*(34/*56#%.8&9 +"#,- &670 (3/360 4"7&)7& Busy waiting !"#$)* 80&$9:5$;0%(<& /1787&)/9/":437 using an ;0%(<& '&&'./304&5,%& 3&&3456)7&.$0& !"#$% explicit +"#,- !"#$% workflow ./0.12&'&(% %&'&(% '&&'./304&5,%& Busy waiting 45)4;:&3&10 3&&3456)7&.$0& 0&3&10 pattern using an 2*(34/*5678&.8&9 +"#$%&'&(% loop !"#$% &'() !"#$%&'&(% )%$-"40 ,%$-"40 processor *)&+,-.+/)012& /)012& 3&&3456)7&.$0& &0%& 2(..0%% 7
  • 17. Performance experimental setup • previous version of Taverna engine used as baseline • objective: to measure incremental improvement list generator Parameters: - byte size of list elements (strings) - size of input list - length of linear chain multiple parallel pipelines main insight: when the workflow is designed for pipelining, parallelism is exploited effectively 8
  • 18. I - Memory usage shorter execution time due to pipelining T2 main memory data management T2 main embedded Derby back-end T1 baseline list size: 1,000 strings of 10K chars each no intra-processor parallelism (1 thread/processor) 9
  • 19. I - Memory usage shorter execution time due to pipelining T2 main memory data management T2 main embedded Derby back-end T1 baseline list size: 1,000 strings of 10K chars each no intra-processor parallelism (1 thread/processor) 9
  • 20. Available processors pool pipelining in T2 makes up for smaller pools of threads/processor 10
  • 21. Bounded main memory usage Separation of data and process spaces ensures scalable data management varying data element size: 10K, 25K, 100K chars 11
  • 22. Summary • Taverna 2 re-engineered for scalability on data- intensive pipelined dataflows • separation of data and process spaces • generic stack pattern for principled extensions – limited while loop, if-then-else – provenance capture – ... open plugin architecture accommodates further extensions For further details and a nice chat: please visit our poster!! 12