Provenance Annotation and Analysis to Support Process Re-Computation
Jacek Cała, Paolo Missier
School of Computing
Newcastle University, UK
Problem Outline
• Consider process P, e.g. the following NGS pipeline [5]:
[5] Cała, J., Marei, E., Xu, Y., Takeda, K., Missier, P.: Scalable and efficient whole-exome data processing using workflows on the cloud. Future Generation Computer Systems (Jan 2016).
Problem Outline
• P is only rarely a static entity.
• Usually, a variety of elements in P change:
• data dependencies,
• software tools & dependencies,
• [out of scope] the structure of P.
• Changes in the elements of P
=> the need to update past P outcomes
=> the need for re-computation.
The Re-Computation Framework
• To control the re-computation of processes
• proposed earlier in [6].
• The core of the framework is the re-computation loop:
[6] Cała, J., Missier, P.: Selective and recurring re-computation of Big Data analytics tasks: insights from a Genomics case study. Big Data Research (2018); in press.
Re-Computation Process
• Here we consider a single pass of the loop:
• And focus on the first step only (S1).
Preliminaries
• The ProvONE model: prospective + retrospective provenance [7].
• Set of software and data dependencies: D = {a0, b0, …}
• Process, input and execution configuration: E(P, x, V)
• Version change event: C = {a_n → a_(n-1)}
• Composite version change event: C = {a_n → a_(n-1), b_m → b_(m-1), …}
• Change front.
• Re-computation front.
• Restart tree.
[7] Cuevas-Vicenttín, V., Ludäscher, B., Missier, P., et al.: ProvONE: A PROV Extension Data Model for Scientific Workflow Provenance (2016).
Change Front
• The accumulation of change events over a specified time window.
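[Figure: timeline t of change events, change fronts and executions]
Change events:
• C0 = {a1 → a0}
• C1 = {b1 → b0}
• C2 = {a2 → a1, c1 → c0}
• C3 = {a3 → a2, c2 → c1}
• C4 = {d1 → d0}
• C5 = {b2 → b1}
Change fronts:
• CF3 = {a3, b1, c2}
• CF5 = {a3, b2, c2, d1}
Executions:
• E(…, [a0, b0, e0])
• E(…, [a0, b1, d0])
• E(…, [a2, b1, c1])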
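The accumulation shown on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' Prolog implementation: it assumes that versions are plain integers, that each change event maps an element to a (new, old) version pair, and that the time window is selected by slicing the event list. The names `ChangeEvent` and `change_front` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: a composite change event maps element -> (new_version, old_version).
ChangeEvent = Dict[str, Tuple[int, int]]


def change_front(events: List[ChangeEvent]) -> Dict[str, int]:
    """Accumulate change events (in time order) into the latest version per element."""
    front: Dict[str, int] = {}
    for event in events:
        for element, (new, _old) in event.items():
            # Keep the highest version seen so far for each element.
            front[element] = max(new, front.get(element, new))
    return front


# The six change events from the slide, C0..C5.
events: List[ChangeEvent] = [
    {"a": (1, 0)},               # C0
    {"b": (1, 0)},               # C1
    {"a": (2, 1), "c": (1, 0)},  # C2
    {"a": (3, 2), "c": (2, 1)},  # C3
    {"d": (1, 0)},               # C4
    {"b": (2, 1)},               # C5
]

cf3 = change_front(events[:4])  # window C0..C3
cf5 = change_front(events)      # window C0..C5
print(cf3)
print(cf5)
```

With the slide's events, the two windows reproduce CF3 = {a3, b1, c2} and CF5 = {a3, b2, c2, d1}.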
Re-computation Front
• Over time the population of executions grows
• Some of them may result from re-executions
• Some of them may be user-initiated
• may use historical versions of elements
• Looking for the transitive closure of the elements’ derivation is too broad.
=> find out which of the past executions really need an update.
Re-computation Front
We use:
wasInformedBy(..., [prov:type="recomp:re-execution"])
to denote a ReComp-initiated re-execution.
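One simplified reading of the re-computation front can be sketched as follows. The sketch assumes that an execution belongs to the front when it used at least one element older than the change front and no ReComp-initiated re-execution was informed by it. The `Execution` record, its field names and the sample history are hypothetical and are not taken from the ReComp code base.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Execution:
    id: str
    used: Dict[str, int]                   # element -> version used
    # Target of wasInformedBy(..., [prov:type="recomp:re-execution"]), if any.
    re_execution_of: Optional[str] = None


def recomputation_front(executions: List[Execution], cf: Dict[str, int]) -> List[str]:
    """Return ids of stale executions that have not been re-executed yet."""
    # Executions that some re-execution was informed by are already refreshed.
    superseded = {e.re_execution_of for e in executions if e.re_execution_of}
    front = []
    for e in executions:
        stale = any(el in cf and v < cf[el] for el, v in e.used.items())
        if stale and e.id not in superseded:
            front.append(e.id)
    return front


history = [
    Execution("E0", {"a": 0, "b": 0}),
    Execution("E1", {"a": 1, "b": 0}, re_execution_of="E0"),  # ReComp-initiated
    Execution("E2", {"a": 0, "b": 1}),                        # user-initiated, historical versions
    Execution("E3", {"a": 3, "b": 1}),                        # already up to date
]
front = recomputation_front(history, {"a": 3, "b": 1})
print(front)
```

Here E0 is skipped because E1 already refreshed it, while the user-initiated E2 stays on the front because nothing links it to a later re-execution.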
Restart Tree
• Re-computation front handles single executions well.
• What if the process is more complex than that?
• pipeline, workflow, complex hierarchical workflow… cf. the NGS pipeline.
• The provenance trace includes multiple interrelated executions.
• During re-execution we have to combine all of them within a single context: the top-level execution.
Restart Tree
• To build a restart tree we rely on the provone:wasPartOf statements.
CF = {b2, e1}
Restart Tree
• Captures the vertical dimension of a single execution
• the transitive closure of the wasPartOf relation.
RT ≝ {Execution, [DataChange], [Children]}
RT = {E0, [], [{SE0, [], [{SSE1, [⟨b2 → b0⟩], []},{SSE3, [⟨e1 → e0⟩], []}]}, {SE1, …}, …]}
CF = {b2, e1}
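A restart tree of this shape can be built by walking the wasPartOf statements downwards from a top-level execution and keeping only the branches that contain a change. The sketch below is a minimal illustration under stated assumptions: wasPartOf statements are given as (child, parent) pairs and form an acyclic hierarchy, versions are integers, and the sample data (including the sibling `SE_other`) is hypothetical and only loosely modelled on the slide's example.

```python
from typing import Dict, List, Optional, Tuple

# A node mirrors RT = {Execution, [DataChange], [Children]};
# a data change is written (element, new_version, old_version).
Change = Tuple[str, int, int]
Tree = Tuple[str, List[Change], list]


def restart_tree(
    execution: str,
    was_part_of: List[Tuple[str, str]],
    used: Dict[str, Dict[str, int]],
    cf: Dict[str, int],
) -> Optional[Tree]:
    """Build the restart tree rooted at `execution`, pruning unaffected branches."""
    changes = [
        (el, cf[el], v)
        for el, v in sorted(used.get(execution, {}).items())
        if el in cf and v < cf[el]
    ]
    children = []
    for child, parent in was_part_of:
        if parent == execution:
            sub = restart_tree(child, was_part_of, used, cf)
            if sub is not None:
                children.append(sub)
    if not changes and not children:
        return None  # nothing below this execution is affected
    return (execution, changes, children)


# Hypothetical sample data: (child, parent) pairs and directly used versions.
was_part_of = [
    ("SE0", "E0"), ("SE_other", "E0"),
    ("SSE1", "SE0"), ("SSE2", "SE0"), ("SSE3", "SE0"),
]
used = {"SSE1": {"b": 0}, "SSE2": {"a": 3}, "SSE3": {"e": 0}, "SE_other": {"a": 3}}
cf = {"b": 2, "e": 1}

rt = restart_tree("E0", was_part_of, used, cf)
print(rt)
```

Branches without any affected descendant are dropped, so the tree keeps only the parts of the top-level execution that relate to the change front.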
The algorithm
• Combines all three aspects:
• the change front,
• the re-computation front and
• the restart tree.
• For a given change front,
=> produces the re-computation front that
=> includes a set of restart trees,
=> each refers to a single top-level execution with only the parts related to the change(s).
• Enables ReComp to identify the minimal set of executions that may be affected by the change(s).
• The remaining executions are either entirely unaffected or were refreshed previously.
Re-Computation Process
• Enables difference and impact analysis of the executions on the front and their partial re-execution.
Difference and Impact Analysis
[Figure: program hierarchy linked by <<hasSubProgram>> relations]
Conclusions
• We address the problem of the re-computation of:
• complex hierarchical processes,
• run over a cohort of input data samples,
• with multiple points of change,
• in an open system, where users may initiate (re-)executions at any time.
• The solution starts from the changes observed:
• In contrast to previous work, e.g. smart re-run and workflow caching.
• We proposed a simple algorithm to find the re-computation front:
• written in Prolog,
• very efficient (response times on the order of 1–100 ms),
• available on GitHub.
• The algorithm is the initial step in further scope identification and execution optimisation.
Thank you!
http://www.recomp.org.uk

More Related Content

What's hot

Aggregation is not Replication
Aggregation is not ReplicationAggregation is not Replication
Aggregation is not ReplicationCarlos Baquero
 
Enhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsEnhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsJen Aman
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clusteringSubhas Kumar Ghosh
 
An Efficient Pipelined VLSI Architecture for Lifting-Based 2D-Discrete Wavele...
An Efficient Pipelined VLSI Architecture for Lifting-Based 2D-Discrete Wavele...An Efficient Pipelined VLSI Architecture for Lifting-Based 2D-Discrete Wavele...
An Efficient Pipelined VLSI Architecture for Lifting-Based 2D-Discrete Wavele...Rahul Jain
 
Dynamic memory allocation in c
Dynamic memory allocation in cDynamic memory allocation in c
Dynamic memory allocation in clavanya marichamy
 
Sas program for production scheduling in machine shop
Sas program for production scheduling in machine shopSas program for production scheduling in machine shop
Sas program for production scheduling in machine shopThiru Navukkarasu
 
Meteo I/O Introduction
Meteo I/O IntroductionMeteo I/O Introduction
Meteo I/O IntroductionRiccardo Rigon
 
Formalising paging in memory management
Formalising paging in memory managementFormalising paging in memory management
Formalising paging in memory managementGokul Vasan
 
Data flow vs. procedural programming: How to put your algorithms into Flink
Data flow vs. procedural programming: How to put your algorithms into FlinkData flow vs. procedural programming: How to put your algorithms into Flink
Data flow vs. procedural programming: How to put your algorithms into FlinkMikio L. Braun
 
JGrass-NewAge rain-snow separation
JGrass-NewAge rain-snow separationJGrass-NewAge rain-snow separation
JGrass-NewAge rain-snow separationMarialaura Bancheri
 
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit
 
CS 151 Graphing lecture
CS 151 Graphing lectureCS 151 Graphing lecture
CS 151 Graphing lectureRudy Martinez
 

What's hot (17)

Aggregation is not Replication
Aggregation is not ReplicationAggregation is not Replication
Aggregation is not Replication
 
Dma
DmaDma
Dma
 
Enhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsEnhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable Statistics
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
 
Discrete Math Lab Cheminformatics Joint Project
Discrete Math Lab Cheminformatics Joint ProjectDiscrete Math Lab Cheminformatics Joint Project
Discrete Math Lab Cheminformatics Joint Project
 
An Efficient Pipelined VLSI Architecture for Lifting-Based 2D-Discrete Wavele...
An Efficient Pipelined VLSI Architecture for Lifting-Based 2D-Discrete Wavele...An Efficient Pipelined VLSI Architecture for Lifting-Based 2D-Discrete Wavele...
An Efficient Pipelined VLSI Architecture for Lifting-Based 2D-Discrete Wavele...
 
Dynamic memory allocation in c
Dynamic memory allocation in cDynamic memory allocation in c
Dynamic memory allocation in c
 
Sas program for production scheduling in machine shop
Sas program for production scheduling in machine shopSas program for production scheduling in machine shop
Sas program for production scheduling in machine shop
 
JGrass-NewAge ET component
 JGrass-NewAge ET component JGrass-NewAge ET component
JGrass-NewAge ET component
 
Stack Data structure
Stack Data structureStack Data structure
Stack Data structure
 
Meteo I/O Introduction
Meteo I/O IntroductionMeteo I/O Introduction
Meteo I/O Introduction
 
Formalising paging in memory management
Formalising paging in memory managementFormalising paging in memory management
Formalising paging in memory management
 
Project management
Project managementProject management
Project management
 
Data flow vs. procedural programming: How to put your algorithms into Flink
Data flow vs. procedural programming: How to put your algorithms into FlinkData flow vs. procedural programming: How to put your algorithms into Flink
Data flow vs. procedural programming: How to put your algorithms into Flink
 
JGrass-NewAge rain-snow separation
JGrass-NewAge rain-snow separationJGrass-NewAge rain-snow separation
JGrass-NewAge rain-snow separation
 
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
 
CS 151 Graphing lecture
CS 151 Graphing lectureCS 151 Graphing lecture
CS 151 Graphing lecture
 

Similar to Provenance Annotation and Analysis to Support Process Re-Computation

Workflow Allocations and Scheduling on IaaS Platforms, from Theory to Practice
Workflow Allocations and Scheduling on IaaS Platforms, from Theory to PracticeWorkflow Allocations and Scheduling on IaaS Platforms, from Theory to Practice
Workflow Allocations and Scheduling on IaaS Platforms, from Theory to PracticeFrederic Desprez
 
Workshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen TatarynovWorkshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen TatarynovFwdays
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationGeoffrey Fox
 
ALGORITHM-ANALYSIS.ppt
ALGORITHM-ANALYSIS.pptALGORITHM-ANALYSIS.ppt
ALGORITHM-ANALYSIS.pptsapnaverma97
 
Introduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptxIntroduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptxPJS KUMAR
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersJen Aman
 
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Pooyan Jamshidi
 
Log Analytics in Datacenter with Apache Spark and Machine Learning
Log Analytics in Datacenter with Apache Spark and Machine LearningLog Analytics in Datacenter with Apache Spark and Machine Learning
Log Analytics in Datacenter with Apache Spark and Machine LearningPiotr Tylenda
 
Log Analytics in Datacenter with Apache Spark and Machine Learning
Log Analytics in Datacenter with Apache Spark and Machine LearningLog Analytics in Datacenter with Apache Spark and Machine Learning
Log Analytics in Datacenter with Apache Spark and Machine LearningAgnieszka Potulska
 
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニックUnity Technologies Japan K.K.
 
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Databricks
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Databricks
 
Monte Carlo Simulation for project estimates v1.0
Monte Carlo Simulation for project estimates v1.0Monte Carlo Simulation for project estimates v1.0
Monte Carlo Simulation for project estimates v1.0PMILebanonChapter
 
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...SSA KPI
 
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...Accumulo Summit
 
C++ Notes PPT.ppt
C++ Notes PPT.pptC++ Notes PPT.ppt
C++ Notes PPT.pptAlpha474815
 

Similar to Provenance Annotation and Analysis to Support Process Re-Computation (20)

Workflow Allocations and Scheduling on IaaS Platforms, from Theory to Practice
Workflow Allocations and Scheduling on IaaS Platforms, from Theory to PracticeWorkflow Allocations and Scheduling on IaaS Platforms, from Theory to Practice
Workflow Allocations and Scheduling on IaaS Platforms, from Theory to Practice
 
Workshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen TatarynovWorkshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
ALGORITHM-ANALYSIS.ppt
ALGORITHM-ANALYSIS.pptALGORITHM-ANALYSIS.ppt
ALGORITHM-ANALYSIS.ppt
 
Searching Algorithms
Searching AlgorithmsSearching Algorithms
Searching Algorithms
 
Introduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptxIntroduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptx
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
 
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
 
Log Analytics in Datacenter with Apache Spark and Machine Learning
Log Analytics in Datacenter with Apache Spark and Machine LearningLog Analytics in Datacenter with Apache Spark and Machine Learning
Log Analytics in Datacenter with Apache Spark and Machine Learning
 
Log Analytics in Datacenter with Apache Spark and Machine Learning
Log Analytics in Datacenter with Apache Spark and Machine LearningLog Analytics in Datacenter with Apache Spark and Machine Learning
Log Analytics in Datacenter with Apache Spark and Machine Learning
 
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
 
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
 
RECURSION.pptx
RECURSION.pptxRECURSION.pptx
RECURSION.pptx
 
Compiler Design Unit 4
Compiler Design Unit 4Compiler Design Unit 4
Compiler Design Unit 4
 
CS-323 DAA.pdf
CS-323 DAA.pdfCS-323 DAA.pdf
CS-323 DAA.pdf
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
 
Monte Carlo Simulation for project estimates v1.0
Monte Carlo Simulation for project estimates v1.0Monte Carlo Simulation for project estimates v1.0
Monte Carlo Simulation for project estimates v1.0
 
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...
 
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
 
C++ Notes PPT.ppt
C++ Notes PPT.pptC++ Notes PPT.ppt
C++ Notes PPT.ppt
 

More from Paolo Missier

Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsPaolo Missier
 
Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Paolo Missier
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...Paolo Missier
 
Realising the potential of Health Data Science: opportunities and challenges ...
Realising the potential of Health Data Science:opportunities and challenges ...Realising the potential of Health Data Science:opportunities and challenges ...
Realising the potential of Health Data Science: opportunities and challenges ...Paolo Missier
 
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Paolo Missier
 
A Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewA Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewPaolo Missier
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Paolo Missier
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Tracking trajectories of  multiple long-term conditions using dynamic patient...Tracking trajectories of  multiple long-term conditions using dynamic patient...
Tracking trajectories of multiple long-term conditions using dynamic patient...Paolo Missier
 
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Paolo Missier
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcarePaolo Missier
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcarePaolo Missier
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data SciencePaolo Missier
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Paolo Missier
 
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...Paolo Missier
 
Data Science for (Health) Science: tales from a challenging front line, and h...
Data Science for (Health) Science:tales from a challenging front line, and h...Data Science for (Health) Science:tales from a challenging front line, and h...
Data Science for (Health) Science: tales from a challenging front line, and h...Paolo Missier
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...Paolo Missier
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...Paolo Missier
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff UniversityPaolo Missier
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Paolo Missier
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...Paolo Missier
 

More from Paolo Missier (20)

Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance records
 
Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
 
Realising the potential of Health Data Science: opportunities and challenges ...
Realising the potential of Health Data Science:opportunities and challenges ...Realising the potential of Health Data Science:opportunities and challenges ...
Realising the potential of Health Data Science: opportunities and challenges ...
 
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
 
A Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewA Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overview
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Tracking trajectories of  multiple long-term conditions using dynamic patient...Tracking trajectories of  multiple long-term conditions using dynamic patient...
Tracking trajectories of multiple long-term conditions using dynamic patient...
 
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcare
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcare
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data Science
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
 
Data Science for (Health) Science: tales from a challenging front line, and h...
Data Science for (Health) Science:tales from a challenging front line, and h...Data Science for (Health) Science:tales from a challenging front line, and h...
Data Science for (Health) Science: tales from a challenging front line, and h...
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff University
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
 



Provenance Annotation and Analysis to Support Process Re-Computation

  • 1. Provenance Annotation and Analysis to Support Process Re-Computation Jacek Cała, Paolo Missier School of Computing Newcastle University, UK
  • 2. Problem Outline • Consider process P, e.g. the following NGS pipeline [5]: [5] Cała, J., Marei, E., Xu, Y., Takeda, K., Missier, P.: Scalable and efficient whole-exome data processing using workflows on the cloud. Future Generation Computer Systems (Jan 2016).
  • 3. Problem Outline • P is only rarely a static entity. • Usually, a variety of elements in P change: • data dependencies, • software tools & dependencies, • [out of scope] the structure of P. • Changes in the elements of P => the need to update past P outcomes => the need for re-computation.
  • 4. The Re-Computation Framework • To control the re-computation of processes • proposed earlier in [6]. • The core of the framework is the re-computation loop: [6] Cała, J., Missier, P.: Selective and recurring re-computation of Big Data analytics tasks: insights from a Genomics case study. Big Data Research (2018); in press.
  • 5. Re-Computation Process • Here we consider a single pass of the loop: • And focus on the first step only (S1).
  • 6. Preliminaries • The ProvONE model: prospective + retrospective provenance [7]. • Set of software and data dependencies: D = {a0, b0, …} • Process, input and execution configuration: E(P, x, V) • Version change event: C = {a_n → a_{n-1}} • Composite version change event: C = {a_n → a_{n-1}, b_m → b_{m-1}, …} • Change front. • Re-computation front. • Restart tree. [7] Cuevas-Vicenttín, V., Ludäscher, B., Missier, P., et al.: ProvONE: A PROV Extension Data Model for Scientific Workflow Provenance (2016).
  • 7. Change Front • The accumulation of change events over a specified time window. [Figure: timeline t with change events C0 {a1 → a0}, C1 {b1 → b0}, C2 {a2 → a1, c1 → c0}, C3 {a3 → a2, c2 → c1}, C4 {d1 → d0}, C5 {b2 → b1}; change fronts CF3 {a3, b1, c2} and CF5 {a3, b2, c2, d1}; executions E(…, [a0, b0, e0]), E(…, [a0, b1, d0]), E(…, [a2, b1, c1]).]
  • 8. Re-computation Front • Over time the population of executions grows • Some of them may result from re-executions • Some of them may be user-initiated • may use historical versions of elements • Looking for the transitive closure of the elements’ derivation is too broad. => find out which of the past executions really need an update.
  • 9. Re-computation Front We use: wasInformedBy(..., [prov:type="recomp:re-execution"]) to denote a ReComp-initiated re-execution.
  • 17. Restart Tree • Re-computation front handles single executions well. • What if the process is more complex than that? • pipeline, workflow, complex hierarchical workflow… cf. the NGS pipeline. => The provenance trace includes multiple interrelated executions. => During re-execution we have to combine all of them within a single context – the top-level execution.
  • 18. Restart Tree • To build a restart tree we rely on the provone:wasPartOf statements. CF = {b2, e1}
  • 19. Restart Tree • Captures the vertical dimension of a single execution • the transitive closure of the wasPartOf relation. RT ≝ {Execution, [DataChange], [Children]} CF = {b2, e1}
  • 20. Restart Tree • Captures the vertical dimension of a single execution • the transitive closure of the wasPartOf relation. RT = {E0, [], [{SE0, [], [{SSE1, [⟨b2 → b0⟩], []},{SSE3, [⟨e1 → e0⟩], []}]}, {SE1, …}, …]} CF = {b2, e1}
  • 21. The algorithm • Combines all three aspects: • the change front, • the re-computation front and • the restart tree. • For a given change front, –> produces the re-computation front that –> includes a set of restart trees, –> each refers to a single top-level execution with only the parts related to the change(s). • Enables ReComp to identify the minimal set of executions that may be affected by the change(s) • The remaining executions are either not affected at all or were refreshed previously.
  • 22. Re-Computation Process • Enables difference and impact analysis of the executions on the front and their partial re-execution.
  • 23. Difference and Impact Analysis <<hasSubProgram>> <<hasSubProgram>>
  • 24. Conclusions • We address the problem of the re-computation of: • complex hierarchical processes, • run over a cohort of input data samples, • with multiple points of change, • in the open system – allow users to initiate (re-)executions any time. • The solution starts from the changes observed: • In contrast to previous work, e.g. smart re-run and workflow caching. • We proposed a simple algorithm to find the re-computation front: • written in Prolog, • very effective (response in the order of 1–100 ms), • available on GitHub. • The algorithm is the initial step in further scope identification and execution optimisation.
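
The restart tree and re-computation front from slides 18–21 can be sketched as follows. This is only an illustrative Python sketch, not the authors' Prolog implementation: the flat dictionaries standing in for the ProvONE wasPartOf, usage and wasInformedBy statements, the version naming scheme (`b2` is version 2 of dependency `b`) and all function names are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class RestartTree:
    """RT = {Execution, [DataChange], [Children]}, following slide 19."""
    execution: str
    data_changes: list = field(default_factory=list)  # (newest, used) version pairs
    children: list = field(default_factory=list)

def base(version):
    """Assumed naming scheme: 'b2' is a version of dependency 'b'."""
    return version.rstrip("0123456789")

def build_restart_tree(execution, was_part_of, used, change_front):
    """Return the restart tree of `execution`, or None if no part of it is affected."""
    newest = {base(v): v for v in change_front}
    changes = [(newest[base(v)], v)
               for v in sorted(used.get(execution, []))
               if base(v) in newest and newest[base(v)] != v]
    children = []
    # Follow the wasPartOf relation downwards (child -> parent mapping).
    for child in sorted(c for c, parent in was_part_of.items() if parent == execution):
        subtree = build_restart_tree(child, was_part_of, used, change_front)
        if subtree is not None:
            children.append(subtree)
    if not changes and not children:
        return None
    return RestartTree(execution, changes, children)

def recomputation_front(top_level, was_part_of, used, informed_by, change_front):
    """Restart trees of top-level executions that were not refreshed already."""
    refreshed = set(informed_by.values())  # informants of earlier re-executions
    trees = []
    for execution in top_level:
        if execution in refreshed:
            continue
        tree = build_restart_tree(execution, was_part_of, used, change_front)
        if tree is not None:
            trees.append(tree)
    return trees

# Hypothetical provenance, loosely modelled on the CF = {b2, e1} example.
was_part_of = {"SE0": "E0", "SE1": "E0", "SSE1": "SE0", "SSE2": "SE0", "SSE3": "SE0"}
used = {"SSE1": ["b0"], "SSE2": ["a0"], "SSE3": ["e0"], "SE1": ["a0"],
        "E1": ["b1", "e1"], "E2": ["b2", "e1"]}
informed_by = {"E2": "E1"}  # E2 is a re-execution of E1

front = recomputation_front(["E0", "E1", "E2"], was_part_of, used,
                            informed_by, {"b2", "e1"})
```

In this example only E0 ends up on the front: E1 has already been refreshed by E2, and E2 uses the newest versions. The tree for E0 keeps only SE0 with its affected children SSE1 and SSE3.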

Editor's Notes

  1. Highlight the key elements: complex hierarchical workflow, multiple patient inputs, variety of software and data dependencies. Single run of the pipeline may include 30–40 patient samples, and tens or hundreds of such executions are made in practice, e.g. 1511 brain tumor patients from IGM.
  2. Examples of why changes in P should invoke updates of past P outcomes: reference genome → ???; dbSNP → variants no longer considered de novo (important in the case of rare diseases); tool versions → more accurate alignment or variant discovery.
  3. Multiple past executions – scope of change is to filter irrelevant execs. Mention that this single pass here is slightly simplified – no cost considered.
  4. Notation is slightly different, for the sake of presentation. Mention that x is explicitly indicated to denote the fact that V describes dependency changes, whereas x ∈ X is an element of a population of inputs. The last three are here to explain the effective algorithm. The alg. is supposed to produce the minimal set of potentially affected execs.
  5. Simple change – wait until CF. Composite change. Only the changed artefacts. Change front CF3. Only the newest references – transitive closure of wDF on the set of all derivations of artefact D.v. The open system – users can introduce execs with non-recent versions. More changes. Change front CF5. Includes all changed artefacts – keeps the system open. Prioritisation within impact analysis may mean that not all past execs are recomputed. Windowing, windowing policy. Explain that P may depend on many other components but C and CF include only the changed ones. Also, mention that CF includes references to only the newest version of the dependencies and includes all of them; this is important because various executions may depend on earlier versions, not necessarily the immediate predecessor. Also, the user can introduce executions which depend on versions well before the last CF. Mention that the windowing policy may vary widely, e.g. fixed window size or adaptive window based on some measure of change event significance.
  6. Isn’t it as simple as filtering out execs which are wasInformedBy
  7. Grey arrows indicate the data flow: solid lines – the usage of data, dotted lines – the communication / generation-usage pattern Black dashed arrows reflect the structure of the process.
  8. Mention that for the sake of simplicity we are not showing the port-data usage, which is also needed.
  10. How is it different from workflow caching?
  12. Mention that x is explicitly indicated to denote the fact that V describes dependency changes, whereas x ∈ X is an element of a population of inputs. The last three are here to explain the effective algorithm. The alg. is supposed to produce the minimal set of potentially affected execs.
  13. Simple change. Composite change. Only the changed artefacts. Change front CF0. Only the newest references – transitive closure of wDF on the set of all derivations of artefact D.v. The open system – users can introduce execs with non-recent versions. More changes. Change front CF1. Includes all changed artefacts – keeps the system open. Prioritisation within impact analysis may mean that not all past execs are recomputed. Windowing, windowing policy. Explain that P may depend on many other components but C and CF include only the changed ones. Also, mention that CF includes references to only the newest version of the dependencies and includes all of them; this is important because various executions may depend on earlier versions, not necessarily the immediate predecessor. Also, the user can introduce executions which depend on versions well before the last CF. Mention that the windowing policy may vary widely, e.g. fixed window size or adaptive window based on some measure of change event significance.
  14. Simple change. Composite change. Only the changed artefacts. Change front CF0: a point in time when ReComp initiates the re-computation loop. Only the newest references – transitive closure of wDF on the set of all derivations of artefact D.v. The open system – users can introduce execs with non-recent versions. More changes. Change front CF1. Includes all changed artefacts – keeps the system open. Prioritisation within impact analysis may mean that not all past execs are recomputed. Windowing, windowing policy. Explain that P may depend on many other components but C and CF include only the changed ones. Also, mention that CF includes references to only the newest version of the dependencies and includes all of them; this is important because various executions may depend on earlier versions, not necessarily the immediate predecessor. Also, the user can introduce executions which depend on versions well before the last CF. Mention that the windowing policy may vary widely, e.g. fixed window size or adaptive window based on some measure of change event significance.
  16. The last Unless there is the top-level coordination process specified already.
  17. A few simplifications: All execs depend on all shown artefacts {a & b}. There may be a whole set of execs not on the front, which depend on other elements/artefacts. Re-execution of E0 but not E1 (as explained in the paper) may be due to the scope of changes in ‘a’, by which E1 may not be affected by a2 → a1 and so is not refreshed. Execs on the front are such that they are not an informant to any other informed exec.
  18. A few simplifications: All execs depend on all shown artefacts {a & b}. There may be a whole set of execs not on the front, which depend on other elements/artefacts. Re-execution of E0 but not E1 (as explained in the paper) may be due to the scope of changes in ‘a’, by which E1 may not be affected by a2 → a1 and so is not refreshed.
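
The notes on change fronts and windowing policy can be illustrated with a short sketch. It is written in Python for readability and is not the authors' implementation: the name-plus-number version scheme, the function names and the fixed-size window policy are all assumptions chosen for the example. The change events are the ones shown on the Change Front slide.

```python
import re

def split_version(version):
    """Split 'a3' into ('a', 3); the name+number scheme is an assumption."""
    name, number = re.fullmatch(r"([A-Za-z_]+)(\d+)", version).groups()
    return name, int(number)

def change_front(events):
    """Accumulate change events, keeping only the newest version of each
    changed dependency (unchanged dependencies never appear)."""
    newest = {}
    for event in events:
        for new_version, _old_version in event:
            name, number = split_version(new_version)
            if number > newest.get(name, -1):
                newest[name] = number
    return {f"{name}{number}" for name, number in newest.items()}

def fixed_window_fronts(events, size):
    """Hypothetical fixed-size policy: one cumulative front per `size` events."""
    return [change_front(events[:end])
            for end in range(size, len(events) + 1, size)]

# Change events C0..C5 from the Change Front slide, as (new, old) pairs.
events = [
    [("a1", "a0")],                # C0
    [("b1", "b0")],                # C1
    [("a2", "a1"), ("c1", "c0")],  # C2
    [("a3", "a2"), ("c2", "c1")],  # C3
    [("d1", "d0")],                # C4
    [("b2", "b1")],                # C5
]

cf3 = change_front(events[:4])   # front after C0..C3
cf5 = change_front(events)       # front after C0..C5
fronts = fixed_window_fronts(events, 2)
```

The fronts are cumulative rather than per-window, which matches the note that a front includes all changed artefacts so that executions depending on much older versions can still be matched against it.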