Validation and Inference of Schema-Level Workflow Data-Dependency Annotations
Shawn Bowers¹, Timothy McPhillips², Bertram Ludäscher²
¹Dept. of Computer Science, Gonzaga University
²School of Information Sciences, University of Illinois, Urbana-Champaign
IPAW 2018
Scientific Workflows and Provenance
A workflow specification modeled as a graph of computation steps (nodes) and data/control flow (edges)
[Figure: example workflow graphs, e.g., gen_boundary_region with boundary_coordinates, user_map_marker_pos, prism_data, and file:data/112W36N.nc; and a two-step gen/filter workflow with data blocks d1, d2, d3 and a cutoff input c]
Steps are often “black boxes” (invoke external programs)
Scientific Workflows and Provenance
During a workflow execution, systems record “provenance” information ...
• invocation of steps
• data received/produced by steps
A workflow trace modeled as a graph of invocations and corresponding data
• a trace is a specification instance
• capturing details of a workflow run
[Figure: three example traces of the gen/filter workflow, with invocations such as gen:1, filter:1, and filter:2 and the data values passed between them]
Different traces of the same specification
Data Dependency Assumptions and Issues
Traces are used to infer the “lineage” of data products (*)
• e.g., all steps and inputs/outputs that led to an output
• assume all outputs “depend on” all inputs of a step
[Figure: a single trace of the gen/filter workflow]
However, the inferred “dependencies” can be incorrect and vague:
1. some outputs might not “depend on” all inputs
2. outputs can depend on inputs differently (derivation, copy, ...)
(*) Some systems provide APIs for steps to declare dependencies at runtime.
Prospective (Schema-Level) Dependency Annotations
Our approach:
• Allow workflow authors to specify dependency patterns (annotations)
• Support different data dependency types
• Use dependency annotations to infer trace-level dependencies
Prior work:
• Allows dependency annotations for individual workflow steps
• Rules for extracting trace-level invocation dependencies
• Requires each step to be (fully) annotated
Current contributions focus on workflow design:
1. Allow partially annotated workflow specifications
2. Infer complete sets of (possible) annotations
3. Validate correctness of annotations
Workflow Specifications
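Minimally, a workflow specification $W = (P, D, E)$ consists of
• a set $P$ of program blocks (computation steps), e.g., p1
• a set $D$ of data blocks (data items or containers), e.g., d1
• a set $E \subseteq P \times L \times D \times \{\mathit{in}, \mathit{out}\}$ of uniquely labeled edges
[Figure: program blocks p1 and p2 connected through data block d1 by edges labeled x1 and x2]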
We use $\mathit{in}(p_i, x_i, d_i)$ and $\mathit{out}(p_j, x_j, d_j)$ for input and output edges
• where $x_i, x_j$ are labels in $L$
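A minimal Python sketch of $W = (P, D, E)$, assuming each edge is stored as a (program, label, data, direction) tuple; the `Workflow` class and its methods are hypothetical, not part of the paper. It enforces unique edge labels and lists a block's input-output label pairs, which are the pairs that dependency annotations attach to.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Workflow:
    """Hypothetical encoding of W = (P, D, E).

    Each edge is a (program, label, data, direction) tuple, where direction
    is "in" (program reads data) or "out" (program writes data).
    """

    programs: set[str] = field(default_factory=set)
    data: set[str] = field(default_factory=set)
    edges: set[tuple[str, str, str, str]] = field(default_factory=set)

    def add_edge(self, p: str, x: str, d: str, direction: str) -> None:
        if direction not in ("in", "out"):
            raise ValueError("direction must be 'in' or 'out'")
        # Edge labels must be unique across E.
        if any(label == x for (_, label, _, _) in self.edges):
            raise ValueError(f"label {x!r} is already used")
        self.programs.add(p)
        self.data.add(d)
        self.edges.add((p, x, d, direction))

    def labels(self, p: str, direction: str) -> set[str]:
        # Labels of p's input ("in") or output ("out") edges.
        return {x for (q, x, _, dr) in self.edges if q == p and dr == direction}

    def io_pairs(self, p: str) -> set[tuple[str, str]]:
        # Input-output label pairs of one block: where annotations attach.
        return {(i, o) for i in self.labels(p, "in") for o in self.labels(p, "out")}


# Assumed reading of the figure: p1 writes d1 via x1, and p2 reads d1 via x2.
w = Workflow()
w.add_edge("p1", "x1", "d1", "out")
w.add_edge("p2", "x2", "d1", "in")
```

Keeping labels unique means a label alone identifies an edge, which is what lets annotations refer to edges by label.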
Dependency Annotations
Dependency annotations $A \subseteq L_{\mathit{out}} \times L_{\mathit{in}} \times T$ for a workflow $W$ ...
• associate dependency types $t \in T$ (more later)
• to input-output edge pairs of $W$ (identified by their labels)
We use dep_rule(x_i, x_j, t) for annotations $x_i \xrightarrow{t} x_j$ (drawn in red)
[Figure: the gen/filter workflow (data blocks d1, d2, d3 and cutoff c) with input labels n, v1, cutoff, output labels r, v2, and DependsOn/CopyOf annotations]
• e.g., dep_rule(n, r, depends_on), dep_rule(v1, v2, copy_of), dep_rule(cutoff, v2, depends_on)
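A small Python sketch of how an annotation set might be checked against the workflow, storing each annotation in dep_rule's argument order (input label, output label, type). The three annotations come from the slide; the edge set, in particular which data blocks n, r, v1, v2, and cutoff read or write, is an assumption for illustration, as are the helper names.

```python
# Hypothetical edges for the gen/filter example: (program, label, data, direction).
EDGES = {
    ("gen", "n", "d1", "in"),
    ("gen", "r", "d2", "out"),
    ("filter", "v1", "d2", "in"),
    ("filter", "cutoff", "c", "in"),
    ("filter", "v2", "d3", "out"),
}

# Annotations from the slide, as (input label, output label, type) triples.
A = {
    ("n", "r", "depends_on"),
    ("v1", "v2", "copy_of"),
    ("cutoff", "v2", "depends_on"),
}


def block_of(label, direction):
    """Program block owning the edge with this label and direction, or None."""
    for p, x, _, d in EDGES:
        if x == label and d == direction:
            return p
    return None


def invalid_annotations(annotations):
    """Annotations that do not pair an input and an output of the same block."""
    bad = set()
    for xi, xj, t in annotations:
        p_in, p_out = block_of(xi, "in"), block_of(xj, "out")
        if p_in is None or p_in != p_out:
            bad.add((xi, xj, t))
    return bad
```

Under these assumptions, all three slide annotations are block-level, while a pair such as (n, v2) spans two blocks and would have to be a composite annotation.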
Dependency Types
We consider five different dependency annotation types ... (*, †)
FlowsFrom: input present during invocation (e.g., a trigger)
DependsOn: output has control (statement) dependency on input
DerivedFrom: output has data (read-after-write) dependency on input
ValueOf: input value copied to the output (new data item)
SameAs: input copied to the output (same item “passed through”)
Ordered from weakest to strongest form of dependency ...
$\mathit{FlowsFrom} \prec \mathit{DependsOn} \prec \mathit{DerivedFrom} \prec \mathit{ValueOf} \prec \mathit{SameAs}$
Or as subclasses (e.g., FlowsFrom+ as “at least FlowsFrom”) ...
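$\mathit{FlowsFrom}^{+} \sqsupseteq \mathit{DependsOn}^{+} \sqsupseteq \mathit{DerivedFrom}^{+} \sqsupseteq \mathit{ValueOf}^{+} \sqsupseteq \mathit{SameAs}^{+}$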
(*) A more formal description is given in the paper.
(†) Plus NotFlowsFrom, described later.
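One way to encode this ordering is a Python `IntEnum`, so that "weaker or equal" becomes integer comparison and each "at least" class $t^{+}$ is the set of types at or above $t$. This is a sketch: the `Dep` names and helper functions are hypothetical, and NotFlowsFrom is omitted.

```python
from __future__ import annotations

from enum import IntEnum


class Dep(IntEnum):
    """Dependency types ordered weakest (lowest) to strongest."""
    FLOWS_FROM = 1
    DEPENDS_ON = 2
    DERIVED_FROM = 3
    VALUE_OF = 4
    SAME_AS = 5


def weaker_eq(a: Dep, b: Dep) -> bool:
    # a is at most as strong as b.
    return a <= b


def members(bound: Dep) -> list[Dep]:
    # The class bound+ ("at least bound"), weakest member first.
    return [t for t in Dep if t >= bound]
```

With this encoding the subclass chain holds by construction: `members(Dep.DEPENDS_ON)` is a subset of `members(Dep.FLOWS_FROM)`, and so on down to SameAs+.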
Reasoning using Dependency Composition
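Given two “connected” program blocks:
[Figure: p1 reads d1 via x1 and writes d2 via x2; p2 reads d2 via x3 and writes d3 via x4; p1 is annotated with $t_i$, p2 with $t_j$, and the composite with $t$]
A composite (indirect) dependency $x_1 \xrightarrow{t} x_4$ is the weaker of the dependencies $x_1 \xrightarrow{t_i} x_2$ and $x_3 \xrightarrow{t_j} x_4$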
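$\mathit{dep\_rule}(x_1, x_2, t_i) \wedge \mathit{dep\_rule}(x_3, x_4, t_j) \wedge t_i \preceq t_j \rightarrow \mathit{dep\_rule}(x_1, x_4, t_i)$
$\mathit{dep\_rule}(x_1, x_2, t_i) \wedge \mathit{dep\_rule}(x_3, x_4, t_j) \wedge t_j \preceq t_i \rightarrow \mathit{dep\_rule}(x_1, x_4, t_j)$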
This extends to longer “chains” of connected program blocks
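Because composition picks the weaker of two types, it behaves like `min` on the ordering, and a chain can be folded with `functools.reduce`. A sketch assuming the same hypothetical `Dep` encoding as an integer enum, weakest first:

```python
from enum import IntEnum
from functools import reduce


class Dep(IntEnum):
    FLOWS_FROM = 1
    DEPENDS_ON = 2
    DERIVED_FROM = 3
    VALUE_OF = 4
    SAME_AS = 5


def compose(ti: Dep, tj: Dep) -> Dep:
    # Composite of two connected annotations: the weaker of the two.
    return ti if ti <= tj else tj


def compose_chain(types):
    # compose is associative (it is min), so it extends to longer chains.
    return reduce(compose, types)
```

For example, a chain annotated ValueOf, SameAs, DependsOn composes to DependsOn: one weak link bounds the whole chain.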
Dependency Composition with Multiple Paths
When multiple annotation “paths” exist ...
[Figure: program blocks p1 to p4 and data blocks d1 to d5 with edge labels x1 to x9; two annotation paths connect the same input and output, annotated with FlowsFrom, DerivedFrom, and SameAs]
The composite annotation type is the strongest of the path types
• the top path implies FlowsFrom
• the bottom path implies DerivedFrom
• the inferred type is DerivedFrom (i.e., “at least DerivedFrom”)
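Putting the two rules together: each path composes to its weakest annotation, and the composite over several paths is the strongest of those. A sketch with the hypothetical `Dep` enum; the two lists below are an assumed reading of the example (one path containing a FlowsFrom link, the other only DerivedFrom and SameAs links), not an exact transcription of the figure.

```python
from enum import IntEnum


class Dep(IntEnum):
    FLOWS_FROM = 1
    DEPENDS_ON = 2
    DERIVED_FROM = 3
    VALUE_OF = 4
    SAME_AS = 5


def path_type(path):
    # Along one path: the weakest annotation.
    return min(path)


def composite(paths):
    # Across alternative paths: the strongest path type.
    return max(path_type(p) for p in paths)


top = [Dep.DERIVED_FROM, Dep.FLOWS_FROM]
bottom = [Dep.DERIVED_FROM, Dep.SAME_AS, Dep.DERIVED_FROM]
```

`composite([top, bottom])` gives DerivedFrom, matching the inferred type on the slide.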
Use Case 1: Infer Composite Dependencies
Given annotations on blocks (steps), find composite annotations
• helps verify the intent and construction of the workflow
• e.g., that certain outputs are derived from inputs
[Figure: a two-step normalize/filter workflow with data blocks d1 to d5 and labels xrange, x1 to x4, and xcutoff, annotated with DependsOn, SameAs, and DerivedFrom]
Inferred annotations shown in blue
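A rough Python sketch of this use case, assuming block annotations are keyed by (input label, output label) and a `connected` map records which input labels read the data block an output label writes (mirroring the `connected` relation of the prototype). The function enumerates paths, keeping the weakest link along each path and the strongest path per pair; the workflow in the usage lines is hypothetical, and the sketch assumes an acyclic workflow.

```python
from enum import IntEnum


class Dep(IntEnum):
    """Dependency types, weakest (1) to strongest (5)."""
    FLOWS_FROM = 1
    DEPENDS_ON = 2
    DERIVED_FROM = 3
    VALUE_OF = 4
    SAME_AS = 5


def infer_composites(block_ann, connected):
    """Composite annotation for every input label -> reachable output label.

    block_ann: {(x_in, x_out): Dep} annotations on individual blocks.
    connected: {x_out: {x_in, ...}} output edges feeding input edges.
    Enumerates paths, so it is exponential in the worst case.
    """
    result = {}

    def walk(start, x_in, weakest):
        for (i, o), t in block_ann.items():
            if i != x_in:
                continue
            w = t if weakest is None else min(weakest, t)  # weakest link so far
            if w > result.get((start, o), 0):  # keep the strongest path
                result[(start, o)] = w
            for nxt in connected.get(o, ()):
                walk(start, nxt, w)

    for start in {i for (i, _) in block_ann}:
        walk(start, start, None)
    return result


# Hypothetical workflow: block A maps x1 -> x2; block B maps x3 -> x4 and
# x5 -> x4; x2 writes the data block that x3 reads.
ann = {
    ("x1", "x2"): Dep.DERIVED_FROM,
    ("x3", "x4"): Dep.SAME_AS,
    ("x5", "x4"): Dep.DEPENDS_ON,
}
composites = infer_composites(ann, {"x2": {"x3"}})
```

Here the inferred composite for (x1, x4) is DerivedFrom, the kind of end-to-end check this use case is after.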
Use Case 2: Constraining Dependency Annotations
Add annotations to constrain choices
• e.g., may know the output should be derived from the input
• which can guide (constrain) block-level annotation choices
• or guide the workflow design itself
[Figure: a chain d1 → p1 → d2 → p2 → d3 with edge labels x1 to x4; the composite from x1 to x4 is constrained to DerivedFrom, leaving each block annotation as DerivedFrom, ValueOf, or SameAs?]
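The pruning in this example follows from composition being the weakest link: a chain's composite is at least $t$ exactly when every block in it is at least $t$. A brute-force Python check of that claim for a chain of unannotated blocks (the function name and `Dep` encoding are hypothetical):

```python
from enum import IntEnum
from itertools import product


class Dep(IntEnum):
    FLOWS_FROM = 1
    DEPENDS_ON = 2
    DERIVED_FROM = 3
    VALUE_OF = 4
    SAME_AS = 5


def allowed_choices(n_blocks, at_least):
    """Options left for each block in a chain of n_blocks unannotated blocks
    when the end-to-end composite must be at least `at_least`."""
    # Keep every assignment whose weakest link meets the bound.
    ok = [ts for ts in product(Dep, repeat=n_blocks) if min(ts) >= at_least]
    return [sorted({ts[k] for ts in ok}) for k in range(n_blocks)]


choices = allowed_choices(2, Dep.DERIVED_FROM)
```

For the two-block chain with a DerivedFrom constraint, each block is left with DerivedFrom, ValueOf, or SameAs, as on the slide.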
Use Case 3: Validating Dependency Annotations
Ensure annotations are compatible
• e.g., lower-level (block) annotations that are not consistent with a composite annotation (shown in purple)
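[Figure: a “generate sample” step (labels xin, xout, xiter; data blocks d1, d2, dtype, diter) annotated DerivedFrom, refined into “initial sample” and “perturb” steps whose block annotations (DependsOn, DerivedFrom) are not consistent with that composite; also a generic chain din → p1 → p2 → dout]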
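Validation can be sketched as recomputing the composite that the block annotations imply (strongest path, weakest link per path) and comparing it with the declared one. The refinement in the usage lines below is a hypothetical example in the spirit of the figure, not its exact content; `check_composite` and `Dep` are illustrative names.

```python
from enum import IntEnum


class Dep(IntEnum):
    FLOWS_FROM = 1
    DEPENDS_ON = 2
    DERIVED_FROM = 3
    VALUE_OF = 4
    SAME_AS = 5


def check_composite(paths, declared):
    """Return (ok, inferred) for a declared composite annotation, given the
    block-annotation types along each path between its input and output."""
    inferred = max(min(path) for path in paths)
    return inferred == declared, inferred


# Hypothetical refinement: a step declared DerivedFrom is split into two
# steps annotated DependsOn and DerivedFrom.
ok, inferred = check_composite([[Dep.DEPENDS_ON, Dep.DERIVED_FROM]], Dep.DERIVED_FROM)
```

Here the check fails: the blocks only support DependsOn, so either a block annotation or the declared composite has to change.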
Dependency Reasoning Prototype Implementation
Answer-Set Programming (ASP) prototype in Potassco (clingo)
High level idea: use a generate-and-test algorithm
(i) “guess” annotations for non-annotated input-output pairs
(ii) ensure annotations satisfy composition rules
(iii) ensure annotations satisfy “strongest-path” constraint
Result is all possible and complete annotation sets (possible worlds)
(iv) find all annotations common to all worlds
(v) report possible choices for remaining input-output pairs
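As a rough illustration of steps (i) to (v), here is a brute-force Python analogue, not the clingo encoding. It assumes a linear chain of blocks, so every input-output pair has exactly one path and the strongest-path constraint holds trivially; the pair encoding (i, j), meaning "input of block i to output of block j", and the names `possible_worlds` and `summarize` are hypothetical. Its cost grows as 5^k for k unannotated blocks.

```python
from enum import IntEnum
from itertools import product


class Dep(IntEnum):
    FLOWS_FROM = 1
    DEPENDS_ON = 2
    DERIVED_FROM = 3
    VALUE_OF = 4
    SAME_AS = 5


def possible_worlds(n, given):
    """All complete annotation sets for a chain of n blocks (0..n-1).

    Pair (i, j), i <= j, is "input of block i -> output of block j";
    (k, k) is a block-level annotation. `given` fixes some pairs.
    """
    free = [k for k in range(n) if (k, k) not in given]
    worlds = []
    for guess in product(Dep, repeat=len(free)):  # (i) guess block annotations
        block = {k: given.get((k, k)) for k in range(n)}
        block.update(zip(free, guess))
        # (ii)+(iii) one path per pair on a chain: composite = weakest block.
        world = {(i, j): min(block[k] for k in range(i, j + 1))
                 for i in range(n) for j in range(i, n)}
        if all(world[p] == t for p, t in given.items()):  # test
            worlds.append(world)
    return worlds


def summarize(worlds):
    """(iv) annotations shared by all worlds, (v) options for the rest.
    Assumes at least one world exists."""
    common, choices = {}, {}
    for p in worlds[0]:
        opts = sorted({w[p] for w in worlds})
        if len(opts) == 1:
            common[p] = opts[0]
        else:
            choices[p] = opts
    return common, choices


# Block 0 is SameAs and the end-to-end pair (0, 2) must be DependsOn.
worlds = possible_worlds(3, {(0, 0): Dep.SAME_AS, (0, 2): Dep.DEPENDS_ON})
common, choices = summarize(worlds)
```

In this example every world agrees that the pair (1, 2) is DependsOn, a fact nobody annotated, while blocks 1 and 2 each keep four options. An empty list of worlds signals inconsistent annotations.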
Prototype Implementation (cont)
The following “choice rule” guesses annotations
{dep_rule(I,O,R) : dep_type(R)} = 1 :- up_stream(I,O).
The up_stream relation finds all possible input-output pairs
up_stream(I,O) :- in(I,P,_), out(O,P,_).
up_stream(I,O) :- in(I,P1,_), out(O1,P1,D1),
in(I2,P2,D1), up_stream(I2,O).
The following constraint ensures composition rules are satisfied
:- dep_rule(I,O,R), not valid_dep_path(I,O,R).
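The two up_stream rules amount to a recursive join. A minimal Python sketch that evaluates them by naive fixpoint iteration, assuming facts are (label, program, data) triples in the same argument order as in/3 and out/3 above; the function name and example facts are hypothetical.

```python
def up_stream(in_facts, out_facts):
    """Naive fixpoint evaluation of the two up_stream rules."""
    # Rule 1: I is an input and O an output of the same program P.
    pairs = {(i, o) for (i, p, _) in in_facts for (o, q, _) in out_facts if p == q}
    while True:
        # Rule 2: I feeds P1, P1 writes data D1, D1 is read via I2,
        # and up_stream(I2, O) already holds.
        derived = {
            (i, o)
            for (i, p1, _) in in_facts
            for (_o1, q1, d1) in out_facts if q1 == p1
            for (i2, _p2, d2) in in_facts if d2 == d1
            for (j, o) in pairs if j == i2
        }
        if derived <= pairs:  # nothing new: fixpoint reached
            return pairs
        pairs |= derived


# Chain: p1 reads d1 (x1) and writes d2 (x2); p2 reads d2 (x3) and writes d3 (x4).
ins = {("x1", "p1", "d1"), ("x3", "p2", "d2")}
outs = {("x2", "p1", "d2"), ("x4", "p2", "d3")}
pairs = up_stream(ins, outs)
```

The result contains the two block-level pairs plus the indirect pair (x1, x4). These are exactly the pairs the choice rule then guesses a type for.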
Prototype Implementation (cont)
The valid_dep_path relation finds valid compositions
valid_dep_path(I,O,R) :- in(I,P,_), out(O,P,_),
dep_rule(I,O,R).
valid_dep_path(I,O,R) :- in(I,P,_), out(O1,P,_), O != O1,
dep_rule(I,O1,R1), connected(O1,I1),
I != I1, valid_dep_path(I1,O,R2),
compose(R1,R2,R).
The connected relation ensures an output is connected to an input
connected(O,I) :- out(O,_,D), in(I,_,D).
compose computes composition (where weaker_eq implements $\preceq$)
compose(R1,R2,R1) :- weaker_eq(R1,R2).
compose(R1,R2,R2) :- weaker_eq(R2,R1).
Prototype Implementation (cont)
Finally, the following constraint ensures “strongest” paths
:- dep_rule(I,O,R), valid_dep_path(I,O,R1),
weaker_eq(R,R1), R != R1.
Recently added NotFlowsFrom type (e.g., for subworkflows)
• Required only minimal changes: $\mathit{NotFlowsFrom} \prec \mathit{FlowsFrom}$
• Full subworkflow support not yet implemented (future work)
Preliminary Performance Results
(1) Increase the depth of the workflow (2-50 steps) and the % of block annotations
(2) Increase the width of the workflow (2-50 steps) and the % of block annotations
[Figure: synthetic test workflows between start blocks ps/ds and end blocks pe/de]
Future Work
Add dependency annotations to YesWorkflow’s annotation types
• combine schema-level support and extend trace-level support
Apply schema-level dependency annotations to workflows in YesWorkflow (YW)
• we can now do this, e.g., for paleocar (with NotFlowsFrom)
• extend annotation types as needed
Develop specialized reasoning support (as needed)
• ASP is great for prototyping!
• but performance can be improved with a dedicated implementation
Dr. Shawn Bowers presenting the paper on July 10th, 2018 at IPAW, King’s College, London, UK.

 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 

Validation and Inference of Schema-Level Workflow Data-Dependency Annotations

  • 1. Validation and Inference of Schema-Level Workflow Data-Dependency Annotations
    Shawn Bowers¹, Timothy McPhillips², Bertram Ludäscher²
    ¹ Dept. of Computer Science, Gonzaga University
    ² School of Information Sciences, University of Illinois, Urbana-Champaign
    IPAW 2018
  • 2. Scientific Workflows and Provenance
    A workflow specification is modeled as a graph of computation steps (nodes) and data/control flow (edges).
    [Figure: a real workflow (gen_boundary_region, boundary_coordinates, user_map_marker_pos, prism_data, file:data/112W36N.nc) and a simplified version with steps gen and filter and data d1, d2, d3, c]
    Steps are often "black boxes" (they invoke external programs).
  • 3. Scientific Workflows and Provenance
    During a workflow execution, systems record "provenance" information:
    - invocations of steps
    - data received/produced by steps
    A workflow trace is modeled as a graph of invocations and corresponding data:
    - a trace is an instance of the specification
    - it captures the details of a workflow run
    [Figure: different traces of the same specification, with invocations such as gen:1, filter:1, and filter:2]
  • 4. Data Dependency Assumptions and Issues
    Traces are used to infer the "lineage" of data products (*):
    - e.g., all steps and inputs/outputs that led to an output
    - assuming all outputs of a step "depend on" all of its inputs
    [Figure: trace fragment gen:1 → filter:1]
    However, the inferred "dependencies" can be incorrect and vague:
    1. some outputs might not "depend on" all inputs
    2. outputs can depend on inputs differently (derivation, copy, ...)
    (*) Some systems provide APIs for steps to declare dependencies at runtime.
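To make the default assumption concrete, here is a minimal Python sketch of naive lineage over a trace: every output of an invocation is taken to depend on every input of that invocation. The trace format (a dict from a hypothetical invocation id to the sets of data items it read and wrote) and the item names are assumptions for illustration, not the format of any particular provenance system.

```python
from collections import defaultdict

# Hypothetical trace: invocation -> (data items read, data items written).
TRACE = {
    "gen:1": ({"d1"}, {"d3"}),
    "filter:1": ({"d3", "c"}, {"d2"}),
}


def naive_lineage(trace, item):
    """Return every data item that `item` transitively depends on, assuming
    each output of an invocation depends on *all* inputs of that invocation."""
    producers = defaultdict(set)  # data item -> invocations that wrote it
    for inv, (_, outs) in trace.items():
        for d in outs:
            producers[d].add(inv)
    lineage, stack = set(), [item]
    while stack:
        d = stack.pop()
        for inv in producers[d]:
            for src in trace[inv][0]:
                if src not in lineage:
                    lineage.add(src)
                    stack.append(src)
    return lineage


print(sorted(naive_lineage(TRACE, "d2")))  # ['c', 'd1', 'd3']
```

If filter:1 only copied d3 to d2, this rule would still report c as part of d2's lineage, which is exactly the kind of vague or incorrect dependency the slide points out.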
  • 7. Prospective (Schema-Level) Dependency Annotations
    Our approach:
    - allow workflow authors to specify dependency patterns (annotations)
    - support different data dependency types
    - use dependency annotations to infer trace-level dependencies
    Prior work:
    - allows dependency annotations for individual workflow steps
    - provides rules for extracting trace-level invocation dependencies
    - requires each step to be (fully) annotated
    Current contributions focus on workflow design:
    1. allow partially annotated workflow specifications
    2. infer complete sets of (possible) annotations
    3. validate the correctness of annotations
  • 8. Workflow Specifications
    Minimally, a workflow specification $W = (P, D, E)$ consists of
    - a set $P$ of program blocks (computation steps), e.g., $p_1$
    - a set $D$ of data blocks (data items or containers), e.g., $d_1$
    - a set $E \subseteq P \times L \times D \times \{\mathit{in}, \mathit{out}\}$ of uniquely labeled edges
    [Figure: program blocks p1 and p2 connected through data block d1 by edges labeled x1 and x2]
    We use $\mathit{in}(p_i, x_i, d_i)$ and $\mathit{out}(p_j, x_j, d_j)$ for input and output edges, where $x_i, x_j$ are labels in $L$.
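The definition of $W = (P, D, E)$ maps directly onto a small data structure. The sketch below is a hypothetical Python encoding (class and method names are not from the paper); it enforces the "uniquely labeled" requirement on edges and assumes, when reading the figure, that x1 is p1's output edge into d1 and x2 is p2's input edge from d1.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Literal


@dataclass(frozen=True)
class Edge:
    """One labeled edge (program block, label, data block, direction)."""
    program: str
    label: str
    data: str
    direction: Literal["in", "out"]


@dataclass
class Workflow:
    """W = (P, D, E): program blocks, data blocks, uniquely labeled edges."""
    programs: set[str]
    data: set[str]
    edges: list[Edge] = field(default_factory=list)

    def __post_init__(self) -> None:
        labels = [e.label for e in self.edges]
        if len(labels) != len(set(labels)):
            raise ValueError("edge labels must be unique")
        for e in self.edges:
            if e.program not in self.programs or e.data not in self.data:
                raise ValueError(f"edge {e.label} refers to an unknown block")

    def inputs(self, program: str) -> list[Edge]:
        """Edges in(program, x, d)."""
        return [e for e in self.edges if e.program == program and e.direction == "in"]

    def outputs(self, program: str) -> list[Edge]:
        """Edges out(program, x, d)."""
        return [e for e in self.edges if e.program == program and e.direction == "out"]


# Assumed reading of the slide's figure: p1 writes d1 via x1, p2 reads d1 via x2.
W = Workflow(
    programs={"p1", "p2"},
    data={"d1"},
    edges=[Edge("p1", "x1", "d1", "out"), Edge("p2", "x2", "d1", "in")],
)
print([e.label for e in W.outputs("p1")])  # ['x1']
```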
  • 9. Dependency Annotations
    Dependency annotations $A \subseteq L_{out} \times L_{in} \times T$ for a workflow $W$:
    - associate dependency types $t \in T$ (more later)
    - with input-output edge pairs of $W$ (identified by their labels)
    We use $\mathit{dep\_rule}(x_i, x_j, t)$ for an annotation of type $t$ between edges $x_i$ and $x_j$ (drawn in red).
    [Figure: steps gen and filter over data blocks d1, d2, d3, and c, with edge labels n, r, v1, v2, and cutoff, annotated DependsOn, CopyOf, and DependsOn]
    Example: dep_rule(n, r, depends_on), dep_rule(v1, v2, copy_of), dep_rule(cutoff, v2, depends_on)
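Because annotations are keyed by edge labels, finding the input-output pairs that an author has not yet annotated is a simple lookup. The step layout below is an assumed reading of the figure (gen reads n and writes r; filter reads v1 and cutoff and writes v2), and `unannotated_pairs` is a hypothetical helper name; the three facts are the ones listed on the slide.

```python
# Assumed reading of the slide's figure (hypothetical step layout):
# gen reads n and writes r; filter reads v1 and cutoff and writes v2.
STEPS = {
    "gen": (["n"], ["r"]),
    "filter": (["v1", "cutoff"], ["v2"]),
}

# The three dep_rule(input, output, type) facts from the slide.
ANNOTATIONS = {
    ("n", "r"): "depends_on",
    ("v1", "v2"): "copy_of",
    ("cutoff", "v2"): "depends_on",
}


def unannotated_pairs(steps, annotations):
    """List the (step, input, output) pairs of each step that lack a dep_rule."""
    return [
        (step, i, o)
        for step, (ins, outs) in steps.items()
        for i in ins
        for o in outs
        if (i, o) not in annotations
    ]


print(unannotated_pairs(STEPS, ANNOTATIONS))  # []
partial = {k: v for k, v in ANNOTATIONS.items() if k != ("cutoff", "v2")}
print(unannotated_pairs(STEPS, partial))  # [('filter', 'cutoff', 'v2')]
```

A partially annotated specification like `partial` is exactly the input the later slides reason about: the missing pairs are the ones whose types must be inferred.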
  • 12. Dependency Types
    We consider five different dependency annotation types (*, †):
    - FlowsFrom: the input was present during the invocation (e.g., a trigger)
    - DependsOn: the output has a control (statement) dependency on the input
    - DerivedFrom: the output has a data (read-after-write) dependency on the input
    - ValueOf: the input's value is copied to the output (a new data item)
    - SameAs: the input is copied to the output (the same item is "passed through")
    Ordered from the weakest to the strongest form of dependency:
    $\text{FlowsFrom} \prec \text{DependsOn} \prec \text{DerivedFrom} \prec \text{ValueOf} \prec \text{SameAs}$
    Or as subclasses (e.g., FlowsFrom+ means "at least FlowsFrom"):
    $\text{FlowsFrom}^+ \sqsupseteq \text{DependsOn}^+ \sqsupseteq \text{DerivedFrom}^+ \sqsupseteq \text{ValueOf}^+ \sqsupseteq \text{SameAs}^+$
    (*) Plus NotFlowsFrom, described later.
    (†) A more formal description is given in the paper.
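The total order and the "at least" subclasses can be encoded with an integer-valued enum. In this minimal sketch the numeric values are an arbitrary assumption; only their order matters. `weaker_eq` mirrors the name used by the ASP prototype, while `at_least` is a hypothetical helper for the t+ classes.

```python
from __future__ import annotations

from enum import IntEnum


class Dep(IntEnum):
    """Dependency types, numbered from weakest (1) to strongest (5)."""
    FLOWS_FROM = 1
    DEPENDS_ON = 2
    DERIVED_FROM = 3
    VALUE_OF = 4
    SAME_AS = 5


def weaker_eq(a: Dep, b: Dep) -> bool:
    """a ⪯ b: a is at most as strong as b."""
    return a <= b


def at_least(t: Dep) -> set[Dep]:
    """The class t+ ("at least t"): t and every stronger type."""
    return {d for d in Dep if weaker_eq(t, d)}


# FlowsFrom+ ⊒ DependsOn+ ⊒ ... ⊒ SameAs+ holds as set inclusion.
classes = [at_least(t) for t in Dep]
assert all(a >= b for a, b in zip(classes, classes[1:]))
print([d.name for d in sorted(at_least(Dep.DERIVED_FROM))])
# ['DERIVED_FROM', 'VALUE_OF', 'SAME_AS']
```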
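  • 13. Reasoning using Dependency Composition
    Given two "connected" program blocks:
    [Figure: d1 →(x1) p1 →(x2) d2 →(x3) p2 →(x4) d3, with block annotations t_i (x1, x2) and t_j (x3, x4) and composite annotation t (x1, x4)]
    A composite (indirect) dependency of type $t$ between $x_1$ and $x_4$ is the weaker of the dependencies $t_i$ (between $x_1$ and $x_2$) and $t_j$ (between $x_3$ and $x_4$):
    $\mathit{dep\_rule}(x_1, x_2, t_i) \wedge \mathit{dep\_rule}(x_3, x_4, t_j) \wedge t_i \preceq t_j \leftrightarrow \mathit{dep\_rule}(x_1, x_4, t_i)$
    $\mathit{dep\_rule}(x_1, x_2, t_i) \wedge \mathit{dep\_rule}(x_3, x_4, t_j) \wedge t_j \preceq t_i \leftrightarrow \mathit{dep\_rule}(x_1, x_4, t_j)$
    This extends to longer "chains" of connected program blocks.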
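Taking "the weaker of the two" is a minimum over the type order, and folding it over a chain covers the "longer chains" case. A minimal sketch, reusing a hypothetical `Dep` enum ordered from weakest to strongest; `compose` and `compose_chain` are illustrative names.

```python
from __future__ import annotations

from enum import IntEnum
from functools import reduce


class Dep(IntEnum):
    """Dependency types, ordered from weakest to strongest."""
    FLOWS_FROM = 1
    DEPENDS_ON = 2
    DERIVED_FROM = 3
    VALUE_OF = 4
    SAME_AS = 5


def compose(ti: Dep, tj: Dep) -> Dep:
    """Composite of two connected annotations: the weaker of ti and tj."""
    return ti if ti <= tj else tj


def compose_chain(types: list[Dep]) -> Dep:
    """Fold compose over a chain of connected program blocks."""
    return reduce(compose, types)


print(compose(Dep.SAME_AS, Dep.DERIVED_FROM).name)  # DERIVED_FROM
print(compose_chain([Dep.SAME_AS, Dep.VALUE_OF, Dep.DEPENDS_ON]).name)  # DEPENDS_ON
```

Because the result is always the weakest link, a single FlowsFrom anywhere in a chain caps the whole chain at FlowsFrom.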
  • 14. Dependency Composition with Multiple Paths
    When multiple annotation "paths" exist ...
    [Figure: program blocks p1 to p4 and data blocks d1 to d5; two annotation paths connect x1 to x9, a top path and a bottom path through p2 and p3, annotated with FlowsFrom, DerivedFrom, SameAs, and DerivedFrom]
    The composite annotation type is the strongest type among the paths:
    - the top path implies FlowsFrom
    - the bottom path implies DerivedFrom
    - the inferred type is DerivedFrom (i.e., "at least DerivedFrom")
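Combining both rules gives a max-over-paths of min-along-path computation. The sketch below assumes an acyclic workflow and uses a hypothetical label graph shaped like the slide's example (one path composing to FlowsFrom, the other to DerivedFrom); the labels, including x10, are illustrative and do not reproduce the figure exactly.

```python
from __future__ import annotations

from enum import IntEnum


class Dep(IntEnum):
    FLOWS_FROM = 1
    DEPENDS_ON = 2
    DERIVED_FROM = 3
    VALUE_OF = 4
    SAME_AS = 5


# Hypothetical block-level annotations: (input label, output label) -> type.
DEP_RULES = {
    ("x1", "x2"): Dep.FLOWS_FROM,     # first step, top branch
    ("x1", "x3"): Dep.DERIVED_FROM,   # first step, bottom branch
    ("x4", "x5"): Dep.SAME_AS,        # middle step on the bottom branch
    ("x6", "x7"): Dep.DERIVED_FROM,   # middle step on the bottom branch
    ("x8", "x10"): Dep.DERIVED_FROM,  # last step, input from the top branch
    ("x9", "x10"): Dep.DERIVED_FROM,  # last step, input from the bottom branch
}
# Output label -> input labels that read the same data block.
CONNECTED = {"x2": ["x8"], "x3": ["x4"], "x5": ["x6"], "x7": ["x9"]}


def composite(src: str, dst: str) -> Dep | None:
    """Strongest type, over all annotation paths from src to dst, of the
    weakest annotation on each path (assumes an acyclic workflow)."""
    best = None
    for (i, o), t in DEP_RULES.items():
        if i != src:
            continue
        if o == dst:
            candidates = [t]
        else:
            rests = (composite(nxt, dst) for nxt in CONNECTED.get(o, []))
            candidates = [min(t, r) for r in rests if r is not None]
        for c in candidates:
            if best is None or c > best:
                best = c
    return best


print(composite("x1", "x10").name)  # DERIVED_FROM
```

The top branch contributes min(FlowsFrom, DerivedFrom) = FlowsFrom, the bottom branch DerivedFrom, and the stronger of the two wins.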
  • 15. Use Case 1: Infer Composite Dependencies
    Given annotations on blocks (steps), find composite annotations:
    - helps verify the intent and construction of a workflow
    - e.g., that certain outputs are derived from inputs
    [Figure: two-step workflow (normalize, filter) over data d1 to d5, with edge labels xrange, x1 to x4, and xcutoff; block annotations DependsOn, SameAs, DerivedFrom; inferred DerivedFrom annotations shown in blue]
  • 16. Use Case 2: Constraining Dependency Annotations
    Add annotations to constrain choices:
    - e.g., we may know that the output should be derived from the input
    - this can guide (constrain) block-level annotation choices
    - or guide the workflow design itself
    [Figure: p1 → d2 → p2 with composite annotation DerivedFrom between x1 and x4; each block-level annotation (x1, x2 in p1 and x3, x4 in p2) is then "DerivedFrom, ValueOf, or SameAs?"]
  • 17. Use Case 3: Validating Dependency Annotations
    Ensure annotations are compatible:
    - e.g., detect when lower-level (block) annotations are not consistent with a composite annotation (shown in purple)
    [Figure: step generate sample (inputs xin, xtype, xiter; output xout) with composite annotation DerivedFrom, refined into steps initial sample and perturb whose block annotations (DependsOn, DerivedFrom) are inconsistent with it]
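For a single chain of blocks, validation reduces to comparing a declared composite type with the weakest block-level type on the chain. This is a simplified sketch under that single-path assumption; `check_composite` is a hypothetical name, and the example chain is illustrative rather than a reconstruction of the figure.

```python
from __future__ import annotations

from enum import IntEnum


class Dep(IntEnum):
    FLOWS_FROM = 1
    DEPENDS_ON = 2
    DERIVED_FROM = 3
    VALUE_OF = 4
    SAME_AS = 5


def check_composite(block_types: list[Dep], declared: Dep) -> tuple[bool, Dep]:
    """Check a declared composite annotation against ONE chain of block-level
    annotations; the chain implies the weakest of its types."""
    inferred = min(block_types)
    return inferred == declared, inferred


# Hypothetical chain: a DependsOn block followed by a DerivedFrom block,
# with a declared composite of DerivedFrom.
ok, inferred = check_composite([Dep.DEPENDS_ON, Dep.DERIVED_FROM], Dep.DERIVED_FROM)
print(ok, inferred.name)  # False DEPENDS_ON
```

With several paths, the declared type would instead be compared with the strongest path type, as on the multiple-paths slide.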
  • 18. Dependency Reasoning Prototype Implementation
    Answer-Set Programming (ASP) prototype in Potassco (clingo)
    High-level idea: use a generate-and-test algorithm
    (i) "guess" annotations for non-annotated input-output pairs
    (ii) ensure annotations satisfy the composition rules
    (iii) ensure annotations satisfy the "strongest-path" constraint
    The result is all possible, complete annotation sets (possible worlds); then
    (iv) find all annotations common to all worlds
    (v) report the possible choices for the remaining input-output pairs
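The generate-and-test idea can be mimicked by brute-force enumeration in Python. This is not the ASP prototype, only a small sketch of steps (i) to (v) for a hypothetical two-step chain with both block-level pairs unannotated and a user-supplied composite constraint of DerivedFrom (as in Use Case 2). For a single chain, the composition and strongest-path checks collapse into "the weakest link equals the required type".

```python
from __future__ import annotations

from enum import IntEnum
from itertools import product


class Dep(IntEnum):
    FLOWS_FROM = 1
    DEPENDS_ON = 2
    DERIVED_FROM = 3
    VALUE_OF = 4
    SAME_AS = 5


# Hypothetical chain x1 -> x2 ~> x3 -> x4 with both block-level pairs
# unannotated, plus a user-supplied composite constraint on (x1, x4).
UNKNOWN = [("x1", "x2"), ("x3", "x4")]
REQUIRED = Dep.DERIVED_FROM


def worlds() -> list[dict[tuple[str, str], Dep]]:
    """Generate every assignment of types to UNKNOWN and keep those that pass."""
    kept = []
    for choice in product(Dep, repeat=len(UNKNOWN)):  # (i) guess
        world = dict(zip(UNKNOWN, choice))
        # (ii)+(iii): on a single chain the composite is the weakest link
        if min(world.values()) == REQUIRED:
            kept.append(world)
    return kept


ws = worlds()
# (iv) annotations common to all worlds
common = {p: ws[0][p] for p in UNKNOWN if all(w[p] == ws[0][p] for w in ws)} if ws else {}
# (v) remaining choices per unannotated pair
choices = {p: sorted({w[p] for w in ws}) for p in UNKNOWN}
print(len(ws), common)  # 5 {}
print([t.name for t in choices[("x1", "x2")]])
# ['DERIVED_FROM', 'VALUE_OF', 'SAME_AS']
```

The reported choices match the slide's "DerivedFrom, ValueOf, or SameAs?" for each block. Brute force grows exponentially with the number of unannotated pairs, which is one reason to delegate the search to an ASP solver.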
  • 19. Prototype Implementation (cont.)
    The following "choice rule" guesses annotations:
        {dep_rule(I,O,R) : dep_type(R)} = 1 :- up_stream(I,O).
    The up_stream relation finds all possible input-output pairs:
        up_stream(I,O) :- in(I,P,_), out(O,P,_).
        up_stream(I,O) :- in(I,P1,_), out(O1,P1,D1), in(I2,P2,D1), up_stream(I2,O).
    The following constraint ensures that the composition rules are satisfied:
        :- dep_rule(I,O,R), not valid_dep_path(I,O,R).
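The two up_stream rules compute a least fixpoint, which can be reproduced in Python by iterating until no new pair appears. The facts mirror the argument order of the ASP predicates, in(Label, Program, Data) and out(Label, Program, Data); the example chain is hypothetical.

```python
from __future__ import annotations


def up_stream(ins: set[tuple[str, str, str]],
              outs: set[tuple[str, str, str]]) -> set[tuple[str, str]]:
    """(input label, output label) pairs computed as the fixpoint of the two
    up_stream/2 rules; facts are (label, program, data) triples."""
    # Rule 1: an input and an output edge of the same program block.
    result = {(i, o) for (i, p, _) in ins for (o, q, _) in outs if p == q}
    changed = True
    while changed:
        changed = False
        # Rule 2: I feeds program P1, whose output writes data D1, which is
        # read by input I2, and I2 is already up-stream of some output O.
        for (i, p1, _) in ins:
            for (_, q1, d1) in outs:
                if q1 != p1:
                    continue
                for (i2, _, d2) in ins:
                    if d2 != d1:
                        continue
                    for (j, o) in list(result):
                        if j == i2 and (i, o) not in result:
                            result.add((i, o))
                            changed = True
    return result


# Hypothetical chain: d1 -x1-> p1 -x2-> d2 -x3-> p2 -x4-> d3
INS = {("x1", "p1", "d1"), ("x3", "p2", "d2")}
OUTS = {("x2", "p1", "d2"), ("x4", "p2", "d3")}
print(sorted(up_stream(INS, OUTS)))
# [('x1', 'x2'), ('x1', 'x4'), ('x3', 'x4')]
```

The pair (x1, x4) is exactly the kind of indirect pair for which the choice rule then guesses a composite type.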
  • 20. Prototype Implementation (cont.)
    The valid_dep_path relation finds valid compositions:
        valid_dep_path(I,O,R) :- in(I,P,_), out(O,P,_), dep_rule(I,O,R).
        valid_dep_path(I,O,R) :- in(I,P,_), out(O1,P,_), O != O1, dep_rule(I,O1,R1), connected(O1,I1), I != I1, valid_dep_path(I1,O,R2), compose(R1,R2,R).
    The connected relation ensures that an output is connected to an input:
        connected(O,I) :- out(O,_,D), in(I,_,D).
    compose computes the composition (where weaker_eq implements $\preceq$):
        compose(R1,R2,R1) :- weaker_eq(R1,R2).
        compose(R1,R2,R2) :- weaker_eq(R2,R1).
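The valid_dep_path rules can likewise be evaluated bottom-up. The following Python sketch mirrors the two rules, connected/2, and compose/3 on a hypothetical two-step chain; it is an illustration of the rules' meaning, not the prototype itself, and the helper names other than those in the ASP rules are assumptions.

```python
from __future__ import annotations

from enum import IntEnum


class Dep(IntEnum):
    FLOWS_FROM = 1
    DEPENDS_ON = 2
    DERIVED_FROM = 3
    VALUE_OF = 4
    SAME_AS = 5


def weaker_eq(a: Dep, b: Dep) -> bool:
    return a <= b


def compose(r1: Dep, r2: Dep) -> Dep:
    """Mirrors compose/3: the weaker of the two types."""
    return r1 if weaker_eq(r1, r2) else r2


def valid_dep_paths(ins, outs, dep_rule):
    """Fixpoint of the two valid_dep_path/3 rules.
    ins/outs: sets of (label, program, data); dep_rule: {(I, O): type}."""
    prog_in = {i: p for (i, p, _) in ins}
    prog_out = {o: p for (o, p, _) in outs}
    connected = {(o, i) for (o, _, d) in outs for (i, _, e) in ins if d == e}

    def same_step(i, o):
        return i in prog_in and prog_in[i] == prog_out.get(o)

    # Rule 1: an annotated input/output pair of the same program block.
    paths = {(i, o, r) for (i, o), r in dep_rule.items() if same_step(i, o)}
    changed = True
    while changed:
        changed = False
        # Rule 2: extend I -> O1 via connected(O1, I1) with a path I1 -> O.
        for (i, o1), r1 in dep_rule.items():
            if not same_step(i, o1):
                continue
            for (o1b, i1) in connected:
                if o1b != o1 or i1 == i:
                    continue
                for (j, o, r2) in list(paths):
                    new = (i, o, compose(r1, r2))
                    if j == i1 and o != o1 and new not in paths:
                        paths.add(new)
                        changed = True
    return paths


# Hypothetical chain: d1 -x1-> p1 -x2-> d2 -x3-> p2 -x4-> d3
INS = {("x1", "p1", "d1"), ("x3", "p2", "d2")}
OUTS = {("x2", "p1", "d2"), ("x4", "p2", "d3")}
RULES = {("x1", "x2"): Dep.SAME_AS, ("x3", "x4"): Dep.DERIVED_FROM}
print(sorted((i, o, r.name) for i, o, r in valid_dep_paths(INS, OUTS, RULES)))
# [('x1', 'x2', 'SAME_AS'), ('x1', 'x4', 'DERIVED_FROM'), ('x3', 'x4', 'DERIVED_FROM')]
```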
  • 21. Prototype Implementation (cont.)
    Finally, the following constraint ensures "strongest" paths:
        :- dep_rule(I,O,R), valid_dep_path(I,O,R1), weaker_eq(R,R1), R != R1.
    Recently added the NotFlowsFrom type (e.g., for subworkflows):
    - required only minimal changes: $\text{NotFlowsFrom} \prec \text{FlowsFrom}$
    - full subworkflow support is not yet implemented (future work)
    [Figure: example workflow with steps p1 and p2, data blocks d1 to d4, and edge labels x1 to x4]
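Together, the two integrity constraints say that a chosen type R for a pair must be the type of some valid path and that no valid path may be strictly stronger. The sketch below checks both conditions in Python and places NotFlowsFrom below FlowsFrom, following the ordering stated on the slide; the numeric values and the function name `satisfies_path_constraints` are assumptions.

```python
from __future__ import annotations

from enum import IntEnum


class Dep(IntEnum):
    NOT_FLOWS_FROM = 0  # assumed weakest: NotFlowsFrom ≺ FlowsFrom
    FLOWS_FROM = 1
    DEPENDS_ON = 2
    DERIVED_FROM = 3
    VALUE_OF = 4
    SAME_AS = 5


def satisfies_path_constraints(r: Dep, path_types: set[Dep]) -> bool:
    """Check a chosen annotation r for a pair (I, O) against the types of
    all valid_dep_path(I, O, _) facts for that pair."""
    # :- dep_rule(I,O,R), not valid_dep_path(I,O,R).
    if r not in path_types:
        return False
    # :- dep_rule(I,O,R), valid_dep_path(I,O,R1), weaker_eq(R,R1), R != R1.
    return not any(r <= r1 and r != r1 for r1 in path_types)


paths = {Dep.FLOWS_FROM, Dep.DERIVED_FROM}
print([t.name for t in Dep if satisfies_path_constraints(t, paths)])
# ['DERIVED_FROM']
print(min(Dep.NOT_FLOWS_FROM, Dep.SAME_AS).name)  # NOT_FLOWS_FROM
```

Because composition takes the weaker type, adding NotFlowsFrom at the bottom of the order means any path through a NotFlowsFrom link composes to NotFlowsFrom, with no other rule changes.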
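  • 22. Preliminary Performance Results
    (1) Increasing the depth of the workflow (2-50 steps) and the % of block annotations
    [Figure: chain-shaped workflow from start blocks ps, ds to end blocks pe, de]
    (2) Increasing the width of the workflow (2-50 steps) and the % of block annotations
    [Figure: wide workflow whose parallel steps lead to end blocks pe, de]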
  • 23. Future Work
    Add dependency annotations to YesWorkflow's annotation types:
    - combine schema-level support and extend trace-level support
    Apply schema-level dependency annotations to workflows in YesWorkflow (YW):
    - we can now do this, e.g., for paleocar (with NotFlowsFrom)
    - extend annotation types as needed
    Develop specialized reasoning support (as needed):
    - ASP is great for prototyping!
    - but performance can be improved with a dedicated implementation
  • 24. Dr. Shawn Bowers presenting the paper on July 10th, 2018 at IPAW, King’s College, London, UK.