Validation and Inference of Schema-Level Workflow Data-Dependency Annotations

Shawn Bowers¹, Timothy McPhillips², Bertram Ludäscher²
¹ Dept. of Computer Science, Gonzaga University
² School of Information Sciences, University of Illinois, Urbana-Champaign
IPAW 2018

Scientific Workflows and Provenance
A workflow specification modeled as a graph of computation
steps (nodes) and data/control flow (edges)
[Figure: two example workflow graphs. Labels in the first: gen_boundary_region, boundary_coordinates, user_map_marker_pos, prism_data, file:data/112W36N.nc. Labels in the second: gen, filter, d1, d2, d3, c.]
Steps are often “black boxes” (invoke external programs)

Scientific Workflows and Provenance
During a workflow execution, systems record “provenance” information ...
• invocation of steps
• data received/produced by steps
A workflow trace modeled as a graph of invocations and corresponding data
• a trace is a specification instance
• capturing details of a workflow run
[Figure: example trace graphs over invocations of gen and filter]
Different traces of the same specification

Data Dependency Assumptions and Issues
Traces are used to infer the “lineage” of data products (*)
• e.g., all steps and inputs/outputs that led to an output
• assume all outputs “depend on” all inputs of a step
[Figure: example trace with invocations of gen and filter]
However, the inferred “dependencies” can be incorrect and vague
1. some outputs might not “depend on” all inputs
2. outputs can depend on inputs differently (derivation, copy, ...)
(*) some systems provide APIs for steps to declare dependencies at runtime
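The default assumption above (every output depends on every input of its step) can be sketched as a transitive walk over a trace. This is only an illustration: the `TRACE` dictionary, its data item names, and the `naive_lineage` helper are hypothetical and are not taken from any provenance system.

```python
# Hypothetical trace: each invocation lists the data items it read and wrote.
TRACE = {
    "gen:1": {"inputs": ["d1"], "outputs": ["d2"]},
    "filter:1": {"inputs": ["d2", "c"], "outputs": ["d3"]},
}


def naive_lineage(trace, item):
    """Collect everything `item` transitively depends on, assuming that
    every output of an invocation depends on all of its inputs."""
    produced_by = {out: inv for inv, io in trace.items() for out in io["outputs"]}
    seen_items, seen_invocations = set(), set()
    stack = [item]
    while stack:
        current = stack.pop()
        invocation = produced_by.get(current)
        if invocation is None:
            continue  # a workflow input: nothing upstream
        seen_invocations.add(invocation)
        for source in trace[invocation]["inputs"]:
            if source not in seen_items:
                seen_items.add(source)
                stack.append(source)
    return seen_items, seen_invocations


items, invocations = naive_lineage(TRACE, "d3")
```

Under this assumption d3 is reported as depending on c and d1 alike, with no way to say whether it was derived from them, copied from them, or merely produced while they were present.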

Prospective (Schema-Level) Dependency Annotations
Our approach:
• Allow workflow authors to specify dependency patterns (annotations)
• Support different data dependency types
• Use dependency annotations to infer trace-level dependencies

Prior work:
• Allows dependency annotations for individual workflow steps
• Rules for extracting trace-level invocation dependencies
• Requires each step to be (fully) annotated
Current contributions focus on workflow design:
1. Allow partially annotated workflow specifications
2. Infer complete sets of (possible) annotations
3. Validate correctness of annotations

Workflow Specifications
Minimally, a workflow specification W = (P, D, E) consists of
• a set P of program blocks (computation steps)
• a set D of data blocks (data items or containers)
• a set E ⊆ P × L × D × {in, out} of uniquely labeled edges
[Figure: example with program blocks p1, p2, data block d1, and edge labels x1, x2]
We use in(p_i, x_i, d_i) and out(p_j, x_j, d_j) for input and output edges
• where x_i, x_j are labels in L
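The definition W = (P, D, E) can be written down as a small data structure. The sketch below is illustrative only: the `Edge` and `Workflow` classes and their methods are hypothetical names, and the gen/filter wiring is an assumed reading of the running example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    program: str    # element of P
    label: str      # element of L, unique within the workflow
    data: str       # element of D
    direction: str  # "in" or "out"


@dataclass
class Workflow:
    programs: set
    data: set
    edges: list = field(default_factory=list)

    def validate(self):
        """Check that E is a set of uniquely labeled edges over P and D."""
        labels = [e.label for e in self.edges]
        if len(labels) != len(set(labels)):
            raise ValueError("edge labels must be unique")
        for e in self.edges:
            if e.program not in self.programs or e.data not in self.data:
                raise ValueError(f"edge {e.label} references an unknown block")
            if e.direction not in ("in", "out"):
                raise ValueError(f"edge {e.label} has an invalid direction")

    def labels(self, program, direction):
        """Sorted edge labels of one program block in one direction."""
        return sorted(e.label for e in self.edges
                      if e.program == program and e.direction == direction)


# Assumed wiring: d1 -> gen -> d2 -> filter -> d3, with c as a second filter input.
workflow = Workflow(
    programs={"gen", "filter"},
    data={"d1", "d2", "d3", "c"},
    edges=[
        Edge("gen", "n", "d1", "in"),
        Edge("gen", "r", "d2", "out"),
        Edge("filter", "v1", "d2", "in"),
        Edge("filter", "cutoff", "c", "in"),
        Edge("filter", "v2", "d3", "out"),
    ],
)
workflow.validate()
```

Because labels are unique, an input-output pair of labels is enough to identify where a dependency annotation applies.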

Dependency Annotations
Dependency annotations A ⊆ L_out × L_in × T for a workflow W ...
• associate dependency types t ∈ T (more later)
• to input-output edge pairs of W (identified by their labels)
We use dep_rule(x_i, x_j, t) for annotations x_i --t--> x_j (drawn in red)
[Figure: example workflow with program blocks gen and filter, data blocks d1, d2, d3, c, and edge labels n, r, v1, v2, cutoff; annotations DependsOn, CopyOf, DependsOn drawn in red]
• dep_rule(n, r, depends_on), dep_rule(v1, v2, copy_of),
dep_rule(cutoff, v2, depends_on)

Dependency Types
We consider five different dependency annotation types ... (*, †)
FlowsFrom: input present during invocation (e.g., a trigger)
DependsOn: output has control (statement) dependency on input
DerivedFrom: output has data (read-after-write) dependency on input
ValueOf: input value copied to the output (new data item)
SameAs: input copied to the output (same item “passed through”)

Dependency Types
Ordered from weakest to strongest form of dependency ...
FlowsFrom ≺ DependsOn ≺ DerivedFrom ≺ ValueOf ≺ SameAs
Or as subclasses (e.g., FlowsFrom+ as “at least FlowsFrom”) ...
FlowsFrom+ ⊒ DependsOn+ ⊒ DerivedFrom+ ⊒ ValueOf+ ⊒ SameAs+
(*) Plus NotFlowsFrom, described later
(†) A more formal description is given in the paper
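One way to picture the ordering and the “at least” subclasses is an ordered enumeration. This is a sketch only: the `DepType` enum and the `at_least` and `weaker_eq` helpers are hypothetical names, and NotFlowsFrom is left out.

```python
from enum import IntEnum


class DepType(IntEnum):
    """The five annotation types, numbered from weakest to strongest."""
    FLOWS_FROM = 1
    DEPENDS_ON = 2
    DERIVED_FROM = 3
    VALUE_OF = 4
    SAME_AS = 5


def weaker_eq(a, b):
    """True if type a is weaker than or equal to type b."""
    return a <= b


def at_least(t):
    """Members of the subclass t+ ("at least t")."""
    return {u for u in DepType if u >= t}


derived_plus = at_least(DepType.DERIVED_FROM)
```

Read this way, SameAs+ is the smallest class and FlowsFrom+ contains all five types, which mirrors the subclass chain on the slide.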

Reasoning using Dependency Composition
Given two “connected” program blocks:
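[Figure: p1 with input x1 (from d1) and output x2 (to d2); p2 with input x3 (from d2) and output x4 (to d3); annotations t_i on p1, t_j on p2, and the composite t from x1 to x4]
A composite (indirect) dependency x1 --t--> x4 is the weaker of the dependencies x1 --t_i--> x2 and x3 --t_j--> x4
dep_rule(x1, x2, t_i) ∧ dep_rule(x3, x4, t_j) ∧ t_i ⪯ t_j ↔ dep_rule(x1, x4, t_i)
dep_rule(x1, x2, t_i) ∧ dep_rule(x3, x4, t_j) ∧ t_j ⪯ t_i ↔ dep_rule(x1, x4, t_j)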
This extends to longer “chains” of connected program blocks
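As a minimal sketch of the rule, composing along a chain amounts to taking the weakest type on it. The `ORDER` list, `RANK` table, and both function names are illustrative assumptions, not part of the prototype.

```python
from functools import reduce

# Dependency types from weakest to strongest.
ORDER = ["FlowsFrom", "DependsOn", "DerivedFrom", "ValueOf", "SameAs"]
RANK = {name: i for i, name in enumerate(ORDER)}


def compose(t_i, t_j):
    """Composite of two consecutive dependencies: the weaker of the two."""
    return t_i if RANK[t_i] <= RANK[t_j] else t_j


def compose_chain(types):
    """Composite of a chain of one or more consecutive dependencies."""
    if not types:
        raise ValueError("a chain needs at least one dependency")
    return reduce(compose, types)


two_step = compose("DerivedFrom", "SameAs")
four_step = compose_chain(["SameAs", "ValueOf", "DependsOn", "DerivedFrom"])
```

Here `two_step` is DerivedFrom and `four_step` is DependsOn: a single weak link bounds the whole chain.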

Dependency Composition with Multiple Paths
When multiple annotation “paths” exist ...
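[Figure: workflow with program blocks p1-p4, data blocks d1-d5, and edge labels x1-x9, in which a top path and a bottom path connect the same input and output; block annotations FlowsFrom, DerivedFrom, SameAs, DerivedFrom and a composite annotation DerivedFrom]
The composite annotation type is the strongest type of the paths
• the top path implies FlowsFrom
• the bottom path implies DerivedFrom
• the inferred type is DerivedFrom (i.e., “at least DerivedFrom”)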
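The multi-path case can be sketched as "weakest type along each path, then strongest across paths". The per-path annotation lists below are an assumed reading of the figure, and all names are hypothetical.

```python
# Dependency types from weakest to strongest.
ORDER = ["FlowsFrom", "DependsOn", "DerivedFrom", "ValueOf", "SameAs"]
RANK = {name: i for i, name in enumerate(ORDER)}


def path_type(path):
    """Type implied by one path: the weakest annotation along it."""
    return min(path, key=RANK.__getitem__)


def composite_type(paths):
    """Type implied by several alternative paths: the strongest path type."""
    return max((path_type(p) for p in paths), key=RANK.__getitem__)


# Assumed annotations along the two paths (not read off the original figure).
top_path = ["FlowsFrom", "DerivedFrom"]
bottom_path = ["SameAs", "DerivedFrom"]

inferred = composite_type([top_path, bottom_path])
```

With these inputs the top path yields FlowsFrom, the bottom path yields DerivedFrom, and the composite is DerivedFrom, as on the slide.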

Use Case 1: Infer Composite Dependencies
Given annotations on blocks (steps), find composite annotations
• helps verify intent and construction of the workflow
• e.g., that certain outputs are derived from inputs
[Figure: workflow with program blocks normalize and filter, data blocks d1-d5, and edge labels x1-x4, xrange, xcutoff; annotations DependsOn, SameAs, and DerivedFrom]
Inferred annotations shown in blue

Use Case 2: Constraining Dependency Annotations
Add annotations to constrain choices
• e.g., may know the output should be derived from the input
• which can guide (constrain) block-level annotation choices
• or guide the workflow design itself
[Figure: chain of p1 and p2 with data blocks d1-d3 and edge labels x1-x4; the composite annotation is DerivedFrom, and each block annotation is marked “DerivedFrom, ValueOf, or SameAs?”]
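For a two-block chain, the choices left open by a composite annotation can be enumerated directly. The sketch assumes the composite must equal the weaker of the two block annotations exactly, and `block_choices` is a hypothetical helper.

```python
from itertools import product

# Dependency types from weakest to strongest.
ORDER = ["FlowsFrom", "DependsOn", "DerivedFrom", "ValueOf", "SameAs"]
RANK = {name: i for i, name in enumerate(ORDER)}


def compose(t_i, t_j):
    """Composite of two consecutive dependencies: the weaker of the two."""
    return t_i if RANK[t_i] <= RANK[t_j] else t_j


def block_choices(composite):
    """All (t1, t2) block annotations whose composition equals `composite`."""
    return [(t1, t2) for t1, t2 in product(ORDER, repeat=2)
            if compose(t1, t2) == composite]


pairs = block_choices("DerivedFrom")
first_block = sorted({t1 for t1, _ in pairs}, key=RANK.__getitem__)
second_block = sorted({t2 for _, t2 in pairs}, key=RANK.__getitem__)
```

Each block is left with DerivedFrom, ValueOf, or SameAs, but not every combination survives: at least one of the two must be exactly DerivedFrom.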

Use Case 3: Validating Dependency Annotations
Ensure annotations are compatible
• e.g., detect lower-level (block) annotations that are not consistent with the composite annotation (shown in purple)
[Figure: two views of a sampling workflow. Block names: generate sample, initial sample, perturb, p1, p2. Data blocks: d1, d2, dtype, diter, din, dout. Edge labels: xin, xout, xtype, xiter, n, x1, x2, s. Annotations: DerivedFrom and DependsOn.]

Dependency Reasoning Prototype Implementation
Answer-Set Programming (ASP) prototype in Potassco (clingo)
High-level idea: use a generate-and-test algorithm
(i) “guess” annotations for non-annotated input-output pairs
(ii) ensure annotations satisfy composition rules
(iii) ensure annotations satisfy “strongest-path” constraint
Result is all possible and complete annotation sets (possible worlds)
(iv) find all annotations common to all worlds
(v) report possible choices for remaining input-output pairs
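Steps (i) to (v) can be imitated by brute force outside ASP. This is a simplified sketch rather than the prototype: paths are listed by hand instead of being derived from edges, composites are computed from the guessed block annotations instead of being guessed themselves, and every label and function name is hypothetical.

```python
from itertools import product

# Dependency types from weakest to strongest.
ORDER = ["FlowsFrom", "DependsOn", "DerivedFrom", "ValueOf", "SameAs"]
RANK = {name: i for i, name in enumerate(ORDER)}


def infer_composite(world, paths):
    """(ii) weakest type along each path, (iii) strongest across paths."""
    per_path = [min((world[b] for b in path), key=RANK.get) for path in paths]
    return max(per_path, key=RANK.get)


def possible_worlds(block_pairs, composite_paths, known):
    """Enumerate complete annotation sets consistent with `known`."""
    free = [p for p in block_pairs if p not in known]
    worlds = []
    for guess in product(ORDER, repeat=len(free)):  # (i) guess
        world = {p: known[p] for p in block_pairs if p in known}
        world.update(zip(free, guess))
        composites = {pair: infer_composite(world, paths)
                      for pair, paths in composite_paths.items()}
        # test: a declared composite must match the inferred one
        if all(known.get(pair, t) == t for pair, t in composites.items()):
            world.update(composites)
            worlds.append(world)
    return worlds


def summarize(worlds):
    """(iv) annotations common to all worlds, (v) choices for the rest."""
    common, choices = {}, {}
    for pair in (worlds[0] if worlds else {}):
        values = {w[pair] for w in worlds}
        if len(values) == 1:
            common[pair] = values.pop()
        else:
            choices[pair] = sorted(values, key=RANK.get)
    return common, choices


# Hypothetical diamond: input a reaches output j via (a,b),(d,e),(h,j) or (a,c),(f,g),(i,j).
block_pairs = [("a", "b"), ("d", "e"), ("h", "j"), ("a", "c"), ("f", "g"), ("i", "j")]
composite_paths = {("a", "j"): [[("a", "b"), ("d", "e"), ("h", "j")],
                                [("a", "c"), ("f", "g"), ("i", "j")]]}
known = {("a", "b"): "SameAs", ("h", "j"): "SameAs", ("a", "c"): "SameAs",
         ("i", "j"): "SameAs", ("a", "j"): "DerivedFrom"}

worlds = possible_worlds(block_pairs, composite_paths, known)
common, choices = summarize(worlds)
```

In this example five worlds survive; the two unannotated pairs (d,e) and (f,g) may each be FlowsFrom, DependsOn, or DerivedFrom, while the composite stays DerivedFrom in every world. An empty list of worlds would signal incompatible annotations, which is the validation use case.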

Prototype Implementation (cont)
The following “choice rule” guesses exactly one annotation per input-output pair
{dep_rule(I,O,R) : dep_type(R)} = 1 :- up_stream(I,O).
The up_stream relation finds all possible input-output pairs
up_stream(I,O) :- in(I,P,_), out(O,P,_).
up_stream(I,O) :- in(I,P1,_), out(O1,P1,D1),
                  in(I2,P2,D1), up_stream(I2,O).
The following constraint ensures composition rules are satisfied
:- dep_rule(I,O,R), not valid_dep_path(I,O,R).

The valid_dep_path relation finds valid compositions
valid_dep_path(I,O,R) :- in(I,P,_), out(O,P,_),
                         dep_rule(I,O,R).
valid_dep_path(I,O,R) :- in(I,P,_), out(O1,P,_), O != O1,
                         dep_rule(I,O1,R1), connected(O1,I1),
                         I != I1, valid_dep_path(I1,O,R2),
                         compose(R1,R2,R).
The connected relation ensures an output is connected to an input
connected(O,I) :- out(O,_,D), in(I,_,D).
compose computes composition (where weaker_eq implements ⪯)
compose(R1,R2,R1) :- weaker_eq(R1,R2).
compose(R1,R2,R2) :- weaker_eq(R2,R1).

Finally, the following constraint ensures “strongest” paths
:- dep_rule(I,O,R), valid_dep_path(I,O,R1),
weaker_eq(R,R1), R != R1.
Recently added NotFlowsFrom type (e.g., for subworkflows)
• Required only minimal changes: NotFlowsFrom ≺ FlowsFrom
• Full subworkflow support not yet implemented (future work)
[Figure: example with program blocks p1, p2, data blocks d1-d4, and edge labels x1-x4]

Preliminary Performance Results
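(1) Increase the depth of the workflow (2-50 steps) and % of block annotations
[Figure: workflow of increasing depth, with blocks labeled ps, ds, ..., pe, de]
(2) Increase the width of the workflow (2-50 steps) and % of block annotations
[Figure: workflow of increasing width, with blocks labeled pe, de, ...]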

Future Work
Add dependency annotations to YesWorkflow’s annotation types
• combine schema-level support and extend trace-level support
Apply schema-level dependency annotations to workflows in YW
• we can now do this, e.g., for paleocar (with NotFlowsFrom)
• extend annotation types as needed
Develop specialized reasoning support (as needed)
• ASP great for prototyping!
• but can improve performance with dedicated implementation

Dr. Shawn Bowers presenting the paper on July 10th, 2018 at IPAW, King’s College, London, UK.
