This document discusses the symbiotic relationship between provenance and workflows in scientific research. It notes that workflows provide automation and integration capabilities, while provenance provides documentation of what transpired. The document provides examples of workflow and provenance technologies and outlines challenges around interoperability. It concludes that recognizing the interdependent relationship between provenance and workflows can help advance systems science research.
The Symbiotic Nature of Provenance and Workflow
1. The Symbiotic Nature of Provenance and Workflow
Eric Stephan, Todd Halter
Pacific Northwest National Laboratory
2. The Systems Science Challenge
• Studying complex systems typically has the following characteristics:
  • Interdisciplinary problem involving various stakeholders
  • Leverages multiple tools, algorithms, data products, and sensors
  • Relies on highly iterative and repetitive techniques
  • Steps are difficult to document and are often committed to memory or notes
• The solution is to provide:
  • ‘Plumbing’ to more easily configure and automate integration, calculation, analysis, and visualization
  • A historical explanation of what occurred
3. Active Computer Science Research Areas
• Workflows – plumbing
• Provenance – explanation
• Without a historical explanation, workflows provide capability but neglect a documentation trail of what transpired.
• Without plumbing, provenance is difficult to introduce generically or to support legacy applications.
4. Example Workflow Products
• Creating executable workflows from schematic drawings
I. Altintas, O. Barney, Z. Cheng, T. Critchlow, B. Ludaescher, S. Parker, A. Shoshani, M. Vouk, “Accelerating the Scientific Exploration Process with Scientific Workflows”, In Journal of Physics: Conference Series SciDAC 2006 proceedings. June 2006.
5. Example Workflow Products
• Constructing component-based analytical pipelines on enterprise service bus technology
• MeDICi: Middleware for Data-Intensive Computing
Gorton I, AS Wynne, JP Almquist, and J Chatterton. 2008. “The MeDICi Integration Framework: A Platform for High Performance Data Streaming Applications.” In WICSA 2008. 7th IEEE/IFIP Working Conference on Software Architecture, Feb. 18-22, 2008, Vancouver, Canada, pp. 95-104. IEEE Computer Society, Los Alamitos, CA. doi:10.1109/WICSA.2008.21
6. Example of Provenance
• Digital Library, Lineage
• Extensible open model – Open Provenance Model
Moreau L, B Clifford, J Freire, J Futrelle, Y Gil, P Groth, N Kwasnikowska, S Miles, P Missier, J Myers, BA Plale, YL Simmhan, EG Stephan, and J Van den Bussche. 2010. "The Open Provenance Model Core Specification (v1.1)." Future Generation Computer Systems. doi:10.1016/j.future.2010.07.005
• Semantic web-based models – Proof Markup Language
W3C Incubator Group, http://www.w3.org/2005/Incubator/prov/wiki/W3C_Provenance_Incubator_Group_Wiki
7. Examples of Creating Connectivity…
• Workflows
  • Event listeners
  • Self-describing workflow components, flow
• Provenance
  • Formally described
  • Support for reasoning, transitive closure, etc.
  • Semantically relevant to provenance consumers
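The "support for reasoning, transitive closure" item above can be illustrated with a minimal sketch. It assumes provenance is held as a list of (effect, relation, cause) edges that borrow the Open Provenance Model relation names `used` and `wasGeneratedBy`. The artifact names, process names, and the `lineage` helper are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical provenance edges (effect, relation, cause), OPM-style.
EDGES = [
    ("plot.png", "wasGeneratedBy", "render"),
    ("render", "used", "stats.csv"),
    ("stats.csv", "wasGeneratedBy", "aggregate"),
    ("aggregate", "used", "raw.nc"),
]


def lineage(node, edges):
    """Return every node reachable from `node` by following edges
    toward their causes (the transitive closure of its dependencies)."""
    causes = defaultdict(list)
    for effect, _relation, cause in edges:
        causes[effect].append(cause)
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for cause in causes[current]:
            if cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen


ancestors = lineage("plot.png", EDGES)
print(sorted(ancestors))
```

A consumer asking "what did this plot depend on?" gets the full chain of processes and artifacts, not only the immediate parent, which is the kind of question a formally described provenance record is meant to answer.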
8. Existing Deficiencies
• Workflows
  • Listeners only reporting syntactic events
  • Deluge of atomic transactions
  • Inability to convey logical constructs
    • E.g. initialization stage
  • Lack of support to collect logs from legacy applications
• Provenance
  • Collecting naïve provenance – big graph dilemma
  • Hardcoded – risk being out of sync with workflow
  • Collection without end user requirements
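The first three workflow deficiencies can be pictured with a small sketch: a listener reports many atomic, syntactic events and has no notion of a logical stage such as initialization. The event format, the task names, and the hand-written `STAGE_OF` mapping below are assumptions made for illustration. The code collapses a stream of atomic events into one record per logical stage.

```python
from itertools import groupby

# Hypothetical atomic events as a listener might report them: (timestamp, task).
EVENTS = [
    (0, "open_config"),
    (1, "read_config"),
    (2, "allocate_buffers"),
    (3, "load_input"),
    (4, "run_model"),
    (5, "write_output"),
]

# Assumed mapping from syntactic task names to logical stages.
STAGE_OF = {
    "open_config": "initialization",
    "read_config": "initialization",
    "allocate_buffers": "initialization",
    "load_input": "execution",
    "run_model": "execution",
    "write_output": "finalization",
}


def summarize(events, stage_of):
    """Collapse consecutive events of the same stage into one record."""
    records = []
    by_stage = groupby(events, key=lambda e: stage_of.get(e[1], "unknown"))
    for stage, group in by_stage:
        group = list(group)
        records.append({
            "stage": stage,
            "start": group[0][0],
            "end": group[-1][0],
            "events": len(group),
        })
    return records


for record in summarize(EVENTS, STAGE_OF):
    print(record)
```

The mapping has to come from somewhere, which is the point of the slide: unless the workflow can describe its own logical constructs, someone must supply them by hand.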
9. Interoperability Aids
• Applying provenance execution models to workflow listeners
  • E.g. Describe Anything DaAPI
Wynne AS, I Gorton, JM Chase, and EG Stephan. 2009. MeDICi: An Open Platform for Sensor Integration. PNNL-18716, Pacific Northwest National Laboratory, Richland, WA.
• Incorporating provenance in workflow framework
  • Semantic Abstract Workflow (SAW)
Leonardo Salayandia and Paulo Pinheiro da Silva. On the Use of Semantic Abstract Workflows Rooted on Provenance Concepts. Provenance and Annotation of Data and Processes. Lecture Notes in Computer Science, 2010, Volume 6378/2010, 216-220, DOI: 10.1007/978-3-642-17819-1_24
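Applying a provenance execution model to workflow listeners, the first item on this slide, can be sketched as a listener that translates each finished step into provenance assertions instead of raw log lines. This is a toy illustration under stated assumptions, not how MeDICi, DaAPI, or SAW implement it: the `ProvenanceListener` class, the `run_workflow` engine, and the step names are all hypothetical. Only the relation names `used`, `wasGeneratedBy`, and `wasControlledBy` are taken from the Open Provenance Model.

```python
class ProvenanceListener:
    """Hypothetical listener translating workflow events into OPM-style edges."""

    def __init__(self):
        self.edges = []  # (effect, relation, cause) tuples

    def on_step_finished(self, process, inputs, outputs, agent=None):
        # Each input artifact was used by the process.
        for artifact in inputs:
            self.edges.append((process, "used", artifact))
        # Each output artifact was generated by the process.
        for artifact in outputs:
            self.edges.append((artifact, "wasGeneratedBy", process))
        # Optionally record who or what controlled the process.
        if agent is not None:
            self.edges.append((process, "wasControlledBy", agent))


def run_workflow(steps, listener):
    """Toy engine: runs each step, then notifies the listener."""
    for name, func, inputs, outputs in steps:
        func()
        listener.on_step_finished(name, inputs, outputs, agent="analyst")


listener = ProvenanceListener()
run_workflow(
    [
        ("aggregate", lambda: None, ["raw.nc"], ["stats.csv"]),
        ("render", lambda: None, ["stats.csv"], ["plot.png"]),
    ],
    listener,
)
for edge in listener.edges:
    print(edge)
```

Because the listener is attached to the engine and not written into each step, the provenance record is produced by the same plumbing that runs the workflow, which reduces the risk of it drifting out of sync.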
10. Interoperability Aids
• Advanced storage
  • Grids, Semantic Wikis
• New provenance model abstractions
Stephan EG, TD Halter, and BD Ermold. 2010. "Leveraging The Open Provenance Model as a Multi-Tier Model for Global Climate Research." In The 3rd International Provenance and Annotation Workshop (IPAW'2010).
Gibson TD, KL Schuchardt, and EG Stephan. 2009. "Application of Named Graphs Towards Custom Provenance Views." In 1st Workshop on the Theory and Practice of Provenance (TaPP '09), p. Paper No. 5. USENIX, Berkeley, CA.
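The general idea of using named graphs for custom provenance views can be sketched without an RDF library: each statement carries the name of the graph it belongs to, and a view is the set of statements drawn from the graphs a given consumer cares about. The quad layout, the graph names, the `ranOnHost` predicate, and the `view` helper are assumptions for illustration and are not the representation used in the cited paper.

```python
# Hypothetical quads: (graph_name, subject, predicate, object).
QUADS = [
    ("science", "plot.png", "wasDerivedFrom", "raw.nc"),
    ("execution", "plot.png", "wasGeneratedBy", "render"),
    ("execution", "render", "used", "stats.csv"),
    ("system", "render", "ranOnHost", "node042"),  # made-up predicate
]


def view(quads, graph_names):
    """Return the triples belonging to any of the selected named graphs."""
    wanted = set(graph_names)
    return [(s, p, o) for g, s, p, o in quads if g in wanted]


scientist_view = view(QUADS, ["science"])
operator_view = view(QUADS, ["execution", "system"])
print(scientist_view)
print(operator_view)
```

Each consumer sees a smaller graph tailored to their questions, which is one response to the "big graph dilemma" raised earlier.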
11. Conclusions
• Good news: workflow and provenance interoperability is evolving.
• Challenge #1: Recognizing the existence of a symbiotic relationship between workflow and provenance.
• Challenge #2: Finding new ways to harness this relationship to advance systems science research.