SlideShare ist ein Scribd-Unternehmen logo
1 von 40
From Scientific Workflows to
Research Objects: Publication
and Abstraction of Scientific
Experiments
Daniel Garijo Verdejo
Supervisors: Oscar Corcho, Yolanda Gil

Ontology Engineering Group
Departamento de Inteligencia Artificial
Facultad de Informática
Universidad Politécnica de Madrid
Date: 12/02/2014
Outline

Index
1.
2.
3.
4.
5.

Background
What do I do?
Motivation
Overview
Representing and publishing scientific workflows in the Web
•
•
•

Linked Data
Templates and provenance traces
Standards

6. Common motifs among scientific workflows
•

Workflow motif catalog

7. Detecting common fragments among scientific workflows
8. Workflows as part of an experiment: Research Objects
2
What are Scientific Workflows?

•“Template defining the set of tasks needed to
carry out a computational experiment” [1]
•Inputs
•Steps
•Intermediate results
•Outputs
•Data driven, usually represented as Directed
Acyclic Graphs (DAGs)
[1] Ewa Deelman, Dennis Gannon, Matthew Shields, Ian Taylor, Workflows and e-science: an overview of workflow
system features and capabilities, Future Generation Computer Systems 25 (5) (2009) 528–540.
3
How are scientific workflows created?
Laboratory Protocol
(recipe)

Experiment

Lab book

Workflow

Digital Log

• Similar to Laboratory Protocols

4
What am I working on?

•Workflow representation
•Plan/template representation
•Provenance trace representation
•Link between templates and traces

CH1: Can we export an abstract template of the
method being represented?
CH2: How do we interoperate with other workflow
results?
CH3: How do we access the workflow results?

•Creation of abstractions/motifs in scientific workflows
•Abstraction catalog
•Find how different workflows are related

CH4: How can we detect what are the typical
operations in scientific workflows?
CH5: How can we detect them automatically?

•Understandability and reuse of scientific workflows
•Relation between the workflows involved in the same experiment
(Research Objects)

CH6: Which workflow parts are related to each other?
CH7: How do workflows depend on the other parts of
the experiments?
5
Motivation

•As a designer: Discovery
•Workflows with similar functionality fragments/methods
•Design based in previous templates.

•As user/reuser: Understandability, Exploration
•Search workflows by functionality
•Commonalities between execution runs

Workflow 1

•Component categorization

6
Overview

Abstraction definitions and categorization

Descriptions/
PSMs/Ontologies

Data mining tools,
graph analysis, etc.
Algorithms for finding the different
abstractions automatically

RDF Stores
Experiment publication

Provenance
representation

Plan
representation

Vocabularies

7
Taverna and Wings

http://www.taverna.org.uk/

http://www.wings-workflows.org/

8
Representing and publishing scientific workflows in the Web

Representing and publishing scientific
workflows in the Web

A New Approach for Publishing Workflows: Abstractions, Standards, and Linked Data. Garijo, D.; and Gil, Y. In
Proceedings of the 6th workshop on Workflows in support of large-scale science, page 47-56, Seattle, 2011. ACM
9
Overview

Abstractions definitions and categorization

Algorithms for finding the different
abstractions automatically

Experiment Publication

Provenance
representation

Plan
representation

Virtuoso,
Pubby,
Wings (+Plugin)
OPMW

10
What is Linked Data?

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information.
4. Include links to other URIs.

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
8
Publishing workflows: high level architecture
Other
workflow
environments

WINGS on local laptop
Core
Portal

Workflow
Template

OPM
export

Workflow
Instance

Programatic access
(external apps)

WINGS on shared host
Core

Portal

Workflow
Template

OPM
export

Workflow
Instance

Linked
Data
Publication

WINGS on web server
Core
Portal

Wings workflow
generation

Workflow
Template
Workflow
Instance

Interactive
Browsing
(Pubby frontend)

OPM
export

OPM
conversion

Users

Publication

Share

Reuse

12
OPMW: Process view

13
OPMW: Attribution view

14
Representing and publishing scientific workflows in the Web

Standards

PROV-O: The PROV Ontology. Lebo, T.; Sahoo, S.; McGuinness, D.; Belhajjame, K.; Corsar, D.; Cheney, J.; Garijo, D.;
Soiland-Reyes, S.; Zednik, S.; and Zhao, J. W3C Consortium. 2012.
15
Overview

Abstractions definitions and categorization

Algorithms for finding the different
abstractions automatically

Experiment Publication

Provenance
representation

Plan
representation

Virtuoso,
Pubby,
Wings (+Plugin)
OPMW + PROV
+ P-PLAN

16
P-PLAN

•Plans are not provenance
•P-PLAN: Simple plan model for binding traces to template representations
•Aligned with OPMW and PROV (W3C Provenance Standard)
Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. Garijo, D.; and Gil, Y. In
Proceedings of the 2nd International Workshop on Linked Science 2012, Boston, 2012.
17
Representing and publishing scientific workflows in the Web

Common motifs among scientific
workflows

Common motifs in scientific workflows: An empirical analysis. Garijo, D.; Alper, P.; Belhajjame, K.; Corcho, O.;
Gil, Y.; and Goble, C. Future Generation Computer Systems, . 2013
18
Overview

Abstractions definitions and categorization

Motif Detection

Algorithms for automatic matching

Experiment Publication

Provenance
representation

Plan
representation

Virtuoso,
Pubby,
Wings (+Plugin)
OPMW

19
Overview

• Empirical analysis on 260 workflow templates from Taverna,
Wings, Galaxy and Vistrails
• Catalog of recurring patterns: scientific
workflow motifs.
• Data Oriented Motifs
• Workflow Oriented Motifs

•Understandability and reuse
http://sensefinancial.com/wp-content/uploads/2012/02/contribution.jpg
20
Approach

•Reverse-engineer the set of current practices in workflow
development through an analysis of empirical evidence
•Identify workflow abstractions that would facilitate
understandability and therefore effective re-use

21
Workflow Motifs
•Workflow motif: Domain independent conceptual abstraction on the workflow
steps.
1. Data-oriented motifs: What kind of manipulations does the workflow have?

•E.g.:
•Data retrieval
•Data preparation
• etc.
2.

WHAT?

Workflow-oriented motifs: How does the workflow perform its operations?
•E.g.:
•Stateful steps
•Stateless steps
•Human interactions
•etc.

HOW?
22
Motif Catalog

Data-Oriented Motifs

Workflow-Oriented Motifs

Data Retrieval
Data Preparation

Intra-Workflow Motifs
Stateful (Asynchronous) Invocations

Format Transformation

Stateless (Synchronous) Invocations

Input Augmentation
and Output Splitting

Internal Macros

Data Organisation
Data Analysis
Data Curation/Cleaning

Data Moving
Data Visualisation

Human Interactions
Inter-Workflow Motifs
Atomic Workflows
Composite Workflows
Workflow Overloading

Ontology Purl: http://purl.org/net/wf-motifs
23
Representing and publishing scientific workflows in the Web

Detecting common fragments among
scientific workflows (macro motifs)

Detecting common scientific workflow fragments using execution provenance. Garijo, D.; Corcho, O.; and Gil, Y. In
Proceedings of of the seventh international conference on Knowledge capture, page 33-40, Banff, 2013. ACM.
24
Summary: Work done at ISI

Abstractions definitions and categorization

Algorithms for automatic
matching

Macro
abstraction
detection

Experiment Publication

Provenance
representation

Plan
representation

Motif Detection

SUBDUE + PAFI
exploration and
integration in
RDF

Virtuoso,
Pubby,
Wings (+Plugin)
OPMW + PROV
+ P-PLAN

25
Macro abstraction detection
Problem statement:
Given a repository of workflow templates (either abstract or specific) or workflow
execution traces, what are the workflow fragments I can deduce from it?

Useful for:
•Systems like Taverna and Wings: (Many templates, little annotation to relate them)
•Finding relationships between workflows and sub-workflows.
•Most used fragments, most executed, etc.
•Systems like GenePattern and Galaxy: (Many runs, nearly no templates published)
•Proposing new templates with the popular fragments.

26
Challenges: Common workflow fragment detection
•Given a collection of workflows, which are the most common fragments?
•Common sub-graphs among the collection
•Sub-graph isomorphism (NP-complete)
•We use the SUBDUE algorithm [Holder et al 1994] (hierachical clustering)
•Graph Grammar learning
•The rules of the grammar are the workflow fragments
•Graph based hierarchical clustering
•Each cluster corresponds to a workflow fragment
•Iterative algorithm with two measures for compressing the graph:
•Minimum Description Length (MDL)
•Size
•Current tests with PAFI (http://glaros.dtc.umn.edu/gkhome/pafi/overview)
ongoing.
[Holder et al 1994]: Substructure Discovery in the SUBDUE System L. B. Holder, D. J. Cook, and S. Djoko. AAAI
Workshop on Knowledge Discovery, pages 169-180, 1994.
27
How does SUBDUE work?
DatasetT1

DatasetT1

ProcessType1

ProcessType1

DatasetT2

DatasetT2

DatasetT2

ProcessType2

ProcessType2

ProcessType2

DatasetT3

DatasetT3

DatasetT3

ProcessType3

Input Graph

DatasetT3

28
How does SUBDUE work?
DatasetT1

DatasetT1

ProcessType1

ProcessType1

DatasetT2

DatasetT2

DatasetT2

ProcessType2

ProcessType2

ProcessType2

DatasetT3

DatasetT3

DatasetT3

ProcessType3

Fragment1

Iteration 1

DatasetT3

29
How does SUBDUE work?

DatasetT1

DatasetT1

ProcessType1

ProcessType1

FRAG1

FRAG1

ProcessType3

DatasetT3

FRAG1

Iteration 1
result
30
How does SUBDUE work?

DatasetT1

DatasetT1

ProcessType1

ProcessType1

FRAG1

FRAG1

ProcessType3

Fragment2
FRAG1

Iteration 2

DatasetT3

31
How does SUBDUE work?

FRAG2

FRAG2

ProcessType3

DatasetT3

FRAG1

Iteration 2
result
(STOP)
32
How does SUBDUE work?

Results:
Fragment 1 (FRAG1) :

Fragment 2 (FRAG2):

DatasetT2

DatasetT1

ProcessType2
DatasetT3

Occurrences:
3 times

ProcessType1

FRAG1

2 times

33
Challenges: Generalization of workflows

Stemmer

Porter
Stemmer

Term Weighting
Lovins
Stemmer

TF

DF

CF
34
Research Objects

Workflow-Centric Research Objects: First Class Citizens in Scholarly Discourse. Belhajjame, K.; Corcho, O.;
Garijo, D.; Zhao, J.; Missier, P.; Newman, D.; Palma, R.; Bechhofer, S.; Garcıa, E.; Manuel, .G. J.; Klyne, G.; Page,
K.; Roos, M.; Ruiz, J. E.; Soiland-Reyes, S.; Verdes-Montenegro, L.; De Roure, D.; and Goble, C. In Proceedings of
the Second International Conference on the Future of Scholarly Communication and Scientific Publishing
Sepublica2012, page 1-12, Hersonissos, 2012
35
What is a Research Object?

•Aggregation of resources that bundles together the contents
of a research work:
•Data
•Experiments
•Examples
•Bibliography
•Annotations
•Provenance
•ROs
•Etc.

36
Research Objects: An Overview

+ Open Annotation

•Tool support
•Interoperability

http://www.openannotation.org/spec/core/

37
What can you find in a Research Object? A real example

TPDL 2013, Valleta, Malta.

3
Thanks !
:oscarCorcho

:yolandGil

:supervises

:idafenSantana

:supervises

Wf Infrastructure

OEG

:danielGarijo

:collaboratesWith

:olgaGiraldo

:varunRatnakar

:supervises

:collaboratesWith

:collaboratesWith
:collaboratesWith
:collaboratesWith

Laboratory Protocols

:collaboratesWith
:caroleGoble

:khalidBelhajjame

:pinarAlper
39
From Scientific Workflows to
Research Objects: Publication
and Abstraction of Scientific
Experiments
Daniel Garijo Verdejo
Supervisors: Oscar Corcho, Yolanda Gil

Ontology Engineering Group
Departamento de Inteligencia Artificial
Facultad de Informática
Universidad Politécnica de Madrid
Date: 12/02/2014

Weitere ähnliche Inhalte

Was ist angesagt?

Common Motifs in Scientific Workflows: An Empirical Analysis
Common Motifs in Scientific Workflows: An Empirical AnalysisCommon Motifs in Scientific Workflows: An Empirical Analysis
Common Motifs in Scientific Workflows: An Empirical Analysisdgarijo
 
From Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science TalesFrom Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science TalesBertram Ludäscher
 
Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?Bertram Ludäscher
 
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail ScienceSQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail ScienceUniversity of Washington
 
Benchmarking Domain-specific Expert Search using Workshop Program Committees
Benchmarking Domain-specific Expert Search using Workshop Program CommitteesBenchmarking Domain-specific Expert Search using Workshop Program Committees
Benchmarking Domain-specific Expert Search using Workshop Program CommitteesToine Bogers
 
Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotat...
Small Is Beautiful:  Summarizing Scientific Workflows  Using Semantic Annotat...Small Is Beautiful:  Summarizing Scientific Workflows  Using Semantic Annotat...
Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotat...Khalid Belhajjame
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasAngelo Salatino
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasAngelo Salatino
 

Was ist angesagt? (8)

Common Motifs in Scientific Workflows: An Empirical Analysis
Common Motifs in Scientific Workflows: An Empirical AnalysisCommon Motifs in Scientific Workflows: An Empirical Analysis
Common Motifs in Scientific Workflows: An Empirical Analysis
 
From Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science TalesFrom Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science Tales
 
Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?
 
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail ScienceSQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
SQL is Dead; Long Live SQL: Lightweight Query Services for Long Tail Science
 
Benchmarking Domain-specific Expert Search using Workshop Program Committees
Benchmarking Domain-specific Expert Search using Workshop Program CommitteesBenchmarking Domain-specific Expert Search using Workshop Program Committees
Benchmarking Domain-specific Expert Search using Workshop Program Committees
 
Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotat...
Small Is Beautiful:  Summarizing Scientific Workflows  Using Semantic Annotat...Small Is Beautiful:  Summarizing Scientific Workflows  Using Semantic Annotat...
Small Is Beautiful: Summarizing Scientific Workflows Using Semantic Annotat...
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology:  A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology:  A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 

Andere mochten auch

Research Shared: researchobject.org
Research Shared: researchobject.orgResearch Shared: researchobject.org
Research Shared: researchobject.orgNorman Morrison
 
Research Publications Management: Issues and Possibilities at SMU (P. Yeo)
Research Publications Management: Issues and Possibilities at SMU (P. Yeo)Research Publications Management: Issues and Possibilities at SMU (P. Yeo)
Research Publications Management: Issues and Possibilities at SMU (P. Yeo)ORCID, Inc
 
Publications in Research Journals
Publications in Research JournalsPublications in Research Journals
Publications in Research JournalsKnihovnaUTB
 
Lightning Talk Session - Elsevier (C. Shillum)
Lightning Talk Session - Elsevier (C. Shillum)Lightning Talk Session - Elsevier (C. Shillum)
Lightning Talk Session - Elsevier (C. Shillum)ORCID, Inc
 
Reproducibility, Research Objects and Reality, Leiden 2016
Reproducibility, Research Objects and Reality, Leiden 2016Reproducibility, Research Objects and Reality, Leiden 2016
Reproducibility, Research Objects and Reality, Leiden 2016Carole Goble
 
Types of Publications
Types of PublicationsTypes of Publications
Types of PublicationsJerry Stovall
 
Importance of research publication pg society
Importance of research publication   pg societyImportance of research publication   pg society
Importance of research publication pg societyHairul Abdul Rashid
 
RESEARCH TO PUBLICATION: A JOURNEY
RESEARCH TO PUBLICATION: A JOURNEYRESEARCH TO PUBLICATION: A JOURNEY
RESEARCH TO PUBLICATION: A JOURNEYKhalid Hakeem
 
Scientific Research and its publication
Scientific Research and its publicationScientific Research and its publication
Scientific Research and its publicationAnabela Mesquita
 

Andere mochten auch (9)

Research Shared: researchobject.org
Research Shared: researchobject.orgResearch Shared: researchobject.org
Research Shared: researchobject.org
 
Research Publications Management: Issues and Possibilities at SMU (P. Yeo)
Research Publications Management: Issues and Possibilities at SMU (P. Yeo)Research Publications Management: Issues and Possibilities at SMU (P. Yeo)
Research Publications Management: Issues and Possibilities at SMU (P. Yeo)
 
Publications in Research Journals
Publications in Research JournalsPublications in Research Journals
Publications in Research Journals
 
Lightning Talk Session - Elsevier (C. Shillum)
Lightning Talk Session - Elsevier (C. Shillum)Lightning Talk Session - Elsevier (C. Shillum)
Lightning Talk Session - Elsevier (C. Shillum)
 
Reproducibility, Research Objects and Reality, Leiden 2016
Reproducibility, Research Objects and Reality, Leiden 2016Reproducibility, Research Objects and Reality, Leiden 2016
Reproducibility, Research Objects and Reality, Leiden 2016
 
Types of Publications
Types of PublicationsTypes of Publications
Types of Publications
 
Importance of research publication pg society
Importance of research publication   pg societyImportance of research publication   pg society
Importance of research publication pg society
 
RESEARCH TO PUBLICATION: A JOURNEY
RESEARCH TO PUBLICATION: A JOURNEYRESEARCH TO PUBLICATION: A JOURNEY
RESEARCH TO PUBLICATION: A JOURNEY
 
Scientific Research and its publication
Scientific Research and its publicationScientific Research and its publication
Scientific Research and its publication
 

Ähnlich wie From Scientific Workflows to Research Objects: Publication and Abstraction of Scientific Experiments

Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015dgarijo
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityOscar Corcho
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use CasesCarole Goble
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsdgarijo
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Sciencedgarijo
 
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOMCarole Goble
 
Results may vary: Collaborations Workshop, Oxford 2014
Results may vary: Collaborations Workshop, Oxford 2014Results may vary: Collaborations Workshop, Oxford 2014
Results may vary: Collaborations Workshop, Oxford 2014Carole Goble
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsGaignard Alban
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research ObjectsCarole Goble
 
Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012
Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012
Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012Idafen Santana Pérez
 
Research Objects Tutorial (TPDL)
Research Objects Tutorial (TPDL)Research Objects Tutorial (TPDL)
Research Objects Tutorial (TPDL)dgarijo
 
AgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesAgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesRothamsted Research, UK
 
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...Richard Zijdeman
 
Metadata for Research Objects
Metadata for Research ObjectsMetadata for Research Objects
Metadata for Research Objectsseanb
 
Research Objects in Scientific Publications
Research Objects in Scientific PublicationsResearch Objects in Scientific Publications
Research Objects in Scientific Publicationsdgarijo
 
Ethics reproducibility and data stewardship
Ethics reproducibility and data stewardshipEthics reproducibility and data stewardship
Ethics reproducibility and data stewardshipRussell Jarvis
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the partsCarole Goble
 

Ähnlich wie From Scientific Workflows to Research Objects: Publication and Abstraction of Scientific Experiments (20)

Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015Creating abstractions from scientific workflows: PhD symposium 2015
Creating abstractions from scientific workflows: PhD symposium 2015
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibility
 
Credible workshop
Credible workshopCredible workshop
Credible workshop
 
The Research Object Initiative: Frameworks and Use Cases
The Research Object Initiative:Frameworks and Use CasesThe Research Object Initiative:Frameworks and Use Cases
The Research Object Initiative: Frameworks and Use Cases
 
PhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflowsPhD Thesis: Mining abstractions in scientific workflows
PhD Thesis: Mining abstractions in scientific workflows
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
 
Introduction to FAIRDOM
Introduction to FAIRDOMIntroduction to FAIRDOM
Introduction to FAIRDOM
 
Results may vary: Collaborations Workshop, Oxford 2014
Results may vary: Collaborations Workshop, Oxford 2014Results may vary: Collaborations Workshop, Oxford 2014
Results may vary: Collaborations Workshop, Oxford 2014
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
 
FAIRer Research
FAIRer ResearchFAIRer Research
FAIRer Research
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
 
Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012
Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012
Conservation of Scientific Workflow Infrastructures by Using Semantics - 2012
 
Research Objects Tutorial (TPDL)
Research Objects Tutorial (TPDL)Research Objects Tutorial (TPDL)
Research Objects Tutorial (TPDL)
 
AgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use CasesAgriFood Data, Models, Standards, Tools, Use Cases
AgriFood Data, Models, Standards, Tools, Use Cases
 
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
Linked Open Data: Combining Data for the Social Sciences and Humanities (and ...
 
Metadata for Research Objects
Metadata for Research ObjectsMetadata for Research Objects
Metadata for Research Objects
 
Research Objects in Scientific Publications
Research Objects in Scientific PublicationsResearch Objects in Scientific Publications
Research Objects in Scientific Publications
 
Ethics reproducibility and data stewardship
Ethics reproducibility and data stewardshipEthics reproducibility and data stewardship
Ethics reproducibility and data stewardship
 
Research Objects: more than the sum of the parts
Research Objects: more than the sum of the partsResearch Objects: more than the sum of the parts
Research Objects: more than the sum of the parts
 

Mehr von dgarijo

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesdgarijo
 
FAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Futuredgarijo
 
Towards Reusable Research Software
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Softwaredgarijo
 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationdgarijo
 
A Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasetsdgarijo
 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphsdgarijo
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadatadgarijo
 
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...dgarijo
 
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Datadgarijo
 
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...dgarijo
 
Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019dgarijo
 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...dgarijo
 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologiesdgarijo
 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narrativesdgarijo
 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflowsdgarijo
 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Softwaredgarijo
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineeringdgarijo
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesdgarijo
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overviewdgarijo
 
Publicación de datos y métodos científicos en investigación
Publicación de datos y métodos científicos en investigaciónPublicación de datos y métodos científicos en investigación
Publicación de datos y métodos científicos en investigacióndgarijo
 

Mehr von dgarijo (20)

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
 
FAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Future
 
Towards Reusable Research Software
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Software
 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentation
 
A Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasets
 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
 
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
 
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
 
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
 
Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019
 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologies
 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narratives
 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflows
 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Software
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineering
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciences
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
 
Publicación de datos y métodos científicos en investigación
Publicación de datos y métodos científicos en investigaciónPublicación de datos y métodos científicos en investigación
Publicación de datos y métodos científicos en investigación
 

Kürzlich hochgeladen

Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxMichelleTuguinay1
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataBabyAnnMotar
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsPooky Knightsmith
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleCeline George
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1GloryAnnCastre1
 

Kürzlich hochgeladen (20)

Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young minds
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP Module
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 

From Scientific Workflows to Research Objects: Publication and Abstraction of Scientific Experiments

  • 1. From Scientific Workflows to Research Objects: Publication and Abstraction of Scientific Experiments Daniel Garijo Verdejo Supervisors: Oscar Corcho, Yolanda Gil Ontology Engineering Group Departamento de Inteligencia Artificial Facultad de Informática Universidad Politécnica de Madrid Date: 12/02/2014
  • 2. Outline Index 1. 2. 3. 4. 5. Background What do I do? Motivation Overview Representing and publishing scientific workflows in the Web • • • Linked Data Templates and provenance traces Standards 6. Common motifs among scientific workflows • Workflow motif catalog 7. Detecting common fragments among scientific workflows 8. Workflows as part of an experiment: Research Objects 2
  • 3. What are Scientific Workflows? •“Template defining the set of tasks needed to carry out a computational experiment” [1] •Inputs •Steps •Intermediate results •Outputs •Data driven, usually represented as Directed Acyclic Graphs (DAGs) [1] Ewa Deelman, Dennis Gannon, Matthew Shields, Ian Taylor, Workflows and e-science: an overview of workflow system features and capabilities, Future Generation Computer Systems 25 (5) (2009) 528–540. 3
  • 4. How are scientific workflows created? Laboratory Protocol (recipe) Experiment Lab book Workflow Digital Log • Similar to Laboratory Protocols 4
  • 5. What am I working on? •Workflow representation •Plan/template representation •Provenance trace representation •Link between templates and traces CH1: Can we export an abstract template of the method being represented? CH2: How do we interoperate with other workflow results? CH3: How do we access the workflow results? •Creation of abstractions/motifs in scientific workflows •Abstraction catalog •Find how different workflows are related CH4: How can we detect what are the typical operations in scientific workflows? CH5: How can we detect them automatically? •Understandability and reuse of scientific workflows •Relation between the workflows involved in the same experiment (Research Objects) CH6: Which workflow parts are related to each other? CH7: How do workflows depend on the other parts of the experiments? 5
  • 6. Motivation •As a designer: Discovery •Workflows with similar functionality fragments/methods •Design based in previous templates. •As user/reuser: Understandability, Exploration •Search workflows by functionality •Commonalities between execution runs Workflow 1 •Component categorization 6
  • 7. Overview Abstraction definitions and categorization Descriptions/ PSMs/Ontologies Data mining tools, graph analysis, etc. Algorithms for finding the different abstractions automatically RDF Stores Experiment publication Provenance representation Plan representation Vocabularies 7
  • 9. Representing and publishing scientific workflows in the Web Representing and publishing scientific workflows in the Web A New Approach for Publishing Workflows: Abstractions, Standards, and Linked Data. Garijo, D.; and Gil, Y. In Proceedings of the 6th workshop on Workflows in support of large-scale science, page 47-56, Seattle, 2011. ACM 9
  • 10. Overview Abstractions definitions and categorization Algorithms for finding the different abstractions automatically Experiment Publication Provenance representation Plan representation Virtuoso, Pubby, Wings (+Plugin) OPMW 10
  • 11. What is Linked Data? 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information. 4. Include links to other URIs. “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/” 8
  • 12. Publishing workflows: high level architecture Other workflow environments WINGS on local laptop Core Portal Workflow Template OPM export Workflow Instance Programatic access (external apps) WINGS on shared host Core Portal Workflow Template OPM export Workflow Instance Linked Data Publication WINGS on web server Core Portal Wings workflow generation Workflow Template Workflow Instance Interactive Browsing (Pubby frontend) OPM export OPM conversion Users Publication Share Reuse 12
  • 15. Representing and publishing scientific workflows in the Web Standards PROV-O: The PROV Ontology. Lebo, T.; Sahoo, S.; McGuinness, D.; Belhajjame, K.; Corsar, D.; Cheney, J.; Garijo, D.; Soiland-Reyes, S.; Zednik, S.; and Zhao, J. W3C Consortium. 2012. 15
  • 16. Overview Abstractions definitions and categorization Algorithms for finding the different abstractions automatically Experiment Publication Provenance representation Plan representation Virtuoso, Pubby, Wings (+Plugin) OPMW + PROV + P-PLAN 16
  • 17. P-PLAN •Plans are not provenance •P-PLAN: Simple plan model for binding traces to template representations •Aligned with OPMW and PROV (W3C Provenance Standard) Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. Garijo, D.; and Gil, Y. In Proceedings of the 2nd International Workshop on Linked Science 2012, Boston, 2012. 17
  • 18. Representing and publishing scientific workflows in the Web Common motifs among scientific workflows Common motifs in scientific workflows: An empirical analysis. Garijo, D.; Alper, P.; Belhajjame, K.; Corcho, O.; Gil, Y.; and Goble, C. Future Generation Computer Systems, . 2013 18
  • 19. Overview Abstractions definitions and categorization Motif Detection Algorithms for automatic matching Experiment Publication Provenance representation Plan representation Virtuoso, Pubby, Wings (+Plugin) OPMW 19
  • 20. Overview • Empirical analysis on 260 workflow templates from Taverna, Wings, Galaxy and Vistrails • Catalog of recurring patterns: scientific workflow motifs. • Data Oriented Motifs • Workflow Oriented Motifs •Understandability and reuse http://sensefinancial.com/wp-content/uploads/2012/02/contribution.jpg 20
  • 21. Approach •Reverse-engineer the set of current practices in workflow development through an analysis of empirical evidence •Identify workflow abstractions that would facilitate understandability and therefore effective re-use 21
  • 22. Workflow Motifs •Workflow motif: Domain independent conceptual abstraction on the workflow steps. 1. Data-oriented motifs: What kind of manipulations does the workflow have? •E.g.: •Data retrieval •Data preparation • etc. 2. WHAT? Workflow-oriented motifs: How does the workflow perform its operations? •E.g.: •Stateful steps •Stateless steps •Human interactions •etc. HOW? 22
  • 23. Motif Catalog Data-Oriented Motifs Workflow-Oriented Motifs Data Retrieval Data Preparation Intra-Workflow Motifs Stateful (Asynchronous) Invocations Format Transformation Stateless (Synchronous) Invocations Input Augmentation and Output Splitting Internal Macros Data Organisation Data Analysis Data Curation/Cleaning Data Moving Data Visualisation Human Interactions Inter-Workflow Motifs Atomic Workflows Composite Workflows Workflow Overloading Ontology Purl: http://purl.org/net/wf-motifs 23
  • 24. Representing and publishing scientific workflows in the Web Detecting common fragments among scientific workflows (macro motifs) Detecting common scientific workflow fragments using execution provenance. Garijo, D.; Corcho, O.; and Gil, Y. In Proceedings of of the seventh international conference on Knowledge capture, page 33-40, Banff, 2013. ACM. 24
  • 25. Summary: Work done at ISI Abstractions definitions and categorization Algorithms for automatic matching Macro abstraction detection Experiment Publication Provenance representation Plan representation Motif Detection SUBDUE + PAFI exploration and integration in RDF Virtuoso, Pubby, Wings (+Plugin) OPMW + PROV + P-PLAN 25
  • 26. Macro abstraction detection Problem statement: Given a repository of workflow templates (either abstract or specific) or workflow execution traces, what are the workflow fragments I can deduce from it? Useful for: •Systems like Taverna and Wings: (Many templates, little annotation to relate them) •Finding relationships between workflows and sub-workflows. •Most used fragments, most executed, etc. •Systems like GenePattern and Galaxy: (Many runs, nearly no templates published) •Proposing new templates with the popular fragments. 26
  • 27. Challenges: Common workflow fragment detection •Given a collection of workflows, which are the most common fragments? •Common sub-graphs among the collection •Sub-graph isomorphism (NP-complete) •We use the SUBDUE algorithm [Holder et al 1994] (hierachical clustering) •Graph Grammar learning •The rules of the grammar are the workflow fragments •Graph based hierarchical clustering •Each cluster corresponds to a workflow fragment •Iterative algorithm with two measures for compressing the graph: •Minimum Description Length (MDL) •Size •Current tests with PAFI (http://glaros.dtc.umn.edu/gkhome/pafi/overview) ongoing. [Holder et al 1994]: Substructure Discovery in the SUBDUE System L. B. Holder, D. J. Cook, and S. Djoko. AAAI Workshop on Knowledge Discovery, pages 169-180, 1994. 27
  • 28. How does SUBDUE work? DatasetT1 DatasetT1 ProcessType1 ProcessType1 DatasetT2 DatasetT2 DatasetT2 ProcessType2 ProcessType2 ProcessType2 DatasetT3 DatasetT3 DatasetT3 ProcessType3 Input Graph DatasetT3 28
  • 29. How does SUBDUE work? DatasetT1 DatasetT1 ProcessType1 ProcessType1 DatasetT2 DatasetT2 DatasetT2 ProcessType2 ProcessType2 ProcessType2 DatasetT3 DatasetT3 DatasetT3 ProcessType3 Fragment1 Iteration 1 DatasetT3 29
  • 30. How does SUBDUE work? DatasetT1 DatasetT1 ProcessType1 ProcessType1 FRAG1 FRAG1 ProcessType3 DatasetT3 FRAG1 Iteration 1 result 30
  • 31. How does SUBDUE work? DatasetT1 DatasetT1 ProcessType1 ProcessType1 FRAG1 FRAG1 ProcessType3 Fragment2 FRAG1 Iteration 2 DatasetT3 31
  • 32. How does SUBDUE work? FRAG2 FRAG2 ProcessType3 DatasetT3 FRAG1 Iteration 2 result (STOP) 32
  • 33. How does SUBDUE work? Results: Fragment 1 (FRAG1) : Fragment 2 (FRAG2): DatasetT2 DatasetT1 ProcessType2 DatasetT3 Occurrences: 3 times ProcessType1 FRAG1 2 times 33
  • 34. Challenges: Generalization of workflows Stemmer Porter Stemmer Term Weighting Lovins Stemmer TF DF CF 34
  • 35. Research Objects Workflow-Centric Research Objects: First Class Citizens in Scholarly Discourse. Belhajjame, K.; Corcho, O.; Garijo, D.; Zhao, J.; Missier, P.; Newman, D.; Palma, R.; Bechhofer, S.; Garcıa, E.; Manuel, .G. J.; Klyne, G.; Page, K.; Roos, M.; Ruiz, J. E.; Soiland-Reyes, S.; Verdes-Montenegro, L.; De Roure, D.; and Goble, C. In Proceedings of the Second International Conference on the Future of Scholarly Communication and Scientific Publishing Sepublica2012, page 1-12, Hersonissos, 2012 35
  • 36. What is a Research Object? •Aggregation of resources that bundles together the contents of a research work: •Data •Experiments •Examples •Bibliography •Annotations •Provenance •ROs •Etc. 36
  • 37. Research Objects: An Overview + Open Annotation •Tool support •Interoperability http://www.openannotation.org/spec/core/ 37
  • 38. What can you find in a Research Object? A real example TPDL 2013, Valleta, Malta. 3
  • 40. From Scientific Workflows to Research Objects: Publication and Abstraction of Scientific Experiments Daniel Garijo Verdejo Supervisors: Oscar Corcho, Yolanda Gil Ontology Engineering Group Departamento de Inteligencia Artificial Facultad de Informática Universidad Politécnica de Madrid Date: 12/02/2014