Wf4Ever: Annotating
research objects
Stian Soiland-Reyes, Sean Bechhofer
myGrid, University of Manchester
Open Annotation Rollout, Manchester, 2013-06-24
This work is licensed under a
Creative Commons Attribution 3.0 Unported License
Motivation: Scientific workflows
Coordinated execution of
services and linked resources
Dataflow between services
Web services (SOAP, REST)
Command line tools
Scripts
User interactions
Components (nested workflows)
Method becomes:
Documented visually
Shareable as single definition
Reusable with new inputs
Repurposable with other services
Reproducible?
http://www.myexperiment.org/workflows/3355
http://www.taverna.org.uk/
http://www.biovel.eu/
But workflows are complex machines
Outputs
Inputs
Configuration
Components
http://www.myexperiment.org/workflows/3355
• Will it still work after a year? 10 years?
• Expanding components, we see a workflow involves a
series of specific tools and services which
• Depend on datasets, software libraries, other tools
• Are often poorly described or understood
• Over time evolve, change, break or are replaced
• User interactions are not reproducible
• But can be tracked and replayed
Electronic Paper Not Enough
Hypothesis Experiment
Result
Analysis
Conclusions
Investigation
Data Data
Electronic
paper
Publish
http://www.force11.org/beyondthepdf2
http://figshare.com/
Open Research movement: Openly share the data of your experiments
http://datadryad.org/
http://www.researchobject.org/
RESEARCH OBJECT (RO)
Research objects goal: Openly share everything about your
experiments, including how those things are related
What is in a research object?
A Research Object bundles and relates digital
resources of a scientific experiment or
investigation:
Data used and results produced in
experimental study
Methods employed to produce and analyse
that data
Provenance and settings for the experiments
People involved in the investigation
Annotations about these resources, that are
essential to the understanding and
interpretation of the scientific outcomes
captured by a research object
Gathering everything
Research Objects (RO) aggregate related resources, their
provenance and annotations
Conveys "everything you need to know" about a
study/experiment/analysis/dataset/workflow
Shareable, evolvable, contributable, citable
ROs have their own provenance and lifecycles
Why Research Objects?
i. To share your research materials
(RO as a social object)
ii. To facilitate reproducibility and reuse of methods
iii. To be recognized and cited
(even for constituent resources)
iv. To preserve results and prevent decay
(curation of workflow definition;
using provenance for partial rerun)
A Research object http://alpha.myexperiment.org/packs/387
Quality assessment of a research object
Quality monitoring
Annotations in research objects
Types: "This document contains a hypothesis"
Relations: "These datasets are consumed by that tool"
Provenance: "These results came from this workflow run"
Descriptions: "Purpose of this step is to filter out invalid data"
Comments: "This method looks useful, but how do I install it?"
Examples: "This is how you could use it"
Annotation guidelines: which
properties?
Descriptions: dct:title, dct:description, rdfs:comment, dct:publisher, dct:license,
dct:subject
Provenance: dct:created, dct:creator, dct:modified, pav:providedBy,
pav:authoredBy, pav:contributedBy, roevo:wasArchivedBy, pav:createdAt
Provenance relations: prov:wasDerivedFrom, prov:wasRevisionOf,
wfprov:usedInput, wfprov:wasOutputFrom
Social networking: oa:Tag, mediaont:hasRating, roterms:technicalContact,
cito:isDocumentedBy, cito:isCitedBy
Dependencies: dct:requires, roterms:requiresHardware,
roterms:requiresSoftware, roterms:requiresDataset
Typing: wfdesc:Workflow, wf4ever:Script, roterms:Hypothesis, roterms:Results,
dct:BibliographicResource
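
As a rough illustration of how these guideline groups might be applied, the sketch below sorts the predicates found in an annotation body into the groups named on this slide. The GUIDELINES table copies only a few of the listed properties; the function name group_predicates and the sample triples are assumptions made for this example and are not part of Wf4Ever.

```python
# Subset of the guideline groups listed on the slide (prefixes as written there).
GUIDELINES = {
    "Descriptions": {"dct:title", "dct:description", "rdfs:comment"},
    "Provenance": {"dct:created", "dct:creator", "pav:authoredBy"},
    "Provenance relations": {"prov:wasDerivedFrom", "prov:wasRevisionOf"},
    "Dependencies": {"roterms:requiresSoftware", "roterms:requiresDataset"},
}


def group_predicates(triples):
    """Map each guideline group to the predicates from `triples` that belong to it."""
    grouped = {}
    for _subject, predicate, _obj in triples:
        for group, properties in GUIDELINES.items():
            if predicate in properties:
                grouped.setdefault(group, set()).add(predicate)
    return grouped


# Hypothetical annotation body about a workflow.
body = [
    ("workflow1", "dct:title", "The wonderful workflow"),
    ("workflow1", "dct:creator", "Alice"),
    ("workflow1", "roterms:requiresSoftware", "taverna"),
]
print(group_predicates(body))
```

Predicates outside the table are simply ignored, so the same function could be pointed at a larger body without changes.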
What is provenance?
By Dr Stephen Dann
licensed under Creative Commons Attribution-ShareAlike 2.0 Generic
http://www.flickr.com/photos/stephendann/3375055368/
Attribution
who did it?
Derivation
how did it change?
Activity
what happens to it?
Licensing
can I use it?
Attributes
what is it?
Origin
where is it from?
Annotations
what do others say about it?
Aggregation
what is it part of?
Date and tool
when was it made?
using what?
Attribution
Who collected this sample? Who helped?
Which lab performed the sequencing?
Who did the data analysis?
Who curated the results?
Who produced the raw data this analysis is based on?
Who wrote the analysis workflow?
Why do I need this?
i. To be recognized for my work
ii. Who should I give credits to?
iii. Who should I complain to?
iv. Can I trust them?
v. Who should I make friends with?
prov:wasAttributedTo
prov:actedOnBehalfOf
dct:creator
dct:publisher
pav:authoredBy
pav:contributedBy
pav:curatedBy
pav:createdBy
pav:importedBy
pav:providedBy
...
Roles
Person
Organization
SoftwareAgent
Agent types
Alice
The
lab
Data
wasAttributedTo
actedOnBehalfOf
http://practicalprovenance.wordpress.com/
Derivation
Which sample was this metagenome sequenced from?
Which metagenomes was this sequence extracted from?
Which sequence was the basis for the results?
What is the previous revision of the new results?
Why do I need this?
i. To verify consistency (did I use
the correct sequence?)
ii. To find the latest revision
iii. To backtrack where a diversion
appeared after a change
iv. To credit work I depend on
v. Auditing and defence for peer review
wasDerivedFrom
wasQuotedFrom
Sequence
New
results
wasDerivedFrom
Sample
Metagenome
Old
results
wasRevisionOf
wasInfluencedBy
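
The derivation questions on this slide amount to walking wasDerivedFrom or wasRevisionOf links backwards. A minimal sketch, assuming the chain results → sequence → metagenome → sample implied by the questions; the identifiers and the function name lineage are hypothetical, and the triples are plain Python tuples rather than a real PROV store.

```python
# Hypothetical identifiers; edges follow the questions asked on this slide.
DERIVATIONS = [
    ("new-results", "prov:wasRevisionOf", "old-results"),
    ("new-results", "prov:wasDerivedFrom", "sequence"),
    ("sequence", "prov:wasDerivedFrom", "metagenome"),
    ("metagenome", "prov:wasDerivedFrom", "sample"),
]


def lineage(entity, triples, predicate="prov:wasDerivedFrom"):
    """Return entities reachable from `entity` via `predicate`, nearest first."""
    found = []
    frontier = [entity]
    while frontier:
        current = frontier.pop(0)
        for subject, pred, obj in triples:
            # Skip already-seen entities so cyclic data cannot loop forever.
            if subject == current and pred == predicate and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


print(lineage("new-results", DERIVATIONS))
print(lineage("new-results", DERIVATIONS, "prov:wasRevisionOf"))
```

The first call answers "what do these results depend on?", the second "what is the previous revision?".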
Activities
What happened? When? Who?
What was used and generated?
Why was this workflow started?
Which workflow ran? Where?
Why do I need this?
i. To see which analysis was performed
ii. To find out who did what
iii. What was the metagenome
used for?
iv. To understand the whole process
"make me a Methods section"
v. To track down inconsistencies
used
wasGeneratedBy
wasStartedAt
"2012-06-21"
Metagenome
Sample
wasAssociatedWith
Workflow
server
wasInformedBy
wasStartedBy
Workflow
run
wasGeneratedBy
Results
Sequencing
wasAssociatedWith
Alice
hadPlan
Workflow
definition
hadRole
Lab
technician
Results
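
The "make me a Methods section" idea can be sketched by rendering activity records as sentences. The records below are one reading of the diagram (Alice as lab technician doing the sequencing, a workflow server running the workflow definition); the field names and the function methods_sentence are assumptions for illustration, not a PROV API.

```python
# Hypothetical activity records; field names mirror PROV relations
# (used, wasGeneratedBy, wasAssociatedWith, hadPlan) but are not a PROV API.
ACTIVITIES = [
    {"name": "Sequencing", "agent": "Alice", "role": "lab technician",
     "used": ["Sample"], "generated": ["Metagenome"]},
    {"name": "Workflow run", "agent": "Workflow server",
     "plan": "Workflow definition", "started": "2012-06-21",
     "used": ["Metagenome"], "generated": ["Results"]},
]


def methods_sentence(activity):
    """Render one activity record as a plain-English sentence."""
    agent = activity["agent"]
    if "role" in activity:
        agent += f" ({activity['role']})"
    sentence = (f"{activity['name']} was carried out by {agent}, "
                f"using {', '.join(activity['used'])} "
                f"to produce {', '.join(activity['generated'])}")
    if "plan" in activity:
        sentence += f", following {activity['plan']}"
    if "started" in activity:
        sentence += f", starting on {activity['started']}"
    return sentence + "."


for activity in ACTIVITIES:
    print(methods_sentence(activity))
```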
PROV model
http://www.w3.org/TR/prov-primer/
Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved.
Provenance Working Group
Provenance of what?
Who made the (content of) research object? Who maintains it?
Who wrote this document? Who uploaded it?
Which CSV was this Excel file imported from?
Who wrote this description? When? How did we get it?
What is the state of this RO? (Live or Published?)
What did the research object look like before? (Revisions) Are
there newer versions?
Which research objects are derived from this RO?
Research object model at a glance
Research
Object
Resource
Resource
Resource
Annotation
Annotation
Annotation
oa:hasTarget
Resource
Resource
Annotation graph
oa:hasBody
ore:aggregates
«ore:Aggregation»
«ro:ResearchObject»
«oa:Annotation»
«ro:AggregatedAnnotation»
«trig:Graph»
«ore:AggregatedResource»
«ro:Resource»
Manifest
«ore:ResourceMap»
«ro:Manifest»
Wf4Ever architecture
Blob store
Graph
store
Resource
Uploaded to
Manifest
Annotation
graph
Research
object
Annotation
ORE Proxy
External
reference
Redirects to
If RDF, import as named graph
SPARQL
REST resources
http://www.wf4ever-project.org/wiki/display/docs/RO+API+6
Where do RO annotations come
from?
Imported from uploaded resources, e.g. embedded in
workflow-specific format (creator: unknown!)
Created by users filling in Title, Description etc. on website
By automatically invoked software agents, e.g.:
A workflow transformation service extracts the workflow
structure as RDF from the native workflow format
Provenance trace from a workflow run, which describes the
origin of aggregated output files in the research object
How we are using the OA model
Multiple oa:Annotation contained within the manifest RDF and
aggregated by the RO.
Provenance (PAV, PROV) on oa:Annotation (who made the link)
and body resource (who stated it)
Typically a single oa:hasTarget, either the RO or an aggregated
resource.
oa:hasBody to a trig:Graph resource (read: RDF file) with the
"actual" annotation as RDF:
<workflow1> dct:title "The wonderful workflow" .
Multiple oa:hasTarget for relationships, e.g. graph body:
<workflow1> roterms:inputSelected <input2.txt> .
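
The shape described on this slide (one annotation, one or more targets, a body that is an RDF graph file) can be sketched as a plain dictionary. This is only an illustration: the helper make_annotation and the annotations/ file names are hypothetical, and the dictionary is not claimed to be the serialisation the Wf4Ever manifest uses.

```python
import json


def make_annotation(targets, body, creator=None):
    """Build a dict shaped like an oa:Annotation linking `targets` to a body graph."""
    if not targets:
        raise ValueError("an annotation needs at least one target")
    annotation = {
        "@type": ["oa:Annotation", "ro:AggregatedAnnotation"],
        # A single target is stored as a string, several as a list.
        "oa:hasTarget": targets[0] if len(targets) == 1 else list(targets),
        "oa:hasBody": body,
    }
    if creator is not None:
        # Provenance of the link itself (who made the annotation).
        annotation["pav:createdBy"] = creator
    return annotation


# Hypothetical file names inside a research object.
title = make_annotation(["workflow1"], "annotations/title.ttl", creator="Alice")
relation = make_annotation(["workflow1", "input2.txt"], "annotations/inputs.ttl")

print(json.dumps(title, indent=2))
print(relation["oa:hasTarget"])
```

The second call mirrors the relationship case: the body graph relates two resources, so both appear as targets.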
What should we also be using?
Motivations
myExperiment: commenting, describing, moderating, questioning,
replying, tagging. We made our own vocabulary, as OA did not exist
Selectors on compound resources
E.g. description on processors within a workflow in a workflow
definition. How do you find this if you only know the workflow
definition file?
Currently: Annotations on separate URIs for each component,
described in workflow structure graph, which is body of annotation
targeting the workflow definition file
Importing/referring to annotations from other OA systems
(how to discover those?)
What is the benefit of OA for us?
Existing vocabulary: no need for our project to try to
specify and agree on our own way of tracking
annotations.
Potential interoperability with third-party annotation
tools
E.g. we want to annotate a figure in a paper and
relate it to a dataset in a research object, and we don't
want to write another tool for that!
Existing annotations (pre research object) in Taverna
and myExperiment map easily to OA model
History lesson (AO/OAC/OA)
When forming the Wf4Ever Research Object model, we found:
Open Annotation Collaboration (OAC)
Annotation Ontology (AO)
What was the difference?
Technically, for Wf4Ever's purposes: they are equivalent
Political choice: AO, supported by Utopia (Manchester)
We encouraged the formation of W3C Open Annotation
Community Group and a joint model
Next: Research Object model v0.2 and RO Bundle will use the
OA model. Since we only used 2 properties, mapping is 1:1
http://www.wf4ever-project.org/wiki/display/docs/2011-09-26+Annotation+model+considerations
Saving a research object:
RO bundle
Single, transferable research object
Self-contained snapshot
Which files in ZIP, which are URIs? (Up to user/application)
Regular ZIP file, explored and unpacked with standard tools
JSON manifest is programmatically accessible without RDF
understanding
Works offline and in desktop applications: no REST API
access required
Basis for RO-enabled file formats, e.g. Taverna run bundle
Exchanged with myExperiment and RO tools
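
To illustrate the point that the JSON manifest can be read without any RDF tooling, the sketch below separates aggregated entries into files inside the ZIP and external URIs. The manifest text and the key name "aggregates" are assumptions for this example; check the RO Bundle specification for the actual manifest layout.

```python
import json
from urllib.parse import urlparse

# Hypothetical manifest content; the key name "aggregates" is an assumption.
MANIFEST_TEXT = """
{
  "aggregates": [
    "outputA.txt",
    "intermediates/1.txt",
    "http://www.myexperiment.org/workflows/3355"
  ]
}
"""


def split_aggregates(manifest_text):
    """Return (paths_in_zip, external_uris) from a JSON manifest string."""
    manifest = json.loads(manifest_text)
    in_zip, external = [], []
    for entry in manifest.get("aggregates", []):
        # An entry with an http(s) scheme is treated as an external URI.
        if urlparse(entry).scheme in ("http", "https"):
            external.append(entry)
        else:
            in_zip.append(entry)
    return in_zip, external


paths, uris = split_aggregates(MANIFEST_TEXT)
print(paths)
print(uris)
```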
Workflow Results Bundle
workflowrun.prov.ttl
(RDF)
outputA.txt
outputC.jpg
outputB/
https://w3id.org/bundl
intermediates/
1.txt
2.txt
3.txt
de/def2e58b-50e2-4949-9980-fd310166621a.txt
inputA.txt
workflow
URI
references
attribution
execution
environment
Aggregating in Research Object
ZIP folder structure (RO Bundle)
mimetype
application/vnd.wf4ever.robundle+zip
.ro/manifest.json
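
The folder structure above can be produced with any ZIP library. A minimal sketch using Python's zipfile module: the helper write_bundle, the sample file and the manifest key "aggregates" are hypothetical, and storing the mimetype entry first and uncompressed is an assumption borrowed from similar ZIP-based formats such as EPUB, not something stated on the slide.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(target, files, manifest):
    """Write a minimal RO Bundle-style ZIP to `target` (a path or file object)."""
    with zipfile.ZipFile(target, "w") as bundle:
        # Assumption: mimetype goes first and uncompressed, as in EPUB-like formats.
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2),
                        compress_type=zipfile.ZIP_DEFLATED)
        for name, content in files.items():
            bundle.writestr(name, content, compress_type=zipfile.ZIP_DEFLATED)


# Hypothetical content; the manifest key "aggregates" is illustrative only.
buffer = io.BytesIO()
write_bundle(buffer, {"outputA.txt": "hello"}, {"aggregates": ["outputA.txt"]})

with zipfile.ZipFile(buffer) as bundle:
    print(bundle.namelist())
    print(bundle.read("mimetype").decode("ascii"))
```

Because the result is a regular ZIP file, it can also be opened and unpacked with standard archive tools.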
RO Bundle
What is aggregated? File in ZIP or external URI
Who made the RO? When?
Who?
External URIs placed in folders
Embedded annotation
External annotation, e.g. blogpost
JSON-LD context → RDF
RO provenance
.ro/manifest.json
Format
Note: JSON "quotes" not shown above for brevity
http://json-ld.org/
http://orcid.org/
https://w3id.org/bundle
http://mayor2.dia.fi.upm.es/oeg-upm/files/dgarijo/motifAnalysisSite/
<h3 property="dc:title">Common Motifs in Scientific Workflows:
<br>An Empirical Analysis</h3>
<body resource="http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/"
typeOf="ore:Aggregation ro:ResearchObject">
Research Object as RDFa
http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/
<li><a property="ore:aggregates" href="t2_workflow_set_eSci2012.v.0.9_FGCS.xls"
typeOf="ro:Resource">Analytics for Taverna workflows</a></li>
<li><a property="ore:aggregates" href="WfCatalogue-AdditionalWingsDomains.xlsx"
typeOf="ro:Resource">Analytics for Wings workflows</a></li>
<span property="dc:creator prov:wasAttributedTo"
resource="http://delicias.dia.fi.upm.es/members/DGarijo/#me"></span>
W3C community group for RO
http://www.w3.org/community/rosc/

How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
Ā 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Ā 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
Ā 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
Ā 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
Ā 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
Ā 

2013 06-24 Wf4Ever: Annotating research objects (PPTX)

  • 1. Wf4Ever: Annotating research objects Stian Soiland-Reyes, Sean BechHofer myGrid, University of Manchester Open Annotation Rollout, Manchester, 2013-06-24 This work is licensed under a Creative Commons Attribution 3.0 Unported License
  • 2. Motivation: Scientific workflows Coordinated execution of services and linked resources Dataflow between services Web services (SOAP, REST) Command line tools Scripts User interactions Components (nested workflows) Method becomes: Documented visually Shareable as single definition Reusable with new inputs Repurposable other services Reproducible? http://www.myexperiment.org/workflows/3355 http://www.taverna.org.uk/ http://www.biovel.eu/
  • 3. But workflows are complex machines Outputs Inputs Configuration Components http://www.myexperiment.org/workflows/3355 • Will it still work after a year? 10 years? • Expanding components, we see a workflow involves a series of specific tools and services which • Depend on datasets, software libraries, other tools • Are often poorly described or understood • Over time evolve, change, break or are replaced • User interactions are not reproducible • But can be tracked and replayed
  • 4. Electronic Paper Not Enough Hypothesis Experiment Result Analysis Conclusions Investigation Data Data Electronic paper Publish http://www.force11.org/beyondthepdf2 http://figshare.com/ Open Research movement: Openly share the data of your experiments http://datadryad.org/
  • 5. http://www.researchobject.org/ RESEARCH OBJECT (RO) http://www.researchobject.org/ Research objects goal: Openly share everything about your experiments, including how those things are related
  • 6. What is in a research object? A Research Object bundles and relates digital resources of a scientific experiment or investigation: Data used and results produced in experimental study Methods employed to produce and analyse that data Provenance and settings for the experiments People involved in the investigation Annotations about these resources, that are essential to the understanding and interpretation of the scientific outcomes captured by a research object http://www.researchobject.org/
  • 7. Gathering everything Research Objects (RO) aggregate related resources, their provenance and annotations Conveys "everything you need to know" about a study/experiment/analysis/dataset/workflow Shareable, evolvable, contributable, citable ROs have their own provenance and lifecycles
  • 8. Why Research Objects? i. To share your research materials (RO as a social object) ii. To facilitate reproducibility and reuse of methods iii. To be recognized and cited (even for constituent resources) iv. To preserve results and prevent decay (curation of workflow definition; using provenance for partial rerun)
  • 9. A Research object http://alpha.myexperiment.org/packs/387
  • 10.
  • 11. Quality assessment of a research object
  • 13. Annotations in research objects Types: "This document contains an hypothesis" Relations: "These datasets are consumed by that tool" Provenance: "These results came from this workflow run" Descriptions: "Purpose of this step is to filter out invalid data" Comments: "This method looks useful, but how do I install it?" Examples: "This is how you could use it"
  • 14. Annotation guidelines - which properties? Descriptions: dct:title, dct:description, rdfs:comment, dct:publisher, dct:license, dct:subject Provenance: dct:created, dct:creator, dct:modified, pav:providedBy, pav:authoredBy, pav:contributedBy, roevo:wasArchivedBy, pav:createdAt Provenance relations: prov:wasDerivedFrom, prov:wasRevisionOf, wfprov:usedInput, wfprov:wasOutputFrom Social networking: oa:Tag, mediaont:hasRating, roterms:technicalContact, cito:isDocumentedBy, cito:isCitedBy Dependencies: dcterms:requires, roterms:requiresHardware, roterms:requiresSoftware, roterms:requiresDataset Typing: wfdesc:Workflow, wf4ever:Script, roterms:Hypothesis, roterms:Results, dct:BibliographicResource
  • 15. What is provenance? By Dr Stephen Dann licensed under Creative Commons Attribution-ShareAlike 2.0 Generic http://www.flickr.com/photos/stephendann/3375055368/ Attribution who did it? Derivation how did it change? Activity what happens to it? Licensing can I use it? Attributes what is it? Origin where is it from? Annotations what do others say about it? Aggregation what is it part of? Date and tool when was it made? using what?
  • 16. Attribution Who collected this sample? Who helped? Which lab performed the sequencing? Who did the data analysis? Who curated the results? Who produced the raw data this analysis is based on? Who wrote the analysis workflow? Why do I need this? i. To be recognized for my work ii. Who should I give credits to? iii. Who should I complain to? iv. Can I trust them? v. Who should I make friends with? prov:wasAttributedTo prov:actedOnBehalfOf dct:creator dct:publisher pav:authoredBy pav:contributedBy pav:curatedBy pav:createdBy pav:importedBy pav:providedBy ... Roles Person Organization SoftwareAgent Agent types Alice The lab Data wasAttributedTo actedOnBehalfOf http://practicalprovenance.wordpress.com/
  • 17. Derivation Which sample was this metagenome sequenced from? Which meta-genomes was this sequence extracted from? Which sequence was the basis for the results? What is the previous revision of the new results? Why do I need this? i. To verify consistency (did I use the correct sequence?) ii. To find the latest revision iii. To backtrack where a diversion appeared after a change iv. To credit work I depend on v. Auditing and defence for peer review wasDerivedFrom wasQuotedFrom Sequence New results wasDerivedFrom Sample Meta - genome Old results wasRevisionOf wasInfluencedBy
  • 18. Activities What happened? When? Who? What was used and generated? Why was this workflow started? Which workflow ran? Where? Why do I need this? i. To see which analysis was performed ii. To find out who did what iii. What was the metagenome used for? iv. To understand the whole process "make me a Methods section" v. To track down inconsistencies used wasGeneratedBy wasStartedAt "2012-06-21" Metagenome Sample wasAssociatedWith Workflow server wasInformedBy wasStartedBy Workflow run wasGeneratedBy Results Sequencing wasAssociatedWith Alice hadPlan Workflow definition hadRole Lab technician Results
  • 19. PROV model http://www.w3.org/TR/prov-primer/ Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. Provenance Working Group
  • 20. Provenance of what? Who made the (content of) research object? Who maintains it? Who wrote this document? Who uploaded it? Which CSV was this Excel file imported from? Who wrote this description? When? How did we get it? What is the state of this RO? (Live or Published?) What did the research object look like before? (Revisions) - are there newer versions? Which research objects are derived from this RO?
  • 21. Research object model at a glance Research Object Resource Resource Resource Annotation Annotation Annotation oa:hasTarget Resource Resource Annotation graph oa:hasBody ore:aggregates «ore:Aggregation» «ro:ResearchObject» «oa:Annotation» «ro:AggregatedAnnotation» «trig:Graph» «ore:AggregatedResource» «ro:Resource» Manifest «ore:ResourceMap» «ro:Manifest»
  • 22. Wf4Ever architecture Blob store Graph store Resource Uploaded to Manifest Annotation graph Research object AnnotationORE Proxy External reference Redirects to If RDF, import as named graph SPARQL REST resources http://www.wf4ever-project.org/wiki/display/docs/RO+API+6
  • 23. Where do RO annotations come from? Imported from uploaded resources, e.g. embedded in workflow-specific format (creator: unknown!) Created by users filling in Title, Description etc. on website By automatically invoked software agents, e.g.: A workflow transformation service extracts the workflow structure as RDF from the native workflow format Provenance trace from a workflow run, which describes the origin of aggregated output files in the research object
  • 24. How we are using the OA model Multiple oa:Annotation contained within the manifest RDF and aggregated by the RO. Provenance (PAV, PROV) on oa:Annotation (who made the link) and body resource (who stated it) Typically a single oa:hasTarget, either the RO or an aggregated resource. oa:hasBody to a trig:Graph resource (read: RDF file) with the "actual" annotation as RDF: <workflow1> dct:title "The wonderful workflow" . Multiple oa:hasTarget for relationships, e.g. graph body: <workflow1> roterms:inputSelected <input2.txt> .
  • 25. What should we also be using? Motivations myExperiment: commenting, describing, moderating, questioning, replying, tagging - made our own vocabulary as OA did not exist Selectors on compound resources E.g. description on processors within a workflow in a workflow definition. How do you find this if you only know the workflow definition file? Currently: Annotations on separate URIs for each component, described in workflow structure graph, which is body of annotation targeting the workflow definition file Importing/referring to annotations from other OA systems (how to discover those?)
  • 26. What is the benefit of OA for us? Existing vocabulary - no need for our project to try to specify and agree on our own way of tracking annotations. Potential interoperability with third-party annotation tools E.g. We want to annotate a figure in a paper and relate it to a dataset in a research object - don't want to write another tool for that! Existing annotations (pre research object) in Taverna and myExperiment map easily to OA model
  • 27. History lesson (AO/OAC/OA) When forming the Wf4Ever Research Object model, we found: Open Annotation Collaboration (OAC) Annotation Ontology (AO) What was the difference? Technically, for Wf4Ever's purposes: They are equivalent Political choice: AO - supported by Utopia (Manchester) We encouraged the formation of W3C Open Annotation Community Group and a joint model Next: Research Object model v0.2 and RO Bundle will use the OA model - since we only used 2 properties, mapping is 1:1 http://www.wf4ever-project.org/wiki/display/docs/2011-09-26+Annotation+model+considerations
  • 28. Saving a research object: RO bundle Single, transferable research object Self-contained snapshot Which files in ZIP, which are URIs? (Up to user/application) Regular ZIP file, explored and unpacked with standard tools JSON manifest is programmatically accessible without RDF understanding Works offline and in desktop applications - no REST API access required Basis for RO-enabled file formats, e.g. Taverna run bundle Exchanged with myExperiment and RO tools
  • 30. RO Bundle What is aggregated? File In ZIP or external URI Who made the RO? When? Who? External URIs placed in folders Embedded annotation External annotation, e.g. blogpost JSON-LD context → RDF RO provenance .ro/manifest.json Format Note: JSON "quotes" not shown above for brevity http://json-ld.org/ http://orcid.org/ https://w3id.org/bundle
  • 31. http://mayor2.dia.fi.upm.es/oeg-upm/files/dgarijo/motifAnalysisSite/ <h3 property="dc:title">Common Motifs in Scientific Workflows: <br>An Empirical Analysis</h3> <body resource="http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/" typeOf="ore:Aggregation ro:ResearchObject"> Research Object as RDFa http://www.oeg-upm.net/files/dgarijo/motifAnalysisSite/ <li><a property="ore:aggregates" href="t2_workflow_set_eSci2012.v.0.9_FGCS.xls" typeOf="ro:Resource">Analytics for Taverna workflows</a></li> <li><a property="ore:aggregates" href="WfCatalogue-AdditionalWingsDomains.xlsx" typeOf="ro:Resource">Analytics for Wings workflows</a></li> <span property="dc:creator prov:wasAttributedTo" resource="http://delicias.dia.fi.upm.es/members/DGarijo/#me"></span>
  • 32. W3C community group for RO http://www.w3.org/community/rosc/
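
The annotation pattern from slide 24 can be sketched in a few lines of Python. This is only an illustration, not the Wf4Ever implementation: triples are plain tuples, the short names (`ro/`, `ann1`, `body1.ttl`, `alice`) are made-up placeholders, and the helper `statements_about` is hypothetical. It shows a manifest that holds the aggregation and the annotations, while the "actual" statements live in separate body graphs.

```python
# Triples are plain (subject, predicate, object) tuples. A real system would
# use an RDF library and full URIs; every identifier here is a placeholder.
RO = "ro/"  # hypothetical research object URI

# The manifest holds the aggregation and the annotations, not the bodies.
manifest = [
    (RO, "rdf:type", "ro:ResearchObject"),
    (RO, "ore:aggregates", "workflow1"),
    (RO, "ore:aggregates", "input2.txt"),
    (RO, "ore:aggregates", "ann1"),
    (RO, "ore:aggregates", "ann2"),
    # A description: single target, body is a separate graph.
    ("ann1", "rdf:type", "oa:Annotation"),
    ("ann1", "oa:hasTarget", "workflow1"),
    ("ann1", "oa:hasBody", "body1.ttl"),
    ("ann1", "pav:createdBy", "alice"),  # who made the link
    # A relationship: multiple targets, one body graph.
    ("ann2", "rdf:type", "oa:Annotation"),
    ("ann2", "oa:hasTarget", "workflow1"),
    ("ann2", "oa:hasTarget", "input2.txt"),
    ("ann2", "oa:hasBody", "body2.ttl"),
]

# Each body is its own small graph (an RDF file in the real architecture).
bodies = {
    "body1.ttl": [("workflow1", "dct:title", "The wonderful workflow")],
    "body2.ttl": [("workflow1", "roterms:inputSelected", "input2.txt")],
}


def statements_about(target):
    """Collect the body statements of every annotation targeting `target`."""
    found = []
    for annotation, predicate, obj in manifest:
        if predicate == "oa:hasTarget" and obj == target:
            for subject, pred2, body in manifest:
                if subject == annotation and pred2 == "oa:hasBody":
                    found.extend(bodies.get(body, []))
    return found


for statement in statements_about("workflow1"):
    print(statement)
```

Keeping the bodies outside the manifest mirrors the slides: the manifest stays small, and a body can be anything from one line to thousands of lines of RDF.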

Editor's Notes

  1. http://purl.org/wf4ever/model
  2. http://purl.org/wf4ever/model
  3. Most of the user-contributed content in a research object is recorded as annotations
  4. Typing of resources and relating them to each other are individual annotations
  5. This shows the quality assessment of a research object - a Minimal Information Model forms a Checklist which indicates the preservability of a research object. Here, many of the checklist items indicate the existence of particular annotations, while others check the availability of HTTP resources. Annotations are additionally picked up for the UI (title/description) - this highlights the importance of the annotation model. This was implemented before OA, and the format of the description was not recorded, hence the HTML is rendered as plain text.
  6. Quality metrics are monitored over time, so as the research object evolves (say, gaining additional annotations), the health of the annotations can be indicated graphically.
  7. The annotation framework basically allows "any annotation", so we had to write guidelines on which annotation properties we are going to recommend and "natively" understand. We reused existing vocabularies like Dublin Core Terms, PROV and PAV, but also had to make our own more specific vocabularies.
  8. When I first heard about Provenance, I thought it was something French, like Provence. Provenance is classically understood as where something is coming from (Origin); like in this example - are the shallots from Holland or France? Was there some kind of Derivation that changed their nationality? Obviously if we are going to talk about something's provenance, we have to be clear about what that thing is. The shallots? The sign? The picture? Or this Flickr page? Provenance also covers other aspects, mainly Attribution (who did it), Dates (when?), and Activities (what happened). There are Attributes to describe the state of the thing. Perhaps not always considered provenance, but relevant anyway, are Aggregations (one thing is part of another), Licensing (can I use it?) and of course Annotations - what do others say about it?
  9. Let's take an example of a biomedical lab that sequences genome data. There would be lots of questions relating to attribution - different people play different roles, and even act on behalf of others. We can call these Agents - things that can perform stuff. People are obvious agents, as are Organizations (like The Lab), but Software can also be an active agent.
  10. When we talk about things, or entities, we might want to relate them to each other. An extracted genome can be said to be derived from the sample. The sequence we select from the genome is a kind of quote. The result we get from analysing this is derived from the sequence, and is a revision of the old result - which again has its own chain of influences which might differ.
  11. Activities are what is happening - typically using existing entities and generating new ones, somewhat under the control of one or more agents. Taken together, you can describe a whole lineage of activities that generate and consume each other's entities.
  12. So these three classes are at the core of the W3C PROV model, which we have helped build. An Entity is derived from other entities, and attributed to an Agent. An Activity uses one entity and generates another, and is associated with an agent.
  13. So for Wf4Ever, what kind of provenance are we talking about? It is a bit more down to earth as we are generally just talking about files and documents, but still, as you form a research object combining many different resources, the different dimensions of provenance become increasingly important to track.
  14. So let's have a look at what a Research Object looks like. The core is the concept of the Research Object itself, which you may also know as an ORE aggregation. This is described by the manifest, which is simply an RDF file. The RO aggregates a series of resources - in Linked Data these could be anywhere in the world. Additionally it aggregates a set of annotations, each of which is the link between a target resource (here aggregated in the RO) and a body resource. In Wf4Ever we typically provide the body as a separate RDF Graph, so that we can use existing vocabularies to describe and relate the resources.
  15. So how is this model exposed in the Wf4Ever architecture? In the back end there is a Blob store which can store any odd file, and a Graph store, where we import any RDF file as a named graph. This is exposed as a SPARQL endpoint. The manifest is one of those files, so you can query Research Object aggregations, their annotations and the annotation bodies in a single SPARQL query. Then each of the different types is exposed as a REST resource. Compound resources are simply PUT or GET almost directly into the Blob store. The ORE Proxy is a mechanism for "uploading" a URL; this is how we keep external references. The annotation graphs (the bodies) are also simply uploaded as files - and then related to the resource(s) they describe by making an Annotation within the manifest. So the Manifest is the key RDF document, as it contains both the aggregation and the annotations - but importantly not the bodies of those annotations (which are typically either 1 line or 1000s of lines of RDF) as those are separate files.
  16. So where do our annotations come from? For one, we are importing existing annotations from existing formats - like extracting annotations from within workflow definitions. Secondly, users annotate research objects by interacting with the website, filling in title, description, etc. Thirdly, we have software agents that are automatically invoked; for instance we have a transformation service that extracts the workflow structure as an RDF graph by parsing the native workflow format. Provenance traces of workflow runs can be inspected to relate files in the RO with generated outputs in the workflow run.
  17. So which aspects of the OA model are we using? Well, it comes down to actually just oa:hasTarget and oa:hasBody. We use the oa:Annotation as the anchor for provenance about who made the relation, and keep separate provenance on the body resource - as the person stating it could differ.
  18. myExperiment, a social networking site for sharing workflows, already has the aspects of commenting, etc. It does have an RDF interface, but using our own vocabulary. As we upgrade myExperiment to be based on research objects, we will be moving to use the OA model for motivations - the proposed OA motivations largely cover what we already do. We also have an issue in that we have many annotations on compound resources. For instance, within the Taverna workbench it is possible to annotate individual steps of a workflow with descriptions, examples and so on. Within a research object these are referred to by their full URI - but how do you find this URI, as the compound resources of the workflow definition are not aggregated? Currently we have to climb through the generated workflow structure graph to find the processors, which we find through a separate annotation on the workflow definition file. If we use the OA selector mechanism we should be able to also relate those deep annotations as SpecificResource on the workflow definition.
  19. Now I have to be fair, we started on this quite a bit before OA existed. In fact, when we started there seemed to be two competing models. But as we inspected OAC and AO and wondered about how we would use them, we came to the conclusion that, at least for our purposes, they are technically equivalent. So we just went with a political choice: AO was already supported by Utopia, which is made in Manchester, so we went with that. However, our enquiries about this also helped trigger the formation of the OA community group, which we have also participated in. We are looking forward to using the OA model as we now revise the RO model - the transition is not hard as we only used the 2 properties for target and body, which map 1:1. We have already adopted OA in the Research Object Bundle, which is a way to save a whole research object as a single ZIP file.
  20. Not everyone has access to set up RESTful semantic web servers; in particular we've run into this with desktop applications - users just want to save files and then decide where they are stored. So we decided to write a serialization format for Research Objects, which we call the RO Bundle. We wanted this to be accessible for application developers, so we've adopted ZIP and JSON, and in a way this would let you create research objects and make annotations without ever seeing any RDF.
  21. This is how we represent a workflow run as a Workflow Results RO Bundle. We aggregate the workflow outputs, the workflow definition, the inputs used for execution, a description of the execution environment, external URI references (such as the project homepage) and attribution to scientists who contributed to the bundle. This effectively forms a Research Object, all tied together by the RO Bundle Manifest, which is in JSON-LD format (normal JSON that is also valid RDF).
  22. This shows how the JSON manifest focuses on the most common aspects of a research object - who made it? When? What is aggregated - files in the ZIP but also external URIs - it is up to the application or person making the bundle to decide what is to be included in the ZIP. Annotations are included at the bottom here; we see that there's an annotation "about" (target) the analysis JPEG, and the content (the body) is within the annotations/ folder. Similarly, the next annotation relates the external resource (a blog post) with our aggregation of a resource. This is processable as JSON-LD - so it is not just JSON, it is also RDF, and out come normal ORE aggregations and OA annotations.
  23. Here's another example: light-weight usage of RDFa to turn a normal index.html into a research object. Here the author is given as a creator of the RO, and the Excel files that helped form this analysis are aggregated by the research object. This way of using the Research Object model requires no infrastructure or special packaging - and we have augmented this page to also have a downloadable RO Bundle so you can get all the aggregated resources in one go.
  24. So we have recently formed a W3C Community Group for Research Objects, which has gathered significant interest, with 75 participants. As you see, I am one of the chairs, and so is Rob, whom you already know from the OA group. We are just starting up, and our focus in the RO community group is on how to practically use Research Objects as a concept rather than on specifying a new model - we'll refer to existing models where appropriate, but also explore other models which could be described as research objects.
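
To make notes 20 to 22 a little more concrete, here is a minimal Python sketch of writing and reading an RO Bundle-like ZIP using only the standard library. It is an approximation under stated assumptions, not the RO Bundle specification: the manifest path `.ro/manifest.json` and the context URL come from the slides, the `about`/`content` keys follow the wording of note 22, while the `aggregates` layout, the file names, the example blog URL, and the helpers `build_bundle` and `read_manifest` are invented for illustration.

```python
import io
import json
import zipfile

MANIFEST_PATH = ".ro/manifest.json"  # location shown on the RO Bundle slide


def build_bundle(files, external_uris, annotations):
    """Return the bytes of a ZIP holding `files` plus a JSON manifest.

    The manifest keys are assumptions made for this sketch; consult the
    format description at https://w3id.org/bundle for the real structure.
    """
    manifest = {
        "@context": "https://w3id.org/bundle",  # assumed context reference
        "aggregates": sorted(files) + list(external_uris),
        "annotations": annotations,
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as bundle:
        for path, data in files.items():
            bundle.writestr(path, data)
        bundle.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
    return buffer.getvalue()


def read_manifest(bundle_bytes):
    """Read the manifest back as plain JSON, with no RDF tooling involved."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as bundle:
        return json.loads(bundle.read(MANIFEST_PATH))


# Hypothetical content: one embedded file, its annotation body, one external URI.
bundle_bytes = build_bundle(
    files={
        "analysis.jpg": b"placeholder image bytes",
        "annotations/analysis.ttl": b'<../analysis.jpg> <http://purl.org/dc/terms/title> "Analysis" .',
    },
    external_uris=["http://example.com/blog/post"],
    annotations=[{"about": "analysis.jpg", "content": "annotations/analysis.ttl"}],
)
manifest = read_manifest(bundle_bytes)
print(manifest["aggregates"])
```

Because the result is a regular ZIP, the same bytes can be written to disk and unpacked with standard tools, which is the point made on slide 28.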