Provenance in the Dynamic, Collaborative New Science

Dr Jun Zhao
Department of Zoology
University of Oxford
jun.zhao@zoo.ox.ac.uk
Technological infrastructure for the preservation and efficient
retrieval and reuse of scientific workflows in a range of disciplines
Packaging, preserving and publishing
Astronomy Use Case: A Repeater's Story
●   Dealing with large amounts of tabular
    data
●   Many small scripts, to avoid creating
    a black-box process
●   Local resource sharing; public
    access only after publication
●   Data must be frequently updated
    from external data repositories
●   Data updates must be tested before
    being executed
●   Data must be stored locally with
    versioning
●   “... we don't like to spread [the tasks]
    and lose controls who is doing
    what ...”
Research Objects
http://www.wf4ever-project.org
●   Aggregation – Pointers or literals of internal and external content;
●   Identity – Equivalence, equality;
●   Metadata – A reusable object;
●   Lifecycle – Stages of development. Impacts on available functionality;
●   Versioning – Recording changes;
●   Security – Access, authentication, ownership, trust;
●   Graceful Degradation of Understanding – Opaque RO domain content.
●   Mixed stewardship
●   Provenance
    ●   Of compound objects
    ●   Of evolutions
    ●   Of dynamic objects and static objects
ROs are Content Aware Objects that bundle things together
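
The aggregation and versioning aspects listed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Wf4Ever data model: the class names `Resource` and `ResearchObject`, their fields, and the `example.org` URIs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Resource:
    """One aggregated item: either a pointer (URI) or a literal value."""
    name: str
    uri: Optional[str] = None      # pointer to internal or external content
    literal: Optional[str] = None  # content embedded directly in the object

    def is_pointer(self) -> bool:
        return self.uri is not None


@dataclass
class ResearchObject:
    """Hypothetical bundle that aggregates resources and records versions."""
    title: str
    resources: List[Resource] = field(default_factory=list)
    history: List[List[Resource]] = field(default_factory=list)

    def aggregate(self, resource: Resource) -> None:
        """Add a resource to the current aggregation."""
        self.resources.append(resource)

    def snapshot(self) -> int:
        """Record a copy of the current aggregation; return its version number."""
        self.history.append(list(self.resources))
        return len(self.history)


# Placeholder content, for illustration only.
ro = ResearchObject("Example research object")
ro.aggregate(Resource("catalogue", uri="http://example.org/catalogue.csv"))
ro.aggregate(Resource("note", literal="Tested the update before applying it."))
v1 = ro.snapshot()
ro.aggregate(Resource("script", uri="http://example.org/filter.py"))
v2 = ro.snapshot()
```

Each snapshot copies the list of resources, so earlier versions stay unchanged when new items are aggregated later.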
Biology Use Case: A Reuser's Story
●   Takes a set of genes from the results of gene experiments
    performed by others, as read in a scientific paper
●   Performs 'dry' analysis to understand which genes and
    which biological processes were disturbed by which
    chemical compounds
    ●   basic Affymetrix data processing
    ●   statistical analysis to identify genes that are significantly
        differentially expressed under different conditions (with/without the
        compounds)
    ●   find those pathways that are most prominent among the filtered
        genes
Biology Use Case: A Reuser's Story
●   Searches for existing experiments on
    myExperiment (http://myexperiment.org)
●   Challenge: Understand the workflow
    ●   Perform test runs with test data and his own data
    ●   Read others' logs
    ●   Read annotations to workflows
●   Reuses scripts from colleagues and performs
    tests that his colleagues are familiar with
How Can It Be Supported?
●   A reference to the source of the data and the people to acknowledge for it.
●   The initial hypothesis
●   The conceptual workflow or a summary of the experiment plan
●   References to workflows that were tested, with comments on their applicability to
    the user's use case
●   The user's own workflow, possibly with a backlog of previous versions that the
    user wishes to keep for reference (with notes and comments)
●   The runs of the user's own workflow, the results and the recorded steps that led to
    the results, in some cases with comments for later reference (e.g. 'here I used
    parameter A, next time I may try B')
●   The final hypothesis, with comments.
●   A reference to the results of the workflow
●   Design logs that record the user's considerations while making the workflow
●   Run logs that record the user's considerations while running and interpreting the
    workflow
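
The list above can be read as a checklist of what a reuser's bundle should contain. A minimal sketch of such a check follows, assuming the bundle is a plain dictionary. The item identifiers paraphrase the bullets, and the function name `missing_items` and the sample values are hypothetical placeholders, not part of any Wf4Ever tool.

```python
from typing import Dict, List

# Item names paraphrase the list above; the identifiers are hypothetical.
EXPECTED_ITEMS: List[str] = [
    "data_source_and_acknowledgements",
    "initial_hypothesis",
    "conceptual_workflow",
    "tested_workflows",
    "own_workflow_versions",
    "workflow_runs",
    "final_hypothesis",
    "results_reference",
    "design_logs",
    "run_logs",
]


def missing_items(bundle: Dict[str, object]) -> List[str]:
    """Return the expected items that are absent or empty in the bundle."""
    return [name for name in EXPECTED_ITEMS if not bundle.get(name)]


# Placeholder bundle, for illustration only.
bundle = {
    "initial_hypothesis": "Compound X disturbs pathway Y.",
    "workflow_runs": [
        {
            "parameters": {"p": "A"},
            "comment": "here I used parameter A, next time I may try B",
        },
    ],
    "run_logs": [],  # present but empty, so still reported as missing
}
gaps = missing_items(bundle)
```

Treating empty entries as missing is a design choice of this sketch. A real tool might instead distinguish "not recorded" from "recorded but empty".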
Where is Linked Data?
The Role of Linked Data in Wf4Ever
●   Collaborative science
●   Dynamic science
●   Open science
Provenance Challenge
●   Identity
●   Context
●   Storage
●   Retrieval
Take home
●   Provenance should be user-driven
●   Linked Data should be a means to an end
●   http://www.wf4ever-project.org
Acknowledgement
●   Marco Roos of Leiden University (NL) and Jose
    Enrique Ruiz of Instituto de Astrofísica de
    Andalucía (Spain)
●   Carole Goble of University of Manchester (UK)
    and Jose Manuel Gomez of iSOCO (Spain)
●   Hui Hua and Jenny Molly of University of
    Oxford (UK)


2011 03-provenance-workshop-edingurgh
