[Logos: Scientific Workflow Management System; Janus Provenance]

Research objects, myExperiment, and Open Provenance for collaborative E-science

REPRISE workshop - IDCC’09

Paolo Missier
Information Management Group
School of Computer Science, University of Manchester, UK

with additional material by Sean Bechhofer and Matthew Gamble,
e-Labs design group, University of Manchester

IDCC’09, London - P.Missier
Momentum on sharing and collaboration
  Special issue of Nature on Data Sharing (Sept. 2009)
                          • timeliness requires rapid sharing
                          • repurposing
                          • the Human Genome project use case
• Ongoing debate in several communities
   – Clinical trials [1]
   – Earth Sciences -- ESIP - data preservation / stewardship, 2009
   – Long established in some communities - Atmospheric sciences,
     1998 [2]
• Science Commons recommendations for Open Science
   – Open Science recommendations from Science Commons (July 2008) [link]

The Toronto group: Toronto International Data Release Workshop Authors, Nature 461, 168–169 (2009)
Prepublication data sharing:
Nature 461, 168-170 (10 September 2009) | doi:10.1038/461168a; Published online 9 September 2009
http://www.nature.com/news/specials/datasharing/index.html
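Reference scenario

[Figure: a workflow plus an input dataset specification feeds a workflow execution, which yields two outcomes: data and provenance. Both go into Research Object packaging. Others can then browse, query, unbundle, and reuse the package: data-mediated implicit collaboration. A "?" marks the open question in the diagram.]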
Collaboration through data

        What is needed for B to make sense of A’s data?

1. Packaging:
  – standards for self-descriptive data + metadata bundles: Research Objects

2. Content:
  – data format standardization efforts
  – metadata representation
     • process provenance
        – workflow provenance

3. Container:
  – a repository for Research Objects
Paul’s Pack / Paul’s Research Object

[Figure: the pack bundles QTL, Workflow 16, Workflow 13, Results, Logs, Slides, Paper, and Common pathways. Items are linked by the relations "produces", "feeds into", "included in", and "published in". Layers labelled in the figure: Metadata, Representation, Domain Relations, Aggregation.]
ORE: representing generic aggregations

[Figure: a Resource Map (descriptor) describing a data structure]

http://www.openarchives.org/ore/1.0/primer.html section 4
A. Pepe, M. Mayernik, C.L. Borgman, and H.V. Sompel, "From Artifacts to Aggregations: Modeling Scientific Life Cycles on the Semantic Web," Journal of the American Society for Information Science and Technology (JASIST), to appear, 2009.
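As a rough illustration of the Resource Map idea, the sketch below builds the triples that describe an aggregation using the ORE terms `ResourceMap`, `Aggregation`, `describes`, and `aggregates`. The `example.org` URIs and the helper names `resource_map` and `aggregated` are assumptions made for this example, and plain tuples stand in for a real RDF library.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_map(rem_uri, aggregation_uri, resources):
    """Return (subject, predicate, object) triples for a Resource Map
    that describes one aggregation and the resources it aggregates."""
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    triples += [(aggregation_uri, ORE + "aggregates", r) for r in resources]
    return triples


def aggregated(triples, aggregation_uri):
    """List the resources aggregated by the given aggregation."""
    return [o for s, p, o in triples
            if s == aggregation_uri and p == ORE + "aggregates"]


# Hypothetical URIs for a pack that bundles a workflow, its results, and a paper.
PACK = "http://example.org/packs/1"
ITEMS = [
    "http://example.org/workflows/16",
    "http://example.org/results/1",
    "http://example.org/papers/1",
]
TRIPLES = resource_map(PACK + "/rem", PACK, ITEMS)
```

Here `aggregated(TRIPLES, PACK)` returns the three item URIs, which is the "unbundle" step in miniature.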
Content: Workflow provenance




A detailed trace of workflow execution
- tasks performed, data transformations
- inputs used, outputs produced




[Figure: example workflow with the elements lister, get pathways by genes1, merge pathways, gene_id, concat gene pathway ids, output, and pathway_genes]
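A minimal sketch of how such a trace could be captured: each step is run in order, and one record is kept per task with the names of the inputs it used and the output it produced. The step functions, the `GENE_TO_PATHWAYS` data, and the names `Invocation` and `run_and_trace` are all hypothetical; this is not how any particular workflow engine records provenance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """One task execution recorded in the trace."""
    task: str
    inputs: tuple
    outputs: tuple


def run_and_trace(steps, initial):
    """Run (task, function, input names, output name) steps in order and
    record, for each one, what it consumed and what it produced."""
    values = dict(initial)
    trace = []
    for task, fn, in_names, out_name in steps:
        args = [values[name] for name in in_names]
        values[out_name] = fn(*args)
        trace.append(Invocation(task, tuple(in_names), (out_name,)))
    return values, trace


# Made-up lookup table standing in for a pathway service.
GENE_TO_PATHWAYS = {"g1": ["p1", "p2"], "g2": ["p2", "p3"]}

STEPS = [
    ("get_pathways_by_genes",
     lambda genes: [GENE_TO_PATHWAYS[g] for g in genes],
     ["gene_id"], "pathways_per_gene"),
    ("merge_pathways",
     lambda lists: sorted({p for sub in lists for p in sub}),
     ["pathways_per_gene"], "pathway_genes"),
]

values, trace = run_and_trace(STEPS, {"gene_id": ["g1", "g2"]})
```

After the run, `trace` lists the two invocations in execution order, and `values["pathway_genes"]` holds the merged pathway list.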
Why provenance matters, if done right
• To establish quality, relevance, trust
• To track information attribution through complex transformations
• To describe one’s experiment to others, for understanding / reuse
• To provide evidence in support of scientific claims
• To enable post hoc process analysis for improvement, re-design




The W3C Incubator on Provenance has been collecting numerous use cases:
http://www.w3.org/2005/Incubator/prov/wiki/Use_Cases#




What users expect to learn

• Causal relations:
  – which pathways come from which genes?
  – which processes contributed to producing an image?
  – which process(es) caused data to be incorrect?
  – which data caused a process to fail?

• Process and data analytics:
  – analyze variations in output vs an input parameter sweep (multiple process runs)
  – how often has my favourite service been executed? on what inputs?
  – who produced this data?
  – how often does this pathway turn up when the input genes range over a certain set S?

[Figure: the example workflow (lister, get pathways by genes1, merge pathways, gene_id, concat gene pathway ids, output, pathway_genes)]
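Some of these questions can be answered from a per-invocation trace, as the sketch below suggests. The `TRACE` records, the gene and pathway names, and the three query functions are invented for illustration; a real provenance store would expose its own query interface.

```python
from collections import Counter, defaultdict

# Hypothetical trace: (service name, input gene, output pathways) per invocation.
TRACE = [
    ("get_pathways_by_genes", "geneA", ["path1", "path2"]),
    ("get_pathways_by_genes", "geneB", ["path2"]),
    ("get_pathways_by_genes", "geneC", ["path3"]),
]


def genes_per_pathway(trace):
    """Causal query: which input genes led to each output pathway?"""
    origin = defaultdict(set)
    for _service, gene, pathways in trace:
        for pathway in pathways:
            origin[pathway].add(gene)
    return dict(origin)


def invocation_counts(trace):
    """Analytics query: how often has each service been executed?"""
    return Counter(service for service, _gene, _pathways in trace)


def pathway_frequency(trace, gene_set):
    """Analytics query: how often does each pathway turn up when the
    input genes range over gene_set?"""
    counts = Counter()
    for _service, gene, pathways in trace:
        if gene in gene_set:
            counts.update(pathways)
    return counts
```

For example, `pathway_frequency(TRACE, {"geneA", "geneB"})` counts `path2` twice, since both genes map to it in this made-up trace.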
Open Provenance Model
• graph of causal dependencies involving data and processors
• not necessarily generated by a workflow!
• v1.0.1 currently open for comments

Goal: standardize causal dependencies to enable provenance metadata exchange

[Figure: the two basic edge types, A --wasGeneratedBy(R)--> P and P --used(R)--> A (A an artifact, P a process, R a role), and an example graph: A1 --wgb(R1)--> P3, A2 --wgb(R2)--> P3, P3 --used(R3)--> A3, P3 --used(R4)--> A4, A3 --wgb(R5)--> P1, A4 --wgb(R6)--> P2]
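The two edge types can be sketched as a small data structure. The edge list below is one reading of the slide's example graph (an assumption, since the layout is ambiguous), and `Edge`, `is_well_formed`, and `candidate_sources` are names invented for this example rather than part of any OPM toolkit.

```python
from dataclasses import dataclass

ARTIFACTS = {"A1", "A2", "A3", "A4"}
PROCESSES = {"P1", "P2", "P3"}


@dataclass(frozen=True)
class Edge:
    effect: str     # the node that depends on the cause
    relation: str   # "used" or "wasGeneratedBy"
    role: str
    cause: str


EDGES = [
    Edge("A1", "wasGeneratedBy", "R1", "P3"),
    Edge("A2", "wasGeneratedBy", "R2", "P3"),
    Edge("P3", "used", "R3", "A3"),
    Edge("P3", "used", "R4", "A4"),
    Edge("A3", "wasGeneratedBy", "R5", "P1"),
    Edge("A4", "wasGeneratedBy", "R6", "P2"),
]


def is_well_formed(edge):
    """used goes from a process to an artifact;
    wasGeneratedBy goes from an artifact to a process."""
    if edge.relation == "used":
        return edge.effect in PROCESSES and edge.cause in ARTIFACTS
    if edge.relation == "wasGeneratedBy":
        return edge.effect in ARTIFACTS and edge.cause in PROCESSES
    return False


def candidate_sources(artifact, edges):
    """Artifacts used by the process that generated `artifact`.
    These are candidates only: the graph alone does not prove that the
    output was derived from every input of its generating process."""
    producers = {e.cause for e in edges
                 if e.relation == "wasGeneratedBy" and e.effect == artifact}
    return {e.cause for e in edges
            if e.relation == "used" and e.effect in producers}
```

With this reading, `candidate_sources("A1", EDGES)` yields `A3` and `A4`, the two artifacts that P3 used.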
The 3rd provenance challenge

• Chosen workflow from the Pan-STARRS project
  – Panoramic Survey Telescope & Rapid Response System

• http://twiki.ipaw.info/bin/view/Challenge/ThirdProvenanceChallenge

• Goal:
  – demonstrate “provenance interoperability” at query level
The 3rd provenance challenge workflow

[Figure: workflow steps labelled read input file, load database, and verify]
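OPM and query-interoperability

[Figure: Team A encodes workflow W as WA, runs WA to obtain prov(WA), and executes query Q to obtain Q(prov(WA)). Team A also exports prov(WA) as OPM(prov(WA)). Team B imports it as PWA = import(OPM(prov(WA))) and executes the same query Q to obtain Q(PWA). A "?" sits between Q(prov(WA)) and Q(PWA): do the two answers agree?]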
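The round trip in the figure can be sketched as follows, under several assumptions: Team A's native format, the JSON edge encoding, and all four function names are hypothetical, and the step names are borrowed from the challenge workflow only for flavour. The point is that both teams answer the same query Q over different internal representations.

```python
import json

# Team A's native provenance (hypothetical format): one record per process run.
PROV_WA = [
    {"process": "read_input_file", "inputs": ["file.csv"], "outputs": ["rows"]},
    {"process": "load_database", "inputs": ["rows"], "outputs": ["table"]},
    {"process": "verify", "inputs": ["table"], "outputs": ["report"]},
]


def export_opm(native):
    """Team A: flatten native records into OPM-style causal edges (JSON)."""
    edges = []
    for rec in native:
        edges += [{"effect": rec["process"], "relation": "used", "cause": a}
                  for a in rec["inputs"]]
        edges += [{"effect": a, "relation": "wasGeneratedBy", "cause": rec["process"]}
                  for a in rec["outputs"]]
    return json.dumps(edges)


def import_opm(document):
    """Team B: rebuild its own structure, node -> list of (relation, cause)."""
    graph = {}
    for e in json.loads(document):
        graph.setdefault(e["effect"], []).append((e["relation"], e["cause"]))
    return graph


def query_native(native, artifact):
    """Q on Team A's system: all processes that contributed to `artifact`."""
    found, frontier = set(), {artifact}
    while frontier:
        producers = [r for r in native if frontier & set(r["outputs"])]
        frontier = set()
        for rec in producers:
            if rec["process"] not in found:
                found.add(rec["process"])
                frontier |= set(rec["inputs"])
    return found


def query_imported(graph, artifact):
    """Q on Team B's system: the same question over the imported graph."""
    processes, seen, stack = set(), {artifact}, [artifact]
    while stack:
        for relation, cause in graph.get(stack.pop(), []):
            if relation == "wasGeneratedBy":
                processes.add(cause)
            if cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return processes


PWA = import_opm(export_opm(PROV_WA))
```

Query-level interoperability holds for this toy case when `query_native(PROV_WA, "report")` and `query_imported(PWA, "report")` return the same set of processes.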
OPM in Taverna
(skippable)

➡ the answer to any TP query can be viewed as an OPM graph
➡ encoded as RDF/XML (using the Tupelo provenance API)
Additional requirements
• Artifact values require a uniform, common identifier scheme
  – each group used artifacts to refer to its own data results
  – but those results were expressed using proprietary naming conventions
  – Linked Data in OPM?

• OPM accounts for structural causal relationships
  – additional domain-specific knowledge required
  – attaching semantic annotations to OPM graph nodes

• OPM graphs can grow very large
  – reduce size by exporting only query results
     • Taverna approach
  – multiple levels of abstraction
     • through OPM accounts (“points of view”)
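One way to picture accounts as "points of view" is to label each edge with the accounts it belongs to and filter on that label. This is a deliberately simplified model, assumed for illustration only: the edge data, the account names `coarse` and `fine`, and the `view` helper are all hypothetical.

```python
# Hypothetical edges: (effect, relation, cause, accounts the edge belongs to).
EDGES = [
    ("result", "wasGeneratedBy", "workflow", {"coarse"}),
    ("workflow", "used", "input", {"coarse"}),
    ("result", "wasGeneratedBy", "merge", {"fine"}),
    ("merge", "used", "partial", {"fine"}),
    ("partial", "wasGeneratedBy", "fetch", {"fine"}),
    ("fetch", "used", "input", {"fine"}),
]


def view(edges, account):
    """Keep only the edges that belong to the given account."""
    return [(effect, relation, cause)
            for effect, relation, cause, accounts in edges
            if account in accounts]


coarse = view(EDGES, "coarse")  # the workflow seen as a single black box
fine = view(EDGES, "fine")      # the same run, step by step
```

Both views explain how `result` depends on `input`, but the coarse one does so in two edges instead of four, which is the size reduction the slide refers to.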
Query results as OPM graphs

[Figure: Team A encodes W as WA, runs WA to obtain prov(WA), and executes query Q to obtain Q(prov(WA)). Only the query result is exported: export Q(prov(WA)) yields OPM(Q(prov(WA))).]

- Approach implemented in Taverna 2.1
- Internal provenance DB with ad hoc query language
- To be released soon
Full-fledged data-mediated collaborations

[Figure: experiment A (workflow A + input A) and experiment B (workflow B + input B) each yield result datasets and result provenance, and result A becomes input B. The two are packaged in a single Research Object A+B holding result datasets A, result datasets B, and result provenance A+B.]

Provenance composition accounts for implicit collaboration.

Aligned with the focus of the upcoming Provenance Challenge 4:
“connect my provenance to yours” into a whole OPM provenance graph.
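Composition of this kind depends on both experiments naming the shared artifact identically. The sketch below assumes a common identifier scheme (the `urn:example:` names are invented) and shows that a plain union of edges then connects the two graphs; `compose` and `join_points` are hypothetical helpers.

```python
# Hypothetical OPM-style edges (effect, relation, cause) from two experiments.
PROV_A = [
    ("urn:example:result-A", "wasGeneratedBy", "workflow-A"),
    ("workflow-A", "used", "urn:example:input-A"),
]
PROV_B = [
    ("urn:example:result-B", "wasGeneratedBy", "workflow-B"),
    ("workflow-B", "used", "urn:example:result-A"),  # result A reused as input B
]


def compose(*graphs):
    """Union the edge lists; nodes with the same identifier become one node."""
    combined = []
    for graph in graphs:
        for edge in graph:
            if edge not in combined:
                combined.append(edge)
    return combined


def join_points(graph_a, graph_b):
    """Artifacts generated in graph_a and used in graph_b."""
    generated = {effect for effect, relation, _cause in graph_a
                 if relation == "wasGeneratedBy"}
    used = {cause for _effect, relation, cause in graph_b
            if relation == "used"}
    return generated & used


PROV_AB = compose(PROV_A, PROV_B)
```

If the two groups had used different local names for result A, `join_points(PROV_A, PROV_B)` would be empty and the composed graph would fall apart into two disconnected pieces.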
Contacts

The myGrid Consortium (Manchester, Southampton)
http://mygrid.org.uk
http://www.myexperiment.org

Janus Provenance
Me: pmissier@acm.org

Weitere ähnliche Inhalte

Was ist angesagt?

Duplicate Detection of Records in Queries using Clustering
Duplicate Detection of Records in Queries using ClusteringDuplicate Detection of Records in Queries using Clustering
Duplicate Detection of Records in Queries using ClusteringIJORCS
 
Capabilities Brief Analytics
Capabilities Brief AnalyticsCapabilities Brief Analytics
Capabilities Brief AnalyticsDataTactics
 
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...IJORCS
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...BrianDeCost
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsAnubhav Jain
 
Assessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data AnalysisAssessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data AnalysisAnubhav Jain
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...Anubhav Jain
 
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...Anubhav Jain
 
DataONE_cobb_hubbub2012_20120924_v05
DataONE_cobb_hubbub2012_20120924_v05DataONE_cobb_hubbub2012_20120924_v05
DataONE_cobb_hubbub2012_20120924_v05John Cobb
 
Materials Informatics and Python
Materials Informatics and PythonMaterials Informatics and Python
Materials Informatics and PythonShintaro Fukushima
 
Statistical Analysis of Web of Data Usage
Statistical Analysis of Web of Data UsageStatistical Analysis of Web of Data Usage
Statistical Analysis of Web of Data UsageMarkus Luczak-Rösch
 
Automating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAutomating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAnubhav Jain
 
EDF2012 Peter Boncz - LOD benchmarking SRbench
EDF2012   Peter Boncz - LOD benchmarking SRbenchEDF2012   Peter Boncz - LOD benchmarking SRbench
EDF2012 Peter Boncz - LOD benchmarking SRbenchEuropean Data Forum
 

Was ist angesagt? (15)

Duplicate Detection of Records in Queries using Clustering
Duplicate Detection of Records in Queries using ClusteringDuplicate Detection of Records in Queries using Clustering
Duplicate Detection of Records in Queries using Clustering
 
Capabilities Brief Analytics
Capabilities Brief AnalyticsCapabilities Brief Analytics
Capabilities Brief Analytics
 
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
Privacy Preserving MFI Based Similarity Measure For Hierarchical Document Clu...
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...
 
Dmdw1
Dmdw1Dmdw1
Dmdw1
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials Informatics
 
Assessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data AnalysisAssessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data Analysis
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
 
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...
 
DataONE_cobb_hubbub2012_20120924_v05
DataONE_cobb_hubbub2012_20120924_v05DataONE_cobb_hubbub2012_20120924_v05
DataONE_cobb_hubbub2012_20120924_v05
 
Materials Informatics and Python
Materials Informatics and PythonMaterials Informatics and Python
Materials Informatics and Python
 
Statistical Analysis of Web of Data Usage
Statistical Analysis of Web of Data UsageStatistical Analysis of Web of Data Usage
Statistical Analysis of Web of Data Usage
 
Automating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAutomating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomate
 
EDF2012 Peter Boncz - LOD benchmarking SRbench
EDF2012   Peter Boncz - LOD benchmarking SRbenchEDF2012   Peter Boncz - LOD benchmarking SRbench
EDF2012 Peter Boncz - LOD benchmarking SRbench
 

Kürzlich hochgeladen (20)

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Scientific Workflow Management System Research Objects

  • 1. Scientific Workflow Management System, Janus Provenance. Research objects, myExperiment, and Open Provenance for collaborative E-science. REPRISE workshop, IDCC’09. Paolo Missier, Information Management Group, School of Computer Science, University of Manchester, UK, with additional material by Sean Bechhofer and Matthew Gamble, e-Labs design group, University of Manchester. IDCC’09, London - P.Missier
  • 4. Momentum on sharing and collaboration. Special issue of Nature on Data Sharing (Sept. 2009) • timeliness requires rapid sharing • repurposing • the Human Genome project use case • Ongoing debate in several communities – Clinical trials [1] – Earth Sciences: ESIP, data preservation / stewardship, 2009 – Long established in some communities: Atmospheric sciences, 1998 [2] • Open Science recommendations from Science Commons (July 2008) [link]. Sources: The Toronto group: Toronto International Data Release Workshop Authors, Nature 461, 168–169 (2009). Prepublication data sharing: Nature 461, 168-170 (10 September 2009) | doi:10.1038/461168a; published online 9 September 2009. http://www.nature.com/news/specials/datasharing/index.html
  • 5–11. Reference scenario [diagram, built up over seven slides]. Labels: workflow specification; workflow + input dataset; execution; outcome (data); outcome (provenance); Research Object Packaging; browse, query, unbundle, reuse; data-mediated implicit collaboration.
  • 12. Collaboration through data. What is needed for B to make sense of A’s data? 1. Packaging: – standards for self-descriptive data + metadata bundles: Research Objects 2. Content: – data format standardization efforts – metadata representation • process provenance – workflow provenance 3. Container: – a repository for Research Objects
  • 16–22. Paul’s Pack / Paul’s Research Object (QTL) [diagram, built up over seven slides]. Items shown: Workflow 16, Workflow 13, Results, Logs, Slides, Paper, Common pathways. Relations between items: produces, Included in, Published in, Feeds into. Annotations added step by step: Representation, Domain Relations, Aggregation, Metadata.
  • 23. ORE: representing generic aggregations. Resource Map: data structure (descriptor). http://www.openarchives.org/ore/1.0/primer.html section 4. A. Pepe, M. Mayernik, C.L. Borgman, and H. Van de Sompel, "From Artifacts to Aggregations: Modeling Scientific Life Cycles on the Semantic Web," Journal of the American Society for Information Science and Technology (JASIST), to appear, 2009.
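As a rough illustration of the Resource Map idea, the sketch below writes a descriptor for one aggregation as plain subject/predicate/object triples. The `ore:describes` and `ore:aggregates` terms belong to the ORE vocabulary; the `example.org` URIs and the helper name `resource_map` are hypothetical, and a real system would more likely rely on an RDF library than on tuples.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def resource_map(rem_uri, aggregation_uri, resource_uris):
    """Return triples for a Resource Map that describes one aggregation."""
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    # One ore:aggregates statement per aggregated resource.
    for uri in resource_uris:
        triples.append((aggregation_uri, ORE + "aggregates", uri))
    return triples


# Hypothetical URIs, chosen only for illustration.
rem = resource_map(
    "http://example.org/ro/1/rem",
    "http://example.org/ro/1",
    ["http://example.org/workflow/16", "http://example.org/results/1"],
)
aggregated = [o for s, p, o in rem if p == ORE + "aggregates"]
```

The descriptor and the aggregation get separate URIs, which is what lets the same bundle be described without being confused with its own metadata.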
  • 24.
  • 25–27. Content: Workflow provenance. A detailed trace of workflow execution - tasks performed, data transformations - inputs used, outputs produced. [Workflow diagram, labels: lister; get pathways by genes1; merge pathways; gene_id; concat gene pathway ids; output; pathway_genes]
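To make "a detailed trace of workflow execution" concrete, here is a minimal sketch of a recorder that logs each task together with the inputs it used and the output it produced. The `Trace` class and the two processor functions are hypothetical stand-ins named after labels in the diagram; this is not how Taverna captures provenance.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Collects one event per executed task."""
    events: list = field(default_factory=list)

    def run(self, task, func, **inputs):
        # Execute the task, then record what went in and what came out.
        output = func(**inputs)
        self.events.append({"task": task, "inputs": dict(inputs), "output": output})
        return output


# Hypothetical stand-ins for two processors named in the diagram.
def get_pathways_by_genes(gene_id):
    return [f"{gene_id}/pathway-a", f"{gene_id}/pathway-b"]


def merge_pathways(pathways):
    return sorted(set(pathways))


trace = Trace()
pathways = trace.run("get_pathways_by_genes", get_pathways_by_genes, gene_id="gene-1")
merged = trace.run("merge_pathways", merge_pathways, pathways=pathways)
```

After the run, `trace.events` holds the tasks in execution order, and the output of the first event reappears as an input of the second, which is the dependency a provenance query would later follow.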
  • 28. Why provenance matters, if done right • To establish quality, relevance, trust • To track information attribution through complex transformations • To describe one’s experiment to others, for understanding / reuse • To provide evidence in support of scientific claims • To enable post hoc process analysis for improvement, re-design. The W3C Incubator on Provenance has been collecting numerous use cases: http://www.w3.org/2005/Incubator/prov/wiki/Use_Cases#
  • 29. What users expect to learn • Causal relations: - which pathways come from which genes? - which processes contributed to producing an image? - which process(es) caused data to be incorrect? - which data caused a process to fail? • Process and data analytics: – analyze variations in output vs an input parameter sweep (multiple process runs) – how often has my favourite service been executed? on what inputs? – who produced this data? – how often does this pathway turn up when the input genes range over a certain set S?
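The two kinds of question above can be sketched over a toy execution log: a causal query walks backwards from a data item to everything it depends on, while an analytics query simply counts executions. The record layout, the identifiers in `RUNS`, and the function names are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical run records: (process, inputs, output).
RUNS = [
    ("get_pathways", ["gene:1"], "pathways:1"),
    ("get_pathways", ["gene:2"], "pathways:2"),
    ("merge", ["pathways:1", "pathways:2"], "merged"),
]


def upstream(item, runs):
    """Causal query: every data item that `item` depends on."""
    produced_from = {out: ins for _, ins, out in runs}
    seen, stack = set(), [item]
    while stack:
        current = stack.pop()
        for parent in produced_from.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def execution_counts(runs):
    """Analytics query: how often each process was executed."""
    return Counter(process for process, _, _ in runs)
```

For instance, `upstream("merged", RUNS)` answers "which genes does this result come from?", and `execution_counts(RUNS)["get_pathways"]` answers "how often has this service been executed?".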
  • 30. Open Provenance Model • graph of causal dependencies involving data and processors • not necessarily generated by a workflow! • v1.0.1 currently open for comments • Goal: standardize causal dependencies to enable provenance metadata exchange. [Diagram: edge types A wasGeneratedBy(R) P and P used(R) A; example graph with artifacts A1–A4, processes P1–P3, and edges wgb(R1), wgb(R2), used(R3), used(R4), wgb(R5), wgb(R6)]
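A minimal sketch of the two edge types, assuming a simple in-memory representation: `used` edges run from a process to an artifact, `wasGeneratedBy` edges from an artifact to a process, and each carries a role. The class name, the role names, and the wiring of the example graph are hypothetical and do not reproduce the graph on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class OPMGraph:
    used: list = field(default_factory=list)              # (process, role, artifact)
    was_generated_by: list = field(default_factory=list)  # (artifact, role, process)

    def add_used(self, process, role, artifact):
        self.used.append((process, role, artifact))

    def add_was_generated_by(self, artifact, role, process):
        self.was_generated_by.append((artifact, role, process))

    def candidate_sources(self, artifact):
        """Artifacts used by the process(es) that generated `artifact`.

        These are candidates only: a process using A and generating B
        does not by itself prove that B was derived from A.
        """
        generators = {p for a, _, p in self.was_generated_by if a == artifact}
        return {a for p, _, a in self.used if p in generators}


# Hypothetical wiring, for illustration only.
graph = OPMGraph()
graph.add_used("P1", "input", "A1")
graph.add_was_generated_by("A3", "output", "P1")
graph.add_used("P2", "input", "A2")
graph.add_was_generated_by("A4", "output", "P2")
```

Because nothing here refers to a workflow engine, the same structure can describe provenance that was recorded by hand or by any other system.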
  • 31. The 3rd provenance challenge • Chosen workflow from the Pan-STARRS project – Panoramic Survey Telescope & Rapid Response System • http://twiki.ipaw.info/bin/view/Challenge/ThirdProvenanceChallenge • Goal: – demonstrate “provenance interoperability” at query level
  • 32–33. The 3rd provenance challenge workflow. [Workflow diagram, stages: read input file; load database; verify]
  • 34–36. OPM and query-interoperability [diagram, built up over three slides]. Team A: encode W as WA; execute run WA, giving prov(WA); query Q gives Q(prov(WA)); export prov(WA) as OPM(prov(WA)). Team B: import, PWA = import(OPM(prov(WA))); execute query Q, giving Q(PWA). The last build adds a “?”, presumably asking whether Q(PWA) agrees with Q(prov(WA)).
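The round trip in the diagram can be sketched as follows: Team A exports its provenance to a neutral exchange document, Team B imports it, and both run the same query Q. JSON is used here purely as a stand-in for an OPM serialization, and the edge data and function names are hypothetical.

```python
import json


def export_opm(edges):
    """Team A: serialize (effect, relation, cause) edges to an exchange document."""
    return json.dumps({"edges": [list(edge) for edge in sorted(edges)]})


def import_opm(document):
    """Team B: rebuild the edge set from the exchange document."""
    return {tuple(edge) for edge in json.loads(document)["edges"]}


def query_causes(edges, node):
    """Q: the direct causes of `node`."""
    return {cause for effect, _, cause in edges if effect == node}


# Hypothetical provenance of one run of W_A.
prov_WA = {("A2", "wasGeneratedBy", "P1"), ("P1", "used", "A1")}
P_WA = import_opm(export_opm(prov_WA))
```

Query-level interoperability then amounts to checking that `query_causes(P_WA, x)` equals `query_causes(prov_WA, x)` for the nodes of interest. In this toy case the check is trivial, since both teams share one data model; between independent systems it is the hard part.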
  • 37–39. OPM in Taverna (skippable) ➡ the answer to any TP query can be viewed as an OPM graph ➡ encoded as RDF/XML (using the Tupelo provenance API)
  • 41–43. Additional requirements • Artifact values require a uniform common identifier scheme – each group used artifacts to refer to its own data results – but those results were expressed using proprietary naming conventions – Linked Data in OPM? • OPM accounts for structural causal relationships – additional domain-specific knowledge required – attaching semantic annotations to OPM graph nodes • OPM graphs can grow very large – reduce size by exporting only query results • Taverna approach – multiple levels of abstraction • through OPM accounts (“points of view”)
  • 44–48. Query results as OPM graphs [diagram, built up over five slides]. Encode W as WA; execute run WA, giving prov(WA); query Q gives Q(prov(WA)). Instead of exporting the full trace as OPM(prov(WA)), export only the query result as OPM(Q(prov(WA))). - Approach implemented in Taverna 2.1 - Internal provenance DB with ad hoc query language - To be released soon
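One way to picture "export only the query result" is to select, from the full trace, just the edges that lie on the lineage of the item being asked about. The sketch below assumes edges stored as (effect, relation, cause) tuples; the function name and sample graph are hypothetical, and Taverna's own query language is not shown.

```python
def lineage_subgraph(edges, target):
    """Edges reachable from `target` by following effect -> cause links."""
    by_effect = {}
    for edge in edges:
        by_effect.setdefault(edge[0], []).append(edge)
    selected, visited, stack = set(), set(), [target]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        for edge in by_effect.get(node, []):
            selected.add(edge)
            stack.append(edge[2])  # continue from the cause
    return selected


# Hypothetical full trace with two independent branches.
full_trace = {
    ("A3", "wasGeneratedBy", "P1"), ("P1", "used", "A1"),
    ("A4", "wasGeneratedBy", "P2"), ("P2", "used", "A2"),
}
answer = lineage_subgraph(full_trace, "A3")
```

Here `answer` keeps only the branch that leads to `A3`, so the exported graph is half the size of the full trace while still being a well-formed graph in its own right.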
  • 49–55. Full-fledged data-mediated collaborations [diagram, built up over seven slides]. Exp. A: workflow A + input A; Research Object A with result datasets A and result provenance A. Exp. B: workflow B + input B, with result A → input B; Research Object B with result datasets B and result provenance B. Combined view: workflow A + input A, workflow B + input B, result A → input B; Research Object A+B with result datasets A, result datasets B, and result provenance A+B. Provenance composition accounts for implicit collaboration. Aligned with focus of upcoming Provenance Challenge 4: “connect my provenance to yours" into a whole OPM provenance graph.
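Provenance composition can be sketched as a union of two edge sets that happen to share an identifier: once B's input is recognised as A's result, a lineage query on B's result reaches back into A's experiment. The identifiers and function names below are hypothetical, and the join only works if both sides name the shared artifact identically, which is the common identifier requirement raised earlier.

```python
def compose(*graphs):
    """Union of several provenance graphs given as sets of edges."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def all_causes(edges, node):
    """Transitive causes of `node`, following effect -> cause links."""
    found, frontier = set(), {node}
    while frontier:
        frontier = {cause for effect, _, cause in edges if effect in frontier} - found
        found |= frontier
    return found


# Hypothetical traces of two experiments; "resultA" is the shared artifact.
prov_a = {("resultA", "wasGeneratedBy", "workflowA"), ("workflowA", "used", "inputA")}
prov_b = {("resultB", "wasGeneratedBy", "workflowB"), ("workflowB", "used", "resultA")}
prov_ab = compose(prov_a, prov_b)
```

Queried alone, `prov_b` stops at `resultA`; queried after composition, `all_causes(prov_ab, "resultB")` also contains `workflowA` and `inputA`, which is the implicit collaboration made explicit.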
  • 56. Contacts. The myGrid Consortium (Manchester, Southampton): http://mygrid.org.uk, http://www.myexperiment.org. Janus Provenance. Me: pmissier@acm.org