SlideShare ist ein Scribd-Unternehmen logo
1 von 26
Work at ISI,
                 Current Status,
                   Next Steps

                       Daniel Garijo Verdejo,
                           Oscar Corcho,
                            Yolanda Gil


Ontology Engineering Group. Laboratorio de Inteligencia Artificial
            Departamento de Inteligencia Artificial
                   Facultad de Informática
              Universidad Politécnica de Madrid
                                                           Date: 29/11/2012
What am I working on?


•Creation of abstractions in scientific workflows
   •Workflow Traces and template representation
       •Provenance representation
       •Plan representation

   •Abstraction catalog

   •Find ways to link the definitions to the provenance traces automatically


•Understandability and reuse of scientific workflows




                                                                               2
Outline


Index

1.   Motivation
2.   Overview
3.   Workflow systems used
4.   Summary of work done in my previous visit to ISI
     •   OPMW and provenance publishing
5. Summary of work done before second visit to ISI
     •   Workflow motif catalog
6. Summary of work done in my second visit to ISI
     •   OPMW-PROV and P-PLAN
     •   Automatic macro abstraction detection
7. Next Steps
8. Future work

                                                            3
Motivation


•As a designer: Discovery

   •Workflows with similar functionality fragments/methods

   •Design based in previous templates.

•As user/reuser: Understandability

   •Search workflows by functionality

   •Commonalities between execution runs

   •Component categorization




                                                                    4
Overview

                                             Descriptions/
Abstraction definitions and categorization   PSMS/Ontologies


                                             Data mining tools,
                                             graph analysis, etc.
   Algorithms for finding the different
       abstractions automatically



                                             RDF Stores
         Experiment publication


                                             Vocabularies
   Provenance                 Plan
 representation          representation



                                                                    5
Taverna and Wings




                                                        http://www.taverna.org.uk/




http://www.wings-workflows.org/




                                                                                       6
                              IEEE eScience 2012. Chicago, USA
Summary: Previous Work at ISI


Abstractions definitions and categorization




   Algorithms for finding the different
       abstractions automatically



                                              Virtuoso,
         Experiment Publication               Pubby,
                                              Wings (+Plugin)

                                              OPMW
    Provenance                 Plan
  representation          representation



                                                                7
High level architecture


                                                                                    Other
                                                                                    workflow
                  WINGS on local laptop                                             environments
                         Workflow
          Core           Template        OPM
         Portal          Workflow       export
                         Instance
                                                               Programatic access
                                                                 (external apps)
                  WINGS on shared host
                         Workflow                  Linked
          Core           Template        OPM
         Portal                         export      Data
                         Workflow
                         Instance                Publication       Interactive
                  WINGS on web server
                                                                    Browsing
                         Workflow                               (Pubby frontend)
          Core           Template      OPM
         Portal                       export                                            Users
                        Workflow
                        Instance



Wings workflow                OPM
                                                 Publication      Share               Reuse
  generation                conversion




                                                                                                8
OPMW: Process view




               9
OPMW: Attribution view




                   10
Work previous to second visit to ISI


                                              Motif Detection
Abstractions definitions and categorization




   Algorithms for automatic matching




                                              Virtuoso,
         Experiment Publication               Pubby,
                                              Wings (+Plugin)

    Provenance                 Plan            OPMW
  representation          representation



                                                                 11
Overview



      • Empirical analysis on 177 workflow templates from Taverna and
      Wings


      • Catalog of recurring patterns: scientific
      workflow motifs.

               • Data Oriented Motifs

               • Workflow Oriented Motifs


      •Understandability and reuse

http://sensefinancial.com/wp-content/uploads/2012/02/contribution.jpg
                                                                                                  12
                                                          IEEE eScience 2012. Chicago, USA
Approach


•Reverse-engineer the set of current practices in workflow
development through an analysis of empirical evidence

•Identify workflow abstractions that would facilitate
understandability and therefore effective re-use




                                                                  13
                      IEEE eScience 2012. Chicago, USA
Workflow Motifs


•Workflow motif: Domain independent conceptual abstraction on the workflow
steps.
1. Data-oriented motifs: What kind of manipulations does the workflow have?

     •E.g.:
          •Data retrieval
          •Data preparation
          • etc.
                                                          WHAT?
2.   Workflow-oriented motifs: How does the workflow perform its operations?

     •E.g.:
          •Stateful steps
          •Stateless steps
          •Human interactions
          •etc.
                                                                 HOW?
                                                                                14
                              IEEE eScience 2012. Chicago, USA
Motif Catalog
Data-Oriented Motifs                   Workflow-Oriented Motifs

  Data Retrieval                            Intra-Workflow Motifs
  Data Preparation                                      Stateful (Asynchronous) Invocations
          Format Transformation                         Stateless (Synchronous) Invocations
          Input Augmentation                            Internal Macros
           and Output Splitting
                                                        Human Interactions
          Data Organisation

  Data Analysis                             Inter-Workflow Motifs

  Data Curation/Cleaning                                Atomic Workflows

                                                         Composite Workflows
  Data Moving

  Data Visualisation                                    Workflow Overloading

                                                                                         15
                              IEEE eScience 2012. Chicago, USA
Summary: Work done at ISI


                                                 Motif Detection
Abstractions definitions and categorization


                                                  SUBDUE
                                                  exploration and
                                 Macro
Algorithms for automatic                          integration in
                               abstraction
        matching                                  RDF
                                detection



                                                 Virtuoso,
         Experiment Publication                  Pubby,
                                                 Wings (+Plugin)

    Provenance                  Plan              OPMW + PROV
  representation           representation         + P-PLAN



                                                                    16
PROV Compatibility

•OPMW fits naturally into PROV
    •Same usage-generation
    structure
    •Extension for the scientific
    workflow with PROV

•Binary relationships (no n-ary
patterns used).
    •Simplicity

•Publication of PROV as well as
OPMW.
    •Queries can be answered in
    both languages.
    •Flexibility.

•http://www.opmw.org/node/8



                                    17
P-PLAN




•Plans are not provenance
•P-PLAN: Simple plan model for binding traces to template representations
•Aligned with OPMW and PROV
•Documentation in progress
                                                                               18
Summary: Work done at ISI


                                                 Motif Detection
Abstractions definitions and categorization


                                                  SUBDUE
                                                  exploration and
                                 Macro
Algorithms for automatic                          integration in
                               abstraction
        matching                                  RDF
                                detection



                                                 Virtuoso,
         Experiment Publication                  Pubby,
                                                 Wings (+Plugin)

    Provenance                  Plan              OPMW + PROV
  representation           representation         + P-PLAN



                                                                    19
Macro abstraction detection

Problem statement:

Given a repository of workflow templates (either abstract or specific) or workflow
execution traces, what are the workflow fragments I can deduce from it?


Useful for:
•Systems like Taverna and Wings: (Many templates, little annotation to relate them)
     •Finding relationships between workflows and sub-workflows.
          •Most used fragments, most executed, etc.

    •Systems like GenePattern and Galaxy: (Many runs, nearly no templates
    published)
         •Proposing new templates with the popular fragments.




                                                                                      20
Macro abstraction detection

•Work in Progress (implementation and evaluation)
   •WINGS traces

•Similar to Sub-graph Isomorphism
•Kind of “Graph Clustering”

•Early results
     •Tool for finding common sub-graphs
          •Sequential graphs
          •Efficient
          •Scalable.

    •Integration with RDF (by me)

•TO DO:
    •Finish implementation: inference.
    •Evaluation!!

                                                                            21
Next Steps




       22
Next Steps


•Thesis:
    •Finish up implementation.
    •How to evaluate results?

•Publications:
    •Workshop:
         •Provenance Corpus (with Taverna Team). To have something citable
    •Conference:
         •KCAP: Macro detection implementation and evaluation.
    •Journal
         •Decay analysis publication in journal (January)
         •OPMW - PROV -P-PLAN publication in journal (December)
         •Motif extension publication in journal (Invited by special issue)
         (Now)




                                                                              23
Future work


•Thesis:

    •Other methods for detecting workflow abstraction automatically
        •Metadata and file analysis (diff, etc.): Filter, merge, etc.
        •Provenance reconstruction.

•Project:

    •RO model specifications
    •Testcases
    •Workflow abstraction with Isoco




                                                                            24
Thanks !
                     :oscarCorcho                                   :yolandGil




                             :supervises
                                                  :supervises

                                                    :danielGarijo
:khalidBelhajjame
                                                                                 :varunRatnakar
                                                            :collaboratesWith
                     :collaboratesWith




                              :collaboratesWith
                                                  :collaboratesWith


                    :caroleGoble




                                                          :pinarAlper

                                                                                                       25
Work at ISI,
        relation with wf4Ever,
              future steps
                        Daniel Garijo Verdejo

Ontology Engineering Group. Laboratorio de Inteligencia Artificial
            Departamento de Inteligencia Artificial
                   Facultad de Informática
              Universidad Politécnica de Madrid




                                                           Date: 03/10/2011

Weitere ähnliche Inhalte

Ähnlich wie Status update OEG - Nov 2012

Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...
Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...
Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...Cloudera, Inc.
 
Dreamforce_2012_Hadoop_Use_Cases
Dreamforce_2012_Hadoop_Use_CasesDreamforce_2012_Hadoop_Use_Cases
Dreamforce_2012_Hadoop_Use_CasesNarayan Bharadwaj
 
Online performance modeling and analysis of message-passing parallel applicat...
Online performance modeling and analysis of message-passing parallel applicat...Online performance modeling and analysis of message-passing parallel applicat...
Online performance modeling and analysis of message-passing parallel applicat...MOCA Platform
 
Chisimba - introduction to practical demo
Chisimba - introduction to practical demoChisimba - introduction to practical demo
Chisimba - introduction to practical demoDerek Keats
 
SharePoint 2010 as a Development Platform
SharePoint 2010 as a Development PlatformSharePoint 2010 as a Development Platform
SharePoint 2010 as a Development PlatformAyman El-Hattab
 
The SENSORIA Development Environment
The SENSORIA Development EnvironmentThe SENSORIA Development Environment
The SENSORIA Development EnvironmentIstvan Rath
 
Webinar - An Open Source Platform for Business Process Mining
Webinar - An Open Source Platform for Business Process MiningWebinar - An Open Source Platform for Business Process Mining
Webinar - An Open Source Platform for Business Process MiningSpagoWorld
 
Everything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieEverything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieChicago Hadoop Users Group
 
Alfresco day madrid jeff potts - activiti
Alfresco day madrid   jeff potts - activitiAlfresco day madrid   jeff potts - activiti
Alfresco day madrid jeff potts - activitiAlfresco Software
 
Alfresco Day Madrid - Jeff Potts - Activiti
Alfresco Day Madrid - Jeff Potts - ActivitiAlfresco Day Madrid - Jeff Potts - Activiti
Alfresco Day Madrid - Jeff Potts - ActivitiToni de la Fuente
 
Code Reuse Made Easy: Uncovering the Hidden Gems of Corporate and Open Source...
Code Reuse Made Easy: Uncovering the Hidden Gems of Corporate and Open Source...Code Reuse Made Easy: Uncovering the Hidden Gems of Corporate and Open Source...
Code Reuse Made Easy: Uncovering the Hidden Gems of Corporate and Open Source...Perforce
 
OCCiware A Formal and Tooled Toolchain For Managing Everything as a Service
OCCiware A Formal and Tooled Toolchain For Managing Everything as a Service OCCiware A Formal and Tooled Toolchain For Managing Everything as a Service
OCCiware A Formal and Tooled Toolchain For Managing Everything as a Service OCCIware
 
Single API for library services (poster)
Single API for library services (poster)Single API for library services (poster)
Single API for library services (poster)Milan Janíček
 
Open Source
Open SourceOpen Source
Open Sourceblamb
 
What's new in Nuxeo 5.2? - Solutions Linux 2009
What's new in Nuxeo 5.2? - Solutions Linux 2009What's new in Nuxeo 5.2? - Solutions Linux 2009
What's new in Nuxeo 5.2? - Solutions Linux 2009Stefane Fermigier
 
EclipseConEurope2012 SOA - Models As Operational Documentation
EclipseConEurope2012 SOA - Models As Operational DocumentationEclipseConEurope2012 SOA - Models As Operational Documentation
EclipseConEurope2012 SOA - Models As Operational DocumentationMarc Dutoo
 

Ähnlich wie Status update OEG - Nov 2012 (20)

Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...
Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...
Hadoop World 2011: Proven Tools to Manage Hadoop Environments - Joey Jablonsk...
 
Dreamforce_2012_Hadoop_Use_Cases
Dreamforce_2012_Hadoop_Use_CasesDreamforce_2012_Hadoop_Use_Cases
Dreamforce_2012_Hadoop_Use_Cases
 
Introducing spring
Introducing springIntroducing spring
Introducing spring
 
Online performance modeling and analysis of message-passing parallel applicat...
Online performance modeling and analysis of message-passing parallel applicat...Online performance modeling and analysis of message-passing parallel applicat...
Online performance modeling and analysis of message-passing parallel applicat...
 
Chisimba - introduction to practical demo
Chisimba - introduction to practical demoChisimba - introduction to practical demo
Chisimba - introduction to practical demo
 
SharePoint 2010 as a Development Platform
SharePoint 2010 as a Development PlatformSharePoint 2010 as a Development Platform
SharePoint 2010 as a Development Platform
 
The SENSORIA Development Environment
The SENSORIA Development EnvironmentThe SENSORIA Development Environment
The SENSORIA Development Environment
 
Using R with Hadoop
Using R with HadoopUsing R with Hadoop
Using R with Hadoop
 
Webinar - An Open Source Platform for Business Process Mining
Webinar - An Open Source Platform for Business Process MiningWebinar - An Open Source Platform for Business Process Mining
Webinar - An Open Source Platform for Business Process Mining
 
Everything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieEverything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about Oozie
 
Alfresco day madrid jeff potts - activiti
Alfresco day madrid   jeff potts - activitiAlfresco day madrid   jeff potts - activiti
Alfresco day madrid jeff potts - activiti
 
Alfresco Day Madrid - Jeff Potts - Activiti
Alfresco Day Madrid - Jeff Potts - ActivitiAlfresco Day Madrid - Jeff Potts - Activiti
Alfresco Day Madrid - Jeff Potts - Activiti
 
Code Reuse Made Easy: Uncovering the Hidden Gems of Corporate and Open Source...
Code Reuse Made Easy: Uncovering the Hidden Gems of Corporate and Open Source...Code Reuse Made Easy: Uncovering the Hidden Gems of Corporate and Open Source...
Code Reuse Made Easy: Uncovering the Hidden Gems of Corporate and Open Source...
 
Spring fundamentals
Spring fundamentalsSpring fundamentals
Spring fundamentals
 
OCCiware A Formal and Tooled Toolchain For Managing Everything as a Service
OCCiware A Formal and Tooled Toolchain For Managing Everything as a Service OCCiware A Formal and Tooled Toolchain For Managing Everything as a Service
OCCiware A Formal and Tooled Toolchain For Managing Everything as a Service
 
Single API for library services (poster)
Single API for library services (poster)Single API for library services (poster)
Single API for library services (poster)
 
Open Source
Open SourceOpen Source
Open Source
 
2013-01-17 Research Object
2013-01-17 Research Object2013-01-17 Research Object
2013-01-17 Research Object
 
What's new in Nuxeo 5.2? - Solutions Linux 2009
What's new in Nuxeo 5.2? - Solutions Linux 2009What's new in Nuxeo 5.2? - Solutions Linux 2009
What's new in Nuxeo 5.2? - Solutions Linux 2009
 
EclipseConEurope2012 SOA - Models As Operational Documentation
EclipseConEurope2012 SOA - Models As Operational DocumentationEclipseConEurope2012 SOA - Models As Operational Documentation
EclipseConEurope2012 SOA - Models As Operational Documentation
 

Mehr von dgarijo

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesdgarijo
 
FAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Futuredgarijo
 
Towards Reusable Research Software
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Softwaredgarijo
 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationdgarijo
 
A Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasetsdgarijo
 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphsdgarijo
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadatadgarijo
 
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...dgarijo
 
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Datadgarijo
 
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...dgarijo
 
Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019dgarijo
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Sciencedgarijo
 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...dgarijo
 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologiesdgarijo
 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narrativesdgarijo
 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflowsdgarijo
 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Softwaredgarijo
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineeringdgarijo
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesdgarijo
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overviewdgarijo
 

Mehr von dgarijo (20)

FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
 
FAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Future
 
Towards Reusable Research Software
Towards Reusable Research SoftwareTowards Reusable Research Software
Towards Reusable Research Software
 
SOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentationSOMEF: a metadata extraction framework from software documentation
SOMEF: a metadata extraction framework from software documentation
 
A Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed DatasetsA Template-Based Approach for Annotating Long-Tailed Datasets
A Template-Based Approach for Annotating Long-Tailed Datasets
 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
 
Towards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software MetadataTowards Knowledge Graphs of Reusable Research Software Metadata
Towards Knowledge Graphs of Reusable Research Software Metadata
 
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
 
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular DataWDPlus: Leveraging Wikidata to Link and Extend Tabular Data
WDPlus: Leveraging Wikidata to Link and Extend Tabular Data
 
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
OKG-Soft: An Open Knowledge Graph With Mathine Readable Scientific Software M...
 
Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019Towards Human-Guided Machine Learning - IUI 2019
Towards Human-Guided Machine Learning - IUI 2019
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
 
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
A Controlled Crowdsourcing Approach for Practical Ontology Extensions and Met...
 
WIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting OntologiesWIDOCO: A Wizard for Documenting Ontologies
WIDOCO: A Wizard for Documenting Ontologies
 
Towards Automating Data Narratives
Towards Automating Data NarrativesTowards Automating Data Narratives
Towards Automating Data Narratives
 
Automated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific WorkflowsAutomated Hypothesis Testing with Large Scale Scientific Workflows
Automated Hypothesis Testing with Large Scale Scientific Workflows
 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Software
 
OEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology EngineeringOEG tools for supporting Ontology Engineering
OEG tools for supporting Ontology Engineering
 
Software Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciencesSoftware Metadata: Describing "dark software" in GeoSciences
Software Metadata: Describing "dark software" in GeoSciences
 
Reproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An OverviewReproducibility Using Semantics: An Overview
Reproducibility Using Semantics: An Overview
 

Kürzlich hochgeladen

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Kürzlich hochgeladen (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Status update OEG - Nov 2012

  • 1. Work at ISI, Current Status, Next Steps Daniel Garijo Verdejo, Oscar Corcho, Yolanda Gil Ontology Engineering Group. Laboratorio de Inteligencia Artificial Departamento de Inteligencia Artificial Facultad de Informática Universidad Politécnica de Madrid Date: 29/11/2012
  • 2. What am I working on? •Creation of abstractions in scientific workflows •Workflow Traces and template representation •Provenance representation •Plan representation •Abstraction catalog •Find ways to link the definitions to the provenance traces automatically •Understandability and reuse of scientific workflows 2
  • 3. Outline Index 1. Motivation 2. Overview 3. Workflow systems used 4. Summary of work done in my previous visit to ISI • OPMW and provenance publishing 5. Summary of work done before second visit to ISI • Workflow motif catalog 6. Summary of work done in my second visit to ISI • OPMW-PROV and P-PLAN • Automatic macro abstraction detection 7. Next Steps 8. Future work 3
  • 4. Motivation •As a designer: Discovery •Workflows with similar functionality fragments/methods •Design based in previous templates. •As user/reuser: Understandability •Search workflows by functionality •Commonalities between execution runs •Component categorization 4
  • 5. Overview Descriptions/ Abstraction definitions and categorization PSMS/Ontologies Data mining tools, graph analysis, etc. Algorithms for finding the different abstractions automatically RDF Stores Experiment publication Vocabularies Provenance Plan representation representation 5
  • 6. Taverna and Wings http://www.taverna.org.uk/ http://www.wings-workflows.org/ 6 IEEE eScience 2012. Chicago, USA
  • 7. Summary: Previous Work at ISI Abstractions definitions and categorization Algorithms for finding the different abstractions automatically Virtuoso, Experiment Publication Pubby, Wings (+Plugin) OPMW Provenance Plan representation representation 7
  • 8. High level architecture Other workflow WINGS on local laptop environments Workflow Core Template OPM Portal Workflow export Instance Programatic access (external apps) WINGS on shared host Workflow Linked Core Template OPM Portal export Data Workflow Instance Publication Interactive WINGS on web server Browsing Workflow (Pubby frontend) Core Template OPM Portal export Users Workflow Instance Wings workflow OPM Publication Share Reuse generation conversion 8
  • 11. Work previous to second visit to ISI Motif Detection Abstractions definitions and categorization Algorithms for automatic matching Virtuoso, Experiment Publication Pubby, Wings (+Plugin) Provenance Plan OPMW representation representation 11
  • 12. Overview • Empirical analysis on 177 workflow templates from Taverna and Wings • Catalog of recurring patterns: scientific workflow motifs. • Data Oriented Motifs • Workflow Oriented Motifs •Understandability and reuse http://sensefinancial.com/wp-content/uploads/2012/02/contribution.jpg 12 IEEE eScience 2012. Chicago, USA
  • 13. Approach •Reverse-engineer the set of current practices in workflow development through an analysis of empirical evidence •Identify workflow abstractions that would facilitate understandability and therefore effective re-use 13 IEEE eScience 2012. Chicago, USA
  • 14. Workflow Motifs •Workflow motif: Domain independent conceptual abstraction on the workflow steps. 1. Data-oriented motifs: What kind of manipulations does the workflow have? •E.g.: •Data retrieval •Data preparation • etc. WHAT? 2. Workflow-oriented motifs: How does the workflow perform its operations? •E.g.: •Stateful steps •Stateless steps •Human interactions •etc. HOW? 14 IEEE eScience 2012. Chicago, USA
  • 15. Motif Catalog Data-Oriented Motifs Workflow-Oriented Motifs Data Retrieval Intra-Workflow Motifs Data Preparation Stateful (Asynchronous) Invocations Format Transformation Stateless (Synchronous) Invocations Input Augmentation Internal Macros and Output Splitting Human Interactions Data Organisation Data Analysis Inter-Workflow Motifs Data Curation/Cleaning Atomic Workflows Composite Workflows Data Moving Data Visualisation Workflow Overloading 15 IEEE eScience 2012. Chicago, USA
  • 16. Summary: Work done at ISI Motif Detection Abstractions definitions and categorization SUBDUE exploration and Macro Algorithms for automatic integration in abstraction matching RDF detection Virtuoso, Experiment Publication Pubby, Wings (+Plugin) Provenance Plan OPMW + PROV representation representation + P-PLAN 16
  • 17. PROV Compatibility •OPMW fits naturally into PROV •Same usage-generation structure •Extension for the scientific workflow with PROV •Binary relationships (no n-ary patterns used). •Simplicity •Publication of PROV as well as OPMW. •Queries can be answered in both languages. •Flexibility. •http://www.opmw.org/node/8 17
  • 18. P-PLAN •Plans are not provenance •P-PLAN: Simple plan model for binding traces to template representations •Aligned with OPMW and PROV •Documentation in progress 18
  • 19. Summary: Work done at ISI Motif Detection Abstractions definitions and categorization SUBDUE exploration and Macro Algorithms for automatic integration in abstraction matching RDF detection Virtuoso, Experiment Publication Pubby, Wings (+Plugin) Provenance Plan OPMW + PROV representation representation + P-PLAN 19
  • 20. Macro abstraction detection Problem statement: Given a repository of workflow templates (either abstract or specific) or workflow execution traces, what are the workflow fragments I can deduce from it? Useful for: •Systems like Taverna and Wings: (Many templates, little annotation to relate them) •Finding relationships between workflows and sub-workflows. •Most used fragments, most executed, etc. •Systems like GenePattern and Galaxy: (Many runs, nearly no templates published) •Proposing new templates with the popular fragments. 20
  • 21. Macro abstraction detection •Work in Progress (implementation and evaluation) •WINGS traces •Similar to Sub-graph Isomorphism •Kind of “Graph Clustering” •Early results •Tool for finding common sub-graphs •Sequential graphs •Efficient •Scalable. •Integration with RDF (by me) •TO DO: •Finish implementation: inference. •Evaluation!! 21
  • 23. Next Steps •Thesis: •Finish up implementation. •How to evaluate results? •Publications: •Workshop: •Provenance Corpus (with Taverna Team). To have something citable •Conference: •KCAP: Macro detection implementation and evaluation. •Journal •Decay analysis publication in journal (January) •OPMW - PROV -P-PLAN publication in journal (December) •Motif extension publication in journal (Invited by special issue) (Now) 23
  • 24. Future work •Thesis: •Other methods for detecting workflow abstraction automatically •Metadata and file analysis (diff, etc.): Filter, merge, etc. •Provenance reconstruction. •Project: •RO model specifications •Testcases •Workflow abstraction with Isoco 24
  • 25. Thanks ! :oscarCorcho :yolandGil :supervises :supervises :danielGarijo :khalidBelhajjame :varunRatnakar :collaboratesWith :collaboratesWith :collaboratesWith :collaboratesWith :caroleGoble :pinarAlper 25
  • 26. Work at ISI, relation with wf4Ever, future steps Daniel Garijo Verdejo Ontology Engineering Group. Laboratorio de Inteligencia Artificial Departamento de Inteligencia Artificial Facultad de Informática Universidad Politécnica de Madrid Date: 03/10/2011