Towards Reproducible Science: a few building blocks from my personal experience
Oscar Corcho (with contributions from Olga Giraldo, Alexander García, and Idafen Santana)
Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
http://www.oeg-upm.net/index.php/en/researchareas/3-semanticscience/index.html
ocorcho@fi.upm.es
@ocorcho
22/10/2017
S4BioDiv2017, Vienna
Towards Reproducible Science
Introduction
HYPOTHESIS CONVINCE
AUDIENCE
REPEATABLE
SCIENTIFIC EXPERIMENTS
SCIENTIFIC EXPERIMENTS
IN VIVO/VITRO IN SILICO
REPEATABILITY
Alison’s
biodiversity
scientists
Before continuing…
What does reproducibility mean for you?
And for your colleagues?
And for the colleagues from other disciplines?
The R* brouhaha
Source: The R* brouhaha. Goble C. RDA-Europe’s workshop on RepScience 2016.
My own take on terminology
PRESERVATION
CONSERVATION
REPLICABILITY
REPRODUCIBILITY
Experiment components
DATA, SCIENTIFIC PROCEDURE, EQUIPMENT
IN VIVO/VITRO, IN SILICO
This has attracted most of the attention so far
Block 1. Experimental Protocols
Olga Giraldo
Alexander García
Explore alternative ways for documenting and retrieving information from experimental protocols
Using Semantics and NLP in the SMART Protocols Repository. Giraldo O, García-Castro A, Corcho O - ICBO, 2015
Using Semantics and Natural Language Processing in Experimental Protocols. Giraldo O, García-Castro A, Figueredo J, Corcho O - J Biomedical Semantics, to appear
SMART protocols: semantic representation for experimental protocols. Giraldo O, García-Castro A, Corcho O - Linked Science 2014
What is an experimental protocol
• Experimental protocols are like cooking recipes
• They have ingredients: reagents and sample
• They have appliances: equipment
• They have a list of instructions
• They have a total time
• They have critical steps…
The protocols should have complete information that allows anybody to recreate an experiment.
Some of the issues we aim at addressing
• Incubate the centrifuge tubes in a water bath.
• Incubate the samples for 5 min with gentle shaking.
• Rinse DNA briefly in 1-2 ml of wash.
• Incubate at -20C overnight.
• Some protocols present insufficient granularity.
• The instructions can be imprecise or ambiguous due to the use of natural language.
• The protocols lack structure.
Ingredients for Improving Reproducibility
Bio-ontologies: OBI, EXPO, EXACT, BAO, IAO, ERO…
Data repository for making data available
Resources for reporting guidelines or Minimum Information standards
Few efforts focus on representing and standardizing experimental protocols.
For reproducibility purposes, if the data must be available, so must the experimental protocol detailing the methodology followed to derive the data.
Main research question
How to formalize the information from laboratory protocols as a knowledge base?
Our approach
• Ontology model representing lab protocols
• Gazetteer-based method: use existing lists of named entities
  o Lists of proper nouns, which refer to real-life entities
• Rule-based approaches: write manual extraction rules
• Development of a Gold Standard of protocols annotated manually
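The gazetteer-based and rule-based steps listed above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' pipeline: the tiny gazetteer, the category names and the single regular-expression rule for durations are all invented for the example.

```python
import re

# Hypothetical gazetteer: category -> known entity names (illustrative only).
GAZETTEER = {
    "instrument": ["centrifuge tubes", "water bath", "microscope"],
    "reagent": ["glucose", "DMEM/F12"],
    "sample": ["DNA", "neural stem cells"],
}

# Hypothetical hand-written rule: a number followed by a time unit is a duration.
DURATION_RULE = re.compile(r"\b\d+(?:\.\d+)?\s*(?:s|sec|min|h|hours?)\b")


def annotate(text):
    """Return (start, end, category, matched_text) tuples, sorted by position."""
    annotations = []
    # Gazetteer lookup: match each listed name as a whole term, ignoring case.
    for category, names in GAZETTEER.items():
        for name in names:
            pattern = re.compile(
                r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE
            )
            for match in pattern.finditer(text):
                annotations.append(
                    (match.start(), match.end(), category, match.group(0))
                )
    # Rule-based extraction: apply the manual rule to the same text.
    for match in DURATION_RULE.finditer(text):
        annotations.append((match.start(), match.end(), "duration", match.group(0)))
    return sorted(annotations)


if __name__ == "__main__":
    for annotation in annotate("Incubate the centrifuge tubes in a water bath for 5 min."):
        print(annotation)
```

A gazetteer only finds names it already lists, which is why the approach pairs it with hand-written rules and a manually annotated gold standard for checking the output.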
SMART Protocols ontology
http://vocab.linkeddata.es/SMARTProtocols/
https://smartprotocols.github.io/
The SIRO model
Sample/Specimen (whole organism, anatomical part, bodily fluids, etc.)
Instruments (equipment, devices, consumables, software)
Reagents (chemical compounds, mixtures)
Objective (purpose)
The SIRO model supports search, retrieval and classification of experimental protocols
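As a rough sketch of how a SIRO description could support search and retrieval, the example below stores the four SIRO facets in a small record and filters protocols by facet. The class, the field names, the search function and the two protocol titles are assumptions made for illustration; they are not taken from the SMART Protocols ontology.

```python
from dataclasses import dataclass, field


@dataclass
class ProtocolDescription:
    """SIRO-style summary of one protocol (field names are illustrative)."""
    title: str
    samples: list = field(default_factory=list)
    instruments: list = field(default_factory=list)
    reagents: list = field(default_factory=list)
    objective: str = ""


def search(protocols, *, sample=None, instrument=None, reagent=None):
    """Return protocols matching every facet that was given (case-insensitive)."""
    def has(values, wanted):
        return wanted is None or wanted.lower() in (v.lower() for v in values)

    return [
        p for p in protocols
        if has(p.samples, sample)
        and has(p.instruments, instrument)
        and has(p.reagents, reagent)
    ]


# Hypothetical records; titles are invented for the example.
PROTOCOLS = [
    ProtocolDescription(
        title="Migration assay (example)",
        samples=["neural stem cells"],
        instruments=["microscope", "cell culture incubator"],
        reagents=["glucose", "DMEM/F12"],
        objective="Assess neural migration in vitro",
    ),
    ProtocolDescription(
        title="Insect cell culture (example)",
        samples=["insect cells"],
        instruments=["spinner", "centrifuge", "flask"],
        reagents=["phenol:chloroform"],
    ),
]

if __name__ == "__main__":
    for hit in search(PROTOCOLS, reagent="glucose"):
        print(hit.title)
```

Keeping the facets as separate fields is what makes faceted queries such as "protocols that use a centrifuge on insect cells" straightforward.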
Design of semantic Gazetteers and JAPE rules
Design of semantic Gazetteers
• Facilitate the annotation of instances related to:
  o Experimental actions
  o Instruments
  o Samples/organisms
  o Reagents
Design of grammar rules
• Facilitate the annotation of instructions
Development of a Gold Standard
Materials: 100 protocols published in several repositories
• BioTechniques
• CSH-Protocols
• Current protocols
• Genet and Mol. Res
• Journal of Biolog. Methods
• Jove
• MethodsX
• Nature protocols exchange
• Nature protocols
Annotators: experts in life sciences
• Curso BIOS 2016, Colombia
• Universidad del Valle, Colombia
• Japan (Database Center for Life Science (DBCLS), Robotic Biology Institute (RBI), Spiber, Yachie-Lab, University of Tokyo)
• Universidad Santiago de Cali, Colombia
The SMART Protocols Annotation Tool: http://smart-protocols.labs.linkingdata.io/dist/dev/#/login
Guidelines about what and how to annotate
Preliminary results
Counts are the number of raters (out of 3) who assigned each entity to each category.

Fleiss Kappa for 3 raters = 1.0
Group       Entity                                           sample  instrument  reagent  objective
Sample      Neural cell                                      3       0           0        0
Sample      neural stem cells (NSCs)                         3       0           0        0
Instrument  Cell culture centrifuge                          0       3           0        0
Instrument  cell culture incubator                           0       3           0        0
Instrument  Microscope                                       0       3           0        0
Instrument  Millicell culture plate inserts 8-µm pore size   0       3           0        0
Reagent     B27 supplement                                   0       0           3        0
Reagent     DMEM/F12                                         0       0           3        0
Reagent     FGF2 neutralizing antibody                       0       0           3        0
Reagent     glucose                                          0       0           3        0
Objective   (text below)                                     0       0           0        3
Objective text: "Here we describe two migration assays, a matrigel migration assay and a Boyden chamber migration assay, which allow the in vitro assessment of neural migration under defined conditions" (Ladewig, Koch and Brüstle, 2014).

Fleiss Kappa for 3 raters = 0.755
Group                      Entity                 sample  instrument  reagent
Reagent - Sample/Organism  Ac-omega viral DNA     1       -           2
Reagent - Sample/Organism  baculoviral            1       -           2
Reagent - Sample/Organism  DNA insert             2       -           1
Reagent - Sample/Organism  I-Sce I meganuclease   1       -           2
Sample/Organism            Insect cells           3       -           -
Instrument                 spinner                -       3           -
Instrument                 Centrifuge             -       3           -
Instrument                 Flask                  -       3           -
Reagent                    IPL-41 powdered        -       -           3
Reagent                    Liposome formulation   -       -           3
Phenol:chloroform (Reagent) is listed with 3 in the reagent column.
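For readers unfamiliar with the agreement measure, here is a minimal sketch of the standard Fleiss' kappa computation. It is applied only to the fully unanimous annotation counts (the case reported as 1.0); the 0.755 figure is not recomputed here, because the flattened slide text does not make every cell position of that table certain. The function name and input layout are choices made for this example.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters assigning item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")

    # Observed agreement: mean of the per-item agreement P_i.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement P_e from the overall share of each category.
    n_categories = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_categories)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)

    # Undefined (division by zero) if all ratings fall in a single category.
    return (p_bar - p_e) / (1 - p_e)


# Columns: sample, instrument, reagent, objective. Every entity was placed in
# the same category by all 3 raters: 2 samples, 4 instruments, 4 reagents, 1 objective.
UNANIMOUS = (
    [[3, 0, 0, 0]] * 2
    + [[0, 3, 0, 0]] * 4
    + [[0, 0, 3, 0]] * 4
    + [[0, 0, 0, 3]]
)

if __name__ == "__main__":
    print(fleiss_kappa(UNANIMOUS))
```

Kappa corrects raw agreement for the agreement expected by chance, so split decisions on a few entities can lower it noticeably even when most rows are unanimous.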
Our ongoing work
• So far, this is ok for handling protocols that have already been reported in papers
Can we actually change the way in which these protocols are produced?
Platform for publishing semantic protocols
Features:
• Open semantic publishing platform
  o The protocols are born semantic
• Self-describing documents
  o Meaningful entities
  o Machine-processable workflows
• Documents will reference existing URIs
  o Samples/organisms
  o Reagents/chemical compounds
  o Instruments
SMART Protocols Ontology / Gazetteers / Grammar rules
UniProt
NCBI taxonomy
PubChem
Vendors
The platform
Platform available at: http://smartprotocols.labs.linkingdata.io/app/protocols
Capturing relevant elements in the document
Organisms come from the UniProt Taxon API
After selecting an organism, the corresponding ID is automatically recorded
Reagents come from the PubChem API
Machine-processable workflows
(Figure: five blocks, each labeled Step)
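One way to picture a machine-processable workflow is as an ordered list of steps whose inputs are URIs and whose parameters are explicit values rather than free text. The sketch below is hypothetical: the class and field names, the parameter keys and the example.org URIs are placeholders, not the platform's actual schema or identifiers.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    """One instruction; field names are illustrative, not a published schema."""
    action: str
    inputs: list = field(default_factory=list)       # URIs of samples, reagents, instruments
    parameters: dict = field(default_factory=dict)   # explicit values instead of free text


@dataclass
class Protocol:
    title: str
    steps: list = field(default_factory=list)

    def to_json(self):
        # asdict() converts nested Step objects into plain dictionaries.
        return json.dumps(asdict(self), indent=2)


def missing_parameters(step, required=("temperature_celsius", "duration_min")):
    """List the required parameters that a step does not state explicitly."""
    return [name for name in required if name not in step.parameters]


# Placeholder URIs under example.org; a real document would reference existing identifiers.
protocol = Protocol(
    title="Hypothetical DNA wash",
    steps=[
        Step("incubate", ["https://example.org/sample/1"],
             {"duration_min": 5, "agitation": "gentle shaking"}),
        Step("incubate", ["https://example.org/sample/1"],
             {"temperature_celsius": -20}),
    ],
)

if __name__ == "__main__":
    print(protocol.to_json())
    for number, step in enumerate(protocol.steps, start=1):
        print(number, step.action, "missing:", missing_parameters(step))
```

With steps in this form, vague instructions such as an incubation without a stated temperature can be flagged automatically instead of being discovered at the bench.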
Final edited protocol, also available as bioschemas
Block 2. Computational Environments
Idafen Santana
Is it possible to describe the main properties of the Execution Environment of a Computational Scientific Experiment and, based on this description, derive a reproduction process for generating an equivalent environment using virtualization techniques?
Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies. Santana-Pérez I. PhD thesis, 2016. http://oa.upm.es/39520/
Experiment components
DATA, SCIENTIFIC PROCEDURE, EQUIPMENT
IN SILICO
A Research Object bundles and relates digital resources of a scientific experiment or investigation using standard mechanisms, “tool middleware”
http://www.w3.org/community/rosc/
http://www.researchobject.org/
Open Research Problems
• Computational Infrastructures are usually a predefined element of a Computational Scientific Workflow.
• Execution Environments are poorly described.
• Current reproducibility approaches for computational experiments consider mostly data and procedure.
Representation
• Describing execution environments
(Diagram: FORMER EQUIPMENT, ANNOTATE, SEMANTIC ANNOTATIONS, REPRODUCE, EQUIVALENT EXECUTION ENVIRONMENT, CLOUD)
• WICUS ontology network
  o Workflow Infrastructure Conservation Using Semantics
  o http://purl.org/net/wicus
  o 5 ontologies
    • WICUS Workflow Execution Requirements ontology
    • WICUS Software Stack ontology
    • WICUS Hardware Specs ontology
    • WICUS Scientific Virtual Appliance ontology
    • WICUS Ontology: links the previous ontologies
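To give a feel for what descriptions of software stacks, hardware specs and virtual appliances make possible, the sketch below derives an installation order from declared dependencies and picks appliances that satisfy a hardware requirement. It is an illustration only: the component names, the dictionaries and both functions are assumptions for the example and do not use the WICUS vocabularies or reproduce the WICUS system.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical annotations: each software component and what it depends on.
SOFTWARE_STACK = {
    "workflow": {"workflow-engine", "blast"},
    "workflow-engine": {"python"},
    "blast": set(),
    "python": set(),
}

# Hypothetical hardware requirement and candidate virtual appliances.
REQUIRED = {"cpu_cores": 2, "memory_gb": 4}
APPLIANCES = {
    "small": {"cpu_cores": 1, "memory_gb": 2},
    "medium": {"cpu_cores": 2, "memory_gb": 8},
}


def deployment_plan(stack):
    """Order components so that every dependency is installed first."""
    return list(TopologicalSorter(stack).static_order())


def suitable_appliances(required, appliances):
    """Names of appliances that meet or exceed every required value."""
    return sorted(
        name for name, specs in appliances.items()
        if all(specs.get(key, 0) >= value for key, value in required.items())
    )


if __name__ == "__main__":
    print("install order:", deployment_plan(SOFTWARE_STACK))
    print("candidates:", suitable_appliances(REQUIRED, APPLIANCES))
```

The point of the sketch is that once the environment is described as data, the reproduction steps can be computed rather than remembered.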
WICUS ontology network
• WICUS Workflow Execution Requirements ontology
  o http://purl.org/net/wicus-reqs
• WICUS Software Stack ontology
  o http://purl.org/net/wicus-stack
• WICUS Scientific Virtual Appliance ontology
  o http://purl.org/net/wicus-sva
• WICUS Hardware Specs ontology
  o http://purl.org/net/wicus-hwspecs
• WICUS ontology network
  o http://purl.org/net/wicus
WICUS system
• Overview, inputs and outputs
Evaluation
• Workflows reproduced
  o 3 scientific domains
  o 3 workflow management systems
  o 6 different workflows
Domain: Seismic, Astronomy, Bio
WMS: dispel4py, Pegasus, Makeflow
Name: xcorr, Internal Extinction, Montage, Epigenomics, SoyKB, BLAST
(2003) (2014) (2014) (2015) (2011) (2011)
Results
(Diagram: FORMER EQUIPMENT, ANNOTATE, SEMANTIC ANNOTATIONS, REPRODUCE, EQUIVALENT EXECUTION ENVIRONMENT, CLOUD, COMPARE)
• Non-deterministic
• Standard and error output
• Generated files equivalent

• Same results
• Results from Int. Extinction may vary

• Genomic data
• Exact match
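The COMPARE step can be illustrated with a simple file-level check between the outputs of the original run and those of the reproduced run. This is a hedged sketch rather than the evaluation procedure used in the talk: it assumes outputs are plain files in two directories and that byte-identical content is the criterion, which only suits deterministic workflows.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file, read in chunks so large outputs fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _digests(root):
    """Map each file's path relative to root onto its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*") if p.is_file()
    }


def compare_outputs(original_dir, reproduced_dir):
    """Return {relative_path: 'match' | 'differ' | 'missing' | 'extra'}."""
    original, reproduced = _digests(original_dir), _digests(reproduced_dir)
    report = {}
    for name in sorted(original.keys() | reproduced.keys()):
        if name not in reproduced:
            report[name] = "missing"   # produced originally, absent in the reproduction
        elif name not in original:
            report[name] = "extra"     # only the reproduction produced it
        elif original[name] == reproduced[name]:
            report[name] = "match"
        else:
            report[name] = "differ"
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        Path(a, "result.txt").write_text("42\n")
        Path(b, "result.txt").write_text("42\n")
        Path(a, "log.txt").write_text("run at 10:00\n")
        Path(b, "log.txt").write_text("run at 11:30\n")
        print(compare_outputs(a, b))
```

For non-deterministic workflows an exact digest comparison would report differences that are not failures, so a looser equivalence check would be needed there.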
Summarizing
• Two building blocks towards reproducibility of scientific experiments
  o In vivo/vitro
    • Focus on providing structured descriptions of methods (laboratory protocols)
    • Our tools: ontologies, gazetteers, NLP tools and automatic and manual annotation tools
    • Challenge: make protocols more structured (and semantic) from the beginning
  o In silico
    • Focus on the equipment (computational infrastructure) for workflow-based experiments
    • Ontologies, automatic and manual annotation tools, and an execution environment
    • Challenge: keep track of all types of appliances, and make scientists work on providing annotations
• Is this enough?
Clearly not, but a step forward towards ensuring reproducibility (with a focus on methods)
Light pollution (www.stars4all.eu)

Generación de datos estadísticos enlazados del Instituto Aragonés de EstadísticaOscar Corcho
 
Presentación de la red de excelencia de Open Data y Smart Cities
Presentación de la red de excelencia de Open Data y Smart CitiesPresentación de la red de excelencia de Open Data y Smart Cities
Presentación de la red de excelencia de Open Data y Smart CitiesOscar Corcho
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Oscar Corcho
 
Linked Statistical Data: does it actually pay off?
Linked Statistical Data: does it actually pay off?Linked Statistical Data: does it actually pay off?
Linked Statistical Data: does it actually pay off?Oscar Corcho
 
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Oscar Corcho
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityOscar Corcho
 

Mehr von Oscar Corcho (20)

Organisational Interoperability in Practice at Universidad Politécnica de Madrid
Organisational Interoperability in Practice at Universidad Politécnica de MadridOrganisational Interoperability in Practice at Universidad Politécnica de Madrid
Organisational Interoperability in Practice at Universidad Politécnica de Madrid
 
Introducción a los Datos Abiertos - Open Data Day 2020
Introducción a los Datos Abiertos - Open Data Day 2020Introducción a los Datos Abiertos - Open Data Day 2020
Introducción a los Datos Abiertos - Open Data Day 2020
 
Open Data (and Software, and other Research Artefacts) - A proper management
Open Data (and Software, and other Research Artefacts) -A proper managementOpen Data (and Software, and other Research Artefacts) -A proper management
Open Data (and Software, and other Research Artefacts) - A proper management
 
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos
Adiós a los ficheros, hola a los grafos de conocimientos estadísticosAdiós a los ficheros, hola a los grafos de conocimientos estadísticos
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos
 
Ontology Engineering at Scale for Open City Data Sharing
Ontology Engineering at Scale for Open City Data SharingOntology Engineering at Scale for Open City Data Sharing
Ontology Engineering at Scale for Open City Data Sharing
 
Situación de las iniciativas de Open Data internacionales (y algunas recomen...
Situación de las iniciativas de Open Data internacionales (y algunas recomen...Situación de las iniciativas de Open Data internacionales (y algunas recomen...
Situación de las iniciativas de Open Data internacionales (y algunas recomen...
 
STARS4ALL - Contaminación Lumínica
STARS4ALL - Contaminación LumínicaSTARS4ALL - Contaminación Lumínica
STARS4ALL - Contaminación Lumínica
 
Publishing Linked Statistical Data: Aragón, a case study
Publishing Linked Statistical Data: Aragón, a case studyPublishing Linked Statistical Data: Aragón, a case study
Publishing Linked Statistical Data: Aragón, a case study
 
An initial analysis of topic-based similarity among scientific documents base...
An initial analysis of topic-based similarity among scientific documents base...An initial analysis of topic-based similarity among scientific documents base...
An initial analysis of topic-based similarity among scientific documents base...
 
Linked Statistical Data 101
Linked Statistical Data 101Linked Statistical Data 101
Linked Statistical Data 101
 
Aplicando los principios de Linked Data en AEMET
Aplicando los principios de Linked Data en AEMETAplicando los principios de Linked Data en AEMET
Aplicando los principios de Linked Data en AEMET
 
Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016Ojo Al Data 100 - Call for sharing session at IODC 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016
 
Educando sobre datos abiertos: desde el colegio a la universidad
Educando sobre datos abiertos: desde el colegio a la universidadEducando sobre datos abiertos: desde el colegio a la universidad
Educando sobre datos abiertos: desde el colegio a la universidad
 
STARS4ALL general presentation at ALAN2016
STARS4ALL general presentation at ALAN2016STARS4ALL general presentation at ALAN2016
STARS4ALL general presentation at ALAN2016
 
Generación de datos estadísticos enlazados del Instituto Aragonés de Estadística
Generación de datos estadísticos enlazados del Instituto Aragonés de EstadísticaGeneración de datos estadísticos enlazados del Instituto Aragonés de Estadística
Generación de datos estadísticos enlazados del Instituto Aragonés de Estadística
 
Presentación de la red de excelencia de Open Data y Smart Cities
Presentación de la red de excelencia de Open Data y Smart CitiesPresentación de la red de excelencia de Open Data y Smart Cities
Presentación de la red de excelencia de Open Data y Smart Cities
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?
 
Linked Statistical Data: does it actually pay off?
Linked Statistical Data: does it actually pay off?Linked Statistical Data: does it actually pay off?
Linked Statistical Data: does it actually pay off?
 
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibility
 

Kürzlich hochgeladen

Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptxAlMamun560346
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxBhagirath Gogikar
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 
Unit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oUnit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oManavSingh202607
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Silpa
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Joonhun Lee
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑Damini Dixit
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 

Kürzlich hochgeladen (20)

Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Unit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oUnit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 o
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 

Towards Reproducible Science: a few building blocks from my personal experience

  • 1. Oscar Corcho (with contributions from Olga Giraldo, Alexander García, and Idafen Santana) http://www.oeg-upm.net/index.php/en/researchareas/3-semanticscience/index.html Ontology Engineering Group Universidad Politécnica de Madrid, Spain Towards Reproducible Science: a few building blocks from my personal experience ocorcho@fi.upm.es @ocorcho 22/10/2017 S4BioDiv2017, Vienna
  • 2. Introduction: HYPOTHESIS / CONVINCE AUDIENCE / REPEATABLE / SCIENTIFIC EXPERIMENTS
  • 3. Introduction: SCIENTIFIC EXPERIMENTS (IN VIVO/VITRO, IN SILICO). Alison’s biodiversity scientists
  • 4. Introduction: SCIENTIFIC EXPERIMENTS (IN VIVO/VITRO, IN SILICO). REPEATABILITY. Alison’s biodiversity scientists
  • 5. Before continuing… What does reproducibility mean for you? And for your colleagues? And for colleagues from other disciplines?
  • 6. The R* brouhaha. Source: The R* brouhaha. Goble C. RDA-Europe’s workshop on RepScience 2016.
  • 7. My own take on terminology: PRESERVATION, CONSERVATION
  • 8. My own take on terminology: PRESERVATION, CONSERVATION, REPLICABILITY, REPRODUCIBILITY
  • 9. Experiment components: DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN VIVO/VITRO, IN SILICO)
  • 10. Experiment components: DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN VIVO/VITRO, IN SILICO). This has attracted most of the attention so far
  • 11. Block 1. Experimental Protocols (Olga Giraldo, Alexander García). Explore alternative ways of documenting and retrieving information from experimental protocols
      o Using Semantics and NLP in the SMART Protocols Repository. Giraldo O, García-Castro A, Corcho O. ICBO, 2015
      o Using Semantics and Natural Language Processing in Experimental Protocols. Giraldo O, García-Castro A, Figueredo J, Corcho O. J Biomedical Semantics, to appear
      o SMART protocols: semantic representation for experimental protocols. Giraldo O, García-Castro A, Corcho O. Linked Science 2014
  • 12. What is an experimental protocol
      o Experimental protocols are like cooking recipes
      o They have ingredients: reagents and samples
      o They have appliances: equipment
      o They have a list of instructions
      o They have a total time
      o They have critical steps…
      Protocols should contain complete information that allows anybody to recreate an experiment.
  • 13. Some of the issues we aim to address
      o Example instructions:
          • Incubate the centrifuge tubes in a water bath.
          • Incubate the samples for 5 min with gentle shaking.
          • Rinse DNA briefly in 1-2 ml of wash.
          • Incubate at -20 °C overnight.
      o Some protocols present insufficient granularity
      o The instructions can be imprecise or ambiguous due to the use of natural language
      o The protocols lack structure
  • 14. Ingredients for Improving Reproducibility
      o Bio-ontologies: OBI, EXPO, EXACT, BAO, IAO, ERO…
      o Data repositories for making data available
      o Resources for reporting guidelines or Minimum Information standards
      o Few efforts focus on representing and standardizing experimental protocols. For reproducibility purposes, if the data must be available, so must the experimental protocol detailing the methodology followed to derive the data.
  • 15. Main research question: how can the information in laboratory protocols be formalized as a knowledge base?
  • 16. Our approach
      o Ontology model representing lab protocols
      o Gazetteer-based method: use existing lists of named entities (lists of proper nouns, which refer to real-life entities)
      o Rule-based approaches: write manual extraction rules
      o Development of a Gold Standard of manually annotated protocols
  • 17. SMART Protocols ontology: http://vocab.linkeddata.es/SMARTProtocols/ https://smartprotocols.github.io/
  • 18. The SIRO model
      o Sample/Specimen (whole organism, anatomical part, bodily fluids, etc.)
      o Instruments (equipment, devices, consumables, software)
      o Reagents (chemical compounds, mixtures)
      o Objective (purpose)
      The SIRO model supports search, retrieval and classification of experimental protocols
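One way to picture how the four SIRO facets could support search and retrieval is the small Python sketch below. It is only an illustration under stated assumptions: the `ProtocolRecord` class, the `search` function and both protocol titles are hypothetical and not part of the SMART Protocols ontology or tooling; the entity names are borrowed from the preliminary results slide.

```python
from dataclasses import dataclass, field


@dataclass
class ProtocolRecord:
    """One protocol described by the four SIRO facets (hypothetical schema)."""
    title: str
    samples: set[str] = field(default_factory=set)
    instruments: set[str] = field(default_factory=set)
    reagents: set[str] = field(default_factory=set)
    objective: str = ""


def search(records, *, sample=None, instrument=None, reagent=None):
    """Return the records matching every facet supplied (case-insensitive)."""
    def has(values, wanted):
        return wanted is None or wanted.lower() in {v.lower() for v in values}

    return [
        r for r in records
        if has(r.samples, sample)
        and has(r.instruments, instrument)
        and has(r.reagents, reagent)
    ]


# Illustrative records; the titles are made up for this example.
records = [
    ProtocolRecord(
        title="Neural migration assays",
        samples={"neural stem cells (NSCs)"},
        instruments={"cell culture incubator", "microscope"},
        reagents={"B27 supplement", "DMEM/F12", "glucose"},
        objective="In vitro assessment of neural migration",
    ),
    ProtocolRecord(
        title="Insect cell culture",
        samples={"insect cells"},
        instruments={"centrifuge", "flask"},
        reagents={"phenol:chloroform"},
        objective="Illustrative second record",
    ),
]

hits = search(records, instrument="Microscope", reagent="glucose")
print([r.title for r in hits])  # prints ['Neural migration assays']
```

Keeping each facet as a set makes faceted filtering a plain membership test; a real repository would more likely store ontology URIs than free-text labels.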
  • 19. Design of semantic Gazetteer and JAPE rules
      o Design of semantic gazetteers: facilitate the annotation of instances related to
          • Experimental actions
          • Instruments
          • Samples/organisms
          • Reagents
      o Design of grammar rules: facilitate the annotation of instructions
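The gazetteer idea can be sketched in a few lines of plain Python. This is not the GATE/JAPE pipeline named on the slide; it swaps in simple regular-expression lookup to show the principle. The `GAZETTEERS` dictionary, its entries and the `annotate` function are assumptions made for illustration, and the example sentences come from the "issues" slide.

```python
import re

# Hypothetical gazetteers: category -> known terms. Real lists would come
# from ontologies or curated vocabularies; these entries are illustrative.
GAZETTEERS = {
    "action": ["incubate", "rinse", "centrifuge"],
    "instrument": ["water bath", "centrifuge tubes"],
    "sample": ["DNA", "samples"],
}


def annotate(text, gazetteers=GAZETTEERS):
    """Return sorted (start, end, category, surface form) gazetteer matches."""
    candidates = []
    for category, terms in gazetteers.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                candidates.append((m.start(), m.end(), category, m.group(0)))

    # Prefer longer matches so "centrifuge tubes" wins over "centrifuge".
    candidates.sort(key=lambda c: c[1] - c[0], reverse=True)
    kept = []
    for cand in candidates:
        if all(cand[1] <= k[0] or cand[0] >= k[1] for k in kept):
            kept.append(cand)
    return sorted(kept)


for sentence in [
    "Incubate the centrifuge tubes in a water bath.",
    "Rinse DNA briefly in 1-2 ml of wash.",
]:
    for start, end, category, surface in annotate(sentence):
        print(f"{category:<10} {surface!r} [{start}:{end}]")
```

The longest-match rule is one simple way to handle a term such as "centrifuge", which can be an action or part of an instrument name; grammar rules would be the place to resolve such cases more carefully.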
  • 20. Development of a Gold Standard
      o 100 protocols published in several repositories
      o Materials: BioTechniques, CSH-Protocols, Current protocols, Genet and Mol. Res, Journal of Biolog. Methods, Jove, MethodsX, Nature protocols exchange, Nature protocols
      o Annotators, experts in life sciences:
          • Curso BIOS 2016, Colombia
          • Universidad del Valle, Colombia
          • Japan (Database Center for Life Science (DBCLS), Robotic Biology Institute (RBI), Spiber, Yachie-Lab, University of Tokyo)
          • Universidad Santiago de Cali, Colombia
      o The SMART Protocols Annotation Tool: http://smart-protocols.labs.linkingdata.io/dist/dev/#/login
      o Guidelines about what and how to annotate
  • 21. Preliminary results
      Category | Entity | sample | instrument | reagent | objective
      Sample | Neural cell | 3 | 0 | 0 | 0
      Sample | neural stem cells (NSCs) | 3 | 0 | 0 | 0
      Instrument | Cell culture centrifuge | 0 | 3 | 0 | 0
      Instrument | cell culture incubator | 0 | 3 | 0 | 0
      Instrument | Microscope | 0 | 3 | 0 | 0
      Instrument | Millicell culture plate inserts 8-µm pore size | 0 | 3 | 0 | 0
      Reagent | B27 supplement | 0 | 0 | 3 | 0
      Reagent | DMEM/F12 | 0 | 0 | 3 | 0
      Reagent | FGF2 neutralizing antibody | 0 | 0 | 3 | 0
      Reagent | glucose | 0 | 0 | 3 | 0
      Objective | Here we describe two migration assays, a matrigel migration assay and a Boyden chamber migration assay, which allow the in vitro assessment of neural migration under defined conditions (Ladewig, Koch and Brüstle, 2014). | 0 | 0 | 0 | 3
      Fleiss Kappa for 3 raters = 1.0

      Second table (columns: sample, instrument, reagent), counts as listed:
      o Reagent - Sample/Organism: Ac-omega viral DNA (1, 2); baculoviral (1, 2); DNA insert (2, 1); I-Sce I meganuclease (1, 2)
      o Sample/Organism: Insect cells (3)
      o Instrument: spinner (3); Centrifuge (3); Flask (3)
      o Reagent: IPL-41 powdered (3); Liposome formulation (3); Phenol:chloroform (3)
      Fleiss Kappa for 3 raters = 0.755
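The agreement figures on this slide are Fleiss' kappa values for three annotators. As a worked illustration, the sketch below implements the standard Fleiss' kappa formula and applies it to the rows of the first table, where all three annotators chose the same category for every entity. The function name `fleiss_kappa` and the list layout are choices made for this example, not the authors' code.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[item][category] = number of raters.

    Every row must sum to the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of raters (>= 2)")

    # Observed agreement: mean share of agreeing rater pairs per item.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement from the overall share of each category.
    total = n_items * n_raters
    p_e = sum((sum(col) / total) ** 2 for col in zip(*counts))

    if p_e == 1.0:  # every rating fell into a single category
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


# Rows of the first table: columns are sample, instrument, reagent, objective.
first_table = [
    [3, 0, 0, 0], [3, 0, 0, 0],                              # samples
    [0, 3, 0, 0], [0, 3, 0, 0], [0, 3, 0, 0], [0, 3, 0, 0],  # instruments
    [0, 0, 3, 0], [0, 0, 3, 0], [0, 0, 3, 0], [0, 0, 3, 0],  # reagents
    [0, 0, 0, 3],                                            # objective
]
print(fleiss_kappa(first_table))  # prints 1.0
```

Any row where the annotators split, such as two votes for one category and one for another, lowers the observed agreement and pulls kappa below 1.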
  • 22. Our ongoing work
      o So far, this is OK for handling protocols that have already been reported in papers
      o Can we actually change the way in which these protocols are produced?
  • 23. Platform for publishing semantic protocols. Features:
      o Open semantic publishing platform
          • The protocols are born semantic
      o Self-describing documents
          • Meaningful entities
          • Machine-processable workflows
      o Documents will reference existing URIs
          • Samples/organisms
          • Reagents/chemical compounds
          • Instruments
      SMART Protocols Ontology / Gazetteers / Grammar rules; UniProt; NCBI taxonomy; PubChem; Vendors
  • 24. The platform. Platform available at: http://smartprotocols.labs.linkingdata.io/app/protocols
  • 25. Capturing relevant elements in the document
  • 26. Organisms come from the UniProt Taxon API. After selecting an organism, the corresponding ID is automatically recorded
  • 27. Reagents come from the PubChem API
  • 28. Machine-processable workflows: Step, Step, Step, Step, Step
  • 29. Final edited protocol, also available as bioschemas
  • 30. Block 2. Computational Environments (Idafen Santana). Is it possible to describe the main properties of the Execution Environment of a Computational Scientific Experiment and, based on this description, derive a reproduction process for generating an equivalent environment using virtualization techniques?
      o Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies. Santana-Pérez I. PhD thesis, 2016. http://oa.upm.es/39520/
  • 31. Experiment components: DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN VIVO/VITRO, IN SILICO)
  • 32. Experiment components: DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN SILICO)
  • 33. Experiment components: DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN SILICO)
  • 34. Experiment components: DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN SILICO)
  • 35. Experiment components: DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN SILICO)
  • 36. A Research Object bundles and relates digital resources of a scientific experiment or investigation using standard mechanisms, “tool middleware”. http://www.w3.org/community/rosc/ http://www.researchobject.org/
  • 37. Experiment components: DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN VIVO/VITRO, IN SILICO)
  • 38. Open Research Problems
  • 39. Open Research Problems
      o Computational Infrastructures are usually a predefined element of a Computational Scientific Workflow.
  • 40. Open Research Problems
      o Computational Infrastructures are usually a predefined element of a Computational Scientific Workflow.
      o Execution Environments are poorly described.
  • 41. Open Research Problems
      o Computational Infrastructures are usually a predefined element of a Computational Scientific Workflow.
      o Execution Environments are poorly described.
      o Current reproducibility approaches for computational experiments consider mostly data and procedure.
  • 42. Representation: describing execution environments. FORMER EQUIPMENT, ANNOTATE, SEMANTIC ANNOTATIONS, REPRODUCE, CLOUD, EQUIVALENT EXECUTION ENVIRONMENT
  • 43. Representation: WICUS ontology network
      o Workflow Infrastructure Conservation Using Semantics
      o http://purl.org/net/wicus
      o 5 ontologies
          • WICUS Workflow Execution Requirements ontology
          • WICUS Software Stack ontology
          • WICUS Hardware Specs ontology
          • WICUS Scientific Virtual Appliance ontology
          • WICUS Ontology: links the previous ontologies
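To make the division of labour between the four domain ontologies more concrete, here is a minimal Python sketch of the kind of information each one covers and how a reproduction step might use it. Everything in it is an assumption for illustration: the class names, fields, package strings and the `plan_reproduction` function are hypothetical and do not reproduce the WICUS vocabulary or the WICUS system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HardwareSpecs:
    """Hardware description (hypothetical fields)."""
    cpu_cores: int
    ram_gb: int
    disk_gb: int

    def satisfies(self, required: "HardwareSpecs") -> bool:
        return (self.cpu_cores >= required.cpu_cores
                and self.ram_gb >= required.ram_gb
                and self.disk_gb >= required.disk_gb)


@dataclass
class WorkflowRequirements:
    """What a workflow needs in order to run: hardware plus a software stack."""
    workflow: str
    hardware: HardwareSpecs
    software: set[str] = field(default_factory=set)  # placeholder "name==version"


@dataclass
class VirtualAppliance:
    """A virtual machine image with some software already installed."""
    name: str
    hardware: HardwareSpecs
    preinstalled: set[str] = field(default_factory=set)


def plan_reproduction(req, appliances):
    """Pick the smallest adequate appliance and list what must still be installed."""
    adequate = [a for a in appliances if a.hardware.satisfies(req.hardware)]
    if not adequate:
        raise LookupError(f"no appliance satisfies {req.workflow}")
    chosen = min(
        adequate,
        key=lambda a: (a.hardware.cpu_cores, a.hardware.ram_gb, a.hardware.disk_gb),
    )
    return chosen, sorted(req.software - chosen.preinstalled)


# Placeholder data, invented for this example.
req = WorkflowRequirements(
    workflow="example-workflow",
    hardware=HardwareSpecs(cpu_cores=2, ram_gb=4, disk_gb=20),
    software={"wms-client==1.0", "science-tool==2.0"},
)
appliances = [
    VirtualAppliance("small", HardwareSpecs(1, 2, 10)),
    VirtualAppliance("medium", HardwareSpecs(2, 4, 40), {"wms-client==1.0"}),
    VirtualAppliance("large", HardwareSpecs(8, 32, 200),
                     {"wms-client==1.0", "science-tool==2.0"}),
]
chosen, to_install = plan_reproduction(req, appliances)
print(chosen.name, to_install)  # prints: medium ['science-tool==2.0']
```

In the approach described on the slides this information is expressed as semantic annotations rather than Python objects, which is what allows it to be shared and queried across workflow systems.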
  • 44. WICUS ontology network: WICUS Workflow Execution Requirements ontology, http://purl.org/net/wicus-reqs
  • 45. WICUS ontology network: WICUS Software Stack ontology, http://purl.org/net/wicus-stack
  • 46. WICUS ontology network: WICUS Scientific Virtual Appliance ontology, http://purl.org/net/wicus-sva
  • 47. WICUS ontology network: WICUS Hardware Specs ontology, http://purl.org/net/wicus-hwspecs
  • 48. WICUS ontology network: http://purl.org/net/wicus
  • 49. WICUS ontology network: http://purl.org/net/wicus
  • 50. WICUS system: overview, inputs and outputs
  • 51. Evaluation
      o Workflows reproduced: 3 scientific domains, 3 workflow management systems, 6 different workflows
      o Domain: Seismic, Astronomy, Bio
      o WMS: dispel4py, Pegasus, Makeflow
      o Name: xcorr, Internal Extinction, Montage, Epigenomics, SoyKB, BLAST
      o Years shown: (2003), (2014), (2014), (2015), (2011), (2011)
  • 52. Evaluation: results (domains, WMS and workflows as on slide 51)
      o FORMER EQUIPMENT, ANNOTATE, SEMANTIC ANNOTATIONS, REPRODUCE, CLOUD, EQUIVALENT EXECUTION ENVIRONMENT, COMPARE
  • 53. Evaluation: results (same content as slide 52)
  • 54. Evaluation: results (same diagram as slide 52)
      o Non-deterministic
      o Standard and error output
      o Generated files equivalent
  • 55. Evaluation: results (same diagram as slide 52)
      o Same results
      o Results from Int. Extinction may vary
  • 56. Evaluation: results (same diagram as slide 52)
      o Genomic data
      o Exact match
  • 57. Evaluation: results (same content as slide 52)
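The COMPARE step in the evaluation can be pictured with the following Python sketch, which checks whether the files generated by a reproduced run match those of the original run. The slides only say that generated files were equivalent or an exact match; comparing SHA-256 digests byte for byte is an assumption made here, and the function names `digest_tree` and `compare_runs` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def digest_tree(root):
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def compare_runs(original_dir, reproduced_dir):
    """Report files that are missing, unexpected or different in the reproduction."""
    original = digest_tree(original_dir)
    reproduced = digest_tree(reproduced_dir)
    return {
        "missing": sorted(original.keys() - reproduced.keys()),
        "extra": sorted(reproduced.keys() - original.keys()),
        "different": sorted(
            name for name in original.keys() & reproduced.keys()
            if original[name] != reproduced[name]
        ),
    }


# Tiny demonstration with two temporary "run" directories.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    Path(a, "out.txt").write_text("42\n")
    Path(a, "log.txt").write_text("run 1\n")
    Path(b, "out.txt").write_text("42\n")
    Path(b, "log.txt").write_text("run 2\n")
    report = compare_runs(a, b)

print(report)  # prints {'missing': [], 'extra': [], 'different': ['log.txt']}
```

A byte-level check like this suits deterministic outputs. For the non-deterministic workflows mentioned on the slides, a looser, domain-specific notion of equivalence would be needed instead.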
  • 58. Summarizing
      o Two building blocks towards reproducibility of scientific experiments
          • In vivo/vitro
              - Focus on providing structured descriptions of methods (laboratory protocols)
              - Our tools: ontologies, gazetteers, NLP tools, and automatic and manual annotation tools
              - Challenge: make protocols more structured (and semantic) from the beginning
          • In silico
              - Focus on the equipment (computational infrastructure) for workflow-based experiments
              - Ontologies, automatic and manual annotation tools, and an execution environment
              - Challenge: keep track of all types of appliances, and make scientists work on providing annotations
      o Is this enough?
  • 59. Summarizing
      o Is this enough? Clearly not, but it is a step forward towards ensuring reproducibility (with a focus on methods)
  • 60. Oscar Corcho (with contributions from Olga Giraldo, Alexander García, and Idafen Santana) Ontology Engineering Group Universidad Politécnica de Madrid, Spain Towards Reproducible Science: a few building blocks from my personal experience ocorcho@fi.upm.es @ocorcho 22/10/2017 S4BioDiv2017, Vienna
  • 61. Light pollution (www.stars4all.eu)

Editor's notes

  1. Change the license to whichever one applies.
  2. Experiments are central to empirical science: they are the foundation on which experimental sciences are built and improved. They allow scientists to verify hypotheses defined according to the scientific method, and to convince the reader (other scientists) that the conclusions of a study are correct. For that, and for supporting the growth of science, an experiment must be a repeatable process (both by the original scientist and by other scientists).
  3. In the last decades there has been an evolution in the way experimental science is conducted, adding computational resources for solving scientific problems. We have moved from a paradigm in which experiments were mainly conducted in laboratories or in nature, also referred to as in vitro or in vivo science, to a paradigm in which simulations and mathematical models, executed over computational resources, are used for obtaining scientific insights, also referred to as computational science or in silico science. Computational experiments complement rather than substitute classical experiments.
  4. In both cases, classical or computational experimental science, experiments must be a repeatable process, both for trusting the scientific results and for allowing the development of incremental research.
  5. In this context, we must define which kind of repeatability we are looking for and how we plan to achieve it. The first thing we have to do is to define how we are going to take care of the object of interest, which can be done in two main ways. Preservation: the act of isolating the object, preventing any interaction that could damage it. Conservation: the set of actions for studying the object and its associated features, allowing a supervised or restricted use of it. Both processes prolong the life of the object.
  6. Once a plan for taking care of the object has been stated, we also have two ways of obtaining a repetition of it. A replication: a copy of the original object that is as close as possible to the original. A reproduction: an object that exposes or mimics a certain set of features in the same way as the original one. In this work we explore how conservation techniques can be applied for reproducibility in experimental science. For achieving this conservation and reproducibility…
  7. Any scientific experiment can be divided into three main components. DATA: the phenomena we study from nature, such as light from stars, genomes from plants or animals, reports in social science, etc. SCIENTIFIC PROCEDURE: the set of steps that have to be performed in order to obtain the results of the experiment. EQUIPMENT: the set of tools that scientists require in order to capture, process and interpret the desired data. From telescopes to microscopes, petri dishes or Bunsen burners, there is a wide range of tools depending on the scientific domain. All these components…
  8. All these components have a counterpart in the Computational Science world. DATA is often represented by means of tables in databases, structured files, or even web services providing data. The SCIENTIFIC PROCEDURE can be defined by source code written in a given language, by the description of a set of invocations of different tools, or, in the last decades, by Scientific Workflows, which have emerged as a paradigm for formally defining the set of data transformations that make up the scientific procedure of a computational experiment. Finally, the EQUIPMENT of a computational experiment is defined by the set of hardware and software resources that are required to execute the experiment. Some initiatives have ….
  9. In our platform, users log in with an ORCID iD.
  10. We capture bibliographic data and information related to the description of the protocol, such as purpose, applications, advantages, limitations, etc.
  11. We capture a set of metadata for representing the sample. One of them is the name of the organism, and the name of the organism comes from …
  12. In the case of the reagents, we capture them from the PubChem API.
  13. Users can draw their workflows, describe each step or instruction, and capture additional information such as the equipment, reagents, kits and software that participate in each step. Users can also include alert messages, etc.
  15. Some initiatives have been proposed to target the reproducibility issues of the different components of experiments in computational science.
  16. DATA. Examples: RDA, Open Provenance Model, MIBBI, VCR…
  17. Some initiatives have been proposed to target the reproducibility issues of the parts of computational experiments. SCIENTIFIC PROCEDURE. Examples: Taverna, Pegasus, WINGS, Galaxy, SCUFL. WMS and their related workflow languages are a way of encapsulating and preserving the scientific procedure in computational experiments. Platforms such as myExperiment allow workflows to be shared and reproduced.
  18. Finally, we found that there was a lack of approaches targeting the computational equipment at the time we started this work. Most of the work done in the area by then focused on sharing virtual machine images, as a way of providing exact copies of the execution environment. During the time of this work, some other initiatives have appeared targeting this problem, as we will discuss later. EQUIPMENT: there is a lack of initiatives in this aspect. Some projects have aimed to approach it during the time of this work. Most of them focus on the use of VMs, which are black boxes (here we should motivate the need for exposing the knowledge about the execution environment in order to increase reproducibility). Examples: CernVM, ReproZip, TIMBUS. NOTE: link this one with the following slide about the open research problems.
  19. To share your research materials (RO as a social object). To facilitate reproducibility and reuse of methods. To be recognized and cited (even for constituent resources). To preserve results and prevent decay (curation of workflow definition; using provenance for partial rerun). Middleware.
  21. The first open problem we identified is that… Open Research Problem 1: Computational infrastructures are usually a predefined element of a computational scientific workflow. The majority of computational scientists develop their experiments with an already existing infrastructure in mind, thus not considering its definition as part of the experiment. Open Research Problem 2: Execution environments are poorly described, or even not described at all, when describing the results of an experiment. Often, the infrastructure used in the evaluation process is summarized by briefly explaining its overall hardware capabilities and the basic software stack. This lack of information compromises the conservation and reproducibility of the experiment. Open Research Problem 3: Current approaches for the conservation and reproducibility of computational scientific experiments take into account only the computational process of the experiment (scientific procedure) and the data used and produced, but not the execution environment.
  24. Open Research Problem 3: Current approaches for the conservation and reproducibility of computational scientific experiments take into account only the computational process of the experiment (scientific procedure) and the data used and produced, but not the execution environment. Based on this study, in this work we focus on the aspects related to the reproducibility of the computational EQUIPMENT of a scientific experiment defined as a computational scientific workflow.
  25. That is, a set of models for annotating the original environment, which can be used for specifying and reproducing a new, equivalent one using cloud solutions.
  26. As a result of this process, we developed the WICUS ontology network, which is composed…
  27. The first ontology is the workflow execution environment ontology, which introduces the concept of workflow… Using this ontology we can describe the structure of a workflow, such as the ones depicted in this figure, which shows 3 workflows belonging to the Pegasus WMS, represented by the different shapes and colors. Here we see how each workflow is composed of a set of subworkflows, each of them related to different execution requirements, as well as the requirements defined by the WMS (Pegasus in this case).
  28. Dependencies: JAR files depend on the Java VM.
  29. Examples… Based on these models, which allow us to describe the execution environments of scientific workflows….
  30. This system is composed of 3 main stages, which process the available experimental materials to obtain the corresponding enactment files. These enactment files can be executed to deploy a reproduced execution environment. This overview can be decomposed into a set of modules and intermediate results generated during the process of reproducing an experiment. There are several input files and registries that can be used to extract information about the execution environment of the workflows: workflow specification (DAG, make, etc.); software component registry (TC); WMS annotations (manual); SVA catalog (manual).
  31. We evaluated a total of 6 different workflows. They expose different computational characteristics: from small ones, such as Internal Extinction, to really large ones, such as SoyKB; and from those requiring a small amount of execution time, such as xcorr or Montage, to the ones requiring 20 to 24 hours, such as BLAST. All these workflows have been developed by different institutions and published in different conferences and journals. Some of them date from a decade ago, whereas others have been published recently. We selected them based on their domain, the availability of their materials, and the support from their communities.
  32. Evaluation procedure: we executed the 6 workflows in their original context, documented their execution environments, executed the ISA to obtain enactment scripts, and then enacted the reproduced environments and executed the workflows on them. Workflow results were compared to the corresponding baseline executions. Montage, which generates an image of the sky: pHash similarity, factor 1.0, 0.85 factor. Epigenomics and SoyKB: non-deterministic; output files equal in terms of number of lines and content, with no errors. Internal Extinction and xcorr: exactly the same results, even though in the case of Internal Extinction they may vary. BLAST: equal results. With this, we consider the reproduction of the execution environments to be successful.
  38. Change the license to whichever one applies.
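
Two ideas from the notes above lend themselves to a small illustration: the dependency annotations (JAR files depend on the Java VM) and the comparison of reproduced results against baseline executions. The Python sketch below is not the authors' tooling. The function names are hypothetical, the line-order-insensitive comparison is an assumed reading of "equal in terms of number of lines and content", and the similarity measure (one minus the normalized Hamming distance between two 64-bit hash values) is an assumed way of obtaining a similarity factor; the notes do not say how the pHash factor was computed, nor whether 0.85 acts as an acceptance threshold.

```python
from collections import Counter
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def deployment_order(depends_on):
    """Order components so that every dependency comes before its dependents.

    `depends_on` maps a component name to the set of components it requires
    (hypothetical annotation format, for illustration only).
    """
    return list(TopologicalSorter(depends_on).static_order())


def same_lines(baseline_text, reproduced_text):
    """Compare two outputs as multisets of lines, ignoring line order.

    Assumption: this is one way to check outputs of non-deterministic
    workflows for equal line counts and equal content.
    """
    return Counter(baseline_text.splitlines()) == Counter(reproduced_text.splitlines())


def hash_similarity(hash_a, hash_b, bits=64):
    """Similarity in [0, 1] between two perceptual hashes given as integers.

    Assumption: similarity = 1 - (Hamming distance / number of bits).
    """
    differing_bits = bin(hash_a ^ hash_b).count("1")
    return 1.0 - differing_bits / bits


if __name__ == "__main__":
    # Dependency annotation from the notes: JAR files depend on the Java VM.
    annotations = {"workflow.jar": {"java-vm"}, "java-vm": set()}
    print(deployment_order(annotations))

    # Same lines in a different order count as equal content.
    print(same_lines("a\nb\n", "b\na\n"))

    # Identical hashes give similarity 1.0; 0.85 is used here as an
    # assumed acceptance threshold.
    print(hash_similarity(0b1011, 0b1011) >= 0.85)
```

Keeping the comparison functions separate from the ordering function reflects the annotate, reproduce, compare split in the evaluation slide: the ordering is needed before deployment, while the comparisons only run once both the baseline and the reproduced outputs exist.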