Open & reproducible research - What can we do
in practice?
Presented by
Felix Z. Hoffmann
@Felix11H
felix11h.github.io/
Slides
GitHub: bit.ly/bx18s
Resources and links
Open Science Fellows Program:
bit.ly/osfprog
project description: bit.ly/osfproj
prototype: bit.ly/osrep
Who am I?
PhD student with Prof. Jochen Triesch
Computational models of structural
plasticity
Google Summer of Code 2014
Data-centric views in Sumatra
Wikimedia Open Science Fellow
2017/2018
Open computational research study
The reproducibility crisis
Computational reproducibility should be easy...
[Figure: "Hard" vs. "Easy?"]
1. cheap and universal access to computers (as
opposed to a lab)
2. running code is inexpensive and unproblematic
(compared to replicating an experiment)
3. can easily share code & data that allow direct
reproduction
Is there a reproducibility crisis
in computational research?
Sharing of code & data mandatory in many journals
Policy of Science since February 11, 2011
Out of 206 computational studies in Science since 2011,
26 provided code & data directly
Stodden et al. 2018
A few responses...
Stodden et al. 2018
Of N = 206 articles published in Science since 2011...
⇒ Code & data could be retrieved for 91 out of 206 studies.
Stodden et al. 2018
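To put the counts above into proportion, here is a minimal sketch that converts them into percentages. The counts (26, 91, and 206) are taken from the slides; the labels and the helper name `share` are only illustrative.

```python
# Counts quoted on the slides (Stodden et al. 2018).
TOTAL = 206
COUNTS = {
    "code & data provided directly": 26,
    "code & data could be retrieved": 91,
}


def share(count, total):
    """Return count as a percentage of total."""
    return 100 * count / total


for label, count in COUNTS.items():
    print(f"{label}: {count}/{TOTAL} = {share(count, TOTAL):.1f}%")
```

With these numbers, roughly one in eight studies provided code & data directly, and fewer than half could be retrieved at all.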
Open Source for Neuroscience
opensourceforneuroscience.org/
Reproducibility when code was available
Tried to reproduce 22 randomly selected studies (out of 56 that
were judged as potentially reproducible)
Even when code was available, more than half of the studies
were reproducible only with significant effort!
Problems:
- impossible to reproduce (missing code, data or methodology)
- required tedious effort (e.g. downloading a large number of individual data sets)
- required intellectual effort (e.g. knowledge of past articles, implementing given pseudocode)
- required tweaking (e.g. missing parameters, minor method steps)
Stodden et al. 2018
Computational reproducibility remains difficult
even when code is available!
How to publish our research so that (computational)
results are reproducible?
A complex computational study - Reproducible?
[Figure: three panels: A anisotropic, B rewired, C tuned aniso.]
Open Science Fellowship
- runs from October to June
- fellows are paired with a mentor who supports their progress
- financial support provided
- training seminars & workshops
- program in German
Open Science Fellowship
application for 3rd round open: bit.ly/osfprog
Photo: Ralf Rebmann, CC BY-SA 4.0
Problems to solve
Problem 1
- use of a difficult-to-install computational environment (graph-tool)
Problem 2
- long, resource-demanding computations
- subsequent analyses require the output of previous computations
→ difficult to understand what is required to reproduce a single output (1 figure)
Davison 2012
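Problem 2 can be made concrete with a small dependency graph. The following is a minimal sketch, not the approach used in the prototype: the step names are hypothetical, and it relies only on Python's standard-library `graphlib` (Python 3.9+) to work out which computations a single figure needs and in which order they must run.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the steps whose output it needs.
DEPENDENCIES = {
    "simulate_network": set(),
    "analyse_connectivity": {"simulate_network"},
    "analyse_activity": {"simulate_network"},
    "figure_1": {"analyse_connectivity"},
    "figure_2": {"analyse_connectivity", "analyse_activity"},
}


def steps_needed(target, dependencies):
    """Return every step required to produce `target`, in a valid run order."""
    needed = set()
    stack = [target]
    # Walk backwards from the target and collect all upstream steps.
    while stack:
        step = stack.pop()
        if step not in needed:
            needed.add(step)
            stack.extend(dependencies[step])
    # Order the collected steps so that each runs after its prerequisites.
    subgraph = {step: dependencies[step] for step in needed}
    return list(TopologicalSorter(subgraph).static_order())


print(steps_needed("figure_1", DEPENDENCIES))
```

Writing the dependencies down explicitly means a reader who only wants one figure can see that `analyse_activity` is not needed for `figure_1`, instead of having to rerun everything.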
Computational reproducibility in a prototype
[Figure: two items joined by "+" and "=", yielding computational reproducibility (in a prototype)]
Example study:
http://bit.ly/osproj
Documentation:
http://bit.ly/osrep
Where to publish code
https://zenodo.org/
https://osf.io/
Five Rs for reproducible scientific code
R1
Re-runnable: can be run again when needed
→ document dependencies necessary to run code
R2
Repeatable: program is deterministic, produces repeatable output
→ add seeds for random number generators
R3
Reproducible: another researcher can take code & input data,
execute code, and re-obtain same results
→ record detailed versions of dependencies and of the code, ensure availability
R4
Reusable: program can be easily used, and modified, by you
and other people, inside & outside own lab
→ avoid hard-coded numbers, write documentation
R5
Replicable: program can be re-implemented by another researcher
to re-obtain results
→ see for example ReScience journal
Benureau and Rougier 2018
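A few of these recommendations can be sketched in code. The example below is a toy under stated assumptions, not code from the prototype or from Benureau and Rougier: the parameter names, values, and the `run` function are hypothetical. It shows named parameters instead of hard-coded numbers (R4), an explicit random seed (R2), and a record of interpreter and package versions stored next to the result (R1, R3).

```python
import json
import platform
import random
import sys
from importlib import metadata

# R4: keep parameters in one named place instead of hard-coding numbers.
# All names and values here are hypothetical.
PARAMS = {"seed": 1234, "n_samples": 5, "noise_sd": 0.1}


def run(params):
    """Toy stand-in for a simulation; deterministic for a given seed (R2)."""
    rng = random.Random(params["seed"])  # explicit, local random number generator
    return [rng.gauss(0.0, params["noise_sd"]) for _ in range(params["n_samples"])]


def environment_record(packages=()):
    """Collect interpreter, platform, and package versions (R1, R3)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    result = run(PARAMS)
    assert result == run(PARAMS)  # same seed, same output
    record = {
        "params": PARAMS,
        "environment": environment_record(["numpy"]),
        "result": result,
    }
    print(json.dumps(record, indent=2))
```

Storing the parameters and the environment record together with the output gives another researcher what they need to attempt the same run; it does not by itself guarantee identical results across platforms or library versions.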
References
Benureau, Fabien C. Y. and Nicolas P. Rougier (2018).
Re-Run, Repeat, Reproduce, Reuse, Replicate:
Transforming Code into Scientific Contributions. In:
Frontiers in Neuroinformatics 11.
Collberg, Christian, Todd Proebsting, Gina Moraila,
Akash Shankaran, Shi Zuoming, and Alex M Warren
(2013). Measuring Reproducibility in Computer
Systems Research. In:
Davison, Andrew (2012). Automated Capture of
Experiment Context for Easier Reproducibility in
Computational Research. In: Computing in Science &
Engineering 14.4, pp. 48–56.
Rougier, Nicolas P. et al. (2017). Sustainable
Computational Science: The ReScience Initiative. In:
Stodden, Victoria, Jennifer Seiler, and Zhaokun Ma
(2018). An Empirical Analysis of Journal Policy
Effectiveness for Computational Reproducibility. In:
Proceedings of the National Academy of Sciences
115.11, pp. 2584–2589.
Thank you!

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Software Mining and Software Datasets
Software Mining and Software DatasetsSoftware Mining and Software Datasets
Software Mining and Software Datasets
 
Genomics data analysis in Julia
Genomics data analysis in JuliaGenomics data analysis in Julia
Genomics data analysis in Julia
 
A Julia package for iterative SVDs with applications to genomics data analysis
A Julia package for iterative SVDs with applications to genomics data analysisA Julia package for iterative SVDs with applications to genomics data analysis
A Julia package for iterative SVDs with applications to genomics data analysis
 
HotSoS16 Tutorial "Text Analytics for Security" by Tao Xie and William Enck
HotSoS16 Tutorial "Text Analytics for Security" by Tao Xie and William EnckHotSoS16 Tutorial "Text Analytics for Security" by Tao Xie and William Enck
HotSoS16 Tutorial "Text Analytics for Security" by Tao Xie and William Enck
 
2014 nicta-reproducibility
2014 nicta-reproducibility2014 nicta-reproducibility
2014 nicta-reproducibility
 
2014 toronto-torbug
2014 toronto-torbug2014 toronto-torbug
2014 toronto-torbug
 
Reproducible Research in R and R Studio
Reproducible Research in R and R StudioReproducible Research in R and R Studio
Reproducible Research in R and R Studio
 
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve OmohundroOpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
 
Implications of GPT-3
Implications of GPT-3Implications of GPT-3
Implications of GPT-3
 
An introduction to Julia
An introduction to JuliaAn introduction to Julia
An introduction to Julia
 
Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...Intelligent Software Engineering: Synergy between AI and Software Engineering...
Intelligent Software Engineering: Synergy between AI and Software Engineering...
 
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
 
Software Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringSoftware Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software Engineering
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
 
Software Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that MattersSoftware Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that Matters
 
Data Science - Part II - Working with R & R studio
Data Science - Part II -  Working with R & R studioData Science - Part II -  Working with R & R studio
Data Science - Part II - Working with R & R studio
 
Meta learning tutorial
Meta learning tutorialMeta learning tutorial
Meta learning tutorial
 
GALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software EngineeringGALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software Engineering
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science research
 

Ähnlich wie Open & reproducible research - What can we do in practice?

Computer Tools for Academic Research
Computer Tools for Academic ResearchComputer Tools for Academic Research
Computer Tools for Academic Research
Miklos Koren
 
Software Carpentry for the Geophysical Sciences
Software Carpentry for the Geophysical SciencesSoftware Carpentry for the Geophysical Sciences
Software Carpentry for the Geophysical Sciences
Aron Ahmadia
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
dgarijo
 

Ähnlich wie Open & reproducible research - What can we do in practice? (20)

Computer Tools for Academic Research
Computer Tools for Academic ResearchComputer Tools for Academic Research
Computer Tools for Academic Research
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
 
Abcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasAbcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosas
 
Software Preservation: challenges and opportunities for reproductibility (Sci...
Software Preservation: challenges and opportunities for reproductibility (Sci...Software Preservation: challenges and opportunities for reproductibility (Sci...
Software Preservation: challenges and opportunities for reproductibility (Sci...
 
ScilabTEC 2015 - Irill
ScilabTEC 2015 - IrillScilabTEC 2015 - Irill
ScilabTEC 2015 - Irill
 
ownR platform extended technical introduction
ownR platform extended technical introductionownR platform extended technical introduction
ownR platform extended technical introduction
 
ownR extended technical introduction
ownR extended technical introductionownR extended technical introduction
ownR extended technical introduction
 
Reproducible Science and Deep Software Variability
Reproducible Science and Deep Software VariabilityReproducible Science and Deep Software Variability
Reproducible Science and Deep Software Variability
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
 
Reproducibility challenges in computational settings: what are they, why shou...
Reproducibility challenges in computational settings: what are they, why shou...Reproducibility challenges in computational settings: what are they, why shou...
Reproducibility challenges in computational settings: what are they, why shou...
 
Software Carpentry for the Geophysical Sciences
Software Carpentry for the Geophysical SciencesSoftware Carpentry for the Geophysical Sciences
Software Carpentry for the Geophysical Sciences
 
ownR presentation eRum 2016
ownR presentation eRum 2016ownR presentation eRum 2016
ownR presentation eRum 2016
 
A Step Towards Reproducibility in R
A Step Towards Reproducibility in RA Step Towards Reproducibility in R
A Step Towards Reproducibility in R
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
 
ownR platform technical introduction
ownR platform technical introductionownR platform technical introduction
ownR platform technical introduction
 
2014-10-10-SBC361-Reproducible research
2014-10-10-SBC361-Reproducible research2014-10-10-SBC361-Reproducible research
2014-10-10-SBC361-Reproducible research
 
Reproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and AndurilReproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and Anduril
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
 
FEC2017-Introduction-to-programming
FEC2017-Introduction-to-programmingFEC2017-Introduction-to-programming
FEC2017-Introduction-to-programming
 
#OSSPARIS19 - Overcoming open source challenges in reinforcement learning - W...
#OSSPARIS19 - Overcoming open source challenges in reinforcement learning - W...#OSSPARIS19 - Overcoming open source challenges in reinforcement learning - W...
#OSSPARIS19 - Overcoming open source challenges in reinforcement learning - W...
 

Kürzlich hochgeladen

dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
1301aanya
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Sérgio Sacani
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
AlMamun560346
 

Kürzlich hochgeladen (20)

Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
dkNET Webinar "Texera: A Scalable Cloud Computing Platform for Sharing Data a...
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATIONSTS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
STS-UNIT 4 CLIMATE CHANGE POWERPOINT PRESENTATION
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 

Open & reproducible research - What can we do in practice?

  • 1. Open & reproducible research - What can we do in practice? Presented by Felix Z. Hoffmann @Felix11H felix11h.github.io/ Slides GitHub: bit.ly/bx18s Resources and links Open Science Fellows Program: bit.ly/osfprog project description: bit.ly/osfproj prototype: bit.ly/osrep
  • 2. Who am I? PhD student with Prof. Jochen Triesch Computational models of structural plasticity Google Summer of Code 2014 Data-centric views in Sumatra Wikimedia Open Science Fellow 2017/2018 Open computational research study
  • 4. Computational reproducibility should be easy... Hard Easy?
  • 5. Computational reproducibility should be easy... 1. cheap and universal access to computers (as opposed to a lab) 2. running code is inexpensive and unproblematic (compared to replicating an experiment) 3. can easily share code & data that allow direction reproduction Is there a reproducibility crisis in computational research?
  • 6. Sharing of code & data mandatory in many journals Policy of Science since February 11, 2011
  • 7. Sharing of code & data mandatory in many journals Policy of Science since February 11, 2011
  • 8. Sharing of code & data mandatory in many journals Policy of Science since February 11, 2011
  • 9. Out of 206 computational studies in Science since 2011, 26 provided code & data directly Stodden et al. 2018
  • 13. Of N = 206 articles published in Science since 2011... ⇒ Code & data could be retrieved for 91 out of 206 studies. Stodden et al. 2018
  • 14.
  • 15. Open Source for Neuroscience opensourceforneuroscience.org/
  • 16. Open Source for Neuroscience opensourceforneuroscience.org/
  • 17. Reproducibility when code was available Tried to reproduce randomly selected 22 studies (out of 56 that were judged as potentially reproducible) Stodden et al. 2018
  • 18. Reproducibility when code was available Tried to reproduce randomly selected 22 studies (out of 56 that were judged as potentially reproducible) Even when code was available, more than half of studies were reproducible only with significant effort! Stodden et al. 2018
  • 19. Reproducibility when code was available Tried to reproduce randomly selected 22 studies (out of 56 that were judged as potentially reproducible) Even when code was available, more than half of studies were reproducible only with significant effort! Problems: - impossible to reproduce (missing code, data or methodology) Stodden et al. 2018
  • 20. Reproducibility when code was available Tried to reproduce randomly selected 22 studies (out of 56 that were judged as potentially reproducible) Even when code was available, more than half of studies were reproducible only with significant effort! Problems: - impossible to reproduce (missing code, data or methodology) - required tedious effort (e.g. download large number of individual data sets) Stodden et al. 2018
  • 21. Reproducibility when code was available Tried to reproduce randomly selected 22 studies (out of 56 that were judged as potentially reproducible) Even when code was available, more than half of studies were reproducible only with significant effort! Problems: - impossible to reproduce (missing code, data or methodology) - required tedious effort (e.g. download large number of individual data sets) - required intellectual effort (e.g. knowledge of past articles, implementing given pseudo code) Stodden et al. 2018
  • 22. Reproducibility when code was available (Stodden et al. 2018)
      Tried to reproduce 22 randomly selected studies (out of 56 that were judged potentially reproducible).
      Even when code was available, more than half of the studies were reproducible only with significant effort!
      Problems:
      - impossible to reproduce (missing code, data, or methodology)
      - required tedious effort (e.g. downloading a large number of individual data sets)
      - required intellectual effort (e.g. knowledge of past articles, implementing given pseudocode)
      - required tweaking (e.g. missing parameters, minor method steps)
  • 23. Reproducibility when code was available Computational reproducibility remains difficult even when code is available!
  • 24. How can we publish our research so that (computational) results are reproducible?
  • 25. A complex computational study - Reproducible? [Figure with panels A, B, C; labels: anisotropic, rewired, tuned aniso.]
  • 27. Open Science Fellowship
      - runs from October to June
      - fellows are paired with a mentor who supports their progress
      - financial support is provided
      - training seminars & workshops
      - the program is held in German
  • 29. Open Science Fellowship: applications for the 3rd round are open at bit.ly/osfprog (Photo: Ralf Rebmann, CC BY-SA 4.0)
  • 41. Problems to solve
      Problem 1
      - the study uses a computational environment that is difficult to install (graph-tool)
      Problem 2
      - long, resource-demanding computations
      - subsequent analyses require the output of previous computations
      - it is difficult to understand what is required to reproduce a single output (one figure) (Davison 2012)
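      One way to make Problem 2 tractable is to write the dependencies between outputs down explicitly, so that the steps needed for a single figure can be looked up rather than guessed. The following is a minimal sketch of that idea, not the method used in the prototype: the file names and the `DEPENDENCIES` table are hypothetical, and the ordering relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each output is mapped to the outputs it depends on.
# The file names are illustrative only and are not taken from the example study.
DEPENDENCIES = {
    "network.h5": [],
    "simulation.h5": ["network.h5"],
    "statistics.csv": ["simulation.h5"],
    "figure1.pdf": ["statistics.csv"],
    "figure2.pdf": ["network.h5"],
}


def required_steps(target, dependencies):
    """Return every output needed to build `target`, in a runnable order."""
    needed = {}
    stack = [target]
    while stack:
        node = stack.pop()
        if node not in needed:
            # Keep only the part of the graph that `target` depends on.
            needed[node] = dependencies[node]
            stack.extend(dependencies[node])
    # static_order() yields prerequisites before the outputs that use them.
    return list(TopologicalSorter(needed).static_order())


if __name__ == "__main__":
    print(required_steps("figure1.pdf", DEPENDENCIES))
```

      With such a table, reproducing one figure only requires re-running the steps it actually depends on; in this sketch `figure2.pdf` needs the network but not the long simulation.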
  • 46. Computational reproducibility in a prototype: [image] + [image] = computational reproducibility (in a prototype)
      - Example study: http://bit.ly/osproj
      - Documentation: http://bit.ly/osrep
  • 53. Where to publish code https://zenodo.org/
  • 54. Where to publish code https://osf.io/
  • 59. Five Rs for reproducible scientific code (Benureau and Rougier 2018)
      - R1 Re-runnable: the program can be run again when needed → document the dependencies necessary to run the code
      - R2 Repeatable: the program is deterministic and produces repeatable output → add seeds for random number generators
      - R3 Reproducible: another researcher can take the code and input data, execute the code, and re-obtain the same results → record detailed versions of the dependencies and of the code, and ensure availability
      - R4 Reusable: the program can be easily used, and modified, by you and other people, inside and outside your own lab → avoid hard-coded numbers, write documentation
      - R5 Replicable: the program can be re-implemented by another researcher to re-obtain the results → see, for example, the ReScience journal
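      The practical advice attached to R1 to R4 can be illustrated with a small sketch. It is not code from Benureau and Rougier or from the prototype: the `Params` class, the toy `run` function, and the `provenance` helper are hypothetical names introduced here, and only the Python standard library is assumed.

```python
import json
import platform
import sys
from dataclasses import asdict, dataclass
from importlib import metadata
import random


@dataclass(frozen=True)
class Params:
    """Hypothetical parameter set: numbers live here, not hard-coded in the code (R4)."""
    seed: int = 1234
    n_samples: int = 5


def run(params):
    """Toy stand-in for a simulation, using an explicitly seeded generator (R2)."""
    rng = random.Random(params.seed)  # local generator, no hidden global state
    return [rng.random() for _ in range(params.n_samples)]


def provenance(params, packages=()):
    """Collect what another person needs to re-run the code (R1, R3)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "packages": versions,
        "params": asdict(params),
    }


if __name__ == "__main__":
    params = Params()
    record = {"result": run(params), "provenance": provenance(params)}
    print(json.dumps(record, indent=2))
```

      Storing the provenance record next to each result is one possible design; the version of the code itself (for example a commit hash) would still have to be recorded separately.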
  • 60. References
      - Benureau, Fabien C. Y. and Nicolas P. Rougier (2018). Re-Run, Repeat, Reproduce, Reuse, Replicate: Transforming Code into Scientific Contributions. In: Frontiers in Neuroinformatics 11.
      - Collberg, Christian, Todd Proebsting, Gina Moraila, Akash Shankaran, Shi Zuoming, and Alex M Warren (2013). Measuring Reproducibility in Computer Systems Research.
      - Davison, Andrew (2012). Automated Capture of Experiment Context for Easier Reproducibility in Computational Research. In: Computing in Science & Engineering 14.4, pp. 48–56.
      - Rougier, Nicolas P. et al. (2017). Sustainable Computational Science: The ReScience Initiative.
      - Stodden, Victoria, Jennifer Seiler, and Zhaokun Ma (2018). An Empirical Analysis of Journal Policy Effectiveness for Computational Reproducibility. In: Proceedings of the National Academy of Sciences 115.11, pp. 2584–2589.
      Thank you!