2. Disclosure
(or call it perspective)
• Funded by NSF through BEACON for a series
of workshops on developing reproducible
science curriculum
• 3 principal questions:
• What practices, tools, and resources are
available now?
• How best to teach these?
• What are the gaps faced by biologist users?
5. Main takeaways
(distilled to tweets)
• software with many dependencies ->
exponentially lower prob that all install
• holes or errors in docs -> harmless for
experts, often fatal for "method novice"
• software evolution & rot -> parameters
that worked 1 year ago now throw an
error
• Harder for non-domain reproducers: baseline
software and packages differ #dependencyhell
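The first takeaway can be illustrated with a little arithmetic. This is a minimal sketch, not a measurement: it assumes each dependency installs independently with the same success probability, so the chance that all of them install is that probability raised to the number of dependencies. The function name `prob_all_install` and the 95% per-dependency rate are hypothetical, chosen only for illustration.

```python
def prob_all_install(n_deps: int, p_each: float) -> float:
    """Chance that every one of n_deps dependencies installs.

    Assumption (not from the slides): installs are independent and
    each succeeds with the same probability p_each.
    """
    if n_deps < 0:
        raise ValueError("n_deps must be non-negative")
    if not 0.0 <= p_each <= 1.0:
        raise ValueError("p_each must be between 0 and 1")
    # Independent events: multiply p_each by itself n_deps times.
    return p_each ** n_deps


if __name__ == "__main__":
    # 0.95 is an illustrative per-dependency success rate, not real data.
    for n in (1, 5, 10, 20, 50):
        print(f"{n:>3} dependencies -> {prob_all_install(n, 0.95):.3f}")
```

Under these assumptions, a 95% per-dependency success rate leaves roughly a 60% chance that 10 dependencies all install, and roughly 36% for 20. Real installs are rarely independent, so treat the numbers as a rough intuition only.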
7. “arguing that reproducibility
is laudable in general glosses
over the fact that for each
research group it is a
significant amount of work
to make their research
(easily) reproducible for
independent scientists”
8. “Any work you do to
make your analysis
more reproducible
pays dividends for
colleagues and your
future self.”
Jeremy Leipzig
10. For research to be
reproducible, the parts need
to be available to start
Collberg et al. (2014), Measuring Reproducibility in Computer Systems Research.
http://reproducibility.cs.arizona.edu/tr.pdf
11. A huge tech soup
• Vagrant
• Ansible
• Docker
• Drone
• Travis
• knitr
• packrat
• VM memory limits
• VM storage limits
• VM uptime limits
• firewalls
• protected data
• data snapshotting