The project Opening Reproducible Research (o2r, http://o2r.info) aims to lower the barrier to reproducible research by developing a convention and supporting tools for Executable Research Compendia (ERCs, https://doi.org/10.1045/january2017-nuest). An ERC includes (i) the article, (ii) data, (iii) code, and (iv) the runtime environment needed to reproduce the study. The ERC provides a well-structured container that serves the needs of journals (ERC as the item under review), archives (suitable metadata and packaging formats), and researchers (literally everything needed to re-do an analysis is there). It relies on Docker to define and store the runtime environment. ERCs should be simple enough to be created manually and should absorb best practices for organizing digital workspaces. In addition, an online creation service automatically creates ERCs, including a Dockerfile and a Docker image, from typical user workspaces for less experienced users. A validation and manipulation service will allow (a) users to create an ERC for their workflows with minimal required input, (b) users to interact with published ERCs, e.g. (peer) review the contents, manipulate parameters of the workflow, or explore interactive graphics, and (c) platform providers (e.g. journals, data repositories, archives, universities) to integrate o2r building blocks and extend their procedures with executable containers. The reference implementation focuses on the geoscience domain and the R language.
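To illustrate the kind of step the creation service automates, here is a minimal Python sketch that scans an R workspace for `library()` and `require()` calls and writes a Dockerfile for it. It is not the o2r implementation: the base image `rocker/verse:3.4.0` (assumed to ship R, R Markdown, and pandoc), the default main file `main.Rmd`, the `/erc` working directory, and the helper names are all assumptions, and a regular expression will miss packages loaded in other ways.

```python
import json
import re
from pathlib import Path

# Assumed base image with R, R Markdown and pandoc preinstalled.
BASE_IMAGE = "rocker/verse:3.4.0"

# Matches library(pkg), require(pkg) and their quoted forms.
PKG_PATTERN = re.compile(r"""\b(?:library|require)\(\s*["']?([A-Za-z][A-Za-z0-9.]*)["']?\s*\)""")


def detect_packages(workspace: Path) -> list[str]:
    """Collect the names of R packages loaded in .R and .Rmd files."""
    packages = set()
    for path in workspace.rglob("*"):
        if path.is_file() and path.suffix in (".R", ".Rmd"):
            packages.update(PKG_PATTERN.findall(path.read_text(encoding="utf-8")))
    return sorted(packages)


def make_dockerfile(packages: list[str], main_file: str = "main.Rmd") -> str:
    """Render a Dockerfile that installs the packages and renders the main file."""
    lines = [f"FROM {BASE_IMAGE}"]
    if packages:
        quoted = ", ".join(f"'{name}'" for name in packages)
        lines.append(f'RUN R -e "install.packages(c({quoted}))"')
    lines += [
        "COPY . /erc",
        "WORKDIR /erc",
        # Exec form of CMD; json.dumps produces the required JSON array.
        "CMD " + json.dumps(["R", "-e", f"rmarkdown::render('{main_file}')"]),
    ]
    return "\n".join(lines) + "\n"
```

A service would then run `docker build` on the generated file to produce the image. Pinning package versions, which this sketch skips, matters for long-term reproducibility.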
We show which steps and aspects of publishing and properly archiving computational research with containers can or cannot be automated for a specific community of practice, and point to future challenges. We will share the concepts behind the ERC (http://o2r.info/erc-spec) and the state of the o2r architecture (http://o2r.info/architecture) and software (https://github.com/o2r-project).
Creating Executable Research Compendia to Improve Reproducibility in the Geosciences
Daniel Nüst | University of Münster | @nordholmen
C4RR workshop, June 28 2017, Cambridge, UK
5. Key features of ERCs
Nested containers (BagIt, Docker)
Librarian-ready
Reproducibility range of 5 to 10 years
(still worth integrating, target users are not science historians)
Desktop-size data and algorithms - closed and complete
“Geo-stuff” and R for the “last 10 %”
Remain understandable for scientists
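The nesting of containers can be sketched as follows: the ERC directory, including its Dockerfile and exported image, becomes the payload of a BagIt bag (RFC 8493), which adds the fixity information archives expect. This minimal Python sketch writes only `bagit.txt`, the `data/` payload, and a SHA-256 payload manifest. It assumes a plain copy of the workspace is acceptable and is not the o2r code; production use would rather rely on a maintained library such as the Library of Congress `bagit-python`.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large runtime images do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(workspace: Path, bag_dir: Path) -> None:
    """Copy a workspace into the payload directory of a minimal BagIt bag."""
    payload = bag_dir / "data"
    shutil.copytree(workspace, payload)  # creates bag_dir; fails if data/ exists
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    # One "checksum path" line per payload file, paths relative to the bag root.
    lines = []
    for file in sorted(p for p in payload.rglob("*") if p.is_file()):
        lines.append(f"{sha256_of(file)} {file.relative_to(bag_dir).as_posix()}")
    (bag_dir / "manifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Because the Docker image travels inside the bag as an ordinary payload file, archives can verify it with the same checksums as the data and code.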
12. Meta tool suite
- extract
- map
- harvest
- validate
Highlights
Automatically extract metadata from the
workspace, including spatial information
Facilitate metadata (MD) management with schema
translation maps
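The extract and map steps can be illustrated with a small Python sketch: it derives a bounding box from coordinates found in the workspace and renames fields through a translation map. The field names, the (longitude, latitude) input format, and the helper functions are hypothetical; the real tool suite's fields and target schemas may differ.

```python
# Hypothetical translation map: internal field name -> target schema field name.
TRANSLATION_MAP = {
    "title": "title",
    "author": "creators",
    "abstract": "description",
    "spatial_bbox": "geo_bbox",
}


def bounding_box(points: list[tuple[float, float]]) -> tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) for (lon, lat) pairs."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (min(lons), min(lats), max(lons), max(lats))


def translate(metadata: dict, mapping: dict[str, str]) -> dict:
    """Rename fields according to the map; fields without a mapping are dropped."""
    return {mapping[key]: value for key, value in metadata.items() if key in mapping}
```

Keeping the map as plain data means a new target schema only needs a new dictionary, not new code.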
16. ERC specification
GitHub dev
Development steps
version 0, practical evaluation
version 0.5, expert evaluation
version 0.6, architect evaluation
version 1 (mid 2017) > ref. impl.
Content
http://o2r.info/erc-spec
17. ERC specification - key features & structure
base directory
main document & display file
runtime image & runtime manifest
YAML configuration file (control statements, metadata)
5 files + x
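A validator for the five core files listed above could start with a check like the following Python sketch. The default file names (`main.Rmd`, `display.html`, `image.tar`, `Dockerfile`, `erc.yml`) are assumptions for illustration; the ERC specification at http://o2r.info/erc-spec is the normative source for names and for how the configuration file may override them. Additional files (the "+ x") are simply ignored.

```python
from pathlib import Path

# Hypothetical default names for the five core ERC files, keyed by role.
REQUIRED_FILES = {
    "main document": "main.Rmd",
    "display file": "display.html",
    "runtime image": "image.tar",
    "runtime manifest": "Dockerfile",
    "configuration file": "erc.yml",
}


def missing_files(base_dir: Path) -> list[str]:
    """Return the roles of core ERC files not found in the base directory."""
    return [role for role, name in REQUIRED_FILES.items() if not (base_dir / name).is_file()]
```

Reporting roles rather than file names keeps error messages meaningful even if a configuration file renames the files.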
23. Summary
Executable Research Compendia are fun and …
help us learn a lot about reproducibility
work including a domain-specific “last mile”
take into consideration requirements of libraries and preservation
re-use and integrate, are not “a platform”
don’t solve all problems (R, geo, 1/5 Vs, no HPC, comp. reproducibility)
The reproducibility service makes ERCs work in the geosciences within the current
publication workflow and services.
24. Outlook
“A lot of glue work around the edges” (M. Hartley)
ERCs are post-hoc glue for minimal reproducibility
Catching up with reference implementation and demo ERCs
Spin-out of tools
Follow-ups & collaborations
(production mode in cloud? special issues?)
“bundle”
Test platform - we are not a platform!
Daniel
Marc
Daniel
Should be able to create it manually
# researcher workspaces = # researchers
data NEXT TO container
ENTRY POINTs
“Sophisticated Makefile”
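The notes on keeping data next to the container and on entry points suggest a run step like the one sketched here in Python: the workspace is mounted read-only into the container instead of being baked into the image, and the image's own entry point starts the analysis. The mount point `/erc` and the function name are hypothetical; `--rm` and `--volume host:container:ro` are standard `docker run` options.

```python
from pathlib import Path


def docker_run_command(image: str, data_dir: Path, mount_point: str = "/erc") -> list[str]:
    """Build a `docker run` argument list that mounts the data next to the container."""
    return [
        "docker", "run",
        "--rm",  # remove the container once the analysis finishes
        "--volume", f"{data_dir.resolve()}:{mount_point}:ro",  # data stays outside the image, read-only
        image,  # the image's ENTRYPOINT/CMD starts the analysis
    ]
```

Passing the list to `subprocess.run(cmd, check=True)` executes it without shell quoting issues.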