Aparna Radhakrishnan, Engility
NOAA/GFDL was founded in 1955 and remains at the forefront of climate research, contributing to the many policies and decisions made in a world of evolving responses to climate, which in turn have cascading effects on sectors such as agriculture, health, and GDP. The scale of computing and data has increased significantly in the last decade, making the delivery of data to the world a herculean research problem in itself. In addition, analyzing and peer-reviewing a research article demands considerable time and effort from the user. The literature contains numerous outstanding climate studies published in international climate assessment reports, such as those of the Intergovernmental Panel on Climate Change (IPCC), the United Nations body for assessing the science related to climate change. The need to verify research and make it reproducible and transparent before it is translated into major decisions is, now more than ever, one of our most critical challenges. In this presentation, we will paint a picture of the history of climate computing and analytics, and of the significant transformations applied to make climate research meaningful, quantifiable, credible, interoperable, accessible, and reusable. In other words, we will draw a path towards reproducible research using Docker containers for massive data publishing and climate analytics. We will also discuss some of the pioneering efforts of collaborators from other laboratories and organizations (such as ESGF, Google, NASA JPL, Columbia University, and PMEL) in using Docker containers for computing and analysis on and off the cloud.
DCSF 19 Towards Reproducible Climate Research
1. 1
Aparna Radhakrishnan
Reproducible climate research
U.S. Department of Commerce
National Oceanic and Atmospheric Administration
Facilitating inspiration-driven and industrial-strength analysis
3. 3
Reproducible research is the idea that data analyses, and
more generally, scientific claims, are published with their
data and software code so that others may verify the
findings and build upon them.
(Ref. https://www.coursera.org/learn/reproducible-research)
4. 4
Learn
Give credit
Build Trust
Make informed decisions
Extend Research
Collaborate
Big data
Source code
Documentation
Numerous possibilities
Value-added research
Research the research
5. Specifications document: e.g. platform, dependencies, software
HOW-TO? E.g. configure, install.
CODE
PRODUCTS, DATA
DOCUMENTATION
A.Radhakrishnan et al, Towards Reproducible Climate research
6. When will I get to conduct my research?
7. Explore the use of Docker containers.
9. 9
How do we guide a researcher to provide pointers for reproducing this figure from a published paper?
Test image only.
A beginner's guide to developing and sharing climate analysis using Docker containers.
10. 10
Cheat Sheet
1. Create a Dockerfile in the project directory that holds your Jupyter notebook and other supporting code.
2. Build your Docker image.
3. Run your application. Use -v to include any additional data volumes.
4. Activate your conda environment and run Jupyter Notebook.
5. Open localhost:8888?token=<paste_token_from_above_cmd>
Reference. https://cloud.docker.com/u/aparnadotnoaa/repository/docker/aparnadotnoaa/gfdlanalysis-example
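The five steps above can be sketched as a small Python helper that only assembles the commands and does not run them. This is an illustrative sketch under stated assumptions, not the setup used in the talk: the image tag `gfdl-analysis-example`, the `continuumio/miniconda3` base image, the `environment.yml` file, the conda environment name `analysis`, and the host data path are all hypothetical.

```python
import shlex

# Hypothetical Dockerfile for step 1. The base image, the working directory
# and the conda environment file are assumptions, not the author's recipe.
DOCKERFILE_TEMPLATE = """\
FROM continuumio/miniconda3
WORKDIR /workspace
COPY . /workspace
RUN conda env create -f environment.yml
EXPOSE 8888
"""


def build_command(tag, context="."):
    """Step 2: build an image from the Dockerfile found in `context`."""
    return ["docker", "build", "-t", tag, context]


def run_command(image, port=8888, volumes=()):
    """Step 3: start an interactive container and publish the notebook port.

    `volumes` is a sequence of (host_path, container_path) pairs; each pair
    becomes one `-v host:container` option for additional data.
    """
    cmd = ["docker", "run", "-it", "--rm", "-p", f"{port}:{port}"]
    for host_path, container_path in volumes:
        cmd += ["-v", f"{host_path}:{container_path}"]
    cmd.append(image)
    return cmd


def notebook_commands(env_name, port=8888):
    """Step 4: commands to type in the shell inside the container."""
    return [
        f"conda activate {env_name}",
        f"jupyter notebook --ip=0.0.0.0 --port={port} --no-browser --allow-root",
    ]


def notebook_url(token, port=8888):
    """Step 5: URL to open in a browser on the host."""
    return f"http://localhost:{port}/?token={token}"


if __name__ == "__main__":
    print(DOCKERFILE_TEMPLATE)
    print(shlex.join(build_command("gfdl-analysis-example")))
    print(shlex.join(run_command("gfdl-analysis-example",
                                 volumes=[("/data/cmip", "/workspace/data")])))
    for line in notebook_commands("analysis"):
        print(line)
    print(notebook_url("<paste_token_from_above_cmd>"))
```

Returning the commands as lists keeps the sketch safe to run on a machine without Docker; each list can be passed to `subprocess.run` once the names are replaced with real ones.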
11. 12
Live demo
Using Docker and Jupyter Notebook for analysis in the cloud
12. 13
Step 6. Share with colleagues
Create a Docker Hub account and a repository.
Push your awesome Docker image!
Pull your awesome Docker image!
See steps 3 and 4 to run.
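A minimal sketch of the push and pull step, again only assembling the Docker CLI commands. The repository name `aparnadotnoaa/gfdlanalysis-example` is taken from the reference URL in the cheat sheet; the local image name `gfdl-analysis-example` and the `latest` tag are assumptions made for illustration.

```python
import shlex


def share_commands(local_image, repository, tag="latest"):
    """Commands the author runs to publish a locally built image."""
    remote = f"{repository}:{tag}"
    return [
        ["docker", "login"],                    # authenticate with Docker Hub
        ["docker", "tag", local_image, remote],  # name the image for the repository
        ["docker", "push", remote],             # upload it
    ]


def pull_command(repository, tag="latest"):
    """Command a colleague runs to fetch the published image."""
    return ["docker", "pull", f"{repository}:{tag}"]


if __name__ == "__main__":
    repo = "aparnadotnoaa/gfdlanalysis-example"
    for cmd in share_commands("gfdl-analysis-example", repo):  # hypothetical local name
        print(shlex.join(cmd))
    print(shlex.join(pull_command(repo)))
```

After pulling, a colleague runs the image exactly as the author did, which is what makes the analysis environment reproducible.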
14. 15
How does it work?
1. Enter your GitHub repository information for the Jupyter notebooks
2. Pangeo-binder builds a Docker image of your repository
3. Interact with your notebooks in a live environment
4. Scale your computations across an adaptive dask cluster
15. 16
Live demo
1. Open Pangeo-binder: https://binder.pangeo.io/
2. Provide a GitHub repo URL, e.g. https://github.com/rabernat/pangeo_esgf_demo
3. Click Launch.
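The launch step can also be expressed as a direct link. The sketch below assumes that Pangeo-binder follows the standard BinderHub URL scheme `/v2/gh/<owner>/<repo>/<ref>`; the default ref `master` is an assumption and should be replaced with the branch, tag, or commit that actually exists in the repository.

```python
from urllib.parse import urlparse


def binder_launch_url(github_url, ref="master", hub="https://binder.pangeo.io"):
    """Turn a GitHub repository URL into a BinderHub-style launch link."""
    parsed = urlparse(github_url)
    if parsed.netloc != "github.com":
        raise ValueError("expected a github.com repository URL")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected https://github.com/<owner>/<repo>")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]  # tolerate clone-style URLs
    return f"{hub.rstrip('/')}/v2/gh/{owner}/{repo}/{ref}"


if __name__ == "__main__":
    print(binder_launch_url("https://github.com/rabernat/pangeo_esgf_demo"))
```

Sharing such a link alongside a paper lets a reader open the notebooks without installing anything locally.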
17. Earth System Grid Federation Architecture
(VM instances)
Nikonov et al, ESGF F2F 2018.
18. https://esgf.github.io/esgf-docker/compose/quick-start
Pre-requisites
Bash shell
Docker Engine
Docker Compose
Some benefits
• Data publishing on the Earth System Grid Federation
• Collaborative software development for international projects
• Ease of installation and maintenance
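Before following the quick-start, it can help to confirm that the three prerequisites are installed. The sketch below is a hypothetical helper, not part of esgf-docker; it assumes the tools are exposed as the executables `bash`, `docker`, and `docker-compose`. Newer Docker installations may ship Compose as the `docker compose` subcommand instead, in which case the last check would report it as missing.

```python
import shutil


def check_prerequisites(tools=("bash", "docker", "docker-compose"),
                        which=shutil.which):
    """Report, for each tool name, whether an executable is found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return {tool: which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```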
20. 21
Acknowledgments
Ryan Abernathey
V. Balaji
Luca Cinquini
John Krasting
Colleen McHugh
Serguei Nikonov
Roland Schweitzer
Hans Vahlenkamp
Chandin Wilson
Andrew Wittenberg
@HV Photography
Thank You