JUPYTER ASCENDING:
A PRACTICAL HAND GUIDE TO GALACTIC SCALE,
REPRODUCIBLE DATA SCIENCE
John Fonner, PhD
University of Texas at Austin
April 5th, 2016
4/5/2016 1
 Photos, Tweets, and hate mail all welcome!
 Email: jfonner@tacc.utexas.edu
 Twitter: @johnfonner
SCIENCE AS A SECOND THOUGHT
1. Formulate a theory
2. Gather data
3. Learn about data storage
4. Learn about data movement protocols
5. Lose data
6. Check out of rehab
7. Learn about backup and replication
8. Gather data
9. Learn about versioning
10. Start preliminary analysis
11. Buy a newer laptop
12. Buy more memory
13. Buy a desktop with more memory
14. Buy a bigger monitor & GPUs “for work”
15. Google “250GB Excel Spreadsheet”
16. Learn about batch processing
17. Learn about batch schedulers
18. Learn about patience.
19. Learn more about data storage
20. Learn about distributed systems.
21. Go back through notes to remember the science question.
22. Learn R & Python
23. Learn Linux admin
24. Finish preliminary analysis.
25. Grow a ponytail
26. Write a paper.
27. Learn about data publishing
28. Learn about reproducibility
29. Plot the death of your advisor/dept. head
30. Apply for grants & research allocations on public systems
31. Wait to apply next time
32. Finish analyzing data
33. Reformulate your theory
34. Goto 1
SCIENTIFIC REPRODUCIBILITY
SOME ASSEMBLY REQUIRED…
SCIENTISTS, WITH FEW EXCEPTIONS,
ARE NOT TRAINED PROGRAMMERS
 Research is hard
 Coding is hard
 Research code is
 well designed,
 documented,
 leverages design patterns,
 highly reusable,
 portable,
 and usually open source.
ACCESSIBILITY >= CAPABILITY
 For scientific reproducibility, the impact of your work
will be more about accessibility than capability
 Domain grad students, not sys admins, are the early
adopters
 Where can we focus effort to create community around
capability?
 What has changed the least about the computation
you do over the last 10 years?
 What do we ask domain researchers to learn to use
our tools and data?
Memory/CPU/Disk
Operating System
Applications
Interface
 Decoupling the technology “stack”
“Reproducers”
• Web Browser
• GUIs
• Windows / Mac OS Support
• Sample Data and Sample Workflows
“Producers”
• Linux CLI
• Hadoop / GPFS / Lustre
• Clusters / Clouds / Containers
• Dockerfile / Makefile / Ansible
BACKEND INFRASTRUCTURE: SYSTEMS
 Categorize systems as either Storage or Execution
 Describe and support relevant protocols, directories,
schedulers, and quotas
 Each system includes the credentials to log into the
system (SSH Keys, X509, username/password)
 Register everything with a JSON document
http://agaveapi.co/documentation/tutorials/system-management-tutorial/
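
As a rough illustration of registering a system with a JSON document, the sketch below builds a storage-system description and runs a basic sanity check before serializing it. The field names, host, and user are assumptions chosen for illustration, not a verified copy of the Agave schema; the linked tutorial is the authority on the real format. Credentials are deliberately left out of the example.

```python
import json

# Hypothetical storage-system description. Field names and values are
# illustrative assumptions, not a verified copy of the Agave schema.
storage_system = {
    "id": "example-storage",           # hypothetical identifier
    "name": "Example storage system",
    "type": "STORAGE",                 # the two categories: STORAGE or EXECUTION
    "storage": {
        "protocol": "SFTP",
        "host": "storage.example.org",  # placeholder host
        "port": 22,
        "rootDir": "/",
        "homeDir": "/home/exampleuser",
        "auth": {"type": "SSHKEYS", "username": "exampleuser"},
    },
}

ALLOWED_TYPES = {"STORAGE", "EXECUTION"}


def validate_system(doc):
    """Return a list of problems found in a system description (empty if none)."""
    problems = []
    for key in ("id", "name", "type"):
        if key not in doc:
            problems.append(f"missing field: {key}")
    if doc.get("type") not in ALLOWED_TYPES:
        problems.append("type must be STORAGE or EXECUTION")
    return problems


problems = validate_system(storage_system)
system_json = json.dumps(storage_system, indent=2)
print(problems if problems else system_json)
```

Checking the document locally before sending it anywhere keeps simple mistakes (a missing id, a misspelled type) out of the registration step.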
BACKEND INFRASTRUCTURE: APPS
 An “App” is a versioned instance of a software package
on a specific Execution System
 App assets are bundled into a directory and stored on a
Storage System
 Apps can be private, shared with individual users, or
made public
 Public apps are compressed, assigned a checksum, and
stored in a protected space
http://agaveapi.co/documentation/tutorials/app-management-tutorial/
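
The "compressed, assigned a checksum" step for public apps can be pictured with a small sketch like the one below. It is not Agave's actual publishing code: the archive format (tar.gz), the hash (SHA-256), and the directory and file names are all assumptions made for illustration.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def publish(app_dir, archive_path):
    """Compress an app directory into a .tar.gz and return the archive checksum."""
    with tarfile.open(str(archive_path), "w:gz") as tar:
        tar.add(str(app_dir), arcname=Path(app_dir).name)
    return sha256_of(archive_path)


def demo():
    """Bundle a hypothetical app, then show that tampering changes the checksum."""
    with tempfile.TemporaryDirectory() as tmp:
        app_dir = Path(tmp) / "example-app-0.1.0"   # hypothetical app bundle
        app_dir.mkdir()
        (app_dir / "wrapper.sh").write_text("#!/bin/bash\necho run\n")
        archive = Path(tmp) / "example-app-0.1.0.tar.gz"
        recorded = publish(app_dir, archive)
        unchanged = sha256_of(archive) == recorded
        with open(archive, "ab") as fh:             # simulate later modification
            fh.write(b"tampered")
        still_matches = sha256_of(archive) == recorded
    return recorded, unchanged, still_matches


recorded, unchanged, still_matches = demo()
print(recorded, unchanged, still_matches)
```

Recording the checksum at publication time is what lets a later user confirm that the app they run is byte-for-byte the one that was published.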
BACKEND INFRASTRUCTURE: JOBS
 A “Job” is an execution of an App with a specific set
of input files and parameters
 Every job is given an ID; all inputs and parameters are
preserved, and output is tracked as well
 Jobs can be shared with others
http://agaveapi.co/documentation/tutorials/job-management-tutorial/
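
To make the reproducibility idea concrete, here is a minimal sketch of a job record that keeps the app, inputs, and parameters together under an ID, so the same request can be rebuilt later. The class, field names, and sample values are hypothetical and are not the Agave job schema; the linked tutorial describes the real request format.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical preserved record of one execution of an app."""
    app_id: str
    inputs: dict
    parameters: dict
    job_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    outputs: tuple = ()


def rerun_request(record):
    """Rebuild the original request from a preserved record."""
    return {
        "appId": record.app_id,
        "inputs": dict(record.inputs),
        "parameters": dict(record.parameters),
    }


job = JobRecord(
    app_id="example-app-0.1.0",              # hypothetical app identifier
    inputs={"reads": "data/reads.fastq"},    # hypothetical input file
    parameters={"threads": 4},
)
request = rerun_request(job)
print(json.dumps(asdict(job), indent=2))
```

Because the record is immutable and carries everything the run depended on, sharing the job amounts to sharing a recipe that someone else can execute again.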
DEVELOPER COMMAND-LINE TOOLS
 https://bitbucket.org/agaveapi/cli
 Requires bash and Python’s json.tool
 Uses caching for authentication
 Parses JSON responses to condense output
 As a Linux user, this is home-sweet-home
WHAT ABOUT JUPYTER?
 Bleeding edge research will never be on a webpage
 Data exploration “outside the app” also needs to be
captured
 An infrastructure for responsible computing at scale
inevitably must support responsible data exploration
 Jupyter has broad OS support, domain adoption,
domain libraries, and a more interactive UI
AGAVEPY
 github.com/TACC/agavepy
 Pythonic wrapper for all Agave endpoints
 pip install agavepy
 Developers actively “dogfooding” the module
 (Obviously) usable within Jupyter
 Has had greater uptake by users (not just developers)
AGAVE-AWARE JUPYTERHUB
 Going one step further – give users a notebook
 jupyter.public.tenants.prod.agaveapi.co/
 (Free) account creation here:
public.tenants.prod.agaveapi.co/create_account
 Beta implementation at the moment
 data purges during updates
 Limited capacity on the current VM
 All notebooks run inside Docker containers
WHAT’S NEXT?
 Full-featured developer portal
 Open-source reference implementation of an
Angular JavaScript portal built on Agave
 Additional Jupyter notebook examples
 Production-grade support for a hosted JupyterHub
THANKS!
QUESTIONS?
Email: jfonner@tacc.utexas.edu
Twitter: @johnfonner
TACC: www.tacc.utexas.edu
Agave: www.agaveapi.co
AgavePy: github.com/TACC/agavepy

Editor's notes

  1. Traditionally, the Methods section of a paper captured the process. In computation, we commonly have the Methods section plus the raw input data. Enterprising students add source code.