Computable Content:
Notebooks, containers, and data-centric organizational learning
Domino Data Science Popup

2017-02-22
Paco Nathan, @pacoid

Dir, Learning Group @ O’Reilly Media
Project Jupyter
3
Project Jupyter is the evolution of IPython notebooks,
applied to a range of different programming languages
and environments
https://jupyter.org/
https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
Some history…
4
Download Anaconda:
continuum.io/downloads
Activate the environment needed:
source activate py3k
Launch Jupyter:
jupyter notebook
An example notebook (requires installs; see notes):
github.com/ceteri/oriole_jupyterday_atl/blob/master/example.ipynb
Installation and launch using Anaconda
5
text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact."
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''
from textblob import TextBlob

# parse the text, then print part-of-speech tags and noun phrases
blob = TextBlob(text)
print(blob.tags)
print(blob.noun_phrases)
Installation and launch using Anaconda
6
7
At its core, one can think of Jupyter as a suite of network protocols:
Jupyter is to the remote semantics of a REPL
as…
HTTP is to the remote semantics of a file share
A suite of network protocols
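To make the "suite of network protocols" idea concrete, here is a minimal sketch of how a client might frame an `execute_request` under the Jupyter messaging protocol, using only the Python standard library. The session id, username, key, and protocol version string are illustrative assumptions, not values from the talk, and a real client would send these frames over a ZeroMQ socket (for example via the `jupyter_client` package) rather than just building them.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

# Delimiter that separates routing identities from the signed message parts.
DELIMITER = b"<IDS|MSG>"


def build_execute_request(code, session_id, key):
    """Build the wire frames for one execute_request message."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session_id,
        "username": "demo",       # assumption: any username string
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",         # assumption: protocol version string
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    # Four JSON parts, in order: header, parent_header, metadata, content.
    parts = [json.dumps(d).encode("utf-8") for d in (header, {}, {}, content)]

    # The signature is an HMAC over the four serialized parts, in order.
    signer = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        signer.update(part)
    signature = signer.hexdigest().encode("ascii")

    return [DELIMITER, signature] + parts


# Hypothetical session id and key, for illustration only.
frames = build_execute_request("print(6 * 7)", session_id="demo-session", key=b"secret")
print(json.loads(frames[5])["code"])
```

The kernel on the other end replies with messages of the same shape, which is what lets any front end talk to any kernel, much as any browser can talk to any HTTP server.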
8
An excellent team
9
JupyterHub
github.com/jupyterhub/jupyterhub
Jupyter in Education
groups.google.com/forum/#!forum/jupyter-education
JupyterLab (alpha preview)
github.com/jupyterlab/jupyterlab
Jupyter Kernels
github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
Projects:
10
documentation
jupyter.readthedocs.io/en/latest/index.html
discussions
groups.google.com/forum/#!forum/jupyter
gitter.im/jupyter/jupyter
events
calendar.google.com/calendar/embed?src=p51j0ac1iccmj44tae12hq4dk0%40group.calendar.google.com
Resources:
11
speaking of upcoming events, stay tuned for …
Resources:
Computable Content
14
An observation…
15
Jupyter @ O’Reilly Media
Embracing Jupyter Notebooks at O'Reilly

oreilly.com/ideas/jupyter-at-oreilly
Learn alongside innovators, thought-by-thought, in context
oreilly.com/ideas/oreilly-oriole-learn-alongside-innovators-thought-by-thought-in-context
Oriole Online Tutorials

safaribooksonline.com/oriole/
How Do You Learn?
oreilly.com/learning/how-do-you-learn
16
For example…
• A unique new medium blends code, data, text, and video into a narrated learning experience with computable content
• Purely browser-based UX; zero installation required
• Substantially higher engagement metrics
• Opens the door for live coding in assessments
• GitHub lists over 300K public Jupyter notebooks
Regex Golf by Peter Norvig
oreilly.com/learning/regex-golf-with-peter-norvig
17
Motivations
O’Reilly needed a way for authors to use Jupyter notebooks to create
professional publications. We also wanted to integrate video narration
into the UX. The result is a unique new medium called Oriole:
• Jupyter notebooks are used in the middleware
• each viewer gets a 100% HTML experience (no download/install needed)
• context as a “unit of thought”
• the code and video are sync’ed together
• each web session has a Docker container running in the cloud
18
Motivations
Innovators in programming, data science, dev ops, design, etc., tend to
be really busy people. Tutorials are now much quicker to publish than
“traditional” books and videos. The audience gets direct, hands-on,
contextualized experience across a wide variety of programming
environments.
19
Literate Programming, Don Knuth

literateprogramming.com/
Paraphrased:
Instead of telling computers what to do, tell other
people what you want the computers to do
Some history
20
Wolfram Research introduced notebooks in 1988 for working with Mathematica…
Some history
21
PyCon 2016 Keynote, Lorena Barba
youtu.be/ckW1xuGVpug?t=35m11s (video)
figshare.com/articles/PyCon2016_Keynote/3407779 (slides)
Highly recommended: speech acts (based on Winograd and Flores) as theory for what we’re doing here
More recently
Notebook Practice
23
• focus on a concise “unit of thought”
• invest the time and editorial effort to create a good intro
• keep your narrative simple and reasonably linear
• “chunk” the text and code into understandable parts
• alternate between text, code, output, further links, etc.
• use markdown for interesting links: background, deep-dive, etc.
• code cells shouldn’t be long (< 10 lines), must show output
• load data+libraries from the container, not the network
• clear all output then “Run All” – or it didn’t happen
• video narratives: there’s text, and there’s subtext...
• pause after each “beat” – smile, breathe, let people follow you
Tips learned by teaching with Jupyter
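Two of the tips above (keep code cells short; clear all output before "Run All") can be checked mechanically. The following is a minimal sketch that works on the notebook's JSON structure using only the standard library. The function names and the in-memory demo notebook are hypothetical, and in practice one would load a real `.ipynb` file with `json.load` or the `nbformat` package.

```python
import json

MAX_CODE_LINES = 10  # threshold taken from the "< 10 lines" tip


def clear_outputs(notebook):
    """Return a copy of a notebook dict with all code-cell outputs removed."""
    cleaned = json.loads(json.dumps(notebook))  # simple deep copy via JSON
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def long_code_cells(notebook, max_lines=MAX_CODE_LINES):
    """Return indexes of code cells whose source has max_lines or more lines."""
    flagged = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # The notebook format allows source as a string or a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        if len(text.splitlines()) >= max_lines:
            flagged.append(index)
    return flagged


# A tiny in-memory notebook (hypothetical content) in nbformat 4 layout.
demo = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Intro\n"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": ["print('hello')\n"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hello\n"]}
            ],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 2,
            "source": ["x = %d\n" % i for i in range(12)],
            "outputs": [],
        },
    ],
}

cleaned = clear_outputs(demo)
print(long_code_cells(cleaned))  # → [2]
```

A check like this could run before a notebook is committed, so that stale outputs and overly long cells are caught before anyone else tries "Run All".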
For the JVM people: stop thinking only about IDEs, Ivy, Maven, etc. (ibid, Knuth1984)

BUILD UBER JARS, LOAD LIBS FROM CONTAINER, NOT THE NETWORK!

(apologies for shouting)
24
Jupyter notebooks + Git repos provide a low-cost,
pragmatic way toward the practice of repeatable
science – in this case, repeatable Data Science
• executable documents
• code + params + results + descriptions
• shareable insights
Notebooks: a cure for silos
25
In data science, we see the benefits to teams for shared
insights, storytelling, etc.
Meanwhile domain expertise is generally more important than
knowledge about tools
There’s a value for developers to use notebooks in lieu of IDEs
in some cases – what are those cases?
GitHub now renders notebooks, so they can be used for
documentation, reporting, etc.
Digital Object Identifiers (DOI) can be assigned through
Zenodo, making notebooks citable for academic publication
“Sharing is caring”
Authoring & Scale-Out
27
Launchbot.io
28
Launchbot allows a notebook author to build a
container that includes the required Jupyter kernel,
installed libraries, datasets, etc.
You need to have Docker installed on your laptop
The backend uses Git and DockerHub to manage
containers
For scale, deploy to DC/OS
Achieving scale
29
A notebook, a container, and ~20 minutes of
informal video walk into a bar...
O’Reilly Media conferences + training:
NLP in Python

repeated live online courses
Strata

SJ Mar 13-16

Deep Learning sessions, 2-day training
Artificial Intelligence

NY Jun 26-29, SF Sep 17-20

SF CFP is open, follow @OreillyAI for updates
speaker:
periodic newsletter for updates, events, conf summaries, etc.:
liber118.com/pxn/

@pacoid
A modest proposal
Just Enough Math
Building Data Science Teams
Hylbert-Speys
How Do You Learn?


Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

Computable content: Notebooks, containers, and data-centric organizational learning

  • 1. Computable Content: Notebooks, containers, and data-centric organizational learning. Domino Data Science Popup, 2017-02-22. Paco Nathan, @pacoid, Dir, Learning Group @ O’Reilly Media
  • 3. Some history: Project Jupyter is the evolution of IPython notebooks, applied to a range of different programming languages and environments. https://jupyter.org/ https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
  • 4. Installation and launch using Anaconda:
      • Download Anaconda: continuum.io/downloads
      • Activate the environment needed: source activate py3k
      • Launch Jupyter: jupyter notebook
      • An example notebook (requires installs; see notes): github.com/ceteri/oriole_jupyterday_atl/blob/master/example.ipynb
  • 5. Installation and launch using Anaconda:

        from textblob import TextBlob

        text = '''
        The titular threat of The Blob has always struck me as the ultimate movie
        monster: an insatiably hungry, amoeba-like mass able to penetrate
        virtually any safeguard, capable of--as a doomed doctor chillingly
        describes it--"assimilating flesh on contact."
        Snide comparisons to gelatin be damned, it's a concept with the most
        devastating of potential consequences, not unlike the grey goo scenario
        proposed by technological theorists fearful of
        artificial intelligence run rampant.
        '''

        blob = TextBlob(text)
        print(blob.tags)          # part-of-speech tags as (word, tag) pairs
        print(blob.noun_phrases)  # noun phrases extracted from the text
  • 7. A suite of network protocols: At its core, one can think of Jupyter as a suite of network protocols. Jupyter is to the remote semantics of a REPL as HTTP is to the remote semantics of a file share.
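To make the "REPL over a network protocol" framing concrete, here is a minimal sketch of the kind of message a client sends when it asks a kernel to run code. It is not from the slides: the field names follow the Jupyter messaging specification's `execute_request`, while the session id, username, signing key, and protocol version string are illustrative assumptions. No kernel is contacted; the sketch only builds and signs the message parts.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

PART_ORDER = ("header", "parent_header", "metadata", "content")


def make_execute_request(code, session_id, username="demo"):
    """Build the four dictionaries that make up an execute_request message."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session_id,
        "username": username,  # assumed value, for illustration only
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",  # assumed protocol version string
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {"header": header, "parent_header": {}, "metadata": {}, "content": content}


def sign(message, key):
    """Serialize the four parts and compute an HMAC-SHA256 signature over them."""
    parts = [json.dumps(message[name]).encode("utf-8") for name in PART_ORDER]
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)
    return parts, mac.hexdigest()


message = make_execute_request("1 + 1", session_id="hypothetical-session")
parts, signature = sign(message, key=b"hypothetical-key")
print(message["header"]["msg_type"], len(parts))
```

In a real deployment a client library handles this framing and the transport; the point here is only that "run this cell" is an ordinary structured message, much as an HTTP request is.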
  • 9. Projects:
      • JupyterHub: github.com/jupyterhub/jupyterhub
      • Jupyter in Education: groups.google.com/forum/#!forum/jupyter-education
      • JupyterLab (alpha preview): github.com/jupyterlab/jupyterlab
      • Jupyter Kernels: github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
  • 11. Resources: speaking of upcoming events, stay tuned for …
  • 15. Jupyter @ O’Reilly Media:
      • Embracing Jupyter Notebooks at O'Reilly: oreilly.com/ideas/jupyter-at-oreilly
      • Learn alongside innovators, thought-by-thought, in context: oreilly.com/ideas/oreilly-oriole-learn-alongside-innovators-thought-by-thought-in-context
      • Oriole Online Tutorials: safaribooksonline.com/oriole/
      • How Do You Learn?: oreilly.com/learning/how-do-you-learn
  • 16. For example… Regex Golf by Peter Norvig: oreilly.com/learning/regex-golf-with-peter-norvig
      • A unique new medium blends code, data, text, and video into a narrated learning experience with computable content
      • Purely browser-based UX; zero installation required
      • Substantially higher engagement metrics
      • Opens the door for live coding in assessments
      • GitHub lists over 300K public Jupyter notebooks
  • 17. Motivations: O’Reilly needed a way for authors to use Jupyter notebooks to create professional publications. We also wanted to integrate video narration into the UX. The result is a unique new medium called Oriole:
      • Jupyter notebooks are used in the middleware
      • each viewer gets a 100% HTML experience (no download/install needed)
      • context as a “unit of thought”
      • the code and video are synced together
      • each web session has a Docker container running in the cloud
  • 18. Motivations: Innovators in programming, data science, dev ops, design, etc., tend to be really busy people. Tutorials are now much quicker to publish than “traditional” books and videos. The audience gets direct, hands-on, contextualized experience across a wide variety of programming environments.
  • 19. Some history: Literate Programming, Don Knuth, literateprogramming.com/. Paraphrased: instead of telling computers what to do, tell other people what you want the computers to do.
  • 20. Some history: Wolfram Research introduced notebooks in 1988 for working with Mathematica…
  • 21. More recently: PyCon 2016 Keynote, Lorena Barba: youtu.be/ckW1xuGVpug?t=35m11s (video), figshare.com/articles/PyCon2016_Keynote/3407779 (slides). Highly recommended: speech acts (based on Winograd and Flores) as theory for what we’re doing here.
  • 23. Tips learned by teaching with Jupyter:
      • focus on a concise “unit of thought”
      • invest the time and editorial effort to create a good intro
      • keep your narrative simple and reasonably linear
      • “chunk” the text and code into understandable parts
      • alternate between text, code, output, further links, etc.
      • use markdown for interesting links: background, deep-dive, etc.
      • code cells shouldn’t be long (< 10 lines), must show output
      • load data+libraries from the container, not the network
      • clear all output then “Run All” – or it didn’t happen
      • video narratives: there’s text, and there’s subtext...
      • pause after each “beat” – smile, breathe, let people follow you
 For the JVM people: stop thinking only about IDEs, Ivy, Maven, etc. (ibid, Knuth1984). BUILD UBER JARS, LOAD LIBS FROM CONTAINER, NOT THE NETWORK! (apologies for shouting)
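Two of the tips above (short code cells, and every code cell showing output after a clean "Run All") are mechanical enough to check automatically. The following is a minimal sketch, not a tool mentioned in the talk: it assumes the standard version 4 notebook JSON layout (`cells`, `cell_type`, `source`, `outputs`), and the function names and the in-memory example notebook are hypothetical.

```python
import json

MAX_CODE_LINES = 10  # threshold taken from the "< 10 lines" tip


def load_notebook(path):
    """Read an .ipynb file, which is plain JSON on disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def source_lines(cell):
    """Return a cell's source as a list of lines (stored as a str or a list)."""
    source = cell.get("source", "")
    if isinstance(source, list):
        source = "".join(source)
    return source.splitlines()


def check_notebook(notebook):
    """Return (cell_index, problem) pairs for code cells that break the tips."""
    problems = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        if len(source_lines(cell)) >= MAX_CODE_LINES:
            problems.append((index, "too long"))
        if not cell.get("outputs"):
            problems.append((index, "no output"))
    return problems


# Hypothetical in-memory notebook: one markdown cell, one good code cell,
# and one code cell that is both too long and was never run.
example = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Intro"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": ["print('hi')"],
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "source": ["x = 1\n"] * 12, "outputs": []},
    ],
}

print(check_notebook(example))  # [(2, 'too long'), (2, 'no output')]
```

For a notebook on disk, `check_notebook(load_notebook("example.ipynb"))` would apply the same checks; a cell that legitimately produces no output (for example, one containing only imports) would need an exception that this sketch does not make.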
  • 24. Notebooks: a cure for silos. Jupyter notebooks + Git repos provide a low-cost, pragmatic way toward the practice of repeatable science – in this case, repeatable Data Science:
      • executable documents
      • code + params + results + descriptions
      • shareable insights
  • 25. “Sharing is caring”:
      • In data science, we see the benefits to teams for shared insights, storytelling, etc.
      • Meanwhile, domain expertise is generally more important than knowledge about tools
      • There’s a value for developers to use notebooks in lieu of IDEs in some cases – what are those cases?
      • GitHub now renders notebooks, so they can be used for documentation, reporting, etc.
      • Digital Object Identifiers (DOI) can be assigned through Zenodo, making notebooks citable for academic publication
  • 28. Achieving scale:
      • Launchbot allows a notebook author to build a container that includes the required Jupyter kernel, installed libraries, datasets, etc.
      • You need to have Docker installed on your laptop
      • The backend uses Git and DockerHub to manage containers
      • For scale, deploy to DC/OS
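The idea of baking the kernel, libraries, and datasets into one container can be illustrated with a small sketch that renders a Dockerfile as text. This is not Launchbot's implementation, which the slides do not describe; the function name, the `jupyter/base-notebook` base image, the work directory, and the file names are all assumptions chosen for illustration.

```python
def render_dockerfile(base_image, pip_packages, data_files,
                      workdir="/home/jovyan/work"):
    """Render Dockerfile text that bakes libraries and datasets into the image."""
    lines = [f"FROM {base_image}"]
    if pip_packages:
        # Install libraries at build time so nothing is fetched at run time.
        lines.append("RUN pip install --no-cache-dir " + " ".join(pip_packages))
    lines.append(f"WORKDIR {workdir}")
    for path in data_files:
        # Copy notebooks and datasets into the image alongside the kernel.
        lines.append(f"COPY {path} {workdir}/")
    lines.append("EXPOSE 8888")
    lines.append('CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--no-browser"]')
    return "\n".join(lines) + "\n"


dockerfile = render_dockerfile(
    base_image="jupyter/base-notebook",                # assumed base image
    pip_packages=["textblob"],
    data_files=["example.ipynb", "data/reviews.csv"],  # hypothetical files
)
print(dockerfile)
```

Building and publishing the image would still be done with the Docker CLI; the sketch only shows that the notebook's whole environment can be captured in a short, versionable text file that lives in the same Git repo.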
  • 29. A notebook, a container, and ~20 minutes of informal video walk into a bar...
  • 30. O’Reilly Media conferences + training:
      • NLP in Python: repeated live online courses
      • Strata: SJ Mar 13-16; Deep Learning sessions, 2-day training
      • Artificial Intelligence: NY Jun 26-29, SF Sep 17-20; SF CFP is open, follow @OreillyAI for updates
  • 32. speaker: periodic newsletter for updates, events, conf summaries, etc.: liber118.com/pxn/ @pacoid. A modest proposal; Just Enough Math; Building Data Science Teams; Hylbert-Speys; How Do You Learn?