SlideShare ist ein Scribd-Unternehmen logo
1 von 26
Downloaden Sie, um offline zu lesen
Scaling Notebooks for
Deep Learning
workloads
Luciano Resende
JupyterCon - 2018
1© 2018 IBM Corporation
About me - Luciano Resende
2
Open Source AI Platform Architect – IBM – CODAIT
• Senior Technical Staff Member at IBM, contributing to open source for over 10 years
• Currently contributing to : Jupyter Notebook ecosystem, Apache Bahir, Apache
Toree, Apache Spark among other projects related to AI/ML platforms
lresende@us.ibm.com
https://www.linkedin.com/in/lresende
@lresende1975
https://github.com/lresende
© 2018 IBM Corporation
Jupyter Notebooks
Overview
3© 2018 IBM Corporation
Jupyter Notebooks
© 2018 IBM Corporation 4
Notebooks are interactive
computational
environments, in which
you can combine code
execution, rich text,
mathematics, plots and
rich media.
Jupyter Notebooks
© 2018 IBM Corporation 5
• Notebook UI runs on the browser
• The Notebook Server serves the
’Notebooks’
• Kernels interpret/execute cell
contents
– Are responsible for code execution
– Abstracts different languages
– 1:1 relationship with Notebook
– Runs and consume resources as long as
notebook is running
Jupyter Notebooks
and Deep Learning
6© 2018 IBM Corporation
Deep Learning workloads
© 2018 IBM Corporation 7
Resource Intensive workloads
Requires expensive hardware (GPUs)
Long Running training jobs
- Simple MNIST takes over one hour
WITHOUT a decent GPU
- Other non complex deep learning model
training can easily take over a day
WITH GPUs
Jupyter & Kubernetes
© 2018 IBM Corporation 8
Kubernetes Platform
- Containers provides a flexible way to
deploy applications and are here to stay
- Kubernetes enables easy management
of containerized applications and
resources with the benefit of Elasticity
and Quality of Services
Source: https://github.com/Langhalsdino/Kubernetes-GPU-Guide
Enterprise Gateway & Kubernetes
© 2018 IBM Corporation
Supported Platforms
FfDL
Before Enterprise Gateway After Enterprise Gateway
Before Jupyter Enterprise Gateway …
• Resources required for all kernels needs to
be allocated during Notebook Server pod
creation
• Resources limited to what is physically
available on the host node that runs all
kernels and associated Spark drivers
After Jupyter Enterprise Gateway …
• Gateway pod very lightweight
• Kernels in their own pod, isolation
• Kernel pods built from community images:
Spark-on-K8s, TensorFlow, Keras, etc.
Jupyter Enterprise Gateway - Kubernetes
© 2018 IBM Corporation 10
Container images defined in kernelspec
Community image
Kernel
Spark on K8
Kernel
Distributed
File
System
Vanilla Kernels
Spark based kernels
Gateway
nb2kg
nb2kg
Deep Learning Platforms
© 2018 IBM Corporation 11
Prohibited costs
- Deep Learning resources are prohibitive in
costs to be locked/idle during interactive
development
Deep Learning Platforms
- We have seen the rise of Deep Learning
platforms that leverage containers and
Kubernetes as the basis of their
infrastructure
- Kubernetes enables Deep Learning
platforms to easily share and restrict
accelerated hardware
Fabric for Deep Learning
IBM Watson Studio
Deep Learning as a service
Batch
oriented
developm
ent
Deep Learning Workspace
March 30 2018 / © 2018 IBM Corporation 12
Streamline Data Science user experience when
coming from Notebook/Interactive
development interfaces
• Current process include multiple steps, one
being decomposing the notebook into an
application that needs to be submitted as a
zip to the deep learning runtime which
becomes a show stopper for data scientists
to adopt FfDL and DLaas
March 30 2018 / © 2018 IBM Corporation 13
Streamline the Deep Learning application
lifecycle
• Run local notebook experiments, with small data
samples and seamlessly validate experiments on
Deep Learning environments
• IBM Cloud DLaaS, FfDL (open source), KubeFlow
(open source)
Simplify productionalization of Model training
and serving from Notebooks
• Enable running/scheduling notebooks on
production environments as batch jobs
• Results can be made available via updated
notebook, or exported to html, pdf and a few
other formats.
Interactive development
lifecycle done on
commodity hardware
with sampled data
Training on full dataset
gets scheduled as
batch jobs on deep
learning infrastructure
Deep Learning Workspace
14March 30 2018 / © 2018 IBM Corporation
March 30 2018 / © 2018 IBM Corporation 15
• User select where to run the
experiment
• Job is packaged and submitted on
behalf of user
• User has access to Job Console to
monitor experiment
Deep Learning Workspace
March 30 2018 / © 2018 IBM Corporation
Deep Learning Extensions
Notebook Extension (button)
https://github.com/lresende/enterprise_scheduler_extension
Notebook Scheduler (backend)
https://github.com/lresende/enterprise_scheduler
16
Enable interactive development on
commodity hardware while easily
enabling model training on Deep
Learning infrastructure with GPU based
hardware.
Deep Learning Workspace
17March 30 2018 / © 2018 IBM Corporation
BACKUP
SLIDES
18© 2018 IBM Corporation
Jupyter Enterprise Gateway
19© 2018 IBM Corporation
Jupyter Enterprise
Gateway
© 2018 IBM Corporation
Jupyter Enterprise Gateway at IBM Code
https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/
Jupyter Enterprise Gateway source code at GitHub
https://github.com/jupyter-incubator/enterprise_gateway
Jupyter Enterprise Gateway Documentation
http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
Supported Kernels
Supported Platforms
20
A lightweight, multi-tenant, scalable
and secure gateway that enables
Jupyter Notebooks to share resources
across an Apache Spark or Kubernetes
cluster for Enterprise/Cloud use cases
Spectrum Conductor
+
Jupyter Enterprise Gateway Features
© 2018 IBM Corporation
Gather
Data
Analyze
Data
Machine
Learning
Deep
Learning
Deploy
Model
Maintain
Model
Python
Data Science
Stack
Fabric for
Deep Learning
(FfDL)
Mleap +
PFA
Scikit-LearnPandas
Apache
Spark
Apache
Spark
Jupyter
Model
Asset
eXchange
Keras +
Tensorflow
21
16
32
48
64
0
10
20
30
40
50
60
70
80
4 Nodes 8 Nodes 12 Nodes 16 NodesMaxKernels(4GBHeap)
Cluster Size (32GB Nodes)
MAXIMUM NUMBER OF
SIMULTANEOUS KERNELS
Optimized Resource Allocation
– Utilize resources on all cluster nodes by running kernels as Spark
applications in YARN Cluster Mode.
– Pluggable architecture to enable support for additional Resource Managers
Enhanced Security
– End-to-End secure communications
• Secure socket communications
• Encrypted HTTP communication using SSL
Multiuser support with user impersonation
– Enhance security and sandboxing by enabling user impersonation when
running kernels (using Kerberos).
– Individual HDFS home folder for each notebook user.
– Use the same user ID for notebook and batch jobs.
Kernel
Kernel
Kernel
Kernel
Kernel
Kernel
Kernel
Kernel
Kernel
Jupyter Enterprise Gateway – YARN
© 2018 IBM Corporation 22
YARN Cluster
YARN
Workers
Gateway Node
Jupyter Enterprise Gateway
• Multitenancy
• Remote kernel lifecycle management via process proxies
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Impersonation:
Alice’s kernel runs
under Alice’s user ID.
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
SecurityLayer
nb2kg
nb2kg
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Bob
Alice
Enterprise Gateway & Kubernetes
© 2018 IBM Corporation
Supported Platforms
FfDL
Before Enterprise Gateway After Enterprise Gateway
Before Jupyter Enterprise Gateway …
• Resources required for all kernels needs to
be allocated during Notebook Server pod
creation
• Resources limited to what is physically
available on the host node that runs all
kernels and associated Spark drivers
After Jupyter Enterprise Gateway …
• Gateway pod very lightweight
• Kernels in their own pod, isolation
• Kernel pods built from community images:
Spark-on-K8s, TensorFlow, Keras, etc.
Jupyter Enterprise Gateway - Kubernetes
© 2018 IBM Corporation 24
Container images defined in kernelspec
Community image
Kernel
Spark on K8
Kernel
Distributed
File
System
Vanilla Kernels
Spark based kernels
Gateway
nb2kg
nb2kg
The Secret Sauce:
Process Proxies mixed with Kernel Launchers
© 2018 IBM Corporation
Process Proxy:
• Abstracts kernel process represented by Jupyter
framework
• Pluggable class definition identified in kernelspec
(kernel.json)
• Manages kernel lifecycle
Kernel Launcher:
• Embeds target kernel
• Listens on gateway communication port
• Conveys interrupt requests (via local signal)
• Could be extended for additional communications
{
"language": "python",
"display_name": "Spark - Python (Kubernetes Mode)",
"process_proxy": {
"class_name":
"enterprise_gateway.services.processproxies.k8s.KubernetesProcessP
roxy",
"config": {
"image_name": "elyra/kubernetes-kernel-py:dev",
"executor_image_name": "elyra/kubernetes-kernel-py:dev”,
"port_range" : "40000..42000"
}
},
"env": {
"SPARK_HOME": "/opt/spark",
"SPARK_OPTS": "--master k8s://https://${KUBERNETES_SERVICE_HOST
--deploy-mode cluster --name …",
…
},
"argv": [
"/usr/local/share/jupyter/kernels/spark_python_kubernetes/bin/run.
sh",
"{connection_file}",
"--RemoteProcessProxy.response-address",
"{response_address}",
"--RemoteProcessProxy.spark-context-initialization-mode",
"lazy"
]
}
26March 30 2018 / © 2018 IBM Corporation

Weitere ähnliche Inhalte

Was ist angesagt?

Building distribution packages with Docker
Building distribution packages with DockerBuilding distribution packages with Docker
Building distribution packages with DockerBruno Cornec
 
Training Ensimag OpenStack 2016
Training Ensimag OpenStack 2016Training Ensimag OpenStack 2016
Training Ensimag OpenStack 2016Bruno Cornec
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakSean Roberts
 
Herding your cattle from dev to ops
Herding your cattle from dev to opsHerding your cattle from dev to ops
Herding your cattle from dev to opsBastiaan Schaap
 
Private Cloud with Open Stack, Docker
Private Cloud with Open Stack, DockerPrivate Cloud with Open Stack, Docker
Private Cloud with Open Stack, DockerDavinder Kohli
 
Open Stack Cloud Services
Open Stack Cloud ServicesOpen Stack Cloud Services
Open Stack Cloud ServicesSaurabh Gupta
 
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Akash Tandon
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on DockerRakesh Saha
 
Automated Lifecycle Management - CloudFoundry on OpenStack
Automated Lifecycle Management - CloudFoundry on OpenStackAutomated Lifecycle Management - CloudFoundry on OpenStack
Automated Lifecycle Management - CloudFoundry on OpenStackAnimesh Singh
 
Kubeflow Distributed Training and HPO
Kubeflow Distributed Training and HPOKubeflow Distributed Training and HPO
Kubeflow Distributed Training and HPOAnimesh Singh
 
C418 - Build, Deploy and Manage Your First Open Pattern with PureApplication ...
C418 - Build, Deploy and Manage Your First Open Pattern with PureApplication ...C418 - Build, Deploy and Manage Your First Open Pattern with PureApplication ...
C418 - Build, Deploy and Manage Your First Open Pattern with PureApplication ...Hendrik van Run
 
AWS Summit London 2019 - Containers on AWS
AWS Summit London 2019 - Containers on AWSAWS Summit London 2019 - Containers on AWS
AWS Summit London 2019 - Containers on AWSMassimo Ferre'
 
.docker : how to deploy Digital Experience in a container drinking a cup of c...
.docker : how to deploy Digital Experience in a container drinking a cup of c....docker : how to deploy Digital Experience in a container drinking a cup of c...
.docker : how to deploy Digital Experience in a container drinking a cup of c...Andrea Fontana
 
OpenCL Overview Japan Virtual Open House Feb 2021
OpenCL Overview Japan Virtual Open House Feb 2021OpenCL Overview Japan Virtual Open House Feb 2021
OpenCL Overview Japan Virtual Open House Feb 2021The Khronos Group Inc.
 
Vulkan ML Japan Virtual Open House Feb 2021
Vulkan ML Japan Virtual Open House Feb 2021Vulkan ML Japan Virtual Open House Feb 2021
Vulkan ML Japan Virtual Open House Feb 2021The Khronos Group Inc.
 
Raspberry Pi - Lecture 1 Introduction
Raspberry Pi - Lecture 1 IntroductionRaspberry Pi - Lecture 1 Introduction
Raspberry Pi - Lecture 1 IntroductionMohamed Abdallah
 
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...
"An Update on Open Standard APIs for Vision Processing," a Presentation from ..."An Update on Open Standard APIs for Vision Processing," a Presentation from ...
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...Edge AI and Vision Alliance
 
Cloud native buildpacks_collabnix
Cloud native buildpacks_collabnixCloud native buildpacks_collabnix
Cloud native buildpacks_collabnixSuman Chakraborty
 

Was ist angesagt? (20)

Building distribution packages with Docker
Building distribution packages with DockerBuilding distribution packages with Docker
Building distribution packages with Docker
 
Training Ensimag OpenStack 2016
Training Ensimag OpenStack 2016Training Ensimag OpenStack 2016
Training Ensimag OpenStack 2016
 
KarthikSNOW_CV
KarthikSNOW_CVKarthikSNOW_CV
KarthikSNOW_CV
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
 
Herding your cattle from dev to ops
Herding your cattle from dev to opsHerding your cattle from dev to ops
Herding your cattle from dev to ops
 
Private Cloud with Open Stack, Docker
Private Cloud with Open Stack, DockerPrivate Cloud with Open Stack, Docker
Private Cloud with Open Stack, Docker
 
Open Stack Cloud Services
Open Stack Cloud ServicesOpen Stack Cloud Services
Open Stack Cloud Services
 
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on Docker
 
HP and linux
HP and linuxHP and linux
HP and linux
 
Automated Lifecycle Management - CloudFoundry on OpenStack
Automated Lifecycle Management - CloudFoundry on OpenStackAutomated Lifecycle Management - CloudFoundry on OpenStack
Automated Lifecycle Management - CloudFoundry on OpenStack
 
Kubeflow Distributed Training and HPO
Kubeflow Distributed Training and HPOKubeflow Distributed Training and HPO
Kubeflow Distributed Training and HPO
 
C418 - Build, Deploy and Manage Your First Open Pattern with PureApplication ...
C418 - Build, Deploy and Manage Your First Open Pattern with PureApplication ...C418 - Build, Deploy and Manage Your First Open Pattern with PureApplication ...
C418 - Build, Deploy and Manage Your First Open Pattern with PureApplication ...
 
AWS Summit London 2019 - Containers on AWS
AWS Summit London 2019 - Containers on AWSAWS Summit London 2019 - Containers on AWS
AWS Summit London 2019 - Containers on AWS
 
.docker : how to deploy Digital Experience in a container drinking a cup of c...
.docker : how to deploy Digital Experience in a container drinking a cup of c....docker : how to deploy Digital Experience in a container drinking a cup of c...
.docker : how to deploy Digital Experience in a container drinking a cup of c...
 
OpenCL Overview Japan Virtual Open House Feb 2021
OpenCL Overview Japan Virtual Open House Feb 2021OpenCL Overview Japan Virtual Open House Feb 2021
OpenCL Overview Japan Virtual Open House Feb 2021
 
Vulkan ML Japan Virtual Open House Feb 2021
Vulkan ML Japan Virtual Open House Feb 2021Vulkan ML Japan Virtual Open House Feb 2021
Vulkan ML Japan Virtual Open House Feb 2021
 
Raspberry Pi - Lecture 1 Introduction
Raspberry Pi - Lecture 1 IntroductionRaspberry Pi - Lecture 1 Introduction
Raspberry Pi - Lecture 1 Introduction
 
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...
"An Update on Open Standard APIs for Vision Processing," a Presentation from ..."An Update on Open Standard APIs for Vision Processing," a Presentation from ...
"An Update on Open Standard APIs for Vision Processing," a Presentation from ...
 
Cloud native buildpacks_collabnix
Cloud native buildpacks_collabnixCloud native buildpacks_collabnix
Cloud native buildpacks_collabnix
 

Ähnlich wie Scaling notebooks for Deep Learning workloads

The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017Luciano Resende
 
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkLuciano Resende
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayLuciano Resende
 
Jupyter con meetup extended jupyter kernel gateway
Jupyter con meetup   extended jupyter kernel gatewayJupyter con meetup   extended jupyter kernel gateway
Jupyter con meetup extended jupyter kernel gatewayLuciano Resende
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeLuciano Resende
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_indexChester Chen
 
Randstad Docker meetup - Serverless
Randstad Docker meetup - ServerlessRandstad Docker meetup - Serverless
Randstad Docker meetup - ServerlessDavid Delabassee
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examplesLuciano Resende
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Luciano Resende
 
Usernetes: Kubernetes as a non-root user
Usernetes: Kubernetes as a non-root userUsernetes: Kubernetes as a non-root user
Usernetes: Kubernetes as a non-root userAkihiro Suda
 
Comparing Next-Generation Container Image Building Tools
 Comparing Next-Generation Container Image Building Tools Comparing Next-Generation Container Image Building Tools
Comparing Next-Generation Container Image Building ToolsAkihiro Suda
 
Introduction to Python GUI development with Delphi for Python - Part 1: Del...
Introduction to Python GUI development with Delphi for Python - Part 1:   Del...Introduction to Python GUI development with Delphi for Python - Part 1:   Del...
Introduction to Python GUI development with Delphi for Python - Part 1: Del...Embarcadero Technologies
 
A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfLuciano Resende
 
Kubernetes is hard! Lessons learned taking our apps to Kubernetes - Eldad Ass...
Kubernetes is hard! Lessons learned taking our apps to Kubernetes - Eldad Ass...Kubernetes is hard! Lessons learned taking our apps to Kubernetes - Eldad Ass...
Kubernetes is hard! Lessons learned taking our apps to Kubernetes - Eldad Ass...Cloud Native Day Tel Aviv
 
Developing and Deploying Microservices to IBM Cloud Private
Developing and Deploying Microservices to IBM Cloud PrivateDeveloping and Deploying Microservices to IBM Cloud Private
Developing and Deploying Microservices to IBM Cloud PrivateShikha Srivastava
 
Using MySQL Containers
Using MySQL ContainersUsing MySQL Containers
Using MySQL ContainersMatt Lord
 
Updates on webSpoon and other innovations from Hitachi R&D
Updates on webSpoon and other innovations from Hitachi R&DUpdates on webSpoon and other innovations from Hitachi R&D
Updates on webSpoon and other innovations from Hitachi R&DHiromu Hota
 
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and Docker
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and DockerFast Scalable Easy Machine Learning with OpenPOWER, GPUs and Docker
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and DockerIndrajit Poddar
 
IBM Keynote presentation, OW2con'19, June 12-13, 2019, Paris.
IBM Keynote presentation, OW2con'19, June 12-13, 2019, Paris.IBM Keynote presentation, OW2con'19, June 12-13, 2019, Paris.
IBM Keynote presentation, OW2con'19, June 12-13, 2019, Paris.OW2
 

Ähnlich wie Scaling notebooks for Deep Learning workloads (20)

The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
 
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel Gateway
 
Jupyter con meetup extended jupyter kernel gateway
Jupyter con meetup   extended jupyter kernel gatewayJupyter con meetup   extended jupyter kernel gateway
Jupyter con meetup extended jupyter kernel gateway
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for Code
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_index
 
Randstad Docker meetup - Serverless
Randstad Docker meetup - ServerlessRandstad Docker meetup - Serverless
Randstad Docker meetup - Serverless
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examples
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
 
Usernetes: Kubernetes as a non-root user
Usernetes: Kubernetes as a non-root userUsernetes: Kubernetes as a non-root user
Usernetes: Kubernetes as a non-root user
 
Comparing Next-Generation Container Image Building Tools
 Comparing Next-Generation Container Image Building Tools Comparing Next-Generation Container Image Building Tools
Comparing Next-Generation Container Image Building Tools
 
Introduction to Python GUI development with Delphi for Python - Part 1: Del...
Introduction to Python GUI development with Delphi for Python - Part 1:   Del...Introduction to Python GUI development with Delphi for Python - Part 1:   Del...
Introduction to Python GUI development with Delphi for Python - Part 1: Del...
 
A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdf
 
Kubernetes is hard! Lessons learned taking our apps to Kubernetes - Eldad Ass...
Kubernetes is hard! Lessons learned taking our apps to Kubernetes - Eldad Ass...Kubernetes is hard! Lessons learned taking our apps to Kubernetes - Eldad Ass...
Kubernetes is hard! Lessons learned taking our apps to Kubernetes - Eldad Ass...
 
Developing and Deploying Microservices to IBM Cloud Private
Developing and Deploying Microservices to IBM Cloud PrivateDeveloping and Deploying Microservices to IBM Cloud Private
Developing and Deploying Microservices to IBM Cloud Private
 
Using MySQL Containers
Using MySQL ContainersUsing MySQL Containers
Using MySQL Containers
 
Updates on webSpoon and other innovations from Hitachi R&D
Updates on webSpoon and other innovations from Hitachi R&DUpdates on webSpoon and other innovations from Hitachi R&D
Updates on webSpoon and other innovations from Hitachi R&D
 
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and Docker
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and DockerFast Scalable Easy Machine Learning with OpenPOWER, GPUs and Docker
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and Docker
 
IBM Keynote presentation, OW2con'19, June 12-13, 2019, Paris.
IBM Keynote presentation, OW2con'19, June 12-13, 2019, Paris.IBM Keynote presentation, OW2con'19, June 12-13, 2019, Paris.
IBM Keynote presentation, OW2con'19, June 12-13, 2019, Paris.
 

Mehr von Luciano Resende

From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...Luciano Resende
 
IoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirIoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirLuciano Resende
 
Getting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirGetting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirLuciano Resende
 
Building iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirBuilding iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirLuciano Resende
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine LearningLuciano Resende
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirLuciano Resende
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceLuciano Resende
 
SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningLuciano Resende
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende
 
Open Source tools overview
Open Source tools overviewOpen Source tools overview
Open Source tools overviewLuciano Resende
 
Data access layer and schema definitions
Data access layer and schema definitionsData access layer and schema definitions
Data access layer and schema definitionsLuciano Resende
 
How mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open sourceHow mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open sourceLuciano Resende
 
Building RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RSBuilding RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RSLuciano Resende
 
Building apps with tuscany
Building apps with tuscanyBuilding apps with tuscany
Building apps with tuscanyLuciano Resende
 
S314011 - Developing Composite Applications for the Cloud with Apache Tuscany
S314011 - Developing Composite Applications for the Cloud with Apache TuscanyS314011 - Developing Composite Applications for the Cloud with Apache Tuscany
S314011 - Developing Composite Applications for the Cloud with Apache TuscanyLuciano Resende
 

Mehr von Luciano Resende (17)

From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
 
IoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirIoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache Bahir
 
Getting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirGetting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache Bahir
 
Building iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirBuilding iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache Bahir
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine Learning
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open source
 
SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine Learning
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conference
 
Asf icfoss-mentoring
Asf icfoss-mentoringAsf icfoss-mentoring
Asf icfoss-mentoring
 
Open Source tools overview
Open Source tools overviewOpen Source tools overview
Open Source tools overview
 
Data access layer and schema definitions
Data access layer and schema definitionsData access layer and schema definitions
Data access layer and schema definitions
 
How mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open sourceHow mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open source
 
Building RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RSBuilding RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RS
 
SCA Reaches the Cloud
SCA Reaches the CloudSCA Reaches the Cloud
SCA Reaches the Cloud
 
Building apps with tuscany
Building apps with tuscanyBuilding apps with tuscany
Building apps with tuscany
 
S314011 - Developing Composite Applications for the Cloud with Apache Tuscany
S314011 - Developing Composite Applications for the Cloud with Apache TuscanyS314011 - Developing Composite Applications for the Cloud with Apache Tuscany
S314011 - Developing Composite Applications for the Cloud with Apache Tuscany
 

Kürzlich hochgeladen

Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 

Kürzlich hochgeladen (20)

Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 

Scaling notebooks for Deep Learning workloads

  • 1. Scaling Notebooks for Deep Learning workloads Luciano Resende JupyterCon - 2018 1© 2018 IBM Corporation
  • 2. About me - Luciano Resende 2 Open Source AI Platform Architect – IBM – CODAIT • Senior Technical Staff Member at IBM, contributing to open source for over 10 years • Currently contributing to : Jupyter Notebook ecosystem, Apache Bahir, Apache Toree, Apache Spark among other projects related to AI/ML platforms lresende@us.ibm.com https://www.linkedin.com/in/lresende @lresende1975 https://github.com/lresende © 2018 IBM Corporation
  • 4. Jupyter Notebooks © 2018 IBM Corporation 4 Notebooks are interactive computational environments, in which you can combine code execution, rich text, mathematics, plots and rich media.
  • 5. Jupyter Notebooks © 2018 IBM Corporation 5 • Notebook UI runs on the browser • The Notebook Server serves the ’Notebooks’ • Kernels interpret/execute cell contents – Are responsible for code execution – Abstracts different languages – 1:1 relationship with Notebook – Runs and consume resources as long as notebook is running
  • 6. Jupyter Notebooks and Deep Learning 6© 2018 IBM Corporation
  • 7. Deep Learning workloads © 2018 IBM Corporation 7 Resource Intensive workloads Requires expensive hardware (GPUs) Long Running training jobs - Simple MNIST takes over one hour WITHOUT a decent GPU - Other non complex deep learning model training can easily take over a day WITH GPUs
  • 8. Jupyter & Kubernetes © 2018 IBM Corporation 8 Kubernetes Platform - Containers provides a flexible way to deploy applications and are here to stay - Kubernetes enables easy management of containerized applications and resources with the benefit of Elasticity and Quality of Services Source: https://github.com/Langhalsdino/Kubernetes-GPU-Guide
  • 9. Enterprise Gateway & Kubernetes © 2018 IBM Corporation Supported Platforms FfDL Before Enterprise Gateway After Enterprise Gateway Before Jupyter Enterprise Gateway … • Resources required for all kernels needs to be allocated during Notebook Server pod creation • Resources limited to what is physically available on the host node that runs all kernels and associated Spark drivers After Jupyter Enterprise Gateway … • Gateway pod very lightweight • Kernels in their own pod, isolation • Kernel pods built from community images: Spark-on-K8s, TensorFlow, Keras, etc.
  • 10. Jupyter Enterprise Gateway - Kubernetes © 2018 IBM Corporation 10 Container images defined in kernelspec Community image Kernel Spark on K8 Kernel Distributed File System Vanilla Kernels Spark based kernels Gateway nb2kg nb2kg
  • 11. Deep Learning Platforms © 2018 IBM Corporation 11 Prohibited costs - Deep Learning resources are prohibitive in costs to be locked/idle during interactive development Deep Learning Platforms - We have seen the rise of Deep Learning platforms that leverage containers and Kubernetes as the basis of their infrastructure - Kubernetes enables Deep Learning platforms to easily share and restrict accelerated hardware Fabric for Deep Learning IBM Watson Studio Deep Learning as a service Batch oriented developm ent
  • 12. Deep Learning Workspace March 30 2018 / © 2018 IBM Corporation 12 Streamline Data Science user experience when coming from Notebook/Interactive development interfaces • Current process include multiple steps, one being decomposing the notebook into an application that needs to be submitted as a zip to the deep learning runtime which becomes a show stopper for data scientists to adopt FfDL and DLaas
  • 13. March 30 2018 / © 2018 IBM Corporation 13 Streamline the Deep Learning application lifecycle • Run local notebook experiments, with small data samples and seamlessly validate experiments on Deep Learning environments • IBM Cloud DLaaS, FfDL (open source), KubeFlow (open source) Simplify productionalization of Model training and serving from Notebooks • Enable running/scheduling notebooks on production environments as batch jobs • Results can be made available via updated notebook, or exported to html, pdf and a few other formats. Interactive development lifecycle done on commodity hardware with sampled data Training on full dataset gets scheduled as batch jobs on deep learning infrastructure Deep Learning Workspace
  • 14. 14March 30 2018 / © 2018 IBM Corporation
  • 15. March 30 2018 / © 2018 IBM Corporation 15 • User select where to run the experiment • Job is packaged and submitted on behalf of user • User has access to Job Console to monitor experiment Deep Learning Workspace
  • 16. March 30 2018 / © 2018 IBM Corporation Deep Learning Extensions Notebook Extension (button) https://github.com/lresende/enterprise_scheduler_extension Notebook Scheduler (backend) https://github.com/lresende/enterprise_scheduler 16 Enable interactive development on commodity hardware while easily enabling model training on Deep Learning infrastructure with GPU based hardware. Deep Learning Workspace
  • 17. 17March 30 2018 / © 2018 IBM Corporation
  • 19. Jupyter Enterprise Gateway 19© 2018 IBM Corporation
  • 20. Jupyter Enterprise Gateway © 2018 IBM Corporation Jupyter Enterprise Gateway at IBM Code https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/ Jupyter Enterprise Gateway source code at GitHub https://github.com/jupyter-incubator/enterprise_gateway Jupyter Enterprise Gateway Documentation http://jupyter-enterprise-gateway.readthedocs.io/en/latest/ Supported Kernels Supported Platforms 20 A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark or Kubernetes cluster for Enterprise/Cloud use cases Spectrum Conductor +
  • 21. Jupyter Enterprise Gateway Features © 2018 IBM Corporation Gather Data Analyze Data Machine Learning Deep Learning Deploy Model Maintain Model Python Data Science Stack Fabric for Deep Learning (FfDL) Mleap + PFA Scikit-LearnPandas Apache Spark Apache Spark Jupyter Model Asset eXchange Keras + Tensorflow 21 16 32 48 64 0 10 20 30 40 50 60 70 80 4 Nodes 8 Nodes 12 Nodes 16 NodesMaxKernels(4GBHeap) Cluster Size (32GB Nodes) MAXIMUM NUMBER OF SIMULTANEOUS KERNELS Optimized Resource Allocation – Utilize resources on all cluster nodes by running kernels as Spark applications in YARN Cluster Mode. – Pluggable architecture to enable support for additional Resource Managers Enhanced Security – End-to-End secure communications • Secure socket communications • Encrypted HTTP communication using SSL Multiuser support with user impersonation – Enhance security and sandboxing by enabling user impersonation when running kernels (using Kerberos). – Individual HDFS home folder for each notebook user. – Use the same user ID for notebook and batch jobs. Kernel Kernel Kernel Kernel Kernel Kernel Kernel Kernel Kernel
  • 22. Jupyter Enterprise Gateway – YARN © 2018 IBM Corporation 22 YARN Cluster YARN Workers Gateway Node Jupyter Enterprise Gateway • Multitenancy • Remote kernel lifecycle management via process proxies Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Impersonation: Alice’s kernel runs under Alice’s user ID. Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver SecurityLayer nb2kg nb2kg Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Bob Alice
  • 23. Enterprise Gateway & Kubernetes © 2018 IBM Corporation Supported Platforms FfDL Before Enterprise Gateway After Enterprise Gateway Before Jupyter Enterprise Gateway … • Resources required for all kernels needs to be allocated during Notebook Server pod creation • Resources limited to what is physically available on the host node that runs all kernels and associated Spark drivers After Jupyter Enterprise Gateway … • Gateway pod very lightweight • Kernels in their own pod, isolation • Kernel pods built from community images: Spark-on-K8s, TensorFlow, Keras, etc.
  • 24. Jupyter Enterprise Gateway - Kubernetes © 2018 IBM Corporation 24 Container images defined in kernelspec Community image Kernel Spark on K8 Kernel Distributed File System Vanilla Kernels Spark based kernels Gateway nb2kg nb2kg
  • 25. The Secret Sauce: Process Proxies mixed with Kernel Launchers © 2018 IBM Corporation Process Proxy: • Abstracts kernel process represented by Jupyter framework • Pluggable class definition identified in kernelspec (kernel.json) • Manages kernel lifecycle Kernel Launcher: • Embeds target kernel • Listens on gateway communication port • Conveys interrupt requests (via local signal) • Could be extended for additional communications { "language": "python", "display_name": "Spark - Python (Kubernetes Mode)", "process_proxy": { "class_name": "enterprise_gateway.services.processproxies.k8s.KubernetesProcessP roxy", "config": { "image_name": "elyra/kubernetes-kernel-py:dev", "executor_image_name": "elyra/kubernetes-kernel-py:dev”, "port_range" : "40000..42000" } }, "env": { "SPARK_HOME": "/opt/spark", "SPARK_OPTS": "--master k8s://https://${KUBERNETES_SERVICE_HOST --deploy-mode cluster --name …", … }, "argv": [ "/usr/local/share/jupyter/kernels/spark_python_kubernetes/bin/run. sh", "{connection_file}", "--RemoteProcessProxy.response-address", "{response_address}", "--RemoteProcessProxy.spark-context-initialization-mode", "lazy" ] }
  • 26. 26March 30 2018 / © 2018 IBM Corporation