In this session Luciano will explore the different projects that compose the Jupyter ecosystem, including Jupyter Notebooks, JupyterLab, JupyterHub, and Jupyter Enterprise Gateway. Jupyter Notebooks are the current open standard for data science and AI model development, and IBM is dedicated to contributing to their success and adoption. Continuing the trend of building out the Jupyter ecosystem, Luciano will introduce Elyra, a project built to extend JupyterLab with AI-centric capabilities. He'll showcase the extensions that allow you to build notebook pipelines, execute notebooks as batch jobs, navigate and execute Python scripts, and tie neatly into notebook versioning.
2. About me - Luciano Resende
Open Source AI Platform Architect – IBM – CODAIT
• Senior Technical Staff Member at IBM, contributing to open source for over 10 years
• Currently contributing to: the Jupyter Notebook ecosystem, Apache Bahir, Apache Toree, and Apache Spark, among other projects related to AI/ML platforms
lresende@us.ibm.com
https://www.linkedin.com/in/lresende
@lresende1975
https://github.com/lresende
3. IBM Open Source Participation
Open Source @ IBM
Learn
• Program touches 78,000 IBMers annually
Consume
• Virtually all IBM products contain some open source
• 40,363 packages per year
Contribute
• >62K OS certifications per year
• ~10K IBM commits per month
Connect
• >1,000 active IBM contributors working in key OS projects
4. IBM Open Source Participation
IBM-generated open source innovation
• 137 IBM Open Code projects w/ 1000+ GitHub projects
• Projects graduated into full open governance: Node-RED, OpenWhisk, SystemML, Hyperledger Fabric, among others
• developer.ibm.com/code/open/code/
Community
• IBM focused on 18 strategic communities
• Drive open governance in “Centers of Gravity”
• IBM leaders drive key technologies and assure freedom of action
The IBM OS Way is now open sourced
• Training, Recognition, Tooling
• Organization, Consuming, Contributing
5. IBM’s history of strong AI leadership
1968: 2001: A Space Odyssey
• IBM was a technical advisor
• HAL is “the latest in machine intelligence”
1997: Deep Blue
• Deep Blue became the first machine to beat a world chess champion in tournament play
2011: Jeopardy!
• Watson beat two top Jeopardy! champions
2018: Project Debater
2018: Open Tech, AI & emerging standards
• New IBM centers of gravity for AI
• OS projects increasing exponentially
• Emerging global standards in AI
6. Center for Open Source Data and AI Technologies (CODAIT)
CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise.
A relaunch of the Spark Technology Center (STC) to reflect an expanded mission.
http://codait.org
codait (French) = coder/coded
https://m.interglot.com/fr/en/codait
8. Home Automation & Security
- Multiple connected or standalone devices
- Controlled by voice:
  - Amazon Echo (Alexa)
  - Google Home
  - Apple HomePod (Siri)
14. Jupyter Notebooks
Notebooks are interactive computational environments, in which you can combine code execution, rich text, mathematics, plots, and rich media.
15. Jupyter Notebook
Simple, but Powerful
As simple as opening a web page, with the capabilities of a powerful, multilingual development environment.
Interactive widgets
Code can produce rich outputs such as images, videos, markdown, LaTeX, and JavaScript. Interactive widgets can be used to manipulate and visualize data in real time (see the sketch just after this slide).
Language of choice
Jupyter Notebooks support over 50 programming languages, including those popular in Data Science, Data Engineering, and AI, such as Python, R, Julia, and Scala.
Big Data Integration
Leverage Big Data platforms such as Apache Spark from Python, R, and Scala. Explore the same data with pandas, scikit-learn, ggplot2, dplyr, etc.
Share Notebooks
Notebooks can be shared with others using e-mail, Dropbox, Google Drive, GitHub, etc.
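To make the interactive-widget point concrete, here is a minimal sketch of a notebook cell that produces a live plot; it assumes NumPy, matplotlib, and ipywidgets are installed, and the function and parameter names are purely illustrative:

```python
# A notebook cell combining rich output and an interactive widget
# (assumes numpy, matplotlib, and ipywidgets are installed).
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact

def plot_wave(frequency=1.0):
    """Redraw a sine wave each time the slider value changes."""
    x = np.linspace(0, 2 * np.pi, 200)
    plt.plot(x, np.sin(frequency * x))
    plt.title(f"sin({frequency:.1f}x)")
    plt.show()

# interact() renders a slider bound to `frequency` and re-runs the
# function on every change -- real-time data manipulation in the browser.
interact(plot_wave, frequency=(0.5, 5.0, 0.5))
```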
16. Jupyter Notebook (Classic)
Single-page web interface:
- File Browser
- Code Console (Qt Console)
- Text Editor
The Classic Notebook is starting to move towards maintenance mode:
• Community efforts are being concentrated on the new JupyterLab UI.
• The community continues to deliver bug fixes and security updates frequently.
17. JupyterLab
JupyterLab is the next-generation UI for the Jupyter ecosystem. It brings all the previous improvements into a single unified platform, plus more!
Provides a modular, extensible architecture.
Retains backward compatibility with the old notebook we know and love.
20. Elyra
Elyra is a set of AI-centric extensions to JupyterLab. It aims to help data scientists, machine learning engineers, and AI developers through the complexities of the model development life cycle.
Elyra source code at GitHub
https://github.com/elyra-ai/elyra
Elyra Documentation
https://github.com/elyra-ai/elyra/blob/master/README.md
21. Elyra
JupyterLab Extensions
Hybrid runtime support
Simplifies the task of running notebooks interactively on cloud machines, improving productivity by leveraging the power of cloud-based resources.
Notebook Pipelines editor
Elyra provides a visual editor for building Notebook-based AI pipelines, enabling the conversion of multiple notebooks into batch jobs or workflows.
Notebooks as batch jobs
Elyra extends the notebook UI to simplify the submission of notebooks as batch jobs for model training.
Python script execution
Exposes Python scripts as first-class citizens, allowing users to locally edit their scripts and execute them against local or cloud-based resources seamlessly.
Versioning using git
Simplifies tracking changes, enabling better sharing among teammates.
Fork me at: github.com/elyra-ai
23. AI / Deep Learning Workloads
• Resource-intensive workloads
• Require expensive hardware (GPU, TPU)
• Heterogeneous frameworks
• Long-running training jobs (see the sketch after this slide):
  – A simple MNIST model takes over one hour to train WITHOUT a decent GPU
  – Training more complex deep learning models can easily take over a day even WITH GPUs
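For a sense of the scale being described, here is a sketch of the kind of “simple MNIST” training job the timing claim refers to; it assumes TensorFlow is installed, and the model shape and epoch count are illustrative rather than the exact benchmark behind the slide:

```python
# Illustrative "simple MNIST" training job; runtime depends heavily on
# hardware (assumes TensorFlow is installed).
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0  # add channel dim, scale to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# On CPU each epoch can take minutes; a GPU cuts this dramatically.
model.fit(x_train, y_train, epochs=10, batch_size=128)
```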
24. Training/Deploying Models requires a lot of DevOps
• Model Serving
• Monitoring
• Resource Management
• Configuration
• Hyperparameter Optimization
• Reproducibility
25. AI / Deep Learning Workload Challenges
• How to isolate training environments so that multiple jobs, based on different deep learning frameworks (and/or releases), can be submitted and trained at the same time.
• Ability to allocate individual system-level resources, such as GPUs and TPUs, to different kernels for a period of time.
• Ability to allocate and free up system-level resources, such as GPUs and TPUs, as they stop being used or when they are idle for a period of time.
26. AI / Deep Learning Workloads
Containers and the Kubernetes Platform
- Containers simplify the management of complicated and heterogeneous AI/Deep Learning infrastructure, providing the required isolation layer between pods running different Deep Learning frameworks (see the sketch after this slide).
- Containers provide a flexible way to deploy applications and are here to stay.
- Kubernetes enables easy management of containerized applications and resources, with the benefits of elasticity and quality of service.
Source: https://github.com/Langhalsdino/Kubernetes-GPU-Guide
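As an illustration of the isolation and GPU-allocation point, here is a minimal sketch that requests a dedicated GPU for a single training pod through the official Kubernetes Python client; the pod name, container image, and training command are hypothetical:

```python
# Request an isolated GPU for one training pod via the Kubernetes Python
# client (pip install kubernetes); names and image are illustrative.
from kubernetes import client, config

config.load_kube_config()  # use local kubeconfig credentials

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="train-mnist"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="tensorflow/tensorflow:latest-gpu",  # framework of choice
                command=["python", "train.py"],
                # The scheduler places the pod only on a node with a free GPU;
                # the device stays reserved until the pod terminates.
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```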
27. AI Platforms
AI/Deep Learning platforms aim to abstract the DevOps tasks away from the data scientist, providing a consistent way to develop AI models independent of the toolkit/framework being used.
28. Kubeflow
• ML toolkit for Kubernetes
• Open source and community driven
• Supports multiple ML frameworks
• End-to-end workflows that can be shared, scaled, and deployed
29. Kubeflow Pipelines
Kubeflow Pipelines is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers.
• End-to-end orchestration: enabling and simplifying the orchestration of machine learning pipelines.
• Easy experimentation: making it easy for you to try numerous ideas and techniques and manage your various trials/experiments.
• Easy re-use: enabling you to re-use components and pipelines to quickly create end-to-end solutions without having to rebuild each time.
30. Kubeflow Pipelines
Two key concepts: a Pipeline and a Pipeline Component.
A pipeline is a description of a machine learning (ML) workflow, including all of the workflow components and how they work together.
31. Kubeflow Pipelines
A pipeline component is an
implementation of a pipeline task.
A component represents a step in the
workflow.
32. Kubeflow Pipelines
Each pipeline component is a container
that contains a program to perform the
task required for that particular step of
your workflow.
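Putting the last three slides together, here is a minimal sketch of a two-step pipeline using the Kubeflow Pipelines SDK in its v1 `ContainerOp` style (pip install kfp); the images and commands are placeholders, not a real training workload:

```python
# Two containerized components wired into one pipeline (kfp v1 SDK style).
import kfp
from kfp import dsl

@dsl.pipeline(name="train-and-evaluate",
              description="Each step runs as its own container.")
def train_pipeline():
    # A component: one containerized step of the workflow.
    train = dsl.ContainerOp(
        name="train",
        image="python:3.8",
        command=["python", "-c", "print('training...')"],
    )
    evaluate = dsl.ContainerOp(
        name="evaluate",
        image="python:3.8",
        command=["python", "-c", "print('evaluating...')"],
    )
    evaluate.after(train)  # express the workflow ordering

# Compile to a workflow spec that Kubeflow Pipelines can run.
kfp.compiler.Compiler().compile(train_pipeline, "train_pipeline.yaml")
```

The compiled spec can then be uploaded through the Kubeflow Pipelines UI or submitted via the SDK client.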
37. Notebooks as batch jobs
Model training can take hours, if not days. Elyra extends the notebook UI with a new “Submit Notebook” button that simplifies the submission of a single notebook as a batch job.
39. Jupyter Enterprise Gateway
A lightweight, multi-tenant, scalable, and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark or Kubernetes cluster for Enterprise/Cloud use cases.
(Slide shows logos of supported kernels and supported platforms, including IBM Spectrum Conductor.)
Jupyter Enterprise Gateway website
https://Jupyter.org/enterprise_gateway/
Jupyter Enterprise Gateway source code at GitHub
https://github.com/jupyter/enterprise_gateway
Jupyter Enterprise Gateway Documentation
http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
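As a hedged sketch of how a notebook server is typically pointed at a gateway (Notebook 6+; the hostname and port are illustrative), the gateway URL can be set in `jupyter_notebook_config.py`:

```python
# jupyter_notebook_config.py -- route kernel launches to an Enterprise
# Gateway instead of starting them locally (hostname/port are examples).
c = get_config()  # injected by Jupyter when it loads this file
c.GatewayClient.url = "http://my-gateway-host:8888"
```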
40. Jupyter Enterprise Gateway Features
Optimized Resource Allocation
– Utilize resources on all cluster nodes by running kernels as Spark applications in YARN Cluster Mode.
– Pluggable architecture to enable support for additional resource managers.
Enhanced Security
– End-to-end secure communications:
  - Secure socket communications
  - Encrypted HTTP communication using SSL
Multiuser support with user impersonation
– Enhance security and sandboxing by enabling user impersonation when running kernels (using Kerberos).
– Individual HDFS home folder for each notebook user.
– Use the same user ID for notebook and batch jobs.
(Diagram: kernels distributed across cluster nodes. Chart: maximum number of simultaneous kernels (4 GB heap) versus cluster size (32 GB nodes), scaling from 16 kernels on 4 nodes to 64 kernels on 16 nodes.)
41. Enterprise Gateway & Kubernetes
Before Jupyter Enterprise Gateway…
- Resources required for all kernels need to be allocated during Notebook Server pod creation
- Resources are limited to what is physically available on the host node that runs all kernels and associated Spark drivers
After Jupyter Enterprise Gateway…
- The gateway pod is very lightweight
- Kernels run in their own pods, providing isolation
- Kernel pods are built from community images: Spark-on-K8s, TensorFlow, Keras, etc.
45. Python script execution
• Create a new Python script from the workspace launcher
• Navigate through the script via the Table of Contents outline
• Execute the script against local or cloud-based resources, using the “Execute Script” and “Select Environment” controls
47. Git Integration
Elyra provides integrated support for git repositories, simplifying change tracking, allowing rollback to working versions of the code, and enabling backups.
49. Resources
Elyra source code at GitHub
https://github.com/elyra-ai/elyra
Elyra Documentation
https://github.com/elyra-ai/elyra/blob/master/README.md
Jupyter Enterprise Gateway
https://Jupyter.org/enterprise_gateway/
Jupyter Enterprise Gateway source code at GitHub
https://github.com/jupyter/enterprise_gateway
Jupyter Enterprise Gateway Documentation
http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
Jupyter Blog
https://blog.jupyter.org/
STAR US & FORK US ON GITHUB