Scaling Jupyter with
Jupyter Enterprise Gateway
Luciano Resende
Alan Chin
CODAIT - IBM
About me – Alan Chin
Sr. Software Engineer – Build and Infrastructure – CODAIT
• Over 3 years working with Open Source Projects (Apache SystemML, Apache Spark,
Apache Ambari)
• Currently Contributing to the Jupyter Enterprise Gateway Project
akchin@us.ibm.com
https://www.linkedin.com/in/alankchin/
@AlanChin11
https://github.com/akchinSTC
IBM Developer / © 2019 IBM Corporation 2
About me - Luciano Resende
Open Source AI Platform Architect – IBM – CODAIT
• Senior Technical Staff Member at IBM, contributing to open source for over 10 years
• Currently contributing to: Jupyter Notebook ecosystem, Apache Bahir, Apache
Toree, Apache Spark, among other projects related to AI/ML platforms
lresende@us.ibm.com
https://www.linkedin.com/in/lresende
@lresende1975
https://github.com/lresende
IBM Open Source Contributions
Learn
• Open Source @ IBM Program touches 78,000 IBMers annually
Consume
• Virtually all IBM products contain some open source
• 40,363 pkgs per year
Contribute
• >62K OS certs per year
• ~10K IBM commits per month
• 1500+ GitHub repos
Connect
• >1000 active IBM contributors working in key OS projects
IBM Open Source Participation
IBM generated open source innovation
• 137 IBM Open Code projects w/1500+ GitHub projects
• Projects that have graduated into full open governance: Jupyter Enterprise Gateway, Node-RED, OpenWhisk, Apache SystemML, Blockchain Fabric
• https://developer.ibm.com/code/open/code/
Community
• IBM focused on 18 strategic communities
• Drive open governance in “Centers of Gravity”
• IBM Leaders drive key technologies and assure freedom of action
The IBM OS Way is now open sourced
• Training, Recognition, Tooling
• Organization, Consuming, Contributing
Center for Open Source Data and AI Technologies
CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise
Relaunch of the Spark Technology Center (STC) to reflect expanded mission
CODAIT
codait.org
codait (French)
= coder/coded
https://m.interglot.com/fr/en/codait
Jupyter Notebooks
Jupyter Notebooks
Notebooks are interactive computational environments in which you can combine code execution, rich text, mathematics, plots and rich media.
Jupyter Notebook Platform Architecture
• The Notebook UI runs in the browser
• The Notebook Server serves the ‘Notebooks’
• Kernels interpret/execute cell contents
- Are responsible for code execution
- Abstract different languages
- Have a 1:1 relationship with a Notebook
- Run and consume resources as long as the notebook is running
Jupyter Notebook
Interactive Workloads
Analytics Workloads
• Large amount of data
• Shared across organization in Data Lakes
• Multiple workload types
- Data cleansing
- Data Warehouse
- Machine Learning and Insights
AI / Deep Learning Workloads
Resource intensive workloads
- Require expensive hardware (GPU, TPU)
Long running training jobs
- Simple MNIST takes over one hour WITHOUT a decent GPU
- Other non-complex deep learning model training can easily take over a day WITH GPUs
Local Development Environment
Analytic and AI Platforms
Large pool of shared computing resources
- Enterprise Cloud, Public Cloud or Hybrid
- Shared Data (Data Lakes/Object Storage)
Distributed Consumers
- Notebooks running locally (user's laptop) or as a service (e.g. JupyterHub)
Different Resource Utilization Patterns
- High number of idle resources
Jupyter Notebook Stack Limitations
Scalability
- Jupyter Kernels run as local processes
- Resources are limited by what is available on the single node that runs all Kernels and associated Spark drivers
Security
- All kernels run as a single user and share the same privileges
- Users can see and control each other's processes using Jupyter administrative utilities
[Chart: Maximum number of simultaneous kernels. x-axis: cluster size (32 GB nodes); y-axis: max kernels (4 GB heap). Values: 4 nodes: 8; 8 nodes: 8; 12 nodes: 8; 16 nodes: 8]
Jupyter Enterprise Gateway
Jupyter Enterprise Gateway website
https://Jupyter.org/enterprise_gateway/
Jupyter Enterprise Gateway source code at GitHub
https://github.com/jupyter-incubator/enterprise_gateway
Jupyter Enterprise Gateway Documentation
http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
Jupyter Enterprise Gateway
A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark or Kubernetes cluster for Enterprise/Cloud use cases
[Figure: logos of supported kernels and supported platforms, including Spectrum Conductor]
Jupyter Enterprise Gateway Features
Optimized Resource Allocation
• Utilize resources on all cluster nodes by running kernels as Spark applications in YARN Cluster Mode.
• Pluggable architecture to enable support for additional Resource Managers
Enhanced Security
• End-to-End secure communications
- Secure socket communications
- Encrypted HTTP communication using SSL
Multiuser support with user impersonation
• Enhance security and sandboxing by enabling user impersonation when running kernels (using Kerberos).
• Individual HDFS home folder for each notebook user.
• Use the same user ID for notebook and batch jobs.
[Chart: Maximum number of simultaneous kernels. x-axis: cluster size (32 GB nodes); y-axis: max kernels (4 GB heap). Values: 4 nodes: 16; 8 nodes: 32; 12 nodes: 48; 16 nodes: 64]
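The two kernel-count charts can be compared numerically. The sketch below is a back-of-the-envelope calculation, not a benchmark: the kernel counts are read off the charts, and the idea that the flat value of 8 comes from dividing one 32 GB node by a 4 GB heap is our assumption, not something the slides state.

```python
# Kernel counts read from the two "maximum number of simultaneous kernels" charts.
CLUSTER_SIZES = [4, 8, 12, 16]        # nodes, 32 GB each
LOCAL_KERNELS = [8, 8, 8, 8]          # kernels running as local processes on one node
GATEWAY_KERNELS = [16, 32, 48, 64]    # kernels distributed by Enterprise Gateway

NODE_MEMORY_GB = 32
KERNEL_HEAP_GB = 4


def single_node_capacity(node_memory_gb, heap_gb):
    """Assumed upper bound on kernels when every kernel shares one node."""
    return node_memory_gb // heap_gb


def scaling_table():
    """Return (nodes, local, gateway, gain) rows, one per cluster size."""
    rows = []
    for nodes, local, gateway in zip(CLUSTER_SIZES, LOCAL_KERNELS, GATEWAY_KERNELS):
        rows.append((nodes, local, gateway, gateway / local))
    return rows


if __name__ == "__main__":
    print("single-node bound:", single_node_capacity(NODE_MEMORY_GB, KERNEL_HEAP_GB))
    for nodes, local, gateway, gain in scaling_table():
        print(f"{nodes:>2} nodes: local={local:>2} gateway={gateway:>2} gain={gain:.0f}x")
```

The gain grows with the cluster because the local-process count stays flat while the gateway count rises with every added node.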
Jupyter Notebooks and Kubernetes
Deep Learning Workloads
Resource intensive workloads
- Require expensive hardware (GPU, TPU)
Long running training jobs
- Simple MNIST takes over one hour WITHOUT a decent GPU
- Other non-complex deep learning model training can easily take over a day WITH GPUs
Jupyter & Kubernetes
Kubernetes Platform
- Containers provide a flexible way to deploy applications and are here to stay
- Containers simplify management of complicated and heterogeneous AI/Deep Learning infrastructure
- Kubernetes enables easy management of containerized applications and resources with the benefit of Elasticity and Quality of Service
Source: https://github.com/Langhalsdino/Kubernetes-GPU-Guide
Enterprise Gateway & Kubernetes
Before Jupyter Enterprise Gateway …
- Resources required for all kernels need to be allocated during Notebook Server pod creation
- Resources limited to what is physically available on the host node that runs all kernels and associated Spark drivers
After Jupyter Enterprise Gateway …
- Gateway pod is very lightweight
- Kernels run in their own pods, providing isolation
- Kernel pods built from community images: Spark-on-K8s, TensorFlow, Keras, etc.
[Diagram: before and after Enterprise Gateway; supported platforms include FfDL]
Jupyter Enterprise Gateway - Kubernetes
[Diagram labels: users Bob and Alice; Jupyter Enterprise Gateway; container images defined in kernelspec; kernel in a community image; kernel with Spark on Kubernetes]
Jupyter Enterprise Gateway - Kubernetes
JupyterHub will provision custom images containing Notebook + NB2KG extension
[Diagram labels: users Bob and Alice; JupyterLab; Jupyter Notebook; Jupyter Enterprise Gateway; container images defined in kernelspec; kernel in a community image; kernel with Spark on Kubernetes]
Jupyter & Kubernetes
• Multi-user Enterprise Gateway pod
• Each kernel launched on its own pod
• Kernel pod namespace is configurable
Configuration
Jupyter Kernels are configured by kernelspecs
- Each kernel has a corresponding kernelspec
- Stored in one of the Jupyter data path directories
- $ jupyter kernelspec list
/…/anaconda3/share/jupyter/kernels/python2/kernel.json
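As a rough illustration of what `jupyter kernelspec list` reports, the sketch below walks a set of directories and collects every `kernels/<name>/kernel.json` it finds. It is a simplified stand-in, not Jupyter's implementation; the helper name `find_kernelspecs`, the precedence rule, and the example directory are hypothetical.

```python
import json
from pathlib import Path


def find_kernelspecs(data_dirs):
    """Map kernel name -> parsed kernel.json for each <dir>/kernels/<name>/kernel.json.

    Earlier directories win when the same kernel name appears twice,
    which is an assumed precedence rule for this sketch.
    """
    specs = {}
    for data_dir in data_dirs:
        kernels_dir = Path(data_dir) / "kernels"
        if not kernels_dir.is_dir():
            continue  # skip data directories that hold no kernels
        for spec_file in sorted(kernels_dir.glob("*/kernel.json")):
            name = spec_file.parent.name
            if name not in specs:
                with spec_file.open(encoding="utf-8") as handle:
                    specs[name] = json.load(handle)
    return specs


if __name__ == "__main__":
    # Hypothetical data directory; substitute one of your own Jupyter data paths.
    example_dirs = [Path.home() / ".local" / "share" / "jupyter"]
    for name, spec in find_kernelspecs(example_dirs).items():
        print(name, "->", spec.get("display_name"))
```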
Configurations
Process Proxy:
• Abstracts kernel process represented by Jupyter
framework
• Pluggable class definition identified in kernelspec
(kernel.json)
• Manages kernel lifecycle
Kernel Launcher:
• Embeds target kernel
• Listens on gateway communication port
• Conveys interrupt requests (via local signal)
• Could be extended for additional communications
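The "pluggable class definition" bullet can be illustrated with a dynamic import: the kernelspec names a class by its dotted path, and that path is resolved at runtime. The sketch below shows the general technique with `importlib`. It is not Enterprise Gateway's actual loader, and `load_class` is a hypothetical helper. A standard-library class stands in for a process proxy so that the example runs without Enterprise Gateway installed.

```python
import importlib


def load_class(dotted_path):
    """Resolve 'package.module.ClassName' to the class object it names."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


if __name__ == "__main__":
    # Stand-in for a process proxy "class_name" value read from kernel.json.
    proxy_class = load_class("collections.OrderedDict")
    print(proxy_class)
```

Resolving the class by name is what lets a new resource manager be supported by adding a class and pointing a kernelspec at it, without changing the caller.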
{
  "language": "python",
  "display_name": "Spark - Python (Kubernetes Mode)",
  "process_proxy": {
    "class_name": "enterprise_gateway.services.processproxies.k8s.KubernetesProcessProxy",
    "config": {
      "image_name": "elyra/kubernetes-kernel-py:dev",
      "executor_image_name": "elyra/kubernetes-kernel-py:dev",
      "port_range": "40000..42000"
    }
  },
  "env": {
    "SPARK_HOME": "/opt/spark",
    "SPARK_OPTS": "--master k8s://https://${KUBERNETES_SERVICE_HOST} --deploy-mode cluster --name …",
    …
  },
  "argv": [
    "/usr/local/share/jupyter/kernels/spark_scala_yarn_cluster/bin/run.sh",
    "--RemoteProcessProxy.kernel-id",
    "{kernel_id}",
    "--RemoteProcessProxy.response-address",
    "{response_address}",
    "--RemoteProcessProxy.port-range",
    "{port_range}",
    "--RemoteProcessProxy.spark-context-initialization-mode",
    "lazy"
  ]
}
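To make the template placeholders in `argv` concrete, the sketch below fills `{kernel_id}`, `{response_address}`, and `{port_range}` with plain string formatting and splits a `lower..upper` port range into two integers. It illustrates the substitution idea only and is not the gateway's code; the function names and the sample values are hypothetical.

```python
def parse_port_range(port_range):
    """Split a 'lower..upper' string such as '40000..42000' into two ints."""
    lower_text, separator, upper_text = port_range.partition("..")
    if not separator:
        raise ValueError(f"expected 'lower..upper', got {port_range!r}")
    lower, upper = int(lower_text), int(upper_text)
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    return lower, upper


def fill_argv(argv_template, **values):
    """Substitute {name} placeholders in each argv entry; other entries are kept."""
    return [entry.format(**values) for entry in argv_template]


if __name__ == "__main__":
    template = [
        "--RemoteProcessProxy.kernel-id", "{kernel_id}",
        "--RemoteProcessProxy.response-address", "{response_address}",
        "--RemoteProcessProxy.port-range", "{port_range}",
    ]
    # Sample values are made up for the demonstration.
    print(fill_argv(template, kernel_id="1234",
                    response_address="10.0.0.5:8877",
                    port_range="40000..42000"))
    print(parse_port_range("40000..42000"))
```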
Jupyter Enterprise Gateway Components
[Diagram labels: Jupyter Notebook UI; Jupyter Notebook with NB2KG Extension; Programmatic API; Jupyter Enterprise Gateway; Jupyter Kernel Gateway; Remote Kernel Manager; process proxies: Distributed, YARN Cluster, Kubernetes, Conductor Cluster, Docker; supported runtime platforms include Spectrum Conductor and FfDL]
With Notebook 6.0, the NB2KG capabilities have been integrated into the Notebook server.
For programmatic access, we have an experimental Enterprise Gateway client that makes it simple to request a kernel and submit code.
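A programmatic client ultimately talks to the gateway over HTTP. As a minimal sketch, the code below builds (but does not send) a request that asks a gateway to start a kernel through the `POST /api/kernels` endpoint of the Jupyter kernel REST API. It is not the experimental client mentioned above; the gateway URL, the kernel name, and the helper name are hypothetical.

```python
import json
import urllib.request


def build_start_kernel_request(gateway_url, kernel_name):
    """Build a POST /api/kernels request asking the gateway to start a kernel."""
    body = json.dumps({"name": kernel_name}).encode("utf-8")
    return urllib.request.Request(
        url=gateway_url.rstrip("/") + "/api/kernels",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical gateway address and kernelspec name.
    request = build_start_kernel_request("http://localhost:8888", "python_kubernetes")
    print(request.get_method(), request.full_url)
    # Sending it requires a running gateway: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the sketch easy to inspect without a live cluster.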
Summary
Interactive Workloads across Kubernetes Cluster
• Enable support for remote kernels in order to scale Notebooks across the entire cluster
• Multitenant with support for user impersonation leveraging Kerberos
• Base container image becomes a choice (e.g. Python with TensorFlow)
[Diagram labels: Jupyter Enterprise Gateway; supported kernels; supported runtimes]
Other resources
Jupyter Enterprise Gateway
https://Jupyter.org/enterprise_gateway/
Jupyter Enterprise Gateway source code at GitHub
https://github.com/jupyter/enterprise_gateway
Jupyter Enterprise Gateway Documentation
http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
Jupyter Enterprise Gateway Gitter
https://gitter.im/jupyter/enterprise_gateway
Jupyter Blog
https://blog.jupyter.org/
Stable Release - EG 1.2.0
(Support for Analytics Workloads with Spark running in YARN cluster mode)
pip install jupyter_enterprise_gateway
Beta Release - EG 2.0.0 RC1
Introduces support for AI Workloads on Kubernetes
pip install --pre jupyter_enterprise_gateway
Star us & fork us on GitHub
Thank you!
@lresende1975
@AlanChin11

Weitere ähnliche Inhalte

Was ist angesagt?

Hive Bucketing in Apache Spark
Hive Bucketing in Apache SparkHive Bucketing in Apache Spark
Hive Bucketing in Apache SparkTejas Patil
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Apache doris (incubating) introduction
Apache doris (incubating) introductionApache doris (incubating) introduction
Apache doris (incubating) introductionleanderlee2
 
Webinar: PostgreSQL continuous backup and PITR with Barman
Webinar: PostgreSQL continuous backup and PITR with BarmanWebinar: PostgreSQL continuous backup and PITR with Barman
Webinar: PostgreSQL continuous backup and PITR with BarmanGabriele Bartolini
 
Beyond the Brokers: A Tour of the Kafka Ecosystem
Beyond the Brokers: A Tour of the Kafka EcosystemBeyond the Brokers: A Tour of the Kafka Ecosystem
Beyond the Brokers: A Tour of the Kafka Ecosystemconfluent
 
Apache Airflow Architecture
Apache Airflow ArchitectureApache Airflow Architecture
Apache Airflow ArchitectureGerard Toonstra
 
Oracle MAA (Maximum Availability Architecture) 18c - An Overview
Oracle MAA (Maximum Availability Architecture) 18c - An OverviewOracle MAA (Maximum Availability Architecture) 18c - An Overview
Oracle MAA (Maximum Availability Architecture) 18c - An OverviewMarkus Michalewicz
 
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Databricks
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsDatabricks
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceDatabricks
 
Airbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackAirbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackMichel Tricot
 
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveApache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveSachin Aggarwal
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Flink Forward
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Sparkdatamantra
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty
 
Project Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare MetalProject Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare MetalDatabricks
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowJulien Le Dem
 

Was ist angesagt? (20)

Hive Bucketing in Apache Spark
Hive Bucketing in Apache SparkHive Bucketing in Apache Spark
Hive Bucketing in Apache Spark
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Apache doris (incubating) introduction
Apache doris (incubating) introductionApache doris (incubating) introduction
Apache doris (incubating) introduction
 
File Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and ParquetFile Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and Parquet
 
Oracle archi ppt
Oracle archi pptOracle archi ppt
Oracle archi ppt
 
Webinar: PostgreSQL continuous backup and PITR with Barman
Webinar: PostgreSQL continuous backup and PITR with BarmanWebinar: PostgreSQL continuous backup and PITR with Barman
Webinar: PostgreSQL continuous backup and PITR with Barman
 
Beyond the Brokers: A Tour of the Kafka Ecosystem
Beyond the Brokers: A Tour of the Kafka EcosystemBeyond the Brokers: A Tour of the Kafka Ecosystem
Beyond the Brokers: A Tour of the Kafka Ecosystem
 
Apache Airflow Architecture
Apache Airflow ArchitectureApache Airflow Architecture
Apache Airflow Architecture
 
Oracle MAA (Maximum Availability Architecture) 18c - An Overview
Oracle MAA (Maximum Availability Architecture) 18c - An OverviewOracle MAA (Maximum Availability Architecture) 18c - An Overview
Oracle MAA (Maximum Availability Architecture) 18c - An Overview
 
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
 
Airbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackAirbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stack
 
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveApache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
 
Project Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare MetalProject Tungsten: Bringing Spark Closer to Bare Metal
Project Tungsten: Bringing Spark Closer to Bare Metal
 
Apache Kafka
Apache Kafka Apache Kafka
Apache Kafka
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 

Ähnlich wie Strata - Scaling Jupyter with Jupyter Enterprise Gateway

Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsLuciano Resende
 
Ai pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksAi pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksLuciano Resende
 
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Codemotion
 
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017Luciano Resende
 
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkLuciano Resende
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
 
Jupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway OverviewJupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway OverviewLuciano Resende
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayLuciano Resende
 
Jupyter con meetup extended jupyter kernel gateway
Jupyter con meetup   extended jupyter kernel gatewayJupyter con meetup   extended jupyter kernel gateway
Jupyter con meetup extended jupyter kernel gatewayLuciano Resende
 
Using Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsUsing Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsLuciano Resende
 
IBM Connect 2014 - KEY108: IBM Collaboration Solutions Application Developmen...
IBM Connect 2014 - KEY108: IBM Collaboration Solutions Application Developmen...IBM Connect 2014 - KEY108: IBM Collaboration Solutions Application Developmen...
IBM Connect 2014 - KEY108: IBM Collaboration Solutions Application Developmen...IBM Connections Developers
 
Connect 2014 - Key108 - Application Development Strategy
Connect 2014 - Key108  - Application Development StrategyConnect 2014 - Key108  - Application Development Strategy
Connect 2014 - Key108 - Application Development StrategyPhilippe Riand
 
IBM: The Linux Ecosystem
IBM: The Linux EcosystemIBM: The Linux Ecosystem
IBM: The Linux EcosystemKangaroot
 
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and Docker
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and DockerFast Scalable Easy Machine Learning with OpenPOWER, GPUs and Docker
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and DockerIndrajit Poddar
 
Enabling Cognitive Workloads on the Cloud: GPUs with Mesos, Docker and Marath...
Enabling Cognitive Workloads on the Cloud: GPUs with Mesos, Docker and Marath...Enabling Cognitive Workloads on the Cloud: GPUs with Mesos, Docker and Marath...
Enabling Cognitive Workloads on the Cloud: GPUs with Mesos, Docker and Marath...Indrajit Poddar
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Luciano Resende
 
Social Applications made easy with the new Social Business Toolkit SDK
Social Applications made easy with the new Social Business Toolkit SDKSocial Applications made easy with the new Social Business Toolkit SDK
Social Applications made easy with the new Social Business Toolkit SDKIBM Connections Developers
 
Srikanth_PILLI_CV_latest
Srikanth_PILLI_CV_latestSrikanth_PILLI_CV_latest
Srikanth_PILLI_CV_latestSrikanth Pilli
 
Randstad Docker meetup - Serverless
Randstad Docker meetup - ServerlessRandstad Docker meetup - Serverless
Randstad Docker meetup - ServerlessDavid Delabassee
 

Ähnlich wie Strata - Scaling Jupyter with Jupyter Enterprise Gateway (20)

Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloads
 
Ai pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksAi pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooks
 
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
 
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
 
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
Jupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway OverviewJupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway Overview
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel Gateway
 
Jupyter con meetup extended jupyter kernel gateway
Jupyter con meetup   extended jupyter kernel gatewayJupyter con meetup   extended jupyter kernel gateway
Jupyter con meetup extended jupyter kernel gateway
 
Using Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsUsing Elyra for COVID-19 Analytics
Using Elyra for COVID-19 Analytics
 
IBM Connect 2014 - KEY108: IBM Collaboration Solutions Application Developmen...
IBM Connect 2014 - KEY108: IBM Collaboration Solutions Application Developmen...IBM Connect 2014 - KEY108: IBM Collaboration Solutions Application Developmen...
IBM Connect 2014 - KEY108: IBM Collaboration Solutions Application Developmen...
 
Connect 2014 - Key108 - Application Development Strategy
Connect 2014 - Key108  - Application Development StrategyConnect 2014 - Key108  - Application Development Strategy
Connect 2014 - Key108 - Application Development Strategy
 
IBM: The Linux Ecosystem
IBM: The Linux EcosystemIBM: The Linux Ecosystem
IBM: The Linux Ecosystem
 
The Personal Assistant
The Personal AssistantThe Personal Assistant
The Personal Assistant
 
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and Docker
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and DockerFast Scalable Easy Machine Learning with OpenPOWER, GPUs and Docker
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and Docker
 
Enabling Cognitive Workloads on the Cloud: GPUs with Mesos, Docker and Marath...
Enabling Cognitive Workloads on the Cloud: GPUs with Mesos, Docker and Marath...Enabling Cognitive Workloads on the Cloud: GPUs with Mesos, Docker and Marath...
Enabling Cognitive Workloads on the Cloud: GPUs with Mesos, Docker and Marath...
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
 
Social Applications made easy with the new Social Business Toolkit SDK
Social Applications made easy with the new Social Business Toolkit SDKSocial Applications made easy with the new Social Business Toolkit SDK
Social Applications made easy with the new Social Business Toolkit SDK
 
Srikanth_PILLI_CV_latest
Srikanth_PILLI_CV_latestSrikanth_PILLI_CV_latest
Srikanth_PILLI_CV_latest
 
Randstad Docker meetup - Serverless
Randstad Docker meetup - ServerlessRandstad Docker meetup - Serverless
Randstad Docker meetup - Serverless
 

Mehr von Luciano Resende

A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfLuciano Resende
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...Luciano Resende
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Strata - Scaling Jupyter with Jupyter Enterprise Gateway

  • 1. Scaling Jupyter with Jupyter Enterprise Gateway. Luciano Resende and Alan Chin, CODAIT - IBM
  • 2. About me – Alan Chin. Sr. Software Engineer – Build and Infrastructure – CODAIT
    - Over 3 years working with open source projects (Apache SystemML, Apache Spark, Apache Ambari)
    - Currently contributing to the Jupyter Enterprise Gateway project
    - akchin@us.ibm.com, https://www.linkedin.com/in/alankchin/, @AlanChin11, https://github.com/akchinSTC
    IBM Developer / © 2019 IBM Corporation
  • 3. About me - Luciano Resende. Open Source AI Platform Architect – IBM – CODAIT
    - Senior Technical Staff Member at IBM, contributing to open source for over 10 years
    - Currently contributing to the Jupyter Notebook ecosystem, Apache Bahir, Apache Toree and Apache Spark, among other projects related to AI/ML platforms
    - lresende@us.ibm.com, https://www.linkedin.com/in/lresende, @lresende1975, https://github.com/lresende
  • 4. IBM Open Source Contributions
    - Learn: the Open Source @ IBM program touches 78,000 IBMers annually
    - Consume: virtually all IBM products contain some open source; 40,363 packages per year
    - Contribute: >62K OS certs per year; ~10K IBM commits per month; 1500+ GitHub repos
    - Connect: >1000 active IBM contributors working in key OS projects
  • 5. IBM Open Source Participation
    - IBM-generated open source innovation: 137 IBM Open Code projects with 1500+ GitHub projects; projects that have graduated into full open governance: Jupyter Enterprise Gateway, Node-RED, OpenWhisk, Apache SystemML, Blockchain Fabric; https://developer.ibm.com/code/open/code/
    - Community: IBM is focused on 18 strategic communities; drives open governance in “Centers of Gravity”; IBM leaders drive key technologies and assure freedom of action
    - The IBM OS Way is now open sourced: training, recognition, tooling; organization, consuming, contributing
  • 6. Center for Open Source Data and AI Technologies (CODAIT), codait.org
    - CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise
    - Relaunch of the Spark Technology Center (STC) to reflect an expanded mission
    - codait (French) = coder/coded: https://m.interglot.com/fr/en/codait
  • 7. Jupyter Notebooks
  • 8. Jupyter Notebooks: notebooks are interactive computational environments in which you can combine code execution, rich text, mathematics, plots and rich media.
  • 9. Jupyter Notebook Platform Architecture
    - The Notebook UI runs in the browser
    - The Notebook Server serves the notebooks
    - Kernels interpret and execute cell contents: they are responsible for code execution, abstract different languages, have a 1:1 relationship with a notebook, and run and consume resources for as long as the notebook is running
  • 10. Jupyter Notebook Interactive Workloads
  • 11. Analytics Workloads
    - Large amounts of data
    - Shared across the organization in data lakes
    - Multiple workload types: data cleansing, data warehouse, machine learning and insights
  • 12. AI / Deep Learning Workloads
    - Resource-intensive workloads
    - Require expensive hardware (GPU, TPU)
    - Long-running training jobs: simple MNIST takes over one hour without a decent GPU; other non-complex deep learning model training can easily take over a day with GPUs
  • 13. Local Development Environment
  • 14. Analytic and AI Platforms
    - Large pool of shared computing resources: enterprise cloud, public cloud or hybrid; shared data (data lakes/object storage)
    - Distributed consumers: notebooks running locally (user's laptop) or as a service (e.g. JupyterHub)
    - Different resource utilization patterns: high number of idle resources
  • 15. Jupyter Notebook Stack Limitations
    - Scalability: Jupyter kernels run as local processes; resources are limited to what is available on the single node that runs all kernels and associated Spark drivers
    - Security: all kernels run as a single user sharing the same privileges; users can see and control each other's processes using Jupyter administrative utilities
    - [Chart: maximum number of simultaneous kernels (4 GB heap) by cluster size (32 GB nodes): 8 kernels at 4, 8, 12 and 16 nodes]
  • 16. Jupyter Enterprise Gateway
  • 17. Jupyter Enterprise Gateway: a lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark or Kubernetes cluster for Enterprise/Cloud use cases
    - Website: https://Jupyter.org/enterprise_gateway/
    - Source code at GitHub: https://github.com/jupyter-incubator/enterprise_gateway
    - Documentation: http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
    - [Diagram labels: supported kernels; supported platforms, including Spectrum Conductor]
  • 18. Jupyter Enterprise Gateway Features
    - Optimized resource allocation: utilize resources on all cluster nodes by running kernels as Spark applications in YARN cluster mode; pluggable architecture to enable support for additional resource managers
    - Enhanced security: end-to-end secure communications (secure socket communications, encrypted HTTP communication using SSL)
    - Multiuser support with user impersonation: enhance security and sandboxing by enabling user impersonation when running kernels (using Kerberos); individual HDFS home folder for each notebook user; use the same user ID for notebook and batch jobs
    - [Chart: maximum number of simultaneous kernels (4 GB heap) by cluster size (32 GB nodes): 16 kernels at 4 nodes, 32 at 8 nodes, 48 at 12 nodes, 64 at 16 nodes]
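The two capacity charts (slide 15 without Enterprise Gateway, slide 18 with it) can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the function name is made up, and the per-node figure of 4 kernels is inferred from the chart values (16 kernels on 4 nodes), not stated in the slides.

```python
# Figures read off the two charts in the slides. The per-node value is
# inferred from them (16 kernels / 4 nodes), not stated by the authors.
KERNELS_ON_SINGLE_NODE = 8    # slide 15: flat at 8 for every cluster size
KERNELS_PER_CLUSTER_NODE = 4  # slide 18: 16, 32, 48, 64 for 4 to 16 nodes


def max_kernels(nodes, distributed):
    """Return the maximum number of simultaneous kernels for a cluster size."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    if not distributed:
        # All kernels share one node, so adding nodes does not help.
        return KERNELS_ON_SINGLE_NODE
    # Kernels are spread across the cluster, so capacity grows linearly.
    return KERNELS_PER_CLUSTER_NODE * nodes


for nodes in (4, 8, 12, 16):
    print(nodes, "nodes:", max_kernels(nodes, False), "local vs",
          max_kernels(nodes, True), "distributed")
```

The point of the comparison is the shape of the curves: local kernels stay flat as the cluster grows, while kernels distributed across the cluster scale with the node count.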
  • 19. Jupyter Notebooks and Kubernetes
  • 20. Deep Learning Workloads
    - Resource-intensive workloads
    - Require expensive hardware (GPU, TPU)
    - Long-running training jobs: simple MNIST takes over one hour without a decent GPU; other non-complex deep learning model training can easily take over a day with GPUs
  • 21. Jupyter & Kubernetes: Kubernetes Platform
    - Containers provide a flexible way to deploy applications and are here to stay
    - Containers simplify management of complicated and heterogeneous AI/Deep Learning infrastructure
    - Kubernetes enables easy management of containerized applications and resources, with the benefits of elasticity and quality of service
    - Source: https://github.com/Langhalsdino/Kubernetes-GPU-Guide
  • 22. Enterprise Gateway & Kubernetes
    - Before Jupyter Enterprise Gateway: resources required for all kernels need to be allocated during Notebook Server pod creation; resources are limited to what is physically available on the host node that runs all kernels and associated Spark drivers
    - After Jupyter Enterprise Gateway: the gateway pod is very lightweight; kernels run in their own pods, which isolates them; kernel pods are built from community images: Spark-on-K8s, TensorFlow, Keras, etc.
    - [Diagram labels: supported platforms, including FfDL; before Enterprise Gateway; after Enterprise Gateway]
  • 23. Jupyter Enterprise Gateway - Kubernetes
    - [Diagram labels: Bob, Alice; Jupyter Enterprise Gateway; container images defined in kernelspec; community image kernel; Spark on Kubernetes kernel]
  • 24. Jupyter Enterprise Gateway - Kubernetes
    - JupyterHub will provision custom images containing Notebook + NB2KG extension
    - [Diagram labels: Bob, Alice; JupyterLab, Jupyter Notebook; Jupyter Enterprise Gateway; container images defined in kernelspec; community image kernel; Spark on Kubernetes kernel]
  • 26. Jupyter & Kubernetes
    - Multi-user Enterprise Gateway pod
    - Each kernel is launched in its own pod
    - Kernel pod namespace is configurable
  • 27. Configuration: Jupyter kernels are configured by kernelspecs
    - Each kernel has a corresponding kernelspec
    - Kernelspecs are stored in one of the Jupyter data path directories
    - $ jupyter kernelspec list
    - Example location: /…/anaconda3/share/jupyter/kernels/python2/kernel.json
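To make the kernelspec layout on slide 27 concrete, here is a minimal standard-library sketch that scans a kernels directory for kernel.json files, one per kernel subdirectory. The helper name `find_kernelspecs` and the throwaway directory are illustrative assumptions; this is not how Jupyter itself discovers kernelspecs, for which `jupyter kernelspec list` remains the supported tool.

```python
import json
import tempfile
from pathlib import Path


def find_kernelspecs(kernels_dir):
    """Map kernel name -> parsed kernel.json for each subdirectory."""
    specs = {}
    for spec_file in sorted(Path(kernels_dir).glob("*/kernel.json")):
        with spec_file.open(encoding="utf-8") as handle:
            specs[spec_file.parent.name] = json.load(handle)
    return specs


# Build a throwaway kernels directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    spec_dir = Path(root) / "python3"
    spec_dir.mkdir()
    example_spec = {
        "display_name": "Python 3",
        "language": "python",
        "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    }
    (spec_dir / "kernel.json").write_text(json.dumps(example_spec), encoding="utf-8")
    specs = find_kernelspecs(root)

for name, spec in specs.items():
    print(name, "->", spec["display_name"])
```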
  • 28. Configurations
    Process Proxy:
    - Abstracts the kernel process represented by the Jupyter framework
    - Pluggable class definition identified in the kernelspec (kernel.json)
    - Manages the kernel lifecycle
    Kernel Launcher:
    - Embeds the target kernel
    - Listens on the gateway communication port
    - Conveys interrupt requests (via local signal)
    - Could be extended for additional communications
    Example kernelspec (kernel.json):
        {
          "language": "python",
          "display_name": "Spark - Python (Kubernetes Mode)",
          "process_proxy": {
            "class_name": "enterprise_gateway.services.processproxies.k8s.KubernetesProcessProxy",
            "config": {
              "image_name": "elyra/kubernetes-kernel-py:dev",
              "executor_image_name": "elyra/kubernetes-kernel-py:dev",
              "port_range": "40000..42000"
            }
          },
          "env": {
            "SPARK_HOME": "/opt/spark",
            "SPARK_OPTS": "--master k8s://https://${KUBERNETES_SERVICE_HOST} --deploy-mode cluster --name …",
            …
          },
          "argv": [
            "/usr/local/share/jupyter/kernels/spark_scala_yarn_cluster/bin/run.sh",
            "--RemoteProcessProxy.kernel-id", "{kernel_id}",
            "--RemoteProcessProxy.response-address", "{response_address}",
            "--RemoteProcessProxy.port-range", "{port_range}",
            "--RemoteProcessProxy.spark-context-initialization-mode", "lazy"
          ]
        }
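As a rough illustration of how the `process_proxy` stanza on slide 28 can be read, the sketch below pulls out the proxy class and its config and splits the `port_range` string. The helper names are hypothetical, the kernelspec is a trimmed copy of the slide's example, and this is not Enterprise Gateway's own loading code.

```python
import json

# Trimmed copy of the kernelspec shown on the slide.
KERNELSPEC = json.loads("""
{
  "language": "python",
  "display_name": "Spark - Python (Kubernetes Mode)",
  "process_proxy": {
    "class_name": "enterprise_gateway.services.processproxies.k8s.KubernetesProcessProxy",
    "config": {
      "image_name": "elyra/kubernetes-kernel-py:dev",
      "port_range": "40000..42000"
    }
  }
}
""")


def parse_port_range(value):
    """Split a 'low..high' string into a pair of integers."""
    low, high = (int(part) for part in value.split(".."))
    if low > high:
        raise ValueError(f"invalid port range: {value!r}")
    return low, high


def describe_process_proxy(spec):
    """Return (module name, class name, config) for the spec's process proxy."""
    proxy = spec["process_proxy"]
    module_name, _, class_name = proxy["class_name"].rpartition(".")
    return module_name, class_name, proxy.get("config", {})


module_name, class_name, config = describe_process_proxy(KERNELSPEC)
low, high = parse_port_range(config["port_range"])
print(class_name, "from", module_name)
print("image:", config["image_name"], "ports:", low, "to", high)
```

Because the class is named by a dotted path in the kernelspec, switching a kernel to another resource manager is a configuration change, which is what the slide means by a pluggable class definition.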
  • 29. Jupyter Enterprise Gateway Components
    - [Diagram labels: Jupyter Notebook UI, NB2KG extension, programmatic API, Jupyter Enterprise Gateway, Jupyter Kernel Gateway, Remote Kernel Manager, process proxies (Distributed, YARN Cluster, Kubernetes, Conductor Cluster, Docker); supported runtime platforms include Spectrum Conductor and FfDL]
    - With Notebook 6.0, the NB2KG capabilities have been integrated into the Notebook server.
    - For programmatic access, there is an experimental Enterprise Gateway client that makes it simple to request a kernel and submit code.
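For a sense of what requesting a kernel programmatically might look like, the sketch below builds, without sending, an HTTP request to start a kernel. It is not the experimental client mentioned on slide 29. It assumes the gateway exposes the Kernel Gateway-style `POST /api/kernels` endpoint and reads the target user from a `KERNEL_USERNAME` environment entry; the gateway address and kernelspec name are hypothetical.

```python
import json
from urllib.request import Request


def build_start_kernel_request(gateway_url, kernel_name, username):
    """Build (but do not send) a request asking the gateway to start a kernel."""
    payload = {
        "name": kernel_name,
        # Assumption: the gateway reads the target user from this variable.
        "env": {"KERNEL_USERNAME": username},
    }
    return Request(
        url=gateway_url.rstrip("/") + "/api/kernels",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_start_kernel_request(
    "http://gateway.example.com:8888",  # hypothetical gateway address
    "spark_python_kubernetes",          # hypothetical kernelspec name
    "alice",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request is left out so the sketch runs without a gateway; against a live deployment it could be passed to `urllib.request.urlopen`.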
  • 30. Summary
  • 31. Interactive Workloads across Kubernetes Cluster
    - Enables support for remote kernels in order to scale notebooks across the entire cluster
    - Multitenant, with support for user impersonation leveraging Kerberos
    - Base container image becomes a choice (e.g. Python with TensorFlow)
    - [Diagram labels: Jupyter Enterprise Gateway; supported kernels; supported runtimes]
  • 32. Other resources
    - Jupyter Enterprise Gateway: https://Jupyter.org/enterprise_gateway/
    - Source code at GitHub: https://github.com/jupyter/enterprise_gateway
    - Documentation: http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
    - Gitter: https://gitter.im/jupyter/enterprise_gateway
    - Jupyter Blog: https://blog.jupyter.org/
    - Stable release, EG 1.2.0 (analytics workloads with Spark running in YARN cluster mode): pip install jupyter_enterprise_gateway
    - Beta release, EG 2.0.0 RC1 (introduces support for AI workloads on Kubernetes): pip install --pre jupyter_enterprise_gateway
    - Star us and fork us on GitHub