Elyra - Extending
JupyterLab for AI
Using Elyra for
COVID-19 Analytics
—
Luciano Resende
IBM - CODAIT
About me - Luciano Resende
Open Source AI Platform Architect – IBM – CODAIT
• Senior Technical Staff Member at IBM, contributing to open source for over 10 years
• Currently contributing to: Jupyter Notebook ecosystem, Apache Bahir, Apache
Toree, and Apache Spark, among other projects related to AI/ML platforms
lresende@us.ibm.com
https://www.linkedin.com/in/lresende
@lresende1975
https://github.com/lresende
IBM Open Source
Participation
Learn
Open Source @ IBM
Program touches
78,000
IBMers annually
Consume
Virtually all
IBM products
contain some
open source
• 40,363 pkgs
Per Year
Contribute
• >62K OS Certs
per year
• ~10K IBM
commits per
month
Connect
> 1000
active IBM
Contributors
Working in key OS
projects
IBM Open Source
Participation
IBM generated open source innovation
• 137 IBM Open Code projects with 1000+ GitHub projects
• Projects graduate into full open governance: Node-RED,
OpenWhisk, SystemML, and Blockchain Fabric, among others
• developer.ibm.com/code/open/code/
Community
• IBM focused on 18 strategic communities
• Drive open governance in “Centers of Gravity”
• IBM Leaders drive key technologies and assure freedom of
action
The IBM OS Way is now open sourced
• Training, Recognition, Tooling
• Organization, Consuming, Contributing
Center for Open Source
Data and AI
Technologies
CODAIT aims to make AI solutions
dramatically easier to create, deploy,
and manage in the enterprise
Relaunch of the Spark Technology
Center (STC) to reflect expanded
mission
CODAIT
http://codait.org
codait (French)
= coder/coded
https://m.interglot.com/fr/en/codait
Agenda
- Introduction to the
COVID-19 scenario
- Introduction to Elyra
- Walk through the COVID-19
analytics scenario,
exploring the Elyra
features
https://fivethirtyeight.com/features/a-comic-strip-tour-of-the-wild-world-of-pandemic-modeling
IBM Developer / CODAIT / © 2020 IBM Corporation
Leveraging analytics to navigate unprecedented times - Getty
Frederick Reiss
IBM CODAIT
Romeo Kienzler
IBM CODAIT
Challenges implementing
COVID-notebooks
- How to break apart tasks that are
expensive to run
- Data preparation
- Once data is prepared, analytics can run
multiple times on that same data
- Data is updated frequently
- When data is updated, how to ensure all
tasks are executed in the right order
- How to collaborate and share my
artifacts
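The ordering challenge above can be sketched in plain Python. This is only an illustration, not how Elyra itself schedules work: the notebook names are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for a real pipeline runtime.

```python
from graphlib import TopologicalSorter

# Hypothetical notebooks; each key maps to the notebooks it depends on.
dependencies = {
    "prepare_data.ipynb": set(),
    "analyze_counties.ipynb": {"prepare_data.ipynb"},
    "fit_curves.ipynb": {"prepare_data.ipynb"},
    "build_report.ipynb": {"analyze_counties.ipynb", "fit_curves.ipynb"},
}


def execution_order(deps):
    """Return one valid run order in which every notebook follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Data preparation always comes first and the report always comes last; the two analytics notebooks in the middle do not depend on each other, so they could run in either order or in parallel.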
But what is Elyra?
Elyra
Elyra is a set of
AI-centric extensions for
JupyterLab
Elyra was officially
announced as an open
source project by IBM on
April 29th.
The name Elyra is a play on
“Elara”, one of Jupiter's
moons, where we
introduce the “y” from
“Jupyter” to make it “Elyra”
JupyterLab
JupyterLab is the next-generation
UI for the Jupyter ecosystem.
Brings all the previous
improvements into a single unified
platform, plus more!
Provides a modular, extensible
architecture
Retains backward compatibility
with the old notebook we know
and love
JupyterLab
File Explorer
Widgets / Rich Output
Tabbed
Workspaces
Text Editor
Console/Terminal
Elyra at GitHub
https://github.com/elyra-ai/elyra
Elyra Documentation
https://elyra.readthedocs.io/en/latest/
Elyra
Elyra is a set of AI-centric
extensions to JupyterLab. It
aims to help data scientists,
machine learning engineers,
and AI developers through the
complexities of the model
development life cycle.
ELYRA
Hybrid Runtime Support
Notebook Pipelines
JupyterLab Extensions
Hybrid runtime support
It simplifies the task of running the notebooks
interactively on cloud machines, improving productivity by
leveraging the power of cloud-based resources
Versioning using git
Simplify tracking changes, enabling better sharing
among teammates
Notebook Pipelines editor
Elyra provides a visual editor for building notebook-based
AI pipelines, enabling the conversion of
multiple notebooks into batch jobs or workflows.
Notebook as batch jobs
Elyra extends the notebook UI to simplify the
submission of notebooks as a batch job for model
training
Python script execution
Exposes Python Scripts as first-class citizens allowing
users to locally edit their scripts and execute them
against local or cloud-based resources seamlessly.
Fork me at: github.com/elyra-ai
Resources
Elyra source code at GitHub
https://github.com/elyra-ai/elyra
Elyra Documentation
https://elyra.readthedocs.io/en/latest/
Elyra announcement and demo video
https://www.youtube.com/watch?v=PuGNijkV5PQ
COVID-19 analytics scenario using Elyra by Fred Reiss
https://www.youtube.com/watch?v=CbcgyzB8c4M&t
Star us and fork us on GitHub
Backup
Slides
Pipelines
AI / Deep Learning
Workloads
Resource intensive workloads
Requires expensive hardware (GPU, TPU)
Heterogeneous frameworks
Long Running training jobs
– Simple MNIST takes over one hour
WITHOUT a decent GPU
– Training other non-complex deep learning
models can easily take over a day WITH
GPUs
Training/deploying models requires a lot of DevOps
Model
Serving
Monitoring
Resource
Management
Configuration
Hyperparameter
Optimization
Reproducibility
AI / Deep Learning
Workloads Challenges
• How to isolate training environments so that multiple jobs,
based on different deep learning frameworks (and/or
releases), can be submitted and trained at the same time.
• Ability to allocate individual system-level resources such as
GPUs, TPUs, etc. to different kernels for a period of time.
• Ability to allocate and free up system-level resources such as
GPUs, TPUs, etc. as they stop being used or when they are idle
for a period of time.
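The last point, freeing resources that have been idle for a period of time, can be sketched as a simple timeout check. This is a minimal illustration with hypothetical names (`KernelInfo`, `kernels_to_cull`); a real platform would read activity timestamps from its kernel manager rather than from a hard-coded list.

```python
from dataclasses import dataclass


@dataclass
class KernelInfo:
    kernel_id: str
    last_activity: float  # seconds since epoch of the last execution
    gpus: int             # GPUs currently held by this kernel


def kernels_to_cull(kernels, now, idle_timeout_s):
    """Return the kernels that have been idle for longer than idle_timeout_s."""
    return [k for k in kernels if now - k.last_activity > idle_timeout_s]


def gpus_freed(kernels):
    """Count the GPUs that shutting down the given kernels would release."""
    return sum(k.gpus for k in kernels)


kernels = [
    KernelInfo("k1", last_activity=1000.0, gpus=1),
    KernelInfo("k2", last_activity=4000.0, gpus=2),
    KernelInfo("k3", last_activity=500.0, gpus=0),
]
idle = kernels_to_cull(kernels, now=5000.0, idle_timeout_s=3600.0)
print([k.kernel_id for k in idle], gpus_freed(idle))
```

With a one-hour timeout, `k1` and `k3` are selected for shutdown, while `k2` keeps its two GPUs because it was active 1000 seconds ago.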
AI / Deep Learning
Workloads
Source: https://github.com/Langhalsdino/Kubernetes-GPU-Guide
Containers and Kubernetes Platform
- Containers simplify management of
complicated and heterogeneous AI/Deep
Learning infrastructure, providing the required
isolation layer between pods running
different deep learning frameworks
- Containers provide a flexible way to deploy
applications and are here to stay
- Kubernetes enables easy management of
containerized applications and resources
with the benefit of elasticity and quality of
service
AI Platforms
AI/Deep Learning platforms aim to
abstract the DevOps tasks away from the
data scientist, providing a consistent
way to develop AI models independent
of the toolkit/framework being used.
Kubeflow
• ML Toolkit for Kubernetes
• Open source and community driven
• Supports multiple ML frameworks
• End-to-end workflows that can be
shared, scaled and deployed
Kubeflow Pipelines
Kubeflow Pipelines is a platform for
building and deploying portable,
scalable machine learning (ML)
workflows based on Docker containers.
• End-to-end orchestration: enabling and simplifying the
orchestration of machine learning pipelines.
• Easy experimentation: making it easy for you to try
numerous ideas and techniques and manage your
various trials/experiments.
• Easy re-use: enabling you to re-use components and
pipelines to quickly create end-to-end solutions
without having to rebuild each time.
Kubeflow Pipelines
Two key takeaways: a pipeline and a
pipeline component
A pipeline is a description of a machine
learning (ML) workflow, including all of
the workflow components and how they
work together.
Kubeflow Pipelines
A pipeline component is an
implementation of a pipeline task.
A component represents a step in the
workflow.
Kubeflow Pipelines
Each pipeline component is a container
that contains a program to perform the
task required for that particular step of
your workflow.
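To make the two terms concrete, here is a small plain-Python model of a pipeline and its components. It is a sketch for illustration only and deliberately does not use the Kubeflow Pipelines SDK; the class names, image names, and commands are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    name: str
    image: str         # container image that holds the program for this step
    command: tuple     # program run inside the container
    after: tuple = ()  # names of the steps that must finish first


@dataclass
class Pipeline:
    name: str
    components: list = field(default_factory=list)

    def add(self, component):
        """Append a step, rejecting dependencies on steps not yet defined."""
        known = {c.name for c in self.components}
        missing = [dep for dep in component.after if dep not in known]
        if missing:
            raise ValueError(f"{component.name} depends on unknown steps: {missing}")
        self.components.append(component)
        return self


pipeline = (
    Pipeline("covid-analytics")
    .add(Component("prepare", "example.org/prepare:1.0", ("python", "prepare.py")))
    .add(Component("train", "example.org/train:1.0", ("python", "train.py"), after=("prepare",)))
    .add(Component("validate", "example.org/validate:1.0", ("python", "validate.py"), after=("train",)))
)
print([c.name for c in pipeline.components])
```

Each `Component` pairs a container image with the program to run in it, and the `Pipeline` records how the steps relate to one another, which mirrors the two definitions given on the slides.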
[Diagram: a workflow decomposed into steps and then scheduled/run. Source A (batch), Source B (stream), and Source C (fetch) each have a Data Ingestion/Preparation step; the other steps shown are Features (snapshot), Model Training, Model Validation, Model Testing, and Model Deployment.]
Kubeflow Pipelines
Notebooks
as batch jobs
Notebook as batch jobs
Model training can take
hours, if not days.
Elyra extends the
Notebook UI with a new
“submit notebook”
button that simplifies the
submission of a single
notebook as a batch job.
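Outside the UI, the general idea of running a notebook non-interactively can be sketched with the `jupyter nbconvert` command line. This is not how Elyra's submit button is implemented; it only shows a batch-style run. The helper name, notebook name, and output directory are hypothetical.

```python
import shlex
from pathlib import Path


def batch_command(notebook, output_dir="runs"):
    """Build an nbconvert command that executes a notebook and saves the executed copy."""
    src = Path(notebook)
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", str(src),
        "--output", f"{src.stem}-executed.ipynb",
        "--output-dir", output_dir,
    ]


cmd = batch_command("train_model.ipynb")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would start the run on a machine with Jupyter installed. The executed copy keeps cell outputs, so results can be inspected after a long training job finishes.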
Submit Notebook
Hybrid Runtime
Support
Jupyter Enterprise Gateway website
https://Jupyter.org/enterprise_gateway/
Jupyter Enterprise Gateway source code at GitHub
https://github.com/jupyter/enterprise_gateway
Jupyter Enterprise Gateway Documentation
http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
Supported Kernels
Supported Platforms
Jupyter Enterprise Gateway
Spectrum Conductor
A lightweight, multi-tenant,
scalable and secure gateway
that enables Jupyter
Notebooks to share resources
across an Apache Spark or
Kubernetes cluster for
Enterprise/Cloud use cases
Jupyter Enterprise
Gateway Features
Optimized Resource Allocation
– Utilize resources on all cluster nodes by running kernels
as Spark applications in YARN Cluster Mode.
– Pluggable architecture to enable support for additional
Resource Managers
Enhanced Security
– End-to-End secure communications
- Secure socket communications
- Encrypted HTTP communication using SSL
Multiuser support with user
impersonation
– Enhance security and sandboxing by enabling user
impersonation when running kernels (using Kerberos).
– Individual HDFS home folder for each notebook user.
– Use the same user ID for notebook and batch jobs.
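As a rough sketch of how a client might ask the gateway to start a kernel on behalf of a specific user, the snippet below builds a kernel-start request body. It assumes the gateway accepts a `KERNEL_USERNAME` entry in the request's `env` and that impersonation is enabled on the server; the kernelspec name and username are placeholders, and the Enterprise Gateway documentation should be checked before relying on any of this.

```python
import json


def kernel_start_request(kernelspec, username):
    """Build the JSON body for a kernel-start request made on behalf of username."""
    return {
        "name": kernelspec,                    # kernelspec to launch (placeholder name below)
        "env": {"KERNEL_USERNAME": username},  # assumed variable naming the user to impersonate
    }


body = kernel_start_request("spark_python_yarn_cluster", "alice")
print(json.dumps(body, indent=2))
```

A client would POST this body to the gateway's kernels endpoint. The snippet only constructs the payload, so it runs without a gateway.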
[Figure: Maximum number of simultaneous kernels (4 GB heap each) by cluster size (32 GB nodes), for 4, 8, 12, and 16 nodes; data labels 16, 32, 48, 64]
Enterprise Gateway
& Kubernetes
Supported Platforms
Before Jupyter Enterprise Gateway …
- Resources required for all kernels need to
be allocated during Notebook Server pod
creation
- Resources limited to what is physically
available on the host node that runs all
kernels and associated Spark drivers
After Jupyter Enterprise Gateway …
- Gateway pod very lightweight
- Kernels in their own pod, isolation
- Kernel pods built from community images:
Spark-on-K8s, TensorFlow, Keras, etc.
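Since each kernel runs in its own pod, the image for that pod has to come from somewhere; with Enterprise Gateway it is declared in the kernelspec. The sketch below models a kernelspec as a Python dictionary. The field layout (`metadata` → `process_proxy` → `config` → `image_name`) reflects my understanding of Enterprise Gateway's Kubernetes kernelspecs and should be treated as an assumption; the image name is a placeholder.

```python
import json

# Assumed kernelspec layout; verify the field names against the Enterprise Gateway docs.
kernelspec = {
    "language": "python",
    "display_name": "Python on Kubernetes",
    "metadata": {
        "process_proxy": {
            "config": {
                "image_name": "example.org/kernel-py:1.0",  # placeholder image
            }
        }
    },
}


def kernel_image(spec):
    """Return the container image declared for the kernel pod, or None if absent."""
    config = spec.get("metadata", {}).get("process_proxy", {}).get("config", {})
    return config.get("image_name")


print(kernel_image(kernelspec))
print(json.dumps(kernelspec, indent=2))
```

Keeping the image in the kernelspec means a new framework image can be offered by adding a kernelspec, without rebuilding the gateway pod.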
[Diagram: before and after Enterprise Gateway. Bob and Alice are shown in both panels; with Jupyter Enterprise Gateway, their kernels run from a community image and as Spark on Kubernetes. Container images are defined in the kernelspec.]
Jupyter Enterprise
Gateway - Kubernetes
[Diagram: Bob and Alice use JupyterLab and Jupyter Notebook, which connect through Jupyter Enterprise Gateway to kernels running from a community image and as Spark on Kubernetes. Container images are defined in the kernelspec.]
JupyterHub will provision custom images containing Notebook + NB2KG extension
Python script
execution
Python Script
execution
• Create new Python
script from the
workspace launcher
• Navigate through the
script via the Table of
Contents outline
• Execute the script
against local or cloud-
based resources
Execute Script Select Environment
Git
Integration
Git Integration
Elyra provides integrated
support for git
repositories, simplifying
change tracking and
allowing rollback to
working versions of the
code, as well as backups
Resources
Elyra source code at GitHub
https://github.com/elyra-ai/elyra
Elyra Documentation
https://elyra.readthedocs.io/en/latest/
Elyra announcement and demo video
https://developer.ibm.com/technologies/artificial-intelligence/blogs/open-source-elyra-ai-toolkit-simplifies-data-model-development/
COVID-19 analytics scenario using Elyra by Fred Reiss
https://www.youtube.com/watch?v=CbcgyzB8c4M&t
Thank you!

Using Elyra for COVID-19 Analytics

  • 1. Elyra - Extending JupyterLab for AI Using Elyra for COVID-19 Analytics — Luciano Resende IBM - CODAIT
  • 2. About me - Luciano Resende Open Source AI Platform Architect – IBM – CODAIT • Senior Technical Staff Member at IBM, contributing to open source for over 10 years • Currently contributing to : Jupyter Notebook ecosystem, Apache Bahir, Apache Toree, Apache Spark among other projects related to AI/ML platforms lresende@us.ibm.com https://www.linkedin.com/in/lresende @lresende1975 https://github.com/lresende
  • 3. IBM Open Source Participation Learn Open Source @ IBM Program touches 78,000 IBMers annually Consume Virtually all IBM products contain some open source • 40,363 pkgs Per Year Contribute • >62K OS Certs per year • ~10K IBM commits per month Connect > 1000 active IBM Contributors Working in key OS projects
  • 4. IBM Open Source Participation IBM generated open source innovation • 137 IBM Open Code projects w/1000+ GitHub projects • Projects graduate into full open governance: Node-RED, OpenWhisk, SystemML, Blockchain fabric among others • developer.ibm.com/code/open/code/ Community • IBM focused on 18 strategic communities • Drive open governance in “Centers of Gravity” • IBM Leaders drive key technologies and assure freedom of action The IBM OS Way is now open sourced • Training, Recognition, Tooling • Organization, Consuming, Contributing
  • 5. Center for Open Source Data and AI Technologies CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise Relaunch of the Spark Technology Center (STC) to reflect expanded mission CODAIT http://codait.org codait (French) = coder/coded https://m.interglot.com/fr/en/codait
  • 6. Agenda - Introduction to the COVID-19 scenario - Introduction to Elyra - Walkthrough of the COVID-19 analytics scenario, exploring the Elyra features
  • 8. Leveraging analytics to navigate unprecedented times (image: Getty) Frederick Reiss, IBM CODAIT; Romeo Kienzler, IBM CODAIT. IBM Developer / CODAIT / © 2020 IBM Corporation
  • 9. Challenges implementing COVID notebooks - How to break apart tasks that are expensive to run - Data preparation: once data is prepared, analytics can run multiple times on that same data - Data is updated frequently: when data is updated, how to ensure all tasks are executed in the right order - How to collaborate and share artifacts
  • 11. Elyra Elyra is a set of AI-centric extensions for JupyterLab. Elyra was officially announced as an open source project by IBM on April 29th. The name Elyra is a play on “Elara”, one of Jupiter's moons, with the “y” from “Jupyter” introduced to make it “Elyra”
  • 12. JupyterLab JupyterLab is the next-generation UI for the Jupyter ecosystem. It brings all the previous improvements into a single unified platform, plus more! It provides a modular, extensible architecture and retains backward compatibility with the old notebook we know and love
  • 13. JupyterLab: File Explorer, Widgets / Rich Output, Tabbed Workspaces, Text Editor, Console/Terminal
  • 14. Elyra Elyra is a set of AI-centric extensions to JupyterLab. It aims to help data scientists, machine learning engineers and AI developers through the complexities of the model development life cycle. Elyra at GitHub https://github.com/elyra-ai/elyra Elyra Documentation https://elyra.readthedocs.io/en/latest/
  • 15. Elyra JupyterLab Extensions. Notebook Pipelines editor: Elyra provides a visual editor for building Notebook-based AI pipelines, enabling the conversion of multiple notebooks into batch jobs or workflows. Notebook as batch jobs: Elyra extends the notebook UI to simplify the submission of notebooks as a batch job for model training. Hybrid runtime support: simplifies the task of running notebooks interactively on cloud machines, improving productivity by leveraging the power of cloud-based resources. Python script execution: exposes Python scripts as first-class citizens, allowing users to locally edit their scripts and execute them against local or cloud-based resources seamlessly. Versioning using git: simplifies tracking changes, enabling better sharing among teammates. Fork me at: github.com/elyra-ai
  • 17. Resources Elyra source code at GitHub https://github.com/elyra-ai/elyra Elyra Documentation https://elyra.readthedocs.io/en/latest/ Elyra announcement and demo video https://www.youtube.com/watch?v=PuGNijkV5PQ COVID-19 analytics scenario using Elyra by Fred Reiss https://www.youtube.com/watch?v=CbcgyzB8c4M&t STAR US & FORK US ON GITHUB
  • 20. AI / Deep Learning Workloads: resource-intensive workloads; require expensive hardware (GPU, TPU); heterogeneous frameworks; long-running training jobs - Simple MNIST training takes over one hour WITHOUT a decent GPU - Training other non-complex deep learning models can easily take over a day WITH GPUs
  • 21. Training/Deploying Models requires a lot of DevOps: Model Serving, Monitoring, Resource Management, Configuration, Hyperparameter Optimization, Reproducibility
  • 22. AI / Deep Learning Workloads Challenges • How to isolate the training environments so that multiple jobs, based on different deep learning frameworks (and/or releases), can be submitted and trained at the same time. • Ability to allocate individual system-level resources such as GPUs, TPUs, etc. to different kernels for a period of time. • Ability to free up system-level resources such as GPUs, TPUs, etc. when they stop being used or have been idle for a period of time.
  • 23. AI / Deep Learning Workloads Containers and Kubernetes Platform - Containers simplify management of complicated and heterogeneous AI/Deep Learning infrastructure, providing a required isolation layer between pods running different Deep Learning frameworks - Containers provide a flexible way to deploy applications and are here to stay - Kubernetes enables easy management of containerized applications and resources with the benefit of Elasticity and Quality of Service Source: https://github.com/Langhalsdino/Kubernetes-GPU-Guide
  • 24. AI Platforms AI/Deep Learning Platforms aim to abstract the DevOps tasks away from the Data Scientist, providing a consistent way to develop AI models independent of the toolkit/framework being used.
  • 25. Kubeflow • ML Toolkit for Kubernetes • Open source and community driven • Supports multiple ML frameworks • End-to-end workflows that can be shared, scaled and deployed
  • 26. Kubeflow Pipelines Kubeflow Pipelines is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. • End-to-end orchestration: enabling and simplifying the orchestration of machine learning pipelines. • Easy experimentation: making it easy for you to try numerous ideas and techniques and manage your various trials/experiments. • Easy re-use: enabling you to re-use components and pipelines to quickly create end-to-end solutions without having to rebuild each time.
  • 27. Kubeflow Pipelines Two key concepts: a pipeline and a pipeline component. A pipeline is a description of a machine learning (ML) workflow, including all of the workflow components and how they work together.
  • 28. Kubeflow Pipelines A pipeline component is an implementation of a pipeline task. A component represents a step in the workflow.
  • 29. Kubeflow Pipelines Each pipeline component is a container that contains a program to perform the task required for that particular step of your workflow.
  • 30. [Diagram labels: Source A (Batch), Source B (Stream) and Source C (Fetch), each with Data Ingestion/Preparation; Features (Snapshot); Model Training; Model Validation; Model Testing; Model Deployment; Decompose; Schedule/Run]
  • 34. Notebook as batch jobs Model training can take hours, if not days. Elyra extends the Notebook UI with a new “submit notebook” button that simplifies the submission of a single notebook as a batch job. Submit Notebook
  • 36. Jupyter Enterprise Gateway A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark or Kubernetes cluster for Enterprise/Cloud use cases. Supported Kernels; Supported Platforms (Spectrum Conductor). Jupyter Enterprise Gateway website https://Jupyter.org/enterprise_gateway/ Jupyter Enterprise Gateway source code at GitHub https://github.com/jupyter/enterprise_gateway Jupyter Enterprise Gateway Documentation http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
  • 37. Jupyter Enterprise Gateway Features. Optimized Resource Allocation: utilize resources on all cluster nodes by running kernels as Spark applications in YARN Cluster Mode; pluggable architecture to enable support for additional Resource Managers. Enhanced Security: end-to-end secure communications (secure socket communications, encrypted HTTP communication using SSL). Multiuser support with user impersonation: enhance security and sandboxing by enabling user impersonation when running kernels (using Kerberos); individual HDFS home folder for each notebook user; use the same user ID for notebook and batch jobs. [Chart: Maximum number of simultaneous kernels (4GB heap) by cluster size (32GB nodes); cluster sizes 4, 8, 12 and 16 nodes; data labels 16, 32, 48, 64]
  • 38. Enterprise Gateway & Kubernetes Before Jupyter Enterprise Gateway … - Resources required for all kernels need to be allocated during Notebook Server pod creation - Resources are limited to what is physically available on the host node that runs all kernels and associated Spark drivers After Jupyter Enterprise Gateway … - Gateway pod is very lightweight - Kernels run in their own pods, providing isolation - Kernel pods are built from community images: Spark-on-K8s, TensorFlow, Keras, etc.
  • 39. Jupyter Enterprise Gateway - Kubernetes [Diagram labels: users Bob and Alice; Jupyter Enterprise Gateway; container images defined in kernelspec; community image kernel; Spark on Kubernetes kernel]
  • 40. Jupyter Enterprise Gateway - Kubernetes [Diagram labels: users Bob and Alice; JupyterLab; Jupyter Notebook; JupyterHub will provision custom images containing Notebook + NB2KG extension; Jupyter Enterprise Gateway; container images defined in kernelspec; community image kernel; Spark on Kubernetes kernel]
  • 42. Python Script execution • Create a new Python script from the workspace launcher • Navigate through the script via the Table of Contents outline • Execute the script against local or cloud-based resources Execute Script Select Environment
  • 44. Git Integration Elyra provides integrated support for git repositories, simplifying change tracking and backups and allowing rollback to working versions of the code
  • 45. Resources Elyra source code at GitHub https://github.com/elyra-ai/elyra Elyra Documentation https://elyra.readthedocs.io/en/latest/ Elyra announcement and demo video https://developer.ibm.com/technologies/artificial-intelligence/blogs/open-source-elyra-ai-toolkit-simplifies-data-model-development/ COVID-19 analytics scenario using Elyra by Fred Reiss https://www.youtube.com/watch?v=CbcgyzB8c4M&t STAR US & FORK US ON GITHUB
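
The ordering challenge from slide 9 (when data is updated, run every task in the right order) and the pipeline/component split from slides 27 to 29 can be illustrated with a small dependency graph. The following is a minimal sketch, not taken from the slides and not the Elyra or Kubeflow Pipelines API: the notebook names and the `run_node` callback are hypothetical, and it assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical notebooks: each name maps to the set of notebooks it depends on.
pipeline = {
    "prepare_data.ipynb": set(),
    "analyze_trends.ipynb": {"prepare_data.ipynb"},
    "fit_curves.ipynb": {"prepare_data.ipynb"},
    "build_report.ipynb": {"analyze_trends.ipynb", "fit_curves.ipynb"},
}


def execution_order(dependencies):
    """Return node names so that each node comes after all of its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def run_pipeline(dependencies, run_node):
    """Call run_node(name) once per node, in dependency order, and collect results."""
    results = {}
    for name in execution_order(dependencies):
        results[name] = run_node(name)
    return results


order = execution_order(pipeline)

# Stand-in for real execution: just record that each notebook was visited.
results = run_pipeline(pipeline, lambda name: f"ran {name}")
```

Here the expensive data preparation step runs once, and the two analysis notebooks that depend on it may run in either order, but always before the report. A real pipeline runtime would replace the `run_node` stand-in with the actual notebook execution.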