SUSE Developer Program for Data Scientists
Marco Varlese, Developers Program Architect
marco.varlese@suse.com
Alessandro Festa, Sr. Product Manager, Accelerators & Artificial Intelligence
alessandro.festa@suse.com
@bringyourownid
In this session…
You will learn…
Why a Dev Program for Data Scientists…
Containers, GPUs, tricks and…
…how about some juggling?
So what about the Dev Program…
Why a GPU-aware container
• Technical needs:
• Machine Learning and Deep Learning need high computational power
• It's not only about GPUs, but that is where the market is right now (see next slide) = 90% of the users/customers
• Machine Learning in containers is the way to go: containers are simple to use for a non-technical person, easy to deploy, and easy to “transport” (from on-prem to cloud and back)
• Challenges:
• NVIDIA drivers are not open source, so they cannot be shipped with Leap/Tumbleweed (no OBS); we (as a community) need to find a solution to make users' lives easier
• nvidia-docker (from NVIDIA) and CUDA are required
• Docker images for Machine Learning frameworks are HUGE (over 3 GB)
Wait, wait… What are you talking about? NVIDIA what?
You are here
Mandatory Requirements
“Make sure you have installed the NVIDIA driver and a supported version of Docker for your distribution”
GNU/Linux x86_64 with kernel version > 3.10
Docker >= 1.12
NVIDIA GPU with Architecture > Fermi (2.1)
NVIDIA drivers ~= 361.93 (untested on older versions)
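The requirements above can be turned into a quick pre-flight check. The following is a minimal sketch, not an official tool: the function names are hypothetical, the version strings are passed in by hand rather than detected, and the slide's ">" comparisons are read literally on the major.minor pair.

```python
def parse_version(text):
    """Turn a string such as '4.12.14-lp151' into (4, 12, 14).

    Parsing stops at the first part that does not start with a digit.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_host(kernel, docker, compute_capability):
    """Return the list of requirements from the slide that are NOT met."""
    failures = []
    # Slide: kernel version > 3.10 (read literally, on major.minor)
    if not parse_version(kernel)[:2] > (3, 10):
        failures.append("kernel version > 3.10")
    # Slide: Docker >= 1.12
    if not parse_version(docker) >= (1, 12):
        failures.append("Docker >= 1.12")
    # Slide: GPU architecture > Fermi (2.1)
    if not parse_version(compute_capability) > (2, 1):
        failures.append("GPU architecture > Fermi (2.1)")
    return failures


if __name__ == "__main__":
    # Example values only; replace them with what your host reports.
    print(check_host("4.12.14-lp151", "18.09.6", "6.0"))
```

An empty list means all three checks passed. The driver requirement (~= 361.93) is left out on purpose, because the slide only says older versions are untested.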
CUDA toolkit version   Driver version            GPU architecture
6.5                    >= 340.29                 >= 2.0 (Fermi)
7.0                    >= 346.46                 >= 2.0 (Fermi)
7.5                    >= 352.39                 >= 2.0 (Fermi)
8.0                    == 361.93 or >= 375.51    == 6.0 (P100)
8.0                    >= 367.48                 >= 2.0 (Fermi)
9.0                    >= 384.81                 >= 3.0 (Kepler)
9.1                    >= 387.26                 >= 3.0 (Kepler)
9.2                    >= 396.26                 >= 3.0 (Kepler)
10.0                   >= 384.111, < 385.00      Tesla GPUs
10.0                   >= 410.48                 >= 3.0 (Kepler)
10.1                   >= 384.111, < 385.00      Tesla GPUs
10.1                   >= 410.72, < 411.00       Tesla GPUs
10.1                   >= 418.39                 >= 3.0 (Kepler)
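To read the matrix the other way round (which CUDA toolkit can a given driver run?), a small lookup helps. This is a sketch under stated assumptions: it uses only the general rows of the table, ignores the P100 and Tesla special cases, and the names MIN_DRIVER and newest_cuda_for are hypothetical.

```python
# Minimum driver per CUDA toolkit version, general rows of the table only.
# The P100-only and Tesla-only rows are deliberately left out.
MIN_DRIVER = {
    "6.5": "340.29",
    "7.0": "346.46",
    "7.5": "352.39",
    "8.0": "367.48",
    "9.0": "384.81",
    "9.1": "387.26",
    "9.2": "396.26",
    "10.0": "410.48",
    "10.1": "418.39",
}


def as_tuple(version):
    """'410.48' -> (410, 48), so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def newest_cuda_for(driver):
    """Return the newest CUDA version the driver satisfies, or None."""
    supported = [
        cuda
        for cuda, minimum in MIN_DRIVER.items()
        if as_tuple(driver) >= as_tuple(minimum)
    ]
    return max(supported, key=as_tuple) if supported else None


if __name__ == "__main__":
    print(newest_cuda_for("410.48"))
```

Comparing versions as integer tuples matters here: as plain strings, "384.111" would sort before "384.81".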
Where to start
• NVIDIA container matrix: https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html#framework-matrix-2019
• NVIDIA GitLab: https://gitlab.com/nvidia
• NVIDIA on Docker Hub: https://hub.docker.com/r/nvidia/cuda/
CUDA images come in three flavors:
• base: starting from CUDA 9.0, contains the bare minimum (libcudart) to deploy a pre-built CUDA application. Use this image if you want to manually select which CUDA packages to install.
• runtime: extends the base image by adding all the shared libraries from the CUDA toolkit. Use this image if you have a pre-built application using multiple CUDA libraries.
• devel: extends the runtime image by adding the compiler toolchain, the debugging tools, the headers and the static libraries. Use this image to compile a CUDA application from sources.
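The choice between the three flavors can be written down as a small decision helper. This is only a sketch: the function names are hypothetical, and the tag layout `nvidia/cuda:<version>-<flavor>` is an assumption about how the Docker Hub repository names its tags (real tags may carry an extra OS suffix), so check the repository before relying on it.

```python
FLAVORS = ("base", "runtime", "devel")


def pick_flavor(needs_compiler, needs_cuda_libraries):
    """Map the slide's guidance to a flavor name."""
    if needs_compiler:
        return "devel"    # compile a CUDA application from sources
    if needs_cuda_libraries:
        return "runtime"  # pre-built app using multiple CUDA libraries
    return "base"         # bare minimum, pick CUDA packages manually


def cuda_image(version, flavor):
    """Build an image reference, assuming the '<version>-<flavor>' tag layout."""
    if flavor not in FLAVORS:
        raise ValueError(f"unknown flavor: {flavor}")
    numeric = tuple(int(part) for part in version.split("."))
    # The slide says the base flavor exists starting from CUDA 9.0.
    if flavor == "base" and numeric < (9, 0):
        raise ValueError("base images start with CUDA 9.0")
    return f"nvidia/cuda:{version}-{flavor}"


if __name__ == "__main__":
    print(cuda_image("10.1", pick_flavor(False, True)))
```

Since framework images are already large, starting from base or runtime and adding only what is needed is one way to keep the final image smaller.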
Challenges (recap)
• The host requires nvidia-docker v2 to be installed (GitHub pull request waiting to be merged: https://github.com/NVIDIA/nvidia-docker/pull/790); we are working on it (thanks to Darren Davis, our TAM, for pushing this with NVIDIA!)
• cuDNN and CUDA require license acceptance by the user, so they cannot easily be delivered as a SUSE package; Partner Hub to the rescue! In containers they may be installed silently using an explicit variable (e.g. -e ACCEPT_EULA=Y)
• Some dependencies are missing in SLE, but not in openSUSE, when installing CUDA directly from the NVIDIA repo; as an alternative we may use the CUDA script.
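As an illustration of the points above, the command line for a GPU container can be assembled like this. It is a sketch under assumptions: `--runtime=nvidia` is the option that nvidia-docker v2 registers with Docker, `-e ACCEPT_EULA=Y` is taken from the slide, and the image name `example/tensorflow-gpu` and the helper name are hypothetical.

```python
def docker_run_args(image, command, accept_eula=False):
    """Build the argument list for running a container with GPU access."""
    # --runtime=nvidia selects the runtime installed by nvidia-docker v2.
    args = ["docker", "run", "--rm", "--runtime=nvidia"]
    if accept_eula:
        # Explicit license acceptance, as mentioned on the slide.
        args += ["-e", "ACCEPT_EULA=Y"]
    args.append(image)
    args += list(command)
    return args


if __name__ == "__main__":
    # Hypothetical image name, for illustration only.
    print(" ".join(docker_run_args("example/tensorflow-gpu", ["nvidia-smi"], True)))
```

Returning a list rather than a single string means the result can be handed directly to subprocess.run without shell quoting issues.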
Both packages seem optional to me.
Do we need samples? Maybe.
Do we need the X11 driver in a container? I would say it depends…
Both are published as openSUSE packages.
Result
NVIDIA variables (mandatory)
Only needed if you run the NVIDIA CUDA script (optional?)
Actual install steps (these are for the TensorFlow base)
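The slide text does not name the mandatory NVIDIA variables. As an assumption, the sketch below uses the two environment variables documented for the NVIDIA container runtime, NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES, and generates the matching Dockerfile lines; the helper name and default values are hypothetical.

```python
def nvidia_env_lines(devices="all", capabilities=("compute", "utility")):
    """Return Dockerfile ENV lines for the NVIDIA container runtime.

    Assumption: these two variables are the ones meant by the slide.
    """
    return [
        f"ENV NVIDIA_VISIBLE_DEVICES {devices}",
        f"ENV NVIDIA_DRIVER_CAPABILITIES {','.join(capabilities)}",
    ]


if __name__ == "__main__":
    print("\n".join(nvidia_env_lines()))
```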
DEMO TIME
But containers are not enough…
You're a Data Scientist, not a SysAdmin/DevOps
AI Use Cases (for openSUSE)
Data Scientist
• Run an experiment with different coefficients and summarize the results
• Work “local” first
• Create a “template” that needs to be re-applied to a production-ready environment
Machine Learning Engineer
• Write code based on dataset samples
• Work either “local” or “remote” connected
• Need to re-test (QA) code on a different environment
Can Customers Do It Alone?
A simple Data Scientist playground
[Diagram: Launch Notebook, Choose Use Case/ML Framework, Use playground. Deployment on Kubic/openSUSE Leap or openSUSE Leap + Containers/VM, on-prem or cloud.]
The Data Scientist chooses but does not see the complexity
DEMO TIME
So to recap…and to learn something new…
OpenSUSE Conference 2019 - Building GPU aware containers


Editor's notes

  1. So why would you need SUSE Global Services when you already have SUSE Support with your subscription? Excellent question. To put it simply, your team has their everyday job. They are tasked with “keeping the lights on,” which means they are responsible for all the baseline needs, including: maintenance and security of all your servers; maintaining uptime and avoiding business disruption; and providing quality services to your business and customers. At the same time, your business is asking you to transform to meet the needs of the digital economy. That is, your team has to grow to become IT generalists who can span both development and operations. They need to speed up software and application delivery so that they are not merely doing yearly releases but perhaps releasing products quarterly, monthly or even faster. You are also grappling with a skills gap: it seems that as soon as you have someone certified or trained on the technology, another company or recruiter poaches that talent. So can you do it alone?