Accelerating & Optimizing Machine Learning on VMware vSphere leveraging NVIDIA GPUs

©2018 VMware, Inc.
Accelerating & Optimizing HPC/ML on vSphere Leveraging NVIDIA GPU
Mohan Potheri, VMware, Inc
Justin Murray, VMware, Inc

Agenda
New Demands on IT
VMware Goal and Approach
Why Virtualize AI & ML
Machine Learning Landscape
Maximizing GPU Utilization
Extending GPU Sharing to Containers
Summary

New Demands on IT Infrastructure
• Specialized Hardware: x86, SGX, GPU, NVM, PMEM, FPGA, QAT, IPU
• Security
• Hybrid Cloud, Public Cloud, Global Infra and Edge
• Growth of Apps: Business Critical Apps, Desktop Virtualization, Graphic Intensive, Cloud-Native Apps, Edge/IoT, SaaS, Mobile, Custom/Other, Analytics/AI/ML

Our Goal and Approach
• Increase agility and decrease time to discovery for researchers, data scientists, and
engineers
• Provide IT with the ability to efficiently provision, allocate, manage and ensure
compliance of research compute infrastructure across an increasingly broad range of
technical and business requirements
• By leveraging VMware’s proven, enterprise-class virtualization and cloud technologies to
meet the performance requirements of research computing, HPC, and ML workloads, and
• By bringing novel capabilities to bear that are not available in traditional
HPC/ML environments

Why Virtualize HPC AI/ML Infrastructure
vSphere can help data scientists get to answers faster
Operational Flexibility
• Simple cluster expansion and contraction
• Rapidly reproduce research environments
• Higher resiliency and less downtime with vMotion
• Fault-isolation (hardware and software)
Reduced Complexity
• Cluster resource-sharing
• Minimize setup and configuration time with centralized management capabilities
• Simultaneously support mixed software environments
• Industry-leading virtualization platform that your IT already knows
Secure Sensitive Workloads
• Easy, secure data access and sharing
• Security Isolation
• Multi-tenant data security

Dispelling the Misunderstanding about GPUs on vSphere
• Hypervisor is not an intermediary when accessing the GPU
• GPU access is either:
• Directly via passthrough to VM, or
• NVIDIA GRID vGPU
• Near-zero performance impact

Machine Learning Infrastructure Landscape
[Diagram labels: Machine Learning, Deep Learning, Big Data, Data Analytics, VDI, Edge or IoT, ON-PREM, OFF-PREM, training, data, inference]
Two Main Phases in ML
• Training / Model Building
• Often very large data sets
• Compute, storage, and network intensive
• Server-class infrastructure
• Inference / Scoring
• Apply existing models to new data
• Used for prediction
• Edge or core infrastructure

Using GPUs with vSphere

VM DirectPath I/O for NVIDIA GPU

A Virtualized GPU
[Diagram: GPU passthrough on vSphere 6.5/6.7. An ESXi host with a GPU runs VMs; the stack shown in the VM is Linux, CUDA Library & Driver, TensorFlow]

GPU Acceleration on vSphere with DirectPath I/O
• Can provision VMs with one or more GPUs
• Easily reuse GPU infrastructure
• Same behavior as Public Cloud GPU instances
• Benefits:
• HW Isolation
• Workload Isolation
• VM Level Quality of Service
• Fast environment provisioning
• Near bare-metal performance
• Passthrough device certification for vSphere not required
• Server must be compatible with device as published by server OEM and GPU vendor
• Server must be vSphere Certified
• Caveats:
• No vMotion
• No Suspend and Resume
• No DRS
• No vSphere HA
[Diagram: VM with five GPU/App pairs]
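With DirectPath I/O the guest OS sees the GPU as an ordinary PCI device, so the usual NVIDIA tooling should work unchanged inside the VM. The following is a minimal sketch, assuming the NVIDIA driver and its nvidia-smi utility are installed in the Linux guest; the helper names parse_gpu_rows and list_gpus are hypothetical and not part of any VMware or NVIDIA API.

```python
import shutil
import subprocess

# nvidia-smi query flags; the selected fields are index, name, and total memory.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(text):
    """Turn 'index, name, memory' CSV rows into dictionaries."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split from both ends so a comma inside the name cannot break parsing.
        index, rest = line.split(",", 1)
        name, memory = rest.rsplit(",", 1)
        gpus.append({
            "index": int(index.strip()),
            "name": name.strip(),
            "memory_mib": memory.strip(),  # kept as text; reported in MiB
        })
    return gpus


def list_gpus():
    """Return the GPUs visible to this guest OS, or [] if nvidia-smi is absent."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    for gpu in list_gpus():
        print(gpu)
```

A VM provisioned with several passthrough GPUs should list one row per attached device; an empty list means the utility is not installed in the guest.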

VM DirectPath I/O – Multiple GPUs Attached to a Virtual Machine

vSphere GPU Sharing Mechanisms

VMware vSphere 6.7 and NVIDIA Quadro vDWS (GRID 7.0)
• Share single GPU among multiple VMs
• Provision VMs with partial up to one full GPU
• GRID vGPU VM Suspend and Resume support
• Quickly repurpose GPU infrastructure
• VDI or Data Science by day
• Compute (ML) by Night
• Benefits:
• HW Isolation
• Workload Isolation
• VM Level Quality of Service
• GPU Quality of Service
• Fast environment provisioning
• Bare-metal comparable performance
[Diagram: eight GPU/App pairs]

NVIDIA Grid – Two Layers of Software/Drivers

NVIDIA Grid Configuration – Choosing the vGPU Profile
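As a rough sizing aid for the profile choice, the sketch below assumes that each vGPU profile reserves a fixed slice of the physical GPU's frame buffer and that every vGPU on one physical GPU uses the same profile. The 16 GB card and the profile sizes are illustrative values, not figures from this deck, and vms_per_gpu is a hypothetical helper.

```python
def vms_per_gpu(gpu_framebuffer_gb, profile_framebuffer_gb):
    """Number of same-profile vGPUs that fit on one physical GPU."""
    if profile_framebuffer_gb <= 0:
        raise ValueError("profile size must be positive")
    if profile_framebuffer_gb > gpu_framebuffer_gb:
        raise ValueError("profile is larger than the physical GPU")
    return gpu_framebuffer_gb // profile_framebuffer_gb


# Illustrative only: a 16 GB GPU carved up with different profile sizes.
for profile_gb in (2, 4, 8, 16):
    print(f"{profile_gb} GB profile -> {vms_per_gpu(16, profile_gb)} VM(s) per GPU")
```

A smaller profile packs more VMs onto a GPU but leaves each one less frame buffer, so the choice depends on the largest model and batch size a VM has to hold.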

Bitfusion Enables Remote GPU Sharing
• Dynamic GPU attach anywhere
• Fractional GPUs for Efficiency
• Application Run Time Virtualization
• Standard based GPU
[Diagram: four BF Client VMs, each on its own ESX host, alongside a vSphere GPU Cluster of three BF Server VMs, each on an ESX host with GPU passthrough]

Maximize GPU Utilization

vSphere 6.7 GPU Virtual Machine Suspend and Resume
Source: Enhancing Operations for NVIDIA Grid
Video Demo: https://youtu.be/PwVReRauY50
Blog Article: https://blogs.vmware.com/vsphere/2018/07/vsphere-6-7-suspend-and-resume-of-gpu-attached-virtual-machines.html

Deep Learning Virtualization Use Case: Cycle Harvesting
Challenge:
Data Scientists submit jobs in traditional batches because of limited compute availability
• Submit jobs one day
• Wait until the next day for the job results
What if…
The VDI environment has unused cycles. Could HPC jobs be run in the environment when it is not needed to run VDI?
Will it blend?
Outcome and Benefit:
• Go beyond traditional batch processing to viewing HPC resources as an engine for returning results in real time.
• Enable HPC compute jobs to harvest cycles from a VDI compute environment.

Cycle Harvesting
[Diagram: three VMware ESXi hosts; Share Value settings of 100 and 1 shown along a Time axis marked 8AM, Noon, 5PM, 10PM]
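The share values in the diagram can be read as relative priorities under contention. The sketch below is a simplified proportional-share model, not the vSphere scheduler itself, which also honors reservations and limits. The VM names are hypothetical, and the slide does not state which VM holds which value at which time of day.

```python
def split_by_shares(capacity, shares):
    """Divide capacity among contending VMs in proportion to their shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one VM needs a positive share value")
    return {name: capacity * value / total for name, value in shares.items()}


# Assumed daytime case: a VDI desktop (100 shares) contends with an ML job (1 share).
daytime = split_by_shares(100.0, {"vdi_desktop": 100, "ml_training": 1})

# Assumed night case: the desktop is idle, so the ML job is the only contender.
night = split_by_shares(100.0, {"ml_training": 1})

print(daytime)
print(night)
```

With a 100-to-1 ratio the low-share job is squeezed to roughly 1% whenever the desktop is busy, yet it receives the full capacity as soon as the desktop stops contending. That is the behavior cycle harvesting relies on.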

Cycle Harvesting Case Study
https://bit.ly/2MrBngH

Extending GPGPU Sharing to Containers

Why Singularity Containers?
• Docker is not designed for HPC architectures
• Singularity is the best-suited container solution for HPC:
• A Singularity container is encapsulated in a single file, making it highly portable and secure
• Singularity is designed from the ground up for scientific computing

Combining Virtual Machines & Containers for GPU sharing
• Sharing GPUs in a container is difficult, as there is no resource management
• A vSphere VM with NVIDIA GRID or Bitfusion can use a whole or partial GPU
• Containers are a great packaging mechanism for applications
• By enclosing one container per virtual machine, we get the best of both worlds
• GPU resources can be shared with other containers
• Machine and Deep Learning applications & platforms can be packaged and distributed effectively as a container

Logical Schematic of Infrastructure components
• One Singularity Container per VM
• Containers leverage partial or full GPUs allocated to the virtual machine
• Container packaged with TensorFlow, tools, etc.
• Bitfusion provides GPU sharing
[Diagram: a vSphere Generic Cluster of two ESX hosts, each running a virtual machine with one Singularity container, alongside a vSphere GPU Cluster of three ESX hosts, each running a BF Server VM with GPU passthrough]
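Launching the one container per VM could look roughly like the sketch below. It assumes Singularity is installed in the VM and uses its --nv option, which exposes the host NVIDIA driver libraries to the container. The image name tensorflow.sif, the script train.py, and the helper singularity_gpu_command are hypothetical placeholders, and the Bitfusion layer is left out.

```python
import shlex


def singularity_gpu_command(image, program, *args):
    """Build a 'singularity exec --nv' command line for a GPU workload."""
    return ["singularity", "exec", "--nv", image, program, *args]


# Hypothetical image and training script packaged with TensorFlow.
cmd = singularity_gpu_command("tensorflow.sif", "python", "train.py")
print(shlex.join(cmd))

# To launch it inside the VM (requires Singularity and the image file):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Because the container is a single image file, the same file can be copied to any VM in the cluster and started with the same command.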

Images/sec Throughput comparison for 1 GPU
2.5-3X more throughput with sharing
[Chart: Throughput comparison with and without GPU sharing. Y-axis: throughput ratios, 0.00 to 3.50; x-axis: Resnet50, Alexnet, Inception3; series: Total Throughput, Baseline no sharing]

Runtime comparison for 1 GPU (with/without sharing)
Only 17% slower for nearly 3X Throughput
[Chart: Runtime comparison for 1 GPU with and without sharing. Axis range: 0.00 to 200.00; groups: Runtime (%), Average Run Time (Seconds); series: Unshared, Shared]
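One simple way to relate the runtime and throughput figures is sketched below. It assumes several identical jobs run concurrently on the shared GPU and that each one takes (1 + slowdown) times its unshared runtime. The number of concurrent jobs is not given in the slides, so the values tried here are assumptions, as is the helper name.

```python
def aggregate_throughput_ratio(concurrent_jobs, slowdown):
    """Throughput of a shared GPU relative to a single job running alone."""
    if concurrent_jobs < 1:
        raise ValueError("need at least one job")
    if slowdown < 0:
        raise ValueError("slowdown must be non-negative")
    return concurrent_jobs / (1.0 + slowdown)


# 0.17 is the 17% runtime penalty from the slide; the job counts are assumed.
for jobs in (2, 3, 4):
    ratio = aggregate_throughput_ratio(jobs, 0.17)
    print(f"{jobs} concurrent jobs -> {ratio:.2f}x the unshared throughput")
```

Under this model a small per-job penalty is far outweighed by running several jobs at once, which is the trade-off the two charts illustrate.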

Summary
• Sharing is key to enabling cloud-like capabilities on premises
• vSphere is the best platform to leverage the latest high-performance hardware
• Virtualization supports device sharing and delivers near bare-metal performance
• HW sharing through vSphere can increase utilization (cycle harvesting)
