In this deck from the 2019 Stanford HPC Conference, Mohan Potheri from VMware presents: Accelerating & Optimizing Machine Learning on VMware vSphere leveraging NVIDIA GPUs.
"This session introduces machine learning on vSphere to the attendee and explains when and why GPUs are important for them. Basic machine learning with Apache Spark is demonstrated. GPUs can be effectively shared in vSphere environments and the various methods of sharing are addressed here. We will explore features like suspending/resuming a VM that has GPUs attached to it, for sharing of those resources. Compelling use cases were developed leveraging vSphere's GPU capabilities. These use cases showcase deep learning with GPGPUs for image processing, stock prediction and distributed training. References to various technical papers will be given."
Mohan Potheri is VCDX#98 and has more than 20 years of experience in IT infrastructure, with in-depth experience in VMware virtualization. He currently focuses on evangelization of "High Performance Computing (HPC)" and "Big Data" virtualization on vSphere. He also has extensive experience with business-critical applications such as SAP, Oracle, SQL and Java across UNIX, Linux and Windows environments. Mohan Potheri is an expert on SAP virtualization and has been a speaker at multiple VMworld and PEX events. Prior to VMware, Mohan worked at many large enterprises where he engineered fully virtualized HPC solutions. He has planned, designed, implemented and managed robust, highly available, DR-compliant virtual environments in UNIX and x86 environments.
Watch the video: https://youtu.be/rDsht9NFwR0
Learn more: https://www.vmware.com/solutions/high-performance-computing.html and http://hpcadvisorycouncil.com/events/2019/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
4. Our Goal and Approach
• Increase agility and decrease time to discovery for researchers, data scientists, and engineers
• Provide IT with the ability to efficiently provision, allocate, manage and ensure compliance of research compute infrastructure across an increasingly broad range of technical and business requirements
• By leveraging VMware’s proven, enterprise-class virtualization and cloud technologies to meet the performance requirements of research computing, HPC, and ML workloads, and
• Bringing novel capabilities to bear to enable new capabilities not available in traditional HPC/ML environments
26. Why Singularity Containers?
Docker is not designed for HPC architectures
Singularity is the best-suited container solution for HPC:
A Singularity container is encapsulated in a single file, making it highly portable and secure.
Singularity is designed from the ground up for scientific computing
27. Combining Virtual Machines & Containers for GPU sharing
• Sharing GPUs in a container is difficult as there is no resource management
• A vSphere VM with NVIDIA GRID or Bitfusion can use a whole or partial GPU
• Containers are a great packaging mechanism for applications
• By enclosing one container per virtual machine, we get the best of both worlds
• GPU resources can be shared with other containers
• Machine and Deep Learning applications & platforms can be packaged and distributed effectively as a container
28. Logical Schematic of Infrastructure components
• One Singularity Container per VM
• Containers leverage partial or full GPUs allocated to the virtual machine
• Container packaged with TensorFlow, tools, etc.
• Bitfusion provides GPU sharing
[Diagram: a vSphere GPU Cluster of three ESX hosts, each running a Bitfusion (BF) Server VM with GPU passthrough, alongside a vSphere Generic Cluster of two ESX hosts, each running a virtual machine that holds one Singularity container]
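As a rough illustration of the "one Singularity container per VM" layout, the sketch below builds the command line a VM might use to start its TensorFlow container with GPU access. It relies on Singularity's `--nv` option, which exposes the host's NVIDIA driver and devices to the container. The image name `tensorflow.sif`, the script `train.py`, and the helper `singularity_gpu_command` are hypothetical and not taken from the deck; the command is only assembled here, not executed.

```python
def singularity_gpu_command(image, command, use_gpu=True):
    """Build the argv list for running `command` inside a Singularity image.

    `image` and `command` are placeholders supplied by the caller; nothing
    here is specific to the environment described in the slides.
    """
    argv = ["singularity", "exec"]
    if use_gpu:
        # --nv makes the NVIDIA driver libraries and GPU devices of the
        # (virtual) machine visible inside the container.
        argv.append("--nv")
    argv.append(image)
    argv.extend(command)
    return argv


# Hypothetical image and training script, for illustration only.
argv = singularity_gpu_command("tensorflow.sif", ["python", "train.py"])
print(" ".join(argv))
```

Inside a VM, a list like this could be handed to `subprocess.run`; whether the container then sees a whole or a partial GPU would depend on what the VM itself has been allocated.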
29. Images/sec Throughput comparison for 1 GPU
2.5-3X more throughput with sharing
[Chart: Throughput comparison with and without GPU sharing. Models: Resnet50, Alexnet, Inception3. Series: Total Throughput, Baseline no sharing. Y-axis: Throughput Ratios, 0.00 to 3.50]
30. Runtime comparison for 1 GPU (with/without sharing)
[Chart: Runtime comparison for 1 GPU with and without sharing. Series: Unshared, Shared. Metrics: Runtime (%), Average Run Time (Seconds). Annotation: 17%]
Only 17% slower for nearly 3X Throughput
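The trade-off between per-job slowdown and aggregate throughput can be sketched with simple arithmetic. The model below is an assumption, not the deck's benchmark method: it supposes that several equal-sized jobs run concurrently on one shared GPU and compares them with running one job at a time on an unshared GPU. The runtimes and the job count are hypothetical inputs; the deck does not state how many jobs shared the GPU.

```python
def sharing_tradeoff(unshared_runtime_s, shared_runtime_s, concurrent_jobs):
    """Return (slowdown_fraction, throughput_ratio) for jobs sharing one GPU.

    Assumes every job processes the same amount of work and that the
    baseline runs one job at a time on the unshared GPU.
    """
    # How much longer one job takes when it shares the GPU.
    slowdown = shared_runtime_s / unshared_runtime_s - 1.0
    # Unshared: 1 job finishes per unshared_runtime_s.
    # Shared: concurrent_jobs jobs finish per shared_runtime_s.
    throughput_ratio = concurrent_jobs * unshared_runtime_s / shared_runtime_s
    return slowdown, throughput_ratio


# Hypothetical numbers: 100 s unshared, 117 s shared, 3 jobs on one GPU.
slowdown, ratio = sharing_tradeoff(100.0, 117.0, 3)
print(f"{slowdown:.0%} slower, {ratio:.2f}x throughput")
```

With these hypothetical inputs the sketch reports a 17% slowdown and about 2.56x aggregate throughput, which illustrates how a modest per-job penalty can coexist with a multiple-fold gain in total work done per GPU.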
31. Summary
• Sharing is key to enabling cloud-like capabilities on premises
• vSphere is the best platform to leverage the latest high-performance hardware
• Virtualization supports device sharing and delivers near bare-metal performance
• HW sharing through vSphere can increase utilization (cycle harvesting)