Brain in the Cloud: Machine Learning on OpenStack & Kubernetes Done Right - Erez Cohen, Mellanox - Cloud Native Day Tel Aviv 2018
1. 1© 2018 Mellanox Technologies
Machine Learning on OpenStack and K8s Done Right!
2018
Brain In The Cloud
Erez Cohen, VP CloudX & Artificial Intelligence
2.
Data is Growing Faster Than Ever
An autonomous vehicle generates ~4,000 GB of data per day
Camera: ~20-40 MB/s
LIDAR (Light Detection & Ranging): ~10-70 MB/s
Radar: ~10-100 KB/s
Sonar: ~10-100 KB/s
GPS: ~50 KB/s
Data will grow by a factor of 10 over the next decade, to 160 zettabytes in 2025 (source: IDC)
Faster data processing requires faster interconnect speeds
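The per-sensor rates above can be rolled up into a rough per-day total. A minimal back-of-the-envelope sketch, assuming one sensor of each type at the upper end of each quoted range and an assumed 8 hours of operation per day (both are illustrative assumptions, not figures from the slide):

```python
# Rough per-day data volume for one autonomous vehicle, using the
# upper-bound rates from the slide. Sensor counts (one of each) and
# hours of operation are assumptions for illustration only.
rates_mb_s = {
    "camera": 40,   # ~20-40 MB/s
    "lidar": 70,    # ~10-70 MB/s
    "radar": 0.1,   # ~10-100 KB/s
    "sonar": 0.1,   # ~10-100 KB/s
    "gps": 0.05,    # ~50 KB/s
}
hours_driven = 8  # assumption: hours of operation per day
total_mb = sum(rates_mb_s.values()) * 3600 * hours_driven
print(f"~{total_mb / 1e6:.1f} TB/day")
```

This lands in the low terabytes per day, the same ballpark as the ~4,000 GB/day figure above; a real vehicle with multiple cameras and lidars reaches it easily.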
3.
Machine Learning Is Everywhere!
Fraud Detection
4.
What Is Machine Learning?
Machine Learning
Machine learning is the subfield of computer science that, according to Arthur Samuel in 1959, gives "computers the ability to learn without being explicitly programmed."
Source: https://en.wikipedia.org/wiki/Machine_learning
5.
Deep Learning
Also known as Deep Neural Network (DNN)
Subset of Artificial Neural Network (ANN)
Deep Learning
Deep Learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks.
Source: http://machinelearningmastery.com/what-is-deep-learning/
6.
Why Deep Learning And Why Now?
Deep Learning makes it possible to solve difficult problems
In some cases, problems that cannot be solved any other way
Deep Learning is not new
1943: “A logical calculus of the ideas immanent in nervous activity”, McCulloch & Pitts
So why now?
Infrastructure
Recent developments in GPU and network technology make large-scale machine learning practical
Data
More data is generated than ever before, which is critical for the training process
Software
A wave of open source machine learning frameworks (TensorFlow, Caffe2, Microsoft Cognitive Toolkit, and others)
7.
Deep Learning Demands Highest Performance
TRAINING: ingests the training dataset; intensive computing (billions of TFLOPS), hence GPUs; ultra-fast networking for scalability (RDMA, GPUDirect, collective acceleration); fast, distributed storage.
INFERENCING: applies the trained model to new data: images, video, text, speech.
8.
Neural Networks Complexity Growth
[Chart] Image recognition (2013-2016): AlexNet → GoogleNet → ResNet → Inception-V2 → Inception-V4 → PolyNet, ~350X growth in complexity.
[Chart] Speech recognition (2014-2017): DeepSpeech → DeepSpeech-2 → DeepSpeech-3, ~30X growth in complexity.
9.
Training Challenges
Training with large data sets and growing networks can take a long time
In some cases, even weeks
In many cases, training needs to happen frequently
Model development and tuning
Real-life use cases may require regular retraining
Training time is reduced with a scale-out architecture
Add workers (nodes) to reduce training time
Two types of parallelism are now popular
Data parallelism
Model parallelism
The network is a critical element in accelerating distributed training!
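The scale-out argument above can be made concrete with a toy throughput estimate. A minimal sketch, where the per-step compute time, synchronization cost, and per-worker batch size are all illustrative assumptions:

```python
# Back-of-the-envelope data-parallel scaling estimate.
# All numbers are illustrative assumptions, not measurements.

def step_time(workers, compute_s=1.0, comm_s=0.2):
    """Time per global step: per-worker compute stays constant (each
    worker keeps its own mini-batch), but with more than one worker
    every step also pays a gradient-synchronization cost."""
    return compute_s + (comm_s if workers > 1 else 0.0)

def samples_per_second(workers, batch_per_worker=64, **kw):
    # Global throughput: total samples per step / time per step.
    return workers * batch_per_worker / step_time(workers, **kw)

for n in (1, 2, 4, 8):
    print(n, round(samples_per_second(n), 1))
```

The sketch shows why the network matters: throughput scales with worker count only as long as the synchronization term stays small, which is exactly what faster interconnects buy.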
10.
Model and Data Parallelism
[Diagram] Model parallelism: the model is partitioned across workers; each worker holds part of the main model (coordinated via a parameter server or allreduce) and processes mini-batches of the data.
[Diagram] Data parallelism: each worker holds a full local copy of the model and processes its own mini-batches of the data; gradients are combined through a parameter server or allreduce.
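The data-parallel scheme can be sketched in a few lines. This is a toy illustration in plain Python (no real framework): four workers each compute a gradient on their own mini-batch, and the gradients are averaged before the shared update, as a parameter server or allreduce would do.

```python
# Toy data parallelism: each worker computes a gradient on its own
# mini-batch; gradients are averaged (parameter-server / allreduce
# style) and the synchronized update is applied to the model.

def gradient(w, batch):
    # Stand-in gradient for a 1-parameter linear model y = w * x
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def train_step(w, shards, lr=0.03):
    grads = [gradient(w, shard) for shard in shards]  # done in parallel
    avg = sum(grads) / len(grads)                     # allreduce step
    return w - lr * avg                               # synchronized update

data = [(x, 2.0 * x) for x in range(1, 9)]   # target weight: w = 2
shards = [data[i::4] for i in range(4)]      # split across 4 workers
w = 0.0
for _ in range(50):
    w = train_step(w, shards)
print(round(w, 3))  # → 2.0
```

Real frameworks do exactly this with tensors instead of scalars, which is why the gradient exchange dominates network traffic during training.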
11.
Accelerates Distributed Training
Data parallelism communication pattern
Gradient updates to parameter servers or among workers
Model parameters distributed among workers
Frequent: every training step, due to the sequential nature of SGD
High bandwidth is needed as models become larger and larger and the number of parameters keeps increasing
Usually characterized by bursts on the network, since workers are synchronized
RDMA and GPU Direct Accelerates Distributed Training
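To see why bandwidth matters, consider the gradient bytes moved per step. A minimal sketch, assuming a ResNet-50-class model of ~25M FP32 parameters (an illustrative assumption) and comparing wire time at two link speeds:

```python
# Gradient traffic per training step and its wire time at two link
# speeds. Model size and speeds are illustrative assumptions.
params = 25_000_000          # ~ResNet-50-class model (assumption)
bytes_per_step = params * 4  # FP32 gradients: 4 bytes per parameter
for gbps in (10, 100):
    seconds = bytes_per_step * 8 / (gbps * 1e9)
    print(f"{gbps:>3} Gb/s: {seconds * 1e3:.1f} ms per exchange")
```

At 10 Gb/s the exchange alone costs tens of milliseconds every step; at 100 Gb/s it drops by an order of magnitude, and RDMA/GPUDirect further remove the CPU and copy overhead on top of the raw wire time.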
12.
Machine Learning on the Cloud
GPU provisioning to VMs
Advanced Networking
Advanced Storage
13.
GPU Provisioning with OpenStack
Today – PCI Passthrough or Ironic
PCI passthrough requires hardware support and has some caveats…
https://wiki.openstack.org/wiki/GPUs
Good performance requires pinning and NUMA topology support configured too
Tomorrow – vGPU
mdev framework introduced in Linux 4.10 by Red Hat, NVIDIA, Intel
~$ openstack flavor show 56cd053c-b6a2-4103-b870-a83dd5d27ec1
+----------------------------+--------------------------------------------+
| Field | Value |
+----------------------------+--------------------------------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 1000 |
| disk | 30 |
| id | 56cd053c-b6a2-4103-b870-a83dd5d27ec1 |
| name | mon.m3.c24r120.2gpu-p100.mlx |
| os-flavor-access:is_public | False |
| properties | pci_passthrough:alias='P100:2,MlxCX4-VF:1' |
| ram | 122880 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 24 |
+----------------------------+--------------------------------------------+
~$ openstack server list --all-projects --project d99… --flavor 56c…
+--------------------------------------+------------+--------+----------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+------------+--------+----------------------------------+
| 1d77bf12-0099-4580-bf6f-36c42225f2c0 | massive003 | ACTIVE | monash-03-internal=10.16.201.20 |
+--------------------------------------+------------+--------+----------------------------------+
14.
What Is RDMA?
Remote Direct Memory Access (RDMA)
Advanced transport protocol (same layer as TCP and UDP)
Main features
Remote memory read/write semantics in addition to send/receive
Kernel bypass / direct user space access
Full hardware offload
Secure, channel-based I/O
Application advantages
Low latency
High bandwidth
Low CPU consumption
RoCE: RDMA over Converged Ethernet
Available for all Ethernet speeds, 10-100G
Verbs: the RDMA software interface (equivalent to sockets)
16.
Para-Virtualized vs. SR-IOV
Enable Advanced Networking for VMs & Containers
Single Root I/O Virtualization (SR-IOV)
PCIe device presents multiple instances to the OS/hypervisor
Enables direct application access
Bare-metal performance for VMs
Reduces CPU overhead
Enables many advanced NIC features (e.g. DPDK, RDMA, ASAP2)
[Diagram] Para-virtualized path: NIC → hypervisor vSwitch → VMs. SR-IOV path: SR-IOV NIC with embedded switch (eSwitch) exposing a Physical Function (PF) and Virtual Functions (VFs) directly to the VMs.
17.
ASAP2 Direct: Full OVS Offload
Enables the SR-IOV data path with the OVS control plane
In other words, enables support for most SDN controllers with an SR-IOV data plane
Open vSwitch remains the management interface, while the OVS data plane is offloaded to the Mellanox embedded switch (eSwitch) using ASAP2 Direct
Allows RDMA, GPUDirect, and other advanced network services directly from a VM or container
[Diagram] VMs connect through SR-IOV VFs to the ConnectX-5 eSwitch data path; OVS in the hypervisor controls the eSwitch via the PF.
18.
Comprehensive OpenStack Integration
Integrated with major OpenStack distributions, in-box
Neutron ML2 support for mixed environments (VXLAN, PV, SR-IOV) over Ethernet
Neutron: data-plane acceleration and isolation
iSER and NVMe-oF: accelerating storage access
OpenStack plugins create seamless integration, control, and management
19.
Container Networking Acceleration
Enable RoCE and DPDK networking technologies to accelerate
cloud-native apps and workloads
20.
Containers and Kubernetes Integration
[Diagram] A Mellanox ConnectX adapter with SR-IOV enabled exposes a Physical Function (PF) and Virtual Functions (VF-1, VF-2, VF-3). The SR-IOV CNI plugin and the SR-IOV/RDMA device plugin in Kubernetes/Docker assign each container its own device and network namespace: Container1 gets ibdev=mlx5_1, netdev=eth0, net_ns=1; Container2 gets ibdev=mlx5_2, netdev=eth1, net_ns=2; Container3 gets ibdev=mlx5_3, netdev=eth2, net_ns=3.
Every container/pod has its own IB device (mlx5_1, mlx5_2, mlx5_3)
Isolation is at the driver level
RDMA applications in each container use Verbs directly
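From the user's side, a pod requests an RDMA-capable VF through the device plugin's extended resource, roughly as below. This is a hedged sketch: the resource name (`rdma/hca`) and the image name are illustrative assumptions that depend on the specific device plugin deployed, not fixed values from this deck.

```yaml
# Hypothetical pod spec. The extended-resource name exposed by the
# SR-IOV/RDMA device plugin varies by deployment; "rdma/hca" and the
# image are placeholders, not guaranteed names.
apiVersion: v1
kind: Pod
metadata:
  name: rdma-app
spec:
  containers:
  - name: rdma-app
    image: my-rdma-app:latest   # placeholder image
    resources:
      limits:
        rdma/hca: 1             # request one RDMA-capable VF
```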
21.
All Major Machine Learning Frameworks Support RDMA
TensorFlow: several implementations upstream
Native (verbs): https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/verbs
MPI and Horovod (contributed by Uber), among others
Caffe2 / PyTorch: over MPI or the Gloo library
Microsoft Cognitive Toolkit: native support
NVIDIA NCCL2: native RDMA support in NCCL
22.
TensorFlow with Mellanox RDMA Test Report
System Configuration
8 x x86 servers
4 x NVIDIA P100 per server
Mellanox 100G RDMA network
NVMe drive per server
TensorFlow v1.4
RDMA vs. TCP: Up to 50% Better Performance
Advanced RDMA vs. TCP: Up to 173% Better Performance
Reference Deployment Guide
23.
NVIDIA® DGX-1™ Deep Learning Server
8 x NVIDIA® Tesla® P100/V100 GPUs
5.3 TFLOPS, 16nm FinFET, NVLink
4 x ConnectX®-4 EDR 100G InfiniBand adapters
24.
Mellanox Enables the Most Efficient Machine Learning Platforms
Highest Performance, Scalability and Productivity for Deep Learning