Data scientists use graphics processing units (GPUs) to achieve the highest performance for deep learning training and inference. Managing those hardware resources efficiently, however, is complex and often outside the scope of a data scientist's expertise. OpenShift is the ideal platform for simplifying that complexity by providing powerful abstractions for scalable cloud computing. This session reviews the value of GPUs in data science, how modern deep learning frameworks consume GPU resources, and the operator-based architecture that enables GPUs on OpenShift today.
2. @pdmackinnon
● pmackinn@redhat.com
● Principal Engineer in the Red Hat AI Center of Excellence
● Kubeflow committer since project formation
● Open Data Hub and NVIDIA GPU Operator contributor
● KubeCon, TensorFlow World, GTC, ODSC, OpenShift
Commons, and SCaLE 17x presenter
● Technical Editor for upcoming Kubeflow publication
● Co-author of “Linux Unleashed”
● Thirty years of distributed computing consulting and
engineering experience
3. Agenda
• Data science: data and models
• AI/ML lifecycle: training to inference
• Scalars, vectors, and tensors
• CPU and GPU
• Notebooks and frameworks
• The OpenShift GPU operator “family”
• The components of GPU enablement
• Installation and demo
6. The AI/ML lifecycle
[Diagram, grouped by phase:]
Data: collection, labeling, analysis, transformation, validation, splitting
Training: feature extraction, algorithm selection or development, hyperparameter tuning, model validation
Inference/Serving: data and model in production, monitoring, logging
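The splitting, training, and model-validation steps of the lifecycle can be sketched in a few lines of NumPy; the synthetic dataset and the least-squares model here are illustrative assumptions, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset: y = 3x + noise (stand-in for collected, transformed data)
X = rng.standard_normal((100, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.standard_normal(100)

# Splitting: hold out 20% of the data for validation
split = int(0.8 * len(X))
X_train, X_val = X[:split], X[split:]
y_train, y_val = y[:split], y[split:]

# Training: ordinary least squares stands in for "algorithm selection"
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Model validation: mean squared error on the held-out split
mse = float(np.mean((X_val @ w - y_val) ** 2))
print(f"learned weight={w[0]:.2f}, validation MSE={mse:.4f}")
```

A real pipeline adds the remaining phases (monitoring, logging, serving) around this core loop.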
7. Scalars, vectors, and tensors
Scalar - a real number having magnitude that measures
something: volume, density, speed, energy, mass, time, etc.
Vector - a one-dimensional array of scalars: force, velocity,
momentum, etc.
Tensor - a higher-order algebraic object that could be a scalar, a
vector, a multidimensional array, a multilinear map, etc.
Modern CPUs have advanced instruction sets for vector algebra,
but modern GPUs are built specifically to perform complex
tensor operations with a high degree of parallelism
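The three objects map directly onto array ranks in NumPy; this minimal sketch (my illustration, not from the slides) shows a scalar as a 0-d value, a vector as a 1-d array, and a higher-order tensor as an n-d array.

```python
import numpy as np

scalar = np.float32(9.81)             # 0-d: a single magnitude (e.g. gravity)
vector = np.array([1.0, 0.0, -1.0])   # 1-d: an ordered array of scalars
tensor = np.zeros((2, 3, 3))          # 3-d: a higher-order array

# ndim is the rank: how many indices are needed to address one element
print(scalar.ndim, vector.ndim, tensor.ndim)  # 0 1 3
```

Deep learning frameworks generalize this: a batch of RGB images, for example, is a rank-4 tensor of shape (batch, height, width, channels).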
8. Scalars, vectors, and tensors
How many matrix multiplications can be done in one clock cycle?
Image: https://iq.opengenus.org/
[Chart: roughly 10¹ (scalar) vs. 10⁴ (vector) vs. 10⁵ (tensor) multiply operations per cycle]
9. So, in one clock cycle...
CPU (scalar)
CPU/GPU (vector)
GPU (tensor)
10. Or, DL with real world data...
Object (scalar)
Movement (vector)
Classification, velocity, bearing, and much more (tensor)
11. CPU and GPU
NVIDIA Ampere A100 (GPU)
• 6912 FP32 CUDA cores
• 432 third-generation Tensor Cores
• FP32 -> 19.5 TFLOPS
AMD EPYC 7702 “Rome” (CPU)
• 64 cores / 128 threads
• 2.0 GHz base clock
• FP32 -> 1-2 TFLOPS
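The 19.5 TFLOPS figure can be reproduced with a back-of-envelope calculation; the 1.41 GHz boost clock and the 2 FLOPs per fused multiply-add are assumptions drawn from published A100 specifications, not from the slide itself.

```python
# Peak FP32 throughput ≈ cores × clock × FLOPs per core per cycle
cuda_cores = 6912          # A100 FP32 CUDA cores (from the slide)
boost_clock_hz = 1.41e9    # assumed A100 boost clock
flops_per_cycle = 2        # one fused multiply-add counts as 2 FLOPs

tflops = cuda_cores * boost_clock_hz * flops_per_cycle / 1e12
print(f"{tflops:.1f} TFLOPS")  # ≈ 19.5
```

The same arithmetic for the 64-core CPU at 2.0 GHz, even with wide SIMD units, lands in the single-digit TFLOPS range, which is the gap the slide is illustrating.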
13. Profit
380x speedup over CPU in basic CNN smoke test
(Intel Xeon E5-2686 vs. NVIDIA V100-SXM2-16GB)
14. GPU operators
Special Resource Operator (SRO)
● Community operator
● Reference implementation for other specialized hardware (NIC, FPGA)
● Provided the code basis for the NVIDIA GPU Operator
● Deployed from OperatorHub
NVIDIA GPU Operator
● Certified and supported on OpenShift by NVIDIA and Red Hat
● Can be deployed from the embedded OperatorHub or with Helm
Both operators require Node Feature Discovery (NFD).
NVIDIA also provides GPU Feature Discovery for enhanced labeling.
15. Operator components
• Container runtime toolkit: the NVIDIA GPU Operator supports the Docker and CRI-O container runtimes; this daemonset ensures the correct runtime setup for the GPU hook.
• Driver: a container deployed as a daemonset that holds all of the userspace and kernelspace software needed to make the GPU device work.
• Device plugin: a daemonset that monitors the health and availability of the GPUs on the node; vital for pod scheduling.
• DCGM: NVIDIA Data Center GPU Manager; a node exporter that captures GPU metrics for use by Prometheus.
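Once the DCGM exporter is scraped by Prometheus, GPU utilization can be queried directly; this is a sketch using the `DCGM_FI_DEV_GPU_UTIL` metric exposed by dcgm-exporter (the averaging window is my choice, not from the talk).

```
# Average GPU utilization per node over the last five minutes
avg by (kubernetes_node) (avg_over_time(DCGM_FI_DEV_GPU_UTIL[5m]))
```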
An NFD-applied label can then target nodes with NVIDIA hardware (10de is NVIDIA's PCI vendor ID):
nodeSelector:
  feature.node.kubernetes.io/pci-10de.present: "true"
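With the device plugin advertising the `nvidia.com/gpu` resource, a pod requests a GPU through its resource limits; this spec is a hypothetical example (the pod name and sample image are my assumptions, modeled on NVIDIA's CUDA vector-add sample).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test                 # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: cuda-vector-add
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.2.1  # example image
    resources:
      limits:
        nvidia.com/gpu: 1              # resource advertised by the device plugin
```

The scheduler places this pod only on nodes where the device plugin reports an allocatable GPU, which is why the device plugin is vital for pod scheduling.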