Presentation delivered at LinuxCon China 2017.
Containers are a popular cloud technology with the merits of fast provisioning, high density, and near-native performance. New requirements are arising to use GPUs to accelerate various applications in containers, e.g. media transcoding and machine learning. However, there are still gaps in managing GPUs for such container usages. This presentation first reviews the status of GPU support in both native containers and Intel Clear Containers, then provides an anatomy of the technical gaps and enabling plans in the major areas (orchestration layer, resource isolation, and QoS/performance isolation), and finally reviews our work on GPU virtualization in Clear Containers and our prototype work extending the Docker plugin, cgroups, and the GPU scheduler to close those gaps.
What Makes a Container?
[Diagram: two container architectures compared.
Bare metal containers (e.g. Docker): apps and their libs run directly on the host OS, isolated by namespaces and control groups, on shared platform hardware.
VM containers (e.g. Clear Container): each app and its libs run on a dedicated guest OS inside a lightweight VM on a hypervisor.]
Path to GPU Containers
Bare metal containers: GPU namespaces, GPU control groups
VM containers: GPU virtualization technology
Both: integration with the container runtime
GPU Namespace:
DRM Render Node
[Diagram: the DRM/i915 driver exposes two device nodes per GPU.
Card# node, used by the X server: privileged, mode setting, global gem flink.
RenderD# node, used by compute and media transcode: unprivileged, no mode-setting, off-screen 3D.]
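For reference, this is how the two node types appear on a typical Intel graphics host (minor numbers, timestamps, and group names vary by distribution):

  $ ls -l /dev/dri
  crw-rw---- 1 root video 226,   0 Jun 19 10:00 card0
  crw-rw---- 1 root video 226, 128 Jun 19 10:00 renderD128

Major 226 is the DRM device class; card0 is the privileged node, renderD128 the unprivileged render node.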
GPU Namespace:
DRM Render Node
[Diagram: media transcode, off-screen 3D, and compute library stacks each open the RenderD# node of DRM/i915, with per-stack GPU contexts underneath.]
Sharing happens only via dmabuf (no global gem objects on card0)
No need to create another namespace
Expose DRM render nodes to provide isolated views through the device cgroup (see the sketch below),
e.g. "docker run --device=/dev/dri/renderD128 -it debian"
Per-process hardware contexts in the GPU
Hardware-managed render context switching
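To illustrate the device-cgroup mechanism that docker --device relies on here, a minimal sketch against the cgroup v1 device controller (the Docker cgroup path and container ID are placeholders):

  # Drop the whitelist entirely, then allow only the render node
  # (char device, major 226, minor 128).
  CG=/sys/fs/cgroup/devices/docker/<container-id>
  echo a > $CG/devices.deny
  echo "c 226:128 rwm" > $CG/devices.allow
  # Tasks in this cgroup can now open /dev/dri/renderD128 but not /dev/dri/card0.

Docker performs the equivalent setup internally when --device is passed.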
GPU CGroup
[Diagram: a container resource manager drives per-subsystem cgroup controllers (CPU, memory, blkio, net, and a new GPU controller) across all containers.]
Container-granularity GPU resource control
GPU CGroup
New subsystem: gpu_cgrp_subsys
Control knobs (usage sketch below):
gpu.shares (% share of GPU cycles, work in progress)
gpu.priority (workload priority in the system)
gpu.memory (maximum GPU memory size)
DRM/i915:
Honors 'shares/priority' in the workload scheduler
Enforces the memory limit (optional for UMA graphics)
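A hypothetical usage sketch of these knobs through cgroupfs, assuming the PoC controller is named gpu and mounts like any v1 controller (knob names are from the slide; the values and <pid> are illustrative):

  # Mount the PoC gpu controller and create a per-container group.
  mount -t cgroup -o gpu none /sys/fs/cgroup/gpu
  mkdir /sys/fs/cgroup/gpu/mycontainer
  # Cap GPU memory at 256 MiB and set a workload priority.
  echo $((256*1024*1024)) > /sys/fs/cgroup/gpu/mycontainer/gpu.memory
  echo 2 > /sys/fs/cgroup/gpu/mycontainer/gpu.priority
  # Attach the container's init process.
  echo <pid> > /sys/fs/cgroup/gpu/mycontainer/tasks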
Container Runtime Integration
{
  "linux": {
    "devices": [
      {
        "path": "/dev/dri/renderD128",
        "type": "c",
        "major": 226,
        "minor": 128,
        "fileMode": 432,
        "uid": 0,
        "gid": 0
      }
    ],
    "resources": {
      …
      "gpu": {
        "memory": <max_mem_in_bytes>,
        "prio": <workload_priority>
      }
    }
  }
}
runC: a universal container runtime
Understands the new gpu cgroup
Adds new Linux resources options to config.json for gpu (shown above)
Runtime-tools helper to generate config stubs
New Docker GPU control parameters,
e.g. docker --gpu-mem=xxx --gpu-priority=xxx --gpu-share=xxx …
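To see where this configuration lands end to end, a minimal sketch of driving the extended runC directly (the gpu block is the proposal above, not part of the upstream OCI spec; the bundle path is illustrative):

  # Create an OCI bundle; rootfs must be populated, e.g. from a
  # Docker image export.
  mkdir -p /tmp/bundle/rootfs && cd /tmp/bundle
  runc spec
  # Edit config.json: add the renderD128 device entry and the
  # proposed "resources.gpu" block shown above.
  runc run gpu-demo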
Image Portability
Main challenge: library compatibility
User-level libraries MUST match the underlying kernel/hardware
Introduce a Docker helper for image inspection (sketched below):
Compare the image's libraries with the host environment
Refuse to launch the container on any mismatch
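A minimal sketch of the inspection idea, assuming a Debian-based image and checking one representative library (libdrm); a real helper would walk the whole user-level graphics stack (the image name is a placeholder):

  # Compare the libdrm version inside the image against the host.
  img_ver=$(docker run --rm myimage dpkg-query -W -f '${Version}' libdrm2)
  host_ver=$(dpkg-query -W -f '${Version}' libdrm2)
  if [ "$img_ver" != "$host_ver" ]; then
      echo "libdrm mismatch: image=$img_ver host=$host_ver" >&2
      exit 1   # refuse to launch the container
  fi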
VM GPU Containers
[Diagram: Intel GVT-g in the host DRM/i915 driver carves the physical GPU into multiple vGPUs; each Clear Container (CC) VM runs its own guest OS and media transcode stack on an assigned vGPU.]
Clear Container (CC) with Intel graphics virtualization technology (Intel GVT-g)
Assign a vGPU to each VM container (creation sketch below)
In upstream Linux since v4.10
https://01.org/igvt-g
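For context, GVT-g vGPUs are created through the kernel's mediated-device (mdev) sysfs interface; a sketch, assuming an Intel GPU at PCI address 0000:00:02.0 (the supported type names vary by platform):

  # List the vGPU types this GPU supports.
  ls /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/
  # Create a vGPU instance with a fresh UUID; it can then be passed
  # to the VM via vfio, e.g. QEMU's -device vfio-pci,sysfsdev=...
  UUID=$(uuidgen)
  echo $UUID > /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create

The setup guide linked below walks through the full Clear Container flow.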
VM GPU Containers
Nested containers inside the VM also have GPU access
Needs improvement:
Scalability
Only 8 vGPUs are supported today due to static resource partitioning
Need a more enlightened way to scale further in the short term
Boot time
A full guest graphics driver load takes >1.5s
Optimization to <0.5s is in progress
Status
GPU cgroup PoC
https://github.com/zhenyw/linux
https://github.com/zhenyw/runc
https://github.com/zhenyw/moby
https://github.com/zhenyw/cli
GPU virtualization for Clear Containers
https://clearlinux.org/features/intel%C2%AE-clear-containers
https://github.com/01org/gvt-linux/wiki/Clear-Container-with-GVTg-Setup-Guide