SlideShare ist ein Scribd-Unternehmen logo
1 von 27
Downloaden Sie, um offline zu lesen
The State of Rootless Containers
Aleksa Sarai / SUSE
Akihiro Suda / NTT
@lordcyphar
@_AkihiroSuda_
Who are we?
Aleksa Sarai
• Senior Software Engineer
at SUSE.
• Maintainer of runc and
several other Open
Container Initiative
projects.
Akihiro Suda
• Software engineer at NTT
(the largest telco in Japan)
• Maintainer of Moby
(former Docker Engine),
BuildKit, containerd, and
etc...
Agenda
• What are Rootless Containers? What are they for?
– User Namespaces
– Network Namespaces
– Mount Namespaces
– cgroups
– Current adoption status
• Demo: “Usernetes”
Introduction to Rootless Containers
• Most container runtimes* require root privileges.
– ... and lack sufficient protections against privilege escalation.
• What can you do if you don't have (and can't get) root privileges?
– (Computing clusters in universities for example.)
• Rootless containers are containers that can be created and managed
without privileged codepaths (some caveats apply).
– Requires quite a few kernel technologies, as well as some
userspace tricks...
“The Security Argument”
Another justification is to avoid privileged codepaths entirely:
• No privilege escalation if you never actually have privileges!
docker:CVE-2014-9357 docker:CVE-2015-3629 docker:CVE-2015-3627
• Configuration mistakes cannot escalate privileges above the original
user. docker:CVE-2016-8867
• Path traversal vulnerabilities only affect paths the user can already
access. docker:CVE-2015-3630 k8s:CVE-2017-1002101 k8s:CVE-2017-1002102
docker:CVE-2018-15664
(This is not a panacea, the kernel features we use have had security flaws
in the past -- especially user namespaces. But you can also restrict their
usage inside rootless containers!)
User Namespaces
• The key component of rootless containers.
– Map UIDs/GIDs in the guest to different UIDs/GIDs on the host.
– Unprivileged (on the host) users can have (limited) root inside!
• Root has UID 0 and full capabilities, but obvious restrictions apply.
– Inaccessible files, inserting kernel modules, rebooting, ...
• Unprivileged users can map only their own UID/GID (to itself or root).
– We need something better to be able to use package managers.
User Namespaces
• To allow multi-user mappings, shadow-utils now provides newuidmap
and newgidmap (packaged by most distributions).
– SETUID binaries writing mappings configured in /etc/sub[ug]id
/etc/subuid:
1000:420000:65536
/proc/42/uid_map:
0 1000 1
1 420000 65536
Provided by the admin (real root)
User can configure map UIDs after
unsharing a user namespace
User Namespaces
Problems:
• SETUID binary can be dangerous
– newuidmap & newgidmap had two CVEs so far:
• CVE-2016-6252 (CVSS v3: 7.8): integer overflow issue
• CVE-2018-7169 (CVSS v3: 5.3): supplementary GID issue
• Hard to maintain subuid & subgid
– Having 64K sub-IDs should be ok for most cases, but to allow
nesting user namespaces, an enormous number of sub-IDs would
be needed
• Potential sub-ID (up to 4G entries) starvation, especially in
LDAP environments with many users
User Namespaces
Alternative way: Single-mapping mode + Ptrace + Xattr
• Single-mapping mode does not require newuidmap/newgidmap
• Ptrace can emulate fake sub-UIDs/sub-GIDs
– No need to hook all syscalls (unlike gVisor)
– Seccomp could be used as well in future
• Xattr (extended file attributes) can be used for persistent chown(2)
emulation (see user.rootlesscontainers).
Free from potential newuidmap/newgidmap CVEs
• But slow and no real isolation across sub-UIDs/sub-GIDs
• Almost adequate for image building purpose, but not panacea
Network Namespaces
An unprivileged user can create network namespaces by acquiring the root
in a user namespace, but cannot set up the veth pair across the parent and
the child (i.e. No internet connection)
• Note: isolating network namespace is not mandatory (but no iptables, bridges, no namespaced abstract
UNIX sockets)
The Internet
Host (“parent”)
UserNS + NetNS (“child”)
NetNS NetNS
Network Namespaces
Prior work: LXC uses SETUID binary (lxc-user-nic) for setting up the
veth pair across the parent and the child
Problem: SETUID binary can be dangerous!
• CVE-2017-5985 (CVSS v3: 3.3): netns privilege escalation
• CVE-2018-6556 (NEW! disclosure: 8/10/2018): arbitrary file open(2)
Network Namespaces
Our approach: use usermode network (“Slirp”) with a TAP device
• Completely unprivileged
The Internet
Host
UserNS + NetNS
NetNS NetNS
TAP
“Slirp” TAPFD
“sendfd” (SCM_RIGHTS cmsg)
Network Namespaces
Benchmark of several “Slirp” implementations:
• slirp4netns (our implementation based on QEMU) is the fastest because
it avoids copying packets across the namespaces
MTU=1500 MTU=4000 MTU=16384 MTU=65520
vde_plug 763 Mbps Unsupported Unsupported Unsupported
VPNKit 514 Mbps 526 Mbps 540 Mbps Unsupported
slirp4netns 1.07 Gbps 2.78 Gbps 4.55 Gbps 9.21 Gbps
cf. rootful veth 52.1 Gbps 45.4 Gbps 43.6 Gbps 51.5 Gbps
Benchmark: iperf3 (netns -> host), measured on Travis CI
See rootless-containers/rootlesskit#12
Network Namespaces
Setting up /etc/resolv.conf (without chroot) is mess…
• resolv.conf may point to 127.0.0.X (for systemd-resolved /
dnsmasq)
• But 127.0.0.X DNS is unaccessible from network namespaces
• We can use bind-mount for replacing resolv.conf, but it is often
forcibly unmounted by systemd-resolved / NetworkManager
Solution: isolate /etc
• Mount an empty tmpfs on /etc
• Create the new resolv.conf on the new /etc
• Create symlinks for the real /etc/*, except resolv.conf
Root Filesystems
Your container root filesystem has to live somewhere. Many filesystem
features used by “rootful” container runtimes aren’t available.
• Ubuntu allows overlayfs in a user namespace, but this isn't supported
upstream (due to security concerns).
• Btrfs allows unprivileged subvolume management, but requires
privileges to set it up beforehand.
• Devicemapper is completely locked away from us.
Root Filesystems
A “simple” work-around is to just extract images to a directory!
• It works … but people want storage deduplication.
Alternatives:
• Reflinks to a "known good" extracted image (inode exhaustion).
– (Can use on XFS, btrfs, ... but not ext4 family.)
• Unprivileged userspace overlayfs using FUSE (Linux >=4.18).
(Container images themselves have significant flaws as well.)
cgroups
/sys/fs/cgroup is a roadblock to many features we want in rootless
containers (accounting, pause and resume, even getting a list of PIDs!).
• By default completely owned by root (and managed by systemd).
There are a variety of workarounds, with various downsides:
• cgroup namespaces (with nsdelegate) only work in cgroupv2.
• LXC’s pam_cgfs requires installation of a PAM module (and only works
for logged-in users).
Current adoption
status
runc
Fully supported since 1.0.0-rc4 (merged March 2017).
• Some minor features don’t work because of outside restrictions.
• Originally only supported completely-unprivileged (no funny
business) mode.
With 1.0.0-rc5, it supports “partially privileged” mode:
• /sys/fs/cgroups can be used if they are set up to be writable.
• Multi-user mappings are supported if they are set up with
/etc/sub[ug]id.
CLONE_NEWCGROUP still not supported (but nsdelegate is v2-only).
umoci and orca-build
umoci is the original generic OCI image manipulation tool.
• https://github.com/openSUSE/umoci
• Supports extraction (unpack) and layer generation (repack).
• It has supported rootless mode since the beginning.
– Emulates CAP_DAC_OVERRIDE with recursive chmod.
– Supports persistent xattr-based chown(2) emulation.
orca-build was one of the first dameon-less OCI (Dockerfile) builders.
• Built on top of umoci, skopeo, and runc.
• Supports rootless building, and is only 500 lines of Python.
• Currently have plans to merge into umoci as a contrib/ wrapper.
BuildKit and img
• BuildKit: next-generation backend for `docker build`
– Integrated to Docker since v18.06, but can be also used as a
standalone daemon, with support for the rootless mode
– Uses the host network namespace at the moment
• Not a huge problem when BuildKit itself is containerized
– Rootless BuildKit has been used in OpenFaaS cloud
• img: rootless and daemonless image builder based on BuildKit, by
Jessie Frazelle
– Same as BuildKit but daemonless
Kaniko
• Google’s unprivileged container image builder
• Different from our approach
– Kaniko itself needs to be executed in a container (without
--privileged)
– Dockerfile RUN instructions are executed without creating nested
containers inside the Kaniko container
• A RUN instruction gains the root in the Kaniko container
• Seems inappropriate for malicious Dockerfiles due to the lack of isolation
– Potential cloud credential leakage: #106
Docker (Moby) & Podman
• Docker / Moby
– Rootless mode is being proposed: #37375
– Supports both slirp4netns and VPNKit for network isolation
– Even Swarm-mode works! (except overlay NW atm)
• Podman: Red Hat’s daemonless replacement for docker
– Already supports rootless mode
– Uses slirp4netns (Thanks Giuseppe Scrivano!)
Kubernetes & CRI runtimes
• kubelet, kube-proxy, and dockershim require a bunch of hacks for
running without cgroups and sysctl
– No hack needed for kube-apiserver and kube-scheduler
– POC available; Planning to propose KEP to SIG-node soon
• Alternative CRI runtimes:
– CRI-O: Already supports rootless mode
– containerd: rootless mode is on plan
• TODO: stability improvement & multi-node network
“Usernetes”
Experimental binary distribution of rootless Moby (Docker), CRI-O and
Kubernetes, installable under $HOME without mess
https://github.com/rootless-containers/usernetes
$ tar xjvf usernetes-x86_64.tbz
$ cd usernetes
$ ./run.sh
$ ./kubectl.sh run -it --image..
Demo: “Usernetes”
The State of Rootless Containers

Weitere ähnliche Inhalte

Was ist angesagt?

Docker and OpenStack Boston Meetup
Docker and OpenStack Boston MeetupDocker and OpenStack Boston Meetup
Docker and OpenStack Boston Meetup
Kamesh Pemmaraju
 

Was ist angesagt? (20)

ISC HPCW talks
ISC HPCW talksISC HPCW talks
ISC HPCW talks
 
[KubeCon NA 2020] containerd: Rootless Containers 2020
[KubeCon NA 2020] containerd: Rootless Containers 2020[KubeCon NA 2020] containerd: Rootless Containers 2020
[KubeCon NA 2020] containerd: Rootless Containers 2020
 
[DockerCon 2019] Hardening Docker daemon with Rootless mode
[DockerCon 2019] Hardening Docker daemon with Rootless mode[DockerCon 2019] Hardening Docker daemon with Rootless mode
[DockerCon 2019] Hardening Docker daemon with Rootless mode
 
[Paris Container Day 2021] nerdctl: yet another Docker & Docker Compose imple...
[Paris Container Day 2021] nerdctl: yet another Docker & Docker Compose imple...[Paris Container Day 2021] nerdctl: yet another Docker & Docker Compose imple...
[Paris Container Day 2021] nerdctl: yet another Docker & Docker Compose imple...
 
[KubeConEU] Building images efficiently and securely on Kubernetes with BuildKit
[KubeConEU] Building images efficiently and securely on Kubernetes with BuildKit[KubeConEU] Building images efficiently and securely on Kubernetes with BuildKit
[KubeConEU] Building images efficiently and securely on Kubernetes with BuildKit
 
[KubeCon EU 2020] containerd Deep Dive
[KubeCon EU 2020] containerd Deep Dive[KubeCon EU 2020] containerd Deep Dive
[KubeCon EU 2020] containerd Deep Dive
 
[FOSDEM 2020] Lazy distribution of container images
[FOSDEM 2020] Lazy distribution of container images[FOSDEM 2020] Lazy distribution of container images
[FOSDEM 2020] Lazy distribution of container images
 
SCALE 2011 Deploying OpenStack with Chef
SCALE 2011 Deploying OpenStack with ChefSCALE 2011 Deploying OpenStack with Chef
SCALE 2011 Deploying OpenStack with Chef
 
Building images efficiently and securely on Kubernetes with BuildKit
Building images efficiently and securely on Kubernetes with BuildKitBuilding images efficiently and securely on Kubernetes with BuildKit
Building images efficiently and securely on Kubernetes with BuildKit
 
Introduction and Deep Dive Into Containerd
Introduction and Deep Dive Into ContainerdIntroduction and Deep Dive Into Containerd
Introduction and Deep Dive Into Containerd
 
Docker engine - Indroduc
Docker engine - IndroducDocker engine - Indroduc
Docker engine - Indroduc
 
Upstate DevOps - Containers 101 - March 28, 2019
Upstate DevOps - Containers 101 - March 28, 2019Upstate DevOps - Containers 101 - March 28, 2019
Upstate DevOps - Containers 101 - March 28, 2019
 
App container rkt
App container rktApp container rkt
App container rkt
 
LXC, Docker, and the future of software delivery | LinuxCon 2013
LXC, Docker, and the future of software delivery | LinuxCon 2013LXC, Docker, and the future of software delivery | LinuxCon 2013
LXC, Docker, and the future of software delivery | LinuxCon 2013
 
Docker open stack boston
Docker open stack bostonDocker open stack boston
Docker open stack boston
 
Docker and OpenStack Boston Meetup
Docker and OpenStack Boston MeetupDocker and OpenStack Boston Meetup
Docker and OpenStack Boston Meetup
 
Docker, Docker Swarm mangement tool - Gorae
Docker, Docker Swarm mangement tool - GoraeDocker, Docker Swarm mangement tool - Gorae
Docker, Docker Swarm mangement tool - Gorae
 
Secure container: Kata container and gVisor
Secure container: Kata container and gVisorSecure container: Kata container and gVisor
Secure container: Kata container and gVisor
 
Containers Sandboxing (KubeCon 2018)
Containers Sandboxing (KubeCon 2018)Containers Sandboxing (KubeCon 2018)
Containers Sandboxing (KubeCon 2018)
 
Introduction to Project atomic (CentOS Dojo Bangalore)
Introduction to Project atomic (CentOS Dojo Bangalore)Introduction to Project atomic (CentOS Dojo Bangalore)
Introduction to Project atomic (CentOS Dojo Bangalore)
 

Ähnlich wie The State of Rootless Containers

Introduction to NetBSD kernel
Introduction to NetBSD kernelIntroduction to NetBSD kernel
Introduction to NetBSD kernel
Mahendra M
 

Ähnlich wie The State of Rootless Containers (20)

Podman rootless containers
Podman rootless containersPodman rootless containers
Podman rootless containers
 
Commit to excellence - Java in containers
Commit to excellence - Java in containersCommit to excellence - Java in containers
Commit to excellence - Java in containers
 
Docking postgres
Docking postgresDocking postgres
Docking postgres
 
Docker Security
Docker SecurityDocker Security
Docker Security
 
Ceph in the GRNET cloud stack
Ceph in the GRNET cloud stackCeph in the GRNET cloud stack
Ceph in the GRNET cloud stack
 
[Podman Special Event] Kubernetes in Rootless Podman
[Podman Special Event] Kubernetes in Rootless Podman[Podman Special Event] Kubernetes in Rootless Podman
[Podman Special Event] Kubernetes in Rootless Podman
 
Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)
 
Docker and kubernetes
Docker and kubernetesDocker and kubernetes
Docker and kubernetes
 
Introducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 SupercomputerIntroducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 Supercomputer
 
Stateless Hypervisors at Scale
Stateless Hypervisors at ScaleStateless Hypervisors at Scale
Stateless Hypervisors at Scale
 
Unikernels: Rise of the Library Hypervisor
Unikernels: Rise of the Library HypervisorUnikernels: Rise of the Library Hypervisor
Unikernels: Rise of the Library Hypervisor
 
Introduction to NetBSD kernel
Introduction to NetBSD kernelIntroduction to NetBSD kernel
Introduction to NetBSD kernel
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
Unikernels: the rise of the library hypervisor in MirageOS
Unikernels: the rise of the library hypervisor in MirageOSUnikernels: the rise of the library hypervisor in MirageOS
Unikernels: the rise of the library hypervisor in MirageOS
 
Docker_AGH_v0.1.3
Docker_AGH_v0.1.3Docker_AGH_v0.1.3
Docker_AGH_v0.1.3
 
Containerization using docker and its applications
Containerization using docker and its applicationsContainerization using docker and its applications
Containerization using docker and its applications
 
Containerization using docker and its applications
Containerization using docker and its applicationsContainerization using docker and its applications
Containerization using docker and its applications
 
Docker, Linux Containers, and Security: Does It Add Up?
Docker, Linux Containers, and Security: Does It Add Up?Docker, Linux Containers, and Security: Does It Add Up?
Docker, Linux Containers, and Security: Does It Add Up?
 
Настройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'aНастройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'a
 

Mehr von Akihiro Suda

Mehr von Akihiro Suda (20)

20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
20240321 [KubeCon EU Pavilion] Lima.pdf_
20240321 [KubeCon EU Pavilion] Lima.pdf_20240321 [KubeCon EU Pavilion] Lima.pdf_
20240321 [KubeCon EU Pavilion] Lima.pdf_
 
20240320 [KubeCon EU Pavilion] containerd.pdf
20240320 [KubeCon EU Pavilion] containerd.pdf20240320 [KubeCon EU Pavilion] containerd.pdf
20240320 [KubeCon EU Pavilion] containerd.pdf
 
20240201 [HPC Containers] Rootless Containers.pdf
20240201 [HPC Containers] Rootless Containers.pdf20240201 [HPC Containers] Rootless Containers.pdf
20240201 [HPC Containers] Rootless Containers.pdf
 
[KubeConNA2023] Lima pavilion
[KubeConNA2023] Lima pavilion[KubeConNA2023] Lima pavilion
[KubeConNA2023] Lima pavilion
 
[KubeConNA2023] containerd pavilion
[KubeConNA2023] containerd pavilion[KubeConNA2023] containerd pavilion
[KubeConNA2023] containerd pavilion
 
[DockerConハイライト] OpenPubKeyによるイメージの署名と検証.pdf
[DockerConハイライト] OpenPubKeyによるイメージの署名と検証.pdf[DockerConハイライト] OpenPubKeyによるイメージの署名と検証.pdf
[DockerConハイライト] OpenPubKeyによるイメージの署名と検証.pdf
 
[CNCF TAG-Runtime] Usernetes Gen2
[CNCF TAG-Runtime] Usernetes Gen2[CNCF TAG-Runtime] Usernetes Gen2
[CNCF TAG-Runtime] Usernetes Gen2
 
[DockerCon 2023] Reproducible builds with BuildKit for software supply chain ...
[DockerCon 2023] Reproducible builds with BuildKit for software supply chain ...[DockerCon 2023] Reproducible builds with BuildKit for software supply chain ...
[DockerCon 2023] Reproducible builds with BuildKit for software supply chain ...
 
The internals and the latest trends of container runtimes
The internals and the latest trends of container runtimesThe internals and the latest trends of container runtimes
The internals and the latest trends of container runtimes
 
[KubeConEU2023] Lima pavilion
[KubeConEU2023] Lima pavilion[KubeConEU2023] Lima pavilion
[KubeConEU2023] Lima pavilion
 
[KubeConEU2023] containerd pavilion
[KubeConEU2023] containerd pavilion[KubeConEU2023] containerd pavilion
[KubeConEU2023] containerd pavilion
 
[Container Plumbing Days 2023] Why was nerdctl made?
[Container Plumbing Days 2023] Why was nerdctl made?[Container Plumbing Days 2023] Why was nerdctl made?
[Container Plumbing Days 2023] Why was nerdctl made?
 
[FOSDEM2023] Bit-for-bit reproducible builds with Dockerfile
[FOSDEM2023] Bit-for-bit reproducible builds with Dockerfile[FOSDEM2023] Bit-for-bit reproducible builds with Dockerfile
[FOSDEM2023] Bit-for-bit reproducible builds with Dockerfile
 
[CNCF TAG-Runtime 2022-10-06] Lima
[CNCF TAG-Runtime 2022-10-06] Lima[CNCF TAG-Runtime 2022-10-06] Lima
[CNCF TAG-Runtime 2022-10-06] Lima
 
[KubeCon EU 2022] Running containerd and k3s on macOS
[KubeCon EU 2022] Running containerd and k3s on macOS[KubeCon EU 2022] Running containerd and k3s on macOS
[KubeCon EU 2022] Running containerd and k3s on macOS
 
Dockerからcontainerdへの移行
Dockerからcontainerdへの移行Dockerからcontainerdへの移行
Dockerからcontainerdへの移行
 
[Docker Tokyo #35] Docker 20.10
[Docker Tokyo #35] Docker 20.10[Docker Tokyo #35] Docker 20.10
[Docker Tokyo #35] Docker 20.10
 
DockerとPodmanの比較
DockerとPodmanの比較DockerとPodmanの比較
DockerとPodmanの比較
 
[Container Runtime Meetup] runc & User Namespaces
[Container Runtime Meetup] runc & User Namespaces[Container Runtime Meetup] runc & User Namespaces
[Container Runtime Meetup] runc & User Namespaces
 

Kürzlich hochgeladen

%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
masabamasaba
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 

Kürzlich hochgeladen (20)

%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 

The State of Rootless Containers

  • 1. The State of Rootless Containers Aleksa Sarai / SUSE Akihiro Suda / NTT @lordcyphar @_AkihiroSuda_
  • 2. Who are we? Aleksa Sarai • Senior Software Engineer at SUSE. • Maintainer of runc and several other Open Container Initiative projects. Akihiro Suda • Software engineer at NTT (the largest telco in Japan) • Maintainer of Moby (former Docker Engine), BuildKit, containerd, and etc...
  • 3. Agenda • What are Rootless Containers? What are they for? – User Namespaces – Network Namespaces – Mount Namespaces – cgroups – Current adoption status • Demo: “Usernetes”
  • 4. Introduction to Rootless Containers • Most container runtimes* require root privileges. – ... and lack sufficient protections against privilege escalation. • What can you do if you don't have (and can't get) root privileges? – (Computing clusters in universities for example.) • Rootless containers are containers that can be created and managed without privileged codepaths (some caveats apply). – Requires quite a few kernel technologies, as well as some userspace tricks...
  • 5. “The Security Argument” Another justification is to avoid privileged codepaths entirely: • No privilege escalation if you never actually have privileges! docker:CVE-2014-9357 docker:CVE-2015-3629 docker:CVE-2015-3627 • Configuration mistakes cannot escalate privileges above the original user. docker:CVE-2016-8867 • Path traversal vulnerabilities only affect paths the user can already access. docker:CVE-2015-3630 k8s:CVE-2017-1002101 k8s:CVE-2017-1002102 docker:CVE-2018-15664 (This is not a panacea, the kernel features we use have had security flaws in the past -- especially user namespaces. But you can also restrict their usage inside rootless containers!)
  • 6. User Namespaces • The key component of rootless containers. – Map UIDs/GIDs in the guest to different UIDs/GIDs on the host. – Unprivileged (on the host) users can have (limited) root inside! • Root has UID 0 and full capabilities, but obvious restrictions apply. – Inaccessible files, inserting kernel modules, rebooting, ... • Unprivileged users can map only their own UID/GID (to itself or root). – We need something better to be able to use package managers.
  • 7. User Namespaces • To allow multi-user mappings, shadow-utils now provides newuidmap and newgidmap (packaged by most distributions). – SETUID binaries writing mappings configured in /etc/sub[ug]id /etc/subuid: 1000:420000:65536 /proc/42/uid_map: 0 1000 1 1 420000 65536 Provided by the admin (real root) User can configure map UIDs after unsharing a user namespace
  • 8. User Namespaces Problems: • SETUID binary can be dangerous – newuidmap & newgidmap had two CVEs so far: • CVE-2016-6252 (CVSS v3: 7.8): integer overflow issue • CVE-2018-7169 (CVSS v3: 5.3): supplementary GID issue • Hard to maintain subuid & subgid – Having 64K sub-IDs should be ok for most cases, but to allow nesting user namespaces, an enormous number of sub-IDs would be needed • Potential sub-ID (up to 4G entries) starvation, especially in LDAP environments with many users
  • 9. User Namespaces Alternative way: Single-mapping mode + Ptrace + Xattr • Single-mapping mode does not require newuidmap/newgidmap • Ptrace can emulate fake sub-UIDs/sub-GIDs – No need to hook all syscalls (unlike gVisor) – Seccomp could be used as well in future • Xattr (extended file attributes) can be used for persistent chown(2) emulation (see user.rootlesscontainers). Free from potential newuidmap/newgidmap CVEs • But slow and no real isolation across sub-UIDs/sub-GIDs • Almost adequate for image building purpose, but not panacea
  • 10. Network Namespaces An unprivileged user can create network namespaces by acquiring the root in a user namespace, but cannot set up the veth pair across the parent and the child (i.e. No internet connection) • Note: isolating network namespace is not mandatory (but no iptables, bridges, no namespaced abstract UNIX sockets) The Internet Host (“parent”) UserNS + NetNS (“child”) NetNS NetNS
  • 11. Network Namespaces Prior work: LXC uses SETUID binary (lxc-user-nic) for setting up the veth pair across the parent and the child Problem: SETUID binary can be dangerous! • CVE-2017-5985 (CVSS v3: 3.3): netns privilege escalation • CVE-2018-6556 (NEW! disclosure: 8/10/2018): arbitrary file open(2)
  • 12. Network Namespaces Our approach: use usermode network (“Slirp”) with a TAP device • Completely unprivileged The Internet Host UserNS + NetNS NetNS NetNS TAP “Slirp” TAPFD “sendfd” (SCM_RIGHTS cmsg)
  • 13. Network Namespaces Benchmark of several “Slirp” implementations: • slirp4netns (our implementation based on QEMU) is the fastest because it avoids copying packets across the namespaces MTU=1500 MTU=4000 MTU=16384 MTU=65520 vde_plug 763 Mbps Unsupported Unsupported Unsupported VPNKit 514 Mbps 526 Mbps 540 Mbps Unsupported slirp4netns 1.07 Gbps 2.78 Gbps 4.55 Gbps 9.21 Gbps cf. rootful veth 52.1 Gbps 45.4 Gbps 43.6 Gbps 51.5 Gbps Benchmark: iperf3 (netns -> host), measured on Travis CI See rootless-containers/rootlesskit#12
  • 14. Network Namespaces Setting up /etc/resolv.conf (without chroot) is mess… • resolv.conf may point to 127.0.0.X (for systemd-resolved / dnsmasq) • But 127.0.0.X DNS is unaccessible from network namespaces • We can use bind-mount for replacing resolv.conf, but it is often forcibly unmounted by systemd-resolved / NetworkManager Solution: isolate /etc • Mount an empty tmpfs on /etc • Create the new resolv.conf on the new /etc • Create symlinks for the real /etc/*, except resolv.conf
  • 15. Root Filesystems Your container root filesystem has to live somewhere. Many filesystem features used by “rootful” container runtimes aren’t available. • Ubuntu allows overlayfs in a user namespace, but this isn't supported upstream (due to security concerns). • Btrfs allows unprivileged subvolume management, but requires privileges to set it up beforehand. • Devicemapper is completely locked away from us.
  • 16. Root Filesystems A “simple” work-around is to just extract images to a directory! • It works … but people want storage deduplication. Alternatives: • Reflinks to a "known good" extracted image (inode exhaustion). – (Can use on XFS, btrfs, ... but not ext4 family.) • Unprivileged userspace overlayfs using FUSE (Linux >=4.18). (Container images themselves have significant flaws as well.)
  • 17. cgroups /sys/fs/cgroup is a roadblock to many features we want in rootless containers (accounting, pause and resume, even getting a list of PIDs!). • By default completely owned by root (and managed by systemd). There are a variety of workarounds, with various downsides: • cgroup namespaces (with nsdelegate) only work in cgroupv2. • LXC’s pam_cgfs requires installation of a PAM module (and only works for logged-in users).
  • 19. runc Fully supported since 1.0.0-rc4 (merged March 2017). • Some minor features don’t work because of outside restrictions. • Originally only supported completely-unprivileged (no funny business) mode. With 1.0.0-rc5, it supports “partially privileged” mode: • /sys/fs/cgroups can be used if they are set up to be writable. • Multi-user mappings are supported if they are set up with /etc/sub[ug]id. CLONE_NEWCGROUP still not supported (but nsdelegate is v2-only).
  • 20. umoci and orca-build umoci is the original generic OCI image manipulation tool. • https://github.com/openSUSE/umoci • Supports extraction (unpack) and layer generation (repack). • It has supported rootless mode since the beginning. – Emulates CAP_DAC_OVERRIDE with recursive chmod. – Supports persistent xattr-based chown(2) emulation. orca-build was one of the first dameon-less OCI (Dockerfile) builders. • Built on top of umoci, skopeo, and runc. • Supports rootless building, and is only 500 lines of Python. • Currently have plans to merge into umoci as a contrib/ wrapper.
  • 21. BuildKit and img • BuildKit: next-generation backend for `docker build` – Integrated to Docker since v18.06, but can be also used as a standalone daemon, with support for the rootless mode – Uses the host network namespace at the moment • Not a huge problem when BuildKit itself is containerized – Rootless BuildKit has been used in OpenFaaS cloud • img: rootless and daemonless image builder based on BuildKit, by Jessie Frazelle – Same as BuildKit but daemonless
  • 22. Kaniko • Google’s unprivileged container image builder • Different from our approach – Kaniko itself needs to be executed in a container (without --privileged) – Dockerfile RUN instructions are executed without creating nested containers inside the Kaniko container • A RUN instruction gains the root in the Kaniko container • Seems inappropriate for malicious Dockerfiles due to the lack of isolation – Potential cloud credential leakage: #106
  • 23. Docker (Moby) & Podman • Docker / Moby – Rootless mode is being proposed: #37375 – Supports both slirp4netns and VPNKit for network isolation – Even Swarm-mode works! (except overlay NW atm) • Podman: Red Hat’s daemonless replacement for docker – Already supports rootless mode – Uses slirp4netns (Thanks Giuseppe Scrivano!)
  • 24. Kubernetes & CRI runtimes • kubelet, kube-proxy, and dockershim require a bunch of hacks for running without cgroups and sysctl – No hack needed for kube-apiserver and kube-scheduler – POC available; Planning to propose KEP to SIG-node soon • Alternative CRI runtimes: – CRI-O: Already supports rootless mode – containerd: rootless mode is on plan • TODO: stability improvement & multi-node network
  • 25. “Usernetes” Experimental binary distribution of rootless Moby (Docker), CRI-O and Kubernetes, installable under $HOME without mess https://github.com/rootless-containers/usernetes $ tar xjvf usernetes-x86_64.tbz $ cd usernetes $ ./run.sh $ ./kubectl.sh run -it --image..