Cgroups, namespaces and beyond:
What are containers made from?
Jérôme🐳🚢📦Petazzoni
Tinkerer Extraordinaire
Who am I?
Jérôme Petazzoni (@jpetazzo)
French software engineer living in California
I have built and scaled the dotCloud PaaS
almost 5 years ago, time flies!
I talk (too much) about containers
first time was maybe in February 2013, at SCALE in L.A.
Proud member of the House of Bash
when annoyed, we replace things with tiny shell scripts
Outline
What is a container?
High level and low level overviews
The building blocks
Namespaces, cgroups, copy-on-write storage
Container runtimes
Docker, LXC, rkt, runC, systemd-nspawn
OpenVZ, Jails, Zones
Hand-crafted, house-blend, artisan containers
Demos!
What is a container?
📦
High level approach: lightweight VM
I can get a shell on it
through SSH or otherwise
It "feels" like a VM
own process space
own network interface
can run stuff as root
can install packages
can run services
can mess up routing, iptables …
Low level approach: chroot on steroids
It's not quite like a VM:
uses the host kernel
can't boot a different OS
can't have its own modules
doesn't need init as PID 1
doesn't need syslogd, cron...
It's just normal processes on the host machine
contrast with VMs which are opaque
How are they implemented?
Let's look in the kernel source!
Go to LXR
Look for "LXC" → zero result
Look for "container" → 1000+ results
Almost all of them are about data structures
or other unrelated concepts like “ACPI containers”
There are some references to “our” containers
but only in the documentation
??? Are containers even in the kernel ???
??? Do containers even actually EXIST ???
The building blocks:
control groups
Control groups
Resource metering and limiting
memory
CPU
block I/O
network*
Device node (/dev/*) access control
Crowd control
*With cooperation from iptables/tc; more on that later
Generalities
Each subsystem has a hierarchy (tree)
separate hierarchies for CPU, memory, block I/O…
Hierarchies are independent
the trees for e.g. memory and CPU can be different
Each process is in a node in each hierarchy
think of each hierarchy as a different dimension or axis
Each hierarchy starts with 1 node (the root)
Initially, all processes start at the root node*
Each node = group of processes
sharing the same resources
*It's a bit more subtle; more on that later
Example
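A minimal sketch of "each process is in a node in each hierarchy", assuming the cgroup v1 layout of /proc/<pid>/cgroup, where each line reads hierarchy-ID:controller-list:path. The sample text and the function name are made up for illustration; they are not taken from the talk.

```python
def parse_proc_cgroup(text):
    """Map each controller name to the cgroup path the process sits in.

    Expects cgroup v1 lines of the form "hierarchy-ID:controllers:path".
    """
    membership = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        _hier_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            if controller:  # an empty controller list carries no v1 controller
                membership[controller] = path
    return membership


# Illustrative sample (not captured from a real machine).
SAMPLE = """\
4:memory:/somegroup/subcgroup
3:cpu,cpuacct:/
2:blkio:/somegroup
"""

if __name__ == "__main__":
    for controller, path in sorted(parse_proc_cgroup(SAMPLE).items()):
        print(controller, path)
```

The same process sits in a different node for memory, CPU and block I/O, which is the "different dimension or axis" picture.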
Memory cgroup: accounting
Keeps track of pages used by each group:
file (read/write/mmap from block devices)
anonymous (stack, heap, anonymous mmap)
active (recently accessed)
inactive (candidate for eviction)
Each page is “charged” to a group
Pages can be shared across multiple groups
e.g. multiple processes reading from the same files
when pages are shared, only one group “pays” for a page
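The four page categories can be read back from a group's memory.stat file. The sketch below assumes the cgroup v1 counter names (active_anon, inactive_anon, active_file, inactive_file); the sample numbers are invented.

```python
def parse_memory_stat(text):
    """Turn "name value" lines from memory.stat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def summarize(stats):
    """Regroup the counters along the two axes: file/anonymous and active/inactive."""
    get = lambda key: stats.get(key, 0)
    return {
        "file": get("active_file") + get("inactive_file"),
        "anonymous": get("active_anon") + get("inactive_anon"),
        "active": get("active_anon") + get("active_file"),
        "inactive": get("inactive_anon") + get("inactive_file"),
    }


# Invented sample values, for illustration only.
SAMPLE_STAT = """\
active_anon 4096
inactive_anon 8192
active_file 12288
inactive_file 16384
"""

if __name__ == "__main__":
    print(summarize(parse_memory_stat(SAMPLE_STAT)))
```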
Memory cgroup: limits
Each group can have its own limits
limits are optional
two kinds of limits: soft and hard limits
Soft limits are not enforced
they influence reclaim under memory pressure
Hard limits will trigger a per-group OOM killer
Limits can be set for different kinds of memory
physical memory
kernel memory
total memory
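A hedged sketch of how these limits are set, assuming the cgroup v1 file names and assuming that "total memory" means memory plus swap (the memsw file). The helper name is hypothetical, and the demo writes into a temporary directory instead of a real cgroup so it runs without root.

```python
import os
import tempfile

# cgroup v1 memory controller files (assumption: "total" = memory + swap).
LIMIT_FILES = {
    "physical": "memory.limit_in_bytes",
    "soft": "memory.soft_limit_in_bytes",
    "kernel": "memory.kmem.limit_in_bytes",
    "total": "memory.memsw.limit_in_bytes",
}


def set_memory_limit(group_dir, kind, limit_bytes):
    """Write a limit into the matching control file and return its path."""
    path = os.path.join(group_dir, LIMIT_FILES[kind])
    with open(path, "w") as control_file:
        control_file.write(str(limit_bytes))
    return path


if __name__ == "__main__":
    # Stand-in for /sys/fs/cgroup/memory/somegroup on a real host.
    with tempfile.TemporaryDirectory() as fake_group:
        written = set_memory_limit(fake_group, "physical", 256 * 1024 * 1024)
        print(os.path.basename(written), open(written).read())
```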
Memory cgroup: avoiding OOM killer
Do you like it when the kernel terminates random
processes because something unrelated ate all
the memory in the system?
Set up oom-notifier
Then, when the hard limit is exceeded:
freeze all processes in the group
notify user space (instead of going on a rampage)
we can kill processes, raise limits, migrate containers ...
when we're in the clear again, unfreeze the group
Memory cgroup: tricky details
Each time the kernel gives a page to a process,
or takes it away, it updates the counters
This adds some overhead
This cannot be enabled/disabled per process
it has to be done at boot time
When multiple groups use the same page,
only the first one is “charged”
but if it stops using it, the charge is moved to another group
HugeTLB cgroup
Controls the amount of huge pages
usable by a process
safe to ignore if you don’t know what huge pages are
Cpu cgroup
Keeps track of user/system CPU time
Keeps track of usage per CPU
Allows setting weights
Can't set CPU limits
OK, let's say you give N%
then the CPU throttles to a lower clock speed
now what?
same if you give a time slot
instructions? their exec speed varies wildly
¯\_(ツ)_/¯
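Weights only say how CPU time is split when groups compete for it. A small worked example, assuming sibling groups that are all busy at once and weights expressed the way cgroup v1 cpu.shares does (default 1024); the group names are hypothetical.

```python
def cpu_fractions(shares):
    """Return the fraction of CPU time each group gets under full contention."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one group needs a positive weight")
    return {name: weight / total for name, weight in shares.items()}


if __name__ == "__main__":
    # Hypothetical groups; 1024 is the default cpu.shares value.
    for name, fraction in cpu_fractions({"web": 1024, "batch": 512, "cron": 512}).items():
        print(f"{name}: {fraction:.0%}")
```

If "batch" and "cron" go idle, "web" is free to use the whole CPU: a weight is a ratio, not a cap.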
Cpuset cgroup
Pin groups to specific CPU(s)
Reserve CPUs for specific apps
Avoid processes bouncing between CPUs
Also relevant for NUMA systems
Provides extra dials and knobs
per zone memory pressure, process migration costs…
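Pinning is expressed as a CPU list such as "0-3,8,10-11" (the format used by cpuset.cpus). The parser below is a sketch with a hypothetical function name.

```python
def parse_cpu_list(spec):
    """Expand a cpuset-style list like "0-3,8" into sorted CPU numbers."""
    cpus = set()
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


if __name__ == "__main__":
    print(parse_cpu_list("0-3,8,10-11"))
```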
Blkio cgroup
Keeps track of I/Os for each group
per block device
read vs write
sync vs async
Set throttle (limits) for each group
per block device
read vs write
ops vs bytes
Set relative weights for each group
Note: most writes go through the page cache
so classic writes will appear to be unthrottled at first
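A throttle is one line per block device, written as "major:minor limit" into a per-direction, per-unit control file. The sketch assumes the cgroup v1 blkio.throttle.* file names; 8:0 is only an example device number.

```python
# cgroup v1 throttle files, keyed by (direction, unit).
THROTTLE_FILES = {
    ("read", "bytes"): "blkio.throttle.read_bps_device",
    ("write", "bytes"): "blkio.throttle.write_bps_device",
    ("read", "ops"): "blkio.throttle.read_iops_device",
    ("write", "ops"): "blkio.throttle.write_iops_device",
}


def throttle_rule(major, minor, limit):
    """Format the line to write into one of the throttle files."""
    return f"{major}:{minor} {limit}"


if __name__ == "__main__":
    # Limit writes on device 8:0 (example numbers) to 1 MiB per second.
    print(THROTTLE_FILES[("write", "bytes")], "<-", throttle_rule(8, 0, 1024 * 1024))
```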
Net_cls and net_prio cgroup
Automatically set traffic class or priority,
for traffic generated by processes in the group
Only works for egress traffic
Net_cls will assign traffic to a class
class then has to be matched with tc/iptables,
otherwise traffic just flows normally
Net_prio will assign traffic to a priority
priorities are used by queuing disciplines
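The class assigned by net_cls is a single number that packs a tc handle: major in the upper 16 bits, minor in the lower 16. A sketch of that arithmetic, with hypothetical function names:

```python
def classid(major, minor):
    """Pack a tc class handle major:minor into one 32-bit class id."""
    if not (0 <= major <= 0xFFFF and 0 <= minor <= 0xFFFF):
        raise ValueError("major and minor must each fit in 16 bits")
    return (major << 16) | minor


def format_classid(major, minor):
    """Hex string suitable for writing into net_cls.classid."""
    return hex(classid(major, minor))


if __name__ == "__main__":
    # tc handles are written in hex, so class 10:1 is major 0x10, minor 0x1.
    print(format_classid(0x10, 0x1))
```

The number does nothing by itself; a tc filter or iptables rule still has to match on it.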
Devices cgroup
Controls what the group can do on device nodes
Permissions include read/write/mknod
Typical use:
allow /dev/{tty,zero,random,null} ...
deny everything else
A few interesting nodes:
/dev/net/tun (network interface manipulation)
/dev/fuse (filesystems in user space)
/dev/kvm (VMs in containers, yay inception!)
/dev/dri (GPU)
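The "deny everything, then allow a few" pattern can be sketched as a list of writes to devices.deny and devices.allow (cgroup v1). Entries have the form "type major:minor access", where access is any of r, w, m (mknod). The device numbers below are the conventional Linux ones for these nodes; the helper names are hypothetical.

```python
# Conventional Linux character device numbers for the "typical use" nodes.
TYPICAL_DEVICES = {
    "/dev/null": ("c", 1, 3),
    "/dev/zero": ("c", 1, 5),
    "/dev/random": ("c", 1, 8),
    "/dev/tty": ("c", 5, 0),
}


def allow_entry(dev_type, major, minor, access="rwm"):
    """Format one whitelist entry, e.g. "c 1:3 rwm"."""
    return f"{dev_type} {major}:{minor} {access}"


def whitelist(devices=TYPICAL_DEVICES):
    """Return (control file, value) pairs: deny all first, then allow each node."""
    writes = [("devices.deny", "a")]
    for _path, spec in sorted(devices.items()):
        writes.append(("devices.allow", allow_entry(*spec)))
    return writes


if __name__ == "__main__":
    for control_file, value in whitelist():
        print(f"{value!r} -> {control_file}")
```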
Freezer cgroup
Allows freezing/thawing a group of processes
Similar in functionality to SIGSTOP/SIGCONT
Cannot be detected by the processes
that’s how it differs from SIGSTOP/SIGCONT!
Doesn’t impede ptrace/debugging
Specific usecases
cluster batch scheduling
process migration
Subtleties
PID 1 is placed at the root of each hierarchy
New processes start in their parent’s groups
Groups are materialized by pseudo-fs
typically mounted in /sys/fs/cgroup
Groups are created by mkdir in the pseudo-fs
mkdir /sys/fs/cgroup/memory/somegroup/subcgroup
To move a process:
echo $PID > /sys/fs/cgroup/.../tasks
The cgroup wars: systemd vs cgmanager vs ...
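The mkdir and echo steps above, as a sketch in Python. The root directory is a parameter so the demo can run against a temporary directory without root; on a real host it would be /sys/fs/cgroup, where the kernel creates the tasks file for you. Function names are hypothetical.

```python
import os
import tempfile


def create_group(root, controller, *names):
    """Equivalent of: mkdir -p <root>/<controller>/<names...>"""
    path = os.path.join(root, controller, *names)
    os.makedirs(path, exist_ok=True)
    return path


def move_process(group_path, pid):
    """Equivalent of: echo $PID > <group>/tasks"""
    with open(os.path.join(group_path, "tasks"), "w") as tasks:
        tasks.write(f"{pid}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as fake_root:
        group = create_group(fake_root, "memory", "somegroup", "subcgroup")
        move_process(group, os.getpid())
        print(open(os.path.join(group, "tasks")).read().strip())
```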
The building blocks:
namespaces
Namespaces
Provide processes with their own system view
Cgroups limit how much you can use;
namespaces limit what you can see
you can’t use/affect what you can’t see
Multiple namespaces:
pid
net
mnt
uts
ipc
user
Each process is in one namespace of each type
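One way to see "one namespace of each type" is to list the symlinks under /proc/<pid>/ns, whose targets look like "net:[4026531992]". The sketch below parses that target format; the inode number in the test string is only an example, and namespaces_of() needs a Linux /proc to run.

```python
import os
import re

_NS_LINK = re.compile(r"^([a-z_]+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as "net:[4026531992]" into (type, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid="self"):
    """Return {entry name: namespace inode} for one process (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result


if __name__ == "__main__":
    print(parse_ns_link("net:[4026531992]"))
```

Two processes are in the same namespace of a given type exactly when the inode numbers match.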
Pid namespace
Processes within a PID namespace only see
processes in the same PID namespace
Each PID namespace has its own numbering
starting at 1
If PID 1 goes away, the whole namespace is killed
Those namespaces can be nested
A process ends up having multiple PIDs
one per namespace in which it's nested
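The "multiple PIDs" can be observed in the NSpid line of /proc/<pid>/status (available on Linux 4.1 and later), which lists the PID from the outermost visible namespace to the innermost. A sketch with an invented sample:

```python
def parse_nspid(status_text):
    """Return the PIDs from the NSpid line, outermost namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []  # kernels older than 4.1 have no NSpid line


# Invented sample: PID 4242 on the host, PID 1 inside its own PID namespace.
SAMPLE_STATUS = "Name:\tsleep\nPid:\t4242\nNSpid:\t4242\t1\n"

if __name__ == "__main__":
    print(parse_nspid(SAMPLE_STATUS))
```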
Net namespace: in theory
Processes within a given network namespace
get their own private network stack, including:
network interfaces (including lo)
routing tables
iptables rules
sockets (ss, netstat)
You can move a network interface across netns
ip link set dev eth0 netns PID
Net namespace: in practice
Typical use-case:
use veth pairs
(two virtual interfaces acting as a cross-over cable)
eth0 in container network namespace
paired with vethXXX in host network namespace
all the vethXXX are bridged together
(Docker calls the bridge docker0)
But also: the magic of --net container
shared localhost (and more!)
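The veth setup can be sketched as a sequence of ip commands. The function below only builds the argument lists and does not run them (running them needs root). Interface names are placeholders, and the nsenter step is one possible way to rename the peer inside the container, not necessarily what Docker does.

```python
def veth_setup_commands(container_pid, host_if="veth1234",
                        bridge="docker0", container_if="eth0"):
    """Build the commands that wire one container to the host bridge."""
    pid = str(container_pid)
    peer = host_if + "p"  # temporary name for the container-side end
    enter_netns = ["nsenter", "-t", pid, "-n"]
    return [
        # Create the pair: two interfaces acting as a cross-over cable.
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", peer],
        # Move one end into the container's network namespace.
        ["ip", "link", "set", "dev", peer, "netns", pid],
        # Attach the host end to the bridge and bring it up.
        ["ip", "link", "set", "dev", host_if, "master", bridge],
        ["ip", "link", "set", "dev", host_if, "up"],
        # Inside the container: rename the peer and bring it up.
        enter_netns + ["ip", "link", "set", "dev", peer, "name", container_if],
        enter_netns + ["ip", "link", "set", "dev", container_if, "up"],
    ]


if __name__ == "__main__":
    for command in veth_setup_commands(42):
        print(" ".join(command))
```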
Mnt namespace
Processes can have their own root fs
conceptually close to chroot
Processes can also have “private” mounts
/tmp (scoped per user, per service...)
masking of /proc, /sys
NFS auto-mounts (why not?)
Mounts can be totally private, or shared
No easy way to pass along a mount
from one namespace to another ☹
Uts namespace
gethostname / sethostname
'nuff said!
Ipc namespace
Does anybody know about IPC?
Does anybody care about IPC?
Allows a process (or group of processes)
to have own:
IPC semaphores
IPC message queues
IPC shared memory
... without risk of conflict with other instances
User namespace
Allows mapping UIDs/GIDs; e.g.:
UID 0→1999 in container C1 is mapped to
UID 10000→11999 on host;
UID 0→1999 in container C2 is mapped to
UID 12000→13999 on host;
etc.
UID in containers becomes irrelevant
just use UID 0 in the container
it gets squashed to a non-privileged user outside
Security improvement
But: devil is in the details
Just landed in Docker experimental branch!
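The C1/C2 example above, as arithmetic. A mapping is a triple (first UID inside, first UID outside, count), which is also the line format of /proc/<pid>/uid_map. Function names are hypothetical.

```python
def uid_map_line(inside_start, outside_start, count):
    """Format one mapping the way /proc/<pid>/uid_map expects it."""
    return f"{inside_start} {outside_start} {count}"


def to_host_uid(mappings, container_uid):
    """Translate a UID seen inside the container to the UID on the host."""
    for inside_start, outside_start, count in mappings:
        if inside_start <= container_uid < inside_start + count:
            return outside_start + (container_uid - inside_start)
    return None  # unmapped UID


# The two containers from the example above.
C1 = [(0, 10000, 2000)]
C2 = [(0, 12000, 2000)]

if __name__ == "__main__":
    print(uid_map_line(*C1[0]))
    print(to_host_uid(C1, 0), to_host_uid(C2, 0))
```

Root (UID 0) in C1 is UID 10000 on the host, and root in C2 is UID 12000, so neither is privileged outside.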
Namespace manipulation
Namespaces are created with the clone() syscall
i.e. with extra flags when creating a new process
Namespaces are materialized by pseudo-files
/proc/<pid>/ns
When the last process of a namespace exits,
it is destroyed
but can be preserved by bind-mounting the pseudo-file
It’s possible to “enter” a namespace with setns()
exposed by the nsenter wrapper in util-linux
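The "extra flags" are the CLONE_NEW* constants from <linux/sched.h>, OR-ed together. The sketch below only computes the flag word; actually calling clone() or unshare() needs privileges and is left out. The dictionary keys follow the namespace names used in this talk.

```python
# Values from <linux/sched.h>.
CLONE_FLAGS = {
    "mnt": 0x00020000,   # CLONE_NEWNS
    "uts": 0x04000000,   # CLONE_NEWUTS
    "ipc": 0x08000000,   # CLONE_NEWIPC
    "user": 0x10000000,  # CLONE_NEWUSER
    "pid": 0x20000000,   # CLONE_NEWPID
    "net": 0x40000000,   # CLONE_NEWNET
}


def clone_flags(*kinds):
    """Combine the flags for the requested namespace types."""
    flags = 0
    for kind in kinds:
        flags |= CLONE_FLAGS[kind]
    return flags


if __name__ == "__main__":
    print(hex(clone_flags("pid", "net")))
```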
The building blocks:
copy-on-write
Copy-on-write storage
Create a new container instantly
instead of copying its whole filesystem
Storage keeps track of what has changed
Many options available
AUFS, overlay (file level)
device mapper thinp (block level)
BTRFS, ZFS (FS level)
Considerably reduces footprint and “boot” times
See also: Deep dive into Docker storage drivers
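A toy model of the idea, not of any particular driver: the image is a shared read-only layer, each container gets an empty writable layer on top, and only changes are stored. Class and method names are hypothetical.

```python
class CowLayer:
    """Writable layer on top of a shared, never-modified lower layer."""

    def __init__(self, lower):
        self._lower = lower     # shared image content: path -> data
        self._upper = {}        # this container's own writes
        self._deleted = set()   # paths hidden from the lower layer

    def read(self, path):
        if path in self._deleted:
            raise FileNotFoundError(path)
        if path in self._upper:
            return self._upper[path]
        if path in self._lower:
            return self._lower[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        self._deleted.discard(path)
        self._upper[path] = data

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if the path is not visible
        self._upper.pop(path, None)
        if path in self._lower:
            self._deleted.add(path)

    def changes(self):
        """What the storage has to keep for this container."""
        return {"written": sorted(self._upper), "deleted": sorted(self._deleted)}


if __name__ == "__main__":
    image = {"/etc/hostname": "base"}
    c1, c2 = CowLayer(image), CowLayer(image)  # "instant": nothing is copied
    c1.write("/etc/hostname", "c1")
    print(c1.read("/etc/hostname"), c2.read("/etc/hostname"), c1.changes())
```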
Other details
Orthogonality
All those things can be used independently
If you only need resource isolation,
use a few selected cgroups
If you want to simulate a network of routers,
use network namespaces
Put the debugger in a container's namespaces,
but not its cgroups
so that the debugger doesn’t steal resources
Set up a network interface in a container,
then move it to another
etc.
Some missing bits
Capabilities
break down "root / non-root" into fine-grained rights
allow keeping root, but without the dangerous bits
however: CAP_SYS_ADMIN remains a big catchall
SELinux / AppArmor ...
containers that actually contain
deserve a whole talk on their own
check https://github.com/jfrazelle/bane
(AppArmor profile generator)
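Capabilities are a bitmask, visible as the CapEff line of /proc/<pid>/status. The sketch decodes a few bits; the bit numbers come from <linux/capability.h>, and the sample mask is illustrative rather than taken from any specific runtime.

```python
# A few capability bit numbers from <linux/capability.h>.
CAPABILITIES = {
    "CAP_CHOWN": 0,
    "CAP_KILL": 5,
    "CAP_NET_ADMIN": 12,
    "CAP_NET_RAW": 13,
    "CAP_SYS_MODULE": 16,
    "CAP_SYS_ADMIN": 21,
}


def parse_cap_eff(status_text):
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(mask, name):
    """True if the named capability bit is set in the mask."""
    return bool(mask & (1 << CAPABILITIES[name]))


# Illustrative mask: still "root", but with several dangerous bits removed.
SAMPLE_STATUS = "CapEff:\t00000000a80425fb\n"

if __name__ == "__main__":
    mask = parse_cap_eff(SAMPLE_STATUS)
    for name in sorted(CAPABILITIES):
        print(name, has_capability(mask, name))
```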
Container runtimes
using cgroups+namespaces
LXC
Set of userland tools
A container is a directory in /var/lib/lxc
Small config file + root filesystem
Early versions had no support for CoW
Early versions had no support for moving images
Requires significant amount of elbow grease
easy for sysadmins/ops, hard for devs
systemd-nspawn
From its manpage:
“For debugging, testing and building”
“Similar to chroot, but more powerful”
“Implements the Container Interface”
Seems to position itself as plumbing
Recently added support for ɹǝʞɔop images
#define INDEX_HOST "index.do" /* the URL we get the data from */ "cker.io"
#define HEADER_TOKEN "X-Do" /* the HTTP header for the auth token */ "cker-Token:"
#define HEADER_REGISTRY "X-Do" /* the HTTP header for the registry */ "cker-Endpoints:"
Docker Engine
Daemon controlled by REST(ish) API
First versions shelled out to LXC
now uses its own libcontainer runtime
Manages containers, images, builds, and more
Some people think it does too many things
rkt, runC
Back to the basics!
Focus on container execution
no API, no image management, no build, etc.
They implement different specifications:
rkt implements appc (App Container)
runC implements OCP (Open Container Project)
runC leverages Docker's libcontainer
Which one is best?
They all use the same kernel features
Performance will be exactly the same
Look at:
features
design
ecosystem
Container runtimes
using other mechanisms
OpenVZ
Also Linux
Older, but battle-tested
e.g. Travis CI gives you root in OpenVZ
Tons of neat features too
ploop (efficient block device for containers)
checkpoint/restore, live migration
venet (~more efficient veth)
Still developed and maintained
Jails / Zones
FreeBSD / Solaris
Coarser granularity of features
Strong emphasis on security
Great for hosting providers
Not so much for developers
where's the equivalent of docker run -ti ubuntu?
Illumos branded zones can run Linux binaries
with Joyent Triton, you can run Linux containers on Illumos
This is different from the Oracle Solaris port
the latter will run Solaris binaries, not Linux binaries
Building our own
containers
Not for production use!
(For educational purposes only)
Thank you!
Jérôme🐳🚢📦Petazzoni
@jpetazzo
jerome@docker.com

 
Immutable infrastructure with Docker and containers (GlueCon 2015)
Immutable infrastructure with Docker and containers (GlueCon 2015)Immutable infrastructure with Docker and containers (GlueCon 2015)
Immutable infrastructure with Docker and containers (GlueCon 2015)
 
The Docker ecosystem and the future of application deployment
The Docker ecosystem and the future of application deploymentThe Docker ecosystem and the future of application deployment
The Docker ecosystem and the future of application deployment
 
Docker: automation for the rest of us
Docker: automation for the rest of usDocker: automation for the rest of us
Docker: automation for the rest of us
 
Docker Non Technical Presentation
Docker Non Technical PresentationDocker Non Technical Presentation
Docker Non Technical Presentation
 
Introduction to Docker, December 2014 "Tour de France" Bordeaux Special Edition
Introduction to Docker, December 2014 "Tour de France" Bordeaux Special EditionIntroduction to Docker, December 2014 "Tour de France" Bordeaux Special Edition
Introduction to Docker, December 2014 "Tour de France" Bordeaux Special Edition
 
Introduction to Docker, December 2014 "Tour de France" Edition
Introduction to Docker, December 2014 "Tour de France" EditionIntroduction to Docker, December 2014 "Tour de France" Edition
Introduction to Docker, December 2014 "Tour de France" Edition
 
Containers, Docker, and Microservices: the Terrific Trio
Containers, Docker, and Microservices: the Terrific TrioContainers, Docker, and Microservices: the Terrific Trio
Containers, Docker, and Microservices: the Terrific Trio
 
Pipework: Software-Defined Network for Containers and Docker
Pipework: Software-Defined Network for Containers and DockerPipework: Software-Defined Network for Containers and Docker
Pipework: Software-Defined Network for Containers and Docker
 
Introduction to Docker at Glidewell Laboratories in Orange County
Introduction to Docker at Glidewell Laboratories in Orange CountyIntroduction to Docker at Glidewell Laboratories in Orange County
Introduction to Docker at Glidewell Laboratories in Orange County
 

Kürzlich hochgeladen

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 

Cgroups, namespaces, and beyond: what are containers made from? (DockerCon Europe 2015)

  • 1. Cgroups, namespaces and beyond: What are containers made from? Jérôme🐳🚢📦Petazzoni Tinkerer Extraordinaire
  • 2. Who am I? Jérôme Petazzoni (@jpetazzo) French software engineer living in California I built and scaled the dotCloud PaaS almost 5 years ago, time flies! I talk (too much) about containers first time was maybe in February 2013, at SCALE in L.A. Proud member of the House of Bash when annoyed, we replace things with tiny shell scripts
  • 3. Outline What is a container? High level and low level overviews The building blocks Namespaces, cgroups, copy-on-write storage Container runtimes Docker, LXC, rkt, runC, systemd-nspawn OpenVZ, Jails, Zones Hand-crafted, house-blend, artisan containers Demos!
  • 4. What is a container? 📦
  • 5. High level approach: lightweight VM I can get a shell on it through SSH or otherwise It "feels" like a VM own process space own network interface can run stuff as root can install packages can run services can mess up routing, iptables …
  • 6. Low level approach: chroot on steroids It's not quite like a VM: uses the host kernel can't boot a different OS can't have its own modules doesn't need init as PID 1 doesn't need syslogd, cron... It's just normal processes on the host machine contrast with VMs which are opaque
  • 9. How are they implemented? Let's look in the kernel source! Go to LXR Look for "LXC" → zero results Look for "container" → 1000+ results Almost all of them are about data structures or other unrelated concepts like “ACPI containers” There are some references to “our” containers but only in the documentation ??? Are containers even in the kernel ??? ??? Do containers even actually EXIST ???
  • 11. Control groups Resource metering and limiting memory CPU block I/O network* Device node (/dev/*) access control Crowd control *With cooperation from iptables/tc; more on that later
  • 12. Generalities Each subsystem has a hierarchy (tree) separate hierarchies for CPU, memory, block I/O… Hierarchies are independent the trees for e.g. memory and CPU can be different Each process is in a node in each hierarchy think of each hierarchy as a different dimension or axis Each hierarchy starts with 1 node (the root) Initially, all processes start at the root node* Each node = group of processes sharing the same resources *It's a bit more subtle; more on that later
  • 14. Memory cgroup: accounting Keeps track of pages used by each group: file (read/write/mmap from block devices) anonymous (stack, heap, anonymous mmap) active (recently accessed) inactive (candidate for eviction) Each page is “charged” to a group Pages can be shared across multiple groups e.g. multiple processes reading from the same files when pages are shared, only one group “pays” for a page
  • 15. Memory cgroup: limits Each group can have its own limits limits are optional two kinds of limits: soft and hard limits Soft limits are not enforced they influence reclaim under memory pressure Hard limits will trigger a per-group OOM killer Limits can be set for different kinds of memory physical memory kernel memory total memory
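A minimal sketch of how the kinds of limits above map to control files, assuming the cgroup v1 memory controller. The helper name `set_memory_limits` and the scratch directory are illustrative; on a real host the directory would be a group under /sys/fs/cgroup/memory and writing to it needs root.

```python
import os
import tempfile

# cgroup v1 memory controller files for each kind of limit named on the slide.
LIMIT_FILES = {
    "hard": "memory.limit_in_bytes",         # physical memory, hard limit
    "soft": "memory.soft_limit_in_bytes",    # physical memory, soft limit
    "kernel": "memory.kmem.limit_in_bytes",  # kernel memory
    "total": "memory.memsw.limit_in_bytes",  # total memory (RAM + swap)
}


def set_memory_limits(group_dir, **limits):
    """Write each requested limit (in bytes) to its control file.

    Returns the list of files that were written. Hypothetical helper.
    """
    written = []
    for kind, value in limits.items():
        path = os.path.join(group_dir, LIMIT_FILES[kind])
        with open(path, "w") as f:
            f.write(str(int(value)))
        written.append(path)
    return written


# Demo against a scratch directory standing in for a real memory cgroup.
demo_dir = tempfile.mkdtemp()
set_memory_limits(demo_dir, hard=512 * 1024 * 1024, soft=256 * 1024 * 1024)
```

Limits are optional, so the helper only touches the files for the limits it is given.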
  • 16. Memory cgroup: avoiding OOM killer Do you like it when the kernel terminates random processes because something unrelated ate all the memory in the system? Set up oom-notifier Then, when the hard limit is exceeded: freeze all processes in the group notify user space (instead of going on a rampage) we can kill processes, raise limits, migrate containers ... when we're in the clear again, unfreeze the group
  • 17. Memory cgroup: tricky details Each time the kernel gives a page to a process, or takes it away, it updates the counters This adds some overhead This cannot be enabled/disabled per process it has to be done at boot time When multiple groups use the same page, only the first one is “charged” but if it stops using it, the charge is moved to another group
  • 18. HugeTLB cgroup Controls the amount of huge pages usable by a process safe to ignore if you don’t know what huge pages are
  • 19. Cpu cgroup Keeps track of user/system CPU time Keeps track of usage per CPU Allows setting weights Can't set CPU limits OK, let's say you give N% then the CPU throttles to a lower clock speed now what? same if you give a time slot instructions? their exec speed varies wildly ¯\_(ツ)_/¯
  • 20. Cpuset cgroup Pin groups to specific CPU(s) Reserve CPUs for specific apps Avoid processes bouncing between CPUs Also relevant for NUMA systems Provides extra dials and knobs per zone memory pressure, process migration costs…
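Pinning a group to CPUs is done by writing a CPU list such as "0-2,7" to the group's cpuset.cpus file. The sketch below only converts between that list format and plain Python lists; both function names are made up for illustration.

```python
def parse_cpu_list(text):
    """Expand a cpuset list such as "0-2,7" into a sorted list of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def format_cpu_list(cpus):
    """Collapse CPU numbers into the compact list format, e.g. [0, 1, 2, 7] -> "0-2,7"."""
    parts = []
    run = []

    def flush():
        # Emit the current run of consecutive CPUs as "a-b" or a single number.
        if run:
            parts.append(str(run[0]) if len(run) == 1 else f"{run[0]}-{run[-1]}")

    for cpu in sorted(set(cpus)):
        if run and cpu == run[-1] + 1:
            run.append(cpu)
        else:
            flush()
            run = [cpu]
    flush()
    return ",".join(parts)
```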
  • 21. Blkio cgroup Keeps track of I/Os for each group per block device read vs write sync vs async Set throttle (limits) for each group per block device read vs write ops vs bytes Set relative weights for each group Note: most writes go through the page cache so classic writes will appear to be unthrottled at first
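As a rough illustration of "per block device, read vs write, ops vs bytes": in cgroup v1 each throttle is a line of the form "major:minor limit" written to one of four blkio.throttle files. The helper `throttle_rule` is hypothetical and only builds the file name and the line; it does not write anything.

```python
def throttle_rule(major, minor, direction, unit, limit):
    """Return (control file, line to write) for one blkio throttle.

    direction: "read" or "write"
    unit: "bps" (bytes per second) or "iops" (operations per second)
    """
    if direction not in ("read", "write"):
        raise ValueError("direction must be 'read' or 'write'")
    if unit not in ("bps", "iops"):
        raise ValueError("unit must be 'bps' or 'iops'")
    filename = f"blkio.throttle.{direction}_{unit}_device"
    return filename, f"{major}:{minor} {int(limit)}"


# Example: cap reads on device 8:0 at 1 MiB/s.
rule = throttle_rule(8, 0, "read", "bps", 1024 * 1024)
```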
  • 22. Net_cls and net_prio cgroup Automatically set traffic class or priority, for traffic generated by processes in the group Only works for egress traffic Net_cls will assign traffic to a class class then has to be matched with tc/iptables, otherwise traffic just flows normally Net_prio will assign traffic to a priority priorities are used by queuing disciplines
  • 23. Devices cgroup Controls what the group can do on device nodes Permissions include read/write/mknod Typical use: allow /dev/{tty,zero,random,null} ... deny everything else A few interesting nodes: /dev/net/tun (network interface manipulation) /dev/fuse (filesystems in user space) /dev/kvm (VMs in containers, yay inception!) /dev/dri (GPU)
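The "allow a few nodes, deny everything else" pattern can be sketched as a list of writes to devices.deny and devices.allow (cgroup v1). The major/minor numbers below are the standard Linux ones for these character devices; the function name and the default "rwm" (read, write, mknod) access string are illustrative choices.

```python
# Character devices from the "typical use" list, as (major, minor).
BASIC_DEVICES = {
    "/dev/null": (1, 3),
    "/dev/zero": (1, 5),
    "/dev/random": (1, 8),
    "/dev/tty": (5, 0),
}


def whitelist_rules(devices=BASIC_DEVICES, access="rwm"):
    """Return (control file, entry) pairs: deny all devices, then allow the listed ones."""
    rules = [("devices.deny", "a")]  # "a" matches every device
    for path in sorted(devices):
        major, minor = devices[path]
        rules.append(("devices.allow", f"c {major}:{minor} {access}"))
    return rules
```

Each pair would be written, in order, to the named file inside the group's directory.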
  • 24. Freezer cgroup Allows freezing/thawing a group of processes Similar in functionality to SIGSTOP/SIGCONT Cannot be detected by the processes that’s how it differs from SIGSTOP/SIGCONT! Doesn’t impede ptrace/debugging Specific use cases cluster batch scheduling process migration
  • 25. Subtleties PID 1 is placed at the root of each hierarchy New processes start in their parent’s groups Groups are materialized by pseudo-fs typically mounted in /sys/fs/cgroup Groups are created by mkdir in the pseudo-fs mkdir /sys/fs/cgroup/memory/somegroup/subcgroup To move a process: echo $PID > /sys/fs/cgroup/.../tasks The cgroup wars: systemd vs cgmanager vs ...
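The mkdir and echo steps above can be sketched in Python as follows. The two helpers are hypothetical, and the demo runs against a scratch directory rather than the real /sys/fs/cgroup (which needs root, and where the kernel creates the tasks file for you).

```python
import os
import tempfile


def create_group(cgroup_root, controller, *names):
    """Create a (possibly nested) group, like mkdir in the cgroup pseudo-fs."""
    path = os.path.join(cgroup_root, controller, *names)
    os.makedirs(path, exist_ok=True)
    return path


def move_process(group_path, pid):
    """Move a process into a group, like: echo $PID > .../tasks"""
    with open(os.path.join(group_path, "tasks"), "w") as f:
        f.write(str(pid))


# Demo: a scratch directory stands in for /sys/fs/cgroup.
fake_root = tempfile.mkdtemp()
group = create_group(fake_root, "memory", "somegroup", "subcgroup")
move_process(group, os.getpid())
```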
  • 27. Namespaces Provide processes with their own system view Cgroups = limits how much you can use; namespaces = limits what you can see you can’t use/affect what you can’t see Multiple namespaces: pid net mnt uts ipc user Each process is in one namespace of each type
  • 28. Pid namespace Processes within a PID namespace only see processes in the same PID namespace Each PID namespace has its own numbering starting at 1 If PID 1 goes away, whole namespace is killed Those namespaces can be nested A process ends up having multiple PIDs one per namespace in which it is nested
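One way to observe the "multiple PIDs" point: on kernels that expose it (4.1 and later), /proc/<pid>/status has an NSpid line listing the process's PID in each nested namespace, outermost first. The sketch below parses that line from a status text; the function name and the sample values are invented for illustration.

```python
def pids_per_namespace(status_text):
    """Return the PIDs on the NSpid line of a /proc/<pid>/status dump.

    The list goes from the outermost PID namespace to the innermost one.
    Returns an empty list if the kernel did not report NSpid.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


# Invented sample: PID 4242 on the host, PID 7 inside the container.
sample_status = "Name:\tsleep\nPid:\t4242\nNSpid:\t4242\t7\n"
host_pid, container_pid = pids_per_namespace(sample_status)
```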
  • 29. Net namespace: in theory Processes within a given network namespace get their own private network stack, including: network interfaces (including lo) routing tables iptables rules sockets (ss, netstat) You can move a network interface across netns ip link set dev eth0 netns PID
  • 30. Net namespace: in practice Typical use-case: use veth pairs (two virtual interfaces acting as a cross-over cable) eth0 in container network namespace paired with vethXXX in host network namespace all the vethXXX are bridged together (Docker calls the bridge docker0) But also: the magic of --net container shared localhost (and more!)
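A sketch of the veth wiring described above, expressed as the `ip` and `nsenter` commands one might run as root. It only builds the argument lists and runs nothing. The function name, the temporary peer name (host interface name plus "p"), and the default names are assumptions for illustration; this is not how Docker itself does it.

```python
def veth_setup_commands(container_pid, host_if, container_if="eth0", bridge="docker0"):
    """Build the commands that connect a container's netns to a host bridge."""
    peer = host_if + "p"  # temporary name for the container-side end
    pid = str(container_pid)
    return [
        # Create the pair: two virtual interfaces acting as a cross-over cable.
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", peer],
        # Attach the host end to the bridge and bring it up.
        ["ip", "link", "set", host_if, "master", bridge],
        ["ip", "link", "set", host_if, "up"],
        # Move the other end into the container's network namespace.
        ["ip", "link", "set", "dev", peer, "netns", pid],
        # Inside that namespace, rename it and bring it up.
        ["nsenter", "-t", pid, "-n", "ip", "link", "set", "dev", peer, "name", container_if],
        ["nsenter", "-t", pid, "-n", "ip", "link", "set", container_if, "up"],
    ]
```

Each list could be passed to `subprocess.run` on a host where the bridge and the target process exist.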
  • 31. Mnt namespace Processes can have their own root fs conceptually close to chroot Processes can also have “private” mounts /tmp (scoped per user, per service...) masking of /proc, /sys NFS auto-mounts (why not?) Mounts can be totally private, or shared No easy way to pass along a mount from a namespace to another ☹
  • 32. Uts namespace gethostname / sethostname 'nuff said!
  • 36. Ipc namespace Does anybody know about IPC? Does anybody care about IPC? Allows a process (or group of processes) to have own: IPC semaphores IPC message queues IPC shared memory ... without risk of conflict with other instances
  • 37. User namespace Allows mapping UID/GID; e.g.: UID 0→1999 in container C1 is mapped to UID 10000→11999 on host; UID 0→1999 in container C2 is mapped to UID 12000→13999 on host; etc. UID in containers becomes irrelevant just use UID 0 in the container it gets squashed to a non-privileged user outside Security improvement But: devil is in the details Just landed in Docker experimental branch!
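The mapping example above can be worked through in a few lines. Each entry follows the layout of a /proc/<pid>/uid_map line (first UID inside, first UID outside, range length); the function name and the two constants are illustrative.

```python
def map_uid(uid, uid_map):
    """Translate a UID seen inside the container to the UID used on the host.

    uid_map is a list of (inside_start, outside_start, count) entries.
    Returns None if the UID is not covered by any entry.
    """
    for inside_start, outside_start, count in uid_map:
        if inside_start <= uid < inside_start + count:
            return outside_start + (uid - inside_start)
    return None


# The two containers from the slide: 2000 UIDs each.
C1_MAP = [(0, 10000, 2000)]  # UID 0..1999 -> 10000..11999 on host
C2_MAP = [(0, 12000, 2000)]  # UID 0..1999 -> 12000..13999 on host
```

Root (UID 0) in C1 is therefore UID 10000 on the host, a non-privileged user.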
  • 38. Namespace manipulation Namespaces are created with the clone() syscall i.e. with extra flags when creating a new process Namespaces are materialized by pseudo-files /proc/<pid>/ns When the last process of a namespace exits, it is destroyed but can be preserved by bind-mounting the pseudo-file It’s possible to “enter” a namespace with setns() exposed by the nsenter wrapper in util-linux
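The pseudo-files under /proc/<pid>/ns are symlinks whose targets look like "net:[4026531956]"; two processes share a namespace when those targets match. A small sketch for reading them (Linux only; both function names are made up):

```python
import os


def parse_ns_link(target):
    """Split a namespace link target such as "net:[4026531956]" into (type, inode)."""
    kind, _, rest = target.partition(":[")
    return kind, int(rest.rstrip("]"))


def namespaces_of(pid="self"):
    """Map each entry of /proc/<pid>/ns to its namespace inode number."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result
```

Comparing `namespaces_of(pid_a)["net"]` with `namespaces_of(pid_b)["net"]` tells whether two processes see the same network stack.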
  • 40. Copy-on-write storage Create a new container instantly instead of copying its whole filesystem Storage keeps track of what has changed Many options available AUFS, overlay (file level) device mapper thinp (block level) BTRFS, ZFS (FS level) Considerably reduces footprint and “boot” times See also: Deep dive into Docker storage drivers
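A toy, in-memory model of file-level copy-on-write, to illustrate why creating a container is instant and why storage only tracks what changed. This is not how AUFS or overlay are implemented; the class and its methods are invented for the example.

```python
class CowFilesystem:
    """A shared read-only image plus a private layer of changes."""

    def __init__(self, lower):
        self.lower = lower    # shared image, never modified
        self.upper = {}       # files written by this container
        self.deleted = set()  # image files hidden by this container

    def read(self, path):
        if path in self.deleted:
            raise FileNotFoundError(path)
        if path in self.upper:
            return self.upper[path]
        if path in self.lower:
            return self.lower[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        # Changes land in the private layer only.
        self.deleted.discard(path)
        self.upper[path] = data

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if the file is not visible
        self.upper.pop(path, None)
        if path in self.lower:
            self.deleted.add(path)


# Two "containers" created from one image without copying it.
image = {"/etc/hostname": "base"}
c1 = CowFilesystem(image)
c2 = CowFilesystem(image)
c1.write("/etc/hostname", "c1")
```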
  • 42. Orthogonality All those things can be used independently If you only need resource isolation, use a few selected cgroups If you want to simulate a network of routers, use network namespaces Put the debugger in a container's namespaces, but not its cgroups so that the debugger doesn’t steal resources Set up a network interface in a container, then move it to another etc.
  • 43. Some missing bits Capabilities break down "root / non-root" into fine-grained rights allow keeping root, but without the dangerous bits however: CAP_SYS_ADMIN remains a big catchall SELinux / AppArmor ... containers that actually contain deserve a whole talk on their own check https://github.com/jfrazelle/bane (AppArmor profile generator)
  • 46. LXC Set of userland tools A container is a directory in /var/lib/lxc Small config file + root filesystem Early versions had no support for CoW Early versions had no support to move images Requires significant amount of elbow grease easy for sysadmins/ops, hard for devs
  • 47. systemd-nspawn From its manpage: “For debugging, testing and building” “Similar to chroot, but more powerful” “Implements the Container Interface” Seems to position itself as plumbing Recently added support for ɹǝʞɔop images #define INDEX_HOST "index.do" /* the URL we get the data from */ "cker.io" #define HEADER_TOKEN "X-Do" /* the HTTP header for the auth token */ "cker-Token:" #define HEADER_REGISTRY "X-Do" /* the HTTP header for the registry */ "cker-Endpoints:"
  • 48. Docker Engine Daemon controlled by REST(ish) API First versions shelled out to LXC now uses its own libcontainer runtime Manages containers, images, builds, and more Some people think it does too many things
  • 49. rkt, runC Back to the basics! Focus on container execution no API, no image management, no build, etc. They implement different specifications: rkt implements appc (App Container) runC implements OCP (Open Container Project) runC leverages Docker's libcontainer
  • 50. Which one is best? They all use the same kernel features Performance will be exactly the same Look at: features design ecosystem
  • 52. OpenVZ Also Linux Older, but battle-tested e.g. Travis CI gives you root in OpenVZ Tons of neat features too ploop (efficient block device for containers) checkpoint/restore, live migration venet (~more efficient veth) Still developed and maintained
  • 53. Jails / Zones FreeBSD / Solaris Coarser granularity of features Strong emphasis on security Great for hosting providers Not so much for developers where's the equivalent of docker run -ti ubuntu? Illumos branded zones can run Linux binaries with Joyent Triton, you can run Linux containers on Illumos This is different from the Oracle Solaris port the latter will run Solaris binaries, not Linux binaries
  • 55. Not for production use! (For educational purposes only)