Container technologies have been gaining a lot of traction in the DevOps world, especially since the arrival of Docker.io. But these technologies have been around for more than 10 years. In this talk, we will dive into the history of Linux containers and how they differ from traditional virtualization technologies: from the User-Mode Linux (UML) and Linux-VServer days in the early 2000s, through OpenVZ and the rise of Linux namespaces and cgroups, to LXC, the new kid on the block.
3. An Introduction to Containers
“[...] should it be possible for the operating
system to ensure that excessive resource
usage by one group of processes doesn't
interfere with another group of processes?
Should it be possible for a single kernel to
provide resource-usage statistics for a
logical group of processes? Likewise,
should the kernel be able to allow multiple
processes to transparently use port 80?”
Glauber Costa, Parallels (SWSoft / company behind OpenVZ)
http://lwn.net/Articles/524952/
4. Containers (vs virtualization)
● Group processes together to create secure,
isolated virtual environments
● Share the host kernel / operating system
● Generally perform better than traditional
virtualization
● Often have limitations with kernel features
(VPN, loopback devices, iptables, FUSE,
NFS, etc.)
5. User-Mode Linux (UML)
● Kernel patch to compile the Linux kernel as a
“regular” binary. Run Linux inside Linux: ./linux
● First paper in August 2000, Linux 2.2.x [1]
● In the mainline kernel since 2.6.0 (December 2003)
● No root access needed (networking requires
TUN/TAP)
● Linode initially offered UML containers
and switched to Xen on March 28, 2008 [2]
● Works out of the box with all recent kernels [3]
[1] http://user-mode-linux.sourceforge.net/old/als2000/index.html
[2] https://blog.linode.com/2008/03/28/linodes-in-xen/
[3] http://uml.devloop.org.uk/
6. Linux-VServer
● Created by Jacques Gelinas, a Montrealer
● First public announcement in October 2001 [1]
● Uses a “security context” concept to isolate
processes (similar to Linux Namespaces)
● Still alive (latest patch is for Linux 3.10.21)
● Dreamhost (the company behind Ceph) still
uses Linux-VServer for its VPS offering [2]
[1] http://www.cs.helsinki.fi/linux/linux-kernel/2001-40/1065.html
[2] http://www.dreamhost.com/servers/vps/
7. OpenVZ
● Patch based on the latest RHEL kernel (currently
2.6.32; 40 MB gzipped patch). Extends Linux
Cgroups/Namespaces features
● Mature (initial release in 2005); the open-source
project behind Parallels Virtuozzo (commercial)
● The future of OpenVZ lies within Linux Cgroups/
Namespaces. Recent versions of the OpenVZ tools
partially work with recent mainline kernels
● OpenVZ developers are very active in the Linux
kernel/Namespaces community
10. LXC
● Docker uses LXC for creating containers
● First release of LXC in September 2008
● Set of userspace tools to create containers
on top of Linux Cgroups and Namespaces
● LXC containers are not fully secure yet.
It’s possible for root inside a container to
escape and gain root on the host. AppArmor or
SELinux is needed to confine containers. The
future lies in the user namespace.
11. Linux Namespaces
Different namespaces = Different “Views” of
the kernel
Linux 2.4.19 - 3 Aug 2002
Mount namespace: mount points
Linux 2.6.19 - 29 Nov 2006
UTS namespace: hostname
IPC namespace: interprocess communication
Linux 2.6.24 - 24 Jan 2008
PID namespace: processes in different PID namespaces can have the same PID
Network namespace: network devices, IP addresses, routing tables, iptables entries
Linux 3.8 - 18 Feb 2013
User namespace: root privileges for operations inside a user namespace, but
unprivileged outside the namespace. A number of Linux filesystems are not yet
user-namespace aware.
http://lwn.net/Articles/531114/
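To make the “views” idea concrete, here is a minimal sketch that lists the namespaces a process belongs to by reading the symlinks under /proc/PID/ns. The function name list_namespaces is made up for this example, and it assumes a Linux kernel recent enough to expose these links. Two processes that show the same identifier for an entry share that namespace.

```shell
# Hypothetical helper: print one "name -> identifier" line per namespace
# of the given PID (defaults to the current shell).
list_namespaces() {
    pid="${1:-$$}"
    for ns in /proc/"$pid"/ns/*; do
        # Each entry is a symlink whose target looks like "type:[inode]"
        printf '%s -> %s\n' "${ns##*/}" "$(readlink "$ns")"
    done
}

list_namespaces
```

Comparing the output for a shell on the host and a shell inside a container should show different identifiers for each namespace the container was given.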
12. Linux Cgroups
● Virtually group processes together, apply
limits, priority, accounting, etc.
● Divided in subsystems, each subsystem
representing a resource (CPU, memory, etc)
blkio: limits input/output access to and from block devices
cpu: uses the scheduler to provide access to the CPU
devices: allows or denies access to devices
freezer: suspends or resumes tasks in a cgroup
memory: sets limits on memory use by tasks in a cgroup, and generates automatic
reports on memory resources used by those tasks
...
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch01.html
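As a quick way to see which subsystems a given kernel offers, the sketch below reads /proc/cgroups and /proc/self/cgroup. The function names are made up for this example, and it assumes the kernel exposes /proc/cgroups with its usual four columns (subsys_name, hierarchy, num_cgroups, enabled).

```shell
# Hypothetical helper: print the names of the enabled cgroup subsystems.
list_cgroup_subsystems() {
    # Skip the '#' header line; keep rows whose "enabled" column is 1
    awk '!/^#/ && $4 == 1 { print $1 }' /proc/cgroups
}

# Hypothetical helper: print the cgroup membership of the calling process.
# "self" resolves to the cat process, which inherits the caller's cgroups.
show_my_cgroups() {
    cat /proc/self/cgroup
}

list_cgroup_subsystems
show_my_cgroups
```

The exact list depends on how the kernel was configured, so the output will differ from one machine to another.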
13. Playing with Cgroups
● Cgroups are configured through the cgroup
virtual filesystem (similar to /proc)
● Mount the cgroup virtual filesystem for the
desired subsystem (e.g. blkio):
sudo mkdir -p /sys/fs/cgroup/blkio
sudo mount -t cgroup -o blkio blkio /sys/fs/cgroup/blkio
● Create a new cgroup named “1mbsec” in the
blkio subsystem:
sudo mkdir /sys/fs/cgroup/blkio/1mbsec
14. Playing with Cgroups (cont.)
● Set a limit of 1 MB/s on this cgroup:
echo '253:2 '$((1024*1024)) | sudo tee /sys/fs/cgroup/blkio/1mbsec/blkio.throttle.write_bps_device
● Attach the current process (shell) to the 1mbsec
cgroup:
echo $$ | sudo tee /sys/fs/cgroup/blkio/1mbsec/tasks
● Writes are now throttled to 1 MB/s:
dd if=/dev/zero of=100mbtest.bin bs=1M count=100 conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 100.055 s, 1.0 MB/s
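The 253:2 in the throttle rule above is the major:minor number of the block device being limited, so it will differ from one machine to another. A minimal sketch for building that rule from a device node follows. The helper names devno and throttle_rule are made up for this example, and it assumes GNU stat, whose %t and %T formats print the major and minor numbers in hexadecimal.

```shell
# Hypothetical helper: print "major:minor" (decimal) for a device node.
devno() {
    # GNU stat prints the major (%t) and minor (%T) numbers in hex
    hex=$(stat -c '%t %T' "$1") || return 1
    printf '%d:%d\n' "0x${hex% *}" "0x${hex#* }"
}

# Hypothetical helper: build a "major:minor bytes_per_second" rule in the
# form written to blkio.throttle.write_bps_device.
throttle_rule() {
    printf '%s %d\n' "$(devno "$1")" "$2"
}

# /dev/null is a character device, used here only because it exists on
# every Linux system; a real rule needs the block device under the files.
throttle_rule /dev/null "$((1024 * 1024))"
```

With a real block device node in place of /dev/null, the output could be piped to sudo tee in the same way as the echo command above.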
15. My Personal Experience
● OpenVZ is generally the “go-to” for public /
production containers (unless you need some
of the recent kernel features)
● LXC is gaining a lot of interest, especially with
tools like Docker. Escaping LXC containers is
a major security issue; you will need to learn
AppArmor/SELinux to secure LXC
● User-Mode Linux is a very well-kept secret. It’s
a great way to quickly run containers,
especially in non-root environments, and
works out of the box with all recent kernels.