Lightweight virtualization", also called "OS-level virtualization", is not new. On Linux it evolved from VServer to OpenVZ, and, more recently, to Linux Containers (LXC). It is not Linux-specific; on FreeBSD it's called "Jails", while on Solaris it’s "Zones". Some of those have been available for a decade and are widely used to provide VPS (Virtual Private Servers), cheaper alternatives to virtual machines or physical servers. But containers have other purposes and are increasingly popular as the core components of public and private Platform-as-a-Service (PAAS), among others.
Just like a virtual machine, a Linux Container can run (almost) anywhere. But containers have many advantages over VMs: they are lightweight and easier to manage. After operating a large-scale PAAS for a few years, dotCloud realized that with those advantages, containers could become the perfect format for software delivery, since that is how dotCloud delivers from their build system to their hosts. To make it happen everywhere, dotCloud open-sourced Docker, the next generation of the containers engine powering its PAAS. Docker has been extremely successful so far, being adopted by many projects in various fields: PAAS, of course, but also continuous integration, testing, and more.
5. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
6. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
15. Many targets
● your local development environment
● your coworkers' developement environment
● your Q&A team's test environment
● some random demo/test server
● the staging server(s)
● the production server(s)
● bare metal
● virtual machines
● shared hosting
23. Solution to the transport problem:
the intermodal shipping container
24. Solution to the transport problem:
the intermodal shipping container
● 90% of all cargo now shipped in a standard container
● faster and cheaper to load and unload on ships
(by an order of magnitude)
● less theft, less damage
● freight cost used to be >25% of final goods cost, now <3%
● 5000 ships deliver 200M containers per year
26. Linux containers...
● run everywhere
– regardless of kernel version
– regardless of host distro
– (but container and host architecture must match*)
● run anything
– if it can run on the host, it can run in the container
– i.e., if it can run on a Linux kernel, it can run
*Unless you emulate CPU with qemu and binfmt
27. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
29. High level approach:
it's a lightweight VM
● own process space
● own network interface
● can run stuff as root
● can have its own /sbin/init
(different from the host)
« Machine Container »
30. Low level approach:
it's chroot on steroids
● can also not have its own /sbin/init
● container = isolated process(es)
● share kernel with host
● no device emulation (neither HVM nor PV)
« Application Container »
31. Separation of concerns:
Dmitry the Developer
● inside my container:
– my code
– my libraries
– my package manager
– my app
– my data
32. Separation of concerns:
Oleg the Ops guy
● outside the container:
– logging
– remote access
– network configuration
– monitoring
33. How does it work?
Isolation with namespaces
● pid
● mnt
● net
● uts
● ipc
● user
34. pid namespace
jpetazzo@tarrasque:~$ ps aux | wc -l
212
jpetazzo@tarrasque:~$ sudo docker run -t -i ubuntu bash
root@ea319b8ac416:/# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 18044 1956 ? S 02:54 0:00 bash
root 16 0.0 0.0 15276 1136 ? R+ 02:55 0:00 ps aux
(That's 2 processes)
39. user namespace
● no « demo » for this one... Yet!
● UID 0→1999 in container C1 is mapped to
UID 10000→11999 in host;
UID 0→1999 in container C2 is mapped to
UID 12000→13999 in host; etc.
● required lots of VFS and FS patches (esp. XFS)
● what will happen with copy-on-write?
– double translation at VFS?
– single root UID on read-only FS?
40. How does it work?
Isolation with cgroups
● memory
● cpu
● blkio
● devices
41. memory cgroup
● keeps track pages used by each group:
– file (read/write/mmap from block devices; swap)
– anonymous (stack, heap, anonymous mmap)
– active (recently accessed)
– inactive (candidate for eviction)
● each page is « charged » to a group
● pages can be shared (e.g. if you use any COW FS)
● Individual (per-cgroup) limits and out-of-memory killer
42. cpu and cpuset cgroups
● keep track of user/system CPU time
● set relative weight per group
● pin groups to specific CPU(s)
– Can be used to « reserve » CPUs for some apps
– This is also relevant for big NUMA systems
43. blkio cgroups
● keep track IOs for each block device
– read vs write; sync vs async
● set relative weights
● set throttle (limits) for each block device
– read vs write; bytes/sec vs operations/sec
Note: earlier versions (pre-3.8) didn't account async correctly.
3.8 is better, but use 3.10 for best results.
47. Efficiency: almost no overhead
● processes are isolated, but run straight on the host
● CPU performance
= native performance
● memory performance
= a few % shaved off for (optional) accounting
● network performance
= small overhead; can be optimized to zero overhead
48. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
49. Efficiency: storage-friendly
● unioning filesystems
(AUFS, overlayfs)
● snapshotting filesystems
(BTRFS, ZFS)
● copy-on-write
(thin snapshots with LVM or device-mapper)
This is now being integrated with low-level LXC tools as well!
50. Efficiency: storage-friendly
● provisioning now takes a few milliseconds
● … and a few kilobytes
● creating a new base image (from a running container) takes a
few seconds (or even less)
52. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
53. What can Docker do?
● Open Source engine to commoditize LXC
● using copy-on-write for quick provisioning
● allowing to create and share images
● standard format for containers
(stack of layers; 1 layer = tarball+metadata)
● standard, reproducible way to easily build trusted images
(Dockerfile)
54. Docker: authoring images
● you can author « images »
– either with « run+commit » cycles, taking snapshots
– or with a Dockerfile (=source code for a container)
– both ways, it's ridiculously easy
● you can run them
– anywhere
– multiple times
55. Dockerfile example
FROM ubuntu
RUN apt-get -y update
RUN apt-get install -y g++
RUN apt-get install -y erlang-dev erlang-manpages erlang-base-hipe ...
RUN apt-get install -y libmozjs185-dev libicu-dev libtool ...
RUN apt-get install -y make wget
RUN wget http://.../apache-couchdb-1.3.1.tar.gz | tar -C /tmp -zxf-
RUN cd /tmp/apache-couchdb-* && ./configure && make install
RUN printf "[httpd]nport = 8101nbind_address = 0.0.0.0" >
/usr/local/etc/couchdb/local.d/docker.ini
EXPOSE 8101
CMD ["/usr/local/bin/couchdb"]
56. Yes, but...
● « I don't need Docker;
I can do all that stuff with LXC tools, rsync, some scripts! »
● correct on all accounts;
but it's also true for apt, dpkg, rpm, yum, etc.
● the whole point is to commoditize,
i.e. make it ridiculously easy to use
59. What this really means…
● instead of writing « very small shell scripts » to
manage containers, write them to do the rest:
– continuous deployment/integration/testing
– orchestration
● = use Docker as a building block
● re-use other people images (yay ecosystem!)
60. Docker: sharing images
● you can push/pull images to/from a registry
(public or private)
● you can search images through a public index
● dotCloud maintains a collection of base images
(Ubuntu, Fedora...)
● satisfaction guaranteed or your money back
61. Docker: not sharing images
● private registry
– for proprietary code
– or security credentials
– or fast local access
● the private registry is available
as an image on the public registry
(yes, that makes sense)
62. Typical workflow
● code in local environment
(« dockerized » or not)
● each push to the git repo triggers a hook
● the hook tells a build server to clone the code and run « docker
build » (using the Dockerfile)
● the containers are tested (nosetests, Jenkins...),
and if the tests pass, pushed to the registry
● production servers pull the containers and run them
● for network services, load balancers are updated
63. Hybrid clouds
● Docker is part of OpenStack « Havana »,
as a Nova driver + Glance translator
● typical workflow:
– code on local environment
– push container to Glance-backed registry
– run and manage containers using OpenStack APIs
64. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
65. What's Docker exactly?
● rewrite of dotCloud internal container engine
– original version: Python, tied to dotCloud's internal stuff
– released version: Go, legacy-free
● the Docker daemon runs in the background
– manages containers, images, and builds
– HTTP API (over UNIX or TCP socket)
– embedded CLI talking to the API
● Open Source (GitHub public repository + issue tracking)
● user and dev mailing lists
67. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
69. Docker: the ecosystem
● Cocaine (PAAS; has Docker plugin)
● CoreOS (full distro based on Docker)
● Deis (PAAS; available)
● Dokku (mini-Heroku in 100 lines of bash)
● Flynn (PAAS; in development)
● Maestro (orchestration from a simple YAML file)
● OpenStack integration (in Havana, Nova has a Docker driver)
● Shipper (fabric-like orchestration)
70. Cocaine integration
● what's Cocaine?
– Open Source PaaS from Yandex
– modular: can switch logging, storage, etc. without changing apps
– infrastructure abstraction layer + service discovery
– monitoring: metrics collection; load balancing
● why Docker?
– Cocaine initially used cgroups
– wanted to add LXC for better isolation and resource control
– heard about Docker at the right time
71. device-mapper thin snapshots
(aka « thinp »)
● start with a 10 GB empty ext4 filesystem
– snapshot: that's the root of everything
● base image:
– clone the original snapshot
– untar image on the clone
– re-snapshot; that's your image
● create container from image:
– clone the image snapshot
– run; repeat cycle as many times as needed
72. AUFS vs THINP
AUFS
● easy to see changes
● small change =
copy whole file
● ~42 layers
● patched kernel
(Debian, Ubuntu OK)
● efficient caching
● no quotas
THINP
● must diff manually
● small change =
copy 1 block (100k-1M)
● unlimited layers
● stock kernel (>3.2)
(RHEL 2.6.32 OK)
● duplicated pages
● FS size acts as quota
73. Misconceptions about THINP
● « performance degradation »
no; that was with « old » LVM snapshots
● « can't handle 1000s of volumes »
that's LVM; Docker uses devmapper directly
● « if snapshot volume is out of space,
it breaks and you lose everything »
that's « old » LVM snapshots; thinp halts I/O
● « if still use disk space after 'rm -rf' »
no, thanks to 'discard passdown'
74. Other features in 0.7
● links
– linked containers can discover each other
– environment variable injection
– allows to expose remote services thru containers
(implements the ambassador pattern)
– side-effect: container naming
● host integration
– we ♥ systemd
75. 0.8 and beyond
● beam
– introspection API
– based on Redis protocol
(i.e. all Redis clients work)
– works well for synchronous req/rep and streams
– reimplementation of Redis core in Go
– think of it as « live environment variables »,
that you can watch/subscribe to
● and much more