Using Kubernetes and CoreOS to increase scalability and availability. Presentation at the Athens Docker meetup http://www.meetup.com/Docker-Athens/events/226277352/
8. Chapter 1
The container
A lightweight VM?
A chrooted process?
An application packaging technology?
9. Containers kick ass despite limitations
● Great for dev on a single node.
● Ideal for CI.
● It gets tricky in multi-node
production environments.
● A lot of hacking is required to
orchestrate deployments,
rollbacks, scaling, monitoring
and migration.
10. Chapter 2
CoreOS
A lightweight Linux distro for clustered
deployments that uses containers to
manage your services at a higher level
of abstraction, instead of installing
packages via yum or apt.
11. etcd
● A distributed key-value store that
provides a reliable way to store data
across a cluster of machines.
● Values can be watched, to trigger
app reconfigurations when they
change.
● Odd-sized clusters are guaranteed
to reach consensus.
● JSON/REST API.
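As an illustration of the JSON/REST API, a value can be set and then watched over plain HTTP. A sketch, assuming a local etcd listening on its default client port; the key name is made up:

```shell
# Store a value via the v2 keys API (key name is hypothetical)
curl -L -X PUT http://127.0.0.1:2379/v2/keys/config/feature -d value="on"

# Block until the key changes, to trigger an app reconfiguration
curl -L "http://127.0.0.1:2379/v2/keys/config/feature?wait=true"
```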
12. flannel
● An etcd-backed network
fabric for containers.
● A virtual network that
gives a subnet to each
host for use with
container runtimes.
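flannel reads its own configuration from etcd (by default under the key /coreos.com/network/config). A minimal sketch, with an example address range:

```json
{
  "Network": "10.1.0.0/16",
  "SubnetLen": 24
}
```

Each host then leases a /24 subnet out of 10.1.0.0/16 for its local containers.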
13. fleet
● An etcd-backed,
distributed init system
(distributed systemd).
● Treats the CoreOS cluster as
if it shared a single init system.
● Graceful updates of
CoreOS across the cluster.
● Handles machine failures.
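A minimal sketch of a fleet unit — the unit name and container image are hypothetical; the [X-Fleet] section carries fleet's scheduling hints:

```ini
[Unit]
Description=My app in a container
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-/usr/bin/docker rm -f myapp
ExecStart=/usr/bin/docker run --name myapp nginx
ExecStop=/usr/bin/docker stop myapp

[X-Fleet]
# Do not schedule two instances of this unit on the same machine
Conflicts=myapp*.service
```

Submitted with e.g. `fleetctl start myapp.service`, fleet picks a machine that satisfies the constraints and restarts the unit elsewhere if that machine fails.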
14. rkt
● Container runtime by
CoreOS.
● rkt is an implementation
of the App Container Spec.
● rkt features native support
for fetching and running
Docker container images.
17. Pods
● A co-located group of containers
with shared volumes. Always
executed on the same node.
● The smallest deployable units.
● Correspond to a co-located group
of applications running with a
shared context.
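A sketch of a pod with two co-located containers sharing a volume — names and images are made up, the fields follow the v1 API:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  volumes:
    - name: shared-data
      emptyDir: {}        # volume shared by both containers, lives as long as the pod
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html
    - name: sidecar
      image: busybox
      command: ["sh", "-c", "while true; do date > /data/index.html; sleep 5; done"]
      volumeMounts:
        - name: shared-data
          mountPath: /data
```

The sidecar writes into the shared volume and the nginx container serves it — both always land on the same node.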
18. Replication controllers
● Ensure that a specific
number of pod replicas are
running at any one time.
● Replace pods that are
deleted or terminated.
● Get rid of excess pods.
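A minimal sketch of a replication controller keeping three replicas of a hypothetical web pod alive (names and image are illustrative):

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: web
spec:
  replicas: 3             # target number of pods; deleted pods are replaced
  selector:
    app: web              # pods matching this label count toward the target
  template:               # pod template used to create replacements
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx
```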
19. Labels
● Key-value pairs attached to
pods and other resources.
● Specify identifying
properties of resources.
● Sets of objects can be
identified by label selectors
(e.g. version=2).
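For example, a pod carrying the labels below would be matched by the selector version=2 from the slide (the name and label values are illustrative):

```yaml
metadata:
  name: web-v2
  labels:
    app: web
    version: "2"
```

`kubectl get pods -l version=2` then lists exactly the pods carrying that label.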
20. Services
● An abstraction that uses a
selector to map an incoming
port to a set of pods.
● Needed to keep front-ends
stable, since pods are mortal
and each pod gets its own IP
address.
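A sketch of such a service, mapping port 80 to whatever pods currently carry the label app: web (names and ports are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web          # traffic is routed to pods matching this label
  ports:
    - port: 80        # stable port exposed by the service
      targetPort: 8000  # port the pod's container actually listens on
```

The service gets a stable virtual IP; the selector keeps routing to the matching pods however often they are rescheduled.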
21. Self-healing
● The user declares the
target state e.g. “I need 5
uwsgi & 10 celery servers
active at all times”.
● Kubernetes will restart,
replicate & reschedule
containers to ensure that
this target is met.
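The target state is declared, not scripted. The "5 uwsgi servers" part of such a declaration could be sketched as a replication controller spec fragment (the name is an assumption):

```yaml
# "I need 5 uwsgi servers active at all times"
spec:
  replicas: 5
  selector:
    app: uwsgi
```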
22. Scaling
● By increasing or decreasing the
replication factor of each pod,
respective services will scale up
or down.
● Auto-scaling of services
depending on pod CPU
utilization.
● New nodes can be added to
increase cluster capacity.
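In practice, changing the replication factor is a one-liner — the controller name "web" is hypothetical, and autoscaling support depends on the cluster version:

```shell
# Scale the "web" replication controller to 10 replicas
kubectl scale rc web --replicas=10

# Autoscale between 2 and 10 replicas, targeting 80% CPU utilization
kubectl autoscale rc web --min=2 --max=10 --cpu-percent=80
```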
25. High availability of Kubernetes can
be achieved with CoreOS (e.g. fleet),
but not without some serious effort...
High availability
of Kubernetes
26. Used to be an issue; promised to be
resolved in Kubernetes v1.1.1:
“included option to use native IP
tables offering an 80% reduction in
tail latency, an almost complete
elimination of CPU overhead”
Network
performance
27. Stateful services and Kubernetes do
not fit together well. There are some “exotic”
ways to solve the problem, but they
are either still in beta or under heavy
development (e.g. flocker)
Stateful
services
28. Kubernetes is configured to work out
of the box only for GCE and EC2. In
any other case, manual configuration
of load balancers and external DNS
services is to be expected.
Public Load
Balancer
External DNS
29. Kubernetes on top of CoreOS is a
completely new way of doing things...
operational workflows for DevOps
have to be heavily adjusted to this
new way of doing things…
You could end up building your own
tools around Kubernetes...
Operational
Management
31. ● One-click deployment!
● Replicate as much of the production setup as possible
● Everything pre-configured for the developer (e.g. add-ons)
Goals for the development process:
Our experience so far...
32. -
Ended up building our own
internal tools, aka mistctl.
Everything is ctl nowadays…
Does anyone remember tail -f?
33. +
Works locally but not in prod???
Not the case anymore... at least
most of the time.
34. Local dev
with
Kubernetes in
place
● Higher demands on developer’s
laptop power!
● Allows us to get rid of distro specific
dependencies.
● Adds new dependencies: Vagrant &
VirtualBox.
● Local dev environment is very close
to production.
37. CI Workflow explanation
1. Developer opens a PR against the staging
branch on GitHub, which triggers a Jenkins job.
2. Jenkins sets up the environment, runs the
tests and posts the results back to the PR.
3. Reviewer merges to staging branch after
manual code review.
4. Jenkins builds pre-production containers
and pushes them to the registry.
5. Jenkins triggers deploy on pre-production
cluster.
6. Jenkins runs stress tests against pre-
production cluster.
7. Reviewer compares stress test results with
previous results.
...
41. Monitoring
● Locally using cAdvisor, Heapster,
InfluxDB & Grafana.
● Externally using a 3rd party
service.
● Enhance Mist.io to monitor
Kubernetes clusters and to
trigger actions based on rules.
42. High Availability
● For the cluster services
through fleet: multiple
masters.
● For our own services,
especially the stateful
ones (e.g. MongoDB).
43. Disaster Recovery
● Deploy Kubernetes cluster on
another provider or region.
● Deploy our apps on the new
cluster.
● Restore data from latest
backup or perform live
migration, depending on the
type of disaster.