How to integrate modern containers into classical system monitoring. This covers both LXC (System Containers) and Docker/Kubernetes (Application) containers.
It starts with a brief introduction to the world of containers and then uses two examples (check_lxc and check_rancher2) to show how to monitor the two types of containers.
It’s all about the... containers!
Monitoring containers OSMC 2018 Nuremberg @ClaudioKuenzler
How to integrate modern containers into classical system monitoring
Brace for impact!
What are containers?
- A real world comparison
- Brief introduction
LXC – System Containers
- Short Introduction
- Monitoring System Containers
- check_lxc
(Application) Containers
- Quick Introduction
- Monitoring Challenges
- Rancher (2)
- check_rancher2
This guy
… is called Claudio Kuenzler
… lives in Switzerland
… reports to a master process, co-managing two forks
… works at NZZ Media Group and co-founded Infiniroot.com
… is @Napsty on Github and @ClaudioKuenzler on Twitter
… runs a blog at claudiokuenzler.com
… started using Nagios® in 2006, discovered #monitoringlove
… maintains several monitoring plugins, best known are:
check_esxi_hardware, check_smart, check_equallogic
… has been using containers since 2012
What are containers?
20 feet standard container (1 TEU)
~ 20 feet long (6.1 m)
~ 8 feet wide (2.4 m)
~ 8.5 feet high (2.6 m)
~ 33 m³ volume
40 feet standard container (= 2 TEU)
~ 40 feet long (12.2 m)
~ 8 feet wide (2.4 m)
~ 8.5 feet high (2.6 m)
~ 67 m³ volume
TEU = Twenty-foot equivalent unit
That's a container!
They stack up!
By stacking containers together:
- Efficiency (use of space)
- Stability (they don't wobble around)
- Security (for on-board staff)
In a perfect world
The largest container vessel at the time of this talk (2018), the “OOCL Hong Kong”, has a capacity of 21,413 TEU.
The world is not perfect
Again… What are containers?
When we talk about containers, do we mean…?
- Linux Containers (LXC) – aka System Containers → Lightweight VM
- (Docker) Containers – aka Application Containers → Single process (mostly)
Virtualization on process level
- The hard truth: a container is a process
- Processes started inside a container are (obviously) child processes of that container process
- Containers use the same Kernel as the host (it’s a process, remember?)
- Direct hardware access through the Kernel (no hardware virtualization)
- Resource allocation/limits using cgroups
A brief history...
Containers are believed to be “new” but are actually “old” in the Unix world
- FreeBSD “Jails” have existed since FreeBSD 4.0 (2000)
- Solaris “Zones” have existed since Solaris 10 (2004)
- OpenVZ “Containers” have existed since 2005
- Linux Containers (LXC) have existed since 2007
First developed at IBM, now maintained by Canonical (Ubuntu)
AKA System Containers (to distinguish them from Docker containers)
- Docker Containers (based on liblxc) have existed since 2013
Since 2014 with their own library (libcontainer)
AKA Application Containers
- containerd, the container runtime of Docker Inc. (since 2015)
In 2017, Docker Inc. donated containerd to the CNCF
LXC – Linux Containers
LXC – Linux Containers
- Can be compared to a classical virtual machine w/o hardware virtualization
- Dedicated virtual NIC (bridged veth by default), full network access
- Dedicated file system (rootfs, best practice: LVM LV)
- Dedicated namespaces for isolation
- cannot see processes of the host
- nor processes of other containers on the same host
- Dedicated init system
- Basically: A super fast VM! (Fast creation, fast boot)
- Install monitoring agents/daemons as you would do on a VM
LXC – Monitoring memory
root@host:~# free -m
              total    used    free  shared  buff/cache  available
Mem:         120869    7296   46183    2839       67390     109749
Swap:         15258       0   15258
root@container:~# free -m
              total    used    free  shared  buff/cache  available
Mem:         120869    7296   46182    2839       67390     109748
Swap:         15258       0   15258
That’s the same!
- Container sees total capacity and used memory of the host
- Not able to determine its own memory usage within the container
- Therefore, do not use memory monitoring plugins (e.g. check_mem) inside the container
LXC – Monitoring memory
Above: Host
Below: Container
- Same memory usage
- Same CPU load
- Same uptime
- Tasks (procs) differ
LXC – Monitoring memory (LXCFS)
Above: Host
Below: Container
- Still same CPU load
- But memory usage differs
- Uptime now differs, too
- Tasks (procs) differ
- lxcfs virtualizes parts of /proc inside the container
- In Ubuntu, the lxcfs package is recommended when installing liblxc1 (since LXC 2.x)
- In Debian, it needs to be installed manually
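Whether "free -m" inside a container can be trusted therefore depends on lxcfs being active. A minimal sketch to check this from inside the container, assuming lxcfs shows up in the mount table with the filesystem type "fuse.lxcfs" on /proc/meminfo (as in typical LXC setups). The function name uses_lxcfs is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) if /proc/meminfo is provided by lxcfs.
# Assumption: lxcfs mounts appear with filesystem type "fuse.lxcfs".
# Optional argument: mount table to inspect (defaults to /proc/mounts).
uses_lxcfs() {
  local mounts=${1:-/proc/mounts}
  # Field 2 = mount point, field 3 = filesystem type
  awk '$2 == "/proc/meminfo" && $3 == "fuse.lxcfs" { found = 1 }
       END { exit !found }' "$mounts"
}
```

Inside the container, "uses_lxcfs && free -m" only prints memory figures when they reflect the container's own usage.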
LXC – Monitoring memory (LXCFS)
root@host:~# free -m
              total    used    free  shared  buff/cache  available
Mem:           3945     229    2894       0         821       3470
Swap:          3814       0    3814
root@container:~# free -m
              total    used    free  shared  buff/cache  available
Mem:           3945      93    3663       0         187       3851
Swap:          3814       0    3814
Used is not the same anymore!
- Able to show its own memory usage inside the container
- Container still sees the total memory capacity of the host
- However: the “available” calculation is misleading (that’s wrong!) because not all memory consumers are seen
- If you run a memory check within the container, use the “used” column only
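Following the last point, a memory check inside an lxcfs-enabled container should look only at the “used” column. A minimal sketch, assuming the column layout of "free -m" shown above; both function names and the threshold values are hypothetical, and this is not an existing plugin.

```shell
# Hypothetical helper: print the "used" column (MiB) of the "Mem:" line
# from `free -m` output read on stdin.
free_used_mb() {
  awk '$1 == "Mem:" { print $3; exit }'
}

# Hypothetical check: compare used memory (MiB) against warning/critical
# thresholds (MiB). Exit codes follow the plugin convention:
# 0 = OK, 1 = WARNING, 2 = CRITICAL.
check_used_mb() {
  local used=$1 warn=$2 crit=$3
  if [ "$used" -ge "$crit" ]; then
    echo "MEMORY CRITICAL - used ${used} MB"; return 2
  elif [ "$used" -ge "$warn" ]; then
    echo "MEMORY WARNING - used ${used} MB"; return 1
  fi
  echo "MEMORY OK - used ${used} MB"
  return 0
}
```

Usage inside the container (thresholds are examples only): check_used_mb "$(free -m | free_used_mb)" 2000 3000. The host's “total” and “available” values are deliberately ignored.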
LXC – Monitoring CPU usage
- Container will always see host usage
- Currently not possible to have a “different” view inside container
- (Clumsy) Approach: Compare jiffies on the host
- Example: time spent on system CPU (kernel space) within 5s:
12215974 – 12215646 = 328
- Compared with the host’s jiffies, this gives an idea of the container’s usage
- Future: monitoring CPU usage from inside the container with cgroups → soon
root@host:~# lxc-cgroup -n container cpuacct.stat
user 41618658
system 12215646
root@host:~# sleep 5
root@host:~# lxc-cgroup -n container cpuacct.stat
user 41619791
system 12215974
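The jiffies comparison above can be scripted. A minimal sketch, assuming cgroup v1 with the cpuacct controller (as read by lxc-cgroup here) and that cpuacct.stat reports user and system time in USER_HZ ticks, as described in the kernel’s cgroup-v1 documentation. The function names are hypothetical, and this is not necessarily how check_lxc computes its value.

```shell
# Hypothetical helper: sum the "user" and "system" ticks of one
# cpuacct.stat snapshot read on stdin (values are in USER_HZ ticks).
cpuacct_total() {
  awk '$1 == "user" || $1 == "system" { sum += $2 }
       END { printf "%.0f\n", sum }'
}

# Hypothetical helper: CPU usage in percent of ONE CPU over the interval.
# Args: <ticks_before> <ticks_after> <interval_seconds> <user_hz>
cpu_percent() {
  local before=$1 after=$2 interval=$3 hz=$4
  echo $(( (after - before) * 100 / (interval * hz) ))
}
```

On the LXC host, for example:
before=$(lxc-cgroup -n container cpuacct.stat | cpuacct_total)
sleep 5
after=$(lxc-cgroup -n container cpuacct.stat | cpuacct_total)
cpu_percent "$before" "$after" 5 "$(getconf CLK_TCK)"
With the two samples above (1133 user + 328 system ticks in about 5 s) and USER_HZ = 100, this yields 292. The value is relative to one CPU, so on a multi-core host it can exceed 100 (here roughly three cores’ worth).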
LXC – Using check_lxc
- check_lxc is a monitoring (workaround) plugin executed on the LXC host
- Uses cgroup values for memory and cpu checks (lxc-cgroup)
- Checks container autostart configuration
- Correctly monitors memory usage of container(s), incl. Swap
- Gives an idea about container cpu usage
root@host:~# /usr/lib/nagios/plugins/check_lxc.sh -n container01 -t auto
LXC AUTOSTART OK
root@host:~# /usr/lib/nagios/plugins/check_lxc.sh -n container01 -t mem
LXC container01 OK Used Memory: 571 MB|mem=598769664B;0;0;0;0
root@host:~# /usr/lib/nagios/plugins/check_lxc.sh -n container01 -t cpu
LXC container01 OK CPU Usage: 27%|cpu=27%;;;0;0
root@host:~# /usr/lib/nagios/plugins/check_lxc.sh -n container99 -t cpu
LXC container99 OK CPU Usage: 3%|cpu=3%;;;0;0
LXC – check_lxc in Icinga2
- Deploy checks of LXC containers with “apply” rules
- Example: Define a custom variable “containers” on the host object:
object Host "lxchost01" {
  import "generichost"
  address = "192.168.100.101"
  […]
  # Containers running on this host
  vars.containers = [ "container01", "container02", "container99" ]
}
- Apply rule (here used with an NRPE remote check):
apply Service "LXC Memory " for (container in host.vars.containers) {
  import "genericservice"
  check_command = "nrpe"
  vars.nrpe_command = "check_lxc"
  vars.nrpe_arguments = [ container, "mem" ]
  assign where host.address && host.vars.containers
}
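Keeping vars.containers in sync with the containers actually present on a host is manual work. A minimal sketch of a helper that turns a list of container names (one per line on stdin) into the custom variable line used above; the function name is hypothetical, and feeding it from "lxc-ls -1" (one container per line) is an assumption about your LXC tooling.

```shell
# Hypothetical helper: build the Icinga 2 custom variable line from
# container names read on stdin, one per line (empty lines are skipped).
containers_to_icinga_var() {
  awk 'NF { names = names (n++ ? ", " : "") "\"" $1 "\"" }
       END { printf "vars.containers = [ %s ]\n", names }'
}
```

For example, "lxc-ls -1 | containers_to_icinga_var" on the LXC host prints a line that can be pasted into (or templated into) the Host object.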
LXC – check_lxc in Icinga2
- All (defined) containers are monitored
- Not much config changes needed
- Quick overview of which container uses
→ most memory
→ most cpu
- check_lxc is still in development
→ contributions welcome
LXC – Recap
- Can “mostly” be monitored the same way as a classical host or VM
- Some resources must be monitored from “outside” (= on the LXC host)
- CPU resource monitoring might soon work from “inside”, too!
Monitoring...   Where?    Example Plugin
Processes       Inside    check_procs
Filesystem      Inside    check_disk
Network IO      Inside    check_netio
Memory usage    Outside   check_lxc
CPU usage       Outside   check_lxc
Disk IO         Outside   check_diskio
Application Containers
Application Containers (AC)
- Single-process application running in a container (supposed to be)
- Stateless (no data stored inside the container)
- Dedicated virtual NIC (veth by default), NATted network access (expose)
- Dedicated file system (aufs or overlayfs by default, sharing the host’s capacity)
- Dedicated namespaces for isolation
- cannot see processes of the host
- nor processes of other containers on the same host
- No init system – Just a process to start, remember?
- Great for quickly scaling up redundant applications (behind a LB)
- We rarely hear “Docker” anymore these days. Kubernetes! containerd!
AC – Monitoring challenges
- No additional software/daemons “allowed” (single process, remember?)
- No direct network access (NAT via host bridge, iptables)
- Expose ports? Yes, but that is a workaround and legacy (→ Ingress)
- Standalone Docker means a lot of manual work (→ use Orchestration)
- Stop treating application containers as a classical host/network object
- Think of it as a process, not a host
AC – Management w/ Rancher
- Rancher is a “management layer” on top of the orchestration layer
→ Container Runtime → Orchestration (e.g. Kubernetes) → Rancher
- Intuitive user interface and a flexible HTTP API (→ CI/CD!)
- Rancher was chosen after >1y of internal Docker research, comparisons, tests
- In production since Q3 2017, with a total of >1200 containers (as of Oct 2018)
- Rancher 1.x uses Cattle orchestration
- Rancher 2.x is relatively new (April 2018), built on Kubernetes orchestration
- Rancher 2.x: running in test and staging environments, first production environment soon
- Need to monitor the Rancher 2.x environments!
PS: No, I’m not affiliated with Rancher! Just a community user.
AC – Healthchecks! (Kubernetes)
- The container (pod) can be monitored using readiness and liveness probes
- readinessProbe: Detect when the application is ready (e.g. startup delay)
- livenessProbe: Detect failures in application (e.g. http error)
- Allows multiple kinds of probes:
- Run a command (e.g. cat /tmp/healthy) → exit 0 = OK
- HTTP Check (e.g. GET /health on port 8080) → Status 2xx/3xx = OK
- TCP Check (e.g. establish connection to port 8080) → Established = OK
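The success rules of the probe types above can be illustrated in shell. This is only a sketch of the semantics: Kubernetes runs the probes itself (the kubelet) from the pod specification, so you would not deploy these functions. The function names, the URL argument, and the /tmp/healthy file are hypothetical examples taken from the bullets above.

```shell
# HTTP probe rule: a status code from 200 to 399 counts as healthy.
http_status_ok() {
  local code=$1
  [ "$code" -ge 200 ] && [ "$code" -lt 400 ]
}

# Hypothetical HTTP probe: fetch only the status code of a health
# endpoint (e.g. http://localhost:8080/health) and apply the rule above.
http_probe() {
  local url=$1 code
  code=$(curl -s -o /dev/null -w '%{http_code}' "$url") || return 1
  http_status_ok "$code"
}

# Command probe: healthy if the command exits 0 (here: /tmp/healthy exists
# and is readable, like "cat /tmp/healthy").
exec_probe() {
  cat /tmp/healthy >/dev/null 2>&1
}
```

A 404 or 500 from the health endpoint, as well as a refused connection, counts as a failed probe.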
AC – Healthcheck in Rancher 2
AC – Using check_rancher2
- check_rancher2 is a monitoring plugin which uses Rancher 2’s API
- Can run anywhere (requires http/https connection to API)
- Checks status of:
- Cluster(s)
- Project(s)
- Workload(s) (→ Services)
- Pod(s) (→ Containers)
- Consider the API endpoint as the “host”, check types as its services
- Future (hopefully):
- Workload or Pod usage (cpu, memory, network statistics), depends on https://github.com/rancher/rancher/issues/14230
AC – check_rancher2 example
- Workload stuck in “removing”
- New workload (importer) already active
- Monitoring needs to alert me about this!
AC – check_rancher2 example
$ ./check_rancher2.sh -H rancher2.example.com -U token-xxxxx -P longsecretpass -S -t pod -p c-r8ss7:p-85rmm -o importer-8bf85dcc9-r5rtn -n gamma
CHECK_RANCHER2 CRITICAL Pod importer-8bf85dcc9-r5rtn is removing|'pod_active'=0;;;; 'pod_error'=1;;;;
- Plugin connects to Rancher 2 API using the information from the parameters:
-H: API Host/DNS/IP
-U: User-ID (token-xxxxx)
-P: Password for User-ID
-S: Use SSL (https)
-t: Use “pod” check type
-p: Project name (contains cluster ID, too)
-o: Pod name (optional)
-n: Namespace (optional, required for specific pod name)
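The CRITICAL result above follows the usual plugin convention: one status line with performance data after the "|", plus an exit code (0 = OK, 2 = CRITICAL). A hypothetical sketch of that last step, mapping a pod state (as returned by an orchestration API) to output and exit code. This is not check_rancher2’s actual code; the OK perfdata values are assumptions that mirror the example output.

```shell
# Hypothetical helper (not from check_rancher2): turn a pod's state into
# plugin output and an exit code. Anything other than "active" is CRITICAL.
pod_state_check() {
  local pod=$1 state=$2
  if [ "$state" = "active" ]; then
    echo "CHECK_RANCHER2 OK Pod $pod is active|'pod_active'=1;;;; 'pod_error'=0;;;;"
    return 0
  fi
  echo "CHECK_RANCHER2 CRITICAL Pod $pod is $state|'pod_active'=0;;;; 'pod_error'=1;;;;"
  return 2
}
```

A pod stuck in “removing”, as in the example, therefore raises a CRITICAL alert instead of silently lingering next to the new, active workload.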
Application Containers – Recap
- It’s not only Docker anymore → containerd (+ runc, kata) as container engine
- An application container is not a classical host
- Think of it as an application/process
- Use orchestration/container management (Kubernetes, Rancher, OpenShift, ...)
- Set up health checks → Health checks are your monitoring go-to!
- Monitor these health checks using the orchestration/management APIs (Rancher 2: check_rancher2)
- There might also be plugins which use kubectl locally
References and links
- Lost at sea: https://gcaptain.com/number-of-containers-lost-at-sea-falling-survey-shows/
- LXC: https://linuxcontainers.org/
- cgroup-v1: https://www.kernel.org/doc/Documentation/cgroup-v1/
- cgroup-v2: https://www.kernel.org/doc/Documentation/cgroup-v2.txt
- Docker: https://www.docker.com/
- Kubernetes: https://kubernetes.io/
- containerd: https://containerd.io/
- Rancher: https://rancher.com/
- check_lxc: https://claudiokuenzler.com/monitoring-plugins/check_lxc.php
- check_rancher2: https://claudiokuenzler.com/monitoring-plugins/check_rancher2.php
Thank you
[[ $questions -eq 0 ]] && exit 0
Speaker notes
Infiniroot: Where we provide open source consulting and solutions for technical challenges and managed server hosting
Questions to the audience:
- Who has heard/not heard about containers?
- Who is already using containers?
- Who is using containers in production?
Let’s start with a very basic question: What are containers?
Cargo World:
A container is a fixed-size unit that is used around the globe.
The base container has a length of 20 feet. In short, this is called a TEU (Twenty-foot Equivalent Unit).
Doubling its length gives the 40 feet long container.
Yes, you guessed it right. The two standard sizes fit together like LEGO blocks.
By stacking containers together, the transportation is more efficient but also more secure.
IT world: this is what we call redundancy and high availability!
In a perfect world, the vessel ships out with thousands of containers.
There is never a storm. All containers stay aboard.
In fact, 10 years ago I worked for a couple of months at an international shipping company. In my first week there, I got a quick introduction to shipping and containers. At the end, I had just one question: Do containers sometimes fall overboard?
The answer was short and straightforward but not what I expected: Oh yes, all the time!
According to statistics on the Internet, around 1500 containers fall overboard – per YEAR.
And this is the part where we can compare the shipping containers with computer containers: They can crash. That’s why we need to monitor them!
If we compare the output of free -m inside of the container and on the host, the output is the same!
top or htop is actually a great command to visually compare the container’s and the host’s usage.
We can clearly see that the container shows the same information as the host, except for the number of processes shown as tasks.
The container is only aware of its own processes, but not how much resources they use.
But something very interesting happens when the additional package LXCFS is installed.
The container is now suddenly able to see its own memory usage.
Thanks to lxcfs, the uptime value now shows the real uptime of the container itself, not the uptime of the host.
This is a great help for a quick analysis inside the container.
However: CPU usage is still the same as on the host.
When we use the same “free -m” commands from above but now with LXCFS installed, we can now see a difference in the “used” column.
This means the container is aware of its own processes and correctly shows memory usage.
But because the container still sees the total memory capacity of the host, memory calculations are wrong.
Remember: The container is unable to see processes outside of itself. How can it know how much memory the other containers or the host itself consumes?
Calculating the available memory from the host’s total minus the container’s used memory therefore gives a wrong result.
Monitoring CPU usage is a little bit more tricky.
As you could see from htop before, both container and host show the same usage.