It’s all about the... containers!
Monitoring containers OSMC 2018 Nuremberg @ClaudioKuenzler
How to integrate modern containers into a classical system monitoring

Brace for impact!
What are containers?
- A real world comparison
- Brief introduction?
LXC – System Containers
- Short Introduction
- Monitoring System Containers
- check_lxc
(Application) Containers
- Quick Introduction
- Monitoring Challenges
- Rancher (2)
- check_rancher2

This guy
… is called Claudio Kuenzler
… lives in Switzerland
… reports to a master process, co-managing two forks
… works at NZZ Media Group and co-founded Infiniroot.com
… is @Napsty on Github and @ClaudioKuenzler on Twitter
… runs a blog at claudiokuenzler.com
… started using Nagios® in 2006, discovered #monitoringlove
… maintains several monitoring plugins, best known are:
check_esxi_hardware, check_smart, check_equallogic
… has been using containers since 2012

What are containers?

20-foot standard container (1 TEU)
- ~20 feet long (6.1 m)
- ~8 feet wide (2.4 m)
- ~8 feet high (2.5 m)
- ~33 m³ volume
40-foot standard container (2 TEU)
- ~40 feet long (12.2 m)
- ~8 feet wide (2.4 m)
- ~8 feet high (2.5 m)
- ~67 m³ volume
TEU = twenty-foot equivalent unit
That's a container!

They stack up!
By stacking containers together:
- Efficiency (use of space)
- Stability (they don't wobble around)
- Safety (for on-board staff)

In a perfect world
The current largest container vessel, the “OOCL Hong Kong”, holds a capacity of 21’413 TEU.

The world is not perfect

Again… What are containers?
When we talk about containers, do we mean…?
- Linux Containers (LXC), aka System Containers → lightweight VM
- (Docker) Containers, aka Application Containers → (mostly) a single process
Virtualization on process level
- The hard truth: a container is a process
- Processes started inside a container are (obviously) its child processes
- Containers use the same kernel as the host (it's a process, remember?)
- Direct hardware access through the kernel (no hardware virtualization)
- Resource allocation/limits using cgroups
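Because a container is just a process, the host can see which cgroups each containerized process belongs to by reading /proc/<pid>/cgroup. The following Python sketch parses that file into a controller-to-path mapping. It is an illustration only and is not part of any tool mentioned in this talk. The function names are hypothetical, and the sample path /lxc/container01 is just one typical cgroup v1 layout; the exact path differs between LXC versions and cgroup setups.

```python
from __future__ import annotations

from pathlib import Path


def parse_proc_cgroup(text: str) -> dict[str, str]:
    """Map each cgroup controller to the process's cgroup path.

    Every line of /proc/<pid>/cgroup reads "hierarchy-ID:controller-list:path".
    A cgroup v2 (unified) entry has an empty controller list ("0::/path")
    and is stored under the key "unified".
    """
    groups: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _hierarchy_id, controllers, path = line.split(":", 2)
        if not controllers:
            groups["unified"] = path
            continue
        for controller in controllers.split(","):
            groups[controller] = path
    return groups


def cgroups_of(pid: int | str = "self") -> dict[str, str]:
    """Read the cgroup membership of a process from the host's /proc."""
    return parse_proc_cgroup(Path(f"/proc/{pid}/cgroup").read_text())
```

To try it, run `cgroups_of(<PID of a container's init>)` on the LXC host. The resulting paths show which container the process belongs to.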

A brief history...
Containers are believed to be “new”, but are actually “old” in the Unix world
- FreeBSD “Jails” have existed since FreeBSD 4.0 (2000)
- Solaris “Zones” have existed since Solaris 10 (2004)
- OpenVZ “Containers” have existed since 2005
- Linux Containers (LXC) have existed since 2007
  - First developed at IBM, now maintained by Canonical (Ubuntu)
  - AKA System Containers (to distinguish them from Docker containers)
- Docker Containers (based on liblxc) have existed since 2013
  - Since 2014 with their own library (libcontainer)
  - AKA Application Containers
- containerd container runtime (since 2015) of Docker Inc.
  - In 2017 Docker Inc. donated containerd to the CNCF

LXC – Linux Containers

LXC – Linux Containers
- Can be compared to a classical virtual machine without hardware virtualization
- Dedicated virtual NIC (bridged veth by default), full network access
- Dedicated file system (rootfs; best practice: an LVM LV)
- Dedicated namespaces for isolation
  - cannot see the processes of the host
  - nor those of other containers on the same host
- Dedicated init system
- Basically: a super fast VM! (fast creation, fast boot)
- Install monitoring agents/daemons as you would on a VM

LXC – Monitoring processes
root@container:~# ps auxf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     27207  0.1  0.0  20068  4036 ?        Ss   21:21   0:00 /bin/bash
root     27274  0.0  0.0  38308  3348 ?        R+   21:21   0:00  \_ ps auxf
root         1  0.0  0.0 204336  6660 ?        Ss   Sep03   0:09 /sbin/init
root        16  0.0  0.2 374092 267440 ?       Ss   Sep03   1:53 /lib/systemd/systemd-journald
root        35  0.0  0.0  29664  2820 ?        Ss   Sep03   0:03 /usr/sbin/cron -f
root        36  0.0  0.0 250116  3492 ?        Ssl  Sep03   3:21 /usr/sbin/rsyslogd -n
nagios      71  0.0  0.0  23916  4196 ?        Ss   Sep03   0:03 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -f
root        75  0.0  0.0  12668  1644 pts/3    Ss+  Sep03   0:00 /sbin/agetty --noclear tty4 linux
[...]
→ Use check_procs!
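At its core, check_procs counts matching processes and compares the count with thresholds. The minimal Python sketch below illustrates that idea. It assumes processes are matched by the name in /proc/<pid>/comm, whereas the real check_procs offers many more filters. The function names are hypothetical, and the output only loosely resembles check_procs.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from pathlib import Path

# Standard Nagios/Icinga plugin return codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def iter_comms(proc_root: str = "/proc") -> Iterator[str]:
    """Yield the command name of every process visible under proc_root."""
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            # comm holds the command name, truncated to 15 characters
            yield (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited while we were scanning


def count_procs(command: str, comms: Iterable[str]) -> int:
    """Count processes whose command name equals `command`."""
    return sum(1 for comm in comms if comm == command)


def procs_state(command: str, count: int, minimum: int = 1) -> tuple[int, str]:
    """CRITICAL if fewer than `minimum` matching processes are running."""
    if count < minimum:
        return CRITICAL, f"PROCS CRITICAL: {count} process(es) with command name '{command}'"
    return OK, f"PROCS OK: {count} process(es) with command name '{command}'"
```

Inside the container, `procs_state("nrpe", count_procs("nrpe", iter_comms()))` only sees the container's own processes, because of the PID namespace.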

LXC – Monitoring filesystem(s)
root@container:~# df -h -x cgroup
Filesystem             Type         Size  Used Avail Use% Mounted on
/dev/vgdata/irczsrvc03 ext4          25G   17G  6.5G  73% /
none                   tmpfs        492K     0  492K   0% /dev
proc                   proc            0     0     0     /proc
proc                   proc            0     0     0     /proc/sys/net
proc                   proc            0     0     0     /proc/sys
proc                   proc            0     0     0     /proc/sysrq-trigger
sysfs                  sysfs           0     0     0     /sys
sysfs                  sysfs           0     0     0     /sys
sysfs                  sysfs           0     0     0     /sys/devices/virtual/net
sysfs                  sysfs           0     0     0     /sys/devices/virtual/net
fusectl                fusectl         0     0     0     /sys/fs/fuse/connections
devpts                 devpts          0     0     0     /dev/console
devpts                 devpts          0     0     0     /dev/pts
devpts                 devpts          0     0     0     /dev/tty1
tmpfs                  tmpfs         60G     0   60G   0% /dev/shm
tmpfs                  tmpfs         60G  169M   59G   1% /run
tmpfs                  tmpfs        5.0M     0  5.0M   0% /run/lock
tmpfs                  tmpfs         60G     0   60G   0% /sys/fs/cgroup
mqueue                 mqueue          0     0     0     /dev/mqueue
binfmt_misc            binfmt_misc     0     0     0     /proc/sys/fs/binfmt_misc
hugetlbfs              hugetlbfs       0     0     0     /dev/hugepages
→ Use check_disk!
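The filesystem check inside the container works just like on a VM. The Python sketch below shows roughly how a disk check can derive its percentage from statvfs values. It computes usage the way df does, as used / (used + available), so blocks reserved for root are excluded. The thresholds and function names are illustrative assumptions, not check_disk's actual defaults. The rounded values for / in the output above give 17 / (17 + 6.5) ≈ 72.3%. That is close to the 73% df prints, since df works with exact block counts and rounds up.

```python
from __future__ import annotations

import os

OK, WARNING, CRITICAL = 0, 1, 2


def usage_percent(used: float, avail: float) -> float:
    """df-style Use%: used / (used + available), root-reserved blocks excluded."""
    total = used + avail
    return 100.0 * used / total if total else 0.0


def disk_state(used: float, avail: float,
               warn_pct: float = 80.0, crit_pct: float = 90.0) -> tuple[int, float]:
    """Map a usage percentage to a plugin state (thresholds are examples)."""
    pct = usage_percent(used, avail)
    if pct >= crit_pct:
        return CRITICAL, pct
    if pct >= warn_pct:
        return WARNING, pct
    return OK, pct


def check_mount(path: str = "/", warn_pct: float = 80.0,
                crit_pct: float = 90.0) -> tuple[int, float]:
    """Evaluate one mount point using statvfs (Unix only)."""
    st = os.statvfs(path)
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    avail = st.f_bavail * st.f_frsize
    return disk_state(used, avail, warn_pct, crit_pct)
```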

LXC – Monitoring memory
That’s the same!
root@host:~# free -m
              total        used        free      shared  buff/cache   available
Mem:         120869        7296       46183        2839       67390      109749
Swap:         15258           0       15258
root@container:~# free -m
Mem:         120869        7296       46182        2839       67390      109748
Swap:         15258           0       15258
- The container sees the total capacity and used memory of the host
- It is not able to determine its own memory usage within the container
- Therefore, do not use a memory monitoring plugin inside the container (e.g. check_mem)

LXC – Monitoring memory
[Screenshots: host above, container below]
- Same memory usage
- Same CPU load
- Same uptime
- Tasks (procs) differ

LXC – Monitoring memory (LXCFS)
[Screenshots: host above, container below]
- Still the same CPU load
- But memory usage differs
- Uptime now differs, too
- Tasks (procs) differ
- lxcfs virtualizes parts of /proc inside the container
- In Ubuntu, the lxcfs package is recommended when installing liblxc1 (since LXC 2.x)
- In Debian, it needs to be installed manually

LXC – Monitoring memory (LXCFS)
Used is not the same anymore!
root@host:~# free -m
Mem:           3945         229        2894           0         821        3470
Swap:          3814           0        3814
root@container:~# free -m
Mem:           3945          93        3663           0         187        3851
Swap:          3814           0        3814
- Able to show its own memory usage inside the container
- The container still sees the total memory capacity of the host
- However, the “available” calculation is misleading (that’s wrong!), because not all memory consumers are visible
- If you run a memory check within the container, use the “used” column only
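A memory check inside an lxcfs-enabled container could compute "used" from the virtualized /proc/meminfo the same way free does. The Python sketch below assumes the formula documented for procps-ng 3.3.x free: used = total - free - buffers - cache, where cache = Cached + SReclaimable. Newer procps releases may compute "used" differently. The same formula reproduces the first host output above: 120869 - 46183 - 67390 = 7296 MB. The function names are hypothetical.

```python
from __future__ import annotations


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo into {field: value}; values are kB (HugePages_* are counts)."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or not rest.split():
            continue
        values[key.strip()] = int(rest.split()[0])
    return values


def used_kb(meminfo: dict[str, int]) -> int:
    """'used' as procps-ng 3.3.x free computes it: total - free - buffers - cache."""
    cache = meminfo.get("Cached", 0) + meminfo.get("SReclaimable", 0)
    return meminfo["MemTotal"] - meminfo["MemFree"] - meminfo.get("Buffers", 0) - cache


def read_used_mb(path: str = "/proc/meminfo") -> int:
    """Used memory in MB as seen from wherever this runs (host or container)."""
    with open(path) as fh:
        return used_kb(parse_meminfo(fh.read())) // 1024
```

Following the slide's advice, this sketch deliberately ignores MemAvailable.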

LXC – Monitoring CPU usage
- The container will always see the host's CPU usage
- Currently it is not possible to get a “different” view inside the container
- (Clumsy) approach: compare jiffies on the host
- Example for time spent on system CPU (kernel space) within 5s:
  12215974 - 12215646 = 328
- Compared with the host’s jiffies, this gives an idea of the container's usage
- Future: monitoring CPU usage inside the container via cgroups should soon be possible
root@host:~# lxc-cgroup -n container cpuacct.stat
user 41618658
system 12215646
root@host:~# sleep 5
root@host:~# lxc-cgroup -n container cpuacct.stat
user 41619791
system 12215974
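The jiffies comparison above can be scripted. The Python sketch below assumes the cgroup v1 cpuacct.stat format shown: "user" and "system" lines in USER_HZ ticks. It also assumes that the aggregate "cpu" line of the host's /proc/stat, which uses the same unit, is sampled at the same two points in time. With the values above, the container used 1133 user and 328 system ticks in 5 seconds. Assuming the usual USER_HZ of 100, that is about 14.6 CPU-seconds, or roughly 2.9 CPUs busy on average. The function names are hypothetical.

```python
from __future__ import annotations


def parse_cpuacct_stat(text: str) -> dict[str, int]:
    """Parse cpuacct.stat ("user N" and "system N", in USER_HZ ticks)."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def tick_delta(before: str, after: str) -> dict[str, int]:
    """Ticks consumed between two cpuacct.stat samples, per category."""
    b, a = parse_cpuacct_stat(before), parse_cpuacct_stat(after)
    return {key: a[key] - b[key] for key in b if key in a}


def host_total_ticks(proc_stat: str) -> int:
    """Sum of the user..steal fields of the aggregate "cpu" line in /proc/stat.

    guest/guest_nice are skipped because the kernel already counts them in user/nice.
    """
    for line in proc_stat.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return sum(int(value) for value in fields[1:9])
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percent(container_ticks: int, host_ticks: int) -> float:
    """Container share of the host's total CPU capacity (all CPUs, idle included)."""
    return 100.0 * container_ticks / host_ticks if host_ticks else 0.0
```

For the host side, take `host_total_ticks(after) - host_total_ticks(before)` over the same interval. Because idle time is included, the result is relative to all CPUs of the host.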

LXC – Using check_lxc
- check_lxc is a monitoring (workaround) plugin executed on the LXC host
- Uses cgroup values (via lxc-cgroup) for memory and CPU checks
- Checks the container autostart configuration
- Correctly monitors the memory usage of container(s), incl. swap
- Gives an idea of the container's CPU usage
root@host:~# /usr/lib/nagios/plugins/check_lxc.sh -n container01 -t auto
LXC AUTOSTART OK
root@host:~# /usr/lib/nagios/plugins/check_lxc.sh -n container01 -t mem
LXC container01 OK Used Memory: 571 MB|mem=598769664B;0;0;0;0
root@host:~# /usr/lib/nagios/plugins/check_lxc.sh -n container01 -t cpu
LXC container01 OK CPU Usage: 27%|cpu=27%;;;0;0
root@host:~# /usr/lib/nagios/plugins/check_lxc.sh -n container99 -t cpu
LXC container99 OK CPU Usage: 3%|cpu=3%;;;0;0
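check_lxc obtains its memory values through lxc-cgroup. The Python sketch below approximates the same idea by reading the cgroup v1 file memory.usage_in_bytes directly, and it reproduces the output format of the container01 memory check above. It is not check_lxc's code. The cgroup path is an assumption that differs between LXC versions and on cgroup v2 hosts. The optional thresholds and the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# ASSUMPTION: cgroup v1 memory controller with LXC containers below
# /sys/fs/cgroup/memory/lxc/<name>; the path differs between LXC versions and on cgroup v2.
CGROUP_MEMORY_ROOT = Path("/sys/fs/cgroup/memory/lxc")


def read_memory_usage(name: str, root: Path = CGROUP_MEMORY_ROOT) -> int:
    """Return the container's current memory usage in bytes (cgroup v1)."""
    return int((root / name / "memory.usage_in_bytes").read_text())


def format_mem_result(name: str, used_bytes: int,
                      warn_bytes: int | None = None,
                      crit_bytes: int | None = None) -> str:
    """Build a plugin output line in the style shown by check_lxc."""
    state = "OK"
    if crit_bytes is not None and used_bytes >= crit_bytes:
        state = "CRITICAL"
    elif warn_bytes is not None and used_bytes >= warn_bytes:
        state = "WARNING"
    used_mb = used_bytes // (1024 * 1024)
    perfdata = f"mem={used_bytes}B;{warn_bytes or 0};{crit_bytes or 0};0;0"
    return f"LXC {name} {state} Used Memory: {used_mb} MB|{perfdata}"
```

For example, `format_mem_result("container01", read_memory_usage("container01"))` would run on the LXC host. 598769664 bytes divided by 1024² gives the 571 MB shown above.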

LXC – check_lxc in Icinga2
- Deploy checks of LXC containers with “apply” rules
- Example: define a custom variable “containers” on the host object:
object Host "lxchost01" {
  import "generic-host"
  address = "192.168.100.101"
[…]
  # Containers running on this host
  vars.containers = [ "container01", "container02", "container99" ]
}
- Apply rule (here used with an nrpe remote check):
apply Service "LXC Memory " for (container in host.vars.containers) {
  import "generic-service"
  check_command = "nrpe"
  vars.nrpe_command = "check_lxc"
  vars.nrpe_arguments = [ container, "mem" ]
  assign where host.address && host.vars.containers
}
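The Python illustration below, which is not Icinga 2 code, shows what the apply-for rule effectively generates. As we understand Icinga 2's apply-for naming, each generated service is named with the rule's prefix plus the array element. That is why the prefix "LXC Memory " ends with a space. The dictionary layout and the function name are hypothetical.

```python
from __future__ import annotations


def expand_apply_for(prefix: str, host: dict) -> list[dict]:
    """Mimic: apply Service "<prefix>" for (container in host.vars.containers)."""
    host_vars = host.get("vars", {})
    # assign where host.address && host.vars.containers
    if not (host.get("address") and host_vars.get("containers")):
        return []
    return [
        {
            "name": prefix + container,  # prefix + array element
            "check_command": "nrpe",
            "nrpe_command": "check_lxc",
            "nrpe_arguments": [container, "mem"],
        }
        for container in host_vars["containers"]
    ]
```

Adding a container name to `vars.containers` on the host object is therefore enough to get a new service.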

LXC – check_lxc in Icinga2
- All (defined) containers are monitored
- Not many config changes needed
- Quick overview of which container uses
  → the most memory
  → the most CPU
- check_lxc is still in development
  → contributions welcome

LXC – Recap
- Can “mostly” be monitored the same way as a classical host or VM
- Some resources must be monitored from “outside” (= on the LXC host)
- CPU resource monitoring might soon work from “inside”, too!
Monitoring...   Where?    Example plugin
Processes       Inside    check_procs
Filesystem      Inside    check_disk
Network IO      Inside    check_netio
Memory usage    Outside   check_lxc
CPU usage       Outside   check_lxc
Disk IO         Outside   check_diskio

Application Containers

Application Containers (AC)
- A single-process application running in a container (at least, that's the idea)
- Stateless (no data stored inside the container)
- Dedicated virtual NIC (veth by default), NATted network access (expose)
- Dedicated file system (aufs or overlayfs by default, sharing the host's capacity)
- Dedicated namespaces for isolation
  - cannot see the processes of the host
  - nor those of other containers on the same host
- No init system: just a process to start, remember?
- Great for quickly scaling up redundant applications (behind a load balancer)
- We rarely hear “Docker” anymore these days. Kubernetes! containerd!

AC – Monitoring challenges
- No additional software/daemons “allowed” (single process, remember?)
- No direct network access (NAT via host bridge, iptables)
- Exposing ports? Possible, but a workaround and legacy (→ Ingress)
- Standalone Docker means a lot of manual work (→ use orchestration)
- Stop treating application containers as classical host/network objects
- Think of a container as a process, not a host

AC – Management w/ Rancher
- Rancher is a “management layer” on top of the orchestration layer
  → Container runtime → Orchestration (e.g. Kubernetes) → Rancher
- Intuitive user interface and a flexible HTTP API (→ CI/CD!)
- Rancher was chosen after more than a year of internal Docker research, comparisons and tests
- In production since Q3 2017, with more than 1200 containers in total (as of Oct 2018)
- Rancher 1.x uses Cattle orchestration
- Rancher 2.x is relatively new (April 2018) and built on Kubernetes orchestration
- Rancher 2.x is running on test and staging environments; the first production environment follows soon
- Need to monitor the Rancher 2.x environments!
PS: No, I’m not affiliated with Rancher! Just a community user.


AC – Healthchecks! (Kubernetes)
- The container (pod) can be monitored using readiness and liveness probes
  - readinessProbe: detects when the application is ready (e.g. after a startup delay)
  - livenessProbe: detects failures in the application (e.g. HTTP errors)
- Multiple kinds of probes are available:
  - Run a command (e.g. cat /tmp/healthy) → exit code 0 = OK
  - HTTP check (e.g. GET /health on port 8080) → status 2xx/3xx = OK
  - TCP check (e.g. connect to port 8080) → connection established = OK
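The Python standard-library sketch below illustrates the three probe kinds and their success criteria. It only mimics the semantics described above; the kubelet's actual implementation differs in details such as redirect handling and request headers. The function names and default timeouts are hypothetical.

```python
from __future__ import annotations

import socket
import subprocess
import urllib.error
import urllib.request


def exec_probe(command: list[str], timeout: float = 1.0) -> bool:
    """Command probe: success if the command exits with code 0."""
    try:
        return subprocess.run(command, capture_output=True, timeout=timeout).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def http_status_ok(status: int) -> bool:
    """HTTP probe criterion: any status from 200 to 399 counts as success."""
    return 200 <= status < 400


def http_probe(url: str, timeout: float = 1.0) -> bool:
    """HTTP GET probe. urllib follows redirects, so a 3xx usually surfaces as the target's status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return http_status_ok(resp.status)
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return http_status_ok(err.code)
    except OSError:  # connection refused, DNS failure, timeout (URLError is an OSError)
        return False


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """TCP probe: success if a connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```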

AC – Healthcheck in Rancher 2

AC – Healthcheck in kubectl
$ kubectl describe pod nginxtestpfbwm --namespace gamma --insecure-skip-tls-verify=true
Name:           nginxtestpfbwm
Namespace:      gamma
Node:           mhradoi02t/192.168.254.62
Start Time:     Tue, 02 Oct 2018 21:16:09 +0200
Labels:         controller-revision-hash=3146803588
                pod-template-generation=4
                workload.user.cattle.io/workloadselector=daemonSetgammanginxtest
Annotations:    cni.projectcalico.org/podIP: 10.42.1.118/32
Status:         Running
IP:             10.42.1.118
Controlled By:  DaemonSet/nginxtest
Containers:
  nginxtest:
    Container ID:   docker://93c32c0f3eaf34f939347206c5e7151eac60efb14ef4a464fb3c82fa5cbde659
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:e8ab8d42e0c34c104ac60b43ba60b19af08e19a0e6d50396bdfd4cef0347ba83
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Tue, 02 Oct 2018 21:16:12 +0200
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:80/ delay=10s timeout=2s period=2s #success=1 #failure=3
    Readiness:      http-get http://:80/ delay=10s timeout=2s period=2s #success=2 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9nbfr (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
[...]
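The Liveness and Readiness lines of `kubectl describe` summarize each probe's settings. The Python sketch below parses such a line and estimates how long a probe must keep failing before it counts as failed, roughly period × failure threshold. For the liveness probe above that is 2 s × 3 = 6 s. Scraping human-readable kubectl output is brittle, and the API route on the next slide is more robust. The function names are hypothetical.

```python
from __future__ import annotations

import re

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
SETTINGS = {"delay", "timeout", "period", "success", "failure"}


def go_seconds(value: str) -> int:
    """Convert a Go-style duration as printed by kubectl ("10s", "1m30s") to seconds.

    Probe settings are whole seconds, so sub-second units are not handled.
    """
    return sum(int(amount) * UNIT_SECONDS[unit]
               for amount, unit in re.findall(r"(\d+)([hms])", value))


def parse_probe(description: str) -> dict[str, str]:
    """Split a probe summary into kind, target and its key=value settings."""
    kind, *tokens = description.split()
    probe = {"kind": kind}
    target = []
    for token in tokens:
        key, sep, value = token.partition("=")
        name = key.lstrip("#")  # "#success=1" -> "success"
        if sep and name in SETTINGS:
            probe[name] = value
        else:
            target.append(token)  # e.g. "http://:80/" or "[cat /tmp/healthy]"
    probe["target"] = " ".join(target)
    return probe


def failure_window_seconds(probe: dict[str, str]) -> int:
    """Rough time of consecutive failures before the probe counts as failed."""
    return go_seconds(probe["period"]) * int(probe["failure"])
```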

AC – Healthcheck in Rancher API
$ curl -s -u token-xxxxx:longsecretpass https://rancher2.example.com/v3/project/c-r8ss7:p-85rmm/pods/gamma:nginxtestpfbwm | jshon | more
{
[...]
   "name": "nginxtest",
   "privileged": false,
   "readOnly": false,
   "readinessProbe": {
    "failureThreshold": 3,
    "initialDelaySeconds": 10,
    "path": "/",
    "periodSeconds": 2,
    "port": 80,
    "scheme": "HTTP",
    "successThreshold": 2,
    "tcp": false,
    "timeoutSeconds": 2,
    "type": "/v3/project/schemas/probe"
   },
   "resources": {
    "type": "/v3/project/schemas/resourceRequirements"
   },
   "restartCount": 0,
   "runAsNonRoot": false,
   "state": "running",
   "stdin": true,
   "stdinOnce": false,
[...]
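The same information can be fetched and evaluated programmatically. The Python sketch below makes two assumptions. First, it uses HTTP Basic authentication with the API key as user and secret as password, as in the curl call above. Second, it expects the pod document to contain a "containers" list whose entries carry "name", "state", "restartCount" and "readinessProbe", as the excerpt suggests. The function names and the "unhealthy" rule are hypothetical.

```python
from __future__ import annotations

import base64
import json
import urllib.request


def fetch_json(url: str, access_key: str, secret_key: str, timeout: float = 10.0) -> dict:
    """GET a Rancher API resource using HTTP Basic auth (TLS is verified by default)."""
    token = base64.b64encode(f"{access_key}:{secret_key}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def summarize_containers(pod: dict) -> list[dict]:
    """Extract name, state, restart count and readiness target of each container."""
    summary = []
    for container in pod.get("containers", []):
        probe = container.get("readinessProbe") or {}
        summary.append({
            "name": container.get("name"),
            "state": container.get("state"),
            "restartCount": container.get("restartCount", 0),
            "readiness": (f'{probe.get("scheme", "?")} :{probe.get("port", "?")}{probe.get("path", "")}'
                          if probe else None),
        })
    return summary


def unhealthy(summary: list[dict]) -> list[str]:
    """Names of containers that are not running or have been restarted."""
    return [c["name"] for c in summary if c["state"] != "running" or c["restartCount"] > 0]
```

Usage follows the URL pattern from the curl call: `summarize_containers(fetch_json("https://rancher2.example.com/v3/project/<project-id>/pods/<namespace>:<pod>", "token-xxxxx", "longsecretpass"))`.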

AC – Using check_rancher2
- check_rancher2 is a monitoring plugin which uses Rancher 2’s API
- Can run anywhere (requires an HTTP/HTTPS connection to the API)
- Checks the status of:
  - Cluster(s)
  - Project(s)
  - Workload(s) (→ Services)
  - Pod(s) (→ Containers)
- Consider the API endpoint as the “host” and the check types as its services
- Future (hopefully):
  - Workload or pod usage (CPU, memory, network statistics),
    depends on https://github.com/rancher/rancher/issues/14230

AC – check_rancher2 example
- Workload stuck in “removing”
- New workload (importer) already active
- Monitoring needs to alert me about this!

AC – check_rancher2 example
$ ./check_rancher2.sh -H rancher2.example.com -U token-xxxxx -P longsecretpass -S -t pod -p c-r8ss7:p-85rmm -o importer8bf85dcc9r5rtn -n gamma
CHECK_RANCHER2 CRITICAL Pod importer8bf85dcc9r5rtn is removing|'pod_active'=0;;;; 'pod_error'=1;;;;
- The plugin connects to the Rancher 2 API using the information from the parameters:
  -H: API host (DNS name/IP)
  -U: User ID (token-xxxxx)
  -P: Password for the user ID
  -S: Use SSL (https)
  -t: Use the “pod” check type
  -p: Project ID (contains the cluster ID, too)
  -o: Pod name (optional)
  -n: Namespace (optional, required for a specific pod name)
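The Python sketch below is a hypothetical re-implementation of the pod-state decision behind the output above, not check_rancher2's actual code. It assumes that only the pod state "running" counts as healthy, and it reproduces the output format of the example, including the perfdata. The function name is hypothetical.

```python
from __future__ import annotations

OK, CRITICAL = 0, 2


def evaluate_pods(pods: dict[str, str],
                  healthy: tuple[str, ...] = ("running",)) -> tuple[int, str]:
    """Turn {pod name: state} into a plugin exit code and output line.

    Any pod whose state is not in `healthy` (e.g. "removing") makes the check CRITICAL.
    """
    bad = {name: state for name, state in pods.items() if state not in healthy}
    active = len(pods) - len(bad)
    perfdata = f"'pod_active'={active};;;; 'pod_error'={len(bad)};;;;"
    if bad:
        details = ", ".join(f"Pod {name} is {state}" for name, state in sorted(bad.items()))
        return CRITICAL, f"CHECK_RANCHER2 CRITICAL {details}|{perfdata}"
    return OK, f"CHECK_RANCHER2 OK All {active} pods are healthy|{perfdata}"
```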

Application Containers – Recap
- It’s not only Docker anymore → containerd (+ runc, kata) as container engine
- An application container is not a classical host
- Think of it as an application/process
- Use orchestration/container management (Kubernetes, Rancher, OpenShift, ...)
- Set up health checks → health checks are your monitoring go-to!
- Monitor these health checks using orchestration/management APIs
  (Rancher 2: check_rancher2)
- There might also be plugins which use kubectl locally

References and links
- Lost at sea: https://gcaptain.com/number-of-containers-lost-at-sea-falling-survey-shows/
- LXC: https://linuxcontainers.org/
- cgroup-v1: https://www.kernel.org/doc/Documentation/cgroup-v1/
- cgroup-v2: https://www.kernel.org/doc/Documentation/cgroup-v2.txt
- Docker: https://www.docker.com/
- Kubernetes: https://kubernetes.io/
- containerd: https://containerd.io/
- Rancher: https://rancher.com/
- check_lxc: https://claudiokuenzler.com/monitoring-plugins/check_lxc.php
- check_rancher2: https://claudiokuenzler.com/monitoring-plugins/check_rancher2.php

Thank you
[[ $questions -eq 0 ]] && exit 0
