Let’s Fix Logging Once and for All

Brought to you by
Peter Portante
Senior Principal Software Engineer at Red Hat, Inc.

Abstract … and Why?
“We describe a modification to the Linux Kernel which gives an SRE control over
the combined bandwidth of logging on a node of a distributed system, while
providing a way for the logging source owner (container or service) to control what
happens when the bandwidth limit is hit.”
■ Why do we care?
● Because a node can become unstable when one or more processes consume excessive disk or
network resources, whether through bugs, unintended behavior, or malicious code
■ Why separate the policy from the rate limit?
● So that the SREs can provide a stable platform, while application / service owners maintain
behavior in the face of limits

Peter Portante
Senior Principal Software Engineer at Red Hat, Inc.
■ Something cool I’ve done - 7 club passing
■ My perspective on P99s - New and hopeful
■ Another thing about me - I enjoy yard work and puttering
■ What I do away from work - I love to juggle clubs

First Principles
■ Restore behavioral control for logging on a node to the SRE
● An SRE should be able to set a limit for the total logging rate of a node
■ Applications retain control of their behavior when limits are hit
● Should the application slow to meet the logging rate?
● Should the application ignore the limit by dropping logs?

Node Rate-Limit for SRE
■ Implement an opt-in “bandwidth gate” for file descriptors
■ SRE sets bandwidth limit for the gate
● System-wide
● Amount per interval (100 MB/sec, 10 MB/min, etc.)
■ The write() system call moves no data once the bandwidth limit has been hit for the interval
■ SRE directs participating frameworks (systemd, podman/conmon, etc.) to use
the gate
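
The gate described above can be modeled in user space as a fixed-window byte budget. This is a minimal sketch, not the kernel mechanism from the talk: the `BandwidthGate` class, its `admit` method, and the injected `now` timestamp are all hypothetical names chosen for illustration, and the window here restarts at the first write after the interval expires.

```python
class BandwidthGate:
    """Fixed-window byte budget shared by every opted-in writer (illustrative)."""

    def __init__(self, max_bytes, interval_secs):
        self.max_bytes = max_bytes
        self.interval_secs = interval_secs
        self.window_start = 0.0
        self.used = 0

    def admit(self, nbytes, now):
        """Return how many of nbytes fit in the current interval's budget."""
        if now - self.window_start >= self.interval_secs:
            # A new interval has begun: reset the shared budget.
            self.window_start = now
            self.used = 0
        granted = min(nbytes, self.max_bytes - self.used)
        self.used += granted
        return granted


gate = BandwidthGate(max_bytes=100, interval_secs=10)
first = gate.admit(60, now=0.0)    # 60 bytes fit
second = gate.admit(60, now=1.0)   # only 40 bytes of budget remain
third = gate.admit(60, now=2.0)    # budget exhausted, nothing moves
fourth = gate.admit(60, now=10.0)  # next interval, budget is fresh
print(first, second, third, fourth)
```

Because the budget is system-wide, the second writer is limited by what the first already consumed, which is the property that lets the SRE reason about one number for the whole node.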

Behavioral Policies for the Application
■ Add policy associated with the application
● Policy is either “drop” or “block” (default set by the SRE for the system)
■ For “drop”, the write() system call always reports every byte it was given as written
● But it only actually writes the amount that fits in that interval’s bandwidth; the rest is discarded
■ For “block”, the write() system call returns the number of bytes it could write in the interval,
and blocks once the interval’s byte total has been reached
● The key is that write() blocks before any data is transferred from the user’s buffer
when the limit is hit
■ Frameworks that create processes (systemd, podman/conmon, etc.) set the requested policy
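
The difference between the two policies shows up in what the caller is told versus what actually moves. The sketch below models a single write() against the bytes still available in the interval; `gated_write` and its three-part return value are hypothetical, introduced only to make the semantics concrete, and the real proposal enforces this inside the kernel.

```python
def gated_write(data, budget_left, policy):
    """Model one write() call against the remaining interval budget.

    Returns (reported, moved, must_block):
      reported   - byte count the caller sees as the return value
      moved      - bytes actually transferred
      must_block - True if the caller would sleep until the next interval
    """
    moved = min(len(data), budget_left)
    if policy == "drop":
        # The caller is told everything was written; the excess is discarded.
        return len(data), moved, False
    if policy == "block":
        # The caller sees a short write; with no budget left it waits
        # before any data leaves its buffer.
        return moved, moved, moved == 0 and len(data) > 0
    raise ValueError(f"unknown policy: {policy!r}")


payload = b"x" * 10
print(gated_write(payload, 4, "drop"))   # caller sees 10, only 4 bytes move
print(gated_write(payload, 4, "block"))  # caller sees a short write of 4
print(gated_write(payload, 0, "block"))  # nothing moves, caller blocks
```

An application that must never stall (a request handler, say) would pick “drop”, while one whose logs are an audit trail would pick “block” and accept the slowdown.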

Ah … Why is this a problem now?

What Changed
■ Container runtimes byte-capture / interpret stdout & stderr by
default, and write the data to disk first
● Podman / CRI-O (conmon), Docker
■ Densification of applications as a node’s memory and compute resources
have grown
● With 10+ cores per socket, and hyper-threads, node concurrency can easily generate more log
data than available local disk or network bandwidth can handle
■ Separation of who writes applications from who runs them, and where
● Containers make it easy to build an app once, and run it anywhere

Logging Subsystems from development to production
Courtesy https://gifmemes.io/

But why in the Kernel?
■ Both conmon and systemd could implement a similar mechanism in
user-space
● BUT data is transferred through a pipe (conmon) and a socket (systemd) before those services
can handle it
■ For systemd
● One can already come close to this solution with the existing behaviors, BUT the application
owner has no control over drop vs block
■ For conmon
● A shared memory segment could be used across all conmon processes, BUT then the SRE has
to consider how to manage each sub-system separately
■ The kernel-based solution avoids unnecessary resource usage and gives the
SRE one place to set the logging limit

SRE Sets Node’s Logging Bandwidth Limit
■ A simple agreed-upon sysconfig file containing the bandwidth limit
● /etc/sysconfig/logging-bandwidth
■ INTERVAL = 10 secs
■ MAXIMUM_BYTES = 100 MiB
■ An eBPF script implementing the rate limit and policy enforcement is provided
■ Systemd and Podman (conmon) “opt in” by creating pipes and sockets with the eBPF
hook enabled
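
To see what the two settings mean together, the sketch below parses them and derives the sustained rate. The slide shows only the two keys and their values, so the exact file syntax (`KEY = VALUE UNIT`, `#` comments) is an assumption, and `parse_logging_bandwidth` is a hypothetical helper rather than part of any existing tool.

```python
# Assumed unit spellings; the slide shows only "secs" and "MiB".
BYTE_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}
SECOND_UNITS = {"s", "sec", "secs"}


def parse_logging_bandwidth(text):
    """Return (interval_secs, max_bytes) from sysconfig-style text."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.split()

    interval_amount, interval_unit = settings["INTERVAL"]
    if interval_unit not in SECOND_UNITS:
        raise ValueError(f"unsupported interval unit: {interval_unit}")
    bytes_amount, bytes_unit = settings["MAXIMUM_BYTES"]
    return int(interval_amount), int(bytes_amount) * BYTE_UNITS[bytes_unit]


sample = """
INTERVAL = 10 secs
MAXIMUM_BYTES = 100 MiB
"""
interval_secs, max_bytes = parse_logging_bandwidth(sample)
rate = max_bytes // interval_secs
print(interval_secs, max_bytes, rate)  # 10 104857600 10485760
```

With the values from the slide, the node may log 100 MiB in any 10-second interval, which averages to 10 MiB per second while still allowing short bursts within an interval.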

Policy Provided via Systemd & Podman
■ Systemd
● In service file
■ StdoutLoggingPolicy = drop
■ StderrLoggingPolicy = block
■ SyslogLoggingPolicy = block
■ Podman (conmon)
● $ podman run \
      --log-opt stdoutloggingpolicy=drop \
      --log-opt stderrloggingpolicy=block

Policy Provided via Kubernetes Container Spec
apiVersion: v1
kind: Pod
metadata:
  name: helloworld
spec:
  containers:
  - name: helloworld
    image: helloworld
    logging:
      policy:
        stdout: drop
        stderr: block
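
A framework consuming such a spec would need to combine the per-stream policy with the system default set by the SRE. The following is a rough sketch of that lookup: the `logging.policy` field is the proposal from the slide, not an existing Kubernetes field, and `resolve_policies` along with its `system_default` parameter are hypothetical.

```python
VALID_POLICIES = {"drop", "block"}


def resolve_policies(container, system_default):
    """Pick a policy per stream, falling back to the SRE's system default."""
    requested = container.get("logging", {}).get("policy", {})
    resolved = {}
    for stream in ("stdout", "stderr"):
        policy = requested.get(stream, system_default)
        if policy not in VALID_POLICIES:
            raise ValueError(f"{stream}: unknown policy {policy!r}")
        resolved[stream] = policy
    return resolved


# The container entry from the spec above, as a parsed dictionary.
helloworld = {
    "name": "helloworld",
    "image": "helloworld",
    "logging": {"policy": {"stdout": "drop", "stderr": "block"}},
}
explicit = resolve_policies(helloworld, system_default="block")
fallback = resolve_policies({"name": "plain"}, system_default="drop")
print(explicit)  # {'stdout': 'drop', 'stderr': 'block'}
print(fallback)  # {'stdout': 'drop', 'stderr': 'drop'}
```

Keeping the default outside the application spec preserves the split the talk argues for: the SRE owns the limit and the fallback, and the application owner overrides behavior only for the streams they care about.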

Recap
■ Institute a node logging limit controlled by the SRE
■ Give application owners the ability to determine behavior at the limit
● drop vs block
■ Place the gate so data is not transferred from a process
● Avoid unnecessary data movement and resource usage
■ Implement in the Kernel to share among participating sub-systems
● podman/conmon, systemd, etc.

Brought to you by
Peter Portante
peter.portante@redhat.com
