eBPF is one of today's key kernel technologies. Plenty of eBPF-based tools already exist for networking and observability, but few target storage. This presentation tells the story of my research and outlines some of the possibilities eBPF offers in the storage space.
3. StorageOS is cloud native, software-defined
storage for running containerized applications
in production, running in the cloud, on-prem
and in hybrid/multi-cloud environments.
6. ● What the heck is Extended Berkeley Packet Filter (eBPF)?
− Linux kernel feature since 4.1 - 🙀
− It started as classic BPF, a packet filter (for example behind tcpdump)
− Programs attach to kernel events (kprobes, tracepoints, ...) and run when they fire
− cat /proc/kallsyms | wc -l
● 185449 (and counting)
− eBPF programs can exchange data with userspace
− Programs are compiled to a special eBPF bytecode
− New attack vector
● In short:
− A small program, mostly written in C, compiled to bytecode and hooked in almost
anywhere in the kernel.
Basics
7. How does it work?
Source: https://www.brendangregg.com/ebpf.html
10. ● Tracing at the VFS layer level
− At this level an eBPF program can catch file-related events:
● CRUD of files or directories
● File system caches
● Mount points
● cat /proc/kallsyms | grep "t vfs" | wc -l
− 44
● Examples:
− vfsstat.py: Count VFS calls
− vfsreadlat.c: VFS read latency distribution
Storage related options
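The symbol counts on this slide come from filtering /proc/kallsyms. A rough Python equivalent of `grep "t vfs" | wc -l` could look like the sketch below. The function name `count_symbols` and the sample lines are illustrative assumptions, not part of any tool. It treats each line as `address type name [module]` and matches on symbol type plus name prefix, which is slightly stricter than grep's substring match.

```python
def count_symbols(lines, sym_type, prefix):
    """Count kallsyms entries of one symbol type whose name starts with prefix."""
    count = 0
    for line in lines:
        fields = line.split()
        # Expected shape: <address> <type> <name> [<module>]
        if len(fields) < 3:
            continue
        if fields[1] == sym_type and fields[2].startswith(prefix):
            count += 1
    return count


# Made-up sample lines in /proc/kallsyms format (addresses are placeholders).
SAMPLE = [
    "0000000000000000 T vfs_read",
    "0000000000000000 t vfs_rename_ok",
    "0000000000000000 t vfs_dentry_acceptable",
    "0000000000000000 t ext4_has_free_clusters",
    "0000000000000000 t nfs_writepage_locked\t[nfs]",
]

# Lowercase "t" selects local text symbols, as in grep "t vfs".
print(count_symbols(SAMPLE, "t", "vfs"))
```

On a real host the same function can be fed the file directly, for example `count_symbols(open("/proc/kallsyms"), "t", "vfs")`. Another prefix such as "ext4" gives the count for a specific file system.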
11. ● Tracing at the file system layer level
− File system specific events:
● Ext4, NFS, BTRFS, …
● CRUD operations
● Low level operations
● Performance related events
● cat /proc/kallsyms | grep "t ext4" | wc -l
− 397
● Examples:
− nfsslower.py: Trace slow NFS operations
− btrfsdist.py: Summarize BTRFS operation latency distribution
Storage related options
12. ● Tracing at the block device / device driver layer levels
− A trace at this level gives insight into:
● Low-level (close to hardware) operations
● Physical disk devices
● Virtual block devices
● Block device reads and writes
● Examples:
− bitehist.py: Block I/O size
− disksnoop.py: Trace block device I/O latency
Storage related options
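Several of the tools named in these slides report I/O sizes or latencies as a power-of-two distribution. The sketch below shows that bucketing idea in plain Python, without any eBPF involved. The helper names and the sample values are assumptions made for illustration, and the bucket boundaries may not match a given tool's output exactly.

```python
from collections import Counter


def log2_slot(value):
    """Return the power-of-two bucket index for a non-negative integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return value.bit_length()  # 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, ...


def slot_range(slot):
    """Return the inclusive (low, high) range covered by a bucket index."""
    if slot == 0:
        return (0, 0)
    return (1 << (slot - 1), (1 << slot) - 1)


def log2_histogram(values):
    """Group values into power-of-two buckets and count each bucket."""
    return Counter(log2_slot(v) for v in values)


def format_histogram(hist, width=20):
    """Render a non-empty histogram as text rows with a proportional bar."""
    peak = max(hist.values())
    rows = []
    for slot in sorted(hist):
        low, high = slot_range(slot)
        bar = "*" * max(1, hist[slot] * width // peak)
        rows.append(f"{low:>8} -> {high:<8} : {hist[slot]:<6} |{bar}")
    return "\n".join(rows)


# Hypothetical block I/O sizes in KiB, standing in for values collected by a probe.
sizes_kib = [4, 4, 4, 8, 8, 16, 64, 128, 128, 4]
print(format_histogram(log2_histogram(sizes_kib)))
```

In a real tracer the bucketing usually happens on the kernel side so that only the small histogram, not every event, has to cross into userspace.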
13. ● Supported architectures are limited (arm, amd64 included)
● Not supported everywhere
− Needs CONFIG_BPF_SYSCALL during kernel build
− Container needs privileged mode
− In the cloud it can be tricky; not widely supported
● Portability is tricky
● Map sizes are limited
● Hard to debug
● The test matrix becomes huge for a heterogeneous infrastructure
Weaknesses
14. ● Small pre-built bytecode
● JIT compiled
− Depends on CONFIG_BPF_JIT
● The kernel patches the instructions of the observed function
− It is native
− No extra layer
− No exact or measurable overhead
Performance impact
18. ● Without interacting with a userspace program, eBPF has only limited use cases
● eBPF uses shared maps to bridge the gap
● Maps are read asynchronously
● There are several map types for different use cases
Interacting with userspace
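A minimal sketch of this pattern with the BCC Python frontend: the kernel side increments a per-process counter in a shared hash map, and the userspace side polls that map on its own schedule. The probe target (`vfs_read`), the map name `counts`, and the helpers `top_entries` and `run` are choices made for this illustration. Running it assumes BCC is installed and the process has root privileges.

```python
import time

# eBPF side (C, compiled by BCC at load time): count vfs_read() calls per process.
PROGRAM = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(counts, u32, u64);          // shared map: PID -> number of calls

int count_vfs_read(struct pt_regs *ctx)
{
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    counts.increment(pid);
    return 0;
}
"""


def top_entries(snapshot, limit=5):
    """Return the `limit` busiest (pid, count) pairs from a map snapshot."""
    return sorted(snapshot.items(), key=lambda kv: kv[1], reverse=True)[:limit]


def run(interval=2.0, rounds=3):
    """Load the program, then poll the shared map from userspace."""
    from bcc import BPF  # imported lazily: needs BCC installed and root privileges

    bpf = BPF(text=PROGRAM)
    bpf.attach_kprobe(event="vfs_read", fn_name="count_vfs_read")
    for _ in range(rounds):
        time.sleep(interval)  # the kernel side keeps counting while we sleep
        table = bpf["counts"]
        snapshot = {key.value: value.value for key, value in table.items()}
        table.clear()
        for pid, calls in top_entries(snapshot):
            print(f"pid={pid} vfs_read calls={calls}")
```

Calling `run()` as root prints the busiest readers every couple of seconds. Nothing is pushed to userspace per event here: the reader only sees whatever the map holds at the moment it polls.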
21. ● BCC
− BCC is a toolkit for creating efficient kernel tracing and manipulation programs
− Contains lots of examples
− Kernel instrumentation is written in C
− Python and Lua frontends
● Dynamically generated C source embedded in Python code looks really ugly
Frontends
22. ● BPFTrace
− High level, fixed scope tracing language
− Solves portability
− Language is inspired by awk and C, and predecessor tracers such as DTrace
− Many of the BCC examples have been rewritten in BPFTrace
− Supports one-liners
● bpftrace -e 'tracepoint:syscalls:sys_enter_open { printf("%s %s\n", comm, str(args->filename)); }'
− Kubectl plugin exists: kubectl-trace
− Easy to learn:
● Trace all EXT4 reads in the given mount point
https://github.com/mhmxs/bpftrace/pull/1/files
Frontends
24. ● Gobpf
− Provides Go bindings for the BCC framework
− Low level utils to load and use eBPF programs
− The same as BCC:
● Kernel instrumentation is written in C
● The frontend is Go instead of Python
Frontends
25. ● Cilium/ebpf
− Pure Go library that provides utilities for loading, compiling, and debugging eBPF
programs
− Contains lots of examples
− Useful helper functions
− Kernel instrumentation can be written in eBPF assembly
● Generated with Go code
− Or kernel instrumentation can be written in C
● Go bindings are generated from it
Frontends
27. ● By default an eBPF program has to match the running kernel
− Function signatures can change
− Data structures can change
● What options do we have to increase portability?
− Use BPFTrace if possible because it just works
− Handle kernel version matching yourself
Portability
28. ● Helpers to deal with it
● Use Cilium/ebpf because of its handy helpers
● bpftool can dump the kernel's type definitions as a C header
● bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
● High-level BPF CO-RE mechanics
● CO-RE provides a set of macros that generate memory accessors on the fly
● Read memory
● Check whether a field exists
● And so on...
Portability
29. ● Kernel memory is not readable directly
− bpf_core_read() function reads the memory
● Field order and offsets of kernel structs can differ between kernel versions and builds
● High-level BPF CO-RE mechanics
− BPF_CORE_READ(file, f_path.dentry, d_iname); // path of data
− With plain bpf_core_read(), each of f_path, dentry and d_iname has to be read into a
separate variable
Portability
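To make the idea behind these accessors more concrete, here is a toy model in Python. It is only a mental model under loud assumptions: real CO-RE relocations are resolved from BTF type information when the program is loaded, not by a dictionary lookup, and every offset, address and helper name below is invented. It shows that the same chain of reads (file, then f_path.dentry, then d_iname) keeps working when two kernel builds place those fields at different offsets, as long as offsets are resolved per kernel instead of being hard-coded.

```python
# Toy model only: real CO-RE relocations are resolved from BTF at load time.
# Two hypothetical kernel builds that place the same fields at different offsets.
LAYOUT_A = {"file.f_path.dentry": 24, "dentry.d_iname": 56}
LAYOUT_B = {"file.f_path.dentry": 160, "dentry.d_iname": 40}

FILE_ADDR, DENTRY_ADDR = 0x1000, 0x2000  # made-up addresses


def fake_kernel_memory(layout, name):
    """Build a tiny address -> value image of file -> dentry -> d_iname."""
    return {
        FILE_ADDR + layout["file.f_path.dentry"]: DENTRY_ADDR,  # pointer to the dentry
        DENTRY_ADDR + layout["dentry.d_iname"]: name,           # inline file name
    }


def core_read_chain(memory, layout, base, *fields):
    """Follow a chain of fields, one read per hop, using offsets from `layout`."""
    value = base
    for field in fields:
        value = memory[value + layout[field]]
    return value


for layout in (LAYOUT_A, LAYOUT_B):
    memory = fake_kernel_memory(layout, "data.txt")
    # The accessor code is identical; only the offsets it resolves differ.
    print(core_read_chain(memory, layout, FILE_ADDR,
                          "file.f_path.dentry", "dentry.d_iname"))
```

Each loop iteration inside `core_read_chain` corresponds to one explicit read into a separate variable in the step-by-step style. The chained macro form hides those intermediate hops.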
30. ● Hard to debug
● Often there is no error; the program just does nothing
● BPF calls are also traceable
− Requires recompiling the kernel
− Requires disabling the JIT compiler
● rbpf is an eBPF virtual machine written in Rust
Debugging
32. ● I LOVE eBPF
● Lots of opportunities, from AI-driven storage miner detection to real-time file monitoring
● With a bit of kernel knowledge it is easy to react to almost any kind of event
● Several frontends, helpers and other libraries
● Bunch of existing projects – real world experience
● Kubernetes integration depends on distribution/platform
● C is mandatory at the end of the day
● Really hard to debug
SUMM()
34. ● eBPF for SRE with Reliably: https://dev.to/reliably/ebpf-for-sre-with-reliably-18dc
● Tracing Go function arguments in prod: https://blog.px.dev/ebpf-function-tracing/post/
● Tracing SSL/TLS connections: https://blog.px.dev/ebpf-openssl-tracing
Extra reading