Linux kernel tracing superpowers in the cloud
Andrea Righi
andrea@betterservers.com
@arighi
Who am I?
● Andrea Righi
● Performance engineer @ BetterServers.com
● My main activities
  ● Linux kernel stuff
  ● Virtualization
  ● Storage
  ● Cloud computing
Agenda
● Overview
● Profiling Technologies
● Examples
● Q/A
Overview
[Image: xkcd comic, https://imgs.xkcd.com/comics/optimization.png]
Premature optimization anti-methodology
Drunk-man anti-methodology
● Tune random things until the problem goes away
Blame-someone-else anti-methodology
● Find a component X that you are not responsible for and redirect problems to component X
Problem-solving methodology
● Observe
● Measure
● Optimize
● Rinse and repeat...
CPU sampling vs tracing
● Sampling
  ● Use a periodic timer interrupt to collect the current program counter (and so the current function) together with the entire stack backtrace
● Tracing
  ● Record the time and details of every invocation of specific events
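Here is a toy user-space illustration of the difference. It is not a real profiler, and `run_demo`, `traced_work` and `struct demo_result` are hypothetical names. A `SIGPROF` timer armed with `setitimer(ITIMER_PROF)` stands in for the periodic sampling interrupt. A counter bumped on every call stands in for tracing.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical result type: what each approach observed. */
struct demo_result {
    long samples;      /* sampling: timer interrupts that fired */
    long traced_calls; /* tracing: every single invocation */
};

static volatile sig_atomic_t sample_count;
static long trace_count;

/* Sampling: runs on each SIGPROF. A real profiler would record the
 * program counter and the stack backtrace here. */
static void on_sigprof(int sig)
{
    (void)sig;
    sample_count++;
}

/* Tracing: the event is recorded on every invocation. */
static void traced_work(void)
{
    trace_count++;
}

struct demo_result run_demo(double cpu_seconds)
{
    struct sigaction sa;
    struct itimerval timer;
    struct demo_result res;

    sample_count = 0;
    trace_count = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGPROF, &sa, NULL);

    /* ITIMER_PROF counts process CPU time (user + system); 1 ms is
     * requested, the real resolution depends on the kernel tick. */
    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = 1000;
    timer.it_value = timer.it_interval;
    setitimer(ITIMER_PROF, &timer, NULL);

    /* Burn CPU until the requested amount of CPU time has elapsed. */
    clock_t start = clock();
    while ((double)(clock() - start) / CLOCKS_PER_SEC < cpu_seconds)
        traced_work();

    /* Disarm the timer; the handler stays installed, which is harmless. */
    memset(&timer, 0, sizeof(timer));
    setitimer(ITIMER_PROF, &timer, NULL);

    res.samples = sample_count;
    res.traced_calls = trace_count;
    return res;
}
```

With `run_demo(0.2)`, `traced_calls` is the exact number of invocations (millions). `samples` is at most a few hundred, limited by the timer interval and the kernel tick. Sampling is cheap but statistical; tracing is exact but pays a cost on every event.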
Generic performance analysis tools
● uptime → time since boot and load averages
● top → overall system statistics
● vmstat 1 → system/memory statistics, every second
● mpstat -P ALL 1 → per-CPU utilization (CPU load balancing)
● pidstat 1 → per-process usage
● iostat -kxd 1 → disk I/O (extended statistics, in kB)
● free -m → memory usage (in MB)
● sar -n DEV 1 → network I/O per interface
● dmesg | tail → latest kernel (error) messages
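`uptime` and `top` (procps) take their load averages from `/proc/loadavg`. Per proc(5), that line holds the 1, 5 and 15 minute averages, then runnable/total scheduling entities, then the most recently created PID. The following is a minimal parsing sketch; `struct loadavg`, `parse_loadavg` and `read_loadavg` are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical container for the fields of one /proc/loadavg line. */
struct loadavg {
    double avg1, avg5, avg15; /* load averages over 1, 5, 15 minutes */
    int running;              /* currently runnable scheduling entities */
    int total;                /* total scheduling entities */
    int last_pid;             /* most recently created PID */
};

/* Parse a line such as "0.20 0.18 0.12 1/80 11206".
 * Returns 0 on success, -1 if the line does not match the format. */
int parse_loadavg(const char *line, struct loadavg *out)
{
    int n = sscanf(line, "%lf %lf %lf %d/%d %d",
                   &out->avg1, &out->avg5, &out->avg15,
                   &out->running, &out->total, &out->last_pid);
    return n == 6 ? 0 : -1;
}

/* Read and parse the live values; returns -1 if /proc is unavailable. */
int read_loadavg(struct loadavg *out)
{
    char buf[128];
    FILE *f = fopen("/proc/loadavg", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_loadavg(buf, out);
}
```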
Sampling technologies
perf
● perf is a powerful multi-tool and profiler:
  ● interval sampling
  ● CPU performance counter events
  ● user + kernel sampling and tracing
  ● event filtering
● perf top → a quick way to get an idea of what’s going on in the system
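Under the hood, perf programs the CPU performance counters through the `perf_event_open(2)` system call. The sketch below is modeled on the example in that man page and counts user-space instructions retired by a small loop. `count_instructions` is a hypothetical helper. It returns -1 when the call is refused: in containers, when `kernel.perf_event_paranoid` forbids it, or on VMs without hardware counters.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no wrapper for perf_event_open(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Count user-space instructions retired while running a loop in the
 * calling thread. Returns -1 if counting is unavailable. */
long long count_instructions(unsigned iterations)
{
    struct perf_event_attr attr;
    long long count;
    volatile unsigned sink = 0;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;       /* start stopped, enable explicitly below */
    attr.exclude_kernel = 1; /* user space only: needs fewer privileges */
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: this thread, on whatever CPU it runs on. */
    int fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (fd == -1)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    for (unsigned i = 0; i < iterations; i++)
        sink += i;           /* the workload being measured */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        count = -1;
    close(fd);
    return count;
}
```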
Visualizing traces: flame graphs
● CPU flame graphs
  ● x-axis: the sample population (frames sorted alphabetically, not by time)
  ● y-axis: stack depth
  ● Wider boxes = more samples = more time spent on CPU
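The width rule can be made concrete with a hypothetical helper; this is not code from the FlameGraph tools. Assume each sample's stack is written root-first with frames separated by `;`, as in the folded format those tools consume. The width of a box is then the fraction of samples whose stack starts with that box's frame path.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if 'stack' (root first, frames separated by ';') begins with
 * the frame path 'prefix' on a frame boundary: "main;foo" matches
 * "main;foo;bar" but not "main;foobar". */
static int stack_has_prefix(const char *stack, const char *prefix)
{
    size_t len = strlen(prefix);

    if (strncmp(stack, prefix, len) != 0)
        return 0;
    return stack[len] == '\0' || stack[len] == ';';
}

/* Width of the box for 'prefix' as a fraction of all samples:
 * more samples containing that path means a wider box. */
double box_width(const char *const stacks[], size_t nsamples,
                 const char *prefix)
{
    size_t hits = 0;

    if (nsamples == 0)
        return 0.0;
    for (size_t i = 0; i < nsamples; i++)
        if (stack_has_prefix(stacks[i], prefix))
            hits++;
    return (double)hits / (double)nsamples;
}
```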
Tracing technologies
strace
● strace(1): the system call tracer for Linux
● It uses the ptrace() system call, which stops the target process at each syscall so that the tracer can read its state
● And it does this twice per syscall: once when the syscall begins and once when it ends!
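Here is a minimal sketch of the mechanism, assuming Linux and `PTRACE_SYSCALL`. The child asks to be traced with `PTRACE_TRACEME`, and the parent resumes it one syscall stop at a time. Counting the stops shows where strace's cost comes from: each syscall normally costs two round trips through the tracer. `count_syscall_stops` is a hypothetical helper. It returns -1 if tracing is refused, as under some container seccomp profiles.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (searched in PATH) under ptrace and count syscall stops:
 * normally one on syscall entry and one on syscall exit. */
long count_syscall_stops(char *const argv[])
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        /* Ask to be traced by the parent; a successful exec then stops
         * the child with SIGTRAP. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);
    }

    int status, sig = 0;
    long stops = 0;

    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1; /* tracing refused, or exec failed */

    /* Report syscall stops as SIGTRAP | 0x80, distinct from real signals. */
    ptrace(PTRACE_SETOPTIONS, child, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

    for (;;) {
        /* Resume until the next syscall entry or exit. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, (void *)(long)sig) == -1)
            break;
        if (waitpid(child, &status, 0) == -1 || WIFEXITED(status) ||
            WIFSIGNALED(status))
            break;
        sig = 0;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80))
            stops++;                /* syscall-entry or syscall-exit stop */
        else
            sig = WSTOPSIG(status); /* forward genuine signals */
    }
    return stops;
}
```

The final `exit_group()` has an entry stop but no exit stop, so the total is usually odd.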
strace overhead
### Regular execution ###
righiandr@Dell:~$ dd if=/dev/zero of=/dev/null bs=1 count=500k
512000+0 records in
512000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 0,201641 s, 2,5 MB/s
### Strace execution (tracing a syscall that is never called) ###
righiandr@Dell:~$ strace -eaccept dd if=/dev/zero of=/dev/null bs=1 count=500k
512000+0 records in
512000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 11,7989 s, 43,4 kB/s
+++ exited with 0 +++
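Working the numbers from the two runs above gives a rough sense of the cost; the decimal commas come from the shell's locale. Assume `dd bs=1` issues one read() and one write() per 1-byte record, and ignore the few setup syscalls. Then strace costs roughly 58× in wall time, about 11 µs per syscall, even though it only filters for accept(), which dd never calls.

```latex
\[
\frac{t_{\text{strace}}}{t_{\text{regular}}}
  = \frac{11.7989\ \text{s}}{0.201641\ \text{s}} \approx 58.5
\]
\[
\frac{11.7989\ \text{s} - 0.201641\ \text{s}}{512\,000\ \text{reads} + 512\,000\ \text{writes}}
  = \frac{11.597\ \text{s}}{1\,024\,000} \approx 11.3\ \mu\text{s per syscall}
\]
```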
Tracepoint
● A tracepoint is a special piece of code statically placed in your program (the programmer decides where to put it)
● Anyone who wants to see when the tracepoint is hit, and to extract data from it, can “enable” (activate) it through a dedicated interface
● Two elements are required:
  ● the tracepoint definition (placed in a header file)
  ● the tracepoint statement (in C code)
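The idea can be mimicked in plain user-space C. This is only an analogy with hypothetical names (`struct tracepoint`, `TRACEPOINT`, `tp_register`). The kernel's real implementation uses `TRACE_EVENT()` and static keys, so that a disabled tracepoint costs almost nothing. Here, the definition is a struct holding a probe slot, and the statement is a macro placed in the code that does work only while a probe is attached.

```c
#include <stddef.h>

/* Tracepoint definition (would live in a header): an empty probe slot,
 * i.e. disabled by default. */
struct tracepoint {
    const char *name;
    void (*probe)(const char *name, long arg); /* NULL = disabled */
};

static struct tracepoint tp_alloc_block = { "alloc_block", NULL };

/* Tracepoint statement: a single cheap test while disabled. */
#define TRACEPOINT(tp, arg)                  \
    do {                                     \
        if ((tp).probe != NULL)              \
            (tp).probe((tp).name, (arg));    \
    } while (0)

/* The interface used to enable or disable the tracepoint. */
void tp_register(struct tracepoint *tp, void (*probe)(const char *, long))
{
    tp->probe = probe;
}

void tp_unregister(struct tracepoint *tp)
{
    tp->probe = NULL;
}

/* Instrumented code: the programmer chose where the tracepoint goes. */
long alloc_block(long size)
{
    TRACEPOINT(tp_alloc_block, size);
    return size * 2; /* stand-in for the real work */
}

/* Example probe: counts hits and remembers the last argument. */
static long hits, last_arg;

void count_probe(const char *name, long arg)
{
    (void)name;
    hits++;
    last_arg = arg;
}
```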
Tracepoint example
/* Tracepoint definition (include/trace/events/ext4.h) */
TRACE_EVENT(ext4_free_inode,
	TP_PROTO(struct inode *inode),

	TP_ARGS(inode),

	TP_STRUCT__entry(
		__field( dev_t, dev )
		__field( ino_t, ino )
		__field( uid_t, uid )
		__field( gid_t, gid )
		__field( __u64, blocks )
		__field( __u16, mode )
	),

	TP_fast_assign(
		__entry->dev = inode->i_sb->s_dev;
		__entry->ino = inode->i_ino;
		__entry->uid = i_uid_read(inode);
		__entry->gid = i_gid_read(inode);
		__entry->blocks = inode->i_blocks;
		__entry->mode = inode->i_mode;
	),

	TP_printk("dev %d,%d ino %lu mode 0%o uid %u gid %u blocks %llu",
		  MAJOR(__entry->dev), MINOR(__entry->dev),
		  (unsigned long) __entry->ino, __entry->mode,
		  __entry->uid, __entry->gid, __entry->blocks)
);

/* Tracepoint statement (fs/ext4/ialloc.c, in ext4_free_inode()) */
trace_ext4_free_inode(inode);
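Once defined, a tracepoint like this shows up in tracefs. It can be enabled at run time by writing `1` to its `enable` file, e.g. `events/ext4/ext4_free_inode/enable`. That file lives under `/sys/kernel/debug/tracing`, or `/sys/kernel/tracing` on newer kernels. The following is a minimal C sketch, assuming root privileges and a mounted tracefs; `event_enable_path` and `set_event_enabled` are hypothetical helpers.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "<tracefs>/events/<subsys>/<event>/enable". Returns -1 if a name
 * contains '/' or the buffer is too small, 0 otherwise. */
int event_enable_path(char *buf, size_t len, const char *tracefs,
                      const char *subsys, const char *event)
{
    if (strchr(subsys, '/') || strchr(event, '/'))
        return -1;
    int n = snprintf(buf, len, "%s/events/%s/%s/enable",
                     tracefs, subsys, event);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Enable (on = 1) or disable (on = 0) a tracepoint. Returns -1 on error,
 * typically EACCES without root or ENOENT for an unknown event. */
int set_event_enabled(const char *tracefs, const char *subsys,
                      const char *event, int on)
{
    char path[512];

    if (event_enable_path(path, sizeof(path), tracefs, subsys, event) != 0)
        return -1;
    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;
    ssize_t w = write(fd, on ? "1" : "0", 1);
    close(fd);
    return w == 1 ? 0 : -1;
}
```

The emitted events can then be read from the `trace` file in the same directory.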
Kprobes (Kernel probes)
● Trap at almost any kernel code address, specifying a handler routine to be invoked when the breakpoint is hit
● How does it work?
  ● Make a copy of the probed instruction and replace the original instruction with a breakpoint instruction (int3 on x86)
  ● When the breakpoint is hit, a trap occurs, the CPU registers are saved and control passes to the Kprobes pre-handler
  ● The saved instruction is executed in single-step mode
  ● The Kprobes post-handler is executed
  ● The rest of the original function is executed
● The same mechanism can be applied to user space
  ● uprobes
Kprobe example: stack trace
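#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/sched.h>

static const char function_name[] = "schedule_timeout";

/* Pre-handler: called just before the probed instruction is executed */
static int my_handler(struct kprobe *p, struct pt_regs *regs)
{
	dump_stack();
	/* x86_64 only: the first function argument is passed in %rdi */
	printk(KERN_INFO "%s called %s(%d)\n",
	       current->comm, function_name, (int)regs->di);
	return 0;	/* 0: let the probed instruction run normally */
}

static struct kprobe my_kp = {
	.pre_handler = my_handler,
	.symbol_name = function_name,
};

static int __init my_kprobe_init(void)
{
	return register_kprobe(&my_kp);
}

static void __exit my_kprobe_exit(void)
{
	unregister_kprobe(&my_kp);
}

module_init(my_kprobe_init);
module_exit(my_kprobe_exit);
MODULE_LICENSE("GPL");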
Example: kprobe / uprobe
● Example (kprobe)
$ sudo ./bin/kprobe 'p:do_sys_open filename=+0(%si):string'
$ sudo ./bin/kprobe 'p:SyS_execve filename=+0(%di):string'
● Example (uprobe)
$ sudo ./bin/uprobe 'r:bash:readline +0($retval):string'
$ sudo ./bin/uprobe 'p:/lib/x86_64-linux-gnu/libc-2.23.so:system +0(%di):string'
$ sudo ./bin/uprobe 'p:/lib/x86_64-linux-gnu/libc-2.23.so:malloc size=%di'
● Tracing format
$ sudo cat /sys/kernel/debug/tracing/trace
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
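The perf-tools `kprobe` and `uprobe` scripts are front ends to the ftrace `kprobe_events` and `uprobe_events` files. Below is a minimal C sketch of the key step, assuming root and tracefs at the given directory; `add_probe_event` is a hypothetical helper. Note the `O_APPEND`. Opening the file with truncation, like `>` in the shell, clears all existing probe definitions.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Append one probe definition, e.g. "p:myopen do_sys_open", to
 * <tracing_dir>/<file>, where file is "kprobe_events" or "uprobe_events".
 * O_APPEND without O_TRUNC keeps the existing definitions.
 * Returns 0 on success, -1 on error. */
int add_probe_event(const char *tracing_dir, const char *file,
                    const char *definition)
{
    char path[512], line[512];
    int n;

    if (strchr(definition, '\n'))
        return -1; /* exactly one definition per call */
    n = snprintf(path, sizeof(path), "%s/%s", tracing_dir, file);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    n = snprintf(line, sizeof(line), "%s\n", definition);
    if (n < 0 || (size_t)n >= sizeof(line))
        return -1;

    int fd = open(path, O_WRONLY | O_APPEND);
    if (fd == -1)
        return -1;
    ssize_t w = write(fd, line, (size_t)n);
    close(fd);
    return w == n ? 0 : -1;
}
```

Each new probe then appears as an event under `events/kprobes/` (or `events/uprobes/`) and is enabled like any tracepoint.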
More advanced examples
● Access complex data structures via kprobe and perf probe:
$ sudo -i perf probe --vmlinux=/home/righiandr/linux/vmlinux \
    -nv 'netif_receive_skb skb->dev->name'
...
Writing event: p:probe/netif_receive_skb _text+7991520 name=+0(+16(%di))
...
$ sudo ./bin/kprobe 'p:netif_receive_skb name=+0(+16(%di)):string'
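The fetch argument `+0(+16(%di))` is a two-level dereference. It reads the pointer stored 16 bytes past the address in `%di`, which on x86_64 holds the first argument `skb`, so this is `skb->dev` on that kernel build. It then reads the string at offset 0 of the result, `dev->name`. The same arithmetic can be written in user-space C with hypothetical mock structures. Their layout is chosen so that `dev` sits at offset 16 on a 64-bit build. Real offsets depend on the kernel version and configuration, which is why `perf probe` computes them from `vmlinux`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct net_device / struct sk_buff: only the
 * layout matters (name at offset 0, dev after two pointers). */
struct mock_net_device {
    char name[16];               /* IFNAMSIZ is 16 in Linux */
};

struct mock_sk_buff {
    struct mock_sk_buff *next;   /* offset 0 */
    struct mock_sk_buff *prev;   /* offset 8 on LP64 */
    struct mock_net_device *dev; /* offset 16 on LP64 */
};

/* Emulate the fetch argument +0(+16(%di)):string, where 'di' is the raw
 * value of the first argument register. */
const char *fetch_dev_name(const void *di)
{
    const char *base = di;
    struct mock_net_device *dev;

    /* +16(%di): load the pointer stored at di + 16 (skb->dev). */
    memcpy(&dev, base + offsetof(struct mock_sk_buff, dev), sizeof(dev));

    /* +0(...):string: the string starting at dev + 0 (dev->name). */
    return (const char *)dev + offsetof(struct mock_net_device, name);
}
```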
Tracing overhead
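● strace: high overhead (the tracee stops twice per syscall)
● tracepoints: low overhead (close to zero when disabled)
● kprobes/uprobes: low overhead compared to strace (zero until registered; each hit costs a breakpoint trap, which is more expensive for uprobes since it enters the kernel)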
Efficient profiling: eBPF
eBPF: definition
● eBPF: a highly efficient virtual machine that lives in the kernel
● Ingo Molnar described eBPF as:
  “One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes. This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively”
eBPF history
● Initially it was BPF: Berkeley Packet Filter
  ● It has its roots in BSD in the very early 1990s
  ● Originally designed as a mechanism for fast filtering of network packets
  ● Initially used in Linux by tcpdump to implement the filtering “engine” behind its complex command-line syntax
● Linux introduced eBPF: extended Berkeley Packet Filter (3.18, December 2014)
  ● More efficient and more generic than the original BPF
● Kernel 4.9: eBPF programs can be attached to perf_events
  ● Timed samples can now run BPF programs!
eBPF as a VM
● Example assembly of a simple (classic) BPF filter:
  ● Load the 16-bit quantity at offset 12 in the packet into the accumulator (the Ethernet type)
  ● Compare the value to see if the packet is an IP packet
  ● If the packet is IP, return TRUE (packet is accepted)
  ● Otherwise, return 0 (packet is rejected)
● Only 4 VM instructions to filter IP packets!
      ldh [12]
      jeq #ETHERTYPE_IP, l1, l2
l1:   ret #TRUE
l2:   ret #0
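To see how a VM executes those four instructions, here they are in Linux's classic-BPF encoding (`struct sock_filter` with the `BPF_STMT`/`BPF_JUMP` macros from `<linux/filter.h>`). A tiny interpreter handles only these opcodes. The interpreter, `ip_filter_verdict`, and the `ACCEPT_BYTES` value standing in for `TRUE` are hypothetical; the kernel's own interpreter and JIT handle the full instruction set.

```c
#include <linux/filter.h> /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_IP_VAL 0x0800 /* IPv4 EtherType */
#define ACCEPT_BYTES     0xFFFF /* "TRUE": hypothetical snap length */

/* The filter from the slide; jump offsets are relative to the next insn. */
static const struct sock_filter ip_filter[] = {
    BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12),                      /* ldh [12] */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ETHERTYPE_IP_VAL, 0, 1), /* jeq */
    BPF_STMT(BPF_RET | BPF_K, ACCEPT_BYTES),                     /* l1: ret #TRUE */
    BPF_STMT(BPF_RET | BPF_K, 0),                                /* l2: ret #0 */
};

/* Minimal interpreter for the opcodes above. Returns the verdict:
 * number of bytes to accept, 0 = reject. */
uint32_t run_filter(const struct sock_filter *prog, size_t len,
                    const uint8_t *pkt, size_t pkt_len)
{
    uint32_t A = 0; /* the accumulator */
    size_t pc = 0;

    while (pc < len) {
        const struct sock_filter *insn = &prog[pc++];
        switch (insn->code) {
        case BPF_LD | BPF_H | BPF_ABS:
            if ((size_t)insn->k + 2 > pkt_len)
                return 0; /* out-of-bounds load: reject the packet */
            /* 16-bit load in network byte order */
            A = (uint32_t)pkt[insn->k] << 8 | pkt[insn->k + 1];
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
            pc += (A == insn->k) ? insn->jt : insn->jf;
            break;
        case BPF_RET | BPF_K:
            return insn->k;
        default:
            return 0; /* unsupported opcode */
        }
    }
    return 0;
}

uint32_t ip_filter_verdict(const uint8_t *pkt, size_t pkt_len)
{
    return run_filter(ip_filter, sizeof(ip_filter) / sizeof(ip_filter[0]),
                      pkt, pkt_len);
}
```

The same `ip_filter` array could be wrapped in a `struct sock_fprog` and attached to a socket with `setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, ...)`. `tcpdump -d ip` prints the compiled program for comparison.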
eBPF context
● eBPF is not specific to any particular context
  ● packet filtering: the context is a packet
  ● tracing: the context is a snapshot of the processor registers when the probe is hit
● JIT:
  ● every BPF instruction is mapped to an x86 instruction sequence
  ● the accumulator and index registers are kept directly in the processor’s registers
  ● the program is placed in vmalloc() space and executed directly when a context is processed
How to write an eBPF filter
● A filter can be written in C
● Compiler support: a GCC backend as well as an LLVM backend
● The compiler generates eBPF bytecode, which resides in an ELF file
● Load the program into the kernel using the bpf() syscall
/*
 * tracing filter example to print events
 * for loopback device only if attached to
 * netif_receive_skb()
 */
#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/bpf.h>
#include <trace/bpf_trace.h>

void filter(struct bpf_context *ctx)
{
	char devname[4] = "lo";
	struct net_device *dev;
	struct sk_buff *skb = 0;

	skb = (struct sk_buff *)ctx->regs.si;
	dev = bpf_load_pointer(&skb->dev);
	if (bpf_memcmp(dev->name, devname, 2) == 0) {
		char fmt[] = "skb %p dev %p\n";
		bpf_trace_printk(fmt, sizeof(fmt),
				 (long)skb, (long)dev, 0);
	}
}
Source: https://www.goodreads.com/author_blog_posts/14131100-linux-4-9-s-efficient-bpf-profiler
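Whatever compiler produced the bytecode, the final step is the `bpf(2)` system call. The sketch below uses the current `bpf()` interface, not the `bpf_context` API shown in the example above. It hand-assembles the smallest valid program (`r0 = 0; exit`, a socket filter that drops everything) and loads it with `BPF_PROG_LOAD`. `load_drop_all_filter` is a hypothetical name. Loading usually requires privileges (root or CAP_BPF, depending on `kernel.unprivileged_bpf_disabled`), so the function returns -1 when the kernel refuses.

```c
#define _GNU_SOURCE
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Load "r0 = 0; exit" as a BPF_PROG_TYPE_SOCKET_FILTER program.
 * Returns the program file descriptor, or -1 on failure. */
int load_drop_all_filter(void)
{
    struct bpf_insn insns[2];
    union bpf_attr attr;

    memset(insns, 0, sizeof(insns));
    insns[0].code = BPF_ALU64 | BPF_MOV | BPF_K; /* r0 = imm */
    insns[0].dst_reg = BPF_REG_0;
    insns[0].imm = 0;                            /* verdict 0: drop */
    insns[1].code = BPF_JMP | BPF_EXIT;          /* return r0 */

    memset(&attr, 0, sizeof(attr));
    attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
    attr.insns = (__u64)(unsigned long)insns;
    attr.insn_cnt = 2;
    attr.license = (__u64)(unsigned long)"GPL";

    /* The verifier checks the program before the kernel accepts it. */
    long fd = syscall(SYS_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
    return fd < 0 ? -1 : (int)fd;
}
```

The returned descriptor could then be attached with `setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &fd, sizeof(fd))`. In practice, libbpf or BCC handle the ELF parsing, relocations and loading.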
Thread-injection profiling
Parasite thread injection
● The concept of parasite thread injection was introduced in Linux 3.4 (via PTRACE_SEIZE)
  ● Attach to the target pid without stopping it, becoming a “parasite” thread of that pid
  ● Original goal: freeze and restore TCP connections during checkpoint/restart
● Example
  ● python-pyrasite: injecting code into running Python programs
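Unlike `PTRACE_ATTACH`, `PTRACE_SEIZE` attaches without sending a stop signal. The tracer stops the tracee only when it needs to, with `PTRACE_INTERRUPT`. The following is a minimal sketch on a child process, so that the common Yama `ptrace_scope=1` policy allows it. `seize_and_interrupt` is a hypothetical helper, and the actual code injection performed by checkpoint/restart or injection tools is not shown.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Seize a running child without stopping it, then stop it on demand.
 * Returns 0 if the tracee reached the PTRACE_INTERRUPT stop, else -1. */
int seize_and_interrupt(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0) {
        for (;;)
            pause(); /* the "target" keeps running (sleeping) */
    }

    int status, ret = -1;

    /* Attach: no SIGSTOP is sent, the tracee keeps running. */
    if (ptrace(PTRACE_SEIZE, child, NULL, NULL) == 0 &&
        /* Stop it only when we need to inspect or modify it. */
        ptrace(PTRACE_INTERRUPT, child, NULL, NULL) == 0 &&
        waitpid(child, &status, 0) == child &&
        WIFSTOPPED(status) &&
        /* An interrupt stop is reported as PTRACE_EVENT_STOP. */
        (status >> 16) == PTRACE_EVENT_STOP) {
        ret = 0;
        /* Here a tool would read or modify registers and memory. */
        ptrace(PTRACE_DETACH, child, NULL, NULL);
    }
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return ret;
}
```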
References
● Brendan Gregg blog
  ● http://brendangregg.com/blog/
● BCC tools
  ● https://github.com/iovisor/bcc
● Perf-tools
  ● https://github.com/brendangregg/perf-tools
● Perf-labs
  ● https://github.com/brendangregg/perf-labs
● Linux documentation
  ● http://lxr.linux.no/linux/Documentation/trace
  ● http://lxr.linux.no/linux/Documentation/kprobes.txt
● S. McCanne and V. Jacobson, The BSD Packet Filter: A New Architecture for User-level Packet Capture
  ● http://www.tcpdump.org/papers/bpf-usenix93.pdf
● LWN.net (Linux Weekly News)
  ● http://lwn.net
Thanks
● @arighi
● andrea@betterservers.com