Kernel Debugging
Hao-Ran Liu
Choices of debugging tools
• Add debug code, recompile, and run
– printk, but a timing-sensitive bug may disappear when messages are
written to a slow serial console
– Set the console log level to 0 and read the messages with dmesg instead
• Patch code at runtime to print or gather data
– Ftrace, Kprobes
• Patch code at runtime to stop the kernel and analyze it
– KDB, KGDB
• Run the kernel under the control of a VM such as QEMU or
VirtualBox
printk()
• Kernel-space equivalent of printf()
• Each kernel message is prefixed with a
string representing its loglevel n
– “<n>Hello world!”
• Loglevel determines the severity of the
message
Printk loglevel
• Messages whose level is numerically lower than console_loglevel
(i.e. more severe) are shown on the console
• console_loglevel can be changed via
– dmesg -n level
– syslog system call
– echo n > /proc/sys/kernel/printk
Name          String  Meaning                                                            Alias macro
KERN_EMERG    "0"     Emergency messages; system is about to crash or is unstable        pr_emerg()
KERN_ALERT    "1"     Something bad happened and action must be taken immediately        pr_alert()
KERN_CRIT     "2"     A serious hardware/software failure                                pr_crit()
KERN_ERR      "3"     Often used by drivers to indicate difficulties with the hardware   pr_err()
KERN_WARNING  "4"     Nothing serious by itself, but might indicate problems             pr_warning() / pr_warn()
KERN_NOTICE   "5"     Nothing serious; often used to report security events              pr_notice()
KERN_INFO     "6"     Informational message, e.g. startup info at driver initialization  pr_info()
KERN_DEBUG    "7"     Debug messages                                                     pr_debug() if DEBUG is defined
KERN_DEFAULT  "d"     The default kernel loglevel
KERN_CONT     ""      "Continued" line after a line that had no enclosing \n             pr_cont()
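The filtering rule above can be illustrated with a small user-space sketch. It is not kernel code: the function names are hypothetical, and the fallback level of 4 for messages without a "<n>" prefix is an assumption made for the example (the real default is a kernel configuration choice).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed level for messages without a "<n>" prefix (illustrative only). */
#define SKETCH_DEFAULT_LOGLEVEL 4

/*
 * Parse the "<n>" prefix of a raw kernel message.
 * Returns the level 0..7, or SKETCH_DEFAULT_LOGLEVEL if there is no prefix.
 * If body is non-NULL, it is set to the text that follows the prefix.
 */
int sketch_parse_loglevel(const char *msg, const char **body)
{
    assert(msg != NULL);
    int level = SKETCH_DEFAULT_LOGLEVEL;
    const char *text = msg;

    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>') {
        level = msg[1] - '0';
        text = msg + 3;
    }
    if (body)
        *body = text;
    return level;
}

/* A message reaches the console only if its level is numerically lower. */
int sketch_shown_on_console(const char *msg, int console_loglevel)
{
    return sketch_parse_loglevel(msg, NULL) < console_loglevel;
}
```

With console_loglevel set to 4, a "<3>" (KERN_ERR) message is shown, while "<4>" and "<7>" messages only end up in the log buffer.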
Kernel log buffer
• The kernel log buffer stores kernel messages
• It is a circular buffer: old messages are
overwritten when the buffer is full
– Use the klogd daemon to keep old messages in a file
– The log buffer size is configurable
• The kernel log buffer can be manipulated via the
syslog system call
– or the dmesg command-line tool
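The overwrite behaviour of a circular buffer can be sketched in a few lines of user-space C. This is only an illustration of the idea, not the kernel's implementation; the struct, the function names, and the tiny 8-byte size are all made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_LOG_SIZE 8   /* tiny on purpose; the real buffer is far larger */

struct sketch_log {
    char   data[SKETCH_LOG_SIZE];
    size_t head;            /* total number of bytes ever written */
};

/* Append bytes, overwriting the oldest ones once the buffer is full. */
void sketch_log_write(struct sketch_log *log, const char *s, size_t n)
{
    assert(log != NULL && s != NULL);
    for (size_t i = 0; i < n; i++) {
        log->data[log->head % SKETCH_LOG_SIZE] = s[i];
        log->head++;
    }
}

/* Copy the surviving bytes, oldest first, into out; returns the byte count. */
size_t sketch_log_read(const struct sketch_log *log, char *out, size_t out_len)
{
    assert(log != NULL && out != NULL);
    size_t stored = log->head < SKETCH_LOG_SIZE ? log->head : SKETCH_LOG_SIZE;
    size_t start = log->head - stored;
    size_t n = stored < out_len ? stored : out_len;

    for (size_t i = 0; i < n; i++)
        out[i] = log->data[(start + i) % SKETCH_LOG_SIZE];
    return n;
}
```

Writing "abcdef" and then "ghij" into this 8-byte buffer leaves "cdefghij": the two oldest bytes are gone, which is why a daemon has to drain the buffer if old messages must be kept.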
syslog system call
• int syslog(int type, char *bufp, int len)
/*
* Commands to sys_syslog:
*
* 0 -- Close the log. Currently a NOP.
* 1 -- Open the log. Currently a NOP.
* 2 -- Read from the log (wait until the buffer is nonempty)
* 3 -- Read all messages remaining in the ring buffer
* 4 -- Read and clear all messages remaining in the ring buffer
* 5 -- Clear ring buffer.
* 6 -- Disable printk to console
* 7 -- Enable printk to console
* 8 -- Set level of messages printed to console
* 9 -- Return number of unread characters in the log buffer
*/
Klogd and syslogd
• klogd is the "kernel log daemon". It receives kernel
messages via the syslog system call (or /proc/kmsg) and
redirects them to syslogd
• syslogd differentiates messages by facility.priority (e.g.
LOG_KERN.LOG_ERR) and consults /etc/syslog.conf to
decide how to handle them (discard or save in a file)
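[Figure: in kernel space, the kernel log buffer is exposed through /proc/kmsg and sys_syslog(); in user space, klogd reads from them and forwards messages to syslogd, which writes them to files. Other daemons reach syslogd through the C library calls openlog(), closelog(), and syslog().]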
Use printk macros
• Do not remove debug printk
– you may need it later to debug another related issue
• Undefine DEBUG to remove debug messages in
a production kernel
• For drivers, use dev_dbg() instead
Limit the rate of your printk
• printk may overwhelm the console if it is called
– in code that is executed very often
– in a frequently triggered IRQ handler (e.g. the timer)
• printk_ratelimit() returns 0 when the message to be
printed should be suppressed
• printk_once()
– no matter how often you call it, it prints once and
never again
if (printk_ratelimit())
        printk(KERN_NOTICE "The printer is still on fire\n");
printk_ratelimit() implementation
• The two variables below can be modified via
/proc/sys/kernel/
/* minimum time in jiffies between messages */
int printk_ratelimit_jiffies = 5*HZ;
/* number of messages we send before ratelimiting */
int printk_ratelimit_burst = 10;
int printk_ratelimit(void)
{
        return __printk_ratelimit(printk_ratelimit_jiffies,
                        printk_ratelimit_burst);
}
/proc file system
• A software-created pseudo file system
• Contains a lot of system information, e.g.:
– /proc/<pid>/maps
– /proc/sys/kernel/*
– /proc/interrupts
– /proc/meminfo
• Adding new files to /proc is discouraged; it should
contain only information about processes
• Use sysfs or debugfs instead
debugfs
• A simple way to make information
available to user space
– No rules about file contents, unlike sysfs with its strict one-value-per-file rule
– NOT a stable API for user space
– mount -t debugfs none /sys/kernel/debug
debugfs example
#include <linux/module.h>
#include <linux/debugfs.h>
#include <linux/fs.h>
#include <linux/err.h>

#define len 200

u64 intvalue, hexvalue;
struct dentry *dirret, *fileret;
char _buf[len];

/* read(): copy data from the kernel buffer to user space */
static ssize_t myreader(struct file *fp,
                char __user *user_buffer, size_t count, loff_t *pos)
{
        char *kbuf = (char *)file_inode(fp)->i_private;

        return simple_read_from_buffer(user_buffer, count, pos,
                        kbuf, len);
}

/* write(): copy data from user space into the kernel buffer */
static ssize_t mywriter(struct file *fp,
                const char __user *user_buffer, size_t count, loff_t *pos)
{
        char *kbuf = (char *)file_inode(fp)->i_private;

        return simple_write_to_buffer(kbuf, len, pos,
                        user_buffer, count);
}

static const struct file_operations fops_debug = {
        .read = myreader,
        .write = mywriter,
};

static int __init init_debug(void)
{
        /* create a directory in /sys/kernel/debug */
        dirret = debugfs_create_dir("mydebug", NULL);
        if (IS_ERR_OR_NULL(dirret))
                return -ENODEV;
        /* create a file in the above directory;
           this requires read and write file operations */
        fileret = debugfs_create_file("text", 0644, dirret,
                        _buf, &fops_debug);
        /* create a file that holds a 64-bit integer value
           (return value not used: newer kernels return void here) */
        debugfs_create_u64("number", 0644, dirret, &intvalue);
        /* same, but shown as a hexadecimal value */
        debugfs_create_x64("hexnum", 0644, dirret, &hexvalue);
        return 0;
}

static void __exit exit_debug(void)
{
        /* remove the mydebug directory recursively */
        debugfs_remove_recursive(dirret);
}

module_init(init_debug);
module_exit(exit_debug);
MODULE_LICENSE("GPL");
strace: system call trace
• Intercepts and records
– system calls issued by a process
– signals a process received
• Where to use
– To gain an in-depth understanding of the exact behavior of a program
– To check exactly which system calls a program issues, and with which arguments
– When you don't have access to the source code
• Syntax
– strace [option] <command [args]>
• Common options
– -c -- count time, calls, and errors for each syscall and report summary
– -f -- follow forks
– -T -- print time spent in each syscall
– -e expr -- a qualifying expression: option=[!]all or option=[!]val1[,val2]...
(options: trace, abbrev, verbose, raw, signal, read, or write)
strace output example
execve("/bin/dmesg", ["dmesg"], [/* 22 vars */]) = 0
...
syslog(0x3, 0x95d3858, 0x4008) = 16384
write(1, "amily 2\nIP: routing cache hash t"..., 4096amily
write(1, "to accept 2 bytes to c1bd7f9e fr"...,
...
munmap(0xb7d6b000, 4096) = 0
exit_group(0) = ?
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 92.75    0.013263          35       374           write
  5.02    0.000718         718         1           syslog
  0.51    0.000073          18         4         1 open
  0.47    0.000067          34         2           munmap
  0.41    0.000058          12         5           old_mmap
  0.34    0.000048          24         2           mmap2
  0.11    0.000016           4         4           fstat64
  0.10    0.000015          15         1           read
  0.10    0.000015           8         2           mprotect
  0.08    0.000012           3         4           brk
  0.04    0.000006           2         3           close
  0.04    0.000006           6         1           uname
  0.02    0.000003           3         1           set_thread_area
------ ----------- ----------- --------- --------- ----------------
100.00    0.014300                   404         1 total
Kernel oops
• Raised when the kernel detects a bug in itself
– Fault: the kernel kills the faulting process and tries to continue
• Some locks and data structures may not be released
properly; the system can no longer be trusted
– Panic: the system halts; this usually happens in interrupt context or in
the idle or init task, where the kernel decides it cannot recover
• An oops message contains
– Error message
– Contents of registers
– Stack dump
– Function call trace
• Enable CONFIG_KALLSYMS in the kernel
configuration to get a symbolic call trace
(otherwise all you see are raw addresses)
Kernel Oops Example
• Code below will trigger an oops
ssize_t faulty_write(struct file *filp, const char __user *buf, size_t count,
                loff_t *pos)
{
        /* make a simple fault by dereferencing a NULL pointer */
        *(int *)0 = 0;
        return 0;
}

struct file_operations faulty_fops = {
        .read = faulty_read,
        .write = faulty_write,
        .owner = THIS_MODULE
};
Kernel Oops Example
Unable to handle kernel NULL pointer dereference at virtual address 00000000
Internal error: Oops: 817 [#1] SMP ARM
Modules linked in: faulty(O) bnep hci_uart btbcm bluetooth brcmfmac brcmutil
CPU: 1 PID: 835 Comm: bash Tainted: G O 4.4.21-v7+ #911
task: b6a605c0 ti: b6ae8000 task.ti: b6ae8000
PC is at faulty_write+0x18/0x20 [faulty]
pc : [<7f33c018>] lr : [<8015736c>] sp : b6ae9ed0 ip : b6ae9ee0 fp : b6ae9edc
r10: 00000000 r9 : b6ae8000 r8 : 8000fd08
r7 : b6ae9f80 r6 : 01493c08 r5 : b6ae9f80 r4 : b93953c0
r3 : b6ae9f80 r2 : 00000002 r1 : 01493c08 r0 : 00000000
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c5383d Table: 36b3806a DAC: 00000055
Process bash (pid: 835, stack limit = 0xb6ae8210)
Stack: (0xb6ae9ed0 to 0xb6aea000)
9ec0: b6ae9f4c b6ae9ee0 8015736c 7f33c00c
9ee0: 00000000 0000000a b934f600 80174fc0 b6ae9f3c b6ae9f00 80174fc0 805b66fc
9f00: b6ae9f3c 801559f8 00000000 80157c34 00000000 00000000 b6ae9f44 b6ae9f28
9f20: 80155a0c 80159158 b93953c0 b93953c0 00000002 01493c08 b6ae9f80 8000fd08
9f40: b6ae9f7c b6ae9f50 80157c64 80157344 80155a0c 801752e8 b93953c0 b93953c0
9f60: 00000002 01493c08 8000fd08 b6ae8000 b6ae9fa4 b6ae9f80 801585d4 80157bd0
[<7f33c018>] (faulty_write [faulty]) from [<8015736c>] (__vfs_write+0x34/0xe8)
[<8015736c>] (__vfs_write) from [<80157c64>] (vfs_write+0xa0/0x1a8)
[<80157c64>] (vfs_write) from [<801585d4>] (SyS_write+0x54/0xb0)
[<801585d4>] (SyS_write) from [<8000fb40>] (ret_fast_syscall+0x0/0x1c)
Code: e24cb004 e52de004 e8bd4000 e3a00000 (e5800000)
Calling convention
• A low-level scheme for how subroutines
receive parameters from their caller and
how they return a result
• ARM 32 register allocation:
Register    Use                               Comment
r15         Program counter
r14         Link register                     Used by the BL instruction
r13         Stack pointer                     Must be 8-byte aligned
r12         Intra-procedure-call scratch
r4 to r11   Local variables                   Callee saved
r0 to r3    Arguments and return values       Caller saved
ARM32 Calling convention
decodecode
• A script for disassembling oops code
pi@raspberrypi:~/linux $ dmesg | scripts/decodecode
[ 80.573075] Code: e24cb004 e52de004 e8bd4000 e3a00000 (e5800000)
All code
========
0: e24cb004 sub fp, ip, #4
4: e52de004 push {lr} ; (str lr, [sp, #-4]!)
8: e8bd4000 ldmfd sp!, {lr}
c: e3a00000 mov r0, #0
10:* e5800000 str r0, [r0] <-- trapping instruction
Code starting with the faulting instruction
===========================================
0: e5800000 str r0, [r0]
Finding oops code with GDB
• Module should be compiled with “-g”
– Add “ccflags-y := -g” to module’s Makefile
pi@raspberrypi:~/sunplus/oops $ cat /proc/modules
faulty 1367 0 - Live 0x7f33c000 (O)
bnep 10340 2 - Live 0x7f335000
...
pi@raspberrypi:~/sunplus/oops $ gdb
GNU gdb (Raspbian 7.7.1+dfsg-5) 7.7.1
(gdb) add-symbol-file faulty.ko 0x7f33c000
add symbol table from file "faulty.ko" at
.text_addr = 0x7f33c000
(y or n) y
Reading symbols from faulty.ko...done.
(gdb) list *0x7f33c018
0x7f33c018 is in faulty_write (/home/pi/sunplus/oops/faulty.c:51).
46
47 ssize_t faulty_write (struct file *filp, const char __user *buf, size_t count,
49 {
50 /* make a simple fault by dereferencing a NULL pointer */
51 *(int *)0 = 0;
52 return 0;
53 }
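The oops line "PC is at faulty_write+0x18/0x20" and the module load address from /proc/modules are related by simple arithmetic, which the following user-space sketch spells out. The struct and function names are hypothetical helpers invented for this example; they are not part of gdb or the kernel.

```c
#include <assert.h>
#include <stdio.h>

struct sketch_oops_loc {
    char          symbol[64];
    unsigned long offset;   /* bytes from the start of the function */
    unsigned long size;     /* total size of the function in bytes */
};

/* Parse "symbol+0xOFF/0xSIZE"; returns 0 on success, -1 on malformed input. */
int sketch_parse_oops_loc(const char *text, struct sketch_oops_loc *loc)
{
    assert(text != NULL && loc != NULL);
    if (sscanf(text, "%63[^+]+%lx/%lx",
               loc->symbol, &loc->offset, &loc->size) != 3)
        return -1;
    return 0;
}

/* Function start address = faulting PC minus the offset reported in the oops. */
unsigned long sketch_function_start(unsigned long pc,
                                    const struct sketch_oops_loc *loc)
{
    assert(loc != NULL);
    return pc - loc->offset;
}
```

For the oops shown earlier, pc 0x7f33c018 minus offset 0x18 gives 0x7f33c000, which matches the address of the faulty module listed in /proc/modules; this is why `list *0x7f33c018` resolves to a line inside faulty_write once the symbols are loaded at that base.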
gdb – observe kernel variables
• gdb can observe variables in the running kernel
• How to use it
– gdb /usr/src/linux/vmlinux /proc/kcore
– p jiffies /* print the value of the jiffies variable */
– p jiffies /* you get the same value, since gdb caches values read
from the core file */
– core-file /proc/kcore /* flush the gdb cache */
– p jiffies /* now you get a different value of jiffies */
• vmlinux is the uncompressed ELF kernel
executable, not bzImage
• kcore represents the kernel executable in the format of a
core file
• Disadvantage
– Read-only access to the kernel
Introduction of KGDB and KDB
● The Linux kernel has two different debugger front ends
(kdb and kgdb), which interface to the debug core
● KDB
– Used on a system console or serial console
– Not a source-level debugger; aimed at simple
analysis or diagnosis
– Functions
● Data: read/write memory and registers
● Linux: process lists, backtrace, dmesg
● Control: set breakpoints, single-step instructions
KGDB
● Source-level debugger, used with GDB to debug a
Linux kernel
● Two machines (physical or virtual) are required for
using KGDB
– They communicate via a network or serial connection
– The target machine runs the kernel to be debugged
– The development machine runs an instance of GDB against
the vmlinux file, which contains the symbols
KGDB Kernel Configuration (1)
● CONFIG_DEBUG_INFO=y
– Required by GDB for source-level debugging. This
adds debug symbols to the kernel and modules (gcc -g)
● CONFIG_KALLSYMS=y
– Required by KDB to access symbols by name
● CONFIG_FRAME_POINTER=y
– Saves frame information in registers or on the stack, which allows GDB to
construct stack backtraces more accurately
● CONFIG_DEBUG_RODATA=n
– When this option is enabled, page tables disallow writes to kernel read-only data,
so software breakpoints cannot be used
KGDB Kernel Configuration (2)
● CONFIG_EXPERIMENTAL=y
● CONFIG_KGDB=y
● CONFIG_KGDB_SERIAL_CONSOLE=y
– kgdboc is a KGDB I/O driver for using KGDB/KDB over a
serial console
● CONFIG_SERIAL_8250=y
– Driver for standard serial ports
● CONFIG_SERIAL_8250_CONSOLE=y
– Allows the use of a serial port as the system console
KDB Kernel Configuration
● KGDB must be enabled before KDB can be
enabled. To use KDB on a serial console, kgdboc
and a serial port driver are also needed
● CONFIG_KGDB_KDB=y
– Includes the kdb front end for kgdb
● CONFIG_KDB_KEYBOARD=y
– KDB can use a PS/2-type keyboard as an input device
Kernel Parameters - kgdboc
● kgdboc=[kms][[,]kbd][[,]serial_device][,baud]
– Designed to work with a single serial port that is used
both as your primary console and for kernel debugging
– kms (kernel mode setting) integration allows entering
kdb on a graphics console
– Can be configured via kernel boot parameters or at
runtime via sysfs
– Does not support interrupting the target via the gdb
remote protocol; you must manually send a sysrq-g
● Enable / disable kgdboc
– echo ttyS0,115200 > /sys/module/kgdboc/parameters/kgdboc
– echo "" > /sys/module/kgdboc/parameters/kgdboc
Kernel Parameters - kgdbwait
● Makes the kernel stop as early as the I/O driver allows
and wait for a debugger connection while the kernel
is booting
● Useful for debugging kernel initialization
● Note
– A KGDB I/O driver must be compiled into the kernel, and
kgdbwait should always follow the parameter for the
KGDB I/O driver on the kernel command line
● Example
– kgdboc=ttyS0,115200 kgdbwait
Using KDB on serial port
● Configure the I/O driver
– Boot the kernel with kgdboc parameters, or
– Configure kgdboc via sysfs
● Enter the kernel debugger manually by sending a
sysrq-g, or wait for an oops or fault
– echo g > /proc/sysrq-trigger
– Minicom: Ctrl-a, f, g
– Telnet: Ctrl-], send break<RET>, g
● At the KDB prompt, enter "help" to see a list of
commands, or "go" to resume kernel execution
Some KDB commands
Command   Usage                   Description
----------------------------------------------------------
md        <vaddr>                 Display Memory Contents
mm        <vaddr> <contents>      Modify Memory Contents
go        [<vaddr>]               Continue Execution
rd                                Display Registers
rm        <reg> <contents>        Modify Registers
bt        [<vaddr>]               Stack traceback
help                              Display Help Message
kgdb                              Enter kgdb mode
ps        [<flags>|A]             Display active task list
pid       <pidnum>                Switch to another task
lsmod                             List loaded kernel modules
dmesg     [lines]                 Display syslog buffer
kill      <-signal> <pid>         Send a signal to a process
summary                           Summarize the system
bp        [<vaddr>]               Set/Display breakpoints
ss                                Single Step
Screenshot of KDB with GDB
Using KGDB and GDB (1)
● Configure kgdboc
– kgdb, like kdb, will only hook up to the kernel trap
hooks if a KGDB I/O driver is loaded and configured
● Stop kernel execution
– Send a sysrq-g; if you see a kdb prompt, enter "kgdb"
– Or use kgdbwait to debug the kernel boot
● Connect from gdb
Serial port (newer GDB releases spell the first command "set serial baud"):
$ gdb ./vmlinux
(gdb) set remotebaud 115200
(gdb) target remote /dev/ttyS0
TCP port:
$ gdb ./vmlinux
(gdb) target remote 192.168.1.99:1234
Using KGDB and GDB (2)
● Reminder
– If you “continue” in gdb, and need to "break in" again,
you need to issue another sysrq-g
– You can put a breakpoint at sys_sync and then run
"sync" from a shell to break into the debugger
Screenshot of KGDB and GDB
Kernel profiling with perf
• perf is a command-line profiling tool based on
perf_events kernel interface
– It’s event-based sampling.
When a PMU counter overflows, a sample is recorded.
usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]
The most commonly used perf commands are:
   annotate    Read perf.data (created by perf record) and display annotated code
   archive     Create archive with object files with build-ids found in perf.data file
   data        Data file related processing
   diff        Read perf.data files and display the differential profile
   evlist      List the event names in a perf.data file
   kmem        Tool to trace/measure kernel memory properties
   list        List all symbolic event types
   lock        Analyze lock events
   mem         Profile memory accesses
   record      Run a command and record its profile into perf.data
   report      Read perf.data (created by perf record) and display the profile
   sched       Tool to trace/measure scheduler properties (latencies)
   script      Read perf.data (created by perf record) and display trace output
   stat        Run a command and gather performance counter statistics
   timechart   Tool to visualize total system behavior during a workload
   top         System profiling tool.
   trace       strace inspired tool
   probe       Define new dynamic tracepoints
Use perf_events for CPU profiling
• Flame Graphs visualize profiled code
$ git clone --depth 1 https://github.com/brendangregg/FlameGraph
$ sudo perf record -F 99 -a -g -- sleep 30
$ perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > perf.svg
Example of perf report
pi@raspberrypi:~/sunplus $ sudo perf record -g -a sleep 10
pi@raspberrypi:~/sunplus $ sudo perf report
Samples: 5K of event 'cycles:ppp', Event count (approx.): 184814613
Children Self Command Shared Object Symbol
+ 83.86% 1.97% swapper [kernel.kallsyms] [k] cpu_startup_entry
+ 70.22% 0.00% swapper [kernel.kallsyms] [k] secondary_start_kernel
+ 70.22% 0.00% swapper [unknown] [k] 0x000095ac
+ 67.09% 0.45% swapper [kernel.kallsyms] [k] default_idle_call
+ 66.11% 61.30% swapper [kernel.kallsyms] [k] arch_cpu_idle
...
pi@raspberrypi:~/sunplus $ sudo perf kmem record
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.199 MB perf.data (814 samples) ]
pi@raspberrypi:~/sunplus $ sudo perf kmem stat --caller
Failed to read max nodes, using default of 8
---------------------------------------------------------------------------------------------------------
Callsite | Total_alloc/Per | Total_req/Per | Hit | Ping-pong | Frag
---------------------------------------------------------------------------------------------------------
kthread_create_on_node+5c | 64/64 | 28/28 | 1 | 0 | 56.250%
bcm2835_dma_create_cb_chain+54 | 832/277 | 560/186 | 3 | 3 | 32.692%
alloc_worker+30 | 128/128 | 88/88 | 1 | 0 | 31.250%
alloc_skb_with_frags+58 | 512/512 | 384/384 | 1 | 0 | 25.000%
...
SUMMARY (SLAB allocator)
========================
Total bytes requested: 419,528
Total bytes allocated: 420,216
Total bytes wasted on internal fragmentation: 688
Internal fragmentation: 0.163725%
Cross CPU allocations: 0/326
ftrace
• Useful for event tracing, analyzing latencies and performance issues
• The sysctl ftrace_enabled is a global on/off switch for the function tracer; it is enabled by default
– To disable: echo 0 > /proc/sys/kernel/ftrace_enabled
• Summary of /sys/kernel/debug/tracing
Filename              Description
current_tracer        Sets or displays the currently configured tracer
available_tracers     Tracers listed here can be selected by echoing their name into current_tracer
tracing_on            Enables or disables writing to the ring buffer (tracing overhead may still occur)
trace                 Output of the trace in a human-readable format
tracing_max_latency   Some tracers record the max latency, for example the longest time interrupts were disabled
tracing_thresh        Latency tracers record a trace whenever the latency exceeds the number
                      (in microseconds) in this file
set_ftrace_pid        Makes the function tracer trace only a single thread
set_graph_function    Sets a "trigger" function where tracing should start with the function graph tracer
stack_trace           The stack backtrace of the largest stack encountered while the stack tracer is
                      active
trace_marker          Very useful for synchronizing user space with events happening in the kernel:
                      strings written to this file are written into the ftrace buffer
List of tracers
Tracer           Description
function         Function call tracer; traces all kernel functions
function_graph   Traces both entry and exit of functions, which makes it possible to draw a
                 graph of function calls similar to C source code
irqsoff          Traces the areas that disable interrupts and saves the trace with the longest
                 max latency. See tracing_max_latency.
preemptoff       Traces and records the amount of time for which preemption is disabled
preemptirqsoff   Traces and records the largest time for which irqs and/or preemption is
                 disabled
wakeup           Traces and records the max latency that it takes for the highest-priority task to
                 get scheduled after it has been woken up
wakeup_rt        Traces and records the max latency that it takes for just RT tasks
nop              To remove all tracers from tracing, simply echo "nop" into current_tracer
Example of function tracer
# echo SyS_nanosleep hrtimer_interrupt > set_ftrace_filter
# echo function > current_tracer
# echo 1 > tracing_on
# usleep 1
# echo 0 > tracing_on
# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 5/5   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
          usleep-2665  [001] ....  4186.475355: sys_nanosleep <-system_call_fastpath
          <idle>-0     [001] d.h1  4186.475409: hrtimer_interrupt <-smp_apic_timer_interrupt
          usleep-2665  [001] d.h1  4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
          <idle>-0     [003] d.h1  4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
          <idle>-0     [002] d.h1  4186.475427: hrtimer_interrupt <-smp_apic_timer_interrupt
Note: the function tracer stores its entries in ring buffers,
so the newest data may overwrite the oldest data.
Using echo to stop the trace is sometimes not fast enough,
because tracing may already have overwritten the data you wanted to
record. For this reason, it is sometimes better
to disable tracing directly from a program.
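Disabling tracing from a program amounts to writing "0" to the tracing_on file at the moment of interest. A minimal sketch follows; the helper name and the macro are made up for the example, and the path assumes debugfs is mounted at /sys/kernel/debug (writing there normally requires root).

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Assumed location of the control file with debugfs mounted as usual. */
#define SKETCH_TRACING_ON "/sys/kernel/debug/tracing/tracing_on"

/* Write "1" or "0" to a tracing_on file; returns 0 on success, -1 on error. */
int sketch_set_tracing(const char *path, int on)
{
    assert(path != NULL);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, on ? "1" : "0", 1);
    close(fd);
    return n == 1 ? 0 : -1;
}
```

A program under test would call sketch_set_tracing(SKETCH_TRACING_ON, 0) immediately after the event it wants to capture, so that the ring buffer stops before the interesting entries are overwritten. A long-running program could instead keep the file descriptor open to avoid the cost of open() at the critical moment.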
Example of function-graph tracer
• This tracer can also measure execution time of a function
• To trace only one function and all of its children:
# echo __do_fault > set_graph_function
# echo function_graph > current_tracer
# echo 1 > tracing_on
# usleep 1
# echo 0 > tracing_on
# cat trace
#
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0)               |  __do_fault() {
 0)               |    filemap_fault() {
 0)   0.408 us    |      find_get_page();
 0)   0.085 us    |      _cond_resched();
 0)   2.462 us    |    }
 0)   0.087 us    |    _raw_spin_lock();
 0)   0.104 us    |    add_mm_counter_fast();
 0)   0.106 us    |    page_add_file_rmap();
 0)   0.090 us    |    _raw_spin_unlock();
 0)               |    unlock_page() {
 0)   0.103 us    |      page_waitqueue();
 0)   0.146 us    |      __wake_up_bit();
 0)   1.508 us    |    }
 0)   8.403 us    |  }
Example of irqsoff tracer
# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 3.8.0-test+
# --------------------------------------------------------------------
# latency: 16 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: swapper/0-0 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: run_timer_softirq
# => ended at: run_timer_softirq
#
#
#                  _------=> CPU#
#                 / _-----=> irqs-off
#                | / _----=> need-resched
#                || / _---=> hardirq/softirq
#                ||| / _--=> preempt-depth
#                |||| /     delay
#  cmd     pid   ||||| time  |   caller
#     \   /      |||||  \    |   /
  <idle>-0       0d.s2    0us+: _raw_spin_lock_irq <-run_timer_softirq
  <idle>-0       0dNs3   17us : _raw_spin_unlock_irq <-run_timer_softirq
  <idle>-0       0dNs3   17us+: trace_hardirqs_on <-run_timer_softirq
  <idle>-0       0dNs3   25us : <stack trace>
 => _raw_spin_unlock_irq
 => run_timer_softirq
 => __do_softirq
...
# echo 0 > options/function-trace
# echo irqsoff > current_tracer
# echo 1 > tracing_on
# echo 0 > tracing_max_latency
# ls -ltr
[...]
# echo 0 > tracing_on
# cat trace
Note that function-trace was not set in the example above. With
function-trace set, the output is much larger.
Example of stack tracer
• ftrace makes it convenient to check the stack size at
every function call
# echo 1 > /proc/sys/kernel/stack_tracer_enabled
After running it for a few minutes, the output looks like:
# cat stack_max_size
2928
# cat stack_trace
Depth Size Location (18 entries)
----- ---- --------
0) 2928 224 update_sd_lb_stats+0xbc/0x4ac
1) 2704 160 find_busiest_group+0x31/0x1f1
2) 2544 256 load_balance+0xd9/0x662
3) 2288 80 idle_balance+0xbb/0x130
4) 2208 128 __schedule+0x26e/0x5b9
5) 2080 16 schedule+0x64/0x66
6) 2064 128 schedule_timeout+0x34/0xe0
7) 1936 112 wait_for_common+0x97/0xf1
8) 1824 16 wait_for_completion+0x1d/0x1f
9) 1808 128 flush_work+0xfe/0x119
10) 1680 16 tty_flush_to_ldisc+0x1e/0x20
11) 1664 48 input_available_p+0x1d/0x5c
12) 1616 48 n_tty_poll+0x6d/0x134
13) 1568 64 tty_poll+0x64/0x7f
14) 1504 880 do_select+0x31e/0x511
15) 624 400 core_sys_select+0x177/0x216
16) 224 96 sys_select+0x91/0xb9
17) 128 128 system_call_fastpath+0x16/0x1b
ftrace homework
• Read https://www.kernel.org/doc/Documentation/trace/events.txt
This document is about event tracing (static tracepoints)
• perf-tools is a collection of performance analysis tools for Linux
ftrace and perf_events. Try to find a good use of it in your work. You
can download it from https://github.com/brendangregg/perf-tools.git
• Write a small program using ftrace to track the number of context
switches per second for each CPU.
$ sudo ./ftrace_ctxt_switches.py
...
Duration (sec): 61.386, Context switches (per sec): CPU0: 1130 ( 18) CPU1: 5875 ( 96) CPU2: 183 ( 3) CPU3: 230 ( 4)
Duration (sec): 63.784, Context switches (per sec): CPU0: 1138 ( 18) CPU1: 6028 ( 95) CPU2: 188 ( 3) CPU3: 230 ( 4)
...

More Related Content

What's hot

Memory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux KernelMemory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux KernelAdrian Huang
 
Physical Memory Management.pdf
Physical Memory Management.pdfPhysical Memory Management.pdf
Physical Memory Management.pdfAdrian Huang
 
Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Adrian Huang
 
Page cache in Linux kernel
Page cache in Linux kernelPage cache in Linux kernel
Page cache in Linux kernelAdrian Huang
 
Understanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panicUnderstanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panicJoseph Lu
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBshimosawa
 
Kernel_Crash_Dump_Analysis
Kernel_Crash_Dump_AnalysisKernel_Crash_Dump_Analysis
Kernel_Crash_Dump_AnalysisBuland Singh
 
New Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingNew Ways to Find Latency in Linux Using Tracing
New Ways to Find Latency in Linux Using TracingScyllaDB
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionGene Chang
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)Brendan Gregg
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at NetflixBrendan Gregg
 
Understanding of linux kernel memory model
Understanding of linux kernel memory modelUnderstanding of linux kernel memory model
Understanding of linux kernel memory modelSeongJae Park
 
malloc & vmalloc in Linux
malloc & vmalloc in Linuxmalloc & vmalloc in Linux
malloc & vmalloc in LinuxAdrian Huang
 
Performance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux KernelPerformance Analysis Tools for Linux Kernel
Performance Analysis Tools for Linux Kernellcplcp1
 
Physical Memory Models.pdf
Physical Memory Models.pdfPhysical Memory Models.pdf
Physical Memory Models.pdfAdrian Huang
 
Let's trace Linux Lernel with KGDB @ COSCUP 2021
Let's trace Linux Lernel with KGDB @ COSCUP 2021Let's trace Linux Lernel with KGDB @ COSCUP 2021
Let's trace Linux Lernel with KGDB @ COSCUP 2021Jian-Hong Pan
 
Yet another introduction to Linux RCU
Yet another introduction to Linux RCUYet another introduction to Linux RCU
Yet another introduction to Linux RCUViller Hsiao
 
IntelON 2021 Processor Benchmarking
IntelON 2021 Processor BenchmarkingIntelON 2021 Processor Benchmarking
Kernel Debugging Tools and Techniques

  • 2. Choices of debugging tools
      • Add debug code, recompile and run
          – printk, but a timing-sensitive bug may disappear when the data is written to a serial console
          – Set the console log level to 0 and use dmesg instead
      • Patch code at runtime to print or gather data
          – Ftrace, Kprobes
      • Patch code at runtime to stop the kernel and analyze
          – KDB, KGDB
      • Run the kernel under the control of a VM such as QEMU or VirtualBox
  • 3. printk()
      • Kernel-space equivalent of printf()
      • Each kernel message is prefixed with a string representing its loglevel n
          – "<n>Hello world!"
      • The loglevel determines the severity of the message
  • 4. Printk loglevel
      • Messages with a level lower than console_loglevel are shown on the console
      • console_loglevel can be changed via
          – dmesg -n level
          – syslog system call
          – echo n > /proc/sys/kernel/printk

      Name          String  Meaning                                                            Alias macro
      KERN_EMERG    "0"     Emergency messages; system is about to crash or is unstable        pr_emerg()
      KERN_ALERT    "1"     Something bad happened and action must be taken immediately        pr_alert()
      KERN_CRIT     "2"     A serious hardware/software failure                                pr_crit()
      KERN_ERR      "3"     Often used by drivers to indicate difficulties with the hardware   pr_err()
      KERN_WARNING  "4"     Nothing serious by itself but might indicate problems              pr_warning() (pr_warn() in newer kernels)
      KERN_NOTICE   "5"     Nothing serious. Often used to report security events              pr_notice()
      KERN_INFO     "6"     Informational message, e.g. startup info at driver initialization  pr_info()
      KERN_DEBUG    "7"     Debug messages                                                     pr_debug() if DEBUG is defined
      KERN_DEFAULT  "d"     The default kernel loglevel
      KERN_CONT     ""      "continued" line after a line that had no enclosing \n             pr_cont()
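
The console filtering rule on this slide can be sketched in plain user-space C. This is only an illustration, not kernel code: `parse_loglevel()` and `shown_on_console()` are hypothetical helper names, and the sketch assumes the textual "<n>" prefix shown on the printk() slide.

```c
#include <stddef.h>

/*
 * Hypothetical helper: read a "<n>" prefix (n = 0..7) from a message.
 * Returns the level, or default_level when no prefix is present.
 * *body is set to the text that follows the prefix.
 */
int parse_loglevel(const char *msg, int default_level, const char **body)
{
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>') {
        *body = msg + 3;
        return msg[1] - '0';
    }
    *body = msg;
    return default_level;
}

/*
 * A message reaches the console only when its level is numerically
 * lower (more severe) than console_loglevel.
 */
int shown_on_console(int level, int console_loglevel)
{
    return level < console_loglevel;
}
```

With console_loglevel set to 7, a KERN_ERR ("3") message is shown and a KERN_DEBUG ("7") message is not; with console_loglevel set to 0, nothing is shown on the console.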
  • 5. Kernel log buffer
      • The kernel log buffer stores kernel messages
      • It is a circular buffer. Old messages are overwritten when the buffer is full
          – Use the klogd daemon to keep old messages in a file
          – Log buffer size is configurable
      • The kernel log buffer can be manipulated via the syslog system call
          – or the dmesg command line tool
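
The overwrite behaviour of a circular buffer can be illustrated with a small user-space sketch. The names `log_ring`, `ring_write()` and `ring_read()` and the tiny `LOG_BUF_LEN` are assumptions made for the example; the kernel's real log buffer stores structured records and is considerably more involved.

```c
#include <stddef.h>

#define LOG_BUF_LEN 8   /* deliberately tiny; hypothetical size */

struct log_ring {
    char buf[LOG_BUF_LEN];
    size_t head;        /* total number of bytes ever written */
};

/* Append a string; once the buffer is full the oldest bytes are overwritten. */
void ring_write(struct log_ring *r, const char *s)
{
    for (; *s; s++)
        r->buf[r->head++ % LOG_BUF_LEN] = *s;
}

/*
 * Copy the surviving bytes, oldest first, into out and NUL-terminate it.
 * out must have room for LOG_BUF_LEN + 1 bytes. Returns the byte count.
 */
size_t ring_read(const struct log_ring *r, char *out)
{
    size_t n = r->head < LOG_BUF_LEN ? r->head : LOG_BUF_LEN;
    size_t start = r->head - n;

    for (size_t i = 0; i < n; i++)
        out[i] = r->buf[(start + i) % LOG_BUF_LEN];
    out[n] = '\0';
    return n;
}
```

After writing "abc" and then "defghij" into a zero-initialised `log_ring`, only the last eight bytes survive, which is why a daemon has to drain the buffer regularly if old messages are to be kept.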
  • 6. syslog system call
      • int syslog(int type, char *bufp, int len)

      /*
       * Commands to sys_syslog:
       *
       *      0 -- Close the log. Currently a NOP.
       *      1 -- Open the log. Currently a NOP.
       *      2 -- Read from the log (wait until the buffer is nonempty)
       *      3 -- Read all messages remaining in the ring buffer
       *      4 -- Read and clear all messages remaining in the ring buffer
       *      5 -- Clear ring buffer.
       *      6 -- Disable printk to console
       *      7 -- Enable printk to console
       *      8 -- Set level of messages printed to console
       *      9 -- Return number of unread characters in the log buffer
       */
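
From user space, glibc exposes this system call as klogctl() in <sys/klog.h> (the name syslog() is taken by the C library's logging function). A minimal sketch using command 3 from the list above; `read_kernel_log()` and `SYSLOG_READ_ALL` are names invented for the example, and the call may fail with EPERM when the process lacks the required privileges.

```c
#include <stdio.h>
#include <sys/klog.h>

/* Command 3 in the list above: read all messages remaining in the ring buffer. */
#define SYSLOG_READ_ALL 3

/*
 * Copy up to len bytes of the kernel log into buf.
 * Returns the number of bytes read, or -1 on error (errno is set).
 */
int read_kernel_log(char *buf, int len)
{
    int n = klogctl(SYSLOG_READ_ALL, buf, len);

    if (n < 0)
        perror("klogctl");
    return n;
}
```

This is roughly what the strace output of dmesg on a later slide shows as `syslog(0x3, ...)`.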
  • 7. Klogd and syslogd
      • klogd is the "kernel log daemon". It receives kernel messages via the syslog system call (or /proc/kmsg) and redirects them to syslogd
      • syslogd differentiates messages by facility.priority (e.g. LOG_KERN.LOG_ERR) and consults /etc/syslog.conf to decide how to handle them (discard or save in a file)
      [Diagram: in kernel space, the kernel log buffer is exposed through /proc/kmsg and sys_syslog(); in user space, klogd reads it and forwards to syslogd, which writes to files. Other daemons reach syslogd through the C library: openlog(), closelog(), syslog()]
  • 8. Use printk macros
      • Do not remove debug printk calls
          – you may need them later to debug another related issue
      • Undefine DEBUG to remove debug messages in a production kernel
      • For drivers, use dev_dbg() instead
  • 9. Limit the rate of your printk
      • printk may overwhelm the console if it is called
          – in code that is executed very often
          – in a frequently triggered IRQ handler (e.g. the timer)
      • printk_ratelimit() returns 0 when the message to be printed should be suppressed
      • printk_once()
          – no matter how often you call it, it prints once and never again

      if (printk_ratelimit())
              printk(KERN_NOTICE "The printer is still on fire\n");
  • 10. printk_ratelimit() implementation
      • The two variables can be modified via /proc/sys/kernel/

      /* minimum time in jiffies between messages */
      int printk_ratelimit_jiffies = 5*HZ;

      /* number of messages we send before ratelimiting */
      int printk_ratelimit_burst = 10;

      int printk_ratelimit(void)
      {
              return __printk_ratelimit(printk_ratelimit_jiffies,
                                        printk_ratelimit_burst);
      }
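
The slide does not show what __printk_ratelimit() does with the two values. The user-space sketch below is one plausible reading of an interval-plus-burst limiter, loosely modeled on the idea and not taken from the kernel source: within each window of `interval` ticks at most `burst` messages pass, and the rest are counted as missed. The struct and function names are hypothetical, and time is passed in explicitly so the logic is easy to follow.

```c
struct ratelimit {
    unsigned long interval; /* window length in ticks (stand-in for jiffies) */
    unsigned int burst;     /* messages allowed per window */
    unsigned long begin;    /* start of the current window */
    unsigned int printed;   /* messages let through in this window */
    unsigned int missed;    /* messages suppressed so far */
};

/* Returns 1 if the caller may print at time now, 0 if it should stay quiet. */
int ratelimit_ok(struct ratelimit *rs, unsigned long now)
{
    if (now - rs->begin >= rs->interval) {
        /* The window has expired: start a new one. */
        rs->begin = now;
        rs->printed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return 1;
    }
    rs->missed++;
    return 0;
}
```

With interval = 5 and burst = 2, calls at ticks 0 and 1 are allowed, a call at tick 2 is suppressed, and a call at tick 5 is allowed again because a new window has started.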
  • 11. /proc file system
      • A software-created pseudo file system
      • Contains a lot of system information, e.g.
          – /proc/<pid>/maps
          – /proc/sys/kernel/*
          – /proc/interrupts
          – /proc/meminfo
      • Adding new files to /proc is discouraged; it should contain only information about processes
      • You should use sysfs or debugfs instead
  • 12. debugfs
      • A simple way to make information available to user space
          – Unlike sysfs, it has no strict one-value-per-file rule
          – NOT a stable API for user space
          – mount -t debugfs none /sys/kernel/debug
  • 13. debugfs example

      #include <linux/module.h>
      #include <linux/fs.h>
      #include <linux/debugfs.h>

      #define BUF_LEN 200

      static u64 intvalue, hexvalue;
      static struct dentry *dirret, *fileret;
      static char _buf[BUF_LEN];

      /* read handler: copy from the kernel buffer to user space */
      static ssize_t myreader(struct file *fp, char __user *user_buffer,
                              size_t count, loff_t *pos)
      {
              char *kbuf = (char *)file_inode(fp)->i_private;

              return simple_read_from_buffer(user_buffer, count, pos, kbuf, BUF_LEN);
      }

      /* write handler: copy from user space into the kernel buffer */
      static ssize_t mywriter(struct file *fp, const char __user *user_buffer,
                              size_t count, loff_t *pos)
      {
              char *kbuf = (char *)file_inode(fp)->i_private;

              return simple_write_to_buffer(kbuf, BUF_LEN, pos, user_buffer, count);
      }

      static const struct file_operations fops_debug = {
              .read = myreader,
              .write = mywriter,
      };

      static int __init init_debug(void)
      {
              /* create a directory in /sys/kernel/debug */
              dirret = debugfs_create_dir("mydebug", NULL);
              if (IS_ERR_OR_NULL(dirret))
                      return -ENODEV;

              /* create a file in the above directory;
                 this requires read and write file operations */
              fileret = debugfs_create_file("text", 0644, dirret, _buf, &fops_debug);

              /* create a file which takes a 64-bit integer value
                 (return value ignored; newer kernels return void here) */
              debugfs_create_u64("number", 0644, dirret, &intvalue);

              /* takes a hexadecimal value */
              debugfs_create_x64("hexnum", 0644, dirret, &hexvalue);
              return 0;
      }

      static void __exit exit_debug(void)
      {
              /* remove mydebug dir recursively */
              debugfs_remove_recursive(dirret);
      }

      module_init(init_debug);
      module_exit(exit_debug);
      /* debugfs symbols are exported GPL-only */
      MODULE_LICENSE("GPL");
  • 14. strace: system call trace
      • Intercepts and records
          – system calls issued by a process
          – signals received by a process
      • Where to use
          – To gain an in-depth understanding of the exact behavior of a program
          – To debug the exact arguments or system calls a program issues
          – When you don't have access to the source code
      • Syntax
          – strace [option] <command [args]>
      • Common options
          – -c -- count time, calls, and errors for each syscall and report summary
          – -f -- follow forks
          – -T -- print time spent in each syscall
          – -e expr -- a qualifying expression: option=[!]all or option=[!]val1[,val2]... (options: trace, abbrev, verbose, raw, signal, read, or write)
  • 15. strace output example

      execve("/bin/dmesg", ["dmesg"], [/* 22 vars */]) = 0
      ...
      syslog(0x3, 0x95d3858, 0x4008) = 16384
      write(1, "amily 2\nIP: routing cache hash t"..., 4096amily
      write(1, "to accept 2 bytes to c1bd7f9e fr"..., ...
      munmap(0xb7d6b000, 4096) = 0
      exit_group(0) = ?

      % time     seconds  usecs/call     calls    errors syscall
      ------ ----------- ----------- --------- --------- ----------------
       92.75    0.013263          35       374           write
        5.02    0.000718         718         1           syslog
        0.51    0.000073          18         4         1 open
        0.47    0.000067          34         2           munmap
        0.41    0.000058          12         5           old_mmap
        0.34    0.000048          24         2           mmap2
        0.11    0.000016           4         4           fstat64
        0.10    0.000015          15         1           read
        0.10    0.000015           8         2           mprotect
        0.08    0.000012           3         4           brk
        0.04    0.000006           2         3           close
        0.04    0.000006           6         1           uname
        0.02    0.000003           3         1           set_thread_area
      ------ ----------- ----------- --------- --------- ----------------
      100.00    0.014300                   404         1 total
  • 16. Kernel oops
      • Raised when the kernel detects a bug in itself
          – Fault: the kernel kills the faulting process and tries to continue
              • Some locks and data structures may not be released properly; the system cannot be trusted anymore
          – Panic: the system halts, usually in interrupt context or in the idle or init task, where the kernel thinks it cannot recover
      • An oops message contains
          – Error message
          – Contents of registers
          – Stack dump
          – Function call trace
      • Enable CONFIG_KALLSYMS in the kernel configuration to get a symbolic call trace (otherwise all you see are raw addresses)
  • 17. Kernel Oops Example
      • The code below will trigger an oops

      ssize_t faulty_write (struct file *filp, const char __user *buf, size_t count,
                      loff_t *pos)
      {
              /* make a simple fault by dereferencing a NULL pointer */
              *(int *)0 = 0;
              return 0;
      }

      struct file_operations faulty_fops = {
              .read = faulty_read,
              .write = faulty_write,
              .owner = THIS_MODULE
      };
  • 18. Kernel Oops Example

      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      Internal error: Oops: 817 [#1] SMP ARM
      Modules linked in: faulty(O) bnep hci_uart btbcm bluetooth brcmfmac brcmutil
      CPU: 1 PID: 835 Comm: bash Tainted: G O 4.4.21-v7+ #911
      task: b6a605c0 ti: b6ae8000 task.ti: b6ae8000
      PC is at faulty_write+0x18/0x20 [faulty]
      pc : [<7f33c018>] lr : [<8015736c>]
      sp : b6ae9ed0 ip : b6ae9ee0 fp : b6ae9edc
      r10: 00000000 r9 : b6ae8000 r8 : 8000fd08
      r7 : b6ae9f80 r6 : 01493c08 r5 : b6ae9f80 r4 : b93953c0
      r3 : b6ae9f80 r2 : 00000002 r1 : 01493c08 r0 : 00000000
      Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
      Control: 10c5383d Table: 36b3806a DAC: 00000055
      Process bash (pid: 835, stack limit = 0xb6ae8210)
      Stack: (0xb6ae9ed0 to 0xb6aea000)
      9ec0: b6ae9f4c b6ae9ee0 8015736c 7f33c00c
      9ee0: 00000000 0000000a b934f600 80174fc0 b6ae9f3c b6ae9f00 80174fc0 805b66fc
      9f00: b6ae9f3c 801559f8 00000000 80157c34 00000000 00000000 b6ae9f44 b6ae9f28
      9f20: 80155a0c 80159158 b93953c0 b93953c0 00000002 01493c08 b6ae9f80 8000fd08
      9f40: b6ae9f7c b6ae9f50 80157c64 80157344 80155a0c 801752e8 b93953c0 b93953c0
      9f60: 00000002 01493c08 8000fd08 b6ae8000 b6ae9fa4 b6ae9f80 801585d4 80157bd0
      [<7f33c018>] (faulty_write [faulty]) from [<8015736c>] (__vfs_write+0x34/0xe8)
      [<8015736c>] (__vfs_write) from [<80157c64>] (vfs_write+0xa0/0x1a8)
      [<80157c64>] (vfs_write) from [<801585d4>] (SyS_write+0x54/0xb0)
      [<801585d4>] (SyS_write) from [<8000fb40>] (ret_fast_syscall+0x0/0x1c)
      Code: e24cb004 e52de004 e8bd4000 e3a00000 (e5800000)
  • 19. Calling convention
      • A low-level scheme for how subroutines receive parameters from their caller and how they return a result
      • ARM 32 register allocation:

      Register    Use                                Comment
      r15         Program counter
      r14         Link register                      Used by the BL instruction
      r13         Stack pointer                      Must be 8-byte aligned
      r12         For intra-procedure calls
      r4 to r11   For local variables                Callee saved
      r0 to r3    For arguments and return values    Caller saved
  • 21. decodecode
      • A script for disassembling oops code

      pi@raspberrypi:~/linux $ dmesg | scripts/decodecode
      [ 80.573075] Code: e24cb004 e52de004 e8bd4000 e3a00000 (e5800000)
      All code
      ========
         0: e24cb004 sub fp, ip, #4
         4: e52de004 push {lr} ; (str lr, [sp, #-4]!)
         8: e8bd4000 ldmfd sp!, {lr}
         c: e3a00000 mov r0, #0
        10:* e5800000 str r0, [r0] <-- trapping instruction

      Code starting with the faulting instruction
      ===========================================
         0: e5800000 str r0, [r0]
  • 22. Finding oops code with GDB
      • The module should be compiled with "-g"
          – Add "ccflags-y := -g" to the module's Makefile

      pi@raspberrypi:~/sunplus/oops $ cat /proc/modules
      faulty 1367 0 - Live 0x7f33c000 (O)
      bnep 10340 2 - Live 0x7f335000
      ...
      pi@raspberrypi:~/sunplus/oops $ gdb
      GNU gdb (Raspbian 7.7.1+dfsg-5) 7.7.1
      (gdb) add-symbol-file faulty.ko 0x7f33c000
      add symbol table from file "faulty.ko" at
              .text_addr = 0x7f33c000
      (y or n) y
      Reading symbols from faulty.ko...done.
      (gdb) list *0x7f33c018
      0x7f33c018 is in faulty_write (/home/pi/sunplus/oops/faulty.c:51).
      46
      47 ssize_t faulty_write (struct file *filp, const char __user *buf, size_t count,
      48 loff_t *pos)
      49 {
      50 /* make a simple fault by dereferencing a NULL pointer */
      51 *(int *)0 = 0;
      52 return 0;
      53 }
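
The address arithmetic behind this lookup is simple enough to sketch in user-space C. The oops reports the faulting location as "symbol+0xoffset/0xsize", and the address passed to gdb is the module load address from /proc/modules plus the offset of the instruction inside the module. `parse_oops_loc()` and `module_offset()` are hypothetical helper names for this illustration.

```c
#include <stdio.h>

struct oops_loc {
    char symbol[64];        /* function name, e.g. "faulty_write" */
    unsigned long offset;   /* offset of the faulting instruction in the function */
    unsigned long size;     /* total size of the function in bytes */
};

/* Parse a "symbol+0xOFF/0xSIZE" string. Returns 1 on success, 0 otherwise. */
int parse_oops_loc(const char *text, struct oops_loc *loc)
{
    return sscanf(text, "%63[^+]+%lx/%lx",
                  loc->symbol, &loc->offset, &loc->size) == 3;
}

/* Offset of an address from the start of the module's load address. */
unsigned long module_offset(unsigned long pc, unsigned long module_base)
{
    return pc - module_base;
}
```

In the example above, pc is 0x7f33c018 and the module is loaded at 0x7f33c000, so the faulting instruction sits 0x18 bytes into the module. That matches the +0x18 in "faulty_write+0x18/0x20", which suggests faulty_write is the first function in the module's text.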
  • 23. gdb – observe kernel variables
      • gdb can observe variables in the running kernel
      • How to use?
          – gdb /usr/src/linux/vmlinux /proc/kcore
          – p jiffies /* print the value of the jiffies variable */
          – p jiffies /* you get the same value, since gdb caches values read from the core file */
          – core-file /proc/kcore /* flush the gdb cache */
          – p jiffies /* you get a different value of jiffies */
      • vmlinux is the uncompressed ELF kernel executable, not bzImage
      • kcore represents the running kernel in the format of a core file
      • Disadvantage
          – Read-only access to the kernel
  • 24. Introduction of KGDB and KDB
      ● The Linux kernel has two different debugger front ends (kdb and kgdb) which interface to the debug core
      ● KDB
          – Used on a system console or serial console
          – Not a source-level debugger; aimed at simple analysis or diagnosis
          – Functions
              ● Data: read/write memory, registers
              ● Linux: process lists, backtrace, dmesg
              ● Control: set breakpoints, single-step instructions
  • 25. KGDB
      ● Source-level debugger, used with GDB to debug a Linux kernel
      ● Two machines (physical or virtual) are required for using KGDB
          – They communicate via a network or serial connection
          – The target machine runs the kernel to be debugged
          – The development machine runs an instance of GDB against the vmlinux file, which contains the symbols
  • 26. KGDB Kernel Configuration (1)
      ● CONFIG_DEBUG_INFO=y
          – Required by GDB for source-level debugging. This adds debug symbols to the kernel and modules (gcc -g)
      ● CONFIG_KALLSYMS=y
          – Required by KDB to access symbols by name
      ● CONFIG_FRAME_POINTER=y
          – Saves frame information in registers or on the stack, allowing GDB to construct stack backtraces more accurately
      ● CONFIG_DEBUG_RODATA=n
          – When this option is enabled, page tables disallow writes to kernel read-only data and you cannot use software breakpoints
  • 27. KGDB Kernel Configuration (2)
      ● CONFIG_EXPERIMENTAL=y
      ● CONFIG_KGDB=y
      ● CONFIG_KGDB_SERIAL_CONSOLE=y
          – kgdboc is a KGDB I/O driver for using KGDB/KDB over a serial console
      ● CONFIG_SERIAL_8250=y
          – Driver for standard serial ports
      ● CONFIG_SERIAL_8250_CONSOLE=y
          – Allows the use of a serial port as the system console
  • 28. KDB Kernel Configuration
      ● KGDB must be enabled before KDB can be enabled. To use KDB on a serial console, kgdboc and a serial port driver are also needed
      ● CONFIG_KGDB_KDB=y
          – Include the kdb front end for kgdb
      ● CONFIG_KDB_KEYBOARD=y
          – KDB can use a PS/2 type keyboard as an input device
  • 29. Kernel Parameters - kgdboc
      ● kgdboc=[kms][[,]kbd][[,]serial_device][,baud]
          – Designed to work with a single serial port which is used for your primary console and for kernel debugging
          – kms (kernel mode setting) integration allows entering kdb on a graphical console
          – Can be configured in the kernel boot parameters or at runtime via sysfs
          – Does not support interrupting the target via the gdb remote protocol. You must manually send a sysrq-g
      ● Enable / disable kgdboc
          – echo ttyS0,115200 > /sys/module/kgdboc/parameters/kgdboc
          – echo "" > /sys/module/kgdboc/parameters/kgdboc
  • 30. Kernel Parameters - kgdbwait
      ● Makes the kernel stop as early as the I/O driver supports during boot and wait for a debugger connection
      ● Useful for debugging kernel initialization
      ● Note
          – A KGDB I/O driver must be compiled into the kernel, and kgdbwait should always follow the parameter for the KGDB I/O driver on the kernel command line
      ● Example
          – kgdboc=ttyS0,115200 kgdbwait
  • 31. Using KDB on serial port
      ● Configure the I/O driver
          – Boot the kernel with kgdboc parameters, or
          – Configure kgdboc via sysfs
      ● Enter the kernel debugger manually by sending a sysrq-g, or wait for an oops or fault
          – echo g > /proc/sysrq-trigger
          – Minicom: Ctrl-a, f, g
          – Telnet: Ctrl-], send break<RET>, g
      ● At the KDB prompt, enter "help" to see a list of commands, or "go" to resume kernel execution
  • 32. Some KDB commands
    Command   Usage                  Description
    ----------------------------------------------------------
    md        <vaddr>                Display Memory Contents
    mm        <vaddr> <contents>     Modify Memory Contents
    go        [<vaddr>]              Continue Execution
    rd                               Display Registers
    rm        <reg> <contents>       Modify Registers
    bt        [<vaddr>]              Stack traceback
    help                             Display Help Message
    kgdb                             Enter kgdb mode
    ps        [<flags>|A]            Display active task list
    pid       <pidnum>               Switch to another task
    lsmod                            List loaded kernel modules
    dmesg     [lines]                Display syslog buffer
    kill      <-signal> <pid>        Send a signal to a process
    summary                          Summarize the system
    bp        [<vaddr>]              Set/Display breakpoints
    ss                               Single Step
  • 33. Screenshot of KDB with GDB
  • 34. Using KGDB and GDB (1)
    ● Configure kgdboc
      – kgdb, like kdb, only hooks into the kernel trap hooks if a KGDB I/O driver is loaded and configured
    ● Stop kernel execution
      – Send a sysrq-g; if you see a kdb prompt, enter "kgdb"
      – Or use kgdbwait to debug kernel boot
    ● Connect from gdb
      – Serial port (newer GDB releases spell the first command "set serial baud 115200"):
          $ gdb ./vmlinux
          (gdb) set remotebaud 115200
          (gdb) target remote /dev/ttyS0
      – TCP port:
          $ gdb ./vmlinux
          (gdb) target remote 192.168.1.99:1234
  • 35. Using KGDB and GDB (2)
    ● Reminder
      – If you "continue" in gdb and need to break in again, you must issue another sysrq-g
      – Alternatively, put a breakpoint at sys_sync and run "sync" from a shell to break into the debugger
  • 37. Kernel profiling with perf
    • perf is a command-line profiling tool based on the perf_events kernel interface
      – It uses event-based sampling: when a PMU counter overflows, a sample is recorded

    usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

    The most commonly used perf commands are:
      annotate    Read perf.data (created by perf record) and display annotated code
      archive     Create archive with object files with build-ids found in perf.data file
      data        Data file related processing
      diff        Read perf.data files and display the differential profile
      evlist      List the event names in a perf.data file
      kmem        Tool to trace/measure kernel memory properties
      list        List all symbolic event types
      lock        Analyze lock events
      mem         Profile memory accesses
      record      Run a command and record its profile into perf.data
      report      Read perf.data (created by perf record) and display the profile
      sched       Tool to trace/measure scheduler properties (latencies)
      script      Read perf.data (created by perf record) and display trace output
      stat        Run a command and gather performance counter statistics
      timechart   Tool to visualize total system behavior during a workload
      top         System profiling tool.
      trace       strace inspired tool
      probe       Define new dynamic tracepoints
  • 38. Use perf_events for CPU profiling
    • Flame Graphs visualize profiled code
      $ git clone --depth 1 https://github.com/brendangregg/FlameGraph
      $ sudo perf record -F 99 -a -g -- sleep 30
      $ perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > perf.svg
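The middle stage of that pipeline folds identical call stacks into single lines of the form "command;root;...;leaf count", which is the input flamegraph.pl draws from. The sketch below illustrates only the folding idea in Python and is not a replacement for stackcollapse-perf.pl: it assumes the samples have already been parsed into (command, frames) pairs with the leaf frame first, and the function name is hypothetical.

```python
from collections import Counter


def fold_stacks(samples):
    """Fold (command, leaf-first frames) samples into 'cmd;root;...;leaf N' lines."""
    counts = Counter()
    for command, frames in samples:
        # Folded stacks are written root first, so reverse the leaf-first frames.
        stack = ";".join([command] + list(reversed(frames)))
        counts[stack] += 1
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]
```

Each distinct stack becomes one line whose count determines the width of the corresponding tower in the flame graph.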
  • 39. Example of perf report
    pi@raspberrypi:~/sunplus $ sudo perf record -g -a sleep 10
    pi@raspberrypi:~/sunplus $ sudo perf report
    Samples: 5K of event 'cycles:ppp', Event count (approx.): 184814613
      Children      Self  Command  Shared Object      Symbol
    +   83.86%     1.97%  swapper  [kernel.kallsyms]  [k] cpu_startup_entry
    +   70.22%     0.00%  swapper  [kernel.kallsyms]  [k] secondary_start_kernel
    +   70.22%     0.00%  swapper  [unknown]          [k] 0x000095ac
    +   67.09%     0.45%  swapper  [kernel.kallsyms]  [k] default_idle_call
    +   66.11%    61.30%  swapper  [kernel.kallsyms]  [k] arch_cpu_idle
    ...

    pi@raspberrypi:~/sunplus $ sudo perf kmem record
    ^C[ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.199 MB perf.data (814 samples) ]
    pi@raspberrypi:~/sunplus $ sudo perf kmem stat --caller
    Failed to read max nodes, using default of 8
    ------------------------------------------------------------------------------------------
     Callsite                       | Total_alloc/Per | Total_req/Per | Hit | Ping-pong | Frag
    ------------------------------------------------------------------------------------------
     kthread_create_on_node+5c      |           64/64 |         28/28 |   1 |         0 | 56.250%
     bcm2835_dma_create_cb_chain+54 |         832/277 |       560/186 |   3 |         3 | 32.692%
     alloc_worker+30                |         128/128 |         88/88 |   1 |         0 | 31.250%
     alloc_skb_with_frags+58        |         512/512 |       384/384 |   1 |         0 | 25.000%
    ...

    SUMMARY (SLAB allocator)
    ========================
    Total bytes requested: 419,528
    Total bytes allocated: 420,216
    Total bytes wasted on internal fragmentation: 688
    Internal fragmentation: 0.163725%
    Cross CPU allocations: 0/326
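The Frag column and the summary line in the perf kmem output follow from the allocated and requested byte counts. The formula below is inferred from the numbers on the slide rather than taken from the perf source, so treat it as an assumption; the function name is hypothetical.

```python
def fragmentation_percent(allocated, requested):
    """Share of allocated bytes that the caller did not ask for, in percent."""
    return (allocated - requested) / allocated * 100.0


# Figures from the slide: (bytes allocated, bytes requested).
summary = fragmentation_percent(420216, 419528)     # 688 wasted bytes in total
kthread = fragmentation_percent(64, 28)             # kthread_create_on_node+5c
dma_chain = fragmentation_percent(832, 560)         # bcm2835_dma_create_cb_chain+54
```

With these inputs the results agree with the 0.163725%, 56.250% and 32.692% printed by perf kmem stat.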
  • 40. ftrace
    • Useful for event tracing and for analyzing latencies and performance issues
    • The proc sysctl ftrace_enabled is a global on/off switch. It is enabled by default
      – To disable: echo 0 > /proc/sys/kernel/ftrace_enabled
    • Summary of /sys/kernel/debug/tracing

    Filename             Description
    current_tracer       Set or display the tracer that is currently configured
    available_tracers    Tracers listed here can be configured by echoing their name into current_tracer
    tracing_on           Enables or disables writing to the ring buffer (tracing overhead may still be occurring)
    trace                Output of the trace in a human-readable format
    tracing_max_latency  Some tracers record the maximum latency, for example the longest time interrupts were disabled
    tracing_thresh       Latency tracers record a trace whenever the latency is greater than the number (in microseconds) in this file
    set_ftrace_pid       Have the function tracer trace only a single thread
    set_graph_function   Set a "trigger" function where tracing should start with the function graph tracer
    stack_trace          The stack backtrace of the largest stack encountered while the stack tracer is active
    trace_marker         Very useful for synchronizing user space with events happening in the kernel. Strings written to this file are written into the ftrace buffer
  • 41. List of tracers

    Name of tracer   Description
    function         Function call tracer that traces all kernel functions
    function_graph   Traces both entry and exit of functions, which makes it possible to draw a graph of function calls that looks like C source code
    irqsoff          Traces the areas that disable interrupts and saves the trace with the longest max latency. See tracing_max_latency
    preemptoff       Traces and records the amount of time for which preemption is disabled
    preemptirqsoff   Traces and records the largest time for which irqs and/or preemption is disabled
    wakeup           Traces and records the max latency for the highest-priority task to get scheduled after it has been woken up
    wakeup_rt        Traces and records the max latency for RT tasks only
    nop              To remove all tracers from tracing, simply echo "nop" into current_tracer
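Which tracers are actually present depends on how the kernel was built, so it helps to consult available_tracers before writing to current_tracer. The sketch below does that; the function name is hypothetical, the default directory is the one named on the ftrace slide, and root access with debugfs mounted is assumed.

```python
import os

TRACING_DIR = "/sys/kernel/debug/tracing"


def set_tracer(name, tracing_dir=TRACING_DIR):
    """Select a tracer after checking that the running kernel offers it."""
    with open(os.path.join(tracing_dir, "available_tracers")) as f:
        available = f.read().split()
    if name not in available:
        raise ValueError(f"tracer {name!r} not available; choose from {available}")
    # Equivalent to: echo <name> > current_tracer
    with open(os.path.join(tracing_dir, "current_tracer"), "w") as f:
        f.write(name)
```

For example, `set_tracer("nop")` removes the current tracer, while asking for a latency tracer that was not compiled in raises an error instead of failing silently.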
  • 42. Example of function tracer
    # echo SyS_nanosleep hrtimer_interrupt > set_ftrace_filter
    # echo function > current_tracer
    # echo 1 > tracing_on
    # usleep 1
    # echo 0 > tracing_on
    # cat trace
    # tracer: function
    #
    # entries-in-buffer/entries-written: 5/5   #P:4
    #
    #                              _-----=> irqs-off
    #                             / _----=> need-resched
    #                            | / _---=> hardirq/softirq
    #                            || / _--=> preempt-depth
    #                            ||| /     delay
    #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
    #              | |       |   ||||       |         |
              usleep-2665  [001] ....  4186.475355: sys_nanosleep <-system_call_fastpath
              <idle>-0     [001] d.h1  4186.475409: hrtimer_interrupt <-smp_apic_timer_interrupt
              usleep-2665  [001] d.h1  4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
              <idle>-0     [003] d.h1  4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
              <idle>-0     [002] d.h1  4186.475427: hrtimer_interrupt <-smp_apic_timer_interrupt

    Note: the function tracer stores entries in ring buffers, so the newest data may overwrite the oldest. Stopping the trace with echo is sometimes too slow, because tracing may already have overwritten the data you wanted to record. For this reason, it is sometimes better to disable tracing directly from a program.
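Disabling tracing from inside the program under test keeps the window between the interesting event and the stop as short as possible. A minimal sketch of that idea follows; the class name is hypothetical, and it assumes the tracing directory from the ftrace slide plus root access. The files are opened up front so that stopping costs only a single write.

```python
import os


class TraceSwitch:
    """Keeps tracing_on and trace_marker open so a trace can be stopped quickly."""

    def __init__(self, tracing_dir="/sys/kernel/debug/tracing"):
        self._on = open(os.path.join(tracing_dir, "tracing_on"), "w")
        self._marker = open(os.path.join(tracing_dir, "trace_marker"), "w")

    def mark(self, text):
        """Write a line into the trace buffer to correlate with user-space events."""
        self._marker.write(text + "\n")
        self._marker.flush()

    def stop(self):
        """Stop writing to the ring buffer, like `echo 0 > tracing_on`."""
        self._on.write("0")
        self._on.flush()

    def close(self):
        self._on.close()
        self._marker.close()
```

A program would create the switch at startup, call `mark("before critical section")` at points of interest, and call `stop()` as soon as the condition being hunted is detected, then read the trace file at leisure.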
  • 43. Example of function-graph tracer
    • This tracer can also measure the execution time of a function
    • To trace only one function and all of its children:
      # echo __do_fault > set_graph_function
      # echo function_graph > current_tracer
      # echo 1 > tracing_on
      # usleep 1
      # echo 0 > tracing_on
      # cat trace
      #
      # tracer: function_graph
      #
      # CPU  DURATION                  FUNCTION CALLS
      # |     |   |                     |   |   |   |
       0)               |  __do_fault() {
       0)               |    filemap_fault() {
       0)   0.408 us    |      find_get_page();
       0)   0.085 us    |      _cond_resched();
       0)   2.462 us    |    }
       0)   0.087 us    |    _raw_spin_lock();
       0)   0.104 us    |    add_mm_counter_fast();
       0)   0.106 us    |    page_add_file_rmap();
       0)   0.090 us    |    _raw_spin_unlock();
       0)               |    unlock_page() {
       0)   0.103 us    |      page_waitqueue();
       0)   0.146 us    |      __wake_up_bit();
       0)   1.508 us    |    }
       0)   8.403 us    |  }
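Because the DURATION column has a regular layout, the output can be post-processed to find where the time goes. The sketch below extracts leaf calls (lines ending in "();") and picks the slowest one; the function names are hypothetical, and the pattern assumes the plain layout shown on the slide, without the extra marker characters that real traces may place before long durations.

```python
import re

# Matches leaf calls such as " 0)   0.408 us    |      find_get_page();"
_LEAF = re.compile(r"^\s*(\d+)\)\s+([\d.]+)\s+us\s+\|\s+(\w+)\(\);")


def leaf_durations(trace_text):
    """Return (function, microseconds) for every leaf call in the trace."""
    result = []
    for line in trace_text.splitlines():
        m = _LEAF.match(line)
        if m:
            result.append((m.group(3), float(m.group(2))))
    return result


def slowest_leaf(trace_text):
    """Return the (function, microseconds) pair with the longest duration."""
    return max(leaf_durations(trace_text), key=lambda item: item[1])
```

Applied to the trace on the slide, the slowest leaf is find_get_page at 0.408 us; closing-brace lines such as the 8.403 us total for __do_fault are deliberately ignored because they include their children.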
  • 44. Example of irqsoff tracer
    # echo 0 > options/function-trace
    # echo irqsoff > current_tracer
    # echo 1 > tracing_on
    # echo 0 > tracing_max_latency
    # ls -ltr
    [...]
    # echo 0 > tracing_on
    # cat trace
    # tracer: irqsoff
    #
    # irqsoff latency trace v1.1.5 on 3.8.0-test+
    # --------------------------------------------------------------------
    # latency: 16 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
    #    -----------------
    #    | task: swapper/0-0 (uid:0 nice:0 policy:0 rt_prio:0)
    #    -----------------
    #  => started at: run_timer_softirq
    #  => ended at:   run_timer_softirq
    #
    #
    #                  _------=> CPU#
    #                 / _-----=> irqs-off
    #                | / _----=> need-resched
    #                || / _---=> hardirq/softirq
    #                ||| / _--=> preempt-depth
    #                |||| /     delay
    #  cmd     pid   ||||| time  |   caller
    #     \   /      |||||  \    |   /
      <idle>-0       0d.s2    0us+: _raw_spin_lock_irq <-run_timer_softirq
      <idle>-0       0dNs3   17us : _raw_spin_unlock_irq <-run_timer_softirq
      <idle>-0       0dNs3   17us+: trace_hardirqs_on <-run_timer_softirq
      <idle>-0       0dNs3   25us : <stack trace>
     => _raw_spin_unlock_irq
     => run_timer_softirq
     => __do_softirq
    ...

    Note: the example above was taken with function-trace not set. With function-trace set, the output is much larger.
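The four flag characters in front of the timestamp (for example d.s2 or dNs3, after the leading CPU number) follow the legend in the trace header. The decoder below is a small sketch of that legend with a hypothetical function name; it handles only the characters that appear on these slides, and reading the preempt depth as a hexadecimal digit is an assumption.

```python
def decode_flags(flags):
    """Decode a 4-character ftrace latency field such as 'd.h1' or 'dNs3'."""
    if len(flags) != 4:
        raise ValueError("expected exactly four flag characters")
    irqs, resched, context, depth = flags
    contexts = {
        "h": "hardirq",
        "s": "softirq",
        "H": "hardirq inside softirq",
        ".": "normal",
    }
    return {
        "irqs_off": irqs == "d",            # 'd' means interrupts are disabled
        "need_resched": resched == "N",     # 'N' means need_resched is set
        "context": contexts.get(context, "unknown"),
        # '.' stands for a preempt depth of zero; digits are read as hex (assumed).
        "preempt_depth": 0 if depth == "." else int(depth, 16),
    }
```

For the second line of the irqsoff trace, `decode_flags("dNs3")` reports interrupts off, a pending reschedule, softirq context and a preempt depth of 3.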
  • 45. Example of stack tracer
    • ftrace makes it convenient to check the stack size at every function call
      # echo 1 > /proc/sys/kernel/stack_tracer_enabled
    • After running it for a few minutes, the output looks like:
      # cat stack_max_size
      2928
      # cat stack_trace
              Depth    Size   Location    (18 entries)
              -----    ----   --------
        0)     2928     224   update_sd_lb_stats+0xbc/0x4ac
        1)     2704     160   find_busiest_group+0x31/0x1f1
        2)     2544     256   load_balance+0xd9/0x662
        3)     2288      80   idle_balance+0xbb/0x130
        4)     2208     128   __schedule+0x26e/0x5b9
        5)     2080      16   schedule+0x64/0x66
        6)     2064     128   schedule_timeout+0x34/0xe0
        7)     1936     112   wait_for_common+0x97/0xf1
        8)     1824      16   wait_for_completion+0x1d/0x1f
        9)     1808     128   flush_work+0xfe/0x119
       10)     1680      16   tty_flush_to_ldisc+0x1e/0x20
       11)     1664      48   input_available_p+0x1d/0x5c
       12)     1616      48   n_tty_poll+0x6d/0x134
       13)     1568      64   tty_poll+0x64/0x7f
       14)     1504     880   do_select+0x31e/0x511
       15)      624     400   core_sys_select+0x177/0x216
       16)      224      96   sys_select+0x91/0xb9
       17)      128     128   system_call_fastpath+0x16/0x1b
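In the stack_trace listing, Depth is the stack usage from that frame down to the outermost caller, so each Depth minus its Size equals the Depth of the next line (2928 - 224 = 2704). The sketch below parses the listing, checks that relation and ranks frames by size; the function names are hypothetical, and the parser assumes the column layout shown on the slide.

```python
import re

# Matches rows such as " 14)     1504     880   do_select+0x31e/0x511"
_ROW = re.compile(r"^\s*(\d+)\)\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_stack_trace(text):
    """Return a list of (depth, size, location) tuples in listing order."""
    rows = []
    for line in text.splitlines():
        m = _ROW.match(line)
        if m:
            rows.append((int(m.group(2)), int(m.group(3)), m.group(4)))
    return rows


def is_consistent(rows):
    """Check that depth minus size equals the depth of the next (outer) frame."""
    return all(rows[i][0] - rows[i][1] == rows[i + 1][0]
               for i in range(len(rows) - 1))


def largest_frames(rows, n=3):
    """Return the n frames that consume the most stack, largest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]
```

On the listing above this singles out do_select (880 bytes) and core_sys_select (400 bytes) as the biggest stack consumers.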
  • 46. ftrace homework
    • Read https://www.kernel.org/doc/Documentation/trace/events.txt
      – This document is about event tracing (static tracepoints)
    • perf-tools is a collection of performance analysis tools for Linux ftrace and perf_events. Try to find a good use for it in your work. You can download it from https://github.com/brendangregg/perf-tools.git
    • Write a small program that uses ftrace to track the number of context switches per second on each CPU
      $ sudo ./ftrace_ctxt_switches.py
      ...
      Duration (sec): 61.386, Context switches (per sec):
      CPU0: 1130 ( 18) CPU1: 5875 ( 96) CPU2: 183 ( 3) CPU3: 230 ( 4)
      Duration (sec): 63.784, Context switches (per sec):
      CPU0: 1138 ( 18) CPU1: 6028 ( 95) CPU2: 188 ( 3) CPU3: 230 ( 4)
      ...
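One possible starting point for the last exercise is the counting step alone. The sketch below is not the ftrace_ctxt_switches.py from the slide: it assumes the sched_switch event has been enabled (events/sched/sched_switch/enable in the tracing directory), that lines read from the trace carry the CPU as "[001]" as in the function tracer example, and the function names are hypothetical.

```python
import re
from collections import Counter

# CPU number in brackets, followed later on the line by the event name.
_SWITCH = re.compile(r"\[(\d+)\].*?\bsched_switch:")


def count_switches(lines):
    """Count sched_switch events per CPU from ftrace output lines."""
    counts = Counter()
    for line in lines:
        m = _SWITCH.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts


def per_second(counts, duration):
    """Convert per-CPU totals into rounded per-second rates."""
    return {cpu: round(n / duration) for cpu, n in sorted(counts.items())}
```

Feeding the totals from the first sample on the slide, `per_second({0: 1130, 1: 5875, 2: 183, 3: 230}, 61.386)`, gives 18, 96, 3 and 4, which matches the figures in parentheses there.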
  • 47. References
    • Linux Device Drivers, 3rd Edition, Jonathan Corbet
    • Linux kernel source, http://lxr.free-electrons.com
    • Choose a Linux tracer, Brendan Gregg
      – http://www.brendangregg.com/blog/2015-07-08/choosing-a-linux-tracer.html
    • KDB and KGDB kernel documentation
      – http://kernel.org/pub/linux/kernel/people/jwessel/kdb/