Linux Kernel - BPF / XDP
KossLab 유태희, 송태웅
What is BPF?
1. Berkeley Packet Filter, since 1992
2. Kernel infrastructure
- Interpreter: in-kernel virtual machine
- Hook points: in-kernel callback points
- Map
- Helper
What is BPF?
"Safe dynamic programs and tools"
"A technique for safely injecting code into the kernel at runtime"
BPF Infrastructure:
A strategy for safe code injection
1) Use BPF instructions instead of native machine code
2) Have the verifier check for hazards before the program runs
3) Call (existing) kernel functions only through helper functions
BPF Infrastructure:
Building blocks for safe code injection
Kernel += BPF interpreter (in-kernel virtual machine)
+ Verifier
+ BPF helper functions (leveraging kernel functions)
+ BPF syscall (prog/map: loading, attaching, etc.)
BPF Instruction set:
1) A "junior" x86 instruction set ("simplified x86")
(Note: PLUMgrid's x86 bytecode verifier attempt failed)
2) BPF = classic BPF:10% + x86:70% + arm64:25% + risc:5%
3) Fixed-size instruction encoding
(for high interpreter speed)
4) Simplified, so hazards are easier to predict and prevent
(the verifier checks loops, memory access ranges, etc.)
5) Architecture-independent
BPF Instruction set:
immediate:32 offset:16 src:4 dst:4 opcode:8
$ cat include/uapi/linux/bpf.h
[...]
struct bpf_insn {
__u8 code; /* opcode */
__u8 dst_reg:4; /* dest register */
__u8 src_reg:4; /* source register */
__s16 off; /* signed offset */
__s32 imm; /* signed immediate constant */
};
[...]
BPF Instruction set:
immediate:32 offset:16 src:4 dst:4 opcode:8
opcode:8 = class (low 3 bits) + class-specific fields (high 5 bits)
LD/ST fields: mode:3 + size:2
ALU/JMP fields: operation:4 + source:1
eBPF: include/uapi/linux/bpf.h
cBPF: include/uapi/linux/bpf_common.h
LD/ST classes:
0x00 ~ 0x03
ALU/JMP classes:
0x04 ~ 0x07
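The encoding above can be sketched in plain C. This is a minimal userspace sketch, not kernel code: `struct insn`, `insn_class()`, `is_load_store()`, and `mov64_reg()` are hypothetical names introduced here, and the struct only mirrors the `struct bpf_insn` fields quoted earlier so that it builds without kernel headers. The opcode parts 0x07, 0xb0, and 0x08 are the values of BPF_ALU64, BPF_MOV, and BPF_X; the class is taken from the low three bits of the opcode.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of struct bpf_insn (assumption: same field order and widths). */
struct insn {
	uint8_t code;        /* opcode */
	uint8_t dst_reg : 4; /* dest register */
	uint8_t src_reg : 4; /* source register */
	int16_t off;         /* signed offset */
	int32_t imm;         /* signed immediate constant */
};

/* Every instruction occupies the same 64 bits. */
static_assert(sizeof(struct insn) == 8, "fixed-size encoding");

/* Instruction classes: the low 3 bits of the opcode. */
enum insn_class {
	CLS_LD = 0x00, CLS_LDX = 0x01, CLS_ST = 0x02, CLS_STX = 0x03,
	CLS_ALU = 0x04, CLS_JMP = 0x05, CLS_ALU64 = 0x07,
};

static inline uint8_t insn_class(uint8_t code)
{
	return code & 0x07;
}

/* Classes 0x00..0x03 form the load/store family. */
static inline int is_load_store(uint8_t code)
{
	return insn_class(code) <= CLS_STX;
}

/* Hypothetical builder for "dst = src" as a 64-bit register move. */
static inline struct insn mov64_reg(uint8_t dst, uint8_t src)
{
	struct insn i = {
		.code = 0x07 | 0xb0 | 0x08, /* ALU64 | MOV | X (register source) */
		.dst_reg = dst,
		.src_reg = src,
		.off = 0,
		.imm = 0,
	};
	return i;
}
```

Under these assumptions, `mov64_reg(6, 1)` produces the same opcode byte (0xbf) that `BPF_MOV64_REG(BPF_REG_6, BPF_REG_1)` uses, and `insn_class()` reports it as an ALU64 instruction.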
BPF Instruction set:
struct bpf_insn prog[] = {
BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol) /* R0 = ip->proto */),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 */
BPF_LD_MAP_FD(BPF_REG_1, map_fd),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
BPF_EXIT_INSN(),
};
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/samples/bpf/sock_example.c
BPF helper functions:
$ grep BPF_CALL
kernel/bpf/helpers.c:
BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key)
BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key,
[...]
kernel/trace/bpf_trace.c:
BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
BPF_CALL_3(bpf_probe_write_user, void *, unsafe_ptr, const void *, src,
BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
[...]
net/core/filter.c:
BPF_CALL_1(bpf_skb_get_pay_offset, struct sk_buff *, skb)
BPF_CALL_3(bpf_skb_get_nlattr, struct sk_buff *, skb, u32, a, u32, x)
[...]
BPF as a kernel subproject
“Safe dynamic programs and tools”
$ cat MAINTAINERS | grep -A 3 BPF
BPF (Safe dynamic programs and tools)
M: Alexei Starovoitov <ast@kernel.org>
M: Daniel Borkmann <daniel@iogearbox.net>
L: netdev@vger.kernel.org
[...]
$ cat MAINTAINERS | grep -A 27 BPF
BPF (Safe dynamic programs and tools)
[...]
F: arch/x86/net/bpf_jit*
[...]
F: kernel/bpf/
F: kernel/trace/bpf_trace.c
[...]
F: net/core/filter.c
F: net/sched/act_bpf.c
F: net/sched/cls_bpf.c
[...]
[...]
F: samples/bpf/
F: tools/bpf/
F: tools/lib/bpf/
F: tools/testing/selftests/bpf/
What these paths cover, in order:
JIT-supported archs: x86, arm, arm64, sparc, s390, powerpc, mips
BPF core: syscall, interpreter, verifier, generic helpers, maps, ...
Hook points, specific helpers, ...; cBPF support, ...
BPF loading (lib), bpftool, test code, samples, ...
BPF Infrastructure:
Support for putting BPF programs to use
1) Hook points: in-kernel callback points
2) Map: user-to-kernel shared memory
3) Calling kernel functions through helpers (leveraging)
4) Object pinning: /sys/fs/bpf/...
BPF Architecture:
[Figure: user-space BPF controllers (user apps using the BPF library libbpf for prog/map load, attach, control) and ip / tc (using the BPF library in iproute2) reach kernel space through the bpf() syscall. In kernel space, BPF programs call kernel func()s through helpers and share Map 1, Map 2, ... (shared memory) with user space.]
XDP
Is iptables fast enough?
Why is iptables slow?
Have you ever tuned your iptables rules?
XDP
(eXpress Data Path)
XDP == FAST PATH
NORMAL PATH
[Figure: packet path through the normal stack. Stages shown: RX, DD (device driver), TC ingress, L3 input, PREROUTING, ROUTING, INPUT, FORWARD, TCP/UDP (L4), APP (L7), OUTPUT, POSTROUTING, L3 output, TC egress, TX.]
XDP FAST PATH
[Figure: BPF attached at the DD layer, below L3, L4, L7, and the APP; REDIRECT takes a packet from RX to TX.]
BPF
Tutorial
Prerequisites
1. One machine for compiling
2. One machine for testing (x86 recommended)
3. Kernel source code
4. clang + llvm (compiler)
5. bpftool (BPF program loader)
6. An iproute2 package with BPF support
clang + llvm
Compiler
The bpf tree on git.kernel.org
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
Kernel source code
bpftool
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/tools/bpf/bpftool
BPF program loader
iproute2
https://git.kernel.org/pub/scm/linux/kernel/git/dborkman/iproute2.git
XDP configuration tool
Kernel source code and BPF sample code
samples/bpf
Examples
Analyzing the sample code in the kernel source
samples/bpf
Example (xdp_rxq_info_kern.c)
Hands-on: compiling a BPF program
samples/bpf
Compile
$ mount bpffs /sys/fs/bpf -t bpf
$ bpftool prog load ./xdp_rxq_info_kern.o /sys/fs/bpf/xdp
Load the program
$ ls /sys/fs/bpf/
$ ./bpftool prog list
$ ./bpftool prog dump xlated id X
$ ./bpftool prog dump jited id X
Inspect the program
$ ip link set dev lo xdp pin /sys/fs/bpf/xdp
Attach the XDP program
$ ip link show dev lo
Verify the XDP attachment
$ ip link set dev lo xdp off
$ rm /sys/fs/bpf/xdp
Detach the XDP program
iptables vs XDP
TEST NETWORK
PC2 (192.168.4.2) sends ICMP ($ ping) to PC1 (192.168.4.1)
Dropping packets with iptables
DROP
#PC2
$ ping 192.168.4.1
#PC1
$ iptables -A INPUT -s 192.168.4.2 -d 192.168.4.1 -p icmp -j DROP
NORMAL PATH
[Figure: the normal path diagram (RX, DD, TC ingress, L3 input, PREROUTING, ROUTING, INPUT, FORWARD, TCP/UDP, APP, OUTPUT, POSTROUTING, L3 output, TC egress, TX), with the iptables DROP marked on it.]
Dropping packets with XDP
DROP
$ ./bpftool prog load ./xdp_icmp.o /sys/fs/bpf/xdp_icmp
$ ip link set dev lo xdp pin /sys/fs/bpf/xdp_icmp
Load and attach the XDP program
XDP GENERIC PATH
[Figure: the same stack diagram with a BPF stage placed between RX/DD and TC ingress, and the DROP marked at that BPF stage.]
BPF Tracing
iptables path vs XDP path
BPF Tracing:
iptables - DROP case
netif_receive_skb_internal() -> ipt_do_table() -> DROP
Long time !!
BPF Tracing:
XDP - DROP case
netif_receive_skb_internal() -> do_xdp_generic() -> DROP
Short time !!
BPF Tracing:
iptables vs XDP - DROP case
net/core/dev.c:
static int netif_receive_skb_internal(struct sk_buff *skb)
net/core/dev.c:
int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
net/ipv4/netfilter/ip_tables.c:
unsigned int ipt_do_table(struct sk_buff *skb, ...)
Beginning point (BPF ATTACH !!): entry of netif_receive_skb_internal()
Return points (BPF ATTACH !!): return of do_xdp_generic() and return of ipt_do_table()
BPF Tracing:
iptables vs XDP - DROP case
SEC("kprobe/netif_receive_skb_internal")
int bpf_trace_receive_skb(struct pt_regs *ctx)
{
    /* Record the start time, keyed by the skb pointer (first argument). */
    long skb_ptr = PT_REGS_PARM1(ctx);
    u64 start_time = bpf_ktime_get_ns();
    bpf_map_update_elem(&tracing_map, &skb_ptr, &start_time, BPF_ANY);
    return 0;
}
SEC("kretprobe/do_xdp_generic")
int bpf_trace_xdp_drop(struct pt_regs *ctx)
{
    /* The skb is the second argument of do_xdp_generic().
     * Caveat: argument registers may already be clobbered when a
     * kretprobe fires, so this read is not guaranteed to be reliable. */
    long skb_ptr = PT_REGS_PARM2(ctx);
    int action = PT_REGS_RC(ctx);
    if (action == XDP_DROP) {
        u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr);
        if (!time)
            return 0; /* no start time recorded for this skb */
        u64 cur_time = bpf_ktime_get_ns();
        u64 delta = cur_time - *time;
        *time = delta; /* overwrite the start time with the elapsed time */
        ...
SEC("kretprobe/ipt_do_table")
int bpf_trace_iptables_drop(struct pt_regs *ctx)
{
    /* The skb is the first argument of ipt_do_table(); the same
     * kretprobe caveat about argument registers applies here. */
    long skb_ptr = PT_REGS_PARM1(ctx);
    int action = PT_REGS_RC(ctx);
    if (action == NF_DROP) {
        u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr);
        if (!time)
            return 0; /* no start time recorded for this skb */
        u64 cur_time = bpf_ktime_get_ns();
        u64 delta = cur_time - *time;
        *time = delta; /* overwrite the start time with the elapsed time */
        ...
Ftrace Tracing
iptables path VS XDP path
$ cat /sys/kernel/debug/tracing/trace
netif_receive_skb_internal() {
  ktime_get_with_offset();
  __netif_receive_skb() {
    __netif_receive_skb_core() {
      ip_rcv() {
        pskb_trim_rcsum_slow();
        nf_hook_slow() {
          iptable_mangle_hook() {
            ipt_do_table() {
              __local_bh_enable_ip();
            }
          }
        }
        ip_rcv_finish() {
          udp_v4_early_demux();
          ip_route_input_noref() {
            ip_route_input_rcu() {
              ip_route_input_slow() {
                fib_table_lookup();
                fib_validate_source() {
                  __fib_validate_source() {
                    fib_table_lookup();
                  }
                }
              }
            }
          }
          ip_local_deliver() {
            nf_hook_slow() {
              iptable_mangle_hook() {
                ipt_do_table() {
                  __local_bh_enable_ip();
                }
              }
              iptable_filter_hook() {
                ipt_do_table() {
                  udp_mt();
                  __local_bh_enable_ip();
                }
              }
              kfree_skb()
DROP (iptables)
$ cat /sys/kernel/debug/tracing/trace
netif_receive_skb_internal() {
  ktime_get_with_offset();
  do_xdp_generic() {
    pskb_expand_head() {
      __kmalloc_reserve.isra.48() {
        __kmalloc_node_track_caller() {
          kmalloc_slab();
          should_failslab();
        }
      }
      ksize();
      skb_free_head() {
        page_frag_free();
      }
      skb_headers_offset_update();
    }
    __bpf_prog_run32() {
      ___bpf_prog_run();
    }
    kfree_skb()
DROP (XDP)
YOU WIN !!
“XDP is LOVE”
BPF internals
BPF Infrastructure:
1) Hook points: in-kernel callback points
2) LOAD, ATTACH, CALLBACK
3) Verifier / Interpreter / JIT
4) Map: user-to-kernel shared memory
5) Calling kernel functions through helpers (leveraging)
6) Object pinning: /sys/fs/bpf/...
...
Hook points: callback points
KERNEL SPACE
XDP: at the L2 device driver
tc: L3, just before / just after the device driver (DD)
kprobe: function entry / return
. . .
Inside specific kernel functions:
if (has_bpf_prog)
BPF_PROG_RUN();
->bpf_func(ctx, insni);
bpf_func is either the BPF interpreter or JIT-compiled machine code
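The hook pattern above (check whether a program is attached, then call through `bpf_func`) can be sketched in userspace C. This is only an illustration under stated assumptions: `struct prog`, `run_hook()`, `always_drop()`, and the two verdict constants are hypothetical names, and the real kernel structures carry far more state. The point it shows is that the hook does not care whether the function pointer leads to the interpreter or to JIT-compiled code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct insn { uint8_t code; }; /* placeholder instruction type for the sketch */

/* Hypothetical, simplified stand-in for a loaded BPF program. */
struct prog {
	/* Chosen at load time: interpreter entry point or JIT-compiled code. */
	unsigned int (*bpf_func)(const void *ctx, const struct insn *insni);
	const struct insn *insnsi; /* instructions handed to bpf_func */
};

enum { VERDICT_DROP = 1, VERDICT_PASS = 2 }; /* illustrative verdicts */

/* Stand-in for "a program that was loaded": it always answers DROP. */
static unsigned int always_drop(const void *ctx, const struct insn *insni)
{
	(void)ctx;
	(void)insni;
	return VERDICT_DROP;
}

/* The hook point: run the attached program if there is one. */
static unsigned int run_hook(const struct prog *attached, const void *ctx)
{
	if (!attached)
		return VERDICT_PASS; /* nothing attached: default behaviour */
	assert(attached->bpf_func != NULL);
	return attached->bpf_func(ctx, attached->insnsi);
}
```

With no program attached, `run_hook(NULL, ctx)` falls through to the default; once a `struct prog` whose `bpf_func` points at `always_drop` is passed in, the same hook returns that program's verdict.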
BPF prog injection !!
HOW ?
[Figure: a C source file (_kern.c) is compiled with clang / llc into a BPF program or BPF bytecode (a BPF ELF object). In user space, tc / ip (BPF library in iproute2) or a BPF controller (user app with the BPF library libbpf: prog/map load, attach, control) passes it to KERNEL SPACE through the bpf() syscall (BPF_PROG_LOAD). Map 1 (shared memory) lives in kernel space.]
User-space loader steps:
1. ELF parsing
2. First relocation:
1) map fd
2) bpf-to-bpf call
3. Loading
BPF prog injection !!
HOW ? in bpf()
BPF LOAD steps:
1. BPF prog / map alloc
2. Verifier (loops, memory access ranges)
3. Second relocation:
1) map fd → map ptr
2) helper ID → func addr
4. Select runtime:
1) BPF interpreter func addr
2) BPF func addr after JIT
return fd;
The hook point then runs the loaded program:
if (has_bpf_prog)
BPF_PROG_RUN();
->bpf_func(ctx, insni);
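The "helper ID → func addr" step of the load process can be sketched as a table lookup that runs once, before the program is ever executed. This is a userspace illustration under loud assumptions: the helper IDs, `helper_table`, `struct call_insn`, and `resolve_helpers()` are all hypothetical and are not the kernel's real data structures. The five `uint64_t` arguments reflect the BPF calling convention of passing up to five arguments in R1 to R5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* BPF helpers take up to five arguments (R1..R5) and return one value (R0). */
typedef uint64_t (*helper_fn)(uint64_t, uint64_t, uint64_t, uint64_t, uint64_t);

/* Hypothetical helper IDs, invented for this sketch. */
enum { HELPER_UNSPEC, HELPER_ADD, HELPER_KTIME, HELPER_MAX };

static uint64_t helper_add(uint64_t a, uint64_t b, uint64_t c, uint64_t d, uint64_t e)
{
	(void)c; (void)d; (void)e;
	return a + b;
}

static uint64_t helper_ktime(uint64_t a, uint64_t b, uint64_t c, uint64_t d, uint64_t e)
{
	(void)a; (void)b; (void)c; (void)d; (void)e;
	return 42; /* fixed value; a real helper would read a clock */
}

static const helper_fn helper_table[HELPER_MAX] = {
	[HELPER_ADD] = helper_add,
	[HELPER_KTIME] = helper_ktime,
};

/* A call instruction before (helper_id) and after (target) relocation. */
struct call_insn {
	int32_t helper_id;
	helper_fn target;
};

/* Resolve every call; an unknown ID rejects the whole program (-1). */
static int resolve_helpers(struct call_insn *calls, size_t n)
{
	assert(calls != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int32_t id = calls[i].helper_id;
		if (id <= HELPER_UNSPEC || id >= HELPER_MAX || !helper_table[id])
			return -1;
		calls[i].target = helper_table[id];
	}
	return 0;
}
```

Resolving at load time means the running program never looks up an ID: each call site already holds an address, and a program that names a helper the table does not offer is refused before it can run.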
Various ways to ATTACH a BPF program:
- socket(), send() with AF_NETLINK
- bpf() syscall: BPF_PROG_ATTACH, BPF_RAW_TRACEPOINT_OPEN
- kprobe event id, ioctl() with PERF_EVENT_IOC_SET_BPF
...
BPF CALLBACK !!
[Figure: after attach, the BPF programs in kernel space are invoked as callbacks ("Callback !!") at their hook points.]
Calling kernel functions through BPF helper functions (leveraging) !!
[Figure: BPF programs in kernel space call kernel func()s through helpers.]
Sharing memory between user space and the kernel through BPF maps
[Figure: BPF Controller 1 and BPF Controller 2 (user apps) and the in-kernel BPF programs all access Map 1, Map 2, ... (shared memory).]
BPF Architecture:
[Figure: the overall picture: user-space controllers (libbpf) and ip / tc (BPF library in iproute2), the bpf() syscall, and in-kernel BPF programs, helpers, and maps.]
XDP internals
XDP RETURN TYPE
XDP_ABORTED
XDP_DROP
XDP_PASS
XDP_TX
XDP_REDIRECT
[Figure: a BPF program at the network device driver returning XDP_DROP, XDP_PASS (toward the APP), XDP_TX, or XDP_REDIRECT.]
Generic XDP
vs
Driver XDP
XDP GENERIC PATH
[Figure: the normal stack diagram with a BPF stage between RX/DD and TC ingress. Other stages shown: PREROUTING, ROUTING, INPUT, FORWARD, TCP/UDP (L4), APP (L7), OUTPUT, POSTROUTING, L3 output, TC egress, TX.]
DRIVER XDP PATH
[Figure: BPF runs at L2; PASS continues up through L3, L4, and L7 to the APP, REDIRECT goes from RX to TX.]
Driver XDP vs Generic XDP
[Figure: side-by-side comparison. Both show RX, BPF, PASS, and REDIRECT to TX; the Generic XDP side additionally shows L3.]
XDP data structures and the SKB
[Figure: packet buffer layout: HEADROOM | MAC HEADER | IP HEADER | ... | TAIL / TAILROOM | END. Pointers marked: xdp->data_hard_start, xdp->data_meta, xdp->data, skb->data. The xdp_frame is also marked.]
Allowed DATA ACCESS range
[Figure: the same buffer layout (HEADROOM, MAC HEADER, IP HEADER, TAIL / TAILROOM, END) with xdp->data_hard_start, xdp->data_meta, and xdp->data marked, highlighting the range a program may access.]
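The idea of an allowed access range can be illustrated with the bounds-check pattern that packet-parsing code follows: every header read is preceded by a check against the end of the packet data. The sketch below is plain userspace C, not an XDP program; `drop_icmp()` and the `V_*` names are hypothetical, and only their numeric values mirror the XDP return codes listed earlier. A real XDP program usually writes the check as a pointer comparison against `data_end`; the length form used here is equivalent and avoids forming out-of-range pointers in ordinary C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Numeric values mirror XDP_ABORTED .. XDP_REDIRECT. */
enum verdict { V_ABORTED = 0, V_DROP = 1, V_PASS = 2, V_TX = 3, V_REDIRECT = 4 };

#define ETH_HDR_LEN     14     /* Ethernet header without VLAN tags */
#define IPV4_MIN_HDR    20     /* IPv4 header without options */
#define ETHERTYPE_IPV4  0x0800
#define PROTO_ICMP      1

/* Classify one frame, touching only bytes inside [data, data_end). */
static enum verdict drop_icmp(const uint8_t *data, const uint8_t *data_end)
{
	assert(data <= data_end);
	size_t len = (size_t)(data_end - data);

	/* Bounds check before reading the Ethernet header. */
	if (len < ETH_HDR_LEN)
		return V_PASS;
	uint16_t ethertype = (uint16_t)((data[12] << 8) | data[13]);
	if (ethertype != ETHERTYPE_IPV4)
		return V_PASS;

	/* Bounds check again before reading the IPv4 header. */
	if (len - ETH_HDR_LEN < IPV4_MIN_HDR)
		return V_PASS;
	const uint8_t *ip = data + ETH_HDR_LEN;

	/* The protocol field sits at byte 9 of the IPv4 header. */
	return ip[9] == PROTO_ICMP ? V_DROP : V_PASS;
}
```

A truncated frame never reaches the `ip[9]` read, which is the property the verifier insists on for real XDP programs: no access outside the range the checks have established.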
XDP_REDIRECT analysis
[Figure: a BPF program below the APP, with devices eth0, eth1, eth2, eth3; the actions shown are XDP_TX and XDP_REDIRECT, the latter going through a REDIRECT MAP.]
XDP_REDIRECT through bpf_redirect()
About bpf_redirect()
XDP_REDIRECT - bulkTX
[Figure: bulkTX. BPF at RX redirects xdp_frames; the frames are collected through a map and sent out together at TX.]
DEVMAP
[Figure: BPF at RX calls bpf_redirect_map; the redirect info selects a DEVMAP entry (Key -> Value(Device): 0 -> X, 1 -> X, 2 -> X), and xdp_frames are queued toward that device's TX.]
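The DEVMAP lookup and bulk transmit shown above can be modelled in a few lines of userspace C. Everything here is an assumption made for illustration: `devmap`, `struct dev`, `redirect_map()`, `flush()`, and the batch size of 16 are hypothetical and do not reproduce the kernel's implementation. The sketch only shows the two ideas from the slides: a key selects a target device, and frames are queued and handed over in batches rather than one at a time.

```c
#include <assert.h>
#include <stddef.h>

#define DEVMAP_ENTRIES 4
#define BULK_SIZE 16 /* assumed batch size for this sketch */

struct frame { int id; }; /* stand-in for an xdp_frame */

struct dev {
	int ifindex;
	const struct frame *bulk[BULK_SIZE]; /* frames waiting for transmit */
	size_t queued;                       /* frames currently in bulk[] */
	size_t sent;                         /* frames handed to the driver */
	size_t flushes;                      /* number of batch transmits */
};

/* Key -> device; a NULL slot means the key has no target. */
static struct dev *devmap[DEVMAP_ENTRIES];

/* Hand the whole batch to the (simulated) driver in one go. */
static void flush(struct dev *d)
{
	assert(d != NULL);
	if (d->queued == 0)
		return;
	d->sent += d->queued;
	d->flushes++;
	d->queued = 0;
}

/* Queue a frame for the device stored under key; -1 if there is none. */
static int redirect_map(unsigned int key, const struct frame *f)
{
	if (key >= DEVMAP_ENTRIES || !devmap[key])
		return -1;
	struct dev *d = devmap[key];
	if (d->queued == BULK_SIZE)
		flush(d); /* batch is full: transmit before queueing more */
	d->bulk[d->queued++] = f;
	return 0;
}
```

A caller would store a device under a key, call `redirect_map()` per frame, and call `flush()` once at the end of a receive batch so that a partially filled queue is not left waiting.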
CPUMAP
[Figure: BPF at RX calls bpf_redirect_map; the redirect info selects a CPUMAP entry (Key -> Value(CPU): 0 -> X, 1 -> X, 2 -> X), and xdp_frames are queued to that CPU. On the target CPU the frames continue into netif_receive_skb_core instead of TX.]
REDIRECT in GENERIC_XDP
BPFILTER
Additional Topics:
● memory model switching
○ /net/core/xdp.c
● page pool
○ /net/core/page_pool
● offload
● AF_XDP && XSK (XDP SOCKET)
● helper functions
● Device Driver
Additional Topics:
● Verifier
○ CFG, DAG, register, memory check...
● Other types
○ TC, SOCKET FILTER, CGROUP
● BTF
○ ELFutils, clang -g, llc -mattr=dwarfris
● Tail call
○ related to bpf_prog_array
Additional Topics:
● FACEBOOK's Katran
○ L4 Load-balancing
○ https://github.com/facebookincubator/katran
● Suricata
○ IPS/IDS engine
○ https://suricata-ids.org/
● Cilium
○ https://cilium.io/
● IOvisor bcc
○ https://www.iovisor.org/
● IR Decoding
○ https://lwn.net/Articles/759188/

Not breaking userspace: the evolving Linux ABI
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 

BPF / XDP August Seminar, KossLab

  • 1. Linux Kernel - BPF / XDP KossLab 유태희, 송태웅
  • 2. What is BPF? 1. Berkeley Packet Filter, since 1992 2. Kernel infrastructure
  • 3. What is BPF? 1. Berkeley Packet Filter, since 1992 2. Kernel infrastructure: - Interpreter (in-kernel virtual machine) - Hook points (in-kernel callback points) - Map - Helper
  • 4. What is BPF? “Safe dynamic programs and tools”: a technique for safely injecting code into the kernel at runtime
  • 5. BPF Infrastructure: a strategy for safe code injection 1) Use BPF instructions instead of native machine code 2) Check for hazards in advance with the verifier 3) When an (existing) kernel function is needed, call it only through a helper function
  • 6. BPF Infrastructure: a strategy for safe code injection 1) Use BPF instructions instead of native machine code
  • 7. BPF Infrastructure: a strategy for safe code injection 2) Check for hazards in advance with the verifier
  • 8. BPF Infrastructure: a strategy for safe code injection 3) When an (existing) kernel function is needed, call it only through a helper function
  • 9. BPF Infrastructure: the base technology for safe code injection. Kernel += BPF interpreter (in-kernel virtual machine) + verifier + BPF helper functions (leveraging kernel functions) + BPF syscall (prog/map: loading, attaching, etc.)
  • 10. BPF Instruction set: 1) A "junior" x86 instruction set (’simplified x86’) (note: PLUMgrid's x86 bytecode verifier failed) 2) BPF = classic BPF:10% + x86:70% + arm64:25% + risc:5% 3) Fixed-size instruction encoding (for high interpreter speed) 4) Simplification -> hazards are easier to predict and prevent (the verifier checks loops, memory access bounds, etc.) 5) Architecture-independent
  • 11. BPF Instruction set: immediate:32 offset:16 src:4 dst:4 opcode:8
      $ cat include/uapi/linux/bpf.h
      [...]
      struct bpf_insn {
              __u8    code;           /* opcode */
              __u8    dst_reg:4;      /* dest register */
              __u8    src_reg:4;      /* source register */
              __s16   off;            /* signed offset */
              __s32   imm;            /* signed immediate constant */
      };
      [...]
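To make the 64-bit layout concrete, here is a minimal user-space sketch that mirrors the field layout of `struct bpf_insn` and encodes one register-to-register move, the same instruction that `BPF_MOV64_REG(BPF_REG_6, BPF_REG_1)` produces. The names `struct insn`, `OPC_*`, and `mov64_reg` are hypothetical and exist only for this illustration; the numeric opcode bits are the values defined in the kernel UAPI headers. It assumes a GCC- or Clang-style compiler that accepts `uint8_t` bit-fields.

```c
#include <stdint.h>

/* Illustrative mirror of struct bpf_insn (hypothetical name, same layout). */
struct insn {
    uint8_t code;        /* opcode */
    uint8_t dst_reg : 4; /* destination register */
    uint8_t src_reg : 4; /* source register */
    int16_t off;         /* signed offset */
    int32_t imm;         /* signed immediate constant */
};

/* Opcode bits with the values used by the kernel UAPI headers. */
enum {
    OPC_CLASS_ALU64 = 0x07, /* instruction class: 64-bit ALU */
    OPC_OP_MOV      = 0xb0, /* operation: move */
    OPC_SRC_REG     = 0x08  /* source operand is a register */
};

/* Build "dst = src" as a 64-bit register move. */
static struct insn mov64_reg(uint8_t dst, uint8_t src)
{
    struct insn i = {
        .code    = OPC_CLASS_ALU64 | OPC_OP_MOV | OPC_SRC_REG,
        .dst_reg = dst,
        .src_reg = src,
        .off     = 0,
        .imm     = 0,
    };
    return i;
}
```

With this sketch, `mov64_reg(6, 1)` yields an 8-byte instruction whose opcode byte is `0xbf`, which is the fixed-size encoding the slide refers to.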
  • 12. BPF Instruction set: immediate:32 offset:16 src:4 dst:4 opcode:8. The 8-bit opcode = instruction class (low 3 bits) + class-specific fields (upper 5 bits). eBPF: include/uapi/linux/bpf.h cBPF: include/uapi/linux/bpf_common.h
  • 13. BPF Instruction set: immediate:32 offset:16 src:4 dst:4 opcode:8. The 8-bit opcode = instruction class (low 3 bits) + LD/ST fields or ALU/JMP fields (upper 5 bits). eBPF: include/uapi/linux/bpf.h cBPF: include/uapi/linux/bpf_common.h LD/ST family: class 0x00 ~ 0x03. ALU/JMP family: class 0x04 ~ 0x07
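The class ranges above can be checked with a small decoder. This is a sketch only: `opcode_class` and `is_load_store` are hypothetical helper names, and the `CLS_*` constants simply restate the eBPF class values (in classic BPF, 0x06 and 0x07 are RET and MISC instead).

```c
#include <stdint.h>

/* eBPF instruction classes: the low 3 bits of the opcode byte. */
enum {
    CLS_LD = 0x00, CLS_LDX = 0x01, CLS_ST = 0x02, CLS_STX = 0x03,
    CLS_ALU = 0x04, CLS_JMP = 0x05, CLS_JMP32 = 0x06, CLS_ALU64 = 0x07
};

/* Extract the instruction class from an opcode byte. */
static unsigned int opcode_class(uint8_t code)
{
    return code & 0x07;
}

/* Classes 0x00..0x03 are the LD/ST family, 0x04..0x07 the ALU/JMP family. */
static int is_load_store(uint8_t code)
{
    return opcode_class(code) <= CLS_STX;
}
```

For example, the opcode `0x63` (a 32-bit store from a register) decodes to class 0x03 in the LD/ST family, while `0x85` (helper call) decodes to class 0x05 in the ALU/JMP family.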
  • 15. BPF Instruction set:
      struct bpf_insn prog[] = {
              BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
              BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol) /* R0 = ip->proto */),
              BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */
              BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
              BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 */
              BPF_LD_MAP_FD(BPF_REG_1, map_fd),
              BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
              BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
              BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
              BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
              BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
              BPF_EXIT_INSN(),
      };
      https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/samples/bpf/sock_example.c
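Read as ordinary C, the instruction array above counts packets per IP protocol: it loads the protocol byte, uses it as a map key, and atomically increments the value if the lookup succeeds. The sketch below restates that logic in user space under stated assumptions: the array `proto_count` stands in for the BPF map, `map_lookup` and `count_packet` are hypothetical names, the explicit length check replaces the bounds handling that `BPF_LD_ABS` does in the kernel, and `__atomic_fetch_add` is the GCC/Clang builtin used here in place of the BPF xadd instruction.

```c
#include <stddef.h>
#include <stdint.h>

#define MAP_ENTRIES 256

/* Stand-in for the BPF map: one 64-bit counter per IP protocol number. */
static uint64_t proto_count[MAP_ENTRIES];

/* Like bpf_map_lookup_elem(): pointer to the value, or NULL if absent. */
static uint64_t *map_lookup(uint32_t key)
{
    return key < MAP_ENTRIES ? &proto_count[key] : NULL;
}

/* Same steps as the instruction array: load proto, look up, atomic add. */
static int count_packet(const uint8_t *pkt, size_t len)
{
    /* ETH_HLEN (14) + offsetof(struct iphdr, protocol) (9) */
    const size_t proto_off = 14 + 9;

    if (len <= proto_off)
        return 0;                      /* packet too short to hold the field */

    uint32_t key = pkt[proto_off];     /* r0 = ip->proto, stored as the key */
    uint64_t *value = map_lookup(key); /* call map_lookup_elem */

    if (value)                         /* JEQ r0, 0 skips the add on a miss */
        __atomic_fetch_add(value, 1, __ATOMIC_RELAXED); /* xadd */

    return 0;                          /* r0 = 0; exit */
}
```

A user-space controller would then read the counters back through the map, which is the role the shared-memory maps play later in the deck.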
  • 16. BPF helper functions: $ grep BPF_CALL kernel/bpf/helpers.c: BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key) BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key, [...] kernel/trace/bpf_trace.c: BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc) BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr) BPF_CALL_3(bpf_probe_write_user, void *, unsafe_ptr, const void *, src, BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1, [...] net/core/filter.c: BPF_CALL_1(bpf_skb_get_pay_offset, struct sk_buff *, skb) BPF_CALL_3(bpf_skb_get_nlattr, struct sk_buff *, skb, u32, a, u32, x) [...]
  • 17. BPF as a kernel subproject “Safe dynamic programs and tools” $ cat MAINTAINERS | grep -A 3 BPF BPF (Safe dynamic programs and tools) M: Alexei Starovoitov <ast@kernel.org> M: Daniel Borkmann <daniel@iogearbox.net> L: netdev@vger.kernel.org [...]
  • 18. BPF as a kernel subproject: “Safe dynamic programs and tools” $ cat MAINTAINERS | grep -A 27 BPF BPF (Safe dynamic programs and tools) [...] F: arch/x86/net/bpf_jit* [...] F: kernel/bpf/ F: kernel/trace/bpf_trace.c [...] F: net/core/filter.c F: net/sched/act_bpf.c F: net/sched/cls_bpf.c [...] [...] F: samples/bpf/ F: tools/bpf/ F: tools/lib/bpf/ F: tools/testing/selftests/bpf/
  • 19. BPF as a kernel subproject: F: arch/x86/net/bpf_jit* and friends. JIT-supported architectures: x86, arm, arm64, sparc, s390, powerpc, mips
  • 20. BPF as a kernel subproject: F: kernel/bpf/. BPF core: syscall, interpreter, verifier, generic helpers, maps, ...
  • 21. BPF as a kernel subproject: F: kernel/trace/bpf_trace.c, net/core/filter.c, net/sched/act_bpf.c, net/sched/cls_bpf.c. Hook points, specific helpers, ... for cBPF, ...
  • 22. BPF as a kernel subproject: F: samples/bpf/, tools/bpf/, tools/lib/bpf/, tools/testing/selftests/bpf/. BPF loading (lib), bpf tool, test code, samples, ...
  • 23. BPF Infrastructure: support for using BPF programs 1) Hook points (in-kernel callback points) 2) Map (user-to-kernel shared memory) 3) Calling kernel functions through helpers (leveraging) 4) Object pinning: /sys/fs/bpf/...
  • 24. BPF Architecture (diagram): user space: BPF Controller 1, BPF Controller 2, ... (user apps) using the BPF library libbpf (prog/map load, attach, control), and ip / tc using the BPF library in iproute2; bpf() SYSCALL; kernel space: BPF programs calling helper func(), Map 1, Map 2, ... (shared memory)
  • 25. XDP
  • 30. XDP == FAST PATH
  • 31. NORMAL PATH (diagram labels): RX, DD, TC ingress, L3 input, PREROUTING, ROUTING, FORWARD, INPUT, L4 TCP/UDP, L7 APP, OUTPUT, ROUTING, POSTROUTING, L3 output, TC egress, DD, TX
  • 35. Prerequisites: 1. One build machine 2. One test machine (x86 recommended) 3. Kernel source code 4. clang + llvm (compiler) 5. bpftool (BPF program loader) 6. An iproute2 package with BPF support
  • 37. Kernel source code: the bpf tree on git.kernel.org https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
  • 40. samples/bpf examples: kernel source code and BPF sample code
  • 41. samples/bpf example (xdp_rxq_info_kern.c): analyzing sample code in the kernel source
  • 42. Compiling samples/bpf: hands-on BPF program compilation
  • 43. Loading the program: $ mount bpffs /sys/fs/bpf -t bpf $ bpftool prog load ./xdp_rxq_info_kern.o /sys/fs/bpf/xdp
  • 44. Checking the program: $ ls /sys/fs/bpf/ $ ./bpftool prog list $ ./bpftool prog dump xlated id X $ ./bpftool prog dump jited id X
  • 45. Attaching the XDP program: $ ip link set dev lo xdp pin /sys/fs/bpf/xdp
  • 46. Verifying the XDP program attachment: $ ip link show dev lo
  • 47. Removing the XDP program: $ ip link set dev lo xdp off $ rm /sys/fs/bpf/xdp
  • 51. #PC2 $ ping 192.168.4.1 #PC1 $ iptables -A INPUT -s 192.168.4.2 -d 192.168.4.1 -p icmp -j DROP
  • 52. NORMAL PATH (diagram labels): RX, DD, TC ingress, L3 input, PREROUTING, ROUTING, FORWARD, INPUT, L4 TCP/UDP, L7 APP, OUTPUT, ROUTING, POSTROUTING, L3 output, TC egress, DD, TX
  • 53. NORMAL PATH (diagram): the same path, with the packet dropped (DROP) by the iptables rule
  • 55. Setting up the XDP program: $ ./bpftool prog load ./xdp_icmp.o /sys/fs/bpf/xdp_icmp $ ip link set dev lo xdp pin /sys/fs/bpf/xdp_icmp
  • 56. XDP GENERIC PATH (diagram): the same stages as the normal path, with a BPF program running right after RX, where the packet is dropped (DROP)
  • 59. BPF Tracing: iptables - DROP case. netif_receive_skb_internal() ~~ long time !! ~~ ipt_do_table(): DROP
  • 61. BPF Tracing: XDP - DROP case. netif_receive_skb_internal() ~~ short time !! ~~ do_xdp_generic(): DROP
  • 62. BPF Tracing: iptables vs XDP - DROP case. netif_receive_skb_internal() ~~ long time !! ~~ ipt_do_table(): DROP; netif_receive_skb_internal() ~~ short time !! ~~ do_xdp_generic(): DROP
  • 63. BPF Tracing: iptables vs XDP - DROP case. Traced functions: net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb); net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) (DROP); net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) (DROP)
  • 65. BPF Tracing: iptables vs XDP - DROP case. Beginning point of netif_receive_skb_internal(): BPF ATTACH !! Return point of do_xdp_generic(): BPF ATTACH !! Return point of ipt_do_table(): BPF ATTACH !!
  • 66. BPF Tracing: iptables vs XDP - DROP case. Entry probe on netif_receive_skb_internal():
      SEC("kprobe/netif_receive_skb_internal")
      int bpf_trace_receive_skb(struct pt_regs *ctx)
      {
              long skb_ptr = PT_REGS_PARM1(ctx);
              u64 start_time = bpf_ktime_get_ns();

              bpf_map_update_elem(&tracing_map, &skb_ptr, &start_time, BPF_ANY);
              return 0;
      }
  • 68. BPF Tracing: iptables vs XDP - DROP case. Return probe on do_xdp_generic():
      SEC("kretprobe/do_xdp_generic")
      int bpf_trace_xdp_drop(struct pt_regs *ctx)
      {
              long skb_ptr = PT_REGS_PARM2(ctx);
              int action = PT_REGS_RC(ctx);

              if (action == XDP_DROP) {
                      u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr);
                      if (!time)      /* no start timestamp recorded for this skb */
                              return 0;
                      u64 cur_time = bpf_ktime_get_ns();
                      u64 delta = cur_time - *time;
                      *time = delta;
                      ...
  • 70. BPF Tracing: iptables vs XDP - DROP case. Return probe on ipt_do_table():
      SEC("kretprobe/ipt_do_table")
      int bpf_trace_iptables_drop(struct pt_regs *ctx)
      {
              long skb_ptr = PT_REGS_PARM1(ctx);
              int action = PT_REGS_RC(ctx);

              if (action == NF_DROP) {
                      u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr);
                      if (!time)      /* no start timestamp recorded for this skb */
                              return 0;
                      u64 cur_time = bpf_ktime_get_ns();
                      u64 delta = cur_time - *time;
                      *time = delta;
                      ...
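The measurement scheme in these probes is: store a start timestamp keyed by the skb pointer at function entry, then, at the return of the dropping function, look the key up and replace the timestamp with the elapsed time. The following user-space sketch simulates only that bookkeeping so the arithmetic can be followed without a kernel. Everything in it is an assumption for illustration: the fixed-size `tracing_map` array stands in for the BPF hash map, timestamps are passed in as plain numbers instead of coming from `bpf_ktime_get_ns()`, and `on_receive` / `on_drop` are hypothetical names for the entry and return handlers.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64

/* Tiny stand-in for the BPF hash map: skb pointer -> u64 value. */
struct slot { uintptr_t key; uint64_t value; int used; };
static struct slot tracing_map[SLOTS];

/* Like bpf_map_lookup_elem(): pointer to the stored value, or NULL. */
static uint64_t *map_lookup(uintptr_t key)
{
    for (size_t i = 0; i < SLOTS; i++)
        if (tracing_map[i].used && tracing_map[i].key == key)
            return &tracing_map[i].value;
    return NULL;
}

/* Like bpf_map_update_elem(..., BPF_ANY): insert or overwrite. */
static int map_update(uintptr_t key, uint64_t value)
{
    uint64_t *existing = map_lookup(key);
    if (existing) {
        *existing = value;
        return 0;
    }
    for (size_t i = 0; i < SLOTS; i++) {
        if (!tracing_map[i].used) {
            tracing_map[i].key = key;
            tracing_map[i].value = value;
            tracing_map[i].used = 1;
            return 0;
        }
    }
    return -1; /* map full */
}

/* Entry handler: remember when this skb entered the receive path. */
static void on_receive(uintptr_t skb_ptr, uint64_t now_ns)
{
    map_update(skb_ptr, now_ns);
}

/* Return handler on a drop: replace the start time with the elapsed time. */
static int on_drop(uintptr_t skb_ptr, uint64_t now_ns)
{
    uint64_t *time = map_lookup(skb_ptr);
    if (!time)
        return -1;            /* no entry recorded for this skb */
    *time = now_ns - *time;   /* delta = cur_time - start_time */
    return 0;
}
```

The NULL check before dereferencing mirrors what the in-kernel verifier requires of real BPF programs after a map lookup.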
  • 73. $ cat /sys/kernel/debug/tracing/trace (iptables case, ending in DROP):
      netif_receive_skb_internal() {
        ktime_get_with_offset();
        __netif_receive_skb() {
          __netif_receive_skb_core() {
            ip_rcv() {
              pskb_trim_rcsum_slow();
              nf_hook_slow() {
                iptable_mangle_hook() {
                  ipt_do_table() {
                    __local_bh_enable_ip();
                  }
                }
              }
              ip_rcv_finish() {
                udp_v4_early_demux();
                ip_route_input_noref() {
                  ip_route_input_rcu() {
                    ip_route_input_slow() {
                      fib_table_lookup();
                      fib_validate_source() {
                        __fib_validate_source() {
                          fib_table_lookup();
                        }
                      }
                    }
                  }
                }
                ip_local_deliver() {
                  nf_hook_slow() {
                    iptable_mangle_hook() {
                      ipt_do_table() {
                        __local_bh_enable_ip();
                      }
                    }
                    iptable_filter_hook() {
                      ipt_do_table() {
                        udp_mt();
                        __local_bh_enable_ip();
                      }
                    }
                    kfree_skb()     <- DROP
  • 75. $ cat /sys/kernel/debug/tracing/trace (XDP case, shown next to the iptables trace, ending in DROP):
      netif_receive_skb_internal() {
        ktime_get_with_offset();
        do_xdp_generic() {
          pskb_expand_head() {
            __kmalloc_reserve.isra.48() {
              __kmalloc_node_track_caller() {
                kmalloc_slab();
                should_failslab();
              }
            }
            ksize();
            skb_free_head() {
              page_frag_free();
            }
            skb_headers_offset_update();
          }
          __bpf_prog_run32() {
            ___bpf_prog_run();
          }
          kfree_skb()     <- DROP
  • 77. The same two traces side by side: YOU WIN !! “XDP is LOVE”
  • 79. BPF Infrastructure: 1) Hook points (in-kernel callback points) 2) LOAD, ATTACH, CALLBACK 3) Verifier / Interpreter / JIT 4) Map (user-to-kernel shared memory) 5) Calling kernel functions through helpers (leveraging) 6) Object pinning: /sys/fs/bpf/… ...
  • 80. Hook points (callback points) in kernel space: XDP: at the L2 device driver; tc: L3, just before / just after the device driver (DD); kprobe: function entry / return; ...
  • 81. Hook points (callback points): inside a specific kernel function: if (has_bpf_prog) BPF_PROG_RUN(); ->bpf_func(ctx, insnsi);
  • 82. Hook points (callback points): BPF prog injection !! A BPF program is attached at each hook point (XDP, tc, kprobe, ...)
  • 83. Hook points (callback points): bpf_func is either the BPF interpreter or JITed machine code
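The hook-point pattern on these slides (check whether a program is attached, then call through `bpf_func`) can be sketched in plain C with a function pointer. This is an illustration under assumptions, not kernel code: `struct prog`, `struct insn_stub`, `run_hook`, and `drop_all` are hypothetical names, and the `ACT_*` values merely mirror the numeric values of `XDP_DROP` and `XDP_PASS`.

```c
#include <stddef.h>

struct insn_stub { int unused; }; /* placeholder for struct bpf_insn */

/* A loaded program: bpf_func points at either the interpreter entry
 * or the JIT-compiled machine code, chosen once at load time. */
struct prog {
    unsigned int (*bpf_func)(const void *ctx, const struct insn_stub *insnsi);
    const struct insn_stub *insnsi;
};

enum { ACT_DROP = 1, ACT_PASS = 2 }; /* same values as XDP_DROP / XDP_PASS */

/* Hypothetical "program" that drops everything it sees. */
static unsigned int drop_all(const void *ctx, const struct insn_stub *insnsi)
{
    (void)ctx;
    (void)insnsi;
    return ACT_DROP;
}

/* The hook inside a kernel function: run the program only if one is attached. */
static unsigned int run_hook(const struct prog *attached, const void *ctx)
{
    if (!attached)
        return ACT_PASS; /* no program: continue on the normal path */
    return attached->bpf_func(ctx, attached->insnsi);
}
```

Because the caller only sees a function pointer, the hook site does not need to know whether the interpreter or JITed code runs behind it.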
  • 84. Hook points (callback points): XDP: at the L2 device driver; tc: L3, just before / just after the device driver (DD); kprobe: function entry / return; ... BPF prog injection !! HOW ?
  • 85. Build pipeline (diagram): C source (_kern.c) -> compile with clang / llc -> BPF program or BPF bytecode (BPF ELF) -> tc / ip (BPF library in iproute2) -> bpf() SYSCALL -> KERNEL SPACE
  • 86. Loader steps in the BPF library: 1. ELF parsing 2. First relocation: 1) map fd 2) bpf to bpf call
  • 87. The diagram adds Map 1 (shared memory) in kernel space
  • 88. Loader steps: 3. Loading with BPF_PROG_LOAD: BPF prog injection !!
  • 89. The diagram adds the BPF Controller (user app) with the BPF library libbpf: prog/map load, attach, control
  • 90. HOW ? in bpf()
  • 91. BPF LOAD process: 1. BPF prog / map alloc 2. Verifier (loops, memory access bounds)
  • 92. BPF LOAD process: 3. Second relocation: 1) map fd → map ptr 2) helper ID → func addr
  • 93. BPF LOAD process: 4. Select runtime: 1) BPF interpreter func addr 2) BPF func addr after JIT; return fd;
  • 94. After loading, the hook point runs: if (has_bpf_prog) BPF_PROG_RUN(); ->bpf_func(ctx, insnsi);
  • 95. Various BPF ATTACH methods: - socket(), send() with AF_NETLINK - bpf() syscall: BPF_PROG_ATTACH, BPF_RAW_TRACEPOINT_OPEN - kprobe event id, ioctl() PERF_EVENT_IOC_SET_BPF ...
  • 96. BPF CALLBACK !! The attached BPF programs are called back at their hook points
  • 97. Calling kernel functions through BPF helper functions (leveraging) !!
  • 98. Sharing memory between user space and the kernel through BPF maps: BPF Controller 1 and BPF Controller 2 (user apps) access Map 1, Map 2, ... (shared memory)
  • 99. BPF Architecture (diagram): user space: BPF Controller 1, BPF Controller 2, ... (user apps) using the BPF library libbpf (prog/map load, attach, control), and ip / tc using the BPF library in iproute2; bpf() SYSCALL; kernel space: BPF programs calling helper func(), Map 1, Map 2, ... (shared memory)
  • 104. XDP GENERIC PATH (diagram): the same stages as the normal path (RX, TC ingress, PREROUTING, ROUTING, FORWARD, INPUT, L4 TCP/UDP, L7 APP, OUTPUT, ROUTING, POSTROUTING, TC egress, L3 output, DD, TX), with a BPF program running right after RX
  • 108. Driver XDP vs Generic XDP (diagram labels): Driver XDP: RX, BPF, PASS, REDIRECT, TX; Generic XDP: RX, L3, BPF, PASS, REDIRECT, TX
  • 116. XDP_REDIRECT (diagram labels): BPF, APP, eth0, eth1, eth2, eth3, XDP_TX, REDIRECT, MAP
  • 128. DEVMAP
  • 129. DEVMAP (diagram): RX -> BPF -> bpf_redirect_map -> redirect info looked up in the DEVMAP (Key -> Value (device): 0 -> X, 1 -> X, 2 -> X) -> a queue of xdp_frame entries -> REDIRECT TX
  • 130. CPUMAP
  • 131. CPUMAP (diagram): RX -> BPF -> bpf_redirect_map -> redirect info looked up in the CPUMAP (Key -> Value (CPU): 0 -> X, 1 -> X, 2 -> X) -> a queue of xdp_frame entries -> REDIRECT to ???
  • 132. CPUMAP (diagram): the same flow, with the REDIRECT target revealed as netif_receive_skb_core
  • 135. Additional Topics: ● memory model switching ○ /net/core/xdp.c ● page pool ○ /net/core/page_pool ● offload ● AF_XDP && XSK (XDP socket) ● helper functions ● Device driver
  • 136. Additional Topics: ● Verifier ○ CFG, DAG, register, memory check... ● Other types ○ TC, SOCKET FILTER, CGROUP ● BTF ○ ELFutils, clang -g, llc -mattr=dwarfris ● Tail call ○ related to bpf_prog_array
  • 137. Additional Topics: ● Facebook's Katran ○ L4 load balancing ○ https://github.com/facebookincubator/katran ● Suricata ○ IPS/IDS engine ○ https://suricata-ids.org/ ● Cilium ○ https://cilium.io/ ● IOvisor bcc ○ https://www.iovisor.org/ ● IR Decoding ○ https://lwn.net/Articles/759188/