SlideShare ist ein Scribd-Unternehmen logo
1 von 17
Downloaden Sie, um offline zu lesen
© 2018 NETRONOME
Jiong Wang
Linux Plumbers Conference
Vancouver, Nov, 2018
Efficient JIT to 32-bit Arches
1
© 2018 NETRONOME
Background
 ISA specification and impact on JIT compiler
 Default code-gen use 64-bit register, ALU64, JMP64
 No big impact on 64-bit backends (REX header for x86-64)
 Requires extra regs and insns on 32-bit arches.
 Programs in real world are 32-bit friendly
 Nearly all non-pointer arithmetic are 32-bit and all will be if pointer is 32-bit
2
38%
62%
test_l4lb_noinline.c
32-bit ALU 64-bit ALU + JMP
31%
69%
test_xdp_noinline.c
32-bit ALU 64-bit ALU + JMP
Arch BPF_ALU/BPF_JMP
arm emit_alu_r(rd[1], rs[1], true, false, op, ctx);
emit_alu_r(rd[0], rs[0], true, true, op, ctx)
emit(ARM_CMP_R(rd, rm), ctx);
emit(ARM_COND_EQ, ARM_CMP_R(rt, rn), ctx);
nfp emit_alu(…reg_both(dst), reg_a(dst), alu_op, reg_b(src));
emit_alu(…reg_both(dst + 1), reg_a(dst + 1), alu_op, reg_b(src +
1));
…
© 2018 NETRONOME
Solution A - 32-bit subreg ISA
 eBPF ISA already defined 32-bit subreg and ALU32 insns, could pass 32-bit semantics from c types down to assembly
 A subreg read is always 32-bit read, but write implicitly zeros high 32-bit. Compilers or hand written assembly could be
using this
 32-bit arches must model implicitly zero extension using extra insns to guarantee correctness. This applies to all ALU32
insn. How could we avoid these extra code-gen? 3
void cal(u32 *a, u32 *b, u64 *c) {
u32 sum = *a + *b;
*c = sum;
}
-mattr=+alu32
cal:
w1 = *(u32 *)(r1 + 0)
w2 = *(u32 *)(r2 + 0)
w2 += w1
*(u64 *)(r3 + 0) = r2
exit
ALU32 insn but there is exploit of implicit zero
extension on w2 by the following store
Reference a 32-bit subreg as a 64-bit definition
-mattr=+alu32 (since LLVM 7.0)
void cal(u32 *a, u32 *b, u32 *c) {
u32 sum = *a + *b;
*c = sum;
}
cal:
w1 = *(u32 *)(r1 + 0)
w2 = *(u32 *)(r2 + 0)
w2 += w1
*(u32 *)(r3 + 0) = w2
exit
cal:
r1 = *(u32 *)(r1 + 0)
r2 = *(u32 *)(r2 + 0)
r2 += r1
*(u32 *)(r3 + 0) = r2
exit
default
© 2018 NETRONOME
Solution A - 32-bit subreg ISA
 Just don’t do zero extension on the DST_REG of ALU32 at all, do them the first time DST_REG is used as 64-bit. Mark the
reg to save on later use
 Insns have 64-bit register read including: cond BPF_JMP, BPF_ALU, BPF_STX | BPF_DW
 cond JMP overhead could be reduced by introducing BPF_JMP32 insn. X86_64 and AArch64 ISA do support this
 A large portion of BPF_ALU comes from address calculation, for example calculating address of local variables.
Could potentially avoid this using verifier register type info
 Without flow analysis, when a instruction is jump destination (start of basic block), need to clear register mark. This could
possibly cause quite a few unnecessary code-gen
Benchmark Total insn Total cond JMP JMP32 Percentage
test_xdp_noinline 999 84 52 61.9%
test_l4lb_noinline 569 40 24 60.0%
© 2018 NETRONOME
Solution A - 32-bit subreg ISA
 LLVM might have better knowledge on all these, so another choice is stopping LLVM exploiting implicit zero extension and
generate it explicitly
 For explicit zero extension, we can’t use existing ALU32 MOV as otherwise we can’t differentiate it with normally MOV
which we don’t want to clear high 32bits. We will need new explicit zero-extension instruction BPF_UEXT
 But how could verifier safely know the input sequences is compiled from LLVM compiler conforming to such convention is
an issue. Could generate ELF tag, however which could be faked, therefore have potential impact on security
© 2018 NETRONOME
Solution B – Standalone DF Analyzer
6
 Work with any input sequence, for example pure ALU64 sequence. Can’t leverage LLVM’s work
 Optimistic
 Assume all ALU instructions are 32-bit safe initially
 Once find one 64-bit register use, pollute all instructions contributed to the value of this register
 Pessimistic
 Assume all ALU instructions are 64-bit initially
 Mark one ALU insn as 32-bit safe only when all the use of its definition are 32-bit
© 2018 NETRONOME
Solution B – Standalone DF Analyzer
 Amount of 32-bit use info decides amount of opportunities
 Without +alu32 code-gen, use info come from
 B/H/W STORE
 CMP-JMP if reg is zero extended
 With +alu32, all ALU32 insn could also produce use info, and analysis could finished quicker
r1 r1
r2 r2
r2 r2
r3r2
r0
r1
define use0 use1
r1 = *(u32 *)(r1 + 0)
r2 = *(u32 *)(r2 + 0)
r2 += r1
*(u32 *)(r3 + 0) = r2
exit
w2 r2
w1 r1
w1 w1
w2 …w1
r0
w2
define use0 use1
w2 = *(u32 *)(r2 + 0)
w1 = *(u32 *)(r1 + 0)
w1 += w2
call foo
exit
void bar(u32 *a, u32 *b, u32 *c) {
u32 sum = *a + *b;
}
*c = sum
*c = foo(sum)
7
© 2018 NETRONOME
Solution B – Standalone DF Analyzer
 A stand-alone, drop-and-work implementation organized as a kernel lib
 Implementation available as RFC
 Could be enhanced with previous CFG infrastructure
https://lwn.net/Articles/753724 to become a global DF analyzer
struct bpf_du_chain {
struct bpf_du_insn *insn;
struct bpf_du_chain *next;
unsigned int flags;
};
r2 r2
r1 r1
r1 r1
w2 …r1
r0
r2
define use0 use1
r2 = *(u32 *)(r2 + 0)
r1 = *(u32 *)(r1 + 0)
r1 += r2
*(u32 *)(r3 + 0) = r2
exit
struct bpf_du_insn {
struct bpf_du_chain *chain[2];
unsigned int flags;
};
bpf_df_init
run DF passes
(bpf_df_pass_32bit_safe)
bpf_df_init
copy DF result
to JIT backend
• Partition scope, mark places
introducing unknown def and use
• Scan insn sequence from start to
end, build DU chain
• Run pre-requisite pass if any
• Mark seed 32-bit insn
• Iteratively backward propagate
until reached fix-point
• Copy DF result kept as flags
• Release resources
8
© 2018 NETRONOME
Solution B – Standalone DF Analyzer
 Algorithm pseudo code for local analyzer
ext_defs[MAX_BPF_REG] = false;
// insn define the current value of R
act_defs[MAX_BPF_REG] = NULL;
bpf_df_init:
for_each_insn_in_the_sequence
if (cur_insn == jmp or call)
dst_insn.flag |= FLAG_IS_JMP_DST
for_each_insn_in_the_sequence
if (cur_insn.flag & FLAG_IS_JMP_DST)
ext_defs[0..MAX_BPF_REG] = true
if (cur_insn == call or exit or jmp )
add_unknown_use(act_def[arg_regs])
or act_defs[0..MAX_BPF_REG] = NULL
cur_insn.u2d[0] = act_defs[cur_insn.dst_reg]
act_defs[cur_insn.dst_reg].d2u.next = cur_insn;
cur_insn.u2d[1] = act_defs[cur_insn.src_reg]
act_defs[cur_insn.src_reg].d2u.next = cur_insn;
bpf_df_pass_32bit_safe:
for_each_insn_in_the_sequence
if (cur_insn == ST_B/H/W && has_single_use(cur_insn.u2d[1]))
cur_insn.u2d[1].flag |= FLAG_IS_32BIT_SAFE
else if (cur_insn == ALU32) {
if (has_single_use(cur_insn.u2d[0])
cur_insn.u2d[0].flag |= FLAG_IS_32BIT_SAFE
if (BPF_X && has_single_use(cur_insn.u2d[1])
cur_insn.u2d[1].flag |= FLAG_IS_32BIT_SAFE
}
do {
changed = false
for_each_insn_in_the_sequence
changed |= propagate_32bit_safe(cur_insn)
while (changed)
propagate_32bit_safe:
changed = false
if (!(insn.flag & FLAG_IS_32BIT_SAFE))
return false
if (insn == alu64 or alu32 except right shift) {
if (has_single_use(insn.u2d[0])
insn.u2d[0].flag |= FLAG_IS_32BIT_SAFE
if (BPF_X && has_single_use(insn.u2d[1])
insn.u2d[1].flag |= FLAG_IS_32BIT_SAFE
}
return changed;
9
© 2018 NETRONOME
Solution B – Standalone DF Analyzer
10
 Build global analyser on top of CFG infrastructure
 Split each register into hi/lo, hi0 ~ lo10, lo0 ~ lo10, do classic live variable analysis
 Calculate LiveIn and LiveOut for hi registers, meaning whether high 32-bit are live (used later) when entering and
leaving one basic block. For 32-bit safety, we care about Out(s) which could tell us the use in successor blocks
 Gen(s) function is complex than classic Gen(s) as we need insn DU chain to propagate from def to use. Can’t simply
finish initialize Gen(s) set using insn scan
Out(s) = ∪s′ ∊ succ(s)
In(s′)
In(s) = Gen(s) ∪ (Out(s) - Kill(s))
© 2018 NETRONOME
Solution B – Standalone DF Analyzer
KERNEL/lib/bpf_core_cfg.c
KERNEL/lib/bpf_core_df.c
KERNEL/lib/bpf_core_opt_32bit.c
driver/nfp driver/arm tool/bpftool Any other places inside
kernel tree needs
eBPF insn analysis
Ideally should compile for both userspace and kernel space
userspace compilation mode
11
© 2018 NETRONOME
Solution C - Enhance Verifier DF analyzer
 Verifier has a light “DF analyzer”, designed for releasing more path prune opportunities
 It is integrated with path walker, collects and processes information while walking insn
 Fit scenarios which do not need memorize historical information, but doesn’t fit well otherwise
12
10: r6 = *(u32 *)(r10 - 4)
11: r7 = *(u32 *)(r10 - 8)
12: *(u64 *)(r10 - 16) = r6
13: *(u64 *)(r10 - 24) = r7
14: r6 += 1
15: r7 += r6
16: *(u32 *)(r10 - 28) = r7
17: r3 += r7
18: r4 = r6
19: *(u64 *)(r10 - 32) = r3
20: *(u64 *)(r10 - 40) = r4
State A
State B
State C
R4 Propagation
R3 Propagation
reg read propagation
10: r6 = *(u32 *)(r10 - 4)
11: r7 = *(u32 *)(r10 - 8)
12: *(u64 *)(r10 - 16) = r6
13: *(u64 *)(r10 - 24) = r7
14: r6 += 1
15: r7 += r6
16: *(u32 *)(r10 - 28) = r7
17: r3 += r7
18: r4 = r6
19: *(u64 *)(r10 - 32) = r3
20: *(u64 *)(r10 - 40) = r4
64-bit usage propagation
R4 Propagation
R3 Propagation
R7 Propagation
R6 Propagation
© 2018 NETRONOME
Solution C - Enhance Verifier DF analyzer
 32-bit analysis algorithm based on verifier DF analyzer
 On solution B, one insn is 32-bit safe if its definition has 32-bit use only, this requires def<->use dual link
 Verifier DF analyzer is based on code path walking, we don’t know whether all uses has been visited, so can only
build use->def singular chain
 Therefore, the algorithm is using 64-bit “polluting”. Initially all instructions are considered as 32-bit safe, then
whenever there is one 64-bit use of one definition, that insn is polluted as 64-bit
 64-bit polluting could propagate from definition to use. For example, A = B + C, definition A has 64-bit use, insn A is
polluted, uses B and C must also be 64-bit as they are forming A, therefore definition insn of B and C should be
polluted as well
13
r7 += r6
r3 += r7
*(u64 *)(r10 - 28) = r3
insn_mark_stack[stack_idx++] = insn_idx;
while (stack_idx) {
def_idx = insn_mark_stack[--stack_idx];
aux[def_idx].full_ref = true;
u2d0 = aux[def_idx].u2d[0];
if (u2d0 >= 0 && !aux[u2d0].full_ref)
insn_mark_stack[stack_idx++] = u2d0;
u2d1 = aux[def_idx].u2d[1];
if (u2d1 >= 0 && !aux[u2d1].full_ref)
insn_mark_stack[stack_idx++] = u2d1;
}
Need to pollute insns that
define r3, r6 and r7 and any
one further backward
iteratively.
© 2018 NETRONOME
Solution C - Enhance Verifier DF analyzer
 64-bit read in pruned path need to be propagated upward to parent verifier state
 But 64-bit read propagation is more complex, it won’t be screened off by a simple write and such propagation needs to be
done iteratively on all relevant registers
10: r6 = *(u32 *)(r10 - 4)
11: r4+=r3
12: *(u64 *)(r10 - 16) = r6
13: *(u64 *)(r10 - 24) = r7
14: r4 = 1
14: r6 += r4
16: r5 += r6
17: r4 += r5
18: r5 = 1
19: r4 += r5
20: *(u64 *)(r10 - 40) = r4
normal reg read
propagation
64-bit reg read
propagation
when propagation start when walking the insn when walking the insn,
but could start later when
one relevant reg identified
as 64-bit
need historical info no yes
need to record all insns
relevant to one reg
screen off any write to the reg write to the reg, but
source shouldn’t be
overlapping with the dest
affect other regs no yes
instructions involved generating r4, registers in these insns are all relevant
14
© 2018 NETRONOME
Solution C - Enhance Verifier DF analyzer
 Historical information
 Insns involved with value generation of the reg
 Because reg state is changing alone with insn walking, so should keep the state when walking the insn
 For example, r4 in insn 14 contributed to the final value of r4 used in insn 20 which is 64-bit, so it should be
propagated to parent state. Insn 15 however screened off r14, if we don’t record the reg state when walking insn 14,
we will miss this propagation
15
14: r6 += r4
15: r4 = 1
16: r5 += r6
17: r4 += r5
18: r5 = 1
19: r4 += r5
20: *(u64 *)(r10 - 40) = r4
State A
State B
struct bpf_insn_aux_data {
…
struct bpf_reg_state_lite reg_state_lite[MAX_BPF_REG];
struct bpf_verifier_state *parent_vstate;
s16 u2d[2];
}
struct bpf_reg_state_lite {
struct bpf_reg_state *parent;
u32 live;
};
insn walker could visit one insn recursively (disallowed now), or repeatedly, both would break the chain!
© 2018 NETRONOME
Solution C - Enhance Verifier DF analyzer
 Algorithm pseudo code
16
enum reg_arg_type {
SRC_OP_0,
SRC_OP64_0,
SRC_OP_1,
SRC_OP64_1,
SRC_OP64_IMP,
DST_OP,
U_DST_OP,
DST_OP_NO_MARK,
U_DST_OP_NO_MARK
};
check_reg_arg across verifier need to pass finer arg_type
depending on the position of the src and whether it is 64bit
check_reg_arg(env, insn->src_reg, SRC_OP|64_0, insn_idx);
check_reg_arg(env, insn->dst_reg, SRC_OP|64_1, insn_idx);
dst_type = no overlap between dst_reg and any src ?
U_DST_OP_NO_MARK : DST_OP_NO_MARK;
check_reg_arg(env, insn->dst_reg, dst_type, insn_idx);
check_reg_arg(reg, type, insn_idx):
rstate = cur_frame->regs + insn_idx
if (type == SRC_*) {
link_use_to_def(rstate, type, insn_idx)
mark_reg_read(… type == SRC_*64_*);
} else {
rstate->live |= REG_LIVE_WRITTEN;
if (t == U_DST_OP || t == U_DST_OP_NO_MARK)
rstate->live |= REG_LIVE_WRITTEN_UNIQUE;
…
rstate->def_insn_idx = insn_idx;
if (insn_idx != INVALID_INSN_IDX)
copy_reg_state_lite(insn_aux[insn_idx].regs_lite, regs);
}
link_reg_to_def(rstate, type, insn_idx):
slot_idx = 0 or 1 depending on type
def_idx = rstate->def_insn_idx;
insn_aux_data[insn_idx].u2d[slot_idx] = def_idx;
mark_reg_read:
…
if (!64bit_read)
return;
return mark_reg_read64(…)
mark_reg_read64:
while (parent) {
if (writes &&
state->live &
REG_LIVE_WRITTEN_UNIQUE)
break;
parent->live |= REG_LIVE_READ64;
state = parent;
parent = state->parent;
writes = true;
}
if (no_mark_insn)
return;
return mark_insn_64bit(def_idx);
mark_insn_64bit:
insn_mark_stack[stack_idx++] = insn_idx;
while (stack_idx) {
def_idx = insn_mark_stack[--stack_idx];
aux[def_idx].full_ref = true;
u2d0 = aux[def_idx].u2d[0];
if (u2d0 >= 0 && !aux[u2d0].full_ref) {
insn_mark_stack[stack_idx++] = u2d0;
mark_reg_read64(aux[def_idx].u2d0.dst_reg,
no_mark_insn)
}
u2d1 = aux[def_idx].u2d[1];
if (u2d1 >= 0 && !aux[u2d1].full_ref) {
insn_mark_stack[stack_idx++] = u2d1;
mark_reg_read64(aux[def_idx].u2d1.dst_reg,
no_mark_insn)
}
}
© 2018 NETRONOME
Which approach to go?
 Verifier
 Better to focus on verification, offer reliable and fast verification, no bothering of optimization
 DF analysis based on dynamic insn walking requires recording historical information
 Path prune however make it very difficult to collect such information on such scope
 Classic CFG + DF analysis
 Reliable and has sophisticated algorithms
 Redo LLVM’s work, heavy for kernel space
 Reuse information passed down from LLVM through 32-bit subreg ISA
 Throw the heavy lift work to user space static compiler who is also really good at
 Lack of some instructions (JMP32 etc) is causing trouble
 Best to follow this approach and enhance eBPF ISA ?
17
Thank you!

Weitere ähnliche Inhalte

Was ist angesagt?

BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program St...
BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program St...BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program St...
BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program St...Linaro
 
8086 labmanual
8086 labmanual8086 labmanual
8086 labmanualiravi9
 
Code GPU with CUDA - Memory Subsystem
Code GPU with CUDA - Memory SubsystemCode GPU with CUDA - Memory Subsystem
Code GPU with CUDA - Memory SubsystemMarina Kolpakova
 
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization ApproachesPragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization ApproachesMarina Kolpakova
 
Code GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flowCode GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flowMarina Kolpakova
 
Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...
Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...
Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...Hsien-Hsin Sean Lee, Ph.D.
 
Online test program generator for RISC-V processors
Online test program generator for RISC-V processorsOnline test program generator for RISC-V processors
Online test program generator for RISC-V processorsRISC-V International
 
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...Hsien-Hsin Sean Lee, Ph.D.
 
Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...Marina Kolpakova
 
Code GPU with CUDA - Identifying performance limiters
Code GPU with CUDA - Identifying performance limitersCode GPU with CUDA - Identifying performance limiters
Code GPU with CUDA - Identifying performance limitersMarina Kolpakova
 
15CS44 MP & MC Module 1
15CS44 MP & MC Module 115CS44 MP & MC Module 1
15CS44 MP & MC Module 1RLJIT
 
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...Hsien-Hsin Sean Lee, Ph.D.
 
Happy To Use SIMD
Happy To Use SIMDHappy To Use SIMD
Happy To Use SIMDWei-Ta Wang
 
EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5PRADEEP
 
Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining
Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- PipeliningLec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining
Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- PipeliningHsien-Hsin Sean Lee, Ph.D.
 
BPF - All your packets belong to me
BPF - All your packets belong to meBPF - All your packets belong to me
BPF - All your packets belong to me_xhr_
 

Was ist angesagt? (20)

BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program St...
BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program St...BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program St...
BKK16-503 Undefined Behavior and Compiler Optimizations – Why Your Program St...
 
Code GPU with CUDA - SIMT
Code GPU with CUDA - SIMTCode GPU with CUDA - SIMT
Code GPU with CUDA - SIMT
 
8086 labmanual
8086 labmanual8086 labmanual
8086 labmanual
 
Code GPU with CUDA - Memory Subsystem
Code GPU with CUDA - Memory SubsystemCode GPU with CUDA - Memory Subsystem
Code GPU with CUDA - Memory Subsystem
 
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization ApproachesPragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches
 
Code GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flowCode GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flow
 
Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...
Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...
Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...
 
Online test program generator for RISC-V processors
Online test program generator for RISC-V processorsOnline test program generator for RISC-V processors
Online test program generator for RISC-V processors
 
Ch2 arm-1
Ch2 arm-1Ch2 arm-1
Ch2 arm-1
 
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
 
Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...Pragmatic optimization in modern programming - modern computer architecture c...
Pragmatic optimization in modern programming - modern computer architecture c...
 
Code GPU with CUDA - Identifying performance limiters
Code GPU with CUDA - Identifying performance limitersCode GPU with CUDA - Identifying performance limiters
Code GPU with CUDA - Identifying performance limiters
 
3D-DRESD Lorenzo Pavesi
3D-DRESD Lorenzo Pavesi3D-DRESD Lorenzo Pavesi
3D-DRESD Lorenzo Pavesi
 
DSP_Assign_1
DSP_Assign_1DSP_Assign_1
DSP_Assign_1
 
15CS44 MP & MC Module 1
15CS44 MP & MC Module 115CS44 MP & MC Module 1
15CS44 MP & MC Module 1
 
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
Lec8 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Dynamic Sch...
 
Happy To Use SIMD
Happy To Use SIMDHappy To Use SIMD
Happy To Use SIMD
 
EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5
 
Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining
Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- PipeliningLec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining
Lec1 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Pipelining
 
BPF - All your packets belong to me
BPF - All your packets belong to meBPF - All your packets belong to me
BPF - All your packets belong to me
 

Ähnlich wie Efficient JIT to 32-bit Arches

Berkeley Packet Filters
Berkeley Packet FiltersBerkeley Packet Filters
Berkeley Packet FiltersKernel TLV
 
Chapter Eight(3)
Chapter Eight(3)Chapter Eight(3)
Chapter Eight(3)bolovv
 
Multiplatform JIT Code Generator for NetBSD by Alexander Nasonov
Multiplatform JIT Code Generator for NetBSD by Alexander NasonovMultiplatform JIT Code Generator for NetBSD by Alexander Nasonov
Multiplatform JIT Code Generator for NetBSD by Alexander Nasonoveurobsdcon
 
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...Andrey Karpov
 
PVS-Studio, a solution for resource intensive applications development
PVS-Studio, a solution for resource intensive applications developmentPVS-Studio, a solution for resource intensive applications development
PVS-Studio, a solution for resource intensive applications developmentOOO "Program Verification Systems"
 
BPF Hardware Offload Deep Dive
BPF Hardware Offload Deep DiveBPF Hardware Offload Deep Dive
BPF Hardware Offload Deep DiveNetronome
 
The ARM Architecture: ARM : ARM Architecture
The ARM Architecture: ARM : ARM ArchitectureThe ARM Architecture: ARM : ARM Architecture
The ARM Architecture: ARM : ARM Architecturesreea4
 
Exploring the x64
Exploring the x64Exploring the x64
Exploring the x64FFRI, Inc.
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Akihiro Hayashi
 
64bit SMP OS for TILE-Gx many core processor
64bit SMP OS for TILE-Gx many core processor64bit SMP OS for TILE-Gx many core processor
64bit SMP OS for TILE-Gx many core processorToru Nishimura
 
Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCKernel TLV
 
Arm teaching material
Arm teaching materialArm teaching material
Arm teaching materialJohn Williams
 
Share the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardShare the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardJian-Hong Pan
 

Ähnlich wie Efficient JIT to 32-bit Arches (20)

Berkeley Packet Filters
Berkeley Packet FiltersBerkeley Packet Filters
Berkeley Packet Filters
 
Chapter Eight(3)
Chapter Eight(3)Chapter Eight(3)
Chapter Eight(3)
 
Multiplatform JIT Code Generator for NetBSD by Alexander Nasonov
Multiplatform JIT Code Generator for NetBSD by Alexander NasonovMultiplatform JIT Code Generator for NetBSD by Alexander Nasonov
Multiplatform JIT Code Generator for NetBSD by Alexander Nasonov
 
Arm architecture
Arm architectureArm architecture
Arm architecture
 
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
PVS-Studio 5.00, a solution for developers of modern resource-intensive appl...
 
PVS-Studio, a solution for resource intensive applications development
PVS-Studio, a solution for resource intensive applications developmentPVS-Studio, a solution for resource intensive applications development
PVS-Studio, a solution for resource intensive applications development
 
A Close Look at ARM Code Size
A Close Look at ARM Code SizeA Close Look at ARM Code Size
A Close Look at ARM Code Size
 
BPF Hardware Offload Deep Dive
BPF Hardware Offload Deep DiveBPF Hardware Offload Deep Dive
BPF Hardware Offload Deep Dive
 
The ARM Architecture: ARM : ARM Architecture
The ARM Architecture: ARM : ARM ArchitectureThe ARM Architecture: ARM : ARM Architecture
The ARM Architecture: ARM : ARM Architecture
 
Arm
ArmArm
Arm
 
Exploring the x64
Exploring the x64Exploring the x64
Exploring the x64
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
 
ARM Introduction
ARM IntroductionARM Introduction
ARM Introduction
 
64bit SMP OS for TILE-Gx many core processor
64bit SMP OS for TILE-Gx many core processor64bit SMP OS for TILE-Gx many core processor
64bit SMP OS for TILE-Gx many core processor
 
Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCC
 
Arm teaching material
Arm teaching materialArm teaching material
Arm teaching material
 
Arm teaching material
Arm teaching materialArm teaching material
Arm teaching material
 
Share the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development BoardShare the Experience of Using Embedded Development Board
Share the Experience of Using Embedded Development Board
 
arm-intro.ppt
arm-intro.pptarm-intro.ppt
arm-intro.ppt
 
C programming session10
C programming  session10C programming  session10
C programming session10
 

Mehr von Netronome

Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...Netronome
 
LFSMM AF XDP Queue I-DS
LFSMM AF XDP Queue I-DSLFSMM AF XDP Queue I-DS
LFSMM AF XDP Queue I-DSNetronome
 
LFSMM Verifier Optimizations and 1 M Instructions
LFSMM Verifier Optimizations and 1 M InstructionsLFSMM Verifier Optimizations and 1 M Instructions
LFSMM Verifier Optimizations and 1 M InstructionsNetronome
 
Using Network Acceleration for an Optimized Edge Cloud Server Architecture
Using Network Acceleration for an Optimized Edge Cloud Server ArchitectureUsing Network Acceleration for an Optimized Edge Cloud Server Architecture
Using Network Acceleration for an Optimized Edge Cloud Server ArchitectureNetronome
 
Offloading TC Rules on OVS Internal Ports
Offloading TC Rules on OVS Internal Ports Offloading TC Rules on OVS Internal Ports
Offloading TC Rules on OVS Internal Ports Netronome
 
Quality of Service Ingress Rate Limiting and OVS Hardware Offloads
Quality of Service Ingress Rate Limiting and OVS Hardware OffloadsQuality of Service Ingress Rate Limiting and OVS Hardware Offloads
Quality of Service Ingress Rate Limiting and OVS Hardware OffloadsNetronome
 
ODSA Sub-Project Launch
 ODSA Sub-Project Launch ODSA Sub-Project Launch
ODSA Sub-Project LaunchNetronome
 
Flexible and Scalable Domain-Specific Architectures
Flexible and Scalable Domain-Specific ArchitecturesFlexible and Scalable Domain-Specific Architectures
Flexible and Scalable Domain-Specific ArchitecturesNetronome
 
Unifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPFUnifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPFNetronome
 
Massively Parallel RISC-V Processing with Transactional Memory
Massively Parallel RISC-V Processing with Transactional MemoryMassively Parallel RISC-V Processing with Transactional Memory
Massively Parallel RISC-V Processing with Transactional MemoryNetronome
 
Offloading Linux LAG Devices Via Open vSwitch and TC
Offloading Linux LAG Devices Via Open vSwitch and TCOffloading Linux LAG Devices Via Open vSwitch and TC
Offloading Linux LAG Devices Via Open vSwitch and TCNetronome
 
eBPF Debugging Infrastructure - Current Techniques
eBPF Debugging Infrastructure - Current TechniqueseBPF Debugging Infrastructure - Current Techniques
eBPF Debugging Infrastructure - Current TechniquesNetronome
 
eBPF & Switch Abstractions
eBPF & Switch AbstractionseBPF & Switch Abstractions
eBPF & Switch AbstractionsNetronome
 
eBPF Tooling and Debugging Infrastructure
eBPF Tooling and Debugging InfrastructureeBPF Tooling and Debugging Infrastructure
eBPF Tooling and Debugging InfrastructureNetronome
 
Demystify eBPF JIT Compiler
Demystify eBPF JIT CompilerDemystify eBPF JIT Compiler
Demystify eBPF JIT CompilerNetronome
 
P4 Introduction
P4 Introduction P4 Introduction
P4 Introduction Netronome
 
Host Data Plane Acceleration: SmartNIC Deployment Models
Host Data Plane Acceleration: SmartNIC Deployment ModelsHost Data Plane Acceleration: SmartNIC Deployment Models
Host Data Plane Acceleration: SmartNIC Deployment ModelsNetronome
 
The Power of SmartNICs
The Power of SmartNICsThe Power of SmartNICs
The Power of SmartNICsNetronome
 
DPDK Support for New HW Offloads
DPDK Support for New HW OffloadsDPDK Support for New HW Offloads
DPDK Support for New HW OffloadsNetronome
 

Mehr von Netronome (20)

Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
 
LFSMM AF XDP Queue I-DS
LFSMM AF XDP Queue I-DSLFSMM AF XDP Queue I-DS
LFSMM AF XDP Queue I-DS
 
LFSMM Verifier Optimizations and 1 M Instructions
LFSMM Verifier Optimizations and 1 M InstructionsLFSMM Verifier Optimizations and 1 M Instructions
LFSMM Verifier Optimizations and 1 M Instructions
 
Using Network Acceleration for an Optimized Edge Cloud Server Architecture
Using Network Acceleration for an Optimized Edge Cloud Server ArchitectureUsing Network Acceleration for an Optimized Edge Cloud Server Architecture
Using Network Acceleration for an Optimized Edge Cloud Server Architecture
 
Offloading TC Rules on OVS Internal Ports
Offloading TC Rules on OVS Internal Ports Offloading TC Rules on OVS Internal Ports
Offloading TC Rules on OVS Internal Ports
 
Quality of Service Ingress Rate Limiting and OVS Hardware Offloads
Quality of Service Ingress Rate Limiting and OVS Hardware OffloadsQuality of Service Ingress Rate Limiting and OVS Hardware Offloads
Quality of Service Ingress Rate Limiting and OVS Hardware Offloads
 
ODSA Sub-Project Launch
 ODSA Sub-Project Launch ODSA Sub-Project Launch
ODSA Sub-Project Launch
 
Flexible and Scalable Domain-Specific Architectures
Flexible and Scalable Domain-Specific ArchitecturesFlexible and Scalable Domain-Specific Architectures
Flexible and Scalable Domain-Specific Architectures
 
Unifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPFUnifying Network Filtering Rules for the Linux Kernel with eBPF
Unifying Network Filtering Rules for the Linux Kernel with eBPF
 
Massively Parallel RISC-V Processing with Transactional Memory
Massively Parallel RISC-V Processing with Transactional MemoryMassively Parallel RISC-V Processing with Transactional Memory
Massively Parallel RISC-V Processing with Transactional Memory
 
Offloading Linux LAG Devices Via Open vSwitch and TC
Offloading Linux LAG Devices Via Open vSwitch and TCOffloading Linux LAG Devices Via Open vSwitch and TC
Offloading Linux LAG Devices Via Open vSwitch and TC
 
eBPF Debugging Infrastructure - Current Techniques
eBPF Debugging Infrastructure - Current TechniqueseBPF Debugging Infrastructure - Current Techniques
eBPF Debugging Infrastructure - Current Techniques
 
eBPF & Switch Abstractions
eBPF & Switch AbstractionseBPF & Switch Abstractions
eBPF & Switch Abstractions
 
eBPF Tooling and Debugging Infrastructure
eBPF Tooling and Debugging InfrastructureeBPF Tooling and Debugging Infrastructure
eBPF Tooling and Debugging Infrastructure
 
Demystify eBPF JIT Compiler
Demystify eBPF JIT CompilerDemystify eBPF JIT Compiler
Demystify eBPF JIT Compiler
 
eBPF/XDP
eBPF/XDP eBPF/XDP
eBPF/XDP
 
P4 Introduction
P4 Introduction P4 Introduction
P4 Introduction
 
Host Data Plane Acceleration: SmartNIC Deployment Models
Host Data Plane Acceleration: SmartNIC Deployment ModelsHost Data Plane Acceleration: SmartNIC Deployment Models
Host Data Plane Acceleration: SmartNIC Deployment Models
 
The Power of SmartNICs
The Power of SmartNICsThe Power of SmartNICs
The Power of SmartNICs
 
DPDK Support for New HW Offloads
DPDK Support for New HW OffloadsDPDK Support for New HW Offloads
DPDK Support for New HW Offloads
 

Kürzlich hochgeladen

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Kürzlich hochgeladen (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

Efficient JIT to 32-bit Arches

  • 1. © 2018 NETRONOME Jiong Wang Linux Plumbers Conference Vancouver, Nov, 2018 Efficient JIT to 32-bit Arches 1
  • 2. © 2018 NETRONOME Background  ISA specification and impact on JIT compiler  Default code-gen use 64-bit register, ALU64, JMP64  No big impact on 64-bit backends (REX header for x86-64)  Requires extra regs and insns on 32-bit arches.  Programs in real world are 32-bit friendly  Nearly all non-pointer arithmetic are 32-bit and all will be if pointer is 32-bit 2 38% 62% test_l4lb_noinline.c 32-bit ALU 64-bit ALU + JMP 31% 69% test_xdp_noinline.c 32-bit ALU 64-bit ALU + JMP Arch BPF_ALU/BPF_JMP arm emit_alu_r(rd[1], rs[1], true, false, op, ctx); emit_alu_r(rd[0], rs[0], true, true, op, ctx) emit(ARM_CMP_R(rd, rm), ctx); emit(ARM_COND_EQ, ARM_CMP_R(rt, rn), ctx); nfp emit_alu(…reg_both(dst), reg_a(dst), alu_op, reg_b(src)); emit_alu(…reg_both(dst + 1), reg_a(dst + 1), alu_op, reg_b(src + 1)); …
  • 3. © 2018 NETRONOME Solution A - 32-bit subreg ISA  eBPF ISA already defined 32-bit subreg and ALU32 insns, could pass 32-bit semantics from c types down to assembly  A subreg read is always 32-bit read, but write implicitly zeros high 32-bit. Compilers or hand written assembly could be using this  32-bit arches must model implicitly zero extension using extra insns to guarantee correctness. This applies to all ALU32 insn. How could we avoid these extra code-gen? 3 void cal(u32 *a, u32 *b, u64 *c) { u32 sum = *a + *b; *c = sum; } -mattr=+alu32 cal: w1 = *(u32 *)(r1 + 0) w2 = *(u32 *)(r2 + 0) w2 += w1 *(u64 *)(r3 + 0) = r2 exit ALU32 insn but there is exploit of implicit zero extension on w2 by the following store Reference a 32-bit subreg as a 64-bit definition -mattr=+alu32 (since LLVM 7.0) void cal(u32 *a, u32 *b, u32 *c) { u32 sum = *a + *b; *c = sum; } cal: w1 = *(u32 *)(r1 + 0) w2 = *(u32 *)(r2 + 0) w2 += w1 *(u32 *)(r3 + 0) = w2 exit cal: r1 = *(u32 *)(r1 + 0) r2 = *(u32 *)(r2 + 0) r2 += r1 *(u32 *)(r3 + 0) = r2 exit default
  • 4. © 2018 NETRONOME Solution A - 32-bit subreg ISA  Just don’t do zero extension on the DST_REG of ALU32 at all, do them the first time DST_REG is used as 64-bit. Mark the reg to save on later use  Insns have 64-bit register read including: cond BPF_JMP, BPF_ALU, BPF_STX | BPF_DW  cond JMP overhead could be reduced by introducing BPF_JMP32 insn. X86_64 and AArch64 ISA do support this  A large portion of BPF_ALU comes from address calculation, for example calculating address of local variables. Could potentially avoid this using verifier register type info  Without flow analysis, when a instruction is jump destination (start of basic block), need to clear register mark. This could possibly cause quite a few unnecessary code-gen Benchmark Total insn Total cond JMP JMP32 Percentage test_xdp_noinline 999 84 52 61.9% test_l4lb_noinline 569 40 24 60.0%
  • 5. © 2018 NETRONOME Solution A - 32-bit subreg ISA  LLVM might have better knowledge on all these, so another choice is stopping LLVM exploiting implicit zero extension and generate it explicitly  For explicit zero extension, we can’t use existing ALU32 MOV as otherwise we can’t differentiate it with normally MOV which we don’t want to clear high 32bits. We will need new explicit zero-extension instruction BPF_UEXT  But how could verifier safely know the input sequences is compiled from LLVM compiler conforming to such convention is an issue. Could generate ELF tag, however which could be faked, therefore have potential impact on security
  • 6. © 2018 NETRONOME Solution B – Standalone DF Analyzer 6  Work with any input sequence, for example pure ALU64 sequence. Can’t leverage LLVM’s work  Optimistic  Assume all ALU instructions are 32-bit safe initially  Once find one 64-bit register use, pollute all instructions contributed to the value of this register  Pessimistic  Assume all ALU instructions are 64-bit initially  Mark one ALU insn as 32-bit safe only when all the use of its definition are 32-bit
  • 7. © 2018 NETRONOME Solution B – Standalone DF Analyzer  Amount of 32-bit use info decides amount of opportunities  Without +alu32 code-gen, use info come from  B/H/W STORE  CMP-JMP if reg is zero extended  With +alu32, all ALU32 insn could also produce use info, and analysis could finished quicker r1 r1 r2 r2 r2 r2 r3r2 r0 r1 define use0 use1 r1 = *(u32 *)(r1 + 0) r2 = *(u32 *)(r2 + 0) r2 += r1 *(u32 *)(r3 + 0) = r2 exit w2 r2 w1 r1 w1 w1 w2 …w1 r0 w2 define use0 use1 w2 = *(u32 *)(r2 + 0) w1 = *(u32 *)(r1 + 0) w1 += w2 call foo exit void bar(u32 *a, u32 *b, u32 *c) { u32 sum = *a + *b; } *c = sum *c = foo(sum) 7
  • 8. © 2018 NETRONOME Solution B – Standalone DF Analyzer  A stand-alone, drop-and-work implementation organized as a kernel lib  Implementation available as RFC  Could be enhanced with previous CFG infrastructure https://lwn.net/Articles/753724 to become a global DF analyzer struct bpf_du_chain { struct bpf_du_insn *insn; struct bpf_du_chain *next; unsigned int flags; }; r2 r2 r1 r1 r1 r1 w2 …r1 r0 r2 define use0 use1 r2 = *(u32 *)(r2 + 0) r1 = *(u32 *)(r1 + 0) r1 += r2 *(u32 *)(r3 + 0) = r2 exit struct bpf_du_insn { struct bpf_du_chain *chain[2]; unsigned int flags; }; bpf_df_init run DF passes (bpf_df_pass_32bit_safe) bpf_df_init copy DF result to JIT backend • Partition scope, mark places introducing unknown def and use • Scan insn sequence from start to end, build DU chain • Run pre-requisite pass if any • Mark seed 32-bit insn • Iteratively backward propagate until reached fix-point • Copy DF result kept as flags • Release resources 8
  • 9. © 2018 NETRONOME Solution B – Standalone DF Analyzer  Algorithm pseudo code for local analyzer ext_defs[MAX_BPF_REG] = false; // insn define the current value of R act_defs[MAX_BPF_REG] = NULL; bpf_df_init: for_each_insn_in_the_sequence if (cur_insn == jmp or call) dst_insn.flag |= FLAG_IS_JMP_DST for_each_insn_in_the_sequence if (cur_insn.flag & FLAG_IS_JMP_DST) ext_defs[0..MAX_BPF_REG] = true if (cur_insn == call or exit or jmp ) add_unknown_use(act_def[arg_regs]) or act_defs[0..MAX_BPF_REG] = NULL cur_insn.u2d[0] = act_defs[cur_insn.dst_reg] act_defs[cur_insn.dst_reg].d2u.next = cur_insn; cur_insn.u2d[1] = act_defs[cur_insn.src_reg] act_defs[cur_insn.src_reg].d2u.next = cur_insn; bpf_df_pass_32bit_safe: for_each_insn_in_the_sequence if (cur_insn == ST_B/H/W && has_single_use(cur_insn.u2d[1])) cur_insn.u2d[1].flag |= FLAG_IS_32BIT_SAFE else if (cur_insn == ALU32) { if (has_single_use(cur_insn.u2d[0]) cur_insn.u2d[0].flag |= FLAG_IS_32BIT_SAFE if (BPF_X && has_single_use(cur_insn.u2d[1]) cur_insn.u2d[1].flag |= FLAG_IS_32BIT_SAFE } do { changed = false for_each_insn_in_the_sequence changed |= propagate_32bit_safe(cur_insn) while (changed) propagate_32bit_safe: changed = false if (!(insn.flag & FLAG_IS_32BIT_SAFE)) return false if (insn == alu64 or alu32 except right shift) { if (has_single_use(insn.u2d[0]) insn.u2d[0].flag |= FLAG_IS_32BIT_SAFE if (BPF_X && has_single_use(insn.u2d[1]) insn.u2d[1].flag |= FLAG_IS_32BIT_SAFE } return changed; 9
  • 10. © 2018 NETRONOME Solution B – Standalone DF Analyzer 10  Build global analyser on top of CFG infrastructure  Split each register into hi/lo, hi0 ~ lo10, lo0 ~ lo10, do classic live variable analysis  Calculate LiveIn and LiveOut for hi registers, meaning whether high 32-bit are live (used later) when entering and leaving one basic block. For 32-bit safety, we care about Out(s) which could tell us the use in successor blocks  Gen(s) function is complex than classic Gen(s) as we need insn DU chain to propagate from def to use. Can’t simply finish initialize Gen(s) set using insn scan Out(s) = ∪s′ ∊ succ(s) In(s′) In(s) = Gen(s) ∪ (Out(s) - Kill(s))
  • 11. © 2018 NETRONOME Solution B – Standalone DF Analyzer KERNEL/lib/bpf_core_cfg.c KERNEL/lib/bpf_core_df.c KERNEL/lib/bpf_core_opt_32bit.c driver/nfp driver/arm tool/bpftool Any other places inside kernel tree needs eBPF insn analysis Ideally should compile for both userspace and kernel space userspace compilation mode 11
  • 12. © 2018 NETRONOME Solution C - Enhance Verifier DF analyzer  Verifier has a light “DF analyzer”, designed for releasing more path prune opportunities  It is integrated with path walker, collects and processes information while walking insn  Fit scenarios which do not need memorize historical information, but doesn’t fit well otherwise 12 10: r6 = *(u32 *)(r10 - 4) 11: r7 = *(u32 *)(r10 - 8) 12: *(u64 *)(r10 - 16) = r6 13: *(u64 *)(r10 - 24) = r7 14: r6 += 1 15: r7 += r6 16: *(u32 *)(r10 - 28) = r7 17: r3 += r7 18: r4 = r6 19: *(u64 *)(r10 - 32) = r3 20: *(u64 *)(r10 - 40) = r4 State A State B State C R4 Propagation R3 Propagation reg read propagation 10: r6 = *(u32 *)(r10 - 4) 11: r7 = *(u32 *)(r10 - 8) 12: *(u64 *)(r10 - 16) = r6 13: *(u64 *)(r10 - 24) = r7 14: r6 += 1 15: r7 += r6 16: *(u32 *)(r10 - 28) = r7 17: r3 += r7 18: r4 = r6 19: *(u64 *)(r10 - 32) = r3 20: *(u64 *)(r10 - 40) = r4 64-bit usage propagation R4 Propagation R3 Propagation R7 Propagation R6 Propagation
  • 13. © 2018 NETRONOME Solution C - Enhance Verifier DF analyzer  32-bit analysis algorithm based on verifier DF analyzer  On solution B, one insn is 32-bit safe if its definition has 32-bit use only, this requires def<->use dual link  Verifier DF analyzer is based on code path walking, we don’t know whether all uses has been visited, so can only build use->def singular chain  Therefore, the algorithm is using 64-bit “polluting”. Initially all instructions are considered as 32-bit safe, then whenever there is one 64-bit use of one definition, that insn is polluted as 64-bit  64-bit polluting could propagate from definition to use. For example, A = B + C, definition A has 64-bit use, insn A is polluted, uses B and C must also be 64-bit as they are forming A, therefore definition insn of B and C should be polluted as well 13 r7 += r6 r3 += r7 *(u64 *)(r10 - 28) = r3 insn_mark_stack[stack_idx++] = insn_idx; while (stack_idx) { def_idx = insn_mark_stack[--stack_idx]; aux[def_idx].full_ref = true; u2d0 = aux[def_idx].u2d[0]; if (u2d0 >= 0 && !aux[u2d0].full_ref) insn_mark_stack[stack_idx++] = u2d0; u2d1 = aux[def_idx].u2d[1]; if (u2d1 >= 0 && !aux[u2d1].full_ref) insn_mark_stack[stack_idx++] = u2d1; } Need to pollute insns that define r3, r6 and r7 and any one further backward iteratively.
  • 14. © 2018 NETRONOME Solution C - Enhance Verifier DF analyzer  64-bit read in pruned path need to be propagated upward to parent verifier state  But 64-bit read propagation is more complex, it won’t be screened off by a simple write and such propagation needs to be done iteratively on all relevant registers 10: r6 = *(u32 *)(r10 - 4) 11: r4+=r3 12: *(u64 *)(r10 - 16) = r6 13: *(u64 *)(r10 - 24) = r7 14: r4 = 1 14: r6 += r4 16: r5 += r6 17: r4 += r5 18: r5 = 1 19: r4 += r5 20: *(u64 *)(r10 - 40) = r4 normal reg read propagation 64-bit reg read propagation when propagation start when walking the insn when walking the insn, but could start later when one relevant reg identified as 64-bit need historical info no yes need to record all insns relevant to one reg screen off any write to the reg write to the reg, but source shouldn’t be overlapping with the dest affect other regs no yes instructions involved generating r4, registers in these insns are all relevant 14
  • 15. © 2018 NETRONOME Solution C - Enhance Verifier DF analyzer  Historical information  Insns involved with value generation of the reg  Because reg state is changing alone with insn walking, so should keep the state when walking the insn  For example, r4 in insn 14 contributed to the final value of r4 used in insn 20 which is 64-bit, so it should be propagated to parent state. Insn 15 however screened off r14, if we don’t record the reg state when walking insn 14, we will miss this propagation 15 14: r6 += r4 15: r4 = 1 16: r5 += r6 17: r4 += r5 18: r5 = 1 19: r4 += r5 20: *(u64 *)(r10 - 40) = r4 State A State B struct bpf_insn_aux_data { … struct bpf_reg_state_lite reg_state_lite[MAX_BPF_REG]; struct bpf_verifier_state *parent_vstate; s16 u2d[2]; } struct bpf_reg_state_lite { struct bpf_reg_state *parent; u32 live; }; insn walker could visit one insn recursively (disallowed now), or repeatedly, both would break the chain!
  • 16. © 2018 NETRONOME Solution C - Enhance Verifier DF analyzer  Algorithm pseudo code 16 enum reg_arg_type { SRC_OP_0, SRC_OP64_0, SRC_OP_1, SRC_OP64_1, SRC_OP64_IMP, DST_OP, U_DST_OP, DST_OP_NO_MARK, U_DST_OP_NO_MARK }; check_reg_arg across verifier need to pass finer arg_type depending on the position of the src and whether it is 64bit check_reg_arg(env, insn->src_reg, SRC_OP|64_0, insn_idx); check_reg_arg(env, insn->dst_reg, SRC_OP|64_1, insn_idx); dst_type = no overlap between dst_reg and any src ? U_DST_OP_NO_MARK : DST_OP_NO_MARK; check_reg_arg(env, insn->dst_reg, dst_type, insn_idx); check_reg_arg(reg, type, insn_idx): rstate = cur_frame->regs + insn_idx if (type == SRC_*) { link_use_to_def(rstate, type, insn_idx) mark_reg_read(… type == SRC_*64_*); } else { rstate->live |= REG_LIVE_WRITTEN; if (t == U_DST_OP || t == U_DST_OP_NO_MARK) rstate->live |= REG_LIVE_WRITTEN_UNIQUE; … rstate->def_insn_idx = insn_idx; if (insn_idx != INVALID_INSN_IDX) copy_reg_state_lite(insn_aux[insn_idx].regs_lite, regs); } link_reg_to_def(rstate, type, insn_idx): slot_idx = 0 or 1 depending on type def_idx = rstate->def_insn_idx; insn_aux_data[insn_idx].u2d[slot_idx] = def_idx; mark_reg_read: … if (!64bit_read) return; return mark_reg_read64(…) mark_reg_read64: while (parent) { if (writes && state->live & REG_LIVE_WRITTEN_UNIQUE) break; parent->live |= REG_LIVE_READ64; state = parent; parent = state->parent; writes = true; } if (no_mark_insn) return; return mark_insn_64bit(def_idx); mark_insn_64bit: insn_mark_stack[stack_idx++] = insn_idx; while (stack_idx) { def_idx = insn_mark_stack[--stack_idx]; aux[def_idx].full_ref = true; u2d0 = aux[def_idx].u2d[0]; if (u2d0 >= 0 && !aux[u2d0].full_ref) { insn_mark_stack[stack_idx++] = u2d0; mark_reg_read64(aux[def_idx].u2d0.dst_reg, no_mark_insn) } u2d1 = aux[def_idx].u2d[1]; if (u2d1 >= 0 && !aux[u2d1].full_ref) { insn_mark_stack[stack_idx++] = u2d1; mark_reg_read64(aux[def_idx].u2d1.dst_reg, no_mark_insn) } }
  • 17. © 2018 NETRONOME Which approach to go?  Verifier  Better to focus on verification, offer reliable and fast verification, no bothering of optimization  DF analysis based on dynamic insn walking requires recording historical information  Path prune however make it very difficult to collect such information on such scope  Classic CFG + DF analysis  Reliable and has sophisticated algorithms  Redo LLVM’s work, heavy for kernel space  Reuse information passed down from LLVM through 32-bit subreg ISA  Throw the heavy lift work to user space static compiler who is also really good at  Lack of some instructions (JMP32 etc) is causing trouble  Best to follow this approach and enhance eBPF ISA ? 17 Thank you!