2. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Motivation
ā¢ Automated vulnerability detection techniques
1. Blackbox Fuzzing (no program analysis, no feedback)
2. Whitebox Fuzzing (mostly program analysis)
3. Greybox Fuzzing (no program analysis, but coverage feedback)
3. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Motivation
ā¢ Automated vulnerability detection techniques
1. Blackbox Fuzzing (no program analysis, no feedback)
š Model-Based
Blackbox
Fuzzing
Peach, Spike ā¦
Seed Input
5. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Motivation
ā¢ Automated vulnerability detection techniques
1. Blackbox Fuzzing (no program analysis, no feedback)
š Model-Based
Blackbox
Fuzzing
Input model
Peach, Spike ā¦
Seed Input
š
š
š
Valid input
Semi-valid input
Semi-valid input
Mutated Inputs
6. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Motivation
ā¢ Automated vulnerability detection techniques
1. Blackbox Fuzzing (no program analysis, no feedback)
2. Whitebox Fuzzing (mostly program analysis)
7. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Motivation
ā¢ Automated vulnerability detection techniques
1. Blackbox Fuzzing (no program analysis, no feedback)
2. Whitebox Fuzzing (mostly program analysis)
3. Greybox Fuzzing (no program analysis, but coverage feedback)
š šš šGreybox
Fuzzing
ā¦
Seed Input
8. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Motivation
ā¢ Automated vulnerability detection techniques
1. Blackbox Fuzzing (no program analysis, no feedback)
2. Whitebox Fuzzing (mostly program analysis)
3. Greybox Fuzzing (no program analysis, but coverage feedback)
š šš šGreybox
Fuzzing
ā¦
šš Enqueue
Seed Input Mutated Inputs
Queue of āØ
āinterestingā seeds
Retain inputsāØ
that increaseāØ
coverage!
9. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Motivation
ā¢ Automated vulnerability detection techniques
1. Blackbox Fuzzing (no program analysis, no feedback)
2. Whitebox Fuzzing (mostly program analysis)
3. Greybox Fuzzing (no program analysis, but coverage feedback)
š šš šGreybox
Fuzzing
ā¦
šš EnqueueDequeue
Seed Input Mutated Inputs
Queue of āØ
āinterestingā seeds
Retain inputsāØ
that increaseāØ
coverage!
10. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Motivation
ā¢ Greybox Fuzzing is frequently used
ā¢ State-of-the-art in automated vulnerability detection
ā¢ Extremely eļ¬cient coverage-based input generation
ā¢ All program analysis before/at instrumentation time.
ā¢ Start with a seed corpus, choose a seed ļ¬le, fuzz it.
ā¢ Add to corpus only if new input increases coverage.
11. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Motivation
ā¢ Greybox Fuzzing is frequently used
ā¢ State-of-the-art in automated vulnerability detection
ā¢ Extremely eļ¬cient coverage-based input generation
ā¢ All program analysis before/at instrumentation time.
ā¢ Start with a seed corpus, choose a seed ļ¬le, fuzz it.
ā¢ Add to corpus only if new input increases coverage.
ā¢ Cannot be directed!
12. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Motivation
ā¢ Directed Fuzzing has many applications
ā¢ Patch Testing: reach changed statements
ā¢ Crash Reproduction: exercise stack trace
ā¢ SA Report Veriļ¬cation: reach ādangerousā location
ā¢ Information Flow Detection: exercise source-sink pairs
13. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Motivation
ā¢ Directed Fuzzing: classical constraint satisfaction prob.
ā¢ Program analysis to identify program paths āØ
that reach given program locations.
ā¢ Symbolic Execution to derive path conditions āØ
for any of the identiļ¬ed paths.
ā¢ Constraint Solving to ļ¬nd an input that
ā¢ satisļ¬es the path condition and thus
ā¢ reaches a program location that was given.
Ļ1 = (x>y)ā§(x+y>10)āØ
Ļ2 = Ā¬(x>y)ā§(x+y>10)
x > y
a = x a = y
x+y>10
b = a
return b
14. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Motivation
ā¢ Directed Fuzzing: classical constraint satisfaction prob.
ā¢ Program analysis to identify program paths āØ
that reach given program locations.
ā¢ Symbolic Execution to derive path conditions āØ
for any of the identiļ¬ed paths.
ā¢ Constraint Solving to ļ¬nd an input that
ā¢ satisļ¬es the path condition and thus
ā¢ reaches a program location that was given.
Ļ1 = (x>y)ā§(x+y>10)āØ
Ļ2 = Ā¬(x>y)ā§(x+y>10)
x > y
a = x a = y
x+y>10
b = a
return b
RequiresāØ
heavy-weightāØ
machinery!
15. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Overview
ā¢ Directed Fuzzing as optimisation problem!
1. Instrumentation Time:
1. Extract call graph (CG) and control-ļ¬ow graphs (CFGs).
2. For each BB, compute distance to target locations.
3. Instrument program to aggregate distance values.
16. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Overview
ā¢ Directed Fuzzing as optimisation problem!
1. Instrumentation Time:
1. Extract call graph (CG) and control-ļ¬ow graphs (CFGs).
2. For each BB, compute distance to target locations.
3. Instrument program to aggregate distance values.
2. Runtime, for each input
1. collect coverage and distance information, and
2. decide how long to be fuzzed based on distance.
ā¢ If input is closer to the targets, it is fuzzed for longer.
ā¢ If input is further away from the targets, it is fuzzed for shorter.
17. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Instrumentation
ā¢ Function-level target distance using call graph (CG)
main
a b
cd e
18. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Instrumentation
ā¢ Function-level target distance using call graph (CG)
1. Identify target functions in CG
main
a b
cd e
19. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
main
a b
cd e
Instrumentation
ā¢ Function-level target distance using call graph (CG)
1. Identify target functions in CG
2. For each function, compute theāØ
harmonic mean of the length āØ
of the shortest path to targets
1
2
0
2
3N/A
20. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Instrumentation
ā¢ Function-level target distance using call graph (CG)
ā¢ BB-level target distance using CFG
CFG for function b
main
a b
cd e
1
2
0
2
3N/A
21. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Instrumentation
ā¢ Function-level target distance using call graph (CG)
ā¢ BB-level target distance using control-ļ¬ow graph (CFG)
1. Identify target BBs andāØ
assign distance 0āØ
(none in function b)
CFG for function b
main
a b
cd e
1
2
0
2
3N/A
22. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Instrumentation
ā¢ Function-level target distance using call graph (CG)
ā¢ BB-level target distance using control-ļ¬ow graph (CFG)
1. Identify target BBs andāØ
assign distance 0
2. Identify BBs thatāØ
call functions
CFG for function b
main
a b
cd e
1
2
0
2
3N/A
c
a
23. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Instrumentation
ā¢ Function-level target distance using call graph (CG)
ā¢ BB-level target distance using control-ļ¬ow graph (CFG)
1. Identify target BBs andāØ
assign distance 0
2. Identify BBs thatāØ
call functions andāØ
assign 10*FLTD
CFG for function b
main
a b
cd e
1
2
0
2
3N/A
c
a 10
30
24. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Instrumentation
ā¢ Function-level target distance using call graph (CG)
ā¢ BB-level target distance using control-ļ¬ow graph (CFG)
1. Identify target BBs andāØ
assign distance 0
2. Identify BBs thatāØ
call functions andāØ
assign 10*FLTD
3. For each BB, compute harmonicāØ
mean of (length of shortest path toāØ
any function-calling BB + 10*FLTD).
CFG for function b
c
a 10
30
[(1+30)-1+(2+10)-1]-1
25. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Instrumentation
ā¢ Function-level target distance using call graph (CG)
ā¢ BB-level target distance using control-ļ¬ow graph (CFG)
1. Identify target BBs andāØ
assign distance 0
2. Identify BBs thatāØ
call functions andāØ
assign 10*FLTD
3. For each BB, compute harmonicāØ
mean of (length of shortest path toāØ
any function-calling BB + 10*FLTD).
CFG for function b
8.7
11
10
30
13
12
N/A
26. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Runtime
ā¢ Function-level target distance using call graph (CG)
ā¢ BB-level target distance using control-ļ¬ow graph (CFG)
ā¢ Seed distance from instrumented binary
CFG for function b
8.7
11
10
30
13
12
N/A
27. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Runtime
ā¢ Function-level target distance using call graph (CG)
ā¢ BB-level target distance using control-ļ¬ow graph (CFG)
ā¢ Seed distance from instrumented binary
ā¢ Two 64-bit shared memory entries
ā¢ Aggregated BB-level distance values
ā¢ Number of executed BBs
Seed Distance: 19.4 āØ
= (8.7+30)/2
8.7
11
10
30
13
12
N/A
28. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Runtime
ā¢ Function-level target distance using call graph (CG)
ā¢ BB-level target distance using control-ļ¬ow graph (CFG)
ā¢ Seed distance from instrumented binary
ā¢ Two 64-bit shared memory entries
ā¢ Aggregated BB-level distance values
ā¢ Number of executed BBs
8.7
11
10
30
13
12
N/A
Seed Distance: 10.4 āØ
= (8.7+11+10+12)/4
29. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Runtime
ā¢ Function-level target distance using call graph (CG)
ā¢ BB-level target distance using control-ļ¬ow graph (CFG)
ā¢ Seed distance from instrumented binary
ā¢ Two 64-bit shared memory entries
ā¢ Aggregated BB-level distance values
ā¢ Number of executed BBs
8.7
11
10
30
13
12
N/A
Seed Distance: 10.4 āØ
= (8.7+11+10+12)/4
Now that we know how toāØ
compute seed distance,
ā letās minimise it! ā
30. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
ā¢ Background: Coverage-based Greybox Fuzzing
Directed Fuzzing as āØ
Optimisation Problem
Seed File
š
32. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
ā¢ Background: Coverage-based Greybox Fuzzing
Directed Fuzzing as āØ
Optimisation Problem
š šš šGreybox
Fuzzing
ā¦
Seed Input
33. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
ā¢ Background: Coverage-based Greybox Fuzzing
Directed Fuzzing as āØ
Optimisation Problem
š šš šGreybox
Fuzzing
ā¦
šš Enqueue
Seed Input Mutated Inputs
Queue of āØ
āinterestingā seeds
Retain inputsāØ
that increaseāØ
coverage!
34. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
ā¢ Background: Coverage-based Greybox Fuzzing
Directed Fuzzing as āØ
Optimisation Problem
š šš šGreybox
Fuzzing
ā¦
šš EnqueueDequeue
Seed Input Mutated Inputs
Queue of āØ
āinterestingā seeds
Retain inputsāØ
that increaseāØ
coverage!
35. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
ā¢ Background: Coverage-based Greybox Fuzzing
Directed Fuzzing as āØ
Optimisation Problem
šš
Queue of āØ
āinterestingā seeds
šš š
energy
low energyāØ
- generated lessāØ
test inputs!
36. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
ā¢ Background: Coverage-based Greybox Fuzzing
Directed Fuzzing as āØ
Optimisation Problem
šš
Queue of āØ
āinterestingā seeds
šš š
high energyāØ
- generated moreāØ
test inputs!š
energy
37. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Directed Fuzzing as āØ
Optimisation Problem
ā¢ Background: Coverage-based Greybox Fuzzing
ā¢ Seedās energy:
ā¢ Number of inputs generated when chosen for fuzzing
ā¢ Local property: each seed has its own energy
ā¢ Power schedule:
ā¢ Assigns energy to seeds according to a pre-deļ¬ned formula
ā Boosted Greybox Fuzzing (AFLFast CCSā16)
ā¢ Assign more energy to seeds exercising low-frequency paths.
ā Directed Greybox Fuzzing (AFLGo CCSā17)
ā¢ Assign more energy to seeds that a closer to the given targets!
38. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Directed Fuzzing as āØ
Optimisation Problem
ā¢ Background: Coverage-based Greybox Fuzzing
ā¢ Seedās energy:
ā¢ Number of inputs generated when chosen for fuzzing
ā¢ Local property: each seed has its own energy
ā¢ Power schedule:
ā¢ Assigns energy to seeds according to a pre-deļ¬ned formula
ā Boosted Greybox Fuzzing (AFLFast CCSā16)
ā¢ Assign more energy to seeds exercising low-frequency paths.
ā Directed Greybox Fuzzing (AFLGo CCSā17)
ā¢ Assign more energy to seeds that a closer to the given targets!
39. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
ā¢ Directed Greybox Fuzzing
ā¢ Assign more energy to seeds that a closer to the given targets!
ā¢ Problem (Stochastic Gradient Descent)
ā¢ If we always assign more energy to closer seeds, āØ
we typically reach only a local minimum,āØ
but never a global minimum distance!
ā¢ Solution (Simulated Annealing)
ā¢ Sometimes assign more energy to further-away seeds!
ā¢ Approaches global minimum distance.
Directed Fuzzing as āØ
Optimisation Problem
40. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
ā¢ Directed Greybox Fuzzing
ā¢ Assign more energy to seeds that a closer to the given targets!
ā¢ Problem (Stochastic Gradient Descent)
ā¢ If we always assign more energy to closer seeds, āØ
we typically reach only a local minimum,āØ
but never a global minimum distance!
ā¢ Solution (Simulated Annealing)
ā¢ Sometimes assign more energy to further-away seeds!
ā¢ Approaches global minimum distance.
Directed Fuzzing as āØ
Optimisation Problem
41. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Directed Fuzzing as āØ
Optimisation Problem
ā¢ Simulated Annealing (SA)
ā¢ Exploration phase:
ā¢ Energy of closer seeds similar to energyāØ
of further-away seeds
ā¢ Exploitation phase:
ā¢ Energy of closer seeds is assigned āØ
to be higher and higher
ā¢ Energy of further-away seeds is āØ
assigned to be lower and lower
ā¢ We are increasing the āimportanceā āØ
of seed distance over time.
42. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Directed Fuzzing as āØ
Optimisation Problem
ā¢ Simulated Annealing (SA)
ā¢ Annealing from metallurgy: control the cooling of materialāØ
to reduce defects (e.g., cracks or bubbles) in the material.
ā¢ Temperature T ā [0,1] speciļ¬es āimportanceā of distance.
ā¢ At T=1, exploration (normal AFL)
ā¢ At T=0, exploitation (gradient descent)
ā¢ Cooling schedule controls (global) temperature
ā¢ Classically, exponential cooling.
43. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
ā¢ Integrating Simulated Annealing as power schedule
ā¢ In the beginning (t = 0min), āØ
assign the same energyāØ
to all seeds.
Directed Fuzzing as āØ
Optimisation Problem
dene the normalized seed
set of target locations Tb .
s. This trace contains the
ed distance d(s,Tb ) as
(m,Tb )
|
(3)
set S of seeds to fuzz. We
,Tb ) as the dierence be-
he minimum seed distance
by the dierence between
0.00
0.25
0.50
0.75
1.00
0.00 0.25 0.50 0.75 1.00
Distance d(s,Tb)
Energyp(s,Tb)
t = 0min t =10min t = 80min
44. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
ā¢ Integrating Simulated Annealing as power schedule
ā¢ In the beginning (t = 0min), āØ
assign the same energyāØ
to all seeds.
ā¢ Later (t=10min), assignāØ
a bit more energy toāØ
seeds that are closer.āØ
Directed Fuzzing as āØ
Optimisation Problem
dene the normalized seed
set of target locations Tb .
s. This trace contains the
ed distance d(s,Tb ) as
(m,Tb )
|
(3)
set S of seeds to fuzz. We
,Tb ) as the dierence be-
he minimum seed distance
by the dierence between
0.00
0.25
0.50
0.75
1.00
0.00 0.25 0.50 0.75 1.00
Distance d(s,Tb)
Energyp(s,Tb)
t = 0min t =10min t = 80min
45. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
ā¢ Integrating Simulated Annealing as power schedule
ā¢ In the beginning (t = 0min), āØ
assign the same energyāØ
to all seeds.
ā¢ Later (t=10min), assignāØ
a bit more energy toāØ
seeds that are closer.
ā¢ At exploitation (t=80min),āØ
assign maximal energy toāØ
seeds that are closest.
Directed Fuzzing as āØ
Optimisation Problem
dene the normalized seed
set of target locations Tb .
s. This trace contains the
ed distance d(s,Tb ) as
(m,Tb )
|
(3)
set S of seeds to fuzz. We
,Tb ) as the dierence be-
he minimum seed distance
by the dierence between
0.00
0.25
0.50
0.75
1.00
0.00 0.25 0.50 0.75 1.00
Distance d(s,Tb)
Energyp(s,Tb)
t = 0min t =10min t = 80min
46. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
ā¢ Integrating Simulated Annealing as power schedule
ā¢ In the beginning (t = 0min), āØ
assign the same energyāØ
to all seeds.
ā¢ Later (t=10min), assignāØ
a bit more energy toāØ
seeds that are closer.
ā¢ At exploitation (t=80min),āØ
assign maximal energy toāØ
seeds that are closest.
Directed Fuzzing as āØ
Optimisation Problem
0.00
0.25
0.50
0.75
1.00
0.00 0.25 0.50 0.75 1.00
Distance d(s,Tb)
Energyp(s,Tb)
t = 0min t =10min t = 80min
0.00
0.25
0.50
0.75
1.00
0 20 40 60 80
Current time t (in min)
Energyp(s,Tb)
d = 1 d = 0.5 d = 0
47. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Results
ā¢ Patch Testing: Reach changed statements
ā¢ State-of-the-art in patch testing
ā¢ KATCH (based on Klee symbolic exec. tool)
ā¢ Experimental Setup
ā¢ Reuse original KATCH-benchmark
ā¢ Measure patch coverage (#changed BBs reached)
ā¢ Measure vuln. detection (#errors discovered)
KATCH: High-Coverage Testing of Software Patches
Paul Dan Marinescu
Department of Computing
Imperial College London, UK
p.marinescu@imperial.ac.uk
Cristian Cadar
Department of Computing
Imperial College London, UK
c.cadar@imperial.ac.uk
ABSTRACT
One of the distinguishing characteristics of software systems
is that they evolve: new patches are committed to software
repositories and new versions are released to users on a
continuous basis. Unfortunately, many of these changes
bring unexpected bugs that break the stability of the system
or aāµect its security. In this paper, we address this problem
using a technique for automatically testing code patches.
Our technique combines symbolic execution with several
novel heuristics based on static and dynamic program anal-
!#$%'$!(
Figure 1: KATCH is integrated in the software
48. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Results
ā¢ Patch Testing: Reach changed statements
ā¢ State-of-the-art in patch testing
ā¢ KATCH (based on Klee symbolic exec. tool)
ā¢ Patch Coverage (#changed BBs reached)
is unrelated to the concept [32]. We make a conscious eort to
explain the observed phenomena and distinguish conceptual from
technical origins. Moreover, we encourage the reader to consider
the perspective of a security researcher who is actually handling
these tools to establish whether there exists a vulnerability.
5.1 Patch Coverage
We begin by analyzing the patch coverage achieved by both K
and AFLG as measured by the number of previously uncovered
basic blocks that were changed in the respective patch.
Table 1: Patch coverage results showing the number of previ-
ously uncovered targets that K and AFLG could cover
in the stipulated time budget, respectively.
#Changed #Uncovered
Basic Blocks Changed BBs K AFLG
Binutils 852 702 135 159
Diutils 166 108 63 64
Sum 1018 810 198 223
complex inp
executed onl
for certain a
dened stru
higher-quali
approach [2
To unders
we investiga
we can see
cannot cove
cover. We at
vidual streng
dicult con
be dicult t
quickly exp
stuck in a pa
AFLG and
282 targets
cover indi
49. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Results
ā¢ Patch Testing: Reach changed statements
ā¢ State-of-the-art in patch testing
ā¢ KATCH (based on Klee symbolic exec. tool)
ā¢ Patch Coverage (#changed BBs reached)
ā¢ While we would expect Klee to take a substantial lead, āØ
AFLGo outperforms KATCH in terms of patch coverage.
is unrelated to the concept [32]. We make a conscious eort to
explain the observed phenomena and distinguish conceptual from
technical origins. Moreover, we encourage the reader to consider
the perspective of a security researcher who is actually handling
these tools to establish whether there exists a vulnerability.
5.1 Patch Coverage
We begin by analyzing the patch coverage achieved by both K
and AFLG as measured by the number of previously uncovered
basic blocks that were changed in the respective patch.
Table 1: Patch coverage results showing the number of previ-
ously uncovered targets that K and AFLG could cover
in the stipulated time budget, respectively.
#Changed #Uncovered
Basic Blocks Changed BBs K AFLG
Binutils 852 702 135 159
Diutils 166 108 63 64
Sum 1018 810 198 223
complex inp
executed onl
for certain a
dened stru
higher-quali
approach [2
To unders
we investiga
we can see
cannot cove
cover. We at
vidual streng
dicult con
be dicult t
quickly exp
stuck in a pa
AFLG and
282 targets
cover indi
50. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Results
ā¢ Patch Testing: Reach changed statements
ā¢ State-of-the-art in patch testing
ā¢ KATCH (based on Klee symbolic exec. tool)
ā¢ Patch Coverage (#changed BBs reached)
ā¢ While we would expect Klee to take a substantial lead, āØ
AFLGo outperforms KATCH in terms of patch coverage.
ā¢ BUT: Together they cover 42% and 26% āØ
more than AFLGo and KATCH individually. āØ
They complement each other!āØ
oject Tools diff, sdiff, diff3, cmp
ogram Size 42,930 LoC
n Commits 175 commits from Novā09āMayā12
GNU Diutils
oject Tools
addr2line, ar, cxxfilt, elfedit, nm,
objcopy, objdump, ranlib, readelf
size, strings, strip
ogram Size 68,830 LoC + 800kLoC from libraries
n Commits 181 commits from Aprā11āAugā12
GNU Binutils
K ā 59 139 84 ā AFLG
51. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Results
ā¢ Patch Testing: Reach changed statements
ā¢ State-of-the-art in patch testing
ā¢ KATCH (based on Klee symbolic exec. tool)
ā¢ Patch Coverage (#changed BBs reached)
ā¢ Vulnerability Detection (#errors discovered)
As future work, we are planning to integrate symbolic-execution-
based directed whitebox fuzzing and directed greybox fuzzing to
achieve a directed fuzzing technique that is both very eective and
very ecient in terms of reaching pre-specied target locations.
We believe that such an integration would be superior to each
technique individually. An integrated directed fuzzing technique
that leverages both symbolic execution and search as optmization
problem would be able to draw on their combined strengths to
mitigate their indiviual weaknesses. Driller [38] is an example of
an (undirected) fuzzer that integrates the AFL greybox fuzzer and
the K whitebox fuzzer.
5.2 Vulnerability Detection
Table 2: Showing the number of previously unreported bugs
found by AFLG (reported Aprā2017) in addition to the num-
ber of bug reports for K (reported Febā2013).
K AFLG
#Reports #Reports14 #New Reports #CVEs
Binutils 7 4 12 7
Diutils 0 N/A 1 0
Sum 7 4 13 7
AFLG found 13 previously unreported bugs in addition to 4 of
Table 3: Bug rep
Report-ID
Binutils
21408
21409
21412
21414
21415
21417
Diutils http://lists.g
We attribute m
to the eciency of
program analysis a
magnitute more in
is the runtime che
error detection mec
the program when
we instrumented o
Sanitizer (ASAN). I
e.g., by reading be
writing memory th
a SEGFAULT even
fuzzer uses this sig
for a generated in
(i.e., interprets) th
and uses constrain
52. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Results
ā¢ Patch Testing: Reach changed statements
ā¢ State-of-the-art in patch testing
ā¢ KATCH (based on Klee symbolic exec. tool)
ā¢ Patch Coverage (#changed BBs reached)
ā¢ Vulnerability Detection (#errors discovered)
ā¢ AFLGo found 13 previously unreported bugs (7 CVEs) āØ
in addition to 4 of the 7 bugs that were found by KATCH.
As future work, we are planning to integrate symbolic-execution-
based directed whitebox fuzzing and directed greybox fuzzing to
achieve a directed fuzzing technique that is both very eective and
very ecient in terms of reaching pre-specied target locations.
We believe that such an integration would be superior to each
technique individually. An integrated directed fuzzing technique
that leverages both symbolic execution and search as optmization
problem would be able to draw on their combined strengths to
mitigate their indiviual weaknesses. Driller [38] is an example of
an (undirected) fuzzer that integrates the AFL greybox fuzzer and
the K whitebox fuzzer.
5.2 Vulnerability Detection
Table 2: Showing the number of previously unreported bugs
found by AFLG (reported Aprā2017) in addition to the num-
ber of bug reports for K (reported Febā2013).
K AFLG
#Reports #Reports14 #New Reports #CVEs
Binutils 7 4 12 7
Diutils 0 N/A 1 0
Sum 7 4 13 7
AFLG found 13 previously unreported bugs in addition to 4 of
Table 3: Bug rep
Report-ID
Binutils
21408
21409
21412
21414
21415
21417
Diutils http://lists.g
We attribute m
to the eciency of
program analysis a
magnitute more in
is the runtime che
error detection mec
the program when
we instrumented o
Sanitizer (ASAN). I
e.g., by reading be
writing memory th
a SEGFAULT even
fuzzer uses this sig
for a generated in
(i.e., interprets) th
and uses constrain
53. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Results
ā¢ Crash Reproduction: Exercise stack trace
ā¢ State-of-the-art in crash reproduction
ā¢ BugRedux (based on Klee symbolic exec. tool)
ā¢ Experimental Setup
ā¢ Reuse original BugRedux-benchmark
ā¢ Determine whether or not crash can be reproduced
Software
developer
Instrumenter
Application Instrumented
application
Crash report
(execution data)
Analyzer
BugRedux
Test input
Debugging
tool
Software
tester
Figure 4. Intuitive high-level view of BUGREDUX.
Test input
Oracle
54. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Results
ā¢ Crash Reproduction: Exercise stack trace
ā¢ State-of-the-art in crash reproduction
ā¢ BugRedux (based on Klee symbolic exec. tool)
ā¢ Experimental Setup
ā¢ Reuse original BugRedux-benchmark
ā¢ Determine whether or not crash can be reproduced
experimental setup, starting conguration, and timeouts. BugRedux
[18] is a directed whitebox fuzzer based on K, takes as input a
sequence of program statements, and generates as output a test case
that exercises that sequence and crashes the program. It was shown
that BugRedux works best of the complete method-call sequence is
provided that lead to the crash. However, as discussed earlier often
only the stack-trace is available, which does not contain methods
that have already āreturnedā, i.e., nished execution. Hence, for our
comparison, we set the method-calls in the stack trace as targets.
Despite our request for all subjects from the original dataset, only
a subset of nine subjects could be located for us. For two subjects
(exim, xmail), we could not obtain the stack-trace that would specify
the target locations. Specically, the crash in exim can only be
reproduced on 32bit architectures while the crash in xmail overows
the stack such that the stack-trace is overridden. The results for the
remaining seven subjects are shown in Table 6.
Table 6: Bugs reproduced for the original BugRedux subjects
Subjects BugRedux AFLG Comments
sed.fault1 7 7 Takes two les as input
sed.fault2 7 3
grep 7 3
gzip.fault1 7 3
gzip.fault2 7 3
ncompress 3 3
polymorph 3 3
same benchmarks that th
in the original papers [18
The second concern is
a study minimizes system
nal validity for fuzzer exp
However, for our experi
was readily available, suc
Binutils and the K ex
OSS-Fuzz experiments, a
classically provides for th
when comparing two fuzz
seed corpus such that bot
Second, like implementati
faithfully implement the
shown in the comparison
The third concern is co
a test measures what it c
note that results of tool co
grain of salt. An empirical
implementations of two c
selves. Improving the ec
fuzzer may only be a que
lated to the concept [32].
explain the observed phen
technical origins. Moreov
the perspective of a secur
these tools to establish w
AFLGo reproduces āØ
3 times more crashes!
55. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Result Summary
ā¢ Our directed greybox fuzzer (AFLGo) outperforms āØ
symbolic execution-based directed fuzzers (KATCH BugRedux)
ā¢ in terms of reaching more target locations and
ā¢ in terms of detecting more vulnerabilities,
ā¢ on their own, original benchmark sets.
56. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Conclusion
Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Motivation
ā¢ Directed Fuzzing: classical constraint satisfaction prob.
ā¢ Program analysis to identify program paths that reach given
program locations.
ā¢ Symbolic Execution to derive path conditions for any of the
identiļ¬ed paths.
ā¢ Constraint Solving to ļ¬nd an input that
ā¢ satisļ¬es the path condition and thus
ā¢ reaches a program location that was given.
RequiresāØ
heavy-weightāØ
machinery!
Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Overview
ā¢ Directed Fuzzing as optimisation problem!
1. Instrumentation Time:
1. Extract call graph (CG) and control-ļ¬ow graphs (CFGs).
2. For each BB, compute distance to target locations.
3. Instrument program to aggregate distance values.
2. Runtime, for each input
1. collect coverage and distance information, and
2. decide how long to be fuzzed based on distance.
ā¢ If input is closer to the targets, it is fuzzed for longer.
ā¢ If input is further away from the targets, it is fuzzed for shorter.
Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
ā¢ Integrating Simulated Annealing as power schedule
ā¢ In the beginning (t = 0min), āØ
assign the same energyāØ
to all seeds.
ā¢ Later (t=10min), assignāØ
a bit more energy toāØ
seeds that are closer.
ā¢ At exploitation (t=80min),āØ
assign maximal energy toāØ
seeds that are closest.
Directed Fuzzing as āØ
Optimisation Problem
0.00
0.25
0.50
0.75
1.00
0.00 0.25 0.50 0.75 1.00
Distance d(s,Tb)
Energyp(s,Tb)
t = 0min t =10min t = 80min
0.00
0.25
0.50
0.75
1.00
0 20 40 60 80
Current time t (in min)
Energyp(s,Tb)
d = 1 d = 0.5 d = 0
Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Result Summary
ā¢ Our directed greybox fuzzer (AFLGo) outperforms āØ
symbolic execution-based directed fuzzers (KATCH BugRedux)
ā¢ in terms of reaching more target locations and
ā¢ in terms of detecting more vulnerabilities,
ā¢ on their own, original benchmark sets.
*Tool comparisons should always be taken with a grain of salt.
*
57. Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Conclusion
Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Motivation
ā¢ Directed Fuzzing: classical constraint satisfaction prob.
ā¢ Program analysis to identify program paths that reach given
program locations.
ā¢ Symbolic Execution to derive path conditions for any of the
identiļ¬ed paths.
ā¢ Constraint Solving to ļ¬nd an input that
ā¢ satisļ¬es the path condition and thus
ā¢ reaches a program location that was given.
RequiresāØ
heavy-weightāØ
machinery!
Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Overview
ā¢ Directed Fuzzing as optimisation problem!
1. Instrumentation Time:
1. Extract call graph (CG) and control-ļ¬ow graphs (CFGs).
2. For each BB, compute distance to target locations.
3. Instrument program to aggregate distance values.
2. Runtime, for each input
1. collect coverage and distance information, and
2. decide how long to be fuzzed based on distance.
ā¢ If input is closer to the targets, it is fuzzed for longer.
ā¢ If input is further away from the targets, it is fuzzed for shorter.
Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
ā¢ Integrating Simulated Annealing as power schedule
ā¢ In the beginning (t = 0min), āØ
assign the same energyāØ
to all seeds.
ā¢ Later (t=10min), assignāØ
a bit more energy toāØ
seeds that are closer.
ā¢ At exploitation (t=80min),āØ
assign maximal energy toāØ
seeds that are closest.
Directed Fuzzing as āØ
Optimisation Problem
0.00
0.25
0.50
0.75
1.00
0.00 0.25 0.50 0.75 1.00
Distance d(s,Tb)
Energyp(s,Tb)
t = 0min t =10min t = 80min
0.00
0.25
0.50
0.75
1.00
0 20 40 60 80
Current time t (in min)
Energyp(s,Tb)
d = 1 d = 0.5 d = 0
Presented by Abhik Roychoudhury
Directed Greybox Fuzzing
Result Summary
ā¢ Our directed greybox fuzzer (AFLGo) outperforms āØ
symbolic execution-based directed fuzzers (KATCH BugRedux)
ā¢ in terms of reaching more target locations and
ā¢ in terms of detecting more vulnerabilities,
ā¢ on their own, original benchmark sets.
*Tool comparisons should always be taken with a grain of salt.
*
Questions?