3. 3
Outline
1. Intro & Motivation
2. VM forking nuts & bolts
3. Kernel fuzzing with AFL on Xen
4. Malware fuzzing & Memory replay PoC
5. What’s next & challenges
6. Q&A
4. 4
# whoami
Senior Security Researcher @ Intel
Maintainer of Xen’s introspection subsystem
Maintainer of LibVMI
– Hypervisor-agnostic introspection library (Xen, KVM, Bareflank, etc.)
– Lots of convenient APIs for doing introspection
Background in malware research & black-box binary analysis
5. 5
Why fuzzing?
Time-tested approach to software validation
Super simple, very effective
Watch the 36C3 talk “No Source, no problem! High speed binary fuzzing” for a good intro to fuzzing
Requires some setup & writing a harness
The harder it is to write the harness, the less likely it will be done
How do you create a coverage trace for the kernel?
How do you recover fast enough for fuzzing to be effective?
6. 6
Kernel fuzzers do exist
syzkaller
– Linux syscall fuzzer with built-in coverage guidance
– https://github.com/google/syzkaller
kAFL
– KVM-based, uses AFL, coverage via Intel PT & PML
– https://github.com/RUB-SysSec/kAFL
Chocolate milk
– Custom bootloader & hypervisor, all in Rust
– https://github.com/gamozolabs/chocolate_milk
7. 7
Why make another one?
All of these platforms are very tightly coupled to their use-case
We wanted something stable but also flexible to build on
Prefer code that’s upstream, to cut down on the time it takes to maintain custom
patches & debug things when they break
Xen’s VMI subsystem is still experimental but fits the bill
Also allows us to consider new types of fuzzing approaches
Also allows us to target new use-cases
– Malware fuzzing!
8. 8
Why VM forking?
We need a way to restore VMs to a start point quickly after each fuzz cycle
Restoring from a save-file can take up to 2s
Even from a fast SSD or tmpfs
For fuzzing to be effective we need to be faster than that
Xen has a long-forgotten, half abandoned subsystem:
– Memory sharing!
Should be possible to use it to create forks in a fast & lightweight manner
9. 9
Memory sharing code archeology
First implemented by Citrix in 2009
Fairly active development until ~2012
Pretty much abandoned afterwards
As expected, it had some bit-rot over the years
But for the most part it still “just works”!
10. 10
Memory sharing
1. Enable memory sharing for each participating domain
2. Nominate a page for sharing
– Page ownership transferred to the dom_cow domain
– Page is marked read-only in the original domain’s p2m (ie. EPT)
3. Multiple domains can now map this shared page
– Page contents are NOT checked, this is not KSM!
4. When EPT faults due to a write access, unshare the page for the faulting
domain (give it a private copy) and update its p2m to point to the new page
5. When no domain uses the shared page anymore, it’s released from dom_cow
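As a toy model of the five steps above (not Xen's code), the following C sketch tracks one page shared between two domains. `page_t`, `domain_t`, `nominate()`, `share_into()`, `write_byte()` and `drop_share()` are hypothetical names; a flag stands in for the read-only EPT permission, and the transfer of ownership to dom_cow is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

typedef struct {
    unsigned char data[TOY_PAGE_SIZE];
    int sharers;              /* domains mapping this page while it is shared */
} page_t;

typedef struct {
    page_t *entry;            /* one-entry "p2m", for brevity */
    int writable;             /* 0 models a read-only EPT entry */
} domain_t;

/* Step 2: nominate the page; the original domain's mapping becomes read-only.
 * (In Xen ownership also moves to dom_cow; not modeled here.) */
static void nominate(domain_t *d) {
    d->writable = 0;
    d->entry->sharers = 1;
}

/* Step 3: another domain maps the same page; contents are never compared. */
static void share_into(const domain_t *src, domain_t *dst) {
    dst->entry = src->entry;
    dst->writable = 0;
    src->entry->sharers++;
}

/* Step 5: drop one sharer; free the page once nobody uses it anymore. */
static void drop_share(page_t *pg) {
    assert(pg->sharers > 0);
    if (--pg->sharers == 0)
        free(pg);
}

/* Step 4: a write through a read-only entry "faults" and unshares the page. */
static void write_byte(domain_t *d, size_t off, unsigned char v) {
    assert(off < TOY_PAGE_SIZE);
    if (!d->writable) {
        page_t *copy = malloc(sizeof *copy);
        assert(copy != NULL);
        memcpy(copy->data, d->entry->data, TOY_PAGE_SIZE);
        copy->sharers = 0;    /* private to this domain */
        drop_share(d->entry);
        d->entry = copy;
        d->writable = 1;
    }
    d->entry->data[off] = v;
}
```

After `write_byte()` on one sharer, that domain sees its own copy while the other keeps reading the original, which now has one sharer left.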
11. 11
Memory management in Xen
The p2m is only for managing the domain’s view of its memory
There are pages invisible to the guest but it still “owns them”
The domain struct maintains a linked list of all its pages
How does Xen know when it’s safe to release a page?
– The actual domain is not the only one that may map it
– QEMU also needs to have access (in dom0, or a stubdom)
– Xen may also map pages itself (shared_info, vcpu_info_page)
A shared page may also be mapped into dom0!
12. 12
Memory management in Xen
The solution: every time a page is mapped by anything, it’s reference counted
Only safe to release when reference count is 0
Pages are also typed separately from the p2m
– See full list in xen/include/asm-x86/mm.h
Surprisingly little documentation on what these types and flags do
– Or how they are even stored for the page
Xen doesn’t record who holds each reference, which makes debugging hard
– Pages can only be made sharable if their reference count is 1
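The reference-counting rule, and the sharing restriction that follows from it, can be illustrated with a heavily simplified, hypothetical page descriptor. Xen's real `struct page_info` is considerably more involved (separate count and type tracking, flags); `page_desc_t`, `page_ref_get()`, `page_ref_put()` and `can_make_sharable()` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified page descriptor. */
typedef struct {
    unsigned long refcount;   /* guest mapping, QEMU in dom0, Xen itself, ... */
    bool released;
} page_desc_t;

/* Anything that maps the page takes a reference. */
static void page_ref_get(page_desc_t *pg) {
    assert(!pg->released);
    pg->refcount++;
}

/* Only when the count drops to 0 is it safe to release the page. */
static void page_ref_put(page_desc_t *pg) {
    assert(pg->refcount > 0);
    if (--pg->refcount == 0)
        pg->released = true;
}

/* Rule from the slide: a page can only be made sharable if its count is 1. */
static bool can_make_sharable(const page_desc_t *pg) {
    return pg->refcount == 1;
}
```

Note that nothing here records *who* took each reference, which mirrors the debugging problem described above.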
13. 13
VM forking
1. Create domain with an empty p2m
2. Specify its parent
3. Copy vCPU parameters from parent (& some other stuff)
4. When domain is resumed, it will page-fault
5. Populate pages on-demand in the page-fault handler
– Read & execute accesses are populated with a shared entry
– Write accesses are unshared: the fork gets its own copy of the page
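A toy C model of steps 4 and 5 follows, with a fixed-size array as the p2m and `malloc`ed buffers as pages. `vm_t`, `fork_fault()`, `vm_read()` and `vm_write()` are hypothetical names; this is not Xen's implementation, which also deals with page types, reference counts and deeper fork chains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NPAGES 4
#define TOY_PAGE_SIZE 4096

typedef enum { ENTRY_EMPTY, ENTRY_SHARED, ENTRY_PRIVATE } entry_kind_t;
typedef enum { ACCESS_READ, ACCESS_EXEC, ACCESS_WRITE } access_kind_t;

typedef struct {
    entry_kind_t kind;
    unsigned char *page;
} p2m_entry_t;

typedef struct vm {
    struct vm *parent;         /* NULL for the original domain */
    p2m_entry_t p2m[NPAGES];   /* a fresh fork starts with every entry EMPTY */
} vm_t;

/* Populate the fork's entry for gfn when an access faults. */
static void fork_fault(vm_t *child, size_t gfn, access_kind_t access) {
    p2m_entry_t *e;
    const p2m_entry_t *pe;
    assert(child->parent != NULL && gfn < NPAGES);
    e = &child->p2m[gfn];
    pe = &child->parent->p2m[gfn];
    if (access != ACCESS_WRITE) {
        /* read/exec: point at the parent's page through a shared entry */
        if (e->kind == ENTRY_EMPTY) {
            e->kind = ENTRY_SHARED;
            e->page = pe->page;
        }
    } else if (e->kind != ENTRY_PRIVATE) {
        /* write: give the fork its own copy of the parent's page */
        unsigned char *copy = malloc(TOY_PAGE_SIZE);
        assert(copy != NULL);
        memcpy(copy, pe->page, TOY_PAGE_SIZE);
        e->kind = ENTRY_PRIVATE;
        e->page = copy;
    }
}

/* Guest accesses fault first if the current entry doesn't allow them. */
static unsigned char vm_read(vm_t *vm, size_t gfn, size_t off) {
    if (vm->p2m[gfn].kind == ENTRY_EMPTY)
        fork_fault(vm, gfn, ACCESS_READ);
    return vm->p2m[gfn].page[off];
}

static void vm_write(vm_t *vm, size_t gfn, size_t off, unsigned char v) {
    if (vm->p2m[gfn].kind != ENTRY_PRIVATE)
        fork_fault(vm, gfn, ACCESS_WRITE);
    vm->p2m[gfn].page[off] = v;
}
```

Only pages the fork actually touches get populated; untouched entries stay empty, which is what keeps forks cheap.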
14. 14
VM forking: allocate metadata & copy vCPU
[Figure: the forked VM gets its own metadata and a copy of the parent VM’s (Windows/Linux) vCPU context; the parent keeps its memory pages (<pageX>, <pageY>)]
15. 15
Populate fork VM memory when MMU faults
[Figure: the fork’s memory entries start out unpopulated (<n/a>); on an MMU fault for a read/exec access, the fork’s entry is populated with a shared entry pointing to the parent’s <pageX>]
16. 16
Populate fork VM memory when MMU faults
[Figure: the fork now maps the parent’s <pageX> as {sharedX}; on an MMU fault for a write access, the page is unshared into a private copy for the fork]
17. 17
Populate fork VM memory when MMU faults
[Figure: end state: the fork maps {sharedX} for the page it only read, plus its own private <pageZ> for the page it wrote; the parent still holds <pageX> and <pageY>]
21. 21
VM forking
It’s different from fork() on Linux
The parent domain currently has to remain paused while forks are active
– This was fine for our use-case
– For a full domain split, all the parent pages need to be made shared
– Pages that can’t be made shared would need an extra copy
– Doable, but out of scope for now
Forks can be further forked!
– Pages are searched for through the whole chain
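Resolving a page through a chain of forks boils down to walking parent pointers until some ancestor has the page populated. A minimal sketch; `chain_vm_t` and `lookup_in_chain()` are hypothetical, and a populated entry is modeled simply as a non-NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 4

/* Hypothetical VM whose p2m entries are either populated or NULL. */
typedef struct chain_vm {
    const struct chain_vm *parent;   /* NULL for the original domain */
    const unsigned char *p2m[NPAGES];
} chain_vm_t;

/* Walk up the fork chain until some ancestor has the page populated. */
static const unsigned char *lookup_in_chain(const chain_vm_t *vm, size_t gfn) {
    assert(gfn < NPAGES);
    for (; vm != NULL; vm = vm->parent)
        if (vm->p2m[gfn] != NULL)
            return vm->p2m[gfn];
    return NULL;  /* not populated anywhere in the chain */
}
```

A fork of a fork thus sees its direct parent's private copies first and falls back to older ancestors only for pages nobody closer has touched.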
22. 22
VM forking without a device model
It’s possible to create a fork without the QEMU backend
Launching QEMU is slow & there is no reset operation for the QEMU state
The fork can execute with just CPU & memory assigned!
At least some parts of the fork can
Usually when fuzzing we are exercising very specific code locations
Perfect for that use-case
No interrupts!
Fully functional VMI interface
24. 24
VM forking with an IOMMU
We wanted to fuzz the kernel and kernel modules
– Device drivers!
Without the real hardware present, initializing the code that handles it is hard
Let’s pass the device through with an IOMMU and let everything initialize
Code is now in fully functional state
When we fork, the device stays with the parent
The fork still has fully functional, fully initialized kernel code to play with!
Way easier than transplanting memory or hand-crafting the init
25. 25
Fuzzing with AFL
Another benefit of VM forks is that we can have many of them
– All running simultaneously on different cores
– Each can be created / destroyed / reset independently
– Fully utilize all your hardware!
So let’s put it all together with AFL
– Pause parent VM when it executes magic CPUID (leaf 0x13371337)
– End of code needs to be marked with another magic CPUID
– Fork & breakpoint kernel crash handlers (oops, panic, etc)
– Run!
27. 27
Coverage guidance
We can use VMI to trace the execution
– MTF single-stepping would be way too slow for fuzzing
1. Disassemble code from the start and breakpoint next control-flow instruction
2. When breakpoint executes, record location in coverage map
3. Remove breakpoint & enable single-step
4. Execute one instruction, record location & disable single-step
5. GOTO 1.
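The breakpoint/single-step loop can be simulated on a toy instruction set to show how the tracer proceeds. This is a sketch under assumptions: the `insn_t` array stands in for disassembled guest code, breakpoints and single-steps are simulated rather than set through VMI, and the map update uses AFL's classic edge scheme (`map[cur ^ prev]++; prev = cur >> 1`), which is our assumption about what “record location in coverage map” means here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAP_SIZE 65536u   /* AFL's default map size */

typedef enum { OP_PLAIN, OP_JMP, OP_JZ, OP_HALT } opcode_t;

typedef struct {
    opcode_t op;
    size_t target;          /* branch target index for OP_JMP / OP_JZ */
} insn_t;

static uint8_t coverage_map[MAP_SIZE];
static uint32_t prev_loc;

/* AFL-style edge recording: hash the location, XOR with the previous one. */
static void record_location(size_t addr) {
    uint32_t cur = (uint32_t)((addr * 2654435761u) & (MAP_SIZE - 1));
    coverage_map[cur ^ prev_loc]++;
    prev_loc = cur >> 1;
}

static size_t count_edges(void) {
    size_t n = 0;
    for (size_t i = 0; i < MAP_SIZE; i++)
        n += coverage_map[i] != 0;
    return n;
}

/* Trace one execution; returns how many breakpoints fired. */
static size_t trace(const insn_t *code, size_t n, const uint8_t *input,
                    size_t in_len, size_t max_hits) {
    size_t pc = 0, hits = 0, in_pos = 0;
    assert(code != NULL);
    memset(coverage_map, 0, sizeof coverage_map);
    prev_loc = 0;
    while (pc < n && hits < max_hits) {
        /* 1. "Disassemble" forward to the next control-flow instruction */
        size_t bp = pc;
        while (bp < n && code[bp].op == OP_PLAIN)
            bp++;
        if (bp >= n || code[bp].op == OP_HALT)
            break;
        /* 2. The breakpoint fires: record its location */
        record_location(bp);
        hits++;
        /* 3.+4. Remove it, single-step the branch, record where it landed */
        if (code[bp].op == OP_JMP) {
            pc = code[bp].target;
        } else { /* OP_JZ: taken when the next input byte is zero */
            uint8_t v = in_len ? input[in_pos++ % in_len] : 0;
            pc = (v == 0) ? code[bp].target : bp + 1;
        }
        record_location(pc);
    } /* 5. GOTO 1 */
    return hits;
}
```

Only control-flow instructions ever trap, so straight-line code runs at full speed, which is the point of avoiding MTF single-stepping of every instruction.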
30. 30
Fuzzing malware!
Exercise the binary to explore its available execution paths
Replace detection of “crash” with “malicious behavior”
Side-step reliance on anti-anti-analysis tricks
Gain confidence in results through large number of executions
Automate & scale
31. 31
Fuzzing malware?
No source-code & debug data
Fuzzers are normally limited to ring3
Binary obfuscation & modular decryption
Encrypted communication
Scalability & containment
What is the “input” we fuzz?
32. 32
How do we approach this?
Complexity is the bane of security
Complexity involves assumptions
Malware loves breaking our assumptions
We need to keep it simple
Our fuzzing system needs to “just work” on anything we throw at it
33. 33
RAM
Key insight: all applications rely on memory
Inducing hardware faults in memory has been shown to be an effective offensive
technique: Rowhammer!
We could use the same technique for fuzzing
Except we don’t have to actually hammer the RAM, we can virtualize it
35. 35
We can do this!
1. Trap VM memory accesses to a hypervisor using EPT permissions
2. Fork the VM
3. Fuzz memory content in the VM fork
4. Resume VM fork & observe execution
5. Reset fork
6. Rinse & repeat
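The loop above can be mimicked in plain C as a sketch under heavy assumptions: a byte array stands in for the VM's RAM, a full `memcpy` from the parent stands in for forking and resetting, `trapped_off` stands in for a location found by trapping accesses with EPT permissions (step 1), and the mutation is a deterministic sweep over byte values rather than random data. `target()` and `fuzz_loop()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 64

/* Hypothetical target: its "secret path" depends on one byte of RAM. */
static int target(const uint8_t *mem) {
    return mem[10] == 0x42;
}

/* Steps 2-6: reset the fork, mutate the trapped location, run, repeat.
 * Returns the iteration that reached the secret path, or -1. */
static long fuzz_loop(const uint8_t *parent, size_t trapped_off, long max_iters) {
    uint8_t fork_mem[MEM_SIZE];
    assert(trapped_off < MEM_SIZE);
    for (long it = 0; it < max_iters; it++) {
        memcpy(fork_mem, parent, MEM_SIZE);   /* 2./5. fork or reset */
        fork_mem[trapped_off] = (uint8_t)it;  /* 3. mutate (byte sweep) */
        if (target(fork_mem))                 /* 4. resume & observe */
            return it;
    }
    return -1;
}
```

Mutating a location the target never reads (offset 11 here) finds nothing, which is why step 1, learning which memory is actually accessed, matters.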
36. 36
How to fuzz RAM of unknown binary?
Random binary is making accesses to memory
Purpose & context unknown
We can mutate the memory contents
We can do totally random values
We can mix & match
Is this going to be effective?
37. 37
Memory replay
Key insight: memory values read or written by an application are for the most part
meaningful for the application
Replay attack is an effective offensive security technique: valid data is
maliciously or fraudulently repeated or delayed
1. Record memory values being accessed, replay them for future accesses
2. Don’t hardcode addresses
3. Don’t hardcode values
4. Dead simple
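A minimal sketch of the record/replay idea, assuming 32-bit accesses and a toy target whose reads go through a `mem_read32()` hook (standing in for trapped memory accesses); all names are hypothetical and this is not the released PoC. Neither addresses nor values are hardcoded in the fuzzer: both come from the recorded log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 64
#define MAX_LOG 32

typedef struct { size_t addr; uint32_t value; } mem_access_t;

static mem_access_t access_log[MAX_LOG];
static size_t log_len;
static int recording;

/* Every guest read goes through this hook; in the real system this is
 * where a trapped memory access would be observed. */
static uint32_t mem_read32(const uint8_t *mem, size_t addr) {
    uint32_t v;
    assert(addr + sizeof v <= MEM_SIZE);
    memcpy(&v, mem + addr, sizeof v);
    if (recording && log_len < MAX_LOG)
        access_log[log_len++] = (mem_access_t){ addr, v };
    return v;
}

/* Hypothetical target: takes its secret path only if two words match. */
static int target(const uint8_t *mem) {
    uint32_t input = mem_read32(mem, 0);   /* reads sequenced on purpose */
    uint32_t key = mem_read32(mem, 16);
    return input == key;
}

/* 1. Record: run once and log every (address, value) read.
 * 2. Replay: write each recorded value to each recorded address in a
 *    freshly reset fork, and run again. */
static int replay_fuzz(const uint8_t *parent, size_t *hit_addr, uint32_t *hit_val) {
    uint8_t fork_mem[MEM_SIZE];
    memcpy(fork_mem, parent, MEM_SIZE);
    log_len = 0;
    recording = 1;
    (void)target(fork_mem);
    recording = 0;
    for (size_t i = 0; i < log_len; i++) {
        for (size_t j = 0; j < log_len; j++) {
            uint32_t v = access_log[j].value;
            if (v == access_log[i].value)
                continue;                           /* nothing new to try */
            memcpy(fork_mem, parent, MEM_SIZE);     /* reset the fork */
            memcpy(fork_mem + access_log[i].addr, &v, sizeof v);
            if (target(fork_mem)) {
                *hit_addr = access_log[i].addr;
                *hit_val = v;
                return 1;
            }
        }
    }
    return 0;
}
```

In this toy run the fuzzer writes the value it saw at offset 16 into offset 0, which is enough to take the secret path without ever knowing what the “key” was.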
38. 38
PoC released as open-source (MIT)
https://xenbits.xen.org/git-http/people/tklengyel/memory-replay.git
42. 42
Why do we care about malware?
At IAGS Security, Privacy & Mitigations we do
– Pen testing
– Software SAFE: secure architecture review
Both tasks require up-to-date knowledge on security issues
– How do you keep up & prioritize them?
– Knowing what interfaces are being attacked and how would help
Third party binaries
– Do we know if any of them have hidden capabilities (debug/trojan/etc)?
43. 43
What we do today
CVEs, conferences, academic publications, blogposts, Twitter, etc.
– Ad-hoc, arbitrary, “shiny new thing” bias
Manual reverse engineering, source-code review
– Doesn’t scale, limited in scope
Fuzzing
– Mostly ring3 only; creating a harness requires expert knowledge
44. 44
What we need
We should understand what is being attacked
We should understand how it is being attacked
We should focus on hardening those components to maximize ROI
We should be able to tell when something new appears
We should get ahead of the curve
We need DATA
45. 45
Why is that hard?
Malware fights back
Malware authors want to protect their investment
The longer the malware can spread & run, the better the ROI
Static fingerprinting has long been broken
Reverse engineering everything is not feasible
46. 46
Dynamic analysis state-of-the-art
Some of the analysis systems are emulation based
Most recent systems are virtualization based
Most try to be stealthy to trick the malware into executing as it would in its actual
target environment
Large collection of anti-anti-analysis tricks
47. 47
Dynamic analysis state-of-the-art
Dynamic malware analysis systems are inherently limited
check_if_malware(random_binary) == halting problem
The Engineer’s Proof by Induction: “If it’s not malware after 1 minute of
execution, and it’s not malware after 2 minutes of execution, …, then it’s not
malware”
¯\_(ツ)_/¯
See Detecting traditional packers, decisively, D. Bueno, K. J. Compton, K. A. Sakallah and M. Bailey, RAID 2013.
48. 48
Dynamic analysis state-of-the-art
Current automated malware analysis systems are only as good as their
understanding of the tricks that hide/delay malicious behavior
“malware can determine that a system is an artificial environment and not a real
user device with an accuracy of 92.86%”
(⋋▂⋌)
https://www.first.org/resources/papers/conf2017/Countering-Innovative-Sandbox-Evasion-Techniques-Used-by-Malware.pdf
Spotless sandboxes: Evading malware analysis systems using wear-and-tear artifacts. Miramirkhani et al., IEEE S&P 2017
51. 51
Malware fuzzing
Fuzz known malicious binaries to find bugs
– Find botnet “kill-switch”
– Find bugs in C2 communication to take it offline
– Aid reverse engineering
– Make fun of malware
Cool stuff
Not what we are after
52. 52
Malware detection using fuzzing
Fuzz unknown binary to detect known malware
– See if anything gets dropped while fuzzing that triggers on VirusTotal
– Monitor memory with YARA sigs
– Check for known IOCs
Cool stuff
Still not what we are after
53. 53
Behavior modeling using fuzzing
Fuzz unknown binary to build a behavior model
– Detect hidden capabilities
– Detect capabilities that would never trigger under normal circumstances
– Perform similarity match of behavior model
– Detect unknown (buzz-word-alert: 0day) malware
Very cool stuff
That’s what we’ll talk about today
54. 54
A simple test case
Replace magic_string with magic_string2 on-the-fly using the hypervisor!
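The slides show the test program only as a screenshot. To make the following discussion concrete, here is a hypothetical sketch of a program of that shape: `magic_string`, `magic_string2`, `test()`, `secret_path()` and the 7-byte `memcmp` are named on the slides, while the string contents and function bodies are our assumptions. The hypervisor's on-the-fly replacement is simulated with a plain `memcpy`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins: only the names come from the slides. */
static char magic_string[] = "ABCDEFG";          /* 7 chars + NUL */
static const char magic_string2[] = "s3cr3t!";
static_assert(sizeof magic_string == 8, "7 characters plus NUL");

static int secret_path(void) { return 1; }       /* the path we want to reach */

static int test(void) {
    /* memcmp with a len of 7, like the call discussed on the slides */
    if (memcmp(magic_string, magic_string2, 7) == 0)
        return secret_path();
    return 0;
}
```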
71. 71
This must be a valid address (printed backwards due to system endianness): 0x5567a9b5c72d
This must be a ret!
And it is executed shortly after!
We just smashed the stack!
73. 73
Could it be that we smashed something in memcmp? There are some function calls made
No, this isn’t it: OP_T_THRES is defined as 8
We specifically called memcmp with a len of 7, below that threshold!
No other function calls are made by memcmp
75. 75
Something executes between test() and memcmp()
There isn’t anything there, though...
Unless it’s the dynamic linker (ld.so) kicking in for lazy binding!
77. 77
That explains a lot!
We have smashed the stack of the dynamic loader!
That’s why we saw over 200 memory accesses for that extremely tiny code!
Let’s try again, but resolve imports at load time
• gcc -o test -Wl,-z,now test.c
• Memory accesses drop to 6 R and 3 R/W!
• Fuzzing this new binary results in secret_path being called where we expected
78. 78
VM fork stats
• Forks deployed: 201
• Fuzz iterations executed: 8042
• Highest fork mem use: 13 MB
• Average fork mem use: 683 KB
79. 79
So where are we?
We didn’t get rid of all assumptions
• Target binary must use memory in some way for its control flow
• What if multiple memory locations need magic values in combination?
• AFL’s coverage map is not adequate for malware fuzzing, it can overflow
• We must have a definition of what we consider “malicious”!
• What is and isn’t malicious depends on the context
We now have a metric to measure our “trust”: number of fuzz-cases executed!
• Better than code coverage, since the code isn’t static
80. 80
TODO
• Follow new paths and record memory values to be fed back to the fuzzer
• Actual fuzzing based on the recorded memory values
• Glitching of the registers
• Control-flow path inversion
• Taint-tracking
• Windows support
• Parallel fuzzing... and more!