A New Tracer for Reverse Engineering
Niizh (Section 1b) : Background and Implementation (Work in Progress)
Tsukasa Ooi (@a4lg)
I...
• will introduce a way to make reverse engineering
more efficient... possibly.
• Possibly?
– (Nov 2010) General-purpose OSes don't work on it yet.
• Sorry, no live demo!
– Some predictions are included.
Related Topics
• Reverse Engineering
– especially dynamic analysis, debuggers and tracers.
• Intel x86 (32-bit) architecture
• Virtualization / Virtual Machine Monitor (VMM)
– Record and Replay
• Intrusion detection and analysis (e.g. honeypots)
• Bug detection (e.g. fuzzing)
Agenda
• Drawbacks of instruction tracers
• New tracing method based on Record and Replay
• Tracing-VMM implementation on x64
• Partial Tests
• (Possible) Practical use of this Tracer
• Challenges
Target Platform
• Intel x86 (16/32-bit) architecture
• PC/AT
• General purpose OSes (Windows, Linux etc...)
Background
Dynamic Analysis
• Analyze running programs
– e.g. by intercepting the program's operations
• Various tools
– Debuggers
• e.g. OllyDbg, IDA Pro...
– Monitors
• Process Monitor, Wireshark...
– Tracers
• API Monitor, OllyDbg, Process Stalker...
• Today, I will talk about so-called tracers.
Tracers (1)
• Capture and save the information
associated with a specific event.
– Various granularities
• instruction, basic block, function, system call...
• Instruction tracing
– Makes automatic analysis REALLY easy
(like automated unpacking).
– If you can trace the full internal context
at every instruction, you can acquire any
information you would like.
Tracers (2)
• But in early research, I found that most
instruction tracers have some drawbacks:
– Extremely slow
• They hook every executed instruction, which makes
them really slow.
• 10x-1000x slowdown
– Generate a huge amount of data
• Several gigabytes per real-second
(real-second: 1 second of execution without emulation).
• They save a lot of information for each instruction.
• Saving that information can also be a bottleneck.
Tracers (3)
• Can we solve these issues?
• Major Requirements:
– Overhead : <100%
– Size of trace : <5MB/s
• The theme is:
How did I design a VMM-based tracer
that satisfies these requirements?
Theory ‒ Record and Replay
Record and Replay (0)
• I thought I had discovered this independently, but:
– I hadn't found any related documents before.
• ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay
– I found that the new method is a variant of
"Record and Replay".
– The two are closely related and difficult to separate.
• So I'm going to describe Record and Replay
together with my method.
Record and Replay (1)
• The method goes by several names:
– VMware calls it "Record and Replay"
– "Logging and Replay", "Lockstep"
• Execution in two passes (Record/Replay)
• By focusing on characteristics common to
many machine architectures, it makes
the trace output phenomenally small.
– Normally, input from external hardware
is not very frequent.
• Many architectures can be represented as this model:
– Input (can be null)
– Calculation / Process (+Internal Context)
– Output (can be null)
• We assume the output is uniquely determined
by the internal context (via function g below):
$z_{n+1} = f(z_n, i_n)$
$o_{n+1} = g(z_{n+1})$
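The model above can be written down directly. This is a minimal sketch that assumes a toy machine whose context is a single integer. The transition `f`, the output function `g` and the input values are hypothetical stand-ins, not part of any real VMM. Because `f` is a function, `z0` and the input sequence alone reproduce every context and output.

```python
from typing import Callable, List, Optional, Tuple

Context = int
Input = Optional[int]  # None stands for a "null" input


def run(f: Callable[[Context, Input], Context],
        g: Callable[[Context], int],
        z0: Context,
        inputs: List[Input]) -> Tuple[List[Context], List[int]]:
    """Apply z_{n+1} = f(z_n, i_n) and o_{n+1} = g(z_{n+1}) over an input sequence."""
    contexts = [z0]
    outputs = []
    for i in inputs:
        z_next = f(contexts[-1], i)
        contexts.append(z_next)
        outputs.append(g(z_next))
    return contexts, outputs


# Hypothetical toy machine: a counter that also adds any external input it receives.
def f(z: Context, i: Input) -> Context:
    return z + 1 + (i or 0)


def g(z: Context) -> int:
    return z % 10


contexts, outputs = run(f, g, z0=0, inputs=[None, 5, None, 2])
```

Only `z0` and `[None, 5, None, 2]` would need to be stored. Every later context can be recomputed.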
Record and Replay (2)
Input
Output
Calc/Proc
+Context
• Saving all information is equivalent to
saving the entire internal context ($z_n$).
– The output need not be saved, because we assume
it is uniquely determined by the internal context.
• Also save $z_0$ (the initial internal context).
• Function f (the calculation/process)
must be a mathematical function.
– Same input, same output.
– Not ambiguous.
Record and Replay (3)
• Focusing on dependencies
– Input: has no dependencies.
– Calculation/Process (+Context): depends on the input.
• Now you can see...
– The internal context depends only on the initial state and
the input sequence, so you can recover every context from that
information.
Record and Replay (4)
• Pass 1 : Record
– Capture and save initial context
– Run the virtual machine
• Accepts input from external hardware.
– Capture and save all inputs
• This does not dump the
internal context, but you can recover
it from this small amount
of data.
Record and Replay (5)
[Diagram: Trace log (InitState + Input) → Calc/Proc (+Context)]
• Pass 2 : Replay
– Recover the initial context from the trace log.
– Run the virtual machine.
• But read the trace log to supply input data.
• So it does not accept new hardware inputs.
– Read the internal context from the
running virtual machine.
• It is very similar to the
Record pass!
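The two passes can be sketched with a toy VM. This is a hedged illustration, not the Niizh implementation. `ToyVM`, `record`, `replay` and the rule for when the VM reads input are all hypothetical. In this toy, the decision to read input depends only on the current context (think of an IN instruction), so the log stores only the initial context and the input values, not their timing.

```python
import random
from typing import Callable, Dict, List, Tuple


class ToyVM:
    """Hypothetical toy VM whose entire internal context is one 32-bit integer."""

    def __init__(self, z0: int) -> None:
        self.z = z0

    def wants_input(self) -> bool:
        # Whether this step reads external input depends only on the context,
        # so the log never needs to store timing.
        return self.z % 5 == 0

    def step(self, value: int = 0) -> None:
        # Deterministic transition f: same context + same input -> same result.
        self.z = (self.z * 31 + value + 1) & 0xFFFFFFFF


def record(z0: int, steps: int, device: Callable[[], int]) -> Tuple[Dict, int]:
    """Pass 1: run against live input; save only z0 and the input values."""
    vm, inputs = ToyVM(z0), []
    for _ in range(steps):
        if vm.wants_input():
            value = device()              # nondeterministic, comes from outside
            inputs.append(value)
            vm.step(value)
        else:
            vm.step()
    return {"init": z0, "steps": steps, "inputs": inputs}, vm.z


def replay(trace: Dict, observe: Callable[[int], None]) -> int:
    """Pass 2: restore z0 and take input values from the log, not from a device."""
    vm, logged = ToyVM(trace["init"]), iter(trace["inputs"])
    for _ in range(trace["steps"]):
        vm.step(next(logged) if vm.wants_input() else 0)
        observe(vm.z)                     # every intermediate context is visible
    return vm.z


rng = random.Random()                     # stands in for unpredictable hardware
trace, recorded_final = record(z0=1, steps=1000, device=lambda: rng.randrange(256))
contexts: List[int] = []
replayed_final = replay(trace, contexts.append)
```

`replayed_final` equals `recorded_final` even though the replay never consults the device.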
Record and Replay (6)
Cons. (1)
• It may look like just running the VM twice, but:
– You have a saved trace log, so you can run the
Replay pass anytime, anywhere, as often as you want.
• Each Replay pass extracts part of the information.
• If you need more information, you just
run the Replay pass again with a different configuration.
– If needed, you can run Replay passes in parallel.
• This can shorten automated analysis.
(In practice, you may run into dependency issues.)
Cons. (2)
• (Cont.)
– The two passes are independent.
• Even if you run a slow analysis, the Record pass
runs exactly as before.
• You can use the Replay pass for slow and
verbose analyses that are difficult to apply directly
(such as buffer-overflow detection).
• This method is a natural fit for reverse engineering.
– The trace log contains nearly *everything*
happening in the virtual machine!
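Because Replay is decoupled from Record, several independent analyses can consume the same saved trace, even concurrently. The sketch below assumes a hypothetical trace format and a toy transition function. Threads are used only for illustration; real replays would more likely run as separate processes or on separate machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical saved trace: initial context plus logged inputs.
TRACE = {"init": 7, "inputs": [3, 0, 9, 1, 4, 4, 2]}


def replay_contexts(trace: Dict) -> List[int]:
    """Deterministically regenerate every context from the saved trace."""
    z, out = trace["init"], []
    for i in trace["inputs"]:
        z = (z * 5 + i) % 101   # toy transition function f
        out.append(z)
    return out


# Each analysis is an independent consumer of a replay; adding one needs no re-recording.
ANALYSES: Dict[str, Callable[[List[int]], object]] = {
    "max_context": max,
    "even_count": lambda zs: sum(1 for z in zs if z % 2 == 0),
    "first_over_50": lambda zs: next((n for n, z in enumerate(zs) if z > 50), None),
}


def run_analysis(name: str):
    return name, ANALYSES[name](replay_contexts(TRACE))


with ThreadPoolExecutor() as pool:
    results = dict(pool.map(run_analysis, ANALYSES))
```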
Real World Example (1)
• VMware Workstation (6 or later)
– Record/Replay feature
• Record execution, then replay it like a video
and/or use it for debugging.
– It is proprietary and not very robust,
but it is an actual implementation of the
Record and Replay method.
– Trace log: normally 1-10MB/s
Real World Example (2)
• VMware Workstation (6 or later)
– But...
• It's still a VMware product.
• Its debug interface is insufficient.
– If it had a well-equipped debug interface,
you could use it for reverse engineering.
• Other examples:
– ReplayDIRECTOR (Java debugging tool)
– Jockey (http://home.gna.org/jockey/)
• User-mode Recording / Debugging library for Linux
• All deterministic elements could be treated as
a kind of input, but that is inefficient.
– Do you want to record lots of null inputs?!
• Classify the types of so-called inputs:
– Nondeterministic Input(s)
– Interrupt(s)
• These are just names; they are not meant
literally.
Applying to x86 (1)
[Diagram: Input, Trace, Calc/Proc (+Internal State), Initial State]
• Nondeterministic Inputs
– The point at which the internal context becomes
undetermined can itself be determined uniquely
(like the IN instruction on x86).
– But you cannot determine the actual value
or contents without running it.
– Save the actual value or contents,
but not the timing.
• We can determine the timing from the
recent internal context and interrupts.
Applying to x86 (2.1)
• Interrupts
– The timing is not uniquely predictable.
– And the actual content can be nondeterministic.
– So trace the timing. Additionally,
if the actual content of the interrupt is nondeterministic,
trace it too.
• e.g. interrupt vector number (hardware interrupt)
• The most important thing is:
– Based on this classification, we have to
classify every element of the virtual machine.
Applying to x86 (2.2)
• Modeling ― VM-Internal Disk
– Assume the VM-internal disk is reliable and
record the initial disk image.
– Almost all elements are deterministic,
except the interrupts the disk generates.
• The content read equals
the content last written.
• But the timing of the ATA interrupt cannot be
predicted strictly, so we treat it as an Interrupt.
Applying to x86 (3.1)
• Modeling ― Mouse, Keyboard, Network
– They are unpredictable, external inputs.
– Input from these devices uses both
x86 interrupts and I/O port operations.
– So treat them as both (Interrupt and Nondeterministic Input).
– Network packets you send are recovered from
the internal context.
Applying to x86 (3.2)
• Modeling ― Time Stamp Counter (CPU)
– The clock count since computer reset,
which can be read with the RDTSC instruction.
– Treat it as a Nondeterministic Input.
– Even though the value physically lives inside
the CPU, you should treat values this way when
they produce unpredictable results.
• You could model it as deterministic,
but that implementation would be inefficient.
• NOT treating it as deterministic improves
VM emulation efficiency.
Applying to x86 (3.3)
• Modeling ― CPU exception
– Almost all exceptions are deterministic,
including their timing.
• A page fault occurs because the CPU has
accessed an invalid memory address.
– So this is not even an input.
• Modeling ― Undefined CPU behavior
– After some CPU operations, part of the internal context
can be nondeterministic (the value/behavior is undefined
by the architecture).
– Treat this as a Nondeterministic Input.
Applying to x86 (3.4)
• Modeling ― Inexact Arithmetic Operation
– Transcendental instructions such as FSINCOS and FPATAN
do not define the exact result, because
specifying the exact value is very difficult.
– The minimum information needed to
recover the original value is treated as a
Nondeterministic Input.
• Likewise, we have to model *everything*.
– The implementation is relatively difficult.
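The modeling slides amount to a classification table that decides what goes into the trace log. The sketch below encodes the categories named on the slides as Python flags. The element labels are paraphrases, and the `what_to_log` rule is an assumption that summarizes the slides: Interrupts log their timing, and Nondeterministic Inputs log their value.

```python
from enum import Flag


class Kind(Flag):
    """How an element of the VM is treated in the trace (names follow the slides)."""
    DETERMINISTIC = 0            # recomputed during replay, never logged
    NONDETERMINISTIC_INPUT = 1   # log the value; timing follows from the context
    INTERRUPT = 2                # log the timing (and content if nondeterministic)


# Classification taken from the modeling slides above.
MODEL = {
    "disk data (read = last written)": Kind.DETERMINISTIC,
    "ATA interrupt": Kind.INTERRUPT,
    "keyboard / mouse / network input": Kind.INTERRUPT | Kind.NONDETERMINISTIC_INPUT,
    "outgoing network packets": Kind.DETERMINISTIC,
    "RDTSC (time stamp counter)": Kind.NONDETERMINISTIC_INPUT,
    "CPU exceptions (e.g. page fault)": Kind.DETERMINISTIC,
    "architecturally undefined state": Kind.NONDETERMINISTIC_INPUT,
    "FSINCOS / FPATAN results": Kind.NONDETERMINISTIC_INPUT,
}


def what_to_log(kind: Kind) -> list:
    """Which pieces of information the Record pass must save for an element."""
    logged = []
    if Kind.INTERRUPT in kind:
        logged.append("timing")
    if Kind.NONDETERMINISTIC_INPUT in kind:
        logged.append("value")
    return logged
```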
Applying to x86 (3.5)
Applying to x86 (4)
• Treating X as nondeterministic?
– Increases the number of hooks.
– The trace log gets bigger and execution gets slower.
– Fewer is better.
• I thought these nondeterministic events were
much, much rarer than normal instructions, so
there would be no problem.
– But I was wrong.
What do you think?
• Is this instruction deterministic?
XOR edx, edx
– As you know, this instruction just
clears the edx register.
– But the answer is No.
• Many ordinary operations make some part of the
internal context nondeterministic.
– IT IS EFLAGS.
The curse of EFLAGS? (1)
• Let's look inside.
– edx IS zero. On the other hand,
EFLAGS.AF is updated to "?".
– Intel's manual says this value is undefined
(it can vary).
                      edx            OF SF ZF AF PF CF
XOR edx, edx          xxx......xxx   x  x  x  x  x  x
(next instruction)    000......000   0  0  1  ?  1  0
The curse of EFLAGS? (2)
• This is not the end!
– Other frequently used instructions behave the same way.
– According to profiling, 10-15% of instructions
make part of EFLAGS undefined!
Instruction                         OF SF ZF AF PF CF
AND, OR, XOR, TEST (Logical)         0  M  M  ?  M  0
MUL, IMUL (Multiplication)           M  ?  ?  ?  ?  M
DIV, IDIV (Division)                 ?  ?  ?  ?  ?  ?
SHL, SHR, SAL, SAR count (Shift)     ?  M  M  ?  M  ?
(M: modified, 0: cleared, ?: undefined)
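A profiling figure like "10-15% of instructions" can be estimated by encoding the table as data and scanning an instruction stream. This is a sketch under assumptions. The mnemonic-only lookup treats every shift as the variable-count form listed in the table, and the sample stream is hypothetical, not a measured trace.

```python
# Flag effects from the table above: "M" modified, "0" cleared, "?" undefined.
FLAGS = ("OF", "SF", "ZF", "AF", "PF", "CF")
EFFECTS = {
    "and": "0MM?M0", "or": "0MM?M0", "xor": "0MM?M0", "test": "0MM?M0",
    "mul": "M????M", "imul": "M????M",
    "div": "??????", "idiv": "??????",
    "shl": "?MM?M?", "shr": "?MM?M?", "sal": "?MM?M?", "sar": "?MM?M?",
}


def undefined_flags(mnemonic: str) -> set:
    """Flags left architecturally undefined by one instruction (empty if not listed)."""
    effect = EFFECTS.get(mnemonic.lower(), "")
    return {flag for flag, e in zip(FLAGS, effect) if e == "?"}


def undefined_ratio(stream: list) -> float:
    """Fraction of executed instructions that make some part of EFLAGS undefined."""
    hits = sum(1 for m in stream if undefined_flags(m))
    return hits / len(stream) if stream else 0.0


# Hypothetical instruction stream, for illustration only.
sample = ["mov", "xor", "add", "cmp", "jz", "mov", "shl", "push", "call", "ret"]
```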
The curse of EFLAGS? (3)
• Not "much, much fewer" at all!
– Even at 10% of instructions, the hooking overhead
cannot be ignored.
– We could choose not to trace EFLAGS:
for instance, we could set the EFLAGS register to a
deterministic value. But...
• Updating flags (POPF) is extremely slow!
• 24-25 clocks on the Intel Nehalem microarchitecture (Core i7)
– To avoid this problem, we need another way
to handle these undefined values.
The implementation problem (1)
• Public Record and Replay implementations
do not care about this issue!
– They just restrict the processor model:
if we record a program on processor model A,
we must replay it on exactly the same model.
– This prevents distributed analysis.
– Normally, programs don't depend on these
undefined (nondeterministic) values.
• But technically, a single nondeterministic bit
can cause chaos.
The implementation problem (2)
• What is RIGHT?
– We cannot know exactly which CPU model is right.
– I want to integrate the information into one trace.
No more compatibility/portability problems.
• Model-restricted traces are no good for reverse engineering.
– I want robustness!
EFLAGS : Lazy Evaluation (1)
• EFLAGS and programs have these characteristics:
– Over 80% of updated flags are simply discarded.
• We want to trace *everything*, but it is
pointless to trace values that are never used.
– Flag updates and flag evaluations are
adjacent in most cases.
• e.g. Compare → Conditional Jump
• Intel CPUs exploit this (Macro-Fusion)!
– How about lazy evaluation?
• Trace a nondeterministic EFLAGS value
only when it is used.
EFLAGS : Lazy Evaluation (2)
• Current implementation:
– JIT compilation with static evaluation
(to make programs run faster).
– Evaluate each instruction block
• From the instruction after a jump
to the next unconditional jump (instruction/exception).
• Scan each block forward.
– Track the propagation of virtual EFLAGS:
• Deterministic or not (initial value: No)
• The last instruction that updated the flag value.
• We use heuristics.
EFLAGS : Lazy Evaluation (3)
• (cont.)
– If an instruction in the block depends on these
flags and the virtual flags satisfy either condition below,
we simply treat the value as nondeterministic.
• The value of the virtual flag is nondeterministic.
• The value is deterministic, but the instruction that updated it
is too old (32 bytes / 8 instructions or more earlier).
• Currently, this is very effective.
– I found that almost all traced flags are traced
during interrupt handling / context switches.
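The forward block scan with virtual EFLAGS can be sketched as follows. This is a hedged reconstruction from the two slides above, not the actual JIT code. The `Insn` record is hypothetical. The age is measured from the setting instruction's start to the reading instruction's start. A flag that has been logged once is treated as known afterwards, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

FLAGS = ("OF", "SF", "ZF", "AF", "PF", "CF")
MAX_AGE_BYTES = 32      # "too old" thresholds quoted on the slide
MAX_AGE_INSNS = 8


@dataclass(frozen=True)
class Insn:
    """Hypothetical decoded guest instruction (only what the scan needs)."""
    text: str
    size: int
    defines: FrozenSet[str] = frozenset()    # flags set to a defined value
    undefines: FrozenSet[str] = frozenset()  # flags left undefined ("?")
    reads: FrozenSet[str] = frozenset()      # flags consumed (e.g. Jcc, PUSHFD)


@dataclass
class VirtualFlag:
    deterministic: bool = False   # initial value: No
    setter_index: int = -1
    setter_offset: int = 0


def flags_to_trace(block: List[Insn]) -> List[Tuple[int, List[str]]]:
    """Scan one block forward; return (instruction index, flags that must be logged)."""
    state: Dict[str, VirtualFlag] = {f: VirtualFlag() for f in FLAGS}
    result, offset = [], 0
    for n, insn in enumerate(block):
        traced = []
        for f in sorted(insn.reads):
            v = state[f]
            too_old = (n - v.setter_index >= MAX_AGE_INSNS
                       or offset - v.setter_offset >= MAX_AGE_BYTES)
            if not v.deterministic or too_old:
                traced.append(f)
                state[f] = VirtualFlag(True, n, offset)  # logged: replay now knows it
        if traced:
            result.append((n, traced))
        for f in insn.defines:
            state[f] = VirtualFlag(True, n, offset)
        for f in insn.undefines:
            state[f] = VirtualFlag(False, n, offset)
        offset += insn.size
    return result


XOR = Insn("xor edx, edx", 2, defines=frozenset({"OF", "SF", "ZF", "PF", "CF"}),
           undefines=frozenset({"AF"}))
JZ = Insn("jz target", 2, reads=frozenset({"ZF"}))
PUSHFD = Insn("pushfd", 1, reads=frozenset(FLAGS))
NOP = Insn("nop", 1)
```

For example, `XOR` followed by `JZ` needs no log entry. `XOR` followed by `PUSHFD` logs only AF.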
Record and Replay : Conclusion
• Using "Record and Replay", we can decrease
the amount of trace log and the trace overhead.
• Using my improved method,
we can obtain a robust trace log on the x86 platform.
Implementation
• I implemented a VMM-based tracer.
– To run general-purpose OSes.
• But it was not a good idea: because of its
complexity, I couldn't finish the VMM (as of Nov 2010).
– Using binary translation
• Read guest instructions and transform them
to run on the host platform.
– I chose the x64 platform to implement the VMM.
• There are several reasons x64 is good for
binary-translation-based x86 emulation.
x86 on x64 (1)
• x64 is a 64-bit extension to x86 architecture.
– AMD, Intel and VIA implement the x64 extension.
– Very similar instruction format.
– Some extensions:
• Increased general purpose and XMM registers (8→16)
• New addressing modes
(64-bit, RIP [program counter] relative)
• Many of its features make implementing a
binary-translation-based VMM easier.
x86 on x64 (2.1)
• Benefit : 32-bit registers and clamping
– The general-purpose register format extends
the original (the registers share their lower bits).
• e.g.: ax (16-bit), eax (32-bit), rax (64-bit)
– If an instruction's destination is a
32-bit register, the upper 32 bits of the corresponding
64-bit register are cleared!
MOV eax, 0x01234567   ; eax = 0x01234567
MOV ax, 0x1234        ; eax = 0x01231234 (a 16-bit write keeps the upper bits)
MOV rax, 0x0123456789abcdef   ; rax = 0x0123456789abcdef
MOV eax, 0x12345678           ; rax = 0x0000000012345678 (upper 32 bits cleared)
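The register-write rule behind both examples can be modeled in a few lines. This sketch covers only the low-byte, word, dword and qword writes. It does not model AH-style high-byte registers or any of the VMM's own code. It shows why a 32-bit write "clamps" the whole 64-bit register while a 16-bit write does not.

```python
MASK64 = (1 << 64) - 1


def write_reg(old64: int, value: int, width: int) -> int:
    """Model of writing a GPR (or its low 8/16/32 bits) on x64.

    32-bit writes zero-extend into the full 64-bit register;
    8/16-bit writes to the low byte/word keep all other bits.
    """
    if width == 64:
        return value & MASK64
    if width == 32:
        return value & 0xFFFF_FFFF          # upper 32 bits cleared
    if width in (8, 16):
        mask = (1 << width) - 1
        return (old64 & ~mask & MASK64) | (value & mask)
    raise ValueError(f"unsupported width {width}")


rax = write_reg(0, 0x01234567, 32)          # MOV eax, 0x01234567
rax = write_reg(rax, 0x1234, 16)            # MOV ax, 0x1234
first_example = rax

rax = write_reg(0, 0x0123456789ABCDEF, 64)  # MOV rax, 0x0123456789abcdef
rax = write_reg(rax, 0x12345678, 32)        # MOV eax, 0x12345678
second_example = rax
```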
x86 on x64 (2.2)
• Benefit : Increased Registers (GPR/XMM)
– 8→16 (16 additional registers in total, including XMM registers)
– Save the emulator's context without
destroying the existing registers.
rax r8
rcx r9
rdx r10
rbx r11
rsp r12
rbp r13
rsi r14
rdi r15
xmm0 xmm8
xmm1 xmm9
xmm2 xmm10
xmm3 xmm11
xmm4 xmm12
xmm5 xmm13
xmm6 xmm14
xmm7 xmm15
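Host   Holds       | Host    Holds
rax    eax         | r8      cs.base
rcx    ecx         | r9      es.base
rdx    edx         | r10     emuinfo
rbx    ds.base     | r11     ebx
rsp    stack       | r12     esp
rbp    ebp         | r13     tmp2
rsi    esi         | r14     ss.base
rdi    tmp1        | r15     edi
xmm0   xmm0        | xmm8    fs.base
xmm1   xmm1        | xmm9    gs.base
xmm2   xmm2        | xmm10   tmp3
xmm3   xmm3        | xmm11   tmp4
xmm4   xmm4        | xmm12   (not used)
xmm5   xmm5        | xmm13   (not used)
xmm6   xmm6        | xmm14   (not used)
xmm7   xmm7        | xmm15   (not used)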
Actual register mapping table.
For memory/cache optimization,
some registers are relocated.
– XMM registers are sometimes difficult to use,
but we can transfer values to a GPR with the MOVQ instruction.
x86 on x64 (2.3.1)
• Benefit : Retained Addressing Format
– Some addressing modes were added, but the
format is still x86-based.
– x86 has complex addressing modes:
• e.g. two adds and one shift: [esi+edx*4+123]
• We can use this to separate memory accesses!
– Address translation: [segbase+offset]
• All memory accesses are segbase-relative.
(segbase holds the 64-bit address of the segment base.)
– Achieving memory isolation
• Like Google Native Client for x64
x86 on x64 (2.3.2)
• Benefit : Retained Addressing Format
– (e.g. 1): inc [ds:ecx] → inc [rbx+rcx]
• rbx: base address of the DS segment.
• rcx: guest ECX register.
– Wait a minute: ecx is a 32-bit register, but
we are using rcx, a 64-bit register!
(Are you sure about that?)
• No problem. As described before, results
of 32-bit operations are clamped (upper 32 bits cleared).
• So we can guarantee that the value of
rcx is in the 32-bit range (0x0000_0000-0xffff_ffff).
x86 on x64 (2.3.3)
• Benefit : Retained Addressing Format
– (e.g. 2 [wrong]): inc [ds:ecx+edx] → inc [rbx+rcx+rdx]
• x64 addressing cannot combine three registers, and the
guest sum ecx+edx must wrap at 32 bits, so store the
intermediate result in a temporary register.
– (e.g. 2 [correct]): inc [ds:ecx+edx] →
lea edi, [rcx+rdx] ; inc [rbx+rdi]
• edi/rdi: temporary register
• Almost the same as the first example.
– I use the best encoding x64 offers.
• Store a 64-bit address computation into a 32-bit register!
• This is a valid encoding too: the address is automatically
clamped and the instruction is shorter.
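The operand rewriting in the two examples can be sketched as a small translator. The register assignments assume the mapping table (guest ECX/EDX in RCX/RDX, the DS base in RBX, the temporary in EDI/RDI). Only a subset is included, and displacements and scales are left out. The function names are hypothetical. `lea32` models the 32-bit wraparound that `lea edi, [...]` provides.

```python
from typing import List

# Assumed subset of the register mapping table (guest name -> host 64-bit register).
GUEST_GPR = {"eax": "rax", "ecx": "rcx", "edx": "rdx", "esi": "rsi", "ebp": "rbp"}
SEG_BASE = {"ds": "rbx"}     # host register holding the segment's 64-bit base
TMP32, TMP64 = "edi", "rdi"  # temporary register (tmp1)


def translate_mem(seg: str, regs: List[str]) -> List[str]:
    """Rewrite a guest [seg:reg(+reg)] operand as host setup code + memory operand.

    The last element is the host memory operand; anything before it is setup code.
    """
    base = SEG_BASE[seg]
    host = [GUEST_GPR[r] for r in regs]
    if len(host) == 1:
        # Guest register values are already zero-extended 32-bit numbers.
        return [f"[{base}+{host[0]}]"]
    if len(host) == 2:
        # [base+a+b] cannot be encoded, and a+b must wrap at 32 bits:
        # LEA with a 32-bit destination computes the sum and truncates it.
        return [f"lea {TMP32}, [{host[0]}+{host[1]}]", f"[{base}+{TMP64}]"]
    raise ValueError("only one or two guest registers are handled in this sketch")


def lea32(a: int, b: int) -> int:
    """Value LEA leaves in a 32-bit destination: the sum modulo 2**32."""
    return (a + b) & 0xFFFF_FFFF
```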
x86 on x64 (2.4.1)
• Benefit : Huge Memory Range
– 64-bit address width
• 48 bits of a logical address are valid (sign-extended).
• 0x0000_1234_5678 → 0x0000_0000_1234_5678
• 0x8000_1234_5678 → 0xffff_8000_1234_5678
– We can place the data/code the VMM uses
outside the guest-accessible region.
• By contrast, x86-on-x86 VMMs needed address compression
to store host and guest data in the same address space.
• This increases VMM speed.
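The two address examples follow the canonical-address rule: bit 47 is copied into bits 48-63. The sketch assumes 4-level paging with 48-bit linear addresses, as on the CPUs of the time. It does not cover 5-level paging.

```python
ADDR_BITS = 48                      # assumption: 4-level paging
LOW_MASK = (1 << ADDR_BITS) - 1


def canonical(addr48: int) -> int:
    """Sign-extend a 48-bit linear address to a 64-bit canonical address."""
    addr48 &= LOW_MASK
    if addr48 & (1 << (ADDR_BITS - 1)):
        return addr48 | (0xFFFF << ADDR_BITS)
    return addr48


def is_canonical(addr64: int) -> bool:
    """True if bits 48-63 are copies of bit 47."""
    return canonical(addr64 & LOW_MASK) == addr64
```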
x86 on x64 (2.4.2)
• Benefit : Huge Memory Range
– But allocating just 4GB is not enough.
The result of address calculation can over/underflow.
– On 32-bit mode on x86, address calculation is
done by 32-bit precision and overflow/underflow
is ignored. It means lower 32-bits is equivalent
to actual accessed memory address.
– So, we modify the page table to satisfy:
lower 32-bits are equivalent == same physical address
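The page-table invariant can be stated as code. This sketch assumes a single hypothetical window of host virtual memory that starts on a 4 GiB boundary. Every host address in the window stands for the guest address given by its offset modulo 2^32, so all of its aliases must map to the same physical page. The window base and size are illustrative only.

```python
GUEST_SPACE = 1 << 32   # size of the guest's 32-bit linear address space


def guest_linear(host_va: int, window_base: int) -> int:
    """Guest linear address that a host address inside the VMM window stands for.

    Only the lower 32 bits of (host_va - window_base) matter, matching the
    32-bit wraparound of guest address arithmetic.
    """
    return (host_va - window_base) % GUEST_SPACE


def host_aliases(guest_va: int, window_base: int, window_size: int) -> list:
    """Every host address in the window that must map to the same physical page."""
    return list(range(window_base + guest_va, window_base + window_size, GUEST_SPACE))


WINDOW = 0x10_0000_0000                  # hypothetical 4 GiB-aligned window base
aliases = host_aliases(0x1000, WINDOW, 3 * GUEST_SPACE)
```

With a 4 GiB-aligned base, "same guest address" is exactly "same lower 32 bits of the host address".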
x86 on x64 (2.4.3)
• Benefit : Huge Memory Range
– Allocate a virtual memory region.
– To allow for address overflow, we allocate
a virtual memory range of up to 44.5GB.
• The red and blue areas map exactly the same physical region.
• We use page tables to achieve this.
[Figure: 44.5GB virtual region, with 42.25GB and 2.25GB portions marked]
x86 on x64 (2.4.4)
• Benefit : Huge Memory Range
– Allocate a virtual memory region for each
segment and/or segment access-control setting.
• On a segment switch, just change the base address.
[Figure: cs.base, ds.base, es.base and ss.base pointing into regions labeled code0, code3 and data3]
x86 on x64 (2.5.1)
• Benefit : Simplified Architecture
– The x64 architecture is relatively simplified,
which makes implementing a Type-2 VMM easier.
• Only two interrupt handler types:
– Interrupt Gate and Trap Gate
• Segmentation is now a mere façade.
– Flat memory model for CS, DS, ES and SS.
– Replace the IDT (interrupt vector table) to
allocate a VM-specific context.
• PatchGuard compatible!
• Nearly stealthy, but cannot hook system calls.
x86 on x64 (2.5.2)
• Benefit : Simplified Architecture
– Pass through the interrupts.
• We can do this safely with IDT switching.
• There is some overhead.
[Diagram: VM side (VM Kernel, VM IntHandler, VM Entry) and OS side (OS Kernel, OS IntHandler), connected through a VM Trampoline and two IDT switches]
The actual implementation is a bit more complicated;
this is a summary.
x86 on x64 (3)
• Using these techniques, we implement
binary translation.
– But it is still incomplete.
• To trace the timing, the following
information is required:
– Value of the branch counter
(a software implementation is possible).
– Current program counter (IP, EIP)
– Repeat count (CX, ECX)
• only while a REP instruction is executing.
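One way to read the three items is as a single position key. The branch counter separates loop iterations. EIP pins down the instruction inside straight-line code. ECX separates iterations of a single REP instruction, because EIP stays the same while it runs. The sketch below logs interrupts against that key and injects them during replay. The record layout and the `Injector` class are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Timing:
    """Position in guest execution, built from the three items listed above."""
    branch_count: int          # branches executed so far (software counter)
    eip: int                   # current program counter
    ecx: Optional[int] = None  # remaining REP count, only inside a REP instruction


@dataclass(frozen=True)
class InterruptRecord:
    when: Timing
    vector: int                # hardware interrupt vector (nondeterministic content)


class Injector:
    """Replays logged interrupts in order, each at its recorded position."""

    def __init__(self, log: List[InterruptRecord]) -> None:
        self.log, self.next = log, 0

    def check(self, now: Timing) -> List[int]:
        vectors = []
        while self.next < len(self.log) and self.log[self.next].when == now:
            vectors.append(self.log[self.next].vector)
            self.next += 1
        return vectors


log = [InterruptRecord(Timing(10, 0x401000), 0x20),
       InterruptRecord(Timing(10, 0x401000, ecx=3), 0x21)]
injector = Injector(log)
```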
Everything into the Ring-0
• Is privilege isolation required?
– Generated code is safe and well isolated,
so everything can run in kernel mode (Ring-0).
• Low-overhead implementation.
• The current implementation does this.
– If this is too dangerous, you can also
run the code in user mode (Ring-3).
Tests
Verification
Trace size test (1.1)
• Trace log size required
– DLX Linux bundled with Bochs 2.4.5
• From computer reset until the login screen.
• 52,217,403 instructions (53 sec without emulation)
– Specs
• 1 MIPS (1,000,000 instructions/sec)
• 32MB RAM, 10MB HDD
– Bochs was used to generate instruction/memory traces,
which were then converted with each method.
Trace size test (1.2)
• Trace log size required
– The size of the initial context is not included.
– I modeled the devices in the Bochs emulator
and estimated the required trace log size.
– Because the model is simplified, the sizes
are estimates (not exact values).
Trace size test (1.3)
• Methods (comparison included)
– Raw
Text-format instruction/memory trace generated by Bochs.
– Verbose
Conventional tracer (like OllyDbg)
– Dumb
Record and Replay plus memory monitoring.
– RnR (1)
Record and Replay (tracing EFLAGS)
– PROPOSAL
Improved Record and Replay method
– RnR (2)
Record and Replay (IGNORING EFLAGS)
Trace size test (2.1)
Method      Size (bytes)      Size
Raw         7,178,948,236     6.68GB
Verbose     > 419,430,400     > 400MB
Dumb           60,713,538     57.90MB
RnR (1)         6,932,542     6.61MB
PROPOSAL          389,013     380KB
RnR (2)            31,788     31KB
This table shows that PROPOSAL generates less than 1/1,000
of the trace log of the Verbose tracer. Record and Replay ignoring EFLAGS, RnR (2),
is smaller than PROPOSAL, but it has low portability.
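The table can be related to the "<5MB/s" requirement and the 52,217,403-instruction, 53-second workload with simple arithmetic. All numbers come from the slides. Reading "5MB" as 5 MiB is an assumption, and the Verbose figure is only a lower bound.

```python
SIZES = {  # bytes, from the table above (Verbose is a lower bound)
    "Raw": 7_178_948_236,
    "Verbose": 419_430_400,
    "Dumb": 60_713_538,
    "RnR (1)": 6_932_542,
    "PROPOSAL": 389_013,
    "RnR (2)": 31_788,
}
INSTRUCTIONS = 52_217_403
REAL_SECONDS = 53            # run time without emulation
TARGET_RATE = 5 * 2**20      # "<5MB/s", read here as 5 MiB per real-second

for method, size in SIZES.items():
    rate = size / REAL_SECONDS            # bytes per real-second
    per_insn = size / INSTRUCTIONS        # bytes per guest instruction
    verdict = "meets" if rate < TARGET_RATE else "misses"
    print(f"{method:9} {rate / 2**20:10.3f} MiB/s {per_insn:10.4f} B/insn  {verdict} target")

print("Verbose / PROPOSAL >=", round(SIZES["Verbose"] / SIZES["PROPOSAL"]))
```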
Trace size test (2.2)
[Chart: trace log size per method on a log scale; Size (bytes) from 10,000 to 10,000,000,000]
Trace size test (2.3)
• Conclusion
– This result did not come from an actual implementation,
so some points are questionable.
– Even so, the proposed method generates
a far smaller trace log than older methods.
Overhead tests
[Chart: min/max measurements without Tracer vs. with Tracer; y-axis 0 to 40]
Possible Practical Uses
Application
Possible Practical Uses (1)
• Reverse Engineering (non-Malware)
– Everything that *ran* is *recorded*
• All your program are belong to us!
• Program behavior is recorded,
including VM detection and/or anti-debugging.
– Of course, the program is captured unpacked/decrypted.
• You can integrate multiple analyses.
Possible Practical Uses (2)
• Avoiding Anti-debugging/Anti-VM
– No well-known backdoor.
– But a binary-translation-based VM can be detected
by running specific code.
• e.g. Self-modifying code is (extremely) slow.
– You can find out how the VM is detected.
At least, you can extract useful information to
avoid VM detection.
• Protection in ordinary programs is not very strong.
Possible Practical Uses (3)
• Reverse Engineering (Malware)
– It is DANGEROUS to run malware directly!
– However, if you can take care of these problems,
this tracer can be useful.
– Honeypots?
Possible Practical Uses (4)
• Fuzzing / Exploit analysis / Bug discovery
– Imagine Valgrind applied to every program,
while you use the guest program interactively.
– With offline analysis, you can find and track
memory corruption.
– If you can reproduce the issue,
you can extract useful information.
– However, whether this is efficient for fuzzing
can depend heavily on the implementation.
Possible Practical Uses (5)
• Analysis Support
– Export to other well-known tools.
• e.g. Wireshark
– Here you also have the program's behavior, so
you can add metadata and/or supplemental info.
• e.g. SSL/TLS auto-decryption
• You cannot steal a key from a packet dump, but
remember: you can run the program that uses the
private (or shared) key!
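Exporting to Wireshark could be as simple as writing recovered frames into a classic libpcap file. This sketch assumes that Replay has already produced (timestamp, Ethernet frame) pairs. How those pairs are extracted from the trace is not shown, and the frame below is a placeholder. The file layout is the standard 24-byte global header followed by a 16-byte header per packet.

```python
import io
import struct
from typing import BinaryIO, Iterable, Tuple

PCAP_MAGIC = 0xA1B2C3D4       # classic libpcap, microsecond timestamps
LINKTYPE_ETHERNET = 1


def write_pcap(out: BinaryIO, packets: Iterable[Tuple[float, bytes]],
               snaplen: int = 65535) -> None:
    """Write (timestamp, Ethernet frame) pairs in classic libpcap format."""
    # Global header: magic, version 2.4, thiszone, sigfigs, snaplen, link type.
    out.write(struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, LINKTYPE_ETHERNET))
    for ts, frame in packets:
        sec, usec = divmod(round(ts * 1_000_000), 1_000_000)
        data = frame[:snaplen]
        # Per-record header: seconds, microseconds, captured length, original length.
        out.write(struct.pack("<IIII", sec, usec, len(data), len(frame)))
        out.write(data)


# Hypothetical frame recovered from a Replay pass (60 zero bytes as a placeholder).
buf = io.BytesIO()
write_pcap(buf, [(1.5, bytes(60))])
pcap_bytes = buf.getvalue()
```

Writing `pcap_bytes` to a `.pcap` file should let Wireshark open it. Metadata from the trace, such as which process sent each packet, would need a richer format like pcapng.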
Possible Practical Uses (6)
• <<Place Entry Here>>
– I guess you can use it for other purposes.
– I hope many people will work on
this type of tracer.
Future Challenges / Conclusion
Challenge : Multicore (1)
• The original Record and Replay is not designed for
multiprocessing environments.
– Frequent inter-CPU communication makes the tracer slow.
– Almost all implementations restrict themselves to
1 CPU/thread (mine, too).
• But that doesn't mean it is impossible.
– Time-sharing
– Software emulation of MESI protocol
– Trace memory contents
Challenge : Multicore (2)
• Time-sharing
– Only one CPU runs at a time.
– Switch CPU execution with a timer to
simulate running multiple CPUs.
• Pros.
– Almost no synchronization required.
• Cons.
– More CPUs, less efficiency.
– Multithreading problems are difficult to reproduce
because this is not true multiprocessing.
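A minimal time-sharing sketch follows. It assumes the "timer" is a virtual one that counts guest steps rather than host time. With that choice, the interleaving is a pure function of the step count, and replay reproduces it without logging anything. With a host timer, each switch point would have to be logged like an interrupt. The vCPU callables are hypothetical.

```python
from typing import Callable, List


def time_share(vcpus: List[Callable[[], None]], quantum: int, total_steps: int) -> List[int]:
    """Run several virtual CPUs on one host thread, switching every `quantum` steps.

    Returns the schedule (which vCPU ran at each step). Because switching depends
    only on the step counter, a replay produces exactly the same interleaving.
    """
    schedule = []
    for step in range(total_steps):
        cpu = (step // quantum) % len(vcpus)
        vcpus[cpu]()
        schedule.append(cpu)
    return schedule


executed: List[int] = []
cpus = [lambda i=i: executed.append(i) for i in range(3)]   # three toy vCPUs
schedule = time_share(cpus, quantum=2, total_steps=8)
```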
Challenge : Multicore (3)
• Software Implementation of MESI protocol
– Memory coherency algorithm
– CPUs use this protocol (or its variants) to
keep memory/caches coherent.
– We can implement it using page-level protection.
– Lock a page to write to it.
• Pros.
– High efficiency when few pages are shared.
• Cons.
– Software implementation is quite slow.
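A page-granular MESI directory might look like the toy below. This is a hedged illustration, not a design from the slides. In a real VMM, the states would be enforced with page protection: no access for Invalid, read-only for Shared, and read/write for Exclusive or Modified. The state changes would happen in the fault handler. Each change that would fault is appended to `events`. That ordering is what a recorder would have to log for replay.

```python
from enum import Enum
from typing import Dict, List, Tuple


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class PageCoherence:
    """Toy page-granular MESI directory (hypothetical; one entry per guest page)."""

    def __init__(self, ncpus: int) -> None:
        self.ncpus = ncpus
        self.state: Dict[int, List[State]] = {}
        self.events: List[Tuple[str, int, int]] = []   # faulting accesses, in order

    def _page(self, page: int) -> List[State]:
        return self.state.setdefault(page, [State.INVALID] * self.ncpus)

    def read(self, cpu: int, page: int) -> None:
        st = self._page(page)
        if st[cpu] is not State.INVALID:
            return                                   # already readable: no fault
        others = [c for c in range(self.ncpus)
                  if c != cpu and st[c] is not State.INVALID]
        for c in others:
            st[c] = State.SHARED                     # M/E holders downgrade to S
        st[cpu] = State.SHARED if others else State.EXCLUSIVE
        self.events.append(("read", cpu, page))

    def write(self, cpu: int, page: int) -> None:
        st = self._page(page)
        if st[cpu] in (State.MODIFIED, State.EXCLUSIVE):
            st[cpu] = State.MODIFIED                 # E -> M without a fault
            return
        for c in range(self.ncpus):
            if c != cpu:
                st[c] = State.INVALID                # "lock" the page for this writer
        st[cpu] = State.MODIFIED
        self.events.append(("write", cpu, page))


pc = PageCoherence(2)
pc.read(0, 5)
pc.read(1, 5)
pc.write(1, 5)
pc.read(1, 5)
```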
Challenge : Multicore (4)
• Trace Memory contents
– Also trace memory contents read for shared pages.
• Pros.
– Can achieve high efficiency... maybe.
• Cons.
– It is not a perfect-information tracer.
(Which CPU has written this value?!)
– Memory trace is slow.
• A bandwidth monster may be required.
Challenge : 64-bit / Others
• x64 on x64 is very difficult.
– There are some ways, but none very efficient.
• SSE2 / reciprocal and square-root instructions
– These instructions are not required to return exact values,
and they are fast to run (this is a problem).
• Hypervisor again?
– Trace without portability, then convert the trace to a
portable one (using the same processor model).
– Not perfect, but a possible choice.
CAUTION : PATENTS
• Some of these techniques are patented!
– Record and Replay
– Optimization for Binary Translation based VMM.
– Difficult/Impossible to avoid these patents.
• However, all the patents I have found are
United States patents only, and I guess using this
tracer outside the US is not a problem.
– Be careful.
Conclusion
• I described how to build tracing-VMM for
x86 on x64.
• Using the proposed method, the trace log gets smaller
and the overhead gets lower too.
– However, proper tests (validation) are required
to check whether this is useful for reverse engineering.
• Many practical uses!
– Any others?
contact me at : li at livegrid dot org
Open Source Project : Niizh
will be available at http://niizh.org/
Thank you!
Any questions?

A New Tracer for Reverse Engineering - PacSec 2010

  • 1. A New Tracer for Reverse Engineering Niizh (Section 1b) : Background and Implementation (Work in Progress) Tsukasa Ooi (@a4lg)
  • 2. I... • will introduce a way to make reverse engineering more efficient... possibly. • Why "possibly"? – (Nov 2010) General-purpose OSes don't work yet. • Sorry, no live demo! – Some predictions are included.
  • 3. Related Topics • Reverse Engineering – especially dynamic analysis, debuggers and tracers. • Intel x86 (32-bit) architecture • Virtualization / Virtual Machine Monitor (VMM) – Record and Replay • Intrusion detection and analysis (e.g. honeypots) • Bug detection (e.g. fuzzing)
  • 4. Agenda • Drawbacks of instruction tracers • New tracing method based on Record and Replay • Tracing-VMM implementation on x64 • Partial Tests • (Possible) Practical use of this Tracer • Challenges
  • 5. Target Platform • Intel x86 (16/32-bit) architecture • PC/AT • General purpose OSes (Windows, Linux etc...)
  • 7. Dynamic Analysis • Analyze running programs – e.g. by intercepting the program's operations • Various tools – Debuggers • e.g. OllyDbg, IDA Pro... – Monitors • Process Monitor, Wireshark... – Tracers • API Monitor, OllyDbg, Process Stalker... • Today, I will talk about so-called tracers.
  • 8. Tracers (1) • Capture and save the information associated with a specific event. – Various granularities • instruction, basic block, function, system call... • Instruction tracing – REALLY easy to apply automatic analysis to (e.g. automated unpacking). – If you can trace the entire internal context at every instruction, you can acquire any information you want.
  • 9. Tracers (2) • But in early research, I found that most instruction tracers have some drawbacks: – Extremely slow • They hook every instruction execution, which makes them really slow. • 10x-1000x slowdown – Generate a huge amount of data • Several gigabytes per real-second (real-second: 1 second of execution without emulation). • They save a lot of information for each instruction. • Saving that information can also become a bottleneck.
  • 10. Tracers (3) • Can we solve these issues? • Major requirements: – Overhead: <100% – Trace size: <5MB/s • The theme is: How did I implement a VMM-based tracer that satisfies these requirements?
  • 11. Theory ― Record and Replay
  • 12. Record and Replay (0) • I thought I had discovered this method independently, but: – I hadn't found any related documents before. • ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay – I found that my new method is a variant of Record and Replay. – The two are closely related and difficult to separate. • So I'm going to describe Record and Replay together with my method.
  • 13. Record and Replay (1) • The method goes by several names: – VMware calls it "Record and Replay" – "Logging and Replay" / "Lockstep" • Execution in two passes (Record/Replay) • By exploiting a characteristic common to many machine architectures, it makes the trace output phenomenally small. – Normally, input from external hardware is not very frequent.
  • 14. Record and Replay (2) • Many architectures can be represented by this model: – Input (can be null) – Calculation / Process (+Internal Context) – Output (can be null) • Assume the output is uniquely determined by the internal context (via function $g$ below): $z_{n+1} = f(z_n, i_n)$, $o_{n+1} = g(z_{n+1})$ • [Diagram: Input → Calc/Proc (+Context) → Output]
  • 15. Record and Replay (3) • Saving all information is equivalent to saving every internal context $z_n$. – The output need not be saved, because we assume it is uniquely determined by the internal context. • Also save $z_0$ (the initial internal context). • Function $f$ (the calculation/process) must be a mathematical function. – Same input, same output. – No ambiguity. • [Diagram: Input → Calc/Proc (+Context) → Output]
  • 16. Record and Replay (4) • Focusing on dependencies – Input: depends on nothing. – Calculation / Process (+Context): depends on the input. • Now you can see... – The internal context depends only on the previous internal state and the input sequence, so starting from $z_0$ you can recover every context from that information. • [Diagram: Input → Calc/Proc (+Context)]
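A toy sketch of this model in Python. The state layout and the rules inside `f` and `g` are invented purely for illustration (hypothetical, not the tracer's real VM); the point is only that `f` and `g` are pure functions, so the same $z_0$ and input sequence always reproduce the same contexts and outputs.

```python
from typing import List, Tuple

# Hypothetical toy context z_n: (accumulator, number of steps taken).
State = Tuple[int, int]


def f(z: State, i: int) -> State:
    """Transition z_{n+1} = f(z_n, i_n): deterministic, no hidden state."""
    acc, steps = z
    return ((acc * 31 + i) & 0xFFFFFFFF, steps + 1)


def g(z: State) -> int:
    """Output o_{n+1} = g(z_{n+1}): depends only on the internal context."""
    acc, _ = z
    return acc & 0xFF


def run(z0: State, inputs: List[int]) -> Tuple[List[State], List[int]]:
    """Apply f to each input in turn, collecting every context and output."""
    states, outputs = [z0], []
    z = z0
    for i in inputs:
        z = f(z, i)
        states.append(z)
        outputs.append(g(z))
    return states, outputs


if __name__ == "__main__":
    # prints ([(0, 0), (1, 1), (33, 2), (1026, 3)], [1, 33, 2])
    print(run((0, 0), [1, 2, 3]))
```

Because nothing outside `(z, i)` influences `f`, saving $z_0$ plus the inputs is enough to regenerate every intermediate context.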
  • 17. Record and Replay (5) • Pass 1: Record – Capture and save the initial context. – Run the virtual machine. • It accepts input from external hardware. – Capture and save all inputs. • This does not dump the internal context, but you can recover it from this small amount of data. • [Diagram: InitState, Input, Calc/Proc (+Context), Trace log]
  • 18. Record and Replay (6) • Pass 2: Replay – Recover the initial context from the trace log. – Run the virtual machine. • But read the trace log to supply input data. • So it does not accept new hardware input. – Read the internal context from the running virtual machine. • It is very similar to the Record pass! • [Diagram: InitState, Trace log, Input, Calc/Proc (+Context)]
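A minimal sketch of the two passes over a toy VM, assuming a single hypothetical `step` function as the deterministic transition and a Python `random.Random` standing in for external hardware. Record saves $z_0$ and every input; Replay feeds the logged inputs back and exposes each recovered context through an `observe` hook, which is where analysis would attach. All names here are illustrative, not the tracer's actual interfaces.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


def step(ctx: int, value: int) -> int:
    """Hypothetical deterministic transition f(z, i) of a toy VM."""
    return (ctx * 1103515245 + value + 12345) & 0x7FFFFFFF


@dataclass
class TraceLog:
    init_ctx: int                                    # z_0
    inputs: List[int] = field(default_factory=list)  # i_0, i_1, ...


def record(init_ctx: int, n_steps: int, device: Callable[[], int]) -> Tuple[TraceLog, int]:
    """Pass 1: save z_0, run the VM, and log every value read from the device."""
    log = TraceLog(init_ctx)
    ctx = init_ctx
    for _ in range(n_steps):
        value = device()          # external, unpredictable input
        log.inputs.append(value)  # the only per-step data that is saved
        ctx = step(ctx, value)
    return log, ctx


def replay(log: TraceLog, observe: Optional[Callable[[int, int], None]] = None) -> int:
    """Pass 2: restore z_0 and feed logged inputs instead of live hardware."""
    ctx = log.init_ctx
    for n, value in enumerate(log.inputs):
        ctx = step(ctx, value)
        if observe is not None:
            observe(n, ctx)       # analysis hook: the full context is visible here
    return ctx


if __name__ == "__main__":
    hw = random.Random()          # stands in for unpredictable hardware
    log, final = record(42, 1000, lambda: hw.randrange(256))
    contexts: List[int] = []
    assert replay(log, lambda n, c: contexts.append(c)) == final
    print(len(log.inputs), "inputs logged;", len(contexts), "contexts recovered")
```

The log holds one small integer per step, yet every context the Record pass went through can be regenerated, as many times as needed and with different `observe` hooks.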
  • 19. Cons. (1) • It seems to be just running twice, but: – You have saved the trace log, so you can run the Replay pass anytime, anywhere, as often as you want. • Each Replay pass extracts part of the information. • If you need more information, just run the Replay pass again with a different configuration. – If needed, you can run Replay passes in parallel. • This can shorten automated analysis. (In practice, you may run into dependency issues.)
  • 20. Cons. (2) • (Cont.) – The two passes are independent. • Even if you run a slow analysis, the Record pass keeps running as before. • You can use the Replay pass for slow, verbose analyses that are difficult to apply directly (such as buffer-overflow detection). • This method has an affinity for reverse engineering. – The trace log contains nearly *everything* that happens in the virtual machine!
  • 21. Real World Example (1) • VMware Workstation (6 or later) – Record/Replay feature • Record execution, then replay it just like a video and/or use it for debugging. – It is proprietary and not robust enough, but it is an actual implementation of the Record and Replay method. – Trace log: normally 1-10MB/s
  • 22. Real World Example (2) • VMware Workstation (6 or later) – But... • It's still "VMware". • Its debug interface is insufficient. – If the debug interface were well equipped, you could use it for reverse engineering. • Other examples: – ReplayDIRECTOR (Java debugging tool) – Jockey (http://home.gna.org/jockey/) • User-mode recording/debugging library for Linux
  • 23. Applying to x86 (1) • All deterministic elements could be considered one type of input, but that would be inefficient. – Do you want to record a long series of null inputs?! • Classify the types of so-called "inputs": – Nondeterministic Input(s) – Interrupt(s) • These are just names; they don't literally mean what they say. • [Diagram: Input, Trace, Calc/Proc (+Internal State), Initial State]
  • 24. Applying to x86 (2.1) • Nondeterministic Inputs – The point at which the internal context becomes undetermined can be determined uniquely (e.g. the IN instruction on x86). – But you cannot determine the actual value or contents without running it. – Save the actual value or contents, but not the timing. • We can determine the timing from the recent internal context and interrupts.
  • 25. Applying to x86 (2.2) • Interrupts – The timing is not uniquely predictable. – And the actual content can be nondeterministic. – In this case, trace the timing. Additionally, if the actual content of the interrupt is nondeterministic, trace it too. • e.g. the interrupt vector number (hardware interrupt) • The most important thing is: – Based on this classification, we have to classify every element in the virtual machine.
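A toy replayer illustrating the two log-entry classes. It assumes a hypothetical timing key, the number of instructions executed so far (`icount`); how a real tracer pins down interrupt timing is not specified here. Nondeterministic Inputs carry only a value, because the `in` instruction itself marks when it is consumed; Interrupts carry both timing and vector. The "program" is a hypothetical list of mnemonics with no branches, and a delivered interrupt is only recorded, not dispatched to a handler.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass(frozen=True)
class NondetInput:
    value: int   # only the value: the instruction stream itself fixes *when* it is read


@dataclass(frozen=True)
class Interrupt:
    icount: int  # hypothetical timing key: instructions executed before delivery
    vector: int  # hardware interrupt vector number (nondeterministic content)


def replay(program: List[str], nondet: List[NondetInput],
           irqs: List[Interrupt], steps: int) -> List[Tuple[int, str]]:
    """Toy replayer returning (icount, event) pairs showing where each log entry is consumed."""
    values: Deque[NondetInput] = deque(nondet)
    pending: Deque[Interrupt] = deque(sorted(irqs, key=lambda e: e.icount))
    events: List[Tuple[int, str]] = []
    for icount in range(steps):
        # Interrupt: timing is unpredictable, so it is taken from the log.
        # (A real VMM would also divert control flow to the handler here.)
        while pending and pending[0].icount == icount:
            events.append((icount, f"irq {pending.popleft().vector:#x}"))
        insn = program[icount % len(program)]  # no branches in this toy
        if insn == "in":
            # Nondeterministic input: the IN instruction says when, the log says what.
            events.append((icount, f"in -> {values.popleft().value}"))
    return events
```

For example, `replay(["nop", "in", "nop"], [NondetInput(7), NondetInput(9)], [Interrupt(4, 0x21)], 6)` consumes the first value at instruction 1, and at instruction 4 delivers vector 0x21 before the second `in` reads 9.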
  • 26. Applying to x86 (3.1) • Modeling ― VM-Internal Disk – Assume the VM-internal disk is reliable, and record the initial disk image. – Almost every element is deterministic, except the interrupts the disk generates. • The content read equals the content last written. • But the timing of ATA interrupts cannot be predicted strictly, so we treat them as Interrupts.
  • 27. Applying to x86 (3.2) • Modeling ― Mouse, Keyboard, Network – These are unpredictable external inputs. – Input from these devices uses both x86 interrupts and I/O port operations. – So treat them as both (Interrupts and Nondeterministic Inputs). – Network packets you send are recovered from the internal context.
  • 28. Applying to x86 (3.3) • Modeling ― Time Stamp Counter (CPU) – The clock count since the computer was reset; its value can be read with the RDTSC instruction. – Treat it as a Nondeterministic Input. – Even though the value physically lives inside the CPU, you should treat it as nondeterministic when it produces unpredictable results. • Even if you could model it as deterministic, the implementation would be inefficient. • NOT treating it as deterministic improves VM emulation efficiency.
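A small sketch of treating RDTSC as a Nondeterministic Input. `TscSource` is a hypothetical wrapper, and the host's `time.perf_counter_ns()` merely stands in for the real counter. In Record mode each value the guest reads is appended to the log (value only, no timing); in Replay mode the same values are handed back in order.

```python
import time
from typing import List, Optional


class TscSource:
    """Hypothetical wrapper that treats RDTSC results as Nondeterministic Input."""

    def __init__(self, log: Optional[List[int]] = None) -> None:
        self.replaying = log is not None
        self.log: List[int] = list(log) if log is not None else []
        self._pos = 0

    def rdtsc(self) -> int:
        if self.replaying:
            value = self.log[self._pos]  # Replay: return exactly what the guest saw
            self._pos += 1
            return value
        value = time.perf_counter_ns()   # Record: host clock stands in for the real TSC
        self.log.append(value)           # save the value only, not its timing
        return value
```

Replaying never consults the host clock, so the guest observes identical counter values no matter how slow the analysis running alongside it is.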
  • 29. • Modeling ― CPU exception – Almost all exceptions are deterministic including their timing. • Page Fault occurs because the CPU has accessed the invalid memory address. – So this is not even the input. • Modeling ― Not determinable behavior of CPU – After some CPU operation, the part of internal context can be nondeterministic. (Value/behavior is undefined by the architecture.) – Consider this Nondeterministic Inputs. Applying to x86 (3.4)
  • 30. • Modeling ― Inexact Arithmetic Operation – Transcendental instruction such as FSINCOS, FATAN does not define the actual value because specifying the actual value is very difficult. – The minimum information that can be used to recover the original value is considered Nondeterministic Input. • Likewise, we have to model *everything* – Implementation is relatively difficult. Applying to x86 (3.5)
  • 31. Applying to x86 (4) • Considering X nondeterministic? – Increase number of hooks. – Trace log get bigger, execution get slower. – Fewer is great. • I thought these nondeterministic events are much, much fewer than normal instructions so there s no problem. – But it was wrong.
  • 32. What do you think? • Is this instruction deterministic? XOR edx, edx – As you know, this instruction just clears the edx register. – But the answer is No. • Many ordinary operations make part of the internal context nondeterministic. – The culprit is EFLAGS.
  • 33. The curse of EFLAGS? (1) • Let's look inside. – edx IS zero. On the other hand, EFLAGS.AF is updated to "?". – Intel's manual says this value is undefined (it can vary). – Figure: before XOR edx, edx, edx = xxx...xxx and OF SF ZF AF PF CF = x x x x x x; at the next instruction, edx = 000...000 and OF=0, SF=0, ZF=1, AF=?, PF=1, CF=0.
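The flag effects in the figure can be reproduced with a small emulation of 32-bit XOR. This is a sketch for illustration only. `None` marks the one flag the architecture leaves undefined, and the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def parity_even(value: int) -> bool:
    """x86 PF looks only at the low byte of the result."""
    return bin(value & 0xFF).count("1") % 2 == 0


def xor32(a: int, b: int) -> Tuple[int, Dict[str, Optional[int]]]:
    """Emulate 32-bit XOR; None marks a flag the architecture leaves undefined."""
    result = (a ^ b) & 0xFFFF_FFFF
    flags: Dict[str, Optional[int]] = {
        "OF": 0,                           # always cleared by XOR
        "SF": result >> 31,                # sign bit of the result
        "ZF": int(result == 0),
        "AF": None,                        # undefined: real CPUs may leave 0 or 1
        "PF": int(parity_even(result)),
        "CF": 0,                           # always cleared by XOR
    }
    return result, flags
```

`xor32(x, x)` yields `0` with OF=0, SF=0, ZF=1, PF=1, CF=0, matching the figure. AF is the only flag a CPU-model-independent trace has to treat as unknown.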
  • 34. The curse of EFLAGS? (2) • This is not the end! – Other frequently used instructions do this as well. – According to profiling, 10-15% of instructions leave part of EFLAGS undefined! (0 = cleared, M = modified according to the result, ? = undefined)
    Instruction class                   OF SF ZF AF PF CF
    AND, OR, XOR, TEST (logical)        0  M  M  ?  M  0
    MUL, IMUL (multiplication)          M  ?  ?  ?  ?  M
    DIV, IDIV (division)                ?  ?  ?  ?  ?  ?
    SHL, SHR, SAL, SAR by count (shift) ?  M  M  ?  M  ?
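The "10-15% of instructions" figure comes from profiling an instruction stream against a table like the one above. The sketch below encodes the slide 34 table, treating the shift row as the variable-count case the slide shows, and computes that fraction for a list of mnemonics. The dictionary and function names are hypothetical, and real profiling would run over the executed instruction trace rather than a hand-written list.

```python
from typing import Dict, FrozenSet, Iterable

ALL6 = frozenset({"OF", "SF", "ZF", "AF", "PF", "CF"})

# Flags left undefined per instruction class, transcribed from the slide 34 table.
UNDEFINED_FLAGS: Dict[str, FrozenSet[str]] = {
    **dict.fromkeys(("and", "or", "xor", "test"), frozenset({"AF"})),
    **dict.fromkeys(("mul", "imul"), frozenset({"SF", "ZF", "AF", "PF"})),
    **dict.fromkeys(("div", "idiv"), ALL6),
    **dict.fromkeys(("shl", "shr", "sal", "sar"), frozenset({"OF", "AF", "CF"})),
}


def undefined_fraction(mnemonics: Iterable[str]) -> float:
    """Fraction of executed instructions that leave at least one flag undefined."""
    total = hits = 0
    for m in mnemonics:
        total += 1
        if UNDEFINED_FLAGS.get(m.lower()):
            hits += 1
    return hits / total if total else 0.0
```

For the stream `mov, xor, add, mul, jz`, two of five instructions (XOR and MUL) leave flags undefined, giving 0.4.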
  • 35. The curse of EFLAGS? (3) • Not "far, far rarer" at all! – Even at 10% of instructions, the hooking overhead cannot be ignored. – Alternatively, we can choose not to trace EFLAGS, for instance by overwriting EFLAGS with a deterministic value. But... • Updating flags (POPF) is extremely slow: 24-25 clocks on the Intel Nehalem microarchitecture (Core i7). – To avoid this cost, we need to keep these undefined values from affecting execution at all.
  • 36. The implementation problem (1) • Public Record and Replay implementations ignore this issue! – They just restrict the processor model: if we record a program on processor model A, we must replay it on exactly the same model. – This prevents distributed analysis. – Normally, programs don't depend on these undefined (nondeterministic) values. • But technically, a single nondeterministic bit can cause chaos.
  • 37. The implementation problem (2) • What is RIGHT? – We cannot know exactly which CPU model's behavior is "right". – I want to integrate all information into one trace, with no more compatibility/portability problems. • The current situation is no good for reverse engineering. – I want robustness!
  • 38. EFLAGS: Lazy Evaluation (1) • EFLAGS and programs have these characteristics: – Over 80% of updated flags are simply discarded. • We want to trace *everything*, but it is worthless to trace a value that is never used. – Flag updates and flag uses are adjacent in most cases. • e.g. Compare → Jump Conditionally • Intel CPUs exploit this pattern (macro-fusion)! – How about lazy evaluation? • Trace a nondeterministic EFLAGS value only when it is used.
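Lazy flag evaluation is a common emulator technique. Instead of computing flags after every instruction, the emulator remembers the last flag-producing operation and its operands, and computes a flag only when something reads it. The sketch below applies that idea to tracing. An undefined flag (AF after XOR) is logged only at the moment it is read, so flags that are overwritten without being read cost nothing. It models only CMP and XOR, and `host_af` is a hypothetical hook standing in for the value the real CPU left behind.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MASK32 = 0xFFFF_FFFF


@dataclass
class LazyFlags:
    """Remember the last flag-producing operation; compute flags only when read."""
    host_af: Callable[[], int]          # hypothetical hook: AF as left by the real CPU
    op: str = "none"
    a: int = 0
    b: int = 0
    result: int = 0
    trace: List[int] = field(default_factory=list)   # logged nondeterministic values

    def cmp(self, a: int, b: int) -> None:
        """CMP: record operands only; no flag is computed here."""
        self.op, self.a, self.b = "sub", a & MASK32, b & MASK32
        self.result = (self.a - self.b) & MASK32

    def xor(self, a: int, b: int) -> int:
        """XOR: AF becomes undefined, but nothing is logged yet."""
        self.op, self.a, self.b = "xor", a & MASK32, b & MASK32
        self.result = self.a ^ self.b
        return self.result

    def read(self, flag: str) -> int:
        """Evaluate a flag on demand; log it only if its value is undefined."""
        if self.op == "none":
            raise ValueError("no flag-producing instruction executed yet")
        if flag == "ZF":
            return int(self.result == 0)
        if flag == "CF":
            return int(self.op == "sub" and self.a < self.b)   # XOR clears CF
        if flag == "AF":
            if self.op == "xor":
                value = self.host_af()        # undefined: take the CPU's actual value
                self.trace.append(value)      # and record it at the point of use
                return value
            return ((self.a ^ self.b ^ self.result) >> 4) & 1  # borrow out of bit 3
        raise NotImplementedError(flag)
```

In the usage below, an XOR whose AF is never read leaves nothing in the trace. Only the AF actually consumed after the second XOR is logged.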
  • 39. EFLAGS: Lazy Evaluation (2) • Current implementation: – JIT compilation with static evaluation (to make programs run faster). – Evaluate each instruction block, • from the instruction after a jump to the next unconditional control transfer (jump instruction or exception). • Scan each block forward. – Track the propagation of virtual EFLAGS: • whether each flag is deterministic (initial value: no), • the last instruction that updated the flag. • We use heuristics.
  • 40. EFLAGS: Lazy Evaluation (3) • (cont.) – If an instruction in the block depends on these flags and the virtual flag satisfies either condition below, we simply treat the value as nondeterministic (and trace it): • the value of the virtual flag is nondeterministic; • the value is deterministic, but the instruction that updated it is too old (32 bytes / 8 instructions or more earlier). • Currently, this is very effective. – I found that almost all flag tracing happens during interrupt handling / context switches.
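The block scan described on slides 39 and 40 can be sketched as a forward pass over decoded instructions. The pass tracks, per flag, whether it is deterministic and which instruction last wrote it. This sketch makes two assumptions. First, "32 bytes / 8 instructions" is read as two separate limits, either of which makes a value too old. Second, the `Insn` record with its read/write flag sets is a hypothetical decoder output, not Niizh's data structure.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

FLAGS = ("OF", "SF", "ZF", "AF", "PF", "CF")
MAX_AGE_INSNS = 8    # writer 8 or more instructions earlier is "too old"
MAX_AGE_BYTES = 32   # writer 32 or more bytes earlier is "too old"


@dataclass(frozen=True)
class Insn:
    addr: int
    reads: FrozenSet[str] = frozenset()
    writes_defined: FrozenSet[str] = frozenset()
    writes_undefined: FrozenSet[str] = frozenset()


@dataclass
class VirtualFlag:
    deterministic: bool = False   # initial value: No
    writer_index: int = -1        # index of the last instruction that updated it
    writer_addr: int = -1


def flags_to_trace(block: List[Insn]) -> List[Tuple[int, List[str]]]:
    """Forward scan of one block: which flag reads must be traced, and where?"""
    state = {f: VirtualFlag() for f in FLAGS}
    result = []
    for i, insn in enumerate(block):
        needed = []
        for f in sorted(insn.reads):              # reads happen before this insn's writes
            v = state[f]
            too_old = (i - v.writer_index >= MAX_AGE_INSNS
                       or insn.addr - v.writer_addr >= MAX_AGE_BYTES)
            if not v.deterministic or too_old:
                needed.append(f)
        if needed:
            result.append((i, needed))
        for f in insn.writes_defined:
            state[f] = VirtualFlag(True, i, insn.addr)
        for f in insn.writes_undefined:
            state[f] = VirtualFlag(False, i, insn.addr)
    return result
```

For `xor edx, edx; jz ...; pushf`, the JZ read of ZF needs no tracing, because ZF is deterministic and was written one instruction earlier. The PUSHF reads every flag, but only AF (undefined after XOR) has to be traced.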
  • 41. Record and Replay: Conclusion • Using Record and Replay, we can reduce both the trace log size and the tracing overhead. • Using my improved method, we can acquire a robust trace log on the x86 platform.
  • 43. Implementation • I am implementing a VMM-based tracer – to run general-purpose OSes. • But this was not a good idea: because of its complexity, I couldn't finish the VMM (as of Nov 2010). – It uses binary translation: • read guest instructions and transform them to run on the host platform. – I chose the x64 platform to implement the VMM. • There are several reasons why x64 is good for binary-translation-based x86 emulation.
  • 44. x86 on x64 (1) • x64 is a 64-bit extension of the x86 architecture. – AMD, Intel and VIA implement the x64 extension. – Very similar instruction format. – Some extensions: • more general purpose and XMM registers (8→16) • new addressing modes (64-bit, RIP [program counter] relative) • Many of these features make implementing a binary-translation-based VMM easier.
  • 45. x86 on x64 (2.1) • Benefit: 32-bit registers and clamping – The general purpose register format builds on the original (narrower registers share the lower bits). • e.g. ax (16-bit), eax (32-bit), rax (64-bit) – If an instruction's destination is a 32-bit register, the upper 32 bits of the corresponding 64-bit register are cleared! – Figure (16-bit write): MOV eax, 0x01234567 gives eax = 0123 4567; then MOV ax, 0x1234 gives eax = 0123 1234 (the upper half is preserved).
  • 46. x86 on x64 (2.1) • (cont.) – Figure (32-bit write): MOV rax, 0x0123456789abcdef gives rax = 01234567 89abcdef; then MOV eax, 0x12345678 gives rax = 00000000 12345678 (the upper 32 bits are cleared).
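The difference between the two figures can be captured in a few lines: 8/16-bit writes merge into the old value, while 32-bit writes zero-extend. This is a sketch of the architectural rule only. The function name is hypothetical, and it models only low-byte 8-bit writes (AL, not AH).

```python
MASK64 = (1 << 64) - 1


def write_gpr(old: int, value: int, width: int) -> int:
    """New 64-bit register value after writing `value` to its low `width` bits.

    Only the low-byte form (AL, not AH) is modeled for 8-bit writes.
    """
    if width == 64:
        return value & MASK64
    if width == 32:
        return value & 0xFFFF_FFFF          # 32-bit writes zero-extend into bits 63..32
    if width in (8, 16):
        low = (1 << width) - 1
        return (old & MASK64 & ~low) | (value & low)   # 8/16-bit writes keep the rest
    raise ValueError(f"unsupported width: {width}")
```

This zero-extension is why a guest 32-bit register held in a host 64-bit register always stays in the range 0x0000_0000-0xffff_ffff, a property the addressing tricks on the following slides rely on.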
  • 47. x86 on x64 (2.2) • Benefit: More registers (GPR/XMM) – 8→16 (16 additional registers in total, counting the XMM registers). – The emulator's context can be kept in registers without clobbering the guest's registers. – Figure: rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi and xmm0-xmm7 (x86) plus r8-r15 and xmm8-xmm15 (new in x64).
  • 48. x86 on x64 (2.2) • (cont.) Actual register mapping table (host register: contents). For memory/cache optimization, some registers are relocated.
    rax: eax          r8:  cs.base
    rcx: ecx          r9:  es.base
    rdx: edx          r10: emuinfo
    rbx: ds.base      r11: ebx
    rsp: stack        r12: esp
    rbp: ebp          r13: tmp2
    rsi: esi          r14: ss.base
    rdi: tmp1         r15: edi
    xmm0: xmm0        xmm8:  fs.base
    xmm1: xmm1        xmm9:  gs.base
    xmm2: xmm2        xmm10: tmp3
    xmm3: xmm3        xmm11: tmp4
    xmm4-7: xmm4-7    xmm12-15: not used
  • 49. x86 on x64 (2.2) • (cont.) – XMM registers are sometimes awkward to use, but their contents can be moved to and from GPRs with the MOVQ instruction.
  • 50. x86 on x64 (2.3.1) • Benefit: Retained addressing format – Some addressing modes were added, but the addressing format is still x86-based. – x86 has complex addressing modes: • e.g. two additions and one shift: [esi+edx*4+123] • We can use this to isolate memory accesses! – Address translation: [segbase+offset] • All memory accesses are segbase-relative (segbase holds the 64-bit address of the segment base). – This achieves memory isolation, • like Google Native Client for x64.
  • 51. x86 on x64 (2.3.2) • Benefit: Retained addressing format – (e.g. 1): inc [ds:ecx] → inc [rbx+rcx] • rbx: base address of the DS segment. • rcx: guest ECX register. – Wait a minute: ecx is a 32-bit register, but we use rcx, a 64-bit register! (Is that really OK?) • No problem. As described before, the results of 32-bit operations are clamped (zero-extended). • So we can guarantee that the value of rcx is in the 32-bit range (0x0000_0000-0xffff_ffff).
  • 52. x86 on x64 (2.3.3) • Benefit: Retained addressing format – (e.g. 2 [wrong]): inc [ds:ecx+edx] → inc [rbx+rcx+rdx] • An x64 memory operand allows at most one base and one index register, so store the intermediate result in a temporary register. – (e.g. 2 [correct]): inc [ds:ecx+edx] → lea edi, [rcx+rdx] ; inc [rbx+rdi] • edi/rdi: temporary register • Otherwise the same as the first example. – I take the best encoding x64 offers: • store a 64-bit address computation into a 32-bit register! • This is also a valid encoding: the address is automatically clamped and the instruction is shorter.
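Slides 51 and 52 can be combined into a tiny operand translator that emits Intel-syntax strings. This is a sketch, not Niizh's translator. It assumes the host register assignment shown on slide 48 (DS base in rbx, guest ECX in rcx, tmp1 in rdi, and so on). It handles only CS/DS/ES/SS, because FS/GS bases live in XMM registers and would first need a MOVQ. Function and table names are hypothetical.

```python
from typing import List, Optional, Tuple

# Host register assignment as read from the slide 48 mapping table (assumption).
GUEST_GPR = {"eax": "rax", "ecx": "rcx", "edx": "rdx", "ebx": "r11",
             "esp": "r12", "ebp": "rbp", "esi": "rsi", "edi": "r15"}
SEG_BASE = {"cs": "r8", "ds": "rbx", "es": "r9", "ss": "r14"}  # fs/gs live in xmm8/xmm9
TMP1, TMP1_32 = "rdi", "edi"


def _signed32(disp: int) -> int:
    """Guest disp32 as the sign-extended value an x64 disp32 encodes."""
    disp &= 0xFFFF_FFFF
    return disp - (1 << 32) if disp & 0x8000_0000 else disp


def _fmt(terms: List[str], disp: int) -> str:
    text = "+".join(terms)
    if disp > 0:
        text += f"+{disp}"
    elif disp < 0:
        text += f"-{-disp}"
    return f"[{text}]"


def translate_mem(seg: str, base: Optional[str] = None, index: Optional[str] = None,
                  scale: int = 1, disp: int = 0) -> Tuple[List[str], str]:
    """Translate guest [seg: base + index*scale + disp] into (prelude, host operand)."""
    segbase = SEG_BASE[seg]
    disp = _signed32(disp)    # any 32-bit wrap is absorbed by the aliased mapping
    idx = None
    if index is not None:
        idx = GUEST_GPR[index] + (f"*{scale}" if scale != 1 else "")
    if base is not None and idx is not None:
        # segbase + base + index would need three registers: not encodable.
        # Compute the guest offset first; a 32-bit LEA destination truncates it.
        lea = f"lea {TMP1_32}, {_fmt([GUEST_GPR[base], idx], disp)}"
        return [lea], _fmt([segbase, TMP1], 0)
    terms = [segbase] + [r for r in (GUEST_GPR.get(base or ""), idx) if r]
    return [], _fmt(terms, disp)
```

`translate_mem("ds", base="ecx")` gives `[rbx+rcx]`, as in e.g. 1. `translate_mem("ds", base="ecx", index="edx")` gives the prelude `lea edi, [rcx+rdx]` and the operand `[rbx+rdi]`, as in e.g. 2 [correct]. Single-register operands with a displacement skip the LEA and rely on the aliased address range of slides 54-55 to absorb overflow.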
  • 53. x86 on x64 (2.4.1) • Benefit: Huge memory range – 64-bit address width • 48-bit logical addresses are valid (sign-extended to 64 bits). • 0x0000_1234_5678 → 0x0000_0000_1234_5678 • 0x8000_1234_5678 → 0xffff_8000_1234_5678 – We can place the data/code the VMM uses outside the guest-accessible region. • With x86 on x86, host and guest data had to share the same address space, which required address compression. • This increases VMM speed.
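The sign extension in the two examples is the standard "canonical address" rule: bits 63..48 must copy bit 47. A minimal sketch (the function name is hypothetical):

```python
MASK64 = (1 << 64) - 1
LOW48 = (1 << 48) - 1


def canonical(addr48: int) -> int:
    """Sign-extend a 48-bit virtual address to its 64-bit canonical form."""
    addr48 &= LOW48
    if addr48 & (1 << 47):              # bit 47 set: bits 63..48 become all ones
        addr48 |= MASK64 ^ LOW48
    return addr48
```

The two slide examples map exactly as shown: `0x0000_1234_5678` stays in the low half, and `0x8000_1234_5678` becomes `0xffff_8000_1234_5678` in the high half.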
  • 54. x86 on x64 (2.4.2) • Benefit: Huge memory range – But allocating just 4GB is not enough: the result of an address calculation can overflow or underflow. – In 32-bit mode on x86, address calculation uses 32-bit precision and overflow/underflow is ignored, so the lower 32 bits alone determine the memory address actually accessed. – So we set up the page tables to satisfy: equal lower 32 bits == same physical address.
  • 55. x86 on x64 (2.4.3) • Benefit: Huge memory range – Allocate a virtual memory region. – To account for address overflow, we allocate up to a 44.5GB range of virtual memory. • The red and blue areas (in the figure) map exactly the same physical region. • We use the page tables to achieve this. – Figure: a 44.5GB range, divided into 42.25GB and 2.25GB parts.
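The aliasing argument of slides 54 and 55 can be checked numerically. The guest CPU wraps the effective address modulo 2^32. The translated host instruction does not wrap. However, if the page tables make every host address whose offset from segbase has the same low 32 bits map to the same physical byte, both end up at the same place. The sketch below models that mapping as arithmetic rather than real page tables. The segbase value and function names are arbitrary assumptions.

```python
GUEST_SPACE = 1 << 32


def guest_address(base: int, index: int, scale: int, disp: int) -> int:
    """What 32-bit x86 computes: the effective address wraps modulo 2**32."""
    return (base + index * scale + disp) % GUEST_SPACE


def host_address(segbase: int, base: int, index: int, scale: int, disp: int) -> int:
    """What the translated 64-bit instruction computes: no wrap-around."""
    return segbase + base + index * scale + disp


def aliased_offset(host_va: int, segbase: int) -> int:
    """Guest offset reached through the aliased mapping: only the low 32 bits
    of (host_va - segbase) select the physical byte."""
    return (host_va - segbase) % GUEST_SPACE
```

For example, with ECX = 0xfffffff0 and displacement 0x20, the host address lands 4GB + 0x10 past segbase. Through the alias, that is the same physical byte as guest offset 0x10, which is exactly where the wrapping guest access goes.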
  • 56. x86 on x64 (2.4.4) • Benefit: Huge memory range – Allocate a virtual memory region for each segment and/or segment access-control setting. • On a segment switch, just change the base address. – Figure: segment base registers (cs.base, ds.base, es.base, ss.base) and regions labeled data3, code0, data3, code3.
  • 58. x86 on x64 (2.5.1) • Benefit: Simplified architecture – The x64 architecture is relatively simple, which makes implementing a Type-2 VMM easier. • Only two interrupt handler types: – Interrupt Gate and Trap Gate • Segmentation is now a mere façade. – Flat memory model for CS, DS, ES and SS. – We replace the IDT (interrupt vector table) to use a VM-specific context. • PatchGuard-compatible! • Nearly stealthy, but cannot hook system calls.
  • 59. x86 on x64 (2.5.2) • Benefit: Simplified architecture – Pass interrupts through. • This can be done safely with IDT switching. • There is some overhead. – Figure (a summary; the actual implementation is a bit more complicated): the OS side (OS Kernel, OS IntHandler) and the VM side (VM Entry, VM Kernel, VM IntHandler, VM Trampoline), with IDT switches between them.
  • 60. x86 on x64 (3) • Binary translation is implemented with these techniques. – But it is still incomplete. • To trace the timing of events, the following information is required: – the value of a branch counter (a software implementation is possible); – the current program counter (IP, EIP); – the repeat count (CX, ECX), • only when a REP instruction was executing.
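Why are these three values needed? EIP alone repeats on every loop iteration. The branch counter tells iterations apart. Inside a REP instruction, execution stays at one EIP without branching, so ECX identifies the exact iteration. The sketch below shows a hypothetical event timestamp and the replay-side check for when to inject a recorded interrupt. It is not Niizh's actual log format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EventTime:
    """Position in the instruction stream at which an asynchronous event occurred."""
    branch_count: int            # branches executed since recording started
    eip: int                     # guest program counter
    ecx: Optional[int] = None    # remaining REP count, only inside a REP instruction


def reached(now: EventTime, target: EventTime) -> bool:
    """Replay check: inject the recorded event once the guest is at the same point."""
    if (now.branch_count, now.eip) != (target.branch_count, target.eip):
        return False
    return target.ecx is None or now.ecx == target.ecx
```

During replay, the emulator would compare its current position against the next logged `EventTime` (for example, at block boundaries and on each REP iteration) and deliver the interrupt when `reached` returns true.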
  • 61. Everything in Ring 0 • Is privilege isolation required? – Dynamically generated code is safe and well isolated, so everything can run in kernel mode (Ring 0). • This allows a low-overhead implementation. • The current implementation does this. – If this is considered too dangerous, the code can also run in user mode (Ring 3).
  • 63. Trace size test (1.1) • Trace log size required – DLX Linux bundled with Bochs 2.4.5 • From computer reset to the login screen. • 52,217,403 instructions (without emulation: 53 sec) – Specs • 1 MIPS (1,000,000 instructions/sec) • 32MB memory, 10MB HDD – Bochs was used to generate an instruction/memory trace, which was then converted according to each method.
  • 64. Trace size test (1.2) • Trace log size required – The size of the initial context is not included. – Devices were modeled in the Bochs emulator to estimate the required trace log size. – Because the model is simplified, the sizes are estimates (not exact values).
  • 65. Trace size test (1.3) • Methods (including comparisons) – Raw: text-format instruction/memory trace generated by Bochs. – Verbose: normal tracer (like OllyDbg). – Dumb: Record and Replay plus memory monitoring. – RnR (1): Record and Replay (tracing EFLAGS). – PROPOSAL: improved Record and Replay method. – RnR (2): Record and Replay (IGNORING EFLAGS).
  • 66. Trace size test (2.1)
    Method      Size (bytes)      Approx. size
    Raw         7,178,948,236     6.68GB
    Verbose     > 419,430,400     > 400MB
    Dumb        60,713,538        57.90MB
    RnR (1)     6,932,542         6.61MB
    PROPOSAL    389,013           380KB
    RnR (2)     31,788            31KB
  – This table shows that PROPOSAL generates only about 1/1,000 of the trace log of the Verbose tracer. Record and Replay ignoring EFLAGS (RnR (2)) is smaller than PROPOSAL, but it has low portability.
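The sizes can also be normalized per guest instruction (52,217,403 instructions for the DLX Linux boot), which makes the gap between methods easier to read. The sketch below just redoes that arithmetic from the table. The Verbose figure is only a lower bound, so ratios against it are lower bounds too.

```python
SIZES = {  # bytes, from the trace size table; Verbose is a lower bound
    "Raw": 7_178_948_236, "Verbose": 419_430_400, "Dumb": 60_713_538,
    "RnR (1)": 6_932_542, "PROPOSAL": 389_013, "RnR (2)": 31_788,
}
INSTRUCTIONS = 52_217_403  # DLX Linux, reset to login screen


def bytes_per_instruction(method: str) -> float:
    return SIZES[method] / INSTRUCTIONS


def reduction_vs(method: str, baseline: str = "Verbose") -> float:
    """How many times smaller `method` is than `baseline`."""
    return SIZES[baseline] / SIZES[method]


if __name__ == "__main__":
    for name in SIZES:
        print(f"{name:9} {bytes_per_instruction(name):10.5f} B/insn "
              f"{reduction_vs(name):10.1f}x vs Verbose")
```

Raw costs about 137 bytes per instruction. PROPOSAL is roughly 1,078 times smaller than Verbose, which works out to well under a hundredth of a byte (about 0.06 bits) per instruction.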
  • 67. Trace size test (2.2) – Figure: trace log size of each method (bytes, logarithmic scale from 10,000 to 10,000,000,000).
  • 68. Trace size test (2.3) • Conclusion – This result did not come from an actual implementation, so some points are questionable. – Despite this, the proposed method generates a much smaller trace log than the older methods.
  • 71. Possible Practical Uses (1) • Reverse Engineering (non-malware) – Everything *executed* is everything *recorded* • All your program are belong to us! • The program's behavior is recorded, including VM detection and/or anti-debugging. – Of course, the program is captured in its unpacked/decrypted state. • You can integrate multiple analyses.
  • 72. Possible Practical Uses (2) • Avoiding anti-debugging/anti-VM – No well-known backdoor. – But a binary-translation-based VM can be detected by running specific code. • e.g. self-modifying code is (extremely) slow. – You can find out how the VM is detected; at the very least, you can extract useful information for avoiding VM detection. • The protection in ordinary programs is not that strong.
  • 73. Possible Practical Uses (3) • Reverse Engineering (malware) – It is DANGEROUS to run malware directly! – However, if you can handle these risks, this tracer can be useful. – Honeypots?
  • 74. Possible Practical Uses (4) • Fuzzing / Exploit analysis / Bug discovery – Imagine Valgrind applied to every program, while you still use the guest interactively. – With offline analysis, you can find and track memory corruption. – If you can reproduce the issue, you can extract useful information. – However, how efficient this is for fuzzing may depend heavily on the implementation.
  • 75. Possible Practical Uses (5) • Analysis support – Export to other well-known tools, • e.g. Wireshark. – Because you have the program's behavior, you can add metadata and/or supplemental information, • e.g. automatic SSL/TLS decryption. • You cannot steal a key from a packet dump, but remember: you can run the program that uses the private (common/shared) key!
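Exporting to Wireshark can be as simple as writing the classic libpcap file format. The sketch below serializes (timestamp, Ethernet frame) pairs into a pcap byte string. It assumes the frames have already been recovered from the replay (sent packets from the internal context, received packets from the log) and that timestamps come from some replay-side clock in microseconds. Both of these, and the function name, are assumptions for illustration.

```python
import struct
from typing import Iterable, Tuple

SNAPLEN = 65535
LINKTYPE_ETHERNET = 1


def to_pcap(packets: Iterable[Tuple[int, bytes]]) -> bytes:
    """Serialize (timestamp in microseconds, Ethernet frame) pairs as a classic
    little-endian libpcap capture file."""
    out = bytearray()
    # Global header: magic, version 2.4, thiszone, sigfigs, snaplen, link type
    out += struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, SNAPLEN, LINKTYPE_ETHERNET)
    for ts_us, frame in packets:
        sec, usec = divmod(ts_us, 1_000_000)
        data = frame[:SNAPLEN]
        # Record header: ts_sec, ts_usec, captured length, original length
        out += struct.pack("<IIII", sec, usec, len(data), len(frame))
        out += data
    return bytes(out)
```

Writing the result to a `.pcap` file (for example `open("replay.pcap", "wb").write(to_pcap(frames))`) gives a capture Wireshark can open. Per-packet metadata from the trace, such as which process sent a frame, would need a richer format such as pcapng comments.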
  • 76. Possible Practical Uses (6) • <<Place Entry Here>> – I expect it can be used for other purposes too. – I hope many people will work on this type of tracer.
  • 77. Future Challenges / Conclusion
  • 78. Challenge: Multicore (1) • The original Record and Replay is not designed for multiprocessing environments. – Frequent inter-CPU communication makes the tracer slow. – Almost all implementations are restricted to 1 CPU/thread (mine, too). • But that doesn't mean it is impossible: – Time-sharing – Software emulation of the MESI protocol – Tracing memory contents
  • 79. Challenge: Multicore (2) • Time-sharing – Only one CPU runs at a time. – Switch CPU execution on a timer to simulate running multiple CPUs. • Pros – Almost no synchronization required. • Cons – The more CPUs, the lower the efficiency. – Multi-threading problems are difficult to reproduce because this is not true multiprocessing.
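Time-sharing boils down to a deterministic round-robin over virtual CPUs. In the sketch below, each virtual CPU is an iterator that yields one executed instruction per step, and the quantum is counted in instructions. Counting in instructions is an assumption: a quantum expressed in guest progress (rather than host wall-clock time) makes the interleaving reproducible on replay without logging every switch.

```python
from collections import deque
from typing import Iterator, List, Tuple


def run_time_shared(vcpus: List[Iterator[str]], quantum: int) -> List[Tuple[int, str]]:
    """Run virtual CPUs one at a time, switching every `quantum` instructions.
    Returns the resulting interleaving as (cpu_id, instruction) pairs."""
    trace: List[Tuple[int, str]] = []
    ready = deque(enumerate(vcpus))
    while ready:
        cpu_id, it = ready.popleft()
        for _ in range(quantum):
            insn = next(it, None)
            if insn is None:            # this virtual CPU has finished
                break
            trace.append((cpu_id, insn))
        else:
            ready.append((cpu_id, it))  # quantum used up: back of the queue
    return trace
```

Only one virtual CPU ever runs at a time, which is why almost no synchronization is needed. It is also why real concurrency bugs may never show up.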
  • 80. Challenge: Multicore (3) • Software implementation of the MESI protocol – A memory coherence algorithm: – CPUs use this protocol (or variants of it) to keep memory and caches coherent. – We can implement it using page-level protection: – a CPU must lock a page before writing to it. • Pros – High efficiency when few pages are shared. • Cons – A software implementation is quite slow.
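A page-granular MESI scheme can be sketched as a per-page state vector with one entry per virtual CPU. Accesses that are already permitted by the current mapping proceed without a fault. Faulting accesses change ownership and are appended to a log, in order. The assumption here is that this ownership log is what a recorder would need to reproduce the cross-CPU ordering. The class name and log format are hypothetical, and a real VMM would change page-table protections instead of updating a dictionary.

```python
from enum import Enum
from typing import Dict, List, Tuple


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class PageCoherence:
    """Per-page MESI states for each virtual CPU, enforced via page protection."""

    def __init__(self, n_cpus: int):
        self.n = n_cpus
        self.pages: Dict[int, List[State]] = {}
        self.log: List[Tuple[int, str, int]] = []   # ownership changes, in order

    def _states(self, page: int) -> List[State]:
        return self.pages.setdefault(page, [State.INVALID] * self.n)

    def read(self, cpu: int, page: int) -> None:
        st = self._states(page)
        if st[cpu] is not State.INVALID:
            return                                   # already mapped: no fault
        sharers = [c for c in range(self.n) if c != cpu and st[c] is not State.INVALID]
        for c in sharers:
            st[c] = State.SHARED                     # any writer is downgraded to read-only
        st[cpu] = State.SHARED if sharers else State.EXCLUSIVE
        self.log.append((cpu, "read", page))

    def write(self, cpu: int, page: int) -> None:
        st = self._states(page)
        if st[cpu] in (State.MODIFIED, State.EXCLUSIVE):
            st[cpu] = State.MODIFIED                 # mapped writable: no fault
            return
        for c in range(self.n):
            if c != cpu:
                st[c] = State.INVALID                # "lock": unmap from every other CPU
        st[cpu] = State.MODIFIED
        self.log.append((cpu, "write", page))
```

Pages touched by only one CPU stay Exclusive/Modified and never fault again, which is where the "few shared pages" efficiency comes from. Every transfer between CPUs costs a protection change, which is why the software version is slow.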
  • 81. Challenge: Multicore (4) • Trace memory contents – Additionally trace the memory contents read from shared pages. • Pros – Can achieve high efficiency... maybe. • Cons – It is not a perfect-information tracer (which CPU wrote this value?!). – Memory tracing is slow. • A bandwidth monster may be required.
  • 82. Challenge: 64-bit / Others • x64 on x64 is very difficult. – There are some ways, but they are not very efficient. • SSE/SSE2 approximate reciprocal and reciprocal square root instructions (e.g. RCPPS, RSQRTPS) – The architecture does not require exact results for these instructions, and they are fast to run (this is a problem). • Use a hypervisor after all? – Record a non-portable trace and convert it into a portable one (on the same processor model). – This is not perfect, but it is a possible option.
  • 83. CAUTION: PATENTS • Some of these techniques are patented! – Record and Replay – Optimizations for binary-translation-based VMMs. – These patents are difficult or impossible to avoid. • However, all the patents I have found are United States patents, so I guess using this tracer outside the US is not a problem. – Be careful.
  • 84. Conclusion • I described how to build a tracing VMM for x86 on x64. • With the proposed method, the trace log gets smaller and the overhead gets lower too. – However, proper tests (validation) are required to check whether this is useful for reverse engineering. • Many practical uses! – Any others?
  • 85. contact me at : li at livegrid dot org Open Source Project : Niizh will be available at http://niizh.org/ Thank you! Any questions?