Translation Cache Policies for Dynamic Binary Translation
1. National School of Computer Sciences
Translation cache policies for dynamic binary translation
Saber FERJANI
TIMA Laboratory - SLS Group
18 April 2013
Saber F. (TIMA SLS)
ENSI
18 April 2013
1 / 25
2. Who am I?
Academic and professional background
2010-2013: Student at the National School of Computer Sciences, Tunisia.
2011/2012: Robotics team leader; took part in several competitions.
June-July 2011: Intern at Alpha Technology; designed several PCB layouts
including QFP, SO, SMT and through-hole components.
July-August 2012: Intern at STMicroelectronics; developed software for a
hygrometer and an altimeter on the STM32F3 microcontroller.
http://about.me/ferjani
4. Context
Why?
Hardware design takes longer and longer,
Software development should start earlier,
Instruction Set Simulators (ISS) simulate a processor, called the
target, on a machine with a different architecture, called the host.
How?
Cross compilation.
Interpretive translation.
Dynamic binary translation.
5. Terminology
Simulator: reproduces only the behavior of the system.
Emulator: reproduces the inner workings of the system.
TB: Translated Block.
IR: Intermediate Representation (also called op-code).
8. QEMU Overview
Generic and open source machine emulator and virtualizer,
Created by Fabrice Bellard in 2003,
Uses portable dynamic translation,
Supported targets: x86, ARM, MIPS, SH4, CRIS, SPARC, PowerPC, NDS32...
QEMU features
Just-in-time (JIT) compilation support,
Self-modifying code support,
Direct block chaining.
9. Subject
Problem statement
Simulation speed depends mainly on how often TBs are reused,
The current policy simply flushes the entire cache when it is full,
We need to improve the translation cache policy in order to maximize TB reuse.
14. Optimal cache algorithm
Evict the entry that will not be used for the longest time.
Infeasible in practice, since we cannot know the future!
First In First Out
Simplest cache replacement policy,
Every entry stays in the cache for the same duration, regardless of how often it is used.
Least Recently Used
Enhancement of FIFO.
Each time an entry is referenced, it is moved to the end of the queue.
Least Frequently Used
Exploits overall popularity rather than temporal locality.
The least referenced entry is always chosen for eviction.
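As a rough illustration of the four policies above, the following Python sketch counts cache hits on a short reference trace. It assumes fixed-size entries and a cache limited only by its number of entries; the function name simulate, the policy labels and the trace are made up for this example and do not come from QEMU.

```python
from collections import OrderedDict, Counter

def simulate(trace, capacity, policy):
    """Return the number of hits for policy in {"fifo", "lru", "lfu", "opt"}."""
    cache = OrderedDict()   # keys ordered from oldest (front) to newest (back)
    freq = Counter()        # overall reference counts, used by LFU only
    hits = 0
    for i, key in enumerate(trace):
        freq[key] += 1
        if key in cache:
            hits += 1
            if policy == "lru":
                cache.move_to_end(key)      # a reference refreshes the entry
            continue
        if len(cache) >= capacity:
            if policy in ("fifo", "lru"):
                cache.popitem(last=False)   # evict the front of the queue
            elif policy == "lfu":
                victim = min(cache, key=lambda k: freq[k])
                del cache[victim]
            elif policy == "opt":
                # Only possible offline: look at the rest of the trace.
                future = trace[i + 1:]
                def next_use(k):
                    return future.index(k) if k in future else len(future)
                victim = max(cache, key=next_use)
                del cache[victim]
        cache[key] = None
    return hits

# Hypothetical trace of block identifiers, cache of 3 entries.
trace = [1, 2, 3, 1, 4, 1, 2, 5, 1, 2]
results = {p: simulate(trace, 3, p) for p in ("fifo", "lru", "lfu", "opt")}
```

On this trace FIFO gets 3 hits, LRU and LFU get 4, and the optimal policy gets 5. The optimal policy needs the whole trace in advance, which is why it only serves as an upper bound.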
21. Execution loop (flowchart)
Look up the TB by target PC,
Cached? If no, translate one basic block,
Chain it to an existing basic block,
Execute the TB,
Exception handling.
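The loop in this flowchart can be sketched as a toy model in Python. This is not QEMU code: the TB class, translate, cpu_exec and the two-block guest program are hypothetical, guest blocks are plain Python callables that return the next target PC, and exception handling is left out.

```python
class TB:
    """Toy translated block: runs host code and returns the next target PC."""
    def __init__(self, pc, run):
        self.pc = pc
        self.run = run
        self.next_tb = {}   # direct chaining: next PC -> already translated TB

def translate(pc, program):
    # Stand-in for real code generation: wrap the guest block in a TB.
    return TB(pc, program[pc])

def cpu_exec(program, entry_pc):
    tb_cache = {}           # target PC -> TB
    stats = {"translations": 0, "executions": 0, "chained": 0}
    pc, prev = entry_pc, None
    while pc is not None:   # None stands for "guest halted"
        if prev is not None and pc in prev.next_tb:
            tb = prev.next_tb[pc]       # chained: no cache lookup needed
            stats["chained"] += 1
        else:
            tb = tb_cache.get(pc)       # look up the TB by target PC
            if tb is None:              # not cached: translate one basic block
                tb = translate(pc, program)
                tb_cache[pc] = tb
                stats["translations"] += 1
            if prev is not None:
                prev.next_tb[pc] = tb   # chain it to the existing block
        pc = tb.run()                   # execute the TB
        stats["executions"] += 1
        prev = tb
    return stats

# Hypothetical guest: a loop of two basic blocks that runs three times.
state = {"i": 0}
def block_a():
    state["i"] += 1
    return 0x20
def block_b():
    return 0x10 if state["i"] < 3 else None

stats = cpu_exec({0x10: block_a, 0x20: block_b}, 0x10)
```

Here two translations serve six executions, and half of the executions skip the lookup thanks to chaining. This is the reuse that a full cache flush throws away.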
23. Focus on (Translate one basic block)
Try to allocate space for the TB,
Success? If no, flush the entire translation cache, then allocate space for the TB (cannot fail!),
Generate op-codes and host code.
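A minimal sketch of this allocate-or-flush step, in Python. It assumes a simple bump allocator over one contiguous buffer and, unlike the real case, a TB length that is known before allocation; the class TranslationCache and its methods are hypothetical names for this example.

```python
class TranslationCache:
    """Toy code buffer: bump allocation, full flush when space runs out."""
    def __init__(self, size):
        self.size = size
        self.used = 0
        self.tbs = {}       # target PC -> (offset, length)
        self.flushes = 0

    def _try_alloc(self, length):
        if self.used + length > self.size:
            return None     # not enough space left
        offset = self.used
        self.used += length
        return offset

    def flush(self):
        # Current policy: drop every TB, whatever its reuse potential.
        self.tbs.clear()
        self.used = 0
        self.flushes += 1

    def insert(self, pc, length):
        if length > self.size:
            raise ValueError("TB larger than the whole cache")
        offset = self._try_alloc(length)
        if offset is None:
            self.flush()
            offset = self._try_alloc(length)   # cannot fail: cache is empty
        self.tbs[pc] = (offset, length)
        return offset

cache = TranslationCache(100)
offsets = [cache.insert(pc, 40) for pc in (0x10, 0x20, 0x30)]
```

The third insertion does not fit, so the two earlier TBs are discarded and the new one lands at offset 0.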
26. Implementation constraints
Variable TB size
In basic cache algorithms, evicting one entry is always enough to bring in another,
but in our case, TB size is not only variable, but also unknown at allocation time.
Self-modifying code
When the executed code modifies itself, the TB is re-translated into a different
space. This results in many memory allocations while only the last one is needed.
Low overhead
We need to predict whether the overhead of the replacement policy stays below the cost
of a cache flush; otherwise, we should simply flush the entire cache.
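The first and last constraints can be illustrated with a small Python sketch: with variable sizes, one eviction may not be enough, and past some point a full flush is cheaper. The function make_room and the max_evictions threshold are assumptions made for this example, not a measured cost model, and the sketch ignores fragmentation (freed entries are treated as if they were contiguous).

```python
from collections import OrderedDict

def make_room(cache, free, needed, max_evictions):
    """Evict LRU entries from cache (pc -> size, oldest first) until needed bytes fit.

    Returns (free_bytes, evicted_pcs, flushed).
    """
    evicted = []
    while free < needed and cache:
        if len(evicted) >= max_evictions:
            # Selective eviction is getting too expensive: flush everything.
            evicted.extend(cache)
            free += sum(cache.values())
            cache.clear()
            return free, evicted, True
        pc, size = cache.popitem(last=False)    # least recently used entry
        evicted.append(pc)
        free += size
    return free, evicted, False

def sample_cache():
    return OrderedDict([(0x10, 16), (0x20, 48), (0x30, 32)])

# A 64-byte TB needs two evictions here, not one.
partial = make_room(sample_cache(), free=8, needed=64, max_evictions=4)
# With a budget of one eviction, the sketch falls back to a full flush.
full = make_room(sample_cache(), free=8, needed=64, max_evictions=1)
```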
29. Goals
Simulate the LRU and LFU algorithms,
Compare cache hit ratios,
Evaluate the overhead of each algorithm.
Assumptions
We ignore TB size and cache size in bytes,
The quota of retained entries is 1/5,
Cache size is limited only by the number of TBs.
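One possible reading of these assumptions, sketched in Python: when the cache holds its maximum number of TBs, only a 1/5 quota of them is retained (the most recently or most frequently used ones) and the rest is dropped. This interpretation, the function hit_ratio, the tie-breaking rule and the trace are assumptions for illustration only; they do not reproduce the measurements of this work.

```python
from collections import Counter

def hit_ratio(trace, capacity, policy, quota_denominator=5):
    """Partial flush: when the cache is full, keep capacity // quota_denominator TBs."""
    cached = set()
    last_use = {}        # pc -> index of the most recent reference (LRU)
    count = Counter()    # pc -> references since translation (LFU)
    hits = 0
    for t, pc in enumerate(trace):
        if pc in cached:
            hits += 1
        else:
            if len(cached) >= capacity:
                keep = capacity // quota_denominator
                if policy == "lru":
                    rank = lambda p: last_use[p]
                elif policy == "lfu":
                    # Ties on frequency are broken by recency.
                    rank = lambda p: (count[p], last_use[p])
                else:    # "flush": baseline, nothing is retained
                    keep, rank = 0, lambda p: 0
                survivors = sorted(cached, key=rank, reverse=True)[:keep]
                for p in cached - set(survivors):
                    count.pop(p, None)      # forget evicted TBs
                cached = set(survivors)
            cached.add(pc)
            count[pc] = 0
        last_use[pc] = t
        count[pc] += 1
    return hits / len(trace)

# Hypothetical trace: one hot block (0) interleaved with blocks used once.
trace = [0, 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8]
ratios = {p: hit_ratio(trace, 5, p) for p in ("flush", "lru", "lfu")}
```

On this trace the full flush reaches a hit ratio of 0.375, while both LRU and LFU keep the hot block and reach 0.4375.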
30. Execution ratio = executions / translations
31. LFU cache hit ratio
32. LRU cache hit ratio
33. Perspectives
Find a suitable cache replacement policy that takes the implementation
constraints into account.
Use a dynamically variable quota for retained entries.
Add a small op-code buffer to optimize re-translation of self-modifying code.
Divide the translation cache into multiple regions to optimize partial cache flushes.
35. Thanks for your attention!
Feel free to ask any questions!