Tracing and partial evaluation have been proposed as meta-compilation techniques for interpreters to make just-in-time compilation language-independent. They promise that programs executing on simple interpreters can reach performance of the same order of magnitude as if they were executed on state-of-the-art virtual machines with highly optimizing just-in-time compilers built for a specific language. Tracing and partial evaluation approach meta-compilation from two ends of a spectrum, resulting in different sets of tradeoffs.
This study investigates both approaches in the context of self-optimizing interpreters, a technique for building fast abstract-syntax-tree interpreters. Based on RPython for tracing and Truffle for partial evaluation, we assess the two approaches by comparing the impact of various optimizations on the performance of an interpreter for SOM, an object-oriented, dynamically typed language. The goal is to determine whether either approach yields clear performance or engineering benefits. We find that tracing and partial evaluation both reach roughly the same level of performance: SOM based on meta-tracing is on average 3x slower than Java, while SOM based on partial evaluation is on average 2.3x slower than Java. With respect to engineering, however, tracing has significant benefits, because it requires language implementers to apply fewer optimizations to reach the same level of performance.
1. Tracing versus Partial Evaluation: Which Meta-Compilation Approach Is Better for Self-Optimizing Interpreters?
Stefan Marr, Stéphane Ducasse
OOPSLA, October 28, 2015
2. Disclaimer
I am currently funded by [funding logos]
* Würthinger, T.; Wimmer, C.; Wöß, A.; Stadler, L.; Duboscq, G.; Humer, C.; Richards, G.; Simon, D. & Wolczko, M., One VM to Rule Them All, in Proceedings of the 2013 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software (Onward!), ACM.
4. Compare Concrete Systems
Truffle + Graal with Partial Evaluation
RPython with Meta-Tracing
[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.
[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
5. Selecting a Case Study on Both Systems
Self-Optimizing AST Interpreter
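To make the case study concrete, here is a minimal sketch of the self-optimization idea in plain Java, with hypothetical classes rather than the actual SOM or Truffle API: an addition node observes its operand types on first execution, specializes itself for long arithmetic, and degrades to a generic version if that speculation later fails.

    // Minimal sketch of a self-optimizing AST node (hypothetical classes,
    // not the actual SOM/Truffle API). In a real self-optimizing
    // interpreter, the state change is a node replacement in the tree.
    abstract class Node {
        abstract Object execute();
    }

    final class Literal extends Node {
        final Object value;
        Literal(Object value) { this.value = value; }
        Object execute() { return value; }
    }

    final class AddNode extends Node {
        enum State { UNINITIALIZED, LONG, GENERIC }
        private State state = State.UNINITIALIZED;
        private final Node left, right;

        AddNode(Node left, Node right) { this.left = left; this.right = right; }

        Object execute() {
            Object l = left.execute(), r = right.execute();
            if (state == State.UNINITIALIZED) {
                // Speculate on the operand types observed at run time.
                state = (l instanceof Long && r instanceof Long) ? State.LONG : State.GENERIC;
            }
            if (state == State.LONG) {
                if (l instanceof Long && r instanceof Long) {
                    return (Long) l + (Long) r;   // fast path a JIT compiles well
                }
                state = State.GENERIC;            // guard failed: generalize
            }
            return genericAdd(l, r);
        }

        private static Object genericAdd(Object l, Object r) {
            if (l instanceof Number && r instanceof Number) {
                return ((Number) l).doubleValue() + ((Number) r).doubleValue();
            }
            return l.toString() + r;              // e.g., string concatenation
        }
    }

For example, new AddNode(new Literal(1L), new Literal(2L)).execute() specializes the node for longs and returns 3L; because the specialized state is stable, the meta-compiler sees a type-specialized program instead of the fully generic interpreter code.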
6. Represents a Large Group of Dynamic Languages
Dynamically Typed (Smalltalk)
Classes (and everything is an Object)
Closures (lambdas)
Non-local Returns (almost exceptions)
Set of Benchmarks
http://som-st.github.io
7. SOMMT versus SOMPE
Meta-Tracing versus Partial Evaluation
[Figure: the same AST fragment (a counter update 'cnt := cnt + 1' and a test 'cnt = 0') shown for both systems, contrasting the compilation units that meta-tracing and partial evaluation derive from it.]
[3] Würthinger et al., One VM to Rule Them All, Onward! 2013, ACM, pp. 187-204.
[2] Bolz et al., Tracing the Meta-level: PyPy's Tracing JIT Compiler, ICOOOLPS Workshop 2009, ACM, pp. 18-25.
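The following hedged sketch (reusing the hypothetical Node class from the earlier sketch, not the actual SOM code) shows where the two systems draw their compilation-unit boundaries for such a loop; the code is the plain interpreter behavior, and the comments describe what each meta-compiler does with it.

    // Hypothetical loop node, reusing Node from the sketch above.
    final class WhileNode extends Node {
        final Node condition, body;
        WhileNode(Node condition, Node body) { this.condition = condition; this.body = body; }

        Object execute() {
            // Meta-tracing (RPython): once this loop is hot, the tracer records
            // one concrete iteration through the interpreter's own operations,
            // i.e., the path taken inside condition.execute() and body.execute(),
            // and compiles exactly that linear path; paths not taken during
            // tracing stay out of the compilation unit behind guards.
            //
            // Partial evaluation (Truffle + Graal): the compiler instead starts
            // from the root node of the enclosing method, inlines all reachable
            // execute() methods, and folds the constant tree structure
            // (this.condition, this.body) into the code, so the compilation
            // unit covers the whole method, including untaken branches.
            while (isTrue(condition.execute())) {
                body.execute();
            }
            return null;
        }

        private static boolean isTrue(Object o) { return o == Boolean.TRUE; }
    }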
8. WHICH APPROACH IS FAST?
Minimal amount of engineering to get good performance
13. Optimization Impact on SOMPE
[Bar chart: speedup factor per optimization (logarithmic scale, 0.85 to 12.00; higher is better). Optimizations: lower control structures, inline caching, cache globals, typed fields, lower common ops, array strategies, inline basic ops., typed vars, opt. local vars, baseline, min. escaping closures, typed args, catch-return nodes.]
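To illustrate one of the optimizations in the chart, inline caching: the following is a hedged sketch (hypothetical classes, not the actual SOM implementation) of a dispatch chain in which a send site caches the receiver's class together with the looked-up method, so repeated sends to the same class skip the lookup and give the compiler a stable target to inline.

    import java.util.Map;
    import java.util.function.BiFunction;

    // Hypothetical object model: a class maps selectors to method implementations.
    final class Clazz {
        final Map<String, BiFunction<Obj, Object[], Object>> methods;
        Clazz(Map<String, BiFunction<Obj, Object[], Object>> methods) { this.methods = methods; }
    }

    final class Obj {
        final Clazz clazz;
        Obj(Clazz clazz) { this.clazz = clazz; }
    }

    abstract class Dispatch {
        abstract Object send(Obj receiver, Object[] args);
    }

    // One cache entry: a class guard plus the cached lookup result.
    final class CachedDispatch extends Dispatch {
        final Clazz expected;
        final BiFunction<Obj, Object[], Object> target;
        final Dispatch next;  // next entry in the chain, or the generic fallback
        CachedDispatch(Clazz expected, BiFunction<Obj, Object[], Object> target, Dispatch next) {
            this.expected = expected; this.target = target; this.next = next;
        }
        Object send(Obj receiver, Object[] args) {
            if (receiver.clazz == expected) {
                return target.apply(receiver, args);  // hit: no lookup, inlinable
            }
            return next.send(receiver, args);         // miss: try the next entry
        }
    }

    // Fallback: full lookup on every send. A real cache would also insert a
    // new CachedDispatch entry here, up to some chain-length limit.
    final class GenericDispatch extends Dispatch {
        final String selector;
        GenericDispatch(String selector) { this.selector = selector; }
        Object send(Obj receiver, Object[] args) {
            return receiver.clazz.methods.get(selector).apply(receiver, args);
        }
    }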
14. Implementation Sizes
RPython: from minimal to optimized, +57% LOC (from 3,455 LOC to 5,414 LOC)
Truffle: from minimal to optimized, +103% LOC (from 5,424 LOC to 11,037 LOC)
(Counted on code written the way I write Python and the way I write Java, respectively.)
15. WHICH APPROACH GIVES BETTER STARTUP PERFORMANCE?
Considering the user-perceived system performance
16. Measuring “Whole Program” Runtime
• Process Start to Finish
• Overall Wall-clock Time
• Normalized to Java
[Chart: wall-clock behavior for various run lengths, aggregated over all benchmarks. X-axis: iterations of the benchmark in the same process (0 to 600, with marks at 8 sec, 25 sec, and 46 sec); y-axis: geometric mean of the wall-clock time for x iterations, divided by the corresponding Java result (factor over Java, 4 to 16). Lines: Java, SOMMT (RTruffleSOM-jit-ex), SOMPE (TruffleSOM-graal-n).]
It is about how the compilation unit is determined. Remember, the interpreter is implemented in one language, and compilation works on the meta-level. The main idea is to take the interpreter implementation, add information from the execution context, and use that to apply very aggressive, speculative optimizations to the interpreter implementation. This avoids the need to write a custom JIT compiler for each language.
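A hedged sketch of this kind of speculation (hypothetical classes, reusing Node from the earlier sketch): the interpreter folds a global's current value into the compilation unit as a constant, protected by an assumption that is invalidated when the global is written; in a real VM, invalidation discards the compiled code that depended on it.

    // Assumption object: compiled code may depend on it; invalidating it
    // triggers deoptimization in a real VM.
    final class Assumption {
        private volatile boolean valid = true;
        boolean isValid() { return valid; }
        void invalidate() { valid = false; }
    }

    final class GlobalVar {
        Object value;
        Assumption unchanged = new Assumption();
        void set(Object v) {
            value = v;
            unchanged.invalidate();        // speculation broken: deoptimize users
            unchanged = new Assumption();
        }
    }

    // Read node that speculates the global never changes: the cached value
    // becomes a constant in the compilation unit.
    final class CachedGlobalRead extends Node {
        final GlobalVar global;
        final Object cachedValue;
        final Assumption dependency;
        CachedGlobalRead(GlobalVar global) {
            this.global = global;
            this.cachedValue = global.value;
            this.dependency = global.unchanged;
        }
        Object execute() {
            if (dependency.isValid()) {
                return cachedValue;        // constant-folded on the fast path
            }
            return global.value;           // a real node would re-specialize here
        }
    }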