5. Project motivation
• The ultimate goal is a faster Jython
• The new compiler is just a component to
get there
• Focus is on representation of Python code
on the JVM
6. What does Code
Representation include?
• Function/Method/Code object
representation
• Scopes. How to store locals and globals
• Call frame representation
• Affects sys._getframe()
• The representation of builtins
• Mapping of Python attributes to the JVM
7. Compiler tool chain
Source code → Parser → AST → Analyzer → AST + Code Info per scope → Compiler
• This is the “spine” of the compiler, its main part.
• It is the same in any compiler in Jython, and similar to other systems, CPython in particular.
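The same spine can be seen in CPython's standard library, which makes for a small illustration of the stages. This is only a sketch using CPython's `ast` and `symtable` modules and the built-in `compile`; it is not Jython's compiler code, and the `describe` helper and `SOURCE` snippet are made up for the example.

```python
import ast
import symtable

SOURCE = """
count = 0
def adder(a, b):
    global count
    count += 1
    return a + b
"""

# Parser: source text -> AST
tree = ast.parse(SOURCE, filename="<example>")

# Analyzer: per-scope information (which names are local, which are global)
module_scope = symtable.symtable(SOURCE, "<example>", "exec")
adder_scope = module_scope.get_children()[0]


def describe(scope):
    """Hypothetical helper: map each name in a scope to its kind."""
    info = {}
    for sym in scope.get_symbols():
        if sym.is_global():
            info[sym.get_name()] = "global"
        elif sym.is_local():
            info[sym.get_name()] = "local"
        else:
            info[sym.get_name()] = "other"
    return info


scope_info = describe(adder_scope)

# Compiler: AST -> code object (CPython byte code here, Java byte code in Jython)
code = compile(tree, "<example>", "exec")
namespace = {}
exec(code, namespace)
result = namespace["adder"](1, 2)
```

Here `scope_info` plays the role of the "Code Info per scope" box: the compiler needs it to decide how each name is loaded and stored.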
8. Compiler tool chain
Source code → Parser → AST → Analyzer → AST + Code Info per scope → Compiler → Java byte code
• The Java byte code runs on the Jython runtime system and the JVM.
• This is the structure of the compiler in Jython today.
9. Compiler tool chain
Source code → Parser → AST → Analyzer → AST + Code Info per scope → Compiler → IR → Transformer → IR → Codegen → Java byte code
• The Java byte code runs on the Jython runtime system and the JVM.
• The advanced compiler adds two more steps to the compilation process.
• The analyzer and compiler steps also change.
10. Compiler tool chain
Source code → Parser → AST → Analyzer → AST + Code Info per scope → Compiler → IR → Transformer → IR → Codegen
Codegen → Python byte code → Interpreter (on the Jython runtime system)
Codegen → Java byte code → JVM (with the Jython runtime system)
• This flexibility makes it possible to output many different code formats.
• It can even bundle together multiple formats for one module.
11. Compiler tool chain
Source code → Parser → AST → Analyzer → AST + Code Info per scope → Compiler → IR → Transformer → IR → Codegen
Codegen → Java byte code → JVM (with the Jython runtime system and Interpreter)
Feedback: IR + runtime info
• It is also possible to compile, and re-compile, code with more information from the actual runtime data.
12. The Intermediate
Representation
• “sea of nodes” style SSA
• Control flow and data flow both
modeled as edges between nodes
• Simplifies instruction re-ordering
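To make the "edges between nodes" idea concrete, here is a toy sketch. It is not the project's IR: the `Node` class, the `schedule` function and the node names are all invented for illustration. Each node lists the values it consumes (data edges) and the nodes that must run before it (control edges), and any ordering that respects both kinds of edges is a valid schedule.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Node:
    op: str
    data: tuple = ()     # data-flow edges: values this node consumes
    control: tuple = ()  # control-flow edges: nodes that must run first

    def deps(self):
        return self.data + self.control


def schedule(results):
    """Topologically order every node reachable from `results`."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for dep in node.deps():
            visit(dep)
        order.append(node)

    for node in results:
        visit(node)
    return order


# Source order: x = a + b; print(x); y = a * b; return y
start = Node("start")
a = Node("param a")
b = Node("param b")
add = Node("add", data=(a, b))
mul = Node("mul", data=(a, b))
show = Node("print", data=(add,), control=(start,))
ret = Node("return", data=(mul,), control=(show,))

ops = [node.op for node in schedule([ret])]
```

In this schedule `mul` ends up before `print` even though it came later in the source. Nothing but the edges constrains the order, which is why this style of IR makes re-ordering simple.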
14. Parrotbench
• 7 tests, numbered b0-b6
• Test b1 omitted
  • Tests infinite recursion and expects
    recursion limit exception
  • Allocates objects while recursing
  • Not applicable for Jython
15. Running parrotbench
• Python 2.6 vs Jython 2.5 (trunk)
• Each test executes 3 times, minimum taken
• Total process execution time, including
startup, was also measured
• Jython was also tested after JVM JIT warmup
  • Warmup took about 1 hour:
    110 iterations of each test
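The measurement procedure described above can be sketched roughly as follows. This is not the harness used for the talk: `best_of`, `warm_up` and `sample_workload` are hypothetical names, and `time.perf_counter` is a modern CPython call (on Jython 2.5 one would fall back to `time.time`).

```python
import time


def best_of(func, runs=3):
    """Run func `runs` times and return the fastest wall-clock time in ms."""
    timings = []
    for _ in range(runs):
        started = time.perf_counter()
        func()
        timings.append((time.perf_counter() - started) * 1000.0)
    return min(timings)


def warm_up(func, iterations=110):
    """Run func repeatedly so a JIT has a chance to compile the hot paths."""
    for _ in range(iterations):
        func()


def sample_workload():
    # Stand-in for one benchmark test; sorts reversed data.
    return sorted(range(10000, 0, -1))


cold_ms = best_of(sample_workload)
warm_up(sample_workload, iterations=5)  # kept short for the example
warm_ms = best_of(sample_workload)
```

Taking the minimum of several runs reduces noise from other processes. The warmup only changes the result on a runtime that compiles hot code at run time, such as the JVM.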
16. The tests
(rough understanding)
• b0 parses python in python
• b2 computes pi
• b3 sorts random data
• b4 more parsing of python in python
• b5 tests performance of builtins
• b6 creates large simple lists/dicts
17. Python 2.6
Test                        Time (ms)
b0                          1387
b2                          160
b3                          943
b4                          438
b5                          874
b6                          1079
Total* (incl. VM startup)   15085
* Total time is for three iterations; the other
times are the best iteration of those three
18. Jython 2.5b
(Preview version available at PyCon)
Test                        Time (ms)              Time (ms)
                            (without JIT warmup)   (with JIT warmup)
b0                          4090                   2099
b2                          202                    107
b3                          3612                   1629
b4                          1095                   630
b5                          3044                   2161
b6                          2755                   2237
Total* (incl. VM startup)   51702                  Not applicable
* Total time is for three iterations; the other
times are the best iteration of those three
19. Jython 2.5+
(Snapshot from June 24 2009)
Jython 2.5.0 Final has an embarrassing performance issue on
list multiplication that was introduced when the list
implementation was made thread safe.
Test                        Time (ms)              Time (ms)
                            (without JIT warmup)   (with JIT warmup)
b0                          2968                   2460
b2                          202                    124
b3                          2255                   2030
b4                          875                    742
b5                          4036                   2291
b6                          2279                   2276
Total* (incl. VM startup)   57279                  Not applicable
* Total time is for three iterations; the other
times are the best iteration of those three
20. CPython 2.6 vs Jython 2.5
Work on thread safety and compatibility has
made Jython *slower* but better.
Performance is a later focus.
[Bar chart: total runtime and runtime excluding VM startup for Python 2.6, Jython 2.5b and Jython 2.5+; axis from 0 to 60,000]
21. CPython 2.6 vs Jython 2.5
UnJITed performance improved due to lower call overhead
and a better dict. JITed performance is worse due to
thread safety fixes.
[Chart: tests b0 to b6 for Python 2.6, Jython 2.5b with warmup and Jython 2.5+ with warmup; axis from 0 to 15,000]
22. CPython 2.6 vs Jython 2.5
[Bar chart: tests b0 to b6 for Python 2.6, Jython 2.5b, Jython 2.5b with warmup, Jython 2.5+ and Jython 2.5+ with warmup; axis from 0 to 5,000]
23. Is JRuby faster than Jython?
JRuby is a good indicator for the performance we
could reach with Jython. It’s a similar language on
the same platform. Therefore a comparison
and analysis is interesting.
24. Adding two numbers
# Jython
def adder(a,b):
    return a+b

# JRuby
def adder(a,b)
  a+b
end
25. Execution times
(ms for 400000 additions)
Without counter:
Jython   697 ms
JRuby    466 ms
26. Why is JRuby faster?
• JRuby has had more work on performance
• Jython work has been focused on 2.5 compatibility
• Next release will start to target performance
• JRuby has a shorter call path
• JRuby does Call Site caching
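As a rough idea of what call site caching means, the sketch below remembers the result of a method lookup at one call location and reuses it while the receiver type stays the same. It is an illustration only, not JRuby's or Jython's implementation; the `CallSite` class is invented, and a real cache would also have to be invalidated when a class is modified.

```python
class CallSite:
    """Caches the last method lookup made at one call location."""

    def __init__(self, name):
        self.name = name
        self.cached_type = None
        self.cached_method = None
        self.hits = 0
        self.misses = 0

    def call(self, receiver, *args):
        receiver_type = type(receiver)
        if receiver_type is self.cached_type:
            # Fast path: same receiver type as last time, skip the lookup.
            self.hits += 1
        else:
            # Slow path: do the full lookup, then remember the result.
            self.misses += 1
            self.cached_method = getattr(receiver_type, self.name)
            self.cached_type = receiver_type
        return self.cached_method(receiver, *args)


site = CallSite("__add__")
total = 0
for i in range(5):
    total = site.call(total, i)  # int + int every time

int_hits, int_misses = site.hits, site.misses
as_float = site.call(1.5, 2.0)   # new receiver type forces a fresh lookup
```

Most call sites see the same receiver type over and over, so the lookup cost is paid once and the common case becomes a cheap type check.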
27. Counting the number of
additions - Jython
from __future__ import with_statement  # needed on Python/Jython 2.5
from threading import Lock

count = 0
lock = Lock()

def adder(a,b):
    global count
    with lock:
        count += 1
    return a+b
28. Counting the number
of additions - JRuby
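require 'thread'  # provides Mutex on Ruby 1.8

class Counting
  # Assumed setup; the slide shows only the adder method
  def initialize
    @mutex = Mutex.new
    @count = 0
  end

  def adder(a,b)
    @mutex.synchronize {
      @count = @count + 1
    }
    a + b
  end
end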
29. Execution times
(ms for 400000 additions)
With counter:
Jython (Lock)            46,960 ms
JRuby (Mutex)            4,590 ms
Jython (AtomicInteger)   2,981 ms
I included AtomicInteger to verify that the problem
was with the synchronization primitives.
30. Why is JRuby faster?
• JRuby has had more work on performance
• JRuby has lower call overhead
• JRuby Mutex is easier for the JVM to
optimize than Jython Lock
• Because of JRuby's use of closures
31. Call overhead
comparison
Jython Lock:
• Python wrapper around Java primitives
• Call to Python code
• Reflective Java call
• Lock
• Execute actual code
• Call to Python code
• Reflective Java call
• Unlock
JRuby Mutex:
• Java code implementing the Ruby logic
• Lock
• Direct call to closure
• Unlock
33. Call frames
• A lot of Python code depends on reflecting
call frames
• Every JVM has call frames, but only exposes
them to debuggers
• Current Jython is naïve about how frames
are propagated
• Simple prototyping hints at up to 2x boost
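A small example of the kind of frame reflection that Python code relies on is shown below. It uses `sys._getframe`, which exists in CPython and Jython; the helper name `caller_locals` is made up for this sketch.

```python
import sys


def caller_locals():
    """Return a copy of the local variables of whoever called this function."""
    frame = sys._getframe(1)  # 0 is this frame, 1 is the caller's frame
    return dict(frame.f_locals)


def compute():
    width = 3
    height = 4
    return caller_locals()


seen = compute()
```

For this to work, the runtime has to keep a frame object with readable locals for `compute`, even though `compute` itself never asks for it. That bookkeeping is the cost the frame handling has to pay on every call.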
34. Extremely late binding
• Every binding can change
• The module scope is volatile
• Even builtins can be overridden
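The following sketch shows how late the binding really is: a function that calls `len` picks up a replacement installed in the `builtins` module after the function was defined. The function name and data are arbitrary; `builtins` is the Python 3 name of the module (`__builtin__` in Python 2 and Jython 2.5).

```python
import builtins


def total_length(items):
    # `len` is looked up at call time: module globals first, then builtins.
    return sum(len(item) for item in items)


before = total_length(["ab", "cde"])

original_len = builtins.len
builtins.len = lambda obj: 1  # override a builtin for every module
try:
    during = total_length(["ab", "cde"])
finally:
    builtins.len = original_len  # always restore the real builtin

after = total_length(["ab", "cde"])
```

Because any code anywhere can do this at any time, a compiler cannot simply assume that `len` means the builtin `len` without some way to detect the change.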
35. Exception handling
• Exception type matching in Python is a
sequential comparison.
• Exception type matching in the JVM is done
on exact type by the VM.
• Exception types are specified as arbitrary
expressions.
• No way of mapping Python try/except
directly to the JVM.
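The points above can be observed directly in Python. In this sketch the `pick` helper (a made-up name) stands in for an arbitrary expression in an `except` clause and records when it is evaluated.

```python
evaluated = []


def pick(exc_type):
    """An arbitrary expression used in an except clause."""
    evaluated.append(exc_type.__name__)
    return exc_type


def classify(error):
    try:
        raise error
    except pick(KeyError):
        return "key"
    except pick(LookupError):
        return "lookup"
    except pick(Exception):
        return "other"


outcome = classify(IndexError("missing"))
```

The clauses are evaluated one by one, in order, only after the exception has been raised, and evaluation stops at the first match. A JVM exception table, in contrast, names fixed classes that are known when the method is compiled.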
36. Blocks of Code
• The JVM has a size limit on methods
(64 KB of byte code per method)
• The JVM JIT has an even smaller size limit
for the methods it will compile
38. Call frames
• Analyze code - omit unnecessary frames
• Fall back to Java frames for pdb et al.
• Treat locals, globals, dir, exec, eval as special
• Pass state - avoid centrally stored state
• sys._getframe() is an implementation detail
39. Late binding
• Ignore it and provide a fail path
• Inline builtins
• Turn for i in range(...): ... into a Java loop
• Do direct invocations to members of the
same module
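The "ignore it and provide a fail path" idea can be sketched in plain Python. This is an illustration of the general technique only, not the compiler's generated code; all function names are invented, and the `namespace` dict stands in for the module globals the compiled code would check.

```python
import builtins

_ORIGINAL_RANGE = builtins.range


def sum_squares_generic(n, namespace):
    """Fail path: honor whatever `range` is bound to right now."""
    current_range = namespace.get("range", builtins.range)
    total = 0
    for i in current_range(n):
        total += i * i
    return total


def sum_squares_fast(n):
    """Optimistic path: a plain counting loop, as a compiler might emit."""
    total = 0
    i = 0
    while i < n:
        total += i * i
        i += 1
    return total


def sum_squares(n, namespace):
    # Guard: is `range` still the builtin? If so, the fast path is safe.
    if "range" not in namespace and builtins.range is _ORIGINAL_RANGE:
        return sum_squares_fast(n)
    return sum_squares_generic(n, namespace)


fast_result = sum_squares(4, {})
slow_result = sum_squares(4, {"range": lambda n: [10]})
```

The guard is cheap compared with a full dynamic lookup and iteration, so the common case runs as a simple loop while rebinding `range` still gives the behavior Python requires.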
40. JVM Code analysis
• Create faux closures
• Separate code blocks that evaluate in
same scope
• Will also help with the code size limit
41. Exception handling
• The same late binding optimizations
+ optimistic exception handler
restructuring get us far
42. Reaping the fruits of
the future JVMs
• Invokedynamic can perform most optimistic
direct calls and provide the fail path
• Interface injection makes all Java objects
look like Python objects
• Improves integration between
different dynamic languages even further
• The advanced compiler makes a perfect
platform for integrating this
44. The “Advanced Jython
compiler” project
• Not just a compiler - but everything close
to the compiler - code representation
• A platform for moving forward
• First and foremost an enabling tool
• Actual improvement happens elsewhere
45. Performance
• Jython has decent performance
• On some benchmarks Jython is better
• For single threaded applications CPython is
still slightly better
• Don’t forget: Jython can do threading
• Long running applications benefit from the
JVM - Jython is for the server side
• We are only getting started...
46. Python / JVM mismatch
- Getting better -
• Most of the problems come from trying to
mimic CPython too closely
• Future JVMs are a better match
• Break code into smaller chunks
• Shorter call paths
• Optimistic optimizations are the way to go