1. Compiling Python to Native Code for Speed and Scale
David Kammeyer
Continuum Analytics
kammeyer@continuum.io
Tuesday, June 4, 13
2. Continuum Background
• Python for Big Data and Science
• Founded by Travis Oliphant (creator of NumPy) and Peter Wang in 2012
• 45 Employees
4. Products
Anaconda: Easy-to-install Python distribution, including the most popular open-source scientific and mathematical libraries. (Free!)
Accelerate: Opens up the full capabilities of the GPU or multi-core processor to Python.
IOPro: Fast loading of data from files, SQL, and NoSQL stores, improving performance and reducing memory overhead.
Wakari: Browser-based Python and Linux environment for collaborative data analysis, exploration, and visualization. (Small instance is free!)
5. Open Source Projects
Blaze: High-performance Python library for modern vector computing, distributed and streaming data
Bokeh: Interactive, grammar-based visualization system for large datasets
Numba: Vectorizing Python compiler for multicore and GPU, using LLVM
6. Numba
• Just-in-time, dynamic compiler for Python
• Optimize data-parallel computations at call time to take advantage of the local hardware configuration
• Compatible with NumPy, Blaze
• Leverage LLVM ecosystem:
• Optimization passes
• Inter-op with other languages
• Variety of backends (e.g. CUDA for GPU support)
8. Simple API
#@jit('void(double[:,:], double, double)')
@autojit
def numba_update(u, dx2, dy2):
    nx, ny = u.shape
    for i in xrange(1, nx-1):
        for j in xrange(1, ny-1):
            u[i,j] = ((u[i+1,j] + u[i-1,j]) * dy2 +
                      (u[i,j+1] + u[i,j-1]) * dx2) / (2*(dx2+dy2))
Comment out one of jit or autojit (don’t use them together):
• jit --- provide type information (fastest to call at run-time)
• autojit --- detects input types, infers the output type, generates code if needed, and dispatches (a little more run-time call overhead)
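The kernel above is a Jacobi-style stencil: each interior point is replaced by a weighted average of its four neighbors. A pure-Python sketch (no Numba required; `plain_update` is an illustrative name) shows the arithmetic on a tiny grid — the decorated version computes the same values, just much faster:

```python
# Pure-Python version of the numba_update stencil (illustrative;
# the @autojit-decorated version computes the same values).
def plain_update(u, dx2, dy2):
    nx, ny = len(u), len(u[0])
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            u[i][j] = ((u[i + 1][j] + u[i - 1][j]) * dy2 +
                       (u[i][j + 1] + u[i][j - 1]) * dx2) / (2 * (dx2 + dy2))

# With dx2 == dy2, the update reduces to the average of the four neighbors.
grid = [[0.0, 1.0, 0.0],
        [2.0, 9.0, 4.0],
        [0.0, 3.0, 0.0]]
plain_update(grid, 1.0, 1.0)
print(grid[1][1])  # (3.0 + 1.0 + 4.0 + 2.0) / 4 = 2.5
```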
Tuesday, June 4, 13
11. Fast vectorize
NumPy’s ufuncs take “kernels” and apply the kernel element-by-element over entire arrays. Write kernels in Python!
from numbapro import vectorize
from math import sin, pi

@vectorize(['f8(f8)', 'f4(f4)'])
def sinc(x):
    if x == 0.0:
        return 1.0
    else:
        return sin(x*pi)/(pi*x)
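The kernel body is plain Python, so its logic can be checked without NumbaPro at all — `@vectorize` changes how the function is compiled and broadcast over arrays, not what it computes per element. A stdlib-only sketch:

```python
from math import sin, pi, isclose

# The same sinc kernel, un-decorated: one scalar in, one scalar out.
def sinc(x):
    if x == 0.0:
        return 1.0
    return sin(x * pi) / (pi * x)

values = [0.0, 0.5, 1.0]
results = [sinc(v) for v in values]   # element-by-element, like a ufunc
print(results[0])                     # 1.0 at the removable singularity
print(isclose(results[1], 2 / pi))    # sin(pi/2) / (pi/2) == 2/pi
```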
12. Create parallel-for loops
A “prange” directive spawns compiled tasks in threads (like an OpenMP parallel-for pragma)
import numbapro
from numba import autojit, prange

@autojit
def parallel_sum2d(a):
    sum = 0.0
    for i in prange(a.shape[0]):
        for j in range(a.shape[1]):
            sum += a[i,j]
    return sum
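prange parallelizes the outer i-loop and reduces `sum` across threads. The same pattern can be sketched with the stdlib (`threaded_sum2d` is a hypothetical name; Numba's compiled threads have far less overhead than Python thread pools):

```python
from concurrent.futures import ThreadPoolExecutor

# Parallel 2-D sum: each task handles one outer-loop iteration (one row),
# mirroring how prange splits the i-loop across threads.
def threaded_sum2d(a):
    def row_sum(row):
        return sum(row)
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(row_sum, a))

data = [[1.0, 2.0], [3.0, 4.0]]
print(threaded_sum2d(data))  # 10.0
```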
13. Example: Mandelbrot, Vectorized
from numbapro import vectorize

sig = 'uint8(uint32, f4, f4, f4, f4, uint32, uint32, uint32)'

@vectorize([sig], target='gpu')
def mandel(tid, min_x, max_x, min_y, max_y, width, height, iters):
    pixel_size_x = (max_x - min_x) / width
    pixel_size_y = (max_y - min_y) / height
    x = tid % width
    y = tid / width
    real = min_x + x * pixel_size_x
    imag = min_y + y * pixel_size_y
    c = complex(real, imag)
    z = 0.0j
    for i in range(iters):
        z = z * z + c
        if (z.real * z.real + z.imag * z.imag) >= 4:
            return i
    return 255
Kind    Time (s)  Speed-up
Python  263.6     1.0x
CPU     2.639     100x
GPU     0.1676    1573x
(GPU: Tesla S2050)
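The per-pixel kernel is ordinary scalar Python, so its escape-count logic can be checked directly. A pure-Python sketch, using a hypothetical `escape_count` helper that takes the point's coordinates instead of a thread id:

```python
# Core of the mandel kernel: iterate z -> z*z + c and count iterations
# until |z|^2 >= 4, returning 255 as the "never escaped" sentinel,
# exactly as in the slide's kernel.
def escape_count(real, imag, iters):
    c = complex(real, imag)
    z = 0.0j
    for i in range(iters):
        z = z * z + c
        if z.real * z.real + z.imag * z.imag >= 4:
            return i
    return 255

print(escape_count(2.0, 0.0, 50))   # 0: c = 2 escapes on the first step
print(escape_count(0.0, 0.0, 50))   # 255: the origin never escapes
```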
14. Many More Advanced Features!
• Extension classes (jit a class -- autojit coming soon!)
• Struct support (NumPy arrays can be structs)
• SSA -- can refer to local variables as different types
• Typed lists, typed dictionaries, and typed sets coming soon!
• Calling ctypes and CFFI functions natively
• pycc (create a stand-alone dynamic library and executable)
• pycc --python (create a static extension module for Python)
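“Calling ctypes functions natively” means Numba can turn a call to a ctypes-wrapped C function into a direct native call instead of a round-trip through the interpreter. A stdlib-only sketch of the ctypes side (POSIX-specific assumptions: `libm` is findable or its symbols are already loaded into the process):

```python
import ctypes
import ctypes.util

# Load the C math library; fall back to the current process's symbols,
# which include libm on most POSIX Python builds.
path = ctypes.util.find_library("m")
libm = ctypes.CDLL(path) if path else ctypes.CDLL(None)

c_cos = libm.cos
c_cos.restype = ctypes.c_double
c_cos.argtypes = [ctypes.c_double]

# A jit-compiled function that calls c_cos would invoke it as a plain
# native call, with no interpreter overhead per element.
print(c_cos(0.0))   # 1.0
```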
Tuesday, June 4, 13
15. Availability
• Core is open source: github.com/numba/numba
• GPU compilation and parallelization available in Anaconda Accelerate, €100.