2. Motivation
• Python is great for rapid development and high-level thinking-in-code.
• It is slow for inner loops because the lack of type information leads to a lot of indirection and “extra” code.
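Here is a small standard-library illustration of that indirection. CPython compiles `total = total + x` to a single generic add instruction, and the interpreter picks the actual operation at run time from the operands' types. The function name `add_all` is hypothetical. The opcode name depends on the Python version: `BINARY_ADD` before 3.11, `BINARY_OP` from 3.11 on.

```python
import dis


def add_all(xs):
    """Sum a sequence with an explicit, interpreted loop."""
    total = 0
    for x in xs:
        total = total + x  # one generic opcode, dispatched on runtime types
    return total


# Opcode names in the compiled bytecode (static view, no adaptive specialization).
opnames = [ins.opname for ins in dis.get_instructions(add_all)]
add_ops = [name for name in opnames if name in ("BINARY_ADD", "BINARY_OP")]

# The same bytecode handles ints, floats, and complex numbers. On every
# iteration the interpreter must inspect the operand types before adding.
print(add_ops)
print(add_all([1, 2, 3]), add_all([0.5, 0.25]), add_all([1j, 2]))
```

A compiler that knows `xs` holds only doubles could replace that generic dispatch with a single machine add per element.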
3. Motivation
• NumPy users have a lot of type information, but currently have only one-size-fits-all, pre-compiled, vectorized loops.
• Many envisioned new features will need the ability to compile high-level expressions to machine code.
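The "one-size-fits-all" point can be made concrete. With pre-compiled elementwise loops, an expression such as `a * b + c` runs as two separate passes and materializes a temporary array. A compiler that sees the whole expression can instead emit one fused loop. The pure-Python sketch below shows only that structure, not the speed difference. Plain lists stand in for arrays, and `multiply`, `add`, `two_pass`, and `fused_multiply_add` are hypothetical names.

```python
def multiply(xs, ys):
    """Pre-compiled-style loop: one operation, one full pass, a new output array."""
    return [x * y for x, y in zip(xs, ys)]


def add(xs, ys):
    """Another single-operation pass."""
    return [x + y for x, y in zip(xs, ys)]


def two_pass(a, b, c):
    tmp = multiply(a, b)  # temporary as large as the inputs
    return add(tmp, c)    # second sweep over memory


def fused_multiply_add(a, b, c):
    """What a compiler could generate for the whole expression: one loop, no temporary."""
    out = []
    for x, y, z in zip(a, b, c):
        out.append(x * y + z)
    return out


a, b, c = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [0.5, 0.5, 0.5]
print(two_pass(a, b, c), fused_multiply_add(a, b, c))
```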
4. Goals
• Most developers should not have to write anything but Python (or an even higher-level domain-specific language, DSL).
• Create faster code from NumPy users' array expressions; Fortran is the initial target.
• Take advantage of multi-core CPUs and GPUs for a subset of Python.
5. Why not PyPy?
• PyPy does not work with CPython.
• PyPy is a (meta-)“tracing” JIT. Machine code is generated on the fly, so there is no “build step”, but we want to support a build step when justified.
• PyPy tries to speed up everything; we want to optimize specifically for numeric code (including complex numbers).
More to the story...
6. Why not Cython?
• Cython is great for what it does, but...
• Cython creates extension modules, which cannot be “unloaded” dynamically.
• Cython requires a full C compiler.
• Cython doesn’t do type inference; you have to declare types on everything.
• Cython is another syntax to learn.
7. What’s the real motivation...
• “Computed columns” for data types
• I have always been bothered by how to write a fast version of “vectorize”.
• And... I wanted to play with LLVM!
8. More Ranting
• The world needs more array-oriented compilers; Python has needed one for at least a decade.
• Array-oriented computing needs more attention in CS curricula.
• Most domain experts can write what they want at a high level. Commonly, this is then “translated” to a lower level before the compiler gets hold of it. This is suboptimal.
• Projects such as Copperhead and Theano compile directly from the high level, but they are still niche.
9. More Ranting
• Today’s vector machines (and vector co-processors, or GPUs) were made for array-oriented computing.
• The software stack has just not caught up, which is unfortunate, because APL came out in 1963.
• There is a reason Fortran remains popular.
10. Array-Oriented Computing
• Loosely defined as: organize data together, and operate on it together (or in cache-sized chunks) with array-level operations (e.g., NumPy).
[Figure: object-oriented vs. array-oriented data layout. Object1 through Object6 each carry Attr1, Attr2, and Attr3; one side groups the attributes by object, the other groups each attribute across all objects.]
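The two layouts described above can be sketched with the standard library. One layout is a list of per-object records (object-oriented). The other is one contiguous `array.array` per attribute (array-oriented). The `Particle` class, its fields, and the values are made up for illustration. In practice, NumPy arrays would be the idiomatic container.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Particle:
    """Object-oriented layout: each object keeps its own attributes together."""
    x: float
    y: float
    mass: float


objects = [Particle(1.0, 2.0, 0.5), Particle(3.0, 4.0, 1.5), Particle(5.0, 6.0, 2.0)]

# Array-oriented layout: each attribute stored contiguously across all objects.
xs = array("d", (p.x for p in objects))
ys = array("d", (p.y for p in objects))
masses = array("d", (p.mass for p in objects))

# Object-oriented access: hop from object to object, fetching one field each time.
total_mass_oo = sum(p.mass for p in objects)

# Array-oriented access: one sweep over a single contiguous buffer of doubles,
# the access pattern that vectorized and compiled loops are built for.
total_mass_ao = sum(masses)

print(total_mass_oo, total_mass_ao, masses.itemsize * len(masses))
```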
11. Goal:
Numba should be the world’s best
array-oriented compiler.
12. NumPy + Mamba = Numba
[Figure: Python Function → LLVM-PY → LLVM Library → Machine Code, shown alongside ISPC, OpenCL, OpenMP, CUDA, and Clang; vendors: Intel, AMD, Nvidia, Apple, ARM.]
13. Uses of Numba
[Figure: a Python Function is compiled by Numba into a function pointer used by the NumPy Runtime. Uses shown: Ufuncs, Generalized UFuncs, Window Kernel Funcs, Function-based Indexing, Memory Filters, I/O Filters, Reduction Filters, Computed Columns.]
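One way to read these uses is as a division of labor. The user writes only a small scalar kernel, and the runtime owns the loop that calls it. That loop may be elementwise (a ufunc), a reduction, or a sliding window. The sketch below models that split in plain Python. `apply_elementwise`, `apply_reduction`, and `apply_window` are hypothetical drivers, not a NumPy or Numba API. In the design the slide suggests, the kernel would reach the runtime as a compiled function pointer rather than a Python callable.

```python
from typing import Callable, List, Sequence


def apply_elementwise(kernel: Callable[[float], float], xs: Sequence[float]) -> List[float]:
    """Ufunc-style driver: call the scalar kernel once per element."""
    return [kernel(x) for x in xs]


def apply_reduction(kernel: Callable[[float, float], float],
                    xs: Sequence[float], init: float) -> float:
    """Reduction driver: fold the binary kernel across the sequence."""
    acc = init
    for x in xs:
        acc = kernel(acc, x)
    return acc


def apply_window(kernel: Callable[[Sequence[float]], float],
                 xs: Sequence[float], width: int) -> List[float]:
    """Window-kernel driver: call the kernel on each full window of `width` elements."""
    return [kernel(xs[i:i + width]) for i in range(len(xs) - width + 1)]


data = [1.0, 2.0, 3.0, 4.0]
squares = apply_elementwise(lambda x: x * x, data)
total = apply_reduction(lambda acc, x: acc + x, data, 0.0)
moving_avg = apply_window(lambda w: sum(w) / len(w), data, 2)
print(squares, total, moving_avg)
```

Because the drivers never inspect the kernel, swapping a slow Python callable for compiled machine code changes the speed but not the interface.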
14. Uses of Numba in SciPy
• optimize, integrate, special, ode
• Writing more of SciPy at a high level
15. Numba: a deeper look
Numba is a Python-to-LLVM translator. It translates Python to LLVM IR, and the LLVM machinery then creates machine code from there. Numba is NumPy-aware: it understands NumPy’s type system, methods, C-API, and data structures.
16. Numba: written in Python
• Numba itself is pure Python. It uses (an updated) LLVM-py to interact with the LLVM C++ library and build a representation of the code in LLVM assembly.
• LLVM then creates machine code (or a “bitcode” module, which can be persisted or sent to another machine).
• The machine code is equivalent to a C-level function pointer (e.g., a ctypes function).
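The ctypes comparison can be made concrete with the standard library alone. The sketch below wraps an ordinary Python function in a C-callable thunk and takes its raw address. It then rebuilds a callable from that bare address, which is the same handoff a caller of JIT-generated code would perform. Here the target is a Python callback rather than LLVM-generated machine code. `BinaryOp` and `py_add` are illustrative names.

```python
import ctypes

# C signature: double (*)(double, double)
BinaryOp = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double, ctypes.c_double)


def py_add(a, b):
    return a + b


# Wrap the Python function as a C-callable function pointer. Keep this reference
# alive: if it is garbage-collected, the address below becomes dangling.
c_add = BinaryOp(py_add)

# The raw machine address, i.e. what a JIT would hand back for compiled code.
address = ctypes.cast(c_add, ctypes.c_void_p).value

# Rebuild a callable from the bare integer address and call through it.
f = BinaryOp(address)
result = f(2.0, 3.5)
print(hex(address), result)
```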
22. Status and Future
• The current master branch is mostly due to Jon Riehl (Resilient Science), sponsored by Continuum Analytics, Inc.; it interprets bytecode directly.
• The new devel branch works with the AST directly and is making rapid progress:
- Mark Florisson (minivect)
- Siu Kwan Lam (pymothoa)
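The difference between the two front ends is easy to see with the standard library. Bytecode is a flat list of stack-machine instructions. The AST keeps the expression tree, which a translator can walk directly. The function `scale_and_shift` is only an example.

```python
import ast
import dis

source = """
def scale_and_shift(a, b):
    return a * b + 1
"""

# Bytecode view: compile and run the definition, then list its flat instruction stream.
namespace = {}
exec(compile(source, "<example>", "exec"), namespace)
opnames = [ins.opname for ins in dis.get_instructions(namespace["scale_and_shift"])]

# AST view: the returned expression is a tree, BinOp(BinOp(a * b) + 1).
tree = ast.parse(source)
func_def = tree.body[0]
expr = func_def.body[0].value  # the expression after `return`

print(opnames)
print(ast.dump(expr))
```

With the AST, operator nesting and operand structure are explicit. With bytecode, a translator must first reconstruct them by simulating the evaluation stack.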
23. Software Stack Future?
Plateaus of code reuse + DSLs
[Figure: layered stack, top to bottom: SQL, R, TDPL, Matlab; Python; ObjC, C, Fortran, C++; LLVM.]