Talk at PyCon 2022 on building binary packages for Python. It gives an overview and an in-depth look at pybind11 for binding, scikit-build for creating the build, and build & cibuildwheel for making the binaries that can be distributed on PyPI.
2. Is Python fast?
Python: 4 minutes
Python -> NumPy, SciPy
High level language + libraries:
Easy exploration of different algorithms
Compiled!
From “The counter-intuitive rise of Python in scientific computing”:
Problem:
projection of ~1B cells
https://cerfacs.fr/coop/fortran-vs-python
Fortran: 6 hours and 30 minutes
3. Python has great performance
As long as you have a library that has your desired algorithm
But what do you do if you don’t?
4. Solution 0: dump Python
Other languages can offer native performance (C, Rust, etc)
But Python is easy to learn, fast to write, and has a massive ecosystem
We can split code into “driver” code that takes virtually no time to execute
and “performance-critical” code that takes most of the execution time
Python
Not Python
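One way to find where that split falls is to profile the driver and see which function accumulates the time. The sketch below uses the standard-library cProfile module; slow_kernel, driver, and hottest_function are hypothetical names made up for illustration and are not part of the talk.

```python
import cProfile
import pstats


def slow_kernel(n):
    """Stand-in for performance-critical code: a tight numeric loop."""
    total = 0.0
    for i in range(n):
        total += i * i
    return total


def driver():
    """Stand-in for driver code: it only orchestrates the kernel."""
    return slow_kernel(300_000)


def hottest_function(func):
    """Profile func() and return the name of the function with the most own time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (calls, primitive calls, own time, cumulative time, callers)
    (_, _, name), _ = max(stats.stats.items(), key=lambda item: item[1][2])
    return name


print(hottest_function(driver))
```

The function reported here is the candidate for moving out of Python; the driver around it can usually stay as it is.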
5. Solution 1: Numba
Pros
As fast or faster than any other possible solution
Will target the exact hardware used
Supports parallelization, GPUs, and more
Can be ahead-of-time compiled
Cons
Takes time to JIT (though pretty fast)
(Somewhat) heavy dependency
Expansive but still limited feature set - numeric heavy
Slow to support new Python, NumPy, etc.
JIT compiler for Python functions using LLVM(Lite)
import numba

@numba.vectorize
def f(x):
    return x * 2

@numba.jit
def f(x):
    return x * 2
6. Solution 2: Make Python Faster
This was not the original goal of Python, but it is now actively being worked on.
PyPy is a JIT version of Python
Pyjion is a JIT for regular Python 3.10+
Facebook has Cinder, a fast fork of CPython 3.8
Dropbox has Pyston, a fast version of Python
And CPython itself is getting faster now!
But these projects generally don’t make heavy numeric work faster.
Why? Python was designed to support the solution we’ll see on the next page.
NumPy, etc
7. Solution 3: (Pre)compile
This is part of the original design of Python - it’s how CPython works
Write code using the CPython API (pybind11, Cython, SWIG) or write a compiled lib with an interface (ctypes, cffi)
Compile the extension (Setuptools, Scikit-build, mesonpy)
Build wheels (cibuildwheel, multibuild, maturin-action)
This is why faster Python doesn’t make scientific codes much faster - they are already compiled!
8. What is a binary extension?
Python module (Python), called by user code:
Easy: Lots of examples, similar to user code
Portable: Usually works anywhere
High level: can express complex relationships easily
Binary extension (compiled + Python), called by user code:
Can be any language: C, C++, Rust, Cython, MyPyC, …
Complex: limited examples, multilingual
Specific: Compiled per architecture / Python version
Low level: can achieve high performance
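From the interpreter's side, the difference between the two kinds of module shows up in which loader handles the import. The following is a minimal sketch using only the standard library; is_binary_extension is a hypothetical helper name, and it only inspects the loader type without importing the module.

```python
import importlib.machinery
import importlib.util


def is_binary_extension(module_name):
    """Return True if the named top-level module would load from a compiled extension file."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        raise ModuleNotFoundError(module_name)
    return isinstance(spec.loader, importlib.machinery.ExtensionFileLoader)


# File suffixes this interpreter accepts for binary extensions
print(importlib.machinery.EXTENSION_SUFFIXES)

# json is written in Python, so this reports False
print(is_binary_extension("json"))
```

Modules compiled into the interpreter itself (built-in or frozen) use other loaders, so they also report False here.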
9. Pure / Binary wheels
mypy-0.950-py3-none-any.whl
Pure wheel.
Any impl
Any ABI
Any platform
Should not contain
compiled extensions.
You can force wheels with --only-binary=:all: or PIP_ONLY_BINARY=:all:
mypy-0.950.tar.gz
SDist. Not a wheel.
Contains raw source code.
Requires the build backend
(setup.py, pyproject.toml, etc)
mypy-0.950-cp310-cp310-win_amd64.whl
Compiled wheel.
CPython 3.10 (implementation)
CPython 3.10 (ABI)
Windows 64-bit
Pip selects the most specific wheel
This is the fastest one!
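The three tags at the end of a wheel filename can be read mechanically. Below is a simplified sketch that splits a filename into its parts and checks for a pure wheel; WheelTags, parse_wheel_filename, and is_pure are hypothetical names, and compressed tag sets such as py2.py3 are deliberately left unexpanded.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename):
    """Split name-version[-build]-python-abi-platform.whl into its parts."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError(f"unexpected wheel filename: {filename}")
    name, version = parts[0], parts[1]
    python, abi, platform = parts[-3:]
    return WheelTags(name, version, python, abi, platform)


def is_pure(tags):
    """A pure wheel has no ABI requirement and runs on any platform."""
    return tags.abi == "none" and tags.platform == "any"


for wheel in ("mypy-0.950-py3-none-any.whl", "mypy-0.950-cp310-cp310-win_amd64.whl"):
    tags = parse_wheel_filename(wheel)
    print(wheel, "->", "pure" if is_pure(tags) else "compiled", tags)
```

Real installers match these tags against an ordered list of tags supported by the running interpreter; that ranking is not reproduced here.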
10. But why compile?
Speed
High level Python uses low level compiled libraries for performance!
NumPy • Pandas • MyPy (optional) • PyTorch • Tornado • pydantic • uvloop • twisted • websockets • pyyaml • markupsafe • psutil • matplotlib • TensorFlow
Code reuse
There are lots of libraries out there in other languages. No need to rewrite!
Check for yourself with pipx run pypi-command-line wheels <package>
11. Disclaimers
This will be biased. I’ll often focus on stuff I am part of.
But I also liked these things enough to join them.
And I know those things best.
We will look at lots of “good practices”, maybe “best”, but certainly not “only”.
12. But it’s hard?
Let’s divide the problem into three stages:
Binding / Coding
Generic
ctypes
cffi
C++
Boost.Python
pybind11
nanobind
Python
MyPyC
Numba (AOT mode)
Other
Cython (custom lang)
SWIG (multi-lang)
Build System
Setuptools + custom code
Scikit-build
MesonPy
Enscons
Hatch-mypyc
No native support in flit, hatchling, poetry, etc, but some
have plugins / custom code examples.
Wheel building
cibuildwheel
multibuild
maturin-action (Rust)
Intentional omissions!
Numba (JIT Python)
CPPYY (JIT C++)
PyPy (JIT Python)
Pyjion (JIT Python)
We will select one tool for each
job for this talk - and sometimes
mention some of the others.
The goal is not to show you the
“right” way, but how simple it
can/should be, and what you
should not settle for.
13. Bindings / Coding
Access to (maybe existing) compiled code
C, C++, Rust, Go, etc.
✅ Can reuse existing work
✅ Can use across languages
✅ Can leverage strong language support
❌ Must know at least two languages
From-scratch fast code
May prefer something that looks like Python
✅ Less to learn (but not general)
✅ Can make extension optional
14. A fork in the road
(not a pun on UNIX forks)
C function to wrap:
float square(float x) {
    return x * x;
}
C interface library (ctypes):
from ctypes import cdll, c_float
lib = cdll.LoadLibrary('./simple.so')
lib.square.argtypes = (c_float,)
lib.square.restype = c_float
lib.square(2.0)
Python extension (CPython API):
#include <Python.h>

/* Unpack the Python argument, call square(), and box the result */
static PyObject* square_wrapper(PyObject* self, PyObject* args) {
    float input, result;
    if (!PyArg_ParseTuple(args, "f", &input)) {
        return NULL;
    }
    result = square(input);
    return PyFloat_FromDouble(result);
}

/* Method table: Python name, C function, calling convention, docstring */
static PyMethodDef pysimple_methods[] = {
    { "square", square_wrapper, METH_VARARGS, "Square function" },
    { NULL, NULL, 0, NULL }
};

static struct PyModuleDef pysimple_module = {
    PyModuleDef_HEAD_INIT,
    "pysimple",
    NULL,
    -1,
    pysimple_methods
};

/* Entry point called by "import pysimple" */
PyMODINIT_FUNC PyInit_pysimple(void) {
    return PyModule_Create(&pysimple_module);
}
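The ctypes route can be tried without compiling anything by pointing it at a library that is already on the system. The sketch below assumes a POSIX platform where the C math library can be located or is already loaded into the Python process; load_sqrt is a hypothetical helper name, and the pattern of declaring argtypes and restype is the same as for square above.

```python
import ctypes
import ctypes.util


def load_sqrt():
    """Return the C library's sqrt as a callable taking and returning a double."""
    # find_library may return None (for example when no ldconfig or compiler is
    # available); on POSIX, CDLL(None) then exposes symbols already loaded into
    # the running Python process.
    path = ctypes.util.find_library("m")
    lib = ctypes.CDLL(path)
    # Without these declarations ctypes would assume int arguments and results
    lib.sqrt.argtypes = (ctypes.c_double,)
    lib.sqrt.restype = ctypes.c_double
    return lib.sqrt


c_sqrt = load_sqrt()
print(c_sqrt(9.0))
```

Nothing here is checked at build time: a wrong argtypes declaration still loads and silently produces garbage, which is part of the trade-off against a real extension module.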
16. Not just a JIT
Numba
Numba supports AOT (ahead of time) compilation, too!
Limited support, anyway.
# source_module.py
from numba.pycc import CC

cc = CC('my_module')

@cc.export('square', 'f8(f8)')
def square(a):
    return a ** 2

# setup.py
from distutils.core import setup
from source_module import cc

setup(...,
      ext_modules=[cc.distutils_extension()])

# usage
import my_module
my_module.square(1.414)
17. Fast Python
MyPyC
Piggy-backs on modern Python typing - used to make MyPy 5x faster!
import time

def fib(n: int) -> int:
    if n <= 1:
        return n
    else:
        return fib(n - 2) + fib(n - 1)

t0 = time.time()
fib(32)
print(time.time() - t0)
python fib.py
mypyc fib.py
python -c "import fib" # 10x faster!
For comparison, Numba JIT is 35x faster
18. Fast not-quite-Python
Cython
# cython: language_level=3
import time
import cython

@cython.ccall
def fib(n: cython.int) -> cython.int:
    if n <= 1:
        return n
    else:
        return fib(n - 2) + fib(n - 1)

t0 = time.time()
fib(32)
print(time.time() - t0)
Python-like language, transpiles to C/C++
✅ Fast (arrays too) (with the proper directives)
✅ Can also bind C & C++ (verbose)
✴ Many ways to do things
❌ May need oldest-supported-numpy when building
pipx run --spec cython cythonize fib.py
clang $(python3-config --cflags --ldflags) -shared -undefined dynamic_lookup \
    fib.c -o fib$(python3-config --extension-suffix) # macOS
python -c "import fib" # 9x faster
19. Binding: pybind11
Header-only pure C++11 CPython/PyPy interface
Trivial to add to a project
No special build requirements
No dependencies
No precompile phase
Not a new language
Think of it like the missing C++ API for CPython
Designed for one purpose!
20. Example of usage
#include <pybind11/pybind11.h>

int add(int i, int j) {
    return i + j;
}

PYBIND11_MODULE(example, m) {
    m.def("add", &add);
}
Standard include
Normal C++ to bind
Create a Python module
Signature statically inferred
Docs and parameter names optional
g++ -shared -fPIC example.cpp $(pipx run pybind11 --includes) -o example$(python3-config --extension-suffix)
Complete, working, no-install example (linux version)!
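The output name in that command matters: the interpreter only imports an extension whose filename carries a suffix it recognizes. The same suffix that python3-config --extension-suffix prints can be read from Python itself; extension_filename below is a hypothetical helper name used only for this sketch.

```python
import sysconfig


def extension_filename(module_name):
    """Build the filename this interpreter expects for a compiled module."""
    suffix = sysconfig.get_config_var("EXT_SUFFIX")
    return f"{module_name}{suffix}"


print(extension_filename("example"))
```

The printed name differs per interpreter version, implementation, and platform, which is why a compiled module built for one Python is not picked up by another.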
21. Many great features
Simple enough for tiny projects
659K code mentions of pybind11 on GitHub
Powerful enough for huge projects
SciPy, PyTorch, dozens of Google projects
Small binaries prioritized
Perfect for WebAssembly with Pyodide
Powerful object lifetime controls
py::keep_alive, std::shared_ptr, and more
NumPy support without NumPy headers
No need to lock minimum NumPy at build
Supports interop in both directions
Can even be used to embed Python in C++
Most STL containers and features supported
Including C++17 additions, like std::variant (or boost)
Vectorize methods or functions
py::vectorize, even on a lambda function
Trampoline classes and multiple inheritance
Complex C++ is supported
Complete control of the Python representation
Special methods, inheritance, pickling, and more
Buffer protocol and array classes
Includes Eigen support too
Cross-extension ABI
One extension can return a type wrapped in another one
22. Nanobind
C++17+ & Python 3.8+ only
Similar API to pybind11
Intentionally more limited than pybind11
Focus on small, efficient bindings
Some ideas can be backported to pybind11
https://github.com/wjakob/nanobind
23. Example project
Let’s work through a complete example!
Every line of code needed. Period.
All platforms. Everything.
Let’s bind a tiny bit of CLI11 (argument parser library)
from cli11 import App
app = App("hello")
app.add_flag("--this")
app.parse(["--this"])
assert app["--this"] == 1
# Should work
print(app)
app["--that"] # -> KeyError
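To make the target behavior concrete before any C++ is written, here is a pure-Python stand-in with the same surface. It is a hypothetical mock, not the CLI11 binding: the flag counting, the ValueError on unknown arguments, and the string form are assumptions chosen only so the usage above runs.

```python
class App:
    """Hypothetical pure-Python stand-in for the interface the binding should expose."""

    def __init__(self, description=""):
        self.description = description
        self._counts = {}

    def add_flag(self, name):
        # Register a flag; its value is the number of times it was seen
        self._counts[name] = 0

    def parse(self, args):
        for arg in args:
            if arg not in self._counts:
                # Assumption: the real binding may raise a different error type
                raise ValueError(f"unknown argument: {arg}")
            self._counts[arg] += 1

    def __getitem__(self, name):
        # Unregistered flags raise KeyError, matching the usage above
        return self._counts[name]

    def __str__(self):
        return f"App({self.description!r})"


app = App("hello")
app.add_flag("--this")
app.parse(["--this"])
assert app["--this"] == 1
print(app)
```

A mock like this can double as a test oracle: the same assertions should pass once the compiled module replaces it.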
27. Build systems
There are some great pure Python build systems today with PEP 621!
But your choices for building binaries are limited!
Setuptools/distutils
No C++ std support
No multithreaded builds
No partial builds
No compiler features
(etc)
Extensions:
Native Cython support
numpy.distutils
mypyc.build
numba.pycc.CC().distutils_extension
pybind11.setup_helpers
setuptools_rust
setuptools_golang
From scratch:
Enscons
Early PEP 517 adopter
(Uses distutils/setuptools internally)
MesonPy
New PEP 621 meson adaptor
Maturin
Rust PEP 621 builder
Wrapper:
Scikit-build
Currently a wrapper around setup
Planned move toward a from-scratch builder!
28. Pybind11 and setuptools
setup.py
Setuptools
Helpers can be used for easy setuptools support
Proper C++
Proper C++ flags & pybind11 headers
from setuptools import setup
from pybind11.setup_helpers import Pybind11Extension, build_ext

ext_modules = [
    Pybind11Extension(
        "python_example",
        ["src/main.cpp"],
        cxx_std=11,
    ),
]

setup(
    ...,
    ext_modules=ext_modules,
    cmdclass={"build_ext": build_ext},  # Optional!
)
Optional parallel compiler utility included
29. Scikit-build
Scikit-build is a CMake-setuptools adaptor from Kitware, the makers of CMake
First introduced as PyCMake at SciPy 2014 and renamed in 2016
Includes CMake for Python and ninja for Python
Pybind11 has a scikit-build example!
pybind/cmake_example is one of the most popular
examples of combining CMake and Python on GitHub
Updated now with lots of fixes!
But this duplicates the code for everyone
Adding Apple Silicon support will now have to be
done on every project that copied the example, etc.
Two new maintainers recently joined the project
One from pybind11/cibuildwheel/build (me), one from cibuildwheel/manylinux (Matthieu Darbois)!
cmake package for Python
manylinux archs • musllinux • Apple Silicon • cibuildwheel • nox
Revamped ninja for Python too
OS’s and archs • cibuildwheel • nox
If you need a refresher on CMake, I wrote a book for that: https://cliutils.gitlab.io/modern-cmake/
30. Scikit-build plans
Develop scikit-build-core
PEP 517 builder, setuptools/distutils free
Compatibility layer for scikit-build
Limited public API helps
Proper setuptools extension
And Hatch, Poetry, etc.
Generalize, perhaps?
PEP 621 direct build
Best for many cases?
Add extension discovery mechanism
Easy integration with pybind11, other Python packages!
Possible support in CMake itself
# pyproject.toml
requires = ["pybind11", …]
# CMakeLists.txt
find_package(pybind11 CONFIG REQUIRED)
https://iscinumpy.dev/post/scikit-build-proposal/
34. Redistributables
PyPI
Wheels for all common platforms
Conda-forge
Mostly automated, just propose a recipe
Linux
manylinux/musllinux images
Controlled docker images
Multiple architectures
Auditwheel
macOS
Target version important
python.org Python required
Arm / Universal2 cross compile
Delocate
Windows
Easiest platform for Python
Any Python will do (NuGet, etc)
32-bit still important, fledgling ARM
Delvewheel (newish)
35. cibuildwheel 🎡
Supports all major CI providers
GitHub Action provided too!
Can run locally
Affiliated (shared maintainer) with manylinux
Close collaboration with PyPy devs
Joined the PyPA in 2021
Used by matplotlib, mypy, scikit-learn, and more
Supports:
Targeting macOS 10.9+
Apple Silicon cross-compiling 3.8+
All variants of manylinux (including emulation)
musllinux
PyPy 3.7-3.9
Repairing and testing wheels
Reproducible pinned defaults (can unpin)
New in cibuildwheel 2
Python 2 & 3.5 removed, 3.10 added
Pre-release Python support
pyproject.toml support
Optional pypa/build support
All pybind examples
include cibuildwheel!
New in cibuildwheel 2.1-2.4
Local Windows & macOS runs
TOML overrides array
manylinux2014 default
musllinux
Environment variable passthrough
Experimental Windows ARM
New in 2.5, released today!
Stable ABI support
Build from SDist
tomllib on Python 3.11 (host)
42. Think outside the box
clang-format-wheel
Scikit-Build
Runs LLVM’s CMake build
cibuildwheel
Builds python-independent binary wheels
1-2 MB binaries on PyPI
No “binding”, only entrypoint!
- repo: https://github.com/pre-commit/mirrors-clang-format
  rev: "v14.0.1"
  hooks:
    - id: clang-format
      types_or: [c++, c, cuda]
pipx run clang-format
Use with pre-commit, even on pre-commit.ci!
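A wheel with "no binding, only entrypoint" needs very little Python: a console-script function that forwards its arguments to the bundled executable and passes the exit code back. The sketch below illustrates that idea only; run_binary is a hypothetical name, the layout of the real clang-format wheel is not shown in this talk, and the current Python interpreter is used as a stand-in binary so the example runs anywhere.

```python
import subprocess
import sys


def run_binary(executable, args):
    """Run an executable with the given arguments and return its exit code."""
    return subprocess.call([executable, *args])


# Stand-in for a bundled tool: the running Python interpreter.
# A real wheel would pass the path of the binary it ships instead.
exit_code = run_binary(sys.executable, ["-c", "print('hello from a child process')"])
print("exit code:", exit_code)
```

An entry point built on this would call run_binary with sys.argv[1:] and exit with the returned code, so callers such as pre-commit see the tool's own status.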
43. Quickstart: scikit-hep/cookie (NEW)
You can make a project following the Scikit-HEP developer
guidelines quickly with cookiecutter
pipx run cookiecutter gh:scikit-hep/cookie
Choose pybind11 or skbuild from the eleven backend choices!
Generation tested with Nox in GitHub Actions
Pyodide powered in-browser repo-review tool
See https://scikit-hep.org/developer
& https://github.com/scikit-hep/cookie
(Not HEP specific, except for the defaults)
Also maturin (Rust)!
44. Key Takeaways
A great place for modern packaging advice:
https://scikit-hep.org/developer
My blog with useful links at the top:
https://iscinumpy.dev
Great examples:
https://github.com/pybind/python_example
https://github.com/pybind/scikit_build_example
https://github.com/pybind/cmake_example
https://github.com/scikit-build/scikit-build-sample-projects
Example source code:
https://github.com/henryiii/pybind11_skbuild_cibuildwheel_example
This work was partially supported by the National Science Foundation under Cooperative Agreement OAC-1836650.
45. My Projects
https://iscinumpy.dev
https://scikit-hep.org
https://iris-hep.org
C++ & Python: pybind11 (python_example, cmake_example, scikit_build_example) • Conda-Forge ROOT
Building Python Packages: cibuildwheel • build • scikit-build (cmake, ninja, sample-projects) • Scikit-HEP/cookie
Scikit-HEP: Histograms: boost-histogram • Hist • UHI • uproot-browser
Scikit-HEP: Other: Vector • Particle • DecayLanguage • repo-review
Other Python: Plumbum • POVM • PyTest GHA annotate-failures
Other Ruby: Jekyll-Indico
Other C++: CLI11 • GooFit
My books and workshops: Modern CMake • CMake Workshop • Computational Physics Class • Python CPU, GPU, Compiled minicourses • Level Up Your Python
47. A few features I’ve helped with
py::kw_only and py::pos_only
Support Python 3 keyword only arguments
Support Python 3.8 position only arguments
All from any version of Python (C API)
py::prepend
Add to the beginning of the overload chain
py::type
Access and manipulate the type
https://iscinumpy.gitlab.io/post/pybind11-2-6-0/
Checks can be run from nox
Easier for new developers to contribute
Large clang-tidy cleanups
Readability, better performance, modernization
CMake
Integration with standard CMake features
CMake 3.4+ required
Support for newer features (3.18.2+ best)
FindPython optionally supported
CUDA as a language supported and tested
Python discovery can be deactivated
Portable config file (now included in PyPI package)
Helper functions added
Check importable libraries easily
New modular target design
Follows best practices given in
https://cliutils.gitlab.io/modern-cmake
48. New CI
Massive rewrite of the CI in GHA, with 60+ jobs covering far more than ever before
Special thanks to Google for funding extra parallel jobs!
Jobs:
Windows, macOS and Linux, Python 2.7-3.9 (3.10 now) & PyPy
GCC (4.8 and several newer), Clang (8 versions), MSVC 2015-2019
ICC, NVCC (CUDA 11), and NVHPC (PGI) 20.9
MinGW
CentOS 7 and 8
C++: 11, 14, 17, and 20
Debug build
Valgrind build
Clang-tidy build
CMake configure check, 3.4, 3.7, 3.8, and 3.18
Packaging tests verifying every file in the wheel/SDist
Newly supported compilers!
Discovered and fixed bug in CPython 3.9.0
49. A family of projects
pybind/python_example:
A setuptools project using setuptools helpers
and binary wheel builds, conda recipe, and more.
pybind/scikit_build_example:
A CMake project using scikit-build (new for 2.6).
pybind/cmake_example:
A CMake project using manual setuptools integration.
Major cleanup for 2.6, now includes Apple Silicon
support and more.
pybind/pybind11_mkdoc:
A documentation parser using LLVM.
Support for conda, pip, and cibuildwheel workflows on GitHub Actions
Support for Apple Silicon cross-compile
Support for PyPy wheels
Dependabot GHA updates
Template repositories now
And several more that I’ve not helped with
50. Snippets from boost-histogram
Exporting C++ addition to Python
.def(py::self += py::self)
Custom equality, with cast
.def("__eq__",
     [](const histogram_t& self, const py::object& other) {
         try {
             return self == py::cast<histogram_t>(other);
         } catch(const py::cast_error&) {
             return false;
         }
     }
)
52. Snippets from boost-histogram
Static constructor with automatic vectorization (ufunc-like)
.def_static("_make", py::vectorize([](const double& a, const double& b) {
    return weighted_sum(a, b);
}))
Direct access to structured memory in NumPy
PYBIND11_NUMPY_DTYPE(weighted_mean,
                     sum_of_weights,
                     sum_of_weights_squared,
                     value,
                     _sum_of_weighted_deltas_squared);