A few weeks into your tenure as a software engineer at the Ministry of Truth you are assigned your first real feature request: write a context manager that can make “2 + 2” equal 5 at runtime. Your solution should be written only in Python (for maximum portability). Absurd? Perhaps, but you know better than to ask questions. You are no thought-criminal.
In this talk I walk through the steps I took to modify the value of two plus two in CPython at runtime—using only Python and the ctypes module. What began for me as a silly and frivolous side project became an education in how the python data model works behind the scenes and how CPython compiles, optimizes, and executes python code.
The goal of this talk is to provide an introduction to CPython internals while walking through the steps needed to monkeypatch integer addition to make “2 + 2” equal 5. The audience should come away with a better understanding of how python objects and types are represented in memory, how references are counted, and how python scripts are transformed into abstract syntax trees, compiled into code objects, and then executed by the CPython virtual stack machine. And because I’ve limited myself to using ctypes, these topics can be explored without familiarity with C as a prerequisite.
4. Naive approach
old_int_add = int.__add__
def int_add(a, b):
if a == b == 2:
return 5
else:
return old_int_add(a, b)
int.__add__ = int_add
int.__dict__['__add__'] = int_add
TypeError: can't set attributes of built-in/extension type 'int'
TypeError: 'dictproxy' object does not support item assignment
23. Using the dis module to see what’s going on
import dis
two = 2
def add_two_plus_two():
return two + two
>>> dis.dis(add_two_plus_two)
2 0 LOAD_GLOBAL 0 (two)
3 LOAD_GLOBAL 0 (two)
6 BINARY_ADD
7 RETURN_VALUE
24. The BINARY_ADD instruction opcode
/* ceval.c: PyEval_EvalFrameEx */
TARGET_NOARG(BINARY_ADD) {
w = POP();
v = TOP();
if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
/* INLINE: int + int */
register long a, b, i;
a = PyInt_AS_LONG(v);
b = PyInt_AS_LONG(w);
i = (long)((unsigned long)a + b);
x = PyInt_FromLong(i);
}
/* ... */
}
PyInt_CheckExact(v) PyInt_CheckExact(w)
25. class int2(int):
def __add__(self, other):
if self == other == 2:
return 5
else:
return int.__add__(self, other)
>>> (2).__class__ = int2
Solution: change (2).__class__ to something
other than int
TypeError: __class__ assignment: only for heap types
27. >>> with override_type(2, int2):
... print(eval("2 + 2"))
5
>>> two = 2
>>> with override_type(2, int2):
... print(two + two)
5
>>> with override_type(2, int2):
... print(2 + 2)
4
28. Final obstacle: peephole optimization
• When we disassembled the bytecode earlier, we used a
variable two rather than a literal 2:
import dis
two = 2
def add_two_plus_two():
return two + two
>>> dis.dis(add_two_plus_two)
2 0 LOAD_GLOBAL 0 (two)
3 LOAD_GLOBAL 0 (two)
6 BINARY_ADD
7 RETURN_VALUE
29. Final obstacle
• What happens if we use a literal 2?
import dis
def add_two_plus_two():
return 2 + 2
>>> dis.dis(add_two_plus_two)
4 0 LOAD_CONST 2 (4)
3 RETURN_VALUE
?!
30. What’s going on?
• peephole optimization: an optimization technique in compilers
where certain recognized instructions are replaced with
shorter or faster versions.
• In CPython, performed by the C function PyCode_Optimize
• Does not occur in an eval, hence why eval("2 + 2") works.