CPython compiles Python code into bytecode, which its virtual machine then executes. Small integers (-5 to 256) are cached as singletons and many string literals are interned, so a single copy is reused in memory. Beginners sometimes confuse object identity (is) with equality (==). The Python compiler performs optimizations such as replacing mutable literals with immutable ones and precalculating constant expressions. Developers can inspect objects and disassemble code to understand how Python internals affect performance.
2. Abstractions galore
Programming is all about abstractions.
All non-trivial abstractions, to some
degree, are leaky.
Joel Spolsky’s Law of Leaky Abstractions
CPython translates your Python code into
bytecode instructions for its virtual machine.
3. Small integers are singletons
● Everything is an object
● Integers are used all over the place
● Small integers are singletons: there is only
one copy of each small number
● -5 through to 256 are all cached like this
4. Small integers are singletons
Beginners confuse is with ==
Identity is not the same as equality
Use is only for objects that you know are
always singletons, like None.
5. Some strings are interned
Interning: reusing a singleton copy on demand
● All identifiers are interned
● Many string literals are interned
Comparing pointers in C is so much faster than
comparing the contents
Everything is a dictionary -> lots of comparing
6. When to intern too
Python code can use the intern() function (sys.intern() in Python 3)
Use this together with is identity testing
When:
● Large numbers of strings
● Lots of dictionary access or other equality
tests
Everything in Python is an object. This includes integers.
And integers are used all over the place; ask for the length of something? The result is an integer object. Want to index into a list? You have to create an integer object for that. Etc.
To save on creating and destroying too many objects, the most commonly used integer values are cached; you always get the *same, single copy* of such integers. That’s fine: because these objects are immutable, sharing a single copy carries no risk of corruption.
The values -5 through to 256 are all singletons.
Why does this matter? Beginners may get confused between `is` and `==`, between identity testing and equality testing.
The former tests if two object references are pointing to the same object, while the latter tests if two objects contain the same value.
Since these small integers are all singletons, `is` *always works for these*; if you have a value 42 in one place and another 42 in another, they are always the same object. For larger integers there is no such guarantee: two equal values can be separate objects, `is` returns False, and confusion ensues.
Demo:
foo = 6 * 7
bar = 40 + 2
foo == bar
foo is bar
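The contrast with larger integers can be sketched as follows. This assumes CPython; the variable names are illustrative, and the integers are built at runtime with `int()` so that the compiler cannot share a single constant between them.

```python
# Build the integers at runtime so the compiler cannot reuse one constant.
small_a = int("42")
small_b = int("42")
large_a = int("1000")
large_b = int("1000")

# Both pairs are equal in value.
print(small_a == small_b, large_a == large_b)

# Only the small pair is guaranteed (in CPython) to be the same object.
print(small_a is small_b)
print(large_a is large_b)
```

The identity results are a CPython implementation detail; only the `==` results are guaranteed by the language.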
Some strings are cached too; they are *interned*.
Interning means that the interpreter explicitly reuses a singleton copy of a string. Small integers are reused by the constructor, so *always*. Interned strings are only reused explicitly: the interpreter decides whether to intern a string when creating it.
In Python, all identifiers (names in your program, including attributes on objects) are interned. Creating a class? Then the class name, all attribute names (including method names), and all argument and local names in its functions are interned.
When you create a string literal (a string value in quotes) and the value *looks enough* like an identifier, it is interned too.
Why was this done?
Namespaces in Python are dictionaries. A lookup in a dictionary is fast thanks to hashing, but always requires an equality test on the key too, because different keys can land in the same slot of the dictionary table. Python code does **loads** of namespace lookups, all the time.
Python first tests if the C pointers are the same, an identity test, because that is so much faster than a string comparison, character by character.
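One rough way to observe this identity shortcut is to count equality calls during dictionary lookups. The `CountingKey` class below is a hypothetical helper written for this sketch, not something from the talk or the standard library, and the behaviour shown is what CPython does.

```python
class CountingKey:
    """Hypothetical key type that counts how often __eq__ is called."""

    eq_calls = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.name == other.name


key = CountingKey("spam")
table = {key: 1}

# Look up with the very same object: the identity test succeeds first.
CountingKey.eq_calls = 0
table[key]
same_object_calls = CountingKey.eq_calls

# Look up with an equal but distinct object: __eq__ has to run.
CountingKey.eq_calls = 0
table[CountingKey("spam")]
equal_object_calls = CountingKey.eq_calls

print(same_object_calls, equal_object_calls)
```

When the key is the same object, no comparison of contents is needed, which is the saving interning buys for strings.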
If your program has to handle a *lot* of text lookups (in dictionaries, for example), it could be advantageous to use interning too.
You can use the function `intern()` to produce singletons (a built-in in Python 2; in Python 3 it is `sys.intern()`); apply it judiciously to your dictionary keys and anywhere you want to test for those keys.
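A minimal sketch of explicit interning, assuming Python 3, where the function is `sys.intern()`. The strings are assembled at runtime because such strings are not interned automatically; the variable names are illustrative.

```python
import sys

# Strings built at runtime are normally separate objects, even when equal.
parts = ["user", "-", "name"]
raw_a = "".join(parts)
raw_b = "".join(parts)

# sys.intern() returns the one shared copy for a given string value.
key_a = sys.intern(raw_a)
key_b = sys.intern(raw_b)

print(raw_a == raw_b)  # the contents are equal
print(key_a is key_b)  # the interned results are the same object
```

Both the stored dictionary keys and the strings used to look them up need to be interned for the identity shortcut to apply.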
The peephole optimizer is part of the Python compiler. It applies a few tricks to your code.
The Python compiler stores not just code, but also constants: integers, strings, tuples, anything that is immutable and defined with your code is stored as a constant for quick and easy access.
To aid in this:
Expressions are simplified
Some mutable objects are replaced with immutables
(Next slides elaborate)
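Expression simplification can be observed by compiling a small snippet and inspecting the resulting code object. This is a sketch assuming CPython; the source string and the `<example>` filename are made up for illustration.

```python
import dis

# Compile a statement that contains a constant expression.
code = compile("seconds = 24 * 60 * 60", "<example>", "exec")

# The precomputed product is stored among the code object's constants.
print(code.co_consts)

# The disassembly loads the folded value instead of multiplying at runtime.
dis.dis(code)
```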
Mutable literals in membership tests are replaced with an immutable variant
list -> tuple
set -> frozenset
Membership testing in sets is faster than in a tuple, so make use of it!
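The replacement of mutable literals can be checked the same way, by looking at the constants the compiler stored. This is a sketch assuming CPython; the expressions and the `<example>` filename are illustrative.

```python
# Compile two membership tests, one against a list literal, one against a set.
list_test = compile("x in [1, 2, 3]", "<example>", "eval")
set_test = compile("x in {1, 2, 3}", "<example>", "eval")

# The list literal has been stored as a tuple constant.
print(list_test.co_consts)

# The set literal has been stored as a frozenset constant.
print(set_test.co_consts)
```

Because the set literal becomes a prebuilt frozenset, writing `x in {1, 2, 3}` gets hash-based lookup without rebuilding the set on every test.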