In which Richard will tell you about some things you should never (probably ever) do to or in Python. Warranties may be voided. The recording of this talk is online at http://www.youtube.com/watch?v=H2yfXnUb1S4
2. Monday, 8 July 13
In this talk I'm going to poke around in some strange corners of Python and perhaps show
you some things you can do with Python that you probably shouldn't. First up I'm going to
look at some strange edge cases of the Python grammar.
3. >>> from serialise_marshal import SerialiseMarshal
>>> try:
...     from serialise_json import SerialiseJSON
... except:
...     SerialiseJSON = None
...     pass
...
Let's say we have some mixin classes that perform serialisation. Let's say that our preferred
mixin might not be available, but we want things to go on rocking regardless.
4. >>> from serialise_marshal import SerialiseMarshal
>>> try:
...     from serialise_json import SerialiseJSON
... except:
...     SerialiseJSON = None
...     pass
...
>>> class foo(SerialiseJSON or SerialiseMarshal):
...     pass
...
So, did you know that the bases in a class definition can also be expressions? Oh yes.
Because "inheritance" just doesn't cut it in this modern world of rapidly changing
serialisation protocols. We need more "fallbackitance".
5. >>> try:
...     B0RK
... except eval('NameError'):
...     print("Caught!")
... else:
...     print("ok")
...
Caught!
So, who knew that except clauses can be expressions? Whatever the expression evaluates to
had better be an exception class (or a tuple of them), but as long as it is, that's the
exception type that's caught.
6. >>> def generate_stuff():
...     for i in range(3):
...         yield 'spam'
...     while True:
...         yield 'ham'
...
Hey, generators are cool, right?
7. >>> def generate_stuff():
...     for i in range(3):
...         yield 'spam'
...     while True:
...         yield 'ham'
...
>>> generate_stuff = generate_stuff().__next__
When you pluck out their __next__ method you can just keep calling them and they generate
stuff!
8. >>> def generate_stuff():
...     for i in range(3):
...         yield 'spam'
...     while True:
...         yield 'ham'
...
>>> generate_stuff = generate_stuff().__next__
>>> generate_stuff()
'spam'
>>> generate_stuff()
'spam'
>>> generate_stuff()
'spam'
>>> generate_stuff()
'ham'
>>> generate_stuff()
'ham'
>>> generate_stuff()
'ham'
>>> generate_stuff()
'ham'
... and so on
They're like happy little spewing machines that can make your program become awesomer!
9. def generate_5_assertions():
    for i in range(5):
        yield AssertionError
    while True:
        yield RuntimeError

generate_5_assertions = generate_5_assertions().__next__
Let's modify our generator to generate exception classes instead of strings.
10. def generate_5_assertions():
    for i in range(5):
        yield AssertionError
    while True:
        yield RuntimeError

generate_5_assertions = generate_5_assertions().__next__

import random

while True:
    try:
        assert random.randint(0, 1)
        print('Phew!')
    except generate_5_assertions():
        print('Assertion Squashed!')
And now, in some stupid code that generates stupid assertion errors about half the time, we
can restrict our program so that it's only so tolerant of those errors.
11. don't do this richard$ python3 except_clause.py
Phew!
Assertion Squashed!
Assertion Squashed!
Phew!
Phew!
Assertion Squashed!
Phew!
Phew!
Assertion Squashed!
Phew!
Assertion Squashed!
Phew!
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
AssertionError
5 errors and we stop squashing them. Er, don't do this?
12.
And just in case you thought I was kidding, this is an actual line of code from MongoDB. OK,
this isn't exactly the same, it's avoiding logging 90% of a particular kind of error.
13.
Now let's look at some ways that Python's runtime is perhaps a little more mutable than you
previously thought.
14. >>> def f():
...     print('ohai there!')
...
>>> f()
ohai there!
OK, now let's do something a little more odd. Let's define a function, say f(). It does a thing.
15. >>> def f():
...     print('ohai there!')
...
>>> f()
ohai there!
>>> f.__code__
<code object f at 0x10b25e930, file "<stdin>", line 1>
The thing it does is in its code, and that code object is attached to the function object as the
__code__ attribute.
16. >>> def f():
...     print('ohai there!')
...
>>> f()
ohai there!
>>> f.__code__
<code object f at 0x10b25e930, file "<stdin>", line 1>
>>> exec(f.__code__)
ohai there!
You can exec code objects. That's fun.
17. >>> def g():
...     print('hello, world!')
...
>>> g()
hello, world!
Let's make another function.
18. >>> def g():
...     print('hello, world!')
...
>>> g()
hello, world!
>>> g.__code__ = f.__code__
>>> g()
ohai there!
How many of you knew the __code__ attribute was mutable?
19. >>> with open('some_code.py', 'w') as f:
...     f.write('print("Hello, world!")')
...
22
>>> import some_code
Hello, world!
The code object is not unique to functions. The code in modules is also encapsulated in a
code object.
20. >>> with open('some_code.py', 'w') as f:
...     f.write('print("Hello, world!")')
...
22
>>> import some_code
Hello, world!
>>> print(some_code.__cached__)
__pycache__/some_code.cpython-33.pyc
In fact, the "pyc" file that Python writes to cache a module's code is just that code object,
marshalled (behind a small header).
21. >>> with open('some_code.py', 'w') as f:
...     f.write('print("Hello, world!")')
...
22
>>> import some_code
Hello, world!
>>> print(some_code.__cached__)
__pycache__/some_code.cpython-33.pyc
>>> import marshal
>>> with open(some_code.__cached__, 'rb') as f:
...     code = marshal.loads(f.read()[12:])
...
We can unmarshal that object. And I think you know where I'm heading with this.
22. >>> with open('some_code.py', 'w') as f:
...     f.write('print("Hello, world!")')
...
22
>>> import some_code
Hello, world!
>>> print(some_code.__cached__)
__pycache__/some_code.cpython-33.pyc
>>> import marshal
>>> with open(some_code.__cached__, 'rb') as f:
...     code = marshal.loads(f.read()[12:])
...
>>> f.__code__ = code
>>> f()
Hello, world!
I can't think of a single reason why you'd ever want to do this, so I'm not even going to
bother to tell you not to. I have a feeling you'd be able to justify it regardless.
23.
You can also create code objects by hand.
24.
You don't even have to start with Python source code. Which, let's face it, would be the most
obvious way of constructing code objects by hand. But we're not here for the obvious way to
do things, are we?
25. <python>
  <Module>
    <FunctionDef name="adder">
      <arguments><arg arg="a" /><arg arg="b" /></arguments>
      <body>
        <Return>
          <Add>
            <left><Load id="a" /></left>
            <right><Load id="b" /></right>
          </Add>
        </Return>
      </body>
    </FunctionDef>
    <Expr>
      <Call>
        <func><Load id="print" /></func>
        <args>
          <Str value="1 + 2 =" />
          <Call>
            <func><Load id="adder" /></func>
            <args><Num value="1" /><Num value="2" /></args>
          </Call>
        </args>
      </Call>
    </Expr>
    <Expr>
      <Call>
        <func><Load id="print" /></func>
        <args>
          <Str value="one + two =" />
          <Call>
            <func><Load id="adder" /></func>
            <args><Str value="one" /><Str value="two" /></args>
          </Call>
        </args>
      </Call>
    </Expr>
  </Module>
</python>
Witness, for example, the beautiful elegance of this XML Python source. We shall call this
"adder.pyxml". Isn't it beautiful? And elegant? In this modern age of Service Oriented
Architecture DOMs over well-formed WSDL carriers with ubiquitous SGML DTDs incorporating
the full implementation of OMA DRM, who wouldn't want to code in XML directly? The way we
do this is we parse the XML and construct what's known as an Abstract Syntax Tree, which we
can then compile into a code object.
26. ...
Which basically consists of a bunch of this. I have it on good authority from someone close to
the ast code that this really isn't done very often. I even managed to provoke a segfault from
a deep corner of the ast code, so that was fun.
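For the curious, the core of the trick, minus the XML parsing, can be sketched with the ast module directly. Here I hand-build just the `a + b` expression rather than a whole function definition; this is a simplification of mine, not the talk's actual loader:

```python
import ast

# Hand-assemble the AST for the expression: a + b
node = ast.Expression(
    ast.BinOp(ast.Name('a', ast.Load()), ast.Add(), ast.Name('b', ast.Load())))
ast.fix_missing_locations(node)  # fill in the line/column info compile() insists on

# compile() accepts an AST wherever it accepts source text.
code = compile(node, '<pyxml>', 'eval')
print(eval(code, {'a': 1, 'b': 2}))  # 3
```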
27. >>> import pyxml_loader
>>> pyxml_loader.install()
>>>
>>> import adder
>>>
>>> print(adder.add(3, 4))
7
So, to make this glorious new possibility a reality, we install our pyxml loader and now we
can import adder.pyxml! Huzzah!
28.
To do this, we create a custom file loader for the import machinery. The import machinery needs
me to register a finder, which will locate pyxml files matching the module name and return a
loader which will actually load the code for the module it found. You can also abuse this to
import SQL files that you can execute. Or write funny little DSLs that meld Python and LISP. Or
write something to implement macros for Python.
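A minimal sketch of that finder/loader machinery on Python 3's importlib (the names are mine; a real pyxml finder would search sys.path for .pyxml files and parse them, rather than take a dict of source strings):

```python
import importlib.abc
import importlib.util
import sys

class StringFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder/loader that serves modules from an in-memory dict of source."""

    def __init__(self, sources):
        self.sources = sources  # module name -> Python source text

    def find_spec(self, name, path, target=None):
        if name in self.sources:
            return importlib.util.spec_from_loader(name, self)
        return None  # not one of ours; let the next finder have a go

    def exec_module(self, module):
        # Compile our source and execute it in the new module's namespace.
        code = compile(self.sources[module.__name__], module.__name__, 'exec')
        exec(code, module.__dict__)

sys.meta_path.insert(0, StringFinder({'greeting': "hello = 'Hello, world!'"}))
import greeting
print(greeting.hello)  # Hello, world!
```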
29.
Let's throw in some of Python's slightly more powerful introspection capabilities and see what
damage we can do...
30. >>> locals()
{'__loader__': <class '_frozen_importlib.BuiltinImporter'>,
'__doc__': None, '__package__': None, '__builtins__': <module
'builtins' (built-in)>, '__name__': '__main__'}
So, you all know about locals() and globals(), right? They give you a handle on the dictionary
that is the local or global namespace you're in.
31. >>> import inspect
>>> inspect.currentframe().f_locals
{'__loader__': <class '_frozen_importlib.BuiltinImporter'>,
'__doc__': None, '__package__': None, '__builtins__': <module
'builtins' (built-in)>, '__name__': '__main__'}
or, as I like to call them by their full name, inspect.currentframe().f_locals...
32. >>> locals()
{'__loader__': <class '_frozen_importlib.BuiltinImporter'>,
'__doc__': None, '__package__': None, '__builtins__': <module
'builtins' (built-in)>, '__name__': '__main__'}
>>> spam = 1
>>> locals()['spam']
1
Anyway, you can poke at that dict just like it was a dict (hint: it *is* a dict)
33. >>> locals()
{'__loader__': <class '_frozen_importlib.BuiltinImporter'>,
'__doc__': None, '__package__': None, '__builtins__': <module
'builtins' (built-in)>, '__name__': '__main__'}
>>> spam = 1
>>> locals()['spam']
1
>>> locals()['ham'] = 2
>>> ham
2
So of course you can modify that dict to create new local or global variables.
34. Given some JSON in a file config.json:
{
    "message": "Hello, world!",
    "badness_value": 1000
}
So, given some JSON... and I think you may know where I'm going with this.
35. Given some JSON in a file config.json:
{
    "message": "Hello, world!",
    "badness_value": 1000
}
>>> from json_loader import import_json
>>> import_json('config.json')
>>> message
'Hello, world!'
>>> badness_value
1000
Yes, loading variables directly from JSON files.
36. import json
import inspect

def import_json(filename):
    caller = inspect.currentframe().f_back
    caller.f_locals.update(json.load(open(filename)))
Why should import * hog all the namespace pollution fun? We look up the call stack to find
the local namespace of interest - the inspect module provides some handy features for this.
First we get the frame - think of it as the state of the function. Each frame has a reference to
its calling frame in f_back, and its local variables in f_locals. This is far from the worst thing
I'll show you today, but it's still worth saying: you probably shouldn't do this.
37. >>> import marshal
>>> class SerialiseMarshal(object):
...     @staticmethod
...     def serialise(data):
...         return marshal.dumps(data)
...
>>> class DoStuff(SerialiseMarshal):
...     def do_stuff(self):
...         return self.serialise(dict(message="Hello, world!"))
...
>>> DoStuff().do_stuff()
b'{u\x07\x00\x00\x00messageu\r\x00\x00\x00Hello, world!0'
OK, on to another kind of unexpected mutability. Let's go back to our serialisation idea. Say
we have this kind of setup, where a do-stuff class inherits from a mixin class to serialise
some data using the marshal module.
38. >>> import json
>>> class SerialiseJSON(object):
...     @staticmethod
...     def serialise(data):
...         return json.dumps(data)
...
Let's say we decide a little while later in the code that we want to stop serialising with
marshal and use JSON instead.
39. >>> import json
>>> class SerialiseJSON(object):
...     @staticmethod
...     def serialise(data):
...         return json.dumps(data)
...
>>> DoStuff.__bases__ = (SerialiseJSON,)
>>> DoStuff().do_stuff()
'{"message": "Hello, world!"}'
We can just swap out the old mixin class and replace it with the new one and hey presto
DON'T DO THIS.
40. class MyContextManager(object):
    def __enter__(self):
        # do stuff at start

    def __exit__(self, exc_type, exc_val, exc_tb):
        # do stuff at exit

with MyContextManager():
    # do stuff!
So context managers are pretty neat, right? So who else, when they saw them for the first
time, thought "hey, I reckon we could hack some namespaces right here..." No? Oh, well I did.
41. >>> from context_capture import capture_in
>>> d = {}
>>> with capture_in(d):
...     spam = 'ham'
...
>>> d
{'spam': 'ham'}
Here's a context manager that'll snarf all local variable assignments and copy them into a
dictionary called "d". Kind of like a little backup namespace. I am not going to justify this to
you!
42. >>> from context_capture import capture_on
>>> class T(object):
...     def __init__(self):
...         with capture_on(self):
...             spam = 'spam'
...             ham = 'ham'
...
>>> t = T()
>>> t.spam
'spam'
It's pretty easy to modify the code to capture the locals onto another object. No more typing
"self" all the time!
43. >>> from context_capture import capture_globals
>>> def foo():
...     with capture_globals():
...         spam = 'ham'
...     print(spam)
...
>>> foo()
ham
>>> spam
'ham'
Who else is sick and tired of typing global all the time? Well we can do away with all those
pesky "global" variable declarations by promoting all local assignments into the global
namespace.
44. class LocalsCapture(object):
    def __enter__(self):
        caller_frame = inspect.currentframe().f_back
        self.local_names = set(caller_frame.f_locals)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        caller_frame = inspect.currentframe().f_back
        for name in caller_frame.f_locals:
            if name not in self.local_names:
                self.capture(name, caller_frame.f_locals[name])

class capture_in(LocalsCapture):
    def __init__(self, namespace):
        self.namespace = namespace

    def capture(self, name, value):
        self.namespace[name] = value

class capture_globals(capture_in):
    def __init__(self):
        caller_frame = inspect.currentframe().f_back
        super(capture_globals, self).__init__(caller_frame.f_globals)
How does it work? Well, recall that our context manager is invoked twice. __enter__ is invoked
at the start of the with block, so when that happens we snapshot the local variable names
belonging to the caller's frame using our old friends f_back and f_locals. When the with block
exits we are invoked again, so in __exit__ we see what local names now exist and capture the
new ones.
45.
A common problem we software developers face is that we're often asked to find out why
some live code has gone awry. Sometimes we're not the author of the code, and sometimes
we're not even familiar with the deployment scenario. And sometimes it's late at night and
you're just not at all happy about having been called up to fix someone else's mess.
46. print 'query =', query
So we try to print out some values like the web query but we have no idea where the output
goes.
47. import sys
print >>sys.stderr, 'query =', query
So we try maybe standard error, maybe that'll make it to the server logs?
48. import logging
logging.debug('query = %r', query)
Nope, ok, maybe logging? But seriously, there's so many ways this can fail - not knowing
where the log file is or what the logging level is set to. Ugh.
49. pip install q
To print the value of foo, put this in your program:
import q; q(foo)
Output will go to /tmp/q (or $TMPDIR/q), so:
tail -f /tmp/q
So the q module was born. Quick and dirty debugging output for tired programmers.
50. results in this in the "q" file:
The "q" module not only dumps the value but also includes the context the value was seen in
including the expression that created the value and the function the q invocation was made
in.
52. results in this in the "q" file:
The decorator tracing gives you information about what arguments the function was called
with and the return value from the function. It's clever if the return value is huge - that gets
stored off in a separate file referenced from the q log. But there's a lot of funky stuff going on
in q.
53. info = self.inspect.getframeinfo(self.sys._getframe(1), context=9)
# info.index is the index of the line containing the end of the call
# expression, so this gets a few lines up to the end of the expression.
lines = ['']
if info.code_context:
    lines = info.code_context[:info.index + 1]
# If we see "@q" on a single line, behave like a trace decorator.
if lines[-1].strip().startswith('@') and args:
    return self.trace(args[0])
Just to give you some idea of how one bit works, this is how we determine whether q has
been invoked as a decorator or just as a value-dumping function with a callable argument.
The decorator usage is detected using this code. It walks the call stack to see whether we're
invoked as a function or decorator by looking at the actual source code of the call site - if it
looks like a decorator we declare it a decorator!
class OverloadDemo {
    void test() {
        System.out.println("No parameters");
    }

    // Overload test for one integer parameter.
    void test(int a) {
        System.out.println("a: " + a);
    }

    // Overload test for two integer parameters.
    void test(int a, int b) {
        System.out.println("a and b: " + a + " " + b);
    }

    // overload test for a double parameter
    void test(double a) {
        System.out.println("double a: " + a);
    }
}
This is Java. Don't do this.
So back when I was teaching Python at university I had a student ask me how to do
overloading of methods like Java does. I said that "Python doesn't work that way". Then I
thought for a moment, and said "ask me again next week".
56. >>> class A(object):
...     @overload
...     def method(self, a):
...         return 'a'
...     @method.add
...     def method(self, a, b):
...         return 'a, b'
...
>>> a = A()
>>> a.method(1)
'a'
>>> a.method(1, 2)
'a, b'
And here you go, method overloading.
57. >>> @overload
... def func(a: int):
...     return 'int'
...
>>> @func.add
... def func(a: str):
...     return 'str'
...
>>> func(1)
'int'
>>> func('s')
'str'
>>> func(1.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "overload.py", line 94, in f
    raise TypeError('invalid call argument(s)')
TypeError: invalid call argument(s)
Overloading functions works too. As do function argument annotations. You can even
overload classmethods, staticmethods and, if you really need to, classes themselves.
58. func.__defaults__
The implementation uses a bunch of introspection into functions for things like the default
argument values supplied at function creation time.
59. func.__code__.co_argcount
func.__code__.co_varnames
The code object knows its required argument count and the names of those arguments. We
match the arguments passed into a call against each overloaded signature using basically the
same rules as a regular function call, looking for the first signature that accepts the passed
arguments. For each of the co_argcount parameters we pop a value off the positional
arguments passed in, or, if there are none left, we grab the value from the keyword arguments
by name.
60. func.__annotations__.get(arg)
We can also look into the annotations dictionary, which is keyed by the argument names in the
function definition. We only care if the annotation is a type object and the value is an instance
of that type. If it's not a match we discard this overload option and move to the next (if any).
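Putting those pieces together, a stripped-down version of the matching step might look like this (a sketch of mine, not the talk's overload.py: it handles only positional arguments and ignores defaults, keyword arguments and *args):

```python
def accepts(func, args):
    """Does func's signature accept these positional arguments,
    honouring any type annotations on the parameters?"""
    code = func.__code__
    if len(args) != code.co_argcount:
        return False
    for name, value in zip(code.co_varnames, args):
        ann = func.__annotations__.get(name)
        # Only type-object annotations participate in matching.
        if isinstance(ann, type) and not isinstance(value, ann):
            return False
    return True

def dispatch(candidates, *args):
    # Try each registered signature in order; first match wins.
    for func in candidates:
        if accepts(func, args):
            return func(*args)
    raise TypeError('invalid call argument(s)')

def func_int(a: int):
    return 'int'

def func_str(a: str):
    return 'str'

print(dispatch([func_int, func_str], 's'))  # str
print(dispatch([func_int, func_str], 42))   # int
```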
61. func.__code__.co_flags & 0x04
And then if the *args flag is set in the code object, we can pass any remaining arguments from
the invocation along as *args values; otherwise leftover arguments mean this signature doesn't
match and we move on.
63. if isinstance(callable, (classmethod, staticmethod)):
....
So now, given we have matched the supplied values to a function signature, we invoke the
function. There are some other hacks in there, like detecting staticmethod and classmethod,
because they have a funky proxyish kind of object which handles the class argument.
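The shape of that proxy business, roughly (my illustration, not the talk's code): in the class dict a staticmethod is still the raw proxy object, not a plain function, and the function underneath hangs off its __func__ attribute.

```python
class C(object):
    @staticmethod
    def s():
        return 'static'

raw = C.__dict__['s']  # the staticmethod proxy, not a plain function
print(isinstance(raw, staticmethod))  # True
print(raw.__func__())  # call the real function underneath
```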
64.
So the q module is nice, but there's something about it that's just a little fishy.
65. import q; q(foo)
So, who can see something odd about this? That's right, modules aren't callable. To make this
work, the q module resorts to a bit of a hack.
66. # Install the Q() object in sys.modules so that "import q" gives a callable q.
import sys
sys.modules['q'] = Q()
q currently does this, which has side effects. Most notably, as soon as you replace the
module in sys.modules, the original module is garbage-collected, since there are no references
to it (the Q class does not retain a reference to its module). Thus the Q class needs additional
yucky hacks around imports and other things so it can handle that and additionally pretend to
be a module. There's an alternative: we can make modules callable.
68. Given hello_world.py:
Python 2.7.1 (r271:86832, Aug 5 2011, 03:30:24)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import hello_world
>>> hello_world()
hello, world!
Hey, presto, callable modules! But how does callable_modules.enable work?
69. ...
The handling of callability is done at the type level (as in C types). Every builtin type, like ints
and modules, has a PyTypeObject structure. Callability is implemented through a slot called
tp_call. It's a pointer to a C function that is invoked when objects of that type are called. If it's
not set (ie. NULL) then objects of that type aren't callable. So we need to provide a
callable for the tp_call slot and a way to assign it to the slot using ctypes. Oh yes, ctypes.
First up, here's our callback for the tp_call slot. We define the C level API for the ternaryfunc
callback and a simple Python function that implements the calling of the __call__ method on
the object (module). The ctypes layer does some interesting things with Python callbacks for
C functions that I won't go into now; suffice to say it took me a while to figure out the last
argtype for the function declaration needed to be c_void_p...
Next we define the PyTypeObject structure - or enough of it at least - so we can assign to the
tp_call slot. We also need PyObject defined since we need to access the type object through
the C object itself.
So once we have all those parts in place, we can modify the module type to make its
instances callable!
TODO segfault on missing __call__
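Just the callback half of that recipe, sketched with ctypes: defining the ternaryfunc type and a Python implementation for it. Actually writing it into a type object's tp_call slot depends on the exact PyTypeObject layout of your build, so this sketch stops short of the surgery and simply calls the C function pointer directly to show it works:

```python
import ctypes

# C-level signature of a tp_call slot:
#   PyObject *tp_call(PyObject *self, PyObject *args, PyObject *kwargs)
# kwargs may be NULL, hence the final c_void_p rather than py_object.
ternaryfunc = ctypes.CFUNCTYPE(ctypes.py_object, ctypes.py_object,
                               ctypes.py_object, ctypes.c_void_p)

def slot_call(obj, args, kwargs):
    # Delegate to the object's own __call__, as the module hack does.
    return obj.__call__(*args)

c_callback = ternaryfunc(slot_call)

class Shouter(object):
    def __call__(self, word):
        return word.upper() + '!'

# The C function pointer is itself callable from Python, so we can exercise
# it without patching any type objects:
print(c_callback(Shouter(), ('hello',), None))  # HELLO!
```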
73. >>> import callable_modules
>>> callable_modules.enable()
>>>
>>> import string
>>> def called(*args, **kw):
...     print 'called with', args, kw
...
>>> string.__call__ = called
>>> string()
called with () {}
>>> string(1, 2, three='four')
called with (1, 2) {'three': 'four'}
74.
Using similar ctypes hackery we can modify builtin types to add new attributes. Yes, this has
been done, see http://clarete.github.com/forbiddenfruit/.
75. >>> from forbiddenfruit import curse
>>> from datetime import timedelta, datetime
>>> curse(int, 'days', property(lambda s: timedelta(s)))
>>> (12).days
datetime.timedelta(12)
>>> curse(timedelta, 'ago', property(lambda s: datetime.now() - s))
>>> print (12).days.ago
2013-05-31 18:56:49.745315
The above is inspired by http://shouldly.github.com/ from Ruby land.
79. Controlling Minecraft from Python
- demo game of life or something
- maybe "import this" in Minecraft?
80. def reraise_as(new_type):
    e_type, e_value, e_traceback = sys.exc_info()
    new_exception = new_type()
    new_exception.__cause__ = e_value
    try:
        raise new_type, new_exception, e_traceback
    finally:
        del e_traceback

try:
    do_something_crazy()
except Exception:
    reraise_as(UnhandledException)
This is a neat idea by David Cramer (dcramer) which uses the new __cause__ attribute of
exceptions to allow you to re-raise an exception under a different type while not losing any
information. https://github.com/dcramer/reraise
81. TODO
bytecodehacks
"optimisations"
automatic "self"
what's the worst thing we could do with bytecode?