14. I’m trying to avoid this… Image source: icanhascheezburger.com
15.
16.
17.
18.
19.
20.
21.
22. Taking a look at the demo Image produced by the author. Data based on real execution of sort functions.
23.
24.
25.
26.
27. The Centre for Australian Weather and Climate Research A partnership between CSIRO and the Bureau of Meteorology Tennessee Leeuwenburg Phone: 03 9669 4310 Work Email: t.leeuwenburg@bom.gov.au Email: tleeuwenburg@gmail.com Web: www.cawcr.gov.au Thank you www.cawcr.gov.au
Hinweis der Redaktion
Introduce self as a speaker… Name: Tennessee Leeuwenburg: Qualifications: Bachelor of Computer Science, Diploma of Philosophy, Masters of Business Administration Manage a Team of 2-3 developers
History of benchmarking – “Fun Facts!” Image source: The Internet
This graph shows a situation where quality can be simply measured. The message of the graph is one of improvement compared to competitors, eventually reaching a rough parity.
I am trying here to introduce a general topic by reference to a specific example. What’s being demonstrated here is the principle of visualisation and the principle of measurement.
The yellow line is the normalised CPython performance. It doesn’t mean CPython took a constant amount of time in all tests. The blue graph shows the proportional time that PyPy took relative to CPython. The significance of that is not immediately apparent. It shows, however, that PyPy is basically faster across the board. It’s an impressive graphic, and a convincing piece of communication. Image Source: speed.pypy.org
This is a pretty high level principle. It might not have that much apparent value to a software engineer who is more interested in how to dis-assemble a problem into its parts. This isn’t so much about problem solving as monitoring symptoms. And that’s what you actually need to communicate to people to convince them they do or don’t have a problem to fix. Measurement and comparison can be used for problem diagnosis (and communication of such), but benchmarking is about high-level monitoring.
Okay, now I admit it’s a stretch to say that this graph actually caused PyPy to get faster, especially since I’m not a contributor to that project. The point is that PyPy are using this graph to (1) communicate the idea that they are on a path of progress and (2) understand themselves their level of performance. It provides a layer of truth to perceptions about progress and speed.
There are often many hidden factors which can influence a result. Getting the benefits from benchmarking usually come with time and analysis, not for free from some graph. If you are lucky enough to work in an area with standard tests, you may be able to sidestep some of the confusion.
Now that we have taken a look at one example, it’s time to consider the issue more generally. Assuming still that we are measuring speed, what is the context of that measurement? Historical performance is one piece of context – it tells you about improvement, which is always important. However benchmarking by configuration is important too. Maybe your test system is nice and fast, but your production environment has double the data size, and you have a bad exponential algorithm which only really bites you when peak load hits. Maybe your