By Katerina Barone-Adesi.
Discover property-based testing, and see how it works on a real project, the pflua compiler.
How do you find a lot of non-obvious bugs in an afternoon? Write a property that should always be true (like "this code should have the same result before and after it's optimized"), generate random valid expressions, and study the counter-examples!
Property-based testing is a powerful technique for finding bugs quickly. It can partly replace unit tests, leading to a more flexible test suite that generates more cases and finds more bugs in less time.
It's really quick and easy to get started with property-based testing. You can use existing tools like QuickCheck, or write your own: Andy Wingo and I wrote pflua-quickcheck and found a half-dozen bugs with it in one afternoon, using pure Lua and no external libraries.
In this talk, I will introduce property-based testing, demonstrate a tool for using it in Lua, show how to write your own property-based testing tool from scratch, and explain how simple properties found bugs in pflua.
(c) 2015 FOSDEM VZW
CC BY 2.0 BE
https://archive.fosdem.org/2015/
Property-based testing an open-source compiler, pflua (FOSDEM 2015)
1. Property-based testing an
open source compiler, pflua
A fast and easy way to find bugs
kbarone@igalia.com ( luatime.org )
www.igalia.com
Katerina Barone-Adesi
2. Summary
● What is property-based testing?
● Why is it worth using?
● Property-based testing case study with pflua,
an open source compiler
● How do you implement it in an afternoon?
● What tools already exist?
3. Why test?
● Reliability
● Interoperability
● Avoiding regressions
● … but this is the test room, so
hopefully people already think testing
is useful and necessary
4. Why property-based testing?
● Writing tests by hand is slow, boring,
expensive, and usually doesn't lead to
many tests being written
● Generating tests is cheaper, faster,
more flexible, and more fun
● Covers cases humans might not
5. Why is it more flexible?
● Have you ever written a set of unit
tests, then had to change them all by
hand as the code changes?
● It's a lot easier and faster to change
one part of test generation instead!
6. What is property-based testing?
● Choose a property (a statement that
should always be true), such as:
● somefunc(x, y) < 100
● sort(sort(x)) == sort(x) (for stable sorts)
● run(expr) == run(optimize(expr))
● our_app(input) == other_app(input)
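A property can be written as an ordinary predicate and run against generated input. The following is a minimal sketch of the sort example in plain Lua; the helper names (`sorted`, `same_list`, `prop_sort_idempotent`, `random_list`) are made up for illustration and do not come from pflua.

```lua
-- Return a sorted copy of a list, leaving the argument untouched.
local function sorted(list)
  local copy = {}
  for i, v in ipairs(list) do copy[i] = v end
  table.sort(copy)
  return copy
end

-- Element-wise equality of two lists.
local function same_list(a, b)
  if #a ~= #b then return false end
  for i = 1, #a do
    if a[i] ~= b[i] then return false end
  end
  return true
end

-- Property: sorting is idempotent, sort(sort(x)) == sort(x).
local function prop_sort_idempotent(list)
  return same_list(sorted(sorted(list)), sorted(list))
end

-- Generate a random list of up to 20 small integers.
local function random_list()
  local list = {}
  for i = 1, math.random(0, 20) do list[i] = math.random(-100, 100) end
  return list
end

-- Check the property on 100 random lists.
for _ = 1, 100 do
  assert(prop_sort_idempotent(random_list()), "sort is not idempotent")
end
```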
7. What is property-based testing not?
● A formal proof
● Exhaustive (except for very small types)
● What that means: property-based testing tries
to find counter-examples. If you find a counter-
example, something is wrong and must be
changed. If you don't, it's evidence (NOT proof)
towards that part of your program being correct.
8. Why not exhaustively test?
● Too difficult
● Too expensive
● Too resource-consuming (human and computer
time)
● Formal methods and state space reduction
have limitations
9. What is pflua?
● Pflua is a source-to-source compiler
● It takes libpcap's filter language (which we call
pflang), and emits Lua code
● Why? This lets us run the Lua code with LuaJIT
● Performance: better than libpcap, often by a
factor of two or more
● https://github.com/Igalia/pflua/
● Apache License, Version 2.0
11. What is pflang?
● The input for pflua, libpcap, and other tools
● Igalia's name for it, not an official name
● A language for defining packet filters
● Examples: “ip”, “tcp”, “tcp port 80”, …
● tcp port 80 and not host 192.168.0.1
● If you've used wireshark or tcpdump,
you've used pflang
12. Case study: testing pflua
● Pflua already had two forms of testing,
and works in practice
● Andy Wingo and I implemented a
property-based checker in an
afternoon, with one property...
13. What was the test property?
● lua code generated from optimized and
unoptimized IR has the same result on the
same random packet
● It compared two paths:
● Input → IR → optimize(IR) →
compile → run()
● Input → IR → (no change) →
compile → run()
14. What happened?
● We found 6 bugs in pflua (7 counting a LuaJIT
bug found later by the same test)
● Some are ones we were unlikely to
find with testing by hand
● Remember: pflua is an already-tested,
working project
15. What were the bugs?
● Accidental comments: 8--2 is 8, not 10! (Lua)
● Invalid optimization: ntohs/ntohl
● Generating invalid Lua (return must end block)
● Range analysis: range folding bug (→ inf)
● Range analysis: not setting range of len
● Range analysis: NaN (inf - inf is not your friend)
● + a LuaJIT bug, found later by the same test
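Two of these bugs rest on Lua behaviour that is easy to reproduce in isolation. The sketch below is not pflua's code generator; it only compiles two hand-written strings to show the accidental comment, then shows the NaN that results from subtracting infinities.

```lua
-- loadstring exists in Lua 5.1 and LuaJIT; newer Lua versions use load.
local compile = loadstring or load

-- Emitting "-" directly before a negative literal yields "--", which starts
-- a Lua comment, so the rest of the line is ignored.
local accidental_comment = compile("return 8--2")()   -- evaluates to 8
local with_spaces = compile("return 8 - -2")()        -- evaluates to 10
print(accidental_comment, with_spaces)

-- inf - inf is NaN, and NaN compares unequal to everything, itself included.
local inf = math.huge
local nan = inf - inf
print(nan == nan)   -- false
```

Any range analysis that stores infinite bounds and subtracts them needs an explicit guard for this case.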
16. Case study recap
● Property-based testing is useful even for
seemingly-working, seemingly-mature code
● We found 3 bugs in range analysis
● We were unlikely to have found all 3 bugs with
unit testing by hand
● This was code that appeared to work
● Typical use didn't cause any visible problem
● 4 of the 6 bugs were fixed that afternoon
17. Property-based testing: how?
● for i = 1,100 do
local g = generate_test_case()
run_test_case(property, g)
end
● Conceptually, it's that simple:
Generate and run tests (handling exceptions)
● With premade tools, you need a property,
and (sometimes) a random test generator
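The loop above, including the exception handling, might look like this in plain Lua. This is a sketch: `run_property`, `generate` and `property` are hypothetical names, and `pcall` is used so that a runtime error inside the property counts as a failed test rather than aborting the run.

```lua
-- Run `property` against `iterations` generated cases. Returns true when
-- every case passes; otherwise returns false, the failing case, and the
-- error message if the property raised an error.
local function run_property(property, generate, iterations)
  for _ = 1, iterations do
    local case = generate()
    local ok, result = pcall(property, case)
    if not ok then return false, case, result end   -- property raised an error
    if not result then return false, case end        -- property returned false
  end
  return true
end

-- Usage: the absolute value of a small integer is never negative.
local passed = run_property(
  function(x) return math.abs(x) >= 0 end,
  function() return math.random(-1000, 1000) end,
  100)
print(passed)
```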
18. How to generate test cases
● The simplest version is unweighted choices:
function True() return { 'true' } end
function Comparison()
return { ComparisonOp(), Arithmetic(),
Arithmetic() } end
…
function Logical()
return choose({ Conditional, Comparison,
True, False, Fail })() end
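For a version that runs on its own, here is a sketch of the same idea over a toy grammar. The grammar, the depth limit and the `show` helper are assumptions made for illustration; pflua's real IR generators are larger and are not reproduced here.

```lua
-- Pick one element of a list uniformly at random.
local function choose(choices)
  return choices[math.random(#choices)]
end

local function Number() return math.random(0, 255) end
local function ComparisonOp()
  return choose({ '<', '<=', '=', '!=', '>=', '>' })
end

-- Arithmetic is recursive, so the depth is bounded to guarantee termination.
local function Arithmetic(depth)
  depth = depth or 0
  if depth >= 3 or math.random() < 0.5 then return Number() end
  return { choose({ '+', '-', '*' }),
           Arithmetic(depth + 1), Arithmetic(depth + 1) }
end

local function Comparison()
  return { ComparisonOp(), Arithmetic(), Arithmetic() }
end

-- Render a generated tree as an S-expression for inspection.
local function show(expr)
  if type(expr) ~= 'table' then return tostring(expr) end
  local parts = {}
  for i, v in ipairs(expr) do parts[i] = show(v) end
  return '(' .. table.concat(parts, ' ') .. ')'
end

print(show(Comparison()))
```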
19. Are unweighted choices enough?
● math.random(0, 2^32-1)
● Property: 1/y <= y
● False iff y = 0
● 4 billion test cases don't guarantee this will
be found...
● What are other common edge case numbers?
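One common answer is to mix a short list of boundary values into the random stream. The sketch below assumes a 10% bias and an illustrative list of edge cases; neither is taken from pflua.

```lua
-- Boundary values that uniform sampling over a 32-bit range almost never hits.
local edge_cases = { 0, 1, 2, 255, 256, 65535, 65536,
                     2^31 - 1, 2^31, 2^32 - 1 }

local function random_uint32()
  -- One time in ten, return an edge case instead of a uniform value.
  if math.random() < 0.1 then
    return edge_cases[math.random(#edge_cases)]
  end
  -- Build the value from two 16-bit halves; this avoids passing a range
  -- wider than a C int to math.random on older Lua versions.
  return math.random(0, 65535) * 65536 + math.random(0, 65535)
end

-- Property from the slide: 1/y <= y, which is false only when y = 0.
local function property(y) return 1 / y <= y end

-- Search for a counter-example; with the bias, 0 turns up quickly.
local counter_example
for _ = 1, 10000 do
  local y = random_uint32()
  if not property(y) then counter_example = y; break end
end
print(counter_example)
```

With the bias, each iteration has about a 1% chance of trying 0, so the counter-example is found within a few hundred cases rather than billions.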
21. Write your own checker!
for i = 1,iterations do
local packet, packet_idx = choose(packets)
local P, len = packet.packet, packet.len
local random_ir = Logical()
local unopt_lua = codegen.compile(random_ir)
local optimized = optimize.optimize(random_ir)
local opt_lua = codegen.compile(optimized)
if unopt_lua(P, len) ~= opt_lua(P, len)
then print_details_and_exit() end
end
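The checker above depends on pflua's own modules. To try the same structure without them, here is a self-contained sketch over a toy arithmetic IR with a deliberately buggy optimizer. Everything in it (`evaluate`, `optimize`, `random_expr`, `check`) is hypothetical and stands in for pflua's codegen and optimizer.

```lua
-- Toy IR: a number, or { op, left, right } with op '+' or '*'.
local function evaluate(expr)
  if type(expr) == 'number' then return expr end
  local op, a, b = expr[1], evaluate(expr[2]), evaluate(expr[3])
  if op == '+' then return a + b else return a * b end
end

-- A deliberately buggy optimizer: it rewrites x * 0 to x instead of 0.
local function optimize(expr)
  if type(expr) == 'number' then return expr end
  local op, a, b = expr[1], optimize(expr[2]), optimize(expr[3])
  if op == '*' and b == 0 then return a end   -- BUG: should return 0
  return { op, a, b }
end

-- Random expression tree with bounded depth and small leaves.
local function random_expr(depth)
  if depth >= 3 or math.random() < 0.3 then return math.random(0, 3) end
  local op = (math.random() < 0.5) and '+' or '*'
  return { op, random_expr(depth + 1), random_expr(depth + 1) }
end

local function show(expr)
  if type(expr) ~= 'table' then return tostring(expr) end
  return '(' .. expr[1] .. ' ' .. show(expr[2]) .. ' ' .. show(expr[3]) .. ')'
end

-- Property: optimization must not change the result.
-- Returns the first counter-example found, or nil.
local function check(iterations)
  for _ = 1, iterations do
    local expr = random_expr(0)
    if evaluate(expr) ~= evaluate(optimize(expr)) then return expr end
  end
  return nil
end

local failing = check(1000)
if failing then print('counter-example: ' .. show(failing)) end
```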
22. Test generation problems
● Large, hard-to-analyze test cases
● Defaults to randomly searching the
solution space; randomly testing that
plain 'false' is still 'false' after
optimization as 20% of your 1000
tests is a bit daft
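One way to spend fewer tests on trivial cases is to weight the choices. The sketch below uses made-up weights and picks names rather than generator functions to keep it short; `weighted_choose` is a hypothetical helper.

```lua
-- Pick a value with probability proportional to its weight.
-- `entries` is a list of { weight, value } pairs with positive integer weights.
local function weighted_choose(entries)
  local total = 0
  for _, entry in ipairs(entries) do total = total + entry[1] end
  local roll = math.random(total)   -- integer in 1..total
  for _, entry in ipairs(entries) do
    roll = roll - entry[1]
    if roll <= 0 then return entry[2] end
  end
end

-- Illustrative weights: trivial leaves become rare, structured nodes common.
local logical_choices = {
  { 10, 'Conditional' }, { 10, 'Comparison' },
  { 1, 'True' }, { 1, 'False' }, { 1, 'Fail' },
}

-- Tally how often each choice is picked.
local counts = {}
for _ = 1, 10000 do
  local name = weighted_choose(logical_choices)
  counts[name] = (counts[name] or 0) + 1
end
print(counts.Comparison, counts.False)
```

With these weights, plain 'false' is generated in roughly 1 case out of 23 instead of 1 out of 5.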
23. What level to test?
● For a compiler: the front-end language? Various
levels of IR? Other?
● In general: input? Internal objects?
● Tradeoffs: white-box testing of internals can
be useful, but it can feed the system internal
states that the system itself could never create,
and break it in ways real input never would.
● Testing multiple levels is possible
● Tends to test edge cases of lower levels
24. Interaction with interface stability
● At any level, more flexible than hand unit
testing
● Interfaces change. Inputs hopefully change
rarely; internals may change often
● Property-based testing makes refactoring
cheaper and easier: less code to change when
internals change, more test coverage
25. It's still worth unit testing
● Use property-based testing to find bugs (and
classes of bugs)
● Use unit tests for avoiding regressions;
continue to routinely test code that has already
caused problems, to reduce the chances that
known bugs will be re-introduced
● Use unit testing if test generation is infeasible,
or for extremely rare paths
26. Reproducible tests
● There are some pitfalls to outputting a
random seed to re-run tests
● The RNG may not produce consistent
results across platforms or be stable
across upgrades
● (Rare) Bugs in your compiler / interpreter
/ libraries can hinder reproducibility
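One way around an unstable platform RNG is to ship a small generator with the test tool and log its seed. The sketch below uses the Park-Miller "minimal standard" generator; `make_rng` and its methods are hypothetical names, and this is not what pflua-quickcheck is known to do.

```lua
-- Park-Miller "minimal standard" generator. Every intermediate value stays
-- below 2^53, so the sequence is the same whether Lua numbers are doubles
-- or 64-bit integers.
local function make_rng(seed)
  local state = seed % 2147483647
  if state == 0 then state = 1 end   -- 0 would stay 0 forever
  local rng = {}
  -- Next raw value, in 1..2147483646.
  function rng.next()
    state = (state * 16807) % 2147483647
    return state
  end
  -- Integer in lo..hi (slightly biased by the modulo; fine for test data).
  function rng.range(lo, hi)
    return lo + rng.next() % (hi - lo + 1)
  end
  return rng
end

local seed = os.time()
print('seed: ' .. seed)   -- log the seed so a failing run can be replayed
local rng = make_rng(seed)
local first = rng.range(1, 6)

-- Replaying with the same seed gives the same first value.
assert(make_rng(seed).range(1, 6) == first)
```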
27. Existing tools: QuickCheck
● Originally in Haskell; has been widely ported to
other languages
● Better tools for test case generation
● Allows filtering test cases
● Starts with small test cases
● QuickCheck2: test case minimization
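Test case minimization can be approximated in a few lines. The following is a simplified greedy sketch, not QuickCheck's actual shrinking algorithm: it keeps dropping one element from a failing list for as long as the property still fails. `shrink_list` and `no_large_values` are hypothetical names.

```lua
-- Greedy shrinker: repeatedly drop one element while the property still
-- fails, and return the smaller failing list.
local function shrink_list(list, property)
  local current = list
  local progress = true
  while progress do
    progress = false
    for i = 1, #current do
      -- Candidate is `current` without its i-th element.
      local candidate = {}
      for j, v in ipairs(current) do
        if j ~= i then candidate[#candidate + 1] = v end
      end
      if not property(candidate) then
        current = candidate
        progress = true
        break
      end
    end
  end
  return current
end

-- Example property that fails: "no list contains a value above 100".
local function no_large_values(list)
  for _, v in ipairs(list) do
    if v > 100 then return false end
  end
  return true
end

local minimal = shrink_list({ 3, 250, 7, 42, 180 }, no_large_values)
print(#minimal, minimal[1])   -- 1   180
```

A one-element counter-example is much easier to debug than the original five-element list.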
28. The future of test generation
● Hypothesis, by David Ritchie MacIver (Python)
● https://github.com/DRMacIver/hypothesis
● Example database is better than saving seeds - it
propagates interesting examples between tests.
● Much smarter data generation
● Adapts to conditional tests better
● Blurs the lines between fuzz testing, conventional
unit testing and property based testing.
29. Forward-looking Hypothesis
● The following are planned, but not implemented
● Using coverage information to drive example
generation
● Adding "combining rules" which allow you to
also express things like "set | set -> set" and
then it can test properties on those too.
● Better workflows around integrating into CI
● End-of-February 1.0 release predicted
30. Other stable tools
● ScalaCheck
● Quviq's QuickCheck for Erlang
● Have/inspired some of the benefits of
Hypothesis, but are already mature and widely
used
31. Conclusions
● Property-based testing finds tricky bugs and
saves time
● You can start it in an afternoon, with no tools
● There are some pretty helpful existing tools
(QuickCheck, Hypothesis, ScalaCheck, etc)
● Start property-based testing today!
● Or Monday, at least.