2. Testing is a good thing
But how do we know our tests are good?
3. Code coverage is a start
But it can give a “good” score with really dreadful tests
4. Really dreadful tests
class Adder
  def self.add(x, y)
    return x - y
  end
end
describe Adder do
  it "should add the two arguments" do
    Adder.add(1, 1)
  end
end
Coverage: 100%
Usefulness: 0
8. If you can change the code and no test fails, either the code is never run or the tests are wrong.
9. How?
1. Run test suite
2. Change code (mutate)
3. Run test suite again
If any test now fails, the mutant dies (it is “killed”). Otherwise it survives.
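The three steps above can be sketched in plain Ruby. This is a minimal illustration under loud assumptions, not how any real tool works: the code under test is a lambda, each test is a lambda that returns true when it passes, mutants are written by hand, and `suite_passes?` and `mutation_test` are hypothetical helpers. Real tools such as Mutant or PIT generate mutants by rewriting source or bytecode.

```ruby
# Hypothetical helper: true if every test passes against the given implementation.
# An exception raised inside a test counts as a failure.
def suite_passes?(tests, impl)
  tests.all? do |test|
    begin
      test.call(impl)
    rescue StandardError
      false
    end
  end
end

# Hypothetical helper following the three steps on the slide.
def mutation_test(original, mutants, tests)
  # Step 1: the suite must pass against the unmodified code.
  raise "test suite fails on the original code" unless suite_passes?(tests, original)

  # Steps 2 and 3: rerun the suite against each mutant; any failure kills it.
  mutants.map do |name, mutant|
    [name, suite_passes?(tests, mutant) ? :survived : :killed]
  end.to_h
end

# Adder.add as written on the earlier slide, plus the "minus becomes plus" mutant.
original = ->(x, y) { x - y }
mutants  = { "minus to plus" => ->(x, y) { x + y } }

# The really dreadful test: it calls the code but asserts nothing.
dreadful_tests = [->(add) { add.call(1, 1); true }]

# A test that checks the value the code currently returns.
checking_tests = [->(add) { add.call(3, 1) == 2 }]

p mutation_test(original, mutants, dreadful_tests) # the mutant survives
p mutation_test(original, mutants, checking_tests) # the mutant is killed
```

Note that the checking test pins down whatever the code does today (here, subtraction): a killed mutant shows a test is sensitive to the change, not that the code is correct.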
10. Going with our previous example
class Adder
  def self.add(x, y)
    return x - y
  end
end
Let’s change something
describe Adder do
  it "should add the two arguments" do
    Adder.add(1, 1)
  end
end
11. Going with our previous example
class Adder
  def self.add(x, y)
    return x + y
  end
end
This still passes
describe Adder do
  it "should add the two arguments" do
    Adder.add(1, 1)
  end
end
13. So what? It caught a really rubbish test
How about something slightly less obvious?
14. Slightly less obvious (and I mean slightly)
class ConditionChecker
  def self.check(a, b)
    if a && b
      return 42
    else
      return 0
    end
  end
end
describe ConditionChecker do
  it "should return 42 when both arguments are true" do
    ConditionChecker.check(true, true).should == 42
  end
  it "should return 0 when both arguments are false" do
    ConditionChecker.check(false, false).should == 0
  end
end
Coverage: 100%
Usefulness: >0
But still wrong
15. Slightly less obvious (and I mean slightly)
class ConditionChecker
  def self.check(a, b)
    if a && b
      return 42
    else
      return 0
    end
  end
end
Mutate: && becomes ||
describe ConditionChecker do
  it "should return 42 when both arguments are true" do
    ConditionChecker.check(true, true).should == 42
  end
  it "should return 0 when both arguments are false" do
    ConditionChecker.check(false, false).should == 0
  end
end
16. Slightly less obvious (and I mean slightly)
class ConditionChecker
  def self.check(a, b)
    if a || b
      return 42
    else
      return 0
    end
  end
end
describe ConditionChecker do
  it "should return 42 when both arguments are true" do
    ConditionChecker.check(true, true).should == 42
  end
  it "should return 0 when both arguments are false" do
    ConditionChecker.check(false, false).should == 0
  end
end
Passing tests
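The surviving mutant points at a missing test case. Because both arguments are booleans there are only four possible inputs, so a brute-force comparison finds exactly the inputs on which the original and the mutant disagree. This sketch only works for tiny input domains, and `MutatedChecker` is a hypothetical name for the mutant shown on the slide.

```ruby
class ConditionChecker
  def self.check(a, b)
    if a && b
      return 42
    else
      return 0
    end
  end
end

# Hypothetical copy of the mutant from the slide: && replaced by ||.
class MutatedChecker
  def self.check(a, b)
    if a || b
      return 42
    else
      return 0
    end
  end
end

# Every combination of two booleans.
inputs = [true, false].product([true, false])

# Inputs where the two versions return different values; a test on any one of them kills the mutant.
killing_inputs = inputs.reject do |a, b|
  ConditionChecker.check(a, b) == MutatedChecker.check(a, b)
end

p killing_inputs # => [[true, false], [false, true]]
```

Adding a spec such as `ConditionChecker.check(true, false).should == 0` to the suite on the slide would kill this mutant, since the mutant returns 42 for that input.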
19. The downfall of mutation
(Equivalent Mutants)
index = 0
while index != 100 do
  do_stuff
  index += 1
end
Mutates to
index = 0
while index < 100 do
  do_stuff
  index += 1
end
But the two programs are equivalent, so no test can tell them apart
20. There is no possible test which can “kill” the mutant
The programs are equivalent
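A quick way to convince yourself is to count how often the loop body runs in each version. In this sketch a counter stands in for the call inside the loop. Both loops run the body exactly 100 times because `index` starts at 0 and grows by exactly 1, so it reaches 100 without ever skipping past it. The equivalence depends on that context: if `index` could start above 100, the `!=` loop would never terminate while the `<` loop would not run at all.

```ruby
# Original loop from the slide; returns how many times the body ran.
def original_loop
  runs = 0
  index = 0
  while index != 100
    runs += 1 # stands in for the call in the loop body
    index += 1
  end
  runs
end

# Mutated loop: != replaced by <.
def mutant_loop
  runs = 0
  index = 0
  while index < 100
    runs += 1 # stands in for the call in the loop body
    index += 1
  end
  runs
end

p original_loop # => 100
p mutant_loop   # => 100
```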
22. How bad is it?
• Good paper assessing the problem [SZ10]
• Took 7 widely used, “large” Java projects
• Found:
– 15 minutes to assess one mutation
– 45% of uncaught mutations are equivalent
– The better tested the project, the worse the signal-to-noise ratio (more of the surviving mutants are equivalent)
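Why better tests make the signal worse: equivalent mutants survive however good the tests are, while better tests kill more of the non-equivalent ones, so equivalents make up a growing share of what is left to inspect. A sketch with hypothetical counts (not figures from [SZ10]); `noise_share` is an illustrative helper:

```ruby
# Fraction of surviving mutants that are equivalent: noise no new test can remove.
def noise_share(equivalent, killable_survivors)
  equivalent.fdiv(equivalent + killable_survivors)
end

# Hypothetical project with 20 equivalent mutants, tested by a weak and a strong suite.
weak_suite   = noise_share(20, 80) # 80 non-equivalent mutants still survive
strong_suite = noise_share(20, 5)  # only 5 non-equivalent mutants still survive

puts weak_suite   # prints 0.2
puts strong_suite # prints 0.8
```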
23. Can we detect the equivalents?
• Not in the general case [BA82]
• Some specific cases can be detected
– Using compiler optimisation techniques [BS79]
– Using mathematical constraints [DO91]
– Using line coverage changes [SZ10]
• All are heuristics: I have not seen any that claim to detect every equivalent mutant
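The line-coverage idea from [SZ10] can be sketched with Ruby's built-in `TracePoint`: run the same test inputs against the original and the mutant, record how often each line executes, and treat a surviving mutant whose coverage changed as probably non-equivalent. This is a simplified illustration, not the paper's implementation; it assumes the code is held in strings, reduces tests to lists of inputs, and `line_counts` is a hypothetical helper. With the two inputs from the ConditionChecker slides the coverage is identical, so the heuristic gives no signal for the `||` mutant even though it is not equivalent.

```ruby
ORIGINAL = <<~RUBY
  def self.check(a, b)
    if a && b
      return 42
    else
      return 0
    end
  end
RUBY

MUTANT = ORIGINAL.sub("a && b", "a || b")

# Hypothetical helper: loads the source into a fresh module, runs the inputs,
# and returns { line_number => times_executed } for that source only.
def line_counts(source, inputs)
  path = "(under-test)"
  mod = Module.new
  mod.module_eval(source, path, 1) # defines mod.check

  counts = Hash.new(0)
  trace = TracePoint.new(:line) do |tp|
    counts[tp.lineno] += 1 if tp.path == path
  end
  trace.enable { inputs.each { |a, b| mod.check(a, b) } }
  counts
end

test_inputs = [[true, true], [false, false]]
p line_counts(ORIGINAL, test_inputs) == line_counts(MUTANT, test_inputs) # => true (no signal)

more_inputs = test_inputs + [[true, false]]
p line_counts(ORIGINAL, more_inputs) == line_counts(MUTANT, more_inputs) # => false (coverage changed)
```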
27. Ruby
• Mutant seems to be the new favourite
• Runs in Rubinius (1.8 or 1.9 mode)
• Only supports RSpec
• Easy to set up
rvm install rbx-head
rvm use rbx-head
gem install mutant
• And easy to use
mutate "ClassName#method_to_test" spec
28. Java
• Loads of tools to choose from
• Bytecode vs source mutation
• Will look at PIT (seems like one of the better ones)
29. PIT - pitest.org
• Works with “everything”
– Command line
– Ant
– Maven
• Bytecode-level mutations (faster)
• Very customisable
– Exclude classes/packages from mutation
– Choose which mutations you want
– Timeouts
• Makes pretty HTML reports (line/mutation coverage)
30. Summary
• Can point at weak areas in your tests
• At the same time, can be prohibitively noisy
• Try it and see
32. References
• [BA82] - T. A. Budd and D. Angluin. Two notions of correctness and
their relation to testing. Acta Informatica, 18(1):31-45, November
1982.
• [BS79] - D. Baldwin and F. Sayward. Heuristics for determining
equivalence of program mutations. Research report 276,
Department of Computer Science, Yale University, 1979.
• [DO91] - R. A. DeMillo and A. J. Offutt. Constraint-based automatic test data generation. IEEE
Transactions on Software Engineering, 17(9):900-910, September
1991.
• [SZ10] - D. Schuler and A. Zeller. (Un-)Covering Equivalent Mutants.
Third International Conference on Software Testing, Verification and
Validation (ICST), pages 45-54. April 2010.
33. Also interesting
• [AHH04] - K. Adamopoulos, M. Harman and R. M. Hierons. How to
Overcome the Equivalent Mutant Problem and Achieve Tailored
Selective Mutation Using Co-evolution. Genetic and Evolutionary
Computation -- GECCO 2004, pages 1338-1349. 2004.
Speaker notes
Code changes include: arithmetic flip, boolean flip, access modifier change, statement deletion, and lots more
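To make the note concrete, a naive mutator can apply operators like the arithmetic and boolean flips as plain text substitutions, producing one mutant per occurrence. This is only a sketch with a hypothetical `OPERATORS` table and `mutants_of` helper; real tools work on the parse tree or on bytecode, which avoids matching inside strings or comments and makes operators such as statement deletion possible.

```ruby
# Hypothetical operator table: name => [text to find, replacement].
OPERATORS = {
  "arithmetic flip" => [" - ", " + "],
  "boolean flip"    => [" && ", " || "]
}.freeze

# Returns [operator_name, mutated_source] pairs, one per matching occurrence.
def mutants_of(source)
  OPERATORS.flat_map do |name, (from, to)|
    offsets = []
    offset = source.index(from)
    while offset
      offsets << offset
      offset = source.index(from, offset + 1)
    end

    offsets.map do |at|
      mutated = source.dup
      mutated[at, from.length] = to # replace only this occurrence
      [name, mutated]
    end
  end
end

p mutants_of("return x - y")
# => [["arithmetic flip", "return x + y"]]
p mutants_of("a && b || c && d").length # => 2
```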
Difficult to identify equivalent mutants. There are some papers which suggest methods (but I didn’t have time to read them properly).
Paper called “(Un-)Covering Equivalent Mutants”. 7 projects: AspectJ, Barbecue, Apache Commons (Lang?), Jaxen, Joda-Time, JTopas, XStream
Undecidable problem for arbitrary pairs of programs [BA82]. Constraints represent the conditions under which a mutant will die. If the constraint system cannot be satisfied, there are no conditions under which the mutant can die, so it is an equivalent mutant. For arbitrary constraint systems, recognising feasibility is undecidable. Line coverage change is supposedly a decent heuristic: a change means the mutant is probably non-equivalent. 75% correctly classified.
Since most of the team do Ruby, I’ve had a look into that too
Looked into Heckle, since that was the original topic of this talk. Turns out it’s been dead for a long time.
Largely based on Heckle, rewritten on top of Rubinius. Only supports RSpec, but is that what’s used in the team? The author is looking to extend it to other frameworks. Not sure if you need rubinius-head any more, but you did as of February 2012 (perhaps there’s a more stable version with support now). Also not sure about compatibility of Rubinius with the “official” Ruby implementation.
Reason I mention Java: more mature ecosystem for these tools (interesting features which would be nice in Mutant). Bytecode is faster to mutate as it avoids recompilation. Jumble and Jester also seem quite popular.
Exclude logging/debug calls (3rd party logging frameworks excluded by default) -> Probably don’t care about asserts on these
If I’ve not hit the time limit, are there any questions?