1. Bad Metrics and What You Can Do About It
Paul Holland
Managing Director of Testing Practice at Doran Jones, Inc.
2. My Background
• Independent S/W Testing consultant since Apr 2012
• 16+ years testing telecommunications equipment and reworking test methodologies at Alcatel-Lucent
• 10+ years as a test manager
• Presenter at STAREast, STARWest, Let’s Test, EuroSTAR and CAST
• Keynote at KWSQA conference in 2012
• Facilitator at 35+ peer conferences and workshops
• Teacher of S/W testing for the past 5 years
• Teacher of Rapid Software Testing
– through Satisfice (James Bach): www.satisfice.com
• Military helicopter pilot – Canadian Sea Kings
3. Attributions
• Over the past 10 years I have spoken with many people regarding metrics. I cannot directly attribute any specific aspects of this talk to any individual, but all of these people (and more) have influenced my opinions and thoughts on metrics:
– Cem Kaner, James Bach, Michael Bolton, Ross Collard, Doug Hoffman, Scott Barber, John Hazel, Eric Proegler, Dan Downing, Greg McNelly, Ben Yaroch
4. Definitions of METRIC (from http://www.merriam-webster.com, April 2012)
• 1 plural : a part of prosody that deals with metrical structure
• 2 : a standard of measurement <no metric exists that can be applied directly to happiness — Scientific Monthly>
• 3 : a mathematical function that associates a real nonnegative number analogous to distance with each pair of elements in a set such that the number is zero only if the two elements are identical, the number is the same regardless of the order in which the two elements are taken, and the number associated with one pair of elements plus that associated with one member of the pair and a third element is equal to or greater than the number associated with the other member of the pair and the third element (formalized below)
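In symbols, sense 3 is the standard distance-function definition. A compact restatement in LaTeX (assuming the usual metric-space notation; the axiom names are the conventional ones, not part of the dictionary entry):

```latex
% A metric on a set X is a function d : X \times X \to \mathbb{R}_{\ge 0}
% such that, for all x, y, z in X:
\begin{align*}
  d(x,y) &= 0 \iff x = y         && \text{(zero only for identical elements)} \\
  d(x,y) &= d(y,x)               && \text{(same regardless of order)} \\
  d(x,z) &\le d(x,y) + d(y,z)    && \text{(triangle inequality)}
\end{align*}
```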
5. Sample Metrics
• Number of Test Cases Planned (per release or feature)
• Number of Test Cases Executed vs. Plan
• Number of Bugs Found per Tester
• Number of Bugs Found per Feature
• Number of Bugs Found in the Field
• Number of Open Bugs
• Lab Equipment Usage
6. Sample Metrics
• Hours between crashes in the Field
• Percentage Behind Plan
• Percentage of Automated Test Cases
• Percentage of Tests Passed vs. Failed (pass rate)
• Number of Test Steps
• Code Coverage / Path Coverage
• Requirements Coverage
7. Goodhart’s Law
• In 1975, Charles Goodhart, a former advisor to the Bank of England and Emeritus Professor at the London School of Economics, stated:
“Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”
Goodhart, C.A.E. (1975a) ‘Problems of Monetary Management: The UK Experience’ in Papers in Monetary Economics, Volume I, Reserve Bank of Australia, 1975
8. Goodhart’s Law
• Professor Marilyn Strathern FBA has re-stated Goodhart’s Law more succinctly and more generally:
“When a measure becomes a target, it ceases to be a good measure.”
9. Elements of Bad Metrics
1. Measure and/or compare elements that are inconsistent in size or composition
– Impossible to use effectively for comparison
– How many containers do you need for your possessions?
– Test Cases and Test Steps
• Vary greatly in time required and complexity (see the sketch below)
– Bugs
• Can differ in severity and likelihood – i.e., in risk
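A toy illustration of the size problem, sketched in Python with hypothetical testers and hours (none of these numbers come from the deck): the same "test cases executed" count can stand for radically different amounts of work.

```python
# Hypothetical data: two testers each "complete" five test cases,
# but the effort behind the identical counts differs by an order
# of magnitude -- the metric compares containers of different sizes.
test_cases = {
    "alice": [0.5, 0.5, 0.5, 0.5, 0.5],    # five quick smoke checks (hours)
    "bob":   [8.0, 12.0, 6.0, 10.0, 9.0],  # five deep end-to-end scenarios
}

for tester, hours in test_cases.items():
    print(f"{tester}: {len(hours)} test cases, {sum(hours):.1f} hours of effort")
# alice: 5 test cases, 2.5 hours of effort
# bob:   5 test cases, 45.0 hours of effort
```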
10. Elements of Bad Metrics
2. Create competition between individuals and/or teams
– These metrics typically do not result in friendly competition
– They inhibit sharing of information and teamwork
– Especially damaging if compensation is impacted
– Number of xxxx per tester
– Number of xxxx per feature
11. Elements of Bad Metrics
3. Easy to “game” or circumvent the desired intention
– Easily improved by undesirable behaviour
– Pass rate (percentage): execute more simple tests that will pass, or break up a long test case into many smaller ones (see the arithmetic below)
– Number of bugs raised: raise two similar bug reports instead of combining them
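The pass-rate trick is simple arithmetic. A hypothetical before/after in Python (the counts are invented for illustration):

```python
# Before: 10 substantial test cases, 2 of them failing.
passed, failed = 8, 2
print(f"before: {passed / (passed + failed):.0%} pass rate")  # before: 80%

# "Improvement": split each passing test into 5 trivial test cases
# reported separately; the 2 failures are untouched.
passed = 8 * 5
print(f"after:  {passed / (passed + failed):.0%} pass rate")  # after: 95%
# Nothing about the product or the testing changed, yet the
# metric improved by 15 points.
```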
12. Elements of Bad Metrics
4. Contain misleading information or give a false sense of completeness
– Summarizing a large amount of information into one or two numbers out of context
– Coverage (Code, Path)
• Misleading information based on touching the code once (see the example below)
– Pass rate and number of test cases
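A minimal Python sketch of the coverage problem, using a hypothetical function and test: every line is executed, so line coverage reports 100%, yet an obvious failure mode is never exercised.

```python
# Hypothetical example: 100% line coverage that still misses a bug.
def average(values):
    return sum(values) / len(values)

def test_average():
    assert average([2, 4]) == 3  # executes every line of average()

test_average()  # line coverage of average() is now 100%
# ...yet average([]) raises ZeroDivisionError. Touching each line
# once says nothing about the inputs, paths, and states never tried.
```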
13. Impact of Using Bad Metrics
• Gives Executives a false sense of test coverage
– All they see is numbers out of context
– The larger the numbers the better the testing
– The difficulty of good testing is hidden by large “fake” numbers
• Dangerous message to Executives
– Our pass rate is at 96% so our product is in good shape
– Code coverage is at 100% - our code is completely tested
– Feature specification coverage is at 100% - Ship it!!!
• What could possibly go wrong?
14. Sample Metrics
• Number of Test Cases Planned (per release or feature)
• Number of Test Cases Executed vs. Plan
• Number of Bugs Found per Tester
• Number of Bugs Found per Feature
• Number of Bugs Found in the Field – A list of Bugs
• Number of Open Bugs – A list of Open Bugs
• Lab Equipment Usage
15. Sample Metrics
• Hours between crashes in the Field
• Percentage Behind Plan – depends on whether the plan is flexible
• Percentage of Automated Test Cases
• Percentage of Tests Passed vs. Failed (pass rate)
• Number of Test Steps
• Code Coverage / Path Coverage – depends on usage
• Requirements Coverage – depends on usage
16. So … Now what?
• “I have to stop counting everything. I feel naked and exposed.”
• Track expected effort instead of tracking test cases, using:
– Whiteboard
– Excel spreadsheet
17. Whiteboard
• Used for planning and tracking of test execution
• Suitable for use in waterfall or agile (as long as you have control over your own team’s process)
• Use colours to track:
– Features, or
– Main Areas, or
– Test styles (performance, robustness, system)
18. Whiteboard
• Divide the board into four areas:
– Work to be done
– Work in Progress
– Cancelled or Work not being done
– Completed work
• Red stickies indicate issues (not just bugs)
• Create a sticky note for each half day of work (or mark the number of expected half days on the sticky note)
• Prioritize stickies daily (or at least twice per week)
• Finish “on-time” with low-priority work incomplete (a sketch of the board as data follows below)
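For teams that mirror the physical board electronically, a minimal sketch of the stickies as data, in Python. Everything here (field names, charters, priorities) is an assumption for illustration; the deck itself prescribes only the board layout.

```python
from dataclasses import dataclass

# The four board areas from the slide.
COLUMNS = ("work to be done", "work in progress",
           "cancelled or not being done", "completed work")

@dataclass
class Sticky:
    charter: str
    half_days: int          # expected effort, marked on the note
    priority: int           # re-ranked daily; lower = more important
    column: str = COLUMNS[0]
    is_issue: bool = False  # a red sticky flags an issue, not just a bug

board = [
    Sticky("INP vs. SHINE", half_days=12, priority=2),
    Sticky("POTS interference", half_days=4, priority=3),
    Sticky("Traffic delay and jitter from RTX", half_days=2, priority=1),
]

# Re-prioritize daily (or at least twice a week); whatever sorts to the
# bottom is the low-priority work left incomplete "on-time".
for sticky in sorted(board, key=lambda s: s.priority):
    print(f"[{sticky.column}] {sticky.charter}: {sticky.half_days} half-days")
```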
21. Reporting
• An Excel Spreadsheet with:
– List of Charters
– Area
– Estimated Effort
– Expended Effort
– Remaining Effort
– Tester(s)
– Start Date
– Completed Date
– Issues
– Comments
• Does NOT include pass/fail percentage or number of test cases (a CSV sketch follows below)
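A minimal sketch of such a spreadsheet, assuming plain CSV output (which Excel opens directly); the column names follow the slide, and the rows are hypothetical echoes of the sample report on the next slide.

```python
import csv

# Columns exactly as listed on the Reporting slide.
COLUMNS = ["Charter", "Area", "Estimated Effort", "Expended Effort",
           "Remaining Effort", "Tester", "Date Started", "Date Completed",
           "Issues Found", "Comments"]

charters = [
    ["INP vs. SHINE", "ARQ", 6, 6, 0, "ncowan",
     "12/01/2011", "12/04/2011", "", ""],
    ["INP vs. REIN + SHINE", "ARQ", 12, "", 12, "", "", "", "",
     "not started"],
]

with open("test_progress.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(COLUMNS)
    writer.writerows(charters)
# Deliberately absent: pass/fail percentage and test-case counts.
```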
22. Sample Report
Charter | Area | Estimated Effort | Expended Effort | Remaining Effort | Tester | Date Started | Date Completed | Issues Found | Comments
Investigation for high QLN spikes on EVLT H/W | Performance | 0 | 20 | 0 | acode | 12/10/2011 | 01/14/2012 | ALU016170 32 | Lots of investigation. Problem was on 2-3 out of 48 ports, which just happened to be 2 of the 6 ports I tested.
ARQ Verification under different RA Modes | ARQ | 2 | 2 | 0 | ncowan | 12/14/2011 | 12/15/2011 | |
POTS interference | ARQ | 2 | 0 | 0 | --- | 01/08/2012 | 01/08/2012 | | Decided not to test as the H/W team already tested this functionality and time was tight.
Expected throughput testing | ARQ | 5 | 5 | 0 | acode | 01/10/2012 | 01/14/2012 | |
INP vs. SHINE | ARQ | 6 | 6 | 0 | ncowan | 12/01/2011 | 12/04/2011 | |
INP vs. REIN | ARQ | 6 | 7 | 5 | jbright | 01/06/2012 | 01/10/2012 | | To translate the files properly, had to install a Python solution from Antwerp. Some overhead to begin testing (installation, config test) but it was fairly quick to execute afterwards.
INP vs. REIN + SHINE | ARQ | 12 | | 12 | | | | |
Traffic delay and jitter from RTX | ARQ | 2 | 2 | 0 | ncowan | 12/05/2011 | 12/05/2011 | |
Attainable Throughput | ARQ | 1 | 4 | 0 | jbright | 01/05/2012 | 01/08/2012 | | Took longer because it was not behaving as expected and I had to make sure I was testing correctly. My expectations were wrong based on virtual noise not being exact.
23. Weekly Report
• A PowerPoint slide indicating the important issues (not a count but a list):
– “Show-stopping” bugs
– New bugs found since the last report
– Important issues with testing (blocking bugs, equipment issues, people issues, etc.)
– Risks (updates and newly discovered)
– Tester concerns (if different from the above)
– The slide on the next page indicating progress
24. Sample Report
[Bar chart: “Awesome Product” Test Progress as of 02/01/2012 – Effort (person half-days) by Feature (ARQ, SRA, Vectoring, Regression, H/W Performance), with three bars per feature: Original Planned Effort, Expended Effort, and Total Expected Effort. The direction of lines indicates the effort trend since the last report; a solid centre bar means finished. Green: no concerns; Yellow: some concerns; Red: major concerns.]
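A rough sketch of how such a progress chart could be generated, assuming matplotlib and invented effort numbers (the concern colours and trend arrows from the real slide are omitted):

```python
import matplotlib.pyplot as plt
import numpy as np

features = ["ARQ", "SRA", "Vectoring", "Regression", "H/W Performance"]
planned  = [80, 40, 55, 30, 20]   # Original Planned Effort (half days)
expended = [60, 35, 20, 30, 25]   # Expended Effort so far
expected = [85, 40, 60, 30, 25]   # Total Expected Effort (current estimate)

x = np.arange(len(features))
width = 0.25

fig, ax = plt.subplots()
ax.bar(x - width, planned, width, label="Original Planned Effort")
ax.bar(x, expended, width, label="Expended Effort")
ax.bar(x + width, expected, width, label="Total Expected Effort")
ax.set_xticks(x)
ax.set_xticklabels(features)
ax.set_xlabel("Feature")
ax.set_ylabel("Effort (person half-days)")
ax.set_title('"Awesome Product" Test Progress as of 02/01/2012')
ax.legend()
plt.tight_layout()
plt.show()
```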