2. Precision vs. Accuracy
Accuracy
Saying PI = 3 is accurate, but not precise
I’m 2 meters tall, which is accurate, but not precise
Precision
Saying PI = 4.378383 is precise, but not accurate
Airline flight times are precise to the minute, but not accurate
Number of significant digits is the key
3. Precision vs. Accuracy
People make assumptions about accuracy based on precision
“365 days” is not the same as “1 year” or “4 quarters” or even “52 weeks”
“10,000 staff hours” is not the same as “5 staff years”
Unwarranted precision is the enemy of accuracy (e.g., 395.7 days +/- 6 months)
4. Introduction
Good Goals
A goal should be SMART
Specific
Measurable/Testable
Attainable
Relevant
Time-bound
Can use a Purpose, Issue, Object format
6. Introduction
GQM Example
Goal: Improve by 10% (Purpose) the timeliness of (Issue) change request processing (Object: process) from the project manager’s viewpoint (Viewpoint)
Question: What is the current change request processing speed?
Measures: average cycle time; standard deviation; % cases outside the upper limit
Question: Is the performance of the process improving?
Measures: (current average cycle time * 100) / baseline average cycle time; subjective rating of manager’s satisfaction
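As a minimal sketch of turning the second question's measure into a number, the ratio of current to baseline cycle time can be computed directly; the cycle-time figures below are invented for illustration:

```python
# Hypothetical figures: a baseline average cycle time of 12 days and a
# current average of 10.5 days (illustrative values, not from the slides).
def cycle_time_index(current_avg: float, baseline_avg: float) -> float:
    """Current average cycle time as a percentage of the baseline."""
    return current_avg * 100 / baseline_avg

index = cycle_time_index(10.5, 12.0)
print(round(index, 1))  # 87.5 -> cycle time is down 12.5% from baseline
```

A value below 100 means cycle time has improved relative to the baseline; the subjective satisfaction rating supplements this with the manager's viewpoint.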
7. Project Evaluation: Quality
Test Planning and Resources
Do we have enough testing resources?
How many tests do we need to run (estimated)?
How long does each test case take to design and write?
How long does each test take, on average?
How many full testing cycles do we expect? (more than one, especially for early test cycles)
How many person-days do we need (# tests * time per test * # of cycles)?
How many testing staff do we have?
How long will the testing phase take, with our current staff?
Is the testing phase too long (i.e., our current staff is not sufficient)? Do we have to test less, or can we add staff?
8. Project Evaluation: Quality
Reported/Corrected Software Defects
[Chart: cumulative defects found, fixed, and open (0–100%) over time, from the start to the end of the testing phase]
From Manager’s Handbook for Software Development, Revision 1, NASA, Software Engineering Laboratory, 1990
9. Project Evaluation: Quality
Reported/Corrected Software Defects – Actual Project
[Chart: number of defect reports (in thousands, 0–1.0) over 40 weeks of testing, with Found, Open, and Fixed curves]
11. Project Evaluation: Quality
Statistics on Effort per Defect
Data on the time required to fix defects, categorized by type of defect, provides a basis for estimating remaining defect correction work
Need to collect data on fix time in the defect tracking system
Data on the phases in which defects are injected and later detected gives you a measure of the efficiency of the development process. If 95% of the defects are detected in the same phase they were created, the project has an efficient process
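The phase-containment idea in the last bullet can be computed directly from injection and detection counts; the defect counts below are invented for illustration:

```python
# Sketch of phase-containment effectiveness: the share of defects caught
# in the same phase that created them. Counts are illustrative only.
def containment(injected_by_phase, caught_same_phase):
    """Fraction of all injected defects detected in their phase of origin."""
    return sum(caught_same_phase.values()) / sum(injected_by_phase.values())

injected = {"requirements": 40, "design": 100, "code": 160}
caught_in_phase = {"requirements": 36, "design": 92, "code": 150}
print(f"{containment(injected, caught_in_phase):.0%}")  # 93%
```

A ratio near the slide's 95% benchmark indicates an efficient process; a low ratio means defects are leaking downstream, where they are more expensive to fix.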
12. Project Evaluation: Quality
A Defect Fix Time Model for Testing
25% of defects: 2 hours each
50% of defects: 5 hours each
20% of defects: 10 hours each
4% of defects: 20 hours each
1% of defects: 50 hours each
From Software Metrics: Establishing a Company-Wide Program, by Robert B. Grady and Deborah L. Caswell, 1987
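Pairing the listed percentages with the listed hours in order (25% at 2 hours through 1% at 50 hours), the model's expected effort per defect is a simple weighted sum; this is a sketch of that arithmetic, not part of the original model's presentation:

```python
# Expected find-and-fix time per defect under the fix-time model above,
# assuming the percentages pair with the hours in the order listed.
model = [(0.25, 2), (0.50, 5), (0.20, 10), (0.04, 20), (0.01, 50)]

expected_hours = sum(frac * hours for frac, hours in model)
print(round(expected_hours, 1))  # 6.3 hours per defect on average
```

Multiplying this expected value by the number of open defects gives a rough estimate of the remaining correction effort.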
13. Product Characterization: Quality
Defects
Defects are one of the most often used measures of quality
Definitions of defects differ
Only items found by customers? Testers?
Items found during upstream reviews?
Only non-trivial items?
Small enhancements?
Timing of “defect” detection is an important part of defect characterization
A “product defect” may be different than a “process defect”
14. Product Evaluation: Testing
System Test Profile
[Chart: tests planned, executed, and passed (0–140) over the system test phase]
From NASA, Recommended Approach to Software Development, 1992
15. Product Evaluation: Testing
System Test Profile
[Chart: tests planned, executed, and passed (0–140) over the system test phase]
From NASA, Recommended Approach to Software Development, 1992
16. Product Evaluation: Testing
Cumulative Defects Found in Testing
[Chart: Error Rate Model — cumulative errors per KSLOC (0–8) across Design, Code/Test, System Test, and Acceptance Test, showing the historical norm with upper and lower bounds]
From Manager’s Handbook for Software Development, Revision 1, NASA, Software Engineering Laboratory, 1990
17. Product Evaluation: Testing
Cumulative Defects – Actual Project
[Chart: Error Rate Model — cumulative errors per KSLOC (0–8) across Design, Code/Test, System Test, and Acceptance Test, showing the historical norm, upper and lower bounds, and the actual project’s rate]
From Manager’s Handbook for Software Development, Revision 1, NASA, Software Engineering Laboratory, 1990
18. Product Prediction
Predicting Future Defect Rates
Increasing Factors
System size
Application complexity
Compressing the schedule (4x increase)
More staff
Lower productivity
Decreasing Factors
Simplifying the application/problem at hand
Extending the planned development time (cut in half)
Fewer staff
Higher productivity
19. Product Prediction
Defect Density Prediction
To judge whether we’ve found all the defects for an application, estimate its defect density
Need statistics on the defect density of past similar projects
Use this data to predict the expected density on this project
For example, if our prior projects had a defect density between 7 and 9.5 defects/KLOC, we expect a similar density on our new project
If our new project has 100,000 lines of code, we expect to find between 700 and 950 defects total
If we’ve found 600 defects so far, we’re not done: we expect to find between 100 and 350 more defects
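The worked example above can be sketched as code, using the figures given on the slide (100 KLOC, a historical density of 7–9.5 defects/KLOC, 600 defects found so far):

```python
# Remaining-defect range from historical defect density.
# Figures come straight from the slide's example.
def remaining_defects(kloc, density_low, density_high, found):
    """Expected (low, high) count of defects still to be found."""
    low = kloc * density_low - found
    high = kloc * density_high - found
    return max(low, 0), max(high, 0)

print(remaining_defects(100, 7, 9.5, 600))  # (100, 350.0): 100-350 more to find
```

The clamping to zero covers the case where the found count already exceeds the low end of the historical range.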
20. Product Prediction
Distribution of Software Defect Origins and Severities
Highest severity faults come from requirements and design
[Chart: severity level (Minor, Moderate, Major, Critical) by defect origin (Requirements, Design, Coding, Documentation, Bad Fixes)]
21. Product Prediction
Defect Modeling
Model the number of defects expected based on past experience
Model the number of defects in requirements, design, construction, etc.
Two approaches:
Model defects based on effort hours, i.e., X defects will be introduced per hour worked
Model defects per KSLOC (or other size unit) based on past experience and the code growth curve
22. Product Prediction
Defect Modeling, continued
Approach 1: SEI data, based on PSP data:
Design: 1.76 defects injected/hour
Coding: 4.20 defects injected/hour
Approach 2:
Defects/KSLOC total are about 40 (range 30–85):
10% requirements (4/KLOC)
25% design (10/KLOC)
40% coding (16/KLOC)
15% user documentation (6/KLOC)
10% bad fixes (4/KLOC)
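Approach 2's distribution can be applied to a project size as a sketch; the nominal 40 defects/KSLOC comes from the slide, while the 50 KSLOC project size is an assumed input for illustration:

```python
# Splitting a nominal 40 defects/KSLOC across the injection phases listed
# above, then scaling to a project size (size in KSLOC is an assumed input).
DISTRIBUTION = {                  # share of total injected defects
    "requirements": 0.10,         # ~4/KLOC
    "design": 0.25,               # ~10/KLOC
    "coding": 0.40,               # ~16/KLOC
    "user documentation": 0.15,   # ~6/KLOC
    "bad fixes": 0.10,            # ~4/KLOC
}

def expected_defects(ksloc, total_per_ksloc=40):
    """Expected injected-defect count per phase for a project of this size."""
    total = ksloc * total_per_ksloc
    return {phase: share * total for phase, share in DISTRIBUTION.items()}

print(expected_defects(50)["coding"])  # ~800 coding defects in a 50 KSLOC project
```

Because the slide gives a 30–85 range around the nominal 40, the same calculation run at the range endpoints gives low and high bounds rather than a point estimate.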
23. Product Prediction
Predicted and Actual Defects Found
[Chart: defects (0–800) by development phase — Analysis, High Level Design, Low Level Design, Construction, Unit Test, Project Integration Test, Release Integration Test, System Test, Beta, General Availability — showing the phase injection estimate, phase expected and actual removal, cumulative injection estimate and reestimate, cumulative expected and actual removal, and a size reestimate]
From Edward F. Weller, “Practical Applications of Statistical Process Control,” IEEE Software, May/June 2000
25. Release Measures
Defect Counts
Defect counts give a quantitative handle on how much work the project team still has to do before it can release the software
Graph the cumulative reported defects, open defects, and fixed defects
When the software is nearing release, the number of open defects should trend downward, and the fixed defects line should approach the reported defects line
26. Release Measures
Defect Trends – Near Release: All Defects
[Chart: number of defect reports (in thousands, 0–1.0) over 40 weeks of testing, with Found, Open, Fixed, and Target curves]
27. Release Measures
Defect Trends – Near Release: Severity 1 and 2
[Chart: number of defect reports (in thousands, 0–1.0) over 40 weeks of testing, with Found, Open, Fixed, and Target curves]
28. Release Measures
Construx Measurable Release Criteria
Acceptance testing successfully completed
All open change requests dispositioned
System testing successfully completed
All requirements implemented, based on the spec
All review goals have been met
Declining defect rates are seen
Declining change rates are seen
No open Priority A defects exist in the database
Code growth has stabilized
29. Release Measures
HP Measurable Release Criteria
Breadth – testing coverage of user-accessible and internal functions
Depth – branch coverage testing
Reliability – continuous hours of operation under stress; stability; ability to recover gracefully from defect conditions
Remaining defect density at release
From Robert B. Grady, Practical Software Metrics for Project Management and Process Improvement, 1992
30. Release Measures
Post-Release Defect Density by Whether Release Criteria Were Met
[Chart: postrelease incoming defects submitted by customers (3-month moving average, normalized by KLOC) over the 12 months following manufacturing release, comparing products that did not meet the criteria, the worst product that met them, and the average of products that met them]
From Robert B. Grady, Practical Software Metrics for Project Management and Process Improvement, 1992
31. Release Measures: Defect Counts
Defect Plot Before Release
[Chart: number of defects (0–12) over time, with Severity 1, Severity 2, and combined Severity 1 & 2 curves against a target]
From Robert B. Grady, Practical Software Metrics for Project Management and Process Improvement, 1992
34. Process Evaluation
Status Example
[Chart: units (0–800) over the implementation phase — target, units created, units reviewed, units tested]
From NASA, Manager’s Handbook for Software Development, Revision 1, 1990
35. Goal #1 – Improve Software Quality
Postrelease Discovered Defect Density
[Chart: number of open serious and critical defect reports (0–1), Nov 1984 to Jan 1993, split into reports older and newer than 12 months, against a 10X improvement goal]
From Practical Software Metrics for Project Management and Process Improvement, by Robert B. Grady, 1992
36. Goal #1 – Improve Software Quality
Prerelease Defect Density
Question: How can we predict software quality based on early development processes?
[Chart: defects in test/KLOC (0–80) by project release date, Oct 1980 to Dec 1988, with a linear trend line]
From Practical Software Metrics for Project Management and Process Improvement, by Robert B. Grady, 1992
37. Goal #3 – Improve Productivity
Defect Repair Efficiency
Question: How efficient are defect-fixing activities? Are we improving?
[Chart: defects fixed per engineer-month (0–5), 1987–1991]
From Practical Software Metrics for Project Management and Process Improvement, by Robert B. Grady, 1992
38. Goal #4 – Maximize Customer Satisfaction
Mean Time to Fix Critical and Serious Defects
Question: How long does it take to fix a problem?
[Chart: days (0–250), monthly from 7/18/1990 to 8/18/1991, broken down by defect status: AR, QA, KP+AD, LC, MR]
AR = awaiting release
QA = final QA testing
KP = known problem
AD = awaiting data
LC = lab classification
MR = marketing review
From Practical Software Metrics for Project Management and Process Improvement, by Robert B. Grady, 1992
Editor’s Notes
Main Point: Cover Slide
Notes:
Administrivia:
Hours (9-4:30) or (8:30-4)
Breaks (15 minutes 10:30, 2:30) (10:00, 2:00)
Lunch 12-1 (or 11:30-12:30)
Messages
Facilities
Notebook
Show enthusiasm for the topic!
Ask them for their enthusiasm!
(maybe needs a slide on the morass of measurement – every awful measurement you’ve seen or heard of?)
Don’t place the laptop in the center of the table; I block the screen. Put it at the end of the table in our seminar room.
Misc.:
References: None
Main Point: Precision is different from accuracy
Notes:
Accuracy = correct; closest to the truth
Precision = Exactness; how finely you express the point.
Precision implies accuracy that may not exist.
So what would be the best way to say the project will take a year?
Exactly a year?
About a year?
12 months?
364 days?
Miscellaneous: Story of the 70 million and 6 year old dinosaur
Reference: None
Main Point: People are misled
Notes:
How you communicate leads people to believe how confident you are.
A precise, but inaccurate, estimate communicates a highly confident, accurate estimate.
Miscellaneous: Story of the 70 million and 6 year old dinosaur
Reference: None
Main Point: SMART Goals
Notes:
The goals can either be project, area, or organizational in nature. We can talk about what the group is chartered to do.
Purpose, Issue, Object format is from Basili’s group at Maryland. It also has viewpoint – see template. You may or may not find viewpoint useful.
Make sure to have a good flow between Smart and GQM
Misc: None
References: None
Main Point: Diagram of GQM – how goals, questions and measures relate to each other
Notes:
Goal: A point, end or place that an individual or group is striving to reach
Question: decision maker issues related to progress toward or attainment of one or more goals
Measure – A measurable characteristic of the organization or project that provides data which helps answer one or more of the questions.
The same measure can be used in order to answer different questions under the same goal, and may answer questions under different goals as well.
It’s an engineering problem: use the smallest number of measurements to support the largest number of goals.
Sometimes a combination of two answers a question, too
Misc: None
Reference: Basili, Victor R., Gianluigi Caldiera, and H. Dieter Rombach, “The Goal Question Metric Approach”, in Encyclopedia of Software Engineering, Wiley, 1994, available at http://ftp.cs.umd.edu/pub/sel/papers/gqm.pdf
Main Point: Example of GQM – how goals, questions and measures relate to each other
Notes:
Goal: A point, end or place that an individual or group is striving to reach
Goals have a purpose (like improve), an issue, an object or process and a viewpoint. See example above. It also has a quality issue, timeliness here.
Question: decision maker issues related to progress toward or attainment of one or more goals
Measure – A measurable characteristic of the organization or project that provides data which helps answer one or more of the questions.
Example of a goal, question and measures
Misc: None
Reference: Basili, Victor R., Gianluigi Caldiera, and H. Dieter Rombach, “The Goal Question Metric Approach”, in Encyclopedia of Software Engineering, Wiley, 1994, available at http://ftp.cs.umd.edu/pub/sel/papers/gqm.pdf
Main Point: Test Planning measures
Notes:
Now we’ll talk about measures related to Test Planning
These are very basic, and could be done by test phase
Basically, how many tests do we think we’ll need to develop
How long does each take to run (on average, from past experience)
Thus how long will the whole set take to run (total effort)?
How many staff do we have?
Thus, how long will the testing phase (this particular phase) last?
Is this ok for the project schedule, or do we need more staff? Do we have to run fewer tests?
Allow for multiple runs, especially of early testing phases; allow for 3 or 4 cycles.
Time to find a defect varies by test type. It’s less for unit test, more for system test. Time to find a defect increases as the testing phase continues, as most defects are already found (or at least most easy ones!). So time to find the average defect isn’t linear – 2x time may not yield 2x defects. The defect detection curve is a Rayleigh curve, at least on a large project. Peaks then tails off slowly.
More important to talk about input (characteristics evaluation)
Misc.: None
References: None
Main Point: Information about Reported and Corrected Defects plots
Notes:
Plot defects found, defect fixed, defects still open on one graph
This is a healthy project. No huge backlog. Strive for a graph like this. This is a basic graph that everyone should be plotting. Most shops have this data in their defect tracking system. How many found, how many fixed, how many still open, over time. You’ll have to figure the numbers out over time, or save them every Friday or whatever. The dates the defect was entered into the defect tracking system, and the date it was fixed are known, so you can reconstruct this curve.
Key information is in the slope of the open defects curve
Open defects should decline as testing progresses, unless there is inadequate staff correcting problems, or the software is exceptionally buggy.
People introduce defects every hour they work. So the defect introduction curve looks like a Rayleigh curve (seen earlier).
If count of open defects is growing, possible causes may be:
Inadequate staffing to correct defects
Software very unreliable
Ambiguous or volatile requirements
If count of open defects is decreasing, possible causes may be:
Stable development
Testing/development progressing
Reliable, well-designed software (easy to fix defects)
Misc.: None
References: [NASA90] page 6-9
Main Point: Information about Reported and Corrected Defects plots
Notes:
Data shown is from an actual project. This is more likely to be reality in your project! Have a celebration when the white line and blue line cross. You’re fixing defects faster than you’re finding them.
Draw on whiteboard an unhealthy open curve with a second hump.
Most customers are happy with a project that has 95% of defects removed. Most companies don’t even manage that. It costs significantly more to get to 99% and 99.9%. Putnam says 25% more for 99% and 50% more for 99.9%, but a recent paper says up to 10 times more expensive.
From the Trajectory Computation and Orbital Products System, developed from 1983 to 1987. The total size was 1.2 million SLOC.
Early in testing, defects were not getting corrected. The cause was lower quality software. Defects were not found during system testing. Corrective actions: staffing was increased at week 20 to help address open defects. System attained stability (fixed and open lines crossed) at week 35, with defects being corrected faster than they were being reported.
Caution about metrics based on defect closure rate – the easy ones get corrected first, leaves the harder ones for the end.
Misc.: None
References: [NASA90] page 6-9
Main Point: Defect Rate
Notes:
How long we need to test depends on the final reliability we need.
This is a theoretical curve, but actual data has been shown to fit this curve. And, if we test for 2x the time, we’ll find 2x the defects, if we’re in the peak of the curve, but not later on.
It makes sense – as we work, we inject defects into the project. Since the effort curve (at least for large projects) follows a Rayleigh curve, the defect rate curve also is a Rayleigh curve
The first line labeled 95% is the reliability we often aim for. This translates into a MTBF of about 8-9 hours, which is enough for a typical batch program that usually doesn’t need to run for a long time without failure
99% translates into a MTBF of more like 1.8 weeks – which is needed for software that must run in an online environment for days without failure. If 95% is a project time of 1.0, 99% is 1.25 (25% longer) according to Putnam
99.9% translates into a MTBF of 10+ weeks – and a project time of 1.5 (50% longer) relative to 95% defect removal.
Note that the time to find a defect varies as time goes on. Peaks then tails off slowly – i.e. gets longer and longer to find each new defect at the end.
I didn’t talk about collecting the data. How did it come to be?
# Defects doesn’t = reliability
What is a defect?
Misc.: None
References: Putnam and Myers, pages 125-130
Main Point: Effort per defect info related to release readiness
Notes:
From SPSG page 225
See slide xx showing effort per defect data from HP
Need to collect effort to fix defects.
Then can generate average fix times by type of defect.
Then, if you know how many defects you have open, you know how many person-weeks of effort you still have to go to fix the remaining defects on the project
This works grossly, for planning purposes, when there are hundreds of defects expected
Towards the end of a project we don’t know how long it will take us to fix the last few nasty defects, so this breaks down a bit.
And, just because you haven’t found any for a week, doesn’t mean there aren’t any there!
Industry data: JPL: 5-17 hours to fix a defect
Misc.: None
References: SPSG pages 221-235, chapter 16
Main Point: HP’s Defect model – move to testing section?
Notes:
This model is for how long defects take to find and fix. This model was developed by Henry Kohoutek at HP
From actual data for one of HP’s product lines
You could figure it out for your defects
You can use a model like this to estimate the testing hours needed for your project.
To use the model, estimate your total defects from the total code size (this is known at the start of test)
Multiply it by the expected defect density, this yields total defects expected
Then calculate, using the model, the total time necessary to discover all the defects
Then the available staffing can be applied to the total time to predict the testing schedule
Also, what percentage of your fixes break something else? It can be as high as 25%
Does this time include confirmation time for the fix? Yes
Got a question about schedule slips – schedule slips are systemic – i.e. see EV slides
Misc.: None
References: From Software Metrics: Establishing a Company-Wide Program, by Robert B. Grady and Deborah L. Caswell, 1987, page 128
Main Point: A little bit about defects
Notes:
What is a defect in your company? A review issue? A small enhancement? A customer enhancement request? Defects found in unit testing?
What’s a defect isn’t completely clear either
Do you count things found in reviews?
What about unit test defects if they’re found by the programmers themselves? Are they counted?
Some shops put enhancements (either internally generated or customer requests) into the defect tracking system, as a convenient place to store them. Are they ‘defects’?
Need to weed out duplicates, etc.
Some companies use several defect tracking systems so finding out the total can be difficult (one for defects in production, another during testing, for example)
Misc.: None
References: None
Main Point: System Test Profile
Notes:
An example of plotting testing progress, number of tests planned, number of tests executed, number of tests passed. We expect more or less linear growth as we test. This is an easy graph to do. Want tests executed and tests passed lines close together.
This is for an actual project shown in NASA Recommended Approach to Software Development, page 8-18
What’s happening here? Testing starts off well, then levels off and finally continues at a lower rate
Cause: midway through the phase, testers found they did not have the input coefficients needed to test flight software. There was a long delay before the data became available, and testing momentum declined.
This S-shaped curve can be tracked for several items of interest during software development, for example:
Completion of design reviews over time
Completion of code inspections over time
Completion of code integration over time (the graph above)
Completion of component test in terms of number of test cases attempted and successful over time
Completion of system test in terms of number of test cases attempted and successful over time
See also units coded, read, tested graph shown earlier for build testing
Misc.: None
References: NASA Recommended Approach to Software Development, page 8-18
Main Point: System Test Profile
Notes:
An example of plotting testing progress: number of tests planned, number of tests executed, number of tests passed. We expect more or less linear growth as we test. This is an easy graph to do. Want the tests executed and tests passed lines close together.
This is a made up slide, showing a project with more problems than in the prior slide – tests passed are falling behind tests executed. This is a pattern to watch out for!
This S-shaped curve can be tracked for several items of interest during software development, for example:
Completion of design reviews over time
Completion of code inspections over time
Completion of code integration over time (the graph above)
Completion of component test in terms of number of test cases attempted and successful over time
Completion of system test in terms of number of test cases attempted and successful over time
See also units coded, read, tested graph shown earlier for build testing
Misc.: None
References: NASA Recommended Approach to Software Development, page 8-18
Main Point: Defect Rates
Notes:
Track defects vs. total estimated size of the project
You need to define what a defect is. People often use what’s in the defect tracking system. This graph is of defects found in test only
NASA has developed software development processes which reduce defects – for example requirements and design reviews. If you work in a shop without those processes, your defects in test will be much higher than shown here.
In testing, NASA has found the defect rates are halved in each succeeding testing phase (not counting defects found in requirements and design reviews)
4 defects/KSLOC in construction/unit test
2 defects/KSLOC in system test
1 defect/KSLOC in acceptance test
A graph of typical defect rates – NASA data. This shows their model upper and lower bounds as well as the expected rates
If a project’s defect rate is above the model upper bounds, possible causes:
Unreliable software
Misinterpreted requirements
Extremely complex software
If a project’s defect rate is below the model bounds, possible causes:
Reliable software
“Easy” problem
Inadequate testing
Misc.: None
References: [NASA90] page 6-8
Main Point: Defect Rates
Notes:
Defect rates from one actual project. What’s going on?
This actual project had a lower defect rate and lower defect detection rate
In this case, this was an early indication of high quality
There was smooth progress in detecting defects – it’s not like they weren’t testing all along
This was one of the highest quality systems produced
If the defect density is lower than expected, possible causes:
Size estimate is high (good)
Inspection defect detection is low (bad)
Work product quality is high (as above example) (good)
Insufficient level of detail in work product (bad)
If the defect density is higher than expected, possible causes:
Size estimate is low (bad)
Work product quality is poor (bad)
Inspection defect detection is high (good)
Too much detail in work product (good or bad)
Misc.: None
References: [NASA90] page 6-8
Main Point: Factors which affect the defect rate
Notes: Several factors increase the number of faults we put into the system, and their opposites tend to decrease it.
A Japanese study showed that perceived schedule pressure drove defects up by 4x.
Schedule beyond the minimum (25%?) decreases faults inserted, to less than half
If you know your history, here are the factors that can affect it up or down
Misc. X
References: Lawrence H. Putnam and Ware Myers, Measures for Excellence: Reliable Software on Time, Within Budget, 1992, Chapter 8, pages 135-146
Main Point: Defect density prediction related to release readiness
Notes:
For example past projects have found defects per KSLOC to be
In the range of 7 to 10
You have a system of 100,000 lines of code and have found 600 defects so far = 6 defects per KSLOC.
Based on past experience from 2 projects, you expect 7 to 10, or 700 to 1000 defects in 100,000 lines of code
If you are trying to remove 95% of all defects before release, you need to plan to find 665 to 950 defects based on your past experience
If you have past experience on a number of projects, you may know your average lifetime defect rate is 7.4 ± 0.4 defects per KSLOC, which is a much tighter range
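The arithmetic above can be sketched as a small helper. A minimal sketch using the slide's own numbers (100 KSLOC, a history of 7 to 10 defects/KSLOC, a 95% removal goal):

```python
def removal_target(ksloc, rate_low, rate_high, removal_goal=0.95):
    """Range of defects to plan to find before release, given historical
    lifetime defect rates (defects/KSLOC) and a removal-percentage goal."""
    expected_low = rate_low * ksloc    # fewest lifetime defects expected
    expected_high = rate_high * ksloc  # most lifetime defects expected
    return expected_low * removal_goal, expected_high * removal_goal

# Numbers from the slide: 100 KSLOC, history of 7-10 defects/KSLOC, 95% goal
low, high = removal_target(100, 7, 10)
print(round(low), round(high))   # 665 950
```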
Misc.: None
References: SPSG pages 221-235, chapter 16
Main Point: Worst faults come early
Notes: Critical means the system doesn’t work
Major means part of the system doesn’t work
Moderate means the system works but corrupts data
Minor is a workmanship issue
What is the percentage of total defects introduced in each phase?
Vertical bars are 5% each.
We see that the faults that will cause us the most pain come from activities early in the lifecycle. This highlights the problem with brute-force quality via testing at the end: by then the worst defects are too late to find and too expensive to fix.
We see coding errors start to pick up in the Major category, but it is dominated by design errors.
Coding probably produces the greatest number of errors, but perhaps not the highest cost to the organization.
Capers Jones says this might be a typical distribution for a medium to large system, of 50,000 LOC and larger
He says for a small system of 5000 LOC or less, coding defects would be more than 50% of the total
Misc. X
Reference: Capers Jones, Applied Software Measurement, page 368
Main Point: Defect Modeling
Notes:
Defect modeling is another approach
We’ve seen this already in prior slides
Can model based on number of lines of code and past experience on defects per KSLOC
Or on defects inserted per hour of effort in requirements, design, construction etc.
Or based on anything else you might have tracked historical data on – defects per class, for example
Model defects against size in whatever units makes sense – KLOC, FP, use cases, classes, whatever
Can also insert defects deliberately and see how many are caught by testing. Say you insert 100. Testing finds 220: 200 not inserted and 20 inserted. Since testing found 20% of the inserted defects, predict that 20% of the actual defects were found. Thus the total actual defects is 1000, and 800 remain (plus the 80 inserted defects not yet found) = 880 total remaining. Predicting the same detection percentage for actual defects as for inserted defects is only valid if you can model the actual defects and seed the same distribution of defect types. If you do this, insert the defects in a branch of the code in the CM library, test the branch, then kill the branch.
Misc.: None
References: SPSG pages 221-235, chapter 16
Main Point: Defect Modeling
Notes:
Some industry numbers as an example for defect modeling
Based on defects injected per hour of design and construction
Based on defects per KSLOC in different test phases
This is a level 3 prediction: you won’t have this data to start
Capers Jones (1999 seminar at Cx) says 83% of defects exist before a single line of code is written
Misc.: None
References: SPSG pages 221-235, chapter 16; NASA Manager’s Handbook for Software Development, page 6-8 and Watts S Humphrey ‘Measuring Software Quality’ presented at SEPG 2000
Main Point: Predicted and actual defects found
Notes:
If you track and record all defects, you can develop a profile of defects by phase for your organization. Then you can develop a graph that shows expected defect numbers as the project progresses
If you then find more or fewer defects than expected, you can research why
If you know you usually catch 50% of the requirements defects in inspections, you can predict how many you’ll catch and how many will ‘escape’
Phase containment defect counts are graphed above: estimated defects injected, expected defects removed, and actual defects removed, by phase, plus cumulative numbers
Defect removal in unit test was higher than estimated, which meant that fewer defects were removed in the integration test and system test phases. Without accurate defect removal data from the unit test phase, those low numbers would be of more concern with respect to product quality
This is a level 3 metric! Phase exit criteria address both product and process quality.
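The escape prediction described above (e.g., catching 50% of requirements defects in inspections) can be sketched as a small phase-containment model. The injection counts and catch rates below are hypothetical, not figures from the Weller data:

```python
def predict_containment(injected, catch_rates):
    """Predict defects caught and escaped per phase from historical
    phase-containment (catch) rates. Escapes flow into the next phase."""
    escaped = 0.0
    report = {}
    for phase, inj in injected.items():
        present = escaped + inj                 # defects live entering the phase
        caught = present * catch_rates[phase]   # removed by this phase's reviews/tests
        escaped = present - caught              # carried into the next phase
        report[phase] = {"present": present, "caught": caught, "escaped": escaped}
    return report

# Hypothetical injection counts and catch rates (the 50% requirements
# figure echoes the example in the notes; the rest are made up):
injected = {"requirements": 100, "design": 150, "code": 250}
rates = {"requirements": 0.5, "design": 0.6, "code": 0.7}
for phase, row in predict_containment(injected, rates).items():
    print(phase, row)
```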
Misc.: None
References: Edward F. Weller, Practical Applications of Statistical Process Control, IEEE Software May/June 2000
Main Point: Measures related to Inspections and other reviews
Notes:
Example of defect profile by type
Could be other type classifications
HP found defect percent by type varied a lot between divisions – see green HP book page 139 ff
These are IEEE classifications
Specification is requirements – specs don’t describe the needs of the users
Functionality – Incorrect or incompatible product features – is also requirements
Data Handling, Computation and Logic are coding errors
UI, Data definition and error checking are design errors
Misc.: None
References: Robert B. Grady, Practical Software Metrics for Project Management and Process Improvement, Prentice Hall PTR, 1992 , page 139 ff
Main Point: What defect counts tell us about release readiness
Notes:
From SPSG page 224
See slide xx showing defects graph
If the project’s quality level is out of control, and the project is thrashing, you might see a steadily increasing number of open defects. Steps need to be taken to improve the quality of the existing designs and code before adding more new functionality.
Misc.: None
References: SPSG pages 221-235, chapter 16
Main Point: Information about Reported and Corrected Defects plots
Notes:
Data shown is theoretical – I moved the lines from the actual project to show values at release.
Defect counts give a quantitative handle on how much work the project team still has to do before it can release the software
Graph the cumulative reported defects, open defects and fixed defects
When the software is nearing release, the number of open defects should trend downward and get near zero, and the fixed defects should be approaching the reported defects line
Misc.: None
References: [NASA90] page 6-9
Main Point: Information about Reported and Corrected Defects plots
Notes:
Data shown is theoretical – I moved the lines from the actual project to show values at release.
Defect counts give a quantitative handle on how much work the project team still has to do before it can release the software
Graph the cumulative reported defects, open defects and fixed defects
When the software is nearing release, the number of open defects should trend downward and get near zero, and the fixed defects should be approaching the reported defects line
Just because you haven’t found any for a week doesn’t mean there aren’t any there! The open line should be near zero for more than one week!
Typically there are no sev 1 or sev 2 defects; and you look at the sev 3 with product support and fix some of the remaining ones – the ones product support says will be a problem – before release.
So the total defects slide (the one before) doesn’t get to zero before release, although it gets low; but the sev 1 and sev 2 plot gets to zero.
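The open-count rule above (the sev 1/2 open line should sit at zero for more than one week) can be sketched as a simple check; the weekly numbers here are made up:

```python
def release_ready(weekly, zero_weeks_required=2):
    """weekly: list of (cumulative_reported, cumulative_fixed) counts for
    sev 1 and 2 defects, one pair per week. Open = reported - fixed; the
    open count must sit at zero for more than one week."""
    opens = [reported - fixed for reported, fixed in weekly]
    trailing_zero = 0
    for count in reversed(opens):       # count consecutive zero-open weeks
        if count != 0:
            break
        trailing_zero += 1
    return trailing_zero >= zero_weeks_required

# Made-up weekly data; the last two weeks have no open sev 1/2 defects
weeks = [(40, 25), (70, 55), (90, 85), (95, 95), (95, 95)]
print(release_ready(weeks))   # True
```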
Misc.: None
References: [NASA90] page 6-9
Main Point: HP release criteria
Notes:
From green HP book
Misc.: None
References: Robert B. Grady, Practical Software Metrics for Project Management and Process Improvement, page 76
Main Point: HP post-release criteria plot of critical/serious defects, by whether met release criteria or not
Notes:
From green HP book, page 78
Certification meant following the testing and release criteria
The bottom line is an average of a dozen projects that met the new release criteria
This graph compares products that met the criteria for release (shown 2 slides ago) vs. ones that didn’t.
Graph shows that a combination of good development and testing processes will enable you to confidently predict a low incoming defect rate
This is in some ways an experiment. When standard release criteria were implemented, not all projects met them; so HP could later compare the post-release defects in products that met them vs. products that didn’t.
Misc.: None
References: Robert B. Grady, Practical Software Metrics for Project Management and Process Improvement, page 78
Main Point: HP release criteria plot of critical/serious defects
Notes:
From green HP book, page 77
Plot in book includes defects/KLOC on the right
The target line (3 defects) is at 0.02 defects/KLOC
6 defects is at 0.04
9 defects is at 0.06
The goal of the target defects was for the test cycle before the final one
Goal for final test cycle would be 0 Sev 1 and 2 defects
Note it included some critical or serious defects – but not many!
Track this (Sev 1 & 2) during the whole testing phase, not just at the end (also track total defects)
Misc.: None
References: Robert B. Grady, Practical Software Metrics for Project Management and Process Improvement, page 76
Main Point: Need combination for reasonable level of removal
Notes: A study was done to determine effectiveness of various techniques.
The “check” ones are personal desk checking.
Function testing covers related modules
Integration testing covers the whole system
As you can see, a desk check can do fairly well when done right, or downright poorly.
The lowest effectiveness rate was unit testing.
The real message, though, is what happens with the combined number.
If our goal is 95%, we may not need to do all of these. This is where software engineering comes into play. What is the right set for our project that will get us there for the least cost?
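The combined number works because a defect escapes only if every technique in the set misses it, so efficiencies combine as 1 - (1-e1)(1-e2)... A minimal sketch; the per-technique efficiencies below are placeholders, not the figures from Jones's table:

```python
from functools import reduce

def combined_removal(efficiencies):
    """Combined defect-removal effectiveness of a set of techniques:
    a defect escapes only if it escapes every technique in turn."""
    escape = reduce(lambda acc, e: acc * (1 - e), efficiencies, 1.0)
    return 1 - escape

# Placeholder per-technique efficiencies (desk check, unit test,
# function test, integration test), not the figures from Jones's table
techniques = [0.40, 0.25, 0.35, 0.45]
print(round(combined_removal(techniques), 3))   # 0.839
```

This is why a set of individually mediocre techniques can still reach a 95% goal, and why you can shop for the cheapest combination that gets there.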
Misc.: Most research doesn’t include requirements, which many purists don’t consider part of a software project: too much variability, and they may end up implemented in hardware.
Reference: Capers Jones, Programming Productivity, p.179
Main Point: Development status model
Notes:
For a single build (single stage of a staged development project)
This is a key progress indicator
It is an indirect software quality indicator
The model must represent how development is done – the development methodology
Expect a lag between coding and review and between review and testing
This is a measure of process quality. We want to keep consistent space between the lines. If reviewing and testing fall behind, the project is falling behind, even though coding is going well.
If the project suddenly catches up, be suspicious, probably they didn’t do reviews and testing as thoroughly as they should have.
Monitor only major activities
Misc.: None
References: NASA Manager’s Handbook for Software Development, Revision 1, page 6-11
Main Point: Example for an actual project of development status model
Notes:
Shows target, units coded, units reviewed, units tested
The project shown finished code and unit testing nearly on schedule. When severe problems were encountered during system integration and testing, it was found that insufficient unit testing had resulted in poor quality software. Details shown above.
Note the miracle finish at 1 – where all of a sudden code review and unit testing catch up with coding near the deadline, when there had been a 3-4 week lag
Cause:
Some crucial testing information was not available
Short cuts were taken in reviews and unit testing to meet schedules
Result: project entered system testing phase with poor quality software. To bring the software up to standard, the system test phase took 100% longer than expected (!)
Misc.: None
References: NASA Manager’s Handbook for Software Development, Revision 1, page 6-11
Main Point: HP post-release defects over time, as part of measuring process improvement
Notes:
From green HP book, page 207
A corporate wide HP goal was to improve the product post release defect density by a factor of 10 in five years
This graph, from one division, shows its progress in meeting that goal
Goal: improve software quality
Questions this graph helps to answer:
What is our current software quality?
This graph shows post-release defect density of products according to one of HP’s 10X improvement measures. It is only an after-the-fact indicator of the quality level produced by our processes, and thus can only influence future products through cause-effect analysis.
Misc.: None
References: Robert B. Grady, Practical Software Metrics for Project Management and Process Improvement, page 207
Main Point: HP Division Prerelease Defect Density
Notes:
From green HP book, page 199
A corporate wide HP goal was to improve the product post release defect density by a factor of 10 in five years
This graph, from one division, shows its progress in meeting that goal
Goal: improve software quality
Questions this graph helps to answer:
How can we predict product quality based on early development processes?
This data can be used to predict the performance of the graph on the prior slide. For an unchanging process, there is a roughly predictable ratio between pre- and post-release defects.
Keep in mind that an upward trend in this graph could show either better testing techniques or poorer pretest defect avoidance.
A downward trend could reflect better pre-test defect avoidance or poorer testing.
Misc.: None
References: Robert B. Grady, Practical Software Metrics for Project Management and Process Improvement, page 199
Main Point: Defect Repair Efficiency (defects fixed per engineering month)
Notes:
A corporate wide HP goal was to improve productivity
Our primary cost is in engineering months and calendar months. Our output today is most effectively measured in KLOC or function points for new development, and in defects fixed for maintenance. Productivity measurement is a particularly sensitive topic. It is best not to measure any finer level of detail than in these examples, and it is best to drive improvements from figure 15-4.
Goal: improve productivity
Questions this graph helps to answer: How efficient are defect-fixing activities?
This graph shows the trend of efficiency in fixing defects. It helps to ensure that we reduce the average effort to fix defects, in addition to whatever staffing actions we might take to reduce the backlog. (This graph does not show real data.)
Misc.: None
References: Robert B. Grady, Practical Software Metrics for Project Management and Process Improvement, page 201
Main Point: Mean time to fix serious and critical defects
Notes:
A corporate wide HP goal was to maximize customer satisfaction
The next three graphs show more direct aspects of customer satisfaction. They deal with responsiveness to important customer problems, and indirectly with how well we understand all our customers’ needs
Goal: Maximize customer satisfaction
Questions this graph helps to answer: How long does it take to fix a problem?
The trend of the total area under the curve is related to how long customers have to wait before they see fixes. The largest area represents the best opportunity to shorten cycle time. MR = marketing review, LC = Lab classification, KP = Known Problem, AD = waiting for data, QA = final quality assurance testing, AR = awaiting release
Misc.: None
References: Robert B. Grady, Practical Software Metrics for Project Management and Process Improvement, page 202, figure 15-8