Why apply Six Sigma to software development? Let’s look at software development first. Progress has been made since the term “software crisis” was first coined in the 1960s to describe the poor quality of software development, but software projects are still notoriously late, over budget, and fail to satisfy their requirements. Consider the following statistics. Now, let’s look at Six Sigma.
However, software development is not a typical Six Sigma application. While software development is process oriented, <click> inputs are often ill-defined, <click> outputs are often difficult to fully evaluate – “you can’t use testing to verify the absence of errors” – and <click> performance is highly influenced by human factors, leading to a high degree of natural variation.
“Assessment and Control of Software Risks”, Capers Jones, 1994 (p. 29)
So, in applying Six Sigma to software development, we’ll use both techniques.

We’ll apply DFSS to the fuzzy front end to improve our ability to:
- Gather and interpret requirements
- Establish a product concept
- Develop product specifications
- Determine implementation requirements (i.e., CTPs, which for software could include things like the acquisition of new tools, new technology, new skill sets, etc.)
- Establish the project’s schedule and cost structure

We’ll use DMAIC to strengthen the actual software development process, which begins for our purposes with design. Our primary objectives will be to:
- Improve overall productivity
- Improve quality
- Reduce the number of introduced defects
- Improve defect containment
- Reduce the number of defects delivered to the customer
Let’s begin by looking at how we can use DFSS to improve the fuzzy front end.
For example, we may begin by looking at different geographic areas and the different types of customers in each of those areas (here, a lead user is one who customizes the product themselves). In addition to geography, we could also divide customers based on size (small, medium, and large) or type of application (real-time financial services, real-time simulation, etc.). The idea is to identify all of the slices in the pie chart that represents the product’s market, so that our view of requirements is complete. <Draw this on the board>
Kano analysis divides requirements into three basic categories:
- Must-be’s
- Satisfiers
- Delighters

Illustrate the difference using the airline example: taking a shuttle flight between New York and Washington.
- Getting there is a must-be
- Getting there in less time is a satisfier – the faster the better
- Savoring excellent American champagne in crystal glasses during the flight is a delighter

The relationship between the categories and customer satisfaction is shown in the next slide <click>
This information can be summarized in a matrix that shows:
- The relationship between requirements and use cases
- The Kano classification and priority associated with each use case

Note: requirements and use cases should be numbered for tracking.
< Walk through the diagram>
In the matrix, we start by summarizing the voice of the customer by listing:
- The requirements and use cases
- The customer’s view of each use case (Kano classification and priority)

We then create two columns for each of our design options:
- One specifying the level of support for each use case, where 0 = no support, 1 = minimum support, 2 = average support, and 3 = strong support, based on the measures we determined earlier <back two slides if necessary>
- The other specifying the level of effort, in LOC, required to provide the specified level of support for each use case

Note: In defining the design options, you must be realistic. That is, the resulting options must all be viable. Therefore, you must consider the relationships between use cases as well as conflicts between use cases. For example, providing strong support for one use case may dictate the level of support provided for another. Or, in order to provide strong support for one use case, we may have to provide strong support for another, even though this would not be our choice.
For each option we can calculate a customer satisfaction score. This will be a function of the Kano classification, priority, and level of support for each use case. This is not yet a science, and there are different ways to do this:
- The simplest is to ignore the Kano classification and calculate the customer satisfaction score by summing priority x level of support over all of the use cases.
- More complete approaches take the Kano classifications into account. For example:
  - Penalize the score for every must-be that is not supported
  - Scores for must-be’s = priority x level of support, with maximum = priority x average support
  - Scores for satisfiers = priority x level of support
  - Scores for delighters = priority^2 x level of support

The most important thing is to be consistent so that the comparison is valid, and to regularly tune your approach based on pre-release estimates of customer satisfaction and post-release measurements of customer satisfaction.
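As a minimal sketch of the simplest approach (ignoring the Kano classification), the satisfaction score is just priority x level of support summed over the use cases, and the effort score is the sum of the level-of-effort estimates. The use-case data below is made up for illustration.

```python
# Hypothetical use cases for one design option:
# priority (1-5), level of support (0-3), and effort estimate in LOC
use_cases = [
    {"priority": 5, "support": 3, "effort_loc": 1200},
    {"priority": 3, "support": 2, "effort_loc": 400},
    {"priority": 1, "support": 1, "effort_loc": 150},
]

def satisfaction_score(use_cases):
    # Simplest approach: sum priority x level of support over all use cases
    return sum(uc["priority"] * uc["support"] for uc in use_cases)

def effort_score(use_cases):
    # Sum the level-of-effort estimates (LOC) over all use cases
    return sum(uc["effort_loc"] for uc in use_cases)

print(satisfaction_score(use_cases))  # 5*3 + 3*2 + 1*1 = 22
print(effort_score(use_cases))        # 1200 + 400 + 150 = 1750
```

The same structure extends to the Kano-aware scoring rules by branching on a Kano classification field per use case.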
For each option, we can calculate an effort score by summing the level of effort estimates for all of the use cases.
Finally, we can refine our options by performing a rough benefit/cost analysis. This can be done by:
- Determining the center of the grid by calculating the median customer satisfaction score and the median level-of-effort score for the set of options
- Plotting the options
- Selecting the options with the best benefit/cost profile for further analysis. These will likely be options in Quadrant 2, or in Quadrants 1 and 3; options in Quadrant 4 are not attractive.
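A minimal sketch of the quadrant analysis: center the grid on the two medians and classify each option. The option scores below are made up, and the descriptive labels should be mapped to the slide’s quadrant numbering.

```python
from statistics import median

# Hypothetical (customer satisfaction score, effort score) per design option
options = {"A": (22, 1750), "B": (30, 2600), "C": (15, 900), "D": (12, 2400)}

# Center of the grid: medians across the set of options
sat_med = median(s for s, _ in options.values())
eff_med = median(e for _, e in options.values())

def quadrant(sat, effort):
    # Descriptive labels; map to the slide's quadrant numbers as appropriate
    if sat >= sat_med and effort < eff_med:
        return "high benefit / low cost"
    if sat >= sat_med and effort >= eff_med:
        return "high benefit / high cost"
    if sat < sat_med and effort < eff_med:
        return "low benefit / low cost"
    return "low benefit / high cost"  # not attractive

for name, (s, e) in options.items():
    print(name, quadrant(s, e))
```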
We use Putnam’s equation as an illustration of an estimating model because it is based on thousands of real software projects, is well documented, and is well explained in published materials. However, there is no shortage of estimating models, and another model may be better for your situation. The important point is to use a consistent, quantitative approach for evaluating your capability to develop and deliver products based on the different options, and to continuously improve your estimating model based on actual results.

Note: B in the above equation is called a skill factor. It’s a function of project size and accounts for the additional effort required for integration.

Note:
Effort = B x (Size / (Productivity x Duration^(4/3)))^3
Duration = (Size / (Productivity x (Effort/B)^(1/3)))^(3/4)
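The effort and duration forms of Putnam’s equation can be coded directly. The size, productivity parameter, and skill-factor values below are made up for illustration; the two functions are exact inverses of each other, which the round trip demonstrates.

```python
def putnam_effort(size, productivity, duration, B):
    # Effort = B * (Size / (Productivity * Duration^(4/3)))^3   (person-years)
    return B * (size / (productivity * duration ** (4.0 / 3.0))) ** 3

def putnam_duration(size, productivity, effort, B):
    # Duration = (Size / (Productivity * (Effort/B)^(1/3)))^(3/4)   (years)
    return (size / (productivity * (effort / B) ** (1.0 / 3.0))) ** 0.75

# Illustrative inputs: 50 KSLOC, productivity parameter 10000, B = 0.39
e = putnam_effort(50000, 10000, 1.5, 0.39)
d = putnam_duration(50000, 10000, e, 0.39)
print(round(e, 2))  # estimated effort in person-years
print(round(d, 2))  # recovers the 1.5-year duration we started from
```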
Next, we calculate a manpower buildup index, which represents how quickly we staff projects. This is another historical parameter that is useful in characterizing the organization’s software development capability. The more schedule compression, the shorter the duration of projects, but with a disproportionate increase in staffing (cost) and risk.
At the end of the project, in month 10, we will have discovered 80.6% of the total defects. We can use this information in two ways:
- Together with historical data on the effort required to find and fix defects, we can sanity-check our plan. For example, if it takes 23 hours to find and fix each defect, we can check whether the allocated effort is sufficient given the defect discovery profile.
- Since we will be delivering approximately 20% of the defects to customers, we might want to revise our plan to start testing earlier, or to institute other defect containment strategies (e.g., inspections) to reduce the anticipated number of defects at the start of testing.

The goal is to ensure that our plans for staffing and testing are sufficient to deliver the required level of quality. Needless to say, if significant changes are made, earlier parts of our evaluation may have to be repeated.
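The sanity check is simple arithmetic once a total defect estimate is available. The 80.6% discovery fraction and 23 hours per defect come from the slide; the total of 500 defects is a made-up number for illustration.

```python
def find_and_fix_hours(total_defects, discovered_fraction, hours_per_defect):
    # Effort needed to find and fix the defects discovered before release
    return total_defects * discovered_fraction * hours_per_defect

total_defects = 500                      # hypothetical total defect estimate
hours = find_and_fix_hours(total_defects, 0.806, 23)
escaped = total_defects * (1 - 0.806)    # defects delivered to customers
print(round(hours))    # compare against the effort allocated in the plan
print(round(escaped))  # motivates earlier testing or added inspections
```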
Here, we’re comparing one design option against two possible schedules, one more aggressive than the other.
- Business value = intrinsic value of the product
- Feature value = added value based on customer satisfaction rating
- Duration adjustment = estimated value of delivering early
- Net value = (business value + feature value) – (effort cost + defect repair cost)
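The net-value comparison is a one-line formula; the dollar figures below are made up, with the more aggressive schedule assumed to carry higher effort and defect-repair costs.

```python
def net_value(business_value, feature_value, effort_cost, defect_repair_cost):
    # Net value = (business value + feature value) - (effort cost + defect repair cost)
    return (business_value + feature_value) - (effort_cost + defect_repair_cost)

# One design option under two hypothetical schedules
moderate = net_value(1_000_000, 200_000, 400_000, 50_000)
aggressive = net_value(1_000_000, 200_000, 550_000, 120_000)
print(moderate)    # 750000
print(aggressive)  # 530000 -- compression does not pay off here
```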
This slide continues the comparison, this time between the moderate and very rapid schedules. Results show that very compressed schedules are not always advantageous because of the disproportionate increase in cost and effort.
Finally, once our overall plan has been established, the next step is to transform it into a detailed project plan and perform a simulation using schedule estimates (best, average, and worst cases) for the critical path to determine the overall schedule risk. <click – next slide>
Monte Carlo simulation of project schedule risk can be performed once a critical-path plan has been developed (i.e., predecessors and successors identified for each task). For each task on the critical path, the team is asked to provide ‘best case’, ‘expected’, and ‘worst case’ duration estimates (sometimes called “PERT” estimates). These are used as input to the simulation, typically run 1000 times, to get a probability distribution of expected completion times. Simulation results can be used to determine the earliest and latest dates associated with a given probability (e.g., 95% as illustrated here), or to determine the probability associated with a particular date (e.g., ‘what is the probability the project will complete by 1/1/2003?’).
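A minimal sketch of the simulation, sampling each task from a triangular distribution (a common stand-in for three-point PERT estimates) and summing along the critical path. The task estimates are hypothetical.

```python
import random

# Hypothetical critical-path tasks: (best, expected, worst) duration in days
tasks = [(10, 14, 25), (5, 8, 15), (20, 24, 40), (8, 10, 18)]

def simulate(tasks, runs=1000, seed=42):
    # One run = sample every task's duration, sum along the critical path
    rng = random.Random(seed)
    totals = []
    for _ in range(runs):
        totals.append(sum(rng.triangular(best, worst, mode)
                          for best, mode, worst in tasks))
    return sorted(totals)

totals = simulate(tasks)
p95 = totals[int(0.95 * len(totals))]                 # 95th-percentile finish
prob_by_60 = sum(t <= 60 for t in totals) / len(totals)  # P(done within 60 days)
```

The sorted totals give both views from the slide: a date for a given probability (`p95`) and a probability for a given date (`prob_by_60`).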
Now, let’s look at the application of Six Sigma’s DMAIC process to the software development process.
Before describing the actual process, let me mention a few prerequisites. First, the software development process must be well defined in order to apply DMAIC to achieve process improvements.
Perhaps the best way to illustrate how DMAIC can be applied to improve the software development process is to walk through an example. This slide shows the problem statement and goal statement developed during the Define phase <review slide>
The team decided to collect three types of metrics during the Measure phase to more fully characterize the problem:
- Total problems fixed prior to release, per project
- Total post-release problems, per project
- Types of post-release problems, overall and per project

These metrics were selected to provide a more complete picture of the company’s defect containment capability. The data will allow the team to determine overall defect containment and to study defect containment as:
- A function of project characteristics
- A function of error type
They analyzed the collected data in several ways. First, they looked at the relationship between pre-release defects and project size and found a strong correlation.
Next, they looked at the relationship between escaped defects and pre-release defects and found that it was fairly linear. The first two results taken together suggest that there is no significant variation in defect containment effectiveness across projects.
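A minimal least-squares sketch of the linearity check; the per-project counts below are made up. The fitted slope is effectively the escape rate (escaped defects per pre-release defect), and a near-zero intercept supports the “fairly linear” finding.

```python
# Hypothetical per-project data: (pre-release defects, escaped defects)
data = [(120, 14), (250, 31), (400, 47), (80, 10), (310, 38)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Ordinary least-squares fit: escaped = slope * pre_release + intercept
slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
         / sum((x - mean_x) ** 2 for x, _ in data))
intercept = mean_y - slope * mean_x

print(round(slope, 3))      # escape rate per pre-release defect
print(round(intercept, 3))  # small intercept => roughly proportional
```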
Next, the team created a histogram of escaped defects showing the distribution of the different types of problems. This showed that code-related problems are the most common. The team decided that the root causes of the increased maintenance effort were the increased size of recent projects, which is unlikely to change, and poor error containment for code-related problems.
The team decided that they could improve the situation by improving the effectiveness of code inspections. They decided that the effectiveness of code inspections (the number of identified defects) was a function of:
- The size of the unit inspected
- The preparation time
- The inspection time
- The number of reviewers

and decided to conduct a designed experiment to determine the optimal combination of these factors. <click – next slide>
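One way to lay out such a designed experiment is a two-level full factorial over the four factors, giving 2^4 = 16 inspection configurations to run. The factor levels below are hypothetical.

```python
from itertools import product

# Two levels per factor (hypothetical low/high settings)
factors = {
    "unit_size_loc":    [200, 600],
    "prep_hours":       [1, 3],
    "inspection_hours": [1, 2],
    "reviewers":        [2, 4],
}

# Full factorial design: every combination of factor levels is one run,
# whose response is the number of defects the inspection identifies
runs = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(runs))  # 16
```

In practice a fractional factorial could cut the number of runs if 16 inspections are too costly to pilot.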
Once they determined the optimal combination they conducted a pilot test using real projects to verify the results.
In order to ensure that the improvement will be maintained and managed, the team established a performance standard for code inspections based on defects/KLOC, and established a plan for monitoring the process and responding to situations where unacceptable performance is observed.