2. DEFINE MEASURE ANALYZE IMPLEMENT CONTROL REPEAT
6 PANEL
The data we receive from these 2 phases must be processed
to draw an inference.
3. MINITAB
• It was developed at Pennsylvania State University.
• It is a statistical package built for in-depth statistical analysis
of data.
• It plays a major role in many Six Sigma projects.
• It is among the more user-friendly statistical tools on the market.
5. TOOLS IN MINITAB
7 QC TOOLS
• Cause and Effect Diagram
• Control Chart
• Pareto Chart
• Scatter Plot
• Flow Chart
• Check Sheet
• Histogram
STATISTICAL HYPOTHESIS TESTING
• Z Test
• T Test
• F Test
• ANOVA
• Regression Model
6. EXAMPLE 1
• A car manufacturing company learns that the number of dents and
dings on its car doors has increased exponentially. It wants to
determine the root causes of the dents and dings.
• Listing all the possible causes haphazardly makes them very hard to
analyze.
• How to proceed?
7. CAUSE AND EFFECT DIAGRAM
• The cause and effect diagram was introduced by Kaoru Ishikawa.
• It is also called an Ishikawa diagram or a fishbone diagram.
• The causes are categorized under the 6 M's:
1. Men
2. Material
3. Machine
4. Method
5. Measurement
6. Mother Nature
10. Fill in the sub-causes under each sub-heading by clicking the
corresponding cell.
12. CONTROL CHARTS
• A control chart is a graph of data collected over time that shows
the variation in the data.
• It helps in monitoring the stability of a process.
• It helps in determining whether the process needs improvement.
• There are two types of variation:
• Assignable causes of variation: these variations are larger in
magnitude and can be traced to a specific cause.
• Chance causes of variation: these variations are inevitable in any
process.
16. EXAMPLE 2
• A telephone service center wants to assess whether its call answering
system is in control. The total number of incoming calls and number
of unanswered calls were recorded for 21 days. Each unanswered call
is a defective.
• How do we solve this issue?
• Since the number of incoming calls varies from day to day, a p-chart
can be used to analyze the data.
20. • In a p-chart, the proportion of defectives in each sample is studied
to determine whether the process is in control or out of control.
• An in-control process shows only random variation in the proportion.
• An out-of-control process shows non-random variation in the
proportion, which might be due to special causes.
• In the above scenario, the telephone data did not fail any of the
tests, so the process is in control.
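As a rough sketch of what the p-chart computes, the Python below derives the center line and the 3-sigma limits for each day and flags days that fall outside them. The function names and the data are made up for illustration (this is not the call center data set from the slides), and only the "point beyond 3 sigma" test is shown, not the other tests Minitab can run.

```python
from math import sqrt

def p_chart_limits(defectives, sample_sizes):
    """Return (p_bar, [(lcl, ucl), ...]) with 3-sigma limits per subgroup."""
    p_bar = sum(defectives) / sum(sample_sizes)  # overall proportion defective
    limits = []
    for n in sample_sizes:
        half_width = 3 * sqrt(p_bar * (1 - p_bar) / n)
        # Proportions cannot go below 0 or above 1, so clamp the limits.
        limits.append((max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)))
    return p_bar, limits

def points_beyond_limits(defectives, sample_sizes):
    """Indices of subgroups whose proportion lies outside its own limits."""
    p_bar, limits = p_chart_limits(defectives, sample_sizes)
    flagged = []
    for i, (d, n) in enumerate(zip(defectives, sample_sizes)):
        lcl, ucl = limits[i]
        if not (lcl <= d / n <= ucl):
            flagged.append(i)
    return flagged

# Hypothetical data: unanswered calls and incoming calls for 5 days.
unanswered = [12, 9, 15, 11, 30]
incoming = [200, 180, 220, 210, 190]

p_bar, limits = p_chart_limits(unanswered, incoming)
print("center line:", p_bar)
print("days beyond the 3-sigma limits:", points_beyond_limits(unanswered, incoming))
```

Because the limits depend on each day's sample size, they are recomputed per subgroup, which is why the p-chart suits data where the number of incoming calls changes daily.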
21. EXAMPLE 3
• A manufacturer of light bulbs wants to assess whether or not its
process is in control. Samples of light bulbs were taken every hour for
three shifts (24 samples of 500 light bulbs) and tested to see whether
or not they light. Defectives are light bulbs that do not work.
• How do we test?
• Using an NP chart, we track the number of defective bulbs in each
sample to see whether the process is in control.
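The NP chart limits can be sketched in a few lines of Python. The sample size of 500 comes from the example above; the defective counts and the function name are hypothetical and only illustrate the arithmetic.

```python
from math import sqrt

def np_chart_limits(defective_counts, n):
    """Return (lcl, center, ucl) for an NP chart with constant sample size n."""
    p_bar = sum(defective_counts) / (n * len(defective_counts))
    center = n * p_bar  # average number of defectives per sample
    half_width = 3 * sqrt(n * p_bar * (1 - p_bar))
    return max(0.0, center - half_width), center, center + half_width

# Hypothetical counts of non-working bulbs in 6 samples of 500 bulbs each.
counts = [10, 12, 8, 11, 9, 10]
lcl, center, ucl = np_chart_limits(counts, n=500)
print(lcl, center, ucl)
```

An NP chart works here because every sample has the same size, so the raw count of defectives can be plotted directly instead of the proportion.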
24. EXAMPLE 4
• A wallpaper manufacturer wants to assess the stability of its printing
process. Samples of 100 feet were taken from 25 wallpaper rolls and
the number of blemishes, which include print smears, pattern
distortions, and missing ink, were counted.
• How to analyze this issue?
• Since every sample is the same size (100 feet) and we are counting
defects rather than defective units, a c-chart can be used.
27. • Two points exceed the 3-sigma limit. We therefore need to
investigate the reasons behind the defects in those samples. Only once
those causes are eliminated can the overall process be deemed
statistically in control.
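The 3-sigma check behind a c-chart can be sketched as follows. The blemish counts are invented for illustration and are not the wallpaper data from the slides; the function name is an assumption as well.

```python
from math import sqrt

def c_chart(counts):
    """Return (lcl, c_bar, ucl, flagged) for defect counts per equal-sized unit."""
    c_bar = sum(counts) / len(counts)      # average defects per sample
    half_width = 3 * sqrt(c_bar)           # Poisson assumption: variance equals the mean
    lcl = max(0.0, c_bar - half_width)
    ucl = c_bar + half_width
    flagged = [i for i, c in enumerate(counts) if c < lcl or c > ucl]
    return lcl, c_bar, ucl, flagged

# Hypothetical blemish counts for 10 samples of wallpaper.
blemishes = [4, 5, 3, 6, 4, 20, 5, 4, 3, 6]
lcl, c_bar, ucl, flagged = c_chart(blemishes)
print("UCL:", ucl, "samples beyond limits:", flagged)
```

Flagged samples are the ones to investigate for special causes before treating the process as in control.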
28. EXAMPLE 5
• A stenographic company wants to assess the quality of its
transcription service. It took 25 samples (sets of pages) and
counted the number of typographical errors in each sample.
The sets of pages were of different sizes. Typographical errors are
defects.
• How to analyze this issue?
• Because the samples are not the same size, a U chart can be used to
assess the number of defects per unit of measurement and to see if the
process is under control.
31. • The graph shows that sample 6 and sample 18 exceed the 3-sigma
limits for those samples.
• These results indicate that the process is unstable and that the factors
causing this variation should be identified and corrected.
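A U chart divides each defect count by its sample size and gives every sample its own limits. The sketch below shows that calculation; the error counts, page counts, and function name are hypothetical, not the stenographic data from the slides.

```python
from math import sqrt

def u_chart(defects, units):
    """Return (u_bar, rows); each row is (u_i, lcl_i, ucl_i) for one sample."""
    u_bar = sum(defects) / sum(units)  # average defects per unit overall
    rows = []
    for d, n in zip(defects, units):
        half_width = 3 * sqrt(u_bar / n)  # limits widen as the sample gets smaller
        rows.append((d / n, max(0.0, u_bar - half_width), u_bar + half_width))
    return u_bar, rows

# Hypothetical typographical errors and pages per sample.
errors = [5, 8, 4, 30, 6]
pages = [10, 20, 8, 12, 15]

u_bar, rows = u_chart(errors, pages)
flagged = [i for i, (u, lcl, ucl) in enumerate(rows) if u < lcl or u > ucl]
print("center line:", u_bar, "samples beyond limits:", flagged)
```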
32. EXAMPLE 6
• A clothing manufacturer has had numerous returns due to various
quality complaints. The manufacturer wants to correct the problems
but isn't sure where to begin. They decide to track the number and
type of defects in their clothing line. Identifying the sources of their
most common problems will help them to target their improvement
efforts.
• How to do this?
• A Pareto analysis ranks the defect types by frequency, which makes
it possible to identify the most common problems.
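The table behind a Pareto chart is just the defect counts sorted in descending order with a running cumulative percentage. A minimal sketch, using made-up defect categories and counts:

```python
def pareto_table(defect_counts):
    """Return rows of (defect, count, cumulative_percent), largest count first."""
    total = sum(defect_counts.values())
    ordered = sorted(defect_counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for name, count in ordered:
        running += count
        rows.append((name, count, 100 * running / total))
    return rows

# Hypothetical defect tallies for a clothing line.
defects = {"Loose thread": 15, "Missing button": 50, "Stain": 5, "Stitching error": 30}

rows = pareto_table(defects)
for name, count, cum_pct in rows:
    print(f"{name:16s} {count:3d} {cum_pct:6.1f}%")
```

Reading down the cumulative column shows how few defect types account for most of the returns, which is where improvement efforts should start.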
36. EXAMPLE 7
• A research team at a fitness assessment company is looking for a
method to predict a person's body fat percentage. This health
measure is difficult and expensive to measure directly. In its model,
the team wants to include a predictor variable that is easier to
measure, and is considering the use of Body Mass Index (BMI).
• How can we check whether BMI is a useful predictor of body fat
percentage?
• We can use a scatter plot to examine the relationship between the two
variables.
40. RESULTS
• The scatterplot for the BMI and fat data shows a strong positive and
linear relationship between the two variables. Body Mass Index (BMI)
may be a good predictor of body fat percentage.
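The strength of the linear relationship seen in a scatter plot can be summarized with the Pearson correlation coefficient. The sketch below computes it by hand; the BMI and body fat values are invented for illustration and are not the data from the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical measurements for 6 people.
bmi = [18.5, 21.0, 23.5, 26.0, 28.5, 31.0]
body_fat = [15.0, 19.0, 22.5, 27.0, 30.0, 35.5]

r = pearson_r(bmi, body_fat)
print("r =", round(r, 4))
```

A value of r close to +1 corresponds to the strong positive linear pattern described above.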
41. EXAMPLE 8
• A potato chip manufacturer is studying the problem of broken potato
chips. As part of the initial investigation, the manufacturer randomly
samples 100 packages and counts the number of broken chips per
package.
• How can the manufacturer interpret the data?
• This can be done using a histogram, which shows how the number of
broken chips per package is distributed.
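A histogram groups the counts into bins and tallies how many packages fall in each bin. A small sketch with made-up counts of broken chips (10 packages instead of 100, to keep it short):

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Map each bin's lower edge to the number of values in [edge, edge + bin_width)."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))

# Hypothetical broken chips per package.
broken = [2, 3, 5, 7, 8, 8, 9, 12, 14, 21]

hist = histogram_counts(broken, bin_width=5)
for edge, count in hist.items():
    print(f"{edge:2d}-{edge + 4:2d} | {'#' * count}")
```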
44. STATISTICAL HYPOTHESIS TESTING
• In the upcoming slides, the results obtained are purely based on
statistical calculations
• The common tests are Z test, T test and F Test
45. EXAMPLE 9
• A dietician selects a random sample of 13 bottles of cooking oil to
determine if the mean percentage of saturated fat is different from
the advertised 15%. Previous research indicates that the population
standard deviation is 2.6%
• How to proceed?
46. Z-TEST
• The Z-confidence interval and test procedures are used to make
inferences about a population mean (μ), based on the mean of a
random sample.
47. Fat content in Oil.
The first step is to define the hypothesis.
The null hypothesis will be μ = 15
The alternate hypothesis will be μ ≠ 15
The confidence interval is a range of likely values for μ. Since you do not
know the true value of μ, the confidence interval allows you to estimate its
value based on the sample data. The sample mean provides an estimate
of μ, and σ is used to determine how far off the estimate might be.
In general, the proportion of intervals that include μ is equal to 1 minus
the chosen α-level. You can choose any α-level that is greater than 0%
and less than 100%. The 0.05 α-level is commonly used.
50. Since the fat content data were analyzed with an α-level of 0.05,
a 95% (or 0.95) confidence interval was constructed.
This interval tells you that, based on the sample mean and the
known value of σ, you can be 95% confident that μ is greater
than or equal to 15.1866 and less than or equal to 18.0134.
Since the reference value of 15 is not within the confidence
interval, you can reject H0 at the 0.05 significance level and
conclude that μ is not 15.
[Figure: Histogram of Fat Content (with Ho and 95% Z-confidence interval for the mean, and StDev = 2.6); x-axis: Fat Content, y-axis: Frequency]
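The Z-interval for the fat content example can be reproduced with a short calculation. In the sketch below, n = 13, σ = 2.6, and the reference value 15 come from the slides. The sample mean of 16.6 is not stated on the slides; it is inferred here as the midpoint of the reported interval (15.1866, 18.0134), so treat it as an assumption.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample Z-test. Returns (z, confidence_interval, p_value)."""
    std_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / std_error
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    ci = (sample_mean - z_crit * std_error, sample_mean + z_crit * std_error)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, ci, p_value

# sample_mean=16.6 is inferred from the interval midpoint (assumption).
z, ci, p_value = z_test(sample_mean=16.6, mu0=15, sigma=2.6, n=13)
print("z =", round(z, 4))
print("95% CI =", tuple(round(v, 4) for v in ci))
print("reject H0:", p_value < 0.05)
```

With these inputs the computed interval agrees with the one reported above, and 15 falls outside it, which matches the decision to reject H0.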
51. EXAMPLE 10
• A health management firm has samples of satisfaction ratings from
former patients of two hospitals, and wants to know if one is rated
more highly than the other. The information will be used to refer
patients and to make suggestions for hospital improvement.
• The variances of the two samples were found to be sufficiently
similar, so a pooled standard deviation will be used for the test.
52. T TEST
• The two-sample t-confidence interval and test procedures are used to
make inferences about the difference between two population means
(μA and μB), based on data from two independent, random
samples.
53. Satisfaction rating by patients from 2 different hospitals
The first step is to define the hypothesis.
The null hypothesis is that the difference, μA - μB, is equal
to the chosen reference value.
The alternate hypothesis is that μA - μB is not equal to the
chosen reference value.
The significance level generally chosen is α = 0.05.
56. The next step is to check whether the obtained t-value lies within
the range -t(α/2) to +t(α/2). If it does, we fail to reject the null
hypothesis. Otherwise we reject it in favor of the alternative
hypothesis.
For the hospital satisfaction data, the t-value is 4.11, which
does not lie in the range (-2.28, +2.28). This indicates
that there is less than a 5% chance (p < 0.05) that you would have
obtained samples like these if μA - μB were actually 0.
The boxplots of the hospital-satisfaction data illustrate that:
The medians for each sample are very similar to the means.
Mean satisfaction was greater for hospital A than for hospital
B. (This was confirmed by the results of the t-test.)
The spread of the data appears to be about the same for both
samples, except that sample B has a slightly longer upper tail
than sample A.
[Figure: Boxplot of A, B; y-axis: Data]
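The pooled two-sample t statistic used in this test can be sketched as below. The ratings are invented for illustration (they are not the hospital data from the slides), and the critical value 2.306 is the standard two-sided table value for α = 0.05 with 8 degrees of freedom, which applies only to this made-up sample size.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Return (t, degrees_of_freedom) for a two-sample t-test with pooled variance."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, df

# Hypothetical satisfaction ratings for two hospitals.
hospital_a = [80, 85, 90, 75, 70]
hospital_b = [60, 65, 70, 55, 50]

t, df = pooled_t(hospital_a, hospital_b)
T_CRIT_DF8 = 2.306  # t table, two-sided alpha = 0.05, df = 8
print("t =", t, "df =", df, "reject H0:", abs(t) > T_CRIT_DF8)
```

Pooling is only appropriate when the two variances are similar, which is the condition stated in Example 10.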
57. EXAMPLE 11
• A recent study compared drivers on two types of roads. Each driver
drove on one of two road types: first class road (1) and dirt road (2).
As a measure of driving performance, testers recorded the number of
steering corrections each driver made on each type of road. You want
to test if the drivers' performances were equally variable across the
two road conditions.
• How to analyze this?
58. VARIANCE TEST
• The Two Variances confidence interval and test procedures are used
to make inferences about the equality of the standard deviations and
variances between two populations based on data from two
independent, random samples.
59. Number of steering corrections made by the driver on the two
road conditions.
The first step is to define the hypothesis.
The null hypothesis is that the standard deviation of population 1, σ(1),
and the standard deviation of population 2, σ(2), are the same.
The alternate hypothesis is that the standard deviation of population
1, σ(1), and the standard deviation of population 2, σ(2), are not
the same.
The significance level generally chosen is α = 0.05.
61. We are supposed to choose “Both samples are in one column” because
the first column has the different types of roads.
62. [Figure: Histogram of Corrects; panel variable: RoadType (1, 2); x-axis: Corrects, y-axis: Frequency]
In Minitab, two tests are performed: Bonett's test and Levene's test.
We get a p-value for each test.
High p-values (above the specified α-level) indicate no statistically
significant difference between the standard deviations or variances
(equality or homogeneity).
Low p-values (below the specified α-level) indicate a difference
between the standard deviations or variances (inequality).
For the driving data, the high p-values for both tests (0.640 and
0.680) indicate no statistically significant difference between the
standard deviations.
The histogram shows that the number of corrections for the first class
road (1) was smaller than the number of corrections for the dirt road
(2). The variability between groups appears approximately equal. The
data for first class roads may be skewed to the right.
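To give a feel for what Levene's test measures, the sketch below computes the median-centered form of the Levene statistic. This is an illustration under stated assumptions: it may not match Minitab's exact computation, Bonett's test is not shown, and the p-value (which needs the F distribution with k-1 and N-k degrees of freedom) is left out because the Python standard library does not provide it. The steering correction counts are made up.

```python
from statistics import mean, median

def levene_statistic(*groups):
    """Median-centered Levene statistic W for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group's median.
    z = [[abs(y - median(g)) for y in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    # Note: within is 0 when all deviations in every group are identical.
    return (n_total - k) / (k - 1) * between / within

# Hypothetical steering corrections on road type 1 and road type 2.
road_1 = [4, 6, 5, 8, 7, 3]
road_2 = [12, 15, 11, 18, 14, 16]

w = levene_statistic(road_1, road_2)
print("Levene W =", round(w, 4))
```

A small W means the spread of the two groups is similar. The statistic compares spreads, not means, so the two road types can differ in their average number of corrections and still pass this test.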
63. SUMMARY
• Minitab is a user-friendly software package used for extensive
statistical analysis.
• It is used in major Six Sigma projects to analyze data and give
insight into potential problems.
• Minitab offers various tools such as the scatter plot, Pareto
analysis, histogram, cause and effect diagram, and control chart.
• The majority of the tools are under the “Stat” option in the toolbar.