2. Setting Expectations
Calculating measures of central tendency and variation
Skewness and kurtosis
Calculating the area under the normal curve
Sorting data
Histogram
Pareto Chart
Scatter diagrams
Bar and pie charts
Using the Analysis ToolPak for advanced functions
3. This is not a training on Six Sigma!!
This training presentation assumes that you are already
familiar with Six Sigma concepts and are looking for ways to
apply them using MS Excel.
It also assumes that you know the basics of MS Excel, so it
focuses on some advanced analytical concepts.
The Excel tips and tools in this presentation can be used in
multiple phases of DMAIC, so the presentation does not follow
the DMAIC sequence.
The training is based on MS Excel 2007. If you are using
MS Excel 2003, menus and dialogs may differ, so adapt the steps
as needed.
4. In mathematics, the central tendency of a data set is a measure of the
"middle" or "expected" value of the data set. There are many different
descriptive statistics that can be chosen as a measurement of the
central tendency of the data items. These include the mean, the median,
and the mode.
Other statistical measures such as the standard deviation and the range
are called measures of spread and describe how spread out the data is.
5. The arithmetic mean (average) of a list of numbers is the sum of all
the items in the list divided by the number of items in the list.
To obtain the arithmetic mean of a dataset, use the Excel function
AVERAGE.
Syntax
=AVERAGE(number1,number2,...)
6. A median is described as the number separating the higher half of a
sample, a population, or a probability distribution, from the lower half.
If there is an even number of observations, the median is not unique, so
one usually takes the mean of the two middle values, which is what Excel's
MEDIAN function returns.
Syntax
=MEDIAN(number1,number2,...)
7. The mode is the value that occurs the most frequently in a data set or a
probability distribution. The mode is not necessarily unique, since the
same maximum frequency may be attained at different values.
Syntax
=MODE(number1,number2,...)
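As a cross-check outside Excel, the mean, median, and mode can be reproduced with Python's standard statistics module. This is a minimal sketch and not part of the original training; the sample values are the raw data from the ordered-array slide of this deck, and the variable names are illustrative.

```python
import statistics

# Raw data borrowed from the ordered-array slide of this deck.
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

mean_value = statistics.mean(data)      # counterpart of =AVERAGE(...)
median_value = statistics.median(data)  # counterpart of =MEDIAN(...)
modes = statistics.multimode(data)      # every value tied for the highest count

print(mean_value, median_value, modes)
```

In this data set both 24 and 27 occur twice, which illustrates that the mode is not necessarily unique.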
8. In statistics, variance is the expected squared deviation of a variable
from its mean. To estimate the variance from a sample, Excel provides the
VAR function, which divides by n − 1; use VARP when the data make up the
entire population.
Syntax
=VAR(number1,number2,...)
9. Standard deviation is a measure of the variability or dispersion of a
statistical population, a data set, or a probability distribution. To
calculate the sample standard deviation in an Excel worksheet, use the
STDEV function; use STDEVP when the data make up the entire population.
Syntax
=STDEV(number1,number2,...)
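The difference between the sample and population versions of these measures is easy to see numerically. The sketch below uses Python's statistics module as a stand-in for the Excel functions; the data are the raw values from the ordered-array slide of this deck, and the variable names are illustrative.

```python
import statistics

# Raw data borrowed from the ordered-array slide of this deck.
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

sample_var = statistics.variance(data)       # divides by n - 1, like =VAR(...)
sample_sd = statistics.stdev(data)           # square root of sample_var, like =STDEV(...)
population_var = statistics.pvariance(data)  # divides by n, like =VARP(...)

print(f"sample variance     = {sample_var:.3f}")
print(f"sample std. dev.    = {sample_sd:.3f}")
print(f"population variance = {population_var:.3f}")
```

The sample variance is always the larger of the two, and the gap shrinks as the number of observations grows.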
10. In descriptive statistics, the range is the length of the smallest interval
that contains all the data. In Excel, it is calculated by subtracting the
minimum value of the sample from the maximum value.
Syntax
=MAX(A2:A16)-MIN(A2:A16)
11. In probability theory and statistics, skewness is a measure of the
asymmetry of the probability distribution of a real-valued random
variable. It is measured in Six Sigma because, in practice, data are
rarely perfectly symmetric.
Syntax
=SKEW(A2:A16)
12. In probability theory and statistics, kurtosis is a measure of the
"peakedness" of the probability distribution of a real-valued random
variable. Excel's KURT function returns excess kurtosis, so a normal
distribution scores approximately 0.
Syntax
=KURT(A2:A16)
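To see what the skewness and kurtosis functions compute, the sketch below spells out the bias-adjusted sample formulas in Python. It assumes these are the formulas Microsoft documents for SKEW and KURT; the function names are hypothetical, and the results should be compared against a worksheet before being relied on.

```python
import math

def _mean_and_sample_sd(values):
    """Return the mean and the sample (n - 1) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return mean, sd

def sample_skewness(values):
    """Bias-adjusted sample skewness; needs at least 3 values."""
    n = len(values)
    mean, sd = _mean_and_sample_sd(values)
    cubed = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis; needs at least 4 values."""
    n = len(values)
    mean, sd = _mean_and_sample_sd(values)
    fourth = sum(((x - mean) / sd) ** 4 for x in values)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * fourth - correction

# Raw data borrowed from the ordered-array slide of this deck.
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
print(sample_skewness(data), sample_excess_kurtosis(data))
```

A positive skewness, as this data set gives, indicates a longer tail on the high side; a perfectly symmetric sample such as 1, 2, 3, 4, 5 gives zero.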
13. If the mean is 85 days and the standard deviation is 5 days,
what is the yield if the USL is 90 days?
Z = (90 − 85) / 5 = 1
Y = Pr(x ≤ 90) = Pr(z ≤ 1)
P(z < 1) = P(z > −1) = 1 − 0.15865 = 0.8413
Yield ≅ 84.1%
The area under the curve to the right of the USL would be considered % defective.
[Figure: normal curve with the USL marked at 90 days; horizontal axes: Days (60 to 120) and Z-scale (−7 to 7).]
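The yield calculation above can be checked numerically. The sketch below uses Python's statistics.NormalDist as a stand-in for the worksheet; in Excel 2007 the same cumulative area should be available through NORMDIST(x, mean, standard_dev, TRUE). The variable names are illustrative.

```python
from statistics import NormalDist

mean_days = 85
sd_days = 5
usl_days = 90

# Standardize the upper specification limit.
z = (usl_days - mean_days) / sd_days

# Area under the normal curve to the left of the USL is the yield.
process = NormalDist(mu=mean_days, sigma=sd_days)
yield_fraction = process.cdf(usl_days)
defective_fraction = 1 - yield_fraction

print(f"z = {z:.1f}, yield = {yield_fraction:.4f}, defective = {defective_fraction:.4f}")
```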
18. For a pizza delivery center, the mean delivery time is
20 minutes and the standard deviation is 3.5 minutes. What is their
target, if the probability of achieving the target is 99.78%?
[Figure: normal curve with the yield shaded to the left of the USL.]
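This question runs the normal-curve calculation in reverse: given a probability, find the value that bounds it. A minimal sketch in Python follows, assuming delivery times are normally distributed; in Excel 2007 the equivalent lookup should be NORMINV(probability, mean, standard_dev). The variable names are illustrative.

```python
from statistics import NormalDist

mean_minutes = 20
sd_minutes = 3.5
required_probability = 0.9978

# Inverse of the cumulative normal distribution: the delivery time that
# is met with the required probability.
delivery = NormalDist(mu=mean_minutes, sigma=sd_minutes)
target_minutes = delivery.inv_cdf(required_probability)

# The same target expressed as a number of standard deviations above the mean.
z = (target_minutes - mean_minutes) / sd_minutes

print(f"target = {target_minutes:.1f} minutes (z = {z:.2f})")
```

The target comes out close to 30 minutes, a little under three standard deviations above the mean.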
22. Data in raw form are usually not easy to use
for decision making
Some type of organization is needed
▪ Table
▪ Graph
Techniques reviewed here:
Ordered Array
Histograms
Bar charts and pie charts
Contingency tables
23. A sorted list of data:
Shows range (min to max)
Provides some signals about variability
within the range
May help identify outliers (unusual observations)
If the data set is large, the ordered array is
less useful
24. Data in raw form (as collected):
24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Data in ordered array from smallest to largest:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
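The same ordering can be reproduced outside Excel in a couple of lines. This is a small illustrative sketch in Python using the data from this slide; it also reads off the minimum and maximum that the ordered array makes visible.

```python
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

# sorted() returns a new list in ascending order and leaves raw unchanged.
ordered = sorted(raw)

# The ends of the ordered array give the range of the data.
smallest, largest = ordered[0], ordered[-1]

print(ordered)
print(f"min = {smallest}, max = {largest}")
```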
25. A graph of the data in a frequency distribution is
called a histogram
The class boundaries (or class midpoints) are
shown on the horizontal axis
The vertical axis shows either frequency, relative
frequency, or percentage
Bars of the appropriate heights are used to
represent the number of observations within
each class
26. Class                  Class Midpoint    Frequency
10 but less than 20        15                3
20 but less than 30        25                6
30 but less than 40        35                5
40 but less than 50        45                4
50 but less than 60        55                2
[Figure: histogram "Daily High Temperature"; vertical axis: Frequency; horizontal axis: class midpoints 5, 15, 25, 35, 45, 55, More; no gaps between bars.]
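A histogram's vertical axis may show relative frequency or percentage instead of raw counts. The sketch below converts the five class frequencies from this slide into both forms; it is an illustration in Python, and the variable names are not from the original.

```python
# Class labels and frequencies taken from the frequency table on this slide.
classes = [
    ("10 but less than 20", 3),
    ("20 but less than 30", 6),
    ("30 but less than 40", 5),
    ("40 but less than 50", 4),
    ("50 but less than 60", 2),
]

total = sum(freq for _, freq in classes)

# Percentage of all observations that fall in each class.
percentages = [100 * freq / total for _, freq in classes]

for (label, freq), pct in zip(classes, percentages):
    print(f"{label}: frequency = {freq}, relative = {freq / total:.2f}, percent = {pct:.0f}%")
```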
29. Step 2: Choose Histogram
Step 3: Input data range and bin range (bin range is a cell range
containing the upper class boundaries for each class grouping)
Select Chart Output and click “OK”
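To make the role of the bin range concrete, the sketch below counts values per bin in Python. It assumes that a value equal to a bin's upper boundary is counted in that bin and that anything above the last boundary goes into a final "More" bucket; verify this against the tool's own output. The function name and the bin values are hypothetical, and the data are the raw values from the ordered-array slide.

```python
from bisect import bisect_left

def count_by_bins(values, upper_bounds):
    """Count values per bin. Each bin includes its upper bound; the extra
    last slot collects values above the final bound ("More")."""
    counts = [0] * (len(upper_bounds) + 1)
    for value in values:
        # bisect_left finds the first bound that is >= value.
        counts[bisect_left(upper_bounds, value)] += 1
    return counts

# Raw data borrowed from the ordered-array slide of this deck.
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
bins = [20, 30, 40, 50]  # hypothetical upper class boundaries, ascending

counts = count_by_bins(data, bins)
for label, count in zip(bins + ["More"], counts):
    print(label, count)
```

Note that under this assumption the value 30 falls in the bin labeled 30, which differs from a class defined as "30 but less than 40".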
32. Scatter Diagrams are used for bivariate
numerical data
Bivariate data consists of paired observations
taken from two numerical variables
The Scatter Diagram:
one variable is measured on the vertical axis and
the other variable is measured on the horizontal
axis
33. Step 1: Select the Insert menu tab
Step 2: Select the Scatter plot dropdown and click on any of the
options. If in doubt, select the first option (scatter with only
markers)
34. Volume per day    Cost per day
23                    125
26                    140
29                    146
33                    160
38                    167
42                    170
50                    188
55                    195
60                    200
[Figure: scatter diagram "Cost per Day vs. Production Volume"; horizontal axis: Volume per Day (0 to 70); vertical axis: Cost per Day (0 to 250).]
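A scatter diagram shows the relationship visually; one common way to put a number on it is the Pearson correlation coefficient. The sketch below computes it in Python for the paired data on this slide. The correlation step is an addition for illustration rather than part of the original slide, and the function name is hypothetical.

```python
import math

# Paired observations from this slide.
volume = [23, 26, 29, 33, 38, 42, 50, 55, 60]
cost = [125, 140, 146, 160, 167, 170, 188, 195, 200]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(volume, cost)
print(f"r = {r:.3f}")
```

A value close to 1 agrees with what the plotted points suggest: cost per day rises almost linearly with volume per day.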