This document provides an overview of basic statistics concepts. It defines statistics, describes who uses statistics, and outlines descriptive and inferential statistics. It also defines types of variables, population and sample, measures of central tendency including mean, median and mode, and measures of dispersion including range, variance and standard deviation. Frequency distribution is discussed as a method to organize grouped quantitative data into classes with their frequencies. The normal curve is briefly mentioned as well.
INTRODUCTION
Types of Statistics
Descriptive Statistics:
Describes the characteristics of a product or process
using information collected on it.
Methods of organizing, summarizing, and presenting
data in an informative way.
Inferential Statistics:
Draws conclusions on unknown process parameters
based on information contained in a sample.
A decision, estimate, prediction, or generalization
about a population, based on a sample.
Uses probability
Types of Variables
A. Qualitative or Attribute variable - The
characteristic being studied is nonnumeric.
Examples: gender, religious affiliation, type of
automobile owned, state of birth, eye color.
B. Quantitative variable - Information is reported
numerically.
Examples: Balance in your checking account,
minutes remaining in class, or number of children in
a family.
Types of Variables
Qualitative
• brand of PC
• marital status
• hair colour
Quantitative
Discrete
• children in a family
• TV sets owned
• strokes on a golf hole
Continuous
• amount of income tax paid
• weight of a student
• yearly rainfall in Malacca
STATISTICAL PROCESS CONTROL
Statistical Process Control (S.P.C.)
This is a control system that uses statistical
techniques to monitor the process continuously
and detect changes in it.
It is an effective method for preventing
defects and supports continuous quality
improvement.
WHAT DOES S.P.C. MEAN?
Statistical:
Statistics are tools used to make predictions
about performance.
There are a number of simple methods for
analysing data which, if applied correctly, can
lead to predictions with a high degree of
accuracy.
WHAT DOES S.P.C. MEAN?
Process:
The process involves people, machines,
materials, methods, management and
environment working together to produce an
output, such as an end product.
(Figure: people, machines, materials, management, methods and environment combine to produce the output.)
WHAT DOES S.P.C. MEAN?
Control:
Controlling a process means guiding it by
comparing actual performance against a
target (nominal) value, then identifying
when corrective action is necessary and
what action will achieve the target.
S.P.C.
Statistics help us make decisions about a
process based on sample data; the sample
results are used to predict the behaviour of
the process as a whole.
POPULATION & SAMPLE
x̄ = sample mean, s = sample standard deviation
µ = population mean, σ = population standard deviation
POPULATION & SAMPLE
Population :
Includes each element from the set of observations that can be
made.
Sample:
Consists of only some of the observations, drawn from the population.
Population parameter (µ, σ):
A quantity that describes the whole population. The mean of a population is denoted by the symbol µ.
Sample statistic (x̄, s):
A quantity calculated from a sample of observations. The mean of a sample is denoted by the symbol x̄.
VARIATIONS
Variation exists in all processes.
Variation can be categorized as either:
• Common or random cause variation, or
• Assignable cause variation.
Example: Take a pie and cut it into pieces,
making each piece the same size as best we can.
The pieces will still differ slightly. This is
inherent variability, present even in a very
good product.
PROCESS VARIATION
No industrial process or machine is able to
produce consecutive items which are identical
in appearance, length, weight, thickness etc.
The differences may be large or very small,
but they are always there.
The differences are known as ‘variation’.
This is the reason why ‘tolerances’ are used.
Process Variations
Process element: variable examples
• Machine: speed, operating temperature, feed rate
• Tools: shape, wear rate
• Fixtures: dimensional accuracy
• Materials: composition, dimensions
• Operator: choice of set-up, fatigue
• Maintenance: lubrication, calibration
• Environment: humidity, temperature
TYPES OF VARIATION (1)
1. Random variation
Random causes that we cannot identify.
Unavoidable, e.g. slight differences in process
variables like diameter, weight, service time,
temperature, equipment, tooling, employee
actions, facility environment, materials,
measurement system, etc.
Also called common/natural cause variation
To reduce random variation, we must reduce
variation in the inputs and the process.
As long as the distribution remains within the specified
limits, the process is said to be ‘in control’.
TYPES OF VARIATION (2)
2. Non-random variation
Also called special cause variation or
assignable cause variation
Caused by equipment out of adjustment,
worn tooling, operator errors, poor training,
defective materials, measurement errors,
new batch of raw materials etc.
The process is not behaving as it usually
does.
The cause should be identified and
corrected.
MEASURES OF CENTRAL TENDENCY
Mean = the sum of all the values in a data set
divided by the number of values
Median = the middle value of a data set arranged
in order (for an even number of values, the
average of the two middle values)
Mode = the value that occurs most frequently
in a data set
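The three definitions can be checked with Python's standard `statistics` module. This is a minimal sketch: it reuses the five home prices from the ungrouped-data example below for the mean and median, and because those prices have no repeated value, the mode is shown on a small hypothetical list (`orders`) invented purely for illustration.

```python
import statistics

# Home prices (thousand RM) from the ungrouped-data example
prices = [158, 189, 265, 127, 191]

mean_price = statistics.mean(prices)      # sum of values / number of values
median_price = statistics.median(prices)  # middle value once the data are sorted

# Hypothetical data (not from the slides) with one repeated value, to show the mode
orders = [3, 5, 5, 7, 9]
mode_orders = statistics.mode(orders)     # most frequently occurring value

print(mean_price, median_price, mode_orders)  # 186 189 5
```

Sorted, the prices are 127, 158, 189, 191, 265, so the median is the third value, 189.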
Mean
Ungrouped Data
Mean for population data: $\mu = \frac{\sum x}{N}$
Mean for sample data: $\bar{x} = \frac{\sum x}{n}$
where:
$\sum x$ = the sum of all values
$N$ = the population size
$n$ = the sample size
$\mu$ = the population mean
$\bar{x}$ = the sample mean
Grouped Data
Mean for population data: $\mu = \frac{\sum fx}{N}$
Mean for sample data: $\bar{x} = \frac{\sum fx}{n}$
where:
$x$ = midpoint of a class
$f$ = frequency of a class
Example (Ungrouped Data)
The following data give the prices (rounded to the
nearest thousand RM) of five homes sold recently in NEC.
158 189 265 127 191
Find the mean sale price for these homes.
Solution
$\bar{x} = \frac{\sum x}{n} = \frac{158 + 189 + 265 + 127 + 191}{5} = \frac{930}{5} = 186$
Thus, these five homes were sold for an average
price of RM186 thousand, i.e. RM186 000.
The mean has the advantage that its calculation
includes each value of the data set.
Example (Grouped Data)
The following table gives the frequency distribution of the
number of orders received each day during the past 50
days at the office of a mail-order company. Calculate the
mean.
Number of orders   f
10 – 12            4
13 – 15            12
16 – 18            20
19 – 21            14
                   n = 50
Solution
Because the data set includes only 50 days, it represents
a sample. The value of $\sum fx$ is calculated in the following
table, where x is the class midpoint:
Number of orders   f        x    fx
10 – 12            4        11   44
13 – 15            12       14   168
16 – 18            20       17   340
19 – 21            14       20   280
                   n = 50        Σfx = 832
Solution
The value of the sample mean is:
$\bar{x} = \frac{\sum fx}{n} = \frac{832}{50} = 16.64$
Thus, this mail-order company received an average of
16.64 orders per day during these 50 days.
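The grouped-data mean can be sketched in Python by computing each class midpoint from its limits and applying $\bar{x} = \sum fx / n$. The function name `grouped_mean` and the `(limits, frequency)` table layout are illustrative choices, not part of the slides.

```python
# Classes from the mail-order example: ((lower limit, upper limit), frequency f)
classes = [((10, 12), 4), ((13, 15), 12), ((16, 18), 20), ((19, 21), 14)]

def grouped_mean(table):
    """Sample mean of grouped data: sum(f * x) / n, where x is the class midpoint."""
    n = sum(f for _, f in table)
    sum_fx = sum(f * (lo + hi) / 2 for (lo, hi), f in table)
    return sum_fx / n

mean_orders = grouped_mean(classes)
print(round(mean_orders, 2))  # 16.64
```

The midpoints 11, 14, 17 and 20 give Σfx = 832, matching the table above.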
MEASURES OF DISPERSION
The measures of central tendency, such as the mean,
median and mode, do not reveal the whole picture of
the distribution of a data set.
Two data sets with the same mean may have
completely different spreads.
The variation among the values of observations for
one data set may be much larger or smaller than for
the other data set.
Measures of dispersion:
i. Range
ii. Standard Deviation
Range (Ungrouped Data)
RANGE = Largest value – Smallest value
Example
Find the range of the total areas in this data set:
State       Total Area (square miles)
Arkansas    53,182
Louisiana   49,651
Oklahoma    69,903
Texas       267,277
Solution
Range = Largest value – Smallest value
= 267,277 – 49,651
= 217,626 square miles
Disadvantages:
• It is strongly influenced by outliers.
• It is based on two values only; all other values in the data
set are ignored.
Range (Grouped Data)
Range = Upper boundary of last class – Lower boundary of first class
Example
Find the range for this data set:
Class      Frequency
41 – 50    1
51 – 60    3
61 – 70    7
71 – 80    13
81 – 90    10
91 – 100   6
TOTAL      40
Solution
Upper boundary of last class = 100.5
Lower boundary of first class = 40.5
Range = 100.5 – 40.5 = 60
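Both range calculations can be reproduced in a few lines of Python. This sketch assumes the grouped classes hold integer-valued data, so each class boundary lies 0.5 outside its class limit, which is how the slide obtains 40.5 and 100.5.

```python
# Ungrouped data: total areas (square miles) from the state example
areas = {"Arkansas": 53_182, "Louisiana": 49_651, "Oklahoma": 69_903, "Texas": 267_277}
ungrouped_range = max(areas.values()) - min(areas.values())

# Grouped data: class limits from the example.
# Assumes integer data, so boundaries sit 0.5 beyond the class limits.
class_limits = [(41, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]
lower_boundary_first = class_limits[0][0] - 0.5   # 40.5
upper_boundary_last = class_limits[-1][1] + 0.5   # 100.5
grouped_range = upper_boundary_last - lower_boundary_first

print(ungrouped_range, grouped_range)  # 217626 60.0
```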
Variance and Standard Deviation
The standard deviation is the most widely used measure of
dispersion.
The standard deviation tells how closely the
values of a data set are clustered around the mean.
A lower standard deviation indicates that the
values of the data set are spread over a relatively smaller range
around the mean.
A larger standard deviation indicates that the values of the data set
are spread over a relatively larger range around the
mean (far from the mean).
The standard deviation is the positive square root of the
variance.
Variance and Standard Deviation
Ungrouped Data
Population variance: $\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$
Sample variance: $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
Example
Let x denote the total production (in units) of each of five companies.
Find the variance and standard deviation.
Company   Production
A         62
B         93
C         126
D         75
E         34
Solution
Company   Production (x)   x²
A         62               3844
B         93               8649
C         126              15876
D         75               5625
E         34               1156
          Σx = 390         Σx² = 35150
$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{35150 - \frac{(390)^2}{5}}{5 - 1} = 1182.50$
Since $s^2 = 1182.50$, therefore $s = \sqrt{1182.50} = 34.3875$
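The shortcut formula for the sample variance can be sketched as below and cross-checked against `statistics.variance`, which computes the same $n - 1$ sample variance from deviations about the mean. The helper name `sample_variance_shortcut` is illustrative.

```python
import math
import statistics

# Production (units) of companies A-E from the example
production = [62, 93, 126, 75, 34]

def sample_variance_shortcut(xs):
    """s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

s2 = sample_variance_shortcut(production)
s = math.sqrt(s2)

# Cross-check against the standard library's sample variance
assert math.isclose(s2, statistics.variance(production))
print(s2, round(s, 4))  # 1182.5 34.3875
```

The shortcut form needs only Σx and Σx², which is why the solution table lists just those two columns.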
Variance and Standard Deviation
Grouped Data
Population variance: $\sigma^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N}$
Sample variance: $s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
Example
Find the variance and standard deviation for the
following data:
No. of orders   f
10 – 12         4
13 – 15         12
16 – 18         20
19 – 21         14
TOTAL           n = 50
Solution
No. of orders   f        x    x²    fx          fx²
10 – 12         4        11   121   44          484
13 – 15         12       14   196   168         2352
16 – 18         20       17   289   340         5780
19 – 21         14       20   400   280         5600
TOTAL           n = 50              Σfx = 832   Σfx² = 14216
Here x is the class midpoint, e.g. $x = \frac{10 + 12}{2} = 11$.
Variance: $s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \frac{14216 - \frac{(832)^2}{50}}{50 - 1} = 7.5820$
Standard Deviation: $s = \sqrt{s^2} = \sqrt{7.5820} = 2.75$
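The grouped variance follows the same pattern as the ungrouped case, with each class represented by its midpoint x weighted by its frequency f. A minimal sketch, assuming the `(lower, upper, f)` tuple layout used here for convenience:

```python
import math

# (lower limit, upper limit, frequency) from the orders example
table = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]

n = sum(f for _, _, f in table)
midpoints = [(lo + hi) / 2 for lo, hi, _ in table]
freqs = [f for _, _, f in table]
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))

# Sample variance for grouped data: (sum fx^2 - (sum fx)^2 / n) / (n - 1)
s2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
s = math.sqrt(s2)
print(sum_fx, sum_fx2, round(s2, 4), round(s, 2))
```

The results agree with the table: Σfx = 832, Σfx² = 14216, s² ≈ 7.5820 and s ≈ 2.75.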
FREQUENCY DISTRIBUTION
A frequency distribution for quantitative data lists
all the classes and the number of values that belong
to each class.
Data presented in the form of a frequency distribution are
called grouped data.
The class boundary is the midpoint between the
upper limit of one class and the lower limit of the
next class. It is also called the real class limit.
To find the midpoint between the upper limit of the first
class and the lower limit of the second class, we
divide the sum of these two limits by 2.
Example:
$\text{Lower class boundary} = \frac{400 + 401}{2} = 400.5$
Class Width (class size)
$\text{Class width} = \text{Upper boundary} - \text{Lower boundary}$
Example:
$\text{Width of the first class} = 600.5 - 400.5 = 200$
Class Midpoint or Mark
$\text{Class midpoint} = \frac{\text{Lower limit} + \text{Upper limit}}{2}$
Example:
$\text{Midpoint of the 1st class} = \frac{401 + 600}{2} = 500.5$
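The three class quantities (boundary, width, midpoint) can be expressed as small helper functions. The function names are hypothetical, and the sketch assumes, as the example implies, that the class before 401–600 ends at 400 and the class after it starts at 601.

```python
def class_boundary(upper_limit_prev, lower_limit_next):
    """Boundary between two adjacent classes: midpoint of the two limits."""
    return (upper_limit_prev + lower_limit_next) / 2

def class_width(lower_boundary, upper_boundary):
    """Width of a class: upper boundary minus lower boundary."""
    return upper_boundary - lower_boundary

def class_midpoint(lower_limit, upper_limit):
    """Class mark: average of the lower and upper class limits."""
    return (lower_limit + upper_limit) / 2

# First class 401-600; assumed neighbours end at 400 and start at 601
lower_b = class_boundary(400, 401)   # 400.5
upper_b = class_boundary(600, 601)   # 600.5
print(lower_b, upper_b, class_width(lower_b, upper_b), class_midpoint(401, 600))
# 400.5 600.5 200.0 500.5
```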
CONSTRUCTING FREQUENCY DISTRIBUTION TABLES
1. To decide the number of classes, we use Sturges' formula:
$c = 1 + 3.3 \log_{10} n$
where c = the number of classes
n = the number of observations in the data set
2. Class width:
$i \geq \frac{\text{largest value} - \text{smallest value}}{\text{number of classes}} = \frac{\text{range}}{c}$
This class width is rounded up to a convenient number.
3. Lower limit of the first class (the starting point):
use the smallest value in the data set.
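Steps 1 and 2 can be sketched in Python and applied to the home-run example that follows (n = 30, smallest value 135, largest value 242). Rounding c to the nearest whole number and rounding i up with `math.ceil` are assumptions about what "convenient" means here; a different convenient width (e.g. 20) would also satisfy the inequality.

```python
import math

def sturges_classes(n):
    """Number of classes c = 1 + 3.3 log10(n), rounded to the nearest whole number."""
    return round(1 + 3.3 * math.log10(n))

def min_class_width(largest, smallest, c):
    """Smallest allowed class width: i >= (largest - smallest) / c."""
    return (largest - smallest) / c

c = sturges_classes(30)               # 30 teams in the home-run example
i = min_class_width(242, 135, c)
print(c, round(i, 1), math.ceil(i))  # 6 17.8 18
```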
Example
The following data give the total home runs hit by all players
of each of the 30 Major League Baseball teams during
the 2004 season.
Solution
1. Number of classes, $c = 1 + 3.3 \log n = 1 + 3.3 \log 30 = 1 + 3.3(1.48) = 5.88 \approx 6$ classes
2. Class width, $i \geq \frac{242 - 135}{6} = 17.8 \approx 18$
3. Starting point = 135
Frequency Distribution for Data
Total Home Runs   Class Boundaries   Tally       f
135 – 152         134.5 – 152.5      IIII IIII   10
153 – 170         152.5 – 170.5      II          2
171 – 188         170.5 – 188.5      IIII        5
189 – 206         188.5 – 206.5      IIII I      6
207 – 224         206.5 – 224.5      III         3
225 – 242         224.5 – 242.5      IIII        4
                                                 Σf = 30
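Once the starting point, width and number of classes are fixed, the class limits and boundaries in the table follow mechanically. The sketch below generates them, assuming integer data so that boundaries lie 0.5 outside the limits. It cannot reproduce the tally or f column, because the raw home-run data are not included here.

```python
def build_classes(start, width, c):
    """Return (lower limit, upper limit, lower boundary, upper boundary) per class.

    Assumes integer-valued data, so class limits are whole numbers and
    class boundaries sit 0.5 outside them.
    """
    rows = []
    for k in range(c):
        lower = start + k * width
        upper = lower + width - 1
        rows.append((lower, upper, lower - 0.5, upper + 0.5))
    return rows

for row in build_classes(start=135, width=18, c=6):
    print(*row)
```

The first row printed is `135 152 134.5 152.5` and the last is `225 242 224.5 242.5`, matching the table.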
THE NORMAL CURVE
Properties of the Normal Curve:
Theoretical construction
Also called the Bell Curve or Gaussian Curve
Perfectly symmetrical about the mean
The mean of the distribution is at the midpoint of the
curve
The tails of the curve extend infinitely in both directions,
approaching but never touching the horizontal axis
Mean = median = mode
The area under the curve is measured in standard
deviations from the mean
Properties of the Normal Curve:
The standard normal curve has a mean of 0 and a standard deviation of 1.
General relationships: ±1σ = about 68.27% of the area
±2σ = about 95.45%
±3σ = about 99.73%
(Figure: standard normal curve from z = −5 to z = 5, shading the areas within ±1, ±2 and ±3 standard deviations of the mean.)
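The areas within ±1, ±2 and ±3 standard deviations can be computed from the standard normal cumulative distribution function using `statistics.NormalDist` (Python 3.8+). These are exact areas; figures read from a printed table, such as 68.26% or 99.72%, may differ in the second decimal place.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

# Area within +/- k standard deviations of the mean
areas = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"+/-{k} sd: {area:.2%}")
# +/-1 sd: 68.27%
# +/-2 sd: 95.45%
# +/-3 sd: 99.73%
```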
Standard Scores
One use of the normal curve is to explore
Standard Scores. Standard Scores are
expressed in standard deviation units, making
it much easier to compare variables measured
on different scales.
There are many kinds of Standard Scores. The
most common standard score is the ‘z’ score.
A ‘z’ score states the number of standard
deviations by which the original score lies
above or below the mean of a normal curve.
The Z Score
The normal curve is not a single curve but a
family of curves, each of which is determined
by its mean and standard deviation.
In order to work with a variety of normal
curves, we cannot have a table for every
possible combination of means and standard
deviations.
The Z Score
What we need is a standardized normal curve
which can be used for any normally distributed
variable. Such a curve is called the Standard
Normal Curve.
$Z = \frac{x_i - \bar{x}}{s}$
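Applying the z formula to every value in a data set standardizes it. This sketch uses the production data from the variance example and the sample standard deviation s, as in the formula above; the helper name `z_scores` is illustrative.

```python
import statistics

def z_scores(values):
    """Standardize each value: z = (x_i - mean) / s, with s the sample standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]

production = [62, 93, 126, 75, 34]   # data from the variance example
print([round(z, 2) for z in z_scores(production)])
```

The standardized values always have mean 0 and standard deviation 1, which is why scores measured on different scales become directly comparable.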
The Standard Normal Curve
The Standard Normal Curve (z distribution) is
the distribution of normally distributed
standard scores with mean equal to zero and a
standard deviation of one.
A z score is simply a number that represents
how many standard deviation units a raw score
lies away from the mean.
Example 1:
A normal curve has an average of 55.38 and a
standard deviation of 1.95. What percentage of
the area under the curve will fall between the
limits of 52.5 and 56.5?
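Solution:
Convert each limit to a z score and read the cumulative area to its left from the standard normal table:
$Z_1 = \frac{x_1 - \bar{x}}{s} = \frac{52.5 - 55.38}{1.95} = -1.48$; Area $A_1 = 0.0694$
$Z_2 = \frac{x_2 - \bar{x}}{s} = \frac{56.5 - 55.38}{1.95} = 0.57$; Area $A_2 = 0.7157$
Therefore, the area under the curve between limits
52.5 and 56.5 = A2 – A1
= 0.7157 – 0.0694
= 0.6463
= 64.63%
(Figure: normal curve with mean µ = 55.38, showing area A1 to the left of x1 = 52.5 and area A2 to the left of x2 = 56.5.)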
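The same calculation can be sketched with `statistics.NormalDist`. The block computes the exact area and also a table-style value, which rounds each z score to two decimals before looking up the area, as a printed table does. The small gap between the two (about 0.001) comes from that rounding.

```python
from statistics import NormalDist

curve = NormalDist(mu=55.38, sigma=1.95)
lower, upper = 52.5, 56.5

# Exact area between the limits (no rounding of z)
exact = curve.cdf(upper) - curve.cdf(lower)

# Mimic a printed z table: round z to two decimals before looking up the area
std = NormalDist()
z1 = round((lower - curve.mean) / curve.stdev, 2)   # -1.48
z2 = round((upper - curve.mean) / curve.stdev, 2)   # 0.57
table_style = std.cdf(z2) - std.cdf(z1)

print(f"exact: {exact:.4f}, table-style: {table_style:.4f}")
```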
Example 2:
The life of a piece of equipment, in hours, is a random
variable following a normal distribution with a
mean life of 5600 hours and a standard deviation
of 840 hours.
i. What % of the equipment will fail between 5000
and 6200 hours?
ii. What % will survive more than 6000 hours?
iii. What % will fail below 3500 hours?
Solution:
Given data: $\bar{x} = 5600$ hours, $\sigma = 840$ hours
i. Percentage of equipment that will fail between 5000
and 6200 hours.
Let x1 = 5000 hours, x2 = 6200 hours
$Z_1 = \frac{x_1 - \bar{x}}{\sigma} = \frac{5000 - 5600}{840} = -0.71$; Area $A_1 = 0.2389$ (For Z = −0.71, Appendix Table A)
$Z_2 = \frac{x_2 - \bar{x}}{\sigma} = \frac{6200 - 5600}{840} = 0.71$; Area $A_2 = 0.7611$ (For Z = 0.71, Appendix Table A)
Solution:
Area under the curve between 5000 hours and 6200
hours
= A2 – A1
= 0.7611 – 0.2389
= 0.5222
= 52.22%
(Figure: normal curve with mean µ = 5600, showing area A1 to the left of x1 = 5000 and area A2 to the left of x2 = 6200.)
Solution:
ii. Percentage of equipment that will survive more
than 6000 hours.
$Z_1 = \frac{x_1 - \bar{x}}{\sigma} = \frac{6000 - 5600}{840} = 0.48$; Area $A_1 = 0.6844$ (For Z = 0.48, Appendix Table A)
The total area under the curve = 1. Hence, the percentage is
= 1 – A1 = 1 – 0.6844 = 0.3156 = 31.56%
(Figure: normal curve with mean µ = 5600; area A1 lies to the left of x = 6000 and the required area A lies to its right.)
Solution:
iii. Percentage of equipment that will fail below 3500
hours.
Hence, the percentage is
= 0.062%
]5.2[0062.0,
5.2
840
56003500
1 ATableAppendixZForAArea
xxZ
−==
−=−=−= σ
μ
5600
x
3000
Area, A1
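All three parts of Example 2 reduce to cumulative normal areas, which this sketch computes directly with `statistics.NormalDist` instead of a z table. Because the table rounds z to two decimals, the exact results (about 52.50%, 31.70% and 0.62%) differ slightly from the table-based 52.22% and 31.56% in parts i and ii.

```python
from statistics import NormalDist

life = NormalDist(mu=5600, sigma=840)  # equipment life in hours

fail_between = life.cdf(6200) - life.cdf(5000)  # part i: area between the limits
survive_beyond = 1 - life.cdf(6000)             # part ii: area to the right of 6000
fail_below = life.cdf(3500)                     # part iii: area to the left of 3500

for label, p in [("i", fail_between), ("ii", survive_beyond), ("iii", fail_below)]:
    print(f"{label}: {p:.2%}")
```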