Basic Statistics Guide

UNIT 1 - BASIC STATISTICS
© Mechanical Engineering Department

OUTLINE
Introduction
Statistical Process Control (S.P.C.)
Measure of Central Tendency
Measure of Dispersion
Frequency Distribution
The Normal Curve

INTRODUCTION
Definition of Statistics:
Statistics is the science of collecting,
organizing, presenting, analyzing, and
interpreting numerical data to assist in
making more effective decisions.

Who Uses Statistics?
Statistical techniques are used extensively in marketing, accounting and quality control, and by consumers, professional sports people, hospital administrators, educators, politicians, physicians, etc.

Types of Statistics
Descriptive Statistics:
• Describes the characteristics of a product or process using information collected on it.
• Methods of organizing, summarizing, and presenting data in an informative way.
Inferential Statistics:
• Draws conclusions about unknown process parameters based on information contained in a sample.
• A decision, estimate, prediction, or generalization about a population, based on a sample.
• Uses probability.

Types of Variable
A. Qualitative or attribute variable: the characteristic being studied is nonnumeric.
Examples: gender, religious affiliation, type of automobile owned, state of birth, eye color.
B. Quantitative variable: the information is reported numerically.
Examples: balance in your checking account, minutes remaining in class, or number of children in a family.

Types of Variable
Qualitative:
• brand of PC
• marital status
• hair colours
Quantitative, continuous:
• amount of income tax paid
• weight of a student
• yearly rainfall in Malacca
Quantitative, discrete:
• children in a family
• TV sets owned
• strokes on a golf hole

STATISTICAL PROCESS CONTROL
Statistical Process Control (S.P.C.)
This is a control system that uses statistical techniques to monitor, at all times, changes in the process.
It is an effective method for preventing defects and supports continuous quality improvement.

WHAT DOES S.P.C. MEAN?
Statistical:
Statistics are tools used to make predictions about performance.
There are a number of simple methods for analysing data which, if applied correctly, can lead to predictions with a high degree of accuracy.

WHAT DOES S.P.C. MEAN?
Process:
The process involves people, machines,
materials, methods, management and
environment working together to produce an
output, such as an end product.
(Figure: people, machines, material, management, methods and environment combine in the process to produce the output.)

WHAT DOES S.P.C. MEAN?
Control:
Controlling a process means guiding it by comparing actual performance against a target (nominal) value, then identifying when corrective action is necessary and what action is needed to achieve the target.

S.P.C.
Statistics aid in making decisions about a process based on sample data; the results are used to predict the behaviour of the process as a whole.

POPULATION & SAMPLE
x̄ = sample mean; s = sample standard deviation
µ = population mean; σ = population standard deviation

POPULATION & SAMPLE
Population:
Includes every element of the set of observations that can be made.
Sample:
Consists only of those observations drawn from the population.
Population parameter (µ, σ):
The mean of a population is denoted by the symbol μ.
Sample statistic (x̄, s):
A quantity calculated from a sample of observations. The mean of a sample is denoted by the symbol x̄.

VARIATIONS
Variation exists in all processes.
Variation can be categorized as either:
• common (random) cause variation, or
• assignable cause variation.
Example: Take a pie and cut it into pieces, making each piece the same size as best we can. The pieces will still differ slightly. This is inherent variability, and it is present even in a very good product.

PROCESS VARIATION
No industrial process or machine is able to
produce consecutive items which are identical
in appearance, length, weight, thickness etc.
The differences may be large or very small,
but they are always there.
The differences are known as ‘variation’.
This is the reason why ‘tolerances’ are used.

Process Variations
Process element: variable examples
• Machine: speed, operating temperature, feed rate
• Tools: shape, wear rate
• Fixtures: dimensional accuracy
• Materials: composition, dimensions
• Operator: choice of set-up, fatigue
• Maintenance: lubrication, calibration
• Environment: humidity, temperature

TYPES OF VARIATION (1)
1. Random variation
• Arises from random causes that we cannot identify.
• Unavoidable, e.g. slight differences in process variables like diameter, weight, service time, temperature, equipment, tooling, employee actions, facility environment, materials, measurement system, etc.
• Also called common or natural cause variation.
• To reduce random variation, we must reduce variation in the inputs and in the process itself.
• As long as the distribution remains within the specified limits, the process is said to be 'in control'.

TYPES OF VARIATION (2)
2. Non-random variation
• Also called special cause variation or assignable cause variation.
• Caused by equipment out of adjustment, worn tooling, operator errors, poor training, defective materials, measurement errors, a new batch of raw materials, etc.
• The process is not behaving as it usually does.
• The cause should be identified and corrected.

MEASURE OF CENTRAL TENDENCY
Mean = the calculated average of all the values in a given data set
Median = the central value of a data set arranged in order
Mode = the value that occurs most frequently in a given data set
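The three definitions can be checked with Python's standard-library `statistics` module. This is a minimal sketch; the data set is hypothetical and chosen only so that the three measures are easy to verify by hand.

```python
import statistics

# Hypothetical data set, for illustration only
data = [2, 3, 3, 5, 7]

mean = statistics.mean(data)      # (2 + 3 + 3 + 5 + 7) / 5 = 4
median = statistics.median(data)  # middle value of the sorted data = 3
mode = statistics.mode(data)      # most frequent value = 3

print(mean, median, mode)
```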

Mean
Ungrouped data:
Mean for population data: $\mu = \frac{\sum x}{N}$
Mean for sample data: $\bar{x} = \frac{\sum x}{n}$
where:
$\sum x$ = the sum of all values
N = the population size
n = the sample size
$\mu$ = the population mean
$\bar{x}$ = the sample mean
Grouped data:
Mean for population data: $\mu = \frac{\sum fx}{N}$
Mean for sample data: $\bar{x} = \frac{\sum fx}{n}$
where:
x = midpoint of a class
f = frequency of a class

Example (Ungrouped Data)
The following data give the prices (rounded to the nearest thousand RM) of five homes sold recently in NEC.
158 189 265 127 191
Find the mean sale price for these homes.

Solution
$\bar{x} = \frac{\sum x}{n} = \frac{158 + 189 + 265 + 127 + 191}{5} = \frac{930}{5} = 186$
• Thus, these five homes were sold for an average price of RM186 thousand, or RM186 000.
• The mean has the advantage that its calculation includes every value in the data set.
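A minimal Python sketch of the same calculation, applying the sample-mean formula directly to the five prices from the example:

```python
# Prices of the five homes, in thousand RM (from the example above)
prices = [158, 189, 265, 127, 191]

# Sample mean: x-bar = (sum of x) / n
x_bar = sum(prices) / len(prices)
print(f"Mean price = RM{x_bar:.0f} thousand")
```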

Example (Grouped Data)
The following table gives the frequency distribution of the number of orders received each day during the past 50 days at the office of a mail-order company. Calculate the mean.
Number of orders   f
10 – 12   4
13 – 15   12
16 – 18   20
19 – 21   14
n = 50

Solution
Because the data set includes only 50 days, it represents a sample. The value of Σfx is calculated in the following table, where x is the class midpoint:
Number of orders   f   x   fx
10 – 12   4   11   44
13 – 15   12   14   168
16 – 18   20   17   340
19 – 21   14   20   280
n = 50   Σfx = 832

Solution
The value of the sample mean is:
$\bar{x} = \frac{\sum fx}{n} = \frac{832}{50} = 16.64$
Thus, this mail-order company received an average of 16.64 orders per day during these 50 days.
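The grouped-data mean can be sketched in Python as below. The class limits and frequencies are those of the example; each class is represented by its midpoint, as in the table.

```python
# (lower limit, upper limit, frequency) for each class, from the example above
classes = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]

n = sum(f for _, _, f in classes)                          # n = 50
# Class midpoint x = (lower limit + upper limit) / 2
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)   # sum of f*x = 832
x_bar = sum_fx / n
print(f"n = {n}, sum fx = {sum_fx:.0f}, mean = {x_bar:.2f}")
```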

MEASURE OF DISPERSION
• Measures of central tendency such as the mean, median and mode do not reveal the whole picture of the distribution of a data set.
• Two data sets with the same mean may have completely different spreads.
• The variation among the values of observations for one data set may be much larger or smaller than for the other data set.
• Measures of dispersion:
i. Range
ii. Standard Deviation
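To illustrate the second point, the sketch below uses two hypothetical data sets (invented for illustration, not taken from the slides) that share a mean of 50 but differ widely in range and sample standard deviation, computed with Python's `statistics` module.

```python
import statistics

# Two hypothetical data sets with the same mean but different spreads
narrow = [48, 50, 52]
wide = [30, 50, 70]

for name, data in [("narrow", narrow), ("wide", wide)]:
    print(name,
          "mean =", statistics.mean(data),
          "range =", max(data) - min(data),
          "s =", statistics.stdev(data))   # sample standard deviation
```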

Range (Ungrouped Data)
RANGE = Largest value – Smallest value
Example
Find the range of the total areas in this data set.
State   Total Area (square miles)
Arkansas   53,182
Louisiana   49,651
Oklahoma   69,903
Texas   267,277

Solution
Range = Largest value – Smallest value
= 267,277 – 49,651
= 217,626 square miles
Disadvantages:
• It is strongly influenced by outliers.
• It is based on two values only; all other values in the data set are ignored.
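The same range calculation as a short Python sketch, using the four state areas from the example:

```python
# Total area (square miles) by state, from the example above
areas = {"Arkansas": 53182, "Louisiana": 49651, "Oklahoma": 69903, "Texas": 267277}

# Range = largest value - smallest value
data_range = max(areas.values()) - min(areas.values())
print(f"Range = {data_range:,} square miles")
```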

Range (Grouped Data)
Example
Find the range for this data set
Range = Upper bound of last class – Lower bound of first class
Class Frequency
41 - 50 1
51 - 60 3
61 - 70 7
71 - 80 13
81 – 90 10
91 - 100 6
TOTAL 40
Solution
Upper bound of last class = 100.5
Lower bound of first class = 40.5
Range = 100.5 – 40.5 = 60
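A small Python sketch of the grouped-data range. It assumes, as in this example, integer class limits with a gap of 1 between adjacent classes, so each boundary lies 0.5 beyond the corresponding class limit.

```python
# Class limits from the example above (frequencies are not needed for the range)
classes = [(41, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]

# Assumes integer limits with a gap of 1, so boundaries lie 0.5 beyond the limits
lower_bound_first = classes[0][0] - 0.5   # 40.5
upper_bound_last = classes[-1][1] + 0.5   # 100.5
grouped_range = upper_bound_last - lower_bound_first
print(grouped_range)
```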

Variance and Standard Deviation
• The standard deviation is the most widely used measure of dispersion.
• The standard deviation tells how closely the values of a data set are clustered around the mean.
• A lower standard deviation indicates that the values of the data set are spread over a relatively smaller range around the mean.
• A larger standard deviation indicates that the values of the data set are spread over a relatively larger range around the mean (far from the mean).
• The standard deviation is the positive square root of the variance.

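Ungrouped Data
Population variance: $\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$
Sample variance: $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$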

Example
Let x denote the total production (in units) of each company. Find the variance and standard deviation.
Company   Production
A   62
B   93
C   126
D   75
E   34

Solution
Company   Production (x)   x²
A   62   3844
B   93   8649
C   126   15876
D   75   5625
E   34   1156
Σx = 390   Σx² = 35150
$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{35150 - \frac{(390)^2}{5}}{5 - 1} = 1182.50$
Since $s^2 = 1182.50$, therefore $s = \sqrt{1182.50} = 34.3875$
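A minimal Python sketch of the shortcut formula for the sample variance, applied to the five production figures. The helper name `sample_variance` is introduced here for illustration only.

```python
import math

def sample_variance(values):
    """Sample variance via the shortcut formula: (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

production = [62, 93, 126, 75, 34]   # companies A to E
s2 = sample_variance(production)
s = math.sqrt(s2)
print(f"s^2 = {s2:.2f}, s = {s:.4f}")
```

The shortcut form needs only Σx and Σx², which is why the solution table lists an x² column rather than deviations from the mean.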

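Grouped Data
Population variance: $\sigma^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N}$
Sample variance: $s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$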

Example
Find the variance and standard deviation for the following data:
No. of orders   f
10 – 12   4
13 – 15   12
16 – 18   20
19 – 21   14
TOTAL   n = 50

Solution
No. of orders   f   x   x²   fx   fx²
10 – 12   4   11   121   44   484
13 – 15   12   14   196   168   2352
16 – 18   20   17   289   340   5780
19 – 21   14   20   400   280   5600
TOTAL   n = 50   Σfx = 832   Σfx² = 14216
Variance
$s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \frac{14216 - \frac{(832)^2}{50}}{50 - 1} = 7.5820$
Standard Deviation
$s = \sqrt{s^2} = \sqrt{7.5820} = 2.75$
Here x is the class midpoint, e.g. for the first class $x = \frac{10 + 12}{2} = 11$.
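The grouped-data variance can be sketched the same way in Python: compute each class midpoint, accumulate Σfx and Σfx², then apply the sample formula.

```python
import math

# (lower limit, upper limit, frequency) for each class, from the example above
classes = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]

n = sum(f for _, _, f in classes)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]                  # x = 11, 14, 17, 20
sum_fx = sum(f * x for (_, _, f), x in zip(classes, midpoints))       # 832
sum_fx2 = sum(f * x * x for (_, _, f), x in zip(classes, midpoints))  # 14216

# Sample variance for grouped data: (sum fx^2 - (sum fx)^2 / n) / (n - 1)
s2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
s = math.sqrt(s2)
print(f"s^2 = {s2:.4f}, s = {s:.2f}")
```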

FREQUENCY DISTRIBUTION
• A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class.
• Data presented in the form of a frequency distribution are called grouped data.

• A class boundary is the midpoint between the upper limit of one class and the lower limit of the next class. Class boundaries are also called real class limits.
• To find the midpoint between the upper limit of the first class and the lower limit of the second class, we divide the sum of these two limits by 2.
Example:
Lower class boundary = $\frac{400 + 401}{2} = 400.5$
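As a one-line Python sketch (the function name `class_boundary` is hypothetical), the boundary is simply the average of the two adjacent limits:

```python
def class_boundary(upper_limit, next_lower_limit):
    """Boundary between two adjacent classes: the midpoint of the gap between them."""
    return (upper_limit + next_lower_limit) / 2

print(class_boundary(400, 401))
```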

• Class Width (class size)
Class width = Upper boundary – Lower boundary
Example:
Width of the first class = 600.5 – 400.5 = 200
• Class Midpoint or Mark
Class midpoint = $\frac{\text{Lower limit} + \text{Upper limit}}{2}$
Example:
Midpoint of the first class = $\frac{401 + 600}{2} = 500.5$
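A minimal Python sketch of the two definitions, applied to the first class of the example (limits 401 to 600, boundaries 400.5 to 600.5). The function names are illustrative only; note that the width uses boundaries while the midpoint uses limits.

```python
def class_width(lower_boundary, upper_boundary):
    # Width is measured between class boundaries
    return upper_boundary - lower_boundary

def class_midpoint(lower_limit, upper_limit):
    # Midpoint (class mark) is the average of the class limits
    return (lower_limit + upper_limit) / 2

print(class_width(400.5, 600.5))
print(class_midpoint(401, 600))
```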


CONSTRUCTING FREQUENCY DISTRIBUTION TABLES
1. To decide the number of classes, we use Sturges' formula, which is
$c = 1 + 3.3 \log n$
where c = the no. of classes, n = the no. of observations in the data set, and log is the base-10 logarithm.
2. Class width:
$i > \frac{\text{range}}{c} = \frac{\text{largest value} - \text{smallest value}}{\text{number of classes}}$
This class width is rounded to a convenient number.
3. Lower limit of the first class, or the starting point:
• Use the smallest value in the data set.

Example
The following data give the total home runs hit by all players of each of the 30 Major League Baseball teams during the 2004 season.

Solution
1. Number of classes, $c = 1 + 3.3 \log n = 1 + 3.3 \log 30 = 1 + 3.3(1.48) = 5.88 \approx 6$ classes
2. Class width, $i > \frac{242 - 135}{6} = 17.8 \approx 18$
3. Starting point = 135
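The first two steps can be sketched in Python as below, using n = 30 and the largest and smallest values (242 and 135) from the solution. Rounding the number of classes to the nearest whole number and rounding the width up with `math.ceil` are assumptions standing in for "a convenient number".

```python
import math

n = 30                        # number of teams (observations)
largest, smallest = 242, 135  # largest and smallest values from the example

# Sturges' formula: c = 1 + 3.3 log10(n), rounded to a whole number of classes
c_exact = 1 + 3.3 * math.log10(n)
c = round(c_exact)

# Class width must exceed (largest - smallest) / c; here it is rounded up (an assumption)
width_min = (largest - smallest) / c
width = math.ceil(width_min)
print(f"c = {c_exact:.2f} -> {c} classes, width > {width_min:.2f} -> {width}")
```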

Frequency Distribution for the Home Run Data
Total Home Runs   Class Boundaries   Tally   f
135-152   134.5 - 152.5   IIII IIII   10
153-170   152.5 - 170.5   II   2
171-188   170.5 - 188.5   IIII   5
189-206   188.5 - 206.5   IIII I   6
207-224   206.5 - 224.5   III   3
225-242   224.5 - 242.5   IIII   4
Σf = 30
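The class limits and boundaries in this table follow mechanically from the starting point, the width and the number of classes. The Python sketch below rebuilds those two columns; it does not tally the original home-run data, which are not reproduced here, and it assumes whole-number data so that each upper limit is one less than the next lower limit.

```python
start, width, n_classes = 135, 18, 6   # from the solution above

rows = []
for k in range(n_classes):
    lower = start + k * width
    upper = lower + width - 1          # whole-number data: next class starts at upper + 1
    rows.append((lower, upper, lower - 0.5, upper + 0.5))

for lower, upper, lb, ub in rows:
    print(f"{lower}-{upper}   {lb} - {ub}")
```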

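Histogram
(Figure: histogram of the frequency distribution, with the class boundaries 134.5, 152.5, 170.5, 188.5, 206.5, 224.5 and 242.5 on the horizontal axis.)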

THE NORMAL CURVE
The Histogram and the Normal Curve

The Theoretical Normal Curve

Properties of the Normal Curve:
• Theoretical construction
• Also called the Bell Curve or Gaussian Curve
• Perfectly symmetrical about the mean
• The mean of the distribution is at the midpoint of the curve
• The tails of the curve are infinite: they approach, but never touch, the horizontal axis
• Mean of the curve = median = mode
• The 'area under the curve' is measured in standard deviations from the mean

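• The standard normal curve has a mean = 0 and a standard deviation = 1.
• General relationships (area under the curve within the given distance of the mean):
±1 s = about 68.26%
±2 s = about 95.44%
±3 s = about 99.72%
(Figure: normal curve on a horizontal axis from −5 to 5 standard deviations, marking the 68.26%, 95.44% and 99.72% areas.)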
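These percentages can be reproduced from the error function in Python's `math` module, since the area within ±k standard deviations of the mean is erf(k/√2). This is a sketch; the computed values (about 68.27%, 95.45% and 99.73%) differ from the figures above only in the last digit, because the slide values come from four-decimal normal tables.

```python
import math

def area_within(k):
    """Area under the normal curve within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/-{k} s: {100 * area_within(k):.2f}%")
```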

Standard Scores
• One use of the normal curve is to explore standard scores. Standard scores are expressed in standard deviation units, making it much easier to compare variables measured on different scales.
• There are many kinds of standard scores. The most common standard score is the 'z' score.
• A 'z' score states the number of standard deviations by which the original score lies above or below the mean of a normal curve.

The Z Score
• The normal curve is not a single curve but a family of curves, each of which is determined by its mean and standard deviation.
• In order to work with a variety of normal curves, we cannot have a table for every possible combination of means and standard deviations.
• What we need is a standardized normal curve which can be used for any normally distributed variable. Such a curve is called the Standard Normal Curve.
$Z = \frac{x_i - \bar{x}}{s}$
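A minimal Python sketch of the z-score formula. The two test scores, means and standard deviations are hypothetical; they show how a higher raw score (80) can be less remarkable than a lower one (70) once each is expressed in standard deviation units of its own test.

```python
def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical scores on two tests with different means and spreads
z_test_a = z_score(70, mean=60, sd=5)    # 2 standard deviations above the mean
z_test_b = z_score(80, mean=75, sd=10)   # half a standard deviation above the mean
print(z_test_a, z_test_b)
```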

The Standard Normal Curve
• The Standard Normal Curve (z distribution) is the distribution of normally distributed standard scores, with a mean equal to zero and a standard deviation of one.
• A z score is simply a number that represents how many standard deviation units a raw score lies away from the mean.

Example 1:
A normal curve has an average of 55.38 and a
standard deviation of 1.95. What percentage of
the area under the curve will fall between the
limits of 52.5 and 56.5?

Solution:
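Given data:
$\bar{x} = 55.38$, $\sigma = 1.95$, limits $X_1 = 52.5$ and $X_2 = 56.5$
$Z_1 = \frac{X_1 - \bar{x}}{\sigma} = \frac{52.5 - 55.38}{1.95} = -1.48$; area $A_1$ for $Z = -1.48$ is 0.0694 (Appendix Table A)
$Z_2 = \frac{X_2 - \bar{x}}{\sigma} = \frac{56.5 - 55.38}{1.95} = 0.57$; area $A_2$ for $Z = 0.57$ is 0.7157 (Appendix Table A)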

Solution:
Therefore, the area under the curve between the limits 52.5 and 56.5 = A2 – A1
= 0.7157 – 0.0694
= 0.6463
= 64.63%
(Figure: normal curve with µ = 55.38, x1 = 52.5 and x2 = 56.5; area A1 lies to the left of x1 and area A2 to the left of x2.)
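The table look-up can be replaced by the standard normal cumulative distribution, written here with `math.erf`. This is a sketch: z is rounded to two decimals to mimic reading a printed Z table, and the result (about 0.6462) may differ from the hand calculation in the last digit because the table entries themselves are rounded.

```python
import math

def phi(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd = 55.38, 1.95
x1, x2 = 52.5, 56.5

# Round z to two decimals, as when reading a printed Z table
z1 = round((x1 - mean) / sd, 2)   # -1.48
z2 = round((x2 - mean) / sd, 2)   # 0.57
area = phi(z2) - phi(z1)
print(f"z1 = {z1}, z2 = {z2}, area = {area:.4f} ({100 * area:.2f}%)")
```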

Example 2:
The life of a piece of equipment, in hours, is a random variable following a normal distribution with a mean life of 5600 hours and a standard deviation of 840 hours.
i. What % of the equipment will fail between 5000 and 6200 hours?
ii. What % will survive more than 6000 hours?
iii. What % will fail below 3500 hours?

Solution:
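Given data: $\bar{x} = 5600$ hours, $\sigma = 840$ hours
i. Percentage of equipment that will fail between 5000 and 6200 hours.
Let x1 = 5000 hours, x2 = 6200 hours.
$Z_1 = \frac{x_1 - \bar{x}}{\sigma} = \frac{5000 - 5600}{840} = -0.71$; area $A_1$ for $Z = -0.71$ is 0.2389 (Appendix Table A)
$Z_2 = \frac{x_2 - \bar{x}}{\sigma} = \frac{6200 - 5600}{840} = 0.71$; area $A_2$ for $Z = 0.71$ is 0.7611 (Appendix Table A)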

Solution:
Area under the curve between 5000 hours and 6200 hours
= A2 – A1
= 0.7611 – 0.2389
= 0.5222
= 52.22%
(Figure: normal curve with µ = 5600, x1 = 5000 and x2 = 6200; areas A1 and A2 lie to the left of x1 and x2 respectively.)

Solution:
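ii. Percentage of equipment that will survive more than 6000 hours.
$Z = \frac{x - \bar{x}}{\sigma} = \frac{6000 - 5600}{840} = 0.48$; area $A_1$ for $Z = 0.48$ is 0.6844 (Appendix Table A)
Since the total area under the curve = 1, the percentage is
= 1 – A1 = 1 – 0.6844 = 0.3156 = 31.56%
(Figure: normal curve with µ = 5600 and x = 6000; A1 is the area to the left of x, and the required area lies to the right of x.)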

Solution:
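iii. Percentage of equipment that will fail below 3500 hours.
$Z = \frac{x - \bar{x}}{\sigma} = \frac{3500 - 5600}{840} = -2.5$; area $A_1$ for $Z = -2.5$ is 0.0062 (Appendix Table A)
Hence, the percentage is
= 0.0062 = 0.62%
(Figure: normal curve with µ = 5600 and x = 3500; A1 is the area to the left of x.)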
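All three parts of Example 2 can be checked with a short Python sketch that uses `math.erf` for the normal cumulative distribution instead of the printed table. As in the hand solution, z is rounded to two decimals; results may differ from the table-based answers in the last digit (for example 52.23% instead of 52.22% in part i).

```python
import math

def phi(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z(x, mean=5600, sd=840):
    # z rounded to two decimals, as when reading a printed Z table
    return round((x - mean) / sd, 2)

fail_between = phi(z(6200)) - phi(z(5000))   # part i
survive_over = 1 - phi(z(6000))              # part ii
fail_below = phi(z(3500))                    # part iii

for label, p in [("i", fail_between), ("ii", survive_over), ("iii", fail_below)]:
    print(f"{label}: {100 * p:.2f}%")
```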
