Chapter 9 learning more about sample data(1)

2013/05/22
STATISTICS
X-Kit Textbook
Chapter 9
Precalculus Textbook
Appendix B: Concepts in Statistics
Par B.2
CONTENT
THE GOAL
Look at ways of summarising a large
amount of sample data in just one or two
key numbers.
Two important aspects of a set of data:
•The LOCATION
•The SPREAD
MEASURES OF CENTRAL TENDENCY
(LOCATION)
Arithmetic Mean (Average)
Mode (the highest point/frequency)
Median (the middle observation)
Number of fraudulent cheques received at a
bank each week for 30 weeks
Week       1   2   3   4   5   6   7   8   9  10
Cheques    5   3   8   3   3   1  10   4   6   8
Week      11  12  13  14  15  16  17  18  19  20
Cheques    3   5   4   7   6   6   9   3   4   5
Week      21  22  23  24  25  26  27  28  29  30
Cheques    7   9   4   5   8   6   4   4  10   4
ARITHMETIC MEAN
• $\bar{x} = \frac{164}{30} = 5.47$
• To calculate the MEAN, add all the data points
in our sample and divide by the number of
data points (sample size).
• The MEAN can be a value that doesn’t
actually match any observation.
• The MEAN gives us useful information about
the location of our frequency distribution.

GRAPH
[Bar chart: Frequency (vertical axis, 0 to 8) against the values 1 to 10 (horizontal axis)]
CALCULATE THE MEAN
Raw Data
• $\bar{x} = \frac{\sum x}{n}$
• $x$ is the data points
• $n$ is the number of observations
Frequency Table
• $\bar{x} = \frac{\sum xf}{n}$
• $x$ is the data points
• $n$ is the number of observations
• $f$ is the frequency
Frequency Table (Intervals)
• $\bar{x} = \frac{\sum xf}{n}$
• $x$ is the midpoints of the intervals
• $n$ is the number of observations
• $f$ is the frequency
CALCULATE THE MEAN - FREQUENCY TABLE:
NUMBER OF FRAUDULENT CHEQUES PER WEEK
Distinct Values   Tally Marks   Frequency
1                 /             1
2                               0
3                 /////         5
4                 ///// //      7
5                 ////          4
6                 ////          4
7                 //            2
8                 ///           3
9                 //            2
10                //            2
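The frequency-table formula for the mean can be checked with a few lines of Python. This is only a sketch: the dictionary `frequency_table` and the function name `mean_from_frequency_table` are illustrative names, not part of the slides.

```python
# Frequency table from the slide:
# number of fraudulent cheques per week -> number of weeks with that count
frequency_table = {1: 1, 2: 0, 3: 5, 4: 7, 5: 4, 6: 4, 7: 2, 8: 3, 9: 2, 10: 2}


def mean_from_frequency_table(table):
    """Return sum(x * f) / n, where n is the sum of the frequencies."""
    n = sum(table.values())
    total = sum(x * f for x, f in table.items())
    return total / n


n = sum(frequency_table.values())
sum_xf = sum(x * f for x, f in frequency_table.items())
mean = mean_from_frequency_table(frequency_table)
print(n, sum_xf, round(mean, 2))  # 30 164 5.47
```

The result agrees with the mean of 164 / 30 = 5.47 calculated from the raw data.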
Truck Data: weights (in tonnes) of 20 fully
loaded trucks
Truck       1     2     3     4     5     6     7     8     9    10
Weight   4.54  3.81  4.29  5.16  2.51  4.63  4.75  3.98  5.04  2.80
Truck      11    12    13    14    15    16    17    18    19    20
Weight   2.52  5.88  2.95  3.59  3.87  4.17  3.30  5.48  4.26  3.53
CALCULATE THE MEAN - GROUPED
FREQUENCY TABLE:
Truck Data: weights (in tonnes) of 20 fully loaded trucks
Class Intervals   Frequency   Midpoint
2.5 ≤ x ≤ 3.0     4           (2.5 + 3.0) ÷ 2 = 2.75
3.0 < x ≤ 3.5     1           3.25
3.5 < x ≤ 4.0     5           3.75
4.0 < x ≤ 4.5     3           4.25
4.5 < x ≤ 5.0     3           4.75
5.0 < x ≤ 5.5     3           5.25
5.5 < x ≤ 6.0     1           5.75
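As a rough illustration of the interval version of the formula, the sketch below uses each class midpoint as the value of $x$. The list `classes` and the function `grouped_mean` are hypothetical names chosen for this example.

```python
# (lower limit, upper limit, frequency) for each class in the truck-weight table
classes = [
    (2.5, 3.0, 4),
    (3.0, 3.5, 1),
    (3.5, 4.0, 5),
    (4.0, 4.5, 3),
    (4.5, 5.0, 3),
    (5.0, 5.5, 3),
    (5.5, 6.0, 1),
]


def grouped_mean(rows):
    """Approximate the mean by treating every observation as its class midpoint."""
    n = sum(freq for _, _, freq in rows)
    total = sum(((lower + upper) / 2) * freq for lower, upper, freq in rows)
    return total / n


truck_mean = grouped_mean(classes)
print(truck_mean)  # sum of (midpoint x frequency) is 81.5, divided by n = 20
```

Because the midpoints stand in for the actual weights, the grouped mean is an approximation of the mean of the raw data.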
MODE
•The mode is the value (or, for grouped data, the
interval) with the HIGHEST FREQUENCY.
•There can be two or more modes in a set
of data; the mode is then not a
good measure of central tendency.
•MULTI-MODAL data have more than
one mode.
•UNI-MODAL data have only one
mode.

GRAPH: The MODE = 4
[Bar chart: Frequency (vertical axis, 0 to 8) against the values 1 to 10 (horizontal axis)]
Call Centre Data: waiting times (in seconds)
for 35 randomly selected customers
Customer    1    2    3    4    5    6    7    8    9   10   11   12
Time       75   37   13   90   45   23  104  135   30   73   34   12
Customer   13   14   15   16   17   18   19   20   21   22   23   24
Time       38   40   22   47   26   57   65   33    9   85   87   16
Customer   25   26   27   28   29   30   31   32   33   34   35
Time      102  115   68   29  142    5   15   10   25   41   49
FREQUENCY TABLE: The MODAL CLASS is the
interval 25 < x ≤ 50
Class Intervals   Tally Marks      Frequency
0 ≤ x ≤ 25        ///// /////      10
25 < x ≤ 50       ///// ///// /    11
50 < x ≤ 75       ///// /          6
75 < x ≤ 100      ///              3
100 < x ≤ 125     ///              3
125 < x ≤ 150     //               2
HISTOGRAM: MODAL CLASS (25; 50]
[Histogram: Frequency (vertical axis, 0 to 12) for the intervals [0;25], (25;50], (50;75], (75;100], (100;125], (125;150]]
THE MEDIAN – RAW DATA:
Number of fraudulent cheques received at a bank each week for 30 weeks
Week       1   2   3   4   5   6   7   8   9  10
Cheques    5   3   8   3   3   1  10   4   6   8
Week      11  12  13  14  15  16  17  18  19  20
Cheques    3   5   4   7   6   6   9   3   4   5
Week      21  22  23  24  25  26  27  28  29  30
Cheques    7   9   4   5   8   6   4   4  10   4
MEDIAN
• Median = 5
• Put all observations in order from smallest to
largest, then the middle observation is the
MEDIAN.
1, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5,
5, 6, 6, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10
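The ordering step can be reproduced in Python. The sketch below sorts the 30 weekly counts, takes the value half-way between the 15th and 16th observations, and cross-checks the result against the standard library's `statistics` module; the variable names are illustrative.

```python
import statistics

# Weekly counts of fraudulent cheques, in week order (from the slide)
cheques = [5, 3, 8, 3, 3, 1, 10, 4, 6, 8,
           3, 5, 4, 7, 6, 6, 9, 3, 4, 5,
           7, 9, 4, 5, 8, 6, 4, 4, 10, 4]

ordered = sorted(cheques)
n = len(ordered)

# n = 30 is even, so the median lies half-way between
# observations 15 and 16 (1-based), i.e. indices 14 and 15 (0-based).
median_by_hand = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(ordered)
print(median_by_hand)               # median found by ordering the data
print(statistics.median(cheques))   # library cross-check
print(statistics.mode(cheques))     # most frequent value
```

Both observations 15 and 16 equal 5, so the median is 5; the most frequent value is 4, matching the mode shown on the graph.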

DON’T FALL INTO THE COMMON TRAP
• The median is NOT the middle of the range of
observations, for example
1, 1, 1, 1, 1, 3, 9
The median is 1 (the middle observation).
The middle of the range, (1 + 9) ÷ 2, is 5! Big
difference!
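A quick check of this example in Python, using only the standard library. The name `midrange` is an illustrative label for "the middle of the range" and does not appear in the slides.

```python
import statistics

data = [1, 1, 1, 1, 1, 3, 9]

median = statistics.median(data)         # the middle (4th) observation
midrange = (min(data) + max(data)) / 2   # the middle of the range

print(median, midrange)
```

The median stays at 1 because most observations are 1, while the middle of the range is pulled up to 5 by the single large value 9.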
MEDIAN
Odd Number of Observations, for example 7
Median Position: $\frac{n+1}{2}$
Even Number of Observations, for example 30
Median Position: half-way between $\frac{n}{2}$ and $\left(\frac{n}{2} + 1\right)$
FIND THE MEDIAN - FREQUENCY TABLE:
NUMBER OF FRAUDULENT CHEQUES PER WEEK
Distinct Values   Frequency   Cumulative Frequency
1                 1           1
2                 0           1
3                 5           6
4                 7           13
5                 4           17
6                 4           21
7                 2           23
8                 3           26
9                 2           28
10                2           30
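One way to read the median off the cumulative frequencies is sketched below. The helper `value_at_position` is a hypothetical name; it returns the distinct value that holds the observation at a given position, and the sketch assumes an even number of observations, as in this example.

```python
from itertools import accumulate

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
freqs = [1, 0, 5, 7, 4, 4, 2, 3, 2, 2]
cumulative = list(accumulate(freqs))  # running total of the frequencies


def value_at_position(position):
    """Return the distinct value holding the observation at a 1-based position."""
    for value, cum in zip(values, cumulative):
        if position <= cum:
            return value
    raise ValueError("position exceeds the number of observations")


n = cumulative[-1]
# n = 30 is even: average the observations at positions n/2 and n/2 + 1
median = (value_at_position(n // 2) + value_at_position(n // 2 + 1)) / 2
print(cumulative)
print(median)
```

Positions 15 and 16 both fall in the row whose cumulative frequency first reaches 17, so the median is 5.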
FIND THE MEDIAN - GROUPED FREQUENCY
TABLE:
Class Intervals   Frequency   Midpoint
2.5 ≤ x ≤ 3.0     4           (2.5 + 3.0) ÷ 2 = 2.75
3.0 < x ≤ 3.5     1           3.25
3.5 < x ≤ 4.0     5           3.75
4.0 < x ≤ 4.5     3           4.25
4.5 < x ≤ 5.0     3           4.75
5.0 < x ≤ 5.5     3           5.25
5.5 < x ≤ 6.0     1           5.75
FIND THE MEDIAN FROM A GROUPED
FREQUENCY TABLE
•Median (middle observation)?
•Find the class interval in which that
observation lies.
CALCULATIONS
Raw Data: Mean, Mode, Median
Frequency Table (Ungrouped Data): Mean, Mode, Median
Frequency Table (Grouped Data): Mean, Mode, Median

HOW TO CHOOSE THE BEST MEASURE OF
LOCATION?
• When choosing the best measure of location, we
need to look at the SHAPE of the distribution.
• For nearly symmetric data, the mean is the best
choice.
• For very skewed (asymmetric) data, the mode or
median is better.
• The mean moves further along the tail than the
median because it is more sensitive to values far from
the centre.
SYMMETRIC histogram:
Mean = Median = Mode
A POSITIVELY SKEWED (skewed to the right)
histogram has a longer tail on the right side:
Mode < Median < Mean
A NEGATIVELY SKEWED (skewed to the left)
histogram has a longer tail on the left side:
Mean < Median < Mode
PROBLEM
•We can find two very different data sets (one
distribution very spread out and another very
concentrated) with EQUAL measures of central
tendency.
•To get a true picture of our sample, we also have to
MEASURE THE SPREAD OF A DISTRIBUTION,
also called the dispersion.
MEASURES OF SPREAD (DISPERSION)
Interquartile Range
Variance
Standard Deviation

MEASURING SPREAD
•Think of a distribution in terms of
percentages, a horizontal axis equally divided
into 100 percentiles.
•The 10th percentile marks the point below
which 10% of the observations fall, and
above which 90% of observations fall.
•The 50th percentile, below which 50% of the
observations lie, is the median.
WORKING WITH A PERCENTILE
• $p\%$ of the observations fall below the $p$th percentile.
$\text{Position} = \frac{p}{100}(n + 1)$
• Working with the example on fraudulent cheques:
1, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6,
7, 7, 8, 8, 8, 9, 9, 10, 10
Position of $P_{50} = \frac{50}{100}(30 + 1) = 15.5$
• 15.5 tells us where to find our 50th percentile.
• 15 tells us which observation to go to, and 0.5 tells us how far to
move along the space between that observation and the next
highest one.
FORMULA
• $P_{50} = x_{15} + 0.5\,(x_{16} - x_{15})$
$P_p = x_k + d\,(x_{k+1} - x_k)$
• $P$ means percentile
• $p$ tells us which percentile
• $k$ is the whole number calculated from the
position
• $d$ is the decimal fraction calculated from the
position
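The position rule and the interpolation formula can be combined into one small function. This is a minimal sketch: the function name `percentile` is illustrative, and the clamping of positions that fall before the first or after the last observation is an added assumption that the slides do not cover.

```python
def percentile(sorted_data, p):
    """p-th percentile using position = (p / 100) * (n + 1) with interpolation."""
    n = len(sorted_data)
    position = p / 100 * (n + 1)
    k = int(position)   # whole-number part: which observation to go to
    d = position - k    # decimal fraction: how far to move towards the next one
    # Assumption (not from the slides): clamp positions outside the data
    if k < 1:
        return sorted_data[0]
    if k >= n:
        return sorted_data[-1]
    # x_k is sorted_data[k - 1] because Python lists are 0-based
    return sorted_data[k - 1] + d * (sorted_data[k] - sorted_data[k - 1])


ordered = [1, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5,
           6, 6, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10]

p50 = percentile(ordered, 50)   # position 15.5
p25 = percentile(ordered, 25)   # position 7.75
p75 = percentile(ordered, 75)   # position 23.25
print(p25, p50, p75)
```

Other software may use a different position rule, so library percentile functions will not necessarily return the same values.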
WORKING WITH PERCENTILES FROM UNGROUPED FREQUENCY DATA:
Distinct Values   Frequency   Cumulative Frequency
1                 1           1
2                 0           1 + 0 = 1
3                 5           1 + 5 = 6
4                 7           6 + 7 = 13
5                 4           13 + 4 = 17
6                 4           17 + 4 = 21
7                 2           21 + 2 = 23
8                 3           23 + 3 = 26
9                 2           26 + 2 = 28
10                2           28 + 2 = 30
WORKING WITH PERCENTILES (AND
MEDIAN) FROM GROUPED DATA
• To identify the class interval $L < x \le U$ containing the
$p$th percentile, calculate the position:
$\text{Position} = \frac{p}{100}(n + 1)$
• The decimal fraction for grouped data is:
$d = \frac{\text{Position} - \text{Sum of class frequencies up to } L}{\text{Frequency of class } L < x \le U}$
• Calculate the $p$th percentile:
$P_p \approx L + d\,(U - L)$
FIND THE MEDIAN - GROUPED FREQUENCY
TABLE:
Class Intervals   Frequency   Cumulative Frequency
2.5 ≤ x ≤ 3.0     4           4
3.0 < x ≤ 3.5     1           5
3.5 < x ≤ 4.0     5           10
4.0 < x ≤ 4.5     3           13
4.5 < x ≤ 5.0     3           16
5.0 < x ≤ 5.5     3           19
5.5 < x ≤ 6.0     1           20

FIND THE MEDIAN - GROUPED FREQUENCY TABLE:
• To identify the class interval $4.0 < x \le 4.5$ containing
the 50th percentile:
$\text{Position} = \frac{50}{100}(20 + 1) = 10.5$
$d = \frac{10.5 - 10}{3} = \frac{1}{6}$
$P_{50} \approx 4.0 + d\,(4.5 - 4.0) = 4.08333$
MEASURING SPREAD
• If we measure the DIFFERENCE in value between
one percentile and another, this gives us an
idea of how widely our data is spread out.
• INTERQUARTILE RANGE (IQR) = 75th – 25th Percentiles
• The bigger the IQR, the more spread out the data.
• The 75th percentile ≥ 25th percentile, therefore the
IQR ≥ 0.
• We tend to use the MEDIAN (as measure of
central tendency) together with the IQR.
FIND THE IQR - GROUPED FREQUENCY
TABLE:
Class Intervals   Frequency   Cumulative Frequency
2.5 ≤ x ≤ 3.0     4           4
3.0 < x ≤ 3.5     1           5
3.5 < x ≤ 4.0     5           10
4.0 < x ≤ 4.5     3           13
4.5 < x ≤ 5.0     3           16
5.0 < x ≤ 5.5     3           19
5.5 < x ≤ 6.0     1           20
• To identify the class interval $4.5 < x \le 5.0$ containing
the 75th percentile:
$\text{Position} = \frac{75}{100}(20 + 1) = 15.75$
$d = \frac{15.75 - 13}{3} = 0.917$
$P_{75} \approx 4.5 + d\,(5.0 - 4.5) = 4.958$
• To identify the class interval $3.5 < x \le 4.0$ containing
the 25th percentile:
$\text{Position} = \frac{25}{100}(20 + 1) = 5.25$
$d = \frac{5.25 - 5}{5} = 0.05$
$P_{25} \approx 3.5 + d\,(4.0 - 3.5) = 3.525$
• IQR = 4.958 – 3.525 = 1.433
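The grouped-data steps (find the position, find the class containing it, then interpolate inside that class) can be sketched as a single function. The names `classes` and `grouped_percentile` are illustrative, and returning the last upper limit when the position lies beyond the final class is an added assumption, not something the slides specify.

```python
# (lower limit L, upper limit U, frequency) for each truck-weight class
classes = [
    (2.5, 3.0, 4),
    (3.0, 3.5, 1),
    (3.5, 4.0, 5),
    (4.0, 4.5, 3),
    (4.5, 5.0, 3),
    (5.0, 5.5, 3),
    (5.5, 6.0, 1),
]


def grouped_percentile(rows, p):
    """Approximate the p-th percentile as L + d * (U - L) for the class holding it."""
    n = sum(freq for _, _, freq in rows)
    position = p / 100 * (n + 1)
    below = 0  # sum of class frequencies up to the lower limit L
    for lower, upper, freq in rows:
        if freq > 0 and position <= below + freq:
            d = (position - below) / freq
            return lower + d * (upper - lower)
        below += freq
    # Assumption (not from the slides): position beyond the last class
    return rows[-1][1]


p25 = grouped_percentile(classes, 25)
p50 = grouped_percentile(classes, 50)
p75 = grouped_percentile(classes, 75)
iqr = p75 - p25
print(round(p25, 3), round(p50, 5), round(p75, 3), round(iqr, 3))
```

The values agree with the hand calculations for the median (4.08333), the 25th percentile (3.525), the 75th percentile (4.958) and the IQR (1.433).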
MEASURING SPREAD
• When we use the MEAN as our measure of central
tendency, we usually choose A MEASURE OF HOW FAR
THE DATA IS SPREAD OUT AROUND THE MEAN.
• Two measures of spread that are based on the mean are
the VARIANCE and the STANDARD DEVIATION.
• An advantage of standard deviation is that it is measured
in the same units as the original observations.
• The variance and standard deviation are closely related.
• The variance (𝒔 𝟐 or 𝝈 𝟐) is the square of the standard
deviation (𝒔 or 𝝈).

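VARIANCE & STANDARD DEVIATION
• Variance is roughly the average of all the squared
distances from the mean:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Or
$s^2 = \frac{1}{n - 1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]$
• Variance is never negative; it is zero only when all observations are equal.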
Number of fraudulent cheques received at a
bank each week for 30 weeks
Week       1   2   3   4   5   6   7   8   9  10
Cheques    5   3   8   3   3   1  10   4   6   8
Week      11  12  13  14  15  16  17  18  19  20
Cheques    3   5   4   7   6   6   9   3   4   5
Week      21  22  23  24  25  26  27  28  29  30
Cheques    7   9   4   5   8   6   4   4  10   4
VARIANCE & STANDARD DEVIATION FROM RAW DATA ($\bar{x} = 5.47$)
Distinct Values   x − x̄               (x − x̄)²               Frequencies: f(x − x̄)²
1                 1 − 5.47 = −4.47    (−4.47)² = 19.9809     19.9809
2                 −3.47               12.0409                0
3                 −2.47               6.1009                 30.5045
4                 −1.47               2.1609                 15.1263
5                 −0.47               0.2209                 0.8836
6                 0.53                0.2809                 1.1236
7                 1.53                2.3409                 4.6818
8                 2.53                6.4009                 19.2027
9                 3.53                12.4609                24.9218
10                4.53                20.5209                41.0418
Σ(x − x̄) = 0 (summed over all 30 observations)
Σ(x − x̄)² = 82.509 (summed over the distinct values)
Σf(x − x̄)² = 157.467
CALCULATE THE VARIANCE & STANDARD DEVIATION -
FREQUENCY TABLE:
Distinct Values   Frequency   Squared Observation
1                 1           1
2                 0           4
3                 5           9
4                 7           16
5                 4           25
6                 4           36
7                 2           49
8                 3           64
9                 2           81
10                2           100
VARIANCE & STANDARD DEVIATION FROM
UNGROUPED FREQUENCY DATA
$s^2 = \frac{1}{n - 1}\left[\sum f x^2 - \frac{(\sum f x)^2}{n}\right]$
• Variance:
$s^2 = \frac{1}{30 - 1}\left[1054 - \frac{(164)^2}{30}\right] = 5.4299$
• Standard deviation: $s = \sqrt{5.4299} = 2.33$
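The shortcut formula can be checked numerically. In the sketch below, the function `variance_from_frequency_table` is an illustrative name; the last lines expand the table back into 30 raw observations and compare the result with the standard library's sample variance, which also divides by n − 1.

```python
import math
import statistics

# Distinct value -> frequency (fraudulent cheques per week)
frequency_table = {1: 1, 2: 0, 3: 5, 4: 7, 5: 4, 6: 4, 7: 2, 8: 3, 9: 2, 10: 2}


def variance_from_frequency_table(table):
    """Sample variance: (sum(f*x^2) - (sum(f*x))^2 / n) / (n - 1)."""
    n = sum(table.values())
    sum_fx = sum(f * x for x, f in table.items())
    sum_fx2 = sum(f * x * x for x, f in table.items())
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


sum_fx2 = sum(f * x * x for x, f in frequency_table.items())
variance = variance_from_frequency_table(frequency_table)
std_dev = math.sqrt(variance)
print(sum_fx2, round(variance, 4), round(std_dev, 2))  # 1054 5.4299 2.33

# Cross-check: rebuild the raw observations and use the library function
raw = [x for x, f in frequency_table.items() for _ in range(f)]
library_variance = statistics.variance(raw)
```

The shortcut formula and the definition based on squared distances from the mean give the same variance, up to rounding.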
FIND THE VARIANCE & STANDARD
DEVIATION - GROUPED FREQUENCY TABLE:
Class Intervals   Frequency   Midpoint   Squared Midpoint
2.5 ≤ x ≤ 3.0     4           2.75       7.5625
3.0 < x ≤ 3.5     1           3.25       10.5625
3.5 < x ≤ 4.0     5           3.75       14.0625
4.0 < x ≤ 4.5     3           4.25       18.0625
4.5 < x ≤ 5.0     3           4.75       22.5625
5.0 < x ≤ 5.5     3           5.25       27.5625
5.5 < x ≤ 6.0     1           5.75       33.0625

VARIANCE & STANDARD DEVIATION FROM
GROUPED DATA
$s^2 = \frac{1}{n - 1}\left[\sum f x^2 - \frac{(\sum f x)^2}{n}\right]$
• Variance:
$s^2 = \frac{1}{20 - 1}\left[348.75 - \frac{(81.5)^2}{20}\right] = 0.8757$
• Standard deviation: $s = \sqrt{0.8757} = 0.94$
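For readers who want to follow the arithmetic, the intermediate steps can be written out as below, where $x$ is the class midpoint and the two sums come from multiplying the midpoint and squared-midpoint columns of the grouped table by the frequencies:

```latex
\begin{align*}
\sum f x &= 81.5, \qquad \sum f x^2 = 348.75, \qquad n = 20 \\
\frac{\left(\sum f x\right)^2}{n} &= \frac{6642.25}{20} = 332.1125 \\
s^2 &= \frac{348.75 - 332.1125}{20 - 1} = \frac{16.6375}{19} \approx 0.8757 \\
s &= \sqrt{0.8757} \approx 0.94
\end{align*}
```

Because midpoints stand in for the actual weights, these are approximations of the variance and standard deviation of the raw truck data.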
CALCULATIONS
Raw Data: IQR, Variance & Standard Deviation
Frequency Table (Ungrouped Data): IQR, Variance & Standard Deviation
Frequency Table (Grouped Data): IQR, Variance & Standard Deviation
BOX-AND-WHISKER DIAGRAM
(5-POINT SUMMARY)
Minimum Value
$Q_1 = P_{25}$
Median
$Q_3 = P_{75}$
Maximum Value
EXAMPLE
Consider the following set of 23 scores:
0 3 4 8 9 12 14 15 16 16 16 18
19 21 22 25 32 34 39 43 54 67 77
1. Find the 5-point summary.
2. Draw a box-and-whisker plot to
illustrate the values.
5 - POINT SUMMARY
0 3 4 8 9 12 14 15 16 16 16 18
19 21 22 25 32 34 39 43 54 67 77
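One possible working for the 5-point summary, assuming the percentile position rule $\frac{p}{100}(n + 1)$ from the earlier slides is applied to the $n = 23$ ordered scores (other quartile conventions may give slightly different values):

```latex
\begin{align*}
\text{Minimum} &= x_1 = 0 \\
\text{Position of } Q_1 &= \tfrac{25}{100}(23 + 1) = 6
  &&\Rightarrow\quad Q_1 = x_6 = 12 \\
\text{Position of median} &= \tfrac{50}{100}(23 + 1) = 12
  &&\Rightarrow\quad \text{Median} = x_{12} = 18 \\
\text{Position of } Q_3 &= \tfrac{75}{100}(23 + 1) = 18
  &&\Rightarrow\quad Q_3 = x_{18} = 34 \\
\text{Maximum} &= x_{23} = 77
\end{align*}
```

All three positions are whole numbers here, so no interpolation between neighbouring scores is needed.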
HOMEWORK
•Example X-Kit textbook page 218 – 223.
•“Practise for your exams” page 224
number 1 & 2.
•Par B.2 (page B5) all odd-numbered
questions.
