This document discusses various measures of variability and dispersion in statistics, including the range, quartiles, interquartile range, percentiles, and five number summary. It provides definitions and examples of each measure. The range is defined as the difference between the highest and lowest values in a data set. Quartiles split a data set into four equal parts, with the first (Q1) and third (Q3) quartiles used to calculate the interquartile range. Percentiles indicate the percentage of values below a given score. The five number summary encapsulates the minimum, first quartile, median, third quartile, and maximum.
Lesson 7 measures of dispersion part 1
1. Introduction to Statistics for Built
Environment
Course Code: AED 1222
Compiled by
DEPARTMENT OF ARCHITECTURE AND ENVIRONMENTAL DESIGN (AED)
CENTRE FOR FOUNDATION STUDIES (CFS)
INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA
2. Lecture 8
Today's Lecture:
• The range
• Quartiles & the interquartile range
• Percentiles
• Percentile rank
• The five number summary
Measures of variability/dispersion
Part I
3. What are Measures of Variation/Dispersion?
• Measures of variation/dispersion give information on the spread or variability of the data values.
• If the data values are widely dispersed, the central location is said to be less representative of the data as a whole.
• If the data values are closely clustered, the central location is considered more reliable.
4. Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc.
Measures of variation/dispersion cont.
• Describe how the data are spread out, i.e. their variability
[Figure: Shape of a Distribution and Variability / Dispersion. Two distributions with the same center but different variation/dispersion: one with close dispersion, one with wide dispersion.]
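The idea of "same center, different dispersion" can be made concrete with a minimal Python sketch. The two data sets below are made up for illustration and do not come from the lecture; the spread is measured simply as largest value minus smallest value.

```python
from statistics import mean

# Hypothetical data sets (not from the lecture): both are centered on 50.
close = [48, 49, 50, 51, 52]   # values packed tightly around the center
wide = [10, 30, 50, 70, 90]    # values spread far from the center

for name, data in (("close", close), ("wide", wide)):
    spread = max(data) - min(data)  # largest value minus smallest value
    print(name, "mean =", mean(data), "spread =", spread)
```

Both data sets report the same mean, yet the second is spread over a much wider interval, so its mean says less about where a typical value lies.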
5. Measures of variation/dispersion cont.
Shape of a Distribution and Variability / Dispersion
• The shape of a distribution is either symmetric or skewed
[Figure: three distributions, labelled Left-Skewed, Symmetric, and Right-Skewed, each marked with the positions of Q1, Q2, and Q3]
6. The range
What is the range?
• The range is simply the difference between the largest and the smallest observed values in a data set. Because it includes any outliers, the range is the actual spread of the data.
• A great deal of information is ignored when computing the range, since only the largest and smallest data values are considered.
Range = difference between highest and lowest observed values
7. The range cont.
Find the range
• Simplest measure of variation
• Difference between the largest and the smallest observations:
$\text{Range} = x_{\text{maximum}} - x_{\text{minimum}}$
Example:
[Figure: data plotted on a number line running from 0 to 14; the smallest observation is 1 and the largest is 14]
Range = 14 - 1 = 13
8. The range cont.
• The range of a data set is greatly influenced by the presence of just one unusually large or small value (an outlier).
• The range can be expressed as an interval such as 4–10, where 4 is the lowest value and 10 is the highest.
• Often, it is expressed as the interval width. For example, the range 4–10 can also be expressed as a range of 6.
9. The range cont.
Disadvantages of the Range
• Ignores the way in which data are distributed
[Figure: two dot plots on an axis from 7 to 12 with differently distributed values; both have Range = 12 - 7 = 5]
• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
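The sensitivity to outliers can be checked directly. The following is a minimal Python sketch using the two data sets from this slide; the function name `value_range` is an illustrative choice, not something defined in the lecture.

```python
def value_range(data):
    """Return the range: largest observed value minus smallest."""
    return max(data) - min(data)

# The two data sets from the slide differ only in their last value.
without_outlier = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                   2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5]
with_outlier = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 120]

print(value_range(without_outlier))  # 5 - 1 = 4
print(value_range(with_outlier))     # 120 - 1 = 119
```

Changing a single value moves the range from 4 to 119, even though the other 24 observations are identical.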
10. The range cont.
• Another disadvantage of the range is that it does not measure the spread of the majority of values in a data set; it only measures the spread between the highest and lowest values.
• As a result, other measures are required in order to give a better picture of the data spread.
• The range is an informative supplement to other measures such as the standard deviation or the semi-interquartile range, but it should rarely be used as the only measure of spread.
11. The range cont.
Example 1:
1, 2, 4, 6, 12, 15, 19, 26
Smallest value = 1, largest value = 26
Range = 26 - 1 = 25
The range does not take into account how clumped together the scores are.
Question: Is range a good measure of spread/dispersion?
Question: Is the value 25 a good representative value?
12. The range cont.
Example 2:
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Smallest value = 425, largest value = 615
Range = 615 - 425 = 190
The range does not take into account how clumped together the scores are.
Question: Is range a good measure of spread/dispersion?
13. The quartiles
What are quartiles?
• In descriptive statistics, a quartile is any of the three values which divide the sorted (arrayed) data set into four equal parts, so that each part represents one fourth of the sampled population.
• The median divides the data into two equal sets.
• The lower quartile is the middle value of the first set: 25% of the values are smaller than it and 75% are larger. This first quartile is denoted Q1.
• The upper quartile is the middle value of the second set: 75% of the values are smaller than it and 25% are larger. This third quartile is denoted Q3.
14. The quartiles cont.
• Quartiles split the ranked data into 4 equal groups:
[Figure: ranked data divided into four groups of 25% each by Q1, Q2, and Q3]
> first quartile (designated Q1) = lower quartile = cuts off lowest 25% of data = 25th percentile
> second quartile (designated Q2) = median = cuts data set in half (50% of data) = 50th percentile
> third quartile (designated Q3) = upper quartile = cuts off highest 25% of data, or lowest 75% = 75th percentile
• Note that the second quartile Q2 (the 50th percentile) is the median
15. The quartiles cont.
Locating the position of the quartiles
The formula for locating the position of the observation at a given percentile, y, with n data points sorted in ascending order is:
$i = L_y = \frac{y}{100}\,n$
• Case 1: If L is a whole number, then the value is found halfway between positions L and L+1.
• Case 2: If L is not a whole number, round up to the next whole number (for example, L = 1.2 becomes 2) and use the value in that position.
16. The quartiles cont.
Example:
Sample Data in Ordered Array:
11 12 13 16 16 17 18 21 22
• Find the first quartile (Q1)
n = 9
Q1 = 25th percentile, so find i:
$i = \frac{25}{100}(9) = 2.25$
so round up and use the value in the 3rd position: Q1 = 13
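The position rule and its two cases can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `percentile_by_position` is hypothetical, and the handling of p = 100 is an assumption, since the slides do not say what to do when position L+1 does not exist.

```python
import math

def percentile_by_position(sorted_data, p):
    """Value at the p-th percentile using the position rule i = (p/100) * n.

    Case 1 (i is whole): average the values at positions i and i + 1.
    Case 2 (i is fractional): round up and take the value at that position.
    Positions are 1-based, as on the slides.
    """
    if not 0 < p <= 100:
        raise ValueError("p must be in the interval (0, 100]")
    n = len(sorted_data)
    i = p * n / 100
    if i == int(i):
        i = int(i)
        # Assumption (not stated in the slides): at p = 100 there is no
        # position i + 1, so return the largest value.
        if i >= n:
            return sorted_data[-1]
        return (sorted_data[i - 1] + sorted_data[i]) / 2
    return sorted_data[math.ceil(i) - 1]

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(percentile_by_position(data, 25))  # i = 2.25, use 3rd position: 13
```

Other statistical packages use different interpolation conventions, so their quartiles for small samples may differ slightly from the values this rule gives.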
17. The Interquartile Range
What is the interquartile range?
• The interquartile range is another range used as a measure of spread.
• The difference between the upper and lower quartiles (Q3 - Q1), which is called the interquartile range, also indicates the dispersion of a data set.
• The interquartile range spans the middle 50% of a data set and largely removes the influence of outliers because, in effect, the highest and lowest quarters are discarded.
18. The interquartile range cont.
• Can eliminate some outlier problems by using the interquartile range
• Eliminate some high- and low-valued observations and calculate the range from the remaining values.
• Interquartile range = 3rd quartile - 1st quartile
Interquartile range = difference between upper quartile (Q3) and lower quartile (Q1)
19. The interquartile range cont.
Example:
[Figure: box plot with X minimum = 12, Q1 = 30, median (Q2) = 45, Q3 = 57, and X maximum = 70; each of the four segments holds 25% of the data]
Interquartile range = 57 - 30 = 27
20. An exercise
A year ago, Ali began working at a computer store. His supervisor asked him to keep a record of the number of sales he made each month.
The following data set is a list of his sales for the last 12 months: 34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37.
• Use Ali's sales records to find:
  - the median
  - the range
  - the upper and lower quartiles
  - the interquartile range
21. Exercise cont.
The values in an ascending array are:
1, 11, 15, 19, 20, 24, 28, 34, 37, 47, 50, 57.
Median position = (12 + 1) ÷ 2 = 6.5th value
Median = (6th + 7th observations) ÷ 2
= (24 + 28) ÷ 2
= 26
Range = difference between the highest and lowest values
= 57 - 1
= 56
22. Exercise cont.
Lower quartile = value of the middle of the first half of the data
Q1 = the median of 1, 11, 15, 19, 20, 24
= (3rd + 4th observations) ÷ 2
= (15 + 19) ÷ 2
= 17
Upper quartile = value of the middle of the second half of the data
Q3 = the median of 28, 34, 37, 47, 50, 57
= (3rd + 4th observations) ÷ 2
= (37 + 47) ÷ 2
= 42
Interquartile range = Q3 - Q1 = 42 - 17 = 25
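The exercise solution can be reproduced with a short Python sketch. It follows the median-of-halves approach used in the worked answer; the variable names are illustrative, and the splitting assumes an even number of observations, as in Ali's data.

```python
from statistics import median

sales = [34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37]
ordered = sorted(sales)
n = len(ordered)

# Assumes n is even, so the data split cleanly into two halves of n/2 values.
half = n // 2
lower_half = ordered[:half]
upper_half = ordered[half:]

q1 = median(lower_half)   # middle of the first half
q2 = median(ordered)      # the median of all values
q3 = median(upper_half)   # middle of the second half

print("median =", q2)                         # 26.0
print("range =", ordered[-1] - ordered[0])    # 56
print("Q1 =", q1, "Q3 =", q3)                 # 17.0 and 42.0
print("interquartile range =", q3 - q1)       # 25.0
```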
24. Percentiles
What is a percentile?
• The a-th percentile is a value such that roughly a% of the data are smaller and (100 - a)% of the data are larger.
• There are three steps for computing a percentile:
  1. Organize the data into an ascending array;
  2. Count the number of values (n);
  3. Select the observation in position a*(n+1), with a written as a proportion (for example, 0.50 for the 50th percentile).
25. Percentiles cont.
• The pth percentile in an ordered array of n values is the value in the ith position, where
$i = \frac{p}{100}(n)$
• Example: Find the 60th percentile in an ordered array of 19 values.
$i = \frac{p}{100}(n) = \frac{60}{100}(19) = 11.4$
If i is not an integer, round up to the next higher integer value.
So use the value in the i = 12th position.
26. Percentiles cont.
You can't always be so lucky as to have a*(n+1) be a nice whole number (with a written as a proportion). Here are some scenarios:
• If a*(n+1) is not a whole number, then go halfway between the two adjacent observations.
• If a*(n+1) < 1, select the smallest observation.
• If a*(n+1) > n, select the largest observation.
27. Percentiles cont.
Example:
• Compute the 50th percentile for the following data set:
73, 58, 67, 93, 33, 18, 147
• Arrayed data: 18, 33, 58, 67, 73, 93, 147
• There are 7 observations (n = 7).
• Select the 0.50*(7+1) = 4th observation.
• Therefore, the 50th percentile equals 67.
- Notice that there are three observations larger than 67 and three observations smaller than 67.
28. Percentiles cont.
Example (cont.): arrayed data 18, 33, 58, 67, 73, 93, 147
• Suppose we want to compute the 20th percentile…
  Notice that p*(n+1) = 0.20*(7+1) = 1.6. This is not a whole number, so we go halfway between the 1st and 2nd observations: (18 + 33) ÷ 2 = 25.5.
• Suppose we want to compute the 10th percentile…
  Since 0.10*(7+1) = 0.8 is less than 1, we select the smallest observation, which is 18.
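The (n+1) position method, including its three special scenarios, can be sketched in Python. The function name `percentile_n_plus_1` is hypothetical, and the percentile is passed in percent (50 for the 50th percentile) rather than as a proportion.

```python
def percentile_n_plus_1(data, a):
    """a-th percentile (a given in percent) via position (a/100) * (n + 1)."""
    ordered = sorted(data)
    n = len(ordered)
    pos = a * (n + 1) / 100  # 1-based position in the arrayed data
    if pos < 1:
        return ordered[0]    # position below 1: smallest observation
    if pos > n:
        return ordered[-1]   # position above n: largest observation
    lower = int(pos)         # whole-number part of the position
    if pos == lower:
        return ordered[lower - 1]
    # Not a whole number: go halfway between the two adjacent observations.
    return (ordered[lower - 1] + ordered[lower]) / 2

data = [73, 58, 67, 93, 33, 18, 147]
print(percentile_n_plus_1(data, 50))  # position 4.0: 67
print(percentile_n_plus_1(data, 20))  # position 1.6: (18 + 33) / 2 = 25.5
print(percentile_n_plus_1(data, 10))  # position 0.8: smallest value, 18
```

Note that "halfway" here always means the midpoint of the two neighbours, as the slides describe; it is not a linear interpolation weighted by the fractional part of the position.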
29. Percentile Rank
What is percentile rank?
• The percentile rank of a score is the percentage of scores in its frequency distribution that are lower than it.
• Percentile ranks are commonly used to clarify the interpretation of scores on standardized tests.
• Given formula:
$\text{Pr} = \frac{\text{number of values below the score} + 0.5}{\text{total number of values}} \times 100\%$
30. Percentile Rank cont.
Example:
Sample Data in Ordered Array:
11 12 13 16 16 17 18 21 22
• Find the percentile rank for a score of 17
n = 9
Pr(17) = [(no. of values below the score of 17 + 0.5) / no. of scores] * 100
= [(5 + 0.5) / 9] * 100
= 61.1, i.e. the 61st percentile.
This value indicates that the student who obtained a score of 17 marks scored higher than about 61% of the members of his/her class.
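The percentile rank formula translates directly into a few lines of Python. This is a minimal sketch of the formula as given on the slide; the function name `percentile_rank` is an illustrative choice.

```python
def percentile_rank(data, score):
    """Percentile rank = (values below the score + 0.5) / total values * 100."""
    below = sum(1 for x in data if x < score)  # strictly lower values only
    return (below + 0.5) / len(data) * 100

marks = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(round(percentile_rank(marks, 17)))  # (5 + 0.5) / 9 * 100, rounds to 61
```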
31. The five number summary
What is the five number summary?
A five number summary uses percentiles to describe a set of data. The five number summary consists of
• MAX - the maximum value
• 75% - the 75th percentile
• 50% - the 50th percentile (or the median)
• 25% - the 25th percentile
• MIN - the minimum value
The five number summary splits the data into four regions, each of which contains 25% of the data.
32. The five number summary cont.
Example:
• Find the five number summary for the data set.
1, 3, 4, 5, 6, 7, 9
• The minimum value is 1
• The lower half is {1, 3, 4}, and the median of that half (the 25th percentile) is 3
• The median (the 50th percentile) is 5
• The upper half is {6, 7, 9}, and the median of that half (the 75th percentile) is 7
• The maximum value is 9
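A five number summary along the lines of this example can be sketched in Python. The function name `five_number_summary` is hypothetical. Following the example, the sketch leaves the median out of both halves when the number of observations is odd; it assumes at least two observations.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]       # excludes the median when n is odd
    upper_half = ordered[n - half:]   # excludes the median when n is odd
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

print(five_number_summary([1, 3, 4, 5, 6, 7, 9]))  # (1, 3, 5, 7, 9)
```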
33. Next class…
The following topics will be discussed:
• Measures of variability / dispersion (Part II):
  • The average absolute deviation
  • The variance
  • The standard deviation
  • Coefficient of variation (CV)