2. Summarizing distributions of univariate data
1. Measuring center: median, mean
2. Measuring spread: range, interquartile range, standard deviation
3. Measuring position: quartiles, percentiles, standardized scores (z-scores)
4. Using boxplots
5. The effect of changing units on summary measures
3. Measuring Center
When describing the “center” of a set of
data, we can use the mean or the median.
Mean: “Average” value
Median: “Center” value (Q2)
4. Where is the Center of the Distribution?
If you had to pick a single number to describe all the data, what would you pick?
It’s easy to find the center when a histogram is
unimodal and symmetric—it’s right in the
middle.
On the other hand, it’s not so easy to find the
center of a skewed histogram or a histogram
with more than one mode.
5. Mean
To find the mean of a set of observations, add their values and divide by the number of observations.
$$\bar{x} = \frac{\sum x_i}{n}$$
6. Find the mean of:
2 3 4 6 8 12
$$\bar{x} = \frac{2+3+4+6+8+12}{6} = \frac{35}{6} \approx 5.833$$
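As a quick check of the formula and the arithmetic, here is a minimal Python sketch. The helper name `mean` is our own; Python's standard library also provides `statistics.mean`.

```python
def mean(values):
    """Arithmetic mean: add the observations, then divide by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


data = [2, 3, 4, 6, 8, 12]
print(round(mean(data), 3))  # 5.833
```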
7. Although the mean is the most popular
measure of center, it is not always the most
appropriate.
The mean is very sensitive to extreme
observations (outliers).
Because outliers affect the mean, we say
that the mean is NOT a resistant measure of
center.
So if the mean is not a resistant measure of center, what is? The median.
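To see the difference in resistance, the sketch below takes the seven values from the odd-count median example on slide 10 and (hypothetically) makes the largest value far more extreme, changing 5.40 to 54.0. It uses `statistics.mean` and `statistics.median` from Python's standard library.

```python
from statistics import mean, median

# Seven values from the odd-count median example on slide 10.
original = [0.42, 0.48, 0.66, 0.73, 1.10, 1.10, 5.40]
# Hypothetical change: make the largest value much more extreme.
extreme = original[:-1] + [54.0]

for label, data in [("original", original), ("extreme", extreme)]:
    print(f"{label}: mean = {mean(data):.3f}, median = {median(data):.3f}")
```

The median stays at 0.73 in both cases, while the mean jumps from about 1.41 to about 8.36.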
8. Median
The median is the value with exactly half the data values below it and half above it.
It is the middle data value (once the data values have been ordered) and divides the histogram into two equal areas.
It has the same units as the data.
The median is not influenced by extreme observations, so we say that the median is a resistant measure of center.
9. Finding the Median
First sort the values (arrange them in order),
then follow one of these:
1. If the number of data values is even, the
median is found by computing the mean of
the two middle numbers.
2. If the number of data values is odd, the
median is the number located in the exact
middle of the list.
10. Even number of values:
5.40 1.10 0.42 0.73 0.48 1.10
In order: 0.42 0.48 0.73 1.10 1.10 5.40
There is no single middle value; the middle is shared by two numbers, 0.73 and 1.10:
$$\text{MEDIAN} = \frac{0.73 + 1.10}{2} = 0.915$$
Odd number of values:
5.40 1.10 0.42 0.73 0.48 1.10 0.66
In order: 0.42 0.48 0.66 0.73 1.10 1.10 5.40
The exact middle value is 0.73, so the MEDIAN is 0.73.
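The sort-then-pick rule for the median translates directly into code. This is a minimal Python sketch (the function name `median` is our own; `statistics.median` in the standard library follows the same even/odd rule) that reproduces both worked examples.

```python
def median(values):
    """Median of a list of numbers using the sort-then-pick rule."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)          # first sort the values
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle value
        return ordered[mid]
    # even count: the mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))        # even count
print(median([5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66]))  # odd count
```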
11. Mean vs Median
Mean
• Average value of the variable
• Not resistant to outliers
• A good measure when the data is symmetric
• Farther out in the long tail than the median when the data is skewed
• Easy to find
Median
• Typical value of the variable
• Resistant to outliers
• A reliable measure regardless of the shape of the distribution
• Close to the center even when the data is skewed
• Less prone to mistakes
15. Range
Distance between largest and smallest values.
Range = Maximum – Minimum
Range is useful only if there are no outliers, because it depends entirely on the largest and smallest values.
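A one-line sketch of the range formula in Python. The name `data_range` is hypothetical, chosen to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)


print(data_range([2, 3, 4, 6, 8, 12]))  # 10
```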
16. Interquartile Range
How to find the IQR:
1. Find the median.
2. Find the median of each half of the data: the lower median is the 1st quartile (Q1), and the upper median is the 3rd quartile (Q3).
3. Subtract the two quartiles: IQR = Q3 – Q1.
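The steps do not say what to do with the median itself when the number of values is odd. The sketch below assumes it is left out of both halves, a common textbook and graphing-calculator convention; other software (including Python's `statistics.quantiles`) may report slightly different quartiles. The function names `quartiles` and `iqr` are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumption: when the count is odd, the overall median is excluded
    from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]               # values below the median
    upper = ordered[half + n % 2:]       # values above the median
    return median(lower), median(ordered), median(upper)


def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


data = [2, 3, 4, 6, 8, 12]
print(quartiles(data))  # (3, 5.0, 8)
print(iqr(data))        # 5
```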
17. Outliers
One general rule of thumb identifies as outliers any data points that lie:
More than 1.5 * IQR below Q1
OR
More than 1.5 * IQR above Q3
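A minimal sketch of the 1.5 * IQR rule, taking Q1 and Q3 as inputs. The function names and the parameter `k` (default 1.5) are our own. Applied to the six ordered values from slide 10, where the halves are unambiguous (Q1 = 0.48, Q3 = 1.10), it flags 5.40.

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def flag_outliers(values, q1, q3, k=1.5):
    """Values strictly below the lower fence or above the upper fence."""
    low, high = outlier_fences(q1, q3, k)
    return [x for x in values if x < low or x > high]


# Six values from slide 10; lower half 0.42 0.48 0.73, upper half 1.10 1.10 5.40.
data = [0.42, 0.48, 0.73, 1.10, 1.10, 5.40]
print(flag_outliers(data, q1=0.48, q3=1.10))  # [5.4]
```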
18. Check For Understanding
• The “Descriptive Statistics” of test grades for a certain
class are listed below.
Mean = 74.71
Median = 76
Standard Deviation = 12.61
Minimum = 35
Maximum = 94
Q1 = 68
Q3 = 84
• (a) Determine the IQR for this data.
• (b) Using the answer from part (a), determine whether
the lowest and highest values in the data are outliers.
19. Standard Deviation
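The standard deviation measures how far the observations typically lie from the mean. It is found by dividing the sum of squared deviations from the mean by n − 1 and taking the square root:
$$s_x = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2}$$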
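A direct translation of the formula for $s_x$, using the sample (n − 1) divisor shown on the slide. The name `sample_sd` is hypothetical; the result is compared with the standard library's `statistics.stdev`, which also divides by n − 1.

```python
import math
from statistics import stdev


def sample_sd(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    xbar = sum(values) / n
    squared_devs = sum((x - xbar) ** 2 for x in values)
    return math.sqrt(squared_devs / (n - 1))


data = [2, 3, 4, 6, 8, 12]
print(round(sample_sd(data), 3))
print(round(stdev(data), 3))  # should agree with the line above
```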
20. If the data is uniform or symmetric, use:
Center: mean
Spread: standard deviation
If the data is skewed, use:
Center: median
Spread: five-number summary, range, IQR
21. Distributions with Outliers
Since outliers affect the mean and standard deviation, it is usually better to use the median and IQR.
However, if the distribution is unimodal, use the mean and median and just report the outliers separately.
If you find a simple reason for an outlier, you may eliminate it and, if the remaining data is symmetric, use the mean and standard deviation.
22. Measuring Position
Quartiles
Percentiles
Z-scores
• We can use either z-scores or percentiles to describe the location of an observation in a distribution.
• z-scores use the mean and standard deviation.
• Percentiles use the observation’s position in the ordered data, that is, the percentage of observations that fall below it.
23. Percentiles/Quartiles
• $P_k$ is the notation for the kth percentile
• $Q_n$ is the notation for the nth quartile
$P_{25} = Q_1$
$P_{50} = Q_2 = \text{median}$
$P_{75} = Q_3$
24. Finding Percentiles
If you are trying to find the percentile corresponding to a certain score x:
$$\text{Percentile} = \frac{\text{number of scores less than } x}{\text{total number of scores}} \times 100$$
• Percentiles are often used when reporting academic scores such as SAT scores. Let’s say you get a 620 on the math portion of the SAT. Your score report might also indicate that you are in the “78th percentile”. That means that you scored better than 78% of all students taking that particular SAT.
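The percentile formula counts only scores strictly below x. A minimal Python sketch, with the hypothetical name `percentile_rank`; other sources may use slightly different conventions for ties.

```python
def percentile_rank(scores, x):
    """Percent of scores strictly less than x."""
    if not scores:
        raise ValueError("need at least one score")
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)


print(percentile_rank([2, 3, 4, 6, 8, 12], 6))  # 50.0
```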
25. Measuring Relative Standing With Standardized Values (z-Scores)
• One way to compare an individual to the whole distribution is to describe its location in the distribution relative to the mean.
• Let’s do this by describing how many standard deviations an individual is away from the mean value.
• We call this the “standardized value,” or the “z-score.” In symbols, $z = \frac{x - \bar{x}}{s_x}$.
26. Here is how to interpret z-scores:
A z-score less than 0 represents an element less than
the mean.
A z-score greater than 0 represents an element
greater than the mean.
A z-score equal to 0 represents an element equal to
the mean.
A z-score equal to 1 represents an element that is 1
standard deviation greater than the mean; a z-score
equal to 2, 2 standard deviations greater than the
mean; etc.
A z-score equal to -1 represents an element that is 1
standard deviation less than the mean; a z-score equal
to -2, 2 standard deviations less than the mean; etc.
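A z-score is computed as $z = (x - \bar{x}) / s_x$. The sketch below uses hypothetical names (`z_score`, `standardize`) and, as an assumption, the sample standard deviation from `statistics.stdev` when standardizing a whole list. The printed values match the interpretations above: positive above the mean, negative below, zero at the mean.

```python
from statistics import mean, stdev


def z_score(x, center, spread):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - center) / spread


def standardize(values):
    """z-scores for every value, using the sample mean and sample SD."""
    xbar, s = mean(values), stdev(values)
    return [z_score(x, xbar, s) for x in values]


# Hypothetical distribution with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))  # 1.5
print(z_score(60, 70, 10))  # -1.0
print(z_score(70, 70, 10))  # 0.0
```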
27. Five-Number Summary
The five-number summary of a distribution
consists of the smallest observation, the first
quartile, the median, the third quartile, and the
largest observation, written in order from
smallest to largest.
Minimum Q1 Median Q3 Maximum
28. Boxplots
The five-number summary divides the
distribution roughly into quarters. This leads
to a new way to display quantitative data, the
boxplot.
29. How to make a boxplot:
1. Draw and label a number line that includes
the range of the distribution.
2. Draw a central box from Q1 to Q3.
3. Mark the median M inside the box.
4. Extend lines (whiskers) from the box out to
the minimum and maximum values that are
not outliers.
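The numbers needed to draw a boxplot can be computed as below: the five-number summary plus the whisker ends, which reach the most extreme values that are not outliers under the 1.5 * IQR rule. Quartiles use the median-of-halves method, assuming the overall median is left out of both halves when the count is odd; function names are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum) using the median of each half."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + n % 2:])  # skip the middle value when n is odd
    return ordered[0], q1, median(ordered), q3, ordered[-1]


def whisker_ends(values):
    """Smallest and largest values inside the 1.5 * IQR fences."""
    _, q1, _, q3, _ = five_number_summary(values)
    step = 1.5 * (q3 - q1)
    inside = [x for x in values if q1 - step <= x <= q3 + step]
    return min(inside), max(inside)


data = [0.42, 0.48, 0.66, 0.73, 1.10, 1.10, 5.40]  # odd-count example from slide 10
print(five_number_summary(data))  # (0.42, 0.48, 0.73, 1.1, 5.4)
print(whisker_ends(data))         # (0.42, 1.1)
```

Here 5.40 lies beyond the upper whisker, so it would be plotted as a separate point.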
32. Effect of Changing Units
Sometimes, researchers change units (minutes to hours, feet to meters, etc.). Here is how measures of central tendency are affected when we change units:
If you add a constant to every value, the mean and median increase by the same constant.
Example: Suppose you have a set of scores with a mean equal to 5 and a median equal to 6. If you add 10 to every score, the new mean will be 5 + 10 = 15, and the new median will be 6 + 10 = 16.
If you multiply every value by a constant, the mean and the median will also be multiplied by that constant.
Example: Assume that a set of scores has a mean of 5 and a median of 6. If you multiply each of these scores by 10, the new mean will be 5 * 10 = 50, and the new median will be 6 * 10 = 60.
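The two rules for changing units can be checked numerically. The sketch below uses a made-up set of three scores, [1, 6, 8], chosen only so that its mean is 5 and its median is 6 as in the examples; `shift_and_scale` is a hypothetical helper.

```python
from statistics import mean, median


def shift_and_scale(values, add=0, multiply=1):
    """Multiply every value by `multiply`, then add `add`."""
    return [x * multiply + add for x in values]


# Made-up scores with mean 5 and median 6.
scores = [1, 6, 8]
shifted = shift_and_scale(scores, add=10)      # add 10 to every score
scaled = shift_and_scale(scores, multiply=10)  # multiply every score by 10

for label, data in [("original", scores), ("add 10", shifted), ("times 10", scaled)]:
    print(f"{label}: mean = {mean(data):g}, median = {median(data):g}")
```

Adding 10 gives a mean of 15 and a median of 16; multiplying by 10 gives 50 and 60, matching the examples.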
33. Check For Understanding
The average score on a test is 150 with a
standard deviation of 15. Each score is then
increased by 25. What are the new mean and
standard deviation?
34. Check For Understanding
The test grades from a college statistics class are shown
below.
85 72 64 65 98 78 75 76 82 80 61 92 72 58 65 74 92 85 74 76 77 77
62 68 68 54 62 76 73 85 88 91 99 82 80 74 76 77 70 60
(a) Construct two different graphs of these data
(b) Calculate the five-number summary and the mean and
standard deviation of the data.
(c) Describe the distribution of the data, citing both the plots and the summary statistics found in questions (a) and (b).