2. Statistics
• Quite often you will encounter statistics
in your day-to-day life. These statistics
may be mentioned on the news,
commercials, weather reports or the
Internet. It's important for you to be able
to interpret these statistics, as they may
affect your opinion and the choices you
make.
3. Measures of Central Tendency
• In this lesson you will learn about
measures of central tendency. Large
amounts of data are often summarized
by stating the values of the mean,
median and mode.
• Although these three measures of central
tendency are usually located near the
centre of a group of data, they often have
different calculated values.
4. Calculating the
Mean, Median and Mode
• The word 'average' is often used in everyday
language to describe the sum of a set of values
divided by the total number of values.
• In statistics, this term is known as the mean or
arithmetic mean.
• There are two other common statistical terms that
are used to refer to the centre of a set of values:
the median and the mode.
• The median of a set of values is the middle value
when the values are arranged in ascending or
descending order. The mode of a set of values is
the value that occurs the most often.
5. Calculating the Mean, Median and
Mode
Example 1:
• Calculate the mean, the median and the mode
for the following set of values: 55, 62, 70, 77,
78, 78
• Sample Solution:
Mean = (55 + 62 + 70 + 77 + 78 + 78)/6 = 420/6 = 70
Median = (70 + 77)/2 = 73.5
(If there is an even number of values you must
average the two values in the middle.)
6. Calculating the Mean, Median and
Mode
Example 1:
• Sample Solution:
Mode = 78 (the number that occurs the most
often)
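The three measures in Example 1 can be checked with a short Python sketch. It relies on the standard library `statistics` module; the variable names are chosen here for illustration only.

```python
from statistics import mean, median, mode

# The set of values from Example 1.
values = [55, 62, 70, 77, 78, 78]

mean_value = mean(values)      # sum of the values divided by how many there are
median_value = median(values)  # even count, so the two middle values are averaged
mode_value = mode(values)      # the value that occurs most often

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

With an odd number of values, `median` would return the single middle value instead of averaging two.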
7. Calculating the Mean, Median and
Mode
Example 2:
• Kansas Ross has a mean mark of 50% for her
first three math tests and then she earns a mark
of 70% on her fourth test. Kansas states that
since the average of 50 and 70 is 60, her new
mean math mark is 60%. Do you think Kansas
is correct? Explain your reasoning.
8. Calculating the Mean, Median and
Mode
Example 2:
• Sample Solution:
• Kansas is incorrect. If her first three tests have a
mean mark of 50%, then the sum of her first
three marks must be 150 (150/3 = 50). The
sum of her four marks is then 150 + 70 = 220, so
the mean of her four tests is 220/4 = 55.
• Averaging 50 and 70 gives the fourth test as much
weight as the first three tests combined. Each of the
four tests should be weighted equally, which gives a
mean of 55%, not 60%.
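The reasoning in Example 2 can be sketched in Python as follows. The function name `updated_mean` is a hypothetical helper introduced for this illustration; it rebuilds the old total from the old mean and count, then adds the new mark.

```python
def updated_mean(old_mean, old_count, new_value):
    """Mean after adding one new value to a group with a known mean and size."""
    old_total = old_mean * old_count          # 50 * 3 = 150
    new_total = old_total + new_value         # 150 + 70 = 220
    return new_total / (old_count + 1)        # 220 / 4

correct_mean = updated_mean(50, 3, 70)

# Kansas's method: averaging the old mean with the new mark.
# This treats one test as if it counted as much as three.
kansas_mean = (50 + 70) / 2

print("correct mean:", correct_mean)
print("Kansas's value:", kansas_mean)
```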
9. The Effect of Outliers on the
Measures of Central Tendency
• In statistics, an outlier is an observation that is
numerically distant from the rest of the data. It is
a value that lies outside (and is much larger or
smaller than) the other values in a set of data.
• For example, in the scores 3, 25, 27, 28, 29, 32,
33, 85, both 3 and 85 are considered "outliers"
because they are numerically distant from the
other numbers in the data set.
10. The Effect of Outliers on the
Measures of Central Tendency
Example:
• There are six people in a group who are 61, 61,
63, 64, 66, and 90 inches tall.
a) Determine the mean, median and mode.
b) What is the outlier? If you remove the
outlier, which measure(s) of central
tendency are affected the most?
11. The Effect of Outliers on the
Measures of Central Tendency
Solution:
• The mean is 67.5 (405/6), the median is 63.5 (halfway between
63 and 64) and the mode is 61.
• 90 is the outlier. If you remove the 90 from the set of
values, the new mean is 63, and the median is also 63.
The mode is unchanged at 61. The mean drops by 4.5 while
the median drops by only 0.5, so the outlier affects the
mean the most. The median depends only on the position of
the outlier in the ordered list, not on its actual value:
whether it was 90 or 130, taking it out would have the
same effect on the median. The mean depends on the
actual value of the outlier.
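A short Python sketch can confirm these figures and the point about 90 versus 130. It uses the standard library `statistics` module; the list with 130 is a variation added here purely to illustrate the last remark.

```python
from statistics import mean, median, mode

heights = [61, 61, 63, 64, 66, 90]   # the six heights from the example
without_outlier = [61, 61, 63, 64, 66]
larger_outlier = [61, 61, 63, 64, 66, 130]  # same data with 130 in place of 90

for label, data in (
    ("with 90", heights),
    ("without outlier", without_outlier),
    ("with 130", larger_outlier),
):
    print(label, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

The median is the same for the lists containing 90 and 130, while the mean rises with the size of the outlier.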
12. Determining the Trimmed Mean
• A trimmed mean is calculated by discarding a
certain percentage of the lowest and the highest
scores, then calculating the mean of the
remaining scores.
• For example, a mean trimmed 25% is calculated by
discarding the top and bottom 25% of the scores,
then taking the mean of the remaining scores.
• A trimmed mean is less susceptible to the effects of
extreme scores (outliers) than the arithmetic mean.
The trimmed mean is designed to reduce the
impact of outliers.
13. Determining the Trimmed Mean
• These are the steps to follow to determine
the trimmed mean:
– Find the number of observations, denoted 'n'.
– Reorder them from smallest to largest.
– Find the proportion trimmed, p = P/100, where P = %
trimmed.
– Calculate np to determine how many values to trim at
each end.
– Discard those values and calculate the mean of the
remaining values.
14. Determining the Trimmed Mean
Example:
• Find the 10% trimmed mean of 2, 35, 46, 47,
51, 51, 59, 60, 61, 121.
15. Determining the Trimmed Mean
Solution:
• n = 10, p = 10/100 = 0.10, np = 10 x 0.10 = 1, which
is an integer, so trim exactly one observation at
each end. Trim off 2 and 121, which leaves you
with 8 observations: 35, 46, 47, 51, 51, 59, 60, 61.
• The 10% trimmed mean is the mean of these 8 values:
410/8 = 51.25.
• If np has a fractional part, you can
round it to the nearest integer to
determine how many values to trim at each end.
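The trimming steps can be sketched in Python as below. The function name `trimmed_mean` and the choice to round an exact half upward are assumptions made for this illustration; the slides only say to round np to the nearest integer.

```python
import math

def trimmed_mean(values, percent):
    """Mean after discarding `percent` % of the values at each end."""
    ordered = sorted(values)                  # reorder from smallest to largest
    n = len(ordered)                          # number of observations
    # np rounded to the nearest integer (assumption: an exact half rounds up)
    k = math.floor(n * percent / 100 + 0.5)
    if 2 * k >= n:
        raise ValueError("trimming would discard every value")
    kept = ordered[k:n - k]                   # drop k values at each end
    return sum(kept) / len(kept)

scores = [2, 35, 46, 47, 51, 51, 59, 60, 61, 121]
ten_percent_trimmed = trimmed_mean(scores, 10)
untrimmed = trimmed_mean(scores, 0)

print("10% trimmed mean:", ten_percent_trimmed)
print("ordinary mean:", untrimmed)
```

Comparing the two printed values shows how much the extreme scores 2 and 121 pull on the ordinary mean.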
17. Determine the Weighted Mean of a
Set of Data
• The weighted mean is similar to an arithmetic
mean (the most common type of average). But
with the weighted mean, instead of each of the data
points contributing equally to the final average, some
data points contribute more than others. If all
the weights are equal, then the weighted mean
is the same as the arithmetic mean.
19. Determine the Weighted Mean of a
Set of Data
• Consider the following example:
• Your math teacher has two math classes. One
class has 5 students, while the other has 10
students. The grades in each class on a test
were:
Class 1: 55, 69, 80, 84, 62
Class 2: 70, 90, 55, 84, 88, 93, 78, 69, 98, 75
• The mean for class 1 is 70, and the mean for
class 2 is 80.
20. Determine the Weighted Mean of a
Set of Data
• If you average the two class means
you get 75 (70 + 80 = 150, 150/2 = 75).
However, this does not account for the different
number of students in each class, and the value
of 75 does not reflect the mean student grade
for all 15 students.
• The accurate mean for all of the
students, without regard to which class they are
in, can be found by totalling all of the grades
and dividing by 15 students:
(350 + 800)/15 = 1150/15, or about 76.7.
21. Determine the Weighted Mean of a
Set of Data
• This can also be accomplished by using a
weighted mean of the class means:
(5 x 70 + 10 x 80)/(5 + 10) = 1150/15, or about 76.7
• The use of a weighted mean makes it possible to
find the mean student grade in the case where
only the class means and the number of
students in each class are available.
22. Determine the Weighted Mean of a
Set of Data
• To calculate the weighted mean for a set of
data follow these steps:
– Multiply each value by its weight.
– Add up the products of value multiplied by weight to
get the total value.
– Add the weights themselves to get the total weight.
– Divide the total value by the total weight.
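The steps for a weighted mean can be sketched in Python as follows. The function name `weighted_mean` is a hypothetical helper for this illustration; it is applied to the two class means from the earlier example, with the class sizes as weights.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of value x weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total_value = sum(v * w for v, w in zip(values, weights))  # multiply, then add
    total_weight = sum(weights)                                # add the weights
    return total_value / total_weight                          # divide

class_means = [70, 80]   # mean grade in class 1 and class 2
class_sizes = [5, 10]    # number of students in each class

overall_mean = weighted_mean(class_means, class_sizes)   # 1150 / 15
equal_weights = weighted_mean(class_means, [1, 1])       # ordinary mean of 70 and 80

print("weighted by class size:", round(overall_mean, 1))
print("equal weights:", equal_weights)
```

With equal weights the result matches the arithmetic mean of the two class means, as noted earlier.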
23. Determine the Weighted Mean of a
Set of Data
Example:
• One hundred people were surveyed to
find out how many days they exercised
per week. The following chart
summarizes the results of the survey.
What is the mean number of days that
this group of people exercised per week?
Number of days of exercise per week:   0   1   2   3   4   5   6   7
Number of people:                      6   5   7  15  29  16  14   8
24. Determine the Weighted Mean of a
Set of Data
Sample Solution:
• 1. Multiply each value by its weight.
0 x 6 = 0, 1 x 5 = 5, 2 x 7 = 14, 3 x 15 = 45, 4 x 29 = 116, 5 x 16 = 80,
6 x 14 = 84, 7 x 8 = 56
• 2. Add up the products of value multiplied by weight to get the
total value.
Sum = 0 + 5 + 14 + 45 + 116 + 80 + 84 + 56 = 400
• 3. Add the weights themselves to get the total weight.
6 + 5 + 7 + 15 + 29 + 16 + 14 + 8 = 100 people in the survey.
• 4. Divide the total value by the total weight.
400/100 = 4 days
The mean number of days that this group of people exercised per
week is 4 days.
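One way to see why the weighted calculation works is to expand the chart back into 100 individual answers and take their ordinary mean. The Python sketch below does this; the variable names are chosen here for illustration.

```python
from statistics import mean

days = [0, 1, 2, 3, 4, 5, 6, 7]          # number of days of exercise per week
people = [6, 5, 7, 15, 29, 16, 14, 8]    # number of people giving each answer

# Rebuild the individual answers: 0 appears 6 times, 1 appears 5 times, and so on.
responses = [d for d, count in zip(days, people) for _ in range(count)]

mean_of_responses = mean(responses)

# The same result from the weighted mean steps.
weighted = sum(d * count for d, count in zip(days, people)) / sum(people)

print("people surveyed:", len(responses))
print("mean of individual answers:", mean_of_responses)
print("weighted mean:", weighted)
```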
25. Practice
• Statistics: Measures of Central Tendency, Worksheet #1
26. Statistics
• One of the most fundamental principles in statistics
is that of variability. The study and understanding
of variability is important in medicine, manufacturing,
science, meteorology, business and many aspects of
our daily lives.
How effective is a particular drug? What is the
average life span of a D-cell battery? What's the
probability of precipitation tomorrow? Which age
group watches the most TV in a week?
Statistics, and more specifically the study of
variability, help us to answer questions like these.
In this unit you will be learning about the variability
of data.
27. Percentiles
• One way to find out how well you have done on
a test is to convert your test mark to a percent
score. This percent score indicates how well
you would have done on the test if it were
marked out of 100.
• Although somewhat meaningful in itself, your
score takes on more meaning when it is
compared to that of your classmates. How
many students scored higher than you did?
How many scored lower? The study of
percentiles helps us answer these questions.
28. Percentiles
• Cara writes a test and scores 48 out of
a possible 60 marks.
• As a percent score, 48 out of 60
is 80%.
• A mark of 80% seems like a very good
mark. It is often given a letter grade of "A"
and is associated with excellence.
• Is it?
29. Percentiles
• Suppose 100 students have written the same test as Cara.
Suppose only 10 of these students score less than 48 out of 60.
How does Cara's mark compare with the marks of the other
students who have written the same test? Assume that no other
student has scored exactly 48 out of 60.
Solution
You can compare Cara's mark with those of the other students
who have written the same test using a percent bar.
Since the majority of the students have scored higher than 48 out
of 60, Cara's mark of 48 out of 60 is not that impressive.
30. Percentiles
• Suppose, however, that of the 100 students who have written the
same test as Cara, 90 of them score lower than 48 out of 60. How
does Cara's mark now compare with the marks of the other
students? Assume no other student scores 48 out of 60.
Solution
You can again compare Cara's mark with those of the other
students with a percent bar.
Relative to the other test scores, Cara's mark of 48 out of 60 is
very good.
31. Percentile Rank
• A score becomes more meaningful when
it is compared to other scores. One way
to compare a score is to assign it a
percentile rank.
A percentile rank indicates the percent
of all scores that fall below a particular
score.
32. Percentile Rank
• In the first example, where 10% of the students have
scored below 48 out of 60, Cara's percentile rank would
be 10 out of 100 or in the 10th percentile. A mark in the
10th percentile indicates that Cara has scored better than
only 10% of all the students who have written the test.
In the second example, where 90% of students have
scored below 48 out of 60, Cara's percentile rank would
be 90 out of 100 or in the 90th percentile. A mark in the
90th percentile indicates that Cara has scored better than
90% of the students who have written the test.
• A percentile rank compares the number of scores less
than or equal to a given score to the total number of
scores. The higher the percentile rank, the better the
score compares to the other scores. The lower the
percentile rank, the poorer the score compares to the
other scores.
33. Calculating Percentile Rank
• The formula is as follows:
$$\text{percentile rank} = \frac{B + 0.5E}{n} \times 100$$
B = the number of scores Below a given score
E = the number of scores Equal to the given score, including the
given score.
If there are no other scores equal to the given score,
then E = 1.
n = the total number of scores
The percentile formula takes all the scores less than the given
score (B) and adds these to half the scores equal to the given
score (E). This sum is then converted to a percent (percentile) by
dividing by the total number of scores (n) and then multiplying that
value by 100.
Note that the percentile rank is usually rounded up to the next
whole number.
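The percentile rank formula can be sketched in Python as below. The function name `percentile_rank` and the list of test scores are hypothetical, made up for this illustration, and the sketch assumes the given score appears in the list so that E is at least 1.

```python
import math

def percentile_rank(scores, score):
    """Percentile rank of `score`, rounded up to the next whole number."""
    below = sum(1 for s in scores if s < score)    # B: scores below the given score
    equal = sum(1 for s in scores if s == score)   # E: scores equal to it (itself included)
    n = len(scores)                                # n: total number of scores
    return math.ceil(100 * (below + 0.5 * equal) / n)

# Hypothetical test scores out of 60 for a class of ten students.
test_scores = [40, 45, 48, 48, 50, 52, 55, 58, 60, 60]

for mark in (40, 48, 52, 60):
    print(mark, "is at percentile rank", percentile_rank(test_scores, mark))
```

Counting only half of the tied scores is what keeps the top mark in a group from reaching a percentile rank of 100.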
35. Standard Deviation
• Measures of central tendency give us a sense
of the "average" of all values in a set of data.
The range measures the variability of the data
in that it is the difference between the greatest
and least values. Although useful, these
statistics don't give a complete picture of the
data set.
Standard deviation is a more complex measure
of variability that is based on the distance of
each piece of data from the mean.
36. Standard Deviation
• The standard deviation of a sample is
represented by the symbol Sx and is
calculated using the following formula:
$$S_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
37. Standard Deviation
• To calculate the standard deviation follow the 6
steps outlined below:
• Step 1
Determine the mean ($\bar{x}$).
• Step 2
Determine the difference between each score
(x) and the mean ($\bar{x}$). This calculation is
represented by the following:
$$x - \bar{x}$$
38. Standard Deviation
• Step 3
Square each difference by multiplying each
difference by itself. This calculation is
represented by the following:
$$(x - \bar{x})^2$$
• Step 4
Determine the sum of these squares. This sum
is represented by the following:
$$\sum (x - \bar{x})^2$$
39. Standard Deviation
• Step 5
Divide the sum of the squares by n - 1. (Recall that n
is the number of values.)
This calculation is called the variance and is
represented by the following:
$$\frac{\sum (x - \bar{x})^2}{n - 1}$$
• Step 6
To determine the standard deviation, calculate the
square root of the variance.
This calculation determines the standard deviation and
is represented by the following:
$$S_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
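The six steps can be followed one at a time in a short Python sketch. The function name `sample_standard_deviation` and the data set are hypothetical, introduced only for this illustration; the result is compared with `stdev` from the standard library `statistics` module, which also divides by n - 1.

```python
import math
from statistics import stdev

def sample_standard_deviation(data):
    """Sample standard deviation, following the six steps one by one."""
    n = len(data)
    mean = sum(data) / n                          # Step 1: the mean
    differences = [x - mean for x in data]        # Step 2: each score minus the mean
    squares = [d * d for d in differences]        # Step 3: square each difference
    sum_of_squares = sum(squares)                 # Step 4: add the squares
    variance = sum_of_squares / (n - 1)           # Step 5: divide by n - 1 (variance)
    return math.sqrt(variance)                    # Step 6: square root of the variance

# Hypothetical data set with a mean of 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s_x = sample_standard_deviation(data)
print("step-by-step result:", round(s_x, 3))
print("statistics.stdev:   ", round(stdev(data), 3))
```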
43. Distribution of Data
• Data samples are often collected from
very large populations. The heights of
Senior 4 students in Manitoba, the life
expectancy of new automobiles, the mass
of a new penny and the number of CDs
sold monthly are all examples of such
large populations.
When this type of data is displayed in a
frequency histogram, a bell-shaped
curve often results.
44. Distribution of Data
• A graph of this shape is called a normal curve
and the distribution of the data along this curve
is called a normal distribution. Because the
distribution of many naturally occurring sets of
data follows a normal distribution, the normal
curve is widely used in statistics.
46. Characteristics of Normal
Distribution
Observe the characteristics of this histogram:
• The tops of the bars are connected producing a
smooth curve.
• This smooth curve is bell shaped.
• Most students' scores are clustered around the
mean score.
• The histogram is
symmetrical on either side
of the mean.
• Very few students scored
less than 45 or greater
than 85.
47. Characteristics of Normal Distribution
• There is a significant relationship between any
normal distribution and the standard deviation
introduced earlier.
• Every normal distribution has the same percent of
its data within given standard deviations of its
mean. The following graph indicates the percents of
data within one, two, and three standard deviations
from the mean for any normal distribution.
48. Characteristics of Normal Distribution
Every normal curve has the following characteristics.
• It is bell shaped and extends in both directions.
• The mean is at the centre of the curve and the curve is
symmetrical about the mean. This means that the curve can
be folded along the line marking the mean and the left side of
the curve will fall on top of the right side.
• The mean equals the median. There are an equal number of
pieces of data below and above the mean.
• The scores that make up the normal distribution tend to
cluster around the middle with very few values more than
three standard deviations away from the mean on either side.
• Approximately 68% (34% + 34%) of all the data falls within
one standard deviation of the mean.
• Approximately 28% (14% + 14%) of all data falls between one
and two standard deviations of the mean.
• Approximately 4% (2% + 2%) of all data falls between two and
three standard deviations of the mean.
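These approximate percents can be checked against the exact normal curve with a short Python sketch. It uses `NormalDist` from the standard library `statistics` module (Python 3.8 or later); the helper `percent_between` is a hypothetical name introduced for this illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
# Any other normal distribution gives the same percents.
standard_normal = NormalDist(mu=0, sigma=1)

def percent_between(low, high):
    """Percent of the data lying between `low` and `high` standard deviations."""
    return 100 * (standard_normal.cdf(high) - standard_normal.cdf(low))

within_one = percent_between(-1, 1)
one_to_two = percent_between(-2, -1) + percent_between(1, 2)
two_to_three = percent_between(-3, -2) + percent_between(2, 3)

print("within one standard deviation:", round(within_one, 1))
print("between one and two:", round(one_to_two, 1))
print("between two and three:", round(two_to_three, 1))
```

The printed values are slightly different from 68%, 28% and 4% because the figures on the slide are rounded to whole percents.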