4. Suppose that a PE coach records the
height of each student in his class.
This is an example of
univariate data
Univariate - data that consist of observations on a
single variable made on individuals in a
sample or population
5. Suppose that the PE coach records the
height and weight of each student in his
class.
This is an example of
bivariate data
Bivariate - data that consist of pairs of
numbers from two variables for each
individual in a sample or population
6. Suppose that the PE coach records the
height, weight, number of sit-ups, and
number of push-ups for each student in
his class.
This is an example of
multivariate data
Multivariate - data that consist of
observations on two or more variables
8. Categorical variables
• Qualitative
• Consist of categorical responses
1. Car model
2. Birth year
3. Type of cell phone
4. Your zip code
5. Which club you have joined
Which of these variables are NOT categorical variables?
They are all categorical variables!
9. Numerical variables
• quantitative
• observations or measurements take on numerical values
It makes sense to perform math operations on these values.
There are two types of numerical variables - discrete and continuous
1. GPAs
2. Height of students
3. Codes to combination locks
4. Number of text messages per day
5. Weight of textbooks
Which of these variables are NOT numerical?
Does it make sense to find an average code to combination locks?
10. Two types of variables
• categorical
• numerical (discrete or continuous)
11. Discrete (numerical)
• Isolated points along a number line
• usually counts of items
• Example: number of textbooks purchased
12. Continuous (numerical)
• Variable that can be any value in a
given interval
• usually measurements of something
• Examples: GPAs or height or weight
13. Are the following variables categorical
or numerical (discrete or continuous)?
1. the color of cars in the teacher’s lot
   Categorical
2. the number of calculators owned by students at your college
   Discrete numerical
3. the zip code of an individual
   Categorical
4. the amount of time it takes students to drive to school
   Continuous numerical
5. the appraised value of homes in your city
   Discrete numerical
   Is money a measurement or a count?
14. Use the following table to determine an appropriate graphical display for a data set.
Graphical Display          Variable Type                    Data Type    Purpose
Bar Chart                  Univariate                       Categorical  Display data distribution
Comparative Bar Chart      Univariate for 2 or more groups  Categorical  Compare 2 or more groups
Dotplot                    Univariate                       Numerical    Display data distribution
Comparative dotplot        Univariate for 2 or more groups  Numerical    Compare 2 or more groups
Stem-and-leaf display      Univariate                       Numerical    Display data distribution
Comparative stem-and-leaf  Univariate for 2 groups          Numerical    Compare 2 or more groups
Histogram                  Univariate                       Numerical    Display data distribution
Scatterplot                Bivariate                        Numerical    Investigate relationship between 2 variables
Time series plot           Univariate, collected over time  Numerical    Investigate trend over time
What types of graphs can be used with categorical data?
In section 2.3, we will see how the various graphical displays for univariate, numerical data compare.
16. Bar Chart
When to Use: Univariate, Categorical data
A bar chart is a graphical display for categorical data.
To comply with new standards from the U. S. Department of
Transportation, helmets should reach the bottom of the
motorcyclist’s ears. The report “Motorcycle Helmet Use in 2005 –
Overall Results” (National Highway Traffic Safety Administration,
August 2005) summarized data collected by observing 1700
motorcyclists nationwide at selected roadway locations.
Each time a motorcyclist passed by, the observer noted whether
the rider was wearing no helmet (N), a noncompliant helmet (NC),
or a compliant helmet (C).
The data are summarized in this table:
Helmet Use  Frequency
N           731
NC          153
C           816
Total       1700
This is called a frequency distribution.
A frequency distribution is a table that displays the possible
categories along with the associated frequencies or relative
frequencies.
The frequency for a particular category is the number of times
that category appears in the data set.
The total should equal the total number of observations.
17. Bar Chart
To comply with new standards from the U. S. Department of
Transportation, helmets should reach the bottom of the
motorcyclist’s ears. The report “Motorcycle Helmet Use in 2005 –
Overall Results” (National Highway Traffic Safety Administration,
August 2005) summarized data collected by observing 1700
motorcyclists nationwide at selected roadway locations.
Each time a motorcyclist passed by, the observer noted whether
the rider was wearing no helmet (N), a noncompliant helmet (NC),
or a compliant helmet (C).
The data are summarized in this table:
Helmet Use  Frequency  Relative Frequency
N           731        0.430
NC          153        0.090
C           816        0.480
Total       1700       1.000
The relative frequencies should sum to 1 (allowing for rounding).
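The relative frequency column can be checked with a few lines of Python. This is only a sketch: the names `helmet_counts` and `relative_frequencies` are made up for illustration, and the counts are the ones in the helmet table.

```python
# Frequencies from the helmet-use table
# (N = no helmet, NC = noncompliant helmet, C = compliant helmet).
helmet_counts = {"N": 731, "NC": 153, "C": 816}


def relative_frequencies(counts):
    """Divide each category's frequency by the total number of observations."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


rel_freq = relative_frequencies(helmet_counts)

# Print category, frequency, and relative frequency (3 decimal places).
for category, value in rel_freq.items():
    print(f"{category:<6}{helmet_counts[category]:>6}{value:>8.3f}")
print(f"{'Total':<6}{sum(helmet_counts.values()):>6}{sum(rel_freq.values()):>8.3f}")
```

Rounded to three decimal places, the values agree with the table.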
18. Bar Chart
How to construct
1. Draw a horizontal line; write the categories or
labels below the line at regularly spaced
intervals
2. Draw a vertical line; label the scale using
frequency or relative frequency
3. Place a rectangular bar above each category
label with a height determined by its frequency
or relative frequency
All bars should have the same width so
that both the height and the area of
the bar are proportional to the
frequency or relative frequency of the
corresponding categories.
19. Bar Chart
What to Look For
Frequently or infrequently occurring
categories
Here is the
completed bar chart
for the motorcycle
helmet data.
Describe this graph.
20. Comparative Bar Charts
When to Use Univariate, Categorical data for
two or more groups
Bar charts can also be used to provide a visual
comparison of two or more groups.
How to construct
• Constructed by using the same horizontal and
vertical axes for the bar charts of two or
more groups
• Usually color-coded to indicate which bars
correspond to each group
• Should use relative frequencies on the
vertical axis
Why?
You use relative frequency rather
than frequency on the vertical axis
so that you can make meaningful
comparisons even if the sample
sizes are not the same.
21. Each year the Princeton Review conducts a survey of
students applying to college and of parents of college
applicants. In 2009, 12,715 high school students
responded to the question “Ideally how far from home
would you like the college you attend to be?”
Also, 3007 parents of students applying to college
responded to the question “How far from home would
you like the college your child attends to be?” Data
are displayed in the frequency table below.
                      Frequency
Ideal Distance        Students  Parents
Less than 250 miles   4450      1594
250 to 500 miles      3942      902
500 to 1000 miles     2416      331
More than 1000 miles  1907      180
Create a comparative bar chart with these data.
What should you do first?
22. Relative Frequency
Ideal Distance Students Parents
Less than 250 miles .35 .53
250 to 500 miles .31 .30
500 to 1000 miles .19 .11
More than 1000 miles .15 .06
Students column: found by dividing the frequency by the total
number of students
Parents column: found by dividing the frequency by the total
number of parents
What does this
graph show about
the ideal distance
college should be
from home?
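Because the two groups have very different sample sizes (12,715 students and 3007 parents), each column of frequencies is divided by its own total before the groups are compared. A minimal sketch of that step, with illustrative variable names:

```python
# Frequencies from the Princeton Review table, in the order of the distance categories.
distances = ["Less than 250 miles", "250 to 500 miles",
             "500 to 1000 miles", "More than 1000 miles"]
students = [4450, 3942, 2416, 1907]
parents = [1594, 902, 331, 180]


def to_relative(freqs):
    """Convert a list of frequencies to relative frequencies using the group's own total."""
    total = sum(freqs)
    return [f / total for f in freqs]


student_rel = to_relative(students)
parent_rel = to_relative(parents)

# Side-by-side relative frequencies, rounded to two decimal places.
for label, s, p in zip(distances, student_rel, parent_rel):
    print(f"{label:<22}{s:>6.2f}{p:>6.2f}")
```

These are the heights a comparative bar chart would use on its vertical axis.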
24. Dotplot
When to Use Univariate, Numerical data
How to construct
1. Draw a horizontal line and mark it with an
appropriate numerical scale
2. Locate each value in the data set along the
scale and represent it by a dot. If there are
two or more observations with the same
value, stack the dots vertically
25. Dotplot
What to Look For
• A representative or typical value (center)
in the data set
• The extent to which the data values
spread out
• The nature of the distribution (shape)
along the number line
• The presence of unusual values (gaps and
outliers)
An outlier is an unusually large or small
data value.
A precise rule for deciding when an observation
is an outlier is given in Chapter 3.
What we look for with univariate, numerical data
sets is similar for dotplots, stem-and-leaf
displays, and histograms.
26. Professor Norm gave a 10-question quiz last
week in his introductory statistics class. The
number of correct answers for each student is
recorded below.
6 8 6 5 4 7 9 4 5
8 5 6 4 7 7 3 8 7
6 7 6 6 6 5 5 9
First draw a horizontal line with an
appropriate scale.
The first three observations are
plotted; note that you stack the
points if values are repeated.
This is the completed dotplot.
[Dotplot: Number of correct answers, scale 2 to 10]
Write a sentence
or two describing
this distribution.
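The stacking rule can be sketched as a plain-text dotplot. The function `text_dotplot` is a hypothetical helper, and the 26 scores are the quiz values as read from this slide, so treat the picture as illustrative.

```python
from collections import Counter

# Quiz scores for Professor Norm's class, as read from the slide.
quiz_scores = [6, 8, 6, 5, 4, 7, 9, 4, 5,
               8, 5, 6, 4, 7, 7, 3, 8, 7,
               6, 7, 6, 6, 6, 5, 5, 9]


def text_dotplot(values, low, high):
    """Stack one '*' per observation above each value on a scale from low to high."""
    counts = Counter(values)
    tallest = max(counts.values())
    rows = []
    # Build the plot from the top row down so repeated values stack vertically.
    for level in range(tallest, 0, -1):
        row = "".join(" * " if counts.get(v, 0) >= level else "   "
                      for v in range(low, high + 1))
        rows.append(row.rstrip())
    # The last row is the numerical scale.
    rows.append("".join(f"{v:^3}" for v in range(low, high + 1)).rstrip())
    return "\n".join(rows)


plot = text_dotplot(quiz_scores, 2, 10)
print(plot)
```

The tallest stack sits above 6, with the stacks getting shorter on either side.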
27. What to Look For
• The representative or typical value (center) in the data set
• The extent to which the data values spread out
• The nature of the distribution (shape) along the number line
• The presence of unusual values
[Dotplot: Number of correct answers, scale 2 to 10]
A symmetrical distribution is one that has a
vertical line of symmetry where the left half is
a mirror image of the right half.
If we draw a curve, smoothing out this
dotplot, we will see that
there is ONLY one peak.
Distributions with a single
peak are said to be
unimodal.
Distributions with two
peaks are bimodal, and
with more than two peaks
are multimodal.
The center for the distribution of the number of
correct answers is about 6. There is not a lot of
variability in the observations. The distribution
is approximately symmetrical with no unusual
observations.
28. Comparative Dotplots
When to Use Univariate, numerical data with
observations from 2 or more groups
How to construct
• Constructed using the same numerical scale
for two or more dotplots
• Be sure to include group labels for the
dotplots in the display
What to Look For
Comment on the same four attributes, but
comparing the dotplots displayed.
29. In another introductory statistics class,
Professor Skew also gave a 10-question quiz. The
number of correct answers for each student is
recorded below.
8 6 10 8 8 7 9 8 10
7 9 8 8 8 7 7 3 7
8 7 6 6 6 5 5 9 8
Create a comparative dotplot with the data
sets from the two statistics classes,
Professors Norm and Skew.
[Comparative dotplot: Prof. Skew and Prof. Norm; Number of correct answers, scale 2 to 10]
Is the distribution for Prof. Skew’s class
symmetric? Why or why not?
Notice that the left side (or lower tail) of the
distribution is longer than the right side (or
upper tail). This distribution is said to be
negatively skewed (or skewed to the left).
Distributions where the right tail is longer
than the left are said to be positively skewed
(or skewed to the right).
The direction of skewness is always in the
direction of the longer tail.
Write a few
sentences
comparing these
distributions.
The center of the distribution for the number
of correct answers in Prof. Skew’s class is
larger than the center of Prof. Norm’s class.
There is also more variability in Prof. Skew’s
distribution. Prof. Skew’s distribution
appears to have an unusual observation where
one student only had 2 answers correct while
there were no unusual observations in Prof.
Norm’s class. The distribution for Prof. Skew
is negatively skewed while Prof. Norm’s
distribution is more symmetrical.
30. Stem-and-Leaf Displays
When to Use Univariate, Numerical data
Stem-and-leaf displays are an effective way to
summarize univariate numerical data when the
data set is not too large.
Each observation is split into two parts:
Stem - consists of the first digit(s)
Leaf - consists of the final digit(s)
How to construct
• Select one or more of the leading digits for
the stem
• List the possible stem values in a vertical
column
• Record the leaf for each observation
beside the corresponding stem value
• Indicate the units for stems and leaves
someplace in the display
Be sure to list every stem from
the smallest to the
largest value
31. Stem-and-Leaf Displays
What to Look For
• A representative or typical value (center)
in the data set
• The extent to which the data values
spread out
• The presence of unusual values (gaps and
outliers)
• The extent of symmetry in the data
distribution
• The number and location of peaks
32. The article “Going Wireless” (AARP Bulletin, June
2009) reported the estimated percentage of
households with only wireless phone service (no
landline) for the 50 U.S. states and the District of
Columbia. Data for the 19 Eastern states are given
here.
5.6 5.7 20.0 16.8 16.5 13.4 10.8 9.3 11.6 8.0
11.4 16.3 14.0 10.8 7.8 20.6 10.8 5.1 11.6
What is the variable of interest? Wireless percent
A stem-and-leaf display is an appropriate way
to summarize these data.
(A dotplot would also be a reasonable choice.)
Let 5.6% be represented as 05.6% so that all the
numbers have two digits in front of the decimal. If we
use the 2-digits, we would have stems from 05 to 20;
that’s way too many stems!
So let’s just use the first digit (tens) as our stems.
So the leaf will be the last two digits.
With 05.6%, the leaf is 5.6 and it will be written behind
the stem 0. For the second number, 5.7 also is written
behind the stem 0 (with a comma between).
What is the leaf for 20.0% and where should that leaf be
written?
The completed stem-and-leaf display is shown below.
0 | 5.6, 5.7, 9.3, 8.0, 7.8, 5.1
1 | 6.8, 6.5, 3.4, 0.8, 1.6, 1.4, 6.3, 4.0, 0.8, 0.8, 1.6
2 | 0.0, 0.6
However, it is somewhat difficult to read due to
the 2-digit leaves.
A common practice is to drop all but the first digit
in the leaf. This makes the display
easier to read, and
DOES NOT change the
overall distribution of
the data set.
0 | 5 5 9 8 7 5
1 | 6 6 3 0 1 1 6 4 0 0 1
2 | 0 0
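Splitting each observation into a stem and a leaf can be sketched in Python. The function name `stem_and_leaf` is illustrative. The sketch assumes the tens digit is the stem and the ones digit is the leaf, with the tenths digit dropped rather than rounded.

```python
# Estimated percentage of wireless-only households for the 19 Eastern states.
wireless_pct = [5.6, 5.7, 20.0, 16.8, 16.5, 13.4, 10.8, 9.3, 11.6, 8.0,
                11.4, 16.3, 14.0, 10.8, 7.8, 20.6, 10.8, 5.1, 11.6]


def stem_and_leaf(values):
    """Return {stem: sorted leaves}; stem = tens digit, leaf = ones digit."""
    display = {}
    for value in values:
        whole = int(value)               # drop the tenths digit (truncate, do not round)
        stem, leaf = divmod(whole, 10)   # e.g. 16 -> stem 1, leaf 6
        display.setdefault(stem, []).append(leaf)
    # List every stem from the smallest to the largest, even one with no leaves.
    return {stem: sorted(display.get(stem, []))
            for stem in range(min(display), max(display) + 1)}


eastern_display = stem_and_leaf(wireless_pct)
for stem, leaves in eastern_display.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
print("Stem: tens")
print("Leaf: ones")
```

Sorting the leaves is optional, but it makes the center of the distribution easier to see.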
33. The article “Going Wireless” (AARP Bulletin, June
2009) reported the estimated percentage of
households with only wireless phone service (no
landline) for the 50 U.S. states and the District of
Columbia. Data for the 19 Eastern states are given
here.
0 | 5 5 5 7 8 9
1 | 0 0 0 1 1 1 3 4 6 6 6
2 | 0 0
Stem: tens
Leaf: ones
While it is not necessary to write
the leaves in order from smallest to
largest, by doing so, the center of the
distribution is more easily seen.
Write a few sentences describing
this distribution.
The center of the distribution
for the estimated percentage
of households with only wireless
phone service is approximately
11%. There does not appear to
be much variability. This
display appears to be a
unimodal, symmetric
distribution with no outliers.
34. Comparative Stem-and-Leaf Displays
When to Use Univariate, numerical data with
observations from 2 or more groups
How to construct
• List the leaves for one data set to the right
of the stems
• List the leaves for the second data set to the
left of the stems
• Be sure to include group labels to identify
which group is on the left and which is on the
right
35. The article “Going Wireless” (AARP Bulletin, June
2009) reported the estimated percentage of
households with only wireless phone service (no
landline) for the 50 U.S. states and the District of
Columbia. Data for the 13 Western states are given
here.
11.7 18.9 9.0 16.7 8.0 22.1 9.2 10.8
21.1 17.7 25.5 16.3 11.4
Create a comparative stem-and-leaf display comparing the
distributions of the Eastern and Western states.
Western States |   | Eastern States
           998 | 0 | 555789
       8766110 | 1 | 00011134666
           521 | 2 | 00
Stem: tens
Leaf: ones
Write a few sentences comparing these distributions.
The center of the distribution of the estimated
percentage of households with only wireless phone service
for the Western states is a little larger than the center
for the Eastern states. Both distributions are
symmetrical with approximately the same amount of
variability.
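A back-to-back display can be sketched by building the leaves for each group separately and printing them on either side of a shared column of stems. The helper names are illustrative. As in the display, the tens digit is the stem and the ones digit is the leaf.

```python
# Wireless-only household percentages, as listed on the slides.
eastern = [5.6, 5.7, 20.0, 16.8, 16.5, 13.4, 10.8, 9.3, 11.6, 8.0,
           11.4, 16.3, 14.0, 10.8, 7.8, 20.6, 10.8, 5.1, 11.6]
western = [11.7, 18.9, 9.0, 16.7, 8.0, 22.1, 9.2, 10.8,
           21.1, 17.7, 25.5, 16.3, 11.4]


def leaves_by_stem(values):
    """Group ones-digit leaves under their tens-digit stems."""
    stems = {}
    for value in values:
        stem, leaf = divmod(int(value), 10)
        stems.setdefault(stem, []).append(leaf)
    return stems


def back_to_back(left, right):
    """Left group's leaves grow leftward from the stem; right group's grow rightward."""
    left_stems, right_stems = leaves_by_stem(left), leaves_by_stem(right)
    lowest = min(min(left_stems), min(right_stems))
    highest = max(max(left_stems), max(right_stems))
    lines = []
    for stem in range(lowest, highest + 1):
        left_leaves = "".join(str(d) for d in sorted(left_stems.get(stem, []), reverse=True))
        right_leaves = "".join(str(d) for d in sorted(right_stems.get(stem, [])))
        lines.append(f"{left_leaves:>8} | {stem} | {right_leaves}")
    return lines


display_lines = back_to_back(western, eastern)
print(" Western |   | Eastern")
print("\n".join(display_lines))
```

The group labels in the header row identify which side belongs to which group.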
36. Histograms
When to Use Univariate numerical data
Dotplots and stem-and-leaf displays are not
effective ways to summarize numerical
data when the data set contains a large
number of data values.
Histograms are displays that don’t work
well for small data sets but do work well
for larger numerical data sets.
Constructed differently for
discrete versus continuous data
How to construct Discrete data
Discrete numerical data almost
always result from counting. In
such cases, each observation is a
whole number.
• Draw a horizontal scale and mark it with the possible
values for the variable
• Draw a vertical scale and mark it with frequency or
relative frequency
• Above each possible value, draw a rectangle centered
at that value with a height corresponding to its
frequency or relative frequency
What to look for
Center or typical value; spread; general shape
and location and number of peaks; and gaps or
outliers
37. Queen honey bees mate shortly after they become adults.
During a mating flight, the queen usually takes multiple
partners, collecting sperm that she will store and use
throughout the rest of her life.
A paper, “The Curious Promiscuity of Queen Honey Bees”
(Annals of Zoology [2001]: 255-265), provided the
following data on the number of partners for 30 queen
bees.
12 2 4 6 6 7 8 7 8 11
8 3 5 6 7 10 1 9 7 6
9 7 5 4 7 4 6 7 8 10
Here is a dotplot
of these data.
[Dotplot: Number of partners, scale 2 to 12]
38. Queen honey bees continued
The variable, number of partners, is discrete. To
create a histogram: we already have a horizontal axis,
so we need to add a vertical axis for frequency.
The bars should be centered over the
discrete data values and have heights
corresponding to the frequency of each
data value.
[Histogram: Frequency (0 to 6) versus Number of partners (0 to 12)]
In practice, histograms for discrete data ONLY show the
rectangular bars. We built the histogram on top of the
dotplot to show that the bars are centered over the
discrete data values and that heights of the bars are
the frequency of each data value.
The distribution for the number of partners of queen
honey bees is approximately symmetric with a center
at 7 partners and a somewhat large amount of
variability. There doesn’t appear to be any outliers.
39. Here are two histograms showing the
“queen bee data set”. One uses frequency
on the vertical axis, while the other uses
relative frequency.
What do you notice about the shapes of
these two histograms?
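The bar heights for both versions of the histogram can be tabulated from the 30 observations. In this sketch the names `partners` and `frequency` are illustrative, and the row of `#` marks is only a rough stand-in for a bar.

```python
from collections import Counter

# Number of partners for the 30 queen bees.
partners = [12, 2, 4, 6, 6, 7, 8, 7, 8, 11,
            8, 3, 5, 6, 7, 10, 1, 9, 7, 6,
            9, 7, 5, 4, 7, 4, 6, 7, 8, 10]

frequency = Counter(partners)
n = len(partners)

# One row per possible value: a bar of '#' marks, the frequency, the relative frequency.
for value in range(min(partners), max(partners) + 1):
    count = frequency[value]
    print(f"{value:>2} {'#' * count:<8} {count:>2} {count / n:.3f}")
```

Every relative frequency is the frequency divided by the same number, 30, so only the vertical scale changes between the two histograms.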
40. Histograms with equal width intervals
When to Use Univariate numerical data
How to construct Continuous data
• Mark the boundaries of the class intervals on the
horizontal axis
• Use either frequency or relative frequency on the
vertical axis
• Draw a rectangle for each class interval directly above
that interval. The height of each rectangle is the
frequency or relative frequency of the corresponding
interval
What to look for
Center or typical value; spread; general shape and
location and number of peaks; and gaps or outliers
41. Consider the following data on carry-on luggage
weight for 25 airline passengers.
25.0 17.9 30.0 18.0 28.2 27.8 10.1 27.6 28.7
20.9 20.8 28.5 33.8 31.4 27.6 21.9 19.9 28.0
24.9 22.7 22.4 26.4 22.0 34.5 25.3
This is a continuous numerical data set.
Here is a dotplot of this data set.
Looking at this dotplot, it is easy to see that we
could use intervals with a width of 5.
This interval includes 10 and all values up to but not
including 15. The next interval will include 15 and
all values up to but not including 20, and so on.
The top dotplot shows all the data
values in each interval stacked in
the middle of the interval.
With continuous data, the rectangular bars cover
an interval of data values (not just one value).
42. From the dotplot, it is easy to see how the
continuous histogram is created.
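Counting how many observations fall in each class interval can be sketched as follows. It assumes intervals of width 5 that include the lower boundary but not the upper one. The 25 weights are the values as read from the slide, and `class_counts` is an illustrative name.

```python
# Carry-on luggage weights for 25 airline passengers, as read from the slide.
weights = [25.0, 17.9, 30.0, 18.0, 28.2, 27.8, 10.1, 27.6, 28.7,
           20.9, 20.8, 28.5, 33.8, 31.4, 27.6, 21.9, 19.9, 28.0,
           24.9, 22.7, 22.4, 26.4, 22.0, 34.5, 25.3]

WIDTH = 5  # interval width suggested by the dotplot


def class_counts(values, width):
    """Count observations per interval [lower, lower + width)."""
    counts = {}
    for value in values:
        lower = int(value // width) * width  # lower boundary of the value's interval
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


interval_counts = class_counts(weights, WIDTH)
for lower, count in interval_counts.items():
    print(f"{lower} to < {lower + WIDTH}: {count}")
```

Each count is the height of the rectangle drawn directly above that interval.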
43. Comparative Histograms
• Must use two separate histograms with the
same horizontal axis and relative frequency on
the vertical axis
The article “Early Television Exposure and
Subsequent Attention Problems in Children”
(Pediatrics, April 2004) investigated the television
viewing habits of U.S. children. These graphs show
the viewing habits of 1-year-old and 3-year-old
children.
[Histograms: 1-yr-olds and 3-yr-olds]
The biggest difference between the two histograms
is at the low end, with a much higher proportion of 3-
year-old children falling in the 0-2 TV hours interval
than 1-year-old children.
44. Histograms with unequal width intervals
When to use
when you have a concentration of data in the
middle with some extreme values
How to construct
construct similar to histograms with
continuous data, but with density on the
vertical axis
$\text{density} = \dfrac{\text{relative frequency for interval}}{\text{width of interval}}$
45. When people are asked for values such as age or weight,
they sometimes shade the truth in their responses. The
article “Self-Report of Academic Performance” (Social
Methods and Research [November 1981]: 165-185) focused
on SAT scores and grade point average (GPA). For each
student in the sample, the difference between reported GPA
and actual GPA was determined. Positive differences
resulted from individuals reporting GPAs larger than the
correct value.
Class Interval  Relative Frequency
-2.0 to < -0.4  0.023
-0.4 to < -0.2  0.055
-0.2 to < -0.1  0.097
-0.1 to < 0     0.210
0 to < 0.1      0.189
0.1 to < 0.2    0.139
0.2 to < 0.4    0.116
0.4 to < 2.0    0.171
When using relative frequency on the vertical axis,
the proportional area principle is violated.
Notice the relative frequency for the interval 0.4 to
< 2.0 is smaller than the relative frequency for the
interval -0.1 to < 0, but the area of the bar is MUCH
larger.
46. GPAs continued
To fix this problem, we
need to find the
density of each
interval.
$\text{density} = \dfrac{\text{relative frequency for interval}}{\text{width of interval}}$
Class Interval  Relative Frequency  Width  Density
-2.0 to < -0.4  0.023               1.6    0.014
-0.4 to < -0.2  0.055               0.2    0.275
-0.2 to < -0.1  0.097               0.1    0.970
-0.1 to < 0     0.210               0.1    2.100
0 to < 0.1      0.189               0.1    1.890
0.1 to < 0.2    0.139               0.1    1.390
0.2 to < 0.4    0.116               0.2    0.580
0.4 to < 2.0    0.171               1.6    0.107
This is a correct
histogram with unequal
widths.
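The density column can be reproduced by dividing each relative frequency by its interval width. In this sketch the tuple layout `(lower, upper, relative frequency)` and the function name `densities` are illustrative choices. The third interval is entered as -0.2 to < -0.1, which is what a width of 0.1 implies.

```python
# (lower boundary, upper boundary, relative frequency) for each class interval.
gpa_intervals = [
    (-2.0, -0.4, 0.023),
    (-0.4, -0.2, 0.055),
    (-0.2, -0.1, 0.097),
    (-0.1, 0.0, 0.210),
    (0.0, 0.1, 0.189),
    (0.1, 0.2, 0.139),
    (0.2, 0.4, 0.116),
    (0.4, 2.0, 0.171),
]


def densities(rows):
    """density = relative frequency for interval / width of interval."""
    return [rel_freq / (upper - lower) for lower, upper, rel_freq in rows]


density_values = densities(gpa_intervals)
for (lower, upper, rel_freq), density in zip(gpa_intervals, density_values):
    print(f"{lower:>5} to < {upper:<5} {rel_freq:.3f}  {upper - lower:.1f}  {density:.3f}")

# With density as the bar height, area = density * width = relative frequency,
# so the bar areas add up to 1 (up to rounding).
total_area = sum(d * (upper - lower)
                 for d, (lower, upper, _) in zip(density_values, gpa_intervals))
print(f"total area = {total_area:.3f}")
```

Using density for the height restores the proportional area principle: the wide 0.4 to < 2.0 bar becomes short, and its area again equals its relative frequency.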
48. Scatterplots
When to Use Bivariate Numerical data
How to construct
1. Draw horizontal and vertical axes. Label the
horizontal axis and include an appropriate scale for
the x-variable. Label the vertical axis and include
an appropriate scale for the y-variable.
2. For each (x, y) pair in the data set, add a dot in
the appropriate location in the display.
What to look for
Relationship between x and y
49. The accompanying table gives the cost (in
dollars) and an overall quality rating for 10
different brands of men’s athletic shoes
(www.consumerreports.org).
Cost 65 45 45 80 110 110 30 80 110 70
Rating 71 70 62 59 58 57 56 52 51 51
Is there a relationship between x = cost and
y = quality rating?
A scatterplot can help
answer this question
50. Cost 65 45 45 80 110 110 30 80 110 70
Rating 71 70 62 59 58 57 56 52 51 51
Is there a relationship
between x = cost and
y = quality rating?
First, draw and label
appropriate horizontal
and vertical axes.
Next, plot each (x, y) pair.
Here is the completed
scatterplot.
[Scatterplot: Rating (50 to 70) versus Cost (20 to 100)]
There appears to be a
negative relationship
between cost of athletic
shoes and their quality
rating. Does that
surprise you?
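The direction of the pattern can also be checked numerically. The sketch below computes Pearson's correlation coefficient r for the ten (cost, rating) pairs. This slide does not introduce r, so it is brought in here only as an assumed way to confirm the sign, and `pearson_r` is an illustrative helper.

```python
from math import sqrt

# Cost (dollars) and quality rating for the 10 brands of athletic shoes.
cost = [65, 45, 45, 80, 110, 110, 30, 80, 110, 70]
rating = [71, 70, 62, 59, 58, 57, 56, 52, 51, 51]


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(cost, rating)
print(f"r = {r:.2f}")  # a negative value matches the downward pattern in the plot
```

A negative r only describes how the two variables vary together in these ten pairs. It does not explain why.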
51. Time Series Plots
When to Use Bivariate data with time and
another variable
How to construct
1. Draw horizontal and vertical axes. Label the
horizontal axis and include an appropriate scale
for the x-variable. Label the vertical axis and
include an appropriate scale for the y-variable.
2. For each (x, y) pair in the data set, add a dot in
the appropriate location in the display.
3. Connect each dot in order
What to look for
trends or patterns over time
52. The Christmas Price Index is computed each year by
PNC Advisors. It is a humorous look at the cost of
giving all the gifts described in the popular Christmas
song “The Twelve Days of Christmas”
(www.pncchristmaspriceindex.com).
Describe any
trends or
patterns
that you see.
Why is there a downward
trend between 1993 & 1995?
54. Pie (Circle) Chart
When to Use Categorical data
How to construct
• A circle is used to represent the whole data set.
• “Slices” of the pie represent the categories
• The size of a particular category’s slice is
proportional to its frequency or relative
frequency.
• Most effective for summarizing data sets when
there are not too many categories
55. Pie (Circle) Chart
The article “Fred Flintstone, Check Your Policy” (The Washington
Post, October 2, 2005) summarized a survey of 1014 adults
conducted by the Life and Health Insurance Foundation for
Education. Each person surveyed was asked to select which of five
fictional characters had the greatest need for life insurance:
Spider-Man, Batman, Fred Flintstone, Harry Potter, and Marge
Simpson. The data are summarized in the pie chart.
The survey results were quite
different from the assessment
of an insurance expert.
The insurance expert felt that
Batman, a wealthy bachelor, and
Spider-Man did not need life
insurance as much as Fred
Flintstone, a married man with
dependents!
56. Segmented (or Stacked) Bar Charts
When to Use Categorical data
A pie chart can be difficult to construct by
hand. The circular shape sometimes makes
it difficult to compare areas for different
categories, particularly when the relative
frequencies are similar.
So, we could use a segmented bar chart.
How to construct
• Use a rectangular bar rather than a circle
to represent the entire data set.
• The bar is divided into segments, with
different segments representing
different categories.
• The area of the segment is proportional to
the relative frequency for the particular
category.
57. Segmented (or Stacked) Bar Charts
Each year, the Higher Education Research Institute
conducts a survey of college seniors. In 2008,
approximately 23,000 seniors participated in the survey
(“Findings from the 2008 Administration of the College
Senior Survey,” Higher Education Research Institute,
June 2009).
This segmented bar
chart summarizes
student responses to
the question: “During
the past year, how much
time did you spend
studying and doing
homework in a typical
week?”
59. Avoid these Common Mistakes
1. Areas should be proportional to frequency,
relative frequency, or magnitude of the
number being represented.
The eye is naturally drawn to
large areas in graphical displays.
Sometimes, in an effort to make
the graphical displays more
interesting, designers lose sight
of this important principle.
Consider this graph (USA Today,
October 3, 2002).
By replacing the bars of a bar
chart with milk buckets,
areas are distorted.
The two buckets for 1980
represent 32 cows, whereas
the one bucket for 1970
represents 19 cows.
60. Avoid these Common Mistakes
1. Areas should be proportional to frequency,
relative frequency, or magnitude of the
number being represented.
Another common distortion
occurs when a third
dimension is added to bar
charts or pie charts. This
distorts the areas and
makes it much more
difficult to interpret.
61. Avoid these Common Mistakes
2. Be cautious of graphs with broken axes (axes
that don’t start at 0).
• The use of broken axes in a scatterplot does not result
in a misleading picture of the relationship of bivariate
data.
• In time series plots, broken axes can sometimes
exaggerate the magnitude of change over time.
• In bar charts and histograms, the vertical axis should
NEVER be broken. This violates the “proportional
area” principle.
62. Avoid these Common Mistakes
2. Be cautious of graphs with broken axes (axes
that don’t start at 0).
This bar chart is similar to
one in an advertisement for
a software product designed
to raise student test scores.
Areas of the bars are not
proportional to the
magnitude of the numbers
represented: the area of
the rectangle representing 68 is more
than three times the area of
the rectangle representing
55!
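How a broken axis inflates the comparison can be sketched with simple arithmetic. The slide does not say where the advertisement's vertical axis starts, so the starting value of 50 below is purely hypothetical, and `bar_area_ratio` is an illustrative helper.

```python
def bar_area_ratio(larger, smaller, axis_start=0.0):
    """Ratio of drawn bar heights when the vertical axis begins at axis_start.

    With equal bar widths, the ratio of heights is also the ratio of areas.
    """
    return (larger - axis_start) / (smaller - axis_start)


# Axis starting at 0: the areas reflect the actual magnitudes.
true_ratio = bar_area_ratio(68, 55)

# Hypothetical broken axis starting at 50: only the part above 50 is drawn.
broken_ratio = bar_area_ratio(68, 55, axis_start=50)

print(f"axis from 0:  {true_ratio:.2f} times the area")
print(f"axis from 50: {broken_ratio:.2f} times the area")
```

Under this assumption the taller bar is drawn with 3.6 times the area, although 68 is only about 1.24 times 55.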
63. Avoid these Common Mistakes
3. Watch out for unequal time spacing in time
series plots.
If observations
over time are not
made at regular
time intervals,
special care must
be taken in
constructing the
time series plot.
Notice that the intervals between observations are
irregular, yet the points in the plot are equally spaced
along the time axis. This makes it difficult to assess
the rate of change over time.
Here is a correct time series plot.
64. Avoid these Common Mistakes
4. Be careful how you interpret patterns in
scatterplots.
Consider the following scatterplot showing the relationship between
the number of Methodist ministers in New England and the amount
of Cuban rum imported into Boston from 1860 to 1940
(Education.com).
[Scatterplot: Number of Barrels of Imported Rum (5000 to 35000) versus Number of Methodist Ministers (0 to 300)]
r = .999973
Does an increase in the number of Methodist
ministers CAUSE the increase in imported rum?
A strong pattern in a
scatterplot means that
the two variables tend to
vary together in a
predictable way, BUT it
does not mean that there
is a cause-and-effect
relationship.
65. Avoid these Common Mistakes
5. Make sure that a graphical display creates
the right first impression.
Consider the following graph
from USA Today (June 25,
2001). Although this graph
does not violate the
proportional area principle,
the way the “bar” for the
none category is displayed
makes this graph difficult to
read. A quick glance at this
graph may leave the reader
with an incorrect impression.