2. Displaying Bivariate Data
Bivariate Data: data with two variables (two
quantities or qualities that change)
Generally one variable depends on the other
The dependent variable depends on the
independent variable
Eg. Height and Weight
Eg. Hours studied and test result
Identifying the dependent and independent
variables matters most when plotting scatterplots
K McMullen 2012
3. Displaying Bivariate Data
Back-to-back stem plots are used to display the
relationship between a numerical variable and a
two-valued categorical variable
They are used to compare data sets using
summary statistics such as measures of centre
and measures of spread
Eg. Comparing Further Maths study scores
(numerical variable) with gender (male or female:
a two-valued categorical variable)
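A back-to-back stem plot can also be built as plain text. The following is a minimal sketch, assuming all values are non-negative whole numbers with the tens digit as the stem; the function name `back_to_back_stem_plot` and the sample scores are hypothetical and only for illustration.

```python
from collections import defaultdict

def back_to_back_stem_plot(left, right):
    """Return the rows of a text back-to-back stem plot.

    Assumes non-negative integers; stem = value // 10, leaf = value % 10.
    """
    # Each stem maps to a pair of leaf lists: (left group, right group).
    stems = defaultdict(lambda: ([], []))
    for value in left:
        stems[value // 10][0].append(value % 10)
    for value in right:
        stems[value // 10][1].append(value % 10)

    rows = []
    for stem in range(min(stems), max(stems) + 1):
        left_leaves, right_leaves = stems[stem]
        # Left leaves are written outwards from the stem, so largest first.
        left_text = " ".join(str(d) for d in sorted(left_leaves, reverse=True))
        right_text = " ".join(str(d) for d in sorted(right_leaves))
        rows.append((left_text, stem, right_text))

    # Right-align the left-hand leaves so the stems line up.
    width = max(len(left_text) for left_text, _, _ in rows)
    return [f"{l:>{width}} | {s} | {r}" for l, s, r in rows]

# Hypothetical study scores for two groups.
for line in back_to_back_stem_plot([31, 25, 28], [22, 34, 36]):
    print(line)
```

Reading the two sides of each stem against each other makes it easier to compare the centre and spread of the two groups.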
4. Displaying Bivariate Data
Parallel box plots are used to display the relationship
between a numerical variable and a categorical
variable with two or more categories
They are used to compare sets of data using
summary statistics such as measures of centre and
measures of spread (also think of the five-number
summary)
Remember that parallel box plots must be placed on
the same axis (you can also do this on CAS)
Eg. The results achieved by 4 different Further Maths
classes
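Each box in a parallel box plot is drawn from a five-number summary (minimum, Q1, median, Q3, maximum). The sketch below computes one summary per class so the classes can be compared on the same scale. It assumes quartiles are found as the medians of the lower and upper halves (leaving out the middle value when the count is odd); CAS or other software may use a slightly different quartile rule. The class names and results are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers.

    Quartiles are the medians of the lower and upper halves of the sorted
    data; the middle value is excluded from both halves when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical results for two classes, compared on the same scale.
classes = {
    "Class A": [28, 31, 35, 36, 40, 42, 45],
    "Class B": [22, 30, 33, 37, 38, 41, 44, 47],
}
for name, results in classes.items():
    print(name, five_number_summary(results))
```

Comparing the medians gives the measure of centre, and comparing the ranges or interquartile ranges (Q3 minus Q1) gives the measure of spread.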
5. Displaying Bivariate Data
Two-way frequency tables are used to display
the relationship between two categorical
variables and can be represented graphically as
a segmented bar chart
Remember that it is easier to compare data sets
if you are working with percentages instead of
totals
In a frequency table you should place your
independent variable along the top row and your
dependent variable down the left column (when you
convert to percentages, each column must then add
to 100% if done correctly)
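The conversion from totals to column percentages can be sketched as follows. This is an illustration only: the function name `column_percentages` and the survey data are hypothetical, and each observation is assumed to be recorded as an (independent, dependent) pair of categories.

```python
from collections import Counter

def column_percentages(pairs):
    """Convert (independent, dependent) observations to column percentages.

    Each column corresponds to one category of the independent variable,
    so the percentages within a column add to 100.
    """
    counts = Counter(pairs)
    # Column total = number of observations in each independent category.
    column_totals = Counter()
    for (independent, _dependent), count in counts.items():
        column_totals[independent] += count
    return {
        (independent, dependent): 100 * count / column_totals[independent]
        for (independent, dependent), count in counts.items()
    }

# Hypothetical survey: gender (independent) and whether the person plays sport.
survey = (
    [("Male", "Yes")] * 3 + [("Male", "No")] * 1
    + [("Female", "Yes")] * 1 + [("Female", "No")] * 1
)
for cell, percentage in column_percentages(survey).items():
    print(cell, f"{percentage:.1f}%")
```

Because the two groups can have different totals, comparing the percentages in each column is fairer than comparing the raw counts.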
6. Displaying Bivariate Data
Scatterplots are used to display the relationship
(correlation) between two numerical variables
The dependent variable is displayed on the vertical
axis
The independent variable is displayed on the
horizontal axis
The relationship between variables on a scatterplot
can be described in terms of:
Strength (strong, moderate, weak)
Direction (positive, negative)
Form (linear, non-linear)
7. Displaying Bivariate Data
Scatterplots (continued)
Pearson’s product-moment correlation coefficient (r)
is used to measure the strength of the linear
relationship shown in a scatterplot
The values of r range from -1 (perfect negative)
to 1 (perfect positive)
You can approximate the value of r (look at the formula
on p. 101) but you can also calculate it using CAS
(which is more reliable)
To interpret r, look at and copy the table on page 100 of
your textbook
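For reference, r can also be calculated directly from its standard definition. The sketch below is not the textbook's approximation method from p. 101; it uses the usual product-moment formula, and the function name `pearson_r` and the sample data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data.

    Assumes xs and ys have the same length and neither is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied (independent) and test result (dependent).
hours = [1, 2, 3, 4, 5, 6]
result = [52, 55, 61, 60, 68, 74]
print(round(pearson_r(hours, result), 3))
```

A value close to 1 or -1 indicates a strong linear relationship, and the sign gives the direction.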
8. Displaying Bivariate Data
Scatterplots (continued)
• The coefficient of determination (r²): this provides information about
the degree to which one variable can be predicted from another
variable, provided that the variables have a linear correlation
• The coefficient of determination is calculated by squaring the
correlation coefficient (r)
• When commenting using r², always convert your value into a
percentage
• Comments
“The coefficient of determination tells us that (r² × 100)% of the variation in the
dependent variable is explained by the variation in the independent
variable”
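The steps above (square r, convert to a percentage, write the comment) can be sketched in a few lines. The function name `coefficient_of_determination_comment`, the rounding to one decimal place, and the example value of r are assumptions made for illustration.

```python
def coefficient_of_determination_comment(r, independent, dependent):
    """Square r, convert to a percentage and fill in the standard comment."""
    # Rounding to one decimal place is an assumption, not a set rule.
    percentage = round(r ** 2 * 100, 1)
    return (
        f"The coefficient of determination tells us that {percentage}% of the "
        f"variation in {dependent} is explained by the variation in {independent}."
    )

# Hypothetical example: r = -0.8 gives r squared = 0.64, i.e. 64%.
print(coefficient_of_determination_comment(-0.8, "hours studied", "test result"))
```

Note that the sign of r disappears when it is squared, so r² describes how much variation is explained, not the direction of the relationship.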
9. Displaying Bivariate Data
• You must remember the difference between
correlation and causation
• To interpret your scatterplot you must stick to the
variables given and don’t make any unnecessary
assumptions
• If the relationship in your scatterplot is negative then:
“As the IV increases, the DV decreases”
• If the relationship in your scatterplot is positive then:
“As the IV increases, the DV increases”
10. Displaying Bivariate Data
Example: Age and arm span of teenage boys
Comment: As the age of teenage boys increases
the length of their arm span also increases
Assumption: As teenage boys get taller their arm
span increases
They do get taller, but height is not one of the
variables given and therefore you should not
comment on it
11. Displaying Bivariate Data
Eg. The number of cigarettes smoked and fitness
level
Comment: As the number of cigarettes smoked increases,
the fitness level of participants decreases
Assumption: Smoking cigarettes causes fitness
levels to decrease
You must remember that there can be other
factors that can account for low levels of fitness,
such as lack of exercise or weight
12. Displaying Bivariate Data
Eg. People catching public transport and the sales of
designer handbags
Comment: As the number of people catching public
transport increases, the number of people buying
designer handbags decreases
Assumption: A high proportion of people catching
public transport has caused a decline in the sales of
designer handbags
These two variables are clearly unrelated even though
they may be correlated. You need to always
question the validity of statistics: what else could have
caused public transport use to increase and designer
handbag sales to decrease?