Pearson's chi-squared test is used to determine if there is a relationship between two categorical variables. It has the following structure:
1) State the null and alternative hypotheses
2) Calculate the test statistic by finding residuals between observed and expected counts and summing their squares divided by expected values
3) Find the critical value based on degrees of freedom and significance level
4) Reject the null hypothesis if the test statistic exceeds the critical value, concluding the variables are dependent. Otherwise fail to reject, concluding independence.
Three examples are provided to demonstrate applying the chi-squared test to determine dependence between grades and attendance, height and nose size, and weather and season.
1. Lecture 12 - The χ2-test
C2 Foundation Mathematics (Standard Track)
Dr Linda Stringer Dr Simon Craik
l.stringer@uea.ac.uk s.craik@uea.ac.uk
INTO City/UEA London
2. Pearson’s χ2-test
Pearson’s χ2-test is another kind of hypothesis test. We use
it to investigate whether two variables are independent or
related to each other.
For example, we can use it to investigate whether people’s
voting intention and social class are related.
In a χ2-test the variables are categorical - for example the
categories of voting intention are Labour, Conservative and
Liberal Democrat, and the categories of social class are A,
B and C.
To investigate whether men get paid more than women, the
variables are gender and salary. The categories of gender
are male and female, and the categories of salary could be
low, medium and high.
3. The structure of a χ2-test
The χ2-test has the same structure as a Z-test and a T-test
Hypotheses (state H0 and H1)
Critical value (look it up in a table)
Test statistic (this is the long part - there are 5 steps)
Decision (reject or do not reject H0, with justification)
Conclusion (in words)
4. Are students’ grades dependent on attendance?
Some students think that turning up to lectures and seminars
doesn’t make a difference to what grade they get at the end of
the module. We surveyed a sample of 100 students. Based on
the data below do you think that attendance and grades are
independent variables?
              Often attends   Sometimes   Rarely attends
Distinction        36             7              4
Merit              12            10              7
Pass                2             6             16
You see that about half of the students (47 out of 100) got a
Distinction. However, is this true for each individual category of
attendance?
We will perform a χ2-test on this data later (see Example 1
below)
5. Hypotheses
The null hypothesis (H0) is always that the variables are
independent.
The alternative hypothesis (H1) is always that the variables
are related (dependent on each other).
Write the hypotheses as sentences, for example
H0: Voting intention and social class are independent
H1: Voting intention depends on social class
6. χ2-test statistic step 1: calculate the totals
Data is presented in a table, called a contingency table.
The values in the contingency table are called the
observed values (O).
(The contingency table is often referred to as the
‘observed’ values table.)
We first calculate the row totals, the column totals and the
grand total.
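As a minimal sketch of this step, the totals can be computed in a few lines of Python. The table used here is the attendance and grades data from the earlier slide. The variable names (observed, row_totals, col_totals, grand_total) are illustrative choices, not part of the lecture.

```python
# Observed values (O): rows are grades, columns are attendance categories.
observed = [
    [36, 7, 4],    # Distinction: often, sometimes, rarely
    [12, 10, 7],   # Merit
    [2, 6, 16],    # Pass
]

# Row totals: add across each row.
row_totals = [sum(row) for row in observed]

# Column totals: zip(*observed) regroups the values column by column.
col_totals = [sum(col) for col in zip(*observed)]

# Grand total: the sum of the row totals (the column totals give the same number).
grand_total = sum(row_totals)

print("Row totals:   ", row_totals)
print("Column totals:", col_totals)
print("Grand total:  ", grand_total)
```

A quick sanity check is that the row totals and the column totals both add up to the grand total.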
7. χ2-test statistic step 2: the expected table
We then construct the expected table.
This is the table of expected values (E). For each cell,
$$\text{expected value } (E) = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$
NOTE: For a χ2-test to be viable the expected values must
all be greater than 5 in a 2 × 2 table and greater than 5 in
80% of the cells in larger tables.
Handy hint: The totals are the same for the expected table
as for the observed table.
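The expected-value formula can be sketched in Python as follows. The function name expected_table is a hypothetical helper chosen for illustration, and the data is the attendance and grades table from the earlier slide.

```python
def expected_table(observed):
    """Return the table of expected values E = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed = [
    [36, 7, 4],    # Distinction
    [12, 10, 7],   # Merit
    [2, 6, 16],    # Pass
]

expected = expected_table(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

Summing each row of the result should give back the observed row totals, which is the handy hint above in action.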
8. χ2-test statistic step 3: the residual table
We then construct the residual table.
This is the table of residual values. For each cell,
residual value (R) = observed value (O) − expected value (E)
Handy hint: The row and column totals in the residual table
are always 0.
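The handy hint can be checked with a short derivation. The notation here is an assumption made for this sketch and is not used elsewhere in the lecture: $O_{ij}$ is the observed value in row $i$ and column $j$, $r_i$ is the total of row $i$, $c_j$ is the total of column $j$, and $N$ is the grand total.

```latex
% Sum of the residuals along row i:
\sum_j R_{ij}
  = \sum_j O_{ij} - \sum_j \frac{r_i \, c_j}{N}
  = r_i - \frac{r_i}{N} \sum_j c_j
  = r_i - \frac{r_i}{N} \cdot N
  = 0

% The same argument applies to column j:
\sum_i R_{ij}
  = c_j - \frac{c_j}{N} \sum_i r_i
  = 0
```

In practice, rounding the expected values can leave totals that are very slightly different from 0.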
9. χ2-test statistic step 4: the χ2-table
We then construct the χ2-table. For each cell,
$$\text{value} = \frac{R^2}{E} = \frac{(O - E)^2}{E}$$
Handy hint: The values in the χ2-table are never negative (a
value is 0 only when O = E).
10. χ2-test statistic step 5: the test statistic
The test statistic is the sum of all the values in the χ2-table:
$$\chi^2\text{-test statistic} = \sum \frac{R^2}{E} = \sum \frac{(O - E)^2}{E}$$
Handy hint: The test statistic is never negative. It is 0 only if
every observed value equals its expected value.
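Steps 4 and 5 can be sketched together in Python. The helper names chi_squared_table and chi_squared_statistic are hypothetical, chosen only for this illustration, and the data is the attendance and grades table from the earlier slide.

```python
def chi_squared_table(observed):
    """Step 4: return the table of (O - E)^2 / E values, one per cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    table = []
    for i, row in enumerate(observed):
        cells = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected value
            r = o - e                                        # residual
            cells.append(r ** 2 / e)
        table.append(cells)
    return table


def chi_squared_statistic(observed):
    """Step 5: add up every value in the chi-squared table."""
    return sum(sum(row) for row in chi_squared_table(observed))


observed = [
    [36, 7, 4],    # Distinction
    [12, 10, 7],   # Merit
    [2, 6, 16],    # Pass
]

statistic = chi_squared_statistic(observed)
print("Test statistic to 2 d.p.:", round(statistic, 2))
```

Because this sketch does not round the intermediate values, the result may differ in the last decimal place from a hand calculation that rounds each cell first.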
11. The critical value
We also need to find our critical value.
To do this we calculate the degrees of freedom of our table.
This is the degrees of freedom of the rows multiplied
by the degrees of freedom of the columns.
d.o.f. = (n − 1) × (m − 1), where n is the number of rows
and m is the number of columns.
Get the critical value by looking up the degrees of freedom
and the required significance level in the table. If the test
statistic is greater than the critical value we reject the null
hypothesis.
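The lookup and the decision rule can be sketched as below. The dictionary holds standard χ2 critical values rounded to 2 d.p. for 1 to 9 degrees of freedom at the 5% and 1% levels. The names CRITICAL_VALUES, degrees_of_freedom, critical_value and reject_null are illustrative assumptions, not part of the lecture.

```python
# Standard chi-squared critical values (upper tail), rounded to 2 d.p.
# Outer key: significance level. Inner key: degrees of freedom.
CRITICAL_VALUES = {
    0.05: {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07,
           6: 12.59, 7: 14.07, 8: 15.51, 9: 16.92},
    0.01: {1: 6.63, 2: 9.21, 3: 11.34, 4: 13.28, 5: 15.09,
           6: 16.81, 7: 18.48, 8: 20.09, 9: 21.67},
}


def degrees_of_freedom(n_rows, n_cols):
    """d.o.f. = (n - 1) * (m - 1) for a table with n rows and m columns."""
    return (n_rows - 1) * (n_cols - 1)


def critical_value(n_rows, n_cols, significance):
    """Look up the critical value for the table size and significance level."""
    return CRITICAL_VALUES[significance][degrees_of_freedom(n_rows, n_cols)]


def reject_null(statistic, n_rows, n_cols, significance):
    """Reject H0 when the test statistic exceeds the critical value."""
    return statistic > critical_value(n_rows, n_cols, significance)


# A 3 x 3 table tested at the 5% level has 4 degrees of freedom.
print(degrees_of_freedom(3, 3), critical_value(3, 3, 0.05))
```

If SciPy happens to be installed, scipy.stats.chi2.ppf(1 - significance, dof) should give the same critical values without a printed table.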
13. Example 1
Some students think that turning up to lectures doesn’t
make a difference to what grade they get. We consider a
sample of 100 students, and test at 5% level of
significance.
              Often attends   Sometimes   Rarely
Distinction        36             7          4
Merit              12            10          7
Pass                2             6         16
First state the hypotheses.
H0: the students’ attendance and grades are independent.
H1: the grade depends on attendance.
14. Example 1
Step 1: Calculate the column totals, the row totals and the
grand total.
               Often   Sometimes   Rarely   Row total
Distinction      36        7          4        47
Merit            12       10          7        29
Pass              2        6         16        24
Column total     50       23         27       100
Step 2: Calculate the expected table.
$$\text{expected value } (E) = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$
23.5 10.81 12.69
14.5 6.67 7.83
12 5.52 6.48
15. Example 1
Step 3: Calculate the residual table
residual value (R) = observed value (O) − expected value (E)
12.5 -3.81 -8.69
-2.5 3.33 -0.83
-10 0.48 9.52
Step 4: Calculate the χ2-table. Since we want the final value to
2 d.p., calculate each cell to 3 d.p.
$$\text{value} = \frac{R^2}{E} = \frac{(O - E)^2}{E}$$
6.649 1.343 5.951
0.431 1.663 0.088
8.333 0.042 13.986
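As a worked check of a single cell, here is the full chain of calculations for the Distinction / Often attends cell (row total 47, column total 50, grand total 100, observed value 36). The other eight cells follow the same pattern.

```latex
E = \frac{47 \times 50}{100} = 23.5

R = O - E = 36 - 23.5 = 12.5

\frac{R^2}{E} = \frac{12.5^2}{23.5} = \frac{156.25}{23.5} \approx 6.649
```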
Step 5: The test statistic is the sum of all the values in the table
$$\chi^2\text{-test statistic} = \sum \frac{R^2}{E} = \sum \frac{(O - E)^2}{E} = 38.49 \text{ to 2 d.p.}$$
16. Example 1
We now find our critical value.
The degrees of freedom are (3 − 1) × (3 − 1) = 4.
We consult our table and get a critical value of 9.49.
As our test statistic is greater than the critical value,
38.49 > 9.49, we decide to reject the null hypothesis.
We conclude that your grade depends on your attendance.
17. Example 2
In his sketch the Vitruvian Man, Leonardo da Vinci
displayed the “perfect” proportions of man.
We aren’t so sure there is a relationship between height and
nose size, so we gather some data and test at a 1% level of
significance.
              ≤ 1.5 cm   1.5-2.5 cm   ≥ 2.5 cm
≤ 165 cm          9           3           6
166-175 cm       15          18          18
176-185 cm       12          21          24
≥ 186 cm          9           6           9
First we state our hypotheses.
H0: height and nose size are independent variables.
H1: height and nose size are dependent variables.
18. Example 2
Calculate the column and row totals.
               ≤ 1.5 cm   1.5-2.5 cm   ≥ 2.5 cm   Row total
≤ 165 cm           9           3           6          18
166-175 cm        15          18          18          51
176-185 cm        12          21          24          57
≥ 186 cm           9           6           9          24
Column total      45          48          57         150
Calculate the expected table. For each cell, multiply the row
total by the column total and divide by the grand total.
5.4 5.76 6.84
15.3 16.32 19.38
17.1 18.24 21.66
7.2 7.68 9.12
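Before going further it is worth checking the viability rule from the expected-table slide. The sketch below encodes that rule as stated in these notes (all expected values greater than 5 in a 2 × 2 table, and greater than 5 in 80% of the cells in larger tables), reading "80%" as "at least 80%", which is an assumption. The function name expected_counts_ok is hypothetical.

```python
def expected_counts_ok(expected):
    """Check the rule of thumb on expected values described in these notes."""
    cells = [e for row in expected for e in row]
    is_two_by_two = len(expected) == 2 and all(len(row) == 2 for row in expected)
    if is_two_by_two:
        # 2 x 2 table: every expected value must be greater than 5.
        return all(e > 5 for e in cells)
    # Larger table: at least 80% of the expected values must be greater than 5.
    share_above_5 = sum(1 for e in cells if e > 5) / len(cells)
    return share_above_5 >= 0.8


# Expected table for the height and nose size data.
expected = [
    [5.4, 5.76, 6.84],
    [15.3, 16.32, 19.38],
    [17.1, 18.24, 21.66],
    [7.2, 7.68, 9.12],
]

print("Expected values large enough:", expected_counts_ok(expected))
```

Here the smallest expected value is 5.4, so every cell passes, although the first row is close to the limit.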
19. Example 2
Calculate the residual table. Subtract the expected value from
the observed value.
3.6 -2.76 -0.84
-0.3 1.68 -1.38
-5.1 2.76 2.34
1.8 -1.68 -0.12
Calculate the χ2-table. Square the residual value and divide by
the expected. As we want a final value to 2 d.p., calculate to 3
d.p.
2.4 1.323 0.103
0.006 0.173 0.098
1.521 0.418 0.253
0.45 0.368 0.002
$$\chi^2\text{-test statistic} = \sum \frac{R^2}{E} = \sum \frac{(O - E)^2}{E} = 7.11 \text{ to 2 d.p.}$$
20. Example 2
We now find our critical value.
The degrees of freedom are (4 − 1) × (3 − 1) = 6, as the table
has 4 rows and 3 columns.
We consult our table and get a critical value of 16.81.
Our test statistic is smaller than our critical value,
7.11 < 16.81, so we do not reject our null hypothesis.
We conclude that there is no significant evidence that height
and nose size are related.
21. Example 3
A cynical Englishman says that it rains all year round in
Britain. To see if this is the case we keep a log of the
weather in each season. We test at a 1% level of
significance.
          Overcast   Sunny   Rainy   Snowy
Spring        5        53      28       4
Summer       31        37      22       0
Autumn       32        25      30       3
Winter       22        15      29      24
First we state our hypotheses.
H0: the weather and season are independent.
H1: the weather depends on the season.
22. Example 3
Calculate the column and row totals.
               Overcast   Sunny   Rainy   Snowy   Row total
Spring             5        53      28       4        90
Summer            31        37      22       0        90
Autumn            32        25      30       3        90
Winter            22        15      29      24        90
Column total      90       130     109      31       360
Calculate the expected table. For each cell, multiply the row
total by the column total and divide by the grand total.
22.5 32.5 27.25 7.75
22.5 32.5 27.25 7.75
22.5 32.5 27.25 7.75
22.5 32.5 27.25 7.75
23. Example 3
Calculate the residual table. Subtract the expected value from
the observed value.
-17.5 20.5 0.75 -3.75
8.5 4.5 -5.25 -7.75
9.5 -7.5 2.75 -4.75
-0.5 -17.5 1.75 16.25
Calculate the χ2-table. Square the residual value and divide by
the expected. As we want a final value to 2 d.p., calculate to 3
d.p.
13.611 12.931 0.021 1.815
3.211 0.623 1.011 7.75
4.011 1.731 0.278 2.911
0.011 9.423 0.112 34.073
The test statistic is the sum of all the values in the table,
which is 93.52 to 2 d.p.
24. Example 3
We now find our critical value.
The degree of freedom will be (4 − 1) × (4 − 1) = 9.
We consult our table and get a critical value of 21.67.
Our test statistic is larger than our critical value,
93.52 > 21.67, so we decide to reject our null hypothesis.
We conclude that the weather depends on the season.
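To tie the three examples together, here is a compact sketch that reruns each test from its observed table. The function name chi_squared_test and the dictionary layout are illustrative assumptions. The critical values are the ones read from the table in each example.

```python
def chi_squared_test(observed, critical_value):
    """Return (test statistic, reject H0?) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected value
            statistic += (o - e) ** 2 / e
    return statistic, statistic > critical_value


# Each entry: (observed table, critical value used in the example).
examples = {
    "grades vs attendance (5% level)": (
        [[36, 7, 4], [12, 10, 7], [2, 6, 16]], 9.49),
    "height vs nose size (1% level)": (
        [[9, 3, 6], [15, 18, 18], [12, 21, 24], [9, 6, 9]], 16.81),
    "weather vs season (1% level)": (
        [[5, 53, 28, 4], [31, 37, 22, 0], [32, 25, 30, 3], [22, 15, 29, 24]], 21.67),
}

results = {}
for name, (observed, critical) in examples.items():
    statistic, reject = chi_squared_test(observed, critical)
    results[name] = (round(statistic, 2), reject)
    decision = "reject H0" if reject else "do not reject H0"
    print(f"{name}: statistic = {statistic:.2f}, critical = {critical}, {decision}")
```

The sketch keeps full precision in every cell, so a hand calculation that rounds each cell to 3 d.p. could differ slightly in the final digit.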