This document discusses the chi-square test of independence and how it can be used to determine if two variables are independent of each other. It provides an example using a contingency table to analyze the relationship between gender and voter turnout. The expected values for each cell are calculated based on the marginal totals. The chi-square statistic is then used to determine if the differences between observed and expected values are statistically significant, which would indicate the variables are dependent rather than independent.
3. Questions of independence are actually
the flip side of questions of relationship. If
a variable is independent of another variable,
then changes in one will not be accompanied
by changes in the other.
7. For example, the question, “Are admissions
decisions at a local community college fair?” can
reasonably be interpreted as a question of
independence (or bias).
8. If fairness is taken to mean that there is
proportional representation of minority and
majority students that mirrors the local
proportions, then a test of independence can
estimate whether admissions are “fair”.
9. The question becomes “Are admissions
decisions independent of majority/minority
status?”
10. Assuming that majority students are similar to
minority students in their preparation and
motivation, and that they apply to the community
college in proportionally similar numbers, a fair
admissions process should be independent of
majority/minority status and yield proportions of
admissions that are similar to the proportions of
majority and minority students in the local
population.
11. INDEPENDENT EXAMPLE: If you are a minority
you are neither more likely nor less likely to be
admitted.
15. Failure to be independent would indicate bias.
BIAS EXAMPLE: If you are a minority you are
more likely to be admitted.
BIAS EXAMPLE: If you are a minority you are less
likely to be admitted.
You will use certain statistical methods (like the
chi-square test of independence) to determine whether
a departure from independence is statistically
significant or not.
16. Here is an example taken from
http://omega.albany.edu:8008/mat108dir/chi2independence/chi2in-m2h.html:
19. In a certain town, there are about one million
eligible voters. A simple random sample of
10,000 eligible voters was chosen to study the
relationship between gender and participation
in the last election. The results are summarized
in the following 2×2 (read "two by two")
contingency table:
              Men     Women
__________________________
Voted         2792    3591
Didn't vote   1486    2131
20. We want to check whether being a man or a
woman (columns) is independent of having
voted in the last election (rows). In other words,
are gender and voting independent?
25. Solution:
In order to answer the question we need to
build a test of hypothesis. We have
Null Hypothesis = ‘Gender is independent of
Voting’
Alternative Hypothesis = ‘Gender and Voting
are dependent’
After specifying the Null Hypothesis, we need to
compute the expected table under the
assumption that rows and columns are in fact
independent.
29. As you can see we have the observed table
below:
              Men     Women
__________________________
Voted         2792    3591
Didn't vote   1486    2131
We need to create an expected table and then
determine whether the differences between the
observed and expected counts are significant:
Observed Numbers − Expected Numbers = Difference
31. Remember that the smaller the DIFFERENCE,
the better the fit, which in this case would favor
INDEPENDENCE between gender and voting
tendencies.
33. Conversely, the larger the DIFFERENCE, the worse
the fit, which in this case would indicate that
gender and voting tendencies are dependent
upon one another.
36. We use the chi-square distribution to determine
whether that difference is significant or not.
We will now show you how to compute the chi-square
statistic for a test of independence.
First, we compute the row and column totals
along with the grand total.
42. Observed table with the row totals (Total
Voted/Not Voted), the column totals (Total Men &
Women), and the grand total:
              Men     Women   Total
________________________________________
Voted         2792    3591    6383
Didn't vote   1486    2131    3617
Total         4278    5722    10000
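The totals can be double-checked with a few lines of Python. This is only a minimal sketch: the variable names (`observed`, `row_totals`, `col_totals`, `grand_total`) are illustrative and do not come from the original slides. Note that the "Voted" row sums to 2792 + 3591 = 6383.

```python
# Observed 2x2 counts from the contingency table (rows: voted / didn't vote).
observed = {
    "Voted": {"Men": 2792, "Women": 3591},
    "Didn't vote": {"Men": 1486, "Women": 2131},
}

# Row totals: sum across the columns of each row.
row_totals = {row: sum(cells.values()) for row, cells in observed.items()}

# Column totals: sum each column down the rows.
columns = ["Men", "Women"]
col_totals = {col: sum(observed[row][col] for row in observed) for col in columns}

# Grand total: every respondent in the sample.
grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```

The row totals and the column totals should each add up to the same grand total, which is a quick sanity check on the table.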
44. Now we have the information we need to create
an expected table. Here is the equation for
calculating the expected value for the cell “Men
who Voted”:
$$\text{Expected Value (Men who voted)} = \frac{\text{Number (all who voted)} \times \text{Number (all men)}}{\text{Number (total number)}}$$
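The equation can be written as a small helper. The function name `expected_count` and its parameter names are hypothetical, chosen for illustration only; the sketch assumes the marginal totals are computed directly from the four observed cells.

```python
def expected_count(row_total, col_total, grand_total):
    """Expected cell count under independence: (row total * column total) / grand total."""
    return row_total * col_total / grand_total

# Marginal totals built from the observed cells.
voted_total = 2792 + 3591                  # all who voted
men_total = 2792 + 1486                    # all men
grand_total = 2792 + 3591 + 1486 + 2131    # everyone in the sample

men_voted_expected = expected_count(voted_total, men_total, grand_total)
print(men_voted_expected)
```

The same helper applies unchanged to the other three cells by swapping in the matching row and column totals.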
45. Plugging in the totals for “Men who Voted” (all
who voted = 2792 + 3591 = 6383, all men = 4278,
total number = 10000):
$$\text{Expected Value (Men who voted)} = \frac{6383 \times 4278}{10000} = \frac{27306474}{10000} = 2730.6474 \approx 2731$$
53. EXPECTED TABLE (cells not yet computed are marked "?")
              Men     Women   Total
________________________________________
Voted         2731    ?       6383
Didn't vote   ?       ?       3617
Total         4278    5722    10000
What is the expected value for Women who Voted?
55. Women who voted (all who voted = 2792 + 3591 = 6383,
all women = 5722, total number = 10000):
$$\text{Expected Value (Women who voted)} = \frac{6383 \times 5722}{10000} = \frac{36523526}{10000} = 3652.3526 \approx 3652$$
63. EXPECTED TABLE (cells not yet computed are marked "?")
              Men     Women   Total
________________________________________
Voted         2731    3652    6383
Didn't vote   ?       ?       3617
Total         4278    5722    10000
What is the expected value for Men who Didn’t Vote?
65. Men who didn’t vote (all who didn’t vote = 3617,
all men = 4278, total number = 10000):
$$\text{Expected Value (Men who didn't vote)} = \frac{3617 \times 4278}{10000} = \frac{15473526}{10000} = 1547.3526 \approx 1547$$
72. EXPECTED TABLE (cells not yet computed are marked "?")
              Men     Women   Total
________________________________________
Voted         2731    3652    6383
Didn't vote   1547    ?       3617
Total         4278    5722    10000
What is the expected value for Women who Didn’t Vote?
73. Women who didn’t vote (all who didn’t vote = 3617,
all women = 5722, total number = 10000):
$$\text{Expected Value (Women who didn't vote)} = \frac{3617 \times 5722}{10000} = \frac{20696474}{10000} = 2069.6474 \approx 2070$$
80. The completed expected table:
EXPECTED TABLE
              Men     Women   Total
________________________________________
Voted         2731    3652    6383
Didn't vote   1547    2070    3617
Total         4278    5722    10000
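All four expected counts can be produced in one pass rather than cell by cell. The following is a minimal sketch under the assumption that the table is stored as a list of rows; the names `observed` and `expected` are illustrative.

```python
observed = [
    [2792, 3591],  # Voted:       Men, Women
    [1486, 2131],  # Didn't vote: Men, Women
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Each expected cell is (its row total * its column total) / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for label, row in zip(["Voted", "Didn't vote"], expected):
    print(label, [round(value) for value in row])
```

Because the expected counts are built from the marginal totals, each row and column of the expected table sums to the same totals as the observed table.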
84. OBSERVED TABLE
              Men     Women
__________________________
Voted         2792    3591
Didn't vote   1486    2131
Total         4278    5722    10000
EXPECTED TABLE
              Men     Women
__________________________
Voted         2731    3652
Didn't vote   1547    2070
Total         4278    5722    10000
Observed − Expected = Difference
With the information above, we can now plug in
the numbers using the chi-square independence
test.
Note: this is the same equation that is used
with the chi-square goodness of fit test:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
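The slides that carry out this arithmetic are not part of this transcript, so here is a minimal sketch of the calculation. The function name `chi_square_statistic` is hypothetical. With the rounded expected counts from the expected table the statistic comes out at roughly 6.6; using the unrounded expected counts gives a slightly larger value of about 6.66.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells of two same-shaped tables."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [[2792, 3591], [1486, 2131]]

# Expected counts rounded to whole numbers, as shown in the expected table.
expected_rounded = [[2731, 3652], [1547, 2070]]

# Unrounded expected counts: (row total * column total) / grand total.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
expected_exact = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2_rounded = chi_square_statistic(observed, expected_rounded)
chi2_exact = chi_square_statistic(observed, expected_exact)
print(round(chi2_rounded, 2), round(chi2_exact, 2))
```

Either way the statistic is well above the 3.84 critical value used later, so the rounding does not change the conclusion.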
123. Now we determine whether a χ² of 6.6 exceeds the
critical χ².
125. To find the critical χ² we first must
determine the degrees of freedom as well as set
the probability level.
The probability or alpha level is the
probability of a Type I error we are willing to live
with (i.e., the probability of rejecting the null
hypothesis when it is actually true). Generally
this value is .05, which is like saying we are
willing to wrongly reject a true null hypothesis
5 out of 100 times.
127. Degrees of Freedom are calculated by subtracting
1 from the number of rows, subtracting 1 from the
number of columns, and multiplying the two results:
(2 rows − 1) × (2 columns − 1) = 1 × 1 = 1. Degrees of
Freedom = 1.
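As a small illustration of the rule, the degrees of freedom can be read off the shape of the table. The helper name `degrees_of_freedom` is hypothetical, and the sketch assumes a rectangular table stored as a list of rows without the totals.

```python
def degrees_of_freedom(table):
    """(number of rows - 1) * (number of columns - 1) for a rectangular table."""
    n_rows = len(table)
    n_cols = len(table[0])
    return (n_rows - 1) * (n_cols - 1)

voting_table = [[2792, 3591], [1486, 2131]]  # 2 rows x 2 columns, totals excluded
df = degrees_of_freedom(voting_table)
print(df)
```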
132. We now have all of the information we need to
determine the critical χ².
We go to the Chi-Square Distribution Table and
locate the degrees of freedom (the row), and then we
locate the probability or alpha level (the column):
df    0.100    0.050    0.025
1      2.71     3.84     5.02
2      4.61     5.99     7.38
3      6.25     7.82     9.35
4      7.78     9.49    11.14
5      9.24    11.07    12.83
6     10.64    12.59    14.45
7     12.02    14.07    16.01
8     13.36    15.51    17.54
9     14.68    16.92    19.02
…      …        …        …
Where these two values intersect in the table we find
the critical χ². For 1 degree of freedom and an alpha
level of .05, the critical χ² is 3.84.
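The table lookup can be mimicked with a nested dictionary. This is a sketch only: `CRITICAL_VALUES` holds just the first three rows of the table, and both it and the `critical_value` helper are illustrative names rather than part of any library.

```python
# Excerpt of the chi-square table, keyed by degrees of freedom and then by alpha.
CRITICAL_VALUES = {
    1: {0.100: 2.71, 0.050: 3.84, 0.025: 5.02},
    2: {0.100: 4.61, 0.050: 5.99, 0.025: 7.38},
    3: {0.100: 6.25, 0.050: 7.82, 0.025: 9.35},
}

def critical_value(df, alpha):
    """Return the tabulated critical chi-square for the given df and alpha."""
    return CRITICAL_VALUES[df][alpha]

critical = critical_value(df=1, alpha=0.050)
print(critical)
```

Choosing the row by degrees of freedom and the column by alpha mirrors reading the printed table.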
136. Since the chi-square test of independence value (6.6)
exceeds the critical χ² (3.84), we will reject the
null hypothesis.
Voting patterns and gender are statistically
significantly dependent on one another.
There actually is a significant difference between
the observed and expected counts.
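The decision rule can also be checked with a p-value. The sketch below relies on one fact about the chi-square distribution: with 1 degree of freedom the statistic is a squared standard normal, so its upper-tail probability is `erfc(sqrt(x / 2))`. The helper name `p_value_df1` is hypothetical, and it is valid for 1 degree of freedom only.

```python
import math

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom.

    With 1 df the statistic is a squared standard normal, so
    P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))

chi2 = 6.6        # statistic reported in the slides
critical = 3.84   # tabulated critical value for df = 1, alpha = .05
alpha = 0.05

# Two equivalent ways of stating the decision.
reject_by_critical_value = chi2 > critical
p_value = p_value_df1(chi2)
reject_by_p_value = p_value < alpha

print(reject_by_critical_value, reject_by_p_value)
```

Comparing the statistic with the critical value and comparing the p-value with alpha lead to the same decision.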
137. So what is the difference between chi-square
test of goodness of fit and test of
independence?
141. A goodness-of-fit test is a one-variable chi-square
test.
In this example, a department chair wants to
know if the enrollments across three professors
are equally distributed.
Here is the actual, or observed, data:
OBSERVED TABLE
                    Prof A’s Class   Prof B’s Class   Prof C’s Class
Students enrolled   31               25               10
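The slides do not carry this example through, so the following is only a sketch of how the goodness-of-fit statistic could be computed. It assumes that "equally distributed" means each class is expected to enroll one third of the 66 students; the variable names are illustrative.

```python
observed = {"Prof A": 31, "Prof B": 25, "Prof C": 10}

# Assumption: equal enrollment, so each class expects the same share of the total.
total = sum(observed.values())
expected_per_class = total / len(observed)

# Same formula as the test of independence: sum of (O - E)^2 / E.
chi2 = sum(
    (count - expected_per_class) ** 2 / expected_per_class
    for count in observed.values()
)

# One variable with k categories has k - 1 degrees of freedom.
df = len(observed) - 1

print(expected_per_class, round(chi2, 2), df)
```

With one variable the expected counts come from a stated hypothesis (here, equal shares), not from row and column totals.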
146. A test of independence is a two-variable chi-square
test (here the two variables are gender and professor).
For example, a department chair wants to know
if enrollments of women and men are equally
distributed across three professors’ classes.
OBSERVED TABLE
          Prof A’s Class   Prof B’s Class   Prof C’s Class
Men       21               7                7
Women     10               18               3
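The slides stop at the observed table, so this is a hedged sketch of how the same steps used for the voting example (totals, expected counts, χ², degrees of freedom) would apply to a 2×3 table. Variable names are illustrative.

```python
observed = [
    [21, 7, 7],    # Men:   Prof A, Prof B, Prof C
    [10, 18, 3],   # Women: Prof A, Prof B, Prof C
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts under independence of gender and professor.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi2, 2), df)
```

Unlike the goodness-of-fit case, the expected counts here are derived from the row and column totals of the table itself.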