2. Relationships Between Variables
• The relationship between variables can be
explained in various ways such as:
– Presence / absence of a relationship
– Directionality of the relationship
– Strength of association
– Type of relationship
3. Relationships Between Variables
• Presence / absence of a relationship
– E.g., to study customer satisfaction at a
fast-food restaurant, we first need to know
whether food quality and customer satisfaction
are related at all
4. Relationships Between Variables
• Direction of the relationship
– The direction of a relationship can be either
positive or negative
– Food quality perceptions are related positively to
customer commitment toward a restaurant.
5. Relationships Between Variables
• Strength of association
– The strength of a relationship is generally
categorized as nonexistent, weak, moderate, or strong.
– Quality of food is strongly associated with customer
satisfaction in a fast-food restaurant
6. Relationships Between Variables
• Type of association
– How can the link between Y and X best be
described?
– There are different ways in which two variables
can share a relationship
• Linear relationship
• Curvilinear relationship
7. Chi-Square (χ²) and Frequency Data
• Today the data that we analyze consists of frequencies; that
is, the number of individuals falling into categories. In other
words, the variables are measured on a nominal scale.
• The test statistic for frequency data is Pearson Chi-Square.
The magnitude of Pearson Chi-Square reflects the amount of
discrepancy between observed frequencies and expected
frequencies.
8. Steps in Test of Hypothesis
1. Determine the appropriate test
2. Establish the level of significance: α
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare computed test statistic against a
tabled/critical value
9. 1. Determine Appropriate Test
• Chi Square is used when both variables are
measured on a nominal scale.
• It can be applied to interval or ratio data that
have been categorized into a small number of
groups.
• It assumes that the observations are randomly
sampled from the population.
• All observations are independent (an individual
can appear only once in a table and there are no
overlapping categories).
• It does not make any assumptions about the
shape of the distribution nor about the
homogeneity of variances.
10. 2. Establish Level of Significance
• α is a predetermined value
• Conventional values:
– α = .05
– α = .01
– α = .001
11. 3. Determine The Hypothesis:
Whether There is an Association
or Not
• Ho : The two variables are independent
• Ha : The two variables are associated
12. 4. Calculating Test Statistics
• Contrasts observed frequencies in each cell of a
contingency table with expected frequencies.
• The expected frequencies represent the number of
cases that would be found in each cell if the null
hypothesis were true ( i.e. the nominal variables are
unrelated).
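• The expected frequency of a cell, if the two variables
are unrelated, is the product of its row frequency and
column frequency divided by the number of cases.
$F_e = \frac{F_r F_c}{N}$
where Fr is the row frequency, Fc is the column frequency, and N is the number of cases.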
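As a minimal sketch of the formula above, the following Python helper computes the expected frequency for every cell of a contingency table. The function name `expected_frequencies` and the list-of-rows layout are illustrative assumptions, not part of the slides. The counts are the gender by opinion table used in the worked example later in this deck.

```python
def expected_frequencies(observed):
    """Return Fe = Fr * Fc / N for every cell of a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]        # Fr for each row
    col_totals = [sum(col) for col in zip(*observed)]  # Fc for each column
    n = sum(row_totals)                                # N, total number of cases
    return [[fr * fc / n for fc in col_totals] for fr in row_totals]


# Observed counts: rows = Male, Female; columns = Favor, Neutral, Oppose.
observed = [[10, 10, 30],
            [15, 15, 10]]
expected = expected_frequencies(observed)
for row in expected:
    print([round(fe, 1) for fe in row])
# prints [13.9, 13.9, 22.2] then [11.1, 11.1, 17.8]
```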
14. 4. Calculating Test Statistics
$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$
where Fo is the observed frequency and Fe is the expected frequency in each cell.
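The Pearson chi-square sum can be sketched in a few lines of Python. The function name `chi_square` and the flat-list inputs are assumptions made for illustration. The observed and expected counts are taken from the heparin lock crosstabulation shown later in these slides, where SPSS reports a value of .250.

```python
def chi_square(observed, expected):
    """Sum (Fo - Fe)^2 / Fe over all cells; both arguments are flat lists of cell values."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Cells in the order: complication (group 1, group 2), no complication (group 1, group 2).
observed = [9, 11, 41, 39]
expected = [10.0, 10.0, 40.0, 40.0]
statistic = chi_square(observed, expected)
print(round(statistic, 3))
```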
15. 5. Determine Degrees of Freedom
$df = (R - 1)(C - 1)$
where R is the number of levels in the row variable and C is the number of levels in the column variable.
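A short sketch of the degrees-of-freedom rule, assuming a hypothetical helper named `degrees_of_freedom` that takes the number of row and column levels:

```python
def degrees_of_freedom(row_levels, column_levels):
    """df = (R - 1)(C - 1) for an R x C contingency table."""
    return (row_levels - 1) * (column_levels - 1)


print(degrees_of_freedom(2, 3))  # a 2 x 3 table, e.g. gender by a three-level opinion
print(degrees_of_freedom(2, 2))  # a 2 x 2 table
```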
16. 6. Compare computed test statistic
against a tabled/critical value
• The computed value of the Pearson chi-
square statistic is compared with the critical
value to determine if the computed value is
improbable
• The critical tabled values are based on
sampling distributions of the Pearson chi-
square statistic
• If the calculated χ² is greater than the tabled
(critical) χ² value, reject Ho
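Comparing the statistic with a tabled critical value is equivalent to comparing its upper-tail probability (p-value) with α. The sketch below computes that probability with the Python standard library only, using closed-form expressions for the chi-square distribution with integer degrees of freedom. The names `chi2_sf` and `reject_null` are illustrative assumptions. The two statistics tried at the end (11.025 with df = 2, and .250 with df = 1) are the values reported in the SPSS outputs later in this deck.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    t = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-t) * sum_{k=0}^{df/2 - 1} t^k / k!
        total = sum(t ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-t) * total
    # Odd df: erfc(sqrt(t)) + exp(-t) * sum_{k=1}^{(df-1)/2} t^(k - 1/2) / Gamma(k + 1/2)
    total = sum(t ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(t)) + math.exp(-t) * total


def reject_null(statistic, df, alpha=0.05):
    """Reject Ho when the p-value falls below the chosen significance level."""
    return chi2_sf(statistic, df) < alpha


print(round(chi2_sf(11.025, 2), 3), reject_null(11.025, 2))
print(round(chi2_sf(0.250, 1), 3), reject_null(0.250, 1))
```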
17. Example
• Suppose a researcher is interested in buying
preferences of environmentally conscious
consumers.
• A questionnaire was developed and sent to a
random sample of 90 voters.
• The researcher also collects information about
the gender of the sample of 90 respondents.
18. Bivariate Frequency Table or
Contingency Table
            Favor   Neutral   Oppose   f row
Male          10       10       30       50
Female        15       15       10       40
f column      25       25       40     n = 90
• The cell counts are the observed frequencies.
• f row is the row frequency.
• f column is the column frequency.
22. 1. Determine Appropriate Test
1. Gender (2 levels), nominal
2. Buying preference (3 levels), nominal
• Both variables are nominal, so chi-square is the appropriate test.
24. 3. Determine The Hypothesis
• Ho : There is no difference between men and
women in their opinion on pro-environmental
products.
• Ha : There is an association between gender
and opinion on pro-environmental products.
25. 4. Calculating Test Statistics
            Favor        Neutral      Oppose       f row
Men         fo = 10      fo = 10      fo = 30      50
            fe = 13.9    fe = 13.9    fe = 22.2
Women       fo = 15      fo = 15      fo = 10      40
            fe = 11.1    fe = 11.1    fe = 17.8
f column    25           25           40           n = 90
• E.g., fe for Men, Favor = 50 × 25 / 90 = 13.9
• E.g., fe for Women, Favor = 40 × 25 / 90 = 11.1
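The test statistic for this table can be worked out term by term from the fo and fe values. The arithmetic below is a sketch that carries the expected frequencies to three decimals (for example 13.889 rather than 13.9), so it may differ slightly from a hand calculation with the rounded values in the table.

```latex
\begin{align*}
\chi^2 &= \frac{(10-13.889)^2}{13.889} + \frac{(10-13.889)^2}{13.889} + \frac{(30-22.222)^2}{22.222} \\
       &\quad + \frac{(15-11.111)^2}{11.111} + \frac{(15-11.111)^2}{11.111} + \frac{(10-17.778)^2}{17.778} \\
       &\approx 1.089 + 1.089 + 2.722 + 1.361 + 1.361 + 3.403 \\
       &\approx 11.03
\end{align*}
```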
30. 6. Compare computed test statistic
against a tabled/critical value
• α = 0.05
• df = 2
• Critical tabled value = 5.991
• Test statistic, 11.03, exceeds critical value
• Null hypothesis is rejected
• Men and women differ significantly in their
opinions on pro-environmental products
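For df = 2 specifically, the critical value does not need a table: the upper-tail probability of the chi-square distribution with 2 degrees of freedom is exp(-x/2), so the critical value is -2 ln(α). The sketch below relies on that identity. The helper name `critical_value_df2` is an assumption, and the shortcut does not apply to other degrees of freedom.

```python
import math


def critical_value_df2(alpha):
    """Critical chi-square value for df = 2, from solving exp(-x / 2) = alpha."""
    return -2.0 * math.log(alpha)


critical = critical_value_df2(0.05)
print(round(critical, 3))  # prints 5.991
print(11.03 > critical)    # the example's test statistic exceeds the critical value
```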
31. SPSS Output Example
Chi-Square Tests
                                Value        df   Asymp. Sig. (2-sided)
Pearson Chi-Square              11.025 (a)    2   .004
Likelihood Ratio                11.365        2   .003
Linear-by-Linear Association     8.722        1   .003
N of Valid Cases                90
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 11.11.
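The footnote in this output can be reproduced with a short check on the expected counts. This is a sketch under the assumption that the table is stored as a list of rows; the helper name `expected_count_check` and the threshold argument are illustrative, with the default of 5 taken from the SPSS footnote.

```python
def expected_count_check(observed, threshold=5):
    """Return (number of cells with expected count below threshold, minimum expected count)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [fr * fc / n for fr in row_totals for fc in col_totals]
    low_cells = sum(1 for fe in expected if fe < threshold)
    return low_cells, min(expected)


# Gender by opinion table from the example: rows = Male, Female.
low_cells, minimum = expected_count_check([[10, 10, 30], [15, 15, 10]])
print(low_cells, round(minimum, 2))  # prints 0 11.11
```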
32. Additional Information in SPSS Output
• Exceptions that might distort χ² assumptions
– Associations in some but not all categories
– Low expected frequency per cell
• Extent of association is not the same as statistical
significance (demonstrated through an example)
33. Another Example: Heparin Lock Placement
Complication Incidence * Heparin Lock Placement Time Group Crosstabulation
(Heparin Lock Placement Time Group: 1 = 72 hrs, 2 = 96 hrs)
                                                                    1        2        Total
Had complication     Count                                          9        11       20
                     Expected Count                                 10.0     10.0     20.0
                     % within Heparin Lock Placement Time Group     18.0%    22.0%    20.0%
Had no complication  Count                                          41       39       80
                     Expected Count                                 40.0     40.0     80.0
                     % within Heparin Lock Placement Time Group     82.0%    78.0%    80.0%
Total                Count                                          50       50       100
                     Expected Count                                 50.0     50.0     100.0
                     % within Heparin Lock Placement Time Group     100.0%   100.0%   100.0%
from Polit Text: Table 8-1
34. Hypotheses in Heparin Lock Placement
• Ho: There is no association between
complication incidence and heparin lock
placement time. (The variables are
independent).
• Ha: There is an association between
complication incidence and heparin lock
placement time. (The variables are related).
35. More of SPSS Output
Chi-Square Tests
                                Value       df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                 (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              .250 (b)    1    .617
Continuity Correction (a)       .063        1    .803
Likelihood Ratio                .250        1    .617
Fisher's Exact Test                                            .803         .402
Linear-by-Linear Association    .248        1    .619
N of Valid Cases                100
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 10.00.
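The Fisher's Exact Test row can be approximated with the hypergeometric distribution. The sketch below is an illustration, not SPSS's implementation: it assumes the common definition in which the two-sided p-value sums the probabilities of all tables that are no more likely than the observed one, and the function name `fisher_exact_2x2` is hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Return (two_sided_p, one_sided_p) for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Unnormalised hypergeometric weight for each possible top-left count k.
    weights = {k: comb(col1, k) * comb(n - col1, row1 - k) for k in range(lo, hi + 1)}
    total = sum(weights.values())
    observed_weight = weights[a]
    # Two-sided: all tables whose probability does not exceed the observed table's.
    two_sided = sum(w for w in weights.values() if w <= observed_weight) / total
    # One-sided: the smaller of the two tail probabilities that include the observed table.
    lower = sum(w for k, w in weights.items() if k <= a) / total
    upper = sum(w for k, w in weights.items() if k >= a) / total
    return two_sided, min(lower, upper)


# Heparin lock table: complications 9 and 11, no complications 41 and 39.
two_sided, one_sided = fisher_exact_2x2(9, 11, 41, 39)
print(round(two_sided, 3), round(one_sided, 3))
```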
36. Pearson Chi-Square
• Pearson Chi-Square = .250, p = .617
Since p > .05, we fail to reject the null hypothesis
that the complication rate is unrelated to heparin
lock placement time.
• Continuity correction is used in situations in which
the expected frequency for any cell in a 2 by 2
table is less than 10.
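The continuity-corrected value can be sketched as follows, assuming the correction reported by SPSS is Yates' correction, which subtracts 0.5 from each absolute difference between observed and expected counts before squaring. The function name `yates_chi_square` is an illustrative assumption, and the data are the heparin lock counts from the crosstabulation.

```python
def yates_chi_square(table):
    """Continuity-corrected chi-square for a 2x2 table given as [[a, b], [c, d]]."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            # Shrink each |Fo - Fe| by 0.5 before squaring (Yates' correction).
            statistic += (abs(fo - fe) - 0.5) ** 2 / fe
    return statistic


corrected = yates_chi_square([[9, 11], [41, 39]])
print(round(corrected, 3))
```

With these counts the corrected statistic is 0.0625, which matches the .063 reported in the Continuity Correction row after rounding.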
37. More SPSS Output
Symmetric Measures
                                              Value    Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
Nominal by Nominal     Phi                    -.050                                            .617
                       Cramer's V              .050                                            .617
Interval by Interval   Pearson's R            -.050    .100                    -.496           .621 (c)
Ordinal by Ordinal     Spearman Correlation   -.050    .100                    -.496           .621 (c)
N of Valid Cases                               100
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.
38. Phi Coefficient
• Pearson Chi-Square provides information about the
existence of a relationship between 2 nominal
variables, but not about the magnitude of the
relationship
• Phi coefficient is the measure of the strength
of the association
$\phi = \sqrt{\frac{\chi^2}{N}}$
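As a quick check of the phi formula, the sketch below plugs in the Pearson chi-square (.250) and number of cases (100) from the heparin lock output. The function name `phi_coefficient` is an assumption. The formula yields only the magnitude, whereas SPSS also attaches a sign (-.050) for 2 by 2 tables.

```python
import math


def phi_coefficient(chi_square, n):
    """Magnitude of phi for a 2x2 table: sqrt(chi-square / N)."""
    return math.sqrt(chi_square / n)


phi = phi_coefficient(0.250, 100)
print(round(phi, 3))  # prints 0.05
```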
39. Cramer’s V
• When the table is larger than 2 by 2, a different
index must be used to measure the strength of the
relationship between the variables. One such index
is Cramer’s V.
• If Cramer’s V is large, it means that there is a
tendency for particular categories of the first
variable to be associated with particular
categories of the second variable.
$V = \sqrt{\frac{\chi^2}{N(k - 1)}}$
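A minimal sketch of Cramer's V, taking k as the smaller of the number of rows and columns. The function name `cramers_v` and its argument order are assumptions. The first call uses the 2 by 3 gender by opinion example (χ² = 11.025, N = 90); the second uses the 2 by 2 heparin lock example, where V reduces to the magnitude of phi.

```python
import math


def cramers_v(chi_square, n, rows, columns):
    """Cramer's V = sqrt(chi-square / (N * (k - 1))), with k = min(rows, columns)."""
    k = min(rows, columns)
    return math.sqrt(chi_square / (n * (k - 1)))


v_gender = cramers_v(11.025, 90, 2, 3)
v_heparin = cramers_v(0.250, 100, 2, 2)
print(round(v_gender, 2), round(v_heparin, 2))
```

For the heparin table this agrees with the .050 shown in the Symmetric Measures output.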
40. Cramer’s V
$V = \sqrt{\frac{\chi^2}{N(k - 1)}}$
where N is the number of cases and k is the smallest of the number of rows or