1. March 15, 2017 [BRM PROJECT REPORT: CHI-SQUARE TEST OF INDEPENDENCE]
BRM Project Report: Chi-Square Test of Independence
What is the Chi-Square Test?
• The chi-square test is a statistical test commonly used to compare observed
data with the data we would expect to obtain under a specific hypothesis.
• It is a non-parametric test, which means that it makes no assumptions about
the parameters (defining properties) of the population distribution(s) from
which the data are drawn.
• The data used to calculate a chi-square statistic must come from a random
sample, be raw counts (frequencies rather than percentages), fall into mutually
exclusive categories, and be drawn from a large enough sample.
• It covers three different types of analysis:
o Goodness of Fit: determines whether the sample under analysis was drawn
from a population that follows some specified distribution.
o Test for Homogeneity: tests the proposition that several populations are
homogeneous with respect to some characteristic.
o Test of Independence
What is the Test for Independence of Variables?
• The Chi-Square Test of Independence is used to determine whether there is a
significant relationship between two nominal (categorical) variables.
• The frequency of each category of one nominal variable is compared across
the categories of the second nominal variable.
• The Chi-Square Test of Independence can only compare categorical
variables.
• Additionally, the Chi-Square Test of Independence only assesses
associations between categorical variables and cannot provide any
inferences about causation.
• The null hypothesis (H0) and alternative hypothesis (H1) of the Chi-Square
Test of Independence can be expressed as follows:
H0: "Variable X" is independent of "Variable Y"
H1: "Variable X" is not independent of "Variable Y"
What is the Objective of the Study?
• For the purpose of the study, we take different examples from different
studies and surveys.
• Our study is restricted to analyzing the degree of association between two
categorical variables considered in each survey, which is why we use the
Chi-Square Test of Independence.
Problem 1:
The manager of an Apple store located in New Delhi, India wishes to analyze its sales data in order to
understand the relationship between the gender of its customers and the product they bought. The manager
wishes to check whether product sales depend on the gender of the customer.
Dataset
Product      Gender (1: Male, 0: Female)   Frequency
iPod         1                             189
iPod         0                             77
MacBook      1                             256
MacBook      0                             187
iMac         1                             159
iMac         0                             102
iPhone 5s    1                             120
iPhone 5s    0                             54
Variables
1. Product: Products sold by the Apple Store.
Data Type: Nominal.
2. Gender: Gender of Apple’s customers
Data Type: Nominal
3. Frequency: Number of Sales of respective product.
Data Type: Scale
Methodology
Since our objective is to find the dependency between the product sold and the gender of our customers, and
both variables are categorical (nominal) in nature, we use the Chi-Square Test of Independence.
We have used IBM SPSS Statistics, Version 23 as the statistical platform to run the chi-square test.
The test identifies whether two categorical variables are dependent on each other, that is, whether there
is a significant relationship between them.
For our purpose, we have considered a confidence level of 95%, i.e. a significance level of 0.05.
Steps Followed
Since the data for our variables is in aggregated form and accompanied by frequencies, we need to weight
the cases (Data > Weight Cases) to tell SPSS that the Frequency variable gives the number of occurrences
of each case.
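As a rough illustration of what weighting does, the sketch below (plain Python, not part of the SPSS workflow used in this report) turns the aggregated rows from the Dataset table into a cross-tabulation. The variable names are chosen here for illustration only.

```python
from collections import defaultdict

# Aggregated rows from the Dataset table: (product, gender code, frequency).
rows = [
    ("iPod", 1, 189), ("iPod", 0, 77),
    ("MacBook", 1, 256), ("MacBook", 0, 187),
    ("iMac", 1, 159), ("iMac", 0, 102),
    ("iPhone 5s", 1, 120), ("iPhone 5s", 0, 54),
]

# Weighting by frequency means each row counts `freq` times instead of once.
crosstab = defaultdict(lambda: defaultdict(int))
for product, gender, freq in rows:
    crosstab[product][gender] += freq

# The weighted number of cases is the sum of all frequencies.
total_cases = sum(freq for _, _, freq in rows)
print(total_cases)
print({product: dict(by_gender) for product, by_gender in crosstab.items()})
```

Without weighting, the same data would be treated as only eight cases, one per row.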
Once the cases have been weighted, go to Analyze > Descriptive Statistics > Crosstabs to perform the
chi-square test.
Either variable can be selected as the row or the column variable. Once done, click Statistics.
Check the Chi-square option at the top. Since this option only tells us whether the relationship is
significant, we also check the Phi and Cramer's V option under the Nominal section to measure how strong
the relationship is. Click Continue and then click Cells.
The Observed and Expected options display, for each cell, the observed frequency and the expected
frequency (the count we would expect if the two variables were independent, calculated as row total x
column total / grand total). Click Continue. Once these options have been selected, we can run the test.
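The expected frequencies that SPSS reports can be reproduced by hand. The following is a minimal sketch in plain Python, assuming the counts from the Dataset table, with gender code 1 in the first column and 0 in the second. The variable names are illustrative.

```python
# Observed counts from the Dataset table; columns are gender codes 1 and 0.
observed = {
    "iPod": [189, 77],
    "MacBook": [256, 187],
    "iMac": [159, 102],
    "iPhone 5s": [120, 54],
}

row_totals = {product: sum(counts) for product, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    product: [row_totals[product] * col_total / grand_total
              for col_total in col_totals]
    for product in observed
}

for product, counts in expected.items():
    print(product, [round(c, 2) for c in counts])
```

The expected counts in each row add up to the observed row total, which is a quick sanity check on the calculation.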
INTERPRETATION
Product * Gender Crosstabulation: This table in the final output shows the observed and expected
frequencies for each combination of product and gender.
Chi-Square Tests: This table shows the values of the following terms:
1. Pearson Chi-Square
a. Value: The value of 15.714 is the chi-square test statistic. It is compared against the chi-square
distribution with the given df to obtain the significance value.
b. Df: The degrees of freedom, calculated as (rows - 1) x (columns - 1). For this 4 x 2 table, df = 3.
c. Asymptotic Significance: This is the p-value. A value of 0.001 is below 0.05, so as per the given
data, the relationship between the gender of the customers and the product sold is significant.
2. Symmetric Measures
a. Phi Value: Phi measures the strength of the association. A value of .117 indicates that the
association between gender and product sold is weak. It is not a percentage of variation explained.
b. Approximate Significance: The value of .001 suggests that the result is significant.
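The reported p-value and phi can be cross-checked from the chi-square value alone. The sketch below uses plain Python and the closed-form upper tail of the chi-square distribution for 3 degrees of freedom. The helper name `chi2_sf_df3` is introduced here for illustration and is only valid for df = 3.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of the chi-square distribution with 3 df."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

chi_square = 15.714   # Pearson chi-square value reported in the output
n = 1144              # sum of the eight frequencies in the Dataset table
rows, cols = 4, 2     # four products, two gender codes

df = (rows - 1) * (cols - 1)
p_value = chi2_sf_df3(chi_square)
phi = math.sqrt(chi_square / n)
# Cramer's V rescales phi by the smaller of (rows - 1) and (cols - 1).
cramers_v = math.sqrt(chi_square / (n * min(rows - 1, cols - 1)))

print(df, round(p_value, 3), round(phi, 3), round(cramers_v, 3))
```

Because one of the variables has only two categories, Cramer's V equals phi for this table.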
Problem 2
A manager wants to see whether geographical region is associated with ownership of a
Macintosh computer. Determine whether owning a Mac and the geographical region
where the owner lives are related.
Region       Mac   No Mac
North East   12    56
South West   70    18
Mid West     17    18
In this case we have two (or more) variables, both of which are categorical, and we
want to determine if they are independent or related.
Hypothesis:
H0: The two variables are independent.
H1: The two variables are not independent (they are related).
Chi-Square Test Results: Region * Mac_NoMac
Interpretation
Region * Mac_NoMac Crosstabulation: This table in the final output shows the observed and
expected frequencies for each combination of region and Mac ownership.
Chi-Square Tests: This table shows the values of the following terms:
1. Pearson Chi-Square
a. Value: The value of 59.049 is the chi-square test statistic. It is compared against the chi-square
distribution with the given df to obtain the significance value.
b. Df: The degrees of freedom, calculated as (rows - 1) x (columns - 1). For this 3 x 2 table, df = 2.
c. Asymptotic Significance: This is the p-value. SPSS displays .000, which means p < .001, so as per
the given data, the relationship between region and Mac ownership is significant.
2. Symmetric Measures
a. Phi Value: The value of .556 measures the strength of the association and indicates a fairly
strong association between region and Mac ownership. It is not a percentage of variation explained.
b. Approximate Significance: The value of .000 (p < .001) suggests that the result is significant.
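The figures for this problem can be recomputed directly from the Region table. The following is a minimal sketch in plain Python rather than the SPSS procedure used in the report. It relies on the fact that for 2 degrees of freedom the chi-square upper tail is exp(-x/2), so the p-value line would need a different formula for any other table size.

```python
import math

# Observed counts from the Region table: columns are (Mac, No Mac).
observed = {
    "North East": [12, 56],
    "South West": [70, 18],
    "Mid West": [17, 18],
}

row_totals = [sum(counts) for counts in observed.values()]
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals)

# Sum of (observed - expected)^2 / expected over every cell.
chi_square = 0.0
for counts, row_total in zip(observed.values(), row_totals):
    for obs, col_total in zip(counts, col_totals):
        exp = row_total * col_total / n
        chi_square += (obs - exp) ** 2 / exp

df = (len(row_totals) - 1) * (len(col_totals) - 1)
p_value = math.exp(-chi_square / 2)   # exact upper tail only when df == 2
phi = math.sqrt(chi_square / n)

print(round(chi_square, 3), df, p_value, round(phi, 3))
```

The computed statistic and phi agree with the reported 59.049 and .556, and the p-value is far below .001, which is why SPSS displays it as .000.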
Problem 3
A marketeer wants to know the relation between the brand of smartphone people use and the brand they'd
like to use. She'll first try to establish these are related in the first place by testing the null hypothesis that
the current phone brand and the desired phone brand are independent. She collects data on 150
respondents, resulting in phone_brands.sav, part of which is shown below.
Hypothesis:
H0: The two variables are independent.
H1: The two variables are not independent (they are related).
Chi-Square Test Results:
Interpretation
The main conclusion from this graph is that smartphone users are quite loyal to brands; users of every
brand still prefer the brand they're using. The effect is strongest for HTC users. The four histograms are
far from similar; independence between current and preferred doesn't seem to hold even approximately.
We'll first look at the Crosstabulation table. Since both variables have 4 answer categories, (4 * 4 =) 16
different combinations may occur in the data. For each combination (or “cell”), the table presents the
frequency with which it occurs. We already saw a visual representation of these 16 observed frequencies
in the graph we ran earlier.
Next, we'll inspect the Chi-Square Tests table. Now, the null hypothesis of independence implies that
each cell should contain a given frequency. However, the observed frequencies often differ from such
expected frequencies. The Pearson Chi-Square test statistic basically expresses the total difference
between the 16 observed frequencies and their expected counterparts; the larger its value, the larger the
difference between the data and the null hypothesis.
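The statistic described above can be written out explicitly. The notation is introduced here for illustration and does not appear in the SPSS output: O_ij is the observed frequency in row i and column j, E_ij the expected frequency, R_i and C_j the row and column totals, and N the total number of respondents.

```latex
\[
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
df = (r - 1)(c - 1)
\]
```

For the 4 x 4 table in this problem, r = c = 4, so the sum runs over 16 cells and df = 9.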
The p-value, denoted by “Asymp. Sig. (2-tailed)”, is .000. SPSS rounds to three decimals, so this means
p < .001: there is less than a 0.1% chance of finding the observed (or a larger) degree of association
between the variables if they are perfectly independent in the population.
In the Symmetric Measures table, the value of .935 is phi, a measure of the strength of the association
rather than a percentage of variation explained. Because this table is larger than 2 x 2, phi is not
bounded by 1, and Cramer's V is the easier measure to interpret. From the reported chi-square and sample
size it works out to about .54, which still indicates a strong association. Moreover, the approximate
significance value of .000 (p < .001) suggests that the result is significant.
Therefore, we observed a strong association between the current and the preferred brands of phones, given
that the Pearson chi-square is 131.2 and the p-value is below .001.
Problem 4
An NGO wants to know whether there is any relation between people's alcohol
consumption and their smoking habits in an organization.
Solution
H0: The smoking habits of the people are independent of the drinking habits of the people i.e. the two
variables are independent.
H1: The smoking habits of the people are dependent on the drinking habits of the people i.e. the two
variables are not independent (they are related).
Data Sample
Interpretation
Cross-Tabulation
A cross-tabulation provides a basic picture of the interrelation between two variables and can help find
interactions between them. The table shows the observed count and the expected count for every
combination of categories, summarizing the sample data.
Chi Square
From the Chi-Square Tests table we can see that the significance value is .155, which is more than .05.
This means that we fail to reject the null hypothesis. That is, the data show no significant
relation/association between the smoking habits of the people and their drinking habits.
Symmetric measures
In the table depicting Symmetric measures, we can see that the significance value is .155, which is greater
than .05. Therefore we fail to reject the null hypothesis.
Problem 5
A college wants to know whether there is any relation between the marks scored by
students and the place where they stay, so that the administration can improve the
results of the college.
Solution:
H0: The marks scored by students are independent of the place where they live, i.e. the two variables are
independent.
H1: The marks scored by students are dependent on the place where they live, i.e. the two variables are not
independent (they are related).
Interpretation
Cross-Tabulation
A cross-tabulation provides a basic picture of the interrelation between two variables and can help find
interactions between them. The table shows the observed count and the expected count for every
combination of categories, summarizing the sample data.
Chi Square
From the Chi-Square Tests table we can see that the significance value is .943, which is more than .05.
This means that we fail to reject the null hypothesis. That is, the data show no significant
relation/association between the marks scored by students and where they live.
Symmetric measures
In the table depicting Symmetric measures, we can see that the significance value is .943, which is greater
than .05. Therefore we fail to reject the null hypothesis.
Problem 6
In the given data set, people have been categorized into three economic classes, namely class 1, 2 and 3.
Full body checkups were performed on them and they were checked for any disease. The purpose of
this task is to see whether the health of a person and the economic class to which he/she belongs are
related.
Solution:
H0: The two variables are independent of each other
H1: The health of a person is dependent on his/her economic class
Interpretation
Cross-Tabulation
A cross-tabulation provides a basic picture of the interrelation between two variables and can help find
interactions between them. The table shows the observed count and the expected count for every
combination of categories, summarizing the sample data.
Chi-square
From the Chi-Square Tests table we can see that the significance value is .862, which is more than .05.
This means that we fail to reject the null hypothesis. That is, the data show no significant
relation/association between the health of a person and his/her economic class.
Symmetric measures
In the table depicting Symmetric measures, we can see that the significance value is .862, which is greater
than .05. Therefore we fail to reject the null hypothesis.