Elementary Statistics Practice Test 5
Module 5
Chapter 10: Correlation and Regression
Chapter 11: Goodness of Fit and Contingency Tables
Chapter 12: Analysis of Variance
Statistics, Sample Test (Exam Review)
Module 5: Chapters 10, 11 & 12 Review
Chapter 10: Correlation & Regression
Chapter 11: Goodness-of-Fit (Multinomial Experiments) & Contingency Tables
Chapter 12: Analysis of Variance
Chapter 10: Correlation & Regression
Instructions: Read this Mini Lecture or your text, or study the tutorials online thoroughly to
be able to handle this Sample Test at the end of this lecture.
Mini Lecture:
Chapter 10: Correlation and Regression
Linear Correlation Coefficient:
$r = \dfrac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}}$
• −1 ≤ r ≤ 1
• Value of r does not change if all values of either variable are converted to a different
scale.
• The value of r is not affected by the choice of x and y. Interchanging x and y will not change the
value of r.
• r measures strength of a linear relationship.
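As a minimal sketch of the formula above, the following Python computes r for the four (x, y) pairs used in Sample Test question 1 and checks two of the listed properties (interchanging x and y, and rescaling x). The function name `pearson_r` is illustrative, not part of the lecture.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient r from the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Pairs from Sample Test question 1.
x = [1, 1, 3, 5]
y = [2, 8, 6, 4]

r = pearson_r(x, y)
r_swapped = pearson_r(y, x)                       # interchange x and y
r_rescaled = pearson_r([100 * v for v in x], y)   # change the scale of x

print(round(r, 3))  # -0.135
```

The swapped and rescaled results agree with r up to floating-point rounding, which is what the bullet points above state.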
Testing for a Linear Correlation:
Calculate the value of r (formula above) and choose the value of α.
Step 1: H0: ρ = 0, H1: ρ ≠ 0 (claim), two-tailed test (2TT)
Step 2: Method 1: Test statistic (TS): $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$
Method 2: TS: r
Step 3: Critical Value (CV), rejection region (RR) & non-rejection region (NRR).
Method 1: Use the critical t-value given, with df = n − 2
Method 2: Use the critical r-value from the Pearson Correlation Coefficient table, using n and α
Step 4: Decision
If the absolute value of TS does not exceed
the critical values:
a. Do not Reject H0
b. The claim is False
c. There is no linear correlation between
the 2 variables.
If the absolute value of TS exceeds the
critical values:
a. Reject H0
b. The claim is True
c. There is a significant linear
correlation between the 2 variables.
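The four steps can be sketched in Python as follows. This is only an illustration under stated assumptions: r = −0.135 and n = 4 are taken as given inputs, and the two critical values (4.303 for t with df = 2, and 0.950 for r with n = 4, both two-tailed at α = 0.05) are hard-coded from standard tables rather than computed.

```python
import math

def t_statistic(r, n):
    """Method 1 test statistic: t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

def exceeds_critical_value(test_stat, critical_value):
    """Two-tailed decision: reject H0 (rho = 0) when |TS| > CV."""
    return abs(test_stat) > critical_value

# Assumed inputs for illustration.
r, n = -0.135, 4
T_CRITICAL = 4.303  # assumed table value: t, two-tailed, alpha = 0.05, df = 2
R_CRITICAL = 0.950  # assumed table value: Pearson critical r, n = 4, alpha = 0.05

t = t_statistic(r, n)
reject_method1 = exceeds_critical_value(t, T_CRITICAL)
reject_method2 = exceeds_critical_value(r, R_CRITICAL)

print(round(t, 3), reject_method1, reject_method2)
```

Both methods lead to the same decision here: the test statistic does not exceed the critical value, so H0 is not rejected.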
Regression, Regression Equation
The regression equation expresses a relationship between x (called the independent variable,
predictor variable, or explanatory variable) and y (called the dependent variable or response
variable).
The typical equation of a straight line is expressed in the form y = mx + b, where b is the
y-intercept and m is the slope. (Given a collection of paired data, the regression equation
algebraically describes the relationship between the two variables.)
Note:
Population Parameter: $y = \beta_0 + \beta_1 x$
Sample Statistic: $\hat{y} = b_0 + b_1 x$
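Formulas:
$\hat{y} = b_0 + b_1 x$, or $\hat{y} = a + bx$
Slope: $b_1 = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
y-intercept: $b_0 = \bar{y} - b_1\bar{x}$, where $\bar{y} = \dfrac{\sum y}{n}$ and $\bar{x} = \dfrac{\sum x}{n}$
Also: $b_1 = b = r\,\dfrac{s_y}{s_x}$, $b_0 = a = \bar{y} - b_1\bar{x}$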
r is the linear correlation coefficient
sy is the standard deviation of the sample y values
sx is the standard deviation of the sample x values.
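A short Python sketch of both slope formulas, using the four pairs from Sample Test question 1 as illustrative data. It computes the slope from the sums, then again as r times s_y over s_x, so the two can be compared. Variable names are illustrative.

```python
import math
import statistics

# Pairs from Sample Test question 1.
x = [1, 1, 3, 5]
y = [2, 8, 6, 4]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Slope and y-intercept from the computational formulas.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * (sum_x / n)

# Alternative slope: b1 = r * s_y / s_x (sample standard deviations).
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
b1_alt = r * statistics.stdev(y) / statistics.stdev(x)

print(round(b0, 3), round(b1, 3))  # 5.455 -0.182
```

The two slope values agree up to floating-point rounding, which is a quick check on hand calculations.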
Regression Line
The graph of the regression equation is called the regression line (or line of best fit, or
least-squares line). Best fit means that the sum of the squares of the vertical distances
(residuals) from each point to the line is at a minimum.
If there is not a significant linear correlation, the best predicted y-value is $\bar{y}$.
If there is a significant linear correlation, the best predicted y-value is found by
substituting the x-value into the regression equation.
Residual: Observed value − Predicted value: $y - \hat{y}$
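The prediction rule and the residuals can be illustrated with a small Python sketch. It again uses the four pairs from Sample Test question 1; the helper names `fit_line` and `best_predicted_y` are illustrative. For these pairs |r| is about 0.135, far below the critical value, so the correlation is treated as not significant.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

def best_predicted_y(x_value, b0, b1, y_mean, significant):
    """Use the regression equation only if the correlation is significant;
    otherwise the best prediction is the mean of the y values."""
    return b0 + b1 * x_value if significant else y_mean

x = [1, 1, 3, 5]
y = [2, 8, 6, 4]
b0, b1 = fit_line(x, y)
y_mean = sum(y) / len(y)

prediction = best_predicted_y(2, b0, b1, y_mean, significant=False)
line_value = best_predicted_y(2, b0, b1, y_mean, significant=True)

# Residual = observed y minus predicted y on the regression line.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(prediction, round(line_value, 3), round(sum(residuals), 6))
```

For a least-squares line the residuals sum to zero (up to rounding), which the last printed value shows.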
Sample Test:
Chapter 10: Correlation & Regression
1. Given the sample data:
Data
X 1 1 3 5
Y 2 8 6 4
a. Find the value of the linear correlation coefficient r.
b. Test the claim that there is a linear correlation between the two variables x and y. Use both
Method 1 and Method 2. (α = 0.05)
c. Find the regression equation.
d. Find the best predicted value of y, when x is equal to 2.
2. A researcher is studying the effects of Testosterone injections on different lab mice. She
controls the concentration of Testosterone injected into the mice and measures their
metabolic rates. Analyze the following dataset. Will you be running a correlation or a
regression?
a. Find the value of the linear correlation coefficient r, and test the claim that there is a
linear correlation between the two variables x and y. (α = 0.05)
b. Find the regression equation.
c. Find the best predicted value of y, metabolic rates, when x is equal to 1.3, the
concentration of Testosterone.
d. Using the above pairs and the value of r, what proportion of the variation in metabolic
rates can be explained by the variation in the concentration of Testosterone?
Testosterone: x Metabolic Rate: y
0.6 500
0.9 556
1.02 578
1.06 581
1.08 590
1.11 605
1.16 612
1.2 624
1.24 639
1.27 640
1.32 643
1.37 650
1.39 658
1.43 661
1.46 670
1.49 672
1.5 673
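One possible way to work through parts a to d with technology is sketched below in Python. It is an illustration, not the official solution: the critical t-value 2.131 (two-tailed, α = 0.05, df = 15) is an assumed table value that is hard-coded, and all variable names are illustrative.

```python
import math

# Data copied from the table above.
x = [0.6, 0.9, 1.02, 1.06, 1.08, 1.11, 1.16, 1.2, 1.24, 1.27,
     1.32, 1.37, 1.39, 1.43, 1.46, 1.49, 1.5]
y = [500, 556, 578, 581, 590, 605, 612, 624, 639, 640,
     643, 650, 658, 661, 670, 672, 673]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# a. Correlation coefficient and Method 1 test statistic.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
t = r / math.sqrt((1 - r ** 2) / (n - 2))
T_CRITICAL = 2.131  # assumed table value: two-tailed, alpha = 0.05, df = 15
significant = abs(t) > T_CRITICAL

# b. Regression equation y-hat = b0 + b1 * x.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# c. Best predicted metabolic rate at concentration 1.3.
y_pred = b0 + b1 * 1.3 if significant else sum_y / n

# d. Proportion of variation in y explained by the linear relationship.
r_squared = r ** 2

print(f"r = {r:.3f}, t = {t:.2f}, significant = {significant}")
print(f"y-hat = {b0:.1f} + {b1:.1f}x, prediction at 1.3 = {y_pred:.1f}")
print(f"r^2 = {r_squared:.3f}")
```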
Chapter 11: Goodness-of-Fit (Multinomial Experiments) and Contingency Tables
Instructions: Read this Mini Lecture or your text, or study the tutorials online thoroughly to
be able to handle this Sample Test at the end of this lecture.
Mini Lecture:
Chapter 11: Multinomial Experiments & Contingency Tables
Definition:
A Multinomial Experiment is an experiment that meets the following conditions:
1. The number of trials is fixed.
2. The trials are independent.
3. All outcomes of each trial must be classified into exactly one of several different categories.
4. The probabilities for the different categories remain constant for each trial.
11.1: Goodness-of-fit Test in Multinomial Experiments:
Definition
A goodness-of-fit test is used to test the hypothesis that an observed frequency distribution fits
(or conforms to) some claimed distribution.
O represents the observed frequency of an outcome
E represents the expected frequency of an outcome
k represents the number of different categories or outcomes
n represents the total number of trials
If all expected frequencies are equal, then $E = \dfrac{n}{k}$, the sum of all observed frequencies
divided by the number of categories.
If the expected frequencies are not all equal, then $E = np$: each expected frequency is found by
multiplying the sum of all observed frequencies by the probability for the category.
Test Statistic: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
Critical Values
Found in Table A-4 using k – 1 degrees of freedom where k = number of categories
Goodness-of-fit hypothesis tests are always right-tailed.
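The goodness-of-fit calculation can be sketched in Python as below, using the observed frequencies 5, 6, 8, and 13 from Sample Test question 1 of this chapter. The function name is illustrative, and the critical value 7.815 (df = 3, α = 0.05, right tail) is an assumed value hard-coded from a chi-square table.

```python
def chi_square_gof(observed, probabilities=None):
    """Return (chi-square statistic, expected frequencies, degrees of freedom).

    If probabilities is None, all categories are assumed equally likely (E = n/k);
    otherwise E = n * p for each category.
    """
    n = sum(observed)
    k = len(observed)
    if probabilities is None:
        expected = [n / k] * k
    else:
        expected = [n * p for p in probabilities]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected, k - 1

observed = [5, 6, 8, 13]
chi2, expected, df = chi_square_gof(observed)

CRITICAL_VALUE = 7.815  # assumed table value: df = 3, alpha = 0.05
reject_h0 = chi2 > CRITICAL_VALUE  # right-tailed test

print(chi2, expected, df, reject_h0)  # 4.75 [8.0, 8.0, 8.0, 8.0] 3 False
```

Passing a list such as `probabilities=[0.1, 0.2, 0.3, 0.4]` covers the case where the expected frequencies are not all equal.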
11.2: Contingency Tables: Independence and Homogeneity
Contingency Table (or two-way frequency table)
Definition
A contingency table is a table in which frequencies correspond to two variables.
(One variable is used to categorize rows, and a second variable is used to categorize columns.)
Contingency tables have at least two rows and at least two columns.
Test of Independence
This method tests the null hypothesis that the row variable and column variable in a contingency
table are not related. (The null hypothesis is the statement that the row and column variables are
independent.)
Assumptions
The sample data are randomly selected.
The null hypothesis H0 is the statement that the row and column variables are independent; the
alternative hypothesis H1 is the statement that the row and column variables are dependent.
For every cell in the contingency table, the expected frequency E is at least 5. (There is no
requirement that every observed frequency must be at least 5.)
Test of Independence
Test Statistic: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
Critical Values
1. Found in Table A-4 using: degrees of freedom = (r – 1)(c – 1)
r is the number of rows and c is the number of columns
2. Tests of Independence are always right-tailed.
Expected Frequency for Contingency Tables
$E = np = (\text{grand total}) \cdot \dfrac{\text{row total}}{\text{grand total}} \cdot \dfrac{\text{column total}}{\text{grand total}}$
$E = \dfrac{(\text{row total})(\text{column total})}{\text{grand total}}$
Grand Total = Total number of all observed frequencies in the table
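As a hedged illustration of the expected-frequency formula and the test statistic, the Python sketch below uses the 2 × 2 pollster table from Sample Test question 4 of this chapter. Function names are illustrative, and the critical value 3.841 (df = 1, α = 0.05) is an assumed value hard-coded from a chi-square table.

```python
def expected_frequencies(table):
    """E = (row total)(column total) / (grand total) for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: men who agree, men who disagree.
# Columns: interviewer is a man, interviewer is a woman.
observed = [[560, 308],
            [240, 92]]

expected = expected_frequencies(observed)
chi2, df = chi_square_independence(observed)

CRITICAL_VALUE = 3.841  # assumed table value: df = 1, alpha = 0.05
reject_h0 = chi2 > CRITICAL_VALUE  # right-tailed test

print(round(chi2, 3), df, reject_h0)  # 6.529 1 True
```

Every expected frequency here is well above 5, so the requirement listed under Assumptions is met for this table.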
Test of Homogeneity
In a test of homogeneity, we test the claim that different populations have the same proportions
of some characteristic.
How to distinguish between a test of homogeneity and a test of independence:
Were predetermined sample sizes used for different populations that are similar in nature
(test of homogeneity), or was one big sample drawn so that both row and column totals were
determined randomly (test of independence)?
Sample Test:
Chapter 11: Goodness-of-Fit (Multinomial Experiments) and Contingency Tables
1. Here are the observed frequencies from four categories: 5, 6, 8, and 13. At 0.05 significance
level, test the claim that the four categories are all equally likely.
a. State the null and alternative hypothesis.
b. What is the expected frequency for each of the four categories?
c. What is the value of the test statistic?
d. Find the critical value(s).
e. Make a decision
2. A professor asked 40 of his students to identify the tire they would select as the flat tire on a
car carrying 4 students who missed a test (an excuse). The following table summarizes the results.
Use a 0.05 significance level to test the claim that all 4 tires have equal proportions of being
claimed as flat.
Tire Left Front Right Front Left Rear Right Rear
Number selected 11 15 8 6
3. Using a 0.05 significance level, test the claim that when the Titanic sank, whether someone
survived or died is independent of whether that person is a man, woman, boy, or girl.
4. Using the following table, with a 0.05 significance level, test the effect of pollster gender on
survey responses by men.
                    Gender of Interviewer
                    Men     Women
Men Who Agree       560     308
Men Who Disagree    240     92
5. Test of Homogeneity: Using the following table, with a 0.05 significance level, test the
association between people living in a city and becoming infected with a highly resistant
bacterium.
                                     Living Location
                                     City      Outside City
Bacterial Condition   Infected       4239      5923
                      Not Infected   12900     18986
Chapter 12: Analysis of Variance
Instructions: Read this Mini Lecture or your text, or study the tutorials online thoroughly to
be able to handle this Sample Test at the end of this lecture.
Mini Lecture:
Chapter 12: Analysis of Variance (ANOVA)
12-2: One-Way ANOVA
Analysis of variance (ANOVA) is a method for testing the hypothesis that three or more
population means are equal.
For example:
H0: µ1 = µ2 = µ3 = . . . = µk
H1: At least one mean is different
ANOVA methods require the F-distribution:
1. The F-distribution is not symmetric; it is skewed to the right.
2. The values of F can be 0 or positive; they cannot be negative.
3. There is a different F-distribution for each pair of degrees of freedom for the numerator and
denominator. Critical values of F are given in Table A-5.
An Approach to Understanding ANOVA
1. Understand that a small P-value (such as 0.05 or less) leads to the rejection of the null
hypothesis of equal means. With a large P-value (such as greater than 0.05), fail to reject the
null hypothesis of equal means.
2. Develop an understanding of the underlying rationale by studying the example in this section.
3. Become acquainted with the nature of the SS (sum of squares) and MS (mean square) values
and their role in determining the F test statistic, but use statistical software packages or a
calculator for finding those values.
Definition: Treatment (or factor)
A treatment (or factor) is a property or characteristic that allows us to distinguish the different
populations from one another. Use technology for ANOVA calculations if possible.
Assumptions
1. The populations have approximately normal distributions.
2. The populations have the same variance σ² (or standard deviation σ).
3. The samples are simple random samples.
4. The samples are independent of each other.
5. The different samples are from populations that are categorized in only one way.
Procedure for testing:
H0: µ1 = µ2 = µ3 = . . .
1. Use Technology to obtain results.
2. Identify the P-value from the display.
3. Form a conclusion based on these criteria:
If P-value ≤ α, reject the null hypothesis of equal means.
If P-value > α, fail to reject the null hypothesis of equal means.
Example:
A researcher wishes to try three different techniques to lower blood pressure of individuals
diagnosed with high blood pressure. The subjects are randomly assigned to the three groups; the
first takes medication, the second exercises and the third follows a certain diet. After 4 weeks the
reduction in each person’s blood pressure is recorded. α = 0.05
The data:
Medication   Exercise   Diet
10           6          5
12           8          9
9            3          12
15           0          8
13           2          4
Solution:
Step 1: H0: μ1 = μ2 = μ3 (claim)
H1: At least one mean is different from the others. (RTT)
Step 2: Test statistic:
Find the Grand Mean, the mean of all values in the samples.
$\bar{x}_{GM} = \dfrac{\sum x}{N} = \dfrac{10 + 6 + 5 + \cdots + 4}{5 + 5 + 5} = 7.73$
Find the Between-group variance (Mean Square Between, MSB).
SSB = 5(11.8 − 7.73)² + 5(3.8 − 7.73)² + 5(7.6 − 7.73)² = 160.13
$MSB = s_B^2 = \dfrac{SSB}{k - 1} = \dfrac{\sum n_i(\bar{X}_i - \bar{X}_{GM})^2}{k - 1} = \dfrac{5(11.8 - 7.73)^2 + 5(3.8 - 7.73)^2 + 5(7.6 - 7.73)^2}{3 - 1} = \dfrac{160.13}{2} = 80.07$
Find the Within-group variance (error; the mean of the sample variances, or Mean Square Within, MSW).
SSW = (5 − 1)5.7 + (5 − 1)10.2 + (5 − 1)10.3 = 104.8
MSW = 104.8/(15 − 3) = 8.73
Or:
$MSW = s_W^2 = s_p^2 = \dfrac{SSW}{N - k} = \dfrac{\sum (n_i - 1)s_i^2}{\sum (n_i - 1)} = \dfrac{4(5.7) + 4(10.2) + 4(10.3)}{4 + 4 + 4} = \dfrac{104.80}{12} = 8.73$
$F = \dfrac{MSB}{MSW} = \dfrac{s_B^2}{s_W^2} = \dfrac{80.07}{8.73} = 9.17$
Source           Sum of Squares   d.f.         Mean Squares   F
Between          SSB = 160.13     k − 1 = 2    MSB = 80.07    9.17
Within (error)   SSW = 104.80     N − k = 12   MSW = 8.73
Total            264.93           14
Step 3: Find the critical value.
Number of Groups (Factors): k = 3, d.f.N. = k – 1 = 3 – 1 = 2,
Total sample size: N = 15, d.f.D. = N – k = 15 – 3 = 12
α = 0.05 → CV: 𝑭 = 3.8853
Step 4: Decision:
a. Reject H0
b. The claim is False
c. There is sufficient evidence to reject the claim that there is no difference among the means;
we conclude that at least one mean is different from the others.
[Figure: right-tailed F-distribution with rejection region of area 0.05; CV: F = 3.8853, TS: F = 9.17]
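The hand calculation in this example can be checked with a short Python sketch. Two assumptions are worth stating: the fifth observation in each group (13, 2, and 4) is inferred from the group means 11.8, 3.8, and 7.6 with n = 5 used in the solution, and the critical value 3.8853 is taken from Step 3 rather than computed. The function name `one_way_anova` is illustrative.

```python
import statistics

def one_way_anova(groups):
    """Return the ANOVA table quantities for a list of samples."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    # Between-group sum of squares: sum of n_i * (mean_i - grand mean)^2
    ssb = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: sum of (n_i - 1) * s_i^2
    ssw = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    msb = ssb / (k - 1)
    msw = ssw / (N - k)
    return {"SSB": ssb, "SSW": ssw, "MSB": msb, "MSW": msw,
            "F": msb / msw, "df": (k - 1, N - k)}

# Fifth value in each group is inferred from the stated group means (assumption).
medication = [10, 12, 9, 15, 13]
exercise = [6, 8, 3, 0, 2]
diet = [5, 9, 12, 8, 4]

result = one_way_anova([medication, exercise, diet])

CRITICAL_VALUE = 3.8853  # from Step 3: alpha = 0.05, d.f.N. = 2, d.f.D. = 12
reject_h0 = result["F"] > CRITICAL_VALUE

print(round(result["SSB"], 2), round(result["SSW"], 2),
      round(result["F"], 2), result["df"], reject_h0)
```

The printed F agrees with the 9.17 found by hand, up to rounding of the intermediate values.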
Sample Test:
Chapter 12: Analysis of Variance
1. Weights in kg of Cypress trees were obtained from trees planted in a sandy and dry
area. They were given different treatments as indicated. Fill in the table below and use
a 0.05 significance level to test the claim that the four treatment categories yield Cypress
trees with the same mean weight. $\bar{x}_{GM} = \dfrac{\sum x}{N}$
No Treatment   Fertilizer   Irrigation   Fertilizer & Irrigation
0.24           0.92         0.96         1.07
1.69           0.07         1.43         1.63
1.23           0.56         1.26         1.39
0.99           1.74         1.57         0.49
1.8            1.13         0.75         0.95
2. (One-Way ANOVA) Given the readability scores summarized in the following table and a
significance level of α = 0.05, use technology to test the claim that the three samples come
from populations with means that are not all the same.
      Clancy   Rowling   Tolstoy
n     12       12        12
x̄     70.73    80.75     66.15
s     11.33    4.68      7.86
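When only summary statistics are given, the F test statistic can still be computed from the same SSB and SSW formulas. The Python sketch below is one possible approach, not the required method; the function name is illustrative, and the critical value or P-value still has to come from Table A-5 or from technology.

```python
def anova_from_summary(ns, means, sds):
    """F statistic and degrees of freedom from group sizes, means, and
    sample standard deviations."""
    k = len(ns)
    N = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / N
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ssw = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    msb = ssb / (k - 1)
    msw = ssw / (N - k)
    return msb / msw, (k - 1, N - k)

# Clancy, Rowling, Tolstoy (values from the table above).
ns = [12, 12, 12]
means = [70.73, 80.75, 66.15]
sds = [11.33, 4.68, 7.86]

F, df = anova_from_summary(ns, means, sds)
print(round(F, 2), df)
```

Because the means and standard deviations in the table are rounded, the F value obtained this way can differ slightly from one computed on the raw scores.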