Statistics 1 (FPN) QP

Statistics 1 (FPN) – Question Pool
Welcome to Success Formula Question Pool
Disclaimers
• All slides and their materials are the property of Success Formula
• You get exclusive, free, personal access when you buy the course the slides are made for
• The slides are individually marked, and Success Formula can track to which users they belong
• No part of this slide deck may be reproduced, distributed, or transmitted (hereafter on this slide
referred to together as “Shared”) in any form or by any means, including sharing the material on
platforms such as StudyDrive
• If slides are shared, Success Formula may pursue legal action against the sharing party in line
with European and Dutch law (copyright law)
Error Bounty
• If you find any mistake in this slide deck, let us know and we will refund you the cost of the slides
• Only the first person indicating the mistake gets the refund
Answers
Question
Some people seem to like Breaking Bad, others like Prison Break. What is the percentage of people that
watch TV?
A. The Walking Dead
B. Depends on the year
C. All of them
D. Answer D because it is the best answer
Answer: C
Introduction question
(Example slide legend: question topic, the question, difficulty, answers, correct answer)
Significance level
*** Always use a significance level of 0.05 unless otherwise specified ***
Stats1 – Question Pool
Probability Theory
Answers
Question
Florian wants to show Julian a new magic trick. As part of the trick, Julian has to pull a card out of a 52-card
deck 3 times in a row, each time keeping the card (without putting it back) before pulling the next one. There
are 26 red cards and 26 black cards.
Which statement is incorrect?
A. The probability that out of the three chosen cards, there is at least one red card or at least one black
card is equal to 1
B. The outcome of the 2nd trial will influence the outcome of the 3rd trial
C. The probability of picking a queen of hearts equals the probability of picking a queen of hearts
given that in the previous trial Julian picked a 7 of spades
D. The sample space is all the possible combinations of cards that can be drawn in a sample of 3
Answer: C
1. Probability Theory
1E. Probability Theory
Question
Florian wants to show Julian a new magic trick. As part of the trick, Julian has to pull a card out of a 52-card
deck 3 times in a row, each time keeping the card (without putting it back) before pulling the next one. There
are 26 red cards and 26 black cards. Which statement is incorrect?
Solution
A. Correct. Every card in the deck is either red or black, so whichever 3 cards Julian picks, at least one of
them is red or black. The event is certain, meaning that its probability is equal to 1
B. Correct. Every time Julian picks a card, he does not put it back, meaning that each outcome of every
trial will influence the next one (the events become dependent)
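C. Incorrect. P(QH) = P(QH/7S) → That would only be correct if the events were independent, in other
words, if Julian put his chosen card back in the deck after every trial. Without replacement, P(QH) = 1/52
for any single draw, but once the 7 of spades has been drawn only 51 cards remain, so P(QH/7S) = 1/51.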
D. Correct. Julian picks 3 cards in total so any possible combination that he can make with 3 cards is
included in the sample space
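The dependence described in B and C can be checked by brute force. The Python sketch below is an illustration added to this solution, not part of the original material: it enumerates every equally likely ordered pair of first and second draws without replacement and compares P(QH) on the second draw with P(QH/7S). The (rank, suit) card encoding is an assumption made for the example.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical encoding: each card is a (rank, suit) pair
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "spades", "clubs"]
deck = [(r, s) for r in ranks for s in suits]

# All ordered outcomes of the first two draws without replacement (52 x 51, equally likely)
pairs = list(permutations(deck, 2))

queen_hearts = ("Q", "hearts")
seven_spades = ("7", "spades")

# Marginal: P(second card is the queen of hearts)
p_qh = Fraction(sum(1 for first, second in pairs if second == queen_hearts), len(pairs))

# Conditional: restrict the sample space to outcomes where the first card is the 7 of spades
given = [(f, s) for f, s in pairs if f == seven_spades]
p_qh_given_7s = Fraction(sum(1 for _, s in given if s == queen_hearts), len(given))

print(p_qh, p_qh_given_7s)
```

The two fractions differ (1/52 versus 1/51), which is exactly why statement C fails when cards are not put back.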
Answers
Question
Suppose that 2 dice are rolled at the same time. Calculate the following probabilities:
• P(A): The sum of the two numbers is equal to 1
• P(B): The sum of the two numbers is equal to 5
• P(C): The sum of the two numbers is less than 13
A. P(A) = 0.5, P(B) = 0.23, P(C) = 0
B. P(A) = 0, P(B) = 0.111, P(C) = 1
C. P(A) = 1, P(B) = 0.12, P(C) = 0
D. The probabilities cannot be calculated
Answer: B
2. Probability Theory
2E. Probability Theory
Question
Suppose that 2 dice are rolled at the same time.
Calculate the following probabilities:
• P(A): The sum of the two numbers is
equal to 1
• P(B): The sum of the two numbers is
equal to 5
• P(C): The sum of the two numbers is less
than 13
Sample Space:
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
Solution
No possible combination resulting from rolling 2
dice at the same time can give us a sum equal to 1
since dice do not have the number 0.
• The smallest sum we can find is equal to 2,
resulting from the combination (1,1)
• P(A) = 0
To calculate P(B), we need to identify from our
sample space the combinations that yield a sum
of 5. In this case, we have 4 combinations:
(1,4), (2,3), (3,2) and (4,1).
• We can use the general formula
• P(E) = number of outcomes in E / total number of outcomes
• P(B) = 4/36 = 1/9 = 0.111
We can observe that the combination resulting in
the largest sum is the (6,6) with a sum of 12.
• This means that all possible combinations will
yield a sum lower than 13
• P(C) is the probability of the entire sample
space
• P(C) = 1
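The same three answers can be reproduced by listing the sample space and applying the classical formula P(E) = number of outcomes in E / total. A minimal Python sketch, added here for illustration; the helper name `prob` is hypothetical:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes (first die, second die)
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

p_a = prob(lambda o: sum(o) == 1)   # impossible event
p_b = prob(lambda o: sum(o) == 5)   # (1,4), (2,3), (3,2), (4,1)
p_c = prob(lambda o: sum(o) < 13)   # every outcome qualifies
print(p_a, p_b, p_c, float(p_b))
```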
Answers
Question
An experiment has four mutually exclusive outcomes, A, B, C, and D. If P(A) = 0.33, P(B) = 0.17, P(C) =
0.43, P(D) = 0.07, which of the following statements must be true?
A. All of the events are independent of each other
B. The marginal probability of A equals the conditional probability of A given D
C. The joint probability of C and B is equal to 0
D. None of the alternatives is correct
Answer: C
3. Probability Theory
3E. Probability Theory
Question
An experiment has four mutually exclusive outcomes, A, B, C, and D. If P(A) = 0.33, P(B) = 0.17, P(C) =
0.43, P(D) = 0.07, which of the following statements must be true?
Solution
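A. Incorrect. Given that all of our 4 events are mutually exclusive, they cannot happen at the same
time. Knowing that one of them occurred tells us that the others did not, so the events must be
dependent on each other.
B. Incorrect. The marginal probability equals the conditional probability only when the 2 events are
independent of one another [P(A) = P(A/D)]. Here A and D are mutually exclusive, so P(A/D) = 0, while
P(A) = 0.33.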
C. Correct. Our events are mutually exclusive, meaning that they cannot happen at the same time.
[P(C AND B) = 0]
D. Incorrect. C is the correct statement.
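The contrast between "mutually exclusive" and "independent" can be made concrete with the numbers from the question. In this small Python sketch (an added illustration; the helper names `joint` and `conditional` are hypothetical), the joint probability of two different outcomes is 0, while independence would require it to equal the product of the marginals:

```python
# Probabilities of the four mutually exclusive outcomes, as given in the question
p = {"A": 0.33, "B": 0.17, "C": 0.43, "D": 0.07}

def joint(x, y):
    """P(x AND y): two different mutually exclusive outcomes never occur together."""
    return p[x] if x == y else 0.0

def conditional(x, given):
    """P(x / given) = P(x AND given) / P(given)."""
    return joint(x, given) / p[given]

print(joint("C", "B"))                   # statement C: the joint probability is 0
print(p["A"], conditional("A", "D"))     # statement B: marginal 0.33 vs conditional 0
print(joint("A", "D"), p["A"] * p["D"])  # 0 vs about 0.0231, so A and D are not independent
```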
Answers
Question
Suppose we conduct a random experiment and two events, A and B, are independent. Which of the
following rules can we use to prove the relationship between A and B?
A. P(A and B) = 0
B. P(A and B) = P(A) x P(B/A)
C. P(A or B) = P(A) + P(B) – P(A and B)
D. P(A)=P(A/B)
Answer: D
4. Probability Theory
4E. Probability Theory
Question
Suppose we conduct a random experiment and two events, A and B, are independent. Which of the
following rules can we use to prove the relationship between A and B?
Solution
A. Incorrect. P(A and B) = 0 is the rule for spotting disjoint events. It shows that the two events cannot
happen at the same time.
B. Incorrect. P(A and B) = P(A) x P(B/A) is the general multiplication rule
C. Incorrect. P(A or B) = P(A) + P(B) – P(A and B) is the general addition rule
D. Correct. P(A) = P(A/B) is a rule for spotting independent events, showing that the probability of
event A is not influenced by the occurrence of event B
Answers
Question
A recent survey showed that 45% of Success Formula students prefer to visit Tapijn park to relax after a
long day of studying. Also, 27% of UM students both like to go to Tapijn park and the city center to
relax. Finally, the survey showed that 40% of students said that they don’t visit the city center for some
time off. Based on the above data, determine the following probabilities:
a. P(A): the probability that a randomly selected UM student visits Tapijn given that he/she also
visits the city center
b. P(B): the probability that a randomly selected UM student visits Tapijn or visits the city center
A. P(A) = 0.45, P(B) = 0.27
B. P(A) = 0.88, P(B) = 0
C. P(A) = 0.18, P(B) = 0.85
D. P(A) = 0.45, P(B) = 0.78
Answer: D
5. Probability Theory
5E. Probability Theory
Question
A recent survey showed that 45% of Success
Formula students prefer to visit Tapijn park to
relax after a long day of studying. Also, 27% of
UM students both like to go to Tapijn park and
the city center to relax. Finally, the survey
showed that 40% of students said that they don’t
visit the city center for some time off. Based on
the above data, determine the following
probabilities:
a. P(A): the probability that a randomly
selected UM student visits Tapijn given
that he/she also visits the city center
b. P(B): the probability that a randomly
selected UM student visits Tapijn or
visits the city center
P(Tapijn) = 0.45
P(Tapijn AND City) = 0.27
P(not City) = 0.4
P(City) = 1 − P(not City)
P(City) = 1 − 0.4 = 0.6
Solution
➢ For P(A) we are looking for P(Tapijn/City)
➢ We can first check whether these 2 events are
independent
• P(Tapijn AND City) = P(Tapijn) × P(City) → rule
for spotting independence
• 0.27 = 0.45 × 0.6
• 0.27 = 0.27 → visiting Tapijn and visiting the
city center are independent events
• P(Tapijn/City) = P(Tapijn)
• P(A) = 0.45
➢ For P(B) we want P(Tapijn OR City)
➢ The joint probability of these events is not
equal to 0, thus the events are non-disjoint
➢ We can use the general addition rule
• P(B) = P(Tapijn) + P(City) − P(Tapijn AND City)
• P(B) = 0.45 + 0.6 – 0.27
• P(B) = 0.78
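The steps above can be sketched in Python as a quick check (an added illustration, not the course's method). Note that computing P(Tapijn/City) directly from the definition, P(Tapijn AND City) / P(City), gives the same 0.45 without relying on the independence shortcut:

```python
import math

p_tapijn = 0.45
p_both = 0.27               # P(Tapijn AND City)
p_not_city = 0.40
p_city = 1 - p_not_city     # complement rule: 0.6

# Independence check P(Tapijn AND City) == P(Tapijn) x P(City);
# math.isclose avoids false negatives from floating-point rounding
independent = math.isclose(p_both, p_tapijn * p_city)

p_a = p_both / p_city              # P(Tapijn / City) from the definition of conditional probability
p_b = p_tapijn + p_city - p_both   # P(Tapijn OR City), general addition rule
print(independent, round(p_a, 2), round(p_b, 2))
```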
Answers
Question
Suppose one runs a random experiment with 3 events (A, B, C). Events A and B are disjoint; C is
independent of A and dependent on B. P(B) = 0.3, P(C/B) = 0.135, P(C/A) = 0.48, P(C and A) = 0.16.
Calculate the following probabilities:
a. P(C)
b. P(A and B)
c. P(B or C)
d. P(A or B)
A. P(C) = 0.48, P(A and B) = 0, P(B or C) = 0.74, P(A or B) = 0.63
B. P(C) = 0.48, P(A and B) = 0.0405, P(B or C) = 0.78, P(A or B) = 0
C. P(C) = 0.48, P(A and B) = 0, P(B or C) = 0.63, P(A or B) = 0.74
D. P(C) = 0.48, P(A and B) = 0.73, P(B or C) = 0.86, P(A or B) = 0.63
Answer: A
6. Probability Theory
6E. Probability Theory
Question
Suppose one runs a random experiment with 3
events (A, B, C). Events A and B are disjoint; C is
independent of A and dependent on B. P(B) =
0.3, P(C/B) = 0.135, P(C/A) = 0.48, P(C and A) =
0.16. Calculate the following probabilities:
a. P(C)
b. P(A and B)
c. P(B or C)
d. P(A or B)
Graph: Venn diagram of events A, B and C (A and B do not overlap; C overlaps both)
Solution
Since events A and C are independent we can say:
• P(C) = P(C/A)
• P(C) = 0.48
We know that events A and B are disjoint and we
also see that there is no intersection in the graph:
• P(A and B) = 0
P(B or C) = P(B) + P(C) – P(B and C)
• We do not have P(B and C) but we can find it
using the multiplication rule
• P(B and C) = P(B) x P(C/B) = 0.3 X 0.135 =
0.0405
• P(B or C) = 0.3 + 0.48 − 0.0405 = 0.7395 ≈ 0.74
Since A and B are disjoint events we will use the
special form of the formula:
• P(A or B) = P(A) +P(B)
• We can calculate P(A) using the
multiplication rule
• P(C and A) = P(A) x P(C) (special multiplication
rule, since A and C are independent)
• → P(A) = 0.16/0.48 ≈ 0.33
P(A or B) = 0.33 + 0.3 = 0.63
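The chain of rules used above can be written out as a short Python sketch (added for illustration; variable names are hypothetical). Each line applies one rule from the solution, and the unrounded results are 0.7395 for P(B or C) and about 0.6333 for P(A or B), which round to the 0.74 and 0.63 in answer A:

```python
p_b = 0.3
p_c_given_b = 0.135
p_c_given_a = 0.48
p_c_and_a = 0.16

p_c = p_c_given_a                 # C independent of A, so P(C) = P(C/A)
p_a_and_b = 0.0                   # A and B are disjoint
p_b_and_c = p_b * p_c_given_b     # general multiplication rule
p_b_or_c = p_b + p_c - p_b_and_c  # general addition rule
p_a = p_c_and_a / p_c             # from P(C and A) = P(A) x P(C)
p_a_or_b = p_a + p_b              # special addition rule for disjoint events
print(p_c, p_a_and_b, round(p_b_or_c, 4), round(p_a_or_b, 4))
```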
Answers
Question
Remco decides to investigate which Dutch delicacy is most preferred by students in Maastricht. He
writes down his results in the following table. Calculate the following probabilities:
1. The probability that we randomly select a student who likes fries, given that they are a male
2. The probability that we randomly select a student who is a female, given they like fries
3. The probability that the student likes bitterballen
A. P(1) = 66.67%, P(2) = 34.78%, P(3) = 32.5%
B. P(1) = 20%, P(2) = 66.67%, P(3) = 17.5%
C. P(1) = 34.78%, P(2) = 33.33%, P(3) = 32.5%
D. P(1) = 34.78%, P(2) = 23.52%, P(3) = 17.5%
Answer: C
7. Probability Theory
Fries Bitterballen Stroopwaffles Total
Male 40 35 40 115
Female 20 30 35 85
Total 60 65 75 200
7E. Probability Theory
Question
Remco decides to investigate which Dutch
delicacy is most preferred by students in
Maastricht. He writes down his results in the
following table. Calculate the following
probabilities:
1. The probability that we randomly select a
student who likes fries, given that they are a
male
2. The probability that we randomly select a
student who is a female, given they like fries
3. The probability that the student likes
bitterballen
Solution
P(1) = P(Fries/Male)
• It is a conditional probability, so we are not
working within the entire sample space
• The condition indicates the denominator
• P(1) = 40/115 = 34.78%
P(2) = P(Female/Fries)
• P(2) = 20/60 = 33.33%
P(3) = P(Bitterballen)
• It is the marginal probability within the
entire sample space
• P(3) = 65/200 = 32.5%
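The same three probabilities can be read off the contingency table programmatically. The Python sketch below is an added illustration (the dictionary layout and helper names are assumptions); it computes the totals from the cells instead of typing them in, which also confirms that the stroopwaffles column adds up to 75:

```python
from fractions import Fraction

# Counts from Remco's table (rows: gender, columns: delicacy)
counts = {
    "Male":   {"Fries": 40, "Bitterballen": 35, "Stroopwaffles": 40},
    "Female": {"Fries": 20, "Bitterballen": 30, "Stroopwaffles": 35},
}

def row_total(gender):
    return sum(counts[gender].values())

def col_total(food):
    return sum(row[food] for row in counts.values())

grand_total = sum(row_total(g) for g in counts)

p1 = Fraction(counts["Male"]["Fries"], row_total("Male"))     # P(Fries / Male): condition = denominator
p2 = Fraction(counts["Female"]["Fries"], col_total("Fries"))  # P(Female / Fries)
p3 = Fraction(col_total("Bitterballen"), grand_total)         # P(Bitterballen): whole sample space
print(f"{float(p1):.2%} {float(p2):.2%} {float(p3):.2%}")
```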
Answers
Question
Refer to the table from the previous question. Which of the following statements is correct:
A. The probability P(Bitterballen/Female) is not evaluated across the entire sample space
B. The events of picking randomly someone that is a female and of picking randomly someone who
likes stroopwaffles are disjoint
C. The marginal probability of P(Fries) is equal to the conditional probability of P(Fries/Male)
D. The events of randomly picking a male and randomly picking someone that likes stroopwaffles are
independent
Answer: A
8. Probability Theory
8E. Probability Theory
Question
Refer to the table from the previous question.
Which of the following statements is correct:
Solution
A. Correct. P(Bitterballen/Female) is not
evaluated across the entire sample space.
Conditional probabilities are evaluated across
a subset of the entire sample space, in this
case across the subset of females.
B. Incorrect. We can see from the table that
there are females who prefer stroopwaffles
(n = 35), so these 2 events can happen at the
same time (not disjoint)
C. Incorrect. P(Fries) ≠ P(Fries/Male)
P(Fries) = 60/200 = 0.3
P(Fries/Male) = 40/115 = 0.35
D. Incorrect. P(Male) ≠ P(Male/Stroopwaffles)
P(Male) = 115/200 = 0.575
P(Male/Stroopwaffles) = 40/75 = 0.53
Since the marginal and conditional probabilities
differ, the two events are dependent.
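The independence checks in C and D compare a marginal probability with a conditional one. A minimal Python sketch of both comparisons, added for illustration, using exact fractions so the test is not disturbed by rounding:

```python
from fractions import Fraction

male = {"Fries": 40, "Bitterballen": 35, "Stroopwaffles": 40}
female = {"Fries": 20, "Bitterballen": 30, "Stroopwaffles": 35}
total = sum(male.values()) + sum(female.values())  # 200

p_fries = Fraction(male["Fries"] + female["Fries"], total)           # 60/200
p_fries_given_male = Fraction(male["Fries"], sum(male.values()))     # 40/115
p_male = Fraction(sum(male.values()), total)                         # 115/200
stroop_total = male["Stroopwaffles"] + female["Stroopwaffles"]       # 75
p_male_given_stroop = Fraction(male["Stroopwaffles"], stroop_total)  # 40/75

# Independence would require each marginal to equal the matching conditional
print(p_fries == p_fries_given_male, p_male == p_male_given_stroop)
```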
Answers
Question
The probability that a person you meet at random in the street wears eyeglasses is 0.55. When
meeting 4 random people, what is the probability that the number of people you meet wearing
eyeglasses is 3 or higher?
A. P(X≥ 3) = 0.392
B. P(X≥ 3) = 0.346
C. P(X≥ 3) = 0.092
D. The probability cannot be calculated because we do not have the sample size
Answer: A
9. Probability Theory
9E. Probability Theory
Question
The probability that a person you meet at
random in the street wears eyeglasses is 0.55.
When meeting 4 random people, what is the
probability that the number of people you meet
wearing eyeglasses is 3 or higher?
Solution
Tree diagram: 4 people met in a row; at each step G = wears glasses (0.55) or NG = no glasses (0.45)
Find the Right Combinations
Since we are looking for the probability of meeting 3 or more people with glasses
in our sample of 4, the right combinations are the following:
• G-G-G-G
• G-G-G-NG
• G-G-NG-G
• G-NG-G-G
• NG-G-G-G
Calculate the Probabilities
We need to calculate the probabilities using multiplication for each of the
combinations:
• G-G-G-G → 0.55 x 0.55 x 0.55 x 0.55 = 0.092
• G-G-G-NG → 0.55 x 0.55 x 0.55 x 0.45 = 0.075
• G-G-NG-G → 0.55 x 0.55 x 0.45 x 0.55 = 0.075
• G-NG-G-G → 0.55 x 0.45 x 0.55 x 0.55 = 0.075
• NG-G-G-G → 0.45 x 0.55 x 0.55 x 0.55 = 0.075
Sum Them Up
We need to add all of the probabilities we just calculated to find the overall
probability of meeting 3 or more people with glasses [P(x ≥ 3)]
• 0.092 + 0.075 + 0.075 + 0.075 + 0.075 = 0.392 (0.391 if the products are not rounded before adding)
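The tree-diagram counting above is the binomial distribution with n = 4 and p = 0.55: the four orderings with exactly three G's correspond to C(4, 3) = 4. The binomial formula is not named on this slide, so treat the Python sketch below (using `math.comb`, Python 3.8+) as an added cross-check rather than the course's method. Without intermediate rounding it gives about 0.391:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 4, 0.55
p_at_least_3 = sum(binom_pmf(k, n, p) for k in range(3, n + 1))  # P(X = 3) + P(X = 4)
print(round(p_at_least_3, 4))
```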
Answers
Question
Given the following probability distribution, what is the approximate variance of X?
A. 4.05
B. -1.66
C. 7.38
D. 15.52
Answer: D
10. Probability Theory
X P(x)
0 0.4
1 0.8
2 0.32
3 0.15
4 0.54
10E. Probability Theory
Question
Solution
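➢ First, we need to calculate the expected value in order to use it in the formula for the variance:
• µx = ∑ x·P(x) = 0 × 0.4 + 1 × 0.8 + 2 × 0.32 + 3 × 0.15 + 4 × 0.54 = 4.05
➢ We can now calculate the variance using the formula σx² = ∑ P(x)·(x − µx)²
• σx² = 0.4(0 − 4.05)² + 0.8(1 − 4.05)² + 0.32(2 − 4.05)² + 0.15(3 − 4.05)² + 0.54(4 − 4.05)²
• σx² = 6.56 + 7.44 + 1.34 + 0.17 + 0.00135
• σx² ≈ 15.51, which matches answer D (15.52) up to rounding
➢ Check: a valid probability distribution needs ∑ P(x) = 1, but the P(x) values listed here add up to
0.4 + 0.8 + 0.32 + 0.15 + 0.54 = 2.21, and µx = 4.05 lies above the largest possible value of X (4).
The numbers therefore only illustrate how the formulas are applied.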
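The two formulas can be applied mechanically in a few lines of Python (an added sketch, not part of the slides). It also prints the sum of the listed probabilities, which is the first thing worth checking before trusting any expected value or variance:

```python
xs = [0, 1, 2, 3, 4]
ps = [0.4, 0.8, 0.32, 0.15, 0.54]   # P(x) values exactly as listed in the table

total_prob = sum(ps)                                         # must be 1 for a valid distribution
mean = sum(x * p for x, p in zip(xs, ps))                    # µx = ∑ x·P(x)
variance = sum(p * (x - mean) ** 2 for x, p in zip(xs, ps))  # σx² = ∑ P(x)·(x − µx)²
print(round(total_prob, 2), round(mean, 2), round(variance, 2))
```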
Given the following probability distribution, what is the variance of X?
X P(x)
0 0.4
1 0.8
2 0.32
3 0.15
4 0.54
Stats1 – Question Pool
Probability Distribution
Answers
Question
Thomas takes a standardized test as part of his university application. Standardized tests allow
comparisons to be made regarding student achievement. When he received his results, he was told that
he scored -0.28 in terms of Z-scores. However, he is not sure whether that is a good or bad result.
Given that the test scores are normally distributed, what can he conclude from the result?
A. He did better than half of the participants
B. He did worse than half of the participants
C. He did worse than 28% of the participants
D. Nothing can be said because we do not have the standard deviation and the mean
Answer: B
1. Probability Distribution
1E. Probability Distribution
Question
Thomas takes a standardized test as part of his university application. Standardized tests allow
comparisons to be made regarding student achievement. When he received his results, he was told that
he scored -0.28 in terms of Z-scores. However, he is not sure whether that is a good or bad result. Given
that the test scores are normally distributed, what can he conclude from the result?
Solution
➢ Since Thomas has a Z-score equal to −0.28, he scored 0.28 standard deviations below the mean. The
negative sign indicates the direction relative to the mean. In a normal distribution the mean is also the
median, with 50% of the scores below it and 50% above it. Since Thomas is on the left side of the mean,
we can say that he performed worse than 50% of the test takers.
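The z-table lookup behind this reasoning can be done with `statistics.NormalDist` from the Python standard library (3.8+); the sketch below is an added illustration. It gives the exact share of a standard normal distribution lying below Z = −0.28, roughly 39%, which confirms that Thomas did worse than half of the test takers:

```python
from statistics import NormalDist

# Left-sided probability P(Z <= -0.28) for the standard normal distribution
share_below = NormalDist().cdf(-0.28)
print(f"{share_below:.2%} of test takers scored below Thomas")
```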
Figure: normal curve centred on µ, with 50% of scores on each side and Thomas's score at Z = −0.28, just left of the mean
Answers
Question
Lea decides to investigate the average income distribution in her hometown. She observes that the
majority of households have a low to middle income and that a small minority have a high income.
Which of the following statements is correct?
A. Scores located within 1 standard deviation to the left and right of the mean make up 68% of the
entire data set
B. A household with an income of 2.3 standard deviations above the mean is in the top 2.5% of the
population
C. The variable in question is a discrete variable
D. None of the above statements is correct
Answer: D
2. Probability Distribution
2E. Probability Distribution
Question
Lea decides to investigate the average income distribution in her hometown. She observes that the
majority of households have a low to middle income and that a small minority have a high income.
Which of the following statements is correct?
Solution
➢ From the description, we can tell that the distribution of average income is right-skewed rather
than normal.
➢ Alternatives A and B are wrong because they rely on the rule of thumb (68%-95%-99.7%), which
only applies to normal distributions
➢ The third alternative is wrong because average income can take infinitely many possible values,
so the variable is continuous
Answers
Question
Alexandra decides to measure extraversion scores of students at Success Formula. The scores are well
modeled by a normal distribution with a mean of 72 and a standard deviation of 14. What is the
probability that a randomly selected person scores between 66 and 76 for extraversion?
A. 28.05%
B. 61.41%
C. 32.98%
D. 40.82%
Answer: A
3. Probability Distribution
3E. Probability Distribution
Question
Alexandra decides to measure extraversion scores of students at Success Formula. The scores are well
modeled by a normal distribution with a mean of 72 and a standard deviation of 14. What is the
probability that a randomly selected person scores between 66 and 76 for extraversion?
Solution
Calculate the z-scores: z1 = (76 − 72)/14 = 0.29 and z2 = (66 − 72)/14 = −0.43
Look up probabilities in z-table: z1 = 0.29 → 61.41% and z2 = −0.43 → 33.36%
Calculate the probability that the score is between 66 and 76: 61.41% − 33.36% = 28.05%
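The same calculation can be sketched with Python's `statistics.NormalDist` (standard library, 3.8+) in place of the z-table; this is an added illustration. Because the z-scores are not rounded to two decimals, the result is about 27.8%, slightly below the table-based 28.05% but still pointing to answer A:

```python
from statistics import NormalDist

scores = NormalDist(mu=72, sigma=14)
p_between = scores.cdf(76) - scores.cdf(66)  # P(66 < X < 76) = left area at 76 minus left area at 66
print(f"{p_between:.2%}")
```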
Answers
Question
Suppose that Alexandra measures extraversion scores for a different population with a mean of 80 and
a standard deviation of 9. What is the probability that a randomly selected person scores higher than
91?
A. 73.89%
B. 11.12%
C. 40.57%
D. 55.63%
Answer: B
4. Probability Distribution
4E. Probability Distribution
Question
Suppose that Alexandra measures extraversion scores for a different population with a mean of 80 and
a standard deviation of 9. What is the probability that a randomly selected person scores higher than
91?
Solution
Calculate the z-score: z = (x − µ)/σ = (91 − 80)/9 = 1.22
Look up probabilities in z-table: z = 1.22 → 0.8888 (this is the left-sided probability)
Calculate the probability that the score is higher than 91 (right-sided probability):
1 − 0.8888 = 0.1112
→ 11.12%
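The right-sided probability is one minus the left-sided one, exactly as in the last step above. A minimal Python sketch with `statistics.NormalDist` (an added illustration); with the unrounded z-score it gives about 11.1%:

```python
from statistics import NormalDist

scores = NormalDist(mu=80, sigma=9)
p_above = 1 - scores.cdf(91)  # right-sided probability P(X > 91)
print(f"{p_above:.2%}")
```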
Answers
Question
According to the Central Limit Theorem:
A. The sample distribution becomes normal if there is a sufficient sample size (n>25)
B. The sampling distribution becomes normal only when the population distribution is normal
C. Regardless of the shape of the population distribution, the sampling distribution will always be
normal
D. As a sample size increases, the sample mean and standard deviation will be closer in value to the
population mean µ and standard deviation σ
Answer: D
5. Probability Distribution
5E. Probability Distribution
Question
According to the Central Limit Theorem:
Solution
A. Incorrect. It is not the sample distribution that approaches normality when the sample is sufficiently
large. It is the sampling distribution.
B. Incorrect. The sampling distribution is indeed normal when the population distribution is normal,
but it also approaches normality whenever the sample size is sufficiently large, regardless of the
population’s shape
C. Incorrect. The sampling distribution is not always normal. For a small sample size, it has a shape
similar to the population distribution and is not necessarily normal. For a large sample size, it becomes
approximately normal
D. Correct. As the sample size becomes larger, the sample mean and sample standard deviation tend to
become closer to the population mean µ and standard deviation σ.
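A simulation can make the points in B and C visible: even for a skewed population, the sampling distribution of the mean is centred on µ and its spread shrinks roughly like σ/√n. The Python sketch below is an added illustration; the exponential population (mean 1, SD 1), the sample sizes and the 5,000 repetitions are arbitrary choices, not values from the course:

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical right-skewed population: exponential with mean 1 and standard deviation 1
POP_MEAN, POP_SD = 1.0, 1.0

def sample_mean(n):
    """Mean of one simple random sample of size n from the population."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

results = {}
for n in (2, 30, 200):
    means = [sample_mean(n) for _ in range(5000)]  # approximate sampling distribution
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    # Centre stays near the population mean; spread tracks sigma / sqrt(n)
    print(n, round(results[n][0], 3), round(results[n][1], 3), round(POP_SD / n ** 0.5, 3))
```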
Answers
Question
Maja plans to study the effects of Omega-3 supplements on antisocial behaviour. She develops a
measure that her participants will fill in before and after a 2-month trial during which they will take
daily omega-3 supplements. However, she has trouble recruiting a large number of participants.
Given that the sample size is not large enough, which of the following statements is incorrect?
A. The sample mean is a biased estimator of the population mean
B. The shape of the sampling distribution will be similar to that of the population distribution
C. The standard error will probably be too high
D. There is a high risk of unreliable statements about population parameters
Answer: A
6. Probability Distribution
6E. Probability Distribution
Question
Maja plans to study the effects of Omega-3 supplements on antisocial behaviour. She develops a
measure that her participants will fill in before and after a 2-month trial during which they will take
daily omega-3 supplements. However, she has trouble recruiting a large number of participants.
Given that the sample size is not large enough, which of the following statements is incorrect?
Solution
A. This statement is incorrect. Bias does not depend on the size of the sample. With a small sample
the estimate might be inaccurate, but if we use an appropriate estimator of the population parameter,
the estimate is still unbiased. An estimate will be biased if the estimator is not appropriate (e.g., the
sample is not random)
B. Correct. Since Maja has a small sample size, the sampling distribution has a shape similar to the
population distribution and is not necessarily normal.
C. Correct. The standard error equals σ/√n, so the smaller the sample size, the greater the standard error
D. Correct. Larger sample sizes allow more reliable statements about population parameters,
compared to small sample sizes.
Estimator
A statistic computed from a sample that is used to estimate a population parameter.
→ The sample mean is an estimator of the population mean.
Bias
Bias = the difference between the expected value of the estimator and the true
value of the parameter
→ The sample mean X̄ of a simple random sample is always unbiased.
Efficiency
The precision of the estimate.
→ The larger the sample size, the smaller the standard error.
→ The smaller the standard error, the more efficient the estimate.
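Bias and efficiency are separate properties, which a small simulation can show. In the Python sketch below (an added illustration; the normal population of 10,000 scores and the "upper half" sampling frame are invented for the example), the sample mean of a simple random sample stays centred on µ for every n, while sampling only from the higher-scoring half stays biased no matter how large n gets. In both schemes the standard error shrinks as n grows:

```python
import random
import statistics

random.seed(2)  # reproducible illustration

# Hypothetical population of 10,000 scores; the values are invented for illustration
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)

def srs(n):
    """Simple random sample of size n."""
    return random.sample(population, n)

# Hypothetical non-random sampling frame that only reaches the higher-scoring half
upper_half = sorted(population)[5_000:]

def non_random(n):
    return random.sample(upper_half, n)

def bias_and_se(sampler, n, reps=2000):
    """Bias = average estimate minus mu; SE = spread of the estimates."""
    estimates = [statistics.fmean(sampler(n)) for _ in range(reps)]
    return statistics.fmean(estimates) - mu, statistics.stdev(estimates)

results = {(name, n): bias_and_se(sampler, n)
           for name, sampler in (("SRS", srs), ("non-random", non_random))
           for n in (10, 100)}
for key, (bias, se) in results.items():
    print(key, round(bias, 2), round(se, 2))
```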
Answers
Question
Alexithymia is a personality trait featuring an inability to describe, identify and experience
emotions. In a population of people with borderline alexithymia, emotional intelligence scores have a
mean of 57 and a standard deviation of 15. The population distribution is skewed to the right. Darian
takes a simple random sample of 32. What is the probability that the sample mean will be between 55
and 60?
A. 74.86%
B. 13.11%
C. 64.42%
D. The probability cannot be calculated because the population distribution is skewed
Answer: C
7. Probability Distribution
7E. Probability Distribution
Question
Alexithymia is a personality trait featuring
an inability to describe, identify and experience
emotions. In a population of people with
borderline alexithymia, emotional intelligence
scores have a mean of 57 and a standard
deviation of 15. The population distribution is
skewed to the right. Darian takes a simple random
sample of 32. What is the probability that the
sample mean will be between 55 and 60?
µ = 57
σ = 15
n = 32
→ Central Limit Theorem applies (n > 25)
Solution
➢ Calculate Z-scores
z1 = (x̄ − µ)/(σ/√n) = (60 − 57)/(15/√32) = 1.13
z2 = (x̄ − µ)/(σ/√n) = (55 − 57)/(15/√32) = −0.75
➢ Look up probabilities in z-table
z1 = 1.13 → 87.08%
z2 = −0.75 → 22.66%
➢ Calculate the probability that the sample mean is
between 55 and 60:
87.08% − 22.66% = 64.42%
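By the Central Limit Theorem the sample mean is approximately normal with mean µ and standard error σ/√n, so the calculation can be sketched with `statistics.NormalDist` (an added illustration, Python 3.8+). Without rounding the z-scores it gives about 64.6%, close to the table-based 64.42%:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 57, 15, 32
# Approximate sampling distribution of the mean: N(mu, sigma / sqrt(n))
sampling_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))
p_between = sampling_dist.cdf(60) - sampling_dist.cdf(55)
print(f"{p_between:.2%}")
```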
Answers
Question
A certain variable follows a normal population distribution. The population mean is equal to 23.48 and
the standard deviation is equal to 4.657. The probability that the sample mean is higher than 24 equals
25.14%.
Calculate the sample size.
A. 49
B. 24
C. 36
D. The sample size cannot be calculated
Answer: C
8. Probability Distribution
8E. Probability Distribution
Question
A certain variable follows a normal population
distribution. The population mean is equal to
23.48 and the standard deviation is equal to 4.657.
The probability that the sample mean is higher
than 24 equals 25.14%.
Calculate the sample size.
µ = 23.48
σ = 4.657
P(x̄ > 24) = 25.14%
Solution
➢ We need to find the Z-score for which the
probability of a sample mean higher than 24
equals 25.14%
• Since it is a right-sided probability, we
need to subtract it from 1 (the table gives
left-sided probabilities)
• 1 − 0.2514 = 0.7486
• We can find 0.7486 in the table; it
corresponds to a z-score of 0.67
➢ We can use the Z-formula
z = (x̄ − µ)/(σ/√n)
0.67 = (24 − 23.48)/(4.657/√n) = 0.52/(4.657/√n)
0.67 = (0.52 × √n)/4.657
√n = (0.67 × 4.657)/0.52 ≈ 6
n = 6² = 36
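The steps above can be sketched in Python (an added illustration): `NormalDist().inv_cdf` plays the role of the reverse z-table lookup, and the z-formula is rearranged for √n. Because a sample size must be a whole number, the result is rounded:

```python
from statistics import NormalDist

mu, sigma, xbar = 23.48, 4.657, 24
p_right = 0.2514  # P(sample mean > 24)

z = NormalDist().inv_cdf(1 - p_right)  # left-sided probability 0.7486 gives z of about 0.67
sqrt_n = z * sigma / (xbar - mu)       # rearranged z = (xbar - mu) / (sigma / sqrt(n))
n = round(sqrt_n ** 2)                 # a sample size must be a whole number
print(round(z, 2), n)
```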
Answers
Question
Eero develops a new brand of cherry soda and he has decided on a specific bottle design. The contents
of the soda bottles are normally distributed with a mean of 400 ml and a standard deviation of 7 ml.
There is an 8.38% chance that the average contents of a 4-pack will exceed how many ml?
A. 400.12
B. 404.83
C. 407.31
D. 400.60
Answer: B
9. Probability Distribution
9E. Probability Distribution
Question
Eero develops a new brand of cherry soda and he has decided on a specific bottle design. The contents
of the soda bottles are normally distributed with a mean of 400 ml and a standard deviation of 7 ml.
There is an 8.38% chance that the average contents of a 4-pack will exceed how many ml?
Solution
➢ We know that the contents of the soda bottles are normally distributed, thus we can use the Z-table
➢ P(x̄ > ?) = 8.38% (right-sided probability) ⇔ 1 − 0.0838 = 0.9162 ⇔ Z = 1.38
Z = (x̄ − µ)/(σ/√n)
1.38 = (x̄ − 400)/(7/√4) = (x̄ − 400)/3.5
1.38 × 3.5 = 4.83 = x̄ − 400
x̄ = 400 + 4.83 = 404.83
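Finding a cut-off from a probability is an inverse-CDF lookup on the sampling distribution of the mean. A minimal Python sketch with `statistics.NormalDist.inv_cdf` (an added illustration, Python 3.8+), where the standard error for a 4-pack is 7/√4 = 3.5 ml:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 400, 7, 4   # bottle contents in ml, 4-pack
p_right = 0.0838           # P(mean of the 4-pack > cut-off)

sampling_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))  # standard error = 3.5 ml
cutoff = sampling_dist.inv_cdf(1 - p_right)               # value with 91.62% of means below it
print(f"{cutoff:.2f} ml")
```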
Answers
Question
Leonie wishes to investigate experiences of homelessness in Maastricht. However, there is no list of
homeless people in the city. She therefore decides to use a non-random sampling method known as
snowball sampling. Leonie meets one homeless person who participates in her research and also puts
her in contact with other homeless people in the area whom they know. Using this method she is able to
gather 178 participants.
Which of the following statements about her estimator of the population mean is true?
A. The estimator is unbiased and efficient
B. The estimator is unbiased and not efficient
C. The estimator is biased and efficient
D. The estimator is biased and not efficient
Answer: C
10. Probability Distribution
10E. Probability Distribution
Question
Leonie wishes to investigate experiences of homelessness in Maastricht. However, there is no list of
homeless people in the city. She therefore decides to use a non-random sampling method known as
snowball sampling. Leonie meets one homeless person who participates in her research and also puts
her in contact with other homeless people in the area whom they know. Using this method she is able to
gather 178 participants.
Which of the following statements about her estimator of the population mean is true?
Solution
➢ Leonie is using a non-random sampling method, meaning that her sample is not random. The sample
mean of such a sample is not an appropriate estimator of the population mean, which makes her
estimator biased. ‘Bias’ has nothing to do with the sample size
➢ Leonie has a sample size of 178 participants, which is a sufficiently large sample (C.L.T). Thus, her
estimator of the population mean will indeed be efficient: as the sample size increases, the
standard error decreases
Stats1 – Question Pool
Hypothesis Testing
Answers
Question
A researcher claims that he was able to develop a drug that enhances human attention. He will test this
hypothesis by recruiting 80 individuals with Attention Deficit Disorder (ADD). He divides his sample
evenly into 2 groups and makes sure that the groups are matched in their attention levels. He
continues by administering the drug only to group 1, keeping group 2 as a control. Finally, all
participants across both groups have to complete an Attention Test, with higher scores indicating
worse attention.
What are the researcher’s null and alternative hypotheses?
A. H0: µ1= µ2, Hα: µ1 ≠ µ2
B. H0: µ1 ≠ µ2, Hα: µ1< µ2
C. H0: µ1= µ2 Hα: µ1> µ2
D. H0: µ1= µ2 Hα: µ1< µ2
Answer: D
1. Hypothesis Testing
1E. Hypothesis Testing
Question
A researcher claims that he was able to develop a drug that enhances human attention. He will test this
hypothesis by recruiting 80 individuals with Attention Deficit Disorder (ADD). He divides his sample
evenly into 2 groups and makes sure that the groups are matched in their attention levels. He
continues by administering the drug only to group 1, keeping group 2 as a control. Finally, all
participants across both groups have to complete an Attention Test, with higher scores indicating
worse attention.
What are the researcher’s null and alternative hypotheses?
Solution
A. Incorrect. The alternative hypothesis indicates a two-sided test (Hα: µ1 ≠ µ2). The researcher wants to test
the hypothesis that the drug enhances human attention, so we are looking for a one-sided test.
B. Incorrect. The null hypothesis always states that there is no effect or difference. In this case, it is
the hypothesis that the drug has no effect, so the two group means are equal (H0: µ1 = µ2)
C. Incorrect. The alternative hypothesis states that the mean of group 1 should be higher than that of group
2 after the drug administration. However, higher scores mean worse attention levels. Since the researcher
expects that the drug is beneficial, we should be expecting that group 1 has better attention levels than
group 2, thus lower scores
D. Correct. The alternative hypothesis claims that group 1 (drug) will have a lower mean score than
group 2 (control), i.e. group 2 will have worse attention than group 1
Answers
Question
Refer back to the example in question one. The researcher is informed that the population of people
with ADD is skewed to the right. Which of the following statements is correct?
52
A. The researcher can still test his hypothesis because normality is not a necessary condition
B. The researcher can still test his hypothesis because his sample size is large enough
C. The researcher cannot test his hypothesis because there is no normality in the population
D. The researcher cannot test his hypothesis because his sample size is not large enough
Answer: B
2. Hypothesis Testing
2E. Hypothesis Testing
Question
Refer back to the example in question one. The researcher is informed that the population of people
with ADD is skewed to the right. Which of the following statements is correct?
53
Solution
A. Incorrect. In order to test our hypothesis, we need the sampling distribution of the sample mean to be
(approximately) normal
B. Correct. The researcher can indeed do the test because he has a large enough sample size, meaning
that the central limit theorem applies (= the sampling distribution approximates a normal
distribution as the sample size gets larger, regardless of the population distribution)
C. Incorrect. Since the central limit theorem applies, we do not need to worry about the skewed
population distribution
D. Incorrect. The sample size is large enough. The rule-of-thumb cut-off for the central limit theorem to apply is
n ≥ 25, and each group contains 40 participants
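A quick simulation can make the central limit theorem concrete. This is a minimal Python sketch, assuming an exponential distribution as a stand-in for a right-skewed population (the question does not specify the exact shape). The population stays skewed, but the skewness of the sample means shrinks towards 0 as n grows.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: the average cubed z-score (about 0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def simulate_means(n, reps=5000, seed=42):
    """Means of `reps` random samples of size n from a right-skewed population.

    The population is an exponential distribution with mean 1 (an assumed
    stand-in for a right-skewed population; its theoretical skewness is 2).
    """
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


rng = random.Random(0)
population = [rng.expovariate(1.0) for _ in range(50_000)]
print(f"population: skewness = {skewness(population):.2f}")
for n in (1, 5, 40):
    print(f"n = {n:2d}: skewness of the sample means = {skewness(simulate_means(n)):.2f}")
```

With n = 40 per group, the sampling distribution of the mean is already close to symmetric, which is why the skewed population does not block the test.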
Answers
Question
Florian believes that a new Artificial Intelligence teaching method can influence student ratings
compared to using human tutors. He is however unsure about what this influence can look like because,
despite the AI’s greater efficiency, students might still prefer human interaction during their tutorials.
Florian then takes an SRS of 27 students from a population of students with a mean rating of µ = 30.2 and
a standard deviation of σ = 16. The sample of students takes a lesson from the AI system and then gives it a
rating with a mean of 24.5.
Can Florian conclude that the mean rating of the AI system is significantly different from the mean of
the normal method?
54
A. Yes, we reject the null hypothesis with the p-value of 0.0322
B. Yes, we reject the null hypothesis with the p-value of 0.0644
C. No, we cannot reject the null hypothesis with the p-value of 0.0322
D. No, we cannot reject the null hypothesis with the p-value of 0.0644
Answer: D
3. Hypothesis Testing
3E. Hypothesis Testing
Question
Florian believes that a new Artificial Intelligence teaching
method can influence student ratings compared to using
human tutors. He is however unsure about what this
influence can look like because, despite the AI’s greater
efficiency, students might still prefer human interaction
during their tutorials. Florian then takes an SRS of 27 students
from a population of students with a mean rating of µ = 30.2
and a standard deviation of σ = 16. The sample of students
takes a lesson from the AI system and then gives it a rating
with a mean of 24.5. The significance level is 5%.
Can Florian conclude that the mean rating of the AI system
is significantly different from the mean of the normal
method?
55
Data
H0: µ = 30.2
Hα: µ ≠ 30.2 (2-tailed test)
α = 0.05
µ = 30.2
σ = 16
n = 27
x̄ = 24.5
Solution
Ø The sample size is large enough (n = 27), so we
can continue with the test
Ø Since σ is known, we can use the Z formula to calculate Zobs
$Z_{obs} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{24.5 - 30.2}{16 / \sqrt{27}} = -1.85$
Ø Using the Z-table, we see that a Zobs of -1.85
corresponds to an area of 0.0322 in the left tail
(one-tailed probability)
Ø Since we have a 2-tailed test, we need to
double this probability to get the p-value
$p\text{-value} = 2 \times 0.0322 = 0.0644$
Ø We can then compare our p-value to the alpha
0.0644 > 0.05
Ø The p-value is larger than the α, thus the
null hypothesis cannot be rejected
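The z-test above can be sketched in Python as follows. This assumes σ is known (as in the question) and uses the error function from the standard library in place of a Z-table lookup.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF Phi(z), written with the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def z_test_two_sided(xbar, mu0, sigma, n):
    """One-sample z-test with known sigma: returns (z_obs, two-sided p-value)."""
    z_obs = (xbar - mu0) / (sigma / sqrt(n))
    one_tail = normal_cdf(-abs(z_obs))  # area beyond |z_obs| in one tail
    return z_obs, 2 * one_tail          # double it for a 2-tailed test


z_obs, p_value = z_test_two_sided(xbar=24.5, mu0=30.2, sigma=16, n=27)
print(f"Z_obs = {z_obs:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "cannot reject H0")
```

Because z is not rounded before the lookup, the p-value comes out at about 0.064 rather than the table-based 0.0644. The conclusion (p > 0.05, H0 not rejected) is the same.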
Answers
Question
Suppose that for a two-sided test, an experimenter decides to have a significance level of 0.10.
Which of the following statements is incorrect?
56
A. The Z-critical is going to be equal to ±1.65
B. The probability of a type 1 error is equal to 10%
C. If the null hypothesis is rejected at this level, then it will also be rejected at α=0.05
D. With the current significance level, there is a lower probability of not rejecting a false null
hypothesis compared to a significance level of 0.05
Answer: C
4. Hypothesis Testing
4E. Hypothesis Testing
Question
Suppose that for a two-sided test, an
experimenter decides to have a significance level
of 0.10.
Which of the following statements is incorrect?
57
Solution
A. Correct. For a two-sided test with
α = 10%, the Z-critical values are ±1.65 (5% in each tail)
B. Correct. The probability of a type 1 error is
always equal to the significance level of the
study
• Type 1 error = α = 10%
C. Incorrect. If the null hypothesis is rejected at α
= 10%, it does not necessarily mean that it
will be rejected at α = 5%
• E.g., a p-value equal to 0.07 is smaller
than 0.10, but it is not smaller than
0.05. Thus, H0 would be rejected at α
= 10% but not at α = 5%
D. Correct. By increasing the significance level,
we make the decision criterion more lenient,
making a Type II error less likely. However, we
simultaneously increase the risk of a false
positive, that is, rejecting a true null
hypothesis (Type I error)
[Figure: two-sided test with α = 10%: 90% of the area in the middle, 5% in each tail]
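The critical values and the effect of α on the decision can be checked with Python's standard library. This is a minimal sketch, assuming a two-sided z-test. The p-value of 0.07 is an illustrative number, not taken from a question.

```python
from statistics import NormalDist


def two_sided_z_critical(alpha):
    """Critical |Z| for a two-sided test: alpha/2 of the area in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: Z_c = ±{two_sided_z_critical(alpha):.3f}")

# A p-value of 0.07 is below 0.10 but not below 0.05:
p = 0.07
for alpha in (0.10, 0.05):
    print(f"alpha = {alpha:.2f}:", "reject H0" if p < alpha else "cannot reject H0")
```

The exact critical value for α = 0.10 is 1.645, which the slides round to 1.65.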
Answers
Question
A questionnaire has been constructed to measure the level of psychopathy for incarcerated individuals.
The population is normally distributed with a mean of 44 and a standard deviation of 12. A researcher
wants to check the hypothesis that the population mean is different, so she draws an SRS of 23
individuals. The sample mean is 53.
What are the boundaries of a 90% confidence interval based on this specific sample?
58
A. [48.87, 57.13]
B. [48.14, 56.90]
C. [43.89, 54.96]
D. [49.63, 52.47]
Answer: A
5. Hypothesis Testing
5E. Hypothesis Testing
Question
A questionnaire has been constructed to measure the level of psychopathy for incarcerated individuals.
The population is normally distributed with a mean of 44 and a standard deviation of 12. A researcher
wants to check the hypothesis that the population mean is different, so she draws an SRS of 23
individuals. The sample mean is 53.
What are the boundaries of a 90% confidence interval based on this specific sample?
59
Solution
H0: µ = 44
Hα: µ≠ 44
µ = 44
σ = 12
n = 23
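x̄ = 53
Zc = 1.65 (because it is a 90% CI)
$\bar{x} \pm Z_c \times \frac{\sigma}{\sqrt{n}}$
$53 \pm 1.65 \times \frac{12}{\sqrt{23}}$
$53 - 1.65 \times \frac{12}{\sqrt{23}} = 53 - 1.65 \times 2.502 = 48.87$
$53 + 1.65 \times \frac{12}{\sqrt{23}} = 53 + 1.65 \times 2.502 = 57.13$
[48.87, 57.13]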
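Here is the same interval as a small Python function. This sketch plugs in the slide's rounded Zc = 1.65 rather than the exact 1.645.

```python
from math import sqrt


def z_confidence_interval(xbar, sigma, n, z_c):
    """Known-sigma interval: x̄ ± z_c * sigma / sqrt(n)."""
    margin = z_c * sigma / sqrt(n)
    return xbar - margin, xbar + margin


lo, hi = z_confidence_interval(xbar=53, sigma=12, n=23, z_c=1.65)  # 90% CI, slide's rounded Zc
print(f"[{lo:.2f}, {hi:.2f}]")
print("44 inside the interval:", lo <= 44 <= hi)   # False, so H0: µ = 44 is rejected at α = 0.10
```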
Answers
Question
Suppose we have a 95% Confidence Interval [37.2, 42.5].
Calculate the sample mean and the standard error.
60
A. x̄ = 40.05, SE = 3.39
B. x̄ = 38.74, SE = 4.63
C. x̄ = 39.85, SE = 1.35
D. x̄ = 41.40, SE = 2.22
Answer: C
6. Hypothesis Testing
6E. Hypothesis Testing
Sample Mean
Suppose we have a 95% Confidence Interval
[37.2, 42.5].
Calculate the sample mean and the standard
error.
α = 5%
Zc = 1.96
CI [37.2, 42.5]
$\bar{x} \pm Z_c \times \frac{\sigma}{\sqrt{n}}$
$\bar{x} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}$
61
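Ø The lower bound of the interval gives:
$37.2 = \bar{x} - 1.96 \times \frac{\sigma}{\sqrt{n}} \;\Rightarrow\; 1.96 \times \frac{\sigma}{\sqrt{n}} = \bar{x} - 37.2$
Ø Substituting this into the upper bound:
$42.5 = \bar{x} + \left(1.96 \times \frac{\sigma}{\sqrt{n}}\right) = \bar{x} + (\bar{x} - 37.2)$
$2\bar{x} = 42.5 + 37.2 = 79.7$
$\bar{x} = \frac{79.7}{2} = 39.85$
Standard Error
Ø Confidence interval: $\bar{x} \pm Z_c \times \frac{\sigma}{\sqrt{n}}$
Ø From the previous calculations we can see
that: $1.96 \times \frac{\sigma}{\sqrt{n}} = \bar{x} - 37.2$
Ø We already found the sample mean, so we can
use it to calculate the fraction:
$1.96 \times \frac{\sigma}{\sqrt{n}} = 39.85 - 37.2 = 2.65$
$\frac{\sigma}{\sqrt{n}} = \frac{2.65}{1.96} = 1.35$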
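A z-based CI is symmetric around x̄, so both quantities can be read off directly. The centre of the interval is x̄, and the half-width is Zc × SE. Here is a minimal sketch of that shortcut (the function name is hypothetical):

```python
def mean_and_se_from_ci(lower, upper, z_c=1.96):
    """Recover x̄ and SE = sigma/sqrt(n) from a symmetric z-based CI."""
    xbar = (lower + upper) / 2      # centre of the interval
    margin = (upper - lower) / 2    # half-width = z_c * SE
    return xbar, margin / z_c


xbar, se = mean_and_se_from_ci(37.2, 42.5)
print(f"x̄ = {xbar:.2f}, SE = {se:.2f}")
```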
Answers
Question
Going back to the example of the previous question, what can be said about the null hypothesis, given
that the population mean is equal to 36.05?
62
A. The null hypothesis is accepted
B. The null hypothesis is rejected
C. The null hypothesis cannot be rejected
D. Nothing can be said about the null hypothesis with the current data
Answer: B
7. Hypothesis Testing
7E. Hypothesis Testing
Question
Going back to the example of the previous question, what can be said about the null hypothesis, given
that the population mean is equal to 36.05?
63
Solution
A. Incorrect. When doing a hypothesis test, we can either reject the null hypothesis or fail to reject
it, but we can never accept the null hypothesis. We cannot conclude that the null
hypothesis is true merely because we did not find evidence to reject it
B. Correct. For our 2-tailed test, the hypothesized population mean (36.05) is not included in the
95% CI [37.2, 42.5], so the null hypothesis is rejected at α = 0.05
C. Incorrect. Since the population mean is not included in the confidence interval, the null hypothesis
is rejected
D. Incorrect. Statement B is correct.
7E. Hypothesis Testing
64
Confidence
Interval
Ø A confidence interval is an interval estimate of µ.
Ø It gives a range of plausible values for the population mean
$\bar{X} \pm Z_c \times \frac{\sigma}{\sqrt{n}}$
Interpretation
Example: 95% Confidence Interval
Ø If we drew infinitely many samples and computed a 95% CI from each, 95% of those intervals
would contain the population mean µ
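This long-run interpretation can be checked by simulation. The sketch below assumes a normal population with illustrative values µ = 50, σ = 10 and n = 25 (not from the slides). It counts how often the 95% interval x̄ ± 1.96 × σ/√n captures µ.

```python
import random
from math import sqrt


def ci_coverage(mu=50.0, sigma=10.0, n=25, z_c=1.96, reps=5000, seed=7):
    """Share of simulated 95% z-intervals x̄ ± z_c·σ/√n that contain the true mean mu.

    mu, sigma and n are illustrative values, not taken from the slides.
    """
    rng = random.Random(seed)
    margin = z_c * sigma / sqrt(n)          # same half-width for every sample (sigma known)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


print(f"coverage: {ci_coverage():.3f}")
```

Each individual interval either contains µ or it does not. The 95% describes the procedure over many samples.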
Hypothesis
Testing
Ø We can use the confidence interval to see if the null hypothesis is rejected or
not for a two-tailed test
Ø If the population mean from the null hypothesis is located inside the interval,
then the null hypothesis cannot be rejected because the specific value is a
possible population mean
Ø If the population mean from the null hypothesis is not located inside the
interval, the null hypothesis is rejected
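The decision rule above fits in a few lines of Python. This sketch assumes a two-tailed test at the α that matches the CI level (a 95% CI corresponds to α = 0.05). `ci_decision` is a hypothetical helper name.

```python
def ci_decision(ci, mu0):
    """Two-tailed test via a CI: H0: mu = mu0 is rejected when mu0 lies outside the interval."""
    lower, upper = ci
    return "cannot reject H0" if lower <= mu0 <= upper else "reject H0"


print(ci_decision((37.2, 42.5), 36.05))   # reject H0
print(ci_decision((37.2, 42.5), 40.0))    # cannot reject H0
```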
Answers
Question
Tobias investigates the effects of participative leadership on satisfaction levels among employees. The
sample mean is equal to 73.8. The boundaries of the 95% confidence interval are [71.4, 76.2].
Calculate the margin of error and the standard error.
65
A. ME = 5.7, SE = 1.22
B. ME = 2.4, SE = 1.22
C. ME = 2.9, SE = 3.91
D. ME = 2.4, SE = 4.75
Answer: B
8. Hypothesis Testing
8E. Hypothesis Testing
Question
Tobias investigates the effects of participative leadership on satisfaction levels among employees. The
sample mean is equal to 73.8. The boundaries of the 95% confidence interval are [71.4, 76.2].
Calculate the margin of error and the standard error.
66
Solution
x̄ = 73.8
95% CI → [71.4, 76.2]
Zcritical = 1.96
Margin of error:
$\bar{x} \pm Z_c \times \frac{\sigma}{\sqrt{n}}$
$\bar{x} - Z_c \times \frac{\sigma}{\sqrt{n}} = 71.4$
$Z_c \times \frac{\sigma}{\sqrt{n}} = \bar{x} - 71.4 = 73.8 - 71.4 = 2.4$
Standard error:
$Z_c \times \frac{\sigma}{\sqrt{n}} = 2.4$
$\frac{\sigma}{\sqrt{n}} = \frac{2.4}{Z_c} = \frac{2.4}{1.96} = 1.22$
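Here are the same two steps as a Python sketch. Like the solution, it uses x̄ and the lower bound (for a symmetric interval, the upper bound gives the same margin). `margin_and_se` is a hypothetical helper name.

```python
def margin_and_se(xbar, lower, z_c=1.96):
    """Margin of error and standard error from x̄ and the lower CI bound (hypothetical helper)."""
    margin = xbar - lower          # ME = Zc * sigma / sqrt(n)
    return margin, margin / z_c    # SE = ME / Zc


me, se = margin_and_se(73.8, 71.4)
print(f"ME = {me:.1f}, SE = {se:.2f}")   # ME = 2.4, SE = 1.22
```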
Answers
Question
Kian is the HR manager for Success Formula. He noticed that the employees are lately having more
stress than usual, so he decides to evaluate their stress levels using a measurement scale (fewer points =
less stress). On average, the 26 employees had a stress score of 83 with a standard deviation of 17. Kian
then decided to implement a mindfulness program with the goal of reducing stress scores by 8 points.
The significance level is 5%.
What is the power of the test, given that the mindfulness program works as Kian was expecting?
67
A. 0.7734
B. 0.2266
C. 0.6066
D. 0.7123
Answer: A
9. Hypothesis Testing
Question
Kian is the HR manager for Success Formula. He
noticed that the employees are lately having
more stress than usual, so he decides to evaluate
their stress levels using a measurement scale
(fewer points = less stress). On average, the 26
employees had a stress score of 83 with a
standard deviation of 17. Kian then decided to
implement a mindfulness program with the goal
of reducing stress scores by 8 points. The
significance level is 5%.
What is the power of the test, given that the
mindfulness program works as Kian was
expecting?
H0: µ = 83
Hα: µ < 83
Zc = -1.65
α = 0.05
n = 26
σ = 17
µ = 83
µ (new) = 75
68
Answer
Ø Find the critical value
$Z_c = \frac{\bar{X}_c - \mu}{\sigma / \sqrt{n}}$
$-1.65 = \frac{\bar{X}_c - 83}{17 / \sqrt{26}}$
$-5.50 = \bar{X}_c - 83 \;\Rightarrow\; \bar{X}_c = 77.50$
Ø H0 is rejected whenever x̄ < 77.50
Ø Solve for Z under the new mean
$Z = \frac{\bar{X}_c - \mu_{new}}{\sigma / \sqrt{n}} = \frac{77.50 - 75}{17 / \sqrt{26}} = 0.75$
Ø Find the power and β
• Using the Z-table, the area to the left of Z = 0.75 is 0.7734. This is the probability that x̄ falls in the
rejection region (x̄ < 77.50) when µ = 75, i.e., the power of the test
• β is the remaining area: β = 1 − 0.7734 = 0.2266
Ø Check with the formula:
$Power = 1 - \beta = 1 - 0.2266 = 0.7734$
9E. Hypothesis Testing
9E. Hypothesis Testing
69
Type II
Error
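Ø Definition: We fail to reject a false null hypothesis
Ø Its probability is denoted by β
Ø Calculation (µ0 = mean under H0, µa = true mean under the alternative):
• Find the critical value $\bar{X}_c$ where H0 would be rejected:
$Z_c = \frac{\bar{X}_c - \mu_0}{\sigma / \sqrt{n}}$ → solve for $\bar{X}_c$
• $Z = \frac{\bar{X}_c - \mu_a}{\sigma / \sqrt{n}}$ → solve for Z, then look up the area on the non-rejection side of $\bar{X}_c$; this area is β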
Power
Ø Definition: The probability that we are able to reject a false null hypothesis
Ø Calculation:
• Power = 1 − β
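The two-step β/power calculation can be sketched in Python for a left-tailed test (Hα: µ < µ0), here with the numbers from Kian's example. This assumes σ is known and uses the slides' rounded Zc = -1.65. H0 is rejected for x̄ below X̄c. So under the alternative mean, the area to the left of Z is the power, and β is the remainder.

```python
from math import sqrt
from statistics import NormalDist


def power_left_tailed(mu0, mu_a, sigma, n, z_c=-1.65):
    """Power and beta of a left-tailed z-test (H0: mu = mu0, Ha: mu < mu0).

    Step 1: the critical sample mean x_c at which H0 is just rejected.
    Step 2: the probability that x̄ falls below x_c when the true mean is mu_a.
    """
    se = sigma / sqrt(n)
    x_c = mu0 + z_c * se             # reject H0 whenever x̄ < x_c
    z = (x_c - mu_a) / se            # where x_c sits under the alternative mean
    power = NormalDist().cdf(z)      # area to the left of z = rejection region
    return x_c, z, power, 1 - power  # beta = 1 - power


x_c, z, power, beta = power_left_tailed(mu0=83, mu_a=75, sigma=17, n=26)
print(f"X_c = {x_c:.2f}, Z = {z:.2f}, power = {power:.4f}, beta = {beta:.4f}")
```

For a right-tailed test, the rejection region flips, and the power would be the area to the right of Z instead.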
Illustration
Answers
Question
Suppose Micheal is conducting an experiment on fear conditioning. He uses a sample of 65 participants
and a significance level of 5%. Before he begins, he wants to make sure that the probability of rejecting
a true null hypothesis is as small as possible.
Which of the following statements is correct?
70
A. He should increase his sample size
B. He should increase the effect size
C. He should increase the significance level
D. None of the above
Answer: D
10. Hypothesis Testing
10E. Hypothesis Testing
Questions
Suppose Micheal is conducting an experiment on fear conditioning. He uses a sample of 65 participants
and a significance level of 5%. Before he begins, he wants to make sure that the probability of rejecting
a true null hypothesis is as small as possible.
Which of the following statements is correct?
71
Solution
A. Incorrect. By increasing the sample size, we decrease the standard error and thus the probability of
not rejecting a false null hypothesis (Type II error). The probability of a Type I error stays equal to α
B. Incorrect. Increasing the effect size is difficult in real life since researchers do not have any control
over it. Theoretically, the larger the effect size, the lower the probability of failing to reject a false null
hypothesis (Type II error); the probability of a Type I error is not affected
C. Incorrect. By increasing the significance level, it becomes easier to reject a null hypothesis. We
increase the probability of rejecting a true H0 (Type I error)
D. Correct. None of the above alternatives is correct. Rejecting a true null hypothesis is the Type I error and its
probability is measured by α. We can reduce this probability by reducing α, but this increases the
probability of a Type II error (not recommended)
Stats1 - Question Pool
T-tests
72
Answers
Question
A randomly drawn sample of 60 university students undergoes exam training. Before the training, their
mean score on a practice exam was 68. After the training, their mean score improved by 7 points. What
(t-)test would you employ to check if the exam training had a significant effect?
73
A. One-sample t-test
B. Paired samples t-test
C. Independent samples t-test
D. Two-sample t-test
Answer: B
1. T-tests
1E. T-tests
Question
A randomly drawn sample of 60 university students undergoes exam training. Before the training, their
mean score on a practice exam was 68. After the training, their mean score improved by 7 points. What
(t-)test would you employ to check if the exam training had a significant effect?
74
Solution
A. Incorrect. We compare two dependent sets of scores, not one sample against a population value.
B. Correct. The scores are paired, since the same sample is tested twice (before and after exam training).
C. Incorrect. The two sets of scores are not independent; they are dependent.
D. Incorrect. A two-sample t-test is an independent samples t-test, but the scores here are dependent, not
independent.
Answers
Question
When testing a null hypothesis about a single population mean, a t-test is usually performed rather
than a z-test. A t-test is more likely to be employed because…
75
A. A t-test has more power than a z-test, leading to a more reliable result.
B. Quantitative variables can only be analysed with t-tests.
C. Z-tests are more prone to type I errors, which are to be avoided.
D. In practice, the standard deviation of a population is rarely known.
Answer: D
2. T-tests
2E. T-tests
T-tests
When to use a t-test?
When we can't use z-scores because σ
(the population standard deviation) is unknown
• We have to estimate both parameters (µ and σ)
• We use an extra estimate (the sample SD, sx)
• The t-distribution is more dispersed than
the z-distribution
• A t-test is always less powerful (for the same n)
76
Z-tests
Z-tests measure how many standard errors our
sample mean (X̄) lies from the hypothesized
value of the population mean (µ).
• Makes use of the z-distribution
• More powerful than a t-test
• Can rarely be used, since in reality
we usually do not know the population
standard deviation σ
Answers
Question
A researcher is interested in the effect of wearing red lipstick on the score at minigolf. They ask 40
people to wear red lipstick while playing 18 holes on the minigolf court. 70 people played the same 18
holes without wearing red lipstick. The dependent variable is the obtained score after the 18 holes (a
lower score is considered to be better). The red lipstick condition had a mean score of 47.5 and a
standard deviation of 4.3. The no-red lipstick condition had a mean score of 62 and a standard
deviation of 9.2.
Which test should the researcher use to test the null hypothesis that the score at minigolf is not
affected by wearing red lipstick?
77
A. An independent samples t-test, assuming unequal population variances.
B. An independent samples t-test, assuming equal population variances.
C. A paired samples t-test.
D. A one-sample t-test.
Answer: A
3. T-tests
3E. T-tests
Question
A researcher is interested in the effect of wearing red lipstick on the score at minigolf. They ask 40
people to wear red lipstick while playing 18 holes on the minigolf court. 70 people played the same 18
holes without wearing red lipstick. The dependent variable is the obtained score after the 18 holes (a
lower score is considered to be better). The red lipstick condition had a mean score of 47.5 and a
standard deviation of 4.3. The no-red lipstick condition had a mean score of 62 and a standard
deviation of 9.2.
Which test should the researcher use to test the null hypothesis that the score at minigolf is not
affected by wearing red lipstick?
78
Solution
A. Correct. The 2 groups are independent, and we compare their samples. The goal of the test is to
check if the 2 samples come from populations with equal means. We see that the rule of thumb
(smallest SD × 2 > biggest SD) does not hold (4.3 × 2 = 8.6 < 9.2) and the groups don't have equal
sample sizes. This means we have to do the t-test without assuming equal variances
B. Incorrect. We cannot assume equal variances because the rule of thumb is violated and the group
sizes are not equal
C. Incorrect. Paired samples t-test requires matched groups or a within-subject design.
D. Incorrect. One sample t-test is used when we have 1 population and want to check if its mean is
equal to a specific value.
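The rule of thumb is easy to encode. Here is a minimal Python sketch (the function name is hypothetical), applied to the standard deviations from several questions in this pool.

```python
def equal_variances_rule_of_thumb(sd1, sd2):
    """Slides' rule of thumb: equal variances are plausible if smallest SD x 2 > biggest SD."""
    return min(sd1, sd2) * 2 > max(sd1, sd2)


examples = {
    "red lipstick (4.3 vs 9.2)": (4.3, 9.2),
    "Ritalin (1.9463 vs 2.2824)": (1.9463, 2.2824),
    "memory recall (2.61 vs 3.38)": (2.61, 3.38),
    "ice cream (4.2 vs 1.9)": (4.2, 1.9),
}
for label, (sd1, sd2) in examples.items():
    verdict = "equal variances OK" if equal_variances_rule_of_thumb(sd1, sd2) else "violated"
    print(f"{label}: {verdict}")
```

The rule of thumb is only a quick screen; Levene's test is the formal check.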
3E. T-tests
Assumption | T-Test Concerned | How to Determine | What if Violated
Normality | All t-tests | 1. Histogram of sample scores looks normal; 2. Sample size is large (Central Limit Theorem) | Can't do t-test
Quantitative | All t-tests | Dependent variable is quantitative | Can't do t-test
Dependent Groups | Paired t-test | The groups are matched | Two-sample t-test
Independent Groups | Two-sample t-test | Two separate groups are measured | Paired t-test
Equal Variance | Two-sample t-test | 1. One sample SD is not 2x bigger than the other (rule of thumb); 2. Levene's test is not significant; 3. The sample sizes are equal | The two-sample t-test not assuming equal variances has to be used (→ less powerful)
79
Answers
Question
The effect of Ritalin on test performance is tested. 31 participants received a Ritalin pill while another
31 participants received a placebo. The test performance is assumed to be good if the score on the test
is high. The null hypothesis is that exam performance is the same both under Ritalin and placebo, while
the alternative hypothesis is that Ritalin leads to better test performance. The table below presents the
group statistics, computed by SPSS (equal variances assumed).
What statement is incorrect?
80
A. The means of the two samples are very similar. However, a visual inspection of the group
statistics is not enough to reject the null hypothesis.
B. The equal variances assumption is violated, thus we should not interpret the test.
C. The equal variances assumption is not violated, thus we can interpret the test
D. During the t-test, we should compute the weighted average of the two standard deviations
Answer: B
4. T-tests
condition | N | Mean | Std. Deviation | Std. Error Mean
Test score: placebo | 31 | 10.1182 | 1.9463 | .3496
Test score: Ritalin | 31 | 10.9374 | 2.2824 | .4099
4E. T-tests
Question
The effect of Ritalin on test performance is tested. 31 participants received a Ritalin pill while another
31 participants received a placebo. The test performance is assumed to be good if the score on the test
is high. The null hypothesis is that exam performance is the same both under Ritalin and placebo, while
the alternative hypothesis is that Ritalin leads to better test performance. The table below presents the
group statistics, computed by SPSS.
What statement is incorrect?
81
Solution
A. Correct. Sample means are random variables, meaning they change from sample to sample. Thus,
in order to draw conclusions about the populations, we need to test whether the difference
between the means is significant.
B. Incorrect. The equal variances assumption is not violated. We can check this using the rule of
thumb (biggest SD < smallest SD × 2: 2.2824 < 1.9463 × 2 = 3.8926)
C. Correct. Using the rule of thumb, we can see that the smallest SD multiplied by 2 is
bigger than the biggest SD (Ritalin group), thus the assumption is not violated
D. Correct. Since the equal variances assumption is not violated, the 2 standard deviations estimate
the same population standard deviation. By computing their weighted average (pooled SD), we obtain
the best estimate of σ
condition | N | Mean | Std. Deviation | Std. Error Mean
Test score: placebo | 31 | 10.1182 | 1.9463 | .3496
Test score: Ritalin | 31 | 10.9374 | 2.2824 | .4099
4E. T-tests
82
Checking
Equal Variances
Assumption
We can use 2 ways to check for the assumption
1. Rule of Thumb
– Smaller SD × 2 should be larger than the bigger SD
2. Levene's Test
– If the test is significant, the variances are unequal (H0: $\sigma_1^2 = \sigma_2^2$)
Violation of
Assumption
If this assumption is violated, we can continue with the t-test if the two samples
are approximately equal in size
Special case
If there is a violation AND the samples differ in size, we can do the t-test
but only with the following formula:
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
If H0: µ1 = µ2, then $(\mu_1 - \mu_2) = 0$
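Here is the special-case formula as a Python sketch. It assumes H0: µ1 = µ2 by default and is applied to the red-lipstick data from question 3 purely as an illustration. `welch_t` is a hypothetical helper name.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2, diff0=0.0):
    """t without assuming equal variances:
    ((x̄1 - x̄2) - (mu1 - mu2)) / sqrt(s1²/n1 + s2²/n2), with diff0 = mu1 - mu2 under H0.
    Returns (t, standard error)."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2 - diff0) / se, se


# Red lipstick (n = 40) vs no lipstick (n = 70), data from question 3
t, se = welch_t(47.5, 4.3, 40, 62.0, 9.2, 70)
print(f"SE = {se:.3f}, t = {t:.2f}, df (smallest n - 1) = {min(40, 70) - 1}")
```

The df shown follows the pool's rule for this test (smallest n minus 1). Software packages often use a more precise approximation instead.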
Answers
Question
Natalia is a memory researcher and as part of her pilot study, she wishes to test the differences in
memory recall between severe anxiety patients and controls. She suspects that anxiety patients will
have different memory recall scores compared to controls. After a memory test, she compares the
scores of the groups. The anxiety group has a mean of 12.6 and a standard deviation of 3.38. The
control group has a mean of 13.4 and a standard deviation of 2.61. There are 70 participants in total,
equally divided into the 2 groups.
What can Natalia conclude about the null hypothesis?
83
A. The null hypothesis is not rejected with 0.10 ≤ p-value ≤ 0.15
B. The null hypothesis is rejected with 0.01 ≤ p-value ≤ 0.05
C. The null hypothesis is not rejected with 0.20 ≤ p-value ≤ 0.30
D. The null hypothesis is rejected with 0.02 ≤ p-value ≤ 0.025
Answer: C
5. T-tests
5E. T-tests
Question
Natalia is a memory researcher and as part of her
pilot study, she wishes to test the differences in
memory recall between severe anxiety patients
and controls. She suspects that anxiety patients
will have different memory recall scores
compared to controls. After a memory test, she
compares the scores of the groups. The control
group has a mean of 13.4 and a standard
deviation of 2.61. The anxiety group has a mean
of 12.6 and a standard deviation of 3.38. There
are 70 participants in total, equally divided into
the 2 groups.
What can Natalia conclude about the null
hypothesis?
H0: µ1 = µ2
Hα: µ1 ≠ µ2
n1 = n2 = 35
x̄1 = 13.4
x̄2 = 12.6
s1 = 2.61
s2 = 3.38
84
Solution
Ø Since equal variances can be assumed (see the
rule of thumb below), we need to calculate the
pooled standard deviation
$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}}$
$s_p = \sqrt{\frac{34 \times 2.61^2 + 34 \times 3.38^2}{34 + 34}} = 3.02$
Ø Next, we need to calculate the Tobs
$T = \frac{\bar{X}_1 - \bar{X}_2}{s_p \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
$T = \frac{13.4 - 12.6}{3.02 \times \sqrt{\frac{1}{35} + \frac{1}{35}}} = \frac{0.8}{3.02 \times 0.239} = 1.11$
Ø Using the t-table (df = n1 + n2 − 2 = 68), we see that the
one-tailed p-value is between 0.10 and 0.15. For a 2-
tailed test, we need to double these values
0.20 ≤ p-value ≤ 0.30
Bigger SD < Smallest SD × 2
3.38 < 2.61 × 2
3.38 < 5.22 (True)
→ Equal variances assumed
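The pooled-SD calculation can be sketched in Python as below (the helper name is hypothetical). It carries full precision, so intermediate values may differ slightly from the rounded hand calculation.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t-test assuming equal variances: returns (s_p, t_obs, df)."""
    sp = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / ((n1 - 1) + (n2 - 1)))
    se = sp * sqrt(1 / n1 + 1 / n2)
    return sp, (mean1 - mean2) / se, n1 + n2 - 2


sp, t, df = pooled_t(13.4, 2.61, 35, 12.6, 3.38, 35)   # control vs anxiety group
print(f"s_p = {sp:.2f}, t = {t:.2f}, df = {df}")
```

The p-value still has to be read from a t-table with df = 68, since the Python standard library has no t-distribution.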
Answers
Question
An ice cream company has two new potential flavours ready for the market. They developed a tastiness
scale scored from 0 to 30. 40 volunteers tasted flavour A and another 25 volunteers tasted flavour B.
The obtained values are: x̄A = 22.8, x̄B = 28, sA = 4.2 and sB = 1.9.
What is the 90% Confidence Interval corresponding to this t-test?
85
A. [-6.52, -3.88]
B. [-6.34, -4.59]
C. [-6.50, -4.00]
D. [-7.29, -3.91]
Answer: A
6. T-tests
6E. T-tests
Question
An ice cream company has two new potential
flavours ready for the market. They developed a
tastiness scale scored from 0 to 30. 40 volunteers
tasted flavour A and another 25 volunteers tasted
flavour B. The obtained values are: x̄A = 22.8,
x̄B = 28, sA = 4.2 and sB = 1.9.
What is the 90% Confidence Interval
corresponding to this t-test?
nA = 40
nB = 25
x̄A = 22.8
x̄B = 28
sA = 4.2
sB = 1.9
86
Solution
Ø We are dealing with 2 independent groups,
thus we should use an independent samples
t-test
Ø We have to decide if the assumption of equal
variances is violated, in order to use the
correct formulas
Smallest SD × 2 > Bigger SD
1.9 × 2 > 4.2
3.8 > 4.2 Not true
Ø The equal variances assumption is violated,
thus we use the special case of the t-test
Ø t* = 1.711 (df = smallest n − 1 = 24, 90% CI)
$(\bar{X}_A - \bar{X}_B) \pm t^* \times \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$
$22.8 - 28 \pm 1.711 \times \sqrt{\frac{4.2^2}{40} + \frac{1.9^2}{25}}$
$-5.2 \pm 1.711 \times \sqrt{0.5854}$
$-5.2 \pm 1.711 \times 0.77$
$-5.2 \pm 1.32$
[−6.52, −3.88]
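Here is a Python sketch of the same interval. t* has to come from a t-table, since the Python standard library has no t-distribution. Here it is the slide's 1.711 (df = 24, 0.05 in one tail). For a 95% interval with df = 24, the table value would be 2.064 instead. `welch_ci` is a hypothetical helper name.

```python
from math import sqrt


def welch_ci(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """CI for mu1 - mu2 without assuming equal variances (t_crit from a t-table)."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se


# Flavour A vs flavour B; t* = 1.711 for df = 24 with 0.05 in one tail
lo, hi = welch_ci(22.8, 4.2, 40, 28.0, 1.9, 25, t_crit=1.711)
print(f"[{lo:.2f}, {hi:.2f}]")
```

Without rounding the standard error to 0.77, the result is about [-6.51, -3.89]. The slide's [-6.52, -3.88] comes from that intermediate rounding.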
6E. T-tests
Confidence Interval: General Formula
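Observed statistic ± t* × Standard Error
Example: Two-Sample T-Test
N = 20 (both conditions), x̄A = −2.1, x̄B = −3.5, sA = 2.05 and sB = 1.89. What is the 95% CI?
$(\bar{X}_A - \bar{X}_B) \pm t^* \times \left(s_p \times \sqrt{\frac{1}{n_A} + \frac{1}{n_B}}\right)$
$s_p = \sqrt{\frac{19 \times 2.05^2 + 19 \times 1.89^2}{19 + 19}} = 1.97$
$1.4 \pm 2.04 \times \left(1.97 \times \sqrt{\frac{1}{20} + \frac{1}{20}}\right) = [0.13, 2.67]$
Standard Errors of The Different T-Tests
One-Sample T-test: $\frac{s}{\sqrt{n}}$
Paired Sample T-test: $\frac{s_d}{\sqrt{n}}$ ($s_d$ = SD of the difference scores)
Two-Sample T-test: $s_p \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
Pooled Standard Deviation: $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}}$
Two-Sample T-test, equal variance not assumed: $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$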
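The worked example translates directly into Python. This sketch assumes equal variances and takes t* = 2.04 from the example rather than computing it. `pooled_ci` is a hypothetical helper name.

```python
from math import sqrt


def pooled_ci(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """CI for mu1 - mu2 assuming equal variances: returns (s_p, lower, upper)."""
    sp = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / ((n1 - 1) + (n2 - 1)))
    se = sp * sqrt(1 / n1 + 1 / n2)       # standard error of the two-sample t-test
    diff = mean1 - mean2
    return sp, diff - t_crit * se, diff + t_crit * se


sp, lo, hi = pooled_ci(-2.1, 2.05, 20, -3.5, 1.89, 20, t_crit=2.04)
print(f"s_p = {sp:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```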
87
Answers
Question
Suppose we are testing the null hypothesis that the population mean is equal to a specific value and the
test is right sided. Refer to the SPSS output.
Which of the following statements is correct?
88
A. The null hypothesis is rejected for a significance level of 2.5%
B. The null hypothesis is not rejected for a significance level of 5%
C. The degrees of freedom were found by taking the smallest sample size and subtracting 1
D. None of the alternatives is correct
Answer: A
7. T-tests
Test Value = 570
Variable | t | df | Sig. (2-tailed) | Mean Difference | 95% CI Lower | 95% CI Upper
Test score | 2.139 | 29 | 0.041 | 20.333 | 0.89 | 39.77
7E. T-tests
Question
Suppose we are testing the null hypothesis that the population mean is equal to a specific value and the
test is right sided. Refer to the SPSS output.
Which of the following statements is correct?
89
Solution
A. Correct. The SPSS output gives the p-value for a two-sided test. However, we have a one-tailed test
(a right-sided test means that the alternative hypothesis has the (>) symbol). Since t = 2.139 is positive,
i.e., in the direction of Hα, we can divide the p-value by two (0.041/2 = 0.0205). The corrected p-value
is smaller than 0.025, thus H0 is rejected at α = 2.5%
B. Incorrect. The corrected p-value is smaller than 0.05 as well. Thus, H0 is rejected at α = 5% as
well.
C. Incorrect. Since we have a one-sample t-test, the degrees of freedom are N − 1. Taking the smallest n
and subtracting 1 is the rule for an independent samples t-test not assuming equal variances
D. Incorrect. Answer A is correct
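The SPSS numbers can be cross-checked in Python. Since t = mean difference / SE, the standard error is 20.333 / 2.139. The 95% CI then follows with t* = 2.045 (t-table, df = 29). The helper `one_sided_p` is a hypothetical name; it halves the two-sided p only when t points in the direction of Hα.

```python
def one_sided_p(p_two_sided, t_obs, right_sided=True):
    """Turn SPSS's two-sided p into a one-sided p (hypothetical helper).

    Halving is valid only when t_obs lies in the direction of Ha;
    otherwise the one-sided p is 1 - p/2.
    """
    in_direction = t_obs > 0 if right_sided else t_obs < 0
    return p_two_sided / 2 if in_direction else 1 - p_two_sided / 2


t_obs, mean_diff, p_two = 2.139, 20.333, 0.041   # values from the SPSS output
se = mean_diff / t_obs                            # t = mean difference / SE
t_crit = 2.045                                    # t-table, df = 29, 0.025 in each tail

print(f"one-sided p = {one_sided_p(p_two, t_obs):.4f}")
print(f"SE = {se:.3f}, 95% CI = [{mean_diff - t_crit * se:.2f}, {mean_diff + t_crit * se:.2f}]")
```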
Test Value = 570
Variable | t | df | Sig. (2-tailed) | Mean Difference | 95% CI Lower | 95% CI Upper
Test score | 2.139 | 29 | 0.041 | 20.333 | 0.89 | 39.77
Answers
Question
A researcher wants to test whether ethnic background influences IQ scores of Dutch primary school
children. They draw a sample of 50 children with grandparents of Turkish origin and another 50
children with Dutch grandparents. Each child of Turkish descent is matched for age and sex with a Dutch
child. The groups' data are summarized in the table below.
A paired sample t-test was used to test this hypothesis. Which of the following tests could have yielded
the same result?
90
Group | Mean | N | Std. Deviation | Std. Error Mean
Turkish | 98.657 | 50 | 10.0023 | 1.4145
Dutch | 103.203 | 50 | 14.5602 | 2.0591
A. An independent t-test, assuming equal population variances.
B. An independent t-test, assuming unequal population variances.
C. A one-sample t-test, conducted for the difference in IQ score between matched children.
D. None of the answers above.
Answer: C
8. T-tests
8E. T-tests
Question
A researcher wants to test whether ethnic background influences IQ scores of Dutch primary school
children. They draw a sample of 50 children with grandparents of Turkish origin and another 50
children with Dutch grandparents. Each child of Turkish descent is matched for age and sex with a Dutch
child. The groups' data are summarized in the table below.
A paired sample t-test was used to test this hypothesis. Which of the following tests could have yielded
the same result?
91
Solution
A. Incorrect. The two groups are matched, so they are dependent, not independent.
B. Incorrect. The two groups are matched, so they are dependent, not independent.
C. Correct. A paired samples t-test computes the difference score for each matched pair and tests whether
the mean difference equals 0. This is exactly a one-sample t-test on the difference scores (test value 0),
so both tests have the same calculations and give the same result.
D. Incorrect. The answer is C
Group | Mean | N | Std. Deviation | Std. Error Mean
Turkish | 98.657 | 50 | 10.0023 | 1.4145
Dutch | 103.203 | 50 | 14.5602 | 2.0591
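The equivalence in answer C can be demonstrated with a small Python sketch. The IQ scores below are hypothetical (six matched pairs, not the study's data). The paired test is computed as a one-sample t-test on the differences, and an independent t-test on the same numbers is shown for contrast.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical IQ scores for 6 matched pairs (illustrative only, not the slide's data)
turkish = [97, 101, 95, 104, 99, 92]
dutch = [100, 104, 96, 109, 101, 97]


def one_sample_t(values, mu0=0.0):
    """t = (mean - mu0) / (s / sqrt(n)), with s the sample SD."""
    return (mean(values) - mu0) / (stdev(values) / sqrt(len(values)))


def pooled_independent_t(x, y):
    """Two-sample t assuming equal variances (ignores the pairing)."""
    n1, n2 = len(x), len(y)
    sp = sqrt(((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2) / (n1 + n2 - 2))
    return (mean(x) - mean(y)) / (sp * sqrt(1 / n1 + 1 / n2))


diffs = [a - b for a, b in zip(turkish, dutch)]   # one difference score per matched pair
print(f"paired t (one-sample t on differences): {one_sample_t(diffs):.2f}, df = {len(diffs) - 1}")
print(f"independent t (pairing ignored):        {pooled_independent_t(turkish, dutch):.2f}")
```

Ignoring the pairing gives a much smaller |t| here. The large between-pair variation ends up in the standard error, whereas the difference scores remove it.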
Answers
Question
Inspect the given
output.
Which answer is
correct?
92
A. Levene's Test is not significant, therefore equal variances can be assumed.
B. The Tobs is equal to -2.845
C. According to the t-table, the null hypothesis is rejected
D. All answers are correct.
Answer: D
9. T-tests
[Figure: SPSS Independent Samples Test output]
9E. T-tests
Question
Inspect the given
output.
Which answer is
correct?
93
Solution
A. Correct. Levene's Test has the null hypothesis that the population variances are equal (H0: $\sigma_1^2 = \sigma_2^2$).
Since the p-value is a lot larger than 0.05 (p-value = 0.582), the null hypothesis is not rejected and there
is no violation of the equal variances assumption
B. Correct. We can calculate the Tobs by dividing the mean difference ($\bar{x}_1 - \bar{x}_2 = -14.00$) by the std.
error difference ($s_p \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 4.92$). This gives -2.845
C. Correct. The null hypothesis in this case is rejected because the value 0 is not located in the 95% CI,
meaning that 0 is not a plausible value for the population difference between the 2 groups
Answers
Question
Florian is the GM of Success Formula and has recently heard that colour can influence learning
performances and outcomes. He was informed that research has shown that the colour blue leads to
better performances in tests and better recall. The classes at SF however are painted in white. Florian
decides to test if indeed the colour blue leads to better results compared to white. He gathers 38
students and assigns them to 2 groups. The groups are matched with regard to skill, age,
motivation and more. One group takes the class in a room painted white, while the second group takes it in a
room painted blue. The mean test scores are then compared. The population distribution of
difference scores is normal.
Florian gets the following SPSS output. Which statement is correct?
94
10. T-tests
Paired Differences
Pair | Mean | Std. Deviation | Std. Error Mean | 95% CI Lower | 95% CI Upper | t | df | Sig. (2-tailed)
Pair 1: White − Blue | -.579 | 2.524 | .579 | -1.795 | .637 | -1.000 | 18 | .331
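The Paired Differences output can be reproduced from its own SD and df. Here is a Python sketch, assuming n = df + 1 = 19 pairs and t* = 2.101 from the t-table (df = 18).

```python
from math import sqrt

n_pairs = 19                   # df = 18, so n = df + 1 = 19 matched pairs
mean_d, sd_d = -0.579, 2.524   # Paired Differences: Mean and Std. Deviation
t_crit = 2.101                 # t-table, df = 18, 0.025 in each tail

se = sd_d / sqrt(n_pairs)      # Std. Error Mean of the differences
t_obs = mean_d / se            # paired t = mean difference / SE
lo, hi = mean_d - t_crit * se, mean_d + t_crit * se
print(f"SE = {se:.3f}, t = {t_obs:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
print("0 inside the CI: H0 not rejected" if lo <= 0 <= hi else "0 outside the CI: H0 rejected")
```

The CI bounds may differ from the SPSS output in the third decimal, most likely because the inputs here are rounded to three decimals.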
Answers
95
A. There is a probability of 0.331 that H0 is true
B. The researcher might be making a Type I error
C. The researcher might be making a Type II error
D. Since Tobs is not located within the 95% CI, the null hypothesis can be rejected
Answer: C
10. T-tests
10E. T-tests
96
Solution
A. Incorrect. The p-value is 0.331 and is defined as the probability of observing our data (or more extreme data) given that the null hypothesis is true. The p-value does not give the probability that H0 is true; it is a conditional probability that assumes H0 is true.
B. Incorrect. A Type I error is defined as rejecting a true null hypothesis. However, our p-value is larger than 0.05, so we did not reject the null hypothesis in the first place. The probability that we are making a Type I error in this case is 0%.
C. Correct. A Type II error is defined as not rejecting a false null hypothesis. Since the p-value is larger than our significance level, we did not reject H0, but there is always a chance that H0 is actually false, in which case we made a Type II error.
D. Incorrect. When using the CI to decide whether H0 is rejected in a paired samples t-test, we need to check whether the value 0 is located in the interval, not Tobs. This is because the null hypothesis states that there is no difference.
10E. T-tests
Type I Error
The null hypothesis is true but we reject it.
→ Measured by α
97
Graphical Illustration
Type II Error
The null hypothesis is false but we fail to reject it.
→ Measured by β
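The meaning of α and β can be made concrete with a small Monte Carlo simulation. This is a sketch under assumed settings that do not come from the question pool: a two-sided z-test of H0: µ = 0 with known σ = 1, n = 25, α = 0.05, and a true mean of 0.5 for the case where H0 is false. The function name `rejection_rate` is hypothetical.

```python
import random
import statistics


def rejection_rate(true_mu, n=25, sigma=1.0, z_crit=1.96, n_sims=10_000, seed=1):
    """Share of simulated two-sided z-tests of H0: mu = 0 that reject H0 (alpha = 0.05)."""
    rng = random.Random(seed)
    se = sigma / n ** 0.5                     # known-sigma standard error of the mean
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z = statistics.fmean(sample) / se     # test statistic for H0: mu = 0
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


alpha_hat = rejection_rate(true_mu=0.0)      # H0 true: every rejection is a Type I error
beta_hat = 1 - rejection_rate(true_mu=0.5)   # H0 false: every non-rejection is a Type II error
print(f"estimated alpha = {alpha_hat:.3f}, estimated beta = {beta_hat:.3f}")
```

The estimated α should land close to 0.05, while β depends on the assumed true effect and sample size.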
Stats I – Question Pool
ANOVA
98
Answers
Question
ANOVA assumes the following statistical model: 𝑌𝑖𝑗 = 𝜇 + 𝛼𝑖 + 𝜀𝑖𝑗, in which Yij denotes the score of person j in group i.
Choose the incorrect statement from below:
99
A. µ1= Yij - 𝜀𝑖𝑗 represents the mean of group 1
B. εij has a different value for each individual participant, regardless of treatment effects.
C. µ is a variable effect, specific to each participant.
D. If there is no treatment effect, αi is equal among all participants.
Answer: C
1. ANOVA
1E. ANOVA
100
Solution
A. Correct. The difference between an individual score and the group mean reflects the unexplained variation caused by factors that are not controlled. It can be written as $\varepsilon_{ij} = Y_{ij} - \mu_i \Leftrightarrow \mu_i = Y_{ij} - \varepsilon_{ij}$
B. Correct. Individual differences are uncontrollable factors that make the scores of participants within the same group diverge. For each participant, regardless of the treatment effects, the individual differences/residual factors are different.
C. Incorrect. µ is a constant effect. It refers to the factors that are the same in all conditions, so it stays the same for each subject.
D. Correct. If there is no treatment effect, $\alpha_i = 0$ for all groups (and thus for all participants).
1E. ANOVA
Main Formula
$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
Participant j in group i = constant effect + effect of group i + effect of remaining factors of participant j in group i (error)
101
Sum of Squares
$\sum_{i,j}(Y_{ij} - \bar{Y})^2 = \sum_i n_i(\bar{Y}_i - \bar{Y})^2 + \sum_{i,j}(Y_{ij} - \bar{Y}_i)^2$
Total sum of squares (TSS) = between group sum of squares (SSG) + within group sum of squares (SSE)
1E. ANOVA
Example
3 different conditions with 3 participants each:
Participant G1 G2 G3
P1 1 3 5
P2 2 4 6
P3 3 5 7
Preparation
What is the mean of each group?
$\bar{Y}_1 = (1+2+3)/3 = 2$
$\bar{Y}_2 = (3+4+5)/3 = 4$
$\bar{Y}_3 = (5+6+7)/3 = 6$
What is the total mean?
$\bar{Y} = (2+4+6)/3 = 4$ (averaging the group means works here because all groups have the same size)
SSG (Between Groups)
$SSG = \sum_i n_i(\bar{Y}_i - \bar{Y})^2$
SSG = 3*(2-4)²+3*(4-4)²+3*(6-4)²
SSG = 24
Tip: Alternative notation $\alpha_i = \mu_i - \mu$. Here $\mu_i = \bar{Y}_i$ (mean of a single group) and $\mu = \bar{Y}$ (total mean).
SSW (Within Groups, also written SSE)
$SSW = \sum_{i,j}(Y_{ij} - \bar{Y}_i)^2$
SSW = (1-2)²+(2-2)²+(3-2)²+(3-4)²…+(7-6)²
SSW = 6
Tip: Alternative notation $\varepsilon_{ij} = Y_{ij} - \mu_i$. Here $\mu_i$ is the same as $\bar{Y}_i$; both describe the mean of a single group.
102
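The decomposition in this example can be checked with Python's standard library. A minimal sketch using the 3 × 3 table of scores (G1 to G3); the variable names are illustrative. It confirms that the total sum of squares splits into SSG + SSW.

```python
from statistics import fmean

# Scores from the example table: 3 conditions with 3 participants each
groups = {"G1": [1, 2, 3], "G2": [3, 4, 5], "G3": [5, 6, 7]}

all_scores = [y for scores in groups.values() for y in scores]
grand_mean = fmean(all_scores)
group_means = {g: fmean(s) for g, s in groups.items()}

# SSG: group size times squared distance of each group mean from the grand mean
ssg = sum(len(s) * (group_means[g] - grand_mean) ** 2 for g, s in groups.items())
# SSW (= SSE): squared distance of each score from its own group mean
ssw = sum((y - group_means[g]) ** 2 for g, s in groups.items() for y in s)
# TSS: squared distance of each score from the grand mean
tss = sum((y - grand_mean) ** 2 for y in all_scores)

print(group_means, grand_mean)
print(ssg, ssw, tss)   # TSS = SSG + SSW
```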
Answers
Question
Participants were asked to memorise a list of words. They were divided into several groups, each using a
different memorization technique. 60 minutes later, the experimenter assessed how many words they
could still remember (the dependent variable RECALL in the output). Which statement is correct?
103
A. The experimental setting had 3 conditions.
B. The total variance equals 4.91
C. The ANOVA test is significant (𝛂= 5%).
D. All answers are correct.
Answer: D
2. ANOVA
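ANOVA output (dependent variable RECALL):
Sum of Squares Mean Square
Between Groups 41.566 20.783
Within Groups 41.850 2.790
Total 83.416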
2E. ANOVA
104
Question Solution
A. Correct. The degrees of freedom between groups are given by the formula $k - 1$ (number of groups minus 1). In the output, the between-groups degrees of freedom equal 2 (41.566/20.783 = 2), so $k - 1 = 2$ and the experiment had 3 conditions.
B. Correct. The total variance can be found with the formula $MS_{total} = \frac{SST}{N - 1} = \frac{83.416}{17} = 4.91$
C. Correct. The ANOVA SPSS output has a p-value of 0.006 for F = 7.447. The p-value is smaller than the significance level of 5%, so the test is significant.
D. Correct, they are all correct.
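The reasoning in statements A and B can be retraced from the output numbers. A sketch, assuming the five listed values are the Sum of Squares (Between, Within, Total) and Mean Square (Between, Within) columns of the RECALL output; small differences from the reported F = 7.447 may come from rounding of the displayed mean squares.

```python
# Sum of Squares and Mean Square columns from the RECALL output
ss_between, ss_within, ss_total = 41.566, 41.850, 83.416
ms_between, ms_within = 20.783, 2.790

df_between = round(ss_between / ms_between)   # k - 1
df_within = round(ss_within / ms_within)       # N - k
df_total = df_between + df_within              # N - 1

n_conditions = df_between + 1
n_participants = df_total + 1
total_variance = ss_total / df_total           # MS_total = SST / (N - 1)
f_value = ms_between / ms_within               # F = MS_between / MS_within

print(n_conditions, n_participants, round(total_variance, 2), round(f_value, 3))
```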
Answers
Question
A sample of n = 35 participants was randomly selected from the UM student pool. A baseline assessment
rated their arachnophobia. After undergoing 2 sessions of exposure therapy (to spiders), their
arachnophobia was measured again with the same scale. The researcher wants to see if the 2 sessions of
exposure therapy had a significant effect.
Should an ANOVA test be performed on this data set?
105
A. Yes, the normality assumption holds since the sample size is big enough.
B. Yes, the equal variances assumption is met because 35 participants were tested both times.
C. No, the independence assumption is violated.
D. Yes, the data are quantitative as phobia is rated on a scale.
Answer: C
3. ANOVA
3E. ANOVA
Answers
106
Solution
A. Incorrect. The sample may be large enough for the normality assumption, but the main criterion for an ANOVA, independent groups, is violated. Thus, an ANOVA is not the suitable test here.
B. Incorrect. The same sample is tested twice (baseline and after exposure), so we are not comparing independent groups.
C. Correct. The same sample is tested twice (baseline and after exposure), so we are not comparing independent groups.
D. Incorrect. The data are quantitative, but the main criterion for an ANOVA, independent groups, is violated. Thus, an ANOVA is not the suitable test here.
Answers
Question
An experiment on the effect of listening to music on information retention is performed. A total sample of 75 is divided into three equally large groups. All three groups are asked to memorize a list of words while either (a) listening to Vivaldi, (b) listening to AC/DC, or (c) listening to crickets singing.
An analysis of variance is performed. It is concluded that the null hypothesis cannot be rejected.
Which statement is correct?
107
A. MSG and MSE are both unbiased estimators of the error variance.
B. Since the null hypothesis is true, the difference between groups is as large as the difference within groups.
C. There is no group effect.
D. All are correct
Answer: D
4. ANOVA
4E. ANOVA
108
Solution
A. Correct. When H0 is true, the differences between groups are caused only by uncontrolled factors (error). This means that MSG is an unbiased estimator of the error variance. MSE is an unbiased estimator of the error variance in any case.
B. Correct. The difference between groups is measured by MSG, while the difference within groups is measured by MSE. If the null hypothesis is true, both MSE and MSG are unbiased estimators of the error variance, thus MSE ≈ MSG.
C. Correct. The H0 for ANOVA states that the means of all groups are equal, meaning that there is no treatment effect.
D. Correct.
4E. ANOVA
109
Unbiased Estimator
• MSE is an unbiased estimator of the error variance.
Pooled Variance
• Since we already assume that all populations have equal variance, we can take the weighted average of the group variance estimates:
$S_p^2 = \frac{(N_1 - 1)S_1^2 + (N_2 - 1)S_2^2 + \dots + (N_k - 1)S_k^2}{(N_1 - 1) + (N_2 - 1) + \dots + (N_k - 1)}$
Conclusion
• $MSE = S_p^2$
• Accurate and efficient error estimate.
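The identity MSE = S_p² can be checked numerically. A minimal sketch using the groups G1 to G3 from the ANOVA example slide plus one hypothetical set of unequal-sized groups; `pooled_variance` and `mse` are illustrative helper names.

```python
from statistics import variance


def pooled_variance(groups):
    """Weighted average of the sample variances, with weights n_i - 1."""
    numerator = sum((len(g) - 1) * variance(g) for g in groups)
    denominator = sum(len(g) - 1 for g in groups)
    return numerator / denominator


def mse(groups):
    """MSE = SSE / (N - k), with SSE the within-group sum of squares."""
    sse = sum(sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups)
    n_total = sum(len(g) for g in groups)
    return sse / (n_total - len(groups))


example = [[1, 2, 3], [3, 4, 5], [5, 6, 7]]        # G1, G2, G3 from the example slide
unequal = [[2, 4, 9], [1, 5], [7, 8, 12, 3]]       # hypothetical unequal-sized groups
print(pooled_variance(example), mse(example))
print(pooled_variance(unequal), mse(unequal))
```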
4E. ANOVA
Random Variables
MSG and MSE count as random variables.
MSE and MSG as Estimators of Error Variance
If there is no group effect ($H_0$ true), MSE as well as MSG are unbiased estimators of the error variance.
Relation of MSE and MSG
MSE is the error (or noise).
MSG is the error + the effect of the group.
If $H_0$ is true and there is no effect of the group, MSE and MSG will be approximately equal. Another way to phrase this: the difference between groups is as large as the difference within groups.
110
Answers
Question
Synesthesia is a perceptual phenomenon in which 2 sensory/cognitive pathways are experienced together. Synesthesia has been linked to enhanced memory skills due to the increased number of associations available. Anton wonders if there is a difference in memory recall between different synesthesia types. He gathers 120 participants, and within his sample there are 4 different synesthesia types. Each group has an equal number of participants. After a memorization period, Anton gives his participants a memory test. Following an ANOVA, SSG = 167.91 and SSE = 1760.88.
What can be concluded?
111
A. H0 not rejected with p-value > 0.05
B. H0 rejected with 0.025 ≤ 𝑝 − 𝑣𝑎𝑙𝑢𝑒 ≤ 0.05
C. H0 rejected with 0.01 ≤ 𝑝 − 𝑣𝑎𝑙𝑢𝑒 ≤ 0.025
D. H0 not rejected because Fobs < Fcritical
Answer: C
5. ANOVA
5E. ANOVA
112
Solution
• Calculate the degrees of freedom
$df(G) = k - 1 = 4 - 1 = 3$
$df(E) = N - k = 120 - 4 = 116$
• Calculate the Mean Squares
$MS(G) = \frac{SSG}{df(G)} = \frac{167.91}{3} = 55.97$
$MS(E) = \frac{SSE}{df(E)} = \frac{1760.88}{116} = 15.18$
• Calculate the F-value
$F = \frac{MS(G)}{MS(E)} = \frac{55.97}{15.18} = 3.687$
• The F-table shows that for α = 0.05, $F_c(3, 116) = 2.70$. Since $F_{obs} > F_c$, the null hypothesis is rejected.
• For α = 0.01, the F-table gives $F_c = 3.98$, which is larger than $F_{obs}$. Since $F_{obs}$ does exceed the critical value for α = 0.025, the p-value lies between these two levels:
$0.01 \le p\text{-value} \le 0.025$
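The calculation steps can be reproduced in Python. A sketch: the critical values 2.70 and 3.98 are the F-table lookups quoted in the solution, not computed here, and `anova_from_sums_of_squares` is a hypothetical helper name.

```python
def anova_from_sums_of_squares(ssg, sse, n_total, k):
    """Degrees of freedom, mean squares and F for a one-way ANOVA."""
    df_g = k - 1                 # df(G) = k - 1
    df_e = n_total - k           # df(E) = N - k
    ms_g = ssg / df_g            # MS(G) = SSG / df(G)
    ms_e = sse / df_e            # MS(E) = SSE / df(E)
    return df_g, df_e, ms_g, ms_e, ms_g / ms_e


df_g, df_e, ms_g, ms_e, f_obs = anova_from_sums_of_squares(167.91, 1760.88, n_total=120, k=4)

# Critical values as read from the F-table in the solution
f_critical = {0.05: 2.70, 0.01: 3.98}
for alpha, f_c in f_critical.items():
    decision = "reject H0" if f_obs > f_c else "do not reject H0"
    print(f"alpha = {alpha}: F_obs = {f_obs:.3f} vs F_c = {f_c} -> {decision}")
```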
Answers
Question
Based on the ANOVA output, which of the following statements is correct?
113
A. The scores on the dependent variable likely vary due to residual effects only.
B. The scores on the dependent variable likely vary due to residual effects and group effect.
C. The scores on the dependent variable likely vary due to group effect only
D. The scores on the dependent variable likely vary due to neither residual effects nor the group effect.
Answer: B
6. ANOVA
Sum of Squares df Mean Square F Sig
Between Groups 126 1 126 4.4834 ?
Within Groups 1630 58 28.1034
Total 1756 59
6E. ANOVA
114
Solution
• Using the F-table, we can see that for α = 0.05, $F_c(1, 58) = 4.03$
• Fobs = 4.4834 is bigger than Fc, meaning that the null hypothesis is rejected
• There is an overall treatment effect, so not all group means are the same
• However, error cannot be controlled for, so it is always present
Scores likely vary due to the treatment/group effect AND error/residual factors
Answers
Question
Maja conducted a study with 5 conditions and 30 participants in total have been recruited.
Choose the correct statement:
115
A. F = 21.801, not significant
B. F = 17.474, not significant
C. F = 19.625, significant
D. F = 18.926, significant
Answer: D
7. ANOVA
SPSS ANOVA output with most cells left blank (?); given: Sum of Squares Within Groups = 2244.500 and Total = 9041.367
7E. ANOVA
116
Solution
1) Calculate SS(G):
$SST = SSG + SSE$
$SSG = SST - SSE = 9041.367 - 2244.5 = 6796.867$
2) Calculate the degrees of freedom:
$df(G) = k - 1 = 5 - 1 = 4$
$df(E) = N - k = 30 - 5 = 25$
$df(T) = N - 1 = 30 - 1 = 29$
3) Calculate the mean squares:
$MS(G) = \frac{SSG}{df(G)} = \frac{6796.867}{4} = 1699.217$
$MS(E) = \frac{SSE}{df(E)} = \frac{2244.5}{25} = 89.780$
4) Calculate the F-value:
$F = \frac{MSG}{MSE} = \frac{1699.217}{89.780} = 18.926$
5) Use the F-table to reach your decision:
$F_c(4, 25) = 2.76 \Rightarrow F_{obs} > F_c \Rightarrow$ significant
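The five steps can be bundled into one function that fills in the missing cells of the table. A sketch; the critical value 2.76 is the F-table lookup quoted above, and `complete_anova_table` is a hypothetical helper name.

```python
def complete_anova_table(sse, sst, k, n_total):
    """Fill in a one-way ANOVA table from SSE, SST, the number of groups k and N."""
    ssg = sst - sse                                     # SST = SSG + SSE
    df_g, df_e, df_t = k - 1, n_total - k, n_total - 1
    ms_g = ssg / df_g
    ms_e = sse / df_e
    return {"SSG": ssg, "df(G)": df_g, "df(E)": df_e, "df(T)": df_t,
            "MS(G)": ms_g, "MS(E)": ms_e, "F": ms_g / ms_e}


table = complete_anova_table(sse=2244.5, sst=9041.367, k=5, n_total=30)
for name, value in table.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")

f_critical = 2.76   # F-table value for df = (4, 25) at alpha = 0.05
print("significant" if table["F"] > f_critical else "not significant")
```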
Answers
Question
Micheal is a sports enthusiast. He wants to investigate which form of exercise leads to better concentration. He recruits 75 participants and randomly assigns them to 3 groups (cardio, weights, crossfit). He later measures their concentration levels and compares the group means.
Given that Micheal ended up rejecting the null hypothesis, which of the following is correct?
117
A. There is no difference in concentration levels between groups
B. Micheal can confidently say that cardio is better than weights
C. Micheal needs an extra statistical analysis
D. There is no treatment effect
Answer: C
8. ANOVA
8E. ANOVA
118
Solution
A. Incorrect. The null hypothesis states that all group means are the same (no treatment effect). By rejecting the null hypothesis, we conclude that not all group means are the same.
B. Incorrect. By rejecting the null hypothesis, we know that not all group means are the same, but we do not know exactly where the difference lies (i.e., between which groups).
C. Correct. To uncover the exact nature of the group difference, we need to conduct multiple comparisons.
D. Incorrect. The null hypothesis was rejected, so there is a treatment effect.
Answers
Question
Micheal conducted multiple comparisons to examine the differences between groups. What can be concluded based on the SPSS output?
119
9. ANOVA
Dependent Variable: Concentration scores
LSD
(I) Group (J) Group Mean Difference Std. Error Sig. 95% CI Lower Bound 95% CI Upper Bound
Cardio Weights 0.1762 0.5102 0.730 -.8338 1.1861
Cardio Crossfit 1.4606 0.5470 0.009 .3778 2.5435
Weights Cardio -.1762 0.5102 0.730 -1.1861 .8338
Weights Crossfit 1.2844 0.5696 0.026 .1569 2.4119
Crossfit Cardio -1.4606 0.5470 0.009 -2.5435 -.3778
Crossfit Weights -1.2844 0.5696 0.026 -2.4119 -.1569
Answers
120
A. There are 2 statistically significant comparisons
B. There is 1 statistically significant comparison
C. All three comparisons are statistically significant
D. None of the comparisons reaches significance
Answer: B
9. ANOVA
9E. ANOVA
121
Family-wise Type I error
In multiple comparisons, the α of each comparison accumulates (for m comparisons, the family-wise error rate is at most m × α). Hence, the chance of making at least one Type I error increases.
Solution
• While the output does show 2 comparisons that reach significance (cardio-crossfit, weights-crossfit), no Bonferroni correction has been applied for the family-wise Type I error.
• After applying the Bonferroni correction (multiplying each p-value by the number of comparisons, here 3), only the comparison between cardio and crossfit remains significant (0.009 × 3 = 0.027 < 0.05, whereas 0.026 × 3 = 0.078 > 0.05).
Bonferroni Correction
1. Multiply the p-value by the number of comparisons
or
2. Divide the significance level by the number of comparisons
Number of comparisons: $k(k-1)/2$
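The Bonferroni step can be expressed in a few lines. A sketch that multiplies each unique LSD p-value by k(k−1)/2 = 3 comparisons and caps adjusted values at 1; the dictionary keys are illustrative labels for the three comparisons.

```python
def bonferroni(p_values, n_groups, alpha=0.05):
    """Bonferroni-adjust pairwise p-values: multiply by the number of comparisons."""
    m = n_groups * (n_groups - 1) // 2           # number of comparisons k(k-1)/2
    adjusted = {}
    for pair, p in p_values.items():
        p_adj = min(1.0, p * m)                  # adjusted p-values are capped at 1
        adjusted[pair] = (p_adj, p_adj < alpha)
    return adjusted


# Unique pairwise LSD p-values from Micheal's output (each comparison appears twice there)
lsd_p = {"cardio-weights": 0.730, "cardio-crossfit": 0.009, "weights-crossfit": 0.026}
result = bonferroni(lsd_p, n_groups=3)
for pair, (p_adj, significant) in result.items():
    print(f"{pair}: adjusted p = {p_adj:.3f}, significant = {significant}")
```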
Answers
Question
Given that the groups have equal sample sizes and the following output, which statement is correct?
122
A. The normality assumption was violated, so the test should not have been done
B. An independent samples t-test could be done instead of ANOVA
C. MSE is smaller than MSG, hence the treatment effects are significant
D. If the test is significant, multiple comparisons are the necessary next step
Answer: B
10. ANOVA
Sum of Squares df Mean Square F Sig
Between Groups 126 1 126 4.4834 ?
Within Groups 1630 58 28.1034
Total 1756 59
10E. ANOVA
123
Solution
A. Incorrect. The sample size is 60 (N − 1 = 59 → N = 60). Given that each group has 30 participants, the CLT can be applied, so the test is robust against a violation of normality.
B. Correct. Since we have just 2 groups, an independent samples t-test would be equivalent to this ANOVA.
C. Incorrect. MSE is indeed smaller than MSG, so F is bigger than 1, but we always have to rely on the p-value, which tells us whether the result is actually significant.
D. Incorrect. Since we only have two groups, a significant test immediately tells us between which groups there is a difference, so multiple comparisons are not necessary. However, if we want to look at the difference in more detail, we can still run them.
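Statement B (with only 2 groups, ANOVA and the independent samples t-test are equivalent) can be checked numerically: the pooled-variance t statistic squared equals the one-way ANOVA F. A sketch with small hypothetical data sets that are not from the question pool; the helper names are illustrative. For the output above, |t| = √F = √(MSG/MSE).

```python
from statistics import fmean, variance


def pooled_t(a, b):
    """Independent samples t statistic with the pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def one_way_f(*groups):
    """One-way ANOVA F = MSG / MSE for any number of groups."""
    scores = [y for g in groups for y in g]
    grand_mean = fmean(scores)
    ssg = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    k, n = len(groups), len(scores)
    return (ssg / (k - 1)) / (sse / (n - k))


# Hypothetical scores for two groups
group_a = [12, 15, 11, 14, 13, 16]
group_b = [10, 9, 13, 11, 8, 12]

t = pooled_t(group_a, group_b)
f = one_way_f(group_a, group_b)
print(f"t^2 = {t ** 2:.4f}, F = {f:.4f}")   # the two values agree

# For the output above: |t| = sqrt(MSG / MSE)
t_from_table = (126 / (1630 / 58)) ** 0.5
print(f"|t| implied by the output = {t_from_table:.3f}")
```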
Stats1 – Question Pool
Proportions, Entire Distributions
Answers
Question
Florian is the new general manager at Success Formula, replacing Michalina. Success Formula offers courses in Psychology, Business Economics and Law. While Michalina was GM, 60% of the student population at SF attended Business Economics courses, 25% Psychology courses and 15% Law courses. After an intense marketing campaign, Florian believes that this year things will be different.
In a simple random sample of 275 students, 145 chose B/E courses, 75 chose psychology and 55 chose law. Based on the data, Florian wants to test whether the population distribution of field choice has changed or is still the same as during Michalina’s reign as GM.
Does the result from the sample give sufficient evidence?
125
A. No, the null hypothesis is not rejected with an observed test statistic of 1.23
B. Yes, the null hypothesis is rejected with an observed test statistic of 7.57
C. No, the null hypothesis is not rejected with an observed test statistic of 2.50
D. Yes, the null hypothesis is rejected with an observed test statistic of 9.93
Answer: B
1. Proportions and Entire Distributions
1E. Proportions and Entire Distributions
126
Solution
• We have only 1 categorical variable (field of study) with more than 2 levels (3)
• We want to see how well the sample distribution fits a specific model
• We have to use the $\chi^2$ Goodness of Fit Test
1E. Proportions and Entire Distributions
Data
Model: BE(60%)-Psy(25%)-Law(15%)
N = 275
H0: The distribution within the sample fits the model
Ha: The distribution within the sample does not fit the model
127
Solution
• Calculate the expected counts [$EC = N \times P(e)$]
  • B/E: 275 x 0.6 = 165
  • Psy: 275 x 0.25 = 68.75
  • Law: 275 x 0.15 = 41.25
• Calculate the chi-square
$\chi^2 = \sum \frac{(OC - EC)^2}{EC}$
$\chi^2 = \frac{(145 - 165)^2}{165} + \frac{(75 - 68.75)^2}{68.75} + \frac{(55 - 41.25)^2}{41.25}$
$\chi^2 = 2.42 + 0.57 + 4.58$
$\chi^2 = 7.57$
• Check the $\chi^2$ table for the p-value (df = 3 − 1 = 2)
$0.02 \le p\text{-value} \le 0.025$
The p-value is lower than 0.05, so H0 (the distribution within the sample fits the model) is rejected.
Students
Business/Economics 145
Psychology 75
Law 55
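The goodness-of-fit calculation can be reproduced with the standard library. A sketch: it relies on the fact that for exactly df = 2 the χ² upper-tail probability equals exp(−χ²/2), so it is not a general χ² p-value routine. The helper name `chi_square_gof` is hypothetical.

```python
import math


def chi_square_gof(observed, proportions):
    """Expected counts (N x p) and the goodness-of-fit chi-square statistic."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2


observed = [145, 75, 55]        # B/E, Psychology, Law in Florian's sample
model = [0.60, 0.25, 0.15]      # proportions during Michalina's time as GM
expected, chi2 = chi_square_gof(observed, model)
df = len(observed) - 1          # number of cells - 1
p_value = math.exp(-chi2 / 2)   # upper-tail probability; this closed form holds only for df == 2
print(f"expected = {expected}, chi2 = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
```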
1E. Proportions and Entire Distributions
When to Use
Data type: categorical data
→ Check how well a proposed proportion distribution fits an observed one.
$H_0$: The distribution within the sample fits our expectation
$H_a$: The distribution within the sample does not fit our expectation
Formula
$\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp}$
Assumptions
• Categorical data
• Expected counts > 5
EC = N*p(e)
Degrees of Freedom
df = # of cells – 1
Example (nationality of class): Dutch 0.2, German 0.5, Belgian 0.2, French 0.1
df = 4 - 1 = 3
128
Answers
Question
Andreia has been researching the effectiveness of dialectical behavior therapy (DBT), a type of cognitive behavioural therapy, for developing healthy ways of coping with stress and regulating emotions. She wonders whether DBT is equally effective for different populations. She decides to take two samples, one of people with eating disorders and one of people with substance use disorders. After several sessions, Andreia and her team note for each subject whether there was improvement or not. Andreia is the first researcher to conduct such a study, so she does not know how the different disorders might affect improvement.
What can be concluded?
129
2. Proportions and Entire Distributions
Improvement
Disorder Yes No Total
Eating Disorders 148 112 260
Substance Use Disorders 173 102 275
Total 321 214 535
Answers
130
A. The null hypothesis is not rejected with an observed test statistic of 0.98
B. The null hypothesis is rejected with an observed test statistic of 1.36
C. The null hypothesis is not rejected with an observed test statistic of -1.41
D. The null hypothesis is rejected with an observed test statistic of -2.71
Answer: C
2. Proportions and Entire Distributions
2E. Proportions and Entire Distributions
Data
• We now compare 2 independent samples
• The dependent variable is dichotomous
• We have to use a 2 proportion z-test
$H_0: \pi_1 = \pi_2$
$H_a: \pi_1 \neq \pi_2$
$p_1 = \frac{x_1}{n_1} = \frac{148}{260} = 0.57$
$p_2 = \frac{x_2}{n_2} = \frac{173}{275} = 0.63$
$\pi = \frac{x_1 + x_2}{n_1 + n_2} = \frac{148 + 173}{260 + 275} = 0.6$
131
Solution
• Calculate Z (p1 and p2 are kept to 4 decimals here, since rounding them to 0.57 and 0.63 distorts Z)
$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\pi(1 - \pi)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
$Z = \frac{0.5692 - 0.6291}{\sqrt{0.6(1 - 0.6)\left(\frac{1}{260} + \frac{1}{275}\right)}}$
$Z = \frac{-0.0599}{0.4899 \times 0.0865} = -1.41$
• Look up the p-value in the Z-table
P-value(z = -1.41) = 0.0793
• Double the p-value since it is a two-tailed test
2 x 0.0793 = 0.1586 > 0.05
The null hypothesis cannot be rejected.
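A sketch of the two-proportion z-test with the pooled standard error, using only the standard library. It keeps full precision for p1 and p2, so Z comes out at about −1.41, which can differ slightly from a hand calculation with rounded intermediate values. The two-sided p-value uses the identity P(|Z| > z) = erfc(z/√2); the helper name is hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-proportion z-test of H0: pi1 = pi2 with the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))   # P(|Z| > |z|) for a standard normal
    return z, p_two_sided


z, p = two_proportion_z(148, 260, 173, 275)   # eating vs substance use disorders
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```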
2E. Proportions and Entire Distributions
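When to Use
Comparing the proportions of two groups (categorical data).
$H_0: p_1 = p_2$
$H_a: p_1 \neq p_2$ (two-sided)
$H_a: p_1 < p_2$ or $H_a: p_1 > p_2$ (one-sided)
Assumptions:
• Categorical variables
→ dichotomous
• Independent groups
• Normality
- always violated by the dichotomous variable itself
- Central Limit Theorem: for large samples, the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal
Formulas and Application
Z-score = $\frac{\text{estimate} - \text{hypothesized value}}{SE} = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE}$
Estimate: $\hat{p}_1 - \hat{p}_2$
SE (for z-test): $\sqrt{\frac{\hat{p}(1 - \hat{p})}{n_1} + \frac{\hat{p}(1 - \hat{p})}{n_2}}$, with $\hat{p}$ the pooled proportion
Confidence Interval
$\hat{p}_1 - \hat{p}_2 \pm Z_c\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$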
132
Answers
Question
Refer back to the previous question. What is the 95% confidence interval?
133
A. [0.063, 0.015]
B. [-0.143, 0.023]
C. [-0.053, 0.090]
D. [1.678, 3.683]
Answer: B
3. Proportions and Entire Distributions
3E. Proportions and Entire Distributions
134
Solution
$p_1 - p_2 \pm Z_c\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
$0.57 - 0.63 \pm 1.96\sqrt{\frac{0.57 \times 0.43}{260} + \frac{0.63 \times 0.37}{275}}$
$-0.06 \pm 1.96 \times 0.0423 = -0.06 \pm 0.083$
[-0.143, 0.023]
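A sketch of the confidence interval computation with unpooled standard errors, as in the CI formula for two proportions; `z_crit = 1.96` corresponds to 95% confidence and the helper name is hypothetical. It computes the interval with full precision for p1 and p2.

```python
import math


def diff_proportion_ci(x1, n1, x2, n2, z_crit=1.96):
    """CI for pi1 - pi2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se


lower, upper = diff_proportion_ci(148, 260, 173, 275)
print(f"95% CI = [{lower:.3f}, {upper:.3f}]")   # 0 inside -> consistent with not rejecting H0
```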
Answers
Question
Nik wants to see if there is an association between the presence of neuroscientific evidence (1 = no, 2 = yes) and juror verdicts (not guilty = 1, not guilty due to insanity = 2, guilty = 3).
What can be concluded based on the table?
135
4. Proportion and Entire Distribution
Neuroscientific Evidence
Verdict No Yes Total
Not Guilty 32 29 61
Not Guilty due to Insanity 55 61 116
Guilty 10 13 23
Total 97 103 200
Answers
136
A. The null hypothesis is not rejected with an observed test statistic of 0.67
B. The null hypothesis is rejected with an observed test statistic of 1.30
C. The null hypothesis is not rejected with an observed test statistic of 0.20
D. The null hypothesis is rejected with an observed test statistic of 0.65
Answer: A
4. Proportion and Entire Distribution
4E. Proportion and Entire Distribution
Data
• We want to study the relationship between two categorical variables
• We use a contingency table
• We use the chi-square test for contingency tables
Expected Counts:
$EC = \frac{\text{row total} \times \text{column total}}{N}$
137
Solution
• Calculate the chi-square
$\chi^2 = \sum \frac{(OC - EC)^2}{EC}$
$\chi^2 = \frac{(32 - 29.585)^2}{29.585} + \frac{(55 - 56.26)^2}{56.26} + \frac{(10 - 11.155)^2}{11.155} + \frac{(29 - 31.415)^2}{31.415} + \frac{(61 - 59.740)^2}{59.740} + \frac{(13 - 11.845)^2}{11.845}$
$\chi^2 = 0.197 + 0.028 + 0.119 + 0.186 + 0.026 + 0.113$
$\chi^2 = 0.669 \approx 0.67$
• Calculate df
$df = (\#rows - 1)(\#columns - 1) = (3 - 1)(2 - 1) = 2$
• Check the p-value
The p-value is greater than 0.25, so the null hypothesis cannot be rejected.
Observed (expected) counts:
Verdict No Yes
Not Guilty 32 (29.585) 29 (31.415)
Not Guilty due to Insanity 55 (56.26) 61 (59.740)
Guilty 10 (11.155) 13 (11.845)
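The contingency-table test can be sketched in the same spirit: expected counts from row and column totals, then the χ² sum. The p-value uses the closed form exp(−χ²/2), which holds only because df = 2 here; the helper name is hypothetical.

```python
import math


def chi_square_independence(table):
    """Chi-square statistic and df for an r x c contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # EC = row total x column total / N
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: not guilty, not guilty due to insanity, guilty; columns: evidence no / yes
verdicts = [[32, 29], [55, 61], [10, 13]]
chi2, df = chi_square_independence(verdicts)
p_value = math.exp(-chi2 / 2)   # closed form valid only because df == 2
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
```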
169 views5 Folien

Último(20)

Classification of crude drugs.pptx von GayatriPatra14
Classification of crude drugs.pptxClassification of crude drugs.pptx
Classification of crude drugs.pptx
GayatriPatra1486 views
Dance KS5 Breakdown von WestHatch
Dance KS5 BreakdownDance KS5 Breakdown
Dance KS5 Breakdown
WestHatch79 views
REPRESENTATION - GAUNTLET.pptx von iammrhaywood
REPRESENTATION - GAUNTLET.pptxREPRESENTATION - GAUNTLET.pptx
REPRESENTATION - GAUNTLET.pptx
iammrhaywood100 views
11.28.23 Social Capital and Social Exclusion.pptx von mary850239
11.28.23 Social Capital and Social Exclusion.pptx11.28.23 Social Capital and Social Exclusion.pptx
11.28.23 Social Capital and Social Exclusion.pptx
mary850239298 views
The Open Access Community Framework (OACF) 2023 (1).pptx von Jisc
The Open Access Community Framework (OACF) 2023 (1).pptxThe Open Access Community Framework (OACF) 2023 (1).pptx
The Open Access Community Framework (OACF) 2023 (1).pptx
Jisc110 views
Psychology KS5 von WestHatch
Psychology KS5Psychology KS5
Psychology KS5
WestHatch93 views
AI Tools for Business and Startups von Svetlin Nakov
AI Tools for Business and StartupsAI Tools for Business and Startups
AI Tools for Business and Startups
Svetlin Nakov107 views
Are we onboard yet University of Sussex.pptx von Jisc
Are we onboard yet University of Sussex.pptxAre we onboard yet University of Sussex.pptx
Are we onboard yet University of Sussex.pptx
Jisc96 views
Psychology KS4 von WestHatch
Psychology KS4Psychology KS4
Psychology KS4
WestHatch84 views
Sociology KS5 von WestHatch
Sociology KS5Sociology KS5
Sociology KS5
WestHatch70 views
11.30.23 Poverty and Inequality in America.pptx von mary850239
11.30.23 Poverty and Inequality in America.pptx11.30.23 Poverty and Inequality in America.pptx
11.30.23 Poverty and Inequality in America.pptx
mary850239160 views
PLASMA PROTEIN (2).pptx von MEGHANA C
PLASMA PROTEIN (2).pptxPLASMA PROTEIN (2).pptx
PLASMA PROTEIN (2).pptx
MEGHANA C68 views
Use of Probiotics in Aquaculture.pptx von AKSHAY MANDAL
Use of Probiotics in Aquaculture.pptxUse of Probiotics in Aquaculture.pptx
Use of Probiotics in Aquaculture.pptx
AKSHAY MANDAL100 views
Structure and Functions of Cell.pdf von Nithya Murugan
Structure and Functions of Cell.pdfStructure and Functions of Cell.pdf
Structure and Functions of Cell.pdf
Nithya Murugan545 views

Statistics 1 (FPN) QP

  • 1. Statistics 1 (FPN) – Question Pool
  • 2. Welcome to the Success Formula Question Pool
    Disclaimers
    • All slides and their materials are the property of Success Formula.
    • You get exclusive, free, personal access once you buy the course the slides are made for.
    • The slides are individually marked, and Success Formula can track which user they belong to.
    • No part of this slide deck may be reproduced, distributed, or transmitted (hereafter in this slide referred to together as “Shared”) in any form or by any means, including sharing the material on platforms such as StudyDrive.
    • If slides are shared, Success Formula may take legal action against the sharing party in line with European and Dutch law (copyright laws).
    Error Bounty
    • If you find any mistake in this slide deck, let us know and we will refund you the cost of the slides.
    • Only the first person to report a mistake gets the refund.
  • 3. Introduction question
    Question: Some people seem to like Breaking Bad, others like Prison Break. What is the percentage of people that watch TV?
    A. The Walking Dead
    B. Depends on the year
    C. All of them
    D. Answer D because it is the best answer
    Answer: C
    (Slide layout legend: question topic, the question, difficulty, answers, correct answer)
  • 4. Significance level: *** Always use a significance level of 0.05 unless otherwise specified ***
  • 5. Stats1 – Question Pool Probability Theory
  • 6. 1. Probability Theory
    Question: Florian wants to show Julian a new magic trick. As part of the trick, Julian has to pull a card out of a 52-card deck 3 times in a row, each time keeping the card (not putting it back) before pulling the next one. There are 26 red cards and 26 black cards. Which statement is incorrect?
    A. The probability that the three chosen cards include at least one red card or at least one black card is equal to 1
    B. The outcome of the 2nd trial will influence the outcome of the 3rd trial
    C. The probability of picking the queen of hearts equals the probability of picking the queen of hearts given that Julian picked the 7 of spades in the previous trial
    D. The sample space consists of all the possible combinations of cards that can be drawn in a sample of 3
    Answer: C
  • 7. 1E. Probability Theory
    Question: Florian wants to show Julian a new magic trick. As part of the trick, Julian has to pull a card out of a 52-card deck 3 times in a row, each time keeping the card (not putting it back) before pulling the next one. There are 26 red cards and 26 black cards. Which statement is incorrect?
    Solution
    A. Correct. Every card is either red or black, so among the three cards Julian draws there is certainly at least one red card or at least one black card. The probability is therefore equal to 1.
    B. Correct. Every time Julian picks a card, he does not put it back, so the outcome of each trial changes the deck for the next one (the events are dependent).
    C. Incorrect. P(QH) = P(QH/7S) would only hold if the events were independent, in other words, if Julian put his chosen card back in the deck after every trial. Without replacement, P(QH) = 1/52 on the second draw, but P(QH/7S) = 1/51.
    D. Correct. Julian picks 3 cards in total, so every combination of 3 cards he could draw is included in the sample space.
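To make statement C concrete, the sketch below enumerates every ordered pair of first and second draws without replacement and compares P(QH) on the second draw with P(QH/7S). The card encoding and the helper names (`DECK`, `prob`, the two event functions) are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical encoding of a standard 52-card deck as (rank, suit) tuples.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]


def prob(event, given=lambda draw: True):
    """Exact probability of `event` over all ordered 2-card draws without
    replacement, restricted to the draws that satisfy `given`."""
    draws = [d for d in permutations(DECK, 2) if given(d)]
    hits = sum(1 for d in draws if event(d))
    return Fraction(hits, len(draws))


def queen_of_hearts_second(draw):
    return draw[1] == ("Q", "hearts")


def seven_of_spades_first(draw):
    return draw[0] == ("7", "spades")


p_qh = prob(queen_of_hearts_second)
p_qh_given_7s = prob(queen_of_hearts_second, given=seven_of_spades_first)
print(p_qh, p_qh_given_7s)  # 1/52 1/51
```

The two probabilities differ, which is exactly why statement C is the incorrect one when cards are not replaced.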
  • 8. 2. Probability Theory
    Question: Suppose that 2 dice are rolled at the same time. Calculate the following probabilities:
    • P(A): the sum of the two numbers is equal to 1
    • P(B): the sum of the two numbers is equal to 5
    • P(C): the sum of the two numbers is less than 13
    A. P(A) = 0.5, P(B) = 0.23, P(C) = 0
    B. P(A) = 0, P(B) = 0.111, P(C) = 1
    C. P(A) = 1, P(B) = 0.12, P(C) = 0
    D. The probabilities cannot be calculated
    Answer: B
  • 9. 2E. Probability Theory
    Question: Suppose that 2 dice are rolled at the same time. Calculate P(A): the sum of the two numbers is equal to 1; P(B): the sum is equal to 5; P(C): the sum is less than 13.
    Sample space (36 equally likely outcomes):
      (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
      (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
      (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
      (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
      (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
      (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
    Solution
    • No combination resulting from rolling 2 dice can give a sum equal to 1, since dice do not have the number 0. The smallest possible sum is 2, from the combination (1,1). So P(A) = 0.
    • To calculate P(B), we identify the combinations in the sample space that yield a sum of 5: (1,4), (2,3), (3,2) and (4,1), i.e. 4 combinations. Using the general formula $P(\text{event}) = \frac{\text{number of outcomes in the event}}{\text{total number of outcomes}}$, we get $P(B) = \frac{4}{36} = \frac{1}{9} \approx 0.111$.
    • The combination with the largest sum is (6,6), with a sum of 12. So every possible combination yields a sum lower than 13: C is the entire sample space, and P(C) = 1.
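The counting formula above can be checked by brute force: the sketch below lists all 36 ordered outcomes of two fair dice and divides the number of favourable outcomes by the total. The helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of two fair six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))


def prob(condition):
    """P(event) = number of outcomes in the event / total number of outcomes."""
    favourable = [o for o in sample_space if condition(o)]
    return Fraction(len(favourable), len(sample_space))


p_a = prob(lambda o: sum(o) == 1)  # impossible event
p_b = prob(lambda o: sum(o) == 5)  # (1,4), (2,3), (3,2), (4,1)
p_c = prob(lambda o: sum(o) < 13)  # the entire sample space
print(p_a, p_b, p_c)  # 0 1/9 1
```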
  • 10. 3. Probability Theory
    Question: An experiment has four mutually exclusive outcomes, A, B, C, and D. If P(A) = 0.33, P(B) = 0.17, P(C) = 0.43 and P(D) = 0.07, which of the following statements must be true?
    A. All of the events are independent of each other
    B. The marginal probability of A equals the conditional probability of A given D
    C. The joint probability of C and B is equal to 0
    D. None of the alternatives is correct
    Answer: C
  • 11. 3E. Probability Theory
    Question: An experiment has four mutually exclusive outcomes, A, B, C, and D. If P(A) = 0.33, P(B) = 0.17, P(C) = 0.43 and P(D) = 0.07, which of the following statements must be true?
    Solution
    A. Incorrect. All 4 events are mutually exclusive, so they cannot happen at the same time. Because each event also has a nonzero probability, the occurrence of one event drops the probability of the others to 0, so the events must be dependent on each other.
    B. Incorrect. This is only the case when the 2 events are independent of one another [P(A) = P(A/D)]. Here A and D are mutually exclusive, so P(A/D) = 0, while P(A) = 0.33.
    C. Correct. Our events are mutually exclusive, meaning that they cannot happen at the same time [P(C and B) = 0].
    D. Incorrect. C is the correct statement.
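The sketch below plugs the question's numbers into the two rules: mutual exclusivity gives P(C and B) = 0, while independence would require P(C and B) = P(C) x P(B). The variable names are illustrative only.

```python
import math

# Probabilities from the question; the four outcomes are mutually exclusive.
p = {"A": 0.33, "B": 0.17, "C": 0.43, "D": 0.07}
assert math.isclose(sum(p.values()), 1.0)  # the four outcomes cover the sample space

p_c_and_b = 0.0             # mutually exclusive: C and B never occur together
p_a_given_d = 0.0 / p["D"]  # P(A and D) / P(D) = 0, whereas P(A) = 0.33

# Independence would require P(C and B) = P(C) x P(B), which is about 0.0731, not 0.
independent = math.isclose(p_c_and_b, p["C"] * p["B"])
print(round(p["C"] * p["B"], 4), independent, p_a_given_d)
```

So mutually exclusive events with nonzero probabilities are always dependent, which rules out alternatives A and B.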
  • 12. 4. Probability Theory
    Question: Suppose we conduct a random experiment in which two events, A and B, are independent. Which of the following rules can we use to prove the relationship between A and B?
    A. P(A and B) = 0
    B. P(A and B) = P(A) x P(B/A)
    C. P(A or B) = P(A) + P(B) – P(A and B)
    D. P(A) = P(A/B)
    Answer: D
  • 13. 4E. Probability Theory
    Question: Suppose we conduct a random experiment in which two events, A and B, are independent. Which of the following rules can we use to prove the relationship between A and B?
    Solution
    A. Incorrect. P(A and B) = 0 is the rule for spotting disjoint events. It shows that the two events cannot happen at the same time.
    B. Incorrect. P(A and B) = P(A) x P(B/A) is the general multiplication rule.
    C. Incorrect. P(A or B) = P(A) + P(B) – P(A and B) is the general addition rule.
    D. Correct. P(A) = P(A/B) is a rule for spotting independent events: it shows that the probability of event A is not influenced by the occurrence of event B.
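As an illustration of rule D, the sketch below uses two fair dice with two hypothetical events, A = "the first die is even" and B = "the second die shows 5 or 6" (neither appears in the question). It checks P(A) = P(A/B) by enumeration and also shows that A and B are not disjoint, since P(A and B) is not 0.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # two fair dice


def p(event, given=lambda o: True):
    """P(event / given), counted over the outcomes that satisfy `given`."""
    cond = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in cond if event(o)), len(cond))


def a(o):  # hypothetical event A: the first die is even
    return o[0] % 2 == 0


def b(o):  # hypothetical event B: the second die shows 5 or 6
    return o[1] >= 5


print(p(a), p(a, given=b))                      # 1/2 1/2: P(A) = P(A/B)
print(p(lambda o: a(o) and b(o)), p(a) * p(b))  # 1/6 1/6: joint = product, and not 0
```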
  • 14. 5. Probability Theory
    Question: A recent survey showed that 45% of Success Formula students prefer to visit Tapijn park to relax after a long day of studying. Also, 27% of UM students like to go to both Tapijn park and the city center to relax. Finally, the survey showed that 40% of students said that they don’t visit the city center for some time off. Based on the above data, determine the following probabilities:
    a. P(A): the probability that a randomly selected UM student visits Tapijn, given that he/she also visits the city center
    b. P(B): the probability that a randomly selected UM student visits Tapijn or visits the city center
    A. P(A) = 0.45, P(B) = 0.27
    B. P(A) = 0.88, P(B) = 0
    C. P(A) = 0.18, P(B) = 0.85
    D. P(A) = 0.45, P(B) = 0.78
    Answer: D
  • 15. 5E. Probability Theory
    Question: A recent survey showed that 45% of Success Formula students prefer to visit Tapijn park to relax after a long day of studying. Also, 27% of UM students like to go to both Tapijn park and the city center to relax. Finally, the survey showed that 40% of students said that they don’t visit the city center for some time off. Determine P(A), the probability that a randomly selected UM student visits Tapijn given that he/she also visits the city center, and P(B), the probability that a randomly selected UM student visits Tapijn or visits the city center.
    Given: P(Tapijn) = 0.45, P(Tapijn AND City) = 0.27, $P(\text{City}') = 0.4$
    $P(\text{City}) = 1 - P(\text{City}') = 1 - 0.4 = 0.6$
    Solution
    • For P(A) we are looking for P(Tapijn/City).
    • We can first check whether these 2 events are independent, using the rule for spotting independence, $P(A \text{ AND } B) = P(A) \times P(B)$:
      - 0.27 = 0.45 × 0.6
      - 0.27 = 0.27 → Tapijn and City are independent
      - So P(Tapijn/City) = P(Tapijn), and P(A) = 0.45
    • For P(B) we want P(Tapijn OR City).
      - The joint probability of these events is not equal to 0, so the events are non-disjoint.
      - We use the general addition rule: $P(B) = P(\text{Tapijn}) + P(\text{City}) - P(\text{Tapijn AND City}) = 0.45 + 0.6 - 0.27 = 0.78$
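The same steps in code, as a minimal sketch: the complement rule for P(City), the independence check, the conditional probability computed directly from its definition P(Tapijn AND City) / P(City) (which agrees with the independence shortcut), and the general addition rule.

```python
import math

p_tapijn = 0.45
p_tapijn_and_city = 0.27
p_not_city = 0.40

p_city = 1 - p_not_city  # complement rule: 0.6
independent = math.isclose(p_tapijn_and_city, p_tapijn * p_city)
p_tapijn_given_city = p_tapijn_and_city / p_city            # definition of conditional probability
p_tapijn_or_city = p_tapijn + p_city - p_tapijn_and_city    # general addition rule
print(independent, round(p_tapijn_given_city, 2), round(p_tapijn_or_city, 2))
```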
  • 16. 6. Probability Theory
    Question: Suppose one runs a random experiment with 3 events (A, B, C). Events A and B are disjoint; C is independent of A and dependent on B. P(B) = 0.3, P(C/B) = 0.135, P(C/A) = 0.48, P(C and A) = 0.16. Calculate the following probabilities:
    a. P(C)
    b. P(A and B)
    c. P(B or C)
    d. P(A or B)
    A. P(C) = 0.48, P(A and B) = 0, P(B or C) = 0.74, P(A or B) = 0.63
    B. P(C) = 0.48, P(A and B) = 0.0405, P(B or C) = 0.78, P(A or B) = 0
    C. P(C) = 0.48, P(A and B) = 0, P(B or C) = 0.63, P(A or B) = 0.74
    D. P(C) = 0.48, P(A and B) = 0.73, P(B or C) = 0.86, P(A or B) = 0.63
    Answer: A
  • 17. 6E. Probability Theory
    Question: Suppose one runs a random experiment with 3 events (A, B, C). Events A and B are disjoint; C is independent of A and dependent on B. P(B) = 0.3, P(C/B) = 0.135, P(C/A) = 0.48, P(C and A) = 0.16. Calculate a. P(C), b. P(A and B), c. P(B or C), d. P(A or B).
    [Diagram: events A, B and C; A and B do not overlap]
    Solution
    • Since events A and C are independent: P(C) = P(C/A) = 0.48
    • Events A and B are disjoint, and the diagram shows no intersection between them: P(A and B) = 0
    • P(B or C) = P(B) + P(C) – P(B and C)
      - We do not have P(B and C), but we can find it with the general multiplication rule: P(B and C) = P(B) × P(C/B) = 0.3 × 0.135 = 0.0405
      - P(B or C) = 0.3 + 0.48 – 0.0405 = 0.7395 ≈ 0.74
    • Since A and B are disjoint events, we use the special form of the addition rule: P(A or B) = P(A) + P(B)
      - We can calculate P(A) with the multiplication rule for independent events: P(C and A) = P(A) × P(C), so P(A) = 0.16 / 0.48 ≈ 0.33
      - P(A or B) = 0.33 + 0.3 = 0.63
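A short sketch that chains the rules used above, one line per rule, so each intermediate value can be checked. The variable names are illustrative.

```python
p_b = 0.30
p_c_given_b = 0.135
p_c_given_a = 0.48
p_c_and_a = 0.16

p_c = p_c_given_a                 # C independent of A: P(C) = P(C/A)
p_a_and_b = 0.0                   # A and B are disjoint
p_b_and_c = p_b * p_c_given_b     # general multiplication rule
p_b_or_c = p_b + p_c - p_b_and_c  # general addition rule
p_a = p_c_and_a / p_c             # multiplication rule for independent events, solved for P(A)
p_a_or_b = p_a + p_b              # special addition rule for disjoint events
print(p_c, p_a_and_b, round(p_b_and_c, 4), round(p_b_or_c, 4), round(p_a, 4), round(p_a_or_b, 4))
```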
  • 18. 7. Probability Theory
    Question: Remco decides to investigate which Dutch delicacy is most preferred by students in Maastricht. He writes down his results in the following table.
                  Fries   Bitterballen   Stroopwaffles   Total
      Male          40         35              40          115
      Female        20         30              35           85
      Total         60         65              75          200
    Calculate the following probabilities:
    1. The probability that we randomly select a student who likes fries, given that they are male
    2. The probability that we randomly select a student who is female, given that they like fries
    3. The probability that the student likes bitterballen
    A. P(1) = 66.67%, P(2) = 34.78%, P(3) = 32.5%
    B. P(1) = 20%, P(2) = 66.67%, P(3) = 17.5%
    C. P(1) = 34.78%, P(2) = 33.33%, P(3) = 32.5%
    D. P(1) = 34.78%, P(2) = 23.52%, P(3) = 17.5%
    Answer: C
  • 19. 7E. Probability Theory
    Question: Remco decides to investigate which Dutch delicacy is most preferred by students in Maastricht. He writes down his results in the following table.
                  Fries   Bitterballen   Stroopwaffles   Total
      Male          40         35              40          115
      Female        20         30              35           85
      Total         60         65              75          200
    Calculate: 1. the probability that we randomly select a student who likes fries, given that they are male; 2. the probability that we randomly select a student who is female, given that they like fries; 3. the probability that the student likes bitterballen.
    Solution
    • P(1) = P(Fries/Male)
      - It is a conditional probability, so we are not working within the entire sample space.
      - The condition indicates the denominator: $P(1) = \frac{40}{115} = 34.78\%$
    • P(2) = P(Female/Fries): $P(2) = \frac{20}{60} = 33.33\%$
    • P(3) = P(Bitterballen)
      - It is a marginal probability within the entire sample space: $P(3) = \frac{65}{200} = 32.5\%$
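The sketch below stores the contingency table as nested dictionaries (a layout chosen only for this illustration) and derives the row and column totals, so the "condition is the denominator" rule can be applied mechanically.

```python
from fractions import Fraction

# Counts from Remco's table (rows: gender, columns: preferred delicacy).
counts = {
    "Male": {"Fries": 40, "Bitterballen": 35, "Stroopwaffles": 40},
    "Female": {"Fries": 20, "Bitterballen": 30, "Stroopwaffles": 35},
}
total = sum(sum(row.values()) for row in counts.values())
row_total = {g: sum(row.values()) for g, row in counts.items()}
col_total = {s: sum(row[s] for row in counts.values()) for s in counts["Male"]}

p1 = Fraction(counts["Male"]["Fries"], row_total["Male"])     # P(Fries/Male): condition = denominator
p2 = Fraction(counts["Female"]["Fries"], col_total["Fries"])  # P(Female/Fries)
p3 = Fraction(col_total["Bitterballen"], total)               # marginal P(Bitterballen)
print(f"{float(p1):.2%} {float(p2):.2%} {float(p3):.2%}")     # 34.78% 33.33% 32.50%
```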
  • 20. 8. Probability Theory
    Question: Refer to the table from the previous question. Which of the following statements is correct?
                  Fries   Bitterballen   Stroopwaffles   Total
      Male          40         35              40          115
      Female        20         30              35           85
      Total         60         65              75          200
    A. The probability P(Bitterballen/Female) is not evaluated across the entire sample space
    B. The events of randomly picking someone who is female and of randomly picking someone who likes stroopwaffles are disjoint
    C. The marginal probability P(Fries) is equal to the conditional probability P(Fries/Male)
    D. The events of randomly picking a male and randomly picking someone who likes stroopwaffles are independent
    Answer: A
  • 21. 8E. Probability Theory
    Question: Refer to the table from the previous question. Which of the following statements is correct?
                  Fries   Bitterballen   Stroopwaffles   Total
      Male          40         35              40          115
      Female        20         30              35           85
      Total         60         65              75          200
    Solution
    A. Correct. P(Bitterballen/Female) is not evaluated across the entire sample space. Conditional probabilities are evaluated across a subset of the sample space, in this case across the subset of females.
    B. Incorrect. The table shows that there are females who prefer stroopwaffles (n = 35), so these 2 events can happen at the same time (they are not disjoint).
    C. Incorrect. P(Fries) ≠ P(Fries/Male): $P(\text{Fries}) = \frac{60}{200} = 0.3$ and $P(\text{Fries}/\text{Male}) = \frac{40}{115} \approx 0.35$
    D. Incorrect. P(Male) ≠ P(Male/Stroopwaffles), so the events are dependent: $P(\text{Male}) = \frac{115}{200} = 0.575$ and $P(\text{Male}/\text{Stroopwaffles}) = \frac{40}{75} \approx 0.53$
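A minimal sketch of the checks behind alternatives B to D, using the counts from the table: a nonzero joint count means the events are not disjoint, and a marginal probability that differs from the matching conditional probability means the events are dependent.

```python
from fractions import Fraction

male = {"Fries": 40, "Bitterballen": 35, "Stroopwaffles": 40}
female = {"Fries": 20, "Bitterballen": 30, "Stroopwaffles": 35}
total = sum(male.values()) + sum(female.values())  # 200

not_disjoint = female["Stroopwaffles"] > 0  # alternative B: a joint count above 0
p_fries = Fraction(male["Fries"] + female["Fries"], total)
p_fries_given_male = Fraction(male["Fries"], sum(male.values()))
p_male = Fraction(sum(male.values()), total)
p_male_given_stroop = Fraction(
    male["Stroopwaffles"], male["Stroopwaffles"] + female["Stroopwaffles"]
)

# Independence requires marginal == conditional; both comparisons fail here.
print(not_disjoint, float(p_fries), round(float(p_fries_given_male), 3))
print(float(p_male), round(float(p_male_given_stroop), 3))
```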
  • 22. 9. Probability Theory
    Question: The probability of randomly meeting someone in the street who wears eyeglasses is 0.55. When meeting 4 random people, what is the probability that 3 or more of the people you meet wear eyeglasses?
    A. P(X ≥ 3) = 0.392
    B. P(X ≥ 3) = 0.346
    C. P(X ≥ 3) = 0.092
    D. The probability cannot be calculated because we do not have the sample size
    Answer: A
  • 23. 9E. Probability Theory
    Question: The probability of randomly meeting someone in the street who wears eyeglasses is 0.55. When meeting 4 random people, what is the probability that 3 or more of the people you meet wear eyeglasses?
    Solution
    [Tree diagram: for each of the 4 people, one branch for G (wears glasses, probability 0.55) and one for NG (no glasses, probability 0.45), giving 16 possible sequences]
  • 24. 9E. Probability Theory
    Find the right combinations
    Since we are looking for the probability of meeting 3 or more people with glasses in our sample of 4, the right combinations are the following:
    • G-G-G-G
    • G-G-G-NG
    • G-G-NG-G
    • G-NG-G-G
    • NG-G-G-G
    Calculate the probabilities
    We calculate the probability of each combination by multiplication:
    • G-G-G-G → 0.55 × 0.55 × 0.55 × 0.55 = 0.092
    • G-G-G-NG → 0.55 × 0.55 × 0.55 × 0.45 = 0.075
    • G-G-NG-G → 0.55 × 0.55 × 0.45 × 0.55 = 0.075
    • G-NG-G-G → 0.55 × 0.45 × 0.55 × 0.55 = 0.075
    • NG-G-G-G → 0.45 × 0.55 × 0.55 × 0.55 = 0.075
    Sum them up
    We add all of the probabilities we just calculated to find the overall probability of meeting 3 or more people with glasses [P(X ≥ 3)]:
    • 0.092 + 0.075 + 0.075 + 0.075 + 0.075 = 0.392 (0.391 if the products are not rounded first)
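The tree-diagram method above amounts to a binomial calculation with n = 4 and p = 0.55, assuming (as the tree does) that people wear glasses independently of one another. The sketch below sums the binomial probabilities for k = 3 and k = 4; the helper `binom_pmf` is a name introduced here, built on the standard-library `math.comb`.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 4, 0.55
p_at_least_3 = sum(binom_pmf(k, n, p) for k in range(3, n + 1))
print(round(p_at_least_3, 4))  # 0.391
```

The factor C(4, 3) = 4 accounts for the four orderings with exactly one NG that the slide lists one by one.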
  • 25. 10. Probability Theory
    Question: Given the following probability distribution, what is the approximate variance of X?
      X    P(x)
      0    0.4
      1    0.8
      2    0.32
      3    0.15
      4    0.54
    A. 4.05
    B. -1.66
    C. 7.38
    D. 15.52
    Answer: D
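  • 26. 10E. Probability Theory
    Question: Given the following probability distribution, what is the variance of X?
      X    P(x)
      0    0.4
      1    0.8
      2    0.32
      3    0.15
      4    0.54
    Solution
    • First, we calculate the expected value, which we need in the formula for the variance:
      $\mu_x = \sum x \cdot P(x) = 0 \times 0.4 + 1 \times 0.8 + 2 \times 0.32 + 3 \times 0.15 + 4 \times 0.54 = 4.05$
    • We can now calculate the variance using the formula $\sigma_x^2 = \sum P(x)\,(x - \mu_x)^2$:
      $\sigma_x^2 = 0.4(0 - 4.05)^2 + 0.8(1 - 4.05)^2 + 0.32(2 - 4.05)^2 + 0.15(3 - 4.05)^2 + 0.54(4 - 4.05)^2$
      $\sigma_x^2 = 6.56 + 7.44 + 1.34 + 0.17 + 0.00135 \approx 15.51$ (option D)
    Note: the P(x) values in this table sum to 2.21 rather than 1, so the table is not a valid probability distribution (which is also why the expected value, 4.05, lies outside the range of X). The calculation only illustrates how the formulas are applied.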
  • 27. Stats1 – Question Pool Probability Distribution
  • 28. 1. Probability Distribution
    Question: Thomas takes a standardized test as part of his university application. Standardized tests allow comparisons to be made regarding student achievement. When he received his results, he was told that his Z-score was -0.28. However, he is not sure whether that is a good or bad result. Given that the test scores are normally distributed, what can he conclude from the result?
    A. He did better than half of the participants
    B. He did worse than half of the participants
    C. He did worse than 28% of the participants
    D. Nothing can be said because we do not have the standard deviation and the mean
    Answer: B
  • 29. 1E. Probability Distribution
    Question: Thomas takes a standardized test as part of his university application. When he received his results, he was told that his Z-score was -0.28. Given that the test scores are normally distributed, what can he conclude from the result?
    Solution
    • Since Thomas has a Z-score of -0.28, he scored 0.28 standard deviations below the mean; the negative sign indicates the direction with respect to the mean. In a normal distribution the mean is also the median, with 50% of the scores below it and 50% above it. Since Thomas is on the left side of the mean, he performed worse than 50% of the test takers.
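To put a number on "worse than half", the sketch below uses the standard normal CDF from Python's `statistics.NormalDist` to find the share of test takers scoring below z = -0.28.

```python
from statistics import NormalDist

z = -0.28
below = NormalDist().cdf(z)  # share of a normal population scoring below z = -0.28
print(f"scored higher than {below:.1%}, lower than {1 - below:.1%}")
```

Thomas scored higher than roughly 39% of participants and lower than roughly 61%, consistent with answer B.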
  • 31. 2. Probability Distribution
    Question: Lea decides to investigate the average income distribution in her hometown. She observes that the majority of households have a low to middle income and a small minority have a high income. Which of the following statements is correct?
    A. Scores located within 1 standard deviation to the left and right of the mean make up 68% of the entire data set
    B. A household with an income 2.3 standard deviations above the mean is in the top 2.5% of the population
    C. The variable in question is a discrete variable
    D. None of the above statements is correct
    Answer: D
  • 32. 2E. Probability Distribution
    Question: Lea decides to investigate the average income distribution in her hometown. She observes that the majority of households have a low to middle income and a small minority have a high income. Which of the following statements is correct?
    Solution
    • From the description, we can tell that the distribution of average income is right-skewed rather than normal.
    • Alternatives A and B are wrong because they rely on the rule of thumb (68%-95%-99.7%), which can only be used for normal distributions.
    • The third alternative is wrong because average income can take infinitely many possible values, so the variable is continuous.
  • 33. 3. Probability Distribution
    Question: Alexandra decides to measure the extraversion scores of students at Success Formula. The scores are well modeled by a normal distribution with a mean of 72 and a standard deviation of 14. What is the probability that a randomly selected person scores between 66 and 76 for extraversion?
    A. 28.05%
    B. 61.41%
    C. 32.98%
    D. 40.82%
    Answer: A
  • 34. 3E. Probability Distribution
    Question: Alexandra decides to measure the extraversion scores of students at Success Formula. The scores are well modeled by a normal distribution with a mean of 72 and a standard deviation of 14. What is the probability that a randomly selected person scores between 66 and 76 for extraversion?
    Solution
    • Calculate the z-scores: $z_1 = \frac{76 - 72}{14} = 0.29$ and $z_2 = \frac{66 - 72}{14} = -0.43$
    • Look up the probabilities in the z-table: $z_1 = 0.29 \rightarrow 61.41\%$ and $z_2 = -0.43 \rightarrow 33.36\%$
    • Calculate the probability that the score is between 66 and 76: 61.41% − 33.36% = 28.05%
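A sketch using `statistics.NormalDist` in place of the z-table. It computes the probability twice: once exactly, and once "table-style", with the z-scores rounded to 2 decimals as the z-table requires, which reproduces the slide's 28.05%.

```python
from statistics import NormalDist

scores = NormalDist(mu=72, sigma=14)
exact = scores.cdf(76) - scores.cdf(66)

# Table-style: round the z-scores to 2 decimals first, as a z-table lookup does.
z_hi = round((76 - 72) / 14, 2)  # 0.29
z_lo = round((66 - 72) / 14, 2)  # -0.43
table_style = NormalDist().cdf(z_hi) - NormalDist().cdf(z_lo)
print(f"{exact:.2%} {table_style:.2%}")
```

The exact value is about 27.8%; the small gap comes only from rounding the z-scores.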
  • 35. 4. Probability Distribution
    Question: Suppose that Alexandra measures extraversion scores for a different population, with a mean of 80 and a standard deviation of 9. What is the probability that a randomly selected person scores higher than 91?
    A. 73.89%
    B. 11.12%
    C. 40.57%
    D. 55.63%
    Answer: B
  • 36. 4E. Probability Distribution
    Question: Suppose that Alexandra measures extraversion scores for a different population, with a mean of 80 and a standard deviation of 9. What is the probability that a randomly selected person scores higher than 91?
    Solution
    • Calculate the z-score: $z_1 = \frac{x - \mu}{\sigma} = \frac{91 - 80}{9} = 1.22$
    • Look up the probability in the z-table: $z_1 = 1.22 \rightarrow 0.8888$ (this is the left-sided probability)
    • Calculate the probability that the score is higher than 91 (right-sided probability): 1 − 0.8888 = 0.1112 → 11.12%
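The same right-sided probability with `statistics.NormalDist`, as a sketch: the complement of the CDF gives the upper tail, both exactly and with z rounded to 1.22 as in the table lookup.

```python
from statistics import NormalDist

scores = NormalDist(mu=80, sigma=9)
p_above_91 = 1 - scores.cdf(91)  # right-sided probability = 1 - left-sided probability

z = round((91 - 80) / 9, 2)      # 1.22, as looked up in the z-table
p_table = 1 - NormalDist().cdf(z)
print(f"{p_above_91:.2%} {p_table:.2%}")
```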
  • 37. 5. Probability Distribution
    Question: According to the Central Limit Theorem:
    A. The sample distribution becomes normal if there is a sufficient sample size (n > 25)
    B. The sampling distribution becomes normal only when the population distribution is normal
    C. Regardless of the shape of the population distribution, the sampling distribution will always be normal
    D. As the sample size increases, the sample mean and standard deviation will be closer in value to the population mean µ and standard deviation σ
    Answer: D
  • 38. 5E. Probability Distribution
    Question: According to the Central Limit Theorem:
    Solution
    A. Incorrect. It is not the sample distribution that approaches normality when the sample is sufficiently large; it is the sampling distribution.
    B. Incorrect. The sampling distribution is indeed normal when the population distribution is normal, but it also approaches normality whenever the sample size is sufficiently large, regardless of the population's shape.
    C. Incorrect. The sampling distribution is not always normal. For a small sample size, it has a shape similar to the population distribution and is not necessarily normal. For a large sample size, it becomes approximately normal.
    D. Correct. As the sample size becomes larger, the sample mean and the sample standard deviation become approximately equal to the population mean and standard deviation.
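A small simulation can illustrate alternatives B and C. The sketch assumes a hypothetical right-skewed population (exponential with mean 1 and standard deviation 1), which is not part of the question. For a small n the sample means stay skewed like the population; for a larger n their skewness drops towards 0 (the value for a normal distribution), and their spread approaches σ/√n.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible


def sample_means(n, reps=5000):
    """Means of `reps` samples of size n from a hypothetical right-skewed
    population: exponential with mean 1 and standard deviation 1."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


def skewness(xs):
    """Sample skewness; 0 for a symmetric (e.g. normal) distribution."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


results = {n: sample_means(n) for n in (2, 40)}
for n, means in results.items():
    # Compare the spread of the sample means with sigma / sqrt(n) = 1 / sqrt(n).
    print(n, round(statistics.stdev(means), 3), round(n ** -0.5, 3), round(skewness(means), 2))
```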
  • 39. 6. Probability Distribution
    Question: Maja plans to study the effects of omega-3 supplements on antisocial behaviour. She develops a measure that her participants will fill in before and after a 2-month trial during which they take omega-3 supplements daily. However, she has trouble recruiting a large number of participants. Given that the sample size is not large enough, which of the following statements is incorrect?
    A. The sample mean is a biased estimator of the population mean
    B. The shape of the sampling distribution will be similar to that of the population distribution
    C. The standard error will probably be too high
    D. There is a high risk of unreliable statements about population parameters
    Answer: A
  • 40. 6E. Probability Distribution
    Question: Maja plans to study the effects of omega-3 supplements on antisocial behaviour, but she has trouble recruiting a large number of participants. Given that the sample size is not large enough, which of the following statements is incorrect?
    Solution
    A. This statement is incorrect. Bias does not depend on the size of the sample. We might have an inaccurate estimate, but if we are using the right estimator for the population parameter, the estimate is still unbiased. An estimator will be biased if it is not the appropriate one (e.g., the sample is not random).
    B. Correct. Since Maja has a small sample size, the sampling distribution has a shape similar to the population distribution and not necessarily a normal one.
    C. Correct. The standard error of the mean is σ/√n, so the smaller the sample size, the greater the standard error.
    D. Correct. Larger sample sizes allow more reliable statements about population parameters than small sample sizes do.
  • 41. 6E. Probability Distribution
    Estimator
    Something that is used in statistics to estimate a fact about a population.
    → The sample mean is an estimator of the population mean.
    Bias
    Bias = the difference between the expected value of the estimator and the true value of the parameter.
    → The sample mean X̄ of a simple random sample is always unbiased.
    Efficiency
    The accuracy of the sample mean.
    → The larger the sample size, the smaller the standard error.
    → The smaller the standard error, the more efficient the estimate.
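The efficiency point can be made concrete with the standard-error formula σ/√n. The sketch borrows σ = 15 from the alexithymia question purely as an example value; the helper `standard_error` is a name introduced here.

```python
import math


def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 15  # example population SD, borrowed from the alexithymia question
for n in (4, 25, 100):
    print(n, standard_error(sigma, n))  # 7.5, 3.0, 1.5
```

Quadrupling the sample size halves the standard error, so larger samples give more efficient estimates.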
  • 42. 7. Probability Distribution
    Question: Alexithymia is a personality trait characterized by an inability to describe, identify and experience emotions. In a population of people with borderline alexithymia, emotional intelligence scores have a mean of 57 and a standard deviation of 15. The population distribution is skewed to the right. Darian takes a simple random sample of 32. What is the probability that the sample mean will be between 55 and 60?
    A. 74.86%
    B. 13.11%
    C. 64.42%
    D. The probability cannot be calculated because the population distribution is skewed
    Answer: C
  • 43. 7E. Probability Distribution
    Question: In a population of people with borderline alexithymia, emotional intelligence scores have a mean of 57 and a standard deviation of 15. The population distribution is skewed to the right. Darian takes a simple random sample of 32. What is the probability that the sample mean will be between 55 and 60?
    Given: µ = 57, σ = 15, n = 32 → the Central Limit Theorem applies (n > 25)
    Solution
    • Calculate the z-scores:
      $z_1 = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{60 - 57}{15 / \sqrt{32}} = 1.13$
      $z_2 = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{55 - 57}{15 / \sqrt{32}} = -0.75$
    • Look up the probabilities in the z-table: $z_1 = 1.13 \rightarrow 87.08\%$ and $z_2 = -0.75 \rightarrow 22.66\%$
    • Calculate the probability that the sample mean is between 55 and 60: 87.08% − 22.66% = 64.42%
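A sketch of the same calculation: by the Central Limit Theorem the sample mean is treated as approximately normal with mean µ and standard deviation σ/√n, which `statistics.NormalDist` can model directly. The table-style line rounds z to 2 decimals to mirror the lookup.

```python
import math
from statistics import NormalDist

mu, sigma, n = 57, 15, 32
se = sigma / math.sqrt(n)  # standard error of the mean

# CLT: for n = 32 the sample mean is approximately normal even though the population is skewed.
sampling_dist = NormalDist(mu, se)
exact = sampling_dist.cdf(60) - sampling_dist.cdf(55)

z_hi, z_lo = round((60 - mu) / se, 2), round((55 - mu) / se, 2)  # 1.13, -0.75
table_style = NormalDist().cdf(z_hi) - NormalDist().cdf(z_lo)
print(f"{exact:.2%} {table_style:.2%}")
```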
  • 44. 8. Probability Distribution
    Question: A certain variable follows a normal population distribution. The population mean is equal to 23.48 and the standard deviation is equal to 4.657. The probability that the sample mean is higher than 24 equals 25.14%. Calculate the sample size.
    A. 49
    B. 24
    C. 36
    D. The sample size cannot be calculated
    Answer: C
  • 45. 8E. Probability Distribution
    Question: A certain variable follows a normal population distribution. The population mean is equal to 23.48 and the standard deviation is equal to 4.657. The probability that the sample mean is higher than 24 equals 25.14%. Calculate the sample size.
    Given: µ = 23.48, σ = 4.657, $P(\bar{x} > 24) = 25.14\%$
    Solution
    • We need the z-score for which the probability of having a sample mean higher than 24 equals 25.14%.
      - Since this is a right-sided probability, we subtract it from 1 (the table gives left-sided probabilities): 1 − 0.2514 = 0.7486
      - 0.7486 appears in the table for the z-score 0.67
    • We use the z-formula and solve for n:
      $z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \Rightarrow 0.67 = \frac{24 - 23.48}{4.657 / \sqrt{n}} = \frac{0.52\sqrt{n}}{4.657}$
      $\sqrt{n} = \frac{0.67 \times 4.657}{0.52} = 6 \Rightarrow n = 6^2 = 36$
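The reverse lookup can be done with `NormalDist().inv_cdf`, which returns the z-score for a given left-sided area; the sketch then rearranges the z-formula for √n exactly as above.

```python
from statistics import NormalDist

mu, sigma, threshold = 23.48, 4.657, 24
p_right = 0.2514                        # P(sample mean > 24)
z = NormalDist().inv_cdf(1 - p_right)   # left-sided area 0.7486 -> z of about 0.67
sqrt_n = z * sigma / (threshold - mu)   # rearranged z = (x̄ - mu) / (sigma / sqrt(n))
n = sqrt_n ** 2
print(round(z, 2), round(sqrt_n, 2), round(n))
```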
  • 46. 9. Probability Distribution
    Question: Eero develops a new brand of cherry soda and has decided on a specific bottle design. The contents of the soda bottles are normally distributed with a mean of 400 ml and a standard deviation of 7 ml. There is an 8.38% chance that the average contents of a 4-pack will exceed how many ml?
    A. 400.12
    B. 404.83
    C. 407.31
    D. 400.60
    Answer: B
  • 47. 9E. Probability Distribution
    Question: Eero develops a new brand of cherry soda and has decided on a specific bottle design. The contents of the soda bottles are normally distributed with a mean of 400 ml and a standard deviation of 7 ml. There is an 8.38% chance that the average contents of a 4-pack will exceed how many ml?
    Solution
    • The contents of the soda bottles are normally distributed, so we can use the z-table.
    • $P(\bar{x} > ?) = 8.38\%$ (right-sided probability) ⇔ 1 − 0.0838 = 0.9162 ⇔ z = 1.38
    • $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \Rightarrow 1.38 = \frac{\bar{x} - 400}{7 / \sqrt{4}} \Rightarrow \bar{x} = 400 + 1.38 \times 3.5 = 400 + 4.83 = 404.83$
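A sketch of the same inverse lookup: the average of a 4-pack is modelled as normal with mean 400 and standard deviation 7/√4 = 3.5 ml, and `inv_cdf` returns the cutoff with 8.38% of the area above it.

```python
import math
from statistics import NormalDist

mu, sigma, n = 400, 7, 4
p_right = 0.0838
sampling_dist = NormalDist(mu, sigma / math.sqrt(n))  # SD of a 4-pack's average: 3.5 ml
cutoff = sampling_dist.inv_cdf(1 - p_right)           # value with 8.38% of the area above it
print(round(cutoff, 2))  # 404.83
```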
  • 48. Answers Question Leonie wishes to investigate experiences of homelessness in Maastricht. However, there is no list of homeless people in the city. She therefore decides to use a non-random sampling method known as snowball sampling. Leonie meets one homeless person who participates in her research and also puts her in contact with other homeless people in the area whom they know. Using this method she is able to gather 178 participants. Which of the following statements pertaining to the population mean estimator is true? A. The estimator is unbiased and efficient B. The estimator is unbiased and not efficient C. The estimator is biased and efficient D. The estimator is biased and not efficient Answer: C 10. Probability Distribution
  • 49. 10E. Probability Distribution Question Leonie wishes to investigate experiences of homelessness in Maastricht. However, there is no list of homeless people in the city. She therefore decides to use a non-random sampling method known as snowball sampling. Leonie meets one homeless person who participates in her research and also puts her in contact with other homeless people in the area whom they know. Using this method she is able to gather 178 participants. Which of the following statements pertaining to the population mean estimator is true? Solution Ø Leonie is using a non-random sampling method, so her sample is not random. Such a sample may systematically over- or under-represent parts of the population, which makes her estimator for the population mean biased. 'Bias' has nothing to do with the sample size Ø Leonie has a sample size of 178 participants, which is a sufficiently large sample (C.L.T.). Thus, her estimator for the population mean will indeed be efficient: as the sample size increases, the standard error decreases
  • 50. Stats1 – Question Pool Hypothesis Testing
  • 51. Answers Question A researcher claims that he was able to develop a drug that enhances human attention. He will test this hypothesis by recruiting 80 individuals with Attention Deficit Disorder (ADD). He divides his sample evenly into 2 groups and makes sure that the groups are matched in their attention levels. He continues by administering the drug only to group 1, keeping group 2 as a control. Finally, all participants across both groups have to complete an Attention Test, with higher scores indicating worse attention. What are the researcher's null and alternative hypotheses? A. H0: µ1 = µ2, Hα: µ1 ≠ µ2 B. H0: µ1 ≠ µ2, Hα: µ1 < µ2 C. H0: µ1 = µ2, Hα: µ1 > µ2 D. H0: µ1 = µ2, Hα: µ1 < µ2 Answer: D 1. Hypothesis Testing
  • 52. 1E. Hypothesis Testing Question A researcher claims that he was able to develop a drug that enhances human attention. He will test this hypothesis by recruiting 80 individuals with Attention Deficit Disorder (ADD). He divides his sample evenly into 2 groups and makes sure that the groups are matched in their attention levels. He continues by administering the drug only to group 1, keeping group 2 as a control. Finally, all participants across both groups have to complete an Attention Test, with higher scores indicating worse attention. What are the researcher's null and alternative hypotheses? Solution A. Incorrect. The alternative hypothesis indicates a two-sided test (Hα: µ1 ≠ µ2). The researcher wants to test the hypothesis that the drug enhances human attention, so we are looking for a one-sided test. B. Incorrect. The null hypothesis always states that there is no effect, i.e. no difference between the groups. In this case, it is the hypothesis that the drug has no effect on the mean of group 1 (H0: µ1 = µ2), not µ1 ≠ µ2 C. Incorrect. This alternative hypothesis states that the mean of group 1 should be higher than that of group 2 after the drug administration. However, higher scores mean worse attention. Since the researcher expects the drug to be beneficial, group 1 should have better attention than group 2, thus lower scores D. Correct. The alternative hypothesis claims that group 1 (drug) will have lower scores, i.e. better attention, than group 2 (control)
  • 53. Answers Question Refer back to the example in question one. The researcher is informed that the population of people with ADD is skewed to the right. Which of the following statements is correct? A. The researcher can still test his hypothesis because normality is not a necessary condition B. The researcher can still test his hypothesis because his sample size is large enough C. The researcher cannot test his hypothesis because there is no normality in the population D. The researcher cannot test his hypothesis because his sample size is not large enough Answer: B 2. Hypothesis Testing
  • 54. 2E. Hypothesis Testing Question Refer back to the example in question one. The researcher is informed that the population of people with ADD is skewed to the right. Which of the following statements is correct? Solution A. Incorrect. Normality is a necessary condition: to test our hypothesis, the sampling distribution of the mean must be (approximately) normal B. Correct. The researcher can indeed do the test because his sample size is large enough, meaning that the central limit theorem applies (= the sampling distribution approximates a normal distribution as the sample size gets larger, regardless of the population distribution) C. Incorrect. Since the central limit theorem applies, we do not need to worry about the skewed population distribution D. Incorrect. The sample size is large enough: each group contains 40 participants, and the cut-off for the central limit theorem to apply is n ≥ 25
  • 55. Answers Question Florian believes that a new Artificial Intelligence teaching method can influence student ratings compared to using human tutors. He is, however, unsure what this influence will look like because, despite the AI's greater efficiency, students might still prefer human interaction during their tutorials. Florian then takes an SRS of 27 students from a population of students with a mean rating of µ = 30.2 and a standard deviation of σ = 16. The sampled students take a lesson from the AI system and then give it a rating, with a mean of 24.5. Can Florian conclude that the mean rating of the AI system is significantly different from the mean rating of the normal method? A. Yes, we reject the null hypothesis with the p-value of 0.0322 B. Yes, we reject the null hypothesis with the p-value of 0.0644 C. No, we cannot reject the null hypothesis with the p-value of 0.0322 D. No, we cannot reject the null hypothesis with the p-value of 0.0644 Answer: D 3. Hypothesis Testing
  • 56. 3E. Hypothesis Testing Question Florian believes that a new Artificial Intelligence teaching method can influence student ratings compared to using human tutors. He is, however, unsure what this influence will look like because, despite the AI's greater efficiency, students might still prefer human interaction during their tutorials. Florian then takes an SRS of 27 students from a population of students with a mean rating of µ = 30.2 and a standard deviation of σ = 16. The sampled students take a lesson from the AI system and then give it a rating, with a mean of 24.5. The significance level is 5%. Can Florian conclude that the mean rating of the AI system is significantly different from the mean rating of the normal method? Data H0: µ_AI = 30.2 Hα: µ_AI ≠ 30.2 (2-tailed test) α = 0.05 µ = 30.2 σ = 16 n = 27 x̄ = 24.5 Solution Ø The sample size is large enough (n = 27 ≥ 25), so we can continue with the test Ø We can use the Z formula to calculate Z_obs: $Z_{obs} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{24.5 - 30.2}{16/\sqrt{27}} = -1.85$ Ø Using the Z-table, a Z_obs of −1.85 corresponds to a one-sided p-value of 0.0322 Ø Since we have a 2-tailed test, we need to double the p-value: 0.0322 × 2 = 0.0644 Ø We then compare the p-value to α: 0.0644 > 0.05 Ø The p-value is larger than α, thus the null hypothesis cannot be rejected
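A short sketch of this two-sided one-sample z-test, assuming Python 3.8+ (`statistics.NormalDist`); `z_test_two_sided` is a hypothetical helper name. The exact p-value (about 0.064) differs slightly from the table value 0.0644 because z is not rounded to −1.85 first, but the decision is the same.

```python
from statistics import NormalDist


def z_test_two_sided(xbar, mu0, sigma, n):
    """One-sample z-test with known sigma; returns (z, two-sided p-value)."""
    se = sigma / n ** 0.5                      # standard error of the mean
    z = (xbar - mu0) / se
    p = 2 * NormalDist().cdf(-abs(z))          # both tails of the standard normal
    return z, p


alpha = 0.05
z_obs, p_value = z_test_two_sided(xbar=24.5, mu0=30.2, sigma=16, n=27)
print(f"z = {z_obs:.2f}, p = {p_value:.4f}, reject H0: {p_value < alpha}")
```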
  • 57. Answers Question Suppose that for a two-sided test, an experimenter decides to use a significance level of 0.10. Which of the following statements is incorrect? A. The Z-critical is going to be equal to ±1.65 B. The probability of a type 1 error is equal to 10% C. If the null hypothesis is rejected at this level, then it will also be rejected at α = 0.05 D. With the current significance level, there is a lower probability of not rejecting a false null hypothesis compared to a significance level of 0.05 Answer: C 4. Hypothesis Testing
  • 58. 4E. Hypothesis Testing Question Suppose that for a two-sided test, an experimenter decides to use a significance level of 0.10. Which of the following statements is incorrect? Solution A. Correct. For a two-sided test with α = 10%, the Z-critical becomes ±1.65 B. Correct. The probability of a type 1 error is always equal to the significance level of the study • Type 1 error = α = 10% C. Incorrect. If the null hypothesis is rejected at α = 10%, it is not necessarily rejected at α = 5% • E.g., a p-value of 0.07 is smaller than 0.10 but not smaller than 0.05. Thus, H0 would be rejected at α = 10% but not at α = 5% D. Correct. By increasing the significance level, we make the decision criterion more lenient, which lowers the probability of a type 2 error. However, we simultaneously increase the risk of a false positive, that is, rejecting a true null hypothesis [Figure: two-sided test at α = 0.10, with 90% of the distribution between the critical values and 5% in each tail]
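The critical values and the counterexample to statement C can be checked with a small sketch, assuming Python 3.8+ (`statistics.NormalDist`); `two_sided_critical_z` is a hypothetical helper and p = 0.07 is an illustrative value, not taken from a study.

```python
from statistics import NormalDist


def two_sided_critical_z(alpha):
    """Critical |z| for a two-sided test: alpha / 2 goes into each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)


z_10 = two_sided_critical_z(0.10)   # ≈ 1.645 (table: 1.65)
z_05 = two_sided_critical_z(0.05)   # ≈ 1.960

# Illustrative p-value between 0.05 and 0.10: rejected at 10% but not at 5%.
p = 0.07
print(f"z_c(0.10) = {z_10:.3f}, z_c(0.05) = {z_05:.3f}")
print(f"p = {p}: reject at 0.10? {p < 0.10}; reject at 0.05? {p < 0.05}")
```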
  • 59. Answers Question A questionnaire has been constructed to measure the level of psychopathy in incarcerated individuals. The population is normally distributed with a mean of 44 and a standard deviation of 12. A researcher wants to check the hypothesis that the population mean is different, so she draws an SRS of 23 individuals. The sample mean is 53. What are the boundaries of a 90% confidence interval based on this specific sample? A. [48.87, 57.13] B. [48.14, 56.90] C. [43.89, 54.96] D. [49.63, 52.47] Answer: A 5. Hypothesis Testing
  • 60. 5E. Hypothesis Testing Question A questionnaire has been constructed to measure the level of psychopathy in incarcerated individuals. The population is normally distributed with a mean of 44 and a standard deviation of 12. A researcher wants to check the hypothesis that the population mean is different, so she draws an SRS of 23 individuals. The sample mean is 53. What are the boundaries of a 90% confidence interval based on this specific sample? Solution H0: µ = 44 Hα: µ ≠ 44 µ = 44 σ = 12 n = 23 x̄ = 53 Zc = 1.65 (because it is a 90% CI) $\bar{x} \pm Z_c \times \frac{\sigma}{\sqrt{n}} = 53 \pm 1.65 \times \frac{12}{\sqrt{23}}$, with $\frac{12}{\sqrt{23}} \approx 2.502$ Lower bound: $53 - 1.65 \times 2.502 = 48.87$ Upper bound: $53 + 1.65 \times 2.502 = 57.13$ → [48.87, 57.13]
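A minimal sketch of the z-based confidence interval; `z_confidence_interval` is a hypothetical helper, and z_c is passed in as the table value 1.65 so the result matches the hand calculation (the exact 90% value, about 1.645, would shift each bound by roughly 0.01).

```python
def z_confidence_interval(xbar, sigma, n, z_c):
    """Return (lower, upper) of xbar ± z_c * sigma / sqrt(n)."""
    margin = z_c * sigma / n ** 0.5   # margin of error
    return xbar - margin, xbar + margin


lower, upper = z_confidence_interval(xbar=53, sigma=12, n=23, z_c=1.65)
print(f"[{lower:.2f}, {upper:.2f}]")
```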
  • 61. Answers Question Suppose we have a 95% confidence interval [37.2, 42.5]. Calculate the sample mean and the standard error. A. x̄ = 40.05, SE = 3.39 B. x̄ = 38.74, SE = 4.63 C. x̄ = 39.85, SE = 1.35 D. x̄ = 41.40, SE = 2.22 Answer: C 6. Hypothesis Testing
  • 62. 6E. Hypothesis Testing Question Suppose we have a 95% confidence interval [37.2, 42.5]. Calculate the sample mean and the standard error. Sample Mean α = 5% Zc = 1.96 CI [37.2, 42.5] Ø Confidence interval: $\bar{x} \pm Z_c \times \frac{\sigma}{\sqrt{n}} = \bar{x} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}$ Ø Lower bound: $37.2 = \bar{x} - 1.96 \times \frac{\sigma}{\sqrt{n}} \Rightarrow 1.96 \times \frac{\sigma}{\sqrt{n}} = \bar{x} - 37.2$ Ø Upper bound: $42.5 = \bar{x} + 1.96 \times \frac{\sigma}{\sqrt{n}} = \bar{x} + (\bar{x} - 37.2)$ $\Rightarrow 2\bar{x} = 42.5 + 37.2 = 79.7 \Rightarrow \bar{x} = \frac{79.7}{2} = 39.85$ Standard Error Ø From the previous calculations we can see that: $1.96 \times \frac{\sigma}{\sqrt{n}} = \bar{x} - 37.2$ Ø We already found the sample mean, so we can use it to calculate the fraction: $1.96 \times \frac{\sigma}{\sqrt{n}} = 39.85 - 37.2 = 2.65 \Rightarrow \frac{\sigma}{\sqrt{n}} = \frac{2.65}{1.96} = 1.35$
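The two steps (midpoint for the mean, half-width divided by Z_c for the standard error) can be sketched as follows; `mean_and_se_from_ci` is a hypothetical helper and assumes a symmetric z-based interval with Z_c = 1.96.

```python
def mean_and_se_from_ci(lower, upper, z_c=1.96):
    """Recover the sample mean and standard error from a symmetric z-based CI."""
    xbar = (lower + upper) / 2   # the interval is centred on the sample mean
    margin = upper - xbar        # margin of error = z_c * SE
    return xbar, margin / z_c


xbar, se = mean_and_se_from_ci(37.2, 42.5)
print(f"mean = {xbar:.2f}, SE = {se:.2f}")
```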
  • 63. Answers Question Going back to the example of the previous question, what can be said about the null hypothesis, given that the population mean under the null hypothesis is equal to 36.05? A. The null hypothesis is accepted B. The null hypothesis is rejected C. The null hypothesis cannot be rejected D. Nothing can be said about the null hypothesis with the current data Answer: B 7. Hypothesis Testing
  • 64. 7E. Hypothesis Testing Question Going back to the example of the previous question, what can be said about the null hypothesis, given that the population mean under the null hypothesis is equal to 36.05? Solution A. Incorrect. When doing a hypothesis test, we can either reject the null hypothesis or fail to reject it, but we can never accept the null hypothesis. We cannot conclude that the null hypothesis is true merely because we did not find evidence to reject it B. Correct. For our 2-tailed test, the hypothesized population mean (36.05) is not included in the 95% CI [37.2, 42.5], so the null hypothesis is rejected at α = 0.05 C. Incorrect. Since the population mean is not included in the confidence interval, the null hypothesis is rejected D. Incorrect. Statement B is correct
  • 65. 7E. Hypothesis Testing Confidence Interval Ø A confidence interval is an interval estimate of µ Ø It gives a range of plausible values for the population mean: $\bar{X} \pm Z_c \times \frac{\sigma}{\sqrt{n}}$ Interpretation Example: 95% Confidence Interval Ø If we drew an infinite number of samples and constructed a 95% CI from each, 95% of those intervals would contain the population mean µ Hypothesis Testing Ø We can use the confidence interval to decide whether the null hypothesis is rejected for a two-tailed test (with α = 1 − confidence level) Ø If the population mean from the null hypothesis is located inside the interval, the null hypothesis cannot be rejected, because that value is a plausible population mean Ø If the population mean from the null hypothesis is not located inside the interval, the null hypothesis is rejected
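The repeated-sampling interpretation can be illustrated with a small simulation. This is a sketch under assumed values: the population (µ = 50, σ = 10), the sample size (n = 25), the number of repetitions and the seed are all hypothetical, and σ is treated as known so the Z-based interval applies. The share of intervals that contain µ should come out close to 0.95.

```python
import random
from statistics import NormalDist, fmean


def ci_coverage(mu=50.0, sigma=10.0, n=25, level=0.95, reps=10_000, seed=1):
    """Fraction of z-based CIs (known sigma) that contain the true mean mu."""
    rng = random.Random(seed)                       # seeded for reproducibility
    z_c = NormalDist().inv_cdf(1 - (1 - level) / 2)  # ≈ 1.96 for 95%
    margin = z_c * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        # Draw one sample from the (hypothetical) normal population.
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


coverage = ci_coverage()
print(f"coverage = {coverage:.3f}")
```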
  • 66. Answers Question Tobias investigates the effects of participative leadership on satisfaction levels among employees. The sample mean is equal to 73.8. The boundaries of the 95% confidence interval are [71.4, 76.5]. Calculate the margin of error and the standard error. A. ME = 5.7, SE = 1.22 B. ME = 2.4, SE = 1.22 C. ME = 2.9, SE = 3.91 D. ME = 2.4, SE = 4.75 Answer: B 8. Hypothesis Testing
  • 67. 8E. Hypothesis Testing Question Tobias investigates the effects of participative leadership on satisfaction levels among employees. The sample mean is equal to 73.8. The boundaries of the 95% confidence interval are [71.4, 76.5]. Calculate the margin of error and the standard error. Solution x̄ = 73.8 95% CI → [71.4, 76.5] Z_critical = 1.96 Margin of error: $\bar{x} \pm Z_c \times \frac{\sigma}{\sqrt{n}}$ $\bar{x} - Z_c \times \frac{\sigma}{\sqrt{n}} = 71.4 \Rightarrow Z_c \times \frac{\sigma}{\sqrt{n}} = \bar{x} - 71.4 = 73.8 - 71.4 = 2.4$ Standard error: $Z_c \times \frac{\sigma}{\sqrt{n}} = 2.4 \Rightarrow \frac{\sigma}{\sqrt{n}} = \frac{2.4}{Z_c} = \frac{2.4}{1.96} = 1.22$
  • 68. Answers Question Kian is the HR manager for Success Formula. He noticed that the employees have lately been more stressed than usual, so he decides to evaluate their stress levels using a measurement scale (fewer points = less stress). On average, the 26 employees had a stress score of 83 with a standard deviation of 17. Kian then decided to implement a mindfulness program with the goal of reducing stress scores by 8 points. The significance level is 5%. What is the power of the test, given that the mindfulness program works as Kian was expecting? A. 0.7734 B. 0.2266 C. 0.6066 D. 0.7123 Answer: A 9. Hypothesis Testing
  • 69. 9E. Hypothesis Testing Question Kian is the HR manager for Success Formula. He noticed that the employees have lately been more stressed than usual, so he decides to evaluate their stress levels using a measurement scale (fewer points = less stress). On average, the 26 employees had a stress score of 83 with a standard deviation of 17. Kian then decided to implement a mindfulness program with the goal of reducing stress scores by 8 points. The significance level is 5%. What is the power of the test, given that the mindfulness program works as Kian was expecting? H0: µ = 83 Hα: µ < 83 Zc = −1.65 α = 0.05 n = 26 σ = 17 µ = 83 µ(new) = 75 Answer Ø Find the critical value: $Z_c = \frac{\bar{X}_c - \mu}{\sigma/\sqrt{n}}$ $-1.65 = \frac{\bar{X}_c - 83}{17/\sqrt{26}}$ $-5.50 = \bar{X}_c - 83 \Rightarrow \bar{X}_c = 77.50$ • H0 is rejected when the sample mean falls below 77.50 Ø Solve for Z under the new mean: $Z = \frac{\bar{X}_c - \mu_{new}}{\sigma/\sqrt{n}} = \frac{77.50 - 75}{17/\sqrt{26}} = 0.75$ Ø Find β: β is the probability of not rejecting H0 (a sample mean above 77.50) when µ = 75, i.e. P(Z > 0.75) • Using the Z-table, P(Z < 0.75) = 0.7734, so β = 1 − 0.7734 = 0.2266 Ø To calculate the power we use the formula: Power = 1 − β = 1 − 0.2266 = 0.7734
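A sketch of the same power calculation, assuming Python 3.8+ (`statistics.NormalDist`); `power_left_tailed` is a hypothetical helper. It computes the critical sample mean under H0 and then the probability that the sample mean falls below it when µ = 75, which is the power; β is the remaining probability. Using the exact critical z (about −1.645) instead of −1.65 gives power ≈ 0.775, close to the table-based 0.7734.

```python
from statistics import NormalDist


def power_left_tailed(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a left-tailed one-sample z-test when the true mean is mu1."""
    se = sigma / n ** 0.5
    # Step 1: critical sample mean below which H0 (mu = mu0) is rejected.
    x_c = mu0 + NormalDist().inv_cdf(alpha) * se
    # Step 2: probability of landing in the rejection region when mu = mu1.
    power = NormalDist(mu1, se).cdf(x_c)
    return x_c, power


x_c, power = power_left_tailed(mu0=83, mu1=75, sigma=17, n=26)
beta = 1 - power  # probability of a type II error
print(f"X_c = {x_c:.2f}, power = {power:.4f}, beta = {beta:.4f}")
```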
  • 70. 9E. Hypothesis Testing Type II Error Ø Definition: we fail to reject a false null hypothesis Ø Measured by β Ø Calculation: • Find the critical value at which H0 would be rejected: $Z_c = \frac{\bar{X}_c - \mu_0}{\sigma/\sqrt{n}}$ → solve for $\bar{X}_c$ • $Z = \frac{\bar{X}_c - \mu_1}{\sigma/\sqrt{n}}$ → solve for Z, then look up the probability of the non-rejection region under µ1; this probability is β Power Ø Definition: the probability that we are able to reject a false null hypothesis Ø Calculation: • Power = 1 − β [Illustration]
  • 71. Answers Question Suppose Micheal is conducting an experiment on fear conditioning. He uses a sample of 65 participants and a significance level of 5%. Before he begins, he wants to make sure that the probability of rejecting a true null hypothesis is as small as possible. Which of the following statements is correct? A. He should increase his sample size B. He should increase the effect size C. He should increase the significance level D. None of the above Answer: D 10. Hypothesis Testing
  • 72. 10E. Hypothesis Testing Question Suppose Micheal is conducting an experiment on fear conditioning. He uses a sample of 65 participants and a significance level of 5%. Before he begins, he wants to make sure that the probability of rejecting a true null hypothesis is as small as possible. Which of the following statements is correct? Solution A. Incorrect. Increasing the sample size decreases the standard error and thus the probability of not rejecting a false null hypothesis (Type II error); it does not change α B. Incorrect. Increasing the effect size is difficult in real life since researchers have little control over it. Theoretically, the larger the effect size, the lower the probability of failing to reject a false null hypothesis (Type II error) C. Incorrect. Increasing the significance level makes it easier to reject the null hypothesis, which increases the probability of rejecting a true H0 (Type I error) D. Correct: none of the above alternatives is correct. Rejecting a true null hypothesis is a Type I error, and its probability is given by α. We can reduce this probability by lowering α, but this increases the probability of a Type II error (not recommended)
  • 73. Stats1 – Question Pool T-tests
  • 74. Answers Question A randomly drawn sample of 60 university students undergoes exam training. Before the training, their mean score on a practice exam was 68. After the training, their mean score improved by 7 points. What (t-)test would you employ to check if the exam training had a significant effect? A. One-sample t-test B. Paired samples t-test C. Independent samples t-test D. Two-sample t-test Answer: B 1. T-tests
  • 75. 1E. T-tests Question A randomly drawn sample of 60 university students undergoes exam training. Before the training, their mean score on a practice exam was 68. After the training, their mean score improved by 7 points. What (t-)test would you employ to check if the exam training had a significant effect? Solution A. Incorrect, we compare two dependent samples, not one sample against the population. B. Correct, the groups are paired since we test the same sample twice (before and after the exam training). C. Incorrect, the two groups are not independent; they are dependent. D. Incorrect, a two-sample t-test is an independent samples t-test. The groups were dependent, not independent.
  • 76. Answers Question When testing a null hypothesis about a single population mean, a t-test is usually performed rather than a z-test. A t-test is more likely to be employed because… A. A t-test has more power than a z-test, leading to a more reliable result. B. Quantitative variables can only be analysed with t-tests. C. Z-tests are more prone to type I errors, which are to be avoided. D. In practice, the standard deviation of a population is rarely known. Answer: D 2. T-tests
  • 77. 2E. T-tests T-tests When to use a t-test? When we can't use z-scores because σ (the population standard deviation) is unknown • We have to estimate both parameters (µ with x̄ and σ with s) • We use an extra estimate (s_x) • The t-distribution is more dispersed than the z-distribution • For the same n and α, a t-test is always less powerful than a z-test Z-tests Z-tests measure how many standard errors our sample mean (x̄) lies from the hypothesized value of the population mean (µ). • Makes use of the z-distribution • More powerful than a t-test • Most of the time cannot be used, since in reality we rarely know the parameters of the population (in particular σ)
  • 78. Answers Question A researcher is interested in the effect of wearing red lipstick on the score at minigolf. They ask 40 people to wear red lipstick while playing 18 holes on the minigolf course. 70 people played the same 18 holes without wearing red lipstick. The dependent variable is the obtained score after the 18 holes (a lower score is considered to be better). The red lipstick condition had a mean score of 47.5 and a standard deviation of 4.3. The no-red-lipstick condition had a mean score of 62 and a standard deviation of 9.2. Which test should the researcher use to test the null hypothesis that the score at minigolf is not affected by wearing red lipstick? A. An independent samples t-test, assuming unequal population variances. B. An independent samples t-test, assuming equal population variances. C. A paired samples t-test. D. A one-sample t-test. Answer: A 3. T-tests
  • 79. 3E. T-tests Question A researcher is interested in the effect of wearing red lipstick on the score at minigolf. They ask 40 people to wear red lipstick while playing 18 holes on the minigolf course. 70 people played the same 18 holes without wearing red lipstick. The dependent variable is the obtained score after the 18 holes (a lower score is considered to be better). The red lipstick condition had a mean score of 47.5 and a standard deviation of 4.3. The no-red-lipstick condition had a mean score of 62 and a standard deviation of 9.2. Which test should the researcher use to test the null hypothesis that the score at minigolf is not affected by wearing red lipstick? Solution A. Correct. The 2 groups are independent, and we compare their means. The goal of the test is to check whether the 2 samples come from populations with equal means. The rule of thumb (smallest SD × 2 > biggest SD) does not hold (4.3 × 2 = 8.6 < 9.2), and the groups have unequal sample sizes (40 vs 70). This means we have to do the t-test without assuming equal variances B. Incorrect. We cannot assume equal variances because the rule of thumb is violated and the group sizes are not equal C. Incorrect. A paired samples t-test requires matched groups or a within-subject design. D. Incorrect. A one-sample t-test is used when we have 1 population and want to check whether its mean is equal to a specific value.
  • 80. 3E. T-tests Assumptions overview (Assumption | T-test concerned | How to determine | What if violated)
  Ø Normality | All t-tests | 1. Histogram of sample scores looks normal, or 2. Sample size is large (central limit theorem) | Can't do the t-test
  Ø Quantitative | All t-tests | Dependent variable is quantitative | Can't do the t-test
  Ø Dependent groups | Paired t-test | The groups are matched | Use the two-sample t-test
  Ø Independent groups | Two-sample t-test | Two separate groups are measured | Use the paired t-test
  Ø Equal variance | Two-sample t-test | 1. One sample SD is not 2x bigger than the other (rule of thumb), 2. Levene's test is not significant, 3. The sample sizes are equal | The two-sample t-test not assuming equal variances has to be used → less powerful
  • 81. Answers Question The effect of Ritalin on test performance is tested. 31 participants received a Ritalin pill while another 31 participants received a placebo. Test performance is assumed to be good if the score on the test is high. The null hypothesis is that test performance is the same under Ritalin and placebo, while the alternative hypothesis is that Ritalin leads to better test performance. The table below presents the group statistics, computed by SPSS (equal variances assumed). Which statement is incorrect? A. The means of the two samples are very similar. However, a visual inspection of the group statistics is not enough to reject the null hypothesis. B. The equal variances assumption is violated, thus we should not interpret the test C. The equal variances assumption is not violated, thus we can interpret the test D. During the t-test, we should compute the weighted average of the two standard deviations Answer: B 4. T-tests Group Statistics (Test score): condition | N | Mean | Std. Deviation | Std. Error Mean; placebo | 31 | 10.1182 | 1.9463 | .1699; Ritalin | 31 | 10.9374 | 2.2824 | .4099
  • 82. 4E. T-tests Question The effect of Ritalin on test performance is tested. 31 participants received a Ritalin pill while another 31 participants received a placebo. Test performance is assumed to be good if the score on the test is high. The null hypothesis is that test performance is the same under Ritalin and placebo, while the alternative hypothesis is that Ritalin leads to better test performance. The table below presents the group statistics, computed by SPSS. Which statement is incorrect? Solution A. Correct. Sample means are random variables, meaning they change depending on the sample. Thus, in order to draw conclusions about the populations, we need to check whether the difference between the means is indeed significant. B. Incorrect. The equal variances assumption is not violated, which we can check using the rule of thumb (biggest SD < smallest SD × 2) C. Correct. Using the rule of thumb: 1.9463 × 2 = 3.89 > 2.2824 (the bigger SD, Ritalin group), thus the assumption is not violated D. Correct. Since the equal variances assumption is not violated, the 2 standard deviations estimate the same population standard deviation. Their weighted average (the pooled SD) is the best estimate of σ Group Statistics (Test score): condition | N | Mean | Std. Deviation | Std. Error Mean; placebo | 31 | 10.1182 | 1.9463 | .1699; Ritalin | 31 | 10.9374 | 2.2824 | .4099
  • 83. 4E. T-test Checking the Equal Variances Assumption We can use 2 ways to check the assumption 1. Rule of thumb: the smaller SD × 2 should be larger than the bigger SD 2. Levene's test: if the test is significant, the variances are unequal (H0: $\sigma_1^2 = \sigma_2^2$) Violation of Assumption If this assumption is violated, we can still continue with the pooled t-test if both samples are approximately equal in size Special case If the assumption is violated AND the samples differ in size, we can do the t-test, but only with the following formula: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ If H0: µ1 = µ2, then (µ1 − µ2) = 0
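Both checks can be sketched in a few lines; `equal_variance_rule_of_thumb` and `welch_t` are hypothetical helper names. The Ritalin and red-lipstick SDs come from the questions above, and the Welch t for the lipstick data is computed only to illustrate the special-case formula.

```python
from math import sqrt


def equal_variance_rule_of_thumb(sd1, sd2):
    """True when the larger SD is less than twice the smaller SD."""
    return max(sd1, sd2) < 2 * min(sd1, sd2)


def welch_t(mean1, sd1, n1, mean2, sd2, n2, null_diff=0.0):
    """t statistic for two independent samples without assuming equal variances."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - null_diff) / se


# Ritalin example (placebo SD vs Ritalin SD): assumption holds.
print(equal_variance_rule_of_thumb(1.9463, 2.2824))   # True
# Red-lipstick example: 2 * 4.3 = 8.6 < 9.2, assumption violated.
print(equal_variance_rule_of_thumb(4.3, 9.2))         # False
t_lipstick = welch_t(47.5, 4.3, 40, 62.0, 9.2, 70)
print(f"Welch t (lipstick) = {t_lipstick:.2f}")
```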
  • 84. Answers Question Natalia is a memory researcher and, as part of her pilot study, she wishes to test the differences in memory recall between severe anxiety patients and controls. She suspects that anxiety patients will have different memory recall scores compared to controls. After a memory test, she compares the scores of the groups. The anxiety group has a mean of 12.6 and a standard deviation of 3.38. The control group has a mean of 13.4 and a standard deviation of 2.61. There are 70 participants in total, equally divided into the 2 groups. What can Natalia conclude about the null hypothesis? A. The null hypothesis is not rejected with 0.10 ≤ p-value ≤ 0.15 B. The null hypothesis is rejected with 0.01 ≤ p-value ≤ 0.05 C. The null hypothesis is not rejected with 0.20 ≤ p-value ≤ 0.30 D. The null hypothesis is rejected with 0.02 ≤ p-value ≤ 0.025 Answer: C 5. T-tests
  • 85. 5E. T-tests Question Natalia is a memory researcher and, as part of her pilot study, she wishes to test the differences in memory recall between severe anxiety patients and controls. She suspects that anxiety patients will have different memory recall scores compared to controls. After a memory test, she compares the scores of the groups. The control group has a mean of 13.4 and a standard deviation of 2.61. The anxiety group has a mean of 12.6 and a standard deviation of 3.38. There are 70 participants in total, equally divided into the 2 groups. What can Natalia conclude about the null hypothesis? H0: µ1 = µ2 Hα: µ1 ≠ µ2 n1 = n2 = 35 x̄1 = 13.4 x̄2 = 12.6 s1 = 2.61 s2 = 3.38 Solution Ø Check the equal variances assumption with the rule of thumb (bigger SD < smallest SD × 2): 3.38 < 2.61 × 2 = 5.22 (true) → equal variances assumed Ø Since equal variances are assumed, we need to calculate the pooled standard deviation: $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}} = \sqrt{\frac{34 \times 2.61^2 + 34 \times 3.38^2}{34 + 34}} = 3.02$ Ø Next, we need to calculate T_obs: $T = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{13.4 - 12.6}{3.02 \times \sqrt{\frac{1}{35} + \frac{1}{35}}} = \frac{0.8}{3.02 \times 0.239} = 1.11$ Ø Using the t-table (df = n1 + n2 − 2 = 68), we see that the one-sided p-value is between 0.10 and 0.15. For a 2-tailed test, we need to double these values: 0.20 ≤ p-value ≤ 0.30, so the null hypothesis is not rejected
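The pooled-SD calculation can be sketched as below; `pooled_sd` and `pooled_t` are hypothetical helpers. Python's standard library has no t-distribution, so the sketch stops at t_obs and df, and the p-value range is still read from the t-table.

```python
from math import sqrt


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation for two independent samples (equal variances assumed)."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / ((n1 - 1) + (n2 - 1)))


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t_obs, pooled SD, degrees of freedom) for the equal-variance t-test."""
    sp = pooled_sd(s1, n1, s2, n2)
    se = sp * sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, sp, n1 + n2 - 2


t_obs, sp, df = pooled_t(13.4, 2.61, 35, 12.6, 3.38, 35)
print(f"s_p = {sp:.2f}, t = {t_obs:.2f}, df = {df}")
```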
  • 86. Answers Question An ice cream company has two new potential flavours ready for the market. They developed a tastiness scale scored from 0 to 30. 40 volunteers tasted flavour A and another 25 volunteers tasted flavour B. The obtained values are: x̄_A = 22.8, x̄_B = 28, s_A = 4.2 and s_B = 1.9. What is the 90% confidence interval for the difference in means (A − B) corresponding to this t-test? A. [−6.52, −3.88] B. [−6.34; −4.59] C. [−6.50; −4.0] D. [−7.29; −3.91] Answer: A 6. T-tests
  • 87. 6E. T-tests Question An ice cream company has two new potential flavours ready for the market. They developed a tastiness scale scored from 0 to 30. 40 volunteers tasted flavour A and another 25 volunteers tasted flavour B. The obtained values are: x̄_A = 22.8, x̄_B = 28, s_A = 4.2 and s_B = 1.9. What is the 90% confidence interval for the difference in means (A − B) corresponding to this t-test? n_A = 40 n_B = 25 x̄_A = 22.8 x̄_B = 28 s_A = 4.2 s_B = 1.9 Solution Ø We are dealing with 2 independent groups, thus we use an independent samples t-test Ø We have to decide whether the assumption of equal variances is violated, in order to use the correct formulas: smallest SD × 2 > bigger SD → 1.9 × 2 > 4.2 → 3.8 > 4.2 (not true) Ø The equal variances assumption is violated (and the group sizes differ), thus we use the special case of the t-test, with df = smallest n − 1 = 24, which gives t_c = 1.711 for a 90% CI: $(\bar{X}_A - \bar{X}_B) \pm t_c \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}} = (22.8 - 28) \pm 1.711 \sqrt{\frac{4.2^2}{40} + \frac{1.9^2}{25}} = -5.2 \pm 1.711 \sqrt{0.5854} = -5.2 \pm 1.711 \times 0.77 = -5.2 \pm 1.32$ → [−6.52, −3.88]
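A sketch of the interval with unequal variances; `welch_ci_conservative` is a hypothetical helper, and t_c is passed in from the t-table (1.711 is the two-sided 90% value at df = 24, the conservative smallest-n-minus-1 choice used in the solution). Without rounding the standard error to 0.77, the bounds come out as about [−6.51, −3.89], i.e. option A up to rounding.

```python
from math import sqrt


def welch_ci_conservative(mean1, s1, n1, mean2, s2, n2, t_c):
    """CI for mean1 - mean2 without assuming equal variances; t_c comes from a t-table."""
    diff = mean1 - mean2
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # unpooled standard error
    return diff - t_c * se, diff + t_c * se


# t_c = 1.711: t-table value for df = min(40, 25) - 1 = 24, two-sided 90%.
lower, upper = welch_ci_conservative(22.8, 4.2, 40, 28.0, 1.9, 25, t_c=1.711)
print(f"[{lower:.2f}, {upper:.2f}]")
```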
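  • 88. 6E. T-tests Confidence Interval: General Formula Observed statistic ± t_c × standard error Example: Two-Sample T-Test n = 20 (both conditions), x̄_A = −2.1, x̄_B = −3.5, s_A = 2.05 and s_B = 1.89. What is the 95% CI? $(\bar{X}_A - \bar{X}_B) \pm t_c \times \left(s_p \times \sqrt{\frac{1}{n_A} + \frac{1}{n_B}}\right)$ $s_p = \sqrt{\frac{19 \times 2.05^2 + 19 \times 1.89^2}{19 + 19}} = 1.97$ $1.4 \pm 2.04 \times \left(1.97 \times \sqrt{\frac{1}{20} + \frac{1}{20}}\right) = [0.13; 2.67]$ Standard Errors of the Different T-Tests One-sample t-test: $\frac{s}{\sqrt{n}}$ Paired samples t-test: $\frac{s_d}{\sqrt{n}}$ Two-sample t-test: $s_p \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ Pooled standard deviation: $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}}$ Two-sample t-test, equal variances not assumed: $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$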
  • 89. Answers Question Suppose we are testing the null hypothesis that the population mean is equal to a specific value and the test is right-sided. Refer to the SPSS output. Which of the following statements is correct? A. The null hypothesis is rejected for a significance level of 2.5% B. The null hypothesis is not rejected for a significance level of 5% C. The degrees of freedom were found by taking the smallest sample size and subtracting 1 D. None of the alternatives is correct Answer: A 7. T-tests One-Sample Test (Test Value = 570): Test score | t = 2.139 | df = 29 | Sig. (2-tailed) = 0.041 | Mean Difference = 20.333 | 95% Confidence Interval of the Difference: Lower = 0.89, Upper = 39.77
  • 90. 7E. T-tests Question Suppose we are testing the null hypothesis that the population mean is equal to a specific value and the test is right-sided. Refer to the SPSS output. Which of the following statements is correct? Solution A. Correct. The SPSS output gives the p-value for a two-sided test. However, we have a one-tailed test (a right-sided test means that the alternative hypothesis has the (>) symbol). Because the observed t (2.139) is positive, i.e. in the direction of Hα, we can divide the p-value by two (0.041/2 = 0.0205). The corrected p-value is smaller than 0.025, thus H0 is rejected at α = 2.5% B. Incorrect. The corrected p-value is also smaller than 0.05, so H0 is rejected at α = 5% as well. C. Incorrect. Since we have a one-sample t-test, the degrees of freedom are N − 1 (30 − 1 = 29). Taking the smallest n and subtracting 1 is the rule for an independent samples t-test not assuming equal variances D. Incorrect. A is the correct answer One-Sample Test (Test Value = 570): Test score | t = 2.139 | df = 29 | Sig. (2-tailed) = 0.041 | Mean Difference = 20.333 | 95% Confidence Interval of the Difference: Lower = 0.89, Upper = 39.77
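Converting SPSS's two-sided "Sig." into a one-sided p-value can be sketched as follows; `one_sided_p` is a hypothetical helper. Halving is only valid when the observed t lies in the direction predicted by Hα; otherwise the one-sided p-value is 1 − p/2.

```python
def one_sided_p(p_two_sided, t_obs, alternative="greater"):
    """Convert a two-sided p-value (e.g. SPSS 'Sig. (2-tailed)') to a one-sided one."""
    if alternative == "greater":
        in_predicted_direction = t_obs > 0
    else:  # alternative == "less"
        in_predicted_direction = t_obs < 0
    half = p_two_sided / 2
    return half if in_predicted_direction else 1 - half


p = one_sided_p(0.041, t_obs=2.139, alternative="greater")
print(f"one-sided p = {p:.4f}; reject at 2.5%: {p < 0.025}")
```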
  • 91. 8. T-tests
    Question: A researcher wants to test whether ethnic background influences the IQ scores of Dutch primary school children. They draw a sample of 50 children with grandparents of Turkish origin and another 50 children with Dutch grandparents. Each child of Turkish descent is matched for age and sex with a Dutch child. The groups' data are summarized in the table below. A paired-samples t-test was used to test this hypothesis. Which of the following tests could have yielded the same result?
    Group statistics (Mean, N, Std. Deviation, Std. Error Mean):
    Turkish: 98.657, 50, 10.0023, 1.6523
    Dutch: 103.203, 50, 14.5602, 2.2436
    A. An independent t-test, assuming equal population variances.
    B. An independent t-test, assuming unequal population variances.
    C. A one-sample t-test, conducted for the difference in IQ score between matched children.
    D. None of the answers above.
    Answer: C
  • 92. 8E. T-tests
    Question: A researcher wants to test whether ethnic background influences the IQ scores of Dutch primary school children. They draw a sample of 50 children with grandparents of Turkish origin and another 50 children with Dutch grandparents. Each child of Turkish descent is matched for age and sex with a Dutch child. The groups' data are summarized in the table below. A paired-samples t-test was used to test this hypothesis. Which of the following tests could have yielded the same result?
    Group statistics (Mean, N, Std. Deviation, Std. Error Mean):
    Turkish: 98.657, 50, 10.0023, 1.6523
    Dutch: 103.203, 50, 14.5602, 2.2436
    Solution
    A. Incorrect. The two groups are matched, so they are dependent, not independent.
    B. Incorrect. The two groups are matched, so they are dependent, not independent.
    C. Correct. A paired-samples t-test computes the difference within each matched pair and tests whether the mean of these differences equals 0. This is exactly a one-sample t-test on the differences (test value 0), so both tests use the same calculations and give the same result.
    D. Incorrect. The answer is C.
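    A minimal Python sketch of option C, using small hypothetical scores for 5 matched pairs (not the study data): the paired test is computed as a one-sample t-test on the differences, and an independent-samples t-test on the same numbers is shown for contrast. The function names are illustrative only.

```python
import math
import statistics

def one_sample_t(values, mu0=0.0):
    """One-sample t statistic (mean - mu0) / (s / sqrt(n)) and df = n - 1."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / se, n - 1

def independent_t_pooled(x, y):
    """Independent-samples t statistic assuming equal variances (pooled SD)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / se, n1 + n2 - 2

# Hypothetical IQ scores for 5 matched pairs (illustration only).
turkish = [97, 101, 95, 99, 104]
dutch = [100, 103, 99, 98, 107]

# Paired-samples t-test == one-sample t-test on the within-pair differences (H0: mean diff = 0).
diffs = [a - b for a, b in zip(turkish, dutch)]
t_paired, df_paired = one_sample_t(diffs)
t_indep, df_indep = independent_t_pooled(turkish, dutch)
print(round(t_paired, 2), df_paired)  # -2.56 4
print(round(t_indep, 2), df_indep)    # -0.97 8
```

    The two statistics differ because matching removes the between-pair variability from the standard error, which is why options A and B cannot reproduce the paired result.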
  • 93. 9. T-tests
    Question: Inspect the given output. Which answer is correct?
    [SPSS output not available]
    A. Levene's test is not significant, therefore equal variances can be assumed.
    B. The T_obs is equal to −2.845
    C. According to the t-table, the null hypothesis is rejected
    D. All answers are correct.
    Answer: D
  • 94. 9E. T-tests
    Question: Inspect the given output. Which answer is correct?
    [SPSS output not available]
    Solution
    A. Correct. Levene's test has the null hypothesis that the population variances are equal ($\sigma_1^2 = \sigma_2^2$). The p-value (0.582) is much larger than 0.05, so this null hypothesis is not rejected and there is no violation of the equal-variances assumption.
    B. Correct. T_obs is the mean difference ($\bar{x}_1 - \bar{x}_2 = -14.00$) divided by the standard error of the difference ($s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 4.92$): −14.00/4.92 = −2.845.
    C. Correct. The null hypothesis is rejected because the value 0 is not located in the 95% CI, so a population difference of 0 between the 2 groups is rejected at α = 0.05. Equivalently, |T_obs| = 2.845 exceeds the two-sided critical t-value.
  • 95. 10. T-tests
    Question: Florian is the GM of Success Formula and has recently heard that colour can influence learning performance and outcomes. He was informed that research has shown that the colour blue leads to better performance on tests and better recall. The classrooms at SF, however, are painted white. Florian decides to test whether the colour blue indeed leads to better results than white. He gathers 38 students and assigns them to 2 groups. The groups are matched with regard to skill, age, motivation and more. One group takes the class in a room painted white, the other in a room painted blue. Afterwards, the mean test scores are compared. The population distribution of difference scores is normal. Florian gets the following SPSS output. Which statement is correct?
    SPSS output (Paired Differences), Pair 1 White − Blue: Mean = −.579, Std. Deviation = 2.524, Std. Error Mean = .579, 95% CI: Lower = −1.795, Upper = .637, t = −1.000, df = 18, Sig. (2-tailed) = .331
  • 96. 10. T-tests
    Question: Florian is the GM of Success Formula and has recently heard that colour can influence learning performance and outcomes. He was informed that research has shown that the colour blue leads to better performance on tests and better recall. The classrooms at SF, however, are painted white. Florian decides to test whether the colour blue indeed leads to better results than white. He gathers 38 students and assigns them to 2 groups. The groups are matched with regard to skill, age, motivation and more. One group takes the class in a room painted white, the other in a room painted blue. Afterwards, the mean test scores are compared. The population distribution of difference scores is normal. Florian gets the following SPSS output. Which statement is correct?
    A. There is a probability of 0.331 that H0 is true
    B. The researcher might be making a Type I error
    C. The researcher might be making a Type II error.
    D. Since T_obs is not located within the 95% CI, the null hypothesis can be rejected
    Answer: C
  • 97. 10E. T-tests
    Question: Florian is the GM of Success Formula and has recently heard that colour can influence learning performance and outcomes. He was informed that research has shown that the colour blue leads to better performance on tests and better recall. The classrooms at SF, however, are painted white. Florian decides to test whether the colour blue indeed leads to better results than white. He gathers 38 students and assigns them to 2 groups. The groups are matched with regard to skill, age, motivation and more. One group takes the class in a room painted white, the other in a room painted blue. Afterwards, the mean test scores are compared. The population distribution of difference scores is normal. Florian gets the following SPSS output. Which statement is correct?
    Solution
    A. Incorrect. The p-value is 0.331, and it is defined as the probability that our data (or more extreme data) would have occurred, given that the null hypothesis is true. It is a conditional probability computed under H0, so it does not give the probability that H0 is true.
    B. Incorrect. A Type I error means rejecting a true null hypothesis. Our p-value is larger than 0.05, so we did not reject the null hypothesis in the first place. The probability that we are making a Type I error in this case is 0%.
    C. Correct. A Type II error means not rejecting a false null hypothesis. Since the p-value is larger than our significance level, we did not reject H0, so there is always a chance that H0 is actually false and we made a Type II error.
    D. Incorrect. When using the CI to decide whether H0 is rejected in a paired-samples t-test, we check whether the value 0 (not T_obs) is located in the interval, because the null hypothesis states that there is no difference. Here 0 lies inside [−1.795, .637], so H0 is not rejected.
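    The SPSS numbers can be reproduced from the summary statistics with a short Python sketch. It assumes 19 matched pairs (38 students in 2 matched groups), and t* = 2.101 is a t-table value for df = 18 entered by hand; the helper name is hypothetical.

```python
import math

def paired_t_from_summary(mean_diff: float, sd_diff: float, n_pairs: int):
    """Return (SE, t, df) of a paired-samples t-test from summary statistics."""
    se = sd_diff / math.sqrt(n_pairs)  # standard error of the mean difference
    return se, mean_diff / se, n_pairs - 1

# Summary values from the SPSS output: 38 students in matched groups -> 19 pairs.
se, t_obs, df = paired_t_from_summary(-0.579, 2.524, 19)

# 95% CI for the mean difference; t* = 2.101 is an assumed t-table value for df = 18.
t_crit = 2.101
lower, upper = -0.579 - t_crit * se, -0.579 + t_crit * se
print(round(se, 3), round(t_obs, 2), df)  # 0.579 -1.0 18
print(lower <= 0 <= upper)                # True -> 0 is plausible, H0 not rejected
```

    Because 0 lies inside the interval, H0 is not rejected, so only a Type II error is possible here.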
  • 98. 10E. T-tests
    Type I error: the null hypothesis is true but we reject it → measured with α
    Type II error: the null hypothesis is false but we fail to reject it → measured with β
    [Figure: graphical illustration of Type I and Type II errors]
  • 99. Stats I – Question Pool – ANOVA
  • 100. 1. ANOVA
    Question: ANOVA assumes the following statistical model: $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, in which $Y_{ij}$ denotes the score of person j in group i. Choose the incorrect statement:
    A. $\mu_1 = Y_{1j} - \varepsilon_{1j}$ represents the mean of group 1
    B. $\varepsilon_{ij}$ has a different value for each individual participant, regardless of treatment effects.
    C. $\mu$ is a variable effect, specific to each participant.
    D. If there is no treatment effect, $\alpha_i$ is equal among all participants.
    Answer: C
  • 101. 1E. ANOVA
    Question: ANOVA assumes the following statistical model: $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, in which $Y_{ij}$ denotes the score of person j in group i. Choose the incorrect statement:
    Solution
    A. Correct. The deviation of an individual score from its group mean reflects the unexplained variation caused by factors that are not controlled. It can be written as $\varepsilon_{ij} = Y_{ij} - \mu_i \Leftrightarrow \mu_i = Y_{ij} - \varepsilon_{ij}$.
    B. Correct. Individual differences are uncontrollable factors that make the scores of participants within the same group diverge. Regardless of the treatment effects, these individual differences (residual factors) differ from participant to participant.
    C. Incorrect. µ is a constant effect: it refers to the factors that are the same in all conditions and is identical for every participant.
    D. Correct. If there is no treatment effect, $\alpha_i = 0$ for all groups, and therefore for all participants.
  • 102. 1E. ANOVA
    Main formula: $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
    Score of participant j in group i = constant effect + effect of group i + effect of the remaining factors of participant j in group i (error)
    Sum of squares: $\sum_{i,j}(Y_{ij} - \bar{Y})^2 = \sum_i n_i(\bar{Y}_i - \bar{Y})^2 + \sum_{i,j}(Y_{ij} - \bar{Y}_i)^2$
    Total sum of squares (TSS) = between-group sum of squares (SSG) + within-group sum of squares (SSE)
  • 103. 1E. ANOVA
    Example: 3 different conditions with 3 participants each
    G1: P1 = 1, P2 = 2, P3 = 3
    G2: P1 = 3, P2 = 4, P3 = 5
    G3: P1 = 5, P2 = 6, P3 = 7
    Preparation
    What is the mean of each group? $\bar{Y}_1 = (1+2+3)/3 = 2$, $\bar{Y}_2 = (3+4+5)/3 = 4$, $\bar{Y}_3 = (5+6+7)/3 = 6$
    What is the total mean? $\bar{Y} = (2+4+6)/3 = 4$ (averaging the group means works here because all groups have the same size)
    SSG (between groups)
    $SSG = \sum_i n_i(\bar{Y}_i - \bar{Y})^2 = 3(2-4)^2 + 3(4-4)^2 + 3(6-4)^2 = 24$
    Tip: alternative notation $\alpha_i = \mu_i - \mu$. Here $\mu_i = \bar{Y}_i$ (mean of a single group) and $\mu = \bar{Y}$ (total mean).
    SSW (within groups)
    $SSW = \sum_{i,j}(Y_{ij} - \bar{Y}_i)^2 = (1-2)^2 + (2-2)^2 + (3-2)^2 + (3-4)^2 + \dots + (7-6)^2 = 6$
    Tip: alternative notation $\varepsilon_{ij} = Y_{ij} - \mu_i$. Here $\mu_i$ is the same as $\bar{Y}_i$; both describe the mean of a single group.
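    The same decomposition can be checked with a minimal Python sketch (the function name is illustrative). Note that it computes the total mean over all scores; with equal group sizes this equals the mean of the group means used above.

```python
def anova_sums_of_squares(groups):
    """Return (SST, SSG, SSE) for a one-way layout given a list of groups of scores."""
    all_scores = [y for g in groups for y in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    sst = sum((y - grand_mean) ** 2 for y in all_scores)
    return sst, ssg, sse

groups = [[1, 2, 3], [3, 4, 5], [5, 6, 7]]  # G1, G2, G3 from the example
sst, ssg, sse = anova_sums_of_squares(groups)
print(sst, ssg, sse)  # 30.0 24.0 6.0 -> TSS = SSG + SSE
```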
  • 104. 2. ANOVA
    Question: Participants were asked to memorise a list of words. They were divided into several groups, each using a different memorization technique. 60 minutes later, the experimenter assessed how many words they could still remember (the dependent variable RECALL in the output). Which statement is correct?
    SPSS ANOVA output (partially recovered): Sum of Squares: Between Groups = 41.566, Within Groups = 41.850, Total = 83.416; Mean Square: Between Groups = 20.783, Within Groups = 2.790
    A. The experimental setting had 3 conditions.
    B. The total variance equals 4.91
    C. The ANOVA test is significant (α = 5%).
    D. All answers are correct.
    Answer: D
  • 105. 2E. ANOVA
    Question: Participants were asked to memorise a list of words. They were divided into several groups, each using a different memorization technique. 60 minutes later, the experimenter assessed how many words they could still remember (the dependent variable RECALL in the output). Which statement is correct?
    SPSS ANOVA output (partially recovered): Sum of Squares: Between Groups = 41.566, Within Groups = 41.850, Total = 83.416; Mean Square: Between Groups = 20.783, Within Groups = 2.790
    Solution
    A. Correct. The degrees of freedom between groups are given by k − 1 ("number of groups minus 1"). In our case we had 3 conditions, so df = 3 − 1 = 2.
    B. Correct. The total variance is $MS_{total} = \frac{SST}{df_T} = \frac{83.416}{17} = 4.91$.
    C. Correct. The SPSS output gives a p-value of 0.006 for F = 7.447. The p-value is smaller than the 5% significance level, so the test is significant.
    D. Correct: all three statements above are true.
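    A quick Python check of options B and C from the recovered sums of squares. This sketch assumes N = 18, inferred from $df_T = 17$ (which also gives $df_W = 15$ and matches MS Within = 41.850/15 = 2.790); the small difference from the reported F = 7.447 presumably comes from rounding of the printed sums of squares.

```python
# Values recovered from the SPSS output for RECALL.
ss_between, ss_within, ss_total = 41.566, 41.850, 83.416
k = 3          # number of conditions (df between = k - 1 = 2)
n_total = 18   # assumption: inferred from df_total = 17

df_between, df_within, df_total = k - 1, n_total - k, n_total - 1
ms_between = ss_between / df_between   # 20.783
ms_within = ss_within / df_within      # 2.790
ms_total = ss_total / df_total         # the "total variance"
f_obs = ms_between / ms_within

print(round(ms_total, 2), round(f_obs, 2))  # 4.91 7.45
```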
  • 106. 3. ANOVA
    Question: A sample of n = 35 participants was randomly selected from the UM student pool. A baseline assessment rated their arachnophobia. After 2 sessions of exposure therapy (to spiders), their arachnophobia was measured again with the same scale. The researcher wants to see whether the 2 sessions of exposure therapy had a significant effect. Should an ANOVA be performed on this data set?
    A. Yes, the normality assumption holds since the sample size is big enough.
    B. Yes, the equal variances assumption is met because 35 participants were tested both times.
    C. No, the independence assumption is violated.
    D. Yes, the data are quantitative as their phobia is rated on a scale.
    Answer: C
  • 107. 3E. ANOVA
    Question: A sample of n = 35 participants was randomly selected from the UM student pool. A baseline assessment rated their arachnophobia. After 2 sessions of exposure therapy (to spiders), their arachnophobia was measured again with the same scale. The researcher wants to see whether the 2 sessions of exposure therapy had a significant effect. Should an ANOVA be performed on this data set?
    Solution
    A. Incorrect. The sample may be large enough for normality, but the main requirement of an ANOVA, independent groups, is violated. Thus, an ANOVA is not the suitable test here.
    B. Incorrect. The same sample is tested twice (baseline and after exposure), so we are not comparing independent groups.
    C. Correct. The same sample is tested twice (baseline and after exposure), so we are not comparing independent groups and the independence assumption is violated. A paired-samples t-test fits this before/after design.
    D. Incorrect. The data are quantitative, but the main requirement of an ANOVA, independent groups, is violated. Thus, an ANOVA is not the suitable test here.
  • 108. 4. ANOVA
    Question: An experiment on the effect of listening to music on information retention is performed. A total sample of 75 is divided into three equally large groups. All three groups are asked to memorize a list of words while either (a) listening to Vivaldi, (b) listening to AC/DC, or (c) listening to crickets singing. An analysis of variance is performed, and it is concluded that the null hypothesis cannot be rejected. Which statement is correct?
    A. MSG and MSE are both unbiased estimators of the error variance.
    B. Since the null hypothesis is true, the difference between groups is as large as the difference within groups.
    C. There is no group effect.
    D. All are correct
    Answer: D
  • 109. 4E. ANOVA
    Question: An experiment on the effect of listening to music on information retention is performed. A total sample of 75 is divided into three equally large groups. All three groups are asked to memorize a list of words while either (a) listening to Vivaldi, (b) listening to AC/DC, or (c) listening to crickets singing. An analysis of variance is performed, and it is concluded that the null hypothesis cannot be rejected. Which statement is correct?
    Solution
    A. Correct. If H0 is true (which is what we keep assuming when it cannot be rejected), the differences between groups are caused only by uncontrolled factors (error). This means that MS(G) is an unbiased estimator of the error variance. MSE is an unbiased estimator of the error variance in any case.
    B. Correct. The difference between groups is measured by MSG, while the difference within groups is measured by MSE. If the null hypothesis is true, both MSE and MSG are unbiased estimators of the error variance, so MSE ≈ MSG.
    C. Correct. The H0 for ANOVA states that the means of all groups are equal, meaning that there is no treatment effect.
    D. Correct.
  • 110. 4E. ANOVA
    Unbiased estimator
    • MSE is an unbiased estimator of the error variance.
    Pooled variance
    • Since we already assume that all populations have equal variance, we can take a weighted average of the group variance estimates:
    $S_p^2 = \frac{(N_1 - 1)S_1^2 + (N_2 - 1)S_2^2 + \dots + (N_k - 1)S_k^2}{(N_1 - 1) + (N_2 - 1) + \dots + (N_k - 1)}$
    Conclusion
    • $MSE = S_p^2$
    • An accurate and efficient estimate of the error variance.
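    The identity $MSE = S_p^2$ can be illustrated with a small Python sketch (function names are illustrative). It computes both quantities for the three groups of the earlier SSG/SSW example and for a second, hypothetical set of unequal-sized groups.

```python
import statistics

def pooled_variance(groups):
    """S_p^2: average of the group sample variances, weighted by n_i - 1."""
    numerator = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    denominator = sum(len(g) - 1 for g in groups)
    return numerator / denominator

def mse(groups):
    """MSE = SSE / (N - k)."""
    sse = sum((y - statistics.mean(g)) ** 2 for g in groups for y in g)
    n_total = sum(len(g) for g in groups)
    return sse / (n_total - len(groups))

groups = [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
print(pooled_variance(groups), mse(groups))  # 1.0 1.0

unequal = [[1, 2, 3, 4], [2, 4, 9]]          # hypothetical groups of different sizes
print(pooled_variance(unequal), mse(unequal))
```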
  • 111. 4E. ANOVA
    Random variables: MSG and MSE are random variables.
    MSE and MSG as estimators of error variance: if there is no group effect ($H_0$ true), both MSE and MSG are unbiased estimators of the error variance.
    Relation of MSE and MSG: MSE reflects the error (or noise); MSG reflects the error plus the effect of the group. If $H_0$ is true and there is no group effect, MSE and MSG will be approximately equal. Put differently, the difference between groups is as large as the difference within groups.
  • 112. 5. ANOVA
    Question: Synesthesia is a perceptual phenomenon in which stimulation of one sensory or cognitive pathway also triggers an experience in a second pathway. Synesthesia has been linked to enhanced memory skills due to the increased number of available associations. Anton wonders whether memory recall differs between synesthesia types. He gathers 120 participants; within his sample there are 4 different synesthesia types, each with an equal number of participants. After a memorization period, Anton gives his participants a memory test. Following an ANOVA, SSG = 167.91 and SSE = 1760.88. What can be concluded?
    A. H0 not rejected with p-value > 0.05
    B. H0 rejected with 0.025 ≤ p-value ≤ 0.05
    C. H0 rejected with 0.01 ≤ p-value ≤ 0.025
    D. H0 not rejected because F_obs < F_critical
    Answer: C
  • 113. 5E. ANOVA
    Question: Synesthesia is a perceptual phenomenon in which stimulation of one sensory or cognitive pathway also triggers an experience in a second pathway. Synesthesia has been linked to enhanced memory skills due to the increased number of available associations. Anton wonders whether memory recall differs between synesthesia types. He gathers 120 participants; within his sample there are 4 different synesthesia types, each with an equal number of participants. After a memorization period, Anton gives his participants a memory test. Following an ANOVA, SSG = 167.91 and SSE = 1760.88. What can be concluded?
    Solution
    • Calculate the degrees of freedom: $df(G) = k - 1 = 4 - 1 = 3$, $df(E) = N - k = 120 - 4 = 116$
    • Calculate the mean squares: $MS(G) = \frac{SSG}{df(G)} = \frac{167.91}{3} = 55.97$, $MS(E) = \frac{SSE}{df(E)} = \frac{1760.88}{116} = 15.18$
    • Calculate the F-value: $F = \frac{MS(G)}{MS(E)} = \frac{55.97}{15.18} = 3.687$
    • The F-table gives $F_c(3, 116) = 2.70$ for α = 0.05, so $F_{obs} > F_c$ and the null hypothesis is rejected.
    • For α = 0.01, the F-table gives $F_c = 3.98 > F_{obs}$, so H0 would not be rejected at the 1% level. For α = 0.025 the critical value (about 3.2) is still below $F_{obs}$, so $0.01 \le p \le 0.025$.
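    The steps above can be sketched in Python. The helper name is hypothetical, and the critical values are entered by hand from the F-table (2.70 and 3.98, as on the slide) because the standard library has no F-distribution.

```python
def one_way_f(ssg, sse, k, n_total):
    """Return (MS(G), MS(E), F, (df_G, df_E)) for a one-way ANOVA."""
    df_g, df_e = k - 1, n_total - k
    ms_g, ms_e = ssg / df_g, sse / df_e
    return ms_g, ms_e, ms_g / ms_e, (df_g, df_e)

ms_g, ms_e, f_obs, dfs = one_way_f(167.91, 1760.88, k=4, n_total=120)
print(dfs, round(ms_g, 2), round(ms_e, 2), round(f_obs, 3))  # (3, 116) 55.97 15.18 3.687

# Critical values for df = (3, 116) as read from the F-table (assumed inputs).
for alpha, f_crit in [(0.05, 2.70), (0.01, 3.98)]:
    print(alpha, "reject H0" if f_obs > f_crit else "do not reject H0")
```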
  • 114. 6. ANOVA
    Question: Based on the ANOVA output, which of the following statements is correct?
    ANOVA output:
    Between Groups: Sum of Squares = 126, df = 1, Mean Square = 126, F = 4.4834, Sig. = ?
    Within Groups: Sum of Squares = 1630, df = 58, Mean Square = 28.1034
    Total: Sum of Squares = 1756, df = 59
    A. The scores on the dependent variable likely vary due to residual effects only.
    B. The scores on the dependent variable likely vary due to residual effects and group effect.
    C. The scores on the dependent variable likely vary due to group effect only.
    D. The scores on the dependent variable likely vary due to neither residual effects nor the group effect.
    Answer: B
  • 115. 6E. ANOVA
    Question: Based on the ANOVA output, which of the following statements is correct?
    ANOVA output:
    Between Groups: Sum of Squares = 126, df = 1, Mean Square = 126, F = 4.4834, Sig. = ?
    Within Groups: Sum of Squares = 1630, df = 58, Mean Square = 28.1034
    Total: Sum of Squares = 1756, df = 59
    Solution
    • Using the F-table, the critical value for α = 0.05 is $F_c(1, 58) = 4.03$.
    • F_obs is bigger than F_c, so the null hypothesis is rejected.
    • There is an overall treatment effect, so not all group means are the same.
    • However, error cannot be controlled for, so it is always present.
    Scores likely vary due to the treatment/group effect AND error/residual factors.
  • 116. 7. ANOVA
    Question: Maja conducted a study with 5 conditions and recruited 30 participants in total. Choose the correct statement:
    ANOVA output (partially missing): Sum of Squares Within Groups = 2244.500, Sum of Squares Total = 9041.367; all other cells are missing (?)
    A. F = 21.801, not significant
    B. F = 17.474, not significant
    C. F = 19.625, significant
    D. F = 18.926, significant
    Answer: D
  • 117. 7E. ANOVA
    Question: Maja conducted a study with 5 conditions and recruited 30 participants in total. Choose the correct statement:
    ANOVA output (partially missing): Sum of Squares Within Groups = 2244.500, Sum of Squares Total = 9041.367; all other cells are missing (?)
    Solution
    1) Calculate SS(G): $SST = SSG + SSE \Rightarrow SSG = SST - SSE = 9041.367 - 2244.5 = 6796.867$
    2) Calculate the degrees of freedom: $df(G) = k - 1 = 5 - 1 = 4$, $df(E) = N - k = 30 - 5 = 25$, $df(T) = N - 1 = 30 - 1 = 29$
    3) Calculate the mean squares: $MS(G) = \frac{SSG}{df(G)} = \frac{6796.867}{4} = 1699.217$, $MS(E) = \frac{SSE}{df(E)} = \frac{2244.5}{25} = 89.780$
    4) Calculate the F-value: $F = \frac{MS(G)}{MS(E)} = \frac{1699.217}{89.780} = 18.926$
    5) Use the F-table to reach your decision: $F_c(4, 25) = 2.76 \Rightarrow F_{obs} > F_c \Rightarrow$ significant
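    A minimal Python sketch that fills in the missing cells from SST and SSE, following steps 1 to 4 above. The function and dictionary keys are illustrative; the critical value 2.76 is the tabled $F_c(4, 25)$ from step 5, entered by hand.

```python
def complete_anova_table(ss_total, ss_error, k, n_total):
    """Fill in the missing cells of a one-way ANOVA table from SST and SSE."""
    ss_group = ss_total - ss_error                 # SST = SSG + SSE
    df_g, df_e, df_t = k - 1, n_total - k, n_total - 1
    ms_g, ms_e = ss_group / df_g, ss_error / df_e
    return {"SSG": ss_group, "dfG": df_g, "dfE": df_e, "dfT": df_t,
            "MSG": ms_g, "MSE": ms_e, "F": ms_g / ms_e}

table = complete_anova_table(9041.367, 2244.500, k=5, n_total=30)
print(round(table["F"], 3))  # 18.926
print(table["F"] > 2.76)     # True -> significant at alpha = 0.05
```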
  • 118. 8. ANOVA
    Question: Micheal is a sports enthusiast. He wants to investigate which form of exercise leads to better concentration. He recruits 75 participants and randomly assigns them to 3 groups (cardio, weights, crossfit). He later measures their concentration levels and compares the group means. Given that Micheal ended up rejecting the null hypothesis, which of the following is correct?
    A. There is no difference in concentration levels between groups
    B. Micheal can confidently say that cardio is better than weights
    C. Micheal needs an extra statistical analysis
    D. There is no treatment effect
    Answer: C
  • 119. 8E. ANOVA
    Question: Micheal is a sports enthusiast. He wants to investigate which form of exercise leads to better concentration. He recruits 75 participants and randomly assigns them to 3 groups (cardio, weights, crossfit). He later measures their concentration levels and compares the group means. Given that Micheal ended up rejecting the null hypothesis, which of the following is correct?
    Solution
    A. Incorrect. The null hypothesis states that all group means are the same (no treatment effect). By rejecting it, we can conclude that not all group means are the same.
    B. Incorrect. Rejecting the null hypothesis tells us that not all group means are the same, but not where exactly the difference lies (i.e., between which groups).
    C. Correct. To uncover the exact nature of the group difference, we need to conduct multiple comparisons.
    D. Incorrect. The null hypothesis was rejected, so there is a treatment effect.
  • 120. 9. ANOVA
    Question: Micheal did conduct multiple comparisons to examine the differences between groups. What can be concluded based on the SPSS output?
    SPSS output (Dependent Variable: Concentration scores, LSD)
    (I) Group vs (J) Group: Mean Difference, Std. Error, Sig., 95% Confidence Interval [Lower Bound, Upper Bound]
    Cardio vs Weights: 0.1762, 0.5102, 0.730, [−.8338, 1.1861]
    Cardio vs Crossfit: 1.4606, 0.5470, 0.009, [.3778, 2.5435]
    Weights vs Cardio: −.1762, 0.5102, 0.730, [−1.1861, .8338]
    Weights vs Crossfit: 1.2844, 0.5696, 0.026, [.1569, 2.4119]
    Crossfit vs Cardio: −1.4606, 0.5470, 0.009, [−2.5435, −.3778]
    Crossfit vs Weights: −1.2844, 0.5696, 0.026, [−2.4119, −.1569]
  • 121. 9. ANOVA
    Question: Micheal did conduct multiple comparisons to examine the differences between groups. What can be concluded based on the SPSS output?
    A. There are 2 statistically significant comparisons
    B. There is 1 statistically significant comparison
    C. All three comparisons are statistically significant
    D. None of the comparisons reaches significance
    Answer: B
  • 122. 9E. ANOVA
    Question: Micheal did conduct multiple comparisons to examine the differences between groups. What can be concluded based on the SPSS output?
    Family-wise Type I error: when several comparisons are made, the α of each comparison accumulates, so the chance of making at least one Type I error increases.
    Solution
    • The output does show 2 comparisons that reach significance (cardio-crossfit, weights-crossfit), but no Bonferroni correction has been applied for the family-wise Type I error.
    • After applying the Bonferroni correction (multiply each p-value by the number of comparisons, here 3), only the comparison between cardio and crossfit remains significant: 0.009 × 3 = 0.027 < 0.05, whereas 0.026 × 3 = 0.078 > 0.05.
    Bonferroni correction
    1. Multiply each p-value by the number of comparisons, or
    2. Divide the significance level by the number of comparisons.
    Number of comparisons: k(k − 1)/2 (here 3 × 2/2 = 3)
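    The Bonferroni step can be sketched in Python as follows. The helper is hypothetical; it takes the three distinct LSD p-values from the output, multiplies them by k(k − 1)/2 and caps the result at 1.

```python
def bonferroni(p_values, k_groups, alpha=0.05):
    """Bonferroni-adjust pairwise p-values: multiply by the number of comparisons (max 1)."""
    m = k_groups * (k_groups - 1) // 2  # number of pairwise comparisons
    adjusted = {pair: min(1.0, p * m) for pair, p in p_values.items()}
    significant = [pair for pair, p_adj in adjusted.items() if p_adj < alpha]
    return m, adjusted, significant

# The three distinct LSD p-values from the SPSS output.
lsd_p = {("Cardio", "Weights"): 0.730, ("Cardio", "Crossfit"): 0.009, ("Weights", "Crossfit"): 0.026}
m, adjusted, significant = bonferroni(lsd_p, k_groups=3)
print(m, significant)  # 3 [('Cardio', 'Crossfit')]
```

    Comparing the raw p-values with α/3 ≈ 0.0167 (method 2) leads to the same conclusion.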
  • 123. 10. ANOVA
    Question: Given that the groups have equal sample sizes and the following output, which statement is correct?
    ANOVA output:
    Between Groups: Sum of Squares = 126, df = 1, Mean Square = 126, F = 4.4834, Sig. = ?
    Within Groups: Sum of Squares = 1630, df = 58, Mean Square = 28.1034
    Total: Sum of Squares = 1756, df = 59
    A. The normality assumption was violated, so the test should not have been done
    B. An independent-samples t-test could be done instead of ANOVA
    C. MSE is smaller than MSG, hence the treatment effects are significant
    D. If the test is significant, multiple comparisons are the necessary next step
    Answer: B
  • 124. 10E. ANOVA
    Question: Given that the groups have equal sample sizes and the following output, which statement is correct?
    ANOVA output:
    Between Groups: Sum of Squares = 126, df = 1, Mean Square = 126, F = 4.4834, Sig. = ?
    Within Groups: Sum of Squares = 1630, df = 58, Mean Square = 28.1034
    Total: Sum of Squares = 1756, df = 59
    Solution
    A. Incorrect. The total sample size is 60 (N − 1 = 59 → N = 60). With 30 participants per group, the CLT can be applied, so the test is robust against a violation of normality.
    B. Correct. Since there are just 2 groups (df between groups = 1), an independent-samples t-test with equal variances assumed is equivalent to this ANOVA ($F = t^2$).
    C. Incorrect. MSE is indeed smaller than MSG, so F is bigger than 1, but that alone does not make the effect significant; we always have to rely on the p-value (or compare F with the critical value).
    D. Incorrect. With only two groups, a significant test immediately tells us between which groups the difference lies, so multiple comparisons are not necessary. We can still examine the difference further if we want to see what it looks like.
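    Option B relies on $F = t^2$ for two groups. A small Python sketch with hypothetical scores (not the data behind the output) shows the equivalence between the pooled-variance t-test and the one-way ANOVA; the function names are illustrative.

```python
import math
import statistics

def pooled_t(x, y):
    """Independent-samples t statistic with pooled variance (equal variances assumed)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def one_way_anova_f(groups):
    """F = MS(G) / MS(E) for a one-way ANOVA."""
    scores = [v for g in groups for v in g]
    grand = statistics.mean(scores)
    ssg = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    sse = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    k, n = len(groups), len(scores)
    return (ssg / (k - 1)) / (sse / (n - k))

# Hypothetical scores for two groups of 5.
a = [4, 6, 5, 7, 8]
b = [7, 9, 8, 10, 11]
t = pooled_t(a, b)
f = one_way_anova_f([a, b])
print(t ** 2, f)  # 9.0 9.0
```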
  • 125. Stats1 – Question Pool Proportions, Entire Distributions
  • 126. 1. Proportions and Entire Distributions
    Question: Florian is the new general manager at Success Formula, replacing Michalina. Success Formula offers courses in Psychology, Business Economics and Law. While Michalina was GM, 60% of the student population at SF attended Business Economics courses, 25% Psychology courses and 15% Law courses. After an intense marketing campaign, Florian believes that this year things will be different. In a simple random sample of 275 students, 145 chose B/E courses, 75 chose Psychology and 55 chose Law. Based on the data, Florian wants to test whether the population distribution of field choice has changed or is still the same as during Michalina's time as GM. Does the result from the sample give sufficient evidence?
    A. No, the null hypothesis is not rejected with the observed value of the test statistic equal to 1.23
    B. Yes, the null hypothesis is rejected with the observed value of the test statistic equal to 7.57
    C. No, the null hypothesis is not rejected with the observed value of the test statistic equal to 2.50
    D. Yes, the null hypothesis is rejected with the observed value of the test statistic equal to 9.93
    Answer: B
  • 127. 1E. Proportions and Entire Distributions
    Question: Florian is the new general manager at Success Formula, replacing Michalina. Success Formula offers courses in Psychology, Business Economics and Law. While Michalina was GM, 60% of the student population at SF attended Business Economics courses, 25% Psychology courses and 15% Law courses. After an intense marketing campaign, Florian believes that this year things will be different. In a simple random sample of 275 students, 145 chose B/E courses, 75 chose Psychology and 55 chose Law. Based on the data, Florian wants to test whether the population distribution of field choice has changed or is still the same as during Michalina's time as GM. Does the result from the sample give sufficient evidence?
    Solution
    • There is only 1 categorical variable (field of study chosen), and it has more than 2 levels (3).
    • We want to see how well the sample distribution fits a specific model.
    • We therefore use the χ² goodness-of-fit test.
  • 128. 1E. Proportions and Entire Distributions
    Data (observed counts, N = 275): Business/Economics 145, Psychology 75, Law 55
    Model: B/E (60%), Psy (25%), Law (15%)
    H0: the distribution within the sample fits the model
    Ha: the distribution within the sample does not fit the model
    Solution
    • Calculate the expected counts [$EC = N \times P(e)$]: B/E: 275 × 0.60 = 165; Psy: 275 × 0.25 = 68.75; Law: 275 × 0.15 = 41.25
    • Calculate chi-square: $\chi^2 = \sum \frac{(OC - EC)^2}{EC} = \frac{(145 - 165)^2}{165} + \frac{(75 - 68.75)^2}{68.75} + \frac{(55 - 41.25)^2}{41.25} = 2.42 + 0.57 + 4.58 = 7.57$ (7.58 without intermediate rounding)
    • Degrees of freedom: df = number of cells − 1 = 3 − 1 = 2
    • Check the χ² table (df = 2) for the p-value: $0.02 \le p \le 0.025$
    The p-value is lower than 0.05, so the H0 that the distribution within the sample fits the model is rejected.
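    The goodness-of-fit calculation can be sketched in Python (helper names are hypothetical). For the p-value it uses the fact that for df = 2 the chi-square upper-tail probability has the closed form $e^{-\chi^2/2}$; that shortcut does not hold for other df, where a table or a statistics library is needed.

```python
import math

def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic, df and expected counts."""
    n = sum(observed)
    expected = [n * p for p in expected_props]  # EC = N x P(e)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1, expected

def chi2_sf_df2(x):
    """Upper-tail p-value of a chi-square statistic; this closed form holds ONLY for df = 2."""
    return math.exp(-x / 2)

chi2, df, expected = chi_square_gof([145, 75, 55], [0.60, 0.25, 0.15])
assumption_ok = all(e > 5 for e in expected)  # rule of thumb from the slides
p_value = chi2_sf_df2(chi2) if df == 2 else None
print(round(chi2, 2), df, round(p_value, 4))  # 7.58 2 0.0226
```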
  • 129. 1E. Proportions and Entire Distributions
    When to use
    Data type: categorical data → check how well a proposed distribution of proportions fits an observed one.
    $H_0$: the distribution within the sample fits our expectation
    $H_a$: the distribution within the sample does not fit our expectation
    Formula: $\chi^2 = \sum \frac{(Obs - Exp)^2}{Exp}$, with expected counts $EC = N \times p(e)$
    Assumptions: • Categorical data • Expected counts > 5
    Degrees of freedom: df = number of cells − 1
    Example (nationality of a class): Dutch 0.2, German 0.5, Belgian 0.2, French 0.1 → df = 4 − 1 = 3
  • 130. 2. Proportions and Entire Distributions
    Question: Andreia has been researching the effectiveness of dialectical behavior therapy (DBT), a type of cognitive behavioural therapy, for developing healthy ways to cope with stress and regulate emotions. She wonders whether DBT is equally effective in different populations. She decides to take two samples, one of people with eating disorders and one of people with substance use disorders. After several sessions, Andreia and her team note for each subject whether there was improvement or not. Andreia is the first researcher to conduct such a study, so she does not know how the type of disorder might affect improvement. What can be concluded?
    Improvement (Yes / No / Total) by disorder:
    Eating disorders: 148 / 112 / 260
    Substance use disorders: 173 / 102 / 275
    Total: 321 / 214 / 535
  • 131. 2. Proportions and Entire Distributions
    Question: Andreia has been researching the effectiveness of dialectical behavior therapy (DBT), a type of cognitive behavioural therapy, for developing healthy ways to cope with stress and regulate emotions. She wonders whether DBT is equally effective in different populations. She decides to take two samples, one of people with eating disorders and one of people with substance use disorders. After several sessions, Andreia and her team note for each subject whether there was improvement or not. Andreia is the first researcher to conduct such a study, so she does not know how the type of disorder might affect improvement. What can be concluded?
    A. The null hypothesis is not rejected with the observed value of the test statistic equal to 0.98
    B. The null hypothesis is rejected with the observed value of the test statistic equal to 1.36
    C. The null hypothesis is not rejected with the observed value of the test statistic equal to −1.36
    D. The null hypothesis is rejected with the observed value of the test statistic equal to −2.71
    Answer: C
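  • 132. 2E. Proportions and Entire Distributions
    Data
    • We now compare 2 independent samples.
    • The dependent variable is dichotomous.
    • We therefore use a two-proportion z-test.
    $H_0: \pi_1 = \pi_2$, $H_a: \pi_1 \neq \pi_2$
    $p_1 = \frac{x_1}{n_1} = \frac{148}{260} = 0.5692$, $p_2 = \frac{x_2}{n_2} = \frac{173}{275} = 0.6291$, pooled proportion $\hat{\pi} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{148 + 173}{260 + 275} = 0.6$
    Solution
    • Calculate Z (under H0, $\pi_1 - \pi_2 = 0$): $Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\hat{\pi}(1 - \hat{\pi})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.5692 - 0.6291}{\sqrt{0.6(1 - 0.6)} \cdot \sqrt{\frac{1}{260} + \frac{1}{275}}} = \frac{-0.0599}{0.49 \cdot 0.0865} \approx -1.41$
    (Rounding $p_1$, $p_2$ to 0.57, 0.63 and $\sqrt{1/260 + 1/275}$ to 0.09 gives −0.06/(0.49 × 0.09) = −1.36, the value in option C; the conclusion is the same.)
    • Look up the p-value in the Z-table: P(Z ≤ −1.41) = 0.0793
    • Double the p-value, since it is a two-tailed test: 2 × 0.0793 = 0.1586 > 0.05
    The null hypothesis cannot be rejected.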
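    The z-test can be reproduced with a short Python sketch (function names are hypothetical; the normal CDF is built from `math.erf`). Computed without intermediate rounding it gives z ≈ −1.41; the −1.36 in option C comes from rounding $\sqrt{1/260 + 1/275}$ up to 0.09, and the decision (not rejected) is the same either way.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_proportion_z(x1, n1, x2, n2):
    """Two-proportion z-test with pooled SE under H0: pi1 = pi2; returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * normal_cdf(-abs(z))

z, p = two_proportion_z(148, 260, 173, 275)
print(round(z, 2), round(p, 2))  # -1.41 0.16
```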
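  • 133. 2E. Proportions and Entire Distributions
    When to use: comparing the proportions of two groups (categorical data).
    $H_0: p_1 = p_2$; $H_a: p_1 \neq p_2$ (two-sided); $H_a: p_1 < p_2$ or $H_a: p_1 > p_2$ (one-sided)
    Assumptions:
    • Categorical variable → dichotomous
    • Independent groups
    • Normality: the individual (0/1) observations are never normal, but by the Central Limit Theorem the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal for large samples
    Formulas and application
    Z-score: $Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{SE}$
    Estimate: $\hat{p}_1 - \hat{p}_2$
    SE (for the z-test, with pooled proportion $\hat{p}$): $\sqrt{\frac{\hat{p}(1 - \hat{p})}{n_1} + \frac{\hat{p}(1 - \hat{p})}{n_2}}$
    Confidence interval: $\hat{p}_1 - \hat{p}_2 \pm Z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$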
  • 134. 3. Proportions and Entire Distributions
    Question: Refer back to the previous question. What is the 95% confidence interval?
    A. [0.063, 0.015]
    B. [−0.143, 0.023]
    C. [−0.053, 0.090]
    D. [1.678, 3.683]
    Answer: B
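  • 135. 3E. Proportions and Entire Distributions
    Question: Refer back to the previous question. What is the 95% confidence interval?
    Solution
    $p_1 - p_2 \pm Z_c \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
    $0.5692 - 0.6291 \pm 1.96 \sqrt{\frac{0.5692 \cdot 0.4308}{260} + \frac{0.6291 \cdot 0.3709}{275}}$
    $-0.0599 \pm 1.96 \cdot 0.0423 = [-0.143, 0.023]$
    The interval contains 0, which agrees with not rejecting H0 in the previous question.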
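    A Python sketch of the interval (the helper name is hypothetical). Unlike the z-test, the CI uses the unpooled standard error. Evaluated without rounding, the interval is about [−0.143, 0.023]; note that the lower bound is −0.143, not −0.014.

```python
import math

def diff_proportions_ci(x1, n1, x2, n2, z_crit=1.96):
    """CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

lo, hi = diff_proportions_ci(148, 260, 173, 275)
print(round(lo, 3), round(hi, 3))  # -0.143 0.023
```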
  • 136. 4. Proportions and Entire Distributions
    Question: Nik wants to see whether there is an association between the presence of neuroscientific evidence (1 = no, 2 = yes) and juror verdicts (not guilty = 1, not guilty due to insanity = 2, guilty = 3). What can be concluded based on the table?
    Verdict by neuroscientific evidence (No / Yes / Total):
    Not guilty: 32 / 29 / 61
    Not guilty due to insanity: 55 / 61 / 116
    Guilty: 10 / 13 / 23
    Total: 97 / 103 / 200
  • 137. 4. Proportions and Entire Distributions
    Question: Nik wants to see whether there is an association between the presence of neuroscientific evidence (1 = no, 2 = yes) and juror verdicts (not guilty = 1, not guilty due to insanity = 2, guilty = 3). What can be concluded based on the table?
    A. The null hypothesis is not rejected with the observed value of the test statistic equal to 0.67
    B. The null hypothesis is rejected with the observed value of the test statistic equal to 1.30
    C. The null hypothesis is not rejected with the observed value of the test statistic equal to 0.20
    D. The null hypothesis is rejected with the observed value of the test statistic equal to 0.65
    Answer: A
  • 138. 4E. Proportions and Entire Distributions
    Data
    • We want to study the relationship between two categorical variables.
    • We use a contingency table.
    • We use the chi-square test for contingency tables.
    Expected counts: $EC = \frac{\text{row total} \times \text{column total}}{N}$
    Observed (expected) counts, No / Yes:
    Not guilty: 32 (29.585) / 29 (31.415)
    Not guilty due to insanity: 55 (56.26) / 61 (59.740)
    Guilty: 10 (11.155) / 13 (11.845)
    Solution
    • Calculate chi-square: $\chi^2 = \sum \frac{(OC - EC)^2}{EC} = \frac{(32 - 29.585)^2}{29.585} + \frac{(55 - 56.26)^2}{56.26} + \frac{(10 - 11.155)^2}{11.155} + \frac{(29 - 31.415)^2}{31.415} + \frac{(61 - 59.740)^2}{59.740} + \frac{(13 - 11.845)^2}{11.845}$
    $\chi^2 = 0.197 + 0.028 + 0.119 + 0.186 + 0.026 + 0.113 = 0.669 \approx 0.67$
    • Calculate df: $df = (\#rows - 1)(\#columns - 1) = (3 - 1)(2 - 1) = 2$
    • Check the p-value: for df = 2, χ² = 0.67 is below the table value for p = 0.25 (2.77), so the p-value is greater than 0.25 and the null hypothesis cannot be rejected.
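    The contingency-table test can be sketched in Python (helper names are hypothetical). As with the goodness-of-fit example, the p-value uses the df = 2 closed form $e^{-\chi^2/2}$, which is valid only for df = 2.

```python
import math

def chi_square_independence(table):
    """Chi-square test statistic and df for an r x c contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # EC = row total x column total / N
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: not guilty, not guilty due to insanity, guilty; columns: evidence No, Yes.
verdicts = [[32, 29], [55, 61], [10, 13]]
chi2, df = chi_square_independence(verdicts)
p_value = math.exp(-chi2 / 2)  # upper-tail p-value; closed form valid only because df == 2
print(round(chi2, 2), df)      # 0.67 2
```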