I am Ben R. I am a Statistics Assignment Expert at statisticshomeworkhelper.com. I hold a Ph.D. in Statistics from the University of Denver, USA. I have been helping students with their homework for the past 5 years. I solve assignments related to Statistics.
Visit statisticshomeworkhelper.com or email info@statisticshomeworkhelper.com.
You can also call +1 678 648 4277 for any assistance with Statistics Assignments.
Statistics Homework Help
Introduction to probability and statistics
Problems
Problem 1. Fitting a line to data using the MLE.
Suppose you have bivariate data (x1, y1),..., (xn, yn). A common model is that there is a linear relationship between
x and y, so in principle the data should lie exactly along a line. However since data has random noise and our model
is probably not exact this will not be the case. What we can do is look for the line that best fits the data. To do this
we will use a simple linear regression model.
For bivariate data the simple linear regression model assumes that the xi are not random but that for some values of
the parameters a and b the value yi is drawn from the random variable
Yi ∼ axi + b + εi
where εi is a normal random variable with mean 0 and variance σ². We assume all of the random variables εi are
independent.
Notes. 1. The model assumes that σ is a known constant, the same for each εi.
2. We think of εi as the measurement error, so the model says that
yi = axi + b+ random measurement error.
3. Remember that (xi, yi) are not variables. They are values of data.
(a) The distribution of Yi depends on a, b, σ and xi. Of these only a and b are not known.
Give the formula for the likelihood function f(yi | a, b, xi, σ) corresponding to one random
value yi. (Hint: yi − axi − b ∼ N(0, σ²).)
(b) (i) Suppose we have data (1, 8), (3, 2), (5, 1). Based on our model write down the likelihood and log likelihood
as functions of a, b, and σ.
(ii) For general data (x1, y1), ..., (xn, yn) give the likelihood and log likelihood functions (again as functions of a, b,
and σ).
(c) Assume σ is a constant, known value. For the data in part (b)(i) find the maximum likelihood estimates for a and b.
(d) Use R to plot the data and the regression line you found in problem (1c). The commands plot(x, y, pch=19)
and abline() will come in handy.
Print the plot and turn it in.
Problem 2. Estimating uniform parameters
(a) Suppose we have data 1.2, 2.1, 1.3, 10.5, 5 which we know is drawn independently from a uniform(a, b)
distribution. Give the maximum likelihood estimate for the parameters a and b.
Hint: in this case you should not try to find the MLE by differentiating the likelihood function.
(b) Suppose we have data x1, x2, ..., xn which we know is drawn independently from a uniform(a, b) distribution.
Give the maximum likelihood estimate for the parameters a and b.
Problem 3. Monty Hall: Sober and drunk.
Recall the Monty Hall problem: Monty hosts a game show. There are three doors: one hides a car and two hide goats.
The contestant Shelby picks a door, which is not opened. Monty then opens another door which has nothing behind
it. Finally, Shelby must decide whether to stay with her original choice or switch to the other unopened door. The
problem asks which is the better strategy: staying or switching?
To be precise, let’s label the door that Shelby picks first by A, and the other two doors by B and C. Hypothesis A is
that the car is behind door A, and similarly for hypotheses B and C.
(a) In the usual formulation, Monty is sober and knows the locations of the car and goats. So if the contestant picks a
door with a goat, Monty always opens the other door with a goat. And if the contestant picks the door with a car,
Monty opens one of the other two doors at random. Suppose that sober Monty Hall opens door B, revealing a
goat (this is the data). Make a Bayes’ table with prior, likelihood and posterior. Use the posterior probabilities to
determine the best strategy.
(b) Now suppose that Monty is drunk, i.e. he has completely forgotten where the car is and is only aware enough to
open one of the doors not chosen by the contestant at random. It’s entirely possible he might accidentally reveal the
car, ruining the show.
Suppose that drunk Monty Hall opens door B, revealing a goat. Make a Bayes’ table with prior, likelihood and
posterior. Use the posterior probabilities to determine the best strategy. (Hint: the data is the same but the
likelihood function is not.)
(c) Based on Monty’s pre-show behavior, Shelby thinks that Monty is sober with probability
.7 and drunk with probability .3. Repeat the analysis from parts (a) and (b) in this situation.
Problem 4. We are going to explore the dice problem from class further. I pick one of the five dice uniformly at
random (4, 6, 8, 12, or 20 sides). I then roll this die n times and tell you that, miraculously, every roll resulted in the
value 7. As I am in a position of authority, assume that I am telling the truth!
We write the data as x1 = 7, x2 = 7, . . . xn = 7, where xi is the result of the ith roll.
(a) Find the posterior probability P(H|data) for each die given the data of all n rolls (your answers should involve n).
What is the limit of each of these probabilities as n grows to infinity? Explain why this makes sense.
(b) Given that my first 10 rolls resulted in 7 (i.e., n = 10), rank the possible values for my next roll from most likely to
least likely. Note any ties in rank and explain your reasoning carefully. You need not do any computations to solve
this problem.
(c) Find the posterior predictive pmf for the (n + 1)st roll given the data. That is, find P(xn+1 | x1 = 7, ..., xn = 7)
for xn+1 = 1, ..., 20. (Hint: use (a) and the law of total probability. Many values of the pmf coincide, so you do
not need to do 20 separate computations. You should check that your answer is consistent with your ranking in (b)
for n = 10).
(d) What function does the pmf in part (c) converge to as n grows to infinity? Explain why this makes sense.
Problem 5. Odds.
You have a drawer that contains 50 coins. 10 coins have probability p = 0.3 of heads, 30 coins have probability
p = 0.5 and 10 coins have probability p = 0.7. You pick one coin at random from the drawer
and flip it.
(a) What are the (prior) odds you chose a 0.3 coin? A 0.7 coin?
(b) What are the (prior) odds of flipping a heads?
(c) Suppose the flip lands heads.
(i) What are the posterior odds the coin is a 0.3 coin?
(ii) A 0.7 coin?
(d) What are the posterior predictive odds of heads on the next (second) flip?
Problem 6. Courtroom fallacies.
(a) [Mackay, Information Theory, Inference, and Learning Algorithms, and OJ Simpson trial] Mrs S is found stabbed
in her family garden. Mr S behaves strangely after her death and is considered as a suspect. On investigation of
police and social records it is found that Mr S had beaten up his wife on at least nine previous occasions. The
prosecution advances this data as evidence in favor of the hypothesis that Mr S is guilty of the murder. ‘Ah no,’ says
Mr S’s highly paid lawyer, ‘statistically, only one in a thousand wife-beaters actually goes on to murder his wife. So
the wife-beating is not strong evidence at all. In fact, given the wife-beating evidence alone, it’s extremely unlikely
that he would be the murderer of his wife – only a 1/1000 chance. You should therefore find him innocent.’
Is the lawyer right to imply that the history of wife-beating does not point to Mr S’s being the murderer? Or is the
lawyer a slimy trickster? If the latter, what is wrong with his argument?
Use the following scaffolding to reason precisely:
Hypothesis: M = ‘Mr S murdered Mrs S’
Data: K = ‘Mrs S was killed’, B = ‘Mr S had a history of beating Mrs S’
How is the above probability 1/1000 expressed in these terms? How is the (posterior) probability of guilt expressed in these terms? How are
these two probabilities related? Hint: Bayes’ theorem, conditioning on B throughout.
(b) [True story] In 1999 in Great Britain, Sally Clark was convicted of murdering her two sons after each child died
weeks after birth (the first in 1996, the second in 1998). Her conviction was largely based on the testimony of the
pediatrician Professor Sir Roy Meadow. He claimed that, for an affluent non-smoking family like the Clarks, the
probability of a single cot death (SIDS) was 1 in 8543, so the probability of two cot deaths in the same family was
around “1 in 73 million.” Given that there are around 700,000 live births in Britain each year, Meadow argued that a
double cot death would be expected to occur once every hundred years. Finally, he reasoned that given this
vanishingly small rate, the far more likely scenario is that Sally Clark murdered her children.
Carefully explain at least two errors in Meadow’s argument.
Solutions
Problem 1. (a) We know that yi − axi − b = εi ∼ N(0, σ²). Therefore
yi = εi + axi + b ∼ N(axi + b, σ²). That is,
$$f(y_i \mid a, b, x_i, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y_i - a x_i - b)^2}{2\sigma^2}}.$$
(b) (i) The y values are 8, 2, 1. The likelihood function is a product of the densities found in part (a):
$$f(8, 2, 1 \mid a, b, \sigma) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{3} e^{-\left((8-a-b)^2 + (2-3a-b)^2 + (1-5a-b)^2\right)/2\sigma^2}$$
$$\ln(f(8, 2, 1 \mid a, b, \sigma)) = -3\ln(\sigma) - \frac{3}{2}\ln(2\pi) - \frac{(8-a-b)^2 + (2-3a-b)^2 + (1-5a-b)^2}{2\sigma^2}$$
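(ii) We just copy our answer in part (i), replacing the explicit values of xi and yi by their symbols:
$$f(y_1, \ldots, y_n \mid a, b, \sigma) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\sum_{j=1}^{n} (y_j - a x_j - b)^2 / 2\sigma^2}$$
$$\ln(f(y_1, \ldots, y_n \mid a, b, \sigma)) = -n\ln(\sigma) - \frac{n}{2}\ln(2\pi) - \sum_{j=1}^{n} (y_j - a x_j - b)^2 / 2\sigma^2$$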
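The general log likelihood in (ii) can be checked numerically. The following is a minimal Python sketch (Python rather than the R used elsewhere in these solutions is a choice made here; the function names `log_likelihood` and `density` and the trial values of a, b and σ are illustrative assumptions). It compares the closed-form expression with the log of the product of the individual normal densities for the data in part (i).

```python
import math

def log_likelihood(a, b, sigma, xs, ys):
    """Closed-form log likelihood of the line y = a*x + b with N(0, sigma^2) noise."""
    n = len(xs)
    sum_sq = sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))
    return -n * math.log(sigma) - (n / 2) * math.log(2 * math.pi) - sum_sq / (2 * sigma ** 2)

def density(y, a, b, x, sigma):
    """Normal density of one observation y_i, as in part (a)."""
    return math.exp(-(y - a * x - b) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

xs = [1, 3, 5]
ys = [8, 2, 1]
a, b, sigma = -1.0, 7.0, 2.0  # arbitrary trial values, not the MLE

closed_form = log_likelihood(a, b, sigma, xs, ys)
direct = math.log(math.prod(density(y, a, b, x, sigma) for x, y in zip(xs, ys)))
print(closed_form, direct)  # the two values agree up to rounding
```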
(c) We set the partial derivatives to 0 to find the MLE. (Don’t forget that σ is a constant.)
$$\frac{\partial}{\partial a}\ln(f(8, 2, 1 \mid a, b, \sigma)) = -\frac{-2(8-a-b) - 6(2-3a-b) - 10(1-5a-b)}{2\sigma^2} = \frac{-70a - 18b + 38}{2\sigma^2} = 0 \;\Rightarrow\; 70a + 18b = 38$$
$$\frac{\partial}{\partial b}\ln(f(8, 2, 1 \mid a, b, \sigma)) = -\frac{-2(8-a-b) - 2(2-3a-b) - 2(1-5a-b)}{2\sigma^2} = \frac{-18a - 6b + 22}{2\sigma^2} = 0 \;\Rightarrow\; 18a + 6b = 22$$
We have two simultaneous equations: 70a + 18b = 38, 18a + 6b = 22. These are easy to
solve, e.g. first eliminate b and solve for a. We get
$$a = -\frac{7}{4}, \qquad b = \frac{107}{12}.$$
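As a check on the algebra in (c), the two simultaneous equations can be solved exactly. This is a small Python sketch under the assumption that exact rational arithmetic (the standard `fractions` module) is acceptable; the variable names are illustrative. Dividing the equations in the text by 2 gives the system written in the comments.

```python
from fractions import Fraction

xs = [1, 3, 5]
ys = [8, 2, 1]
n = len(xs)

# Setting both partial derivatives to zero gives
#   (sum x^2) a + (sum x) b = sum x*y    ->  35a + 9b = 19
#   (sum x)   a +  n      b = sum y      ->   9a + 3b = 11
sxx = sum(Fraction(x) ** 2 for x in xs)
sx = sum(Fraction(x) for x in xs)
sxy = sum(Fraction(x) * y for x, y in zip(xs, ys))
sy = sum(Fraction(y) for y in ys)

# Solve the 2x2 system by Cramer's rule.
det = sxx * n - sx * sx
a_hat = (sxy * n - sx * sy) / det
b_hat = (sxx * sy - sx * sxy) / det
print(a_hat, b_hat)  # -7/4 107/12
```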
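(d) Here’s the R code I used to make the plot:
x = c(1, 3, 5)
y = c(8, 2, 1)
a = -7/4     # slope (MLE from part (c))
b = 107/12   # intercept (MLE from part (c))
plot(x, y, pch=19, col="blue")
# abline() takes the intercept as its argument a and the slope as its argument b,
# so the names are swapped relative to our model y = a*x + b.
abline(a=b, b=a, col="magenta")
[Plot: the three data points and the fitted regression line; x axis from 1 to 5, y axis from 1 to 8.]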
Problem 2. Estimating uniform parameters
(a) The pdf of uniform(a, b) for one data value is f(xi | a, b) = 1/(b − a) if xi is in the interval [a, b] and 0 if it is not. So the
likelihood function for our 5 data values is
$$f(\text{data} \mid a, b) = \begin{cases} 1/(b-a)^5 & \text{if all data is in } [a, b] \\ 0 & \text{if not} \end{cases}$$
This is maximized when (b − a) is as small as possible. Since all the data has to be in the
interval [a, b] we minimize (b − a) by taking a = minimum of data and b = maximum of
data.
answer: a = 1.2, b = 10.5.
(b) The same logic as in part (a) shows a = min(x1, ..., xn) and b = max(x1, ..., xn).
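The argument in parts (a) and (b) can be illustrated with a short Python sketch (the function names `uniform_likelihood` and `uniform_mle` are hypothetical, introduced only for this illustration). It evaluates the likelihood at the MLE, at a wider interval, and at an interval that excludes a data point.

```python
def uniform_likelihood(data, a, b):
    """Likelihood of independent uniform(a, b) data: (b - a)^(-n) if all data lie in [a, b], else 0."""
    if a >= b or min(data) < a or max(data) > b:
        return 0.0
    return (b - a) ** (-len(data))

def uniform_mle(data):
    """MLE for (a, b): the smallest interval that still contains every data value."""
    return min(data), max(data)

data = [1.2, 2.1, 1.3, 10.5, 5]
a_hat, b_hat = uniform_mle(data)
print(a_hat, b_hat)  # 1.2 10.5

# A wider interval contains the data but has lower likelihood.
print(uniform_likelihood(data, a_hat, b_hat) > uniform_likelihood(data, 1.0, 11.0))  # True
# A narrower interval excludes a data point, so the likelihood is 0.
print(uniform_likelihood(data, 1.25, b_hat))  # 0.0
```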
Problem 3. In all three parts of this problem we have three hypotheses:
HA = ‘the car is behind door A’
HB = ‘the car is behind door B’
HC = ‘the car is behind door C’.
In all three parts the data is D = ‘Monty opens door B and reveals a goat’.
(a) The key to our Bayesian update table is the likelihoods. Since Monty is sober he always reveals a goat.
P(D|HA): HA says the car is behind A. So Monty is equally likely to pick B or C and reveal a goat. Thus
P(D|HA) = 1/2.
P(D|HB): Since HB says the car is behind B, sober Monty will never choose B (and if he did it would reveal a
car, not a goat). So P(D|HB) = 0.
P(D|HC): HC says the car is behind C. Since sober Monty doesn’t make mistakes he will open door B and reveal a
goat. So P(D|HC) = 1.
Here is the table for this situation.
H        P(H)   P(D|H)   Unnorm. Post.   Posterior
HA       1/3    1/2      1/6             1/3
HB       1/3    0        0               0
HC       1/3    1        1/3             2/3
Total:   1      –        1/2             1
Therefore, Shelby should switch, as her chance of winning the car after switching is double that had she stayed with
her initial choice.
(b) Some of the likelihoods change in this setting.
P(D|HA): HA says the car is behind A. So Monty is equally likely to show B or C and reveal a goat. Thus
P(D|HA) = 1/2.
P(D|HB): Since HB says the car is behind B, drunk Monty might show B, but if he does he won’t reveal a goat.
(He will ruin the game.) So P(D|HB) = 0.
P(D|HC): HC says the car is behind C. Drunk Monty is equally likely to open B or C. If he chooses B he’ll reveal a
goat. So P(D|HC) = 1/2.
Our table is now:
H        P(H)   P(D|H)   Unnorm. Post.   Posterior
HA       1/3    1/2      1/6             1/2
HB       1/3    0        0               0
HC       1/3    1/2      1/6             1/2
Total:   1      –        1/3             1
So in this case switching is just as good (or as bad) as staying with the original choice.
(c) We have to recompute the likelihoods.
P (D|HA): If the car is behind A then sober or drunk Monty is equally likely to choose door
B and reveal a goat. Thus P (D|HA ) = 1/2.
P (D|HB ): If the car is behind door B then whether he chooses it or not Monty can’t reveal a goat behind it. So P
(D|HB) = 0.
P(D|HC): Let S be the event that Monty is sober and Sc the event he is drunk. From the table in (a), we see
that P(D|HC, S) = 1 and from the table in (b), we see that P(D|HC, Sc) = 1/2. Thus, by the law of total
probability
$$P(D \mid H_C) = P(D \mid H_C, S)P(S) + P(D \mid H_C, S^c)P(S^c) = 0.7 + \frac{1}{2}(0.3) = 0.85 = \frac{17}{20}.$$
H        P(H)   P(D|H)   Unnorm. Post.   Posterior
HA       1/3    1/2      1/6             10/27
HB       1/3    0        0               0
HC       1/3    17/20    17/60           17/27
Total:   1      –        9/20            1
Thus switching gives a probability of 17/27 of winning. So switching is the best strategy.
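The three Bayes tables in Problem 3 follow the same recipe, which can be sketched in Python with exact fractions (a minimal illustration; the helper `bayes_update` and the dictionary layout are assumptions made here, not part of the original solution). The mixed likelihood for part (c) is built from the sober and drunk likelihoods by the law of total probability.

```python
from fractions import Fraction as F

def bayes_update(prior, likelihood):
    """Multiply prior by likelihood for each hypothesis, then normalize by the total."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnorm.values())
    return {h: unnorm[h] / total for h in prior}

prior = {"A": F(1, 3), "B": F(1, 3), "C": F(1, 3)}

# P(D | H) for D = 'Monty opens door B and reveals a goat'
sober = {"A": F(1, 2), "B": F(0), "C": F(1)}
drunk = {"A": F(1, 2), "B": F(0), "C": F(1, 2)}

# Part (c): Monty is sober with probability 0.7 and drunk with probability 0.3.
p_sober = F(7, 10)
mixed = {h: p_sober * sober[h] + (1 - p_sober) * drunk[h] for h in prior}

post_sober = bayes_update(prior, sober)
post_drunk = bayes_update(prior, drunk)
post_mixed = bayes_update(prior, mixed)

print(post_sober["A"], post_sober["C"])  # 1/3 2/3
print(post_drunk["A"], post_drunk["C"])  # 1/2 1/2
print(post_mixed["A"], post_mixed["C"])  # 10/27 17/27
```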
Problem 4. (a) Let H4, H6, H8, H12, and H20 be the hypotheses that we have selected the 4, 6, 8, 12, or
20 sided die respectively.
We compute
Hyp. H   Prior P(H)   Likelihood P(data|H)   Unnorm. Post.    Posterior P(H|data)
H4       1/5          0                      0                0
H6       1/5          0                      0                0
H8       1/5          (1/8)^n                (1/5)(1/8)^n     (1/(5T))(1/8)^n
H12      1/5          (1/12)^n               (1/5)(1/12)^n    (1/(5T))(1/12)^n
H20      1/5          (1/20)^n               (1/5)(1/20)^n    (1/(5T))(1/20)^n
Total:   1            –                      T                1
where T = (1/5)((1/8)^n + (1/12)^n + (1/20)^n).
The posterior probabilities are given in the table. To find what happens as n grows large, we rewrite the posterior
probabilities by multiplying numerator and denominator by 8^n:
$$P(H_8 \mid \text{data}) = \frac{1}{1 + \left(\frac{2}{3}\right)^n + \left(\frac{2}{5}\right)^n}$$
$$P(H_{12} \mid \text{data}) = \frac{\left(\frac{2}{3}\right)^n}{1 + \left(\frac{2}{3}\right)^n + \left(\frac{2}{5}\right)^n}$$
$$P(H_{20} \mid \text{data}) = \frac{\left(\frac{2}{5}\right)^n}{1 + \left(\frac{2}{3}\right)^n + \left(\frac{2}{5}\right)^n}$$
As n → ∞, we know that (2/3)^n → 0 and (2/5)^n → 0. Thus, as n grows to infinity, P(H8|data)
approaches 1 and the posterior probability of all the other hypotheses goes to 0.
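The posterior in part (a) and its limiting behavior can be checked numerically. This is a minimal Python sketch with exact fractions (the function name `dice_posterior` and its `value` parameter are illustrative assumptions); it reproduces the table for a given n and compares it with the rewritten closed form.

```python
from fractions import Fraction as F

SIDES = [4, 6, 8, 12, 20]

def dice_posterior(n, value=7):
    """Posterior over the five dice after n rolls that all show `value`, with a uniform prior."""
    prior = F(1, len(SIDES))
    # Likelihood is (1/s)^n if the die can show `value`, otherwise 0.
    unnorm = {s: prior * (F(1, s) ** n if value <= s else 0) for s in SIDES}
    total = sum(unnorm.values())  # this is T in the table
    return {s: u / total for s, u in unnorm.items()}

post = dice_posterior(10)
closed_form = 1 / (1 + F(2, 3) ** 10 + F(2, 5) ** 10)
print(post[8] == closed_form)  # True

# The posterior probability of the 8-sided die increases toward 1 as n grows.
for n in (1, 10, 100):
    print(n, float(dice_posterior(n)[8]))
```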
(b) Having observed n 7’s already, we know that we could not have selected the 4-sided or the 6-sided die. We have
three different groups of numbers: we can roll 1 to 8 with all three remaining dice; 9 to 12 with the 12 and 20-sided
dice; and 13 to 20 with only the 20-sided die. Thus, rolling 1 to 8 are all equally likely, likewise 9 to 12 and 13 to 20.
Since we can get 1 to 8 from all three dice each of these values is in the most likely group. The next most likely
values are 9 to 12 which can happen on two dice. Least likely values are 13 to 20.
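(c) By the law of total probability, for x_{n+1} = 1, 2, ..., 8, we have
$$P(x_{n+1} \mid \text{data}) = \frac{1}{5T}\left(\frac{1}{8}\right)^{n} \cdot \frac{1}{8} + \frac{1}{5T}\left(\frac{1}{12}\right)^{n} \cdot \frac{1}{12} + \frac{1}{5T}\left(\frac{1}{20}\right)^{n} \cdot \frac{1}{20}.$$
For x_{n+1} = 9, 10, 11, 12, we have
$$P(x_{n+1} \mid \text{data}) = \frac{1}{5T}\left(\frac{1}{12}\right)^{n} \cdot \frac{1}{12} + \frac{1}{5T}\left(\frac{1}{20}\right)^{n} \cdot \frac{1}{20}.$$
Finally, for x_{n+1} = 13, 14, ..., 20, we have
$$P(x_{n+1} \mid \text{data}) = \frac{1}{5T}\left(\frac{1}{20}\right)^{n} \cdot \frac{1}{20}.$$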
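(d) As n → ∞, we see that P(xn+1 = x | data = all sevens) → 1/8 for x = 1, 2, ..., 8, and 0 for 9 ≤ x ≤ 20. This makes sense: by part (a) the posterior probability of the 8-sided die approaches 1, so the predictive pmf approaches that of a fair 8-sided die.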
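The posterior predictive pmf from parts (c) and (d) can be computed directly from the posterior. The following Python sketch is an illustration under the same setup as part (a); the names `dice_posterior` and `predictive_pmf` are hypothetical. It checks that the pmf sums to 1, that the three groups of values rank as in part (b), and that the values 1 to 8 approach probability 1/8 for large n.

```python
from fractions import Fraction as F

SIDES = [4, 6, 8, 12, 20]

def dice_posterior(n, value=7):
    """Posterior over the dice after n rolls of `value`, uniform prior (as in part (a))."""
    unnorm = {s: F(1, len(SIDES)) * (F(1, s) ** n if value <= s else 0) for s in SIDES}
    total = sum(unnorm.values())
    return {s: u / total for s, u in unnorm.items()}

def predictive_pmf(n):
    """Law of total probability: P(x | data) = sum over dice of P(H | data) * P(x | H)."""
    post = dice_posterior(n)
    return {x: sum(post[s] * F(1, s) for s in SIDES if x <= s) for x in range(1, 21)}

pmf = predictive_pmf(10)
print(sum(pmf.values()))                  # 1
print(pmf[8] > pmf[9], pmf[12] > pmf[13])  # True True
print(float(predictive_pmf(200)[1]))      # very close to 0.125
```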
Problem 5. Odds.
(a) Odds of A are P (A)/P (Ac). So both types of coin have odds 10/40.
(b) To answer parts b-d we make a likelihood table and a Bayesian update table. We label our hypotheses 0.3, 0.5
and 0.7, meaning the chosen coin has that probability of heads. Our
data from the first flip is D1, the event ‘heads on the first flip’.
Likelihood table:
               outcomes
hypotheses     Heads   Tails
0.3            0.3     0.7
0.5            0.5     0.5
0.7            0.7     0.3
Bayesian update table:
Hypoth. H   Prior P(H)   likelihood P(D1|H)   unnorm. post P(H)P(D1|H)   posterior P(H|D1)
0.3         0.2          0.3                  0.06                       0.12
0.5         0.6          0.5                  0.30                       0.60
0.7         0.2          0.7                  0.14                       0.28
Total:      1            –                    0.50                       1
The prior probability of heads is just the total in the unnormalized posterior column:
P (heads) = 0.50
So the prior probability of tails is 1 −P (heads) = 0.50
So the prior odds of heads are O(heads) = 1, i.e. 50-50 odds.
(c) (i) From the table we see the posterior probability the coin is the 0.3 coin is 0.12 so
the posterior odds are
$$\frac{0.12}{0.88} = \frac{12}{88} = 0.136.$$
(ii) Likewise the posterior odds it’s the 0.7 coin are
$$\frac{0.28}{0.72} = \frac{28}{72} = 0.389.$$
(d) The posterior predictive probability of heads is found by summing the product of the posterior column in the
Bayesian update table and the heads column in the likelihood table. We get P(heads|D1) = 0.12 · 0.3 + 0.60 · 0.5 + 0.28 · 0.7
= 0.532.
The posterior predictive probability of tails is P(tails|D1) = 1 − 0.532 = 0.468. So the posterior
predictive odds of heads are
$$O(\text{heads} \mid D_1) = \frac{0.532}{0.468} = 1.1368.$$
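All of the odds in Problem 5 can be reproduced in a few lines. This Python sketch uses exact fractions and keys each hypothesis by its probability of heads (the helper `odds` and the variable names are illustrative assumptions, not part of the original solution).

```python
from fractions import Fraction as F

# Hypotheses keyed by the coin's probability of heads; values are the prior probabilities.
prior = {F(3, 10): F(10, 50), F(5, 10): F(30, 50), F(7, 10): F(10, 50)}

def odds(p):
    """Odds of an event with probability p: P(A) / P(A^c)."""
    return p / (1 - p)

# (b) Prior predictive probability of heads: the total of the unnormalized posterior column.
p_heads = sum(pr * p for p, pr in prior.items())

# (c) Posterior after observing heads on the first flip.
posterior = {p: pr * p / p_heads for p, pr in prior.items()}

# (d) Posterior predictive probability of heads on the second flip.
p_heads_next = sum(po * p for p, po in posterior.items())

print(odds(prior[F(3, 10)]), odds(p_heads))        # 1/4 1
print(float(odds(posterior[F(3, 10)])))            # about 0.136
print(float(odds(posterior[F(7, 10)])))            # about 0.389
print(p_heads_next, float(odds(p_heads_next)))     # 133/250, odds about 1.1368
```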
Problem 6. (a) The lawyer may correctly state that P(M|B) = 1/1000, but the lawyer then conflates this with
the probability of guilt given all the relevant data, which is really P(M|B, K). From Bayes’ theorem, conditioning on
B throughout, we have:
$$P(M \mid K, B) = \frac{P(K \mid M, B)\,P(M \mid B)}{P(K \mid B)} = \frac{P(M \mid B)}{P(K \mid B)},$$
since P(K|M, B) = 1. If we let N be the event that Mrs S was murdered by someone other than her husband, then M
and N partition K, so P(N|K, B) = P(N|B)/P(K|B) by the same argument, and the odds of Mr S’s guilt are
$$O(M \mid K, B) = \frac{P(M \mid B)}{P(N \mid B)} = \frac{P(\text{Mr S murdered Mrs S given that he beat her and she was killed})}{P(\text{Someone else murdered Mrs S given that Mr S beat her and she was killed})}.$$
At this point, the relevant question is clear: is a battered wife more likely to be murdered by her husband or by
someone else? I would guess the odds strongly favor the former.
(b) Here are four errors in the argument.
1. The prosecutor arrived at “1 in 73 million” as follows: The probability that 1 child from an affluent non-smoking
family dies of SIDS is 1/8543, so the probability that 2 children die is (1/8543)². However, this assumes that
SIDS deaths among siblings are independent. Due to genetic or environmental factors, we suspect that this
assumption is invalid.
2. The use of the figure “700,000 live births in Britain each year.” The prosecutor had restricted attention only to
affluent non-smoking families when (erroneously) computing the probability of two SIDS deaths. However, he
does not similarly restrict his attention when considering the number of births.
3. The rate “once every hundred years” is not valid: The prosecutor arrived at this by multiplying the number of
live births by the probability that two children die from SIDS. The result is a nonsensical rate.
4. While double SIDS is very unlikely, double infanticide may be even more unlikely. It is the odds of one
explanation relative to the other given the deaths that matters, and not just how unlikely one possibility is.
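For reference, the figures quoted in the argument come from the following arithmetic. This short Python sketch only reproduces the calculation that errors 1 to 3 criticize (it is not a valid analysis, and the variable names are illustrative): squaring 1/8543 assumes independence, and dividing by the number of all live births gives the "once every hundred years" figure.

```python
# Meadow's figure: 1 in 8543 for a single SIDS death in an affluent non-smoking family.
one_in_single = 8543

# Squaring is valid only if the two deaths are independent (error 1).
one_in_double = one_in_single ** 2
print(one_in_double)  # 72982849, i.e. about 1 in 73 million

# Dividing by all live births per year mixes populations (error 2)
# and does not give a meaningful rate (error 3).
births_per_year = 700_000
years_between = one_in_double / births_per_year
print(round(years_between))  # 104, the source of "once every hundred years"
```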