6.867 Homework 1
Fall 2005
Due: Wednesday, September 28, in class. 10% off per day late.
Don’t spend time doing hairy integrals by hand. If that’s the answer to a question, you can
just leave it un-integrated.
1 A Tale of Two Estimators
In this problem, we’re going to explore the bias-variance trade-off in a very simple setting. We
have a set of unidimensional data, X1, . . . , Xn, drawn from the positive reals. We will consider two
different models for its distribution:
• Model 1: The data are drawn from a uniform distribution on the interval [0, b]. This model
has a single positive real parameter, 0 < b.
• Model 2: The data are drawn from a uniform distribution on the interval [a, b]. This model
has two positive real parameters, a and b, such that 0 < a < b.
We are interested in comparing estimates of the mean of the distribution, derived from each of
these two models.
Question 0: What’s the mean of each of the distributions?
1.1 The Right Stuff
Let’s start by considering the situation in which the data were, in fact, drawn from an instance of
the model under consideration: either a uniform distribution on [0, b∗] (for model 1) or a uniform
distribution on [a∗, b∗] (for model 2). If you are, like Commander Toad¹, “brave and bold, bold and
brave,” then avert your eyes from the rest of this section and compute the bias, variance, and mean
squared error of the ML estimator of the mean of each model, on data drawn from this distribution, as
a function of n. Otherwise, let’s take it step by step.
As we saw in recitation, the ML estimator for b is ˆb = max_i X_i; by a similar argument, the ML
estimator for a is ˆa = min_i X_i. To understand the properties of these estimators, we have to start by
deriving their PDFs. The minimum and maximum of a data set are also known as its first and
nth order statistics, sometimes written X(1) and X(n).
¹ Commander Toad In Space, Jane Yolen, Putnam, 1996
1.1.1 Using Model 1
In model 1, we just need to consider the distribution of ˆb. Generally speaking, the pdf of the
maximum of a set of data drawn from pdf f, with cdf F, is
f_{\hat b}(x) = n\, F(x)^{n-1} f(x) .
The idea is that, if x is the maximum, then n − 1 of the other data values will have to be less than
x, and the probability of that is F(x)n−1, and then one value will have to equal x, the probability
of which is f(x). We multiply by n because there are n different ways to choose the data value
that could be the maximum.
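This is easy to sanity-check numerically. Here is an optional sketch (not part of the assignment; it assumes NumPy and uses arbitrary example values of b and n) comparing the empirical CDF and mean of the sample maximum with the values implied by the pdf above for data drawn uniformly from [0, b]:

```python
import numpy as np

# Check f_bhat(x) = n F(x)^(n-1) f(x) for Uniform[0, b]: integrating it gives
# the CDF P(max <= x) = (x/b)^n, and the implied mean is E[max] = b n / (n + 1).
b, n, trials = 1.0, 5, 200_000
rng = np.random.default_rng(0)
maxima = rng.uniform(0.0, b, size=(trials, n)).max(axis=1)

for x in (0.5, 0.8, 0.95):
    print(f"P(max <= {x}): empirical {np.mean(maxima <= x):.4f}, theory {(x / b) ** n:.4f}")
print(f"E[max]: empirical {maxima.mean():.4f}, theory {b * n / (n + 1):.4f}")
```

The same kind of check works for the closed-form expressions quoted below.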
Question 1: What is fˆb in the particular case where the data are drawn uniformly from 0 to
b?
Question 2: Write an expression for the expected value of the ML estimate of the mean, ˆµ = ˆb/2, as an integral.
In fact, there’s a nice closed form expression, which you can use in the following questions:
E[\hat\mu] = \frac{b^*}{2}\,\frac{n}{n+1} .
Question 3: What is the squared bias of ˆµ? Is this estimator unbiased? Is it asymptotically
unbiased? (Reminder: bias²(ˆθ) = (E_D[ˆθ] − θ)².)
Question 4: Write an expression for the variance of ˆµ, as an integral.
The closed form for the variance is
V[\hat\mu] = \frac{b^{*2}}{4}\,\frac{n}{(n+1)^2(n+2)} .
Question 5: What is the mean squared error of ˆµ? (Reminder: MSE(ˆθ) = bias²(ˆθ) + var(ˆθ).)
1.1.2 Another estimator for Model 1
We might consider something other than the MLE for Model 1. Consider the estimator
\hat\mu = \frac{x_{(n)}(n+1)}{2n} .
Question 6: Write an expression for the expected value of this version of ˆµ as an integral.
Then solve the integral.
Question 7: What is the squared bias of this version of ˆµ? Is this estimator unbiased? Is
it asymptotically unbiased?
Question 8: Write an expression for the variance of ˆµ as an integral.
The closed form for the variance is
V[\hat\mu] = \frac{b^{*2}}{4n(n+2)} .
Question 9: What is the mean squared error of this version of ˆµ?
Question 10: What are the relative advantages of the estimator from the previous section and
this one?
1.1.3 Using Model 2
Now, let’s do the same thing, but for the MLE for model 2. We have to start by thinking about
the joint distribution of MLE’s ˆa and ˆb. Generally speaking, the joint pdf of the minimum and the
maximum of a set of data drawn from pdf f, with cdf F, is
f_{\hat a,\hat b}(x, y) = n(n-1)\,(F(y) - F(x))^{n-2}\, f(x)\, f(y) .
Question 11: Explain in words why this makes sense.
Question 12: What is fˆa,ˆb in the particular case where the data are drawn uniformly from a
to b?
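(Not part of the assignment.) If you want a numerical sanity check of the joint density above, one implied fact is easy to simulate: integrating it over the region {min > x, max ≤ y} gives Pr(all n points land in (x, y]) = (F(y) − F(x))^n. The sketch below checks this for an arbitrary uniform case; it assumes NumPy.

```python
import numpy as np

# For Uniform[a, b], check P(min > x, max <= y) = ((y - x) / (b - a))^n,
# which is what you get by integrating the joint pdf of (min, max) over that region.
a, b, n, trials = 0.1, 1.0, 5, 200_000
rng = np.random.default_rng(0)
data = rng.uniform(a, b, size=(trials, n))
mins, maxs = data.min(axis=1), data.max(axis=1)

x, y = 0.2, 0.9
empirical = np.mean((mins > x) & (maxs <= y))
theory = ((y - x) / (b - a)) ** n
print(f"empirical {empirical:.4f}, theory {theory:.4f}")
```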
Question 13: Write an expression for the expected value of ˆµ in terms of an integral.
Here’s what it should integrate to:
E[\hat\mu] = \frac{a^* + b^*}{2} .
Question 14: What is the squared bias of ˆµ? Is this estimator unbiased? Is it asymptotically
unbiased?
Question 15: Write an expression for the variance of ˆµ in terms of an integral.
The closed form for the variance is
V[\hat\mu] = \frac{(b^* - a^*)^2}{2(n+1)(n+2)} .
Question 16: What is the mean squared error of ˆµ?
1.1.4 Comparing Models
What if we have data that are actually drawn from the interval [0, 1]? Both models seem like
reasonable choices.
Question 17: Show plots that compare the bias, variance, and mse of each of the estimators
we’ve considered on that data, as a function of n. (Use the formulas above; don’t do it by actually
generating data). Write a paragraph in English explaining your results. What estimator would you
use?
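If a starting point helps, here is one possible sketch that plots the MSE curves from the closed-form expressions above (NumPy and matplotlib assumed; the range of n is arbitrary). Plotting the bias and variance separately, and interpreting the results, is still up to you.

```python
import numpy as np
import matplotlib.pyplot as plt

b_star = 1.0                      # data actually drawn from [0, 1], so a* = 0
true_mean = b_star / 2
n = np.arange(1, 51, dtype=float)

# Model 1 MLE of the mean, mu_hat = b_hat / 2 (Section 1.1.1; formulas assume data start at 0).
bias2_mle1 = (b_star / 2 * n / (n + 1) - true_mean) ** 2
var_mle1 = b_star**2 / 4 * n / ((n + 1) ** 2 * (n + 2))

# Bias-corrected Model 1 estimator, mu_hat = x_(n) (n + 1) / (2n) (Section 1.1.2).
bias2_corr1 = np.zeros_like(n)
var_corr1 = b_star**2 / (4 * n * (n + 2))

# Model 2 MLE of the mean, mu_hat = (a_hat + b_hat) / 2 (Section 1.1.3), with a* = 0.
bias2_mle2 = np.zeros_like(n)
var_mle2 = b_star**2 / (2 * (n + 1) * (n + 2))

for label, b2, v in [("Model 1 MLE", bias2_mle1, var_mle1),
                     ("Model 1 corrected", bias2_corr1, var_corr1),
                     ("Model 2 MLE", bias2_mle2, var_mle2)]:
    plt.plot(n, b2 + v, label=label)        # MSE = bias^2 + variance
plt.xlabel("n"); plt.ylabel("MSE"); plt.legend(); plt.show()
```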
Now, what if we have data that are actually drawn from the interval [.1, 1]? It seems like model
2 is the only reasonable choice. But is it?
We already know the bias, variance, and mse for model 2 in this case. But what about the
MLE and unbiased estimators for model 1? Let’s characterize the general behavior when we use
the estimator ˆµ = x(n)(n + 1)/(2n) on data drawn from an interval [a∗, b∗].
Question 18: Write an expression for the expected value of ˆµ in terms of an integral.
The closed form expression is
E[\hat\mu] = \frac{a^* + b^*\,n}{2n} .
Question 19: Explain in English why this answer makes sense.
Question 20: What is the squared bias of this ˆµ? Explain in English why your answer makes
sense. Consider how it behaves as a increases, and how it behaves as n increases.
Question 21: Write an expression for the variance of this ˆµ in terms of an integral.
The closed form for the variance is
V[\hat\mu] = \frac{(b^* - a^*)^2}{4n(n+2)} .
To save you some tedious algebra, we’ll tell you that the mean squared error of this ˆµ is
(apologies for the ugliness; let us know if you find a beautiful rewrite)
\frac{b^{*2}\,n - 2a^*b^*\,n + a^{*2}(2 - 2n + n^3)}{4n^2(n+2)} .
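(Optional) If you want to double-check this expression, or go hunting for that prettier form, a symbolic comparison against bias² + variance built from the two closed forms above is quick; the sketch below assumes SymPy is installed.

```python
import sympy as sp

a, b, n = sp.symbols('a b n', positive=True)

# E[mu_hat] and V[mu_hat] for mu_hat = x_(n)(n + 1) / (2n) on data drawn from [a, b].
expected = (a + b * n) / (2 * n)
variance = (b - a) ** 2 / (4 * n * (n + 2))

mse = (expected - (a + b) / 2) ** 2 + variance          # bias^2 + variance
claimed = (b**2 * n - 2 * a * b * n + a**2 * (2 - 2 * n + n**3)) / (4 * n**2 * (n + 2))
print(sp.simplify(mse - claimed))                       # prints 0 if the expressions agree
```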
Question 22: Show plots that compare the bias, variance, and mse of this estimator with the
regular model 2 estimator on data drawn from [0.1, 1], as a function of n. Are there circumstances
in which it would be better to use this estimator? If so, what are they and why? If not, why not?
Question 23: Show plots of mse of both estimators, as a function of n on data drawn from
[.01, 1] and on data drawn from [.2, 1]. How do things change? Explain why this makes sense.
2 Bayesian Estimation
2.1 Dirichlet Prior
Exercise borrowed from Stat180 at UCLA.
The Dirichlet distribution is a multivariate version of the Beta distribution. When we have a
coin with two outcomes, we really only need a single parameter θ to model the probability of heads.
But now let’s consider a “thick” coin that has three possible outcomes: heads, tails, and edge. Now
we need two parameters: θh is the probability of heads, θt is the probability of tails, and then the
probability of an edge is 1 − θh − θt.
The random variables V, W ∈ [0, 1] with V + W ≤ 1 have a Dirichlet distribution
with parameters α1, α2, α3 if their joint density is
f(v, w) = \frac{\Gamma(\alpha_1 + \alpha_2 + \alpha_3)}{\Gamma(\alpha_1)\Gamma(\alpha_2)\Gamma(\alpha_3)}\, v^{\alpha_1 - 1}\, w^{\alpha_2 - 1}\, (1 - v - w)^{\alpha_3 - 1} .
This is a direct generalization of the Beta distribution.
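(Optional) To get a feel for this density, you can check numerically that it integrates to 1 over the simplex {v, w ≥ 0, v + w ≤ 1}. The sketch below does a crude Monte Carlo integration over the unit square, with arbitrary example parameters; it assumes NumPy.

```python
import numpy as np
from math import gamma

# Arbitrary example parameters (not from the problem statement).
a1, a2, a3 = 2.0, 3.0, 1.5
const = gamma(a1 + a2 + a3) / (gamma(a1) * gamma(a2) * gamma(a3))

rng = np.random.default_rng(0)
v, w = rng.uniform(size=(2, 1_000_000))
inside = v + w < 1                      # the density is zero outside the simplex
f = np.zeros_like(v)
f[inside] = (const * v[inside] ** (a1 - 1) * w[inside] ** (a2 - 1)
             * (1 - v[inside] - w[inside]) ** (a3 - 1))
print("integral over the simplex ≈", f.mean())   # area of [0,1]^2 is 1, so mean(f) ≈ 1
```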
Question 24: If (θh, θt) have a Dirichlet distribution as above, what is the marginal distribution
of θh?
Question 25: Suppose you are playing with a thick coin, and get results X1 . . . Xn, resulting
in H heads and T tails out of n throws. Given θh and θt, the random variables H and T have a
multinomial distribution:
\Pr(h, t) = \frac{n!}{h!\,t!\,(n - h - t)!}\, \theta_h^{h}\, \theta_t^{t}\, (1 - \theta_h - \theta_t)^{n - h - t} .
Assume a uniform prior on the simplex of possible values of θh and θt. What is the posterior
distribution for θh and θt?
Question 26: In this same setting, what is the predictive distribution for getting another head?
That is, what’s Pr(Xn+1 = heads|X1 . . . Xn)?
Question 27: Now assume a Dirichlet prior for θh and θt with parameters α1, α2, α3. What is
the posterior in this case?
Question 28: What is the predictive distribution?
Question 29: If you assume a squared-error loss, what are your estimates of θh and θt?
Question 30: What happens as n → ∞?
3 Gaussian distributions
Question 31: (DeGroot) Consider a normal distribution for which the value of the mean µ is
unknown and the variance is 4, and suppose that the prior distribution of µ is a normal distribution
whose variance is 9. How large a random sample must be taken from the given normal distribution
in order to be able to specify an interval having a length of 1 unit such that the probability that µ
lies in this interval is at least 0.95?
4 Presidential debate
(Gelman) On September 25, 1988, the evening of a presidential campaign debate, ABC News
conducted a survey of registered voters in the United States; 639 persons were polled before the
debate, and 639 different persons were polled after. The results are displayed below:
Survey        Bush  Dukakis  No opinion/other  Total
pre-debate    294   307      38                639
post-debate   288   332      19                639
Assume the surveys are independent simple random samples from the population of registered
voters. We will ignore the “no opinion” cases and model the data with two different binomial
distributions. For j = 1, 2, let αj be the proportion of voters who preferred Bush, out of those who
had a preference for either Bush or Dukakis at the time of survey j.
Question 32: What is the posterior probability that there was a shift toward Bush?
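A sketch (not the required derivation) for checking your answer numerically: assuming independent uniform Beta(1, 1) priors on α1 and α2 — an assumption, since the problem does not fix a prior — the posteriors are Beta distributions, and the shift probability can be estimated by simulation (NumPy assumed).

```python
import numpy as np

rng = np.random.default_rng(0)
bush = [294, 288]          # pre-debate, post-debate Bush counts ("no opinion" dropped)
dukakis = [307, 332]

# Assuming independent uniform Beta(1, 1) priors on alpha_1 and alpha_2,
# the posteriors are Beta(bush + 1, dukakis + 1). Simulate and compare.
draws = 100_000
alpha1 = rng.beta(bush[0] + 1, dukakis[0] + 1, size=draws)
alpha2 = rng.beta(bush[1] + 1, dukakis[1] + 1, size=draws)

# A "shift toward Bush" means alpha_2 > alpha_1.
print("Pr(alpha_2 > alpha_1 | data) ≈", np.mean(alpha2 > alpha1))
```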
5 Decision theory
Question 33: (DeGroot) Suppose that a person is going to sell Fizzy Cola at a football game and
must decide in advance how much to order. Suppose that he makes a gain of m cents on each quart
that he sells at the game but suffers a loss of c cents on each quart that he has ordered but does not
sell. Assuming that the demand for Fizzy Cola at the game, as measured in quarts, is a continuous
random variable X with PDF f and CDF F, show that his expected profit will be maximized if he
orders an amount α such that F(α) = m/(m + c).
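(Optional) Before proving this, it can be reassuring to see it numerically. The sketch below picks an exponential demand distribution and example values of m and c (all arbitrary choices, not from the problem) and compares the empirically best order quantity with the one solving F(α) = m/(m + c); it assumes NumPy.

```python
import numpy as np

m, c, lam = 30.0, 10.0, 0.01          # example gain/loss per quart (cents) and demand rate
rng = np.random.default_rng(0)
demand = rng.exponential(1 / lam, size=200_000)   # Exp(lam) demand, mean 100 quarts

def expected_profit(q):
    # m cents per quart sold, minus c cents per ordered-but-unsold quart.
    return np.mean(m * np.minimum(demand, q) - c * np.maximum(q - demand, 0.0))

qs = np.linspace(1.0, 400.0, 400)
best_q = qs[np.argmax([expected_profit(q) for q in qs])]
alpha_star = -np.log(c / (m + c)) / lam           # solves F(alpha) = m / (m + c) for Exp(lam)
print(f"empirical argmax ≈ {best_q:.0f} quarts, theoretical alpha ≈ {alpha_star:.0f} quarts")
```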
(Duda, Hart, Stork) Suppose we use a randomized decision rule, which chooses action ai with
probability Pr(ai|x).
Question 34: Show that the resulting risk is given by
\int \Big[ \sum_{i=1}^{m} R(a_i \mid x)\, \Pr(a_i \mid x) \Big] \Pr(x)\, dx .
Question 35: What choice of Pr(ai|x) minimizes R in general?
In many problems, one has the option either to assign the pattern to one of c classes, or to
reject it as being unrecognizable. If the cost for rejects is not too high, rejection may be a desirable
action. Let the cost for a correct answer be 0, for an incorrect answer be λs, and for a reject be λr.
Question 36: Under what conditions on Pr(ci|x), λs and λr should we reject?