Stat310          Confidence intervals


                             Hadley Wickham
Thursday, 15 April 2010
Quiz

                     • Pick up quiz on your way in
                     • Start at 1pm
                     • Finish at 1:10pm
                     • Closed book




1. Test extra credit
                 2. Inference roadmap
                 3. Steps for making a confidence
                    interval
                 4. One more sampling distribution (the
                    t-distribution)



Test makeup

(Rotated text on slide: "Party planner" / "I still need one!")

                     Homework graded out of 10
                     4% of overall grade (= 20% of two tests)
                     Will act as extra credit for the test (i.e.
                     there is no penalty if you don’t do it).
                     Due next Thursday.
                     The extra 5% of the grade will be
                     distributed across all other assessments.

What we want to do
                 Given data:
                 • Estimate true value of parameter
                   (last week)
                 • Quantify uncertainty of estimate
                   (today)
                 • Test whether true value is a certain value
                   (Thursday)


Tools
                     • Construct an estimator
                          • Method of moments
                          • Maximum likelihood
                     • Work out its distribution
                          • Sampling distribution of mean
                          • Sampling distribution of variance
                          • General properties of ML (not in this course)


Set up
                     I repeated an experiment defined by Poisson(λ)
                     10 times, and recorded the following results:
                     6 11 10 6 12 7 8 5 7 10
                     The MLE of λ is 8.2, and its standard deviation
                     is 0.90.
                     What is the distribution of the estimate?
                     (Remember that it’s a mean.) Can you construct
                     an interval that will contain λ 95% of the time?


Steps
                     1. Identify distribution that connects estimator
                     and true value (4 choices).
                     2. Form confidence interval for known
                     (sampling) distribution, and work out bounds.
                     3. Back transform.
                     4. Write as interval.
                     5. Plug in sample estimates (actual numbers).
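Applied to the Poisson set-up above, the steps might look like the following Python sketch. It assumes the CLT pivot (the sample mean is approximately normal) with the plug-in standard error √(λ̂/n); that is one reasonable choice among the four, not necessarily the one intended on the handout. Note that √(8.2/10) ≈ 0.906, which the slide reports as 0.90.

```python
import math
from statistics import NormalDist

# Data from the set-up slide: 10 draws from Poisson(lambda).
counts = [6, 11, 10, 6, 12, 7, 8, 5, 7, 10]
n = len(counts)

# Step 1: the MLE of lambda is the sample mean; by the CLT,
# (mean - lambda) / sqrt(lambda / n) is approximately Z.
lam_hat = sum(counts) / n
se = math.sqrt(lam_hat / n)  # plug-in sd of the mean, since Var(X) = lambda

# Step 2: bounds for the known distribution, P(-z < Z < z) = 0.95.
z = NormalDist().inv_cdf(0.975)

# Steps 3-5: back transform, write as an interval, plug in the numbers.
lower, upper = lam_hat - z * se, lam_hat + z * se

print(f"MLE = {lam_hat:.2f}, se = {se:.3f}")
print(f"approximate 95% CI for lambda: ({lower:.2f}, {upper:.2f})")
```

Running it should give an interval of roughly (6.43, 9.97).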


Your turn


                     Work through the steps on the handout.




Confidence interval
                     A confidence interval is a simple numerical
                     summary of the uncertainty of an estimate.
                     A 95% confidence interval procedure produces
                     intervals that contain the true value 95% of
                     the time (over repeated experiments).
                     An additional constraint is that we want
                     the confidence interval to be as short as
                     possible.
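The "95% of the time" claim is about the procedure over repeated experiments, which a small simulation can check. This Python sketch assumes normal data with known σ and uses the interval mean ± 1.96σ/√n; the values µ = 10, σ = 2, n = 10, the seed, and the helper name `covers` are arbitrary illustrative choices, not taken from the slides.

```python
import math
import random
from statistics import NormalDist, fmean


def covers(true_mu, sigma, n, rng, level=0.95):
    """Simulate one experiment; report whether its z-interval contains true_mu."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    xs = [rng.gauss(true_mu, sigma) for _ in range(n)]
    mean = fmean(xs)
    half_width = z * sigma / math.sqrt(n)  # sigma assumed known
    return mean - half_width < true_mu < mean + half_width


rng = random.Random(2010)  # fixed seed for a reproducible run
n_expts = 10_000
hits = sum(covers(10.0, 2.0, 10, rng) for _ in range(n_expts))
coverage = hits / n_expts
print(f"{hits} of {n_expts} intervals contain the true value ({coverage:.3f})")
```

The true value stays fixed; only the intervals change from experiment to experiment.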


Each line = 95% confidence interval from one experiment
[Figure: 95% confidence intervals from repeated experiments; x-axis: expt (ticks 50 to 200), y-axis roughly 8 to 12]

Horizontal line = true value
[Figure: the same intervals with a horizontal line marking the true value; x-axis: expt, y-axis roughly 8 to 12]

Red intervals don’t include true value
[Figure: the same intervals, with those that miss the true value shown in red; x-axis: expt, y-axis roughly 8 to 12]

         There are 13 red lines and 200
         experiments. Is this an OK
         interval?
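One way to approach the question: if the procedure really has 95% coverage, the number of missed intervals out of 200 is Binomial(200, 0.05), with mean 10 and sd about 3.1. This Python sketch (standard library only) computes the observed coverage and how likely 13 or more misses would be under that assumption.

```python
from math import comb

n_expts, n_missed, alpha = 200, 13, 0.05  # counts from the slide; alpha for 95%

observed_coverage = 1 - n_missed / n_expts

# Under 95% coverage, the number of misses ~ Binomial(200, 0.05).
expected_misses = n_expts * alpha
sd_misses = (n_expts * alpha * (1 - alpha)) ** 0.5

# P(13 or more misses) if the interval really has 95% coverage.
p_at_least = sum(
    comb(n_expts, k) * alpha**k * (1 - alpha) ** (n_expts - k)
    for k in range(n_missed, n_expts + 1)
)

print(f"observed coverage: {observed_coverage:.3f}")
print(f"expected misses: {expected_misses:.1f} (sd {sd_misses:.2f})")
print(f"P(at least {n_missed} misses): {p_at_least:.3f}")
```

A tail probability of roughly 0.2 suggests that 13 misses is quite consistent with a 95% interval.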
Your turn

                     What’s wrong with a statement like this:
                     P(2 < μ < 6) = 0.95?




Steps
                     Identify distribution that connects
                     estimator and true value.
                     Form confidence interval for
                     known (sampling) distribution.
                     Write as probability statement.
                     Back transform.
                     Write as interval.


X_i iid, and n large:

$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \;\overset{\cdot}{\sim}\; Z$$
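Worked through for this pivot, the "probability statement, back transform, interval" steps go roughly as follows (a sketch assuming σ is known; z denotes the 1 − α/2 quantile of Z, about 1.96 for a 95% interval):

$$
\begin{aligned}
P\left(-z < \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} < z\right) &\approx 1 - \alpha \\
P\left(-z\frac{\sigma}{\sqrt{n}} < \bar{X}_n - \mu < z\frac{\sigma}{\sqrt{n}}\right) &\approx 1 - \alpha \\
P\left(\bar{X}_n - z\frac{\sigma}{\sqrt{n}} < \mu < \bar{X}_n + z\frac{\sigma}{\sqrt{n}}\right) &\approx 1 - \alpha
\end{aligned}
$$

The last step subtracts X̄_n and multiplies by −1, which flips both inequalities, so the interval is (X̄_n − zσ/√n, X̄_n + zσ/√n).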



$$X_i \overset{\text{iid}}{\sim} \text{Normal}(\mu, \sigma^2)$$

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$$

$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \sim Z$$

$$\frac{\bar{X}_n - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
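Why the t-distribution matters once σ is replaced by s: this Python sketch simulates small normal samples (n = 5, µ = 0, σ = 1, all arbitrary choices) and compares intervals built with the normal value 1.96 against 2.776, the 0.975 quantile of t with 4 df read from a t-table. The helper name `coverage` is just for illustration.

```python
import math
import random
from statistics import fmean, stdev


def coverage(crit, n, reps, rng, mu=0.0, sigma=1.0):
    """Fraction of intervals mean +/- crit * s / sqrt(n) that contain mu."""
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        half_width = crit * stdev(xs) / math.sqrt(n)  # s replaces the unknown sigma
        m = fmean(xs)
        hits += m - half_width < mu < m + half_width
    return hits / reps


rng = random.Random(15)
n, reps = 5, 10_000
z_cov = coverage(1.96, n, reps, rng)   # treats the pivot as if it were Z
t_cov = coverage(2.776, n, reps, rng)  # 0.975 quantile of t with n - 1 = 4 df
print(f"using 1.96:  {z_cov:.3f}")
print(f"using 2.776: {t_cov:.3f}")
```

With n = 5, the 1.96 intervals should cover noticeably less than 95% of the time, while the t-based ones should be close to 95%.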
[Figure: density of the t-distribution for df = 1, 2, 15 and Inf; y-axis: dens (up to about 0.3), x-axis: x from −3 to 3]

Properties of the t-dist
                     Heavier tails compared to the normal
                     distribution.

$$\lim_{n \to \infty} t_n = Z$$

                     In practice, if n > 30, the t distribution is
                     nearly equivalent to the normal.
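The heavier tails and the limit can be checked numerically. This Python sketch codes the standard t density formula directly (the helper `t_pdf` is hypothetical, written with `math.lgamma` so that large df do not overflow) and compares it with the standard normal density at x = 3.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Student's t density; lgamma keeps large df from overflowing."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


normal_at_3 = NormalDist().pdf(3)
for df in (1, 2, 15, 30, 1000):
    print(f"df = {df:>4}: t density at 3 = {t_pdf(3, df):.5f}"
          f"  (normal: {normal_at_3:.5f})")
```

The t densities at 3 are all larger than the normal one and shrink toward it as df grows.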


t-tables
                     Used basically the same way as the standard
                     normal table, but there is one table for each
                     number of degrees of freedom.
                     Easiest to use a calculator or computer:
                     http://www.stat.tamu.edu/~west/applets/tdemo.html
                     (For the homework, use this applet; for the final,
                     I’ll give you a small table if necessary.)
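If neither the applet nor a table is to hand, t quantiles can be approximated with the Python standard library alone. This sketch (helper names hypothetical) integrates the t density with Simpson's rule and inverts the CDF by bisection; it is a numerical stand-in, not how the applet or printed tables compute their values.

```python
import math


def t_pdf(x, df):
    """Student's t density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) by symmetry plus Simpson's rule on [0, |x|]; steps must be even."""
    if x == 0:
        return 0.5
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 < T < |x|)
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p, df, tol=1e-9):
    """Solve t_cdf(q, df) = p for 0 < p < 1 by bisection."""
    if p < 0.5:
        return -t_quantile(1 - p, df, tol)
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains q
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (1, 9, 10, 30, 1000):
    print(f"df = {df:>4}: 0.975 quantile = {t_quantile(0.975, df):.3f}")
```

For example, `t_quantile(0.975, 9)` should come out close to 2.262, and for large df the value approaches the normal 1.96.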


Your turn
                     We perform an experiment to
                     measure the speed of sound and repeat it 10
                     times: 340 333 334 332 333 336 350 348 331
                     344 (mean: 338, sd: 7.01)
                     Assuming Xi ~ Normal(μ, σ²), what is an
                     estimate of the speed of sound? What is the
                     error (sd) of this estimate? Give an interval
                     that we’re 95% confident the true speed of
                     sound lies in.

Example

                     340 333 334 332 333 336 350 348 331 344
                     (mean: 338, sd: 7.01)
                     If σ not known: (333, 342)   (2.23)
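A Python sketch of the σ-unknown interval for these data, using the t pivot with n − 1 = 9 degrees of freedom. The critical value 2.262 is the 0.975 quantile of t with 9 df from a standard t-table (2.228 is the corresponding value for 10 df). Computed directly from the ten listed values, the mean is 338.1 and the sample sd is about 6.98, slightly different from the rounded figures on the slide, so the interval comes out at roughly (333.1, 343.1).

```python
import math
from statistics import fmean, stdev

speeds = [340, 333, 334, 332, 333, 336, 350, 348, 331, 344]
n = len(speeds)

mean = fmean(speeds)   # point estimate of mu
s = stdev(speeds)      # sample sd (n - 1 denominator)
se = s / math.sqrt(n)  # estimated sd of the mean

# sigma unknown, so (mean - mu) / (s / sqrt(n)) ~ t with n - 1 = 9 df.
t_crit = 2.262         # 0.975 quantile of t with 9 df, from a t-table
lower, upper = mean - t_crit * se, mean + t_crit * se

print(f"mean = {mean:.1f}, s = {s:.2f}, se = {se:.2f}")
print(f"95% CI for mu: ({lower:.1f}, {upper:.1f})")
```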




Steps
                     Identify distribution that connects
                     estimator and true value.
                     Form confidence interval for
                     known (sampling) distribution.
                     Write as probability statement.
                     Back transform.
                     Write as interval.


Steps
                     Want P(a < Q < b) = 1 - α, and b - a to be
                     as small as possible.
                     If Q is symmetric about 0, P(-a < Q < a) = 1 - α.
                     So a = F(1 - α/2) (equivalently -a = F(α/2)),
                     where F(p) is the p quantile of Q; for unimodal
                     Q there is no shorter interval.
                     If Q isn’t symmetric, pick a = F(α/2),
                     b = F(1 - α/2), but there might be a shorter
                     interval.
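A small numerical check of the symmetric case, assuming Q ~ Z: this Python sketch tries different ways of splitting α between the two tails, using `statistics.NormalDist().inv_cdf` as the quantile function F, and keeps the shortest interval. The search grid is an arbitrary choice.

```python
from statistics import NormalDist

alpha = 0.05
F = NormalDist().inv_cdf  # quantile function of Q ~ Z

# Each candidate puts probability p_low below the interval and
# alpha - p_low above it, so every candidate has coverage 1 - alpha.
candidates = []
for i in range(1, 100):
    p_low = alpha * i / 100  # strictly between 0 and alpha
    a, b = F(p_low), F(p_low + 1 - alpha)
    candidates.append((b - a, p_low, a, b))

length, p_low, a, b = min(candidates)
print(f"shortest: P(below) = {p_low:.4f}, "
      f"interval ({a:.3f}, {b:.3f}), length {length:.3f}")
```

The shortest interval it finds puts α/2 in each tail, i.e. about (−1.96, 1.96) for α = 0.05.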


Example


                     If we want a 90% confidence interval, then
                     two possible ends for the interval are
                     F(0.05) and F(0.95).
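To see that an equal-tailed interval need not be the shortest when Q is asymmetric, this Python sketch uses a χ² distribution with 2 degrees of freedom, chosen only because its quantile function has the closed form F(p) = −2 log(1 − p). It compares the 90% interval with ends F(0.05) and F(0.95) against other splits of α between the tails; the helper name `chisq2_quantile` is hypothetical.

```python
import math


def chisq2_quantile(p):
    """Quantile F(p) of chi-square with 2 df, whose CDF is 1 - exp(-x / 2)."""
    return -2.0 * math.log1p(-p)


alpha = 0.10  # 90% interval

# Equal-tailed ends from the slide: F(0.05) and F(0.95).
a_eq, b_eq = chisq2_quantile(alpha / 2), chisq2_quantile(1 - alpha / 2)

# Try other splits of alpha between the two tails and keep the shortest.
best = None
for i in range(1000):
    p_low = alpha * i / 1000  # from 0 up to, but not including, alpha
    a, b = chisq2_quantile(p_low), chisq2_quantile(p_low + 1 - alpha)
    if best is None or b - a < best[1] - best[0]:
        best = (a, b)

print(f"equal-tailed: ({a_eq:.3f}, {b_eq:.3f}), length {b_eq - a_eq:.3f}")
print(f"shortest found: ({best[0]:.3f}, {best[1]:.3f}), "
      f"length {best[1] - best[0]:.3f}")
```

Here the equal-tailed interval has length 2 log 19 ≈ 5.89, while putting all of α in the upper tail gives (0, F(0.90)) with length 2 log 10 ≈ 4.61.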




Reading

                     Read the rest of chapter 6.
                     Everything else is just examples of the
                     general method we learned today.



