SlideShare ist ein Scribd-Unternehmen logo
1 von 27
Downloaden Sie, um offline zu lesen
Stat310
              Introduction to inference


                          Hadley Wickham
Saturday, 27 March 2010
Continuity correction


                    Applies only to sums, not means!
                    Important only where n smaller.




Saturday, 27 March 2010
Homework

                    Learn the change of variable steps!
                    If X and Y are independent, and you know
                    f(x) and f(y), what is f(x, y)?
                    Common pattern




Saturday, 27 March 2010
What proportion of
                   Rice students smoke
                        marijuana?


Saturday, 27 March 2010
1. Introduction to inference
                2. A survey
                3. Estimators and their properties
                4. Feedback




Saturday, 27 March 2010
Data vs. Distributions
                    Random experiments produce data.
                    A repeatable random experiment has
                    some underlying distribution.
                    We want to go from the data to say
                    something about the underlying
                    distribution. So far we have been doing
                    the opposite.


Saturday, 27 March 2010
Inference
                    Up to now: Given a sequence of random
                    variables with distribution F, what can we
                    say about the mean of a sample?
                    What we really want: Given the mean of
                    a sample, what can we say about the
                    underlying sequence of random
                    variables?


Saturday, 27 March 2010
Marijuana
                    Up to now: If I told you that smoking pot
                    was Binomially distributed with probability
                    p, you could tell me what I would expect
                    to see if I sampled n people at random.
                    What we want: If I sample n people at
                    random and find out m of them smoke
                    pot, what does that tell me about p?


Saturday, 27 March 2010
Your turn
                    Let’s I selected 10 people at random from
                    the Rice campus and asked them if
                    they’d smoked pot in the last month. Two
                    said yes.
                    Let Yi be 1 if person i smoked pot and 0
                    otherwise. What’s a good guess for
                    distribution of Yi?


Saturday, 27 March 2010
3.5e−08



    3.0e−08



    2.5e−08



    2.0e−08
 prob




    1.5e−08



    1.0e−08



    5.0e−09



    0.0e+00

                      0.0   0.2   0.4       0.6   0.8   1.0
                                        p
Saturday, 27 March 2010
Plug-in principle

                    A good first guess at the true mean,
                    variance, or any other parameter of a
                    distribution is simply to estimate it from
                    the data.
                    We’ll learn more sophisticated ways after
                    the break.



Saturday, 27 March 2010
Problems

                    If I picked someone at random out of the
                    Rice campus directory, and asked them if
                    they’d smoked pot in the last month, how
                    likely do you think I am to get an honest
                    answer?
                    How can we get around this?



Saturday, 27 March 2010
Your turn

                    Compare your survey to someone with
                    the other colour survey. What is the
                    difference? How could you use the
                    answers to estimate how many people
                    smoke pot?




Saturday, 27 March 2010
Questions
                1. Have you smoked pot in the last month?
                2. Flip a coin. If it’s heads, write yes.
                   Otherwise write yes if you’ve smoked pot
                   in the last month, and no otherwise.
                3. In the last month, how many of the
                   following activities have you done? One
                   list contains smoking pot, the other
                   doesn’t.


Saturday, 27 March 2010
Results

                    1. The proportion of yeses in Q1.
                    2. The proportion of yeses in (Q2 - 0.5) * 2
                    3. (Q3a/4 - Q3b/5) / n.




Saturday, 27 March 2010
Your turn


                    Fill in the survey. Make sure no one else
                    can see your answers!




Saturday, 27 March 2010
Definitions
                    Parameter space: set of all possible
                    parameter values
                    Estimator: process/function which takes
                    data and gives best guess for parameter
                    Point estimate: estimator for a single
                    value



Saturday, 27 March 2010
Your turn

                    With a partner, brainstorm as many
                    different properties that you could use to
                    decide which estimator is best.
                    Remember that there is some true
                    unknown proportion that we are trying to
                    estimate.



Saturday, 27 March 2010
Important properties of
                     an estimator

                    Unbiased
                    Small variance
                    Consistent
                    Sufficient




Saturday, 27 March 2010
ˆ =θ
           E(θ)                    Unbiased




                ˆ1 ) < V ar(θ2 )
           V ar(θ           ˆ
                                   Minimum variance


        Common problem is to find UMVE (unbiased
        minimum variance estimator) across all possible
        estimators
Saturday, 27 March 2010
Low bias, low variance    Low bias, high variance




               High bias, low variance   High bias, high variance




Saturday, 27 March 2010
Consistent




                                       n=10

                                              n=5


                                                    n=1
Saturday, 27 March 2010
Results



Saturday, 27 March 2010
Your turn

                    How can we describe this as a random
                    experiment? Do you have any concerns
                    about applying the results of this survey
                    to the entire Rice campus?




Saturday, 27 March 2010
Experiments
                    Remember, the context is always the
                    random experiment. For this case the
                    randomness comes from picking the
                    person at random.
                    Repeating the experiment does not mean
                    asking each person again (the answer
                    would be the same) but asking a different
                    set of people.


Saturday, 27 March 2010
Next time

                    (After the test and spring recess)
                    Method of moments
                    Maximum likelihood




Saturday, 27 March 2010
Feedback



Saturday, 27 March 2010

Weitere ähnliche Inhalte

Andere mochten auch

Andere mochten auch (6)

24 modelling
24 modelling24 modelling
24 modelling
 
Plyr, one data analytic strategy
Plyr, one data analytic strategyPlyr, one data analytic strategy
Plyr, one data analytic strategy
 
R packages
R packagesR packages
R packages
 
27 development
27 development27 development
27 development
 
27 development
27 development27 development
27 development
 
22 spam
22 spam22 spam
22 spam
 

Ähnlich wie 19 Inference (6)

20 Estimation
20 Estimation20 Estimation
20 Estimation
 
23 testing
23 testing23 testing
23 testing
 
04 Bayes Rule
04 Bayes Rule04 Bayes Rule
04 Bayes Rule
 
Affordances in Modern Web Design
Affordances in Modern Web DesignAffordances in Modern Web Design
Affordances in Modern Web Design
 
05 Random Variables
05 Random Variables05 Random Variables
05 Random Variables
 
Simpycity and Exceptable
Simpycity and ExceptableSimpycity and Exceptable
Simpycity and Exceptable
 

Mehr von Hadley Wickham (20)

21 spam
21 spam21 spam
21 spam
 
20 date-times
20 date-times20 date-times
20 date-times
 
19 tables
19 tables19 tables
19 tables
 
17 polishing
17 polishing17 polishing
17 polishing
 
16 critique
16 critique16 critique
16 critique
 
15 time-space
15 time-space15 time-space
15 time-space
 
14 case-study
14 case-study14 case-study
14 case-study
 
13 case-study
13 case-study13 case-study
13 case-study
 
12 adv-manip
12 adv-manip12 adv-manip
12 adv-manip
 
11 adv-manip
11 adv-manip11 adv-manip
11 adv-manip
 
11 adv-manip
11 adv-manip11 adv-manip
11 adv-manip
 
10 simulation
10 simulation10 simulation
10 simulation
 
10 simulation
10 simulation10 simulation
10 simulation
 
09 bootstrapping
09 bootstrapping09 bootstrapping
09 bootstrapping
 
08 functions
08 functions08 functions
08 functions
 
07 problem-solving
07 problem-solving07 problem-solving
07 problem-solving
 
06 data
06 data06 data
06 data
 
05 subsetting
05 subsetting05 subsetting
05 subsetting
 
04 reports
04 reports04 reports
04 reports
 
03 extensions
03 extensions03 extensions
03 extensions
 

19 Inference

  • 1. Stat310 Introduction to inference Hadley Wickham Saturday, 27 March 2010
  • 2. Continuity correction Applies only to sums, not means! Important only where n smaller. Saturday, 27 March 2010
  • 3. Homework Learn the change of variable steps! If X and Y are independent, and you know f(x) and f(y), what is f(x, y)? Common pattern Saturday, 27 March 2010
  • 4. What proportion of Rice students smoke marijuana? Saturday, 27 March 2010
  • 5. 1. Introduction to inference 2. A survey 3. Estimators and their properties 4. Feedback Saturday, 27 March 2010
  • 6. Data vs. Distributions Random experiments produce data. A repeatable random experiment has some underlying distribution. We want to go from the data to say something about the underlying distribution. So far we have been doing the opposite. Saturday, 27 March 2010
  • 7. Inference Up to now: Given a sequence of random variables with distribution F, what can we say about the mean of a sample? What we really want: Given the mean of a sample, what can we say about the underlying sequence of random variables? Saturday, 27 March 2010
  • 8. Marijuana Up to now: If I told you that smoking pot was Binomially distributed with probability p, you could tell me what I would expect to see if I sampled n people at random. What we want: If I sample n people at random and find out m of them smoke pot, what does that tell me about p? Saturday, 27 March 2010
  • 9. Your turn Let’s I selected 10 people at random from the Rice campus and asked them if they’d smoked pot in the last month. Two said yes. Let Yi be 1 if person i smoked pot and 0 otherwise. What’s a good guess for distribution of Yi? Saturday, 27 March 2010
  • 10. 3.5e−08 3.0e−08 2.5e−08 2.0e−08 prob 1.5e−08 1.0e−08 5.0e−09 0.0e+00 0.0 0.2 0.4 0.6 0.8 1.0 p Saturday, 27 March 2010
  • 11. Plug-in principle A good first guess at the true mean, variance, or any other parameter of a distribution is simply to estimate it from the data. We’ll learn more sophisticated ways after the break. Saturday, 27 March 2010
  • 12. Problems If I picked someone at random out of the Rice campus directory, and asked them if they’d smoked pot in the last month, how likely do you think I am to get an honest answer? How can we get around this? Saturday, 27 March 2010
  • 13. Your turn Compare your survey to someone with the other colour survey. What is the difference? How could you use the answers to estimate how many people smoke pot? Saturday, 27 March 2010
  • 14. Questions 1. Have you smoked pot in the last month? 2. Flip a coin. If it’s heads, write yes. Otherwise write yes if you’ve smoked pot in the last month, and no otherwise. 3. In the last month, how many of the following activities have you done? One list contains smoking pot, the other doesn’t. Saturday, 27 March 2010
  • 15. Results 1. The proportion of yeses in Q1. 2. The proportion of yeses in (Q2 - 0.5) * 2 3. (Q3a/4 - Q3b/5) / n. Saturday, 27 March 2010
  • 16. Your turn Fill in the survey. Make sure no one else can see your answers! Saturday, 27 March 2010
  • 17. Definitions Parameter space: set of all possible parameter values Estimator: process/function which takes data and gives best guess for parameter Point estimate: estimator for a single value Saturday, 27 March 2010
  • 18. Your turn With a partner, brainstorm as many different properties that you could use to decide which estimator is best. Remember that there is some true unknown proportion that we are trying to estimate. Saturday, 27 March 2010
  • 19. Important properties of an estimator Unbiased Small variance Consistent Sufficient Saturday, 27 March 2010
  • 20. ˆ =θ E(θ) Unbiased ˆ1 ) < V ar(θ2 ) V ar(θ ˆ Minimum variance Common problem is to find UMVE (unbiased minimum variance estimator) across all possible estimators Saturday, 27 March 2010
  • 21. Low bias, low variance Low bias, high variance High bias, low variance High bias, high variance Saturday, 27 March 2010
  • 22. Consistent n=10 n=5 n=1 Saturday, 27 March 2010
  • 24. Your turn How can we describe this as a random experiment? Do you have any concerns about applying the results of this survey to the entire Rice campus? Saturday, 27 March 2010
  • 25. Experiments Remember, the context is always the random experiment. For this case the randomness comes from picking the person at random. Repeating the experiment does not mean asking each person again (the answer would be the same) but asking a different set of people. Saturday, 27 March 2010
  • 26. Next time (After the test and spring recess) Method of moments Maximum likelihood Saturday, 27 March 2010