Outline
                  Why Statistics?
 Populations, Samples, and Census
         Some Sampling Concepts




             Lecture 1
Chapter 1: Basic Statistical Concepts


                      M. George Akritas




                M. George Akritas    Lecture 1 Chapter 1: Basic Statistical Concepts




Why Statistics?


Populations, Samples, and Census


Some Sampling Concepts
   Representative Samples
   Simple Random and Stratified Sampling
   Sampling With and Without Replacement
   Non-representative Sampling







Example (Examples of Engineering/Scientific Studies)
    Comparing the compressive strength of two or more cement
    mixtures.
    Comparing the effectiveness of three cleaning products in
    removing four different types of stains.
    Predicting failure time on the basis of stress applied.
    Assessing the effectiveness of a new traffic regulatory measure
    in reducing the weekly rate of accidents.
    Testing a manufacturer’s claim regarding a product’s quality.
    Studying the relation between salary increases and employee
    productivity in a large corporation.

What makes these studies challenging (and thus in need of
Statistics) is the inherent or intrinsic variability:





     The compressive strength of different preparations of the same
     cement mixture will differ. The figure at
     http://sites.stat.psu.edu/~mga/401/fig/HistComprStrCement.pdf
     shows 32 compressive strength measurements, in MPa
     (megapascals), of test cylinders 6 in. in diameter by 12
     in. high, made with a water/cement ratio of 0.4 and measured on
     the 28th day after they were made.
     Under the same stress, two beams will fail at different times.
     The proportion of defective items of a certain product will
     differ from batch to batch.

Intrinsic variability renders the objectives of the case studies, as
stated, ambiguous.






The objectives of the case studies can be made precise if stated in
terms of averages or means.

    Comparing the average hardness of two different cement
    mixtures.
    Predicting the average failure time on the basis of stress
    applied.
    Estimation of the average coefficient of thermal expansion.
    Estimation of the average proportion of defective items.

Moreover, because of variability, the words "average" and "mean"
have a technical meaning which can be made clear through the
concepts of population and sample.





Definition
A population is a well-defined collection of objects or subjects, of
relevance to a particular study, which are exposed to the same
treatment or method. Population members are called units.

Example (Examples of populations:)

    All water samples that can be taken from a lake.
    All items of a certain manufactured product.
    All students enrolled in Big Ten universities during the
    2007-08 academic year.
    Two types of cleaning products. (Each type corresponds to a
    population.)





The objective of a study is to investigate certain characteristic(s)
of the units of the population(s) of interest.

Example (Examples of characteristics:)

    All water samples taken from a lake. Characteristics: Mercury
    concentration; Concentration of other pollutants.
    All items of a certain manufactured product (that have been, or
    will be, produced). Characteristic: Proportion of defective items.
    All students enrolled in Big Ten universities during the
    2007-08 academic year. Characteristics: Favorite type of
    music; Political affiliation.
    Two types of cleaning products. Characteristic: cleaning
    effectiveness.






In the example where different (but of the same type) beams
are exposed to different stress levels:
    the characteristic of interest is time to failure of a beam under
    each stress level, and
    each stress level used in the study corresponds to a separate
    population which consists of all beams that will be exposed to
    that stress level.
This emphasizes that populations are defined not only by the
units they consist of, but also by the method or treatment
applied to these units.








Full (i.e. population-level) understanding of a characteristic
requires the examination of all population units, i.e. a census.

    For example, full understanding of the relation between salary
    and productivity of a corporation’s employees requires
    obtaining these two characteristics from all employees.
However,
    taking a census can be time-consuming and expensive: The
    2000 U.S. Census cost $6.5 billion, while the 2010 Census
    cost $13 billion.
    Moreover, a census is not feasible if the population is
    hypothetical or conceptual, i.e. not all members are
    available for examination.
Because of the above, we typically settle for examining all
units in a sample, which is a subset of the population.





Due to the intrinsic variability, the sample properties/attributes of
the characteristic of interest will differ from those of the
population. For example

     The average mercury concentration in 25 water samples will
     differ from the overall mercury concentration in the lake.
     The proportion in a sample of 100 PSU students who favor
     the use of solar energy will differ from the corresponding
     proportion of all PSU students.
     The relation between a bear’s chest girth and weight in a
     sample of 10 bears will differ from the corresponding relation
     in the entire population of 50 bears in a forested region.
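
The gap between a sample value and the population value can be seen in a small simulation. The sketch below is only an illustration: the 50 bear weights are randomly generated, made-up numbers (not data from the lecture), and the names `population` and `sample` are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: weights (in pounds) of 50 bears. Invented values.
population = [round(rng.uniform(100, 400)) for _ in range(50)]

# A sample of 10 bears, drawn without replacement.
sample = rng.sample(population, 10)

population_mean = mean(population)  # population-level average
sample_mean = mean(sample)          # average computed from the sample only

print(f"population mean weight: {population_mean:.1f}")
print(f"sample mean weight:     {sample_mean:.1f}")
print(f"difference:             {sample_mean - population_mean:+.1f}")
```

Changing the seed changes the sample, and with it the size and sign of the difference.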




The GOOD NEWS is that, if the sample is suitably drawn, then
sample properties approximate the population properties.


[Figure: scatterplot with Weight on the vertical axis and Chest Girth on the horizontal axis.]

Figure: Population and sample relationships between chest girth and ...



Sampling Variability


       Sample properties of the characteristic of interest also differ
       from sample to sample. For example:
        1. The number of US citizens, in a sample of size 20, who favor
           expanding solar energy, will (most likely) be different from the
           corresponding number in a different sample of 20 US citizens.
        2. The average mercury concentration in two sets of 25 water
           samples drawn from a lake will differ.
       The term sampling variability is used to describe such
       differences in the characteristic of interest from sample to
       sample.
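
Sampling variability can be made concrete by drawing several samples from the same population and comparing them. In this sketch the population of 10,000 citizens and its 60% share in favor are invented for illustration; only the sample size of 20 comes from the example above.

```python
import random

rng = random.Random(2)  # fixed seed so the run is repeatable

# Hypothetical population: True means "favors expanding solar energy".
# The 60/40 split is an invented figure, not a survey result.
population = [True] * 6000 + [False] * 4000

# Five separate samples of size 20, each drawn without replacement.
counts = [sum(rng.sample(population, 20)) for _ in range(5)]

print(counts)  # number in favor in each of the five samples
```

The five counts come from the same population, so any differences among them are due to sampling alone.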







[Figure: scatterplot with Weight on the vertical axis and Chest Girth on the horizontal axis.]




         Figure: Illustration of Sampling Variability.






Population level properties/attributes of characteristic(s) of
interest are called (population) parameters.
     Examples of parameters include averages, proportions,
     percentiles, and correlation coefficients.
The corresponding sample properties/attributes of
characteristics are called statistics. The term sports statistics
comes from this terminology.
Sample statistics approximate the corresponding population
parameters but are not equal to them.
Statistical inference deals with the uncertainty issues which
arise in approximating parameters by statistics.
The tools of statistical inference include point and interval
estimation, hypothesis testing and prediction.






Example (Examples of Estimation, Hypothesis Testing and
Prediction)

    Estimation (point and interval) would be used in the task of
    estimating the coefficient of thermal expansion of a metal, or
    the air pollution level.
    Hypothesis testing would be used for deciding whether to take
    corrective action to bring the air pollution level down, or
    whether a manufacturer’s claim regarding the quality of a
    product is false.
    Prediction arises in cases where we would like to predict the
    failure time on the basis of the stress applied, or the age of a
    tree on the basis of its trunk diameter.






For valid statistical inference the sample must be
representative of the population. For example, a sample of
PSU basketball players is not representative of PSU students,
if the characteristic of interest is height.
Typically it is hard to tell whether a sample is representative
of the population. So, we define a sample to be representative
if . . . (circular definition!)

           it allows for valid statistical inference.

The only guarantee for that comes from the method used to
select the sample (sampling method).
The good news is that there are several sampling methods
that guarantee representativeness.




Definition
A sample of size n is a simple random sample if the selection
process ensures that every sample of size n has an equal chance of
being selected.
    To select a s.r.s. of size 10 from a population of 100 units, any
    of the 100!/(10!90!) samples of size 10 must be equally likely.
    In simple random sampling every member of the population
    has the same chance of being included in the sample. The
    converse, however, is not true.
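
As a quick check of the count 100!/(10!90!), the sketch below evaluates it with Python's standard `math.comb`, and also confirms that under simple random sampling each unit is included with probability n/N. The variable names are illustrative only.

```python
import math
from fractions import Fraction

N, n = 100, 10

# Number of distinct samples of size n: N! / (n! (N - n)!)
num_samples = math.comb(N, n)
print(num_samples)

# Samples containing one particular unit: choose the other n - 1
# members from the remaining N - 1 units.
samples_with_unit = math.comb(N - 1, n - 1)

# With all samples equally likely, the inclusion probability is the ratio.
inclusion_prob = Fraction(samples_with_unit, num_samples)
print(inclusion_prob)  # equals n/N
```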

Example
To select a sample of 2 students from a population of 20 male and
20 female students, one selects at random one male and one
female student. Is this a s.r.s.? (Does every student have the
same chance of being included in the sample?)
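
One way to explore the example is to enumerate the possible samples. The sketch below uses hypothetical student labels (`M1`..`M20`, `F1`..`F20`) and compares the samples a s.r.s. could produce with those the "one male, one female" scheme can produce.

```python
from fractions import Fraction
from itertools import combinations

males = [f"M{i}" for i in range(1, 21)]
females = [f"F{i}" for i in range(1, 21)]
students = males + females

# Every sample of size 2 that a simple random sample could produce.
all_pairs = list(combinations(students, 2))

# Samples the "one male, one female" scheme can produce (all equally likely).
mixed_pairs = [(m, f) for m in males for f in females]

# Chance that one particular student (here M1) ends up in the sample.
scheme_inclusion = Fraction(sum(1 for p in mixed_pairs if "M1" in p), len(mixed_pairs))
srs_inclusion = Fraction(sum(1 for p in all_pairs if "M1" in p), len(all_pairs))

print(len(all_pairs), len(mixed_pairs))
print(scheme_inclusion, srs_inclusion)
```

Both inclusion probabilities are 1/20, so every student has the same chance of being included. Yet the scheme can never produce a pair of two males or two females, so not every sample of size 2 is equally likely, and the scheme is not a s.r.s.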


Another sampling method for obtaining a representative sample is
called stratified sampling.

Definition
A stratified sample consists of simple random samples from each
of a number of groups (which are non-overlapping and make up
the entire population) called strata.

    Examples of strata include: ethnic groups, age groups, and
    production facilities.
    If the units in the different strata differ in terms of the
    characteristic under study, stratified sampling is preferable to
    s.r.s. For example, if different production facilities differ in
    terms of the proportion of defective products, a stratified
    sample is preferable.
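
The definition can be sketched in code as one simple random sample per stratum. Everything specific here is an assumption for illustration: the three facilities, their sizes, the item labels, and the choice to sample the same fraction from each stratum (the definition itself does not fix the per-stratum sample sizes).

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical strata: items produced at three facilities (made-up labels).
strata = {
    "facility_A": [f"A{i}" for i in range(500)],
    "facility_B": [f"B{i}" for i in range(300)],
    "facility_C": [f"C{i}" for i in range(200)],
}

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample from each stratum.

    Assumption: each stratum contributes the same fraction of its units.
    """
    result = {}
    for name, units in strata.items():
        k = round(fraction * len(units))
        result[name] = rng.sample(units, k)  # s.r.s. within the stratum
    return result

sample = stratified_sample(strata, 0.1, rng)
for name, units in sample.items():
    print(name, len(units))
```

Because every facility is sampled separately, none of them can be left out of the sample by chance.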





How do we select a s.r.s. of size n from a population of N units?
    STEP 1: Assign to each unit a number from 1 to N.
    STEP 2: Write each number on a slip of paper, place the N
    slips of paper in an urn, and shuffle them.
    STEP 3: Select n slips of paper at random, one at a time.
Alternatively, the entire process can be performed in software like
R. We will see this in the next lab session.
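
The lecture refers to R for this. As a rough equivalent, here is a minimal Python sketch of the same urn procedure using the standard library's `random.sample`; the helper name `simple_random_sample` is hypothetical.

```python
import random

def simple_random_sample(N, n, rng=random):
    """Return n distinct unit numbers drawn from 1..N, mirroring the urn steps."""
    labels = range(1, N + 1)      # STEP 1: number the units 1..N
    return rng.sample(labels, n)  # STEPS 2-3: draw n slips, none put back

# Example: a s.r.s. of size 10 from a population of 100 units.
print(simple_random_sample(100, 10, random.Random(4)))
```

Passing a seeded `random.Random` makes the draw repeatable. Omitting it uses the module's default generator.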







Sampling without replacement simply means that a
population unit can be included in a sample at most once. For
example, a simple random sample is obtained by sampling
without replacement: Once a unit’s slip of paper is drawn, it
is not placed back into the urn.
Sampling with replacement means that after a unit’s slip of
paper is chosen, it is put back in the urn. Thus a population
unit could be included in the sample anywhere between 0 and
n times. Rolling a die can be thought of as sampling with
replacement from the numbers 1, 2, . . . , 6.
Though conceptually undesirable, sampling with replacement
is easier to work with from a mathematical point of view.
When a population is very large, sampling with and without
replacement are practically equivalent.
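
The two schemes, and the claim that they nearly coincide for large populations, can be illustrated as follows. The sketch uses Python's `random.sample` (without replacement) and `random.choices` (with replacement). The helper `prob_no_repeat` is a hypothetical name, and the population sizes are chosen only for illustration.

```python
import random

rng = random.Random(5)   # fixed seed so the run is repeatable
urn = list(range(1, 7))  # the numbers 1, ..., 6, as in the die example

without_repl = rng.sample(urn, 4)    # a unit can appear at most once
with_repl = rng.choices(urn, k=10)   # repeats allowed, so k may exceed 6

def prob_no_repeat(N, n):
    """Chance that n draws WITH replacement from N units are all distinct."""
    p = 1.0
    for i in range(n):
        p *= (N - i) / N
    return p

print(without_repl, with_repl)
print(prob_no_repeat(6, 4))          # small population: repeats are common
print(prob_no_repeat(1_000_000, 4))  # large population: repeats are rare
```

When repeats are this rare, a sample drawn with replacement almost always looks like one drawn without replacement.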





Non-representative samples arise whenever the sampling plan
is such that a part, or parts, of the population of interest are
either excluded from, or systematically under-represented in,
the sample. This is called selection bias.
Two examples of non-representative samples are self-selected
and convenience samples.
A self-selected sample often occurs when people are asked to
send in their opinions in surveys or questionnaires. For
example, in a political survey, often those who feel that things
are running smoothly or who support an incumbent will
(apathetically) not respond, whereas those activists who
strongly desire change will voice their opinions.





    A convenience sample is a sample made up of units that
    are most easily reached. For example, randomly selecting
    students from your classes will not result in a sample that is
    representative of all PSU students because your classes are
    made up mostly of students with the same major as you.
    A famous example of selection bias is the following.

Example (The Literary Digest poll of 1936)
The magazine had been extremely successful in predicting the
results in US presidential elections, but in 1936 it predicted a
3-to-2 victory for Republican Alf Landon over the Democratic
incumbent Franklin Delano Roosevelt. Worth noting is that this
prediction was based on 2.3 million responses (out of 10 million
questionnaires sent). On the other hand Gallup correctly predicted
the outcome of that election by surveying only 50,000 people.
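
The point that a large biased sample can do worse than a small random one can be illustrated with a toy simulation. This is not a reconstruction of the 1936 polls: the electorate, the 60% support level, and the response rates below are all invented for illustration.

```python
import random

rng = random.Random(6)  # fixed seed so the run is repeatable

# Hypothetical electorate: 1 = supports candidate A, 0 = does not.
voters = [1] * 60_000 + [0] * 40_000
true_share = sum(voters) / len(voters)

def response_probability(vote):
    # Assumed selection bias: non-supporters respond three times as often.
    return 0.1 if vote == 1 else 0.3

# Large self-selected sample: each voter responds with the probability above.
biased = [v for v in voters if rng.random() < response_probability(v)]
biased_share = sum(biased) / len(biased)

# Small simple random sample of 500 voters.
small = rng.sample(voters, 500)
small_share = sum(small) / len(small)

print(f"true share:          {true_share:.3f}")
print(f"biased (n={len(biased)}): {biased_share:.3f}")
print(f"random (n={len(small)}):   {small_share:.3f}")
```

Under these assumptions the self-selected sample is many times larger, yet its estimate is far from the true share, while the small random sample lands close to it.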




Go to next lesson
http://www.stat.psu.edu/~mga/401/course.info/b.lect2.pdf
Go to the Stat 401 home page
http://www.stat.psu.edu/~mga/401/course.info/
http://www.stat.psu.edu/~mga
http://www.google.com




                    M. George Akritas    Lecture 1 Chapter 1: Basic Statistical Concepts

Weitere ähnliche Inhalte

Ähnlich wie B.lect1

Berman pcori challenge document
Berman pcori challenge documentBerman pcori challenge document
Berman pcori challenge documentLew Berman
 
statistics - Populations and Samples.pdf
statistics - Populations and Samples.pdfstatistics - Populations and Samples.pdf
statistics - Populations and Samples.pdfkobra22
 
PSY326 WEEK 2 ASSIGNMENT RESEARCH QUESTION, HYPOTHESIS, AND APPRO.docx
PSY326 WEEK 2 ASSIGNMENT RESEARCH QUESTION, HYPOTHESIS, AND APPRO.docxPSY326 WEEK 2 ASSIGNMENT RESEARCH QUESTION, HYPOTHESIS, AND APPRO.docx
PSY326 WEEK 2 ASSIGNMENT RESEARCH QUESTION, HYPOTHESIS, AND APPRO.docxwoodruffeloisa
 
Good Science Essay Topics. Essay on Science and Technology Science and Techn...
Good Science Essay Topics. Essay on Science and Technology  Science and Techn...Good Science Essay Topics. Essay on Science and Technology  Science and Techn...
Good Science Essay Topics. Essay on Science and Technology Science and Techn...Kimberly Pulley
 
Sqqs1013 ch1-a122
Sqqs1013 ch1-a122Sqqs1013 ch1-a122
Sqqs1013 ch1-a122kim rae KI
 
Lecture 1 basic concepts2009
Lecture 1 basic concepts2009Lecture 1 basic concepts2009
Lecture 1 basic concepts2009barath r baskaran
 
Acadamic Writing AricleWS.pptx
Acadamic Writing AricleWS.pptxAcadamic Writing AricleWS.pptx
Acadamic Writing AricleWS.pptxWaseemPanhwar
 
Ch 1 and 2 test review
Ch 1 and 2 test reviewCh 1 and 2 test review
Ch 1 and 2 test reviewEsther Herrera
 
Internal examination 3rd semester disaster
Internal examination 3rd semester disasterInternal examination 3rd semester disaster
Internal examination 3rd semester disasterMahendra Poudel
 
Nber Lecture Final
Nber Lecture FinalNber Lecture Final
Nber Lecture FinalNBER
 
Applications of Computer Science in Environmental Models
Applications of Computer Science in Environmental ModelsApplications of Computer Science in Environmental Models
Applications of Computer Science in Environmental ModelsIJLT EMAS
 
Frictional resistance in self ligating orthodontic brackets and conventionall...
Frictional resistance in self ligating orthodontic brackets and conventionall...Frictional resistance in self ligating orthodontic brackets and conventionall...
Frictional resistance in self ligating orthodontic brackets and conventionall...VARADARAJU MAGESH
 
Quotes For College Essays. This is How You Write a College Essay College app...
Quotes For College Essays. This is How You Write a College Essay  College app...Quotes For College Essays. This is How You Write a College Essay  College app...
Quotes For College Essays. This is How You Write a College Essay College app...Mimi Williams
 
Cases studies 3 & 4 – primary care a 47 year-old male pati
Cases studies 3 & 4 – primary care a 47 year-old male patiCases studies 3 & 4 – primary care a 47 year-old male pati
Cases studies 3 & 4 – primary care a 47 year-old male patisodhi3
 

Ähnlich wie B.lect1 (20)

Lab 1 intro
Lab 1 introLab 1 intro
Lab 1 intro
 
Berman pcori challenge document
Berman pcori challenge documentBerman pcori challenge document
Berman pcori challenge document
 
Samples Types and Methods
Samples Types and Methods Samples Types and Methods
Samples Types and Methods
 
statistics - Populations and Samples.pdf
statistics - Populations and Samples.pdfstatistics - Populations and Samples.pdf
statistics - Populations and Samples.pdf
 
PSY326 WEEK 2 ASSIGNMENT RESEARCH QUESTION, HYPOTHESIS, AND APPRO.docx
PSY326 WEEK 2 ASSIGNMENT RESEARCH QUESTION, HYPOTHESIS, AND APPRO.docxPSY326 WEEK 2 ASSIGNMENT RESEARCH QUESTION, HYPOTHESIS, AND APPRO.docx
PSY326 WEEK 2 ASSIGNMENT RESEARCH QUESTION, HYPOTHESIS, AND APPRO.docx
 
Good Science Essay Topics. Essay on Science and Technology Science and Techn...
Good Science Essay Topics. Essay on Science and Technology  Science and Techn...Good Science Essay Topics. Essay on Science and Technology  Science and Techn...
Good Science Essay Topics. Essay on Science and Technology Science and Techn...
 
Sqqs1013 ch1-a122
Sqqs1013 ch1-a122Sqqs1013 ch1-a122
Sqqs1013 ch1-a122
 
Ch 4 SAMPLE..doc
Ch 4 SAMPLE..docCh 4 SAMPLE..doc
Ch 4 SAMPLE..doc
 
Lecture 1 basic concepts2009
Lecture 1 basic concepts2009Lecture 1 basic concepts2009
Lecture 1 basic concepts2009
 
Acadamic Writing AricleWS.pptx
Acadamic Writing AricleWS.pptxAcadamic Writing AricleWS.pptx
Acadamic Writing AricleWS.pptx
 
Ch 1 and 2 test review
Ch 1 and 2 test reviewCh 1 and 2 test review
Ch 1 and 2 test review
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
Internal examination 3rd semester disaster
Internal examination 3rd semester disasterInternal examination 3rd semester disaster
Internal examination 3rd semester disaster
 
Federalists Essays.pdf
Federalists Essays.pdfFederalists Essays.pdf
Federalists Essays.pdf
 
Nber Lecture Final
Nber Lecture FinalNber Lecture Final
Nber Lecture Final
 
Applications of Computer Science in Environmental Models
Applications of Computer Science in Environmental ModelsApplications of Computer Science in Environmental Models
Applications of Computer Science in Environmental Models
 
Frictional resistance in self ligating orthodontic brackets and conventionall...
Frictional resistance in self ligating orthodontic brackets and conventionall...Frictional resistance in self ligating orthodontic brackets and conventionall...
Frictional resistance in self ligating orthodontic brackets and conventionall...
 
Practice Test 1 solutions
Practice Test 1 solutions  Practice Test 1 solutions
Practice Test 1 solutions
 
Quotes For College Essays. This is How You Write a College Essay College app...
Quotes For College Essays. This is How You Write a College Essay  College app...Quotes For College Essays. This is How You Write a College Essay  College app...
Quotes For College Essays. This is How You Write a College Essay College app...
 
Cases studies 3 & 4 – primary care a 47 year-old male pati
Cases studies 3 & 4 – primary care a 47 year-old male patiCases studies 3 & 4 – primary care a 47 year-old male pati
Cases studies 3 & 4 – primary care a 47 year-old male pati
 

Mehr von Ankit Katiyar

Transportation and assignment_problem
Transportation and assignment_problemTransportation and assignment_problem
Transportation and assignment_problemAnkit Katiyar
 
Time and space complexity
Time and space complexityTime and space complexity
Time and space complexityAnkit Katiyar
 
The oc curve_of_attribute_acceptance_plans
The oc curve_of_attribute_acceptance_plansThe oc curve_of_attribute_acceptance_plans
The oc curve_of_attribute_acceptance_plansAnkit Katiyar
 
Simple queuingmodelspdf
Simple queuingmodelspdfSimple queuingmodelspdf
Simple queuingmodelspdfAnkit Katiyar
 
Scatter diagrams and correlation and simple linear regresssion
Scatter diagrams and correlation and simple linear regresssionScatter diagrams and correlation and simple linear regresssion
Scatter diagrams and correlation and simple linear regresssionAnkit Katiyar
 
Probability mass functions and probability density functions
Probability mass functions and probability density functionsProbability mass functions and probability density functions
Probability mass functions and probability density functionsAnkit Katiyar
 
Introduction to basic statistics
Introduction to basic statisticsIntroduction to basic statistics
Introduction to basic statisticsAnkit Katiyar
 
Conceptual foundations statistics and probability
Conceptual foundations   statistics and probabilityConceptual foundations   statistics and probability
Conceptual foundations statistics and probabilityAnkit Katiyar
 
Applied statistics and probability for engineers solution montgomery && runger
Applied statistics and probability for engineers solution   montgomery && rungerApplied statistics and probability for engineers solution   montgomery && runger
Applied statistics and probability for engineers solution montgomery && rungerAnkit Katiyar
 
A hand kano-model-boston_upa_may-12-2004
A hand kano-model-boston_upa_may-12-2004A hand kano-model-boston_upa_may-12-2004
A hand kano-model-boston_upa_may-12-2004Ankit Katiyar
 
08.slauson.dissertation
08.slauson.dissertation08.slauson.dissertation
08.slauson.dissertationAnkit Katiyar
 

Mehr von Ankit Katiyar (20)

Transportation and assignment_problem
Transportation and assignment_problemTransportation and assignment_problem
Transportation and assignment_problem
 
Time and space complexity
Time and space complexityTime and space complexity
Time and space complexity
 
The oc curve_of_attribute_acceptance_plans
The oc curve_of_attribute_acceptance_plansThe oc curve_of_attribute_acceptance_plans
The oc curve_of_attribute_acceptance_plans
 
Stat methchapter
Stat methchapterStat methchapter
Stat methchapter
 
Simple queuingmodelspdf
Simple queuingmodelspdfSimple queuingmodelspdf
Simple queuingmodelspdf
 
Scatter diagrams and correlation and simple linear regresssion
Scatter diagrams and correlation and simple linear regresssionScatter diagrams and correlation and simple linear regresssion
Scatter diagrams and correlation and simple linear regresssion
 
Queueing 3
Queueing 3Queueing 3
Queueing 3
 
Queueing 2
Queueing 2Queueing 2
Queueing 2
 
Queueing
QueueingQueueing
Queueing
 
Probability mass functions and probability density functions
Probability mass functions and probability density functionsProbability mass functions and probability density functions
Probability mass functions and probability density functions
 
Lecture18
Lecture18Lecture18
Lecture18
 
Lect17
Lect17Lect17
Lect17
 
Lect 02
Lect 02Lect 02
Lect 02
 
Kano
KanoKano
Kano
 
Introduction to basic statistics
Introduction to basic statisticsIntroduction to basic statistics
Introduction to basic statistics
 
Conceptual foundations statistics and probability
Conceptual foundations   statistics and probabilityConceptual foundations   statistics and probability
Conceptual foundations statistics and probability
 
Axioms
AxiomsAxioms
Axioms
 
Applied statistics and probability for engineers solution montgomery && runger
Applied statistics and probability for engineers solution   montgomery && rungerApplied statistics and probability for engineers solution   montgomery && runger
Applied statistics and probability for engineers solution montgomery && runger
 
A hand kano-model-boston_upa_may-12-2004
A hand kano-model-boston_upa_may-12-2004A hand kano-model-boston_upa_may-12-2004
A hand kano-model-boston_upa_may-12-2004
 
08.slauson.dissertation
08.slauson.dissertation08.slauson.dissertation
08.slauson.dissertation
 

B.lect1

  • 1. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts Lecture 1 Chapter 1: Basic Statistical Concepts M. George Akritas M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 2. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts Why Statistics? Populations, Samples, and Census Some Sampling Concepts Representative Samples Simple Random and Stratified Sampling Sampling With and Without Replacement Non-representative Sampling M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 3. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts Example (Examples of Engineering/Scientific Studies) Comparing the compressive strength of two or more cement mixtures. Comparing the effectiveness of three cleaning products in removing four different types of stains. Predicting failure time on the basis of stress applied. Assessing the effectiveness of a new traffic regulatory measure in reducing the weekly rate of accidents. Testing a manufacturer’s claim regarding a product’s quality. Studying the relation between salary increases and employee productivity in a large corporation. What makes these studies challenging (and thus to require Statistics) is the inherent or intrinsic variability: M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 4. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts The compressive strength of different preparations of the same cement mixture will differ. The figure in http://sites. stat.psu.edu/~mga/401/fig/HistComprStrCement.pdf shows 32 compressive strength measurements, in MPa (MegaPascal units), of test cylinders 6 in. in diameter by 12 in. high, using water/cement ratio of 0.4, measured on the 28th day after they are made. Under the same stress, two beams will fail at different times. The proportion of defective items of a certain product will differ from batch to batch. Intrinsic variability renders the objectives of the case studies, as stated, ambiguous. M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 5. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts The objectives of the case studies can be made precise if stated in terms of averages or means. Comparing the average hardness of two different cement mixtures. Predicting the average failure time on the basis of stress applied. Estimation of the average coefficient of thermal expansion. Estimation of the average proportion of defective items. Moreover, because of variability, the words ”average” and ”mean” have a technical meaning which can be made clear through the concepts of population and sample. M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 6. Definition A population is a well-defined collection of objects or subjects, of relevance to a particular study, which are exposed to the same treatment or method. Population members are called units. Example (Examples of populations:) All water samples that can be taken from a lake. All items of a certain manufactured product. All students enrolled in Big Ten universities during the 2007-08 academic year. Two types of cleaning products. (Each type corresponds to a population.)
  • 7. The objective of a study is to investigate certain characteristic(s) of the units of the population(s) of interest. Example (Examples of characteristics:) All water samples taken from a lake. Characteristics: Mercury concentration; Concentration of other pollutants. All items of a certain manufactured product (that have been, or will be, produced). Characteristic: Proportion of defective items. All students enrolled in Big Ten universities during the 2007-08 academic year. Characteristics: Favorite type of music; Political affiliation. Two types of cleaning products. Characteristic: Cleaning effectiveness.
  • 8. In the example where different (but of the same type) beams are exposed to different stress levels: the characteristic of interest is time to failure of a beam under each stress level, and each stress level used in the study corresponds to a separate population which consists of all beams that will be exposed to that stress level. This emphasizes that populations are defined not only by the units they consist of, but also by the method or treatment applied to these units.
  • 9. Full (i.e. population-level) understanding of a characteristic requires the examination of all population units, i.e. a census. For example, full understanding of the relation between salary and productivity of a corporation’s employees requires obtaining these two characteristics from all employees. However, taking a census can be time-consuming and expensive: The 2000 U.S. Census cost $6.5 billion, while the 2010 Census cost $13 billion. Moreover, a census is not feasible if the population is hypothetical or conceptual, i.e. not all members are available for examination. Because of the above, we typically settle for examining all units in a sample, which is a subset of the population.
  • 10. Due to the intrinsic variability, the sample properties/attributes of the characteristic of interest will differ from those of the population. For example: The average mercury concentration in 25 water samples will differ from the overall mercury concentration in the lake. The proportion in a sample of 100 PSU students who favor the use of solar energy will differ from the corresponding proportion of all PSU students. The relation between a bear’s chest girth and weight in a sample of 10 bears will differ from the corresponding relation in the entire population of 50 bears in a forested region.
  • 11. The GOOD NEWS is that, if the sample is suitably drawn, then sample properties approximate the population properties. Figure: Population and sample relationships between chest girth and … (axes: Chest Girth, Weight)
  • 12. Sampling Variability Sample properties of the characteristic of interest also differ from sample to sample. For example: 1. The number of US citizens, in a sample of size 20, who favor expanding solar energy will (most likely) be different from the corresponding number in a different sample of 20 US citizens. 2. The average mercury concentration in two sets of 25 water samples drawn from a lake will differ. The term sampling variability is used to describe such differences in the characteristic of interest from sample to sample.
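Sampling variability can be seen in a small simulation. The sketch below is only an illustration: the population of mercury concentrations is synthetic (normally distributed values with a mean near 1.0, an assumption made here for the example), and the seed is arbitrary.

```python
import random
from statistics import mean

rng = random.Random(401)  # arbitrary fixed seed so the run is repeatable

# Hypothetical population: mercury concentration in each of 10,000 possible
# water samples from a lake (synthetic values, not real measurements).
population = [rng.gauss(1.0, 0.2) for _ in range(10_000)]
population_mean = mean(population)  # the population parameter

# Two separate sets of 25 water samples, each drawn without replacement.
sample_a = rng.sample(population, 25)
sample_b = rng.sample(population, 25)
mean_a = mean(sample_a)  # a sample statistic
mean_b = mean(sample_b)  # another value of the same statistic

print(f"population mean: {population_mean:.4f}")
print(f"sample A mean:   {mean_a:.4f}")
print(f"sample B mean:   {mean_b:.4f}")
```

The two sample means differ from each other and from the population mean, yet both land close to it, which is the sense in which a suitably drawn sample approximates the population.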
  • 13. Figure: Illustration of Sampling Variability. (Axes: Chest Girth, Weight.)
  • 14. Population-level properties/attributes of characteristic(s) of interest are called (population) parameters. Examples of parameters include averages, proportions, percentiles, and correlation coefficients. The corresponding sample properties/attributes of characteristics are called statistics. The term sports statistics comes from this terminology. Sample statistics approximate the corresponding population parameters but are not equal to them. Statistical inference deals with the uncertainty issues which arise in approximating parameters by statistics. The tools of statistical inference include point and interval estimation, hypothesis testing and prediction.
  • 15. Example (Examples of Estimation, Hypothesis Testing and Prediction) Estimation (point and interval) would be used in the task of estimating the coefficient of thermal expansion of a metal, or the air pollution level. Hypothesis testing would be used for deciding whether to take corrective action to bring the air pollution level down, or whether a manufacturer’s claim regarding the quality of a product is false. Prediction arises in cases where we would like to predict the failure time on the basis of the stress applied, or the age of a tree on the basis of its trunk diameter.
  • 16. For valid statistical inference the sample must be representative of the population. For example, a sample of PSU basketball players is not representative of PSU students if the characteristic of interest is height. Typically it is hard to tell whether a sample is representative of the population. So, we define a sample to be representative if . . . (circular definition!!) it allows for valid statistical inference. The only guarantee for that comes from the method used to select the sample (the sampling method). The good news is that there are several sampling methods that guarantee representativeness.
  • 17. Definition A sample of size n is a simple random sample (s.r.s.) if the selection process ensures that every sample of size n has an equal chance of being selected. To select a s.r.s. of size 10 from a population of 100 units, any of the 100!/(10!90!) samples of size 10 must be equally likely. In simple random sampling every member of the population has the same chance of being included in the sample. The converse, however, is not true. Example To select a sample of 2 students from a population of 20 male and 20 female students, one selects at random one male and one female student. Is this a s.r.s.? (Does every student have the same chance of being included in the sample?)
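The count 100!/(10!90!) and the one-male-one-female example can both be checked by direct enumeration. The sketch below is illustrative; the student labels (M1, F1, ...) and the helper name are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Number of distinct samples of size 10 from 100 units: 100!/(10! 90!).
num_samples_100_choose_10 = comb(100, 10)

# Hypothetical labels for the 20 male and 20 female students.
males = [f"M{i}" for i in range(1, 21)]
females = [f"F{i}" for i in range(1, 21)]
population = males + females

# Scheme 1: s.r.s. of size 2, every pair of students equally likely.
srs_samples = list(combinations(population, 2))
srs_prob = Fraction(1, len(srs_samples))

# Scheme 2: one male and one female, every male-female pair equally likely.
mf_samples = [(m, f) for m in males for f in females]
mf_prob = Fraction(1, len(mf_samples))


def inclusion_probability(unit, samples, prob):
    """Chance that `unit` ends up in the sample when each sample has chance `prob`."""
    return sum(prob for s in samples if unit in s)


srs_inclusion = inclusion_probability("M1", srs_samples, srs_prob)
mf_inclusion = inclusion_probability("M1", mf_samples, mf_prob)

# Chance of the particular sample {M1, M2} under the one-male-one-female scheme.
same_sex_pair_chance = sum(mf_prob for s in mf_samples if set(s) == {"M1", "M2"})

print(num_samples_100_choose_10)
print(srs_inclusion, mf_inclusion, same_sex_pair_chance)
```

Both schemes give each student the same inclusion probability, 1/20, but under the second scheme a same-sex pair such as {M1, M2} can never be selected. Not every sample of size 2 is equally likely, so the scheme is not a s.r.s.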
  • 18. Another sampling method for obtaining a representative sample is called stratified sampling. Definition A stratified sample consists of simple random samples from each of a number of groups (which are non-overlapping and make up the entire population) called strata. Examples of strata include: ethnic groups, age groups, and production facilities. If the units in the different strata differ in terms of the characteristic under study, stratified sampling is preferable to s.r.s. For example, if different production facilities differ in terms of the proportion of defective products, a stratified sample is preferable.
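A stratified sample can be sketched as one simple random sample per stratum. In the example below the facility names, stratum sizes, and total sample size are hypothetical, and allocating the sample in proportion to stratum size is an assumption made for the illustration; the definition above does not prescribe an allocation rule.

```python
import random

rng = random.Random(2024)  # arbitrary fixed seed

# Hypothetical strata: item IDs grouped by production facility.
strata = {
    "facility_A": [f"A{i}" for i in range(500)],
    "facility_B": [f"B{i}" for i in range(300)],
    "facility_C": [f"C{i}" for i in range(200)],
}
total_sample_size = 50
population_size = sum(len(units) for units in strata.values())

stratified_sample = {}
for name, units in strata.items():
    # Proportional allocation (an assumption): stratum share of the sample
    # equals stratum share of the population.
    n_h = round(total_sample_size * len(units) / population_size)
    # A simple random sample, without replacement, within the stratum.
    stratified_sample[name] = rng.sample(units, n_h)

for name, units in stratified_sample.items():
    print(name, len(units))
```

Every facility is guaranteed to appear in the sample, which a single s.r.s. from the pooled population would not guarantee.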
  • 19. How do we select a s.r.s. of size n from a population of N units? STEP 1: Assign to each unit a number from 1 to N. STEP 2: Write each number on a slip of paper, place the N slips of paper in an urn, and shuffle them. STEP 3: Select n slips of paper at random, one at a time. Alternatively, the entire process can be performed in software like R. We will see this in the next lab session.
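The three steps translate directly into software. The sketch below uses Python's standard `random` module rather than R, purely for illustration; the function name `simple_random_sample` is hypothetical.

```python
import random


def simple_random_sample(N, n, rng=random):
    """Return n distinct unit numbers drawn at random from 1..N."""
    if not 0 <= n <= N:
        raise ValueError("sample size n must satisfy 0 <= n <= N")
    units = range(1, N + 1)      # STEP 1: number the units 1..N
    return rng.sample(units, n)  # STEPS 2-3: shuffle the urn and draw n slips


selected = simple_random_sample(N=100, n=10, rng=random.Random(1))
print(sorted(selected))
```

`random.sample` draws without replacement, so no unit number can appear twice, matching the urn procedure in which a drawn slip is not returned.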
  • 20. Sampling without replacement simply means that a population unit can be included in a sample at most once. For example, a simple random sample is obtained by sampling without replacement: Once a unit’s slip of paper is drawn, it is not placed back into the urn. Sampling with replacement means that after a unit’s slip of paper is chosen, it is put back in the urn. Thus a population unit could be included in the sample anywhere between 0 and n times. Rolling a die can be thought of as sampling with replacement from the numbers 1, 2, . . . , 6. Though conceptually undesirable, sampling with replacement is easier to work with from a mathematical point of view. When a population is very large, sampling with and without replacement are practically equivalent.
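The two schemes, and the claim that they nearly coincide for large populations, can be illustrated as follows. This is a sketch under stated assumptions: the seed is arbitrary and the helper `prob_all_distinct` is a hypothetical name. It computes the chance that n draws with replacement from N units contain no repeats, which is (1 - 0/N)(1 - 1/N)...(1 - (n-1)/N).

```python
import random
from math import prod

rng = random.Random(7)  # arbitrary fixed seed
urn = list(range(1, 7))  # the numbers 1, 2, ..., 6

# Without replacement: each number can appear at most once.
without_replacement = rng.sample(urn, 4)

# With replacement: like rolling a die 10 times; repeats are possible.
with_replacement = rng.choices(urn, k=10)


def prob_all_distinct(N, n):
    """Chance that n draws with replacement from N units show no repeated unit."""
    return prod(1 - k / N for k in range(n))


small_population = prob_all_distinct(N=100, n=10)
large_population = prob_all_distinct(N=1_000_000, n=10)

print(without_replacement, with_replacement)
print(f"{small_population:.4f} {large_population:.6f}")
```

With N = 100 a sample of 10 drawn with replacement repeats a unit fairly often, while with N = 1,000,000 a repeat is very unlikely, so the two sampling schemes behave almost identically.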
  • 21. Non-representative samples arise whenever the sampling plan is such that a part, or parts, of the population of interest are either excluded from, or systematically under-represented in, the sample. This is called selection bias. Two examples of non-representative samples are self-selected and convenience samples. A self-selected sample often occurs when people are asked to send in their opinions in surveys or questionnaires. For example, in a political survey, often those who feel that things are running smoothly or who support an incumbent will (apathetically) not respond, whereas those activists who strongly desire change will voice their opinions.
  • 22. A convenience sample is a sample made up of units that are most easily reached. For example, randomly selecting students from your classes will not result in a sample that is representative of all PSU students because your classes consist mostly of students with the same major as you. A famous example of selection bias is the following. Example (The Literary Digest poll of 1936) The magazine had been extremely successful in predicting the results in US presidential elections, but in 1936 it predicted a 3-to-2 victory for Republican Alf Landon over the Democratic incumbent Franklin Delano Roosevelt. Worth noting is that this prediction was based on 2.3 million responses (out of 10 million questionnaires sent). On the other hand, Gallup correctly predicted the outcome of that election by surveying only 50,000 people.
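The point that a large biased sample can be worse than a small random one can be illustrated with a toy simulation. Everything in the sketch below is hypothetical: the population size, the 60% support level, and the response rates are invented for illustration and are not the 1936 figures.

```python
import random

rng = random.Random(1936)  # arbitrary fixed seed

# Hypothetical population of 100,000 voters; True means "supports candidate A".
population = [i < 60_000 for i in range(100_000)]
true_share = sum(population) / len(population)  # 0.6 by construction

# Self-selected sample: supporters of A respond with chance 0.1,
# everyone else with chance 0.4 (invented response rates).
respond_prob = {True: 0.1, False: 0.4}
self_selected = [v for v in population if rng.random() < respond_prob[v]]
biased_estimate = sum(self_selected) / len(self_selected)

# Simple random sample of only 1,000 voters.
srs = rng.sample(population, 1_000)
srs_estimate = sum(srs) / len(srs)

print(f"true share:            {true_share:.3f}")
print(f"self-selected (n={len(self_selected)}): {biased_estimate:.3f}")
print(f"s.r.s. (n={len(srs)}):          {srs_estimate:.3f}")
```

The self-selected sample is many times larger, yet its estimate is far from the true share because respondents differ systematically from non-respondents. The much smaller s.r.s. lands close to the true value.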
  • 23. Go to next lesson: http://www.stat.psu.edu/~mga/401/course.info/b.lect2.pdf Go to the Stat 401 home page: http://www.stat.psu.edu/~mga/401/course.info/ http://www.stat.psu.edu/~mga http://www.google.com