# Sampling & Sampling Distributions

This presentation covers sampling and sampling distributions, and estimation from large and small samples using the Z and t distributions respectively. It also explains in detail when to use the Finite Population Multiplier.

### Sampling & Sampling Distributions

1. SAMPLING & SAMPLING DISTRIBUTIONS ESTIMATION…
2. WHY SAMPLE? AN EXAMPLE
    - India's population = 132 Cr.
    - TV viewership = 66 Cr.
    - No. of TV sets = 16 Cr. (hypothetical)
    - We want to determine what programs Indians watch, and 10,000 TV sets are sampled for this purpose.
    - Why select so few sets out of 16 Cr.?
    - Because the time and average cost of an interview prohibit the rating companies from trying to reach millions of people.

    Birinder Singh, Assistant Professor, PCTE Ludhiana
3. IN THIS CHAPTER, WE EXAMINE QUESTIONS SUCH AS
    - How many people should be interviewed?
    - How should they be selected?
    - How do we know when our sample accurately reflects the entire population?
4. WHY SAMPLING?
    - The testing process is destructive (time constraint)
    - The population is too large to be completely tested
    - It is almost impossible to define the population
    - Average cost is too high
5. DEFINITIONS
    - Population: All items that have been chosen for study.
    - Sample: A portion chosen from the population.
    - Parameters: Characteristics that describe a population.
    - Statistics: Characteristics that describe a sample.
    - Census: Process of obtaining responses from or about each member of the population.
    - Sampling: Process of selecting a subset from the members of the population.
6. CONVENTIONS TO BE USED

    | Characteristic | Population Parameter | Sample Statistic |
    |---|---|---|
    | Size | $N$ | $n$ |
    | Mean | $\mu$ | $\bar{x}$ |
    | Std. Deviation | $\sigma$ | $s$ |
    | Proportion | $p$ or $\pi$ | $\bar{p}$ or $p$ |
7. SAMPLING METHODS
    - Non-Probability Sampling Methods
        - A sample in which the probability that an element of the population will be drawn is not known.
        - Classifications:
            - Convenience Sampling
            - Judgemental Sampling
            - Voluntary Response Sampling
    - Probability Sampling Methods
        - A sample in which the probability that an element of the population will be drawn is known.
        - It is also called random sampling.
        - Methods:
            - Simple Random Sampling
            - Systematic Sampling
            - Stratified Sampling
            - Cluster Sampling
8. SIMPLE RANDOM SAMPLING
    - Simple Random Sampling selects samples by methods that allow each possible sample to have an equal probability of being picked and each item in the entire population to have an equal chance of being included in the sample.
    - Ex: Selecting a pair of students from four students A, B, C, D.
    - How to do Random Sampling:
        - The easiest way is the use of random numbers. These numbers can be generated by a computer programmed to scramble numbers or taken from a table of random numbers/digits.
        - Another method is to write the name or number of each population element on a slip of paper, deposit the slips in a box and draw from it.
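The four-student example can be checked by enumeration. The sketch below is only an illustration (the variable names are not from the slides): it lists every possible pair and confirms that each sample, and each student, has the same chance of selection.

```python
import random
from itertools import combinations

students = ["A", "B", "C", "D"]

# Every possible sample of size 2, drawn without replacement
samples = list(combinations(students, 2))

# Under simple random sampling each possible sample is equally likely
prob_each_sample = 1 / len(samples)

# Chance that a given student ends up in the sample
inclusion = {
    s: sum(s in pair for pair in samples) / len(samples) for s in students
}

print(samples)
print(prob_each_sample, inclusion)

# Drawing one such sample with computer-generated random numbers
print(random.sample(students, 2))
```

With four students there are six possible pairs, so each pair has probability 1/6 and each student has a 1/2 chance of being included.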
9. SIMPLE RANDOM SAMPLING
    - Merits
        - It eliminates bias, hence is more representative of the population.
        - Its theory is reliable and highly developed.
        - It saves time and effort.
    - Demerits
        - It requires an up-to-date and complete list of the population units to be sampled.
        - If the area of coverage is large, random samples are also widely scattered geographically.
10. SYSTEMATIC SAMPLING
    - In systematic sampling, elements are selected from the population at a uniform interval that is measured in time, order or space.
    - Ex: If we wanted to interview every 20th student on a college campus, we would choose a random starting point in the first 20 names in the student directory and then pick every 20th name thereafter.
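A minimal sketch of the every-20th-student example follows. The directory of 200 names and the helper `systematic_sample` are hypothetical, introduced only for illustration.

```python
import random


def systematic_sample(items, k, rng=random):
    """Pick a random start among the first k items, then every k-th item."""
    start = rng.randrange(k)  # random starting point in positions 0..k-1
    return items[start::k]


# Hypothetical student directory with 200 names
directory = [f"student_{i:03d}" for i in range(1, 201)]

sample = systematic_sample(directory, 20, random.Random(42))
print(len(sample), sample[:3])
```

Because the interval is fixed, only the starting point is random; a directory of 200 names sampled at an interval of 20 always yields 10 students.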
11. STRATIFIED RANDOM SAMPLING VS CLUSTER SAMPLING
    - Stratified random sampling
        - We divide the population into relatively homogeneous groups called strata.
        - Each group has small variation within itself, but there is wide variation between the groups.
    - Cluster random sampling
        - We divide the population into groups or clusters and then select a random sample of these clusters.
        - Each group has considerable variation within itself, but there is a noticeable similarity between the groups.
12. SAMPLING DISTRIBUTIONS
    - Sampling Distribution of the Mean: the probability distribution of all the possible means of the samples, i.e. a distribution of the sample means.
    - Ex: Suppose our samples each consist of ten 25-year-old women from a city with a population of 1,00,000. By computing the mean height and SD of each of these samples, we would quickly see that the mean and SD of each sample would be different.
    - Sampling Distribution of the Proportion: the same idea, but it refers to the sample proportion instead of the sample mean.
13. SAMPLING DISTRIBUTION – EXAMPLE

    | Population | Sample | Sample Statistic | Sampling Distribution |
    |---|---|---|---|
    | All professional basketball teams | Group of 5 players | Mean height | Sampling distribution of the mean |
    | All parts produced by a manufacturing process | 50 parts | Proportion defective | Sampling distribution of the proportion |
14. A SAMPLING DISTRIBUTION
    - Let's create a sampling distribution of means…
    - Take a sample of size 1,500 from the US. Record the mean income. Our census said the mean is \$30K.
15. A SAMPLING DISTRIBUTION
    - Let's create a sampling distribution of means…
    - Take another sample of size 1,500 from the US. Record the mean income. Our census said the mean is \$30K.
20. A SAMPLING DISTRIBUTION
    - Let's create a sampling distribution of means…
    - Let's repeat sampling of size 1,500 from the US. Record the mean incomes. Our census said the mean is \$30K.
23. A SAMPLING DISTRIBUTION
    - Let's create a sampling distribution of means…
    - Let's repeat sampling of size 1,500 from the US. Record the mean incomes. Our census said the mean is \$30K.
    - The sample means would stack up in a normal curve centred on \$30K: a normal sampling distribution.
24. A SAMPLING DISTRIBUTION
    - Say that the standard deviation of this distribution is \$10K.
    - Think back to the empirical rule. What are the odds you would get a sample mean that is more than \$20K off?
    - The sample means would stack up in a normal curve: a normal sampling distribution.
    - [Figure: normal curve centred on \$30K, horizontal axis marked from −3z to 3z]
25. A SAMPLING DISTRIBUTION
    - Say that the standard deviation of this distribution is \$10K.
    - Think back to the empirical rule. What are the odds you would get a sample mean that is more than \$20K off?
    - A deviation of \$20K is 2 standard deviations. About 95% of sample means fall within ±2z, leaving about 2.5% in each tail, so the odds are roughly 5%.
    - [Figure: normal curve centred on \$30K, horizontal axis marked from −3z to 3z, with 2.5% shaded in each tail]
26. STANDARD ERROR (S.E.)
    - The standard deviation of the distribution of a sample statistic is known as the standard error of the statistic.
    - SE indicates how spread out (dispersed) the means of the samples are.
    - SE indicates not only the size of the chance error that has been made, but also the accuracy we are likely to get if we use a sample statistic to estimate a population parameter.
    - A distribution of sample means that is less spread out (having a small SE) is a better estimator of the population mean.
27. RELATIONSHIPS BETWEEN POPULATION PARAMETERS AND THE SAMPLING DISTRIBUTION OF THE SAMPLE MEAN
    - The expected value of the sample mean is equal to the population mean: $E(\bar{x}) = \mu_{\bar{x}} = \mu$
    - The variance of the sample mean is equal to the population variance divided by the sample size: $V(\bar{x}) = \sigma_{\bar{x}}^{2} = \dfrac{\sigma^{2}}{n}$
    - The standard deviation of the sample mean, known as the standard error of the mean, is equal to the population standard deviation divided by the square root of the sample size: $SD(\bar{x}) = \text{s.e.}(\bar{x}) = \sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
28. CENTRAL LIMIT THEOREM
    - As sample size increases, the sampling distribution of the mean (SDM) approaches the normal distribution, irrespective of the nature of the population distribution.
    - As a rule of thumb, for n ≥ 30 the SDM is taken to be normally distributed.
    - This is called the Central Limit Theorem (CLT).
    - The significance of the CLT is that it permits us to use sample statistics to make inferences about population parameters without knowing anything about the shape of the frequency distribution of that population.
    - Sample means from populations which are normally distributed are also normally distributed, regardless of the size of the sample.
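A small simulation can make the theorem concrete. The sketch below assumes a hypothetical right-skewed population (exponential with mean 30, loosely echoing the \$30K income example); the distribution, seed and number of repetitions are illustrative choices, not part of the slides.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical skewed population: exponential with mean 30,
# for which the standard deviation also equals 30
population_mean = 30.0
population_sd = 30.0


def sample_mean(n):
    """Mean of one random sample of size n from the skewed population."""
    return statistics.fmean(
        [rng.expovariate(1 / population_mean) for _ in range(n)]
    )


n = 30
means = [sample_mean(n) for _ in range(5000)]

mean_of_means = statistics.fmean(means)   # should be near the population mean
se_observed = statistics.stdev(means)     # should be near sigma / sqrt(n)
se_theory = population_sd / n ** 0.5

print(mean_of_means, se_observed, se_theory)
```

Even though individual values are far from normal, the 5,000 sample means cluster around the population mean with a spread close to $\sigma/\sqrt{n}$, which is what the relationships on the previous slide predict.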
29. CONVENTIONS TO BE USED

    | Characteristic | Population Parameter | Sample Statistic |
    |---|---|---|
    | Size | $N$ | $n$ |
    | Mean | $\mu$ | $\bar{x}$ |
    | Std. Deviation | $\sigma$ | $s$ |
    | Proportion | $p$ or $\pi$ | $\bar{p}$ or $p$ |
30. WORKING METHODOLOGY
    - Make sure the population is infinite, i.e. N is not given.
    - Check whether n ≥ 30; if yes, the SDM is considered to be normally distributed.
    - Find the Z score using the formula $Z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$, where
        - $\bar{x}$ = sample mean
        - $\mu_{\bar{x}}$ = mean of means, with $\mu_{\bar{x}} = \mu$
        - $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
31. PRACTICE PROBLEMS – SDM / CLT
    - A bank calculates that its individual savings accounts have a mean of \$2000 and an SD of \$600. If the bank takes a random sample of 100 accounts, what is the probability that the sample mean will lie between \$1900 and \$2050? (Answer: 0.75)
32. PRACTICE PROBLEMS – SDM / CLT
    - A continuous manufacturing process produces items whose weights are normally distributed with a mean of 8 kg and an SD of 3 kg. A random sample of 16 items is to be drawn. What is the probability that the sample mean exceeds 9 kg? (Answer: 9.18%)
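The two practice problems above can be checked numerically. This is a sketch using Python's standard-library `NormalDist`; the helper name `prob_sample_mean_between` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_between(mu, sigma, n, low=None, high=None):
    """P(low < sample mean < high) when the SDM is normal with SE = sigma/sqrt(n)."""
    se = sigma / sqrt(n)                 # standard error of the mean
    sdm = NormalDist(mu, se)             # sampling distribution of the mean
    lower = sdm.cdf(low) if low is not None else 0.0
    upper = sdm.cdf(high) if high is not None else 1.0
    return upper - lower


# Bank accounts: mu = 2000, sigma = 600, n = 100
p_bank = prob_sample_mean_between(2000, 600, 100, low=1900, high=2050)

# Item weights: mu = 8, sigma = 3, n = 16, sample mean above 9
p_weight = prob_sample_mean_between(8, 3, 16, low=9)

print(round(p_bank, 4), round(p_weight, 4))
```

The results agree with the stated answers up to the rounding of Z to two decimals that a printed table forces.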
33. THE FINITE POPULATION MULTIPLIER
    - Most populations examined by decision makers are finite, i.e. of limited size.
    - The standard error of the mean for a finite population is given by $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \times \sqrt{\dfrac{N-n}{N-1}}$, where
        - $\sqrt{\dfrac{N-n}{N-1}}$ is called the Finite Population Multiplier
        - $N$ = size of population
        - $n$ = sample size
    - Population and sampling ratio:
        - If $n/N > 0.05$, the population is treated as finite
        - If $n/N \le 0.05$, the population is treated as infinite
    - When the sampling fraction is 0.05 or less, the finite population multiplier need not be used.
34. PRACTICE PROBLEMS
    - From a population of 125 items with a mean of 105 and an SD of 17, 64 items were chosen.
        - Find the standard error. (Answer: 1.4904)
        - What is $P(107.5 < \bar{x} < 109)$? (Answer: 0.0428)
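As a check on the problem above, the sketch below applies the finite population multiplier whenever the sampling fraction exceeds 0.05. The function name `standard_error` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sigma, n, N=None):
    """SE of the mean; applies the finite population multiplier when n/N > 0.05."""
    se = sigma / sqrt(n)
    if N is not None and n / N > 0.05:
        se *= sqrt((N - n) / (N - 1))    # finite population multiplier
    return se


# Population of 125 items, mean 105, SD 17, sample of 64 (n/N = 0.512)
se = standard_error(17, 64, N=125)
sdm = NormalDist(105, se)
p = sdm.cdf(109) - sdm.cdf(107.5)

print(round(se, 4), round(p, 4))
```

The standard error matches the stated 1.4904. The probability comes out at about 0.043 rather than 0.0428 because the table-based answer rounds each Z value to two decimals.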
35. PRACTICE PROBLEMS
    - From a population of 75 items with a mean of 364 and a variance of 18, 32 items were chosen.
        - Find the standard error.
        - What is $P(363 < \bar{x} < 366)$?
37. ESTIMATION
    - When you are ready to cross a street, you estimate the speed of the car that is approaching you, the distance between you and the car, and your own speed.
    - Based on these quick estimates, you decide whether to wait, walk or run.
38. REASONS FOR ESTIMATES
    - A unit head estimates next year's admissions.
    - A credit manager estimates whether a purchaser will eventually pay his bills.
    - Homemakers estimate the increase in commodity prices.
    - All these people make estimates without worrying about whether they are scientific, but with the hope that the estimates bear a reasonable resemblance to the outcome.
39. TYPES OF ESTIMATES
    - Point Estimate
        - A single number that is used to estimate an unknown population parameter.
        - Ex: The department head estimates from current data that the MBA course will have 300 students next year.
        - It has two limitations:
            - It is often insufficient, as it is either right or wrong.
            - The precision of the estimate cannot be evaluated.
    - Interval Estimate
        - A range of values that is used to estimate an unknown population parameter.
        - Ex: The department head estimates from current data that the MBA course will have 280-320 students next year.
        - It indicates the error in two ways:
            - The extent of the range
            - The probability of the true population parameter lying within that range
40. ESTIMATOR & ESTIMATES
    - An estimator is a sample statistic used to estimate a population parameter.
    - The sample mean $\bar{x}$ can be an estimator of the population mean $\mu$.
    - The sample proportion $\bar{p}$ can be an estimator of the population proportion $p$.
    - An estimate is a specific observed value (numerical value) of a statistic.

    | Population in which we are interested | Population parameter we wish to estimate | Sample statistic we will use as an estimator | Estimate we make |
    |---|---|---|---|
    | Employees in a furniture factory | Mean turnover per year | Mean turnover for a period of 1 month | 8.9% turnover per year |
    | Teenagers in a given community | Proportion who have a criminal record | Proportion of a sample of 50 teenagers | 2% have criminal records |
41. CHARACTERISTICS (CRITERIA) OF A GOOD ESTIMATOR
    - It should be unbiased: The sample mean is an unbiased estimator of the population mean because the mean of the sampling distribution of means is equal to the population mean, i.e. $\mu_{\bar{x}} = \mu$.
    - It should be efficient: Efficiency refers to the size of the standard error of the statistic. The estimator whose distribution has the smaller standard error is preferred.
    - It should be consistent: As sample size increases, it becomes almost certain that the value of the statistic comes very close to the value of the population parameter.
    - It should be sufficient: No other estimator should be able to extract more information about the population parameter from the sample.
42. PRACTICE PROBLEMS – POINT ESTIMATES
    - ABC Co. Ltd is considering expanding its seating capacity and needs to know both the average number of people who attend events there and the variability in this number. The following are the attendances (in thousands) at nine randomly selected sporting events. Find point estimates of the mean and the variance of the population from which the sample was drawn.
    - Data: 8.8, 14.0, 21.3, 7.9, 12.5, 20.6, 16.3, 14.1, 13.0
    - Answer: mean = 14.28, variance = 21.12
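The point estimates can be reproduced with the standard library. Note that `statistics.variance` divides by n − 1, which is the sample variance used to estimate the population variance; the variable names below are illustrative.

```python
import statistics

# Attendance (in thousands) at the nine sampled sporting events
attendance = [8.8, 14.0, 21.3, 7.9, 12.5, 20.6, 16.3, 14.1, 13.0]

mean_hat = statistics.mean(attendance)      # point estimate of the population mean
var_hat = statistics.variance(attendance)   # sample variance, divisor n - 1

print(round(mean_hat, 2), round(var_hat, 2))
```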
43. INTERVAL ESTIMATE
    - Interval Estimate: Range of values within which a population parameter is likely to be.
    - Confidence Level: Probability that is associated with an interval estimate.
    - Confidence Interval: Range of the estimate for a given confidence level.
    - $\bar{x} - z\,\sigma_{\bar{x}} \le \mu \le \bar{x} + z\,\sigma_{\bar{x}}$, where
        - $\bar{x}$ = sample mean (point estimate of the mean)
        - $z$ = confidence coefficient
        - $\sigma_{\bar{x}}$ = standard error
        - $\mu$ = population mean
44. COMMONLY USED CONFIDENCE LEVELS & CONFIDENCE COEFFICIENTS

    | Confidence Level (%) | Confidence Coefficient (z) |
    |---|---|
    | 90 | 1.64 |
    | 95 | 1.96 |
    | 98 | 2.33 |
    | 99 | 2.58 |
    | 68.26 | 1 |
    | 95.4 | 2 |
    | 99.7 | 3 |
45. INTERVAL ESTIMATES OF MEAN FROM LARGE SAMPLES
    - There are two cases:
        - Case 1: When the population SD is known
        - Case 2: When the population SD is not known
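46. COMPUTATIONAL PROCEDURE
    - Choose the level of confidence.
    - Find 'Z' for the chosen level.
    - Compute the standard error.
        - If σ is known:
            - For an infinite population: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
            - For a finite population: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \times \sqrt{\dfrac{N-n}{N-1}}$
        - If σ is not known:
            - Sample SE: $s_{\bar{x}} = \hat{\sigma}_{\bar{x}} = \dfrac{s}{\sqrt{n}}$, where the sample SD is $s = \hat{\sigma} = \sqrt{\dfrac{\sum (x - \bar{x})^{2}}{n-1}}$
            - The sample SD $s$ is used as an estimate of the population SD.
    - Construct the confidence interval.
        - Lower confidence limit: $LCL = \bar{x} - z\,\sigma_{\bar{x}}$
        - Upper confidence limit: $UCL = \bar{x} + z\,\sigma_{\bar{x}}$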
47. PRACTICE PROBLEMS – ESTIMATION
    - The sample mean life of 200 batteries of a make is 36 months. Estimate the mean life of that make of batteries with 95% confidence. The standard deviation of the population is known to be 10 months. (Answer: 34.61 ≤ µ ≤ 37.39)
48. PRACTICE PROBLEMS – ESTIMATION
    - 50 randomly selected pieces of plastic rope had a mean breaking strength of 25 psi and an SD of 1.4 psi. Find the mean breaking strength at the 99% confidence level. (psi = pounds per square inch) (Answer: 24.49 ≤ µ ≤ 25.51)
49. PRACTICE PROBLEMS – ESTIMATION
    - A large automotive parts wholesaler needs an estimate of the mean life it can expect from windshield wiper blades under typical driving conditions. Already, management has determined that the SD of the population life is 6 months. A random sample of 100 wiper blades has been selected with a mean life of 21 months. Find an interval estimate of the mean life with a confidence level of 95%. (Answer: 19.82 ≤ µ ≤ 22.18)
50. PRACTICE PROBLEMS – ESTIMATION
    - From a population of 540, a sample of 60 individuals is taken. From this sample, the mean is found to be 6.2 and the SD is 1.368.
        - Find the estimated standard error of the mean. (Answer: 0.167)
        - Construct a 96 percent confidence interval for the mean. (Answer: 5.86 ≤ µ ≤ 6.54)
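The four large-sample estimation problems follow the same computational procedure, sketched below. The function `mean_confidence_interval` is a hypothetical helper; it takes z from the exact normal quantile rather than the rounded table values, so results can differ from the stated answers in the third decimal.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(xbar, sd, n, confidence, N=None):
    """Large-sample CI for the mean; sd is sigma if known, else the sample SD."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # confidence coefficient
    se = sd / sqrt(n)
    if N is not None and n / N > 0.05:
        se *= sqrt((N - n) / (N - 1))                    # finite population multiplier
    return xbar - z * se, xbar + z * se


batteries = mean_confidence_interval(36, 10, 200, 0.95)
rope = mean_confidence_interval(25, 1.4, 50, 0.99)
wipers = mean_confidence_interval(21, 6, 100, 0.95)
finite = mean_confidence_interval(6.2, 1.368, 60, 0.96, N=540)

for name, (lcl, ucl) in [("batteries", batteries), ("rope", rope),
                         ("wipers", wipers), ("finite", finite)]:
    print(name, round(lcl, 2), round(ucl, 2))
```

In the last problem the sampling fraction is 60/540, which is above 0.05, so the finite population multiplier is applied before the interval is built.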
51. INTERVAL ESTIMATES OF MEAN FROM SMALL SAMPLES (T DISTRIBUTION)
    - In certain cases the normal distribution is not the appropriate sampling distribution, namely when we are estimating the population SD and the sample size is small, i.e. less than 30.
    - In such cases another distribution, called the t-distribution, is appropriate.
    - It is also called Student's distribution.
    - The first condition is a small sample (n < 30); the second condition is that the population standard deviation must be unknown.
52. T - DISTRIBUTION
    - The shape of the t distribution is very similar to the shape of the standard normal distribution.
    - The t distribution has a (slightly) different shape for each possible sample size.
    - They are all symmetric and unimodal.
    - They are somewhat broader than Z, reflecting the additional uncertainty resulting from using $s$ in place of $\sigma$.
    - As n gets larger and larger, the shape of the t distribution approaches the standard normal.
    - It contains more area under the tails.
    - We need to know the degrees of freedom of the t distribution. If the sample size is n, then df = n – 1.
53. CONDITIONS FOR T DISTRIBUTION
    - n < 30
    - The population SD (σ) is not known.
    - The population is assumed to be normal or nearly normal.
    - Note:
        - Since σ is not known, $\hat{\sigma}_{\bar{x}}$ is used in lieu of $\sigma_{\bar{x}}$.
        - The interval estimate of the population mean is $\bar{x} - t\,\hat{\sigma}_{\bar{x}} \le \mu \le \bar{x} + t\,\hat{\sigma}_{\bar{x}}$, where $t = \dfrac{\bar{x} - \mu}{\hat{\sigma}_{\bar{x}}}$.
        - The t-distribution table shows areas and t-values for only a few percentages (10, 5, 2, 1).
54. COMPUTATIONAL PROCEDURE
    - Choose the confidence level (CL).
    - Find the total chance of error, i.e. α = 1 – CL.
    - Find the degrees of freedom, i.e. df = n – 1.
    - Extract the t value using df and α.
    - Compute the interval estimate.
55. PRACTICE PROBLEMS – T DISTRIBUTION
    - Determine the 95% confidence interval for the mean burning time of marine flares if 9 flares were tested and yielded a mean burning time of 40 minutes with an SD of 10 minutes. (Answer: 32.32 ≤ µ ≤ 47.68)
56. PRACTICE PROBLEMS – T DISTRIBUTION
    - Seven homemakers were randomly sampled and it was determined that the distances they walked in their housework had an average of 39.2 miles per week and an SD of 3.2 miles per week. Construct a 95% confidence interval for the population mean. (Answer: 36.24 ≤ µ ≤ 42.16)
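The two t-distribution problems can be sketched as follows. Python's standard library has no t quantile function, so the critical values are hard-coded from a standard t table for a 95% confidence level (α = 0.05, two-tailed); the dictionary `T_CRIT_95` and the helper name are illustrative assumptions.

```python
from math import sqrt

# Two-tailed critical values for 95% confidence, keyed by degrees of freedom,
# copied from a standard t table (only the rows needed here)
T_CRIT_95 = {6: 2.447, 8: 2.306}


def t_confidence_interval(xbar, s, n, t_table=T_CRIT_95):
    """Small-sample CI for the mean using the estimated standard error s/sqrt(n)."""
    df = n - 1                    # degrees of freedom
    t = t_table[df]               # table look-up
    se_hat = s / sqrt(n)          # estimated standard error
    return xbar - t * se_hat, xbar + t * se_hat


flares = t_confidence_interval(40, 10, 9)
housework = t_confidence_interval(39.2, 3.2, 7)

print([round(v, 2) for v in flares], [round(v, 2) for v in housework])
```

For the flares the unrounded limits differ from the stated answer by about 0.01, which is consistent with the standard error having been rounded to 3.33 in the hand calculation.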
57. DECISION FLOW DIAGRAM - ESTIMATION
    - Start: Is n ≥ 30?
        - Yes: use the 'Z' table. Stop.
        - No: Is the population known to be normally distributed?
            - No: use a statistician.
            - Yes: Is the SD known?
                - Known: use the 'Z' table. Stop.
                - Not known: use the 't' table. Stop.
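The decision flow can be written as a small function. This is one reading of the flattened diagram (the yes/no branch labels are inferred from the t-distribution conditions stated earlier), and `choose_table` is a hypothetical name.

```python
def choose_table(n, population_normal=False, sd_known=False):
    """Return which table the decision flow points to for estimating a mean."""
    if n >= 30:
        return "z"                        # large sample: normal approximation
    if not population_normal:
        return "consult a statistician"   # small sample, non-normal population
    return "z" if sd_known else "t"       # small sample from a normal population


print(choose_table(200, sd_known=True))        # batteries problem
print(choose_table(9, population_normal=True)) # marine flares problem
```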
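58. SAMPLING DISTRIBUTION OF PROPORTIONS (SDP)

    | | Means | Proportions |
    |---|---|---|
    | Population value | $\mu$ | $p$ |
    | Sample value | $\bar{x}$ | $\bar{p}$ |
    | Mean of the sampling distribution | $\mu_{\bar{x}} = \mu$ | $\mu_{\bar{p}} = p$ |
    | SD of the sampling distribution | $\sigma_{\bar{x}}$ | $\sigma_{\bar{p}}$ |
    | Estimated SD of the sampling distribution | $\hat{\sigma}_{\bar{x}}$ | $\hat{\sigma}_{\bar{p}}$ |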
59. SDP – FORMULAE
    - Standard error, with $q = 1 - p$ and $\bar{q} = 1 - \bar{p}$:
        - $\sigma_{\bar{p}} = \sqrt{\dfrac{pq}{n}}$ (from the population proportion)
        - $\hat{\sigma}_{\bar{p}} = \sqrt{\dfrac{\bar{p}\,\bar{q}}{n}}$ (estimated from the sample proportion)
    - Confidence interval estimate in SDP:
        - $\bar{p} - z\,\sigma_{\bar{p}} \le p \le \bar{p} + z\,\sigma_{\bar{p}}$
60. PRACTICE PROBLEMS – SDP
    - A TV company wishes to find out the proportion of families in a city who own a TV. A sample survey of 400 families revealed that 320 of them owned a TV. Estimate with 95% confidence the percentage of families in the entire city who own a TV. (Answer: 76.08% ≤ p ≤ 83.92%)
61. PRACTICE PROBLEMS – SDP
    - Delhi police intends to introduce a new uniform for its officers' cadre. A survey estimates the proportion of officers who would prefer the change. Results showed that 45 out of 75 favored the change. Estimate the population proportion in favor of the proposal with a 90% confidence level. (Answer: 50.65% ≤ p ≤ 69.35%)
62. PRACTICE PROBLEMS – SDP
    - Dr. Benjamin, a noted social psychologist, surveyed 150 top executives and found that 42% of them were unable to add fractions correctly.
        - Estimate the standard error of the proportion. (Answer: 0.0403)
        - Construct a 99% confidence interval for the true proportion of top executives who cannot correctly add fractions. (Answer: 0.316 ≤ p ≤ 0.524)
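The proportion problems use the same interval construction with the estimated standard error $\sqrt{\bar{p}\,\bar{q}/n}$. The sketch below checks the TV and executives problems; `proportion_confidence_interval` is a hypothetical helper, and z comes from the exact normal quantile rather than a rounded table value.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence):
    """Return (lower limit, upper limit, estimated SE) for a population proportion."""
    p_bar = successes / n
    se_hat = sqrt(p_bar * (1 - p_bar) / n)               # estimated standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # confidence coefficient
    return p_bar - z * se_hat, p_bar + z * se_hat, se_hat


tv = proportion_confidence_interval(320, 400, 0.95)
# 42% of 150 executives corresponds to 63 respondents
executives = proportion_confidence_interval(63, 150, 0.99)

print([round(v, 4) for v in tv])
print([round(v, 4) for v in executives])
```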