2. SAMPLING DISTRIBUTION
There are three distinct types of distribution of data:
1. Population distribution: characterizes the distribution of the elements of a
population.
2. Sample distribution: characterizes the distribution of the elements of a sample
drawn from a population.
3. Sampling distribution: describes the expected behavior of a statistic over a
large number of simple random samples drawn from the same population.
Sampling distributions constitute the theoretical basis of statistical inference
and are of considerable importance in business decision-making. They also
provide a major simplification on the route to statistical inference.
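The difference between the three distributions can be illustrated with a small simulation. This is only a sketch: the exponential population, the sample size of 30, the 2,000 repetitions and all variable names are assumptions chosen for illustration, not part of the original material.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Population distribution: 10,000 values from a skewed (exponential) population.
population = [rng.expovariate(1.0) for _ in range(10_000)]

# Sample distribution: the values inside ONE sample of size 30.
sample = rng.sample(population, 30)

# Sampling distribution (approximated): the means of MANY samples of size 30.
sample_means = [
    statistics.fmean(rng.sample(population, 30)) for _ in range(2_000)
]

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print(f"population mean {pop_mean:.3f}, sd {pop_sd:.3f}")
print(f"one sample mean {statistics.fmean(sample):.3f}")
print(f"sample means: mean {mean_of_means:.3f}, sd {sd_of_means:.3f}")
```

The sample means cluster around the population mean with a much smaller spread than the population itself, which is the behavior the sampling distribution describes.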
3. DEFINITION
A sampling distribution is a theoretical probability distribution of a
statistic obtained through a large number of samples drawn from a
specific population (McTavish: 435).
A sampling distribution can be shown as a graph of a statistic (e.g. the mean,
mean absolute deviation from the mean, range, standard deviation of the
sample, unbiased estimate of variance, or variance of the sample) computed
from sample data.
The sampling distribution of the mean is a theoretical distribution of the
means of an infinite number of samples of equal size taken from a
population (Walsh: 95).
4. CHARACTERISTICS
Usually a univariate distribution.
Often closely approximates a normal distribution when the sample size is
large.
The sample statistic is a random variable, e.g. the sample mean or the
sample proportion.
A theoretical probability distribution.
The form of a sampling distribution refers to the shape of the particular
curve that describes the distribution.
5. FUNCTIONS OF SAMPLING DISTRIBUTION
A sampling distribution is usually presented as a graph showing how a
statistic varies from sample to sample.
A sampling distribution can be constructed for:
Mean
Mean absolute value of the deviation from the mean
Range
Standard deviation of the sample
Unbiased estimate of variance
Variance of the sample
6. WHY SAMPLING DISTRIBUTION IS IMPORTANT
i) Properties of statistics
ii) Selection of distribution type to model scores
iii) Hypothesis testing
7. i) Properties of Statistics: Statistics have different properties as
estimators of population parameters. The sampling distribution of a
statistic provides a window into some of the important properties. For
example, if the expected value of a statistic is equal to the corresponding
population parameter, the statistic is said to be unbiased.
Efficiency is another valuable property to have in the estimation of a
population parameter: everything else being equal, the statistic with the
smallest standard error is preferred as an estimator of the corresponding
population parameter. (An estimator is a statistic used to estimate a model
parameter.)
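Unbiasedness can be checked exactly on a tiny population by listing every possible sample. The sketch below is an illustration under stated assumptions: the five-number population, the sample size of 2, sampling with replacement, and the helper names `mean` and `variance` are all choices made here. It compares the sample mean and two versions of the sample variance (divisor n and divisor n - 1) with the population values.

```python
from fractions import Fraction
from itertools import product

population = [2, 3, 6, 8, 11]  # small illustrative population (assumed)
n = 2                          # sample size (assumed)


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))


def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by `divisor`."""
    m = mean(values)
    return sum((Fraction(v) - m) ** 2 for v in values) / divisor


mu = mean(population)
sigma2 = variance(population, len(population))

# Every equally likely ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

expected_mean = mean([mean(s) for s in samples])
expected_var_n = mean([variance(s, n) for s in samples])
expected_var_n_minus_1 = mean([variance(s, n - 1) for s in samples])

print("E[sample mean]      =", expected_mean, "vs mu      =", mu)
print("E[var, divisor n]   =", expected_var_n, "vs sigma^2 =", sigma2)
print("E[var, divisor n-1] =", expected_var_n_minus_1, "vs sigma^2 =", sigma2)
```

Under these assumptions the sample mean and the divisor n - 1 variance average out to the population values, while the divisor n variance falls short, which is what "biased" means.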
8. ii) Selection of distribution type to model scores:
The sampling distribution provides the theoretical foundation for selecting a
distribution for many useful measures. For example, the central limit
theorem describes why a measure, such as intelligence, that may be
considered a summation of a number of independent quantities would
tend to be distributed approximately as a normal (Gaussian) curve.
iii) Hypothesis Testing:
The sampling distribution is integral to the hypothesis testing procedure. It is
used to create a model of what the world would look like if the null hypothesis
were true and the statistic were collected an infinite number of times. A single
sample is taken, the sample statistic is calculated, and it is then compared to
the model created by the sampling distribution of that statistic under the null
hypothesis. If the sample statistic is unlikely given the model, then the model
is rejected and a model with real effects is considered more likely.
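The procedure can be sketched with a coin-tossing example. Everything specific here is an assumption made for illustration: the null hypothesis of a fair coin, the observed 9 heads in 10 tosses, the 0.05 cut-off, and the helper names. The sampling distribution of the number of heads under the null hypothesis is binomial, so the probability of a result at least as extreme as the observed one can be computed exactly.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(observed, n, p_null=0.5):
    """P(count at least as far from n * p_null as `observed`), under H0."""
    expected = n * p_null
    distance = abs(observed - expected)
    return sum(
        binomial_pmf(k, n, p_null)
        for k in range(n + 1)
        if abs(k - expected) >= distance
    )


p_value = two_sided_p_value(observed=9, n=10)
reject_null = p_value < 0.05  # conventional 5% significance level (assumed)

print(f"p-value = {p_value:.4f}, reject H0: {reject_null}")
```

Here the outcomes 0, 1, 9 and 10 heads are at least as extreme as the observed 9, and together they have probability 22/1024 under the null model, so the fair-coin model would be rejected at the 5% level.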
9. TYPES OF SAMPLING DISTRIBUTION
The types of sampling distribution are as follows:
1) Sampling Distribution of the Mean:
The sampling distribution of the mean is defined as the theoretical
probability distribution of the sample means obtained by extracting all
possible samples of the same size from the given population.
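Consider a population with mean μ and variance σ². When sampling from a
normally distributed population, it can be shown that the distribution of
the sample mean has the following properties:
Its mean equals the population mean μ.
Its variance equals σ²/n for samples of size n (assuming independent draws).
It is itself normally distributed.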
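The mean and variance of the sample mean can be derived in two lines. The sketch assumes that the n observations $X_1, \ldots, X_n$ are drawn independently from a population with mean $\mu$ and variance $\sigma^2$ (which holds for sampling with replacement or from a very large population), and writes $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} X_i$:

```latex
\begin{aligned}
E[\bar{x}] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\,(n\mu) = \mu, \\
\operatorname{Var}(\bar{x}) &= \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{Var}(X_i)
  = \frac{1}{n^{2}}\,(n\sigma^{2}) = \frac{\sigma^{2}}{n}.
\end{aligned}
```

The second line uses independence: the variance of a sum of independent variables is the sum of their variances.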
10. CENTRAL LIMIT THEOREM
The central limit theorem, first introduced by De Moivre during the early
eighteenth century, is often regarded as the most important theorem in
statistics. According to this theorem, if we select a large number of
simple random samples of size n from any population distribution with
mean µ and finite variance σ², and determine the mean of each sample, the
distribution of these sample means will tend, as n becomes large, to be
described by the normal probability distribution with mean µ and
variance σ²/n.
In other words, the sampling distribution of sample means approaches a
normal distribution.
Symbolically, the theorem can be stated as follows:
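11. Given n independent random variables $X_1, X_2, X_3, \ldots, X_n$ which
have the same distribution (no matter what distribution, provided its
variance is finite), then
$$X = X_1 + X_2 + X_3 + \cdots + X_n$$
is approximately a normal variate when n is large. The mean $\mu$ and
variance $\sigma^2$ of X are
$$\mu = \mu_1 + \mu_2 + \mu_3 + \cdots + \mu_n = n\mu_1$$
$$\sigma^2 = \sigma_1^2 + \sigma_2^2 + \sigma_3^2 + \cdots + \sigma_n^2 = n\sigma_1^2$$
where $\mu_1$ and $\sigma_1^2$ are the mean and variance of $X_1$.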
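A quick simulation gives a feel for the theorem. This is a sketch under assumptions chosen here, not a proof: the population is Uniform(0, 1), which is clearly not normal, each sample has size 30, and 5,000 samples are drawn with a fixed seed. If the sample means are roughly normal with mean µ and variance σ²/n, about 68% of them should fall within one standard error of µ and about 95% within 1.96 standard errors.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

n = 30                    # size of each sample (assumed)
mu, sigma2 = 0.5, 1 / 12  # mean and variance of Uniform(0, 1)
se = (sigma2 / n) ** 0.5  # theoretical standard error of the mean

# Means of 5,000 independent samples of size n.
means = [
    statistics.fmean([rng.random() for _ in range(n)]) for _ in range(5_000)
]

observed_mean = statistics.fmean(means)
observed_var = statistics.pvariance(means)
share_within_1_se = sum(abs(m - mu) <= se for m in means) / len(means)
share_within_196_se = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)

print(f"mean of sample means {observed_mean:.4f} (theory {mu})")
print(f"variance of sample means {observed_var:.5f} (theory {sigma2 / n:.5f})")
print(f"within 1 SE: {share_within_1_se:.3f}, within 1.96 SE: {share_within_196_se:.3f}")
```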
12. UTILITY :
The utility of this theorem is that it requires virtually no conditions on
the distribution patterns of the individual random variables being summed.
As a result, it furnishes a practical method of computing approximate
probability values associated with sums of arbitrarily distributed
independent random variables.
This theorem helps to explain why a vast number of phenomena
show approximately a normal distribution. Because of its theoretical
and practical significance, this theorem is considered the most
remarkable theoretical formulation of all probability laws.
Moreover, most of hypothesis testing and sampling theory is based
on this theorem, so the central limit theorem is perhaps the most
fundamental result in all of statistics.
13. 2) SAMPLING DISTRIBUTION OF THE PROPORTION:
The sampling distribution of the proportion describes how the sample
proportion varies across samples of the same size drawn from a population
with a given proportion of successes.
Properties:
Sample proportions tend to target the value of the population proportion.
Under certain conditions (a sufficiently large sample), the distribution of
sample proportions can be approximated by a normal distribution.
14. Example:
Sampling distribution of the proportion of girls, from the sample space for
two randomly selected births: bb, bg, gb, gg.
All four outcomes are equally likely.
Probabilities:
P(0 girls) = 0.25 (proportion of girls = 0)
P(1 girl) = 0.50 (proportion of girls = 0.5)
P(2 girls) = 0.25 (proportion of girls = 1)
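The two-births example is small enough to enumerate directly. The sketch below (variable names are illustrative) lists the four equally likely outcomes, computes the proportion of girls in each, and tallies the resulting sampling distribution. Exact fractions are used so that the three probabilities can be seen to sum to 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two births: bb, bg, gb, gg (all equally likely).
outcomes = list(product("bg", repeat=2))

# Proportion of girls in each outcome, tallied over the sample space.
counts = Counter(Fraction(o.count("g"), len(o)) for o in outcomes)
sampling_distribution = {
    proportion: Fraction(count, len(outcomes))
    for proportion, count in sorted(counts.items())
}

# The mean of the sampling distribution targets the true proportion 1/2.
mean_proportion = sum(p * prob for p, prob in sampling_distribution.items())

for proportion, prob in sampling_distribution.items():
    print(f"proportion of girls = {proportion}: probability {prob}")
print("mean of sample proportions =", mean_proportion)
```

The probabilities come out as 1/4, 1/2 and 1/4 for proportions 0, 1/2 and 1, and their mean is 1/2, the probability of a girl at each birth.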
15. STANDARD ERROR OF THE SAMPLING DISTRIBUTION
The sampling distribution has a standard deviation. The mean of the
sampling distribution of the mean will be the same as the population mean,
but its standard deviation will be smaller than the population standard
deviation. The standard deviation of the sampling distribution has a special
name: 'the standard error', or sometimes 'the standard error of the mean'.
The variation of sample means around the population mean is the
sampling error, and it is measured using a statistic known as the standard
error of the mean. This is an estimate of the amount by which a sample mean
is likely to differ from the population mean. This consideration is important
because sampling theory tells us that about 68% of all sample means will lie
within ± 1 standard error of the population mean, and that 95% of all sample
means will lie within ± 1.96 standard errors of the population mean
(Bryman, 2004: 96).
16. Formula:
The standard error of the sampling distribution of the mean is equal to the
standard deviation of the population divided by the square root of the
sample size. The formula of the standard error is as follows:
$\sigma_{\bar{x}} = \sigma / \sqrt{N}$
Here,
$\sigma_{\bar{x}}$ = standard deviation of the sample mean (the standard error).
$\sigma$ = standard deviation of the population.
$N$ = sample size.
How to reduce error:
When the sample size increases, the sampling error decreases.
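The effect of sample size on the standard error is easy to tabulate. In this sketch the function name `standard_error`, the population standard deviation of 10 and the three sample sizes are assumptions for illustration only.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n), with n the sample size."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / math.sqrt(n)


sigma = 10.0  # hypothetical population standard deviation
errors = {n: standard_error(sigma, n) for n in (25, 100, 400)}

for n, se in errors.items():
    print(f"n = {n:3d}: standard error = {se}")
```

Because of the square root, quadrupling the sample size only halves the standard error.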
17. Purpose :
1. Allows us to quantify the extent to which a 'test' provides accurate scores.
2. If the standard error is smaller, the interval estimate for the population
mean will be narrower.
3. If the standard error is larger, the interval estimate for the population
mean will be wider.
Application:
95 % CI = Mean ± ( 1.96 × SEM )
99 % CI = Mean ± ( 2.58 × SEM )
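The two interval formulas can be applied as follows. The sketch assumes the population standard deviation is known, and the sample mean of 50, standard deviation of 10 and sample size of 100 are made-up numbers used only to show the arithmetic. The function name `confidence_interval` is likewise illustrative.

```python
import math


def confidence_interval(sample_mean, sigma, n, z=1.96):
    """Return (lower, upper) = sample_mean -/+ z * SEM, with SEM = sigma / sqrt(n)."""
    sem = sigma / math.sqrt(n)
    margin = z * sem
    return sample_mean - margin, sample_mean + margin


# Hypothetical data: mean 50, population SD 10, sample size 100, so SEM = 1.
low95, high95 = confidence_interval(50.0, 10.0, 100)          # z = 1.96
low99, high99 = confidence_interval(50.0, 10.0, 100, z=2.58)  # z = 2.58

print(f"95% CI: ({low95:.2f}, {high95:.2f})")
print(f"99% CI: ({low99:.2f}, {high99:.2f})")
```

With a standard error of 1, the 95% interval is 50 ± 1.96 and the 99% interval is 50 ± 2.58. The higher confidence level gives the wider interval.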
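18. STANDARD ERROR TABLE
SAMPLING DISTRIBUTION | STANDARD ERROR
Means | $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{N}}$
Proportions | $\sigma_p = \sqrt{\dfrac{p(1-p)}{N}} = \sqrt{\dfrac{pq}{N}}$
Standard deviations | (1) $\sigma_s = \dfrac{\sigma}{\sqrt{2N}}$; (2) $\sigma_s = \sqrt{\dfrac{\mu_4 - \mu_2^2}{4N\mu_2}}$
Variances | (1) $\sigma_{s^2} = \sigma^2\sqrt{\dfrac{2}{N}}$; (2) $\sigma_{s^2} = \sqrt{\dfrac{\mu_4 - \mu_2^2}{N}}$
Medians | $\sigma_{med} = \sigma\sqrt{\dfrac{\pi}{2N}} = \dfrac{1.2533\,\sigma}{\sqrt{N}}$
First & third quartiles | $\sigma_{Q_1} = \sigma_{Q_3} = \dfrac{1.3626\,\sigma}{\sqrt{N}}$
Semi-interquartile ranges | $\sigma_Q = \dfrac{0.7867\,\sigma}{\sqrt{N}}$
Coefficients of variation | $\sigma_v = \dfrac{v}{\sqrt{2N}}\sqrt{1 + 2v^2}$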
19. Point & Interval Estimates
There are two kinds of estimates of population parameters from sample
statistics: point estimates and interval estimates.
A point estimate is a single value and an interval estimate is a range of
values.
20. POINT ESTIMATION :
A point estimate of a population parameter is a single value of a statistic.
For example,
the sample mean x̄ is a point estimate of the population mean μ. Similarly,
the sample proportion p is a point estimate of the population proportion P.
Interval Estimation :
An interval estimate is defined by two numbers, between which a population
parameter is said to lie.
21. For example,
a < μ < b is an interval estimate of the population mean
μ. It indicates that the population mean is greater than a
but less than b.
In any estimation problem, we need to obtain both a
point estimate and an interval estimate. The point
estimate is our best guess of the true value of the
parameter, while the interval estimate gives a measure
of accuracy of that point estimate by providing an
interval that contains plausible values.
22. MATHEMATICAL PROBLEMS
Sampling Distribution of means
Prob. 1 :
A population consists of the five numbers 2, 3, 6, 8 and 11. Consider
all possible samples of size 2 that can be drawn with and without replacement
from this population. Find:
a) The mean of the population.
b) The standard deviation of the population.
c) The mean of the sampling distribution of means.
d) The standard deviation of the sampling distribution of means (the standard
error of means).
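23. Answer:
a) Mean of the population:
$$\mu = \frac{2+3+6+8+11}{5} = \frac{30}{5} = 6$$
b) Variance of the population (here N = 5 is the number of elements in the
population):
$$\sigma^2 = \frac{\sum (x-\mu)^2}{N} = \frac{(2-6)^2 + (3-6)^2 + (6-6)^2 + (8-6)^2 + (11-6)^2}{5} = \frac{16+9+0+4+25}{5} = \frac{54}{5} = 10.8$$
so the standard deviation of the population is $\sigma = \sqrt{10.8} = 3.29$.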
With replacement:
c) There are 5(5) = 25 samples of size 2 that can be drawn with replacement. These are:
(2,2) (2,3) (2,6) (2,8) (2,11)
(3,2) (3,3) (3,6) (3,8) (3,11)
(6,2) (6,3) (6,6) (6,8) (6,11)
(8,2) (8,3) (8,6) (8,8) (8,11)
(11,2) (11,3) (11,6) (11,8) (11,11)
24. The corresponding sample means are:
2.0 2.5 4.0 5.0 6.5
2.5 3.0 4.5 5.5 7.0
4.0 4.5 6.0 7.0 8.5
5.0 5.5 7.0 8.0 9.5
6.5 7.0 8.5 9.5 11.0
and the mean of the sampling distribution of means is
$$\mu_{\bar{x}} = \frac{\text{sum of all sample means}}{25} = \frac{150}{25} = 6.0$$
illustrating the fact that $\mu_{\bar{x}} = \mu$.
25. d) The variance of the sampling distribution of means is
$$\sigma_{\bar{x}}^2 = \frac{(2-6)^2 + (2.5-6)^2 + \cdots + (11-6)^2}{25} = \frac{135}{25} = 5.40$$
(subtracting the mean 6 from each of the 25 sample means, squaring the
results, adding them, and dividing by 25), so the standard deviation is
$$\sigma_{\bar{x}} = \sqrt{5.40} = 2.32$$
This illustrates the fact that, for finite populations involving sampling with
replacement, $\sigma_{\bar{x}}^2 = \sigma^2 / N$, where N = 2 is the sample size:
the right-hand side is 10.8/2 = 5.40, agreeing with the above value.
Without Replacement:
c) There are 10 samples of size 2 that can be drawn without replacement from the population :
(2,3) (2,6) (2,8) (2,11) (3,6) (3,8) (3,11) (6,8) (6,11) (8,11)
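26. The corresponding sample means are:
2.5, 4.0, 5.0, 6.5, 4.5, 5.5, 7.0, 7.0, 8.5, 9.5.
The mean of the sampling distribution of means is
$$\mu_{\bar{x}} = \frac{2.5 + 4.0 + \cdots + 9.5}{10} = 6.0$$
so $\mu_{\bar{x}} = \mu$.
d) The variance of the sampling distribution of means is
$$\sigma_{\bar{x}}^2 = \frac{(2.5-6)^2 + (4.0-6)^2 + \cdots + (9.5-6)^2}{10} = 4.05$$
and $\sigma_{\bar{x}} = 2.01$.
This illustrates
$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{N}\left(\frac{N_p - N}{N_p - 1}\right) = \frac{10.8}{2}\left(\frac{5-2}{5-1}\right) = 4.05$$
as obtained above, where $N_p = 5$ is the population size and N = 2 is the sample size.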
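Both parts of Problem 1 can be checked by brute-force enumeration. The sketch below uses exact fractions, with helper names `mean` and `pvariance` chosen here for illustration. It lists all 25 ordered samples with replacement and all 10 unordered samples without replacement, then compares the variance of the sample means with σ²/N and with the finite-population formula.

```python
from fractions import Fraction
from itertools import combinations, product

population = [2, 3, 6, 8, 11]
n = 2                 # sample size
Np = len(population)  # population size


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    values = list(values)
    return Fraction(sum(values), len(values))


def pvariance(values):
    """Population-style variance: mean squared deviation from the mean."""
    values = list(values)
    m = mean(values)
    return mean((Fraction(v) - m) ** 2 for v in values)


mu, sigma2 = mean(population), pvariance(population)

means_with = [mean(s) for s in product(population, repeat=n)]     # 25 samples
means_without = [mean(s) for s in combinations(population, n)]    # 10 samples

fpc = Fraction(Np - n, Np - 1)  # finite population correction factor

print("with replacement:   ", mean(means_with), float(pvariance(means_with)))
print("sigma^2 / N:        ", float(sigma2 / n))
print("without replacement:", mean(means_without), float(pvariance(means_without)))
print("sigma^2 / N * fpc:  ", float(sigma2 / n * fpc))
```

The enumeration reproduces 5.40 for sampling with replacement and 4.05 for sampling without replacement, with a mean of 6 in both cases.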
27. SAMPLING DISTRIBUTION OF PROPORTIONS
Prob. 2 :
Find the probability that in 120 tosses of a fair coin ,
a)Between 40 % and 60 % will be heads and
b)5/8 or more will be heads .
Answer:
We consider the 120 tosses of the coin to be a sample from
the infinite population of all possible tosses of the coin. In this
population the probability of heads is p = 1/2 and the probability
of tails is q = 1 − p = 1/2.
28. a) $\mu_p = p = \frac{1}{2} = 0.50$
$$\sigma_p = \sqrt{\frac{pq}{N}} = \sqrt{\frac{\frac{1}{2}\left(\frac{1}{2}\right)}{120}} = 0.0456$$
40% in standard units = $\frac{0.40 - 0.50}{0.0456} = -2.19$
60% in standard units = $\frac{0.60 - 0.50}{0.0456} = 2.19$
Required probability = (area under normal curve between z = −2.19 and z = 2.19)
= 2(0.4857)
= 0.9714
Although this result is accurate to two significant figures, it is not exact, since we have
not used the fact that the proportion is actually a discrete variable. To account for this, we subtract
1/(2N) = 1/(2 × 120) from 0.40 and add 1/(2N) = 1/(2 × 120) to 0.60; thus, since 1/240 = 0.00417, the
required proportions in standard units are
$$\frac{0.40 - 0.00417 - 0.50}{0.0456} = -2.28 \quad \text{and} \quad \frac{0.60 + 0.00417 - 0.50}{0.0456} = 2.28$$
so the required probability becomes 2(0.4887) = 0.9774.
29. b) Using the continuity correction from (a), since 5/8 = 0.6250,
(0.6250 − 0.00417) in standard units = $\frac{0.6250 - 0.00417 - 0.50}{0.0456} = 2.65$
Required probability = (area under normal curve to right of z = 2.65)
= (area to right of z = 0) − (area between z = 0 and z = 2.65)
= 0.5 − 0.4960
= 0.0040.
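The normal-curve answers to Problem 2 can be cross-checked against the exact binomial probabilities. This is a sketch using only the Python standard library. The helper names are illustrative, and the unrounded z values it uses mean the results match the table-based answers closely rather than digit for digit.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def binomial_prob(lo, hi, n, p):
    """Exact P(lo <= X <= hi) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))


n, p = 120, 0.5
mu_p = p
sigma_p = sqrt(p * (1 - p) / n)
cc = 1 / (2 * n)  # continuity correction for a proportion: 1/(2N)

# (a) between 40% and 60% heads, i.e. 48 to 72 heads inclusive
z_a = (0.60 + cc - mu_p) / sigma_p
approx_a = normal_cdf(z_a) - normal_cdf(-z_a)
exact_a = binomial_prob(48, 72, n, p)

# (b) 5/8 or more heads, i.e. 75 or more heads
z_b = (0.625 - cc - mu_p) / sigma_p
approx_b = 1.0 - normal_cdf(z_b)
exact_b = binomial_prob(75, n, n, p)

print(f"(a) normal approx {approx_a:.4f}, exact binomial {exact_a:.4f}")
print(f"(b) normal approx {approx_b:.4f}, exact binomial {exact_b:.4f}")
```

With the continuity correction, the normal approximation should track the exact binomial values closely for both parts.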
30. REFERENCES:
1. Statistics for the Social Sciences: With Computer Applications, Anthony Walsh
2. Schaum's Outline of Theory and Problems of Statistics, Murray R. Spiegel
3. Business Statistics, S. P. Gupta & M. P. Gupta
4. Descriptive and Inferential Statistics: An Introduction, Herman J. Loether & Donald G. McTavish