2. First things first
• No mathematical background is required of participants. I will assume that you wish to use a quantitative, qualitative or mixed-methods research design.
• The basics are covered with the required rigour, and self-learning material is provided according to the norms of the second quadrant of four-quadrant learning.
28-02-2020 Sampling 2
3. • Open-source software such as R will be very helpful to you. Before using such software, please make sure you understand the theory.
• Common mistakes in statistics will also be discussed. Please use the Discussion Forum to post your questions; every question will either be answered directly or be directed to a reference or a person who can answer it.
• This is the first module; other lectures will follow.
Activities by you
5. Introduction of topics and broad questions
answered in the lecture
In the first part, we will describe the methods used for selecting samples.
There are practice sessions on selecting samples using the Randomizer software.
6. In the second part, we will discuss how information obtained from a sample is likely to vary from the corresponding information for the population, and what the relationship is between a population parameter and a sample statistic.
This will form the foundation for statistical inference, which we will discuss in the next lecture.
Introduction of topics and broad questions
answered in the lecture
Remember: Known Population to Sample!
7. If you are planning to study the television viewers of New Delhi, whose numbers may be in the millions, and you wish to pick 1000 viewers to make a decision about all the viewers in New Delhi, you are using two concepts: population and sample.
The basic unit of a population is called an element of the population; each television viewer is an element of the viewer population. A population is a theoretically specified aggregation of elements. Collecting information from all the elements is known as a census, or complete enumeration.
Definitions
8. A sampling frame is the list, index, or set of records from which the sample will be drawn; it might not be totally inclusive of the study population. It should include all the active elements of the population.
Sampling is a procedure to select elements from a population.
A sample is a subset of the population elements that results
from a sampling strategy.
Definitions
9. Random sampling is not haphazard selection or the absence of a procedure. It is a selection procedure that guarantees a known, nonzero chance of selection for each and every element of the population.
A random sample, or probability sample, is a sample selected in such a way that every element of the population has a known chance (nonzero, though not necessarily equal for all elements) of being included in the sample.
Statistical sampling theory provides methods for solving applied problems, and that theory is based on random samples.
Random Samples-Meaning
10. Random sampling eliminates systematic selection biases by giving every element in the population a known chance of being chosen. We will discuss biases further in a separate lecture.
It is better to understand it as a "randomly chosen sample" than as a "random sample", because the dictionary defines random as "Having no specific pattern or objective; haphazard" (The American Heritage Dictionary, Second College Edition, Houghton Mifflin, 1985).
Why is a random sample so important?
(https://web.ma.utexas.edu/users/mks/statmistakes/randomsample.html)
Common mistakes in statistics
11. If you want to study the attitude of Delhi University students towards climate change, and you visit the university on an arbitrary day and meet 100 students haphazardly, you do not have a random sample: you have excluded everyone who was absent or away on vacation that day. The right way is to take the list of all students across the different years as the frame, number them, and use random numbers to select the calculated sample size.
This step is important because it helps us quantify sampling errors and minimize them. Without a random sample, we cannot be confident about our results.
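The frame-and-random-numbers procedure above can be sketched in Python. This is a minimal illustration, assuming a hypothetical frame of 5000 made-up student IDs and a hypothetical sample size of 100; in practice the frame would be the university's actual list of students and the size would come from a sample-size calculation. The helper name `draw_simple_random_sample` is illustrative only.

```python
import random

def draw_simple_random_sample(frame, sample_size, seed=None):
    """Number the frame 1..N, then pick sample_size distinct numbers at random."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    numbered = {i: element for i, element in enumerate(frame, start=1)}
    chosen_numbers = sorted(rng.sample(range(1, len(frame) + 1), sample_size))
    return [numbered[i] for i in chosen_numbers]

# Hypothetical frame: every enrolled student across all years, not just those on campus today.
frame = [f"DU-STUDENT-{k:05d}" for k in range(1, 5001)]
sample = draw_simple_random_sample(frame, sample_size=100, seed=2020)
print(len(sample), sample[:3])
```

Because the selection is driven by random numbers over the whole frame, students who happen to be absent on a given day still have the same chance of selection as those present.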
12. A simple random sample is a sample selected in such a way that it satisfies two conditions:
• Every element in the population has the same chance of being selected, and
• Every possible sample of size n has the same chance of being chosen.
Simple random samples (SRS)
Example: Binod, Charan, Dipesh and Ekanth are scholars. All four want to go on vacation, but only two can be away. You may write B, C, D and E on separate slips of paper, put them in a container, and ask another scholar to draw two slips. Thus each scholar has an equal chance of being selected.
13. The possible samples of size two are:
BC, BD, BE, CD, CE, DE. Observe that B is chosen in three of the six samples, so P(B) = 3/6 = 1/2. Similarly, P(C) = P(D) = P(E) = 1/2.
Each of the six possible samples has the same chance, 1/6, of being selected. Thus both conditions are satisfied.
Simple random samples (SRS)
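The enumeration above can be checked mechanically. This short Python sketch lists every sample of size two with `itertools.combinations` and recomputes both probabilities exactly using fractions; it only verifies the arithmetic of the example and does not draw a sample.

```python
from fractions import Fraction
from itertools import combinations

scholars = ["B", "C", "D", "E"]
samples = ["".join(pair) for pair in combinations(scholars, 2)]
print(samples)  # ['BC', 'BD', 'BE', 'CD', 'CE', 'DE']

# Condition 2: each possible sample of size two is equally likely.
p_sample = Fraction(1, len(samples))
print(p_sample)  # 1/6

# Condition 1: a scholar's chance = (samples containing them) / (all samples).
for s in scholars:
    p = Fraction(sum(s in smp for smp in samples), len(samples))
    print(s, p)  # each scholar gets 1/2
```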
15. In a medical study, suppose you want to study the population of all hospitals that perform heart surgery.
Find all hospitals that perform heart surgery and check that each has performed at least a given number of heart surgeries in the past two months.
Number them 1 to k; this list is known as the sampling frame, as it includes the active elements of the population.
Determine the sample size scientifically, as discussed in the lectures.
You may use the Randomizer software to obtain random samples.
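Building the frame of active elements and numbering it 1 to k can be sketched as follows. Everything here is hypothetical: the hospital names, their surgery counts, the threshold `MIN_SURGERIES = 10` (the lecture leaves the number unspecified) and the sample size of 2 are made up purely for illustration.

```python
import random

# Hypothetical records: (hospital name, heart surgeries performed in the past two months).
hospitals = [
    ("Hospital A", 42), ("Hospital B", 3), ("Hospital C", 18),
    ("Hospital D", 0), ("Hospital E", 27), ("Hospital F", 11),
]
MIN_SURGERIES = 10  # hypothetical threshold for an "active" element

# Keep only active elements, then number them 1..k to form the sampling frame.
active = [name for name, count in hospitals if count >= MIN_SURGERIES]
frame = {number: name for number, name in enumerate(active, start=1)}
k = len(frame)

rng = random.Random(7)
chosen = sorted(rng.sample(range(1, k + 1), 2))  # sample size 2 is only for illustration
print(frame)
print([frame[n] for n in chosen])
```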
16. Software help: You may use Research Randomizer (https://www.randomizer.org/) to generate the needed random samples from the numbered frame you have created. You have to plug in the following requirements, decided by you:
• How many sets of numbers do you want to generate?
• How many numbers per set?
• Number range (e.g., 1-50)
• Do you wish to sort the numbers that are generated?
• How do you wish to view your random numbers?
17. I have generated 2 sets of 30 unique numbers each, assuming that you want to use one set as the experimental group and the other as the control group.
The numbers in each set are sorted from least to greatest.
Please do try different values.
Results
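If you prefer to script the same request, the sketch below imitates the form's inputs (number of sets, numbers per set, range, sorting). It is not Research Randomizer's own code; the function `generate_sets` is a hypothetical helper, and the range 1-200 is an assumption because the range used in the screenshot is not recoverable.

```python
import random

def generate_sets(n_sets, numbers_per_set, low, high, sort=True, seed=None):
    """Imitate the form: several sets of unique numbers drawn from low..high."""
    rng = random.Random(seed)
    sets = []
    for _ in range(n_sets):
        # Numbers are unique within a set; separate sets are drawn independently.
        draw = rng.sample(range(low, high + 1), numbers_per_set)
        sets.append(sorted(draw) if sort else draw)
    return sets

# 2 sets of 30 unique numbers, sorted; the range 1-200 is an assumed frame size.
experiment, control = generate_sets(2, 30, 1, 200, seed=1)
print(experiment)
print(control)
overlap = set(experiment) & set(control)
print("Frame numbers appearing in both sets:", sorted(overlap))
```

Because the two sets are drawn independently, the same frame number can appear in both. For experimental and control groups you may instead want to draw 60 unique numbers once and split them into two groups of 30.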
18. Systematic Random Samples
A systematic random sample is a sample that contains every ith element of a population. Let us consider the 20-element population given here:
A B C D E F G H I J K L M N O P Q R S T
where every fifth (i = 5) element is to be in the sample. We need to start with one of the first five elements, A, B, C, D, E. Let us number them 1, 2, 3, 4, 5 respectively. We obtain a random starting point from a table of random digits.
19. Systematic Random Samples
I took a snapshot of a random number table available on the internet. We need to decide beforehand, without looking at the table, that we will take the first usable digit (from 1 to 5) in the number located in the fourth row and eighth column. After looking at the table, that number is 47200.
20. Selection
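Of the digits 47200, the first digit, 4, is usable, because it lies within 1, 2, 3, 4, 5. Thus the 4th element, D, is the starting point, and every fifth element after it is included: the sample is D, I, N, S.
In this procedure every element has a known, nonzero chance (1/5) of being selected, because the starting point of the selection is random. Therefore we have a random sample.
The samples obtainable by simple random sampling are not the same as those obtainable by systematic random sampling; they are different methods. With i = 5 there are only five possible systematic samples, whereas simple random sampling could select any of the $\binom{20}{4} = 4845$ possible samples of size four.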
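The systematic selection can be sketched in Python by slicing the ordered population from the random start with a step of i. This is a minimal illustration; the helper name `systematic_sample` is hypothetical, and the start value 4 is the digit read from the random number table above.

```python
import string
from math import comb

population = list(string.ascii_uppercase[:20])  # A ... T
interval = 5  # every fifth element (i = 5)

def systematic_sample(population, interval, start):
    """Take the start-th element (1-based), then every interval-th element after it."""
    return population[start - 1::interval]

print(systematic_sample(population, interval, start=4))  # ['D', 'I', 'N', 'S']

# Only `interval` different systematic samples exist, one per starting point,
# and every element belongs to exactly one of them, so each has chance 1/interval.
all_systematic = [systematic_sample(population, interval, s) for s in range(1, interval + 1)]
print(len(all_systematic))  # 5
# Simple random sampling of the same size could produce many more samples.
print(comb(len(population), len(all_systematic[0])))  # 4845
```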
22. Dividing a population into non-overlapping groups is called stratification.
If the population is stratified and a random sample is selected from each stratum, the process is called stratified random sampling.
This process ensures that samples of adequate size are obtained from each stratum. Suppose we want to study the education levels of persons aged 18-28 years. If we sample from India's total population, some of the smaller states may go unrepresented. Therefore, it is better to create strata, such as the states of India, and then use random numbers to draw the required sample from each stratum.
Stratified random samples
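A minimal sketch of stratified random sampling: draw a separate simple random sample inside every stratum. The strata names, their sizes and the respondent IDs are hypothetical, and taking the same number (10) from every stratum is an assumption; the lecture does not specify how the total sample is allocated across strata.

```python
import random

def stratified_random_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum elements from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

# Hypothetical strata: respondents aged 18-28, grouped by state (IDs are made up).
strata = {
    "Large state": [f"L{i}" for i in range(1, 1001)],
    "Medium state": [f"M{i}" for i in range(1, 301)],
    "Small state": [f"S{i}" for i in range(1, 41)],
}
sample = stratified_random_sample(strata, n_per_stratum=10, seed=3)
for state, chosen in sample.items():
    print(state, chosen)
```

However the draw turns out, the small state is guaranteed its 10 respondents, which a single simple random sample from the combined population would not ensure.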
23. Stratified random sampling in practice
If the differences between strata are large and the differences within strata are small, it is better to use stratified random sampling. Consider the following data, which are extreme (small and large values combined):
2, 4, 6, 6, 2, 100, 102, 100, 104, 104
The population total is 530. We plan to estimate the population total from a random sample of two units, multiplying the sample sum by 5.
A simple random sample may yield poor results. For example, the sample 2, 4 gives 5(2 + 4) = 30. Now we stratify into two groups:
2, 4, 6, 6, 2 and 100, 102, 100, 104, 104
A single random draw from each stratum may give, for example, the pair 2, 100, and 5(2 + 100) = 510, which is a much better estimate.
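To see the effect over all possible samples rather than one lucky draw, the sketch below enumerates every simple random sample of two and every one-from-each-stratum pair, applying the lecture's estimator (multiply by 5) in both cases. It only restates the example's numbers; no new data are introduced.

```python
from itertools import combinations, product
from statistics import mean

values = [2, 4, 6, 6, 2, 100, 102, 100, 104, 104]
low_stratum, high_stratum = values[:5], values[5:]
print(sum(values))  # 530, the population total

# Simple random samples of two: every pair of positions, total estimated as 5 * (sum of pair).
srs_estimates = [5 * (a + b) for a, b in combinations(values, 2)]

# Stratified: one element from each stratum, each stratum total estimated as 5 * value.
strat_estimates = [5 * a + 5 * b for a, b in product(low_stratum, high_stratum)]

for label, est in (("SRS", srs_estimates), ("Stratified", strat_estimates)):
    print(label, len(est), min(est), max(est), mean(est))  # count, min, max, average
```

Both sets of estimates average exactly 530 over all possible samples, but the 45 simple random sample estimates range from 20 to 1040, while the 25 stratified estimates stay between 510 and 550.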
24. Non-random samples are judgement samples. The term judgement implies that the researcher's judgement, rather than chance, determines which elements are selected, so the elements of the sample have an unknown probability of being chosen.
Convenience samples are selected simply because they are convenient for the researcher. A frequently used technique in market research is quota-controlled judgement sampling, in which the sample is built to match population percentages. For example, if the population is 20% high income, 30% middle income and 50% lower income, an organization may use judgement to choose sample units that fill quotas in those proportions.
Non random samples
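The quota arithmetic can be sketched as below, assuming a hypothetical total sample size of 200. Only the quota sizes are computed; which units fill each quota is still left to the researcher's judgement, which is exactly why this remains a non-random sample. The helper `quota_allocation` is illustrative, and with other shares or sizes the rounded quotas may need a manual adjustment to sum to the total.

```python
def quota_allocation(sample_size, proportions):
    """Split a sample size across groups in proportion to population shares."""
    return {group: round(sample_size * share) for group, share in proportions.items()}

income_shares = {"high": 0.20, "middle": 0.30, "lower": 0.50}
print(quota_allocation(200, income_shares))  # {'high': 40, 'middle': 60, 'lower': 100}
```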
25. Availability sampling
Availability sampling is a technique in which elements are selected because they are accessible to the researcher. This may introduce bias into the sample; for example, volunteers used for research may not be representative of the population.
Purposive sampling is based on the researcher's knowledge of the population elements in terms of the research goals. It is used when there is good knowledge of who will be able to provide information on the domain.
Snowball sampling is used to identify participants when population units are difficult to locate. To study drug addicts, for example, it may be difficult to locate the population. In such cases the researcher may use snowball sampling: locate one or two addicts, gain their confidence, and through their referrals increase the number of respondents.
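Snowball recruitment can be pictured as a wave-by-wave walk through a referral network. The sketch below is a hypothetical model, not a field protocol: the respondent labels, the referral map and the target size of 6 are all made up, and it assumes each recruited respondent names the people they are willing to refer.

```python
from collections import deque

def snowball_sample(referrals, seeds, target_size):
    """Grow a sample wave by wave: each recruited respondent refers people they know."""
    recruited, queue = [], deque(seeds)
    seen = set(seeds)
    while queue and len(recruited) < target_size:
        person = queue.popleft()
        recruited.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # do not recruit the same person twice
                seen.add(contact)
                queue.append(contact)
    return recruited

# Hypothetical referral network: who each respondent is willing to refer.
referrals = {
    "R1": ["R3", "R4"],
    "R2": ["R4", "R5"],
    "R3": ["R6"],
    "R5": ["R7", "R8"],
}
print(snowball_sample(referrals, seeds=["R1", "R2"], target_size=6))
```

Who ends up in the sample depends entirely on the starting respondents and their contacts, so the inclusion probabilities are unknown, consistent with snowball sampling being a non-random method.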
26. Here I will discuss some key concepts you should understand while performing your research.
The first concept is the estimation of a population parameter by random sampling. We will build an intuitive understanding of it by taking a small population, drawing random samples from it, and examining the results.
Concepts in sampling
27. Parameters and Statistics: Relation between
population mean and sample mean
Suppose the population of Delhi University students enrolled in a commerce course in a given year is 20,000. We wish to estimate their marks in Financial Accounting. We choose a random sample of 150 students and find that their average mark is 65. On reflection you may ask, "How is it possible to collect data from 150 students and make an estimate for 20,000?" A lay person may be suspicious of such results, but not you! Let us work through some definitions and problems.
A parameter is a characteristic of a population.
28. Parameters and Statistics: Relation between
population mean and sample mean
A statistic is a characteristic of a sample.
The mean of the population is a population parameter.
If a sample is drawn from the population, the mean of the sample is a statistic. The sample mean is denoted by $\bar{x}$ (x-bar), and $\mu$ is used to denote the population mean.
30. Let us assume that there are five numbers in a population:
0, 6, 12, 6, 36
The mean of the population is $\mu = (0 + 6 + 12 + 6 + 36)/5 = 60/5 = 12$.
The population parameter is a single value, 12.
Suppose we draw a sample of three units and they are 12, 6, 36; then their mean is $(12 + 6 + 36)/3 = 54/3 = 18$. Ten different samples of three units can be drawn from the five population units.
Our question is: what will be the mean of the means of all the samples?
31. Generating sample means from the Population-Ten
samples are generated and their mean is calculated
Sample values Sample Mean
0,6,12 6
0,6,6 4
0,6,36 14
0,12,6 6
0,12,36 16
0,6,36 14
6,12,6 8
6,12,36 18
6,6,36 16
12,6,36 18
Mean of the means of all samples = 120/10 = 12
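The table can be regenerated in Python. This sketch uses `itertools.combinations`, which works on positions, so the two 6s are treated as different population units, exactly as in the table; the ten samples come out in the same order as the rows above.

```python
from itertools import combinations

population = [0, 6, 12, 6, 36]
mu = sum(population) / len(population)  # population mean, 12.0

# combinations() works on positions, so the two 6s count as different units.
samples = list(combinations(population, 3))
sample_means = [sum(s) / len(s) for s in samples]

for s, m in zip(samples, sample_means):
    print(s, m)
print(len(samples), sum(sample_means) / len(sample_means), mu)  # 10 12.0 12.0
```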
32. The mean of all the sample means (ten in this case) is 12; that is, it is equal to the mean of the population.
I have given the frequency table of the sample mean. Several sample means (6, 14, 16 and 18) occur twice, so their frequency is 2.
We refer to this as the sampling distribution of the mean, that is, the probability distribution of the means of all the samples of a given size that can be drawn from the population.
Sample mean $\bar{x}$ Frequency (f) Probability of $\bar{x}$
4 1 0.1
6 2 0.2
8 1 0.1
14 2 0.2
16 2 0.2
18 2 0.2
Total 10 1
What do we observe?
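The frequency table of the sampling distribution can be rebuilt with a `Counter` over the ten sample means. This sketch uses exact fractions for the means so that equal means are grouped reliably; the probability column is each frequency divided by the number of samples.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [0, 6, 12, 6, 36]
sample_means = [Fraction(sum(s), 3) for s in combinations(population, 3)]

frequency = Counter(sample_means)
total = len(sample_means)
for xbar in sorted(frequency):
    f = frequency[xbar]
    print(xbar, f, f / total)  # sample mean, frequency, probability
```

The printed rows reproduce the table: 4, 6, 8, 14, 16 and 18 with frequencies 1, 2, 1, 2, 2, 2 and probabilities 0.1, 0.2, 0.1, 0.2, 0.2, 0.2.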
33. We will use the following symbols
N = population size
$\mu_x$ = population mean
$\sigma_x$ = population standard deviation
n = sample size
$\mu_{\bar{x}}$ = mean of the distribution of $\bar{x}$
$\sigma_{\bar{x}}$ = standard deviation of the distribution of $\bar{x}$
Relationship between the population and the sampling distribution of the mean
34. The standard deviation of the sample means, $\sigma_{\bar{x}}$, is called the standard error of the mean, abbreviated by some authors as SEM. However, its relationship to $\sigma_x$ is not as intuitive as the relationship between the sample means and the population mean. When the population is very large relative to the sample, the formula relating $\sigma_{\bar{x}}$ to $\sigma_x$ is as follows:
$\sigma_{\bar{x}} = \sigma_x / \sqrt{n}$
We will take the same population as in the earlier example (0, 6, 12, 6, 36) and compute $\sigma_{\bar{x}}$.
Population and sampling distribution
standard deviation
35. $\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
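The relation between the standard error of the mean and the population standard deviation, when sampling without replacement from a population of size N, is given by the formula here.
We will ignore the small-sample multiplier $\sqrt{(N - n)/(N - 1)}$ when the sample is a small fraction of the population (a common rule of thumb is n/N below about 0.05), as it is then close to 1 and makes little difference.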
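The computation promised for the population 0, 6, 12, 6, 36 with samples of three can be sketched in Python. It finds the standard error in two ways: directly, as the standard deviation of all ten equally likely sample means, and by the formula, with and without the multiplier. `statistics.pstdev` is used because both the population and the full set of sample means are complete populations, not samples.

```python
from itertools import combinations
from math import sqrt
from statistics import pstdev

population = [0, 6, 12, 6, 36]
N, n = len(population), 3

sigma_x = pstdev(population)  # population standard deviation, sqrt(158.4) ≈ 12.59

# Direct route: standard deviation of all ten sample means (each sample equally likely).
sample_means = [sum(s) / n for s in combinations(population, n)]
se_direct = pstdev(sample_means)

# Formula route, with and without the multiplier sqrt((N - n) / (N - 1)).
se_simple = sigma_x / sqrt(n)
se_corrected = se_simple * sqrt((N - n) / (N - 1))

print(round(se_direct, 4), round(se_corrected, 4), round(se_simple, 4))
```

The direct route and the corrected formula agree (about 5.14, a variance of 26.4), while $\sigma_x/\sqrt{n}$ alone gives about 7.27. With n = 3 out of N = 5 the sample is a large fraction of the population, so the multiplier clearly matters here.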