5. Steps in Sampling Design
What is the target population?
What are the parameters of interest?
What is the sampling frame?
What is the appropriate sampling method?
What size sample is needed?
Error?
- Random sampling error (chance fluctuations)
- Non-sampling error (design errors)
6. Target Population (step 1)
• Who has the information/data you need?
• How do you define your target population?
- Geography
- Demographics
- Use
- Awareness
7. Sampling Frame (step 2)
• List of elements
• Sampling Frame error
– Error that occurs when certain sample elements are not listed
or available and are not represented in the sampling frame
8. Probability or Nonprobability (step 3)
• Probability Sample:
– A sampling technique in which every member of the
population will have a known, nonzero probability of being
selected
• Non-Probability Sample:
– Units of the sample are chosen on the basis of personal
judgment or convenience
– There are NO statistical techniques for measuring random
sampling error in a non-probability sample. Therefore,
generalizability is never statistically appropriate.
9. Classification of Sampling Methods
• Probability samples
– Simple random
– Systematic
– Stratified
– Cluster
• Non-probability samples
– Convenience
– Judgment
– Quota
– Snowball
11. Simple Random Sampling
– The purest form of probability sampling.
– Assures each element in the population has an equal
chance of being included in the sample
– Random number generators
Probability of Selection = Sample Size / Population Size
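A minimal sketch of simple random sampling, assuming the sampling frame is available as a Python list. The frame of 1,000 customer IDs and the function names are hypothetical and serve only as an illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements so that every element has the same chance."""
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    return rng.sample(frame, n)


def selection_probability(sample_size, population_size):
    """Probability of selection = sample size / population size."""
    return sample_size / population_size


# Hypothetical sampling frame: customer IDs 1..1000.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 50, seed=1)
prob = selection_probability(len(sample), len(frame))
```

With 50 draws from a frame of 1,000, each element has a 5% chance of selection.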
12. Simple Random
Advantages
• Easy to implement with
random dialing
Disadvantages
• Requires list of population
elements
• Time consuming
• Uses larger sample sizes
• Produces larger errors
• High cost
13. Systematic
Advantages
• Simple to design
• Easier than simple random
• Easy to determine
sampling distribution of
mean or proportion
Disadvantages
• Periodicity within population
may skew sample and
results
• Trends in list may bias results
• Moderate cost
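A short sketch of systematic sampling, assuming the common "random start, then every k-th element" procedure with interval k = frame size // sample size. The frame and the function name are hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first interval, then every k-th element."""
    k = len(frame) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the frame size")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start between 0 and k - 1
    return frame[start::k][:n]


# Hypothetical ordered list of 1,000 elements.
frame = list(range(1000))
sample = systematic_sample(frame, 50, seed=2)
```

Because the selected positions are evenly spaced, any pattern in the list that repeats with the same period as k will be over- or under-represented, which is the periodicity risk noted on the slide.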
14. Stratified Sampling
• Sub-samples are randomly
drawn from within different
strata, each of which is more
or less homogeneous on some
characteristic
• Why?
– Can reduce random error
– Reflects the population
more accurately through
more proportional
representation
• How?
1. Identify variable(s) as an
efficient basis for stratification.
Must be known to be related to
dependent variable. Usually a
categorical variable
2. Complete list of population
elements must be obtained
3. Use randomization to take a
simple random sample from
each stratum
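The three steps above can be sketched as follows, assuming proportional allocation (the same sampling fraction in every stratum). The frame of clients, the stratum labels and the function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Take a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in frame:  # steps 1-2: group the full list by stratum
        strata[stratum_of(element)].append(element)
    sample = {}
    for name, members in strata.items():  # step 3: SRS within each stratum
        size = max(1, round(fraction * len(members)))
        sample[name] = rng.sample(members, size)
    return sample


# Hypothetical frame of (client_id, client_type): 300 households, 100 businesses.
frame = [(i, "household" if i % 4 else "business") for i in range(1, 401)]
result = stratified_sample(frame, stratum_of=lambda e: e[1], fraction=0.1, seed=7)
```

With a 10% fraction the sample holds 30 households and 10 businesses, so each stratum appears in proportion to its share of the frame.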
15. Stratified Sampling
• Advantages
– Assures representation of
all needed groups of the
population in the sample
– Characteristics of each
stratum can be estimated
and comparisons made
– Reduces variability compared
with systematic sampling
• Disadvantages
– Requires accurate
information on proportions
of each stratum
– Stratified lists costly to
prepare
16. Cluster Sampling
The primary sampling unit is not the individual element, but a large
cluster of elements. Either the cluster is randomly selected or the
elements within are randomly selected
Why? Frequently used when no list of population available or
because of cost
Types of Cluster Samples
Area sample: Primary sampling unit is a geographical area
Multistage area sample: Involves a combination of two or more
types of probability sampling techniques. Typically,
progressively smaller geographical areas are randomly
selected in a series of steps
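A minimal sketch of a two-stage area sample, assuming clusters are stored as a dictionary that maps each area to its list of elements. The district and household names are made up for illustration.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly select clusters. Stage 2: randomly select elements within them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen  # stage 2
    }


# Hypothetical population: 20 districts with 50 households each.
clusters = {
    f"district_{d:02d}": [f"d{d:02d}_hh{h:02d}" for h in range(50)]
    for d in range(20)
}
picked = two_stage_cluster_sample(clusters, n_clusters=4, n_per_cluster=10, seed=3)
```

Only the four selected districts need a household list, which is why cluster sampling is attractive when no complete list of the population exists.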
17. Cluster Sampling
• Advantages
– Low cost/high frequency
of use
– Requires list of all clusters,
but only of individuals
within chosen clusters
– Can estimate
characteristics of both
cluster and population
– For multistage, has
strengths of used methods
• Disadvantages
– Larger error for
comparable size than
other probability methods
– Multistage very expensive
and validity depends on
other methods used
18. Stratified and Cluster Sampling
Stratified
• Population divided into
few subgroups
• Homogeneity within
subgroups
• Heterogeneity between
subgroups
• Choice of elements from
within each subgroup
Cluster
• Population divided into
many subgroups
• Heterogeneity within
subgroups
• Homogeneity between
subgroups
• Random choice of
subgroups
19. Example
EVN conducted a customer satisfaction survey of its clients in
Vietnam and wanted a sample that would produce reliable results.
First, they selected 16 provinces and cities across the Northern,
Central and Southern regions.
In each province/city, they randomly selected some districts
(accounting for 20% of all districts).
In each district, they chose some communes/wards.
In each commune/ward, they listed their clients in 2 groups,
Household and Business, and chose clients at random from each group.
What is their sampling method(s)?
22. Convenience Sample
• The sampling procedure used to obtain those units or
people most conveniently available
• Advantages
– Very low cost
– Extensively used/understood
– No need for list of population
elements
• Disadvantages
– Variability and bias cannot
be measured or controlled
– Projecting data beyond
sample not justified.
23. Judgment or Purposive Sample
• An experienced researcher selects the sample based on
some appropriate characteristic of sample members to
serve a specific purpose
• Advantages
– Moderate cost
– Commonly
used/understood
– Sample will meet a specific
objective
• Disadvantages
– Bias!
– Projecting data beyond
sample not justified.
24. Quota Sample
• To ensure that a certain characteristic of a population sample
will be represented to the exact extent that the investigator
desires
• Advantages
– moderate cost
– Very extensively
used/understood
– No need for list of
population elements
– Introduces some elements
of stratification
• Disadvantages
– Variability and bias cannot
be measured or controlled
(classification of subjects)
– Projecting data beyond
sample not justified.
25. Snowball sampling
• The initial respondents are chosen by probability or
non-probability methods, and then additional respondents are
obtained from information provided by the initial respondents
• Advantages
– low cost
– Useful in specific
circumstances
– Useful for locating rare
populations
• Disadvantages
– Bias because sampling
units not independent
– Projecting data beyond
sample not justified.
28. Determining Sample Size
Formulas:
Means: $n = (ZS/E)^2$
Proportions: $n = Z^2 pq / E^2$
Percentiles: $n = p_c (100 - p_c) Z^2 / E^2$
Z at 95% confidence = 1.96
Z at 99% confidence = 2.58
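The formulas above can be sketched as small Python functions. The inputs are illustrative assumptions, not values from the slides: a standard deviation of 10 with a tolerated error of 2 for the mean, and p = 0.5 with a 5-point error for the proportion and the percentile.

```python
def n_for_mean(z, s, e):
    """Means: n = (Z * S / E)^2."""
    return (z * s / e) ** 2


def n_for_proportion(z, p, e):
    """Proportions: n = Z^2 * p * q / E^2, with q = 1 - p."""
    return z ** 2 * p * (1 - p) / e ** 2


def n_for_percentile(z, pc, e):
    """Percentiles: n = pc * (100 - pc) * Z^2 / E^2, with pc and E in percentage points."""
    return pc * (100 - pc) * z ** 2 / e ** 2


Z95 = 1.96  # Z at 95% confidence
Z99 = 2.58  # Z at 99% confidence

mean_95 = n_for_mean(Z95, s=10, e=2)
mean_99 = n_for_mean(Z99, s=10, e=2)
prop_95 = n_for_proportion(Z95, p=0.5, e=0.05)
pct_95 = n_for_percentile(Z95, pc=50, e=5)
```

Moving from 95% to 99% confidence raises the required n for the same error, and the proportion and percentile forms give the same answer when p = 0.5 and pc = 50.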
30. INTRODUCTION
A common goal of survey research is to collect data representative of a
population;
The researcher uses information gathered from the survey to generalize
findings from a drawn sample back to a population, within the limits of
random error;
Wunsch (1986) stated that two of the most consistent flaws included:
– Disregard for sampling error when determining sample size;
– Disregard for response and non-response bias
The purpose of this paper is to:
– Describe common procedures for determining sample size for simple
random and systematic random samples;
– Focus on Cochran’s (1977) sample size formula for both continuous
and categorical data
31. Foundations
Primary Variables of Measurement
– The researcher must make decisions as to which variables will
be incorporated into formula calculations;
– One method of determining sample size is to specify margins
of error for the items that are regarded as most vital to the survey;
Researchers will have a range of n’s, usually ranging from smaller
n’s for scaled, continuous variables (height, job satisfaction), to
larger n’s for dichotomous or categorical variables (gender,
education levels);
If the n’s for the variables of interest are relatively close, the
researcher can simply use the largest n as the sample size and
be confident that the sample size will provide the desired results
32. Error Estimation
Cochran’s formula uses two key factors
– The risk (margin of error) the researcher is willing to accept in the
study;
– The alpha level, the level of acceptable risk the researcher is willing
to accept that the true margin of error exceeds the acceptable
margin of error
The alpha level used in determining sample size in most
educational research studies is either 0.05 or 0.01 (Ary, Jacobs, &
Razavieh, 1996);
The general rule related to acceptable margins of error in
educational and social research (Krejcie & Morgan, 1970):
– For categorical data, 5% margin of error is acceptable;
– For continuous data, 3% margin of error is acceptable
34. Variance Estimation
Cochran listed four ways of estimating population variances:
– Take the sample in two steps, and use the results of the first
step to determine how many additional responses are
needed to attain an appropriate sample size based on the
variance observed in the first step data;
– Use pilot study results;
– Use data from previous studies of the same or a similar
population;
– Estimate or guess the structure of the population assisted
by some logical mathematical results
35. Standard deviation
Estimated standard deviation by type of data:
• Continuous variables: S = 7 (number of points on the scale) / 6 (number of standard deviations) ≈ 1.167
• Categorical variables: S = (p x q)^(1/2) = (0.5 x 0.5)^(1/2) = 0.5
36. Sample size determination process
• Step 1: Sample size calculation
– Use appropriate Cochran’s sample size formulas for each kind
of data
• Step 2: Sample size adjustment for population
– Use Cochran’s (1977) correction formula if sample size
exceeds 5% of the population
• Step 3: Sample size adjustment for real situation
– Real situations: the response rates are below 100%;
– Use oversampling with the anticipated response rate
determined by using the same four methods of variance
estimation.
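37. Sample size determination
Continuous Data
• Step 1: $n_0 = \frac{t^2 s^2}{d^2} = \frac{z^2 s^2}{d^2}$
• Step 2: $n_1 = \frac{n_0}{1 + n_0/\text{Population}}$
• Step 3: $n_2 = n_1 / (\text{Anticipated Return Rate})$
Categorical Data
• Step 1: $n_0 = \frac{t^2 (p)(q)}{d^2} = \frac{z^2 (p)(q)}{d^2}$
• Step 2: $n_1 = \frac{n_0}{1 + n_0/\text{Population}}$
• Step 3: $n_2 = n_1 / (\text{Anticipated Return Rate})$
Where
• t: value in t-distribution = z value in Normal Distribution when population is large
• s: estimate of standard deviation in population
• d: acceptable margin of error for the mean (or for the proportion, with categorical data)
• α: significance level
• p: estimate of population proportion
• q: q = 1 - p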
38. Sample size determination - Example
Continuous Data
• Step 1: Point scales = 7; α = 5%, error margin = 0.03
$n_0 = \frac{(1.96)^2 (1.167)^2}{(7 \times 0.03)^2} = 118$
• Step 2: Population = 1,679; 5% of population = 84 < n0
$n_1 = \frac{118}{1 + 118/1679} = 111$
• Step 3: Anticipated return rate = 65%
n2 = 111/0.65 = 171
Categorical Data
• Step 1: Population proportion = 0.5; α = 5%, error margin = 0.05
$n_0 = \frac{(1.96)^2 (0.5)(0.5)}{(0.05)^2} = 384$
• Step 2: Population = 1,679; 5% of population = 84 < n0
$n_1 = \frac{384}{1 + 384/1679} = 313$
• Step 3: Anticipated return rate = 65%
n2 = 313/0.65 = 482
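The three-step process can be sketched in Python as below. The function names are hypothetical. The rounding convention is an assumption inferred from the worked numbers in the example: n0 is truncated, and the corrected and oversampled sizes are rounded up.

```python
import math


def n0_continuous(t, s, d):
    """Step 1 for continuous data: n0 = t^2 * s^2 / d^2."""
    return (t ** 2 * s ** 2) / d ** 2


def n0_categorical(t, p, d):
    """Step 1 for categorical data: n0 = t^2 * p * q / d^2, with q = 1 - p."""
    return (t ** 2 * p * (1 - p)) / d ** 2


def cochran_steps(n0_raw, population, return_rate):
    """Apply the population correction (step 2) and oversampling (step 3)."""
    n0 = math.floor(n0_raw)  # assumption: truncate, as the worked example appears to
    n1 = n0
    if n0 > 0.05 * population:  # correct only if n0 exceeds 5% of the population
        n1 = math.ceil(n0 / (1 + n0 / population))
    n2 = math.ceil(n1 / return_rate)  # inflate for the anticipated return rate
    return n0, n1, n2


# Continuous: 7-point scale, s = 7/6, error margin = 3% of the scale.
continuous = cochran_steps(n0_continuous(1.96, 7 / 6, 7 * 0.03), 1679, 0.65)
# Categorical: p = 0.5, error margin = 0.05.
categorical = cochran_steps(n0_categorical(1.96, 0.5, 0.05), 1679, 0.65)
```

Under these assumptions the sketch returns (118, 111, 171) for the continuous case and (384, 313, 482) for the categorical case, matching the example.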
40. Other Considerations
Regression analysis: The researcher wishes to use multiple
regression analysis in a study.
– The ratio of observations (n) to independent variables (X)
should not fall below five (Hair, Anderson, Tatham, & Black,
1995);
– A more conservative ratio, of ten observations for each
independent variable was reported optimal (Miller and
Kunce, 1973).
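A small sketch of the ratio check, assuming the rule is applied as observations divided by independent variables. The function names and the figures of 8 and 12 predictors are illustrative only.

```python
def min_observations(n_predictors, ratio=5):
    """Smallest n that keeps at least `ratio` observations per independent variable."""
    return ratio * n_predictors


def ratio_ok(n, n_predictors, ratio=5):
    """True if the observations-to-predictors ratio does not fall below `ratio`."""
    return n / n_predictors >= ratio


minimum = min_observations(8)                   # ratio of five
conservative = min_observations(8, ratio=10)    # more conservative ratio of ten
enough = ratio_ok(111, 8)                       # 111 observations, 8 predictors
too_few = ratio_ok(111, 12, ratio=10)           # 111 observations, 12 predictors
```

If the sample size that results from the margin-of-error calculation fails this check, the number of independent variables or the sample size has to change before the regression is run.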
41. Other Considerations
Factor analysis:
– The same ratio considerations discussed under multiple
regression should be used;
– One additional criterion is that factor analysis should not be
done with fewer than 100 observations;
– Loading factors to be significant for an alpha level of 0.05
42. Factor Analysis – Example
An analysis of the responses of 1,076 randomly sampled people to a
survey about job satisfaction was carried out.
43. Other Considerations
Sampling non-respondents :
– The researcher could consider using Cochran’s formula to
determine an adequate sample for the non-respondent follow-
up response analyses.
Budget, time and other constraints:
– Often, the researcher is faced with various constraints that may
force them to use inadequate sample sizes;
– Researchers should include a discussion of the effect the inadequate
sample sizes may have on the results of the study.
[Figure: respondents and non-respondents; biased sample]
44. Final thoughts
– In general, a researcher could use the standard factors
identified in this paper in the sample size determination
process;
– Using an adequate sample along with high quality data
collection efforts will result in more reliable, valid, and
generalizable results.