1. Complex sampling design & analysis: a revision
Assoc. Prof. Dr. Jamalludin Ab Rahman MD MPH
Department of Community Medicine
Kulliyyah of Medicine
2. Content
Sampling methods & sample size for surveys
What is a complex sampling method?
Sampling weights
Complex sampling analysis
6-7th April 2016 (C) Jamalludin Ab Rahman. All rights reserved.
3. About sampling
It is usually not feasible to study the whole population
A good sample should represent the population
Sampling error occurs when a sample statistic ≠ the population parameter
Sampling error is not sampling bias
Sampling error is random; sampling bias is predictable (systematic)
Sampling design affects sampling error
The standard error quantifies the expected size of the sampling error
5. Describe the sample
Target population – the population to which results are inferred
Study population – representative of the target population
Sampling frame – list of sampling units
Sampling unit – unit to be sampled
Observation unit – unit to be observed/measured
6. Sampling method
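Random vs. non-random
Random selection helps ensure representativeness
Simple vs. complex
SRS (simple random sampling) = every unit has an equal chance of being selected
i.e. equal probability of selection (and every possible sample of size n is equally likely)
Any design that is not SRS is complex sampling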
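To make the SRS/complex distinction concrete, the sketch below compares each unit's probability of selection under SRS (n/N for everyone) with a stratified design that takes a fixed number of units from each stratum. The population (800 urban, 200 rural), the sample sizes and the function names are hypothetical.

```python
from fractions import Fraction

def srs_inclusion_prob(N: int, n: int) -> Fraction:
    """Under SRS without replacement, every unit has the same chance n/N."""
    return Fraction(n, N)

def stratified_inclusion_probs(strata: dict[str, tuple[int, int]]) -> dict[str, Fraction]:
    """Stratified SRS: a unit in stratum h is selected with probability n_h / N_h.
    `strata` maps stratum name -> (N_h, n_h)."""
    return {h: Fraction(n_h, N_h) for h, (N_h, n_h) in strata.items()}

# Hypothetical population of 1,000 (800 urban, 200 rural), 100 selected in total
srs_p = srs_inclusion_prob(1000, 100)
print(srs_p)  # 1/10 for every unit

# Equal allocation (50 per stratum) gives unequal probabilities -> complex design
probs = stratified_inclusion_probs({"urban": (800, 50), "rural": (200, 50)})
print(probs)  # {'urban': Fraction(1, 16), 'rural': Fraction(1, 4)}
```

Proportionate allocation (80 urban, 20 rural) would restore equal probabilities of 1/10, but the design is still stratified and therefore still complex in the sense of this slide.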
8. Stratified versus cluster sampling
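Stratified: for groups (strata) that differ from each other, with units fairly similar within each stratum
e.g. male-female, age groups
Cluster: ideally, clusters are similar to each other and each is as varied as the population, e.g. schools, districts
In practice clusters are rarely alike; units within the same cluster tend to resemble each other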
10. Design Effect (deff)
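Design effect: $\mathrm{deff} = \dfrac{\widehat{V}_{\text{complex}}}{\widehat{V}_{\text{SRS}}}$, the variance estimate under the actual (complex) design divided by the variance estimate under SRS of the same size
Measures how much the sampling design inflates (or reduces) the variance compared with SRS
Takes a different value for each variable (estimate)
For complex surveys, especially cluster designs, deff is usually > 1
If deff = 1.5, the complex sample is only as precise as an SRS two-thirds its size; equivalently, it needs 50% more respondents than SRS to reach the same precision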
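A quick way to see what a given deff means is to compute the effective sample size n/deff, the size of an SRS with the same precision. The variances and the sample of 3,000 below are hypothetical values chosen to give deff = 1.5; the function names are illustrative.

```python
def design_effect(var_complex: float, var_srs: float) -> float:
    """deff = variance under the actual design / variance under SRS of the same size."""
    return var_complex / var_srs

def effective_sample_size(n: int, deff: float) -> float:
    """Size of an SRS that would give the same precision as the complex sample."""
    return n / deff

# Hypothetical variances of the same prevalence estimate under the two designs
deff = design_effect(var_complex=0.00036, var_srs=0.00024)
n_eff = effective_sample_size(3000, deff)
print(round(deff, 2), round(n_eff))  # 1.5 2000
```

With deff = 1.5, the 3,000 respondents carry roughly the information of 2,000 SRS respondents.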
11. Design Factor (deft)
Design factor: $\mathrm{deft} = \sqrt{\mathrm{deff}}$, i.e. the effect of the sampling design on the standard error
If deft = 2, the SE is twice as large as it would be under SRS
deff and deft are used a priori as a guide for calculating the sample size, and post hoc to check whether an adequate sample size has been achieved
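Both uses can be sketched for a single proportion, assuming the usual SRS formula $n_{SRS} = z^2 p(1-p)/d^2$ at 95% confidence (z = 1.96). The planning values (prevalence 30%, precision ±5%, deff = 2) and the function names are hypothetical.

```python
import math

def srs_sample_size(p: float, d: float, z: float = 1.96) -> int:
    """Sample size to estimate a proportion p within +/- d under SRS."""
    return math.ceil(z**2 * p * (1 - p) / d**2)

def complex_sample_size(p: float, d: float, deff: float, z: float = 1.96) -> int:
    """A priori use: inflate the SRS sample size by the anticipated deff."""
    return math.ceil(srs_sample_size(p, d, z) * deff)

def achieved_margin(p: float, n: int, deft: float, z: float = 1.96) -> float:
    """Post hoc use: margin of error achieved, with the SRS standard error
    multiplied by deft (= sqrt(deff))."""
    return z * deft * math.sqrt(p * (1 - p) / n)

print(srs_sample_size(0.3, 0.05))           # 323
print(complex_sample_size(0.3, 0.05, 2.0))  # 646
# Check whether 646 respondents met the +/- 5% target if the observed deff is 2
print(achieved_margin(0.3, 646, math.sqrt(2.0)) <= 0.05)  # True
```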
12. Sampling Weight
aka Probability Weight
$w = N/n$, the inverse of the sampling fraction (probability of selection)
Two-stage design: $w = (N_1/n_1) \times (N_2/n_2)$
The sum of the weights = population size
Weighting can increase the standard error
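The two-stage formula multiplies the stage-specific weights, which is the same as taking the inverse of the overall probability of selection. The design below (10 of 200 schools, then 25 of 500 pupils per selected school) is hypothetical, and it assumes every school has 500 pupils so the second-stage fraction is the same everywhere.

```python
from math import prod

def sampling_weight(stages: list[tuple[float, float]]) -> float:
    """Base (design) weight = product over stages of N_k / n_k,
    i.e. the inverse of the overall probability of selection."""
    return prod(N / n for N, n in stages)

# Hypothetical: stage 1 = 10 of 200 schools; stage 2 = 25 of 500 pupils per school
w = sampling_weight([(200, 10), (500, 25)])
print(w)  # 400.0

# 10 schools x 25 pupils = 250 respondents; their weights add up to the population
print(w * 10 * 25)  # 100000.0 (= 200 schools x 500 pupils)
```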
13. Sampling weight…
Why? Sampling is never perfect
Weighting tries to correct for:
1. Unequal probability of selection – base/design weight
2. Non-response bias – non-response adjustment
3. Population structure – making the sample reflect known characteristics of the population, e.g. by sex, ethnicity – post-stratification
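The two adjustments applied on top of the base weight can be sketched with simple weighting-class formulas. This is one common approach, not necessarily the one used in this course; other methods (e.g. raking) exist. All numbers, group labels and function names are hypothetical.

```python
def nonresponse_adjust(base_w, selected, responded):
    """Weighting-class adjustment: respondents in a class also carry the
    weight of the non-respondents in that class."""
    return base_w * selected / responded

def poststratify(weights, groups, pop_totals):
    """Scale weights so that, within each group, they sum to the known
    population total for that group."""
    sums = {}
    for w, g in zip(weights, groups):
        sums[g] = sums.get(g, 0.0) + w
    return [w * pop_totals[g] / sums[g] for w, g in zip(weights, groups)]

# Hypothetical: base weight 100, 120 selected in the class, 100 responded
w_nr = nonresponse_adjust(100, 120, 100)
print(w_nr)  # 120.0

# Hypothetical: 4 respondents, known population of 300 males and 200 females
final = poststratify([w_nr] * 4, ["M", "M", "F", "F"], {"M": 300, "F": 200})
print(final)  # [150.0, 150.0, 100.0, 100.0]
```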
14. Example
N = 100,000 people
Sample (n) = 1000
Therefore, SW = 100,000/1000 = 100
Each sampled person represents 100 people in that region
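Applying the weight of 100 from this example, a weighted sum turns sample counts into population estimates. The outcome (150 of the 1,000 respondents being smokers) is hypothetical.

```python
# N = 100,000 and n = 1,000 as in the example, so every weight is N/n = 100
N, n = 100_000, 1_000
weights = [N / n] * n
smoker = [1] * 150 + [0] * 850  # hypothetical: 150 of 1,000 respondents smoke

est_population = sum(weights)                              # sum of weights = N
est_smokers = sum(w * y for w, y in zip(weights, smoker))  # weighted total
est_prevalence = est_smokers / est_population

print(est_population, est_smokers, est_prevalence)  # 100000.0 15000.0 0.15
```

With equal weights the weighted prevalence equals the unweighted 15%; weights change the estimate only when they differ between respondents.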
20. Practical
Sampling distribution
Calculating sampling weight
Preparing data for analysis
Complex sample analysis (using SPSS)
21. Sampling distribution
Using the 2016 adult population by location (urban/rural) in Malaysia, prepare a sampling distribution that is representative of Malaysian urban and rural areas, given a calculated sample size of 10,000 respondents
Take 12 living quarters (LQ) per enumeration block (EB) and 2 adults per LQ
Allocate proportionate to size
22. Population size by census ('000)*
No. State Urban Rural Total
1 Johor 1,682 537 2,219
2 Kedah 905 433 1,338
3 Kelantan 508 543 1,050
4 Melaka 537 47 584
5 Negeri Sembilan 492 198 690
6 Pahang 564 427 991
7 Perak 1,260 394 1,653
8 Perlis 102 66 167
9 Pulau Pinang 1,069 69 1,138
10 Sabah 1,064 597 1,661
11 Sarawak 1,009 694 1,703
12 Selangor 3,583 274 3,857
13 Terengganu 450 250 700
14 WP Kuala Lumpur 1,133 - 1,133
15 WP Labuan 50 6 57
16 WP Putrajaya 46 - 46
Total 14,454 4,533 18,987
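The exercise can be sketched as a proportionate allocation of the 10,000 respondents to every state × urban/rural stratum, converted into enumeration blocks of 12 LQ × 2 adults = 24 respondents. Rounding up to whole EBs is an assumption (other rounding rules are possible), and the grand total is recomputed from the rows (18,989), which differs slightly from the printed 18,987 because of rounding in the census figures.

```python
import math

# Population ('000) by state: (urban, rural), copied from the census table above.
# WP Kuala Lumpur and WP Putrajaya have no rural stratum, entered here as 0.
POP = {
    "Johor": (1682, 537), "Kedah": (905, 433), "Kelantan": (508, 543),
    "Melaka": (537, 47), "Negeri Sembilan": (492, 198), "Pahang": (564, 427),
    "Perak": (1260, 394), "Perlis": (102, 66), "Pulau Pinang": (1069, 69),
    "Sabah": (1064, 597), "Sarawak": (1009, 694), "Selangor": (3583, 274),
    "Terengganu": (450, 250), "WP Kuala Lumpur": (1133, 0),
    "WP Labuan": (50, 6), "WP Putrajaya": (46, 0),
}
SAMPLE_SIZE = 10_000
RESP_PER_EB = 12 * 2  # 12 LQ per EB x 2 adults per LQ

def allocate(pop, n_total, per_eb):
    """Proportionate allocation to each state x location stratum,
    rounded up to whole EBs (the rounding rule is an assumption)."""
    grand = sum(u + r for u, r in pop.values())
    plan = {}
    for state, (urban, rural) in pop.items():
        for loc, size in (("urban", urban), ("rural", rural)):
            if size == 0:
                continue  # stratum does not exist
            n_h = n_total * size / grand    # share proportionate to size
            ebs = math.ceil(n_h / per_eb)   # whole enumeration blocks
            plan[(state, loc)] = (ebs, ebs * per_eb)
    return plan

plan = allocate(POP, SAMPLE_SIZE, RESP_PER_EB)
print(plan[("Selangor", "urban")])  # (79, 1896)
print("EBs:", sum(e for e, _ in plan.values()),
      "respondents:", sum(n for _, n in plan.values()))
```

Because every stratum is rounded up, the final sample is slightly larger than 10,000.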
23. Calculating sampling weight
State | PSU (kindergartens): urban total*, urban visited, rural total*, rural visited | SSU (children): urban total*, urban examined, rural total*, rural examined
("total*" = total population; "-" = no rural stratum)
FT Kuala Lumpur 471 34 - - 10,940 687 - -
Perlis 65 5 222 7 1,007 97 2,557 113
Kedah 164 19 757 69 1,913 203 9,154 846
Penang 297 21 316 24 4,845 402 4,496 366
Perak 356 19 1,040 55 6,382 412 12,627 819
Selangor 1,051 93 607 55 22,951 2,204 7,994 815
Negeri Sembilan 206 15 420 30 2,924 253 4,850 373
Melaka 131 8 384 22 1,941 125 5,111 316
Johor 586 42 1,121 80 9,389 779 13,594 1,163
Pahang 235 13 873 45 4,188 224 12,092 642
Terengganu 400 21 813 35 6,979 336 9,308 427
Kelantan 144 9 1,042 58 2,924 178 14,882 934
FT Putrajaya 71 4 - - 2,170 127 - -
Sabah 395 32 1,230 101 10,330 998 13,837 1,006
Sarawak 590 30 1,493 67 13,395 644 14,936 725
FT Labuan 74 8 - - 1,400 135 - -
Total 5,236 373 10,318 648 103,678 7,804 125,438 8,545
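A sketch of the per-stratum weights for a few rows of the table above. The slide does not say whether the children's "total population" is the whole stratum or only the visited kindergartens; this sketch ASSUMES it is the stratum-wide count, so the overall child weight is simply N/n, while the PSU weight is kindergartens total / visited. The function name is hypothetical.

```python
import math

# Subset of the table above:
# (state, location): (kindergartens total, visited, children total, examined)
TABLE = {
    ("FT Kuala Lumpur", "urban"): (471, 34, 10_940, 687),
    ("Perlis", "urban"): (65, 5, 1_007, 97),
    ("Perlis", "rural"): (222, 7, 2_557, 113),
    ("Selangor", "urban"): (1_051, 93, 22_951, 2_204),
}

def stratum_weights(k_total, k_visited, n_total, n_examined):
    """Return (PSU weight, child weight) for one stratum."""
    psu_w = k_total / k_visited      # kindergartens represented by each visited one
    child_w = n_total / n_examined   # ASSUMPTION: n_total is the stratum-wide child count
    return psu_w, child_w

for (state, loc), row in TABLE.items():
    psu_w, child_w = stratum_weights(*row)
    # Check: the weights of the examined children add back to the child population
    assert math.isclose(child_w * row[3], row[2])
    print(f"{state} ({loc}): PSU weight {psu_w:.2f}, child weight {child_w:.2f}")
```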
24. Preparing data for analysis
Merge the sampling weights (SW) into the dataset
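Merging amounts to looking up each respondent's stratum and attaching its weight. The sketch below uses plain Python rather than SPSS; the weight values, records, field names and the placeholder variable "outcome" are all hypothetical.

```python
# Hypothetical weight lookup, one weight per (state, location) stratum
SW = {("Perlis", "urban"): 10.38, ("Perlis", "rural"): 22.63}

# Hypothetical respondent records; "outcome" is a placeholder variable
records = [
    {"id": 1, "state": "Perlis", "location": "urban", "outcome": 1},
    {"id": 2, "state": "Perlis", "location": "rural", "outcome": 0},
]

def merge_weights(records, weights, keys=("state", "location")):
    """Attach the stratum weight to every record as a new 'sw' field."""
    merged = []
    for rec in records:
        k = tuple(rec[f] for f in keys)
        if k not in weights:
            # Fail loudly instead of leaving a record unweighted
            raise KeyError(f"no sampling weight for stratum {k}")
        merged.append({**rec, "sw": weights[k]})
    return merged

data = merge_weights(records, SW)
print(data[1])  # {'id': 2, 'state': 'Perlis', 'location': 'rural', 'outcome': 0, 'sw': 22.63}
```

Raising an error for an unmatched stratum is a design choice: a record that silently ends up without a weight would be dropped from, or distort, the weighted analysis.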
26. Complex sample analysis
Preparing the complex samples (CS) plan
Analysis using SPSS
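A CS plan records the design variables (strata, clusters/PSUs and the sampling weight) so the software can estimate design-based standard errors. The sketch below is not SPSS; it illustrates, in Python, the kind of calculation such procedures commonly perform for a weighted proportion: Taylor linearisation with the with-replacement (ultimate cluster) variance approximation, ignoring finite population corrections. The tiny dataset and function name are hypothetical.

```python
import math
from collections import defaultdict

def weighted_proportion_se(rows):
    """rows: iterable of (stratum, psu, weight, y) with y in {0, 1}.
    Returns (p_hat, se) using Taylor linearisation and the
    with-replacement variance approximation between PSUs within strata."""
    rows = list(rows)
    w_sum = sum(w for _, _, w, _ in rows)
    p_hat = sum(w * y for _, _, w, y in rows) / w_sum

    # Linearised values, summed to PSU totals
    psu_tot = defaultdict(float)
    for h, i, w, y in rows:
        psu_tot[(h, i)] += w * (y - p_hat) / w_sum

    by_stratum = defaultdict(list)
    for (h, _), z in psu_tot.items():
        by_stratum[h].append(z)

    var = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError("each stratum needs at least 2 PSUs")
        z_bar = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return p_hat, math.sqrt(var)

# Hypothetical data: 2 strata x 2 PSUs x 2 respondents, equal weights
rows = [
    ("A", "a1", 1.0, 1), ("A", "a1", 1.0, 1), ("A", "a2", 1.0, 0), ("A", "a2", 1.0, 0),
    ("B", "b1", 1.0, 1), ("B", "b1", 1.0, 0), ("B", "b2", 1.0, 1), ("B", "b2", 1.0, 0),
]
p, se = weighted_proportion_se(rows)
se_srs = math.sqrt(p * (1 - p) / len(rows))  # SE if the same data came from SRS
print(p, se, round((se / se_srs) ** 2, 2))   # 0.5 0.25 2.0  (last value = deff)
```

Clustering of identical answers in PSU a1 and a2 is what doubles the variance here, giving deff = 2.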
Speaker notes
For cluster samples, the main components of deff are the intraclass correlation (rho) and the number of sampled units within each cluster. Rho is a statistical estimate of within-cluster homogeneity. Informally, it compares how likely two units drawn at random from the same cluster are to have the same value on the variable in question with how likely two units drawn at random from the population as a whole are.
Thus, a rho of 0.10 indicates, roughly, that two units randomly selected from within the same cluster are 10% more likely to have the same value than two randomly selected units in the population as a whole.
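The link between rho, cluster size and deff is often summarised by Kish's approximation, deff ≈ 1 + (m − 1) × rho, where m is the average number of sampled units per cluster. It assumes roughly equal cluster sizes and equal weights. The values below (24 respondents per EB, rho = 0.05) are hypothetical.

```python
import math

def kish_deff(m: float, rho: float) -> float:
    """Approximate deff for cluster sampling (Kish): 1 + (m - 1) * rho,
    with m = average sampled units per cluster, rho = intraclass correlation."""
    return 1 + (m - 1) * rho

# Hypothetical: 24 respondents per cluster, rho = 0.05
deff = kish_deff(24, 0.05)
deft = math.sqrt(deff)
print(round(deff, 2), round(deft, 2))  # 2.15 1.47
```

Even a small rho more than doubles the variance when many units are taken per cluster, which is why both components matter.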
For example, a deft value of 2 indicates that the standard errors are twice as large as they would have been had the design been a simple random sample.