Respondent-Driven Sampling:
An Overview
Ashton M. Verdery
Duke Network Analysis Center
May, 2019
Outline
• Intuition about network sampling
• Leveraging social networks for sampling
– Why?
– How?
• What is RDS?
– Hidden populations
– RDS origins and concepts
– RDS applications
– Pitfalls and promises of RDS
– New directions
Samples from social networks
[Figure: example network with nodes numbered 1 to 11, shown across eight consecutive slides]
Why do it?
• Future of social science research
– New populations of interest are hard to survey
• e.g., undocumented migrants, people who use drugs
– New theories & tools require new types of data
• e.g., social network analysis
– Existential threat of declining survey participation
• i.e., all groups are becoming hidden populations
Silliness
http://www.pewresearch.org/2017/05/15/what-low-response-rates-mean-for-telephone-surveys/
Hidden populations
• Collecting data from hidden populations is difficult because of the absence of a sampling frame
– Stigma
– Non response
– Lack of trust
– Rarity
Household based sampling in Lilongwe, Malawi
Escamilla et al. 2014
How to sample hidden populations?
• Traditional approaches
– Convenience samples
– Clinical samples
– Location samples
• Problems
– Are we learning about people other than those sampled?
• Limited ability to infer representation
• Poor coverage for sampling frame
• Often time intensive, costly, very small samples
Respondent-Driven Sampling (RDS)
• A sociological method with wide applications
– Heckathorn 1997
• Most popular solution to problems of hidden
populations in recent decades (as of May 2019)
– 544+ studies
– 1.2k+ papers, 24k+ cites
– H-index of 59
– Over $213 mill. from NIH
• Compare to “ego centric”
– 254 studies funded
– $59 million since 1990
RDS applications
• Hidden populations of many stripes
– Men who have sex with men
– People who inject drugs
– Commercial sex workers
– High risk heterosexuals
– Other drug users (opioids, methamphetamines)
– Domestic violence victims
– Victims of sexual violence (child prostitution, sex trafficking, war-time rape)
– Jazz musicians
– Vegetarians and vegans in Argentina
– Wheelchair users
– Non-institutionalized older adults (85+)
• Most common questions
– Can we sample this population?
– What are the characteristics of this population?
– What is the size of this population?
Top 10 fields
Web of Science. May 2019.
RDS overview
Two parts
1) Chain referral / peer recruitment
– “Seed” participants receive 2 coupons
• Recruit 2 new participants each
• Dual incentives for participation & recruitment
• Each new respondent given 2 coupons to recruit others
• Process continues until desired sample size is obtained
• (No one participates more than once)
• *Researchers lack control of sampling process
2) Post-recruitment weighting of cases
– Correct for theoretical sampling process
– Make inferences about population & quantify uncertainty
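The chain-referral step above can be sketched as a small simulation. This is a minimal illustration rather than the procedure of any particular study: the function name `simulate_rds`, the toy 12-node network, and the rule that each respondent hands coupons to uniformly random contacts who have not yet participated (and that every coupon is redeemed) are all assumptions made for the example.

```python
import random
from collections import deque


def simulate_rds(adjacency, seeds, coupons=2, target_n=10, rng=None):
    """Simulate peer recruitment on a known network (hypothetical helper).

    adjacency: dict mapping each person to a list of their contacts.
    seeds:     people chosen by the researcher to start the chains.
    Returns a list of (recruiter, recruit) pairs in enrollment order;
    seeds appear with recruiter None.
    """
    rng = rng or random.Random(0)
    sampled = set()   # no one participates more than once
    chain = []        # enrollment record
    queue = deque()   # respondents who still hold coupons

    def enroll(recruiter, person):
        sampled.add(person)
        chain.append((recruiter, person))
        queue.append(person)

    for seed in seeds:
        if seed not in sampled and len(chain) < target_n:
            enroll(None, seed)

    # Continue until the desired sample size is reached or chains die out.
    while queue and len(chain) < target_n:
        recruiter = queue.popleft()
        eligible = [v for v in adjacency[recruiter] if v not in sampled]
        rng.shuffle(eligible)  # assumption: random choice among contacts
        for recruit in eligible[:coupons]:
            if len(chain) >= target_n:
                break
            enroll(recruiter, recruit)
    return chain


# Toy undirected network: person i knows i±1 and i±3 on a ring of 12.
n_nodes = 12
adjacency = {
    i: sorted({(i - 1) % n_nodes, (i + 1) % n_nodes,
               (i - 3) % n_nodes, (i + 3) % n_nodes})
    for i in range(n_nodes)
}
chain = simulate_rds(adjacency, seeds=[0], coupons=2, target_n=8)
for recruiter, recruit in chain:
    print(recruiter, "->", recruit)
```

The researcher only picks the seeds and the coupon limit; who ends up in the sample is decided by the respondents, which is why the second part (weighting) is needed.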
Seeds & coupons
Wirtz et al. 2017
• Seeds
– 7-10 population members
– Convenience selection
• Willing to participate
• Large personal networks
• Diverse on relevant attributes
• Coupons
– Give 2 to 3 per respondent
• Non-seeds can only
participate with a coupon
– Uniquely coded for tracking
• Codes given out & redeemed
– Non-physical coupons
• Possible, but challenging
Coupons
Contact number
Consent and study
description (on back)
Valid dates
Interview site location
Tracking codes
Example
(Fisher and Merli, Net. Sci. 2014)
Example
(Verdery et al., Soc. Meth. 2017)
Sampled = black
Core resources
• Useful website from Handcock, Gile, & collaborators
– http://hpmrg.org
• Manuals for RDS survey design
– Johnson tutorial, with questionnaires, consent forms, etc.
• http://applications.emro.who.int/dsaf/EMRPUB_2013_EN_1539.pdf
– CDC, UNAIDS, & others also have useful manuals
• https://www.cdc.gov/hiv/pdf/statistics/systems/nhbs/nhbs-idu3_nhbs-het3-protocol.pdf
• https://globalhealthsciences.ucsf.edu/sites/globalhealthsciences.ucsf.edu/files/ibbs-rds-protocol.pdf
• Software for RDS analysis
– Stand alone software for RDS coupon management & analysis
• http://www.respondentdrivensampling.org/main.htm
– R package “RDS” for analysis & diagnostics
• https://cran.r-project.org/web/packages/RDS/index.html
– Stata packages for analysis
• http://www.stata-journal.com/article.html?article=st0247
• I have unreleased Stata packages for many RDS estimators and RDS multivariate regression
• Diagnostics for RDS preplanning and post-survey analysis
– http://www.princeton.edu/~mjs3/gile_diagnostics_2014.pdf
Network structure assumptions
• There is a social network
• Population size large (N>>n)
• Homophily weak
• Community structure weak
• Connected graph w/1 component (giant component)
• All ties reciprocated (undirected)
• Known population size N
Sampling assumptions
• Sampling with replacement
• Single, non-branching chain (1 seed; 1 coupon)
• Sufficiently many sample waves
• Initial sample of seeds unbiased
• Degree accurately measured
• Conditionally random referrals (random
Key concepts & assumptions
• Baseline assumptions
– Population members are linked
in a social network & will refer
other members into the study
• Key concepts
– Primary & secondary interviews
– Respondent degree
– Random recruitment
– Bottlenecks
– Bias, sampling variance, & RMSE
• Different estimators make
different assumptions about
recruitment process and
underlying network
(see Gile 2011:144)
Primary & secondary interviews
Respondent degree
• Degree
– Popularity
– How many incoming ties
• network assumed undirected
• Typical solicitation
– “how many people do you
know (you know their name
and they know yours) who have
exchanged sex for money in the
past six months?”
– Often, successive restrictions
• Last 30 days, live in area, etc.
• Key element of most mean
estimators
$w_i = \dfrac{d_i^{-1}}{\sum_j d_j^{-1}}$
Merli, et al Soc. Sci. Med.
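To make the role of degree concrete, the sketch below computes inverse-degree weights (each respondent's 1/degree divided by the sum of 1/degree over the sample) and the weighted mean they imply, in the style of the Volz-Heckathorn estimator. The data are invented for illustration, and the function names `inverse_degree_weights` and `weighted_mean` are hypothetical.

```python
def inverse_degree_weights(degrees):
    """Return weights proportional to 1/degree that sum to 1."""
    if any(d <= 0 for d in degrees):
        raise ValueError("every respondent must report a positive degree")
    inverse = [1.0 / d for d in degrees]
    total = sum(inverse)
    return [x / total for x in inverse]


def weighted_mean(values, degrees):
    """Inverse-degree-weighted mean of a respondent attribute."""
    weights = inverse_degree_weights(degrees)
    return sum(w * y for w, y in zip(weights, values))


# Made-up sample: reported degree and a 0/1 attribute for four respondents.
degrees = [2, 4, 4, 8]
values = [1, 0, 1, 0]

weights = inverse_degree_weights(degrees)
estimate = weighted_mean(values, degrees)
naive = sum(values) / len(values)

print("weights:", [round(w, 3) for w in weights])
print("weighted estimate:", round(estimate, 3))  # 0.667
print("naive estimate:", naive)                  # 0.5
```

High-degree respondents are easier to reach through referral, so they are down-weighted; here the low-degree respondent with the attribute pulls the estimate above the unweighted proportion.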
Assumption: “random recruitment”
[Figure: nodes A, B, C, D with recruitment probabilities 3/9, 3/9, 3/9]
In practice: “preferential recruitment”
[Figure: nodes A, B, C, D with recruitment probabilities 4/9, 4/9, 1/9]
Reasons for preferential recruitment
• NOT A REASON
– Has more connections to similar people
• In principle, the weighting approaches should deal with this
• Reasons (not exhaustive)
– Better relationships with similar people
– Wants to help friend who needs money
– Wants friend to get HIV test
– Only friends who do riskier things want to get tested
– Unemployed friends more likely to be encountered
– Etc.
“Bottlenecks”
• Few ties between clusters
– Assumed to matter
substantially
– Somewhat overstated
• General advice:
– Split sample
– Tough to achieve a priori
With n=500, RDS on this network exhibits 150× the sampling variance of SRS, and the estimated sampling variance bears no relation to the true value. We see this in network after network.
Mouw & Verdery Soc. Meth. 2012
Goel & Salganik Stat. Med. 2009
Key concepts
• Bias
– “Accuracy”
– How far from the population parameter
is the average sample?
• Sampling variance
– “Precision”
– How variable are the results, sample to
sample, on average?
– Often expressed as Design Effects
• Ratio of RDS to SRS sampling variance
• Interpretable as sample size multiplier
• Root Mean Square Error (RMSE)
– Balancing accuracy and precision
• There are many other error metrics
Verdery, Merli, et al. Epid. 2015.
Just right?
Contrast with SRS
Network: Project 90 (N=4413)
Variable: Percent White
RDS
– Unbiased, 10 seeds, 3 coupons
– Without replacement
– n=150
SRS
– Without replacement
– n=150
Project 90 network, red nodes=non-white
Verdery et al. 2017
Contrast with SRS
[Figure: two histograms of estimates across samples, one panel for RDS and one for SRS; x-axis: estimated percentage (40 to 100); y-axis: number of samples (0 to 250)]
Contrast with SRS
Backup: https://youtu.be/BZL3XBeG7W8
Contrast with SRS
(n=400)
[Figure: histogram of estimates for RDS VH weights and SRS w/o replacement; x-axis: estimate (.5 to .9); y-axis: frequency (0 to 30)]
Bias, sampling variance, & uncertainty
• Early RDS work focused on bias, but sampling variance is also critical
• A related concern:
– Quantifying uncertainty
– After data collection, can you say:
• How biased your sample is?
• How results would vary sample to sample?
– Key feature of inferential statistics
• E.g., if sampling conformed to assumptions, we can provide a confidence interval for an estimate and be reasonably sure the confidence interval is accurate
• Is this true in RDS?
Quantifying uncertainty
• Traditional estimators of RDS sampling variance are bad
• Example
– Sampling variance (SV)
• RDS mean estimators have high SV
– Estimated sampling variance
• RDS SV estimators have high bias
Verdery et al., PLOS ONE 2015
Recent progress on estimating
RDS sampling variance
Baraff, et al., PNAS. 2015
Estimators
• Of the population mean
– At least 11 in current use
• Table on right
• McCreesh et al. 2013
• Crawford 2016
• Gile & Handcock 2015
• Berchenko 2017
• Of the sampling variance
– 5 primary methods in use
• Bootstrap (Salganik 2006)
• Analytical (Volz & Heckathorn 2008)
• Successive Sampling (Gile 2011)
• Model assisted (Gile & Handcock 2015)
• Tree Bootstrap (Baraff et al. 2016)
eTable 1. The seven respondent-driven sampling estimators evaluated in this paper.
Estimator: Source
1. Naïve: None
2. RDS1-SH: Salganik MJ, Heckathorn DD. Sampling and Estimation in Hidden Populations Using Respondent-Driven Sampling. Sociol Methodol. 2004;34(1):193–240. doi:10.1111/j.0081-1750.2004.00152.x.
3. RDS1-DS: Heckathorn DD. Respondent-Driven Sampling II: Deriving Valid Population Estimates from Chain-Referral Samples of Hidden Populations. Soc Probl. 2002;49(1):11-34. doi:10.1525/sp.2002.49.1.11.
4. RDS1-DG: Heckathorn DD. Extensions of Respondent-Driven Sampling: Analyzing Continuous Variables and Controlling for Differential Recruitment. Sociol Methodol. 2007;37(1):151–207. doi:10.1111/j.1467-9531.2007.00188.x.
5. RDS1-LEN: Lu X. Linked Ego Networks: Improving estimate reliability and validity with respondent-driven sampling. Soc Netw. 2013;35(4):669-685. doi:10.1016/j.socnet.2013.10.001.
6. RDS2-VH: Volz E, Heckathorn DD. Probability based estimation theory for respondent driven sampling. J Off Stat. 2008;24(1):79.
7. RDS2-SS: Gile KJ. Improved Inference for Respondent-Driven Sampling Data With Application to HIV Prevalence Estimation. J Am Stat Assoc. 2011;106(493):135-146. doi:10.1198/jasa.2011.ap09475.
Verdery, et al., Epid. 2015
General comments on estimators
• For the population mean
– “linked ego networks” is best
• Requires respondents know
peer attributes reasonably well
• Can’t calculate for many
variables of interest
– Naïve estimator often works
– Most common
• Volz-Heckathorn
• Successive Sampling
– (In general, SS is better)
• For the sampling variance
– Only the tree bootstrap
method seems to have
anything resembling
reasonable properties
Verdery, et al., Epid. 2015
Diagnostics
• Embed questions in the survey to
allow you to estimate whether
assumptions were met
– E.g., ask why people recruited those
they did, how many people they
tried to recruit who had already
participated, etc.
• Assess potential bottlenecks and
seed bias with convergence plots
Johnston, et al., Epid. 2015
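One way to build the series behind a convergence plot is to order respondents by enrollment and recompute the estimate each time a case is added; a trace that is still drifting at the end suggests the seeds (or a bottleneck) are still shaping the result. The sketch below assumes inverse-degree weighting and uses invented data; the function name `cumulative_estimates` is hypothetical, and plotting is left out.

```python
def cumulative_estimates(values, degrees):
    """Running inverse-degree-weighted mean after each enrolled respondent.

    values and degrees must be listed in enrollment order.
    """
    numerator = 0.0
    denominator = 0.0
    trace = []
    for y, d in zip(values, degrees):
        numerator += y / d
        denominator += 1.0 / d
        trace.append(numerator / denominator)
    return trace


# Made-up data in enrollment order: 0/1 attribute and reported degree.
values = [1, 1, 0, 1, 0, 0]
degrees = [2, 2, 4, 4, 2, 4]

trace = cumulative_estimates(values, degrees)
for n, estimate in enumerate(trace, start=1):
    print(n, round(estimate, 3))
```

In this toy series the estimate starts at 1.0, because the early recruits resemble the seed, and is still falling after six cases, which is the pattern that would argue for more waves.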
A few notes on web-based RDS
• Developing area with challenges but lots of potential
• Recommendations
– Differences from traditional
• Be prepared to expand to 30-60 seeds; 20+ waves
– Verification
• Respondent Uniqueness
– IP address verification; web-cam interview?
• Respondent is in target population
– In geographic area of interest? Fits other criteria?
• Coupon management
– Careful with secondary incentives
– Remember limitations
• Internet access, etc.
If problems…
• Expand recruitment
– Expand number of seeds
– Expand allowable recruits
– Raise incentives
– Reduce burdens
• Greater emphasis on anonymity
• Shorten survey
• Drop secondary interview
• If all else fails…
– Convenience sample
– Lean on other features
• It won’t always look like it does on paper
My recommendations
• 1) Embed additional data collection in RDS
– Qualitative interviews
– Ego network rosters
– Minimally identifiable information about alters
• 2) Examine more than just prevalence
– Population size
– Network structure
– Multivariate relationships
Promises & pitfalls
Weighting/estimation can yield asymptotically
unbiased estimates of population mean
– Unrealistic, hard to verify assumptions required
Design effects remain high
– Orders of magnitude larger N needed
But…
– New data on understudied populations
– Effective, fast method (50 cases/week)
– Possible to learn a lot about networks (underutilized)
Thank you!
Portions of this work were supported by a grant from the National Institutes of
Health (1 R03 SH000056-01; Verdery PI): “Multivariate Regression with Respondent-
Driven Sampling Data.”
I also appreciate assistance from the Justice Center for Research, the Institute for
CyberScience, the Social Science Research Institute, the College of the Liberal Arts,
and the Population Research Institute at Penn State University, the last of which is
supported by an infrastructure grant from the Eunice Kennedy Shriver National
Institute of Child Health and Human Development (P2CHD041025 & R24 HD041025).
Other portions of this work benefitted from support from the Duke Network Analysis
Center, the Duke Population Research Institute, and the Carolina Population Center.
Ashton M. Verdery: amv5430@psu.edu
I thank many coauthors: M. Giovanna Merli, James Moody, Ted Mouw,
Peter J. Mucha, Jacob C. Fisher, Shawn Bauldry, Nalyn Siripong, Jeff
Smith, Kahina Abdessalem, Sergio Chavez, Heather Edelblute, Jing Li,
Jose Luis Molina, Miranda Lubbers, Sara Francisco, Claire Kelling, Anne
DeLessio-Parson, & David Hunter.
alternate link-tracing designs
• Network Sampling with Memory
– collect network data from respondents
– minimally identifying information to link
nominated but not sampled individuals
– “search” algorithm to explore the network more
efficiently based on currently uncovered data
– recovers sampling frame
– “list” algorithm samples frame as if at random
Mouw & Verdery. 2012. Sociological
Methodology.
network sampling with memory
• Two sampling modes:
– Search
• Push sample to explore network by seeking bridge ties
– List
• Keep a list L of unique members, both nominated & sampled
• Sample with replacement from L
• “Even” sampling of nodes ensured by probabilistic selection
• When whole network nominated, converges to SRS
• Simulated sampling showed hybrid (S -> L) best
network sampling with memory
test network
• Add Health high school
– 1,281 students
– 67.3% white
– 10,414 edges in data
– 587 cross race ties (w->nw)
– 8% of whites’ friends n.w.
• Conclusions:
– Homophily in the data
– But no “choke points”
– Lots of cross group ties
• Method
– Test simulated FNSM
• 500 samples, 500 cases each
– RDS, NSM, FNSM
• Calculate CIs and DEs
results in test network
empirical results
Key concepts
• Where $a$ is the number of samples, $c_i$ is the estimated statistic from sample $i$, and $C$ is the population parameter:
– Bias
• $bias = a^{-1} \sum_{i=1}^{a} (c_i - C)$
• “Accuracy”
– Sampling variance
• $SV = a^{-1} \sum_{i=1}^{a} \left( c_i - a^{-1} \sum_{j=1}^{a} c_j \right)^2$
• “Precision”
– Root Mean Square Error (RMSE)
• $RMSE = \sqrt{bias^2 + SV}$
• Balancing accuracy and precision
– There are many other error metrics; I like this
– Design effects
• $DE = SV_{RDS} / SV_{SRS}$
• Precision ratio compared to simple random samples
• Sample size ratios for equivalent efficiency
Verdery, Merli, et al. Epid. 2015.
Just right?
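The four quantities on this slide can be computed directly from a set of repeated-sample estimates. The sketch below is only an illustration: the estimates are invented (they do not come from any of the cited simulations), and the function names are hypothetical.

```python
import math


def bias(estimates, truth):
    """Average deviation of the sample estimates from the population value."""
    return sum(c - truth for c in estimates) / len(estimates)


def sampling_variance(estimates):
    """Average squared deviation of the estimates from their own mean."""
    mean = sum(estimates) / len(estimates)
    return sum((c - mean) ** 2 for c in estimates) / len(estimates)


def rmse(estimates, truth):
    """Root mean square error: sqrt(bias^2 + sampling variance)."""
    return math.sqrt(bias(estimates, truth) ** 2 + sampling_variance(estimates))


def design_effect(rds_estimates, srs_estimates):
    """Ratio of RDS sampling variance to SRS sampling variance."""
    return sampling_variance(rds_estimates) / sampling_variance(srs_estimates)


# Made-up estimates from four repeated samples under each design.
truth = 0.5
rds = [0.2, 0.4, 0.6, 0.8]
srs = [0.4, 0.45, 0.55, 0.6]

print("RDS bias:", round(bias(rds, truth), 3))
print("RDS SV:", round(sampling_variance(rds), 5))
print("RDS RMSE:", round(rmse(rds, truth), 3))
print("Design effect:", round(design_effect(rds, srs), 2))
```

Both made-up designs are unbiased here, yet the design effect is about 8, so under these numbers an RDS sample would need roughly eight times as many cases to match the precision of SRS.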

Weitere ähnliche Inhalte

Was ist angesagt?

12. ethics in medical research
12. ethics in medical research12. ethics in medical research
12. ethics in medical researchAshok Kulkarni
 
Decentralized Monitoring in Clinical Trials
Decentralized Monitoring in Clinical TrialsDecentralized Monitoring in Clinical Trials
Decentralized Monitoring in Clinical TrialsClinosolIndia
 
Data Integrity in Decentralized Clinical Trials (DCTs)
Data Integrity in Decentralized Clinical Trials (DCTs)Data Integrity in Decentralized Clinical Trials (DCTs)
Data Integrity in Decentralized Clinical Trials (DCTs)InsideScientific
 
Phases of clinical trial 11.9.14
Phases of clinical trial 11.9.14Phases of clinical trial 11.9.14
Phases of clinical trial 11.9.14DR ANUP PETARE
 
Intro to Research Methods: Research Strategy
Intro to Research Methods: Research StrategyIntro to Research Methods: Research Strategy
Intro to Research Methods: Research StrategySam Ladner
 
Risk Based Monitoring in Clinical Trials.
Risk Based Monitoring in Clinical Trials.Risk Based Monitoring in Clinical Trials.
Risk Based Monitoring in Clinical Trials.ClinosolIndia
 
Research proposal presentation
Research proposal presentationResearch proposal presentation
Research proposal presentationAhmed Raza
 
Case Report Form (CRF)
Case Report Form (CRF)Case Report Form (CRF)
Case Report Form (CRF)Neelam Shinde
 
sampling in research methodology. qualitative and quantitative approach
sampling in research methodology. qualitative and quantitative approach sampling in research methodology. qualitative and quantitative approach
sampling in research methodology. qualitative and quantitative approach Samantha Jayasundara
 
Biomarker base™ Introduction
Biomarker base™ IntroductionBiomarker base™ Introduction
Biomarker base™ Introductionbiomarkerbase
 
Site & investigator selection
Site & investigator selectionSite & investigator selection
Site & investigator selectionMukesh Jaiswal
 
Multicenter trial
Multicenter trialMulticenter trial
Multicenter trialswati2084
 
Decentralized Clinical Trials
Decentralized Clinical TrialsDecentralized Clinical Trials
Decentralized Clinical TrialsPCE121
 
Efficacy endpoints in Oncology
Efficacy endpoints in OncologyEfficacy endpoints in Oncology
Efficacy endpoints in OncologyAngelo Tinazzi
 

Was ist angesagt? (20)

12. ethics in medical research
12. ethics in medical research12. ethics in medical research
12. ethics in medical research
 
Decentralized Monitoring in Clinical Trials
Decentralized Monitoring in Clinical TrialsDecentralized Monitoring in Clinical Trials
Decentralized Monitoring in Clinical Trials
 
Icmr Code
Icmr CodeIcmr Code
Icmr Code
 
Data Integrity in Decentralized Clinical Trials (DCTs)
Data Integrity in Decentralized Clinical Trials (DCTs)Data Integrity in Decentralized Clinical Trials (DCTs)
Data Integrity in Decentralized Clinical Trials (DCTs)
 
Phases of clinical trial 11.9.14
Phases of clinical trial 11.9.14Phases of clinical trial 11.9.14
Phases of clinical trial 11.9.14
 
Intro to Research Methods: Research Strategy
Intro to Research Methods: Research StrategyIntro to Research Methods: Research Strategy
Intro to Research Methods: Research Strategy
 
Risk Based Monitoring in Clinical Trials.
Risk Based Monitoring in Clinical Trials.Risk Based Monitoring in Clinical Trials.
Risk Based Monitoring in Clinical Trials.
 
Metaanalysis copy
Metaanalysis    copyMetaanalysis    copy
Metaanalysis copy
 
Research proposal presentation
Research proposal presentationResearch proposal presentation
Research proposal presentation
 
Case Report Form (CRF)
Case Report Form (CRF)Case Report Form (CRF)
Case Report Form (CRF)
 
sampling in research methodology. qualitative and quantitative approach
sampling in research methodology. qualitative and quantitative approach sampling in research methodology. qualitative and quantitative approach
sampling in research methodology. qualitative and quantitative approach
 
Inferential statistics nominal data
Inferential statistics   nominal dataInferential statistics   nominal data
Inferential statistics nominal data
 
Grading Strength of Evidence
Grading Strength of EvidenceGrading Strength of Evidence
Grading Strength of Evidence
 
Biomarker base™ Introduction
Biomarker base™ IntroductionBiomarker base™ Introduction
Biomarker base™ Introduction
 
Site & investigator selection
Site & investigator selectionSite & investigator selection
Site & investigator selection
 
Multicenter trial
Multicenter trialMulticenter trial
Multicenter trial
 
Decentralized Clinical Trials
Decentralized Clinical TrialsDecentralized Clinical Trials
Decentralized Clinical Trials
 
Part 2 Cox Regression
Part 2 Cox RegressionPart 2 Cox Regression
Part 2 Cox Regression
 
ABC1 - R.E. Coleman - Bone metastases
ABC1 - R.E. Coleman - Bone metastases ABC1 - R.E. Coleman - Bone metastases
ABC1 - R.E. Coleman - Bone metastases
 
Efficacy endpoints in Oncology
Efficacy endpoints in OncologyEfficacy endpoints in Oncology
Efficacy endpoints in Oncology
 

Ähnlich wie 11 Respondent Driven Sampling

09 Respondent Driven Sampling and Network Sampling with Memory
09 Respondent Driven Sampling and Network Sampling with Memory09 Respondent Driven Sampling and Network Sampling with Memory
09 Respondent Driven Sampling and Network Sampling with Memorydnac
 
09 Respondent Driven Sampling and Network Sampling with Memory (2016)
09 Respondent Driven Sampling and Network Sampling with Memory (2016)09 Respondent Driven Sampling and Network Sampling with Memory (2016)
09 Respondent Driven Sampling and Network Sampling with Memory (2016)Duke Network Analysis Center
 
Meps secondary data analysis talk 20080806
Meps secondary data analysis talk 20080806Meps secondary data analysis talk 20080806
Meps secondary data analysis talk 20080806Marion Sills
 
Collaborating surveycenters
Collaborating surveycentersCollaborating surveycenters
Collaborating surveycentersErik Olsen
 
Day 1 - Quisumbing and Davis - Moving Beyond the Qual-Quant Divide
Day 1 - Quisumbing and Davis - Moving Beyond the Qual-Quant DivideDay 1 - Quisumbing and Davis - Moving Beyond the Qual-Quant Divide
Day 1 - Quisumbing and Davis - Moving Beyond the Qual-Quant DivideAg4HealthNutrition
 
Clinical Research Informatics Year-in-Review 2024
Clinical Research Informatics Year-in-Review 2024Clinical Research Informatics Year-in-Review 2024
Clinical Research Informatics Year-in-Review 2024Peter Embi
 
Embi cri yir-2017-final
Embi cri yir-2017-finalEmbi cri yir-2017-final
Embi cri yir-2017-finalPeter Embi
 
Measuring Progress: Indicators, Data Sources and Assessment | Laszlo Pinter, ...
Measuring Progress: Indicators, Data Sources and Assessment | Laszlo Pinter, ...Measuring Progress: Indicators, Data Sources and Assessment | Laszlo Pinter, ...
Measuring Progress: Indicators, Data Sources and Assessment | Laszlo Pinter, ...NAP Global Network
 
AAPOR - comparing found data from social media and made data from surveys
AAPOR - comparing found data from social media and made data from surveysAAPOR - comparing found data from social media and made data from surveys
AAPOR - comparing found data from social media and made data from surveysCliff Lampe
 
02 Network Data Collection
02 Network Data Collection02 Network Data Collection
02 Network Data Collectiondnac
 
Capturing Social and Clinical Knowledge for personalised care
Capturing Social and Clinical Knowledge for personalised careCapturing Social and Clinical Knowledge for personalised care
Capturing Social and Clinical Knowledge for personalised careVanessa Lopez
 
Measuring Progress: Indicators, data sources and assessment | Laszlo Pinter, ...
Measuring Progress: Indicators, data sources and assessment | Laszlo Pinter, ...Measuring Progress: Indicators, data sources and assessment | Laszlo Pinter, ...
Measuring Progress: Indicators, data sources and assessment | Laszlo Pinter, ...NAP Global Network
 
Recruiting Study Participants Online using Amazon's Mechanical Turk
Recruiting Study Participants Online using Amazon's Mechanical TurkRecruiting Study Participants Online using Amazon's Mechanical Turk
Recruiting Study Participants Online using Amazon's Mechanical TurkSC CTSI at USC and CHLA
 
April Heyward Research Methods Class Session - 7-29-2021
April Heyward Research Methods Class Session - 7-29-2021April Heyward Research Methods Class Session - 7-29-2021
April Heyward Research Methods Class Session - 7-29-2021April Heyward
 

Ähnlich wie 11 Respondent Driven Sampling (20)

03 RDS
03 RDS03 RDS
03 RDS
 
09 Respondent Driven Sampling and Network Sampling with Memory
09 Respondent Driven Sampling and Network Sampling with Memory09 Respondent Driven Sampling and Network Sampling with Memory
09 Respondent Driven Sampling and Network Sampling with Memory
 
09 Respondent Driven Sampling and Network Sampling with Memory (2016)
09 Respondent Driven Sampling and Network Sampling with Memory (2016)09 Respondent Driven Sampling and Network Sampling with Memory (2016)
09 Respondent Driven Sampling and Network Sampling with Memory (2016)
 
Meps secondary data analysis talk 20080806
Meps secondary data analysis talk 20080806Meps secondary data analysis talk 20080806
Meps secondary data analysis talk 20080806
 
Quality Assessment of Community Evidence (QACE) Tools (March 2020)
Quality Assessment of Community Evidence (QACE) Tools (March 2020)Quality Assessment of Community Evidence (QACE) Tools (March 2020)
Quality Assessment of Community Evidence (QACE) Tools (March 2020)
 
Collaborating surveycenters
Collaborating surveycentersCollaborating surveycenters
Collaborating surveycenters
 
Day 1 - Quisumbing and Davis - Moving Beyond the Qual-Quant Divide
Day 1 - Quisumbing and Davis - Moving Beyond the Qual-Quant DivideDay 1 - Quisumbing and Davis - Moving Beyond the Qual-Quant Divide
Day 1 - Quisumbing and Davis - Moving Beyond the Qual-Quant Divide
 
Clinical Research Informatics Year-in-Review 2024
Clinical Research Informatics Year-in-Review 2024Clinical Research Informatics Year-in-Review 2024
Clinical Research Informatics Year-in-Review 2024
 
Embi cri yir-2017-final
Embi cri yir-2017-finalEmbi cri yir-2017-final
Embi cri yir-2017-final
 
Towards Open Research: practices, experiences, barriers and opportunities
Towards Open Research: practices, experiences, barriers and opportunitiesTowards Open Research: practices, experiences, barriers and opportunities
Towards Open Research: practices, experiences, barriers and opportunities
 
Measuring Progress: Indicators, Data Sources and Assessment | Laszlo Pinter, ...
Measuring Progress: Indicators, Data Sources and Assessment | Laszlo Pinter, ...Measuring Progress: Indicators, Data Sources and Assessment | Laszlo Pinter, ...
Measuring Progress: Indicators, Data Sources and Assessment | Laszlo Pinter, ...
 
AAPOR - comparing found data from social media and made data from surveys
AAPOR - comparing found data from social media and made data from surveysAAPOR - comparing found data from social media and made data from surveys
AAPOR - comparing found data from social media and made data from surveys
 
02 Network Data Collection
02 Network Data Collection02 Network Data Collection
02 Network Data Collection
 
02 Network Data Collection (2016)
02 Network Data Collection (2016)02 Network Data Collection (2016)
02 Network Data Collection (2016)
 
Introductory sessions
Introductory sessionsIntroductory sessions
Introductory sessions
 
Capturing Social and Clinical Knowledge for personalised care
Capturing Social and Clinical Knowledge for personalised careCapturing Social and Clinical Knowledge for personalised care
Capturing Social and Clinical Knowledge for personalised care
 
Measuring Progress: Indicators, data sources and assessment | Laszlo Pinter, ...
Measuring Progress: Indicators, data sources and assessment | Laszlo Pinter, ...Measuring Progress: Indicators, data sources and assessment | Laszlo Pinter, ...
Measuring Progress: Indicators, data sources and assessment | Laszlo Pinter, ...
 
Recruiting Study Participants Online using Amazon's Mechanical Turk
Recruiting Study Participants Online using Amazon's Mechanical TurkRecruiting Study Participants Online using Amazon's Mechanical Turk
Recruiting Study Participants Online using Amazon's Mechanical Turk
 
Zhuoran zhang
Zhuoran zhangZhuoran zhang
Zhuoran zhang
 
April Heyward Research Methods Class Session - 7-29-2021
April Heyward Research Methods Class Session - 7-29-2021April Heyward Research Methods Class Session - 7-29-2021
April Heyward Research Methods Class Session - 7-29-2021
 

Mehr von Duke Network Analysis Center

01 Add Health Network Data Challenges: IRB and Security Issues
01 Add Health Network Data Challenges: IRB and Security Issues01 Add Health Network Data Challenges: IRB and Security Issues
01 Add Health Network Data Challenges: IRB and Security IssuesDuke Network Analysis Center
 
00 Social Networks of Youth and Young People Who Misuse Prescription Opiods a...
00 Social Networks of Youth and Young People Who Misuse Prescription Opiods a...00 Social Networks of Youth and Young People Who Misuse Prescription Opiods a...
00 Social Networks of Youth and Young People Who Misuse Prescription Opiods a...Duke Network Analysis Center
 
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)
22 An Introduction to Stochastic Actor-Oriented Models (SAOM or Siena)Duke Network Analysis Center
 
02 Introduction to Social Networks and Health: Key Concepts and Overview
02 Introduction to Social Networks and Health: Key Concepts and Overview02 Introduction to Social Networks and Health: Key Concepts and Overview
02 Introduction to Social Networks and Health: Key Concepts and OverviewDuke Network Analysis Center
 
00 Differentiating Between Network Structure and Network Function
00 Differentiating Between Network Structure and Network Function00 Differentiating Between Network Structure and Network Function
00 Differentiating Between Network Structure and Network FunctionDuke Network Analysis Center
 
00 Arrest Networks and the Spread of Violent Victimization
00 Arrest Networks and the Spread of Violent Victimization00 Arrest Networks and the Spread of Violent Victimization
00 Arrest Networks and the Spread of Violent VictimizationDuke Network Analysis Center
 
00 Networks of People Who Use Opiods Nonmedically: Reports from Rural Souther...
00 Networks of People Who Use Opiods Nonmedically: Reports from Rural Souther...00 Networks of People Who Use Opiods Nonmedically: Reports from Rural Souther...
00 Networks of People Who Use Opiods Nonmedically: Reports from Rural Souther...Duke Network Analysis Center
 
00 Automatic Mental Health Classification in Online Settings and Language Emb...
00 Automatic Mental Health Classification in Online Settings and Language Emb...00 Automatic Mental Health Classification in Online Settings and Language Emb...
00 Automatic Mental Health Classification in Online Settings and Language Emb...Duke Network Analysis Center
 

11 Respondent Driven Sampling

  • 1. Respondent-Driven Sampling: An Overview Ashton M. Verdery Duke Network Analysis Center May, 2019
  • 2. Outline • Intuition about network sampling • Leveraging social networks for sampling – Why? – How? • What is RDS? – Hidden populations – RDS origins and concepts – RDS applications – Pitfalls and promises of RDS – New directions
  • 3-10. Samples from social networks (figure sequence repeated across eight slides: example network with nodes labeled 1-11)
  • 11. Why do it? • Future of social science research – New populations of interest are hard to survey • e.g., undocumented migrants, people who use drugs – New theories & tools require new types of data • e.g., social network analysis – Existential threat of declining survey participation • i.e., all groups are becoming hidden populations
  • 13. Hidden populations • Collecting data from hidden populations is difficult because of the absence of a sampling frame – Stigma – Non-response – Lack of trust – Rarity (figure: household-based sampling in Lilongwe, Malawi; Escamilla et al. 2014)
  • 14. How to sample hidden populations? • Traditional approaches – Convenience samples – Clinical samples – Location samples • Problems – Are we learning about people other than those sampled? • Limited ability to infer representation • Poor coverage for sampling frame • Often time-intensive, costly, very small samples
  • 15. Respondent-Driven Sampling (RDS) • A sociological method with wide applications – Heckathorn 1997 • Most popular solution to problems of hidden populations in recent decades (as of May 2019) – 544+ studies – 1.2k+ papers, 24k+ cites – H-index of 59 – Over $213 mill. from NIH • Compare to “ego centric” – 254 studies funded – $59 million since 1990
  • 16. RDS applications • Hidden populations of many stripes – Men who have sex with men – People who inject drugs – Commercial sex workers – High risk heterosexuals – Other drug users (opioids, methamphetamines) – Domestic violence victims – Victims of sexual violence (child prostitution, sex trafficking, war-time rape) – Jazz musicians – Vegetarians and vegans in Argentina – Wheelchair users – Non-institutionalized older adults (85+) • Most common questions – Can we sample this population? – What are the characteristics of this population? – What is the size of this population? 16
  • 17. Top 10 fields (figure; source: Web of Science, May 2019)
  • 18. RDS overview Two parts 1) Chain referral / peer recruitment – “Seed” participants receive 2 coupons • Recruit 2 new participants each • Dual incentives for participation & recruitment • Each new respondent given 2 coupons to recruit others • Process continues until desired sample size is obtained • (No one participates more than once) • *Researchers lack control of sampling process 2) Post-recruitment weighting of cases – Correct for theoretical sampling process – Make inferences about population & quantify uncertainty
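The chain-referral step on the slide above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular study: the function name `simulate_rds`, the adjacency-dictionary input, and the uniform choice among eligible peers (the "random recruitment" assumption) are all assumptions made for the example.

```python
import random
from collections import deque


def simulate_rds(adjacency, seeds, coupons=2, target_n=50, rng=None):
    """Simulate chain-referral recruitment without replacement.

    adjacency: dict mapping each node to a list of its neighbours
               (hypothetical undirected network).
    Returns a list of (recruit, recruiter, wave) tuples; seeds have
    recruiter None and wave 0.
    """
    rng = rng or random.Random()
    sampled = set(seeds)
    records = [(s, None, 0) for s in seeds]
    queue = deque((s, 0) for s in seeds)
    while queue and len(records) < target_n:
        recruiter, wave = queue.popleft()
        # Eligible peers: neighbours who have not yet participated
        # (no one participates more than once).
        eligible = [v for v in adjacency[recruiter] if v not in sampled]
        # Assumed random recruitment: coupons go to eligible peers
        # chosen uniformly at random.
        for recruit in rng.sample(eligible, min(coupons, len(eligible))):
            if len(records) >= target_n:
                break
            sampled.add(recruit)
            records.append((recruit, recruiter, wave + 1))
            queue.append((recruit, wave + 1))
    return records
```

For instance, on a hypothetical 20-node ring, `simulate_rds({i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}, seeds=[0], target_n=10)` returns ten recruitment records that start from the seed and spread outward wave by wave. Real recruitment is rarely uniform, which is why the weighting step (part 2 on the slide) and the diagnostics discussed later matter.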
  • 19. Seeds & coupons (Wirtz et al. 2017) • Seeds – 7-10 population members – Convenience selection • Willing to participate • Large personal networks • Diverse on relevant attributes • Coupons – Give 2 to 3 per respondent • Non-seeds can only participate with a coupon – Uniquely coded for tracking • Codes given out & redeemed – Non-physical coupons • Possible, but challenging
  • 20. Coupons Ashton M. Verdery 20 Contact number Consent and study description (on back) Valid dates Interview site location Tracking codes
  • 21. Example (Fisher and Merli, Net. Sci. 2014)
  • 22. Example (Verdery, et al. Soc. Meth. 2017) Sampled = black
  • 23. Core resources • Useful website from Handcock, Gile, & collaborators – http://hpmrg.org • Manuals for RDS survey design – Johnston tutorial, with questionnaires, consent forms, etc. • http://applications.emro.who.int/dsaf/EMRPUB_2013_EN_1539.pdf – CDC, UNAIDS, & others also have useful manuals • https://www.cdc.gov/hiv/pdf/statistics/systems/nhbs/nhbs-idu3_nhbs-het3-protocol.pdf • https://globalhealthsciences.ucsf.edu/sites/globalhealthsciences.ucsf.edu/files/ibbs-rds-protocol.pdf • Software for RDS analysis – Standalone software for RDS coupon management & analysis • http://www.respondentdrivensampling.org/main.htm – R package “RDS” for analysis & diagnostics • https://cran.r-project.org/web/packages/RDS/index.html – Stata packages for analysis • http://www.stata-journal.com/article.html?article=st0247 • I have unreleased Stata packages for many RDS estimators and RDS multivariate regression • Diagnostics for RDS preplanning and post-survey analysis – http://www.princeton.edu/~mjs3/gile_diagnostics_2014.pdf
  • 24. Key concepts & assumptions • Baseline assumptions – Population members are linked in a social network & will refer other members into the study • Key concepts – Primary & secondary interviews – Respondent degree – Random recruitment – Bottlenecks – Bias, sampling variance, & RMSE • Different estimators make different assumptions about recruitment process and underlying network • Network structure assumptions (see Gile 2011:144) – There is a social network – Population size large (N>>n) – Homophily weak – Community structure weak – Connected graph w/1 component (giant component) – All ties reciprocated (undirected) – Known population size N • Sampling assumptions – Sampling with replacement – Single, non-branching chain (1 seed; 1 coupon) – Sufficiently many sample waves – Initial sample of seeds unbiased – Degree accurately measured – Conditionally random referrals (random
  • 25. Primary & secondary interviews Ashton M. Verdery 25
  • 26. Respondent degree • Degree – Popularity – How many incoming ties • network assumed undirected • Typical solicitation – “how many people do you know (you know their name and they know yours) who have exchanged sex for money in the past six months?” – Often, successive restrictions • Last 30 days, live in area, etc. • Key element of most mean estimators: $w_i = d_i^{-1} / \sum_{j} d_j^{-1}$ (Merli, et al. Soc. Sci. Med.)
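To make the inverse-degree weight on this slide concrete, here is a small sketch of a weighted mean in which each respondent's weight is proportional to one over their reported degree. The function names and the toy numbers are hypothetical; this illustrates the general idea behind degree-weighted estimators such as RDS2-VH rather than reproducing any package's implementation.

```python
def inverse_degree_weights(degrees):
    """Return w_i = (1/d_i) / sum_j (1/d_j) for reported degrees d_i > 0."""
    if any(d <= 0 for d in degrees):
        raise ValueError("reported degrees must be positive")
    inverse = [1.0 / d for d in degrees]
    total = sum(inverse)
    return [x / total for x in inverse]


def degree_weighted_mean(values, degrees):
    """Weighted mean of `values` using inverse-degree weights."""
    weights = inverse_degree_weights(degrees)
    return sum(w * y for w, y in zip(weights, values))


# Hypothetical data: 1 = has the trait, 0 = does not.
values = [1, 0, 1, 0]
degrees = [10, 2, 5, 1]
estimate = degree_weighted_mean(values, degrees)
```

In this toy sample the unweighted proportion is 0.5, but the two respondents with the trait report large networks, so they are down-weighted and the estimate falls to roughly 0.17. The intuition is that well-connected people are more likely to be reached by referral chains, so they should count for less.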
  • 27. Assumption: “random recruitment” (figure: nodes A, B, C, D; recruitment probabilities 3/9, 3/9, 3/9)
  • 28. In practice: “preferential recruitment” (figure: nodes A, B, C, D; recruitment probabilities 4/9, 4/9, 1/9)
  • 29. Reasons for preferential recruitment • NOT A REASON – Has more connections to similar people • In principle, the weighting approaches should deal with this • Reasons (not exhaustive) – Better relationships with similar people – Wants to help friend who needs money – Wants friend to get HIV test – Only friends who do riskier things want to get tested – Unemployed friends more likely to be encountered – Etc. Ashton M. Verdery 29
  • 30. “Bottlenecks” • Few ties between clusters – Assumed to matter substantially – Somewhat overstated • General advice: – Split sample – Tough to achieve a priori • With n=500, RDS on this network exhibits 150X the sampling variance of SRS, and the estimated sampling variance bears no relation to this; we see this in network after network (Mouw & Verdery Soc. Meth. 2012; Salganik & Goel Stat. Med. 2009)
  • 31. Key concepts • Bias – “Accuracy” – How far from the population parameter is the average sample? • Sampling variance – “Precision” – How variable are the results, sample to sample, on average? – Often expressed as Design Effects • Ratio of RDS to SRS sampling variance • Interpretable as sample size multiplier • Root Mean Square Error (RMSE) – Balancing accuracy and precision • There are many other error metrics Verdery, Merli, et al. Epid. 2015. Just right?
  • 32. Contrast with SRS Network: Project 90 (N=4413) Variable: Percent White RDS – Unbiased, 10 seeds, 3 coupons – Without replacement – n=150 SRS – Without replacement – n=150 (figure: Project 90 network, red nodes = non-white; Verdery et al. 2017)
  • 33. Contrast with SRS (figure: histograms of estimates across samples; panels: RDS Sample, SRS Sample; x-axis: Estimated percentage; y-axis: Number of samples)
  • 34. Contrast with SRS Ashton M. Verdery 34 Backup: https://youtu.be/BZL3XBeG7W8
  • 35. Contrast with SRS Ashton M. Verdery 35
  • 36. Contrast with SRS (n=400) (figure: histogram of estimates; x-axis: Estimate; y-axis: Frequency; series: RDS VH weights, SRS w/o replacement)
  • 37. Bias, sampling variance, & uncertainty • Early RDS work focused on bias, but sampling variance is also critical • A related concern: – Quantifying uncertainty – After data collection, can you say: • How biased your sample is? • How results would vary sample to sample? – Key feature of inferential statistics • E.g., if sampling conformed to assumptions, we can provide a confidence interval for an estimate and be reasonably sure the confidence interval is accurate • Is this true in RDS?
  • 38. Quantifying uncertainty • Traditional estimators of RDS sampling variance are bad • Example – Sampling variance (SV) • RDS mean estimators have high SV – Estimated sampling variance • RDS SV estimators have high bias (Verdery et al., PLoS ONE 2015)
  • 39. Recent progress on estimating RDS sampling variance (Baraff, et al., PNAS 2016)
  • 40. Estimators • Of the population mean – At least 11 in current use • Table on right • McCreesh et al. 2013 • Crawford 2016 • Gile & Handcock 2015 • Berchenko 2017 • Of the sampling variance – 5 primary methods in use • Bootstrap (Salganik 2006) • Analytical (Volz & Heckathorn 2008) • Successive Sampling (Gile 2011) • Model assisted (Gile & Handcock 2015) • Tree Bootstrap (Baraff et al. 2016) eTable 1. The seven respondent-driven sampling estimators evaluated in this paper. Estimator Source 1. Naïve None 2. RDS1-SH Salganik MJ, Heckathorn DD. Sampling and Estimation in Hidden Populations Using Respondent-Driven Sampling. Sociol Methodol. 2004;34(1):193–240. doi:10.1111/j.0081-1750.2004.00152.x. 3. RDS1-DS Heckathorn DD. Respondent-Driven Sampling II: Deriving Valid Population Estimates from Chain-Referral Samples of Hidden Populations. Soc Probl. 2002;49(1):11-34. doi:10.1525/sp.2002.49.1.11. 4. RDS1-DG Heckathorn DD. Extensions of Respondent-Driven Sampling: Analyzing Continuous Variables and Controlling for Differential Recruitment. Sociol Methodol. 2007;37(1):151–207. doi:10.1111/j.1467-9531.2007.00188.x. 5. RDS1-LEN Lu X. Linked Ego Networks: Improving estimate reliability and validity with respondent-driven sampling. Soc Netw. 2013;35(4):669-685. doi:10.1016/j.socnet.2013.10.001. 6. RDS2-VH Volz E, Heckathorn DD. Probability based estimation theory for respondent driven sampling. J Off Stat. 2008;24(1):79. 7. RDS2-SS Gile KJ. Improved Inference for Respondent-Driven Sampling Data With Application to HIV Prevalence Estimation. J Am Stat Assoc. 2011;106(493):135-146. doi:10.1198/jasa.2011.ap09475. 40 Verdery, et al., Epid. 2015
  • 41. General comments on estimators • For the population mean – “linked ego networks” is best • Requires respondents know peer attributes reasonably well • Can’t calculate for many variables of interest – Naïve estimator often works – Most common • Volz-Heckathorn • Successive Sampling – (In general, SS is better) • For the sampling variance – Only the tree bootstrap method seems to have anything resembling reasonable properties 41 Verdery, et al., Epid. 2015
  • 42. Diagnostics • Embed questions in the survey to allow you to estimate whether assumptions were met – E.g., ask why people recruited those they did, how many people they tried to recruit who had already participated, etc. • Assess potential bottlenecks and seed bias with convergence plots Ashton M. Verdery 42 Johnston, et al., Epid. 2015
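One simple way to produce the data behind a convergence plot is to recompute the estimate after each successive respondent, in enrollment order, and look at whether the running estimate settles down. The sketch below assumes an inverse-degree-weighted estimate and uses a hypothetical function name, `convergence_path`; published diagnostics may define the plot differently.

```python
def convergence_path(values, degrees):
    """Running inverse-degree-weighted estimate after each respondent.

    `values` and `degrees` are assumed to be ordered by enrollment time,
    with every degree strictly positive.
    """
    path = []
    numerator = 0.0
    denominator = 0.0
    for y, d in zip(values, degrees):
        numerator += y / d
        denominator += 1.0 / d
        path.append(numerator / denominator)
    return path
```

Plotting the returned list against respondent number gives the convergence curve. A curve that is still drifting at the end of data collection, or that differs sharply when computed separately by seed, suggests lingering seed bias or a bottleneck.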
  • 43. A few notes on web-based RDS • Developing area with challenges but lots of potential • Recommendations – Differences from traditional • Be prepared to expand to 30-60 seeds; 20+ waves – Verification • Respondent Uniqueness – IP address verification; web-cam interview? • Respondent is in target population – In geographic area of interest? Fits other criteria? • Coupon management – Careful with secondary incentives – Remember limitations • Internet access, etc. 43
  • 44. If problems… • Expand recruitment – Expand number of seeds – Expand allowable recruits – Raise incentives – Reduce burdens • Greater emphasis on anonymity • Shorten survey • Drop secondary interview • If all else fails… – Convenience sample – Lean on other features • It won’t always look like it does on paper Ashton M. Verdery 44
  • 45. My recommendations • 1) Embed additional data collection in RDS – Qualitative interviews – Ego network rosters – Minimally identifiable information about alters • 2) Examine more than just prevalence – Population size – Network structure – Multivariate relationships Ashton M. Verdery 45
  • 46. Promises & pitfalls Weighting/estimation can yield asymptotically unbiased estimates of population mean – Unrealistic, hard to verify assumptions required Design effects remain high – Orders of magnitude larger N needed But… – New data on understudied populations – Effective, fast method (50 cases/week) – Possible to learn a lot about networks (underutilized) Ashton M. Verdery 46
  • 47. Thank you! Portions of this work were supported by a grant from the National Institutes of Health (1 R03 SH000056-01; Verdery PI): “Multivariate Regression with Respondent- Driven Sampling Data.” I also appreciate assistance from the Justice Center for Research, the Institute for CyberScience, the Social Science Research Institute, the College of the Liberal Arts, and the Population Research Institute at Penn State University, the last of which is supported by an infrastructure grant from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (P2CHD041025 & R24 HD041025). Other portions of this work benefitted from support from the Duke Network Analysis Center, the Duke Population Research Institute, and the Carolina Population Center. Ashton M. Verdery: amv5430@psu.edu I thank many coauthors: M. Giovanna Merli, James Moody, Ted Mouw, Peter J. Mucha, Jacob C. Fisher, Shawn Bauldry, Nalyn Siripong, Jeff Smith, Kahina Abdessalem, Sergio Chavez, Heather Edelblute, Jing Li, Jose Luis Molina, Miranda Lubbers, Sara Francisco, Claire Kelling, Anne DeLessio-Parson, & David Hunter.
  • 48. alternate link-tracing designs • Network Sampling with Memory – collect network data from respondents – minimally identifying information to link nominated but not sampled individuals – “search” algorithm to explore the network more efficiently based on currently uncovered data – recovers sampling frame – “list” algorithm samples frame as if at random 48 Mouw & Verdery. 2012. Sociological Methodology.
  • 49. network sampling with memory • Two sampling modes: – Search • Push sample to explore network by seeking bridge ties – List • Keep a list L of unique members, both nominated & sampled • Sample with replacement from L • “Even” sampling of nodes ensured by probabilistic selection • When whole network nominated, converges to SRS • Simulated sampling showed hybrid (S -> L) best Ashton M. Verdery 49
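The "List" mode described on this slide can be illustrated with a deliberately simplified sketch. It is not Mouw & Verdery's algorithm: the uniform draw from the list, the function name `list_mode_sample`, and the adjacency-dictionary input are assumptions made for the example, and the "Search" mode is omitted entirely.

```python
import random


def list_mode_sample(adjacency, seed, n_draws, rng=None):
    """Simplified list-mode link tracing (illustrative only).

    Keeps a list of every unique member revealed so far, whether sampled
    or merely nominated, and draws the next respondent from that list
    uniformly at random, with replacement.
    """
    rng = rng or random.Random()
    known = [seed]        # the list L of unique members
    known_set = {seed}
    interviewed = set()
    draws = []
    current = seed
    for _ in range(n_draws):
        draws.append(current)
        if current not in interviewed:
            interviewed.add(current)
            # An interview reveals the respondent's network alters.
            for alter in adjacency[current]:
                if alter not in known_set:
                    known_set.add(alter)
                    known.append(alter)
        current = rng.choice(known)
    return draws, known
```

As the list grows toward the full population, each draw approaches a uniform draw from the whole network, which is the intuition behind the slide's statement that the design converges to SRS once the whole network has been nominated.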
  • 51. test network • Add Health high school – 1,281 students – 67.3% white – 10,414 edges in data – 587 cross-race ties (w->nw) – 8% of whites’ friends non-white • Conclusions: – Homophily in the data – But no “choke points” – Lots of cross-group ties • Method – Test simulated FNSM • 500 samples, 500 cases each – RDS, NSM, FNSM • Calculate CIs and DEs
  • 52. results in test network Ashton M. Verdery 52
  • 54. Key concepts • Where $a$ is number of samples, $c_i$ is the estimated statistic from sample $i$, and $C$ is the population parameter: – Bias • $bias = a^{-1} \sum_{i=1}^{a} (c_i - C)$ • “Accuracy” – Sampling variance • $SV = a^{-1} \sum_{i=1}^{a} \left( c_i - a^{-1} \sum_{j=1}^{a} c_j \right)^2$ • “Precision” – Root Mean Square Error (RMSE) • $RMSE = \sqrt{bias^2 + SV}$ • Balancing accuracy and precision – There are many other error metrics; I like this – Design effects • $DE = SV_{RDS} / SV_{SRS}$ • Precision ratio compared to simple random samples • Sample size ratios for equivalent efficiency Verdery, Merli, et al. Epid. 2015. Just right?
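The four quantities on this slide translate directly into code. The sketch below follows the slide's definitions (dividing by the number of samples $a$, not $a-1$); the function names and the toy estimates are hypothetical and only illustrate the arithmetic.

```python
import math


def bias(estimates, truth):
    """Average signed distance of the estimates from the population value."""
    return sum(c - truth for c in estimates) / len(estimates)


def sampling_variance(estimates):
    """Average squared deviation of the estimates from their own mean."""
    mean = sum(estimates) / len(estimates)
    return sum((c - mean) ** 2 for c in estimates) / len(estimates)


def rmse(estimates, truth):
    """Root mean square error: sqrt(bias^2 + SV)."""
    return math.sqrt(bias(estimates, truth) ** 2 + sampling_variance(estimates))


def design_effect(rds_estimates, srs_estimates):
    """Ratio of RDS to SRS sampling variance."""
    return sampling_variance(rds_estimates) / sampling_variance(srs_estimates)


# Hypothetical estimates from repeated simulated samples (true value 0.5).
rds_estimates = [0.2, 0.8]
srs_estimates = [0.4, 0.6]
de = design_effect(rds_estimates, srs_estimates)
```

Here both sets of estimates are unbiased on average, but the RDS estimates are three times as spread out, so the design effect is about 9. Read as a sample size multiplier, that means an RDS sample would need to be roughly nine times as large as a simple random sample to reach the same precision.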

Editor's notes

  1. See Bengston et al. 2012 for a successful case