Using STATA in Survey Data Analysis
NIVEEN EL ZAYAT
ASSISTANT LECTURER
DEPARTMENT OF STATISTICS
FACULTY OF ECONOMICS AND POLITICAL SCIENCE
Outline
• Aspects of survey data
• Survey designs
• Setting survey design in STATA: svyset
• Describing survey design: svydes
• Estimation commands using the svy: prefix (in STATA do-file)
• Estimation for subpopulations (in STATA do-file)
• Extracting survey-weighted tables and graphs (in STATA do-file)
Aspects of survey data
• What is a survey?
  • A research technique for gathering information about a well-defined population by selecting a sample and questioning it systematically; the results are then analyzed and generalized to the population.
  • Surveys provide decision-makers with information for planning, monitoring, and evaluation.
• Survey data are collected through either “complete enumeration” or “sampling”.
  • Complete enumeration means collecting data from every unit in the population.
  • Sampling means collecting data from a subset of the population that is designed to be representative of it and is used to draw conclusions about it.
• Censuses are examples of complete enumeration, while the Demographic and Health Survey (DHS) and the Labor Force Sample Survey are examples of sample surveys.
• Populations in research
  • Target population:
    1. The ENTIRE group of individuals that researchers are interested in.
    2. Also known as the theoretical population.
  • Accessible population:
    1. The part of the population whose individuals researchers can actually reach.
    2. A subset of the target population, also known as the study population.
    3. Researchers draw their samples from it, so it is defined by the sampling frame.
• Why sample survey data?
  • Assessing the entire population may be impossible:
    • Impractical for infinite populations and for populations with destructive observations.
    • Data lose their value if they are not collected within a time limit.
    • Expensive: a sample survey costs less than complete enumeration.
    • Limited resources (money and staff) and an extremely large workload.
    • Many more errors to control and monitor.
    • When observation is destructive, complete enumeration would destroy every unit.
    • Lists (frames) are rarely up to date.
  • For these reasons, sampling is usually the preferred option.
• Probability (random) samples
  • Every unit of the population is drawn, at every stage, with a known probability, so inference about population parameters can be based on “probability theory”.
  • There are two sampling procedures:
    1. Sampling with replacement (draws are independent, as if the population were infinite)
    2. Sampling without replacement (from a finite population, where the finite population correction applies)
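The two procedures can be tried out in Stata. The sketch below is only an illustration: it treats the auto.dta example dataset shipped with Stata as if it were a sampling frame, and the sample size of 20, the seed, and the variable name pw are arbitrary choices, not part of the original slides.

```stata
* Illustration only: auto.dta (shipped with Stata) stands in for a sampling frame
sysuse auto, clear
set seed 12345
local N = _N                       // frame size

* Sampling without replacement: keep 20 randomly chosen units
preserve
sample 20, count
generate double pw = `N' / _N      // inverse of the selection probability 20/N
summarize pw
restore

* Sampling with replacement: 20 draws, so a unit can appear more than once
preserve
bsample 20
duplicates report make
restore
```

In the first block every retained unit has the same known selection probability, so the weight is constant; in the second, duplicates report shows whether any car was drawn more than once.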
Survey Designs
► Simple survey design: SRS
• Data come from a single-round survey and are analyzed with limited reference to other information (aka “flat” or “rectangular” data).
• Every unit in the population has an equal chance of being part of the sample.
• Data collection is simple: it only needs a sampling frame for the study population.
• Often leads to inaccurate point estimates and/or inaccurate standard errors if the population is not homogeneous.
• Most standard statistical methods assume simple random sampling.
[Diagram: a simple survey design uses a simple random sample (SRS); a complex or multi-stage design uses a stratified sample, a cluster sample, or a stratified-cluster sample.]
► Complex (multi-stage / hierarchical) designs
• Drawing an SRS is sometimes impossible (e.g., there is no definitive frame listing the population elements).
• Different elements may have different probabilities of selection into the sample because of the nature of the population.
• Used with hierarchical data (e.g., household (HH) surveys).
• Many statistical procedures assume i.i.d. observations; moreover, many statistical packages treat data as an SRS.
• In most surveys elements are not sampled independently, since one may need to select groups of individuals.
► Stratification
• The entire study population is divided into well-defined groups called “strata” based on a relevant characteristic, often geographic (region) or demographic (gender, level of education, or socioeconomic status (SES)).
• Sampling units are randomly sampled within each stratum, independently across strata, possibly with different probabilities of selection.
• It usually results in smaller variances and standard errors than SRS.
• An example is stratifying the population by locality (urban/rural).
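A stratified draw can be sketched in Stata as follows. This is an illustration under stated assumptions, not the author's do-file: auto.dta plays the role of the population, foreign plays the role of the stratum variable, and the names Nh, nh, and wt and the per-stratum sample size of 10 are hypothetical.

```stata
* Illustration only: auto.dta stands in for a population, foreign for the stratum
sysuse auto, clear
set seed 2024
bysort foreign: generate Nh = _N        // stratum population size N_h
sample 10, count by(foreign)            // 10 units drawn independently per stratum
bysort foreign: generate nh = _N        // stratum sample size n_h
generate double wt = Nh / nh            // sampling weight N_h / n_h

* Declare the stratified design and estimate a weighted mean
svyset [pw = wt], strata(foreign) fpc(Nh)
svy: mean price
```

Because the two strata have different sizes but the same sample size, the selection probabilities differ across strata and the weights differ accordingly.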
► Clustering
• Sets of individuals (regions, districts, city blocks, or households) are sampled as a group (cluster), then population elements are drawn from the selected clusters.
• Further subsamples may be drawn within clusters (often called a multi-stage design).
• The highest level of cluster is referred to as the Primary Sampling Unit (PSU).
• The next lower level of cluster is referred to as the Secondary Sampling Unit (SSU).
• Example: geographical regions, such as local government areas, are selected in the first stage; schools are selected in the second stage; and in the third stage the units of analysis, perhaps teachers or students, are sampled. Regions are the PSUs in this example.
• Different sampling techniques may be applied at different stages, which increases sample-to-sample variability and leads to higher variances and standard errors.
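A two-stage cluster sample can be mimicked with simulated data. Everything in this sketch is hypothetical (50 clusters of 20 elements, 10 clusters and 5 elements per cluster selected, the outcome y, and all variable names); it only shows how the stages translate into weights and a svyset declaration.

```stata
* Illustration only: simulated population of 50 clusters with 20 elements each
clear
set seed 42
set obs 50
generate cluster = _n
generate double u = runiform()          // one random number per cluster
expand 20                               // 20 elements per cluster
generate y = rnormal()                  // hypothetical outcome

* Stage 1: keep the 10 clusters with the smallest random numbers (SRS of PSUs)
egen urank = group(u)
keep if urank <= 10

* Stage 2: draw 5 elements within each selected cluster
sample 5, count by(cluster)

generate double wt = (50/10) * (20/5)   // product of inverse sampling fractions
generate fpc1 = 50                      // clusters in the population
generate fpc2 = 20                      // elements per cluster
svyset cluster [pw = wt], fpc(fpc1) || _n, fpc(fpc2)
svy: mean y
```

Here the clusters are the PSUs and the individual observations (_n) are the second-stage units.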
► Exercise: name each of the following designs. [Figures not reproduced.]
► Features of survey designs
• Probability (sampling) weights
  • The most common weight is the sampling weight (aka probability weight), which is used to weight the sample back to the population from which it was drawn.
  • By definition, this weight is the inverse of the sampling fraction n/N, that is, N/n.
  • In a two-stage design with sampling fractions f1 and f2, the probability weight is calculated as 1/(f1 × f2): the inverse of the sampling fraction for the first stage multiplied by the inverse of the sampling fraction for the second stage.
  • In actual survey data sets, the "final weight" usually starts with the inverse of the sampling fraction; several adjustments may then be applied to account for problems such as unit non-response, errors in the sampling frame (aka non-coverage), or post-stratification.
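As a worked illustration with made-up numbers (not taken from any survey in these slides), suppose 10 of 50 clusters are selected in the first stage and 5 of 20 households in each selected cluster in the second stage:

```latex
f_1 = \frac{10}{50} = 0.2, \qquad
f_2 = \frac{5}{20} = 0.25, \qquad
w = \frac{1}{f_1} \times \frac{1}{f_2} = 5 \times 4 = 20
```

Each sampled household then represents 20 households in the population, before any non-response or post-stratification adjustment.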
• Finite population correction (FPC)
  • It is an adjustment applied to the variance when sampling without replacement from a finite population. On the standard-error scale, the FPC for a sample of size n from a population of size N is calculated as:
    $\mathrm{FPC} = [(N-n)/(N-1)]^{1/2}$
  • The FPC matters when the sampling fraction (n/N) is large. When n is small relative to the population size N, the FPC is close to 1, has little impact, and can safely be ignored.
  • For a multi-stage survey design, one may apply an FPC at one or more stages.
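Two hypothetical cases illustrate how the sampling fraction drives the correction. With the same sample size n = 100, a population of N = 1,000 gives a noticeable reduction in the standard error, while N = 100,000 gives almost none:

```latex
N = 1{,}000,\ n = 100: \quad
\mathrm{FPC} = \left[\frac{1000 - 100}{1000 - 1}\right]^{1/2}
             = \left[\frac{900}{999}\right]^{1/2} \approx 0.949

N = 100{,}000,\ n = 100: \quad
\mathrm{FPC} = \left[\frac{100000 - 100}{100000 - 1}\right]^{1/2}
             = \left[\frac{99900}{99999}\right]^{1/2} \approx 0.9995
```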
• Design effect (DEFF)
  • Standard errors under different sample designs are compared using design effect statistics. For complex samples, this is typically done by comparison with a hypothetical simple random sample (SRS) of the same size.
  • It is computed as the ratio of the variance of an estimate $\hat\theta$ under the complex design to the variance of the same estimate under an SRS of the same size: $\mathrm{DEFF} = \mathrm{Var}(\hat\theta_{\mathrm{Design}})/\mathrm{Var}(\hat\theta_{\mathrm{SRS}})$.
• Design factor (DEFT)
  • The square root of the design effect, $\mathrm{DEFT} = \mathrm{DEFF}^{1/2}$, which puts things back on the scale of standard errors.
  • DEFT = 1: the sample design has no effect on standard errors.
  • DEFT > 1: the sample design inflates standard errors.
  • DEFT < 1: the sample design reduces standard errors.
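In Stata, DEFF and DEFT can be requested after a svy estimation command. The sketch below assumes internet access for webuse and uses the nhanes2 example dataset from Stata's documentation rather than the data discussed in these slides; bpsystol is simply one of its variables.

```stata
* Illustration only: nhanes2 is a Stata example dataset that is already svyset
webuse nhanes2, clear
svy: mean bpsystol           // design-based mean of systolic blood pressure
estat effects, deff deft     // DEFF and DEFT for the estimate above
```

A DEFT above 1 in the output would indicate that the clustering and weighting inflate the standard error relative to an SRS of the same size.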
Setting survey design in STATA: svyset
• STATA syntax
The command svyset (declare data as survey data) identifies the sample design features of your data to STATA. It can describe a wide range of complex sampling designs.
Single-stage design:
svyset [psu_var] [pweight = weight_var] [, strata(varname) fpc(varname) options]
Multiple-stage design:
svyset psu_var [pweight = weight_var] [, design_options] [|| ssu_var, design_options] ... [options]
Once declared, the survey design settings remain in effect until they are cleared, changed, or a new dataset is loaded into memory; if the data are saved, the settings are saved as part of the dataset.
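A minimal sketch of the declare, inspect, and clear cycle, assuming internet access for webuse and using the nhanes2 example dataset from Stata's documentation (its variables psu, strata, and finalwgt are not part of the slides' data):

```stata
* Illustration only: nhanes2 example data, not the survey files used in these slides
webuse nhanes2, clear
svyset, clear                                      // remove the stored settings
svyset                                             // reports that nothing is set
svyset psu [pweight = finalwgt], strata(strata)    // declare a stratified cluster design
svyset                                             // display the settings just declared
```

Typing svyset with no arguments is a quick way to check what design, if any, is currently attached to the data in memory.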
• One-stage design
• Example 1: The National Maternal and Infant Health Survey (NMIHS) 1988
Data file name: NMIHS.dta (a sample of 9,953 live births).
The population was divided into 6 strata defined by crossing two demographic variables, race (black, non-black) and birth weight (<1,500 g, 1,500-2,499 g, 2,500+ g). Systematic samples were drawn of live births that were registered in 48 states and restricted to women aged 15 years or older.

. svyset
no survey characteristics are set

. svyset [pw=finwgt], strata(stratan)

      pweight: finwgt
          VCE: linearized
  Single unit: missing
     Strata 1: stratan
         SU 1: <observations>
        FPC 1: <zero>

This is an example of a stratified systematic design. The weight variable was adjusted to obtain “finwgt” (see Table N, p. 23 of the report PDF file).
• Two-stage design
• Example 2: Oman World Health Survey (OWHS) 2008
Data file name: OWHS_chronic.dta (40% of the original sample). The survey design was saved as part of the data set.

. svyset

      pweight: adjINDweight
          VCE: linearized
  Single unit: missing
     Strata 1: v022
         SU 1: v021
        FPC 1: <zero>

This is an example of a stratified cluster design. Strata are identified by “v022”, clusters by “v021”, and the weight variable is “adjINDweight”.
► More survey designs (hypothetical exercises)
• Exercise 1: We want to know the income per household in a certain city, and we do not have a list of households. Instead of trying to create one, it would be more practical to sample blocks, with each block considered a sampling unit. Assuming an FPC variable exists in the data set, write the STATA command declaring this survey design.
Answer: svyset block [pw = pwvar], fpc(fpcvar)
• Exercise 2: In the example above, instead of sampling clusters directly from the city, we first divide the city into regions and then sample blocks within each region (possibly with different criteria across regions). Assuming an FPC variable exists in the data set, write the STATA command declaring this survey design.
Answer: svyset block [pw = pwvar], strata(region) fpc(fpcvar)
• Exercise 3: We want to survey the eating habits of children attending elementary schools. A possible design: sample independently within each state; within each state, draw a random sample of counties; within each county, draw a random sample of schools; and interview every student in the selected schools. Assuming FPC variables exist for each stage, write the STATA command declaring this design.
Answer: svyset county [pw = pwvar], strata(state) fpc(fpcvar) || school, fpc(fpcvar2)
• Exercise 4: In the example above, if within each school we stratify by grade and sample students independently within each grade, we need to add another stage. Assuming FPC variables exist for each stage, write the STATA command declaring this design.
Answer: svyset county [pw = pwvar], strata(state) fpc(fpcvar) || school, fpc(fpcvar2) || student, fpc(fpcvar3) strata(grade)
Describing survey design: svydes
• STATA syntax
The command svydes (short for svydescribe) describes the survey design that was previously declared for the data set with svyset.
svydes [varlist] [, stage(#) finalstage single options]
For a multistage design, the option stage(#) selects the sampling stage to describe. The option single lists only strata that contain a single PSU (singletons). In general, the output adds “*” next to the stratum identifier to show that the stratum has a single PSU.
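The command can be tried on Stata's nhanes2 example dataset, which already carries a svyset declaration. This is only a sketch: it assumes internet access for webuse, and bpsystol is just one variable from that dataset, not from the surveys in these slides.

```stata
* Illustration only: nhanes2 example data with a stratified cluster design
webuse nhanes2, clear
svydescribe                  // strata, PSUs, and observations per stratum
svydescribe, stage(1)        // the same description, requested explicitly for stage 1
svydescribe bpsystol         // counts based on observations with nonmissing bpsystol
```

Listing a variable after the command is useful for spotting strata that effectively lose PSUs because of missing values.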
• Two-stage design
Using the data set OWHS_chronic.dta, run the STATA command:

. svydes

• Stratum is the stratum identifier given by the strata variable.
• #Units is the number of PSUs in the stratum.
• #Obs is the number of observations in the stratum.
• The other columns give summary statistics on the number of observations per PSU.
• The important thing to note: if a stratum has a singleton PSU, then #Units = 1. This means the stratum includes only one PSU, which is also indicated by a "*".
More Related Content

What's hot

Kaplan meier survival curves and the log-rank test
Kaplan meier survival curves and the log-rank testKaplan meier survival curves and the log-rank test
Kaplan meier survival curves and the log-rank test
zhe1
 

What's hot (20)

Introduction to SPSS 22 - step by steps
Introduction to SPSS 22 - step by stepsIntroduction to SPSS 22 - step by steps
Introduction to SPSS 22 - step by steps
 
Kappa statistics
Kappa statisticsKappa statistics
Kappa statistics
 
Chi square test
Chi square testChi square test
Chi square test
 
Overview of DGFP DHIS2
Overview of DGFP DHIS2 Overview of DGFP DHIS2
Overview of DGFP DHIS2
 
Introduction to Stata
Introduction to Stata Introduction to Stata
Introduction to Stata
 
Kaplan meier survival curves and the log-rank test
Kaplan meier survival curves and the log-rank testKaplan meier survival curves and the log-rank test
Kaplan meier survival curves and the log-rank test
 
Introduction to STATA - Ali Rashed
Introduction to STATA - Ali RashedIntroduction to STATA - Ali Rashed
Introduction to STATA - Ali Rashed
 
LQAS 2011
LQAS 2011LQAS 2011
LQAS 2011
 
Chisquare
ChisquareChisquare
Chisquare
 
National family health survey 5
National family health survey 5National family health survey 5
National family health survey 5
 
Parametric versus semi nonparametric parametric regression models
Parametric versus semi nonparametric parametric regression modelsParametric versus semi nonparametric parametric regression models
Parametric versus semi nonparametric parametric regression models
 
Sources of data collection
Sources of data collectionSources of data collection
Sources of data collection
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
Surveillance
SurveillanceSurveillance
Surveillance
 
Choosing a statistical test
Choosing a statistical testChoosing a statistical test
Choosing a statistical test
 
Basics of Systematic Review and Meta-analysis: Part 3
Basics of Systematic Review and Meta-analysis: Part 3Basics of Systematic Review and Meta-analysis: Part 3
Basics of Systematic Review and Meta-analysis: Part 3
 
Multi level models
Multi level modelsMulti level models
Multi level models
 
Introduction to EpiData
Introduction to EpiDataIntroduction to EpiData
Introduction to EpiData
 
Lecture 6. univariate and bivariate analysis
Lecture 6. univariate and bivariate analysisLecture 6. univariate and bivariate analysis
Lecture 6. univariate and bivariate analysis
 
India sample registration system
India sample registration systemIndia sample registration system
India sample registration system
 

Viewers also liked

STATA - Summary Statistics
STATA - Summary StatisticsSTATA - Summary Statistics
STATA - Summary Statistics
stata_org_uk
 
Stata Learning From Treiman
Stata Learning From TreimanStata Learning From Treiman
Stata Learning From Treiman
Chengjun Wang
 
STATA - Importing Data
STATA - Importing DataSTATA - Importing Data
STATA - Importing Data
stata_org_uk
 
Narrative inquiry
Narrative inquiryNarrative inquiry
Narrative inquiry
Paul Rhodes
 
STATA - Merge or Drop Data
STATA - Merge or Drop DataSTATA - Merge or Drop Data
STATA - Merge or Drop Data
stata_org_uk
 
STATA - Download Examples
STATA - Download ExamplesSTATA - Download Examples
STATA - Download Examples
stata_org_uk
 
STATA - Linear Regressions
STATA - Linear RegressionsSTATA - Linear Regressions
STATA - Linear Regressions
stata_org_uk
 

Viewers also liked (18)

STATA - Summary Statistics
STATA - Summary StatisticsSTATA - Summary Statistics
STATA - Summary Statistics
 
Magazine research survey analysis
Magazine research  survey analysisMagazine research  survey analysis
Magazine research survey analysis
 
التعداد الإقتصادى 2012/2013
التعداد الإقتصادى 2012/2013 التعداد الإقتصادى 2012/2013
التعداد الإقتصادى 2012/2013
 
Overview of Uses of Economic Census Data - Noha Samy - Mai El Mossallamy - Ta...
Overview of Uses of Economic Census Data - Noha Samy - Mai El Mossallamy - Ta...Overview of Uses of Economic Census Data - Noha Samy - Mai El Mossallamy - Ta...
Overview of Uses of Economic Census Data - Noha Samy - Mai El Mossallamy - Ta...
 
Economic Censuses in the World, Mohamed Ismail and Nany Abd El Kader
Economic Censuses in the World, Mohamed Ismail and Nany Abd El KaderEconomic Censuses in the World, Mohamed Ismail and Nany Abd El Kader
Economic Censuses in the World, Mohamed Ismail and Nany Abd El Kader
 
التعداد الإقتصادى الرابع لمصر 2012/2013 - خالد ماهر
التعداد الإقتصادى الرابع لمصر 2012/2013 - خالد ماهر التعداد الإقتصادى الرابع لمصر 2012/2013 - خالد ماهر
التعداد الإقتصادى الرابع لمصر 2012/2013 - خالد ماهر
 
Economic exploitation and happiness
Economic exploitation and happinessEconomic exploitation and happiness
Economic exploitation and happiness
 
Stata Learning From Treiman
Stata Learning From TreimanStata Learning From Treiman
Stata Learning From Treiman
 
Stata tutorial university of princeton
Stata tutorial university of princetonStata tutorial university of princeton
Stata tutorial university of princeton
 
Relationship between economic growth and happiness
Relationship between economic growth and happinessRelationship between economic growth and happiness
Relationship between economic growth and happiness
 
Economic Census and the Economic Indicators - Sherine Al-Shawarby
Economic Census and the Economic Indicators - Sherine Al-ShawarbyEconomic Census and the Economic Indicators - Sherine Al-Shawarby
Economic Census and the Economic Indicators - Sherine Al-Shawarby
 
STATA - Importing Data
STATA - Importing DataSTATA - Importing Data
STATA - Importing Data
 
Narrative inquiry
Narrative inquiryNarrative inquiry
Narrative inquiry
 
Analyzing experimental research data
Analyzing experimental research dataAnalyzing experimental research data
Analyzing experimental research data
 
شروط وخطوات برنامج الدعم السكني وزارة الاسكان السعودية http://www.thaqfny.co...
شروط وخطوات برنامج الدعم السكني وزارة الاسكان السعودية  http://www.thaqfny.co...شروط وخطوات برنامج الدعم السكني وزارة الاسكان السعودية  http://www.thaqfny.co...
شروط وخطوات برنامج الدعم السكني وزارة الاسكان السعودية http://www.thaqfny.co...
 
STATA - Merge or Drop Data
STATA - Merge or Drop DataSTATA - Merge or Drop Data
STATA - Merge or Drop Data
 
STATA - Download Examples
STATA - Download ExamplesSTATA - Download Examples
STATA - Download Examples
 
STATA - Linear Regressions
STATA - Linear RegressionsSTATA - Linear Regressions
STATA - Linear Regressions
 

Similar to Using STATA in Survey Data Analysis - Niveen El Zayat

Introduction To Statistics
Introduction To StatisticsIntroduction To Statistics
Introduction To Statistics
albertlaporte
 

Similar to Using STATA in Survey Data Analysis - Niveen El Zayat (20)

导论1
导论1导论1
导论1
 
sampling techniques.pptx
sampling techniques.pptxsampling techniques.pptx
sampling techniques.pptx
 
sampling techniques.pptx
sampling techniques.pptxsampling techniques.pptx
sampling techniques.pptx
 
Introduction To Statistics
Introduction To StatisticsIntroduction To Statistics
Introduction To Statistics
 
Stat-Lesson.pptx
Stat-Lesson.pptxStat-Lesson.pptx
Stat-Lesson.pptx
 
Sampling techniques
Sampling techniques  Sampling techniques
Sampling techniques
 
Basic & Simple Quality management tools
Basic & Simple Quality management toolsBasic & Simple Quality management tools
Basic & Simple Quality management tools
 
Unit 3 Sampling
Unit 3 SamplingUnit 3 Sampling
Unit 3 Sampling
 
Sampling design
Sampling designSampling design
Sampling design
 
Unit 1 - Statistics (Part 1).pptx
Unit 1 - Statistics (Part 1).pptxUnit 1 - Statistics (Part 1).pptx
Unit 1 - Statistics (Part 1).pptx
 
Adv.-Statistics-2.pptx
Adv.-Statistics-2.pptxAdv.-Statistics-2.pptx
Adv.-Statistics-2.pptx
 
Sampling techniques new
Sampling techniques newSampling techniques new
Sampling techniques new
 
Sampling techniques new
Sampling techniques newSampling techniques new
Sampling techniques new
 
Sampling and statistical inference
Sampling and statistical inferenceSampling and statistical inference
Sampling and statistical inference
 
BIOSTATISTICS.pptx sidhathab.pptx oral pathology
BIOSTATISTICS.pptx sidhathab.pptx oral pathologyBIOSTATISTICS.pptx sidhathab.pptx oral pathology
BIOSTATISTICS.pptx sidhathab.pptx oral pathology
 
Introduction To Statistics.ppt
Introduction To Statistics.pptIntroduction To Statistics.ppt
Introduction To Statistics.ppt
 
Sampling_Distribution_stat_of_Mean_New.pptx
Sampling_Distribution_stat_of_Mean_New.pptxSampling_Distribution_stat_of_Mean_New.pptx
Sampling_Distribution_stat_of_Mean_New.pptx
 
2.1 frequency distributions for organizing and summarizing data
2.1 frequency distributions for organizing and summarizing data2.1 frequency distributions for organizing and summarizing data
2.1 frequency distributions for organizing and summarizing data
 
Biostats in ortho
Biostats in orthoBiostats in ortho
Biostats in ortho
 
Bio stat
Bio statBio stat
Bio stat
 

More from Economic Research Forum

More from Economic Research Forum (20)

Session 4 farhad mehran, single most data gaps
Session 4 farhad mehran, single most data gapsSession 4 farhad mehran, single most data gaps
Session 4 farhad mehran, single most data gaps
 
Session 3 mahdi ben jelloul, microsimulation for policy evaluation
Session 3 mahdi ben jelloul, microsimulation for policy evaluationSession 3 mahdi ben jelloul, microsimulation for policy evaluation
Session 3 mahdi ben jelloul, microsimulation for policy evaluation
 
Session 3 m.a. marouani, structual change, skills demand and job quality
Session 3 m.a. marouani, structual change, skills demand and job qualitySession 3 m.a. marouani, structual change, skills demand and job quality
Session 3 m.a. marouani, structual change, skills demand and job quality
 
Session 3 ishac diwn, bridging mirco and macro appraoches
Session 3 ishac diwn, bridging mirco and macro appraochesSession 3 ishac diwn, bridging mirco and macro appraoches
Session 3 ishac diwn, bridging mirco and macro appraoches
 
Session 3 asif islam, jobs flagship report
Session 3 asif islam, jobs flagship reportSession 3 asif islam, jobs flagship report
Session 3 asif islam, jobs flagship report
 
Session 2 yemen hlel, insights from tunisia
Session 2 yemen hlel, insights from tunisiaSession 2 yemen hlel, insights from tunisia
Session 2 yemen hlel, insights from tunisia
 
Session 2 samia satti, insights from sudan
Session 2 samia satti, insights from sudanSession 2 samia satti, insights from sudan
Session 2 samia satti, insights from sudan
 
Session 2 mona amer, insights from egypt
Session 2 mona amer, insights from egyptSession 2 mona amer, insights from egypt
Session 2 mona amer, insights from egypt
 
Session 2 ali souag, insights from algeria
Session 2 ali souag, insights from algeriaSession 2 ali souag, insights from algeria
Session 2 ali souag, insights from algeria
 
Session 2 abdel rahmen el lahga, insights from tunisia
Session 2 abdel rahmen el lahga, insights from tunisiaSession 2 abdel rahmen el lahga, insights from tunisia
Session 2 abdel rahmen el lahga, insights from tunisia
 
Session 1 ragui assaad, moving beyond the unemployment rate
Session 1 ragui assaad, moving beyond the unemployment rateSession 1 ragui assaad, moving beyond the unemployment rate
Session 1 ragui assaad, moving beyond the unemployment rate
 
Session 1 luca fedi, towards a research agenda
Session 1 luca fedi, towards a research agendaSession 1 luca fedi, towards a research agenda
Session 1 luca fedi, towards a research agenda
 
من البيانات الى السياسات : مبادرة إتاحة البيانات المنسقة
من البيانات الى السياسات : مبادرة إتاحة البيانات المنسقةمن البيانات الى السياسات : مبادرة إتاحة البيانات المنسقة
من البيانات الى السياسات : مبادرة إتاحة البيانات المنسقة
 
The Future of Jobs is Facing the Biggest Policy Induced Price Distortion in H...
The Future of Jobs is Facing the Biggest Policy Induced Price Distortion in H...The Future of Jobs is Facing the Biggest Policy Induced Price Distortion in H...
The Future of Jobs is Facing the Biggest Policy Induced Price Distortion in H...
 
Job- Creating Growth in the Emerging Global Economy
Job- Creating Growth in the Emerging Global EconomyJob- Creating Growth in the Emerging Global Economy
Job- Creating Growth in the Emerging Global Economy
 
The Role of Knowledge in the Process of Innovation in the New Global Economy:...
The Role of Knowledge in the Process of Innovation in the New Global Economy:...The Role of Knowledge in the Process of Innovation in the New Global Economy:...
The Role of Knowledge in the Process of Innovation in the New Global Economy:...
 
Rediscovering Industrial Policy for the 21st Century: Where to Start?
Rediscovering Industrial Policy for the 21st Century: Where to Start?Rediscovering Industrial Policy for the 21st Century: Where to Start?
Rediscovering Industrial Policy for the 21st Century: Where to Start?
 
How the Rise of the Intangibles Economy is Disrupting Work in Africa
How the Rise of the Intangibles Economy is Disrupting Work in AfricaHow the Rise of the Intangibles Economy is Disrupting Work in Africa
How the Rise of the Intangibles Economy is Disrupting Work in Africa
 
On Ideas and Economic Policy: A Survey of MENA Economists
On Ideas and Economic Policy: A Survey of MENA EconomistsOn Ideas and Economic Policy: A Survey of MENA Economists
On Ideas and Economic Policy: A Survey of MENA Economists
 
Future Research Directions for ERF
Future Research Directions for ERFFuture Research Directions for ERF
Future Research Directions for ERF
 

Recently uploaded

Call Girls in Chandni Chowk (delhi) call me [9953056974] escort service 24X7
Call Girls in Chandni Chowk (delhi) call me [9953056974] escort service 24X7Call Girls in Chandni Chowk (delhi) call me [9953056974] escort service 24X7
Call Girls in Chandni Chowk (delhi) call me [9953056974] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Russian🍌Dazzling Hottie Get☎️ 9053900678 ☎️call girl In Chandigarh By Chandig...
Russian🍌Dazzling Hottie Get☎️ 9053900678 ☎️call girl In Chandigarh By Chandig...Russian🍌Dazzling Hottie Get☎️ 9053900678 ☎️call girl In Chandigarh By Chandig...
Russian🍌Dazzling Hottie Get☎️ 9053900678 ☎️call girl In Chandigarh By Chandig...
Chandigarh Call girls 9053900678 Call girls in Chandigarh
 
VIP Call Girls Agra 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Agra 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Agra 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Agra 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 

Recently uploaded (20)

An Atoll Futures Research Institute? Presentation for CANCC
An Atoll Futures Research Institute? Presentation for CANCCAn Atoll Futures Research Institute? Presentation for CANCC
An Atoll Futures Research Institute? Presentation for CANCC
 
1935 CONSTITUTION REPORT IN RIPH FINALLS
1935 CONSTITUTION REPORT IN RIPH FINALLS1935 CONSTITUTION REPORT IN RIPH FINALLS
1935 CONSTITUTION REPORT IN RIPH FINALLS
 
Call Girls in Chandni Chowk (delhi) call me [9953056974] escort service 24X7
Call Girls in Chandni Chowk (delhi) call me [9953056974] escort service 24X7Call Girls in Chandni Chowk (delhi) call me [9953056974] escort service 24X7
Call Girls in Chandni Chowk (delhi) call me [9953056974] escort service 24X7
 
A Press for the Planet: Journalism in the face of the Environmental Crisis
A Press for the Planet: Journalism in the face of the Environmental CrisisA Press for the Planet: Journalism in the face of the Environmental Crisis
A Press for the Planet: Journalism in the face of the Environmental Crisis
 
Top Rated Pune Call Girls Hadapsar ⟟ 6297143586 ⟟ Call Me For Genuine Sex Se...
Top Rated  Pune Call Girls Hadapsar ⟟ 6297143586 ⟟ Call Me For Genuine Sex Se...Top Rated  Pune Call Girls Hadapsar ⟟ 6297143586 ⟟ Call Me For Genuine Sex Se...
Top Rated Pune Call Girls Hadapsar ⟟ 6297143586 ⟟ Call Me For Genuine Sex Se...
 
Antisemitism Awareness Act: pénaliser la critique de l'Etat d'Israël
Antisemitism Awareness Act: pénaliser la critique de l'Etat d'IsraëlAntisemitism Awareness Act: pénaliser la critique de l'Etat d'Israël
Antisemitism Awareness Act: pénaliser la critique de l'Etat d'Israël
 
Regional Snapshot Atlanta Aging Trends 2024
Regional Snapshot Atlanta Aging Trends 2024Regional Snapshot Atlanta Aging Trends 2024
Regional Snapshot Atlanta Aging Trends 2024
 
World Press Freedom Day 2024; May 3rd - Poster
World Press Freedom Day 2024; May 3rd - PosterWorld Press Freedom Day 2024; May 3rd - Poster
World Press Freedom Day 2024; May 3rd - Poster
 
VIP Model Call Girls Narhe ( Pune ) Call ON 8005736733 Starting From 5K to 25...
VIP Model Call Girls Narhe ( Pune ) Call ON 8005736733 Starting From 5K to 25...VIP Model Call Girls Narhe ( Pune ) Call ON 8005736733 Starting From 5K to 25...
VIP Model Call Girls Narhe ( Pune ) Call ON 8005736733 Starting From 5K to 25...
 
Call Girls Nanded City Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Nanded City Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Nanded City Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Nanded City Call Me 7737669865 Budget Friendly No Advance Booking
 
Russian🍌Dazzling Hottie Get☎️ 9053900678 ☎️call girl In Chandigarh By Chandig...
Russian🍌Dazzling Hottie Get☎️ 9053900678 ☎️call girl In Chandigarh By Chandig...Russian🍌Dazzling Hottie Get☎️ 9053900678 ☎️call girl In Chandigarh By Chandig...
Russian🍌Dazzling Hottie Get☎️ 9053900678 ☎️call girl In Chandigarh By Chandig...
 
The U.S. Budget and Economic Outlook (Presentation)
The U.S. Budget and Economic Outlook (Presentation)The U.S. Budget and Economic Outlook (Presentation)
Using STATA in Survey Data Analysis - Niveen El Zayat

  • 1. Using STATA in Survey Data Analysis. Niveen El Zayat, Assistant Lecturer, Department of Statistics, Faculty of Economics and Political Science
  • 2. Outlines
      - Aspects of survey data
      - Survey designs
      - Setting survey design in STATA: svyset
      - Describing survey design: svydes
      - Estimation commands using prefix svy: (in STATA do-file)
      - Estimation for subpopulations (in STATA do-file)
      - Extracting survey-weighted tables and graphs (in STATA do-file)
  • 3. Aspects of survey data ► What is a survey?
      - A research technique for gathering information about a well-defined population by selecting a sample and questioning it systematically; the results are then analyzed and generalized to the population.
      - Surveys provide decision-makers with information that serves planning, monitoring and evaluation purposes.
      - Survey data are collected through either “complete enumeration” or “sampling”.
      - Complete enumeration means collecting data from all units in the population.
      - Sampling means collecting data from a subset of the population that is designed to be representative of that population and is used to draw conclusions about it.
      - Censuses are examples of complete enumeration, while the Demographic Health Survey (DHS) and the Labor Force Sample Survey are examples of sample surveys.
  • 4. Aspects of survey data ► Populations in research
      - Target population:
          1. The ENTIRE group of individuals that researchers are interested in.
          2. Also known as the theoretical population.
      - Accessible population:
          1. The population whose individuals researchers can actually reach.
          2. A subset of the target population, also known as the study population.
          3. Researchers draw their samples from it, so it is defined by the sampling frame.
  • 5. Aspects of survey data ► Why sample survey data?
      - Assessing the entire population may be impossible:
          - Impractical, as with an infinite population or a population whose observations are destroyed by measurement.
          - Data lose their value if they are not collected within the time limit.
          - Expensive; a sample survey is less costly than complete enumeration.
          - Limited resources (money and staff) and an extremely large workload.
          - Many more errors to control and monitor.
          - Measurement may destroy the observations.
          - Lists (frames) are rarely up to date.
      - For these reasons, sampling is usually the better option.
  • 6. Aspects of survey data ► Probability (random) samples
      - Every unit of the population at any stage is drawn with a known probability, so inference about population parameters can be based on probability theory.
      - There are two sampling procedures:
          1. Sampling with replacement (draws are independent, as if the population were infinite).
          2. Sampling without replacement (the usual case for a finite population, where a finite population correction may apply).
  • 7. Survey Designs ► Simple survey design: SRS
      - Data from a single-round survey, analyzed with limited reference to other information (aka “flat” or “rectangular” data).
      - Every unit in the population has an equal chance of being part of the sample.
      - Data collection is very simple; it only needs a sampling frame for the study population.
      - Often leads to inaccurate point estimates and/or inaccurate standard errors if the population is not homogeneous.
      - Most standard statistical methods assume simple random sampling.
      - Diagram: simple survey design: Simple Random Sample (SRS); complex or multi-stage design: Stratified Sample, Cluster Sample, Stratified-cluster Sample.
  • 8. Survey Designs ► Complex (multi-stage/hierarchical) designs
      - Drawing an SRS is sometimes impossible (e.g., when only an indicative frame of the population elements is available).
      - Different elements may have different probabilities of selection into the sample because of the nature of the population.
      - Used with hierarchical data (household (HH) surveys).
      - Many statistical procedures assume i.i.d. observations; moreover, many statistical packages treat data as an SRS by default.
      - In most surveys elements are not sampled independently, because one may need to select groups of individuals.
  • 9. Survey Designs ► Stratification
      - The entire study population is divided into well-defined groups called “strata” based on a relevant characteristic, often geographic (region) or demographic (gender, level of education or SES).
      - Sampling units are randomly and independently sampled within each stratum, possibly with different probabilities of selection across strata.
      - It usually results in smaller variances and standard errors than SRS.
      - An example is stratifying the population by locality (urban/rural).
  • 10. Survey Designs ► Clustering
      - Sets of individuals (regions, districts, city blocks or households) are sampled as groups (clusters), then population elements are drawn from the selected clusters.
      - Further subsamples may be drawn within clusters (often called a multi-stage design).
      - The highest level of cluster is referred to as the Primary Sampling Unit (PSU).
      - The next lower level of cluster is referred to as the Secondary Sampling Unit (SSU).
      - Example: geographical regions, such as local government areas, are selected in the first stage; schools are selected in the second stage; in the third stage the units of analysis, perhaps teachers or students, are sampled. Regions are the PSUs in this example.
      - Different sampling techniques may be applied at different stages, which increases sample-to-sample variability and leads to higher variances and standard errors.
  • 12. Survey Designs ► Exercise (Name each of the following designs)
  • 13. Survey Designs ► Exercise (Name each of the following designs)
  • 14. Survey Designs ► Features of survey designs
      - Probability (sample) weights
          - The most common weight is the sampling weight (aka probability weight), which is used to weight the sample back to the population from which it was drawn.
          - By definition, this weight is the inverse of the sampling fraction, i.e. N/n.
          - In a two-stage design, the probability weight is calculated as f1 × f2, where f1 and f2 denote the inverses of the sampling fractions at the first and second stages.
          - In actual survey data sets, the "final weight" usually starts with the inverse of the sampling fraction; several adjustments may then be applied to account for sampling design problems such as unit non-response, errors in the sampling frame (aka non-coverage) or post-stratification.
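The two-stage weight can be illustrated with a small worked example. The design and all numbers below are hypothetical assumptions chosen for easy arithmetic (20 of 200 blocks sampled, then 5 of 50 households in each sampled block); the symbols w, N and n are introduced here and are not the author's notation.

```latex
% Hypothetical two-stage design; every number is an illustrative assumption.
% Stage 1: n_1 = 20 blocks sampled out of N_1 = 200.
% Stage 2: n_2 = 5 households sampled out of N_2 = 50 in each sampled block.
\begin{align*}
w_1 &= \frac{N_1}{n_1} = \frac{200}{20} = 10, &
w_2 &= \frac{N_2}{n_2} = \frac{50}{5} = 10, \\
w   &= w_1 \times w_2 = 10 \times 10 = 100 .
\end{align*}
```

Under these assumptions each sampled household stands for 100 households, and the 100 sampled households together weight back to the 200 × 50 = 10,000 households in the population.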
  • 15. Survey Designs ► Features of survey designs
      - Finite population correction (FPC)
          - An adjustment applied to the variance, and hence the standard error, of an estimate when sampling without replacement from a finite population. As a multiplier on the standard error it is calculated as $\mathrm{FPC} = \sqrt{(N-n)/(N-1)}$.
          - The FPC is usually applied when the sampling fraction (n/N) is large. When n is small relative to the population size N, the FPC is close to 1, has little impact and can safely be ignored.
          - For a multi-stage survey design, one may apply an FPC at one or more stages.
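A quick numerical check shows why the FPC matters only when the sampling fraction is large. The population and sample sizes below are hypothetical, chosen only to contrast a 10% sampling fraction with a 0.1% one.

```latex
% Hypothetical sizes; the formula is the one stated on the slide.
\begin{align*}
N = 1{,}000,\ n = 100:\quad
  &\mathrm{FPC} = \sqrt{\frac{1{,}000 - 100}{1{,}000 - 1}}
               = \sqrt{\frac{900}{999}} \approx 0.949, \\
N = 100{,}000,\ n = 100:\quad
  &\mathrm{FPC} = \sqrt{\frac{100{,}000 - 100}{100{,}000 - 1}}
               = \sqrt{\frac{99{,}900}{99{,}999}} \approx 0.9995 .
\end{align*}
```

In the first case the correction shrinks the standard error by about 5%; in the second it changes it by about 0.05%, which is why it is commonly ignored for small sampling fractions.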
  • 16. Survey Designs ► Features of survey designs
      - Design effect (DEFF)
          - Standard errors under different sample designs are compared using design effect statistics. For complex samples, this is typically done by comparison with a hypothetical simple random sample (SRS) of the same size.
          - It is computed as the ratio of the variance of an estimate θ under the complex design to the variance of the same estimate under an SRS of the same size: $\mathrm{DEFF} = \mathrm{Var}(\theta_{\text{Design}}) / \mathrm{Var}(\theta_{\text{SRS}})$.
      - Design factor (DEFT)
          - The square root of the design effect, $\mathrm{DEFT} = \sqrt{\mathrm{DEFF}}$, which puts the comparison back on the scale of standard errors.
          - DEFT = 1: the sample design has no effect on standard errors.
          - DEFT > 1: the sample design inflates standard errors.
          - DEFT < 1: the sample design reduces standard errors.
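In Stata, DEFF and DEFT can be requested after a svy estimation command with estat effects. The following is a minimal sketch, not part of the original slides: it assumes Stata's shipped example dataset nhanes2 (loaded with webuse, which needs internet access) and its variable bpsystol, and relies on that dataset already carrying its survey design.

```stata
* Sketch only: nhanes2 and bpsystol are assumptions, not the slides' data
webuse nhanes2, clear        // example data; survey design is stored with it
svyset                       // display the stored design (PSU, strata, pweight)

svy: mean bpsystol           // design-based mean and linearized standard error
estat effects, deff deft     // design effect and its square root for that mean
```

A DEFT above 1 in the output would indicate that the complex design inflates the standard error relative to an SRS of the same size.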
  • 17. Setting survey design in STATA: svyset ► STATA Syntax
      - The command svyset (declare data as survey data) identifies the sample design features of your data to STATA. It can describe a wide range of complex sampling designs.
      - Single-stage design:
            svyset [psu_varname] [weight_var] [, strata(varname) fpc(varname) options]
      - Multiple-stage design:
            svyset psu_var [weight_var] [, design_options options] [|| ssu_var , design_options]
      - Once set, the survey design stays with the dataset in memory until it is cleared, changed, or a new dataset is loaded; if the data are saved after svyset, the design is saved as part of the dataset.
  • 18. Setting survey design in STATA: svyset ► One-stage design
      - Example 1: The National Maternal and Infant Health Survey (NMIHS) 1988.
      - Data file name: NMIHS.dta (a sample of 9,953 live births).
      - The population was divided into 6 strata according to the subdomains of two birth demographic variables: race (black and non-black) and birth weight (<1,500 g, 1,500-2,499 g and 2,500+ g). Systematic samples were drawn of live births that were restricted to women 15+ years of age and registered in 48 States.
      - Before the design is declared:
            svyset
            no survey characteristics are set
      - Declaring the design:
            svyset [pw=finwgt], strata(stratan)
                  pweight: finwgt
                      VCE: linearized
              Single unit: missing
                 Strata 1: stratan
                     SU 1: <observations>
                    FPC 1: <zero>
      - This is an example of a stratified systematic design. The weight variable was adjusted to obtain “finwgt” (see Table N, p. 23 of the report PDF file).
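The one-stage declaration above can be tried end to end as follows. This is a hedged sketch: it assumes that Stata's shipped example dataset nmihs (loaded with webuse) corresponds to the NMIHS.dta file used in the slides and contains the variables finwgt, stratan and birthwgt; with a local copy, replace the webuse line with use NMIHS.dta.

```stata
* Sketch only: assumes the webuse copy of the NMIHS data matches NMIHS.dta
webuse nmihs, clear

* Declare a stratified one-stage design with sampling weights
svyset [pweight=finwgt], strata(stratan)

svyset                 // with no arguments, redisplays the declared design
svy: mean birthwgt     // survey-weighted mean birth weight, linearized SE
```

Because no PSU variable is given, Stata treats each observation as its own sampling unit, which is what the "SU 1: <observations>" line in the output reflects.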
  • 19. Setting survey design in STATA: svyset ► Two-stage design
      - Example 2: Oman World Health Survey (OWHS) 2008.
      - Data file name: OWHS_chronic.dta [40% of the original sample]. The survey design was already set as part of the data set:
            svyset
                  pweight: adjINDweight
                      VCE: linearized
              Single unit: missing
                 Strata 1: v022
                     SU 1: v021
                    FPC 1: <zero>
      - This is an example of a stratified cluster design. Strata are represented by “v022”, clusters by “v021” and the weight variable by “adjINDweight”.
  • 20. Setting survey design in STATA: svyset ► More survey design (Hypothetical Exercises)
      - Exercise 1: We want to know the income per household in a certain city, and we do not have a list of households. Instead of trying to create one, it is more practical to sample blocks, so each block is a sampling unit. Assuming an FPC variable exists in the data set, write the STATA command declaring this survey design.
            svyset block [pw = pwvar], fpc(fpcvar)
      - Exercise 2: In the example above, instead of sampling clusters directly from the city, we first divide the city into regions and then sample blocks within each region (possibly with different criteria among regions). Assuming an FPC variable exists in the data set, write the STATA command declaring this survey design.
            svyset block [pw = pwvar], strata(region) fpc(fpcvar)
  • 21. Setting survey design in STATA: svyset ► More survey design (Hypothetical Exercises)
      - Exercise 3: We want to survey the eating habits of children attending elementary schools. A possible design: sample independently within each state; for each state, draw a random sample of counties; within each county, draw a random sample of schools; and interview every student in the selected schools. Assuming an FPC variable exists at each stage, write the STATA command declaring this design.
            svyset county [pw = pwvar], strata(state) fpc(fpcvar) || school, fpc(fpcvar2)
      - Exercise 4: In the example above, if within each school we stratify by grade and sample students independently in each grade, we need to add another level. Assuming an FPC variable exists at each stage, write the STATA command declaring this design.
            svyset county [pw = pwvar], strata(state) fpc(fpcvar) || school, fpc(fpcvar2) || student, fpc(fpcvar3) strata(grade)
  • 22. Describing survey design: svydes ► STATA Syntax
      - The command svydes describes the survey design that was previously declared for the data set by svyset.
            svydes [varlist] [, stage(#) finalstage single options]
      - For a multistage design, the option stage(#) selects the sampling stage to describe.
      - The option single lists only the strata with a single PSU (singletons). In the general listing, “*” is added next to the stratum id to show that the stratum has a single PSU.
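The options described above can be exercised with a short do-file fragment. This is a sketch under stated assumptions, not the author's code: it uses Stata's shipped example datasets stage5a (assumed to contain su1, su2, pw, fpc1 and fpc2) and nhanes2b (assumed to contain hdresult and to carry its own survey design), and writes the command out in full as svydescribe, for which svydes in the slides is the short form.

```stata
* Sketch only: dataset and variable names are assumptions from Stata's examples

* Two-stage design: describe stage 1 (the default), then stage 2
webuse stage5a, clear
svyset su1 [pweight=pw], fpc(fpc1) || su2, fpc(fpc2)
svydescribe
svydescribe, stage(2)

* Singleton check: passing a variable restricts the description to
* observations where it is nonmissing; single lists only strata with one PSU
webuse nhanes2b, clear
svydescribe hdresult, single
```

Running the singleton check before estimation is a cheap way to find strata for which a linearized variance cannot be computed without further choices, such as merging strata.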
  • 23. Describing survey design: svydes ► Two-stage design
      - Use data set OWHS_chronic.dta; STATA command: svydes
      - Stratum is the stratum id number given by the strata variable.
      - #units is the number of PSUs in a given stratum.
      - #Obs is the number of observations in a given stratum.
      - The other columns give summary statistics on the number of observations per PSU.
      - The important thing to note here: if a stratum has a singleton PSU, #units equals 1. The stratum then contains only one PSU, which is also indicated by a "*".