1. INTERNATIONAL FOOD POLICY RESEARCH INSTITUTE
Quantitative Impact
Evaluation Methods
Dan Gilligan, IFPRI
INTERNATIONAL LIVESTOCK RESEARCH INSTITUTE
2. An Introduction to
Quantitative Impact Evaluation
I. Why is impact evaluation important?
• What are appropriate goals for an impact
evaluation?
• Monitoring and evaluation
II. How do you design an impact evaluation?
• The evaluation problem
• Measuring causal impact
• Impact evaluation methodologies
3. Introduction (cont'd)
III. Impact Evaluation and Measurement Tools
• Choice of evaluation estimator
• Data requirements
• How to randomize
• Sample design
• Sample size
4. What are appropriate goals for
an impact evaluation?
Measure impact on important outcomes
• Need a limited set of outcome indicators that are easy to
measure
Estimate the program's cost effectiveness
Explain which components of a program work best
Caution:
• Evaluations can only answer a limited number of questions
• Evaluations sometimes cannot explain what caused the
impacts
Effective monitoring and qualitative assessments help to
explain the context for impact evaluation results
5. Indicators for Monitoring and Evaluation
INPUTS: Financial and physical resources (monitoring)
- track resources used in the intervention
- e.g., budget support for local service delivery
OUTPUTS: Goods and services generated (monitoring)
- more local government services delivered
- e.g., textbooks, food delivered, roads built
OUTCOMES: Access, usage and satisfaction of users (evaluation)
- e.g., school attendance, vaccination rates, food consumption, number of mobile phones
IMPACT: Effect on living standards (evaluation)
- better welfare impacts (e.g., literacy, health)
- increase in participation, happiness
Monitoring tracks inputs and outputs; evaluation measures outcomes and impact.
6. II. How do you design an
impact evaluation?
The central problem of impact evaluation
• Want to measure the impact of a program or
“treatment” on outcomes
• How do we know measured impacts are due to the
program?
• If we want to claim that the impacts observed are
causal, we need an 'identification strategy': a way
to attribute the observed effects to the program
and not to other factors
7. II. How do you design an
impact evaluation?
Designing the impact evaluation
• Measure impact by comparing outcomes in households
exposed to the treatment to what those outcomes
would have been without that exposure—the
counterfactual
• Problem: you cannot observe the counterfactual
because program beneficiaries receive the treatment
• Need to construct a comparison group from
nonbeneficiaries
• Comparison group makes it possible to control for other
factors that affect the outcome
Ex: IFPRI evaluated the effect of Ethiopia's public works
(PSNP) on food consumption, but food prices rose at the
same time; use comparison group to remove the effect of
rising prices on food consumption in impact estimates
8. Suppose we observe an increase in outcome Y
for beneficiaries over time after an intervention
[Figure: the observed outcome rises from Y0 at baseline (t0) to Y1 at follow-up (t1), with the intervention occurring between the two survey rounds]
9. To measure impact, we need to remove the
counterfactual from the observed outcome
[Figure: the observed outcome rises from Y0 at baseline (t0) to Y1 at follow-up (t1); the comparison group traces the counterfactual outcome Y1* over the same period]
Impact = Y1 - Y1*
where Y1 is the observed beneficiary outcome at follow-up and Y1* is the counterfactual outcome estimated from the comparison group
10. What You Can Miss Without a
Comparison Group
Impact of School Feeding on Anemia Prevalence of Girls Age 10-13 (*anemic = hemoglobin < 11 g/dL)
[Bar chart: % anemic at Round 1 and Round 2 for the two treatment groups (SFP, THR) and the control group (CTR); anemia prevalence changed by -5.3 points (SFP), -3.4 points (THR), and +13.9 points (CTR) between rounds]
Impact: SFP -19.2%, THR -17.2%
11. Constructing a Comparison Group
Suppose we want to measure the impact of public works
on household food security (calorie consumption)
Q: Why not compare average calorie consumption of PW
beneficiaries to average calorie consumption of randomly
selected nonbeneficiaries?
A: On average, nonbeneficiaries are different from
beneficiaries in ways that make them an ineffective
comparison group
Need to correct for pre-program differences between
beneficiaries and nonbeneficiaries
• Beneficiaries are usually poorer; they also decided to participate
• If you don't control for this, impact estimates are biased
12. Impact Evaluation Methodologies
Ways of constructing a control or comparison group
Randomization
Matching (including propensity score matching,
covariate matching)
Regression discontinuity design (RDD)
Instrumental variables
Difference-in-differences
13. Impact Evaluation Methodologies
Randomization
• Randomly assign communities or households into treatment
and control groups before the program for the purpose of
evaluation
random assignment makes it likely that treatment and
control communities have identical characteristics on
average at baseline
for safety nets, usually randomize at the community
level
• Common approach: use phased rounds of program
implementation and randomly decide which communities
enter the program in each round
• Example of randomization from N. Uganda school feeding
study
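As a concrete illustration of the phased roll-out approach above, the sketch below randomly assigns communities to implementation rounds. This is a minimal Python sketch under stated assumptions: the community identifiers, the number of rounds, and the random seed are hypothetical, not taken from the Uganda study.

```python
import random

def assign_rounds(communities, n_rounds=3, seed=42):
    """Randomly assign communities to phased implementation rounds.

    Communities entering in later rounds serve as the control group
    for earlier rounds until the program reaches them.
    """
    rng = random.Random(seed)   # fixed seed makes the assignment reproducible and auditable
    shuffled = communities[:]   # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    # Deal the shuffled communities into n_rounds roughly equal groups
    return {community: i % n_rounds + 1 for i, community in enumerate(shuffled)}

# Hypothetical community identifiers for illustration
communities = [f"community_{i:02d}" for i in range(1, 13)]
print(assign_rounds(communities))
```

Because entry order is random, communities assigned to later rounds provide a valid control group for round-1 communities at the first follow-up survey.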
14. Impact Evaluation Methodologies
Randomization
• How do you justify having a control group?
Justified if program cannot reach all communities at once
Some communities are always excluded
Main difference between control group and other
nonbeneficiaries is that you interview the control group
Ex: transparency in Nicaragua RPS evaluation. Randomization
done in public with media and politicians present
• There is consensus that a randomized-out control group
provides the best estimate of counterfactual outcomes
Results of a good randomized evaluation will be convincing to
everyone: you have solid evidence of the impact of the
program
15. Impact Evaluation Methodologies
Matching
• Match beneficiary and nonbeneficiary households by
characteristics observed in a survey
• Estimate impact as the difference in weighted average
outcomes between beneficiaries and matched
nonbeneficiaries
• Propensity score matching matches households on
estimated probability of being in the program
• With matching, the quality of the evaluation depends
heavily on the quality of the data: not as convincing as
randomization
17. Impact Evaluation Methodologies
Many of the projects being presented here may be able
to rely on matching methods for their evaluation
• Need detailed data from the baseline or on variables
that change very little over time (adult education level)
Tips on Using Propensity Score Matching
• Need variables that are correlated with the outcome and
with the treatment
• Comparison households should come from the same
community as treated households if possible; otherwise
include many community-level variables
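To make the matching mechanics concrete, here is a minimal propensity score matching sketch in Python using scikit-learn. The function name, data layout, and one-to-one nearest-neighbor matching on the score are illustrative assumptions; PSM has several variants (kernel weighting, matching with replacement, calipers) and this is only one of them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def psm_att(X, treated, y, caliper=None):
    """Average treatment effect on the treated (ATT) via one-to-one
    nearest-neighbor propensity score matching.

    X       : (n, k) array of baseline covariates
    treated : (n,) boolean array, True for program beneficiaries
    y       : (n,) array of outcome values
    """
    # Step 1: estimate each household's probability of being in the program
    scores = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

    t_idx = np.where(treated)[0]
    c_idx = np.where(~treated)[0]

    # Step 2: match each beneficiary to the nonbeneficiary with the
    # closest propensity score
    gaps = []
    for i in t_idx:
        dist = np.abs(scores[c_idx] - scores[i])
        if caliper is not None and dist.min() > caliper:
            continue  # drop treated units with no acceptable match
        j = c_idx[np.argmin(dist)]
        gaps.append(y[i] - y[j])

    # Step 3: ATT is the mean outcome gap across matched pairs
    return float(np.mean(gaps))
```

In practice one would also check covariate balance after matching and restrict the comparison to the region of common support, echoing the tips above about data quality.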
18. Impact Evaluation Methodologies
Regression Discontinuity Design (RDD)
If program eligibility is based on threshold for
some characteristic (e.g., poverty index),
compare outcomes for households just above
and just below the threshold
More useful for poverty programs targeted on
easily observable and measurable criteria
» poverty score, proxy means score, food insecurity
score
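As a sketch of how the RDD estimate on the next slides is computed, the function below compares mean outcomes in a narrow bandwidth on either side of the eligibility cutoff. The bandwidth and the assumption that households below the cutoff receive the program are illustrative; applied RDD work typically fits local linear regressions on each side rather than taking raw means.

```python
import numpy as np

def rdd_estimate(score, y, cutoff, bandwidth):
    """Naive RDD impact estimate: mean outcome just below the cutoff
    minus mean outcome just above it.

    Assumes households with score < cutoff are beneficiaries
    (e.g., a poverty score where lower means poorer).
    """
    below = (score >= cutoff - bandwidth) & (score < cutoff)   # beneficiaries near the cutoff
    above = (score >= cutoff) & (score <= cutoff + bandwidth)  # nonbeneficiaries near the cutoff
    return float(np.mean(y[below]) - np.mean(y[above]))
```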
19. How RDD Measures Impact
Before start of the program
[Figure: Pr(Complete Secondary School) plotted against Poverty Score (20-45) before the start of the program]
20. How RDD Measures Impact
After the program
[Figure: Pr(Complete Secondary School) vs. Poverty Score after the program, with beneficiaries on one side of the eligibility threshold and nonbeneficiaries on the other]
21. How RDD Measures Impact
After the program
[Figure: same plot; the vertical gap between beneficiary and nonbeneficiary outcomes at the threshold, labeled IMPACT, is the RDD estimate of program impact]
22. Example of RDD from El Salvador
RPS Evaluation
Figure 4. Change in enrollment rate of 7-12 year olds from 2006-2007 by distance from
implied cluster threshold, 2006 and 2007 entry groups
Source: Impact Evaluation Survey Data, 2008
[Figure: change in enrollment rate (-.05 to .1) plotted against distance to cluster threshold (-10 to 15), shown separately for the 2006 and 2007 entry groups]
23. Difference-in-Differences (DID)
Using any evaluation method, measure outcomes
before and after the program begins to obtain
“difference-in-differences” (DID) impact estimates
Impact = (T1 - T0) - (C1 - C0)
where T0 and T1 are mean outcomes for the treatment group at baseline and follow-up, and C0 and C1 are the corresponding means for the comparison group
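Reading the changes off the slide 10 chart, the SFP group's anemia prevalence fell 5.3 points while the control group's rose 13.9 points, so the DID impact is (-5.3) - (13.9) = -19.2, matching the reported SFP estimate. A minimal Python sketch of the same computation, with hypothetical baseline levels chosen only to reproduce those changes:

```python
def did_impact(t0, t1, c0, c1):
    """Difference-in-differences impact: (T1 - T0) - (C1 - C0)."""
    return (t1 - t0) - (c1 - c0)

# Hypothetical levels consistent with the slide 10 changes: SFP anemia
# falls from 30.0% to 24.7% (-5.3) while the control group rises from
# 20.0% to 33.9% (+13.9); the levels themselves are made up.
print(did_impact(t0=30.0, t1=24.7, c0=20.0, c1=33.9))  # -19.2
```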
24. Cost Effectiveness
Comparisons of programs should focus on cost
effectiveness.
• Cost effectiveness is most relevant for policy: Which
program has the biggest impact per dollar spent?
• Impact evaluation methodology focuses on measuring
program benefits—one side of cost effectiveness.
Would need to add a cost study similar to
Caldés, Coady and Maluccio, IFPRI, 2004.
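As a stylized illustration of ranking programs by impact per dollar, the sketch below uses entirely hypothetical impact and cost numbers; a real comparison would rest on a careful cost accounting like the Caldés, Coady and Maluccio study cited above.

```python
def impact_per_dollar(impact, cost):
    """Cost effectiveness: estimated impact per dollar spent per beneficiary."""
    return impact / cost

# Hypothetical programs: (impact in percentage points, cost per beneficiary in $)
programs = {"program_A": (12.0, 40.0), "program_B": (8.0, 20.0)}

# Rank programs from most to least cost effective
for name, (impact, cost) in sorted(
        programs.items(),
        key=lambda kv: impact_per_dollar(*kv[1]),
        reverse=True):
    print(f"{name}: {impact_per_dollar(impact, cost):.2f} points per dollar")
```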