1. INTERNATIONAL FOOD POLICY RESEARCH INSTITUTE
Quantitative Impact
Evaluation Methods
Dan Gilligan, IFPRI
INTERNATIONAL LIVESTOCK RESEARCH INSTITUTE
2. An Introduction to
Quantitative Impact Evaluation
I. Why is impact evaluation important?
• What are appropriate goals for an impact
evaluation?
• Monitoring and evaluation
II. How do you design an impact evaluation?
• The evaluation problem
• Measuring causal impact
• Impact evaluation methodologies
3. Introduction (cont'd)
III. Impact Evaluation and Measurement Tools
• Choice of evaluation estimator
• Data requirements
• How to randomize
• Sample design
• Sample size
4. What are appropriate goals for
an impact evaluation?
Measure impact on important outcomes
• Need a limited set of outcome indicators that are easy to
measure
Estimate the program's cost effectiveness
Explain which components of a program work best
Caution:
• Evaluations can only answer a limited number of questions
• Evaluations sometimes cannot explain what caused the
impacts
Effective monitoring and qualitative assessments help to
explain the context for impact evaluation results
5. Indicators for Monitoring and Evaluation
INPUTS: Financial and physical resources (monitoring)
- track resources used in the intervention
- e.g., budget support for local service delivery
OUTPUTS: Goods and services generated (monitoring)
- more local government services delivered
- e.g., textbooks, food delivered, roads built
OUTCOMES: Access, usage and satisfaction of users (evaluation)
- e.g., school attendance, vaccination rates, food consumption, number of mobile phones
IMPACT: Effect on living standards (evaluation)
- better welfare impacts (e.g., literacy, health)
- increase in participation, happiness
Monitoring tracks inputs and outputs; evaluation measures outcomes and impact.
6. II. How do you design an
impact evaluation?
The central problem of impact evaluation
• Want to measure the impact of a program or
“treatment” on outcomes
• How do we know measured impacts are due to the
program?
• If we want to claim that the impacts observed are
causal, we need an 'identification strategy': a way
to attribute the observed effects to the program
and not to other factors
7. II. How do you design an
impact evaluation?
Designing the impact evaluation
• Measure impact by comparing outcomes in households
exposed to the treatment to what those outcomes
would have been without that exposure—the
counterfactual
• Problem: you cannot observe the counterfactual
because program beneficiaries receive the treatment
• Need to construct a comparison group from
nonbeneficiaries
• Comparison group makes it possible to control for other
factors that affect the outcome
Ex: IFPRI evaluated the effect of Ethiopia's public works
(PSNP) on food consumption, but food prices rose at the
same time; use comparison group to remove the effect of
rising prices on food consumption in impact estimates
8. Suppose we observe an increase in outcome Y
for beneficiaries over time after an intervention
[Figure: the observed outcome rises from Y0 at baseline (t0) to Y1 at follow-up (t1), with the intervention occurring between the two survey rounds]
9. To measure impact, we need to remove the
counterfactual from the observed outcome
[Figure: the observed outcome rises from Y0 at baseline (t0) to Y1 at follow-up (t1); the comparison group traces the counterfactual outcome Y1* over the same period]
Impact = Y1 - Y1*
where Y1 is the observed beneficiary outcome at follow-up and Y1* is the counterfactual outcome estimated from the comparison group
10. What You Can Miss Without a
Comparison Group
Impact of School Feeding on Anemia Prevalence of Girls Age 10-13 (*anemic = hemoglobin < 11 g/dL)
[Bar chart: % anemic at Round 1 and Round 2 for the two treatment groups (SFP, THR) and the control group (CTR); anemia prevalence changed by -5.3 points (SFP), -3.4 points (THR), and +13.9 points (CTR) between rounds]
Impact: SFP -19.2%, THR -17.2%
11. Constructing a Comparison Group
Suppose we want to measure the impact of public works
on household food security (calorie consumption)
Q: Why not compare average calorie consumption of PW
beneficiaries to average calorie consumption of randomly
selected nonbeneficiaries?
A: On average, nonbeneficiaries are different from
beneficiaries in ways that make them an ineffective
comparison group
Need to correct for pre-program differences between
beneficiaries and nonbeneficiaries
• Beneficiaries are usually poorer; they also decided to participate
• If you don't control for this, impact estimates are biased
12. Impact Evaluation Methodologies
Ways of constructing a control or comparison group
Randomization
Matching (including propensity score matching,
covariate matching)
Regression discontinuity design (RDD)
Instrumental variables
Difference-in-differences
13. Impact Evaluation Methodologies
Randomization
• Randomly assign communities or households into treatment
and control groups before the program for the purpose of
evaluation
random assignment makes it likely that treatment and
control communities have identical characteristics on
average at baseline
for safety nets, usually randomize at the community
level
• Common approach: use phased rounds of program
implementation and randomly decide which communities
enter the program in each round
• Example of randomization from N. Uganda school feeding
study
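As a concrete illustration of the phased roll-out approach above, the sketch below randomly assigns communities to implementation rounds. This is a minimal Python sketch under stated assumptions: the community identifiers, the number of rounds, and the random seed are hypothetical, not taken from the Uganda study.

```python
import random

def assign_rounds(communities, n_rounds=3, seed=42):
    """Randomly assign communities to phased implementation rounds.

    Communities entering in later rounds serve as the control group
    for earlier rounds until the program reaches them.
    """
    rng = random.Random(seed)   # fixed seed makes the assignment reproducible and auditable
    shuffled = communities[:]   # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    # Deal the shuffled communities into n_rounds roughly equal groups
    return {community: i % n_rounds + 1 for i, community in enumerate(shuffled)}

# Hypothetical community identifiers for illustration
communities = [f"community_{i:02d}" for i in range(1, 13)]
print(assign_rounds(communities))
```

Because entry order is random, communities assigned to later rounds provide a valid control group for round-1 communities at the first follow-up survey.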
14. Impact Evaluation Methodologies
Randomization
• How do you justify having a control group?
Justified if program cannot reach all communities at once
Some communities are always excluded
Main difference between control group and other
nonbeneficiaries is that you interview the control group
Ex: transparency in Nicaragua RPS evaluation. Randomization
done in public with media and politicians present
• There is consensus that a randomized-out control group
provides the best estimate of counterfactual outcomes
Results of a good randomized evaluation will be convincing to
everyone: you have solid evidence of the impact of the
program
15. Impact Evaluation Methodologies
Matching
• Match beneficiary and nonbeneficiary households by
characteristics observed in a survey
• Estimate impact as the difference in weighted average
outcomes between beneficiaries and matched
nonbeneficiaries
• Propensity score matching matches households on
estimated probability of being in the program
• With matching, the quality of the evaluation depends
heavily on the quality of the data: not as convincing as
randomization
17. Impact Evaluation Methodologies
Many of the projects being presented here may be able
to rely on matching methods for their evaluation
• Need detailed data from the baseline or on variables
that change very little over time (adult education level)
Tips on Using Propensity Score Matching
• Need variables that are correlated with the outcome and
with the treatment
• Comparison households should come from the same
community as treated households if possible; otherwise
include many community-level variables
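To make the matching mechanics concrete, here is a minimal propensity score matching sketch in Python using scikit-learn. The function name, data layout, and one-to-one nearest-neighbor matching on the score are illustrative assumptions; PSM has several variants (kernel weighting, matching with replacement, calipers) and this is only one of them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def psm_att(X, treated, y, caliper=None):
    """Average treatment effect on the treated (ATT) via one-to-one
    nearest-neighbor propensity score matching.

    X       : (n, k) array of baseline covariates
    treated : (n,) boolean array, True for program beneficiaries
    y       : (n,) array of outcome values
    """
    # Step 1: estimate each household's probability of being in the program
    scores = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

    t_idx = np.where(treated)[0]
    c_idx = np.where(~treated)[0]

    # Step 2: match each beneficiary to the nonbeneficiary with the
    # closest propensity score
    gaps = []
    for i in t_idx:
        dist = np.abs(scores[c_idx] - scores[i])
        if caliper is not None and dist.min() > caliper:
            continue  # drop treated units with no acceptable match
        j = c_idx[np.argmin(dist)]
        gaps.append(y[i] - y[j])

    # Step 3: ATT is the mean outcome gap across matched pairs
    return float(np.mean(gaps))
```

In practice one would also check covariate balance after matching and restrict the comparison to the region of common support, echoing the tips above about data quality.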
18. Impact Evaluation Methodologies
Regression Discontinuity Design (RDD)
If program eligibility is based on threshold for
some characteristic (e.g., poverty index),
compare outcomes for households just above
and just below the threshold
More useful for poverty programs targeted on
easily observable and measurable criteria
» poverty score, proxy means score, food insecurity
score
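As a sketch of how the RDD estimate on the next slides is computed, the function below compares mean outcomes in a narrow bandwidth on either side of the eligibility cutoff. The bandwidth and the assumption that households below the cutoff receive the program are illustrative; applied RDD work typically fits local linear regressions on each side rather than taking raw means.

```python
import numpy as np

def rdd_estimate(score, y, cutoff, bandwidth):
    """Naive RDD impact estimate: mean outcome just below the cutoff
    minus mean outcome just above it.

    Assumes households with score < cutoff are beneficiaries
    (e.g., a poverty score where lower means poorer).
    """
    below = (score >= cutoff - bandwidth) & (score < cutoff)   # beneficiaries near the cutoff
    above = (score >= cutoff) & (score <= cutoff + bandwidth)  # nonbeneficiaries near the cutoff
    return float(np.mean(y[below]) - np.mean(y[above]))
```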
19. How RDD Measures Impact
Before start of the program
[Figure: Pr(Complete Secondary School) plotted against Poverty Score (20-45) before the start of the program]
20. How RDD Measures Impact
After the program
[Figure: Pr(Complete Secondary School) vs. Poverty Score after the program, with beneficiaries on one side of the eligibility threshold and nonbeneficiaries on the other]
21. How RDD Measures Impact
After the program
[Figure: same plot; the vertical gap between beneficiary and nonbeneficiary outcomes at the threshold, labeled IMPACT, is the RDD estimate of program impact]
22. Example of RDD from El Salvador
RPS Evaluation
Figure 4. Change in enrollment rate of 7-12 year olds from 2006-2007 by distance from
implied cluster threshold, 2006 and 2007 entry groups
Source: Impact Evaluation Survey Data, 2008
[Figure: change in enrollment rate (-.05 to .1) plotted against distance to cluster threshold (-10 to 15), shown separately for the 2006 and 2007 entry groups]
23. Difference-in-Differences (DID)
Using any evaluation method, measure outcomes
before and after the program begins to obtain
“difference-in-differences” (DID) impact estimates
Impact = (T1 - T0) - (C1 - C0)
where T0 and T1 are mean outcomes for the treatment group at baseline and follow-up, and C0 and C1 are the corresponding means for the comparison group
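Reading the changes off the slide 10 chart, the SFP group's anemia prevalence fell 5.3 points while the control group's rose 13.9 points, so the DID impact is (-5.3) - (13.9) = -19.2, matching the reported SFP estimate. A minimal Python sketch of the same computation, with hypothetical baseline levels chosen only to reproduce those changes:

```python
def did_impact(t0, t1, c0, c1):
    """Difference-in-differences impact: (T1 - T0) - (C1 - C0)."""
    return (t1 - t0) - (c1 - c0)

# Hypothetical levels consistent with the slide 10 changes: SFP anemia
# falls from 30.0% to 24.7% (-5.3) while the control group rises from
# 20.0% to 33.9% (+13.9); the levels themselves are made up.
print(did_impact(t0=30.0, t1=24.7, c0=20.0, c1=33.9))  # -19.2
```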
24. Cost Effectiveness
Comparisons of programs should focus on cost
effectiveness.
• Cost effectiveness is most relevant for policy: Which
program has the biggest impact per dollar spent?
• Impact evaluation methodology focuses on measuring
program benefits—one side of cost effectiveness.
Would need to add a cost study similar to
Caldés, Coady and Maluccio, IFPRI, 2004.
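As a stylized illustration of ranking programs by impact per dollar, the sketch below uses entirely hypothetical impact and cost numbers; a real comparison would rest on a careful cost accounting like the Caldés, Coady and Maluccio study cited above.

```python
def impact_per_dollar(impact, cost):
    """Cost effectiveness: estimated impact per dollar spent per beneficiary."""
    return impact / cost

# Hypothetical programs: (impact in percentage points, cost per beneficiary in $)
programs = {"program_A": (12.0, 40.0), "program_B": (8.0, 20.0)}

# Rank programs from most to least cost effective
for name, (impact, cost) in sorted(
        programs.items(),
        key=lambda kv: impact_per_dollar(*kv[1]),
        reverse=True):
    print(f"{name}: {impact_per_dollar(impact, cost):.2f} points per dollar")
```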