What goes up must come down: challenges of getting evidence back to the ground
Reviewing the quality of evidence in humanitarian evaluations (Juliet Parker, Christian Aid, and David Sanderson, Oxford Brookes University)
1. Reviewing the quality of evidence
in humanitarian evaluations
Review of four evaluations
Juliet Parker, Christian Aid
David Sanderson, CENDEP, Oxford Brookes University
ALNAP, March 2013
2. Four parts
1. Why did Christian Aid want to do
this?
2. The evidence assessment tool
3. Quality of evidence - assessing four
evaluations
4. So what for Christian Aid?
3. 1. Why do this?
We want to improve the quality of our
evaluations:
• For our own analysis and decision making
• To get our money’s worth from evaluation
consultants(!)
• As part of a challenge to, and move across,
the sector
4. 2. The tool used
BOND's 'checklist for assessing the quality of evidence':
• Developed in 2011-12 through NGO and donor consultation
• Five principles, each with four questions scored on a scale of 1-4 …
5. Five principles
• Voice and inclusion – ‘the perspectives of people living in
poverty, including the most marginalised, are included in
the evidence, and a clear picture is provided of who is
affected and how’
• Appropriateness – ‘the evidence is generated through
methods that are justifiable given the nature of the
purpose of the assessment’
• Triangulation – ‘the evidence has been generated using a
mix of methods, data sources, and perspectives’
• Contribution – ‘the evidence explores how change
happens and the contribution of the intervention and
factors outside the intervention in explaining change’
• Transparency - ‘the evidence discloses the details of the
data sources and methods used, the results
achieved, and any limitations in the data or conclusions’
6. Checklist to assess evidence quality
Evidence being assessed: ………………………….. Name of assessor: …………………………..
Each criterion is scored on a scale of 1-4, with space for comments/evidence.

1) Voice and Inclusion
We present beneficiaries’ views on the effects of the intervention, and identify who has been affected and how
• 1a. Are the perspectives of beneficiaries included in the evidence?
• 1b. Are the perspectives of the most excluded and marginalised groups included in the evidence?
• 1c. Are the findings disaggregated according to sex, disability and other relevant social differences?
• 1d. Did beneficiaries play an active role in the assessment process?
Score for voice and inclusion: 0/16

2) Appropriateness
We use methods that are justifiable given the nature of the intervention and purpose of the assessment
• 2a. Are the data collection methods relevant to the purpose of the assessment and do they generate reliable data?
• 2b. Is the size and composition of the sample in proportion to the conclusions sought by the assessment?
• 2c. Does the team have the skills and characteristics to deliver high quality data collection and analysis?
• 2d. Do the methods for analysis unpack the data in a systematic way and produce convincing conclusions?
Score for appropriateness: 0/16

3) Triangulation
We make conclusions about the intervention’s effects by using a mix of methods, data sources, and perspectives
• 3a. Are different data collection methodologies used and different types of data collected?
• 3b. Are the perspectives of different stakeholders compared and analysed in establishing if and how change has occurred?
• 3c. Are conflicting findings and divergent perspectives presented and explained in the analysis and conclusions?
• 3d. Are the findings and conclusions of the assessment shared with and validated by a range of key stakeholders (eg. beneficiaries, partners, peers)?
Score for triangulation: 0/16

4) Contribution
We can show how change happened and explain how we contributed to this
• 4a. Is a point of comparison used to show that change has happened (eg. a baseline, a counterfactual, comparison with a similar group)?
• 4b. Is the explanation of how the intervention contributes to change explored?
• 4c. Are alternative factors (eg. the contribution of other actors) explored to explain the observed result alongside an intervention’s contribution?
• 4d. Are unintended and unexpected changes (positive or negative) identified and explained?
Score for contribution: 0/16

5) Transparency
We are open about the data sources and methods used, the results achieved, and the strengths and limitations of the evidence
• 5a. Is the size and composition of the group from which data is collected explained and justified?
• 5b. Are the methods used to collect and analyse data and any limitations of the quality of the data and collection methodology explained and justified?
• 5c. Is it clear who has collected and analysed the data and is any potential bias they may have explained and justified?
• 5d. Is there a clear logical link between the conclusions presented and the data collected?
Score for transparency: 0/16
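The arithmetic of the checklist is simple enough to sketch in code. A minimal Python sketch (the helper name `score_principle` is mine, not part of the BOND tool), assuming each principle has exactly four criteria scored from 1 (weak) to 4 (gold standard):

```python
# Sketch of the BOND checklist arithmetic: four criteria per principle,
# each scored 1 (weak) to 4 (gold standard), totalled out of 16.

WEAK, MINIMUM, GOOD, GOLD = 1, 2, 3, 4

def score_principle(criteria_scores):
    """Sum four criterion scores (each 1-4) into a principle total out of 16."""
    assert len(criteria_scores) == 4, "each principle has exactly four criteria"
    assert all(1 <= s <= 4 for s in criteria_scores), "scores run from 1 to 4"
    return sum(criteria_scores)

# e.g. weak on three criteria and good on one: 1 + 1 + 3 + 1 = 6 out of 16
example_total = score_principle([WEAK, WEAK, GOOD, WEAK])
```

The per-principle total out of 16 is what the form records in the "Score for …" rows above.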
7. Checklist for criteria
(eg. voice and inclusion, and appropriateness)
Score levels: 1 = Weak evidence, 2 = Minimum standard of evidence, 3 = Good standard of evidence, 4 = Gold standard evidence

1) Voice and Inclusion

1a. Are the perspectives of beneficiaries included in the evidence?
1: No beneficiary perspectives presented
2: Beneficiary perspectives presented, but not integrated into analysis
3: Beneficiary perspectives presented and integrated into analysis
4: Beneficiary perspectives presented and integrated into analysis, and beneficiaries have validated the findings; the evidence is strongly grounded in the voices of the poor

1b. Are the perspectives of the most excluded and marginalised groups included in the evidence?
1: No perspectives from most excluded groups presented
2: Perspectives from most excluded groups presented, but not integrated into analysis
3: Perspectives from most excluded groups presented and integrated into analysis
4: Perspectives from most excluded groups presented and integrated into analysis, and excluded groups have validated the findings; the evidence is strongly grounded in the voices of the most excluded

1c. Are the findings disaggregated according to sex, disability and other relevant social differences?
1: No disaggregation of findings by social differences
2: Findings are disaggregated, but a number of social differences relevant to the intervention are missing
3: Findings are disaggregated according to all social differences relevant to the intervention
4: Findings are disaggregated according to all social differences relevant to the intervention, and why these have been chosen has been clearly explained

1d. Did beneficiaries play an active role in the assessment process?
1: Beneficiaries had no involvement in the assessment process
2: Beneficiaries had involvement in one of the following: (1) designing the process (2) analysing the data (3) formulating the conclusions
3: Beneficiaries had involvement in two of the following: (1) designing the process (2) analysing the data (3) formulating the conclusions
4: Beneficiaries had involvement in all of the following: (1) designing the process (2) analysing the data (3) formulating the conclusions

2) Appropriateness

2a. Are the data collection methods relevant to the purpose of the assessment and do they generate reliable data?
1: The methods of data collection are not relevant to the purpose of the assessment and/or the data is unreliable
2: The methods of data collection are relevant to the purpose of the assessment, but there is uncertainty about the reliability of some of the data
3: Methods of data collection are relevant to the purpose of the assessment and generate reliable data
4: Methods of data collection are relevant to the purpose of the assessment and generate highly reliable data; there has been appropriate quality control of the data (eg. spot checks, training data collectors)

2b. Is the size and composition of the sample in proportion to the conclusions sought by the assessment?
1: Conclusions are not in proportion to the size and composition of the sample and lack validity
2: Conclusions claim no more than the size and composition of the sample allows, but there is uncertainty about their validity
3: Conclusions are in proportion to the size and composition of the sample and are valid
4: Conclusions are in proportion to the size and composition of the sample and have a high degree of validity

2c. Does the team have the skills and characteristics to deliver high quality data collection and analysis?
1: There are doubts about the skills and/or characteristics of the combined team
2: The combined team appear to have the necessary skills and characteristics
3: The combined team have demonstrated the necessary skills and characteristics
4: The combined team have demonstrated both exceptional skills and the characteristics necessary for the task

2d. Is the data analysed in a systematic way that leads to convincing conclusions?
1: The method through which the data is analysed is not clear and the conclusions are not convincing
2: The data is analysed through a clear method, but not every conclusion is wholly convincing
3: The data is analysed through a clear and systematic method that produces convincing conclusions in all key areas
4: The data is analysed through a clear and systematic method that produces convincing conclusions in all key areas; there is a detailed analysis of the implications of the conclusions
8. Review of four evaluations
1. DRC Final phase evaluation, August 2011
(assistance to conflict and displacement)
2. Tropical storms in the Philippines end-of-project evaluation, October 2011 (response to typhoon Ketsana)
3. Middle East Crisis Impact Evaluation final
report, May 2011 (Gaza crisis)
4. Sudan Appeal End of term evaluation,
April 2011 (conflict in Darfur)
9. Scores per criterion for the four evaluations (columns D, M, P, S)

1) Voice and Inclusion
We present beneficiaries’ views on the effects of the intervention, and identify who has been affected and how
• 1a. Are the perspectives of beneficiaries included in the evidence? (3, 3, 2, 1)
• 1b. Are the perspectives of the most excluded and marginalised groups included in the evidence? (1, 1, 1, 1)
• 1c. Are the findings disaggregated according to sex, disability and other relevant social differences? (1, 1, 1, 1)
• 1d. Did beneficiaries play an active role in the assessment process? (1, 1, 1, 1)

2) Appropriateness
We use methods that are justifiable given the nature of the intervention and purpose of the assessment
• 2a. Are the data collection methods relevant to the purpose of the assessment and do they generate reliable data? (3, 3, 2, 3)
• 2b. Is the size and composition of the sample in proportion to the conclusions sought by the assessment? (1, 4, 1, 1)
• 2c. Does the team have the skills and characteristics to deliver high quality data collection and analysis? (2, 3, 1, 2)
• 2d. Do the methods for analysis unpack the data in a systematic way and produce convincing conclusions? (1, 3, 1, 1)

3) Triangulation
We make conclusions about the intervention’s effects by using a mix of methods, data sources, and perspectives
• 3a. Are different data collection methodologies used and different types of data collected? (2, 4, 2, 2)
• 3b. Are the perspectives of different stakeholders compared and analysed in establishing if and how change has occurred? (3, 3, 2, 3)
• 3c. Are conflicting findings and divergent perspectives presented and explained in the analysis and conclusions? (3, 3, 1, 3)
• 3d. Are the findings and conclusions of the assessment shared with and validated by a range of key stakeholders (eg. beneficiaries, partners, peers)? (2, 1, 2, 3)

4) Contribution
We can show how change happened and explain how we contributed to this
• 4a. Is a point of comparison used to show that change has happened (eg. a baseline, a counterfactual, comparison with a similar group)? (1, 1, 1, 1)
• 4b. Is the explanation of how the intervention contributes to change explored? (2, 3, 1, 1)
• 4c. Are alternative factors (eg. the contribution of other actors) explored to explain the observed result alongside an intervention’s contribution? (2, 1, 1, 1)
• 4d. Are unintended and unexpected changes (positive or negative) identified and explained? (3, 2, 1, 1)

5) Transparency
We are open about the data sources and methods used, the results achieved, and the strengths and limitations of the evidence
• 5a. Is the size and composition of the group from which data is collected explained and justified? (1, 1, 1, 3)
• 5b. Are the methods used to collect and analyse data and any limitations of the quality of the data and collection methodology explained and justified? (1, 2, 1, 2)
• 5c. Is it clear who has collected and analysed the data and is any potential bias they may have explained and justified? (1, 1, 1, 1)
• 5d. Is there a clear logical link between the conclusions presented and the data collected? (2, 3, 1, 1)
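Per-principle totals for each evaluation can be derived mechanically from these scores. A minimal Python sketch (the dict layout and variable names are mine), using the voice and inclusion rows above:

```python
# Voice and inclusion scores per criterion, in the column order D, M, P, S.
scores = {
    "1a": (3, 3, 2, 1),
    "1b": (1, 1, 1, 1),
    "1c": (1, 1, 1, 1),
    "1d": (1, 1, 1, 1),
}

# Total per evaluation, out of a possible 16 each:
# zip(*...) regroups the per-criterion rows into per-evaluation columns.
totals = [sum(column) for column in zip(*scores.values())]
# D totals 6, M totals 6, P totals 5, S totals 4
```

The low totals relative to the 16-point maximum are what drive the findings on the next slides.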
10. Findings
Voice and inclusion
• No mention that most excluded or marginalised groups were
included
• No evaluations provided data by gender
• No mention that beneficiaries engaged in the assessment process,
eg analysing data
Appropriateness
• ‘Good’ data collection methods, involving qualitative review, focus
group discussions and review of reports
• But no information was given on sample size
Triangulation
• Data collection methods: one ‘gold standard’, three at the minimal level
• Varied practice in presenting findings back to people
11. Findings …..
Contribution
• No baselines (not unusual)
• Little/no exploration of how interventions contributed to change
• Unintended and unexpected changes: two ‘weak’, one ‘minimal’ and
one ‘good’
Transparency
• Three evaluations were ‘weak’ in explaining the composition of the
group from which data was collected
• Data collection and analysis for two was ‘weak’ and for two ‘minimal’
• Explanation and discussion of bias was ‘weak’ for all four evaluations
12. In summary
• ‘The quality of evidence in the
evaluations was found to be low in almost
every category identified by the BOND
tool, ie voice and
inclusion, appropriateness, triangulation,
contribution and transparency.’
• ‘That does not mean the project was bad
- it means it’s hard to tell.’
13. Observations on the BOND tool
• The tool prioritises affected populations –
good for accountability
• Assumes a thorough write-up of methodology
– not current practice
• Assumes no baseline means a poor
evaluation – yet for disasters this is the norm,
not the exception
• Ultimately it’s subjective judgement based on
interpretation of words (academic similarity)
• … that’s the nature of the business
14. 4. So what for Christian Aid?
• Be clearer on what we’re expecting of
our evaluation consultants
• Repeat the process next year
• Improve the quality of our data
collection during programme
implementation