The document discusses evaluation theory and its importance in guiding evaluation practice. It makes three key points:
1) Evaluation theory helps guide methodological choices by providing strategies for when and how to employ different evaluation methods.
2) Comparing evaluation theories aids in identifying debates in the field and unsettled issues in practice.
3) Evaluation theory is important for evaluators' professional identities by differentiating their skills from others and helping choose methods to achieve intended goals.
Volume XI, Number 2, Summer 2005
Issue Topic: Evaluation Methodology
Theory & Practice
Evaluation Theory or What Are Evaluation Methods for?
Mel Mark, professor of psychology at the Pennsylvania State University and president-elect of the American Evaluation
Association, discusses why theory is important to evaluation practice.
Hey, this issue of The Evaluation Exchange focuses on methods, including recent methodological developments. What’s
this piece on evaluation theory doing here? Was there some kind of mix-up?
No, it’s not a mistake. Although evaluation theory1 serves several purposes, perhaps it functions most importantly as a
guide to practice. Learning the latest methodological advance—whether it’s some new statistical adjustment for selection
bias or the most recent technique to facilitate stakeholder dialogue—without knowing the relevant theory is a bit like
learning what to do without knowing why or when.
What you risk is the equivalent of becoming really skilled at tuning your car’s engine without thinking about whether your
transportation needs involve going across town, overseas, or to the top of a skyscraper. Will Shadish, Tom Cook, and
Laura Leviton make the same point using a military metaphor: “Evaluation theories are like military strategies and tactics;
methods are like military weapons and logistics,” they say. “The good commander needs to know strategy and tactics to
deploy weapons properly or to organize logistics in different situations. The good evaluator needs theories for the same
reasons in choosing and employing methods.”2
The reasons to learn about evaluation theory go beyond the strategy/tactic or why/how distinction, however. Evaluation
theory does more than help us make good judgments about what kind of methods to use, under what circumstances, and
toward what forms of evaluation influence.
First, evaluation theories are a way of consolidating lessons learned, that is, of synthesizing prior experience. Carol
Weiss’ work can help evaluators develop a more sophisticated and nuanced understanding of the way organizations
make decisions and may be influenced by evaluation findings.3 Theories enable us to learn from the experience of others
(as the saying goes, we don’t live long enough to learn everything from our own mistakes). George Madaus, Michael
Scriven, and Daniel Stufflebeam had this function of evaluation theory in mind when they said that evaluators who are
unknowledgeable about theory are “doomed to repeat past mistakes and, equally debilitating, will fail to sustain and build
on past successes.”4
Second, comparing evaluation theories is a useful way of identifying and better understanding the key areas of debate
within the field. Comparative study of evaluation theory likewise helps crystallize what the unsettled issues are for
practice. When we read the classic exchange between Michael Patton and Carol Weiss,5 for example, we learn about
very different perspectives on what evaluation use can or should look like.
A third reason for studying evaluation theory is that theory should be an important part of our identities as evaluators, both
individually and collectively. If we think of ourselves in terms of our methodological skills, what is it that differentiates us
from many other people with equal (or even superior) methodological expertise? Evaluation theory. Evaluation theory, as
Will Shadish said in his presidential address to the American Evaluation Association, is “who we are.”6 But people come
to evaluation through quite varied pathways, many of which don’t involve explicit training in evaluation. That there are
myriad pathways into evaluation is, of course, a source of great strength for the field, bringing diversity of skills, opinions,
knowledge sets, and so on.
Despite the positive consequences of the various ways that people enter the field, this diversity also reinforces the
importance of studying evaluation theories. Methods are important, but, again, they need to be chosen in the service of
some larger end. Theory helps us figure out where an evaluation should be going and why—and, not trivially, what it is to
be an evaluator.
Of course, knowing something about evaluation theory doesn’t mean that choices about methods can be made
automatically. Indeed, lauded evaluation theorist Ernie House notes that while theorists typically have a high profile,
“practitioners lament that [theorists’] ideas are far too theoretical, too impractical. Practitioners have to do the project work
tomorrow, not jawbone fruitlessly forever.”7 Especially for newer theoretical work, the translation into practice may not be
clear—and sometimes not even feasible. Even evaluation theory that has withstood the test of time doesn’t automatically
translate into some cookbook or paint-by-number approach to evaluation practice.
More knowledge about evaluation theory can, especially at first, actually make methods choices harder. Why? Because
many evaluation theories take quite different stances about what kind of uses evaluation should focus on, and about how
evaluation should be done to achieve those uses. For example, to think about Donald Campbell8 as an evaluation theorist
is to highlight (a) the possibility of major choice points in the road, such as decisions about whether or not to implement
some new program; (b) the way decisions about such things often depend largely on the program’s potential effects (e.g.,
does universal pre-K lead to better school readiness and other desirable outcomes?); and (c) the benefits of either
randomized experiments or the best-available quasi-experimental data for assessing program effects.
In contrast, when we think about Joseph Wholey9 as an evaluation theorist, we focus on a very different way that
evaluation can contribute: through developing performance-measurement systems that program administrators can use to
improve their ongoing decision making. These performance measurement systems can help managers identify problem
areas and also provide them with good-enough feedback about the apparent consequences of decisions.
Choosing among these and the many other perspectives available in evaluation theories may seem daunting, especially
at first. But it’s better to learn to face the choices than to have them made implicitly by some accident of one’s
methodological training. In addition, theories themselves can help in the choosing. Some evaluation theories have huge
“default options.” These theories may not exactly say “one size fits all,” but they certainly suggest that one size fits darn
near all. Indeed, one of the dangers for those starting to learn about evaluation theory is becoming a true believer in one
of the first theories they encounter. When this happens, the new disciple may act like his or her preferred theory fits all
circumstances. Perhaps the most effective antidote to this problem is to be sure to learn about several evaluation theories
that take fairly different stances. Metaphorically, we probably need to be multilingual: No single evaluation theory should
be “spoken” in all the varying contexts we will encounter.
However, most, if not all, evaluation theories are contingent; that is, they prescribe (or at least are open to) quite different
approaches under different circumstances. As it turns out, there even exist theories that suggest very different bases for
contingent decision making. Put differently, there are theories that differ significantly on reasons for deciding to use one
evaluation design and not another.
These lead us to think about different “drivers” of contingent decision making. For example, Michael Patton’s well-
known Utilization-Focused Evaluation tells us to be contingent based on intended use by intended users. Almost any
method may be appropriate, if it is likely to help intended users make the intended use.10 Alternatively, in a recent book,
Huey-Tsyh Chen joins others who suggest that the choices made in evaluation should be driven by program
stage.11 Evaluation purposes and methods for a new program, according to Chen, would typically be different from those
for a mature program. Gary Henry, George Julnes, and I12 have suggested that choices among alternative evaluation
purposes and methods should be driven by a kind of analytic assessment of each one’s likely contribution to social
betterment.13
It can help to be familiar with any one of these fundamentally contingent evaluation theories. And, as is true of evaluation
theories in general, one or another may fit better, depending on the specific context. Nevertheless, the ideal would
probably be to be multilingual even in terms of these contingent evaluation theories. For instance, sometimes intended
use may be the perfect driver of contingent decision making, but in other cases decision making may be so distributed
across multiple parties that it isn’t feasible to identify specific intended users: Even the contingencies are contingent.
Evaluation theories are an aid to thoughtful judgment—not a dispensation from it. But as an aid to thoughtful choices
about methods, evaluation theories are indispensable.
1 Although meaningful distinctions could perhaps be made, here I am treating evaluation theory as equivalent to evaluation model and to the way the term evaluation approach is sometimes used.
2 Shadish, W. R., Cook, T. D., & Leviton, L. C. (1991). Foundations of program evaluation: Theories of practice. Newbury
Park, CA: Sage.
3 For a recent overview, see Weiss, C. H. (2004). Rooting for evaluation: A Cliff Notes version of my work. In M. C. Alkin (Ed.), Evaluation roots: Tracing theorists’ views and influences (pp. 12–65). Thousand Oaks, CA: Sage.
4 Madaus, G. F., Scriven, M., & Stufflebeam, D. L. (1983). Evaluation models. Boston: Kluwer-Nijhoff.
5 See the papers by Weiss and Patton in Vol. 9, No. 1 (1988) of Evaluation Practice, reprinted in M. Alkin (Ed.).
(1990). Debates on evaluation. Newbury Park, CA: Sage.
6 Shadish, W. (1998). Presidential address: Evaluation theory is who we are. American Journal of Evaluation, 19(1), 1–
19.
7 House, E. R. (2003). Evaluation theory. In T. Kellaghan & D. L. Stufflebeam (Eds.), International handbook of
educational evaluation (pp. 9–14). Boston: Kluwer Academic.
8 For an overview, see the chapter on Campbell in the book cited in footnote 2.
9 See, e.g., Wholey, J. S. (2003). Improving performance and accountability: Responding to emerging management
challenges. In S. I. Donaldson & M. Scriven (Eds.), Evaluating social programs and problems: Visions for the new
millennium. Mahwah, NJ: Lawrence Erlbaum.
10 Patton, M. Q. (1997). Utilization-focused evaluation: The new century text. Thousand Oaks, CA: Sage.
11 Chen, H.-T. (2005). Practical program evaluation: Assessing and improving planning, implementation, and effectiveness. Thousand Oaks, CA: Sage.
12 Mark, M. M., Henry, G. T., & Julnes, G. (2000). Evaluation: An integrated framework for understanding, guiding, and
improving policies and programs. San Francisco, CA: Jossey-Bass.
13 Each of these three theories also addresses other factors that help drive contingent thinking about evaluation. In
addition, at least in certain places, there is more overlap among these models than this brief summary suggests.
Nevertheless, they do in general focus on different drivers of contingent decision making, as noted.
Mel Mark, Ph.D.
Professor of Psychology
The Pennsylvania State University
Department of Psychology
407 Moore Hall
University Park, PA 16802.
Tel: 814-863-1755
Email: m5m@psu.edu
kirkpatrick's learning and training evaluation theory
Donald L Kirkpatrick's training evaluation model - the four levels of learning evaluation
also below - HRD performance evaluation guide
Donald L Kirkpatrick, Professor Emeritus, University Of Wisconsin
(where he achieved his BBA, MBA and PhD), first published his ideas in
1959, in a series of articles in the Journal of American Society of
Training Directors. The articles were subsequently included in
Kirkpatrick's book Evaluating Training Programs (originally published in
1994; now in its 3rd edition - Berrett-Koehler Publishers).
Donald Kirkpatrick was president of the American Society for Training
and Development (ASTD) in 1975. Kirkpatrick has written several other
significant books about training and evaluation, more recently with his
similarly inclined son James, and has consulted with some of the world's
largest corporations.
Donald Kirkpatrick's 1994 book Evaluating Training Programs defined
his originally published ideas of 1959, thereby further increasing
awareness of them, so that his theory has now become arguably the
most widely used and popular model for the evaluation of training and
learning. Kirkpatrick's four-level model is now considered an industry
standard across the HR and training communities.
More recently Don Kirkpatrick formed his own company, Kirkpatrick
Partners, whose website provides information about their services and
methods, etc.
kirkpatrick's four levels of evaluation model
The four levels of Kirkpatrick's evaluation model essentially measure:
• reaction of student - what they thought and felt about
the training
• learning - the resulting increase in knowledge or
capability
• behaviour - extent of behaviour and capability
improvement and implementation/application
• results - the effects on the business or environment
resulting from the trainee's performance
All these measures are recommended for full
and meaningful evaluation of learning in organizations, although their
application broadly increases in complexity, and usually cost, through
the levels from level 1-4.
Quick Training Evaluation and Feedback Form, based on
Kirkpatrick's Learning Evaluation Model - (Excel file)
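As a rough aide-memoire only (this sketch is my own illustration in Python, not part of Kirkpatrick's materials; the questions and example measures are paraphrased from the list above), the four levels can be captured in a small structure like this:

```python
from dataclasses import dataclass

@dataclass
class EvaluationLevel:
    number: int
    name: str
    question: str          # what the level asks
    example_measure: str   # a typical way to measure it

# Hypothetical summary of the four Kirkpatrick levels described above.
KIRKPATRICK_LEVELS = [
    EvaluationLevel(1, "Reaction", "What did the delegates think and feel about the training?",
                    "Post-training feedback form ('happy sheet')"),
    EvaluationLevel(2, "Learning", "How much did knowledge or capability increase?",
                    "Assessment or test before and after the training"),
    EvaluationLevel(3, "Behaviour", "Is the learning applied and implemented back on the job?",
                    "Observation and interview by line managers over time"),
    EvaluationLevel(4, "Results", "What effect did the trainee's performance have on the business?",
                    "Existing key performance indicators in management reporting"),
]

for level in KIRKPATRICK_LEVELS:
    print(f"Level {level.number} - {level.name}: {level.question}")
```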
kirkpatrick's four levels of training evaluation
This grid illustrates the basic Kirkpatrick structure at a glance. The
second grid, beneath this one, is the same thing with more detail.
Each level below is summarised under three headings: evaluation description and characteristics, examples of evaluation tools and methods, and relevance and practicability.
level 1 - Reaction
description and characteristics: Reaction evaluation is how the delegates felt about the training or learning experience.
tools and methods: 'Happy sheets', feedback forms. Verbal reaction, post-training surveys or questionnaires.
relevance and practicability: Quick and very easy to obtain. Not expensive to gather or to analyse.
level 2 - Learning
description and characteristics: Learning evaluation is the measurement of the increase in knowledge - before and after.
tools and methods: Typically assessments or tests before and after the training. Interview or observation can also be used.
relevance and practicability: Relatively simple to set up; clear-cut for quantifiable skills. Less easy for complex learning.
level 3 - Behaviour
description and characteristics: Behaviour evaluation is the extent of applied learning back on the job - implementation.
tools and methods: Observation and interview over time are required to assess change, relevance of change, and sustainability of change.
relevance and practicability: Measurement of behaviour change typically requires cooperation and skill of line-managers.
level 4 - Results
description and characteristics: Results evaluation is the effect on the business or environment by the trainee.
tools and methods: Measures are already in place via normal management systems and reporting - the challenge is to relate to the trainee.
relevance and practicability: Individually not difficult; unlike whole organisation. Process must attribute clear accountabilities.
kirkpatrick's four levels of training evaluation in detail
This grid illustrates the Kirkpatrick structure in detail, and particularly the modern-day interpretation of the Kirkpatrick learning evaluation model, its usage and implications, and examples of tools and methods. This grid follows the same format as the one above but with more detail and explanation:
Each level below is described under three headings: evaluation description and characteristics, examples of evaluation tools and methods, and relevance and practicability.
level 1 - Reaction
description and characteristics: Reaction evaluation is how the delegates felt, and their personal reactions to the training or learning experience, for example: Did the trainees like and enjoy the training? Did they consider the training relevant? Was it a good use of their time? Did they like the venue, the style, timing, domestics, etc.? Level of participation. Ease and comfort of experience. Level of effort required to make the most of the learning. Perceived practicability and potential for applying the learning.
tools and methods: Typically 'happy sheets'. Feedback forms based on subjective personal reaction to the training experience. Verbal reaction which can be noted and analysed. Post-training surveys or questionnaires. Online evaluation or grading by delegates. Subsequent verbal or written reports given by delegates to managers back at their jobs.
relevance and practicability: Can be done immediately the training ends. Very easy to obtain reaction feedback. Feedback is not expensive to gather or to analyse for groups. Important to know that people were not upset or disappointed. Important that people give a positive impression when relating their experience to others who might be deciding whether to experience the same.
level 2 - Learning
description and characteristics: Learning evaluation is the measurement of the increase in knowledge or intellectual capability from before to after the learning experience: Did the trainees learn what was intended to be taught? Did the trainee experience what was intended for them to experience? What is the extent of advancement or change in the trainees after the training, in the direction or area that was intended?
tools and methods: Typically assessments or tests before and after the training. Interview or observation can be used before and after, although this is time-consuming and can be inconsistent. Methods of assessment need to be closely related to the aims of the learning. Measurement and analysis is possible and easy on a group scale. Reliable, clear scoring and measurements need to be established, so as to limit the risk of inconsistent assessment. Hard-copy, electronic, online or interview-style assessments are all possible.
relevance and practicability: Relatively simple to set up, but more investment and thought required than reaction evaluation. Highly relevant and clear-cut for certain training such as quantifiable or technical skills. Less easy for more complex learning such as attitudinal development, which is famously difficult to assess. Cost escalates if systems are poorly designed, which increases work required to measure and analyse.
level 3 - Behaviour
description and characteristics: Behaviour evaluation is the extent to which the trainees applied the learning and changed their behaviour, and this can be immediately and several months after the training, depending on the situation: Did the trainees put their learning into effect when back on the job? Were the relevant skills and knowledge used? Was there noticeable and measurable change in the activity and performance of the trainees when back in their roles? Was the change in behaviour and new level of knowledge sustained? Would the trainee be able to transfer their learning to another person? Is the trainee aware of their change in behaviour, knowledge, skill level?
tools and methods: Observation and interview over time are required to assess change, relevance of change, and sustainability of change. Arbitrary snapshot assessments are not reliable because people change in different ways at different times. Assessments need to be subtle and ongoing, and then transferred to a suitable analysis tool. Assessments need to be designed to reduce subjective judgement of the observer or interviewer, which is a variable factor that can affect reliability and consistency of measurements. The opinion of the trainee, which is a relevant indicator, is also subjective and unreliable, and so needs to be measured in a consistent, defined way. 360-degree feedback is a useful method and need not be used before training, because respondents can make a judgement as to change after training, and this can be analysed for groups of respondents and trainees. Assessments can be designed around relevant performance scenarios, and specific key performance indicators or criteria. Online and electronic assessments are more difficult to incorporate - assessments tend to be more successful when integrated within existing management and coaching protocols. Self-assessment can be useful, using carefully designed criteria and measurements.
relevance and practicability: Measurement of behaviour change is less easy to quantify and interpret than reaction and learning evaluation. Simple quick-response systems are unlikely to be adequate. Cooperation and skill of observers, typically line-managers, are important factors, and difficult to control. Management and analysis of ongoing subtle assessments are difficult, and virtually impossible without a well-designed system from the beginning. Evaluation of implementation and application is an extremely important assessment - there is little point in a good reaction and a good increase in capability if nothing changes back in the job; therefore evaluation in this area is vital, albeit challenging. Behaviour change evaluation is possible given good support and involvement from line managers or trainees, so it is helpful to involve them from the start, and to identify benefits for them, which links to the level 4 evaluation below.
level 4 - Results
description and characteristics: Results evaluation is the effect on the business or environment resulting from the improved performance of the trainee - it is the acid test. Measures would typically be business or organisational key performance indicators, such as: volumes, values, percentages, timescales, return on investment, and other quantifiable aspects of organisational performance, for instance: numbers of complaints, staff turnover, attrition, failures, wastage, non-compliance, quality ratings, achievement of standards and accreditations, growth, retention, etc.
tools and methods: It is possible that many of these measures are already in place via normal management systems and reporting. The challenge is to identify which of them relate to the trainee's input and influence, and how. Therefore it is important to identify and agree accountability and relevance with the trainee at the start of the training, so they understand what is to be measured. This process overlays normal good management practice - it simply needs linking to the training input. Failure to link to training input type and timing will greatly reduce the ease with which results can be attributed to the training. For senior people particularly, annual appraisals and ongoing agreement of key business objectives are integral to measuring business results derived from training.
relevance and practicability: Individually, results evaluation is not particularly difficult; across an entire organisation it becomes very much more challenging, not least because of the reliance on line-management, and the frequency and scale of changing structures, responsibilities and roles, which complicates the process of attributing clear accountability. Also, external factors greatly affect organisational and business performance, which cloud the true cause of good or poor results.
Since Kirkpatrick established his original model, other theorists (for example Jack Phillips), and indeed Kirkpatrick himself, have referred to a possible fifth level, namely ROI (Return On Investment). In my view ROI can easily be included in Kirkpatrick's original fourth level, 'Results'. A separate fifth level is therefore arguably only needed if the assessment of Return On Investment might otherwise be ignored or forgotten when referring simply to the 'Results' level.
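By way of illustration only (the figures below are entirely hypothetical), a Phillips-style ROI figure is simple arithmetic on monetised Level 4 results set against the full cost of the training:

```python
# Hypothetical figures, for illustration of the ROI calculation only.
programme_cost = 40_000        # design, delivery, venue, participant time, etc.
monetised_benefit = 65_000     # estimated value of the Level 4 business results

net_benefit = monetised_benefit - programme_cost
roi_percent = net_benefit / programme_cost * 100

print(f"ROI = {roi_percent:.1f}%")   # 62.5% on these assumed numbers
```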
Learning evaluation is a widely researched area. This is understandable
since the subject is fundamental to the existence and performance of
education around the world, not least universities, which of course
contain most of the researchers and writers.
While Kirkpatrick's model is not the only one of its type, for most
industrial and commercial applications it suffices; indeed most
organisations would be absolutely thrilled if their training and learning
evaluation, and thereby their ongoing people-development, were
planned and managed according to Kirkpatrick's model.
For reference, should you be keen to look at more ideas, there are
many to choose from...
• Jack Phillips' Five Level ROI Model
• Daniel Stufflebeam's CIPP Model (Context, Input, Process,
Product)
• Robert Stake's Responsive Evaluation Model
• Robert Stake's Congruence-Contingency Model
• Kaufman's Five Levels of Evaluation
• CIRO (Context, Input, Reaction, Outcome)
• PERT (Program Evaluation and Review Technique)
• Alkin's UCLA Model
• Michael Scriven's Goal-Free Evaluation Approach
• Provus's Discrepancy Model
• Eisner's Connoisseurship Evaluation Models
• Illuminative Evaluation Model
• Portraiture Model
• and also the American Evaluation Association
Also look at Leslie Rae's excellent Training Evaluation tools available on this site, which, given Leslie's experience and knowledge, will save you the job of researching and designing your own tools.
evaluation of HRD function performance
If you are responsible for HR functions and services to internal and/or external customers, you might find it useful to go beyond Kirkpatrick's evaluation of training and learning, and also to evaluate satisfaction among staff/customers with the HR department's overall performance. The parameters for such an evaluation ultimately depend on what your HR function is responsible for - in other words, evaluate according to expectations.
Like anything else, evaluating customer satisfaction must first begin with
a clear appreciation of (internal) customers' expectations. Expectations -
agreed, stated, published or otherwise - provide the basis for evaluating
all types of customer satisfaction.
If people have expectations which go beyond HR
department's stated and actual responsibilities, then the
matter must be pursued because it will almost certainly offer
an opportunity to add value to HR's activities, and to add
value and competitive advantage to your organisation as a
whole. In this fast changing world, HR is increasingly the department
which is most likely to see and respond to new opportunities for the
support and development of your people - so respond, understand,
and do what you can to meet new demands when you see them.
If you are keen to know how well the HR department is meeting people's
expectations, a questionnaire, and/or some group discussions will shed
light on the situation.
Here are some example questions. Effectively you should be asking people to say how well the HR or HRD department has done the following:
• helped me to identify, understand, and prioritise my personal development needs and wishes, in terms of: skills, knowledge, experience and attitude (or personal well-being, or emotional maturity, or mood, or mind-set, or any other suitable term meaning mental approach, which people will respond to)
• helped me to understand my own preferred learning style and learning methods for acquiring new skills, knowledge and attitudinal capabilities
• helped me to identify and obtain effective
learning and development that suits my preferred style and
circumstances
• helped me to measure my development, and for the
measurement to be clear to my boss and others in the organisation
who should know about my capabilities
• provided tools and systems to encourage and facilitate
my personal development
• and particularly helped to optimise
the relationship between me and my boss relating to assisting my
own personal development and well-being
• provided a working environment that protects
me from discrimination and harassment of any sort
• provided the opportunity for me to voice my grievances if I have any (in private, to a suitably trained person in the company whom I trust), and then, if I so wish, for proper consideration and response to be given to them by the company
• provided the opportunity for me to receive counselling and
advice in the event that I need private and supportive help of this
type, again from a suitably trained person in the company whom I
trust
• ensured that disciplinary processes are clear and fair, and
include the right of appeal
• ensured that recruitment and promotion of staff are managed fairly and transparently
• ensured that systems and activities exist to keep all staff informed of company plans, performance, etc. (as normally included in a Team Briefing system)
• (if you dare...) ensured that people are paid and rewarded fairly in relation to other company employees, and separately, paid and rewarded fairly when compared to market norms (your CEO will not like this question, but if you have a problem in this area it's best to know about it...)
• (and for managers) helped me to ensure the development
needs of my staff are identified and supported
This is not an exhaustive list - just some examples. Many of the
examples contain elements which should under typical large company
circumstances be broken down to create more and smaller questions
about more specific aspects of HR support and services.
If you work in HR, or run an HR department, and consider that some of
these issues and expectations fall outside your remit, then consider who
else is responsible for them.
I repeat, in this fast changing world, HR is increasingly the department
which is most likely to see and respond to new opportunities for the
support and development of your people - so respond, understand,
and do what you can to meet new demands when you see them. In
doing so you will add value to your people and your organisation - and
your department.
Informing Practice Using Evaluation Models and Theories
Instructor: Dr. Melvin M. Mark, Professor of Psychology at the Pennsylvania State University
Description: Evaluators who are not aware of the contemporary and historical aspects of the profession "are doomed to repeat past mistakes and, equally debilitating, will fail to sustain and build on past successes" (Madaus, Scriven, and Stufflebeam, 1983).
"Evaluation theories are like military strategy and tactics; methods are like military weapons and logistics.
The good commander needs to know strategy and tactics to deploy weapons properly or to organize
logistics in different situations. The good evaluator needs theories for the same reasons in choosing and
deploying methods." Shadish, Cook and Leviton (1991).
These quotes from Madaus et al. (1983) and Shadish et al. (1991) provide the perfect rationale for why the
serious evaluator should be concerned with models and theories of evaluation. The primary purpose of this
class is to overview major streams of evaluation theories (or models), and to consider their implications for
practice. Topics include: (1) why evaluation theories matter, (2) frameworks for classifying different
theories, (3) in-depth examination of 4-6 major theories, (4) identification of key issues on which evaluation
theories and models differ, (5) benefits and risks of relying heavily on any one theory, and (6) tools and
skills that can help you in picking and choosing from different theoretical perspectives in planning an
evaluation in a specific context. The overarching theme will be on practice implications, that is, on what
difference it would make for practice to follow one theory or some other.
Theories to be discussed will be ones that have had a significant impact on the evaluation field, that offer
perspectives with major implications for practice, and that represent important and different streams of
theory and practice. Case examples from the past will be used to illustrate key aspects of each theory's
approach to practice.
Participants will be asked to use the theories to question their own and others' practices, and to consider
what characteristics of evaluations will help increase their potential for use. Each participant will receive
Marvin Alkin's text, Evaluation Roots (Sage, 2004) and other materials.
The instructor's assumption will be that most people attending the session have some general familiarity
with the work of a few evaluation theorists, but that most will not themselves be scholars of evaluation
theory. At the same time, the course should be useful, whatever one's level of familiarity with evaluation
theory.
Applied Measurement for Evaluation
Instructor: Dr. Ann Doucette, TEI Director, Research Professor, Columbian College of Arts and Sciences,
George Washington University
Description: Successful evaluation depends on our ability to generate evidence attesting to the feasibility,
relevance and/or effectiveness of the interventions, services, or products we study. While theory guides our
designs and how we organize our work, it is measurement that provides the evidence we use in making
judgments about the quality of what we evaluate. Measurement, whether it results from self-report survey,
interview/focus groups, observation, document review, or administrative data must be systematic,
replicable, interpretable, reliable, and valid. While hard sciences such as physics and engineering have
advanced precise and accurate measurement (i.e., weight, length, mass, volume), the measurement used in
evaluation studies is often imprecise and characterized by considerable error. The quality of the inferences
made in evaluation studies is directly related to the quality of the measurement on which we base our
judgments. Judgments attesting to the ineffectiveness of interventions may be flawed - the reflection of measures
that are imprecise and not sensitive to the characteristics we chose to evaluate. Evaluation attempts to
compensate for imprecise measurement with increasingly sophisticated statistical procedures to manipulate
data. The emphasis on statistical analysis all too often obscures the important characteristics of the
measures we choose. This class content will cover these topics:
• Assessing measurement precision: examining the precision of measures in relationship to the
degree of accuracy that is needed for what is being evaluated. Issues to be addressed include:
measurement/item bias, the sensitivity of measures in terms of developmental and cultural
issues, scientific soundness (reliability, validity, error, etc.), and the ability of the measure to detect change over time (a small reliability calculation is sketched after this list).
• Quantification: Measurement is essentially assigning numbers to what is observed (direct and
inferential). Decisions about how we quantify observations and the implications these decisions
have for using the data resulting from the measures, as well as for the objectivity and certainty
we bring to the judgment made in our evaluations will be examined. This section of the course
will focus on the quality of response options, coding categories - Do response options/coding
categories segment the respondent sample in meaningful and useful ways?
• Issues and Considerations - using existing measures versus developing your own measures:
What to look for and how to assess whether existing measures are suitable for your evaluation
project will be examined. Issues associated with the development and use of new measures will
be addressed in terms of how to establish sound psychometric properties, and what cautionary
statements should accompany the interpretation of evaluation findings using these new
measures.
• Criteria for choosing measures: assessing the adequacy of measures in terms of the
characteristics of measurement - choosing measures that fit your evaluation theory and
evaluation focus (exploration, needs assessment, level of implementation, process, impact and
outcome). Measurement feasibility, practicability and relevance will be examined. Various
measurement techniques will be examined in terms of precision and adequacy, as well as the
implications of using screening, broad-range, and peaked tests.
• Error - influences on measurement precision: The characteristics of various measurement
techniques, assessment conditions (setting, respondent interest, etc.), and evaluator
characteristics will be addressed.
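As a small illustration of the reliability point in the first bullet above (my own sketch, not course material; the data and the numpy dependency are assumptions), internal-consistency reliability of a multi-item measure is often summarised with Cronbach's alpha:

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for a score matrix of shape (n_respondents, n_items)."""
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)       # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical ratings: 5 respondents answering 4 survey items on a 1-5 scale.
responses = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
])
print(f"alpha = {cronbach_alpha(responses):.2f}")   # about 0.94 for this toy data
```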
Participants will be provided with a copy of the text: Measurement Theory in Action (Case Studies and Exercises) by Shultz, K. S. and D. J. Whitney (Sage, 2004).