This article, co-authored by the Division Director of the US Department of Education (ED) unit responsible for the National Evaluation of Upward Bound and by the study's Technical Monitor, summarizes the major errors in the Mathematica reports from the study. The article reports substantial positive impacts when these errors are addressed in a re-analysis using a standards-based approach.
Flawed evaluation of Upward Bound Reports Masks Positive Impacts
Flawed National Evaluation of Upward Bound Reports Masked Significant and
Substantial Positive Impacts: The Technical Monitors’ Perspective
By Margaret Cahalan and David Goodwin
Dr. Cahalan is a Senior Scientist with the Pell Institute for the Study of Opportunity in
Higher Education of the Council for Opportunity in Education (COE). Dr. Cahalan supervised the
staff serving as the UB evaluation’s technical monitors and served in this capacity herself in
the final few months of the UB evaluation. She is currently the Co-PI of the COE i-3 project
“Using Data to Inform College Access Programming.”
Dr. Goodwin, who recently retired from the Gates Foundation, is the former Director of the unit
within the U.S. Department of Education responsible for the UB Evaluation and was Dr. Cahalan’s
supervisor. He was the UB study monitor when the study was first begun in 1992.
In the last week of the Bush Administration, the U.S. Department of Education (ED) published
the final report in a long running evaluation of Upward Bound (UB) conducted by Mathematica
Policy Research (Mathematica). The results of this seemingly high-quality random assignment
study have formed the basis for significant policy decisions, most notably a Bush administration
request to eliminate funding for Upward Bound and other federal pre-college access programs, and
a decision by the Office of Management and Budget to rate the program as “ineffective”, a
decision that still stands. As technical monitors for the evaluation while working at ED, we
found that the published reports from this evaluation were seriously flawed and that a replicated
re-analysis using ED’s own research standards showed substantial positive impacts for the UB
program. We made our concerns well known within the Department. We do so now publicly in order
to support the formal Request for Correction of the evaluation final report, recently submitted
to ED by the Council for Opportunity in Education (COE).
First, we wish to make clear that this commentary is
neither intended to be a critique of the random
assignment method nor a post-hoc effort to “fish” for positive study findings. Nor is the article
intended to discredit the study as a whole. While we strongly disagree with the analyses and
research transparency choices and the conclusions reached by Mathematica, we also believe that
the National Evaluation of Upward Bound is among the most carefully conducted and useful
studies we have in the area of pre-college research.
The essence of our argument is as follows:
The Department of Education attempted an unusual and overly ambitious study intended
to estimate the impact of Upward Bound as a whole, as well as for multiple sub-groups.
It required a highly-stratified sample to ensure representation of various types of 2- and
4-year college grantees. The study design resulted in a national sample of 67 projects, each
of which carried proportional weights representative of different types of projects found
in UB as a whole. The second stage student weighting was related to the number of
students submitting baseline surveys constituting a “waiting list” from which students
were randomly assigned to be invited into Upward Bound as actual openings occurred.
Those not randomly selected from the “waiting list” constituted the control group.
In a seriously flawed sample design, only one project in the sample (called project 69) was
selected to represent the largest study-defined 4-year public stratum and, because of an
unusually large number of “applicant” surveys in the final stage of weighting, carried 26
percent of the overall weight. Figure 1 shows just how extreme the unequal weighting
was from project 69. This flawed design meant that the outcomes of some students from
the project 69 “waiting list” carried a weight of 158, while the outcomes of other students
from the smallest weighted project carried a weight of only 4.
Figure 1. Percentage of sum of the weights by project of the 67 projects making up the
study sample: National Evaluation of Upward Bound, study conducted 1992-93 to 2003-04
[Bar chart: percent of weight (vertical axis, 0 to 30) for each of the 67 projects; the tallest bar, project 69, is labeled 26.38.]
NOTE: Of the 67 projects making up the UB sample just over half (54 percent) have less than 1 percent of the weights each and one
project (69) accounts for 26.4 percent of the weights.
SOURCE: Data tabulated December 2007 using: National Evaluation of Upward Bound data files, study sponsored by the Policy and
Planning Studies Services (PPSS), of the Office of Planning, Evaluation and Policy Development (OPEPD), US Department of Education;
study conducted 1992-93 to 2003-04.
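To make the weighting issue concrete, the short sketch below shows how a per-project "percentage of the sum of the weights" of the kind plotted in Figure 1 could be tabulated from student-level weights, and how unequal the two weights quoted above (158 and 4) are. The function name and the toy records are hypothetical illustrations, not the study's data or the authors' code.

```python
from collections import defaultdict


def weight_shares(records):
    """Return each project's percentage of the total sum of student weights.

    records: iterable of (project_id, student_weight) pairs.
    """
    totals = defaultdict(float)
    for project_id, weight in records:
        totals[project_id] += weight
    grand_total = sum(totals.values())
    return {pid: 100.0 * total / grand_total for pid, total in totals.items()}


# Weights quoted in the text: 158 for some project 69 students,
# 4 for students in the smallest-weighted project.
MAX_WEIGHT = 158
MIN_WEIGHT = 4
weight_ratio = MAX_WEIGHT / MIN_WEIGHT  # one student counts as much as ~40 others

# Hypothetical toy records (NOT study data), only to exercise the function.
toy_records = [("A", 158), ("A", 158), ("B", 4), ("B", 4), ("C", 40), ("C", 40)]
toy_shares = weight_shares(toy_records)

if __name__ == "__main__":
    print(f"weight ratio: {weight_ratio}")
    for pid, share in sorted(toy_shares.items()):
        print(f"project {pid}: {share:.1f} percent of weights")
```

In the toy data a single heavily weighted project dominates the total, which is the pattern the NOTE above describes for project 69.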
Unfortunately, project 69, whose students carried 26 percent of the weight, was also very
atypical of the grantee institution stratum for which it was the sole representative. Chosen as
the sole representative of the largest study-defined 4-year stratum, project 69 had actually been
a community college, recently taken over to serve as a branch of a nearby 4-year institution.
It retained its largely vocational certificate and 2-year offerings, with associated UB
programming. Project 69’s UB program was non-residential and partnered with a job training
program serving CTE high schools and awarding vocational certificates. Mathematica chose
to base all conclusions about the program on analyses that included the bias-introducing project 69. The
reports also do not reveal to readers project 69’s representational issues and indeed maintain
that project 69 is an adequate representative of its stratum.
In addition and very importantly, although the random assignment method is intended to
ensure that treatment and control groups are equivalent (and did so quite well for the UB
sample without project 69), in project 69 we found major differences between the treatment
and controls. This imbalance was so large that some reviewers suspected a failure to
implement the random assignment correctly in this project. For example, 80 percent of the
academically at-risk students from the project 69 sample were in the treatment group
(randomly assigned to Upward Bound), while 20 percent of the academically at-risk students
were in the control group (not randomly assigned to UB). The treatment sample on average
resembled the vocational programming emphasis of project 69 while the control group on
average resembled the typical Upward Bound Math Science (UBMS) applicant, being at a higher
grade and also more academically proficient with higher educational expectations (Figure 2).
In contrast, without project 69 the sample has a good balance between the treatment and
control group, with 51 percent of the academically at-risk students in the treatment group
and 49 percent in the control group. The severe non-equivalency in project 69 combined with
the extremely large weight resulted in an imbalance in the overall sample and an
inadequately-controlled bias in favor of the control group in all of the Mathematica impact
estimates. For example, 58 percent of the academically at-risk students were in the treatment
group and 42 percent in the control group in the overall sample with project 69 (Figure 2).
Figure 2. Balance checks between treatment and control groups for key baseline factors related
to outcomes for project 69, the other 66 projects taken together, and the overall sample:
National Evaluation of Upward Bound, study conducted 1992-93 to 2003-04
[Three chart panels: “Project 69 has severe imbalance in favor of control group”; “Sample without project 69 (66 of 67) is balanced”; “Overall sample with project 69 included lacks needed balance for random assignment study”.]
Mathematica also did not standardize the outcome measures for a sample that spanned five
years of expected high school graduation years, arguing that randomization made this
unnecessary. However, balance checks done by ED monitoring staff found that on average, the
control group was in a higher grade at baseline than the treatment group. A re-analysis of
Mathematica’s data standardizing outcome measures to expected high school graduation year
(EHSGY) found, contrary to what Mathematica reports, that there were substantial and
statistically significant positive impacts on postsecondary entrance and federal financial aid
outcomes with and without project 69 (Figure 3). (For the full report see
http://www.coenet.us/files/files-do_the_Conclusions_Change_2009.pdf.)
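As a rough consistency check on the balance figures quoted above, the sketch below combines the project 69 share (80 percent of at-risk students in treatment) with the share in the other 66 projects (51 percent), using project 69's 26.38 percent share of the total weights. It assumes, purely for illustration, that project 69's share of the weighted at-risk students equals its share of all weights; the actual balance checks were run on the study files, and the helper name is hypothetical.

```python
def pooled_share(share_p69, share_rest, weight_p69):
    """Weight-averaged share of at-risk students assigned to treatment.

    share_p69, share_rest: percent in treatment within project 69 and
    within the other 66 projects; weight_p69: project 69's fraction of
    the total weight (an assumption when applied to the at-risk subgroup).
    """
    return weight_p69 * share_p69 + (1.0 - weight_p69) * share_rest


W_P69 = 0.2638  # project 69's share of the sum of weights (Figure 1)

pooled_treatment = pooled_share(80.0, 51.0, W_P69)
pooled_control = 100.0 - pooled_treatment

if __name__ == "__main__":
    print(f"at-risk students in treatment, pooled: {pooled_treatment:.1f} percent")
    print(f"at-risk students in control, pooled:   {pooled_control:.1f} percent")
```

Under that simplifying assumption the pooled treatment share comes out a little under 59 percent, close to the 58 percent reported for the overall sample, which illustrates how one heavily weighted, unbalanced project can pull an otherwise balanced sample out of balance.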
Figure 3. Treatment on the Treated (TOT) and Intent to Treat (ITT) estimates of impact of
Upward Bound (UB) on postsecondary entrance within +1 year (18 months) of expected high
school graduation year (EHSGY) 1992-93 to 2003-04
Estimate                      Not UB participant (control)   UB participant (treatment)   Difference
TOT (excludes project 69)     60.4                           74.6                         14.2****
ITT (excludes project 69)     64.3                           73.3                         9.0***
TOT (includes project 69)     62.5                           73.5                         11.0****
ITT (includes project 69)     66                             72.9                         6.9****
*/**/***/**** Significant at 0.10/0.05/.01/00 level. NOTE: Model based estimates based on STATA logistic
and instrumental variables regression and also taking into account the complex sample design. Based on
responses to three follow-up surveys and federal student aid files. SOURCE: Data tabulated January 2008
using: National Evaluation of Upward Bound data files, and federal Student Financial Aid (SFA) files 1994-95
to 2003-04. (Excerpted from the Cahalan Re-Analysis Report, Figure IV)
Importantly, there are also large impacts on BA attainment when project 69 is removed, leaving
a sample that represents 74 percent of “applicants” in the study period (Figure 4). Mathematica reported
no impacts on BA attainment and reported only positive impacts on certificate attainment
reflecting project 69’s vocational programming. As seen in Figure 4, the re-analysis without
project 69 found that students who were randomly assigned to Upward Bound and who
participated in the program (Treatment on the Treated-TOT estimates) had a 50 percent
higher chance of obtaining a BA degree within eight years of expected high school
graduation date. The Intent to Treat (ITT) estimates found almost a 30 percent increase in
BA receipt.
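The "50 percent higher chance" and "almost a 30 percent increase" are relative increases over the control group rate. A minimal sketch of that arithmetic, using the control and treatment percentages reported in Figure 4 (the function name is only illustrative):

```python
def relative_increase(control_pct, treatment_pct):
    """Percent increase of the treatment rate over the control rate."""
    return 100.0 * (treatment_pct - control_pct) / control_pct


# BA within +8 years of EHSGY, longitudinal file estimates (Figure 4).
tot_increase = relative_increase(14.6, 21.7)  # TOT: 7.1 points on a base of 14.6
itt_increase = relative_increase(13.7, 17.5)  # ITT: 3.8 points on a base of 13.7

if __name__ == "__main__":
    print(f"TOT relative increase: {tot_increase:.1f} percent")
    print(f"ITT relative increase: {itt_increase:.1f} percent")
```

The TOT figure works out to roughly 49 percent and the ITT figure to roughly 28 percent, matching the rounded statements in the text.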
Figure 4. Impact of Upward Bound (UB) on Bachelor’s (BA) degree attainment:
estimates based on 66 of 67 projects in UB sample: National Evaluation of Upward
Bound, study conducted 1992-93 to 2003-04
TOT (Longitudinal file BA in +8 years of EHSGY; evidence from any Followup Survey (Third to Fifth) or NSC; no evidence set to 0)****: control 14.6, treatment 21.7
TOT (BA by end of the survey period, Fifth Follow-Up responders only, adjusted for non-response)****: control 21.1, treatment 28.7
ITT (Longitudinal file BA in +8 years of EHSGY; evidence from any Followup Survey (Third to Fifth) or NSC; no evidence set to 0)****: control 13.7, treatment 17.5
*/**/***/**** Significant at 0.10/0.05/.01/00 level. NOTE: TOT = Treatment on the Treated; ITT= Intent to
Treat; EHSGY = Expected High School Graduation Year; NSC = National Student Clearinghouse; SFA =
Student Financial Aid. Estimates based on 66 of 67 projects in sample representing 74 percent of UB at the
time of the study. One project removed due to introducing bias into estimates in favor of the control group and
representational issues. Model based estimates based on STATA logistic and instrumental variables regression
taking into account the complex sample design. We use a 2-stage instrumental variables regression procedure to
control for selection effects for the Treatment on the Treated (TOT) impact estimates. ITT estimates include
14 percent of control group who were in Upward Bound Math Science or UB and 20-26 percent of treatment
group who did not enter Upward Bound. Calculated January 2010.
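The relationship between the ITT and TOT estimates can be illustrated with the simple Wald (Bloom-type) adjustment, which divides the ITT difference by the treatment-control gap in program participation. This is a textbook simplification and not the authors' procedure: the note above says the reported TOT estimates come from a two-stage instrumental variables regression that accounts for the complex sample design. The sketch assumes the participation figures in the note (14 percent of controls in UB or UBMS, 20 to 26 percent of the treatment group not entering UB) also apply to the postsecondary entrance sample without project 69; the function name is hypothetical.

```python
def wald_tot(itt_effect, treat_participation, control_participation):
    """Simple Wald estimate: ITT effect divided by the participation gap."""
    gap = treat_participation - control_participation
    if gap <= 0:
        raise ValueError("treatment participation must exceed control participation")
    return itt_effect / gap


ITT_POSTSEC = 9.0      # percentage points, ITT excluding project 69 (Figure 3)
CONTROL_IN_UB = 0.14   # share of control group in UB or UBMS (note above)

# The treatment-group no-show rate is reported as a range of 20 to 26 percent.
tot_low = wald_tot(ITT_POSTSEC, 1.0 - 0.20, CONTROL_IN_UB)
tot_high = wald_tot(ITT_POSTSEC, 1.0 - 0.26, CONTROL_IN_UB)

if __name__ == "__main__":
    print(f"Wald TOT range: {tot_low:.1f} to {tot_high:.1f} percentage points")
```

Under these assumptions the range is about 13.6 to 15.0 percentage points, which brackets the model-based TOT difference of 14.2 reported in Figure 3. The simple ratio should not be expected to reproduce every model-based estimate exactly.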
None of the positive impacts shown above are included in the Mathematica reports. Nor are the
representational issues with project 69 or the seriousness of the treatment-control group
non-equivalency acknowledged. The Mathematica conclusions of Upward Bound’s lack of “detectable
impact” are based entirely on results that include project 69 and do not standardize outcomes to
expected high school graduation year. The reports also misleadingly state that the major conclusions
do not change substantially because of project 69. Buried in their final report is an admission that
results are sensitive to project 69. The report states: “Because Project 69 had below average
impacts, reducing its weight relative to other projects resulted in larger overall impacts for most
outcomes compared with the findings from the main impact analysis, which weighted all sample
members according to their actual selection probabilities.” This, however, is also a misleading
statement about the effectiveness of project 69. As noted above in Figure 2, a closer look at project
69’s treatment and control group clearly reveals that the so-called “below average impacts” were not
due to “project 69’s poor performance” but were due to the extreme differences between the treatment
and control group in favor of the control group in this project.
In summary, we found that the Mathematica reports are not transparent in reporting study issues
and alternative results, so readers do not have enough information to make a judgment concerning
the validity of the Mathematica conclusions about Upward Bound. Despite being shown “more
credible” positive results for Upward Bound that have been replicated and confirmed as
technically more robust, Mathematica and the U.S. Department of Education continue to report
to Congress and the public erroneous conclusions concerning the UB program’s effectiveness.
This is a very serious matter that needs correcting. The complete text of the Request for
Correction is available at http://www.coenet.us/files/pubs_reports-COE_Request_for_Correction_011712.pdf
and the Statement of Concern signed by leading researchers can be found at
http://www.coenet.us/files/ED-Statement_of_Concern_011712.pdf.