4. How do you decide which high school to go
to?
• You can compare information about the current performance of
different high schools
• But you’re going to be in high school for 4 years: can you compare
the future performance of high schools?
7. Auto-regressive model outperforms historical average (2007 to 2014)
Statistic                        RMS error (my model)    RMS error (historical average)
Post-secondary education rate    13%                     14%
Regents exams pass rate          5%                      7%
Student dropout rate             4%                      6%
• Compute root-mean-square (RMS) error from 10-fold cross validation
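A minimal sketch of how such a comparison could be computed. The talk does not spell out the details, so everything here is an assumption: a pooled AR(1) fit (next value = a + b * previous value), folds that hold out whole schools, the final year used as the prediction target, and a synthetic random-walk data set standing in for the real school statistics.

```python
import math
import random


def fit_ar1(pairs):
    """Least-squares fit of next = a + b * prev over (prev, next) pairs."""
    n = len(pairs)
    mean_x = sum(p for p, _ in pairs) / n
    mean_y = sum(q for _, q in pairs) / n
    sxx = sum((p - mean_x) ** 2 for p, _ in pairs)
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in pairs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rms(errors):
    """Root-mean-square of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def cross_validate(series_by_school, k=10, seed=0):
    """Return (rms_ar1, rms_historical_average) for each school's final year."""
    schools = list(series_by_school)
    random.Random(seed).shuffle(schools)
    folds = [schools[i::k] for i in range(k)]  # assumed: folds split by school
    ar_err, avg_err = [], []
    for held_out in folds:
        held = set(held_out)
        train_pairs = [
            (s[t], s[t + 1])
            for name, s in series_by_school.items() if name not in held
            for t in range(len(s) - 1)
        ]
        a, b = fit_ar1(train_pairs)
        for name in held_out:
            s = series_by_school[name]
            history, target = s[:-1], s[-1]
            ar_err.append(a + b * history[-1] - target)
            avg_err.append(sum(history) / len(history) - target)
    return rms(ar_err), rms(avg_err)


def make_synthetic(n_schools, n_years=8, seed=1):
    """Hypothetical data: each school's statistic follows a random walk."""
    rng = random.Random(seed)
    data = {}
    for i in range(n_schools):
        series = [rng.uniform(40.0, 90.0)]
        for _ in range(n_years - 1):
            series.append(series[-1] + rng.gauss(0.0, 3.0))
        data[f"school_{i}"] = series
    return data


if __name__ == "__main__":
    rms_ar1, rms_avg = cross_validate(make_synthetic(200))
    print(f"AR(1): {rms_ar1:.2f}   historical average: {rms_avg:.2f}")
```

On data that drifts from year to year, the AR(1) prediction tracks the most recent value, while the historical average lags behind it.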
11. Correlations among percentage changes in features over time, 2007 to 2014
• Do different markers of school
performance change together or
separately?
• Decreasing district budget
correlated with increasing number of
low-income students
• Increasing classroom size correlated
with increasing dropout rate
Troubling correlations exist in
features’ trends over time
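One way such correlations could be computed, as a sketch only: take each school's percentage change between the first and last year for two features, keep the schools that report both, and compute the Pearson correlation coefficient. The dictionary layout and the example numbers are hypothetical. A p-value is not computed here; a library routine such as `scipy.stats.pearsonr` could supply one.

```python
import math


def percent_change(first, last):
    """Percentage change from the first to the last year's value."""
    return 100.0 * (last - first) / first


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlate_changes(feature_a, feature_b):
    """Each argument maps school -> (first_year_value, last_year_value).

    Returns (r, number_of_schools_matched).
    """
    matched = sorted(set(feature_a) & set(feature_b))
    xs = [percent_change(*feature_a[s]) for s in matched]
    ys = [percent_change(*feature_b[s]) for s in matched]
    return pearson_r(xs, ys), len(matched)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    budget = {"A": (100, 90), "B": (100, 100), "C": (100, 110), "D": (100, 95)}
    low_income = {"A": (50, 60), "B": (50, 52), "C": (50, 45)}
    r, n = correlate_changes(budget, low_income)
    print(f"r = {r:.3f}, {n} schools matched")
```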
18. Cross-correlations
among features not
very descriptive
• Grid: cross-correlation of every
feature’s time traces with respect to
those of every other feature, averaged
over schools
• Each time trace standardized per
school prior to cross-correlation
• Pink, pinker, red shading: p < 1%,
0.01%, 0.0001%
• Difficulties: non-normal features, only
8 time points
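The procedure in the bullets above can be sketched as follows. The normalization of the cross-correlation (mean product over the overlapping time points) and the skipping of constant traces are assumptions, as is the dictionary layout.

```python
import math


def standardize(trace):
    """Rescale one school's time trace to zero mean and unit variance.

    Returns None for a constant trace, which cannot be standardized.
    """
    n = len(trace)
    mean = sum(trace) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in trace) / n)
    if sd == 0:
        return None
    return [(v - mean) / sd for v in trace]


def cross_correlation(x, y, lag=0):
    """Mean of x[t] * y[t + lag] over the overlapping time points."""
    n = len(x)
    products = [x[t] * y[t + lag] for t in range(n) if 0 <= t + lag < n]
    return sum(products) / len(products)


def mean_cross_correlation(traces_a, traces_b, lag=0):
    """Average the per-school cross-correlation of two features.

    Each argument maps school -> list of yearly values.
    """
    values = []
    for school in set(traces_a) & set(traces_b):
        x = standardize(traces_a[school])
        y = standardize(traces_b[school])
        if x is None or y is None:
            continue  # skip schools where either trace is constant
        values.append(cross_correlation(x, y, lag))
    if not values:
        raise ValueError("no schools with usable traces for both features")
    return sum(values) / len(values)
```

With only 8 time points per school, each lagged product is averaged over very few values, which is one reason the resulting grid is hard to interpret.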
19. Why not use a more complicated model?
1. Complicated model: most residuals not normal
2. Complicated model: strength of coefficients wildly different
among different k-folds of cross-validation
3. By eye, the more complicated model is more likely to give bizarre
predictions
4. Occam's razor: increase in predictive power too small to warrant
such complication
20. For the simple AR(1) model, residuals are mostly normal
[Figure: residual plots for School district budget, Regents Exams pass rate, 12th grade population, Student retention rate]
• (School district budget is the only statistic with non-normal residuals)
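A sketch of one way to check residual normality. The slides do not say which test was used, so the Jarque-Bera statistic here is a stand-in: it combines the sample skewness and kurtosis, and under normality it is asymptotically chi-square distributed with 2 degrees of freedom, which gives the p-value exp(-JB/2). That approximation is rough for small samples, and the function is undefined when all residuals are identical.

```python
import math


def ar1_residuals(series):
    """Fit x[t+1] = a + b * x[t] by least squares and return the residuals."""
    prev, nxt = series[:-1], series[1:]
    n = len(prev)
    mean_p, mean_n = sum(prev) / n, sum(nxt) / n
    sxx = sum((p - mean_p) ** 2 for p in prev)
    b = sum((p - mean_p) * (q - mean_n) for p, q in zip(prev, nxt)) / sxx
    a = mean_n - b * mean_p
    return [q - (a + b * p) for p, q in zip(prev, nxt)]


def jarque_bera(residuals):
    """Return (JB statistic, asymptotic p-value) for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square distribution with 2 degrees of freedom.
    return jb, math.exp(-jb / 2.0)
```

A small p-value suggests the residuals are not normally distributed; in practice residuals would likely be pooled across schools before testing, since one school contributes only 7 of them.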
21. For the complex AR(4) model, residuals are extremely non-normal
[Figure: residual plots for School district budget, Regents Exams pass rate, 12th grade population, Student retention rate]
• Model includes other statistics as features and uses elastic-net regularization
• Indicates that the complex model has lost sight of the actual structure in the data?
• (In this case, school district budget is the only statistic with normal residuals)
22. Correlations among features (2014)
1. Regents Exam pass rate, college/post-
secondary placement rate correlated
2. Dropout rate, % receiving discount
lunch, teacher turnover rate highly
correlated
• These two groups are anti-correlated to
each other
• (Hard to judge 12th grade population,
teacher number, budget because of
differences in school and school district
sizes)
Clear correlations exist
among features
Editor's notes
Hi, my name is Eric Smith, and today I’m going to talk about a tool for predicting the future performance of New York State high schools.
We can step back and investigate some of the trends in these data by seeing whether various school statistics tend to increase or decrease together or separately.
What the green square shows is that {}
What the purple square shows is that {}
What this implies is that schools that are improving tend to do so on several different fronts at once, and that schools that are getting worse tend to do so on several different fronts at once as well, which is potentially useful from a policy perspective.
[
discount_lunch and budget: r-value -0.339, p-value 2.02e-35, 1001 schools matched
tenth_class_size and student_retention_rate: r-value -0.667, p-value 3.16e-32, 2025 schools matched
teacher_number and student_retention_rate: r-value 0.495, p-value 7.39e-33, 1755 schools matched
teacher_number and tenth_class_size: r-value -0.311, p-value 5.7e-18, 1530 schools matched
]