Introduction to prediction modelling - Berlin 2018 - Part II
1. Advanced Epidemiologic Methods
causal research and prediction modelling
Prediction modelling topics 5 - 7
Maarten van Smeden
LUMC, Department of Clinical Epidemiology
20-24 August 2018
Maarten van Smeden (LUMC) Risk prediction model building 20-24 August 2018
2. Outline
1 Introduction to prediction modelling
2 Example: predicting systolic blood pressure
3 Risk and probability
4 Risk prediction modelling: rationale and context
5 Risk prediction model building
6 Overfitting
7 External validation and updating
4. TRIPOD statement
TRIPOD, Ann Int Med, 2016, doi: 10.7326/M14-0697 and 10.7326/M14-0698
5. Steps of model development
• Research design and data collection
• Choice of statistical model, outcome and (candidate) predictors
• Initial data analysis
• Descriptive analysis
• Model specification and estimation
• Evaluation of performance and internal validation
• Presentation
7. Research design: aims
• Point of intended use of the risk model
- Primary care (paper/computer/app)?
- Secondary care (bedside)?
- Low resource setting?
• Complexity
- Number of predictors?
- Transparency of calculation?
- Should it be fast?
8. Research design: design of data collection
• Diagnostic risk prediction: cross-sectional design (e.g. consecutive patients): measurement of predictors at baseline + reference standard ("gold standard" is often a misnomer)
• Prognostic risk prediction: (prospective) cohort study: measurement of predictors at baseline + follow-up until the event occurs (time-horizon)
Figure: Moons, Ann Int Med, 2016, doi: 10.7326/M14-0698
Alternative data collection designs:
• Randomized trial: typically small, large treatment effects, strict eligibility criteria
• Routine care data: often suffering from data quality issues (misclassifications, missing data)
• Case-control study: generally unsuitable for risk prediction
9. Steps of model development
• Research design and data collection
• Choice of statistical model, outcome and (candidate) predictors
• Initial data analysis
• Descriptive analysis
• Model specification and estimation
• Evaluation of performance and internal validation
• Presentation
10. Possible outcomes
Types of outcomes
• Death (e.g. 10-day in-hospital mortality)
• Hospital readmission (e.g. 1 year after CVD event)
• Developing a disease (e.g. 10 year risk of Diabetes Type-II)
• Bleeding risk (Thrombosis)
• Complications after surgery
• Response to treatment
Considerations
• Relevant time horizon for risk essential
• Broad composite outcomes not informative
• Misclassification errors can be influential on risk prediction
11. Possible candidate predictors
General advice: use clinical knowledge and (systematic) reviews to identify predictors that are plausibly related to the outcome of interest
Type of predictors
• Demographics (age, sex, SES)
• Patient history (previous disease)
• Physical examination (may be subjective)
• Diagnostic tests (imaging, ECG)
• Biomarkers
• Disease characteristics (diagnosis, severity)
• Therapies received
• Physical functioning
• . . .
Include?
• Unique contribution to prediction
• Cost of measurement
• Speed of measurement
• Invasiveness of measurement
• Availability in clinical practice
• Measurement objectivity
• Measurement quality
• Model parsimony
• . . .
12. Choice of statistical model
Outcome | Regression model | Example
Continuous | linear (OLS) | blood pressure at discharge
Binary (death/alive) | binary logistic | EuroSCORE: 30-day mortality after cardiac surgery
Survival (time to event) | Cox model | Framingham risk score: 10-year cardiovascular disease
Categorical | multinomial logistic | Operative delivery (spontaneous, instrumental, caesarean section)
Note: many alternative regression models exist for similar outcomes (e.g. weighted linear, probit, Weibull, proportional odds)
Machine learning methods and artificial intelligence: so far shown to give little advantage, or to perform worse, compared with regression-based risk prediction (more about this tomorrow)
EuroSCORE: 10.1016/S0195-668X(02)00799-6; Framingham: 10.1161/CIRCULATIONAHA.107.699579; Operative delivery: 10.1111/j.1471-0528.2012.03334.x
13. Steps of model development
• Research design and data collection
• Choice of statistical model, outcome and (candidate) predictors
• Initial data analysis
• Descriptive analysis
• Model specification and estimation
• Evaluation of performance and internal validation
• Presentation
14. Initial data analysis and descriptive analysis
Risk model for venous thromboembolism in postpartum women: Abdul Sultan, BMJ, 2016, doi:10.1136/bmj.i6253
15. Selecting predictors on univariable associations
• The association between one particular predictor and the outcome is a univariable association ⇒ informative at the initial data analysis and descriptive analysis step
Univariable selection:
• is the use of a p-value criterion (p < .05), applied to the univariable relations between predictors and the outcome, to select predictors for inclusion in the prediction model
• is commonly used for selecting predictors
• is inappropriate: it rejects important predictors
• is inappropriate: it selects unimportant predictors
• only works for completely uncorrelated predictor variables, which they never are
Bottom line: don't use univariable selection to select or reject predictors
Read more: Sun, JCE, 1996, doi: 10.1016/0895-4356(96)00025-X
16. Missing data
Discussed extensively on day 2.
Missing data often pose a non-ignorable problem for prediction models, requiring extra steps and effort when developing and validating the model. There is, however, consensus on how to deal with particular forms of missing data (e.g. multiple imputation by chained equations when MAR, sensitivity analyses when MNAR). Missing data should be prevented as much as possible.
Read more: Vergouwe, JCE, 2010, doi: 10.1016/j.jclinepi.2009.03.017
17. Steps of model development
• Research design and data collection
• Choice of statistical model, outcome and (candidate) predictors
• Initial data analysis
• Descriptive analysis
• Model specification and estimation
• Evaluation of performance and internal validation
• Presentation
18. Model specification
f(X) → linear predictor (lp)
Simplest case: lp = β0 + β1x1 + . . . + βPxP (only "main effects")
Linear regression:
Y = lp + ε
Logistic regression:
ln{Pr(Y = 1)/(1 - Pr(Y = 1))} = lp, equivalently Pr(Y = 1) = 1/(1 + exp{-lp})
Cox regression:
h(t) = h0(t) exp(lp)
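The step from linear predictor to predicted risk can be sketched numerically; a minimal sketch for the logistic case, with made-up coefficients (b0, b1, b2 are illustrative, not from the lecture):

```python
import numpy as np

# Hypothetical coefficients for a two-predictor logistic model:
# lp = b0 + b1*x1 + b2*x2 (all values illustrative)
b0, b1, b2 = -4.0, 0.05, 0.8

def predicted_risk(x1, x2):
    """Linear predictor -> predicted probability via the inverse logit."""
    lp = b0 + b1 * x1 + b2 * x2
    return 1.0 / (1.0 + np.exp(-lp))

# e.g. a 60-year-old (x1 = 60) with the risk factor present (x2 = 1)
risk = predicted_risk(60, 1)
```

The same linear predictor feeds all three model forms; only the link between lp and the outcome changes.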
19. Continuous predictors
• Many predictors are measured on a continuous scale
- Age
- Systolic/diastolic blood pressure
- HDL/LDL
- Biomarkers
- . . .
• Decision required on how to include continuous predictors in the modelling
• Allow for nonlinearity
- Polynomials (e.g. quadratic)
- Spline functions
- Fractional polynomials
Read more: Collins, Stat Med, 2016, doi: 10.1002/sim.6986
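As an illustration of allowing for nonlinearity, a minimal sketch (simulated, illustrative data) fitting a quadratic polynomial in age by ordinary least squares:

```python
import numpy as np

# Simulated data: a curved age effect on systolic blood pressure (illustrative)
rng = np.random.default_rng(1)
age = rng.uniform(20, 80, 200)
sbp = 100 + 0.9 * age - 0.005 * age**2 + rng.normal(0, 5, 200)

# Design matrix with intercept, age and age^2 (simple quadratic polynomial)
X = np.column_stack([np.ones_like(age), age, age**2])
beta, *_ = np.linalg.lstsq(X, sbp, rcond=None)  # OLS fit

pred50 = beta @ np.array([1.0, 50.0, 2500.0])   # fitted SBP at age 50
```

Splines and fractional polynomials follow the same pattern: extra columns in the design matrix, still fitted by ordinary regression.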
21. Dichotomania
Dichotomania is an obsessive compulsive disorder to which medical advisors in
particular are prone [. . .]. Show a medical advisor some continuous measurements
and he or she immediately wonders: Hmm, how can I make these clinically
meaningful? Where can I cut them in two? What ludicrous side conditions can I
impose on this?
Stephen Senn
Quote source: Senn, http://www.senns.demon.co.uk/Geep.htm
Dichotomising predictors is unfortunately very common in prediction modelling
• Example: create a new predictor coded 0 if age < 50 years ('young') and 1 if age ≥ 50 years ('old')
• Throws away precious information for risk prediction
• Unrealistic: it assumes that those immediately above and below the cut-point have different risks
• Reduces the predictive accuracy of the model
Avoid dichotomising predictors!
22. Dichotomania
Source: Royston, Stat Med, 2006, doi: 10.1002/sim.2331
23. Steps of model development
• Research design and data collection
• Choice of statistical model, outcome and (candidate) predictors
• Initial data analysis
• Descriptive analysis
• Model specification and estimation
• Evaluation of performance and internal validation
• Presentation
24. Model predictive performance
Source: Steyerberg, Epidemiology, 2010, doi: 10.1097/EDE.0b013e3181c30fb2
29. Discrimination
• Sensitivity/specificity trade-off
• Arbitrary choice of threshold → many possible sensitivity/specificity pairs
• All pairs in one graph: the ROC curve
• Area under the ROC curve: probability that a random individual with the event has a higher predicted probability than a random individual without the event
• Area under the ROC curve: the c-statistic (for logistic regression) takes values between 0.5 (no better than a coin flip) and 1.0 (perfect discrimination)
Read more: Sedgwick, BMJ, 2015, doi: 10.1136/bmj.h2464
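The pairwise definition of the c-statistic above can be computed directly; a toy sketch with illustrative predicted probabilities (ties count one half):

```python
import itertools

# Toy predicted probabilities (illustrative values, not from the lecture)
p_event = [0.8, 0.6, 0.7]     # individuals who had the event
p_no_event = [0.2, 0.6, 0.1]  # individuals who did not

# c-statistic: Pr(random event individual gets a higher prediction than a
# random non-event individual), over all event/non-event pairs
pairs = list(itertools.product(p_event, p_no_event))
c = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in pairs) / len(pairs)
```

Here 8.5 of the 9 pairs are concordant (counting the one tie as a half), so c ≈ 0.94.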
31. Discrimination and calibration
• Discrimination: the extent to which risks differentiate between positive and negative
outcomes
• Calibration: the extent to which estimated risks are valid
• Discrimination is usually the no. 1 performance measure
- Risk models are typically compared based on discriminative performance, not on calibration
- A risk prediction model with no discriminative performance is uninformative
- A risk prediction model that is poorly calibrated is misleading
Van Calster, JCE, 2016, doi: 10.1016/j.jclinepi.2015.12.005
32. Overoptimism
Overoptimism
Predictive performance evaluations are too optimistic when estimated on the same data in which the risk prediction model was developed. This is therefore called the apparent performance of the model
• Optimism can be large, especially in small datasets and with a large number of predictors
• To get a better estimate of the predictive performance:
- Internal validation (same data sample)
- External validation (other data sample, discussed in tomorrow’s lecture)
33. Internal validation
• Evaluate performance of risk prediction model on data from the same population from
which model was developed
• Say that we start with one dataset with all data available: the original data
• Option 1: Splitting original data
- One portion to develop (’training set’); one portion to evaluate (’test set’)
- Non-random vs random split
- Generates 1 test of performance
• Option 2: Resampling from original data
- Cross-validation
- Bootstrapping
- Generates a distribution of performances
• General advice: avoid splitting (option 1) because
- Inefficient → especially when original data is small
- Usually leads to a too small test set
Read more: Steyerberg, JCE, 2001, doi: 10.1016/S0895-4356(01)00341-9
34. Bootstrapping
Steps:
• Randomly select individuals from the original data until a dataset of the same size is obtained (called the bootstrap sample)
• Each time an individual is selected, they are put back into the original dataset; individuals may therefore be selected more than once in each bootstrap sample
• Repeat this process many times - say 500 - to obtain 500 bootstrap samples
• Repeat the model development process (incl. non-linear effects, variable selection) on each of the bootstrap samples
• Calculate the predictive performance of each developed model in its own bootstrap sample and in the original data; the difference is the optimism
• Subtract the average optimism from the apparent performance to obtain an optimism-corrected estimate of the performance of the model in the original sample
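The steps above can be sketched end-to-end; a minimal, hedged illustration on simulated data (a one-predictor logistic model; the bare Newton-Raphson fit and 100 resamples are simplifications for brevity, not the lecture's code):

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated development data (illustrative): one predictor, binary outcome
n = 300
x = rng.normal(size=n)
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-x))).astype(float)

def fit_logistic(x, y):
    """Small logistic fit via a few Newton-Raphson steps (sketch, no safeguards)."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.zeros(2)
    for _ in range(25):
        p = 1 / (1 + np.exp(-X @ b))
        W = p * (1 - p)
        b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return b

def c_statistic(p, y):
    """Probability a random event gets a higher prediction than a random non-event."""
    pe, pn = p[y == 1], p[y == 0]
    diff = pe[:, None] - pn[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def predict(b, x):
    return 1 / (1 + np.exp(-(b[0] + b[1] * x)))

apparent = c_statistic(predict(fit_logistic(x, y), x), y)

# Bootstrap optimism correction (100 resamples here; 500 in the slides)
optimism = []
for _ in range(100):
    idx = rng.integers(0, n, n)               # sample n individuals with replacement
    b = fit_logistic(x[idx], y[idx])          # redo the full model building
    perf_boot = c_statistic(predict(b, x[idx]), y[idx])  # in the bootstrap sample
    perf_orig = c_statistic(predict(b, x), y)            # in the original data
    optimism.append(perf_boot - perf_orig)

corrected = apparent - float(np.mean(optimism))
```

With only two parameters and n = 300 the optimism is small; with many candidate predictors and small samples the correction becomes substantial.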
35. Steps of model development
• Research design and data collection
• Choice of statistical model, outcome and (candidate) predictors
• Initial data analysis
• Descriptive analysis
• Model specification and estimation
• Evaluation of performance and internal validation
• Presentation
36. Presentation
• Make sure that information about all the estimated regression parameters is provided, including the intercept.
• Consider: adding a nomogram, developing a score chart or app
• Follow the reporting guideline TRIPOD
TRIPOD, Ann Int Med, 2016, doi: 10.7326/M14-0697 and 10.7326/M14-0698
37. Report all estimated parameters
39. Outline
1 Introduction to prediction modelling
2 Example: predicting systolic blood pressure
3 Risk and probability
4 Risk prediction modelling: rationale and context
5 Risk prediction model building
6 Overfitting
7 External validation and updating
40. Overfitting
Curse of all statistical modelling1
What you see is not what you get2
When a model is fitted that is too complex, that is, it has too many free
parameters to estimate for the amount of information in the data, the worth of
the model (e.g., R2 ) will be exaggerated and future observed values will not
agree with predicted values3
Idiosyncrasies in the data are fitted rather than generalizable patterns. A
model may hence not be applicable to new patients, even when the setting of
application is very similar to the development setting4
1van Houwelingen, Stat Med, 2000, PMID: 11122504; 2Babyak, Psychosomatic Medicine, 2004, PMID: 15184705; 3Harrell, 2001, Springer, ISBN
978-1-4757-3462-1; 4 Steyerberg, 2009, Springer, ISBN 978-0-387-77244-8.
41. Overfitting poem
Wherry, Personnel Psychology, 1975, doi: 10.1111/j.1744-6570.1975.tb00387.x
43. Overfitting causes and consequences
Steyerberg, 2009, Springer, ISBN 978-0-387-77244-8.
44. Overfitting: typical calibration plot
• Low probabilities are predicted too low, high probabilities are predicted too high
46. Calibration development data: not insightful
Bell, BMJ, 2015, doi: 10.1136/bmj.h5639
47. How to avoid overfitting?
• Be conservative in selecting/removing predictor variables
• Avoid stepwise selection and forward selection
• When using backward elimination, use conservative p-values (e.g. p = 0.10 or 0.20)
• Apply shrinkage methods
• Increase the sample size
48. Automated (stepwise) variable selection
• Selection unstable: selection and order of entry often overinterpreted
• Limited power to detect true effects: predictive ability suffers, underfitting
• Risk of false-positive associations: multiple testing, overfitting
• Inference biased: P-values exaggerated; standard errors too small
• Estimated coefficients biased: testimation
Figure: Steyerberg, JCE, 2018, doi: 10.1016/j.jclinepi.2017.11.013; Read more: Heinze, Biometrical J, 2018, doi: 10.1002/bimj.201700067
49. 1956: Stein's paradox
Stein, 1956: http://www.dtic.mil/dtic/tr/fulltext/u2/1028390.pdf
50. 1956: Stein's paradox
In words (rather simplified):
When one has three or more units (say, individuals), and for each unit one can
calculate an average score (say, average blood pressure), then the best guess
of future observations (blood pressure) for each unit is NOT its average score
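A hedged numerical sketch of this idea, using the Efron-Morris form of the James-Stein estimator that shrinks each observed average toward the grand mean (equal, known sampling variance assumed; all numbers illustrative, and the formula should be checked against the original papers before serious use):

```python
import numpy as np

# Observed per-unit averages (e.g. average blood pressures; illustrative)
z = np.array([0.35, 0.25, 0.30, 0.20, 0.40])
sigma2 = 0.01                # assumed sampling variance of each average
p = len(z)
zbar = z.mean()

# Shrinkage factor toward the grand mean; in practice the positive-part
# version max(s, 0) is used to avoid overshooting
s = 1 - (p - 3) * sigma2 / np.sum((z - zbar) ** 2)
js = zbar + s * (z - zbar)   # shrunken "best guesses" for future observations
```

Each shrunken estimate lies between the unit's own average and the grand mean, which is exactly the counterintuitive part of Stein's result.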
51. 1961: James-Stein estimator: the next Berkeley Symposium
James, 1961: https://projecteuclid.org/euclid.bsmsp/1200512173
52. 1977: Baseball example
Efron, Scientific American, 1977, www.jstor.org/stable/24954030
53. Lessons from Stein’s paradox
• Stein’s paradox is among the most surprising (and initially doubted) phenomena in statistics
• After the James-Stein estimator, many other shrinkage estimators were developed; now a large family: shrinkage estimators reduce prediction variance to an extent that outweighs the bias that is introduced (bias/variance trade-off)
Bias, variance and prediction error
Expected prediction error = irreducible error + bias² + variance
Friedman et al. (2001). The elements of statistical learning. Vol. 1. New York: Springer series.
64. Was I just lucky?
No: a 5% reduction in MSPE just by using a shrinkage estimator (Van Houwelingen and le Cessie's heuristic shrinkage factor)
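As a hedged sketch, the Van Houwelingen-le Cessie heuristic shrinkage factor can be computed from the model likelihood ratio statistic (values illustrative; verify the formula against the original paper before use):

```python
# Heuristic shrinkage factor: s_hat = (model chi2 - df) / model chi2, where
# model chi2 is the likelihood ratio statistic of the fitted model and df
# the number of predictor parameters (both values below are illustrative)
model_chi2 = 40.0
df = 8
s_hat = (model_chi2 - df) / model_chi2   # multiply all coefficients by s_hat
```

A weak model (chi2 barely above df) thus gets heavy shrinkage, a strong model almost none.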
67. Shrinkage estimators
Popular shrinkage approaches for prediction modeling:
• Bootstrap
• Heuristic formula
• Firth's correction
• Ridge regression
• LASSO regression
• Bayesian prediction modeling
• Note: shrinkage is in general particularly beneficial for calibration of the risk prediction
model and less so for its discrimination
Further reading: Pavlou, BMJ, 2015, doi: 10.1136/bmj.h3868; van Smeden, SMMR, 2018, doi: 10.1177/0962280218784726
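To make shrinkage concrete, a minimal ridge regression sketch in closed form (simulated, illustrative data; ridge stands in here for the whole family of methods listed above):

```python
import numpy as np

# Simulated data: 10 candidate predictors, only 2 with real effects (illustrative)
rng = np.random.default_rng(3)
n, p = 50, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:2] = [1.0, -1.0]
y = X @ beta_true + rng.normal(0, 1, n)

# Closed-form solutions: OLS vs ridge, beta_ridge = (X'X + lambda*I)^-1 X'y
lam = 5.0
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

The ridge coefficients are pulled toward zero relative to OLS, trading a little bias for a larger reduction in prediction variance.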
68. Sample size
• Sample size is an important factor driving the performance of risk prediction models
• No consensus on what counts as an adequate sample size
• General principles for adequate sample size:
- Effective sample size is driven by the number of observations in the smaller of the two outcome groups (with or without the predicted outcome), by convention called "events"
- EPV: the number of events divided by the number of candidate predictors is a common ratio to describe model parsimony vs effective sample size
- EPV < 10 is the "danger zone": avoid
- An EPV much larger than 10 is often needed to obtain a prediction model that gives precise risk estimates
Further reading: van Smeden, SMMR, 2018, doi: 10.1177/0962280218784726
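The EPV rule of thumb above amounts to a one-line calculation; a toy sketch with illustrative counts:

```python
# Illustrative study: 1000 patients, 80 with the outcome, 12 candidate predictors
n_total, n_with_outcome, n_candidates = 1000, 80, 12

# The smaller outcome group counts as the "events" for the effective sample size
n_events = min(n_with_outcome, n_total - n_with_outcome)
epv = n_events / n_candidates
danger = epv < 10          # the slide's rule-of-thumb "danger zone"
```

Here EPV ≈ 6.7, so this model would sit in the danger zone despite 1000 patients.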
69. Sample size and shrinkage
Benefit of regression shrinkage depends on:
• Sample size
• Correlations between predictor variables
• Sparsity of outcome and predictor variables
• The irreducible error component
• Type of outcome (continuous, binary, count, time-to-event,...)
• Number of candidate predictor variables
• Non-linear/interaction effects
• Weak/strong predictor balance
How do you know that there is no need for shrinkage at a given sample size?
Advice: always apply shrinkage regardless of sample size and compare with the non-shrunken model. Very large differences may indicate a variety of unidentified issues that may need fixing → contact a statistician
70. Outline
1 Introduction to prediction modelling
2 Example: predicting systolic blood pressure
3 Risk and probability
4 Risk prediction modelling: rationale and context
5 Risk prediction model building
6 Overfitting
7 External validation and updating
71. Prediction model landscape
• > 110 models for prostate cancer (Shariat 2008)
• > 100 models for traumatic brain injury (Perel 2006)
• 83 models for stroke (Counsell 2001)
• 54 models for breast cancer (Altman 2009)
• 43 models for type 2 diabetes (Collins 2011; Dieren 2012)
• 31 models for osteoporotic fracture (Steurer 2011)
• 29 models in reproductive medicine (Leushuis 2009)
• 26 models for hospital readmission (Kansagara 2011)
• > 25 models for length of stay in cardiac surgery (Ettema 2010)
• > 350 models for cardiovascular disease outcomes (Damen 2016)
• What if your model becomes number 300-something?
• What about the clinical benefit/utility of number 300-something?
Courtesy of KGM Moons and GS Collins for this overview
72. Before developing yet another model, know that:
• For most diseases / outcomes risk prediction models have already been developed
→ Only few are externally validated or updated
→ Even fewer are disseminated and used in clinical practice
• Use your data for external validation of models already developed!
73. External validation
• Study of the predictive performance of the risk prediction model in data of new subjects that were not used to develop it
• The larger the difference between development and validation data, the more a successful validation tells us about the model's usefulness in (as yet) untested populations
- Case-mix (distributions of predictors and outcome)
• External validation is the strongest test of a prediction model
- Different time period ('temporal')
- Different areas/centres ('geographical')
- Ideally by independent investigators
Collins, BMJ, 2012, doi: 10.1136/bmj.e3186
74. External validation is not
• It is not repeating the model development steps: whether the same predictors, regression coefficients and predictive performance would be found in new data is not the question
• It is not re-estimating a previously developed model: updating regression coefficients is sometimes done when the performance at external validation is unsatisfactory, but this is model updating (model revision) and calls for a new external validation
75. What to expect at external validation
• Decreased predictive performance compared to development is expected
• Many possible causes:
- Overfitting of the model at development
- Different type of patients (case mix)
- Different outcome occurrence
- Differences in care over time
- Differences in treatments
- Improvements in measurements over time (e.g. previous CTs less accurate than spiral CT for PE detection)
- . . .
• When predictive performance is judged too low → consider model updating
76. Model updating
• Recalibration-in-the-large: re-estimate the intercept
• Recalibration: re-estimate the intercept and multiply all coefficients by a single common factor (the calibration slope)
Table from Vergouwe, Stat Med, 2017, doi: 10.1002/sim.7179
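Recalibration can be sketched as a logistic regression of the outcome on the original model's linear predictor in the validation data; a minimal sketch on simulated data (all numbers illustrative; the true slope of 0.6 mimics an overfitted model):

```python
import numpy as np

# Simulated validation set: lp is the original model's linear predictor
rng = np.random.default_rng(11)
n = 500
lp = rng.normal(0.0, 1.5, n)
# True relation in the validation data has slope 0.6 (overfitted original model)
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(0.2 + 0.6 * lp)))).astype(float)

# Fit logit(Pr(Y=1)) = a + b*lp by Newton-Raphson; b is the calibration slope
X = np.column_stack([np.ones(n), lp])
b = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ b))
    W = p * (1 - p)
    b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))

intercept_update, calibration_slope = b
```

A calibration slope well below 1 (here near 0.6) is the signature of overfitting; the recalibrated model multiplies all original coefficients by this slope and adds the new intercept.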
77. Sample size for external validation
Vergouwe, JCE, 2005, doi: 10.1016/j.jclinepi.2004.06.017; Collins, Stat Med, 2015, doi: 10.1002/sim.6787
79. Advanced Epidemiologic Methods
causal research and prediction modelling
Final remarks
Maarten van Smeden
LUMC, Department of Clinical Epidemiology
20-24 August 2018
81. Machine learning
Beam, JAMA, 2018, doi: 10.1001/jama.2017.18391
82. Machine learning
Shah, JAMA, 2018, doi: 10.1001/jama.2018.5602
83. Machine learning
Shah, JAMA, 2018, doi: 10.1001/jama.2018.5602
84. Machine learning
source: blog Frank Harrell, http://www.fharrell.com/post/stat-ml/
85. Final remarks
• Prediction models can take many forms but in medicine the interest is often in calculating
risk of a health state currently being present (diagnostic) or developing in the future
(prognostic)
• Risk prediction models are tools that aim to support medical decision making, not replace
physicians
• Many prediction models have been developed already → make sure you review the earlier models in the field before deciding to build your own
• Calibration is essential for accurate risk prediction. Miscalibrated models misinform and may
cause patients harm
86. Acknowledgment
The materials (slides) used in this course were inspired by materials that belong to Prof dr Gary Collins.