Machine learning versus traditional statistical modeling and medical doctors
Maarten van Smeden
Leiden University Medical Center
IBS ROeS - Lausanne
September 10, 2019
Left out artificial intelligence?
In medical research, “artificial intelligence” usually
just means “machine learning” or “algorithm”
Tech company business model
https://bit.ly/2HSp8X5; https://bit.ly/2Z0Pfop; https://bit.ly/2KIcpHG; https://bit.ly/33IJhr9
“Everything is an ML method”
https://bit.ly/2lEVn33
“ML methods come from computer science”
https://bit.ly/2zhbwPv; https://stanford.io/2TVp1xK; https://stanford.io/2ZfED0k
            Leo Breiman              Jerome H Friedman        Trevor Hastie
Known for   CART, random forest      Gradient boosting        Elements of statistical learning
Education   Physics/Math             Physics                  Statistics
Job title   Professor of Statistics  Professor of Statistics  Professor of Statistics
“ML methods for prediction, statistics for explaining”
Damen, BMJ, 2016, DOI:10.1136/bmj.i2416
Of 363 developed prediction models, how many used ML methods?
Decision trees 0
Random forests 0
Support vector machines 0
Nearest neighbor algorithms 0
Neural networks 1
“ML methods for prediction, statistics for explaining”
ML and causal inference, small selection1
• Superlearner (e.g. van der Laan)
• High-dimensional propensity scores (e.g. Schneeweiss)
• The book of why (Pearl)
1See further: Kreif and Diaz-Ordaz; https://bit.ly/2m1eYdK
Wednesday 10:40-12:10 Keynote Session 3
Els Goetghebeur: Plea for a marriage of
machine learning and causal inference
Two cultures
Breiman, Stat Sci, 2001, DOI: 10.1214/ss/1009213726
Robert Tibshirani: https://stanford.io/2zqEGfr
Machine learning: large grant = $1,000,000
Statistics: large grant = $50,000
Language

Statistics                     Machine learning
Covariates                     Features
Outcome variable               Target
Model                          Network, graphs
Parameters                     Weights
Model for discrete var.        Classifier
Model for continuous var.      Regression
Log-likelihood                 Loss
Multinomial regression         Softmax
Measurement error              Noise
Subject/observation            Sample/instance
Dummy coding                   One-hot encoding
Measurement invariance         Concept drift
Prediction                     Supervised learning
Latent variable modeling       Unsupervised learning
Fitting                        Learning
Prediction error               Error
Sensitivity                    Recall
Positive predictive value      Precision
Contingency table              Confusion matrix
Measurement error model        Noise-aware ML
Structural equation model      Gaussian Bayesian network
Gold standard                  Ground truth
Derivation–validation          Training–test
Experiment                     A/B test

Adapted from Daniel Oberski: https://bit.ly/2YN12Xf and Robert Tibshirani: https://stanford.io/2zqEGfr
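One pair from the vocabulary above can be made concrete in a few lines. A minimal Python sketch (the category names are invented for illustration) showing that "dummy coding" and "one-hot encoding" differ only in whether a reference category is dropped:

```python
def one_hot(value, levels):
    """Full one-hot encoding: one indicator column per level."""
    return [1 if value == lev else 0 for lev in levels]

def dummy(value, levels):
    """Dummy coding as used in regression: the first level is the
    reference category and gets no column (avoids collinearity with
    the intercept)."""
    return [1 if value == lev else 0 for lev in levels[1:]]

levels = ["A", "B", "C"]
print(one_hot("B", levels))  # [0, 1, 0]
print(dummy("B", levels))    # [1, 0]
print(dummy("A", levels))    # [0, 0]  (reference category)
```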
ML refers to a culture, not to methods
Distinguishing between statistics and machine learning
• Substantial overlap in the methods used by both cultures
• Substantial overlap in analysis goals
• Attempts to separate the two frequently result in disagreement
Pragmatic approach:
I’ll use “ML” to refer to models roughly outside of the traditional regression
types of analysis: decision trees (and descendants), SVMs, neural networks,
boosting etc.
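For readers less familiar with the non-regression families just listed, the simplest member of the decision-tree family, a one-split "decision stump", can be fitted by exhaustive search over thresholds. A minimal sketch on invented data:

```python
def fit_stump(xs, ys):
    """Find the threshold on a single feature that minimises
    misclassification error, trying both orientations of the split."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            pred = [1 if sign * x > sign * t else 0 for x in xs]
            err = sum(p != y for p, y in zip(pred, ys))
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best[1], best[2]

# Toy data: outcome switches from 0 to 1 between x = 3 and x = 4
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]
print(fit_stump(xs, ys))  # (3.0, 1): predict 1 when x > 3.0
```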
Example: retinal disease
Gulshan et al, JAMA, 2016, 10.1001/jama.2016.17216; Picture retinopathy: https://bit.ly/2kB3X2w
Diabetic retinopathy
Deep learning (= Neural network)
• 128,000 images
• Transfer learning (preinitialization)
• Sensitivity and specificity > .90
• Estimated from training data
Example: lymph node metastases
Bejnordi et al, JAMA, 2018, doi: 10.1001/jama.2017.14585. See our letter to the editor for a critical discussion: https://bit.ly/2kcYS0e
Deep learning competition
• 390 teams signed up, 23 submitted
• Only 270 images for training
• Test AUC range: 0.56 to 0.99
Deep learning on images
Many similar studies and challenges in radiology, pathology,
dermatology, ophthalmology, gastroenterology, cardiology, ….
Topol, Nature Medicine, 2019, DOI: 10.1038/s41591-018-0300-7
Other sources of “medical” data
• Large scale gene expression data
• e.g. diagnosis of acute myeloid leukemia
https://bit.ly/2k8Ao8e
• Prognostication by text mining electronic health records
• e.g. predicting life expectancy
https://bit.ly/2k8Ao8e
• Analyzing social media posts
• e.g. pharmacovigilance, adverse events monitoring via Twitter posts
https://bit.ly/2m0KKrg
Comparison “ML” vs statistical models
• Machine learning methods versus statistical models is a false
dichotomy
• Advanced “ML” shows promise, especially in areas that are
not the traditional “tabular data” (e.g. images)
• Tabular data settings where “ML” can be compared with
traditional regression model techniques show little added value
in medical applications
Classification versus risk prediction
Most ML "classifiers" do not naturally provide risk predictions, i.e.
probability estimates of the predicted outcome for individuals
• Possibly much larger sample sizes are needed to obtain reliable
(calibrated) risk predictions1 than reliable classifications
• Models can be trained to optimize a particular predictive
performance measure (e.g. AUC, sensitivity, calibration)
• Which performance measure should be used to compare models that
are optimized for different types of performance?
• What about the patient outcomes?
Van Smeden et al., Stat Meth Med Res, 2019
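The classification-versus-risk-prediction distinction can be illustrated numerically: any monotone distortion of predicted risks leaves the ranking, and hence the AUC and all threshold-based classifications, unchanged, while ruining calibration. A toy sketch (all numbers invented):

```python
def auc(probs, labels):
    """Concordance (AUC): probability that a random event receives a
    higher predicted risk than a random non-event."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    pairs = [(p, q) for p in pos for q in neg]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p, q in pairs)
    return wins / len(pairs)

labels = [0, 0, 0, 1, 0, 1, 1, 1]           # observed event rate 0.5
calibrated = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
distorted = [p ** 3 for p in calibrated]      # same ranking, wrong scale

print(auc(calibrated, labels) == auc(distorted, labels))  # True
print(sum(calibrated) / 8)  # mean predicted risk close to the event rate
print(sum(distorted) / 8)   # mean predicted risk far below the event rate
```

Both sets of predictions classify identically at any matched threshold, yet only the first is usable as a risk estimate for individuals.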
Where do we stand on “ML” vs doctors?
Domain: radiology and pathology
• Article hits: 12,000
• After screening: 22
• Out-of-sample comparison “ML” vs doctors: 2
Faes et al., Lancet preprint, 2019, https://ssrn.com/abstract=3384923
Fair “ML” vs doctor comparisons
Three basic principles
• Doctors should work under realistic time constraints and have
access to all regular diagnostic information, including relevant
additional diagnostic testing, unless there are compelling
reasons not to do so
• The output generated by algorithms and physicians should be
evaluated on the same scale
• Performance over-optimism should be avoided
Van Smeden et al., JAMA, 2018, doi:
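The third principle, avoiding performance over-optimism, is easy to demonstrate: a model that memorizes its training data achieves perfect apparent accuracy even when the labels are pure noise. A minimal sketch with synthetic data (the true attainable accuracy here is 50%):

```python
import random

random.seed(1)
# One random feature, labels that are pure noise (coin flips)
train = [(random.random(), random.randint(0, 1)) for _ in range(200)]
test = [(random.random(), random.randint(0, 1)) for _ in range(200)]

def memorizer(x, memory):
    """1-nearest-neighbour on a single feature: return the label of the
    closest stored training point (perfect recall of training data)."""
    return min(memory, key=lambda xy: abs(xy[0] - x))[1]

apparent = sum(memorizer(x, train) == y for x, y in train) / len(train)
held_out = sum(memorizer(x, train) == y for x, y in test) / len(test)

print(f"apparent accuracy: {apparent:.2f}")   # 1.00 on the training data
print(f"test accuracy:     {held_out:.2f}")   # close to chance
```

Reporting the apparent (training-data) performance would make this useless model look perfect; only the held-out evaluation reveals it.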
Fair “ML” vs doctor comparisons
Several barriers for diagnosis/prognosis
• Absence of a gold standard for most diseases1
• Errors/unclear categories are to be expected
• Errors are transferred to the algorithm
• Risk of overestimating performance
• Which performance measures should we be looking at?
• AUC, sens/spec, predictive values, F1?
• What about patient outcomes?
1See: Reitsma, Journal of Clinical Epidemiology, 2009, doi: 10.1016/j.jclinepi.2009.02.005
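The measures just listed can all be computed from a single 2x2 contingency table (confusion matrix). A short sketch with invented counts, giving both the biostatistics and the ML names:

```python
# Hypothetical 2x2 table: true/false positives and negatives
tp, fp, fn, tn = 90, 30, 10, 70

sensitivity = tp / (tp + fn)          # ML name: recall
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)                  # ML name: precision
f1 = 2 * ppv * sensitivity / (ppv + sensitivity)

print(f"sensitivity/recall: {sensitivity:.2f}")  # 0.90
print(f"specificity:        {specificity:.2f}")  # 0.70
print(f"PPV/precision:      {ppv:.2f}")          # 0.75
print(f"F1:                 {f1:.3f}")
```

Note that sensitivity and PPV (and hence F1) ignore the true negatives entirely, which is one reason the choice of measure matters for the comparison.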
My plea
To big data (and use it) and back to trials
• There is a need to evaluate and compare the performance of
well-developed statistical learning models on patient outcomes
(e.g. survival, response to treatment, PROs, etc.)
• The analogue of test-treatment trials in diagnostic research:
algorithm-treatment trials