Machine learning versus traditional statistical modeling and medical doctors
Maarten van Smeden
Leiden University Medical Center
IBS ROeS - Lausanne
September 10, 2019
Left out artificial intelligence?
In medical research, “artificial intelligence” usually
just means “machine learning” or “algorithm”
Tech company business model
https://bit.ly/2HSp8X5; https://bit.ly/2Z0Pfop; https://bit.ly/2KIcpHG; https://bit.ly/33IJhr9
“Everything is an ML method”
https://bit.ly/2lEVn33
“ML methods come from computer science”
https://bit.ly/2zhbwPv; https://stanford.io/2TVp1xK; https://stanford.io/2ZfED0k
            Leo Breiman              Jerome H Friedman        Trevor Hastie
Known for   CART, random forest      Gradient boosting        Elements of statistical learning
Education   Physics/Math             Physics                  Statistics
Job title   Professor of Statistics  Professor of Statistics  Professor of Statistics
“ML methods for prediction, statistics for explaining”
Damen, BMJ, 2016, DOI:10.1136/bmj.i2416
Of 363 developed prediction models, how many used ML methods?
Decision trees 0
Random forests 0
Support vector machines 0
Nearest neighbor algorithms 0
Neural networks 1
“ML methods for prediction, statistics for explaining”
ML and causal inference, small selection1
• Superlearner (e.g. van der Laan)
• High-dimensional propensity scores (e.g. Schneeweiss)
• The book of why (Pearl)
1See further: Kreif and Diaz-Ordaz; https://bit.ly/2m1eYdK
Wednesday 10:40-12:10 Keynote Session 3
Els Goetghebeur: Plea for a marriage of
machine learning and causal inference
Two cultures
Breiman, Stat Sci, 2001, DOI: 10.1214/ss/1009213726
Robert Tibshirani: https://stanford.io/2zqEGfr
Machine learning: large grant = $1,000,000
Statistics: large grant = $50,000
Language

Statistics                     Machine learning
Covariates                     Features
Outcome variable               Target
Model                          Network, graphs
Parameters                     Weights
Model for discrete var.        Classifier
Model for continuous var.      Regression
Log-likelihood                 Loss
Multinomial regression         Softmax
Measurement error              Noise
Subject/observation            Sample/instance
Dummy coding                   One-hot encoding
Measurement invariance         Concept drift
Prediction                     Supervised learning
Latent variable modeling       Unsupervised learning
Fitting                        Learning
Prediction error               Error
Sensitivity                    Recall
Positive predictive value      Precision
Contingency table              Confusion matrix
Measurement error model        Noise-aware ML
Structural equation model      Gaussian Bayesian network
Gold standard                  Ground truth
Derivation–validation          Training–test
Experiment                     A/B test

Adapted from Daniel Oberski: https://bit.ly/2YN12Xf and Robert Tibshirani: https://stanford.io/2zqEGfr
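One pair from the vocabulary above can be made concrete in a few lines. A minimal Python sketch (the category names are invented for illustration) showing that "dummy coding" and "one-hot encoding" differ only in whether a reference category is dropped:

```python
def one_hot(value, levels):
    """Full one-hot encoding: one indicator column per level."""
    return [1 if value == lev else 0 for lev in levels]

def dummy(value, levels):
    """Dummy coding as used in regression: the first level is the
    reference category and gets no column (avoids collinearity with
    the intercept)."""
    return [1 if value == lev else 0 for lev in levels[1:]]

levels = ["A", "B", "C"]
print(one_hot("B", levels))  # [0, 1, 0]
print(dummy("B", levels))    # [1, 0]
print(dummy("A", levels))    # [0, 0]  (reference category)
```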
ML refers to a culture, not to methods
Distinguishing between statistics and machine learning
• Substantial overlap in the methods used by both cultures
• Substantial overlap in analysis goals
• Attempts to separate the two frequently result in disagreement
Pragmatic approach:
I’ll use “ML” to refer to models roughly outside of the traditional regression
types of analysis: decision trees (and descendants), SVMs, neural networks,
boosting etc.
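For readers less familiar with the non-regression families just listed, the simplest member of the decision-tree family, a one-split "decision stump", can be fitted by exhaustive search over thresholds. A minimal sketch on invented data:

```python
def fit_stump(xs, ys):
    """Find the threshold on a single feature that minimises
    misclassification error, trying both orientations of the split."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            pred = [1 if sign * x > sign * t else 0 for x in xs]
            err = sum(p != y for p, y in zip(pred, ys))
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best[1], best[2]

# Toy data: outcome switches from 0 to 1 between x = 3 and x = 4
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]
print(fit_stump(xs, ys))  # (3.0, 1): predict 1 when x > 3.0
```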
Example: retinal disease
Gulshan et al, JAMA, 2016, 10.1001/jama.2016.17216; Picture retinopathy: https://bit.ly/2kB3X2w
Diabetic retinopathy
Deep learning (= Neural network)
• 128,000 images
• Transfer learning (preinitialization)
• Sensitivity and specificity > .90
• Estimated from training data
Example: lymph node metastases
Bejnordi et al, JAMA, 2018, doi: 10.1001/jama.2017.14585. See our letter to the editor for a critical discussion: https://bit.ly/2kcYS0e
Deep learning competition
• 390 teams signed up, 23 submitted
• Only 270 images for training
• Test AUC range: 0.56 to 0.99
Deep learning on images
Many similar studies and challenges in radiology, pathology,
dermatology, ophthalmology, gastroenterology, cardiology, ….
Topol, Nature Medicine, 2019, DOI: 10.1038/s41591-018-0300-7
Other sources of “medical” data
• Large scale gene expression data
• e.g. diagnosis of acute myeloid leukemia
https://bit.ly/2k8Ao8e
• Prognostication by text mining electronic health records
• e.g. predicting life expectancy
https://bit.ly/2k8Ao8e
• Analyzing social media posts
• e.g. pharmacovigilance, adverse events monitoring via Twitter posts
https://bit.ly/2m0KKrg
Comparison “ML” vs statistical models
• Machine learning methods versus statistical models is a false
dichotomy
• Advanced “ML” shows promise, especially in areas that are
not the traditional “tabular data” (e.g. images)
• Tabular data settings where “ML” can be compared with
traditional regression model techniques show little added value
in medical applications
Classification versus risk prediction
Most ML "classifiers" do not naturally provide risk predictions, i.e.
probability estimates of the predicted outcome for individuals
• Possibly much larger sample sizes are needed to obtain reliable
(calibrated) risk predictions1 than reliable classifications
• Models can be trained to optimize a particular predictive
performance measure (e.g. AUC, sensitivity, calibration)
• Which performance measure should be used to compare models that
are optimized for different types of performance?
• What about the patient outcomes?
Van Smeden et al., Stat Meth Med Res, 2019
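The classification-versus-risk-prediction distinction can be illustrated numerically: any monotone distortion of predicted risks leaves the ranking, and hence the AUC and all threshold-based classifications, unchanged, while ruining calibration. A toy sketch (all numbers invented):

```python
def auc(probs, labels):
    """Concordance (AUC): probability that a random event receives a
    higher predicted risk than a random non-event."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    pairs = [(p, q) for p in pos for q in neg]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p, q in pairs)
    return wins / len(pairs)

labels = [0, 0, 0, 1, 0, 1, 1, 1]           # observed event rate 0.5
calibrated = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
distorted = [p ** 3 for p in calibrated]      # same ranking, wrong scale

print(auc(calibrated, labels) == auc(distorted, labels))  # True
print(sum(calibrated) / 8)  # mean predicted risk close to the event rate
print(sum(distorted) / 8)   # mean predicted risk far below the event rate
```

Both sets of predictions classify identically at any matched threshold, yet only the first is usable as a risk estimate for individuals.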
Where do we stand on “ML” vs doctors?
Domain: radiology and pathology
• Article hits: 12,000
• After screening: 22
• Out-of-sample comparison “ML” vs doctors: 2
Faes et al., Lancet preprint, 2019, https://ssrn.com/abstract=3384923
Fair “ML” vs doctor comparisons
Three basic principles
• Doctors should work under realistic time constraints and have
access to all regular diagnostic information, including relevant
additional diagnostic testing, unless there are compelling
reasons not to do so
• The output generated by algorithms and physicians should be
evaluated on the same scale
• Performance over-optimism should be avoided
Van Smeden et al., JAMA, 2018, doi:
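The third principle, avoiding performance over-optimism, is easy to demonstrate: a model that memorizes its training data achieves perfect apparent accuracy even when the labels are pure noise. A minimal sketch with synthetic data (the true attainable accuracy here is 50%):

```python
import random

random.seed(1)
# One random feature, labels that are pure noise (coin flips)
train = [(random.random(), random.randint(0, 1)) for _ in range(200)]
test = [(random.random(), random.randint(0, 1)) for _ in range(200)]

def memorizer(x, memory):
    """1-nearest-neighbour on a single feature: return the label of the
    closest stored training point (perfect recall of training data)."""
    return min(memory, key=lambda xy: abs(xy[0] - x))[1]

apparent = sum(memorizer(x, train) == y for x, y in train) / len(train)
held_out = sum(memorizer(x, train) == y for x, y in test) / len(test)

print(f"apparent accuracy: {apparent:.2f}")   # 1.00 on the training data
print(f"test accuracy:     {held_out:.2f}")   # close to chance
```

Reporting the apparent (training-data) performance would make this useless model look perfect; only the held-out evaluation reveals it.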
Fair “ML” vs doctor comparisons
Several barriers for diagnosis/prognosis
• Absence of a gold standard for most diseases1
• Errors/unclear categories are to be expected
• Errors are transferred to the algorithm
• Risk of overestimating performance
• Which performance measures should we be looking at?
• AUC, sens/spec, predictive values, F1?
• What about patient outcomes?
1See: Reitsma, Journal of Clinical Epidemiology, 2009, doi: 10.1016/j.jclinepi.2009.02.005
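The measures just listed can all be computed from a single 2x2 contingency table (confusion matrix). A short sketch with invented counts, giving both the biostatistics and the ML names:

```python
# Hypothetical 2x2 table: true/false positives and negatives
tp, fp, fn, tn = 90, 30, 10, 70

sensitivity = tp / (tp + fn)          # ML name: recall
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)                  # ML name: precision
f1 = 2 * ppv * sensitivity / (ppv + sensitivity)

print(f"sensitivity/recall: {sensitivity:.2f}")  # 0.90
print(f"specificity:        {specificity:.2f}")  # 0.70
print(f"PPV/precision:      {ppv:.2f}")          # 0.75
print(f"F1:                 {f1:.3f}")
```

Note that sensitivity and PPV (and hence F1) ignore the true negatives entirely, which is one reason the choice of measure matters for the comparison.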
My plea
To big data (and use it) and back to trials
• There is a need to evaluate and compare the performance of
well-developed statistical learning models on patient outcomes
(e.g. survival, response to treatment, PROs, etc.)
• The analogue of test-treatment trials in diagnostic research:
algorithm-treatment trials