6. Singular goal of workshop
[Figure: accuracy distributions for models 1–6]
• understand
  • machine learning
  • support vector machine
  • classification accuracy
  • cross-validation
10. What is Machine Learning?
• “giving computers the ability to learn without being explicitly programmed.” (Arthur Samuel)
• i.e. building algorithms to learn patterns in data
• automatically
24. Clinical Application of ML
• ML has many clinical applications, including:
  • computer-aided diagnosis
  • clinical decision support
  • personalized medicine
  • treatment and monitoring
  • better care and service delivery systems (reducing length of hospitalization, optimizing resource redistribution, etc.)
• I will focus on biomarkers today!
  • more on how to assess their utility
  • less on how to identify, build and tune them.
25. Types of Machine Learning
• Supervised: data is labelled
• Unsupervised: data is not labelled
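To make the distinction concrete, here is a minimal sketch using scikit-learn (a library assumed here, not named in the slides); the data and names are illustrative only.

```python
# Supervised vs. unsupervised learning: an illustrative sketch
# (assumes scikit-learn; the data below is synthetic).
import numpy as np
from sklearn.svm import SVC
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))              # features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # labels, used only in the supervised case

# Supervised: data + labels -> learn a mapping from X to y
clf = SVC(kernel="linear").fit(X, y)

# Unsupervised: data only -> discover structure (here, two clusters)
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(clf.predict(X[:5]), km.labels_[:5])
```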
54. Support Vector Machine (SVM)
• A popular classification technique
• At its core, it is
  • binary (separates two classes)
  • linear (boundary: a line in 2-D, a hyperplane in n-D)
• Its power lies in finding the boundary between classes that are difficult to separate
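To make the "binary, linear" core concrete, a hedged sketch (scikit-learn assumed, synthetic data) that fits a linear SVM and reads off the separating hyperplane w·x + b = 0:

```python
# A minimal linear, binary SVM on synthetic 2-D data (scikit-learn assumed).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (50, 2)),   # class 0
               rng.normal(+2, 1, (50, 2))])  # class 1
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear").fit(X, y)
# The learned boundary is the hyperplane w . x + b = 0
print("w =", clf.coef_[0], "b =", clf.intercept_[0])
```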
73. Harder problem (classes are not linearly separable)
[Figure: two candidate boundaries L1 and L2 in the (x1, x2) plane]
• L1 → fewer errors, smaller margin
• L2 → more errors, larger margin
• Tradeoff between error and margin!
• parameter C: penalty for misclassification (see the sketch below)
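A small sketch of that tradeoff (scikit-learn assumed; the overlapping data is synthetic): sweeping C shows that a small C tolerates more training errors in exchange for a wider margin, and a large C does the opposite.

```python
# Sweeping C (misclassification penalty) on overlapping classes.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1.5, (100, 2)),
               rng.normal(+1, 1.5, (100, 2))])  # overlapping classes
y = np.array([0] * 100 + [1] * 100)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])   # width of the margin
    err = 1.0 - clf.score(X, y)                   # training error
    print(f"C={C:6}: margin={margin:.2f}, training error={err:.2%}")
```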
89. Recap: SVM
• Linear classifier at its core
• Boundary with max. margin
• Input data can be transformed
to higher dimensions to
achieve better separation
90. Classifier Performance
• How do you evaluate how well the classifier works?
  • input unseen data with known labels (ground truth)
  • make predictions with the previously trained classifier
  • compute the % of predictions that match the ground truth → classification accuracy
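A sketch of this evaluation recipe (scikit-learn assumed, synthetic data): hold out part of the data, predict on it, and compute the fraction of matches.

```python
# Accuracy = fraction of predictions matching the ground truth on unseen data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC().fit(X_train, y_train)        # train on one part...
pred = clf.predict(X_test)               # ...predict on the unseen part
accuracy = np.mean(pred == y_test)       # % of matches with ground truth
print(f"accuracy: {accuracy:.2%}")
```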
99. What is generalizability?
• available data (sample*) → desired: accuracy on unseen data (population*)
• out-of-sample predictions
• avoid overfitting
*has a statistical definition
112. Cross-validation
• What is cross-validation?
• How to perform it?
• What are the effects of different CV choices?
[Figure: training/test split; CV accuracy estimates may carry negative bias, be unbiased, or carry positive bias]
124. Why cross-validate?
• bigger training set → better learning
• bigger test set → better testing
• Key: train & test sets must be disjoint.
• And the dataset (sample size) is fixed: the two sets grow at the expense of each other!
• Cross-validate to maximize both.
130. Use cases
• “When setting aside data for parameter estimation and validation of results can not be afforded, cross-validation (CV) is typically used”
• Use cases:
  • to estimate generalizability (test accuracy)
  • to pick optimal parameters (model selection)
  • to compare performance (model comparison)
[Figure: accuracy distribution from repetitions of CV (%)]
138. Key Aspects of CV
1. How you split the dataset into train/test
  • maximal independence between training and test sets is desired
  • this split could be
    • over samples, i.e. rows (e.g. individual diagnosis: healthy vs. disease)
    • over time, i.e. columns (for task prediction in fMRI)
2. How often you repeat the randomized splits
  • to expose the classifier to the full variability of the data
  • as many times as you can, e.g. 100 (see the sketch below)
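A hedged sketch of repeated, randomized, stratified splitting (scikit-learn assumed; data and class sizes are illustrative):

```python
# Repeated, randomized, stratified train/test splits (e.g. 100 repetitions).
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 10))
y = np.array([0] * 60 + [1] * 60)   # e.g. healthy vs. disease

cv = StratifiedShuffleSplit(n_splits=100, test_size=0.3, random_state=0)
scores = []
for train_idx, test_idx in cv.split(X, y):   # train and test indices are disjoint
    clf = SVC().fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))
print(f"mean accuracy over {len(scores)} splits: {np.mean(scores):.2%}")
```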
150. Validation set
• Training set: measures goodness of fit of the model; biased* towards the training set
• Test set: used to optimize parameters; biased towards the test set
• Validation set: used to evaluate generalization; independent of the training and test sets
[Figure: the whole dataset is split into training, test and validation sets; the train/test splits form the inner loop, validation the outer loop]
*biased towards X → overfit to X
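One common realization of this inner-loop/outer-loop structure is nested cross-validation; a sketch (scikit-learn assumed, synthetic data, and C chosen here purely as an example hyperparameter):

```python
# Nested CV: the inner loop tunes C; the outer loop reports generalization.
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score, KFold
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 8))
y = (X[:, 0] + rng.normal(0, 0.5, 100) > 0).astype(int)

inner = KFold(n_splits=5, shuffle=True, random_state=0)   # hyperparameter tuning
outer = KFold(n_splits=5, shuffle=True, random_state=1)   # performance estimate

tuned = GridSearchCV(SVC(kernel="linear"), {"C": [0.1, 1, 10]}, cv=inner)
scores = cross_val_score(tuned, X, y, cv=outer)   # held-out folds never touch the tuning
print(f"nested-CV accuracy: {scores.mean():.2%} ± {scores.std():.2%}")
```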
155. Terminology
• Training set
  • Purpose: train the model to learn its core parameters
  • Don't: report the training error as the test error!
  • Alternative name: training set (no confusion)
• Testing set
  • Purpose: optimize hyperparameters
  • Don't: do feature selection or anything supervised on this set to learn or optimize!
  • Alternative names: validation set (or tweaking, tuning, optimization set)
• Validation set
  • Purpose: evaluate the fully-optimized classifier to report performance
  • Don't: use it in any way to train the classifier or optimize parameters
  • Alternative names: test set (more accurately, reporting set)
164. K-fold CV
• Test sets in different trials are indeed mutually disjoint
[Figure: k trials; in each trial one fold (e.g. the 4th) is the test set, the rest is the training set]
• Note: different folds won't be contiguous.
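A short sketch (scikit-learn assumed) that verifies both properties: shuffled folds are not contiguous, yet the test folds partition the data.

```python
# K-fold CV: test folds across trials are mutually disjoint.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)
kf = KFold(n_splits=5, shuffle=True, random_state=0)  # shuffled: folds need not be contiguous
test_folds = [test for _, test in kf.split(X)]
for i, fold in enumerate(test_folds, 1):
    print(f"trial {i}: test indices {fold}")
# every sample appears in exactly one test fold
all_idx = np.concatenate(test_folds)
assert len(all_idx) == len(np.unique(all_idx))
```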
170. Repeated Holdout CV
• Set aside an independent subsample of the whole dataset (e.g. 30%) for testing
[Figure: n trials, each with a different randomized train/test split of the whole dataset]
• Note: there could be overlap among the test sets from different trials! Hence a large n is recommended.
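This contrast with k-fold can be seen directly; a sketch (scikit-learn assumed) counting how much the test sets of different trials overlap:

```python
# Repeated holdout via ShuffleSplit: test sets of different trials may overlap.
import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.arange(20).reshape(10, 2)
ss = ShuffleSplit(n_splits=4, test_size=0.3, random_state=0)
tests = [set(test) for _, test in ss.split(X)]
# nonzero counts show samples shared between test sets of different trials
print("pairwise overlaps:", [len(a & b) for a in tests for b in tests if a is not b])
```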
171. Typical workflow
• Whole dataset → randomized split into training and test sets
• Training set (with labels): feature extraction, feature selection and parameter optimization (on training data only) → trained classifier
• Test set, i.e. the rest (no labels): same feature extraction, select the same features, evaluate the trained classifier on the test set
• Pool predictions over repetitions (next CV repetition i of n) → accuracy distribution
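A sketch of this workflow (scikit-learn assumed; feature selector and classifier are illustrative choices): wrapping selection and classification in a pipeline guarantees the "on training data only" rule, since the selector is refit inside each fold.

```python
# Feature selection fit on training data only, applied unchanged to the test fold,
# with accuracies pooled over repetitions into a distribution.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 50))
y = np.array([0] * 50 + [1] * 50)

pipe = make_pipeline(SelectKBest(f_classif, k=10), SVC())  # selection happens inside each fold
cv = StratifiedShuffleSplit(n_splits=50, test_size=0.3, random_state=0)
acc = cross_val_score(pipe, X, y, cv=cv)   # one accuracy per repetition
print(f"accuracy distribution: median={np.median(acc):.2%}")
```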
172. Software
• There is a free machine learning toolbox in every
major language!
• Check these for the latest techniques/toolboxes:
  • http://www.jmlr.org/mloss/
  • http://mloss.org/software/
173. neuropredict: easy and comprehensive predictive analysis
[Figure panels: accuracy distributions, confusion matrices, feature importance, intuitive comparison of misclassification rates]
• input features
  • each feature set could be any set of numbers estimated from a sample by itself (intrinsic, not group-wise)
  • designed to seamlessly compare many feature sets (n > 1), if they are all from the same set of samples belonging to the same classes
  • supports many input formats
  • plugs directly into outputs from popular software like Freesurfer
• neuropredict
  • performs cross-validation in a way that increases the power of later statistical comparisons
  • tracks misclassification rates, class- and subject-wise, for each feature set
  • measures feature importance
  • statistical comparison of predictive performance
  • intuitive visualizations
  • streamlined comparison for a large number of feature sets!
• docs: http://neuropredict.readthedocs.io
• code: github.com/raamana/neuropredict
• twitter: @raamana_
180. neuropredict features
• Auto-reading of neuroimaging features
• Auto-evaluation of predictive accuracy
• Auto-comparison of performance …
• Notice the word I am repeating? Auto, Auto, Auto.
• Being automatic is important; without it, the analysis becomes hard and error-prone!
187. Model selection
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). Springer Series in Statistics. Springer.