Prototype-based classifiers and their applications in the life sciences
1. Michael Biehl
Mathematics and Computing Science
University of Groningen / NL
Prototype-based classifiers
and their applications in the life sciences
www.cs.rug.nl/~biehl
2. Michael Biehl
Mathematics and Computing Science
University of Groningen / NL
LVQ and Relevance Learning
frequently asked questions and rarely given answers
www.cs.rug.nl/~biehl
4. WSOM 2014
basics: distance-based classifiers, relevance learning
frequently asked questions:
- What about the curse of dimensionality?
- How do you find a good distance measure?
  example: Generalized Matrix LVQ
- What about over-fitting?
- Is the relevance matrix unique?
- Is it useful in practice?
applications: bio-medical data
(adrenal tumors, rheumatoid arthritis)
outlook: What's next?
5. WSOM 2014
K-NN classifier
a simple distance-based classifier:
- store a set of labeled examples
- classify a query according to the label of its nearest neighbor
  (or the majority vote among its K nearest neighbors)
- piece-wise linear class borders, parameterized by all examples
[figure: feature space with labeled examples and a query point "?"]
+ conceptually simple, no training required, only one parameter (K)
- expensive storage and computation, sensitivity to "outliers";
  can result in overly complex decision boundaries
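The K-NN rule described above can be sketched in a few lines; this is a minimal illustration (NumPy assumed, function and variable names are mine, not from the slides):

```python
import numpy as np

def knn_classify(X_train, y_train, query, k=3):
    """Majority vote among the k nearest stored examples,
    using the squared Euclidean distance."""
    d = np.sum((X_train - query) ** 2, axis=1)   # distance to every example
    nearest = np.argsort(d)[:k]                  # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

Note that every stored example enters the distance computation, which is exactly the storage/computation cost criticized above.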
6. WSOM 2014
prototype based classification
a prototype-based classifier:
- represent the data by one or several prototypes per class
- classify a query according to the label of the nearest prototype
  (or alternative schemes)
- piece-wise linear class borders, parameterized by the prototypes
[figure: feature space with prototypes and a query point "?"]
+ less sensitive to outliers, lower storage needs, little computational
  effort in the working phase
- training phase required in order to place the prototypes;
  model selection problem: number of prototypes per class, etc.
7. WSOM 2014
Nearest Prototype Classifier
set of prototypes w_k ∈ R^N carrying class labels c(w_k),
classification based on a dissimilarity / distance measure d(w, ξ)
minimal requirements: d(w, ξ) ≥ 0 and d(w, w) = 0
nearest prototype classifier (NPC):
given ξ, determine the winner w_L with d(w_L, ξ) = min_k d(w_k, ξ)
and assign ξ to class c(w_L)
standard example: squared Euclidean distance d(w, ξ) = (ξ − w)ᵀ(ξ − w)
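As a minimal sketch (NumPy, illustrative names), the NPC rule with the squared Euclidean distance is:

```python
import numpy as np

def npc_classify(prototypes, proto_labels, query):
    """Nearest prototype classifier: determine the winner with minimal
    squared Euclidean distance and assign the query to its class."""
    d = np.sum((prototypes - query) ** 2, axis=1)
    return proto_labels[int(np.argmin(d))]
```

In contrast to K-NN, only one distance per prototype is evaluated in the working phase.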
8. WSOM 2014
∙ identification of prototype vectors from labeled example data
∙ distance-based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors ξ ∈ R^N
competitive learning: LVQ1 [Kohonen, 1990]
• initialize prototype vectors for the different classes
• present a single example
• identify the winner (closest prototype)
• move the winner
- closer towards the data (same class)
- away from the data (different class)
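A single LVQ1 update step, as described above, can be sketched as follows (NumPy; the learning rate eta and all names are illustrative):

```python
import numpy as np

def lvq1_step(W, c, xi, y, eta=0.05):
    """One LVQ1 step: the winner (closest prototype) moves towards the
    example if its class label matches, away from it otherwise.
    W: prototype matrix (rows), c: prototype labels, (xi, y): example."""
    j = int(np.argmin(np.sum((W - xi) ** 2, axis=1)))   # winner
    sign = 1.0 if c[j] == y else -1.0
    W[j] += sign * eta * (xi - W[j])
    return W
```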
9. WSOM 2014
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
∙ tessellation of feature space
[piece-wise linear]
∙ distance-based classification
[here: Euclidean distances]
∙ generalization ability
correct classification of new data
∙ aim: discrimination of classes
( ≠ vector quantization
or density estimation )
10. WSOM 2014
What about the curse of dimensionality?
concentration of norms/distances for large N:
"distance-based methods are bound to fail in high dimensions"?
LVQ:
- prototypes are not just random data points,
  but carefully selected representatives of the data
- only distances of a given data point to the prototypes are compared:
  an implicit projection to a non-trivial low-dimensional subspace!
see also: models of LVQ training with analytical treatment in the limit
of large N; successful training needs a number of examples on the
order of N [Ghosh et al., 2007; Witoelar et al., 2010]
11. WSOM 2014
cost function based LVQ
one example: Generalized LVQ [Sato & Yamada, 1995]
two winning prototypes per example ξ: the closest correct prototype
(distance d_J) and the closest incorrect prototype (distance d_K)
minimize E = Σ_μ Φ[ (d_J − d_K) / (d_J + d_K) ]
Φ sigmoidal (linear for small arguments):
E approximates the number of misclassifications
Φ linear, e.g. Φ(x) = x:
E favors large-margin separation of the classes
small d_J, large d_K: E favors class-typical prototypes
12. WSOM 2014
cost function based LVQ
"There is nothing objective about objective functions"
(J. McClelland)
13. WSOM 2014
GLVQ
training = optimization with respect to prototype positions,
e.g. single example presentation, stochastic sequence of examples,
update of two prototypes per step,
based on a non-negative, differentiable distance d
requirement: the update decreases d_J (distance to the closest correct
prototype) and increases d_K (distance to the closest incorrect prototype)
for the squared Euclidean distance the update reads
Δw_J = +η Φ′(μ) [2 d_K / (d_J + d_K)²] · 2(ξ − w_J)
Δw_K = −η Φ′(μ) [2 d_J / (d_J + d_K)²] · 2(ξ − w_K)
i.e. it moves the prototypes towards / away from the
sample with prefactors proportional to d_K and d_J, respectively
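For the Euclidean case with identity Φ, one GLVQ gradient step can be sketched as follows (NumPy; names and the learning rate are illustrative, and the prefactors are the derivatives of μ = (d_J − d_K)/(d_J + d_K)):

```python
import numpy as np

def glvq_step(wJ, wK, xi, eta=0.05):
    """One GLVQ gradient step, Euclidean distance, Phi(x) = x.
    wJ: closest correct prototype, wK: closest incorrect prototype.
    Descending mu pulls wJ towards xi and pushes wK away."""
    dJ = float(np.sum((xi - wJ) ** 2))
    dK = float(np.sum((xi - wK) ** 2))
    denom = (dJ + dK) ** 2
    wJ = wJ + eta * (2 * dK / denom) * 2 * (xi - wJ)   # attraction
    wK = wK - eta * (2 * dJ / denom) * 2 * (xi - wK)   # repulsion
    return wJ, wK
```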
16. WSOM 2014
What is a good distance measure?
fixed distance measures:
- select the distance measure according to prior knowledge
- data-driven choice in a preprocessing step
- compare the performance of various measures
  example: divergence based LVQ (DLVQ),
  Mwebaze et al., Neurocomputing (2011)
relevance learning:
- employ a parameterized distance measure
- update its parameters in the training process,
  together with prototype training
- adaptive, data-driven dissimilarity
  example: Matrix Relevance LVQ
17. WSOM 2014
Relevance Matrix LVQ
generalized quadratic distance in LVQ:
d_Λ(w, ξ) = (ξ − w)ᵀ Λ (ξ − w) with Λ = ΩᵀΩ (positive semi-definite)
variants:
- diagonal matrices: single feature weights
  [Bojer et al., 2001; Hammer et al., 2002]
- one global, several local, or class-wise relevance matrices Λ(j)
  → piecewise quadratic decision boundaries [Schneider et al., 2009]
- rectangular Ω: discriminative low-dim. representation,
  e.g. for visualization [Bunte et al., 2012]
- possible constraints: rank control, sparsity, …
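The generalized quadratic distance is conveniently evaluated through Ω rather than Λ; a minimal sketch (NumPy, illustrative names):

```python
import numpy as np

def gmlvq_distance(w, xi, Omega):
    """d_Lambda(w, xi) = (xi - w)^T Lambda (xi - w) with Lambda = Omega^T Omega.
    Parameterizing via Omega keeps Lambda positive semi-definite."""
    diff = Omega @ (xi - w)
    return float(diff @ diff)
```

The same function covers the diagonal (GRLVQ) and rectangular (low-rank) variants by the choice of Ω.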
18. WSOM 2014
Relevance Matrix LVQ
Generalized Matrix LVQ (GMLVQ):
optimization of prototypes and distance measure
19. WSOM 2014
heuristic interpretation
Λ_ii = Σ_m Ω_mi² summarizes
- the contribution of original dimension i to the distance
- the relevance of original feature i for the classification
the interpretation implicitly assumes:
features have equal order of magnitude,
e.g. after z-score transformation → zero mean, unit variance
(averages over the data set)
note: d_Λ(w, ξ) = [Ω(ξ − w)]² is the standard Euclidean distance for the
linearly transformed features Ωξ
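The z-score transformation assumed by this interpretation is a one-liner; a sketch (NumPy, illustrative name):

```python
import numpy as np

def zscore(X):
    """Z-score transformation: zero mean and unit variance per feature
    (column), so that the diagonal relevances become comparable."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```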
20. Classification of adrenal tumors
Wiebke Arlt, Angela Taylor
Dave J. Smith, Peter Nightingale
P.M. Stewart, C.H.L. Shackleton
et al.
Petra Schneider
Han Stiekema
Michael Biehl
Johann Bernoulli Institute for
Mathematics and Computer Science
University of Groningen
School of Medicine
Queen Elizabeth Hospital
University of Birmingham/UK
(+ several centers in Europe)
[Arlt et al., J. Clin. Endocrinology & Metabolism, 2011]
[Biehl et al., Europ. Symp. Artificial Neural Networks (ESANN), 2012]
22. WSOM 2014
Generalized Matrix LVQ: ACC vs. ACA classification
adrenocortical tumors, data set: 24 hrs. urinary steroid excretion
- 102 patients with benign ACA
- 45 patients with malignant ACC
∙ data divided into 90% training and 10% test set
∙ adaptive generalized quadratic distance measure, parameterized by Λ = ΩᵀΩ
∙ determine prototypes: typical profiles (1 per class)
∙ apply the classifier to the test data,
  evaluate performance (error rates, ROC)
∙ repeat and average over many random splits
26. WSOM 2014
adrenocortical tumors: ROC characteristics
[figure: ROC curves, sensitivity vs. (1−specificity)]
- Euclidean distance: AUC 0.87
- GRLVQ (diagonal relevances): AUC 0.93
- GMLVQ (full relevance matrix): AUC 0.97
clear improvement due to adaptive distances
27. WSOM 2014
frequently asked questions
How relevant are the relevances?
What about over-fitting?
- relevance matrices introduce O(N²) additional adaptive parameters!
Is the relevance matrix unique?
- uniqueness of the parameterization (Ω for a given Λ)?
- uniqueness of the relevance matrix Λ?
- interpretation of the relevance matrix (relies on uniqueness)
28. WSOM 2014
What about over-fitting?
observation: low rank of the resulting relevance matrix,
effective # of degrees of freedom ~ N
[figure: eigenvalues of Λ in the ACA/ACC classification]
mathematics: the stationarity conditions imply that the columns of the
stationary Ωᵀ are vectors in the eigenspace associated with the smallest
eigenvalue of a pseudo-covariance matrix Γ, which is
- not necessarily positive
- dependent on Ω itself
- not determinable prior to training
Biehl et al., Machine Learning Reports (2009); in preparation (forever)
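The low-rank observation is easy to check numerically; a sketch of the eigenvalue spectrum of a trained relevance matrix (NumPy, illustrative name):

```python
import numpy as np

def relevance_spectrum(Omega):
    """Eigenvalues of Lambda = Omega^T Omega, sorted in descending order;
    a steep decay indicates a low effective rank of the relevance matrix."""
    return np.sort(np.linalg.eigvalsh(Omega.T @ Omega))[::-1]
```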
29. WSOM 2014
by-product: discriminative low-dimensional representation
[figure: data projected on the first and second eigenvector of Λ;
classes: control, benign, malignant]
30. WSOM 2014
Is the relevance matrix unique?
(I) uniqueness of Ω, given Λ:
the matrix square root of Λ is not unique; any Ω̃ = RΩ with an
orthogonal matrix R yields the same Λ
(irrelevant rotations, reflections, symmetries, …)
canonical representation in terms of the eigen-decomposition of Λ:
Ω = Λ^(1/2) = Σ_i √λ_i · u_i u_iᵀ
- positive semi-definite
- symmetric
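The canonical (symmetric, positive semi-definite) square root can be sketched directly from the eigen-decomposition (NumPy; the clipping of tiny negative eigenvalues is my numerical safeguard, not part of the slides):

```python
import numpy as np

def canonical_omega(Lam):
    """Canonical choice of Omega for a given Lambda: the symmetric,
    positive semi-definite matrix square root V sqrt(diag) V^T."""
    vals, vecs = np.linalg.eigh(Lam)
    vals = np.clip(vals, 0.0, None)   # guard against tiny negative values
    return (vecs * np.sqrt(vals)) @ vecs.T
```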
31. WSOM 2014
Is the relevance matrix unique?
(II) uniqueness of Λ, given the mapping of the data:
a modified transformation Ω̃ = Ω + ΔΩ is possible if the rows of ΔΩ
are in the null-space of the data matrix
→ identical mapping of all examples, Ω̃ξ^μ = Ωξ^μ, but different maps
for general inputs ξ (the argument extends to include the prototypes)
the data matrix is singular if features are correlated or dependent
32. WSOM 2014
regularization
the training process yields Ω;
determine the eigenvectors and eigenvalues of Λ = ΩᵀΩ
regularization: project Ω onto the K leading eigenvectors
(K < J = rank): retains the eigenspace corresponding to the K largest
eigenvalues, removes also the span of small non-zero eigenvalues
(K = J): removes only the null-space contributions; unique solution
with minimal Euclidean norm of the row vectors
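The projection step can be sketched as follows (NumPy, illustrative names; `eigh` returns eigenvalues in ascending order, so the trailing columns are the leading eigenvectors):

```python
import numpy as np

def regularize_omega(Omega, K):
    """Keep only the eigenspace of Lambda = Omega^T Omega belonging to
    the K largest eigenvalues; for K = rank(Lambda) this removes exactly
    the null-space contributions."""
    vals, vecs = np.linalg.eigh(Omega.T @ Omega)   # ascending eigenvalues
    V = vecs[:, -K:]                               # K leading eigenvectors
    return Omega @ V @ V.T
```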
33. WSOM 2014
regularization
two ways to apply the regularized mapping:
- after/during training: retains the original features,
  flexible K, may include the prototypes
- as pre-processing of the data (PCA-like): mapped feature space,
  fixed K, prototypes yet unknown
Strickert, Hammer, Villmann, Biehl, IEEE SSCI 2013:
"Regularization and improved interpretation of linear data mappings
and adaptive distance measures"
34. WSOM 2014
illustrative example
infra-red spectral data: 124 wine samples, 256 wavelengths
30 training spectra, 94 test spectra
GMLVQ classification of alcohol content: low / medium / high
36. WSOM 2014
[figure: raw relevance matrix (original) vs. posterior regularization (regularized)]
regularization
- enhances generalization
- smooths the relevance profile/matrix
- removes 'false relevances'
- improves the interpretability of Λ
37. Early diagnosis of Rheumatoid Arthritis
Synovial expression of CXCL4 and CXCL7 by macrophages
during early inflammatory arthritis predicts progression to
rheumatoid arthritis (in preparation)
L. Yeo, N. Adlard, M. Biehl, M. Juarez, M. Snow
C.D. Buckley, A. Filer, K. Raza, D. Scheel-Toellner
Rheumatology Research Group, Univ. of Birmingham, UK
38. WSOM 2014
Rheumatoid Arthritis (RA)
- chronic inflammatory disease
- the immune system attacks the joints
- RA leads to deformation and disability
40. WSOM 2014
synovial tissue cytokine expression
patient groups:
- uninflamed control (n=9)
- early inflammation, resolving (n=9)
- early RA (n=17)
- established RA (n=12)
cytokine-based diagnosis of RA at the earliest possible stage?
long-term goals: understand the pathogenesis
and the mechanism of progression
41. WSOM 2014
GMLVQ analysis
pre-processing:
• log-transformed expression values (117-dim. data, 47 samples in total)
• 21 leading principal components explain 95% of the variation
two two-class problems: (A) established RA vs. uninflamed controls
(B) early RA vs. resolving inflammation
• 1 prototype per class, global relevance matrix,
  distance measure d_Λ(w, ξ) = (ξ − w)ᵀ Λ (ξ − w)
• leave-two-out validation (one example from each class),
evaluation in terms of Receiver Operating Characteristics
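The leave-two-out scheme (one held-out example per class) can be sketched as a split generator (illustrative names, standard library only):

```python
from itertools import product

def leave_two_out_splits(idx_a, idx_b):
    """Generate leave-two-out splits: each test set holds one example
    from each class; the remaining indices form the training set."""
    for i, j in product(idx_a, idx_b):
        test = {i, j}
        train = [k for k in idx_a + idx_b if k not in test]
        yield train, sorted(test)
```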
42. WSOM 2014
[figures: ROC curves (true positive rate vs. false positive rate) and
relevance profiles (diagonal Λ_ii vs. cytokine index i) for
(A) established RA vs. uninflamed control and
(B) early RA vs. resolving inflammation;
initialization of the relevances serves as prior knowledge;
relevant cytokines highlighted]
43. WSOM 2014
PF4 (platelet factor 4) = CXCL4, chemokine (C-X-C motif) ligand 4
PPBP (pro-platelet basic protein) = CXCL7, chemokine (C-X-C motif) ligand 7
cytokines historically associated with platelets,
but also produced by other cell types
protein level studies:
direct study on the protein level, imaging of synovial tissue with co-staining for
- CD41: platelets
- CD68: macrophages (here: the predominant source of CXCL4/7 expression)
- vWF: vascular endothelium
• high levels of CXCL4 and CXCL7 in the first 12 weeks of synovitis
  (less pronounced later)
• cytokines potentially important for disease progression
  and for early diagnosis / outcome prediction
• expression on macrophages outside of blood vessels
  discriminates early RA from resolving inflammation
44. WSOM 2014
[figures: ROC curves (true positive rate vs. false positive rate) and
relevance profiles (diagonal Λ_ii vs. cytokine index i) for
established RA vs. uninflamed control and
early RA vs. resolving inflammation;
relevant cytokines include macrophage stimulating 1]
45. WSOM 2014
What next?
just two (selected) ongoing projects (MIWOCI poster session):
• improved interpretation of linear mappings
  (with B. Frenay, D. Hofmann, A. Schulz, B. Hammer):
  minimal / maximal feature relevances by null-space contributions
  at constant (minimal) L1-norm of the Ω rows
• optimization of Receiver Operating Characteristics
  (with M. Kaden, P. Stürmer, T. Villmann):
  the statistical interpretation of the AUC (ROC) allows for direct
  optimization based on pairs of examples (one from each class)
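The statistical interpretation referred to here is the Wilcoxon-Mann-Whitney view of the AUC; a sketch of computing it directly over pairs (NumPy, illustrative names):

```python
import numpy as np

def pairwise_auc(scores_pos, scores_neg):
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    (positive, negative) pairs in which the positive example scores
    higher, with ties counted as 1/2."""
    sp = np.asarray(scores_pos, dtype=float)[:, None]
    sn = np.asarray(scores_neg, dtype=float)[None, :]
    return float(np.mean((sp > sn) + 0.5 * (sp == sn)))
```

Because this expression is a sum over pairs of examples (one from each class), it can serve directly as a training objective.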