Prototype-based classifiers and their applications in the life sciences
1. Michael Biehl
Mathematics and Computing Science
University of Groningen / NL
Prototype-based classifiers
and their applications in the life sciences
www.cs.rug.nl/~biehl
2. Michael Biehl
Mathematics and Computing Science
University of Groningen / NL
LVQ and Relevance Learning
frequently asked questions and rarely given answers
www.cs.rug.nl/~biehl
4. WSOM 2014
basics: distance-based classifiers, relevance learning
frequently asked questions:
- What about the curse of dimensionality?
- How do you find a good distance measure?
  example: Generalized Matrix LVQ
- What about over-fitting?
- Is the relevance matrix unique?
- Is it useful in practice?
applications: bio-medical data
(adrenal tumors, rheumatoid arthritis)
outlook: What's next?
5. WSOM 2014
K-NN classifier
a simple distance-based classifier:
- store a set of labeled examples
- classify a query according to the label of its nearest neighbor
  (or the majority vote among its K nearest neighbors)
- piece-wise linear class borders, parameterized by all examples
[figure: feature space with labeled examples and a query point "?"]
+ conceptually simple, no training required, only one parameter (K)
- expensive storage and computation, sensitivity to "outliers";
  can result in overly complex decision boundaries
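The K-NN rule described above can be sketched in a few lines; this is a minimal illustration (NumPy assumed, function and variable names are mine, not from the slides):

```python
import numpy as np

def knn_classify(X_train, y_train, query, k=3):
    """Majority vote among the k nearest stored examples,
    using the squared Euclidean distance."""
    d = np.sum((X_train - query) ** 2, axis=1)   # distance to every example
    nearest = np.argsort(d)[:k]                  # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

Note that every stored example enters the distance computation, which is exactly the storage/computation cost criticized above.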
6. WSOM 2014
prototype based classification
a prototype-based classifier:
- represent the data by one or several prototypes per class
- classify a query according to the label of the nearest prototype
  (or alternative schemes)
- piece-wise linear class borders, parameterized by the prototypes
[figure: feature space with prototypes and a query point "?"]
+ less sensitive to outliers, lower storage needs, little computational
  effort in the working phase
- training phase required in order to place the prototypes;
  model selection problem: number of prototypes per class, etc.
7. WSOM 2014
Nearest Prototype Classifier
set of prototypes w_k ∈ R^N carrying class labels c(w_k),
classification based on a dissimilarity / distance measure d(w, ξ)
minimal requirements: d(w, ξ) ≥ 0 and d(w, w) = 0
nearest prototype classifier (NPC):
given ξ, determine the winner w_L with d(w_L, ξ) = min_k d(w_k, ξ)
and assign ξ to class c(w_L)
standard example: squared Euclidean distance d(w, ξ) = (ξ − w)ᵀ(ξ − w)
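As a minimal sketch (NumPy, illustrative names), the NPC rule with the squared Euclidean distance is:

```python
import numpy as np

def npc_classify(prototypes, proto_labels, query):
    """Nearest prototype classifier: determine the winner with minimal
    squared Euclidean distance and assign the query to its class."""
    d = np.sum((prototypes - query) ** 2, axis=1)
    return proto_labels[int(np.argmin(d))]
```

In contrast to K-NN, only one distance per prototype is evaluated in the working phase.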
8. WSOM 2014
∙ identification of prototype vectors from labeled example data
∙ distance-based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors ξ ∈ R^N
competitive learning: LVQ1 [Kohonen, 1990]
• initialize prototype vectors for the different classes
• present a single example
• identify the winner (closest prototype)
• move the winner
- closer towards the data (same class)
- away from the data (different class)
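A single LVQ1 update step, as described above, can be sketched as follows (NumPy; the learning rate eta and all names are illustrative):

```python
import numpy as np

def lvq1_step(W, c, xi, y, eta=0.05):
    """One LVQ1 step: the winner (closest prototype) moves towards the
    example if its class label matches, away from it otherwise.
    W: prototype matrix (rows), c: prototype labels, (xi, y): example."""
    j = int(np.argmin(np.sum((W - xi) ** 2, axis=1)))   # winner
    sign = 1.0 if c[j] == y else -1.0
    W[j] += sign * eta * (xi - W[j])
    return W
```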
9. WSOM 2014
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
∙ tessellation of feature space
[piece-wise linear]
∙ distance-based classification
[here: Euclidean distances]
∙ generalization ability
correct classification of new data
∙ aim: discrimination of classes
( ≠ vector quantization
or density estimation )
10. WSOM 2014
What about the curse of dimensionality?
concentration of norms/distances for large N:
"distance-based methods are bound to fail in high dimensions"?
LVQ:
- prototypes are not just random data points,
  but carefully selected representatives of the data
- only distances of a given data point to the prototypes are compared:
  an implicit projection to a non-trivial low-dimensional subspace!
see also: models of LVQ training with analytical treatment in the limit
of large N; successful training needs a number of examples on the
order of N [Ghosh et al., 2007; Witoelar et al., 2010]
11. WSOM 2014
cost function based LVQ
one example: Generalized LVQ [Sato & Yamada, 1995]
two winning prototypes per example ξ: the closest correct prototype
(distance d_J) and the closest incorrect prototype (distance d_K)
minimize E = Σ_μ Φ[ (d_J − d_K) / (d_J + d_K) ]
Φ sigmoidal (linear for small arguments):
E approximates the number of misclassifications
Φ linear, e.g. Φ(x) = x:
E favors large-margin separation of the classes
small d_J, large d_K: E favors class-typical prototypes
12. WSOM 2014
cost function based LVQ
"There is nothing objective about objective functions"
(J. McClelland)
13. WSOM 2014
GLVQ
training = optimization with respect to prototype positions,
e.g. single example presentation, stochastic sequence of examples,
update of two prototypes per step,
based on a non-negative, differentiable distance d
requirement: the update decreases d_J (distance to the closest correct
prototype) and increases d_K (distance to the closest incorrect prototype)
for the squared Euclidean distance the update reads
Δw_J = +η Φ′(μ) [2 d_K / (d_J + d_K)²] · 2(ξ − w_J)
Δw_K = −η Φ′(μ) [2 d_J / (d_J + d_K)²] · 2(ξ − w_K)
i.e. it moves the prototypes towards / away from the
sample with prefactors proportional to d_K and d_J, respectively
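For the Euclidean case with identity Φ, one GLVQ gradient step can be sketched as follows (NumPy; names and the learning rate are illustrative, and the prefactors are the derivatives of μ = (d_J − d_K)/(d_J + d_K)):

```python
import numpy as np

def glvq_step(wJ, wK, xi, eta=0.05):
    """One GLVQ gradient step, Euclidean distance, Phi(x) = x.
    wJ: closest correct prototype, wK: closest incorrect prototype.
    Descending mu pulls wJ towards xi and pushes wK away."""
    dJ = float(np.sum((xi - wJ) ** 2))
    dK = float(np.sum((xi - wK) ** 2))
    denom = (dJ + dK) ** 2
    wJ = wJ + eta * (2 * dK / denom) * 2 * (xi - wJ)   # attraction
    wK = wK - eta * (2 * dJ / denom) * 2 * (xi - wK)   # repulsion
    return wJ, wK
```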
16. WSOM 2014
What is a good distance measure?
fixed distance measures:
- select the distance measure according to prior knowledge
- data-driven choice in a preprocessing step
- compare the performance of various measures
  example: divergence based LVQ (DLVQ),
  Mwebaze et al., Neurocomputing (2011)
relevance learning:
- employ a parameterized distance measure
- update its parameters in the training process,
  together with prototype training
- adaptive, data-driven dissimilarity
  example: Matrix Relevance LVQ
17. WSOM 2014
Relevance Matrix LVQ
generalized quadratic distance in LVQ:
d_Λ(w, ξ) = (ξ − w)ᵀ Λ (ξ − w) with Λ = ΩᵀΩ (positive semi-definite)
variants:
- diagonal matrices: single feature weights
  [Bojer et al., 2001; Hammer et al., 2002]
- one global, several local, or class-wise relevance matrices Λ(j)
  → piecewise quadratic decision boundaries [Schneider et al., 2009]
- rectangular Ω: discriminative low-dim. representation,
  e.g. for visualization [Bunte et al., 2012]
- possible constraints: rank control, sparsity, …
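The generalized quadratic distance is conveniently evaluated through Ω rather than Λ; a minimal sketch (NumPy, illustrative names):

```python
import numpy as np

def gmlvq_distance(w, xi, Omega):
    """d_Lambda(w, xi) = (xi - w)^T Lambda (xi - w) with Lambda = Omega^T Omega.
    Parameterizing via Omega keeps Lambda positive semi-definite."""
    diff = Omega @ (xi - w)
    return float(diff @ diff)
```

The same function covers the diagonal (GRLVQ) and rectangular (low-rank) variants by the choice of Ω.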
18. WSOM 2014
Relevance Matrix LVQ
Generalized Matrix LVQ (GMLVQ):
optimization of prototypes and distance measure
19. WSOM 2014
heuristic interpretation
Λ_ii = Σ_m Ω_mi² summarizes
- the contribution of original dimension i to the distance
- the relevance of original feature i for the classification
the interpretation implicitly assumes:
features have equal order of magnitude,
e.g. after z-score transformation → zero mean, unit variance
(averages over the data set)
note: d_Λ(w, ξ) = [Ω(ξ − w)]² is the standard Euclidean distance for the
linearly transformed features Ωξ
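The z-score transformation assumed by this interpretation is a one-liner; a sketch (NumPy, illustrative name):

```python
import numpy as np

def zscore(X):
    """Z-score transformation: zero mean and unit variance per feature
    (column), so that the diagonal relevances become comparable."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```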
20. Classification of adrenal tumors
Wiebke Arlt, Angela Taylor
Dave J. Smith, Peter Nightingale
P.M. Stewart, C.H.L. Shackleton
et al.
Petra Schneider
Han Stiekema
Michael Biehl
Johann Bernoulli Institute for
Mathematics and Computer Science
University of Groningen
School of Medicine
Queen Elizabeth Hospital
University of Birmingham/UK
(+ several centers in Europe)
[Arlt et al., J. Clin. Endocrinology & Metabolism, 2011]
[Biehl et al., Europ. Symp. Artificial Neural Networks (ESANN), 2012]
22. WSOM 2014
Generalized Matrix LVQ: ACC vs. ACA classification
adrenocortical tumors, data set: 24 hrs. urinary steroid excretion
- 102 patients with benign ACA
- 45 patients with malignant ACC
∙ data divided into 90% training and 10% test set
∙ adaptive generalized quadratic distance measure, parameterized by Λ = ΩᵀΩ
∙ determine prototypes: typical profiles (1 per class)
∙ apply the classifier to the test data,
  evaluate performance (error rates, ROC)
∙ repeat and average over many random splits
26. WSOM 2014
adrenocortical tumors: ROC characteristics
[figure: ROC curves, sensitivity vs. (1−specificity)]
- Euclidean distance: AUC 0.87
- GRLVQ (diagonal relevances): AUC 0.93
- GMLVQ (full relevance matrix): AUC 0.97
clear improvement due to adaptive distances
27. WSOM 2014
frequently asked questions
How relevant are the relevances?
What about over-fitting?
- relevance matrices introduce O(N²) additional adaptive parameters!
Is the relevance matrix unique?
- uniqueness of the parameterization (Ω for a given Λ)?
- uniqueness of the relevance matrix Λ?
- interpretation of the relevance matrix (relies on uniqueness)
28. WSOM 2014
What about over-fitting?
observation: low rank of the resulting relevance matrix,
effective # of degrees of freedom ~ N
[figure: eigenvalues of Λ in the ACA/ACC classification]
mathematics: the stationarity conditions imply that the columns of the
stationary Ωᵀ are vectors in the eigenspace associated with the smallest
eigenvalue of a pseudo-covariance matrix Γ, which is
- not necessarily positive
- dependent on Ω itself
- not determinable prior to training
Biehl et al., Machine Learning Reports (2009); in preparation (forever)
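The low-rank observation is easy to check numerically; a sketch of the eigenvalue spectrum of a trained relevance matrix (NumPy, illustrative name):

```python
import numpy as np

def relevance_spectrum(Omega):
    """Eigenvalues of Lambda = Omega^T Omega, sorted in descending order;
    a steep decay indicates a low effective rank of the relevance matrix."""
    return np.sort(np.linalg.eigvalsh(Omega.T @ Omega))[::-1]
```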
29. WSOM 2014
by-product: discriminative low-dimensional representation
[figure: data projected on the first and second eigenvector of Λ;
classes: control, benign, malignant]
30. WSOM 2014
Is the relevance matrix unique?
(I) uniqueness of Ω, given Λ:
the matrix square root of Λ is not unique; any Ω̃ = RΩ with an
orthogonal matrix R yields the same Λ
(irrelevant rotations, reflections, symmetries, …)
canonical representation in terms of the eigen-decomposition of Λ:
Ω = Λ^(1/2) = Σ_i √λ_i · u_i u_iᵀ
- positive semi-definite
- symmetric
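The canonical (symmetric, positive semi-definite) square root can be sketched directly from the eigen-decomposition (NumPy; the clipping of tiny negative eigenvalues is my numerical safeguard, not part of the slides):

```python
import numpy as np

def canonical_omega(Lam):
    """Canonical choice of Omega for a given Lambda: the symmetric,
    positive semi-definite matrix square root V sqrt(diag) V^T."""
    vals, vecs = np.linalg.eigh(Lam)
    vals = np.clip(vals, 0.0, None)   # guard against tiny negative values
    return (vecs * np.sqrt(vals)) @ vecs.T
```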
31. WSOM 2014
Is the relevance matrix unique?
(II) uniqueness of Λ, given the mapping of the data:
a modified transformation Ω̃ = Ω + ΔΩ is possible if the rows of ΔΩ
are in the null-space of the data matrix
→ identical mapping of all examples, Ω̃ξ^μ = Ωξ^μ, but different maps
for general inputs ξ (the argument extends to include the prototypes)
the data matrix is singular if features are correlated or dependent
32. WSOM 2014
regularization
the training process yields Ω;
determine the eigenvectors and eigenvalues of Λ = ΩᵀΩ
regularization: project Ω onto the K leading eigenvectors
(K < J = rank): retains the eigenspace corresponding to the K largest
eigenvalues, removes also the span of small non-zero eigenvalues
(K = J): removes only the null-space contributions; unique solution
with minimal Euclidean norm of the row vectors
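The projection step can be sketched as follows (NumPy, illustrative names; `eigh` returns eigenvalues in ascending order, so the trailing columns are the leading eigenvectors):

```python
import numpy as np

def regularize_omega(Omega, K):
    """Keep only the eigenspace of Lambda = Omega^T Omega belonging to
    the K largest eigenvalues; for K = rank(Lambda) this removes exactly
    the null-space contributions."""
    vals, vecs = np.linalg.eigh(Omega.T @ Omega)   # ascending eigenvalues
    V = vecs[:, -K:]                               # K leading eigenvectors
    return Omega @ V @ V.T
```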
33. WSOM 2014
regularization
two ways to apply the regularized mapping:
- after/during training: retains the original features,
  flexible K, may include the prototypes
- as pre-processing of the data (PCA-like): mapped feature space,
  fixed K, prototypes yet unknown
Strickert, Hammer, Villmann, Biehl, IEEE SSCI 2013:
"Regularization and improved interpretation of linear data mappings
and adaptive distance measures"
34. WSOM 2014
illustrative example
infra-red spectral data: 124 wine samples, 256 wavelengths
30 training spectra, 94 test spectra
GMLVQ classification of alcohol content: low / medium / high
36. WSOM 2014
[figure: raw relevance matrix (original) vs. posterior regularization (regularized)]
regularization
- enhances generalization
- smooths the relevance profile/matrix
- removes 'false relevances'
- improves the interpretability of Λ
37. Early diagnosis of Rheumatoid Arthritis
Synovial expression of CXCL4 and CXCL7 by macrophages
during early inflammatory arthritis predicts progression to
rheumatoid arthritis (in preparation)
L. Yeo, N. Adlard, M. Biehl, M. Juarez, M. Snow
C.D. Buckley, A. Filer, K. Raza, D. Scheel-Toellner
Rheumatology Research Group, Univ. of Birmingham, UK
38. WSOM 2014
Rheumatoid Arthritis (RA)
- chronic inflammatory disease
- the immune system attacks the joints
- RA leads to deformation and disability
40. WSOM 2014
synovial tissue cytokine expression
patient groups:
- uninflamed control (n=9)
- early inflammation, resolving (n=9)
- early RA (n=17)
- established RA (n=12)
cytokine-based diagnosis of RA at the earliest possible stage?
long-term goals: understand the pathogenesis
and the mechanism of progression
41. WSOM 2014
GMLVQ analysis
pre-processing:
• log-transformed expression values (117-dim. data, 47 samples in total)
• 21 leading principal components explain 95% of the variation
two two-class problems: (A) established RA vs. uninflamed controls
(B) early RA vs. resolving inflammation
• 1 prototype per class, global relevance matrix,
  distance measure d_Λ(w, ξ) = (ξ − w)ᵀ Λ (ξ − w)
• leave-two-out validation (one example from each class),
evaluation in terms of Receiver Operating Characteristics
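The leave-two-out scheme (one held-out example per class) can be sketched as a split generator (illustrative names, standard library only):

```python
from itertools import product

def leave_two_out_splits(idx_a, idx_b):
    """Generate leave-two-out splits: each test set holds one example
    from each class; the remaining indices form the training set."""
    for i, j in product(idx_a, idx_b):
        test = {i, j}
        train = [k for k in idx_a + idx_b if k not in test]
        yield train, sorted(test)
```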
42. WSOM 2014
[figures: ROC curves (true positive rate vs. false positive rate) and
relevance profiles (diagonal Λ_ii vs. cytokine index i) for
(A) established RA vs. uninflamed control and
(B) early RA vs. resolving inflammation;
initialization of the relevances serves as prior knowledge;
relevant cytokines highlighted]
43. WSOM 2014
PF4 (platelet factor 4) = CXCL4, chemokine (C-X-C motif) ligand 4
PPBP (pro-platelet basic protein) = CXCL7, chemokine (C-X-C motif) ligand 7
cytokines historically associated with platelets,
but also produced by other cell types
protein level studies:
direct study on the protein level, imaging of synovial tissue with co-staining for
- CD41: platelets
- CD68: macrophages (here: the predominant source of CXCL4/7 expression)
- vWF: vascular endothelium
• high levels of CXCL4 and CXCL7 in the first 12 weeks of synovitis
  (less pronounced later)
• cytokines potentially important for disease progression
  and for early diagnosis / outcome prediction
• expression on macrophages outside of blood vessels
  discriminates early RA from resolving inflammation
44. WSOM 2014
[figures: ROC curves (true positive rate vs. false positive rate) and
relevance profiles (diagonal Λ_ii vs. cytokine index i) for
established RA vs. uninflamed control and
early RA vs. resolving inflammation;
relevant cytokines include macrophage stimulating 1]
45. WSOM 2014
What next?
just two (selected) ongoing projects (MIWOCI poster session):
• improved interpretation of linear mappings
  (with B. Frenay, D. Hofmann, A. Schulz, B. Hammer):
  minimal / maximal feature relevances by null-space contributions
  at constant (minimal) L1-norm of the Ω rows
• optimization of Receiver Operating Characteristics
  (with M. Kaden, P. Stürmer, T. Villmann):
  the statistical interpretation of the AUC (ROC) allows for direct
  optimization based on pairs of examples (one from each class)
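The statistical interpretation referred to here is the Wilcoxon-Mann-Whitney view of the AUC; a sketch of computing it directly over pairs (NumPy, illustrative names):

```python
import numpy as np

def pairwise_auc(scores_pos, scores_neg):
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    (positive, negative) pairs in which the positive example scores
    higher, with ties counted as 1/2."""
    sp = np.asarray(scores_pos, dtype=float)[:, None]
    sn = np.asarray(scores_neg, dtype=float)[None, :]
    return float(np.mean((sp > sn) + 0.5 * (sp == sn)))
```

Because this expression is a sum over pairs of examples (one from each class), it can serve directly as a training objective.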