Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE
1. 1 of 28
2013-09-03, Nikolas Pontikos, Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE
Nikolas Pontikos
PhD Student, CIMR
2. 2 of 28
Biological Context: Cell Phenotypes
Different types of cells can be identified based on their
shape/size and the surface markers (proteins) that they
express.
[Figure labels: Lymphocytes; Granulocytes; Neutrophils; CD4+ Lymphocytes; CD8+ Lymphocytes; CD45RA+; CD45RA-]
CD stands for Cluster of Differentiation: these are
surface proteins that can be used as markers to
distinguish different cell types.
4. 4 of 28
The Transitional Phenotype of Cells
[Figure: density plot; x-axis: Log10 CD45RA intensity; y-axis: density; labels: Memory Cells, Naive Cells]
As cells transition from one cell type (state) to
another they lose/gain expression of certain
markers.
Here the CD45RA marker is lost as cells
transition from naive to memory status.
This results in a bimodal distribution of the
intensity of CD45RA.
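The bimodality described above can be illustrated with a minimal sketch. The intensities are simulated, the means and spreads are made-up values, and the helper `two_means_threshold` is a hypothetical name; it only shows how a single threshold between the two modes separates naive from memory cells.

```python
import random
import statistics

random.seed(0)

# Simulated (not real) log10 CD45RA intensities:
# memory cells have lost the marker (low), naive cells still express it (high).
memory = [random.gauss(0.5, 0.15) for _ in range(500)]
naive = [random.gauss(1.5, 0.15) for _ in range(500)]
intensities = memory + naive


def two_means_threshold(values, iterations=50):
    """1D 2-means: return the midpoint between the two group centres."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        threshold = (low + high) / 2
        lower = [v for v in values if v <= threshold]
        upper = [v for v in values if v > threshold]
        low, high = statistics.fmean(lower), statistics.fmean(upper)
    return (low + high) / 2


threshold = two_means_threshold(intensities)
fraction_memory = sum(v <= threshold for v in intensities) / len(intensities)
print(f"threshold = {threshold:.2f}, memory fraction = {fraction_memory:.2f}")
```

With well-separated modes the threshold falls between them; with overlapping modes a mixture model would be a more appropriate choice than a hard cut.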
5. 5 of 28
Manual Method of Identifying Cell Phenotypes
[Figure: manual gating plots; gate labels: % of CD25+ Naive Cells, % of Memory Cells]
6. 6 of 28
Issues with Manual Analysis of Flow Cytometry Data
Identifying all possible cell subsets is tedious and error-prone.
P parameters result in on the order of P^2 two-dimensional
comparisons (P(P-1)/2 distinct marker pairs).
Manual analysis also introduces operator bias.
Unexpected or rare cell populations may be missed.
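To make the quadratic growth concrete, here is a small sketch that counts the distinct two-marker plots for a panel of P markers. The marker names are placeholders and `pairwise_plots` is an illustrative helper, not part of any cytometry package.

```python
from itertools import combinations
from math import comb


def pairwise_plots(n_markers):
    """Number of distinct two-marker scatter plots: P choose 2 = P(P-1)/2."""
    return comb(n_markers, 2)


# Hypothetical panel of P = 10 markers.
markers = [f"marker_{i}" for i in range(1, 11)]
pairs = list(combinations(markers, 2))

print(len(pairs))          # same count as pairwise_plots(10)
print(pairwise_plots(30))  # a larger, mass-cytometry-sized panel
```

Each of these plots may in turn be gated at several levels of the hierarchy, so the number of manual decisions grows faster still.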
7. 7 of 28
Flow Data VS Genetic Data

                   Flow Data           Genetic Data
N                  cells               individuals
P                  cellular markers    SNPs
Typical N          ~100,000            ~1,000
Typical P          ~10                 ~100,000
N relative to P    N > 10,000 x P      N < 100 x P

Distance-based clustering:
- hclust
- kmeans
Density-based clustering:
- identifying regions of significantly high density
- fitting mixture models
8. 8 of 28
Motivation for SPADE
• Heading towards high-dimensional data sets:
- pooling of datasets
- mass cytometry
• Distance-based methods are fast, at the expense of storing the entire distance matrix.
Distance-based clustering is well suited to high-dimensional data
sets when the data is too sparse for density-based methods.
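The storage cost of a full distance matrix can be estimated with a back-of-the-envelope sketch. It assumes 8-byte floating-point distances and storage of the upper triangle only; `distance_matrix_gib` is an illustrative helper name.

```python
def distance_matrix_gib(n_cells, bytes_per_value=8):
    """Memory in GiB for the N(N-1)/2 pairwise distances between N cells."""
    n_pairs = n_cells * (n_cells - 1) // 2
    return n_pairs * bytes_per_value / 2**30


for n_cells in (1_000, 10_000, 100_000):
    print(f"N = {n_cells:>7,}: {distance_matrix_gib(n_cells):10.3f} GiB")
```

At N of about 100,000 cells the matrix already needs tens of GiB, which is one reason to reduce the number of cells before clustering.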
9. 9 of 28
SPADE:
spanning-tree progression analysis of density-normalised events
Primarily a visualisation tool for revealing structure in point
clouds as obtained from flow cytometry.
A clustering method with rare event detection thanks to
density-dependent downsampling.
Four main steps in SPADE:
1) Density-dependent downsampling
2) Agglomerative clustering
3) Minimum spanning tree construction
4) Upsampling
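The first step, density-dependent downsampling, is what gives rare populations a chance to survive. The following is a simplified sketch, not the reference SPADE implementation: it assumes local density is the count of cells within a fixed radius, and the parameter names (`radius`, `outlier_density`, `target_density`) and their values are illustrative choices.

```python
import math
import random


def local_densities(points, radius):
    """For each point, count the points within `radius` (itself included)."""
    return [sum(math.dist(p, q) <= radius for q in points) for p in points]


def density_dependent_downsample(points, radius, outlier_density, target_density, rng):
    """Drop outliers, keep sparse points, thin dense regions towards `target_density`."""
    kept = []
    for point, density in zip(points, local_densities(points, radius)):
        if density <= outlier_density:
            continue  # too isolated: treated as noise
        # Sparse points are always kept; dense points are kept with
        # probability target_density / density.
        if density <= target_density or rng.random() < target_density / density:
            kept.append(point)
    return kept


rng = random.Random(1)
# Simulated two-parameter data: one abundant and one rare population.
abundant = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(900)]
rare = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(30)]
cells = abundant + rare

sampled = density_dependent_downsample(
    cells, radius=0.5, outlier_density=1, target_density=10, rng=rng
)
rare_before = len(rare) / len(cells)
rare_after = sum(p[0] > 2.5 for p in sampled) / len(sampled)
print(f"rare fraction: {rare_before:.3f} before, {rare_after:.3f} after")
```

Because dense regions are thinned much more aggressively than sparse ones, the rare population makes up a larger share of the downsampled data than of the original data.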
10. 10 of 28
Results from paper
Outline of SPADE as applied to a simulated data set
- Proof of concept
- Structure of data preserved and rarer cell population identified
Analysis of mouse hematopoiesis using flow cytometry data
- Ability to reconstruct a known hierarchy
- Comparison to manual gating
- Identified cell population missed in manual gating (dendritic cells)
Analysis of human hematopoiesis using mass cytometry data
- Joining multiple stimulation experiments on core markers
- Non-targeted cell population identified (NK cells)
11. 11 of 28
SPADE: Spanning-tree Progression Analysis of Density-normalised Events
(i) A simulated two-parameter flow cytometry data set, with one rare population and three abundant populations.
(ii) Result of density-dependent down-sampling of the original data.
(iii) Agglomerative clustering result of the down-sampled cells. Adjacent clusters are drawn in alternating colors.
(iv) Minimum spanning tree that connects the cell clusters.
(v) Colored SPADE trees. Nodes are colored by the median intensities of protein markers of cells in each node, allowing visualization of the behaviors of the two markers across the entire heterogeneous cell population.
[Figure: panels (i) to (v), labelled from Input to Output]
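Steps (iii) and (iv), followed by upsampling, can be sketched on a toy set of already down-sampled cells. This is a naive illustration under stated assumptions, not the published implementation: clusters are merged by centroid distance, the tree is built with Prim's algorithm, and each original cell takes the cluster of its nearest down-sampled cell. All function names and coordinates are made up for the example.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def agglomerate(points, n_clusters):
    """Repeatedly merge the two clusters with the closest centroids."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


def minimum_spanning_tree(nodes):
    """Prim's algorithm; returns edges as (i, j) index pairs into `nodes`."""
    in_tree, edges = {0}, []
    while len(in_tree) < len(nodes):
        _, i, j = min(
            (math.dist(nodes[i], nodes[j]), i, j)
            for i in in_tree
            for j in range(len(nodes))
            if j not in in_tree
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


def upsample(cells, clusters):
    """Give each original cell the cluster of its nearest down-sampled cell."""
    labelled = [(p, k) for k, members in enumerate(clusters) for p in members]
    return [min(labelled, key=lambda pk: math.dist(cell, pk[0]))[1] for cell in cells]


# Toy down-sampled cells forming three groups (made-up coordinates).
downsampled = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1),
               (10, 0), (10.1, 0), (10, 0.1)]
clusters = agglomerate(downsampled, n_clusters=3)
nodes = [centroid(c) for c in clusters]
edges = minimum_spanning_tree(nodes)

original_cells = [(0.05, 0.05), (5.2, 4.9), (9.9, 0.2), (0.2, -0.1)]
labels = upsample(original_cells, clusters)
print(edges, labels)
```

Colouring each node by the median marker intensity of the cells assigned to it would then give a tree of the kind described in (v). The nested loops here are quadratic or worse, so this is only practical for small examples.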