Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE

1 of 28
2013-09-03, Nikolas Pontikos, Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE
Nikolas Pontikos
PhD Student, CIMR

2 of 28
Biological Context: Cell Phenotypes
Different types of cells can be identified based on their
shape/size and the surface markers (proteins) that they
express:
[Figure: cell types shown: Lymphocytes, Granulocytes, Neutrophils; CD4+ Lymphocytes, CD8+ Lymphocytes; CD45RA+, CD45RA-]
CD stands for Cluster of Differentiation. These are
surface proteins which can be used as markers to
distinguish different cell types.

3 of 28
What is Flow Cytometry?
[Figure: image © 1998-2012 Abcam plc. All rights reserved]
[Figure: scatter plot of Side Scatter (granularity) against Forward Scatter (cell size), 110,992 points, with Lymphocytes, Neutrophils and Granulocytes labelled]

Cells     Forward Scatter   Side Scatter   CD4   CD127   CD45RA   CD25
1         2110              309            103   254     4        70
2         1565              252            57    278     341      59
...       ...               ...            ...   ...     ...      ...
110,992   964               256            78    199     9        345

4 of 28
The Transitional Phenotype of Cells
[Figure: a Memory Cell and a Naive Cell, distinguished by CD45RA]
[Figure: density of Log10 CD45RA Intensity, with two modes labelled Memory Cells (low CD45RA) and Naive Cells (high CD45RA)]
As cells transition from one cell type (state) to
another, they lose or gain expression of certain
markers.
Here the CD45RA marker is lost as cells
transition from naive to memory status.
This results in a bimodal distribution of
CD45RA intensity.
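A bimodal marker can be split into two groups with a single cut-off. The following is only a minimal sketch of that idea: the intensities are made up, and the cut-off of 1.0 on the log10 scale and the function name `fraction_below` are assumptions for illustration, not values from the slides.

```python
import math

def fraction_below(intensities, log10_threshold):
    """Return the fraction of cells whose log10 intensity is below the threshold."""
    logs = [math.log10(x) for x in intensities if x > 0]
    if not logs:
        raise ValueError("no positive intensities")
    below = sum(1 for value in logs if value < log10_threshold)
    return below / len(logs)

# Hypothetical raw CD45RA intensities: low values are memory-like,
# high values are naive-like.
cd45ra = [2.0, 3.0, 4.0, 60.0, 80.0, 90.0, 100.0, 5.0]

# Assumed cut-off between the two modes (log10 intensity of 1.0).
memory_fraction = fraction_below(cd45ra, log10_threshold=1.0)
```

In practice the cut-off would be placed in the trough between the two modes of the observed density.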

5 of 28
Manual Method of Identifying Cell Phenotypes
% of CD25+ Naive Cells
% of Memory Cells

6 of 28
Issues with Manual Analysis of Flow Cytometry Data
Identifying all possible cell subsets is tedious and error-prone.
P parameters give on the order of P^2 two-dimensional
comparisons (P(P-1)/2 pairs of markers).
Manual analysis also introduces operator bias.
Unexpected or rare cell populations may be missed.
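To make the growth in two-dimensional comparisons concrete, this short sketch enumerates every pair of parameters that could be plotted against each other. The six parameter names are the columns of the data table on the flow cytometry slide; the function name `marker_pairs` is only illustrative.

```python
from itertools import combinations
from math import comb

def marker_pairs(markers):
    """Return every unordered pair of markers, i.e. every possible 2D plot."""
    return list(combinations(markers, 2))

markers = ["Forward Scatter", "Side Scatter", "CD4", "CD127", "CD45RA", "CD25"]
pairs = marker_pairs(markers)

# The number of pairs is P(P-1)/2, which grows on the order of P^2.
n_pairs_six = len(pairs)
n_pairs_thirty = comb(30, 2)
```

With these six parameters there are already 15 possible plots, and a 30-parameter panel would give 435.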

7 of 28
                 Flow Data                 VS   Genetic Data
N                ~100,000 cells                 ~1,000 individuals
P                ~10 cellular markers           ~100,000 SNPs
Ratio            N > 10,000 x P                 N < 100 x P

Distance-based clustering:
- hclust
- kmeans
Density-based clustering:
- identifying regions of significantly high density
- fitting mixture models

8 of 28
Motivation for SPADE
• Heading towards high-dimensional data sets:
- pooling of datasets
- mass cytometry
• Distance-based methods are fast, at the expense of storing the entire distance matrix.
Distance-based clustering is well suited to high-dimensional data
sets, where the data are too sparse for density-based methods.
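The cost of storing the entire distance matrix can be estimated with simple arithmetic. The sketch below assumes 8-byte floating point entries and that only the upper triangle is stored; both are assumptions, and the function names are illustrative. The cell count is the 110,992 points from the flow cytometry slide.

```python
def condensed_distance_entries(n_points):
    """Number of pairwise distances among n_points (upper triangle only)."""
    return n_points * (n_points - 1) // 2

def distance_matrix_gigabytes(n_points, bytes_per_entry=8):
    """Approximate storage in gigabytes (1e9 bytes), assuming 8-byte entries."""
    return condensed_distance_entries(n_points) * bytes_per_entry / 1e9

n_cells = 110_992
entries = condensed_distance_entries(n_cells)
gigabytes = distance_matrix_gigabytes(n_cells)
```

Under these assumptions a single sample needs about 6.2 billion distances, or roughly 49 GB, which is one reason to reduce the number of cells before clustering.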

9 of 28
SPADE:
spanning-tree progression analysis of density-normalised events
Primarily a visualisation tool for revealing structure in point
clouds as obtained from flow cytometry.
A clustering method with rare event detection thanks to
density-dependent downsampling.
Four main steps in SPADE:
1) Density-dependent downsampling
2) Agglomerative clustering
3) Minimum spanning tree construction
4) Upsampling
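The first step, density-dependent downsampling, can be sketched as follows. This is a simplified illustration rather than SPADE's exact rule: the local density is taken as the number of points within an assumed radius, and each point is kept with probability min(1, target_density / density). The radius, the target density, the simulated data and the function names are all assumptions.

```python
import random

def local_density(points, radius):
    """For each point, count the points (itself included) within `radius`."""
    r2 = radius * radius
    counts = []
    for p in points:
        n = sum(1 for q in points
                if sum((a - b) ** 2 for a, b in zip(p, q)) <= r2)
        counts.append(n)
    return counts

def density_dependent_downsample(points, radius, target_density, seed=0):
    """Keep each point with probability min(1, target_density / local density)."""
    rng = random.Random(seed)
    densities = local_density(points, radius)
    kept = []
    for p, d in zip(points, densities):
        # Points in sparse regions are always kept; dense regions are thinned.
        if rng.random() < min(1.0, target_density / d):
            kept.append(p)
    return kept

# Simulated two-marker data: one abundant and one rare population.
rng = random.Random(1)
abundant = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(500)]
rare = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(20)]
sample = density_dependent_downsample(abundant + rare, radius=0.5,
                                      target_density=5)
```

Because dense regions are thinned far more than sparse ones, the rare population makes up a larger share of the downsampled data than of the original, which is what makes it visible to the clustering step.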

10 of 28
Results from paper
Outline of SPADE as applied to a simulated data set
- Proof of concept
- Structure of data preserved and rarer cell population identified
Analysis of mouse hematopoiesis using flow cytometry data
- Ability to reconstruct a known hierarchy
- Comparison to manual gating
- Identified cell population missed in manual gating (dendritic cells)
Analysis of human hematopoiesis using mass cytometry data
- Joining multiple stimulation experiments on core markers
- Non-targeted cell population identified (NK cells)

11 of 28
SPADE: Spanning-tree Progression Analysis of Density-normalised Events
[Figure: five panels, (i) to (v), labelled from Input to Output]
(i) A simulated two-parameter flow cytometry data set, with one rare population and three abundant populations.
(ii) Result of density-dependent down-sampling of the original data.
(iii) Agglomerative clustering result of the down-sampled cells. Adjacent clusters are drawn in alternating colors.
(iv) Minimum spanning tree that connects the cell clusters.
(v) Colored SPADE trees. Nodes are colored by the median intensities of protein markers of cells in each node, allowing visualization of the behaviors of the two markers across the entire heterogeneous cell population.
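Panels (iv) and (v) can be illustrated with a small sketch: a minimum spanning tree is built over cluster centroids, and each node is given the median intensity of one marker. This assumes Euclidean distance between centroids and uses Prim's algorithm; the four clusters and all names are hypothetical, and the slides do not say how the tree is computed.

```python
import math
from statistics import median

def minimum_spanning_tree(centroids):
    """Prim's algorithm; returns edges (i, j) between centroid indices."""
    n = len(centroids)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        # Find the shortest edge from the tree to a node not yet in it.
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                d = math.dist(centroids[i], centroids[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        in_tree.add(j)
        edges.append((i, j))
    return edges

# Hypothetical clusters: each cell is a (marker_1, marker_2) tuple.
clusters = [
    [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)],
    [(1.0, 0.1), (1.2, 0.0), (1.1, 0.2)],
    [(1.0, 1.1), (1.1, 1.0), (0.9, 1.2)],
    [(3.0, 3.0), (3.1, 2.9), (2.9, 3.1)],
]
centroids = [tuple(sum(c) / len(cells) for c in zip(*cells))
             for cells in clusters]
tree_edges = minimum_spanning_tree(centroids)

# Median of marker_1 per node, the value a node would be coloured by.
node_median = [median(cell[0] for cell in cells) for cells in clusters]
```

Colouring the same tree once per marker gives one tree per marker, so the behaviour of each marker can be compared across all clusters at a glance.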
