This document discusses reconstructing cancer progression models from bulk and single-cell data using TRaIT. It provides an overview of TRaIT, which is part of the TRONCO library and can infer mutational trees from individual tumor data. TRaIT supports both multi-region and single-cell sequencing data within a unified statistical framework. The document also presents examples of TRaIT analyzing multi-region colorectal cancer data and single-cell triple-negative breast cancer data to reconstruct progression models.
2. Outline
• We have been exploring methods for reconstructing progression
models, in this case for individual tumors, as opposed to
ensemble models built from several patients’ data
• In the following we distinguish between
– Phylogenetic trees
– Clonal Trees
– Mutational trees (and graphs)
• … and we will present the TRONCO submodule TRaIT
(Temporal oRder of Individual Tumors), which can be used to
infer mutational trees from individual tumor data
CDAC 2018 2
3. TRaIT and the TRONCO Library
• You can look up the TRONCO R library at
troncopackage.org
• TRaIT is part of the TRONCO library and you can find a
description of its structure in
Ramazzotti et al. (2017), Learning mutational graphs of individual tumor
evolution from multi-sample sequencing data, bioRxiv, doi:
10.1101/132183 (submitted)
5. Cancer Evolution
• Cancer develops via the progressive accumulation of genomic
and epigenetic alterations (drivers)
• This process is modeled via phylogenetic-like models
• One of the most critical issues in dealing with tumor data is
Intra-Tumor Heterogeneity (ITH)
[Figure: Intra-Tumor Heterogeneity (ITH). Panel labels: competition, selection, expansion, differentiation, diffusion. Source: Davis, A., Gao, R., Navin, N. (2017) Biochim Biophys Acta 1867(2)]
7. From Sequences to Mutational
Information (in Cancer)
We can now go back to the other two
kinds of analysis described by Schwartz
and Schäffer
We can sequence a number of cells
taken from a single tumor (bulk
sequencing) or from “slices” of it
In this case we can build a tree (a
phylogeny) of the tumor “pieces”
The evolution of tumour phylogenetics: principles and
practice, R. Schwartz and A. A. Schäffer, Nature
Reviews Genetics, 2017
8. From Sequences to Mutational
Information (in Cancer)
Again, at the most advanced (and
currently expensive) frontier of
sequencing technology are Single-Cell
projects, where much of the effort is
concentrated in isolating “single” cells
In this case we can build a tree (a
phylogeny) of the tumor “sub-clones”
This is potentially the most precise way
to build statistically well-founded
progression models of a single tumor.
9. Multiple Samples per Tumor
• Single-Cell Sequencing (SCS): highest resolution, but
technical problems due to cell isolation and whole-genome
amplification (WGA)
– data-specific errors: allelic dropouts (ADOs), false alleles,
missing data, non-uniform coverage, doublets, etc.
• Bulk sequencing: intermingled signal
– phylogeny reconstruction, often via signal deconvolution
(e.g., VAFs)
10. From Bulk to Single-cell Analysis
Single-cell genome sequencing: current state of the science, Charles Gawad, Winston Koh & Stephen
R. Quake, Nature Reviews Genetics 17, 175–188 (2016) doi:10.1038/nrg.2015.16
12. Clonal and Mutational Trees
Given a set of mutations annotated in a set of cells
[Figure: three kinds of trees built from the same data. Panel titles: Standard phylogenetic tree; Clonal Lineage Trees; Mutational Trees. Labels: A B C D E F; Clonal signature; Prevalence Ordering; Mutational Ordering]
13. Clonal and Mutational Trees
• “Standard” Phylogenetic Trees
– Davis, A., Navin, N. (2016) Genome Biology, 17(1):113
– …
• Clonal Lineage Trees
– BitPhylogeny: Yuan et al. (2015) Genome Biology 16(1), 1
– OncoNEM: Ross & Markowetz (2016) Genome Biology 17(1), 1
– Single Cell Genotyper: Roth et al. (2016) Nat Methods 13(7), 573-576
– ddClone: Salehi et al. (2017) Genome Biology 18:44
– …
• Mutational Trees
– MUTTREE: Kim & Simon (2014), BMC Bioinformatics, 15(1), 27
– SCITE: Kuipers et al. (2016) Genome Biology, 17(1), 86
– …
– SiFit: Zafar et al. (2017), Genome Biology 18:178 (*)
14. Clonal and Mutational Trees from SCS
Most techniques rely on technical assumptions
– E.g. Infinite Sites Assumption (ISA):
• “each mutation occurs at most once during the evolutionary
history of a tumor, and is never lost”
• ⟹ possible violations, due to, e.g., convergent evolution
These techniques can be computationally expensive and require data-specific error models
[Figure: two example trees over mutations A, B, C, D]
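To make the ISA concrete, the following is a minimal sketch of a pairwise compatibility check, not something taken from the slides or from TRaIT. It assumes an error-free binary matrix (rows = cells, columns = mutations) and a mutation-free root. Under those assumptions the matrix fits a single tree exactly when no pair of mutations shows all three patterns (1,0), (0,1) and (1,1). The function name `isa_conflicts` and the toy matrices are hypothetical.

```python
from itertools import combinations


def isa_conflicts(matrix):
    """Return the pairs of mutation columns that cannot sit on one tree
    under the ISA (assuming no noise and a mutation-free root)."""
    n_mutations = len(matrix[0])
    forbidden = {(1, 0), (0, 1), (1, 1)}
    conflicts = []
    for i, j in combinations(range(n_mutations), 2):
        # Collect every (mutation i, mutation j) pattern seen across cells.
        patterns = {(row[i], row[j]) for row in matrix}
        if forbidden <= patterns:
            conflicts.append((i, j))
    return conflicts


# Hypothetical toy data: three cells, mutations in columns.
tree_like = [[1, 0, 0],
             [1, 1, 0],
             [1, 0, 1]]
conflicting = [[1, 0],
               [0, 1],
               [1, 1]]

print(isa_conflicts(tree_like))
print(isa_conflicts(conflicting))
```

On real SCS data a reported conflict is only a hint: allelic dropouts and false positives produce the same patterns as convergent evolution, so the check cannot tell the two apart.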
15. TRaIT: Temporal oRder
of Individual Tumors
• Robust estimation of the mutational
ordering in single tumors
• Supports both multi-region and SCS
data within a unified statistical
framework
– no data-specific noise model
• Binary input data → any alteration
type
– SNVs, CNAs, fusions, etc.
• Extends mutational trees to
mutational graphs (directed acyclic
graphs, DAGs):
– confounding factors
– possible multiple independent trajectories
– violations of the ISA, due to convergent evolution
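As an illustration of the binary input, the sketch below turns a list of alteration calls into a sample-by-alteration 0/1 matrix. It is only an assumed encoding for illustration, not TRONCO's import API. The helper `to_binary_matrix`, the sample names and the alteration labels are all hypothetical.

```python
def to_binary_matrix(calls):
    """Encode (sample, alteration) calls as a 0/1 matrix.

    Rows follow the sorted sample names, columns the sorted alteration
    labels; an entry is 1 when the alteration was called in the sample.
    """
    samples = sorted({sample for sample, _ in calls})
    alterations = sorted({alteration for _, alteration in calls})
    present = set(calls)
    matrix = [[1 if (s, a) in present else 0 for a in alterations]
              for s in samples]
    return samples, alterations, matrix


# Hypothetical calls mixing alteration types (SNV, CNA, fusion).
calls = [
    ("region_1", "geneA:SNV"),
    ("region_2", "geneA:SNV"),
    ("region_2", "geneB:CNA"),
    ("region_3", "geneA:SNV"),
    ("region_3", "geneC-geneD:fusion"),
]

samples, alterations, matrix = to_binary_matrix(calls)
for name, row in zip(samples, matrix):
    print(name, row)
```

Because each column only records presence or absence, SNVs, CNAs and fusions can share one matrix, which is what lets a single inference procedure cover every alteration type.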
16. TRaIT Suite
• Given a binary matrix D that stores the presence of any
alteration in a sample,
• We assess (i) temporal ordering and (ii) statistical association via non-parametric bootstrap and
hypothesis testing → directed graph G (variables = alterations).
• We extract output models with algorithmic strategies based on information-theoretic measures (e.g.,
mutual information)
• Optimal polynomial-time “off-the-shelf” algorithms; e.g., the Edmonds and Gabow algorithms infer trees (weighted
directed MST), and Prim and Chow-Liu plus post-processing infer DAGs
• The overall complexity of this step is O((nm)^2 × B), where B is the cost of running bootstrap and hypothesis
testing on each entry in D.
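The steps above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the TRaIT implementation. Temporal ordering is assumed to mean a higher marginal frequency for the earlier alteration. Statistical association is assumed to mean P(b | a) > P(b | not a). The hypothesis test is replaced by a plain bootstrap support threshold. The tree step picks the highest mutual-information parent for each node, a greedy stand-in for Edmonds' algorithm, which would additionally handle cycles in G. All function names, parameters and the toy matrix are hypothetical.

```python
import math
import random
from itertools import permutations


def freq(rows, i):
    """Fraction of samples (rows) carrying alteration i."""
    return sum(r[i] for r in rows) / len(rows)


def joint(rows, i, j):
    """Fraction of samples carrying both alterations i and j."""
    return sum(r[i] and r[j] for r in rows) / len(rows)


def supports_edge(rows, a, b):
    """True if a -> b passes both (assumed) conditions on these rows."""
    pa, pb, pab = freq(rows, a), freq(rows, b), joint(rows, a, b)
    if not 0 < pa < 1 or pb == 0:
        return False  # conditionals undefined, or b never observed
    temporal_ordering = pa > pb
    association = pab / pa > (pb - pab) / (1 - pa)  # P(b|a) > P(b|not a)
    return temporal_ordering and association


def bootstrap_graph(rows, n_boot=200, min_support=0.8, seed=0):
    """Keep the edges supported in at least min_support of the resamples."""
    rng = random.Random(seed)
    pairs = list(permutations(range(len(rows[0])), 2))
    hits = dict.fromkeys(pairs, 0)
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]  # resample with replacement
        for a, b in pairs:
            hits[(a, b)] += supports_edge(sample, a, b)
    return {edge for edge, h in hits.items() if h / n_boot >= min_support}


def mutual_information(rows, a, b):
    """Mutual information (in bits) between two binary columns."""
    n = len(rows)
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            pxy = sum(r[a] == x and r[b] == y for r in rows) / n
            px = sum(r[a] == x for r in rows) / n
            py = sum(r[b] == y for r in rows) / n
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px * py))
    return mi


def pick_parents(rows, edges):
    """For each node, keep the incoming edge with the highest weight."""
    best = {}
    for a, b in edges:
        w = mutual_information(rows, a, b)
        if b not in best or w > best[b][1]:
            best[b] = (a, w)
    return {child: parent for child, (parent, _) in best.items()}


# Hypothetical matrix D: 26 samples, alterations 0 -> 1 -> 2 accumulate.
D = [[0, 0, 0]] * 6 + [[1, 0, 0]] * 8 + [[1, 1, 0]] * 6 + [[1, 1, 1]] * 6
G = bootstrap_graph(D)
print(sorted(G))           # candidate edges of the directed graph G
print(pick_parents(D, G))  # child -> parent map of the extracted tree
```

The support threshold is the simplest possible substitute for a proper test on the bootstrap distributions; swapping in a real test would change only `bootstrap_graph`.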
18. TRaIT Analysis: Multi-region data MSI-High
Colorectal Cancer
Lu, You-Wang, et al. "Colorectal cancer genetic heterogeneity delineated by multi-region sequencing." PloS one 11.3 (2016): e0152673
19. TRaIT Analysis: SCS Triple-neg Breast
Cancer
ADO rate = 9.73 × 10^-2
FP rate = 1.24 × 10^-6
[Figure: inferred progression model. Annotations: Undetected subclone?; Subclone H?; Clonal group also detected in the control bulk sample; Subclonal groups; Uncertainty on temporal direction. Legend: wild type / mutated]
Wang, Yong, et al. "Clonal evolution in breast cancer revealed by single nucleus genome sequencing." Nature 512.7513 (2014): 155
20. Conclusions
• In this talk we have seen the analysis of two kinds of data
that are produced when studying individual tumors
– Multi-region data, bulk-sequenced
– Single-cell sequencing data
• In particular we have seen a framework, based on the TRONCO
library, that can be used to analyze both kinds of data
• Again, you are invited to use the TRaIT facilities in TRONCO to
reproduce the studies presented