2. Structural variants. Easy to grok. Hard to find (well).
Reference
Deletion
Duplication
Inversion
Insertion
Complex
Ira Hall. Saturday @ 3PM: Complex SV in 64 tumor genomes
3. “Signals” for SV discovery
Depth of coverage
Paired-end mapping: too big (deletion)
Split-read mapping
Prior knowledge: known SV sites, predictions from other tools
Most SV software exploits just one signal
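The paired-end mapping signal in the figure ("too big" implies a deletion) can be pictured as an outlier test on the mapped distance between the two mates of a pair. This is a minimal sketch, not code from any tool named in this talk; the function name, the input format, the `n_sd` cutoff, and the toy library parameters are all assumptions.

```python
def discordant_pairs(pairs, mean_insert, sd_insert, n_sd=4.0):
    """Return read pairs whose mapped span is 'too big' for the library.

    pairs: iterable of (left_start, right_end) reference coordinates for
           the two mates of a pair (hypothetical input format).
    """
    cutoff = mean_insert + n_sd * sd_insert
    return [(s, e) for s, e in pairs if (e - s) > cutoff]


# Toy library: 500 bp inserts with a 50 bp standard deviation (assumed).
pairs = [(1000, 1510), (2000, 2480), (3000, 4600), (5000, 5530)]
too_big = discordant_pairs(pairs, mean_insert=500, sd_insert=50)
print(too_big)
```

Only the third pair spans more than the 700 bp cutoff, so it is the one flagged as possible deletion evidence.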
4. DELLY: Rausch et al, 2012
Depth of coverage
Paired-end mapping: too big (deletion)
Split-read mapping
Prior knowledge: known SV sites, predictions from other tools
“Stepwise”: 1. Predict (paired-end mapping), 2. Refine (split-read mapping)
5. GASVPro: Sindi et al, 2012
Depth of coverage
Paired-end mapping: too big (deletion)
Split-read mapping
Prior knowledge: known SV sites, predictions from other tools
“Integrative”: combines depth-of-coverage (DoC) and paired-end mapping (PEM) signals for greater specificity, especially for deletions (using DoC)
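One way to picture combining DoC with PEM for deletions: a candidate interval proposed by paired-end mapping is kept only if read depth inside it drops relative to the flanks. The sketch below is an illustration under stated assumptions, not GASVPro's actual model (which the slide does not describe); the function name, the flank size, and the 0.75 ratio threshold are hypothetical.

```python
def depth_supports_deletion(depth, start, end, flank=100, max_ratio=0.75):
    """Check whether mean depth in [start, end) is low relative to the flanks.

    depth: list of per-base read depths (hypothetical input format).
    """
    inside = depth[start:end]
    flanks = depth[max(0, start - flank):start] + depth[end:end + flank]
    if not inside or not flanks:
        return False
    mean_in = sum(inside) / len(inside)
    mean_flank = sum(flanks) / len(flanks)
    if mean_flank == 0:
        return False
    return mean_in / mean_flank <= max_ratio


# Toy depth track: 20x everywhere except a dip to 10x at positions 300-499.
depth = [20] * 300 + [10] * 200 + [20] * 300
dip_supported = depth_supports_deletion(depth, 300, 500)
flat_supported = depth_supports_deletion(depth, 50, 250)
print(dip_supported, flat_supported)
```

A PEM candidate over the dip passes the depth check, while a candidate over the flat region does not, which is the sense in which a second signal can add specificity.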
6. Layer et al, unpub.
Depth of coverage
Paired-end mapping: too big (deletion)
Split-read mapping
Prior knowledge: known SV sites, predictions from other tools
LUMPY integrates all (and future) signals
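As a rough picture of what integrating several signals can mean, each signal can cast weighted votes for where a breakpoint lies, and the votes are summed per position. The sketch below is illustrative only: the slide does not describe LUMPY's internals, and the evidence format, the weights, and the function name are assumptions.

```python
from collections import defaultdict


def combine_evidence(evidence):
    """Sum per-position breakpoint weights across signals.

    evidence: list of (signal_name, {position: weight}) tuples
              (hypothetical format; weights need not be probabilities).
    Returns (best_position, per-position totals).
    """
    combined = defaultdict(float)
    for _signal, weights in evidence:
        for pos, w in weights.items():
            combined[pos] += w
    best = max(combined, key=combined.get)
    return best, dict(combined)


evidence = [
    # Paired-end mapping: broad, low-resolution support.
    ("paired_end", {100: 0.2, 101: 0.2, 102: 0.2, 103: 0.2, 104: 0.2}),
    # Split read: sharp support at a single base.
    ("split_read", {102: 1.0}),
]
best, combined = combine_evidence(evidence)
print(best)
```

In this layout a new signal is just one more entry in the `evidence` list, which is one way to read "all (and future) signals".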
7. Ryan Layer
Graduate Student
Co-mentored with Ira Hall
github.com/ryanlayer
25. Bakeoff #1: detection of 4000 simulated SVs
chr10
- Simulate 4000 SVs on chr10 (build 37)
- 1000 deletions
- 1000 duplications
- 1000 insertions
- 1000 inversions
- For each SV type, 500 < 1kb and 500 >= 1kb
- “Sequence” mutant chr10 to 2X, 5X, 20X w/ wgsim
- Compare LUMPY, HYDRA, GASVPro, DELLY
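The simulation design above can be sketched as a small generator of SV records. This is a hypothetical illustration of the 4 types x 2 size classes x 500 layout, not the script used for the talk; the size ranges, the record format, and the placeholder chromosome length are assumptions, and overlapping SVs are not prevented.

```python
import random

SV_TYPES = ["deletion", "duplication", "insertion", "inversion"]


def simulate_svs(chrom_length, per_class=500, seed=0):
    """Draw (type, start, size) records: per_class < 1kb and per_class >= 1kb
    for each SV type.

    The size ranges (100-999 bp and 1,000-10,000 bp) are assumptions; the
    slide only gives the 1 kb split.
    """
    rng = random.Random(seed)
    svs = []
    for sv_type in SV_TYPES:
        for lo, hi in [(100, 999), (1000, 10000)]:
            for _ in range(per_class):
                size = rng.randint(lo, hi)
                start = rng.randint(0, chrom_length - size)
                svs.append((sv_type, start, size))
    return svs


# chrom_length is a placeholder, not the true length of chr10.
svs = simulate_svs(chrom_length=1_000_000)
print(len(svs))
```

The resulting records would then be applied to the reference sequence before simulating reads at each coverage level.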
26. Increased sensitivity for deletions: 20X coverage
[Bar chart, 20x: fraction of deletions found by lumpy (PE), lumpy (SR), lumpy (both), hydra, gasvpro, delly (pe), and delly (pe+sr). Legend: all, < 1kb, >= 1kb.]
Delly: 82%
GASV: 70%
Hydra: 78%
Lumpy: 95%
27. Increased sensitivity for deletions: 5X coverage
[Bar chart, 5x: fraction of deletions found by lumpy (PE), lumpy (SR), lumpy (both), hydra, gasvpro, delly (pe), and delly (pe+sr). Legend: all, < 1kb, >= 1kb.]
Delly: 29%
GASV: 28%
Hydra: 31%
Lumpy: 79%
>2 times more sensitive
28. Increased sensitivity for deletions: 2X coverage
[Bar chart, 2x: fraction of deletions found by lumpy (PE), lumpy (SR), lumpy (both), hydra, gasvpro, delly (pe), and delly (pe+sr). Legend: all, < 1kb, >= 1kb.]
Lumpy: 24%
Hydra: 3%
GASV: 3%
Delly: 4%
6 times more sensitive
29. Same goes for duplications (5X)
[Bar chart: fraction found by lumpy (PE), lumpy (SR), lumpy (both), hydra, gasvpro, delly (pe), and delly (pe+sr). Legend: all, < 1kb, >= 1kb.]
Lumpy: 70%
GASV: N/A
Delly: 21%
Hydra: 26%
~3 times more sensitive
30. ...and inversions (5X)
[Bar chart: fraction found by lumpy-pe, lumpy-sr, lumpy, hydra, gasvpro, delly-pe, and delly-sr. Legend: all, < 1kb, >= 1kb.]
Lumpy: 95%
Delly: 10%
GASV: 71%
Hydra: 52%
1.2 - 2X more sensitive
31. Best sensitivity across the board
- Profound improvement for smaller (<1kb) variants
- And, importantly, at low coverage:
- up to 6 times more sensitive.
- No significant increase in false positives.
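The "times more sensitive" figures can be checked with a little arithmetic on the deletion percentages reported on the 20X, 5X, and 2X slides. The sketch below compares Lumpy against the best of the other tools at each coverage; the choice of "best other tool" as the baseline and the function name are assumptions.

```python
# Percent of simulated deletions found, as reported on the slides.
deletion_sensitivity = {
    "20X": {"Lumpy": 95, "Hydra": 78, "GASV": 70, "Delly": 82},
    "5X": {"Lumpy": 79, "Hydra": 31, "GASV": 28, "Delly": 29},
    "2X": {"Lumpy": 24, "Hydra": 3, "GASV": 3, "Delly": 4},
}


def fold_over_best_other(results, tool="Lumpy"):
    """Ratio of one tool's sensitivity to the best of the other tools."""
    best_other = max(v for name, v in results.items() if name != tool)
    return results[tool] / best_other


folds = {cov: fold_over_best_other(r) for cov, r in deletion_sensitivity.items()}
for cov, fold in folds.items():
    print(f"{cov}: {fold:.1f} times")
```

Against that baseline the ratios come out at roughly 1.2, 2.5, and 6, in line with the ">2 times" and "6 times" labels on the slides.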
34. Tumor heterogeneity simulation: an in silico “spike in”
140 SVs >= 100bp
50% tumor freq.: chr17 (HuRef) FASTA, wgsim (20x) + chr17 (build 37) FASTA, wgsim (20X) = 40X BAM
What fraction of the 140 SVs can we detect?
35. Tumor heterogeneity simulation: an in silico “spike in”
140 SVs >= 100bp
50% tumor freq.: chr17 (HuRef) FASTA, wgsim (20x) + chr17 (build 37) FASTA, wgsim (20X) = 40X BAM
20% tumor freq.*: chr17 (HuRef) FASTA, wgsim (4x) + chr17 (build 37) FASTA, wgsim (36X) = 40X BAM
What fraction of the 140 SVs can we detect?
* Not even close to scale.
36. Tumor heterogeneity simulation: an in silico “spike in”
140 SVs >= 100bp
50% tumor freq.: chr17 (HuRef) FASTA, wgsim (20x) + chr17 (build 37) FASTA, wgsim (20X) = 40X BAM
20% tumor freq.*: chr17 (HuRef) FASTA, wgsim (4x) + chr17 (build 37) FASTA, wgsim (36X) = 40X BAM
. . .
1% tumor freq.*: chr17 (HuRef) FASTA, wgsim (0.4x) + chr17 (build 37) FASTA, wgsim (39.6X) = 40X BAM
What fraction of the 140 SVs can we detect?
* Not even close to scale.
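The spike-in amounts to splitting a fixed total coverage between the SV-bearing genome and the reference. The sketch below shows that split for a 40X total; it assumes tumor frequency is simply the share of total coverage simulated from the SV-bearing genome, a definition the slides do not state, and the function name is hypothetical.

```python
def split_coverage(total_coverage, tumor_fraction):
    """Return (tumor_x, normal_x) read depths for a mixed sample.

    Assumes tumor frequency is the share of total coverage simulated
    from the SV-bearing genome.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    tumor_x = total_coverage * tumor_fraction
    return tumor_x, total_coverage - tumor_x


for fraction in (0.5, 0.01):
    tumor_x, normal_x = split_coverage(40, fraction)
    print(f"{fraction:.0%} tumor: {tumor_x:g}x + {normal_x:g}X")
```

Under that assumption the 50% and 1% cases reproduce the 20x + 20X and 0.4x + 39.6X splits; each pair of depths would be simulated separately and the reads merged into one 40X BAM.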
41. 1. Integrates all SV signals
2. High sensitivity
3. Power for low frequency variants:
cancer genomics / heterogeneity
github.com/arq5x/lumpy-sv
42. Acknowledgments
Ryan Layer: Graduate Student. Co-mentored with Ira Hall. github.com/ryanlayer
Ira Hall: Univ. of Virginia. Former mentor & key collaborator. faculty.virginia.edu/irahall/
Raphael Lab: Brown University. Help w/ GASV & Venter simulation. compbio.cs.brown.edu/
Funding: R01 HG006693-01; Fund for Excellence in Science & Technology