SMBE 2015: Rapid Identification of Phylogenetically Informative Data from Next-Gen Sequencing
1. Rapid identification of phylogenetically informative
data from next-gen sequencing
Rachel Schwartz
The Biodesign Institute
Arizona State University
Rachel.Schwartz@asu.edu
July 16, 2015
10. A composite genome for reference
Shotgun sequencing of each sample
Assemble composite genome
Align reads to composite genome
Call genotype at each site for each sample
Remove sites with missing data
Output alignment
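The last three steps of this workflow (call genotype, remove sites with missing data, output alignment) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SISRS implementation: the input layout (`pileups`), the function names, the `N` placeholder for a missing call, and the `min_reads`, `min_fraction`, and `max_missing` parameters are all hypothetical.

```python
from collections import Counter

MISSING = "N"  # hypothetical placeholder for "no genotype call"


def call_genotype(bases, min_reads=1, min_fraction=1.0):
    """Call one base for one sample at one site.

    `bases` is a string of the bases seen in reads aligned to that site.
    A call is made only if enough reads cover the site and a large
    enough fraction of them agree; otherwise the site is marked missing.
    """
    if not bases or len(bases) < min_reads:
        return MISSING
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_fraction else MISSING


def build_alignment(pileups, max_missing=0):
    """Build an alignment from per-site, per-sample read bases.

    `pileups` maps site -> {sample: bases}. Sites where more than
    `max_missing` samples lack a call are removed.
    Returns {sample: aligned sequence}.
    """
    samples = sorted({s for calls in pileups.values() for s in calls})
    alignment = {s: [] for s in samples}
    for site in sorted(pileups):
        calls = [call_genotype(pileups[site].get(s, "")) for s in samples]
        if calls.count(MISSING) > max_missing:
            continue  # remove sites with too much missing data
        for sample, call in zip(samples, calls):
            alignment[sample].append(call)
    return {s: "".join(seq) for s, seq in alignment.items()}


if __name__ == "__main__":
    toy = {
        1: {"a": "AAA", "b": "AA", "c": "A"},
        2: {"a": "CC", "b": "", "c": "C"},     # sample b has no reads
        3: {"a": "GG", "b": "GT", "c": "T"},   # sample b is ambiguous
    }
    for name, seq in build_alignment(toy, max_missing=1).items():
        print(f">{name}\n{seq}")
```

Raising `max_missing` above zero keeps sites where some samples lack a call, which trades completeness of the data matrix for a larger number of sites.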
12. Simulation Results: 1 million bp genome
[Figure: number of correct mappable sites (axis ticks 1, 10, 100, 1000, 10000, 100000) versus coverage (1, 2, 4, 8, 10, 20, 50), for slow genes and fast genes. Simulations use ladder trees with taxa A to H in three panels: equal branch length, long deep branches, short deep branches.]
Schwartz et al. (2015) BMC Bioinformatics
13. Simulation Results: by depth
[Figure: number of correct mappable sites (axis ticks 1, 10, 100, 1000, 10000, 100000) versus coverage (1, 2, 4, 8, 10, 20, 50), with separate series for Depth 1 to Depth 6. Tree panels: ladder trees and balanced trees; surviving panel labels: "Equal branch length", "Long".]
Schwartz et al. (2015) BMC Bioinformatics
14. Phylogeny of apes from SISRS data
Bonobo
Human
Gorilla
Orangutan
Rhesus macaque
Crab macaque
Chimp
15. Phylogeny of mammals from SISRS data
[Figure: phylogeny of 30 mammals. Tips, in the order extracted: treeshrew, horse, pig, cow, toothed whale, baleen whale, pangolin, dog, cat, bat, megabat, shrew, star nosed mole, aardvark, tenrec, elephant shrew, manatee, elephant, sloth, armadillo, opossum, wallaby, rabbit, pika, rat, mouse, colugo, lemur, human, macaque. Node support: 21 of 27 values are 100; the others are 90, 91, 61, 86, 51, and 72. The assignment of values to nodes is not recoverable from the text.]
Schwartz et al. (2015) BMC Bioinformatics
16. Phylogeny of mammals from SISRS data
[Figure: three alternative phylogenies of the same 30 mammals. The assignment of support values to nodes is not recoverable from the text.]
[First tree. Tips, in the order extracted: colugo, sn mole, shrew, horse, pig, cow, baleen whale, toothed whale, pangolin, dog, cat, bat, megabat, aardvark, tenrec, e shrew, elephant, manatee, opossum, wallaby, sloth, armadillo, treeshrew, rat, mouse, rabbit, pika, lemur, human, macaque. Node support: 22 of 26 values are 100; the others are 60, 53, 99, and 75.]
[Second tree. Tips, in the order extracted: treeshrew, horse, pig, cow, toothed whale, baleen whale, pangolin, dog, cat, bat, megabat, shrew, sn mole, aardvark, tenrec, e shrew, manatee, elephant, sloth, armadillo, opossum, wallaby, rabbit, pika, rat, mouse, colugo, lemur, human, macaque. Node support: 21 of 27 values are 100; the others are 90, 91, 61, 86, 51, and 72.]
[Third tree. Tips, in the order extracted: lemur, colugo, bat, megabat, horse, pig, cow, toothed whale, baleen whale, pangolin, dog, cat, sn mole, shrew, manatee, elephant, tenrec, aardvark, e shrew, wallaby, opossum, armadillo, sloth, treeshrew, rabbit, pika, rat, mouse, human, macaque. Node support: 21 of 27 values are 100; the others are 61, 62, 61, 87, 80, and 92.]
Schwartz et al. (2015) BMC Bioinformatics
17. Phylogeny of mammals from SISRS data
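[Figure: phylogeny of 30 mammals. Tip labels as printed: Opossum, Wallaby, aardvarkG, armadillo, baleenwhaleG, bat, cat, colugoG, cow, dog, elephant, eshrewG, horse, human, lemur, macaque, manateeG, megabatG, mouse, pangolinG, pig, pika, rabbit, ratT, shrew, slothG, sn moleG, tenrecG, toothedwhale, treeshrew. The trailing G or T on some labels is not explained in the surviving text.]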
18. Phylogenies of angiosperms from SISRS data
[Figure: Comparing Trees Generated With Varying Amounts of Missing Data. x-axis: Number of Gaps Allowed At A Site (ticks 20, 40, 60). y-axis: Robinson-Foulds Distance (ticks 50, 75, 100, 125). Legend: Distance, Nodes In Tree.]
Adam Orr
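For readers unfamiliar with the metric on this slide, the Robinson-Foulds distance between two trees on the same taxa counts the bipartitions (splits) present in one tree but not the other. The sketch below illustrates that definition only; it is not the code used to produce the figure. The nested-tuple tree representation and the function names are assumptions made for the example, and a real analysis would normally rely on an established phylogenetics library.

```python
def leaves(tree):
    """Return the set of tip names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of a tree, treated as unrooted.

    Each split is stored as the side that does NOT contain a fixed
    reference taxon, so the same split always has the same key.
    """
    all_taxa = leaves(tree)
    reference = min(all_taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        other = all_taxa - clade
        if len(clade) > 1 and len(other) > 1:  # skip trivial splits
            found.add(other if reference in clade else clade)
        return clade

    walk(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Unnormalised Robinson-Foulds distance between two trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must contain the same taxa")
    return len(splits(tree_a) ^ splits(tree_b))


if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    print(rf_distance(t1, t2))
```

A distance of zero means the two trees share every split; larger values mean more splits differ, which is how trees built with different missing-data thresholds can be compared.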
19. SISRS rapidly identifies phylogenetically informative
data from next-gen sequencing reads
Apes: 3 days
Mammals: 7 days
Leishmania: 12 hours
No reference genome is required.
Minimal assembly required (completely automated).
Results are comparable to slower, labor-intensive methods.
26. Conclusions
SISRS rapidly identifies data for phylogenetics from next-gen sequencing reads
Different (SISRS) data = alternative topologies
Use SISRS data to estimate branch lengths and divergence dates accurately
27. Acknowledgements
Co-authors / Collaborators
Reed Cartwright (ASU)
Kelly Harkins (ASU and UCSC)
Anne Stone (ASU)
Kael Dai (ASU)
Adam Orr (ASU)
Mike Miller (Villanova)
Funding
NSF DBI-1356548
NIH R01-GM101352-01A1
NSF DDIG BCS-1232582
ASU Startup Funds
SISRS is available at
https://github.com/rachelss/SISRS