This document compares the amount and quality of phylogenetic information in six amniote phylogenomic datasets. Median support for major relationships is often strong and similar across datasets, but the variance in support is wide, with some loci supporting and others rejecting known relationships. Support against known relationships gives a conservative minimum bound on the level of systematic error. Support for turtle placement is much weaker than for other backbone branches, indicating relatively little phylogenetic information about where turtles belong. Overall, the analysis shows that phylogenomic datasets can differ substantially in information content and reliability.
Comparing the Amount and Quality of Information from Different Sequencing Strategies: A Case Study with Amniotes
1. Comparing the amount and quality of information from different sequencing strategies: A case study with amniotes
Jeremy M. Brown and Robert C. Thomson
@jembrown www.phyleauxgenetics.org
5. The Ideal
[Figure: likelihood (y-axis) against parameter value (x-axis) in two panels, "One Gene" and "Whole Genome". Caption fragment: "[I]llustration of how large data sets allow an arbitrarily great reduction in the variance of an estimate without making it [...]"]
Kumar et al. 2012. Mol. Biol. Evol. doi:10.1093/molbev/msr202
6. The Worst Case
[Figure caption fragment: "[A]n illustration of how large data sets allow an arbitrarily great reduction in the variance of an estimate without making it [...] Pairs of DNA sequences with an evolutionary distance of 0.7 substitutions per site were generated according to a GTR [...] of evolution using SeqGen (Rambaut and Grassly 1997). The evolutionary distance between simulated sequences [...]"]
Kumar et al. 2012. Mol. Biol. Evol. doi:10.1093/molbev/msr202
7. Big Data = Strong Support
[Figure, first of three published trees: Bayesian consensus topologies inferred from the 248-gene dataset under the CAT-GTR + G4 mixture model, (a) from 62,342 sites and (b) from 187,026 sites. Taxa: Caiman, Podarcis, Python, Phrynops, Gallus, Ornithorhynchus, Monodelphis, Taeniopygia, Alligator, Emys, Anolis, Caretta, Chelonoidis, Homo, Xenopus, Protopterus. Nodal values shown are 0.99 to 1. Scale bar: 0.1 substitution/site. Support legend: BPPARTC = 100, PPPARTC = 1.0, PPCAT = 1.0; BPML, BPPARTG and PPBAY shown as "-".]
[Table fragment, number of sites: amino acids, all positions: 62,342; nucleotides, all positions: 187,026; nucleotides, positions 1 + 2: 124,684; nucleotides, position 3: 62,342.]
[Figure, second of three published trees, from "UCEs place turtles sister to archosaurs", N. G. Crawford et al.: (a) schematic tree of snake, lizard, turtles, tuatara, crocodilians, birds and human; (b) phylogeny of snake (Pantherophis guttata), lizard (Anolis carolinensis), tuatara (Sphenodon tuatara), side-necked turtle (Pelomedusa subrufa), painted turtle (Chrysemys picta), American alligator (Alligator mississippiensis), saltwater crocodile (Crocodylus porosus), zebra finch (Taeniopygia guttata), chicken (Gallus gallus) and human (Homo sapiens). All seven labelled nodes have support 1.0/100. Scale bar: 0.03 substitutions/site.]
[Figure, third of three published trees: Figure 2 of Shaffer et al., Genome Biology 2013, 14:R28 (http://genomebiology.com/2013/14/3/R28), "A revised phylogeny of major amniote lineages and their rates of m[...]". (a) Relationships of the eight primary amniote lineages, including the relationship of turtles and archosaurs (alligator plus birds), with support values at nodes. (b) Histogram of the relative rate of substitution inferred for each lineage.]
Crawford et al. (2012)
Biology Letters 8:783
Chiari et al. (2012)
BMC Biology 10:65
Shaffer et al. (2013)
Genome Biology 14:R28
9. Basic Unanswered Questions
• How much do phylogenomic datasets vary in information content?
• How does the amount and quality of information vary across loci?
10. 6 Amniote Datasets
• Chiari et al. (2012): 248 transcriptomic loci, 12 taxa
• Crawford et al. (2012): 1,145 UCEs, 10 taxa
• Fong et al. (2012): 75 Sanger-sequenced loci, 129 taxa
• Lu et al. (2013): 1,638 transcriptomic and genomic loci, 11 taxa
• Shaffer et al. (2013): 1,955 genomic loci, 8 taxa
• Wang et al. (2013): 1,113 genomic loci, 12 taxa
[Photo: Western Painted Turtle (Chrysemys picta). Photo by Brad Shaffer.]
14. Bayes Factors are Unbounded
[Figure: posterior probability (y-axis, 0 to 1) against 2ln(BF) (x-axis, -200 to 400) for mammal monophyly in the Wang et al. dataset. Annotation: "37% points".]
Brown and Thomson, In Prep
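The contrast in this slide's title can be made concrete with a minimal sketch. It assumes that a Bayes factor is formed from two log marginal likelihoods (one for the hypothesis, one for its alternative) and that the prior probability of the hypothesis is 0.5; both the function names and the equal-prior choice are illustrative assumptions, not details taken from the slides. The posterior probability saturates at 0 or 1 while 2ln(BF) keeps growing.

```python
import math


def two_ln_bf(ln_ml_hypothesis: float, ln_ml_alternative: float) -> float:
    """Return 2ln(BF) from two log marginal likelihoods."""
    return 2.0 * (ln_ml_hypothesis - ln_ml_alternative)


def posterior_probability(two_ln_bf_value: float, prior_prob: float = 0.5) -> float:
    """Posterior probability implied by a 2ln(BF) and a prior probability.

    Uses posterior odds = Bayes factor * prior odds, computed on the log
    scale so that very large or very small Bayes factors do not overflow.
    The default prior of 0.5 is an assumption made for illustration.
    """
    ln_prior_odds = math.log(prior_prob) - math.log1p(-prior_prob)
    ln_post_odds = two_ln_bf_value / 2.0 + ln_prior_odds
    if ln_post_odds >= 0:
        return 1.0 / (1.0 + math.exp(-ln_post_odds))
    e = math.exp(ln_post_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical values spanning the x-axis range of the slide's plot.
    for value in (-200.0, -10.0, 0.0, 10.0, 100.0, 400.0):
        print(f"2ln(BF) = {value:7.1f}  ->  PP = {posterior_probability(value):.6f}")
```

Under these assumptions, a locus with 2ln(BF) = 100 and one with 2ln(BF) = 400 both report a posterior probability indistinguishable from 1, which is why the Bayes factor scale is the more informative one for comparing loci.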
15. Support for Bird Monophyly
[Figure: six histograms of 2ln(BF) for bird monophyly (x-axis: 2ln(BF), 0 to 1500; y-axis: frequency), one per dataset: Chiari et al. (transcripts), Crawford et al. (UCEs), Fong et al. (Sanger), Lu et al. (transcripts/genomes), Shaffer et al. (genomes), Wang et al. (genomes).]
16. [Figure: 2ln(BF) by dataset, one panel per clade]
Panels (datasets shown): Amniota (Chiari, Fong, Wang); Archosauria (Chiari, Crawford, Fong, Lu, Shaffer, Wang); Aves (Chiari, Crawford, Fong, Lu, Shaffer, Wang); Crocodilians (Chiari, Crawford, Fong, Wang); Mammalia (Chiari, Fong, Lu, Shaffer, Wang); Lepidosauria (Crawford, Lu); Squamata (Chiari, Crawford, Fong, Lu, Shaffer); Testudines (Chiari, Fong, Lu, Wang). Y-axis: 2ln(BF).
17. [Figure repeated: 2ln(BF) by dataset, one panel per clade (Amniota, Archosauria, Aves, Crocodilians, Mammalia, Lepidosauria, Squamata, Testudines)]
Annotation: Histograms on previous slide.
18. [Figure repeated: 2ln(BF) by dataset, one panel per clade (Amniota, Archosauria, Aves, Crocodilians, Mammalia, Lepidosauria, Squamata, Testudines)]
Annotation: Median support values don’t change that much.
19. [Figure repeated: 2ln(BF) by dataset, one panel per clade (Amniota, Archosauria, Aves, Crocodilians, Mammalia, Lepidosauria, Squamata, Testudines)]
Annotation: “Genomic” data have much wider variance, often both for and AGAINST known relationships.
20. [Figure repeated: 2ln(BF) by dataset, one panel per clade (Amniota, Archosauria, Aves, Crocodilians, Mammalia, Lepidosauria, Squamata, Testudines)]
Annotation: A “minimum bound” on the level of systematic error (conservative). Amniotes likely a better-case scenario.
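The comparison made across these slides (similar medians, very different spread, and some loci strongly rejecting known relationships) can be sketched as a per-dataset summary. This is a minimal illustration, not the authors' analysis code: the input values are invented, the function name `summarize_support` is hypothetical, and the threshold of 10 for "strong" support follows the commonly used Kass and Raftery convention for 2ln(BF) rather than anything stated on the slides.

```python
from statistics import median, pstdev


def summarize_support(two_ln_bf_by_dataset, threshold=10.0):
    """Summarize per-locus 2ln(BF) values for one clade, dataset by dataset.

    two_ln_bf_by_dataset maps a dataset name to a list of per-locus 2ln(BF)
    values. Positive values support the clade; negative values reject it.
    """
    summary = {}
    for name, values in two_ln_bf_by_dataset.items():
        n = len(values)
        summary[name] = {
            "n_loci": n,
            "median": median(values),
            "sd": pstdev(values),
            # Share of loci with strong support for the clade.
            "frac_strongly_for": sum(v > threshold for v in values) / n,
            # Share of loci with strong support AGAINST the clade.
            "frac_strongly_against": sum(v < -threshold for v in values) / n,
        }
    return summary


if __name__ == "__main__":
    # Invented numbers: both datasets have the same median but different spread.
    toy = {
        "narrow": [18.0, 20.0, 22.0, 19.0, 21.0],
        "wide": [-150.0, 20.0, 400.0, -30.0, 60.0],
    }
    for name, stats in summarize_support(toy).items():
        print(name, stats)
```

In this toy input both datasets have a median of 20, yet only the "wide" dataset contains loci that strongly reject the clade. For a clade that is not in doubt, that rejecting fraction is the kind of quantity the "minimum bound" annotation refers to.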
22. How much information do we have about turtle placement?
[Figure: four histograms of 2ln(BF) for the Wang et al. (genomes) dataset (x-axis: 2ln(BF), 0 to 1500; y-axis: frequency): birds monophyletic, crocodilians monophyletic, mammals monophyletic, turtles monophyletic.]
23. How much information do we have about turtle placement?
[Figure: the four Wang et al. (genomes) histograms of 2ln(BF) for monophyly of birds, crocodilians, mammals and turtles, plus a combined histogram (x-axis: 2ln(BF), -500 to 1500; y-axis: frequency) with legend: Amniotes, Archosaurs, Birds, Crocodilians, Diapsids, Lepidosaurs, Mammals.]
24. How much information do we have about turtle placement?
[Figure: the four Wang et al. (genomes) histograms of 2ln(BF) for monophyly of birds, crocodilians, mammals and turtles, plus a combined histogram (x-axis: 2ln(BF), -500 to 1500; y-axis: frequency) with legend: Amniotes, Archosaurs, Birds, Crocodilians, Diapsids, Lepidosaurs, Mammals. The 2ln(BF) axis is labelled "Reject" and "Support".]
25. How much information do we have about turtle placement?
Not a lot.
[Figure: histogram (x-axis: 2ln(BF), -400 to 100; y-axis: frequency, 0 to 2000) with legend: Amniotes, Archosaurs, Birds, Crocodilians, Diapsids, Lepidosaurs, Mammals.]
26. Take Homes
• Highly variable information quantity and quality across data sets. Implications for comparing results across studies.
• Lots of heterogeneity across genes. Implications for methods that model gene-tree variation.
• Relatively speaking, much less information to place turtles than for other ‘backbone’ branches.
27. Ongoing
• How do properties of genes (rate, clock-likeness, alignment quality, etc.) relate to signal?
• How is the information in a gene distributed across branches?
• Can we identify genes with reliable signal? (For early results, see Doyle et al., Syst. Biol., Advance Access)
28. Words of Caution
• A bug in MrBayes v3.2.x incorrectly turns off topology moves under some combinations of constraints.
• Negative constraints are tricky. Tree spaces become exceptionally rugged, and strange things can happen in some cases when using Metropolis coupling (more coming soon).
[Figure panels: "Full Posterior" and "Negative Constraint".]
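For readers who have not set up constrained analyses before, the following is a minimal sketch of how a positive or negative constraint might be written into a MrBayes block. It assumes the `constraint <name> [hard|negative] = <taxa>;` and `prset topologypr = constraints(<name>);` syntax described in the MrBayes 3.2 manual; the helper `constraint_block` and the taxon names are hypothetical, and nothing here reproduces the authors' actual input files. Given the cautions above, check the behaviour of any constrained run against the version of MrBayes in use.

```python
def constraint_block(name, taxa, negative=False):
    """Build a MrBayes block that applies one topological constraint.

    negative=False forces the listed taxa to form a clade (hard constraint);
    negative=True forbids them from forming a clade (negative constraint).
    The command syntax is assumed from the MrBayes 3.2 manual.
    """
    kind = "negative" if negative else "hard"
    taxon_list = " ".join(taxa)
    return "\n".join([
        "begin mrbayes;",
        f"  constraint {name} {kind} = {taxon_list};",
        f"  prset topologypr = constraints({name});",
        "end;",
    ])


if __name__ == "__main__":
    # Hypothetical taxon labels; they must match the labels in the data matrix.
    birds = ["Gallus", "Taeniopygia"]
    print(constraint_block("birds", birds))
    print()
    print(constraint_block("birds", birds, negative=True))
```

Running the same locus once under each block gives one analysis in which the clade must be present and one in which it must be absent, which is one way a pair of marginal likelihoods for a Bayes factor could be obtained.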