1. Is your phylogeny informative?
Measuring the power of comparative methods
Carl Boettiger
University of California, Davis
SICB 2011
Fell free to share live
2.
3.
4.
5.
6.
7.
8. Two Questions
Right Model? Sufficient Data?
Currently we worry that we don’t have the
right models.
We should worry that we don’t have the
power to know if we have the right models.
9. Right Model?
h
enoug
m ed ?
cram e tree
we to th
ave gy in
H lo
bio
(diagram shows tree
under weight of so
many models)
10. Sufficient Data?
Is the data-set big enough?
... informative about
the things we want to
estimate?
11. A model of “phylogenetic signal”
Are closely related
species really more
similar?
al” on g
sig
n str al”
“ n
No sig
0 1
“
an
wni
(Bro ion)
mot
14. A simulation-based test of this method
e
ue valu
Tr
estimates
al” on g
sig
n str al”
“ n
“No sig
0 1 w nian
(Bro ion)
mot
15. What went wrong?
Was supposed to address the
problem of adequate data...
Instead, it's just fallen prey
to the problem
Insufficient signal to
estimate signal
17. ree e
e r t at
Bi gg s ti m
e
an da
c b
lam ter
be t
ve r s
s o ci e
ha pe
0s )
←
30 23
(vs
al” on g
sig
n str al”
“ n
“No sig
0 1
18. Not just size that matters....
e
is quit f
a data ate o
l2 3 tax ut estim
Origina tive abo e, sigma
a t
inform ication ra
sif
diver
19. Is the phylogeny informative?
Depends on the question
Depends on the amount of data
x
ple ta to
com da
m ore the
dd have
we a we
as , do ?
So dels hem
mo tify t
jus
uld
wo ?
ow now
H k
we
20. Process vs Pattern
Models represent mechanisms
More complex models fit the data
better
So can we tell when we've matched
the mechanism and when we've
only matched the pattern?
Yes!
22. Phylogenetic Monte Carlo
rom
la te f el →
i mu e mo d
S pl
sim odel
s
b oth m
← Fits data
to this
23. Phylogenetic Monte Carlo
rom
la te f el →
i mu e mo d
S pl
sim odel
s
b oth m
← Fits data
to this
LogLik(complex) LogLik(simple)
24. Phylogenetic Monte Carlo
rom
la te f el →
i mu e mo d
S pl
sim odel
s
b oth m
← Fits data
to this
LogLik(complex) LogLik(simple)
frequency
Likelihood ratio
25. Phylogenetic Monte Carlo
rom
la te f el →
i mu e mo d
S pl
sim odel
s
b oth m
← Fits data
to this
LogLik(complex) LogLik(simple)
frequency
Likelihood ratio
26. Phylogenetic Monte Carlo
rom
la te f el →
i mu e mo d
S pl
sim odel
s
b oth m
← Fits data
to this
LogLik(complex) LogLik(simple)
frequency
Likelihood ratio
27. Phylogenetic Monte Carlo
x
ple
comdel →
rom mo
la te f el →
i mu e mo d
S pl
sim odel
s
b oth m
← Fits data
to this
LogLik(complex) LogLik(simple)
frequency
Likelihood ratio
28. Phylogenetic Monte Carlo
x
ple
comdel →
rom mo
la te f el →
i mu e mo d
S pl
sim odel
s
b oth m
← Fits data
to this
LogLik(complex) LogLik(simple)
frequency
Likelihood ratio
29. A reliable way to justify the complex model
io
← Rat d for
observe ata
d
A noles
Prob of false positive: 0.0115
Power at 95% confidence: 0.99
31. And we can identify cases where the tree
really has no information at all
32. Preferring the simpler model isn't always a
lack of power
Avoid over-
fitting
No trouble with
non-nested
models
33. Compared to what we already do?
a s no
A IC h o f
n
n otio r
e
pow
oses del
cho mo
AIC wrong d
o
the pite go
des r!
e
pow
About as good as flipping a coin...
34. Larger trees can identify more subtle
differences between models
35. Conclusions
We've been worried about the right model.
First we must worry if we have enough data to
tell.
Without this, we will often choose wrong
models.
Future lies in big trees!
PMC: an R package for Phylogenetic Monte Carlo
https://github.com/cboettig/Comparative-Phylogenetics/