SlideShare ist ein Scribd-Unternehmen logo
1 von 38
Downloaden Sie, um offline zu lesen
Is your phylogeny informative?
       Measuring the power of comparative methods


                                  Carl Boettiger
                          University of California, Davis
                                    SICB 2011




Fell free to share live
Two Questions


Right Model?                   Sufficient Data?



      Currently we worry that we don’t have the
      right models.

      We should worry that we don’t have the
      power to know if we have the right models.
Right Model?

                        h
                   enoug
             m ed ?
         cram e tree
     we to th
 ave gy in
H lo
 bio




                                           (diagram shows tree
                                           under weight of so
                                           many models)
Sufficient Data?

Is the data-set big enough?




                                ... informative about
                                the things we want to
                                estimate?
A model of “phylogenetic signal”


 Are closely related
 species really more
 similar?




               al”                        on g
     sig
           n                           str al”
                                      “ n
  No                                   sig
  0                                       1
“
                                                         an
                                                     wni
                                                 (Bro ion)
                                                  mot
A simulation-based test of this method
A simulation-based test of this method




                  al”           on g
         s i gn              str al”
                            “ n
   “No                       sig
    0                           1
A simulation-based test of this method
                                   e
                           ue valu
                      Tr




                                       estimates




                al”                                    on g
      sig
            n                                       str al”
                                                   “ n
  “No                                               sig
    0                                                  1          w  nian
                                                              (Bro ion)
                                                               mot
What went wrong?
Was supposed to address the
problem of adequate data...




            Instead, it's just fallen prey
            to the problem


                        Insufficient signal to
                        estimate signal
“Has the pendulum swung too far?”
ree e
                         e r t at
                    Bi gg s ti m
                          e
                       an da
                     c b
                       lam ter
                        be t


                                              ve r s
                                           s o ci e
                                        ha pe
                                          0s )
                                      ←
                                       30 23
                                        (vs


              al”                            on g
    sig
          n                               str al”
                                         “ n
“No                                       sig
  0                                          1
Not just size that matters....




                                              e
                                       is quit f
                                a data ate o
                        l2 3 tax ut estim
                Origina tive abo e, sigma
                        a          t
                inform ication ra
                       sif
                 diver
Is the phylogeny informative?

                                      Depends on the question

                                Depends on the amount of data
                       x
                    ple ta to
                 com da
            m ore the
          dd have
      we a we
   as , do ?
So dels hem
 mo tify t
  jus
                               uld
                             wo ?
                          ow now
                         H k
                          we
Process vs Pattern


    Models represent mechanisms

    More complex models fit the data
    better
So can we tell when we've matched
the mechanism and when we've
only matched the pattern?




                                              Yes!
Phylogenetic Monte Carlo



             rom
      la te f el →
  i mu e mo d
S pl
 sim
Phylogenetic Monte Carlo



             rom
      la te f el →
  i mu e mo d
S pl
 sim                                   odel
                                            s
                                b oth m
                         ← Fits data
                         to this
Phylogenetic Monte Carlo



             rom
      la te f el →
  i mu e mo d
S pl
 sim                                     odel
                                              s
                                  b oth m
                           ← Fits data
                           to this

   LogLik(complex)    LogLik(simple)
Phylogenetic Monte Carlo



             rom
      la te f el →
  i mu e mo d
S pl
 sim                                     odel
                                              s
                                  b oth m
                           ← Fits data
                           to this

   LogLik(complex)    LogLik(simple)




         frequency



                            Likelihood ratio
Phylogenetic Monte Carlo



             rom
      la te f el →
  i mu e mo d
S pl
 sim                                     odel
                                              s
                                  b oth m
                           ← Fits data
                           to this

   LogLik(complex)    LogLik(simple)




         frequency



                            Likelihood ratio
Phylogenetic Monte Carlo



             rom
      la te f el →
  i mu e mo d
S pl
 sim                                     odel
                                              s
                                  b oth m
                           ← Fits data
                           to this

   LogLik(complex)    LogLik(simple)




         frequency



                            Likelihood ratio
Phylogenetic Monte Carlo


                                                       x
                                                   ple
                                               comdel →
             rom                                mo
      la te f el →
  i mu e mo d
S pl
 sim                                     odel
                                              s
                                  b oth m
                           ← Fits data
                           to this

   LogLik(complex)    LogLik(simple)




         frequency



                            Likelihood ratio
Phylogenetic Monte Carlo


                                                       x
                                                   ple
                                               comdel →
             rom                                mo
      la te f el →
  i mu e mo d
S pl
 sim                                     odel
                                              s
                                  b oth m
                           ← Fits data
                           to this

   LogLik(complex)    LogLik(simple)




         frequency



                            Likelihood ratio
A reliable way to justify the complex model


                               io
                         ← Rat d for
                         observe ata
                                  d
                          A noles




Prob of false positive: 0.0115
Power at 95% confidence: 0.99
Greater overlap indicates less information,
less power to distinguish
And we can identify cases where the tree
really has no information at all
Preferring the simpler model isn't always a
          lack of power




                                                        Avoid over-
                                                        fitting


No trouble with
non-nested
models
Compared to what we already do?

                                                       a s no
                                                A IC h o f
                                                       n
                                                 n otio r
                                                       e
                                                  pow




                                                  oses del
                                               cho mo
                                          AIC wrong d
                                                      o
                                           the pite go
                                            des r!
                                                e
                                            pow




    About as good as flipping a coin...
Larger trees can identify more subtle
    differences between models
Conclusions

      We've been worried about the right model.
    First we must worry if we have enough data to
    tell.

     Without this, we will often choose wrong
    models.

     Future lies in big trees!

     PMC: an R package for Phylogenetic Monte Carlo
https://github.com/cboettig/Comparative-Phylogenetics/
Acknowledgements
●
    Graham Coop
●
    Peter Ralph
●
    Wainwright Lab
Is your phylogeny informative?
Is your phylogeny informative?

Weitere ähnliche Inhalte

Andere mochten auch

Macroevolution
MacroevolutionMacroevolution
Macroevolution
kwiley0019
 
Limits to Detection for Early Warning Signals of Population Collapse
Limits to Detection for Early Warning Signals of Population CollapseLimits to Detection for Early Warning Signals of Population Collapse
Limits to Detection for Early Warning Signals of Population Collapse
Carl Boettiger
 
Regime shifts in ecology and evolution
Regime shifts in ecology and evolutionRegime shifts in ecology and evolution
Regime shifts in ecology and evolution
Carl Boettiger
 
Chu de 4. dinh luat om voi toan mach.doc
Chu de 4. dinh luat om voi toan mach.docChu de 4. dinh luat om voi toan mach.doc
Chu de 4. dinh luat om voi toan mach.doc
thoa kim
 

Andere mochten auch (10)

11 qđhn-bqp
11 qđhn-bqp11 qđhn-bqp
11 qđhn-bqp
 
Macroevolution
MacroevolutionMacroevolution
Macroevolution
 
Medianet Software Presentation 2010
Medianet Software Presentation 2010Medianet Software Presentation 2010
Medianet Software Presentation 2010
 
A general model of continuous character evolution
A general model of continuous character evolutionA general model of continuous character evolution
A general model of continuous character evolution
 
Limits to Detection for Early Warning Signals of Population Collapse
Limits to Detection for Early Warning Signals of Population CollapseLimits to Detection for Early Warning Signals of Population Collapse
Limits to Detection for Early Warning Signals of Population Collapse
 
IIASA Progress Report 2
IIASA Progress Report 2IIASA Progress Report 2
IIASA Progress Report 2
 
Iiasa final
Iiasa finalIiasa final
Iiasa final
 
Regime shifts in ecology and evolution
Regime shifts in ecology and evolutionRegime shifts in ecology and evolution
Regime shifts in ecology and evolution
 
Open Science
Open ScienceOpen Science
Open Science
 
Chu de 4. dinh luat om voi toan mach.doc
Chu de 4. dinh luat om voi toan mach.docChu de 4. dinh luat om voi toan mach.doc
Chu de 4. dinh luat om voi toan mach.doc
 

Mehr von Carl Boettiger (6)

ESA 2012 talk
ESA 2012 talkESA 2012 talk
ESA 2012 talk
 
Detecting evolutionary regime shifts with comparative phylogenetics
Detecting evolutionary regime shifts with comparative phylogeneticsDetecting evolutionary regime shifts with comparative phylogenetics
Detecting evolutionary regime shifts with comparative phylogenetics
 
R interface to TreeBASE
R interface to TreeBASER interface to TreeBASE
R interface to TreeBASE
 
Scioslides
ScioslidesScioslides
Scioslides
 
Inferring Adaptive Landscapes from Phylogenetic Trees
Inferring Adaptive Landscapes from Phylogenetic TreesInferring Adaptive Landscapes from Phylogenetic Trees
Inferring Adaptive Landscapes from Phylogenetic Trees
 
Reading the Secrets of Biological Fluctuations
Reading the Secrets of Biological FluctuationsReading the Secrets of Biological Fluctuations
Reading the Secrets of Biological Fluctuations
 

Is your phylogeny informative?

  • 1. Is your phylogeny informative? Measuring the power of comparative methods Carl Boettiger University of California, Davis SICB 2011 Fell free to share live
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8. Two Questions Right Model? Sufficient Data? Currently we worry that we don’t have the right models. We should worry that we don’t have the power to know if we have the right models.
  • 9. Right Model? h enoug m ed ? cram e tree we to th ave gy in H lo bio (diagram shows tree under weight of so many models)
  • 10. Sufficient Data? Is the data-set big enough? ... informative about the things we want to estimate?
  • 11. A model of “phylogenetic signal” Are closely related species really more similar? al” on g sig n str al” “ n No sig 0 1 “ an wni (Bro ion) mot
  • 12. A simulation-based test of this method
  • 13. A simulation-based test of this method al” on g s i gn str al” “ n “No sig 0 1
  • 14. A simulation-based test of this method e ue valu Tr estimates al” on g sig n str al” “ n “No sig 0 1 w nian (Bro ion) mot
  • 15. What went wrong? Was supposed to address the problem of adequate data... Instead, it's just fallen prey to the problem Insufficient signal to estimate signal
  • 16. “Has the pendulum swung too far?”
  • 17. ree e e r t at Bi gg s ti m e an da c b lam ter be t ve r s s o ci e ha pe 0s ) ← 30 23 (vs al” on g sig n str al” “ n “No sig 0 1
  • 18. Not just size that matters.... e is quit f a data ate o l2 3 tax ut estim Origina tive abo e, sigma a t inform ication ra sif diver
  • 19. Is the phylogeny informative? Depends on the question Depends on the amount of data x ple ta to com da m ore the dd have we a we as , do ? So dels hem mo tify t jus uld wo ? ow now H k we
  • 20. Process vs Pattern Models represent mechanisms More complex models fit the data better So can we tell when we've matched the mechanism and when we've only matched the pattern? Yes!
  • 21. Phylogenetic Monte Carlo rom la te f el → i mu e mo d S pl sim
  • 22. Phylogenetic Monte Carlo rom la te f el → i mu e mo d S pl sim odel s b oth m ← Fits data to this
  • 23. Phylogenetic Monte Carlo rom la te f el → i mu e mo d S pl sim odel s b oth m ← Fits data to this LogLik(complex) LogLik(simple)
  • 24. Phylogenetic Monte Carlo rom la te f el → i mu e mo d S pl sim odel s b oth m ← Fits data to this LogLik(complex) LogLik(simple) frequency Likelihood ratio
  • 25. Phylogenetic Monte Carlo rom la te f el → i mu e mo d S pl sim odel s b oth m ← Fits data to this LogLik(complex) LogLik(simple) frequency Likelihood ratio
  • 26. Phylogenetic Monte Carlo rom la te f el → i mu e mo d S pl sim odel s b oth m ← Fits data to this LogLik(complex) LogLik(simple) frequency Likelihood ratio
  • 27. Phylogenetic Monte Carlo x ple comdel → rom mo la te f el → i mu e mo d S pl sim odel s b oth m ← Fits data to this LogLik(complex) LogLik(simple) frequency Likelihood ratio
  • 28. Phylogenetic Monte Carlo x ple comdel → rom mo la te f el → i mu e mo d S pl sim odel s b oth m ← Fits data to this LogLik(complex) LogLik(simple) frequency Likelihood ratio
  • 29. A reliable way to justify the complex model io ← Rat d for observe ata d A noles Prob of false positive: 0.0115 Power at 95% confidence: 0.99
  • 30. Greater overlap indicates less information, less power to distinguish
  • 31. And we can identify cases where the tree really has no information at all
  • 32. Preferring the simpler model isn't always a lack of power Avoid over- fitting No trouble with non-nested models
  • 33. Compared to what we already do? a s no A IC h o f n n otio r e pow oses del cho mo AIC wrong d o the pite go des r! e pow About as good as flipping a coin...
  • 34. Larger trees can identify more subtle differences between models
  • 35. Conclusions  We've been worried about the right model. First we must worry if we have enough data to tell.  Without this, we will often choose wrong models.  Future lies in big trees!  PMC: an R package for Phylogenetic Monte Carlo https://github.com/cboettig/Comparative-Phylogenetics/
  • 36. Acknowledgements ● Graham Coop ● Peter Ralph ● Wainwright Lab