This document discusses generalizing phylogenetics to infer shared evolutionary events. It begins by noting that current phylogenetics assumes independent divergences, which is often violated in reality through shared processes like biogeography, gene family evolution, and disease transmission. The author proposes a framework to account for shared divergences, which would improve phylogenetic inference and provide insights into codiversification processes. Key challenges include approximating high-dimensional integrals and sampling over all possible divergence models.
Generalizing phylogenetics to infer shared evolutionary events
1. Generalizing phylogenetics to
infer shared evolutionary events
Jamie R. Oaks (1, 2)
(1) Department of Biological Sciences, Auburn University
(2) Department of Biology, University of Washington
March 31, 2016
© 2007 Boris Kulikov, boris-kulikov.blogspot.com
Shared divergences Jamie Oaks – phyletica.org 1/35
2. Outline
An assumption (i.e., exciting opportunity) in phylogenetics
An approach to the problem
Empirical applications
Current and future directions
7. Current state of phylogenetics
Shared ancestry is a fundamental property
of life
Phylogenetics is rapidly progressing as the
statistical foundation of comparative biology
“Big data” present exciting possibilities and
computational challenges
Exciting opportunities to develop new ways
to study biology in the light of phylogeny
10. Current state of phylogenetics
Assumption: All processes of
diversification affect each lineage
independently and only cause bifurcating
divergences.
We know this assumption is frequently
violated
18. Violations are pervasive and interesting
Biogeography
Environmental changes that affect whole
communities of species
Gene family evolution
Chromosomal duplications
Epidemiology
Disease spread via co-infected individuals
Transmission at social gatherings
Endosymbiont evolution (e.g., parasites,
microbiome)
Speciation of the host
Co-colonization of new host species
21. Problem: Current methods only consider the general model
Consequence: Unnecessary parameters introduce error
Solution: Accommodate shared divergence models
Advantage: More data to estimate shared parameters
True history: three divergence times (τ1, τ2, τ3)
Current tree model: eight divergence times (τ1 to τ8)
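The claim that shared parameters are estimated with less error can be illustrated with a toy simulation. This is only a sketch under loud assumptions that are not from the talk: each pair is reduced to a single noisy, unbiased estimate of the same true divergence time, and Gaussian noise stands in for real sequence data. The function name and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_errors(n_pairs=3, true_tau=1.0, noise_sd=0.5, n_reps=5000, seed=1):
    """Compare mean squared error of per-pair estimates and one pooled estimate.

    Toy assumption: every pair diverged at the same time (true_tau), and each
    pair yields an unbiased Gaussian estimate of that time.
    """
    rng = random.Random(seed)
    separate_sq_err = []
    pooled_sq_err = []
    for _ in range(n_reps):
        estimates = [rng.gauss(true_tau, noise_sd) for _ in range(n_pairs)]
        # General model: each pair keeps its own divergence-time parameter.
        separate_sq_err.extend((e - true_tau) ** 2 for e in estimates)
        # Shared model: one parameter estimated from all pairs together.
        pooled = statistics.fmean(estimates)
        pooled_sq_err.append((pooled - true_tau) ** 2)
    return statistics.fmean(separate_sq_err), statistics.fmean(pooled_sq_err)


separate_mse, pooled_mse = simulate_errors()
print(f"separate MSE: {separate_mse:.3f}, pooled MSE: {pooled_mse:.3f}")
```

In this toy setting the pooled error shrinks roughly in proportion to the number of pairs sharing the parameter, which is the intuition behind "more data to estimate shared parameters".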
28. Why account for shared divergences?
1. Improve inference
2. Provide a framework for studying processes of co-diversification
43. Inferring co-diversification
[Figure: five candidate divergence models, m1 to m5, each shown with its divergence times (τ1, τ2, ...) and its posterior probability p(mi | X)]
We want to infer m and T given DNA sequence alignments X
$p(m_i \mid X) \propto p(X \mid m_i)\, p(m_i)$
$p(X \mid m_i) = \int_{\theta} p(X \mid \theta, m_i)\, p(\theta \mid m_i)\, d\theta$
The parameters θ include:
Divergence times
Gene trees
Substitution parameters
Demographic parameters
J. R. Oaks et al. (2013). Evolution 67: 991–1010; J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
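The two relations on this slide, Bayes' rule over models and the marginal-likelihood integral, can be sketched numerically. The example below is a minimal illustration, not the method from the cited papers: the log marginal likelihoods are made-up numbers, and the integral is approximated by simple Monte Carlo for a hypothetical one-parameter toy model (θ ~ Normal(0, 1), X | θ ~ Normal(θ, 1)) where the exact answer is known. Real analyses integrate over far more parameters, which is what makes the problem hard.

```python
import math
import random


def posterior_model_probs(log_marginal_liks, priors):
    """Normalize p(m_i | X) ∝ p(X | m_i) p(m_i), using log-sum-exp for stability."""
    log_unnorm = [ll + math.log(p) for ll, p in zip(log_marginal_liks, priors)]
    peak = max(log_unnorm)
    log_total = peak + math.log(sum(math.exp(v - peak) for v in log_unnorm))
    return [math.exp(v - log_total) for v in log_unnorm]


def normal_pdf(x, mean, sd):
    """Density of Normal(mean, sd) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mc_marginal_likelihood(x, n_draws=200_000, seed=7):
    """Monte Carlo estimate of p(X | m) = ∫ p(X | θ, m) p(θ | m) dθ.

    Toy model (an assumption): θ ~ Normal(0, 1) and X | θ ~ Normal(θ, 1).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(0.0, 1.0)          # draw θ from its prior
        total += normal_pdf(x, theta, 1.0)   # likelihood of X at that θ
    return total / n_draws


# Hypothetical log marginal likelihoods for five models, with a uniform prior.
log_mls = [-1052.3, -1050.1, -1051.0, -1050.8, -1049.9]
probs = posterior_model_probs(log_mls, [0.2] * 5)
print([round(p, 3) for p in probs])

estimate = mc_marginal_likelihood(0.5)
exact = normal_pdf(0.5, 0.0, math.sqrt(2.0))  # analytic marginal for the toy model
print(f"Monte Carlo: {estimate:.4f}, exact: {exact:.4f}")
```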
76. New method: dpp-msbayes
Approximate-likelihood Bayesian approach to inferring models of shared divergences
Flexible Dirichlet-process prior (DPP) over all possible divergence models
Flexible priors on parameters to avoid strongly weighted posteriors
Multi-processing to accommodate genomic datasets
J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
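A Dirichlet-process prior over divergence models can be pictured as a "Chinese restaurant" scheme: each pair either joins an existing divergence event, with probability proportional to how many pairs already share it, or starts a new event, with probability proportional to a concentration parameter. The sketch below is a generic illustration of that scheme, not the dpp-msbayes code; the function name, the fixed `alpha`, and the choice of three pairs are assumptions.

```python
import random
from collections import Counter


def sample_crp_partition(n_pairs, alpha, rng):
    """Draw one divergence model from a Chinese-restaurant-process prior.

    Returns a list giving, for each pair, the index of its divergence event.
    """
    assignments = []
    counts = []  # number of pairs already assigned to each event
    for _ in range(n_pairs):
        # Existing events are weighted by size; a new event is weighted by alpha.
        weights = counts + [alpha]
        event = rng.choices(range(len(weights)), weights=weights)[0]
        if event == len(counts):
            counts.append(1)   # open a new divergence event
        else:
            counts[event] += 1
        assignments.append(event)
    return assignments


rng = random.Random(42)
n_draws = 20_000
event_counts = Counter(
    len(set(sample_crp_partition(3, alpha=1.0, rng=rng))) for _ in range(n_draws)
)
for k in sorted(event_counts):
    print(f"{k} divergence event(s): prior frequency {event_counts[k] / n_draws:.3f}")
```

With three pairs and `alpha = 1.0`, the prior probabilities of one, two, and three events are 1/3, 1/2, and 1/6, so the simulated frequencies should land near those values. Changing `alpha` shifts prior weight toward fewer or more events.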
78. dpp-msbayes: Simulation-based assessment
Validation:
Simulate 50,000 datasets and analyze each under the same model
Robustness:
Simulate datasets that violate model assumptions and analyze each of them
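The validation idea can be sketched as a calibration check: simulate many datasets from the prior, compute the posterior probability of a model for each, bin the datasets by that posterior, and confirm that within each bin the model is true about as often as the posterior says. The example below is a deliberately simple stand-in, not the dpp-msbayes analysis: it assumes two hypothetical models (X ~ Normal(0, 1) versus X ~ Normal(1, 1)) with equal priors, for which the posterior is exact.

```python
import math
import random


def normal_pdf(x, mean):
    """Density of Normal(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def calibration_table(n_datasets=50_000, n_bins=10, seed=3):
    """Return (mean posterior, true frequency, count) for each posterior bin."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    posterior_sums = [0.0] * n_bins
    true_counts = [0] * n_bins
    for _ in range(n_datasets):
        a_is_true = rng.random() < 0.5                   # draw the model from its prior
        x = rng.gauss(0.0 if a_is_true else 1.0, 1.0)    # simulate one dataset
        like_a = normal_pdf(x, 0.0)
        like_b = normal_pdf(x, 1.0)
        post_a = like_a / (like_a + like_b)              # exact posterior, equal priors
        b = min(int(post_a * n_bins), n_bins - 1)
        counts[b] += 1
        posterior_sums[b] += post_a
        true_counts[b] += a_is_true
    return [
        (posterior_sums[b] / counts[b], true_counts[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


table = calibration_table()
for mean_posterior, true_frequency, n in table:
    print(f"posterior {mean_posterior:.2f}  true frequency {true_frequency:.2f}  (n={n})")
```

When the analysis model matches the simulation model, the two columns should agree up to sampling noise. Repeating the exercise with data simulated under a different model than the one used for analysis gives the robustness check.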
80. dpp-msbayes: Validation results
[Figure: two panels, each plotting the true probability of one divergence (y-axis, 0.0 to 1.0) against the estimated posterior probability of one divergence (x-axis, 0.0 to 1.0)]
J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
82. dpp-msbayes: Performance
New method for estimating shared evolutionary history shows:
1. Model-choice accuracy
2. Robustness to model violations
3. Power to detect variation in divergence times
4. It’s fast!
A new tool for biologists to leverage comparative genomic data to explore
processes of co-diversification
J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
91. Results
[Figure: posterior probability (y-axis, 0.00 to 0.10) of the number of divergence events (x-axis, 1 to 21)]
[Figure: sea level in meters (y-axis, 0 to -100) against time in kya (x-axis, 0 to 500)]
J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
93. More data!
Collecting genomic data from taxa co-distributed across Southeast Asian islands and mainland
Preliminary results for 1000 loci from 5 pairs of Gekko mindorensis populations
[Figure: 2ln(Bayes factor) (y-axis, -5.0 to 3.0) for each number of divergence events, |τ| (x-axis, 1 to 5)]
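The 2ln(Bayes factor) scale in the figure compares how much the data shift support toward or away from a given number of divergence events. One common way to compute it, assumed here because the slide does not state the comparison used, is the ratio of posterior odds to prior odds for "exactly k events" versus "any other number". The prior and posterior values below are hypothetical and are not the talk's results.

```python
import math


def two_ln_bayes_factor(prior_prob, posterior_prob):
    """Return 2 ln(BF) for a hypothesis, from its prior and posterior probability.

    BF = posterior odds / prior odds, comparing the hypothesis to its complement.
    """
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    return 2.0 * math.log(posterior_odds / prior_odds)


# Hypothetical prior and posterior probabilities for 1 to 5 divergence events.
prior = {1: 0.2, 2: 0.2, 3: 0.2, 4: 0.2, 5: 0.2}
posterior = {1: 0.05, 2: 0.15, 3: 0.40, 4: 0.30, 5: 0.10}

support = {k: two_ln_bayes_factor(prior[k], posterior[k]) for k in prior}
for k, value in support.items():
    print(f"{k} event(s): 2ln(BF) = {value:+.2f}")
```

Values above zero mean the data increased support for that number of events relative to the prior; values below zero mean support decreased.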
95. Diversification across African rainforests
Did climate cycles drive diversification and community assembly across rainforest taxa?
Preliminary results with 300 loci from 3 taxa
[Figure: 2ln(Bayes factor) (y-axis, -1.5 to 1.5) for each number of divergence events, |τ| (x-axis, 1 to 3)]
98. Conclusions
New method for estimating shared evolutionary history
Shows good “frequentist” behavior
Relatively robust to model violations
Finding support for temporally clustered divergences in multiple systems
However, there is a lot of uncertainty!
102. Current work: More power
Ecoevolity: Estimating evolutionary coevality
Full-likelihood Bayesian implementation
Uses all the information in the data
Applicable to deeper timescales
Analytically integrate over gene trees [1]
Very efficient numerical approximation of posterior
Applicable to NGS datasets
[1] D. Bryant et al. (2012). Molecular Biology and Evolution 29: 1917–1932
107. Next step: A general framework
Develop a framework for inferring shared
divergences across phylogenies
Generalize Bayesian phylogenetics to
incorporate shared divergences
Sample models numerically via
reversible-jump Markov chain Monte Carlo
Benefits:
Improve phylogenetic inference
Framework for studying processes of
co-diversification
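For reference, the general acceptance probability of a reversible-jump move between models, in the standard form due to Green (1995), can be written with the symbols used earlier in the talk (models m, parameters θ, data X). This is the textbook expression, not a description of the specific moves planned for this framework. The auxiliary variables u and u', their densities g and g', the model-proposal probabilities j, and the bijection (θ', u') = h(θ, u) are notation introduced here for illustration.

```latex
\alpha\bigl((m,\theta) \to (m',\theta')\bigr)
  = \min\left\{ 1,\;
      \frac{p(X \mid \theta', m')\, p(\theta' \mid m')\, p(m')}
           {p(X \mid \theta, m)\, p(\theta \mid m)\, p(m)}
      \times
      \frac{j(m \mid m')\, g'(u')}
           {j(m' \mid m)\, g(u)}
      \times
      \left| \frac{\partial(\theta', u')}{\partial(\theta, u)} \right|
    \right\}
```

The first ratio compares posterior densities of the proposed and current states, the second compares the probabilities of proposing the reverse and forward moves, and the Jacobian accounts for the change of variables when the two models have different numbers of parameters, for example when two divergence times are merged into one shared time or a shared time is split.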
[Figure: phylogenies annotated with shared divergence times τ1 and τ2]