Discussion of the article "Bayes Estimators for Phylogenetic Reconstruction", presented by Leo Martins to the Phylogenomics Lab of the University of Vigo
Syst. Biol. 60(4), 528 – 540, 2011 doi 10.1093/sysbio/syr021
1. Journal Club – Bayes Estimators for Phylogenetic Reconstruction
Syst. Biol. 60(4), 528 – 540, 2011 doi 10.1093/sysbio/syr021
Leonardo de O. Martins
University of Vigo
July 22, 2011
Leo Martins (Univ. Vigo) Journal Club 22/7 1 / 12
2. Outline
1 Distance as a penalty
2 Distances, everywhere
3 No phylogenetics, yet...
4 Trees as points in space
5 To the paper, then
6. Statistical Risk
The risk $\rho$ associated with a decision $\hat{\theta}$ is the expected loss of this decision $\hat{\theta}$ (which can be, for instance, an estimate of $\theta$).
$$\rho(\hat{\theta}) = \int L(\theta, \hat{\theta}) \, P(\theta \mid \text{data}) \, d\theta$$
(properly called the posterior expected loss)
The loss function $L(\theta, \hat{\theta})$ is a penalty we give for "deciding" away from the parameter. Examples are the squared loss and the absolute loss.
For some loss functions, we can calculate the best decision (i.e. the one that minimizes the risk, for any data).
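As a rough illustration of the definition, the integral can be approximated by averaging the loss over draws from the posterior. The sketch below is not from the article: it assumes a hypothetical posterior sample (Gamma draws standing in for P(θ | data)), and the names `risk`, `squared_loss` and `absolute_loss` are illustrative only. It then looks for the best decision numerically under the two example losses.

```r
set.seed(1)
# Hypothetical posterior sample: Gamma(2, 1) draws stand in for P(theta | data)
theta <- rgamma(10000, shape = 2, rate = 1)

# Monte Carlo approximation of the posterior expected loss of a decision theta_hat
risk <- function(theta_hat, loss) mean(loss(theta, theta_hat))

squared_loss  <- function(theta, theta_hat) (theta - theta_hat)^2
absolute_loss <- function(theta, theta_hat) abs(theta - theta_hat)

# Best decision under each loss, found by one-dimensional numerical minimization
best_sq  <- optimize(risk, interval = range(theta), loss = squared_loss)$minimum
best_abs <- optimize(risk, interval = range(theta), loss = absolute_loss)$minimum

print(c(best_sq = best_sq, sample_mean = mean(theta)))
print(c(best_abs = best_abs, sample_median = median(theta)))
```

In this sketch `best_sq` should land close to the sample mean and `best_abs` close to the sample median.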
8. How to summarise a collection of objects?
scattered points
library(MASS)
# Symmetric covariance matrix: unit variances, covariance 0.8
x <- mvrnorm(n = 1000, mu = c(0, 0), Sigma = matrix(c(1, 0.8, 0.8, 1), 2, 2, byrow = TRUE))
plot(x[, 1], x[, 2], pch = ".", cex = 2, xlab = "x", ylab = "y")
9. How to summarise a collection of objects?
centroid: minimizes a distance to all points
10. How to summarise a collection of objects?
regression line: minimizes a distance to all points
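The two summaries can be drawn on the same scatter plot. This is a minimal sketch rather than the code behind the original figures: it assumes a symmetric covariance matrix with covariance 0.8, squared Euclidean distance for the centroid and squared vertical distance for the regression line; the seed and the helper `sum_sq_dist` are hypothetical.

```r
library(MASS)   # provides mvrnorm()

set.seed(42)    # hypothetical seed, only for reproducibility
x <- mvrnorm(n = 1000, mu = c(0, 0),
             Sigma = matrix(c(1, 0.8, 0.8, 1), 2, 2))

# Centroid: the point minimizing the sum of squared Euclidean distances to all points
centroid <- colMeans(x)
sum_sq_dist <- function(p) sum(sweep(x, 2, p)^2)

# Regression line: the line minimizing the sum of squared vertical distances
fit <- lm(x[, 2] ~ x[, 1])

plot(x[, 1], x[, 2], pch = ".", cex = 2, xlab = "x", ylab = "y")
points(centroid[1], centroid[2], pch = 19, col = "red")
abline(fit, col = "blue")
```

Evaluating `sum_sq_dist` at any point other than `centroid` gives a larger value, which is the sense in which the centroid "minimizes a distance to all points".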
13. How to summarise the posterior distribution P(X)?
Posterior mean
Minimizes the expected loss under a squared loss function
$$L(\theta, \hat{\theta}) = (\theta - \hat{\theta})^2$$
(squared Euclidean distance)
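A short derivation of why the mean appears here, assuming the posterior has finite variance and that we may differentiate under the integral sign:

```latex
\begin{aligned}
\rho(\hat{\theta}) &= \int (\theta - \hat{\theta})^2 \, P(\theta \mid \text{data}) \, d\theta \\
\frac{d\rho}{d\hat{\theta}} &= -2 \int (\theta - \hat{\theta}) \, P(\theta \mid \text{data}) \, d\theta
  = -2 \left( E[\theta \mid \text{data}] - \hat{\theta} \right) = 0 \\
\Rightarrow \quad \hat{\theta} &= E[\theta \mid \text{data}]
\end{aligned}
```

The second derivative is 2 > 0, so this stationary point is a minimum.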
14. How to summarise the posterior distribution P(X)?
Posterior median
Minimizes the expected loss under a linear (absolute) loss function
$$L(\theta, \hat{\theta}) = |\theta - \hat{\theta}|$$
(Manhattan distance)
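The corresponding argument for the median, sketched under the assumption of a continuous posterior density (so that the boundary terms vanish when differentiating):

```latex
\begin{aligned}
\rho(\hat{\theta}) &= \int_{-\infty}^{\hat{\theta}} (\hat{\theta} - \theta) \, P(\theta \mid \text{data}) \, d\theta
  + \int_{\hat{\theta}}^{\infty} (\theta - \hat{\theta}) \, P(\theta \mid \text{data}) \, d\theta \\
\frac{d\rho}{d\hat{\theta}} &= P(\theta < \hat{\theta} \mid \text{data}) - P(\theta > \hat{\theta} \mid \text{data}) = 0 \\
\Rightarrow \quad P(\theta < \hat{\theta} \mid \text{data}) &= \tfrac{1}{2}
\end{aligned}
```

That is, the best decision splits the posterior mass in half.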
15. How to summarise the posterior distribution P(X)?
Posterior mode
a.k.a. Maximum A Posteriori (MAP) estimate.
Minimizes the expected loss under a delta loss function
$$L(\theta, \hat{\theta}) = \begin{cases} 0, & \text{for } \hat{\theta} = \theta \\ 1, & \text{for } \hat{\theta} \neq \theta \end{cases}$$
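For a discrete parameter such as a tree topology, the expected delta loss of a decision is one minus its posterior probability, so the most frequent value in the sample is the best decision. The sketch below uses a made-up posterior sample of three topologies written as Newick strings; the frequencies are hypothetical.

```r
# Hypothetical posterior sample of topologies (Newick strings, made-up frequencies)
sample_trees <- c(rep("((A,B),C,(D,E));", 55),
                  rep("((A,B),E,(C,D));", 30),
                  rep("((A,C),B,(D,E));", 15))

# Estimated posterior probability of each topology
posterior <- table(sample_trees) / length(sample_trees)

# Expected 0/1 loss of deciding for each topology: 1 - posterior probability
expected_loss <- 1 - posterior

# The MAP topology is the one with the smallest expected loss
map_tree <- names(which.max(posterior))
print(expected_loss)
print(map_tree)
```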
17. Distances between trees
[Figure: two unrooted trees on leaves A to E, with topologies ((A,B),C,(D,E)) and ((A,B),E,(C,D))]
Trees from the article
18. Distances between trees
[Figure: the two unrooted trees from the article on leaves A to E, ((A,B),C,(D,E)) and ((A,B),E,(C,D))]
RF (Robinson-Foulds) distance
Branches (splits) present in only one of the trees: DE|ABC and CD|ABE
Total: 2 branches
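The count can be reproduced by listing the split induced by each internal branch and taking the symmetric difference. This is a base-R sketch, not the article's code: the edge-list representation, the internal node labels `u`, `v`, `w` and the helpers `node_dist` and `splits` are assumptions made for illustration.

```r
leaves <- c("A", "B", "C", "D", "E")
# Unrooted trees as edge lists; u, v, w are hypothetical labels for internal nodes
tree1 <- matrix(c("A","u", "B","u", "u","v", "C","v", "v","w", "D","w", "E","w"),
                ncol = 2, byrow = TRUE)   # ((A,B),C,(D,E))
tree2 <- matrix(c("A","u", "B","u", "u","v", "E","v", "v","w", "C","w", "D","w"),
                ncol = 2, byrow = TRUE)   # ((A,B),E,(C,D))

# Number of edges between every pair of nodes (Floyd-Warshall)
node_dist <- function(edges) {
  nodes <- unique(as.vector(edges))
  d <- matrix(Inf, length(nodes), length(nodes), dimnames = list(nodes, nodes))
  diag(d) <- 0
  for (e in seq_len(nrow(edges))) {
    d[edges[e, 1], edges[e, 2]] <- 1
    d[edges[e, 2], edges[e, 1]] <- 1
  }
  for (k in nodes) d <- pmin(d, outer(d[, k], d[k, ], "+"))
  d
}

# Split induced by each internal edge, written as e.g. "AB|CDE"
splits <- function(edges, leaves) {
  d <- node_dist(edges)
  internal <- edges[!(edges[, 1] %in% leaves) & !(edges[, 2] %in% leaves), , drop = FALSE]
  apply(internal, 1, function(e) {
    side  <- leaves[d[leaves, e[1]] < d[leaves, e[2]]]   # leaves nearer to one end
    other <- setdiff(leaves, side)
    paste(sort(c(paste(side, collapse = ""), paste(other, collapse = ""))), collapse = "|")
  })
}

s1 <- splits(tree1, leaves)
s2 <- splits(tree2, leaves)
only1 <- setdiff(s1, s2)            # splits found only in the first tree
only2 <- setdiff(s2, s1)            # splits found only in the second tree
rf <- length(only1) + length(only2)
print(list(only1 = only1, only2 = only2, rf = rf))
```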
19. Distances between trees
[Figure: the two unrooted trees from the article on leaves A to E, ((A,B),C,(D,E)) and ((A,B),E,(C,D))]
Quartet distance
AC|DE in one tree, AE|CD in the other
BC|DE in one tree, BE|CD in the other
4 quartet resolutions are different
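The same count can be obtained by resolving every set of four leaves in each tree. The sketch below is an illustration under assumptions: trees are given as edge lists with hypothetical internal node labels, and each quartet is resolved with the four-point condition on path lengths (the true pairing has the smallest sum of within-pair path lengths). The helper names are not from the article.

```r
leaves <- c("A", "B", "C", "D", "E")
# Unrooted trees as edge lists; u, v, w are hypothetical labels for internal nodes
tree1 <- matrix(c("A","u", "B","u", "u","v", "C","v", "v","w", "D","w", "E","w"),
                ncol = 2, byrow = TRUE)   # ((A,B),C,(D,E))
tree2 <- matrix(c("A","u", "B","u", "u","v", "E","v", "v","w", "C","w", "D","w"),
                ncol = 2, byrow = TRUE)   # ((A,B),E,(C,D))

# Number of edges between every pair of nodes (Floyd-Warshall)
node_dist <- function(edges) {
  nodes <- unique(as.vector(edges))
  d <- matrix(Inf, length(nodes), length(nodes), dimnames = list(nodes, nodes))
  diag(d) <- 0
  for (e in seq_len(nrow(edges))) {
    d[edges[e, 1], edges[e, 2]] <- 1
    d[edges[e, 2], edges[e, 1]] <- 1
  }
  for (k in nodes) d <- pmin(d, outer(d[, k], d[k, ], "+"))
  d
}

# Resolve one quartet q = (a, b, c, d) via the four-point condition
quartet_split <- function(d, q) {
  sums <- c(d[q[1], q[2]] + d[q[3], q[4]],
            d[q[1], q[3]] + d[q[2], q[4]],
            d[q[1], q[4]] + d[q[2], q[3]])
  labels <- c(paste0(q[1], q[2], "|", q[3], q[4]),
              paste0(q[1], q[3], "|", q[2], q[4]),
              paste0(q[1], q[4], "|", q[2], q[3]))
  labels[which.min(sums)]
}

# Resolutions of all quartets of leaves in a tree
quartets <- function(edges, leaves) {
  d <- node_dist(edges)
  apply(combn(leaves, 4), 2, function(q) quartet_split(d, q))
}

q1 <- quartets(tree1, leaves)
q2 <- quartets(tree2, leaves)
only1 <- setdiff(q1, q2)   # resolutions found only in the first tree
only2 <- setdiff(q2, q1)   # resolutions found only in the second tree
print(list(only1 = only1, only2 = only2, total = length(only1) + length(only2)))
```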
21. Distances between trees
[Figure: the two unrooted trees from the article on leaves A to E, ((A,B),C,(D,E)) and ((A,B),E,(C,D))]
Path difference (based on the number of speciations, i.e. internal nodes, on the path between each pair of leaves)
e.g. the path from A to E is one edge longer in one tree than in the other
(...)
the overall difference is 6
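The total of 6 can be checked by building the matrix of leaf-to-leaf path lengths for each tree and subtracting one from the other. This is a sketch under assumptions: paths are measured in edges, trees are edge lists with hypothetical internal node labels, and differences are summed over the ten leaf pairs. Whether the article applies a square root or another normalisation is not stated in the slides.

```r
leaves <- c("A", "B", "C", "D", "E")
# Unrooted trees as edge lists; u, v, w are hypothetical labels for internal nodes
tree1 <- matrix(c("A","u", "B","u", "u","v", "C","v", "v","w", "D","w", "E","w"),
                ncol = 2, byrow = TRUE)   # ((A,B),C,(D,E))
tree2 <- matrix(c("A","u", "B","u", "u","v", "E","v", "v","w", "C","w", "D","w"),
                ncol = 2, byrow = TRUE)   # ((A,B),E,(C,D))

# Number of edges between every pair of nodes (Floyd-Warshall)
node_dist <- function(edges) {
  nodes <- unique(as.vector(edges))
  d <- matrix(Inf, length(nodes), length(nodes), dimnames = list(nodes, nodes))
  diag(d) <- 0
  for (e in seq_len(nrow(edges))) {
    d[edges[e, 1], edges[e, 2]] <- 1
    d[edges[e, 2], edges[e, 1]] <- 1
  }
  for (k in nodes) d <- pmin(d, outer(d[, k], d[k, ], "+"))
  d
}

# Leaf-to-leaf path-length matrices of the two trees
p1 <- node_dist(tree1)[leaves, leaves]
p2 <- node_dist(tree2)[leaves, leaves]

# Matrix subtraction, keeping each unordered pair of leaves once
pair <- upper.tri(p1)
abs_diff <- abs(p1 - p2)[pair]

total_abs <- sum(abs_diff)                 # sum of absolute differences
total_sq  <- sum(((p1 - p2)[pair])^2)      # sum of squared differences
print(abs(p1 - p2))
print(c(total_abs = total_abs, total_sq = total_sq))
```

Because every nonzero difference here equals one edge, the absolute and squared totals coincide for this pair of trees.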
28. If there is a distance, there is a Bayes estimator
For points in $\mathbb{R}^n$, we know that the mean minimizes the (squared) Euclidean distance, etc.
For phylogenies:
- there are several Euclidean distances
- the mean does not work since a tree has restrictions
But some distances between trees also lead to "analytical" solutions:
- the consensus tree minimizes the Robinson-Foulds distance to the samples
- quartet puzzling minimizes the quartet distance
- the Buneman tree minimizes (I think) the dissimilarity map distance
- some of these are hard to solve as well
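The consensus claim can be checked on a toy sample. The sketch assumes that "consensus" means the majority-rule consensus and that each sampled tree is summarised by its set of non-trivial splits; the sample itself is hypothetical, and `rf` and `total_rf` are illustrative helper names.

```r
# Hypothetical posterior sample: each tree is given by its set of non-trivial splits
samples <- c(rep(list(c("AB|CDE", "ABC|DE")), 5),
             rep(list(c("AB|CDE", "ABE|CD")), 3),
             rep(list(c("AC|BDE", "ABC|DE")), 2))

# Robinson-Foulds distance between two split sets (size of the symmetric difference)
rf <- function(s1, s2) length(setdiff(s1, s2)) + length(setdiff(s2, s1))

# Total RF distance from a candidate split set to every sample
total_rf <- function(s) sum(sapply(samples, rf, s2 = s))

# Majority-rule consensus: keep the splits present in more than half of the samples
freq <- table(unlist(samples)) / length(samples)
consensus <- names(freq)[freq > 0.5]

# Compare with each distinct sampled tree and with the unresolved star tree
candidates <- c(unique(samples), list(character(0)))
print(consensus)
print(total_rf(consensus))
print(sapply(candidates, total_rf))
```

In this toy sample no candidate has a smaller total RF distance than the consensus.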
34. How do they find, then, the Bayes estimates?
- like many other programs: hill-climbing in the space of possible topologies
- their input data is the posterior distribution of trees from MrBayes
- the starting tree can be the NJ tree, the MAP tree, the ML tree...
- apply branch swapping (NNI) to the current best tree, then check its distance to all samples
- the distance used is the path difference (matrix subtraction)
- there is no need to recalculate the distance to every sample, only to the matrix of average values
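The shortcut in the last point can be checked numerically. The sketch assumes the loss is the squared path difference: in that case the mean squared distance from a candidate to all samples equals its squared distance to the average vector plus a constant that does not depend on the candidate. The vectors below are random stand-ins for path-length vectors, not real trees, and all names are hypothetical.

```r
set.seed(7)
n_pairs <- 10   # 5 leaves give 10 leaf pairs

# Hypothetical stand-ins for the path-length vectors of 200 sampled trees
# and of 4 candidate trees (one row per tree, one column per leaf pair)
samples    <- matrix(sample(2:4, 200 * n_pairs, replace = TRUE), ncol = n_pairs)
candidates <- matrix(sample(2:4, 4 * n_pairs, replace = TRUE), ncol = n_pairs)

# Vector of average path lengths over the samples
avg <- colMeans(samples)

# Slow way: mean squared path difference from each candidate to every sample
mean_sq_to_samples <- apply(candidates, 1, function(cand)
  mean(rowSums(sweep(samples, 2, cand)^2)))

# Fast way: squared difference from each candidate to the average vector only
sq_to_avg <- rowSums(sweep(candidates, 2, avg)^2)

# Constant term: spread of the samples around their average
spread <- mean(rowSums(sweep(samples, 2, avg)^2))

print(rbind(mean_sq_to_samples, fast = sq_to_avg + spread))
print(c(best_slow = which.min(mean_sq_to_samples), best_fast = which.min(sq_to_avg)))
```

Both ways rank the candidates identically, so a hill-climbing step only needs the average vector. The identity relies on squaring; it would not hold for an unsquared distance.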