Presentation of PHYLDOG, a piece of software for reconstructing gene and species phylogenies, with a focus on the practical side of things and pointers to a tutorial.
2. Getting the program
• From the internet:
– http://pbil.univ-lyon1.fr/software/phyldog/#try
• From the USB keys in the room, which contain:
– VirtualBox
– the application along with the data to analyze
3. Collaborators
• LBBE collaborators (Lyon):
– Gergely Szöllősi (Budapest),
– Eric Tannier,
– Vincent Daubin,
– Manolo Gouy,
– Sophie Abby,
– Laurent Duret,
– Thomas Bigot,
– Magali Semeria
4. [Figure, built up over slides 4 to 13: trees for species A, B, C and D drawn against a TIME axis]
• Discrete character: a a b a
• Continuous character: 0.1 0.2 0.2 0.4
• Events labeled on the trees: D (duplication), DL (duplication and loss), LGT (lateral gene transfer), ILS (incomplete lineage sorting)
• PHYLDOG: D, DL
• DL + LGT: Szöllősi et al., PNAS
• ILS: not yet
14. What is PHYLDOG?
• Program for the coestimation of species and gene trees at the
genome scale
• Probabilistic model of sequence evolution + model of gene
duplication and loss
• Statistical framework
• Branch-wise parameters of duplications and losses
• Gene families evolve independently of each other
• Based on a parallel architecture using MPI
Genome-scale coestimation of species and gene trees. Boussau et al., Genome Research, 2013, 23:323-330.
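Because gene families are assumed to evolve independently of each other, the overall log-likelihood can be read as a sum of per-family terms, each combining a sequence-evolution term and a duplication/loss term. The sketch below illustrates only that decomposition. The class and function names are hypothetical, and any numbers passed in are the caller's own: this is not PHYLDOG code or PHYLDOG output.

```python
from dataclasses import dataclass


@dataclass
class FamilyScore:
    """Log-likelihood terms for one gene family (hypothetical container)."""
    name: str
    # log P(alignment | gene tree, substitution model)
    log_lik_sequences: float
    # log P(gene tree | species tree, duplication/loss parameters)
    log_lik_reconciliation: float


def joint_log_likelihood(families):
    """Sum the per-family terms; valid only if families are independent."""
    return sum(
        f.log_lik_sequences + f.log_lik_reconciliation for f in families
    )
```

Since each term depends on a single family, the terms can be evaluated on separate processes and then added up, which is the kind of workload an MPI architecture handles naturally.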
17. Option files
family_X.option: options specific to gene family X (alignment
file, substitution model options, gene tree search options)
GeneralOptions.txt: options concerning the species tree search,
and options common to all gene families (species tree search
options, duplication/loss model options, list of gene families)
Easy generation of basic option files using the
prepareData.py script
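To make the two-level layout concrete (one option file per gene family, plus one general file that lists the families), here is a minimal sketch that writes such files. It is not a reimplementation of prepareData.py, and it assumes a simple key=value syntax. Every option key in it (`ALIGNMENT_FILE`, `SUBSTITUTION_MODEL`, `GENE_FAMILY_OPTION_FILES`) is a placeholder invented for illustration, not one of PHYLDOG's real option names. For real runs, use prepareData.py and the tutorial.

```python
from pathlib import Path


def write_options(path, options):
    """Write a dict as one key=value line per option."""
    text = "".join(f"{key}={value}\n" for key, value in options.items())
    Path(path).write_text(text)


def write_option_files(out_dir, alignments, model="GTR"):
    """Create one <family>.option file per family plus GeneralOptions.txt.

    `alignments` maps a family name to its alignment file.
    All option keys below are PLACEHOLDERS, not PHYLDOG's real option names.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    family_files = []
    for family, alignment in sorted(alignments.items()):
        family_file = out_dir / f"{family}.option"
        write_options(family_file, {
            "ALIGNMENT_FILE": alignment,    # placeholder key
            "SUBSTITUTION_MODEL": model,    # placeholder key
        })
        family_files.append(family_file.name)
    general_file = out_dir / "GeneralOptions.txt"
    write_options(general_file, {
        # placeholder key: the general file lists the gene families
        "GENE_FAMILY_OPTION_FILES": ",".join(family_files),
    })
    return general_file
```

Calling `write_option_files("options", {"family_1": "family_1.fasta"})` would create `options/family_1.option` and `options/GeneralOptions.txt`.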
19. PHYLDOG tutorial
http://www.prabi.fr/redmine/projects/phyldogtoolt/wiki/Tutorial
• Installing PHYLDOG
• Downloading files
• Basic input files
• Generating all the option files using the prepareData.py script
• Running PHYLDOG
• Diminishing the number of species considered
• Diminishing the number of gene families considered
• Running PHYLDOG, at last
• Interpreting PHYLDOG's output
• Going further
23. Why our current pipeline can be improved
• Gene alignments:
– Error prone
– Short
– Point estimates
• Gene trees:
– Based on alignments
– Point estimates
• Species trees:
– Based on gene trees
35. Study of mammalian genomes
• Challenging but well-studied phylogeny
• 36 mammalian genomes available in Ensembl v. 57
• About 7000 gene families
• Correction for incomplete genomes
42. Assessing the quality of gene trees
• Two approaches:
1. Looking at ancestral genome sizes
2. Assessing how well one can recover ancestral syntenies using
reconstructed gene trees (Bérard et al., Bioinformatics)
• Comparison between:
– PhyML (PhylomeDB and Homolens databases)
– TreeBeST (Ensembl-Compara database)
– PHYLDOG
44. 1) Junk trees generate obesity
• Errors in gene tree reconstruction result in larger
ancestral genomes
– Better algorithms should yield smaller ancestral genomes
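A toy example shows why an erroneous gene tree inflates ancestral genomes. The sketch below uses the classic LCA (last common ancestor) reconciliation rule, which is a simple parsimony method and not PHYLDOG's probabilistic model. Trees are nested tuples whose leaves are species names. The function names and the four-species example are assumptions made for illustration.

```python
def clades(tree):
    """Return the leaf set of every node; a node's own set comes last."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    found, leaves = [], frozenset()
    for child in tree:
        sub = clades(child)
        found.extend(sub)
        leaves |= sub[-1]  # last entry is the child's own leaf set
    found.append(leaves)
    return found


def lca(species_clades, species_set):
    """Smallest species-tree clade that contains all the given species."""
    return min((c for c in species_clades if species_set <= c), key=len)


def count_duplications(gene_tree, species_clades):
    """Return (species node the gene node maps to, duplications below it)."""
    if isinstance(gene_tree, str):
        return frozenset([gene_tree]), 0
    results = [count_duplications(c, species_clades) for c in gene_tree]
    species_set = frozenset().union(*(mapped for mapped, _ in results))
    here = lca(species_clades, species_set)
    # LCA rule: a node is a duplication if a child maps to the same node.
    is_duplication = any(mapped == here for mapped, _ in results)
    return here, sum(n for _, n in results) + int(is_duplication)


species_tree = (("A", "B"), ("C", "D"))
species_clades = clades(species_tree)

correct_gene_tree = (("A", "B"), ("C", "D"))    # matches the species tree
erroneous_gene_tree = (("A", "C"), ("B", "D"))  # same genes, wrong topology

print(count_duplications(correct_gene_tree, species_clades)[1])
print(count_duplications(erroneous_gene_tree, species_clades)[1])
```

With the erroneous topology, the only way to reconcile the gene tree with the species tree is to place a duplication at the root, so the ancestral genome is inferred to hold two copies of a gene that actually had one.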
47. 2) Junk trees break synteny groups
• We use DeCo (Bérard et al., Bioinformatics, 2013) to
reconstruct ancestral synteny groups using gene trees
• Errors in gene tree reconstruction break synteny
groups
– Better algorithms should yield more genes in ancestral
synteny groups
50. Perspectives
• Improvement of the algorithms to reconstruct gene
trees (e.g. Magali Semeria)
• Improvement of the algorithms to reconstruct the
species tree
• Dealing with ILS
• Joint reconstruction of gene trees and gene
alignments
52. The Ancestrome project
• Reconstructing a species tree and gene trees for a large
number of species
• Reconstructing ancestral gene contents
• Inferring ancient metabolisms and lifestyles
• Inferring ancient communities
New insights into the evolution of life on Earth,
and into genomic evolution