1. Phylogenomics and the
Origin of Novelty in Microbes
Jonathan A. Eisen
UC Davis
MBL Microbial Diversity Course
July 9, 2011
3. My Obsessions
5. Social Networking in Science
Scientist Reveals Secret of the Ocean: It's Him
By NICHOLAS WADE
Published: April 1, 2007
Maverick scientist J. Craig Venter has done it again. It was just a few years ago that Dr. Venter announced that the human genome sequenced by Celera Genomics was in fact, mostly his own. And now, Venter has revealed a second twist in his genomic self-examination. Venter was discussing his Global Ocean Voyage, in which he used his personal yacht to collect ocean water samples from around the world. He then used large filtration units to collect microbes from the water samples which were then brought back to his high tech lab in Rockville, MD where he used the same methods that were used to sequence the human genome to study the genomes of the 1000s of ocean dwelling microbes found in each sample. In discussing the sampling methods, Venter let slip his latest attack on the standards of science – some of the samples were in fact not from the ocean, but were from microbial habitats in and on his body.
“The human microbiome is the next frontier,” Dr. Venter said. “The ocean voyage was just a cover. My main goal has always been to work on the microbes that live in and on people. And now that my genome is nearly complete, why not use myself as the model for human microbiome studies as well.”
It is certainly true that in the last few years, the microbes that live in and on people have become a hot research topic. So hot that the same people who were involved in the race to sequence the human
7. T. H. Dobzhansky (1973)
“Nothing in biology makes sense
except in the light of evolution.”
8. Evolutionary Perspective and
Comparative Biology
• Comparative biology is the analysis of differences
and similarities between species.
• An evolutionary perspective is useful in such studies
because it allows one to focus on how and why
similarities and differences came to be.
• In other words, biological objects have a history and
understanding that history is important
9. Phylogenomic Analysis
• Evolutionary reconstructions greatly
improve genome analyses
• Genome analysis greatly improves
evolutionary reconstructions
• There is a feedback loop such that these
should be integrated
10. Phylogenomics of Novelty
• Variation in Mechanisms: Patterns, Causes and Effects
• Mechanisms of Origin of New Functions
• Species Evolution
11. rRNA Tree of Life
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press. 2007.
Based on tree from Pace 1997 Science
276:734-740
12. Limited Sampling of RRR Studies
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press. 2007.
Based on tree from Pace 1997 Science
276:734-740
13. Limited Sampling of RRR Studies
Haloferax
Methanococcus
Chlorobium
Deinococcus
Thermotoga
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press. 2007.
Based on tree from Pace 1997 Science
276:734-740
15. TIGR Genome Projects
Methanococcus
Chlorobium
Deinococcus
Thermotoga
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press. 2007.
Based on tree from Pace 1997 Science
276:734-740
34. General Steps in Analysis of
Complete Genomes
• Identification/prediction of genes
• Characterization of gene features
• Characterization of genome features
• Prediction of gene function
• Prediction of pathways
• Integration with known biological data
• Comparative genomics
35. Genome Sequences Have
Revolutionized Microbiology
• Predictions of metabolic processes
• Better vaccine and drug design
• New insights into mechanisms of evolution
• Genomes serve as template for functional
studies
• New enzymes and materials for engineering
and synthetic biology
37. Outline
• Phylogenomic Tales
– Selecting genomes for sequencing
– Species evolution
– Predicting functions of genes
– Uncultured microbes
– Searching for novel organisms and genes
38. Outline
• Phylogenomic Tales
– Selecting genomes for sequencing
– Species evolution
– Predicting functions of genes
– Uncultured microbes
– Searching for novel organisms and genes
• All of these are going to be told in the context of a
recent project “A Genomic Encyclopedia of
Bacteria and Archaea” (aka GEBA)
40. Major Microbial Sequencing
Efforts
• Coordinated, top-down efforts
– Fungal Genome Initiative (Broad/Whitehead)
– Gordon and Betty Moore Foundation Marine Microbial Genome Sequencing
Project
– Sanger Center Pathogen Sequencing Unit
– NHGRI Human Gut Microbiome Project
– NIH Human Microbiome Program
• White paper or grant systems
– NIAID Microbial Sequencing Centers
– DOE/JGI Community Sequencing Program
– DOE/JGI BER Sequencing Program
– NSF/USDA Microbial Genome Sequencing
• Covers lots of ground and biological diversity
42. As of 2002
• At least 40 phyla of bacteria
[Figure: tree of bacterial phyla, based on Hugenholtz, 2002. Phyla shown: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
43. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
[Figure: same tree of bacterial phyla as slide 42, based on Hugenholtz, 2002]
44. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
[Figure: same tree of bacterial phyla as slide 42, based on Hugenholtz, 2002]
46. Need for Tree Guidance Well Established
• Common approach within some eukaryotic
groups
• Many small projects funded to fill in some
bacterial or archaeal gaps
• Phylogenetic gaps in bacterial and archaeal
projects commonly lamented in literature
47.
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen, Ward, Robb, Nelson, et al.
[Figure: same tree of bacterial phyla as slide 42]
50.
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Still highly biased in terms of the tree
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
[Figure: same tree of bacterial phyla as slide 42]
52.
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
[Figure: same tree of bacterial phyla as slide 42]
53.
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
[Figure: same tree of bacterial phyla as slide 42]
54.
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Viruses
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
[Figure: same tree of bacterial phyla as slide 42]
55.
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution: Really Fill in the Tree
• GEBA
• A genomic encyclopedia of bacteria and archaea
Eisen & Ward, PIs
[Figure: same tree of bacterial phyla as slide 42]
57. GEBA Pilot Project: Components
• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan
Eisen, Eddy Rubin, Jim Bristow)
• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus,
Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et
al)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor
Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik
D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N.
Ivanova, Athanasios Lykidis, Adam Zemla)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, Eddy Rubin, Jim Bristow)
58. rRNA Tree of Life
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
62. GEBA Pilot Target List
[Figure: bar chart of the number of genomes (y-axis "# of Genomes", 0 to 35) per phylum (x-axis "Phyla"); bacterial phyla are prefixed B: and archaeal phyla A:]
63. GEBA Pilot Project Overview
• Identify major branches in rRNA tree for
which no genomes are available
• Identify those with a cultured representative
in DSMZ
• DSMZ grew > 200 of these and prepped
DNA
• Sequence and finish 200+
• Annotate, analyze, release data
• Assess benefits of tree guided sequencing
• 1st paper Wu et al in Nature Dec 2009
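The first two steps above (find branches with no genome, then keep those with a cultured representative) can be sketched as a simple set operation. This is only an illustrative sketch, not the GEBA pipeline: the function name pick_targets, the strain names, and the lineage assignments are all made up, and it assumes each strain has already been assigned to a branch of the rRNA tree.

```python
from collections import defaultdict

def pick_targets(lineage_of, sequenced, in_culture):
    """Return {lineage: [cultured strains]} for lineages that have no genome yet.

    lineage_of : dict mapping strain name -> branch (e.g., phylum) in the rRNA tree
    sequenced  : set of strains that already have a genome sequence
    in_culture : set of strains available from a culture collection
    """
    # Lineages that already contain at least one sequenced genome
    covered = {lineage_of[s] for s in sequenced if s in lineage_of}
    targets = defaultdict(list)
    for strain in sorted(in_culture):
        lineage = lineage_of.get(strain)
        if lineage is not None and lineage not in covered:
            targets[lineage].append(strain)
    return dict(targets)

# Toy, made-up inputs (hypothetical strains, not real GEBA data)
lineage_of = {
    "strain_A": "Proteobacteria",
    "strain_B": "Proteobacteria",
    "strain_C": "Chloroflexi",
    "strain_D": "Deferribacteres",
    "strain_E": "Deferribacteres",
    "strain_F": "OP11",  # no cultured representative in this toy example
}
sequenced = {"strain_A"}
in_culture = {"strain_B", "strain_C", "strain_D", "strain_E"}

targets = pick_targets(lineage_of, sequenced, in_culture)
print(targets)
```

In this toy case the lineage with no cultured strain drops out of the target list, which is why uncultured lineages need other approaches.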
64. Assess Benefits of GEBA
• All genomes have some value
• But what, if any, is the benefit of tree-
guided sequencing over other selection
methods
• Lessons for other large scale microbial
genome projects?
65. GEBA Phylogenomic Lesson 1
The rRNA Tree of Life is a Useful Tool
for Identifying Phylogenetically Novel
Genomes
66. rRNA Tree of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press. 2007.
Based on tree from Pace 1997 Science
276:734-740
73. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana
The Arabidopsis Genome Initiative
Authorship of this paper should be cited as ‘The Arabidopsis Genome Initiative’. A full list of contributors appears at the end of this paper
The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and C. elegans, the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
Overview of sequencing strategy
83. Predicting Function
• Key step in genome projects
• More accurate predictions help guide
experimental and computational analyses
• Many diverse approaches
• All improved both by “phylogenomic” type
analyses that integrate evolutionary
reconstructions and understanding of how
new functions evolve
85. Blast Search of H. pylori “MutS”
• Blast search pulls up Syn. sp MutS#2 with much higher p
value than other MutS homologs
• Based on this TIGR predicted this species had mismatch
repair
• Assumes functional constancy
Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
87. Phylogenetic Tree of MutS Family
[Figure: phylogenetic tree of MutS family proteins from bacteria, archaea and eukaryotes; tips are labeled with species abbreviations such as Ecoli, Bacsu, Helpy, Yeast and Human]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
88. MutS Subfamilies
[Figure: the MutS family tree with subfamilies labeled: MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, MSH6]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
89. Overlaying Functions onto Tree
[Figure: the MutS family tree with subfamilies labeled (MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5, MSH6) and known functions overlaid]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
90. Functional Prediction Using Tree
• MutS1 - Bacterial Mismatch and Loop Repair
• MutS2 - Unknown Functions
• MSH1 - Mitochondrial Repair
• MSH2 - Eukaryotic Nuclear Mismatch and Loop Repair
• MSH3 - Nuclear Repair Of Loops
• MSH4 - Meiotic Crossing Over
• MSH5 - Meiotic Crossing Over
• MSH6 - Nuclear Repair Of Mismatches
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
92. PHYLOGENETIC PREDICTION OF GENE FUNCTION
METHOD (illustrated on the slide with EXAMPLE A, genes 1A to 3B, and EXAMPLE B, genes 1 to 6)
• CHOOSE GENE(S) OF INTEREST
• IDENTIFY HOMOLOGS
• ALIGN SEQUENCES
• CALCULATE GENE TREE
• OVERLAY KNOWN FUNCTIONS ONTO TREE
• INFER LIKELY FUNCTION OF GENE(S) OF INTEREST (may be ambiguous)
• ACTUAL EVOLUTION (ASSUMED TO BE UNKNOWN), with possible duplication events across Species 1, Species 2 and Species 3
Based on Eisen, 1998 Genome Res 8: 163-167.
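The last two steps of the method (overlay known functions, then infer the function of the gene of interest) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published method: the gene tree is assumed to be already computed and is written as nested tuples, the function labels "function X" and "function Y" are placeholders, and the rule used here (take the smallest clade containing the query plus at least one annotated gene, and call it ambiguous if the annotations disagree) is one simple way to read a tree.

```python
def leaves(tree):
    """Return all leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]

def predict_function(tree, query, known):
    """Predict the function of `query` from the smallest clade that contains it
    together with at least one gene of known function.

    Returns the shared function, "ambiguous" if annotations in that clade
    disagree, or None if the query is absent or nothing is annotated.
    """
    if isinstance(tree, str):
        return None
    for child in tree:
        if query in leaves(child):
            # Try the more specific (smaller) clade first
            result = predict_function(child, query, known)
            if result is not None:
                return result
            break
    else:
        return None  # query is not in this subtree
    functions = {known[name] for name in leaves(tree)
                 if name in known and name != query}
    if not functions:
        return None
    return functions.pop() if len(functions) == 1 else "ambiguous"

# Toy gene tree loosely modeled on EXAMPLE A: two paralog groups (A and B)
tree = (("1A", ("2A", "3A")), ("1B", ("2B", "3B")))
known = {"1A": "function X", "3A": "function X", "1B": "function Y"}

print(predict_function(tree, "2A", known))
print(predict_function(tree, "3B", known))
# A query that branches outside both annotated groups is left ambiguous
print(predict_function(("q", ("1A", "1B")), "q", known))
```

The design choice worth noting is that the prediction comes from tree position, not from the single best-scoring hit, so a query that falls outside all annotated subfamilies is reported as ambiguous instead of inheriting a function.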
94. Phylogenetic Prediction of Function
• Greatly improves accuracy of functional
predictions compared to similarity alone
(e.g., blast)
• Many surrogate methods (e.g., COGs)
• Automated phylogenetic methods now
available
– Sean Eddy, Steven Brenner, Kimmen Sjölander,
etc.
• But …
96. Example 3: Non homology methods
• Many genes have homologs in other species
but no homologs have ever been studied
experimentally
• Non-homology methods can make functional
predictions for these
• Example: phylogenetic profiling (extension of
prior work of Koonin, Tatusov, Ragan, et al.)
97. Phylogenetic profiling basis
• Microbial genes are lost rapidly when not
maintained by selection
• Genes can be acquired by lateral transfer
• Gain and loss frequently occur for entire
pathways/processes
• Thus one might be able to use correlated presence/absence
information to identify genes with
similar functions
98. Non-Homology Predictions:
Phylogenetic Profiling
• Step 1: Search all genes in
organisms of interest against all
other genomes
• Ask: Yes or No, is each gene
found in each other species
• Cluster genes by distribution
patterns (profiles)
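The clustering step can be sketched as follows. This sketch assumes the all-against-all searches in Step 1 have already been run and summarized as, for each gene, the set of genomes with a homolog; the gene and genome names are made up, and grouping genes with identical profiles is only the simplest possible form of clustering (real analyses typically cluster by profile similarity).

```python
from collections import defaultdict

def cluster_by_profile(presence, genomes):
    """Group genes that share the same presence/absence profile.

    presence : dict mapping gene -> set of genomes in which a homolog was found
    genomes  : ordered list of genomes defining the profile columns
    Returns {profile tuple of 0/1: [genes with that profile]}.
    """
    clusters = defaultdict(list)
    for gene in sorted(presence):
        # Yes (1) or No (0): is the gene found in each genome?
        profile = tuple(int(g in presence[gene]) for g in genomes)
        clusters[profile].append(gene)
    return dict(clusters)

# Toy, made-up search results
genomes = ["sp1", "sp2", "sp3", "sp4"]
presence = {
    "geneA": {"sp1", "sp3"},
    "geneB": {"sp1", "sp3"},
    "geneC": {"sp1", "sp2", "sp3", "sp4"},
    "geneD": {"sp2"},
}

clusters = cluster_by_profile(presence, genomes)
for profile, genes in clusters.items():
    print(profile, genes)
```

Here geneA and geneB share a profile, which under the reasoning above makes them candidates for a shared pathway or process even if neither has a studied homolog.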
99. Carboxydothermus hydrogenoformans
• Isolated from a Russian hotspring
• Thermophile (grows at 80°C)
• Anaerobic
• Grows very efficiently on CO
(Carbon Monoxide)
• Produces hydrogen gas
• Low GC Gram positive
(Firmicute)
• Genome Determined (Wu et al.
2005 PLoS Genetics 1: e65. )
104. GEBA Lesson 3:
Phylogeny driven genome selection (and
phylogenetics) improves genome annotation
• Took 56 GEBA genomes and compared results vs. 56
randomly sampled new genomes
• Better definition of protein family sequence “patterns”
• Greatly improves “comparative” and “evolutionary”
based predictions
• Conversion of hypothetical into conserved hypotheticals
• Linking distantly related members of protein families
• Improved non-homology prediction
107. Network of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
108. Protein Family Rarefaction
• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
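A rarefaction curve of this kind can be sketched as below. The sketch assumes the MCL clustering has already been done and that each genome is represented by the set of protein family IDs it contains; the genome names, family IDs, and the choice to average over random genome orderings are illustrative assumptions, and plotting is left out.

```python
import random

def rarefaction_curve(families_by_genome, n_permutations=100, seed=0):
    """Average number of distinct protein families seen after adding
    1, 2, ..., N genomes in random order.

    families_by_genome : dict mapping genome -> set of protein family IDs
    Returns a list of length N (one average per number of genomes).
    """
    rng = random.Random(seed)
    genomes = sorted(families_by_genome)
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)
        seen = set()
        for i, genome in enumerate(order):
            seen |= families_by_genome[genome]
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]

# Toy, made-up family assignments
families_by_genome = {
    "g1": {"f1", "f2"},
    "g2": {"f2", "f3"},
    "g3": {"f1", "f2", "f3", "f4"},
}

curve = rarefaction_curve(families_by_genome)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, n_families)
```

A curve that keeps climbing as genomes are added suggests that new protein families are still being discovered; a curve that flattens suggests the sampled group is close to saturation.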
116. rRNA Phylotyping
• Collect DNA from
environment
• PCR amplify rRNA
genes using broad (so-
called universal) primers
• Sequence
• Align to others
• Infer evolutionary tree
• Unknowns “identified”
by placement on tree
• Some use BLAST, but
not as good as phylogeny
117. rRNA PCR
• The Hidden Majority (Hugenholtz 2002)
• Richness estimates (Bohannan and Hughes 2003)
121. Shotgun Sequencing Allows Use of
Alternative Anchors (e.g., RecA)
Venter et al., Science
304: 66. 2004
122. Shotgun Sequencing Allows Use of Other Markers
[Figure: bar chart of Sargasso phylotypes. y-axis: Weighted % of Clones (0 to 0.5). x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Markers: EFG, EFTu, rRNA, RecA, RpoB, HSP70]
Venter et al., Science 304: 66. 2004
Editor's Notes
It has been less than 10 years since the first genome was determined
Phylogenetic analysis of rRNAs led to the discovery of archaea
Extension of rRNA analysis to uncultured organisms using PCR
This is a tree of an rRNA gene that was found on a large DNA fragment isolated from the Monterey Bay. This rRNA gene groups in a tree with genes from members of the gamma Proteobacteria, a group that includes E. coli as well as many environmental bacteria. This rRNA phylotype has been found to be a dominant species in many ocean ecosystems.
clone from the Sargasso Sea. This shows that this