Phylogenomics and the diversity and diversification of microbes
1. Phylogenomics and the Diversity
and Diversification of Microbes
Jonathan A. Eisen
UC Davis
UC Davis Talk
February 11, 2011
2. Phylogenomics of Novelty
• Variation in Mechanisms: Patterns, Causes and Effects
• Mechanisms of Origin of New Functions
• Species Evolution
3. Why do this?
• Discover causes and effects of differences in
evolvability
• Improve predictions from genome analysis
• Guide interpretation of biological data
4. Outline
• Introduction
• Phylogenomic Stories
– Within genome invention of novelty
– Stealing novelty
– Communities of microbes
– Community service and knowing what we don’t know
6. rRNA Tree of Life
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
7. Limited Sampling of RRR Studies
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
8. Limited Sampling of RRR Studies
Haloferax
Methanococcus
Chlorobium
Deinococcus
Thermotoga
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
20. Phylogenomics of Novelty
• How does novelty originate?
• Major categories of processes
• From within
• De novo invention
• Simple substitutions
• Duplication and divergence
• Domain shuffling
• Small & large rearrangements
• Regulatory changes
• From outside
• Lateral gene transfer
• Symbioses
21. Phylogenomics of Novelty
• How does novelty originate?
• Major categories of processes
[Side label: Mechanisms of Origin of New Functions]
• From within
• De novo invention
• Simple substitutions
• Duplication and divergence
• Domain shuffling
• Small & large rearrangements
• Regulatory changes
• From outside
• Lateral gene transfer
• Symbioses
22. From Eisen et al. 1997 Nature Medicine 3: 1076-1078.
23. BLAST Search of H. pylori “MutS”
• BLAST search pulls up Syn. sp MutS#2 with a much more significant p value than other MutS homologs
• Based on this, TIGR predicted that this species had mismatch repair
• Assumes functional constancy
Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
24. Predicting Function
• Identification of motifs
– Short regions of sequence similarity that are indicative of
general activity
– e.g., ATP binding
• Homology/similarity based methods
– Gene sequence is searched against a database of other sequences
– If significantly similar genes are found, their functional information is used
• Problem
– Genes frequently have similarity to hundreds of motifs
and multiple genes, not all with the same function
26. Overlaying Functions onto Tree
[Figure: gene tree of the MutS family with the subfamilies MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5 and MSH6 labeled; leaves are bacterial, archaeal and eukaryotic homologs.]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
28. Evolutionary Functional Prediction
[Figure: flowchart of the method, illustrated with Example A (genes 1A, 2A, 3A, 1B, 2B, 3B from Species 1, 2 and 3) and Example B (genes 1 to 6).]
• Choose gene(s) of interest
• Identify homologs
• Align sequences
• Calculate gene tree
• Overlay known functions onto tree
• Infer likely function of gene(s) of interest (may be ambiguous)
• Final panel: actual evolution (assumed to be unknown), including a possible duplication
Based on Eisen, 1998 Genome Res 8: 163-167.
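The last two steps of this method (overlaying known functions and inferring the function of a gene of interest) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure from the cited paper: the gene tree is assumed to be already computed and is written as nested tuples, the function labels are placeholders, and the rule used (take the function shared by annotated genes in the smallest clade around the query) is a simplification chosen for this sketch.

```python
# Hypothetical toy example: the tree shape and function labels are made up.

def leaves(node):
    """Return all leaf names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(leaves(child))
    return names


def infer_function(tree, query, known):
    """Predict the function of `query` from annotated genes in the tree.

    Walks outward from the query, clade by clade. The first clade that
    contains annotated genes decides: one shared function is returned as
    the prediction, conflicting functions give "ambiguous".
    """
    if query not in leaves(tree):
        raise ValueError(f"{query!r} is not a leaf of the tree")
    # Record every clade on the path from the root down to the query.
    path = []
    node = tree
    while not isinstance(node, str):
        path.append(node)
        node = next(child for child in node if query in leaves(child))
    # Examine clades from the innermost outward.
    for clade in reversed(path):
        functions = {known[g] for g in leaves(clade) if g != query and g in known}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return "ambiguous"
    return "unknown"


# Gene names follow Example A on the slide; the tree groups the A and B paralogs.
gene_tree = (("1A", ("2A", "3A")), ("1B", ("2B", "3B")))
known_functions = {"1A": "function X", "2A": "function X",
                   "1B": "function Y", "3B": "function Y"}

print(infer_function(gene_tree, "3A", known_functions))
print(infer_function(gene_tree, "2B", known_functions))
# A gene that branches off before the A/B split cannot be resolved.
print(infer_function(("Q", gene_tree), "Q", known_functions))
```

The point of the design is that the prediction follows the tree rather than the single best database hit, so a gene that sits outside both annotated subfamilies is reported as ambiguous instead of inheriting a function.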
30. Tetrahymena Genome Processing
• Probably exists as a defense mechanism
• Analogous to RIPPING and heterochromatin silencing
• Presence of repetitive DNA, but not TEs, in the MAC suggests the mechanism involves targeting foreign DNA
• Thus, unlike RIPPING, ciliate processing does not limit diversification by duplication
Eisen et al. 2006. PLoS Biology.
31. Phylogenomics of Novelty II
Sometimes it is easier to steal, borrow, or co-opt functions than to evolve them anew
35. Network of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
36. Correlated gain/loss of genes
• Microbial genes are lost rapidly when not
maintained by selection
• Genes can be acquired by lateral transfer
• Gain and loss frequently occur for entire pathways/processes
• Thus we might be able to use correlated presence/absence information to identify genes with similar functions
37. Non-Homology Predictions:
Phylogenetic Profiling
• Step 1: Search all genes in
organisms of interest against all
other genomes
• Ask: Yes or No, is each gene
found in each other species
• Cluster genes by distribution
patterns (profiles)
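The profiling steps above can be sketched as follows. This is a minimal illustration, assuming the all-against-all search has already been run and reduced to a yes/no table; the gene and genome names are hypothetical, and grouping genes with identical profiles is the simplest possible stand-in for clustering (a real analysis would likely use a distance measure between profiles).

```python
from collections import defaultdict

# Hypothetical search results: for each gene, the genomes where a homolog was found.
genomes = ["sp1", "sp2", "sp3", "sp4"]
presence = {
    "geneA": {"sp1", "sp3"},
    "geneB": {"sp1", "sp3"},
    "geneC": {"sp2"},
    "geneD": {"sp1", "sp2", "sp3", "sp4"},
}


def phylogenetic_profiles(presence, genomes):
    """Turn presence sets into fixed-order 0/1 profiles, one per gene."""
    return {
        gene: tuple(int(genome in found) for genome in genomes)
        for gene, found in presence.items()
    }


def cluster_by_profile(profiles):
    """Group genes whose presence/absence profiles are identical."""
    clusters = defaultdict(list)
    for gene, profile in profiles.items():
        clusters[profile].append(gene)
    return {profile: sorted(genes) for profile, genes in clusters.items()}


profiles = phylogenetic_profiles(presence, genomes)
clusters = cluster_by_profile(profiles)
for profile, genes in clusters.items():
    print(profile, genes)
```

Genes that land in the same cluster (here geneA and geneB) share a distribution pattern and are candidates for a shared pathway or process, even without any sequence similarity between them.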
38. Carboxydothermus hydrogenoformans
• Isolated from a Russian hot spring
• Thermophile (grows at 80°C)
• Anaerobic
• Grows very efficiently on CO
(Carbon Monoxide)
• Produces hydrogen gas
• Low GC Gram positive
(Firmicute)
• Genome determined (Wu et al. 2005 PLoS Genetics 1: e65)
43. Mutualistic Genome Evolution
• Compare and contrast different types of
mutualistic symbioses
• Diverse hosts, symbionts, biology, ages
• Organelles, chemosymbioses,
photosynthetic symbioses, nutritional
symbioses
• What are the rules & patterns?
44. Glassy Winged Sharpshooter
• Feeds on xylem sap
• Vector for Pierce’s Disease
• Potential bioterror agent
49. Higher Evolutionary Rates in Endosymbionts
Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s Lab
50. Variation in Evolution Rates
MutS MutL
+ +
+ +
+ +
+ +
_ _
_ _
Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s Lab
51. Polymorphisms in Metapopulation
• Data from ~200 hosts
– 104 SNPs
– 2 indels
• PCR surveys show that this is between-host variation
• Much lower ratio of transitions:transversions than in Blochmannia
• Consistent with absence of MMR from Blochmannia
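For readers unfamiliar with the transition:transversion ratio mentioned above, here is a small sketch of how it can be computed from a list of SNPs. The SNP list is invented for illustration and is not data from the study; each SNP is assumed to be a pair of two different bases.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Return transitions divided by transversions for (ref, alt) pairs."""
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    if transversions == 0:
        return float("inf")
    return transitions / transversions


# Hypothetical SNPs: two transitions (A>G, C>T) and two transversions (A>C, G>T).
example_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(example_snps))
```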
52. Baumannia is a Vitamin and Cofactor Producing Machine
Wu et al. 2006 PLoS Biology 4: e188.
65. How can we best use metagenomic data?
• Many possible uses including:
– Improvements on rRNA based phylotyping and species diversity measurements
– Adding functional information on top of phylogenetic/species diversity information
• Most/all possible uses either require or are improved with phylogenetic analysis
68. Shotgun Sequencing Allows Use of Other Markers
[Figure: bar chart of Sargasso phylotypes. X axis: major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Y axis: weighted % of clones (0 to 0.5). Markers: EFG, EFTu, rRNA, RecA, RpoB, HSP70.]
Venter et al., Science 304: 66-74. 2004
86. As of 2002
• At least 40 phyla of bacteria
[Figure: tree of bacterial phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11.]
Based on Hugenholtz, 2002
87. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
[Figure: the tree of bacterial phyla from slide 86 (Proteobacteria through OP11).]
Based on Hugenholtz, 2002
88. As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
[Figure: the tree of bacterial phyla from slide 86 (Proteobacteria through OP11).]
Based on Hugenholtz, 2002
90. NSF-funded Tree of Life Project
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
[Figure: the tree of bacterial phyla from slide 86 (Proteobacteria through OP11).]
Eisen, Ward, Robb, Nelson, et al.
92. NSF-funded Tree of Life Project
• A genome from each of eight phyla
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Still highly biased in terms of the tree
[Figure: the tree of bacterial phyla from slide 86 (Proteobacteria through OP11).]
Eisen & Ward, PIs
93. GEBA
• A genomic encyclopedia of bacteria and archaea
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution: Really Fill in the Tree
[Figure: the tree of bacterial phyla from slide 86 (Proteobacteria through OP11).]
Eisen & Ward, PIs
95. GEBA Pilot Project: Components
• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan
Eisen, Eddy Rubin, Jim Bristow)
• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus,
Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et
al)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor
Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik
D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N.
Ivanova, Athanasios Lykidis, Adam Zemla)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, Eddy Rubin, Jim Bristow)
96. GEBA Pilot Project Overview
• Identify major branches in rRNA tree for
which no genomes are available
• Identify those with a cultured representative in
DSMZ
• DSMZ grew > 200 of these and prepped DNA
• Sequence and finish 100+ (covering breadth of bacterial/archaeal diversity)
• Annotate, analyze, release data
• Assess benefits of tree guided sequencing
• 1st paper Wu et al in Nature Dec 2009
97. GEBA Phylogenomic Lesson 1
The rRNA Tree of Life is a Useful Tool
for Identifying Phylogenetically Novel
Genomes
98. Compare PD in Trees
From Wu et al. 2009 Nature 462, 1056-1060
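Assuming PD on this slide stands for phylogenetic diversity, meaning the total branch length of the tree spanned by a set of taxa, the comparison can be sketched as below. The tree, its branch lengths, and the taxon names are hypothetical; the tree is stored as a child-to-parent map, and branch lengths are summed from each taxon up to the root, counting every branch once.

```python
# Hypothetical rooted tree: node -> (parent, branch length). The root has no entry.
toy_tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 2.0),
    "C": ("root", 4.0),
}


def phylogenetic_diversity(parent, taxa):
    """Sum the branch lengths on the paths from the given taxa to the root."""
    visited = set()
    total = 0.0
    for taxon in taxa:
        node = taxon
        # Climb toward the root, stopping once a branch was already counted.
        while node in parent and node not in visited:
            visited.add(node)
            node, length = parent[node][0], parent[node][1]
            total += length
    return total


# Two close relatives add little diversity; a distant lineage adds more.
print(phylogenetic_diversity(toy_tree, ["A", "B"]))
print(phylogenetic_diversity(toy_tree, ["A", "C"]))
```

In this toy tree, choosing A and C covers more branch length than choosing the sister taxa A and B, which is the kind of comparison that tree-guided genome selection relies on.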
104. Network of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
105. Protein Family Rarefaction Curves
• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
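The plotting step can be sketched as follows, assuming the protein families have already been identified (for example by MCL) and each genome has been reduced to a set of family identifiers. The genome names and family IDs are hypothetical, and averaging over random genome orderings is a common way to smooth such curves rather than something stated on the slide.

```python
import random


def rarefaction_curve(genome_families, n_permutations=100, seed=0):
    """Mean number of distinct protein families seen after adding k genomes.

    genome_families maps a genome name to the set of family IDs it contains.
    Entry k-1 of the result is the average over random genome orderings.
    """
    rng = random.Random(seed)
    names = list(genome_families)
    totals = [0] * len(names)
    for _ in range(n_permutations):
        rng.shuffle(names)
        seen = set()
        for i, name in enumerate(names):
            seen |= genome_families[name]
            totals[i] += len(seen)
    return [total / n_permutations for total in totals]


# Hypothetical input: three genomes sharing some families.
example = {
    "genome1": {"fam1", "fam2", "fam3"},
    "genome2": {"fam2", "fam3", "fam4"},
    "genome3": {"fam1", "fam2"},
}
curve = rarefaction_curve(example)
for k, value in enumerate(curve, start=1):
    print(k, value)
```

A curve that keeps climbing as genomes are added indicates that new genomes are still contributing new protein families; a curve that flattens indicates saturation.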
115. Most/All Functional Prediction Improves
w/ Better Phylogenetic Sampling
• Took 56 GEBA genomes and compared results vs. 56
randomly sampled new genomes
• Better definition of protein family sequence “patterns”
• Greatly improves “comparative” and “evolutionary”
based predictions
• Conversion of hypotheticals into conserved hypotheticals
• Linking distantly related members of protein families
• Improved non-homology prediction
Kostas Mavrommatis, Natalia Ivanova, Thanos Lykidis, Nikos Kyrpides, Iain Anderson
118. Shotgun Sequencing Allows Use of Other Markers
Cannot be done without good sampling of genomes
[Figure: bar chart of Sargasso phylotypes. X axis: major phylogenetic group (Alphaproteobacteria through Crenarchaeota). Y axis: weighted % of clones (0 to 0.5). Markers: EFG, EFTu, rRNA, RecA, RpoB, HSP70.]
Venter et al., Science 304: 66-74. 2004
Editor's notes
It has been less than 10 years since the first genome was determined
Functional prediction using a gene tree is just like predicting the biology of a species using a species tree
Extension of rRNA analysis to uncultured organisms using PCR
This is a tree of an rRNA gene that was found on a large DNA fragment isolated from Monterey Bay. This rRNA gene groups in a tree with genes from members of the gamma Proteobacteria, a group that includes E. coli as well as many environmental bacteria. This rRNA phylotype has been found to be a dominant species in many ocean ecosystems.
Phylogenetic analysis of rRNAs led to the discovery of archaea
clone from the Sargasso Sea. This shows that this