The document discusses the phylogeny of the bacterial phylum Actinobacteria. It notes that Actinobacteria contains at least 23 major lineages, but genome sequences are mostly limited to a few lineages. Expanding genomic sampling across the phylum is needed to better understand Actinobacteria diversity and evolution.
3. Microbial genomes
From http://genomesonline.org
Tuesday, May 25, 2010
4. rRNA Tree of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
5. [Untitled slide: tree of bacterial phyla, based on Hugenholtz, 2002]
[Figure labels: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
• At least 40 phyla of bacteria
6. [Untitled slide: tree of bacterial phyla, based on Hugenholtz, 2002]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
7. [Untitled slide: tree of bacterial phyla, based on Hugenholtz, 2002]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
8. [Untitled slide: tree of bacterial phyla, based on Hugenholtz, 2002]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
9. [Untitled slide: tree of bacterial phyla, based on Hugenholtz, 2002]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes
10. The Tree is not Happy
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
11. Why Increase Phylogenetic Coverage?
• Common approach within some eukaryotic
groups
• Many small projects to fill in bacterial or
archaeal gaps
• Phylogenetic gaps in bacterial and archaeal
projects commonly lamented in literature
• Many potential benefits
12. [Untitled slide: tree of bacterial phyla]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
14. The Tree of Life is Still Angry
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
16. [Untitled slide: tree of bacterial phyla; legend: Well sampled phyla]
• At least 100 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa even more poorly sampled
• Solution - use tree to really fill gaps
19. GEBA Pilot Project Overview
• Identify major branches in rRNA tree for
which no genomes are available
• Identify branches with a cultured
representative in DSMZ
• Grow > 200 of these and prep. DNA
• Sequence and finish 100 (covering breadth
of bacterial/archaeal diversity)
• Annotate, analyze, release data
• Assess benefits of tree guided sequencing
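The tree-guided selection idea in the steps above can be illustrated with a small sketch. It is not the GEBA project's actual pipeline; it assumes a greedy rule that repeatedly picks the candidate organism adding the most new branch length (phylogenetic diversity) to what is already sequenced. The tree, the organism names, and the function names `path_edges` and `greedy_select` are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy tree: node -> (parent, branch length to parent).
# Names and lengths are invented for illustration only.
Tree = Dict[str, Tuple[Optional[str], float]]

TREE: Tree = {
    "root": (None, 0.0),
    "n1": ("root", 0.1),
    "n2": ("root", 0.3),
    "A": ("n1", 0.2),
    "B": ("n1", 0.25),
    "C": ("n2", 0.5),
    "D": ("n2", 0.05),
}


def path_edges(tree: Tree, leaf: str) -> Set[str]:
    """Return the nodes whose parent edge lies on the path from leaf to root."""
    edges: Set[str] = set()
    node = leaf
    while tree[node][0] is not None:
        edges.add(node)
        node = tree[node][0]
    return edges


def greedy_select(tree: Tree, sequenced: List[str],
                  candidates: List[str], k: int) -> List[str]:
    """Greedily pick up to k candidates, each adding the most uncovered branch length."""
    covered: Set[str] = set()
    for leaf in sequenced:
        covered |= path_edges(tree, leaf)

    chosen: List[str] = []
    remaining = set(candidates)
    for _ in range(min(k, len(remaining))):
        def gain(leaf: str) -> float:
            # Branch length on this leaf's path that no chosen genome covers yet.
            return sum(tree[e][1] for e in path_edges(tree, leaf) - covered)

        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=gain)
        chosen.append(best)
        covered |= path_edges(tree, best)
        remaining.remove(best)
    return chosen


# With "A" already sequenced, "C" opens an entirely new branch and is picked first.
print(greedy_select(TREE, ["A"], ["B", "C", "D"], 2))  # → ['C', 'B']
```

In this toy case the close relative "B" is chosen only after the distant lineage "C", which is the behaviour a tree-guided approach is meant to produce.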
20. GEBA and Openness
• All data released as quickly as
possible w/ no restrictions to
IMG-GEBA, GenBank, etc.
• Data also available in Biotorrents
(http://biotorrents.net)
• Individual genome reports
published in OA “Standards in
Genomic Sciences (SIGS)”
• 1st GEBA paper in Nature freely
available and published using
Creative Commons License
21. GEBA Lesson 1
rRNA Tree is Useful for Identifying
Phylogenetically Novel Genomes
22. rRNA Tree of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
23. Network of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
24. Whole Genome Tree w/ AMPHORA
See Wu and Eisen, Genome Biology 2008 9: R151
http://bobcat.genomecenter.ucdavis.edu/AMPHORA/
26. PD (Phylogenetic Diversity) of rRNA, Genome Trees Similar
From Wu et al. 2009 Nature 462, 1056-1060
27. GEBA Lesson 1B
rRNA Tree topology is not perfect;
Genome-based trees better
28. 16S Says Hyphomonas is in Rhodobacterales
Badger et al. 2005
29. WGT and individual gene trees:
It's Related to Caulobacterales
Badger et al. 2005
30. Wh
Concatenated
alignment “whole
genome tree” built
using AMPHORA
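As a rough illustration of what a concatenated alignment is, the sketch below joins per-gene alignments end to end, padding with gaps where a taxon lacks a gene. It is not AMPHORA's code, and it leaves out the marker identification, alignment, and trimming steps that would come first. The function name `concatenate_alignments` and the toy sequences are hypothetical.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]],
                           gap: str = "-") -> Dict[str, str]:
    """Join per-gene alignments (taxon -> aligned sequence) into one supermatrix.

    Taxa missing from a gene alignment are filled with gap characters so that
    every row of the result has the same length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}

    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, gap * width))

    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy, made-up alignments for two marker genes.
rec_a = {"sp1": "MA-K", "sp2": "MAIK"}
rpo_b = {"sp1": "GT", "sp3": "GS"}
print(concatenate_alignments([rec_a, rpo_b]))
# → {'sp1': 'MA-KGT', 'sp2': 'MAIK--', 'sp3': '----GS'}
```

The resulting matrix would then be passed to a tree-building program; that step is outside this sketch.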
31. Whole genome phylogeny?
• Many approaches
– Gene presence/absence
– Concatenation of phylogenetic markers
– Separate phylogeny of genes and then
integration of results (e.g., networks)
– Models that incorporate gain/loss as well as
gene phylogeny
• No new results from us
– However ... see Eric Alm talk Ballroom A -
“Microbes in a changing world” session
tomorrow AM
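Of the approaches listed above, gene presence/absence is the simplest to sketch. The example below assumes each genome has already been reduced to a set of gene family identifiers and uses Jaccard distance as one possible dissimilarity; the slides do not say which measure is used in practice. The names `jaccard_distance` and `distance_matrix` and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 minus (shared families / all families seen in either genome)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(genomes: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Pairwise presence/absence distances, keyed by alphabetically ordered genome pairs."""
    return {
        (x, y): jaccard_distance(genomes[x], genomes[y])
        for x, y in combinations(sorted(genomes), 2)
    }


# Made-up gene family content for three genomes.
genomes = {
    "g1": {"famA", "famB", "famC"},
    "g2": {"famB", "famC", "famD"},
    "g3": {"famE"},
}
for pair, dist in distance_matrix(genomes).items():
    print(pair, dist)
```

A matrix like this could feed a distance-based tree method such as neighbor joining, though presence/absence signals are also shaped by gene gain and loss, which is why the list above includes models that account for both.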
32. GEBA Lesson 2
Phylogeny-driven genome selection
helps discover new genetic diversity
33. Network of Life
Bacteria
Archaea
Eukaryotes
Figure from Barton, Eisen et al.
“Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
34. Protein Family Rarefaction Curves
• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
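The three steps above can be sketched as follows, assuming the clustering step (MCL in the slides) has already been run and its output reduced to a mapping from genome to the set of protein family IDs it contains. Averaging over random genome orderings is an assumption of this sketch, as are the function name `rarefaction_curve` and the toy data.

```python
import random
from typing import Dict, List, Set


def rarefaction_curve(families_by_genome: Dict[str, Set[str]],
                      n_permutations: int = 100,
                      seed: int = 0) -> List[float]:
    """Mean number of distinct protein families after adding 1, 2, ..., N genomes.

    Genomes are added in random order; the curve is averaged over
    n_permutations orderings so it does not depend on one arbitrary order.
    """
    rng = random.Random(seed)
    genomes = sorted(families_by_genome)
    totals = [0.0] * len(genomes)

    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)
        seen: Set[str] = set()
        for i, genome in enumerate(order):
            seen |= families_by_genome[genome]
            totals[i] += len(seen)

    return [total / n_permutations for total in totals]


# Made-up family content for three genomes.
data = {
    "g1": {"f1", "f2"},
    "g2": {"f2", "f3"},
    "g3": {"f1", "f2", "f3", "f4"},
}
curve = rarefaction_curve(data, n_permutations=50)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, n_families)
```

Plotting the genome count against the family count gives the curve; a curve that keeps climbing suggests that additional genomes are still contributing new protein families.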
42. Predicting Function
• Key step in genome projects
• More accurate predictions help guide
experimental and computational analyses
• Many diverse approaches
• Comparative and evolutionary analysis
greatly improves most predictions
43. Most/All Functional Prediction Improves
w/ Better Phylogenetic Sampling
• Better definition of protein family sequence
“patterns” (e.g., improved HMMs)
• Conversion of hypothetical into conserved
hypotheticals
• Greatly improves “comparative” and
“evolutionary” based predictions
• Linking distantly related members of protein
families
• Improved non-homology prediction
50. Shotgun Sequencing Allows Use of
Alternative Anchors (e.g., RecA)
Venter et al., 2004
51. Shotgun Sequencing Allows Use of Other Markers
[Figure: bar chart, "Sargasso Phylotypes". x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). y-axis: Weighted % of Clones (0 to 0.5). Series: EFG, EFTu, rRNA, RecA, RpoB, HSP70]
Venter et al., 2004
52. Shotgun Sequencing Allows Use of Other Markers
Should improve with better genomic sampling
[Figure: bar chart, "Sargasso Phylotypes". x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). y-axis: Weighted % of Clones (0 to 0.5). Series: EFG, EFTu, rRNA, RecA, RpoB, HSP70]
Venter et al., 2004
53. Functional Inference from
Metagenomics
• Can work well for individual genes
• Predicting “community” function is
challenging because treating community as
a bag of genes does not work well
• Better to “compartmentalize” data ...
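One reading of "compartmentalize" is to group reads by phylogenetic group before interpreting them, rather than pooling everything. The sketch below assumes each read carrying a marker gene has already been assigned to a group (for example by placing it on that marker's tree) and simply tallies, per marker, the fraction of reads falling in each group. The function name `bin_fractions` and the example assignments are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def bin_fractions(assignments: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """For each marker gene, return the fraction of its reads assigned to each group.

    `assignments` is an iterable of (marker, phylogenetic_group) pairs,
    one pair per classified read.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for marker, group in assignments:
        counts[marker][group] += 1

    fractions: Dict[str, Dict[str, float]] = {}
    for marker, group_counts in counts.items():
        total = sum(group_counts.values())
        fractions[marker] = {g: n / total for g, n in group_counts.items()}
    return fractions


# Made-up read assignments for two markers.
reads = [
    ("recA", "Alphaproteobacteria"),
    ("recA", "Alphaproteobacteria"),
    ("recA", "Cyanobacteria"),
    ("rpoB", "Cyanobacteria"),
]
for marker, by_group in bin_fractions(reads).items():
    print(marker, by_group)
```

Comparing the per-marker fractions side by side shows how consistently different genes describe the same community.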
60. Phylogenetic Binning Using AMPHORA
AMPHORA - each read on its own tree
[Figure: bar chart. x-axis groups: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Unclassified Proteobacteria, Cyanobacteria, Chlamydiae, Acidobacteria, Bacteroidetes, Actinobacteria, Aquificae, Planctomycetes, Spirochaetes, Firmicutes, Chloroflexi, Chlorobi, Unclassified Bacteria. y-axis ticks: 0 to 0.7. Series (marker genes): dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf]
61. Phylogenetic Binning Using AMPHORA
Should improve with better genomic sampling
AMPHORA - each read on its own tree
[Figure: bar chart with the same phylogenetic groups and marker-gene series as slide 60; y-axis ticks: 0 to 0.7]
62. Metagenomic Analysis Improves w/
Phylogenetic Sampling
• Small but real improvements in
– Gene identification / confirmation
– Functional prediction
– Binning
– Phylogenetic classification
63. Metagenomic Analysis Improves w/
Phylogenetic Sampling
• Small but real improvements in
– Gene identification / confirmation
– Functional prediction
– Binning
– Phylogenetic classification
• But not a lot ...
64. How to improve phylogenetic
analysis of metagenomic data
• Fragmented data
• Which genes to use?
• More automation
67. Phylogenetic Binning Using AMPHORA
Improves with better phylogenetic methods
AMPHORA - each read on its own tree
[Figure: bar chart with the same phylogenetic groups and marker-gene series as slide 60; y-axis ticks: 0 to 0.7]