“Bayesian Taxonomic Assignment
for the Next-Generation
Metagenomics”
Jonathan A. Eisen
August 7, 2013
DHS Meeting
Wednesday, August 7, 13
Shotgun Metagenomics
DNA
Sequence
Who is there?
What are they doing?
Shotgun Metagenomics
• Which communities are most similar / different?
• What accounts for the differences?
• Natural vs. unnatural
• Community level signatures (of events, stability, biogeography, etc)
Our Approach - Phylogeny
Phylogeny of sequences can reveal details about history, taxonomy, function, and ecology
rRNA Phylotyping
Workflow: DNA extraction → PCR (makes lots of copies of the rRNA genes in sample) → sequence rRNA genes → sequence alignment = data matrix → phylogenetic tree
rRNA1 5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’
rRNA2 5’..TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’
rRNA3 5’...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’
rRNA4 5’...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’
Example alignment columns:
rRNA3 C A C T G T
rRNA4 C A C A G T
Yeast T A C A G T
[Figure: the rRNA sequences from the sample are placed on a tree alongside reference organisms (E. coli, humans, yeast).]
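The alignment rows shown for rRNA3, rRNA4 and Yeast can be treated as a small data matrix. As a minimal sketch (not the method used in the talk, which builds full phylogenetic trees), the snippet below counts mismatches between each pair of aligned rows. The function name `hamming` and the dictionary `aligned` are illustrative assumptions; only the three six-column rows come from the slide.

```python
from itertools import combinations

# The three aligned rows shown on the slide (six alignment columns each).
aligned = {
    "rRNA3": "CACTGT",
    "rRNA4": "CACAGT",
    "Yeast": "TACAGT",
}

def hamming(a, b):
    """Count the alignment columns at which two equal-length rows differ."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Pairwise mismatch counts: the raw material for a distance-based tree.
distances = {
    (p, q): hamming(aligned[p], aligned[q])
    for p, q in combinations(aligned, 2)
}

for pair, d in distances.items():
    print(pair, d)
```

A real phylotyping analysis would use many more columns and a model-based tree inference method; the mismatch count only illustrates how an alignment becomes a comparison between sequences.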
Uses of Phylogeny in Metagenomics
• Taxonomic assessment
• Phylogenetic OTUs
• Phylogenetic taxonomy assignment
• Phylogenetic binning
• Sample comparisons and hypothesis testing
• Alpha diversity (i.e., PD)
• Beta diversity
• Trait evolution
• Dispersal
• Functional predictions
• Rates of evolution
• Convergence
Venter et al., Science 304: 66. 2004
rRNA Phylotyping - Sargasso Metagenome
Venter et al., Science 304: 66. 2004
RecA Phylotyping - Sargasso Metagenome
Phylotyping - Sargasso Metagenome
[Figure: Sargasso phylotypes. Weighted % of clones (y-axis, 0 to 0.500) by major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota), with one series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA.]
Venter et al., Science 304: 66. 2004
Phylogenetic ID of Novel Lineages
[Figure: tree with novel lineages labelled GOS 1 to GOS 5.]
Wu et al PLoS One 2011
Wu et al. 2006 PLoS Biology 4: e188.
Baumannia makes vitamins and cofactors
Sulcia makes amino acids
Phylogenetic Binning
Phylogenetic Functional Prediction
Venter et al., Science 304: 66. 2004
Sequencing Revolution
• More Samples
• Deeper sequencing
• The rare biosphere
• Relative abundance estimates
• More samples (with barcoding)
• Time series
• Spatially diverse sampling
• Fine scale sampling
http://phylosift.wordpress.com
PhyloSift
Supported by DHS Grant
Acknowledgements
Jonathan Eisen, Aaron Darling, Guillaume Jospin, Holly Bik, Tiffanie Nelson, Mark Brown, Erick Matsen (FHCRC), Todd Treangen (BNBI, NBACC)
Students and other staff:
- Eric Lowe, John Zhang, David Coil
Open source community:
- BLAST, LAST, HMMER, Infernal, pplacer, Krona, metAMOS, Bioperl, Bio::Phylo, JSON, etc.
PhyloSift is open source software:
- Website: http://phylosift.wordpress.org
- Code: http://github.com/gjospin/phylosift
Supported by DHS Grant
PhyloSift: Analysis & Summary
Inputs:
•Metagenomic reads
•Contigs
•Genes
Steps:
•Searching inputs against reference family DB
•Align to reference HMMs for each family
•Place reads into reference phylogeny using pplacer
•Summarize results & additional analyses
Output 1: Taxonomy
Taxonomic Summaries (via Krona)
Taxonomic summary plots in Krona (Ondov et al 2011)
Tree Reconciliation in PhyloSift
[Figure: tree relating environmental sequences to named taxa.]
Output 2: Phylogenetic Tree of Reads
PhyloSift Tree Browsing
Darling et al Submitted
Placement tree from 2 week old infant gut data
Output 3: Edge PCA
• Edge PCA for exploratory data analysis (Matsen and Evans 2013)
• Given E edges and S samples:
− For each edge, calculate the difference in placement mass on either side of the edge
− This results in an E x S matrix
− Calculate the E x E covariance matrix
− Calculate the eigenvectors and eigenvalues of the covariance matrix
• Eigenvector: each value indicates how “important” an edge is in explaining differences among the S samples
Example calculating a matrix entry for an edge: with mass=5 on one side and mass=2 on the other, this edge gets 5-2=3
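The Edge PCA steps listed on this slide can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the implementation from Matsen and Evans 2013: the function names `edge_mass_difference` and `edge_pca` are hypothetical, and the toy matrix (3 edges by 4 samples) is invented for the example.

```python
import numpy as np

def edge_mass_difference(mass_side_a, mass_side_b):
    """Matrix entry for one edge in one sample: mass on one side minus the other."""
    return mass_side_a - mass_side_b

def edge_pca(diff_matrix, n_components=2):
    """PCA on an E x S matrix of per-edge mass differences.

    Rows are edges (variables), columns are samples (observations).
    Returns the leading eigenvalues, the edge loadings (E x n_components)
    and the sample scores (n_components x S).
    """
    X = np.asarray(diff_matrix, dtype=float)
    Xc = X - X.mean(axis=1, keepdims=True)      # centre each edge across samples
    cov = np.cov(X)                             # E x E covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = eigvecs.T @ Xc                     # project samples onto components
    return eigvals, eigvecs, scores

# Worked entry from the slide: mass=5 on one side, mass=2 on the other.
print(edge_mass_difference(5, 2))

# Hypothetical toy data: 3 edges (rows) by 4 samples (columns).
toy = np.array([
    [3.0, -1.0, 2.0, -2.0],
    [0.5, 0.4, 0.6, 0.5],
    [-2.0, 1.0, -1.0, 3.0],
])
eigvals, eigvecs, scores = edge_pca(toy)
print(eigvecs[:, 0])   # loadings: how strongly each edge drives component 1
```

Edges with large absolute loadings in the first eigenvector are the ones that explain most of the variation among samples; the second toy edge barely varies, so its loading is small.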
Edge PCA
Edge PCA: Identify lineages that explain most variation among samples
Matsen and Evans 2013, Darling et al Submitted.
Edge PCA vs. UniFrac PCA
QIIME and Edge PCA on 110 fecal metagenomes from Yatsunenko et al 2012 Nature.
Sequenced with 454, to about 150 Mbp/metagenome
Darling et al Submitted.
Output 4: Forensics
PhyloSift: Inputs
•Metagenomic reads
•Contigs
•Genes
Challenge - Short Non-Overlapping Reads
PhyloSift: Searching inputs against reference family DB
Markers
• PMPROK – Dongying Wu’s Bac/Arch markers
• Eukaryotic Orthologs – Parfrey 2011 paper
• 16S/18S rRNA
• Mitochondria - protein-coding genes
• Viral Markers – Markov clustering on genomes
• Codon Subtrees – finer scale taxonomy
• Extended Markers – plastids, gene families
PMPROK Genes
PhyloSift: Searching inputs against reference family DB
Challenges:
•Limited ref. genomes
•Limited markers, families
Improving I: More Markers
Phylogenetic group       Genome Number   Gene Number   Marker Candidates
Archaea                  62              145415        106
Actinobacteria           63              267783        136
Alphaproteobacteria      94              347287        121
Betaproteobacteria       56              266362        311
Gammaproteobacteria      126             483632        118
Deltaproteobacteria      25              102115        206
Epsilonproteobacteria    18              33416         455
Bacteroides              25              71531         286
Chlamydiae               13              13823         560
Chloroflexi              10              33577         323
Cyanobacteria            36              124080        590
Firmicutes               106             312309        87
Spirochaetes             18              38832         176
Thermi                   5               14160         974
Thermotogae              9               17037         684
Wu et al. PLOS One 2013. In press.
Improving II: More Families
[Figure 1 from Sharpton et al. 2013, panels A to C: representative genomes → extract protein annotation → all v. all BLAST → homology clustering (MCL) → SFams → align & build HMMs → HMMs; protein annotation extracted from new genomes is screened for homologs using the HMMs.]
Improving III: Filling in the Tree
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
Genomic Encyclopedia of Bacteria & Archaea
Wu et al. 2009 Nature 462, 1056-1060
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
Family Diversity vs. PD
Wu et al. 2009 Nature 462, 1056-1060
The Dark Matter of Biology
From Wu et al. 2009 Nature 462, 1056-1060
GEBA Uncultured
Number of SAGs from Candidate Phyla
Site                                    OD1   OP11   OP3   SAR406
Site A: Hydrothermal vent               4     1      -     -
Site B: Gold Mine                       6     13     2     -
Site C: Tropical gyres (Mesopelagic)    -     -      -     2
Site D: Tropical gyres (Photic zone)    1     -      -     -
Sample collections at 4 additional sites are underway.
Phil Hugenholtz
JGI Dark Matter Project
Pipeline: environmental samples (n=9) → isolation of single cells (n=9,600) → whole genome amplification (n=3,300) → SSU rRNA gene based identification (n=2,000) → genome sequencing, assembly and QC (n=201) → draft genomes (n=201)
Sample types: seawater, brackish/freshwater, hydrothermal, sediment, bioreactor
[Figure: tree of Bacteria and Archaea showing the lineages sampled, including candidate phyla such as WS3 (Latescibacteria), OP3 (Omnitrophica), NKB19 (Hydrogenedentes), EM19 (Calescamantes), CD12 (Aerophobetes), OP8 (Aminicenantes), OP9 (Atribacteria), WWE1 (Cloacamonetes), OP1 (Acetothermia), GN02 (Gracilibacteria), OD1 (Parcubacteria), OP11 (Microgenomates), DSEG (Aenigmarchaea), Nanohaloarchaea, Nanoarchaea and pMC2A384 (Diapherotrites).]
Highlighted features:
• archaeal toxins (Nanoarchaea)
• lytic murein transglycosylase
• stringent response (Diapherotrites, Nanoarchaea)
• sigma factor (Diapherotrites, Nanoarchaea)
• oxidoreductase
• HGT from Eukaryotes (Nanoarchaea)
• murein (peptidoglycan)
• archaeal type purine synthesis (Microgenomates)
• UGA recoded for Gly (Gracilibacteria)
Woyke et al. Nature 2013.
A Genomic Encyclopedia of Microbes (GEM)
Figure from Barton, Eisen et al. “Evolution”, CSHL Press based on Baldauf et al Tree
PhyloSift: Align to reference HMMs for each family
Challenge:
How to align?
Zorro - Automated Masking
[Figure: distance to true tree (y-axis, 0.0 to 9.0) versus sequence length (200, 400, 800, 1600, 3200) for three treatments: no masking, zorro, gblocks.]
Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/journal.pone.0030288
PhyloSift: Place reads into reference phylogeny using pplacer
Challenges:
•Trees from short reads
•Probabilistic methods
Improving IV: Better Reference Tree
Lang et al. 2013
PhyloSift: Summarize results & additional analyses
PhyloSift DB Update
• Genome sources: EBI genomes, JGI genomes, NCBI genomes, private genomes
• Run PhyloSift (search + align): execute 'dbupdate' mode
• Update reference sequences with new data: add new sequences to marker packages
• Trees: amino acid tree, nucleotide tree
• Infer updated tree
• Prune tree: a taxa set is selected with a maxPD cutoff of 0.02 and a new tree is inferred
• Prune tree: a taxa set is selected with a maxPD cutoff of 0.01 and a new tree is inferred
• New sequences are added at 0.25 PD for the amino acid tree; a higher PD threshold enables more aggressive searches of the reference database, since LAST searching is faster with fewer sequences
• Tree reconciliation: reconcile NCBI taxonomy IDs with phylogenetic topologies, for both the amino acid tree and codon subtrees
• Package markers
• Automated download to PhyloSift users: users’ local marker databases are automatically scanned each time PhyloSift is run and any new updates are automatically downloaded if available
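The pruning steps in the DB update select a reduced set of taxa under a cutoff so that searches stay fast. The sketch below shows only the general idea of thinning near-identical references, using greedy filtering on pairwise distances. It is a simplified stand-in and not PhyloSift's maxPD procedure; the function `thin_taxa`, the taxon names and the toy distances are all hypothetical.

```python
def thin_taxa(names, dist, cutoff):
    """Greedy thinning: keep a taxon only if it is farther than `cutoff`
    from every taxon already kept. `dist` maps frozenset({a, b}) to a distance."""
    kept = []
    for name in names:
        if all(dist[frozenset((name, k))] > cutoff for k in kept):
            kept.append(name)
    return kept

# Hypothetical pairwise distances: A and B are near-identical, C is distinct.
toy_dist = {
    frozenset(("A", "B")): 0.005,
    frozenset(("A", "C")): 0.30,
    frozenset(("B", "C")): 0.31,
}

kept = thin_taxa(["A", "B", "C"], toy_dist, cutoff=0.02)
print(kept)
```

A smaller cutoff retains more references (finer resolution, slower searches) and a larger cutoff retains fewer, which mirrors the trade-off described on the slide.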
Improving VI: Other Methods
• PhylOTU
• Kembel all markers
• Kembel copy # correction
Kembel Correction
Kembel SW, Wu M, Eisen JA, Green JL (2012) Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance. PLoS Comput Biol 8(10): e1002743. doi:10.1371/journal.pcbi.1002743
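The idea behind copy number correction can be shown with a small calculation: organisms with many 16S copies per genome are over-represented in read counts, so dividing by copy number gives a better estimate of relative organism abundance. This is a minimal sketch that assumes the copy numbers are already known; the cited paper estimates them phylogenetically, which is not reproduced here. The function name and the toy counts are hypothetical.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Divide each taxon's 16S read count by its 16S copy number,
    then renormalise to relative abundances that sum to 1."""
    per_genome = {t: read_counts[t] / copy_numbers[t] for t in read_counts}
    total = sum(per_genome.values())
    return {t: value / total for t, value in per_genome.items()}

# Hypothetical example: taxon_A carries 7 copies of the 16S gene, taxon_B only 1.
toy_reads = {"taxon_A": 70, "taxon_B": 30}
toy_copies = {"taxon_A": 7, "taxon_B": 1}

corrected = correct_for_copy_number(toy_reads, toy_copies)
print(corrected)
```

In this toy case taxon_A accounts for 70% of reads but only a quarter of the estimated organisms once its seven copies are taken into account.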
PhylOTU
[Figure 1 from Sharpton et al. 2011: PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders. doi:10.1371/journal.pcbi.1001061.g001]
Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
Kembel Combiner
[...] typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13).
Here we describe a quantitative version of UniFrac that we call “weighted UniFrac.” We show that weighted UniFrac behaves similarly to the FST test in situations where both a[...]
FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization.
Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
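The two UniFrac measures described in the figure caption can be computed directly from a list of branches. The sketch below assumes each branch is given as its length plus the number of descendant sequences from each of the two communities. The three toy branches are invented to mirror the caption (two squares, four circles) and are not taken from the figure, and the weighted measure shown is the raw, unnormalised form.

```python
def unweighted_unifrac(branches):
    """Fraction of branch length leading to descendants from only one community."""
    unique = sum(b["length"] for b in branches
                 if (b["square"] > 0) != (b["circle"] > 0))
    total = sum(b["length"] for b in branches
                if b["square"] > 0 or b["circle"] > 0)
    return unique / total

def weighted_unifrac(branches, total_square, total_circle):
    """Sum of branch lengths weighted by the difference in the relative
    abundance of each community's descendants (raw, unnormalised form)."""
    return sum(
        b["length"] * abs(b["square"] / total_square - b["circle"] / total_circle)
        for b in branches
    )

# Hypothetical tree: 2 square sequences and 4 circle sequences in total.
toy_branches = [
    {"length": 1.0, "square": 2, "circle": 0},  # leads only to squares
    {"length": 2.0, "square": 0, "circle": 4},  # leads only to circles
    {"length": 1.5, "square": 2, "circle": 4},  # shared; equal after normalisation
]

u = unweighted_unifrac(toy_branches)
w = weighted_unifrac(toy_branches, total_square=2, total_circle=4)
print(u, w)
```

The shared branch counts toward total branch length in the unweighted measure but contributes nothing to the weighted one, because both communities place the same fraction of their sequences below it.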
NMF in Metagenomes
Characterizing the niche-space distributions of components
[Figure panels (a) to (c): rows are sampling sites labelled by region, site ID and habitat (for example, Sargasso Sea_GS001c_Open Ocean); panel (a) shows Components 1 to 5; the environmental variables in panel (c) are Salinity, SampleDepth, Chlorophyll, Temperature, Insolation and WaterDepth, with levels High, Medium, Low and NA, and water depth classes 4000m, 2000-4000m, 900-2000m, 100-200m, 20-100m and 0-20m.]
Figure 3: a) Niche-space distributions for our five components (H^T); b) the site-similarity matrix (Ĥ^T Ĥ); c) environmental variables for the sites. The matrices are aligned so that the same row corresponds to the same site in each matrix. Sites are ordered by applying spectral reordering to the similarity matrix (see Materials and Methods). Rows are aligned across the three matrices.
Functional biogeography of ocean microbes revealed through non-negative matrix factorization. Jiang et al. In press PLoS One. Comes out 9/18.
w/ Weitz, Dushoff, Langille, Neches, Levin, etc
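Non-negative matrix factorization approximates a non-negative data matrix V as a product W H of two smaller non-negative matrices. The sketch below uses the standard multiplicative update rules for the squared-error objective; it is a generic illustration, not the procedure from the Jiang et al. paper. The toy matrix (4 rows by 3 sites), the choice of k=2 components and the function name `nmf` are assumptions made for the example.

```python
import numpy as np

def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (rows x sites) into W (rows x k) and
    H (k x sites) using multiplicative updates for squared error."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, k)) + eps
    H = rng.random((k, n_cols)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update component weights per site
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update component profiles
    return W, H

# Hypothetical abundance matrix: 4 functional categories (rows) by 3 sites (columns).
V = np.array([
    [5.0, 4.0, 0.0],
    [4.0, 5.0, 0.5],
    [0.0, 0.5, 6.0],
    [0.5, 0.0, 5.0],
])

W, H = nmf(V, k=2)
site_similarity = H.T @ H   # sites x sites, analogous to the matrix in panel (b)
print(np.round(site_similarity, 2))
```

Each column of H describes how strongly a site expresses each component, so H^T H scores two sites as similar when they share the same components.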
Other Uses of PhyloSift
• Integration with other tools (e.g., QIIME)
• LGT detection
• Contamination screening
• Synthetic Biology Orders
Improving VII: More Samples
The Built Environment
ORIGINAL ARTICLE
Architectural design influences the diversity and
structure of the built environment microbiome
Steven W Kembel1, Evan Jones1, Jeff Kline1,2, Dale Northcutt1,2, Jason Stenson1,2, Ann M Womack1, Brendan JM Bohannan1, G Z Brown1,2 and Jessica L Green1,3
1 Biology and the Built Environment Center, Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, OR, USA; 2 Energy Studies in Buildings Laboratory, Department of Architecture, University of Oregon, Eugene, OR, USA and 3 Santa Fe Institute, Santa Fe, NM, USA
Buildings are complex ecosystems that house trillions of microorganisms interacting with each
other, with humans and with their environment. Understanding the ecological and evolutionary
processes that determine the diversity and composition of the built environment microbiome—the
community of microorganisms that live indoors—is important for understanding the relationship
between building design, biodiversity and human health. In this study, we used high-throughput
sequencing of the bacterial 16S rRNA gene to quantify relationships between building attributes and
airborne bacterial communities at a health-care facility. We quantified airborne bacterial community
structure and environmental conditions in patient rooms exposed to mechanical or window
ventilation and in outdoor air. The phylogenetic diversity of airborne bacterial communities was
lower indoors than outdoors, and mechanically ventilated rooms contained less diverse microbial
communities than did window-ventilated rooms. Bacterial communities in indoor environments
contained many taxa that are absent or rare outdoors, including taxa closely related to potential
human pathogens. Building attributes, specifically the source of ventilation air, airflow rates, relative
humidity and temperature, were correlated with the diversity and composition of indoor bacterial
communities. The relative abundance of bacteria closely related to human pathogens was higher
indoors than outdoors, and higher in rooms with lower airflow rates and lower relative humidity.
The observed relationship between building design and airborne bacterial diversity suggests that
we can manage indoor environments, altering through building design and operation the community
of microbial species that potentially colonize the human microbiome during our time indoors.
The ISME Journal advance online publication, 26 January 2012; doi:10.1038/ismej.2011.211
Subject Category: microbial population and community ecology
Keywords: aeromicrobiology; bacteria; built environment microbiome; community ecology; dispersal;
environmental filtering
Introduction
Humans spend up to 90% of their lives indoors
(Klepeis et al., 2001). Consequently, the way we
microbiome—includes human pathogens and com-
mensals interacting with each other and with their
environment (Eames et al., 2009). There have been
few attempts to comprehensively survey the built
The ISME Journal (2012), 1–11
© 2012 International Society for Microbial Ecology All rights reserved 1751-7362/12
www.nature.com/ismej
Microbial Biogeography of Public Restroom Surfaces
Gilberto E. Flores1, Scott T. Bates1, Dan Knights2, Christian L. Lauber1, Jesse Stombaugh3, Rob Knight3,4, Noah Fierer1,5*
1 Cooperative Institute for Research in Environmental Science, University of Colorado, Boulder, Colorado, United States of America, 2 Department of Computer Science,
University of Colorado, Boulder, Colorado, United States of America, 3 Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado, United
States of America, 4 Howard Hughes Medical Institute, University of Colorado, Boulder, Colorado, United States of America, 5 Department of Ecology and Evolutionary
Biology, University of Colorado, Boulder, Colorado, United States of America
Abstract
We spend the majority of our lives indoors where we are constantly exposed to bacteria residing on surfaces. However, the
diversity of these surface-associated communities is largely unknown. We explored the biogeographical patterns exhibited
by bacteria across ten surfaces within each of twelve public restrooms. Using high-throughput barcoded pyrosequencing of
the 16S rRNA gene, we identified 19 bacterial phyla across all surfaces. Most sequences belonged to four phyla:
Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria. The communities clustered into three general categories: those
found on surfaces associated with toilets, those on the restroom floor, and those found on surfaces routinely touched with
hands. On toilet surfaces, gut-associated taxa were more prevalent, suggesting fecal contamination of these surfaces. Floor
surfaces were the most diverse of all communities and contained several taxa commonly found in soils. Skin-associated
bacteria, especially the Propionibacteriaceae, dominated surfaces routinely touched with our hands. Certain taxa were more
common in female than in male restrooms as vagina-associated Lactobacillaceae were widely distributed in female
restrooms, likely from urine contamination. Use of the SourceTracker algorithm confirmed many of our taxonomic
observations as human skin was the primary source of bacteria on restroom surfaces. Overall, these results demonstrate that
restroom surfaces host relatively diverse microbial communities dominated by human-associated bacteria with clear
linkages between communities on or in different body sites and those communities found on restroom surfaces. More
generally, this work is relevant to the public health field as we show that human-associated microbes are commonly found
on restroom surfaces suggesting that bacterial pathogens could readily be transmitted between individuals by the touching
of surfaces. Furthermore, we demonstrate that we can use high-throughput analyses of bacterial communities to determine
sources of bacteria on indoor surfaces, an approach which could be used to track pathogen transmission and test the
efficacy of hygiene practices.
Citation: Flores GE, Bates ST, Knights D, Lauber CL, Stombaugh J, et al. (2011) Microbial Biogeography of Public Restroom Surfaces. PLoS ONE 6(11): e28132.
doi:10.1371/journal.pone.0028132
Editor: Mark R. Liles, Auburn University, United States of America
Received September 12, 2011; Accepted November 1, 2011; Published November 23, 2011
Copyright: © 2011 Flores et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported with funding from the Alfred P. Sloan Foundation and their Indoor Environment program, and in part by the National
Institutes of Health and the Howard Hughes Medical Institute. The funders had no role in study design, data collection and analysis, decision to publish, or
preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: noah.fierer@colorado.edu
Introduction
More than ever, individuals across the globe spend a large
portion of their lives indoors, yet relatively little is known about the
microbial diversity of indoor environments. Of the studies that
have examined microorganisms associated with indoor environ-
ments, most have relied upon cultivation-based techniques to
detect organisms residing on a variety of household surfaces [1–5].
Not surprisingly, these studies have identified surfaces in kitchens
and restrooms as being hot spots of bacterial contamination.
Because several pathogenic bacteria are known to survive on
surfaces for extended periods of time [6–8], these studies are of
obvious importance in preventing the spread of human disease.
However, it is now widely recognized that the majority of
microorganisms cannot be readily cultivated [9] and thus, the
overall diversity of microorganisms associated with indoor
communities and revealed a greater diversity of bacteria on
indoor surfaces than captured using cultivation-based techniques
[10–13]. Most of the organisms identified in these studies are
related to human commensals suggesting that the organisms are
not actively growing on the surfaces but rather were deposited
directly (i.e. touching) or indirectly (e.g. shedding of skin cells) by
humans. Despite these efforts, we still have an incomplete
understanding of bacterial communities associated with indoor
environments because limitations of traditional 16S rRNA gene
cloning and sequencing techniques have made replicate sampling
and in-depth characterizations of the communities prohibitive.
With the advent of high-throughput sequencing techniques, we
can now investigate indoor microbial communities at an
unprecedented depth and begin to understand the relationship
between humans, microbes and the built environment.
In order to begin to comprehensively describe the microbial
the stall in), they were likely dispersed manually after women used
the toilet. Coupling these observations with those of the
distribution of gut-associated bacteria indicate that routine use of
toilets results in the dispersal of urine- and fecal-associated bacteria
throughout the restroom. While these results are not unexpected,
they do highlight the importance of hand-hygiene when using
public restrooms since these surfaces could also be potential
vehicles for the transmission of human pathogens. Unfortunately,
previous studies have documented that college students (who are
likely the most frequent users of the studied restrooms) are not
always the most diligent of hand-washers [42,43].
Results of SourceTracker analysis support the taxonomic
patterns highlighted above, indicating that human skin was the
primary source of bacteria on all public restroom surfaces
examined, while the human gut was an important source on or
around the toilet, and urine was an important source in women’s
restrooms (Figure 4, Table S4). Contrary to expectations (see
above), soil was not identified by the SourceTracker algorithm as
being a major source of bacteria on any of the surfaces, including
floors (Figure 4). Although the floor samples contained family-level
taxa that are common in soil, the SourceTracker algorithm
probably underestimates the relative importance of sources, like
Figure 3. Cartoon illustrations of the relative abundance of discriminating taxa on public restroom surfaces. Light blue indicates low
abundance while dark blue indicates high abundance of taxa. (A) Although skin-associated taxa (Propionibacteriaceae, Corynebacteriaceae,
Staphylococcaceae and Streptococcaceae) were abundant on all surfaces, they were relatively more abundant on surfaces routinely touched with
hands. (B) Gut-associated taxa (Clostridiales, Clostridiales group XI, Ruminococcaceae, Lachnospiraceae, Prevotellaceae and Bacteroidaceae) were most
abundant on toilet surfaces. (C) Although soil-associated taxa (Rhodobacteraceae, Rhizobiales, Microbacteriaceae and Nocardioidaceae) were in low
abundance on all restroom surfaces, they were relatively more abundant on the floor of the restrooms we surveyed. Figure not drawn to scale.
doi:10.1371/journal.pone.0028132.g003
Bacteria of Public Restrooms
high diversity of floor communities is likely due to the frequency of
contact with the bottom of shoes, which would track in a diversity
of microorganisms from a variety of sources including soil, which is
known to be a highly-diverse microbial habitat [27,39]. Indeed,
bacteria commonly associated with soil (e.g. Rhodobacteraceae,
Rhizobiales, Microbacteriaceae and Nocardioidaceae) were, on average,
more abundant on floor surfaces (Figure 3C, Table S2).
Interestingly, some of the toilet flush handles harbored bacterial
related differences in the relative abundances of s
some surfaces (Figure 1B, Table S2). Most notably
were clearly more abundant on certain surfaces
restrooms than male restrooms (Figure 1B). Some
family are the most common, and often most abun
found in the vagina of healthy reproductive age w
and are relatively less abundant in male urine
analysis of female urine samples collected as part
Figure 2. Relationship between bacterial communities associated with ten public restroom surfaces. Communities were
PCoA of the unweighted UniFrac distance matrix. Each point represents a single sample. Note that the floor (triangles) and toilet (as
form clusters distinct from surfaces touched with hands.
doi:10.1371/journal.pone.0028132.g002
time, the
un to take
of outside
om plants
ours after
ere shut
ortion of
e human
ck to pre-
which
26 Janu-
Journal,
hanically
had lower
y than ones with open win-
ility of fresh air translated
tions of microbes associ-
an body, and consequently,
pathogens. Although this
hat having natural airflow
Green says answering that
clinical data; she’s hoping
ital to participate in a study
ence of hospital-acquired
they move around. But to quantify those contributions, Peccia’s team has had to develop new methods to collect airborne bacteria and extract their DNA, as the microbes are much less abundant in air than on surfaces.
In one recent study, they used air filters to sample airborne particles and microbes in a classroom during 4 days during which students were present and 4 days during which the room was vacant. They measured
pant in indoor microbial ecology research, Peccia thinks that the field has yet to gel. And the Sloan Foundation’s Olsiewski shares some of his concern. “Everybody’s generating vast amounts of data,” she says, but looking across data sets can be difficult because groups choose different analytical tools. With Sloan support, though, a data archive and integrated analytical tools are in the works.
To foster collaborations between microbiologists, architects, and building scientists, the foundation also sponsored a symposium on the microbiome of the built environment at the 2011 Indoor Air conference in Austin,
Bathroom biogeography. By swabbing different surfaces in public restrooms, researchers determined that microbes vary in where they come from depending on the surface (chart).
[Chart: average contribution (%), 0 to 100, of each source (soil, water, mouth, urine, gut, skin) on each surface: door in, door out, stall in, stall out, faucet handles, soap dispenser, toilet seat, toilet flush handle, toilet floor, sink floor.]

Bayesian Taxonomic Assignment for the Next-Generation Metagenomics

  • 1. “Bayesian Taxonomic Assignment for the Next-Generation Metagenomics” Jonathan A. Eisen August 7, 2013 DHS Meeting Wednesday, August 7, 13
  • 7. Shotgun Metagenomics DNA Sequence Who is there? What are they doing?
  • 9. Shotgun Metagenomics • Which communities are most similar / different? • What accounts for the differences? • Natural vs. unnatural • Community level signatures (of events, stability, biogeography, etc)
  • 10. Our Approach - Phylogeny Phylogeny of sequences can reveal details about history, taxonomy, function, and ecology
  • 11. rRNA phylotyping. [Diagram: DNA extraction; PCR (makes lots of copies of the rRNA genes in the sample); sequencing of rRNA genes; sequence alignment (= data matrix); phylogenetic tree. Example reads: rRNA1 5’...ACACACATAGGTGGAGCTAGCGATCGATCGA... 3’; rRNA2 5’..TACAGTATAGGTGGAGCTAGCGACGATCGA... 3’; rRNA3 5’...ACGGCAAAATAGGTGGATTCTAGCGATATAGA... 3’; rRNA4 5’...ACGGCCCGATAGGTGGATTCTAGCGCCATAGA... 3’. The reads are aligned with E. coli, human, and yeast sequences (for example rRNA3 CACTGT, rRNA4 CACAGT, yeast TACAGT) and placed on the tree.]
  • 12. Uses of Phylogeny in Metagenomics • Taxonomic assessment • Phylogenetic OTUs • Phylogenetic taxonomy assignment • Phylogenetic binning • Sample comparisons and hypothesis testing • Alpha diversity (i.e., PD) • Beta diversity • Trait evolution • Dispersal • Functional predictions • Rates of evolution • Convergence
  • 13. Venter et al., Science 304: 66. 2004 rRNA Phylotyping - Sargasso Metagenome
  • 14. Venter et al., Science 304: 66. 2004 RecA Phylotyping - Sargasso Metagenome
  • 16. GOS 1 GOS 2 GOS 3 GOS 4 GOS 5 Phylogenetic ID of Novel Lineages Wu et al PLoS One 2011
  • 17. Wu et al. 2006 PLoS Biology 4: e188. Baumannia makes vitamins and cofactors Sulcia makes amino acids Phylogenetic Binning
  • 18. Phylogenetic Functional Prediction Venter et al., Science 304: 66. 2004
  • 19. Sequencing Revolution • More Samples • Deeper sequencing • The rare biosphere • Relative abundance estimates • More samples (with barcoding) • Time series • Spatially diverse sampling • Fine scale sampling
  • 21. Acknowledgements Jonathan Eisen Students and other staff: - Eric Lowe, John Zhang, David Coil Open source community: - BLAST, LAST, HMMER, Infernal, pplacer, Krona, metAMOS, Bioperl, Bio::Phylo, JSON, etc. etc. PhyloSift is open source software: - Website: http://phylosift.wordpress.org - Code: http://github.com/gjospin/phylosift Erick Matsen FHCRC Todd Treangen BNBI, NBACC Holly Bik Tiffanie Nelson Mark Brown Aaron Darling Guillaume Jospin Supported by DHS Grant
  • 23. PhyloSift inputs: metagenomic reads, contigs, genes. Analysis & Summary.
  • 24. PhyloSift. Searching inputs against reference family DB. Analysis & Summary.
  • 25. PhyloSift. Align to reference HMMs for each family. Analysis & Summary.
  • 26. PhyloSift. Place reads into reference phylogeny using pplacer. Analysis & Summary.
  • 27. PhyloSift. Summarize results & additional analyses. Analysis & Summary.
  • 29. Taxonomic Summaries (via Krona): taxonomic summary plots in Krona (Ondov et al 2011)
  • 32. Tree Reconciliation in PhyloSift
  • 33. Tree Reconciliation in PhyloSift: Environmental Sequences, Named Taxa
  • 34. Output 2: Phylogenetic Tree of Reads
  • 35. PhyloSift Tree Browsing (Darling et al Submitted): placement tree from 2 week old infant gut data
  • 36. Output 3: Edge PCA. Edge PCA for exploratory data analysis (Matsen and Evans 2013). Given E edges and S samples: (1) for each edge, calculate the difference in placement mass on either side of the edge; (2) this results in an E x S matrix; (3) calculate the E x E covariance matrix; (4) calculate the eigenvectors and eigenvalues of the covariance matrix. Eigenvector: each value indicates how “important” an edge is in explaining differences among the S samples. Example of calculating a matrix entry for an edge: with mass = 5 on one side and mass = 2 on the other, the edge gets 5 - 2 = 3.
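
The Edge PCA recipe on slide 36 can be sketched in plain Python. This is only an illustration under stated assumptions, not the PhyloSift or Matsen and Evans implementation: the tree is a hypothetical `children` dictionary, placement mass is attached to nodes and counted on the distal side of the edge above each node, and all function names are made up for this example. Power iteration stands in for a full eigendecomposition and returns only the leading eigenvector; a numerical library would normally be used instead.

```python
import math


def edge_differences(children, root, mass):
    """Map each edge (named by its child node) to mass below it minus mass elsewhere."""
    total = sum(mass.values())
    diffs = {}

    def subtree_mass(node):
        below = mass.get(node, 0.0) + sum(subtree_mass(c) for c in children.get(node, []))
        if node != root:
            diffs[node] = below - (total - below)
        return below

    subtree_mass(root)
    return diffs


def covariance(rows):
    """E x E covariance of an E x S matrix, treating the S samples as observations."""
    n = len(rows[0])
    centered = [[x - sum(row) / n for x in row] for row in rows]
    return [[sum(a * b for a, b in zip(r1, r2)) / (n - 1) for r2 in centered]
            for r1 in centered]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration: approximates the eigenvector with the largest eigenvalue."""
    vec = [float(i + 1) for i in range(len(matrix))]  # arbitrary non-uniform start
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    return vec


def edge_pca(children, root, samples):
    """Return the edge names and the leading eigenvector of the edge covariance."""
    per_sample = [edge_differences(children, root, m) for m in samples]
    edges = sorted(per_sample[0])
    matrix = [[d[e] for d in per_sample] for e in edges]  # E rows, S columns
    return edges, leading_eigenvector(covariance(matrix))


# Toy data (hypothetical): a root with two child edges and three samples.
children = {"root": ["A", "B"]}
samples = [{"A": 5.0, "B": 2.0}, {"A": 2.0, "B": 5.0}, {"A": 4.0, "B": 3.0}]
print(edge_differences(children, "root", samples[0]))  # {'A': 3.0, 'B': -3.0}
print(edge_pca(children, "root", samples))
```

The first sample reproduces the slide's worked entry: mass 5 on one side of edge A and mass 2 on the other gives 5 - 2 = 3. In this toy tree the two edges carry equal and opposite weights in the leading eigenvector, because mass gained on one side is lost on the other.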
  • 37. Edge PCA: Identify lineages that explain most variation among samples. Matsen and Evans 2013, Darling et al Submitted.
  • 38. Edge PCA vs. UNIFRAC PCA. QIIME and Edge PCA on 110 fecal metagenomes from Yatsunenko et al 2012 Nature. Sequenced with 454, to about 150 Mbp/metagenome. Darling et al Submitted.
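
For comparison with the UniFrac side of slide 38, the following is a minimal sketch of the unweighted UniFrac distance between two samples: the fraction of covered branch length that leads to tips from only one of the two samples. The `parent` and `length` dictionaries, the toy tree, and the function name are assumptions made for this illustration; this is not the QIIME implementation, and it counts branches up to the root of the given tree.

```python
def unweighted_unifrac(parent, length, tips_a, tips_b):
    """Unique branch length divided by total branch length covered by either sample."""

    def covered_edges(tips):
        # An edge is named by the node below it; walk from each tip to the root.
        edges = set()
        for tip in tips:
            node = tip
            while node in parent and node not in edges:
                edges.add(node)
                node = parent[node]
        return edges

    a = covered_edges(tips_a)
    b = covered_edges(tips_b)
    total = sum(length[e] for e in a | b)
    shared = sum(length[e] for e in a & b)
    return (total - shared) / total if total else 0.0


# Toy tree (hypothetical): root -> X (1.0), root -> C (3.0), X -> A (2.0), X -> B (2.0).
parent = {"X": "root", "C": "root", "A": "X", "B": "X"}
length = {"X": 1.0, "C": 3.0, "A": 2.0, "B": 2.0}

print(unweighted_unifrac(parent, length, {"A", "B"}, {"A", "C"}))  # 0.625
```

In the example, the branches to A and X are shared (length 3.0) and the branches to B and C are unique (length 5.0), so the distance is 5.0 / 8.0 = 0.625. A matrix of such pairwise distances is what a PCoA, as in the UniFrac plots, takes as input.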
  • 41. PhyloSift inputs: metagenomic reads, contigs, genes. Analysis & Summary.
  • 42. PhyloSift inputs: metagenomic reads, contigs, genes. Challenge - Short Non Overlapping Reads. Analysis & Summary.
  • 43. PhyloSift. Searching inputs against reference family DB. Analysis & Summary.
  • 44. Markers • PMPROK – Dongying Wu’s Bac/Arch markers • Eukaryotic Orthologs – Parfrey 2011 paper • 16S/18S rRNA • Mitochondria - protein-coding genes • Viral Markers – Markov clustering on genomes • Codon Subtrees – finer scale taxonomy • Extended Markers – plastids, gene families
  • 46. PhyloSift. Searching inputs against reference family DB. Challenges: • Limited ref. genomes • Limited markers, families. Analysis & Summary.
  • 47. Improving I: More Markers (Wu et al. PLOS One 2013, in press)
      Phylogenetic group       Genome number   Gene number   Marker candidates
      Archaea                  62              145415        106
      Actinobacteria           63              267783        136
      Alphaproteobacteria      94              347287        121
      Betaproteobacteria       56              266362        311
      Gammaproteobacteria      126             483632        118
      Deltaproteobacteria      25              102115        206
      Epsilonproteobacteria    18              33416         455
      Bacteroides              25              71531         286
      Chlamydiae               13              13823         560
      Chloroflexi              10              33577         323
      Cyanobacteria            36              124080        590
      Firmicutes               106             312309        87
      Spirochaetes             18              38832         176
      Thermi                   5               14160         974
      Thermotogae              9               17037         684
  • 48. Improving II: More Families. [Figure 1 of Sharpton et al. 2013, panels A, B, C: Representative Genomes, Extract Protein Annotation, All v. All BLAST, Homology Clustering (MCL), SFams; Align & Build HMMs, HMMs; New Genomes, Extract Protein Annotation, Screen for Homologs.]
  • 49. Improving III: Filling in the Tree. Figure from Barton, Eisen et al. “Evolution”, CSHL Press, based on Baldauf et al Tree
  • 50. Genomic Encyclopedia of Bacteria & Archaea. Wu et al. 2009 Nature 462, 1056-1060. Figure from Barton, Eisen et al. “Evolution”, CSHL Press, based on Baldauf et al Tree
  • 51. Genomic Encyclopedia of Bacteria & Archaea. Wu et al. 2009 Nature 462, 1056-1060. Figure from Barton, Eisen et al. “Evolution”, CSHL Press, based on Baldauf et al Tree
  • 52. Family Diversity vs. PD. Wu et al. 2009 Nature 462, 1056-1060
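
PD on slide 52 presumably refers to phylogenetic diversity, which the deck lists earlier as an alpha diversity measure. One common definition, Faith's PD, is the total branch length of the subtree that connects a set of taxa to the root. The sketch below assumes a hypothetical tree encoded as `parent` and `length` dictionaries and a made-up function name; it is not taken from Wu et al. 2009.

```python
def faith_pd(parent, length, tips):
    """Sum the branch lengths on the paths from the given tips to the root, each once."""
    counted = set()
    total = 0.0
    for tip in tips:
        node = tip
        # Stop at the root (no parent) or at an edge that was already counted.
        while node in parent and node not in counted:
            counted.add(node)
            total += length[node]
            node = parent[node]
    return total


# Toy tree (hypothetical): root -> X (1.0), root -> C (3.0), X -> A (2.0), X -> B (2.0).
parent = {"X": "root", "C": "root", "A": "X", "B": "X"}
length = {"X": 1.0, "C": 3.0, "A": 2.0, "B": 2.0}

print(faith_pd(parent, length, {"A", "B"}))  # 5.0
print(faith_pd(parent, length, {"A", "C"}))  # 6.0
```

Taxa A and B are close relatives and share the branch to X, so together they span less branch length (5.0) than A and C (6.0). This is the sense in which sampling phylogenetically distant genomes adds more PD than sampling close relatives.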
  • 53. The Dark Matter of Biology. From Wu et al. 2009 Nature 462, 1056-1060
  • 54. GEBA Uncultured (Phil Hugenholtz): Number of SAGs from Candidate Phyla
      Site                                    OD1   OP11   OP3   SAR406
      Site A: Hydrothermal vent               4     1      -     -
      Site B: Gold Mine                       6     13     2     -
      Site C: Tropical gyres (Mesopelagic)    -     -      -     2
      Site D: Tropical gyres (Photic zone)    1     -      -     -
    Sample collections at 4 additional sites are underway.
  • 55. JGI Dark Matter Project (Woyke et al. Nature 2013). [Figure: workflow from environmental samples (n=9), isolation of single cells (n=9,600), whole genome amplification (n=3,300), and SSU rRNA gene based identification (n=2,000) to genome sequencing, assembly and QC (n=201) and draft genomes (n=201); sample habitats: seawater, brackish/freshwater, hydrothermal, sediment, bioreactor; phylogenetic tree of bacterial and archaeal phyla and candidate phyla, for example OP3 (Omnitrophica), OP11 (Microgenomates), OD1 (Parcubacteria), GN02 (Gracilibacteria), Nanoarchaea, and pMC2A384 (Diapherotrites); inset panels: archaeal toxins (Nanoarchaea), lytic murein transglycosylase, stringent response (Diapherotrites, Nanoarchaea), sigma factor (Diapherotrites, Nanoarchaea), oxidoreductase, HGT from Eukaryotes (Nanoarchaea), archaeal type purine synthesis (Microgenomates), UGA recoded for Gly (Gracilibacteria).]
  • 59. A Genomic Encyclopedia of Microbes (GEM). Figure from Barton, Eisen et al. “Evolution”, CSHL Press, based on Baldauf et al. tree.
  • 61. Analysis Summary (PhyloSift): Align to reference HMMs for each family.
  • 62. Analysis Summary (PhyloSift): Align to reference HMMs for each family. Challenge: How to align?
  • 63. Zorro - Automated Masking. [Figure: distance to true tree (y-axis, 0.0 to 9.0) versus sequence length (x-axis, 200 to 3200) for three treatments: no masking, zorro, gblocks.] Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/journal.pone.0030288
  • 64. Analysis Summary (PhyloSift): Place reads into reference phylogeny using pplacer.
  • 65. Analysis Summary (PhyloSift): Place reads into reference phylogeny using pplacer. Challenges:
      • Trees from short reads
      • Probabilistic methods
  • 66. Improving IV: Better Reference Tree. Lang et al. 2013.
  • 67. Analysis Summary (PhyloSift): Summarize results; additional analyses.
  • 68. Phylosift DB Update
    Genome sources: EBI Genomes, JGI Genomes, Private Genomes, NCBI Genomes.
    Workflow boxes: Run PhyloSift (search + align); Update reference sequences with new data; Amino Acid Tree; Nucleotide Tree; Prune Tree (shown twice); Infer Updated Tree; Tree Reconciliation; Package Markers; Automated Download to PhyloSift Users.
    Annotations:
      • Execute dbupdate mode.
      • Add new sequences to marker packages.
      • New sequences added at 0.25 PD for amino acid tree; higher PD threshold enables more aggressive searches of reference database, since LAST searching is faster with fewer sequences.
      • A taxa set is selected with a maxPD cutoff of 0.02 and a new tree is inferred.
      • A taxa set is selected with a maxPD cutoff of 0.01 and a new tree is inferred.
      • Reconcile NCBI taxonomy IDs with phylogenetic topologies, for both amino acid tree and codon subtrees.
      • Users' local marker databases are automatically scanned each time PhyloSift is run and any new updates are automatically downloaded if available.
  • 69. Improving VI: Other Methods
      • PhylOTU
      • Kembel all markers
      • Kembel copy # correction
  • 70. Kembel Correction. Kembel SW, Wu M, Eisen JA, Green JL (2012) Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance. PLoS Comput Biol 8(10): e1002743. doi:10.1371/journal.pcbi.1002743
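The idea named in the Kembel Correction title can be illustrated with a minimal sketch. It assumes the per-taxon 16S copy numbers are already known; the cited paper estimates them, and that step is not shown here. The function name, taxon names and numbers are hypothetical and are not taken from the paper or from any PhyloSift API.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Turn 16S read counts into copy-number-corrected relative abundances.

    read_counts:  dict taxon -> number of 16S reads observed (hypothetical data)
    copy_numbers: dict taxon -> assumed 16S gene copies per genome
    """
    per_genome = {}
    for taxon, reads in read_counts.items():
        copies = copy_numbers[taxon]
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        # A genome with more 16S copies contributes more reads per cell,
        # so divide the reads by the copy number to approximate cell counts.
        per_genome[taxon] = reads / copies
    total = sum(per_genome.values())
    if total == 0:
        return {taxon: 0.0 for taxon in per_genome}
    # Renormalize so the corrected abundances sum to 1.
    return {taxon: value / total for taxon, value in per_genome.items()}


# Hypothetical example: taxon_A carries 7 copies of the 16S gene, taxon_B 1.
reads = {"taxon_A": 700, "taxon_B": 300}
copies = {"taxon_A": 7, "taxon_B": 1}
corrected = correct_for_copy_number(reads, copies)
print(corrected)
```

In this toy case taxon_A makes up 70% of the reads but a much smaller share of the estimated cells, which is the kind of bias the correction is meant to reduce.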
  • 71. PhylOTU
    [...] alignment used to build the profile, resulting in a multiple sequence alignment of full-length reference sequences and metagenomic reads. The final step of the alignment process is a [...] PD versus PID clustering, 2) to explore overlap between [...] clusters and recognized taxonomic designations, and [...] the accuracy of PhylOTU clusters from shotgun re[...]
    Figure 1. PhylOTU Workflow. Computational processes are represented as squares and databases are represented as cylinders in [...] workflow of PhylOTU. See Results section for details. doi:10.1371/journal.pcbi.1001061.g001
    Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O'Dwyer JP, Green JL, Eisen JA, Pollard KS. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
  • 72. Kembel Combiner
    [...] typically used as a qualitative measure because duplicate sequences are usually removed from the tree. However, the test may be used in a semiquantitative manner if all clones, even those with identical or near-identical sequences, are included in the tree (13). Here we describe a quantitative version of UniFrac that we call “weighted UniFrac.” We show that weighted UniFrac behaves similarly to the FST test in situations where both a[...]
    FIG. 1. Calculation of the unweighted and the weighted UniFrac measures. Squares and circles represent sequences from two different environments. (a) In unweighted UniFrac, the distance between the circle and square communities is calculated as the fraction of the branch length that has descendants from either the square or the circle environment (black) but not both (gray). (b) In weighted UniFrac, branch lengths are weighted by the relative abundance of sequences in the square and circle communities; square sequences are weighted twice as much as circle sequences because there are twice as many total circle sequences in the data set. The width of branches is proportional to the degree to which each branch is weighted in the calculations, and gray branches have no weight. Branches 1 and 2 have heavy weights since the descendants are biased toward the square and circles, respectively. Branch 3 contributes no value since it has an equal contribution from circle and square sequences after normalization.
    Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
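The two calculations described in the FIG. 1 caption can be sketched on a toy tree. This is a minimal illustration, not the implementation used by any particular tool: the tree is given as a list of (branch length, set of descendant leaves) pairs, the weighted value is the raw (unnormalized) sum over branches, and the tree, leaf names and counts are all hypothetical.

```python
def reads_below(leaves, community):
    """Number of sequences from one community that descend from a branch."""
    return sum(community.get(leaf, 0) for leaf in leaves)


def unweighted_unifrac(branches, comm_a, comm_b):
    """Fraction of observed branch length leading to only one community."""
    unique = 0.0
    observed = 0.0
    for length, leaves in branches:
        in_a = reads_below(leaves, comm_a) > 0
        in_b = reads_below(leaves, comm_b) > 0
        if in_a or in_b:
            observed += length
            if in_a != in_b:  # descendants from exactly one community
                unique += length
    return unique / observed if observed else 0.0


def weighted_unifrac(branches, comm_a, comm_b):
    """Raw weighted UniFrac: branch lengths weighted by the difference in
    relative abundance of the two communities below each branch."""
    total_a = sum(comm_a.values())
    total_b = sum(comm_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both communities need at least one sequence")
    return sum(
        length * abs(reads_below(leaves, comm_a) / total_a
                     - reads_below(leaves, comm_b) / total_b)
        for length, leaves in branches
    )


# Hypothetical tree ((a:1,b:1):1,(c:1,d:1):1) written branch by branch.
branches = [
    (1.0, {"a"}), (1.0, {"b"}), (1.0, {"a", "b"}),
    (1.0, {"c"}), (1.0, {"d"}), (1.0, {"c", "d"}),
]
squares = {"a": 1, "b": 1}   # sequence counts per leaf, community 1
circles = {"b": 1, "c": 2}   # sequence counts per leaf, community 2

u = unweighted_unifrac(branches, squares, circles)
w = weighted_unifrac(branches, squares, circles)
print(u, w)
```

Dividing each count by the community total is the normalization step the caption refers to: it keeps the community with more sequences from dominating the weighted sum.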
  • 73. NMF in Metagenomes: Characterizing the niche-space distributions of components
    [Figure panels (a), (b), (c). Rows: sampling sites labeled by region, site ID and habitat, e.g. "Sargasso Sea_GS001c_Open Ocean"; regions are North American East Coast, Eastern Tropical Pacific, Polynesia Archipelagos, Galapagos Islands, Indian Ocean, Sargasso Sea and Caribbean Sea. Columns in (a): Component 1 to Component 5. Columns in (c): Salinity, Sample Depth, Chlorophyll, Temperature, Insolation, Water Depth. Legend: General (High, Medium, Low, NA); Water depth (0-20 m, 20-100 m, 100-200 m, 900-2000 m, 2000-4000 m, 4000 m).]
    Figure 3: a) Niche-space distributions for our five components (H^T); b) the site-similarity matrix (Ĥ^T Ĥ); c) environmental variables for the sites. The matrices are aligned so that the same row corresponds to the same site in each matrix. Sites are ordered by applying spectral reordering to the similarity matrix (see Materials and Methods). Rows are aligned across the three matrices.
    Functional biogeography of ocean microbes revealed through non-negative matrix factorization. Jiang et al. In press PLoS One. Comes out 9/18. w/ Weitz, Dushoff, Langille, Neches, Levin, etc.
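The matrices named in the Figure 3 caption can be illustrated with a small sketch of non-negative matrix factorization. It is an illustration under stated assumptions, not the method of the cited paper: it uses the standard multiplicative-update rules for squared error, it assumes the input matrix has one row per function (for example a protein family) and one column per site, and it assumes the hat on H means that each site's column is scaled to unit length. All names and numbers are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor v (functions x sites) into w (functions x k) and h (k x sites)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # Multiplicative update for h; entries stay non-negative.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # Multiplicative update for w.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


def site_similarity(h, eps=1e-12):
    """Site-by-site similarity from h with each site's column scaled to unit length."""
    sites = transpose(h)  # one row per site: that site's weight on each component
    unit = []
    for row in sites:
        norm = sum(x * x for x in row) ** 0.5
        unit.append([x / (norm + eps) for x in row])
    return matmul(unit, transpose(unit))


# Hypothetical abundance matrix: 4 functions (rows) by 4 sites (columns).
abundance = [
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 0.0, 4.0, 3.0],
]
w_mat, h_mat = nmf(abundance, k=2)
sim = site_similarity(h_mat)
print(squared_error(abundance, w_mat, h_mat))
```

Here the rows of `transpose(h_mat)` play the role of the per-site component weights in panel (a), and `sim` plays the role of the site-similarity matrix in panel (b); sites that share functions would be expected to score as more similar.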
  • 74. Other Uses of PhyloSift
      • Integration with other tools (e.g., QIIME)
      • LGT detection
      • Contamination screening
      • Synthetic Biology Orders
  • 75. Phylosift DB Update (workflow figure repeated from slide 68).
  • 76. Improving VII: More Samples
  • 77. The Built Environment

    ORIGINAL ARTICLE: Architectural design influences the diversity and structure of the built environment microbiome
    Steven W Kembel (1), Evan Jones (1), Jeff Kline (1,2), Dale Northcutt (1,2), Jason Stenson (1,2), Ann M Womack (1), Brendan JM Bohannan (1), G Z Brown (1,2) and Jessica L Green (1,3)
    (1) Biology and the Built Environment Center, Institute of Ecology and Evolution, Department of Biology, University of Oregon, Eugene, OR, USA; (2) Energy Studies in Buildings Laboratory, Department of Architecture, University of Oregon, Eugene, OR, USA; (3) Santa Fe Institute, Santa Fe, NM, USA

    Buildings are complex ecosystems that house trillions of microorganisms interacting with each other, with humans and with their environment. Understanding the ecological and evolutionary processes that determine the diversity and composition of the built environment microbiome, the community of microorganisms that live indoors, is important for understanding the relationship between building design, biodiversity and human health. In this study, we used high-throughput sequencing of the bacterial 16S rRNA gene to quantify relationships between building attributes and airborne bacterial communities at a health-care facility. We quantified airborne bacterial community structure and environmental conditions in patient rooms exposed to mechanical or window ventilation and in outdoor air. The phylogenetic diversity of airborne bacterial communities was lower indoors than outdoors, and mechanically ventilated rooms contained less diverse microbial communities than did window-ventilated rooms. Bacterial communities in indoor environments contained many taxa that are absent or rare outdoors, including taxa closely related to potential human pathogens. Building attributes, specifically the source of ventilation air, airflow rates, relative humidity and temperature, were correlated with the diversity and composition of indoor bacterial communities. The relative abundance of bacteria closely related to human pathogens was higher indoors than outdoors, and higher in rooms with lower airflow rates and lower relative humidity. The observed relationship between building design and airborne bacterial diversity suggests that we can manage indoor environments, altering through building design and operation the community of microbial species that potentially colonize the human microbiome during our time indoors.

    The ISME Journal advance online publication, 26 January 2012; doi:10.1038/ismej.2011.211. The ISME Journal (2012), 1–11. www.nature.com/ismej
    Subject Category: microbial population and community ecology
    Keywords: aeromicrobiology; bacteria; built environment microbiome; community ecology; dispersal; environmental filtering

    Introduction: Humans spend up to 90% of their lives indoors (Klepeis et al., 2001). Consequently, the way we [...] microbiome [...] includes human pathogens and commensals interacting with each other and with their environment (Eames et al., 2009). There have been few attempts to comprehensively survey the built [...]

    Microbial Biogeography of Public Restroom Surfaces
    Gilberto E. Flores (1), Scott T. Bates (1), Dan Knights (2), Christian L. Lauber (1), Jesse Stombaugh (3), Rob Knight (3,4), Noah Fierer (1,5)*
    (1) Cooperative Institute for Research in Environmental Science, (2) Department of Computer Science, (3) Department of Chemistry and Biochemistry, (4) Howard Hughes Medical Institute, (5) Department of Ecology and Evolutionary Biology; all at University of Colorado, Boulder, Colorado, United States of America. * E-mail: noah.fierer@colorado.edu

    Abstract: We spend the majority of our lives indoors where we are constantly exposed to bacteria residing on surfaces. However, the diversity of these surface-associated communities is largely unknown. We explored the biogeographical patterns exhibited by bacteria across ten surfaces within each of twelve public restrooms. Using high-throughput barcoded pyrosequencing of the 16S rRNA gene, we identified 19 bacterial phyla across all surfaces. Most sequences belonged to four phyla: Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria. The communities clustered into three general categories: those found on surfaces associated with toilets, those on the restroom floor, and those found on surfaces routinely touched with hands. On toilet surfaces, gut-associated taxa were more prevalent, suggesting fecal contamination of these surfaces. Floor surfaces were the most diverse of all communities and contained several taxa commonly found in soils. Skin-associated bacteria, especially the Propionibacteriaceae, dominated surfaces routinely touched with our hands. Certain taxa were more common in female than in male restrooms as vagina-associated Lactobacillaceae were widely distributed in female restrooms, likely from urine contamination. Use of the SourceTracker algorithm confirmed many of our taxonomic observations as human skin was the primary source of bacteria on restroom surfaces. Overall, these results demonstrate that restroom surfaces host relatively diverse microbial communities dominated by human-associated bacteria with clear linkages between communities on or in different body sites and those communities found on restroom surfaces. More generally, this work is relevant to the public health field as we show that human-associated microbes are commonly found on restroom surfaces suggesting that bacterial pathogens could readily be transmitted between individuals by the touching of surfaces. Furthermore, we demonstrate that we can use high-throughput analyses of bacterial communities to determine sources of bacteria on indoor surfaces, an approach which could be used to track pathogen transmission and test the efficacy of hygiene practices.

    Citation: Flores GE, Bates ST, Knights D, Lauber CL, Stombaugh J, et al. (2011) Microbial Biogeography of Public Restroom Surfaces. PLoS ONE 6(11): e28132. doi:10.1371/journal.pone.0028132
    Editor: Mark R. Liles, Auburn University, United States of America. Received September 12, 2011; Accepted November 1, 2011; Published November 23, 2011.
    Copyright: 2011 Flores et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
    Funding: This work was supported with funding from the Alfred P. Sloan Foundation and their Indoor Environment program, and in part by the National Institutes of Health and the Howard Hughes Medical Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
    Competing Interests: The authors have declared that no competing interests exist.

    Introduction: More than ever, individuals across the globe spend a large portion of their lives indoors, yet relatively little is known about the microbial diversity of indoor environments. Of the studies that have examined microorganisms associated with indoor environments, most have relied upon cultivation-based techniques to detect organisms residing on a variety of household surfaces [1–5]. Not surprisingly, these studies have identified surfaces in kitchens and restrooms as being hot spots of bacterial contamination. Because several pathogenic bacteria are known to survive on surfaces for extended periods of time [6–8], these studies are of obvious importance in preventing the spread of human disease. However, it is now widely recognized that the majority of microorganisms cannot be readily cultivated [9] and thus, the overall diversity of microorganisms associated with indoor [...] communities and revealed a greater diversity of bacteria on indoor surfaces than captured using cultivation-based techniques [10–13]. Most of the organisms identified in these studies are related to human commensals suggesting that the organisms are not actively growing on the surfaces but rather were deposited directly (i.e. touching) or indirectly (e.g. shedding of skin cells) by humans. Despite these efforts, we still have an incomplete understanding of bacterial communities associated with indoor environments because limitations of traditional 16S rRNA gene cloning and sequencing techniques have made replicate sampling and in-depth characterizations of the communities prohibitive. With the advent of high-throughput sequencing techniques, we can now investigate indoor microbial communities at an unprecedented depth and begin to understand the relationship between humans, microbes and the built environment. In order to begin to comprehensively describe the microbial [...]

    [...] the stall in), they were likely dispersed manually after women used the toilet. Coupling these observations with those of the distribution of gut-associated bacteria indicate that routine use of toilets results in the dispersal of urine- and fecal-associated bacteria throughout the restroom. While these results are not unexpected, they do highlight the importance of hand-hygiene when using public restrooms since these surfaces could also be potential vehicles for the transmission of human pathogens. Unfortunately, previous studies have documented that college students (who are likely the most frequent users of the studied restrooms) are not always the most diligent of hand-washers [42,43]. Results of SourceTracker analysis support the taxonomic patterns highlighted above, indicating that human skin was the primary source of bacteria on all public restroom surfaces examined, while the human gut was an important source on or around the toilet, and urine was an important source in women's restrooms (Figure 4, Table S4). Contrary to expectations (see above), soil was not identified by the SourceTracker algorithm as being a major source of bacteria on any of the surfaces, including floors (Figure 4). Although the floor samples contained family-level taxa that are common in soil, the SourceTracker algorithm probably underestimates the relative importance of sources, like [...]

    Figure 3. Cartoon illustrations of the relative abundance of discriminating taxa on public restroom surfaces. Light blue indicates low abundance while dark blue indicates high abundance of taxa. (A) Although skin-associated taxa (Propionibacteriaceae, Corynebacteriaceae, Staphylococcaceae and Streptococcaceae) were abundant on all surfaces, they were relatively more abundant on surfaces routinely touched with hands. (B) Gut-associated taxa (Clostridiales, Clostridiales group XI, Ruminococcaceae, Lachnospiraceae, Prevotellaceae and Bacteroidaceae) were most abundant on toilet surfaces. (C) Although soil-associated taxa (Rhodobacteraceae, Rhizobiales, Microbacteriaceae and Nocardioidaceae) were in low abundance on all restroom surfaces, they were relatively more abundant on the floor of the restrooms we surveyed. Figure not drawn to scale. doi:10.1371/journal.pone.0028132.g003

    [...] high diversity of floor communities is likely due to the frequency of contact with the bottom of shoes, which would track in a diversity of microorganisms from a variety of sources including soil, which is known to be a highly-diverse microbial habitat [27,39]. Indeed, bacteria commonly associated with soil (e.g. Rhodobacteraceae, Rhizobiales, Microbacteriaceae and Nocardioidaceae) were, on average, more abundant on floor surfaces (Figure 3C, Table S2). Interestingly, some of the toilet flush handles harbored bacterial [...]

    [...] related differences in the relative abundances of [...] some surfaces (Figure 1B, Table S2). Most notably [...] were clearly more abundant on certain surfaces [...] restrooms than male restrooms (Figure 1B). [...]

    Figure 2. Relationship between bacterial communities associated with ten public restroom surfaces. Communities were [...] PCoA of the unweighted UniFrac distance matrix. Each point represents a single sample. Note that the floor (triangles) and toilet [...] form clusters distinct from surfaces touched with hands. doi:10.1371/journal.pone.0028132.g002

    [News article excerpt] [...] they move around. But to quantify those contributions, Peccia's team has had to develop new methods to collect airborne bacteria and extract their DNA, as the microbes are much less abundant in air than on surfaces. In one recent study, they used air filters to sample airborne particles and microbes in a classroom during 4 days during which students were present and 4 days during which the room was vacant. They measured [...] in indoor microbial ecology research, Peccia thinks that the field has yet to gel. And the Sloan Foundation's Olsiewski shares some of his concern. “Everybody's generating vast amounts of data,” she says, but looking across data sets can be difficult because groups choose different analytical tools. With Sloan support, though, a data archive and integrated analytical tools are in the works. To foster collaborations between microbiologists, architects, and building scientists, the foundation also sponsored a symposium on the microbiome of the built environment at the 2011 Indoor Air conference in Austin, [...]

    [Chart: average contribution (%), 0 to 100, of each source (Soil, Water, Mouth, Urine, Gut, Skin) for each surface: Door in, Door out, Stall in, Stall out, Faucet handles, Soap dispenser, Toilet seat, Toilet flush handle, Toilet floor, Sink floor.]
    Bathroom biogeography. By swabbing different surfaces in public restrooms, researchers determined that microbes vary in where they come from depending on the surface (chart).
  • 80. Acknowledgements
    Jonathan Eisen
    Students and other staff: Eric Lowe, John Zhang, David Coil
    Open source community: BLAST, LAST, HMMER, Infernal, pplacer, Krona, metAMOS, Bioperl, Bio::Phylo, JSON, etc.
    PhyloSift is open source software:
      - Website: http://phylosift.wordpress.org
      - Code: http://github.com/gjospin/phylosift
    Erick Matsen (FHCRC); Todd Treangen (BNBI, NBACC); Holly Bik; Tiffanie Nelson; Mark Brown; Aaron Darling; Guillaume Jospin
    Supported by DHS Grant