UC Davis EVE 161 Lecture 7 - rRNA workflows - by Jonathan Eisen @phylogenomics
1. Lecture 7:
EVE 161:
Microbial Phylogenomics
Lecture #7:
Era II: rRNA sequencing and analysis
UC Davis, Winter 2014
Instructor: Jonathan Eisen
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
2. Where we are going and where we have been
• Previous lecture:
– 6: Era II: PCR and major groups
• Current lecture:
– 7: Era II: rRNA sequencing and analysis
• Next lecture:
– 8: Era II: rRNA ecology
11. All Analysis Should Be Guided by Goals
• Taxonomic assignment for sequences (i.e., what type of organism is the sequence from)
– Best via phylogenetic analysis of sequences
– Sometimes done with BLAST
• Ecological characterization of community
– Grouping into species / classifying
– Have we sampled enough?
– Number of species
– Relative abundance
• Comparisons between communities
– Taxonomy
– Ecological metrics
• Phylogenetic diversity
12. All Analysis Should Be Guided by Goals
• Other Goals from rRNA analysis?
13. rRNA Workflow
• General workflow
– Sample collection and DNA extraction
– rRNA PCR
– Sequencing
– Alignment
– Cluster sequences into groups (known as operational taxonomic units, or OTUs)
– Measure the relative abundance of each OTU by the number of sequences in that group
– Try to assign a taxonomy to each OTU
• Caveats
– rRNA gene copy number varies extensively among organisms
– Not all organisms are amplified
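The clustering and relative-abundance steps of the workflow can be sketched in a few lines. This is a minimal illustration, not the method used in the lecture: it assumes the reads are already aligned to equal length, uses a simple greedy clustering against the first member of each OTU, and takes the identity threshold as a parameter (the slide does not state one). The function names `identity`, `cluster_otus`, and `relative_abundance` are hypothetical.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def cluster_otus(seqs, threshold):
    """Greedy clustering (an assumption, one of many possible schemes).

    Each sequence joins the first OTU whose seed sequence it matches at or
    above the threshold; otherwise it becomes the seed of a new OTU.
    Returns one OTU index per input sequence.
    """
    seeds = []
    assignments = []
    for s in seqs:
        for i, seed in enumerate(seeds):
            if identity(s, seed) >= threshold:
                assignments.append(i)
                break
        else:
            seeds.append(s)
            assignments.append(len(seeds) - 1)
    return assignments


def relative_abundance(assignments):
    """Relative abundance of each OTU = its share of all sequences."""
    counts = Counter(assignments)
    total = len(assignments)
    return {otu: n / total for otu, n in counts.items()}


# Toy aligned reads (made up for illustration).
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TTGTACCTAC"]
otus = cluster_otus(reads, threshold=0.85)
print(otus)
print(relative_abundance(otus))
```

Note that these abundances are shares of sequences, not of cells: the sketch does nothing to correct for the copy-number and amplification caveats listed above.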
14. What to Actually Measure in the Microbiome
• Lists
– Taxa
– Genes
• Summary statistics
– Alpha diversity = within sample
– Beta diversity = between samples
– (and hope these reflect something about functional properties)
• Estimation vs. measurement
15. rRNA PCR
18. Degenerate PCR
Conserved sequence shared by all species, with ambiguities (*) at some positions in the sequence:
5’-TWCGTSGARCTGCACGGVACCGGYAC-3’
IUPAC degeneracies:
W = A or T
S = G or C
R = A or G
V = A or C or G
Y = C or T
2*2*2*3*2 = 48 different primer sequences
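The count of 48 can be checked by expanding the degenerate primer into every concrete sequence it encodes. The sketch below is only an illustration: the `IUPAC` table covers just the codes listed on this slide (any other code would raise a `KeyError`), and `expand_primer` is a hypothetical helper name.

```python
from itertools import product

# Only the IUPAC codes that appear on the slide, plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "R": "AG", "Y": "CT", "V": "ACG",
}


def expand_primer(primer):
    """Return every non-degenerate sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer]
    return ["".join(combo) for combo in product(*choices)]


primer = "TWCGTSGARCTGCACGGVACCGGYAC"
variants = expand_primer(primer)
print(len(variants))  # 48
print(variants[0])
```

The number of variants is the product of the number of choices at each position, which is why five degenerate positions with 2, 2, 2, 3, and 2 options give 48 primers.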
26. Diversity 1: Alpha Diversity
• Alpha diversity is (basically) a measure of the diversity within a single sample
• Types of alpha diversity
– Total # of species = richness
– Phylogenetic diversity of species = PD
– Total # of genes = genetic richness
– Phylogenetic diversity of genes = genetic PD
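Richness and phylogenetic diversity can be contrasted with a small sketch. It assumes one common definition of PD (the total branch length connecting the observed taxa to the root of a rooted tree); the slide does not specify which definition is meant. The tree encoding (a dict mapping each node to its parent and branch length), the toy tree, and the function names are all hypothetical.

```python
def richness(counts):
    """Number of taxa observed at least once in the sample."""
    return sum(1 for n in counts.values() if n > 0)


def phylogenetic_diversity(counts, parent):
    """Sum of branch lengths on the paths from observed taxa to the root.

    Each branch is counted once, even if several taxa share it.
    `parent` maps node -> (parent node, branch length); the root has no entry.
    """
    seen = set()
    total = 0.0
    for taxon, n in counts.items():
        if n == 0:
            continue
        node = taxon
        while node in parent and node not in seen:
            seen.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total


# Hypothetical toy tree: ((A:1.0, B:1.0):0.5, C:2.0)
tree = {
    "A": ("AB", 1.0),
    "B": ("AB", 1.0),
    "AB": ("root", 0.5),
    "C": ("root", 2.0),
}
sample = {"A": 10, "B": 3, "C": 0}
print(richness(sample))                      # 2
print(phylogenetic_diversity(sample, tree))  # 2.5
```

Two samples with the same richness can have very different PD: a sample containing A and C would also have richness 2 but a PD of 3.5 on this tree.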
27. Rarefaction Curves
... stabilized (Fig. 7), suggesting that further sampling will result in a greater difference in richness between the ponds with low and high productivity.
FIG. 6. Rarefaction curves of observed OTU richness in human mouth and gut bacterial samples. The error bars are 95% CIs and were calculated from the variance of the number of OTUs drawn in 100 randomizations at each sample size.
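A rarefaction curve of the kind described in the caption can be approximated by repeated random subsampling. The sketch below is an assumption-laden illustration: it draws subsamples without replacement, averages the number of distinct OTUs over 100 randomizations per sample size (matching the count in the caption), and omits the confidence intervals. The toy data and the name `rarefaction_curve` are made up.

```python
import random
from statistics import mean


def rarefaction_curve(otu_labels, sizes, n_random=100, seed=0):
    """Mean number of distinct OTUs seen in random subsamples of each size.

    `otu_labels` holds one OTU label per sequence; subsamples are drawn
    without replacement.
    """
    rng = random.Random(seed)
    curve = {}
    for size in sizes:
        draws = [len(set(rng.sample(otu_labels, size))) for _ in range(n_random)]
        curve[size] = mean(draws)
    return curve


# Toy community: 100 sequences in 5 OTUs of very uneven abundance.
labels = ["otu1"] * 50 + ["otu2"] * 30 + ["otu3"] * 15 + ["otu4"] * 4 + ["otu5"]
curve = rarefaction_curve(labels, sizes=[10, 50, 100])
for size, value in curve.items():
    print(size, round(value, 2))
```

If the curve is still climbing at the largest sample size, more sampling would be expected to reveal more OTUs; a curve that flattens suggests the sample has captured most of the richness.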
28. Observed vs. Estimated Alpha Diversity
Source: Jennifer B. Hughes, Jessica J. Hellmann, Taylor H. Ricketts, and Brendan J. M. Bohannan. Counting the Uncountable: Statistical Approaches to Estimating Microbial Diversity (minireview). Department of Biological Sciences, Stanford University, Stanford, California 94305-5020. Volume 67, no. 10, p. 4399–4406, 2001.
FIG. 3. Observed and estimated OTU richness of bacteria in a human mouth (33) versus sample size. The number of OTUs observed for a given sample size, or the accumulation curve, is averaged over 50 simulations. Estimated OTU richness is plotted for Chao1 and ACE estimators.
ERRATUM: lines 4 and 5 of the legend to Fig. 3 (page 4402) are corrected.
Excerpt: "(This assumes that the bias of an estimator does not differ so radically among communities that it disrupts the relative order of the estimates. In the absence of alternative evidence, this initial assumption seems appropriate.) Chao (8) derives a closed-form solution for the variance of S_Chao1:"
$$\mathrm{Var}(S_{\mathrm{Chao1}}) = n_2\left(\frac{m^4}{4} + m^3 + \frac{m^2}{2}\right), \quad \text{where } m = \frac{n_1}{n_2}$$
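As a worked illustration of estimated (rather than observed) richness, the sketch below computes Chao1 and its variance from per-OTU sequence counts. The point estimate uses the standard form S_obs + n1^2 / (2 * n2), which is not shown on the slide, and the variance uses n2 * (m^4/4 + m^3 + m^2/2) with m = n1/n2, where n1 and n2 are the numbers of OTUs seen exactly once and exactly twice. The function name `chao1` and the toy counts are hypothetical, and the sketch does not handle the case of zero doubletons.

```python
from collections import Counter


def chao1(otu_counts):
    """Return (Chao1 richness estimate, variance) from per-OTU sequence counts."""
    s_obs = sum(1 for n in otu_counts if n > 0)
    freq = Counter(otu_counts)
    n1, n2 = freq[1], freq[2]  # singletons and doubletons
    if n2 == 0:
        raise ValueError("this simple form needs at least one doubleton")
    m = n1 / n2
    estimate = s_obs + n1 ** 2 / (2 * n2)
    variance = n2 * (m ** 4 / 4 + m ** 3 + m ** 2 / 2)
    return estimate, variance


# Toy sample: 8 observed OTUs, 4 singletons, 2 doubletons.
counts = [10, 5, 2, 2, 1, 1, 1, 1]
print(chao1(counts))  # (12.0, 28.0)
```

Here the estimator adds 4 unseen OTUs to the 8 observed, because many rare OTUs (singletons) relative to doubletons suggest that more remain undetected.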
29. Rank Abundance Curves
... The benefit of estimating diversity with such extrapolation methods is that once a species has been counted, it does not need to be counted again. Hence, a surveyor can focus effort on identifying new, generally rarer, species. The downside is that for diverse communities in which ...
FIG. 2. Rank-abundance curves for (a) tropical moths (n = 4,538) (56) and (b) temperate soil bacteria (n = 137) (39). The two most abundant species of moths (396 and 173 individuals) are excluded from panel a to shorten the y axis.
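The data behind a rank-abundance curve are simple to produce: sort the species (or OTUs) from most to least abundant and pair each with its rank. The sketch below is a minimal illustration with made-up counts; `rank_abundance` is a hypothetical helper name, and plotting is left out.

```python
def rank_abundance(counts):
    """Return (rank, abundance) pairs, most abundant first."""
    ordered = sorted((n for n in counts.values() if n > 0), reverse=True)
    return list(enumerate(ordered, start=1))


sample = {"otu1": 40, "otu2": 3, "otu3": 12, "otu4": 1, "otu5": 1}
for rank, n in rank_abundance(sample):
    print(rank, n)
```

Rank goes on the x axis and abundance on the y axis; a long flat tail of singletons on the right is the typical signature of a community with many rare members.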
30. Diversity 2: Beta Diversity
• Beta diversity is (basically) a measure of the similarity in diversity between samples
• Types of beta diversity
– Species presence/absence
– Shared phylogenetic diversity
– Gene presence/absence
– Shared phylogenetic diversity of genes
• Frequently used as values for PCA or PCoA analysis
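For the presence/absence case, one way to turn beta diversity into numbers is a pairwise distance matrix between samples, which is the kind of input an ordination such as PCoA takes. The sketch below uses the Jaccard distance as an assumed example measure (the slide does not name one); the sample contents and function names are made up, and the ordination step itself is not shown.

```python
def jaccard_distance(a, b):
    """1 - (shared species / species in either sample), for two species sets."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def distance_matrix(samples):
    """Return sample names (sorted) and the matrix of pairwise distances."""
    names = sorted(samples)
    matrix = [[jaccard_distance(samples[x], samples[y]) for y in names] for x in names]
    return names, matrix


# Hypothetical presence/absence data: the set of species seen in each sample.
samples = {
    "gut": {"sp1", "sp2", "sp3"},
    "mouth": {"sp2", "sp3", "sp4"},
    "skin": {"sp5"},
}
names, matrix = distance_matrix(samples)
print(names)
for row in matrix:
    print(row)
```

A distance of 0 means two samples contain exactly the same species, and 1 means they share none. The phylogenetic variants in the list above replace shared species with shared branch length.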
31. Variability in Health vs. Disease
[Figure: principal component plot (axes PC1 and PC2) with clusters labeled Healthy, Ulcerative colitis, and Crohn’s disease; P value: 0.031]
Figure 4 | Bacterial species abundance differentiates IBD patients and healthy individuals. Principal component analysis with health status as instrumental variables, based on the abundance of 155 species with ≥1% genome coverage by the Illumina reads in at least 1 individual of the cohort, was carried out with 14 healthy individuals and 25 IBD patients (21 ulcerative colitis and 4 Crohn’s disease) from Spain (Supplementary Table 1). Two first components (PC1 and PC2) were plotted and represented 7.3% of whole inertia. Individuals (represented by points) were clustered and centre of ...
Qin et al. 2010. Nature.
It is possible to go backwards from these patterns to see which taxa and genes drive the clustering patterns