16S Ribosomal DNA Sequence Analysis

16S RIBOSOMAL DNA SEQUENCE ANALYSIS
Abdulrahman Mohammed
School of Public Health & Zoonoses
GADVASU

INTRODUCTION
• The rRNA genes are among the most conserved (least variable)
DNA in all cells. Portions of the rDNA sequence from
distantly related organisms are remarkably similar, so
sequences from distantly related organisms can be
precisely aligned, making the true differences easy to
measure. For this reason, genes that encode rRNA (rDNA)
have been used extensively to determine taxonomy and
phylogeny (evolutionary relationships) and to estimate
rates of species divergence among bacteria. Comparison of
16S rDNA sequences can therefore show evolutionary
relatedness among microorganisms.
• Carl Woese pioneered this work and, based on such
sequence information, proposed the three-domain system of
classification: Archaea, Bacteria, and Eucarya.

Note on terminology
• Several pieces of RNA are important for
proper ribosome function.
• This RNA is not translated into protein; the
ribosomal RNA itself is the active component.
• Thus we can refer to the “rRNA gene” or
“rDNA” to designate the DNA in the genome
that produces the ribosomal RNA.

Universal phylogenetic tree as determined from
comparative ribosomal RNA sequencing.

• Although the three domains of living
organisms were originally defined by
ribosomal RNA sequencing, subsequent
studies have shown that they differ in many
other ways.
• Large public databases are available for
comparison.
• The Ribosomal Database Project currently
contains >1.5 million rRNA sequences.

Detailed phylogenetic tree of the major lineages
(phyla) of Bacteria based on 16S ribosomal RNA
sequence comparisons

RIBOSOMAL RNA
• To infer relationships that span the diversity of known
life, it is necessary to look at genes conserved through
the billions of years of evolutionary divergence.
• Examples of genes in this category are those that
define the ribosomal RNAs (rRNAs).
• In Bacteria, Archaea, mitochondria, and chloroplasts,
the small ribosomal subunit contains the 16S rRNA
(where the S in 16S represents Svedberg units).
The large ribosomal subunit contains two rRNA species
(the 5S and 23S rRNAs).

• Most prokaryotes have three rRNAs, called the 5S, 16S
and 23S rRNA. Bacterial 16S, 23S, and 5S rRNA genes are
typically organized as a co-transcribed operon. There may
be one or more copies of the operon dispersed in the
genome (for example, E. coli has seven). Archaea contain
either a single rDNA operon or multiple copies of the
operon.
• Although rRNA targets were studied originally, most
researchers now target the corresponding ribosomal DNA
(rDNA) because DNA is more stable and easier to analyse.

Secondary structure
of small subunit ribosomal RNA

Types
• In prokaryotes: 23S, 5S, 16S
• In eukaryotes: 28S, 5.8S, 5S, 18S

16S rDNA gene – codes for making SSU rRNA
Figure: Use of primers to copy the 16S rDNA gene in bacteria.
A forward primer (F) and a reverse primer (R) bind the gene
(5’ to 3’), which contains conserved regions and variable
regions, and the DNA between the primers is copied using PCR.
• Stems: sites that rarely mutate and are conserved
• Loops: sites that are more free to mutate and evolve faster
Figure: Bacterium with ribosomes; ribosome synthesizing
a protein (Campbell & Reece, 6th Ed.).
Figure: Atomic structure of the small subunit of a ribosome.
The small subunit ribosomal RNA, shown in orange, helps
match the mRNA (codon) to the tRNA (anticodon).

Ribosomal RNAs in Prokaryotes:
NAME   SIZE (NUCLEOTIDES)   LOCATION
5S     120                  Large subunit of ribosome
16S    1500                 Small subunit of ribosome
23S    2900                 Large subunit of ribosome

• The 16S rDNA sequence has hypervariable regions, where
sequences have diverged over evolutionary time.
• Strongly conserved regions often flank these hypervariable
regions.
• Primers are designed to bind to conserved regions and amplify
variable regions.
• The DNA sequence of the 16S rDNA gene has been determined
for an extremely large number of species. In fact, there is no
other gene that has been as well characterized in as many
species.
• Sequences from tens of thousands of clinical and environmental
isolates are available over the Internet through the National
Center for Biotechnology Information (www.ncbi.nlm.nih.gov)
and the Ribosomal Database Project (http://rdp.cme.msu.edu/).
• These sites also provide search algorithms to compare new
sequences to their databases.
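The primer idea in the bullets above (primers sit on conserved flanks, the amplicon spans a variable stretch) can be sketched as a small in-silico PCR. This is a minimal illustration, not a primer-design tool: the sequences, the function names, and the exact-match rule are all assumptions made up for the example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def in_silico_pcr(template: str, forward: str, reverse: str):
    """Return the amplicon bounded by the two primers, or None if a primer is absent.

    The forward primer matches the template strand directly. The reverse primer
    anneals to the opposite strand, so we search for its reverse complement.
    Exact matching only (an assumption; real primers tolerate some mismatches).
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Toy sequences (hypothetical): conserved flank + variable stretch + conserved flank.
conserved_left = "ACGGCTACCTTG"
conserved_right = "GGATTAGATACC"
isolate_a = "TT" + conserved_left + "AAATTTCCC" + conserved_right + "GA"
isolate_b = "TT" + conserved_left + "AGATCTCGC" + conserved_right + "GA"

# The same primer pair works on both isolates because the flanks are conserved.
forward_primer = conserved_left
reverse_primer = reverse_complement(conserved_right)

amplicon_a = in_silico_pcr(isolate_a, forward_primer, reverse_primer)
amplicon_b = in_silico_pcr(isolate_b, forward_primer, reverse_primer)
print(amplicon_a)
print(amplicon_b)
```

Both amplicons start and end with the same conserved sequence but differ in the middle, which is the part that carries the taxonomic signal.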

Why is the small subunit rRNA gene so useful?
• Conserved in parts and highly variable
in other parts. Thus it is a very good
phylogenetic marker
• VERY large database of sequences
• Cells have many ribosomes, which can
be targeted with probes (e.g. FISH
and TRFLP) for community analysis
• 16S rRNA gene sequencing is now
the gold standard for community
analysis

Which hyper-variable regions to sequence?
E. coli 16S SSU rRNA hyper-variable regions
Region   Position    # bp
V1       69-99       30
V2       137-242     105
V3       338-533     195
V4       576-682     106
V5       822-879     57
V6       967-1046    79
V7       1117-1173   56
V8       1243-1294   51
V9       1435-1465   30
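One way to use the table when deciding which regions to sequence is to compute how long a span of adjacent regions would be and compare it with the usable read length of the platform. The sketch below copies the coordinates from the table (the "# bp" column equals end minus start). The function names and the 250 bp limit are illustrative assumptions, not platform specifications.

```python
# Coordinates copied from the table above (E. coli 16S numbering).
HYPERVARIABLE_REGIONS = {
    "V1": (69, 99),
    "V2": (137, 242),
    "V3": (338, 533),
    "V4": (576, 682),
    "V5": (822, 879),
    "V6": (967, 1046),
    "V7": (1117, 1173),
    "V8": (1243, 1294),
    "V9": (1435, 1465),
}


def region_length(name: str) -> int:
    """Length of one region, computed as end - start like the table's bp column."""
    start, end = HYPERVARIABLE_REGIONS[name]
    return end - start


def span_length(first: str, last: str) -> int:
    """Distance from the start of `first` to the end of `last`."""
    return HYPERVARIABLE_REGIONS[last][1] - HYPERVARIABLE_REGIONS[first][0]


def spans_fitting(max_length: int):
    """List every contiguous run of regions whose span is at most max_length."""
    names = list(HYPERVARIABLE_REGIONS)
    fitting = []
    for i, first in enumerate(names):
        for last in names[i:]:
            length = span_length(first, last)
            if length <= max_length:
                fitting.append((first, last, length))
    return fitting


# Hypothetical read-length budget of 250 bp, chosen only for illustration.
for first, last, length in spans_fitting(250):
    label = first if first == last else f"{first}-{last}"
    print(f"{label}: {length} bp")
```

The spans here ignore primer lengths and the conserved sequence outside the listed coordinates, so real amplicons would be somewhat longer.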

454-based 16S amplicon sequencing

RFLP Fingerprinting Analysis
• RFLP = restriction fragment length polymorphism
• RFLP analysis involves cutting DNA into fragments using one
or a set of restriction enzymes.
• For chromosomal DNA the RFLP fragments are separated by
gel electrophoresis, transferred to a membrane, and probed
with a gene probe.
• One advantage of this fingerprinting technique is that all
bands are bright (from chromosomal DNA) because they are
detected by a gene probe. AP-PCR, ERIC-PCR, and REP-PCR all
have bands of variable brightness and also can have ghost
bands.
• For PCR products, a simple fragment pattern can be
distinguished immediately on a gel. This is used to confirm the
PCR product or to distinguish between different isolates based
on restriction cutting of the 16S rDNA sequence (“ribotyping”).
The approach has also been developed into a diversity
measurement technique called “TRFLP”.
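The restriction step for PCR products can be sketched in a few lines: cut a linear sequence at every recognition site and report the fragment lengths, which is what the band pattern on a gel reflects. The two products below are made-up toy sequences of equal length. EcoRI (site GAATTC, cutting after the first G) is used only as a familiar example, and `digest` is a hypothetical helper name.

```python
def digest(sequence: str, site: str, cut_offset: int):
    """Cut a linear sequence at every occurrence of `site`.

    `cut_offset` is the position of the cut within the site
    (1 for EcoRI, which cuts G^AATTC). Returns the fragment lengths
    in 5' to 3' order.
    """
    cut_positions = []
    position = sequence.find(site)
    while position != -1:
        cut_positions.append(position + cut_offset)
        position = sequence.find(site, position + 1)
    boundaries = [0] + cut_positions + [len(sequence)]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


# Two toy PCR products of the same total length (26 bp) but different site positions.
product_a = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
product_b = "AAAAAAAAAAAA" + "GAATTC" + "CCCCCCCC"

pattern_a = digest(product_a, "GAATTC", 1)
pattern_b = digest(product_b, "GAATTC", 1)
print(pattern_a)
print(pattern_b)
```

The uncut products would run as a single band of the same size, while the digests give different fragment patterns, which is how isolates are told apart.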

TRFLP Analysis
• TRFLP = terminal restriction fragment length polymorphism
analysis
• A way to separate multiple PCR products of the same size.
These products can be generated by a 16S-rRNA PCR of
community DNA
• The PCR is performed as usual with two primers, but one is
fluorescently labeled
• The PCR products are then cut up using a restriction enzyme
• The fluorescently labeled PCR pieces are detected
• TRFLP steps:
1. Extract DNA
2. Perform 16S rRNA PCR using fluorescently-labeled primer
3. Choose a restriction enzyme for TRFLP that will give the
greatest diversity in restriction product size
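The TRFLP steps above can be imitated in silico. Only the fragment that carries the fluorescent label (here assumed to be on the 5' end of the forward primer) is detected, so each amplicon contributes one terminal fragment length. The community, the sequences, and the function names are hypothetical. MspI (site CCGG, cutting after the first C) stands in for a 4-bp cutter.

```python
from collections import Counter


def terminal_fragment_length(amplicon: str, site: str, cut_offset: int) -> int:
    """Length of the labeled 5' terminal fragment after digestion.

    Only the first site from the labeled end matters. An amplicon with
    no site stays uncut, so its full length is reported.
    """
    position = amplicon.find(site)
    if position == -1:
        return len(amplicon)
    return position + cut_offset


def trflp_profile(amplicons, site: str, cut_offset: int):
    """Map each terminal fragment length to its relative abundance."""
    counts = Counter(
        terminal_fragment_length(amplicon, site, cut_offset) for amplicon in amplicons
    )
    total = sum(counts.values())
    return {length: count / total for length, count in sorted(counts.items())}


# Toy community: three made-up taxa whose amplicons are all 16 bp long.
taxon_x = "ATAT" + "CCGG" + "ATATATAT"
taxon_y = "ATATATAT" + "CCGG" + "ATAT"
taxon_z = "ATATATATATATATAT"  # no CCGG site
community = [taxon_x, taxon_x, taxon_y, taxon_z]

profile = trflp_profile(community, "CCGG", 1)
for length, abundance in profile.items():
    print(f"{length} bp: {abundance:.2f}")
```

Running the same profile with several candidate enzymes and keeping the one that yields the most distinct fragment lengths is one way to approach step 3.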

Figure: TRFLP profile from an automated DNA analyzer
(gel electrophoresis analysis). X-axis: fragment length
(0 to 700); y-axis: relative abundance (0.00 to 0.10).

TRFLP (Terminal Restriction Fragment Length
Polymorphism)
• Mixed population is amplified
using a 16S primer with a
fluorescent tag
• PCR product is cut with a 4-bp-cutting
restriction endonuclease (RE)
• Different sequences will give
different length fragments
• Sample is injected into a capillary
sequencer to sort fragments by size

TRFLP (cont.)
Advantages
• Very sensitive
• Fast, easy and cheap
Disadvantages
• Cannot cut out bands to get sequence
data
• Requires a capillary sequencer
• Small peaks are sometimes hard to
distinguish from noise

Southern Blot Hybridization
• Southern blot hybridization (SBH) analysis is named after its developer,
Southern, E. M. (1979). It facilitates detection of a DNA fragment of interest
among hundreds of other fragments generated by restriction endonuclease
analysis (REA).
• It allows restriction digestion electrophoresis patterns to become
interpretable.
• Restriction DNA fragments separated in an agarose gel are transferred
(blotted) onto a piece of nitrocellulose or nylon membrane.
• The membrane is then exposed to a DNA probe that has been labeled with
a molecule that facilitates visual detection of a selected target DNA
fragment.
• The probe, which is a piece of single-stranded DNA, specifically binds
(hybridizes) to its complementary DNA sequence embedded in the
membrane under appropriate conditions.
• When the SBH typing method uses ribosomal operon genes (rrn) found
among restriction-digested fragments in a membrane as the target, it is
called ribotyping.

Microarrays
Microarrays are constructed using probes for a known nucleic acid sequence or for a
series of targets. A target is a nucleic acid sequence whose abundance is being detected.
GeneChip microarrays consist of small DNA fragments (also referred to as probes),
chemically synthesized at specific locations on a coated quartz surface. Nucleic acids
from experimental samples are extracted, amplified, and labeled, and the prepared
samples are then hybridized to the array. The amount of label can be monitored at each
feature, enabling either the precise identification of hundreds of thousands of target
sequences (DNA Analysis) or the simultaneous relative quantitation of tens of
thousands of different RNA transcripts, representing gene activity (Expression Analysis).
The intensity and color of each spot provide information on the specific gene from
the tested sample.

Workflow: DNA extraction, then PCR, then gel electrophoresis,
then DNA sequencing, then bioinformatics, leading to
identification of the bacteria.
Example of DNA sequencing output:
ACAGATGTCTTGTAATCCGGC
CGTTGGTGGCATAGGGAAAG
GACATTTAGTGAAAGAAATTG
ATGCGATGGGTGGATCGATG
GCTTATGCTATCGATCAATCA
GGAATTCAATTTAGAGTACTT
AATAGTAGCAAAGGAGCTGC
TGTTAGAGCAACACGTGCTCA
GGCAGATAAAATATTATATCG
TCAAGCAATACGTAGTATTCT
TGAATATCAAAAATTTTTGTTG
GTTATTCA

Molecular Phylogenetics
Figure: Secondary structure of 16S rRNA in E. coli.
Step 1. Select a DNA region that is homologous, or similar
across species due to common ancestry.
Ribosomal RNA (rRNA) is an ideal gene for phylogenetic
studies because it:
• is an essential gene that is
present in all organisms.
• is a common target for
sequencing studies; large
database for comparisons.
• contains sites that are
relatively conserved (stems)
and sites that are more free to
vary (loops).

2. Amplify and sequence this region across isolates.
Figure: each isolate is amplified by PCR, and the PCR product
is then sequenced. Example reads:
ACAGATGTCTTGTAATCCGGCCGTTGGTGGCAT
AGGGAAAGGACATTTAGTGAAAGAAATTGATG
CGATGGGTGGATCGATGGCTTATGCTATCGATC
AATCAGGAATTCAATTTAGAGTACTTAATAGTA
GCAAAGGAGCTGCTGTTAGAGCAACACGTGCT
CAGGCAGATAAAATATTATATCGTCAAGCAATA
CGT

AATTTAGAATTCAATTTAGAGTACTTAATAGTAG
CAAAGGAGCTGCTGTTAGAGCAACACGTGCTC
AGGCAGATAAAATATTATATCGTCAAGCAATAC
GT

3. Sequence alignment is crucial for inferring how DNA
sites have changed.
• Poor alignment: implies that species “I” is
divergent from the others, but this is not the case.
• Good alignment: species “I” has probably
experienced a deletion event at position #6 or #7.
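The difference between the poor and the good alignment can be reproduced with a small global alignment (Needleman-Wunsch). This is a minimal sketch: the scoring values, the function names, and the two short sequences are assumptions for illustration. The second sequence is the first with one of the two adjacent G bases removed, which mimics a single deletion at position 6 or 7.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        substitution = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


def count_mismatches(x: str, y: str) -> int:
    """Count positions where two strings differ, ignoring gap columns."""
    return sum(p != q for p, q in zip(x, y) if p != "-" and q != "-")


reference = "ATGTTGGCAGTC"
species_i = "ATGTTGCAGTC"   # hypothetical: one G deleted at position 6 or 7

# Poor alignment: compare position by position without allowing a gap.
poor_mismatches = count_mismatches(reference, species_i)

# Good alignment: allow a gap, then count the remaining differences.
aligned_ref, aligned_i = global_align(reference, species_i)
good_mismatches = count_mismatches(aligned_ref, aligned_i)

print(aligned_ref)
print(aligned_i)
print(poor_mismatches, good_mismatches)
```

Without a gap, everything downstream of the deletion is shifted and looks divergent. With one gap, the two sequences agree at every remaining position. Because the two G bases are identical, the alignment alone cannot say which of the two positions was lost.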

4. Estimate relationships based on extent of DNA similarity.
Figure: Molecular phylogeny of taxa A-I, with the aligned
sequences shown beside the tree. Colored letters = different
from top sequence (taxon G). Labels in the figure:
G, B, C, D, A, J, F, E, K, H, I.
ATGTTGGCAGTCCGATGTAAGC
ATGTTGGCAGTCCGATGTAAGC
ATGTTGGCAGTCCGATGTAACC
ACGGTAGCAGTCTGATGTATCC
CTGCTGGTAGTCGTTTGTAACC
CTGCTGGTAGTCGTTTGTAACC
CTGCTGGCAGTCGGTTGTAACC
ATGCTGGCAGTCGGGTGTAACC
ATGGTGGCAGTCGGGTGTCACC
• At variable DNA positions, related
groups will tend to share the
same nucleotide.
• The sheer number of characters
helps to distinguish the
‘phylogenetic signal’ from noise.
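A simple way to quantify "extent of DNA similarity" is the p-distance, the proportion of aligned positions at which two sequences differ. The sketch below applies it to the nine aligned sequences from this slide. Because the mapping of taxon letters to rows is not recoverable from the slide text, the rows are given placeholder names `seq1` to `seq9` (an assumption), in top-to-bottom order.

```python
from itertools import combinations

# The nine aligned sequences from the slide, top to bottom (placeholder names).
SEQUENCES = {
    "seq1": "ATGTTGGCAGTCCGATGTAAGC",
    "seq2": "ATGTTGGCAGTCCGATGTAAGC",
    "seq3": "ATGTTGGCAGTCCGATGTAACC",
    "seq4": "ACGGTAGCAGTCTGATGTATCC",
    "seq5": "CTGCTGGTAGTCGTTTGTAACC",
    "seq6": "CTGCTGGTAGTCGTTTGTAACC",
    "seq7": "CTGCTGGCAGTCGGTTGTAACC",
    "seq8": "ATGCTGGCAGTCGGGTGTAACC",
    "seq9": "ATGGTGGCAGTCGGGTGTCACC",
}


def count_differences(a: str, b: str) -> int:
    """Number of aligned positions at which the two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def p_distance(a: str, b: str) -> float:
    """Proportion of differing positions (0.0 means identical)."""
    return count_differences(a, b) / len(a)


# List all pairs from most similar to least similar.
pairs = sorted(
    combinations(SEQUENCES, 2),
    key=lambda pair: p_distance(SEQUENCES[pair[0]], SEQUENCES[pair[1]]),
)
for first, second in pairs:
    distance = p_distance(SEQUENCES[first], SEQUENCES[second])
    print(f"{first} vs {second}: {distance:.3f}")
```

Pairs with the smallest distances are the ones a distance-based tree-building method would tend to group first. Real analyses usually use longer alignments and a substitution model rather than the raw p-distance.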

Example: Molecular
phylogenies have
revealed unexpected
features of bacterial
evolution.
For instance, an
endosymbiotic lifestyle has
evolved several times
independently.
Moran and Wernegreen (2000)

How does this organism fit into the world of available
sequence data?
Workflow: PCR, then sequence the PCR product, then
“BLAST” the sequence against GenBank.
GenBank = NIH genetic database with all publicly
available DNA sequences. As of 2004: > 44 billion
bp and > 40 million sequences.
BLAST output: lists the sequences that are most
similar to yours.

Is the bacterium really Wolbachia?
• PCR and sequence a gene of interest
(e.g., 16S rDNA)
• “BLAST” the sequence against GenBank
• BLAST results: Wolbachia sp. 1, Wolbachia sp. 2,
Wolbachia sp. 3, …
• YES!!

Some Databases
• National Center for Biotechnology Information
(www.ncbi.nlm.nih.gov)
• Ribosomal Database Project II
(http://rdp.cme.msu.edu/html/)
• Ribosomal Differentiation of Medical
Microorganisms (www.ridom.com)
• MicroSeq 16S 500 Library (Applied Biosystems)
• GenBank
• Mayo Database

Guidelines for interpretation of 16S rRNA gene
sequence-based results for identification of
medically important aerobic Gram-positive bacteria (Woo et al., 2009)
Full and 527 bp 16S rRNA gene sequencing and the MicroSeq
databases were used for identifying medically important aerobic
Gram-positive bacteria. Overall, full and 527 bp 16S rRNA
gene sequencing can identify 24 and 40 % of medically
important Gram-positive cocci (GPC), and 21 and 34 % of
medically important Gram-positive rods (GPR) confidently to
the species level, whereas the full-MicroSeq and
500-MicroSeq databases can identify 15 and 34 % of medically
important GPC and 14 and 25 % of medically important GPR
confidently to the species level. Among staphylococci,
streptococci, enterococci, mycobacteria, corynebacteria,
nocardia and members of Bacillus and related taxa
(Paenibacillus, Brevibacillus, Geobacillus and Virgibacillus), the
methods and databases are least useful for identification of
staphylococci and nocardia.

Only 0–2 and 2–13 % of staphylococci, and 0 and 0–10 % of
nocardia, can be confidently and doubtfully identified,
respectively. However, these methods and databases are most
useful for identification of Bacillus and related taxa, with
36–56 and 11–14 % of Bacillus and related taxa confidently and
doubtfully identified, respectively. A total of 15 medically
important GPC and 18 medically important GPR that should
be confidently identified by full 16S rRNA gene sequencing are
not included in the full-MicroSeq database. A total of 9
medically important GPC and 21 medically important GPR that
should be confidently identified by 527 bp 16S rRNA gene
sequencing are not included in the 500-MicroSeq database.
16S rRNA gene sequence results for Gram-positive bacteria
should be interpreted together with the results of basic
phenotypic tests. Additional biochemical tests or sequencing of
additional gene loci are often required for definitive
identification. To improve the usefulness of the MicroSeq
databases, bacterial species that can be confidently identified
by 16S rRNA gene sequencing but are not found in the MicroSeq
databases should be included.

Definitions
“A bacterium species is defined as ‘confidently identified by
16S rRNA gene sequencing’ if there is >3 % difference
between the 16S rRNA gene sequence of the species and
those of other medically important bacteria species. A
bacterium species is defined as ‘not confidently identified
by 16S rRNA gene sequencing’ if there is <2 % difference
between the 16S rRNA gene sequence of the species and
that of one or more medically important aerobic Gram-positive
bacterium species. A bacterium species is defined
as ‘only doubtfully identified by 16S rRNA gene sequencing’
if there is 2–3 % difference between the 16S rRNA gene
sequence of the species and that of one or more medically
important aerobic Gram-positive bacterium species.” (Woo
et al., 2009)
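The three definitions amount to a threshold rule on the percent difference between a species' 16S rRNA gene sequence and that of its closest relative. A minimal sketch of that rule follows. The function name and the example values are assumptions. The thresholds are the ones quoted above.

```python
def identification_confidence(percent_difference: float) -> str:
    """Classify a species by its 16S rRNA gene sequence difference (in %)
    from the most similar other species, using the quoted thresholds."""
    if percent_difference > 3:
        return "confidently identified"
    if percent_difference < 2:
        return "not confidently identified"
    return "only doubtfully identified"  # 2-3 % difference


# Hypothetical percent differences to the nearest other species.
for value in (0.5, 2.0, 2.6, 3.0, 4.2):
    print(f"{value:.1f} % difference: {identification_confidence(value)}")
```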
