2. Sequencing and its Methods
● Sequencing is a process by which the precise order of nucleotides in a piece of
DNA can be determined.
● Methods:
➔ Enzymatic DNA sequencing: Sanger sequencing
➔ Chemical sequencing of DNA: Maxam-Gilbert sequencing
➔ Automated DNA sequencing
➔ RNA sequencing
3. Enzymatic DNA sequencing/ Chain termination Sequencing
● The chain termination method (enzymatic DNA sequencing) was first devised by
Fred Sanger and colleagues in the mid-1970s. It is also known as “Sanger
sequencing”.
● With the chain termination method, up to 96 sequences can be obtained
simultaneously in a single run of a sequencing machine.
● Principle: Chain termination DNA sequencing is based on the principle that
single-stranded DNA molecules that differ in length by just a single
nucleotide can be separated from one another by polyacrylamide gel
electrophoresis. This means that it is possible to resolve a family of molecules,
representing all lengths from 10 to 1500 nucleotides, into a series of bands in a
slab or capillary gel.
4. Fig:
Polyacrylamide gel electrophoresis can
resolve single-stranded DNA molecules
that differ in length by just one nucleotide.
The banding pattern shown here is
produced after separation of single-
stranded DNA molecules by denaturing
polyacrylamide gel electrophoresis. The
molecules have been labeled with a
radioactive marker and the bands
visualized by autoradiography.
5. Chain termination Sequencing
Step I: Initiation of Strand Synthesis:
The starting material for a chain termination
sequencing experiment is a preparation of
identical single-stranded DNA molecules.
The first step is to anneal a short
oligonucleotide to the same position on each
molecule. This oligonucleotide subsequently
acts as the primer for synthesis of a new DNA
strand that is complementary to the template.
6. Step II: The strand synthesis:
● It is catalyzed by a DNA polymerase enzyme and requires the four
deoxyribonucleotide triphosphates (dNTPs—dATP, dCTP, dGTP, and dTTP) as
substrates.
● Chain termination sequencing also adds a small amount of each of four
dideoxynucleotides (ddNTPs—ddATP, ddCTP, ddGTP, and ddTTP). Each of
these dideoxynucleotides is labeled with a different fluorescent marker.
● The polymerase enzyme does not discriminate between deoxy- and
dideoxynucleotides, but once incorporated a dideoxynucleotide blocks further
elongation because it lacks the 3′-hydroxyl group needed to form a connection
with the next nucleotide. Thus synthesis terminates at the site where a
dideoxynucleotide is added.
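The products of a chain termination reaction can be illustrated with a minimal Python sketch. It assumes an idealized reaction in which a dideoxynucleotide can be incorporated at every position beyond the primer. The function names and the example template are hypothetical and serve only as an illustration.

```python
# Idealized model of chain termination: every position after the primer
# yields one terminated molecule, labeled by its final (dideoxy) base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def new_strand(template_3to5: str) -> str:
    """Strand synthesized 5'->3' opposite a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def terminated_products(template_3to5: str, primer_len: int):
    """Return (length, terminal base) for each chain-terminated molecule."""
    strand = new_strand(template_3to5)
    return [
        (length, strand[length - 1])
        for length in range(primer_len + 1, len(strand) + 1)
    ]


if __name__ == "__main__":
    # Hypothetical 8-nucleotide template with a 3-nucleotide primer.
    for length, base in terminated_products("TACGGTCA", primer_len=3):
        print(length, "dd" + base)
```

Each tuple corresponds to one band (or peak) after electrophoresis: the length fixes its position and the terminal base fixes its fluorescent label.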
7. Dideoxynucleotide
Because the normal deoxynucleotides
are also present, in larger amounts
than the dideoxynucleotides, the
strand synthesis does not always
terminate close to the primer: in
fact, several hundred nucleotides may
be polymerized before a
dideoxynucleotide is eventually
incorporated.
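Why termination is usually delayed can be sketched numerically. The model below assumes that, at every position, the polymerase picks a dideoxynucleotide with a fixed probability equal to its share of the nucleotide pool, so chain length follows a geometric distribution. The 0.5% share is a hypothetical value chosen for illustration, not a figure from the method.

```python
# Geometric model of chain termination (an assumption, not a measured law):
# each added nucleotide is a ddNTP with probability dd_fraction.


def termination_probability(n: int, dd_fraction: float) -> float:
    """Probability that the chain terminates exactly at the n-th added nucleotide."""
    return (1 - dd_fraction) ** (n - 1) * dd_fraction


def mean_extension(dd_fraction: float) -> float:
    """Expected number of nucleotides added before termination."""
    return 1 / dd_fraction


if __name__ == "__main__":
    dd_fraction = 0.005  # hypothetical ddNTP share of the nucleotide pool
    print(mean_extension(dd_fraction))
    print(termination_probability(1, dd_fraction))
```

Under this assumption a smaller dideoxynucleotide share shifts the products toward longer chains, which is consistent with several hundred nucleotides being polymerized before termination.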
9. Detection of Sequences
● To work out the DNA sequence, all that we
have to do is identify the dideoxynucleotide
at the end of each chain-terminated
molecule.
● This is where the polyacrylamide gel comes
into play. The mixture is loaded into a well of
a polyacrylamide slab gel, or into a tube of a
capillary gel system, and electrophoresis is
carried out to separate the molecules according
to their lengths.
● After separation, the molecules are run past a
fluorescent detector capable of discriminating
the labels attached to the dideoxynucleotides.
The detector therefore determines whether each
molecule ends in an A, C, G, or T.
10. The sequence can be printed out for examination by the operator, or entered directly into a
storage device for future analysis.
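Reading the sequence from the detector output amounts to ordering the molecules by length and translating each label into a base. The sketch below shows this under simplifying assumptions: the dye-to-base mapping and the input format (a list of length and dye pairs) are hypothetical, and real instruments work from continuous fluorescence traces rather than clean pairs.

```python
# Hypothetical dye assignment; actual dye colors depend on the chemistry used.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def call_sequence(peaks):
    """peaks: iterable of (fragment_length, dye). Shortest fragment is read first."""
    ordered = sorted(peaks)  # shorter molecules pass the detector first
    return "".join(DYE_TO_BASE[dye] for _, dye in ordered)


if __name__ == "__main__":
    peaks = [(6, "green"), (4, "blue"), (5, "blue"), (8, "red"), (7, "yellow")]
    print(call_sequence(peaks))
```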
11. Polymerase used for chain termination sequencing
● In the original method for chain termination sequencing, the Klenow polymerase was used as
the sequencing enzyme. This is a modified version of the DNA polymerase I enzyme from E.
coli.
● The modification removes the 5′→3′ exonuclease activity of the standard enzyme.
However, the Klenow polymerase has low processivity, meaning that it can synthesize only a
relatively short DNA strand before dissociating from the template.
● This limits the length of sequence that can be obtained from a single experiment to about 250
bp.
● To avoid this problem, most sequencing today makes use of a more specialized enzyme, such
as Sequenase, a modified version of the DNA polymerase encoded by bacteriophage T7.
● Sequenase has high processivity and no exonuclease activity and so is ideal for chain
termination sequencing, enabling sequences of up to 750 bp to be obtained in a single
experiment.
12. Chain termination sequencing requires a single-stranded DNA template
Single-stranded template DNA can be obtained from plasmid clones in two ways:
1. Denaturation with alkali or by boiling: This is a common method for obtaining
template DNA for DNA sequencing, but a shortcoming is that it can be difficult to
prepare plasmid DNA that is not contaminated with small quantities of bacterial DNA
and RNA, which can act as spurious templates or primers in the DNA sequencing
experiment.
2. Cloning in a phagemid: A phagemid is a plasmid vector that contains an M13
origin of replication and which can therefore be obtained as both double- and
single-stranded DNA versions. Phagemids avoid the instabilities of M13 cloning and can be
used with fragments up to 10 kb or more.
13. The primer determines the region of the template DNA that will be
sequenced
● In the first stage of a chain termination sequencing experiment, an oligonucleotide
primer is annealed onto the template DNA.
● The main function of primer is to provide the short double-stranded region that is
needed in order for the DNA polymerase to initiate DNA synthesis. The primer also
plays a second critical role in determining the region of the template molecule that
will be sequenced.
● For most sequencing experiments a universal primer is used. This is a primer that is
complementary to the part of the vector DNA immediately adjacent to the point
into which new DNA is ligated.
● The 3′ end of the primer points toward the inserted DNA, so the sequence that is
obtained starts with a short stretch of the vector and then progresses into the cloned
DNA fragment.
14. ● If the DNA is cloned in a plasmid vector, then both forward and reverse universal
primers can be used, enabling sequences to be obtained from both ends of the insert.
This is an advantage if the cloned DNA is more than 750 bp and hence too long to be
sequenced completely in one experiment.
● Alternatively, it is possible to extend the sequence in one direction by synthesizing a
non-universal primer, designed to anneal at a position within the insert DNA.
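The choice between universal and internal primers comes down to simple arithmetic on read length. The sketch below assumes reads of 750 bp from each end, ignores the short stretch of vector at the start of each read, and assumes no overlap between successive reads. The function names are hypothetical.

```python
import math


def uncovered_gap(insert_len: int, read_len: int = 750) -> int:
    """Bases left unsequenced after one forward and one reverse read."""
    return max(0, insert_len - 2 * read_len)


def internal_primers_needed(insert_len: int, read_len: int = 750) -> int:
    """Minimum number of extra insert-specific primers needed to close the gap."""
    return math.ceil(uncovered_gap(insert_len, read_len) / read_len)


if __name__ == "__main__":
    for insert_len in (1200, 2000, 3100):
        print(insert_len, uncovered_gap(insert_len), internal_primers_needed(insert_len))
```

In practice reads are designed to overlap so that they can be joined with confidence, so the real number of primers may be higher than this lower bound.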
16. Chemical Sequencing of DNA
● Allan Maxam and Walter Gilbert published a DNA sequencing method in 1977 based
on chemical modification of DNA and subsequent cleavage at specific bases.
● This method allows purified samples of double-stranded DNA to be used without
further cloning.
● Maxam-Gilbert sequencing requires radioactive labelling at the 5′ end or 3′ end of the
DNA, followed by purification of the DNA fragment to be sequenced.
17. Procedure:
Step I:
● The starting material is double-stranded
DNA, which is first labeled by attaching a
radioactive phosphorus group to the 5’ end
of each strand.
● Dimethylsulfoxide (DMSO) is then added
and the DNA heated to 90˚C. This breaks the
base pairing between the strands, enabling
them to be separated from one another by gel
electrophoresis. The separation works because
one of the strands probably contains more
purine nucleotides than the other and is
therefore slightly heavier and runs more
slowly during electrophoresis.
18. Step II:
● One strand is purified from the gel and
divided into four samples, each of which is
treated with one of the cleavage reagents. To
illustrate the procedure, we will follow the
“G” reaction.
● First, the molecules are treated with dimethyl
sulfate, which attaches a methyl group to
the purine ring of G nucleotides. Only a
limited amount of dimethyl sulfate is added,
the objective being to modify, on average,
just one G residue per polynucleotide.
● At this stage the DNA strands are still intact,
cleavage not occurring until a second
chemical—piperidine—is added.
19. ● Piperidine removes the modified purine ring and cuts the DNA molecule at the
phosphodiester bond immediately upstream of the baseless site that is created.
● The result is a set of cleaved DNA molecules, some of which are labeled and some of
which are not. The labeled molecules all have one end in common and one end
determined by the cut sites, the latter indicating the positions of the G nucleotides in
the DNA molecules that were cleaved.
● Similar approaches are used to generate additional families of cleaved molecules,
though these are usually not simply “A,” “T,” and “C” families as problems have been
encountered in developing chemical treatments to cut specifically at A or T.
● The four reactions that are carried out are therefore usually “G,” “A + G,” “C,” and
“C + T”. This complicates things but does not affect the accuracy of the sequence that
is determined.
20. Reading the sequence from autoradiography:
● The family of molecules generated in each reaction is
loaded into a lane of a polyacrylamide slab gel and, after
electrophoresis, the positions of the bands in the gel are
visualized by autoradiography.
● The band that has moved the furthest represents the
smallest piece of DNA. In this example, that band lies in
the “A + G” lane. There is no equivalent-sized band in
the “G” lane, so the first nucleotide in the sequence is “A”.
● The next size position is occupied by two bands, one in
the “C” lane and one in the “C + T” lane: the second
nucleotide is therefore “C” and the sequence so far is
“AC”. The sequence reading can be continued up to the
region of the gel where individual bands were not
separated.
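The lane-by-lane reasoning used to read the autoradiograph can be written as a small Python sketch. It assumes the four reactions “G”, “A + G”, “C”, and “C + T” and represents each lane as a set of band sizes. The data structure, the function name, and the example bands are hypothetical.

```python
def read_maxam_gilbert(lanes):
    """lanes: dict mapping 'G', 'A+G', 'C', 'C+T' to sets of band sizes.

    Bands are read from the smallest fragment upward. A band in 'A+G'
    without a partner in 'G' is an A; a band in 'C+T' without a partner
    in 'C' is a T.
    """
    sizes = sorted(set().union(*lanes.values()))
    sequence = []
    for size in sizes:
        if size in lanes["G"]:
            sequence.append("G")
        elif size in lanes["A+G"]:
            sequence.append("A")
        elif size in lanes["C"]:
            sequence.append("C")
        elif size in lanes["C+T"]:
            sequence.append("T")
    return "".join(sequence)


if __name__ == "__main__":
    # Hypothetical gel: every G band also shows up in A+G, every C band in C+T.
    lanes = {"G": {3}, "A+G": {1, 3}, "C": {2}, "C+T": {2, 4}}
    print(read_maxam_gilbert(lanes))
```

The order of the checks matters: the specific lanes (“G”, “C”) are tested before the combined lanes so that a shared band is not misread.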
21. Maxam-Gilbert sequencing
(Summary)
This is a chemical-degradation
method that allows dsDNA to be
sequenced without prior in vivo
cloning steps.
It relies on specific modification
of the DNA nitrogenous bases (A, C,
G & T) and subsequent cleavage of
the ssDNA phosphate backbone at
the modified sites.
22. RNA Sequencing
RNA sequencing (RNA-Seq) uses the capabilities of high-throughput sequencing methods to
provide insight into the transcriptome of a cell.
Compared to previous Sanger sequencing- and microarray-based methods, RNA-Seq
provides far higher coverage and greater resolution of the dynamic nature of the
transcriptome.
Beyond quantifying gene expression, the data generated by RNA-Seq facilitate the
discovery of novel transcripts, identification of alternatively spliced genes, and
detection of allele-specific expression.
In addition to polyadenylated messenger RNA (mRNA) transcripts, RNA-Seq can be
applied to investigate different populations of RNA, including total RNA, pre-mRNA, and
noncoding RNA, such as microRNA and long ncRNA
23. Procedure:
1. RNA is extracted from the biological
material of choice (e.g., cells, tissues).
2. Subsets of RNA molecules are isolated
using a specific protocol, such as the
poly-A selection protocol to enrich for
polyadenylated transcripts or a ribo-
depletion protocol to remove ribosomal
RNAs.
3. The RNA is converted to complementary
DNA (cDNA) by reverse transcription
and sequencing adaptors are ligated to the
ends of the cDNA fragments.
4. Following amplification by PCR, the
RNA-Seq library is ready for sequencing.
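The reverse transcription in step 3 can be illustrated at the sequence level. The sketch below derives the first-strand cDNA as the reverse complement of the RNA, written in DNA letters. It ignores priming, fragmentation, and adaptor ligation, and the function name is hypothetical.

```python
# Base pairing used by reverse transcriptase: RNA base -> DNA base.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna_5to3: str) -> str:
    """Return the first-strand cDNA (5'->3') for an RNA given 5'->3'."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna_5to3))


if __name__ == "__main__":
    print(first_strand_cdna("AUGGC"))
```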
24. Step I: Isolation of RNA
● The first step in transcriptome sequencing is the isolation of RNA from a biological
sample.
● To ensure a successful RNA-Seq experiment, the RNA should be of sufficient quality
to produce a library for sequencing.
● The quality of RNA is typically measured using an Agilent Bioanalyzer, which produces
an RNA Integrity Number (RIN) between 1 and 10 with 10 being the highest quality
samples showing the least degradation.
● The RIN estimates sample integrity using gel electrophoresis and analysis of the ratios
of 28S to 18S ribosomal bands.
● Low-quality RNA (RIN < 6) can substantially affect the sequencing results (e.g.,
uneven gene coverage, 3′–5′ transcript bias, etc.) and lead to erroneous biological
conclusions.
● Therefore, high-quality RNA is essential for successful RNA-Seq experiments.
25. Step II: Selection of RNA
Before constructing RNA-Seq libraries, one must choose an appropriate library
preparation protocol that will enrich or deplete a “total” RNA sample for particular RNA
species.
The total RNA pool includes ribosomal RNA (rRNA), precursor messenger RNA (pre-
mRNA), mRNA, and various classes of noncoding RNA (ncRNA).
In most cell types, the majority of RNA molecules are rRNA, typically accounting for over
95% of the total cellular RNA.
If the rRNA transcripts are not removed before library construction, they will consume the
bulk of the sequencing reads, reducing the overall depth of sequence coverage and thus
limiting the detection of other less-abundant RNAs.
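The cost of leaving rRNA in the library can be made concrete with a short calculation. The sketch assumes that reads are drawn in proportion to RNA abundance. The run size of 30 million reads and the 5% residual rRNA after depletion are hypothetical values chosen for illustration.

```python
def informative_reads(total_reads: int, rrna_fraction: float) -> int:
    """Reads expected to come from non-rRNA molecules."""
    return round(total_reads * (1 - rrna_fraction))


if __name__ == "__main__":
    total = 30_000_000  # hypothetical sequencing run
    print(informative_reads(total, 0.95))  # rRNA not removed
    print(informative_reads(total, 0.05))  # after depletion (assumed residual)
```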
26. Because the efficient removal of rRNA is critical for successful transcriptome profiling,
many protocols focus on enriching for mRNA molecules before library construction by
selecting for polyadenylated (poly-A) RNAs. In this approach, the 3′ poly-A tail of
mRNA molecules is targeted using poly-T oligos that are covalently attached to a
given substrate (e.g., magnetic beads).
Alternatively, researchers can selectively deplete rRNA using commercially available
kits, such as RiboMinus (Life Technologies) or RiboZero (Epicentre). This latter
method facilitates the accurate quantification of noncoding RNA species, which may not be
polyadenylated and are therefore excluded from poly-A libraries.
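The consequence of poly-A selection for non-polyadenylated RNAs can be shown with a toy filter. The sketch treats capture by poly-T oligos as a simple test for a run of A residues at the 3′ end. The minimum tail length of 20 and the example transcripts are hypothetical, and real capture depends on hybridization conditions rather than a fixed cutoff.

```python
def poly_a_selected(transcripts, min_tail: int = 20):
    """Return names of transcripts whose 3' end carries at least min_tail A residues."""
    tail = "A" * min_tail
    return [name for name, seq in transcripts.items() if seq.endswith(tail)]


if __name__ == "__main__":
    transcripts = {
        "mRNA_1": "AUGGC" + "A" * 30,  # polyadenylated
        "ncRNA_1": "AUGGCUUC",          # no poly-A tail, lost in selection
        "short_tail": "AUGG" + "A" * 5,
    }
    print(poly_a_selected(transcripts))
```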
27. Step III: Library Preparation
Following RNA isolation, the next step in transcriptome sequencing is the creation of an
RNA-Seq library, which can vary by the selection of RNA species and between NGS
platforms.
The construction of sequencing libraries principally involves isolating the desired RNA
molecules, reverse-transcribing the RNA to cDNA, fragmenting or amplifying
randomly primed cDNA molecules, and ligating sequencing adaptors.
Within these basic steps, there are several choices in library construction and experimental
design that must be carefully made depending on the specific needs.
Additionally, the accuracy of detection for specific types of RNAs is largely dependent on
the nature of the library construction.
29. Sequencing
The selection of a sequencing platform is important and dependent on the experimental
goals. Currently, several NGS platforms are commercially available and other platforms
are under active technological development.
The majority of high-throughput sequencing platforms use a sequencing-by-synthesis
method to sequence tens of millions of sequence clusters in parallel.
The NGS platforms can often be categorized as either ensemble-based (i.e. sequencing
many identical copies of a DNA molecule) or single-molecule-based (i.e. sequencing a
single DNA molecule).
The differences between these sequencing techniques and platforms can affect downstream
analysis and interpretation of the sequencing data.
30. RNA-Sequencing Data Analysis
Following typical RNA-Seq experiments, reads are
first aligned to a reference genome.
Second, the reads may be assembled into transcripts
using reference transcript annotations or de novo
assembly approaches.
Next, the expression level of each gene is estimated
by counting the number of reads that align to each
exon or full-length transcript.
Downstream analyses with RNA-Seq data include
testing for differential expression between samples,
detecting allele-specific expression, and identifying
expression quantitative trait loci (eQTLs).
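The counting step can be sketched in a few lines of Python. The example tallies reads per gene and then applies transcripts-per-million (TPM) scaling. TPM is an assumption added here for illustration, since the text does not name a normalization method. Gene names, lengths, and reads are hypothetical.

```python
from collections import Counter


def counts_per_gene(aligned_genes):
    """aligned_genes: one gene name per aligned read."""
    return Counter(aligned_genes)


def tpm(counts, lengths_kb):
    """Length-normalize the counts, then scale so the values sum to one million."""
    rate = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rate.values())
    return {gene: value / total * 1e6 for gene, value in rate.items()}


if __name__ == "__main__":
    reads = ["geneA"] * 100 + ["geneB"] * 100
    counts = counts_per_gene(reads)
    print(tpm(counts, {"geneA": 1.0, "geneB": 4.0}))
```

Both genes receive the same number of reads, but the shorter gene gets the higher expression estimate because longer transcripts generate more fragments per molecule.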
32. Introduction
● Next-generation sequencing (NGS) is a massively parallel sequencing technology
that offers ultra-high throughput, scalability, and speed.
● The technology is used to determine the order of nucleotides in entire genomes or
targeted regions of DNA or RNA.
● Next-generation sequencing includes the following techniques:
1. Pyrosequencing
2. Roche 454 sequencing
3. Illumina sequencing
33. Pyrosequencing
● Pyrosequencing is a DNA sequencing technique based on sequencing-by-synthesis
enabling rapid real-time sequence determination. This technique employs four
enzymatic reactions in a single tube to monitor DNA synthesis.
● In pyrosequencing, the signal is detected in real time as each
nucleotide is incorporated by a DNA polymerase.
● The advantage with pyrosequencing is that it can be automated in a massively
parallel manner that enables hundreds of thousands of sequences to be obtained at
once, perhaps as much as 1000 Mb in a single run.
● Sequence is therefore produced much more quickly than is possible by the chain
termination method, which explains why pyrosequencing is gradually taking over as
the method of choice for genome projects.
34. Starting material:
The starting material is a preparation of identical single-stranded DNA molecules.
These are obtained by alkali denaturation of PCR products or, more rarely,
recombinant plasmid molecules.
After attachment of the primer, the template is copied by a DNA polymerase in a
straightforward manner. As the new strand is being made, the order in which the
deoxynucleotides are incorporated is detected, so the sequence can be “read” as the
reaction proceeds.
This procedure makes it possible to follow the order in which the deoxynucleotides are
incorporated into the growing strand.
35. ● The addition of a deoxynucleotide to the
end of the growing strand is detectable
because it is accompanied by release of a
molecule of pyrophosphate, which the
enzyme ATP sulfurylase converts into ATP;
the ATP then drives a luciferase reaction that
emits a flash of chemiluminescence.
● Of course, if all four deoxynucleotides were
added at once, then flashes of light would
be seen all the time and no useful sequence
information would be obtained.
● Each deoxynucleotide is therefore added
separately, one after the other, with a nucleotidase enzyme also present in the
reaction mixture so that if a deoxynucleotide is not incorporated into the
polynucleotide then it is rapidly degraded before the next one is added.
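The one-nucleotide-at-a-time logic can be simulated in Python. The sketch dispenses nucleotides in a fixed repeating order and records a flash whenever the dispensed nucleotide matches the next base of the growing strand. Two assumptions are added for illustration: flash intensity is taken to be proportional to the number of identical bases incorporated in a row, and the dispensation order is A, C, G, T. The function names are hypothetical.

```python
def pyrogram(new_strand: str, dispensation_order: str = "ACGT", max_cycles: int = 100):
    """Return (nucleotide, flash_intensity) for each dispensation.

    new_strand is the sequence of the strand being synthesized, 5'->3'.
    An intensity of 0 means no incorporation (the nucleotide is degraded).
    """
    position = 0
    flashes = []
    for step in range(max_cycles * len(dispensation_order)):
        if position >= len(new_strand):
            break
        nucleotide = dispensation_order[step % len(dispensation_order)]
        run = 0
        # Incorporate as many consecutive matching bases as the strand allows.
        while position < len(new_strand) and new_strand[position] == nucleotide:
            run += 1
            position += 1
        flashes.append((nucleotide, run))
    return flashes


def decode(flashes) -> str:
    """Rebuild the sequence from the recorded flashes."""
    return "".join(nucleotide * intensity for nucleotide, intensity in flashes)


if __name__ == "__main__":
    flashes = pyrogram("GGAT")
    print(flashes)
    print(decode(flashes))
```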
36. Massively parallel pyrosequencing
A and B: DNA fragmentation and
adaptor attachment: The DNA is
broken into fragments between
300 and 500 bp in length, and each
fragment is ligated to a pair of
adaptors, one adaptor to either
end.
C: Attachment of DNA with bead: Adaptors enable
the DNA fragments to be attached to small metallic
beads. This is because one of the adaptors has a
biotin label attached to its 5′ end, and the beads are
coated with streptavidin, to which biotin binds with
great affinity.
DNA fragments therefore become attached to the
beads via biotin-streptavidin linkages.
37. D: Amplification of DNA: Each DNA fragment
will now be amplified by PCR so that enough
copies are made for sequencing.
● The adaptors now play their second role as
they provide the annealing sites for the
primers for this PCR.
● The same pair of primers can therefore be
used for all the fragments, even though the
fragments themselves have many different
sequences.
● If the PCR is carried out immediately then all we will obtain is a mixture of all the
products, which will not enable us to obtain the individual sequences of each one.
● To solve this problem, PCR is carried out in an oil emulsion, each bead residing in
its own aqueous droplet within the emulsion.
38. E: Sequencing:
Each droplet contains all the reagents needed for PCR, and is physically separated from all
the other droplets by the barrier provided by the oil component of emulsion.
After PCR, the aqueous droplets are transferred into wells on a plastic strip so there is one
droplet and hence one PCR product per well, and the pyrosequencing reactions are
carried out in each well.
39. Roche 454 sequencing
The 454 sequencing technology takes pyrosequencing one step further, allowing for bulk
sequencing of an entire genome.
Step I: Genomic DNA is fragmented and ligated to specific adapter molecules, which
provide the annealing sites for the primers in a polymerase chain reaction (PCR).
Step II: The PCR reaction, carried out on synthetic beads in an oil emulsion, takes a
single piece of DNA and replicates it until 10 million copies of a discrete DNA molecule
are bound to each bead.
Step III: Each bead is deposited into an individual well of a fiber-optic slide, where it
meets a cocktail of the enzymes required for the pyrophosphate reaction. These steps
take the place of the laborious task of cloning individual DNA molecules and eliminate
biases that can be introduced by cloning a population of fragments (transformed bacteria
will preferentially replicate certain sequence stretches over others).
40. Step IV: A loaded slide is then fed into the sequencing instrument, which washes
deoxynucleotides over the plate, extending the chains of DNA templates in each well and
promoting photon release.
Step V: A computer records the release of light, logs the sequence of the DNA in each well,
and eventually interprets these data to align smaller bits of sequence into a full genome
sequence.
In 2005, 454 Life Sciences released the genome of Mycoplasma genitalium, the first
organism sequenced by this technology.
41. Illumina Sequencing
Illumina sequencing technology uses clonal
array formation (i.e. solid phase amplification)
and proprietary reversible terminator technology
(a reversible terminator is a modified nucleotide
analogue that can terminate primer extension
reversibly) for rapid and accurate large-scale sequencing.
The innovative and flexible sequencing system
enables a broad array of applications in
genomics, transcriptomics, and epigenomics.
42. Step I: Cluster Generation
[Figure panels: Prepare Genomic DNA Sample; Attach DNA to Surface]
48. Sequencing by Synthesis
Sequencing by synthesis (SBS) technology uses four fluorescently
labeled nucleotides to sequence the tens of millions of clusters on the flow cell surface in
parallel.
During each sequencing cycle, a single labeled deoxynucleoside triphosphate (dNTP) is
added to the nucleic acid chain. The nucleotide label serves as a terminator for
polymerization, so after each dNTP incorporation, the fluorescent dye is imaged to
identify the base and then enzymatically cleaved to allow incorporation of the next
nucleotide. Since all four reversible terminator-bound dNTPs (A, C, T, G) are present
as single, separate molecules, natural competition minimizes incorporation bias.
Base calls are made directly from signal intensity measurements during each cycle, which
greatly reduces raw error rates compared to other technologies. The end result is highly
accurate base-by-base sequencing that eliminates sequence-context specific errors, enabling
robust base calling across the genome, including repetitive sequence regions and within
homopolymers.
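Base calling from per-cycle signal intensities can be sketched as follows. The example assumes that each cycle yields one intensity value per base for a given cluster and simply picks the brightest channel. This is a deliberate simplification: the channel order, the input format, and the function name are hypothetical, and production base callers apply further corrections and assign quality scores.

```python
CHANNELS = ("A", "C", "G", "T")  # assumed order of the four intensity values


def call_bases(cycles) -> str:
    """cycles: sequence of 4-tuples of intensities, one tuple per sequencing cycle."""
    calls = []
    for intensities in cycles:
        brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        calls.append(CHANNELS[brightest])
    return "".join(calls)


if __name__ == "__main__":
    # Hypothetical intensities for one cluster over three cycles.
    cluster = [(0.9, 0.1, 0.0, 0.1), (0.1, 0.05, 0.8, 0.1), (0.0, 0.1, 0.1, 0.7)]
    print(call_bases(cluster))
```

Because exactly one terminator-bound nucleotide is added per cycle, each tuple maps to exactly one base, including within runs of identical bases.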