Sequencing is one of the major technological advancements to have taken shape over the last two or three decades. From the Sanger and Maxam-Gilbert methods to the latest high-throughput methods, sequencing technologies have changed the landscape of the biological sciences.
This slide deck takes a look at the major sequencing methods over time.
Note: Several images included here have been sourced from GOOGLE IMAGES. The content has been extracted from several SCIENTIFIC PAPERS and WEBSITES.
PLEASE DO CONTACT THE AUTHOR DIRECTLY IF ANY COPYRIGHT ISSUE ARISES.
2. Sequencing
• The process of determining the primary structure of an unbranched
biopolymer.
• Usually used to refer to DNA/RNA sequencing.
3. DNA Sequencing
• Is the process of determining the sequence of nucleotides (As, Ts, Cs,
and Gs) in a piece of DNA.
• Methods:
First generation methods
Sequencing by Chemical modification.
Sequencing by Chain termination.
4. RNA Sequencing
• Need for RNA-seq: Sequencing DNA gives a genetic profile of an organism, whereas
sequencing RNA reflects only the sequences that are actively expressed in the
cells.
• Limitation for direct sequencing: RNA is less stable in the cell, and also more
prone to nuclease attack experimentally.
• Method:
Extraction of RNA
Reverse transcription to form cDNA
Sequencing of cDNA
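As a rough illustration of the reverse-transcription step, the sketch below derives the first-strand cDNA that is complementary to an RNA sequence. The function name and the example sequence are made up for illustration and are not part of any sequencing toolkit.

```python
# Hypothetical helper: the first-strand cDNA is the reverse complement of the
# RNA, written with DNA bases (U in RNA pairs with A, A in RNA pairs with T).
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna: str) -> str:
    """Return the first-strand cDNA (5'->3') for an RNA given 5'->3'."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna.upper()))


print(first_strand_cdna("AUGGCC"))  # GGCCAT
```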
5. Sequencing by Chemical modification
• Developed by Allan Maxam and Walter Gilbert in 1976–1977.
• Also known as Maxam and Gilbert Method.
• Based on nucleobase-specific partial chemical modification of DNA and
subsequent cleavage of the DNA backbone at sites adjacent to the modified
nucleotides.
Base Specific Modification
G: Methylation of N7 with dimethylsulphate at pH 8.0
renders the C8–N9 bond specifically susceptible to
cleavage by base
A+G: Piperidine formate (pH 2) weakens the
glycosidic bonds of A and G residues by protonating
N atoms in the purine rings, resulting in depurination
C+T: Hydrazine opens pyrimidine rings, which
recyclize in a five-membered form that is susceptible
to removal
C: In the presence of 1.5 M NaCl, only cytosine
reacts appreciably with hydrazine
6. Sequencing by Chemical modification
STEPS
Purification of the DNA
Radioactive labeling at 5′ end (typically by a kinase reaction using
gamma-32P ATP)
Chemical treatment to modify a small proportion of
one or two of the four nucleotide bases in each of four reactions
Modified DNAs are then cleaved by hot piperidine, (CH2)5NH,
at the position of the modified base
Fragments in the four reactions are electrophoresed side by side in
denaturing acrylamide gels for size separation
Note: In all cases reactions are carried out under carefully controlled conditions to
ensure that on average only one of the target bases in each DNA molecule is modified
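The way a sequence is read from the four lanes (G, A+G, C+T, C) can be sketched as below. This is a simplified model, not a simulation of the chemistry: it assumes every target base is cleaved in some molecule, and it records the position of the cleaved base directly instead of the labelled fragment length. The function names are hypothetical.

```python
def maxam_gilbert_lanes(seq):
    """Return, for each lane, the set of 1-based positions where cleavage occurs."""
    lanes = {"G": set(), "A+G": set(), "C+T": set(), "C": set()}
    for pos, base in enumerate(seq, start=1):
        for lane in lanes:
            if base in lane.split("+"):
                lanes[lane].add(pos)
    return lanes


def read_gel(lanes, length):
    """Infer each base from the combination of lanes that show a band."""
    calls = []
    for pos in range(1, length + 1):
        if pos in lanes["G"]:
            calls.append("G")      # band in both G and A+G
        elif pos in lanes["A+G"]:
            calls.append("A")      # band in A+G only
        elif pos in lanes["C"]:
            calls.append("C")      # band in both C and C+T
        elif pos in lanes["C+T"]:
            calls.append("T")      # band in C+T only
        else:
            calls.append("N")      # no band: base cannot be called
    return "".join(calls)


seq = "GATTACA"
print(read_gel(maxam_gilbert_lanes(seq), len(seq)))  # GATTACA
```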
7. Sequencing by Chain termination
• Also known as Sanger sequencing.
• Developed by Frederick Sanger and colleagues in 1977.
• Based on dideoxynucleotide triphosphates (ddNTPs), which lack the 3′-OH group needed for chain elongation.
8. Sequencing by Chain termination
The template DNA is amplified, e.g. by Polymerase Chain Reaction (PCR)
The DNA sample is divided into four separate sequencing
reactions, each containing all four of the standard deoxynucleotides
(dATP, dGTP, dCTP and dTTP) and the DNA polymerase
To each reaction is added only one of the four
dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP)
The resulting DNA fragments are heat denatured (Snap-Chill method)
and separated by size using gel electrophoresis
The DNA bands may then be visualized by autoradiography
or UV light and the DNA sequence can be directly read off
the X-ray film or gel image.
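Reading a chain-termination gel can be sketched as follows. The model assumes that, in the reaction containing a given ddNTP, termination happens at every position where that base occurs in the newly synthesized strand, so each lane holds one fragment length per occurrence. Primer length is ignored and the function names are illustrative only.

```python
def sanger_lanes(new_strand):
    """Map each ddNTP reaction (A, C, G, T) to the fragment lengths it yields."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_sanger_gel(lanes):
    """Read bands from shortest to longest; the lane of each band gives the base."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = sanger_lanes("ATGCCGTA")
print(lanes["A"])              # [1, 8]
print(read_sanger_gel(lanes))  # ATGCCGTA
```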
9. Next-Generation DNA Sequencing
• Also known as high-throughput sequencing.
• NGS is the general term used to describe a number of different modern sequencing technologies
including:
Illumina (Solexa) sequencing
Roche 454 sequencing
Ion torrent: Proton / PGM sequencing
SOLiD sequencing
... and a few others
Advantages:
Sequencing is quicker and cheaper.
Disadvantages:
The associated error rates (~0.1–15%) are higher.
The read lengths are generally shorter (35–700 bp for
short-read approaches).
11. Sequencing by ligation (SBL)
• Involves the hybridization and ligation of labelled probe and anchor sequences to a DNA
strand.
• The probes encode one or two known bases and a series of degenerate or universal bases,
driving complementary binding between the probe and template.
• The anchor fragment encodes a known sequence that is complementary to an adapter
sequence and provides a site to initiate ligation.
• After ligation, the template is imaged and the known base or bases in the probe are
identified.
• A new cycle begins after complete removal of the anchor–probe complex or through
cleavage to remove the fluorophore and to regenerate the ligation site.
• Platforms: SOLiD and Complete Genomics
13. SBL: SOLiD
Sequencing by Oligonucleotide Ligation and
Detection utilizes two-base-encoded probes, in which
each fluorometric signal represents a dinucleotide.
The raw output is not directly associated with the
incorporation of a known nucleotide. Because the 16
possible dinucleotide combinations cannot be
individually associated with spectrally resolvable
fluorophores, four fluorescent signals are used, each
representing a subset of four dinucleotide
combinations.
Each ligation signal represents one of several possible
dinucleotides, leading to the term colour-space (rather
than base-space), which must be deconvoluted
during data analysis.
The SOLiD sequencing procedure is composed of a
series of probe–anchor binding, ligation, imaging and
cleavage cycles to elongate the complementary strand.
Over the course of the cycles, single-nucleotide
offsets are introduced to ensure every base in the
template strand is sequenced.
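Colour-space deconvolution can be illustrated with the XOR scheme that is commonly used to describe two-base encoding: each base gets a 2-bit code, and the colour of a dinucleotide is the XOR of its two codes, so each of the four colours stands for four dinucleotides. The numbering of colours as 0 to 3 and the function names are assumptions for this sketch. Decoding needs the first base to be known.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colour_space(seq):
    """Return the first base plus one colour (0-3) per overlapping dinucleotide."""
    codes = [BASE_TO_CODE[base] for base in seq]
    colours = [a ^ b for a, b in zip(codes, codes[1:])]
    return seq[0], colours


def decode_colour_space(first_base, colours):
    """Deconvolute colours back to bases, starting from the known first base."""
    code = BASE_TO_CODE[first_base]
    bases = [first_base]
    for colour in colours:
        code ^= colour
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


first, colours = encode_colour_space("ATGGC")
print(colours)                              # [3, 1, 0, 3]
print(decode_colour_space(first, colours))  # ATGGC
```

Because every base is decoded from the one before it, a single wrong colour call changes all downstream bases, which is one reason the deconvolution is left to data analysis.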
14. SBL: Complete Genomics
Complete Genomics performs DNA sequencing using combinatorial
probe–anchor ligation (cPAL) or combinatorial
probe–anchor synthesis (cPAS).
In cPAL, an anchor sequence (complementary to one
of the four adaptor sequences) and a probe hybridize
to a DNA nanoball at several locations.
In each cycle, the hybridizing probe is a member of a
pool of one-base-encoded probes, in which each
probe contains a known base in a constant position
and a corresponding fluorophore.
After imaging, the entire probe–anchor complex is
removed and a new probe–anchor combination is
hybridized.
Each subsequent cycle utilizes a probe set with the
known base in the n + 1 position.
Further cycles in the process also use adaptors of
variable lengths and chemistries, allowing
sequencing to occur upstream and downstream of the
adaptor sequence.
The cPAS approach is a modification of cPAL
intended to increase read lengths of Complete
Genomics’ chemistry.
15. Sequencing by synthesis (SBS)
• SBS is a term used to describe numerous DNA-polymerase-dependent
methods.
• SBS approaches can be classified as:
Cyclic reversible termination (CRT)
Single-nucleotide addition (SNA)
Platforms: Illumina, Qiagen, 454, Ion Torrent
16. SBS: CRT
CRT approaches are defined by their use of terminator molecules that are similar to those used in Sanger
sequencing, in which the ribose 3ʹ‐OH group is blocked, thus preventing elongation.
To begin the process, a DNA template is primed by a sequence that is complementary to an adapter region,
which will initiate polymerase binding to this double-stranded DNA (dsDNA) region.
During each cycle, a mixture of all four individually labelled and 3ʹ‐blocked deoxynucleotides (dNTPs) is
added. After the incorporation of a single dNTP to each elongating complementary strand, unbound dNTPs are
removed and the surface is imaged to identify which dNTP was incorporated at each cluster.
The fluorophore and blocking group can then be removed and a new cycle can begin.
Platforms: Illumina, Qiagen
17. SBS: CRT [Illumina]
• Accounts for the largest market share for sequencing instruments.
• Illumina’s suite of instruments for short-read sequencing range from small, low-throughput benchtop units to
large ultra-high throughput instruments dedicated to population-level whole-genome sequencing (WGS).
• dNTP identification is achieved through total internal reflection fluorescence (TIRF) microscopy using either two
or four laser channels.
• In most Illumina platforms, each dNTP is bound to a single fluorophore that is specific to that base type and
requires four different imaging channels, whereas the NextSeq and MiniSeq systems use a two-fluorophore
system.
TIRF microscopy (TIRFM) allows a thin region of a specimen, usually less than 200 nanometers thick, to be observed.
A video of Illumina Sequencing by Synthesis is
available at https://youtu.be/fCd6B5HRaZ8
18. SBS: CRT [Qiagen]
• GeneReader is intended to be an all‐in‐one NGS platform, from sample preparation to analysis.
• To accomplish this, the GeneReader system is bundled with the QIAcube sample preparation system and the
Qiagen Clinical Insight platform for variant analysis.
• The GeneReader uses virtually the same approach as that used by Illumina; however, it does not aim to ensure
that each template incorporates a fluorophore labelled dNTP.
• Rather, GeneReader aims to ensure that just enough labelled dNTPs are incorporated to achieve identification.
19. SBS: SNA
SNA approaches rely on a single signal to mark the incorporation of a dNTP into an elongating strand.
Each of the four nucleotides must be added iteratively to a sequencing reaction to ensure only one dNTP is
responsible for the signal.
This does not require the dNTPs to be blocked, as the absence of the next nucleotide in the sequencing reaction
prevents elongation.
The exception to this is homopolymer regions where identical dNTPs are added, with sequence identification
relying on a proportional increase in the signal as multiple dNTPs are incorporated.
Platforms: 454, Ion Torrent
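The iterative nucleotide flows and the homopolymer signal can be sketched as a small flowgram model. It assumes an ideal signal that is exactly proportional to the number of bases incorporated per flow, and a repeating flow order of T, A, C, G chosen here for illustration. Real instruments produce noisy signals, which is why long homopolymers are hard to call.

```python
def flowgram(new_strand, flow_order="TACG", max_flows=400):
    """Return (nucleotide, signal) per flow for the strand being synthesized."""
    signals = []
    pos = 0
    flows = 0
    while pos < len(new_strand) and flows < max_flows:
        nuc = flow_order[flows % len(flow_order)]
        run = 0
        # All consecutive identical bases are incorporated in the same flow.
        while pos < len(new_strand) and new_strand[pos] == nuc:
            run += 1
            pos += 1
        signals.append((nuc, run))
        flows += 1
    return signals


def call_bases(signals):
    """Turn ideal flow signals back into a base sequence."""
    return "".join(nuc * count for nuc, count in signals)


signals = flowgram("TTAGGGC")
print(signals[:4])          # [('T', 2), ('A', 1), ('C', 0), ('G', 3)]
print(call_bases(signals))  # TTAGGGC
```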
20. SBS: SNA [454]
• The first NGS instrument developed was the 454 pyrosequencing device.
• This SNA system distributes template-bound beads into a PicoTiterPlate along with beads containing an enzyme
cocktail. As a dNTP is incorporated into a strand, an enzymatic cascade occurs, resulting in a bioluminescence
signal. Each burst of light, detected by a charge-coupled device (CCD) camera, can be attributed to the
incorporation of one or more identical dNTPs at a particular bead.
21. SBS: SNA [Ion Torrent]
• The Ion Torrent was the first NGS platform without optical sensing.
• Does not depend on an enzymatic reaction to generate a signal.
• It detects the H+ ions that are released as each dNTP is incorporated.
• The resulting change in pH is detected by an integrated complementary metal-oxide-semiconductor (CMOS) and
an ion-sensitive field-effect transistor (ISFET).
• The pH change detected by the sensor is imperfectly proportional to the number of nucleotides detected, allowing
for limited accuracy in measuring homopolymer lengths.
22. Long-read sequencing
• Genomes are highly complex with many long repetitive elements, copy number alterations and
structural variations that are relevant to evolution, adaptation and disease.
• Many of these complex elements are so long that short-read paired-end technologies are
insufficient to resolve them.
• Long-read sequencing delivers reads in excess of several kilobases, allowing for the resolution of
these large structural features.
• Long reads can span complex or repetitive regions with a single continuous read, thus eliminating
ambiguity in the positions or size of genomic elements.
• Long reads can also be useful for transcriptomic research, as they are capable of spanning entire
mRNA transcripts, allowing researchers to identify the precise connectivity of exons and discern
gene isoforms.
• Two main technologies:
Real-time long-read sequencing
Synthetic approaches
23. Real-time long-read sequencing
• The single-molecule approaches differ from short-read approaches in
that they do not rely on a clonal population of amplified DNA
fragments to generate a detectable signal, nor do they require chemical
cycling for each dNTP added.
• Platforms: SMRT-PacBio, MinION
24. Real-time long-read sequencing: [SMRT-PacBio]
• Uses a specialized flow cell with many thousands of individual picolitre wells
with transparent bottoms — zero-mode waveguides (ZMW).
• Polymerase is fixed at the bottom of the well and allows the DNA strand to
progress through the ZMW.
• By having a constant location of incorporation owing to the stationary enzyme, the
system can focus on a single molecule.
• dNTP incorporation on each single-molecule template per well is continuously
visualized with a laser and camera system that records the colour and duration of
emitted light as the labelled nucleotide momentarily pauses during incorporation at
the bottom of the ZMW.
• The polymerase cleaves the dNTP-bound fluorophore during incorporation,
allowing it to diffuse away from the sensor area before the next labelled dNTP is
incorporated.
• SMRT uses a unique circular template that allows each template to be sequenced
multiple times as the polymerase repeatedly traverses the circular molecule.
• Although it is difficult for DNA templates longer than ~3 kb to be sequenced
multiple times, shorter DNA templates can be sequenced many times as a function
of template length.
• These multiple passes are used to generate a consensus read of insert, known as a
circular consensus sequence (CCS).
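The idea of building a consensus from repeated passes over the same insert can be sketched with a per-position majority vote. This is a strong simplification: it assumes the passes are already aligned, have equal length, and differ only by substitutions, whereas real consensus calling must also handle insertions and deletions. The function name is hypothetical.

```python
from collections import Counter


def circular_consensus(passes):
    """Per-position majority vote over equally long, pre-aligned passes."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Four passes over one insert, each carrying at most one random error.
passes = ["ACGTAC", "ACGAAC", "ACGTAC", "TCGTAC"]
print(circular_consensus(passes))  # ACGTAC
```

With more passes, random errors at a given position are increasingly outvoted, which is why shorter templates that can be traversed many times yield more accurate consensus reads.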
25. Real-time long-read sequencing: [MinION]
• The first consumer prototype of the nanopore sequencer was made available in 2014 by Oxford Nanopore Technologies
(ONT).
• This platform does not monitor incorporations or hybridizations of nucleotides guided by a template DNA strand.
• Whereas other platforms use a secondary signal, light, colour or pH, nanopore sequencers directly detect the DNA
composition of a native ssDNA molecule.
• To carry out sequencing, DNA is passed through a protein pore as current is passed through the pore.
• As the DNA translocates through the action of a secondary motor protein, a voltage blockade occurs that modulates the
current passing through the pore.
• The temporal tracing of these charges is called squiggle space, and shifts in voltage are characteristic of the particular
DNA sequence in the pore, which can then be interpreted as a k‐mer.
• Rather than having 1–4 possible signals, the instrument has more than 1,000 — one for each possible k‐mer, especially
when modified bases present on native DNA are taken into account.
• The ONT MinION uses a leader hairpin library structure.
• This allows the forward DNA strand to pass through the pore, followed by a hairpin that links the two strands, and
finally the reverse strand. This generates 1D and 2D reads in which both ‘1D’ strands can be aligned to create a
consensus sequence ‘2D’ read.
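The figure of more than 1,000 possible signals follows from simple counting: with four bases there are 4^k distinct k-mers, which already exceeds 1,000 at k = 5. The value of k and the extra symbol "M" for a modified base are assumptions made only for this sketch.

```python
from itertools import product


def count_kmers(k, alphabet="ACGT"):
    """Number of distinct k-mers over the given alphabet."""
    return len(alphabet) ** k


def all_kmers(k, alphabet="ACGT"):
    """Enumerate every k-mer, e.g. as keys for a signal lookup table."""
    return ["".join(letters) for letters in product(alphabet, repeat=k)]


print(count_kmers(5))           # 1024
print(count_kmers(5, "ACGTM"))  # 3125, with one hypothetical modified base
print(all_kmers(2)[:5])         # ['AA', 'AC', 'AG', 'AT', 'CA']
```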
27. Synthetic approaches
• The synthetic approaches do not generate actual long-reads; rather,
they are an approach to library preparation that leverages barcodes to
allow computational assembly of a larger fragment.
• Platforms: Illumina synthetic long-read sequencing, 10X Genomics
emulsion-based system
28. Synthetic approaches [Illumina synthetic long-read sequencing]
• The Illumina system (formerly Moleculo) partitions DNA into a microtitre plate and does not require specialized
instrumentation.
29. Synthetic approaches [10X Genomics]
• 10X Genomics instruments (GemCode and Chromium) use
emulsion to partition DNA and require the use of a microfluidic
instrument to perform pre-sequencing reactions.
• With as little as 1 ng of starting material, the 10X Genomics
instruments can partition arbitrarily large DNA fragments, up to
~100 kb, into micelles called ‘GEMs’, which typically contain
≤0.3× copies of the genome and one unique barcode.
• Within each GEM, a gel bead dissolves and smaller fragments of
DNA are amplified from the original large fragments, each with a
barcode identifying the source GEM.
• After sequencing, the reads are aligned and linked together to
form a series of anchored fragments across the span of the
original fragment.
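Linking short reads through their barcodes can be sketched as below. The input format, a list of (barcode, alignment start, alignment end) tuples, is an assumption for illustration, and the sketch treats each barcode as a single source fragment. Real pipelines must also separate several molecules that share one GEM.

```python
from collections import defaultdict


def group_by_barcode(reads):
    """Collect aligned read intervals per barcode, sorted by position."""
    groups = defaultdict(list)
    for barcode, start, end in reads:
        groups[barcode].append((start, end))
    return {barcode: sorted(intervals) for barcode, intervals in groups.items()}


def linked_span(intervals):
    """Approximate extent of the original long fragment covered by the reads."""
    return min(start for start, _ in intervals), max(end for _, end in intervals)


reads = [
    ("BC1", 100, 250),
    ("BC2", 5_000, 5_150),
    ("BC1", 40_000, 40_150),
    ("BC1", 20_000, 20_150),
]
groups = group_by_barcode(reads)
print(groups["BC1"])               # [(100, 250), (20000, 20150), (40000, 40150)]
print(linked_span(groups["BC1"]))  # (100, 40150)
```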
30. Applications
• Rapidly sequence whole genomes.
• Zoom in to deeply sequence target regions.
• Utilize RNA sequencing (RNA-Seq) to discover novel RNA variants
and splice sites, or precisely quantify mRNAs for gene expression
analysis.
• Analyze epigenetic factors such as genome-wide DNA methylation
and DNA-protein interactions.
• Sequence cancer samples to study rare somatic variants, tumor
subclones, and more.
• Study microbial diversity.
32. Paired-End vs. Single-Read Sequencing
• In single-end reading, the sequencer reads a fragment from only one end to the other, generating the
sequence of base pairs.
• In paired-end reading, the sequencer starts at one end, reads in this direction up to the specified read length, and then starts
another round of reading from the opposite end of the fragment.
• Paired-end sequencing facilitates detection of genomic rearrangements and repetitive sequence elements, as
well as gene fusions and novel transcripts.
• Since paired-end reads are more likely to align to a reference, the quality of the entire data set improves.
33. Multiplex Sequencing
• Process a large number of samples with multiplex sequencing on a high-throughput instrument.
• Sample multiplexing is a useful technique when targeting specific genomic regions or working with smaller
genomes.
• To accomplish this, individual "barcode" sequences are added to each sample so they can be distinguished
and sorted during data analysis.
• Pooling samples greatly increases the number of samples analyzed in a single run, without drastically
increasing cost or time.
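Sorting pooled reads back to their samples can be sketched as follows. The sketch assumes in-line barcodes of equal length at the start of each read and exact matching only. Many platforms read the barcode in a separate index read and tolerate mismatches, so treat this as an illustration of the principle, with hypothetical names throughout.

```python
def demultiplex(reads, sample_barcodes):
    """Assign each read to a sample by its leading barcode and trim the barcode."""
    barcode_len = len(next(iter(sample_barcodes)))
    bins = {sample: [] for sample in sample_barcodes.values()}
    bins["undetermined"] = []
    for read in reads:
        sample = sample_barcodes.get(read[:barcode_len], "undetermined")
        bins[sample].append(read[barcode_len:])
    return bins


barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
pooled = ["ACGTGGGATC", "TGCATTTCCA", "NNNNGATTAC"]
print(demultiplex(pooled, barcodes))
# {'sample_A': ['GGGATC'], 'sample_B': ['TTTCCA'], 'undetermined': ['GATTAC']}
```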
34. Mate Pair Sequencing
• Mate pair sequencing involves generating long-insert paired-end DNA libraries
useful for a number of sequencing applications, including:
De novo sequencing
Genome finishing
Structural variant detection
Identification of complex genomic rearrangements
• Difference from paired-end sequencing: longer insert size (>800 bp) rather than longer reads
• Method: First, DNA is fragmented and fragments of a desired length (2-5 kb)
are isolated. The ends of the fragments are then biotinylated (biotin is
added), and the fragments are circularized, joining the biotinylated ends. The
DNA circle is then sheared into smaller fragments (400-600 bp). Biotinylated
fragments are enriched (via the biotin tag) and adapters are ligated. They are then
ready for cluster generation and sequencing. The trick here is that the enriched
fragment (400-600 bp) contains both ends of the original long fragment (2-5 kb),
which can now be sequenced. Sequencing therefore yields information
about the original fragment.
• Combining data from mate pair sequencing with that from short-insert paired-
end reads provides increased information for maximising sequencing coverage
across a genome
35. Deep Sequencing
• Deep sequencing refers to sequencing a genomic region multiple times, sometimes hundreds or even
thousands of times.
• This NGS approach allows researchers to detect rare clonal types, cells, or microbes comprising as little as
1% of the original sample.
• Deep sequencing is useful for studies in oncology, microbial genomics, and other research involving
analysis of rare cell populations.
• For example, deep sequencing is required to identify mutations within tumors, because normal cell
contamination is common in cancer samples, and the tumors themselves likely contain multiple sub-clones
of cancer cells.
36. Sequencing Coverage
• Sequencing coverage describes the average number of reads that align to, or "cover," known reference
bases.
• The next-generation sequencing (NGS) coverage level often determines whether variant discovery can be
made with a certain degree of confidence at particular base positions.
• Sequencing coverage requirements vary by application as well as on other factors such as size of reference
genome, gene expression level, published literature, and best practices defined by the scientific community.
• At higher levels of coverage, each base is covered by a greater number of aligned sequence reads, so base
calls can be made with a higher degree of confidence.
• Examples of sequencing coverage recommendations for some common applications include:
• For detecting human genome mutations, SNPs, and rearrangements, publications often recommend from 10× to 30× depth
of coverage, depending on the application and statistical model.
• For RNA sequencing, researchers usually think in terms of numbers of millions of reads to be sampled. Detecting rarely
expressed genes often requires an increase in the depth of coverage.
• For ChIP-Seq (chromatin immunoprecipitation sequencing), publications often recommend coverage of around 100×.
Average coverage = N * L / G
Where: G is length of the original genome, N is the number of reads, and L
is the average read length
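The coverage formula can be worked through in a few lines. The genome size, read length and read count below are round numbers chosen for illustration, not recommendations.

```python
import math


def average_coverage(num_reads, read_length, genome_length):
    """Average coverage = N * L / G."""
    return num_reads * read_length / genome_length


def reads_needed(target_coverage, read_length, genome_length):
    """Rearranged formula: N = coverage * G / L, rounded up to whole reads."""
    return math.ceil(target_coverage * genome_length / read_length)


G = 3_000_000_000  # genome length in bases (illustrative)
L = 150            # average read length in bases
N = 600_000_000    # number of reads

print(average_coverage(N, L, G))  # 30.0
print(reads_needed(30, L, G))     # 600000000
```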
37. Sequencing Coverage Histograms
• Coverage histograms are commonly used to depict the range and uniformity of sequencing coverage for an
entire data set.
• They illustrate the overall coverage distribution by displaying the number of reference bases that are
covered by mapped sequencing reads at various depths.
• Mapped read depth refers to the total number of bases sequenced and aligned at a given reference base
position (note that "mapped" and "aligned" are used interchangeably in the sequencing community).
• In a sequencing coverage histogram, the read depths are binned and displayed on the x-axis, while the total
numbers of reference bases that occupy each read depth bin are displayed on the y-axis. These can also be
written as percentages of reference bases.
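A coverage histogram of this kind can be computed from per-base mapped read depths as sketched below. The input, a plain list with one depth value per reference base, and the bin width of 10 are assumptions for illustration.

```python
from collections import Counter


def coverage_histogram(depths, bin_width=10):
    """Map the lower edge of each depth bin to the number of reference bases in it."""
    counts = Counter((depth // bin_width) * bin_width for depth in depths)
    return dict(sorted(counts.items()))


def as_percentages(histogram):
    """Express the y-axis as a percentage of reference bases."""
    total = sum(histogram.values())
    return {bin_start: 100 * count / total for bin_start, count in histogram.items()}


depths = [0, 5, 12, 18, 25, 27, 29, 31, 33, 41]  # one value per reference base
hist = coverage_histogram(depths)
print(hist)                  # {0: 2, 10: 2, 20: 3, 30: 2, 40: 1}
print(as_percentages(hist))  # {0: 20.0, 10: 20.0, 20: 30.0, 30: 20.0, 40: 10.0}
```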
38. Evaluating NGS Coverage
• Inter-Quartile Range (IQR): The IQR is the difference in sequencing
coverage between the 75th and 25th percentiles of the histogram. This value
is a measure of statistical variability, reflecting the non-uniformity of
coverage across the entire data set. A high IQR indicates high variation in
coverage across the genome, while a low IQR reflects more uniform
sequence coverage. In the shown histograms, the lower IQR indicates that
the histogram on the left has better sequencing coverage uniformity than that
on the right.
• Mean (Mapped) Read Depth: The mean mapped read depth (or mean read
depth) is the sum of the mapped read depths at each reference base position,
divided by the number of known bases in the reference. The mean read depth
metric indicates how many reads, on average, are likely to be aligned at a
given reference base position.
• Raw Read Depth: This is the total amount of sequence data produced by the
instrument (pre-alignment), divided by the reference genome size. Although
raw read depth is often provided by sequencing instrument vendors as a
specification, it does not take into account the efficiency of the alignment
process. If a large fraction of the raw sequencing reads are discarded during
the alignment process, the post-alignment mapped read depth can be
significantly smaller than the raw read depth.
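The three metrics can be computed as in the sketch below, again assuming a list with one mapped depth per reference base. The two example data sets are invented so that they share the same mean depth but differ in uniformity. Quartiles come from Python's `statistics.quantiles` with its default method, so exact IQR values may differ slightly from other tools.

```python
import statistics


def mean_read_depth(depths):
    """Sum of mapped read depths divided by the number of reference bases."""
    return sum(depths) / len(depths)


def coverage_iqr(depths):
    """Difference between the 75th and 25th percentiles of the depths."""
    q1, _, q3 = statistics.quantiles(depths, n=4)
    return q3 - q1


def raw_read_depth(total_bases_sequenced, reference_length):
    """Pre-alignment sequence output divided by the reference genome size."""
    return total_bases_sequenced / reference_length


uniform = [28, 29, 30, 30, 30, 31, 32, 30]
uneven = [2, 5, 30, 60, 10, 80, 45, 8]

print(mean_read_depth(uniform), mean_read_depth(uneven))  # 30.0 30.0
print(coverage_iqr(uniform) < coverage_iqr(uneven))       # True
```

Both data sets have the same mean depth, so only the IQR reveals that the second one is covered far less uniformly.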
39. References:
• Coming of age: ten years of next-generation sequencing technologies. Sara Goodwin, John D.
McPherson and W. Richard McCombie. Nature Reviews: Genetics. doi:10.1038/nrg.2016.49
• Illumina: https://www.illumina.com
• ecSeq Bioinformatics: https://www.ecseq.com
• Wikipedia: https://en.wikipedia.org/
• Columbia Genome Centre: https://systemsbiology.columbia.edu
40. Next Generation Sequencing Glossary can be found at:
http://sabiosciences.com/NGS_Glossary.php
http://deeptools.readthedocs.io/en/latest/content/help_glossary.html
https://www.nextgenerationsequencing.info/ngs-introduction/ngs-glossary
http://www.nslc.wustl.edu/elgin/genomics/tour_nextgen/glossary.pdf