A presentation of the major next-generation sequencing technologies. Price indications were valid in the first quarter of 2009. Mac Keynote file, half French, half English. Hope this can help.
3. Basics
1 kilobase (1 kb) = 1,000 bases
1 megabase (1 Mb) = 1,000,000 bases (1 million)
1 gigabase (1 Gb) = 1,000 Mb (1 billion bases)
Viruses: 3,500 to 8 x 10^5 bases
Bacteria: more than 1 Mb (Escherichia coli = 4.7 Mb)
Eukaryotes: from 10 to 3 x 10^5 Mb
yeast = 13 Mb
drosophila = 165 Mb
Homo sapiens: 3,400 Mb (about 3 Gb)
20,000-25,000 genes
Transcriptome = 2% of the genome
4. Before: enzymatic sequencing
= SANGER Sequencing
Single-stranded DNA + DNA polymerase
Addition of a small quantity of a dideoxynucleotide (ddNTP)
4 reactions in parallel for the 4 bases, each with a different dideoxynucleotide
Synthesis stops at each incorporation of a dideoxynucleotide
Statistically, there are as many aborted fragments as there are occurrences of the base
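The chain-termination principle can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a lab protocol: the function names, the termination probability `p_stop`, the number of molecules and the example strand are all hypothetical values chosen for illustration.

```python
import random

def sanger_fragments(new_strand, base, p_stop=0.05, n_molecules=20000, seed=0):
    """Simulate one of the four reactions (one ddNTP, here `base`).

    `new_strand` is the strand being synthesized, written 5'->3'.
    Each time `base` has to be incorporated, the dideoxy version is
    used with probability `p_stop` (assumed value), which aborts synthesis.
    Returns the set of aborted fragment lengths.
    """
    rng = random.Random(seed)
    lengths = set()
    for _ in range(n_molecules):
        for pos, nt in enumerate(new_strand, start=1):
            if nt == base and rng.random() < p_stop:
                lengths.add(pos)   # fragment ends on the dideoxy base
                break
    return lengths

def read_sequence(new_strand):
    """Run the four reactions and read the fragments from shortest to longest."""
    ladder = {}
    for base in "ACGT":
        for length in sanger_fragments(new_strand, base):
            ladder[length] = base
    return "".join(ladder[k] for k in sorted(ladder))

strand = "GATTACAGGCTTAACG"   # hypothetical example strand
print(read_sequence(strand))
```

Because each reaction only stops at one kind of base, sorting all aborted fragments by length gives the order of the bases.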
10. Estimate of the cost of sequencing (CEQ)
Sequencing cost: 3+1+(0.4+4.5+0.4)x2 = €14.6 per double-stranded (ds) sequence of 700 b
CEQ 8 capillaries: 33,000 b ds/24 h (48x2x700 b)
Sequencing cost for 33,000 b ds: €688
Sequencing cost for 1 Mb ds: €20,848
Bioinformatics, confirmation:
5 min/1,000 b, 7 hrs/33,000 b
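The cost figures above can be reproduced with a few lines of arithmetic. This is a sketch only: the slide does not say what each term of the per-sequence sum stands for, so the terms are kept as unnamed numbers, and the constant and function names are hypothetical.

```python
# Figures taken from the slide; the meaning of each term is not given there.
COST_PER_DS_READ = 3 + 1 + (0.4 + 4.5 + 0.4) * 2   # EUR per 700 b double-stranded sequence
READ_LENGTH = 700        # bases per sequence
DAILY_OUTPUT = 33_000    # double-stranded bases per 24 h (8-capillary CEQ)

def cost_for(bases_ds):
    """Cost in EUR of sequencing `bases_ds` double-stranded bases."""
    return bases_ds / READ_LENGTH * COST_PER_DS_READ

daily_cost = round(cost_for(DAILY_OUTPUT))
# The 1 Mb figure is scaled from the rounded daily cost.
cost_per_mb = daily_cost * 1_000_000 / DAILY_OUTPUT

print(f"{COST_PER_DS_READ:.1f} EUR per 700 b sequence")
print(f"{daily_cost} EUR per {DAILY_OUTPUT} b")
print(f"{int(cost_per_mb)} EUR per Mb")
```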
11. Next Generation Sequencers
NextGen Sequencers - NextGen Sequencing (NGS)
Whole Genome Sequencer - Whole Genome Sequencing (WGS)
Roche GS-FLXti
0.4 Gb/run
1m reads @ 400b
€5990/run
€14.97/Mb
€500k/inst.
Illumina GA2
5-10 Gb/run
60m reads @ 50b
$8250 (€6180)/run (5Gb)
$0.33 (€0.25)/Mb
$460k (€344k)/inst.
AB Solid 3.0
10-20 Gb/run
100m reads @ 50b
€5300/run 5+5Gb
€0.53/Mb
€462k/inst.
The competition:
Helicos Biosciences, Pacific Biosciences, George Church Lab.,
Nanopore sequencing, ZS-Genetics, Sequencing by TEM...
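The per-megabase prices are the run price divided by the run yield. The sketch below (function name hypothetical) reproduces the Roche and SOLiD figures from the numbers on the slide. The same division for the Illumina row (€6180 for 5 Gb) gives about €1.24/Mb rather than the €0.25 shown, so that figure presumably rests on a different price or yield basis.

```python
def cost_per_mb(run_price_eur, yield_mb):
    """Run price divided by run yield, in EUR per megabase."""
    return run_price_eur / yield_mb

roche = cost_per_mb(5990, 400)        # 0.4 Gb/run
solid = cost_per_mb(5300, 10_000)     # 5+5 Gb/run
illumina = cost_per_mb(6180, 5_000)   # 5 Gb/run

for name, value in [("Roche GS-FLXti", roche),
                    ("AB Solid 3.0", solid),
                    ("Illumina GA2", illumina)]:
    print(f"{name}: {value:.3f} EUR/Mb")
```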
12. Other Players
George Church Lab. + Danaher Motion: Polonator G.007
The Polonator G.007 is the first "open source" gene sequencing instrument to hit the lab market in which the instrument's software (Web ware) and specifications are freely available to the public. At $150,000, the Polonator is the cheapest instrument on the market.
Helicos BioSciences Corp.: HeliScope SMS
The HeliScope™ Single Molecule Sequencer is the first genetic analyzer to harness the power of direct DNA measurement, enabled by Helicos True Single Molecule Sequencing (tSMS)™ technology.
ZS-Genetics: Electron Microscopy Sequencing. By the first half of 2009, the system is expected to read a complete haploid human genome in approximately 8 days, with 4X coverage, at a cost in the tens of thousands of dollars.
Pacific BioSciences published technology for Single Molecule Realtime Sequencing (SMRT). Instrument by 2010.
Moebius Biosystems: Nexus. Over 6 Gigabases in 24hrs.
Nanopore sequencing: Oxford Nanopore, Sequenom, etc.
14. • GS-FLXti Data
Process Steps: 1. DNA library preparation
Timeline: DNA Library Preparation and Titration (4.5 h and 10.5 h), emPCR (8 h), Sequencing (10 h)
Genome fragmented by nebulization
No cloning; no colony picking
sstDNA library created with adapters
A/B fragments selected using avidin-biotin purification
gDNA -> sstDNA library
15. • GS-FLXti Data
Process Steps: 2. emulsion PCR
Timeline: DNA Library Preparation and Titration (4.5 h and 10.5 h), emPCR (8 h), Sequencing (10 h)
Anneal sstDNA to an excess of DNA capture beads
Emulsify beads and PCR reagents in water-in-oil microreactors
Clonal amplification occurs inside microreactors
Break microreactors, enrich for DNA-positive beads
sstDNA library -> Clonally-amplified sstDNA attached to bead
16. Process Steps: 3. Sequencing with the PicoTiterPlate device
•Multiple optical fibers are fused to form an optical array.
•Proprietary etching method produces wells that serve as picoliter reaction vessels.
•Each well is only able to accept a single DNA bead.
•Reactions in the wells are measured by the CCD camera.
•Titanium plate: 3.4m wells
Load genome into PicoTiterPlate device
Load PicoTiterPlate device on instrument
Load sequencing reagents
Close and Press GO! – sequence genome
22. • GS-FLXti Data
Process Steps: 3. Sequencing
Timeline: DNA Library Preparation and Titration (4.5 h and 10.5 h), emPCR (8 h), Sequencing (10 h)
3.4m wells
3.4m reads obtained in parallel
A single clonally amplified sstDNA bead is deposited per well.
DNA capture bead containing millions of copies of a single clonal fragment
4 bases (TACG) cycled 200 times
Chemiluminescent signal generation
Signal processing to determine base sequence and quality score
Amplified sstDNA library beads -> Quality filtered bases
23. Process Steps: 4. Signal-processing
•Raw data is processed from a series of individual images.
•Each well's data is extracted, quantified, and normalized.
•Read data is converted into flowgrams.
Figure: signal output from a single well (flowgram), shown in the metric and image viewing software.
24. Process Steps: 4. Signal-processing
Key sequence = TCAG for identifying wells and calibration
The individual bases (TCAG) are flowed 42 times.
Figure: flowgram for the read TTCTGCGAA (axes: base flow, signal strength).
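Turning a flowgram into bases can be sketched as follows. This is a simplified illustration, not the instrument's actual signal-processing software: it assumes the signals are already normalized so that a value near 1.0 means one incorporated base, it uses the TACG flow order quoted on the sequencing slide as a default, and the function name and signal values are made up for the example.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Convert normalized flow signals into a base sequence.

    Each flow offers one nucleotide to the well. The signal strength is
    roughly proportional to the number of bases incorporated in that
    flow, so it is rounded to the nearest whole number (0 = no base,
    2 = a run of two identical bases, and so on).
    """
    bases = []
    for i, signal in enumerate(signals):
        count = max(0, int(signal + 0.5))           # nearest whole number
        bases.append(flow_order[i % len(flow_order)] * count)
    return "".join(bases)

# Hypothetical normalized signals for four rounds of T, A, C, G flows.
signals = [2.05, 0.08, 0.97, 0.03,
           1.02, 0.10, 0.05, 0.95,
           0.04, 0.06, 1.10, 0.92,
           0.02, 1.96, 0.07, 0.01]
read = decode_flowgram(signals)
print(read)
```

Rounding is the weak point of this scheme: long runs of the same base give signals that are harder to tell apart, which is why a quality score is attached to each call.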
25. • GS-FLXti Data
Process Steps: 5. Data output
• Quality filtered bases
400-500 bp average read length
> 0.4 Gb or 1m reads with a 70 x 75 mm FLXti PicoTiterPlate device
10 hours run time
• Phred-like quality score for use in available assemblers or viewers
• Consensus base-called contig files - FASTA file of assembled reads
mapping against known scaffold (resequencing)
de novo assembly of individual bases in consensus contigs
• Viewer-ready genome file - assembly file in .ace format
• Assembly metric files
• Run-time metrics files - summarize important information pertaining to sequencing quality for each run
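A Phred quality score Q encodes the estimated probability P that a base call is wrong, with Q = -10 x log10(P). The sketch below shows the standard conversion in both directions. The function names are hypothetical, and how closely the instrument's "Phred-like" score follows this definition is not stated on the slide.

```python
import math

def phred_to_error_prob(q):
    """Error probability for a Phred score, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred score for an error probability, Q = -10 * log10(P)."""
    return -10 * math.log10(p)

for q in (10, 20, 30, 40):
    print(f"Q{q}: 1 error in {round(1 / phred_to_error_prob(q))} bases")
```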
27. • GS-FLXti Data
Technology Comparison: Sanger vs. 454 technology for a 2-million-base genome
Sanger: Weeks
454: 4 days
Sanger Technology
Preparation*: 7 days
- DNA Library Preparation
- Cloning
- Template Preparation
Total Sequencing Time: Weeks
- 180 runs (1 per 4 hours)
- 2-million-base (Mb) genome
- 6x coverage
454 Technology
Preparation: 2.5 days
- DNA Library Preparation
- Titration of Library Beads
- emPCR
Total Sequencing Time: 1 day
- 1 run (10 hours)
- 400-600 million-base (Mb)
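The sequencing times in this comparison follow from simple arithmetic on the figures above. A rough sketch, using only numbers from the slide (variable names are hypothetical):

```python
GENOME_MB = 2                      # genome size used in the comparison

# Sanger: 180 runs, one every 4 hours
sanger_runs = 180
sanger_days = sanger_runs * 4 / 24

# 6x coverage of a 2 Mb genome means 12 Mb of raw sequence in total
sanger_total_mb = GENOME_MB * 6
sanger_mb_per_run = sanger_total_mb / sanger_runs

# 454: a single 10-hour run yielding 400-600 Mb
coverage_454_low = 400 / GENOME_MB
coverage_454_high = 600 / GENOME_MB

print(f"Sanger: {sanger_days:.0f} days of sequencing, "
      f"{sanger_mb_per_run * 1000:.0f} kb per run")
print(f"454: {coverage_454_low:.0f}x to {coverage_454_high:.0f}x coverage in one run")
```

The single 454 run therefore produces far more coverage than the 6x targeted with Sanger, which leaves room for larger genomes or several samples per run.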
28. NextGen Sequencers - Roche GS-FLX: Workflow
Workflow: 3-4 days (setup) + 1 day (run)
1. Generation of a single-stranded template DNA library (~8-16 hours)
2. Emulsion-based clonal amplification of the library (~8 hours)
3. Data generation via sequencing-by-synthesis (9 hours)
4. Image and Base calling analysis (~8 hours)
5. Data analysis using different bioinformatics tools
IT steps:
GS-FLX Software
▪GS Reference Mapper
▪GS De Novo Assembler
▪GS Amplicon Variant Analyzer
Third Party Software
•Long Single Reads / Standard Shotgun (required input = 3–5μg, 5μg recommended)
~1,000,000 single reads with an average read length of 400 bases
•Paired End Reads (required input = 5μg @25 ng/μl or above, in TE; >10kb)
◦3K Long-Tag Paired End Reads. Sequence 100 bases from each end of a 3,000 base span
on a single sequence read (Figure). Co-assemble GS FLX Titanium shotgun reads with 3K
Long-Tag Paired Ends reads from Standard series runs.
•Sequence Capture (required input = 3–5μg)
◦Roche NimbleGen Sequence Capture using a single microarray hybridization-based
enrichment process.
•Amplicon Sequencing (1-5ng or 10-50ng)
◦The DNA-sample preparation for Amplicon Sequencing with the GS FLX System consists of a
simple PCR amplification reaction with special Fusion Primers. The Fusion Primer consists of a
20-25 bp target-specific sequence (3' end) and a 19 bp fixed sequence (Primer A or Primer B
on the 5' end).
30. NextGen Sequencers - Roche GS-FLX: add-ons not included
- Nebulizers + nitrogen tank
Nebulization is required to shear fragments for DNA >70-800bp
- emPCR Breaking Kit
This device is required for the preparation of consistently sized reactors
for emulsion PCR.
- Magnetic Concentrator IVGN +€5,000
- MT plate centrifuge BCI +€15,000
- Multisizer™ 3 COULTER counter +€15,000
The most versatile and accurate particle sizing and counting analyzer
available today. Using The Coulter Principle, also known as ESZ (Electrical
Sensing Zone Method), the Multisizer 3 COULTER COUNTER provides
number, volume, mass and surface area size distributions in one
measurement, with an overall sizing range of 0.4 µm to 1,200 µm
- Agilent BioAnalyzer +€20,000
- Titanium cluster station +€29,000
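Adding up the priced add-ons gives an idea of the real cost of a working installation. A small sketch, using only the prices listed above (the nebulizers and the emPCR Breaking Kit have no price on the slide and are left out; the dictionary and variable names are hypothetical):

```python
# Prices in EUR, as listed on the slide.
addons_eur = {
    "Magnetic Concentrator": 5_000,
    "MT plate centrifuge": 15_000,
    "Multisizer 3 COULTER counter": 15_000,
    "Agilent BioAnalyzer": 20_000,
    "Titanium cluster station": 29_000,
}

addons_total = sum(addons_eur.values())
instrument_eur = 500_000            # instrument price quoted earlier in the deck
installed_total = instrument_eur + addons_total

print(f"Add-ons: {addons_total} EUR")
print(f"Instrument + add-ons: {installed_total} EUR")
```

The result is close to the total instrument figure of about €585k quoted for the Roche system elsewhere in the deck.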
31. Next Generation Sequencers: The Roche System
Roche FLXti:
0.5 Gb/run
1m reads @ 400b
€5990/run
€14.97/Mb
€585k/inst. tot
Setup time: 3-4 d
Run time: 10 hrs
images: 27 GB
Primary Analysis: 15 GB
PA CPU time: 80-220 hrs
(6-7 hrs with cluster st)
Final file size: 4 GB
notes:
400-500b frag. length sequencing
future dev. up to 1000b
x coverage with long frag. vs x+n coverage with short reads vs cost/Mb
Multiplexing capacity
10 systems in France
≈200 publications
33. Illumina's Solexa Sequencing Technology
Step 1: Sample Preparation
The DNA sample of interest is sheared to appropriate size (average
~800bp) using a compressed air device known as a nebulizer. The
ends of the DNA are polished, and two unique adapters are ligated
to the fragments. Ligated fragments of the size range of 150-200bp
are isolated via gel extraction and amplified using limited cycles of
PCR. 1.5 days.
Steps 2-6: Cluster Generation by Bridge
Amplification
In contrast to the 454 and ABI methods which use a bead-based
emulsion PCR to generate "polonies", Illumina utilizes a unique
"bridged" amplification reaction that occurs on the surface of the
flow cell.
The flow cell surface is coated with single stranded oligonucleotides
that correspond to the sequences of the adapters ligated during the
sample preparation stage. Single-stranded, adapter-ligated
fragments are bound to the surface of the flow cell and exposed to
reagents for polymerase-based extension. Priming occurs as the
free/distal end of a ligated fragment "bridges" to a complementary
oligo on the surface.
Repeated denaturation and extension results in localized
amplification of single molecules in millions of unique locations
across the flow cell surface. This process occurs in what is referred
to as Illumina's "cluster station", an automated flow cell processor.
8 hrs.
39. Illumina's Solexa Sequencing Technology
Steps 7-12: Sequencing by Synthesis
A flow cell containing millions of unique clusters is now loaded into
the 1G sequencer for automated cycles of extension and imaging.
The first cycle of sequencing consists first of the incorporation of a
single fluorescent nucleotide, followed by high resolution imaging of
the entire flow cell. These images represent the data collected for
the first base. Any signal above background identifies the physical
location of a cluster (or polony), and the fluorescent emission
identifies which of the four bases was incorporated at that position.
This cycle is repeated, one base at a time, generating a series of
images each representing a single base extension at a specific
cluster. Base calls are derived with an algorithm that identifies the
emission color over time. At this time reports of useful Illumina
reads range from 26-50 bases.
The use of physical location to identify unique reads is a critical
concept for all next generation sequencing systems. The density of
the reads and the ability to image them without interfering noise is
vital to the throughput of a given instrument. Each platform has its
own unique issues that determine this number: 454 is limited by the
number of wells in its PicoTiterPlate, Illumina is limited by the
fragment length that can effectively "bridge", and all providers are
limited by flow cell real estate. 2-6 days (18-36 cycles).
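The base-calling idea described above (one image per cycle, and the emission color at a cluster's location gives the base) can be sketched as follows. This is a toy illustration, not Illumina's pipeline: the channel order, the background threshold and the intensity values are assumptions made for the example.

```python
CHANNELS = "ACGT"   # assumed order of the four fluorescence channels

def call_bases(intensities, background=0.2):
    """Call one base per cycle for a single cluster.

    `intensities` holds one 4-tuple per sequencing cycle, giving the
    fluorescence measured at the cluster's position in each channel.
    The brightest channel gives the base; if nothing rises above the
    background threshold, the position is called "N".
    """
    read = []
    for cycle in intensities:
        best = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        read.append(CHANNELS[best] if cycle[best] > background else "N")
    return "".join(read)

# Hypothetical intensities for one cluster over four cycles.
cluster = [
    (0.05, 0.90, 0.10, 0.08),
    (0.85, 0.10, 0.07, 0.12),
    (0.10, 0.10, 0.12, 0.09),   # no channel above background
    (0.06, 0.04, 0.10, 0.95),
]
print(call_bases(cluster))
```

The read is tied to the cluster's fixed position on the flow cell, which is why cluster density and image noise limit throughput.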
45. Pipeline software highlights
Automated image calibration: maximizes the number of clusters used to generate sequence data
Accurate cluster intensity scoring algorithms: allow efficient filtering for high-quality reads
Quality-calibrated base calls: minimize the propagation of downstream sequencing errors
Highly optimized genomic alignment tools: minimize the need for elaborate computer
infrastructures
Open source code: enables researchers to customize the software to meet their needs
46. Sanger: Weeks
Illumina: <7 days
Technology Comparison
Sanger vs. Solexa technology
for a 2-Gigabase genome
48. NextGen Sequencers: Illumina GA2 Workflow
Workflow: 2-3 days (setup) + 2-3 days (run)
1. Non amplified DNA/RNA Sample
2. QC and possibly purify
3. Process with appropriate Sample Prep Kit
4. QC sample prep
5. Assemble 7 samples with the same number of cycles, library types, and sample types
6. Process grouped samples with appropriate Cluster Generation Kit
7. Run cluster generation
8. Transfer flow cell onto Genome Analyzer
9. Run sequencing 1st cycle
10. QC 1st cycle
11. Run remaining cycles
12. Export data
13. Run analysis
Tracking
▪ Samples ready for sample prep
▪ Samples ready for cluster prep
▪ Flow cells ready for sequencing
DAS2 server
▪ Serve analysis files to DAS2 enabled genome browsers for direct visualization of results without file download
▪ Private server up and going using Authentication
Mapping application (to handle 5-100 million 15-50bp sequences)
▪ Filter sequences by quality score
▪ Count and remove identical sequences
▪ Map sequences to reference genome
Filter application
▪ Take binary map files and filter based on type of alignment and # of counts
▪ Export filtered universal binary for downstream applications
Distributed Annotation System (DAS) defines a communication protocol used to exchange biological annotations
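The first two steps of the mapping application (filter sequences by quality score, then count and remove identical sequences) can be sketched roughly as below. This is only an illustration, not the Illumina pipeline code: the function name, the input format (a sequence plus a list of per-base quality scores) and the mean-quality threshold of 20 are all assumptions.

```python
from collections import Counter


def filter_and_collapse(reads, min_mean_quality=20):
    """Keep reads whose mean quality passes the threshold, then collapse duplicates.

    reads: iterable of (sequence, per_base_qualities) tuples (hypothetical format).
    Returns a Counter mapping each distinct sequence to its number of occurrences.
    """
    kept = [
        seq
        for seq, quals in reads
        if quals and sum(quals) / len(quals) >= min_mean_quality
    ]
    # Counter both removes identical sequences and keeps their counts.
    return Counter(kept)


reads = [
    ("ACGT", [30, 30, 30, 30]),
    ("ACGT", [25, 25, 25, 25]),
    ("TTTT", [5, 5, 5, 5]),      # dropped: mean quality below threshold
    ("GGCC", [40, 40, 40, 40]),
]
unique_reads = filter_and_collapse(reads)
```

Only the distinct sequences in `unique_reads` would then need to be mapped to the reference genome, with the counts carried along for the filter application.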
49. NextGen Sequencers: Illumina GA2 add-ons
Not included:
- Cluster Station +$50.000
The Cluster Station is a standalone, software-
controlled system for the automated generation
of clonal clusters from single molecule fragments
on Illumina Genome Analyzer flow cells.
- Paired-End Module +$45.000
The Paired-End Module provides fully automated
template preparation for the second round of
sequencing in a paired-end sequencing run.
- IPAR +$60.000
IPAR is a bundled hardware and software solution
that provides real-time quality control and
integrated online processing of primary data
during sequencing runs
- Agilent BioAnalyzer +€20.000
Total: €136.000
50. Next Generation Sequencers: The Illumina System
Illumina GA2:
5-10 Gb/run (50b)
$8250 (€6180)/run (5Gb)
$0,33/Mb
€480k/inst. tot
Illumina GA2:
Setup time: 2-3 d
6-11 Gb/run
Run time: 3-6 d
images: 900 GB
Primary Analysis: 350 GB
PA CPU time: 100 hrs
Final File Size: 75 GB
notes:
7/15 Gb by end of 2009
72 frag. length
9 systems in France
325 publications
Multiplexing capacity
52. SOLID in details
SOLiD v2 instrument components
The SOLiD™ Instrument consists of the following components:
• Reagent delivery system
• Electronics
• Camera (4 megapixel)
• Monitor stand
• Independently controlled dual flow cells
• Liquid waste container
SOLiD v2 computer system
instrument controller
• Hardware: Intel® Xeon® processors
• Operating system: Microsoft® Windows® XP Pro
• Installed RAM: 4 GB
• Hard disk storage: dual 80 GB SATA hard drives (RAID-1)
head node
• Hardware: Intel® Xeon® Dual Core processors (2)
• Operating system: 64-bit LINUX
• Installed RAM: 8 GB
• Hard disk storage: dual 750 GB SATA hard drives (RAID-1)
compute nodes (each)
• Hardware: Intel® Xeon® Dual Core processors (2)
• Operating System: 64-bit LINUX
• Installed RAM: 8 GB
• Hard disk storage: 80 GB SATA hard drives
storage
• Hard disk storage: 15x 750 GB SATA hard drives
• Operating system: 64-bit LINUX
• RAID-5 w/ hot spare
53. Figure 1. Library generation schematic.
Sequencing on the SOLiD machine starts with library preparation. In the simplest
fragment library, two different adapters are ligated to sheared genomic DNA (left
panel of Fig. 1). If more rigorous structural analysis is desired, a “mate-pair”
library can be generated in a similar fashion, by incorporating a circularization/
cleavage step prior to adapter ligation (right panel of Fig.1).
ABI's SOLID Sequencing
Technology
54. Figure 2. Clonal bead library generation via emulsion PCR.
Once the adapters are ligated to the library, emulsion PCR is conducted using the
common primers to generate “bead clones” which each contain a single nucleic
acid species.
ABI's SOLID Sequencing
Technology
55. Figure 3. Depositing beads into flow cell via end modifications.
Each bead is then attached to the surface of a flow cell via 3’ modifications to the
DNA strands.
At this point, we have a flow cell (basically a microscope slide that can be serially
exposed to any liquids desired) whose surface is coated with thousands of beads
each containing a single genomic DNA species, with unique adapters on either
end.
Each microbead can be considered a separate sequencing reaction which is
monitored simultaneously via sequential digital imaging. Up to this point all
next-gen sequencing technologies are very similar; this is where ABI/SOLiD
diverges dramatically (see next).
ABI's SOLID Sequencing
Technology
56. ABI's SOLID Sequencing Technology
Figure 4. Schematic of ABI SOLiD sequencing chemistry.
The actual base detection is no longer done by the polymerase-driven incorporation of labeled dideoxy terminators. Instead, SOLiD uses a mixture of labeled oligonucleotides and queries the input strand with ligase. Understanding the labeled oligo mixture is key to understanding SOLiD technology.
Each oligo has degenerate positions at 3’ bases 1-3 (N’s), and one of 16 specific dinucleotides at positions 4-5. Positions 6 through the 5’ end are also degenerate, and hold one of four fluorescent dyes. The sequencing involves:
1. Hybridization and ligation of a specific oligo whose 4th & 5th bases match that of the template
2. Detection of the specific fluor
3. Cleavage of all bases to the 5’ of base 5
4. Repeat, this time querying the 9th & 10th bases
5. After 5-7 cycles of this, perform a “reset”, in which the initial primer and all ligated portions are melted from the template and discarded.
6. Next a new initial primer is used that is N-1 in length. Repeating the initial cycling (steps 1-4) now generates an overlapping data set (bases 3/4, 8/9, etc, see Fig 5).
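The positions queried by the ligation cycles and primer resets described above can be tabulated with a short sketch. It assumes 7 ligation cycles per primer and 5 primer rounds (the slide gives ranges of 5-7 and 4-5), and uses the slide's numbering, in which the first primer queries bases 4/5 and each shorter primer shifts the window by one base; the function name is hypothetical.

```python
from collections import Counter


def queried_pairs(n_ligations=7, n_primer_rounds=5):
    """Return, per primer round, the (base, base + 1) pairs read by each ligation."""
    rounds = []
    for offset in range(n_primer_rounds):      # primer N, N-1, N-2, ...
        pairs = [
            (4 - offset + 5 * k, 5 - offset + 5 * k)   # each ligation advances 5 bases
            for k in range(n_ligations)
        ]
        rounds.append(pairs)
    return rounds


rounds = queried_pairs()
# Count how many times each base position is interrogated across all rounds.
coverage = Counter(pos for pairs in rounds for pair in pairs for pos in pair)
twice_covered = sorted(p for p, n in coverage.items() if n == 2)
```

Under these assumptions the first round reads bases 4/5, 9/10, ..., the second reads 3/4, 8/9, ..., and positions 1 through 34 each end up queried exactly twice, which is in line with the roughly 35 contiguous bases quoted for the platform.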
58. ABI's SOLID Sequencing Technology
Figure 5. Sequencing coverage during SOLiD sequencing cycles
Thus, 5-7 ligation reactions followed by 4-5 primer reset cycles are repeated, generating sequence data for ~35 contiguous bases, in which each base has been queried by two different oligonucleotides.
If you’re doing the math you’ve realized there are 16 possible dinucleotides (4^2) and only 4 dyes. So data from a single color does not tell you what base is at a given position. This is where the brilliance (and potential confusion) comes about with regard to SOLiD. There are 4 oligos for every dye, meaning there are four dinucleotides that are encoded by each dye.
For example (see Fig. 4), the dinucleotides CA, AC, TG, and GT are all encoded by the green dye. Because each base is queried twice it is possible, using the two colors, to determine which bases were at which positions.
This two color query system (known as “color space” in ABI-speak) has some interesting consequences with regard to the identification of errors.
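A minimal sketch of the two-base ("color space") encoding may help here. It assumes the usual convention in which each base gets a 2-bit code (A=0, C=1, G=2, T=3) and the color of a dinucleotide is the XOR of the two codes; with that convention CA, AC, TG and GT all give color 1, matching the green-dye example. The numeric color labels and function names are illustrative, not ABI's software.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}


def encode(seq):
    """Translate a base sequence into colors, one per overlapping dinucleotide."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Rebuild the bases from a known first base and the color calls."""
    bases = [first_base]
    for color in colors:
        # XOR is its own inverse: previous base code ^ color = next base code.
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)


reference = encode("ATGGCA")
variant = encode("ATGACA")      # single base change G -> A at the 4th base
changed = [i for i, (r, v) in enumerate(zip(reference, variant)) if r != v]
```

A single color is ambiguous (four dinucleotides share it), but a known starting base makes the whole read decodable. In this sketch a true single-base change alters two adjacent colors (`changed` holds two neighbouring indices), whereas an isolated color change is more likely a measurement error, which is one reason the scheme is useful for error identification.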
60. NextGen Sequencers: AB Solid 3.0 Workflow
Workflow: 3-4 days (setup) + 4-10 days (run)
64. NextGen Sequencers: AB Solid 3.0 add-ons
included:
Covaris S2 System
The Covaris™ S2 System is a required sample preparation instrument for use in the SOLiD™ System workflow. The instrument is an essential part of the emulsion PCR process used to prepare the beads for emulsion PCR. The Covaris System is also used to shear DNA into 60 bp fragments for fragment library preparation.
ULTRA-TURRAX Tube Drive from IKA
This device is required for the preparation of consistently sized reactors for emulsion PCR.
Hydroshear from Genomic Solutions
The Hydroshear® from Genomic Solutions® is a reproducible and controllable method for generating random DNA fragments of specific sizes. Use this to prepare mate-paired libraries for the SOLiD™ System.
not included - Agilent BioAnalyzer +€20.000
65. Next Generation Sequencers: The SOLID System
AB Solid 3.0
10-20 Gb/run
100m reads @ 50b
€5300/run 5+5Gb
€0,53/Mb
€482k/inst. tot
AB Solid 3.0:
Setup time: 3-5 d
5-12.5 Gb/run/slide
Run time: 3.5-10 d
images: 2.5 TB
Primary Analysis: 750 GB
PA CPU time: in run time
Final file size: 140 GB
notes:
The Scientist Top Innovation of 2008
125-400m reads in 2009
30/40Gb
potential for 12x human genome @ $10.000
3 systems in France
Multiplexing capacity
66. Roche GS-FLXti:
0.4 Gb/run
1m reads @ 400b
€5990/run
€14.97/Mb
€585k/inst. tot
Setup time: 3-4 d
Run time: 10 hrs
images: 27 GB
Primary Analysis: 15 GB
PA CPU time: 220 hrs
Final file size: 4 GB
Illumina GA2:
5-10 Gb/run (50b)
6-11 Gb/run
€6180/run (5Gb)
€0,25/Mb
€480k/inst. tot
Setup time: 2-3 d
Run time: 3-6 d
images: 900 GB
Primary Analysis: 350 GB
PA CPU time: 100 hrs
Final File Size: 75 GB
AB Solid 3.0:
10-20 Gb/run
100m reads @ 50b
5-12.5 Gb/run/slide
€5300/run 5+5Gb
€0,53/Mb
€482k/inst. tot
Setup time: 3-5 d
Run time: 3.5-10 d
images: 2.5 TB
Primary Analysis: 750 GB
PA CPU time: in run time
Final file size: 140 GB
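The per-megabase prices can be checked by dividing the run cost by the run yield. The sketch below does this for the two platforms whose quoted figures follow directly from that division (Roche at 0.4 Gb/run, AB Solid at 5+5 Gb/run); the function name is hypothetical, and the Illumina figure is left out because it does not follow from the quoted run price and 5 Gb yield by the same simple division.

```python
def cost_per_mb(run_cost_eur, yield_mb):
    """Reagent cost per megabase: cost of one run divided by its yield in Mb."""
    return run_cost_eur / yield_mb


# Roche GS-FLXti: EUR 5990 per run, 0.4 Gb = 400 Mb per run
roche = cost_per_mb(5990, 400)

# AB Solid 3.0: EUR 5300 per run, 5 + 5 Gb = 10 000 Mb per run
solid = cost_per_mb(5300, 10_000)

# How many times more expensive per Mb the long-read platform is.
ratio = roche / solid
```

With these inputs `roche` comes out at about 14.97 and `solid` at 0.53, so the long reads of the Roche platform cost roughly 28 times more per megabase.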
67. Infrastructure Requirements
- Lab space, dedicated rooms
- Hands on IT infrastructure
- Data Storage capacity
- Sample and workflow traceability solutions
- BioIT group support for 3rd party analysis
Roche GS-FLXti: General Laboratory 1, Controlled Room (emPCR), Amplicon Room, General Laboratory 2, BioIT room
Illumina GA2: General Laboratory 1, Cluster Station room, General Laboratory 2, BioIT room
AB Solid 3.0: General Laboratory 1, Controlled Room (emPCR), Amplicon Room, General Laboratory 2, BioIT room
68. NextGen Sequencing Service Providers
Europe
Many locations Cogenics http://www.cogenics.com/sequencing/s...ingService.cfm
Many locations GATC Biotech http://www.gatc-biotech.com/en/index.php
Germany dkfz http://www.dkfz.de/gpcf/ngs_sequencing.html
Switzerland Functional Genomics Center Zurich http://www.fgcz.ethz.ch/applications/gt/ngsequencing
Germany Eurofins MWG Operon http://www.eurofinsdna.com/products-...equencing.html
Hungary BAYGEN http://baygen.hu/
The Netherlands ServiceXS http://www.servicexs.com/servicexs+i...+ii+sequencing
Spain Sistemas Genómicos http://www.sistemasgenomicos.com/
Sweden Uppsala Genome Center http://www.genpat.uu.se/node453
Switzerland Fasteris http://www.fasteris.com/
UK AGOWA - LGC http://www.lgc.co.uk/pdf/Next%20gen%...lyer%20web.p
UK The Gene Pool https://www.wiki.ed.ac.uk/display/GenePool/Home
UK Geneservice http://www.geneservice.co.uk/services/sequencing/
UK University of Liverpool http://www.liv.ac.uk/agf/index.html
Belgium DNAVision (soon available) http://www.dnavision.be/
Service prices (excl. tax) vs. own run cost
GATC
Illumina platform based: 3500 € excl. tax for 1/8 flow cell vs 772 €
Roche platform based: 10.150 € excl. tax for 1/2 picoplate vs 2995 €
Cogenics 10/2008
Roche platform based: 15.000 € excl. tax for 1 full picoplate vs 5990 €
69. Applications
Whole genome sequencing
- de novo sequencing
- comparative seq.
Amplicon seq.
- Mutations / SNP
Transcriptome seq.
- cDNA
- Small RNA
Methylation seq.
Metagenomics
ChIP sequencing
70. Multiple Sample Sequencing
AB: slides with 1, 4 or 8 regions; 16-128 samples/slide with barcoding
Illumina: flow cells with 2, 4 or 8 regions
Flow Cell: 1.4mm wide channel design, 40% more usable area
Roche: plates with 2, 4, 8 or 16 regions
71. Increase Sample Throughput via Multiplex Identifiers
Roche (192)
AB (256)
Illumina (96)
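As a rough illustration of how multiplex identifiers raise sample throughput, the sketch below sorts pooled reads back into their samples by a short barcode. It assumes the identifier sits at the start of each read and must match exactly; the barcodes, sample names and function name are made up for the example, and the real platforms differ in where the tag is read and how mismatches are handled.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by sample using the first barcode_len bases as the identifier."""
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        # Reads whose tag is not a known barcode are set aside, not guessed.
        sample = barcode_to_sample.get(tag, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)


barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}   # hypothetical identifiers
pooled = ["ACGTAAAA", "TGCACCCC", "ACGTGGGG", "NNNNTTTT"]
sorted_reads = demultiplex(pooled, barcodes, barcode_len=4)
```

The number of distinct identifiers (192, 256 or 96 in the figures above) sets the upper limit on how many samples can share one region, plate or flow cell.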