High-throughput sequencing technologies in genome assembly
Hans Jansen
Dutch SME at Bioscience Park in Leiden, the Netherlands
• High-throughput drug screens and toxicity assays in zebrafish larvae
• Fish fertility (eel, pike perch, sole) to aid sustainable aquaculture
• Sequencing (genomes, transcriptomes)
• Bioinformatics
ZF-screens B.V.
Common carp (Cyprinus carpio)
High-throughput screening model
Genome and transcriptomes
European and Japanese eel (Anguilla anguilla and Anguilla japonica)
Completing the life cycle in aquaculture
Genome and transcriptomes
King cobra (Ophiophagus hannah)
Evolution and toxins
Genome and transcriptomes
Some examples of genome projects
Timeline: 1977, 2000, 2011
Chemical cleavage (Maxam and Gilbert); chain termination (Sanger, Nicklen, and Coulson)
Throughput: 5 samples, 1 Kb/day, micrograms of ssDNA needed
Massively parallel signature sequencing (Brenner); SMRT (Pacific Biosciences)
Throughput: 3 × 10^9 samples, 55 Gb/day, single molecule of DNA needed
A brief history of DNA sequencing
A brief history of DNA sequencing
February 1977: Maxam and Gilbert
Chemical cleavage: Modify nucleotides and cut at the modified position.
December 1977: Sanger, Nicklen, and Coulson
Chain termination: Use modified nucleotides to stop the
extension of a newly synthesized DNA strand.
A brief history of DNA sequencing
Maxam and Gilbert sequencing was abandoned relatively soon: it was technically complex and used hazardous chemicals and radioactivity.
The Sanger method was improved over the years and became the method of choice for sequencing the first draft of the human genome:
• Thermostable polymerases removed the need for a single-stranded (ss)DNA template.
• Fluorescent dye terminators allowed all four reactions to be combined in one.
• The separation of the DNA fragments was automated.
Shotgun sequencing was already used by Sanger to sequence lambda DNA, and it proved to be a powerful tool for sequencing and assembling larger DNA molecules and even whole genomes.
A brief history of DNA sequencing
To make assembly easier, partially overlapping BAC clones from the genome were first selected, then sequenced and assembled by the shotgun method.
[Figure: genomic DNA (gDNA) covered by overlapping BAC clones]
This was a laborious method, and later a whole-genome shotgun approach was used instead.
A brief history of DNA sequencing
Genomic DNA
1. Break the DNA into < 1 Kb fragments.
2. Polish the ends of the DNA and adenylate the 3' ends (A overhang).
3. Ligate adapters (with a T overhang) to the ends of the DNA.
4. Amplify the paired end library.
5. Bind the single-stranded library to the flow cell.
6. Attach and cluster the library on a carrier.
7. Sequence the library (2 × 50 bp).
Making a paired end library
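The end product of these steps can be illustrated with a short Python sketch of how one library fragment yields a read pair in a 2 × 50 bp run. It assumes the usual layout in which read 1 starts at one end of the fragment and read 2 at the other end, on the opposite strand; adapters, clustering and sequencing errors are ignored, and the function names are hypothetical.

```python
# Minimal sketch (hypothetical helpers): the two reads of one paired end
# fragment, assuming read 1 comes from the 5' end of the top strand and
# read 2 from the 5' end of the bottom strand.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment: str, read_len: int = 50) -> tuple[str, str]:
    """Return (read1, read2) for a 2 x read_len run on one library fragment."""
    read1 = fragment[:read_len]                      # top strand, 5' -> 3'
    read2 = reverse_complement(fragment)[:read_len]  # bottom strand, 5' -> 3'
    return read1, read2


if __name__ == "__main__":
    r1, r2 = paired_end_reads("AACCGGTTAC", read_len=3)
    print(r1, r2)  # AAC GTA
```

Both reads point inward, towards each other, which is what later allows overlapping pairs to be merged and non-overlapping pairs to be used as distance information.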
1. Generate large fragments by shearing and label their ends with biotin (green dash).
2. Circularize the fragments by self-ligation in a large volume, and shear the circular fragments (black dash).
3. Isolate the biotinylated fragments, convert them to a paired end library, and sequence them (red arrows).
Problems:
• Some of these fragments have ends that cannot be converted.
• Larger fragments self-ligate inefficiently.
• Nicks in the DNA allow circularized molecules to be digested.
These problems limit the library to an insert size of ~10 kb, and such libraries tend to contain few unique fragments.
Obtaining scaffolding information: mate pairs
Generate large fragments by shearing, isolate ~39 kb fragments, and clone them into an adapted fosmid vector that contains EcoP15I sites flanking the insert (purple dash).
Cut with EcoP15I, which cleaves ~26 bp away from its recognition site and so leaves a 26 bp tag of insert sequence at each end; end-repair the fragments and self-ligate them.
PCR-amplify the diTag library from these fragments and sequence the 52 bp inserts.
Problem: these large fragments ligate inefficiently into the fosmid vector, which leads to low-complexity libraries.
Obtaining scaffolding information: Fosmid diTags
Library  Insert        Reads           Gbp   Coverage  Span
PE200    <155 bp       2 × 76 nt       21.9  14.6×
PE280    230–305 bp    2 × 151 nt      11.0  7.3×
PE500    370–485 bp    2 × 50–151 nt   19.3  12.9×     1.2×
MP2K     1.6–2.4 Kbp   2 × 36 nt       5.4             4.5×
MP7K     4–6 Kbp       2 × 51 nt       2.3             0.6×
MP10K    6.5–10 Kbp    2 × 51 nt       5.3             7.7×
MP15K    9–13 Kbp      2 × 51 nt       3.8             8.8×
Total                                  69    34.8×     22.9×
King cobra sequence data
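For the paired end libraries in the table above, the Coverage column matches the yield divided by a genome size of roughly 1.5 Gbp. That genome size is inferred from the table, not stated in the source, so treat it as an assumption in this minimal Python sketch:

```python
# Sequencing coverage = sequenced bases / genome size.
# ASSUMPTION: a genome size of ~1.5 Gbp, inferred from the table, not stated.
GENOME_SIZE_GBP = 1.5

# Yield (Gbp) of the paired end libraries, copied from the table.
PE_YIELD_GBP = {"PE200": 21.9, "PE280": 11.0, "PE500": 19.3}


def coverage(yield_gbp: float, genome_size_gbp: float = GENOME_SIZE_GBP) -> float:
    """Average fold coverage of the genome by the sequenced bases."""
    return yield_gbp / genome_size_gbp


if __name__ == "__main__":
    for library, gbp in PE_YIELD_GBP.items():
        print(f"{library}: {coverage(gbp):.1f}x")  # 14.6x, 7.3x, 12.9x
    total = sum(coverage(g) for g in PE_YIELD_GBP.values())
    print(f"total: {total:.1f}x")  # 34.8x
```

The Span column cannot be reproduced this way: span (physical) coverage counts the whole insert between the two reads, which would require the number of read pairs and their insert sizes.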
Read merging
If the two reads of a paired end fragment overlap, they can be merged into a single longer read.
• We used our own script, since nothing was available at the time.
• Now there are a number of tools: FLASH, SHERA, SeqPrep.
• Paired end libraries need to be prepared with the read length in mind, and size-selected as narrowly as possible.
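The core of read merging is finding the overlap between read 1 and the reverse complement of read 2. The sketch below is a simplified illustration of that idea, not the algorithm of FLASH, SHERA or SeqPrep; the minimum overlap and mismatch threshold are arbitrary assumptions, and base qualities are ignored.

```python
# Minimal read-merging sketch: find the longest acceptable overlap between
# read 1 and the reverse complement of read 2.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge two overlapping paired end reads into one longer read.

    Returns the merged sequence, or None if no acceptable overlap is found.
    Assumes the fragment is longer than each read (no read-through into the
    adapter); inside the overlap the base from read 1 is kept.
    """
    rc2 = reverse_complement(read2)
    longest = min(len(read1), len(rc2))
    # Try the longest overlap first and accept the first one that fits.
    for olen in range(longest, min_overlap - 1, -1):
        tail, head = read1[-olen:], rc2[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olen:
            return read1 + rc2[olen:]
    return None


if __name__ == "__main__":
    fragment = "ATGCGTACCTGAAGTCCATGGCTTAACGGA"  # 30 bp insert
    r1, r2 = fragment[:20], reverse_complement(fragment[-20:])  # 2 x 20 reads
    print(merge_pair(r1, r2) == fragment)  # True
```

Merging only succeeds when the fragment is shorter than the two reads combined minus the minimum overlap, which is why the library has to be size-selected with the read length in mind.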
[Figure: ~600 bp and ~270 bp fragment sizes; % of the assembly vs. fragment size (bp, log scale 10^2 to 10^6) after adding the +500 bp, +2 Kbp, +7 Kbp, +10 Kbp and +15 Kbp libraries]
Assembly (cobra)
Contigs
N50 3982 bp
largest 70 Kbp
number 1186408
Total length 1.45 Gbp
Scaffolds
N50 226 Kbp
largest 2.84 Mbp
number 716551
Total length 1.66 Gbp
number of genes 22183
King cobra sequence assembly
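N50, used for both contigs and scaffolds above, is the length L such that sequences of length L or longer together make up at least half of the assembly. A minimal Python version:

```python
def n50(lengths):
    """N50: the length L such that sequences of length >= L together cover
    at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

For lengths 2 to 10 (total 54), the three longest sequences (10 + 9 + 8 = 27) reach half of the total, so the N50 is 8.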
Genome Res. 2007 17: 240-248
This is a method to sequence a small part of a genome, and to do this for multiple siblings.
From the sequence data, SNPs can be identified and used as markers to build a genetic map of the genome.
Analysis of the spotted gar genome cut with SbfI in the parents and 94 individuals from their progeny produced 8406 markers in 29 linkage groups.
Generating a RAD-tag genetic map
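The restriction-site step can be sketched in silico: find every SbfI site (CCTGCA^GG) and take the sequence on either side of it, which is what the RAD tags sample. The tag length is a hypothetical parameter, and real RAD tags are read from the cut site in the library rather than extracted from a reference.

```python
SBFI_SITE = "CCTGCAGG"  # SbfI recognition sequence (cuts CCTGCA^GG)


def rad_tags(genome: str, tag_len: int = 20, site: str = SBFI_SITE):
    """Return (position, left_flank, right_flank) for every restriction site.

    Each site yields two tags, one on either side of the cut. The SbfI site
    is palindromic, so scanning one strand finds every occurrence.
    tag_len is a hypothetical read length.
    """
    tags = []
    pos = genome.find(site)
    while pos != -1:
        left = genome[max(0, pos - tag_len):pos]
        right = genome[pos + len(site):pos + len(site) + tag_len]
        tags.append((pos, left, right))
        pos = genome.find(site, pos + 1)
    return tags


if __name__ == "__main__":
    print(rad_tags("AAAACCTGCAGGTTTT", tag_len=3))  # [(4, 'AAA', 'TTT')]
```

Comparing the tags of the same locus between individuals is then what reveals the SNPs used as markers.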
From Baird, PLoS ONE 2008
This can be done for multiple samples when barcodes are used.
After the barcodes are added, all samples can be pooled to reduce the workload.
Pools of short fragments from different
individuals.
Generating a RAD-tag genetic map
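Pooled, barcoded reads have to be sorted back to individuals before genotyping. The following sketch assumes inline barcodes of equal length at the start of each read and exact matching; the barcodes and sample names are hypothetical.

```python
def demultiplex(reads, barcodes):
    """Assign reads to samples by an exact inline barcode match at the 5' end.

    barcodes maps barcode sequence -> sample name (all barcodes assumed to be
    the same length). The barcode is trimmed from assigned reads; reads
    without a known barcode go to 'unassigned'.
    """
    bins = {sample: [] for sample in barcodes.values()}
    bins["unassigned"] = []
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:  # no barcode matched
            bins["unassigned"].append(read)
    return bins


if __name__ == "__main__":
    reads = ["ACGTGGAA", "TGCACCTT", "GGGGAAAA"]
    print(demultiplex(reads, {"ACGT": "ind1", "TGCA": "ind2"}))
```

Real pipelines usually also tolerate one or more mismatches in the barcode; exact matching keeps the sketch short.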
Amores A et al. Genetics 2011;188:799-808
Generating a RAD-tag genetic map
Long DNA molecules, fluorescently labeled at specific sites, are linearized in nanochannels and imaged. The fluorescent fingerprints of the molecules can be assembled and linked to contigs and scaffolds.
Optical mapping: BioNano Genomics
Gabino Sanchez-Perez's lecture at 15.00 hrs will explain this in much more detail and show some great examples of how to use this technology.
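The fingerprint idea can be illustrated by computing an in silico label map for a contig: the positions of the labeling motif on either strand, and the distances between consecutive labels. The motif used below, GCTCTTC (the Nt.BspQI nicking site), is an assumption for illustration; matching measured molecules against such maps, with sizing error, is not shown.

```python
# Hypothetical in silico label map for optical mapping.
# ASSUMPTION: labels sit at the 7 bp motif GCTCTTC (on either strand).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def label_positions(seq: str, motif: str = "GCTCTTC") -> list[int]:
    """Sorted start positions of the motif on either strand."""
    rc_motif = motif.translate(COMPLEMENT)[::-1]
    hits = set()
    for pattern in {motif, rc_motif}:
        pos = seq.find(pattern)
        while pos != -1:
            hits.add(pos)
            pos = seq.find(pattern, pos + 1)
    return sorted(hits)


def fingerprint(seq: str, motif: str = "GCTCTTC") -> list[int]:
    """Distances between consecutive labels: the molecule's 'barcode'."""
    pos = label_positions(seq, motif)
    return [b - a for a, b in zip(pos, pos[1:])]


if __name__ == "__main__":
    contig = "A" * 10 + "GCTCTTC" + "A" * 20 + "GAAGAGC" + "A" * 5
    print(label_positions(contig), fingerprint(contig))  # [10, 37] [27]
```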
A genome sequence alone is usually not the goal of a de novo sequencing project.
Based on the general structure of a gene, gene predictions can be made.
[Figure: general gene structure; labels: exons, intron, ATG, STOP, polyadenylation signal, splice donor site (AG|GTRAGT), branch site (CTRAY), pyrimidine-rich tract, splice acceptor site (CAG|G), 20–50 bases]
RNAseq reads can help validate predictions
Annotation of the genome
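A deliberately simplified sketch of the gene-structure idea: scan the three forward reading frames for an ATG followed in frame by a stop codon. It ignores the introns, splice sites, reverse strand and polyadenylation signal from the figure, all of which real eukaryotic gene predictors have to model, so it is an illustration only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) of ORFs on the forward strand.

    An ORF runs from an ATG to the first in-frame stop codon (inclusive).
    min_codons counts the codons before the stop, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, seq[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop left in this frame
            i += 3
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14, 'ATGAAATTTTAG')]
```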
Different flavors of RNAseq
• Stranded dUTP RNAseq: a simple modification of the standard prep gives information on the strand of origin of the transcript.
• RNAseq with minimal quantities of RNA: a great tool for looking at small numbers of (FACS-sorted) cells.
• CAGE: ideal for finding transcription start sites.
• smallRNA: to explore the miRNA content of a sample.
Transcriptome sequencing
Disadvantages of next generation sequencing:
• Complex sample preparation including PCR amplification.
• High run costs.
• Long run times.
• Short reads.
Changes needed:
• Single molecule analysis
• Reading sequences at a high speed
• Highly parallel
• Long reads (>10 Kb)
• No errors
Long reads: what do we want?
Pacific Biosciences PacBio RS II
Available since 2010
Oxford Nanopore Technologies MinION
Available since 2014
Generating long reads
Pacific Biosciences PacBio RS II
It uses a zero-mode waveguide to measure fluorescence in a very small volume.
1. Fragment gDNA, polish the ends, and add an adenosine.
2. Ligate hairpin adapters.
3. Attach the polymerase, load on a SMRT cell, and sequence.
[Figure: DNA polymerase at the transparent bottom of a zero-mode waveguide]
Pacific Biosciences
Pacific Biosciences P6-C4
• Yield 0.5-1 Gbp/SMRT cell.
• Since no amplification is done you
sequence the DNA as it comes out of your
sample (nicks, base modifications).
• There is very little sequence bias and there are no systematic errors.
Christoph Konig’s lecture at 14.15 hrs will delve much deeper into this technology.
• Started to work on nanopore sensing in 2005
• Investments to date 180 million GBP (227 M€)
• ~200 employees
• Broad IP portfolio
• Announced products: MinION and PromethION systems
• Access program for MinION (MAP)
Oxford Nanopore Technologies
But MAP is much more: it is a community and a playground for testing new applications.
The last part of the development of this technology is done “in the field” in a fairly open program.
Hundreds of MinIONs were sent around the globe to see how they behave in real life.
MAP takes the form of a web portal with information from ONT and a social-media-like system with blogs, comments, likes, and a forum for asking advice.
MinION access program
[Figure: adapter with tethering oligo, motor protein, brake protein, and hairpin containing abasic nucleotides, ligated to an A-tailed fragment]
1. Shear (optional).
2. DNA repair (optional), AMPure XP purification.
3. End repair, AMPure XP purification.
4. A-tailing, AMPure XP purification.
5. Ligation, His-tag purification.
6. Dilution in run buffer with ATP.
A MuA transposase protocol is under development. It should simplify sample preparation further, to about 10 minutes.
Library preparation
[Figure: tethering oligo, motor protein (E5), brake protein (E3), and hairpin with abasic nucleotides]
The tether keeps the DNA fragment on the membrane, leading to a ~20,000-fold higher DNA concentration close to the pore.
The motor protein unwinds the DNA and ratchets it through the pore.
The abasic nucleotides in the hairpin serve as a recognition point.
The brake protein prevents the motor protein from zipping through the complement strand.
Sequencing
Stills taken from: https://www.nanoporetech.com/news/movies#movie-24-nanopore-dna-sequencing
Strand sequencing
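[Figure: the strand GGCTCACTCCCATAAGC moves through the pore (with ATP); each current level reflects one 5-mer: GGCTC, GCTCA, CTCAC, TCACT, CACTC, ACTCC, CTCCC, …]
Raw data (ionic current, pA)
Events (with time domain)
Squiggle (events with time domain removed)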
Sensing the DNA
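The 5-mer decomposition of the strand above can be reproduced in Python. This assumes the idealized case in which the current depends on exactly five bases and the strand advances one base per event; real signals deviate from this.

```python
def kmers(seq: str, k: int = 5) -> list[str]:
    """All overlapping k-mers, in the order the strand passes the pore."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def from_kmers(words: list[str]) -> str:
    """Rebuild the sequence: each next k-mer adds exactly one new base."""
    if not words:
        return ""
    return words[0] + "".join(w[-1] for w in words[1:])


if __name__ == "__main__":
    strand = "GGCTCACTCCCATAAGC"
    print(kmers(strand)[:7])
    # ['GGCTC', 'GCTCA', 'CTCAC', 'TCACT', 'CACTC', 'ACTCC', 'CTCCC']
```

Because consecutive k-mers share four bases, every interior base contributes to five successive current levels.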
Squiggle plot for a complete read
First the template part (blue), then the abasic nucleotides in the hairpin (red), and finally the complement part (turquoise).
Alignment of the template and complement squiggles gives a 2D read.
Squiggle plot
MinKNOW controls the run and shows the channel states …
Interactive interface
… and the number of events vs. read length.
The Metrichor agent runs in the background to send sequence files to and from the cloud-based base caller.
MinKNOW can interact with other software.
minoTour analyses reads in streaming mode and can control MinKNOW.
Interactive interface
Mean read length: template 8734 bp, complement 8126 bp, 2D 9930 bp.
Read length is limited by the length of the non-nicked fragments rather than by the system.
My longest 2D read so far: 93.5 Kbp (template 120 Kb).
Read length distribution
Each detection channel actually has 4 wells. A QC step at the beginning of the run determines the quality of the 4 wells, and sequencing starts on the best set of wells. Every 24 hrs the next-best set of wells is chosen.
Yield over time
Errors
ref TGATGTATATGCTCTCTTTTCTGACGTTAGTCTCCGACGGCAGGCTTCAA-TGACCC-A-GGCTGAGAAATTCCCGGACCCTTTTTGCTCAAGAGCGATG
|||||||||||||| |||||||||||| ||||||||||||||||||| |||||| | ||||||||||||||||||||| |||||| |||| | |
MinION TGATGTATATGCTC----TTCTGACGTTAGCCTCCGACGGCAGGCTTCAATTGACCCGATGGCTGAGAAATTCCCGGACCC--TTTGCTACAGAGTG-T-
ref TTAATTTGTTCAATCATTTGGTTAGGAAAGCGGATGTTGCGGGTTGTTGTTCTGCGGGTTCTGTTCTTCGTTGACATGAG---GTTGCCCCGTATTCAGT
|||||||||||||||||||||||||||||||||| ||| |||||| | |||| ||||||| ||| |||||| | || | || | | |
MinION TTAATTTGTTCAATCATTTGGTTAGGAAAGCGGA---TGC-GGTTGT--TCCTGC-GGTTCTG----TCG-TGACATCCGTTATTTGCGCTGT-TACGC
ref GTCGC-TGATTTGTATTGTCTGAAGTTGTTTTTACGTTAAGTTGATGCAGATCAATTAATACGATACCT--GCGTCATAATTGATTATTTGACGT--GGT
| || || |||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||| |||||||||||||||||||||||| |||
MinION ATGGCATGTTTTGTATTGTCTGAAGTTGTTTTTACGTTAAGTTAATGCAGATCAATTAATACGATACCTCGGCGTCATAATTGATTATTTGACGTGGGGT
The error rate is around 15% for the current chemistry (R7.3). A typical passing 2D R7.3 read currently has 2.8% deletions, 2.7% insertions, and 1.7% substitutions.
R8 and R9 nanopores are in the pipeline (better performance on GC-rich reads and a better signal-to-noise ratio).
Errors
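The error categories above can be counted from a gapped pairwise alignment like the ref/MinION example. In this sketch, rates are expressed per alignment column, which is an assumption; the percentages quoted above may have been normalized differently.

```python
from collections import Counter


def alignment_errors(ref: str, query: str) -> Counter:
    """Classify each column of a gapped pairwise alignment ('-' = gap)."""
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for r, q in zip(ref, query):
        if r == "-" and q == "-":
            continue  # empty column, ignore
        if r == "-":
            counts["insertion"] += 1     # base present only in the read
        elif q == "-":
            counts["deletion"] += 1      # reference base missing from the read
        elif r == q:
            counts["match"] += 1
        else:
            counts["substitution"] += 1
    return counts


def error_rates(counts: Counter) -> dict:
    """Each error class as a fraction of all (non-empty) alignment columns."""
    total = sum(counts.values())
    return {k: counts[k] / total for k in ("insertion", "deletion", "substitution")}


if __name__ == "__main__":
    c = alignment_errors("ACGT-ACGT", "ACTTAAC-T")
    print(dict(c), error_rates(c))
```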
Errors result from different parts of the system.
On the ASIC:
• Events are missed in the translation from raw data to event data.
Solution: sharpen up the raw data by tuning the voltage and by using new nanopores with lower noise; sequence faster.
In the base caller:
• Bases outside the observed k-mer influence the current.
Solution: models with longer k-mers.
• Modified bases are currently not included in the k-mer model, and modified k-mers differ from unmodified k-mers.
Solution: add modified k-mers to the model.
Errors
Throughput is determined by:
• Number of channels: 512 on the MinION.
• Translocation speed: 30 bp/sec.
• Pore occupancy: 90%.
• Flow cell run time: ~60 hrs.
Currently this amounts to well over 1 Gb of events.
On R7.3 this translates to ~400 Mb of 2D data.
Throughput
In “fast mode” the MinION will read 500 bp/sec; three MAP groups are currently testing this. Throughput will then increase to ~20 Gb of events.
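Multiplying the four throughput factors gives a rough ceiling for one flow cell. Treating them as constant for the whole run is a simplifying assumption (pores degrade and wells are switched), so the result is an upper bound rather than a prediction.

```python
SECONDS_PER_HOUR = 3600


def max_events(channels: int, bases_per_sec: float, occupancy: float,
               run_hours: float) -> float:
    """Upper bound on events (~bases) per flow cell: the product of the factors.

    Assumes every channel sequences at full speed for the whole run.
    """
    return channels * bases_per_sec * occupancy * run_hours * SECONDS_PER_HOUR


if __name__ == "__main__":
    normal = max_events(512, 30, 0.90, 60)
    fast = max_events(512, 500, 0.90, 60)
    print(f"normal mode: {normal / 1e9:.1f} Gb")  # 3.0 Gb
    print(f"fast mode:   {fast / 1e9:.1f} Gb")    # 49.8 Gb
```

Both ceilings sit well above the “well over 1 Gb” of events observed so far and the ~20 Gb expected in fast mode, which is what one would expect from an idealized product of the factors.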
Longest 2D read: 93.5 Kbp
Longest template read: 120 Kbp (231 Kbp)
Highest yield: 1.32 Gevents
[Figure: base pairs sequenced (Mbp) per run for template and 2D reads, runs 1–30, R6, R7 and R7.3 chemistry]
Template and 2D yield over the past year
[Figure: a repeat in the assembly graph; unique sequence in, unique sequence out]
Long reads can help to resolve repeat areas in the assembly graph.
The resulting contigs will then look like this: [figure]
Untangle
Task                                        Software
1. Short read correction                    Quake (not for small genomes)
2. Short read assembly                      Velvet
3. MinION read alignment to Velvet contigs  LAST
4. Link filtering and contig tiling         Untangle script
5. Path detachment around repeats           Untangle script
6. Bubble popping                           Untangle script
7. Delete unconfirmed connections           Untangle script
8. Contig extraction                        Untangle script
Assembly and scaffolding strategy
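Step 4 (link filtering) can be illustrated with a hypothetical sketch: every read aligned to more than one contig links consecutive contigs along the read, and links supported by too few reads are dropped. The input format (read ID, contig, start position on the read) and the support threshold are assumptions, orientation and gap sizes are ignored, and this is not the Untangle script itself.

```python
from collections import Counter, defaultdict


def contig_links(alignments, min_support=2):
    """Count read-supported links between contigs and keep the strong ones.

    alignments: iterable of (read_id, contig, start_on_read) tuples
    (hypothetical format). Returns {(contig_a, contig_b): support}.
    """
    by_read = defaultdict(list)
    for read_id, contig, read_start in alignments:
        by_read[read_id].append((read_start, contig))
    support = Counter()
    for hits in by_read.values():
        hits.sort()  # order the contigs along the read
        contigs = [c for _, c in hits]
        for a, b in zip(contigs, contigs[1:]):
            if a != b:
                support[tuple(sorted((a, b)))] += 1  # undirected link
    return {link: n for link, n in support.items() if n >= min_support}


if __name__ == "__main__":
    alns = [("r1", "c1", 0), ("r1", "c2", 5000),
            ("r2", "c1", 100), ("r2", "c2", 6000),
            ("r3", "c2", 0), ("r3", "c3", 3000)]
    print(contig_links(alns))  # {('c1', 'c2'): 2}
```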
Agrobacterium NCPPB 1771 assembly graph
25× transposon (1160 bp)
8× transposon (873 bp)
4× rRNA (6.4 Kb)
271 nodes, 311 connections
154 contigs
N50 = 198 Kb
Sum = 5.87 Mb
• Alignment: LAST with optimized settings
• Links: alignment filtering and contig tiling
• 7328 reads aligned to contigs
• 438 reads aligned to multiple contigs
• 585 links between contigs
• 13158 reads on R6 and R7 chemistry
• 73.8 Mb total yield (template and 2D)
• 5–85970 nt length, typical ~12 Kb
MinION sequencing and scaffolding
Links between nodes are specific.
[Figure legend: marked links are confirmed by PCR]
Final assembly graph after scaffolding
• 271 nodes + 312 connections → 49 nodes + 5 connections
• 154 contigs → ~8 contigs
• Complete chromosome 2 (1.2 Mb), pTi (190 Kb), cryptic megaplasmid (746 Kb)
• Slight residual fragmentation of chromosome 1
Reads are stored in HDF5 format and contain all data from the event data onwards.
A cloud-based base caller is provided by Oxford Nanopore Technologies.
The MAP community is actively developing software for this type of data. Some examples:
• Jared Simpson’s pipeline to correct and assemble genomes using only nanopore reads.
• Live monitoring, alignment, and feedback to the MinION: Matt Loose’s minoTour.
• Squiggle space aligners: each base is measured 5 times, in consecutive k-mers, so it makes sense to skip base calling and work directly with the events (squiggle space).
Software
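Because the strand moves through the pore at a variable speed, two event series from the same sequence can be stretched relative to each other. Dynamic time warping (DTW) is a classic way to align such series and is used here only as an illustration of squiggle-space alignment; it is not necessarily what the MAP tools implement, and the event values are hypothetical mean currents in pA.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two event series.

    Each cell holds the cheapest cost of aligning a[:i] with b[:j]; an event
    may be matched to several consecutive events of the other series, which
    absorbs differences in translocation speed.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # difference in mean current
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]


if __name__ == "__main__":
    template = [52.1, 60.3, 47.8, 55.0]        # hypothetical events (pA)
    slower = [52.1, 60.3, 60.3, 47.8, 55.0]    # same signal, one event doubled
    print(dtw_distance(template, slower))      # 0.0
```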
London Calling 2015
Highlights from Clive Brown’s talk
• Improvements to the base caller.
• Read Until (and barcoding).
• Fast mode on the MinION MkI (500 bp/sec instead of 30).
• New 3000-channel ASIC with a “crumpet” chip design that separates the ASIC from the fluidics.
• MinION MkII and PromethION will have this new ASIC.
• Library prep on beads to reduce the amount of DNA needed (from low ng to pg).
• Direct RNA sequencing.
• Simplified sample preparation and VolTRAX.
• Pricing will be “pay as you go”: the initial payment for the hardware includes some hours of sequencing.
• MkI: $270 and 3 hrs of sequencing (~3 Gbp in fast mode).
London Calling 2015
Much emphasis was placed on making library prep simpler and faster, so that the system can leave the lab.
If the system leaves the lab, many more applications become possible.
VolTRAX
The technology underlying the MinION system is scalable, so higher throughput can be made available relatively easily.
PromethION will use the new ASIC design and will have 144,000 channels.
Projected throughput: 6.4 Tbp/day.
This is too much data for cloud base calling, so base calling will be done locally.
The access program will start later this year.
London Calling 2015
PromethION
Freek Vonk
Harald Kerkkamp
Asad Hyder
Michael Richardson
Christiaan Henkel
Paul Hooykaas
Ron Dirks
Guido van den Thillart
Herman Spaink
Pim Arntzen
Erwin Fakkert
Marten Boetzer
Walter Pirovano
Diana Uffink
R. Manjunatha Kini
Ken Kraaijeveld
Yavuz Ariyurek
Arnoud Schmitz
Yahya Anvar
Acknowledgments
Dan Turner
Oliver Hartwell
20150601 bio sb_assembly_course

More Related Content

What's hot

Ngs microbiome
Ngs microbiomeNgs microbiome
Ngs microbiome
jukais
 
Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence data
Thomas Keane
 
A Comparison of NGS Platforms.
A Comparison of NGS Platforms.A Comparison of NGS Platforms.
A Comparison of NGS Platforms.
mkim8
 

What's hot (20)

Nextera Overview Feb 2010
Nextera Overview Feb 2010Nextera Overview Feb 2010
Nextera Overview Feb 2010
 
Next-generation sequencing and quality control: An Introduction (2016)
Next-generation sequencing and quality control: An Introduction (2016)Next-generation sequencing and quality control: An Introduction (2016)
Next-generation sequencing and quality control: An Introduction (2016)
 
Ngs introduction
Ngs introductionNgs introduction
Ngs introduction
 
2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngs2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngs
 
Introduction to second generation sequencing
Introduction to second generation sequencingIntroduction to second generation sequencing
Introduction to second generation sequencing
 
Ngs microbiome
Ngs microbiomeNgs microbiome
Ngs microbiome
 
Oxford nanopore sequencing
Oxford nanopore sequencingOxford nanopore sequencing
Oxford nanopore sequencing
 
Ngs intro_v6_public
 Ngs intro_v6_public Ngs intro_v6_public
Ngs intro_v6_public
 
How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)
 
Overview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence dataOverview of methods for variant calling from next-generation sequence data
Overview of methods for variant calling from next-generation sequence data
 
RNASeq - Analysis Pipeline for Differential Expression
RNASeq - Analysis Pipeline for Differential ExpressionRNASeq - Analysis Pipeline for Differential Expression
RNASeq - Analysis Pipeline for Differential Expression
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
Next Generation Sequencing
Next Generation SequencingNext Generation Sequencing
Next Generation Sequencing
 
NGS Pipeline Preparation - Tools Selection
NGS Pipeline Preparation - Tools SelectionNGS Pipeline Preparation - Tools Selection
NGS Pipeline Preparation - Tools Selection
 
next generation sequencing (recent collection2018)
next generation sequencing (recent collection2018) next generation sequencing (recent collection2018)
next generation sequencing (recent collection2018)
 
A Comparison of NGS Platforms.
A Comparison of NGS Platforms.A Comparison of NGS Platforms.
A Comparison of NGS Platforms.
 
Next-Generation Sequencing and its Applications in RNA-Seq
Next-Generation Sequencing and its Applications in RNA-SeqNext-Generation Sequencing and its Applications in RNA-Seq
Next-Generation Sequencing and its Applications in RNA-Seq
 
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
 
Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2Bioinfo ngs data format visualization v2
Bioinfo ngs data format visualization v2
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 

Viewers also liked

Combining PacBio with short read technology for improved de novo genome assembly
Combining PacBio with short read technology for improved de novo genome assemblyCombining PacBio with short read technology for improved de novo genome assembly
Combining PacBio with short read technology for improved de novo genome assembly
Lex Nederbragt
 

Viewers also liked (10)

Fish breeding for future environments under climate change
 Fish breeding for future environments under climate change Fish breeding for future environments under climate change
Fish breeding for future environments under climate change
 
Fishing in the genepool: Genetic resources and traits to address climate change
Fishing in the genepool: Genetic resources and traits to address climate changeFishing in the genepool: Genetic resources and traits to address climate change
Fishing in the genepool: Genetic resources and traits to address climate change
 
2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...
2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...
2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...
 
Jan2016 pac bio giab
Jan2016 pac bio giabJan2016 pac bio giab
Jan2016 pac bio giab
 
Data Management for Quantitative Biology - Data sources (Next generation tech...
Data Management for Quantitative Biology - Data sources (Next generation tech...Data Management for Quantitative Biology - Data sources (Next generation tech...
Data Management for Quantitative Biology - Data sources (Next generation tech...
 
Combining PacBio with short read technology for improved de novo genome assembly
Combining PacBio with short read technology for improved de novo genome assemblyCombining PacBio with short read technology for improved de novo genome assembly
Combining PacBio with short read technology for improved de novo genome assembly
 
20140711 3 t_clark_ercc2.0_workshop
20140711 3 t_clark_ercc2.0_workshop20140711 3 t_clark_ercc2.0_workshop
20140711 3 t_clark_ercc2.0_workshop
 
NGS technologies - platforms and applications
NGS technologies - platforms and applicationsNGS technologies - platforms and applications
NGS technologies - platforms and applications
 
Ngs ppt
Ngs pptNgs ppt
Ngs ppt
 
NGS - Basic principles and sequencing platforms
NGS - Basic principles and sequencing platformsNGS - Basic principles and sequencing platforms
NGS - Basic principles and sequencing platforms
 

Similar to 20150601 bio sb_assembly_course

ngs-mousumee-210611153338.pdf
ngs-mousumee-210611153338.pdfngs-mousumee-210611153338.pdf
ngs-mousumee-210611153338.pdf
ssuser4743df
 
2013 pag-equine-workshop
2013 pag-equine-workshop2013 pag-equine-workshop
2013 pag-equine-workshop
c.titus.brown
 

Similar to 20150601 bio sb_assembly_course (20)

Conventional and next generation sequencing ppt
Conventional and next generation sequencing pptConventional and next generation sequencing ppt
Conventional and next generation sequencing ppt
 
Hamas 1
Hamas 1Hamas 1
Hamas 1
 
BioSB meeting 2015
BioSB meeting 2015BioSB meeting 2015
BioSB meeting 2015
 
Dna sequencing and its types
Dna sequencing and its typesDna sequencing and its types
Dna sequencing and its types
 
THIRD GEN SEQUENCING.pptx
THIRD GEN SEQUENCING.pptxTHIRD GEN SEQUENCING.pptx
THIRD GEN SEQUENCING.pptx
 
Sequence based Markers
Sequence based MarkersSequence based Markers
Sequence based Markers
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomics
 
Useful.ppt
Useful.pptUseful.ppt
Useful.ppt
 
DNA Sequencing: History, methods and NGS
DNA Sequencing: History, methods and NGSDNA Sequencing: History, methods and NGS
DNA Sequencing: History, methods and NGS
 
DNA Sequencing - DNA sequencing is like reading the instructions inside a cell
DNA Sequencing -  DNA sequencing is like reading the instructions inside a cellDNA Sequencing -  DNA sequencing is like reading the instructions inside a cell
DNA Sequencing - DNA sequencing is like reading the instructions inside a cell
 
Introduction to Apollo for i5k
Introduction to Apollo for i5kIntroduction to Apollo for i5k
Introduction to Apollo for i5k
 
ngs-mousumee-210611153338.pdf
ngs-mousumee-210611153338.pdfngs-mousumee-210611153338.pdf
ngs-mousumee-210611153338.pdf
 
NEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCINGNEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCING
 
AFLP, RFLP & RAPD
AFLP, RFLP & RAPDAFLP, RFLP & RAPD
AFLP, RFLP & RAPD
 
Fruit breedomics workshop wp6 from marker assisted breeding to genomics assis...
Fruit breedomics workshop wp6 from marker assisted breeding to genomics assis...Fruit breedomics workshop wp6 from marker assisted breeding to genomics assis...
Fruit breedomics workshop wp6 from marker assisted breeding to genomics assis...
 
Genome sequencing
Genome sequencingGenome sequencing
Genome sequencing
 
RMR-Nirma-NGS-Heena.pdf
RMR-Nirma-NGS-Heena.pdfRMR-Nirma-NGS-Heena.pdf
RMR-Nirma-NGS-Heena.pdf
 
molecular basis of inheritance - supernotes.pdf
molecular basis of inheritance - supernotes.pdfmolecular basis of inheritance - supernotes.pdf
molecular basis of inheritance - supernotes.pdf
 
2013 pag-equine-workshop
2013 pag-equine-workshop2013 pag-equine-workshop
2013 pag-equine-workshop
 
01-Sequencing_Technologies (1).ppt for education
01-Sequencing_Technologies (1).ppt for education01-Sequencing_Technologies (1).ppt for education
01-Sequencing_Technologies (1).ppt for education
 

Recently uploaded

Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Sérgio Sacani
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Sérgio Sacani
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
Sérgio Sacani
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 

Recently uploaded (20)

Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST

20150601 bio sb_assembly_course

  • 1. High-throughput sequencing technologies in genome assembly Hans Jansen
  • 2. ZF-screens B.V.: Dutch SME at Bioscience Park in Leiden, the Netherlands • High-throughput drug screens and toxicity assays in zebrafish larvae • Fish fertility (eel, pike perch, sole) to aid sustainable aquaculture • Sequencing (genomes, transcriptomes) • Bioinformatics
  • 3. Some examples of genome projects: • Common carp (Cyprinus carpio): high-throughput screening model; genome and transcriptomes. • European and Japanese eel (Anguilla anguilla and Anguilla japonica): completing the life cycle in aquaculture; genome and transcriptomes. • King cobra (Ophiophagus hannah): evolution and toxins; genome and transcriptomes.
  • 4. A brief history of DNA sequencing (timeline 1977, 2000, 2011). 1977: chemical cleavage (Maxam and Gilbert) and chain termination (Sanger, Nicklen, and Coulson); throughput: 5 samples, 1 Kb/day, micrograms of ssDNA needed. 2000: massively parallel signature sequencing (Brenner). 2011: SMRT (Pacific Biosciences). Throughput at the end of the timeline: 3×10^9 samples, 55 Gb/day, a single molecule of DNA needed.
  • 5. A brief history of DNA sequencing. February 1977, Maxam and Gilbert: chemical cleavage (modify nucleotides and cut at the modified position). December 1977, Sanger, Nicklen, and Coulson: chain termination (use modified nucleotides to stop the extension of a newly synthesized DNA strand).
  • 6. A brief history of DNA sequencing. Maxam and Gilbert sequencing was abandoned relatively soon: it was technically complex and used hazardous chemicals and radioactivity. The Sanger sequencing method was improved over the years and was the method of choice to sequence the first draft of a human genome: • Thermostable polymerases removed the need for a single-stranded DNA template. • Fluorescent dye terminators made it possible to combine all four reactions in one. • The separation of the DNA fragments was automated. Sanger had already used shotgun sequencing to sequence lambda DNA, and it proved to be a powerful tool to sequence and assemble larger DNA molecules and even whole genomes.
  • 7. A brief history of DNA sequencing. To make assembly easier, partially overlapping BAC clones were first selected from the genome and then each was sequenced and assembled by the shotgun method (figure: gDNA → BAC clones). This was a laborious method, and later a whole-genome shotgun approach was used.
  • 8. A brief history of DNA sequencing
  • 9. Making a paired end library: break the genomic DNA into fragments of < 1 Kb; polish the ends of the DNA and adenylate them (3’ A overhang); ligate adapters (with a 3’ T overhang) to the ends of the DNA; amplify the paired end library; bind the single-stranded library to the flow cell.
  • 10. Attach and cluster the library on a carrier
  • 12. Obtaining scaffolding information: mate pairs (2 × 50 bp). Generate large fragments by shearing and label the ends with biotin (green dash). Self-ligate the fragments in a large volume and shear the circularized fragments (black dash). Isolate the biotinylated fragments, convert them to a paired end library and sequence them (red arrows). Problems: some of these fragments have unconvertible ends; larger fragments self-ligate inefficiently; nicks in the DNA allow digestion of circularized molecules. These problems limit the library to an insert size of ~10 kb, and such libraries tend to have a low number of unique fragments.
  • 13. Obtaining scaffolding information: fosmid diTags. Generate large fragments by shearing, isolate ~39 kb fragments and clone them into an adapted fosmid vector that contains EcoP15I sites flanking the insert (purple dash). Cut with EcoP15I, which leaves a 26 bp overhang, end-repair the fragments and self-ligate. PCR-amplify the diTag library from these fragments and sequence the 52 bp inserts. Problem: these large fragments ligate inefficiently into the fosmid vector, leading to low-complexity libraries.
  • 14. King cobra sequence data
      Library  Insert size    Reads            Yield (Gbp)  Coverage  Span
      PE200    <155 bp        2 × 76 nt        21.9         14.6×
      PE280    230–305 bp     2 × 151 nt       11.0         7.3×
      PE500    370–485 bp     2 × 50–151 nt    19.3         12.9×     1.2×
      MP2K     1.6–2.4 Kbp    2 × 36 nt        5.4                    4.5×
      MP7K     4–6 Kbp        2 × 51 nt        2.3                    0.6×
      MP10K    6.5–10 Kbp     2 × 51 nt        5.3                    7.7×
      MP15K    9–13 Kbp       2 × 51 nt        3.8                    8.8×
      Total                                    69           34.8×     22.9×
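Sequence coverage is the yield divided by the genome size, so the king cobra numbers can be sanity-checked by back-calculating the genome size that each paired end library implies. A minimal Python sketch; the yields and coverages are copied from the slide, while the helper name `implied_genome_size_gbp` is hypothetical:

```python
# Yield (Gbp) and sequence coverage of the paired end libraries on the slide.
PE_LIBRARIES = {
    "PE200": (21.9, 14.6),
    "PE280": (11.0, 7.3),
    "PE500": (19.3, 12.9),
}


def implied_genome_size_gbp(yield_gbp: float, coverage: float) -> float:
    """Coverage = yield / genome size, so genome size = yield / coverage."""
    return yield_gbp / coverage


if __name__ == "__main__":
    for name, (gbp, cov) in PE_LIBRARIES.items():
        print(f"{name}: {implied_genome_size_gbp(gbp, cov):.2f} Gbp")
    # The totals should give the same answer if the rows are consistent.
    total_gbp = sum(g for g, _ in PE_LIBRARIES.values())
    total_cov = sum(c for _, c in PE_LIBRARIES.values())
    print(f"all PE: {implied_genome_size_gbp(total_gbp, total_cov):.2f} Gbp")
```

Every row points to roughly 1.5 Gbp. That figure is an inference from the numbers, not a value stated on the slide.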
  • 15. Read merging: if the two reads of a paired end fragment overlap, they can be merged into a single longer read. • We used our own script, since nothing was available at the time. • Now there are a number of tools: FLASH, SHERA, SeqPrep. • Paired end libraries need to be prepared with the read length in mind and size-selected as narrowly as possible. (Figure labels: ~600 bp, ~270 bp.)
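The merging idea can be illustrated with a simplified overlap search. This is a minimal Python sketch, not the algorithm of our script or of FLASH, SHERA or SeqPrep: it assumes fragments at least as long as a single read (no read-through into the adapter), and `min_overlap` and `max_mismatch_frac` are hypothetical defaults.

```python
from typing import Optional

# Complement table for DNA; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Merge two mates whose fragment is shorter than the combined read length.

    read2 is reverse-complemented so both reads are on the same strand. The
    longest suffix of read1 that matches a prefix of read2 within the mismatch
    budget is taken as the overlap. At mismatches read1's base is kept; real
    tools typically also use base qualities to pick the consensus.
    """
    r2 = reverse_complement(read2)
    for overlap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        tail, head = read1[-overlap:], r2[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * overlap:
            return read1 + r2[overlap:]
    return None  # no acceptable overlap: keep the reads as a normal pair


if __name__ == "__main__":
    fragment = "ATGCGTACCTTAGGCAATCGGATCCGTTACGAAGTCTAGC"  # 40 bp toy fragment
    r1 = fragment[:25]                       # forward read, 25 nt
    r2 = reverse_complement(fragment[15:])   # reverse read, 25 nt
    print(merge_pair(r1, r2) == fragment)
```

Searching from the longest possible overlap downward favors the most complete merge; the narrow size selection recommended on the slide keeps the true overlap in a predictable range, which makes such a search more reliable.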
  • 16. Assembly (cobra). (Figure: % of the assembly vs. fragment size (bp, 10^2 to 10^6) as each library is added: +500 bp, +2 Kbp, +7 Kbp, +10 Kbp, +15 Kbp.)
  • 17. King cobra sequence assembly. Contigs: N50 3,982 bp; largest 70 Kbp; number 1,186,408; total length 1.45 Gbp. Scaffolds: N50 226 Kbp; largest 2.84 Mbp; number 716,551; total length 1.66 Gbp. Number of genes: 22,183.
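The contig and scaffold N50 values follow the usual definition: half of the assembly is in pieces of this length or longer. A minimal Python sketch with toy lengths (not the cobra data); the function name is illustrative:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths.

    Sort the lengths from longest to shortest and accumulate them; the N50 is
    the length at which the running total first reaches half the assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable")  # the running total reaches the full sum


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # → 8
```

Because N50 only looks at the largest pieces, the cobra assembly can have an N50 of 226 Kbp for scaffolds while still consisting of hundreds of thousands of short scaffolds.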
  • 18. Generating a RAD-tag genetic map (Genome Res. 2007 17: 240-248). This is a method to sequence a small part of a genome, and to do so for multiple siblings. SNPs identified in the sequence data can be used as markers to build a genetic map of the genome. Analysis of the spotted gar genome cut with SbfI in the parents and 94 individuals from their progeny produced 8406 markers in 29 linkage groups.
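RAD tags sample the genome next to restriction sites, which is why cutting with a rare cutter such as SbfI yields a small, reproducible part of the genome. As a rough illustration, the sketch below finds SbfI sites (CCTGCA^GG) and extracts the flanking stretches that would be read outward from each cut. The tag length, function names and the simplified handling of the 4-base overhang are assumptions, not part of the published protocol.

```python
import re
from typing import List, Tuple

SBFI_SITE = "CCTGCAGG"  # SbfI recognition site (palindromic)
CUT_OFFSET = 6          # SbfI cuts CCTGCA^GG, i.e. 6 bases into the site
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def sbfi_cut_positions(seq: str) -> List[int]:
    """0-based top-strand cut positions of every SbfI site in seq."""
    return [m.start() + CUT_OFFSET for m in re.finditer(SBFI_SITE, seq.upper())]


def rad_tags(seq: str, tag_len: int) -> List[Tuple[str, str]]:
    """Stretches read outward from each cut site: (left flank, right flank).

    The left flank is reverse-complemented because it is read from the cut
    site back into the upstream sequence. Sites closer than tag_len to either
    end of seq are skipped. The 4-base overhang is ignored for simplicity.
    """
    seq = seq.upper()
    tags = []
    for cut in sbfi_cut_positions(seq):
        if cut - tag_len < 0 or cut + tag_len > len(seq):
            continue
        left = reverse_complement(seq[cut - tag_len:cut])
        right = seq[cut:cut + tag_len]
        tags.append((left, right))
    return tags


if __name__ == "__main__":
    toy = "AAAA" + "CCTGCAGG" + "TTTT"
    print(sbfi_cut_positions(toy), rad_tags(toy, 4))
```

In a mapping cross, the same tag sequenced in the parents and in each offspring is what makes SNPs comparable across individuals.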
  • 19. Generating a RAD-tag genetic map (figure from Baird, PLoS ONE 2008). Using barcodes, this can be done with multiple samples: after the barcodes are added, all samples can be pooled to reduce the workload (figure label: pools of short fragments from different individuals).
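After pooled sequencing, reads have to be sorted back to individuals by their barcode. A minimal Python sketch, assuming hypothetical 5 nt inline barcodes at the start of each read and tolerating one mismatch only when the assignment stays unambiguous; real RAD adapters and demultiplexing tools use their own barcode sets and rules.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical 5 nt inline barcodes; every pair differs at all 5 positions.
BARCODES = {"AACCA": "parent_1", "CGATC": "parent_2", "TCGAT": "offspring_01"}
BARCODE_LEN = 5


def assign_sample(read: str, barcodes: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode matches the read start, if unambiguous."""
    prefix = read[:BARCODE_LEN]
    if len(prefix) < BARCODE_LEN:
        return None
    hits = [sample for bc, sample in barcodes.items()
            if sum(a != b for a, b in zip(prefix, bc)) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads: List[str],
                barcodes: Dict[str, str]) -> Tuple[Dict[str, List[str]], int]:
    """Split a pooled run into per-sample reads with the barcode trimmed off."""
    bins: Dict[str, List[str]] = defaultdict(list)
    unassigned = 0
    for read in reads:
        sample = assign_sample(read, barcodes)
        if sample is None:
            unassigned += 1
        else:
            bins[sample].append(read[BARCODE_LEN:])
    return dict(bins), unassigned


if __name__ == "__main__":
    pooled = ["AACCAGGTTT", "CGATCAAAA", "AACGAGGTT", "GGGGGAAAA"]
    print(demultiplex(pooled, BARCODES))
```

Choosing barcodes that differ at several positions is what allows a sequencing error in the barcode to be tolerated without assigning the read to the wrong individual.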
  • 20. Amores A et al. Genetics 2011;188:799-808 Generating a RAD-tag genetic map
  • 21. Optical mapping: BioNano Genomics. Long DNA molecules, fluorescently labeled at specific sites, are linearized in nanochannels and imaged. The fluorescent fingerprints of the molecules can be assembled and linked to contigs and scaffolds. Gabino Sanchez-Perez’s lecture at 15.00 hrs will explain this in much more detail and show some great examples of how to use this technology.
  • 22. Annotation of the genome. A genome alone is usually not the goal of a de novo sequencing project. Gene predictions can be made based on the general structure of a gene. (Figure: gene structure with exons and introns, the ATG start and STOP codons, the splice donor site (AG|GT), the branch site (20–50 bases upstream of the acceptor), the pyrimidine-rich tract and splice acceptor site (CAG|G), and the polyadenylation signal.) RNAseq reads can help validate predictions.
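Gene predictors combine these signals statistically and with evidence such as RNAseq; the sketch below only checks a proposed gene model against the simplest rules from the gene-structure figure: GT at the splice donor, AG at the splice acceptor, an ATG start and an in-frame stop codon. It is a hypothetical helper that handles the forward strand only, and the toy sequence and coordinates are made up.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_gene_model(genome: str,
                     exons: List[Tuple[int, int]]) -> Tuple[str, List[str]]:
    """Check a forward-strand gene model against basic gene-structure rules.

    exons: sorted, 0-based, half-open (start, end) coordinates of coding exons.
    Returns the spliced coding sequence and a list of problems found.
    """
    problems = []
    # Canonical introns start with GT (donor) and end with AG (acceptor).
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = genome[prev_end:next_start]
        if not intron.startswith("GT"):
            problems.append(f"intron {prev_end}-{next_start}: donor is not GT")
        if not intron.endswith("AG"):
            problems.append(f"intron {prev_end}-{next_start}: acceptor is not AG")
    cds = "".join(genome[start:end] for start, end in exons)
    if not cds:
        return cds, problems + ["no coding sequence"]
    if not cds.startswith("ATG"):
        problems.append("CDS does not start with ATG")
    if len(cds) % 3:
        problems.append("CDS length is not a multiple of 3")
    else:
        codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
        if codons[-1] not in STOP_CODONS:
            problems.append("CDS does not end with a stop codon")
        if any(c in STOP_CODONS for c in codons[:-1]):
            problems.append("premature stop codon")
    return cds, problems


if __name__ == "__main__":
    toy = "CC" + "ATGAAA" + "GTATTTCAG" + "CCCTAA" + "GG"
    print(check_gene_model(toy, [(2, 8), (17, 23)]))
```

An exon boundary that is off by one base typically breaks both the splice-site rule and the reading frame, which is the kind of error RNAseq reads spanning the junction help to catch.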
  • 23. Transcriptome sequencing. Different flavors of RNAseq: • Stranded dUTP RNAseq: a simple modification of the standard prep gives information on the strandedness of the transcript. • RNAseq with minimal quantities of RNA: a great tool to look at small numbers of (FACS-sorted) cells. • CAGE: ideal for finding transcription start sites. • Small RNA: to explore the miRNA content of a sample.
  • 24. Long reads: what do we want? Disadvantages of next generation sequencing: • Complex sample preparation including PCR amplification • High run costs • Long run times • Short reads. Changes needed: • Single molecule analysis • Reading sequences at high speed • Highly parallel • Long reads (>10 kb) • No errors
  • 25. Generating long reads: Pacific Biosciences PacBio RS II (available since 2010); Oxford Nanopore Technologies MinION (available since 2014).
  • 26. Pacific Biosciences PacBio RS II: it uses a zero-mode waveguide to measure fluorescence in a very small volume.
  • 27. Pacific Biosciences library preparation: fragment the gDNA, polish the ends and add an adenosine; ligate hairpin adapters; attach the polymerase, load on a SMRT cell and sequence. (Figure labels: DNA polymerase; transparent bottom of the zero-mode waveguide.)
  • 28. Pacific Biosciences P6-C4 chemistry: • Yield 0.5–1 Gbp per SMRT cell. • Since no amplification is done, you sequence the DNA as it comes out of your sample (nicks, base modifications). • There is very little sequence bias and there are no systematic errors. Christoph Konig’s lecture at 14.15 hrs will delve much deeper into this technology.
  • 29. • Started to work on nanopore sensing in 2005 • Investments to date 180 million GBP (227 M€) • ~200 employees • Broad IP portfolio • Announced products: MinION and PromethION systems • Access program for MinION (MAP) Oxford Nanopore Technologies
  • 30. MinION access program. But MAP is much more: it is about being a community and a playground to test new applications. The last part of the development of this technology is done “in the field” in a fairly open program: hundreds of MinIONs were sent around the globe to see how they would behave in real life. MAP is visible as a web portal with information from ONT and a social-media-like system with blogging, comments, likes, and a forum to ask for advice.
  • 31. Library preparation: • Shear (optional) • DNA repair (optional), AMPure XP purification • End repair, AMPure XP purification • A-tailing, AMPure XP purification • Ligation, His-tag purification • Dilution in run buffer and ATP. (Figure labels: tethering oligo, motor protein, brake protein, hairpin with abasic nucleotides, T/A overhangs.) A MuA transposase protocol is under development; this should further simplify sample preparation (10 minutes).
  • 32. Sequencing. (Figure labels: tethering oligo, motor protein E5, brake protein E3, hairpin with abasic nucleotides.) The tether keeps the DNA fragment on the membrane, leading to a ~20,000-fold higher DNA concentration close to the pore. The motor protein unwinds the DNA and ratchets it through the pore. The abasic nucleotides in the hairpin are a recognition point. The brake protein prevents the motor protein from zipping through the complement strand.
  • 33. Strand sequencing (figure label: ATP). Stills taken from: https://www.nanoporetech.com/news/movies#movie-24-nanopore-dna-sequencing
  • 34. Sensing the DNA: the sequence GGCTCACTCCCATAAGC is sensed as a series of overlapping 5-mers (GGCTC, GCTCA, CTCAC, TCACT, CACTC, ACTCC, CTCCC, …). Panels: raw data (ionic current, pA); events (with time domain); squiggle (events with time domain removed).
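In the idealized case each event corresponds to the k-mer occupying the pore, so a strand produces one k-mer per step and every interior base contributes to k consecutive events. A minimal Python sketch, assuming k = 5 as in the slide's example; the function names are illustrative, and real event detection can miss or split events:

```python
from typing import List


def kmers(seq: str, k: int = 5) -> List[str]:
    """All overlapping k-mers of seq, one per step of the strand through the pore."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def times_measured(seq_len: int, pos: int, k: int = 5) -> int:
    """Number of consecutive k-mers (events) that contain the base at pos."""
    first = max(0, pos - k + 1)
    last = min(pos, seq_len - k)
    return max(0, last - first + 1)


if __name__ == "__main__":
    seq = "GGCTCACTCCCATAAGC"
    print(kmers(seq)[:7])
    print(times_measured(len(seq), 8))  # an interior base → 5
```

This redundancy is what makes it attractive to work directly in squiggle space: a single noisy event does not decide a base on its own.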
  • 35. Squiggle plot: the squiggle plot for a complete read shows first the template part in blue, then the abasic nucleotides in the hairpin in red, and finally the complement part in turquoise. Alignment of the template and complement squiggles gives a 2D read.
  • 36. Interactive interface: MinKNOW controls the run and shows channel states…
  • 37. Interactive interface: … and the number of events vs. read length. The Metrichor agent runs in the background to send sequence files to and from the (cloud-based) base caller. MinKNOW can interact with other software: minoTour analyses reads in streaming mode and can control MinKNOW.
  • 38. Read length distribution: template mean 8734 bp; complement mean 8126 bp; 2D mean 9930 bp. Read length is limited by the length of non-nicked fragments rather than by the system. My longest 2D read until now: 93.5 Kbp (template 120 Kb).
  • 39. Yield over time: there are actually 4 wells per detection channel. QC at the beginning of the run determines the quality of the 4 wells. Sequencing starts on the best set of wells, and every 24 hrs the next-best set of wells is chosen.
  • 41. Errors. Example alignment of a MinION read to the reference:
      ref    TGATGTATATGCTCTCTTTTCTGACGTTAGTCTCCGACGGCAGGCTTCAA-TGACCC-A-GGCTGAGAAATTCCCGGACCCTTTTTGCTCAAGAGCGATG
      MinION TGATGTATATGCTC----TTCTGACGTTAGCCTCCGACGGCAGGCTTCAATTGACCCGATGGCTGAGAAATTCCCGGACCC--TTTGCTACAGAGTG-T-
      ref    TTAATTTGTTCAATCATTTGGTTAGGAAAGCGGATGTTGCGGGTTGTTGTTCTGCGGGTTCTGTTCTTCGTTGACATGAG---GTTGCCCCGTATTCAGT
      MinION TTAATTTGTTCAATCATTTGGTTAGGAAAGCGGA---TGC-GGTTGT--TCCTGC-GGTTCTG----TCG-TGACATCCGTTATTTGCGCTGT-TACGC
      ref    GTCGC-TGATTTGTATTGTCTGAAGTTGTTTTTACGTTAAGTTGATGCAGATCAATTAATACGATACCT--GCGTCATAATTGATTATTTGACGT--GGT
      MinION ATGGCATGTTTTGTATTGTCTGAAGTTGTTTTTACGTTAAGTTAATGCAGATCAATTAATACGATACCTCGGCGTCATAATTGATTATTTGACGTGGGGT
      The error rate lies around 15% for the current chemistry (R7.3). A typical passing 2D R7.3 read now has 2.8% deletions, 2.7% insertions and 1.7% substitutions. R8 and R9 nanopores are in the pipeline (improving on G/C-rich reads and giving a better signal-to-noise ratio).
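Deletion, insertion and substitution rates such as those quoted for R7.3 can be tallied from alignment rows like the ones shown for the MinION read. A minimal Python sketch; it normalizes per alignment column, which is one of several conventions (other tools divide by reference length), and the function name is hypothetical:

```python
from typing import Dict


def error_profile(ref_row: str, read_row: str) -> Dict[str, float]:
    """Per-column error rates from one pairwise alignment ('-' marks a gap).

    A gap in the read row is a deletion (reference base missing from the read);
    a gap in the reference row is an insertion (extra base in the read).
    """
    if len(ref_row) != len(read_row):
        raise ValueError("alignment rows must have equal length")
    counts = {"match": 0, "substitution": 0, "insertion": 0, "deletion": 0}
    for r, q in zip(ref_row.upper(), read_row.upper()):
        if r == "-" and q == "-":
            continue  # a column with gaps in both rows carries no information
        if q == "-":
            counts["deletion"] += 1
        elif r == "-":
            counts["insertion"] += 1
        elif r == q:
            counts["match"] += 1
        else:
            counts["substitution"] += 1
    columns = sum(counts.values())
    if columns == 0:
        raise ValueError("empty alignment")
    return {kind: n / columns for kind, n in counts.items()}


if __name__ == "__main__":
    print(error_profile("ACGT-A", "AC-TTA"))
```

The three error classes add up to the total error rate; for the typical passing 2D R7.3 read on the slide that is 2.8% + 2.7% + 1.7% = 7.2%.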
  • 42. Errors. Errors result from different parts of the system. On the ASIC: events are missed in the translation from raw data to event data. Solution: sharpen the raw data by tuning the voltage and by using new nanopores with lower noise; sequence faster. In the base caller: bases outside the observed k-mer influence the current. Solution: higher-order k-mer models. Modified bases are currently not included in the k-mer model, and modified k-mers differ from unmodified k-mers. Solution: add modified k-mers to the model.
  • 43. Throughput. Throughput is determined by: • the number of channels: 512 on the MinION • the translocation speed: 30 bases/sec • the occupancy of the pore: 90% • the time a flow cell can run: ~60 hrs. Currently this gives well over 1 Gb of events; on R7.3 this translates to ~400 Mb of 2D data. In “fast mode” the MinION will read 500 bases/sec; three MAP groups are currently testing this, and throughput will increase to ~20 Gb in events.
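The four throughput factors multiply into an idealized ceiling on the number of bases (events) a flow cell can produce. A minimal Python sketch; the function name is hypothetical, and the model deliberately ignores effects such as pores dropping out during the run:

```python
def yield_ceiling_bases(channels: int, bases_per_sec: float,
                        occupancy: float, run_hours: float) -> float:
    """Idealized yield: every occupied channel reads at full speed for the whole run."""
    return channels * bases_per_sec * occupancy * run_hours * 3600


if __name__ == "__main__":
    normal = yield_ceiling_bases(512, 30, 0.9, 60)   # ≈ 3.0e9 bases
    fast = yield_ceiling_bases(512, 500, 0.9, 60)    # ≈ 5.0e10 bases
    print(f"{normal / 1e9:.1f} Gb, fast mode {fast / 1e9:.1f} Gb")
```

The ceilings of ~3.0 Gb at 30 bases/sec and ~50 Gb in fast mode sit well above the observed >1 Gb and the projected ~20 Gb, so the simple product is best read as an upper bound rather than a prediction.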
  • 44. Longest 2D read: 93.5 Kbp. Longest template read: 120 Kbp (231 Kbp). Highest yield: 1.32 Gevents (R7). (Chart: template and 2D yield per run over the past year, in base pairs sequenced (Mbp, 0–350) for runs 1–30, on R6, R7 and R7.3 chemistry.)
  • 45. Untangle: long reads can help to resolve repeat areas in the assembly graph. (Figure labels: repeat; unique sequence in; unique sequence out.) The resulting contigs will then look like this (figure).
  • 46. Assembly and scaffolding strategy (task: software)
      1. Short read correction: Quake (not for small genomes)
      2. Short read assembly: Velvet
      3. MinION read alignment to Velvet contigs: LAST
      4. Link filtering and contig tiling: Untangle script
      5. Path detachment around repeats: Untangle script
      6. Bubble popping: Untangle script
      7. Deletion of unconfirmed connections: Untangle script
      8. Contig extraction: Untangle script
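The Untangle script itself is not shown in the slides. As a hypothetical illustration of step 4 (link filtering), the sketch below turns long-read-to-contig alignments into contig links, merges the two orientations of the same link and keeps only links supported by several reads. The input format, names and the `min_support` threshold are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Link = Tuple[str, str, str, str]  # (contig_a, strand_a, contig_b, strand_b)


def _flip(strand: str) -> str:
    return "-" if strand == "+" else "+"


def _canonical(a: str, sa: str, b: str, sb: str) -> Link:
    """A link seen as a+ -> b+ on one strand is b- -> a- on the other; keep one form."""
    return min((a, sa, b, sb), (b, _flip(sb), a, _flip(sa)))


def contig_links(alignments: Iterable[Tuple[str, int, str, str]],
                 min_support: int = 2) -> Dict[Link, int]:
    """Count contig-to-contig links implied by long reads and drop weak ones.

    alignments: (read_id, position_on_read, contig_id, strand) per aligned segment.
    Consecutive contigs along each read form a link.
    """
    by_read = defaultdict(list)
    for read_id, pos, contig, strand in alignments:
        by_read[read_id].append((pos, contig, strand))
    support: Counter = Counter()
    for hits in by_read.values():
        hits.sort()  # order the aligned segments along the read
        for (_, a, sa), (_, b, sb) in zip(hits, hits[1:]):
            if a != b:
                support[_canonical(a, sa, b, sb)] += 1
    return {link: n for link, n in support.items() if n >= min_support}


if __name__ == "__main__":
    alns = [
        ("r1", 0, "c1", "+"), ("r1", 5000, "c2", "+"),
        ("r2", 0, "c2", "-"), ("r2", 4000, "c1", "-"),  # same link, other strand
        ("r3", 0, "c1", "+"), ("r3", 3000, "c3", "+"),  # seen only once
    ]
    print(contig_links(alns))
```

Requiring support from more than one read is a simple guard against chimeric reads or spurious alignments into repeats, which would otherwise add false connections to the graph.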
  • 47. Agrobacterium NCPPB 1771 assembly graph: 271 nodes, 311 connections; 154 contigs, N50 = 198 Kb, sum = 5.87 Mb. Repeats in the graph: 25× transposon (1160 bp), 8× transposon (873 bp), 4× rRNA (6.4 Kb).
  • 48. MinION sequencing and scaffolding • Alignment: LAST with optimized settings • Links: alignment filtering and contig tiling • 7328 reads aligned to contigs • 438 reads aligned to multiple contigs • 585 links between contigs • 13158 reads on R6 and R7 chemistry • 73.8 Mb total yield (template and 2D) • 5–85970 nt length, typically ~12 Kb
  • 49. Links between nodes are specific. (Figure legend: a marked link means the link is confirmed by PCR.)
  • 50. Final assembly graph after scaffolding • 271 nodes + 312 connections → 49 nodes + 5 connections • 154 contigs → ~8 contigs • Complete chromosome 2 (1.2 Mb), pTi (190 Kb), cryptic megaplasmid (746 Kb) • Slight residual fragmentation of chromosome 1
  • 51. Software. Reads are in HDF5 format and contain all data from the event data onwards. A cloud-based base caller is provided by Oxford Nanopore Technologies. The MAP community is actively developing software to use this type of data. Some examples: • Jared Simpson’s pipeline to correct and assemble using only nanopore reads • Live monitoring, alignments and feedback to the MinION: Matt Loose’s minoTour • Squiggle-space aligners: each base is measured 5 times in consecutive k-mers, so it makes sense to avoid base calling and work directly with the events (squiggle space).
  • 52. London Calling 2015: highlights from Clive Brown’s talk • Improvements to the base caller. • Read until (and barcoding). • Fast mode on the MinION MkI (500 bp/sec instead of 30). • New 3000-channel ASIC with a “crumpet” chip design to separate the ASIC and fluidics parts. • MinION MkII and PromethION will have this new ASIC. • Library prep on beads to reduce the amount of DNA needed (lower ng to pg). • Direct RNA sequencing. • Simplified sample preparation and VolTRAX. • Pricing will be “pay as you go”: the initial payment for hardware includes some hours of sequencing. • MkI: $270 and 3 hrs of sequencing (~3 Gbp in fast mode).
  • 53. London Calling 2015: VolTRAX. Much emphasis was put on making library prep simpler and faster, so that the system can leave the lab. If the system leaves the lab, many more applications become possible.
  • 54. London Calling 2015: PromethION. The technology underlying the MinION system is scalable, so larger throughput can be made available relatively easily. The PromethION will use the new ASIC design and will have 144,000 channels. Projected throughput: 6.4 Tbp/day. This is too much data for cloud base calling, so base calling will be done locally. The access program will start later this year.
  • 55. Acknowledgments: Freek Vonk, Harald Kerkkamp, Asad Hyder, Michael Richardson, Christiaan Henkel, Paul Hooykaas, Ron Dirks, Guido van den Thillart, Herman Spaink, Pim Arntzen, Erwin Fakkert, Marten Boetzer, Walter Pirovano, Diana Uffink, R. Manjunatha Kini, Ken Kraaijeveld, Yavuz Ariyurek, Arnoud Schmitz, Yahya Anvar, Dan Turner, Oliver Hartwell