Adam M. Phillippy
Head, Genome Informatics Section
40 Years of Genome Assembly:
Are We Done Yet?
@aphillippy
[Timeline: 1980, 1995, 2001, 2010, 2012, 2014, 2020]
• Genome assembly’s 40th anniversary
• Rodger Staden (1979)
• “With modern fast sequencing techniques [1,2] and suitable computer programs it is now possible to sequence whole genomes without the need of restriction maps.”
Staden. A strategy of DNA sequencing employing computer programs. Nucleic Acids Research (1979)
• Shotgun assembly
• 1995: Haemophilus influenzae
• 1995: Overlap graphs
• 1995: de Bruijn graphs
• The first human genome
• 2000: Celera Assembler
• 2001: The human genome
• Shotgun sequencing era
Input → Extraction → Sequencing → Assembly → Output
• Long-read shotgun sequencing
• First complete de novo assemblies
• 2012: Bacteria (10⁶ bp)
[Figure: Class I and Class II bacterial genomes, including Yersinia pestis CO92 and Bacillus anthracis Ames]
• 2014: Yeast (10⁷ bp)
• 2014: Drosophila (10⁸ bp)
[Figure: Drosophila chromosome arms 2L, 2R, 3L, 3R, X]
• ????: Human (10⁹ bp)
Assembly is solved:
Sequence all the things!
VertebrateGenomesProject.org
• HQ Reference assemblies
• >1 Mb contig N50
• Scaffolds == chromosomes
• 99.99% average base quality
• Sequencing Technology
• Long reads: PacBio
• Linked reads: 10x Genomics
• Optical maps: BioNano
• Cross linking: Arima Hi-C
Vertebrate Genomes Project
Erich Jarvis, chairperson – worldwide consortium of universities, museums, zoos, etc.
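The contig N50 target above can be made concrete with a short calculation. This is a minimal sketch, not the VGP's own tooling: the function name `n50` and the optional `genome_size` argument (which turns the statistic into NG50) are illustrative choices.

```python
def n50(lengths, genome_size=None):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L cover at least
    half of the total assembly. If genome_size is given, half of the
    genome size is used as the target instead, which yields NG50.
    """
    total = genome_size if genome_size is not None else sum(lengths)
    target = total / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return 0  # the assembly covers less than half of the stated genome size


contigs = [2, 3, 4, 5, 6]               # toy contig lengths
contig_n50 = n50(contigs)               # half of 20 is reached at 6 + 5
contig_ng50 = n50(contigs, genome_size=30)
```

NG50 is the more comparable figure across assemblers because the denominator does not change when an assembly is inflated by duplicated sequence.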
• Orders: ~250
• Families: ~1,000
• Genera: ~10,000
• Species: ~60,000
• Related projects: G10K, B10K, Bat1K
VGP Assembly Working Group
VGP Assembly Pipeline
• PacBio: contigging + purging
• 10XG: scaffolding
• BioNano: scaffolding
• Hi-C: scaffolding
• Polishing
• Gap-filling & curation
• Final assembly
[Figure: primary and alternate assemblies across exons 1 to 3]
• vgp.github.io
• 86 species currently posted
• 24 with all four data types
The GenomeArk
Jennifer Vashon of Maine Department of Inland Fisheries and Wildlife, left, and
UMass lynx team coordinator, Tanya Lama, with an adult male lynx from northern
Maine whose DNA was used to create first-ever whole genome for the species.
The lynx has since been released to the wild. (MassWildlife photo / Bill Byrne)
VGP Phase 1: What did we learn?
• Iterative assembly process is not ideal
• Errors carry over and are hard to correct
• Data integration is hard
• Most tools built for a single technology
• Little reward for building complex, integrated systems
• Need to decentralize
• Open data, standard formats, modular frameworks
• Nobody* likes building infrastructure
Assembly is hard
• P(Asm|Data) ∝ P(Data|Asm)
• Read coverage
• Hi-C heatmaps
• k-mer recovery
• Comparative annotation
Assembly validation is critical
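One of the validation signals listed above, k-mer recovery, can be sketched as follows. This is a toy illustration under stated assumptions rather than any specific tool's method: the function names, the default `k=21`, and the `min_count` threshold for filtering likely sequencing errors are all hypothetical choices.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(seqs, k):
    """Count canonical k-mers across a collection of sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[canonical(seq[i:i + k])] += 1
    return counts


def kmer_recovery(reads, contigs, k=21, min_count=2):
    """Fraction of 'solid' read k-mers (seen >= min_count times) found in the assembly."""
    read_counts = count_kmers(reads, k)
    solid = {kmer for kmer, count in read_counts.items() if count >= min_count}
    if not solid:
        return 0.0
    assembled = set(count_kmers(contigs, k))
    return len(solid & assembled) / len(solid)


genome = "ACGGTCATTGCA"                       # toy genome
reads = [genome, genome]                      # two error-free "reads"
full = kmer_recovery(reads, [genome], k=5)
partial = kmer_recovery(reads, [genome[:7]], k=5)
```

A recovery well below 1.0 suggests sequence present in the reads but missing from the assembly. Real genomes need counting structures that scale far beyond a Python `Counter`.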
• Cannot map short reads to repeats
• Therefore, cannot effectively polish/assemble with short reads
• Long read assemblies more accurate in repeats (e.g. HLA, rRNA)
• PacBio can exceed 99.999% accuracy (QV50)
Long read polishing is essential
In some regions, short-read polishing can actually harm the assembly
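The accuracy and QV figures quoted above are related by the standard Phred scale, QV = -10 · log10(error rate). A small sketch of the conversion (the function names are illustrative):

```python
import math


def accuracy_to_qv(accuracy):
    """Convert per-base accuracy (0..1) to a Phred-scaled quality value."""
    return -10 * math.log10(1 - accuracy)


def qv_to_accuracy(qv):
    """Convert a Phred-scaled quality value back to per-base accuracy."""
    return 1 - 10 ** (-qv / 10)


qv_five_nines = accuracy_to_qv(0.99999)   # about QV50
qv_four_nines = accuracy_to_qv(0.9999)    # about QV40
acc_qv30 = qv_to_accuracy(30)             # about 99.9%
```

Each additional "nine" of accuracy adds 10 to the QV, so QV40 to QV50 is a tenfold reduction in errors.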
Oddballs
• Marmoset chimeras
• Zebra finch GRCs
• Platypus sex chrs (10!)
• Lamprey genome deletions
• Fish with spikes and stripes
Not all vertebrates are created equal
[Figure: contig N50 (Mb) versus repeat content (%)]
Mixed haplotypes can introduce indels
CGTTAAAGC
CGTTAAAGC
CGTTAAAGC
CGTTTAAGC
CGTTTAAGC
CGTTTAAAGC
CGTT-AAAGC
CGTT-AAAGC
CGTTTAA-GC
CGTTTAA-GC
P(sub) = 0.01
P(ins) = 0.12
P(del) = 0.02
P(mat) = 0.85
P(mat)^34 × P(sub)^2 = 3.983304e-07
P(mat)^36 × P(ins)^4 = 5.967691e-07 (higher likelihood)
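The two likelihoods above can be reproduced directly from the per-event probabilities on the slide. This is a minimal sketch that treats each alignment column as independent; the function name and argument names are illustrative.

```python
# Per-event probabilities as given on the slide.
P_SUB, P_INS, P_DEL, P_MAT = 0.01, 0.12, 0.02, 0.85


def alignment_likelihood(n_match, n_sub=0, n_ins=0, n_del=0):
    """Likelihood of an alignment, assuming independent per-column events."""
    return (P_MAT ** n_match) * (P_SUB ** n_sub) * (P_INS ** n_ins) * (P_DEL ** n_del)


# Explanation 1: the haplotype difference is modeled as substitutions.
lik_substitution = alignment_likelihood(n_match=34, n_sub=2)

# Explanation 2: the same reads are explained with indels instead.
lik_indel = alignment_likelihood(n_match=36, n_ins=4)

print(f"{lik_substitution:.6e}")
print(f"{lik_indel:.6e}")
```

Because indels are far more probable than substitutions under this error model, the indel explanation scores higher even though it needs four events instead of two. That is how mixed haplotypes can push a consensus caller toward a spurious indel.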
Heterozygosity can lead to false duplications
P: primary, A: alternate

                         Finch              Fish
                         FALCON-Unzip       FALCON-Unzip
                         P       A          P       A
Size (Gbp)               1.09    0.94       1.95    0.73
NG50 (Mbp)               3.0     0.6        2.6     0.02
BUSCO complete (%)       93.9    82.1       94.2    40.6
BUSCO duplicated (%)     5.0     3.3        20.8    3.4
                         1.2%               1.6%
Assemble the genomes
De novo assembly of haplotype-resolved genomes with trio binning.
Koren, Rhie, et al. Nature Biotechnology (2018)
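[Figure: trio binning. Dam × Sire → F1 cross. Parental k-mers partition the F1 reads into sire haplotype, dam haplotype, and unassigned, which are then assembled into a sire assembly and a dam assembly.]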
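The binning step in the figure can be sketched in a few lines. This is a simplified illustration, not TrioCanu's actual implementation: it ignores strand (no reverse complements), uses raw hit counts without any normalization, and the function names and the toy `k=4` are hypothetical.

```python
def kmer_set(seq, k):
    """All k-mers of a sequence (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def haplotype_specific_kmers(dam_seqs, sire_seqs, k):
    """K-mers seen in only one parent: (dam_only, sire_only)."""
    dam = set().union(*(kmer_set(s, k) for s in dam_seqs))
    sire = set().union(*(kmer_set(s, k) for s in sire_seqs))
    return dam - sire, sire - dam


def bin_read(read, dam_only, sire_only, k):
    """Assign an offspring read to the parent whose private k-mers it shares most."""
    read_kmers = kmer_set(read, k)
    dam_hits = len(read_kmers & dam_only)
    sire_hits = len(read_kmers & sire_only)
    if dam_hits > sire_hits:
        return "dam"
    if sire_hits > dam_hits:
        return "sire"
    return "unassigned"  # no parent-specific k-mers, or a tie


K = 4
dam_only, sire_only = haplotype_specific_kmers(
    ["AAAACCCCGGGG"], ["AAAATTTTGGGG"], K
)
bins = [bin_read(r, dam_only, sire_only, K) for r in ("ACCCCG", "ATTTTG", "AAAA")]
```

Reads in the "dam" and "sire" bins would then be assembled separately, so each assembly contains a single haplotype. Reads without parent-specific k-mers come from regions where the parents are identical.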
Correctly resolved alleles with TrioBinning
                         Finch                              Fish
                         FALCON-Unzip     TrioCanu          FALCON-Unzip     TrioCanu
Size (Gbp)               1.09    0.94     1.05    1.06      1.95    0.73     1.37    1.36
NG50 (Mbp)               3.0     0.6      3.6     4.0       2.6     0.02     2.6     2.1
BUSCO complete (%)       93.9    82.1     94.4    93.3      94.2    40.6     91.6    92.7
BUSCO duplicated (%)     5.0     3.3      1.4     1.3       20.8    3.4      3.5     3.4
                         1.2%                               1.6%
Esperanza: A nearly perfect diploid
125x PacBio coverage (~60x per haplotype), no Illumina polishing needed, TrioCanu haplotig NG50 70 Mbp, BUSCOs 94%
[Figure: Esperanza haplotype assemblies by chromosome (1 to 29, X): Dam (yak) and Sire (Highland)]
Can we finally finish the human genome?
• The human reference genome is incomplete
• 368 unresolved issues, 102 gaps
• Segmental duplications, satellites, rDNAs
• Centromeres, telomeres, heterochromatin
• These gaps contain important information
• Missing reference sequence leads to analysis artifacts
• Variation in these gaps is unexplored (e.g. rDNAs)
• We don’t know what we don’t know…
We need to finish the genome
Our target: CHM13hTERT
Cell line from Urvashi Surti, Pitt; SKY karyotype from Jennifer Gerton and Tamara Potapova, Stowers
N=46; XX
• Repeats are long, reads are short
• “If the overlap is of sufficient length to distinguish
it from being a repeat in the sequence the two
sequences must be contiguous.”
— Rodger Staden, 1979
What’s the problem?
• How long are the repeats?
• 7 kbp LINEs
• 1 Mbp+ rDNA arrays
• 1 Mbp+ centromere arrays
• 10 Mbp+ heterochromatin blocks
• Coverage and accuracy matter too
• 1,000X of 100 bp reads at 100% accuracy? NO
• 10X of 10,000,000 bp reads at 100% accuracy? YES
• 100X of 100,000 bp reads at 90% accuracy? MAYBE
How long do reads need to be, for human?
>50% of the genome
• Length at the expense of throughput
• Read lengths >1 Mbp possible
Ultra-long nanopore sequencing
Nanopore sequencing and assembly of a human genome with ultra-long reads.
Jain et al. Nature Biotechnology (2018)
• Prediction: 30x raw UL coverage == GRCh38
How much do we need?
• 30x Nanopore ultra-long: contig building
• 60x PacBio: polishing
• 50x 10x Genomics: polishing
• BioNano: structural validation
We need long reads. Lots of long reads
• Nanopore UL read length distribution is long tailed
It pays to go deep
• From May 1 – October 29, 2018
• 62 MinION/GridION flow cells
• 8.9M reads, 98 Gb, 1.6 Gb / cell
• N50 read length 76 kb
• 44 Gb in reads >100 kb
• Max read length 1.03 Mb
• Assembled with Canu
CHM13 sequencing
Now upwards of 90+ flow cells and counting…
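The run statistics above can be cross-checked with simple arithmetic. The read count, total yield, and flow cell count come from the slide; the 3.1 Gbp human genome size is an assumption added here for the coverage estimate.

```python
TOTAL_BASES = 98e9        # 98 Gb of sequence (from the slide)
TOTAL_READS = 8.9e6       # 8.9M reads (from the slide)
FLOW_CELLS = 62           # MinION/GridION flow cells (from the slide)
GENOME_SIZE = 3.1e9       # ASSUMPTION: approximate human genome size in bp

yield_per_cell = TOTAL_BASES / FLOW_CELLS        # bases per flow cell
mean_read_length = TOTAL_BASES / TOTAL_READS     # bp per read
coverage = TOTAL_BASES / GENOME_SIZE             # fold coverage

print(f"Yield per flow cell: {yield_per_cell / 1e9:.1f} Gb")
print(f"Mean read length:    {mean_read_length / 1e3:.1f} kb")
print(f"Coverage:            {coverage:.1f}x")
```

The mean read length (about 11 kb) is far below the 76 kb read N50, which reflects the long-tailed length distribution: most reads are short, but most bases sit in long reads.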
The human genome, 2001
ref28 NG50 contig 0.5 Mbp
The human genome, 2019
CHM13 NG50 contig 75 Mbp (70x PacBio + 35x UL ONT)
[Figure: Canu assembly by chromosome (1 to 22, X)]
The first complete assembly of a human chromosome
A complete X chromosome
ddPCR
• Unique structural variants from PacBio
• Unique k-mers confirmed by Duplex-Seq
Stitching across the X centromere
An assembly is a hypothesis
• Per-read error rates between 5–15%
• Latest Nanopore > PacBio
• Consensus accuracy >99.9%
• After Nanopore polishing: QV30
• After PacBio polishing: QV40
• BAC validation
• >85% of BACs at >99.8% identity
• vs. 54% for prior PacBio assembly
What about the error rate?
BAC analysis courtesy of Eichler lab @ UW
88.0 / 90.6 / 92.4
• ChrX GAGE gene locus
• 19 tandemly arrayed ~9.4 kb repeats
• Corrupted by mapping/polishing pipeline
Repeat collapse analysis
Mitchell Vollger @ UW
• Mappers prefer the “best” alignment
• Consensus can be of variable quality (patches)
• Best mapping not always the correct mapping
• Marker-based anchoring
• Increase number of secondary alignments returned
• Redefine mapping quality to measure single-copy k-mer agreement between read and assembly
Unique k-mer mapping
[Figure: read alignments before and after unique k-mer mapping]
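The idea of scoring a placement by single-copy k-mer agreement can be sketched as follows. This is a toy illustration under stated assumptions, not the pipeline used for CHM13: it handles the forward strand only, takes candidate start positions as given, and all function names are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """List all k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def single_copy_kmers(assembly, k):
    """K-mers that occur exactly once in the assembly (the 'markers')."""
    counts = Counter(kmers(assembly, k))
    return {kmer for kmer, count in counts.items() if count == 1}


def marker_agreement(read, assembly, start, markers, k):
    """Number of single-copy k-mers in the target window that the read also contains."""
    window = assembly[start:start + len(read)]
    window_markers = set(kmers(window, k)) & markers
    return len(window_markers & set(kmers(read, k)))


def best_placement(read, assembly, candidate_starts, k):
    """Pick the candidate start with the highest marker agreement."""
    markers = single_copy_kmers(assembly, k)
    return max(
        candidate_starts,
        key=lambda s: marker_agreement(read, assembly, s, markers, k),
    )


# Toy tandem array: three repeat copies, the middle one carrying a variant (A -> T).
assembly = "CC" + "GATTACA" + "GATTTCA" + "GATTACA" + "GG"
read = "GATTTCA"                       # read from the middle copy
placement = best_placement(read, assembly, [2, 9, 16], k=4)
```

Only the middle copy contains single-copy k-mers that the read shares, so the read is anchored there even though an edit-distance mapper would see the three copies as near-identical.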
Centromere array validation
Jennifer Gerton @ Stowers
Centromere array validation
Beth Sullivan @ Duke
1.8 Mb
0.7 Mb
0.3 Mb
It’s time to finish the human genome
• Almost!
• Have proven it’s possible for the X chromosome
• T2T assembly of all chrs within the next 2 years
• Challenges
• REPEATS, REPEATS, REPEATS
• Heterozygosity: diploids, polyploids, metagenomes
• Nanopore-only consensus quality
• Targeted long-read sequencing
Are we there yet?
• github.com/nanopore-wgs-consortium/chm13
• Draft whole-genome assemblies
• Nanopore ultra-long reads
• 10x Genomics reads
• BioNano DLS (WashU)
• PacBio (SRA)
• Coming soon:
• Arima Genomics Hi-C
• PacBio CCS
• Strand-Seq
All CHM13 data is openly released
NHGRI
• Sergey Koren
• Arang Rhie
• Jim Mullikin
• Alice Young
• Shelise Brooks
• Valerie Maduro
• Gerard Bouffard
• Sofia Barreira
• Andy Baxevanis
• Nancy Hansen
• Karen Miga, UCSC
• Jennifer Gerton, Stowers
• Tamara Potapova, Stowers
• Beth Sullivan, Duke
• Tina Graves Lindsay, WashU
• Ira Hall, WashU
• Valerie Schneider, NCBI
• Kerstin Howe, Sanger
• Jo Wood, Sanger
• Matt Loose, Nottingham
• Nick Loman, Birmingham
• Urvashi Surti, Pitt (ret.)
Acknowledgements
Evan Eichler, Mitchell Vollger, Glennis Logsdon, David Porubsky, Melanie Sorensen
It’s time to finish the human genome
Google “t2t consortium” – I’ll be hiring in the fall
The Telomere-to-Telomere (T2T) consortium is an
open, community-based effort to generate the
first complete assembly of a human genome.

40 Years of Genome Assembly: Are We Done Yet?

  • 1. 40 Years of Genome Assembly: Are We Done Yet? Adam M. Phillippy, Head, Genome Informatics Section (@aphillippy)
  • 2. • Genome assembly’s 40th anniversary • Rodger Staden (1979) • “With modern fast sequencing techniques [1,2] and suitable computer programs it is now possible to sequence whole genomes without the need of restriction maps.” Source: Staden, “A strategy of DNA sequencing employing computer programs,” Nucleic Acids Research (1979). [Timeline figure: 1980, 1995, 2001, 2010, 2012, 2014, 2020]
  • 3. • Shotgun assembly • 1995: Haemophilus influenzae • 1995: Overlap graphs • 1995: de Bruijn graphs
  • 4. • The first human genome • 2000: Celera Assembler • 2001: The human genome
  • 5. • Shotgun sequencing era [Figure: Input, Extraction, Sequencing, Assembly, Output]
  • 7. • First complete de novo assemblies • 2012: Bacteria (10^6 bp) [Figure: Class I and Class II genomes, with labels Yersinia pestis CO92, Bacillus anthracis Ames, and a truncated label “Esche O26:H”]
  • 8. • First complete de novo assemblies • 2012: Bacteria (10^6 bp) • 2014: Yeast (10^7 bp)
  • 9. • First complete de novo assemblies • 2012: Bacteria (10^6 bp) • 2014: Yeast (10^7 bp) • 2014: Drosophila (10^8 bp) [Figure: chromosome arms 3L, 3R, 2R, 2L, X]
  • 10. • First complete de novo assemblies • 2012: Bacteria (10^6 bp) • 2014: Yeast (10^7 bp) • 2014: Drosophila (10^8 bp) • ????: Human (10^9 bp)
  • 11. Assembly is solved: Sequence all the things!
  • 13. Vertebrate Genomes Project • Erich Jarvis, chairperson; worldwide consortium of universities, museums, zoos, etc. • HQ reference assemblies • >1 Mb contig N50 • Scaffolds == chromosomes • 99.99% average base quality • Sequencing technology • Long reads: PacBio • Linked reads: 10x Genomics • Optical maps: BioNano • Cross linking: Arima Hi-C [Figure: ~250 orders, ~1,000 families, ~10,000 genera, ~60,000 species; labels G10K, B10K, Bat1K]
  • 15. VGP Assembly Pipeline [Figure: stages labeled PacBio, 10XG, BioNano, and Hi-C, covering contigging + purging, scaffolding, polishing, gap-filling & curation, and final assembly; primary and alternate haplotypes shown across exons 1–3]
  • 16. The GenomeArk • vgp.github.io • 86 species currently posted • 24 with all four data types [Photo caption: Jennifer Vashon of the Maine Department of Inland Fisheries and Wildlife, left, and UMass lynx team coordinator Tanya Lama with an adult male lynx from northern Maine whose DNA was used to create the first-ever whole genome for the species. The lynx has since been released to the wild. (MassWildlife photo / Bill Byrne)]
  • 17. VGP Phase 1: What did we learn?
  • 18. Assembly is hard • Iterative assembly process is not ideal • Errors carry over and are hard to correct • Data integration is hard • Most tools built for a single technology • Little reward for building complex, integrated systems • Need to decentralize • Open data, standard formats, modular frameworks • Nobody* likes building infrastructure
  • 19. Assembly validation is critical • P(Asm|Data) ∝ P(Data|Asm) • Read coverage • Hi-C heatmaps • k-mer recovery • Comparative annotation
  • 20. Long read polishing is essential • Cannot map short reads to repeats • Therefore, cannot effectively polish/assemble with short reads • Long read assemblies more accurate in repeats (e.g. HLA, rRNA) • PacBio can exceed 99.999% accuracy (QV50) • In some regions, short-read polishing can actually harm the assembly
  • 21. Not all vertebrates are created equal • Oddballs • Marmoset chimeras • Zebra finch GRCs • Platypus sex chrs (10!) • Lamprey genome deletions • Fish with spikes and stripes [Plot axes: Contig N50 (Mb), Repeats (%)]
  • 22. Mixed haplotypes can introduce indels • Ungapped: CGTTAAAGC, CGTTAAAGC, CGTTAAAGC, CGTTTAAGC, CGTTTAAGC • Gapped against CGTTTAAAGC: CGTT-AAAGC, CGTT-AAAGC, CGTTTAA-GC, CGTTTAA-GC • P(sub) = 0.01, P(ins) = 0.12, P(del) = 0.02, P(mat) = 0.85 • P(mat)^34 * P(sub)^2 = 3.983304e-07 < P(mat)^36 * P(ins)^4 = 5.967691e-07
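The two likelihoods on slide 22 can be reproduced directly from the quoted per-base probabilities. The sketch below is only an illustration of that arithmetic: the function name `alignment_likelihood` is hypothetical, and the event counts (34 matches with 2 substitutions versus 36 matches with 4 insertions) are taken from the slide's exponents rather than computed from an alignment.

```python
# Per-base event probabilities as quoted on the slide.
P_MAT, P_SUB, P_INS, P_DEL = 0.85, 0.01, 0.12, 0.02


def alignment_likelihood(matches, subs=0, ins=0, dels=0):
    """Likelihood of a pileup, treating every aligned column as independent."""
    return (P_MAT ** matches) * (P_SUB ** subs) * (P_INS ** ins) * (P_DEL ** dels)


# Consensus without the extra base: the minority haplotype reads each carry a substitution.
sub_only = alignment_likelihood(matches=34, subs=2)

# Consensus with an extra base: every read aligns with one indel but no substitution.
with_indels = alignment_likelihood(matches=36, ins=4)

print(f"substitution model: {sub_only:.6e}")
print(f"indel model:        {with_indels:.6e}")
print("indel consensus preferred:", with_indels > sub_only)
```

Because indels are far more likely than substitutions under this error model, the consensus that mixes both haplotypes scores higher even though it matches neither haplotype.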
  • 23. Heterozygosity can lead to false duplications • FALCON-Unzip, columns: Finch P / Finch A / Fish P / Fish A • Size (Gbp): 1.09 / 0.94 / 1.95 / 0.73 • NG50 (Mbp): 3.0 / 0.6 / 2.6 / 0.02 • BUSCO (c): 93.9 / 82.1 / 94.2 / 40.6 • BUSCO (d): 5.0 / 3.3 / 20.8 / 3.4 [Figure labels: 1.2%, 1.6%]
  • 24. Assemble the genomes [Figure: Dam × Sire, F1 cross; parental k-mers; sire haplotype, dam haplotype, unassigned; sire assembly, dam assembly] Source: Koren, Rhie, et al., “De novo assembly of haplotype-resolved genomes with trio binning,” Nature Biotechnology (2018)
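The figure labels on slide 24 suggest the following idea: k-mers found in only one parent are used to sort the F1 reads into sire, dam, and unassigned bins before each bin is assembled separately. The toy sketch below illustrates that binning step only. It is not the TrioCanu implementation; the function names, the tiny sequences, and the raw-count decision rule are assumptions made for illustration, and the published method is more careful about how hits are weighed.

```python
def kmer_set(seq, k):
    """All distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(sire_seq, dam_seq, k):
    """k-mers seen in exactly one parent (hypothetical helper)."""
    sire, dam = kmer_set(sire_seq, k), kmer_set(dam_seq, k)
    return sire - dam, dam - sire


def bin_read(read, sire_only, dam_only, k):
    """Assign a read to the parent whose specific k-mers it carries more of."""
    read_kmers = kmer_set(read, k)
    sire_hits = len(read_kmers & sire_only)
    dam_hits = len(read_kmers & dam_only)
    if sire_hits > dam_hits:
        return "sire"
    if dam_hits > sire_hits:
        return "dam"
    return "unassigned"  # no informative k-mers, or a tie


# Toy parents differing by a single base (G vs. C at position 6).
SIRE = "ACGTACGTTAGC"
DAM = "ACGTACCTTAGC"
K = 4
sire_only, dam_only = parent_specific_kmers(SIRE, DAM, K)

bins = {read: bin_read(read, sire_only, dam_only, K)
        for read in ["TACGTTAG", "TACCTTAG", "ACGTAC"]}
print(bins)
```

Reads that cover the variant carry parent-specific k-mers and are binned; the read from the shared region has none and stays unassigned.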
  • 25. Correctly resolved alleles with TrioBinning • Columns per genome: FALCON-Unzip (two assemblies) / TrioCanu (two assemblies) • Size (Gbp): 1.09 / 0.94 / 1.05 / 1.06 and 1.95 / 0.73 / 1.37 / 1.36 • NG50 (Mbp): 3.0 / 0.6 / 3.6 / 4.0 and 2.6 / 0.02 / 2.6 / 2.1 • BUSCO (c): 93.9 / 82.1 / 94.4 / 93.3 and 94.2 / 40.6 / 91.6 / 92.7 • BUSCO (d): 5.0 / 3.3 / 1.4 / 1.3 and 20.8 / 3.4 / 3.5 / 3.4 [Figure labels: 1.2%, 1.6%]
  • 26. Esperanza: A nearly perfect diploid • 125x PacBio coverage (~60x per haplotype), no Illumina polishing needed, TrioCanu haplotig NG50 70 Mbp, BUSCOs 94% [Figure: chromosomes 1–29 and X for the dam (yak) and sire (Highland) haplotypes of Esperanza]
  • 27. Can we finally finish the human genome?
  • 28. We need to finish the genome • The human reference genome is incomplete • 368 unresolved issues, 102 gaps • Segmental duplications, satellites, rDNAs • Centromeres, telomeres, heterochromatin • These gaps contain important information • Missing reference sequence leads to analysis artifacts • Variation in these gaps is unexplored (e.g. rDNAs) • We don’t know what we don’t know…
  • 29. Our target: CHM13hTERT • Cell line from Urvashi Surti, Pitt; SKY karyotype from Jennifer Gerton and Tamara Potapova, Stowers • N=46; XX
  • 30. What’s the problem? • Repeats are long, reads are short • “If the overlap is of sufficient length to distinguish it from being a repeat in the sequence the two sequences must be contiguous.” (Rodger Staden, 1979)
  • 31. How long do reads need to be, for human? • How long are the repeats? • 7 kbp LINEs • 1 Mbp+ rDNA arrays • 1 Mbp+ centromere arrays • 10 Mbp+ heterochromatin blocks • Coverage and accuracy matter too • 1,000X of 100 bp reads at 100% accuracy? NO • 10X of 10,000,000 bp reads at 100% accuracy? YES • 100X of 100,000 bp reads at 90% accuracy? MAYBE [Figure label: >50% of the genome]
  • 32. Ultra-long nanopore sequencing • Length at the expense of throughput • Read lengths >1 Mbp possible. Source: Jain et al., “Nanopore sequencing and assembly of a human genome with ultra-long reads,” Nature Biotechnology (2018)
  • 33. How much do we need? • Prediction: 30x raw UL coverage == GRCh38. Source: Jain et al., “Nanopore sequencing and assembly of a human genome with ultra-long reads,” Nature Biotechnology (2018)
  • 34. We need long reads. Lots of long reads • 30x Nanopore ultra-long: contig building • 60x PacBio: polishing • 50x 10x Genomics: polishing • BioNano: structural validation
  • 35. It pays to go deep • Nanopore UL read length distribution is long-tailed [Figure label: repeat]
  • 36. CHM13 sequencing • From May 1 to October 29, 2018 • 62 MinION/GridION flow cells • 8.9M reads, 98 Gb, 1.6 Gb / cell • N50 read length 76 kb • 44 Gb in reads >100 kb • Max read length 1.03 Mb • Assembled with Canu • Now 90+ flow cells and counting…
  • 37. The human genome, 2001 • ref28 • NG50 contig 0.5 Mbp
  • 38. The human genome, 2019 • CHM13 • NG50 contig 75 Mbp (70x PacBio + 35x UL ONT) [Figure: Canu assembly, chromosomes 1–22 and X]
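The contig N50 and NG50 figures quoted throughout the talk follow the usual definitions: sort contigs from longest to shortest and report the length at which the running total first reaches half of the assembly size (N50) or half of the estimated genome size (NG50). A minimal sketch, with hypothetical function names and made-up contig lengths:

```python
def nx_length(lengths, target_total, fraction=0.5):
    """Length of the contig at which the running sum first reaches fraction * target_total."""
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * target_total:
            return length
    return 0  # the assembly never covers the requested fraction


def n50(lengths):
    """N50: the threshold is half of the assembly's own total length."""
    return nx_length(lengths, sum(lengths))


def ng50(lengths, genome_size):
    """NG50: the threshold is half of an externally estimated genome size."""
    return nx_length(lengths, genome_size)


contigs = [80, 70, 50, 40, 30, 20]  # toy lengths, arbitrary units
print(n50(contigs))         # running sum 80, 150 reaches 145
print(ng50(contigs, 400))   # running sum 80, 150, 200 reaches 200
print(ng50(contigs, 1000))  # assembly is shorter than half the genome
```

NG50 is the fairer statistic for comparing assemblies of the same genome, since an assembly cannot raise it by simply leaving sequence out.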
  • 39. The first complete assembly of a human chromosome
  • 40. A complete X chromosome [Figure label: ddPCR]
  • 41. Stitching across the X centromere • Unique structural variants from PacBio • Unique k-mers confirmed by Duplex-Seq
  • 42. An assembly is a hypothesis
  • 43. What about the error rate? • Per-read error rates between 5–15% • Latest Nanopore > PacBio • Consensus accuracy >99.9% • After Nanopore polishing: QV30 • After PacBio polishing: QV40 • BAC validation • >85% of BACs at >99.8% identity • vs. 54% for prior PacBio assembly • BAC analysis courtesy of Eichler lab @ UW [Figure values: 88.0 / 90.6 / 92.4]
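The QV figures on this slide are Phred-scaled error rates, QV = -10 · log10(error rate), so QV30 corresponds to 99.9% accuracy and QV40 to 99.99%. A small conversion sketch (the function names are illustrative):

```python
import math


def qv_to_accuracy(qv):
    """Phred-scaled quality value to per-base accuracy."""
    return 1.0 - 10 ** (-qv / 10)


def accuracy_to_qv(accuracy):
    """Per-base accuracy (strictly below 1.0) to Phred-scaled quality value."""
    return -10 * math.log10(1.0 - accuracy)


for qv in (30, 40, 50):
    print(f"QV{qv} -> {qv_to_accuracy(qv):.5%} accuracy")

# 99.999% accuracy, the PacBio figure quoted earlier in the talk, is QV50.
print(round(accuracy_to_qv(0.99999), 2))
```

Each step of 10 QV is a tenfold drop in errors: over a 3 Gbp genome, QV30 leaves on the order of 3 million erroneous bases and QV40 about 300,000.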
  • 44. Repeat collapse analysis • ChrX GAGE gene locus • 19 tandemly arrayed ~9.4 kb repeats • Corrupted by mapping/polishing pipeline • Mitchell Vollger @ UW
  • 45. Unique k-mer mapping • Mappers prefer the “best” alignment • Consensus can be of variable quality (patches) • Best mapping not always the correct mapping • Marker-based anchoring • Increase number of secondary alignments returned • Redefine mapping quality to measure single-copy k-mer agreement between read and assembly [Figure panels: Before, After]
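One way to read the last bullet on slide 45: among several candidate placements of a read, prefer the one whose single-copy assembly k-mers are best supported by the read, rather than the one with the best overall alignment score. The sketch below is a toy illustration of that scoring idea under strong simplifying assumptions (exact k-mer matching, no alignment, error-free read). It is not the pipeline used for CHM13, and all names in it are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """All k-mers of a sequence, in order, with repeats."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def single_copy_kmers(assembly, k):
    """k-mers that occur exactly once in the assembly; these act as markers."""
    counts = Counter(kmers(assembly, k))
    return {kmer for kmer, n in counts.items() if n == 1}


def marker_agreement(read, assembly, start, k, markers):
    """Fraction of marker k-mers in the assembly window that the read also contains."""
    window = assembly[start:start + len(read)]
    expected = [kmer for kmer in kmers(window, k) if kmer in markers]
    if not expected:
        return 0.0  # the window has no markers, so the placement is uninformative
    read_kmers = set(kmers(read, k))
    return sum(kmer in read_kmers for kmer in expected) / len(expected)


def best_placement(read, assembly, candidate_starts, k):
    """Pick the candidate start position with the highest marker agreement."""
    markers = single_copy_kmers(assembly, k)
    return max(candidate_starts,
               key=lambda s: marker_agreement(read, assembly, s, k, markers))


# Two repeat copies that differ at a single base (T vs. A at offset 4).
ASSEMBLY = "ACGTTGCAAC" + "ACGTAGCAAC"
READ = "ACGTAGCAAC"  # drawn from the second copy
K = 4
MARKERS = single_copy_kmers(ASSEMBLY, K)

scores = {s: marker_agreement(READ, ASSEMBLY, s, K, MARKERS) for s in (0, 10)}
print(scores)
print(best_placement(READ, ASSEMBLY, [0, 10], K))
```

Only the k-mers that overlap the distinguishing base are single-copy, so they alone decide the placement; the shared flanks, which dominate a conventional alignment score, are ignored.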
  • 47. Centromere array validation • Beth Sullivan @ Duke [Figure values: 1.8 Mb, 0.7 Mb, 0.3 Mb]
  • 48. It’s time to finish the human genome
  • 49. Are we there yet? • Almost! • Have proven it’s possible for the X chromosome • T2T assembly of all chrs within the next 2 years • Challenges • REPEATS, REPEATS, REPEATS • Heterozygosity: diploids, polyploids, metagenomes • Nanopore-only consensus quality • Targeted long-read sequencing
  • 50. All CHM13 data is openly released • github.com/nanopore-wgs-consortium/chm13 • Draft whole-genome assemblies • Nanopore ultra-long reads • 10x Genomics reads • BioNano DLS (WashU) • PacBio (SRA) • Coming soon: • Arima Genomics Hi-C • PacBio CCS • Strand-Seq
  • 51. Acknowledgements • NHGRI • Sergey Koren • Arang Rhie • Jim Mullikin • Alice Young • Shelise Brooks • Valerie Maduro • Gerard Bouffard • Sofia Barreira • Andy Baxevanis • Nancy Hansen • Karen Miga, UCSC • Jennifer Gerton, Stowers • Tamara Potapova, Stowers • Beth Sullivan, Duke • Tina Graves Lindsay, WashU • Ira Hall, WashU • Valerie Schneider, NCBI • Kerstin Howe, Sanger • Jo Wood, Sanger • Matt Loose, Nottingham • Nick Loman, Birmingham • Urvashi Surti, Pitt (ret.) • Evan Eichler, Mitchell Vollger, Glennis Logsdon, David Porubsky, Melanie Sorensen
  • 52. It’s time to finish the human genome • Google “t2t consortium” (I’ll be hiring in the fall) • The Telomere-to-Telomere (T2T) consortium is an open, community-based effort to generate the first complete assembly of a human genome.