Telomere-to-telomere assembly of a complete human X chromosome
Adam M. Phillippy
Head, Genome Informatics Section
AGBT – March 2, 2019
I have some troubling news…
• The human reference genome is incomplete
  • 368 unresolved issues, 102 gaps
  • Segmental duplications, rDNAs
  • Centromeres, telomeres, heterochromatin
• These gaps contain important information
  • Missing reference sequence leads to analysis artifacts
  • Variation in these gaps is unexplored (e.g. rDNAs)
  • We don’t know what we don’t know…
Let’s finish the human genome
Karen Miga (@khmiga) and Adam Phillippy (@aphillippy)
What’s the problem?
• Repeats are long, reads are short
• “If the overlap is of sufficient length to distinguish it from being a repeat in the sequence the two sequences must be contiguous.”
  • Rodger Staden, 1979, MRC Laboratory of Molecular Biology
Flashback to AGBT 2012
• The return of closed (bacterial) genomes
  • Bibersteinia trehalosi 192
How long do reads need to be, for human?
• How long are the repeats?
  • 7 kbp LINEs
  • 1 Mbp+ rDNA arrays
  • 1 Mbp+ centromere arrays
  • 10 Mbp+ heterochromatin blocks
• Coverage and accuracy matter too
  • 1,000X of 100 bp reads at 100% accuracy: NO
  • 10X of 10,000,000 bp reads at 100% accuracy: YES
  • 100X of 100,000 bp reads at 90% accuracy: MAYBE?
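The length side of this argument can be made concrete with a small Lander-Waterman-style calculation. This is an illustration, not part of the talk: it assumes error-free reads of one fixed length whose start positions are uniformly random (a Poisson process), treats the slide's "+" lengths as exact values, requires a hypothetical 1 kbp of unique flank on each side, and ignores accuracy entirely. The names `REPEATS`, `SCENARIOS`, and `p_spanned` are made up for this sketch.

```python
import math

# Repeat lengths from the slide, in bp (the "+" lower bounds are used as exact values).
REPEATS = {
    "LINE": 7_000,
    "rDNA array": 1_000_000,
    "centromere array": 1_000_000,
    "heterochromatin block": 10_000_000,
}

# (coverage, read length in bp) for the slide's three scenarios; accuracy is not modeled.
SCENARIOS = {
    "1,000X of 100 bp": (1_000, 100),
    "10X of 10 Mbp": (10, 10_000_000),
    "100X of 100 kbp": (100, 100_000),
}


def p_spanned(coverage: float, read_len: int, repeat_len: int, anchor: int = 0) -> float:
    """Probability that at least one read covers the whole repeat plus `anchor` bp
    of unique flanking sequence on each side.

    Assumes fixed-length, error-free reads whose start positions form a Poisson
    process with rate coverage / read_len per bp.
    """
    # A read spans the repeat only if it starts within this many positions.
    window = read_len - repeat_len - 2 * anchor
    if window <= 0:
        return 0.0  # no single read is long enough
    expected_spanning_reads = coverage * window / read_len
    return 1.0 - math.exp(-expected_spanning_reads)


if __name__ == "__main__":
    for name, (cov, read_len) in SCENARIOS.items():
        probs = {rep: round(p_spanned(cov, read_len, length, anchor=1_000), 3)
                 for rep, length in REPEATS.items()}
        print(name, probs)
```

Under these assumptions, 100 bp reads never span even a LINE, 100 kbp reads at 100X span LINEs essentially always but cannot cross a 1 Mbp array in one piece, and 10 Mbp reads cross the 1 Mbp arrays but not a block of exactly 10 Mbp (a boundary case of using the lower bound as the length). Anything longer than the reads has to be assembled from overlaps inside the repeat, which is where the accuracy the model leaves out becomes decisive, consistent with the slide's MAYBE.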
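Ultra-long nanopore sequencing
• ONT R9 pore: E. coli CsgG membrane protein
• Read lengths >1 Mbp possible
*Assuming 3.4 Å per bp, 1 Mbp = 3,400,000 Å (0.34 mm) = 40,000x the height of the pore
[Figure: the CsgG pore, labeled 120 Å and 85 Å; scaled so that the pore is 8 cm tall, a 1 Mbp molecule is 3.2 km long and passes through in 37 min]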
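The footnote's arithmetic and the scale analogy can be checked in a few lines. This sketch assumes the 85 Å label is the pore height (it is the value that reproduces the stated 40,000x) and introduces a translocation speed of 450 bases per second, which is not on the slide; it is a commonly quoted figure for R9-era chemistry and is used here only to show how the figure's "37 m" can correspond to about 37 minutes. All constant and function names are hypothetical.

```python
# Constants from the slide, plus one assumed speed that the slide does not state.
RISE_ANGSTROM_PER_BP = 3.4        # B-DNA rise per base pair (the slide's assumption)
PORE_HEIGHT_ANGSTROM = 85.0       # assumed to be the 85 Å dimension in the figure
ANGSTROM_PER_MM = 1e7
ASSUMED_BASES_PER_SECOND = 450.0  # hypothetical translocation speed, not from the slide


def stretched_length_mm(bp: int) -> float:
    """Contour length of a molecule of `bp` base pairs at 3.4 Å per bp, in mm."""
    return bp * RISE_ANGSTROM_PER_BP / ANGSTROM_PER_MM


def pore_heights(bp: int) -> float:
    """How many pore heights the stretched molecule spans."""
    return bp * RISE_ANGSTROM_PER_BP / PORE_HEIGHT_ANGSTROM


def scaled_length_km(bp: int, pore_height_cm: float) -> float:
    """Length of the molecule if the pore were magnified to `pore_height_cm` tall."""
    return pore_heights(bp) * pore_height_cm / 100_000  # 1 km = 100,000 cm


def minutes_to_sequence(bp: int) -> float:
    """Time for one read of `bp` bases at the assumed speed, in minutes."""
    return bp / ASSUMED_BASES_PER_SECOND / 60


if __name__ == "__main__":
    mbp = 1_000_000
    print(f"{stretched_length_mm(mbp):.2f} mm")               # 0.34 mm
    print(f"{pore_heights(mbp):,.0f} pore heights")           # 40,000 pore heights
    print(f"{scaled_length_km(mbp, 8):.1f} km at 8 cm/pore")  # 3.2 km at 8 cm/pore
    print(f"{minutes_to_sequence(mbp):.0f} min")              # 37 min
```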
LONG READ CLUB
Really very long reads indeed
Nick Loman (@pathogenomenick) and Matt Loose (@mattloose)
It’s time to finish the human genome
The Telomere-to-Telomere (T2T) consortium is an open, community-based effort to generate the first complete assembly of a human genome.
CHM13 cell line from Urvashi Surti, Pitt; SKY karyotype from Jennifer Gerton, Stowers (46,XX)
We need long reads. Lots of long reads.
• 30x Nanopore ultra-long
  • Contig building
• 60x PacBio
  • Polishing
• 50x 10x Genomics
  • Polishing
• BioNano
  • Structural validation
It pays to go deep
• The read length distribution of Nanopore ultra-long (UL) reads is long-tailed
CHM13 sequencing
• May 1 – October 29, 2018
• 62 MinION/GridION flow cells
• 8.9M reads, 98 Gb total, 1.6 Gb per flow cell
• N50 read length 76 kb
• 44 Gb in reads >100 kb
• Maximum read length 1.03 Mb
• Assembled with Canu
• 10x coverage of 100 kb reads at 90% accuracy
Now upwards of 90 flow cells and counting…
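The run statistics on this slide (yield per flow cell, N50, bases in reads over 100 kb) can be computed directly from a list of read lengths. The sketch below is hypothetical: the function names are illustrative, real read lengths would come from a FASTQ index or sequencing summary, and `genome_size` is whatever haploid size the caller assumes.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that reads of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty input")


def bases_longer_than(lengths: list[int], threshold: int) -> int:
    """Total bases in reads strictly longer than `threshold` (the slide's ">100 kb")."""
    return sum(length for length in lengths if length > threshold)


def summarize(lengths: list[int], flow_cells: int, genome_size: int) -> dict[str, float]:
    """Run-level summary in the style of the slide; `genome_size` is caller-assumed."""
    total = sum(lengths)
    over_100kb = bases_longer_than(lengths, 100_000)
    return {
        "reads": len(lengths),
        "total_gb": total / 1e9,
        "gb_per_flow_cell": total / flow_cells / 1e9,
        "n50": n50(lengths),
        "max_read": max(lengths),
        "gb_over_100kb": over_100kb / 1e9,
        "coverage_over_100kb": over_100kb / genome_size,
    }


if __name__ == "__main__":
    toy_reads = [120_000, 80_000, 40_000, 10_000, 5_000, 5_000]  # hypothetical lengths
    print(summarize(toy_reads, flow_cells=1, genome_size=1_000_000))
```

As a sanity check on the slide's own totals, 98 Gb over 62 flow cells is about 1.58 Gb per cell, and 98 Gb over 8.9M reads is a mean read length of roughly 11 kb, far below the 76 kb N50. That gap, with 44 of the 98 Gb in reads over 100 kb, is what a long-tailed length distribution looks like.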
The human genome, 2001
ref28 NG50 contig 0.5 Mbp
The human genome, 2019
CHM13 NG50 contig 75 Mbp (70x PacBio + 35x ultra-long ONT)
[Figure: CHM13 Canu contigs across chromosomes 1–22 and X]
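NG50 differs from N50 only in its denominator: it is the contig length at which contigs of that length or longer cover half of an assumed genome size, rather than half of the assembly's own total, so a smaller assembly cannot inflate it. A minimal sketch; the function name and the toy sizes below are illustrative, not CHM13 values.

```python
from typing import Optional


def ng50(contig_lengths: list[int], genome_size: int) -> Optional[int]:
    """Length L such that contigs of length >= L sum to at least half of `genome_size`.

    Passing the assembly's own total as `genome_size` gives the ordinary N50.
    Returns None if the assembly covers less than half of `genome_size`.
    """
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= genome_size:
            return length
    return None


if __name__ == "__main__":
    contigs_mbp = [50, 30, 10, 10]  # toy assembly, 100 Mbp in total
    print(ng50(contigs_mbp, sum(contigs_mbp)))  # N50 -> 50
    print(ng50(contigs_mbp, 150))               # NG50 against a 150 Mbp genome -> 30
    print(ng50(contigs_mbp, 250))               # assembly too small -> None
```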
The first complete assembly of a human chromosome
A complete X chromosome
[Figure: ddPCR]
Stitching across the X centromere
• Unique structural variants from PacBio
• Unique k-mers confirmed by Duplex-Seq
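One way to picture how single-copy markers let reads be placed inside an otherwise repetitive array is sketched below. It is a toy illustration, not the T2T pipeline: it finds k-mers that occur exactly once in a draft array sequence and uses exact matches to them to infer where a read starts. Real reads would also need reverse-complement handling and tolerance of sequencing errors, and markers would need independent confirmation (the slide cites Duplex-Seq for this); none of that is modeled. `unique_kmers` and `anchor_read` are hypothetical names.

```python
from collections import Counter


def unique_kmers(seq: str, k: int) -> dict[str, int]:
    """Map each k-mer that occurs exactly once in `seq` to its 0-based position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def anchor_read(read: str, markers: dict[str, int], k: int) -> list[int]:
    """Array offsets implied by each marker found in `read` (exact, forward-strand matches only)."""
    offsets = []
    for j in range(len(read) - k + 1):
        pos = markers.get(read[j:j + k])
        if pos is not None:
            offsets.append(pos - j)  # marker at array position pos, read position j
    return offsets


if __name__ == "__main__":
    # Toy "array": four copies of a 7 bp unit, one carrying a single-base variant (T -> C).
    array = "GATTACA" * 2 + "GATCACA" + "GATTACA"
    markers = unique_kmers(array, k=4)
    print(markers)                                  # {'GATC': 14, 'ATCA': 15, 'TCAC': 16, 'CACA': 17}
    print(anchor_read(array[12:24], markers, k=4))  # [12, 12, 12, 12]
    print(anchor_read("GATTACAG", markers, k=4))    # []
```

Agreement among several markers (here, four markers all implying offset 12) is what makes a placement credible; a read drawn entirely from unvaried copies contains no markers and cannot be placed uniquely.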
An assembly is a hypothesis
Anchored 100 kb+ centromere reads
Requires a careful measure of “mapping quality”
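Mapping quality is conventionally reported on the Phred scale, MAPQ = -10 log10 P(placement is wrong). The sketch below shows one common way to estimate that probability: compare the likelihood of the best placement against all candidate placements under a uniform prior. It is a generic illustration, not the specific measure used for the centromere reads; the function name `mapq` and the cap of 60 (mirroring common aligner practice) are assumptions.

```python
import math


def mapq(log10_likelihoods: list[float], cap: int = 60) -> int:
    """Phred-scaled mapping quality from the log10 likelihoods of every candidate placement.

    Assumes a uniform prior over placements, so the posterior of the best
    placement is its likelihood divided by the sum over all candidates.
    """
    if not log10_likelihoods:
        raise ValueError("need at least one candidate placement")
    best = max(log10_likelihoods)
    # Rescale relative to the best placement so the largest term is exactly 1.
    relative = [10.0 ** (x - best) for x in log10_likelihoods]
    p_wrong = 1.0 - 1.0 / sum(relative)
    if p_wrong <= 0.0:
        return cap  # a single candidate, or competitors too unlikely to register
    return min(cap, round(-10.0 * math.log10(p_wrong)))


if __name__ == "__main__":
    print(mapq([-10.0]))          # one placement -> 60
    print(mapq([-10.0, -12.0]))   # best is 100x more likely -> 20
    print(mapq([-10.0, -10.0]))   # two equally good copies -> 3
    print(mapq([-10.0] * 10))     # ten equally good array copies -> 0
```

The last case is the centromere problem in miniature: a read that fits many array copies equally well gets MAPQ 0, so only reads with decisive evidence for one placement should be trusted as anchors.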
Centromere array validation
[Figure: panels labeled 1.8 Mb, 0.7 Mb, and 0.3 Mb]
It’s time to finish the human genome
Are we there yet?
• Almost!
  • We have shown it is possible for the X chromosome
  • T2T assembly of all chromosomes within the next 2 years
• Remaining challenges
  • Satellite arrays, rDNA arrays, segmental duplications
  • Nanopore consensus quality
  • Targeted long-read sequencing
  • Better methods for phasing repeats and haplotypes
All our CHM13 data is openly released
• github.com/nanopore-wgs-consortium/chm13
  • Draft whole-genome assemblies
  • Nanopore ultra-long reads
  • 10x Genomics reads
  • BioNano DLS (WashU)
• PacBio (SRA)
• Coming soon:
  • Hi-C (Arima Genomics)
Acknowledgements
• NHGRI
  • Sergey Koren
  • Arang Rhie
  • Jim Mullikin
  • Alice Young
  • Shelise Brooks
  • Valerie Maduro
  • Gerard Bouffard
  • Sofia Barreira
  • Andy Baxevanis
  • Nancy Hansen
• Karen Miga, UCSC
• Jennifer Gerton, Stowers
• Tamara Potapova, Stowers
• Tina Graves Lindsay, WashU
• Ira Hall, WashU
• Valerie Schneider, NCBI
• Kerstin Howe, Sanger
• Jo Wood, Sanger
• Matt Loose, Nottingham
• Nick Loman, Birmingham
• Urvashi Surti, Pitt (ret.)
