New technologies, new management challenges for reference assemblies
Valerie Schneider, Ph.D.
@dnadiver
NCBI
13 February 2017
https://genomereference.org
Twitter: @GenomeRef
Announcements: grc-announce@ncbi.nlm.nih.gov
Evolution of Human Reference Assembly Management
• Reference assembly management
• Challenges of changing technologies, new resources
• De novo assembly assessment
Reference Assembly Management
Why do we need data management and assembly infrastructure?
Sanger-sequenced, clone-based assembly
• Clone: BAC insert in a BAC vector
• Shotgun sequence clone
• Assemble (draft contains gaps)
• Finish
Minimal Tiling Path
• Define switch points for adjacent clones (haploid mosaic)
• Most contiguous
• Highest sequence quality
Ordering the Path
• Fingerprint maps
• Genetic linkage maps
• Radiation hybrid maps
Component-Level Quality
• Shotgun sequence clone
• Assemble (draft contains gaps)
• Finish
• Finished: error rate <10⁻⁴
Additional Component Screening
• Contamination
• Chromosome Assignment
• Suitability (e.g. cancer)
[Figure labels: DRAFT, FIN]
Contig-Level Quality
Changing Technologies, New Resources
Declining clone usage
New technologies
New public WGS genomes
Cost
Time
Quality?
Clone-based assembly: 300 Kb interval, 1 contig
WGS-based assembly: 670 Kb interval, 6 contigs
GCA_001297185.1
Coverage: the likelihood a base is sequenced (Lander and Waterman (1988) Genomics)
Coverage  Not sequenced  Sequenced
1X        37%            63%
5X        0.6%           99.4%
10X       0.005%         99.995%
N50: measure of contiguity. Half of the assembly is in contigs this length or greater.
[Figure: N50 by assembly. Labels: HuRef; SOAPdenovo; NA12878 ALLPATHS; NA12878 MHAP; CHM1, Chaisson and Eichler (2015); AK1; HX1]
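The two quantities on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not code from the talk: it assumes the Poisson approximation usually attributed to Lander and Waterman, in which the expected fraction of bases left unsequenced at mean coverage c is e^-c, and it applies the N50 definition given above. The function names `fraction_sequenced` and `n50` are made up for the example, and the results are close to, not identical with, the rounded figures on the slide.

```python
import math


def fraction_sequenced(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read.

    Assumes reads land uniformly at random (Poisson model), so the
    probability that a base is missed at mean coverage c is exp(-c).
    """
    return 1.0 - math.exp(-coverage)


def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half
    of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    for c in (1, 5, 10):
        print(f"{c}X coverage: {fraction_sequenced(c):.3%} sequenced")
    # Toy contig lengths, purely illustrative
    print("N50:", n50([80, 70, 50, 40, 30, 20]))
```

Because N50 depends only on the sorted contig lengths, two assemblies with the same N50 can still differ widely in contig count and total length, which is why the assessment tables report those alongside it.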
• Variant analysis
• Annotation
• Clinical diagnostics
• Comparative genomics
• Transcriptomics

• Sample/population
• Assembly level (contig, scaffold, chromosome)
• Full/partial genome representation
• Sequencing method
• Assembly method
• Contiguity
• Coverage
• Segmental duplication/repeat representation
• Gene representation
• Diploid/haploid
De novo assembly assessment
CHM1 and CHM13 Assemblathon
• 46,XX haploid hydatidiform moles (U. Surti)
• How good is each of the assemblies?
• How suited are these assemblies for use in reference curation? (GRC)
Accession        Short Name  Sample  Submitter  Assembler    Data   Coverage (X)
GCA_001307025.1  CHM1_CA_P6  CHM1    Phillippy  CA 8.3rc2    P6     61
GCA_001297185.1  CHM1_FC_P6  CHM1    Chin       Falcon 0.3+  P6     61
GCA_000983465.1  CHM13_CA1   CHM13   Phillippy  CA 8.3rc2    P5+P6  70
GCA_001015355.1  CHM13_CA2   CHM13   Phillippy  CA 8.3rc2    P5+P6  70
GCA_000983475.1  CHM13_CA3   CHM13   Phillippy  CA 8.3rc2    P5+P6  70
GCA_001015385.3  CHM13_CA4   CHM13   Phillippy  CA 8.3rc2    P5+P6  70
GCA_000983455.2  CHM13_FC    CHM13   Chin       Falcon 0.4   P5+P6  70
http://www.biorxiv.org/content/early/2016/08/30/072116
Assembly Assessments
• General QA (NCBI)
• Assembly stats (length, contiguity)
• Annotation
• Assembly-assembly alignment to reference
• Comparison to BAC inserts
• BAC end placements (CHM1 only)
• BioNano map comparison (MGI)
• Illumina alignments (Phillippy, Li)
• Quality/Errors
• Coverage
• Paired end distribution
Resource              Sample
Illumina reads        CHM1, CHM13
BioNano map           CHM1, CHM13
BAC library           CHM1, CHM13
BAC library end seqs  CHM1
Fingerprint map       CHM1
CHM1
              GRCh38 primary    CHM1_FC_P6       CHM1_CA_P6
Accession     GCF_000001305.14  GCA_001297185.1  GCA_001307025.1
Total Length  3,099,734,149     2,996,426,293    2,939,630,703
Contig N50    56,413,054        26,899,841       20,609,304
Num Ctgs      1,385             3,641            4,850
QV (Koren)    ND                44.64            42.28

CHM13
              GRCh38 primary    CHM13_FC         CHM13_CA1        CHM13_CA2        CHM13_CA3        CHM13_CA4
Accession     GCF_000001305.14  GCA_000983455.2  GCA_000983465.1  GCA_001015355.1  GCA_000983475.1  GCA_001015385.1
Total Length  3,099,734,149     2,941,135,618    3,061,240,732    3,028,917,871    2,996,416,935    3,065,003,163
Contig N50    56,413,054        10,549,591       13,331,528       19,357,701       5,550,336        12,252,446
Num Ctgs      1,385             4,961            15,538           11,138           10,430           12,091
QV (Koren)    ND                43.00            41.21            39.94            42.89            41.36
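To put the QV rows on the same footing as the "error rate <10⁻⁴" standard for finished clone sequence, a conversion can be sketched as below. This is a sketch under an assumption the slides do not state: that QV here is a Phred-scaled consensus quality, QV = -10 · log10(error rate). Under that assumption the finished-sequence threshold of 10⁻⁴ corresponds to QV 40. The function names are hypothetical.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Per-base error probability for a Phred-scaled quality value
    (assumption: QV = -10 * log10(p))."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Phred-scaled quality value for a per-base error probability."""
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    # QV values copied from the tables above
    reported = {
        "CHM1_FC_P6": 44.64,
        "CHM1_CA_P6": 42.28,
        "CHM13_FC": 43.00,
        "CHM13_CA2": 39.94,
    }
    finished_qv = error_rate_to_qv(1e-4)  # finished-clone standard
    for name, qv in reported.items():
        bases_per_error = 1 / qv_to_error_rate(qv)
        verdict = "at or above" if qv >= finished_qv else "below"
        print(f"{name}: about 1 error per {bases_per_error:,.0f} bases, "
              f"{verdict} the finished threshold")
```

A genome-wide QV is an average, so an assembly that clears QV 40 overall can still have local regions well below the finished standard.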
[Figures: FRCbam Error Curves and FRCbam C-E Curves for the CHM1 assemblies and the CHM13 assemblies (Sergey Koren)]
RefSeq Transcript Analysis
Assembly    Accession         Not aligned (%)  Split alignment (%)  Coverage <95% (%)  Dropped coding transcripts  Dropped non-coding transcripts  Proteins with frameshifts*
GRCh38      GCA_000001405.15  22 (0.04%)       10 (0.02%)           17 (0.04%)         2                           0                               19
CHM1_CA_P6  GCA_001307025.1   117 (0.23%)      291 (0.23%)          426 (1.08%)        226                         160                             983
CHM1_FC_P6  GCA_001297185.1   65 (0.13%)       171 (0.34%)          234 (0.60%)        214                         167                             1012
CHM13_CA1   GCA_000983465.1   50 (0.10%)       345 (0.68%)          386 (0.98%)        274                         213                             503
CHM13_CA2   GCA_001015355.1   49 (0.10%)       320 (0.63%)          335 (0.85%)        272                         213                             439
CHM13_CA3   GCA_000983475.1   46 (0.09%)       616 (1.22%)          632 (1.61%)        240                         187                             627
CHM13_CA4   GCA_001015385.3   50 (0.10%)       400 (0.79%)          404 (1.03%)        259                         197                             450
CHM13_FC    GCA_000983455.2   94 (0.18%)       482 (0.96%)          568 (1.44%)        281                         202                             346
50,867 RefSeq transcripts were aligned to each assembly
*GRCh38 frameshifts exclude alternate loci
Credits
GRCh38 Collaborators
• NCBI RefSeq and gpipe annotation team
• Havana annotators
• Karen Miga
• David Schwartz
• Steve Goldstein
• Mario Caceres
• Giulio Genovese
• Jeff Kidd
• Peter Lansdorp
• Mark Hills
• David Page
• Jim Knight
• Stephan Schuster
• 1000 Genomes
GRC SAB
• Rick Myers
• Granger Sutton
• Evan Eichler
• Jim Kent
• Roderic Guigo
• Carol Bult
• Derek Stemple
• Jan Korbel
• Liz Worthey
• Matthew Hurles
• Richard Gibbs
Assemblathon Collaborators
• Jason Chin
• Adam Phillippy
• Sergey Koren
• Heng Li
GRC
• Tina Graves-Lindsay
• Karyn Meltz Steinberg
• Kerstin Howe
• Richard Durbin
• Paul Flicek
• Laura Clarke
• Deanna Church
• Curators!
• Developers!
[Figure panels: CHM1 Evaluation; CHM13 Evaluation]

AGBT2017 Reference Workshop: Schneider
