3. Evolution of Human Reference Assembly Management
• Reference assembly management
• Challenges of changing technologies, new resources
• De novo assembly assessment
4. Reference Assembly Management
Why do we need data management and assembly infrastructure?
9. Changing Technologies, New Resources
Declining clone usage
New technologies
New public WGS genomes
Cost
Time
Quality?
13. Lander and Waterman (1988) Genomics
The likelihood that a base is sequenced, by coverage:
Coverage  Not sequenced  Sequenced
1X        37%            63%
5X        0.6%           99.4%
10X       0.005%         99.995%
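The percentages above can be reproduced with a short sketch. It assumes the usual Poisson simplification of the Lander and Waterman model, in which reads start uniformly at random, so the chance that a base is covered by no read at mean coverage c is exp(-c). The function names are illustrative and do not come from the slides.

```python
import math


def fraction_unsequenced(coverage: float) -> float:
    """Expected fraction of bases covered by zero reads.

    Assumption: read starts are uniform and independent (Poisson model),
    so P(zero reads over a base) = exp(-coverage).
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return math.exp(-coverage)


def fraction_sequenced(coverage: float) -> float:
    """Expected fraction of bases covered by at least one read."""
    return 1.0 - fraction_unsequenced(coverage)


if __name__ == "__main__":
    # Same coverage levels as on the slide.
    for c in (1, 5, 10):
        missed = fraction_unsequenced(c)
        print(f"{c}X: {missed:.4%} not sequenced, {1 - missed:.4%} sequenced")
```

Under this model the unsequenced fraction comes out at roughly 37%, 0.7% and 0.005% for 1X, 5X and 10X, which is close to the rounded values on the slide. Real data deviate from it because coverage is not uniform across the genome.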
Changing Technologies, New Resources
[Figure: N50 across assemblies. Plot labels, in order: HuRef; SOAPdenovo; NA12878; ALLPATHS; NA12878; MHAP; CHM1; Chaisson and Eichler (2015); AK1; HX1]
N50: a measure of contiguity. Half of the assembly is in contigs of this length or greater.
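As a minimal sketch of that definition: sort the contigs from longest to shortest and accumulate lengths until at least half of the total assembly length is reached; the length of the contig that crosses that point is the N50. The function name and the example lengths are hypothetical, chosen only for illustration.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


if __name__ == "__main__":
    # Hypothetical contig lengths, total = 20; contigs 6 + 5 = 11 >= 10.
    print(n50([2, 3, 4, 5, 6]))  # 5
```

N50 describes contiguity only. A misassembly that joins two contigs incorrectly raises N50, so it says nothing about correctness on its own.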
16. De novo assembly assessment
CHM1 and CHM13 Assemblathon
• 46 XX haploid hydatidiform moles (U. Surti)
• How good is each of the assemblies?
• How suited are these assemblies for use in reference curation? (GRC)
Accession        Short Name  Sample  Submitter  Assembler    Data   Coverage (X)
GCA_001307025.1  CHM1_CA_P6  CHM1    Phillippy  CA 8.3rc2    P6     61
GCA_001297185.1  CHM1_FC_P6  CHM1    Chin       Falcon 0.3+  P6     61
GCA_000983465.1  CHM13_CA1   CHM13   Phillippy  CA 8.3rc2    P5+P6  70
GCA_001015355.1  CHM13_CA2   CHM13   Phillippy  CA 8.3rc2    P5+P6  70
GCA_000983475.1  CHM13_CA3   CHM13   Phillippy  CA 8.3rc2    P5+P6  70
GCA_001015385.3  CHM13_CA4   CHM13   Phillippy  CA 8.3rc2    P5+P6  70
GCA_000983455.2  CHM13_FC    CHM13   Chin       Falcon 0.4   P5+P6  70
http://www.biorxiv.org/content/early/2016/08/30/072116
17. Assembly Assessments
• General QA (NCBI)
• Assembly stats (length, contiguity)
• Annotation
• Assembly-assembly alignment to reference
• Comparison to BAC inserts
• BAC end placements (CHM1 only)
• BioNano map comparison (MGI)
• Illumina alignments (Phillippy, Li)
• Quality/Errors
• Coverage
• Paired end distribution
Resource                   Sample
Illumina reads             CHM1, CHM13
BioNano map                CHM1, CHM13
BAC library                CHM1, CHM13
BAC library end sequences  CHM1
Fingerprint map            CHM1
De novo assembly assessment
26. Credits
GRCh38 Collaborators
• NCBI RefSeq and gpipe annotation team
• Havana annotators
• Karen Miga
• David Schwartz
• Steve Goldstein
• Mario Caceres
• Giulio Genovese
• Jeff Kidd
• Peter Lansdorp
• Mark Hills
• David Page
• Jim Knight
• Stephan Schuster
• 1000 Genomes
GRC SAB
• Rick Myers
• Granger Sutton
• Evan Eichler
• Jim Kent
• Roderic Guigo
• Carol Bult
• Derek Stemple
• Jan Korbel
• Liz Worthey
• Matthew Hurles
• Richard Gibbs
Assemblathon Collaborators
• Jason Chin
• Adam Phillippy
• Sergey Koren
• Heng Li
GRC
Tina Graves-Lindsay
Karyn Meltz Steinberg
Kerstin Howe
Richard Durbin
Paul Flicek
Laura Clarke
Deanna Church
Curators!
Developers!