Giab agbt SVs_2019
1. Discovery: 498876 (296761 unique) calls >=50bp and 1157458 (521360 unique) calls >=20bp discovered in 30+ sequence-resolved callsets from 4 technologies for AJ Trio
2. Compare SVs: 128715 sequence-resolved SV calls >=50bp after clustering sequence changes within 20% edit distance in trio
3. Discovery Support: 30062 SVs with 2+ techs or 5+ callers predicting sequences <20% different or BioNano/Nabsys support in trio
4. Evaluate/genotype: 19748 SVs with consensus variant genotype from svviz in son
5. Filter complex: 12745 SVs not within 1kb of another SV
6. Regions: 11869 SVs inside 2.69 Gbp benchmark regions supported by diploid assembly
v0.6
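The "Discovery Support" and "Filter complex" rules above can be sketched in a few lines. This is a minimal illustration, not the GIAB pipeline itself: the record layout `(chromosome, start, end)`, the function names, and the definition of "within 1kb" as the gap between intervals on the same chromosome are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical minimal SV record: (chromosome, start, end)
SV = Tuple[str, int, int]


def has_discovery_support(n_techs: int, n_callers: int, optical_support: bool) -> bool:
    """Support rule as stated: 2+ technologies, or 5+ callers,
    or BioNano/Nabsys (optical/electronic mapping) support."""
    return n_techs >= 2 or n_callers >= 5 or optical_support


def isolated_svs(svs: List[SV], min_gap: int = 1000) -> List[SV]:
    """Keep SVs that are not within min_gap bp of another SV.

    Assumption: distance is the gap between the two intervals on the
    same chromosome, and overlapping intervals have a gap of 0.
    """
    kept = []
    for i, (chrom, start, end) in enumerate(svs):
        near_other = False
        for j, (chrom2, start2, end2) in enumerate(svs):
            if i == j or chrom != chrom2:
                continue
            gap = max(start2 - end, start - end2, 0)
            if gap <= min_gap:
                near_other = True
                break
        if not near_other:
            kept.append((chrom, start, end))
    return kept


# Made-up coordinates: the first two chr1 calls are 700 bp apart, so both are dropped.
example = [("chr1", 100, 200), ("chr1", 900, 950), ("chr1", 5000, 5100), ("chr2", 150, 250)]
print(isolated_svs(example))
```

The quadratic scan keeps the sketch short; sorting by position and comparing neighbours would be the usual choice for callsets of the size listed above.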
Introduction
A new benchmark for human germline structural variant calls
Justin Zook,1 Lesley Chapman,1 Nancy Hansen,3 Fritz J. Sedlazeck,4 Aaron Wenger,5 Adam English,6 Chunlin Xiao,7 John Oliver,8 Joyce Lee,9 Alex Hastie,9 Ian Fiddes,10 Alvaro Barrio,10 Tobias Marschall,11 Mark Chaisson,12 John Farrell,13 Andrew Carroll,14 Paul C. Boutros,15,16 Iman Hajirasouliha,17 Christopher E. Mason,17 Sayed Mohammad Ebrahim Sahraeian,18 Marc Salit,2 and many other members of the Genome in a Bottle Consortium
(1) National Institute of Standards and Technology; (2) Joint Initiative for Metrology in Biology; (3) NHGRI/NIH; (4) Baylor College of Medicine; (5) Pacific Biosciences; (6) Spiral Genetics; (7) NCBI/NIH; (8) Nabsys; (9) BioNano Genomics; (10) 10x Genomics; (11) Max Planck Institute; (12) University of Southern California; (13) Boston University Medical School; (14) Google; (15) University of California, Los Angeles; (16) Ontario Institute for Cancer Research; (17) Weill Cornell Medicine; (18) Roche Sequencing Solutions
• NIST has hosted the Genome in a Bottle Consortium to develop
authoritatively-characterized, human genome Reference Materials
that are an enduring resource for benchmarking variant calls
Integrating data to form benchmark calls
Ongoing and Future GIAB Work
• Using long & linked reads in difficult-to-map regions
• Improved benchmarks for homopolymers and long repeats
• Complex and clustered variants
• New collaborations to characterize difficult regions and
variants in these genomes are welcome! Email jzook@nist.gov
Crowd-sourced manual curation vs. benchmark set
Benchmark calls are strongly supported
Zook et al., Scientific Data, 2016.
ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data
Our benchmark sets are useful in evaluating multiple technologies
2012
• No human benchmark calls available
• GIAB Consortium formed
2014
• Small variant genotypes for ~77% of pilot genome NA12878
2015
• NIST releases first human genome Reference Material
2016
• 4 new genomes
• Small variants for ~90% of 7 genomes for GRCh37/38
2018
• Draft SV benchmark
• Difficult to map regions
2019+
• Characterizing difficult variants and regions
• Assembly benchmarks
• Cancer
Benchmark set and README at tinyurl.com/GIABSV06
• Goal: When comparing any callset to our vcf within the bed, most putative FPs and FNs should be errors in the tested callset
• We benchmarked several callsets from assembly-based and non-assembly-based methods with short and long reads.
• Upon manual curation, most FPs and FNs were errors in the tested callset
• Exception: FP insertions from pbsv, suggesting we may miss ~5% of true insertions
• Exception: One FP insertion from Bionano was correctly called as larger than in the benchmark
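As a rough illustration of the comparison described above, the FP and FN counts from comparing a callset to the benchmark vcf within the bed are usually summarised as precision and recall. The sketch below assumes the TP/FP/FN counts have already been produced by a comparison tool; the function name and the example counts are hypothetical and are not results from the poster.

```python
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict:
    """Summarise a benchmark comparison.

    tp: calls matching a benchmark SV inside the benchmark regions
    fp: calls inside the regions with no matching benchmark SV
    fn: benchmark SVs with no matching call
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only (not from the poster)
metrics = benchmark_metrics(tp=90, fp=10, fn=30)
print(metrics)
```

The goal stated above matters here: these numbers reflect the tested callset only to the extent that putative FPs and FNs are errors in that callset rather than in the benchmark.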
github.com/nspies/svviz2
svcurator.com
[Figure panels: 50 to 1000 bp; 1kbp to 10kbp, with Alu and LINE labeled]
• Candidates examined by 11 curators on average
• 627/635 consensus manual curations agreed with v0.6 genotype in benchmark regions
• Most “discordant” sites related to inclusion of 20-49bp indels in curation
github.com/spiralgenetics/truvari
Short reads
• Illumina
• Complete Genomics
Long reads
• PacBio (raw and CCS)
• Oxford Nanopore
Linked reads
• 10x Genomics
• 6kb Mate-pair
Optical/electronic mapping
• BioNano
• Nabsys
Public GIAB Data
Short reads have limitations for large insertions and SVs in tandem repeats
[Figure axes: Log10(BioNano Size) vs. Log10(Benchmark Size)]
Father    0/0   0/0   0/0   0/1   0/1   0/1   1/1   1/1   1/1
Mother    0/0   0/1   1/1   0/0   0/1   1/1   0/0   0/1   1/1
Son 0/1    14  1185   417  1143  1119   462   416   522    12
Son 1/1     0     0     0     0   449   444     2   431  2748
Trio Mendelian genotype violation rate: 28/9392 = 0.3%
(Excludes X/Y and sites with no GT in a parent)
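The violation count can be reproduced from the genotype table with a simple Mendelian consistency check: a child genotype is consistent if one allele can come from the father and the other from the mother. The sketch below is an illustration under that assumption, with hypothetical names; the denominator 9392 is taken as printed on the poster rather than recomputed from the cells.

```python
# Parent genotype combinations, in the column order of the table
FATHER = ["0/0", "0/0", "0/0", "0/1", "0/1", "0/1", "1/1", "1/1", "1/1"]
MOTHER = ["0/0", "0/1", "1/1", "0/0", "0/1", "1/1", "0/0", "0/1", "1/1"]

# Site counts per son genotype, one value per parent combination
SON_COUNTS = {
    "0/1": [14, 1185, 417, 1143, 1119, 462, 416, 522, 12],
    "1/1": [0, 0, 0, 0, 449, 444, 2, 431, 2748],
}

STATED_TOTAL = 9392  # denominator as printed on the poster


def is_mendelian(child: str, father: str, mother: str) -> bool:
    """True if one child allele can come from each parent."""
    a, b = child.split("/")
    f = set(father.split("/"))
    m = set(mother.split("/"))
    return (a in f and b in m) or (a in m and b in f)


def count_violations() -> int:
    """Sum the site counts in cells that break Mendelian inheritance."""
    total = 0
    for son_gt, counts in SON_COUNTS.items():
        for f_gt, m_gt, n in zip(FATHER, MOTHER, counts):
            if not is_mendelian(son_gt, f_gt, m_gt):
                total += n
    return total


violations = count_violations()
rate = violations / STATED_TOTAL
print(f"{violations}/{STATED_TOTAL} = {rate:.1%}")  # 28/9392 = 0.3%
```

The 28 violations come from three cells: son 0/1 with both parents 0/0 (14), son 0/1 with both parents 1/1 (12), and son 1/1 with father 1/1 and mother 0/0 (2).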
[Figure panels: Support from long reads; Support from short reads; Support from optical mapping]
SV discovery and genotyping methods have different strengths and weaknesses
More methods discover SVs that are deletions, not in tandem repeats, and smaller insertions
github.com/nhansen/SVanalyzer
[Figure axis: Fraction of reads supporting SV, shown for Het and Hom genotypes]