VarMatch: robust matching of small variant datasets using flexible scoring schemes
Chen Sun, Paul Medvedev
Penn State

Variant Matching
• Different pipelines tend to report variants in different representations
• Need to compare VCF files
  • Evaluate variant callers
  • Find overlap as high-confidence variants
  • Add variants into a database
• Two variant sets are equivalent if applying them separately to the reference genome results in the same donor genome.
• Variant Matching Problem: given two call sets, identify the largest equivalent subsets.

The Variant Matching problem
Seq:    A G C C G G
1  REF: G C C G    ALT: C C G A
2  REF: G C G      ALT: C G A
3  REF: A G G      ALT: A G A
Donor:  A C C G A G
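The equivalence definition can be checked directly on this example by applying each representation to the reference and comparing the resulting donor sequences. The following is a minimal haploid sketch, not VarMatch code. The function names, the 0-based coordinates, and the reading of representation 3 as two VCF-style records (AG to A, then G to GA) are assumptions, since the slide's column alignment is not preserved here.

```python
def apply_variants(reference, variants):
    """Apply non-overlapping (pos, ref, alt) variants to a reference string.

    Positions are 0-based; each REF allele must match the reference at pos.
    """
    donor = []
    cursor = 0
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError("overlapping variants")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError("REF allele does not match the reference")
        donor.append(reference[cursor:pos])  # unchanged bases before the variant
        donor.append(alt)                    # substituted allele
        cursor = pos + len(ref)
    donor.append(reference[cursor:])         # unchanged tail
    return "".join(donor)


def are_equivalent(reference, set_a, set_b):
    """Two variant sets are equivalent if they produce the same donor."""
    return apply_variants(reference, set_a) == apply_variants(reference, set_b)


reference = "AGCCGG"
# Assumed coordinates for representation 1 (one complex variant).
representation_1 = [(1, "GCCG", "CCGA")]
# Assumed coordinates for representation 3 (a deletion plus an insertion).
representation_3 = [(0, "AG", "A"), (4, "G", "GA")]

donor_1 = apply_variants(reference, representation_1)
donor_3 = apply_variants(reference, representation_3)
same_donor = are_equivalent(reference, representation_1, representation_3)
```

Under these assumed coordinates both representations spell out the donor sequence shown on the slide, even though no single record in one set has the same position and alleles as a record in the other.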
• Naïve approach
  • Match two variants if their locations and alleles are exactly the same
• Normalization (Tan et al 15)
  • Guaranteed to match equivalent singletons
• Complex variants
  • One variant matches multiple variants
  • Multiple variants match multiple variants
• Decomposition (Li 14, Zook et al 14)
  • Creates fractional matches
  • Does not always work (see the example above)
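Normalization in the sense of Tan et al. can be sketched as trimming shared trailing bases and extending to the left until the record is left-aligned and parsimonious. The sketch below is a simplified single-ALT version written from the published description, not code from vt. The function name, the 0-based coordinates, and the toy reference are assumptions.

```python
def normalize(reference, pos, ref, alt):
    """Left-align and trim one (pos, ref, alt) record; pos is 0-based."""
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if not ref or not alt:
            if pos == 0:
                raise ValueError("cannot extend left of the reference start")
            pos -= 1
            ref = reference[pos] + ref
            alt = reference[pos] + alt
            changed = True
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference with a CA repeat; both records delete one CA unit.
toy_reference = "TGCACACT"
right_shifted = normalize(toy_reference, 3, "ACA", "A")
left_aligned = normalize(toy_reference, 1, "GCA", "G")
```

Both records normalize to the same tuple, which is why exact comparison after normalization matches equivalent single variants. It does not help when one variant corresponds to several variants in the other call set.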

VarMatch Algorithm Overview
• Separator on reference genome sequence
  • Variants on the left cannot be equivalent to variants on the right
• Linear scan of reference genome to identify separators
• Solve independent small problems
• Branch and bound method for each small problem
  • Similar algorithm to Cleary et al., 2015
  • Problem size is small
  • Requires less memory and time
• Theorem for identifying separators
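The divide step can be illustrated with a single linear pass that cuts the sorted variants into independent clusters. The slides do not state the separator theorem, so the sketch below swaps in a plain gap heuristic as a stand-in: any stretch of at least `min_gap` variant-free reference bases is treated as a separator. This is not VarMatch's criterion, and `split_into_clusters` and `min_gap` are hypothetical names.

```python
def split_into_clusters(variants, min_gap):
    """Group (pos, ref, alt) variants from both call sets into clusters.

    Stand-in rule (an assumption, not the VarMatch theorem): start a new
    cluster when a variant begins at least min_gap bases after the end of
    every variant seen so far.
    """
    clusters = []
    current = []
    current_end = 0
    for variant in sorted(variants):
        pos, ref, _alt = variant
        if current and pos - current_end >= min_gap:
            clusters.append(current)
            current = []
        current.append(variant)
        current_end = max(current_end, pos + len(ref))
    if current:
        clusters.append(current)
    return clusters


pooled_calls = [
    (52, "C", "G"),
    (1, "G", ""),
    (4, "G", "GA"),
    (50, "A", "T"),
]
clusters = split_into_clusters(pooled_calls, min_gap=10)
```

Each cluster can then be handed to an exact search on its own, which is what keeps the individual problems small.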
Software: https://github.com/medvedevgroup/varmatch
Preprint: VarMatch: robust matching of small variant datasets using flexible scoring schemes (bioRxiv)

VarMatch supports flexible scoring schemes
• Maximize the total number of matched variants, or just those in the baseline?
• Maximize the number of calls or the total edit distance?
  • e.g. one call changing 10 bases vs. 10 calls each changing 1 base.
• Require genotypes to match, or just detect that a variant is present?
Others possible?
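The difference between counting calls and summing edit distance can be made concrete with two small scoring functions. This is an illustrative sketch, not the scoring code used by VarMatch. The function names and the toy call sets are assumptions, and edit distance is taken here to be the Levenshtein distance between the REF and ALT alleles.

```python
def edit_distance(a, b):
    """Levenshtein distance between two allele strings."""
    previous = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        current = [i]
        for j, base_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (base_a != base_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def score_by_count(matched):
    """Score a matched set by the number of calls it contains."""
    return len(matched)


def score_by_edit_distance(matched):
    """Score a matched set by the total number of edited bases."""
    return sum(edit_distance(ref, alt) for _pos, ref, alt in matched)


# One call changing 10 bases vs. 10 calls each changing 1 base.
one_large_call = [(100, "A" * 10, "C" * 10)]
ten_small_calls = [(200 + 2 * i, "A", "C") for i in range(10)]

count_scores = (score_by_count(one_large_call), score_by_count(ten_small_calls))
edit_scores = (score_by_edit_distance(one_large_call),
               score_by_edit_distance(ten_small_calls))
```

The two candidate matchings tie under the edit-distance score but differ tenfold under the call-count score, so the choice of scheme can change which matching is considered largest.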

Benchmark

Number of Matched Variants
                 CHM1 + bowtie (Li 14)       NA12878 + bowtie (Li 14)
                 Freebayes    GATK-HC        Freebayes    GATK-UG
Vt normalize     2,778,372    2,778,372      4,092,161    4,092,161
RTG Tools        2,843,396    2,912,641      4,197,070    4,321,997
VarMatch         2,843,396    2,912,641      4,197,138    4,322,083

Memory and Running Time Evaluation
                 RAM (GB)     Time (s)
RTG Tools        48           456
VarMatch         5            302
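The relative differences in the benchmark can be worked out from the reported figures. The short calculation below only restates numbers from the tables on this slide. The variable names are illustrative, and the RAM unit is assumed to be gigabytes.

```python
# Reported resource usage (RAM assumed to be in gigabytes, time in seconds).
rtg_tools = {"ram": 48, "time_s": 456}
varmatch = {"ram": 5, "time_s": 302}

memory_ratio = rtg_tools["ram"] / varmatch["ram"]       # RTG Tools RAM / VarMatch RAM
time_ratio = rtg_tools["time_s"] / varmatch["time_s"]   # RTG Tools time / VarMatch time

# Additional matched variants found by VarMatch on NA12878 relative to RTG Tools.
extra_freebayes = 4_197_138 - 4_197_070
extra_gatk_ug = 4_322_083 - 4_321_997

# Additional matched Freebayes variants on CHM1 relative to Vt normalize.
extra_over_normalization = 2_843_396 - 2_778_372
```

On these figures VarMatch uses roughly a tenth of the memory and about two thirds of the running time, while the matched counts are identical to RTG Tools on CHM1 and slightly higher on NA12878.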

Matching in low-complexity regions
• Comparison of (1) BWA+FreeBayes and (2) Bowtie2+Platypus NA12878 callsets (Li 14)
• Using Bowtie2+GATK as baseline
• Focus on low-complexity regions
• 12% more equivalent variants identified using VarMatch than using normalization
[Figure panels: Results of Vt-normalize; Results of VarMatch]

Matching in dense regions
• Comparison of Freebayes vs. Platypus NA12878 callsets (Li 14)
• Using GIAB Gold Standard (Zook et al 14) as baseline
• Focus on "dense regions"
  • 10-base regions that contain an INDEL and another variant
• Genome-wide assessment differs from that in dense regions

Number of Matched Variants in Baseline
                 Freebayes    Platypus
genome wide      2,896,841    2,891,849
dense regions    24,188       24,522
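One way to flag variants in dense regions is to look for pairs of nearby variants where at least one is an indel. The slide does not say how the 10-base windows are placed, so the sketch below reads the definition as "two variants whose start positions are fewer than 10 bases apart, at least one of them an INDEL". That reading, the function names, and the toy calls are assumptions.

```python
def is_indel(variant):
    """A variant is an indel if its REF and ALT alleles differ in length."""
    _pos, ref, alt = variant
    return len(ref) != len(alt)


def variants_in_dense_regions(variants, window=10):
    """Return variants that share a window with another variant,
    where at least one variant of the pair is an indel."""
    ordered = sorted(variants)
    dense = set()
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second[0] - first[0] >= window:
                break  # later variants are even farther away
            if is_indel(first) or is_indel(second):
                dense.add(first)
                dense.add(second)
    return sorted(dense)


toy_calls = [
    (100, "A", "AT"),  # insertion
    (105, "C", "G"),   # SNP near the insertion
    (300, "G", "T"),   # SNP
    (304, "A", "C"),   # SNP near another SNP, no indel involved
    (500, "AC", "A"),  # isolated deletion
]
dense_calls = variants_in_dense_regions(toy_calls)
```

Only the insertion and its neighboring SNP are flagged in this toy set. Restricting an assessment to such variants is what produces the separate "dense regions" row in the table.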

Conclusion
• Software: https://github.com/medvedevgroup/varmatch
• Manuscript: VarMatch: robust matching of small variant datasets using flexible scoring schemes (bioRxiv)

Supplementary

VarMatch Highlights
• Uses less memory and running time
• Better performance matching complex variants
• Better performance in low-complexity regions
• Better performance in dense regions
• Flexible scoring schemes
GIAB Sep2016 Lightning chen sun varmatch
