Jan2015 using the pilot genome rm for clinical validation steve lincoln

2/9/2015 © 2013-2014 Invitae Corporation. All Rights Reserved | CONFIDENTIAL
Using the Genome in a Bottle (GIAB) pilot
reference material: Its strengths and
limitations for analytic validation of a
diagnostic panel test
Stephen E. Lincoln
Invitae

• Diagnostic tests are ordered in response to a medical
question that needs an answer in order to make a specific
decision for a specific patient
− Can be time critical; Decisions may not be reversible
• Our Job: Provide a highly accurate answer to the question
asked in the time needed
− A complete answer is highly valued
o No matter how challenging (with some limits)
− Extra information is not valued (in most cases)
• Rigorous validation required by CLIA, CAP, the medical
community and payers
− Focus on Analytic (not Clinical) validation here…
Genetic Diagnostic Tests ≠ Research

29 Gene Hereditary Cancer Panel
Sub-Panel | Genes | Total (cumulative) | Gene names
BRCA1/2 | 2 | 2 | BRCA1, BRCA2
Other High-Risk Breast/Ovarian | 4 | 6 | CDH1, PTEN, STK11, TP53
Moderate-Risk Breast/Ovarian | 6 | 12 | ATM, BRIP1, CHEK2, NBN, PALB2, RAD51C
Lynch Syndrome | 5 | 17 | EPCAM, MLH1, MSH2, MSH6, PMS2
Other Hereditary Cancer Syndromes | 11 | 28 | APC, BMPR1A, SMAD4, CDK4, CDKN2A, PALLD, MET, MEN1, RET, PTCH1, VHL
MUTYH | 1 | 29 | MUTYH

1. Multiple Enrichment Methods
− No one technology delivers adequate coverage of all 29 genes
2. Copy number and other structural variants play a
significant role in addition to sequence variants
− CNVs as small as one exon
− Alu insertions
− Tandem duplications
3. Of these 29 genes, a number are “hard”
− PMS2 (last 4 exons) and CHEK2 have pseudogenes
− SMAD4 also does, in some people
− MSH2 has a large intronic homopolymer-A immediately next to
a canonical splice site (known to harbor pathogenic mutations)
− CDKN2A has a low complexity 80% GC tandem duplication at
the 5’ Met (also known to harbor pathogenic mutations)
Technical Requirements For These 29 Genes

Study Population
Group | N | Description | Previous Testing
Prospective Clinical | 735 | Prospectively accrued clinical cases | Clinical testing for BRCA1/2, occasionally other genes (depending on case)
High-Risk Clinical (total 327) | 209 | Retrospective cases from a clinical biobank generally containing higher-risk individuals |
High-Risk Clinical (total 327) | 118 | Cases referred due to known pathogenic variant in family | Clinical single-site testing
Reference Samples | 36 | Reference samples from public biobanks (Coriell, NIBSC) | Samples carry known pathogenic variants
Well-Characterized Genomes (WCGs) | 7 | Reference samples from public biobanks with high-quality whole genome sequencing (WGS) data | Variants in 29 cancer genes extracted from WGS data; most of these are benign
Total | 1105 | Of which 1062 are clinical cases (735 + 327) |

7 Well Characterized Genomes (WCGs) Used
[Figure: two pedigree diagrams; ✔ marks on the original slide indicate the 7 genomes used]
CEPH/Utah Pedigree 1463: NA12877, NA12878, NA12879, NA12880, NA12881, NA12882, NA12883, NA12884, NA12885, NA12886, NA12887, NA12888, NA12889, NA12890, NA12891, NA12892, NA12893
Yoruba Family Y117: NA19238, NA19239, NA19240
Geoff Nilsen
Integrated Complete Genomics, Illumina Platinum and other data sets
Mendelian scrub (leveraging data from family members not used in this study)
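
The slides do not say how the Mendelian scrub was implemented. As a rough illustration of the underlying idea only, the sketch below checks whether a child's genotype at one site can be formed from one allele of each parent; the function name and the tuple representation are assumptions made for this example, and it ignores de novo mutations, missing calls, and sex chromosomes.

```python
from itertools import product

def mendelian_consistent(child, mother, father):
    """Return True if the child's unordered genotype can be built from
    one maternal allele and one paternal allele (autosomal, diploid)."""
    target = sorted(child)
    return any(sorted((m, f)) == target for m, f in product(mother, father))

# A call that fails this check in a trio is a candidate error to review or scrub.
ok = mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G"))
bad = mendelian_consistent(("G", "G"), ("A", "A"), ("A", "G"))
print(ok, bad)
```

A check like this only flags sites where at least one of the three calls must be wrong; it cannot say which one, which is presumably why data from additional family members helps.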

1. CLIA Validation and Performance Study (pre-GIAB):
• Integrated CG and Illumina Platinum data
• Compared scrubbed data against our Dx test data
2. Later reconciled NA12878 data against GIAB data set
• Substantially the same as our integrated data
Results presented here are a mix of pre- and post-GIAB analyses
WCGs in Cancer Panel Validation
Geoff Nilsen, Shan Yang

• 58,708 variants detected (avg. 53 per patient)
• >90% are common polymorphisms (MAF>1% in 1KG)
• >99% are single nucleotide variants (SNVs)
• <0.1% are of the most technically challenging types*
− CNVs (single gene to single exon)
− Larger indels (≥10bp)
− Closely-spaced variants (≤25bp)
− Complex variants
− Variants in/near low complexity sequence
Genetic Data for 1105 Individuals x 29 genes
*We believe this largely reflects prevalence, not sensitivity limitations.

Variants Selected in Analytic Validation Study
Class | Type | Variants | Details
Sequence | Single Nucleotide Variants (SNVs) | 549 |
Sequence | Sequence deletions <10 base-pairs | 125 |
Sequence | Sequence insertions <5 base-pairs | 31 |
Sequence | Sequence insertions ≥5 base-pairs | 4 | 24, 5 bp
Sequence | Sequence deletions ≥10 base-pairs | 9 | 126, 40, 19, 15, 11 bp
Sequence | Complex variants | 6 | Delins, haplotypes, homopolymer-associated [1]
Copy Number | Single exon deletions | 9 | BRCA1, BRCA2, MSH2, PMS2
Copy Number | Single exon duplications | 4 | BRCA1, MLH1
Copy Number | Deletions of multiple exons or whole gene | 10 | BRCA1, MSH2, RAD51C
Copy Number | Duplications of multiple exons or whole gene | 6 | BRCA1, BRCA2, NBN, SMAD4
Total | | 750 |
“Hard Stuff”: Some published validation studies have few, if any, examples of these relatively challenging classes of variation [2,3]
All could be directly compared between NGS panel and reference/orthogonal data.
1. MSH2:c.942+3A>T
2. Bosdet et al, J Mol Dx, 2013
3. Chong et al, PLOS One, 2014

• 7 Samples Contributed 310 of 750 selected variants
− All variants in assay targets in the WCG data sets were used
− 41% of the total set of variants came from 0.6% of the samples
• In 15 of 29 genes the 7 WCGs doubled (or more) the
selected variant count
• WCGs added variants in one gene (PTCH1) which
otherwise had none selected
• Saved us 310 Sanger confirmations
− Unlike confirmation, WCGs contribute both to sensitivity and
specificity measurements in a strong way
• As a replenishable resource, it’s easy to rerun WCGs
WCGs Contribution to Analytic Validation Study
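
The proportions quoted on this slide can be re-derived from the totals stated elsewhere in the deck (1105 individuals, 750 selected variants). A minimal sketch; the variable names are illustrative and not taken from the original analysis.

```python
# Totals as stated in the slides.
wcg_samples, total_samples = 7, 1105
wcg_variants, total_variants = 310, 750

variant_share = wcg_variants / total_variants  # share of selected variants from WCGs
sample_share = wcg_samples / total_samples     # share of samples that are WCGs

print(f"{variant_share:.0%} of variants from {sample_share:.1%} of samples")
```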

• No coding variants in 5 of 29 genes
− CDKN2A, PALB2, RAD51C, SMAD4
− CHEK2 (a special case)
• Only 1 coding variant in 2 other genes
− PTEN, TP53
• The only errors in any reference data
we saw were in WCGs (but not GIAB)
− 2 in NA19240, 1 in NA12892
− All errors in low-complexity sequence
• Many of the variants are repeated
− Partly due to using related individuals
− Partly because most are common
polymorphisms
Limitations of the 7 WCGs
Gene | WCGs | All Others
APC | 31 | 9
ATM | 26 | 10
BMPR1A | 7 | 1
BRCA1 | 21 | 162
BRCA2 | 39 | 156
BRIP1 | 23 | 5
CDH1 | 12 | 4
CDKN2A | 0 | 3
CHEK2 | 0 | 4
EPCAM | 8 | 1
MEN1 | 18 | 1
MET | 18 | 2
MLH1 | 4 | 6
MSH2 | 4 | 8
MSH6 | 11 | 7
MUTYH | 4 | 23
NBN | 16 | 3
PALB2 | 0 | 8
PALLD | 6 | 1
PMS2 | 16 | 9
PTCH1 | 10 | 0
PTEN | 1 | 1
RAD51C | 0 | 4
RET | 27 | 2
SMAD4 | 0 | 3
TP53 | 1 | 3
VHL | 7 | 1
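
The per-gene counts above can be cross-checked against the bullets on the WCG slides. The sketch below is a hedged reading, not the authors' analysis: blank cells are entered as 0 and assigned to a column according to the bullets (no WCG variants in CDKN2A, CHEK2, PALB2, RAD51C, SMAD4; no non-WCG variants in PTCH1), and "doubled (or more)" is interpreted as the WCG count being at least the count from all other samples, for genes that had any other variants.

```python
# (WCG count, count from all other samples) per gene, transcribed from the slide.
# Blank cells are entered as 0, placed according to the slide's bullets (assumption).
counts = {
    "APC": (31, 9), "ATM": (26, 10), "BMPR1A": (7, 1), "BRCA1": (21, 162),
    "BRCA2": (39, 156), "BRIP1": (23, 5), "CDH1": (12, 4), "CDKN2A": (0, 3),
    "CHEK2": (0, 4), "EPCAM": (8, 1), "MEN1": (18, 1), "MET": (18, 2),
    "MLH1": (4, 6), "MSH2": (4, 8), "MSH6": (11, 7), "MUTYH": (4, 23),
    "NBN": (16, 3), "PALB2": (0, 8), "PALLD": (6, 1), "PMS2": (16, 9),
    "PTCH1": (10, 0), "PTEN": (1, 1), "RAD51C": (0, 4), "RET": (27, 2),
    "SMAD4": (0, 3), "TP53": (1, 3), "VHL": (7, 1),
}

wcg_total = sum(w for w, _ in counts.values())
no_wcg = sorted(g for g, (w, _) in counts.items() if w == 0)
wcg_only = sorted(g for g, (_, o) in counts.items() if o == 0)
# "Doubled": adding the WCGs at least doubles the count from other samples.
doubled = sorted(g for g, (w, o) in counts.items() if o > 0 and w >= o)

print(wcg_total, no_wcg, wcg_only, len(doubled))
```

Under these assumptions the WCG column sums to the 310 variants quoted earlier, five genes have no WCG variants, PTCH1 is the only gene covered solely by WCGs, and 15 genes meet the doubling criterion.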

PALB2 in NA12878 (Get-RM browser)
Lots of GIAB variants but none are exonic

CDKN2A in NA12878 (Get-RM browser)
Just one GIAB variant in 3’ UTR
(Similar situations in RAD51C, SMAD4)

• 304 of 310 sequence variants are SNVs
• 6 small deletions (max 4bp)
• 0 insertions
• 0 other variant types
• 0 variants in the most tricky regions for a Dx test
− Segdups, low-complexity, etc.
• No GIAB CNV data yet, but we’d expect 0 positives
• None of the WCG variants are clinically relevant
− None pathogenic or likely pathogenic under ACMG ISV criteria
− Unsurprisingly
• But unfortunately…
Other Limitations of the WCGs in this study

A Significant Fraction of Pathogenic Variants in
The Clinical Cases are Technically Challenging
Pathogenic and likely pathogenic variants (n=260) among the clinical cases
(n=1062) by variant type.
[Pie chart]
Small indel: 52.3%
SNV: 34.2%
CNV, multi-exon: 4.6%
CNV, single-exon: 3.8%
Large indel: 3.5%
Complex: 1.5%
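
Summing the chart's slices gives the share of pathogenic and likely pathogenic variants that fall in the technically challenging classes. Which slices count as "hard" is an assumption here, taken from the earlier list of challenging types (CNVs, larger indels, complex variants); the dictionary keys are illustrative labels.

```python
# Percent of the 260 pathogenic / likely pathogenic variants by type (from the chart).
share = {
    "Small indel": 52.3, "SNV": 34.2, "CNV multi-exon": 4.6,
    "CNV single-exon": 3.8, "Large indel": 3.5, "Complex": 1.5,
}
# Assumed "hard" classes, following the deck's list of challenging variant types.
hard_types = ["CNV multi-exon", "CNV single-exon", "Large indel", "Complex"]

hard_pct = sum(share[t] for t in hard_types)
hard_count = round(260 * hard_pct / 100)

print(f"hard: {hard_pct:.1f}% (about {hard_count} of 260)")
```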

Examples

BRCA1: c.1175_1214del40
[Figure: read alignments] Deletion mapped correctly in a fraction of reads; split-read signature in additional reads

BRCA2: c.9203del126
[Figure: read alignments over the exon target] Split-read signal at the 5’ end and at the 3’ end of the deletion

Deletion Affecting 2 Neighboring Exons
[Figure: read alignments across exon, intron, exon] Split-read signal at the 5’ end and at the 3’ end of the deletion

CDKN2A:c.9_32dup24
[Figure: read alignments at the 5’ Met translation start, with Repeat Copy 1 and Repeat Copy 2 labeled] Insertion of 3rd repeat in correctly mapped NGS reads; split-read signal from 3rd copy (soft-clipped reads)
Lincoln et al., December 2014, Sup. Figures Page 21
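
For readers unfamiliar with the term, soft-clipped reads are recorded in the SAM CIGAR string with the `S` operation. The sketch below shows one way such reads could be flagged as a possible split-read signal. It is an illustration only: the slides do not describe the detection method used, and `looks_split` with its `min_clip` threshold of 20 bases is a hypothetical helper invented for this example.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clips(cigar):
    """Return (leading, trailing) soft-clipped base counts for a SAM CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips at either end; ignore them.
    ops = [o for o in ops if o[1] != "H"]
    lead = ops[0][0] if ops and ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return lead, trail

def looks_split(cigar, min_clip=20):
    """Hypothetical filter: flag reads with a long soft clip at either end."""
    return max(soft_clips(cigar)) >= min_clip

for cigar in ("100M", "60M40S", "35S65M", "95M5S"):
    print(cigar, soft_clips(cigar), looks_split(cigar))
```

A pile-up of reads clipped at the same reference position, as in the figures on these slides, is the kind of pattern such a filter would surface for closer inspection.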

BRCA2: c.156_157insAlu
[Figure: read alignments] Split-read signal of Alu sequence

• Get IGV
MSH2:c.942+3A>T
Homopolymer-A
Alignment and
Biochemical
Artifacts

SMAD4 Whole-Gene Duplication
[Figure: read alignments] Split-read signal of neighboring exon sequence (same label at four sites)
Rare Pseudogene Insertion

Lies, Damned Lies and Statistics*
• Imagine this validation study:
− Test genes/exons of medical relevance in NA12878 (etc)
− Compare test results to GIAB reference data
− Count concordance, calculate sensitivity, specificity, and PPV
• Imagine an assay which silently fails to detect all “hard”
variants, but which works highly accurately on the “easy”
variants
• For the total spectrum of variants, sensitivity and specificity
will be over 99.9% for a large enough panel/study
• But among the truly positive patients there is a
>10% chance of a clinical false negative
− In targeted and validated assay regions!
*Mark Twain
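
The thought experiment above can be put into numbers. The sketch below is illustrative: it assumes a hypothetical assay with no false positives that detects every "easy" variant and misses every "hard" one, and it borrows this study's own figures rather than GIAB data. The count of 58 hard variants is an assumed value chosen to sit under the "<0.1%" bound on 58,708 calls, and 35 of 260 corresponds to the roughly 13.4% of pathogenic variants in the hard classes.

```python
def sensitivity(tp, fn):
    """Fraction of true variants that the assay detects."""
    return tp / (tp + fn)

def sensitivity_if_hard_missed(n_variants, n_hard):
    """Sensitivity of a hypothetical assay that misses every hard variant."""
    return sensitivity(tp=n_variants - n_hard, fn=n_hard)

# Across all variants: hard types are <0.1% of 58,708 calls (58 is an assumed count).
overall = sensitivity_if_hard_missed(58708, 58)
# Among pathogenic / likely pathogenic variants: about 35 of 260 are hard types.
clinical = sensitivity_if_hard_missed(260, 35)

print(f"overall sensitivity:  {overall:.2%}")
print(f"clinical sensitivity: {clinical:.1%}")
print(f"clinical false negative rate: {1 - clinical:.1%}")
```

The same assay looks excellent when scored over the whole variant spectrum and poor when scored over the variants that matter for a diagnosis, which is the point of the slide.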

• Well characterized genomes, in particular NA12878 with
the GIAB data set, contributed significantly to the analytic
validation of a hereditary cancer panel test
• But there were important limitations:
− Few if any coding variants in some genes
o These are the majority of regions targeted by most Dx assays
− Few deletions (in these regions)
o No insertions in these regions
− Very few complex or “hard” variants, including
o Large indels
o Small CNVs
o Variants in medically relevant low complexity regions
o Other tricky stuff
Conclusion

• More samples with greater genetic diversity
− This is in process!
• CNV/SV maps
− This is in process!
• Fill in some regions currently missing data
− Suggestion: Prioritize coding regions of known disease genes
o There are ~3,000 in total, ~700 generally used in Dx, ~100 commonly used
• Engineered control with lots of “hard” variants
− In subsets of those known disease genes (commonly used ones)
− Genetically engineered cell lines or spike-ins?
• Data in transcript coordinates, using HGVS
Wish List for GIAB Reference Samples
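
To illustrate the last wish-list item, here is a minimal sketch of reading variant names in HGVS coding-DNA ("c.") form. It is an assumption-laden simplification written for this example: `parse_hgvs_c` is a hypothetical helper, it covers only substitutions and simple del/dup/ins over a position or range, and it is not a substitute for a full HGVS parser.

```python
import re

# A position, optionally with a 5'/3' UTR prefix and an intronic offset (e.g. 942+3).
POS = r"[-*]?\d+(?:[+-]\d+)?"
HGVS_C = re.compile(
    rf"^c\.(?P<start>{POS})(?:_(?P<end>{POS}))?"
    r"(?:(?P<ref>[ACGT])>(?P<alt>[ACGT])|(?P<kind>del|dup|ins)(?P<seq>[ACGT]*))$"
)

def parse_hgvs_c(text):
    """Parse a small subset of HGVS c. syntax; return a dict or None."""
    m = HGVS_C.match(text)
    if m is None:
        return None
    fields = m.groupdict()
    fields["kind"] = fields["kind"] or "sub"  # substitutions carry ref/alt instead
    return fields

for name in ("c.942+3A>T", "c.1175_1214del", "c.9_32dup", "c.1175_1214del40"):
    print(name, parse_hgvs_c(name))
```

Note that the shorthand used on these slides, with a length appended after `del` or `dup`, is deliberately not accepted by this sketch, so the last example returns `None`.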

• Steve Lincoln
• Yuya Kobayashi
• Michael Anderson
• Shan Yang
• Kevin Jacobs
• Josh Paul
• Geoff Nilsen
• Jon Sorenson
• Federico Monzon
• Swaroop Aradhya
• Scott Topper
• Martin Powers
Acknowledgements
• Jim Ford
• Allison Kurian
• Meredith Mills
• Leif Ellisen
• Andrea Desmond
• Michelle Gabree
• Kristen Shannon
