4. Technical Requirements For These 29 Genes
1. Multiple Enrichment Methods
− No single technology delivers adequate coverage of all 29 genes
2. Copy number and other structural variants play a significant role in addition to sequence variants
− CNVs as small as one exon
− Alu insertions
− Tandem duplications
3. Of these 29 genes, a number are “hard”
− PMS2 (last 4 exons) and CHEK2 have pseudogenes
− SMAD4 also does, in some people
− MSH2 has a large intronic homopolymer-A immediately next to a canonical splice site (known to harbor pathogenic mutations)
− CDKN2A has a low-complexity 80% GC tandem duplication at the 5’ Met (also known to harbor pathogenic mutations)
5. Study Population
Group | N | Description | Previous Testing
Prospective Clinical | 735 | Prospectively accrued clinical cases | Clinical testing for BRCA1/2, occasionally other genes (depending on case)
High-Risk Clinical (Total 327) | 209 | Retrospective cases from a clinical biobank generally containing higher-risk individuals |
High-Risk Clinical (Total 327) | 118 | Cases referred due to known pathogenic variant in family | Clinical single-site testing
Reference Samples | 36 | Reference samples from public biobanks (Coriell, NIBSC) | Samples carry known pathogenic variants
Well-Characterized Genomes (WCGs) | 7 | Reference samples from public biobanks with high-quality whole genome sequencing (WGS) data | Variants in 29 cancer genes extracted from WGS data; most of these are benign
Total | 1105 | |
Clinical cases subtotal (Prospective + High-Risk) | 1062 | |
6. 7 Well Characterized Genomes (WCGs) Used
[Figure: two pedigree diagrams. CEPH/Utah Pedigree 1463: NA12877, NA12878, NA12879, NA12880, NA12881, NA12882, NA12883, NA12884, NA12885, NA12886, NA12887, NA12888, NA12889, NA12890, NA12891, NA12892, NA12893. Yoruba Family Y117: NA19238, NA19239, NA19240. Seven check marks (✔) flag the samples used as WCGs.]
Geoff Nilsen
Integrated Complete Genomics, Illumina Platinum and other data sets
Mendelian scrub (leveraging data from family members not used in this study)
7. WCGs in Cancer Panel Validation
1. CLIA Validation and Performance Study (pre-GIAB):
• Integrated CG and Illumina Platinum data
• Compared scrubbed data against our Dx test data
2. Later reconciled NA12878 data against GIAB data set
• Substantially the same as our integrated data
Results presented here are a mix of pre- and post-GIAB data
Geoff Nilsen, Shan Yang
8. Genetic Data for 1105 Individuals x 29 genes
• 58,708 variants detected (avg. 53 per patient)
• >90% are common polymorphisms (MAF>1% in 1KG)
• >99% are single nucleotide variants (SNVs)
• <0.1% are of the most technically challenging types*
− CNVs (single gene to single exon)
− Larger indels (≥10bp)
− Closely-spaced variants (≤25bp)
− Complex variants
− Variants in/near low complexity sequence
*We believe this largely reflects prevalence, not sensitivity limitations.
10. Variants Selected in Analytic Validation Study
Class | Type | Variants | Details
Sequence | Single Nucleotide Variants (SNVs) | 549 |
Sequence | Sequence deletions <10 base-pairs | 125 |
Sequence | Sequence insertions <5 base-pairs | 31 |
Sequence | Sequence insertions ≥5 base-pairs | 4 | 24, 5 bp
Sequence | Sequence deletions ≥10 base-pairs | 9 | 126, 40, 19, 15, 11 bp
Sequence | Complex variants | 6 | Delins, haplotypes, homopolymer-associated [1]
Copy Number | Single exon deletions | 9 | BRCA1, BRCA2, MSH2, PMS2
Copy Number | Single exon duplications | 4 | BRCA1, MLH1
Copy Number | Deletions of multiple exons or whole gene | 10 | BRCA1, MSH2, RAD51C
Copy Number | Duplications of multiple exons or whole gene | 6 | BRCA1, BRCA2, NBN, SMAD4
Total | | 750 |
Some published validation studies have few, if any, examples of these relatively challenging classes of variation [2, 3]
1. MSH2:c.942+3A>T
2. Bosdet et al, J Mol Dx, 2013
3. Chong et al, PLOS One, 2014
“Hard Stuff”
All could be directly compared between NGS panel and reference/orthogonal data.
11. WCGs Contribution to Analytic Validation Study
• 7 samples contributed 310 of 750 selected variants
− All variants in assay targets in the WCG data sets were used
− 41% of the total set of variants came from 0.6% of the samples
• In 15 of 29 genes the 7 WCGs doubled (or more) the selected variant count
• WCGs added variants in one gene (PTCH1) which otherwise had none selected
• Saved us 310 Sanger confirmations
− Unlike confirmation, WCGs contribute strongly to both sensitivity and specificity measurements
• WCGs are a replenishable resource, so they are easy to rerun
12. Limitations of the 7 WCGs
• No coding variants in 5 of 29 genes
− CDKN2A, PALB2, RAD51C, SMAD4
− CHEK2 (a special case)
• Only 1 coding variant in 2 other genes
− PTEN, TP53
• The only errors in any reference data we saw were in WCGs (but not GIAB)
− 2 in NA19240, 1 in NA12892
− All errors in low-complexity sequence
• Many of the variants are repeated
− Partly due to using related individuals
− Partly because most are common polymorphisms
Gene | WCGs | All Others
APC | 31 | 9
ATM | 26 | 10
BMPR1A | 7 | 1
BRCA1 | 21 | 162
BRCA2 | 39 | 156
BRIP1 | 23 | 5
CDH1 | 12 | 4
CDKN2A | 0 | 3
CHEK2 | 0 | 4
EPCAM | 8 | 1
MEN1 | 18 | 1
MET | 18 | 2
MLH1 | 4 | 6
MSH2 | 4 | 8
MSH6 | 11 | 7
MUTYH | 4 | 23
NBN | 16 | 3
PALB2 | 0 | 8
PALLD | 6 | 1
PMS2 | 16 | 9
PTCH1 | 10 | 0
PTEN | 1 | 1
RAD51C | 0 | 4
RET | 27 | 2
SMAD4 | 0 | 3
TP53 | 1 | 3
VHL | 7 | 1
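The per-gene counts can be cross-checked against the headline figures on the WCG contribution slide (310 WCG variants, 15 genes "doubled (or more)", PTCH1 covered only by WCGs). The following is a minimal sketch, not the authors' analysis. It assumes that rows showing a single number belong to the column implied by the bullets (PTCH1 to the WCGs; CDKN2A, CHEK2, PALB2, RAD51C and SMAD4 to the other samples), and it assumes "doubled (or more)" means the WCGs supplied at least as many variants as all other samples in a gene where the others supplied at least one.

```python
# Selected-variant counts per gene, transcribed from the slide's table:
# gene -> (variants from the 7 WCGs, variants from all other samples).
# Assumption: single-number rows are placed in the column implied by the bullets.
counts = {
    "APC": (31, 9), "ATM": (26, 10), "BMPR1A": (7, 1), "BRCA1": (21, 162),
    "BRCA2": (39, 156), "BRIP1": (23, 5), "CDH1": (12, 4), "CDKN2A": (0, 3),
    "CHEK2": (0, 4), "EPCAM": (8, 1), "MEN1": (18, 1), "MET": (18, 2),
    "MLH1": (4, 6), "MSH2": (4, 8), "MSH6": (11, 7), "MUTYH": (4, 23),
    "NBN": (16, 3), "PALB2": (0, 8), "PALLD": (6, 1), "PMS2": (16, 9),
    "PTCH1": (10, 0), "PTEN": (1, 1), "RAD51C": (0, 4), "RET": (27, 2),
    "SMAD4": (0, 3), "TP53": (1, 3), "VHL": (7, 1),
}

# Total number of selected variants contributed by the WCGs.
wcg_total = sum(w for w, _ in counts.values())

# Assumed reading of "doubled (or more)": WCG count >= count from other
# samples, restricted to genes where the other samples had at least one.
doubled = sorted(g for g, (w, o) in counts.items() if o > 0 and w >= o)

# Genes covered only by WCGs, and genes with no WCG variants at all.
wcg_only = sorted(g for g, (w, o) in counts.items() if w > 0 and o == 0)
no_wcg = sorted(g for g, (w, o) in counts.items() if w == 0)

# Shares quoted on the slide: variants from WCGs vs. samples that are WCGs.
selected_total, wcg_samples, all_samples = 750, 7, 1105
variant_share = 100 * wcg_total / selected_total
sample_share = 100 * wcg_samples / all_samples

print(f"WCG variants: {wcg_total}")
print(f"Genes doubled (or more): {len(doubled)}")
print(f"WCG-only genes: {wcg_only}; genes with no WCG variants: {no_wcg}")
print(f"{variant_share:.1f}% of variants from {sample_share:.2f}% of samples")
```

Under these assumptions the WCG column sums to 310, 15 genes meet the "doubled" criterion, PTCH1 is the only WCG-only gene, and the shares come out at about 41% of variants from about 0.6% of samples, matching the slides.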
15. Other Limitations of the WCGs in this study
• 304 of 310 sequence variants are SNVs
• 6 small deletions (max 4bp)
• 0 insertions
• 0 other variant types
• 0 variants in the trickiest regions for a Dx test
− Segdups, low-complexity, etc.
• No GIAB CNV data yet, but we’d expect 0 positives
• None of the WCG variants are clinically relevant
− None pathogenic or likely pathogenic under ACMG ISV criteria
− Unsurprisingly
• But unfortunately…
16. A Significant Fraction of Pathogenic Variants in The Clinical Cases are Technically Challenging
Pathogenic and likely pathogenic variants (n=260) among the clinical cases (n=1062) by variant type:
− SNV: 34.2%
− CNV, multi-exon: 4.6%
− CNV, single-exon: 3.8%
− Large indel: 3.5%
− Complex: 1.5%
− Small indel: 52.3%
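To make the size of the "technically challenging" share explicit, the percentages can be summed and converted back to approximate variant counts. This is a minimal sketch; treating CNVs, large indels and complex variants as the challenging classes is an assumption based on the challenging types listed earlier in the deck, and the per-class counts are only what the rounded percentages imply, not figures stated on the slide.

```python
# Percent of pathogenic / likely pathogenic variants by type (from the slide).
pct_by_type = {
    "SNV": 34.2,
    "Small indel": 52.3,
    "Large indel": 3.5,
    "Complex": 1.5,
    "CNV single-exon": 3.8,
    "CNV multi-exon": 4.6,
}
n_pathogenic = 260  # pathogenic and likely pathogenic variants in clinical cases

# Assumption: these are the "technically challenging" classes.
challenging = ("Large indel", "Complex", "CNV single-exon", "CNV multi-exon")

# Share of pathogenic variants that fall into the challenging classes.
challenging_pct = round(sum(pct_by_type[t] for t in challenging), 1)

# Counts implied by the rounded percentages (an inference, not slide data).
implied_counts = {t: round(n_pathogenic * p / 100) for t, p in pct_by_type.items()}
challenging_count = sum(implied_counts[t] for t in challenging)

print(f"Challenging share: {challenging_pct}%")
print(f"Implied challenging variants: {challenging_count} of {n_pathogenic}")
```

Under that assumption, about 13.4% of the pathogenic findings (roughly 35 of 260) are of a technically challenging type, which is far above their share among all detected variants.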
21. CDKN2A:c.9_32dup24
[Figure: NGS read alignment at the CDKN2A translation start (5’ Met), spanning Repeat Copy 1 and Repeat Copy 2. Annotations: insertion of 3rd repeat in correctly mapped NGS reads; split-read signal from 3rd copy (soft-clipped reads).]
Lincoln et al., December 2014, Sup. Figures Page 21
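The two kinds of evidence named in the figure, an insertion inside correctly mapped reads and soft-clipped (split) reads, both show up in a read's SAM CIGAR string. The sketch below is illustrative only and is not the pipeline used in the study: the function names and the example CIGAR strings are hypothetical, and a real caller would also use read positions, sequence content and mapping quality.

```python
import re

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a SAM CIGAR string into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def soft_clips(cigar):
    """Return (leading, trailing) soft-clipped base counts."""
    # Hard clips (H) can sit outside soft clips, so drop them first.
    ops = [x for x in parse_cigar(cigar) if x[1] != "H"]
    lead = ops[0][0] if ops and ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return lead, trail


def inserted_bases(cigar):
    """Total number of inserted bases (I operations) in the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op == "I")


# Hypothetical reads over a 24 bp tandem duplication.
reads = ["76M24I50M", "100M50S", "40S110M", "150M"]
with_insertion = [c for c in reads if inserted_bases(c) >= 24]
with_soft_clip = [c for c in reads if max(soft_clips(c)) >= 10]
print(f"insertion-supporting reads: {with_insertion}")
print(f"soft-clipped reads: {with_soft_clip}")
```

Here the 10-base soft-clip threshold and the 24-base insertion threshold are arbitrary choices for the example.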
25. Lies, Damned Lies and Statistics*
• Imagine this validation study:
− Test genes/exons of medical relevance in NA12878 (etc)
− Compare test results to GIAB reference data
− Count concordance, calculate sensitivity, specificity, and PPV
• Imagine an assay which silently misses every “hard” variant, but which is highly accurate on the “easy” variants
• For the total spectrum of variants, sensitivity and specificity
will be over 99.9% for a large enough panel/study
• But among the truly positive patients there is a
>10% chance of a clinical false negative
− In targeted and validated assay regions!
*Mark Twain
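The thought experiment can be put into numbers using figures from earlier slides. This is a minimal sketch under stated assumptions: the "<0.1%" share of hard variants among all detected variants is taken at its upper bound of 0.1%, the hard share among pathogenic variants is taken as 13.4% (the sum of the CNV, large indel and complex slices of the pathogenic-variant breakdown), and each truly positive patient is assumed to carry one pathogenic variant.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real variants that the assay detects."""
    return true_pos / (true_pos + false_neg)


# All variants in the study: 58,708 detected, <0.1% of the hard types.
total_variants = 58_708
hard_share_all = 0.001  # assumption: upper bound of "<0.1%"
hard_all = total_variants * hard_share_all

# Hypothetical assay: finds every easy variant, misses every hard one.
overall_sensitivity = sensitivity(total_variants - hard_all, hard_all)

# Pathogenic variants in clinical cases: assumed 13.4% are hard.
pathogenic = 260
hard_share_pathogenic = 0.134
hard_pathogenic = pathogenic * hard_share_pathogenic
clinical_sensitivity = sensitivity(pathogenic - hard_pathogenic, hard_pathogenic)
clinical_false_negative_rate = 1 - clinical_sensitivity

print(f"Sensitivity over all variants: {overall_sensitivity:.3%}")
print(f"Sensitivity over pathogenic variants: {clinical_sensitivity:.1%}")
print(f"False negative rate in positive patients: {clinical_false_negative_rate:.1%}")
```

Because common benign SNVs dominate the denominator, the aggregate sensitivity stays at about 99.9% even though the same hypothetical assay would miss more than one in ten clinically relevant findings.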