Jan2015 using the pilot genome rm for clinical validation steve lincoln

Published in: Health & Medicine
  1. Using the Genome in a Bottle (GIAB) pilot reference material: Its strengths and limitations for analytic validation of a diagnostic panel test
     Stephen E. Lincoln, Invitae
     2/9/2015 © 2013-2014 Invitae Corporation. All Rights Reserved | CONFIDENTIAL
  2. Genetic Diagnostic Tests ≠ Research
     • Diagnostic tests are ordered in response to a medical question that needs an answer in order to make a specific decision for a specific patient
       − Can be time critical; decisions may not be reversible
     • Our job: provide a highly accurate answer to the question asked in the time needed
       − A complete answer is highly valued
         o No matter how challenging (with some limits)
       − Extra information is not valued (in most cases)
     • Rigorous validation is required by CLIA, CAP, the medical community and payers
       − Focus on analytic (not clinical) validation here…
  3. 29 Gene Hereditary Cancer Panel

     Sub-Panel                          Genes  Cumulative total  Gene names
     BRCA1/2                            2      2                 BRCA1, BRCA2
     Other High-Risk Breast/Ovarian     4      6                 CDH1, PTEN, STK11, TP53
     Moderate-Risk Breast/Ovarian       6      12                ATM, BRIP1, CHEK2, NBN, PALB2, RAD51C
     Lynch Syndrome                     5      17                EPCAM, MLH1, MSH2, MSH6, PMS2
     Other Hereditary Cancer Syndromes  11     28                APC, BMPR1A, SMAD4, CDK4, CDKN2A, PALLD, MET, MEN1, RET, PTCH1, VHL
     MUTYH                              1      29                MUTYH
  4. Technical Requirements For These 29 Genes
     1. Multiple enrichment methods
        − No one technology delivers adequate coverage of all 29 genes
     2. Copy number and other structural variants play a significant role in addition to sequence variants
        − CNVs as small as one exon
        − Alu insertions
        − Tandem duplications
     3. Of these 29 genes, a number are “hard”
        − PMS2 (last 4 exons) and CHEK2 have pseudogenes
        − SMAD4 also does, in some people
        − MSH2 has a large intronic homopolymer-A immediately next to a canonical splice site (known to harbor pathogenic mutations)
        − CDKN2A has a low complexity 80% GC tandem duplication at the 5’ Met (also known to harbor pathogenic mutations)
  5. Study Population
     • Prospective Clinical (N = 735)
       − Description: prospectively accrued clinical cases
       − Previous testing: clinical testing for BRCA1/2, occasionally other genes (depending on case)
     • High-Risk Clinical (total N = 327)
       − N = 209: retrospective cases from a clinical biobank generally containing higher-risk individuals
       − N = 118: cases referred due to known pathogenic variant in family; previous testing: clinical single-site testing
     • Reference Samples (N = 36)
       − Description: reference samples from public biobanks (Coriell, NIBSC)
       − Previous testing: samples carry known pathogenic variants
     • Well-Characterized Genomes (WCGs) (N = 7)
       − Description: reference samples from public biobanks with high-quality whole genome sequencing (WGS) data
       − Previous testing: variants in 29 cancer genes extracted from WGS data; most of these are benign
     • Total: 1105 individuals, of which 1062 are clinical cases (735 + 327)
  6. 7 Well Characterized Genomes (WCGs) Used
     [Figure: two pedigrees with seven samples check-marked (✔) as used; the check marks cannot be matched to sample IDs in this transcript]
     • Yoruba Family Y117: NA19239, NA19238, NA19240
     • CEPH/Utah Pedigree 1463: NA12889, NA12890, NA12891, NA12892, NA12877, NA12878, NA12879, NA12880, NA12881, NA12882, NA12883, NA12884, NA12885, NA12886, NA12887, NA12888, NA12893
     • Integrated Complete Genomics, Illumina Platinum and other data sets
     • Mendelian scrub (leveraging data from family members not used in this study)
     Credit: Geoff Nilsen
  7. WCGs in Cancer Panel Validation
     1. CLIA Validation and Performance Study (pre-GIAB):
        • Integrated CG and Illumina Platinum data
        • Compared scrubbed data against our Dx test data
     2. Later reconciled NA12878 data against GIAB data set
        • Substantially the same as our integrated data
     Results presented here are a mix of the pre- and post-GIAB data.
     Credit: Geoff Nilsen, Shan Yang
  8. Genetic Data for 1105 Individuals x 29 Genes
     • 58,708 variants detected (avg. 53 per patient)
     • >90% are common polymorphisms (MAF >1% in 1KG)
     • >99% are single nucleotide variants (SNVs)
     • <0.1% are of the most technically challenging types*
       − CNVs (single gene to single exon)
       − Larger indels (≥10 bp)
       − Closely-spaced variants (≤25 bp)
       − Complex variants
       − Variants in/near low complexity sequence
     *We believe this largely reflects prevalence, not sensitivity limitations.
  9. Analytic Validation
  10. Variants Selected in Analytic Validation Study

      Class        Type                                          Variants  Details
      Sequence     Single nucleotide variants (SNVs)             549
      Sequence     Sequence deletions <10 base-pairs             125
      Sequence     Sequence insertions <5 base-pairs             31
      Sequence     Sequence insertions ≥5 base-pairs             4         24, 5 bp
      Sequence     Sequence deletions ≥10 base-pairs             9         126, 40, 19, 15, 11 bp
      Sequence     Complex variants                              6         Delins, haplotypes, homopolymer-associated [1]
      Copy number  Single exon deletions                         9         BRCA1, BRCA2, MSH2, PMS2
      Copy number  Single exon duplications                      4         BRCA1, MLH1
      Copy number  Deletions of multiple exons or whole gene     10        BRCA1, MSH2, RAD51C
      Copy number  Duplications of multiple exons or whole gene  6         BRCA1, BRCA2, NBN, SMAD4
                   Total                                         750

      • Slide annotation: “Hard Stuff” (marking the relatively challenging classes)
      • All could be directly compared between NGS panel and reference/orthogonal data.
      • Some published validation studies have few, if any, examples of these relatively challenging classes of variation [2, 3]
      [1] MSH2:c.942+3A>T
      [2] Bosdet et al., J Mol Dx, 2013
      [3] Chong et al., PLOS One, 2014
  11. WCGs Contribution to Analytic Validation Study
      • 7 samples contributed 310 of 750 selected variants
        − All variants in assay targets in the WCG data sets were used
        − 41% of the total set of variants came from 0.6% of the samples
      • In 15 of 29 genes the 7 WCGs doubled (or more) the selected variant count
      • WCGs added variants in one gene (PTCH1) which otherwise had none selected
      • Saved us 310 Sanger confirmations
        − Unlike confirmation, WCGs contribute both to sensitivity and specificity measurements in a strong way
      • As a replenishable resource, it’s easy to rerun WCGs
  12. Limitations of the 7 WCGs
      • No coding variants in 5 of 29 genes
        − CDKN2A, PALB2, RAD51C, SMAD4
        − CHEK2 (a special case)
      • Only 1 coding variant in 2 other genes
        − PTEN, TP53
      • The only errors in any reference data we saw were in WCGs (but not GIAB)
        − 2 in NA19240, 1 in NA12892
        − All errors in low-complexity sequence
      • Many of the variants are repeated
        − Partly due to using related individuals
        − Partly because most are common polymorphisms

      Selected variants per gene ("-" marks a cell with no value on the slide):

      Gene    WCGs  All Others
      APC     31    9
      ATM     26    10
      BMPR1A  7     1
      BRCA1   21    162
      BRCA2   39    156
      BRIP1   23    5
      CDH1    12    4
      CDKN2A  -     3
      CHEK2   -     4
      EPCAM   8     1
      MEN1    18    1
      MET     18    2
      MLH1    4     6
      MSH2    4     8
      MSH6    11    7
      MUTYH   4     23
      NBN     16    3
      PALB2   -     8
      PALLD   6     1
      PMS2    16    9
      PTCH1   10    -
      PTEN    1     1
      RAD51C  -     4
      RET     27    2
      SMAD4   -     3
      TP53    1     3
      VHL     7     1
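The headline figures on slides 11 and 12 can be re-derived from the per-gene table. The following is a minimal sketch, not part of the original deck. It assumes that each single number in the flattened table belongs to the column implied by the slide text: PTCH1 under WCGs, and CDKN2A, CHEK2, PALB2, RAD51C and SMAD4 under All Others. The function names are illustrative.

```python
# Selected-variant counts per gene as (WCGs, all other samples), transcribed
# from the slide 12 table. ASSUMPTION: where the flattened table shows only one
# number, it is placed in the column implied by the slide text, with 0 elsewhere.
COUNTS = {
    "APC": (31, 9), "ATM": (26, 10), "BMPR1A": (7, 1), "BRCA1": (21, 162),
    "BRCA2": (39, 156), "BRIP1": (23, 5), "CDH1": (12, 4), "CDKN2A": (0, 3),
    "CHEK2": (0, 4), "EPCAM": (8, 1), "MEN1": (18, 1), "MET": (18, 2),
    "MLH1": (4, 6), "MSH2": (4, 8), "MSH6": (11, 7), "MUTYH": (4, 23),
    "NBN": (16, 3), "PALB2": (0, 8), "PALLD": (6, 1), "PMS2": (16, 9),
    "PTCH1": (10, 0), "PTEN": (1, 1), "RAD51C": (0, 4), "RET": (27, 2),
    "SMAD4": (0, 3), "TP53": (1, 3), "VHL": (7, 1),
}


def wcg_total(counts):
    """Total number of selected variants contributed by the WCGs."""
    return sum(wcg for wcg, _ in counts.values())


def genes_doubled(counts):
    """Genes that had variants from other samples and where adding the WCGs
    at least doubled the count (WCG count >= count from all other samples)."""
    return sorted(g for g, (wcg, other) in counts.items() if other > 0 and wcg >= other)


def genes_only_from_wcgs(counts):
    """Genes whose selected variants come exclusively from the WCGs."""
    return sorted(g for g, (wcg, other) in counts.items() if wcg > 0 and other == 0)


def genes_without_wcg_variants(counts):
    """Genes for which the WCGs contributed no selected variants."""
    return sorted(g for g, (wcg, _) in counts.items() if wcg == 0)


if __name__ == "__main__":
    total = wcg_total(COUNTS)
    print("WCG variants:", total)
    print("Share of the 750 selected variants: %.1f%%" % (100 * total / 750))
    print("Share of the 1105 samples: %.1f%%" % (100 * 7 / 1105))
    print("Genes at least doubled:", len(genes_doubled(COUNTS)))
    print("Genes covered only by WCGs:", genes_only_from_wcgs(COUNTS))
    print("Genes with no WCG variants:", genes_without_wcg_variants(COUNTS))
```

Under that column assignment the WCG column sums to the 310 variants quoted on slide 11, which supports the reading of the table used here. The table lists 27 of the 29 panel genes, so the sketch says nothing about the two genes it omits.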
  13. PALB2 in NA12878 (Get-RM browser)
      [Figure: browser view] Lots of GIAB variants but none are exonic
  14. CDKN2A in NA12878 (Get-RM browser)
      [Figure: browser view] Just one GIAB variant in 3’ UTR (similar situations in RAD51C, SMAD4)
  15. Other Limitations of the WCGs in this study
      • 304 of 310 sequence variants are SNVs
      • 6 small deletions (max 4 bp)
      • 0 insertions
      • 0 other variant types
      • 0 variants in the most tricky regions for a Dx test
        − Segdups, low-complexity, etc.
      • No GIAB CNV data yet, but we’d expect 0 positives
      • None of the WCG variants are clinically relevant
        − None pathogenic or likely pathogenic under ACMG ISV criteria
        − Unsurprisingly
      • But unfortunately…
  16. A Significant Fraction of Pathogenic Variants in the Clinical Cases are Technically Challenging
      [Figure: pie chart] Pathogenic and likely pathogenic variants (n=260) among the clinical cases (n=1062) by variant type:
      • Small indel: 52.3%
      • SNV: 34.2%
      • CNV multi-exon: 4.6%
      • CNV single-exon: 3.8%
      • Large indel: 3.5%
      • Complex: 1.5%
  17. Examples
  18. BRCA1: c.1175_1214del40
      [Figure: read alignment] Annotations:
      • Deletion mapped correctly in a fraction of reads
      • Split-read signature in additional reads
  19. BRCA2: c.9203del126
      [Figure: read alignment] Annotations:
      • Split-read signal at 3’ end of deletion
      • Split-read signal at 5’ end of deletion
      • Exon target
  20. Deletion Affecting 2 Neighboring Exons
      [Figure: read alignment] Annotations:
      • Split-read signal at 3’ end of deletion
      • Split-read signal at 5’ end of deletion
      • Exon, intron, exon
  21. CDKN2A:c.9_32dup24
      [Figure: read alignment, from Lincoln et al., December 2014, Sup. Figures page 21] Annotations:
      • Insertion of 3rd repeat in correctly mapped NGS reads
      • Repeat copy 1, repeat copy 2
      • Split-read signal from 3rd copy (soft-clipped reads)
      • Translation 5’ Met
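The split-read signals annotated on slides 18 to 21 correspond to reads that the aligner soft-clips at a breakpoint. As a rough illustration only, the sketch below finds reference positions where several soft-clipped reads agree, using nothing but SAM-style `POS` and `CIGAR` values. It is not the method used in the study. The function names, the `min_support` threshold and the example reads are assumptions made for this illustration.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def clip_breakpoints(pos, cigar):
    """Return (reference_position, side) pairs where a read is soft-clipped.

    pos is the 1-based leftmost aligned position, as in the SAM POS field.
    side is "left" if the clipped bases precede the aligned part of the read,
    "right" if they follow it.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    ops = [(n, op) for n, op in ops if op != "H"]  # hard clips carry no bases
    found = []
    if ops and ops[0][1] == "S":
        # The alignment starts at pos, so the clip boundary is at pos.
        found.append((pos, "left"))
    if len(ops) > 1 and ops[-1][1] == "S":
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        # First reference base after the aligned part of the read.
        found.append((pos + ref_len, "right"))
    return found


def candidate_breakpoints(reads, min_support=3):
    """Count soft-clip boundaries over (pos, cigar) pairs and keep those
    supported by at least min_support reads."""
    counts = Counter(bp for pos, cigar in reads for bp in clip_breakpoints(pos, cigar))
    return {bp: n for bp, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Synthetic reads around a hypothetical 40 bp deletion of reference
    # positions 160-199; these are not data from the study.
    reads = [
        (100, "60M40S"), (110, "50M50S"), (120, "40M60S"),  # clipped at deletion start
        (200, "30S70M"), (200, "45S55M"), (200, "20S80M"),  # clipped at deletion end
        (90, "100M"),                                        # unaffected read
    ]
    print(candidate_breakpoints(reads))
```

In the synthetic example the two supported boundaries lie 40 bp apart, which is the kind of paired signal the slides label as the 5’ and 3’ ends of a deletion. A real caller would also need to compare the clipped sequence with the reference, which this sketch does not attempt.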
  22. BRCA2 c.156_insAlu
      [Figure: read alignment] Split-read signal of Alu sequence
  23. MSH2:c.943+3T>C
      Homopolymer-A: alignment and biochemical artifacts
      • Get IGV
  24. SMAD4 Whole-Gene Duplication
      [Figure: read alignment] Annotations:
      • Split-read signal of neighboring exon sequence
      • Ditto (three further labels)
      • Rare pseudogene insertion
  25. Lies, Damned Lies and Statistics*
      • Imagine this validation study:
        − Test genes/exons of medical relevance in NA12878 (etc.)
        − Compare test results to GIAB reference data
        − Count concordance, calculate sensitivity, specificity, and PPV
      • Imagine an assay which silently fails to detect all “hard” variants, but which works highly accurately on the “easy” variants
      • For the total spectrum of variants, sensitivity and specificity will be over 99.9% for a large enough panel/study
      • But among the truly positive patients there is a >10% chance of a clinical false negative
        − In targeted and validated assay regions!
      *Mark Twain
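The thought experiment on slide 25 can be reproduced with the figures quoted on slides 8 and 16. The sketch below is an illustration, not part of the deck. It assumes an assay that detects every “easy” variant and no “hard” variant, takes the “<0.1%” of slide 8 at its upper bound of 0.1%, and treats each pathogenic variant as one positive patient, which is a simplification. All names are hypothetical.

```python
# Figures quoted in the deck (slides 8 and 16).
TOTAL_VARIANTS = 58_708        # variants detected across 1105 individuals
HARD_FRACTION_ALL = 0.001      # "<0.1%" are hard types; upper bound ASSUMED here
PATHOGENIC_VARIANTS = 260      # pathogenic / likely pathogenic, clinical cases
PATHOGENIC_SHARE_PCT = {
    "Small indel": 52.3,
    "SNV": 34.2,
    "CNV multi-exon": 4.6,
    "CNV single-exon": 3.8,
    "Large indel": 3.5,
    "Complex": 1.5,
}
# ASSUMPTION: these pie-chart classes are the ones the imagined assay misses.
HARD_TYPES = {"CNV multi-exon", "CNV single-exon", "Large indel", "Complex"}


def sensitivity(n_easy, n_hard, easy_rate=1.0, hard_rate=0.0):
    """Fraction of true variants detected, given per-class detection rates."""
    return (n_easy * easy_rate + n_hard * hard_rate) / (n_easy + n_hard)


def overall_sensitivity():
    """Sensitivity over all variants when every hard variant is missed."""
    n_hard = TOTAL_VARIANTS * HARD_FRACTION_ALL
    return sensitivity(TOTAL_VARIANTS - n_hard, n_hard)


def pathogenic_hard_fraction():
    """Share of pathogenic variants that fall into the hard classes."""
    return sum(PATHOGENIC_SHARE_PCT[t] for t in HARD_TYPES) / 100.0


def pathogenic_sensitivity():
    """Sensitivity restricted to pathogenic variants for the same assay."""
    n_hard = PATHOGENIC_VARIANTS * pathogenic_hard_fraction()
    return sensitivity(PATHOGENIC_VARIANTS - n_hard, n_hard)


if __name__ == "__main__":
    print("Overall sensitivity:    %.1f%%" % (100 * overall_sensitivity()))
    print("Hard share, pathogenic: %.1f%%" % (100 * pathogenic_hard_fraction()))
    print("Pathogenic sensitivity: %.1f%%" % (100 * pathogenic_sensitivity()))
```

With these inputs the overall figure stays at about 99.9% while roughly 13% of pathogenic variants would be missed, which matches the slide’s “>10%” statement.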
  26. Conclusion
      • Well characterized genomes, in particular NA12878 with the GIAB data set, contributed significantly to the analytic validation of a hereditary cancer panel test
      • But there were important limitations:
        − Few if any coding variants in some genes
          o These are the majority of regions targeted by most Dx assays
        − Few deletions (in these regions)
          o No insertions in these regions
        − Very few complex or “hard” variants, including
          o Large indels
          o Small CNVs
          o Variants in medically relevant low complexity regions
          o Other tricky stuff
  27. Wish List for GIAB Reference Samples
      • More samples with greater genetic diversity
        − This is in process!
      • CNV/SV maps
        − This is in process!
      • Fill in some regions currently missing data
        − Suggestion: prioritize coding regions of known disease genes
          o There are ~3,000 in total, ~700 generally used in Dx, ~100 commonly used
      • Engineered control with lots of “hard” variants
        − In subsets of those known disease genes (commonly used ones)
        − Genetically engineered cell lines or spike-ins?
      • Data in transcript coordinates, using HGVS
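To make the last wish-list item concrete, the sketch below parses the simplified transcript-coordinate notation used for the range examples in this deck, such as `BRCA1: c.1175_1214del40`, and checks that the stated size matches the coordinate span. It is an illustration only and not a full HGVS parser. The regular expression, the function name and the returned field names are assumptions introduced here.

```python
import re

# Matches only the simplified range form written on the slides:
#   GENE:c.START_END(del|dup)[SIZE]
# ASSUMPTION: this pattern is a hypothetical helper for these examples and
# does not cover HGVS nomenclature in general.
RANGE_VARIANT = re.compile(
    r"^(?P<gene>\w+):\s*c\.(?P<start>\d+)_(?P<end>\d+)(?P<kind>del|dup)(?P<size>\d+)?$"
)


def parse_range_variant(text):
    """Parse a range deletion or duplication in coding (c.) coordinates.

    Returns a dict holding the gene, the variant kind, the coordinates, the
    span implied by the coordinates, the size stated in the name (if any),
    and whether the two agree.
    """
    match = RANGE_VARIANT.match(text.strip())
    if match is None:
        raise ValueError("not a simple c.START_ENDdel/dup variant: %r" % text)
    start, end = int(match["start"]), int(match["end"])
    span = end - start + 1  # coordinates are inclusive
    stated = int(match["size"]) if match["size"] else None
    return {
        "gene": match["gene"],
        "kind": match["kind"],
        "start": start,
        "end": end,
        "span": span,
        "stated_size": stated,
        "consistent": stated is None or stated == span,
    }


if __name__ == "__main__":
    for name in ("BRCA1: c.1175_1214del40", "CDKN2A:c.9_32dup24"):
        print(parse_range_variant(name))
```

Both range examples from the slides are internally consistent: positions 1175 to 1214 span 40 bases and positions 9 to 32 span 24 bases. Names without a coordinate range, such as `BRCA2: c.9203del126`, are rejected by this sketch rather than guessed at.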
  28. Acknowledgements
      • Steve Lincoln
      • Yuya Kobayashi
      • Michael Anderson
      • Shan Yang
      • Kevin Jacobs
      • Josh Paul
      • Geoff Nilsen
      • Jon Sorenson
      • Federico Monzon
      • Swaroop Aradhya
      • Scott Topper
      • Martin Powers

      • Jim Ford
      • Allison Kurian
      • Meredith Mills
      • Leif Ellisen
      • Andrea Desmond
      • Michelle Gabree
      • Kristen Shannon