NIST Program to Develop Genomic Reference Materials
Justin Zook and Marc Salit
  
Scope of NIST work
•  Human Whole Genome RMs
•  Synthetic DNA constructs
•  Microbial Whole Genome RMs
  
RM Development Process
1.  Select and procure materials
2.  Characterize materials
3.  Process and integrate data from multiple platforms
4.  Confirm selected genotypes
5.  Write Report of Analysis
6.  Develop methods for end users to obtain performance metrics from the materials
  
Proposed Timeline for Human RMs

Proposed Timeline for Synthetic Structures
  

Title                                                            Effort
  1) Human RMs                                                     535w
         1.1) Select/Procure human DNA for RM                       32w
         1.2) **NIST receives packaged DNA for RM/SRM
         1.3) Develop bioinformatics pipeline for data              97w
              integration
         1.4) Human Primary Sequencing                             147w
         1.5) Human Homogeneity assessment                           8w
         1.6) Analyze homogeneity data and produce preliminary      10w
              SNP calls for RM
         1.7) Write human RM Report of Analysis                     10w
         1.8) Process Human RM for release                          24w
         1.9) **Human RM officially released
        1.10) Human Sequencing data integration                     25w
        1.11) Human Validation                                      20w
        1.12) Human other characterization methods                  48w
        1.13) Analyze validation data and refine sequencing calls    12w
        1.14) Develop pipeline for SVs and test                     40w
        1.15) Write Human SRM Report of Analysis                      8w
        1.16) Process Human SRM for release                         24w
        1.17) **Human SRM officially released
        1.18) Procure local data storage                            10w
        1.19) Procure Bioinformatics data analysis tools            10w
        1.20) Procure Automated sample prep instrumentation         10w
  2) Microbial RMs                                                 279w
         2.1) Select/Procure microbial DNA for RMs                  31w
         2.2) Microbial Primary Sequencing                         124w
         2.3) Microbial Homogeneity assessment                       6w
         2.4) Microbial Sequencing data integration                 40w
          2.4.1) Mapping/Alignment                                  10w
          2.4.2) Variant calling                                    12w
          2.4.3) Form consensus variant calls                       12w
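
As a quick consistency check on the effort column, the subtask efforts listed under "1) Human RMs" can be summed and compared with the 535w roll-up. This is only a sketch: the dictionary names are illustrative, the milestone rows marked with ** carry no effort and are left out, and the microbial section appears to be cut off in this transcript, so its listed rows are not expected to reach 279w.

```python
# Effort in weeks for subtasks under "1) Human RMs", copied from the table.
# Milestone rows (1.2, 1.9, 1.17) list no effort and are omitted.
human_subtasks = {
    "1.1": 32, "1.3": 97, "1.4": 147, "1.5": 8, "1.6": 10, "1.7": 10,
    "1.8": 24, "1.10": 25, "1.11": 20, "1.12": 48, "1.13": 12, "1.14": 40,
    "1.15": 8, "1.16": 24, "1.18": 10, "1.19": 10, "1.20": 10,
}

# Top-level rows visible under "2) Microbial RMs" (the table is truncated).
microbial_listed = {"2.1": 31, "2.2": 124, "2.3": 6, "2.4": 40}

human_total = sum(human_subtasks.values())
microbial_listed_total = sum(microbial_listed.values())

print(f"Human subtasks sum to {human_total}w (table roll-up: 535w)")
print(f"Listed microbial rows sum to {microbial_listed_total}w (table roll-up: 279w)")
```

The human subtasks add up to the stated roll-up; the gap on the microbial side is consistent with rows missing from the transcript rather than with an error in the plan.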
Proposed Characterization Methods for Whole Genomes

Whole Genome Sequencing
•  ABI 5500 (1kb, 6kb, and 10kb mate-pair libraries)
•  Illumina
•  Complete Genomics
•  Upcoming technologies?
     –  Ion Proton?
     –  Oxford Nanopore?
•  3x replication of sequencing (3 library preps)

Other
•  Genotyping microarrays
•  Array CGH
•  Targeted sequencing
•  Fosmid sequencing?
•  Optical Mapping?

[Figure: pedigree showing Father, Mother, Husband, NA12878, Son, and Daughter]
  
Integration of Existing Data to Form Consensus Genotype Calls
1.  Find all possible variant sites
2.  Find sites where all datasets agree
3.  Identify sites with atypical characteristics signifying sequencing, mapping, or alignment bias
4.  For each site, remove datasets with decreasingly atypical characteristics until all datasets agree
5.  Even if all datasets agree, identify them as uncertain if few have typical characteristics
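
The per-site arbitration in the steps above can be sketched roughly as follows. This is a minimal illustration, not the NIST pipeline: the `Call` record, the single numeric `atypicality` score, and the `typical_cutoff` and `min_typical` parameters are all assumptions made for the example, since the slides do not say how atypical characteristics are scored or how many typical datasets count as enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    dataset: str        # e.g. a platform/caller combination
    genotype: str       # e.g. "0/0", "0/1", "1/1"
    atypicality: float  # hypothetical score; larger means more atypical


def consensus(calls, typical_cutoff=1.0, min_typical=2):
    """Return (genotype, status) for one candidate site.

    Datasets are dropped from most to least atypical until the remaining
    ones agree. The site is marked "uncertain" when fewer than
    `min_typical` of the agreeing datasets look typical.
    """
    if not calls:
        raise ValueError("at least one call is required")

    # Most typical first, so the most atypical dataset is at the end.
    remaining = sorted(calls, key=lambda c: c.atypicality)

    # Remove the most atypical remaining dataset until all genotypes agree.
    while len({c.genotype for c in remaining}) > 1:
        remaining.pop()

    genotype = remaining[0].genotype
    n_typical = sum(c.atypicality < typical_cutoff for c in remaining)
    status = "confident" if n_typical >= min_typical else "uncertain"
    return genotype, status


site = [
    Call("platform_A", "0/1", 0.2),
    Call("platform_B", "0/1", 0.4),
    Call("platform_C", "0/0", 3.0),  # disagrees and looks atypical
]
print(consensus(site))
```

In this toy site the dissenting dataset is also the most atypical one, so it is removed first and the two remaining datasets give a confident heterozygous call. Two datasets that agree but both score above the cutoff would instead be reported as uncertain, which mirrors the last step of the list.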
  
Consensus has lower FN rate than individual datasets

Rows: HiSeq – GATK genotypes. Columns: Illumina Omni SNP Array genotypes.
                                Homozygous Reference  Heterozygous    Homozygous Variant  Uncertain
Homozygous Reference/No Call    1.45M                 7.24k (1.34%)   5.28k (0.65%)       N/A
Heterozygous                    196 (0.03%)           411k (60.7%)    133 (0.02%)         N/A
Homozygous Variant              154 (0.02%)           150 (0.02%)     249k (37.0%)        N/A

Rows: Integrated Consensus Genotypes. Columns: Illumina Omni SNP Array genotypes.
                                Homozygous Reference  Heterozygous    Homozygous Variant  Uncertain
Homozygous Reference/No Call    1.45M                 613 (0.09%)     977 (0.15%)         N/A
Heterozygous                    241 (0.04%)           414k (61.5%)    173 (0.03%)         N/A
Homozygous Variant              152 (0.02%)           61 (0.01%)      249k (36.9%)        N/A
Uncertain                       5458 (0.81%)          3421 (0.51%)    4808 (0.71%)        N/A

In both tables, the slide labels the array-variant cells of the Homozygous Reference/No Call row as "FNs" and the Heterozygous and Homozygous Variant cells of the Homozygous Reference column as "FPs*".

* Note that most or all of the putative FPs seem to actually be FNs on the microarray
SNP arrays overestimate performance

Rows: HiSeq – GATK genotypes. Columns: Illumina Omni SNP Array genotypes.
                                Homozygous Reference  Heterozygous    Homozygous Variant  Uncertain
Homozygous Reference/No Call    1.45M                 7.24k (1.34%)   5.28k (0.65%)       N/A
Heterozygous                    196 (0.03%)           411k (60.7%)    133 (0.02%)         N/A
Homozygous Variant              154 (0.02%)           150 (0.02%)     249k (37.0%)        N/A

Rows: HiSeq – GATK genotypes. Columns: Integrated Consensus Genotypes.
                                Homozygous Reference  Heterozygous    Homozygous Variant  Uncertain
Homozygous Reference/No Call    1.52M                 157k (4.68%)    30.3k (0.90%)       4.17M
Heterozygous                    47 (0.00%)            1.90M (56.4%)   34 (0.00%)          16.9k (0.50%)
Homozygous Variant              1 (0.00%)             298 (0.01%)     1.19M (35.3%)       73.3k (2.18%)

In both tables, the slide labels the variant cells of the Homozygous Reference/No Call row as "FNs" and the Heterozygous and Homozygous Variant cells of the Homozygous Reference column as "FPs" ("FPs*" in the array comparison).
  
Samtools has higher FP and lower FN than GATK

Rows: HiSeq – samtools genotypes. Columns: Integrated Consensus Genotypes.
                                Homozygous Reference  Heterozygous    Homozygous Variant  Uncertain
Homozygous Reference/No Call    1.51M                 49.6k (1.47%)   6.74k (0.20%)       3.93M
Heterozygous                    3141 (0.09%)          2.00M (59.6%)   74 (0.00%)          175k (5.19%)
Homozygous Variant              21 (0.00%)            777 (0.02%)     1.21M (36.0%)       192k (5.71%)

Rows: HiSeq – GATK genotypes. Columns: Integrated Consensus Genotypes.
                                Homozygous Reference  Heterozygous    Homozygous Variant  Uncertain
Homozygous Reference/No Call    1.52M                 157k (4.68%)    30.3k (0.90%)       4.17M
Heterozygous                    47 (0.00%)            1.90M (56.4%)   34 (0.00%)          16.9k (0.50%)
Homozygous Variant              1 (0.00%)             298 (0.01%)     1.19M (35.3%)       73.3k (2.18%)

In both tables, the slide labels the variant cells of the Homozygous Reference/No Call row as "FNs" and the Heterozygous and Homozygous Variant cells of the Homozygous Reference column as "FPs".
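
To make the comparison concrete, the putative FP and FN counts for each caller can be read off the two tables on this slide. The sketch below assumes the cell layout described by the slide's "FPs" and "FNs" labels; the dictionary keys and the `putative_fp_fn` helper are illustrative names, and the counts carry the slide's rounding (for example, 49.6k is taken as 49,600).

```python
# Column order in each row: consensus genotype is
# homozygous reference, heterozygous, homozygous variant, uncertain.
COLUMNS = ("hom_ref", "het", "hom_var", "uncertain")

tables = {
    "samtools": {
        "ref_or_no_call": (1_510_000, 49_600, 6_740, 3_930_000),
        "het": (3_141, 2_000_000, 74, 175_000),
        "hom_var": (21, 777, 1_210_000, 192_000),
    },
    "GATK": {
        "ref_or_no_call": (1_520_000, 157_000, 30_300, 4_170_000),
        "het": (47, 1_900_000, 34, 16_900),
        "hom_var": (1, 298, 1_190_000, 73_300),
    },
}


def putative_fp_fn(table):
    """Return (FP, FN) counts relative to the consensus genotypes."""
    hom_ref = COLUMNS.index("hom_ref")
    het = COLUMNS.index("het")
    hom_var = COLUMNS.index("hom_var")
    # FN: consensus says variant, caller says reference or makes no call.
    fn = table["ref_or_no_call"][het] + table["ref_or_no_call"][hom_var]
    # FP: caller says variant, consensus says homozygous reference.
    fp = table["het"][hom_ref] + table["hom_var"][hom_ref]
    return fp, fn


for caller, table in tables.items():
    fp, fn = putative_fp_fn(table)
    print(f"{caller}: putative FPs = {fp}, putative FNs = {fn}")
```

Sites where the consensus is uncertain are left out of both counts here, which is one of the reasons apparent performance depends on how the reference material labels difficult sites.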
  
Performance Metrics: Characteristics of Mis-calls

[Figure: HiSeq/GATK genotypes (rows: Heterozygous, Hom. Ref./No call, Hom. Variant) versus Consensus Genotypes (columns: Hom. Ref., Heterozygous, Hom. Variant, Uncertain), with panels for QUAL/Depth of Coverage, Strand Bias, ...]
  
Challenges with assessing performance
•  All variant types are not equal
•  Nearby variants are often difficult to align
•  All regions of the genome are not equal
    –  Homopolymers, STRs, duplications
    –  Can be similar or different in different genomes
•  Labeling difficult variants as “uncertain” in the Reference Material leads to higher apparent accuracy when assessing performance
•  Genotypes fall in 3+ categories (not positive/negative)
•  It’s important to consider data from multiple platforms and library preparations when characterizing a Reference Material

NIST program to develop genomic reference materials

  • 1. NIST Program to Develop Genomic Reference Materials
      Justin Zook and Marc Salit
  • 2. Scope of NIST work
      •  Human Whole Genome RMs
      •  Synthetic DNA constructs
      •  Microbial Whole Genome RMs
  • 3. RM Development Process
      1.  Select and procure materials
      2.  Characterize materials
      3.  Process and integrate data from multiple platforms
      4.  Confirm selected genotypes
      5.  Write Report of Analysis
      6.  Develop methods for end users to obtain performance metrics from the materials
  • 4. Proposed Timeline for Human RMs
  • 5. Proposed Timeline for Synthetic Structures
      Timeline columns: 2011, 2012, 2013, 2014, 2015, ...
      Effort  Title
      535w    1) Human RMs
      32w       1.1) Select/Procure human DNA for RM
                1.2) **NIST receives packaged DNA for RM/SRM
      97w       1.3) Develop bioinformatics pipeline for data integration
      147w      1.4) Human Primary Sequencing
      8w        1.5) Human Homogeneity assessment
      10w       1.6) Analyze homogeneity data and produce preliminary SNP calls for RM
      10w       1.7) Write human RM Report of Analysis
      24w       1.8) Process Human RM for release
                1.9) **Human RM officially released
      25w       1.10) Human Sequencing data integration
      20w       1.11) Human Validation
      48w       1.12) Human other characterization methods
      12w       1.13) Analyze validation data and refine sequencing calls
      40w       1.14) Develop pipeline for SVs and test
      8w        1.15) Write Human SRM Report of Analysis
      24w       1.16) Process Human SRM for release
                1.17) **Human SRM officially released
      10w       1.18) Procure local data storage
      10w       1.19) Procure Bioinformatics data analysis tools
      10w       1.20) Procure Automated sample prep instrumentation
      279w    2) Microbial RMs
      31w       2.1) Select/Procure microbial DNA for RMs
      124w      2.2) Microbial Primary Sequencing
      6w        2.3) Microbial Homogeneity assessment
      40w       2.4) Microbial Sequencing data integration
      10w         2.4.1) Mapping/Alignment
      12w         2.4.2) Variant calling
      12w         2.4.3) Form consensus variant calls
  • 6. Proposed Characterization Methods for Whole Genomes
      Whole Genome Sequencing
      •  ABI 5500 (1kb, 6kb, and 10kb mate-pair libraries)
      •  Illumina
      •  Complete Genomics
      •  Upcoming technologies?
         –  Ion Proton?
         –  Oxford Nanopore?
      Other
      •  Genotyping microarrays
      •  Array CGH
      •  Targeted sequencing
      •  Fosmid sequencing?
      •  Optical Mapping?
      •  3x replication of sequencing (3 library preps)
      Figure labels: Father, Mother, Husband, NA12878, Son, Daughter
  • 7. Integration of Existing Data to Form Consensus Genotype Calls
      1.  Find all possible variant sites
      2.  Find sites where all datasets agree
      3.  Identify sites with atypical characteristics signifying sequencing, mapping, or alignment bias
      4.  For each site, remove datasets with decreasingly atypical characteristics until all datasets agree
      5.  Even if all datasets agree, identify them as uncertain if few have typical characteristics
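The per-site logic of steps 4 and 5 on slide 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Call` record, the single `atypicality` score, and the `typical_cutoff` and `min_typical` parameters are all hypothetical stand-ins for whatever characteristics and thresholds the real integration uses.

```python
from dataclasses import dataclass


@dataclass
class Call:
    dataset: str        # name of the sequencing dataset (hypothetical labels)
    genotype: str       # e.g. "0/0", "0/1", "1/1"
    atypicality: float  # hypothetical score; higher means more atypical


def integrate_site(calls, typical_cutoff=1.0, min_typical=2):
    """Return (genotype, status) for one candidate variant site."""
    if not calls:
        return None, "uncertain"
    # Order datasets from most typical to most atypical.
    remaining = sorted(calls, key=lambda c: c.atypicality)
    # Step 4: drop the most atypical dataset until the remaining ones agree.
    while len({c.genotype for c in remaining}) > 1:
        remaining.pop()
    genotype = remaining[0].genotype
    # Step 5: agreement alone is not enough; require enough typical datasets.
    n_typical = sum(c.atypicality <= typical_cutoff for c in remaining)
    status = "confident" if n_typical >= min_typical else "uncertain"
    return genotype, status


site = [Call("A", "0/1", 0.2), Call("B", "0/1", 0.4), Call("C", "1/1", 3.0)]
print(integrate_site(site))  # ('0/1', 'confident')
```

In this sketch a site where only one dataset survives the removal loop is always reported as uncertain, because a single dataset cannot meet `min_typical=2`.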
  • 8. Consensus has lower FN rate than individual datasets
      Rows: HiSeq – GATK; columns: Illumina Omni SNP Array
                          Hom. Ref.       Heterozygous    Hom. Variant    Uncertain
      Hom. Ref./No Call   1.45M           7.24k (1.34%)   5.28k (0.65%)   N/A
      Heterozygous        196 (0.03%)     411k (60.7%)    133 (0.02%)     N/A
      Hom. Variant        154 (0.02%)     150 (0.02%)     249k (37.0%)    N/A
      Rows: Integrated Consensus Genotypes; columns: Illumina Omni SNP Array
                          Hom. Ref.       Heterozygous    Hom. Variant    Uncertain
      Hom. Ref./No Call   1.45M           613 (0.09%)     977 (0.15%)     N/A
      Heterozygous        241 (0.04%)     414k (61.5%)    173 (0.03%)     N/A
      Hom. Variant        152 (0.02%)     61 (0.01%)      249k (36.9%)    N/A
      Uncertain           5458 (0.81%)    3421 (0.51%)    4808 (0.71%)    N/A
      "FNs": Hom. Ref./No Call row, variant columns. "FPs*": Hom. Ref. column, variant rows.
      *  Note that most or all of the putative FPs seem to actually be FNs on the microarray
  • 9. SNP arrays overestimate performance
      Rows: HiSeq – GATK; columns: Illumina Omni SNP Array
                          Hom. Ref.       Heterozygous    Hom. Variant    Uncertain
      Hom. Ref./No Call   1.45M           7.24k (1.34%)   5.28k (0.65%)   N/A
      Heterozygous        196 (0.03%)     411k (60.7%)    133 (0.02%)     N/A
      Hom. Variant        154 (0.02%)     150 (0.02%)     249k (37.0%)    N/A
      Rows: HiSeq – GATK; columns: Integrated Consensus Genotypes
                          Hom. Ref.       Heterozygous    Hom. Variant    Uncertain
      Hom. Ref./No Call   1.52M           157k (4.68%)    30.3k (0.90%)   4.17M
      Heterozygous        47 (0.00%)      1.90M (56.4%)   34 (0.00%)      16.9k (0.50%)
      Hom. Variant        1 (0.00%)       298 (0.01%)     1.19M (35.3%)   73.3k (2.18%)
      "FNs": Hom. Ref./No Call row, variant columns. "FPs": Hom. Ref. column, variant rows.
  • 10. Samtools has higher FP and lower FN than GATK
      Rows: HiSeq – samtools; columns: Integrated Consensus Genotypes
                          Hom. Ref.       Heterozygous    Hom. Variant    Uncertain
      Hom. Ref./No Call   1.51M           49.6k (1.47%)   6.74k (0.20%)   3.93M
      Heterozygous        3141 (0.09%)    2.00M (59.6%)   74 (0.00%)      175k (5.19%)
      Hom. Variant        21 (0.00%)      777 (0.02%)     1.21M (36.0%)   192k (5.71%)
      Rows: HiSeq – GATK; columns: Integrated Consensus Genotypes
                          Hom. Ref.       Heterozygous    Hom. Variant    Uncertain
      Hom. Ref./No Call   1.52M           157k (4.68%)    30.3k (0.90%)   4.17M
      Heterozygous        47 (0.00%)      1.90M (56.4%)   34 (0.00%)      16.9k (0.50%)
      Hom. Variant        1 (0.00%)       298 (0.01%)     1.19M (35.3%)   73.3k (2.18%)
      "FNs": Hom. Ref./No Call row, variant columns. "FPs": Hom. Ref. column, variant rows.
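The comparison tables on slides 8 to 10 cross-tabulate one call set against another by genotype category. A minimal Python sketch of that bookkeeping is given below. The category names, the function names, and the rule that a site missing from either call set counts as Hom. Ref./No Call are assumptions made for illustration; the toy sites are invented and are not the slide data.

```python
from collections import Counter

VARIANT = ("het", "hom_var")  # hypothetical category labels


def cross_tabulate(test_calls, reference_calls):
    """Count (test, reference) category pairs over all sites seen in either set."""
    table = Counter()
    for site in set(test_calls) | set(reference_calls):
        # Assumption: a site with no call is treated as Hom. Ref./No Call.
        test_gt = test_calls.get(site, "hom_ref")
        ref_gt = reference_calls.get(site, "hom_ref")
        table[(test_gt, ref_gt)] += 1
    return table


def fn_fp_counts(table):
    """FN: test says hom_ref, reference is variant. FP: the reverse."""
    fn = sum(table[("hom_ref", r)] for r in VARIANT)
    fp = sum(table[(t, "hom_ref")] for t in VARIANT)
    return fn, fp


# Toy data, invented for illustration only.
test = {"s1": "het", "s2": "hom_ref", "s3": "hom_var", "s4": "het"}
reference = {"s1": "het", "s2": "het", "s3": "hom_var",
             "s4": "hom_ref", "s5": "uncertain"}

table = cross_tabulate(test, reference)
print(fn_fp_counts(table))  # (1, 1)
```

Sites whose reference category is "uncertain" are counted in the table but contribute to neither FN nor FP, which mirrors the separate Uncertain column in the slides.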
  • 11. Performance Metrics: Characteristics of Mis-calls
      Figure: grid of Consensus Genotypes (Hom. Ref., Heterozygous, Hom. Variant, Uncertain) against HiSeq/GATK calls (Hom. Ref./No call, Heterozygous, Hom. Variant); characteristics shown: QUAL/Depth of Coverage, Strand Bias, ...
  • 12. Challenges with assessing performance
      •  All variant types are not equal
      •  Nearby variants are often difficult to align
      •  All regions of the genome are not equal
         –  Homopolymers, STRs, duplications
         –  Can be similar or different in different genomes
      •  Labeling difficult variants as “uncertain” in the Reference Material leads to higher apparent accuracy when assessing performance
      •  Genotypes fall in 3+ categories (not positive/negative)
      •  It’s important to consider data from multiple platforms and library preparations when characterizing a Reference Material
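The point on slide 12 about "uncertain" labels raising apparent accuracy can be illustrated with a small worked example. The sketch below uses invented numbers (1000 sites, 50 of them difficult) and a hypothetical `apparent_accuracy` helper; it only shows the arithmetic and makes no claim about the real data.

```python
def apparent_accuracy(sites, exclude_uncertain):
    """Fraction of scored sites called correctly, optionally skipping uncertain ones."""
    scored = [s for s in sites if not (exclude_uncertain and s["uncertain"])]
    return sum(s["correct"] for s in scored) / len(scored)


# Invented toy numbers: 950 easy sites (945 correct), 50 difficult sites (30 correct).
sites = (
    [{"correct": True, "uncertain": False}] * 945
    + [{"correct": False, "uncertain": False}] * 5
    + [{"correct": True, "uncertain": True}] * 30
    + [{"correct": False, "uncertain": True}] * 20
)

print(f"{apparent_accuracy(sites, exclude_uncertain=False):.3f}")  # 0.975
print(f"{apparent_accuracy(sites, exclude_uncertain=True):.3f}")   # 0.995
```

Excluding the difficult sites removes most of the errors from the numerator and only a few sites from the denominator, so the same call set looks more accurate.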