Sample Characterization

Michael A. Eberle
GiaB, January 2014
Pedigree including NA12878

[Figure: pedigree diagram. Grandparents: 12889, 12890, 12891, 12892. Parents: 12877, NA12878. Children: 12879, 12880, 12881, 12882, 12883, 12884, 12885, 12886, 12887, 12888, 12893.]

All 17 members sequenced to at least 50x depth (PCR-Free protocol)
Variants are called across the pedigree using different software & technology
Inheritance information provides high-confidence, direct validation of variant calls

Why sequence a pedigree?

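[Figure: pedigree diagram; each member is labeled with genotypes at six variant positions, and haplotypes are color-coded. Genotype labels shown: AA CT AC GT TG AA; AG CC AA GT TT AA; AG TT CC TG GT AC; AG TC CA TT GT AA; GG TC CA GT TT CA; AG CC AA TT TT AA; AG CT AC GG TT AC]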

With a sufficiently large pedigree, the transmission of the parental chromosomes can be determined unambiguously.

[Figure: two further members with genotype labels AG TT CC TG GT AC and AG CC AA GT TT AA]

Error: T in blue haplotype should be G
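
The idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce the slides: it assumes the parental alleles are already phased and that each child's transmission vector (which paternal and which maternal haplotype it inherited) has been inferred from neighbouring sites. The function names, sample names, and genotypes are all hypothetical.

```python
def expected_genotype(father_haps, mother_haps, transmission):
    """Genotype implied by the phased parental alleles and a transmission vector.

    father_haps / mother_haps: (allele on haplotype 0, allele on haplotype 1)
    transmission: (index of paternal haplotype inherited, index of maternal one)
    """
    pat, mat = transmission
    return "".join(sorted(father_haps[pat] + mother_haps[mat]))


def find_conflicts(father_haps, mother_haps, transmissions, observed):
    """Children whose observed genotype disagrees with the inherited haplotypes.

    Returns {child: (observed genotype, expected genotype)}.
    """
    conflicts = {}
    for child, vec in transmissions.items():
        expected = expected_genotype(father_haps, mother_haps, vec)
        if "".join(sorted(observed[child])) != expected:
            conflicts[child] = (observed[child], expected)
    return conflicts


# Hypothetical site: father is G|T, mother is T|T.
father_haps, mother_haps = ("G", "T"), ("T", "T")
# Hypothetical transmission vectors, as if inferred from neighbouring sites.
transmissions = {"sib1": (0, 0), "sib2": (1, 0), "sib3": (0, 1)}
observed = {"sib1": "GT", "sib2": "TT", "sib3": "TT"}

print(find_conflicts(father_haps, mother_haps, transmissions, observed))
# {'sib3': ('TT', 'GT')}
```

In this made-up example the call TT for sib3 is a valid Mendelian outcome of a GT x TT cross, so the parents and sib3 alone would not expose it; it conflicts only with the haplotype that sib3 is known to have inherited.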
Why sequence a pedigree?

Either parent could also be TT

[Figure: pedigree diagram with genotype labels at six variant positions: AG CC AA GT TT AA; AG TT CC TG GT AC; AA CT AC GT TG AA; AG TC CA TT GT AA; GG TC CA GT TT CA; AG CC AA TT TT AA]

If only the trio were sequenced, this error would not be detected.
When sequencing a trio, we can never eliminate alternative genotypes in some of the samples.

[Figure: genotypes for a trio at the example positions; annotation on one genotype: "Could also be GG or GT"]
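
The trio ambiguity described on this slide can be illustrated with a short Python sketch. It is only an illustration under simple assumptions: one biallelic site, unphased genotypes written in sorted allele order, and a plain Mendelian check. The trio calls and helper names are hypothetical.

```python
from itertools import combinations_with_replacement, product


def consistent(father, mother, child):
    """True if the child can take one allele from each parent (unphased)."""
    target = sorted(child)
    return any(sorted(f + m) == target for f, m in product(father, mother))


def alternatives(trio, alleles="GT"):
    """For each trio member, the other genotypes that would also be
    Mendelian-consistent if that member's call were replaced.

    Genotypes in `trio` must be written in sorted allele order (e.g. "GT").
    """
    candidates = ["".join(p) for p in combinations_with_replacement(alleles, 2)]
    result = {}
    for role in ("father", "mother", "child"):
        options = []
        for gt in candidates:
            if gt == trio[role]:
                continue  # skip the call that was actually made
            trial = dict(trio, **{role: gt})
            if consistent(trial["father"], trial["mother"], trial["child"]):
                options.append(gt)
        result[role] = options
    return result


trio = {"father": "GT", "mother": "GT", "child": "GT"}  # hypothetical calls
print(alternatives(trio))
# {'father': ['GG', 'TT'], 'mother': ['GG', 'TT'], 'child': ['GG', 'TT']}
```

Every member of this hypothetical trio has two alternative genotypes that pass the same check, so inheritance alone cannot confirm any of the three calls.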
A large pedigree identifies most errors
Can identify a single error in >99.7% of the variant positions (11 sibs)

[Figure: % Sites Perfectly Constrained (percent, 0 to 100) versus # Siblings (1 to 11)]

“Perfectly constrained” means that the genotype information of any sample could be removed and imputed from the phasing and the other samples' genotypes.
More sibs add confidence to more variant calls.
2 sibs allow phasing and identify errors in 25% of variant positions.
A trio never positively identifies the genotypes in every sample.
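
One simple model that is consistent with the percentages quoted on this slide is sketched below. It is an assumption, not something the slides state: a site is counted as constrained when each of the four parental haplotypes is inherited by at least one sibling, with every sibling receiving either haplotype of a parent with probability 1/2, independently. The function name is hypothetical.

```python
def fraction_constrained(n_sibs: int) -> float:
    """Probability that all four parental haplotypes reach at least one sibling.

    Assumes independent 50/50 transmission from each parent to each sibling
    (a modelling assumption, not taken from the slides).
    """
    if n_sibs < 1:
        raise ValueError("need at least one sibling")
    # For one parent, both haplotypes are seen unless every sibling
    # inherited the same one: 1 - 2 * (1/2)**n_sibs.
    per_parent = 1.0 - 2.0 ** (1 - n_sibs)
    return per_parent ** 2


for n in range(1, 12):
    print(f"{n:2d} sibs: {100 * fraction_constrained(n):5.1f}%")
```

Under this assumption the value is 0% for a single child (a trio), 25% for 2 sibs, about 76.6% for 4 sibs, and about 99.8% for 11 sibs.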
Cost to add more siblings

[Figure: % Sites Perfectly Constrained (percent, 0 to 100) versus # Siblings (1 to 11), with annotations "1 Trio of Sequencing" and "2 Trios of Sequencing / 4 sibs"]
Understanding conflicts in the pedigree

Somatic/cell-line deletions on chr22

[Figure: # Errors tracks (errors per 50 kb, 0 to 300), one labeled "Errors in NA12878 & NA12893", and a Normalized Depth track (0 to 4); scale bar: 1 Mb]

None of the other children carry this deletion (though noise may indicate mosaicism).
Read counts for the haplotypes inferred in NA12878 at the location of the cell-line deletion (200x depth)

[Figure: Fraction (0.00 to 0.10) versus Allele Counts (0 to 200) for the maternal haplotype (NA12892) and the paternal haplotype (NA12891)]

• Inferred the two haplotypes in NA12878 based on the other samples
• Counts represent the predicted heterozygous locations
Technical replicates validate de novo SNVs
1843 (~96%) replicate the original call
82 (~4%) did not replicate

[Figure: bar chart of Total Conflicts (0 to 4000) by Results in Tech. Rep.; annotation: "FPs?"]
Thoughts on selecting the next samples for sequencing

Identify and sequence pedigrees with multiple siblings
– WGS every individual in the pedigree to identify haplotype transmission vectors
– One “high quality” family (2 parents & 4 sibs) provides a “better” reference than two lower quality trios for the same amount of sequencing
– Technical replicates allow alternative validation of biologically interesting calls, e.g. de novo mutations, gene conversion, etc.

Choose one or two samples to target for long reads if sequencing-limited
– Sequencing both parents will provide 100% of the variants in the pedigree, though with four children only ~75% will be validated in the children
– Sequencing a child will guarantee that every variant has been sequenced in at least one of the parents, though the child will only contain ~50% of the variants in the family

Quality of the DNA is important
– CEPH pedigree shows many cell line artifacts that are correctly genotyped but deviate from inheritance
– Cell line artifacts complicate the analysis
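
The ~75% and ~50% figures quoted above for long-read targeting can be checked by brute-force enumeration. The sketch below rests on one reading of those bullets, which the slide does not spell out: a parental variant counts as validated in the children when the haplotype carrying it is inherited by at least one child, and all transmission patterns are equally likely. The function name is hypothetical.

```python
from itertools import product


def validated_fraction(n_children: int) -> float:
    """Fraction of equally likely transmission patterns in which every one of
    the four parental haplotypes is inherited by at least one child."""
    # Each child inherits paternal haplotype 0 or 1 and maternal haplotype 0 or 1.
    per_child = list(product((0, 1), repeat=2))
    total = covered = 0
    for pattern in product(per_child, repeat=n_children):
        total += 1
        paternal = {pat for pat, _ in pattern}
        maternal = {mat for _, mat in pattern}
        if len(paternal) == 2 and len(maternal) == 2:
            covered += 1
    return covered / total


print(validated_fraction(4))  # 0.765625

# A single child carries 2 of the family's 4 parental haplotypes.
print(2 / 4)  # 0.5
```

With four children, 196 of the 256 possible patterns cover all four parental haplotypes, which is about 77% and in line with the ~75% on the slide.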

140127 measurements for rm characterization wg summary
