Graph and assembly strategies for the
MHC and ribosomal DNA regions
Alexander Dilthey
The MHC is the zebrafish of the genome!
(model region)
PRGs – Population Reference Graphs
• Simple: acyclic, directed (sub-class of general variation graphs)
• Usually built from MSA, preserve gap positions
(i.e. global homology between input sequences).
• Generative model: Recombination
• Ploidy well-defined (0, 1, 2)
[Figure: example PRG with alternative bases and gap characters (_) at variant positions]
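The PRG properties listed above can be illustrated with a minimal sketch, assuming one graph level per MSA column and gap characters kept as ordinary symbols. The function names `build_prg` and `is_path` are hypothetical and this is not the MHC-PRG implementation; the two toy sequences are the ones used later in the deck.

```python
from collections import defaultdict


def build_prg(msa):
    """Build a level-structured, acyclic, directed graph from an MSA.

    Returns (levels, edges): levels[i] is the sorted list of symbols seen in
    MSA column i (the gap '_' is kept as a symbol), and edges[i] maps
    (symbol in column i, symbol in column i + 1) to the number of input
    sequences that take this transition.
    """
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all MSA rows must have the same length")
    levels = [sorted({row[i] for row in msa}) for i in range(length)]
    edges = [defaultdict(int) for _ in range(length - 1)]
    for row in msa:
        for i in range(length - 1):
            edges[i][(row[i], row[i + 1])] += 1
    return levels, edges


def is_path(levels, edges, seq):
    """True if seq (gaps included) only uses edges present in the graph."""
    if len(seq) != len(levels) or seq[0] not in levels[0]:
        return False
    return all((seq[i], seq[i + 1]) in edges[i] for i in range(len(seq) - 1))


msa = ["AACAC_TTT", "TTCACGTTT"]
levels, edges = build_prg(msa)

# A recombinant of the two inputs (prefix of the first, suffix of the second)
# is also a path through the graph, in line with the recombination model.
recombinant = "AACACGTTT"
print(levels[0], levels[5])
print(is_path(levels, edges, recombinant))
```

Because edges only ever connect column i to column i + 1, the graph is acyclic by construction, and any walk from the first to the last level spells a mosaic of the input sequences.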
Outline
• Quick recap:
What we know about the utility of graph genome approaches
• New results:
Haplotyping in hypervariable regions (HLA)
Pseudo graph alignment
• De novo assembly of ribosomal DNA
In most of the MHC, single-reference
approaches work just fine…
[Figure: number of kmers (millions), recovered vs. not recovered, for PGF reference, Platypus, PRG-Viterbi and PRG-Mapped]
+ long-read validation with consistent results (not shown)
Dilthey et al., Nature Genetics 2015
… graph genomes outperform in the most
complex sub-region of the MHC …
Dilthey et al., Nature Genetics 2015
… remaining problems driven by incomplete
input haplotypes + algorithmics.
[Figure: aligned kmers; axes: chromotype position (kb) and read position (kb)]
Incomplete input haplotypes:
Large uncharacterized inversion
Algorithmics:
Incorrect HLA haplotyping.
Dilthey et al., Nature Genetics 2015
HLA haplotyping
• Hypothesis: Whole-genome sequencing data contains the information
necessary for accurate HLA typing
• “HLA typing” → HLA gene exon sequences
• HLA class I: exons 2 and 3
• HLA class II: exon 2
• Challenge: align reads to the right gene – homology hell.
• Proper read-to-graph alignment instead of k-mers.
Class I exon homology
Exon 2 Exon 3
HLA-A 3284 alleles
HLA-B 4077 alleles
HLA-C 2799 alleles
Approach: deep PRG + mapping
Exonic MSA
T*01:01 _ _ A C G T A C T _ _
T*01:02 C A A C A T A C T _ _
T*01:03 _ _ A C G C G C T _ _
T*01:04 _ _ A T C C G C T A C
T*01:05 _ _ A T C C C C T _ _
T*01:06 _ _ _ C C T A C T _ _
Genomic MSA
T*01:01 A G C A _ _ A C G T A C T _ _ C C T A
T*01:02 A C C A C A A C A T A C T _ _ C C T A
T*01:04 _ T T A _ _ A T C C G C T A C C C T A
8 xMHC reference haplotypes
PGF (with T*01:01) A C T A G C A _ _ A C G T A C T _ _ C C T A T G A
MANN (with T*01:04) T T T _ T T A _ _ A T C C G C T A C C C T A T G A
1) Gene-only PRG – 46 (pseudo) genes, mostly HLA
[Figure: Gene 1 |--NNN--| Gene 2 |--NNN--| Gene 3; per-gene layout: Padding, UTR, Exon 1, Intron 1, Exon 2, UTR, Padding; y-axis: number of reference sequences; region covered by 'genomic' sequences marked]
2) Varying numbers of input sequences across PRG
3) Use hierarchical MSA approach to combine in
Approach: deep PRG + mapping
[Figure: a read aligned to the PRG; graph levels numbered 1 to 26, with alternative alleles, e.g. {G, C}, and gap characters (_) at variable levels]
4) Seed-and-extend paired-end mapping to PRG
5) Likelihood-based inference: maximize L( aligned reads | HLA types )
(independently per locus)
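Step 5 can be sketched as follows, assuming each read contributes an equal mixture of its likelihoods under the two alleles of a candidate pair. This mixture model, the function name `best_genotype`, and the numbers are illustrative assumptions and not necessarily the model used in HLA*PRG; the allele names are the toy names from the MSA above.

```python
import math
from itertools import combinations_with_replacement


def best_genotype(read_likelihoods):
    """Pick the allele pair that maximizes the likelihood of the aligned reads.

    read_likelihoods maps allele name -> list of P(read_j | allele), with the
    same read order for every allele. All probabilities must be > 0.
    The log-likelihood of a pair (a1, a2) is
        sum_j log(0.5 * P(read_j | a1) + 0.5 * P(read_j | a2)).
    """
    alleles = sorted(read_likelihoods)
    best_pair, best_ll = None, -math.inf
    for a1, a2 in combinations_with_replacement(alleles, 2):
        ll = sum(
            math.log(0.5 * p1 + 0.5 * p2)
            for p1, p2 in zip(read_likelihoods[a1], read_likelihoods[a2])
        )
        if ll > best_ll:
            best_pair, best_ll = (a1, a2), ll
    return best_pair, best_ll


# Hypothetical per-read likelihoods for four reads at one locus.
likelihoods = {
    "T*01:01": [0.9, 0.9, 0.01, 0.01],
    "T*01:02": [0.1, 0.1, 0.1, 0.1],
    "T*01:04": [0.01, 0.01, 0.9, 0.9],
}
pair, log_likelihood = best_genotype(likelihoods)
print(pair, round(log_likelihood, 2))
```

Running this once per locus mirrors the "independently per locus" note; the exhaustive search over pairs is quadratic in the number of alleles, which is acceptable for a sketch but would need pruning for thousands of alleles.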
High-quality WGS data enables gold-standard
accuracy
(of note: 2/3 original discrepancies with validation data were errors in the validation data!)
… but not from exome, MiSeq data
Sequencing error?
Effective fragment length? [2 x read length + IS]
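The bracketed formula can be written out as a one-line helper, assuming "IS" denotes the insert size between the two reads of a pair. The function name and the example numbers are hypothetical and do not come from the datasets in the talk.

```python
def effective_fragment_length(read_length, insert_size):
    """Effective fragment length of a read pair: 2 x read length + IS."""
    return 2 * read_length + insert_size


# Hypothetical library: 2 x 100 bp reads separated by a 150 bp insert.
print(effective_fragment_length(100, 150))
```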
Conclusion (intermediate)
• If the input sequencing data is “good enough”, we manage near-perfect
haplotyping in the genome's most polymorphic region
• Effective fragment length likely the most important factor
• Not-so-good sequencing data: joint haplotyping + alignment
(i.e. alignment location is not independent of inferred haplotype)
• Read mapping implementation SLOW
Pseudo graph mapping
Input sequences
Graph
Align short reads to input sequences...
... transpose onto graph
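The "transpose onto graph" step can be sketched as a coordinate lookup, assuming the graph has one level per MSA column and the read was aligned without gaps to one ungapped input sequence. The function names are hypothetical; indel handling is left to the scrubbing, cutting and cleaning steps.

```python
def seq_to_msa_columns(aligned_row, gap="_"):
    """Map each ungapped position of an input sequence to its 0-based MSA column."""
    return [col for col, ch in enumerate(aligned_row) if ch != gap]


def transpose_alignment(aligned_row, read_start, read_length, gap="_"):
    """Project a gap-free linear read alignment onto MSA (graph) columns.

    read_start is the 0-based start of the read on the ungapped input sequence.
    Returns the MSA column for every read base.
    """
    cols = seq_to_msa_columns(aligned_row, gap)
    if read_start < 0 or read_start + read_length > len(cols):
        raise ValueError("read extends beyond the input sequence")
    return cols[read_start:read_start + read_length]


# Input sequence "AACAC_TTT" is "AACACTTT" without gaps. A 5 bp read aligned
# at position 2 of the ungapped sequence skips MSA column 5 (the gap).
print(transpose_alignment("AACAC_TTT", read_start=2, read_length=5))
```

The lookup itself is a constant-time list slice per read, which is consistent with the appeal of reusing a fast linear aligner and only translating coordinates afterwards.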
Scrubbing, cutting, cleaning
Input MSA         Lin. alignment    MSA coor.     Scrubbed
     123456789                      123456X789    123456789
Seq1 AACAC_TTT    Seq1 AACAC_TTT    AACAC__TTT    AACAC_TTT
Seq2 TTCACGTTT    Read AACACGTTT    AACAC_GTTT    AACACGTTT

Graph TTCAC {-, G} TTT
Scrubbing: get rid of INDEL-induced changes in the alignment coordinate system
Cutting: Examine alignment gap structure; cut in “bad” areas; use longest stretch
Cleaning: Find the best gap-less sequence-to-graph alignment + extension with gaps
Graph alignment
123456789
Graph AACACGTTT
Seq1 AACACGTTT
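The scrubbing example above (the read's inserted G moving from the extra column X into MSA column 6) can be reproduced with a small sketch. It assumes one simple rule: an inserted base that directly follows an MSA column in which the read has a gap is moved into that column. The function name `scrub` and this rule are assumptions for illustration; the actual procedure may handle more cases.

```python
def scrub(alignment, gap="_"):
    """Remove insertion-induced extra columns where an MSA column can absorb them.

    alignment is an ordered list of (msa_column, read_symbol) pairs; msa_column
    is None for an extra column created by an insertion in the read (the 'X'
    column). Inserted bases that cannot be absorbed are left unchanged.
    """
    scrubbed = []
    for col, sym in alignment:
        can_absorb = (
            col is None
            and scrubbed
            and scrubbed[-1][0] is not None
            and scrubbed[-1][1] == gap
        )
        if can_absorb:
            prev_col, _ = scrubbed[-1]
            scrubbed[-1] = (prev_col, sym)  # move the base into the gap column
        else:
            scrubbed.append((col, sym))
    return scrubbed


# "MSA coor." column of the example: columns 1-6, X, 7-9; read AACAC_GTTT.
columns = [1, 2, 3, 4, 5, 6, None, 7, 8, 9]
read_in_msa_coordinates = "AACAC_GTTT"
result = scrub(list(zip(columns, read_in_msa_coordinates)))
print("".join(sym for _, sym in result))
print([col for col, _ in result])
```

After scrubbing, every read base sits on a regular MSA column, so the read can be compared to graph levels directly.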
Accuracy slightly worse; fast!
Conclusion: perhaps there is a middle ground between graph and linear sequence
alignment. Work in progress. Further tuning?
Highest resolution

                            MHC-PRG-2                        HLA*PRG
Cohort         Locus  N     Inferred  Accuracy  Call Rate    Inferred  Accuracy  Call Rate
Platinum Trio  A      6     6         1.00      1.00         6         1.00      1.00
Platinum Trio  B      6     6         1.00      1.00         6         1.00      1.00
Platinum Trio  C      6     6         1.00      1.00         6         1.00      1.00
Platinum Trio  DQA1   6     6         1.00      1.00         6         1.00      1.00
Platinum Trio  DQB1   6     6         1.00      1.00         6         1.00      1.00
Platinum Trio  DRB1   6     6         1.00      1.00         6         1.00      1.00
1000 Genomes   A      22    22        0.86      1.00         22        1.00      1.00
1000 Genomes   B      22    22        1.00      1.00         22        1.00      1.00
1000 Genomes   C      22    22        1.00      1.00         22        1.00      1.00
1000 Genomes   DQA1   12    12        1.00      1.00         12        1.00      1.00
1000 Genomes   DQB1   22    22        1.00      1.00         22        1.00      1.00
1000 Genomes   DRB1   22    22        0.91      1.00         22        0.95      1.00
Towards additional high-quality reference
haplotypes…
Remaining challenges: extreme repeats, haplotypes.
Sergey Koren
Ribosomal DNA
• Encodes ribosomal RNA
• Hundreds of copies
(tandem repeat arrays)
• Variation poorly characterized
• Step 1: Targeted approach
• Step 2: WGS-based
• Step 3: Variation graph
Read error vs variation
… from whole-genome data?
Long reads → de Bruijn graph Technology!
6% > 50k
Summary
• Variation graphs are worth the effort – at least in highly complex regions.
• Evidence: MHC “model system”
+ overall improvement of genome inference accuracy
+ complex-locus haplotyping
• Incorporate LD?
• Middle ground between full graph alignment and linear sequence
alignment?
• Ribosomal DNA – let me know if you're also interested!
Acknowledgements
NIH
Adam Phillippy
Sergey Koren
Brian Walenz
Jung-Hyun Kim
Vladimir Larionov
Oxford
Gil McVean
Zam Iqbal
Alexander Mentzer
Histogenetics
Nezih Cereb
UCSF/Nantes
Pierre-Antoine Gourraud
GSK
Matt Nelson
Charles Cox

  • 3. PRGs – Population Reference Graphs • Simple: acyclic, directed (sub-class of general variation graphs) • Usually built from MSA, preserve gap positions (i.e. global homology between input sequences). • Generative model: Recombination • Ploidy well-defined (0, 1, 2) [Figure: example PRG]
  • 4. Outline • Quick recap: What we know about the utility of graph genome approaches • New results: Haplotyping in hypervariable regions (HLA) Pseudo graph alignment • De novo assembly of ribosomal DNA
  • 5. In most of the MHC, single-reference approaches work just fine… [Figure: number of kmers (millions), recovered vs. not recovered, for PGF reference, Platypus, PRG-Viterbi and PRG-Mapped] + long-read validation with consistent results (not shown) Dilthey et al., Nature Genetics 2015
  • 6. … graph genomes outperform in the most complex sub-region of the MHC … Dilthey et al., Nature Genetics 2015
  • 7. … remaining problems driven by incomplete input haplotypes + algorithmics. [Figure: aligned kmers; axes: chromotype position (kb), read position (kb)] Incomplete input haplotypes: Large uncharacterized inversion. Algorithmics: Incorrect HLA haplotyping. Dilthey et al., Nature Genetics 2015
  • 8. HLA haplotyping • Hypothesis: Whole-genome sequencing data contains the information necessary for accurate HLA typing • “HLA typing” → HLA gene exon sequences • HLA class I: exons 2 and 3 • HLA class II: exon 2 • Challenge: align reads to the right gene – homology hell. • Proper read-to-graph alignment instead of k-mers.
  • 9. Class I exon homology [Figure: exon 2 and exon 3 homology across HLA-A (3284 alleles), HLA-B (4077 alleles) and HLA-C (2799 alleles)]
  • 10. Approach: deep PRG + mapping
        Exonic MSA
          T*01:01  _ _ A C G T A C T _ _
          T*01:02  C A A C A T A C T _ _
          T*01:03  _ _ A C G C G C T _ _
          T*01:04  _ _ A T C C G C T A C
          T*01:05  _ _ A T C C C C T _ _
          T*01:06  _ _ _ C C T A C T _ _
        Genomic MSA
          T*01:01  A G C A _ _ A C G T A C T _ _ C C T A
          T*01:02  A C C A C A A C A T A C T _ _ C C T A
          T*01:04  _ T T A _ _ A T C C G C T A C C C T A
        8 xMHC reference haplotypes
          PGF (with T*01:01)   A C T A G C A _ _ A C G T A C T _ _ C C T A T G A
          MANN (with T*01:04)  T T T _ T T A _ _ A T C C G C T A C C C T A T G A
        1) Gene-only PRG – 46 (pseudo) genes, mostly HLA
          Gene 1 |--NNN--| Gene 2 |--NNN--| Gene 3
          Per gene: Padding, UTR, Exon 1, Intron 1, Exon 2, UTR, Padding
        2) Varying numbers of input sequences across PRG
          [Figure: number of reference sequences along a gene; region covered by 'genomic' sequences]
        3) Use hierarchical MSA approach to combine in
  • 11. Approach: deep PRG + mapping [Figure: aligned read placed on PRG levels 1 to 26] 4) Seed-and-extend paired-end mapping to PRG 5) Likelihood-based inference: maximize L( aligned reads | HLA types ) (independently per locus)
  • 12. High-quality WGS data enables gold-standard accuracy (of note: 2/3 original discrepancies with validation data were errors in the validation data!)
  • 13. … but not from exome, MiSeq data
  • 15. Effective fragment length? [2 × read length + insert size (IS)]
  • 16. Conclusion (intermediate) • If the input sequencing data is "good enough", we manage near-perfect haplotyping in the genome's most polymorphic region • Effective fragment length likely the most important factor • Not-so-good sequencing data: joint haplotyping + alignment (i.e. alignment location is not independent of inferred haplotype) • Read mapping implementation SLOW
  • 18. Pseudo graph mapping Input sequences Graph
  • 19. Pseudo graph mapping Input sequences Graph Align short reads to input sequences...
  • 20. Pseudo graph mapping Input sequences Graph Align short reads to input sequences... ... transpose onto graph
  • 21. Scrubbing, cutting, cleaning
        Input MSA          Lin. alignment     MSA coor.     Scrubbed
             123456789                        123456X789    123456789
        Seq1 AACAC_TTT     Seq1 AACAC_TTT     AACAC__TTT    AACAC_TTT
        Seq2 TTCACGTTT     Read AACACGTTT     AACAC_GTTT    AACACGTTT
        [Figure: graph; node labels TTCAC, G, -, TTT]
        Graph alignment
              123456789
        Graph AACACGTTT
        Seq1  AACACGTTT
        Scrubbing: get rid of INDEL-induced changes in the alignment coordinate system
        Cutting: examine alignment gap structure; cut in "bad" areas; use longest stretch
        Cleaning: find the best gap-less sequence-to-graph alignment + extension with gaps
  • 22. Accuracy slightly worse; fast!
        Highest resolution
                                    MHC-PRG-2                       HLA*PRG
        Cohort         Locus  N     Inferred  Accuracy  Call Rate   Inferred  Accuracy  Call Rate
        Platinum Trio  A      6     6         1.00      1.00        6         1.00      1.00
        Platinum Trio  B      6     6         1.00      1.00        6         1.00      1.00
        Platinum Trio  C      6     6         1.00      1.00        6         1.00      1.00
        Platinum Trio  DQA1   6     6         1.00      1.00        6         1.00      1.00
        Platinum Trio  DQB1   6     6         1.00      1.00        6         1.00      1.00
        Platinum Trio  DRB1   6     6         1.00      1.00        6         1.00      1.00
        1000 Genomes   A      22    22        0.86      1.00        22        1.00      1.00
        1000 Genomes   B      22    22        1.00      1.00        22        1.00      1.00
        1000 Genomes   C      22    22        1.00      1.00        22        1.00      1.00
        1000 Genomes   DQA1   12    12        1.00      1.00        12        1.00      1.00
        1000 Genomes   DQB1   22    22        1.00      1.00        22        1.00      1.00
        1000 Genomes   DRB1   22    22        0.91      1.00        22        0.95      1.00
        Conclusion: perhaps there is a middle ground between graph and linear sequence alignment. Work in progress. Further tuning?
  • 23. Towards additional high-quality reference haplotypes… Remaining challenges: extreme repeats, haplotypes. Sergey Koren
  • 24. Ribosomal DNA • Encodes ribosomal RNA • Hundreds of copies (tandem repeat arrays) • Variation poorly characterized • Step 1: Targeted approach • Step 2: WGS-based • Step 3: Variation graph
  • 25. Read error vs variation … from whole-genome data? Long reads → de Bruijn graph. Technology! 6% > 50k
  • 26. Summary • Variation graphs are worth the effort – at least in highly complex regions. • Evidence: MHC "model system" + overall improvement of genome inference accuracy + complex-locus haplotyping • Incorporate LD? • Middle ground between full graph alignment and linear sequence alignment? • Ribosomal DNA – let me know if you're also interested!
  • 27. Acknowledgements • NIH: Adam Phillippy, Sergey Koren, Brian Walenz, Jung-Hyun Kim, Vladimir Larionov • Oxford: Gil McVean, Zam Iqbal, Alexander Mentzer • Histogenetics: Nezih Cereb • UCSF/Nantes: Pierre-Antoine Gourraud • GSK: Matt Nelson, Charles Cox
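
The "transpose onto graph" and "scrubbing" steps of slides 20 and 21 can be illustrated with a small Python sketch. It is not the implementation behind the talk: the function names (`msa_column_map`, `transpose`, `scrub`) are hypothetical, and the scrubbing rule used here (move an inserted run into the MSA columns skipped between its flanks when the run fills them exactly) is a simplifying assumption chosen only to reproduce the slide 21 example.

```python
GAP = "_"  # gap symbol used in the slides

def msa_column_map(msa_row):
    """Return the 1-based MSA column of every non-gap character in an MSA row."""
    return [col for col, ch in enumerate(msa_row, start=1) if ch != GAP]

def transpose(ref_aln, read_aln, col_map):
    """Project a pairwise (linear) read-to-sequence alignment onto MSA columns.

    ref_aln / read_aln are the two gapped rows of the pairwise alignment.
    Returns (read_char, column) pairs; column is None where the read has an
    insertion relative to the input sequence.
    """
    projected = []
    ref_pos = 0  # index into the ungapped input sequence
    for ref_ch, read_ch in zip(ref_aln, read_aln):
        if ref_ch == GAP:
            projected.append((read_ch, None))
        else:
            projected.append((read_ch, col_map[ref_pos]))
            ref_pos += 1
    return projected

def scrub(projected):
    """Assumed rule: if an inserted run exactly fills the MSA columns skipped
    between its flanking aligned positions, place it in those columns."""
    out = list(projected)
    i = 0
    while i < len(out):
        if out[i][1] is not None:
            i += 1
            continue
        j = i
        while j < len(out) and out[j][1] is None:
            j += 1  # j is the first position after the inserted run
        if i > 0 and j < len(out):
            left, right = out[i - 1][1], out[j][1]
            if right - left - 1 == j - i:
                for k in range(i, j):
                    out[k] = (out[k][0], left + 1 + (k - i))
        i = j
    return out

if __name__ == "__main__":
    # Slide 21 example: Seq1 carries a gap at MSA column 6, the read carries a G there.
    col_map = msa_column_map("AACAC_TTT")
    projected = transpose("AACAC_TTT", "AACACGTTT", col_map)
    print(projected)
    print(scrub(projected))
```

In the slide 21 example, the read's extra G first falls between MSA columns 5 and 7 (the "X" column in the "MSA coor." panel); after scrubbing it occupies column 6, where Seq2 also carries a G, so the read sits in the original coordinate system of the graph.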