Graph and assembly strategies for the MHC and ribosomal DNA regions
1. Graph and assembly strategies for the MHC and ribosomal DNA regions
Alexander Dilthey
2. The MHC is the zebrafish of the genome!
(model region)
3. PRGs: Population Reference Graphs
• Simple: acyclic, directed (sub-class of general variation graphs)
• Usually built from an MSA; preserve gap positions
(i.e. global homology between input sequences).
• Generative model: recombination
• Ploidy well-defined (0, 1, 2)
[Figure: example PRG with alternative branches for substitutions and gap characters (_)]
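The construction on this slide can be illustrated with a minimal sketch. It assumes the simplest possible PRG, one node per alignment column boundary with one edge per distinct symbol in the column, so every path through the graph is a recombinant of the input sequences. The function names (`build_prg`, `count_paths`, `spell`) are hypothetical and not taken from the talk; the MSA rows are the toy exonic MSA shown on slide 10.

```python
from collections import Counter

GAP = "_"  # gap symbol as used in the slides

def build_prg(msa):
    """Return one Counter per MSA column: edge label -> number of supporting input sequences.

    Assumption: a single node between consecutive columns, so the graph is
    acyclic and directed, and gap edges keep the column (homology) structure.
    """
    if len({len(row) for row in msa}) != 1:
        raise ValueError("all MSA rows must have the same length")
    return [Counter(column) for column in zip(*msa)]

def count_paths(prg):
    """Number of distinct start-to-end paths (all recombinants of the inputs)."""
    total = 1
    for level in prg:
        total *= len(level)
    return total

def spell(path):
    """Sequence spelled by a path: edge labels with gap edges dropped."""
    return "".join(symbol for symbol in path if symbol != GAP)

# Toy exonic MSA from slide 10 (T*01:01 to T*01:06), spaces removed.
msa = [
    "__ACGTACT__",
    "CAACATACT__",
    "__ACGCGCT__",
    "__ATCCGCTAC",
    "__ATCCCCT__",
    "___CCTACT__",
]

prg = build_prg(msa)
print(len(prg), "levels,", count_paths(prg), "paths")
```

Six input sequences already generate many more paths than inputs, which is the recombination model in action; a real PRG would typically keep several nodes per level to limit which recombinants are allowed.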
4. Outline
• Quick recap:
What we know about the utility of graph genome approaches
• New results:
Haplotyping in hypervariable regions (HLA)
Pseudo graph alignment
• De novo assembly of ribosomal DNA
5. In most of the MHC, single-reference approaches work just fine…
[Figure: number of kmers (millions), recovered vs. not recovered, for PGF reference, Platypus, PRG-Viterbi and PRG-Mapped]
+ long-read validation with consistent results (not shown)
Dilthey et al., Nature Genetics 2015
6. … graph genomes outperform in the most complex sub-region of the MHC …
Dilthey et al., Nature Genetics 2015
7. … remaining problems driven by incomplete input haplotypes + algorithmics.
[Figure: aligned kmers; axes: chromotype position (kb) and read position (kb)]
Incomplete input haplotypes:
Large uncharacterized inversion
Algorithmics:
Incorrect HLA haplotyping.
Dilthey et al., Nature Genetics 2015
8. HLA haplotyping
• Hypothesis: whole-genome sequencing data contains the information
necessary for accurate HLA typing
• "HLA typing" → HLA gene exon sequences
• HLA class I: exons 2 and 3
• HLA class II: exon 2
• Challenge: align reads to the right gene (homology hell).
• Proper read-to-graph alignment instead of k-mers.
9. Class I exon homology
Exon 2 Exon 3
HLA-A 3284 alleles
HLA-B 4077 alleles
HLA-C 2799 alleles
10. Approach: deep PRG + mapping
Exonic MSA
T*01:01 _ _ A C G T A C T _ _
T*01:02 C A A C A T A C T _ _
T*01:03 _ _ A C G C G C T _ _
T*01:04 _ _ A T C C G C T A C
T*01:05 _ _ A T C C C C T _ _
T*01:06 _ _ _ C C T A C T _ _
Genomic MSA
T*01:01 A G C A _ _ A C G T A C T _ _ C C T A
T*01:02 A C C A C A A C A T A C T _ _ C C T A
T*01:04 _ T T A _ _ A T C C G C T A C C C T A
8 xMHC reference haplotypes
PGF (with T*01:01) A C T A G C A _ _ A C G T A C T _ _ C C T A T G A
MANN (with T*01:04) T T T _ T T A _ _ A T C C G C T A C C C T A T G A
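1) Gene-only PRG: 46 (pseudo) genes, mostly HLA
[Figure: gene-only PRG layout. Gene 1, Gene 2 and Gene 3 are separated by |--NNN--| spacers; each gene spans Padding, UTR, Exon 1, Intron 1, Exon 2, UTR, Padding. A second track shows the number of reference sequences along the gene and marks the region covered by 'genomic' sequences.]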
2) Varying numbers of input sequences across PRG
3) Use hierarchical MSA approach to combine in
11. Approach: deep PRG + mapping
[Figure: "Level 1" PRG with an aligned read; edges carry nucleotide or gap (_) labels, including one ambiguous position {G, C}]
4) Seed-and-extend paired-end mapping to PRG
5) Likelihood-based inference: maximize L( aligned reads | HLA types )
(independently per locus)
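Step 5 can be sketched for a single locus as follows. This is an illustration under stated assumptions, not the model used in the talk: reads are taken as already aligned to fixed, ungapped allele coordinates, each base is wrong with an assumed probability `err`, and a read is equally likely to come from either of the two alleles. All names (`read_given_allele`, `log_likelihood`, `call_locus`) are hypothetical; the toy alleles are the ungapped middle columns of the exonic MSA on slide 10.

```python
import math
from itertools import combinations_with_replacement

def read_given_allele(read, start, allele, err=0.01):
    """P(read | allele) under an assumed independent per-base error model."""
    p = 1.0
    for offset, base in enumerate(read):
        p *= (1.0 - err) if allele[start + offset] == base else err / 3.0
    return p

def log_likelihood(reads, allele_1, allele_2, err=0.01):
    """log L(aligned reads | diploid genotype), each read drawn 50/50 from either allele."""
    total = 0.0
    for start, read in reads:
        p1 = read_given_allele(read, start, allele_1, err)
        p2 = read_given_allele(read, start, allele_2, err)
        total += math.log(0.5 * p1 + 0.5 * p2)
    return total

def call_locus(reads, alleles, err=0.01):
    """Return the allele pair that maximizes the likelihood (exhaustive search)."""
    pairs = combinations_with_replacement(sorted(alleles), 2)
    return max(
        pairs,
        key=lambda pair: log_likelihood(reads, alleles[pair[0]], alleles[pair[1]], err),
    )

# Toy alleles: columns 3 to 9 of the slide-10 exonic MSA (no gaps in these rows).
alleles = {
    "T*01:01": "ACGTACT",
    "T*01:02": "ACATACT",
    "T*01:03": "ACGCGCT",
    "T*01:04": "ATCCGCT",
    "T*01:05": "ATCCCCT",
}

# Hypothetical error-free reads (start position, sequence) from a T*01:01 / T*01:04 sample.
reads = [(0, "ACGT"), (3, "TACT"), (1, "CGTA"), (0, "ATCC"), (3, "CGCT"), (2, "CCGC")]

print(call_locus(reads, alleles))
```

Exhaustive enumeration of allele pairs is quadratic in the number of alleles, which is workable per locus; running loci independently, as the slide states, keeps the search from becoming a joint problem across genes.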
12. High-quality WGS data enables gold-standard
accuracy
(of note: 2/3 original discrepancies with validation data were errors in the validation data!)
16. Conclusion (intermediate)
• If the input sequencing data is "good enough", we manage near-perfect
haplotyping in the genome's most polymorphic region
• Effective fragment length likely the most important factor
• Not-so-good sequencing data: joint haplotyping + alignment
(i.e. alignment location is not independent of inferred haplotype)
• Read mapping implementation SLOW
25. Read error vs variation
… from whole-genome data?
Long reads → de Bruijn graph Technology!
6% > 50k
26. Summary
• Variation graphs are worth the effort, at least in highly complex regions.
• Evidence: MHC "model system"
+ overall improvement of genome inference accuracy
+ complex-locus haplotyping
• Incorporate LD?
• Middle ground between full graph alignment and linear sequence
alignment?
• Ribosomal DNA: let me know if you're also interested!
27. Acknowledgements
NIH
Adam Phillippy
Sergey Koren
Brian Walenz
Jung-Hyun Kim
Vladimir Larionov
Oxford
Gil McVean
Zam Iqbal
Alexander Mentzer
Histogenetics
Nezih Cereb
UCSF/Nantes
Pierre-Antoine Gourraud
GSK
Matt Nelson
Charles Cox