This document summarizes a presentation about assembling the major histocompatibility complex (MHC) region of the human genome. It discusses the importance of accurately phasing HLA genes in the MHC region for organ transplantation matching. It describes using long reads, trio sequencing data, and other techniques to generate "perfect" haplotig assemblies of the MHC region with fully phased HLA genes. It acknowledges some remaining challenges, such as resolving repeats and integrating assembly-based and mapping-based variant calls to create the most accurate reference. The goal is to solve the complex MHC puzzle at scale using long-read technologies to create a next-generation MHC database.
19. Integrating assembly- and mapping-based calls gives best MHC benchmark
• MHC assembly-based bed includes 23187 variants in the MHC region, excluding:
  • CYP21A2 and pseudogene
  • Homopolymers >10bp
  • SVs in assembly
  • Very dense variants
• v4.0 mapping-based bed includes 13964 variants in the MHC region, excluding:
  • Short read callsets
  • Conflicts between callers
  • SVs from all methods
  • Homopolymers >10bp
  • Many clusters of variants, including some HLA genes
• Only 11 differences between assembly- and mapping-based calls in both beds
  • 2 genotyping errors in assembly-based
  • 1 inaccurate complex allele and cluster of 8 missed variants in mapping-based
• Merged benchmark includes 23229 variants in the MHC region Mbp
  • Covers most HLA genes and CYP21A2/TNXA/TNXB
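Both beds exclude homopolymers longer than 10 bp. As a rough illustration of how such an exclusion could be derived from a reference sequence, here is a minimal sketch. It is not the pipeline used in the presentation: the function name `homopolymer_intervals`, the `chrom` and `offset` parameters, and the toy sequence are all hypothetical, and the output uses the usual BED convention of 0-based, half-open intervals.

```python
from itertools import groupby


def homopolymer_intervals(seq, min_len=11, chrom="chr6", offset=0):
    """Return BED-style (chrom, start, end) tuples for single-base runs.

    min_len=11 corresponds to "homopolymers >10bp".
    chrom and offset are illustrative assumptions: the MHC lies on
    chromosome 6, and offset would be the start of seq on that chromosome.
    """
    intervals = []
    pos = 0
    # groupby yields one group per maximal run of identical characters
    for base, run in groupby(seq.upper()):
        length = sum(1 for _ in run)
        if base in "ACGT" and length >= min_len:
            intervals.append((chrom, offset + pos, offset + pos + length))
        pos += length
    return intervals


# Toy sequence: a 12 bp A run (reported) and a 10 bp T run (not reported)
toy = "ACGT" + "A" * 12 + "CG" + "T" * 10 + "G"
print(homopolymer_intervals(toy))
```

Intervals found this way would then be subtracted from the benchmark bed, so that variants inside long homopolymers are not counted for or against a callset.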
Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
     None              13899          13549         10          4     0.9993       0.9997     0.9995
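The summary columns in this table can be reproduced from the raw counts. The sketch below assumes the usual convention for this style of comparison output: precision is computed from true positives counted in the call set, sensitivity from true positives counted in the baseline, and the F-measure is their harmonic mean. The function name `comparison_metrics` is hypothetical.

```python
def comparison_metrics(tp_baseline, tp_call, fp, fn):
    """Compute precision, sensitivity and F-measure from comparison counts.

    Assumed convention: precision uses call-side true positives,
    sensitivity uses baseline-side true positives.
    """
    precision = tp_call / (tp_call + fp)
    sensitivity = tp_baseline / (tp_baseline + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, f_measure


# Counts taken from the table row above
p, s, f = comparison_metrics(tp_baseline=13899, tp_call=13549, fp=10, fn=4)
print(f"{p:.4f} {s:.4f} {f:.4f}")  # prints "0.9993 0.9997 0.9995"
```

Under that assumption the counts reproduce the reported values, which is a quick consistency check on the table.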
These variants are fully phased through the MHC region too!
9265 new variants over the MHC region (23229 in the merged benchmark vs. 13964 in the v4.0 mapping-based bed).