A brief history of gnomAD
Daniel MacArthur
April 11, 2019
Given known mutation rates, it is almost certain that every possible single base change compatible with life exists in a living human
The power of seven billion people
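A rough back-of-envelope check of the opening claim. This is a sketch under assumed, rounded inputs that do not come from the talk: an SNV rate of about 1.2e-8 per base per generation and a haploid genome of about 3.2 Gb. It counts only the de novo mutations carried by people alive today. It ignores inherited variants, selection, and the large variation in mutation rate between sequence contexts, so it gives intuition rather than the talk's own calculation.

```python
# Back-of-envelope check of "every possible single base change ... exists
# in a living human". All constants are rounded, assumed values.
import math

MUTATION_RATE = 1.2e-8   # assumed SNV rate per base per generation
HAPLOID_GENOME = 3.2e9   # approximate haploid genome length in bp
POPULATION = 7e9         # "seven billion people"

# Each person carries new mutations on two haploid genomes.
de_novos_per_person = 2 * HAPLOID_GENOME * MUTATION_RATE

# Each base can change to any of three other bases.
possible_snvs = 3 * HAPLOID_GENOME

# Average number of living people carrying each possible SNV as a de novo.
mean_occurrences = POPULATION * de_novos_per_person / possible_snvs

# Treating occurrences as Poisson: chance a given SNV never arose de novo.
# A site mutating 10x more slowly than average is shown for contrast.
p_absent_average = math.exp(-mean_occurrences)
p_absent_slow_site = math.exp(-mean_occurrences / 10)

print(f"de novo SNVs per person: {de_novos_per_person:.0f}")
print(f"mean de novo carriers per possible SNV: {mean_occurrences:.0f}")
print(f"P(absent), average site: {p_absent_average:.1e}")
print(f"P(absent), 10x slower site: {p_absent_slow_site:.3f}")
```

Under these assumptions, each possible SNV arises de novo dozens of times among living people. Even a site that mutates ten times more slowly than average is very unlikely to be missing entirely.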
Opportunities and challenges of genetic data aggregation
Over three million exomes and genomes sequenced
Challenges:
• difficulty of moving data
• inadequate consent and data use permissions
• objections to data sharing
• inconsistent processing and variant calling
A history of genome data aggregation at Broad
Exome Aggregation Consortium (ExAC)
• 60,076 exomes
• began in 2012, first release Oct 2014
• preprint in Oct 2015
• published in May 2016
Genome Aggregation Database (gnomAD)
• 125,748 exomes and 15,708 genomes
• began in 2016, first release Oct 2016
• final publication data release Oct 2018
• 7 preprints in Jan-March 2019
gnomAD 2.1 samples
• Data provided by 109 PIs for 141,456 individuals, including 125,748 exomes & 15,708 whole genomes
• Primarily from case-control studies of complex adult-onset diseases (e.g. type 2 diabetes, heart attack, neuropsychiatric conditions)
• Removed low-quality samples, related individuals, and known severe pediatric disease cases plus their first-degree relatives
• Diverse range of ancestries (57% European; over 10,000 samples each from South Asian, Latino, African/African-American, and East Asian populations)
gnomAD’s impact
• 20.8M pageviews of the ExAC and gnomAD browsers, by 230,000 users from 166 countries
• Aided in the diagnosis of over 50,000 rare disease families
• 3,962 papers have cited the ExAC paper
The gnomAD preprints on bioRxiv
http://broad.io/gnomad_lof
http://broad.io/gnomad_drugs
http://broad.io/gnomad_lrrk2
http://broad.io/tx_annotation
http://broad.io/gnomad_mnv
http://broad.io/gnomad_uorfs
http://broad.io/gnomad_sv
Thank you
Production team
Eric Banks
Charlotte Tolonen
Christopher Llanwarne
David Roazen
Diane Kaplan
Gordon Wade
Jeff Gentry
Jose Soto
Kathleen Tibbetts
Kristian Cibulskis
Laura Gauthier
Louis Bergelson
Miguel Covarrubias
Nikelle Petrillo
Ruchi Munshi
Sam Novod
Thibault Jeandet
Valentin Ruano-Rubio
Yossi Farjoun
Analysis team
Konrad Karczewski
Laurent Francioli
Grace Tiao
Kristen Laricchia
Anne O'Donnell-Luria
Ben Neale
Beryl Cummings
Eric Minikel
Irina Armean
James Ware
Kaitlin Samocha
Mark Daly
Nicola Whiffin
Qingbo Wang
Ryan Collins
Cotton Seed
Tim Poterba
Arcturus Wang
Chris Vittal
Structural Variation team
Ryan Collins
Harrison Brand
Konrad Karczewski
Laurent Francioli
Nick Watts
Matthew Solomonson
Xuefang Zhao
Laura Gauthier
Harold Wang
Chelsea Lowther
Mark Walker
Christopher Whelan
Ted Brookings
Ted Sharpe
Jack Fu
Eric Banks
Michael Talkowski
Website team
Matthew Solomonson
Nick Watts
Ben Weisburd
Konrad Karczewski
Ethics team
Andrea Saltzman
Molly Schleicher
Namrata Gupta
Stacey Donnelly
Broad Genomics Platform
Stacey Gabriel
Kristen Connolly
Steven Ferriera
Funding
NIGMS R01 GM104371 (PI: MacArthur)
NIDDK U54 DK105566 (PIs: MacArthur and Neale)
NHGRI U24 HG010262 (PI: Philippakis)
NIMH R56 MH115957 (PI: Talkowski)
The vast majority of the data storage, computing resources, and human effort used to generate this call set were donated by the Broad Institute
Coordination
Jessica Alföldi
Principal Investigators
Daniel MacArthur
Aarno Palotie
Andres Metspalu
Anne Remes
Adolfo Correa
Andre Franke
Ann Pulver
Ben Glaser
Ben Neale
Bong-Jo Kim
Bruce Cohen
Carlos Pato
Carlos A Aguilar Salinas
Christina Hultman
Christine M. Albert
Christopher Haiman
Clicerio Gonzalez
Colin Palmer
Craig Hanis
Dan Roden
Dan Turner
Dana Dabelea
Daniel Chasman
Danish Saleheen
David Altshuler
David Goldstein
Dawood Darbar
Dermot McGovern
Diego Ardissino
Donald Bowden
Dost Ongur
Emelia J. Benjamin
Erkki Vartiainen
Erwin Bottinger
Gad Getz
George Kirov
Gil Atzmon
Harlan M. Krumholz
Harry Sokol
Heribert Schunkert
Hilkka Soininen
Hugh Watkins
Jaakko Kaprio
Jaana Suvisaari
James Meigs
James Ware
James Wilson
Jaspal Kooner
Jaume Marrugat
Jeanette Erdmann
Jeremiah Scharf
John Barnard
John Chambers
John D. Rioux
Jose Florez
Josée Dupuis
Judy Cho
Juliana Chan
Kari Mattila
Kyong Soo Park
Laurent Beaugerie
Leif Groop
Lorena Orozco
Lori Bonnycastle
Maija Wessman
Mark Daly
Mark McCarthy
Markku Laakso
Martti Färkkilä
Matthew Bown
Matthew Harms
Matti Holi
Michael Boehnke
Michael O'Donovan
Michael Owen
Mikko Hiltunen
Mikko Kallela
Mina Chung
Ming Tsuang
Moore Shoemaker
Nazneen Rahman
Nilesh Samani
Olle Melander
Pamela Sklar
Patrick T. Ellinor
Patrick Sullivan
Peter Nilsson
Ramnik Xavier
Ravindranath Duggirala
Rinse Weersma
Roberto Elosua
Ronald Ma
Ruth Loos
Ruth McPherson
Samuli Ripatti
Sekar Kathiresan
Seppo Koskinen
Soo Heon Kwak
Stephen Glatt
Steve McCarroll
Steven A. Lubitz
Subra Kugathasan
Tai Shyong
Tariq Ahmad
Teresa Tusie Luna
Terho Lehtimäki
Tim Spector
Tõnu Esko
Tiinamaija Tuomi
Veikko Salomaa
Yik Ying Teo
Young Jin Kim
Jerome Rotter
Steven Rich
Variation across 141,456 individuals reveals the spectrum of loss-of-function intolerance of the human genome
Konrad Karczewski
April 11, 2019
@konradjk
broad.io/gnomad_lof
Range of LoF impact
[Figure: range of LoF impact, with labels embryonic lethal, recessive disease, non-essential, complex disease, beneficial, haploinsufficient disease]
Identifying true LoF variants is challenging
• LoFs are rare
• LoFs are enriched for artifacts
Staggering amounts of variation
[Figure: number of synonymous, missense, and pLoF variants observed (y-axis, 0 to 5,000,000) vs. sample size (x-axis, 0 to 125,748)]
• gnomAD contains 230M variants in 15,708 genomes and 15M variants in 125,748 exomes
Staggering amounts of pLoFs
• Of these variants, we observe 515,326 predicted loss-of-function (pLoF) variants: stop-gained, essential splice, and frameshift indel variants
[Figure: number of pLoF variants observed (y-axis, 0 to 400,000) vs. sample size (x-axis, 0 to 125,748)]
LOFTEE removes benign variation
• LOFTEE, a LoF filtering plugin for VEP
• Variants retained by LOFTEE are rarer, and thus more deleterious
• After filtering, we discover 443,769 high-confidence pLoFs in gnomAD
https://github.com/konradjk/loftee
[Figure: MAPS (mutability-adjusted proportion of singletons; y-axis, 0.00 to 0.15) for synonymous, missense, low-confidence pLoF, and high-confidence pLoF variants; higher MAPS means rarer, more deleterious]
Detecting genes depleted for pLoFs
• Mutational model that predicts the number of SNVs in a given functional class we would expect to see in each gene in a cohort
• Now incorporating methylation, improved coverage correction, LOFTEE
• Previously transformed into the probability of LoF intolerance (pLI)
• Applying to 125,748 gnomAD exomes
• Median of 17.3 pLoFs expected per gene
• Direct estimate of observed/expected ratio
Kaitlin Samocha
(Samocha et al. 2014; Lek et al. 2016)
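The slide describes the mutational model only at a high level. As a simplified illustration of how such a model turns mutation rates into expected counts, the sketch below calibrates per-gene summed mutation probabilities against observed synonymous counts with a single least-squares scale factor through the origin. It then applies that factor to the pLoF mutation rates. All inputs and function names are hypothetical. The real model works per sequence context and, as the slide notes, also accounts for methylation and coverage, so this is not that model.

```python
# Simplified sketch of a mutational-model calibration. Hypothetical inputs:
# per-gene sums of per-site mutation probabilities for each functional class,
# and observed synonymous counts used to calibrate them.


def fit_scale(summed_mu_syn, observed_syn):
    """Least-squares slope through the origin: observed ~ k * summed mu."""
    num = sum(m * o for m, o in zip(summed_mu_syn, observed_syn))
    den = sum(m * m for m in summed_mu_syn)
    return num / den


def expected_counts(summed_mu, k):
    """Expected variant counts per gene for any functional class."""
    return [k * m for m in summed_mu]


mu_syn = [1.0, 2.0, 4.0]   # hypothetical summed synonymous mutation rates (arbitrary units)
obs_syn = [10, 20, 40]     # hypothetical observed synonymous counts
k = fit_scale(mu_syn, obs_syn)
mu_lof = [0.5, 0.2]        # hypothetical summed pLoF mutation rates for two genes
print(k, expected_counts(mu_lof, k))
```

Calibrating on synonymous variants reflects the idea that they are largely neutral, so their observed counts show how mutation rate translates into observed variation in the cohort. Dividing a gene's observed pLoF count by its expected count then gives the observed/expected ratio.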
Most genes are depleted of LoF variation
gene     phenotype                       class        observed  expected  obs/exp (CI)
MED13L   severe intellectual disability  synonymous   462       465       0.993 (0.92-1.07)
MED13L   severe intellectual disability  pLoF         0         102       0 (0-0.029)
FNDC3B   unknown                         synonymous   271       266       1.02 (0.92-1.13)
FNDC3B   unknown                         pLoF         0         68        0 (0-0.043)
• Many are extremely depleted (<20% observed compared to expected)
• Including most known (curated) haploinsufficient genes
• Using the upper bound of the confidence interval corrects for small genes
[Figure: number of genes by pLoF observed/expected ratio (x-axis, 0.0 to 1.5) and by LOEUF (x-axis, 0.0 to 2.0)]
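The slides do not say how the observed/expected confidence interval is computed. Below is a minimal sketch under stated assumptions: the observed count is Poisson with mean expected × ratio, the ratio has a flat prior on a grid from 0 to 2, and the interval runs from the 5th to the 95th percentile. The function name and grid settings are our own choices, not gnomAD's code. The upper bound is the LOEUF-style number.

```python
import math


def oe_confidence_interval(observed, expected, lower_q=0.05, upper_q=0.95,
                           grid_max=2.0, step=0.001):
    """Point estimate and grid-based interval for the observed/expected
    ratio, assuming observed ~ Poisson(expected * ratio)."""
    grid = [i * step for i in range(int(round(grid_max / step)) + 1)]

    def loglik(x):
        # Poisson log-likelihood up to a constant (log(observed!) dropped).
        mean = expected * x
        if mean == 0:
            return 0.0 if observed == 0 else -math.inf
        return observed * math.log(mean) - mean

    logs = [loglik(x) for x in grid]
    peak = max(logs)  # subtract the peak so exp() cannot overflow
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)

    # Walk the normalised cumulative distribution to find both quantiles.
    cdf, lower, upper = 0.0, None, None
    for x, w in zip(grid, weights):
        cdf += w / total
        if lower is None and cdf >= lower_q:
            lower = x
        if upper is None and cdf >= upper_q:
            upper = x
            break
    return observed / expected, lower, upper


oe, lower, upper = oe_confidence_interval(observed=0, expected=102)
print(oe, lower, round(upper, 3))  # prints 0.0 0.0 0.029
```

For MED13L's pLoF row in the table (0 observed, 102 expected), this gives an interval of 0 to 0.029, matching the table. For a small gene with 0 observed and only 5 expected, the point estimate is still 0, but the upper bound is roughly 0.6. This is why ranking on the upper bound, rather than the point estimate, keeps small genes from looking spuriously constrained.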
Resolving the spectrum of LoF intolerance
• Binning this spectrum into deciles
[Figure: percent of each gene list (haploinsufficient, autosomal recessive, olfactory genes; y-axis, 0% to 40%) in each LOEUF decile; lower deciles are more depleted/more constrained, higher deciles more tolerant/less constrained]
• Known haploinsufficient genes have ~10% of the expected pLoFs
• Autosomal recessive genes are centered around 60% of expected
• Some genes, e.g. olfactory receptors, are unconstrained
Gene lists from: Blekhman et al., 2008; Berg et al., 2013
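The decile plots can be reproduced in outline with the sketch below. It ranks genes by LOEUF, splits the ranks into ten equal bins (decile 0 = most constrained), and reports what percent of a curated gene list falls in each bin. The gene names and LOEUF values are hypothetical, and gnomAD's own binning may treat ties or bin edges differently.

```python
def loeuf_deciles(loeuf_by_gene):
    """Map each gene to a LOEUF decile, 0 (most constrained) to 9 (least),
    by ranking genes on LOEUF and splitting the ranks into ten equal bins."""
    ranked = sorted(loeuf_by_gene, key=loeuf_by_gene.get)
    n = len(ranked)
    return {gene: rank * 10 // n for rank, gene in enumerate(ranked)}


def percent_of_list_per_decile(deciles, gene_list):
    """Percent of a gene list (e.g. haploinsufficient genes) in each decile;
    genes without a LOEUF value are ignored."""
    members = [g for g in gene_list if g in deciles]
    if not members:
        return [0.0] * 10
    counts = [0] * 10
    for g in members:
        counts[deciles[g]] += 1
    return [100 * c / len(members) for c in counts]


# Hypothetical example: 20 genes with LOEUF values 0.1, 0.2, ..., 2.0.
loeuf = {f"gene{i}": 0.1 * (i + 1) for i in range(20)}
deciles = loeuf_deciles(loeuf)
print(percent_of_list_per_decile(deciles, ["gene0", "gene1", "gene19", "notInData"]))
```

Binning on ranks rather than fixed LOEUF cutoffs gives every decile the same number of genes. A gene list with no relationship to constraint would therefore sit near 10% in every bin.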
Constraint metrics reflect mouse and cellular knockout phenotypes
Qingbo Wang
[Figure: percent of mouse heterozygous-lethal knockout genes (y-axis, 0% to 30%) by LOEUF decile]
[Figure: percent of cell-essential and cell non-essential genes (y-axis, 0% to 25%) by LOEUF decile]
Hart et al., 2017; Eppig et al., 2015; Motenko et al., 2015
Constraint spectrum reflects patterns of structural variation
• Called SVs in 14,245 individuals with 30X WGS
• 9,860 rare (<1%), biallelic, autosomal LoF deletions
• Occurrence correlates with SNV constraint metric
Ryan Collins, Harrison Brand
[Figure: aggregate deletion SV observed/expected (y-axis, 0.0 to 1.4) by LOEUF decile]
broad.io/gnomad_sv
Constraint metrics are correlated with biological relevance
[Figure: protein-protein interactions; mean number of protein-protein interactions (y-axis, 5 to 25) by LOEUF decile]
[Figure: gene expression; number of tissues where the canonical transcript is expressed (y-axis, 0 to 30) by LOEUF decile]
Constraint improves rare disease diagnosis
• Patients with developmental delay/intellectual disability are 15X more likely to have a de novo LoF in a constrained gene
• 8,095 de novos in 5,305 cases
• 2,623 de novos in 2,179 controls
• Integrating expression data improves this further
[Figure: rate ratio for de novo variants in ID/DD cases compared to controls (y-axis, 0 to 15) by LOEUF decile, for synonymous and pLoF variants]
Jack Kosmicki, Beryl Cummings
broad.io/tx_annotation
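The rate ratio on this slide compares de novo variants per case with de novo variants per control. A minimal sketch follows, with an approximate 95% confidence interval from the standard log-scale Poisson standard error (our choice; the slide does not state its interval method). The only counts available on the slide are the pooled totals across all genes and variant classes, so the example does not reproduce the 15X figure. That figure needs per-decile pLoF counts that the slides do not give.

```python
import math


def rate_ratio(case_events, n_cases, control_events, n_controls, z=1.96):
    """Ratio of per-individual event rates in cases vs controls, with an
    approximate CI from the log-scale Poisson standard error."""
    rr = (case_events / n_cases) / (control_events / n_controls)
    se_log = math.sqrt(1 / case_events + 1 / control_events)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# Totals from the slide: all de novos, all genes, all variant classes pooled.
rr, lo, hi = rate_ratio(8095, 5305, 2623, 2179)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Applied to de novo pLoFs within a single LOEUF decile, the same function gives one point of the pLoF series in the figure.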
Constraint informs common disease etiologies
• Compared to the genome-wide background, SNPs near constrained genes are enriched in their contribution to the heritability of common traits
• In particular, traits that previously¹ showed an enrichment of ultra-rare variants are also enriched among constrained genes
[Figure: partitioning heritability enrichment (y-axis, 1.0 to 1.4) by LOEUF decile]
[Figure: trait enrichment p-value (10^-2 to 10^-14) for traits grouped by category (activities, cardiovascular, cognitive, environment, hematological, metabolic, nutritional, ophthalmological, psychiatric, reproduction, respiratory, skeletal, social interactions, other); labeled traits include schizophrenia, bipolar disorder, educational attainment, "Qualifications: College or University degree", and "Duration to first press of snap-button in each round"]
¹Ganna et al. 2018 AJHG
Andrea Ganna
Data publicly released with no publication restrictions
gnomad.broadinstitute.org
Matt Solomonson, Nick Watts
[Browser screenshot callouts: gene model with transcripts; pathogenic ClinVar variants; dataset selection box; tissue isoform expression (pext: broad.io/tx_annotation); constraint metrics]
Now featuring: structural variant calls in the browser
gnomad.broadinstitute.org
Acknowledgments
• Laurent Francioli
• Grace Tiao
• Beryl Cummings
• Jack Kosmicki
• Andrea Ganna
• Qingbo Wang
• Kaitlin Samocha, Ben Neale
• Daniel Birnbaum
• Jessica Alföldi, Kristen Laricchia
• Matt Solomonson, Nick Watts
• Ryan Collins, Harrison Brand
• Raymond K. Walters, Kate Tashman
• Daniel Rhodes, Moriel Singer-Berk, Eleina England, Eleanor G. Seaby
• Hail team: Tim Poterba, Cotton Seed, Arcturus Wang
• Laura Gauthier, Yossi Farjoun, Eric Banks
• Analytic and Translational Genetics Unit
• Mark Daly
• Daniel MacArthur
broad.io/gnomad_lof
Evaluating potential drug targets through human loss-of-function genetic variation
Eric Vallabh Minikel
April 11, 2019
@cureffi
broad.io/gnomad_drugs
Why study LoF variants in drug discovery?
• LoF variants can be an in vivo, whole-human, lifelong model of inhibition of a target.
from Plenge 2013, PMID: 23868113
• With caveats:
• drug effect may not exactly mimic LoF
• developmental effects
• tissue-specific effects
• dosage
• difference in our ancestors' environment vs. our environment
How do drug targets compare to all genes & specific gene lists in constraint?
How constrained are some well-known drug targets?
• 19% of all drug targets (N=73, including 53 targets of inhibitors, antagonists, etc.) have obs/exp < 13%, the average for haploinsufficient genes
• These include some chemotherapy targets, but also the targets of aspirin, statins, and antimuscarinics!
• Not all chemotherapy targets are so constrained
• Drug targets span the full spectrum: constraint alone should not rule a potential target in or out
Can we find and phenotype LoF individuals for a gene of interest?
• If you can find them, phenotyping of LoF individuals (het or hom) can be deeply informative for safety and/or efficacy
• Examples: PCSK9, APOC3, CETP, LPA, HAO1...
• Questions for today:
• Is it always realistic to expect to find enough LoF heterozygotes or homozygotes to permit your analysis of interest?
• What is the best strategy to go about finding them?
• How should you curate pLoF variants before starting to recontact?
Cumulative allele frequency of LoF variants
• Cumulative allele frequency (CAF) = Σ(AF) over all LoF variants
• Define p = proportion of the haplotypes in the population that are LoF
• In an outbred population:
• LoF het frequency = 2p(1-p)
• LoF hom / compound het frequency = p²
• This analysis:
• Use gnomAD data to compute p for each gene
• Predict the hom/compound het frequency for each gene in the population, assuming this genotype is not lethal (!)
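The CAF definitions above translate directly into code. This sketch treats all LoF alleles in a gene as a single LoF haplotype class with frequency p. Summing allele frequencies assumes LoF alleles rarely share a haplotype. The genotype frequencies are simple Hardy-Weinberg expectations for an outbred population. The allele frequencies in the example are hypothetical, and the helper names are illustrative.

```python
import math


def cumulative_allele_frequency(lof_allele_freqs):
    """CAF = sum of allele frequencies over all (curated) LoF variants in a
    gene. Summing assumes LoF alleles rarely share a haplotype."""
    return sum(lof_allele_freqs)


def outbred_genotype_frequencies(p):
    """Hardy-Weinberg expectations, treating any LoF allele in the gene as a
    single 'LoF haplotype' with frequency p."""
    het = 2 * p * (1 - p)
    hom_or_compound_het = p ** 2  # only realised if this genotype is viable
    return het, hom_or_compound_het


def one_in(freq):
    """Express a frequency as '1 in N' (infinite if never expected)."""
    return math.inf if freq == 0 else 1 / freq


# Hypothetical allele frequencies for two curated LoF variants in one gene.
p = cumulative_allele_frequency([0.0001, 0.00003])
het, hom = outbred_genotype_frequencies(p)
print(f"CAF = {p:.4%}")
print(f"LoF hets: 1 in {one_in(het):,.0f}")
print(f"LoF homs / compound hets: 1 in {one_in(hom):,.0f}")
```

With these made-up inputs, p = 0.013%, the same as HTT's post-curation CAF in this talk's before/after table. The heterozygote frequency comes out at about 1 in 3,850, close to the table's 1 in 3,800 for HTT. The slides do not say exactly how every prevalence in that table was derived, so this is only a consistency check for that one row.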
In outbred populations...
[Figure: of everyone, a fraction 2p(1-p) are LoF heterozygotes and p² are LoF homozygotes/compound heterozygotes]
What about bottlenecked populations?
Image: https://ecologyblog0112.weebly.com/population-genetics-in-the-conservation-of-biodiversity.html
In bottlenecked populations...
In consanguineous individuals...
[Figure: labels 2p(1-p) and 0.058p]
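One way to read the 0.058p label is our own interpretation, since the slide does not spell it out. With inbreeding coefficient F, the standard expected homozygote frequency is (1 - F)p² + Fp. For rare alleles this is approximately Fp, so 0.058p would correspond to an average F of about 0.058. For reference, offspring of first cousins have F = 1/16 ≈ 0.0625. The sketch below compares this with the outbred p² for a hypothetical CAF.

```python
def lof_hom_frequency(p, F=0.0):
    """Expected frequency of LoF homozygotes/compound hets given CAF p and
    inbreeding coefficient F: (1 - F) * p**2 from two independent haplotypes
    plus F * p from both copies being identical by descent."""
    return (1 - F) * p ** 2 + F * p


p = 0.001  # hypothetical cumulative LoF allele frequency
outbred = lof_hom_frequency(p)
for label, F in [("F = 0.058", 0.058), ("offspring of first cousins, F = 1/16", 1 / 16)]:
    hom = lof_hom_frequency(p, F)
    print(f"{label}: {hom:.2e}, {hom / outbred:.0f}x the outbred p**2 = {outbred:.0e}")
```

The gain grows as p shrinks, because Fp falls linearly while p² falls quadratically. This is why consanguineous individuals are the most productive place to look for homozygotes of rare LoF alleles.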
Which populations to sequence?
• For the near future, analysis for most genes will need to focus on heterozygotes, regardless of population
• For finding homozygotes, the best strategy is to sample diverse bottlenecked populations and consanguineous individuals
How to curate?
• "the more interesting something looks, the less likely it is to be real"
• Solutions:
• LOFTEE (Karczewski 2019, broad.io/gnomad_lof)
• Expression-aware annotation (Cummings 2019, broad.io/tx_annotation)
• Deep manual curation is still important
Non-random distribution of pLoFs across the coding sequence is suspicious
• Next up: examples of curation of 3 genes with different error modes
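The slides flag a non-random positional distribution as suspicious but do not name a test. One simple way to quantify it, which is our choice and not necessarily what the authors used, is a one-sample Kolmogorov-Smirnov D statistic. It compares pLoF positions along the coding sequence with a uniform spread. The positions and CDS length below are hypothetical.

```python
# Sketch: how far do pLoF positions depart from a uniform spread along
# the coding sequence? Positions and CDS length are hypothetical inputs.


def ks_uniform_statistic(positions, cds_length):
    """One-sample Kolmogorov-Smirnov D statistic comparing pLoF positions
    (1..cds_length) against a uniform distribution along the CDS."""
    xs = sorted(pos / cds_length for pos in positions)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # Largest gap between the empirical CDF and the uniform CDF F(x) = x,
        # checked just after and just before each observed position.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d


even = ks_uniform_statistic([10, 30, 50, 70, 90], cds_length=100)
clustered = ks_uniform_statistic([1, 2, 3, 4, 5], cds_length=100)
print(f"evenly spread D = {even:.2f}; clustered at 5' end D = {clustered:.2f}")
```

This version counts each distinct variant once. Weighting by allele frequency would instead ask where the cumulative frequency comes from, which is closer to the HTT exon 1 example. A large D only prompts curation. The PRNP case shows that clustering can also reflect real biology.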
HTT
• Cumulative pLoF allele frequency: 6.2%
• Mostly driven by several common variants in exon 1
• Highly suspicious given the lethal mouse knockout phenotype!
• Common LoFs are all alignment artifacts at polyQ and polyP repeat regions
• After filtering & curation, cumulative pLoF allele frequency: 0.013%
MAPT
• Cumulative pLoF allele frequency: 14%
• Almost all pLoFs are in exons not expressed in the brain!
• The remainder are various artifacts
• After filtering & curation, cumulative pLoF allele frequency: 0%
• Transcript-aware expression: see Cummings et al., broad.io/tx_annotation
PRNP
• Slightly constrained: 6 observed, 10 expected
• All in the N terminus or very near the C terminus
• N-terminal variants are true LoF. In the N terminus, PRNP is not constrained at all (obs/exp = 6/6.05 = 99%)
• C-terminal truncating variants cause disease through gain-of-function (literature variants added). The gnomAD C-terminal frameshift turns out to be a dementia case!
Very different answers before/after curation
• Even without recontact & phenotyping, curation can be highly informative
• But remember, even MAPT and SNCA might be great drug targets!
gene   CAF before   CAF after   LoF het prevalence   GoF disease prevalence
HTT    6.2%         0.013%      1 in 3,800           1 in 2,400-4,400
LRRK2  0.23%        0.09%       1 in 500             1 in 3,300
MAPT   14%          0%          not observed         1 in 5,000-31,000
PRNP   0.0035%      0.0021%     1 in 18,000          1 in 50,000
SNCA   0.0012%      0%          not observed         1 in 360,000
SOD1   0.0060%      0.0038%     1 in 26,000          1 in 27,000-83,000
Suggested guidelines for evaluating drug targets based on LoF
• It's complicated: no simple formula; evaluate each target on a case-by-case basis
• Filter and curate
• Consider positional distribution
• Calculate cumulative allele frequency
• Experimentally validate loss-of-function
• Don't eliminate a gene from consideration just because you can't find LoF individuals
• Read the preprint: broad.io/gnomad_drugs
Acknowledgments
• Contact: eminikel@broadinstitute.org / danmac@broadinstitute.org
• Funding: NIH F31 AI22592
• Many thanks to East London Genes & Health
• Thanks to co-authors: Konrad, Beryl, Nicky, Jessica; Stuart Schreiber; Hilary Martin, Richard Trembath, & David van Heel (ELGH); gnomAD consortium & production team
• FYI: Sonia & Eric's thesis defenses (primary prevention targeting PRNP) – April 16, 9:00a – 11:00a, Broad Auditorium
broad.io/gnomad_drugs
From LoF to phenotype: a pilot study using LRRK2
Nicky Whiffin
@nickywhiffin
Research fellow, Imperial College London
Irina Armean, Aaron Kleinman
broad.io/gnomad_lrrk2
GoF LRRK2 variants cause Parkinson’s
• Gain-of-function missense variants in LRRK2 cause early-onset Parkinson’s
• LRRK2 is over-activated in general Parkinson’s
• Multiple pharma companies are now pursuing LRRK2 inhibitors as a generalised Parkinson’s therapy
Early concerns for toxicity
• Early pre-clinical model organism studies: KO animals have lung, liver and renal phenotypes
• Is partial reduction of LRRK2 protein levels safe in humans?
Cohorts included
gnomAD v2.1: 141,456 sequenced individuals from case-control and cohort studies
• 633 LRRK2 LoF carriers (123 unique variants)
• after removing LOFTEE low-confidence variants: 348 carriers (117 unique variants)
• after manual curation: 255 carriers (111 unique variants)
• [Figure labels: exome sequencing; RNAseq; LRRK2 LoF carrier; homozygous reference]
23andMe: >4 million research-consented individuals, genotyped and imputed
• 8 genotyped or imputed LRRK2 LoF variants
• a subset of carriers for each variant sent for Sanger validation
• after removing variants with <5 validated carriers, variants that failed validation, and variants excluded by manual curation: 3 variants in 1,103 carriers (749 Sanger confirmed)
<5 homozygotes
pLoF variants are evenly spread across the protein...
...and are genuinely LoF
[Figure panels: lymphoblastoid cells from individuals with heterozygous LRRK2 LoF; CRISPR-edited embryonic stem cells differentiated into cardiomyocytes (p.Cys1313Ter, p.Arg1483Ter, p.Arg1693Ter); each compared with homozygous reference]
Jamie Marshall
A curated dataset of LRRK2 pLoF individuals
• 1,358 carriers of 111 pLoF variants
• Appear to be true LoF
But what effect do these have on human health?
No effect on overall mortality
Manual curation of gnomAD phenotype data
• 60 LRRK2 LoF carriers in gnomAD had available phenotype data
• From the Genomic Psychiatric Cohort, Pakistan Risk of Myocardial Infarction Study, Swedish Schizophrenia and Bipolar Studies, the FINRISK study, the BioMe Biobank, and the Estonian Biobank
• Very diverse sources, including EHRs and questionnaires
• Manually assessed for lung, liver, kidney, cardiovascular, nervous system, and immune system phenotypes, and for cancer
• No enrichment for any adverse phenotypes
• No sign of syndromic phenotypes
Jessica Alföldi
No differences across 77 serum biomarkers
[Figure: entire cohort vs. LRRK2 LoF carriers]
No association with any phenotypes in 23andMe
Key message for LRRK2 drug development
• ~1 in 550 humans has a heterozygous pLoF variant in LRRK2
• ~50% reduction in LRRK2 protein
• likely across all tissues throughout life
• No discernible negative impact across >1,100 carriers
• No effect on overall mortality
• No enrichment for any assessed phenotypes
• Suggests that partial LRRK2 inhibitors should be well-tolerated, even with chronic administration
• Demonstrates the power of large-scale genetics to assess tolerability for drug discovery
Acknowledgements
Irina Armean
Jamie Marshall
Eric Minikel
Konrad Karczewski
Beryl Cummings
Laurent Francioli
Kristen Laricchia
Qingbo Wang
James Ware
Jessica Alföldi
Daniel MacArthur
Aaron Kleinman
Anna Guan
Babak Alipanahi
Peter Morrison
the 23andMe Research Team
Paul Cannon
Genome Aggregation Database Production Group
Genome Aggregation Database Consortium
Marco Baptista
Kalpana Merchant
Aki Havulinna
Bozenna Iliadou
Jung-Jin Lee
Girish Nadkarni
Cole Whiteman
Mark Daly
Tõnu Esko
Christina Hultman
Ruth Loos
Lili Milani
Aarno Palotie
Carlos Pato
Michele Pato
Danish Saleheen
Patrick Sullivan

Mar Gonzales Porta, One gene One transcript, fged_seattle_2013Mar Gonzales Porta, One gene One transcript, fged_seattle_2013
Mar Gonzales Porta, One gene One transcript, fged_seattle_2013Functional Genomics Data Society
 
The emerging biodiversity data ecosystem
The emerging biodiversity data ecosystemThe emerging biodiversity data ecosystem
The emerging biodiversity data ecosystemCyndy Parr
 
Global ocean’s protist metabarcoding
Global ocean’s protist metabarcodingGlobal ocean’s protist metabarcoding
Global ocean’s protist metabarcodingEukRef
 
CGR PPT by DrPPS PRABHAKRA SASTRY-10-2023.pptx
CGR PPT by DrPPS PRABHAKRA SASTRY-10-2023.pptxCGR PPT by DrPPS PRABHAKRA SASTRY-10-2023.pptx
CGR PPT by DrPPS PRABHAKRA SASTRY-10-2023.pptxEEPD1
 
Molecular techniques for pathology research - MDX .pdf
Molecular techniques for pathology research - MDX .pdfMolecular techniques for pathology research - MDX .pdf
Molecular techniques for pathology research - MDX .pdfsabyabby
 

Similar to Variation across 141,456 individuals reveals the spectrum of loss-of-function intolerance of the human genome (20)

An integrated map of genetic variation from 1,092
An integrated map of genetic variation from 1,092An integrated map of genetic variation from 1,092
An integrated map of genetic variation from 1,092
 
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan DiseasesUsing In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
Using In Silico Tools in Repurposing Drugs for Neglected and Orphan Diseases
 
Supporting Genomics in the Practice of Medicine by Heidi Rehm
Supporting Genomics in the Practice of Medicine by Heidi RehmSupporting Genomics in the Practice of Medicine by Heidi Rehm
Supporting Genomics in the Practice of Medicine by Heidi Rehm
 
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
Towards Precision Medicine: Tute Genomics, a cloud-based application for anal...
 
Large Scale Epitope Identification Screen and Its Potential Application to th...
Large Scale Epitope Identification Screen and Its Potential Application to th...Large Scale Epitope Identification Screen and Its Potential Application to th...
Large Scale Epitope Identification Screen and Its Potential Application to th...
 
Studying the microbiome
Studying the microbiomeStudying the microbiome
Studying the microbiome
 
GIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant posterGIAB ASHG 2019 Structural Variant poster
GIAB ASHG 2019 Structural Variant poster
 
RDD Conf Day 1: Genomics for Rare Diseases Dr. Anna Lehman
RDD Conf Day 1: Genomics for Rare Diseases Dr. Anna LehmanRDD Conf Day 1: Genomics for Rare Diseases Dr. Anna Lehman
RDD Conf Day 1: Genomics for Rare Diseases Dr. Anna Lehman
 
The server of the Spanish Population Variability
The server of the Spanish Population VariabilityThe server of the Spanish Population Variability
The server of the Spanish Population Variability
 
2014 bangkok-talk
2014 bangkok-talk2014 bangkok-talk
2014 bangkok-talk
 
Sundaram et al. 2018 Presentation
Sundaram et al. 2018 PresentationSundaram et al. 2018 Presentation
Sundaram et al. 2018 Presentation
 
Human encodeproject
Human encodeprojectHuman encodeproject
Human encodeproject
 
MICROSATELITE Markers for LIVESTOCK Genetic DIVERSITY ANALYSES
MICROSATELITE Markers for LIVESTOCK Genetic DIVERSITY ANALYSESMICROSATELITE Markers for LIVESTOCK Genetic DIVERSITY ANALYSES
MICROSATELITE Markers for LIVESTOCK Genetic DIVERSITY ANALYSES
 
Mar Gonzales Porta, One gene One transcript, fged_seattle_2013
Mar Gonzales Porta, One gene One transcript, fged_seattle_2013Mar Gonzales Porta, One gene One transcript, fged_seattle_2013
Mar Gonzales Porta, One gene One transcript, fged_seattle_2013
 
The emerging biodiversity data ecosystem
The emerging biodiversity data ecosystemThe emerging biodiversity data ecosystem
The emerging biodiversity data ecosystem
 
Global ocean’s protist metabarcoding
Global ocean’s protist metabarcodingGlobal ocean’s protist metabarcoding
Global ocean’s protist metabarcoding
 
Micro array analysis
Micro array analysisMicro array analysis
Micro array analysis
 
CGR PPT by DrPPS PRABHAKRA SASTRY-10-2023.pptx
CGR PPT by DrPPS PRABHAKRA SASTRY-10-2023.pptxCGR PPT by DrPPS PRABHAKRA SASTRY-10-2023.pptx
CGR PPT by DrPPS PRABHAKRA SASTRY-10-2023.pptx
 
E-Rare project “RNA-ALS”
E-Rare project “RNA-ALS”E-Rare project “RNA-ALS”
E-Rare project “RNA-ALS”
 
Molecular techniques for pathology research - MDX .pdf
Molecular techniques for pathology research - MDX .pdfMolecular techniques for pathology research - MDX .pdf
Molecular techniques for pathology research - MDX .pdf
 

Recently uploaded

BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfWildaNurAmalia2
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptxGood agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptxSimeonChristian
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx023NiWayanAnggiSriWa
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》rnrncn29
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)itwameryclare
 

Recently uploaded (20)

BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptxGood agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)
 

Variation across 141,456 individuals reveals the spectrum of loss-of-function intolerance of the human genome

  • 1. A brief history of gnomAD Daniel MacArthur April 11, 2019
  • 2. The power of seven billion people • Given known mutation rates, it is almost certain that every possible single-base change compatible with life exists in a living human
  • 3. Opportunities and challenges of genetic data aggregation • Over three million exomes and genomes sequenced • Challenges: • difficulty of moving data • inadequate consent and data use permissions • objections to data sharing • inconsistent processing and variant calling
  • 4. A history of genome data aggregation at Broad • Exome Aggregation Consortium (ExAC): • 60,076 exomes • began in 2012, first release Oct 2014 • preprint in Oct 2015 • published in May 2016 • Genome Aggregation Database (gnomAD): • 125,748 exomes and 15,708 genomes • began in 2016, first release Oct 2016 • final publication data release Oct 2018 • 7 preprints in Jan-March 2019
  • 5. gnomAD 2.1 samples • Data provided by 109 PIs for 141,456 individuals, including 125,748 exomes and 15,708 whole genomes • Primarily from case-control studies of complex adult-onset diseases (e.g. type 2 diabetes, heart attack, neuropsychiatric conditions) • Removed low-quality samples, related individuals, and known severe pediatric disease cases plus their first-degree relatives • Diverse range of ancestries (57% European; roughly 10,000 or more samples each from South Asian, Latino, African/African-American, and East Asian populations)
  • 6. gnomAD’s impact • 20.8 M pageviews of the ExAC and gnomAD browsers, by 230,000 users from 166 countries • Aided in the diagnosis of over 50,000 rare disease families • 3,962 papers have cited the ExAC paper
  • 7. The gnomAD preprints on bioRxiv http://broad.io/gnomad_lof http://broad.io/gnomad_drugs http://broad.io/gnomad_lrrk2 http://broad.io/tx_annotation http://broad.io/gnomad_mnv http://broad.io/gnomad_uorfs http://broad.io/gnomad_sv
  • 8. Thank you • Production team: Eric Banks, Charlotte Tolonen, Christopher Llanwarne, David Roazen, Diane Kaplan, Gordon Wade, Jeff Gentry, Jose Soto, Kathleen Tibbetts, Kristian Cibulskis, Laura Gauthier, Louis Bergelson, Miguel Covarrubias, Nikelle Petrillo, Ruchi Munshi, Sam Novod, Thibault Jeandet, Valentin Ruano-Rubio, Yossi Farjoun • Analysis team: Konrad Karczewski, Laurent Francioli, Grace Tiao, Kristen Laricchia, Anne O'Donnell-Luria, Ben Neale, Beryl Cummings, Eric Minikel, Irina Armean, James Ware, Kaitlin Samocha, Mark Daly, Nicola Whiffin, Qingbo Wang, Ryan Collins, Cotton Seed, Tim Poterba, Arcturus Wang, Chris Vittal • Structural Variation team: Ryan Collins, Harrison Brand, Konrad Karczewski, Laurent Francioli, Nick Watts, Matthew Solomonson, Xuefang Zhao, Laura Gauthier, Harold Wang, Chelsea Lowther, Mark Walker, Christopher Whelan, Ted Brookings, Ted Sharpe, Jack Fu, Eric Banks, Michael Talkowski • Website team: Matthew Solomonson, Nick Watts, Ben Weisburd, Konrad Karczewski • Ethics team: Andrea Saltzman, Molly Schleicher, Namrata Gupta, Stacey Donnelly • Broad Genomics Platform: Stacey Gabriel, Kristen Connolly, Steven Ferriera • Funding: NIGMS R01 GM104371 (PI: MacArthur); NIDDK U54 DK105566 (PIs: MacArthur and Neale); NHGRI U24 HG010262 (PI: Philippakis); NIMH R56 MH115957 (PI: Talkowski) • The vast majority of the data storage, computing resources, and human effort used to generate this call set were donated by the Broad Institute • Coordination: Jessica Alföldi
  • 9. Thank you • Principal Investigators: Daniel MacArthur Aarno Palotie Andres Metspalu Anne Remes Adolfo Correa Andre Franke Ann Pulver Ben Glaser Ben Neale Bong-Jo Kim Bruce Cohen Carlos Pato Carlos A Aguilar Salinas Christina Hultman Christine M. Albert Christopher Haiman Clicerio Gonzalez Colin Palmer Craig Hanis Dan Roden Dan Turner Dana Dabelea Daniel Chasman Danish Saleheen David Altshuler David Goldstein Dawood Darbar Dermot McGovern Diego Ardissino Donald Bowden Dost Ongur Emelia J. Benjamin Erkki Vartiainen Erwin Bottinger Gad Getz George Kirov Gil Atzmon Harlan M. Krumholz Harry Sokol Heribert Schunkert Hilkka Soininen Hugh Watkins Jaakko Kaprio Jaana Suvisaari James Meigs James Ware James Wilson Jaspal Kooner Jaume Marrugat Jeanette Erdmann Jeremiah Scharf John Barnard John Chambers John D. Rioux Jose Florez Josée Dupuis Judy Cho Juliana Chan Kari Mattila Kyong Soo Park Laurent Beaugerie Leif Groop Lorena Orozco Lori Bonnycastle Maija Wessman Mark Daly Mark McCarthy Markku Laakso Martti Färkkilä Matthew Bown Matthew Harms Matti Holi Michael Boehnke Michael O'Donovan Michael Owen Mikko Hiltunen Mikko Kallela Mina Chung Ming Tsuang Moore Shoemaker Nazneen Rahman Nilesh Samani Olle Melander Pamela Sklar Patrick T. Ellinor Patrick Sullivan Peter Nilsson Ramnik Xavier Ravindranath Duggirala Rinse Weersma Roberto Elosua Ronald Ma Ruth Loos Ruth McPherson Samuli Ripatti Sekar Kathiresan Seppo Koskinen Soo Heon Kwak Stephen Glatt Steve McCarroll Steven A. Lubitz Subra Kugathasan Tai Shyong Tariq Ahmad Teresa Tusie Luna Terho Lehtimäki Tim Spector Tõnu Esko Tuomi Tiinamaija Veikko Salomaa Yik Ying Teo Young Jin Kim Jerome Rotter Steven Rich
  • 10. Variation across 141,456 individuals reveals the spectrum of loss-of-function intolerance of the human genome Konrad Karczewski April 11, 2019 @konradjk broad.io/gnomad_lof
  • 11. Range of LoF impact [Figure labels: embryonic lethal, recessive disease, non-essential, complex disease, beneficial, haploinsufficient disease]
  • 12. Identifying true LoF variants is challenging • LoFs are rare • LoFs are enriched for artifacts
  • 14. Staggering amounts of variation • gnomAD contains: • 230M variants in 15,708 genomes • 15M variants in 125,748 exomes [Figure: number of observed synonymous, missense, and pLoF variants vs. sample size, up to 125,748 exomes]
  • 15. Staggering amounts of pLoFs • gnomAD contains: • 230M variants in 15,708 genomes • 15M variants in 125,748 exomes • Of these, we observe 515,326 predicted loss-of-function (pLoF) variants: • Stop-gained • Essential splice • Frameshift indel [Figure: number of observed pLoF variants vs. sample size, up to 125,748 exomes]
  • 17. LOFTEE removes benign variation • LOFTEE: a LoF filtering plugin for VEP (https://github.com/konradjk/loftee) • Variants retained by LOFTEE are rarer, and thus more deleterious • After filtering, we discover 443,769 high-confidence pLoFs in gnomAD [Figure: MAPS for synonymous, missense, low-confidence pLoF, and high-confidence pLoF variants; higher MAPS means rarer, more deleterious]
  • 18. Detecting genes depleted for pLoFs • Mutational model that predicts the number of SNVs of a given functional class expected in each gene in a cohort • Now incorporating methylation, improved coverage correction, and LOFTEE • Previously transformed into the probability of LoF intolerance (pLI) • Applying to 125,748 gnomAD exomes • Median of 17.3 pLoFs expected per gene • Direct estimate of the observed/expected ratio Kaitlin Samocha (Samocha et al. 2014; Lek et al. 2016)
  • 19. Most genes are depleted of LoF variation
      Gene    Phenotype                       Class       Observed  Expected  Obs/Exp (CI)
      MED13L  Severe intellectual disability  Synonymous  462       465       0.993 (0.92-1.07)
                                              pLoF        0         102       0 (0-0.029)
      FNDC3B  Unknown                         Synonymous  271       266       1.02 (0.92-1.13)
                                              pLoF        0         68        0 (0-0.043)
    • Many are extremely depleted (<20% observed compared to expected) • Including most known (curated) haploinsufficient genes • Using the upper bound of the confidence interval (LOEUF) corrects for small genes [Figure: histograms of the number of genes by observed/expected ratio and by LOEUF]
  • 20. Resolving the spectrum of LoF intolerance • Binning this spectrum into deciles [Figure: percent of each gene list (haploinsufficient, autosomal recessive, olfactory genes) in each LOEUF decile; lower deciles are more depleted/more constrained, higher deciles more tolerant/less constrained]
  • 21. Resolving the spectrum of LoF intolerance • Known haploinsufficient genes have ~10% of the expected pLoFs [Figure: percent of each gene list by LOEUF decile]
  • 22. Resolving the spectrum of LoF intolerance • Autosomal recessive genes are centered around 60% of expected [Figure: percent of each gene list by LOEUF decile; gene lists from Blekhman et al., 2008 and Berg et al., 2013]
  • 23. Resolving the spectrum of LoF intolerance • Some genes, e.g. olfactory receptors, are unconstrained [Figure: percent of each gene list by LOEUF decile]
  • 24. Constraint metrics reflect mouse and cellular knockout phenotypes Qingbo Wang [Figure panels: percent of mouse het-lethal knockout genes by LOEUF decile; percent of cell-essential vs. cell non-essential genes by LOEUF decile] Hart et al., 2017; Eppig et al., 2015; Motenko et al., 2015
  • 25. Constraint spectrum reflects patterns of structural variation • Called SVs in 14,245 individuals with 30X WGS • 9,860 rare (<1%), biallelic, autosomal LoF deletions • Occurrence correlates with SNV constraint metric Ryan Collins Harrison Brand [Figure: aggregate deletion SV observed/expected by LOEUF decile] broad.io/gnomad_sv
  • 26. Constraint metrics are correlated with biological relevance [Figure panels: protein-protein interactions (mean number of protein-protein interactions by LOEUF decile); gene expression (number of tissues where the canonical transcript is expressed, by LOEUF decile)]
  • 27. Constraint improves rare disease diagnosis • Patients with developmental delay/intellectual disability (ID/DD) are 15X more likely than controls to have a de novo LoF in a constrained gene • 8,095 de novos in 5,305 cases • 2,623 de novos in 2,179 controls • Integrating expression data improves this further Jack Kosmicki Beryl Cummings broad.io/tx_annotation [Figure: rate ratio for de novo synonymous and pLoF variants in ID/DD cases compared to controls, by LOEUF decile]
  • 28. Constraint informs common disease etiologies • Compared to the genome-wide background, SNPs near constrained genes are enriched in their contribution to the heritability of common traits • In particular, traits that previously¹ showed an enrichment of ultra-rare variants are also enriched among constrained genes [Figure panels: partitioned heritability enrichment by LOEUF decile; trait enrichment p-values (10^-2 to 10^-14) across trait categories (activities, cardiovascular, cognitive, environment, hematological, metabolic, nutritional, ophthalmological, psychiatric, reproduction, respiratory, skeletal, social interactions, other), with labelled traits including schizophrenia, bipolar, educational attainment, "Qualifications: College or University degree", and "Duration to first press of snap-button in each round"] ¹Ganna et al. 2018 AJHG Andrea Ganna
  • 29. Data publicly released with no publication restrictions gnomad.broadinstitute.org Matt Solomonson Nick Watts [Browser screenshot callouts: gene model with transcripts, pathogenic ClinVar variants, dataset selection box, tissue isoform expression, constraint metrics; pext: broad.io/tx_annotation]
  • 30. Now featuring: structural variant calls in the browser gnomad.broadinstitute.org Matt Solomonson Nick Watts
  • 31. Acknowledgments • Laurent Francioli • Grace Tiao • Beryl Cummings • Jack Kosmicki • Andrea Ganna • Qingbo Wang • Kaitlin Samocha Ben Neale • Daniel Birnbaum • Jessica Alföldi Kristen Laricchia • Matt Solomonson Nick Watts • Ryan Collins Harrison Brand • Raymond K. Walters Kate Tashman • Daniel Rhodes Moriel Singer-Berk Eleina England Eleanor G. Seaby • Hail team Tim Poterba Cotton Seed Arcturus Wang • Laura Gauthier Yossi Farjoun Eric Banks • Analytic and Translational Genetics Unit • Mark Daly • Daniel MacArthur broad.io/gnomad_lof
  • 32. Evaluating potential drug targets through human loss-of-function genetic variation Eric Vallabh Minikel April 11, 2019 @cureffi broad.io/gnomad_drugs
  • 33. Why study LoF variants in drug discovery? • LoF variants can be an in vivo, whole human, lifelong model of inhibition of a target. from Plenge 2013, PMID: 23868113
  • 34. Why study LoF variants in drug discovery? • LoF variants can be an in vivo, whole human, lifelong model of inhibition of a target. • With caveats: • drug effect may not exactly mimic LoF • developmental effects • tissue-specific effects • dosage • difference in our ancestors' environment vs. our environment
  • 35. How do drug targets compare to all genes & specific gene lists in constraint?
  • 37. How constrained are some well-known drug targets?
  • 38. How constrained are some well-known drug targets? • 19% of all drug targets (N=73, including 53 targets of inhibitors, antagonists, etc.) have obs/exp < 13%, the average for haploinsufficient genes
  • 39. How constrained are some well-known drug targets? • 19% of all drug targets (N=73, including 53 targets of inhibitors, antagonists, etc.) have obs/exp < 13%, the average for haploinsufficient genes • These include some chemotherapy targets but also aspirin, statins, and antimuscarinics!
  • 40. How constrained are some well-known drug targets? • Not all chemotherapy targets are so constrained
  • 41. How constrained are some well-known drug targets? • Drug targets span the full spectrum – constraint alone should not rule a potential target in or out
  • 42. Can we find and phenotype LoF individuals for a gene of interest? • If you can find them, phenotyping of LoF individuals (het or hom) can be deeply informative for safety and/or efficacy • Examples: PCSK9, APOC3, CETP, LPA, HAO1... • Questions for today: • Is it always realistic to expect to find enough LoF heterozygotes or homozygotes to permit your analysis of interest? • What is the best strategy to go about finding them? • How should you curate pLoF variants before starting to recontact?
  • 43. Cumulative allele frequency of LoF variants • Cumulative allele frequency (CAF) = Σ(AF) over all LoF variants in a gene • Define p = proportion of haplotypes in the population that carry a LoF variant • In an outbred population: • LoF het frequency = 2p(1-p) • LoF hom / compound het frequency = p² • This analysis: • Use gnomAD data to compute p for each gene • Predict the hom/compound het frequency for each gene in the population, assuming this genotype is not lethal (!)
  • 46. What about bottlenecked populations? https://ecologyblog0112.weebly.com/population-genetics-in-the-conservation-of-biodiversity.html
  • 50. Which populations to sequence? • For the near future, analysis for most genes will need to focus on heterozygotes, regardless of population • For finding homozygotes, best strategy is to sample diverse bottlenecked populations and consanguineous individuals
  • 51. How to curate? • "the more interesting something looks, the less likely it is to be real" • Solutions: • LOFTEE (Karczewski 2019, broad.io/gnomad_lof) • Expression-aware annotation (Cummings 2019, broad.io/tx_annotation) • Deep manual curation is still important • Non-random distribution of pLoFs across the coding sequence is suspicious • Next up: examples of curation of 3 genes with different error modes
  • 52. HTT • Cumulative pLoF allele frequency: 6.2% • Mostly driven by several common variants in exon 1 • Highly suspicious given the lethal mouse knockout phenotype!
  • 53. HTT • common LoFs are all alignment artifacts at polyQ and polyP repeat regions • after filtering & curation, cumulative pLoF allele frequency: 0.013%
  • 54. MAPT • Cumulative pLoF allele frequency: 14%
  • 55. MAPT • almost all pLoFs are in exons not expressed in the brain! • the remainder are various artifacts • after filtering & curation, cumulative pLoF allele frequency: 0% • Transcript-aware expression – see Cummings et al, broad.io/tx_annotation
  • 56. PRNP • Slightly constrained – 6 observed, 10 expected • All in N terminus or very near C terminus
  • 57. PRNP • N-terminal variants are true LoF. In N terminus, not constrained at all (obs/exp = 6/6.05 = 99%) • C-terminal truncating variants cause disease through gain-of-function (literature variants added). The gnomAD C-terminal frameshift turns out to be a dementia case!
  • 58. Very different answers before/after curation
      Gene     CAF before   CAF after
      HTT      6.2%         0.013%
      LRRK2    0.23%        0.09%
      MAPT     14%          0%
      PRNP     0.0035%      0.0021%
      SNCA     0.0012%      0%
      SOD1     0.0060%      0.0038%
  • 59. Very different answers before/after curation
      Gene     CAF before   CAF after   Prevalence of LoF hets   Prevalence of GoF disease
      HTT      6.2%         0.013%      1 in 3,800               1 in 2,400-4,400
      LRRK2    0.23%        0.09%       1 in 500                 1 in 3,300
      MAPT     14%          0%          not observed             1 in 5,000-31,000
      PRNP     0.0035%      0.0021%     1 in 18,000              1 in 50,000
      SNCA     0.0012%      0%          not observed             1 in 360,000
      SOD1     0.0060%      0.0038%     1 in 26,000              1 in 27,000-83,000
  • 60–62. Very different answers before/after curation (same table as slide 59) • Even without recontact & phenotyping, curation can be highly informative • But remember, even MAPT and SNCA might be great drug targets!
  • 63. Suggested guidelines for evaluating drug targets based on LoF • It's complicated - no simple formula, evaluate each target on a case-by-case basis • Filter and curate • Consider positional distribution • Calculate cumulative allele frequency • Experimentally validate loss-of-function • Don't eliminate a gene from consideration just because you can't find LoF individuals • Read the pre-print: broad.io/gnomad_drugs
  • 64. Acknowledgments • Contact: eminikel@broadinstitute.org / danmac@broadinstitute.org • Funding: NIH F31 AI22592 • Many thanks to East London Genes & Health • Thanks to co-authors: Konrad, Beryl, Nicky, Jessica; Stuart Schreiber; Hilary Martin, Richard Trembath, & David van Heel (ELGH); gnomAD consortium & production team • FYI: Sonia & Eric's thesis defenses (primary prevention targeting PRNP) – April 16, 9:00a – 11:00a, Broad Auditorium broad.io/gnomad_drugs
  • 65. From LoF to phenotype: a pilot study using LRRK2 • Nicky Whiffin (@nickywhiffin), Research Fellow, Imperial College London • Irina Armean, Aaron Kleinman • broad.io/gnomad_lrrk2
  • 66. GoF LRRK2 variants cause Parkinson’s • Gain-of-function missense variants in LRRK2 cause early-onset Parkinson’s • LRRK2 is over-activated in general Parkinson’s
  • 67. GoF LRRK2 variants cause Parkinson’s • Gain-of-function missense variants in LRRK2 cause early-onset Parkinson’s • LRRK2 is over-activated in general Parkinson’s • Multiple pharma companies now pursuing LRRK2 inhibitors as generalised Parkinson’s therapy
  • 68. Early concerns for toxicity • Early pre-clinical model organism studies: KO animals have lung, liver and renal phenotypes
  • 69. Early concerns for toxicity • Early pre-clinical model organism studies: KO animals have lung, liver and renal phenotypes • Is partial reduction of LRRK2 protein levels safe in humans?
  • 70. Cohorts included • gnomAD v2.1: 141,456 sequenced individuals from case-control and cohort studies • 23andMe: >4 million research-consented individuals, genotyped and imputed
  • 71. Cohorts included • gnomAD: 633 LRRK2 LoF carriers (123 unique variants) → after removing LOFTEE low-confidence variants: 348 carriers (117 unique variants) → after manual curation: 255 carriers (111 unique variants)
  • 73. Cohorts included • 23andMe: 8 genotyped or imputed LRRK2 LoF variants → a subset of carriers for each variant sent for Sanger validation → after removing variants with <5 validated carriers, variants that failed validation, and variants excluded on manual curation: 3 variants in 1,103 carriers (749 Sanger confirmed) • gnomAD: 633 → 348 → 255 LRRK2 LoF carriers, as on slide 71
  • 74. Cohorts included: as on slide 73 • <5 homozygotes
  • 75. pLoF variants are evenly spread across the protein...
  • 76. ...and are genuinely LoF • [Figure, Jamie Marshall: LRRK2 protein levels in lymphoblastoid cells from individuals with heterozygous LRRK2 LoF, and in CRISPR-edited embryonic stem cells differentiated into cardiomyocytes; samples: homozygous reference, p.Cys1313Ter, p.Arg1483Ter, p.Arg1693Ter]
  • 77. A curated dataset of LRRK2 pLoF individuals • 1,358 carriers of 111 pLoF variants • Appear to be true LoF • But what effect do these have on human health?
  • 78. No effect on overall mortality
  • 79. Manual curation of gnomAD phenotype data (Jessica Alföldi) • 60 LRRK2 LoF carriers in gnomAD had available phenotype data • Genomic Psychiatric Cohort, Pakistan Risk of Myocardial Infarction Study, Swedish Schizophrenia and Bipolar Studies, the FINRISK study, the BioMe Biobank, the Estonian Biobank • Very diverse sources including EHRs and questionnaires
  • 80. Manual curation of gnomAD phenotype data (Jessica Alföldi) • 60 LRRK2 LoF carriers in gnomAD had available phenotype data • Genomic Psychiatric Cohort, Pakistan Risk of Myocardial Infarction Study, Swedish Schizophrenia and Bipolar Studies, the FINRISK study, the BioMe Biobank, the Estonian Biobank • Very diverse sources including EHRs and questionnaires • Manually assessed for lung, liver, kidney, CV, nervous system, immune system phenotypes and cancer • No enrichment for any adverse phenotypes • No sign of syndromic phenotypes
  • 81. No differences across 77 serum biomarkers (entire cohort vs LRRK2 LoF carriers)
  • 82. No association with any phenotypes in 23andMe
  • 83. Key message for LRRK2 drug development • ~1 in 550 humans has a heterozygous pLoF variant in LRRK2 • ~50% reduction in LRRK2 protein • likely across all tissues throughout life • No discernible negative impact across >1,100 carriers • No effect on overall mortality • No enrichment for any assessed phenotypes • Suggests that partial LRRK2 inhibitors should be well-tolerated, even with chronic administration • Demonstrates the power of large-scale genetics to assess tolerability for drug discovery
  • 84. Acknowledgements Irina Armean Jamie Marshall Eric Minikel Konrad Karczewski Beryl Cummings Laurent Francioli Kristen Laricchia Qingbo Wang James Ware Jessica Alföldi Daniel MacArthur Aaron Kleinman Anna Guan Babak Alipanahi Peter Morrison the 23andMe Research Team Paul Cannon Genome Aggregation Database Production Group Genome Aggregation Database Consortium Marco Baptista Kalpana Merchant Aki Havulinna Bozenna Iliadou Jung-Jin Lee Girish Nadkarni Cole Whiteman Mark Daly Tõnu Esko Christina Hultman Ruth Loos Lili Milani Aarno Palotie Carlos Pato Michele Pato Danish Saleheen Patrick Sullivan
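The cumulative allele frequency calculation from slide 43, and the LoF-heterozygote prevalence column on slide 59, can be sketched in Python. This is a minimal sketch under stated assumptions: Hardy-Weinberg proportions in an outbred population, LoF variants rare enough that their allele frequencies can simply be summed into p, and a homozygous/compound heterozygous genotype that is not lethal. The function names and the example allele frequencies are hypothetical, not gnomAD data.

```python
from typing import Iterable, Tuple


def cumulative_allele_frequency(lof_afs: Iterable[float]) -> float:
    """CAF = sum of allele frequencies over all curated LoF variants in a gene.

    Summing approximates p (the fraction of LoF haplotypes) only when each
    variant is rare and two LoF variants rarely sit on the same haplotype.
    """
    return sum(lof_afs)


def expected_genotype_frequencies(p: float) -> Tuple[float, float]:
    """Hardy-Weinberg expectations in an outbred population.

    Returns (LoF heterozygote frequency, LoF homozygote/compound het frequency),
    i.e. (2p(1-p), p^2), assuming the two-hit genotype is not lethal.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return 2 * p * (1 - p), p ** 2


def one_in(freq: float) -> float:
    """Express a frequency as N in '1 in N'; zero frequency maps to infinity."""
    return float("inf") if freq == 0 else 1 / freq


if __name__ == "__main__":
    # Hypothetical per-variant AFs for one gene (illustrative, not gnomAD data).
    p = cumulative_allele_frequency([0.00005, 0.00003, 0.00002])
    het, hom = expected_genotype_frequencies(p)
    print(f"CAF={p:.4%}: hets 1 in {one_in(het):,.0f}, homs 1 in {one_in(hom):,.0f}")

    # Post-curation HTT CAF from slide 58 (0.013%).
    het_htt, _ = expected_genotype_frequencies(0.00013)
    print(f"HTT LoF hets: about 1 in {one_in(het_htt):,.0f}")
```

Under these assumptions, the post-curation HTT CAF of 0.013% implies roughly 1 in 3,800 LoF heterozygotes, in line with the HTT row of slide 59. The homozygote term p² shows why, for most genes, analysis has to focus on heterozygotes (slide 50).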

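Slides 51 and 63 flag a non-random positional distribution of pLoFs as a warning sign (HTT's common pLoFs cluster in exon 1), and the speaker notes mention a chi-square test across protein domains (P=0.23). The sketch below shows one simple version of such a check: a chi-square goodness-of-fit test of pLoF positions against a uniform spread along the coding sequence. It is an illustration under assumptions, not the method used in the preprints: equal-length bins stand in for exons or domains, expected counts ignore per-site mutability, and the helper names and example positions are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square with an even number of df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no special-function
    library is needed.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def positional_uniformity_test(positions: Sequence[int], cds_length: int,
                               n_bins: int = 5) -> Tuple[List[int], float, float]:
    """Chi-square test of 1-based pLoF positions against a uniform spread along the CDS.

    n_bins must be odd so that df = n_bins - 1 is even. Every bin is assumed
    equally likely to carry a pLoF (a simplification).
    """
    if n_bins < 3 or n_bins % 2 == 0:
        raise ValueError("n_bins must be an odd number >= 3 for the closed-form p-value")
    if not positions:
        raise ValueError("need at least one pLoF position")
    observed = [0] * n_bins
    for pos in positions:
        if not 1 <= pos <= cds_length:
            raise ValueError(f"position {pos} is outside 1..{cds_length}")
        # Equal-length bins along the CDS; position 1 maps to bin 0.
        observed[(pos - 1) * n_bins // cds_length] += 1
    expected = len(positions) / n_bins
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return observed, statistic, chi2_sf_even_df(statistic, n_bins - 1)


if __name__ == "__main__":
    # Hypothetical pLoF positions in a 1,000-bp CDS (not real data).
    clustered = [12, 15, 20, 33, 41, 57, 60, 88, 95, 150]  # all near the 5' end
    spread = [50, 130, 260, 310, 420, 530, 610, 720, 840, 950]
    for label, positions in [("clustered", clustered), ("spread", spread)]:
        counts, stat, p_value = positional_uniformity_test(positions, cds_length=1000)
        print(f"{label}: counts={counts}, chi2={stat:.1f}, P={p_value:.2g}")
```

With only 10 variants and an expected count of 2 per bin the chi-square approximation is rough; for small genes an exact multinomial or permutation test would be the safer choice.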
Editor's Notes

  1. Citations, diagnoses, impact
  2. I'm going to talk about some cool things you can do with this dataset to understand the impact of loss-of-function variation on the human genome.
  3. Imagine if we could put each of the 20K genes in the genome along a spectrum of sensitivity to functional disruption, that is, the clinical or phenotypic impact that a loss-of-function variant might have in that gene. For instance, here over on the left are genes where we’ll never see LoF variants in living humans as these would be incompatible with human life. In the middle are variants and genes that we typically study in the clinical genetics space, from causal variants for dominant and recessive diseases to risk factors for complex disease. On the right, we have genes that are relatively tolerant of LoF variation, potentially even homozygous inactivation. And unlike in model organisms, where we can effectively engineer such mutations, there are obvious technical and ethical barriers to doing so in humans. But when we sequence healthy individuals or individuals with common diseases, we find plenty of genes inactivated, in the form of naturally-occurring predicted loss-of-function variants (or pLoFs). However…
  4. Because true LoFs are deleterious, a number of factors conspire to make them difficult to characterize. In particular...
  5. This dataset contains a substantial amount of variation, including... Here you can see the number of variants discovered in the exomes, broken down by functional class, as a function of sample size, which approximately follows a square-root law. If we zoom into the predicted LoFs...
  6. ...we observe over half a million LoFs, following the same pattern of discovery. And I should clarify that when I'm talking about pLoFs today, I'm referring to stop-gained, essential splice, and frameshift variants.
  7. So now that we've increased our sensitivity and discovered a bunch of rare putative LoFs, we'd like to increase our specificity...
  8. To this end, we've created a tool called LOFTEE, a plugin to VEP that filters out common error modes based on first principles and, importantly, does not use frequency. Even so, when we look at the mutability-adjusted proportion of singletons, or MAPS, a frequency-based metric of deleteriousness, LOFTEE filters out variants whose frequency spectrum is consistent with missense variants, while the variants it retains are much rarer on average and thus more deleterious. After filtering... 1649 confident homozygous. With a high-quality catalog of predicted loss-of-function variants, we can look not only at genes that have LoF variants in the general population, but also at genes where we don’t see any LoFs.
  9. A few years ago, Kaitlin Samocha built a mutational model to predict... We've now built on this model with a number of improvements to refine it and increase specificity. Previously, constraint was summarized in a metric called pLI. However, now that we're applying the model to a larger dataset with greater resolution, we can use the more interpretable observed/expected ratio and build a confidence interval around it, which gives us a conservative estimate of the observed/expected ratio.
  10. Using this method, we can return to the question of where genes fall on this spectrum of LoF tolerance. Most genes have a depletion of pLoFs (that is, observed/expected less than 1), and many are extremely depleted, including most known HI genes. Just a note for anyone who has used the pLI scores previously: we've now flipped the scale, so the genes over on the left side are high-pLI genes. These improvements solidify our ability to detect constraint; here are two very clear examples. LoFs in MED13L have previously been shown to cause severe ID, facial features, and cardiac phenotypes. FNDC3B has no known human phenotype, but knocking it out in mice results in death at birth, so if you find a rare disease patient with an LoF in this gene, you might be concerned. Some of you may notice this tick on the left side, where observed/expected is zero. This can happen due to extreme constraint, or in small genes (say, observed = 0, expected = 2). At larger sample sizes this will even itself out and we could use the observed/expected ratio, but for now we can use the upper bound of the confidence interval, which we term LOEUF. This gives a much smoother distribution, which I’ll use from here on out. So this metric is a conservative estimate that takes gene size into account. As our sample sizes grow, LOEUF will converge to the o/e ratio, but for now it is a useful metric.
  11. So we can bin this metric into deciles, which I'll show on every slide from here on out with the left ...
  12. 116 - manually curated haploinsufficient developmental delay genes
  13. 1164
  14. 350 so the metric is well-calibrated, and importantly this means we now have improved LoF tolerance scores for all protein-coding genes in the genome
  15. This fits with what we see in model systems: genes that are early lethal in mouse are more likely to have a human ortholog among the constrained genes. Similarly, in CRISPR screens, genes that are essential for cell viability are also more likely to be constrained, and the opposite holds for the confidently non-essential genes.
  16. We next explored the correlation between constraint against SNVs and patterns of structural variation. Ryan and Harrison called SVs in 14K individuals, identifying about 10K rare biallelic autosomal LoFs that disrupt gene function. When they looked at the occurrence of SVs in each of the constraint deciles, they found that, on average, the constrained genes had a strong depletion of structural variation. It's important to note that this is not a per-gene SV metric, as even this dataset of 15K has fewer than one rare LoF SV per gene. For more information on this dataset, see the recently posted preprint from Ryan and crew.
  17. If we look at the burden of de novo LoFs in patients with developmental delay or intellectual disability, we observe a 15-fold increased rate in cases compared to controls in the top 10% most constrained genes in the genome. In other words, this lowest decile contains genes where a single LoF mutation will, to an estimable degree, prevent healthy development during childhood.
  18. Finally, we can investigate how these constraint metrics relate to common disease biology. In a partitioned heritability analysis of 600 traits from UK Biobank, we find that SNPs near constrained genes are enriched for heritability of common traits. If we zoom in on the traits with the strongest heritability enrichment among constrained genes, we find schizophrenia, bipolar disorder, and educational attainment, which is consistent with previous work that marked these traits as enriched for ultra-rare coding variants.
  19. Thanks to the efforts of Matt and Nick
  20. Not depleted LOEUF = 0.64
  24. By LoF I mean nonsense, frameshift or essential splice site variants. Manual curation: variant quality metrics; reads on IGV; LoF rescue either by co-localised variants or cryptic/alternative splice sites.
  25. ‘GC’ still works as a strong splice donor site
  26. Protein domains - Chi-square P=0.23
  27. Thank Jamie by name. “Western blot” of protein levels
  28. Kolmogorov-Smirnov P=0.085 and 0.46 respectively. Last known age, not survival.
  29. ~4 million individuals, over 1000 of which are known carriers giving reasonable power to detect an association
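Note 10 describes LOEUF as the upper bound of a confidence interval around the observed/expected (o/e) pLoF ratio. A minimal sketch follows, assuming a Poisson model in which the observed count has mean ratio × expected, a uniform grid of candidate ratios from 0 to 2, and a 90% interval (so LOEUF is read off at the 95th percentile of the normalized likelihood). These modeling choices, the function names and the example counts are assumptions for illustration, not the gnomAD code.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(K = k) for K ~ Poisson(mu), computed in log space for numerical stability."""
    if mu == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def oe_confidence_interval(obs: int, expected: float, alpha: float = 0.05,
                           max_ratio: float = 2.0, step: float = 0.001):
    """Grid-based (lower, upper) bounds on the o/e ratio.

    Each candidate ratio r is weighted by the Poisson likelihood
    P(obs | mean = r * expected); the weights are normalized over the grid
    and the cumulative sum is read off at alpha and 1 - alpha. With
    alpha = 0.05 this is a 90% interval, and the upper bound plays the role
    of LOEUF.
    """
    n_steps = int(round(max_ratio / step))
    grid = [i * step for i in range(n_steps + 1)]
    weights = [poisson_pmf(obs, r * expected) for r in grid]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("likelihood underflowed on the grid; widen max_ratio")
    cumulative, lower, upper = 0.0, None, None
    for r, w in zip(grid, weights):
        cumulative += w / total
        if lower is None and cumulative >= alpha:
            lower = r
        if cumulative >= 1 - alpha:
            upper = r
            break
    # Floating-point rounding could leave the last cumulative value just
    # below 1 - alpha; fall back to the top of the grid in that case.
    return lower, (upper if upper is not None else max_ratio)


if __name__ == "__main__":
    # (0, 2): a small gene; (0, 50): a large gene; (6, 10): PRNP counts from slide 56.
    for obs, expected in [(0, 2.0), (0, 50.0), (6, 10.0)]:
        lower, upper = oe_confidence_interval(obs, expected)
        print(f"obs={obs} exp={expected}: o/e={obs / expected:.2f}, "
              f"90% CI=({lower:.3f}, {upper:.3f}), LOEUF={upper:.3f}")
```

For a small gene with 0 observed and 2 expected pLoFs, o/e is 0 but the upper bound comes out near 1.35, so the gene is not called constrained; with 0 observed and 50 expected it falls to about 0.06. This is the gene-size behaviour note 10 describes.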