3. 3
Transcription factor binding
• Transcription factors (TFs) play a key role in regulating gene expression by binding to regulatory
genomic elements (TFs show sequence-based specificities towards these binding sites)
• Understanding the process of TF-DNA binding can help us understand the intricate process of gene
regulation, develop actionable hypotheses that can be used in drug development/therapy, etc.
• With the aid of technologies like ChIP-seq, SELEX and PBMs, many TF binding sites (TFBSs) have been
characterized
khanacademy.org
4. 4
Modelling transcription factor binding
• Binding sites have been used to train computational models
• Position weight matrices or PWMs (simplest models)
• More complex machine learning (deep learning)
approaches are able to learn far more complex patterns in
binding sites
• How well TF binding models distinguish their
binding regions from random genomic regions
has been well characterized
Giraud et al. 2010
Alipanahi et al. 2015
5. 5
Genetic variation and transcription factor binding
• Genetic variation falling within specificity
determinants of TF binding sites (TFBSs) can alter
binding by introducing novel binding sites or
diminishing existing binding sites
• Can result in a substantial impact on molecular
phenotypes through changes in gene expression
• PWMs and DL-based models have been used to
assess impact of variants on binding sites
• Have become an essential component of many
variant prioritization pipelines
6. 6
Motivation
• How well do binding prediction models perform at predicting impact of variants?
• Variants from the Human Genome Mutation Database (HGMD), genome-wide association
studies (GWAS), and quantitative trait loci (QTL) studies have previously been used
• Little has been done to explore the ability of these models to assess the impact of genetic
variants on binding in a TF-specific manner.
• Not many curated datasets on variants impacting TFBSs
• Allele-specific ChIP-seq data
7. 7
Allele-specific ChIP-seq data
• Gather heterozygous mutations
• ChIP-seq reads for a particular TF are mapped onto each of the alleles
of the diploid genome
• Compare the read counts between the two parental
chromosomes (using binomial test)
• Significant binomial test (Pbinomial < 0.01):
- Allele-specific binding variants (ASB)
- variant impacts binding
• Non-significant binomial test (Pbinomial > 0.5):
- Non-allele-specific binding variant (non-ASB)
- variant has little to no impact on binding
Chen et al. 2016
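The ASB call described on this slide can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes a two-sided exact binomial test against an expected allelic ratio of 0.5, and the function names (`binomial_two_sided_p`, `classify_variant`) and the "ambiguous" label for variants between the two thresholds are hypothetical.

```python
from math import exp, lgamma, log


def binomial_pmf(k, n, p=0.5):
    """Binomial probability of k successes in n trials, computed in log space."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided p-value: total probability of all outcomes that are
    no more likely than the observed one."""
    observed = binomial_pmf(k, n, p)
    # Small relative tolerance so that tied outcomes survive float rounding.
    total = sum(pm for pm in (binomial_pmf(i, n, p) for i in range(n + 1))
                if pm <= observed * (1 + 1e-7))
    return min(1.0, total)


def classify_variant(ref_reads, alt_reads, asb_p=0.01, non_asb_p=0.5):
    """Label a heterozygous variant using the thresholds from the slide."""
    p_value = binomial_two_sided_p(ref_reads, ref_reads + alt_reads)
    if p_value < asb_p:
        return "ASB", p_value
    if p_value > non_asb_p:
        return "non-ASB", p_value
    return "ambiguous", p_value  # hypothetical label for the excluded middle


# Hypothetical read counts (ref, alt) for three variants.
for counts in [(30, 5), (10, 10), (13, 7)]:
    print(counts, classify_variant(*counts))
```

Variants with a p-value between the two thresholds are neither ASB nor non-ASB under these definitions, which is why the sketch returns a third label for them.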
8. 8
A compendium of allele-specific binding events
• Assess performance of binding predictors at predicting variant impact
• Collect ASB data (read counts on heterozygous variants)
• Compile TFBS predictors, score ASB and non-ASB
9. 9
A compendium of allele-specific binding events
• Mapped reads for heterozygous variants were
obtained from individual studies and not uniformly
processed
• To ensure reliability of cross-study read-counts:
correlate log ref/alt reads for overlapping ASB
events between studies
• mean Pearson r = 0.79
• Conclusion: although read counts come from
different studies, they remain in agreement.
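The cross-study consistency check can be sketched as below. This is an illustrative example under stated assumptions: the read counts are made up, the pseudocount of 1 (to avoid division by zero) and the base-2 logarithm are choices made here rather than details given on the slide, and `log_ratio` and `pearson_r` are hypothetical helper names.

```python
from math import log2, sqrt


def log_ratio(ref_reads, alt_reads, pseudocount=1.0):
    """Log2 ratio of reference to alternate reads (pseudocount is an assumption)."""
    return log2((ref_reads + pseudocount) / (alt_reads + pseudocount))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical (ref, alt) read counts for the same four ASB events in two studies.
study_a = [(30, 5), (12, 14), (4, 22), (18, 9)]
study_b = [(41, 9), (10, 11), (7, 30), (25, 16)]

r = pearson_r([log_ratio(*c) for c in study_a],
              [log_ratio(*c) for c in study_b])
print(f"Pearson r between studies: {r:.2f}")
```

In practice this would be repeated for every pair of studies that share ASB events and the resulting coefficients averaged.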
11. 11
Properties of allele-specific binding data:
ASB loss variants are under purifying selection
• Assess proportion of ASB/non-ASB variants that
are rare wrt ExAC, 1000G and ESP6500
• Loss ASB variants are under purifying selection
(larger proportion of rare variants)
12. 12
Properties of allele-specific binding data:
Non-coding variant impact predictors do not differentiate ASB
from non-ASB
• Several other non-coding variant impact predictors,
which do not take TF-binding motifs into account and
instead rely on metrics such as conservation, are unable
to distinguish ASB from non-ASB variants
• i.e. knowledge on TF-specific binding specificity
can help identify impactful non-coding variants
13. 13
Properties of allele-specific binding data
Take-home messages
• Compiled largest known ASB dataset
• Loss ASBs are under purifying selection and therefore of significant importance
• Current non-coding variant impact predictors are unable to distinguish ASB
variants
• ASB data is suitable to assess performance of TF-binding models at predicting
variant impact
14. 14
Performance of transcription factor binding
variant impact predictions
Model collection
• Collected pre-trained and trained models for TFs with ASB data from six different
methods ranging from simple methods (PWMs) to deep learning approaches
1. PWMs
2. DeepBind
3. DeepSEA
4. DanQ
5. GERV
6. gkmSVM
15. 15
Performance of transcription factor binding
variant impact predictions
Model collection
Method            Model type    No. models for TFs with ASB data    Source data
DeepBind          Pre-trained   91                                  ENCODE ChIP-seq
DeepSEA           Pre-trained   91                                  ENCODE/RE ChIP-seq
DanQ              Pre-trained   91                                  ENCODE/RE ChIP-seq
gkmSVM            Trained       91                                  ENCODE ChIP-seq (same data used to train DeepBind models)
GERV              Pre-trained   60                                  ENCODE ChIP-seq
PWM - JASPAR      Pre-trained   56                                  JASPAR PWMs
PWM - MEME-ChIP   Trained       87                                  Over-represented motifs discovered by MEME-ChIP using DeepBind training data
16. 16
Performance of transcription factor binding
variant impact predictions
Variant-impact metric definition
Method, then metric: description
• DeepSEA/DanQ
- Log FC: Chromatin feature probability log fold changes
- Diff.: Chromatin feature probability differences
• gkmSVM
- deltaSVM: Change in the sum of k-mer weights between the wildtype and variant sequences
• GERV
- GERV score: L2 norm of the difference between predicted ChIP-seq signal in a given window for the reference and the alternate allele
• DeepBind/PWMs
- Max delta raw: Difference between raw model scores for reference and alternate alleles with the maximum absolute value across multiple windows
- Delta max raw: Difference of the maximum reference and alternate raw model scores across multiple windows
- Max delta Pbind: Difference between probability-transformed scores for reference and alternate alleles with the maximum absolute value across multiple windows
- Pcomb: Signed likelihood of loss or gain, depending on which has the higher magnitude
- Psum: Sum of the likelihoods of loss and gain, signed by effect size
Defined in this study
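The two raw-score metrics for DeepBind/PWMs differ only in where the maximum is taken. The sketch below illustrates this with a toy PWM; it is an assumption-laden example rather than the study's implementation. The 3-bp log-odds matrix, the sequences, the alt-minus-ref sign convention (negative means loss), and the restriction to windows that overlap the variant are all choices made here for illustration.

```python
def score_window(pwm, window):
    """Raw PWM score: sum of the per-position weights of the observed bases."""
    return sum(column[base] for column, base in zip(pwm, window))


def window_scores(pwm, seq, var_pos):
    """Scores of every PWM-length window of seq that overlaps position var_pos."""
    width = len(pwm)
    first = max(0, var_pos - width + 1)
    last = min(var_pos, len(seq) - width)
    return [score_window(pwm, seq[s:s + width]) for s in range(first, last + 1)]


def max_delta_raw(pwm, ref_seq, alt_seq, var_pos):
    """Per-window alt-minus-ref difference with the largest absolute value."""
    ref = window_scores(pwm, ref_seq, var_pos)
    alt = window_scores(pwm, alt_seq, var_pos)
    return max((a - r for r, a in zip(ref, alt)), key=abs)


def delta_max_raw(pwm, ref_seq, alt_seq, var_pos):
    """Best alt window score minus best ref window score."""
    return (max(window_scores(pwm, alt_seq, var_pos))
            - max(window_scores(pwm, ref_seq, var_pos)))


# Toy log-odds PWM with consensus "TGA" (hypothetical weights).
pwm = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
]
ref_seq = "CCTGACC"
alt_seq = "CCTCACC"  # G>C at 0-based position 3 disrupts the "TGA" site

print(max_delta_raw(pwm, ref_seq, alt_seq, 3))
print(delta_max_raw(pwm, ref_seq, alt_seq, 3))
```

The two metrics agree in this toy case because the same window is the best-scoring one for both alleles; they can diverge when the variant shifts which window scores highest.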
17. 17
Performance of transcription factor binding
variant impact predictions
Performance measure
• Loss ASB variants: Pbinomial < 0.01, ref_reads >
alt_reads, and at least 10 total reads
• Gain ASB variants: Pbinomial < 0.01, alt_reads >
ref_reads, and at least 10 total reads
• Non-ASB variants: Pbinomial > 0.50 and at least 10 total
reads
• Use models for TFs with ≥10 ASB/non-ASB variants
• Measure AUROC/AUPRC
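As a rough illustration of the performance measure, AUROC can be computed directly from its pairwise definition: the probability that a randomly chosen ASB variant is ranked above a randomly chosen non-ASB variant. The scores below are invented, and negating them for loss variants assumes an alt-minus-ref sign convention in which a more negative score means a stronger predicted loss; neither detail comes from the slide.

```python
def auroc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive scores
    higher; ties count as half a win."""
    wins = 0.0
    for pos in pos_scores:
        for neg in neg_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores (alt minus ref) for one TF.
loss_asb_scores = [-2.1, -0.8, -1.5]
non_asb_scores = [-0.2, 0.1, -0.9, 0.3]

# Negate so that a stronger predicted loss ranks higher.
loss_auroc = auroc([-s for s in loss_asb_scores],
                   [-s for s in non_asb_scores])
print(f"Loss AUROC: {loss_auroc:.3f}")
```

The pairwise loop is quadratic in the number of variants, which is acceptable for per-TF variant sets of modest size; a rank-based implementation would be preferable for large sets.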
18. 18
Performance of transcription factor binding
variant impact predictions
PWM metrics performance
• PWM metrics have similar AUROCs, with the exception of max delta raw
• All other metrics have significantly higher AUROCs than max delta raw (p<1.26e-04)
• JASPAR PWMs showed similar results (data not shown)
• Because max delta raw maximises over multiple windows of a sequence, its score is often inflated
19. 19
Performance of transcription factor binding
variant impact predictions
DeepBind/DeepSEA/DanQ metrics performance
• DeepBind metrics have similar AUROCs
• DeepSEA metrics have similar AUROCs
• DanQ metrics have similar AUROCs
• → Choice of metric has no clear impact on performance
20. 20
Performance of transcription factor binding
variant impact predictions
Comparison of ML vs. PWM-based methods
• For methods with multiple metrics, we picked one
representative metric
• PWMs → Delta max raw
• DeepBind → Max delta raw
• DeepSEA/DanQ → Log FC
• Compare performance
• gkmSVM/DeepBind/DeepSEA/DanQ all significantly
outperformed PWMs (p<3.11e-03)
[Figure: AUROC (y-axis, 0.4 to 0.8) per method and metric: GERV (GERV score), PWM (MEME, signif; Delta max raw), gkmSVM (deltaSVM), DeepBind (Max delta raw), DanQ (Log FC), DeepSEA (Log FC). Performance for 34 TFs]
21. 21
Performance of transcription factor binding
variant impact predictions
Comparison of ML-based methods
• DeepSEA performs slightly better than gkmSVM (p=0.022) and
DeepBind (p=0.026)
• DanQ performs significantly better than gkmSVM (p=0.044) and
borderline significantly better than DeepBind (p=0.057)
22. 22
Performance of transcription factor binding
variant impact predictions
Take-home messages
• The choice of the scoring metric used in variant impact can often be critical to both interpretability
and performance, particularly for PWMs
• Deep learning-based methods significantly outperform other ML-based and PWM-based methods
• Amongst deep learning methods, no clear winner wrt significance, although DeepSEA/DanQ
generally have higher performance
23. 23
What drives TF-specific performance?
• TFs show highly variable performance in assessing variant impact
• What are some of the factors that contribute to poor
performance?
• Do TFs that perform better at detecting their own binding sites
(Binding AUROC) perform better at assessing variant impact? No
• Models for some TFs with distinct binding specificities are
unable to predict variant impact
• What else could potentially drive poor performance?
24. 24
What drives TF-specific performance?
Alternative binding mechanisms explain performance differences
• A TF model can have less specificity at predicting
variant impact due to:
• Co-factors: TFs in larger complexes could have
different specificities
• Methylation: TFs that depend on methylation
for binding
• DNA shape: TFs that depend on shape of the
DNA
• PTMs: can regulate TF binding specificity (e.g.
in p53)
25. 25
What drives TF-specific performance?
Take-home messages
• Predictions for certain TFs were consistently poor, and our investigation supports efforts to use
features beyond sequence, such as methylation, DNA shape, and post-translational modifications
• Cell type/cell line is also a confounding factor
26. 26
Detecting TF-altering LoF variants in a genome
• Loss of binding does not necessarily imply phenotypic consequence
• How to assess performance of predictors wrt TFBS-altering variants that have a
phenotypic consequence?
• No large scale TF-specific datasets available
27. 27
Detecting TF-altering LoF variants in a genome
• Manually curated 73 variants (11 gain
and 62 losses) with a phenotypic
consequence due to an altered TFBS
• 32 TFs and
• 36 phenotypes
• 35/73 (48%) of which have a
DeepSEA/DanQ/DeepBind ChIP-seq
binding model for the corresponding TF
• Scored variants against corresponding
DeepBind/DeepSEA/DanQ models
28. 28
Detecting TF-altering LoF variants in a genome
• Also scored 10,000 randomly sampled
1000 Genomes variants with an AF > 5%
as a background set, which was used to define
an empirical p-value
• For a given score s of a curated variant,
p-value is computed using the number
of 1000G variants that have a score ≥ s
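A minimal sketch of the empirical p-value follows. It assumes the p-value is the plain fraction of background variants scoring at or above s, and that scores are oriented so that larger means more extreme; the slide only says the p-value is computed "using the number" of such variants, so the exact normalisation is an assumption. The background scores here are synthetic stand-ins for the 10,000 common 1000G variants.

```python
def empirical_p(score, background_scores):
    """Fraction of background variants whose score is >= the given score."""
    exceed = sum(1 for b in background_scores if b >= score)
    return exceed / len(background_scores)


# Synthetic background of 10,000 scores evenly spaced over [0, 10).
background = [i / 1000 for i in range(10000)]

# A hypothetical curated variant with score 9.5.
p_value = empirical_p(9.5, background)
print(p_value)
```

With a background of 10,000 variants the smallest non-zero p-value under this definition is 1e-4, which is enough to resolve the p < 0.001 threshold.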
29. 29
Detecting TF-altering LoF variants in a genome
• 70% of variants had a p-value <0.05
• 67% of variants had a p-value <0.01
• 30% of variants had a p-value of <0.001
→ Predictors were able to identify the
majority of these variants accurately
30. 30
Detecting TF-altering LoF variants in a genome
P-value-transformed values vs. model scores
• P-value transformation using a background set (e.g.
1000G) is common practice in assessing variant
impact
• Is it necessary?
• Across TFs, there is a strong linear relationship
between the raw score and the 1000G-transformed
p-value (a)
• P-value transformation is therefore not a necessity;
one can simply use a universal cut-off on the model's score
31. 31
Understanding our ability to detect LoF TFBS
variants in a genome
• Need: representative set of variants that are unlikely to cause LoF
• Collected variants from four relatively healthy patients (PGP)
• Restricted to
• Haploinsufficient genes as defined by ExAC pLI scores (pLI > 0.90)
• Falling within 5kb of the TSS (core promoter + extended region)
• Rare: gnomAD AF < 1e-4
• Average total of 79 variants per sample
32. 32
Understanding our ability to detect LoF TFBS
variants in a genome
• At a given cutoff, assess the % of variants falling below (loss) or above (gain) that cutoff by at least one TF model
• For a given genome, at a cutoff of -1 (the “sweet spot”) we are able to recover ~70% of curated variants with a
phenotype
• This maintains an average false positive rate of ~15% across the four genomes (0.15 * 79 = ~12 variants)
• Similar for gains, although there are far fewer curated gain variants
[Figure, two panels (Loss, Gain). x-axis: DanQ Log FC score cutoff (-6 to 0 for Loss, 0 to 6 for Gain). y-axis: proportion of variants with at least one model < cutoff (Loss) or > cutoff (Gain), 0.00 to 1.00. Series: Curated variants gain (n=4), Curated variants loss (n=39), PGPC_0003 (n=79), PGPC_0004 (n=65), PGPC_0005 (n=113), PGPC_0007 (n=59). Additional labels: 0.10, 0.05, 0.01]
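The cutoff analysis on this slide can be sketched as follows: for each variant, check whether at least one TF model scores it beyond the cutoff, then report the flagged fraction for the curated set (recovery) and for a healthy genome (false positives). The per-variant score lists are invented for illustration, and `fraction_flagged` is a hypothetical helper name.

```python
def fraction_flagged(variant_scores, cutoff, direction="loss"):
    """Fraction of variants with at least one TF model score below the
    cutoff (loss) or above it (gain)."""
    def is_hit(scores):
        if direction == "loss":
            return any(s < cutoff for s in scores)
        return any(s > cutoff for s in scores)

    return sum(1 for scores in variant_scores if is_hit(scores)) / len(variant_scores)


# Hypothetical DanQ Log FC scores: one list of per-TF-model scores per variant.
curated_loss = [[-2.0, -0.1], [-1.4], [-0.3, 0.2], [-3.1, -1.2]]
healthy_genome = [[-0.2, 0.1], [-1.3, 0.0], [0.4], [-0.5], [-0.9, -0.1]]

recovered = fraction_flagged(curated_loss, cutoff=-1.0)
false_positive_rate = fraction_flagged(healthy_genome, cutoff=-1.0)
print(recovered, false_positive_rate)
```

Sweeping the cutoff over a range of values and plotting both fractions reproduces the kind of trade-off curve the slide describes.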
33. 33
Summary and wrap-up
• ASB data presents a useful resource for benchmarking TF model variant-
impact predictions
• Models could be trained to maximise variant-impact performance instead
of binding performance
• Our compiled set of ASB data (~100,000 variants, 150,000 TF-variant
pairs) is the largest available and is freely accessible online in the
supplementary data of the bioRxiv paper http://goo.gl/2wFQ9w
34. 34
Summary and wrap-up
• PWMs do not perform well at predicting variant impact; DL methods
perform significantly better
• TFs do not perform uniformly at predicting variant impact!
• TFs with poor performance at assessing variant impact often rely on
additional mechanisms such as binding partners, methylation, DNA shape
and PTMs
• Incorporation of these mechanisms into training TF-binding models will
drastically increase TF-binding/variant-impact performance
35. 35
Summary and wrap-up
• Analysis of genome for healthy individuals reveals that DL models based
purely on sequence specificity in their current state perform reasonably
well at identifying LoF variants caused by altered TFBSs, while minimising
false positive rates