This document provides an overview of a workshop on integrating genetics, omics, and chemical data for drug discovery using the Open Targets platform. The workshop covers an introduction to Open Targets, demonstrations of the Open Targets platform and genetics tools, and hands-on exercises. Open Targets is a partnership that aims to systematically identify and prioritize drug targets through integrating public data and experimental projects. The platform provides target and disease annotations, evidence for target-disease associations, and tools to explore target tractability and safety.
1. Open Targets: integrating genetics, omics and chemical data for drug discovery
Denise Carvalho-Silva, PhD
EMBL-EBI | Open Targets
Wellcome Genome Campus
United Kingdom
C4X Discovery
June 18th 2019
3. • Introduction to Open Targets
• Open Targets Platform: presentation + live demos
• Hands-on exercises
Lunch break → 12:00-13:00
• Open Targets Genetics: presentation + live demo
• Feedback survey, wrap up, further discussion
This session 10:00-15:00
4. Drug discovery R&D
Lengthy, costly, high attrition
https://www.genengnews.com/insights/how-crispr-is-accelerating-drug-discovery/
DOI: 10.1016/j.molonc.2012.02.004
5. Open Targets
A partnership to transform drug discovery
Founded in 2014
Founding partners, with further partners joining in 2016, 2017 and 2018 (two in 2018)
Systematic identification and prioritisation of targets
8. Data generation: Open Targets
www.opentargets.org/science/
• Organoid knockouts (CRISPR-Cas9) - gut epithelium
• IL22 pathway to treat IBD?
• Wellcome Sanger Institute, GSK, University of Cambridge
• Alzheimer’s and Parkinson’s
• CRISPR/Cas9 screens, iPS cells
• Wellcome Sanger Institute, Biogen, Gurdon Institute
Some examples
• > 1,000 cancer cell lines + biomarkers + tractability
• RNASeq, CRISPR/Cas9 screens
• Wellcome Sanger Institute, GSK and EMBL-EBI
10. Behan et al (2019)
WRN sustains in vivo growth in:
• colorectal
• ovarian
• endometrial
• gastric
• New candidate target for tumours
with MSI (WRN antagonists)
Data now available in the Open Targets Platform e.g.
https://www.targetvalidation.org/evidence/ENSG00000100941/EFO_0000305?view=sec:affected_pathway
14. What is a target?
https://www.targetvalidation.org/target/ENSG00000255248
https://www.targetvalidation.org/target/ENSG00000175482?view=sec:genome_browser
28K
targets
Examples
15. • Modified version of Experimental Factor Ontology (EFO)
• Controlled vocabulary (Coeliac versus Celiac)
• Hierarchy (relationships)
How do we describe our diseases?
• Promotes consistency
• Increases the richness of annotation
• Allows for easier and automatic integration
10K
diseases
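To illustrate why a controlled vocabulary with a hierarchy helps integration, here is a minimal Python sketch. The identifiers (`DIS:0001`, `DIS:0002`), the synonym table, and the parent link are hypothetical placeholders, not real EFO terms.

```python
# Hypothetical mini-ontology: IDs and labels are placeholders, not real EFO entries.
SYNONYMS = {
    "coeliac disease": "DIS:0002",
    "celiac disease": "DIS:0002",
    "autoimmune disease": "DIS:0001",
}
PARENT = {"DIS:0002": "DIS:0001"}  # child -> parent ("is a" relationship)


def normalise(label):
    """Map a free-text disease label to a single controlled identifier."""
    return SYNONYMS[label.strip().lower()]


def ancestors(term_id):
    """Walk up the hierarchy and return all ancestor identifiers."""
    found = []
    while term_id in PARENT:
        term_id = PARENT[term_id]
        found.append(term_id)
    return found
```

Both spellings resolve to the same identifier, and the parent link lets evidence attached to a specific disease also be found under its broader category.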
16. Evidence for our T-D associations
https://docs.targetvalidation.org/data-sources/data-sources
17. Data sources grouped into data types
Data type
Data source
https://docs.targetvalidation.org/data-sources/data-sources
18. What can you do with
the Open Targets
Platform?
• Target annotations
• Target-disease associations (+ evidence + score)
• Disease annotations
http://www.targetvalidation.org/target/ENSG00000141510
http://www.targetvalidation.org/disease/EFO_0000228
https://www.targetvalidation.org/evidence/ENSG00000141510/EFO_0000228
19. Demo 1
Searching for a disease → targets
What is the evidence for
the association?
Which targets are
associated with my disease?
Is there any data to help me
with target prioritization e.g.
tractability?
21. Association score → confidence
Which targets have more evidence for an association?
What is the relative weight of the evidence for different targets?
Columns: Overall score, Genetic, Somatic, Drugs, Pathways, Expression, Animal models, Text mining
22. Four-tier framework
Scores are calculated at four levels:
• Evidence
• Data source
• Data type
• Overall
Score: 0 to 1 (max)
Aggregation with ΣH (harmonic sum):
$\Sigma_H = S_1 + \frac{S_2}{2^2} + \frac{S_3}{3^2} + \frac{S_4}{4^2} + \dots + \frac{S_i}{i^2}$
Note: Each data set has its own scoring and ranking scheme
Data sources and their weight factors, grouped by data type (ΣH is applied at each level, up to the overall association score):
• Genetic associations: European Variation Archive (germline) (x 1.0), UniProt (x 1.0), Gene2Phenotype (x 1.0), GWAS catalog (x 1.0), Genomics England (x 1.0), PheWAS catalog (x 1.0)
• Somatic mutations: Cancer Gene Census (x 1.0), European Variation Archive (somatic) (x 1.0), IntOGen (x 1.0)
• Drugs: ChEMBL (x 1.0)
• Pathways & systems biology: Reactome (x 1.0), SLAPenrich (x 0.5), PROGENy (x 0.5), Sysbio (x 0.5)
• RNA expression: Expression Atlas (x 0.2)
• Text mining: Europe PMC (x 0.2)
• Animal models: PhenoDigm (x 0.2)
23. Statistical integration, aggregation and scoring
From evidence to overall score
https://docs.targetvalidation.org/getting-started/scoring
1) Evidence score (e.g. one SNP from a GWAS paper)
2) Data source score (e.g. all SNPs from the GWAS catalog)
3) Data type score (e.g. all sources of Genetic associations)
4) Overall association score
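The four levels can be sketched as a roll-up in Python. This is an illustration only: the evidence scores are made up, the data source and data type keys are hypothetical labels, and exactly how the weight factors and the 0 to 1 cap are applied at each level is an assumption here. The scoring documentation linked above remains the authority.

```python
def harmonic_sum(scores):
    """Sum S_i / i^2 over scores sorted in descending order."""
    ordered = sorted(scores, reverse=True)
    return sum(s / (i ** 2) for i, s in enumerate(ordered, start=1))


def capped(value):
    """Keep a score inside the 0 to 1 range (assumed capping rule)."""
    return min(1.0, max(0.0, value))


# Level 1: made-up evidence scores, keyed by data type, then data source.
EVIDENCE = {
    "genetic_association": {"gwas_catalog": [0.6, 0.4], "uniprot": [1.0]},
    "text_mining": {"europe_pmc": [0.3, 0.3, 0.2, 0.1]},
}
# Weight factor per data source (1.0 unless downweighted).
WEIGHTS = {"europe_pmc": 0.2}


def overall_score(evidence, weights):
    """Roll evidence scores up to data source, data type and overall scores."""
    data_type_scores = {}
    for data_type, sources in evidence.items():
        # Level 2: one score per data source, scaled by its weight factor.
        source_scores = [
            capped(harmonic_sum(scores)) * weights.get(source, 1.0)
            for source, scores in sources.items()
        ]
        # Level 3: one score per data type.
        data_type_scores[data_type] = capped(harmonic_sum(source_scores))
    # Level 4: overall association score.
    overall = capped(harmonic_sum(data_type_scores.values()))
    return overall, data_type_scores


overall, by_type = overall_score(EVIDENCE, WEIGHTS)
```

With these invented inputs the genetic evidence saturates at 1, while the downweighted text-mining evidence contributes only a small data type score.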
24. Computing the score for one evidence
score = f * s * c
f, relative occurrence of a target-disease evidence
s, strength of the effect of the variant
c, confidence of the observation for the target-disease evidence
Factors affecting the relative strength of GWAS Catalog evidence:
f = sample size (cases and controls)
s = predicted functional consequence (VEP)
c = p value reported in the paper
https://docs.targetvalidation.org/getting-started/scoring
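As a rough illustration of the product score = f * s * c, the sketch below multiplies three factors that are assumed to be already normalised to the 0 to 1 range. How sample size, VEP consequence, and p value are each mapped to that range is not shown on the slide, so the input values are invented.

```python
def evidence_score(f, s, c):
    """Score for a single piece of evidence: score = f * s * c.

    f, s and c are assumed to be pre-normalised to [0, 1]; the mapping from
    raw sample size, VEP consequence and p value is not covered here.
    """
    for name, value in (("f", f), ("s", s), ("c", c)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return f * s * c


# Invented inputs: a large study, a moderate-impact variant, a strong p value.
example = evidence_score(f=0.9, s=0.5, c=1.0)
```

Because the factors are multiplied, a low value in any one of them pulls the whole evidence score down.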
26. Aggregating scores across the data
• Using a mathematical function, the harmonic sum*:
$\Sigma_H = S_1 + \frac{S_2}{2^2} + \frac{S_3}{3^2} + \dots + \frac{S_i}{i^2}$
where $S_1, S_2, \dots, S_i$ are the individual evidence scores sorted in descending order
* PMID: 19107201, PMID: 20118918
• Advantages:
A) account for replication
B) deflate the effect of large amounts of data e.g. text mining
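The two advantages can be checked numerically with a few lines of Python. The scores below are invented for illustration.

```python
def harmonic_sum(scores):
    """Sum S_i / i^2 over scores sorted in descending order."""
    ordered = sorted(scores, reverse=True)
    return sum(s / i ** 2 for i, s in enumerate(ordered, start=1))


single = harmonic_sum([0.8])            # one observation
replicated = harmonic_sum([0.8, 0.8])   # the same finding seen twice
many_weak = harmonic_sum([0.1] * 1000)  # e.g. a large number of weak hits
plain_sum = sum([0.1] * 1000)           # what a simple sum would give
```

A replicated finding raises the total, whereas a thousand weak scores stay below 0.1 × π²/6 (about 0.165), far less than their plain sum of roughly 100.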
29. What can you do with
the Open Targets
Platform?
• Target annotations
• Target-disease associations (+ evidence + score)
• Disease annotations
https://www.targetvalidation.org/target/ENSG00000141510
https://www.targetvalidation.org/disease/EFO_0000228
https://www.targetvalidation.org/evidence/ENSG00000141510/EFO_0000228
44. We have a list of 20 possible
targets for multiple myeloma.
https://www.targetvalidation.org/batch-search
Several targets at once
We would like to know if these
targets are represented in
other diseases.
Are there any pathways over-represented
in my set of targets?
51. How to get the evidence for an association
http://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000198947&disease=Orphanet_98896&datatype=genetic_association
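To see what the evidence URL above is asking for, it can be split into its endpoint path and query parameters with the Python standard library. This sketch only inspects the string and does not contact the API.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "http://platform-api.opentargets.io/v3/platform/public/evidence/filter"
    "?target=ENSG00000198947&disease=Orphanet_98896&datatype=genetic_association"
)

parts = urlsplit(url)
# parse_qs returns a list per key; each key appears once here, so keep the first value.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.path)  # endpoint: evidence filtered by the parameters below
for key, value in params.items():
    print(f"{key} = {value}")
```

The three filters are a target (Ensembl gene ID), a disease (ontology ID), and a data type, which together select the evidence for one association.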
54. Extra hands-on exercises E1-E4
Pages 29-31
Or feel free to explore your target (gene/protein)
and disease of interest
https://tinyurl.com/exercises-c4x
55. Data integration: Open Targets
Data integration: web resources
https://www.opentargets.org/resources/#open-targets-platform
56. Open Targets Genetics. Why?
• Refine genetic associations in the Open Targets Platform
• Make sense of GWAS
• Glimpses of the biology behind the association
• Guide target ID?
Lee et al (2017): PMID:29288389
59. Open Targets Genetics: data sources
https://genetics-docs.opentargets.org/our-approach/data-sources
Functional Genomics
Variant – Gene
Full summary statistics
Neale and colleagues v1
337,199 individuals
2,419 traits
SNP-trait
associations
Human genetics
Variant (gnomAD) – Trait (study)
Sun et al. 2018
Ensembl
VEP
*
**
***
* Javierre et al. 2016
** Andersson et al. 2014
*** Thurman et al. 2012
60. Open Targets Genetics: data model
S → VL → VT → G
S Study (traits from UK Biobank and GWAS Catalog)
VL Lead variant (associated with traits from GWAS Catalog)
VT Tag variant (possibly causal, expanded from VL)
G Gene
S to VL: p-value (GWAS Catalog, UK Biobank)
VL to VT: LD expansion (r2) or fine mapping (posterior probability)
VT to G: TSS distance, eQTL (GTEx V7), pQTL, PCHi-C, FANTOM5, VEP, combined into an aggregated functional score
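One way to picture the study, lead variant, tag variant, and gene chain is as three linked record types. The sketch below is a hypothetical illustration in Python: the class names, field names, and all example values are invented and do not reflect the actual Open Targets Genetics schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LeadAssociation:  # S -> VL
    study_id: str
    lead_variant: str
    p_value: float


@dataclass(frozen=True)
class TagLink:  # VL -> VT, via LD expansion (r2) or fine mapping (posterior prob.)
    lead_variant: str
    tag_variant: str
    r2: Optional[float] = None
    posterior_prob: Optional[float] = None


@dataclass(frozen=True)
class GeneLink:  # VT -> G
    tag_variant: str
    gene_id: str
    functional_score: float


def genes_for_study(study_id, leads, tags, genes):
    """Follow S -> VL -> VT -> G; return gene IDs ranked by best functional score."""
    lead_variants = {a.lead_variant for a in leads if a.study_id == study_id}
    tag_variants = {t.tag_variant for t in tags if t.lead_variant in lead_variants}
    best = {}
    for link in genes:
        if link.tag_variant in tag_variants:
            best[link.gene_id] = max(best.get(link.gene_id, 0.0), link.functional_score)
    return sorted(best, key=best.get, reverse=True)


# Placeholder data only.
leads = [LeadAssociation("STUDY_1", "var_A", 1e-9)]
tags = [TagLink("var_A", "var_A", r2=1.0), TagLink("var_A", "var_B", r2=0.8)]
genes = [
    GeneLink("var_A", "GENE_X", 0.4),
    GeneLink("var_B", "GENE_Y", 0.7),
    GeneLink("var_B", "GENE_X", 0.5),
    GeneLink("var_C", "GENE_Z", 0.9),  # not tagged by this study's lead variant
]
ranked = genes_for_study("STUDY_1", leads, tags, genes)
```

Taking the best score per gene is a simplification chosen for the example; the pipeline overview linked below describes how the evidence is actually aggregated.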
https://genetics-docs.opentargets.org/our-approach/pipeline-overview
61. What can you do
with Open Targets
Genetics?
Variant Trait Gene
• Genes functionally implicated
• Variant association across traits (PheWAS)
• Variants tagged
• Associated traits
• Links to drugs, expression,
pathway, mouse phenotype,
etc in Open Targets Platform
• Independently associated loci
• Overlapping susceptibility loci
V S G
Locus plot
Visualising the associations between traits, variants, and genes
Fine mapping analysis
62. How to access all this data?
http://genetics-api.opentargets.io/
GraphQL API
Bulk download*
* http://blog.opentargets.org/2019/04/09/open-targets-genetics-release-is-out/
https://genetics.opentargets.io/
User interface
63. The SNP rs12916 has been
associated with LDL cholesterol
(Teslovich et al 2010)
https://genetics.opentargets.org
Demo: searching for a SNP
Can we use Open Targets Genetics
to find which genes are functionally
implicated by this variant?
Which is the nearest protein
coding gene to this variant?
Can we compare Teslovich
et al with other LDL
cholesterol studies?
70. Details on data sources to associate
targets and diseases
Other Open Targets resources
Extra extra extra
71. Target safety
• Clinical phases e.g. phase IV (bucket 1)
• Cellular localization e.g. plasma membrane (bucket 4)
• DrugEBIlity – ensemble score e.g. > 0.7 (bucket 5)
https://docs.targetvalidation.org/getting-started/getting-started/target-profile
72. Data source: GWAS catalog
• Genome Wide Association Studies
• Array-based chips → genotyping 100,000 SNPs genome-wide
Data type
73. Data source: UniProt
• Protein: sequence, annotation, function
• Manual curation of coding variants in patients
EMBL-EBI train online
Data types
74. • Variants, genes, phenotypes in rare diseases
• Literature curation → consultant clinical geneticists in the UK
Data source: Gene2Phenotype
https://www.nature.com/articles/s41467-019-10016-3
Data type
75. Data source: UniProt
• Protein: sequence, annotation, function
• Manual curation of coding variants in patients
EMBL-EBI train online
Data type
76. Data source: PheWAS
• Phenome Wide Association Studies
• A variant associated with multiple phenotypes
• Clinical phenotypes derived from EMR-linked biobank BioVU
• ICD9 codes mapped to EFO
Data type
77. Data source: GE PanelApp
• Aid clinical interpretation of genomes for the 100K project
• We include ‘green genes’ from version 1+ and phenotypes
Data type
78. Data source: EVA
• With ClinVar information for rare diseases
• Clinical significance: pathogenic, protective
EMBL-EBI train online
Data types
79. Data source: The Cancer Gene Census
• Genes with mutations causally implicated in cancer
• Gene associated with a cancer plus other cancers associated
with that gene
Data type
80. Data source: IntOGen
• Genes and somatic (driver) mutations, 28 cancer types
• Involvement in cancer biology
• TCGA data
• Rubio-Perez et al. 2015
Data type
81. Data source: ChEMBL
• Known drugs linked to a disease and a known target
• FDA approved for clinical trials or marketing
EMBL-EBI train online
Data type
82. Data source: Reactome
• Biochemical reactions and pathways
• Manual curation of pathways affected by mutations
EMBL-EBI train online
Data type
83. Data source: SLAPenrich
• 374 pathways curated and mapped to cancer hallmarks
• Divergence of the total number of (TCGA) cancer samples with
genomic alterations
• Mutational burden and total exonic block length of genes
• Downweighted (x 0.5)
Data type
84. Data source: PROGENy
• Comparison of pathway activities between normal and primary
samples from TCGA
• Inferred from RNA-seq: 9,250 tumour and 741 normal samples
• EGFR, hypoxia, JAK.STAT, MAPK, NFkB, PI3K, TGFb, TNFa,
Trail, VEGF, and p53
• Downweighted (x 0.5)
Data type
85. Data source: SysBio
• Curation of four systems biology papers
• Available for six gene lists: ~ 400 genes
• Late-onset Alzheimer's, cognitive decline, CHD, and IBD
• Score: p-values or rank-based scores if available; if not, s = 0.5
• Downweighted (x 0.5)
Data type
https://docs.targetvalidation.org/data-sources/affected-pathways#sysbio
87. Data source: Expression Atlas
• Baseline expression for human genes
- target profile page
• Differential mRNA expression (healthy versus diseased):
- target-disease associations
• No propagation in the disease ontology
• Downweighted (x 0.2)
EMBL-EBI train online
Data type
88. Data source: Europe PMC
• Mining titles, abstracts, full text in research articles
• Target and disease co-occurrence in the same sentence
• Dictionary (not NLP)
• Downweighted (x 0.2)
EMBL-EBI train online
Data type
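A toy Python sketch of dictionary-based matching with same-sentence co-occurrence is shown below. It is not the Europe PMC pipeline: the dictionaries, identifiers, and text are invented, and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical dictionaries: lower-cased surface forms mapped to placeholder IDs.
TARGETS = {"genex": "TARGET_1"}
DISEASES = {"example syndrome": "DISEASE_1"}


def mentions(term, sentence):
    """True if the dictionary term occurs as a whole word or phrase."""
    return re.search(rf"\b{re.escape(term)}\b", sentence) is not None


def cooccurrences(text, targets, diseases):
    """Return (target_id, disease_id, sentence) for sentences naming both."""
    hits = []
    # Naive split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        for t_name, t_id in targets.items():
            for d_name, d_id in diseases.items():
                if mentions(t_name, lowered) and mentions(d_name, lowered):
                    hits.append((t_id, d_id, sentence))
    return hits


TEXT = (
    "GENEX is highly expressed in liver. "
    "Variants in GENEX were linked to example syndrome. "
    "Example syndrome is rare."
)
hits = cooccurrences(TEXT, TARGETS, DISEASES)
```

Only the middle sentence names both the target and the disease, so it is the only one reported. A dictionary approach like this finds co-mentions but says nothing about the kind of relationship between them.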
89. Data sources: PhenoDigm
• Semantic approach to associate mouse models with diseases
• Similarity score between a mouse model and a human disease
(Smedley et al 2013)
• Downweighted (x 0.2)
Data type
90. How to run our REST endpoints (option 1)
Paste the URL in the location bar of your browser
92. How to run our REST endpoints (option 2)
Command line e.g. curl -X GET
93. How to run our REST endpoints (option 3)
Use our free Python client *
* https://docs.targetvalidation.org/tutorials/python-client
Advantage: you can change the way the associations are scored e.g. increase the weight given to
text mining data
94. REST API calls: some examples*
https://platform-api.opentargets.io/v3/platform/public/search?q=alzheimer%27s
https://platform-api.opentargets.io/v3/platform/public/association/filter?target=ENSG00000142192
https://platform-api.opentargets.io/v3/platform/public/association/filter?target=ENSG00000142192&direct=true
https://platform-api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000142192&disease=EFO_0000249&datasource=uniprot&direct=true
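The example calls above share one base URL and differ only in endpoint and query parameters, so they can be assembled programmatically. The helper below, `build_url`, is a hypothetical convenience function written for this sketch; it only builds strings and sends no requests, and these v3 endpoints may no longer respond.

```python
from urllib.parse import urlencode

BASE = "https://platform-api.opentargets.io/v3/platform/public"


def build_url(endpoint, **params):
    """Join the base URL, an endpoint path and URL-encoded query parameters."""
    query = urlencode(params)
    return f"{BASE}/{endpoint}?{query}" if query else f"{BASE}/{endpoint}"


search = build_url("search", q="alzheimer's")
associations = build_url("association/filter", target="ENSG00000142192", direct="true")
evidence = build_url(
    "evidence/filter",
    target="ENSG00000142192",
    disease="EFO_0000249",
    datasource="uniprot",
    direct="true",
)
```

`urlencode` takes care of escaping, which is why the apostrophe in the search term becomes `%27`.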
95. Open Targets REST API
Private: methods used by the UI to serve external data. Subject to change without notice
https://platform-api.opentargets.io/v3/platform/docs/swagger-ui
96. LINK
• LINK: Literature coNcept Knowledgebase
• Subject / predicate / object structured relations
From PubMed abstracts
Proof of Concept
Further development
http://link.opentargets.io/
97. Addressing text mining shortcomings
• Entities: genes, diseases, drugs
• Concepts extracted via NLP
(Natural Language Processing)
• 28 M documents, 500 M relations
• http://blog.opentargets.org/link/
98. DoRothEA
• Candidate TF-drug interactions in cancer
• 1000 cancer cell lines
• 265 anti-cancer compounds
• 127 transcription factors
http://cancerres.aacrjournals.org/content/early/2017/12/09/0008-5472.CAN-17-1679
dorothea.opentargets.io