Discovering new drugs is a lengthy and expensive process. Finding new uses for existing drugs can therefore help create new treatments in less time and at lower cost. The difficulty is in finding these potential new uses.
How do we find these undiscovered uses for existing drugs?
We can unify the available structured and unstructured data sets into a knowledge graph. This is done by fusing the structured data sets, and performing named entity extraction on the unstructured data sets. Once this is done, we can use deep learning techniques to predict latent relationships.
In this talk we will cover:
Building the knowledge graph
Predicting latent relationships
Using the latent relationships to repurpose existing drugs
Drug Repurposing using Deep Learning on Knowledge Graphs
1. Drug Repurposing using Deep Learning
on Knowledge Graphs
Or how to leverage AI to recycle (old) new
drugs
2. About Us
Alex Thomas is a principal data scientist at Wisecube. He's
used natural language processing and machine learning
with clinical data, identity data, employer and jobseeker
data, and now biochemical data. Alex is also the author of
Natural Language Processing with Spark NLP.
Vishnu is the CTO and Founder of Wisecube AI and has over two
decades of experience building data science teams and
platforms. Vishnu has extensive experience with various graph
databases including Neo4J, TitanDB (now JanusGraph) and
more recently OrientDB and AWS Neptune.
3. Drug Discovery is Broken
- Every year, around US$200 billion is
spent globally on biomedical
research
- 75% of potential drug target
research could not be reproduced
- The number of new drugs approved per
billion US dollars spent on R&D has
halved roughly every 9 years since 1950
- This trend is now called Eroom’s
Law (the reverse of Moore’s Law)
4. Drug Repurposing: looking for (old) new cures
Given the high attrition rates, substantial costs and
slow pace of new drug discovery and development,
repurposing of 'old' drugs is a viable alternative.
Repurposing drugs to treat both common and rare
diseases is increasingly becoming an attractive
proposition because it involves the use of de-risked
compounds
Various data-driven and experimental approaches
have been suggested for the identification of
repurposable drug candidates.
5. AI (NLP + Knowledge Graphs + Deep Graph Learning) to the rescue
Wisecube works with Research
and Pharmaceutical
organizations to help leverage
the power of AI to accelerate
drug discovery and repurposing
We are currently working with
St. John’s Institute to repurpose
drug candidates
7. Pipeline Deep Dive
● Datasets
○ Ingesting Data
○ Graph Building
○ Link Prediction
8. Datasets
❏ Drug Repurposing Knowledge
Graph (DRKG)
❏ “Drug Repurposing Knowledge Graph (DRKG) is a comprehensive
biological knowledge graph relating genes, compounds, diseases,
biological processes, side effects and symptoms.”
❏ https://github.com/gnn4dr/DRKG
❏ ChEMBL
❏ “ChEMBL is a manually curated database of bioactive molecules with
drug-like properties.”
❏ https://www.ebi.ac.uk/chembl/
❏ PubChem
❏ “PubChem is an open chemistry database at the National Institutes of
Health (NIH).”
❏ https://pubchemdocs.ncbi.nlm.nih.gov/about
9. Datasets: DRKG
❏ DrugBank
❏ “DrugBank is a pharmaceutical knowledge base that is enabling major advances across the data-driven medicine
industry.”
❏ Link: https://go.drugbank.com/
❏ GNBR
❏ “A global network of biomedical relationships derived from text”
❏ https://zenodo.org/record/1134693#.WqQe1GbVSL9
❏ Hetionet
❏ “Hetionet is an integrative network of biomedical knowledge assembled from 29 different databases of genes,
compounds, diseases, and more.”
❏ https://het.io/
❏ StringDB
❏ “STRING is a database of known and predicted protein-protein interactions.”
❏ https://string-db.org/cgi/about
❏ IntAct
❏ “IntAct provides a freely available, open source database system and analysis tools for molecular interaction data.”
❏ https://www.ebi.ac.uk/intact/
❏ DGIdb
❏ “[I]nformation on drug-gene interactions and the druggable genome, mined from over thirty trusted sources.”
❏ https://www.dgidb.org/
10. Pipeline Deep Dive
✓ Datasets
● Ingesting Data
○ Graph Building
○ Link Prediction
15. Pipeline Deep Dive
✓ Datasets
✓ Ingesting Data
● Graph Building
○ Link Prediction
16. Graph Building
❏ Explicit Relationships
❏ Literature-based Relationships
❏ Link Prediction Relationships
17. Graph Building: Explicit Relationships
❏ Explicit Relationships
❏ Triples data
❏ Inherently represents relationships
❏ Tabular data (flattened graph)
❏ 2 (or more) entities or IDs in each row
❏ Need to determine which fields are associated with which entity or edge
❏ RDBMS data
❏ Foreign keys
❏ Join tables
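As a minimal sketch of the tabular case, the snippet below turns a flattened table into triples and separates the columns that describe each entity from the column that describes the edge. The table contents, the column names, and the helpers `rows_to_triples` and `node_attributes` are hypothetical illustrations, not part of the pipeline described in the talk.

```python
import csv
import io

# Hypothetical flattened table: each row names two entities and one relation.
TABLE = """drug_id,drug_name,gene_id,gene_symbol,action
D1,DrugA,G1,GENE1,inhibits
D1,DrugA,G2,GENE2,binds
D2,DrugB,G1,GENE1,inhibits
"""

def rows_to_triples(rows, head_col, relation_col, tail_col):
    """Turn each row into a (head, relation, tail) triple, dropping duplicates."""
    triples = {(row[head_col], row[relation_col], row[tail_col]) for row in rows}
    return sorted(triples)

def node_attributes(rows, id_col, attr_cols):
    """Collect the columns that describe an entity rather than an edge."""
    nodes = {}
    for row in rows:
        nodes[row[id_col]] = {col: row[col] for col in attr_cols}
    return nodes

rows = list(csv.DictReader(io.StringIO(TABLE)))
triples = rows_to_triples(rows, "drug_id", "action", "gene_id")
drugs = node_attributes(rows, "drug_id", ["drug_name"])
genes = node_attributes(rows, "gene_id", ["gene_symbol"])
```

The same idea would apply to RDBMS data: a foreign key column plays the role of `tail_col`, and a join table is already a list of (head, tail) pairs.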
18. Graph Building: from Literature
❏ Heuristic vs Model
❏ Relationship extraction data sets are rare compared to NER data sets
❏ Creating labels requires experts
❏ Heuristics with labels
❏ Stated relationships may span across multiple sentences
❏ Certain styles of language are excessively verbose
❏ Especially academic language
19. Graph Building: from Literature
1. Given two terms, u and v
2. Calculate TF.IDF for extracted entities
3. Sum TF.IDF for u and v over all documents
• TF.IDF(u), TF.IDF(v)
4. Identify documents where u and v share a
context
• Sentence, window, paragraph, whole document
5. Sum TF.IDF for u and v over all documents
where u and v share a context
• TF.IDF(u,v)
6. The weight for the potential u~v edges is
the ratio of these two sums
7. Accept edges over chosen threshold
• Top 10%
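The steps above can be sketched roughly as follows. This is one possible reading of the heuristic, not the speakers' implementation: it assumes whole-document context, a smoothed IDF of log(1 + N/df) so that no term gets zero weight, and hypothetical helper names (`tfidf_table`, `edge_weights`, `top_fraction`). Each document is represented as a list of already-extracted entity mentions.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

def tfidf_table(docs):
    """Per-document {entity: TF.IDF}, using a smoothed IDF (an assumption)."""
    n_docs = len(docs)
    df = Counter(entity for doc in docs for entity in set(doc))
    table = []
    for doc in docs:
        tf = Counter(doc)
        table.append({e: (c / len(doc)) * math.log(1 + n_docs / df[e])
                      for e, c in tf.items()})
    return table

def edge_weights(docs):
    """Weight for each candidate u~v edge: shared-context sum / overall sum."""
    table = tfidf_table(docs)
    total = defaultdict(float)   # TF.IDF(u) summed over all documents
    shared = defaultdict(float)  # TF.IDF(u,v) summed over shared documents
    for scores in table:
        for entity, score in scores.items():
            total[entity] += score
        for u, v in combinations(sorted(scores), 2):
            shared[(u, v)] += scores[u] + scores[v]
    return {(u, v): s / (total[u] + total[v]) for (u, v), s in shared.items()}

def top_fraction(weights, fraction=0.10):
    """Keep the highest-weighted edges, e.g. the top 10%."""
    keep = max(1, math.ceil(len(weights) * fraction)) if weights else 0
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:keep]

docs = [["a", "b"], ["a", "b"], ["a", "c"], ["d"]]
weights = edge_weights(docs)
accepted = top_fraction(weights, fraction=0.5)
```

Under these assumptions the weight lies between 0 and 1 and reaches 1 only when u and v never appear apart. A narrower context (sentence or window) would only change which mentions are grouped into each inner list.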
21. Pipeline Deep Dive
✓ Datasets
✓ Ingesting Data
✓ Graph Building
● Link Prediction
22. Link Prediction
❏ Untyped models
❏ Jaccard
❏ DeepWalk
❏ Typed Models
❏ TransE-L2
❏ DGL
❏ “Deep Graph Library (DGL) is a Python package
built for easy implementation of graph neural
network model family, on top of existing DL
frameworks (currently supporting PyTorch, MXNet
and TensorFlow).”
❏ https://docs.dgl.ai/
23. Link Prediction: Jaccard
❏ Intuition
❏ Two unconnected nodes that are connected to many of the same nodes may themselves be connected
❏ Pros
❏ No training necessary
❏ Cons
❏ The intuition is often unrealistic
❏ Jaccard similarity
❏ For nodes u and v
❏ N(u): set of nodes connected to u
❏ N(v): set of nodes connected to v
❏ Jaccard similarity is |N(u) ∩ N(v)| / |N(u) ∪ N(v)|
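A minimal sketch of the Jaccard score on an untyped edge list is shown below. The node names and the helpers `neighbors` and `jaccard` are made up for illustration; edge types and direction are ignored, which matches the untyped setting.

```python
from collections import defaultdict

def neighbors(edges):
    """Build N(u) for every node from an undirected, untyped edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def jaccard(adj, u, v):
    """|N(u) intersect N(v)| / |N(u) union N(v)|, or 0.0 if both sets are empty."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

# Hypothetical toy graph.
edges = [
    ("drugA", "gene1"), ("drugA", "gene2"),
    ("drugB", "gene1"), ("drugB", "gene2"), ("drugB", "gene3"),
]
adj = neighbors(edges)
score = jaccard(adj, "drugA", "drugB")  # 2 shared neighbors out of 3 in total
```

Candidate links can then be ranked by score and cut at a chosen threshold, with no training step involved.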
24. DeepWalk
❏ Intuition
❏ A node can be characterized by the paths it occurs in
❏ Creates embeddings (vector representations)
❏ Pros
❏ Easy to train as it relies on models used in NLP
❏ Cons
❏ Does not take into account the edge type
❏ DeepWalk
❏ For each node u, generate K random paths of length L with u in the
middle of the path
❏ Using these paths, build a model to predict u given the nodes before
and after it
❏ Model
❏ Build a model to predict if two nodes (represented by their
embeddings) are connected
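The path-generation step can be sketched as below, assuming an undirected graph and an odd path length so that u sits exactly in the middle. The helper names (`build_adjacency`, `half_walk`, `centered_walks`) and the toy edges are hypothetical. Only the walks are produced here; training the embedding model on them is left out.

```python
import random
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs; edge types are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def half_walk(adj, start, steps, rng):
    """Random walk of up to `steps` steps from `start`, excluding `start`."""
    walk, current = [], start
    for _ in range(steps):
        nbrs = sorted(adj.get(current, ()))  # sorted for reproducibility
        if not nbrs:
            break
        current = rng.choice(nbrs)
        walk.append(current)
    return walk

def centered_walks(adj, node, k, length, rng):
    """Generate k paths of `length` nodes with `node` in the middle."""
    half = (length - 1) // 2
    walks = []
    for _ in range(k):
        before = half_walk(adj, node, half, rng)[::-1]
        after = half_walk(adj, node, half, rng)
        walks.append(before + [node] + after)
    return walks

edges = [("drugA", "gene1"), ("drugB", "gene1"),
         ("gene1", "diseaseX"), ("drugB", "diseaseX")]
adj = build_adjacency(edges)
walks = centered_walks(adj, "gene1", k=3, length=5, rng=random.Random(0))
```

Each walk can be treated like a sentence and each node like a word, so a word-embedding model from NLP (for example a CBOW-style word2vec model, which predicts a word from its surrounding context) could be trained on the walks to obtain node embeddings.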
25. TransE L2
❏ Intuition
❏ Learn embeddings that directly predict edges
❏ Pros
❏ Directly predicts edges
❏ After embeddings are built, no additional model is needed
❏ Learns representation for relationships
❏ Cons
❏ More sophisticated model (more parameters) takes longer to train
❏ TransE L2
❏ u, v are node representations (vectors)
❏ r is an edge type representation
❏ Train a model that assumes ||u + r - v||_2 ≈ 0 if u and v are connected by an edge of type r
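To make the scoring rule concrete, here is a small sketch with hand-picked two-dimensional embeddings. The vectors and the helper names (`transe_l2_distance`, `margin_loss`, `rank_tails`) are illustrative assumptions. The margin ranking loss is the objective commonly paired with TransE and is not stated on the slide. In practice the embeddings would be learned by a library such as DGL rather than written by hand.

```python
import math

def transe_l2_distance(u, r, v):
    """||u + r - v||_2 for equal-length vectors given as lists of floats."""
    return math.sqrt(sum((ui + ri - vi) ** 2 for ui, ri, vi in zip(u, r, v)))

def margin_loss(positive, negative, margin=1.0):
    """Margin ranking loss: a true triple should be closer than a corrupted one."""
    d_pos = transe_l2_distance(*positive)
    d_neg = transe_l2_distance(*negative)
    return max(0.0, margin + d_pos - d_neg)

def rank_tails(u, r, candidates):
    """Sort candidate tail nodes by distance, most plausible first."""
    return sorted(candidates,
                  key=lambda name: transe_l2_distance(u, r, candidates[name]))

# Hypothetical, hand-picked embeddings.
drug = [1.0, 0.0]
treats = [0.0, 1.0]
diseases = {"diseaseX": [1.0, 1.0], "diseaseY": [3.0, 5.0]}

ranking = rank_tails(drug, treats, diseases)
loss = margin_loss((drug, treats, diseases["diseaseX"]),
                   (drug, treats, diseases["diseaseY"]))
```

Because the distance itself is the plausibility score, unseen (drug, treats, disease) triples can be ranked directly once the embeddings exist, which is why no additional model is needed.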
26. Research Case Study: Early Results
We worked with St. John’s
Institute (part of Providence
Healthcare) to repurpose
drugs to inhibit a kinase
target related to Alzheimer's
disease and have submitted
the first round of drug
candidates for expert review
27. In Summary
• Drug discovery scientists are drowning
in disjoint datasets, and bringing new
drugs to market is expensive and slow
• Drug repurposing is one way to bring
new cures using old drugs
• NLP, knowledge graphs and deep
graph learning are key to leveraging
the combined knowledge of
experimental and literature-based
evidence for accelerating drug
repurposing and research