Drug Repurposing using Deep Learning
on Knowledge Graphs
Or how to leverage AI to recycle (old) new
drugs
About Us
Alex Thomas is a principal data scientist at Wisecube. He's
used natural language processing and machine learning
with clinical data, identity data, employer and jobseeker
data, and now biochemical data. Alex is also the author of
Natural Language Processing with Spark NLP.
Vishnu is the CTO and Founder of Wisecube AI and has over two
decades of experience building data science teams and
platforms. Vishnu has extensive experience with various graph
databases including Neo4j, TitanDB (now JanusGraph) and
more recently OrientDB and AWS Neptune.
Drug Discovery is Broken
- Every year, around US$200 billion is
spent globally on biomedical
research
- 75% of potential drug target
research could not be reproduced
- The number of new drugs approved per
billion US$ spent on R&D has halved
roughly every 9 years since 1950
- This trend is now called Eroom’s
Law (the opposite of Moore’s law)
Drug Repurposing: looking for (old) new cures
Given the high attrition rates, substantial costs and
slow pace of new drug discovery and development,
repurposing of 'old' drugs is a viable alternative.
Repurposing drugs to treat both common and rare
diseases is increasingly becoming an attractive
proposition because it involves the use of de-risked
compounds.
Various data-driven and experimental approaches
have been suggested for the identification of
repurposable drug candidates.
AI (NLP + Knowledge Graphs + Deep Graph Learning) to the rescue
Wisecube works with Research
and Pharmaceutical
organizations to help leverage
the power of AI to accelerate
drug discovery and repurposing
We are currently working with
St. John’s Institute to repurpose
drug candidates
Wisecube Drug Repurposing Pipeline Overview
Pipeline Deep Dive
● Datasets
○ Ingesting Data
○ Graph Building
○ Link Prediction
Datasets
❏ Drug Repurposing Knowledge
Graph (DRKG)
❏ “Drug Repurposing Knowledge Graph (DRKG) is a comprehensive
biological knowledge graph relating genes, compounds, diseases,
biological processes, side effects and symptoms.”
❏ https://github.com/gnn4dr/DRKG
❏ ChEMBL
❏ “ChEMBL is a manually curated database of bioactive molecules with
drug-like properties.”
❏ https://www.ebi.ac.uk/chembl/
❏ PubChem
❏ “PubChem is an open chemistry database at the National Institutes of
Health (NIH).”
❏ https://pubchemdocs.ncbi.nlm.nih.gov/about
Datasets: DRKG
❏ DrugBank
❏ “DrugBank is a pharmaceutical knowledge base that is enabling major advances across the data-driven medicine
industry.”
❏ Link: https://go.drugbank.com/
❏ GNBR
❏ “A global network of biomedical relationships derived from text”
❏ https://zenodo.org/record/1134693#.WqQe1GbVSL9
❏ Hetionet
❏ “Hetionet is an integrative network of biomedical knowledge assembled from 29 different databases of genes,
compounds, diseases, and more.”
❏ https://het.io/
❏ StringDB
❏ “STRING is a database of known and predicted protein-protein interactions.”
❏ https://string-db.org/cgi/about
❏ IntAct
❏ “IntAct provides a freely available, open source database system and analysis tools for molecular interaction data.”
❏ https://www.ebi.ac.uk/intact/
❏ DGIdb
❏ “[I]nformation on drug-gene interactions and the druggable genome, mined from over thirty trusted sources.”
❏ https://www.dgidb.org/
HETIONET
Pipeline Deep Dive
✓ Datasets
● Ingesting Data
○ Graph Building
○ Link Prediction
Ingesting Data
❏ Unifying the data
❏ Loading the data
❏ Post-processing the data
Ingesting Data: Unification
❏ DrugBankID -> NCBI CID -> ChEMBLID
❏ PUG REST API
❏ https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest
❏ PUG VIEW REST API
❏ https://pubchemdocs.ncbi.nlm.nih.gov/pug-view
NCBI CID <- DrugBankID
NCBI CID -> ChEMBLID
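The ID chain above (DrugBankID to CID to ChEMBLID) could be sketched roughly as follows. This is a minimal sketch, not the pipeline's actual code: the function names are hypothetical, the URL patterns follow PUG REST's xref input and xrefs operation as best understood here, and the JSON shapes handled by the parsers are assumptions to verify against live responses.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(drugbank_id):
    """URL asking which CIDs carry this DrugBank ID as a registry ID."""
    return f"{PUG_REST}/compound/xref/RegistryID/{quote(drugbank_id)}/cids/JSON"


def registry_ids_url(cid):
    """URL asking for all registry IDs attached to one CID."""
    return f"{PUG_REST}/compound/cid/{int(cid)}/xrefs/RegistryID/JSON"


def parse_cids(payload):
    """Pull the CID list out of a response (assumed shape: IdentifierList.CID)."""
    return payload.get("IdentifierList", {}).get("CID", [])


def parse_chembl_ids(payload):
    """Keep only registry IDs that look like ChEMBL IDs (assumed prefix: CHEMBL)."""
    entries = payload.get("InformationList", {}).get("Information", [])
    return [rid for entry in entries
            for rid in entry.get("RegistryID", [])
            if rid.startswith("CHEMBL")]


def fetch_json(url):
    """Perform the HTTP GET; kept separate so the parsers can be tested offline."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def drugbank_to_chembl(drugbank_id):
    """Map one DrugBank ID to {CID: [ChEMBL IDs]} using two PUG REST calls per CID."""
    mapping = {}
    for cid in parse_cids(fetch_json(cid_lookup_url(drugbank_id))):
        mapping[cid] = parse_chembl_ids(fetch_json(registry_ids_url(cid)))
    return mapping
```

Keeping URL construction and parsing apart from the network call makes it easier to add caching or rate limiting later, which a bulk mapping job would likely need.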
Ingesting Data: Loading
❏ Ingest into Graph DB
❏ Neptune
❏ CosmosDB
❏ Any Graph DB which supports Gremlin
❏ Graph DB vs Triple Store
❏ Most open data is in RDF triples formats (RDF/XML, Turtle,
N-Triples)
❏ Modern graph DBs are faster than triple stores
@prefix sio: <http://semanticscience.org/resource/> .
@prefix compound: <http://rdf.ncbi.nlm.nih.gov/pubchem/compound/> .
@prefix descriptor: <http://rdf.ncbi.nlm.nih.gov/pubchem/descriptor/> .
compound:CID400516 sio:has-attribute
descriptor:CID400516_Isomeric_SMILES ,
descriptor:CID400516_Isotope_Atom_Count ,
descriptor:CID400516_Molecular_Formula ,
descriptor:CID400516_Molecular_Weight ,
descriptor:CID400516_Mono_Isotopic_Weight ,
descriptor:CID400516_Non-hydrogen_Atom_Count ,
~id ~label articles:String[] source_ids:String[] name:String SMILES:String
8647 COMPOUND 13961;... CHEMBL1200689 Nitric oxide [N]=O
344 COMPOUND 268975;... CHEMBL142438 Nitrogen N#N
18030 COMPOUND 10081;... CHEMBL925 TYROSINE N[C@@H](Cc1ccc(O)cc1)C(=O)O
1534 COMPOUND 211538;... CHEMBL1616046 HYPOCHLOROUS ACID OCl
18800 COMPOUND 13464;... CHEMBL978 Methacholine CC(=O)OC(C)C[N+](C)(C)C
26747 COMPOUND 226005;.... CHEMBL863 Cysteine N[C@@H](CS)C(=O)O
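A vertex file with the column header shown above could be produced along these lines. This is a sketch under assumptions: the header mirrors the table above (the `~id`/`~label` system columns and semicolon-separated array values follow the Gremlin CSV load format used by Neptune), while the function names and the sample values passed in are illustrative only.

```python
import csv
import io

# Column header copied from the vertex table above.
HEADER = ["~id", "~label", "articles:String[]", "source_ids:String[]",
          "name:String", "SMILES:String"]


def compound_row(node_id, articles, source_ids, name, smiles):
    """Build one COMPOUND vertex row; array-valued cells are joined with ';'."""
    return [str(node_id), "COMPOUND",
            ";".join(str(a) for a in articles),
            ";".join(source_ids),
            name, smiles]


def write_vertices(rows, handle):
    """Write the header followed by the vertex rows as CSV."""
    writer = csv.writer(handle)
    writer.writerow(HEADER)
    writer.writerows(rows)


# Illustrative usage with an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_vertices([compound_row(344, [268975], ["CHEMBL142438"], "Nitrogen", "N#N")], buffer)
```

Using the `csv` module rather than string concatenation means names or SMILES strings containing commas or quotes are escaped correctly.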
Ingesting Data: Post-processing
1. Save predictions
2. Experts review
3. Ingest new edges
Pipeline Deep Dive
✓ Datasets
✓ Ingesting Data
● Graph Building
○ Link Prediction
Graph Building
❏ Explicit Relationships
❏ Literature-based Relationships
❏ Link Prediction Relationships
Graph Building: Explicit Relationships
❏ Explicit Relationships
❏ Triples data
❏ Inherently represents relationships
❏ Tabular data (flattened graph)
❏ 2 (or more) entities or IDs in each row
❏ Need to determine which fields are associated with which entity or edge
❏ RDBMS data
❏ Foreign keys
❏ Join tables
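For the tabular case, turning rows into edges might look like the sketch below. The field names, the edge label, and the sample rows are hypothetical; in practice the step of deciding which fields belong to which entity or edge is done by someone who knows the dataset.

```python
def rows_to_edges(rows, source_field, target_field, label):
    """Turn flat rows holding two entity IDs into edge records.

    Fields other than the two ID columns are attached as edge properties.
    Rows missing either ID are skipped.
    """
    edges = []
    for row in rows:
        source = row.get(source_field)
        target = row.get(target_field)
        if not source or not target:
            continue
        properties = {key: value for key, value in row.items()
                      if key not in (source_field, target_field)}
        edges.append({"from": source, "to": target,
                      "label": label, "properties": properties})
    return edges


# Hypothetical flattened table: one compound ID and one gene ID per row.
table = [
    {"compound_id": "C1", "gene_id": "G7", "source": "curated"},
    {"compound_id": "C2", "gene_id": None, "source": "curated"},
]
edges = rows_to_edges(table, "compound_id", "gene_id", "TARGETS")
```

The same function covers RDBMS join tables, since a join table row is just a pair of foreign keys plus optional attributes.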
Graph Building: from Literature
❏ Heuristic vs Model
❏ Relationship extraction data sets are rare, compared to NER models
❏ Creating labels requires experts
❏ Heuristics with labels
❏ Stated relationships may span across multiple sentences
❏ Certain styles of language are excessively verbose
❏ Especially academic language
Graph Building: from Literature
1. Given two terms, u and v
2. Calculate TF.IDF for extracted entities
3. Sum TF.IDF for u and v over all documents
• TF.IDF(u), TF.IDF(v)
4. Identify documents where u and v share a
context
• Sentence, window, paragraph, whole document
5. Sum TF.IDF for u and v over all documents
where u and v share a context
• TF.IDF(u,v)
6. The weight for the potential u~v edges is
the ratio of these two sums
7. Accept edges over chosen threshold
• Top 10%
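The steps above can be sketched as follows. This is one possible reading, not necessarily the exact weighting used in the pipeline: each context (sentence, window, paragraph or document) is treated as the unit for both TF.IDF and co-occurrence, a smoothed IDF is assumed so that weights stay positive, and "the ratio of these two sums" is taken as the shared-context sum divided by the overall sum. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_by_context(contexts):
    """Return one {entity: TF.IDF} dict per context (list of entity mentions)."""
    n = len(contexts)
    doc_freq = Counter(entity for ctx in contexts for entity in set(ctx))
    scores = []
    for ctx in contexts:
        counts = Counter(ctx)
        total = sum(counts.values())
        scores.append({
            entity: (count / total) * (math.log((1 + n) / (1 + doc_freq[entity])) + 1.0)
            for entity, count in counts.items()
        })
    return scores


def edge_weight(scores, u, v):
    """Ratio of TF.IDF mass in shared contexts to TF.IDF mass everywhere."""
    overall = sum(s.get(u, 0.0) + s.get(v, 0.0) for s in scores)
    shared = sum(s[u] + s[v] for s in scores if u in s and v in s)
    return shared / overall if overall else 0.0


def candidate_edges(contexts, keep_fraction=0.10):
    """Score every co-occurring pair and keep the top fraction by weight."""
    scores = tfidf_by_context(contexts)
    entities = sorted({entity for ctx in contexts for entity in ctx})
    weighted = [(edge_weight(scores, u, v), u, v)
                for u, v in combinations(entities, 2)]
    weighted = [item for item in weighted if item[0] > 0]
    weighted.sort(reverse=True)
    keep = math.ceil(len(weighted) * keep_fraction) if weighted else 0
    return weighted[:keep]
```

A weight near 1 means the two terms almost never appear apart; a weight of 0 means they never share a context, so no edge is proposed.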
Pipeline Deep Dive
✓ Datasets
✓ Ingesting Data
✓ Graph Building
● Link Prediction
Link Prediction
❏ Untyped models
❏ Jaccard
❏ DeepWalk
❏ Typed models
❏ TransE-L2
❏ DGL
❏ “Deep Graph Library (DGL) is a Python package
built for easy implementation of graph neural
network model family, on top of existing DL
frameworks (currently supporting PyTorch, MXNet
and TensorFlow).”
❏ https://docs.dgl.ai/
Link Prediction: Jaccard
❏ Intuition
❏ Unconnected nodes which are connected to many of the same nodes may be connected
❏ Pros
❏ No training necessary
❏ Cons
❏ Intuition is unrealistic
❏ Jaccard similarity
❏ For nodes u and v
❏ N(u): set of nodes connected to u
❏ N(v): set of nodes connected to v
❏ Jaccard similarity is |N(u) ∩ N(v)| / |N(u) ∪ N(v)|
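The Jaccard score defined on this slide can be computed directly from neighbor sets. A minimal sketch, assuming an undirected, untyped edge list; the node names in the usage lines are made up for illustration.

```python
def neighbors(edges):
    """Build N(node) for every node from an undirected edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def jaccard(adjacency, u, v):
    """|N(u) intersect N(v)| / |N(u) union N(v)|, or 0.0 if both sets are empty."""
    n_u = adjacency.get(u, set())
    n_v = adjacency.get(v, set())
    union = n_u | n_v
    return len(n_u & n_v) / len(union) if union else 0.0


# Illustrative graph: two drugs sharing two of three gene neighbors.
edges = [("drugA", "gene1"), ("drugA", "gene2"),
         ("drugB", "gene1"), ("drugB", "gene2"), ("drugB", "gene3")]
adjacency = neighbors(edges)
score = jaccard(adjacency, "drugA", "drugB")
```

Because the score ignores edge types, two compounds sharing side effects and two compounds sharing targets look the same, which is the limitation the typed models address.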
DeepWalk
❏ Intuition
❏ A node can be characterized by the paths it occurs in
❏ Creates embeddings (vector representations)
❏ Pros
❏ Easy to train as it relies on models used in NLP
❏ Cons
❏ Does not take into account the edge type
❏ DeepWalk
❏ For each node u, generate K random paths of length L with u in the middle of the path
❏ Using these paths, build a model to predict u given the nodes before and after it
❏ Model
❏ Build a model to predict if two nodes (represented by their embeddings) are connected
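The path-generation step described on this slide might be sketched as below. It covers only the random paths, not the embedding model; the paths would then be fed to a word2vec-style model, which is not shown. The sketch assumes an undirected graph stored as a dict of neighbor sets, an odd path length L, and hypothetical function names.

```python
import random


def random_walk(adjacency, start, steps, rng):
    """Walk up to `steps` hops from `start`, picking a uniform random neighbor each hop."""
    path = [start]
    for _ in range(steps):
        options = adjacency.get(path[-1])
        if not options:
            break  # dead end: return the shorter path
        path.append(rng.choice(sorted(options)))
    return path


def centered_paths(adjacency, u, k, length, seed=0):
    """Generate k random paths of (odd) length `length` with u in the middle.

    Two independent half-walks start at u; one is reversed and placed before u.
    This is valid for undirected graphs, where a walk can be read in either direction.
    """
    rng = random.Random(seed)
    half = (length - 1) // 2
    paths = []
    for _ in range(k):
        left = random_walk(adjacency, u, half, rng)[1:]
        right = random_walk(adjacency, u, half, rng)[1:]
        paths.append(left[::-1] + [u] + right)
    return paths
```

Treating each path as a sentence and each node as a word is what lets off-the-shelf NLP embedding models be reused here.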
TransE L2
❏ Intuition
❏ Learn embeddings that directly predict links
❏ Pros
❏ Directly predicts links
❏ After embeddings are built, no additional model is needed
❏ Learns representation for relationships
❏ Cons
❏ More sophisticated model (more parameters) takes longer to train
❏ TransE L2
❏ u, v are node representations (vectors)
❏ r is an edge type representation
❏ Train a model that assumes ||u + r - v||_2 = 0 if u and v are connected by an edge of type r
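Once embeddings exist, the TransE L2 assumption translates into a simple distance score: the smaller ||u + r - v||_2, the more plausible the edge. The sketch below shows only scoring and ranking with toy vectors; training is not shown, and the function names and example values are hypothetical.

```python
import math


def transe_l2_score(u, r, v):
    """L2 norm of u + r - v; values near 0 suggest an edge of type r from u to v."""
    return math.sqrt(sum((ui + ri - vi) ** 2 for ui, ri, vi in zip(u, r, v)))


def rank_tails(u, r, candidates):
    """Order candidate tail nodes from most to least plausible for (u, r, ?)."""
    return sorted(candidates,
                  key=lambda name: transe_l2_score(u, r, candidates[name]))


# Toy 2-d embeddings, chosen by hand for illustration only.
drug = [1.0, 0.0]
treats = [0.0, 1.0]
diseases = {"near": [1.0, 1.1], "far": [5.0, 5.0]}
ranking = rank_tails(drug, treats, diseases)
```

For repurposing, u would be a compound, r a treatment-type relation, and the candidates the disease nodes (or the reverse, fixing the disease and ranking compounds).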
Research Case Study: Early Results
We worked with St. John’s
Institute (Part of Providence
Healthcare) to repurpose
drugs to inhibit a kinase
target related to Alzheimer's
disease and have submitted
the first round of drug
candidates for expert review
In Summary
• Drug discovery scientists are drowning
in disjoint datasets, and bringing new
drugs to market is expensive and slow
• Drug repurposing is one way to bring
new cures using old drugs
• NLP, knowledge graphs and deep
graph learning are key to leveraging
the combined knowledge of
experimental and literature-based
evidence for accelerating drug
repurposing and research
Drug Repurposing using Deep Learning on Knowledge Graphs

  • 1. Drug Repurposing using Deep Learning on Knowledge Graphs Or how to leverage AI to recycle (old) new drugs
  • 2. About Us Alex Thomas is a principal data scientist at Wisecube. He's used natural language processing and machine learning with clinical data, identity data, employer and jobseeker data, and now biochemical data. Alex is also the author of Natural Language Processing with Spark NLP. Vishnu is the CTO and Founder of Wisecube AI and has over two decades of experience building data science teams and platforms. Vishnu has extensive experience with various graph databases including Neo4J, TitanDB (now JanusGraph) and more recently OrientDB and AWS Neptune.
  • 3. Drug Discovery is Broken - Every year, around US$200 billion is spent globally on biomedical research - 75% of potential drug target research could not be reproduced - New drugs approved / Billion$ spent on R&D has halved every 9 years since 1950 - This trend is now called Eroom’s Law (the opposite of Moore’s Law)
  • 4. Drug Repurposing: looking for (old) new cures Given the high attrition rates, substantial costs and slow pace of new drug discovery and development, repurposing of 'old' drugs is a viable alternative. Repurposing drugs to treat both common and rare diseases is increasingly becoming an attractive proposition because it involves the use of de-risked compounds. Various data-driven and experimental approaches have been suggested for the identification of repurposable drug candidates.
  • 5. AI (NLP + Knowledge Graphs + Deep Graph Learning) to the rescue Wisecube works with Research and Pharmaceutical organizations to help leverage the power of AI to accelerate drug discovery and repurposing. We are currently working with St.John’s Institute to repurpose drug candidates.
  • 6. Wisecube Drug Repurposing Pipeline Overview
  • 7. Pipeline Deep Dive ● Datasets ○ Ingesting Data ○ Graph Building ○ Link Prediction
  • 8. Datasets ❏ Drug Repurposing Knowledge Graph (DRKG) ❏ “Drug Repurposing Knowledge Graph (DRKG) is a comprehensive biological knowledge graph relating genes, compounds, diseases, biological processes, side effects and symptoms.” ❏ https://github.com/gnn4dr/DRKG ❏ ChEMBL ❏ “ChEMBL is a manually curated database of bioactive molecules with drug-like properties.” ❏ https://www.ebi.ac.uk/chembl/ ❏ PubChem ❏ “PubChem is an open chemistry database at the National Institutes of Health (NIH).” ❏ https://pubchemdocs.ncbi.nlm.nih.gov/about
  • 9. Datasets: DRKG ❏ DrugBank ❏ “DrugBank is a pharmaceutical knowledge base that is enabling major advances across the data-driven medicine industry.” ❏ Link: https://go.drugbank.com/ ❏ GNBR ❏ “A global network of biomedical relationships derived from text” ❏ https://zenodo.org/record/1134693#.WqQe1GbVSL9 ❏ Hetionet ❏ “Hetionet is an integrative network of biomedical knowledge assembled from 29 different databases of genes, compounds, diseases, and more.” ❏ https://het.io/ ❏ StringDB ❏ “STRING is a database of known and predicted protein-protein interactions.” ❏ https://string-db.org/cgi/about ❏ IntAct ❏ “IntAct provides a freely available, open source database system and analysis tools for molecular interaction data.” ❏ https://www.ebi.ac.uk/intact/ ❏ DGIdb ❏ “[I]nformation on drug-gene interactions and the druggable genome, mined from over thirty trusted sources.” ❏ https://www.dgidb.org/
  • 10. Pipeline Deep Dive ✓ Datasets ● Ingesting Data ○ Graph Building ○ Link Prediction
  • 11. Ingesting Data ❏ Unifying the data ❏ Loading the data ❏ Post-processing the data
  • 12. Ingesting Data: Unification ❏ DrugBankID -> NCBI CID -> ChEMBLID ❏ PUG REST API ❏ https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest ❏ PUG VIEW REST API ❏ https://pubchemdocs.ncbi.nlm.nih.gov/pug-view NCBI CID <- DrugBankID NCBI CID -> ChEMBLID
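The ID unification on this slide can be sketched as two PUG REST lookups. This is only an illustration, not the presenters' code: the helper names are hypothetical, and it assumes that DrugBank records are reachable through the `substance/sourceid/DrugBank/...` input form and that ChEMBL IDs show up among a compound's `RegistryID` cross-references. The sketch only builds the request URLs and leaves fetching and JSON parsing out.

```python
from urllib.parse import quote

# Base URL of the PubChem PUG REST service.
BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def drugbank_to_cid_url(drugbank_id):
    """URL asking PubChem for the CIDs linked to a DrugBank source ID.

    Assumes 'DrugBank' is the depositor name PubChem uses for these records.
    """
    return f"{BASE}/substance/sourceid/DrugBank/{quote(drugbank_id)}/cids/JSON"

def cid_to_registry_ids_url(cid):
    """URL asking PubChem for the registry IDs cross-referenced from a CID.

    Assumption: ChEMBL IDs (strings starting with 'CHEMBL') are expected to
    appear in this list and would be filtered out of the JSON response.
    """
    return f"{BASE}/compound/cid/{int(cid)}/xrefs/RegistryID/JSON"

# Hypothetical example IDs, used only to show the URL shapes.
step1 = drugbank_to_cid_url("DB00945")
step2 = cid_to_registry_ids_url(2244)
```

Keeping URL construction separate from the HTTP call makes it easier to batch and rate-limit requests when mapping a whole graph's worth of compounds.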
  • 13. Ingesting Data: Loading ❏ Ingest into Graph DB ❏ Neptune ❏ CosmosDB ❏ Any Graph DB which supports Gremlin ❏ Graph DB vs Triple Store ❏ Most open data is in RDF triple formats (RDF/XML, Turtle, N-Triples) ❏ Modern Graph DBs are faster than Triple Stores
    @prefix sio: <http://semanticscience.org/resource/> .
    @prefix compound: <http://rdf.ncbi.nlm.nih.gov/pubchem/compound/> .
    @prefix descriptor: <http://rdf.ncbi.nlm.nih.gov/pubchem/descriptor/> .
    compound:CID400516 sio:has-attribute descriptor:CID400516_Isomeric_SMILES ,
        descriptor:CID400516_Isotope_Atom_Count ,
        descriptor:CID400516_Molecular_Formula ,
        descriptor:CID400516_Molecular_Weight ,
        descriptor:CID400516_Mono_Isotopic_Weight ,
        descriptor:CID400516_Non-hydrogen_Atom_Count ,

    ~id | ~label | articles:String[] | source_ids:String[] | name:String | SMILES:String
    8647 | COMPOUND | 13961;... | CHEMBL1200689 | Nitric oxide | [N]=O
    344 | COMPOUND | 268975;... | CHEMBL142438 | Nitrogen | N#N
    18030 | COMPOUND | 10081;... | CHEMBL925 | TYROSINE | N[C@@H](Cc1ccc(O)cc1)C(=O)O
    1534 | COMPOUND | 211538;... | CHEMBL1616046 | HYPOCHLOROUS ACID | OCl
    18800 | COMPOUND | 13464;... | CHEMBL978 | Methacholine | CC(=O)OC(C)C[N+](C)(C)C
    26747 | COMPOUND | 226005;.... | CHEMBL863 | Cysteine | N[C@@H](CS)C(=O)O
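A minimal sketch of producing vertex rows in the column layout shown on this slide (`~id`, `~label`, typed property columns with `;`-separated array values), which is the CSV shape used for Gremlin bulk loading in Neptune. The function name and the input dictionary keys are hypothetical, and the `articles:String[]` column is left out because its values are elided on the slide.

```python
import csv
import io

# Column headers follow the slide; the articles column is omitted here.
HEADER = ["~id", "~label", "source_ids:String[]", "name:String", "SMILES:String"]

def compounds_to_vertex_csv(compounds):
    """Serialize compound records into a vertex CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    for compound in compounds:
        writer.writerow([
            compound["id"],
            "COMPOUND",
            ";".join(compound["source_ids"]),  # array values are ';'-separated
            compound["name"],
            compound["smiles"],
        ])
    return buffer.getvalue()

# One record taken from the slide's example table.
vertex_csv = compounds_to_vertex_csv([
    {"id": "18030", "source_ids": ["CHEMBL925"], "name": "TYROSINE",
     "smiles": "N[C@@H](Cc1ccc(O)cc1)C(=O)O"},
])
```

Using the `csv` module rather than string joins means names or SMILES strings containing commas or quotes are escaped correctly.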
  • 14. Ingesting Data: Post-processing 1. Save predictions 2. Experts review 3. Ingest new edges
  • 15. Pipeline Deep Dive ✓ Datasets ✓ Ingesting Data ● Graph Building ○ Link Prediction
  • 16. Graph Building ❏ Explicit Relationships ❏ Literature-based Relationships ❏ Link Prediction Relationships
  • 17. Graph Building: Explicit Relationships ❏ Explicit Relationships ❏ Triples data ❏ Inherently represents relationships ❏ Tabular data (flattened graph) ❏ 2 (or more) entities or IDs in each row ❏ Need to determine which fields are associated with which entity or edge ❏ RDBMS data ❏ Foreign keys ❏ Join tables
  • 18. Graph Building: from Literature ❏ Heuristic vs Model ❏ Relationship extraction data sets are rare, compared to NER models ❏ Creating labels requires experts ❏ Heuristics with labels ❏ Stated relationships may span across multiple sentences ❏ Certain styles of language are excessively verbose ❏ Especially academic language
  • 19. Graph Building: from Literature 1. Given two terms, u and v 2. Calculate TF.IDF for extracted entities 3. Sum TF.IDF for u and v over all documents • TF.IDF(u), TF.IDF(v) 4. Identify documents where u and v share a context • Sentence, window, paragraph, whole document 5. Sum TF.IDF for u and v over all documents where u and v share a context • TF.IDF(u,v) 6. The weight for the potential u~v edges is the ratio of these two sums 7. Accept edges over chosen threshold • Top 10%
  • 20. Graph Building: from Literature 1. Given two terms, u and v 2. Calculate TF.IDF for extracted entities 3. Sum TF.IDF for u and v over all documents • TF.IDF(u), TF.IDF(v) 4. Identify documents where u and v share a context • Sentence, window, paragraph, whole document 5. Sum TF.IDF for u and v over all documents where u and v share a context • TF.IDF(u,v) 6. The weight for the potential u~v edges is the ratio of these two sums 7. Accept edges over chosen threshold • Top 10%
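The weighting steps above can be sketched as follows. Several details are assumptions because the slides do not pin them down: the shared context is the whole document, the TF.IDF values of u and v are combined by adding them, TF is the entity count divided by the number of extracted entities in the document, and IDF is log(N / document frequency). All function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_table(docs):
    """Per-document TF.IDF scores; each doc is a list of extracted entities."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    table = []
    for doc in docs:
        counts = Counter(doc)
        table.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return table

def edge_weight(table, u, v):
    """Ratio of TF.IDF mass in documents containing both u and v
    to the TF.IDF mass of u and v over all documents."""
    total = shared = 0.0
    for scores in table:
        mass = scores.get(u, 0.0) + scores.get(v, 0.0)
        total += mass
        if u in scores and v in scores:  # u and v share this context
            shared += mass
    return shared / total if total else 0.0

# Toy corpus of extracted entities (illustrative only).
docs = [
    ["aspirin", "cox1", "aspirin"],
    ["aspirin", "headache"],
    ["cox1", "inflammation"],
    ["headache", "inflammation"],
]
table = tfidf_table(docs)
weight_linked = edge_weight(table, "aspirin", "cox1")
weight_unlinked = edge_weight(table, "aspirin", "inflammation")
```

The final step on the slide would then compute this weight for every candidate pair and keep the pairs in the top 10%.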
  • 21. Pipeline Deep Dive ✓ Datasets ✓ Ingesting Data ✓ Graph Building ● Link Prediction
  • 22. Link Prediction ❏ Untyped models ❏ Jaccard ❏ DeepWalk ❏ Typed Models ❏ TransE-L2 ❏ DGL ❏ “Deep Graph Library (DGL) is a Python package built for easy implementation of graph neural network model family, on top of existing DL frameworks (currently supporting PyTorch, MXNet and TensorFlow).” ❏ https://docs.dgl.ai/
  • 23. Link Prediction: Jaccard ❏ Intuition ❏ Unconnected nodes which are connected to many of the same nodes may be connected ❏ Pros ❏ No training necessary ❏ Cons ❏ Intuition is unrealistic ❏ Jaccard similarity ❏ For nodes u and v ❏ N(u): set of nodes connected to u ❏ N(v): set of nodes connected to v ❏ Jaccard similarity is |N(u) ∩ N(v)| / |N(u) ∪ N(v)|
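A small sketch of the Jaccard score defined on this slide, using plain Python sets. The node names and helper names are made up for illustration.

```python
def build_neighbors(edges):
    """Map each node to the set of nodes it is connected to (undirected)."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors

def jaccard(neighbors, u, v):
    """|N(u) ∩ N(v)| / |N(u) ∪ N(v)|, or 0.0 if both nodes are isolated."""
    n_u = neighbors.get(u, set())
    n_v = neighbors.get(v, set())
    union = n_u | n_v
    return len(n_u & n_v) / len(union) if union else 0.0

# Toy graph: two drugs sharing two of three gene neighbours.
edges = [
    ("drugA", "gene1"), ("drugA", "gene2"),
    ("drugB", "gene1"), ("drugB", "gene2"), ("drugB", "gene3"),
]
neighbors = build_neighbors(edges)
score = jaccard(neighbors, "drugA", "drugB")  # 2 shared / 3 total
```

Because the score ignores edge types, a shared gene neighbour counts the same whether the relation is "inhibits" or "activates", which is one way the intuition can be unrealistic.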
  • 24. Link Prediction: DeepWalk ❏ Intuition ❏ A node can be characterized by the paths it occurs in ❏ Creates embeddings (vector representations) ❏ Pros ❏ Easy to train as it relies on models used in NLP ❏ Cons ❏ Does not take into account the edge type ❏ DeepWalk ❏ For each node u, generate K random paths of length L with u in the middle of the path ❏ Using these paths, build a model to predict u given the nodes before and after it ❏ Model ❏ Build a model to predict if two nodes (represented by their embeddings) are connected
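The path-generation step described on this slide might look like the sketch below. It only covers sampling K paths with u in the middle; training the word2vec-style model on those paths is not shown. It assumes an undirected graph, so a path centred on u can be built from two independent random walks starting at u, and it assumes L is odd. The function names are hypothetical.

```python
import random

def random_walk(neighbors, start, steps, rng):
    """Walk up to `steps` edges from `start`, choosing neighbours uniformly."""
    path = [start]
    for _ in range(steps):
        options = sorted(neighbors.get(path[-1], ()))  # sorted for reproducibility
        if not options:
            break  # dead end: stop early
        path.append(rng.choice(options))
    return path

def centered_paths(neighbors, u, k, length, rng):
    """Generate k paths of `length` nodes with u at the middle position."""
    half = length // 2
    paths = []
    for _ in range(k):
        before = random_walk(neighbors, u, half, rng)[::-1]  # ends at u
        after = random_walk(neighbors, u, half, rng)         # starts at u
        paths.append(before + after[1:])                     # do not repeat u
    return paths

# Toy undirected graph in which every node has at least one neighbour.
neighbors = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
paths = centered_paths(neighbors, "c", k=3, length=5, rng=random.Random(0))
```

Each path can then be treated like a sentence, with u as the word to predict from its surrounding context.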
  • 25. Link Prediction: TransE L2 ❏ Intuition ❏ Learn embeddings that directly predict edges ❏ Pros ❏ Directly predicts edges ❏ After embeddings are built, no additional model is needed ❏ Learns representation for relationships ❏ Cons ❏ More sophisticated model (more parameters) takes longer to train ❏ TransE L2 ❏ u, v are node representations (vectors) ❏ r is an edge type representation ❏ Train a model that assumes ||u + r - v||_2 = 0 if u and v are connected by an edge of type r
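To make the TransE L2 assumption concrete, here is a sketch of the scoring function only, written with plain Python lists. The embeddings are hand-picked toy values, not trained ones, and the entity and relation names are hypothetical. In practice the vectors would be learned, for example with DGL.

```python
import math

def transe_l2_score(u, r, v):
    """L2 distance ||u + r - v||_2; lower means the edge (u, r, v) is more plausible."""
    return math.sqrt(sum((ui + ri - vi) ** 2 for ui, ri, vi in zip(u, r, v)))

def rank_tails(u, r, candidates):
    """Order candidate tail nodes from most to least plausible."""
    return sorted(candidates, key=lambda name: transe_l2_score(u, r, candidates[name]))

# Toy 2-dimensional embeddings chosen by hand for illustration.
drug = [1.0, 0.0]
treats = [0.0, 1.0]
diseases = {
    "disease_x": [1.0, 1.0],  # exactly drug + treats
    "disease_y": [4.0, 5.0],  # far from drug + treats
}
score_x = transe_l2_score(drug, treats, diseases["disease_x"])
score_y = transe_l2_score(drug, treats, diseases["disease_y"])
ranking = rank_tails(drug, treats, diseases)
```

This is why no additional model is needed after training: ranking candidate edges reduces to computing distances between vectors.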
  • 26. Research Case Study: Early Results We worked with St.John’s Institute (Part of Providence Healthcare) to repurpose drugs to inhibit a kinase target related to Alzheimer's disease and have submitted the first round of drug candidates for expert review
  • 27. In Summary • Drug discovery scientists are drowning in disjointed datasets, and bringing new drugs to market is expensive and slow • Drug repurposing is one way to bring new cures using old drugs • NLP, Knowledge Graphs and Deep Graph Learning are key to leveraging the combined knowledge of experimental and literature-based evidence for accelerating drug repurposing and research
  • 28. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.