Overview of Open PHACTS, the BDE Pilot project in SC1, presented at BDE SC1 Workshop 3, 13 December, 2017.
https://www.big-data-europe.eu/the-final-big-data-europe-workshop/
BIG DATA EUROPE SC1 PILOT: OPEN PHACTS DRUG DISCOVERY PLATFORM
1. BIG DATA EUROPE SC1 PILOT
The Open PHACTS Discovery Platform
Kiera McNeice, Open PHACTS Foundation, 13 Dec 2017
2. Big Data Europe Objectives
Build foundational Big Data infrastructure that:
o Is open source
o Makes it simple to get started with Big Data
o Supports a variety of use cases
o Embraces emerging Big Data technologies
o Enables simple integration with custom components
4. Drug discovery using public data
Literature
PubChem
Genbank
Patents
Databases
Downloads
Data Integration Data Analysis
Firewalled Databases
5. The situation in 2010…
GSK
Pfizer
AstraZeneca
Roche
Novartis
Merck-Serono
Janssen
6. Challenges: Identifiers
Andy Law’s third law:
The number of unique identifiers assigned to an individual is
never less than the number of institutions involved in the study
P12047
X31045
GB:29384
http://bioinformatics.roslin.ac.uk/lawslaws/
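The identifier problem can be illustrated with a small equivalence map. This is only a sketch, not the Open PHACTS implementation: the class name `IdentifierMap` and its methods are hypothetical, and treating the three identifiers from the slide as names for the same entity is an assumption made purely for illustration.

```python
class IdentifierMap:
    """Groups identifiers that refer to the same entity (union-find)."""

    def __init__(self):
        self._parent = {}

    def _find(self, ident):
        # Register unseen identifiers as their own group, then walk to the root.
        self._parent.setdefault(ident, ident)
        while self._parent[ident] != ident:
            # Path halving keeps later lookups short.
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def link(self, a, b):
        """Record that identifiers a and b denote the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalents(self, ident):
        """Return every known identifier in the same group, sorted."""
        root = self._find(ident)
        return sorted(k for k in list(self._parent) if self._find(k) == root)


# Assumption: the three identifiers on the slide name one entity.
ids = IdentifierMap()
ids.link("P12047", "X31045")
ids.link("X31045", "GB:29384")
print(ids.equivalents("GB:29384"))
```

Links are transitive here, so a query starting from any one institution's identifier reaches all the others.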
8. Semantic linking (RDF)
Link and store data as semantic “triples”:
[Compound] acts on [Target]
Subject: Compound; Predicate: acts on; Object: Target
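As a rough sketch of the triple idea, the snippet below stores subject, predicate, object statements as plain Python tuples and answers a pattern query over them. The `ex:` identifiers, the `Triple` type and the `match` helper are all hypothetical; a real deployment would use a triple store and SPARQL rather than an in-memory list.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical statements, for illustration only.
TRIPLES = [
    Triple("ex:CompoundA", "ex:actsOn", "ex:Target1"),
    Triple("ex:CompoundB", "ex:actsOn", "ex:Target1"),
    Triple("ex:CompoundA", "ex:actsOn", "ex:Target2"),
]


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Which compounds act on Target1?"
hits = match(TRIPLES, predicate="ex:actsOn", obj="ex:Target1")
print([t.subject for t in hits])
```

Because every statement has the same three-part shape, data from different sources can be merged into one list and queried with the same pattern.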
9. Focus on researcher needs
Data sources: ChEMBL, DrugBank, Gene Ontology, Wikipathways, UniProt, ChemSpider, UMLS, ConceptWiki, ChEBI, TrialTrove, GVKBio, GeneGo, TR Integrity, DisGeNet, neXtProt, ChEMBL Target Class, ENZYME, FDA adverse events, SureChEMBL
Example researcher questions:
o “Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM”
o “What is the selectivity profile of known p38 inhibitors?”
o “Let me compare MW, logP and PSA for known oxidoreductase inhibitors”
10. Ranked research questions
Number  Sum  Nr of 1  Question
15  12  9  All oxidoreductase inhibitors active <100nM in both human and mouse
18  14  8  Given compound X, what is its predicted secondary pharmacology? What are the on- and off-target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound?
24  13  8  Given a target find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives.
32  13  8  For a given interaction profile, give me compounds similar to it.
37  13  8  The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X.
38  13  8  Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not).
41  13  8  A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? i.e. return all compounds active in assays where the resolution is at least at the level of the target family (i.e. PKC) both from structured assay databases and the
15. Example workflow
Q10: For a given compound, summarise all
similar compounds and their activities
CC1=C(C(C(=C(N1)C)C(=O)OC)C2=CC=CC=C2[N+](=O)[O-])C(=O)OC
18. Benefits of Open PHACTS
Efficiency: Queries that once took days can now be done in less than an hour
Novelty: Semantically integrated databases allow for completely new ways of analysing the data
Cost: Sharing cost and effort in a precompetitive project saved “millions”
“Integration of different databases is difficult, costly, and time consuming, and probably would not have been done at this level of quality without Open
20. Open PHACTS architecture
[Architecture diagram. Components shown:]
o Apps
o Linked Data API (RDF/XML, TTL, JSON)
o Semantic Workflow Engine
o Core Platform: Data Cache (Virtuoso Triple Store), Domain Specific Services, Identity Resolution Service, Identifier Management Service, Chemistry Registration Normalisation & Q/C, Indexing
o Data inputs (labelled Db, Nanopub and VoID in the diagram): Public Content, Commercial, Public Ontologies, User Annotations
o Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”
21. Open PHACTS in BDE
Able to exchange Virtuoso triple store for 4Store
18 of 21 original research questions answered
o (Remaining 3 required patent data, which is not open)
IMS implemented as independent Docker module
Local installation runs much faster than original platform!
22. Local hardware requirements
Hardware:
150GB of disk space (ideal: 250GB)
16GB of RAM (ideal: 128GB)
4 CPU cores (ideal: 8 cores)
Prerequisites:
Recent x64 Linux (Ubuntu 14.04 LTS, CentOS 7)
Docker and Docker Compose
Fast Internet connection
https://github.com/openphacts/ops-docker
https://data.openphacts.org/
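Before attempting a local installation, the figures above can be checked with a short script. The following is a minimal sketch, not part of ops-docker: the names `classify` and `probe` are hypothetical, the thresholds are copied from the slide, sizes are counted in decimal gigabytes (an assumption), and the RAM probe relies on `os.sysconf` keys that are available on Linux.

```python
import os
import shutil

# Thresholds taken from the slide (GB and core counts).
MINIMUM = {"disk_gb": 150, "ram_gb": 16, "cpu_cores": 4}
IDEAL = {"disk_gb": 250, "ram_gb": 128, "cpu_cores": 8}


def classify(disk_gb, ram_gb, cpu_cores):
    """Return 'ideal', 'minimum' or 'insufficient' for the given resources."""
    have = {"disk_gb": disk_gb, "ram_gb": ram_gb, "cpu_cores": cpu_cores}
    if all(have[k] >= IDEAL[k] for k in IDEAL):
        return "ideal"
    if all(have[k] >= MINIMUM[k] for k in MINIMUM):
        return "minimum"
    return "insufficient"


def probe(path="/"):
    """Measure free disk space, physical RAM and CPU cores on this host."""
    disk_gb = shutil.disk_usage(path).free / 1e9
    # Physical memory = page size * number of physical pages (Linux sysconf).
    ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    cpu_cores = os.cpu_count() or 1
    return disk_gb, ram_gb, cpu_cores
```

On the target machine, `print(classify(*probe("/")))` reports which tier it falls into; pass the mount point that will hold the Docker volumes if it is not the root filesystem.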
23. Advantages of rebuilding with BDI
Integration into a wider platform
Flexibility, scalability, extensibility
Local installation of the entire Open PHACTS infrastructure!