1. Open Science for Rare and Neglected Diseases
Sean Ekins
Collaborations Pharmaceuticals, Inc. Fuquay Varina, NC.
Collaborations in Chemistry, Inc. Fuquay Varina, NC.
Collaborative Drug Discovery, Inc., Burlingame, CA.
Phoenix Nest, Brooklyn, NY
Hereditary Neuropathy Foundation, New York, NY
Wikipedia
3. Objectives – The Big Picture
Demonstrate – open data - models – new leads – accessible to anyone
Describe - experiences working on rare and neglected diseases
Suggest - What could be done to increase success in bringing to clinic
Inspire – others to help
4. Neglected and Rare Disease Drug Discovery
Share urgent need for new therapeutics
http://www.mm4tb.org/ http://www.phoenixnestbiotech.com/
5. Laboratories past and present
Lavoisier’s lab 18th C Edison’s lab 20th C
Author’s lab 21st C
+ Network of global collaborators
7. Crowdfunding Rare Disease Science
Sanfilippo Syndrome - funds raised in ~ 2 years
Funding gene therapy research / development for Sanfilippo Type A
Funding gene therapy research / development / chaperone research for Sanfilippo Type C
8. Start a company
Enables you to apply for SBIR / STTR grants
Fund research – help academics and commercialize their work
Then use NIH TRND and NCATS and other programs
http://goo.gl/k0o9Q0
Developing an enzyme replacement for MPS IIID with Dr. Patricia Dickson at LA BioMed
9. Idea + Data + Skills + Time = Discovery
Drug Discovery on a Shoestring
• What disease / target do I want to work on?
• Will it make a difference?
• What data is there I can use?
• What is the data quality?
• Is it public or do I need to reach out to a lab?
• What technology can I access?
• Am I capable of following through?
• Who can I get to help me?
• Where do I find the right person/s?
• How do I fit it into my day job?
• Is this an evening / weekend project?
• What will have to give?
10. Having the big idea
• Driven by personal circumstance
• You or family member/ friend has a disease
• You meet someone and they inspire you
• Driven by surroundings
• You live in an area where disease is endemic
• You read/ hear something e.g. Twitter
• You do it for the recognition / kudos
• You want to give back
• None of the above
• There are >7000 rare diseases – pick one!
11. "Rub al Khali 002" by Nepenthes
The chemistry/ biology data desert outside of pharma circa early 2000’s
Limited ADME/Tox data
Paucity of Structure Activity Data
Small datasets for modeling
Drug companies – gate keepers of information for drug discovery
12. "Oasis in Libya" by Sfivat
The growing chemistry/biology data Oasis outside of pharma circa 2015
13. Examples of using open data
• To discover new leads
• Tuberculosis – from public data to open models to create IP
• Chagas Disease - from public data to create new IP
• Ebola virus – from little data to create open data and IP
• Making lots of machine learning models open
• Demonstrate collaborations
14. Tuberculosis kills 1.6-1.7m/yr (~1 every 8 seconds)
1/3rd of world's population infected!
streptomycin (1943)
para-aminosalicylic acid (1949)
isoniazid (1952)
pyrazinamide (1954)
cycloserine (1955)
ethambutol (1962)
rifampicin (1967)
Multi-drug resistance in 4.3% of cases
Extensively drug-resistant TB: increasing incidence
One new drug (bedaquiline) in 40 yrs
Tuberculosis
15. Tested >350,000 molecules Tested ~2M 2M >300,000
>1500 active and non toxic Published 177 100s 800
Bigger Open Data: Screening for New Tuberculosis Treatments
How many will become a new drug?
TBDA screened over 1 million, 1 million more to go
TB Alliance + Japanese pharma screens
R43 LM011152-01
16. Over 8 years analyzed in vitro data and built models
Workflow:
• High-throughput phenotypic Mtb screening feeds the Mtb screening molecule database/s
• Descriptors + Bioactivity (+Cytotoxicity) are used to build a Bayesian Machine Learning classification Mtb Model
• Molecule Database (e.g. GSK malaria actives) is virtually scored using Bayesian Models
• Top scoring molecules are assayed for Mtb growth inhibition
• Identify in vitro hits and test models
• New bioactivity data may enhance models
Results:
• 3 x published prospective tests: ~750 molecules were tested in vitro, 198 actives were identified, >20 % hit rate
• Multiple retrospective tests: 3-10 fold enrichment
Ekins et al., Pharm Res 31: 414-435, 2014
Ekins, et al., Tuberculosis 94; 162-169, 2014
Ekins, et al., PLOSONE 8; e63240, 2013
Ekins, et al., Chem Biol 20: 370-378, 2013
Ekins, et al., JCIM, 53: 3054−3063, 2013
Ekins and Freundlich, Pharm Res, 28, 1859-1869, 2011
Ekins et al., Mol BioSyst, 6: 840-851, 2010
Ekins, et al., Mol. Biosyst. 6, 2316-2324, 2010,
R43 LM011152-01
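The hit rate quoted on this slide is simple arithmetic, and a fold-enrichment figure is just a ratio of two hit rates. A minimal sketch in Python: the function names are illustrative, and the baseline hit rate used for the enrichment example is an assumed value, not a number from the cited papers.

```python
def hit_rate(actives: int, tested: int) -> float:
    """Fraction of tested molecules that were confirmed active."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return actives / tested


def fold_enrichment(model_hit_rate: float, baseline_hit_rate: float) -> float:
    """How many times better a model-selected set does than a baseline set."""
    if baseline_hit_rate <= 0:
        raise ValueError("baseline_hit_rate must be positive")
    return model_hit_rate / baseline_hit_rate


# Numbers from the slide: ~750 molecules tested, 198 actives.
prospective = hit_rate(198, 750)
print(f"prospective hit rate: {prospective:.1%}")  # 26.4%

# ASSUMPTION: a 5% baseline hit rate, chosen only to illustrate the ratio.
assumed_baseline = 0.05
print(f"fold enrichment vs assumed baseline: "
      f"{fold_enrichment(prospective, assumed_baseline):.2f}")
```

With the slide's numbers the prospective hit rate works out to 26.4%, consistent with the ">20 %" statement.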
17. 5 active compounds vs Mtb in a few months
7 tested, 5 active (~71% hit rate)
Ekins et al., Chem Biol 20, 370–378, 2013
1. Virtually screen 13,533-member GSK antimalarial hit library
2. Bayesian Model = SRI TAACF-CB2 dose response + cytotoxicity model
3. Top 46 commercially available compounds visually inspected
4. 7 compounds chosen for Mtb testing based on
- drug-likeness
- chemotype diversity
GSK # | Bayesian Score | Mtb H37Rv MIC (µg/mL) | GSK Reported % Inhibition HepG2 @ 10 µM cmpd
TCMDC-123868 | 5.73 | >32 | 40
TCMDC-125802 | 5.63 | 0.0625 | 5
TCMDC-124192 | 5.27 | 2.0 | 4
TCMDC-124334 | 5.20 | 2.0 | 4
TCMDC-123856 | 5.09 | 1.0 | 83
TCMDC-123640 | 4.66 | >32 | 10
TCMDC-124922 | 4.55 | 1.0 | 9
[Chemical Structure column: images]
R43 LM011152-01
18. Taking a compound in vivo identifies issues
• BAS00521003 / TCMDC-125802 reported to be a P. falciparum lactate dehydrogenase inhibitor
• Only one report of antitubercular activity, from 1969
- solid agar MIC = 1 µg/mL (“wild strain”)
- “no activity” in mouse model up to 400 mg/kg
- however, activity was solely judged by extension of survival!
Bruhin, H. et al., J. Pharm. Pharmac. 1969, 21, 423-433.
MIC of 0.0625 µg/mL
• 64X MIC affords 6 logs of kill
• Resistance and/or drug instability beyond 14 d
Vero cells: CC50 = 4.0 µg/mL
Selectivity Index SI = CC50/MICMtb = 16 – 64
In mouse no toxicity but also no efficacy in GKO model – probably metabolized.
Ekins et al., Chem Biol 20, 370–378, 2013
R43 LM011152-01
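The selectivity index on this slide is the ratio of cytotoxicity to antibacterial potency. A minimal sketch, assuming CC50 and MIC are both in µg/mL (that is the unit choice under which the quoted 16 – 64 range works out); the MIC implied by the lower bound is derived here and is not a value stated on the slide.

```python
def selectivity_index(cc50: float, mic: float) -> float:
    """SI = CC50 / MIC, with both values in the same concentration units."""
    if mic <= 0:
        raise ValueError("mic must be positive")
    return cc50 / mic


CC50_VERO = 4.0   # Vero cell CC50 from the slide, assumed to be in ug/mL
MIC_BEST = 0.0625  # MIC from the slide, in ug/mL

print(selectivity_index(CC50_VERO, MIC_BEST))  # 64.0, the upper end of the range

# ASSUMPTION: the lower end (SI = 16) corresponds to a higher MIC in another
# experiment; the MIC that would give SI = 16 is CC50 / 16.
mic_for_si_16 = CC50_VERO / 16
print(mic_for_si_16)  # 0.25
```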
19. Optimizing the triazine series as part of this project: improve solubility and show in vivo efficacy
1U19AI109713-01
20. Chagas Disease
• About 7 million to 8 million people estimated to be infected worldwide
• Vector-borne transmission occurs in the Americas.
• A triatomine bug carries the parasite Trypanosoma cruzi which causes the disease.
• The disease is curable if treatment is initiated soon after infection.
• No FDA approved drug, pipeline sparse
Hotez et al., PLoS Negl Trop Dis. 2013 Oct 31;7(10):e2300
R41-AI108003-01
21. T. cruzi high-content screening assay
[Figure: assay workflow. Labels: T. cruzi (trypomastigote); C2C12 cells (myocyte); infect; 6-8 days; plate containing compounds; 3 days; fixing & staining; reading]
R41-AI108003-01
22. T. cruzi Machine Learning models
• Dataset from PubChem AID 2044 – Broad Institute data
• Dose response data (1853 actives and 2203 inactives)
• Dose response and cytotoxicity (1698 actives and 2363 inactives)
• EC50 values less than 1 µM were selected as actives.
• For cytotoxicity, a greater than 10-fold difference compared with EC50 was required
• Models generated using: molecular function class fingerprints of maximum diameter 6 (FCFP_6), AlogP, molecular weight, number of rotatable bonds, number of rings, number of aromatic rings, number of hydrogen bond acceptors, number of hydrogen bond donors, and molecular fractional polar surface area.
• 5-fold cross validation or leave out 50% x 100 fold cross validation was used to calculate the ROC for the models generated
R41-AI108003-01
Ekins et al., PLoS Negl Trop Dis. 2015 Jun 26;9(6):e0003878
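The labelling rule and the ROC statistic described on this slide can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the Discovery Studio protocol: it assumes the EC50 cutoff is in micromolar units, reads the cytotoxicity rule as CC50/EC50 > 10, and uses made-up compounds and model scores.

```python
from typing import Optional, Sequence


def label_active(ec50_um: float, cc50_um: Optional[float] = None,
                 ec50_cutoff_um: float = 1.0, min_fold: float = 10.0) -> bool:
    """Active if EC50 is below the cutoff; when cytotoxicity is supplied,
    also require CC50 to exceed EC50 by more than `min_fold`."""
    if ec50_um >= ec50_cutoff_um:
        return False
    if cc50_um is None:
        return True
    return cc50_um / ec50_um > min_fold


def roc_auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity:
    the fraction of active/inactive pairs ranked correctly, ties counting half."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical (EC50 uM, CC50 uM) pairs and hypothetical model scores.
compounds = [(0.2, 5.0), (0.5, 2.0), (3.0, 50.0), (0.05, 10.0)]
labels = [label_active(ec50, cc50) for ec50, cc50 in compounds]
scores = [0.9, 0.6, 0.1, 0.4]
print(labels)                   # [True, False, False, True]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise form is quadratic in dataset size, which is fine for a few thousand compounds; a rank-based version would be the usual choice for larger sets.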
24. T. cruzi Dose Response and cytotoxicity Machine Learning model features
Good: Tertiary amines, piperidines and aromatic fragments with basic nitrogen
Bad: Cyclic hydrazines and electron-poor chlorinated aromatics
Ekins et al., PLoS Negl Trop Dis. 2015 Jun 26;9(6):e0003878
R41-AI108003-01
25. Bayesian Machine Learning Models
- Selleck Chemicals natural product lib. (139 molecules);
- GSK kinase library (367 molecules);
- Malaria box (400 molecules);
- Microsource Spectrum (2320 molecules);
- CDD FDA drugs (2690 molecules);
- Prestwick Chemical library (1280 molecules);
- Traditional Chinese Medicine components (373 molecules)
7,569 molecules scored in total => 99 molecules selected for testing
R41-AI108003-01 Ekins et al., PLoS Negl Trop Dis. 2015 Jun 26;9(6):e0003878
26. Synonyms | Infection Ratio | EC50 (µM) | EC90 (µM) | Hill slope | Cytotoxicity CC50 (µM) | Chagas mouse model (4 days treatment, luciferase): In vivo efficacy at 50 mg/kg bid (IP) (%)
(±)-Verapamil hydrochloride, 715730, SC-0011762 | 0.02, 0.02 | 0.0383 | 0.143 | 1.67 | >10.0 | 55.1
29781612, Pyronaridine | 0.00, 0.00 | 0.225 | 0.665 | 2.03 | 3.0 | 85.2
511176, Furazolidone | 0.00, 0.00 | 0.257 | 0.563 | 2.81 | >10.0 | 100.5
501337, SC-0011777, Tetrandrine | 0.00, 0.00 | 0.508 | 1.57 | 1.95 | 1.3 | 43.6
SC-0011754, Nitrofural | 0.01, 0.01 | 0.775 | 6.98 | 1.00 | >10.0 | 78.5*
* Used hydroxymethylnitrofurazone for in vivo study (nitrofural pro-drug)
Ekins et al., PLoS Negl Trop Dis. 2015 Jun 26;9(6):e0003878
In vitro and in vivo data for compounds selected
R41-AI108003-01
27. 7,569 cpds => 99 cpds => 17 hits (5 in nM range)
In vivo efficacy of the 5 tested compounds
[Figure: timeline, days 0-7 (infection, treatment, reading); groups: vehicle, benznidazole, pyronaridine, furazolidone, verapamil, nitrofural, tetrandrine]
Ekins et al., PLoS Negl Trop Dis. 2015 Jun 26;9(6):e0003878
R41-AI108003-01
28. Pyronaridine: New anti-Chagas and known anti-Malarial
EMA approved in combination with artesunate
IC50 of 2 nM against the growth of KT1 and KT3 P. falciparum
Known P-gp inhibitor
Active against Babesia and Theileria, tick-transmitted parasites
R41-AI108003-01
Work provided starting point for a phase II and phase I grant (submitted)
29. 2014-2015 Ebola outbreak
March 2014: the World Health Organization (WHO) reported a major Ebola outbreak in Guinea, a western African nation
8 August 2014: the WHO declared the epidemic to be an international public health emergency
“I urge everyone involved in all aspects of this epidemic to openly and rapidly report their experiences and findings. Information will be one of our key weapons in defeating the Ebola epidemic.” Peter Piot
Wikipedia
30. Madrid PB, et al. (2013) A Systematic Screen of FDA-Approved Drugs for Inhibitors of Biological
Threat Agents. PLoS ONE 8(4): e60579. doi:10.1371/journal.pone.0060579
Chloroquine in mouse
31. Pharmacophore based on 4 compounds
Ekins S, Freundlich JS and Coffee M, 2014 F1000Research 2014, 3:277
Amodiaquine, chloroquine, clomiphene and toremifene are all active in vitro
May have common features and bind a common site / target / mechanism
Could they be targeting proteins like viral protein 35 (VP35)?
VP35 is a component of the viral RNA polymerase complex, a viral assembly factor, and an inhibitor of host interferon (IFN) production
VP35 contributes to viral escape from host innate immunity and is required for virulence
32. Pharmacophores for EBOV VP35 generated from crystal structures in the protein data bank PDB.
Ekins S, Freundlich JS and Coffee M, 2014 F1000Research 2014, 3:277
33. Redocking VPL57 in 4IBI
• The 4IBI ligand was removed from the structure and redocked.
• The closest pose (grey) was ranked 29 with RMSD 3.02 Å and LibDock score 86.62 when compared to the actual ligand in 4IBI (yellow)
Ekins S, Freundlich JS and Coffee M, 2014 F1000Research 2014, 3:277
34. Docking FDA approved compounds in VP35 protein showing overlap with ligand (yellow) and 2D interaction diagram
4IBI was used; 4IBI ligand VPL57 shown in yellow, docked compounds in grey.
Amodiaquine: LibDock score 90.80
Chloroquine: LibDock score 97.82
Clomiphene: LibDock score 69.77
Toremifene: LibDock score 68.11
Ekins S, Freundlich JS and Coffee M, 2014 F1000Research 2014, 3:277
35. Machine Learning for EBOV
• 868 molecules from the viral pseudotype entry assay and the EBOV replication assay
• Salts were stripped and duplicates removed using Discovery Studio 4.1 (Biovia, San Diego, CA)
• IC50 values less than 50 µM were selected as actives.
• Models generated using: molecular function class fingerprints of maximum diameter 6 (FCFP_6), AlogP, molecular weight, number of rotatable bonds, number of rings, number of aromatic rings, number of hydrogen bond acceptors, number of hydrogen bond donors, and molecular fractional polar surface area.
• Models were validated using five-fold cross validation (leave out 20% of the database).
• Bayesian, Support Vector Machine and Recursive Partitioning Forest and single tree models built.
• RP Forest and RP Single Tree models used the standard protocol in Discovery Studio.
• 5-fold cross validation or leave out 50% x 100 fold cross validation was used to calculate the ROC for the models generated
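The two validation schemes named on this slide differ only in how the data are partitioned. A rough sketch of both partitioning schemes using the standard library: the function names and the fixed random seed are illustrative choices, and this is not the Discovery Studio implementation.

```python
import random
from typing import List, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, held-out indices)


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Split]:
    """Shuffle once, deal indices into k folds, hold each fold out in turn
    (about 20% of the data per fold when k = 5)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        splits.append((train, folds[i]))
    return splits


def leave_out_half_indices(n: int, repeats: int = 100, seed: int = 0) -> List[Split]:
    """Repeat a random 50/50 split `repeats` times (leave out 50% x 100)."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        half = n // 2
        splits.append((idx[half:], idx[:half]))
    return splits


five_fold = k_fold_indices(868)           # 868 = training set size on the slide
half_splits = leave_out_half_indices(868)
print([len(test) for _, test in five_fold])   # [174, 174, 174, 173, 173]
print(len(half_splits), len(half_splits[0][1]))  # 100 434
```

In each case a model is trained on the first index list and scored on the second, and the ROC values are then averaged or pooled.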
36. Ebola HTS Machine learning model cross validation Receiver Operating Characteristic (ROC) statistics.
Models (training set 868 compounds) | RP Forest (out of bag ROC) | RP Single Tree (with 5 fold cross validation ROC) | SVM (with 5 fold cross validation ROC) | Bayesian (with 5 fold cross validation ROC) | Bayesian (leave out 50% x 100 ROC) | Open Bayesian (with 5 fold cross validation ROC)
Ebola replication (actives = 20) | 0.70 | 0.78 | 0.73 | 0.86 | 0.86 | 0.82
Ebola Pseudotype (actives = 41) | 0.85 | 0.81 | 0.76 | 0.85 | 0.82 | 0.82
38. 3 Molecules selected from MicroSource Spectrum virtual screen and tested in vitro
Effect of drug treatment on infection with Ebola-GFP (duplicate experiments)
Compound | EC50 (µM), experiment 1 | EC50 (µM), experiment 2
Chloroquine | 10 | 6.9
Mol 1 | 0.42 | 0.23
Mol 2 | 0.35 | 0.19
Mol 3 | 0.23 | 0.52
All of them nM activity
Data from Robert Davey and Peter Madrid
39. Making Ebola models available
• From data published by others …to proposing target
• Collaborated with lab to open up their screening data, build models,
identified more active inhibitors
• To date the most potent drugs and drug-like molecules
• Still a need for a drug that could be used ASAP
• Models in MMDS http://molsync.com/ebola/
More data continues to be published
• We collated 55 molecules from the literature
• A second review lists 60 hits
– Picazo, E. and F. Giordanetto, Drug Discovery Today. 2015 Feb;20(2):277-86
• Additional screens have identified 53 hits and 80 hits respectively
– Kouznetsova, J., et al., Emerg Microbes Infect, 2014. 3(12): p. e84.
– Johansen, L.M., et al., Sci Transl Med, 2015. 7(290): p. 290ra89.
Litterman N, Lipinski C and Ekins S 2015 F1000Research 2015, 4:38
40. Models reside in papers
Not accessible… this is undesirable
How do we share them?
How do we use them?
42. Open Extended Connectivity Fingerprints
ECFP_6 FCFP_6
• Collected, deduplicated, hashed
• Sparse integers
• Invented for Pipeline Pilot: public method, proprietary details
• Often used with Bayesian models: many published papers
• Built a new implementation: open source, Java, CDK
– stable: fingerprints don't change with each new toolkit release
– well defined: easy to document precise steps
– easy to port: already migrated to iOS (Objective-C) for TB Mobile app
• Provides core basis feature for CDD open source model service
Clark et al., J Cheminform 6:38 2014
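The "collected, deduplicated, hashed, sparse integers" idea can be illustrated with a toy circular fingerprint. This sketch is not the CDK or Pipeline Pilot algorithm: it starts from element symbols only (real ECFP uses richer atom invariants, FCFP uses functional classes), it skips the structural duplicate removal of the published method, and the molecule representation and function names are invented for the example.

```python
import zlib
from typing import Dict, List, Tuple

Bond = Tuple[int, int, int]  # (atom index, atom index, bond order)


def _hash(text: str) -> int:
    """Stable 32-bit hash; CRC32 gives the same value on every run."""
    return zlib.crc32(text.encode("utf-8"))


def circular_fingerprint(atoms: List[str], bonds: List[Bond],
                         radius: int = 3) -> List[int]:
    """Toy circular fingerprint: radius 3 corresponds to diameter 6."""
    neighbours: Dict[int, List[Tuple[int, int]]] = {i: [] for i in range(len(atoms))}
    for a, b, order in bonds:
        neighbours[a].append((order, b))
        neighbours[b].append((order, a))

    ids = [_hash(label) for label in atoms]  # layer 0: one code per atom
    collected = set(ids)
    for layer in range(1, radius + 1):
        new_ids = []
        for i in range(len(atoms)):
            # Sorting the environment makes the code independent of atom order.
            env = sorted((order, ids[j]) for order, j in neighbours[i])
            new_ids.append(_hash(f"{layer}|{ids[i]}|{env}"))
        ids = new_ids
        collected.update(ids)      # the set deduplicates identical codes
    return sorted(collected)       # sparse list of 32-bit integers


# Ethanol heavy atoms, written in two different atom orders.
fp_a = circular_fingerprint(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
fp_b = circular_fingerprint(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])
print(fp_a == fp_b)  # True: the fingerprint does not depend on atom numbering
```

Using a fixed hash such as CRC32 rather than a language's built-in hash is what keeps the codes stable between runs and platforms, which is the "stable" property the slide emphasizes.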
43. Predictions for the InhA target: (a) the ROC curve with ECFP_6 and FCFP_6 fingerprints; (b)
modified Bayesian estimators for active and inactive compounds; (c) structures of selected
binders.
For each listed target with at least two binders, it is first assumed that all of the molecules in
the collection that do not indicate this as one of their targets are inactive.
In the app we used ECFP_6 fingerprints
Building Bayesian models for each target in TB Mobile
Clark et al., J Cheminform 6:38 2014
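A Bayesian model over fingerprint features can be sketched with the Laplacian-corrected estimator commonly described in the literature for this kind of model. Whether it matches the "modified Bayesian estimators" of the cited paper term for term is an assumption; the training data below are invented feature sets, not TB Mobile data.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def train_bayesian(data: List[Tuple[Set[int], bool]]) -> Dict[int, float]:
    """Per-feature contribution log((A_f + 1) / (N_f * P + 1)), where N_f is the
    number of molecules containing feature f, A_f the number of those that are
    active, and P the overall fraction of actives."""
    total = len(data)
    actives = sum(1 for _, is_active in data if is_active)
    if total == 0 or actives == 0:
        raise ValueError("need a non-empty training set with at least one active")
    base_rate = actives / total
    seen: Counter = Counter()
    seen_active: Counter = Counter()
    for features, is_active in data:
        for f in features:
            seen[f] += 1
            if is_active:
                seen_active[f] += 1
    return {f: math.log((seen_active[f] + 1.0) / (seen[f] * base_rate + 1.0))
            for f in seen}


def score(model: Dict[int, float], features: Iterable[int]) -> float:
    """Sum the contributions of the features present; unseen features add 0."""
    return sum(model.get(f, 0.0) for f in features)


# Hypothetical training set: (set of fingerprint codes, is_active).
training = [({1, 2}, True), ({1, 3}, True), ({1, 5}, True),
            ({2, 4}, False), ({3, 4}, False), ({4, 5}, False)]
model = train_bayesian(training)
print(score(model, {1}) > 0 > score(model, {4}))  # True
```

Feature 1 appears only in actives and gets a positive weight; feature 4 appears only in inactives and gets a negative one. The raw score is unbounded, so it ranks molecules but is not a probability.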
44. TB Mobile Vers.2
Ekins et al., J Cheminform 5:13, 2013
Clark et al., J Cheminform 6:38 2014
Predict targets
Cluster molecules
http://goo.gl/vPOKS
http://goo.gl/iDJFR
46. Ames Bayesian model built using CDD Models showing ROC for 3-fold cross validation. Note only FCFP_6 descriptors were used
9R44TR000942-02
47. Exporting models from CDD
Clark et al., JCIM 55: 1231-1245 (2015)
9R44TR000942-02
http://molsync.com/bayesian1
48. Open models in MMDS
Clark et al., JCIM 55: 1231-1245 (2015)
9R44TR000942-02
49. ChEMBL 20
• Skipped targets with > 100,000 assays and sets with < 100 measurements
• Converted data to –log
• Dealt with duplicates
• 2152 datasets
• Cutoff determination
• Balance active/ inactive ratio
• Favor structural diversity and activity distribution
Clark and Ekins, J Chem Inf Model. 2015 Jun 22;55(6):1246-60
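The data preparation steps listed here (convert to –log, deal with duplicates, pick a cutoff) can be sketched as follows. This is a simplified illustration: it assumes activities arrive in nanomolar units, merges duplicates by median, and picks the cutoff only by active/inactive balance. The structural diversity and activity distribution criteria from the slide are not modelled, and none of the function names come from the cited paper.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, List, Sequence, Tuple


def to_p_value(nanomolar: float) -> float:
    """-log10 of the molar concentration, given a value in nM."""
    if nanomolar <= 0:
        raise ValueError("concentration must be positive")
    return 9.0 - math.log10(nanomolar)


def merge_duplicates(records: List[Tuple[str, float]]) -> Dict[str, float]:
    """Collapse repeated measurements of one compound to the median p-value."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for compound_id, nanomolar in records:
        grouped[compound_id].append(to_p_value(nanomolar))
    return {cid: median(vals) for cid, vals in grouped.items()}


def balanced_cutoff(p_values: Sequence[float]) -> float:
    """Pick the observed value that splits actives (p >= cutoff) and
    inactives most evenly."""
    def imbalance(threshold: float) -> float:
        active = sum(1 for p in p_values if p >= threshold)
        return abs(active / len(p_values) - 0.5)
    return min(sorted(set(p_values)), key=imbalance)


# Hypothetical measurements: (compound id, activity in nM).
records = [("a", 1000.0), ("a", 10.0), ("b", 100.0), ("c", 10000.0)]
merged = merge_duplicates(records)
print(merged)
print(balanced_cutoff(list(merged.values())))
```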
50. What do 2000 ChEMBL models look like
[Figure: average ROC vs. folding bit size]
http://molsync.com/bayesian2 Clark and Ekins, J Chem Inf Model. 2015 Jun 22;55(6):1246-60
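Folding maps the sparse 32-bit fingerprint codes into a smaller fixed number of bits, trading collisions for compactness. A minimal sketch using a bit mask: whether the cited work folds by masking or by another reduction is an assumption here, and the hash codes are made up.

```python
from typing import Iterable, List


def fold(hashes: Iterable[int], n_bits: int) -> List[int]:
    """Fold hash codes into n_bits positions (n_bits must be a power of two)."""
    if n_bits <= 0 or n_bits & (n_bits - 1):
        raise ValueError("n_bits must be a positive power of two")
    return sorted({h & (n_bits - 1) for h in hashes})


codes = [5, 1029, 2053, 70000]  # hypothetical fingerprint codes
print(fold(codes, 1024))  # [5, 368]: three codes collide on bit 5
print(fold(codes, 4096))  # [5, 368, 1029, 2053]: no collisions
```

Smaller bit sizes merge more features into the same position, which is why model quality is usually checked across a range of folding sizes.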
51. Nature Reviews Drug Discovery 9, 215–236 (1 March 2010)
Transporters modeled
Created models for: P-gp, OATPs, OCT1, OCT2, BCRP, hOCTN2, ASBT, hPEPT1, hPEPT2, NTCP, MATE1, MATE-2K, MRP4
52. Results for Bayesian model cross validation. 5-fold and Leave one out (LOO) validation with Bayesian models generated with Discovery Studio and Open Models implemented in the mobile app MMDS. * = previously published
Ekins et al Drug Metab Dispos In Press 2015
Transporter models
R41-AI108003-01
53. Ekins et al Drug Metab Dispos In Press 2015
Transporter models
http://molsync.com/transporters
9R44TR000942-02 5R01DK058251-14
54. The Rare Disease Opportunity
• Very few researchers chasing >7000 diseases
• NIH ORDR budget for 2015 estimated at $813M
• Relatively easy to treat. At the forefront of gene therapy resurgence
• Only minuscule clinical trials possible
• Incentives – exclusivity, vouchers
Rare disease biology not well known
Affects 10s-1000s per disease
55. Benefits of Tropical disease and rare pediatric disease priority review voucher program: The golden ticket
[Figure: bar chart of voucher value ($M, axis 0-400) by company (Novartis, Janssen, BioMarin, Knight Therapeutics, Retrophin, United Therapeutics), each marked tropical or rare and used or not used; labeled values 67.5, 125, 245, 350]
According to statute, FDA's rare pediatric disease priority review voucher program is now slated to end after 17 March 2016.
http://www.raps.org/Regulatory-Focus/News/2015/03/18/21750/Pediatric-Priority-Review-Voucher-Program-Set-to-End-After-FDA-Approves-New-Drug/#sthash.j6XGLEXz.dpuf
57. A Mobile App for Open Drug Discovery
A flipboard for science
#ODDT
iOS only
Embraced by rare disease advocates
Getting people to share data openly is a challenge
Tweets saved indefinitely
Developed with Alex Clark
Open Drug Discovery Teams – brings data from Twitter and the internet together
Ekins et al., Mol Informatics, 31: 585-597, 2012
http://goo.gl/r9NP7p
58. • Virtually anyone can do this
• Data is out there to produce models for drug discovery
• Computational and experimental collaborations with open data have led to:
– New hits and leads
– New IP
– New grants for collaborators
• Even Ebola had enough data to build models and suggest compounds to test in 2014
• Make findings open and published immediately
• Huge opportunity to work on rare diseases
• Challenges still – sharing and accessing information / knowledge
– Lack of trust in models
– Belief that you need super computers – when an app might be enough
– Barriers to sharing and collaboration
Conclusions
59. Alex Clark
Jair Lage de Siqueira-Neto
Joel Freundlich
Peter Madrid
Robert Davey
Megan Coffee
Ethan Perlstein
Robert Reynolds
Nadia Litterman
Christopher Lipinski
Christopher Southan
Antony Williams
Carolyn Talcott
Malabika Sarker
Steven Wright
Mike Pollastri
Ni Ai
Barry Bunin and all colleagues at CDD
Acknowledgments and contact info
ekinssean@yahoo.com
collabchem
sean.ekins