Presentation at the pre-meeting workshop "Next-Generation Clinical Pharmacology: Integrating Systems Pharmacology, Data-Driven Therapeutics, and Personalized Medicine", American Society for Clinical Pharmacology and Therapeutics Annual Meeting, Atlanta, GA, March 18, 2014.
Next Generation Data and Opportunities for Clinical Pharmacologists
1. NEXT GENERATION DATA AND
OPPORTUNITIES FOR CLINICAL
PHARMACOLOGISTS
Philip E. Bourne Ph.D.
Associate Director for Data Science
National Institutes of Health
3. Agenda
Research that Informs my NIH Agenda
– The TB drugome – towards reproducibility
– Systems pharmacology – towards interoperability
Some Challenges
– We have the why, but we lack the how
– The how involves:
• Representation
• Sustainability
• Discoverability
• Training
4. Reconstruction of Genome-Scale
3D Drug-Target Interaction Models
Integrating chemical genomics and structural systems biology
[Figure: workflow diagram. Labels: query chemical; 3D model of annotated target; SMAP; 3D model of novel target; protein-ligand docking; MD simulation; interaction model; network modeling; experimental support.]
L. Xie and P.E. Bourne 2008 PNAS, 105(14) 5441-5446
http://funsite.sdsc.edu
5. Detecting Protein Binding
Promiscuity in a Given Proteome
SMAP v2.0
• Geometric and topological constraints
• Evolutionary constraints
• Dynamic constraints
• Physicochemical constraints
6. Geometric Potential – A Geometric Constraint
Challenge: inherent flexibility
and uncertainty in homology
models
Representation of the protein
structure
- Cα atoms only
- Delaunay tessellation
- Graph representation
Geometric Potential (GP)
$$GP = P + \sum_{i \in \text{neighbors}} \frac{P_i}{D_i + 1.0} \times \frac{\cos(\alpha_i) + 1.0}{2.0}$$
L. Xie & P. E. Bourne, BMC Bioinformatics, 8(2007):S9
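The Geometric Potential on this slide is a base term plus a weighted sum over neighbors, which can be evaluated directly. The following is a minimal sketch and not the SMAP implementation. The slide does not define its symbols, so the code assumes that P is the residue's own base potential, that each neighbor i contributes a potential P_i, a distance D_i, and an angle α_i, and that angles are given in radians. The names `Neighbor` and `geometric_potential` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Neighbor:
    potential: float  # P_i (assumed: base potential of neighbor i)
    distance: float   # D_i (assumed: distance to neighbor i)
    angle: float      # alpha_i, in radians (assumption; the slide gives no unit)


def geometric_potential(p, neighbors):
    """Evaluate GP = P + sum_i P_i / (D_i + 1.0) * (cos(alpha_i) + 1.0) / 2.0."""
    total = p
    for n in neighbors:
        distance_weight = n.potential / (n.distance + 1.0)
        angle_weight = (math.cos(n.angle) + 1.0) / 2.0  # ranges from 0 to 1
        total += distance_weight * angle_weight
    return total


# Illustrative values only, not taken from the source.
example = geometric_potential(2.0, [Neighbor(3.0, 2.0, 0.0), Neighbor(4.0, 1.0, math.pi)])
```

In this form a neighbor at angle 0 contributes its full distance-weighted potential, while a neighbor at angle π contributes nothing.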
Geometric Potential Scale: 100 to 0
[Figure: distribution of Geometric Potential for binding site versus non-binding site; axis ticks run from 0 to 4 and from 0 to 99.]
8. Similarity Matrix of Alignment – Chemical &
Evolutionary Constraints?
Constraint - Chemical Similarity
• Amino acid grouping: (LVIMC), (AGSTP), (FYW), and
(EDNQKRH)
• Amino acid chemical similarity matrix
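The four amino acid groups above can be written as a lookup table. The sketch below is illustrative only: the slide does not give the values of the chemical similarity matrix, so the binary matrix here (1 for same group, 0 otherwise) is an assumption, and the names `GROUPS`, `same_chemical_group`, and `binary_similarity_matrix` are hypothetical.

```python
# The four groups listed on the slide, as one-letter amino acid codes.
GROUPS = ("LVIMC", "AGSTP", "FYW", "EDNQKRH")

# Map each amino acid to the index of its group.
GROUP_OF = {aa: idx for idx, members in enumerate(GROUPS) for aa in members}


def same_chemical_group(a, b):
    """Return True when both residues fall in the same group."""
    return GROUP_OF[a.upper()] == GROUP_OF[b.upper()]


def binary_similarity_matrix():
    """Assumed 0/1 matrix: 1 for same-group pairs, 0 otherwise."""
    residues = "".join(GROUPS)
    return {
        (a, b): 1 if same_chemical_group(a, b) else 0
        for a in residues
        for b in residues
    }
```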
Constraint - Evolutionary Correlation
• Amino acid substitution matrix such as BLOSUM45
• Similarity score between two sequence profiles
$$d = \sum_{i} f_a^{i} S_b^{i} + \sum_{i} f_b^{i} S_a^{i}$$
$f_a$, $f_b$ are the 20 amino acid target frequencies of profiles a and b, respectively
$S_a$, $S_b$ are the PSSMs of profiles a and b, respectively
Xie and Bourne 2008 PNAS, 105(14) 5441
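The profile similarity score on this slide is a symmetric sum of two dot products: the target frequencies of each profile weighted by the PSSM scores of the other. A minimal sketch, assuming each argument is a sequence of per-amino-acid values for one aligned position (20 entries in the slide's setting); the function name `profile_similarity` is hypothetical.

```python
def profile_similarity(f_a, s_a, f_b, s_b):
    """Compute d = sum_i f_a[i] * s_b[i] + sum_i f_b[i] * s_a[i].

    f_a, f_b: amino acid target frequencies of profiles a and b.
    s_a, s_b: PSSM scores of profiles a and b at the same position.
    """
    if not (len(f_a) == len(s_a) == len(f_b) == len(s_b)):
        raise ValueError("all four vectors must have the same length")
    a_against_b = sum(fa * sb for fa, sb in zip(f_a, s_b))
    b_against_a = sum(fb * sa for fb, sa in zip(f_b, s_a))
    return a_against_b + b_against_a
```

Because both cross terms are included, swapping the roles of a and b leaves the score unchanged.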
9. The Problem with Tuberculosis
One third of global population infected
1.7 million deaths per year
95% of deaths in developing countries
Anti-TB drugs hardly changed in 40 years
MDR-TB and XDR-TB pose a threat to human health
worldwide
Development of novel, effective and inexpensive
drugs is an urgent priority
10. The TB-Drugome
1. Determine the TB structural proteome
2. Determine all known drug binding sites from the
PDB
3. Determine which of the sites found in 2 exist in 1
4. Call the result the TB-drugome
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
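The four steps above amount to an all-against-all comparison between drug binding sites and sites in the TB structural proteome. The sketch below shows only that bookkeeping and is not the published pipeline: the binding site comparison is passed in as a caller-supplied predicate `is_similar`, and the data shapes and the name `build_drugome` are assumptions made for illustration.

```python
from collections import defaultdict


def build_drugome(tb_sites, drug_sites, is_similar):
    """Map each drug to the TB proteins whose sites resemble its binding site.

    tb_sites:   dict of TB protein name -> site description (step 1)
    drug_sites: iterable of (drug name, site description) pairs (step 2)
    is_similar: caller-supplied predicate comparing two sites (step 3)
    """
    drugome = defaultdict(set)
    for drug, drug_site in drug_sites:
        for protein, tb_site in tb_sites.items():
            if is_similar(drug_site, tb_site):
                drugome[drug].add(protein)
    return dict(drugome)  # step 4: the resulting drug-to-protein map


def multi_target_drugs(drugome):
    """Drugs linked to more than one TB protein."""
    return sorted(drug for drug, proteins in drugome.items() if len(proteins) > 1)
```

With toy inputs, any equality or threshold function can stand in for `is_similar` to check the bookkeeping before plugging in a real site comparison.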
11. 1. Determine the TB Structural
Proteome
[Figure: Venn diagram of the TB proteome (3,996 proteins): 284 with solved structures, 1,446 with homology models, 2,266 remaining.]
High quality homology models from ModBase
(http://modbase.compbio.ucsf.edu) increase structural
coverage from 7.1% to 43.3%
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
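The coverage figures follow from the counts shown on this slide (3,996 proteins in the proteome, 284 with solved structures, 1,446 added homology models). A short check of that arithmetic, with variable names chosen here for illustration:

```python
PROTEOME_SIZE = 3996    # proteins in the TB proteome
SOLVED = 284            # proteins with solved structures
HOMOLOGY_MODELS = 1446  # proteins covered only by homology models


def coverage_percent(covered, total):
    """Structural coverage as a percentage of the proteome."""
    return 100.0 * covered / total


before = coverage_percent(SOLVED, PROTEOME_SIZE)
after = coverage_percent(SOLVED + HOMOLOGY_MODELS, PROTEOME_SIZE)
uncovered = PROTEOME_SIZE - SOLVED - HOMOLOGY_MODELS

print(f"{before:.1f}% -> {after:.1f}%, {uncovered} proteins without a structure or model")
# prints "7.1% -> 43.3%, 2266 proteins without a structure or model"
```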
12. 2. Determine all Known Drug Binding
Sites in the PDB
Searched the PDB for protein crystal structures
bound with FDA-approved drugs
268 drugs bound in a total of 931 binding sites
[Figure: number of drug binding sites per drug. Labeled drugs: methotrexate, chenodiol, alitretinoin, conjugated estrogens, darunavir, acarbose.]
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
13. 3. Map 2 onto 1 –
The TB-Drugome
http://funsite.sdsc.edu/drugome/TB/
Similarities between the binding sites of M.tb proteins (blue),
and binding sites containing approved drugs (red).
14. From a Drug Repositioning
Perspective
Similarities between drug binding sites and
TB proteins are found for 61/268 drugs
41 of these drugs could potentially inhibit
more than one TB protein
[Figure: number of potential TB targets per drug. Labeled drugs: raloxifene, alitretinoin, conjugated estrogens & methotrexate, ritonavir, testosterone, levothyroxine, chenodiol.]
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
17. Characteristics of the Original and
Current Experiment
Original and Current:
– Purely in silico
– Uses a combination of public databases and
open source software by us and others
Original:
– http://funsite.sdsc.edu/drugome/TB/
Current:
– Recast in the Wings workflow system
18. Considered the Ability to Reproduce
by Four Classes of User
REP-AUTHOR – original author of the work
REP-EXPERT – domain expert – can reproduce even
with incomplete methods described
REP-NOVICE – basic domain (bioinformatics) expertise
REP-MINIMAL – researcher with no domain expertise
Garijo et al 2013 PLOS ONE 8(11): e80278
19. A Conceptual Overview of the Method
Should Be Mandatory
Garijo et al 2013 PLOS ONE 8(11): e80278
20. Time to Reproduce the Method
Garijo et al 2013 PLOS ONE 8(11): e80278
21. It's not that we could not reproduce
the work, but the effort involved was
substantial
Any graduate student could tell you
this and little has changed in 40 years
Perhaps it is time we did better?
23. Human Kidney Modeling Pipeline
[Figure: pipeline diagram. Labels: Recon1 metabolic network; constrain exchange fluxes; preliminary model; refine based on capabilities; literature; set flux constraints; normalize & set threshold; renal objectives; set minimum objective flux; GIMME; metabolic influx; metabolic efflux; kidney model; healthy kidney gene expression data; metabolomic blood/urine & kidney localization data.]
R.L Chang et al. 2010 PLOS Comp. Biol. 6(9): e1000938
26. Representation
Requires community engagement:
– RDA
– GA4GH
– FORCE11
– ……
Policies
– Genomic data sharing plan
– Machine readable data sharing plans
Particular needs surrounding phenotypic data
27. Sustainability
The How of Data Sharing
More credit to the data scientists
Change to funding models – become less IC based
Public/Private partnerships
Interagency cooperation
International cooperation
Better evaluation and more informed decisions about
existing and proposed resources – How are current
data being used?
Role of institutional repositories – reward institutions
rather than PIs
28. Discoverability
Calls for data and software registries (e.g., DDI)
Data commons (NIH drive?)
More clinical trial data in the public domain
Facilitate authentication and hence access to clinical
data
29. Training
Calls out for training grants – new and as
supplements to existing training efforts
Regional training centers (cf Cold Spring Harbor)?
Tuberculosis, which is caused by the bacterial pathogen Mycobacterium tuberculosis, is a leading cause of mortality among the infectious diseases. The World Health Organization (WHO) has estimated that almost one-third of the world's population, around 2 billion people, is infected.
Every year, more than 8 million people develop an active form of the disease, which claims the lives of nearly 2 million. This translates to over 4,900 deaths per day, and more than 95% of these are in developing countries.
Despite the current global situation, antitubercular drugs have remained largely unchanged over the last four decades. The widespread use of these agents has exerted strong selective pressure on M. tuberculosis, encouraging the emergence of resistant strains.
Multidrug resistant (MDR) tuberculosis is defined as resistance to the first-line drugs isoniazid and rifampin. The effective treatment of MDR tuberculosis necessitates long-term use of second-line drug combinations, an unfortunate consequence of which is the emergence of further drug resistance.
Enter extensively drug resistant (XDR) tuberculosis: M. tuberculosis strains that are resistant to both isoniazid and rifampin, as well as to key second-line drugs. Since the only remaining drug classes exhibit low potency and high toxicity, XDR tuberculosis is extremely difficult to treat.
The rise of XDR tuberculosis around the world poses a great threat to human health, making the development of new antitubercular agents an urgent priority.
Very few Mtb proteins explored as drug targets
3,996 proteins in TB proteome
749 solved structures in the PDB, representing a total of 284 proteins (7.1% coverage)
ModBase contains homology models for entire TB proteome
1,446 ‘high quality’ homology models were added to the data set
Structural coverage increased to 43.3%
Retained only those models with a model score of > 0.7 and a ModPipe quality score of > 1.1 (2,818 models).
There were multiple models per protein. For each TB protein, we chose the model with the best model score and, if the model scores were equal, the model with the best ModPipe quality score (1,703 models).
However, 251 (+6) models were removed since they correspond to TB proteins that already have solved structures (1,446 models remained).
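The selection procedure in these notes (threshold filter, one best model per protein, drop proteins with solved structures) can be sketched as follows. This is an illustration under stated assumptions, not the original analysis code: the `Model` record, its field names, and `select_models` are hypothetical, and proteins with solved structures are skipped during the loop rather than afterwards, which gives the same final set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    protein: str        # TB protein the homology model represents
    model_score: float  # reliability score, cutoff 0.7 in the notes
    mpqs: float         # ModPipe quality score, cutoff 1.1 in the notes


def select_models(models, solved_proteins, min_model_score=0.7, min_mpqs=1.1):
    """Return one retained model per protein, keyed by protein name."""
    best = {}
    for m in models:
        # Keep only models strictly above both thresholds.
        if m.model_score <= min_model_score or m.mpqs <= min_mpqs:
            continue
        # Drop models for proteins that already have a solved structure.
        if m.protein in solved_proteins:
            continue
        current = best.get(m.protein)
        # Best model score wins; ties are broken by the quality score.
        if current is None or (m.model_score, m.mpqs) > (current.model_score, current.mpqs):
            best[m.protein] = m
    return best
```

Comparing `(model_score, mpqs)` tuples implements the tie-break in one step, since tuples are ordered by their first element and then by their second.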
Model score: a score for the reliability of a model, derived from statistical potentials (F. Melo, R. Sanchez, A. Sali, 2001). A model is predicted to be good when the model score is higher than a pre-specified cutoff (0.7). A reliable model has a probability of the correct fold that is larger than 95%. A fold is correct when at least 30% of its Cα atoms superpose within 3.5 Å of their correct positions.
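The fold-correctness criterion stated above (at least 30% of Cα atoms within 3.5 Å of their correct positions) can be checked with a few lines of code. This sketch assumes the model and reference coordinates are already superposed and listed in the same residue order; it does not perform the superposition itself, and the function names are hypothetical.

```python
import math


def fraction_within(model_ca, reference_ca, cutoff=3.5):
    """Fraction of C-alpha atoms lying within `cutoff` angstroms of the reference."""
    if len(model_ca) != len(reference_ca) or not model_ca:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    close = sum(
        1 for a, b in zip(model_ca, reference_ca) if math.dist(a, b) <= cutoff
    )
    return close / len(model_ca)


def is_correct_fold(model_ca, reference_ca, min_fraction=0.30, cutoff=3.5):
    """Apply the criterion: at least 30% of C-alpha atoms within 3.5 angstroms."""
    return fraction_within(model_ca, reference_ca, cutoff) >= min_fraction
```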
The ModPipe Protein Quality Score (MPQS) is a composite score comprising sequence identity to the template, coverage, and the three individual scores E-value, z-DOPE and GA341. We consider an MPQS of > 1.1 as reliable.