Numerical Characterization of Structure-Activity Relationships from a Medicinal Chemist's Point of View
1. Defining & Using Structure-Activity Landscapes
Outline: Background, Visualization, Utilization, Predictive Models, 3D models, Chemical spaces, Summary
Rajarshi Guha
School of Informatics
Indiana University
National Medicinal Chemistry Symposium
Pittsburgh, PA
15th June, 2008
2. Structure Activity Relationships
Assumptions
Similar molecules will have similar activities
Small changes in structure will lead to small changes in activity
One implication is that SARs are additive
This is the basis for QSAR modeling
Martin, Y.C. et al., J. Med. Chem., 2002, 45, 4350–4358
3. Structure Activity Landscapes
Melanocortin-4 receptor inhibitors
[Figure: four closely related structures with Ki = 39.0 nM, 1.8 nM, 10.0 nM, and 1.0 nM]
Tran, J.A. et al., Bioorg. Med. Chem. Lett., 2007, 15, 5166–5176
4. Structure Activity Landscapes
Rugged gorges or rolling hills?
Small structural changes associated with large activity changes represent steep slopes in the landscape
Activity Cliffs
But traditionally, QSAR assumes gentle slopes
Machine learning is not very good for special cases
Maggiora, G.M., J. Chem. Inf. Model., 2006, 46, 1535–1535
5. Characterizing the Landscape
Converting activity cliffs to numbers
A cliff can be numerically characterized
Structure-Activity Landscape Index (SALI)
\[ \mathrm{SALI}_{i,j} = \frac{|A_i - A_j|}{1 - \mathrm{sim}(i,j)} \]
Cliffs are characterized by elements of the matrix with very large values
Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., 2008, 48, 646–658
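The SALI definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `sali_matrix`, the example activities, and the similarity values are all made up, and returning infinity for identical fingerprints with different activities is an assumed convention.

```python
import math

def sali_matrix(activities, similarity):
    """Return the pairwise SALI matrix as a list of lists."""
    n = len(activities)
    sali = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = abs(activities[i] - activities[j])
            denom = 1.0 - similarity[i][j]
            # sim == 1 makes the index undefined; this sketch (an assumption)
            # reports infinity when the activities differ and 0 otherwise.
            if denom == 0.0:
                value = math.inf if diff > 0 else 0.0
            else:
                value = diff / denom
            sali[i][j] = sali[j][i] = value
    return sali

# Hypothetical data: three activities (e.g. on a log scale) and a
# made-up symmetric similarity matrix with values in [0, 1].
activities = [7.4, 8.7, 8.0]
similarity = [
    [1.0, 0.9, 0.5],
    [0.9, 1.0, 0.6],
    [0.5, 0.6, 1.0],
]
sali = sali_matrix(activities, similarity)
```

The pair (0, 1) is highly similar yet differs clearly in activity, so it receives by far the largest entry, which is what a cliff looks like numerically.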
6. Visualizing the SALI Matrix
7. Visualizing SALI Values
Alternatives?
A heatmap is an easy-to-understand visualization
Coupled with brushing, it can be a handy tool
A more flexible approach is to consider a network view of the matrix
The SALI graph
Compounds are nodes
Nodes i, j are connected if SALI_{i,j} > X
Only display connected nodes
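The graph construction described above might look like the following sketch. The function name `sali_graph` and the data are hypothetical, and pointing each edge from the less active to the more active compound is an assumption made here for illustration; the slide only states the connection rule.

```python
def sali_graph(sali, activities, cutoff):
    """Return (nodes, edges) for pairs whose SALI value exceeds the cutoff."""
    edges = []
    n = len(activities)
    for i in range(n):
        for j in range(i + 1, n):
            if sali[i][j] > cutoff:
                # Assumed convention: edge runs from lower to higher activity.
                if activities[i] <= activities[j]:
                    edges.append((i, j))
                else:
                    edges.append((j, i))
    # Only compounds that take part in at least one edge are kept.
    nodes = sorted({k for edge in edges for k in edge})
    return nodes, edges

# Hypothetical activities and a precomputed SALI matrix.
activities = [7.4, 8.7, 8.0]
sali = [
    [0.0, 13.0, 1.2],
    [13.0, 0.0, 1.75],
    [1.2, 1.75, 0.0],
]
nodes, edges = sali_graph(sali, activities, cutoff=1.5)
strict_nodes, strict_edges = sali_graph(sali, activities, cutoff=5.0)
```

Raising the cutoff drops the weaker pair and its otherwise unconnected node, which is how a higher threshold narrows the view to the most significant cliffs.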
9. Better Visualization
SALIViewer
Java application for generating and visualizing SALI graphs
Create SALI graphs from SMILES and activity data, using the CDK fingerprints
Easily examine SALI graphs at different cutoffs
Provides 2D depictions for nodes and edges
Generate SALI curves
http://cheminfo.informatics.indiana.edu/~rguha/code/java/salivis
10. Better Visualization - SALIViewer
11. Varying Fingerprint Methods
[Figure: density of pairwise Tanimoto similarity in three panels: BCI 1052 bit, MACCS 166 bit, CDK 1024 bit]
Shorter fingerprints will lead to more “similar” pairs
Requires a higher cutoff to focus on significant cliffs
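For reference, the Tanimoto similarity on the axes of these plots can be sketched as below. Fingerprints are represented as sets of on-bit positions, which is a simplification chosen for this example; the bit positions are made up, and returning 1.0 for two empty fingerprints is an assumed convention.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient for fingerprints given as sets of on-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 1.0  # assumed convention for two empty fingerprints
    return len(a & b) / union

# Hypothetical fingerprints sharing three of five distinct on-bits.
fp1 = {1, 5, 9, 12}
fp2 = {1, 5, 9, 40}
sim = tanimoto(fp1, fp2)
```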
12. Varying the Similarity Metric
The similarity metric does not affect the SALI values
13. SALI Graphs & Predictive Models
The graph view allows us to view SARs and identify trends easily
The aim of a QSAR model is to encode SARs
Traditionally, we consider the quality of a model in terms of RMSE or R²
But in general, we’re not as interested in RMSEs as we are in whether the model predicted something as more active than something else
What we want to have is the correct ordering
We assume the model is statistically significant
14. SALI Graphs & Predictive Models
Measuring model quality
A QSAR model should easily encode the “rolling hills”
A good model captures the most significant cliffs
Can be formalized as: How many of the edge orderings of a SALI graph does the model predict correctly?
Define S(X), representing the number of edges correctly predicted for a SALI network at a threshold X
Repeat for varying X and obtain the SALI curve
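A rough sketch of S(X) and the resulting SALI curve follows. Several choices here are assumptions rather than the authors' definitions: S(X) is normalized as (correct − incorrect) / total so that it runs from −1 to 1, matching the axis range of the plotted curves; X is taken as a fraction of the largest finite SALI value; an empty graph scores 0; and predicted ties count as incorrect. All names and data are hypothetical.

```python
import math

def edges_above(sali, observed, cutoff):
    """Edges (low, high) of the SALI graph: pairs with SALI > cutoff."""
    n = len(observed)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if sali[i][j] > cutoff:
                # Assumed convention: orient from lower to higher observed activity.
                edges.append((i, j) if observed[i] <= observed[j] else (j, i))
    return edges

def s_of_x(sali, observed, predicted, cutoff):
    """Assumed normalization: (correct - incorrect) / total edge orderings."""
    edges = edges_above(sali, observed, cutoff)
    if not edges:
        return 0.0  # assumed value when the graph has no edges
    # Correct when the model ranks the pair in the observed order;
    # predicted ties count as incorrect in this sketch.
    correct = sum(1 for low, high in edges if predicted[low] < predicted[high])
    incorrect = len(edges) - correct
    return (correct - incorrect) / len(edges)

def sali_curve(sali, observed, predicted, fractions):
    """Evaluate S(X) with X given as a fraction of the largest finite SALI value."""
    n = len(observed)
    finite = [sali[i][j] for i in range(n) for j in range(i + 1, n)
              if math.isfinite(sali[i][j])]
    max_sali = max(finite)
    return [(x, s_of_x(sali, observed, predicted, x * max_sali)) for x in fractions]

# Hypothetical observed and predicted activities with a precomputed SALI matrix.
observed = [7.4, 8.7, 8.0]
predicted = [7.0, 8.1, 8.3]
sali = [
    [0.0, 13.0, 1.2],
    [13.0, 0.0, 1.75],
    [1.2, 1.75, 0.0],
]
curve = sali_curve(sali, observed, predicted, [0.0, 0.1, 0.5])
```

In this toy case the model misorders one low-SALI pair but gets the steepest cliff right, so the score rises to 1 once the cutoff isolates that cliff.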
16. SALI Curves - An Example
[Figure: SALI curves, S(X) versus X, for three models: 3-descriptor, 5-descriptor, scrambled 3-descriptor]
17. SALI Curves & Model Comparison
Considered four datasets
Developed linear regression models, using exhaustive search for feature selection
Identify three models for each dataset: minimum RMSE (“best”), median RMSE, maximum RMSE (“worst”)
Generate SALI curves for each model and summarize by dataset
Guha, R.; Van Drie, J.H., J. Chem. Inf. Model., submitted
19. SALI Curves & Model Comparison
The initial and final portions of the curve are of interest
It’s also useful to summarize the whole curve
We evaluate the area between the curve and the X-axis, the SALI Curve Integral (SCI)
-1 ≤ SCI ≤ 1
[Figure: SALI curve, S(X) versus X, with SCI = 0.12]
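One way to approximate the SCI from a sampled curve is the trapezoidal rule, sketched below. The integration scheme and the function name `sali_curve_integral` are choices made for this example, not something the slides specify. The area is signed, so with X spanning 0 to 1 and S(X) between −1 and 1 the result stays within the stated bounds.

```python
def sali_curve_integral(xs, s_values):
    """Signed trapezoidal area between a sampled SALI curve and the X-axis."""
    area = 0.0
    for k in range(1, len(xs)):
        width = xs[k] - xs[k - 1]
        area += 0.5 * (s_values[k] + s_values[k - 1]) * width
    return area

# Hypothetical curve sampled at five cutoffs between 0 and 1.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
s_values = [0.2, 0.4, 0.6, 0.8, 1.0]
sci = sali_curve_integral(xs, s_values)
```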
20. SALI Curves & Model Comparison
[Figure: SCI for the Min-RMSE, Median-RMSE, and Max-RMSE models on the PDGFR, Artemisinin, Melanocortin, and Benzodiazepine datasets]
21. Examining Any Type of Model . . .
Previous examples make use of predicted values from QSAR models
We can consider any “prediction” that is supposed to track observed activity: ranks, energies
Allows us to apply this approach to any type of computational model that predicts something: docking, CoMFA, pharmacophore
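Because only the ordering matters, scores such as docking energies or ranks can be plugged in directly once their direction is accounted for. The sketch below assumes edges are already given as (less active, more active) pairs and uses a hypothetical `higher_is_better` flag to flip scores where a lower value means a more active compound; the energies and ranks are invented.

```python
def ordering_score(edges, scores, higher_is_better=True):
    """(correct - incorrect) / total over edges given as (less active, more active)."""
    if not edges:
        return 0.0  # assumed value for an empty edge list
    # Flip the sign when lower scores indicate higher predicted activity.
    sign = 1.0 if higher_is_better else -1.0
    signed = [sign * s for s in scores]
    correct = sum(1 for low, high in edges if signed[low] < signed[high])
    return (2 * correct - len(edges)) / len(edges)

# Hypothetical edges from a SALI graph, with compound 1 the most active.
edges = [(0, 1), (2, 1)]
docking_energies = [-6.1, -9.3, -7.0]  # more negative = predicted more active
ranks = [3, 1, 2]                      # rank 1 = predicted most active
s_energy = ordering_score(edges, docking_energies, higher_is_better=False)
s_rank = ordering_score(edges, ranks, higher_is_better=False)
```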
22. Docking & CoMFA Models
[Figure: SALI curves, S(X) versus X, in two panels: Docking (SCI = 0.95) and CoMFA (SCI = 0.99)]
Not surprising that 3D models capture more cliffs
The CoMFA model is nearly perfect!
Holloway, K. et al., J. Med. Chem., 1995, 38, 305–317
Cavalli, A. et al., J. Med. Chem., 2002, 45, 3844–3853
23. Comparing Landscapes
The SALI curve is a function of the dataset and the descriptor space
We can quantify a descriptor space’s ability to encode the structure-activity landscape using SALI graphs
What is the size of the graph as a function of SALI cutoff?
The SALI approach allows us to investigate molecular representations that may not be directly accessible
Work in progress
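The question of graph size versus cutoff can be explored with a sketch like the one below. It counts the pairs whose SALI value exceeds each cutoff and divides by the total number of pairs with a finite SALI value. Both that normalization and the use of a fraction of the maximum SALI value as the cutoff are assumptions made for illustration.

```python
import math

def cliff_count_curve(sali, fractions):
    """Fraction of compound pairs with SALI above each cutoff."""
    n = len(sali)
    values = [sali[i][j] for i in range(n) for j in range(i + 1, n)
              if math.isfinite(sali[i][j])]
    max_sali = max(values)
    total = len(values)
    curve = []
    for x in fractions:
        cutoff = x * max_sali  # cutoff as a fraction of the largest SALI value
        count = sum(1 for v in values if v > cutoff)
        curve.append((x, count / total))
    return curve

# Hypothetical SALI matrix for one molecular representation.
sali = [
    [0.0, 13.0, 1.2],
    [13.0, 0.0, 1.75],
    [1.2, 1.75, 0.0],
]
size_curve = cliff_count_curve(sali, [0.0, 0.1, 0.5])
```

Computing this curve from SALI matrices built in different descriptor spaces gives one way to compare how each representation distributes cliffs.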
26. Comparing Landscapes
[Figure: normalized number of cliffs versus SALI cutoff for three representations: fingerprint, best 3-descriptor, worst 3-descriptor]
A Type 2 SALI curve for the PDGFR dataset, comparing 3 different molecular representations
27. What’s Next?
SALI graphs and curves represent a framework for exploring structure-∗ landscapes
Open questions
Weighted SALI graphs (ADMET, synthetic feasibility)
Is it correct to identify cliffs using fingerprints, and then predict cliffs using different descriptors?
Can we use SALI curves to compare 3D and 2D descriptor spaces?
Can we use SCI for feature selection?
28. Conclusions
The SALI is an effective way to numerically encode activity cliffs
The network view of these values allows us to explore SARs in an intuitive way
Using the SALI curve allows us to compare predictive models in a manner that is intuitive for a medicinal chemist
29. Acknowledgments
John Van Drie
31. Making Use of the SALI Graph
A little difficult with a non-interactive graph
We can investigate a series of transformations that increase (or decrease) activity
Identify two types of SARs: broad and detailed
Depends on what cutoff we choose
These correspond somewhat to the continuous and discontinuous SARs described by Peltason et al.
Peltason, L. et al., J. Med. Chem., 2007, 50, 5571–5578
32. Glucocorticoid Inhibitors
[Figure: SALI graph with nodes labeled by compound ID, 06-16 through 07-42]
62 dihydroquinoline derivatives
IC50s reported, some values were censored
50% SALI graph generated using 1052 bit BCI fingerprints
Takahashi, H. et al., Bioorg. Med. Chem. Lett., 2007, 17, 5091–5095
33. Glucocorticoid Inhibitors
[Figure: structures of 07-20 (2000 nM), 07-23 (2000 nM), and 07-17 (355 nM)]
Moving from allyl or phenylethyl to ethyl causes a 6-fold increase in activity
Reducing bulk at this position seems to improve activity
Pretty broad conclusion
But ethyl is not much smaller than allyl
We need more detail
36. Glucocorticoid Inhibitors
[Figure: structures of 07-15 (2000 nM), 07-20 (2000 nM), 07-18 (710 nM), and 07-17 (355 nM)]
Suggests that electron density is also important
Lower π density possibly correlates to increased activity
Confirmed by 07-23 → 07-18
07-15 → 07-17 is interesting since the change increases the bulk
37. Glucocorticoid Inhibitors
These observations match those made by Takahashi et al.
More detailed graphs exhibit longer paths that focus on the bulk of side chains at the C4-α position
A number of paths consider changes to the epoxide substitution
Usually of length 1
Highlights the fact that bulk at the C4-α position has greater impact on activity than epoxide substitutions
The SALI graph stresses the non-linearity of the SAR
38. SALI Curves - Control Experiments
Scrambling
Scramble the Y-variable and rebuild the model
Evaluate the SALI curve
Repeat 50 times and take the mean of the counts for a given cutoff
[Figure: SALI curve, S(X) versus X, for the scrambled models]
Noise
Add uniform noise to each descriptor, rebuild the model
We expect little variation in the plateau
[Figure: SALI curves, S(X) versus X, with legend: No noise, 10, 50, 100, 200]
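The scrambling control can be sketched as follows, under several simplifying assumptions: a single-descriptor least-squares line stands in for the regression models, the SALI graph edges come from the unscrambled activities and are supplied directly, and the score averaged over repeats is the normalized (correct − incorrect) / total value rather than raw counts. All names and data are hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with one descriptor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - b * mean_x, b

def s_value(edges, predicted):
    """Assumed normalization: (correct - incorrect) / total edge orderings."""
    if not edges:
        return 0.0
    correct = sum(1 for low, high in edges if predicted[low] < predicted[high])
    return (2 * correct - len(edges)) / len(edges)

def scrambled_mean_s(descriptor, observed, edges, repeats=50, seed=0):
    """Mean score over models rebuilt on scrambled activities."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repeats):
        shuffled = list(observed)
        rng.shuffle(shuffled)                # scramble the Y-variable
        a, b = fit_line(descriptor, shuffled)  # rebuild the model
        predicted = [a + b * xi for xi in descriptor]
        total += s_value(edges, predicted)
    return total / repeats

# Hypothetical descriptor, activities, and edges (less active, more active).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = [5.1, 6.0, 6.8, 8.1, 9.0]
edges = [(0, 4), (1, 3), (0, 2)]
a, b = fit_line(descriptor, observed)
real_s = s_value(edges, [a + b * xi for xi in descriptor])
scrambled_s = scrambled_mean_s(descriptor, observed, edges)
```

The model built on the real activities orders every edge correctly, while the scrambled average falls below it, which is the behavior a control experiment of this kind is meant to expose.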