1. The Interplay between Chemical Properties and Screening Data
Andy Pope
Platform Technology & Science, GlaxoSmithKline, Collegeville PA, USA
MipTec 2011, Basel
Sept. 20-22, 2011
2. Compound properties aren’t what they used to be…
[Figure: MW and ClogP panels]
Properties vs Phase*
- Median cLogP: failed candidates = 3.9; marketed drugs = 2.5
- Median MW: failed candidates = 432; marketed drugs = 349
*Adapted from Blake JF, Medicinal Chemistry, 2005, 1, 649-655
4. Drug discovery chemical property space - some critical factors…
Factors that shape the chemical property space of drug candidates:
- Chemistry methods
- Chemistry “culture”
- Hit ID libraries
- Screening methods
- SAR data
5. Drug discovery chemical property space - some critical factors…
Factors that shape the chemical property space of drug candidates:
- Chemistry methods
- Chemistry “culture”
  - Efficiency concepts
  - Property guides/rules
- Hit ID libraries
  - Rigorous property rules
  - Fragments
  - Lead-like, “Beautiful”
- Screening methods
- SAR data
6. Does assay data influence discovery chemical property space occupancy? (…or vice versa)
7. Large Scale analysis of High Throughput Screening Data
HTS at GSK
330 screens of >500,000 cpds, 2005-2010
Single concentration primary data (10 uM) re-analysed
Compound results binned according to simple compound properties
Meta-data (e.g. target class, screening technology) curated
Academic screening centers (MLPCN)
~100 screens with >250,000 cpds tested & deposited to PubChem BioAssay
from major NIH funded screening centers (NCGC, Scripps, Broad)
Single concentration data re-analysed using same methods as GSK data
8. The GSK HTS Process
- Primary screen (10 uM, singlicate): ~entire collection, ~2 million compounds (100%)
  - Statistical separation from the null-effect population
  - Chemical clustering if hit rate >1%
- Confirmation (10 uM, duplicate): potential actives (<1%), ~20,000 compounds
  - Eliminate false positives from primary
  - Chemical clustering for diversity & property sampling
- Dose response (11 pt, 3-fold dilution): “real” hits (<0.1%), ~2,000 compounds
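The funnel arithmetic can be sketched as below. This is illustrative only: it applies the slide's upper-bound rates (<1%, <0.1%) directly to the deck size, and the 100 uM top concentration used for the 11-point, 3-fold dilution series is a hypothetical value, not one stated on the slide.

```python
def hts_funnel(deck_size, confirm_rate=0.01, real_hit_rate=0.001):
    """Approximate compound counts per HTS stage, using the slide's upper-bound rates."""
    return {
        "primary": deck_size,                               # ~entire collection, 100%
        "confirmation": round(deck_size * confirm_rate),    # potential actives, <1%
        "dose_response": round(deck_size * real_hit_rate),  # "real" hits, <0.1%
    }

def dilution_series(top_conc_um, points=11, fold=3.0):
    """Concentrations (uM) of a serial dilution starting at top_conc_um."""
    return [top_conc_um / fold ** i for i in range(points)]

print(hts_funnel(2_000_000))
series = dilution_series(100.0)  # hypothetical 100 uM top concentration
print(f"{series[0]:g} uM down to {series[-1]:.4f} uM, span {series[0] / series[-1]:,.0f}-fold")
```

Whatever the top concentration, 11 points at 3-fold spacing cover 3^10 = 59,049-fold, i.e. just under five log units of concentration.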
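9. HTS Hit Marking Processes
[Figure: % compounds vs. response (% control); binned raw HTS data with two normal-distribution fits; responses below the cut-off are marked “miss”, above it “hit”]
- Binned raw HTS data
- Fit with raw mean and standard deviation: raw mean = 1.0, raw SD = 11.1
- Fit with robust mean and standard deviation: robust mean = -0.3, robust SD = 5.5
- Hit cut-off = mean + 3 x RSD
Normal distribution with mean μ and standard deviation σ:
$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$
The blue & black curves are normal distribution fits using the mean & SD. Responses just above the cut-off are a mix of weak hits, artefacts and statistical “noise”; the far tail holds potent hits (& artefacts).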
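A minimal sketch of the two hit-marking approaches above, on hypothetical data. The slides do not say which robust estimators were used; this assumes the median for the robust mean and 1.4826 x MAD (median absolute deviation) for the robust SD, a common choice because that scaled MAD equals the SD for normally distributed data.

```python
import statistics

MAD_TO_SD = 1.4826  # scales MAD to an SD-equivalent for normal data (assumed estimator)

def robust_center_scale(values):
    """Median and MAD-based robust SD."""
    center = statistics.median(values)
    mad = statistics.median(abs(v - center) for v in values)
    return center, MAD_TO_SD * mad

def mark_hits(responses, k=3.0, robust=True):
    """Return (cut-off, indices of hits), with cut-off = mean + k * SD."""
    if robust:
        center, scale = robust_center_scale(responses)
    else:
        center, scale = statistics.mean(responses), statistics.stdev(responses)
    cutoff = center + k * scale
    return cutoff, [i for i, r in enumerate(responses) if r > cutoff]

# Hypothetical % control responses: mostly noise around 0, plus two actives
responses = [0, 1, -1, 2, -2, 0, 1, -1, 95, 60]
print(mark_hits(responses, robust=True))
print(mark_hits(responses, robust=False))
```

On this toy data the two actives inflate the raw mean and SD so much that the raw cut-off lies above every response and both actives are missed, while the robust cut-off (about 7.2) flags both.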
10. Typical HTS observed data distribution vs. fit
[Figure: frequency (% cpds/bin) vs. effect (% control), showing binned raw HTS data, the robust distribution fit, the hit cut-off (mean + 3 x RSD) and the residual (raw – fit)]
Note: representative selection of individual screens from ~330 analyzed
11. Observed data distribution vs. fit – zoom
[Figure: frequency (% cpds in bin) vs. effect (% control), showing binned raw HTS data, the robust distribution fit, the hit cut-off (mean + 3 x RSD) and the residual (raw – fit)]
Note: representative selection of individual screens from ~330 analyzed
12. Screen cut-off (mean + 3 x RSD), GSK HTS campaigns 2005-2010
[Figure: screen cut-off vs. average robust Z’ of the assay during HTS production]
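For reference, the standard Z’ factor is 1 - 3(σ_pos + σ_neg)/|μ_pos - μ_neg|, computed from positive- and negative-control wells. The slide does not define “robust” Z’; this sketch assumes the common variant that replaces means with medians and SDs with 1.4826 x MAD. The control values are hypothetical.

```python
import statistics

def z_prime(mu_pos, sd_pos, mu_neg, sd_neg):
    """Z' factor: 1 - 3 * (sd_pos + sd_neg) / |mu_pos - mu_neg|."""
    return 1.0 - 3.0 * (sd_pos + sd_neg) / abs(mu_pos - mu_neg)

def robust_z_prime(pos_controls, neg_controls):
    """Z' with medians and MAD-based SDs (assumed definition of 'robust')."""
    def center_scale(xs):
        med = statistics.median(xs)
        return med, 1.4826 * statistics.median(abs(x - med) for x in xs)
    mu_p, sd_p = center_scale(pos_controls)
    mu_n, sd_n = center_scale(neg_controls)
    return z_prime(mu_p, sd_p, mu_n, sd_n)

# Hypothetical control wells in % control units
print(robust_z_prime([98, 100, 102, 99, 101], [-2, 0, 2, -1, 1]))
```

A wider control separation or tighter control wells pushes Z’ toward 1; values above about 0.5 are conventionally taken as a screenable assay window.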
13. Looking for property trends in the GSK HTS dataset
e.g. compound total polar surface area (tPSA), defined as the surface sum over all polar atoms:
- tPSA < 60 Å² predicts brain penetration
- tPSA > 140 Å² predicts poor cell penetration
Aggregate results from all 330 campaigns 2005-2010 with >500K tests. For example, compounds with tPSA 80-85 Å²:
- 26M measured responses in this bin
- 485k marked as “hit”
- Hit rate = 100 x (485k/26M) = 1.86%
[Figure: hit rate (%) for compounds in each tPSA bin vs. polar surface area (tPSA, Å²)]
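The per-bin hit rate above is simply hits divided by measured responses within one property bin. A minimal sketch, assuming each record is a hypothetical (property value, is_hit) pair and fixed-width bins such as the 5 Å² bin in the example; the `min_count` parameter mimics a display rule like “only bins containing 1M or more records are shown”.

```python
import math
from collections import defaultdict

def binned_hit_rates(records, bin_width, min_count=1):
    """Hit rate (%) per property bin.

    records: iterable of (property_value, is_hit) pairs, one per measured response.
    Bins are [start, start + bin_width); bins with fewer than min_count
    responses are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # bin start -> [responses, hits]
    for value, is_hit in records:
        start = math.floor(value / bin_width) * bin_width
        counts[start][0] += 1
        counts[start][1] += int(is_hit)
    return {start: 100.0 * hits / n
            for start, (n, hits) in sorted(counts.items())
            if n >= min_count}

# Hypothetical tPSA values (Å²) with hit flags
records = [(81.0, True), (82.5, False), (83.0, False), (84.9, False),
           (86.0, True), (87.0, True)]
print(binned_hit_rates(records, 5))
print(binned_hit_rates(records, 5, min_count=3))
```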
14. Compound shapeliness and flexibility
fCsp3 (fraction of carbons that are sp3) captures the “shapeliness” of a compound
- Weak positive correlation with MW
- More irregular 3D shape → lower hit probability
Flexibility = percentage of a compound’s bonds that are rotatable
- Slight decrease in hit rate with flexibility
- No correlation with MW or ClogP
[Figure: hit rate (%) vs. fCsp3; hit rate (%) vs. flexibility]
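Both descriptors are simple ratios of atom or bond counts. In practice the counts would come from a cheminformatics toolkit (RDKit, for example, provides `rdMolDescriptors.CalcFractionCSP3` and `CalcNumRotatableBonds`), and rotatable-bond and bond-count conventions differ between tools, so this sketch takes the counts as inputs. The example compound is hypothetical and assumes bonds between heavy atoms only.

```python
def fraction_csp3(n_sp3_carbons, n_carbons):
    """fCsp3: fraction of carbon atoms that are sp3 hybridized."""
    if n_carbons == 0:
        raise ValueError("compound has no carbon atoms")
    return n_sp3_carbons / n_carbons

def flexibility(n_rotatable_bonds, n_bonds):
    """Flexibility: percentage of a compound's bonds that are rotatable."""
    if n_bonds == 0:
        raise ValueError("compound has no bonds")
    return 100.0 * n_rotatable_bonds / n_bonds

# Hypothetical compound: 10 carbons (4 sp3), 25 heavy-atom bonds (5 rotatable)
print(fraction_csp3(4, 10), flexibility(5, 25))
```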
15. Compound Size (MW)
HTS hit rate rises significantly with increasing compound MW
- Middle 80% of compounds: MW 270 to 470
- Overall hit rate rises 1.7-fold across the middle 80% of the screening deck (1.50% to 2.62%), i.e. a 70% rise in hit rate from MW = 270 to MW = 470
- 3.3-fold rise across the full MW range (1.2% to 4.0%)
[Figure: hit rate (%), % cpds in MW bin and cumulative % cpds vs. molecular weight (MW); only bins containing 1M or more records are shown]
16. Compound Lipophilicity (ClogP)
HTS hit rate rises sharply with increasing compound lipophilicity
- Middle 80% of compounds: ClogP 1 to 5
- Overall hit rate rises 2.9-fold across the middle 80% of the screening deck (1.14% to 3.31%), i.e. from ClogP = 1 to ClogP = 5
- 4.1-fold rise across the full ClogP range (1.1% to 4.5%)
[Figure: hit rate (%), % cpds in ClogP bin and cumulative % cpds vs. ClogP; only bins containing 1M or more records are shown]
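The “middle 80%” ranges and fold changes on the MW and ClogP slides can be reproduced as follows. This sketch assumes the middle 80% is bounded by the 10th and 90th percentiles of the property across the deck; `statistics.quantiles` uses its default “exclusive” interpolation, which may differ slightly from whatever percentile method was used originally.

```python
import statistics

def middle_80_range(values):
    """10th and 90th percentiles, bounding the middle 80% of compounds."""
    deciles = statistics.quantiles(values, n=10)  # 9 cut points
    return deciles[0], deciles[-1]

def fold_rise(rate_low, rate_high):
    """Fold change and % rise between two hit rates."""
    fold = rate_high / rate_low
    return fold, 100.0 * (fold - 1.0)

# Hit rates (%) quoted for ClogP on this slide
print(fold_rise(1.14, 3.31))  # middle 80% of the deck, ~2.9-fold
print(fold_rise(1.1, 4.5))    # full ClogP range, ~4.1-fold
```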
17. Promiscuity v. Molecular Properties
The prevalence of promiscuous compounds rises sharply with size and lipophilicity
• Hit Frequency Index (HFI) = % of single-shot (SS) HTS campaigns in which a compound gives activity > cut-off
• “Promiscuous” compound: HFI ≥ 10% (having been tested in at least 50 campaigns)
[Figure: % of promiscuous compounds and % rise in promiscuity vs. molecular weight and vs. cLogP]
Across the middle 80% of the screening deck…
• Large compounds are 4-fold more likely to have high HFI than small ones (MW: 270 to 470)
• Lipophilic compounds are 10-fold more likely to have high HFI than polar ones (cLogP: 1 to 5)
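The HFI definition and the promiscuity rule translate directly into code. A minimal sketch with hypothetical data: each compound is represented by its response in every campaign it was run in, paired with that campaign's hit cut-off. Passing a fixed 50% cut-off for every screen gives the inhibition frequency index (IFI) used on the following slides.

```python
def frequency_index(responses, cutoffs):
    """% of screens in which a compound's response exceeds that screen's cut-off.

    responses[i] and cutoffs[i] refer to the same screen i.
    """
    if len(responses) != len(cutoffs) or not responses:
        raise ValueError("need one cut-off per response, and at least one screen")
    n_hits = sum(1 for r, c in zip(responses, cutoffs) if r > c)
    return 100.0 * n_hits / len(responses)

def is_promiscuous(responses, cutoffs, min_screens=50, threshold=10.0):
    """True/False per the HFI >= 10% rule; None if seen in too few campaigns."""
    if len(responses) < min_screens:
        return None
    return frequency_index(responses, cutoffs) >= threshold

# Hypothetical compound tested in 60 campaigns, active above the cut-off in 6
responses = [60.0] * 6 + [0.0] * 54
cutoffs = [20.0] * 60
print(frequency_index(responses, cutoffs), is_promiscuous(responses, cutoffs))
```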
18. Property distributions vs. promiscuity - cLogP
[Figure: cLogP distributions binned by Inhibition Frequency Index* (%), from compounds hitting ~1 target to compounds hitting >10% of targets]
Note: to be included, compounds were required to have been run in at least 50 HTS and to have yielded > 50% effect in at least one screen
*Inhibition frequency index (IFI) = % of screens where a cpd yielded > 50% inhibition, where total screens run ≥ 50
19. The “Dark” Matter
– Compounds which have not yielded >50% effect even once in >50 screens
[Figure panels: molecular weight (Da); cLogP]
20. Translation of biases to full-curve follow-up
Property biases in primary HTS hit marking are propagated forward to dose-response follow-up
[Figure: % compounds tested vs. cLogP and vs. molecular weight, for single-shot (SS) testing, full-curve (FC) testing and the FC – SS differential]
- Elevated testing of large, lipophilic compounds in the full-curve phase of HTS
- Reduced testing of small, polar compounds in the full-curve phase of HTS
Note: plots represent data from 402M single-concentration responses & 2.1M full-curve results
21. Property Trends; translation to dose response
Property effects contribute to hits at all effect levels
- i.e. not just hits on the statistical margins
Property-dependence decreases through the HTS process
[Figure: % lift in hit rate vs. ClogP and vs. MW for standard 3SD SS hits, the top 0.1% of SS responses, and % of compounds with IC50 <= 10 uM]
From ClogP = 1 to 5*:
• 3SD: 2.9X rise in hit rate
• Top 0.1%: 2.2X rise
• FC active: 1.5X rise
From MW = 270 to 470*:
• 3SD: 1.8X rise in hit rate
• Top 0.1%: 1.3X rise
• FC active: 1.2X rise
*Across the middle 80% of the deck
22. Property response of individual screens is highly variable
e.g. Screens with largest response to cLogP
[Figure: hit rate as % of the hit rate at cLogP = 3.5 vs. cLogP, per screen]
23. Property response of individual screens is highly variable
e.g. Screens with smallest response to cLogP
[Figure: hit rate as % of the hit rate at cLogP = 3.5 vs. cLogP, per screen]
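The per-screen curves are normalized so that screens with very different overall hit rates can be compared: each cLogP bin's hit rate is expressed as a percentage of the same screen's hit rate at cLogP = 3.5. A minimal sketch, assuming hypothetical cLogP bin centres as dictionary keys.

```python
def normalize_to_reference(hit_rates, ref_bin=3.5):
    """Express each bin's hit rate as % of the hit rate in the reference bin.

    hit_rates: {cLogP bin centre: hit rate (%)} for a single screen.
    """
    ref = hit_rates[ref_bin]
    if ref == 0:
        raise ValueError("reference bin has a zero hit rate")
    return {b: 100.0 * hr / ref for b, hr in sorted(hit_rates.items())}

screen = {1.5: 0.5, 2.5: 0.8, 3.5: 1.0, 4.5: 1.6, 5.5: 2.0}  # hypothetical
print(normalize_to_reference(screen))
```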
24. Assay Technology
[Figure: hit rate as % of the hit rate at cLogP = 3.5 vs. cLogP, grouped by assay technology and colored by hit rate (%)]
25. Target Class
[Figure: hit rate as % of the hit rate at cLogP = 3.5 vs. cLogP, grouped by target class and colored by hit rate (%)]
26. Improving hit marking
- reducing bias towards high cLogP, MW hits
Virtual partitioning of the collection according to property
- e.g. sub-collections in different cLogP ranges
Change the hit-calling method so that it takes properties as well as % effect into account
- e.g. calculate hit cut-offs based on BEI/LEI etc.
- “scalar” methods based on correcting the observed biases
And: improve assays and the collection based on awareness of these biases
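One way to picture an efficiency-based cut-off is to convert each single-concentration response into an approximate potency and divide by size. This sketch is an illustration, not the method used at GSK: it assumes a simple 1:1 concentration-response with Hill slope 1 (so IC50 = C x (100 - I)/I) and uses the binding efficiency index, BEI = pIC50 / MW in kDa. The compounds are hypothetical.

```python
import math

def pic50_from_single_point(pct_inhibition, conc_molar=10e-6):
    """Estimate pIC50 from one % inhibition value at conc_molar.

    ASSUMES a 1:1 curve with Hill slope 1 spanning 0-100%,
    for which IC50 = C * (100 - I) / I.
    """
    if not 0.0 < pct_inhibition < 100.0:
        raise ValueError("% inhibition must be strictly between 0 and 100")
    ic50 = conc_molar * (100.0 - pct_inhibition) / pct_inhibition
    return -math.log10(ic50)

def bei(pic50, mw):
    """Binding efficiency index: pIC50 per kDa of molecular weight."""
    return pic50 / (mw / 1000.0)

# Two hypothetical compounds with the same 50% inhibition at 10 uM
p = pic50_from_single_point(50.0)
print(p, bei(p, 300.0), bei(p, 500.0))
```

A fixed BEI cut-off would rank the 300 Da compound (BEI ≈ 16.7) ahead of the 500 Da one (BEI = 10) even though both inhibit equally at 10 uM, which is the kind of size correction the slide describes.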
27. Improving hit marking – Property Biasing
Ordinary HTS hit marking applies a single mean + 3 x RSD cut-off to the response (% control). Property-biased hit marking promotes compounds with more attractive properties and demotes compounds with less attractive properties.
[Figure: % compounds vs. response (% control) with the mean + 3 x RSD cut-off; hit rate (%) vs. MW and vs. ClogP for ordinary and property-biased hit marking]
28. Improving hit marking – Property Binning
Sub-divide screening data into bins of compounds with similar properties
- apply 3 x RSD hit cut-offs to each bin
A consensus method combines these approaches and is routinely implemented
[Figure: response distributions for Bin 1 (low MW, cLogP), Bin 2 (medium MW, cLogP) and Bin 3 (high MW, cLogP); hit rate (%) vs. MW and vs. ClogP for property-binned stats and property consensus]
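Property binning can be sketched as a separate robust cut-off per bin. This assumes each record already carries a hypothetical property-bin label (for instance low/medium/high MW and cLogP) and reuses the median and 1.4826 x MAD robust statistics as an assumed estimator; the consensus method is not specified on the slide and is not attempted here.

```python
import statistics
from collections import defaultdict

def robust_cutoff(values, k=3.0):
    """Median + k * (1.4826 * MAD): an assumed robust mean + k * RSD."""
    med = statistics.median(values)
    rsd = 1.4826 * statistics.median(abs(v - med) for v in values)
    return med + k * rsd

def property_binned_hits(records, k=3.0):
    """records: list of (compound_id, property_bin, response % control).

    Applies a separate cut-off within each property bin.
    """
    by_bin = defaultdict(list)
    for _, pbin, resp in records:
        by_bin[pbin].append(resp)
    cutoffs = {pbin: robust_cutoff(resps, k) for pbin, resps in by_bin.items()}
    hits = [cid for cid, pbin, resp in records if resp > cutoffs[pbin]]
    return cutoffs, hits

records = [("L1", "low", 0), ("L2", "low", 1), ("L3", "low", -1),
           ("L4", "low", 2), ("L5", "low", -2), ("L6", "low", 12),
           ("H1", "high", 10), ("H2", "high", 12), ("H3", "high", 8),
           ("H4", "high", 14), ("H5", "high", 6), ("H6", "high", 20)]
print(property_binned_hits(records))
```

Here L6 is called a hit within the low bin, while H6, with a larger raw response, is not a hit within the high bin, because the high bin's response distribution is shifted upward.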
29. Evolving the screening collection to smaller, more polar lead-like space
GSK’s Compound Collection Enhancement (CCE) strategy has biased the HTS deck towards decreased size and lipophilicity with the aim of improving chemical starting points
[Figure: compounds tested in HTS by ClogP, 2004 vs. 2010 (and the 2010 <> 2004 difference); % compounds exceeding property limit (ClogP > 5, MW > 500) as % of total compounds in HTS, by year, including new 2011 compounds]
CCE acquisition property bounds:
- 2004-05: Lipinski criteria (MW<500, ClogP<5)
- Most recently: MW<360, ClogP<3
- Inclusion of DPU lead-op cpds: MW<500, ClogP<5
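The “% compounds exceeding property limit” metric and the acquisition bounds above can be sketched as simple filters. The bounds are the ones quoted on the slide; the deck is hypothetical, and treating the bounds as strict inequalities follows the slide's “<” notation.

```python
def pct_exceeding(compounds, mw_limit=500.0, clogp_limit=5.0):
    """compounds: list of (mw, clogp). Returns (% with MW > limit, % with ClogP > limit)."""
    n = len(compounds)
    over_mw = sum(1 for mw, _ in compounds if mw > mw_limit)
    over_clogp = sum(1 for _, clogp in compounds if clogp > clogp_limit)
    return 100.0 * over_mw / n, 100.0 * over_clogp / n

# CCE acquisition bounds quoted on the slide: (MW upper bound, ClogP upper bound)
CCE_BOUNDS = {
    "lipinski_2004_05": (500.0, 5.0),
    "most_recent": (360.0, 3.0),
}

def within_bounds(mw, clogp, bounds):
    mw_max, clogp_max = bounds
    return mw < mw_max and clogp < clogp_max

deck = [(350.0, 2.5), (420.0, 4.0), (520.0, 5.5), (480.0, 6.1)]  # hypothetical
print(pct_exceeding(deck))
print([within_bounds(mw, c, CCE_BOUNDS["most_recent"]) for mw, c in deck])
```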
30. Property trends in MLPCN Screening Data
Primary data from around 100 academic HTS campaigns obtained from PubChem BioAssay
- Lipophilicity: similar to GSK HTS (hit rate 1.28% to 3.80% across ClogP)
- Compound size: little effect, pretty flat (hit rate 2.14% to 2.27% across MW)
[Figure: hit rate (%) vs. ClogP and vs. MW]
ClogP vs. MW regression:
- GSK screening deck (>50 HTSs, 2.01M cpds): ClogP = 0.00835 x MW - 0.058, R² = 0.18
- PubChem compounds (405k): ClogP = 0.00554 x MW + 0.97, R² = 0.09
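Regression lines of this form come from an ordinary least-squares fit of ClogP on MW, with R² as the squared Pearson correlation. A minimal sketch on synthetic data, followed by the average ClogP shift implied by each quoted slope across the MW 270 to 470 range.

```python
from statistics import fmean

def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2

# Slopes quoted on the slide, in ClogP units per Da of MW
SLOPES = {"GSK deck": 0.00835, "PubChem": 0.00554}
for name, slope in SLOPES.items():
    print(name, round(slope * (470 - 270), 2))  # average ClogP shift over MW 270 to 470
```

The fitted slopes correspond to an average ClogP increase of about 1.67 (GSK deck) versus 1.11 (PubChem compounds) over that 200 Da span, with the low R² values showing that MW explains only a small part of the ClogP variation in either set.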
31. MLPCN Screening Data – Property Trends
Example individual screen responses to cLogP
[Figure: 3 x RSD hit rate (%) vs. cLogP, trellised by individual screen]
32. Small Beautiful Set Screening
SBS = subset of the HTS deck which spans the gap between HTS and fragments
HTS collection (2M) filtered on:
- size and lipophilicity
  • 10 ≤ heavy atom count (HAC) ≤ 28 and -2 ≤ ClogP ≤ 3
- “promiscuity”: frequent hitters are eliminated
  • IFI ≤ 3% (IFI = Inhibition Frequency Index, 3SD hit cut-off)
- hit explosion opportunity
  • Near Neighbor Count ≥ 20 (in GSK registry)
- “shapeliness”
  • fCsp3 ≥ 0.3 (i.e. ≥ 30% of carbon atoms must be sp3)
- acquisition sub-structural filters
- “greedy” diversity selection (no compounds >0.9 similar)
[Figure: SBS bounds in ClogP and MW]
SBS2 = ~75,000 compounds
Tested at higher concentration (e.g. 100-200 uM)
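The numeric SBS filters and a greedy diversity pass can be sketched as follows. Several details are assumptions not given on the slide: fingerprints are represented as sets of on-bits, similarity is Tanimoto, compounds are considered in input order, and the acquisition sub-structural filters are omitted because they are not specified. The `Cpd` record and the example compounds are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Cpd:
    cid: str
    hac: int              # heavy atom count
    clogp: float
    ifi: float            # inhibition frequency index, %
    near_neighbors: int   # near-neighbor count in the registry
    fcsp3: float
    fp: frozenset         # fingerprint as a set of on-bits (assumed representation)

def passes_sbs_filters(c):
    return (10 <= c.hac <= 28 and -2 <= c.clogp <= 3
            and c.ifi <= 3 and c.near_neighbors >= 20 and c.fcsp3 >= 0.3)

def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def greedy_diverse(cpds, max_sim=0.9):
    """Keep a compound only if it is <= max_sim similar to everything kept so far."""
    picked = []
    for c in cpds:
        if all(tanimoto(c.fp, p.fp) <= max_sim for p in picked):
            picked.append(c)
    return picked

def select_sbs(cpds):
    return greedy_diverse([c for c in cpds if passes_sbs_filters(c)])

cpds = [
    Cpd("A", 20, 2.0, 1.0, 25, 0.40, frozenset(range(20))),
    Cpd("B", 21, 2.1, 1.0, 30, 0.35, frozenset(range(19))),      # near-duplicate of A
    Cpd("C", 30, 1.0, 0.5, 40, 0.50, frozenset(range(50, 70))),  # too large (HAC)
    Cpd("D", 18, 0.5, 2.0, 22, 0.30, frozenset(range(100, 120))),
]
print([c.cid for c in select_sbs(cpds)])
```

Because the selection is greedy, the result depends on input order; ordering candidates by some priority before the diversity pass is a design choice the slide does not specify.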
33. Conclusions
Standard HTS processes favor the selection of larger, more lipophilic compounds
There are no clear trends between this behavior and assay technology or target class
Methods have been developed which (to some extent) compensate for property biases to ensure that attractive lead-like molecules are selected
- Overall hit rate in relation to downstream triage capacity is also critical
- Aim for a hit rate as close as possible to the “authentic pharmacology” rate
Changing the trajectory of discovery chemical space requires an interplay between the composition of chemical libraries, assay practice, hit analysis and downstream Hit to Lead and Lead to Candidate chemistry practice
34. Acknowledgements
Pat Brady Tony Jurewicz James Chan
Darren Green Glenn Hofmann Snehal Bhatt
Stephen Pickett Stan Martens Amy Quinn
Sunny Hung Jeff Gross Geoff Quinique
Subhas Chakravorty Zining Wu Bob Hertzberg
Nicola Richmond Mehu Patel
Jesus Herranz Emilio Diez
Gonzalo Colmeranjo-Sanchez Julio Martin-Plaza
…and numerous others who contributed to the 300+ HTS
campaigns run by GSK 2005-2010…..
Screening & Compound Profiling
36. Year of Screen
[Figure: hit rate as % of the hit rate at cLogP = 3.5 vs. cLogP, grouped by year of screen and colored by hit rate (%)]
37. Promiscuity v. Molecular Properties – Molecular weight
[Figure: molecular weight (Da) distributions binned by Inhibition Frequency Index* (%), from compounds hitting ~1 target to compounds hitting >10% of targets]
Note: to be included, compounds were required to have been run in at least 50 HTS and to have yielded > 50% effect in at least one screen
*Inhibition frequency index (IFI) = % of screens where a cpd yielded > 50% inhibition, where total screens run ≥ 50
38. GSK HTS campaigns 2005-2010
[Figure: number of screens vs. hit cut-off (% effect @ 10 uM), i.e. mean + 3 x RSD of sample data (% control)]
[Figure: number of screens vs. hit rate (% of compounds) > cut-off, i.e. % of compounds with effect > mean + 3 x RSD]
39. Validation and robustness methods cannot detect property biases
Compound sets used to test the robustness of assays and validate the screening process reflect current compound acquisition practice, not the collection as tested
[Figure panels: cLogP; MW]
40. Dose Response Data – Property Trends
Is the observed size & lipophilicity bias in HTS single-shot testing an artifact of false positives, e.g. experimental “noise”?
[Figure: % of tests yielding pXC50 ≥ 5 and % rise in active rate vs. molecular weight and vs. cLogP]
No: size and lipophilicity dependence is still observed in the rate of identifying compounds with 10 uM activity or better (pXC50 ≥ 5)
41. Molecular Property Correlations in GSKscreen
The table below shows the coefficients of determination (R²) between particular molecular properties and MW/ClogP, along with whether the correlation is positive or negative (i.e. the sign of the slope in a linear regression). The data are computed across the 2.09M compounds comprising GSKscreen.
Property       R² vs MW (sign)   R² vs ClogP (sign)
MW             1.00 (+)          0.21 (+)
ClogP          0.21 (+)          1.00 (+)
HAC            0.92 (+)          0.19 (+)
fCsp3          0.15 (+)          0.00
RotBonds       0.36 (+)          0.04 (+)
tPSA           0.16 (+)          0.08 (-)
Chiral         0.02 (+)          0.00
HetAtmRatio    0.02 (-)          0.34 (-)
Complexity     0.31 (+)          0.02 (+)
Flexibility    0.02 (+)          0.00
AromRings      0.22 (+)          0.16 (+)
HBA            0.11 (+)          0.10 (-)
HBD            0.01 (+)          0.02 (-)