The Interplay between Chemical Properties
and Screening Data

Andy Pope
Platform Technology & Science, GlaxoSmithKline,
Collegeville PA, USA

MipTec 2011, Basel
Sept. 20-22, 2011

Compound properties aren’t what they used to be…

[Figure panels: MW; ClogP; Properties vs Phase* (ClogP, MW)]

Median properties*     cLogP    MW
Failed candidate       3.9      432
Marketed drug          2.5      349

*Adapted from Blake JF, Medicinal Chemistry, 2005, 1, 649-655

And we have known this for a while. …

Drug discovery chemical property space - Some critical
factors…

Drug candidates
- Chemistry methods
- Chemistry “culture”

- Hit ID libraries

- Screening methods
- SAR data

Drug discovery chemical property space - Some critical
factors…

Drug candidates
- Chemistry methods
- Chemistry “culture”
- Efficiency concepts
- Property guides/rules

- Hit ID libraries
- Rigorous property rules
- Fragments
- Lead-like, “Beautiful”

- Screening methods
- SAR data

Does assay data influence discovery chemical property
space occupancy?

(…or vice versa)

Large-scale analysis of High Throughput Screening data
HTS at GSK
• 330 screens of >500,000 cpds, 2005-2010
• Single-concentration primary data (10 uM) re-analysed
• Compound results binned according to simple compound properties
• Meta-data (e.g. target class, screening technology) curated

Academic screening centers (MLPCN)
• ~100 screens with >250,000 cpds tested & deposited to PubChem BioAssay
from major NIH-funded screening centers (NCGC, Scripps, Broad)
• Single-concentration data re-analysed using same methods as GSK data

The GSK HTS Process

Primary Screen (10 uM – singlicate)
- Entire collection (100%), ~2 million
- Statistical separation from null effect population
- Chemical clustering if hit rate >1%

Confirmation (10 uM – duplicate)
- Potential actives (<1%), ~20,000
- Eliminate false positives from primary
- Chemical clustering for diversity & property sampling

Dose response (11 pt 3-fold dilution)
- “Real” hits (<0.1%), ~2,000
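
The 11-point, 3-fold dilution scheme used at the dose-response stage can be sketched as below. This is an illustration only: the slide gives the number of points and the dilution factor but not the top concentration, so the 100 uM value and the function name `dilution_series` are assumptions.

```python
def dilution_series(top_conc_um: float, n_points: int = 11, factor: float = 3.0) -> list[float]:
    """Return concentrations (uM) for a serial dilution, highest first."""
    return [top_conc_um / factor ** i for i in range(n_points)]


# Hypothetical top concentration of 100 uM (not stated on the slide)
series = dilution_series(100.0)

# Fold range covered between the highest and lowest concentration: factor ** (n_points - 1)
span = series[0] / series[-1]

print(len(series), round(span))
```

Eleven points at 3-fold steps cover a 3^10 (about 59,000-fold) concentration range, whatever top concentration is chosen.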

HTS Hit Marking Processes

Normal distribution (\mu = mean, \sigma = std. deviation):
$$P(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

[Figure: % compounds vs. response (% control), with “miss” and “hit” regions separated by the 3 x RSD cut-off]
---- Binned raw HTS data
---- Fit with raw mean and std. deviation (raw mean = 1.0, raw SD = 11.1)
---- Fit with robust mean & std. deviation (robust mean = -0.3, robust SD = 5.5)

Blue & black curves are normal distribution fits using mean & SD
Annotated regions: weak hits, artefacts, and statistical “noise”; potent hits (& artefacts)
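
A minimal sketch of mean + 3 x RSD hit marking, compared with a cut-off from the raw mean and SD. The slides do not state which robust estimators were used, so the median and the scaled median absolute deviation (1.4826 x MAD) are assumptions here, as are the function names and the toy response values.

```python
import statistics


def robust_stats(responses):
    """Return (robust mean, robust SD), assumed here to be (median, 1.4826 * MAD)."""
    med = statistics.median(responses)
    mad = statistics.median([abs(r - med) for r in responses])
    return med, 1.4826 * mad


def mark_hits(responses, n_sd=3.0):
    """Return (cut-off, list of hit flags) using robust mean + n_sd * robust SD."""
    med, rsd = robust_stats(responses)
    cutoff = med + n_sd * rsd
    return cutoff, [r > cutoff for r in responses]


# Toy single-concentration responses (% control): mostly null effect plus one potent compound
responses = [-6.0, -3.0, -1.0, 0.0, 1.0, 2.0, 4.0, 5.0, 80.0]

cutoff, hits = mark_hits(responses)

# Same cut-off rule using the raw (non-robust) mean and SD for comparison
raw_cutoff = statistics.mean(responses) + 3.0 * statistics.stdev(responses)

print(f"robust cut-off = {cutoff:.1f}, hits = {sum(hits)}")
print(f"raw cut-off    = {raw_cutoff:.1f}")
```

In this toy set the single potent compound inflates the raw SD enough that the raw cut-off lies above its own response, while the robust cut-off stays close to the null-effect population.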


Typical HTS observed data distribution vs. fit
[Figure: frequency (% cpds/bin) vs. effect (% control)]
---- Binned raw HTS data
---- Robust distribution fit
---- Hit cut-off (mean + 3 x RSD)
---- Residual (raw – fit)
Note: representative selection of individual screens from ~330 analyzed

Observed data distribution vs. fit – zoom
[Figure: frequency (% cpds in bin) vs. effect (% control)]
---- Binned raw HTS data
---- Robust distribution fit
---- Hit cut-off (mean + 3 x RSD)
---- Residual (raw – fit)
Note: representative selection of individual screens from ~330 analyzed

GSK HTS campaigns 2005-2010
[Figure: screen cut-off (mean + 3 x RSD) vs. average robust Z’ of assay during HTS production]

Looking for property trends in the GSK HTS dataset
e.g. Compound total polar surface area

The total polar surface area (tPSA) is defined as the surface sum over all polar atoms
- < 60 Å2 predicts brain penetration
- > 140 Å2 predicts poor cell penetration

Aggregate results from all 330 campaigns 2005-2010 with >500K tests

Compounds with tPSA 80-85 Å2
- 26M measured responses in this bin
- 485k marked as “hit”
- Hit rate = 100*(485k/26M) = 1.86%

[Figure: hit rate (%) for compounds in each specific tPSA bin vs. polar surface area (tPSA, Å2)]
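
The per-bin hit rate calculation described on this slide can be sketched as follows. The 5 Å2 bin width matches the 80-85 Å2 example; the function name `hit_rate_by_bin` and the toy records are assumptions for illustration, not the actual GSK pipeline.

```python
from collections import defaultdict


def hit_rate_by_bin(records, bin_width=5.0):
    """records: iterable of (property_value, is_hit).

    Returns {bin lower edge: hit rate in %}, pooling all responses in each bin.
    """
    tested = defaultdict(int)
    hits = defaultdict(int)
    for value, is_hit in records:
        lower = (value // bin_width) * bin_width
        tested[lower] += 1
        hits[lower] += int(is_hit)
    return {b: 100.0 * hits[b] / tested[b] for b in sorted(tested)}


# Toy records: (tPSA in Å2, hit flag); values are made up for illustration
records = [
    (81.2, True), (83.9, False), (84.5, False), (80.0, False),
    (91.0, True), (93.3, False),
]
rates = hit_rate_by_bin(records)
print(rates)

# The same arithmetic with the rounded counts quoted for the 80-85 Å2 bin
slide_rate = 100.0 * 485_000 / 26_000_000
print(f"{slide_rate:.2f}%")
```

With the rounded counts the last line evaluates to roughly 1.87%, in line with the 1.86% quoted on the slide, which was presumably computed from the unrounded counts.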

Compound shapeliness and flexibility
• fCsp3 captures “shapeliness” of a compound
- Weak positive correlation with MW
- More irregular 3D shape → lower hit probability

• Flexibility = Percentage of a compound’s bonds that are rotatable
- Slight decrease in HR with Flexibility
- No correlation with MW or ClogP

[Figures: hit rate (%) vs. fraction of carbons that are sp3 (fCsp3); hit rate (%) vs. flexibility]

Compound Size (MW)
• HTS hit rate rises significantly with increasing compound MW
• Overall hit rate rises 1.7-fold across the middle 80% of the screening deck,
i.e. 70% rise in hit rate from MW = 270 to MW = 470
• 3.3-fold rise across full MW range
- Only bins containing 1M or more records are shown

[Figure: hit rate (%), % cpds in MW bin and cumulative % cpds vs. molecular weight (MW); middle 80% of cpds spans MW 270 to 470; annotated hit rates: 1.2%, 1.50%, 2.62%, 4.0%]

Compound Lipophilicity (ClogP)
• HTS hit rate rises sharply with increasing compound lipophilicity
• Overall hit rate rises 2.9-fold across the middle 80% of the screening deck,
i.e. from ClogP = 1 → 5
• 4.1-fold rise across full ClogP range
- Only bins containing 1M or more records are shown

[Figure: hit rate (%), % cpds in ClogP bin and cumulative % cpds vs. ClogP; middle 80% of cpds spans ClogP 1 to 5; annotated hit rates: 1.1%, 1.14%, 3.31%, 4.5%]
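
The fold changes quoted on the MW and ClogP slides can be checked with simple ratios. This sketch assumes that the percentages annotated on the two plots are the hit rates at the ends of the middle 80% and of the full property range; the slides do not label them explicitly, but their ratios are consistent with the quoted figures.

```python
def fold_rise(low_rate: float, high_rate: float) -> float:
    """Ratio of the hit rate at the high end of a range to that at the low end."""
    return high_rate / low_rate


# Hit rates (%) annotated on the plots; the pairing of values is an assumption
mw_middle = fold_rise(1.50, 2.62)     # MW 270 to 470
mw_full = fold_rise(1.2, 4.0)         # full MW range
clogp_middle = fold_rise(1.14, 3.31)  # ClogP 1 to 5
clogp_full = fold_rise(1.1, 4.5)      # full ClogP range

for name, value in [
    ("MW, middle 80%", mw_middle),
    ("MW, full range", mw_full),
    ("ClogP, middle 80%", clogp_middle),
    ("ClogP, full range", clogp_full),
]:
    print(f"{name}: {value:.2f}-fold")
```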

Promiscuity v. Molecular Properties
• The prevalence of promiscuous compounds rises sharply with size and lipophilicity
• Hit Frequency Index (HFI) = % of SS HTS campaigns in which a compound gives activity > cut-off
• “Promiscuous” compound → HFI ≥ 10% (having seen at least 50 campaigns)

[Figures: % of promiscuous compounds and % rise in promiscuity vs. molecular weight and vs. cLogP]

Across the middle 80% of the screening deck…
• Large compounds are 4-fold more likely to have high HFI than small ones (MW: 270 → 470)
• Lipophilic compounds are 10-fold more likely to have high HFI than polar ones (cLogP: 1 → 5)
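
The HFI definition and the promiscuity flag can be written as a short sketch. The function names and the choice to return `None` for compounds seen in fewer than 50 campaigns are assumptions; the slide only states the 10% threshold and the 50-campaign minimum.

```python
from typing import Optional


def hit_frequency_index(n_active: int, n_campaigns: int) -> float:
    """HFI = % of single-shot campaigns in which the compound exceeded the hit cut-off."""
    return 100.0 * n_active / n_campaigns


def is_promiscuous(n_active: int, n_campaigns: int,
                   min_campaigns: int = 50, hfi_threshold: float = 10.0) -> Optional[bool]:
    """Return None when too few campaigns have been run to judge (an assumed convention)."""
    if n_campaigns < min_campaigns:
        return None
    return hit_frequency_index(n_active, n_campaigns) >= hfi_threshold


# Toy examples: (campaigns with activity > cut-off, campaigns tested)
print(is_promiscuous(12, 80))   # HFI = 15%
print(is_promiscuous(2, 120))   # HFI of about 1.7%
print(is_promiscuous(5, 20))    # fewer than 50 campaigns
```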

Property distributions vs. promiscuity - cLogP
[Figure: cLogP distributions (frequency per bin) by inhibition frequency index* (%), from compounds hitting ~1 target to compounds hitting >10% of targets]

Note: compounds required to have been run in 50 HTS and yielded >50% effect in a single screen to be included

*Inhibition frequency index (IFI) = % of screens where cpd yielded >50% inhibition, where total screens run ≥ 50

The “Dark” Matter
– Compounds which have not yielded >50% effect once in >50 screens

[Figures: distributions of molecular weight (Da) and cLogP]

Translation of biases to full-curve follow-up

• Property biases in primary HTS hit marking are propagated forward to dose-response follow-up

[Figures: % compounds tested vs. cLogP and vs. molecular weight, for SS testing, FC testing and the FC – SS differential]

- Elevated testing of large, lipophilic compounds in the full-curve phase of HTS
- Reduced testing of small, polar compounds in the full-curve phase of HTS

Note: plots represent data from 402M single-concentration responses & 2.1M full-curve results

Property Trends; translation to dose response
• Property effects contribute to hits at all effect levels
- i.e. not just hits on the statistical margins

• Property-dependence decreases through the HTS process

[Figures: % lift in hit rate vs. ClogP and vs. MW, for standard 3SD SS hits, top 0.1% of SS responses, and % of cpds with IC50 <= 10 uM]

Rise in hit rate*     ClogP = 1 → 5     MW = 270 → 470
3SD                   2.9X              1.8X
Top 0.1%              2.2X              1.3X
FC Active             1.5X              1.2X

*Across the middle 80% of the deck

Property response of individual screens is highly variable
e.g. Screens with largest response to cLogP
[Figure: hit rate as % of HR at cLogP = 3.5 vs. cLogP]

Property response of individual screens is highly variable
e.g. Screens with smallest response to cLogP
[Figure: hit rate vs. cLogP]

Assay Technology
[Figure: assay technology vs. cLogP, colored by hit rate (%)]

Target Class
[Figure: target class vs. cLogP, colored by hit rate (%)]

Improving hit marking
- reducing bias towards high cLogP, MW hits

• Virtual partitioning of collection according to property
- e.g. sub-collections in different cLogP ranges

• Change the hit calling method, so that it takes properties as well as % effect into account
- e.g. calculate hit cut-offs based on BEI/LEI etc.
- “scalar” methods based on correcting the observed biases

And… improving assays and the collection based on awareness of
these biases

Improving hit marking – Property Biasing

[Figure: % compounds vs. response, with the mean + 3 x RSD cut-off; compounds with more attractive properties are promoted, those with less attractive properties are demoted]
[Figures: hit rate (%) vs. MW and vs. ClogP, for ordinary HTS hit marking and property-biased hit marking]

Improving hit marking – Property Binning
• Sub-divide screening data into bins of compounds with similar properties
- apply 3 x RSD hit cut-offs to each bin

• Consensus method combines approaches – routinely implemented

[Figures: hit rate (%) vs. MW and vs. ClogP, for response, property-binned stats and property consensus hit marking; Bin 1: low MW, cLogP; Bin 2: medium MW, cLogP; Bin 3: high MW, cLogP]
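
Property-binned hit marking, as outlined on this slide, might look like the sketch below: each property bin gets its own 3 x RSD cut-off instead of one cut-off for the whole screen. The robust estimators (median and 1.4826 x MAD), the single ClogP bin edge and the toy data are all assumptions, and the consensus method is not shown because the slides do not describe how the approaches are combined.

```python
import statistics
from collections import defaultdict


def robust_cutoff(responses, n_sd=3.0):
    """Robust mean + n_sd * robust SD, assumed here to be median + n_sd * 1.4826 * MAD."""
    med = statistics.median(responses)
    mad = statistics.median([abs(r - med) for r in responses])
    return med + n_sd * 1.4826 * mad


def binned_hits(compounds, bin_edges):
    """compounds: list of (compound_id, property_value, response).

    bin_edges: ascending upper edges; values above the last edge fall in a final bin.
    An empty bin_edges list reproduces ordinary whole-screen hit marking.
    """
    def bin_index(value):
        for i, edge in enumerate(bin_edges):
            if value <= edge:
                return i
        return len(bin_edges)

    groups = defaultdict(list)
    for cid, prop, resp in compounds:
        groups[bin_index(prop)].append((cid, resp))

    hits = set()
    for members in groups.values():
        cutoff = robust_cutoff([resp for _, resp in members])
        hits.update(cid for cid, resp in members if resp > cutoff)
    return hits


# Toy screen: polar compounds (ClogP 1.0) give tight responses, lipophilic ones (ClogP 4.5) are noisier
low = [0, 1, -1, 2, -2, 0, 1, 12]
high = [5, 15, -5, 10, -10, 0, 20, 60]
compounds = [(f"L{i + 1}", 1.0, float(r)) for i, r in enumerate(low)]
compounds += [(f"H{i + 1}", 4.5, float(r)) for i, r in enumerate(high)]

global_hits = binned_hits(compounds, bin_edges=[])    # one cut-off for everything
binned = binned_hits(compounds, bin_edges=[3.0])      # separate cut-offs at ClogP <= 3 and > 3

print(sorted(global_hits), sorted(binned))
```

In this toy example the polar compound with a 12% response is only marked as a hit when its bin is assessed separately, while a lipophilic compound at 20% is dropped once the noisier high-ClogP bin gets its own cut-off.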

Evolving the screening collection to smaller, more polar
lead-like space
• GSK’s Compound Collection Enhancement (CCE) strategy has biased the HTS deck
towards decreased size and lipophilicity with the aim of improving chemical starting
points

[Figure: compounds tested in HTS (% of total compounds in HTS) vs. ClogP, for 2004, 2010 and the 2010 <> 2004 difference]
[Figure: % compounds exceeding property limit (ClogP > 5, MW > 500) vs. year, with a “New 2011” series]

CCE Acquisition, Property Bounds
2004-05: Lipinski criteria (MW<500, ClogP<5)
Most recently: MW<360, ClogP<3
Inclusion of DPU lead-op cpds: MW<500, ClogP<5

Property trends in MLPCN Screening Data
• Primary data from around 100 academic HTS campaigns obtained from
PubChem BioAssay

Lipophilicity – similar to GSK HTS
Compound size – little effect (pretty flat)

[Figures: hit rate (%) vs. ClogP and vs. MW; annotated hit rates: 3.80%, 2.27%, 2.14%, 1.28%]

ClogP vs. MW linear fits:
• GSK screening deck (>50 HTSs, 2.01M cpds)
ClogP = 0.00835*MW – 0.058, R2 = 0.18
• PubChem Compounds (405k)
ClogP = 0.00554*MW + 0.97, R2 = 0.09
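
As a worked example, the two linear fits can be evaluated at the MW values that bound the middle 80% of the GSK deck (270 and 470). The function names are hypothetical; the coefficients are those quoted on the slide.

```python
def clogp_fit_gsk(mw: float) -> float:
    """Linear fit quoted for the GSK screening deck (R2 = 0.18)."""
    return 0.00835 * mw - 0.058


def clogp_fit_pubchem(mw: float) -> float:
    """Linear fit quoted for the PubChem compounds (R2 = 0.09)."""
    return 0.00554 * mw + 0.97


for mw in (270, 470):
    print(f"MW {mw}: GSK fit {clogp_fit_gsk(mw):.2f}, PubChem fit {clogp_fit_pubchem(mw):.2f}")
```

With R2 values of 0.18 and 0.09, both fits explain only a small part of the variance in ClogP, so these values describe the average trend rather than any individual compound.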

MLPCN Screening Data – Property Trends

• Example individual screen responses to cLogP

[Figure: 3 x RSD hit rate (%) vs. cLogP, trellis by individual screens]

Small Beautiful Set Screening

SBS = Subset of the HTS deck which spans the
gap between HTS and fragments

[Figure: SBS within the HTS collection (2M); axis label: ClogP]

• Filtered on:
- size and lipophilicity
  • 10 ≤ HAC ≤ 28 and -2 ≤ ClogP ≤ 3, bounded (MW)
- “promiscuity” – frequent-hitters are eliminated
  • IFI ≤ 3% (IFI = Inhibition Frequency Index, 3SD hit cutoff)
- hit explosion opportunity
  • Near Neighbor Count ≥ 20 (in GSK registry)
- “shapeliness”
  • fCsp3 ≥ 0.3 (i.e. ≥ 30% of carbon atoms must be sp3)
- acquisition sub-structural filters
- “greedy” diversity selection (no compounds >0.9 similar)

SBS2 = ~75,000 compounds

Tested at higher concentration (e.g. 100-200 uM)
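
The numeric SBS filters listed above can be expressed as a simple predicate. This is a sketch under stated assumptions: the `Compound` fields and the function name are hypothetical, and the sub-structural filters, the MW bound and the “greedy” diversity selection are not implemented because the slide gives no details for them.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    hac: int             # heavy atom count
    clogp: float
    ifi: float           # inhibition frequency index, in %
    near_neighbors: int  # near neighbor count in the registry
    fcsp3: float         # fraction of carbons that are sp3


def passes_sbs_filters(c: Compound) -> bool:
    """Apply only the numeric property thresholds quoted on the slide."""
    return (
        10 <= c.hac <= 28
        and -2 <= c.clogp <= 3
        and c.ifi <= 3.0
        and c.near_neighbors >= 20
        and c.fcsp3 >= 0.3
    )


# Made-up compounds for illustration
candidates = [
    Compound("small_polar", hac=22, clogp=1.8, ifi=1.0, near_neighbors=45, fcsp3=0.40),
    Compound("too_lipophilic", hac=24, clogp=4.2, ifi=0.5, near_neighbors=60, fcsp3=0.35),
    Compound("frequent_hitter", hac=20, clogp=2.0, ifi=12.0, near_neighbors=30, fcsp3=0.50),
    Compound("flat_aromatic", hac=18, clogp=2.5, ifi=2.0, near_neighbors=25, fcsp3=0.10),
]
selected = [c.name for c in candidates if passes_sbs_filters(c)]
print(selected)
```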

Conclusions
• Standard HTS processes favor the selection of larger, more lipophilic
compounds

• There are no clear trends between this behavior and assay technology or
target class

• Methods have been developed which (to some extent) compensate for
property biases to ensure that attractive lead-like molecules are selected
- Overall hit rate in relation to downstream triage capacity is also critical
- Aspire to a hit rate as close to the “authentic pharmacology” rate as possible

• Changing the trajectory of discovery chemical space requires an interplay
between the composition of chemical libraries, assay practice, hit analysis
and downstream Hit to Lead and Lead to Candidate chemistry practice

Acknowledgements

Pat Brady Tony Jurewicz James Chan
Darren Green Glenn Hofmann Snehal Bhatt
Stephen Pickett Stan Martens Amy Quinn
Sunny Hung Jeff Gross Geoff Quinique
Subhas Chakravorty Zining Wu Bob Hertzberg
Nicola Richmond Mehu Patel
Jesus Herranz Emilio Diez
Gonzalo Colmeranjo-Sanchez Julio Martin-Plaza

…and numerous others who contributed to the 300+ HTS
campaigns run by GSK 2005-2010…..

Screening & Compound Profiling

Year of Screen
[Figure: year of screen vs. cLogP, colored by hit rate (%)]

Promiscuity v. Molecular Properties – Molecular weight
[Figure: molecular weight (Da) distributions (frequency per bin) by inhibition frequency index* (%), from compounds hitting ~1 target to compounds hitting >10% of targets]

Note: compounds required to have been run in 50 HTS and yielded >50% effect in a single screen to be included

*Inhibition frequency index (IFI) = % of screens where cpd yielded >50% inhibition, where total screens run ≥ 50

GSK HTS campaigns 2005-2010
[Figure: number of screens vs. hit cut-off (% effect @ 10 uM), i.e. mean + 3 x RSD of sample data (% control)]
[Figure: number of screens vs. hit rate (% of compounds) > cut-off, i.e. % of compounds with effect > mean + 3 x RSD]

Validation and robustness methods cannot detect
property biases
Compound sets used to test robustness of assays and
validate the screening process reflect current compound
acquisition practice, not the collection as tested

[Figures: cLogP and MW distributions]

Dose Response Data – Property Trends
• Is the observed size & lipophilicity bias in HTS single-shot testing an artifact
of false positives, e.g. experimental “noise”?

[Figures: % of tests yielding pXC50 ≥ 5 and % rise in active rate vs. molecular weight and vs. cLogP]

• No, size and lipophilicity dependence is still observed in the rate of
identifying compounds with activity of 10 uM or better
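
The pXC50 ≥ 5 criterion on the plot axes and the “10 uM or better” wording are the same threshold, since pXC50 is the negative log10 of the XC50 in mol/L. A small sketch, with hypothetical function names:

```python
import math


def pxc50_from_um(xc50_um: float) -> float:
    """pXC50 = -log10(XC50 in mol/L); with XC50 in uM this is 6 - log10(XC50)."""
    return 6.0 - math.log10(xc50_um)


def is_active(xc50_um: float, threshold: float = 5.0) -> bool:
    """Active if pXC50 is at or above the threshold (5 corresponds to 10 uM)."""
    return pxc50_from_um(xc50_um) >= threshold


for xc50 in (1.0, 10.0, 30.0):
    print(f"XC50 = {xc50} uM: pXC50 = {pxc50_from_um(xc50):.2f}, active = {is_active(xc50)}")
```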

Molecular Property Correlations in GSKscreen

• Table below shows the correlation coefficients (R2) between particular molecular
properties and MW/ClogP, along with whether the correlation is positive or
negative (i.e. the sign of the slope in a linear regression)
• These data are computed using the 2.09M compounds comprising GSKscreen

Property       R2, ± vs MW    R2, ± vs ClogP
MW             1.0, +         0.21, +
ClogP          0.21, +        1.0, +
HAC            0.92, +        0.19, +
fCsp3          0.15, +        0.00
RotBonds       0.36, +        0.04, +
tPSA           0.16, +        0.08, -
Chiral         0.02, +        0.00
HetAtmRatio    0.02, -        0.34, -
Complexity     0.31, +        0.02, +
Flexibility    0.02, +        0.00
AromRings      0.22, +        0.16, +
HBA            0.11, +        0.10, -
HBD            0.01, +        0.02, -
