Smashing Molecules

How Molecular Fragments Allow Us to Explore Large Chemical Spaces

Rajarshi Guha & Trung Nguyen
NIH Center for Translational Therapeutics

ChemAxon UGM, September 2011

Outline

•  Fragments as the building blocks of chemistry
•  Fragments and SAR
•  Fragments and activity profiles

Big Data for Some Problems

•  Halevy et al. discuss the effectiveness of extremely large datasets
•  Their application focuses on machine translation; see the Google n-gram corpus
•  They suggest that such extremely large datasets are useful because they effectively encompass all n-grams (phrases) commonly used
•  Domain is relatively constrained

Halevy et al., IEEE Intelligent Systems, 2009, 24, 8-12

Google Scale in Chemistry?

•  What would be the equivalent of an n-gram corpus in chemistry?
–  Fragments
–  A more direct analogy can be made by using LINGOs
•  It is possible to generate arbitrarily large (virtual) compound and fragment collections
•  But would such a collection span all of "commonly used" chemistry?
–  Depending on the initial compound set, yes
–  But we're also interested in going beyond such a "commonly used" set

Fink T, Reymond JL, J Chem Inf Model, 2007, 47, 342
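To make the n-gram analogy concrete, here is a minimal sketch of LINGO-style substrings: overlapping q-character windows over a SMILES string, compared as multisets. The function names and the default q = 4 are illustrative assumptions, and the sketch skips any SMILES normalisation that a real LINGO implementation would apply first.

```python
from collections import Counter


def lingos(smiles, q=4):
    """Return the multiset of overlapping q-character substrings of a SMILES string.

    Assumption: the SMILES is used as given (no canonicalisation or
    ring-closure normalisation is performed here).
    """
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_tanimoto(smiles_a, smiles_b, q=4):
    """Multiset Tanimoto similarity between the LINGO profiles of two SMILES."""
    la, lb = lingos(smiles_a, q), lingos(smiles_b, q)
    shared = sum((la & lb).values())  # Counter & keeps the minimum count
    total = sum((la | lb).values())   # Counter | keeps the maximum count
    return shared / total if total else 0.0
```

For example, `lingos("c1ccccc1")` yields five 4-character substrings, with `"cccc"` counted twice.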

Fragment Diversity

•  Consider a set of bioactives such as the LOPAC collection, 1,280 compounds
•  Using exhaustive fragmentation we get 2,460 unique fragments
•  On the MLSMR (~372K compounds), we get 164,583 fragments

[Figure: histogram of log Fragment Frequency (x-axis) vs. Percent of Total (y-axis)]

Fragment Diversity

•  Distribution of MLSMR fragments in BCUT space

[Figure: two scatter plots of PC 1 vs. PC 2; panels: "All fragments" and "Fragments occurring in 5 to 50 molecules"]

What Do We Do with Fragments?

•  Assuming we obtain fragments from a large enough collection, what do we do?
–  Learning from fragments: QSARs, generative models
–  Use fragments as filters, alternative to clustering
–  Explore chemotypes and activity
–  Scaffold level promiscuity

White, D and Wilson, RC, J Chem Inf Model, 2010, 50, 1257-1274

Scaffold Activity Diagrams

•  Network oriented view of fragment (scaffold) collections
–  Similar in idea to Scaffold Hunter etc.
–  Not purely hierarchical
•  Color by arbitrary properties
•  Quickly assess utility of a scaffold
•  Try it online

What Makes a Good Scaffold?

•  What makes a good scaffold?
–  Size, complexity, …
–  Do the members represent an SAR or not?
–  Intuition and experience also play a role

Scaffold QSAR

•  Evaluate topological and physicochemical descriptors for the R-groups
•  Fit PLS or ridge regression model
•  Characterize the SAR landscape

[Figure: scatter plot of Observed (x-axis) vs. Predicted (y-axis) activity, both ranging from -8 to 0]

Scaffold QSAR - Drawbacks

•  Many scaffolds have few (5 to 10) members
•  Invariably, more features than observations
•  If the number of R-groups is large, the feature matrix can be very sparse
–  Less of a problem for combinatorial libraries
•  A linear fit may not be the best approach to correlating R-groups to the activities
–  Difficult to choose a model type a priori

Fragment Activity Profiles

•  Using scaffolds in HTS triage usually leads to two questions
–  What is known about the chemical series with respect to the intended target?
–  What compound classes are known to modulate the intended target & how similar are they to the series in question?
•  We're interested in exploring summaries of activity, grouped by scaffolds and targets

Fragment Activity Profiles

•  We use ChEMBL (08) as the source of bioactivity across multiple targets
•  Preprocess the database
–  Generate scaffolds (exhaustive enumeration of combinations of SSSRs)
–  Normalize activity data so that we can compare the activity of a molecule across different assays

Database Setup

•  Preprocessing steps available as a Java servlet
–  http://tripod.nih.gov/files/chembl-servlets.zip
•  Need ChEMBL installed in Oracle; we add some extra tables
–  Fragment structures and computed properties
–  Aggregated assay activity summary
•  Only consider assays with IC50s in nM and uncensored data, more than 5 observations and a MAD > 0
–  (Robust) z-scored activities
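The assay filter and robust z-scoring described above can be sketched as follows. The slides do not give the exact formula, so this assumes z = (x - median) / MAD with no scaling constant and no log transform. The function name and the `min_obs` parameter are illustrative, not part of the servlet code.

```python
from statistics import median


def robust_z_scores(values, min_obs=6):
    """Robust z-scores for the activities of one assay.

    Returns None when the assay would be skipped: fewer than `min_obs`
    observations (i.e. not "more than 5") or a MAD of zero.
    Assumption: z = (x - median) / MAD, with no consistency constant.
    """
    if len(values) < min_obs:
        return None
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return None
    return [(v - med) / mad for v in values]
```

With IC50 values, a strongly negative z-score marks a molecule that is much more potent than is typical for that assay.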

Some Fragment Statistics

•  Considered Z-score range of -40 to 15
•  There were 12,887 molecules lying outside this range

[Figure: two histograms, one over log(Number of molecules) and one over Z-score; y-axis labels: Percentage of assays, Number of compounds]

Some Fragment Statistics

•  Next, identify fragments with 8 to 20 atoms and occurring in 100 to 900 molecules
•  Gives us 1,746 fragments

[Figure: histogram of Num Molecules (x-axis) vs. Percentage of Fragments (y-axis)]

Some Fragment Statistics

•  We can query the fragment tables to get activity summaries for individual fragments
•  For these examples we consider the full range of Z-scores

[Figure: 3 x 3 grid of Z-Score histograms (y-axis: Percent of Total); panels labelled 40169 (N = 1457), 64473 (N = 1595), 115654 (N = 1515), 5390 (N = 1489), 5486 (N = 1578), 13485 (N = 1455), 778 (N = 1280), 2723 (N = 1918), 4058 (N = 2641)]

Exploring Activity Profiles

[Screenshot with annotations:]
•  Fragments from ChEMBL
•  Activity distributions of parent molecules across all targets
•  Z-scores for individual molecules against a specific target

Exploring Activity Profiles

•  User can draw a molecule and fragment on the fly
•  Use generated fragments to create activity histograms

Target Selection

•  Employs the ChEMBL target hierarchy
•  Can select target families or individual targets

Similar Fragments with Similar Profiles?

•  Consider 658 fragments with > 10 atoms and occurring in 500 to 1200 molecules
•  Overall, the fragments tend to be dissimilar
–  95th percentile is just 0.50
•  1,873 pairs do exhibit Tc > 0.8

[Figure: histogram of Tanimoto Similarity (x-axis) vs. Percentage of pairs (y-axis)]
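The pairwise comparison above can be sketched with fingerprints stored as sets of "on" bit positions. The slides do not say which fingerprint was used, so the input format and the function names are assumptions for illustration only.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient (Tc) between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_pairs(fingerprints, threshold=0.8):
    """Return (id_a, id_b) for every fragment pair with Tc above `threshold`.

    `fingerprints` is a hypothetical dict mapping a fragment id to its set
    of on-bits.
    """
    pairs = []
    for (id_a, fp_a), (id_b, fp_b) in combinations(fingerprints.items(), 2):
        if tanimoto(fp_a, fp_b) > threshold:
            pairs.append((id_a, id_b))
    return pairs
```

For 658 fragments this loop visits 658 x 657 / 2 = 216,153 pairs, which is small enough to enumerate directly.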

Comparing Activity Profiles

•  Compare activity profiles with the K-S statistic
•  Color corresponds to p-value of the K-S test
•  No obvious correlation between fragment similarity & activity profile similarity
•  Probably not rigorous when a scaffold has few parent molecules

[Figure: scatter plot of Tanimoto Similarity (x-axis, 0.80 to 1.00) vs. K-S statistic (y-axis), colored by p-value]
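As a reminder of what the K-S statistic measures, here is a minimal two-sample sketch: the largest vertical gap between the empirical cumulative distributions of two sets of Z-scores. It computes only the statistic, not the p-value, and the function name is illustrative rather than taken from the authors' code.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The gap between two step functions can only change at observed values.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        f_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        d = max(d, abs(f_a - f_b))
    return d
```

A value near 0 means the two fragments have near-identical activity distributions; a value of 1 means the distributions do not overlap at all.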

Exploring Profiles for Fragment Pairs

•  Compare activity distributions across all targets in a pairwise fashion
•  Can also generate comparison for a single target, but requires data for all the fragments

Looking for Selective Fragments

•  Interesting to visually explore fragment pairs
•  Can become tedious, especially in a database as big as ChEMBL
•  Can we automate this type of analysis?
–  Identify fragment pairs with very different activity distributions?
–  Identify fragments with a preference for a certain target (class)?

Targetwise Activity Profiles

•  Count number of parent molecules tested against the target
•  Evaluate mean activity of parent molecules within a target class
•  Selectivity of 1-phenylimidazole for CYP450 has been noted

[Figure: bar chart labelled 4056459 showing Mean Z-Score (-10 to 0) per target class, with a number beside each bar; the classes include several CYP isoforms and receptor families]

Wilkinson et al., Biochem Pharmacol, 1983, 32, 997-1003
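The two steps behind a targetwise profile (count the parent molecules, then average their activity per target class) can be sketched as below. The record layout and function name are assumptions for illustration; in the actual setup this would be a query against the fragment tables.

```python
from collections import defaultdict


def targetwise_profile(records):
    """Summarise one fragment's parent molecules by target class.

    `records` is a hypothetical iterable of (target_class, z_score) tuples,
    one per parent molecule tested. Returns {target_class: (count, mean_z)}.
    """
    by_class = defaultdict(list)
    for target_class, z_score in records:
        by_class[target_class].append(z_score)
    return {
        target_class: (len(scores), sum(scores) / len(scores))
        for target_class, scores in by_class.items()
    }
```

Reporting the count next to the mean matters because a strongly negative mean based on one or two molecules says little about the fragment itself.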

Targetwise Activity Profiles

•  Identified benzylpyrrolidine as a fragment with preference for a specific target class
•  But reported as dopamine agonists

[Figure: bar chart labelled 4055899 showing Mean Z-Score (-8 to 2) per target class, with a number beside each bar]

Fragment or Scaffold?

•  I've been using fragment & scaffold interchangeably; not always true
•  Chemists have an intuitive idea of what a scaffold is
•  Can we encode the idea of scaffold-like or fragment-like?
•  We use the concept of Signal-to-Noise Ratio

$$\mathrm{SNR} = \frac{\mu}{\sigma}$$

–  µ: size of fragment
–  σ: SD of number of atoms not in the fragment, considered over the parent molecules
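A minimal sketch of this ratio, assuming fragment size is the fragment's atom count and that the SD is the sample standard deviation. The slides do not say whether the sample or population SD was used, and the function name and inputs are illustrative.

```python
from statistics import stdev


def fragment_snr(fragment_atoms, parent_atom_counts):
    """Signal-to-noise ratio of a fragment: size / SD of the leftover atoms.

    `parent_atom_counts` holds the atom count of each parent molecule.
    Assumption: sample SD (statistics.stdev). Returns None with fewer than
    two parents, and infinity when every parent has the same leftover size.
    """
    if len(parent_atom_counts) < 2:
        return None
    leftover = [n - fragment_atoms for n in parent_atom_counts]
    sd = stdev(leftover)
    if sd == 0:
        return float("inf")
    return fragment_atoms / sd
```

A large fragment whose parents all carry similarly sized decorations gets a high SNR, which fits the intuition of a scaffold.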

Fragment or Scaffold

•  Partial distribution of SNR values for fragments with atom count > 8 & < 20

[Figure: histogram of SNR (x-axis, 0 to 6) vs. Percentage of Fragments (y-axis)]

Fragment or Scaffold

•  Large SNRs associated with Murcko-like fragments
•  A useful SNR cutoff is an open question

[Figure: example fragment structures labelled SNR = 8.50, SNR = 9.10, SNR = 12.09 and SNR = 0.83, SNR = 0.43, SNR = 0.36]

Activity Profiles & SNR

•  Given a fragment, evaluate SD of the number of atoms in the parent molecules that are not part of the fragment
•  Label the parent molecules based on
–  If number of atoms not in the fragment > SD, non core-like
–  Otherwise core-like
•  Visualize the activity distributions of the parent molecules, grouped by the label
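The labelling rule above can be sketched as follows. As before, the sample SD and the input format (one atom count per parent molecule) are assumptions, and the function name is illustrative.

```python
from statistics import stdev


def label_parents(fragment_atoms, parent_atom_counts):
    """Label each parent molecule as 'core-like' or 'non core-like'.

    A parent is 'non core-like' when its number of atoms outside the
    fragment exceeds the SD of that number over all parents.
    Assumption: sample SD, so at least two parents are required.
    """
    if len(parent_atom_counts) < 2:
        raise ValueError("need at least two parent molecules")
    leftover = [n - fragment_atoms for n in parent_atom_counts]
    sd = stdev(leftover)
    return ["non core-like" if extra > sd else "core-like" for extra in leftover]
```

The resulting labels can then be used to split the parents' Z-scores into two groups before plotting their distributions.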

Activity Profiles & SNR

[Figure: Z-Score histograms (y-axis: Percentage of Total) of parent molecules, split into Core-like and Not core-like panels. High SNR: panels labelled 20967 and 44591. Low SNR: panels labelled 801 and 68604]

Downloads

•  Scaffold activity networks
•  Fragment Activity Profiler
–  SQL & servlet sources
–  Client sources
–  Online version
