Smashing Molecules
How Molecular Fragments Allow us to Explore Large Chemical Spaces

Rajarshi Guha & Trung Nguyen
NIH Center for Translational Therapeutics

Chemaxon UGM
September 2011

Outline

•  Fragments as the building blocks of chemistry
•  Fragments and SAR
•  Fragments and activity profiles

Big Data for Some Problems

•  Halevy et al discuss the effectiveness of extremely large datasets
•  Their application focuses on machine translation (see the Google n-gram corpus)
•  They suggest that such extremely large datasets are useful because they effectively encompass all n-grams (phrases) commonly used
•  Domain is relatively constrained

Halevy et al, IEEE Intelligent Systems, 2009, 24, 8-12

Google Scale in Chemistry?

•  What would be the equivalent of an n-gram corpus in chemistry?
    –  Fragments
    –  A more direct analogy can be made by using LINGOs
•  It is possible to generate arbitrarily large (virtual) compound and fragment collections
•  But would such a collection span all of “commonly used” chemistry?
    –  Depending on the initial compound set, yes
    –  But we’re also interested in going beyond such a “commonly used” set

Fink T, Reymond JL, J Chem Inf Model, 2007, 47, 342
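
The n-gram analogy can be made concrete with a minimal sketch: LINGOs are overlapping length-q substrings of a SMILES string, so they can be counted exactly like word n-grams. The function names, the choice q = 4 and the multiset Tanimoto below are illustrative assumptions, not the authors' code. Published LINGO variants also normalize the SMILES first and use their own similarity formula, both of which are omitted here.

```python
from collections import Counter


def lingos(smiles: str, q: int = 4) -> Counter:
    """Count the overlapping length-q substrings (LINGOs) of a SMILES string."""
    if len(smiles) < q:
        return Counter()
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_tanimoto(a: str, b: str, q: int = 4) -> float:
    """Multiset Tanimoto (a simplification): shared counts over union counts."""
    ca, cb = lingos(a, q), lingos(b, q)
    shared = sum((ca & cb).values())  # element-wise minimum of the counts
    total = sum((ca | cb).values())   # element-wise maximum of the counts
    return shared / total if total else 0.0
```

Counting LINGOs over a whole collection with a single `Counter` gives the chemical counterpart of an n-gram frequency table.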
  
Fragment Diversity

•  Consider a set of bioactives such as the LOPAC collection, 1280 compounds
•  Using exhaustive fragmentation we get 2,460 unique fragments
•  On the MLSMR (~372K compounds), we get 164,583 fragments

[Figure: histogram of log fragment frequency (x-axis, 0 to 4); y-axis: percent of total]

Fragment Diversity

[Figure: two scatter plots of PC 1 vs. PC 2. Left panel: all fragments. Right panel: fragments occurring in 5 to 50 molecules.]

•  Distribution of MLSMR fragments in BCUT space

What Do We Do with Fragments?

•  Assuming we obtain fragments from a large enough collection, what do we do?
    –  Learning from fragments: QSARs, generative models
    –  Use fragments as filters, alternative to clustering
    –  Explore chemotypes and activity
    –  Scaffold level promiscuity

White, D and Wilson, RC, J Chem Inf Model, 2010, 50, 1257-1274

Scaffold Activity Diagrams

•  Network oriented view of fragment (scaffold) collections
    –  Similar in idea to Scaffold Hunter etc
    –  Not purely hierarchical
•  Color by arbitrary properties
•  Quickly assess utility of a scaffold
•  Try it online

What Makes a Good Scaffold?

•  What makes a good scaffold?
    –  Size, complexity, …
    –  Do the members represent an SAR or not?
    –  Intuition and experience also play a role

Scaffold QSAR

•  Evaluate topological and physicochemical descriptors for the R-groups
•  Fit PLS or ridge regression model
•  Characterize the SAR landscape

[Figure: scatter plot of predicted vs. observed activity; both axes run from -8 to 0]

Scaffold QSAR - Drawbacks

•  Many scaffolds have few (5 to 10) members
•  Invariably, more features than observations
•  If the number of R-groups is large, the feature matrix can be very sparse
    –  Less of a problem for combinatorial libraries
•  A linear fit may not be the best approach to correlating R-groups to the activities
    –  Difficult to choose a model type a priori
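
As a reminder of why ridge regression is still usable when there are more features than observations, the textbook estimator is given here. The notation is an assumption, not taken from the slides: X is the n × p matrix of R-group descriptors for the n members of a scaffold, y holds their activities, and λ > 0 is the penalty.

```latex
\hat{\beta}_{\mathrm{ridge}}
  = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
  = \left( X^{\top} X + \lambda I_p \right)^{-1} X^{\top} y
```

When p > n the matrix X^T X is singular, but X^T X + λ I_p is positive definite for any λ > 0, so the solution exists and is unique. With only 5 to 10 members per scaffold, the choice of λ has few observations to be validated on, which is the practical side of the drawback listed above.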
  
Fragment Activity Profiles

•  Using scaffolds in HTS triage usually leads to two questions
    –  What is known about the chemical series with respect to the intended target?
    –  What compound classes are known to modulate the intended target & how similar are they to the series in question?
•  We’re interested in exploring summaries of activity, grouped by scaffolds and targets

Fragment Activity Profiles

•  We use ChEMBL (08) as the source of bioactivity across multiple targets
•  Preprocess the database
    –  Generate scaffolds (exhaustive enumeration of combinations of SSSR rings)
    –  Normalize activity data so that we can compare the activity of a molecule across different assays

Database Setup

•  Preprocessing steps available as a Java servlet
    –  http://tripod.nih.gov/files/chembl-servlets.zip
•  Need ChEMBL installed in Oracle; we add some extra tables
    –  Fragment structures and computed properties
    –  Aggregated assay activity summary
        •  Only consider assays with IC50s in nM and uncensored data, more than 5 observations and a MAD > 0
    –  (Robust) z-scored activities
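
The assay filter and the robust z-score can be sketched in a few lines. This is an illustration under stated assumptions, not the servlet code: the function name is hypothetical, the score is taken as (x - median) / MAD without the 1.4826 consistency constant, and any transformation of the IC50 values before scoring is left out because the slides do not specify one.

```python
from statistics import median


def robust_z_scores(values, min_obs=6):
    """Return (x - median) / MAD for each value of one assay.

    Returns None when the assay fails the filters: fewer than
    min_obs observations ("more than 5") or a MAD of zero.
    """
    if len(values) < min_obs:
        return None
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return None
    return [(v - med) / mad for v in values]
```

The MAD > 0 condition is what keeps the division well defined; assays where most measurements are identical are dropped rather than scored.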
  
Some Fragment Statistics

•  Considered Z-score range of -40 to 15
•  There were 12,887 molecules lying outside this range

[Figure, left: histogram of log(number of molecules); y-axis: percentage of assays. Right: histogram of Z-score (-40 to 10); y-axis: number of compounds.]

Some Fragment Statistics

•  Next, identify fragments with 8 to 20 atoms and occurring in 100 to 900 molecules
•  Gives us 1,746 fragments

[Figure: histogram of number of molecules (x-axis, 200 to 800); y-axis: percentage of fragments]

Some Fragment Statistics

•  We can query the fragment tables to get activity summaries for individual fragments
•  For these examples we consider the full range of Z-scores

[Figure: 3 × 3 grid of Z-score histograms (y-axis: percent of total), one panel per fragment. Panel labels: 40169, 64473, 115654, 5390, 5486, 13485, 778, 2723, 4058. Each panel is annotated with its number of observations, N, ranging from 1280 to 2641.]

Exploring Activity Profiles

[Figure: annotated screenshot; the labels read:]
•  Fragments from ChEMBL
•  Activity distributions of parent molecules across all targets
•  Z-scores for individual molecules against a specific target

Exploring Activity Profiles

•  User can draw a molecule and fragment on the fly
•  Use generated fragments to create activity histograms

Target Selection

•  Employs the ChEMBL target hierarchy
•  Can select target families or individual targets

Similar Fragments with Similar Profiles?

•  Consider 658 fragments with > 10 atoms and occurring in 500 to 1200 molecules
•  Overall, the fragments tend to be dissimilar
    –  95th percentile is just 0.50
•  1,873 pairs do exhibit Tc > 0.8

[Figure: histogram of Tanimoto similarity (x-axis, 0.0 to 1.0); y-axis: percentage of pairs]

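The pairwise comparison above can be sketched as follows, assuming each fragment fingerprint is available as a set of on-bit positions. The fingerprint type is not stated in the slides, and `tanimoto` and `similar_pairs` are hypothetical names used only for illustration.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient (Tc) of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_pairs(fingerprints: dict, cutoff: float = 0.8):
    """Yield (id1, id2, Tc) for every fragment pair with Tc above the cutoff."""
    for (i, fa), (j, fb) in combinations(fingerprints.items(), 2):
        tc = tanimoto(fa, fb)
        if tc > cutoff:
            yield i, j, tc
```

For 658 fragments this is 658 × 657 / 2 = 216,153 comparisons, so a plain double loop is quick enough.
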
Comparing Activity Profiles

•  Compare activity profiles with the K-S statistic
•  Color corresponds to p-value of the K-S test
•  No obvious correlation between fragment similarity & activity profile similarity
•  Probably not rigorous when a scaffold has few parent molecules

[Figure: K-S statistic (y-axis, 0.0 to 0.6) vs. Tanimoto similarity (x-axis, 0.80 to 1.00); color scale from 0.0 to 1.0]

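For readers unfamiliar with it, the two-sample K-S statistic is the largest vertical gap between two empirical distribution functions. A minimal sketch, assuming each activity profile is simply a list of Z-scores; the function name is hypothetical and the p-value is not computed here:

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only change at observed values, so it is
    # enough to evaluate the gap at every point of either sample.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

In practice `scipy.stats.ks_2samp` returns both the statistic and the p-value. With only a handful of parent molecules per scaffold the empirical CDFs are coarse step functions, which is one reason for the caution in the last bullet above.
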
Exploring Profiles for Fragment Pairs

•  Compare activity distributions across all targets in a pairwise fashion
•  Can also generate comparison for a single target, but requires data for all the fragments

Looking for Selective Fragments

•  Interesting to visually explore fragment pairs
•  Can become tedious, especially in a database as big as ChEMBL
•  Can we automate this type of analysis?
    –  Identify fragment pairs with very different activity distributions?
    –  Identify fragments with a preference for a certain target (class)?
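
One simple way to start automating the second question is to summarize a fragment's parent molecules per target class. The sketch below assumes a hypothetical record layout of (molecule id, target class, Z-score) tuples for a single fragment; it is not the schema of the extra ChEMBL tables.

```python
from collections import defaultdict
from statistics import mean


def class_profile(records):
    """Map target class -> (number of parent molecules, mean Z-score)."""
    by_class = defaultdict(dict)
    for molecule_id, target_class, z in records:
        by_class[target_class].setdefault(molecule_id, []).append(z)
    profile = {}
    for target_class, per_molecule in by_class.items():
        scores = [z for zs in per_molecule.values() for z in zs]
        # Count distinct molecules; average over every measurement.
        profile[target_class] = (len(per_molecule), mean(scores))
    return profile
```

A class with a strongly shifted mean Z-score but only one or two tested molecules is weak evidence of a preference, so the count should be read alongside the mean.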
  
[Figure: mean Z-score (axis from about -10 to 0) by target class, each class annotated with a count. Legible class labels include acetylcholine, adrenergic, angiotensin, CC and CXC chemokine, dopamine, endothelin, GnRH, histamine, metabotropic glutamate, neurokinin, neuropeptide Y, opioid and serotonin receptors, plus several CYP and NR families.]

•  Count number of parent molecules tested against the target class
                                                                                                                                              •  Evaluate	
  mean	
  ac9vity	
  of	
  parent	
  molecules	
  within	
  a	
  target	
  



                                                                                                                                              •  Selec9vity	
  of	
  1-­‐phenylimidazole	
  for	
  CYP450	
  has	
  been	
  noted	
  




Wilkinson	
  et	
  al,	
  Biochem	
  Pharmacol,	
  1983,	
  32,	
  997-­‐1003	
  
                                                                                                                                                                                                                                         Targetwise	
  AcKvity	
  Profiles	
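The two bookkeeping steps on this slide (mean activity per target, and the number of parent molecules tested per target) can be sketched as below. This is only an illustration: the function name `targetwise_profile`, the `(target, z_score)` record layout and the example values are assumptions, not taken from the Fragment Activity Profiler sources.

```python
from collections import defaultdict
from statistics import mean


def targetwise_profile(records):
    """Summarise activity per target for the parents of one fragment.

    records: iterable of (target, z_score) pairs, one pair per parent
    molecule tested against that target (hypothetical layout).
    Returns {target: (mean_z_score, n_parents_tested)}.
    """
    by_target = defaultdict(list)
    for target, z_score in records:
        by_target[target].append(z_score)
    return {t: (mean(zs), len(zs)) for t, zs in by_target.items()}


# Made-up values, for illustration only.
example = [
    ("CYP_3A4", -6.0),
    ("CYP_3A4", -4.0),
    ("Dopamine receptor", 0.5),
]
profile = targetwise_profile(example)
for target, (mean_z, n) in sorted(profile.items()):
    print(f"{target}: mean Z = {mean_z:.2f}, n = {n}")
```

Keeping the count next to the mean matters when reading such a profile: a strongly negative mean backed by one or two parent molecules is much weaker evidence than the same mean backed by a hundred.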
  
Targetwise Activity Profiles
•  Identified benzylpyrrolidine as a fragment with preference for a specific target class
•  But reported as dopamine agonists
[Figure: mean Z-score (axis from −8 to 2) by target, with each target annotated with a count]
Fragment or Scaffold?
•  I’ve been using fragment & scaffold interchangeably – not always true
•  Chemists have an intuitive idea of what a scaffold is
•  Can we encode the idea of scaffold-like or fragment-like?
•  We use the concept of Signal-to-Noise Ratio (SNR)

   $$\mathrm{SNR} = \frac{\mu}{\sigma}$$

   –  $\mu$: size of fragment
   –  $\sigma$: SD of number of atoms not in the fragment, considered over the parent molecules
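As a rough sketch of the ratio defined on this slide, the snippet below computes SNR from a fragment size and the atom counts of its parent molecules. Several things here are assumptions rather than details from the talk: the function name `fragment_snr`, the use of the sample standard deviation (n − 1 denominator), and the handling of the degenerate cases.

```python
from statistics import stdev


def fragment_snr(fragment_atoms, parent_atom_counts):
    """SNR = fragment size / SD of atoms outside the fragment.

    fragment_atoms: number of atoms in the fragment.
    parent_atom_counts: atom counts of the parent molecules that
    contain the fragment.
    Returns None when fewer than two parents are available (the SD is
    undefined) and infinity when every parent has the same number of
    extra atoms; both conventions are assumptions of this sketch.
    """
    extra = [n - fragment_atoms for n in parent_atom_counts]
    if len(extra) < 2:
        return None
    sd = stdev(extra)  # sample SD; the slides do not say which SD is used
    if sd == 0:
        return float("inf")
    return fragment_atoms / sd


# A 10-atom fragment found in parents of 12, 14 and 16 atoms:
# extra atoms are 2, 4, 6, whose sample SD is 2, so SNR is 10 / 2.
snr = fragment_snr(10, [12, 14, 16])
print(snr)
```

Read this way, a high SNR means the parents differ little in how much they add to the fragment, which is the sense in which the fragment behaves like a shared core.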
  
Fragment or Scaffold
•  Partial distribution of SNR values for fragments with atom count > 8 & < 20
[Figure: histogram of SNR (x-axis, 0 to 6) against percentage of fragments (y-axis, 0 to 60)]
Fragment or Scaffold
•  Large SNRs associated with Murcko-like fragments
•  A useful SNR cutoff is an open question
[Figure: six example fragment structures, labelled SNR = 8.50, 9.10 and 12.09 (first row) and SNR = 0.83, 0.43 and 0.36 (second row)]
Activity Profiles & SNR
•  Given a fragment, evaluate SD of the number of atoms in the parent molecules that are not part of the fragment
•  Label the parent molecules based on
    –  If number of atoms not in the fragment > SD, non core-like
    –  Otherwise core-like
•  Visualize the activity distributions of the parent molecules, grouped by the label
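The labelling rule above can be sketched as follows. The helper names `label_parents` and `group_activity`, the use of the sample standard deviation, and the example numbers are all assumptions made for illustration; the slides do not show the actual implementation.

```python
from statistics import stdev


def label_parents(fragment_atoms, parent_atom_counts):
    """Label each parent as 'core-like' or 'non core-like'.

    A parent is 'non core-like' when the number of its atoms outside
    the fragment exceeds the SD of that quantity over all parents.
    Needs at least two parents (sample SD is an assumption here).
    """
    extra = [n - fragment_atoms for n in parent_atom_counts]
    sd = stdev(extra)
    return ["non core-like" if e > sd else "core-like" for e in extra]


def group_activity(labels, z_scores):
    """Collect the Z-scores of the parents under each label."""
    groups = {"core-like": [], "non core-like": []}
    for label, z in zip(labels, z_scores):
        groups[label].append(z)
    return groups


# Made-up example: a 10-atom fragment with parents of 11, 12 and 20 atoms.
labels = label_parents(10, [11, 12, 20])
groups = group_activity(labels, [-3.0, -2.5, 0.4])
print(labels)
print(groups)
```

The two lists in `groups` are what would then be plotted side by side, one histogram per label.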
  
	
  
Activity Profiles & SNR
[Figure: histograms of Z-score (x-axis) against percentage of total (y-axis, 20 to 80), with separate core-like and not core-like panels. High SNR row: panels labelled 20967 and 44591, Z-score axis from −50 to 50. Low SNR row: panels labelled 801 and 68604, Z-score axis from −30 to 10.]
Downloads
•  Scaffold activity networks
•  Fragment Activity Profiler
   –  SQL & servlet sources
   –  Client sources
   –  Online version
Smashing Molecules

  • 1. Smashing Molecules: How Molecular Fragments Allow us to Explore Large Chemical Spaces. Rajarshi Guha & Trung Nguyen, NIH Center for Translational Therapeutics. Chemaxon UGM, September 2011
  • 2. Outline
      • Fragments as the building blocks of chemistry
      • Fragments and SAR
      • Fragments and activity profiles
  • 3. Big Data for Some Problems
      • Halevy et al. discuss the effectiveness of extremely large datasets
      • Their application focuses on machine translation; see the Google n-gram corpus
      • They suggest that such extremely large datasets are useful because they effectively encompass all commonly used n-grams (phrases)
      • The domain is relatively constrained
      Halevy et al., IEEE Intelligent Systems, 2009, 24, 8-12
  • 4. Google Scale in Chemistry?
      • What would be the equivalent of an n-gram corpus in chemistry?
          – Fragments
          – A more direct analogy can be made by using LINGOs
      • It is possible to generate arbitrarily large (virtual) compound and fragment collections
      • But would such a collection span all of "commonly used" chemistry?
          – Depending on the initial compound set, yes
          – But we're also interested in going beyond such a "commonly used" set
      Fink T, Reymond JL, J Chem Inf Model, 2007, 47, 342
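The n-gram analogy can be made concrete with a small sketch. It assumes a LINGO is simply an overlapping q-character substring of a SMILES string (q = 4 here). The function names and the multiset Tanimoto comparison are illustrative choices, not necessarily the published LINGO similarity, and no SMILES canonicalization or ring-digit normalization is attempted.

```python
from collections import Counter


def lingos(smiles: str, q: int = 4) -> Counter:
    """Count the overlapping q-character substrings of a SMILES string."""
    if len(smiles) < q:
        return Counter()
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def lingo_tanimoto(a: str, b: str, q: int = 4) -> float:
    """Multiset Tanimoto between two LINGO profiles (a simplification)."""
    ca, cb = lingos(a, q), lingos(b, q)
    shared = sum((ca & cb).values())  # element-wise minimum of counts
    total = sum((ca | cb).values())   # element-wise maximum of counts
    return shared / total if total else 0.0
```

Counting these substrings over a large compound collection would give the chemical counterpart of an n-gram frequency table.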
  • 5. Fragment Diversity
      • Consider a set of bioactives such as the LOPAC collection (1,280 compounds)
      • Using exhaustive fragmentation we get 2,460 unique fragments
      • On the MLSMR (~372K compounds), we get 164,583 fragments
      [Figure: histogram of log fragment frequency (x-axis) against percent of total (y-axis)]
  • 6. Fragment Diversity
      • Distribution of MLSMR fragments in BCUT space
      [Figure: two scatter plots of PC 1 against PC 2; panels "All fragments" and "Fragments occurring in 5 to 50 molecules"]
  • 7. What Do We Do with Fragments?
      • Assuming we obtain fragments from a large enough collection, what do we do?
          – Learning from fragments: QSARs, generative models
          – Use fragments as filters, an alternative to clustering
          – Explore chemotypes and activity
          – Scaffold-level promiscuity
      White, D and Wilson, RC, J Chem Inf Model, 2010, 50, 1257-1274
  • 8. Scaffold Activity Diagrams
      • Network-oriented view of fragment (scaffold) collections
          – Similar in idea to Scaffold Hunter etc.
          – Not purely hierarchical
      • Color by arbitrary properties
      • Quickly assess utility of a scaffold
      • Try it online
  • 9. What Makes a Good Scaffold?
      • What makes a good scaffold?
          – Size, complexity, …
          – Do the members represent an SAR or not?
          – Intuition and experience also play a role
  • 10. Scaffold QSAR
      • Fit PLS or ridge regression model
      • Evaluate topological and physicochemical descriptors for the R-groups
      • Characterize the SAR landscape
      [Figure: scatter plot of predicted against observed activity]
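As a rough illustration of the ridge option, the sketch below fits ridge regression in plain Python by solving (XᵀX + λI)w = Xᵀy with Gauss-Jordan elimination. It assumes the R-group descriptors are already available as rows of numbers. The function names and the default penalty are hypothetical, no intercept is fitted, and PLS is not covered.

```python
def ridge_fit(X, y, lam=1.0):
    """Solve (X^T X + lam * I) w = X^T y by Gauss-Jordan elimination."""
    n, p = len(X), len(X[0])
    # Augmented p x (p + 1) system: normal equations plus the ridge penalty.
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(p)] + [sum(X[k][i] * y[k] for k in range(n))]
         for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system; increase lam")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


def ridge_predict(X, w):
    """Predicted activity for each descriptor row."""
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]
```

With more descriptors than molecules the unpenalized system (lam = 0) is singular, while any lam > 0 makes it solvable, which is one reason a penalized fit suits small scaffold series.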
  • 11. Scaffold QSAR: Drawbacks
      • Many scaffolds have few (5 to 10) members
      • Invariably, more features than observations
      • If the number of R-groups is large, the feature matrix can be very sparse
          – Less of a problem for combinatorial libraries
      • A linear fit may not be the best approach to correlating R-groups with the activities
          – Difficult to choose a model type a priori
  • 12. Fragment Activity Profiles
      • Using scaffolds in HTS triage usually leads to two questions
          – What is known about the chemical series with respect to the intended target?
          – What compound classes are known to modulate the intended target, and how similar are they to the series in question?
      • We're interested in exploring summaries of activity, grouped by scaffolds and targets
  • 13. Fragment Activity Profiles
      • We use ChEMBL (08) as the source of bioactivity across multiple targets
      • Preprocess the database
          – Generate scaffolds (exhaustive enumeration of combinations of SSSRs)
          – Normalize activity data so that we can compare the activity of a molecule across different assays
  • 14. Database Setup
      • Preprocessing steps available as a Java servlet
          – http://tripod.nih.gov/files/chembl-servlets.zip
      • Need ChEMBL installed in Oracle; we add some extra tables
          – Fragment structures and computed properties
          – Aggregated assay activity summary
              • Only consider assays with IC50s in nM and uncensored data, more than 5 observations, and a MAD > 0
          – (Robust) z-scored activities
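The per-assay normalization can be sketched as follows. This is a minimal illustration, assuming the robust z-score is (x − median) / MAD computed within one assay; the slides do not say whether a scaling constant (such as 1.4826) or a log transform is applied, so neither is used here. The function name and the `min_obs` parameter are hypothetical.

```python
from statistics import median


def robust_z_scores(values, min_obs=6):
    """Robust z-scores for one assay, or None if the assay is rejected.

    An assay is rejected when it has 5 or fewer observations or a MAD of 0,
    mirroring the filter described on the slide.
    """
    values = list(values)
    if len(values) < min_obs:
        return None
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return None
    return [(v - med) / mad for v in values]
```

For example, `robust_z_scores([1, 2, 3, 4, 5, 6, 100])` has median 4 and MAD 2, so the outlying value 100 gets a score of 48 while the median value scores 0.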
  • 15. Some Fragment Statistics
      • Considered Z-score range of -40 to 15
      • There were 12,887 molecules lying outside this range
      [Figure: two histograms; axes labelled log(Number of molecules), Z-score, Percentage of assays, and Number of compounds]
  • 16. Some Fragment Statistics
      • Next, identify fragments with 8 to 20 atoms and occurring in 100 to 900 molecules
      • Gives us 1,746 fragments
      [Figure: histogram of number of molecules (x-axis) against percentage of fragments (y-axis)]
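The selection step amounts to a range filter on two fragment properties. The sketch below assumes each fragment record carries an atom count and the number of parent molecules containing it. The `Fragment` type, its field names, and the inclusive bounds are assumptions for illustration, not the schema of the actual fragment tables.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    frag_id: int      # hypothetical identifier
    n_atoms: int      # atoms in the fragment
    n_molecules: int  # parent molecules containing the fragment


def select_fragments(fragments, atoms=(8, 20), molecules=(100, 900)):
    """Keep fragments whose size and frequency fall inside both ranges."""
    return [f for f in fragments
            if atoms[0] <= f.n_atoms <= atoms[1]
            and molecules[0] <= f.n_molecules <= molecules[1]]
```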
  • 17. Some Fragment Statistics
      • We can query the fragment tables to get activity summaries for individual fragments
      • For these examples we consider the full range of Z-scores
      [Figure: 3 × 3 grid of Z-score histograms (percent of total); panels 40169 (N = 1457), 64473 (N = 1595), 115654 (N = 1515), 5390 (N = 1489), 5486 (N = 1578), 13485 (N = 1455), 778 (N = 1280), 2723 (N = 1918), 4058 (N = 2641)]
  • 18. Exploring Activity Profiles
      [Figure: annotated screenshot; callouts "Fragments from ChEMBL", "Activity distributions of parent molecules across all targets", and "Z-scores for individual molecules against a specific target"]
  • 19. Exploring Activity Profiles
      • User can draw a molecule and fragment it on the fly
      • Use generated fragments to create activity histograms
  • 20. Target Selection
      • Employs the ChEMBL target hierarchy
      • Can select target families or individual targets
  • 21. Similar Fragments with Similar Profiles?
      • Consider 658 fragments with > 10 atoms and occurring in 500 to 1200 molecules
      • Overall, the fragments tend to be dissimilar
          – 95th percentile is just 0.50
      • 1,873 pairs do exhibit Tc > 0.8
      [Figure: histogram of Tanimoto similarity (x-axis) against percentage of pairs (y-axis)]
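The pairwise comparison can be sketched with the Tanimoto coefficient on binary fingerprints, Tc = |A ∩ B| / |A ∪ B|. The example assumes each fragment's fingerprint is given as a set of on-bit indices; which fingerprint the slides used is not stated, and the function names are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient between two sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_pairs(fingerprints, cutoff=0.8):
    """Return (id1, id2, Tc) for every fragment pair with Tc > cutoff."""
    pairs = []
    for i, j in combinations(sorted(fingerprints), 2):
        tc = tanimoto(fingerprints[i], fingerprints[j])
        if tc > cutoff:
            pairs.append((i, j, tc))
    return pairs
```

Note that the all-pairs loop is quadratic in the number of fragments, which is manageable for a few hundred fragments but not for a whole database.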
  • 22. Comparing Activity Profiles
      • Compare activity profiles with the K-S statistic
      • Color corresponds to p-value of the K-S test
      • No obvious correlation between fragment similarity & activity profile similarity
      • Probably not rigorous when a scaffold has few parent molecules
      [Figure: scatter plot of Tanimoto similarity (x-axis, 0.80 to 1.00) against K-S statistic (y-axis), colored by p-value]
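For reference, the two-sample K-S statistic is the largest absolute difference between the two empirical cumulative distribution functions. The sketch below computes only the statistic, assuming each activity profile is a list of Z-scores; the function name is hypothetical, and the p-value is left to a statistics library (for example `scipy.stats.ks_2samp`).

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    # The ECDFs only change at observed values, so checking those suffices.
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in a + b)
```

A value near 0 means the two profiles are nearly indistinguishable, and a value of 1 means they do not overlap at all.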
  • 23. Exploring Profiles for Fragment Pairs
      • Compare activity distributions across all targets in a pairwise fashion
      • Can also generate comparison for a single target, but requires data for all the fragments
  • 24. Looking for Selective Fragments
      • Interesting to visually explore fragment pairs
      • Can become tedious, especially in a database as big as ChEMBL
      • Can we automate this type of analysis?
          – Identify fragment pairs with very different activity distributions?
          – Identify fragments with a preference for a certain target (class)?
  • 25. Targetwise Activity Profiles
      • Count number of parent molecules tested against the target class
      • Evaluate mean activity of parent molecules within a target class
      • Selectivity of 1-phenylimidazole for CYP450 has been noted
      Wilkinson et al., Biochem Pharmacol, 1983, 32, 997-1003
      [Figure: bar chart labelled 4056459; mean Z-score (y-axis) per target class (x-axis)]
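The counting and averaging steps can be sketched as a simple group-by. The example assumes the input is one (target class, Z-score) record per parent molecule of a single fragment. The function name, record layout, and the example values are hypothetical and do not reflect the real ChEMBL-derived tables or data.

```python
from collections import defaultdict
from statistics import mean


def targetwise_profile(records):
    """Map each target class to (number of records, mean Z-score)."""
    by_class = defaultdict(list)
    for target_class, z in records:
        by_class[target_class].append(z)
    return {t: (len(zs), mean(zs)) for t, zs in by_class.items()}


# Made-up records for one fragment, purely for illustration.
example = [("CYP_3A4", -8.0), ("CYP_3A4", -6.0), ("Tk", 1.0)]
profile = targetwise_profile(example)
```

A fragment whose mean Z-score is strongly negative for one class and near zero elsewhere would be a candidate for target-class preference, subject to the counts being large enough to trust.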
  • 26. Targetwise Activity Profiles
      • Identified benzylpyrrolidine as a fragment with preference for a specific target class
      • But reported as dopamine agonists
      [Figure: bar chart labelled 4055899; mean Z-score (y-axis) per target class (x-axis)]
  • 27. Fragment or Scaffold?
      • I've been using fragment & scaffold interchangeably, though this is not always true
      • Chemists have an intuitive idea of what a scaffold is
      • Can we encode the idea of scaffold-like or fragment-like?
      • We use the concept of Signal-to-Noise Ratio: SNR = µ / σ
          – µ: size of fragment
          – σ: SD of number of atoms not in the fragment, considered over the parent molecules
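A minimal sketch of this ratio, assuming the fragment size and the atom counts of its parent molecules are known. The function name is hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, since the slides do not specify it.

```python
from statistics import pstdev


def fragment_snr(fragment_atoms, parent_atom_counts):
    """SNR = fragment size / SD of the atoms outside the fragment."""
    extra = [n - fragment_atoms for n in parent_atom_counts]
    sd = pstdev(extra)  # population SD; an assumption of this sketch
    return float("inf") if sd == 0 else fragment_atoms / sd
```

A fragment whose parents all decorate it to a similar extent gets a high SNR (scaffold-like), while a small fragment found in parents of widely varying size gets a low one.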
  • 28. Fragment or Scaffold
      • Partial distribution of SNR values for fragments with atom count > 8 & < 20
      [Figure: histogram of SNR (x-axis, 0 to 6) against percentage of fragments (y-axis)]
  • 29. Fragment or Scaffold
      • Large SNRs are associated with Murcko-like fragments
      • A useful SNR cutoff is an open question
      [Figure: six example fragments with SNR = 8.50, 9.10, 12.09, 0.83, 0.43, and 0.36]
  • 30. Activity Profiles & SNR
      • Given a fragment, evaluate SD of the number of atoms in the parent molecules that are not part of the fragment
      • Label the parent molecules based on
          – If number of atoms not in the fragment > SD, not core-like
          – Otherwise core-like
      • Visualize the activity distributions of the parent molecules, grouped by the label
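The labelling rule can be sketched directly from the description above. The example assumes a mapping from parent molecule identifier to atom count. The function name and the population standard deviation are assumptions of this sketch.

```python
from statistics import pstdev


def label_parents(fragment_atoms, parents):
    """Label each parent molecule as 'core-like' or 'not core-like'.

    parents maps a molecule identifier to its atom count.
    """
    extra = {m: n - fragment_atoms for m, n in parents.items()}
    sd = pstdev(list(extra.values()))  # SD of atoms outside the fragment
    return {m: ("not core-like" if e > sd else "core-like")
            for m, e in extra.items()}
```

The Z-scores of the parent molecules can then be split by label and plotted as two histograms per fragment.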
  • 31. Activity Profiles & SNR
      [Figure: Z-score histograms (percentage of total) split into "Core-like" and "Not core-like" panels; high-SNR examples 20967 and 44591, low-SNR examples 801 and 68604]
  • 32. Downloads
      • Scaffold activity networks
      • Fragment Activity Profiler
          – SQL & servlet sources
          – Client sources
          – Online version