SlideShare ist ein Scribd-Unternehmen logo
1 von 76
Downloaden Sie, um offline zu lesen
Outlier Detection Using R-NN Curves
Searching for HIV Integrase Inhibitors
                            Summary




     Navigating Molecular Haystacks:
          Tools & Applications
 Finding Interesting Molecules & Doing it Fast


                             Rajarshi Guha

                       Department of Chemistry
                     Pennsylvania State University


                           20th April, 2006
Outlier Detection Using R-NN Curves
           Searching for HIV Integrase Inhibitors
                                       Summary


Past Work


QSAR Applications
   Artemisinin analogs
    PDGFR inhibitors
    Bleaching agents
    Linear & non-linear methods

QSAR Methods
   Representative QSAR sets
    Model interpretation
    Model applicability
Outlier Detection Using R-NN Curves
          Searching for HIV Integrase Inhibitors
                                      Summary


Past Work




Cheminformatics
    Automated QSAR pipeline
    Contributions to the CDK
    Cheminformatics webservices
    R packages and snippets
Outlier Detection Using R-NN Curves
           Searching for HIV Integrase Inhibitors
                                       Summary


Past Work




Chemical Data Mining
    Approximate k-NN
    Local regression
    Outlier detection
    VS protocol
Outlier Detection Using R-NN Curves
           Searching for HIV Integrase Inhibitors
                                       Summary


Past Work




Chemical Data Mining
    Approximate k-NN
    Local regression
    Outlier detection
    VS protocol
Outlier Detection Using R-NN Curves
            Searching for HIV Integrase Inhibitors
                                        Summary


Outline


  1   Outlier Detection Using R-NN Curves
        Methods for Diversity Analysis
        Generating & Summarizing R-NN Curves
        Using R-NN Curves

  2   Searching for HIV Integrase Inhibitors
        Previous Work on HIV Integrase
        A Tiered Virtual Screening Protocol
        What Does the Pipeline Give Us?

  3   Summary
Outlier Detection Using R-NN Curves      Methods for Diversity Analysis
            Searching for HIV Integrase Inhibitors   Generating & Summarizing R-NN Curves
                                        Summary      Using R-NN Curves


Outline


  1   Outlier Detection Using R-NN Curves
        Methods for Diversity Analysis
        Generating & Summarizing R-NN Curves
        Using R-NN Curves

  2   Searching for HIV Integrase Inhibitors
        Previous Work on HIV Integrase
        A Tiered Virtual Screening Protocol
        What Does the Pipeline Give Us?

  3   Summary
Outlier Detection Using R-NN Curves      Methods for Diversity Analysis
          Searching for HIV Integrase Inhibitors   Generating & Summarizing R-NN Curves
                                      Summary      Using R-NN Curves


Diversity Analysis




  Why is it Important?
     Compound acquisition
      Lead hopping
      Knowledge of the distribution of compounds in a descriptor
      space may improve predictive models
Outlier Detection Using R-NN Curves       Methods for Diversity Analysis
                  Searching for HIV Integrase Inhibitors    Generating & Summarizing R-NN Curves
                                              Summary       Using R-NN Curves


Approaches to Diversity Analysis



Cell based                                                                                                      q             q

                                                                                                                                                  q


     Divide space into bins                                                   q



                                                                                  q               q
                                                                                                          q


                                                                                                          qq
                                                                                                             q q
                                                                                                                qq
                                                                                                                 q
                                                                                                                     q
                                                                                                                     q
                                                                                                                     qq
                                                                                                                          q
                                                                                                                              q

                                                                                                                                  q
                                                                                                                                       q

                                                                                                                                     q q
                                                                                                                                      q
                                                                                                                                         qq
                                                                                                                                                  q
                                                                                                               q
                                                                                                            qq                      qq q
                                                                                                                                     q
                                                                                                                   qq q
                                                                                                                      qq                 q        q
                                                                                                       q q
                                                                                                  qq                                q q           q


      Compounds are mapped to bins
                                                                                                  q           q     q                       q
                                                                                  q                    qqq q qq q
                                                                                                        q         qqq              q qq  q
                                                                                                            q                     q      q    q
                                                                                              q         q q  q       q    q             qqq       q
                                                                                                                 q      q
                                                                              q                              q qqq           q q q
                                                                                                              q q               q    q                q
                                                                                      q                                                   q
                                                                                                                    qq q      q
                                                                                                                              q
                                                                                          q             q      q     q q
                                                                                                                    qq         q q      q
                                                                                                            q q        q     qq     q
                                                                                                           q                          q
                                                                                                           q               q
                                                                                      q                   q                 q
                                                                                                                            q
                                                                                                                   q
                                                                                              q
                                                                                                           qq           q q
                                                                                                             q      q               q q

Disadvantages                                                                                         q
                                                                                                          qq   q
                                                                                                                q
                                                                                                                            q
                                                                                                                                  q




    Not useful for high dimensional
    data
      Choosing the bin size can be tricky




  Schnur, D.; J. Chem. Inf. Comput. Sci. 1999, 39, 36–45
  Agrafiotis, D.; Rassokhin, D.; J. Chem. Inf. Comput. Sci. 2002, 42, 117–12
Outlier Detection Using R-NN Curves       Methods for Diversity Analysis
                  Searching for HIV Integrase Inhibitors    Generating & Summarizing R-NN Curves
                                              Summary       Using R-NN Curves


Approaches to Diversity Analysis


                                                                                                              q




Distance based                                                                                            q                   q


                                                                                                                                              q
                                                                                                                                                                              q

                                                                                                                      q
                                                                                                                                                                                          q



     Considers distance between                                                                   q
                                                                                                                                  q


                                                                                                                                                  q
                                                                                                                                                      q


                                                                                                                                      q

     compounds in a space                                                         q       q

                                                                                                                  q
                                                                                                                                                          q
                                                                                                                                                                                      q


                                                                                                          q                                               q
                                                                                          q                       q       q                                   q


      Generally requires pairwise distance                                        q
                                                                                              q



                                                                                                                                                          q
                                                                                                  q


      calculation                                                             q               q                               q
                                                                                      q
                                                                                                                      q                                       q

                                                                                                                                  q
                                                                                                          q                               q
                                                                                                      q                   q
                                                                                                                              q                   q

      Can be sped up by kD trees, MVP                                                                                                 q
                                                                                                                                                      q

                                                                                                                                                                  q
                                                                                                                                                                          q




                                                                                                                                                                                  q


      trees etc.                                                          q
                                                                                                                      q
                                                                                                                              q
                                                                                                                                  q
                                                                                                                                                                      q




  Agrafiotis, D. K.; Lobanov, V. S.; J. Chem. Inf. Comput. Sci. 1999, 39, 51–58
Outlier Detection Using R-NN Curves      Methods for Diversity Analysis
           Searching for HIV Integrase Inhibitors   Generating & Summarizing R-NN Curves
                                       Summary      Using R-NN Curves


Generating an R-NN Curve


  Observations
      Consider a query point with a hypersphere, of radius R,
      centered on it
      For small R, the hypersphere will contain very few or no
      neighbors
      For larger R, the number of neighbors will increase
      When R ≥ Dmax , the neighbor set is the whole dataset

  The question is . . .
  Does the variation of nearest neighbor count with radius allow us
  to characterize the location of a query point in a dataset?
Outlier Detection Using R-NN Curves      Methods for Diversity Analysis
           Searching for HIV Integrase Inhibitors   Generating & Summarizing R-NN Curves
                                       Summary      Using R-NN Curves


Generating an R-NN Curve


  Observations
      Consider a query point with a hypersphere, of radius R,
      centered on it
      For small R, the hypersphere will contain very few or no
      neighbors
      For larger R, the number of neighbors will increase
      When R ≥ Dmax , the neighbor set is the whole dataset

  The question is . . .
  Does the variation of nearest neighbor count with radius allow us
  to characterize the location of a query point in a dataset?
Outlier Detection Using R-NN Curves        Methods for Diversity Analysis
                  Searching for HIV Integrase Inhibitors     Generating & Summarizing R-NN Curves
                                              Summary        Using R-NN Curves


Generating an R-NN Curve


Algorithm                                                                                                                        q
                                                                                                                                             q            50%
                                                                                                               q                                                                q


  Dmax ← max pairwise distance                                                                                                   q           q
                                                                                                                                             q
                                                                                                                                                  q
                                                                                                                                                 qq
                                                                                                                                                                        30%
  for molecule in dataset do                                                           q                       q



                                                                                                               q       q
                                                                                                                       q
                                                                                                                           q q

                                                                                                                             q
                                                                                                                            q q
                                                                                                                                 q
                                                                                                                                         q
                                                                                                                                         q
                                                                                                                                             q

                                                                                                                                                      q
                                                                                                                                                                                         q

                                                                                                                                     q           q q


    R ← 0.01 × Dmax
                                                                                                       q                                                  q q
                                                                                                                             q
                                                                                                                                             q                                  q
                                                                                           q                           qq                    q            q             q                    q
                                                                                                           q             q q q
                                                                                                           qq                                                   q
                                                                                                                       q
                                                                                                                           q q     q
                                                                                                                                           q                                        10%
                                                                                                                            q q q       q    q                  q
                                                                                                                                                                            q
                                                                                                                    q                  q     q
                                                                                                                              q                                                      5%
    while R ≤ Dmax do
                                                                                                                       q                     q                      q
                                                                                   q                               qq        qq
                                                                                                           q    q  q      q                 q
                                                                                                   q              qq
                                                                                               q           q     q q
                                                                                                                 qq            q
                                                                                                                    q         q q
                                                                                                                              q              q                                       q
                                                                                                                        q                    q                  q
                                                                                                                                                                q
                                                                                                            q qq q               q                              q
                                                                                                                q                                                                            q
                                                                                                                          q          q  q


       Find NN’s within radius R
                                                                                                        q                          q
                                                                                           q                                                                    q
                                                                                                                                           q
                                                                                           q
                                                                                                                                      q
                                                                                                                                      q
                                                                               q                   q         q        q
                                                                                                                      q                 q                            q
                                                                                                                                                                    q
                                                                                                                                  q q                           q


       Increment R
                                                                                                                                  q                                             q
                                                                                                          q                         q     q
                                                                                                      q           q     q
                                                                                                                                     q
                                                                                                     q         q q    q                  q
                                                                                                                                               q
                                                                                                                                                                                     q
                                                                                                                                                                                                 q


    end while                                                                                                                            q




  end for                                                                                                                                                           q




  Plot NN count vs. R



  Guha, R.; Dutta, D.; Jurs, P.C.; Chen, T.; J. Chem. Inf. Model., submitted
Outlier Detection Using R-NN Curves                        Methods for Diversity Analysis
                                        Searching for HIV Integrase Inhibitors                     Generating & Summarizing R-NN Curves
                                                                    Summary                        Using R-NN Curves


Generating an R-NN Curve

                                                                                              qq
                                                                                              qq
                                                                                         qqqq
                                                                                          qqq                                                                             qqqqqqqqqqqqqqqqqqqqqqqqqqq
                                                                                                                                                                           qqqqqqqqqqqqqqqqqqqqqqqqqqq
                                                                                        q                                                                          qqqqqq
                                                                                                                                                                    qqqqq
                                                                                       q
                                                                                                                                                              qqqq
                                                                                                                                                               qqqq
                       250




                                                                                      q




                                                                                                                          250
                                                                                                                                                            qq
                                                                                                                                                        qqq
                                                                                                                                                         qq
 Number of Neighbors




                                                                                                    Number of Neighbors
                                                                                  q                                                                   q
                       200




                                                                                                                          200
                                                                                                                                                   qq
                                                                                  q


                                                                              q
                                                                                                                                                  q
                       150




                                                                                                                          150
                                                                                                                                              qq
                                                                                                                                               q

                                                                                                                                              q
                                                                                                                                             q
                                                                                                                                          q
                       100




                                                                                                                          100
                                                                              q

                                                                                                                                        qq
                                                                                                                                         q



                                                                          q
                       50




                                                                                                                                      q




                                                                                                                          50
                                                                         qq
                                                                          q
                                                                      qq
                                                                       qq                                                           q
                                                                  qqq
                                                                   qq
                                                       qqqqqqqqqq
                                                        qqqqqqqqqq
                             qqqqqqqqqqqqqqqqqqqqqqqqq
                              qqqqqqqqqqqqqqqqqqqqqqqq
                       0




                                                                                                                                qqq
                                                                                                                                 qq




                                                                                                                          0
                             0        20          40         60         80                   100                                0                  20             40         60         80         100
                                                       R                                                                                                               R

                                           Sparse                                                                                                          Dense
Outlier Detection Using R-NN Curves      Methods for Diversity Analysis
           Searching for HIV Integrase Inhibitors   Generating & Summarizing R-NN Curves
                                       Summary      Using R-NN Curves


Characterizing an R-NN Curve


  Converting the Plot to Numbers
      Since R-NN curves are sigmoidal, fit them to the logistic
      equation
                                   1 + m e −R/τ
                          NN = a ·
                                    1 + n e −R/τ
      m, n should characterize the curve
      Problems
           Two parameters
           Non-linear fitting is dependent on the starting point
           For some starting points, the fit does not converge and
           requires repetition
Outlier Detection Using R-NN Curves      Methods for Diversity Analysis
           Searching for HIV Integrase Inhibitors   Generating & Summarizing R-NN Curves
                                       Summary      Using R-NN Curves


Characterizing an R-NN Curve


Converting the Plot to a Number
Determine the value of R where the
lower tail transitions to the linear
portion of the curve                                                                        S=0

                                                                                      S’’
Solution
     Determine the slope at various
     points on the curve                                                         S’
                                                              S=0
    Find R for the first occurence of
    the maximal slope (Rmax(S) )
    Can be achieved using a finite
    difference approach
Outlier Detection Using R-NN Curves      Methods for Diversity Analysis
           Searching for HIV Integrase Inhibitors   Generating & Summarizing R-NN Curves
                                       Summary      Using R-NN Curves


Characterizing an R-NN Curve


Converting the Plot to a Number
Determine the value of R where the
lower tail transitions to the linear
portion of the curve                                                                        S=0

                                                                                      S’’
Solution
     Determine the slope at various
     points on the curve                                                         S’
                                                              S=0
    Find R for the first occurence of
    the maximal slope (Rmax(S) )
    Can be achieved using a finite
    difference approach
Outlier Detection Using R-NN Curves      Methods for Diversity Analysis
           Searching for HIV Integrase Inhibitors   Generating & Summarizing R-NN Curves
                                       Summary      Using R-NN Curves


Characterizing Multiple R-NN Curves

Problem
    Visual inspection of curves is useful
    for a few molecules




                                                                      80
    For larger datasets we need to
    summarize R-NN curves




                                                                      60
                                                             Rmax S
                                                                      40
Solution
     Plot Rmax(S) values for each




                                                                      20
     molecule in the dataset
    Points at the top of the plot are

                                                                      0
                                                                           0   50   100   150   200   250

                                                                                    Serial Number
    located in the sparsest regions
    Points at the bottom are located in
    the densest regions
Outlier Detection Using R-NN Curves        Methods for Diversity Analysis
                  Searching for HIV Integrase Inhibitors     Generating & Summarizing R-NN Curves
                                              Summary        Using R-NN Curves


How Can We Use It For Large Datasets?



  Breaking the O(n2 ) barrier
         Traditional NN detection has a time complexity of O(n2 )
         Modern NN algorithms such as kD-trees
                 have lower time complexity
                 restricted to the exact NN problem
         Solution is to use approximate NN algorithms such as Locality
         Sensitive Hashing (LSH)




  Bentley, J.; Commun. ACM 1980, 23, 214–229
  Datar, M. et al.; SCG ’04: Proc. 20th Symp. Comp. Geom.; ACM Press, 2004
  Dutta, D.; Guha, R.; Jurs, P.; Chen, T.; J. Chem. Inf. Model. 2006, 46, 321–333
Outlier Detection Using R-NN Curves        Methods for Diversity Analysis
                  Searching for HIV Integrase Inhibitors     Generating & Summarizing R-NN Curves
                                              Summary        Using R-NN Curves


How Can We Use It For Large Datasets?




                                                                                              −1.0
                                                                                              −1.5
                                                                log [Mean Query Time (sec)]
Why LSH?




                                                                                              −2.0
   Theoretically sublinear




                                                                                              −2.5
      Shown to be 3 orders of




                                                                                              −3.0
                                                                                                                      LSH (142 descriptors)
                                                                                                                      LSH (20 descriptors)
      magnitude faster than                                                                                           kNN (142 descriptors)
                                                                                                                      kNN (20 descriptors)




                                                                                              −3.5
      traditional kNN
                                                                                                     0.02 0.03 0.04 0.05 0.06 0.07 0.08
      Very accurate (> 94%)                                                                                       Radius



                                                             Comparison of NN detection speed
                                                             on a 42,000 compound dataset
                                                             using a 200 point query set
  Dutta, D.; Guha, R.; Jurs, P.; Chen, T.; J. Chem. Inf. Model. 2006, 46, 321–333
Outlier Detection Using R-NN Curves      Methods for Diversity Analysis
          Searching for HIV Integrase Inhibitors   Generating & Summarizing R-NN Curves
                                      Summary      Using R-NN Curves


Alternatives?


Why not use PCA?
   R-NN curves are fundamentally a




                                                                                     4
   form of dimension reduction




                                                                                     2
    Principal Components Analysis is




                                                             Principal Component 2
                                                                                     0
    also a form of dimension reduction




                                                                                     −2
Disadvantages




                                                                                     −4
    Eigendecomposition via SVD is



                                                                                     −6
    O(n3 )                                                                                0            5              10

                                                                                              Principal Component 1
    Difficult to visualize more than 2 or
    3 PC’s at the same time
Outlier Detection Using R-NN Curves         Methods for Diversity Analysis
                  Searching for HIV Integrase Inhibitors      Generating & Summarizing R-NN Curves
                                              Summary         Using R-NN Curves


Datasets

  Boiling point dataset
       277 molecules
          Average: MW = 115, Stanimoto = 0.20
          Calculated 214 descriptors, reduced to 64

  Kazius-AMES dataset
      4337 molecules
          Average MW = 240, Stanimoto = 0.21
          Known to have a number of significant outliers
          Calculated 142 MOE descriptors, reduced to 45


  Goll, E.; Jurs, P.; J. Chem. Inf. Comput. Sci. 1999, 39, 974–983
  Kazius, J.; McGuire, R.; Bursi, R.; J. Med. Chem. 2005, 48, 312–320
Outlier Detection Using R-NN Curves      Methods for Diversity Analysis
           Searching for HIV Integrase Inhibitors   Generating & Summarizing R-NN Curves
                                       Summary      Using R-NN Curves


Choosing a Descriptor Space

  Boiling point dataset
       We had previously generated linear regression models using a
       GA to search for subsets
      Best model had 4 descriptors

  Kazius-AMES dataset
      No models were generated for this dataset
      Considered random descriptor subsets
      Used a 5-descriptor subset to represent the results

  The R-NN curve approach focuses on the distribution of molecules
  in a given descriptor space. Hence a good or random descriptor
  subset should be able to highlight the utility of the method
Outlier Detection Using R-NN Curves      Methods for Diversity Analysis
           Searching for HIV Integrase Inhibitors   Generating & Summarizing R-NN Curves
                                       Summary      Using R-NN Curves


Choosing a Descriptor Space

  Boiling point dataset
       We had previously generated linear regression models using a
       GA to search for subsets
      Best model had 4 descriptors

  Kazius-AMES dataset
      No models were generated for this dataset
      Considered random descriptor subsets
      Used a 5-descriptor subset to represent the results

  The R-NN curve approach focuses on the distribution of molecules
  in a given descriptor space. Hence a good or random descriptor
  subset should be able to highlight the utility of the method
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications
Navigating Molecular Haystacks: Tools & Applications

Weitere ähnliche Inhalte

Andere mochten auch

GICD Webinar Sample Advisor Woman Client Stress
GICD Webinar Sample Advisor Woman Client StressGICD Webinar Sample Advisor Woman Client Stress
GICD Webinar Sample Advisor Woman Client StressElmer Rich
 
People and Performance UK
People and Performance UKPeople and Performance UK
People and Performance UKpphrskp
 
мастерская деда мороза2011
мастерская деда мороза2011мастерская деда мороза2011
мастерская деда мороза2011GALINA kOSHKINA
 
Tutorial para compras on line
Tutorial para compras on lineTutorial para compras on line
Tutorial para compras on lineLisbeth Ollarves
 
Astronomy libraries - your gateway to information
Astronomy libraries - your gateway to informationAstronomy libraries - your gateway to information
Astronomy libraries - your gateway to informationUta Grothkopf
 
Vector Graphics W4
Vector Graphics W4Vector Graphics W4
Vector Graphics W4twmffat
 
Casey shane pcp-o_week4_final_ppp_visuals
Casey shane pcp-o_week4_final_ppp_visualsCasey shane pcp-o_week4_final_ppp_visuals
Casey shane pcp-o_week4_final_ppp_visualsShane Casey
 
Evaluation
EvaluationEvaluation
EvaluationJhynel
 
Maintenance management best practices indonesia outline.compressed
Maintenance management best practices indonesia outline.compressedMaintenance management best practices indonesia outline.compressed
Maintenance management best practices indonesia outline.compressedFasih Lisan
 
Pr агентства аиа 3013
Pr агентства аиа 3013Pr агентства аиа 3013
Pr агентства аиа 3013Red Keds
 
How Stupid Can We Get Notes
How Stupid Can We Get NotesHow Stupid Can We Get Notes
How Stupid Can We Get Notesguru06
 
Powerpoint Location Recce
Powerpoint Location ReccePowerpoint Location Recce
Powerpoint Location Recceneasterla
 
Brand Storrytelling Ruslan Kovalev Liliya Mamleeva Brandspoint10
Brand Storrytelling Ruslan Kovalev Liliya Mamleeva Brandspoint10Brand Storrytelling Ruslan Kovalev Liliya Mamleeva Brandspoint10
Brand Storrytelling Ruslan Kovalev Liliya Mamleeva Brandspoint10BrandsPoint
 

Andere mochten auch (19)

GICD Webinar Sample Advisor Woman Client Stress
GICD Webinar Sample Advisor Woman Client StressGICD Webinar Sample Advisor Woman Client Stress
GICD Webinar Sample Advisor Woman Client Stress
 
Fic papo
Fic papoFic papo
Fic papo
 
QSAR inquiries, LLC
QSAR inquiries, LLCQSAR inquiries, LLC
QSAR inquiries, LLC
 
People and Performance UK
People and Performance UKPeople and Performance UK
People and Performance UK
 
мастерская деда мороза2011
мастерская деда мороза2011мастерская деда мороза2011
мастерская деда мороза2011
 
Tutorial para compras on line
Tutorial para compras on lineTutorial para compras on line
Tutorial para compras on line
 
Astronomy libraries - your gateway to information
Astronomy libraries - your gateway to informationAstronomy libraries - your gateway to information
Astronomy libraries - your gateway to information
 
Vector Graphics W4
Vector Graphics W4Vector Graphics W4
Vector Graphics W4
 
Casey shane pcp-o_week4_final_ppp_visuals
Casey shane pcp-o_week4_final_ppp_visualsCasey shane pcp-o_week4_final_ppp_visuals
Casey shane pcp-o_week4_final_ppp_visuals
 
Evaluation
EvaluationEvaluation
Evaluation
 
Maintenance management best practices indonesia outline.compressed
Maintenance management best practices indonesia outline.compressedMaintenance management best practices indonesia outline.compressed
Maintenance management best practices indonesia outline.compressed
 
Having Fun with Local WordPress Development
Having Fun with Local WordPress DevelopmentHaving Fun with Local WordPress Development
Having Fun with Local WordPress Development
 
Pr агентства аиа 3013
Pr агентства аиа 3013Pr агентства аиа 3013
Pr агентства аиа 3013
 
How Stupid Can We Get Notes
How Stupid Can We Get NotesHow Stupid Can We Get Notes
How Stupid Can We Get Notes
 
Powerpoint Location Recce
Powerpoint Location ReccePowerpoint Location Recce
Powerpoint Location Recce
 
1E - CP House Refurbishment
1E - CP House Refurbishment1E - CP House Refurbishment
1E - CP House Refurbishment
 
Vistas taller
Vistas tallerVistas taller
Vistas taller
 
Bulk Email Sender
Bulk Email SenderBulk Email Sender
Bulk Email Sender
 
Brand Storrytelling Ruslan Kovalev Liliya Mamleeva Brandspoint10
Brand Storrytelling Ruslan Kovalev Liliya Mamleeva Brandspoint10Brand Storrytelling Ruslan Kovalev Liliya Mamleeva Brandspoint10
Brand Storrytelling Ruslan Kovalev Liliya Mamleeva Brandspoint10
 

Ähnlich wie Navigating Molecular Haystacks: Tools & Applications

Time series compare
Time series compareTime series compare
Time series comparelrhutyra
 
Living In The Cloud
Living In The CloudLiving In The Cloud
Living In The Cloudacme
 
boston scientific1996_annual
boston scientific1996_annualboston scientific1996_annual
boston scientific1996_annualfinance28
 
boston scientific1996_annual
boston scientific1996_annualboston scientific1996_annual
boston scientific1996_annualfinance28
 
Regression Modelling Overview
Regression Modelling OverviewRegression Modelling Overview
Regression Modelling OverviewPaul Hewson
 
Weak Lensing Simulator
Weak Lensing SimulatorWeak Lensing Simulator
Weak Lensing SimulatorCorey Chivers
 
Introduction to power laws
Introduction to power lawsIntroduction to power laws
Introduction to power lawsColin Gillespie
 
Survey of Esoko Users
Survey of Esoko UsersSurvey of Esoko Users
Survey of Esoko UsersEsoko
 
Robust parametric classification and variable selection with minimum distance...
Robust parametric classification and variable selection with minimum distance...Robust parametric classification and variable selection with minimum distance...
Robust parametric classification and variable selection with minimum distance...echi99
 
Example sweavefunnelplot
Example sweavefunnelplotExample sweavefunnelplot
Example sweavefunnelplotTony Hirst
 
ICCS2010
ICCS2010ICCS2010
ICCS2010larry77
 
Liquid Measurement Story
Liquid Measurement StoryLiquid Measurement Story
Liquid Measurement Storycheggi1
 
F1 2011 Korea Race Report
F1 2011 Korea Race ReportF1 2011 Korea Race Report
F1 2011 Korea Race ReportTony Hirst
 
F1 India 2011: Free Practice Summary Stats
F1 India 2011: Free Practice Summary StatsF1 India 2011: Free Practice Summary Stats
F1 India 2011: Free Practice Summary StatsTony Hirst
 

Ähnlich wie Navigating Molecular Haystacks: Tools & Applications (19)

Time series compare
Time series compareTime series compare
Time series compare
 
Slides lyon-2011
Slides lyon-2011Slides lyon-2011
Slides lyon-2011
 
Living In The Cloud
Living In The CloudLiving In The Cloud
Living In The Cloud
 
boston scientific1996_annual
boston scientific1996_annualboston scientific1996_annual
boston scientific1996_annual
 
boston scientific1996_annual
boston scientific1996_annualboston scientific1996_annual
boston scientific1996_annual
 
Regression Modelling Overview
Regression Modelling OverviewRegression Modelling Overview
Regression Modelling Overview
 
Slides p6
Slides p6Slides p6
Slides p6
 
Slides mcneil
Slides mcneilSlides mcneil
Slides mcneil
 
Weak Lensing Simulator
Weak Lensing SimulatorWeak Lensing Simulator
Weak Lensing Simulator
 
Introduction to power laws
Introduction to power lawsIntroduction to power laws
Introduction to power laws
 
Slides alexander-mcneil
Slides alexander-mcneilSlides alexander-mcneil
Slides alexander-mcneil
 
Survey of Esoko Users
Survey of Esoko UsersSurvey of Esoko Users
Survey of Esoko Users
 
Robust parametric classification and variable selection with minimum distance...
Robust parametric classification and variable selection with minimum distance...Robust parametric classification and variable selection with minimum distance...
Robust parametric classification and variable selection with minimum distance...
 
Example sweavefunnelplot
Example sweavefunnelplotExample sweavefunnelplot
Example sweavefunnelplot
 
Clustering Plot
Clustering PlotClustering Plot
Clustering Plot
 
ICCS2010
ICCS2010ICCS2010
ICCS2010
 
Liquid Measurement Story
Liquid Measurement StoryLiquid Measurement Story
Liquid Measurement Story
 
F1 2011 Korea Race Report
F1 2011 Korea Race ReportF1 2011 Korea Race Report
F1 2011 Korea Race Report
 
F1 India 2011: Free Practice Summary Stats
F1 India 2011: Free Practice Summary StatsF1 India 2011: Free Practice Summary Stats
F1 India 2011: Free Practice Summary Stats
 

Mehr von Rajarshi Guha

Pharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomePharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomeRajarshi Guha
 
Pharos: Putting targets in context
Pharos: Putting targets in contextPharos: Putting targets in context
Pharos: Putting targets in contextRajarshi Guha
 
Pharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark GenomePharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark GenomeRajarshi Guha
 
Pharos - Face of the KMC
Pharos - Face of the KMCPharos - Face of the KMC
Pharos - Face of the KMCRajarshi Guha
 
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformEnhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformRajarshi Guha
 
What can your library do for you?
What can your library do for you?What can your library do for you?
What can your library do for you?Rajarshi Guha
 
So I have an SD File … What do I do next?
So I have an SD File … What do I do next?So I have an SD File … What do I do next?
So I have an SD File … What do I do next?Rajarshi Guha
 
Characterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsCharacterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsRajarshi Guha
 
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action: Bridging Chemistry and Biology with Informatics at NCATSFrom Data to Action: Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATSRajarshi Guha
 
Robots, Small Molecules & R
Robots, Small Molecules & RRobots, Small Molecules & R
Robots, Small Molecules & RRajarshi Guha
 
Fingerprinting Chemical Structures
Fingerprinting Chemical StructuresFingerprinting Chemical Structures
Fingerprinting Chemical StructuresRajarshi Guha
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...Rajarshi Guha
 
When the whole is better than the parts
When the whole is better than the partsWhen the whole is better than the parts
When the whole is better than the partsRajarshi Guha
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...Rajarshi Guha
 
Pushing Chemical Biology Through the Pipes
Pushing Chemical Biology Through the PipesPushing Chemical Biology Through the Pipes
Pushing Chemical Biology Through the PipesRajarshi Guha
 
Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Rajarshi Guha
 
The BioAssay Research Database
The BioAssay Research DatabaseThe BioAssay Research Database
The BioAssay Research DatabaseRajarshi Guha
 
Cloudy with a Touch of Cheminformatics
Cloudy with a Touch of CheminformaticsCloudy with a Touch of Cheminformatics
Cloudy with a Touch of CheminformaticsRajarshi Guha
 
Chemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & ReproducibleChemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & ReproducibleRajarshi Guha
 
Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Rajarshi Guha
 

Mehr von Rajarshi Guha (20)

Pharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomePharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark Genome
 
Pharos: Putting targets in context
Pharos: Putting targets in contextPharos: Putting targets in context
Pharos: Putting targets in context
 
Pharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark GenomePharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark Genome
 
Pharos - Face of the KMC
Pharos - Face of the KMCPharos - Face of the KMC
Pharos - Face of the KMC
 
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformEnhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
 
What can your library do for you?
What can your library do for you?What can your library do for you?
What can your library do for you?
 
So I have an SD File … What do I do next?
So I have an SD File … What do I do next?So I have an SD File … What do I do next?
So I have an SD File … What do I do next?
 
Characterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsCharacterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network Models
 
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action: Bridging Chemistry and Biology with Informatics at NCATSFrom Data to Action: Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
 
Robots, Small Molecules & R
Robots, Small Molecules & RRobots, Small Molecules & R
Robots, Small Molecules & R
 
Fingerprinting Chemical Structures
Fingerprinting Chemical StructuresFingerprinting Chemical Structures
Fingerprinting Chemical Structures
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
 
When the whole is better than the parts
When the whole is better than the partsWhen the whole is better than the parts
When the whole is better than the parts
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
 
Pushing Chemical Biology Through the Pipes
Pushing Chemical Biology Through the PipesPushing Chemical Biology Through the Pipes
Pushing Chemical Biology Through the Pipes
 
Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...
 
The BioAssay Research Database
The BioAssay Research DatabaseThe BioAssay Research Database
The BioAssay Research Database
 
Cloudy with a Touch of Cheminformatics
Cloudy with a Touch of CheminformaticsCloudy with a Touch of Cheminformatics
Cloudy with a Touch of Cheminformatics
 
Chemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & ReproducibleChemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & Reproducible
 
Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?
 

Kürzlich hochgeladen

INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptshraddhaparab530
 

Kürzlich hochgeladen (20)

YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 

Navigating Molecular Haystacks: Tools & Applications

  • 1. Outlier Detection Using R-NN Curves Searching for HIV Integrase Inhibitors Summary Navigating Molecular Haystacks: Tools & Applications Finding Interesting Molecules & Doing it Fast Rajarshi Guha Department of Chemistry Pennsylvania State University 20th April, 2006
  • 2. Outlier Detection Using R-NN Curves Searching for HIV Integrase Inhibitors Summary Past Work QSAR Applications Artemisinin analogs PDGFR inhibitors Bleaching agents Linear & non-linear methods QSAR Methods Representative QSAR sets Model interpretation Model applicability
  • 3. Outlier Detection Using R-NN Curves Searching for HIV Integrase Inhibitors Summary Past Work Cheminformatics Automated QSAR pipeline Contributions to the CDK Cheminformatics webservices R packages and snippets
  • 4. Outlier Detection Using R-NN Curves Searching for HIV Integrase Inhibitors Summary Past Work Chemical Data Mining Approximate k-NN Local regression Outlier detection VS protocol
  • 5. Outlier Detection Using R-NN Curves Searching for HIV Integrase Inhibitors Summary Past Work Chemical Data Mining Approximate k-NN Local regression Outlier detection VS protocol
  • 6. Outlier Detection Using R-NN Curves Searching for HIV Integrase Inhibitors Summary Outline 1 Outlier Detection Using R-NN Curves Methods for Diversity Analysis Generating & Summarizing R-NN Curves Using R-NN Curves 2 Searching for HIV Integrase Inhibitors Previous Work on HIV Integrase A Tiered Virtual Screening Protocol What Does the Pipeline Give Us? 3 Summary
  • 7. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves Outline 1 Outlier Detection Using R-NN Curves Methods for Diversity Analysis Generating & Summarizing R-NN Curves Using R-NN Curves 2 Searching for HIV Integrase Inhibitors Previous Work on HIV Integrase A Tiered Virtual Screening Protocol What Does the Pipeline Give Us? 3 Summary
  • 8. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves Diversity Analysis Why is it Important? Compound acquisition Lead hopping Knowledge of the distribution of compounds in a descriptor space may improve predictive models
  • 9. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves Approaches to Diversity Analysis Cell based q q q Divide space into bins q q q q qq q q qq q q q qq q q q q q q q qq q q qq qq q q qq q qq q q q q qq q q q Compounds are mapped to bins q q q q q qqq q qq q q qqq q qq q q q q q q q q q q q qqq q q q q q qqq q q q q q q q q q q qq q q q q q q q q qq q q q q q q qq q q q q q q q q q q q qq q q q q q q Disadvantages q qq q q q q Not useful for high dimensional data Choosing the bin size can be tricky Schnur, D.; J. Chem. Inf. Comput. Sci. 1999, 39, 36–45 Agrafiotis, D.; Rassokhin, D.; J. Chem. Inf. Comput. Sci. 2002, 42, 117–12
  • 10. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves Approaches to Diversity Analysis q Distance based q q q q q q Considers distance between q q q q q compounds in a space q q q q q q q q q q q Generally requires pairwise distance q q q q calculation q q q q q q q q q q q q q Can be sped up by kD trees, MVP q q q q q trees etc. q q q q q Agrafiotis, D. K.; Lobanov, V. S.; J. Chem. Inf. Comput. Sci. 1999, 39, 51–58
  • 11. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves Generating an R-NN Curve Observations Consider a query point with a hypersphere, of radius R, centered on it For small R, the hypersphere will contain very few or no neighbors For larger R, the number of neighbors will increase When R ≥ Dmax , the neighbor set is the whole dataset The question is . . . Does the variation of nearest neighbor count with radius allow us to characterize the location of a query point in a dataset?
  • 12. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves Generating an R-NN Curve Observations Consider a query point with a hypersphere, of radius R, centered on it For small R, the hypersphere will contain very few or no neighbors For larger R, the number of neighbors will increase When R ≥ Dmax , the neighbor set is the whole dataset The question is . . . Does the variation of nearest neighbor count with radius allow us to characterize the location of a query point in a dataset?
  • 13. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves Generating an R-NN Curve Algorithm q q 50% q q Dmax ← max pairwise distance q q q q qq 30% for molecule in dataset do q q q q q q q q q q q q q q q q q q q R ← 0.01 × Dmax q q q q q q q qq q q q q q q q q qq q q q q q q 10% q q q q q q q q q q q 5% while R ≤ Dmax do q q q q qq qq q q q q q q qq q q q q qq q q q q q q q q q q q q qq q q q q q q q q Find NN’s within radius R q q q q q q q q q q q q q q q q q q q Increment R q q q q q q q q q q q q q q q q q end while q end for q Plot NN count vs. R Guha, R.; Dutta, D.; Jurs, P.C.; Chen, T.; J. Chem. Inf. Model., submitted
  • 14. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves Generating an R-NN Curve qq qq qqqq qqq qqqqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqqqq q qqqqqq qqqqq q qqqq qqqq 250 q 250 qq qqq qq Number of Neighbors Number of Neighbors q q 200 200 qq q q q 150 150 qq q q q q 100 100 q qq q q 50 q 50 qq q qq qq q qqq qq qqqqqqqqqq qqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqqqqqqqqqqqqq 0 qqq qq 0 0 20 40 60 80 100 0 20 40 60 80 100 R R Sparse Dense
  • 15. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves Characterizing an R-NN Curve Converting the Plot to Numbers Since R-NN curves are sigmoidal, fit them to the logistic equation 1 + m e −R/τ NN = a · 1 + n e −R/τ m, n should characterize the curve Problems Two parameters Non-linear fitting is dependent on the starting point For some starting points, the fit does not converge and requires repetition
  • 16. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves Characterizing an R-NN Curve Converting the Plot to a Number Determine the value of R where the lower tail transitions to the linear portion of the curve S=0 S’’ Solution Determine the slope at various points on the curve S’ S=0 Find R for the first occurence of the maximal slope (Rmax(S) ) Can be achieved using a finite difference approach
  • 17. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves Characterizing an R-NN Curve Converting the Plot to a Number Determine the value of R where the lower tail transitions to the linear portion of the curve S=0 S’’ Solution Determine the slope at various points on the curve S’ S=0 Find R for the first occurence of the maximal slope (Rmax(S) ) Can be achieved using a finite difference approach
  • 18. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves Characterizing Multiple R-NN Curves Problem Visual inspection of curves is useful for a few molecules 80 For larger datasets we need to summarize R-NN curves 60 Rmax S 40 Solution Plot Rmax(S) values for each 20 molecule in the dataset Points at the top of the plot are 0 0 50 100 150 200 250 Serial Number located in the sparsest regions Points at the bottom are located in the densest regions
  • 19. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves How Can We Use It For Large Datasets? Breaking the O(n2 ) barrier Traditional NN detection has a time complexity of O(n2 ) Modern NN algorithms such as kD-trees have lower time complexity restricted to the exact NN problem Solution is to use approximate NN algorithms such as Locality Sensitive Hashing (LSH) Bentley, J.; Commun. ACM 1980, 23, 214–229 Datar, M. et al.; SCG ’04: Proc. 20th Symp. Comp. Geom.; ACM Press, 2004 Dutta, D.; Guha, R.; Jurs, P.; Chen, T.; J. Chem. Inf. Model. 2006, 46, 321–333
  • 20. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves How Can We Use It For Large Datasets? −1.0 −1.5 log [Mean Query Time (sec)] Why LSH? −2.0 Theoretically sublinear −2.5 Shown to be 3 orders of −3.0 LSH (142 descriptors) LSH (20 descriptors) magnitude faster than kNN (142 descriptors) kNN (20 descriptors) −3.5 traditional kNN 0.02 0.03 0.04 0.05 0.06 0.07 0.08 Very accurate (> 94%) Radius Comparison of NN detection speed on a 42,000 compound dataset using a 200 point query set Dutta, D.; Guha, R.; Jurs, P.; Chen, T.; J. Chem. Inf. Model. 2006, 46, 321–333
  • 21. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves Alternatives? Why not use PCA? R-NN curves are fundamentally a 4 form of dimension reduction 2 Principal Components Analysis is Principal Component 2 0 also a form of dimension reduction −2 Disadvantages −4 Eigendecomposition via SVD is −6 O(n3 ) 0 5 10 Principal Component 1 Difficult to visualize more than 2 or 3 PC’s at the same time
  • 22. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves Datasets Boiling point dataset 277 molecules Average: MW = 115, Stanimoto = 0.20 Calculated 214 descriptors, reduced to 64 Kazius-AMES dataset 4337 molecules Average MW = 240, Stanimoto = 0.21 Known to have a number of significant outliers Calculated 142 MOE descriptors, reduced to 45 Goll, E.; Jurs, P.; J. Chem. Inf. Comput. Sci. 1999, 39, 974–983 Kazius, J.; McGuire, R.; Bursi, R.; J. Med. Chem. 2005, 48, 312–320
  • 23. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves Choosing a Descriptor Space Boiling point dataset We had previously generated linear regression models using a GA to search for subsets Best model had 4 descriptors Kazius-AMES dataset No models were generated for this dataset Considered random descriptor subsets Used a 5-descriptor subset to represent the results The R-NN curve approach focuses on the distribution of molecules in a given descriptor space. Hence a good or random descriptor subset should be able to highlight the utility of the method
  • 24. Outlier Detection Using R-NN Curves Methods for Diversity Analysis Searching for HIV Integrase Inhibitors Generating & Summarizing R-NN Curves Summary Using R-NN Curves Choosing a Descriptor Space Boiling point dataset We had previously generated linear regression models using a GA to search for subsets Best model had 4 descriptors Kazius-AMES dataset No models were generated for this dataset Considered random descriptor subsets Used a 5-descriptor subset to represent the results The R-NN curve approach focuses on the distribution of molecules in a given descriptor space. Hence a good or random descriptor subset should be able to highlight the utility of the method