Molecular similarity searching methods in drug discovery

By: Haytham Hijazi
Advisor: Univ-Prof. Hon-Prof. Dr. Dieter Roller

A presentation in the Advanced Graphical Engineering Systems seminar, 2011/2012

                                                                              1
In this work, I propose a contribution to the field of cheminformatics.
   Cheminformatics means solving chemical problems using computational methods [1].

[1] James Rhodes, Stephen Boyer, Jeffrey Kreulen, Ying Chen, Patricia Ordonez, “Mining patents using molecular similarity search”, IBM, Almaden Services Research, Pacific Symposium on Biocomputing 12:304-315 (2007).





                                                                                                                                    2
Agenda
   • The main question in this research
   • The principle of similarity
   • Drug discovery as an application
   • Research problem
   • Molecular representations (1D, 2D, …)
   • Similarity searching
   • Similarity coefficient calculations
   • The probabilistic model (BIM)
   • The contribution (MDC)
   • Experiments, conclusions and discussion
                                                                      3
“Similarity is in the eye of the beholder”
[Figure panels: Shape, Colour, Size, Pattern]




                                                 4
Question:      Which molecules in a database are similar to the query molecule?
Applications:  • Finding better compounds than the initial lead compound (drug discovery)
               • Predicting the properties of an unknown compound




                                                     5
     Structurally similar molecules are assumed to have similar biological properties.

     Similar biological properties → drug discovery.




                                                                   [1]




1. Sylvaine Roy and Laurence Lafanechère, “Chemogenomics and Chemical Genetics: A User's Introduction for Biologists, Chemists and Informaticians”, Molecular similarity, Springer Berlin, ISBN 978-3-642-19614-0, 1st Edition.
                                                                   6
Claim: General manufacturing problems!
                                         7
Molecule representation → Feature selection → Similarity coefficient calculations and ranking for search




                                                              8
   Historical progression
            ◦ Complete structure
            ◦ Substructure

   Descriptors
            ◦ 1D (physicochemical properties), 2D, 3D, and 4D

   Connectivity tables and graph theory!

Image source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
                                                                                       9
SMILES: Simplified Molecular Input Line Entry System

     CC(=O)OC1=CC=CC=C1C(=O)O    (aspirin)
     CCCC1=NN(C2=C1NC(=NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C    (sildenafil)

Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
                                                                                              10
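
A SMILES string is plain text, so simple facts can be read off it directly. The sketch below is only a toy, not a real SMILES parser: it assumes Kekulé-form strings without bracket atoms or lower-case aromatic atoms (as in the two examples on this slide) and counts the heavy atoms per element. The names `ATOM_PATTERN` and `heavy_atom_counts` are made up for this illustration.

```python
import re
from collections import Counter

# Toy pattern: unbracketed, upper-case organic-subset atoms only.
# Two-letter symbols (Cl, Br) are listed first so they win over C and B.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]")


def heavy_atom_counts(smiles: str) -> Counter:
    """Count heavy atoms per element in a simple Kekulé-form SMILES string."""
    return Counter(ATOM_PATTERN.findall(smiles))


aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"
sildenafil = "CCCC1=NN(C2=C1NC(=NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C"

print(dict(heavy_atom_counts(aspirin)))
print(dict(heavy_atom_counts(sildenafil)))
```

Real work would use a cheminformatics toolkit, since ring closures, charges and aromaticity need a proper parser.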
   A fingerprint is a vector encoding the presence (‘1’) or absence (‘0’) of fragment substructures in a molecule.

   Fingerprints are either dictionary-based or hash-based.

              Descriptor          Fragment
              1                   AR
              2                   CCCCN
              3                   Me
              9                   NH2

                               [1]
                                                                                          [2]

2. Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
                                                                                                11
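
A dictionary-based fingerprint can be sketched in a few lines. This is an illustration under loud assumptions: the fragment dictionary below is hypothetical (it is not the table on the slide), and plain substring matching on the SMILES text stands in for real substructure matching, which a toolkit would do on the molecular graph.

```python
# Hypothetical fragment dictionary: SMILES substrings chosen only for illustration.
FRAGMENT_DICTIONARY = ["C(=O)O", "S(=O)(=O)N", "OCC", "C1=CC=CC=C1"]


def dictionary_fingerprint(smiles, fragments=FRAGMENT_DICTIONARY):
    """Return one bit per dictionary entry: 1 if the fragment text occurs, else 0.

    Substring matching is a stand-in for true substructure search.
    """
    return [1 if fragment in smiles else 0 for fragment in fragments]


aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"
sildenafil = "CCCC1=NN(C2=C1NC(=NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C"

print(dictionary_fingerprint(aspirin))
print(dictionary_fingerprint(sildenafil))
```

Each bit position corresponds to one dictionary entry, which is what makes dictionary-based fingerprints easy to interpret compared with hashed ones.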
   In 3D keys, the position of each bit corresponds to a certain range of distances or angles.
   Computationally complex.

Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
                                                                                 12
Molecule representation → Feature selection → Similarity coefficient calculations and ranking for search




                                                              13
   Structure search:
      • Exact structure search
      • Substructure search

   Similarity searching: maximum common subgraph isomorphism, Tanimoto/Dice/Cosine coefficients




                                                14
   The similarity measure (coefficient) is a quantitative measure of similarity.

   It is used to rank the results of the query.

   Results are sorted in decreasing order of similarity.

   Types of coefficients:
      • Distance coefficients
      • Probabilistic coefficients
      • Correlation coefficients
      • Association coefficients


                                                15
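
The ranking step can be sketched as follows. This is a minimal illustration, assuming each fingerprint is stored as a set of on-bit positions and that the Tanimoto coefficient is the chosen measure; the molecule names and bit sets are invented for the example.

```python
def tanimoto(query_bits, candidate_bits):
    """Tanimoto coefficient for fingerprints given as sets of on-bit positions."""
    union = len(query_bits | candidate_bits)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(query_bits & candidate_bits) / union


def rank_by_similarity(query_bits, database):
    """Score every database entry and sort by decreasing similarity."""
    scored = [(name, tanimoto(query_bits, bits)) for name, bits in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


query = {1, 4, 7, 9}
database = {
    "mol_a": {1, 4, 7},
    "mol_b": {2, 3},
    "mol_c": {1, 4, 7, 9, 12},
}

for name, score in rank_by_similarity(query, database):
    print(name, round(score, 2))
```

For a distance coefficient the sort order would be reversed, since smaller distances mean more similar molecules.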
Notation: a = number of bits set in A, b = number of bits set in B,
          c = number of bits set in both, d = number of bits set in neither.

Association coefficients
           Simple matching coefficient          (c + d) / (a + b - c + d)
           Jaccard measure (Tanimoto)           c / (a + b - c)  = |A AND B| / |A OR B|
           Cosine, Ochiai                       c / √(a·b)
           Dice                                 2c / (a + b)
Distance coefficients
           Hamming distance                     a + b - 2c
           Euclidean distance                   √(a + b - 2c)
           Soergel distance                     (a + b - 2c) / (a + b - c)
Other coefficients
           Pattern difference                   (a - c)(b - c) / (a + b - c + d)²
           Size difference                      (a - b)² / (a + b - c + d)²

Naomie Salim, “The study of probability model for compound similarity searching”, UTM Research Management Centre Project Vote – 75207, University of Malaysia, 2009
                                                                                                 16
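
The coefficients are easy to compute once the four counts are known. The sketch below uses the standard binary-fingerprint definitions and assumes the notation a = bits set in A, b = bits set in B, c = bits set in both, d = bits set in neither. The sample counts (13, 8, 6) are the ones from the worked Tanimoto example, with d = 17 following from a 32-bit fingerprint. The function name `coefficients` is made up for this illustration, and it assumes both fingerprints have at least one bit set.

```python
import math


def coefficients(a, b, c, d):
    """Association and distance coefficients from binary fingerprint counts.

    a, b: bits set in A and in B; c: bits set in both; d: bits set in neither.
    Assumes a > 0 and b > 0 so that no denominator is zero.
    """
    n = a + b - c + d  # total fingerprint length
    return {
        "simple_matching": (c + d) / n,
        "tanimoto": c / (a + b - c),
        "cosine": c / math.sqrt(a * b),
        "dice": 2 * c / (a + b),
        "hamming": a + b - 2 * c,
        "euclidean": math.sqrt(a + b - 2 * c),
        "soergel": (a + b - 2 * c) / (a + b - c),
    }


values = coefficients(a=13, b=8, c=6, d=17)
for name, value in values.items():
    print(f"{name}: {value:.4f}")
```

Note that the Soergel distance is the complement of the Tanimoto coefficient (here 0.6 and 0.4), which is a quick sanity check on any implementation.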
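   Assume we generate fragment-based fingerprint bits:
   Molecule A:
       00010100010101000101010011110100
   Molecule B:
       00000000100101001001000011100000

   Tanimoto coefficient = c / (a + b - c)
   where a = number of bits set in A, b = number of bits set in B, and c = number of bits set in (A AND B)

   Tanimoto = 6 / (13 + 8 - 6) = 6/15 = 0.4

   [Venn diagram: a (bits set in A), c (bits in common), b (bits set in B)]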

                                                  17
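
The worked example can be checked mechanically. This sketch takes the two bit strings from the slide as text, counts the set bits, and applies c / (a + b - c); the helper names are invented for the illustration.

```python
MOLECULE_A = "00010100010101000101010011110100"
MOLECULE_B = "00000000100101001001000011100000"


def bit_counts(x, y):
    """Return (a, b, c): bits set in x, bits set in y, bits set in both."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length")
    a = x.count("1")
    b = y.count("1")
    c = sum(1 for p, q in zip(x, y) if p == "1" and q == "1")
    return a, b, c


def tanimoto_from_bitstrings(x, y):
    a, b, c = bit_counts(x, y)
    return c / (a + b - c)


print(bit_counts(MOLECULE_A, MOLECULE_B))
print(tanimoto_from_bitstrings(MOLECULE_A, MOLECULE_B))
```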
   Associate the relevance of a structure with an explicit feature.

           p_i = probability that bit b_i appears in an active structure.
           q_i = probability that bit b_i appears in an inactive structure.
           α_i is a binary selector: α_i = 1 if the bit occurs in the structure; otherwise α_i = 0 and the complementary probability (1 - p_i or 1 - q_i) is used.
           P(A|S) is the probability of an active structure given S.
           P(NA|S) is the probability of an inactive structure given S.
           P(A) is the prior probability of actives.
           P(NA) is the prior probability of inactives.

Naomie Salim, “The study of probability model for compound similarity searching”, UTM Research Management Centre Project Vote – 75207, University of Malaysia, 2009
                                                                                                                        18
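
The formula itself did not survive in the slide text, so the following is a sketch of the usual binary independence assumption rather than the exact expression from the cited report: bits are treated as independent, P(S|A) is the product of p_i for set bits and (1 - p_i) for unset bits (likewise with q_i for inactives), and Bayes' rule gives P(A|S). All numbers below are invented for the example.

```python
def likelihood(alpha, probabilities):
    """P(S | class) under bit independence.

    alpha[i] is 1 if bit i occurs in structure S, else 0.
    probabilities[i] is the chance that bit i is set in that class.
    """
    result = 1.0
    for alpha_i, prob_i in zip(alpha, probabilities):
        result *= prob_i if alpha_i == 1 else (1.0 - prob_i)
    return result


def posterior_active(alpha, p, q, prior_active):
    """P(A | S) via Bayes' rule, with P(NA) = 1 - P(A)."""
    weighted_active = likelihood(alpha, p) * prior_active
    weighted_inactive = likelihood(alpha, q) * (1.0 - prior_active)
    return weighted_active / (weighted_active + weighted_inactive)


# Invented example: two bits, the first set and the second unset.
alpha = [1, 0]
p = [0.8, 0.3]   # p_i: bit frequencies among actives
q = [0.2, 0.4]   # q_i: bit frequencies among inactives
print(posterior_active(alpha, p, q, prior_active=0.1))
```

In practice p_i and q_i would be estimated from a training set of known actives and inactives, and structures would be ranked by P(A|S) or by an equivalent log-odds score.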
Claim: General manufacturing problems!
                                          19
[Diagram elements: molecular dynamics simulation tool; physicochemical properties; classification algorithm; voting; Class 1, Class 2, …, Class n; active compounds database]

                                                               20
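
The slides do not specify the classifier or the voting rule, so the following shows only one possible reading of the voting step: several classifiers each propose a class for a compound, and the most frequent label wins. The function name and the labels are hypothetical.

```python
from collections import Counter


def majority_vote(predicted_classes):
    """Return the most frequent class label among the classifier outputs.

    Ties are broken in favour of the label that was seen first.
    """
    if not predicted_classes:
        raise ValueError("at least one prediction is required")
    return Counter(predicted_classes).most_common(1)[0][0]


# Hypothetical outputs of three classifiers for one compound.
votes = ["Class 1", "Class 2", "Class 1"]
print(majority_vote(votes))
```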
   Better insight into similarity in terms of bioactivity, toxicity, reactivity, ... (+)

   Search time (+)

   Prediction and voting possibilities (+)

   Cost of simulation tools (-)

   Classification errors (-)


                                                      21
   Materials Explorer




   Itemtracker: Freezer/Cryogen sample tracking system


   CHARMM


   MDynaMix




                                                          22
[Chart: Fingerprint generation time. Y-axis: time (ms), 0 to 30. X-axis: max path length, 4 to 8. Series: 2 bits, 3 bits, 4 bits.]

                                            Consider if we have more than 1000 bits!

Data source: simulation tool indicated in the report [17]
                                                                                                            23
[Chart: Hit rate versus selection size. Y-axis: hit rate, 0 to 0.18. X-axis: selection size, 0 to 2500.]

   The more we increase the number of selected features, the lower the hit rate for finding actives.

Data source: simulation tool indicated in the report [17]
                                                                                                         24
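
The slides do not define the hit rate. A common reading, assumed here, is the fraction of known actives among the top-ranked compounds that were selected. The function name and the sample labels are invented for the illustration.

```python
def hit_rate(ranked_is_active, selection_size):
    """Fraction of actives among the first `selection_size` ranked compounds.

    ranked_is_active: booleans in ranking order (True = known active).
    """
    if selection_size <= 0:
        raise ValueError("selection_size must be positive")
    selected = ranked_is_active[:selection_size]
    return sum(selected) / len(selected)


# Invented ranking: actives tend to appear near the top.
ranked = [True, False, True, False, False]
print(hit_rate(ranked, 2))
print(hit_rate(ranked, 5))
```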
   Even fragment-based fingerprint generation is time-consuming.

   Probabilistic models and machine learning have introduced substantial changes.

   Mixing more than one type of descriptor seems efficient, i.e. in both time and result quality.

   Experimental results are still needed.



                                                  25
Molecular similarity searching methods in drug discovery

Thank you for listening
Haytham Hijazi

A presentation to the Advanced Graphical Engineering Systems seminar, 2011/2012

                                                                  26

Editor's Notes

  1. 1
  2. Each bit in the fingerprint represents one molecular fragment