Molecular similarity searching methods in drug discovery

By: Haytham Hijazi
Advisor: Univ-Prof. Hon-Prof. Dr. Dieter Roller

A presentation in the advanced graphical engineering systems seminar 2011/2012
In this work, I propose a contribution to the field of cheminformatics.
Cheminformatics means solving chemical problems using computational methods [1].



1. James Rhodes, Stephen Boyer, Jeffrey Kreulen, Ying Chen, Patricia Ordonez, “Mining patents using molecular similarity search”, IBM, Almaden Services Research, Pacific Symposium on Biocomputing 12:304-315 (2007).




Agenda
• The main question in this research
• The principle of similarity
• Drug discovery as an application
• Research problem
• Molecular representations (1D, 2D, …)
• Searching for similarity
• Similarity coefficient calculations
• The probabilistic model (BIM)
• The contribution (MDC)
• Experiments, conclusions and discussion
“Similarity is in the eye of the beholder”
[Figure: objects compared by shape, colour, size, and pattern.]

Question: Which molecules in a database are similar to the query molecule?
Application:
• Finding better compounds than the initial lead compound (drug discovery)
• Property prediction for unknown compounds

Structurally similar molecules are assumed to have similar biological properties [1].

Similar biological properties → drug discovery.

1. Sylvaine Roy and Laurence Lafanechère, “Chemogenomics and Chemical Genetics: A User's Introduction for Biologists, Chemists and Informaticians”, Molecular similarity, Springer Berlin, ISBN 978-3-642-19614-0, 1st Edition.

Claim: General manufacturing problems!
Molecule representation → Feature selection → Similarity coefficient calculations and ranking for search

• Historical progression
  ◦ Complete structure
  ◦ Substructure
• Descriptors
  ◦ 1D (physicochemical properties), 2D, 3D, and 4D
• Connectivity tables and graph theory

Image source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.

SMILES

CC(=O)OC1=CC=CC=C1C(=O)O

CCCC1=NN(C2=C1NC(=NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C

SMILES – Simplified Molecular Input Line Entry System
Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.

A fingerprint is a vector encoding the presence (‘1’) or absence (‘0’) of fragment substructures in a molecule.

Fingerprints are either dictionary-based or hash-based.

Descriptor   Fragment
1            AR
2            CCCCN
3            Me
9            NH2

[1] [2]

2. Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
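
As a rough illustration of the two fingerprint styles, the sketch below builds a dictionary-based and a hash-based bit vector from a set of fragment labels. It assumes a separate fragment-detection step has already produced those labels; the function names, the bit-vector lengths, and the use of CRC32 as the hash are illustrative choices, not taken from the slides.

```python
import zlib

# Fragment dictionary from the table above (descriptor position -> fragment).
FRAGMENT_KEYS = {1: "AR", 2: "CCCCN", 3: "Me", 9: "NH2"}


def dictionary_fingerprint(fragments, keys=FRAGMENT_KEYS, n_bits=10):
    """Set bit k when the fragment registered under descriptor k is present."""
    bits = [0] * n_bits
    for position, fragment in keys.items():
        if fragment in fragments:
            bits[position] = 1
    return bits


def hashed_fingerprint(fragments, n_bits=16):
    """Set the bit chosen by a deterministic hash of each fragment.

    Different fragments may collide on the same bit; that is the usual
    trade-off of hash-based fingerprints.
    """
    bits = [0] * n_bits
    for fragment in fragments:
        bits[zlib.crc32(fragment.encode("utf-8")) % n_bits] = 1
    return bits


# Assumed output of a fragment-detection step for one molecule.
molecule_fragments = {"AR", "NH2"}

print(dictionary_fingerprint(molecule_fragments))  # [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
print(hashed_fingerprint(molecule_fragments))
```

The dictionary variant needs a predefined fragment list but every bit has a fixed meaning; the hashed variant accepts any fragment at the cost of possible collisions.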
• In 3D keys, the position of each bit corresponds to a certain range of distances or angles.
• Computationally complex

Source: Karine Audouze, “Representation of molecular structures and structural diversity”, ChemoInformatics in Drug Discovery, 2009.
Molecule representation → Feature selection → Similarity coefficient calculations and ranking for search

• Structure search
  ◦ Exact structure search
  ◦ Substructure search
• Similarity searching: maximum common subgraph isomorphism, Tanimoto/Dice/Cosine coefficients

• The similarity measure (coefficient) is a quantitative measure of similarity.
• It is used to rank the results of the query.
• Results are sorted in decreasing order of similarity.

Types of coefficient:
• Distance coefficients
• Probabilistic coefficients
• Correlation coefficients
• Association coefficients

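
The ranking step can be sketched as follows. This is a minimal example that assumes fingerprints are stored as sets of "on" bit positions and uses the Tanimoto coefficient as the similarity measure; the molecule names and bit positions are made up for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of 'on' bits."""
    common = len(fp_a & fp_b)
    union = len(fp_a | fp_b)
    return common / union if union else 0.0


def rank_by_similarity(query, database):
    """Score every database entry against the query, highest similarity first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical database of fingerprints (sets of 'on' bit positions).
database = {
    "mol_1": {1, 2, 3, 9},
    "mol_2": {1, 2},
    "mol_3": {4, 5},
}
query = {1, 2, 3}

ranked = rank_by_similarity(query, database)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

With a distance coefficient instead of a similarity coefficient, the sort order would be reversed, since smaller distances mean closer matches.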
Notation: a = number of bits set in molecule A, b = number of bits set in molecule B, c = number of bits set in both, d = number of bits set in neither.

Association coefficients
  Simple matching coefficient     (c + d) / (a + b − c + d)
  Jaccard measure (Tanimoto)      c / (a + b − c) = |A AND B| / |A OR B|
  Cosine (Ochiai)                 c / √(a·b)
  Dice                            2c / (a + b)
Distance coefficients
  Hamming distance                a + b − 2c
  Euclidean distance              √(a + b − 2c)
  Soergel distance                (a + b − 2c) / (a + b − c)
Other coefficients
  Pattern difference              (a − c)(b − c) / (a + b − c + d)²
  Size                            (a − b)² / (a + b − c + d)²

Naomie Salim, “The study of probability model for compound similarity searching”, UTM Research Management Centre Project Vote – 75207, University of Malaysia, 2009
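
A small sketch of several of these coefficients, written in their standard forms under one assumed convention: a and b are the numbers of bits set in molecules A and B, c is the number of bits set in both, and d is the number of bits set in neither. The function name and the sample counts are illustrative.

```python
import math


def coefficients(a, b, c, d):
    """Standard binary coefficients from the counts a, b, c, d.

    a, b: bits set in A and in B; c: bits set in both; d: bits set in neither.
    """
    n = a + b - c + d  # total fingerprint length
    return {
        "simple_matching": (c + d) / n,
        "tanimoto": c / (a + b - c),
        "cosine": c / math.sqrt(a * b),
        "dice": 2 * c / (a + b),
        "hamming": a + b - 2 * c,
        "euclidean": math.sqrt(a + b - 2 * c),
        "soergel": (a + b - 2 * c) / (a + b - c),
    }


# Illustrative counts for two 32-bit fingerprints.
values = coefficients(a=13, b=8, c=6, d=17)
for name, value in values.items():
    print(f"{name}: {value:.4f}")
```

For binary fingerprints the Soergel distance is the complement of the Tanimoto similarity, so the two values sum to 1.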
Assume we generate fragment-based fingerprint bits:

Molecule A: 00010100010101000101010011110100
Molecule B: 00000000100101001001000011100000

Tanimoto coefficient = c / (a + b − c)

where a is the number of bits set in A, b is the number of bits set in B, and c is the number of bits set in both (A AND B).

Tanimoto = 6 / (13 + 8 − 6) = 6 / 15 = 0.4

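
The arithmetic of this example can be checked with a short script. It counts the set bits of the two fingerprints given on the slide and applies c / (a + b − c); the function name is an illustrative choice.

```python
A = "00010100010101000101010011110100"
B = "00000000100101001001000011100000"


def tanimoto_from_bitstrings(bits_a, bits_b):
    """Return (a, b, c, Tanimoto) for two equal-length '0'/'1' strings."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    a = bits_a.count("1")  # bits set in A
    b = bits_b.count("1")  # bits set in B
    # c: positions where both fingerprints have a 1 (A AND B)
    c = sum(1 for x, y in zip(bits_a, bits_b) if x == "1" and y == "1")
    union = a + b - c
    return a, b, c, (c / union if union else 0.0)


a, b, c, score = tanimoto_from_bitstrings(A, B)
print(a, b, c, round(score, 2))  # 13 8 6 0.4
```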
Associate the relevance of a structure with an explicit feature.

• pi = probability that bit bi appears in an active structure.
• qi = probability that bit bi appears in an inactive structure.
• αi is a binary selector: αi = 1 if the bit occurs in the structure; otherwise αi = 0 and the term is negated.
• P(A|S) is the probability of an active structure given S.
• P(NA|S) is the probability of an inactive structure given S.
• P(A) is the probability of actives.
• P(NA) is the probability of inactives.

Naomie Salim, “The study of probability model for compound similarity searching”, UTM Research Management Centre Project Vote – 75207, University of Malaysia, 2009
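
The slide's equation is not preserved in this transcript. As a hedged sketch, the quantities defined above can be combined the way an independence-based probabilistic model is commonly scored: assuming the bits are conditionally independent given the class, Bayes' rule gives the log-odds log[P(A|S)/P(NA|S)] as the prior log-odds plus one term per bit. The function name and the sample probabilities below are hypothetical, and this may differ from the exact formula on the original slide.

```python
import math


def bim_log_odds(alpha, p, q, prior_active):
    """Log-odds of 'active' versus 'inactive' for a structure S.

    alpha[i] is 1 if bit i occurs in S, else 0.
    p[i], q[i] are the probabilities that bit i is set in an active
    and in an inactive structure. Bits are assumed independent.
    """
    score = math.log(prior_active / (1.0 - prior_active))
    for a_i, p_i, q_i in zip(alpha, p, q):
        if a_i:
            score += math.log(p_i / q_i)  # bit present
        else:
            score += math.log((1.0 - p_i) / (1.0 - q_i))  # bit absent
    return score


# Hypothetical two-bit example with equal priors.
score = bim_log_odds(alpha=[1, 0], p=[0.8, 0.2], q=[0.2, 0.2], prior_active=0.5)
print(score)
```

A positive score means the structure is more likely active than inactive under the model; structures can then be ranked by this score.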
Claim: General manufacturing problems!
[Diagram of the proposed approach. Components: molecular dynamics simulation tool, physicochemical properties, classification algorithm, voting, Class 1, Class 2, ..., Class n, active compounds database.]

• Better insight into similarity in terms of bioactivity, toxicity, reactivity, ... (+)
• Searching time (+)
• Prediction and voting possibilities (+)
• Cost of simulation tools (-)
• Classification errors (-)

• Materials Explorer
• Itemtracker (freezer/cryogen sample tracking system)
• CHARMM
• MDynaMix

[Figure: Fingerprint generation time. Time (ms), on a scale from 0 to 30, versus maximum path length (4 to 8), with series for 2, 3 and 4 bits.]

Consider the case of more than 1000 bits!

Data source: simulating tool indicated in the report [17]
[Figure: Hit rate, on a scale from 0 to 0.18, versus selection size (0 to 2500).]

The larger the feature selection size, the lower the hit rate of finding actives.

Data source: simulating tool indicated in the report [17]
• Even fragment-based fingerprint generation is time-consuming.
• Probabilistic models and machine learning have introduced substantial changes.
• Mixing more than one type of descriptor seems efficient in terms of time and result quality.
• Experimental results are still needed.

Thank you for listening

Haytham Hijazi

Editor's Notes

  1. Each bit in the fingerprint represents one molecular fragment.