Cheminformatics
Noel M. O’Boyle, Apr 2010
Postgrad course on Comp Chem

Cheminformatics
Hard to define in words:
  David Wild: “The field that studies all aspects of the representation and use of chemical and related biological information on computers”
  Design, creation, organization, management, retrieval, analysis, dissemination, visualization and use of chemical information
Hard to agree on spelling:
  Sometimes chemoinformatics
More easily thought of as encompassing a range of concepts and techniques:
  Molecular similarity
  Quantitative structure-activity relationships (QSAR)
  Substructure search
  (Automated) molecular depiction
  Encoding/decoding of molecular structures
  3D structure generation from a 2D or 0D structure
  Conformer generation
  Algorithms: ring perception, aromaticity, isomers

References
Cheminformatics, Johann Gasteiger and Thomas Engel (Eds)
Molecular modelling – Principles and Applications, A. R. Leach
I571 Chemical Information Technology, David Wild, Indiana University, http://i571.wikispaces.com/
An introduction to cheminformatics, A. R. Leach, V. J. Gillet

Molecular representation
Mike Hann (GSK): “Ceci n'est pas une molecule serves to remind us that all of the graphics images presented here are not molecules, not even pictures of molecules, but pictures of icons which we believe represent some aspects of the molecule's properties.”
http://mgl.scripps.edu/people/goodsell/mgs_art/hann.html

Computer representations of molecules
How can a molecular structure be stored on a computer?
  Common names: aspirin
  IUPAC name: 2-acetoxybenzoic acid
  Formula: C9H8O4
  As an image (PNG, GIF, etc.)
  CAS number: 50-78-2
  File format: ChemDraw file, MOL file, etc.
  SMILES string: O=C(Oc1ccccc1C(=O)O)C
  Binary fingerprint: 10000100000001100000100100000001
How should it be stored?
  ...if I want to find all molecules in a database of 100K molecules that have a benzene ring?
  ...if I want a unique identifier?
http://en.wikipedia.org/wiki/Aspirin

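One common use of a binary fingerprint is as a quick pre-filter for the benzene-ring question above. The sketch below is only an illustration of that idea: it assumes that each bit records the presence of some structural feature, and the query bit patterns are hypothetical, not taken from any real fingerprinting scheme.

```python
# Aspirin's 32-bit fingerprint as printed on the slide, stored as an integer.
ASPIRIN_FP = int("10000100000001100000100100000001", 2)
WIDTH = 32


def fingerprint_from_bits(positions, width=WIDTH):
    """Build a fingerprint with the given bit positions set (0 = leftmost bit)."""
    fp = 0
    for p in positions:
        fp |= 1 << (width - 1 - p)
    return fp


def passes_screen(molecule_fp, query_fp):
    """True if every bit set in the query is also set in the molecule."""
    return (molecule_fp & query_fp) == query_fp


# Hypothetical query fingerprints, chosen only to show both outcomes.
query_a = fingerprint_from_bits([0, 13, 31])  # all three bits are set in aspirin
query_b = fingerprint_from_bits([1])          # this bit is not set in aspirin

print(passes_screen(ASPIRIN_FP, query_a))
print(passes_screen(ASPIRIN_FP, query_b))
```

A screen like this can only rule molecules out; anything that passes would still need a full substructure match on the molecular graph.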
Computer representations of molecules
The structure of a molecule can be represented by a graph
  Graph = collection of nodes and edges; nodes and edges have properties (atomic number, bond order)
Represent the molecular graph somehow
  Connection table (which nodes are connected to which other nodes)
  Line notation (e.g. SMILES)
Fig 12.2: Molecular modelling – principles and applications, Andrew R Leach, Pearson, 2nd edn.

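A connection table can be sketched with two plain lists: one for the nodes (atomic numbers) and one for the edges (bond orders). The example below uses the heavy atoms of acetic acid; the variable and function names are illustrative choices, not part of any file format.

```python
# Nodes: atomic number of each heavy atom in acetic acid (CH3-C(=O)-OH).
atoms = [6, 6, 8, 8]

# Edges: (atom index, atom index, bond order).
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]


def connection_table(n_atoms, bonds):
    """Map each atom index to a list of (neighbour index, bond order)."""
    table = {i: [] for i in range(n_atoms)}
    for a, b, order in bonds:
        table[a].append((b, order))
        table[b].append((a, order))
    return table


table = connection_table(len(atoms), bonds)
for index, neighbours in table.items():
    print(index, atoms[index], neighbours)
```

Hydrogens are left out here, as is common for connection tables of organic molecules.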
Chemical file formats
A large number of file formats have been developed
However, there are certain de facto standards
  MOL file for small-molecule structures
  PDB files for protein structures from crystallography
  MOL2 files for protein structures from modelling software (e.g. after manipulation of the PDB file)

A chemical file format: MOL file
Fig 12.3: Molecular modelling – principles and applications, Andrew R Leach, Pearson, 2nd edn.
This file format can represent 0D and 2D information (a depiction) as well as 3D

SMILES format
Simplified Molecular Input Line Entry System
  Weininger, J. Chem. Inf. Comput. Sci., 1988, 28, 31
  More recently, a community-developed description: http://opensmiles.org
Linear format (“line notation”) that describes the connection table and stereochemistry of a molecule (i.e. 0D)
Convenient to enter as a query on-line, store in a database, pass by email, etc.
Examples:
  CC represents CH3CH3 (ethane)
  CC(=O)O represents CH3COOH (acetic acid)
Basic guidelines:
  Hydrogens are implicit
  Parentheses indicate branches
  Each atom is connected to the preceding atom to its left (excluding branches in-between)
  Single bonds are implicit, = for double, # for triple
What is C(C)(C)(C)C?

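The basic guidelines above can be turned into a very small reader. This is a minimal sketch that handles only the subset described on this slide (the atoms C, N and O, branches, and the = and # bond symbols); it is not a full SMILES parser, and the default valences used to fill in implicit hydrogens are a simplifying assumption.

```python
BOND_ORDER = {"=": 2, "#": 3}
VALENCE = {"C": 4, "N": 3, "O": 2}  # assumed default valences


def parse_simple_smiles(smiles):
    """Return (atoms, bonds) for a SMILES string without rings or brackets."""
    atoms, bonds = [], []
    previous = None   # atom the next atom will be bonded to
    branches = []     # atoms at which a branch was opened
    order = 1         # single bonds are implicit
    for ch in smiles:
        if ch == "(":
            branches.append(previous)
        elif ch == ")":
            previous = branches.pop()
        elif ch in BOND_ORDER:
            order = BOND_ORDER[ch]
        elif ch in VALENCE:
            atoms.append(ch)
            index = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, index, order))
            previous, order = index, 1
        else:
            raise ValueError(f"unsupported symbol: {ch!r}")
    return atoms, bonds


def implicit_hydrogens(atoms, bonds):
    """Hydrogens needed to bring each atom up to its default valence."""
    used = [0] * len(atoms)
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    return [VALENCE[element] - u for element, u in zip(atoms, used)]


atoms, bonds = parse_simple_smiles("C(C)(C)(C)C")
print(atoms, bonds, implicit_hydrogens(atoms, bonds))
```

Read this way, C(C)(C)(C)C is a central carbon bonded to four methyl groups, i.e. 2,2-dimethylpropane (neopentane), C5H12.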
SMILES format II
To represent rings, you need to break a ring bond and replace it by a ring opening symbol and a corresponding ring closure symbol
  C1CCC=CC1 (cyclohexene)
To represent tetrahedral stereochemistry you use @ or @@
  Br[C@](Cl)(I)F means that, looking from the Br, the Cl, I, and F are arranged anticlockwise
To represent aromaticity, use lower case
  C1CCCCC1 (cyclohexane)
  c1ccccc1 (benzene)

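The ring-closure and lower-case rules can be illustrated with a short scan over the string. This is a rough sketch that assumes single-letter atom symbols and single-digit ring labels, so it would miscount two-letter elements such as Cl or Br and does not handle bracket atoms; the function names are made up for this example.

```python
def ring_closure_bonds(smiles):
    """Pair up ring-opening and ring-closure digits as (atom, atom) bonds."""
    open_rings = {}   # ring digit -> index of the atom that opened it
    pairs = []
    atom_index = -1
    for ch in smiles:
        if ch.isalpha():
            atom_index += 1
        elif ch.isdigit():
            if ch in open_rings:
                pairs.append((open_rings.pop(ch), atom_index))
            else:
                open_rings[ch] = atom_index
    if open_rings:
        raise ValueError("ring opened but never closed")
    return pairs


def aromatic_atoms(smiles):
    """Indices of atoms written in lower case."""
    letters = [ch for ch in smiles if ch.isalpha()]
    return [i for i, ch in enumerate(letters) if ch.islower()]


for smiles in ("C1CCC=CC1", "c1ccccc1", "O=C(Oc1ccccc1C(=O)O)C"):
    print(smiles, ring_closure_bonds(smiles), aromatic_atoms(smiles))
```

For the aspirin SMILES this reports a single ring bond between atoms 3 and 8, which are the first and last atoms of the benzene ring.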
InChI
International Chemical Identifier
Line notation developed by NIST and IUPAC
Goal: An index for uniquely identifying a molecule
Aspirin: InChI=1/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)/f/h11H
Features
  Derived from the structure (unlike CAS number)
  One-to-one relationship between InChI and structure
  Layers (of specificity)
    Can distinguish between stereoisomers, isotopes, or can leave out those layers
  Different tautomeric forms give rise to the same InChI (unlike SMILES)
Notes
  Not human readable or writeable
  All implementations use the same (open source) code which is provided by the InChI Trust
    “The Trust's goal is to enable the interlinking and combining of chemical, biological and related information, using unique machine-readable chemical structure representations to facilitate and expedite new scientific discoveries.”
See http://inchi.info and Google “unofficial InChI FAQ”

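The layered structure is visible in the string itself, since layers are separated by "/" and most begin with a one-letter prefix. The sketch below simply splits the aspirin InChI from this slide into its parts; it assumes a well-formed string with a version, a formula and non-empty layers, and it does not validate or generate InChIs (that is the job of the official InChI code).

```python
ASPIRIN_INCHI = (
    "InChI=1/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12"
    "/h2-5H,1H3,(H,11,12)/f/h11H"
)


def split_inchi(inchi):
    """Split an InChI into (version, formula, [(prefix, content), ...])."""
    tag, _, body = inchi.partition("=")
    if tag != "InChI":
        raise ValueError("not an InChI string")
    version, formula, *layers = body.split("/")
    # Each remaining layer starts with a one-letter prefix, e.g. c, h or f.
    return version, formula, [(layer[0], layer[1:]) for layer in layers]


version, formula, layers = split_inchi(ASPIRIN_INCHI)
print(version, formula)
for prefix, content in layers:
    print(prefix, content)
```

Here the c layer holds the connectivity, the h layers hold hydrogen information, and the f prefix marks the start of the fixed-hydrogen part.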
A unique identifier makes it easy to link databases
  DrugBank
  ChEBI

US Generic Legislation
Comprehensive Drug Abuse Prevention and Control Act, 1970
Controlled Substances Act, 1970
Federal Analog Act, 1986
  The term “controlled substance analog” means a substance
    the chemical structure of which is substantially similar to the chemical structure of a controlled substance in schedule I or II
Slide courtesy Dr. J.J. Keating, School of Pharmacy, University College Cork

 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 

Kürzlich hochgeladen (20)

Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 

Cheminformatics

  • 1. Cheminformatics Noel M. O’Boyle Apr 2010 Postgrad course on Comp Chem
  • 2. Cheminformatics Hard to define in words: David Wild: “The field that studies all aspects of the representation and use of chemical and related biological information on computers” Design, creation, organization, management, retrieval, analysis, dissemination, visualization and use of chemical information Hard to agree on spelling: Sometimes chemoinformatics More easily thought of as encompassing a range of concepts and techniques Molecular similarity Quantitative-structure activity relationships (QSAR) Substructure search (Automated) Molecular depiction Encoding/decoding of molecular structures 3D structure generation from a 2D or 0D structure Conformer generation Algorithms: ring perception, aromaticity, isomers
  • 3. References Cheminformatics, Johann Gasteiger and Thomas Engel (Eds); Molecular modelling – Principles and Applications, A. R. Leach; I571 Chemical Information Technology, David Wild, Indiana University, http://i571.wikispaces.com/; An introduction to cheminformatics, A. R. Leach, V. J. Gillet
  • 4. Molecular representation Mike Hann (GSK): “Ceci n'est pas une molecule serves to remind us that all of the graphics images presented here are not molecules, not even pictures of molecules, but pictures of icons which we believe represent some aspects of the molecule's properties.” http://mgl.scripps.edu/people/goodsell/mgs_art/hann.html
  • 5. Computer representations of molecules How can a molecular structure be stored on a computer? Common names: aspirin IUPAC name: 2-acetoxybenzoic acid Formula: C9H8O4 As an image (PNG, GIF, etc.) CAS number: 50-78-2 File format: ChemDraw file, MOL file, etc. SMILES string: O=C(Oc1ccccc1C(=O)O)C Binary Fingerprint: 10000100000001100000100100000001 How should it be stored? ...if I want to find all molecules in a database of 100K molecules that have a benzene ring? ...if I want a unique identifier? http://en.wikipedia.org/wiki/Aspirin
  • 6. Computer representations of molecules The structure of a molecule can be represented by a graph Graph = collection of nodes and edges, nodes and edges have properties (atomic number, bond order) Represent the molecular graph somehow Connection table (which nodes are connected to which other nodes) Line notation (e.g. SMILES) Fig 12.2: Molecular modelling – principles and applications, Andrew R Leach, Pearson, 2nd edn.
  • 7. Chemical file formats A large number of file formats have been developed However there are certain de-facto standards MOL file for small-molecule structures PDB files for protein structures from crystallography MOL2 files for protein structures from modelling software (e.g. after manipulation of the PDB file)
  • 8. A chemical file format: MOL file Fig 12.3: Molecular modelling – principles and applications, Andrew R Leach, Pearson, 2nd edn. This file format can represent 0D, 2D information (a depiction) as well as 3D
  • 9. SMILES format Simplified Molecular Input Line Entry System Weininger, J. Chem. Inf. Comput. Sci., 1988, 28, 31 More recently, a community-developed description: http://opensmiles.org Linear format (“line notation”) that describes the connection table and stereochemistry of a molecule (i.e. 0D) Convenient to enter as a query on-line, store in a database, pass by email, etc. Examples: CC represents CH3CH3 (ethane) CC(=O)O represents CH3COOH (acetic acid) Basic guidelines: Hydrogens are implicit Parentheses indicate branches Each atom is connected to the preceding atom to its left (excluding branches in-between) Single bonds are implicit, = for double, # for triple What is C(C)(C)(C)C?
  • 11. To represent tetrahedral stereochemistry you use @ or @@
  • 12. Br[C@](Cl)(I)F means that looking from the Br, the Cl, I, and F are arranged anticlockwise
  • 13. To represent aromaticity, use lower case
  • 16. InChI International Chemical Identifier Line notation developed by NIST and IUPAC Goal: An index for uniquely identifying a molecule Aspirin: InChI=1/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)/f/h11H Features Derived from the structure (unlike CAS number) One-to-one relationship between InChI and structure Layers (of specificity) Can distinguish between stereoisomers, isotopes, or can leave out those layers Different tautomeric forms give rise to the same InChI (unlike SMILES) Notes Not human readable or writeable All implementations use the same (open source) code which is provided by the InChI Trust “The Trust's goal is to enable the interlinking and combining of chemical, biological and related information, using unique machine-readable chemical structure representations to facilitate and expedite new scientific discoveries.” See http://inchi.info and Google “unofficial InChI FAQ”
  • 17. A unique identifier makes it easy to link databases DrugBank ChEBI
  • 18. US Generic Legislation Comprehensive Drug Abuse Prevention and Control Act, 1970 Controlled Substances Act, 1970 Federal Analog Act, 1986 The term “controlled substance analog” means a substance The chemical structure of which is substantially similar to the chemical structure of a controlled substance in schedule I or II Slide courtesy Dr. J.J. Keating, School of Pharmacy, University College Cork
  • 19. Molecular similarity Similarity principle: Structurally similar molecules tend to have similar properties Properties: biological activity, solubility, color and so on If we can measure similarity somehow Can construct a distance matrix Distance = inverse of similarity Such matrices can be used to cluster compounds, to create a 2D depiction showing the spread of molecular structures in a dataset, to select a diverse subset Can use to find molecules in a database similar to a particular query Can find unknown molecules with a similar property Can use to see whether a particular property is correlated with molecular similarity ...But how to measure similarity? One way is using molecular fingerprints
  • 20. Molecular fingerprints A molecular fingerprint is an encoding of the molecular structure onto a (long) binary string 100100010000001011000000000001... Types: path-based fingerprint, key-based fingerprint Path-based fingerprints (e.g. Daylight fingerprint) Break the molecule up into all possible fragments of length 1, 2, 3...7 Create a string representing each fragment Hash each string onto a number between 1 and 1024 (for example) Wikipedia: “A hash function is any well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an index to an array” Set the corresponding bit of the fingerprint to 1 (all others will be 0) Key-based fingerprints (e.g. MACCS keys) A (long) list of pre-generated questions about a chemical structure “Are there fewer than 3 oxygens?” “Is there an S-S bond?” “Is there a ring of size 4?” Each answer, true or false, corresponds to a 1 or 0 in the binary fingerprint
  • 22. Intersection is 38 (Note: B is a substructure of A)
  • 25. Freely available Open database of NMR spectra – add your own spectra (with assigned peaks) – predict assignments
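
Slides 6 and 9 describe a molecule as a graph stored in a connection table, and SMILES as a line notation for that table. As a rough illustration only, the sketch below reads a very small SMILES subset (the atoms C, N and O, branches in parentheses, and the bond symbols -, = and #) into a list of atoms and a list of bonds, then fills in the implicit hydrogens. The function names and the valence table are assumptions made for this example; rings, aromatic atoms, charges and stereochemistry are deliberately not handled, so this is not a substitute for a real toolkit.

```python
# Standard valences assumed for the small atom subset handled here.
VALENCE = {"C": 4, "N": 3, "O": 2}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def parse_smiles(smiles):
    """Parse a tiny SMILES subset into a connection table.

    Returns (atoms, bonds): atoms is a list of element symbols and
    bonds is a list of (i, j, order) tuples indexing into atoms.
    """
    atoms, bonds = [], []
    branch_stack = []   # atoms to return to when a branch closes
    previous = None     # index of the atom the next atom bonds to
    pending_order = 1   # single bonds are implicit
    for ch in smiles:
        if ch in VALENCE:
            atoms.append(ch)
            current = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, current, pending_order))
            previous, pending_order = current, 1
        elif ch in BOND_ORDER:
            pending_order = BOND_ORDER[ch]
        elif ch == "(":
            branch_stack.append(previous)
        elif ch == ")":
            previous = branch_stack.pop()
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return atoms, bonds


def implicit_hydrogens(atoms, bonds):
    """Fill each atom up to its assumed standard valence with hydrogens."""
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return [VALENCE[a] - u for a, u in zip(atoms, used)]


if __name__ == "__main__":
    for smiles in ("CC", "CC(=O)O", "C(C)(C)(C)C"):
        atoms, bonds = parse_smiles(smiles)
        print(smiles, bonds, implicit_hydrogens(atoms, bonds))
```

For the question on slide 9, C(C)(C)(C)C comes out as one central carbon with no hydrogens bonded to four CH3 groups, which is 2,2-dimethylpropane (neopentane).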
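
Slide 16 mentions that an InChI is built from layers. A minimal sketch of what that means in practice is to split the aspirin InChI from the slide at its "/" separators: the first field is the version, the second the formula, and each later field starts with a one-letter prefix (c for connectivity, h for hydrogens, f for the fixed-hydrogen section). The helper name `split_inchi` is an assumption for this example; real work should go through the official InChI software rather than string handling.

```python
# The aspirin InChI exactly as given on slide 16.
ASPIRIN = ("InChI=1/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12"
           "/h2-5H,1H3,(H,11,12)/f/h11H")


def split_inchi(inchi):
    """Split an InChI string into (version, formula, layers).

    layers is a list of (prefix, content) pairs in the order they occur.
    """
    prefix, _, body = inchi.partition("=")
    if prefix != "InChI" or not body:
        raise ValueError("not an InChI string")
    fields = body.split("/")
    version, formula = fields[0], fields[1]
    layers = [(field[0], field[1:]) for field in fields[2:] if field]
    return version, formula, layers


if __name__ == "__main__":
    version, formula, layers = split_inchi(ASPIRIN)
    print("version:", version)
    print("formula:", formula)
    for prefix, content in layers:
        print(f"layer {prefix}: {content}")
```

Dropping the trailing layers gives a less specific identifier, which is the sense in which the slide talks about layers of specificity.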
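
The path-based procedure on slide 20 (enumerate fragments, turn each into a string, hash it, set a bit) can be sketched as follows. This is only an illustration under several assumptions: fragment length is counted in atoms, bond orders are ignored, MD5 from the standard library stands in for whatever hash a real implementation uses, and bit positions run from 0 to 1023 rather than 1 to 1024. It is not the Daylight algorithm itself.

```python
import hashlib


def enumerate_paths(atoms, neighbours, max_atoms=7):
    """Return the set of linear fragments (as strings) with 1..max_atoms atoms.

    atoms: list of element symbols; neighbours: dict index -> list of indices.
    """
    fragments = set()

    def extend(path):
        labels = [atoms[i] for i in path]
        forward = "-".join(labels)
        backward = "-".join(reversed(labels))
        # A path read in either direction is the same fragment.
        fragments.add(min(forward, backward))
        if len(path) < max_atoms:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return fragments


def path_fingerprint(atoms, neighbours, n_bits=1024, max_atoms=7):
    """Hash every fragment onto a bit position and set that bit to 1."""
    bits = [0] * n_bits
    for fragment in enumerate_paths(atoms, neighbours, max_atoms):
        digest = hashlib.md5(fragment.encode("ascii")).hexdigest()
        bits[int(digest, 16) % n_bits] = 1
    return bits


# Heavy-atom graphs for acetic acid, CC(=O)O, and ethane, CC.
ACETIC_ATOMS = ["C", "C", "O", "O"]
ACETIC_NEIGHBOURS = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
ETHANE_ATOMS = ["C", "C"]
ETHANE_NEIGHBOURS = {0: [1], 1: [0]}

if __name__ == "__main__":
    print(sorted(enumerate_paths(ACETIC_ATOMS, ACETIC_NEIGHBOURS)))
    acid = path_fingerprint(ACETIC_ATOMS, ACETIC_NEIGHBOURS)
    ethane = path_fingerprint(ETHANE_ATOMS, ETHANE_NEIGHBOURS)
    # Every bit set for the substructure is also set for the larger molecule.
    print(all(a >= e for a, e in zip(acid, ethane)))
```

Because every fragment of a substructure is also a fragment of the larger molecule, the substructure's bits are a subset of the larger molecule's bits. That property is what makes such fingerprints useful as a fast pre-screen for substructure search.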

Editor's notes

  1. Next time: More on Magritte
  2. Acetic acid
  3. Acetic acid
  4. Add year
  5. Next time: Add some pictures
  6. Next time: Add example of what intersection and union mean graphically
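
Slides 19 and 22 talk about measuring similarity from fingerprints in terms of an intersection, and the last note above mentions intersection and union. One common way to combine the two, assumed here, is the Tanimoto coefficient: the number of bits set in both fingerprints divided by the number set in either. The sketch below also turns similarities into a distance matrix using distance = 1 - similarity, which is one simple choice among several. Fingerprints are represented as sets of on-bit positions, and the three example fingerprints are invented for illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / union


def distance_matrix(fingerprints):
    """Pairwise distances, taking distance = 1 - Tanimoto similarity."""
    return [[1.0 - tanimoto(a, b) for b in fingerprints] for a in fingerprints]


if __name__ == "__main__":
    fp_a = {1, 2, 3, 4}   # hypothetical molecule A
    fp_b = {1, 2}         # hypothetical substructure of A: its bits are a subset
    fp_c = {5, 6}         # hypothetical unrelated molecule
    print(tanimoto(fp_a, fp_b))
    for row in distance_matrix([fp_a, fp_b, fp_c]):
        print(row)
```

When one fingerprint's bits are a subset of the other's, as for A and B here, the intersection equals the smaller fingerprint and the union equals the larger one, so the similarity is simply the ratio of their bit counts.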