SlideShare ist ein Scribd-Unternehmen logo
1 von 27
Downloaden Sie, um offline zu lesen
Efficient Searching and Similarity
of Unmapped Reactions:
Application to ELN Analysis
Roger Sayle, Ed Griffen, Thierry Kogej and David Drake
NextMove Software, Cambridge, UK
AstraZeneca, Alderley Park, UK
AstraZeneca, Molndal, Sweden
ACS National Meeting, San Diego, USA 27th March 2012
motivation
• Better understanding of chemical reactions can be
used to improve the design-test-make cycles in the
pharmaceutical industry, both for in-house synthesis
and outsourcing to CROs.
• Although small molecule informatics is well
supported, the computer representation, storage,
manipulation, analysis, QSAR and property
prediction of chemical reactions is relatively
unexplored.
ACS National Meeting, San Diego, USA 27th March 2012
Motivating example from gsk
• Stephen D. Pickett, Darren V.S. Green, David L. Hunt,
David A. Pardoe and Ian Hughes, “Automated Lead
Optimization of MMP-12 Inhibitors using a Genetic
Algorithm”, ACS Medicinal Chemistry Letters, Vol. 2,
pp. 28-33, 2011.
50 R1 x 50 R2 = 2500 cmpds
ACS National Meeting, San Diego, USA 27th March 2012
Libraries in practice
• Objective was to be diverse complete, so variations of
standard route and conditions were required to avoid
compromising diversity.
• Despite best efforts, 566 not made due to poor
reactivity, unanticipated side reactions or product
stability compared to 26 not assayed, 28 assay failed
and 176 inactive, giving 1704 assay results.
• Interestingly, R1=21&R2=7 and R1=31&R2=25 were the
two most active compounds (pIC50=8), R1=21&R2=25
had a pIC50=7.8 and R1=31&R2=7 was never made!
ACS National Meeting, San Diego, USA 27th March 2012
Transforms vs. reactions
Importance of Reaction Mechanism
Example: Ullman-type Coupling Reactions
SMIRKS: [H][N:1].[Cl][c:2]>>[N:1][c:2]
The SMIRKS transform alone is insufficient
to predict the products and by-products in
this example. A measure of nucleophility
is desirable for each atom in a molecule.
Without this software may be misled into
believing that protecting groups are required.
ACS National Meeting, San Diego, USA 27th March 2012
Beyond drug guru
ACS National Meeting, San Diego, USA 27th March 2012
sildenafil (viagra) vardenafil (levitra)
Proposed solution
• Analyze the hundreds of thousands of reaction
examples described in in-house Electronic Laboratory
Notebooks (ELNs).
• A typical use case would be to plot reaction yield for
related reactions against catalysts and/or solvents
used to determine the likely best or poor conditions.
• ELNs, unlike the literature, are strongly biased to the
types of reactions/chemistries used in pharma.
• Most importantly, one of the only sources of
negative data (reactions that have failed).
ACS National Meeting, San Diego, USA 27th March 2012
The challenges
1. Export of the data from the ELN.
2. High fidelity conversion to other file formats.
3. Reaction normalization/standardization.
4. Reaction identity (canonicalization).
5. Improved reaction depiction.
6. Intuitive reaction searching.
7. Reaction similarity.
8. Reaction classification/clustering.
ACS National Meeting, San Diego, USA 27th March 2012
Database export
• Typically, ELNs are implemented as complex schemas
within relational databases (Oracle), supporting
transactions, auditing and security privileges.
• Not uncommonly the vendor provided functionality
or APIs for data export are slow and/or buggy.
• In addition to reactions and structures, there is often
a requirement to export all associated data, including
textual and numeric data, tables, even LCMS and
NMR spectra.
ACS National Meeting, San Diego, USA 27th March 2012
typical eln fields in RD format
$DTYPE REACTION:REACTION.CONDITIONS:TEMPERATURE:VALUE
$DATUM 120 °C
$DTYPE REACTION:REACTION.CONDITIONS:TEMPERATURE:MINVAL
$DATUM 393.15
$DTYPE REACTION:REACTION.CONDITIONS:TEMPERATURE:MAXVAL
$DATUM 393.15
$DTYPE REACTION:REACTION.CONDITIONS:PRESSURE:VALUE
$DATUM 5 bar
$DTYPE REACTION:REACTANTS(1):NAME
$DATUM nicotinoyl chloride
$DTYPE REACTION:REACTANTS(1):CHEMICAL.STRUCTURE
$DATUM c1cc(cnc1)C(=O)Cl
$DTYPE REACTION:REACTANTS(1):FORMULA.MASS:VALUE
$DATUM 141.56
$DTYPE REACTION:REACTANTS(2):NAME
$DATUM (2R,3S)-3-methylpentan-2-ol
$DTYPE REACTION:REACTANTS(2):CHEMICAL.STRUCTURE
$DATUM CC[C@@H](C)[C@H](C)O
$DTYPE REACTION:REACTANTS(2):FORMULA.MASS:VALUE
$DATUM 102.17
ACS National Meeting, San Diego, USA 27th March 2012
$DTYPE REACTION:PRODUCTS(1):CHEMICAL.STRUCTURE
$DATUM CC[C@@H](C)[C@H](C)OC(=O)c1cccnc1
$DTYPE REACTION:PRODUCTS(1):NAME
$DATUM (2R,3S)-3-methylpentan-2-yl nicotinate
$DTYPE REACTION:PRODUCTS(1):MOLECULAR.WEIGHT:VALUE
$DATUM 207.27
$DTYPE REACTION:PRODUCTS(1):MOLECULAR.FORMULA
$DATUM C12H17NO2
$DTYPE REACTION:PRODUCTS(1):ACTUAL.MOLES:VALUE
$DATUM 2.16 mmol
$DTYPE REACTION:SOLVENTS(1):NAME
$DATUM 1-pentanol
$DTYPE REACTION:SOLVENTS(1):VOLUME:VALUE
$DATUM 5 mL
$DTYPE REACTION:SOLVENTS(1):R.VALUES
$DATUM R10,R20
File format conversion
• The source format for many reactions is typically a
sketch, in either CDX, CDXML, ISIS Sketch or Marvin
file format.
• For data processing reactions are much easier to
handle as reaction SMILES, MDL RXN or RD files and
possibly even variants of MOL and SD file formats.
• Alas handling of reaction file formats is generally
poorly handled by many cheminformatics tools.
• Additionally, reaction file formats can rarely encode
all of the same information.
ACS National Meeting, San Diego, USA 27th March 2012
Reaction standardization
• An often overlooked aspect of ELNs is the need to
enforce “business rules” to consistently represent a
reaction, in the same way that normalized molecules
are stored in registration systems.
• Pharmaceutical ELNs contain structures where nitros
are represented arbitrarily, and even cases where
azide representations differ on each side of an arrow.
• Unfortunately, the rules used for molecules (such as
InChI) may be inappropriate for reactions, where
metal co-ordination and radicals play a major role.
ACS National Meeting, San Diego, USA 27th March 2012
Reaction identity
• Continuing the notion of standardization, one might
ask when are two reactions the same (duplicates).
• Two reactions that have matching agents (catalysts
and solvents), reactants and products are considered
identical, yet two that have the same reactants and
products are considered the “same”.
• Whether a component is a reactant, catalyst, solvent
or reagent may be consistently defined by atom-
mapping; reactants contribute atoms to the product.
ACS National Meeting, San Diego, USA 27th March 2012
Improved reaction depiction
• The 2D display of reactions can be improved by
suitably aligning the reactants and products.
– Molecules should be rotated or flipped (adjusting chirality)
so that the substructures are consistent across an arrow.
– The ordering reactants in coupling reactions should match
their ordering in the product.
– Ideally, the configuration of angles, torsions and chiral
bonds should be consistent across a reaction arrow.
• Agents, solvents, salts and even protecting groups
may be easier to interpret textually (superatoms).
ACS National Meeting, San Diego, USA 27th March 2012
Reaction depiction
ACS National Meeting, San Diego, USA 27th March 2012
Reaction depiction
ACS National Meeting, San Diego, USA 27th March 2012
Rxn searching: make or Break
• In order to simplify the learning curve for exploiting
reaction databases, an easy to use query mechanism
has been developed for frequent use-cases.
• Query: Retrieve reactions that make indoles.
• Traditional interfaces provide either substructure
search or require reaction centers/atom mapping.
• A more convenient (robust) mechanism is to check
whether (or count) that a drawn substructure
appears in the products but not in either the
reactants or agents.
ACS National Meeting, San Diego, USA 27th March 2012
Avoiding uninteresting results
ACS National Meeting, San Diego, USA 27th March 2012
Atom mapping ambiguity
• Mechanistic atom mappings depend on mechanism.
ACS National Meeting, San Diego, USA 27th March 2012
Atom mapping ambiguity
• Mechanistic atom mappings depend on mechanism.
ACS National Meeting, San Diego, USA 27th March 2012
Atom mapping ambiguity
• Mechanistic atom mappings depend on mechanism.
ACS National Meeting, San Diego, USA 27th March 2012
bond changes in indole synthesis
Synthesis A B C D
Baeyer-Emmerling M
Bartoli M M
Bischler-Möhlau M C M
Fischer M C M
Fukuyama M
Hemetsberger M
Larock M C M
Mandelung M
Nenitzescu M M
Reissert M M
Larock M C M
ACS National Meeting, San Diego, USA 27th March 2012
Reaction similarity
• To define the similarity between two reactions, we
employ an ECFP variation of Daylight’s “difference”
fingerprints.
• Conceptually, a difference fingerprint encodes the
differences between the fingerprint of the reactants
and fingerprint of the products.
• We use differences in counted Scitegic/Accelrys ECFP
fingerprints to capture the local environment of the
reaction centre without requiring atom mapping.
ACS National Meeting, San Diego, USA 27th March 2012
Reaction classification
• Although reaction similarity may be used to cluster
and group mechanistically related reactions together,
it is also convenient for them to be categorized by
the type of reaction.
• Manually assignment into the RSC’s RXNO ontology.
• InfoChem’s ICCLASSIFY and ICNAMERXN tools.
• Algorithmic assignment of reaction class (compound
purchase, purification, chiral separation) or
recognized (named) reaction mechanism using
SMIRKS transformations.
ACS National Meeting, San Diego, USA 27th March 2012
categorization of reactions
J. Carey, D. Laffan, C. Thomson, M. Williams, Org. Biomol. Chem. 2337, 2006.
S. Roughley and A. Jordan, J. Med. Chem. 54:3451-3479, 2011.
23.1
22.4
11.5
18.0
8.2
5.6
5.6
3.1
1.5 1.1 Heteroatom alkylation and arylation
Acylation and related processes
C-C bond formation
Deprotections
Heterocycle formation
Reductions
Functional group conversion
Protections
Oxidations
Functional group addition
ACS National Meeting, San Diego, USA 27th March 2012
conclusions
• In an attempt to better understand and hopefully
improve the productivity of synthetic chemists,
new computational methods have been developed
to process “real world” organic reaction data.
• The fruits of this work now enable medicinal
chemists and informaticians to make greater use of
the wealth of information in their in-house ELNs.
ACS National Meeting, San Diego, USA 27th March 2012
acknowledgements
• Daniel Lowe, NextMove Software.
• Plamen Petrov and Mikko Vainio, AstraZeneca.
• Richard Bolton and Andrew Wooster, GSK.
• Peter Loew and Hans Kraut, InfoChem.
• Ethan Hoff, Abbott Laboratories.
• Stephen Roughley, Vernalis.
• Thank you for your time.
ACS National Meeting, San Diego, USA 27th March 2012

Weitere ähnliche Inhalte

Ähnlich wie Efficient Searching and Similarity of Unmapped Reactions: Application to ELN Analysis

EUGM15 - Alexandre Varnek (Université de Strasbourg): Towards an expert syste...
EUGM15 - Alexandre Varnek (Université de Strasbourg): Towards an expert syste...EUGM15 - Alexandre Varnek (Université de Strasbourg): Towards an expert syste...
EUGM15 - Alexandre Varnek (Université de Strasbourg): Towards an expert syste...
ChemAxon
 
Sandrogreco Lithium Diisopropylamide Solution Kinetics And
Sandrogreco Lithium Diisopropylamide   Solution Kinetics AndSandrogreco Lithium Diisopropylamide   Solution Kinetics And
Sandrogreco Lithium Diisopropylamide Solution Kinetics And
Profª Cristiana Passinato
 
SF and PE CTR-IN 2016 Poster_FInal
SF and PE CTR-IN 2016 Poster_FInalSF and PE CTR-IN 2016 Poster_FInal
SF and PE CTR-IN 2016 Poster_FInal
Steve Flynn
 

Ähnlich wie Efficient Searching and Similarity of Unmapped Reactions: Application to ELN Analysis (20)

EUGM15 - Alexandre Varnek (Université de Strasbourg): Towards an expert syste...
EUGM15 - Alexandre Varnek (Université de Strasbourg): Towards an expert syste...EUGM15 - Alexandre Varnek (Université de Strasbourg): Towards an expert syste...
EUGM15 - Alexandre Varnek (Université de Strasbourg): Towards an expert syste...
 
Docking
DockingDocking
Docking
 
Computer aided Drug designing (CADD)
Computer aided Drug designing (CADD)Computer aided Drug designing (CADD)
Computer aided Drug designing (CADD)
 
Evaluating the Quality and Performance of Automatic Atom Mapping Algorithms
Evaluating the Quality and Performance of Automatic Atom Mapping AlgorithmsEvaluating the Quality and Performance of Automatic Atom Mapping Algorithms
Evaluating the Quality and Performance of Automatic Atom Mapping Algorithms
 
Pharmacophore Modeling and Docking Techniques.ppt
Pharmacophore Modeling and Docking Techniques.pptPharmacophore Modeling and Docking Techniques.ppt
Pharmacophore Modeling and Docking Techniques.ppt
 
Evaluating the Quality and Performance of Automatic Atom Mapping Algorithms
Evaluating the Quality and Performance of Automatic Atom Mapping AlgorithmsEvaluating the Quality and Performance of Automatic Atom Mapping Algorithms
Evaluating the Quality and Performance of Automatic Atom Mapping Algorithms
 
Assignment 105B.pptx
Assignment 105B.pptxAssignment 105B.pptx
Assignment 105B.pptx
 
Sandrogreco Lithium Diisopropylamide Solution Kinetics And
Sandrogreco Lithium Diisopropylamide   Solution Kinetics AndSandrogreco Lithium Diisopropylamide   Solution Kinetics And
Sandrogreco Lithium Diisopropylamide Solution Kinetics And
 
Cadd and molecular modeling for M.Pharm
Cadd and molecular modeling for M.PharmCadd and molecular modeling for M.Pharm
Cadd and molecular modeling for M.Pharm
 
Standardized Representations of ELN Reactions for Categorization and Duplicat...
Standardized Representations of ELN Reactions for Categorization and Duplicat...Standardized Representations of ELN Reactions for Categorization and Duplicat...
Standardized Representations of ELN Reactions for Categorization and Duplicat...
 
Essential Biology: Making ATP Workbook (SL Core only)
Essential Biology: Making ATP Workbook (SL Core only)Essential Biology: Making ATP Workbook (SL Core only)
Essential Biology: Making ATP Workbook (SL Core only)
 
43_EMIJ-06-00212.pdf
43_EMIJ-06-00212.pdf43_EMIJ-06-00212.pdf
43_EMIJ-06-00212.pdf
 
SF and PE CTR-IN 2016 Poster_FInal
SF and PE CTR-IN 2016 Poster_FInalSF and PE CTR-IN 2016 Poster_FInal
SF and PE CTR-IN 2016 Poster_FInal
 
Molecular docking
Molecular dockingMolecular docking
Molecular docking
 
Basics Of Molecular Docking
Basics Of Molecular DockingBasics Of Molecular Docking
Basics Of Molecular Docking
 
Drug design based on bioinformatic tools
Drug design based on bioinformatic toolsDrug design based on bioinformatic tools
Drug design based on bioinformatic tools
 
Year 9 rubric eg
Year 9 rubric egYear 9 rubric eg
Year 9 rubric eg
 
Implementing Life Cycle Assessment to Understand the Environmental and Energy...
Implementing Life Cycle Assessment to Understand the Environmental and Energy...Implementing Life Cycle Assessment to Understand the Environmental and Energy...
Implementing Life Cycle Assessment to Understand the Environmental and Energy...
 
Extracting medicinal chemistry knowledge by a secured Matched Molecular Pair ...
Extracting medicinal chemistry knowledge by a secured Matched Molecular Pair ...Extracting medicinal chemistry knowledge by a secured Matched Molecular Pair ...
Extracting medicinal chemistry knowledge by a secured Matched Molecular Pair ...
 
ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...
 

Mehr von NextMove Software

CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
NextMove Software
 

Mehr von NextMove Software (20)

DeepSMILES
DeepSMILESDeepSMILES
DeepSMILES
 
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
 
Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...
 
CINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speedCINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speed
 
A de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESA de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILES
 
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionRecent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
 
Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...
 
Comparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsComparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule Implementations
 
Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...
 
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
 
Recent improvements to the RDKit
Recent improvements to the RDKitRecent improvements to the RDKit
Recent improvements to the RDKit
 
Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...
 
Digital Chemical Representations
Digital Chemical RepresentationsDigital Chemical Representations
Digital Chemical Representations
 
Challenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsChallenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptions
 
PubChem as a Biologics Database
PubChem as a Biologics DatabasePubChem as a Biologics Database
PubChem as a Biologics Database
 
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
 
CINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction DatabasesCINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction Databases
 
Building on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesBuilding on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfiles
 
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)
 

Kürzlich hochgeladen

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
Chris Hunter
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
heathfieldcps1
 

Kürzlich hochgeladen (20)

Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 

Efficient Searching and Similarity of Unmapped Reactions: Application to ELN Analysis

  • 1. Efficient Searching and Similarity of Unmapped Reactions: Application to ELN Analysis Roger Sayle, Ed Griffen, Thierry Kogej and David Drake NextMove Software, Cambridge, UK AstraZeneca, Alderley Park, UK AstraZeneca, Molndal, Sweden ACS National Meeting, San Diego, USA 27th March 2012
  • 2. motivation • Better understanding of chemical reactions can be used to improve the design-test-make cycles in the pharmaceutical industry, both for in-house synthesis and outsourcing to CROs. • Although small molecule informatics is well supported, the computer representation, storage, manipulation, analysis, QSAR and property prediction of chemical reactions is relatively unexplored. ACS National Meeting, San Diego, USA 27th March 2012
  • 3. Motivating example from gsk • Stephen D. Pickett, Darren V.S. Green, David L. Hunt, David A. Pardoe and Ian Hughes, “Automated Lead Optimization of MMP-12 Inhibitors using a Genetic Algorithm”, ACS Medicinal Chemistry Letters, Vol. 2, pp. 28-33, 2011. 50 R1 x 50 R2 = 2500 cmpds ACS National Meeting, San Diego, USA 27th March 2012
  • 4. Libraries in practice • Objective was to be diverse complete, so variations of standard route and conditions were required to avoid compromising diversity. • Despite best efforts, 566 not made due to poor reactivity, unanticipated side reactions or product stability compared to 26 not assayed, 28 assay failed and 176 inactive, giving 1704 assay results. • Interestingly, R1=21&R2=7 and R1=31&R2=25 were the two most active compounds (pIC50=8), R1=21&R2=25 had a pIC50=7.8 and R1=31&R2=7 was never made! ACS National Meeting, San Diego, USA 27th March 2012
  • 5. Transforms vs. reactions Importance of Reaction Mechanism Example: Ullman-type Coupling Reactions SMIRKS: [H][N:1].[Cl][c:2]>>[N:1][c:2] The SMIRKS transform alone is insufficient to predict the products and by-products in this example. A measure of nucleophility is desirable for each atom in a molecule. Without this software may be misled into believing that protecting groups are required. ACS National Meeting, San Diego, USA 27th March 2012
  • 6. Beyond drug guru ACS National Meeting, San Diego, USA 27th March 2012 sildenafil (viagra) vardenafil (levitra)
  • 7. Proposed solution • Analyze the hundreds of thousands of reaction examples described in in-house Electronic Laboratory Notebooks (ELNs). • A typical use case would be to plot reaction yield for related reactions against catalysts and/or solvents used to determine the likely best or poor conditions. • ELNs, unlike the literature, are strongly biased to the types of reactions/chemistries used in pharma. • Most importantly, one of the only sources of negative data (reactions that have failed). ACS National Meeting, San Diego, USA 27th March 2012
  • 8. The challenges 1. Export of the data from the ELN. 2. High fidelity conversion to other file formats. 3. Reaction normalization/standardization. 4. Reaction identity (canonicalization). 5. Improved reaction depiction. 6. Intuitive reaction searching. 7. Reaction similarity. 8. Reaction classification/clustering. ACS National Meeting, San Diego, USA 27th March 2012
  • 9. Database export • Typically, ELNs are implemented as complex schemas within relational databases (Oracle), supporting transactions, auditing and security privileges. • Not uncommonly the vendor provided functionality or APIs for data export are slow and/or buggy. • In addition to reactions and structures, there is often a requirement to export all associated data, including textual and numeric data, tables, even LCMS and NMR spectra. ACS National Meeting, San Diego, USA 27th March 2012
  • 10. typical eln fields in RD format $DTYPE REACTION:REACTION.CONDITIONS:TEMPERATURE:VALUE $DATUM 120 °C $DTYPE REACTION:REACTION.CONDITIONS:TEMPERATURE:MINVAL $DATUM 393.15 $DTYPE REACTION:REACTION.CONDITIONS:TEMPERATURE:MAXVAL $DATUM 393.15 $DTYPE REACTION:REACTION.CONDITIONS:PRESSURE:VALUE $DATUM 5 bar $DTYPE REACTION:REACTANTS(1):NAME $DATUM nicotinoyl chloride $DTYPE REACTION:REACTANTS(1):CHEMICAL.STRUCTURE $DATUM c1cc(cnc1)C(=O)Cl $DTYPE REACTION:REACTANTS(1):FORMULA.MASS:VALUE $DATUM 141.56 $DTYPE REACTION:REACTANTS(2):NAME $DATUM (2R,3S)-3-methylpentan-2-ol $DTYPE REACTION:REACTANTS(2):CHEMICAL.STRUCTURE $DATUM CC[C@@H](C)[C@H](C)O $DTYPE REACTION:REACTANTS(2):FORMULA.MASS:VALUE $DATUM 102.17 ACS National Meeting, San Diego, USA 27th March 2012 $DTYPE REACTION:PRODUCTS(1):CHEMICAL.STRUCTURE $DATUM CC[C@@H](C)[C@H](C)OC(=O)c1cccnc1 $DTYPE REACTION:PRODUCTS(1):NAME $DATUM (2R,3S)-3-methylpentan-2-yl nicotinate $DTYPE REACTION:PRODUCTS(1):MOLECULAR.WEIGHT:VALUE $DATUM 207.27 $DTYPE REACTION:PRODUCTS(1):MOLECULAR.FORMULA $DATUM C12H17NO2 $DTYPE REACTION:PRODUCTS(1):ACTUAL.MOLES:VALUE $DATUM 2.16 mmol $DTYPE REACTION:SOLVENTS(1):NAME $DATUM 1-pentanol $DTYPE REACTION:SOLVENTS(1):VOLUME:VALUE $DATUM 5 mL $DTYPE REACTION:SOLVENTS(1):R.VALUES $DATUM R10,R20
  • 11. File format conversion • The source format for many reactions is typically a sketch, in either CDX, CDXML, ISIS Sketch or Marvin file format. • For data processing reactions are much easier to handle as reaction SMILES, MDL RXN or RD files and possibly even variants of MOL and SD file formats. • Alas handling of reaction file formats is generally poorly handled by many cheminformatics tools. • Additionally, reaction file formats can rarely encode all of the same information. ACS National Meeting, San Diego, USA 27th March 2012
  • 12. Reaction standardization • An often overlooked aspect of ELNs is the need to enforce “business rules” to consistently represent a reaction, in the same way that normalized molecules are stored in registration systems. • Pharmaceutical ELNs contain structures where nitros are represented arbitrarily, and even cases where azide representations differ on each side of an arrow. • Unfortunately, the rules used for molecules (such as InChI) may be inappropriate for reactions, where metal co-ordination and radicals play a major role. ACS National Meeting, San Diego, USA 27th March 2012
  • 13. Reaction identity • Continuing the notion of standardization, one might ask when are two reactions the same (duplicates). • Two reactions that have matching agents (catalysts and solvents), reactants and products are considered identical, yet two that have the same reactants and products are considered the “same”. • Whether a component is a reactant, catalyst, solvent or reagent may be consistently defined by atom- mapping; reactants contribute atoms to the product. ACS National Meeting, San Diego, USA 27th March 2012
  • 14. Improved reaction depiction • The 2D display of reactions can be improved by suitably aligning the reactants and products. – Molecules should be rotated or flipped (adjusting chirality) so that the substructures are consistent across an arrow. – The ordering reactants in coupling reactions should match their ordering in the product. – Ideally, the configuration of angles, torsions and chiral bonds should be consistent across a reaction arrow. • Agents, solvents, salts and even protecting groups may be easier to interpret textually (superatoms). ACS National Meeting, San Diego, USA 27th March 2012
  • 15. Reaction depiction ACS National Meeting, San Diego, USA 27th March 2012
  • 16. Reaction depiction ACS National Meeting, San Diego, USA 27th March 2012
  • 17. Rxn searching: make or Break • In order to simplify the learning curve for exploiting reaction databases, an easy to use query mechanism has been developed for frequent use-cases. • Query: Retrieve reactions that make indoles. • Traditional interfaces provide either substructure search or require reaction centers/atom mapping. • A more convenient (robust) mechanism is to check whether (or count) that a drawn substructure appears in the products but not in either the reactants or agents. ACS National Meeting, San Diego, USA 27th March 2012
  • 18. Avoiding uninteresting results ACS National Meeting, San Diego, USA 27th March 2012
  • 19. Atom mapping ambiguity • Mechanistic atom mappings depend on mechanism. ACS National Meeting, San Diego, USA 27th March 2012
  • 20. Atom mapping ambiguity • Mechanistic atom mappings depend on mechanism. ACS National Meeting, San Diego, USA 27th March 2012
  • 21. Atom mapping ambiguity • Mechanistic atom mappings depend on mechanism. ACS National Meeting, San Diego, USA 27th March 2012
  • 22. bond changes in indole synthesis Synthesis A B C D Baeyer-Emmerling M Bartoli M M Bischler-Möhlau M C M Fischer M C M Fukuyama M Hemetsberger M Larock M C M Mandelung M Nenitzescu M M Reissert M M Larock M C M ACS National Meeting, San Diego, USA 27th March 2012
  • 23. Reaction similarity • To define the similarity between two reactions, we employ an ECFP variation of Daylight’s “difference” fingerprints. • Conceptually, a difference fingerprint encodes the differences between the fingerprint of the reactants and fingerprint of the products. • We use differences in counted Scitegic/Accelrys ECFP fingerprints to capture the local environment of the reaction centre without requiring atom mapping. ACS National Meeting, San Diego, USA 27th March 2012
  • 24. Reaction classification • Although reaction similarity may be used to cluster and group mechanistically related reactions together, it is also convenient for them to be categorized by the type of reaction. • Manually assignment into the RSC’s RXNO ontology. • InfoChem’s ICCLASSIFY and ICNAMERXN tools. • Algorithmic assignment of reaction class (compound purchase, purification, chiral separation) or recognized (named) reaction mechanism using SMIRKS transformations. ACS National Meeting, San Diego, USA 27th March 2012
  • 25. categorization of reactions J. Carey, D. Laffan, C. Thomson, M. Williams, Org. Biomol. Chem. 2337, 2006. S. Roughley and A. Jordan, J. Med. Chem. 54:3451-3479, 2011. 23.1 22.4 11.5 18.0 8.2 5.6 5.6 3.1 1.5 1.1 Heteroatom alkylation and arylation Acylation and related processes C-C bond formation Deprotections Heterocycle formation Reductions Functional group conversion Protections Oxidations Functional group addition ACS National Meeting, San Diego, USA 27th March 2012
  • 26. conclusions • In an attempt to better understand and hopefully improve the productivity of synthetic chemists, new computational methods have been developed to process “real world” organic reaction data. • The fruits of this work now enable medicinal chemists and informaticians to make greater use of the wealth of information in their in-house ELNs. ACS National Meeting, San Diego, USA 27th March 2012
  • 27. acknowledgements • Daniel Lowe, NextMove Software. • Plamen Petrov and Mikko Vainio, AstraZeneca. • Richard Bolton and Andrew Wooster, GSK. • Peter Loew and Hans Kraut, InfoChem. • Ethan Hoff, Abbott Laboratories. • Stephen Roughley, Vernalis. • Thank you for your time. ACS National Meeting, San Diego, USA 27th March 2012