Example (actionable) insight
The clear trend between Suzuki coupling success rate and
predicted octanol-water partition co...
Big data mining confirmation of the Nadin-Churcher hypothesis
On 16,335 Suzuki coupling reactions extracted from US patent app...
“Big data” reaction yield analysis
AZ Data courtesy of Nick Tomkinson, AstraZeneca RDI, Alderley Park, UK.
Functional group changes analyzed
Do the results make sense at all?
Functional group   Avg in reaction   Overall average
halog...

CINF 170: Regioselectivity: An application of expert systems and ontologies to chemical (named) reaction analysis

Prediction is much harder than analysis. Consider hurricanes and tornadoes: it's much easier to follow the path of destruction by locating devastated neighborhoods than to forecast the paths of such weather systems in advance. The same is true of many chemical reactions. In nitration (refluxing with nitric and sulfuric acids), for example, the appearance of one or more nitro groups indicates a nitration reaction, but predicting where on a non-trivial organic molecule this functional group will appear is a much harder challenge. In this sense, reaction analysis is much simpler than (either forward or retrosynthetic) synthesis planning.

NextMove Software's NameRxn is an expert system for classifying reactions (from reaction SMILES, MDL connection tables or ChemDraw sketches), typically assigning each reaction instance to a leaf class in the Royal Society of Chemistry's RXNO ontology. Such tools can help in analyzing the regioselectivity preferences of reactions.

This talk consists of two parts: a technical part describing recent algorithmic and methodological improvements to the NameRxn software, including some of the more challenging of the 1000+ reactions it currently identifies; and a scientific part investigating the regioselective preferences of some of these reactions.

  1. Regioselectivity: An Application of Expert Systems and Ontologies to Chemical (Named) Reaction Analysis
     CINF Reaction Analytics. 256th ACS National Meeting, Boston, MA. Thursday 23rd August 2018
     Roger Sayle, John Mayfield and Noel O’Boyle, NextMove Software, Cambridge, UK
  2. Analysis vs. prediction
     • In “Applied Chemoinformatics” (2018), J. Goodman defines three main problems:
       – Reaction Planning: R → ? → P [Database]
       – Reaction Prediction: R1 + R2 → ? [Simulation]
       – Synthesis Planning: R? + R? + R? → P [Design]
     • A corollary is that there’s a distinction between reactions that have already been observed and experiments yet to be performed.
  3. Analysis vs. prediction
  4. Intuition and counter-intuition
     With apologies to Mark Twain: “Never let the facts get in the way of a good synthesis plan”. There are reactions that chemists expect to happen but don’t, and those they don’t expect but do. But cheminformaticians are alchemists who can turn lead into gold as easily as “[Pb]>>[Au]”.
  5. Experimental validation
     • Synthesis of a novel aromatic heterocycle previously unreported in the literature.
     • William Pitt et al., “Heteroaromatic Rings of the Future”, Journal of Medicinal Chemistry, 52(9):2952-2963, 2009.
  6. Expectations: setting the scope
     Goodman challenges / Carey et al. challenges
     Maitotoxin
     Difficult to access substituted aromatic starting materials.
     Eribulin
     Org. Biomol. Chem. 2006, 4, 2337-2347.
  7. Which reactions are important?
     • NCI/LHASA SAVI (14+6 reactions)
       – Suzuki coupling: 36262 out of 1.2M USPTO examples
       – Sulfonamide Schotten-Baumann: 14348 out of 1.2M USPTO examples
       – Buchwald-Hartwig: 6040 out of 1.2M USPTO examples
       – Hiyama coupling: 458 out of 1.2M USPTO examples
       – Fukuyama coupling: 2 out of 1.2M USPTO examples
       – Liebeskind-Srogl coupling: 0 out of 1.2M USPTO examples
     • Hartenfeller/Schneider (58 reactions)
       – #1 Pictet-Spengler reaction: 7 out of 1.2M USPTO examples
       – #10 + #11 Azide-nitrile Huisgen cycloaddition: 5 out of 1.2M USPTO examples
       – #17 Pyridone synthesis: 2 out of 1.2M USPTO examples
       – #20 Phthalazinone synthesis: 16 out of 1.2M USPTO examples
       – #24 Friedländer quinoline synthesis: 30 out of 1.2M USPTO examples
     • Enamine REAL 2016 (43 reactions)
       – Thiourea to guanidine (14 out of 160M examples).
  8. Myth #1: heterocycle formation
     • Because almost all drug-like molecules contain a heterocycle, there’s a belief that heterocycle formation is important.
     • Analysis of both patent data and pharmaceutical ELNs reveals that heterocycle-forming reactions, even named heterocycle-forming reactions, are relatively rare, with ring systems often being purchased as building blocks*.
     * This analysis might not apply to process development and manufacturing.
     https://nextmovesoftware.com/blog/2016/10/24/buying-a-ring-or-making-one-yourself/
  9. Reactions that don’t happen!
     • Paal-Knorr pyrrole synthesis vs. aldehydes/ketones
     Of the 430 examples of Paal-Knorr pyrrole synthesis reported in US patent applications 2001-2012, exactly zero contain any ketone or aldehyde beyond the two reacting carbonyls.
  10. Who let the dogs out?
     • Big Data can determine the utility and scope of reactions.
  11. Reactions that do happen!
     • Chloro Sonogashira couplings (@ NCI)
       – The LHASA ‘CHMTRN’ rules for Transform 2267, Sonogashira couplings (presented by Marc Nicklaus at Sheffield), state: “Iodides are usually more reactive than bromide. Chlorides do not react”.
         Code    Name                          Count   Yield
         3.3.2   Bromo Sonogashira coupling     3717   49.3%
         3.3.3   Chloro Sonogashira coupling     429   44.2%
         3.3.4   Iodo Sonogashira coupling      2721   64.9%
     • Isotopically-labelled compounds (@ Eli Lilly)
       – The LAAR reactions used to construct Lilly’s PLC (Nicolaou et al. 2016) forbid the presence of isotopic labels in reactants.
  12. Improving upon Synthia™
     • As presented earlier, Chematica/Synthia maintains a manually curated “black list” of interfering functional groups.
     • A more labor-efficient, data-driven strategy is to automatically maintain a “white list” of tolerated functional groups derived from observed experiments.
     • It’s much easier to track the things you know about and can see than the things you can’t see and/or don’t know about.
  13. Handling noisy data (n>1 statistics)
     • Kumada couplings incompatible with aldehydes...
     • US20050107257A1 [p0196] Syngenta
       A solution of 34.3 ml of ethylmagnesium chloride in 50 ml of tetrahydrofuran is added dropwise at −70 °C to 7.3 g of indium trichloride in 200 ml of tetrahydrofuran and, after stirring for 30 minutes, the reaction temperature is allowed to rise slowly to room temperature. That solution is added to a solution of 14.9 g of 3-bromo-4-pyridine carbaldehyde and 2.8 g of PdCl2(PPh3)2 in 240 ml of tetrahydrofuran and the reaction mixture is heated under reflux for 20 hours. 5 ml of methanol are then added and the mixture is concentrated in vacuo, stirred thoroughly with diethyl ether, filtered off and concentrated in vacuo once more. The residue is chromatographed on silica gel using ethyl acetate/hexane (1:1), yielding 3-ethyl-4-pyridine carbaldehyde (I) in the form of a yellow oil.
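The white-list idea from the Synthia slide, combined with the n>1 threshold in the title of the noisy-data slide, can be sketched as follows. This is a minimal illustration under stated assumptions, not NextMove's implementation: each observation is assumed to be already classified and annotated with the spectator functional groups in its reactants (groups that survive unchanged). The class and group names, the `min_count` parameter and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# One observation: (reaction class, spectator functional groups that were
# present in the reactants and survived unchanged into the product).
Observation = Tuple[str, Set[str]]


def build_whitelist(observations: Iterable[Observation],
                    min_count: int = 2) -> Dict[str, Set[str]]:
    """Per reaction class, the groups seen tolerated in at least `min_count`
    examples; requiring n > 1 guards against a single noisy record."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for rxn_class, groups in observations:
        counts[rxn_class].update(groups)
    return {rxn_class: {g for g, n in c.items() if n >= min_count}
            for rxn_class, c in counts.items()}


def is_tolerated(whitelist: Dict[str, Set[str]], rxn_class: str,
                 groups: Set[str]) -> bool:
    """True if every spectator group has enough precedent for this class."""
    return groups <= whitelist.get(rxn_class, set())


if __name__ == "__main__":
    data = [
        ("Kumada coupling", {"ether"}),
        ("Kumada coupling", {"ether", "nitrile"}),
        ("Kumada coupling", {"aldehyde"}),  # a single, possibly noisy record
    ]
    wl = build_whitelist(data)
    print(sorted(wl["Kumada coupling"]))                      # ['ether']
    print(is_tolerated(wl, "Kumada coupling", {"aldehyde"}))  # False
```

Lowering `min_count` to 1 trusts every extracted record; raising it trades coverage for robustness against mis-extracted or mis-classified examples.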
  14. cf. US20140296184A1 [C00008]
     M. Segler, M. Preuss and M. Waller, “Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI”, Nature, 555:604-610, 2018.
     A. Gini, M. Segler et al., “Dehydrogenative TEMPO-mediated formation of Unstable Nitrones: Easy Access to N-Carbamoyl Isoxazolines”, Chem. Eur. J. 21:12053-12060, 2015.
     Prediction? About the future?
  15. Search: Monte Carlo vs. proof-number
     • Appropriate choice of AI search technique.
     • Monte Carlo search is inappropriate for search problems such as mazes.
     • White to win in 173 ply: Kf1-f2!
     [Figure: AND/OR proof-number search tree, OR and AND nodes labelled 0/1]
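The proof-number search named on this slide works on AND/OR trees. The sketch below shows only its core bookkeeping: proof/disproof numbers computed bottom-up and the descent to the most-proving leaf. Mapping it to retrosynthesis (OR node = molecule, any one reaction suffices; AND node = reaction, all precursors needed; proven leaf = purchasable) is our illustrative assumption, not a description of any particular planner. Numbers are recomputed from scratch on each call for clarity, which a real implementation would avoid.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

INF = math.inf


@dataclass
class Node:
    kind: str                      # "OR": one child suffices; "AND": all children needed
    children: List["Node"] = field(default_factory=list)
    value: Optional[bool] = None   # leaves only: True proven, False disproven, None unknown


def numbers(node: Node) -> Tuple[float, float]:
    """(proof number, disproof number) of a node, computed bottom-up."""
    if not node.children:
        if node.value is True:
            return 0, INF
        if node.value is False:
            return INF, 0
        return 1, 1                # unexpanded leaf
    pairs = [numbers(c) for c in node.children]
    if node.kind == "OR":
        return min(p for p, _ in pairs), sum(d for _, d in pairs)
    return sum(p for p, _ in pairs), min(d for _, d in pairs)


def most_proving(node: Node) -> Node:
    """Descend to the leaf that should be expanded next."""
    while node.children:
        pairs = [numbers(c) for c in node.children]
        if node.kind == "OR":
            # OR node: follow the child that is cheapest to prove.
            i = min(range(len(pairs)), key=lambda k: pairs[k][0])
        else:
            # AND node: follow the child that is cheapest to disprove.
            i = min(range(len(pairs)), key=lambda k: pairs[k][1])
        node = node.children[i]
    return node
```

Unlike Monte Carlo rollouts, the search is steered entirely by how many leaves remain to be resolved, which suits goal-directed problems with exact win/lose (or made/not-made) outcomes.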
  16. Categorization of reactions
     1. J. Carey, D. Laffan, C. Thomson, M. Williams, Org. Biomol. Chem. 4:2337-2347, 2006.
     2. S. Roughley and A. Jordan, J. Med. Chem. 54:3451-3479, 2011.
     [Figure: pie chart of reaction categories (heteroatom alkylation and arylation; acylation and related processes; C-C bond formations; heterocycle formation; protections; deprotections; reductions; oxidations; functional group conversion; functional group addition; resolution), with segment labels 34%, 17%, 5%, 2%, 3%, 6%, 10%, 1%, 15%, 2%, 5%]
  17. [Image-only slide]
  18. Reaction ontology
     • Reactions are classified into a common subset of the Carey et al. classes and the RSC’s RXNO ontology.
     • There are 12 super-classes,
       – e.g. 3 C-C bond formation (RXNO:0000002).
     • These contain 84 classes/categories,
       – e.g. 3.5 Pd-catalyzed C-C bond formation (RXNO:0000316).
     • These contain ~1050 named reactions/types,
       – e.g. 3.5.3 Negishi coupling (RXNO:0000088).
     • These require ~2200 SMIRKS-like transformations.
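The dotted class codes on the reaction ontology slide encode the hierarchy directly (super-class, class, reaction type). A minimal sketch of navigating it, for example to roll counts up from reaction types to super-classes. The `RXNO_IDS` table is a hypothetical stand-in holding only the slide's three examples; a real mapping would cover every class.

```python
from typing import Dict, List

# Hypothetical lookup holding only the three examples given on the slide.
RXNO_IDS: Dict[str, str] = {
    "3": "RXNO:0000002",      # C-C bond formation
    "3.5": "RXNO:0000316",    # Pd-catalyzed C-C bond formation
    "3.5.3": "RXNO:0000088",  # Negishi coupling
}


def ancestors(code: str) -> List[str]:
    """'3.5.3' -> ['3', '3.5', '3.5.3'] (super-class, class, reaction type)."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def rxno_chain(code: str) -> List[str]:
    """RXNO identifiers for every level that has a known mapping."""
    return [RXNO_IDS[c] for c in ancestors(code) if c in RXNO_IDS]


def roll_up(counts: Dict[str, int], depth: int) -> Dict[str, int]:
    """Aggregate per-reaction-type counts to the first `depth` levels."""
    totals: Dict[str, int] = {}
    for code, n in counts.items():
        key = ".".join(code.split(".")[:depth])
        totals[key] = totals.get(key, 0) + n
    return totals
```

For example, rolling the three deprotection counts from the top-10 table (6.1.1, 6.2.2, 6.2.1) up to depth 1 gives a single total for super-class 6.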
  19. Complication: agents matter
     From http://en.wikipedia.org/wiki/Diazonium_compound
     • Sandmeyer reaction: benzenediazonium chloride heated with cuprous chloride dissolved in HCl yields chlorobenzene.
       C6H5N2+ + CuCl → C6H5Cl + N2 + Cu+
     • Gattermann reaction: benzenediazonium chloride warmed with copper powder and HCl yields chlorobenzene.
       C6H5N2+ + CuCl → C6H5Cl + N2 + Cu+
  20. 10 most popular reactions
       ID      Name                            Count
       2.1.2   Carboxylic acid + amine        26,040
       1.3.1   Buchwald-Hartwig amination     22,048
       3.1     Suzuki coupling                16,508
       1.7.6   Williamson ether synthesis     15,665
       2.1.1   Amide Schotten-Baumann         11,016
       7.1     Nitro to amino                 10,234
       6.1.1   N-Boc deprotection              9,821
       6.2.2   CO2H-Me deprotection            9,487
       6.2.1   CO2H-Et deprotection            6,749
       2.2.3   Sulfonamide Schotten-Baumann    6,223
  21. Most/least successful reactions
       ID      Name                                 Mean yield   Count
       1.7.2   Diazomethane esterification          91%             41
       9.3.1   Carboxylic acid to acid chloride     88%            704
       9.7.14  Bromo to azido                       85%            235
       1.7.5   Methyl esterification                84%           2918
       9.7.19  Bromo to iodo Finkelstein reaction   82%            116
       6.1.3   N-Cbz deprotection                   81%           1359
       …
       4.1.11  Larock indole synthesis              47%             55
       3.11.3  Ullmann-type biaryl coupling         44%            407
       1.7.1   Chan-Lam ether coupling              44%            154
       4.1.4   Pinner pyrimidine synthesis          39%             47
  22. Rare named reactions
     • Adams decarboxylation
     • Angeli-Rimini reaction
     • Aza-Baylis-Hillman reaction
     • Boyer reaction
     • Buchwald-Fischer indole synthesis
     • Castro-Stephens coupling
     • Chugaev elimination
     • Cook-Heilbron thiazole synthesis
     • Fischer-Hepp rearrangement
     • Fukuyama indole synthesis
     • Gassman indole synthesis
     • Imine Hosomi-Sakurai reaction
     • Koch reaction
     • Leuckart reaction
     • Liebeskind-Srogl coupling
     • Lossen rearrangement
     • Ponzio reaction
     • Prins reaction
     • Reimer-Tiemann carboxylation
  23. Trends in reaction types
     [Figure: Suzuki couplings as a percentage of reactions in a year, 1976–2013; y-axis 0.0%–8.0%]
       Leaving group   Mean yield   N observations
       Bromo           58.80%       10817
       Chloro          57.96%        2752
       Iodo            57.21%        2049
       Triflyloxy      65.48%         717
  24. NameRxn reaction naming
     • Many reaction classification algorithms are dependent upon atom-atom mapping assignments.
     • Alas, MCS-based atom mapping algorithms are often slow and/or inaccurate [Lowe & Sayle 2012 & 2013].
     • NameRxn is a mechanism-based atom mapper.
     • All reactants and reagents are placed in a single pot (molecule) and sets of SMIRKS are applied in turn.
     • If the desired product is generated, the reaction (its mechanism and mapping) is identified.
       – Rationalization is easier than prediction!
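The naming loop described on the NameRxn slide (put everything in one pot, apply transforms in turn, stop when the recorded product appears) can be sketched as below. This is a toy illustration of the control flow only: in a real system each transform would be a SMIRKS applied by a cheminformatics toolkit and products compared after canonicalization, whereas here the transforms are hypothetical Python callables over SMILES strings so that the example runs without a toolkit.

```python
from typing import Callable, FrozenSet, Iterable, Optional, Set, Tuple

# A "pot" holds every reactant and reagent (here as SMILES strings); a
# transform returns the set of products it can generate from the pot.
Pot = FrozenSet[str]
Transform = Callable[[Pot], Set[str]]


def name_reaction(pot: Pot, product: str,
                  transforms: Iterable[Tuple[str, Transform]]) -> Optional[str]:
    """Apply each named transform in turn; the first that regenerates the
    recorded product identifies the reaction."""
    for name, transform in transforms:
        if product in transform(pot):
            return name
    return None


# Toy stand-ins for SMIRKS transforms (hypothetical, string-based).
def methyl_ester(pot: Pot) -> Set[str]:
    return {"CC(=O)OC"} if {"CC(=O)O", "CO"} <= pot else set()


def amide_coupling(pot: Pot) -> Set[str]:
    return {"CC(=O)NC"} if {"CC(=O)O", "CN"} <= pot else set()


RULES = [("Methyl esterification", methyl_ester),
         ("Carboxylic acid + amine", amide_coupling)]

if __name__ == "__main__":
    pot = frozenset({"CC(=O)O", "CN", "ClCCl"})   # acid, amine, solvent
    print(name_reaction(pot, "CC(=O)NC", RULES))    # Carboxylic acid + amine
```

Because the matching transform also says which reactant atoms became which product atoms, identifying the reaction and atom-mapping it happen in the same step.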
  25. Example SMARTS/SMIRKS
         # NOZAKI_HIYAMA_KISHI_REACTION
         [#6v4+0;X4,X3:1][BrD1h0+0:2].[Ni].[Cr].[OD1h0+0:3]=[CD2h1v4+0:4]>>[#6:1][C:4]-[Oh1:3]
         # PAAL_KNORR_THIOPHENE_SYNTHESIS
         [OD1h0+0:1]=[CX3v4+0:2][CX4v4+0:3]([H])[CX4v4+0:4]([H])[CX3v4+0:5]=[OD1h0+0:6]>>[S:1]1[C:2]=[C:3][C:4]=[C:5]1
     • Writing SMIRKS is both an art and a science.
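One small, mechanical part of the "science" of writing SMIRKS is checking atom maps. A minimal sketch, assuming the simple `reactants>>products` form used on the SMIRKS slide (no agents section): every mapped product atom should come from a mapped reactant atom, and reactant maps absent from the products are atoms the transform discards, such as leaving groups. The regex-based parsing is a simplification of our own, not a full SMARTS parser.

```python
import re
from typing import Set, Tuple

# Atom-map numbers appear as ':<digits>]' at the end of a bracket atom.
MAP_RE = re.compile(r":(\d+)\]")


def map_numbers(smarts: str) -> Set[int]:
    """All atom-map numbers used in a SMARTS/SMIRKS fragment."""
    return {int(m) for m in MAP_RE.findall(smarts)}


def check_smirks(smirks: str) -> Tuple[Set[int], Set[int]]:
    """Return (maps only in products, maps only in reactants).

    The first set should be empty for a well-formed transform; the second
    lists mapped reactant atoms that do not appear in the products."""
    reactants, _, products = smirks.partition(">>")
    r, p = map_numbers(reactants), map_numbers(products)
    return p - r, r - p


if __name__ == "__main__":
    nhk = ("[#6v4+0;X4,X3:1][BrD1h0+0:2].[Ni].[Cr].[OD1h0+0:3]=[CD2h1v4+0:4]"
           ">>[#6:1][C:4]-[Oh1:3]")
    print(check_smirks(nhk))   # (set(), {2}): the bromine is dropped
```

For the Paal-Knorr thiophene SMIRKS the same check reports map 6 (the second carbonyl oxygen) as dropped; note that map 1 changes element from O to S, which this simple check does not flag.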
  26. SMARTS pattern compilation
         // Generated atom predicate: ring carbon, degree 3, no hydrogens,
         // total valence 4, neutral. 'ringinfo' is presumably the molecule's
         // RDKit::RingInfo*, declared outside this excerpt.
         bool atom_1(const RDKit::Atom *aptr) {
           return aptr->getAtomicNum() == 6 &&
                  ringinfo->numAtomRings(aptr->getIdx()) != 0 &&
                  aptr->getDegree() == 3 &&
                  aptr->getTotalNumHs(false) == 0 &&
                  aptr->getExplicitValence()+aptr->getImplicitValence() == 4 &&
                  aptr->getFormalCharge() == 0;
         }
         // Generated atom predicate: neutral carbon with total valence 4 and
         // either degree 2 with one or two hydrogens, or degree 3.
         bool atom_28(const RDKit::Atom *aptr) {
           if (aptr->getAtomicNum() != 6 ||
               aptr->getExplicitValence()+aptr->getImplicitValence() != 4 ||
               aptr->getFormalCharge() != 0)
             return false;
           return (aptr->getDegree() == 2 && aptr->getTotalNumHs(false) == 1) ||
                  (aptr->getDegree() == 2 && aptr->getTotalNumHs(false) == 2) ||
                  aptr->getDegree() == 3;
         }
  27. SMARTS pattern compilation
         // Excerpt of generated matcher code: an explicit state machine that
         // walks the bonds of atom[1], looking for an unvisited neighbor that
         // satisfies bond_1 and atom_1; case_8 unmarks atom[0] and advances
         // to the next bond when backtracking.
         RDKit::ROMol::OBOND_ITER_PAIR biter[25];
         …
         case_7:
           biter[0] = mol->getAtomBonds(atom[1]);
           goto case_9;
         case_8:
           avisit[atom[0]->getIdx()] = 0;
           ++biter[0].first;
         case_9:
           if (biter[0].first != biter[0].second) {
             bptr = (*mol)[*biter[0].first].get();
             if (bond_1(bptr)) {
               aptr = bptr->getOtherAtom(atom[1]);
               aidx = aptr->getIdx();
               if (avisit[aidx] == 0 && atom_1(aptr)) {
                 avisit[aidx] = 1;
                 atom[0] = aptr;
                 goto case_10;
               } else ++biter[0].first;
             } else ++biter[0].first;
           } else goto case_5;
           goto case_9;
  28. Recent improvements and insights
     • Profiling NameRxn in 2014 to diagnose performance problems with RDKit revealed that pattern matching and transformation accounted for <1% of runtime.
     • The bottleneck is actually in canonicalization.
     • The “aha” moment was to use hash filtering.
     • Check the molecular formula: C_i H_j Br_k Cl_l F_m I_n N_o O_p P_q S_r
     • Additional cleverness allows pre-sanitization hashing.
       – Triple bond count, but not single or double bond count.
       – Perhaps there’s something in InChI-style hashing after all.
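The hash-filtering idea on the profiling slide can be sketched as a cheap pre-check that skips canonicalization whenever a candidate product's formula cannot match the target. A minimal sketch, assuming each candidate can report its atoms (as element symbols) and its triple-bond count before sanitization; we include triple bonds on the assumption that, unlike single/double bonds, their count does not depend on aromaticity perception or Kekulé form. The `canonicalize` callable is a hypothetical placeholder for the expensive canonical-SMILES step, and molecules are plain strings purely for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Tuple

# Elements counted by the pre-filter, following the formula on the slide.
ELEMENTS = ("C", "H", "Br", "Cl", "F", "I", "N", "O", "P", "S")

Key = Tuple[int, ...]


def formula_key(atoms: Iterable[str], triple_bonds: int = 0) -> Key:
    """Cheap hash: counts of the tracked elements plus the triple-bond count.

    Elements outside ELEMENTS are ignored, so the key can let a non-match
    through (a false positive) but never rejects a true match."""
    counts = Counter(atoms)
    return tuple(counts[e] for e in ELEMENTS) + (triple_bonds,)


def find_product(target_key: Key, target_canon: str,
                 candidates: Iterable[Tuple[List[str], int, str]],
                 canonicalize: Callable[[str], str]) -> Tuple[Optional[str], int]:
    """Scan candidate products, canonicalizing only those whose key matches.

    Returns the matching candidate (or None) and the number of
    canonicalizations performed."""
    calls = 0
    for atoms, triple_bonds, mol in candidates:
        if formula_key(atoms, triple_bonds) != target_key:
            continue                       # rejected without canonicalizing
        calls += 1
        if canonicalize(mol) == target_canon:
            return mol, calls
    return None, calls
```

Isomers still pass the filter and must be canonicalized, but most wrong products generated by inapplicable transforms differ in formula and are rejected for the cost of a few integer comparisons.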
  29. [Image-only slide]
  30. Pistachio: Siri for chemists
  31. Multi-step synthetic routes
     [Figure: occurrences (0–700,000) vs. number of steps (1–17) for intermediates and terminal products]
       Intermediates: 197702, 103114, 56611, 31403, 17268, 9230, 5057, 2701, 1256, 639, 301, 136, 58, 15, 5, 2
       Terminal products: 385149, 149445, 81837, 47579, 27670, 16619, 9320, 5263, 2511, 1330, 678, 373, 111, 63, 8, 6, 5
  32. Application to planning 1
     • Cinnamic acid (PhCH=CHCO2H)
       1. Bromo Heck reaction (272)
       2. Horner-Wadsworth-Emmons reaction (268)
       3. Wittig olefination (129)
       4. Bromo Heck-type reaction (62)
       5. Iodo Heck reaction (49)
       6. Triflyloxy Heck[-type] reaction (43)
       7. Ester Schotten-Baumann (10)
       8. Bromo Suzuki coupling (5)
       9. Stille reaction (2)
       10. Olefin metathesis (1)
  33. Application to planning 2
     • p-Nitrobenzoic acid
       1. Nitrile to carboxy (12)
       2. CO2H-Me deprot (8)
       3. CO2H-Et deprot (5)
       4. Ester hydrolysis (1)
       5. Nitration (1)
     • p-Nitrotoluene
       1. Nitration (96)
       2. Bromo Suzuki-type (1)
       3. Chloro Suzuki (1)
  34. Tactical vs. strategic reactions
     • Traditional synthesis planning has concentrated on the Strategic Application of Named Reactions.
     • However, there’s much to be gained from the Tactical Application of Unnamed Reactions.
     • Nitro reduction is the 6th most frequent reaction.
  35. Functional group interconversion
     • Relative frequency of simple group conversion, from a total of 7,252,419 reactions from USPTO & EPO patents.
       Column headings: Amino, Bromo, Chloro, Fluoro, Hydroxy, Iodo, Sulfanyl, Thioxo
       Amino: 3951 2949 990 3976 3657 143
       Bromo: 3121 589 637 2390 738 435
       Chloro: 9424 717 1606 2156 744 798
       Fluoro: 1549 180 484 826 28 103
       Hydroxy: 2572 11441 31593 7641 3004 348
       Iodo: 155 445 47
       Nitro: 126606 138
       Oxo: 8419
  36. NextMove’s strategy
     • Why expose reaction planning to the end user at all?
     • During our collaboration with ChemSpace, it has become clear that what was required was neither similarity nor superstructure search, but a form of synthesis-aware search.
     • This is similar to the challenge faced by traditional retrosynthesis tools in identifying a leaf/goal state.
     • With 937M purchasable compounds, this is non-trivial.
     • The usual challenges of functional group, tautomer and protonation state lookups now also extend to protecting groups.
     • But why not return the carboxylic acid when searching for the acid chloride, or the alcohol when searching for the bromo derivative?
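The synthesis-aware search on the NextMove strategy slide (return the carboxylic acid when the acid chloride is requested) amounts to expanding a query with groups one routine interconversion away. A minimal sketch under heavy assumptions: molecules are reduced to a hypothetical (scaffold, functional group) key, and both the precursor table and the catalog index are illustrative placeholders, not a real catalog API. A production system would work on structures and also handle tautomers, protonation states and protecting groups.

```python
from typing import Dict, List, Tuple

# Hypothetical table: a building block bearing a group on the right can be
# turned into the requested group on the left by one routine interconversion.
PRECURSOR_GROUPS: Dict[str, List[str]] = {
    "acid chloride": ["carboxylic acid"],
    "bromo": ["hydroxy"],
}

# Hypothetical catalog index: (scaffold, functional group) -> catalog IDs.
Catalog = Dict[Tuple[str, str], List[str]]


def expand_query(group: str) -> List[str]:
    """The requested group first, then groups one FGI step away."""
    return [group] + PRECURSOR_GROUPS.get(group, [])


def synthesis_aware_search(catalog: Catalog, scaffold: str,
                           group: str) -> List[Tuple[str, str]]:
    """(catalog ID, group found) pairs; exact matches are listed first."""
    hits: List[Tuple[str, str]] = []
    for g in expand_query(group):
        for cid in catalog.get((scaffold, g), []):
            hits.append((cid, g))
    return hits
```

Listing exact matches before precursors keeps the result ordering honest: a precursor hit implies one extra synthetic step.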
  37. Building block search
     • Query:
     Results:
  38. The challenge of regioselectivity
     • A tricky benchmark is the reactions of 2,4,5-trichloropyrimidine.
     • The nature of pyrimidine makes the chloro at the 4-position more reactive than the one at the 2-position, which in turn is more reactive than the one at the 5-position.
     • Simple quantum mechanical methods have difficulty discerning this order.
  39. Handy’s rule
     • Scott Handy and Yanan Zhang, “A Simple Guide for Predicting Regioselectivity in the Coupling of Polyhaloheteroaromatics”, Chemical Communications, 3:299-301, Nov 2005.
       Abstract: A simple guide for predicting the order and site of coupling (Suzuki, Stille, Negishi, Sonogashira, etc.) in polyhaloheteroaromatics based upon the 1H NMR chemical shift values of the parent non-halogenated heteroaromatics has been developed.
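As we read the abstract on the Handy’s rule slide, the guide ranks the halogenated positions by the 1H NMR chemical shift of the corresponding C-H in the parent, non-halogenated heteroaromatic, with the most downfield position predicted to couple first. A minimal sketch of that ranking; the position labels and shift values in the usage example are illustrative placeholders, not measured data, and the rule itself is a heuristic.

```python
from typing import Dict, List


def handy_order(shifts_ppm: Dict[str, float], halogenated: List[str]) -> List[str]:
    """Predicted coupling order for the halogenated positions.

    `shifts_ppm` maps each ring position to the 1H chemical shift of that
    position in the parent (non-halogenated) heteroaromatic. The most
    downfield (largest shift) position is predicted to react first; ties
    keep their input order because sorted() is stable."""
    return sorted(halogenated, key=lambda pos: shifts_ppm[pos], reverse=True)


if __name__ == "__main__":
    # Illustrative shifts only.
    shifts = {"C2": 8.5, "C3": 7.2, "C4": 7.6}
    print(handy_order(shifts, ["C3", "C4", "C2"]))   # ['C2', 'C4', 'C3']
```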
  40. Electrophilic substitution of heterocycles with QM methods
     • J.C. Kromann et al., “Fast and Accurate Prediction of the Regioselectivity of Electrophilic Aromatic Substitution Reactions”, Chem. Sci., 9(3):660-665, Nov 2017.
     • M. Kruszyk et al., “Computational Methods to Predict the Regioselectivity of Electrophilic Aromatic Substitution Reactions of Heteroaromatic Systems”, J. Org. Chem. 81(12):5128-5134, Jun 2016.
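A common idea behind fast QM approaches such as the first reference (as we understand it) is to protonate each aromatic C-H in turn, compute the energy of each resulting sigma-complex, and predict the lowest-energy site or sites as most reactive towards electrophiles. The sketch below covers only that final selection step: the energies would come from a QM program, and the 1.0 kcal/mol default window is an assumption rather than a value taken from the cited papers.

```python
from typing import Dict, List


def predict_sites(energies: Dict[str, float], window: float = 1.0) -> List[str]:
    """Sites whose sigma-complex energy lies within `window` (same units as
    the energies, e.g. kcal/mol) of the lowest one, lowest energy first."""
    e_min = min(energies.values())
    return sorted((site for site, e in energies.items() if e - e_min <= window),
                  key=lambda site: energies[site])
```

With hypothetical relative energies {"C2": 10.0, "C3": 10.6, "C5": 14.2}, the default window returns ["C2", "C3"], while a 0.5 window keeps only C2; the window trades recall of competing sites against precision.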
  41. A data-driven strategy
     • One possible approach to the challenge of regioselectivity is to derive preferences by large-scale (statistical) analysis of reaction data sets.
     • Reaction classification can identify the subset of relevant examples, which can then be used to produce tables of positional preferences for heterocycles.
     • Directing groups and their influence can also be identified and tabulated.
     • See https://www.scripps.edu/baran/images/grpmtgpdf/Gutekunst_Apr_10.pdf
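The tabulation step in the data-driven strategy can be sketched as a simple count over classified examples. A minimal sketch, assuming classification and atom mapping have already reduced each example to a (reaction class, ring system, reacting position) triple; those labels and the toy data in the usage note are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# One classified, atom-mapped example reduced to:
# (reaction class, ring system, ring position that reacted).
Example = Tuple[str, str, str]


def position_preferences(
        examples: Iterable[Example]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """For each (reaction class, ring system), the fraction of examples
    that reacted at each position, most frequent first."""
    counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for rxn_class, ring, position in examples:
        counts[(rxn_class, ring)][position] += 1
    table: Dict[Tuple[str, str], Dict[str, float]] = {}
    for key, c in counts.items():
        total = sum(c.values())
        table[key] = {pos: n / total for pos, n in c.most_common()}
    return table
```

For instance, three hypothetical Suzuki examples at C4 of a pyrimidine and one at C2 give {"C4": 0.75, "C2": 0.25}. Extending the key with a directing-group label would tabulate directing effects in the same way.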
  42. The semantics of atom mapping
     Prof. Goodman poses an interesting question about atom mapping:
       4.1.6 Cyclic Beckmann rearrangement
       1.2.9 Alcohol + amine condensation
       or 1.1.3 Iodo N-methylation
  43. Acknowledgements
     • NextMove Software
       – Noel O’Boyle
       – John Mayfield
     • NextMove Alumni
       – Daniel Lowe
     • Thank you for your time.
     • Questions?
     • Thoughts?
     • AbbVie
     • AstraZeneca
     • Bristol-Myers Squibb
     • ChemSpace
     • Eli Lilly
     • GlaxoSmithKline
     • Hoffmann-La Roche
     • IBM Research Zurich
     • Merck
     • Novartis
     • Royal Society of Chemistry
     • Vernalis
     • Vertex Pharmaceuticals
  44. Transforms vs. reactions
     Importance of reaction mechanism
     Example: Ullmann-type coupling reactions
     SMIRKS: [H][N:1].[Cl][c:2]>>[N:1][c:2]
     The SMIRKS transform alone is insufficient to predict the products and by-products in this example. A measure of nucleophilicity is desirable for each atom in a molecule. Without this, software may be misled into believing that protecting groups are required.
  45. Analysis of pharmaceutical ELNs
     • NextMove Software’s HazELNut software is used to export and analyze ELN content at 6 of the top 10 large pharmaceutical companies.
     • In-house analysis of this data, across the industry, reveals a surprisingly high rate of synthesis failure, not indicated in the published literature (journal articles, patent applications or reaction databases).
     • Understanding the causes of these failures is perhaps more significant than attempting to access new chemistries.
  46. Extracting MPs and reactions
  47. Example reaction mining: input
     Methyl 4-[(pentafluorophenoxy)sulfonyl]benzoate
     To a solution of methyl 4-(chlorosulfonyl)benzoate (606 mg, 2.1 mmol, 1 eq) in DCM (35 ml) was added pentafluorophenol (412 mg, 2.2 mmol, 1.1 eq) and Et3N (540 mg, 5.4 mmol, 2.5 eq) and the reaction mixture stirred at room temperature until all of the starting material was consumed. The solvent was evaporated in vacuo and the residue redissolved in ethyl acetate (10 ml), washed with water (10 ml), saturated sodium hydrogen carbonate (10 ml), dried over sodium sulphate, filtered and evaporated to yield the title compound as a white solid (690 mg, 1.8 mmol, 85%).
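A simple consistency check that a reaction-mining pipeline might run on quantities extracted from text like the input above is to recompute mmol from mass and molar mass. A minimal sketch: the molecular formulas are supplied by hand here (in a pipeline they would presumably come from the name-to-structure step), the parser handles only simple formulas without brackets, and the atomic weights are rounded standard values.

```python
import re
from typing import Dict

# Rounded standard atomic weights (g/mol) for the elements used here.
ATOMIC_WEIGHT: Dict[str, float] = {
    "C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999,
    "F": 18.998, "S": 32.06, "Cl": 35.45,
}

# Element symbol followed by an optional count, e.g. "F5" or "H".
FORMULA_RE = re.compile(r"([A-Z][a-z]?)(\d*)")


def molar_mass(formula: str) -> float:
    """Molar mass of a simple formula such as 'C6HF5O' (no brackets)."""
    total = 0.0
    for symbol, count in FORMULA_RE.findall(formula):
        total += ATOMIC_WEIGHT[symbol] * (int(count) if count else 1)
    return total


def mmol(mass_mg: float, formula: str) -> float:
    """Millimoles from a mass in milligrams."""
    return mass_mg / molar_mass(formula)


def percent_yield(product_mmol: float, limiting_mmol: float) -> float:
    return 100.0 * product_mmol / limiting_mmol
```

For the pentafluorophenol quantity in the input text, `mmol(412, "C6HF5O")` is about 2.24, consistent with the reported 2.2 mmol; the product (C14H7F5O5S, 690 mg) gives about 1.81 mmol, matching the reported 1.8 mmol.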
  48. Example reaction mining: output
  49. Chemical reactions for free
  50. Bond changes in indole synthesis
     Synthesis (columns A, B, C, D)
       Baeyer-Emmerling: M
       Bartoli: M M
       Bischler-Möhlau: M C M
       Fischer: M C M
       Fukuyama: M
       Hemetsberger: M
       Larock: M C M
       Madelung: M
       Nenitzescu: M M
       Reissert: M M
  51. Suzuki coupling leaving groups
       Leaving group   Mean yield   N observations
       Bromo           58.80%       10817
       Chloro          57.96%        2752
       Iodo            57.21%        2049
       Triflyloxy      65.48%         717
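To compare each leaving group with the overall Suzuki average, the per-group means in the leaving-group table can be pooled by weighting each mean by its number of observations. A small worked sketch using only the table's own values:

```python
from typing import List, Tuple

# (leaving group, mean yield %, number of observations) from the table.
SUZUKI = [("Bromo", 58.80, 10817), ("Chloro", 57.96, 2752),
          ("Iodo", 57.21, 2049), ("Triflyloxy", 65.48, 717)]


def pooled_mean(groups: List[Tuple[str, float, int]]) -> Tuple[float, int]:
    """Observation-weighted mean of per-group means, and the total count."""
    total = sum(n for _, _, n in groups)
    return sum(mean * n for _, mean, n in groups) / total, total


if __name__ == "__main__":
    mean, n = pooled_mean(SUZUKI)
    print(f"{mean:.2f}% over {n} reactions")   # 58.75% over 16335 reactions
```

Against the pooled mean of about 58.75%, the three halides all lie within about 1.6 percentage points, while triflates stand out at 65.48% (on the smallest sample).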
  52. Beyond Drug Guru: sildenafil (Viagra) and vardenafil (Levitra)
  53. Eli Lilly’s automated synthesis lab
Alexander G. Godfrey, Thierry Masquelin and Horst Hemmerle, “A Remote-Control Adaptive MedChem Lab: An Innovative Approach to Enable Drug Discovery in the 21st Century”, Drug Discovery Today, Vol. 18, Nos. 17-18, September 2013.
  54. Synthesis failures at Lilly
• At the 2013 Sheffield Cheminformatics conference, Christos Nicolaou highlighted the technical challenge of predicting which compounds are accessible to Lilly’s Advanced Synthesis Lab (ASL).
• In a proof-of-concept pilot project, only 25 of 90 compounds suggested by the Lilly’s Annotated Reaction Repository (LARR) rule set could be synthesized successfully in practice.
• http://cisrg.shef.ac.uk/shef2013/talks/14.pdf
  55. Synthesis failures at GSK
• Pickett et al. 2011 describe the parallel synthesis of a 50×50 library of MMP-12 inhibitors by an iodo-Suzuki coupling reaction.
• Only 1704 of the 2500 compounds could be assayed (566 were not made).
Pickett et al., ACS Med. Chem. Lett. 2(1):28, 2011
  56. Learning from failure (logP shift)
• Nadin et al. 2012 [1] hypothesize that low logP is a major cause of synthesis failure in the parallel synthesis of combinatorial libraries.
• Analysis confirms that this is indeed a significant factor for the GSK MMP-12 library.
– 1704 compounds measured, mean logP = 3.56 (1.44)
– 566 compounds not made, mean logP = 2.83 (1.52)
– Student’s t-test for a difference between the two distributions: p < 2×10⁻²².
1. Nadin, Hattotuwagama and Churcher, “Lead-Oriented Synthesis: A New Opportunity for Synthetic Chemistry”, Angew. Chem. Int. Ed. 51:1114, 2012.
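The t-test can be reproduced from the summary statistics alone. The sketch below assumes the values in parentheses are standard deviations (the slide does not label them) and computes both the classic pooled-variance Student's t and Welch's unequal-variance t. It uses only the standard library, so it stops at the t statistic and degrees of freedom rather than computing a p-value.

```python
import math


def students_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample Student's t with pooled variance; returns (t, degrees of freedom)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, n1 + n2 - 2


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t (unequal variances) with Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Made (assayed) compounds vs. compounds that were not made, as listed on the slide.
t_pooled, df_pooled = students_t(3.56, 1.44, 1704, 2.83, 1.52, 566)
t_welch, df_welch = welch_t(3.56, 1.44, 1704, 2.83, 1.52, 566)
```

With these figures the pooled version gives t ≈ 10.3 on 2,268 degrees of freedom and Welch's gives t ≈ 10.0 on roughly 925, both far into the tail of the distribution. If SciPy is available, `scipy.stats.ttest_ind_from_stats` computes the same statistics together with their p-values.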
  57. Example (actionable) insight
There is a clear trend between Suzuki coupling success rate and the predicted octanol-water partition coefficient (cLogP) of the reaction product.
Data: Pickett et al., ACS Med. Chem. Lett. 2(1):28, 2011

| Reaction product predicted cLogP | Successful reactions | Failed reactions |
|---|---|---|
| <1.0 | 97 | 55 |
| 1.0-2.0 | 232 | 119 |
| 2.0-3.0 | 340 | 127 |
| 3.0-4.0 | 525 | 141 |
| 4.0-5.0 | 474 | 80 |
| 5.0-6.0 | 202 | 36 |
| >6.0 | 63 | 8 |
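The trend is easier to see as a success rate per cLogP bin. The short sketch below computes it directly from the counts on the slide; the failed counts sum to 566, matching the number of compounds not made in the GSK library. The variable and function names are illustrative only.

```python
# Counts transcribed from the slide (Pickett et al. 2011 data), binned by product cLogP.
BINS = ["<1.0", "1.0-2.0", "2.0-3.0", "3.0-4.0", "4.0-5.0", "5.0-6.0", ">6.0"]
SUCCEEDED = [97, 232, 340, 525, 474, 202, 63]
FAILED = [55, 119, 127, 141, 80, 36, 8]


def success_rates(succeeded, failed):
    """Fraction of attempted reactions in each bin that succeeded."""
    return [s / (s + f) for s, f in zip(succeeded, failed)]


for label, rate in zip(BINS, success_rates(SUCCEEDED, FAILED)):
    print(f"{label:>8}: {rate:.1%}")
```

The rate rises from about 63.8% below cLogP 1.0 to about 88.7% above 6.0, with only a slight dip between the 4.0-5.0 bin (85.6%) and the 5.0-6.0 bin (84.9%).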
  58. Big data mining confirmation of the Nadin-Churcher hypothesis
Based on 16,335 Suzuki coupling reactions extracted from US patent applications between 2001 and 2012.
Nadin, Hattotuwagama and Churcher, “Lead-Oriented Synthesis: A New Opportunity for Synthetic Chemistry”, Angew. Chem. Int. Ed. 51:1114, 2012.

| LogP | Mean yield | N obs |
|---|---|---|
| < 1.0 | 52.89% | 196 |
| 1.0 – 2.0 | 56.02% | 1155 |
| 2.0 – 3.0 | 56.72% | 2881 |
| 3.0 – 4.0 | 58.14% | 4071 |
| 4.0 – 5.0 | 57.26% | 3186 |
| 5.0 – 6.0 | 59.25% | 2126 |
| > 6.0 | 63.83% | 2720 |
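A table like this comes from binning each extracted reaction by its product logP and averaging the reported yields within each bin. The sketch below assumes a hypothetical input of `(product_logp, yield_percent)` pairs; the slide does not say which logP calculator was used or how values falling exactly on a bin edge were assigned, so the `bisect_right` convention here (an edge value goes into the higher bin) is an assumption.

```python
import bisect
from collections import defaultdict

# Bin edges and labels mirroring the slide's table.
EDGES = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
LABELS = ["< 1.0", "1.0-2.0", "2.0-3.0", "3.0-4.0", "4.0-5.0", "5.0-6.0", "> 6.0"]


def logp_bin(logp):
    """Map a logP value to its bin label; an exact edge value (e.g. 2.0) goes to the higher bin."""
    return LABELS[bisect.bisect_right(EDGES, logp)]


def mean_yield_by_logp(reactions):
    """reactions: iterable of (product_logp, yield_percent) pairs (hypothetical format).

    Returns {bin label: (mean yield, number of reactions)} for every non-empty bin.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for logp, pct in reactions:
        acc = totals[logp_bin(logp)]
        acc[0] += pct
        acc[1] += 1
    return {label: (s / n, n) for label, (s, n) in totals.items()}
```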
  59. “Big data” reaction yield analysis
AZ data courtesy of Nick Tomkinson, AstraZeneca RDI, Alderley Park, UK.
  60. “Big data” reaction yield analysis
AZ data courtesy of Nick Tomkinson, AstraZeneca RDI, Alderley Park, UK.
  61. Functional group changes analyzed
Do the results make sense at all?

| Functional group | Avg. delta in reaction | Overall avg. delta |
|---|---|---|
| halogen | -0.98 | -0.3 |
| alcohol | -0.95 | -0.12 |
| halogen_notfluorine | -0.89 | -0.27 |
| alcohol_aromatic | -0.67 | -0.04 |
| halogen_aliphatic | -0.62 | -0.15 |
| halogen_notfluorine_aliphatic | -0.62 | -0.14 |
| carboxylicacid | -0.5 | -0.23 |
| halogen_bromine | -0.42 | -0.11 |
| halogen_bromine_aliphatic | -0.39 | -0.06 |
| halogen_aromatic | -0.36 | -0.16 |
| alcohol_aliphatic | -0.28 | -0.08 |
| halogen_notfluorine_aromatic | -0.27 | -0.13 |
| amine | -0.04 | -0.3 |
| amine_aliphatic | -0.04 | -0.27 |
| carboxylicacid_aliphatic | -0.04 | -0.08 |
| halogen_bromine_aromatic | -0.03 | -0.05 |
| amine_tertiary | -0.02 | -0.06 |
| amine_tertiary_aliphatic | -0.02 | -0.08 |
| carboxylicacid_aromatic | -0.02 | -0.03 |
| amine_cyclic | -0.01 | -0.02 |
| halogen_bromine_bromoketone | -0.01 | 0 |
| acidchloride | 0 | -0.07 |
| acidchloride_aliphatic | 0 | -0.05 |
| acidchloride_aromatic | 0 | -0.02 |
| aldehyde | 0 | -0.04 |
| aldehyde_aliphatic | 0 | -0.01 |
| aldehyde_aromatic | 0 | -0.03 |
| amine_aromatic | 0 | -0.03 |
| amine_primary | 0 | -0.15 |
| amine_primary_aliphatic | 0 | -0.07 |
| amine_primary_aromatic | 0 | -0.07 |
| amine_secondary | 0 | -0.04 |
| amine_secondary_aliphatic | 0 | -0.07 |
| amine_secondary_aromatic | 0 | 0.03 |
| amine_tertiary_aromatic | 0 | 0 |
| azide | 0 | 0 |
| azide_aliphatic | 0 | 0 |
| azide_aromatic | 0 | 0 |
| boronicacid | 0 | -0.03 |
| boronicacid_aliphatic | 0 | 0 |
| boronicacid_aromatic | 0 | -0.03 |
| carboxylicacid_alphaamino | 0 | 0 |
| isocyanate | 0 | -0.01 |
| isocyanate_aliphatic | 0 | 0 |
| isocyanate_aromatic | 0 | 0 |
| nitro | 0 | -0.03 |
| nitro_aliphatic | 0 | 0 |
| nitro_aromatic | 0 | -0.03 |
| sulfonylchloride | 0 | -0.02 |
| sulfonylchloride_aliphatic | 0 | -0.01 |

Compare the average deltas for the >39K instances of Williamson ether synthesis with the overall averages. These look sensible.
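The deltas can be read as "functional-group count in the product minus count in the reactants", averaged over every reaction of a given type. That interpretation is an assumption, as is the input format below: each reaction is a hypothetical pair of pre-computed group-count dicts (in practice these would come from substructure matching, which is not shown). The two invented Williamson-type examples illustrate why alcohol and halogen both average close to -1: each reaction consumes one of each.

```python
from collections import Counter


def fg_delta(reactant_counts, product_counts):
    """Per-reaction change in functional-group counts: product minus reactants."""
    delta = Counter(product_counts)
    delta.subtract(reactant_counts)  # subtract() keeps negative values
    return delta


def average_deltas(reactions):
    """reactions: list of (reactant_counts, product_counts) dicts (hypothetical format).

    Groups that never change are simply absent, i.e. their average delta is 0.
    """
    totals = Counter()
    for reactants, product in reactions:
        totals.update(fg_delta(reactants, product))  # update() adds, keeping negatives
    return {group: total / len(reactions) for group, total in totals.items()}


# Two invented Williamson-type examples: a phenol + alkyl bromide, and a diol where one OH survives.
r1 = ({"alcohol": 1, "alcohol_aromatic": 1, "halogen": 1, "halogen_bromine": 1}, {})
r2 = ({"alcohol": 2, "alcohol_aliphatic": 2, "halogen": 1}, {"alcohol": 1, "alcohol_aliphatic": 1})
print(average_deltas([r1, r2]))
```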
