Regioselectivity: An Application of Expert
Systems and Ontologies to Chemical
(Named) Reaction Analysis
CINF Reaction Analytics. 256th ACS National Meeting, Boston, MA. Thursday 23rd August 2018
Roger Sayle, John Mayfield and Noel O’Boyle
NextMove Software, Cambridge, UK
analysis vs. prediction
• In “Applied Chemoinformatics” (2018), J. Goodman
defines three main problems.
– Reaction Planning: R → ? → P [Database]
– Reaction Prediction: R1 + R2 → ? [Simulation]
– Synthesis Planning: R? + R? + R? → P [Design]
• A corollary is that there’s a distinction between
reactions that have already been observed and
those experiments yet to be performed.
intuition and counter-intuition
With apologies to Mark Twain:
“Never let the facts get in the
way of a good synthesis plan”.
There are reactions that chemists
expect will happen but don’t &
those they don’t expect but do.
But cheminformaticians are
alchemists that can turn lead into
gold, as easily as “[Pb]>>[Au]”.
Experimental validation
• Synthesis of a novel aromatic
heterocycle previously unreported in
the literature.
• William Pitt et al., “Heteroaromatic
Rings of the Future”, Journal of
Medicinal Chemistry, 52(9):2952-2963, 2009.
expectations: setting the scope
Goodman challenges: Maitotoxin, Eribulin
Carey et al. challenges: Difficult-to-access substituted aromatic starting materials (Org. Biomol. Chem. 2006, 4, 2337-2347).
Which reactions are important?
• NCI/LHASA SAVI (14+6 reactions)
– Suzuki Coupling 36262 out of 1.2M USPTO examples
– Sulfonamide Schotten-Baumann 14348 out of 1.2M USPTO examples
– Buchwald-Hartwig 6040 out of 1.2M USPTO examples
– Hiyama coupling 458 out of 1.2M USPTO examples
– Fukuyama coupling 2 out of 1.2M USPTO examples
– Liebeskind-Srogl coupling 0 out of 1.2M USPTO examples
• Hartenfeller/Schneider (58 reactions)
– #1 Pictet-Spengler reaction 7 out of 1.2M USPTO examples
– #10+ #11 Azide-nitrile Huisgen-cycloaddition 5 out of 1.2M USPTO examples
– #17 Pyridone synthesis 2 out of 1.2M USPTO examples
– #20 Phthalazinone synthesis 16 out of 1.2M USPTO examples
– #24 Friedlander quinoline synthesis 30 out of 1.2M USPTO examples
• Enamine REAL 2016 (43 reactions)
– Thiourea to guanidine (14 out of 160M examples).
myth #1: heterocycle formation
• Because almost all drug-like molecules contain a
heterocycle, there’s a belief that heterocycle
formation is important.
• Analysis of both patent data and pharmaceutical
ELNs reveals that heterocycle forming reactions,
even named heterocycle forming reactions, are
relatively rare, with ring systems often being
purchased as building blocks*.
* This analysis might not apply to process development and manufacturing.
https://nextmovesoftware.com/blog/2016/10/24/buying-a-ring-or-making-one-yourself/
Reactions that don’t happen!
• Paal-Knorr Pyrrole Synthesis vs. Aldehydes/Ketones
Of the 430 examples of
Paal-Knorr pyrrole
synthesis reported in
US patent applications
2001-2012, exactly
zero have more than
the two reacting
ketones/aldehydes.
Who let the dogs out?
• Big Data can determine the utility and scope of reactions.
Reactions that do happen!
• Chloro Sonogashira Couplings (@ NCI)
– The LHASA ‘CHMTRN’ rules for Transform 2267, Sonogashira couplings
(presented by Marc Nicklaus at Sheffield), state “Iodides are usually
more reactive than bromide. Chlorides do not react”.
Code Name Count Yield
3.3.2 Bromo Sonogashira coupling 3717 49.3%
3.3.3 Chloro Sonogashira coupling 429 44.2%
3.3.4 Iodo Sonogashira coupling 2721 64.9%
• Isotopically-labelled Compounds (@Eli Lilly)
– The LARR reactions used to construct Lilly’s PLC (Nicolaou et al. 2016)
forbid the presence of isotopic labels in reactants.
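The Sonogashira table above can be turned into shares with a few lines of arithmetic. This is a minimal sketch that only reuses the counts and mean yields printed on the slide; the dictionary layout and the `share` helper are illustrative names, not part of any NextMove tool.

```python
# Counts and mean yields (%) copied from the Sonogashira table above.
sonogashira = {
    "bromo": (3717, 49.3),
    "chloro": (429, 44.2),
    "iodo": (2721, 64.9),
}

total = sum(count for count, _ in sonogashira.values())


def share(halide: str) -> float:
    """Fraction of the observed Sonogashira couplings that use this halide."""
    return sonogashira[halide][0] / total


for halide, (count, mean_yield) in sonogashira.items():
    print(f"{halide:7s} n={count:5d} share={share(halide):6.1%} mean yield={mean_yield:.1f}%")
```

Chlorides are the minority, but they are clearly observed, which is the point of the slide.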
improving upon synthia™
• As presented earlier, Chematica/Synthia maintains a
manually curated “black list” of interfering functional
groups.
• A more labor-efficient data-driven strategy is to
automatically maintain a “white list” of tolerated
functional groups that can be derived from observed
experiments.
• It’s much easier to track the things you know about
and can see, than the things you can’t see and/or
don’t know about.
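One way to picture the "white list" idea is a counter over functional groups that were seen to survive a given reaction class. The sketch below is an assumption about how such a list could be derived, not a description of NextMove's implementation: the input format, the `tolerated_groups` function and the `min_support` threshold are all hypothetical, and the observations are made up.

```python
from collections import Counter


def tolerated_groups(observations, min_support=2):
    """Return, per reaction class, the groups seen unchanged in at least
    `min_support` observed reactions.

    `observations` is an iterable of (reaction_class, spectator_groups) pairs,
    where spectator_groups are groups present in both reactant and product.
    """
    support = Counter()
    for reaction_class, spectators in observations:
        for group in set(spectators):  # count each group once per reaction
            support[(reaction_class, group)] += 1
    whitelist = {}
    for (reaction_class, group), n in support.items():
        if n >= min_support:
            whitelist.setdefault(reaction_class, set()).add(group)
    return whitelist


# Hypothetical observations, for illustration only.
observations = [
    ("Suzuki coupling", ["ester", "nitrile"]),
    ("Suzuki coupling", ["ester"]),
    ("Kumada coupling", ["ether"]),
    ("Kumada coupling", ["ether", "aldehyde"]),  # a single, possibly noisy, example
]
whitelist = tolerated_groups(observations)
```

Requiring more than one supporting example keeps a single mis-extracted or mis-classified reaction from adding a group to the list.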
handling noisy data (n>1 statistics)
• Kumada couplings incompatible with aldehydes...
• US20050107257A1 [p0196] Syngenta
A solution of 34.3 ml of ethylmagnesium chloride in 50 ml of tetrahydrofuran is added dropwise at −70° C. to
7.3 g of indium trichloride in 200 ml of tetrahydrofuran and, after stirring for 30 minutes, the reaction
temperature is allowed to rise slowly to room temperature. That solution is added to a solution of 14.9 g of
3-bromo-4-pyridine carbaldehyde and 2.8 g of PdCl2(PPh3)2 in 240 ml of tetrahydrofuran and the reaction
mixture is heated under reflux for 20 hours. 5 ml of methanol are then added and the mixture is concentrated
in vacuo, stirred thoroughly with diethyl ether, filtered off and concentrated in vacuo once more. The residue is
chromatographed on silica gel using ethyl acetate/hexane (1:1), yielding 3-ethyl-4-pyridine carbaldehyde (I) in
the form of a yellow oil.
c.f. US20140296184A1 [C00008]
M. Segler, M. Preuss and M. Waller, “Planning Chemical Syntheses with Deep Neural
Networks and Symbolic AI”, Nature, 555:604-610, 2018.
A. Gini, M. Segler et al., “Dehydrogenative TEMPO-mediated formation of Unstable
Nitrones: Easy Access to N-Carbamoyl Isoxazolines”, Chem. Eur. J. 21:12053-12060,
2015.
prediction? about the future?
search: monte carlo vs. proof-number search
• Appropriate choice of AI search technique.
Monte Carlo search is inappropriate
for search problems such as mazes.
White to win in 173 ply
Kf1-f2!
[Figure: AND/OR search tree; OR and AND nodes with binary (0/1) leaf values]
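For readers who have not met proof-number search, the bookkeeping on an AND/OR tree can be sketched as follows. This is the textbook rule for proof and disproof numbers, not code from the talk; the `Node` class and the example tree are hypothetical. In a synthesis-planning setting one would usually read OR nodes as molecules (any one route suffices) and AND nodes as reactions (every precursor is needed), which is an assumption here.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str = "OR"               # "OR" or "AND"; ignored for leaves
    value: Optional[bool] = None   # leaves only: True proven, False disproven, None unknown
    children: List["Node"] = field(default_factory=list)


def proof_numbers(node):
    """Return (proof number, disproof number) for a node of an AND/OR tree."""
    if not node.children:
        if node.value is True:
            return 0, math.inf
        if node.value is False:
            return math.inf, 0
        return 1, 1
    pairs = [proof_numbers(child) for child in node.children]
    proofs = [p for p, _ in pairs]
    disproofs = [d for _, d in pairs]
    if node.kind == "OR":
        # One proven child proves an OR node; all children must fail to disprove it.
        return min(proofs), sum(disproofs)
    # Every child must be proven for an AND node; one failure disproves it.
    return sum(proofs), min(disproofs)


# Hypothetical tree: the second branch is already disproven by one of its leaves.
root = Node("OR", children=[
    Node("AND", children=[Node(value=True), Node(value=None)]),
    Node("AND", children=[Node(value=None), Node(value=None), Node(value=False)]),
])
```

The search then expands the unknown leaf that most cheaply changes the root's numbers, rather than sampling branches at random.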
categorization of reactions
1. J. Carey, D. Laffan, C. Thomson, M. Williams, Org. Biomol. Chem. 4:2337-2347, 2006.
2. S. Roughley and A. Jordan, J. Med. Chem. 54:3451-3479, 2011.
[Pie chart: share of reactions by category]
Slice values (order as extracted): 34%, 17%, 5%, 2%, 3%, 6%, 10%, 1%, 15%, 2%, 5%
Legend: Heteroatom alkylation and arylation; Acylation and related processes; C-C bond formations; Heterocycle formation; Protections; Deprotections; Reductions; Oxidations; Functional group conversion; Functional group addition; Resolution
reaction ontology
• Reactions are classified into a common subset of the
Carey et al. classes and the RSC’s RXNO ontology.
• There are 12 super-classes
– e.g. 3 C-C bond formation (RXNO:0000002).
• These contain 84 class/categories.
– e.g. 3.5 Pd-catalyzed C-C bond formation (RXNO:0000316)
• These contain ~1050 named reactions/types.
– e.g. 3.5.3 Negishi coupling (RXNO:0000088)
• These require ~2200 SMIRKS-like transformations.
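The dotted codes used above (3, 3.5, 3.5.3) encode the three levels of the hierarchy directly. A minimal sketch of walking a code up to its super-class follows; the function names and the tiny lookup table are illustrative assumptions, populated only with the three examples given on this slide.

```python
def classify_code(code: str) -> dict:
    """Split a dotted reaction code such as '3.5.3' into its hierarchy levels."""
    parts = code.split(".")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected reaction code: {code!r}")
    levels = ["superclass", "class", "named_reaction"]
    return {levels[i]: ".".join(parts[: i + 1]) for i in range(len(parts))}


# Labels and RXNO identifiers copied from the examples above.
ONTOLOGY = {
    "3": ("C-C bond formation", "RXNO:0000002"),
    "3.5": ("Pd-catalyzed C-C bond formation", "RXNO:0000316"),
    "3.5.3": ("Negishi coupling", "RXNO:0000088"),
}


def lineage(code: str) -> list:
    """Return the known (label, RXNO id) entries from super-class down to `code`."""
    return [ONTOLOGY[c] for c in classify_code(code).values() if c in ONTOLOGY]
```

For example, `lineage("3.5.3")` walks from C-C bond formation through Pd-catalyzed C-C bond formation to Negishi coupling.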
Complication: agents matter
From http://en.wikipedia.org/wiki/Diazonium_compound
Sandmeyer reaction
Benzenediazonium chloride heated with cuprous chloride
dissolved in HCl to yield chlorobenzene.
C6H5N2+ + CuCl → C6H5Cl + N2 + Cu+
Gattermann reaction
Benzenediazonium chloride is warmed with copper powder and
HCl to yield chlorobenzene.
C6H5N2+ + CuCl → C6H5Cl + N2 + Cu+
10 most popular reactions
ID Name Count
2.1.2 Carboxylic acid + amine 26,040
1.3.1 Buchwald-Hartwig amination 22,048
3.1 Suzuki coupling 16,508
1.7.6 Williamson ether synthesis 15,665
2.1.1 Amide Schotten-Baumann 11,016
7.1 Nitro to amino 10,234
6.1.1 N-Boc deprotection 9,821
6.2.2 CO2H-Me deprotection 9,487
6.2.1 CO2H-Et deprotection 6,749
2.2.3 Sulfonamide Schotten-Baumann 6,223
Most/least successful reactions
ID Name Mean Yield Count
1.7.2 Diazomethane esterification 91% 41
9.3.1 Carboxylic acid to acid chloride 88% 704
9.7.14 Bromo to azido 85% 235
1.7.5 Methyl esterification 84% 2918
9.7.19 Bromo to iodo Finkelstein reaction 82% 116
6.1.3 N-Cbz deprotection 81% 1359
…
4.1.11 Larock indole synthesis 47% 55
3.11.3 Ullmann-type biaryl coupling 44% 407
1.7.1 Chan-Lam ether coupling 44% 154
4.1.4 Pinner pyrimidine synthesis 39% 47
rare named reactions
• Adams decarboxylation
• Angeli-Rimini reaction
• Aza-Baylis-Hillman reaction
• Boyer reaction
• Buchwald-Fischer indole synthesis
• Castro-Stephens coupling
• Chugaev elimination
• Cook-Heilbron thiazole synthesis
• Fischer-Hepp rearrangement
• Fukuyama indole synthesis
• Gassman indole synthesis
• Imine Hosomi-Sakurai reaction
• Koch reaction
• Leuckart reaction
• Liebeskind-Srogl coupling
• Lossen rearrangement
• Ponzio reaction
• Prins reaction
• Reimer-Tiemann carboxylation
Trends in Reaction Types
[Figure: Suzuki couplings as a percentage of reactions in a year, 1976-2013 (y-axis 0.0% to 8.0%)]
NameRxn reaction naming
• Many reaction classification algorithms are
dependent upon atom-atom mapping assignments.
• Alas, MCS-based atom mapping algorithms are often
slow and/or inaccurate [Lowe & Sayle 2012 & 2013].
• NameRxn is a mechanism-based atom-mapper.
• All reactants and reagents are placed in a single pot
(molecule) and sets of SMIRKS applied in turn.
• If the desired product is generated, the reaction (its
mechanism and mapping) is identified.
– Rationalization is easier than prediction!
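The control flow described above (one pot, rules tried in turn, stop when the recorded product appears) can be sketched without any chemistry toolkit. Everything below is a toy stand-in: real rules are SMIRKS applied to molecular graphs, whereas here each "transform" is a plain Python function over molecule labels, and all names are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple

Pot = FrozenSet[str]
Transform = Callable[[Pot], Iterable[str]]


def name_reaction(reactants, reagents, product,
                  rules: List[Tuple[str, Transform]]) -> Optional[str]:
    """Return the name of the first rule that generates `product` from the pot."""
    pot = frozenset(reactants) | frozenset(reagents)  # everything goes into a single pot
    for name, transform in rules:                     # rules are tried in turn
        if product in set(transform(pot)):
            return name
    return None


# Toy stand-ins for SMIRKS transforms.
def toy_esterification(pot):
    if {"RCO2H", "R'OH"} <= pot:
        yield "RC(=O)OR'"


def toy_amide_coupling(pot):
    if {"RCO2H", "R'NH2"} <= pot:
        yield "RC(=O)NHR'"


rules = [
    ("Esterification", toy_esterification),
    ("Carboxylic acid + amine", toy_amide_coupling),
]
```

Because the product is already known, a rule either reproduces it or it does not; no ranking of alternative outcomes is needed, which is why rationalization is the easier problem.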
Example smarts/smirks
# NOZAKI_HIYAMA_KISHI_REACTION
[#6v4+0;X4,X3:1][BrD1h0+0:2].[Ni].[Cr].[OD1h0+0:3]=[CD2h1v4+0:4]>>[#6:1][C:4]-[Oh1:3]
# PAAL_KNORR_THIOPHENE_SYNTHESIS
[OD1h0+0:1]=[CX3v4+0:2][CX4v4+0:3]([H])[CX4v4+0:4]([H])[CX3v4+0:5]=[OD1h0+0:6]>>[S:1]1[C:2]=[C:3][C:4]=[C:5]1
• Writing SMIRKS is both an art and a science.
Smarts pattern compilation
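// Generated matcher for one SMARTS atom: a neutral, four-valent ring carbon
// with three neighbours and no hydrogens. `ringinfo` is assumed to be the
// molecule's RDKit::RingInfo pointer, set up elsewhere in the generated code.
bool atom_1(const RDKit::Atom *aptr) {
  return aptr->getAtomicNum() == 6 &&
         ringinfo->numAtomRings(aptr->getIdx()) != 0 &&
         aptr->getDegree() == 3 &&
         aptr->getTotalNumHs(false) == 0 &&
         aptr->getExplicitValence() + aptr->getImplicitValence() == 4 &&
         aptr->getFormalCharge() == 0;
}

// Generated matcher for a SMARTS atom with alternatives: a neutral, four-valent
// carbon that is two-connected with one or two hydrogens, or three-connected.
bool atom_28(const RDKit::Atom *aptr) {
  // Reject early on the conditions shared by every alternative.
  if (aptr->getAtomicNum() != 6 ||
      aptr->getExplicitValence() + aptr->getImplicitValence() != 4 ||
      aptr->getFormalCharge() != 0)
    return false;
  return (aptr->getDegree() == 2 && aptr->getTotalNumHs(false) == 1) ||
         (aptr->getDegree() == 2 && aptr->getTotalNumHs(false) == 2) ||
         aptr->getDegree() == 3;
}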
Smarts pattern compilation
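// One bond-iterator pair per query bond; the case_N labels are states of the
// generated backtracking matcher.
RDKit::ROMol::OBOND_ITER_PAIR biter[25];
…
case_7:  // start enumerating the bonds of the atom matched as atom[1]
  biter[0] = mol->getAtomBonds(atom[1]);
  goto case_9;
case_8:  // backtrack: release atom[0] and advance to the next bond
  avisit[atom[0]->getIdx()] = 0;
  ++biter[0].first;
case_9:  // try to extend the match across the current bond
  if (biter[0].first != biter[0].second) {
    bptr = (*mol)[*biter[0].first].get();
    if (bond_1(bptr)) {
      aptr = bptr->getOtherAtom(atom[1]);
      aidx = aptr->getIdx();
      if (avisit[aidx] == 0 && atom_1(aptr)) {
        avisit[aidx] = 1;  // mark the neighbour as used and descend
        atom[0] = aptr;
        goto case_10;
      } else ++biter[0].first;
    } else ++biter[0].first;
  } else goto case_5;  // bonds exhausted: backtrack to an earlier state
  goto case_9;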
Recent improvements and insights
• Profiling NameRxn in 2014 to diagnose performance
problems with RDKit revealed that pattern matching
and transformation accounted for <1% of runtime.
• The bottleneck is actually in canonicalization.
• The Ah-ha experience was to use hash filtering.
• Check molecular formula: C_i H_j Br_k Cl_l F_m I_n N_o O_p P_q S_r
• Additional cleverness allows pre-sanitization hashing.
– Triple bond count, but not single or double bond count.
– Perhaps there’s something in InChI-style hashing after all.
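The hash-filtering idea can be sketched as a cheap key that is compared before any canonicalization is attempted. This is an illustration under stated assumptions, not NameRxn's code: molecules are reduced to a list of element symbols plus a triple-bond count, and the function names are hypothetical.

```python
from collections import Counter


def formula_hash(atoms, triple_bonds=0):
    """Order-independent key: element counts plus the number of triple bonds."""
    counts = Counter(atoms)
    return tuple(sorted(counts.items())), triple_bonds


def needs_canonical_check(candidate, target):
    """True only if the cheap hashes agree, so canonicalization is worth running."""
    return formula_hash(*candidate) == formula_hash(*target)


# Toy inputs: (list of element symbols, triple-bond count).
target = (["C", "C", "O"], 0)
same_formula = (["O", "C", "C"], 0)
other_formula = (["C", "C", "N"], 0)
extra_triple_bond = (["C", "C", "O"], 1)
```

Only candidates that pass the cheap comparison go on to the expensive canonical comparison, so most generated products are rejected without being sanitized at all.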
Pistachio: Siri for chemists
Multi-step Synthetic Routes
[Bar chart: occurrences by number of steps]
Number of steps   Intermediates   Terminal Products
1                 197702          385149
2                 103114          149445
3                 56611           81837
4                 31403           47579
5                 17268           27670
6                 9230            16619
7                 5057            9320
8                 2701            5263
9                 1256            2511
10                639             1330
11                301             678
12                136             373
13                58              111
14                15              63
15                5               8
16                2               6
17                                5
Application to planning 1
• Cinnamic Acid (PhCH=CHCO2H)
1. Bromo Heck reaction (272)
2. Horner-Wadsworth-Emmons reaction (268)
3. Wittig olefination (129)
4. Bromo Heck-type reaction (62)
5. Iodo Heck reaction (49)
6. Triflyloxy Heck[-type] reaction (43)
7. Ester Schotten-Baumann (10)
8. Bromo Suzuki coupling (5)
9. Stille reaction (2)
10. Olefin metathesis (1)
Application to planning 2
• p-Nitrobenzoic acid
1. Nitrile to carboxy (12)
2. CO2H-Me deprot (8)
3. CO2H-Et deprot (5)
4. Ester hydrolysis (1)
5. Nitration (1)
• p-Nitrotoluene
1. Nitration (96)
2. Bromo Suzuki-type (1)
3. Chloro Suzuki (1)
tactical vs. strategic reactions
• Traditional Synthesis
Planning has concentrated
on the Strategic Application
of Named Reactions.
• However, there’s much to
be gained for the Tactical
Application of Unnamed
Reactions.
• Nitro reduction is the 6th
most frequent reaction.
functional group interconversion
• Relative frequency of simple group conversion
From a total of 7,252,419 reactions from USPTO & EPO patents.
Amino Bromo Chloro Fluoro Hydroxy Iodo Sulfanyl Thioxo
Amino 3951 2949 990 3976 3657 143
Bromo 3121 589 637 2390 738 435
Chloro 9424 717 1606 2156 744 798
Fluoro 1549 180 484 826 28 103
Hydroxy 2572 11441 31593 7641 3004 348
Iodo 155 445 47
Nitro 126606 138
Oxo 8419
nextmove’s strategy
• Why expose reaction planning to the end-user at all?
• During our collaboration with ChemSpace, it has become clear
that what was required was neither similarity nor superstructure
search but a form of synthesis-aware search.
• This is similar to the challenge faced by traditional
retrosynthesis tools in identifying a leaf/goal state.
• 937M purchasable compounds makes this non-trivial.
• The usual challenges of functional group, tautomer and
protonation state lookups now also include protecting groups.
• But why not return the carboxylic acid when searching for the
acid chloride, or the alcohol when searching for the bromo derivative?
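A synthesis-aware lookup of this kind can be pictured as an exact search followed by a search for known precursor groups. The sketch below is a toy under loud assumptions: the `PRECURSORS` map holds only the two substitutions mentioned above, the catalog is made up, and compounds are reduced to (scaffold, functional group) pairs rather than real structures.

```python
# Hypothetical precursor map: target functional group -> groups that can stand in for it.
PRECURSORS = {
    "acid chloride": ["carboxylic acid"],
    "bromo": ["hydroxy"],
}


def synthesis_aware_search(scaffold, group, catalog):
    """Return catalog hits for the exact group first, then for known precursor groups."""
    hits = []
    for candidate in [group] + PRECURSORS.get(group, []):
        if (scaffold, candidate) in catalog:
            hits.append((scaffold, candidate))
    return hits


# Toy catalog of purchasable (scaffold, functional group) pairs.
catalog = {
    ("benzoyl", "carboxylic acid"),
    ("benzyl", "hydroxy"),
    ("benzyl", "bromo"),
}
```

The end-user only sees purchasable hits; the one-step interconversion that connects a hit to the query stays behind the search interface.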
building block search
• Query:
Results:
the challenge of regioselectivity
• A tricky benchmark is reactions of 2,4,5-trichloropyrimidine
• The nature of pyrimidine makes the chloro at the 4-position more reactive
than the 2 position which is more reactive than the 5 position.
• Simple quantum mechanical methods have difficulty discerning this order.
handy’s rule
• Scott Handy and Yanan Zhang, “A Simple Guide for Predicting
Regioselectivity in the Coupling of Polyhaloheteroaromatics”,
Chemical Communications, 3:299-301, Nov 2005.
Abstract
A simple guide for predicting the order and site of coupling (Suzuki, Stille, Negishi,
Sonogashira, etc.) in polyhaloheteroaromatics based upon the 1H NMR chemical shift
values of the parent non-halogenated heteroaromatics has been developed.
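Read as a ranking rule, the abstract above amounts to sorting ring positions by the 1H shift of the corresponding proton in the parent heteroaromatic. The sketch assumes the usual summary of the rule, namely that the most deshielded position couples first; that direction is an assumption here, and the shift values are placeholders for a made-up ring, not measured data.

```python
def handy_order(parent_shifts):
    """Rank ring positions by the parent's 1H shift, most deshielded first."""
    return sorted(parent_shifts, key=parent_shifts.get, reverse=True)


# Placeholder shifts in ppm for a made-up heteroaromatic; not measured values.
example_shifts = {"C2": 8.6, "C3": 7.3, "C4": 7.7}
predicted = handy_order(example_shifts)
```

The appeal of the rule is that its only input is an NMR spectrum of the unhalogenated parent, which is usually already in the literature.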
electrophilic substitution of
heterocycles with qm methods
• J.C. Kromann et al., “Fast and Accurate Prediction of the
Regioselectivity of Electrophilic Aromatic Substitution
Reactions”, Chem. Sci., 9(3):660-665, Nov 2017.
• M. Kruszyk et al., “Computational Methods to Predict the
Regioselectivity of Electrophilic Aromatic Substitution
Reactions of Heteroaromatic Systems”, J. Org. Chem.
81(12):5128-5134, Jun 2016.
a data-driven strategy
• One possible approach to the challenge of regioselectivity is
to derive preferences by large-scale (statistical) analysis of
reaction data sets.
• Reaction classification can identify the subset of relevant
examples, which can then be used to produce tables of
heterocycle position preferences.
• Directing groups and their influence can also be identified and
tabulated.
• See https://www.scripps.edu/baran/images/grpmtgpdf/Gutekunst_Apr_10.pdf
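The tabulation step described above is essentially a grouped count. A minimal sketch, assuming each classified example has already been reduced to a (reaction class, ring system, reacting position) triple; the function names, the "SNAr" label and the example counts are hypothetical and chosen only to mirror the 4 > 2 > 5 pyrimidine order mentioned earlier.

```python
from collections import Counter, defaultdict


def position_preferences(examples):
    """Tabulate, per (reaction class, ring system), how often each position reacts."""
    table = defaultdict(Counter)
    for reaction_class, ring, position in examples:
        table[(reaction_class, ring)][position] += 1
    return table


def preferred_order(table, reaction_class, ring):
    """Positions sorted from most to least frequently observed."""
    return [pos for pos, _ in table[(reaction_class, ring)].most_common()]


# Hypothetical classified examples; the counts are illustrative only.
examples = (
    [("SNAr", "pyrimidine", 4)] * 5
    + [("SNAr", "pyrimidine", 2)] * 2
    + [("SNAr", "pyrimidine", 5)] * 1
)
table = position_preferences(examples)
```

Directing-group effects could be handled the same way by adding the substituent pattern to the key.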
the semantics of atom mapping
Prof. Goodman poses an interesting question of atom mapping.
4.1.6 Cyclic Beckmann Rearrangement
1.2.9 Alcohol + Amine Condensation or 1.1.3 Iodo N-methylation
Acknowledgements
• NextMove Software
– Noel O’Boyle
– John Mayfield
• NextMove Alumni
– Daniel Lowe
• Thank you for your time.
• Questions?
• Thoughts?
• AbbVie
• AstraZeneca
• Bristol-Myers Squibb
• ChemSpace
• Eli Lilly
• GlaxoSmithKline
• Hoffmann-La Roche
• IBM Research Zurich
• Merck
• Novartis
• Royal Society of Chemistry
• Vernalis
• Vertex Pharmaceuticals
Transforms vs. reactions
Importance of Reaction Mechanism
Example: Ullmann-type Coupling Reactions
SMIRKS: [H][N:1].[Cl][c:2]>>[N:1][c:2]
The SMIRKS transform alone is insufficient
to predict the products and by-products in
this example. A measure of nucleophilicity
is desirable for each atom in a molecule.
Without this, software may be misled into
believing that protecting groups are required.
Analysis of pharmaceutical elns
• NextMove Software’s HazELNut software is used to
export and analyze ELN content at 6 of the top 10
large pharmaceutical companies.
• In-house analysis of this data, across the industry,
reveals a surprisingly high rate of synthesis failure,
not indicated in the published literature (journal
articles, patent applications or reaction databases).
• Understanding the causes of these failures is perhaps
more significant than attempting to access new
chemistries.
Extracting mps and reactions
Example reaction mining INPUT
Methyl 4-[(pentafluorophenoxy)sulfonyl]benzoate
To a solution of methyl 4-(chlorosulfonyl)benzoate (606 mg,
2.1 mmol, 1 eq) in DCM (35 ml) was added
pentafluorophenol (412 mg, 2.2 mmol, 1.1 eq) and Et3N
(540 mg, 5.4 mmol, 2.5 eq) and the reaction mixture stirred
at room temperature until all of the starting material was
consumed. The solvent was evaporated in vacuo and the
residue redissolved in ethyl acetate (10 ml), washed with
water (10 ml), saturated sodium hydrogen carbonate (10
ml), dried over sodium sulphate, filtered and evaporated to
yield the title compound as a white solid (690 mg, 1.8
mmol, 85%).
Example reaction mining Output
CHEMICAL reactions for free
bond changes in indole synthesis
Synthesis A B C D
Baeyer-Emmerling M
Bartoli M M
Bischler-Möhlau M C M
Fischer M C M
Fukuyama M
Hemetsberger M
Larock M C M
Madelung M
Nenitzescu M M
Reissert M M
Suzuki coupling leaving groups
Leaving Group Mean Yield N Observations
Bromo 58.80% 10817
Chloro 57.96% 2752
Iodo 57.21% 2049
Triflyloxy 65.48% 717
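As a quick consistency check on the table above, the per-group means can be combined into an observation-weighted mean yield. This sketch uses only the numbers printed on the slide; the variable names are illustrative.

```python
# (mean yield in %, number of observations) copied from the table above.
suzuki = {
    "bromo": (58.80, 10817),
    "chloro": (57.96, 2752),
    "iodo": (57.21, 2049),
    "triflyloxy": (65.48, 717),
}

total_n = sum(n for _, n in suzuki.values())
weighted_mean = sum(y * n for y, n in suzuki.values()) / total_n

print(f"{total_n} reactions, weighted mean yield {weighted_mean:.2f}%")
```

The bromides dominate the total, so the overall mean sits close to their 58.80% despite the higher triflate yield.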
Beyond drug guru
sildenafil (viagra) vardenafil (levitra)
Eli lilly’s automated synthesis lab
Alexander G. Godfrey, Thierry Masquelin and Horst Hemmerle, “A Remote-Control Adaptive MedChem Lab: An Innovative
Approach to Enable Drug Discovery in the 21st Century”, Drug Discovery Today, Vol. 18, Nos. 17-18, September 2013.
Synthesis failures at lilly
• At the 2013 Sheffield Cheminformatics conference,
Christos Nicolaou highlighted the technical challenge
of predicting compounds potentially accessible by
Lilly’s Advanced Synthesis Lab (ASL).
• In a proof-of-concept pilot project, only 25 of 90
compounds suggested by Lilly’s Annotated Reaction
Repository (LARR) rule-set could be successfully
synthesized in practice.
• http://cisrg.shef.ac.uk/shef2013/talks/14.pdf
Synthesis failures at gsk
• Pickett et al. 2011 describe the parallel synthesis of a
50x50 library of MMP-12 inhibitors by an iodo-Suzuki
coupling reaction.
• Only 1704 of 2500 could be assayed [566 not made]
Pickett et al., ACS Med. Chem. Lett. 2(1):28, 2011
Learning from failure (logp shift)
• Nadin et al. 2012 [1] hypothesize that low LogP is a
major cause of synthesis failure in parallel synthesis
of combinatorial libraries.
• Analysis confirms that this is indeed a significant
factor for the GSK MMP-12 library.
– 1704 compounds measured, mean logP = 3.56 (1.44)
– 566 compounds not made, mean logP = 2.83 (1.52)
– Student’s t-test for different distributions, p < 2×10^-22.
1. Nadin, Hattotuwagama and Churcher, “Lead-Oriented Synthesis: A New Opportunity for
Synthetic Chemistry”, Angew. Chem. Int. Ed, 51:1114 2012.
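The significance quoted above can be reproduced approximately from the summary statistics alone. The sketch assumes that the figures in parentheses are standard deviations and that the equal-variance form of Student's t-test was used; with more than 2000 degrees of freedom the p-value is approximated with the normal distribution, which is a simplification rather than the exact t distribution.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic (equal-variance form) from summary statistics."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err


# Measured compounds vs. compounds not made, values from the slide above.
t = pooled_t(3.56, 1.44, 1704, 2.83, 1.52, 566)

# Two-sided p-value under the normal approximation (assumption, see lead-in).
p_two_sided = math.erfc(abs(t) / math.sqrt(2))
```

The statistic comes out a little above 10, which is consistent with the very small p-value reported on the slide.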
Example (actionable) insight
The clear trend between Suzuki coupling success rate and
predicted octanol-water partition co-efficient.
Data: Pickett et al., ACS Med. Chem. Lett. 2(1):28, 2011
[Stacked bar chart: share of successful and failed reactions by predicted cLogP of the reaction product; counts per bin]
Predicted cLogP   Successful Reactions   Failed Reactions
<1.0              97                     55
1.0-2.0           232                    119
2.0-3.0           340                    127
3.0-4.0           525                    141
4.0-5.0           474                    80
5.0-6.0           202                    36
>6.0              63                     8
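The trend in this chart can be made explicit by converting the bar labels into success rates. The sketch assumes that the first seven counts on the slide are the successful reactions and the last seven the failed ones, in bin order; the failed counts do sum to the 566 compounds reported as not made, which supports that reading.

```python
bins = ["<1.0", "1.0-2.0", "2.0-3.0", "3.0-4.0", "4.0-5.0", "5.0-6.0", ">6.0"]
successful = [97, 232, 340, 525, 474, 202, 63]
failed = [55, 119, 127, 141, 80, 36, 8]

# Fraction of attempted couplings that succeeded in each predicted cLogP bin.
success_rate = {
    label: ok / (ok + bad) for label, ok, bad in zip(bins, successful, failed)
}

for label, rate in success_rate.items():
    print(f"cLogP {label:8s} success rate {rate:.1%}")
```

Under that assumption the success rate climbs from roughly two thirds in the lowest bin to close to 90% in the highest.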
big data mining confirmation of
Nadin-Churcher hypothesis
On 16,335 Suzuki coupling reactions extracted from US
patent applications between 2001 and 2012.
Nadin, Hattotuwagama and Churcher, “Lead-Oriented Synthesis: A New Opportunity for Synthetic
Chemistry”, Angew. Chem. Int. Ed, 51:1114 2012.
LogP Mean Yield N Obs
< 1.0 52.89% 196
1.0 – 2.0 56.02% 1155
2.0 – 3.0 56.72% 2881
3.0 – 4.0 58.14% 4071
4.0 – 5.0 57.26% 3186
5.0 – 6.0 59.25% 2126
> 6.0 63.83% 2720
“big data” reaction yield analysis
AZ Data courtesy of Nick Tomkinson, AstraZeneca RDI, Alderley Park, UK.
Functional group changes analyzed
Do the results make sense at all?
Functional Group   Avg in Reaction   Overall Average
halogen -0.98 -0.3
alcohol -0.95 -0.12
halogen_notfluorine -0.89 -0.27
alcohol_aromatic -0.67 -0.04
halogen_aliphatic -0.62 -0.15
halogen_notfluorine_aliphatic -0.62 -0.14
carboxylicacid -0.5 -0.23
halogen_bromine -0.42 -0.11
halogen_bromine_aliphatic -0.39 -0.06
halogen_aromatic -0.36 -0.16
alcohol_aliphatic -0.28 -0.08
halogen_notfluorine_aromatic -0.27 -0.13
amine -0.04 -0.3
amine_aliphatic -0.04 -0.27
carboxylicacid_aliphatic -0.04 -0.08
halogen_bromine_aromatic -0.03 -0.05
amine_tertiary -0.02 -0.06
amine_tertiary_aliphatic -0.02 -0.08
carboxylicacid_aromatic -0.02 -0.03
amine_cyclic -0.01 -0.02
halogen_bromine_bromoketone -0.01 0
acidchloride 0 -0.07
acidchloride_aliphatic 0 -0.05
acidchloride_aromatic 0 -0.02
aldehyde 0 -0.04
aldehyde_aliphatic 0 -0.01
aldehyde_aromatic 0 -0.03
amine_aromatic 0 -0.03
amine_primary 0 -0.15
amine_primary_aliphatic 0 -0.07
amine_primary_aromatic 0 -0.07
amine_secondary 0 -0.04
amine_secondary_aliphatic 0 -0.07
amine_secondary_aromatic 0 0.03
amine_tertiary_aromatic 0 0
azide 0 0
azide_aliphatic 0 0
azide_aromatic 0 0
boronicacid 0 -0.03
boronicacid_aliphatic 0 0
boronicacid_aromatic 0 -0.03
carboxylicacid_alphaamino 0 0
isocyanate 0 -0.01
isocyanate_aliphatic 0 0
isocyanate_aromatic 0 0
nitro 0 -0.03
nitro_aliphatic 0 0
nitro_aromatic 0 -0.03
sulfonylchloride 0 -0.02
sulfonylchloride_aliphatic 0 -0.01
Compare the average deltas for the >39K instances of
Williamson ether synthesis
These look sensible

Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
 
Teoria itc
Teoria itcTeoria itc
Teoria itc
 

Mehr von NextMove Software

Mehr von NextMove Software (20)

DeepSMILES
DeepSMILESDeepSMILES
DeepSMILES
 
Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...Building a bridge between human-readable and machine-readable representations...
Building a bridge between human-readable and machine-readable representations...
 
A de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESA de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILES
 
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionRecent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
 
Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...
 
Comparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsComparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule Implementations
 
Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...
 
Recent improvements to the RDKit
Recent improvements to the RDKitRecent improvements to the RDKit
Recent improvements to the RDKit
 
Digital Chemical Representations
Digital Chemical RepresentationsDigital Chemical Representations
Digital Chemical Representations
 
Challenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsChallenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptions
 
PubChem as a Biologics Database
PubChem as a Biologics DatabasePubChem as a Biologics Database
PubChem as a Biologics Database
 
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
 
CINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction DatabasesCINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction Databases
 
Building on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesBuilding on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfiles
 
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)
 
Challenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeChallenges in Chemical Information Exchange
Challenges in Chemical Information Exchange
 
Automatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsAutomatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patents
 
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
 
RDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical DepictionsRDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical Depictions
 

Kürzlich hochgeladen

Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
Areesha Ahmad
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
NazaninKarimi6
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
Silpa
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
seri bangash
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
 

Kürzlich hochgeladen (20)

Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
Exploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfExploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdf
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 

CINF 170: Regioselectivity: An application of expert systems and ontologies to chemical (named) reaction analysis

  • 1. Regioselectivity: An Application of Expert Systems and Ontologies to Chemical (Named) Reaction Analysis CINF Reaction Analytics. 256th ACS National Meeting, Boston, MA. Thursday 23rd August 2018 Roger Sayle, John Mayfield and Noel O’Boyle NextMove Software, Cambridge, UK
  • 2. analysis vs. prediction • In “Applied Chemoinformatics” (2018), J. Goodman defines three main problems. – Reaction Planning: R → ? → P [Database] – Reaction Prediction: R1 + R2 → ? [Simulation] – Synthesis Planning: R? + R? + R? → P [Design] • A corollary is that there’s a distinction between reactions that have already been observed and those experiments yet to be performed.
  • 3. analysis vs. prediction
  • 4. intuition and counter-intuition With apologies to Mark Twain: “Never let the facts get in the way of a good synthesis plan”. There are reactions that chemists expect will happen but don’t & those they don’t expect but do. But cheminformaticians are alchemists that can turn lead into gold, as easily as “[Pb]>>[Au]”.
  • 5. Experimental validation • Synthesis of a novel aromatic heterocycle previously unreported in the literature. • William Pitt et al., “Heteroaromatic Rings of the Future”, Journal of Medicinal Chemistry, 52(9):2952-2963, 2009.
  • 6. expectations: setting the scope • Goodman Challenges: Maitotoxin, Eribulin. • Carey et al. Challenges: Difficult to access substituted aromatic starting materials. Org. Biomol. Chem. 2006, 4, 2337-2347.
  • 7. Which reactions are important?
      • NCI/LHASA SAVI (14+6 reactions)
        – Suzuki coupling: 36262 out of 1.2M USPTO examples
        – Sulfonamide Schotten-Baumann: 14348 out of 1.2M USPTO examples
        – Buchwald-Hartwig: 6040 out of 1.2M USPTO examples
        – Hiyama coupling: 458 out of 1.2M USPTO examples
        – Fukuyama coupling: 2 out of 1.2M USPTO examples
        – Liebeskind-Srogl coupling: 0 out of 1.2M USPTO examples
      • Hartenfeller/Schneider (58 reactions)
        – #1 Pictet-Spengler reaction: 7 out of 1.2M USPTO examples
        – #10 + #11 Azide-nitrile Huisgen cycloaddition: 5 out of 1.2M USPTO examples
        – #17 Pyridone synthesis: 2 out of 1.2M USPTO examples
        – #20 Phthalazinone synthesis: 16 out of 1.2M USPTO examples
        – #24 Friedländer quinoline synthesis: 30 out of 1.2M USPTO examples
      • Enamine REAL 2016 (43 reactions)
        – Thiourea to guanidine: 14 out of 160M examples
  • 8. myth #1: heterocycle formation • Because almost all drug-like molecules contain a heterocycle, there’s a belief that heterocycle formation is important. • Analysis of both patent data and pharmaceutical ELNs reveals that heterocycle forming reactions, even named heterocycle forming reactions, are relatively rare, with ring systems often being purchased as building blocks*. * This analysis might not apply to process development and manufacturing. https://nextmovesoftware.com/blog/2016/10/24/buying-a-ring-or-making-one-yourself/
  • 9. Reactions that don’t happen! • Paal-Knorr Pyrrole Synthesis vs. Aldehydes/Ketones Of the 430 examples of Paal-Knorr pyrrole synthesis reported in US patent applications 2001-2012, exactly zero have more than the two reacting ketones/aldehydes.
  • 10. Who let the dogs out? • Big Data can determine the utility and scope of reactions.
  • 11. Reactions that do happen!
      • Chloro Sonogashira Couplings (@ NCI) – The LHASA ‘CHMTRN’ rules for Transform 2267, Sonogashira couplings (presented by Marc Nicklaus at Sheffield), state “Iodides are usually more reactive than bromide. Chlorides do not react”.
        Code   Name                         Count  Yield
        3.3.2  Bromo Sonogashira coupling   3717   49.3%
        3.3.3  Chloro Sonogashira coupling  429    44.2%
        3.3.4  Iodo Sonogashira coupling    2721   64.9%
      • Isotopically-labelled Compounds (@ Eli Lilly) – The LAAR reactions used to construct Lilly’s PLC (Nicolaou et al. 2016) forbid the presence of isotopic labels in reactants.
  • 12. improving upon synthia™ • As presented earlier, Chematica/Synthia maintains a manually curated “black list” of interfering functional groups. • A more labor-efficient data-driven strategy is to automatically maintain a “white list” of tolerated functional groups that can be derived from observed experiments. • It’s much easier to track the things you know about and can see, than the things you can’t see and/or don’t know about.
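A minimal sketch of how such a white list might be accumulated from observed reactions. The record layout, the function name `tolerated_groups` and the `min_count` threshold are assumptions made for illustration, and the example records are invented rather than taken from any data set.

```python
from collections import Counter

def tolerated_groups(observations, min_count=2):
    """Map each reaction class to the spectator groups seen in at least
    `min_count` successful examples (the threshold is an assumption)."""
    counts = Counter()
    for reaction_class, spectator_groups, succeeded in observations:
        if not succeeded:
            continue
        for group in set(spectator_groups):
            counts[(reaction_class, group)] += 1
    whitelist = {}
    for (reaction_class, group), n in counts.items():
        if n >= min_count:
            whitelist.setdefault(reaction_class, set()).add(group)
    return whitelist

# Invented records: (reaction class, spectator functional groups, success flag)
observations = [
    ("Suzuki coupling", ["ester", "nitrile"], True),
    ("Suzuki coupling", ["ester"], True),
    ("Suzuki coupling", ["aldehyde"], True),
    ("Kumada coupling", ["aldehyde"], True),
]
whitelist = tolerated_groups(observations)
```

With these records only the ester survives the threshold for Suzuki couplings. A group seen once is held back until a second observation supports it.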
  • 13. handling noisy data (n>1 statistics) • Kumada couplings incompatible with aldehydes... • US20050107257A1 [p0196] Syngenta A solution of 34.3 ml of ethylmagnesium chloride in 50 ml of tetrahydrofuran is added dropwise at −70° C. to 7.3 g of indium trichloride in 200 ml of tetrahydrofuran and, after stirring for 30 minutes, the reaction temperature is allowed to rise slowly to room temperature. That solution is added to a solution of 14.9 g of 3-bromo-4-pyridine carbaldehyde and 2.8 g of PdCl2(PPh3)2 in 240 ml of tetrahydrofuran and the reaction mixture is heated under reflux for 20 hours. 5 ml of methanol are then added and the mixture is concentrated in vacuo, stirred thoroughly with diethyl ether, filtered off and concentrated in vacuo once more. The residue is chromatographed on silica gel using ethyl acetate/hexane (1:1), yielding 3-ethyl-4-pyridine carbaldehyde (I) in the form of a yellow oil.
  • 14. c.f. US20140296184A1 [C00008] M. Segler, M. Preuss and M. Waller, “Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI”, Nature, 555:604-610, 2018. A. Gini, M. Segler et al., “Dehydrogenative TEMPO-mediated formation of Unstable Nitrones: Easy Access to N-Carbamoyl Isoxazolines”, Chem. Eur. J. 21:12053-12060, 2015. prediction? about the future?
  • 15. search: monte carlo v. proof num • Appropriate choice of AI search technique. Monte Carlo search is inappropriate for search problems such as mazes. White to win in 173 ply: Kf1-f2! [Figure: AND/OR search tree with 0/1 node values]
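For readers unfamiliar with proof-number search, the bookkeeping on an AND/OR tree can be sketched as follows. This is the textbook proof/disproof number recurrence, not code from the talk. The tuple encoding of nodes is an assumption, as is the reading of OR nodes as alternative reactions and AND nodes as the set of reactants that must all be available.

```python
import math

INF = math.inf

def proof_numbers(node):
    """Return (proof number, disproof number) for an AND/OR tree node.

    A node is a leaf (True = proved, False = disproved, None = unknown)
    or a tuple (kind, children) with kind "OR" or "AND".
    """
    if node is True:
        return 0, INF
    if node is False:
        return INF, 0
    if node is None:
        return 1, 1
    kind, children = node
    pairs = [proof_numbers(child) for child in children]
    proofs = [p for p, _ in pairs]
    disproofs = [d for _, d in pairs]
    if kind == "OR":
        # One proved child suffices; every child must fail to disprove.
        return min(proofs), sum(disproofs)
    # AND: every child must be proved; one disproved child suffices.
    return sum(proofs), min(disproofs)

tree = ("OR", [
    ("AND", [None, None, None]),  # three unresolved subgoals
    ("AND", [True, None]),        # one subgoal already proved
])
root = proof_numbers(tree)
```

Here the root has proof number 1, because the second branch needs only one more subgoal. A best-first search would expand that branch next rather than sampling at random.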
  • 16. categorization of reactions 1. J. Carey, D. Laffan, C. Thomson, M. Williams, Org. Biomol. Chem. 2337, 2006. 2. S. Roughley and A. Jordan, J. Med. Chem. 54:3451-3479, 2011.
      [Pie chart. Categories: Heteroatom alkylation and arylation; Acylation and related processes; C-C bond formations; Heterocycle formation; Protections; Deprotections; Reductions; Oxidations; Functional group conversion; Functional group addition; Resolution. Slice labels as extracted: 34%, 17%, 5%, 2%, 3%, 6%, 10%, 1%, 15%, 2%, 5%]
  • 18. reaction ontology • Reactions are classified into a common subset of the Carey et al. classes and the RSC’s RXNO ontology. • There are 12 super-classes – e.g. 3 C-C bond formation (RXNO:0000002). • These contain 84 classes/categories. – e.g. 3.5 Pd-catalyzed C-C bond formation (RXNO:0000316) • These contain ~1050 named reactions/types. – e.g. 3.5.3 Negishi coupling (RXNO:0000088) • These require ~2200 SMIRKS-like transformations.
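The dotted codes encode the hierarchy directly, so the ancestors of a named reaction can be read off its code. The sketch below assumes nothing beyond that dotted convention. The helper name `ontology_levels` is hypothetical, and the three labels are the examples quoted on the slide.

```python
def ontology_levels(code):
    """Split a dotted reaction code such as '3.5.3' into its ancestor codes."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]

# Labels taken from the examples on the slide.
NAMES = {
    "3": "C-C bond formation",
    "3.5": "Pd-catalyzed C-C bond formation",
    "3.5.3": "Negishi coupling",
}

levels = ontology_levels("3.5.3")
lineage = [NAMES[code] for code in levels]
```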
  • 19. Complication: agents matter From http://en.wikipedia.org/wiki/Diazonium_compound Sandmeyer reaction: Benzenediazonium chloride heated with cuprous chloride dissolved in HCl to yield chlorobenzene. C6H5N2+ + CuCl → C6H5Cl + N2 + Cu+ Gattermann reaction: Benzenediazonium chloride is warmed with copper powder and HCl to yield chlorobenzene. C6H5N2+ + CuCl → C6H5Cl + N2 + Cu+
  • 20. 10 most popular reactions
      ID     Name                           Count
      2.1.2  Carboxylic acid + amine        26,040
      1.3.1  Buchwald-Hartwig amination     22,048
      3.1    Suzuki coupling                16,508
      1.7.6  Williamson ether synthesis     15,665
      2.1.1  Amide Schotten-Baumann         11,016
      7.1    Nitro to amino                 10,234
      6.1.1  N-Boc deprotection             9,821
      6.2.2  CO2H-Me deprotection           9,487
      6.2.1  CO2H-Et deprotection           6,749
      2.2.3  Sulfonamide Schotten-Baumann   6,223
  • 21. Most/least successful reactions
      ID      Name                                Mean Yield  Count
      1.7.2   Diazomethane esterification         91%         41
      9.3.1   Carboxylic acid to acid chloride    88%         704
      9.7.14  Bromo to azido                      85%         235
      1.7.5   Methyl esterification               84%         2918
      9.7.19  Bromo to iodo Finkelstein reaction  82%         116
      6.1.3   N-Cbz deprotection                  81%         1359
      …
      4.1.11  Larock indole synthesis             47%         55
      3.11.3  Ullmann-type biaryl coupling        44%         407
      1.7.1   Chan-Lam ether coupling             44%         154
      4.1.4   Pinner pyrimidine synthesis         39%         47
  • 22. rArE named reactions • Adams decarboxylation • Angeli-Rimini reaction • Aza-Baylis-Hillman reaction • Boyer reaction • Buchwald-Fischer indole synthesis • Castro-Stephens coupling • Chugaev elimination • Cook-Heilbron thiazole synthesis • Fischer-Hepp rearrangement • Fukuyama indole synthesis • Gassman indole synthesis • Imine Hosomi-Sakurai reaction • Koch reaction • Leuckart reaction • Liebeskind-Srogl coupling • Lossen rearrangement • Ponzio reaction • Prins reaction • Reimer-Tiemann carboxylation
  • 23. Trends in Reaction Types
      [Chart: Suzuki couplings as a percentage of reactions in a year, 1976 to 2013; y-axis 0.0% to 8.0%]
      Leaving Group  Mean Yield  N Observations
      Bromo          58.80%      10817
      Chloro         57.96%      2752
      Iodo           57.21%      2049
      Triflyloxy     65.48%      717
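As a worked check on the leaving-group table, the per-group means can be pooled into a single observation-weighted mean yield. This assumes each listed mean is a simple average over its N observations. The variable names are illustrative.

```python
# (leaving group, mean yield in %, number of observations) from the table above
rows = [
    ("Bromo", 58.80, 10817),
    ("Chloro", 57.96, 2752),
    ("Iodo", 57.21, 2049),
    ("Triflyloxy", 65.48, 717),
]

total_n = sum(n for _, _, n in rows)
# Weight each group mean by its observation count.
pooled_mean = sum(mean * n for _, mean, n in rows) / total_n
```

The pooled value comes out at about 58.75% over 16335 observations, close to the bromo figure because bromides dominate the counts.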
  • 24. Namerxn reaction naming • Many reaction classification algorithms are dependent upon atom-atom mapping assignments. • Alas MCS-based atom mapping algorithms are often slow and/or inaccurate [Lowe & Sayle 2012 & 2013]. • NameRXN is a mechanism-based atom-mapper. • All reactants and reagents are placed in a single pot (molecule) and sets of SMIRKS applied in turn. • If the desired product is generated, the reaction (its mechanism and mapping) is identified. – Rationalization is easier than prediction!
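The control flow described here (pool everything, try each transformation in turn, stop when the recorded product appears) can be sketched in a toy form. This is not NameRXN's implementation. Plain text labels stand in for molecules, hypothetical Python callables stand in for SMIRKS, and `name_reaction` and `make_transform` are names invented for the example.

```python
def make_transform(required, produced):
    """Build a stand-in transform: fires only if all required species are in the pot."""
    def apply(pot):
        return {produced} if required <= pot else set()
    return apply

def name_reaction(pot, product, transforms):
    """Return the name of the first transform that generates `product` from `pot`."""
    for name, apply in transforms:
        if product in apply(pot):
            return name
    return None  # unrecognised reaction

# Reaction names borrowed from the "10 most popular reactions" table.
transforms = [
    ("Williamson ether synthesis", make_transform({"alcohol", "alkyl halide"}, "ether")),
    ("Carboxylic acid + amine", make_transform({"carboxylic acid", "amine"}, "amide")),
]

# Reactants and reagents go into one pot; the solvent is simply ignored.
pot = frozenset({"carboxylic acid", "amine", "DMF"})
label = name_reaction(pot, "amide", transforms)
```

Because the recorded product is known in advance, the loop only has to confirm a match, which is the sense in which rationalization is easier than prediction.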
  • 25. Example smarts/smirks
      # NOZAKI_HIYAMA_KISHI_REACTION
      [#6v4+0;X4,X3:1][BrD1h0+0:2].[Ni].[Cr].[OD1h0+0:3]=[CD2h1v4+0:4]>>[#6:1][C:4]-[Oh1:3]
      # PAAL_KNORR_THIOPHENE_SYNTHESIS
      [OD1h0+0:1]=[CX3v4+0:2][CX4v4+0:3]([H])[CX4v4+0:4]([H])[CX3v4+0:5]=[OD1h0+0:6]>>[S:1]1[C:2]=[C:3][C:4]=[C:5]1
      • Writing SMIRKS is both an art and a science.
  • 26. Smarts pattern compilation
      // Neutral, tetravalent ring carbon with three neighbours and no hydrogens.
      bool atom_1(const RDKit::Atom *aptr) {
        return aptr->getAtomicNum() == 6 &&
               ringinfo->numAtomRings(aptr->getIdx()) != 0 &&
               aptr->getDegree() == 3 &&
               aptr->getTotalNumHs(false) == 0 &&
               aptr->getExplicitValence()+aptr->getImplicitValence() == 4 &&
               aptr->getFormalCharge() == 0;
      }

      // Neutral, tetravalent carbon: degree 2 with one or two hydrogens, or degree 3.
      bool atom_28(const RDKit::Atom *aptr) {
        if (aptr->getAtomicNum() != 6 ||
            aptr->getExplicitValence()+aptr->getImplicitValence() != 4 ||
            aptr->getFormalCharge() != 0)
          return false;
        return (aptr->getDegree() == 2 && aptr->getTotalNumHs(false) == 1) ||
               (aptr->getDegree() == 2 && aptr->getTotalNumHs(false) == 2) ||
               aptr->getDegree() == 3;
      }
  • 27. Smarts pattern compilation
      RDKit::ROMol::OBOND_ITER_PAIR biter[25];
      …
      case_7:   // start iterating over the bonds of atom[1]
        biter[0] = mol->getAtomBonds(atom[1]);
        goto case_9;
      case_8:   // backtrack: release atom[0] and advance to the next bond
        avisit[atom[0]->getIdx()] = 0;
        ++biter[0].first;
      case_9:   // try to match atom[0] across the current bond
        if (biter[0].first != biter[0].second) {
          bptr = (*mol)[*biter[0].first].get();
          if (bond_1(bptr)) {
            aptr = bptr->getOtherAtom(atom[1]);
            aidx = aptr->getIdx();
            if (avisit[aidx] == 0 && atom_1(aptr)) {
              avisit[aidx] = 1;
              atom[0] = aptr;
              goto case_10;
            } else ++biter[0].first;
          } else ++biter[0].first;
        } else goto case_5;
        goto case_9;
  • 28. Recent improvements insights • Profiling NameRxn in 2014 to diagnose performance problems with RDKit revealed that pattern matching and transformation accounted for <1% of runtime. • The bottleneck is actually in canonicalization. • The Ah-ha experience was to use hash filtering. • Check molecular formula: C_i H_j Br_k Cl_l F_m I_n N_o O_p P_q S_r • Additional cleverness allows pre-sanitization hashing. – Triple bond count, but not single or double bond count. – Perhaps there’s something in InChI-style hashing after all.
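The hash-filtering idea can be illustrated with a cheap molecular-formula key that is checked before any expensive comparison runs. In this sketch the dictionary layout, `formula_key`, `same_structure` and the stand-in `canonical_compare` are all assumptions made for illustration. A real canonicalization routine would take the place of the stand-in.

```python
from collections import Counter

def formula_key(atoms):
    """Order-independent key: sorted (element, count) pairs for a list of element symbols."""
    return tuple(sorted(Counter(atoms).items()))

def same_structure(candidate, target, canonical_compare):
    """Reject on formula mismatch first; only then pay for the expensive comparison."""
    if formula_key(candidate["atoms"]) != formula_key(target["atoms"]):
        return False
    return canonical_compare(candidate, target)

expensive_calls = []

def canonical_compare(a, b):
    # Stand-in for canonicalization: records the call and compares names.
    expensive_calls.append((a["name"], b["name"]))
    return a["name"] == b["name"]

# list("CCO...") splits into single-letter element symbols.
ethanol = {"name": "ethanol", "atoms": list("CCO" + "H" * 6)}
acetaldehyde = {"name": "acetaldehyde", "atoms": list("CCO" + "H" * 4)}
dimethyl_ether = {"name": "dimethyl ether", "atoms": list("COC" + "H" * 6)}

r1 = same_structure(acetaldehyde, ethanol, canonical_compare)    # formula differs
r2 = same_structure(dimethyl_ether, ethanol, canonical_compare)  # isomer: needs full check
r3 = same_structure(ethanol, ethanol, canonical_compare)
```

Only candidates whose formula matches reach the expensive step, so most wrong products are discarded without being canonicalized.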
  • 30. Pistachio: Siri for chemists
  • 31. Multi-step Synthetic Routes
      [Chart: occurrences (0 to 700000) against number of steps, for intermediates and terminal products]
      Steps  Intermediates  Terminal Products
      1      197702         385149
      2      103114         149445
      3      56611          81837
      4      31403          47579
      5      17268          27670
      6      9230           16619
      7      5057           9320
      8      2701           5263
      9      1256           2511
      10     639            1330
      11     301            678
      12     136            373
      13     58             111
      14     15             63
      15     5              8
      16     2              6
      17                    5
  • 32. Application to planning 1 • Cinnamic Acid (PhCHCHCO2) 1. Bromo Heck reaction (272) 2. Horner-Wadsworth-Emmons reaction (268) 3. Wittig olefination (129) 4. Bromo Heck-type reaction (62) 5. Iodo Heck reaction (49) 6. Triflyloxy Heck[-type] reaction (43) 7. Ester Schotten-Baumann (10) 8. Bromo Suzuki coupling (5) 9. Stille reaction (2) 10. Olefin metathesis (1)
  • 33. Application to planning 2 • p-Nitrobenzoic acid 1. Nitrile to carboxy (12) 2. CO2H-Me deprot (8) 3. CO2H-Et deprot (5) 4. Ester hydrolysis (1) 5. Nitration (1) • p-Nitrotoluene 1. Nitration (96) 2. Bromo Suzuki-type (1) 3. Chloro Suzuki (1)
  • 34. tactical vs. strategic reactions • Traditional Synthesis Planning has concentrated on the Strategic Application of Named Reactions. • However, there’s much to be gained for the Tactical Application of Unnamed Reactions. • Nitro reduction is the 6th most frequent reaction.
  • 35. functional group interconversion • Relative frequency of simple group conversion, from a total of 7,252,419 reactions from USPTO & EPO patents.
      Column headings: Amino, Bromo, Chloro, Fluoro, Hydroxy, Iodo, Sulfanyl, Thioxo
      Rows, with empty cells omitted:
      Amino    3951 2949 990 3976 3657 143
      Bromo    3121 589 637 2390 738 435
      Chloro   9424 717 1606 2156 744 798
      Fluoro   1549 180 484 826 28 103
      Hydroxy  2572 11441 31593 7641 3004 348
      Iodo     155 445 47
      Nitro    126606 138
      Oxo      8419
  • 36. nextmove’s strategy • Why expose reaction planning to the end-user at all? • During our collaboration with ChemSpace, it has become clear that what was required was not similarity nor superstructure search but a form of synthesis-aware search. • This is similar to the challenge faced by traditional retrosynthesis tools in identifying a leaf/goal state. • 937M purchasable compounds makes this non-trivial. • The usual challenges of functional group, tautomer and protonation state lookups, now also protecting groups. • But why not return the carboxylic acid when searching for the acid chloride, or the alcohol when searching for the bromo derivative?
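One way to picture a synthesis-aware lookup is a query that is widened to include one-step precursor forms of the requested functional group. The sketch below is purely illustrative. The `PRECURSORS` table holds only the two pairs mentioned on the slide, and the catalogue entries and the (scaffold, group) representation are invented.

```python
# Hypothetical one-step precursor table: requested group -> groups it can be made from.
PRECURSORS = {
    "acid chloride": ["carboxylic acid"],
    "bromo": ["hydroxy"],
}

def expand_query(scaffold, group):
    """Return the exact request followed by its precursor forms."""
    return [(scaffold, group)] + [(scaffold, g) for g in PRECURSORS.get(group, [])]

def search(catalogue, scaffold, group):
    """Return catalogue hits for the request or any of its precursor forms."""
    return [query for query in expand_query(scaffold, group) if query in catalogue]

# Invented catalogue of purchasable (scaffold, functional group) pairs.
catalogue = {
    ("benzoyl", "carboxylic acid"),
    ("benzyl", "hydroxy"),
    ("benzyl", "bromo"),
}

acid_chloride_hits = search(catalogue, "benzoyl", "acid chloride")
bromo_hits = search(catalogue, "benzyl", "bromo")
```

The acid chloride query returns the purchasable acid even though the acid chloride itself is not stocked, while the bromo query returns the exact match first and the alcohol as a fallback.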
  • 37. building block search • Query: Results:
  • 38. the challenge of regioselectivity • A tricky benchmark is reactions of 2,4,5-trichloropyrimidine. • The nature of pyrimidine makes the chloro at the 4-position more reactive than that at the 2-position, which is more reactive than that at the 5-position. • Simple quantum mechanical methods have difficulty discerning this order.
  • 39. handy’s rule • Scott Handy and Yanan Zhang, “A Simple Guide for Predicting Regioselectivity in the Coupling of Polyhaloheteroaromatics”, Chemical Communications, 3:299-301, Nov 2005. Abstract A simple guide for predicting the order and site of coupling (Suzuki, Stille, Negishi, Sonogashira, etc.) in polyhaloheteroaromatics based upon the 1H NMR chemical shift values of the parent non-halogenated heteroaromatics has been developed.
  • 40. electrophilic substitution of heterocycles with qm methods • J.C. Kromann et al., “Fast and Accurate Prediction of the Regioselectivity of Electrophilic Aromatic Substitution Reactions”, Chem. Sci., 9(3):660-665, Nov 2017. • M. Kruszyk et al., “Computational Methods to Predict the Regioselectivity of Electrophilic Aromatic Substitution Reactions of Heteroaromatic Systems”, J. Org. Chem. 81(12):5128-5134, Jun 2016.
  • 41. a data-driven strategy • One possible approach to the challenge of regioselectivity is to derive preferences by large-scale (statistical) analysis of reaction data sets. • Reaction classification can identify the subset of relevant examples, which can then be used to produce tables of heterocycle position preferences. • Directing groups and their influence can also be identified and tabulated. • See https://www.scripps.edu/baran/images/grpmtgpdf/Gutekunst_Apr_10.pdf
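A minimal sketch of the tabulation step, assuming each classified reaction has already been reduced to a (reaction class, substrate, reacting position) record. The records and the "SNAr" class label are invented for illustration. Their counts were chosen only so that the resulting order mirrors the 4 > 2 > 5 preference stated earlier for 2,4,5-trichloropyrimidine.

```python
from collections import Counter, defaultdict

def position_preferences(records):
    """Rank ring positions by how often they reacted, per (reaction class, substrate)."""
    table = defaultdict(Counter)
    for reaction_class, substrate, position in records:
        table[(reaction_class, substrate)][position] += 1
    return {
        key: [position for position, _ in counts.most_common()]
        for key, counts in table.items()
    }

# Invented records, not real counts.
substrate = "2,4,5-trichloropyrimidine"
records = (
    [("SNAr", substrate, 4)] * 3
    + [("SNAr", substrate, 2)] * 2
    + [("SNAr", substrate, 5)] * 1
)

preferences = position_preferences(records)
```

Grouping by reaction class first matters because different reaction types on the same ring may prefer different positions.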
  • 42. the semantics of atom mapping • Prof. Goodman poses an interesting question of atom mapping. • 4.1.6 Cyclic Beckmann Rearrangement • 1.2.9 Alcohol + Amine Condensation or 1.1.3 Iodo N-methylation
  • 43. Acknowledgements • NextMove Software – Noel O’Boyle – John Mayfield • NextMove Alumni – Daniel Lowe • Thank you for your time. • Questions? • Thoughts? • AbbVie • AstraZeneca • Bristol-Myers Squibb • ChemSpace • Eli Lilly • GlaxoSmithKline • Hoffmann-La Roche • IBM Research Zurich • Merck • Novartis • Royal Society of Chemistry • Vernalis • Vertex Pharmaceuticals
  • 44. Transforms vs. reactions • Importance of Reaction Mechanism • Example: Ullmann-type Coupling Reactions • SMIRKS: [H][N:1].[Cl][c:2]>>[N:1][c:2] • The SMIRKS transform alone is insufficient to predict the products and by-products in this example. • A measure of nucleophilicity is desirable for each atom in a molecule. • Without this, software may be misled into believing that protecting groups are required.
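A toy sketch of the point made on this slide: a transform that matches any N-H yields one candidate product per matching site, and only an additional per-atom score can suggest which one actually forms. Everything here is assumed for illustration: the `NHSite` class, the site labels and the nucleophilicity numbers are hypothetical, and no real SMIRKS engine is involved.

```python
from dataclasses import dataclass

@dataclass
class NHSite:
    atom_index: int
    label: str
    nucleophilicity: float  # hypothetical score; higher means more nucleophilic

def enumerate_products(sites):
    """Blind application of the transform: one candidate product per N-H match."""
    return [f"N-arylation at {s.label} (atom {s.atom_index})" for s in sites]

def rank_sites(sites):
    """Order the matched sites by the (assumed) nucleophilicity score."""
    return sorted(sites, key=lambda s: s.nucleophilicity, reverse=True)

# A hypothetical substrate with three N-H sites that all match [H][N:1].
sites = [
    NHSite(atom_index=3, label="site A", nucleophilicity=0.9),
    NHSite(atom_index=8, label="site B", nucleophilicity=0.6),
    NHSite(atom_index=12, label="site C", nucleophilicity=0.1),
]

candidates = enumerate_products(sites)  # three products; the transform cannot choose
preferred = rank_sites(sites)[0]        # the score singles out one site
print(len(candidates), "candidate products; preferred:", preferred.label)
```

Without the score, a planner that sees three equally valid matches might conclude that two of the sites need protecting.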
  • 45. Analysis of pharmaceutical ELNs • NextMove Software’s HazELNut software is used to export and analyze ELN content at 6 of the top 10 large pharmaceutical companies. • In-house analysis of this data, across the industry, reveals a surprisingly high rate of synthesis failure that is not indicated in the published literature (journal articles, patent applications or reaction databases). • Understanding the causes of these failures is perhaps more significant than attempting to access new chemistries.
  • 46. Extracting mps and reactions
  • 47. Example reaction mining INPUT Methyl 4-[(pentafluorophenoxy)sulfonyl]benzoate To a solution of methyl 4-(chlorosulfonyl)benzoate (606 mg, 2.1 mmol, 1 eq) in DCM (35 ml) was added pentafluorophenol (412 mg, 2.2 mmol, 1.1 eq) and Et3N (540 mg, 5.4 mmol, 2.5 eq) and the reaction mixture stirred at room temperature until all of the starting material was consumed. The solvent was evaporated in vacuo and the residue redissolved in ethyl acetate (10 ml), washed with water (10 ml), saturated sodium hydrogen carbonate (10 ml), dried over sodium sulphate, filtered and evaporated to yield the title compound as a white solid (690 mg, 1.8 mmol, 85%).
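As a rough illustration of what mining such a paragraph involves, the sketch below pulls the parenthesised quantities out of the text above with a regular expression and cross-checks the reported yield. The pattern and variable names are assumptions made for this example; it is not the extraction method used in the talk, and it treats the reagent listed at 1 eq as the limiting one.

```python
import re

# First and last sentences of the experimental text from the slide.
text = (
    "To a solution of methyl 4-(chlorosulfonyl)benzoate (606 mg, 2.1 mmol, 1 eq) "
    "in DCM (35 ml) was added pentafluorophenol (412 mg, 2.2 mmol, 1.1 eq) and "
    "Et3N (540 mg, 5.4 mmol, 2.5 eq) and the reaction mixture stirred at room "
    "temperature until all of the starting material was consumed. The solvent was "
    "evaporated in vacuo and the residue redissolved in ethyl acetate (10 ml), "
    "washed with water (10 ml), saturated sodium hydrogen carbonate (10 ml), dried "
    "over sodium sulphate, filtered and evaporated to yield the title compound as "
    "a white solid (690 mg, 1.8 mmol, 85%)."
)

# Matches "(mass mg, amount mmol, N eq)" or "(mass mg, amount mmol, N%)".
QUANTITY = re.compile(
    r"\((?P<mg>\d+(?:\.\d+)?) mg, (?P<mmol>\d+(?:\.\d+)?) mmol, "
    r"(?:(?P<eq>\d+(?:\.\d+)?) eq|(?P<yield_pct>\d+(?:\.\d+)?)%)\)"
)

entries = [m.groupdict() for m in QUANTITY.finditer(text)]

# Assumption: the entry given as 1 eq is the limiting reagent.
limiting = next(e for e in entries if e["eq"] is not None and float(e["eq"]) == 1.0)
product = next(e for e in entries if e["yield_pct"] is not None)

computed_yield = 100 * float(product["mmol"]) / float(limiting["mmol"])
print(f"reported {product['yield_pct']}%, recomputed {computed_yield:.1f}%")
```

The solvent volumes such as "(35 ml)" are deliberately not matched, since they carry no molar amount.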
  • 49. CHEMICAL reactions for free
  • 50. bond changes in indole synthesis • Columns: Synthesis, A, B, C, D • Baeyer-Emmerling: M • Bartoli: M M • Bischler-Möhlau: M C M • Fischer: M C M • Fukuyama: M • Hemetsberger: M • Larock: M C M • Madelung: M • Nenitzescu: M M • Reissert: M M
  • 51. Suzuki coupling leaving groups
      Leaving Group   Mean Yield   N Observations
      Bromo           58.80%       10817
      Chloro          57.96%       2752
      Iodo            57.21%       2049
      Triflyloxy      65.48%       717
  • 52. Beyond Drug Guru • sildenafil (Viagra) • vardenafil (Levitra)
  • 53. Eli Lilly’s automated synthesis lab • Alexander G. Godfrey, Thierry Masquelin and Horst Hemmerle, “A Remote-Control Adaptive MedChem Lab: An Innovative Approach to Enable Drug Discovery in the 21st Century”, Drug Discovery Today, Vol. 18, Nos. 17-18, September 2013.
  • 54. Synthesis failures at Lilly • At the 2013 Sheffield Cheminformatics conference, Christos Nicolaou highlighted the technical challenge of predicting which compounds are potentially accessible by Lilly’s Advanced Synthesis Lab (ASL). • In a proof-of-concept pilot project, only 25 of 90 compounds suggested by Lilly’s Annotated Reaction Repository (LARR) rule-set could be successfully synthesized in practice. • http://cisrg.shef.ac.uk/shef2013/talks/14.pdf
  • 55. Synthesis failures at GSK • Pickett et al. 2011 describe the parallel synthesis of a 50x50 library of MMP-12 inhibitors by an iodo-Suzuki coupling reaction. • Only 1704 of 2500 could be assayed [566 not made]. • Pickett et al., ACS Med. Chem. Lett. 2(1):28, 2011
  • 56. Learning from failure (logP shift) • Nadin et al. 2012 [1] hypothesize that low logP is a major cause of synthesis failure in the parallel synthesis of combinatorial libraries. • Analysis confirms that this is indeed a significant factor for the GSK MMP-12 library. – 1704 compounds measured, mean logP = 3.56 (1.44) – 566 compounds not made, mean logP = 2.83 (1.52) – Student’s t-test for different distributions, p < 2×10^-22. • 1. Nadin, Hattotuwagama and Churcher, “Lead-Oriented Synthesis: A New Opportunity for Synthetic Chemistry”, Angew. Chem. Int. Ed., 51:1114, 2012.
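The test on this slide can be approximately reproduced from the summary statistics alone. The sketch below assumes that the bracketed values (1.44 and 1.52) are standard deviations and that a pooled-variance Student's t-test was used; neither is stated explicitly on the slide. The p-value uses a normal approximation, which is only a rough guide even at this many degrees of freedom.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample Student's t statistic with pooled variance, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

# Measured compounds vs. compounds not made (values from the slide;
# the bracketed numbers are assumed to be standard deviations).
t, df = pooled_t(3.56, 1.44, 1704, 2.83, 1.52, 566)

# Two-sided p-value under a normal approximation to the t distribution.
p_approx = math.erfc(abs(t) / math.sqrt(2))

print(f"t = {t:.2f}, df = {df}, approximate p = {p_approx:.1e}")
```

Under these assumptions the statistic comes out a little above 10, which is in line with the very small p-value quoted on the slide.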
  • 57. Example (actionable) insight • The clear trend between Suzuki coupling success rate and predicted octanol-water partition coefficient. • Data: Pickett et al., ACS Med. Chem. Lett. 2(1):28, 2011 • [Stacked bar chart. x-axis: Reaction Product Predicted cLogP, in bins <1.0, 1.0-2.0, 2.0-3.0, 3.0-4.0, 4.0-5.0, 5.0-6.0, >6.0; y-axis: 0% to 100%; series: Successful Reactions, Failed Reactions; data labels: 97 232 340 525 474 202 63 55 119 127 141 80 36 8]
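The trend in this chart can be made explicit by converting the data labels into per-bin success rates. The assignment of labels to series is an assumption: the first seven labels are read as successful reactions and the last seven as failed ones, which is supported by the last seven summing to the 566 compounds reported as not made.

```python
# cLogP bins and data labels as they appear on the slide.
bins = ["<1.0", "1.0-2.0", "2.0-3.0", "3.0-4.0", "4.0-5.0", "5.0-6.0", ">6.0"]
successful = [97, 232, 340, 525, 474, 202, 63]  # assumed: first seven labels
failed = [55, 119, 127, 141, 80, 36, 8]         # assumed: last seven labels

# Fraction of attempted reactions in each bin that succeeded.
rates = [s / (s + f) for s, f in zip(successful, failed)]

for label, s, f, rate in zip(bins, successful, failed, rates):
    print(f"cLogP {label:>7}: {s:4d} made, {f:4d} failed, success rate {rate:.1%}")
```

Read this way, the success rate rises from roughly two thirds in the lowest bin to nearly nine in ten in the highest.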
  • 58. big data mining confirmation of Nadin-Churcher hypothesis • On 16,335 Suzuki coupling reactions extracted from US patent applications between 2001 and 2012. • Nadin, Hattotuwagama and Churcher, “Lead-Oriented Synthesis: A New Opportunity for Synthetic Chemistry”, Angew. Chem. Int. Ed., 51:1114, 2012.
      LogP        Mean Yield   N Obs
      < 1.0       52.89%       196
      1.0 – 2.0   56.02%       1155
      2.0 – 3.0   56.72%       2881
      3.0 – 4.0   58.14%       4071
      4.0 – 5.0   57.26%       3186
      5.0 – 6.0   59.25%       2126
      > 6.0       63.83%       2720
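A quick consistency check on this table, sketched in Python: the bin counts should add up to the 16,335 reactions quoted, and the observation-weighted mean gives the overall yield implied by the table. The variable names are illustrative only.

```python
# (logP bin, mean yield in %, number of observations), copied from the slide.
rows = [
    ("< 1.0", 52.89, 196),
    ("1.0 - 2.0", 56.02, 1155),
    ("2.0 - 3.0", 56.72, 2881),
    ("3.0 - 4.0", 58.14, 4071),
    ("4.0 - 5.0", 57.26, 3186),
    ("5.0 - 6.0", 59.25, 2126),
    ("> 6.0", 63.83, 2720),
]

total_obs = sum(n for _, _, n in rows)

# Observation-weighted mean yield across all bins.
weighted_mean = sum(mean * n for _, mean, n in rows) / total_obs

# Spread between the highest-logP and lowest-logP bins.
spread = rows[-1][1] - rows[0][1]

print(f"{total_obs} reactions, weighted mean yield {weighted_mean:.2f}%, "
      f"top-bin minus bottom-bin yield {spread:.2f} points")
```

The rise in mean yield is not strictly monotonic (the 4.0 to 5.0 bin dips slightly), so the table supports a trend rather than a bin-by-bin ordering.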
  • 59. “big data” reaction yield analysis AZ Data courtesy of Nick Tomkinson, AstraZeneca RDI, Alderley Park, UK.
  • 60. “big data” reaction yield analysis AZ Data courtesy of Nick Tomkinson, AstraZeneca RDI, Alderley Park, UK.
  • 61. Functional group changes analyzed • Do the results make sense at all?
      Functional Group                Avg in Reaction   Overall Average
      halogen                         -0.98             -0.3
      alcohol                         -0.95             -0.12
      halogen_notfluorine             -0.89             -0.27
      alcohol_aromatic                -0.67             -0.04
      halogen_aliphatic               -0.62             -0.15
      halogen_notfluorine_aliphatic   -0.62             -0.14
      carboxylicacid                  -0.5              -0.23
      halogen_bromine                 -0.42             -0.11
      halogen_bromine_aliphatic       -0.39             -0.06
      halogen_aromatic                -0.36             -0.16
      alcohol_aliphatic               -0.28             -0.08
      halogen_notfluorine_aromatic    -0.27             -0.13
      amine                           -0.04             -0.3
      amine_aliphatic                 -0.04             -0.27
      carboxylicacid_aliphatic        -0.04             -0.08
      halogen_bromine_aromatic        -0.03             -0.05
      amine_tertiary                  -0.02             -0.06
      amine_tertiary_aliphatic        -0.02             -0.08
      carboxylicacid_aromatic         -0.02             -0.03
      amine_cyclic                    -0.01             -0.02
      halogen_bromine_bromoketone     -0.01             0

      Functional Group                Avg in Reaction   Overall Average
      acidchloride                    0                 -0.07
      acidchloride_aliphatic          0                 -0.05
      acidchloride_aromatic           0                 -0.02
      aldehyde                        0                 -0.04
      aldehyde_aliphatic              0                 -0.01
      aldehyde_aromatic               0                 -0.03
      amine_aromatic                  0                 -0.03
      amine_primary                   0                 -0.15
      amine_primary_aliphatic         0                 -0.07
      amine_primary_aromatic          0                 -0.07
      amine_secondary                 0                 -0.04
      amine_secondary_aliphatic       0                 -0.07
      amine_secondary_aromatic        0                 0.03
      amine_tertiary_aromatic         0                 0
      azide                           0                 0
      azide_aliphatic                 0                 0
      azide_aromatic                  0                 0
      boronicacid                     0                 -0.03
      boronicacid_aliphatic           0                 0
      boronicacid_aromatic            0                 -0.03
      carboxylicacid_alphaamino       0                 0
      isocyanate                      0                 -0.01
      isocyanate_aliphatic            0                 0
      isocyanate_aromatic             0                 0
      nitro                           0                 -0.03
      nitro_aliphatic                 0                 0
      nitro_aromatic                  0                 -0.03
      sulfonylchloride                0                 -0.02
      sulfonylchloride_aliphatic      0                 -0.01
    • Compare the average deltas for the >39K instances of Williamson ether synthesis. • These look sensible.
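One way to read this table programmatically is to subtract the overall average from the in-reaction average, so that groups consumed far more often in this reaction class than in general stand out. The sketch below does this for a handful of rows copied from the slide; the interpretation of the two columns as mean changes in functional group count (within the class and across all reactions) is an assumption, and `excess` is a name invented here.

```python
# (functional group, average delta in this reaction class, overall average delta);
# a subset of rows copied from the slide.
rows = [
    ("halogen", -0.98, -0.30),
    ("alcohol", -0.95, -0.12),
    ("carboxylicacid", -0.50, -0.23),
    ("amine", -0.04, -0.30),
    ("boronicacid", 0.00, -0.03),
]

# Excess change relative to the background: strongly negative values mark
# groups that this reaction class consumes more than reactions in general.
excess = sorted(
    ((name, in_class - overall) for name, in_class, overall in rows),
    key=lambda item: item[1],
)

for name, value in excess:
    print(f"{name:15s} {value:+.2f}")
```

On these rows the alcohol and halogen groups show the largest excess loss, which fits an ether synthesis from an alcohol and an alkyl halide.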