Evaluating the Quality and Performance of Automatic Atom Mapping Algorithms

Daniel Lowe and Roger Sayle
NextMove Software
Cambridge, UK

 ACS National Meeting, Philadelphia, USA 20th August 2012
What is Atom-Mapping?




[Figure: mapping algorithm]




Why Perform Atom-Mapping?

• Assigning roles to reagents

• Normalization of reactions for registration




Why Perform Atom-Mapping?

• More precise database searches
  – Solvents/catalysts can be distinguished from
    reactants
  – Allows the relationship between the reactant
    atoms and product atoms to be made explicit
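A minimal sketch of how explicit atom maps support role assignment, assuming the reaction has already been mapped and is written as reaction SMILES with map numbers in bracket atoms (for example [CH3:1]). The function names and the example reaction are hypothetical and are not taken from any of the evaluated programs. A left-hand-side component that shares no map number with the products is treated as a solvent, catalyst or other non-contributing species.

```python
import re

# Atom-map label at the end of a bracket atom, e.g. the ":7" in [CH3:7]
MAP_RE = re.compile(r":(\d+)\]")


def map_numbers(smiles):
    """Return the set of atom-map numbers found in a SMILES string."""
    return {int(n) for n in MAP_RE.findall(smiles)}


def assign_roles(mapped_rxn):
    """Split 'reactants>agents>products' into contributing reactants
    and species that contribute no mapped atom to the products."""
    lhs, agents, rhs = mapped_rxn.split(">")
    product_maps = map_numbers(rhs)
    reactants, others = [], []
    for component in lhs.split("."):
        if map_numbers(component) & product_maps:
            reactants.append(component)
        else:
            others.append(component)
    # Anything already written in the agents field stays a non-reactant
    others.extend(a for a in agents.split(".") if a)
    return reactants, others


# Hypothetical mapped esterification with unmapped dichloromethane
# listed among the reactants
rxn = ("[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6].ClCCl"
       ">>[CH3:1][C:2](=[O:3])[O:6][CH3:5]")
reactants, others = assign_roles(rxn)
print(reactants)
print(others)
```

The sketch relies only on string handling, so it says nothing about whether the mapping itself is chemically plausible; it only reads roles off a mapping that some algorithm has already produced.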




Example
• I want to find reactions converting an alkene
  to a cyclopropane so I search for C=C>>C1CC1




Why Perform Atom-Mapping?

• Identifying suspect reactions:




Qualities to look for in an atom mapping algorithm
• Chemically plausible atom mappings
• Ability to distinguish genuine reactants from
  solvents/catalysts
• Support for unbalanced reactions
  – Side product not specified
  – Reactant stoichiometry > 1
• Fast run-time
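Unbalanced input of the kind listed above can be detected before mapping. The following is a rough sketch, assuming unmapped or mapped reaction SMILES restricted to bracket atoms and the organic subset; the tokenizer is a simplification and not a full SMILES parser, and all names are hypothetical. It reports product heavy atoms that have no source on the reactant side, which is what happens when a reactant is used more than once but written only once.

```python
import re
from collections import Counter

# Bracket atoms first, then two-letter halogens, then one-letter atoms
ATOM_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Element symbol inside a bracket atom, after an optional isotope number
BRACKET_SYMBOL_RE = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def heavy_atom_counts(smiles):
    """Count heavy atoms per element in a (simplified) SMILES string."""
    counts = Counter()
    for token in ATOM_RE.findall(smiles):
        if token.startswith("["):
            match = BRACKET_SYMBOL_RE.match(token)
            if match is None:  # e.g. a wildcard atom, skipped in this sketch
                continue
            symbol = match.group(1)
        else:
            symbol = token
        symbol = symbol.capitalize()  # aromatic 'c' is counted as 'C'
        if symbol != "H":
            counts[symbol] += 1
    return counts


def unaccounted_product_atoms(rxn):
    """Return product heavy atoms with no counterpart in the reactants.
    Only the reactant field is used; agents are ignored (an assumption)."""
    lhs, _, rhs = rxn.split(">")
    # Counter subtraction keeps only positive differences
    return dict(heavy_atom_counts(rhs) - heavy_atom_counts(lhs))


# Hypothetical diester formation with acetic acid written only once
print(unaccounted_product_atoms("CC(=O)O.OCCO>>CC(=O)OCCOC(C)=O"))
```

An empty result does not prove the reaction is balanced: an unspecified side product leaves surplus atoms on the reactant side, which this check deliberately ignores.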


Algorithms Evaluated

Vendor        Program          Version
ChemAxon      Marvin           5.10.1
GGA           Indigo           1.1
InfoChem      ICMAP            5.10
PerkinElmer   ChemDraw Ultra   12.0




Methodology

Test set                                                        Reactions
Pharmaceutical ELN subset                                          18,244
ChemReact68 database                                               67,926
SPRESI database subset                                              5,230
Reactions extracted from 2008-2011 USPTO patent applications*     562,872



* Lowe, D. M. Automated Extraction of Reactions from the Patent Literature.
243rd ACS National Meeting & Exposition, San Diego, CA, March 27, 2012.

Methodology (cont.)

• Reaction SMILES were used as input and
  output for all algorithms bar ICMAP
• Input and output were converted to and from
  RDF for use with ICMAP
• Indigo was run with its default configuration
  and more lenient settings for matching
  valences, charges and bond orders
• Marvin was configured to use its best
  quality mapping strategy

Ability to map all product atoms
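One way such a measure could be computed from mapped reaction SMILES output is sketched below. This is an illustration under stated assumptions, not the procedure used in the evaluation: the tokenizer handles only bracket atoms and the organic subset, explicit hydrogen atoms are assumed to be absent, a map number of 0 is treated as unmapped, and the function name is hypothetical.

```python
import re

# Bracket atoms first, then two-letter halogens, then one-letter atoms
ATOM_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Map number at the end of a bracket atom, e.g. the ":3" in [O:3]
MAP_RE = re.compile(r":(\d+)\]$")


def product_mapping_coverage(mapped_rxn):
    """Return (mapped_atoms, total_atoms) for the product side."""
    products = mapped_rxn.split(">")[2]
    atoms = ATOM_RE.findall(products)
    mapped = 0
    for atom in atoms:
        match = MAP_RE.search(atom)
        if match and int(match.group(1)) > 0:
            mapped += 1
    return mapped, len(atoms)


# Hypothetical outputs: one fully mapped, one only partially mapped
full = ("[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6]"
        ">>[CH3:1][C:2](=[O:3])[O:6][CH3:5]")
partial = "CC(=O)O.CO>>[CH3:1][C:2](=O)OC"

for rxn in (full, partial):
    mapped, total = product_mapping_coverage(rxn)
    print(mapped, total, mapped == total)
```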




C-C bonds broken




Speed Comparison




Average reagents per reaction: 1.7, 3.6, 1.6, 4.0

Simple mappings




[Two figures, each labelled Marvin/ChemDraw/Indigo/ICMAP]

More complicated mappings




[Figures, one panel per program: Marvin, ChemDraw, Indigo, ICMAP]


Reuse of reactants




[Figures: one unlabelled panel, then one panel each for Marvin, ChemDraw, Indigo and ICMAP]

Single Atom Mapping




[Figures: one panel labelled ICMAP/Marvin, one labelled ChemDraw/Indigo]

Bugs and quirks

• Marvin
  – 2 unsuccessful mappings produced unchecked
    exceptions rather than checked exceptions
• ChemDraw
  – Hydrogen on aromatic atoms missing in SMILES
     output
• Indigo
  – Calculation of valency fails for aromatic sulfur


Bugs and quirks

• ICMAP
  – Single atom products are interpreted as empty
    molecules or occasionally replaced by a product
    from a previous reaction (bug reported)
  – Input files must be smaller than 2 GB and use DOS line endings
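The file constraints above can be handled in a small preprocessing step. The sketch below is one possible approach, not part of ICMAP or its documentation: it rewrites a file with DOS (CRLF) line endings and refuses output at or above a size limit. Reading "2gb" as 2 GiB is an assumption, as are the function and constant names; the file is read fully into memory, which keeps the example short but is not ideal for inputs near the limit.

```python
MAX_BYTES = 2 * 1024 ** 3  # "2gb" read as 2 GiB; an assumption


def to_dos_line_endings(data: bytes) -> bytes:
    """Return data with every line ending converted to CRLF."""
    # Normalise existing CRLF and bare CR to LF first so that
    # already-converted lines are not given a second carriage return
    unix = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unix.replace(b"\n", b"\r\n")


def write_dos_copy(src_path, dst_path, max_bytes=MAX_BYTES):
    """Write a CRLF copy of src_path to dst_path and return its size."""
    with open(src_path, "rb") as src:
        converted = to_dos_line_endings(src.read())
    if len(converted) >= max_bytes:
        raise ValueError("converted file is too large; split the input")
    with open(dst_path, "wb") as dst:
        dst.write(converted)
    return len(converted)


print(to_dos_line_endings(b"$RXN\nline\r\nend"))
```

Working on bytes rather than text avoids any platform-dependent newline translation and leaves the file's character encoding untouched.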




Conclusions

• ICMAP produced the best quality mappings on
  the tested sets

• Atom mapping isn’t as simple as finding a
  maximum common subgraph mapping

• In all the algorithms there were aspects that
  could be improved to yield appreciable
  benefits

Acknowledgements

• Ed Griffen and Nick Tomkinson, AstraZeneca.
• Andrew Wooster, GSK.
• Hans Kraut, InfoChem


• Thank you for your time.




