How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals?

V. Tkachenko, A.J. Williams, Y. Borodina, F. Switzer, T. Peryea, L. Callahan

ACS Philly August 2012
What is InChI
InChI Examples


CH3CH2OH (ethanol): InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3

L-ascorbic acid: InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1
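
The layered layout of the strings above can be made visible with a few lines of Python. This is only a minimal sketch, not part of the original slides: the helper name `split_inchi_layers` is hypothetical, and it assumes a simple standard InChI in which the formula layer is present and each one-letter layer prefix occurs at most once.

```python
def split_inchi_layers(inchi: str) -> dict:
    """Split an InChI string into its version, formula, and prefixed layers.

    Assumption: a simple standard InChI where the formula layer is present
    and every one-letter layer prefix (c, h, t, m, s, ...) occurs only once.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    # The formula layer carries no prefix letter; it directly follows the version.
    if len(parts) > 1:
        layers["formula"] = parts[1]
    # Every later layer starts with a one-letter prefix that names it.
    for layer in parts[2:]:
        layers[layer[0]] = layer[1:]
    return layers


ethanol = split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol)
```

For ethanol this separates the version (`1S`), the formula, the connectivity layer (`c`) and the hydrogen layer (`h`); the ascorbic acid string additionally yields the stereo layers `t`, `m` and `s`.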
InChI Structure
InChIKey
   The condensed, 27-character standard InChIKey is a hashed version of the full standard InChI (using the
    SHA-256 algorithm)
   Designed to allow for easy web searches of chemical compounds
   A standard InChIKey consists of
       14 characters resulting from a hash of the connectivity information of the InChI
       followed by a hyphen and 8 characters resulting from a hash of the remaining layers of the InChI
       followed by a single character flagging a standard (S) or non-standard (N) InChI
       followed by a single character indicating the version of InChI used (A for version 1)
       followed by a hyphen and a single character indicating protonation (N for neutral)

   InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1
   BQJCRHHNABKAKU-KBQPJGBKSA-N
   Unlike InChI, an InChIKey cannot be converted back to a connection table (CT); the structure can be recovered only by lookup
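
As a rough illustration of how the key is laid out, the sketch below splits an InChIKey into its blocks with a regular expression. It is not from the slides: the names `InChIKeyParts`, `parse_inchikey` and `same_skeleton` are hypothetical, and the pattern assumes the standard-InChIKey layout of 14 hash letters, a hyphen, 8 hash letters plus a standard/non-standard flag and a version letter, a hyphen, and one final protonation letter.

```python
import re
from typing import NamedTuple

# Assumed layout: 14 letters, hyphen, 8 letters + flag + version, hyphen, 1 letter.
_INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


class InChIKeyParts(NamedTuple):
    connectivity_hash: str   # hash of the connectivity information
    other_layers_hash: str   # hash of the remaining layers (stereo, isotopes, ...)
    flag: str                # "S" for standard, "N" for non-standard InChI
    version: str             # InChI version letter
    protonation: str         # protonation indicator


def parse_inchikey(key: str) -> InChIKeyParts:
    """Split a 27-character InChIKey into its blocks, or raise ValueError."""
    match = _INCHIKEY_RE.match(key)
    if match is None:
        raise ValueError(f"not a 27-character InChIKey: {key!r}")
    return InChIKeyParts(*match.groups())


def same_skeleton(key_a: str, key_b: str) -> bool:
    """True when two keys share the first (connectivity) block."""
    return parse_inchikey(key_a).connectivity_hash == parse_inchikey(key_b).connectivity_hash


print(parse_inchikey("BQJCRHHNABKAKU-KBQPJGBKSA-N"))
```

Comparing only the first block is a common way to find records that share a skeleton but differ in stereochemistry or isotopes. Because the key is a hash, parsing it reveals nothing about the structure itself.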
Proliferation of InChI
Search by InChI
ChemSpider Google Search
http://www.chemspider.com/google/
What’s the catch?

 InChI has limitations
 InChI is ideal for
    Simple
    Static
    Well-defined graphs
 Real chemical substances can only be
  approximated by such graphs
Limitations
 Non-trivial stereo (e.g. axial, planar)
 Non-trivial tautomers (e.g. ring-chain)
 Mixtures – full stereo is rarely known
 Polymers
 Markush structures
 Organometallics
 Inorganics
 Materials
 Reactions
 Etc
Chemical data complexity
Work in progress
   InChI Extensions: Under the guidance of IUPAC, several sub-teams are now
    working on expanding InChI to new areas of chemical representation:

      Reaction InChI (RInChI): the reaction working group has completed its
       recommendations, and work is ready to begin.

      Polymers/Mixtures: The polymers/mixtures working group also has
       submitted its recommendations, and work to incorporate the new
       representations should begin once version 1.04 is released.

      Markush: This project is the most complex undertaken to date. The initial
       recommendations have been submitted, but financing of the work still
       needs to be sorted out.

   But what do we do NOW???
Deposition Process
 Data
 Validation
 Standardization
 Filtering
 Componentization
 Deduplication
 Mapping
 Non-redundant data
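
The deduplication step can be pictured as grouping deposited records by identifier. The sketch below is an assumption for illustration only, not the ChemSpider implementation: the function name `deduplicate` and the record fields `inchikey` and `source_id` are hypothetical.

```python
from collections import defaultdict


def deduplicate(records):
    """Group deposited records by InChIKey, keeping every source reference.

    Each record is a dict with the hypothetical fields "inchikey" and "source_id".
    """
    merged = defaultdict(list)
    for record in records:
        merged[record["inchikey"]].append(record["source_id"])
    return dict(merged)


records = [
    {"inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "source_id": "A1"},
    {"inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N", "source_id": "A2"},
    {"inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "source_id": "B7"},
]
print(deduplicate(records))
```

Each distinct key ends up as one non-redundant entry that still maps back to all the deposited records it came from.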
ChemSpider Data Model
Organometallics
Mixtures or unknown stereo
Accelrys Enhanced Stereo
MOL V3000
Enhanced stereo and InChI…
 Unfortunately not supported
 Is it important?
 Now real-world examples…
FDA Substance Registration System
Stoichiometric and non-stoichiometric mixtures
[Figures: four example substances drawn with their component moieties]
 Substance with Moiety 1 and Moiety 2
 Substance with Moiety 1, Moiety 2, Moiety 3 and Moiety 4
 Substance with Moiety 1 and Moiety 2 (undefined)
 Substance with Moiety 1 (A) and Moiety 2 (B)
D-glucose
SRS standardization approach
   Substance description
   Standardization module
   Moieties generator
   Normalization
   InChI[Key] generator


 Hash function f(InChIKeys, moieties)


 Unique ID
 Standard description
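
The slide does not say how the hash function f(InChIKeys, moieties) is defined. One way it could work is sketched below, purely as an assumption: the function name `substance_id`, the (InChIKey, amount) pair format and the choice of SHA-256 are all hypothetical. The idea is to sort the moieties so that drawing order does not matter, then hash the canonical text.

```python
import hashlib


def substance_id(moieties):
    """Derive a deterministic ID from (InChIKey, amount) pairs, one per moiety.

    The amount is kept as text so that non-stoichiometric components can be
    recorded with a marker such as "undefined".
    """
    # Sort so that the order in which moieties were entered does not matter.
    canonical = sorted((key, str(amount)) for key, amount in moieties)
    payload = ";".join(f"{key}:{amount}" for key, amount in canonical)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


mixture = [
    ("LFQSCWFLJHTTHZ-UHFFFAOYSA-N", 1),            # ethanol
    ("XLYOFNOQVPJJNP-UHFFFAOYSA-N", "undefined"),  # water, amount unknown
]
print(substance_id(mixture))
```

Two descriptions of the same substance then yield the same ID regardless of moiety order, while a change in any moiety or amount yields a different one.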
SRS TBD
 Markush

 Polymers

 Proteins

 Inorganics

 Materials
OpenPHACTS
 Open PHACTS is a three-year Innovative Medicines Initiative
  (IMI) project

 To reduce the barriers to drug discovery in industry,
  academia and for small businesses

 To build an open platform, integrating chemistry and
  biology data from public domain resources

 Semantic web platform

 Open Standards, Open Data and Open Source
OpenPHACTS specifics
 Active/inactive ingredient

 Parent/child

 Sample/substance

 Misreferences (!!!)
ChemSpider Reactions
ChemSpider Reaction Challenges
 Deduplication

 Identification

 Deposition
Conclusions
 InChI is The Identifier

 InChI has its limitations

 InChI is work in progress

 InChI deficiencies can be hot-fixed
Acknowledgements
 RSC Cheminformatics group

 FDA SRS group

 OpenPHACTS consortium

 Software: InChI, GGA Software
Thank you

Email: tkachenkov@rsc.org
Blog: www.chemspider.com/blog
SLIDES:
http://www.slideshare.net/valerytkachenko16
