Accessing NCI/CADD Web Resources by InChI
Markus Sitzmann
Computer-Aided Drug Design Group, Chemical Biology Laboratory,
Frederick National Laboratory for Cancer Research, NIH, DHHS
http://cactus.nci.nih.gov
Chemical Identifier Resolver (CIR)

 CIR works as a resolver for different chemical structure identifiers or
 representations. It allows one to convert a given structure identifier
 into another representation or structure identifier.

 http://cactus.nci.nih.gov/chemical/structure
Chemical Structure Representations

  chemical structure:
    SMILES, SYBYL Line Notation, CAS Registry Number, chemical names, GIF image,
    ChemNavigator SID, SD File, CML, FDA UNII, NCI/CADD Identifiers, NSC number,
    MRV, InChI/InChIKey, PubChem SID/CID, ChemSpider ID, ChEBI ID,
    Chemical Formula, PDB Ligand ID
Chemical Structure Databases

                 InChI

                               many more …
Chemical Identifier Resolver (CIR)
 input (ChemWriter Editor): WPYMKLBDIGXBTP-FZOZFQFYNA-N
 http://cactus.nci.nih.gov/chemical/structure
 output: SD file

C7H6O2
APtclcactv03051222202D 0   0.00000     0.00000

 15 15  0  0  0  0  0  0  0  0999 V2000
    2.8660   -2.0600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -1.5600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -0.5600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -0.0600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000   -0.5600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000   -1.5600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660    0.9400    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321    1.4400    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000    1.4400    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -2.6800    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2690   -1.8700    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2690   -0.2500    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4631   -0.2500    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4631   -1.8700    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321    2.0600    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  2  0  0  0  0
  1  6  1  0  0  0  0
  4  7  1  0  0  0  0
  7  8  1  0  0  0  0
  7  9  2  0  0  0  0
  1 10  1  0  0  0  0
  2 11  1  0  0  0  0
  3 12  1  0  0  0  0
  5 13  1  0  0  0  0
  6 14  1  0  0  0  0
  8 15  1  0  0  0  0
M  END
$$$$
Chemical Identifier Resolver (CIR)
 input (ChemWriter Editor): WPYMKLBDIGXBTP-FZOZFQFYNA-N
 http://cactus.nci.nih.gov/chemical/structure
 output: names

   benzoic acid
   65-85-0
   WLN: QVR
   Unisept BZA
   AIDS018010
   Salvo liquid
   Benzoic acid-ring-UL-14C
   ST5213864
   Benzoesaeure
   CHEBI:30746
   NSC 149
   benzenecarboxylic acid
   phenylformic acid
   Benzoic acid (JP15/USP)
   Benzoic acid (TN)
   18102_RIEDEL
   Aromatic hydroxy acid
   Benzoic acid (7CI,8CI,9CI)
   Benzoic acid [USAN:JAN]
   W213128_ALDRICH
   47849_SUPELCO
   Acide benzoique [French]
   Acido benzoico [Italian]
   Benzoate (VAN)
   Benzoesaeure [German]
   Benzoic acid (natural)
   Acide benzoique
   Benzeneformic acid
   Benzenemethanoic acid
   Benzoesaeure GK
   Benzoesaeure GV
   Benzoic acid, tech.
   Carboxybenzene
   Kyselina benzoova
   Phenylcarboxylic acid
Chemical Identifier Resolver (CIR)

 input (ChemWriter Editor): WPYMKLBDIGXBTP-FZOZFQFYNA-N
 http://cactus.nci.nih.gov/chemical/structure

 InChIKey:  InChIKey=WPYMKLBDIGXBTP-UHFFFAOYSA-N
 InChI:     InChI=1S/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)
 SMILES:    C1=CC=C(C=C1)C(O)=O
Chemical Identifier Resolver (CIR)

programmatic URL API:

 http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"

if a request is not successful: HTTP 404 status message
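The URL scheme above can be wrapped in a small helper. A minimal Python 3 sketch follows; `cir_url` is a hypothetical name, not part of any CIR client library. It percent-encodes the identifier with `urllib.parse.quote` so that spaces or `#` (e.g. a SMILES triple bond) cannot break the URL, leaves `/` unencoded (Python's default) on the assumption that the resolver accepts InChI strings with literal slashes, and appends optional query parameters such as those used for `/image` or `/file`.

```python
from urllib.parse import quote, urlencode

# Base URL as given on the slides.
CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier: str, representation: str, **params) -> str:
    """Build <base>/"identifier"/"representation"[?key=value&...] (hypothetical helper)."""
    # Percent-encode spaces, '#', '?' etc.; '/' is kept as-is (quote's default).
    path = f"{CIR_BASE}/{quote(identifier.strip())}/{representation}"
    # Optional query parameters, e.g. height/width for the /image representation.
    return f"{path}?{urlencode(params)}" if params else path


print(cir_url("tamiflu", "cas"))
print(cir_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "image", height=200, width=200))
```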
Chemical Identifier Resolver (CIR)

examples:
 http://cactus.nci.nih.gov/chemical/structure/PGZUMBJQJWIWGJ-ONAKXNSWSA-N/cas
 204255-11-8                                   MIME type: text/plain

 http://cactus.nci.nih.gov/chemical/structure/PGZUMBJQJWIWGJ-ONAKXNSWSA-N/image
 [structure image]                             MIME type: image/gif
Chemical Identifier Resolver (CIR)

• access by programming libraries/languages (e.g. Python 3):
 from urllib.request import urlopen
 from urllib.error import HTTPError

 url = "http://cactus.nci.nih.gov/chemical/structure/tamiflu/cas"
 try:
     # urlopen raises HTTPError (e.g. 404) if CIR cannot resolve the identifier
     with urlopen(url) as resolver:
         response = resolver.read().decode("utf-8")
 except HTTPError as error:
     raise RuntimeError("CIR request failed: %s" % error)  # your own error handling
 print(response)
 204255-11-8

• access from Unix shell level (e.g., via wget):
 shell > wget -qO - http://cactus.nci.nih.gov/chemical/structure/tamiflu/cas
 204255-11-8
Chemical Identifier Resolver: InChI/InChIKey

  InChI/InChIKey ↔
  • structure images (GIF, PNG)
  • Database RegIDs (PubChem, ZINC, eMolecules, ChemSpider ID)
  • (trivial) names
  • IUPAC names (OPSIN)
  • SMILES
  • chemical properties (MW, formula, …)
  • CAS Registry numbers
  • structure files (sdf, pdb, cdx, …)
Chemical Identifier Resolver (CIR)

 "identifier"  →  CIR (http://cactus.nci.nih.gov/chemical/structure)  →  "representation"

 identifiers:      chemical names, IUPAC names (OPSIN), CAS numbers, SMILES strings,
                   IUPAC InChI/InChIKeys, NCI/CADD Identifiers, CACTVS HASHISY,
                   NSC number, PubChem SID, ZINC Code, ChemSpider ID,
                   ChemNavigator SID, eMolecules VID

 representations:  /smiles, /names, /iupac_name, /cas, /inchi, /stdinchi,
                   /inchikey, /stdinchikey, /ficts, /ficus, /uuuuu, /image,
                   /file, /sdf, /mw, /monoisotopic_mass, /formula, /twirl,
                   /urls, /chemspider_id, /pubchem_sid, /chemnavigator_sid
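Chemical Identifier Resolver (CIR)

 http request ("identifier")  →  CIR  →  http response ("representation", with MIME type)

 1. detection of the identifier type:
    • identifier is a full structure representation (e.g. SMILES, InChI):
      the structure is obtained directly from the identifier
    • identifier is a hashed structure representation (e.g. InChIKey),
      a chemical name etc.: the structure is obtained by database lookup (CSDB)
 2. the requested representation is either calculated from the structure
    (e.g. InChI, GIF image) or obtained by database lookup in the CSDB
    (e.g. CAS number, chemical name)
 3. http response with the corresponding MIME type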
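The first step, detecting the identifier type, might look roughly like the following Python 3 sketch. The regular expressions, the category names and `identifier_type` itself are illustrative assumptions, not CIR's actual detection rules, and SMILES and chemical names are deliberately not told apart. The CAS check digit, however, follows the standard scheme (digits weighted 1, 2, 3, … from the right, sum modulo 10); "65-85-0" (benzoic acid) and "204255-11-8" from the examples above both pass it.

```python
import re

# Illustrative patterns only (assumptions, not CIR's real detection rules).
INCHI = re.compile(r"^InChI=1S?/")
INCHIKEY = re.compile(r"^(InChIKey=)?[A-Z]{14}-[A-Z]{10}-[A-Z]$")
CAS = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_check_digit_ok(cas: str) -> bool:
    """Validate a CAS Registry Number via its check digit."""
    m = CAS.match(cas)
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    # Weight the digits 1, 2, 3, ... starting from the right; sum modulo 10.
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(m.group(3))


def identifier_type(identifier: str) -> str:
    """Rough classification into 'full structure' vs 'database lookup' inputs."""
    s = identifier.strip()
    if INCHI.match(s):
        return "InChI"                # full structure representation
    if INCHIKEY.match(s):
        return "InChIKey"             # hashed representation -> database lookup
    if cas_check_digit_ok(s):
        return "CAS Registry Number"  # database lookup
    return "other"                    # e.g. SMILES or a chemical name (not distinguished)


for s in ("InChI=1S/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)",
          "WPYMKLBDIGXBTP-UHFFFAOYSA-N", "65-85-0", "benzoic acid"):
    print(s, "->", identifier_type(s))
```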
Chemical Structure Database (CSDB)

• ChemNavigator iResearch Library
  compilation of commercially available screening
  compounds from ~300 international chemistry suppliers
• PubChem database
  including Open NCI database, EPA DSSTox databases, NIAID HIV database,
  NIST Webbook, NLM ChemIDplus, ChemSpider, …
• Commercial Sources / others
  Asinex, Comgenex, eMolecules, …

   pie chart: ChemNav. iResearch Lib. ~56%, PubChem ~38%, others ~6%

   current status (as of March 2010):
     140 chemical structure databases
     120 million structure records
     84.6 million unique structures by FICuS
     110 million Standard InChIKeys for lookup
Chemical Structure Database (Update 2012)

• PubChem Substance & Compound as separate databases
  (both updated to 2012)
• ChemNavigator iResearch Library: updated to 2012
• new databases, e.g.
  • Therapeutic Target Database (TTD)
  • Human Metabolome Database (HMDB)
  • DrugBank
• “pull” download of databases also available in PubChem, e.g.
  • DSSTox, ZINC 2012/01, ChEBI 2012/01, ChEMBL13,
    ChemIDplus 2012/01
• to a limited extent, “historic versions” of databases are archived,
  e.g. a comparison of PubChem Substance 2007 vs 2012 will be
  possible
Chemical Structure Database (CSDB)
Chemical Structure Normalization

• calculation of a set of parent structures with different
  sensitivity to chemical features:

   original structure record (Molfile, SDF, SMILES, ChemDraw cdx, PDB)
     → structure normalization
     → parent structure (SDF, SMILES, database)
     → hashcode calculation (E_HASHISY)
     → NCI/CADD Identifier (FICTS, FICuS, uuuuu)

   both the original structure record & the normalized parent structures
   are archived in the database
Chemical Structure Database (CSDB)
NCI/CADD Identifiers (FICTS, FICuS, uuuuu)

based on CACTVS hashcodes (HASHISY)

16-digit hexadecimal number (64-bit unsigned), e.g. histidine: 9850FD9F9E2B4E25

structure normalization:

   structure     FICTS                    FICuS                    uuuuu
   histidine     9850FD9F9E2B4E25-FICTS   9850FD9F9E2B4E25-FICuS   9850FD9F9E2B4E25-uuuuu
   tautomer      6C16DE2351F9FF50-FICTS   9850FD9F9E2B4E25-FICuS   9850FD9F9E2B4E25-uuuuu
   salt (Na+)    E5F83F10C5DB080A-FICTS   E5F83F10C5DB080A-FICuS   9850FD9F9E2B4E25-uuuuu
   R             E92E4BA2869F3611-FICTS   E92E4BA2869F3611-FICuS   9850FD9F9E2B4E25-uuuuu
   S             8A7AD1EB498CC76A-FICTS   8A7AD1EB498CC76A-FICuS   9850FD9F9E2B4E25-uuuuu
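The histidine example shows how the three identifiers differ in sensitivity: the less sensitive the identifier, the more records share the same value. The Python 3 sketch below re-uses the five identifier triples from this slide (the dictionary layout and the function names are illustrative, not part of CIR) to count distinct values per identifier type and to read the 16-digit hexadecimal hashcode as a 64-bit unsigned integer.

```python
# (FICTS, FICuS, uuuuu) identifiers copied from the histidine example above.
RECORDS = {
    "histidine": ("9850FD9F9E2B4E25-FICTS", "9850FD9F9E2B4E25-FICuS", "9850FD9F9E2B4E25-uuuuu"),
    "tautomer": ("6C16DE2351F9FF50-FICTS", "9850FD9F9E2B4E25-FICuS", "9850FD9F9E2B4E25-uuuuu"),
    "salt": ("E5F83F10C5DB080A-FICTS", "E5F83F10C5DB080A-FICuS", "9850FD9F9E2B4E25-uuuuu"),
    "R": ("E92E4BA2869F3611-FICTS", "E92E4BA2869F3611-FICuS", "9850FD9F9E2B4E25-uuuuu"),
    "S": ("8A7AD1EB498CC76A-FICTS", "8A7AD1EB498CC76A-FICuS", "9850FD9F9E2B4E25-uuuuu"),
}
LEVELS = ("FICTS", "FICuS", "uuuuu")


def unique_counts(records: dict) -> dict:
    """Number of distinct identifier values per identifier type."""
    return {level: len({ids[i] for ids in records.values()})
            for i, level in enumerate(LEVELS)}


def hashcode(identifier: str) -> int:
    """Integer value of the hexadecimal part, e.g. of '9850FD9F9E2B4E25-FICTS'."""
    hex_part = identifier.split("-")[0]
    if len(hex_part) != 16:  # 16 hex digits always fit into 64 unsigned bits
        raise ValueError(f"not a 16-digit hashcode: {identifier}")
    return int(hex_part, 16)


print(unique_counts(RECORDS))
```

Counting distinct values per level over a whole database is, in essence, what the unique-structure counts per identifier type express.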
Chemical Structure Database (Update 2012)

Unique structure count (based on CACTVS hashcodes, HASHISY):
  FICTS   ~118 million
  FICuS   ~115 million
  uuuuu   ~100 million

Chemical Structure Database (Update 2012)

231 small-molecule databases
367 database releases (full, incremental, “historic versions”)
324 million original database records
Chemical Structure Database (Update 2012)
InChI/InChIKey

InChI/InChIKey (Version 1.04) calculated with four InChI flag sets:

                 CACTVS
  Standard   :    Add H   Standard InChIKey

    Set 1    :    Add H   DONOTADDH W0 FIXEDH RECMET NEWPS SPXYZ SAsXYZ Fb Fnud KET 15T

    Set 2    :    Add H   DONOTADDH W0 FIXEDH RECMET NEWPS SPXYZ SAsXYZ Fb Fnud KET 15T

    Set 3    :    Add H   DONOTADDH W0 FIXEDH RECMET NEWPS SPXYZ SAsXYZ Fb Fnud KET 15T




Standard Set, Set 1 & Set 2: addition of hydrogen atoms by CACTVS
Set 3: addition of hydrogen atoms by the InChI library
Chemical Structure Database (Update 2012)
InChI/InChIKey

• calculation of InChI/InChIKey (Standard set, Set 1, Set 2 & Set 3)
  for all original structure records and normalized parent structures:

   original structure record
     → structure normalization
     → parent structure
     → hashcode calculation (E_HASHISY)
     → NCI/CADD Identifier (FICTS, FICuS, uuuuu)

   InChI/InChIKey: Standard, Set 1, Set 2, Set 3
Using CIR with InChI/InChIKey
(Partial) InChIKey Lookup

• resolve Standard InChIKey into full structure representation:
                                                                                  Ethanol
http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles

 CCO


http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA/smiles

 CCO
 CC[OH2+]


http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ/smiles

 C(C(O)([2H])[2H])[2H]
 CC(O)([2H])[2H]
 C(CO)([2H])([2H])[2H]
 CC[17OH]
 C(CO)[2H]
 [14CH3]CO
 CCO
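Partial lookup works because a Standard InChIKey is made of blocks: the first 14 characters hash the connectivity (skeleton) layer; the second block holds an 8-character hash of the remaining layers (stereo, isotopes, …) followed by a flag ("S" for standard, "N" for non-standard) and the version letter; the final character encodes protonation. Matching only the first one or two blocks therefore also returns isotopologues, stereoisomers or protonated forms, as in the ethanol example. A minimal Python 3 sketch of that prefix matching against a local list of keys (the helper names are hypothetical; the keys are the ones used in this presentation):

```python
def inchikey_blocks(key: str) -> list[str]:
    """Split an (optionally 'InChIKey='-prefixed) InChIKey into its hyphen-separated blocks."""
    return key.strip().removeprefix("InChIKey=").split("-")


def partial_lookup(query: str, keys: list[str]) -> list[str]:
    """Return all keys whose leading blocks equal the blocks given in the query."""
    q = inchikey_blocks(query)
    return [k for k in keys if inchikey_blocks(k)[: len(q)] == q]


# Keys from this presentation: ethanol, aspirin, benzoic acid (standard and
# the non-standard key used as input on the CIR slides).
KEYS = [
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
    "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "WPYMKLBDIGXBTP-UHFFFAOYSA-N",
    "WPYMKLBDIGXBTP-FZOZFQFYNA-N",
]

print(partial_lookup("WPYMKLBDIGXBTP", KEYS))             # first block only
print(partial_lookup("WPYMKLBDIGXBTP-UHFFFAOYSA", KEYS))  # first two blocks
```

`str.removeprefix` requires Python 3.9 or later.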
Using CIR with InChI/InChIKey
Chemical File Representation

• available file format representations:
                                                       Aspirin
http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/file?format=sdf

  alc          Alchemy format
  cdxml        CambridgeSoft ChemDraw XML format
  cerius       MSI Cerius II format
  charmm       Chemistry at HARvard Macromolecular Mechanics file format
  cif          Crystallographic Information File
  cml          Chemical Markup Language
  gjf          Gaussian input data file
  gromacs      GROMACS file format
  hyperchem    HyperChem file format
  jme          Java Molecule Editor format
  maestro      Schroedinger MacroModel structure file format
  mol          Symyx molecule file
  mrv          ChemAxon MRV format
  pdb          Protein Data Bank
  sdf          Symyx Structure Data Format
  sdf3000      Symyx Structure Data Format 3000
  sln          SYBYL Line Notation
  smiles       SMILES
  sybyl2/mol2  Tripos Sybyl MOL2 format
  xyz          xyz file format
Using CIR with InChI/InChIKey
Chemical Structure Images (GIF, PNG)

 Buckyball:
 http://cactus.nci.nih.gov/chemical/structure/XMWRBQBLMFGWIX-UHFFFAOYSA-N/image?height=300&width=300&bgcolor=black&bondcolor=white

 Aspirin:
 http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/image?height=200&width=200&symbolfontsize=7&footer="Aspirin"
Using CIR with InChI/InChIKey
3D Chemical Structure Visualization (TwirlyMol)




simple JavaScript that allows you to render a rotatable/zoomable
3D representation of a molecule in your web browser

implemented by Noel O'Boyle (University College Cork, Ireland)

no plugin is needed, only a modern browser:
Chrome Safari FF3.6+ IE9 IE8 IE7 IE6
Using CIR with InChI/InChIKey
3D Chemical Structure Visualization (TwirlyMol)




simple viewer:                                            Restasis
http://cactus.nci.nih.gov/chemical/structure/DDPJWUQJQMKQIF-XPNZOOHZSA-N/twirl


embedded into a web page:

 <div id="canvas" height="400" width="400"></div>
 <script src="http://cactus.nci.nih.gov/chemical/structure/DDPJWUQJQMKQIF-XPNZOOHZSA-N/twirl_cached/canvas"></script>
Using CIR with InChI/InChIKey
3D Chemical Structure Visualization (TwirlyMol)

    http://baoilleach.blogspot.com/
    http://www.coronene.com/blog/
    http://chemical-quantum-images.blogspot.com
Using CIR with InChI/InChIKey
Chemical Database URLs

• request database URLs:
                                                                   Restasis
http://cactus.nci.nih.gov/chemical/structure/DDPJWUQJQMKQIF-XPNZOOHZSA-N/urls/xml

 <?xml version="1.0" encoding="UTF-8" ?>
 <request string="DDPJWUQJQMKQIF-XPNZOOHZSA-N" representation="urls">
    <data id="1" resolver="stdinchikey" string_class="Standard InChIKey">
       <item id="1" classification="exact" database="ChemSpider" publisher="ChemSpider">
              http://chemspider.com/structure.4939506
       </item>
       <item id="2" classification="exact" database="ChemSpider" publisher="PubChem">
              http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=43028058
       </item>
       <item id="3" classification="exact" database="NLM ChemIDplus" publisher="NLM">
              http://chem.sis.nlm.nih.gov/chemidplus/direct.jsp?result=advanced&regno=059865133
       […]
    </data>
 </request>
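The /xml variants wrap the results in a small XML document, so they are easy to post-process. A Python 3 sketch using the standard library's `xml.etree.ElementTree` that extracts (database, publisher, URL) triples; `database_urls` is a hypothetical name, and the sample is a shortened, well-formed excerpt of the response above.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the /urls/xml response shown above.
SAMPLE = """<request string="DDPJWUQJQMKQIF-XPNZOOHZSA-N" representation="urls">
  <data id="1" resolver="stdinchikey" string_class="Standard InChIKey">
    <item id="1" classification="exact" database="ChemSpider" publisher="ChemSpider">
      http://chemspider.com/structure.4939506
    </item>
    <item id="2" classification="exact" database="ChemSpider" publisher="PubChem">
      http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=43028058
    </item>
  </data>
</request>"""


def database_urls(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (database, publisher, url) for every <item> in a /urls/xml response."""
    root = ET.fromstring(xml_text)
    return [
        (item.get("database"), item.get("publisher"), (item.text or "").strip())
        for item in root.iter("item")  # iter() also finds items nested in <data>
    ]


for database, publisher, url in database_urls(SAMPLE):
    print(f"{database} ({publisher}): {url}")
```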
Using CIR with InChI/InChIKey
Chemical Name Lookup

• request (alternative) names:
                                                                   Aspirin
http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/names/xml

 <?xml version="1.0" encoding="UTF-8" ?>
 <request string="BSYNRYMUTXBXSQ-UHFFFAOYSA-N" representation="names">
    <data id="1" resolver="stdinchikey" string_class="Standard InChIKey">
       <item id="1" classification="pubchem_iupac_name">2-acetyloxybenzoic acid</item>
       <item id="2" classification="pubchem_iupac_openeye_name">2-Acetoxybenzoic acid</item>
       <item id="3" classification="pubchem_generic_registry_name">50-78-2</item>
       <item id="4" classification="pubchem_generic_registry_name">11126-35-5</item>
       <item id="5" classification="pubchem_generic_registry_name">11126-37-7</item>
       <item id="6" classification="pubchem_generic_registry_name">2349-94-2</item>
       <item id="7" classification="pubchem_generic_registry_name">26914-13-6</item>
       <item id="8" classification="pubchem_substance_synonym">NCGC00090977-04</item>
       <item id="9" classification="pubchem_substance_synonym">KBioSS_002272</item>
       <item id="10" classification="pubchem_substance_synonym">SBB015069</item>
       <item id="11" classification="pubchem_substance_synonym">Aspirin</item>
       <item id="12" classification="pubchem_substance_synonym">D00109</item>
 […]
Using CIR with InChI/InChIKey
Chemical Properties

• request molecular weight:
                                                          Aspirin
http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/weight

 180.1598                                                               MIME type: text/plain

  /mw                          molecular weight
  /formula                     formula
  /monoisotopic_mass           monoisotopic mass
  /h_bond_donor_count          H bond donor count
  /h_bond_acceptor_count       H bond acceptor count
  /h_bond_center_count         H bond center count
  /rotor_count                 number of rotatable bonds
  /effective_rotor_count       number of effectively rotatable bonds
  /rule_of_5_violation_count   number of Rule-of-5 violations
  /xlogp2                      octanol−water partition coefficient XLOGP2
  /aromatic                    compound is aromatic
  /macrocyclic                 compound is macrocyclic
  /heteroatom_count            heteroatom count
  /hydrogen_atom_count         H atom count
  /heavy_atom_count            heavy atom count
  /deprotonable_group_count    number of deprotonable groups
  /protonable_group_count      number of protonable groups
  /ring_count                  number of rings
  /ringsys_count               number of ring systems
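Each property in the list above is its own text/plain representation, so several properties mean several requests, and an unresolvable request comes back as HTTP 404. A Python 3 sketch under those assumptions: `get_properties` and `http_get` are hypothetical helpers, a 404 (or any other HTTP error) is recorded as `None` instead of aborting the batch, and the `fetch` parameter lets the function be exercised without network access.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def http_get(url: str) -> str:
    """Fetch a text/plain CIR response (raises HTTPError, e.g. 404, on failure)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8").strip()


def get_properties(identifier: str, properties: list[str], fetch=http_get) -> dict:
    """Request several property representations; None marks a failed request."""
    results = {}
    for prop in properties:
        url = f"{CIR_BASE}/{quote(identifier)}/{prop}"
        try:
            results[prop] = fetch(url)
        except HTTPError:  # CIR answers unsuccessful requests with an HTTP error status
            results[prop] = None
    return results


# Requires network access:
# get_properties("BSYNRYMUTXBXSQ-UHFFFAOYSA-N", ["mw", "formula", "h_bond_donor_count"])
```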
Using CIR with InChI/InChIKey
Chemical Name Pattern Search

• Google-like searches on CIR’s name index (approx. 70 million names)


 example: all chemical names that contain the words “morphine” and “methyl”
 (name pattern: ‘+morphine +methyl’):

 http://cactus.nci.nih.gov/chemical/structure/+morphine +methyl/stdinchikey/xml?resolver=name_pattern




                                                  based on the open source
                                                  full text search server Sphinx
                                                  (http://sphinxsearch.com)
Search name pattern ‘+morphine +methyl’: 7 matching names
<request string="+morphine +methyl" representation="stdinchikey">
   <data id="1" resolver="name_pattern" notation="Morphine 3-methyl ether">
      <item id="1">InChIKey=OROGSEYTTFOCAN-DNJOTXNNSA-N</item>
   </data>
   <data id="2" resolver="name_pattern" notation="6-Methyl-delta(sup 6)-deoxy-morphine">
      <item id="1">InChIKey=CUFWYVOFDYVCPM-GGNLRSJOSA-N</item>
   </data>
   <data id="3" resolver="name_pattern" notation="Morphine, dihydro-6-methyl-">
      <item id="1">InChIKey=NBKVWIJQJMEQLE-NGTWOADLSA-N</item>
   </data>
   <data id="4" resolver="name_pattern" notation="6-METHYL-MORPHINE ETHER">
      <item id="1">InChIKey=FNAHUZTWOVOCTL-UHFFFAOYSA-N</item>
   </data>
   <data id="5" resolver="name_pattern" notation="Morphine alcoholic methyl ether">
      <item id="1">InChIKey=FNAHUZTWOVOCTL-XSSYPUMDSA-N</item>
   </data>
   <data id="6" resolver="name_pattern" notation="N-Methyl morphine chloride">
      <item id="1">InChIKey=MJNCZWBHCFTYFU-SCLAZZCHSA-N</item>
   </data>
   <data id="7" resolver="name_pattern" notation="Morphine, 7-hydroxy-6,6-dimethoxy-3-O-methyl-">
      <item id="1">InChIKey=URFKRBIESURBKC-UHFFFAOYSA-N</item>
   </data>
</request>
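A name-pattern search returns one `<data>` element per matching name, with the name in the `notation` attribute and the Standard InChIKey in the nested `<item>`. The Python 3 sketch below (hypothetical helper names) builds the request URL, percent-encoding the spaces and `+` signs of the pattern on the assumption that the server decodes them back, and turns a response into (name, InChIKey) pairs; the sample is an excerpt of the result above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def name_pattern_url(pattern: str) -> str:
    """URL for a name-pattern search returning Standard InChIKeys as XML."""
    # quote() encodes ' ' as %20 and '+' as %2B, so '+' is not misread as a space.
    return f"{CIR_BASE}/{quote(pattern)}/stdinchikey/xml?resolver=name_pattern"


def name_pattern_hits(xml_text: str) -> list[tuple[str, str]]:
    """(matched name, InChIKey) pairs from a name-pattern XML response."""
    root = ET.fromstring(xml_text)
    return [
        (data.get("notation"), (item.text or "").strip().removeprefix("InChIKey="))
        for data in root.iter("data")
        for item in data.iter("item")
    ]


# Excerpt of the '+morphine +methyl' response shown above.
SAMPLE = """<request string="+morphine +methyl" representation="stdinchikey">
   <data id="1" resolver="name_pattern" notation="Morphine 3-methyl ether">
      <item id="1">InChIKey=OROGSEYTTFOCAN-DNJOTXNNSA-N</item>
   </data>
   <data id="6" resolver="name_pattern" notation="N-Methyl morphine chloride">
      <item id="1">InChIKey=MJNCZWBHCFTYFU-SCLAZZCHSA-N</item>
   </data>
</request>"""

print(name_pattern_url("+morphine +methyl"))
print(name_pattern_hits(SAMPLE))
```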
Using CIR with InChI/InChIKey
Chemical Name Pattern Search

example: chemical names that contain the words “morphine” and “methyl”
but not “hydroxyl” (name pattern: ‘+morphine +methyl -hydroxyl’):
http://cactus.nci.nih.gov/chemical/structure/+morphine +methyl -hydroxyl/stdinchikey/xml?resolver=name_pattern
                                                                                    6 matching names

example: chemical names that contain the substring “morphine”
somewhere in the name (name pattern: ‘*morphine*’):
http://cactus.nci.nih.gov/chemical/structure/*morphine*/stdinchikey/xml?resolver=name_pattern
                                                                                   45 matching names

example: chemical names that contain a single character “m” and the word
“benzene” within a distance of at most 3 words (finds meta-substituted aromatic
compounds; name pattern: ‘“m benzene”~3’):
http://cactus.nci.nih.gov/chemical/structure/(m benzene)~3/stdinchikey/xml?resolver=name_pattern
                                                                                   22 matching names
Structure Normalization
    (Tautomerism)
Structure Normalization
Tautomerism

21 SMIRKS transform rules:

  rule 1: 1,3 (thio)keto/(thio)enol               rule 12: furanones
  rule 2: 1,5 (thio)keto/(thio)enol               rule 13: ketene/ynol exchange
  rule 3: simple (aliphatic) imine                rule 14: ionic nitro/aci-nitro
  rule 4: special imine                           rule 15: pentavalent nitro/aci-nitro
  rule 5: 1,3 aromatic heteroatom H shift         rule 16: oxime/nitroso
  rule 6: 1,3 heteroatom H shift                  rule 17: oxime/nitroso via phenol
  rule 7: 1,5 (aromatic) heteroatom H shift (1)   rule 18: cyanic/isocyanic acids
  rule 8: 1,5 aromatic heteroatom H shift (2)     rule 19: formamidinesulfinic acids
  rule 9: 1,7 (aromatic) heteroatom H shift       rule 20: isocyanides
  rule 10: 1,9 (aromatic) heteroatom H shift      rule 21: phosphonic acids
  rule 11: 1,11 (aromatic) heteroatom H shift
Structure Normalization
Tautomerism
rule 1: 1,3 (thio)keto/(thio)enol

Figure: 1,3 keto/enol shift; hydrogen 4 moves from carbon 3 to oxygen 1, and the double bond moves from 1=2 to 2=3 (atom numbers as in the SMIRKS below)

  [O,S,Se,Te;X1:1]=[C;z{1-2}:2][CX4R{0-2}:3][#1:4]>>
  [#1:4][O,S,Se,Te;X2:1][#6;z{1-2}:2]=[C,cz{0-1}R{0-1}:3]


rule 6: 1,3 heteroatom H shift

Figure: 1,3 heteroatom H shift; hydrogen 4 moves from heteroatom 3 to heteroatom 1, and the double bond moves from 1=2 to 2=3 (atom numbers as in the SMIRKS below)

  [N,n,S,s,O,o,Se,Te:1]=[NX2,nX2,C,c,P,p:2][N,n,S,O,Se,Te:3][#1:4]>>
  [#1:4][N,n,S,O,Se,Te:1][NX2,nX2,C,c,P,p:2]=[N,n,S,s,O,o,Se,Te:3]
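When writing or editing transform rules like these two, a cheap sanity check is that every atom-map number on the reactant side of `>>` reappears exactly once on the product side; otherwise the transform would create or delete mapped atoms. The sketch below checks only that property with a regular expression over the `:n]` map labels. It does not parse SMARTS (the `z{…}` and `R{…}` terms are CACTVS extensions that a generic SMARTS parser may reject), and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import List

# An atom-map label is ":<digits>" immediately before the closing bracket of an atom.
MAP_LABEL = re.compile(r":(\d+)\]")


def atom_maps(smarts: str) -> List[int]:
    """Return the atom-map numbers of a SMARTS/SMIRKS fragment, in order of appearance."""
    return [int(n) for n in MAP_LABEL.findall(smarts)]


def maps_consistent(smirks: str) -> bool:
    """True if both sides carry the same atom maps, each used exactly once per side."""
    # Drop whitespace so a rule split over two lines can be pasted directly.
    reactant, product = "".join(smirks.split()).split(">>")
    left, right = Counter(atom_maps(reactant)), Counter(atom_maps(product))
    return left == right and all(count == 1 for count in left.values())


RULE_1 = ("[O,S,Se,Te;X1:1]=[C;z{1-2}:2][CX4R{0-2}:3][#1:4]>>"
          "[#1:4][O,S,Se,Te;X2:1][#6;z{1-2}:2]=[C,cz{0-1}R{0-1}:3]")
RULE_6 = ("[N,n,S,s,O,o,Se,Te:1]=[NX2,nX2,C,c,P,p:2][N,n,S,O,Se,Te:3][#1:4]>>"
          "[#1:4][N,n,S,O,Se,Te:1][NX2,nX2,C,c,P,p:2]=[N,n,S,s,O,o,Se,Te:3]")

if __name__ == "__main__":
    print(maps_consistent(RULE_1), maps_consistent(RULE_6))
```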
Structure Normalization
Warfarin - Tautomers

Figure: nine prototropic tautomers of warfarin, arranged in three rows of three
Structure Normalization
Warfarin - Tautomers

Figure: the same nine prototropic tautomers of warfarin, as returned by the tautomers operator:
http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/representation
Structure Normalization
Warfarin – FICuS Identifier

Figure: all nine prototropic tautomers of warfarin receive the same FICuS identifier, D76B88C0354759F1-FICuS
http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/ficus
Structure Normalization
Warfarin – FICuS Identifier

Figure: the nine prototropic tautomers (columns 1 to 3) share D76B88C0354759F1-FICuS; the three ring-chain tautomers added in column 4 receive different identifiers: 09BB2FAADA1508A7-FICuS (rows 1 and 2) and 2F505A3FCA434B3C-FICuS (row 3)
http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/ficus
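To compare identifiers across a tautomer set programmatically, the response of the `/ficus` URL can be tallied per identifier. The sketch assumes (this is not confirmed by the slides) that the plain-text response lists one identifier per line; the helper names are hypothetical, and the sample data are the twelve FICuS values shown on this slide.

```python
from collections import Counter
from urllib.request import urlopen


def fetch_lines(url: str) -> list:
    """Download a plain-text CIR response and split it into non-empty lines (needs network)."""
    with urlopen(url) as response:
        text = response.read().decode("utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def tally(identifiers) -> Counter:
    """Count how many structures share each identifier."""
    return Counter(identifiers)


# FICuS identifiers of the twelve warfarin structures shown on the slide
SLIDE_FICUS = (["D76B88C0354759F1-FICuS"] * 9
               + ["09BB2FAADA1508A7-FICuS"] * 2
               + ["2F505A3FCA434B3C-FICuS"])

if __name__ == "__main__":
    # Offline: tally the slide's values. Online (assumed response format):
    # tally(fetch_lines("http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/ficus"))
    for identifier, count in tally(SLIDE_FICUS).most_common():
        print(identifier, count)
```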
Structure Normalization
Warfarin – Standard InChIKey

Standard InChIKeys of the warfarin tautomers (columns 1 to 3: prototropic tautomers; column 4: ring-chain tautomers):

         column 1                      column 2                      column 3                      column 4
  row 1  QTXVAVXCBMYBJW-UHFFFAOYSA-N   VWSXIGYSLWNCBN-VAWYXSNFSA-N   GRAAPKVUSREWIL-UHFFFAOYSA-N   LSCYDZJASSKSMJ-UHFFFAOYSA-N
  row 2  FQEPJUOLUDFINX-UHFFFAOYSA-N   UCKRWKACBKRIKB-VAWYXSNFSA-N   NNLYDNMZCAHUOV-UHFFFAOYSA-N   XGIOTBZTMHLTRL-UHFFFAOYSA-N
  row 3  PJVWKTKQMONHTI-UHFFFAOYSA-N   FVSFCRPKSVCTBA-VAWYXSNFSA-N   BBOSKMPTDUUMKL-UHFFFAOYSA-N   QUJJIKXCACZKKD-UHFFFAOYSA-N

All twelve structures get different Standard InChIKeys.
http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/stdinchikey
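An InChIKey has a fixed layout: a 14-character hash of the connectivity layer, a hyphen, an 8-character hash of the remaining layers (stereochemistry, isotopes, …), a flag character (`S` for standard, `N` for non-standard), a version character (`A` for InChI version 1), a hyphen, and a protonation indicator (`N` for no net proton change). On this slide, the three keys in the second column differ from the others in their second block (`VAWYXSNF` instead of `UHFFFAOY`). The parser below is a sketch of that layout; the names are hypothetical.

```python
import re
from typing import NamedTuple

# Optional "InChIKey=" prefix, then 14 + 8 letters, flag, version, protonation.
INCHIKEY_RE = re.compile(r"^(?:InChIKey=)?([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


class InChIKeyParts(NamedTuple):
    connectivity: str  # 14-character hash of the connectivity (skeleton) layer
    other_layers: str  # 8-character hash of the remaining layers
    standard: bool     # flag 'S' = Standard InChIKey, 'N' = non-standard
    version: str       # 'A' = InChI version 1
    protonation: str   # 'N' = no net protonation/deprotonation


def parse_inchikey(key: str) -> InChIKeyParts:
    """Split an InChIKey into its blocks; raise ValueError if the layout does not match."""
    match = INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not an InChIKey: {key!r}")
    connectivity, other, flag, version, protonation = match.groups()
    return InChIKeyParts(connectivity, other, flag == "S", version, protonation)


if __name__ == "__main__":
    for key in ("QTXVAVXCBMYBJW-UHFFFAOYSA-N", "VWSXIGYSLWNCBN-VAWYXSNFSA-N",
                "SAYISSDYYDIVTP-UHFFFAOYNA-N"):
        print(parse_inchikey(key))
```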
Structure Normalization
Warfarin –                    InChIKey

Non-standard InChIKeys of the same twelve structures, computed with the InChI options W0 RECMET NEWPS SPXYZ SAsXYZ Fb Fnud KET 15T:

         column 1                      column 2                      column 3                      column 4
  row 1  SAYISSDYYDIVTP-UHFFFAOYNA-N   SAYISSDYYDIVTP-UHFFFAOYNA-N   PMOPDASZKFXBOL-UHFFFAOYNA-N   LSCYDZJASSKSMJ-UHFFFAOYNA-N
  row 2  SAYISSDYYDIVTP-UHFFFAOYNA-N   SAYISSDYYDIVTP-UHFFFAOYNA-N   PMOPDASZKFXBOL-UHFFFAOYNA-N   FQOKLKCGRHFANU-UHFFFAOYNA-N
  row 3  SAYISSDYYDIVTP-UHFFFAOYNA-N   SAYISSDYYDIVTP-UHFFFAOYNA-N   PMOPDASZKFXBOL-UHFFFAOYNA-N   FQOKLKCGRHFANU-UHFFFAOYNA-N

The twelve structures collapse to four distinct InChIKeys: columns 1 and 2 share one key, column 3 gets a second, and the ring-chain tautomers in column 4 split into two.
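The effect of the tautomer options can be quantified by grouping the keys by their first (connectivity) block. The sketch below does this for the twelve Standard and twelve non-standard InChIKeys listed on the two warfarin slides; `group_by_connectivity` is a hypothetical helper. For this data set, the Standard InChIKeys form twelve groups and the non-standard ones four.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_connectivity(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Group InChIKeys by their first block (the 14-character connectivity hash)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        connectivity = key.removeprefix("InChIKey=").split("-")[0]
        groups[connectivity].append(key)
    return dict(groups)


# Row by row, columns 1 to 4, as shown on the slides
STANDARD = [
    "QTXVAVXCBMYBJW-UHFFFAOYSA-N", "VWSXIGYSLWNCBN-VAWYXSNFSA-N",
    "GRAAPKVUSREWIL-UHFFFAOYSA-N", "LSCYDZJASSKSMJ-UHFFFAOYSA-N",
    "FQEPJUOLUDFINX-UHFFFAOYSA-N", "UCKRWKACBKRIKB-VAWYXSNFSA-N",
    "NNLYDNMZCAHUOV-UHFFFAOYSA-N", "XGIOTBZTMHLTRL-UHFFFAOYSA-N",
    "PJVWKTKQMONHTI-UHFFFAOYSA-N", "FVSFCRPKSVCTBA-VAWYXSNFSA-N",
    "BBOSKMPTDUUMKL-UHFFFAOYSA-N", "QUJJIKXCACZKKD-UHFFFAOYSA-N",
]
NON_STANDARD = [
    "SAYISSDYYDIVTP-UHFFFAOYNA-N", "SAYISSDYYDIVTP-UHFFFAOYNA-N",
    "PMOPDASZKFXBOL-UHFFFAOYNA-N", "LSCYDZJASSKSMJ-UHFFFAOYNA-N",
    "SAYISSDYYDIVTP-UHFFFAOYNA-N", "SAYISSDYYDIVTP-UHFFFAOYNA-N",
    "PMOPDASZKFXBOL-UHFFFAOYNA-N", "FQOKLKCGRHFANU-UHFFFAOYNA-N",
    "SAYISSDYYDIVTP-UHFFFAOYNA-N", "SAYISSDYYDIVTP-UHFFFAOYNA-N",
    "PMOPDASZKFXBOL-UHFFFAOYNA-N", "FQOKLKCGRHFANU-UHFFFAOYNA-N",
]

if __name__ == "__main__":
    for label, keys in (("standard", STANDARD), ("non-standard", NON_STANDARD)):
        groups = group_by_connectivity(keys)
        print(label, len(groups), {k: len(v) for k, v in groups.items()})
```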
Structure Normalization
Warfarin

• “normalize” a Standard InChIKey according to NCI/CADD’s business rules:
http://cactus.nci.nih.gov/chemical/structure/normalize:QTXVAVXCBMYBJW-UHFFFAOYSA-N/stdinchikey

 response (MIME type text/plain): InChIKey=FQEPJUOLUDFINX-UHFFFAOYSA-N

Figure: input tautomer QTXVAVXCBMYBJW-UHFFFAOYSA-N and the normalized tautomer FQEPJUOLUDFINX-UHFFFAOYSA-N
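Because the normalize response is plain text of the form `InChIKey=…`, a script only has to strip the prefix. A minimal sketch, assuming the response is a single line exactly as shown on the slide; `normalize_url`, `parse_plain_inchikey` and `normalized_key` are hypothetical names, and only the URL building and parsing steps run without network access.

```python
from urllib.parse import quote
from urllib.request import urlopen

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def normalize_url(stdinchikey: str) -> str:
    """URL asking CIR for the Standard InChIKey of the normalized structure."""
    return f"{CIR_BASE}/normalize:{quote(stdinchikey, safe='=')}/stdinchikey"


def parse_plain_inchikey(text: str) -> str:
    """Strip whitespace and the 'InChIKey=' prefix from a text/plain response."""
    return text.strip().removeprefix("InChIKey=")


def normalized_key(stdinchikey: str) -> str:
    """Fetch and parse the normalized key (requires network access)."""
    with urlopen(normalize_url(stdinchikey)) as response:
        return parse_plain_inchikey(response.read().decode("utf-8"))
```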
Structure Normalization
Chemical Operators

• available operators:

 add_hydrogens, remove_hydrogens, normalize, ficts, ficus, uuuuu,
 scaffold_sequence, nostereo, stereoisomers, tautomers

example:
http://cactus.nci.nih.gov/chemical/structure/scaffold_sequence:FQEPJUOLUDFINX-UHFFFAOYSA-N/stdinchikey

Figure: the returned scaffold sequence: XVYBSGQBRUYLNK-UHFFFAOYSA-N, BQLSCAPEANVCOG-UHFFFAOYSA-N, MERGMNQXULKBCH-UHFFFAOYSA-N

                      Schuffenhauer et al., J. Chem. Inf. Model. 2007, 47, 47-58
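All operators use the same `operator:identifier` syntax in the identifier segment of the URL. The sketch below builds such URLs and rejects operator names that are not in the list on this slide, which catches typos before a request is sent. The validation list is copied from the slide and may not match what the service currently supports; `operator_url` is a hypothetical helper, and it applies exactly one operator because the slide does not say whether operators can be nested.

```python
from urllib.parse import quote

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"

# Operator names as listed on the slide
OPERATORS = frozenset({
    "add_hydrogens", "remove_hydrogens", "normalize", "ficts", "ficus", "uuuuu",
    "scaffold_sequence", "nostereo", "stereoisomers", "tautomers",
})


def operator_url(operator: str, identifier: str, representation: str) -> str:
    """Build {base}/{operator}:{identifier}/{representation}, validating the operator."""
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator: {operator!r}")
    return f"{CIR_BASE}/{operator}:{quote(identifier, safe='=')}/{representation}"


if __name__ == "__main__":
    print(operator_url("scaffold_sequence", "FQEPJUOLUDFINX-UHFFFAOYSA-N", "stdinchikey"))
```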
Soon: Chemical File Resolver (CFR)
Chemical File Resolver (CFR)

 chemical file → HTTP Post → CFR → HTTP Get → chemical file

• allows conversion of many chemical file formats into other
  formats or representations
• will have a programmatic URL API and an HTML Web interface
• makes every element of the original file addressable by URL, i.e. provides
  access to each record, field, and any metadata (size, record count, etc.)
  of the posted file
• release: Q2/2012 (hopefully)
Chemical File Resolver (CFR)

• HTTP: post a file (e.g. with curl); CFR replies with an MD5 hash key:

 curl -F upload=@/your/local/file.sdf http://cactus.nci.nih.gov/chemical/file
 >d85b396ed6ced6348a5b402eb8fcfe8b

• accepted formats:
  • chemical file formats: alc, cdxml, cerius, charmm, cif, cml, jme,
    maestro, mol, mol2, mrv, pdb, sdf, sdf3000, sln, smiles, xyz, …
  • text files with a list of identifiers …
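`curl -F upload=@file` sends the file as a `multipart/form-data` request with a form field named `upload`. The sketch below reproduces that request with the Python standard library. It assumes the service expects exactly this field name (taken from the curl command) and answers with the hash key as plain text; `build_upload` and `post_file` are hypothetical names, and only the body construction runs offline.

```python
import uuid
from pathlib import Path
from typing import Optional, Tuple
from urllib.request import Request, urlopen


def build_upload(filename: str, content: bytes, field: str = "upload",
                 boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body, like `curl -F field=@file`."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def post_file(path: str, url: str = "http://cactus.nci.nih.gov/chemical/file") -> str:
    """POST a local file and return the service's reply (the hash key); needs network."""
    body, content_type = build_upload(Path(path).name, Path(path).read_bytes())
    request = Request(url, data=body, headers={"Content-Type": content_type}, method="POST")
    with urlopen(request) as response:
        return response.read().decode("utf-8").strip()
```

A random boundary is used by default so that it is very unlikely to occur inside the uploaded file; a fixed boundary is only useful for testing.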
Chemical File Resolver (CFR)

• post a plain text file with a list of identifiers, e.g.:

  ethanol
  aspirin
  InChI=1S/C4H10O/c1-3-5-4-2/h3-4H2,1-2H3
  CCOCC
  InChIKey=RCINICONZNJXQF-MZXODVADSA-N
  InChIKey=QTXVAVXCBMYBJW-UHFFFAOYSA-N
  204255-11-8
  tautomers:guanine
  ChemSpider_ID=1234
  Pubchem_SID=456

• after posting a file, CFR replies with an MD5 hash key:

 curl -F upload=@/your/local/file.sdf http://cactus.nci.nih.gov/TEST/chemical/file
 >d85b396ed6ced6348a5b402eb8fcfe8b
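Such a list mixes identifier types freely. Before posting, a client might want to pre-check the lines, for example to catch mistyped CAS Registry Numbers, whose last digit is a checksum: the other digits, each multiplied by its position counted from the right, are summed modulo 10. The sketch below is a hypothetical client-side classifier, not CFR's own logic; its categories and the operator list (taken from the Chemical Operators slide) are assumptions, and anything it does not recognize is left as a chemical name or SMILES for the server to resolve.

```python
import re

CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")
DB_ID_RE = re.compile(r"^[A-Za-z]+_(?:ID|SID|CID)=\d+$")  # e.g. ChemSpider_ID=1234
OPERATORS = {"add_hydrogens", "remove_hydrogens", "normalize", "ficts", "ficus",
             "uuuuu", "scaffold_sequence", "nostereo", "stereoisomers", "tautomers"}


def cas_checksum_ok(cas: str) -> bool:
    """Validate the CAS check digit: sum(position * digit) mod 10, positions from the right."""
    match = CAS_RE.match(cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    total = sum(pos * int(d) for pos, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def classify(line: str) -> str:
    """Rough, hypothetical guess at the identifier type of one line."""
    line = line.strip()
    if line.startswith("InChI="):
        return "InChI"
    if line.startswith("InChIKey="):
        return "InChIKey"
    if CAS_RE.match(line):
        return "CAS Registry Number" if cas_checksum_ok(line) else "CAS-like (bad check digit)"
    if DB_ID_RE.match(line):
        return "database identifier"
    if ":" in line and line.split(":", 1)[0] in OPERATORS:
        return "operator expression"
    return "name or SMILES"


if __name__ == "__main__":
    for item in ("ethanol", "CCOCC", "204255-11-8", "tautomers:guanine", "Pubchem_SID=456"):
        print(item, "->", classify(item))
```

The database-identifier pattern is deliberately narrow (a name ending in `_ID`, `_SID` or `_CID`) so that SMILES strings containing `=`, such as `C=C`, are not mistaken for `key=value` identifiers.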
Chemical File Resolver (CFR)

• request the file in a new format using the MD5 hash key obtained on upload
  ({key} = d85b396ed6ced6348a5b402eb8fcfe8b):

 curl http://cactus.nci.nih.gov/TEST/chemical/file/{key}?format={sdf, smi, pdb, cml, …}
Chemical File Resolver (CFR)

• request records 2 and 5 as SMILES strings ({key} = d85b396ed6ced6348a5b402eb8fcfe8b);
  the URL is quoted because the shell would otherwise treat & as a command separator:

 curl "http://cactus.nci.nih.gov/TEST/chemical/file/{key}?records=2,5&format=smiles"
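In a shell, the `&` in such a URL must be quoted, or it ends the curl command. Building the query string in code avoids that class of error. A minimal sketch, assuming the `/TEST/` base URL and the `format` and `records` parameter names exactly as they appear on these slides; `cfr_file_url` is a hypothetical helper.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

# Test endpoint as shown on the slides
CFR_BASE = "http://cactus.nci.nih.gov/TEST/chemical/file"


def cfr_file_url(key: str, fmt: Optional[str] = None,
                 records: Optional[Iterable[int]] = None) -> str:
    """URL for (selected records of) a posted file, optionally converted to `fmt`."""
    params = {}
    if records is not None:
        params["records"] = ",".join(str(r) for r in records)
    if fmt is not None:
        params["format"] = fmt
    url = f"{CFR_BASE}/{key}"
    # Keep commas literal so the query matches the slide: records=2,5
    return f"{url}?{urlencode(params, safe=',')}" if params else url


if __name__ == "__main__":
    print(cfr_file_url("d85b396ed6ced6348a5b402eb8fcfe8b", fmt="smiles", records=[2, 5]))
```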
Chemical File Resolver (CFR)

• get field names:

 curl http://cactus.nci.nih.gov/TEST/chemical/file/{key}/fields

• get the value of a specific field from record n:

 curl http://cactus.nci.nih.gov/TEST/chemical/file/{key}/n/{field_name}
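Field names in SD files often contain spaces or other characters that are not allowed in a URL path, so they should be percent-encoded when the field-value URL is built. A sketch, assuming the path layout shown on this slide; whether record numbers start at 0 or 1 is not stated, so `record` is passed through unchanged. The helper names and the field name in the example are hypothetical.

```python
from urllib.parse import quote

CFR_BASE = "http://cactus.nci.nih.gov/TEST/chemical/file"


def cfr_fields_url(key: str) -> str:
    """URL listing the field names of a posted file."""
    return f"{CFR_BASE}/{key}/fields"


def cfr_field_value_url(key: str, record: int, field_name: str) -> str:
    """URL for the value of one field in record `record`; the field name is percent-encoded."""
    return f"{CFR_BASE}/{key}/{record}/{quote(field_name, safe='')}"


if __name__ == "__main__":
    key = "d85b396ed6ced6348a5b402eb8fcfe8b"
    print(cfr_fields_url(key))
    print(cfr_field_value_url(key, 3, "MOLECULAR WEIGHT"))  # hypothetical field name
```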
Chemical Structure Web API

Figure: architecture diagram. Top layer: Chemical Identifier Resolver, Chemical File Resolver, NCI/CADD web service, external web services (connected via HTTP). Middle layer: Chemical Structure Web API. Bottom layer: CACTVS, NCI/CADD Chemical Structure Database (CSDB), OPSIN, other software packages.
IUPAC InChI/InChIKey Resolver

• (hopefully) there will be many resolvers from providers with
  different backgrounds:
  • publishers
  • commercial databases
  • free sources and databases: ChemSpider, PubChem, ChEBI, …
• InChI/InChIKey is the ideal key for interlinking these resolvers
• ChemSpider, PubChem and NCI/CADD are working on a test
  protocol for a federated InChI/InChIKey resolver
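IUPAC InChI/InChIKey Resolver

Figure: clients query the IUPAC Root Resolver, which delegates to Resolver 1, Resolver 2 and Resolver 3; Resolver 3 in turn delegates to Resolvers 3.1, 3.2 and 3.3. CIR appears as one of the resolvers in this hierarchy.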
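The slides leave the federation protocol open (it was still being tested). One simple client-side strategy is to ask a list of resolvers in turn and stop at the first one that knows the InChIKey. The sketch below shows that sequential fallback with stand-in resolver functions; it is a hypothetical illustration, not the protocol under development, which could equally delegate hierarchically through the root resolver as the diagram suggests.

```python
from typing import Callable, Iterable, Optional, Tuple

# A resolver maps an InChIKey to a result (e.g. an InChI) or None if unknown.
Resolver = Callable[[str], Optional[str]]


def resolve_first(inchikey: str,
                  resolvers: Iterable[Tuple[str, Resolver]]) -> Optional[Tuple[str, str]]:
    """Return (resolver name, result) from the first resolver that knows the key."""
    for name, resolver in resolvers:
        try:
            result = resolver(inchikey)
        except OSError:
            # Network problems with one provider should not stop the search.
            continue
        if result is not None:
            return name, result
    return None


if __name__ == "__main__":
    def unknown(key: str) -> Optional[str]:
        return None

    def offline(key: str) -> Optional[str]:
        raise OSError("resolver unreachable")

    def knows_it(key: str) -> Optional[str]:
        return "InChI=1S/..."  # placeholder result for illustration

    print(resolve_first("QTXVAVXCBMYBJW-UHFFFAOYSA-N",
                        [("A", unknown), ("B", offline), ("C", knows_it)]))
```

A sequential chain keeps the client simple but makes the slowest unreachable resolver add to every lookup; querying resolvers in parallel would trade that latency for more load on the providers.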
http://cactus.nci.nih.gov
http://cactus.nci.nih.gov/blog
Acknowledgments

The InChI Team

NCI/CADD Team: Igor Filippov, Marc Nicklaus
University of Cambridge, UK: Daniel Lowe
Xemistry GmbH, Germany: Wolf-Dietrich Ihlenfeldt
University College Cork, Ireland: Noel O’Boyle
All database providers
ChemNavigator: Scott Hutton, Tad Hurst
Acknowledgments - Software

• CACTVS
• ChemWriter
• Peter Ertl (Novartis)
• Python web framework
• Python SQL library
• JavaScript library
• full-text search engine

Access NCI/CADD Resources by InChI

  • 1. Accessing NCI/CADD Web Resources by InChI Markus Sitzmann Computer-Aided Drug Design Group, Chemical Biology Laboratory, Frederick National Laboratory for Cancer Research, NIH, DHHS
  • 3. Chemical Identifier Resolver (CIR) CIR works as a resolver for different chemical structure identifiers or representations. It allows one to convert a given structure identifier into another representation or structure identifier. http://cactus.nci.nih.gov/chemical/structure
  • 4. Chemical Structure Representations: a chemical structure at the center, surrounded by its representations: SYBYL Line Notation, SMILES, CAS Registry Number, chemical names, GIF image, ChemNavigator SID, SD File, CML, FDA UNII, NCI/CADD Identifiers, NSC number, MRV, InChI/InChIKey, PubChem SID/CID, ChemSpider ID, ChEBI ID, Chemical Formula, PDB Ligand ID
  • 5. Chemical Structure Representations: the same representations, now with InChI at the center: SYBYL Line Notation, SMILES, CAS Registry Number, chemical names, GIF image, ChemNavigator SID, SD File, CML, FDA UNII, NCI/CADD Identifiers, NSC number, MRV, InChI/InChIKey, PubChem SID/CID, ChemSpider ID, ChEBI ID, Chemical Formula, PDB Ligand ID
  • 6. Chemical Structure Databases InChI many more …
  • 7. Chemical Identifier Resolver (CIR) CIR works as a resolver for different chemical structure identifiers or representations. It allows one to convert a given structure identifier into another representation or structure identifier. http://cactus.nci.nih.gov/chemical/structure
  • 8. Chemical Identifier Resolver (CIR): works as a resolver for different chemical structure identifiers and allows one to convert a given structure identifier into another representation or structure identifier (http://cactus.nci.nih.gov/chemical/structure).
    (figure: benzoic acid drawn in the ChemWriter Editor is resolved into an SD file; the file shown has the header line C7H6O2, the comment line WPYMKLBDIGXBTP-FZOZFQFYNA-N, a V2000 connection table with 15 atoms (7 C, 2 O, 6 H) and 15 bonds, and ends with M END and $$$$)
  • 9. Chemical Identifier Resolver (CIR): works as a resolver for different chemical structure identifiers and allows one to convert a given structure identifier into another representation or structure identifier (http://cactus.nci.nih.gov/chemical/structure).
    Names returned for the structure drawn in the ChemWriter Editor (benzoic acid): benzoic acid; 65-85-0; WLN: QVR; WPYMKLBDIGXBTP-FZOZFQFYNA-N; Unisept BZA; AIDS018010; Salvo liquid; Benzoic acid-ring-UL-14C; ST5213864; Benzoesaeure; CHEBI:30746; NSC 149; benzenecarboxylic acid; phenylformic acid; Benzoic acid (JP15/USP); Benzoic acid (TN); 18102_RIEDEL; Aromatic hydroxy acid; Benzoic acid (7CI,8CI,9CI); Benzoic acid [USAN:JAN]; W213128_ALDRICH; 47849_SUPELCO; Acide benzoique [French]; Acido benzoico [Italian]; Benzoate (VAN); Benzoesaeure [German]; Benzoic acid (natural); Acide benzoique; Benzeneformic acid; Benzenemethanoic acid; Benzoesaeure GK; Benzoesaeure GV; Benzoic acid, tech.; Carboxybenzene; Kyselina benzoova; Phenylcarboxylic acid
  • 10. Chemical Identifier Resolver (CIR): works as a resolver for different chemical structure identifiers and allows one to convert a given structure identifier into another representation or structure identifier (http://cactus.nci.nih.gov/chemical/structure). Conversions of WPYMKLBDIGXBTP-FZOZFQFYNA-N (drawn in the ChemWriter Editor):
    InChIKey: InChIKey=WPYMKLBDIGXBTP-UHFFFAOYSA-N
    InChI: InChI=1S/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)
    SMILES: C1=CC=C(C=C1)C(O)=O
  • 11. Chemical Identifier Resolver (CIR): programmatic URL API: http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"; if a request is not successful, CIR returns an HTTP 404 status message.
  • 12. Chemical Identifier Resolver (CIR): examples for the programmatic URL API http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation" (an unsuccessful request returns an HTTP 404 status message):
    http://cactus.nci.nih.gov/chemical/structure/PGZUMBJQJWIWGJ-ONAKXNSWSA-N/cas returns 204255-11-8 (MIME type: text/plain)
    http://cactus.nci.nih.gov/chemical/structure/PGZUMBJQJWIWGJ-ONAKXNSWSA-N/image returns a structure image (MIME type: image/gif)
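A minimal Python 3 sketch of this URL scheme, assuming only what the slides state: the base URL, the identifier/representation path, and an HTTP 404 for unresolvable requests. The helper names `cir_url` and `resolve` are hypothetical. Percent-encoding the identifier is an assumption of this sketch; it is needed at least for `#` in SMILES (e.g. `C#C`), which would otherwise start a URL fragment. The `/` is left unencoded because InChI strings contain it.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"  # base URL from the slides


def cir_url(identifier: str, representation: str) -> str:
    """Build <base>/<identifier>/<representation> (hypothetical helper)."""
    # quote() keeps "/" (its default safe character) but encodes "#", "=", spaces, ...
    # Assumption: CIR decodes the percent-encoded identifier again.
    return f"{CIR_BASE}/{quote(identifier)}/{representation}"


def resolve(identifier: str, representation: str, timeout: float = 30.0) -> Optional[str]:
    """Return CIR's plain-text answer, or None when CIR replies with HTTP 404."""
    try:
        with urlopen(cir_url(identifier, representation), timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip()
    except HTTPError as err:
        if err.code == 404:  # the identifier could not be resolved
            return None
        raise  # other HTTP errors are left to the caller
```

According to the example above, `resolve("PGZUMBJQJWIWGJ-ONAKXNSWSA-N", "cas")` should yield `204255-11-8` (network access required); returning `None` for 404 keeps "not found" separate from genuine failures.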
  • 13. Chemical Identifier Resolver (CIR)
    • access by programming libraries/languages (e.g., Python):
        from urllib.error import HTTPError
        from urllib.request import urlopen
        url = "http://cactus.nci.nih.gov/chemical/structure/tamiflu/cas"
        try:
            # urlopen raises HTTPError (e.g., 404) if CIR cannot resolve the identifier
            with urlopen(url) as resolver:
                response = resolver.read().decode("utf-8")
        except HTTPError as err:
            raise RuntimeError(f"CIR request failed: HTTP {err.code}") from err  # your own error handling
        print(response)
      output: 204255-11-8
    • access from Unix shell level (e.g., via wget):
        shell > wget -qO - http://cactus.nci.nih.gov/chemical/structure/tamiflu/cas
        204255-11-8
  • 14. Chemical Identifier Resolver: InChI/InChIKey. Database RegIDs (PubChem, ZINC, eMolecules, ChemSpider ID); structure images (GIF, PNG); (trivial) names; SMILES; InChI/InChIKey; IUPAC names (OPSIN); chemical properties (MW, formula, …); CAS Registry numbers; structure files (sdf, pdb, cdx, …)
  • 15. Chemical Identifier Resolver (CIR): http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"
    identifiers: chemical names; IUPAC names (OPSIN); CAS numbers; SMILES strings; IUPAC InChI/InChIKeys; NCI/CADD Identifiers; CACTVS HASHISY; NSC number; PubChem SID; ZINC Code; ChemSpider ID; ChemNavigator SID; eMolecule VID
    representations: /smiles, /names, /iupac_name, /cas, /inchi, /stdinchi, /inchikey, /stdinchikey, /ficts, /ficus, /uuuuu, /image, /file, /sdf, /mw, /monoisotopic_mass, /formula, /twirl, /urls, /chemspider_id, /pubchem_sid, /chemnavigator_sid
  • 16. Chemical Identifier Resolver (CIR): request flow. An http request (identifier + representation) triggers detection of the identifier type. If the identifier is a full structure representation (e.g., SMILES, InChI), the requested structure representation is calculated directly; if it is a hashed structure representation (e.g., InChIKey), a CAS number, a chemical name, etc., the structure is looked up in the database (CSDB). The http response delivers the requested representation with its MIME type (e.g., InChI, GIF image).
  • 17. Chemical Structure Database (CSDB)
    • ChemNavigator iResearch Library: compilation of commercially available screening compounds from ~300 international chemistry suppliers
    • PubChem database, including Open NCI database, EPA DSSTox databases, NIAID HIV database, NIST Webbook, NLM ChemIDplus, ChemSpider, …
    • Commercial sources / others: Asinex, Comgenex, eMolecules, …
    • share of records (pie chart): PubChem ~38%, ChemNavigator iResearch Library ~56%, others ~6%
    • current status (as of March 2010): 140 chemical structure databases; 120 million structure records; 84.6 million unique structures by FICuS; 110 million Standard InChIKeys for lookup
  • 18. Chemical Structure Database (Update 2012)
    • PubChem Substance & Compound as separate databases (both updated to 2012)
    • ChemNavigator iResearch Library: updated to 2012
    • new databases, e.g., Therapeutic Target Database (TTD), Human Metabolome Database (HMDB), DrugBank
    • "pull" download of databases also available in PubChem, e.g., DSSTox, ZINC 2012/01, ChEBI 2012/01, ChEMBL13, ChemIDplus 2012/01
    • to a limited extent, "historic versions" of databases are archived; e.g., a comparison of PubChem Substance 2007 vs. 2012 will be possible
  • 19. Chemical Structure Database (CSDB): Chemical Structure Normalization
    • calculation of a set of parent structures with different sensitivity to chemical features
    • workflow: original structure record (Molfile, SDF, SMILES, ChemDraw cdx, PDB) → normalization → parent structure (SDF, SMILES) → hashcode calculation (E_HASHISY) → NCI/CADD Identifier (FICTS, FICuS, uuuuu) → database
    • both the original structure record & the normalized parent structures are archived in the database
  • 20. Chemical Structure Database (CSDB): NCI/CADD Identifiers (FICTS, FICuS, uuuuu)
    • based on CACTVS hashcodes (HASHISY): a 16-digit hexadecimal number (64-bit unsigned), e.g., 9850FD9F9E2B4E25 for histidine
    • structure normalization example (drawings: histidine, a tautomer, a sodium salt, and the R and S stereoisomers):
      FICTS: 9850FD9F9E2B4E25-FICTS, 6C16DE2351F9FF50-FICTS, E5F83F10C5DB080A-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS
      FICuS: 9850FD9F9E2B4E25-FICuS, E5F83F10C5DB080A-FICuS, E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS
      uuuuu: 9850FD9F9E2B4E25-uuuuu
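To make the identifier format concrete, here is a small Python parser for strings such as `9850FD9F9E2B4E25-FICTS`. It only checks the shape described on the slide (16 hexadecimal digits, i.e. a 64-bit unsigned value, plus one of the three type suffixes); it cannot compute the CACTVS hashcode itself. The names `NciCaddId` and `parse_ncicadd_id` are hypothetical, and accepting only uppercase hex digits is an assumption based on the examples shown.

```python
import re
from typing import NamedTuple

# 16 uppercase hex digits (a 64-bit unsigned CACTVS hashcode) plus the identifier type.
_ID_PATTERN = re.compile(r"^([0-9A-F]{16})-(FICTS|FICuS|uuuuu)$")


class NciCaddId(NamedTuple):
    hashcode: int  # the 64-bit unsigned hash value
    kind: str      # "FICTS", "FICuS" or "uuuuu"


def parse_ncicadd_id(text: str) -> NciCaddId:
    """Split an NCI/CADD identifier into its hash value and its type suffix."""
    match = _ID_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not an NCI/CADD identifier: {text!r}")
    hex_part, kind = match.groups()
    return NciCaddId(int(hex_part, 16), kind)
```

In the histidine example, `parse_ncicadd_id("9850FD9F9E2B4E25-FICTS")` and `parse_ncicadd_id("9850FD9F9E2B4E25-uuuuu")` carry the same hash value and differ only in `kind`, which is why the suffix has to be kept when identifiers are compared.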
  • 21. Chemical Structure Database (Update 2012)
    • NCI/CADD Identifiers based on CACTVS hashcodes (HASHISY), with the histidine normalization example of slide 20 repeated
    • unique structure count: FICTS ~118 million, FICuS ~115 million, uuuuu ~100 million
    • 231 small-molecule databases; 367 database releases (full, incremental, "historic versions"); 324 million original database records
  • 22. Chemical Structure Database (Update 2012): InChI/InChIKey
    • InChI/InChIKey (Version 1.04) calculated with four InChI flag sets:
      CACTVS Standard: Add H; Standard InChIKey
      Set 1: Add H, DONOTADDH, W0, FIXEDH, RECMET, NEWPS, SPXYZ, SAsXYZ, Fb, Fnud, KET, 15T
      Set 2: Add H, DONOTADDH, W0, FIXEDH, RECMET, NEWPS, SPXYZ, SAsXYZ, Fb, Fnud, KET, 15T
      Set 3: Add H, DONOTADDH, W0, FIXEDH, RECMET, NEWPS, SPXYZ, SAsXYZ, Fb, Fnud, KET, 15T
    • Standard Set, Set 1 & Set 2: hydrogen atoms added by CACTVS; Set 3: hydrogen atoms added by the InChI library
  • 23. Chemical Structure Database (Update 2012): InChI/InChIKey
    • calculation of InChI/InChIKey (Standard Set, Set 1, Set 2 & Set 3) for all original structure records and normalized parent structures
    • workflow: original structure record → normalization → parent structure → hashcode calculation (E_HASHISY) → NCI/CADD Identifier (FICTS, FICuS, uuuuu) and InChI/InChIKey (Standard, Set 1, Set 2, Set 3)
  • 24. Using CIR with InChI/InChIKey
  • 25. Using CIR with InChI/InChIKey: (Partial) InChIKey Lookup
    • resolve a Standard InChIKey into a full structure representation (ethanol):
      http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles returns CCO
      http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA/smiles returns CCO, CC[OH2+]
      http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ/smiles returns C(C(O)([2H])[2H])[2H], CC(O)([2H])[2H], C(CO)([2H])([2H])[2H], CC[17OH], C(CO)[2H], [14CH3]CO, CCO
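The three lookups work because an InChIKey is built from blocks: 14 characters hashing the connectivity (skeleton), a second block of 8 hash characters for the remaining layers followed by the standard flag and the version character, and a final protonation character. Dropping the last character therefore also matches CC[OH2+], and keeping only the first block also matches isotopologues. A minimal Python sketch that derives the three URLs from one key (the helper name `partial_key_urls` is hypothetical):

```python
import re
from typing import Dict

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"

# 14-char connectivity block, 10-char block (8 hash chars + flag + version),
# 1-char protonation block.
_INCHIKEY = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


def partial_key_urls(inchikey: str, representation: str = "smiles") -> Dict[str, str]:
    """Return CIR URLs for the full key and its two truncated prefixes."""
    prefix = "InChIKey="
    key = inchikey[len(prefix):] if inchikey.startswith(prefix) else inchikey
    match = _INCHIKEY.match(key)
    if match is None:
        raise ValueError(f"not an InChIKey: {inchikey!r}")
    first, second, _protonation = match.groups()
    return {
        "full": f"{CIR_BASE}/{key}/{representation}",                      # exact match
        "no_protonation": f"{CIR_BASE}/{first}-{second}/{representation}",  # any protonation state
        "skeleton": f"{CIR_BASE}/{first}/{representation}",                # same connectivity
    }
```

For ethanol, `partial_key_urls("LFQSCWFLJHTTHZ-UHFFFAOYSA-N")` produces exactly the three URLs listed on the slide.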
  • 26. Using CIR with InChI/InChIKey: Chemical File Representation
    • example (aspirin): http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/file?format=sdf
    • available structure file formats:
      alc: Alchemy format
      cdxml: CambridgeSoft ChemDraw XML format
      cerius: MSI Cerius II format
      charmm: Chemistry at HARvard Macromolecular Mechanics file format
      cif: Crystallographic Information File
      cml: Chemical Markup Language
      gjf: Gaussian input data file
      gromacs: GROMACS file format
      hyperchem: HyperChem file format
      jme: Java Molecule Editor format
      maestro: Schroedinger MacroModel
      mol: Symyx molecule file
      sybyl2/mol2: Tripos Sybyl MOL2 format
      mrv: ChemAxon MRV format
      pdb: Protein Data Bank
      sdf: Symyx Structure Data Format
      sdf3000: Symyx Structure Data Format 3000
      sln: SYBYL Line Notation
      smiles: SMILES
      xyz: xyz file format
  • 27. Using CIR with InChI/InChIKey: Chemical Structure Images (GIF, PNG)
    • Buckyball: http://cactus.nci.nih.gov/chemical/structure/XMWRBQBLMFGWIX-UHFFFAOYSA-N/image?height=300&width=300&bgcolor=black&bondcolor=white
    • Aspirin: http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/image?height=200&width=200&symbolfontsize=7&footer="Aspirin"
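The file and image requests add query parameters to the basic URL. A short Python sketch that builds them with `urllib.parse.urlencode`, so values are escaped consistently. The helper names are hypothetical, and the parameter names (`format`, `height`, `width`, `bgcolor`, …) are simply taken from the example URLs on the two slides:

```python
from urllib.parse import quote, urlencode

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_file_url(identifier: str, fmt: str) -> str:
    """URL for a structure file, e.g. fmt="sdf", "mol2" or "pdb"."""
    return f"{CIR_BASE}/{quote(identifier)}/file?{urlencode({'format': fmt})}"


def cir_image_url(identifier: str, **options: object) -> str:
    """URL for a structure image; keyword arguments become query parameters."""
    # Keyword order is preserved, so the query string follows the call order.
    query = f"?{urlencode(options)}" if options else ""
    return f"{CIR_BASE}/{quote(identifier)}/image{query}"
```

For instance, `cir_image_url("XMWRBQBLMFGWIX-UHFFFAOYSA-N", height=300, width=300, bgcolor="black", bondcolor="white")` reproduces the buckyball URL above; `urlencode` would send a quoted value such as `"Aspirin"` as `%22Aspirin%22`.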
  • 28. Using CIR with InChI/InChIKey: 3D Chemical Structure Visualization (TwirlyMol)
    • a simple JavaScript library that renders a rotatable/zoomable 3D representation of a molecule in your web browser
    • implemented by Noel O'Boyle (University College Cork, Ireland)
    • no plugin is needed, only a modern browser (browsers shown on the slide: Chrome, Safari, FF3.6+, IE9, IE8, IE7, IE6)
  • 29. Using CIR with InChI/InChIKey: 3D Chemical Structure Visualization (TwirlyMol)
    • simple viewer (Restasis): http://cactus.nci.nih.gov/chemical/structure/DDPJWUQJQMKQIF-XPNZOOHZSA-N/twirl
    • embedded into a web page:
        <div id="canvas" height="400" width="400"></div>
        <script src="http://cactus.nci.nih.gov/chemical/structure/DDPJWUQJQMKQIF-XPNZOOHZSA-N/twirl_cached/canvas"></script>
  • 30. Using CIR with InChI/InChIKey 3D Chemical Structure Visualization (TwirlyMol) http://baoilleach.blogspot.com/ http://www.coronene.com/blog/ http://chemical-quantum-images.blogspot.com
  • 31. Using CIR with InChI/InChIKey: Chemical Database URLs
    • request database URLs (Restasis): http://cactus.nci.nih.gov/chemical/structure/DDPJWUQJQMKQIF-XPNZOOHZSA-N/urls/xml
        <?xml version="1.0" encoding="UTF-8" ?>
        <request string="DDPJWUQJQMKQIF-XPNZOOHZSA-N" representation="urls">
          <data id="1" resolver="stdinchikey" string_class="Standard InChIKey">
            <item id="1" classification="exact" database="ChemSpider" publisher="ChemSpider">http://chemspider.com/structure.4939506</item>
            <item id="2" classification="exact" database="ChemSpider" publisher="PubChem">http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=43028058</item>
            <item id="3" classification="exact" database="NLM ChemIDplus" publisher="NLM">http://chem.sis.nlm.nih.gov/chemidplus/direct.jsp?result=advanced&regno=059865133 […]
          </data>
        </request>
  • 32. Using CIR with InChI/InChIKey: Chemical Name Lookup
    • request (alternative) names (aspirin): http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/names/xml
        <?xml version="1.0" encoding="UTF-8" ?>
        <request string="BSYNRYMUTXBXSQ-UHFFFAOYSA-N" representation="names">
          <data id="1" resolver="stdinchikey" string_class="Standard InChIKey">
            <item id="1" classification="pubchem_iupac_name">2-acetyloxybenzoic acid</item>
            <item id="2" classification="pubchem_iupac_openeye_name">2-Acetoxybenzoic acid</item>
            <item id="3" classification="pubchem_generic_registry_name">50-78-2</item>
            <item id="4" classification="pubchem_generic_registry_name">11126-35-5</item>
            <item id="5" classification="pubchem_generic_registry_name">11126-37-7</item>
            <item id="6" classification="pubchem_generic_registry_name">2349-94-2</item>
            <item id="7" classification="pubchem_generic_registry_name">26914-13-6</item>
            <item id="8" classification="pubchem_substance_synonym">NCGC00090977-04</item>
            <item id="9" classification="pubchem_substance_synonym">KBioSS_002272</item>
            <item id="10" classification="pubchem_substance_synonym">SBB015069</item>
            <item id="11" classification="pubchem_substance_synonym">Aspirin</item>
            <item id="12" classification="pubchem_substance_synonym">D00109</item>
            […]
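The XML answers on these slides (and the name pattern results further below) share the same nesting: a `<request>` with one or more `<data>` elements, each holding `<item>` elements. A minimal Python sketch that flattens such a response with `xml.etree.ElementTree`; the structure is inferred from the examples shown here, not from a published schema, and `parse_cir_xml` is a hypothetical name:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_cir_xml(payload: bytes) -> List[Dict[str, object]]:
    """Flatten a CIR XML response into one dict per <data> element."""
    root = ET.fromstring(payload)  # accepts the raw bytes returned by urlopen().read()
    results = []
    for data in root.iter("data"):
        results.append({
            **data.attrib,  # e.g. resolver, string_class or notation
            "items": [
                {"text": (item.text or "").strip(), **item.attrib}
                for item in data.iter("item")
            ],
        })
    return results
```

Applied to the names response, `parse_cir_xml(...)[0]["items"]` would list each name together with its `classification`, which makes it easy to keep, say, only the `pubchem_iupac_name` entries.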
  • 33. Using CIR with InChI/InChIKey: Chemical Properties
    • request molecular weight (aspirin): http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/weight returns 180.1598 (MIME type: text/plain)
    • available properties:
      /mw: molecular weight
      /aromatic: compound is aromatic
      /formula: formula
      /macrocyclic: compound is macrocyclic
      /monoisotopic_mass: monoisotopic mass
      /heteroatom_count: heteroatom count
      /h_bond_donor_count: H bond donor count
      /hydrogen_atom_count: H atom count
      /h_bond_acceptor_count: H bond acceptor count
      /heavy_atom_count: heavy atom count
      /h_bond_center_count: H bond center count
      /deprotonable_group_count: number of deprotonable groups
      /rotor_count: number of rotatable bonds
      /effective_rotor_count: number of effectively rotatable bonds
      /protonable_group_count: number of protonable groups
      /rule_of_5_violation_count: number of Rule-of-5 violations
      /ring_count: number of rings
      /ringsys_count: number of ring systems
      /xlogp2: octanol-water partition coefficient XLOGP2
  • 34. Using CIR with InChI/InChIKey: Chemical Name Pattern Search
    • Google-like searches on CIR's name index (approx. 70 million names)
    • example: all chemical names that contain the words "morphine" and "methyl" (name pattern: '+morphine +methyl'):
      http://cactus.nci.nih.gov/chemical/structure/+morphine +methyl/stdinchikey/xml?resolver=name_pattern
    • based on the open source full text search server Sphinx (http://sphinxsearch.com)
  • 35. Search name pattern '+morphine +methyl': 7 matching names
        <request string="+morphine +methyl" representation="stdinchikey">
          <data id="1" resolver="name_pattern" notation="Morphine 3-methyl ether">
            <item id="1">InChIKey=OROGSEYTTFOCAN-DNJOTXNNSA-N</item>
          </data>
          <data id="2" resolver="name_pattern" notation="6-Methyl-delta(sup 6)-deoxy-morphine">
            <item id="1">InChIKey=CUFWYVOFDYVCPM-GGNLRSJOSA-N</item>
          </data>
          <data id="3" resolver="name_pattern" notation="Morphine, dihydro-6-methyl-">
            <item id="1">InChIKey=NBKVWIJQJMEQLE-NGTWOADLSA-N</item>
          </data>
          <data id="4" resolver="name_pattern" notation="6-METHYL-MORPHINE ETHER">
            <item id="1">InChIKey=FNAHUZTWOVOCTL-UHFFFAOYSA-N</item>
          </data>
          <data id="5" resolver="name_pattern" notation="Morphine alcoholic methyl ether">
            <item id="1">InChIKey=FNAHUZTWOVOCTL-XSSYPUMDSA-N</item>
          </data>
          <data id="6" resolver="name_pattern" notation="N-Methyl morphine chloride">
            <item id="1">InChIKey=MJNCZWBHCFTYFU-SCLAZZCHSA-N</item>
          </data>
          <data id="7" resolver="name_pattern" notation="Morphine, 7-hydroxy-6,6-dimethoxy-3-O-methyl-">
            <item id="1">InChIKey=URFKRBIESURBKC-UHFFFAOYSA-N</item>
          </data>
        </request>
  • 36. Using CIR with InChI/InChIKey: Chemical Name Pattern Search
    • example: chemical names that contain the words "morphine" and "methyl" but not "hydroxyl" (name pattern: '+morphine +methyl -hydroxyl'), 6 matching names:
      http://cactus.nci.nih.gov/chemical/structure/+morphine +methyl -hydroxyl/stdinchikey/xml?resolver=name_pattern
    • example: chemical names that contain the substring "morphine" somewhere in the name (name pattern: '*morphine*'), 45 matching names:
      http://cactus.nci.nih.gov/chemical/structure/*morphine*/stdinchikey/xml?resolver=name_pattern
    • example: chemical names that contain a single character "m" and the word "benzene" within a maximum distance of 3 words (finds meta-substituted aromatic compounds; name pattern: '"m benzene"~3'), 22 matching names:
      http://cactus.nci.nih.gov/chemical/structure/(m benzene)~3/stdinchikey/xml?resolver=name_pattern
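The example URLs above contain raw spaces, `+`, `*` and quotes; a browser escapes these silently, but a program has to do it explicitly. A small Python sketch (the helper name `name_pattern_url` is hypothetical) that percent-encodes the whole pattern as one path segment; encoding `+` as `%2B` avoids any server reading it as a space, and it is an assumption that CIR decodes the segment back to the original pattern:

```python
from urllib.parse import quote

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def name_pattern_url(pattern: str, representation: str = "stdinchikey") -> str:
    """URL for a CIR name pattern search that returns XML."""
    # Encode spaces, '+', '*', quotes, '(' and ')' so the pattern stays one path
    # segment; quote() leaves unreserved characters such as '-' and '~' unchanged.
    encoded = quote(pattern, safe="")
    return f"{CIR_BASE}/{encoded}/{representation}/xml?resolver=name_pattern"
```

`name_pattern_url("+morphine +methyl")` yields `.../%2Bmorphine%20%2Bmethyl/stdinchikey/xml?resolver=name_pattern`, and the result can be parsed with the same request/data/item structure as the other XML answers.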
  • 37. Structure Normalization (Tautomerism)
  • 38. Structure Normalization: Tautomerism, 21 SMIRKS transform rules
    rule 1: 1.3 (thio)keto/(thio)enol
    rule 2: 1.5 (thio)keto/(thio)enol
    rule 3: simple (aliphatic) imine
    rule 4: special imine
    rule 5: 1.3 aromatic heteroatom H shift
    rule 6: 1.3 heteroatom H shift
    rule 7: 1.5 (aromatic) heteroatom H shift (1)
    rule 8: 1.5 aromatic heteroatom H shift (2)
    rule 9: 1.7 (aromatic) heteroatom H shift
    rule 10: 1.9 (aromatic) heteroatom H shift
    rule 11: 1.11 (aromatic) heteroatom H shift
    rule 12: furanones
    rule 13: ketene/ynol exchange
    rule 14: ionic nitro/aci-nitro
    rule 15: pentavalent nitro/aci-nitro
    rule 16: oxime/nitroso
    rule 17: oxime/nitroso via phenol
    rule 18: cyanic/isocyanic acids
    rule 19: formamidinesulfinic acids
    rule 20: isocyanides
    rule 21: phosphonic acids
  • 39. Structure Normalization: Tautomerism
    • rule 1: 1.3 (thio)keto/(thio)enol
      [O,S,Se,Te;X1:1]=[C;z{1-2}:2][CX4R{0-2}:3][#1:4]>>[#1:4][O,S,Se,Te;X2:1][#6;z{1-2}:2]=[C,cz{0-1}R{0-1}:3]
    • rule 6: 1.3 heteroatom H shift
      [N,n,S,s,O,o,Se,Te:1]=[NX2,nX2,C,c,P,p:2][N,n,S,O,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te:1][NX2,nX2,C,c,P,p:2]=[N,n,S,s,O,o,Se,Te:3]
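One cheap sanity check for a table of transform rules like these: in a tautomer rule, every atom map number on the left-hand side of the SMIRKS should appear exactly once on each side, otherwise the transform would create or lose atoms. The Python sketch below (hypothetical helper `check_atom_maps`) performs only this textual check with a regular expression; it does not interpret the atom expressions themselves.

```python
import re
from collections import Counter

# Atom map numbers appear as ":<n>]" at the end of each bracket atom.
_MAP_NUMBER = re.compile(r":(\d+)\]")


def check_atom_maps(smirks: str) -> None:
    """Raise ValueError unless both sides of a SMIRKS carry the same atom maps once each."""
    reactant, product = smirks.split(">>")  # exactly one ">>" expected
    left = Counter(_MAP_NUMBER.findall(reactant))
    right = Counter(_MAP_NUMBER.findall(product))
    if left != right:
        raise ValueError(f"atom maps differ: {sorted(left)} vs {sorted(right)}")
    duplicates = [n for n, count in left.items() if count > 1]
    if duplicates:
        raise ValueError(f"map numbers used more than once: {duplicates}")
```

Both rules shown above pass: maps 1 to 4 occur once on each side, only in a different order.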
  • 40. Structure Normalization: Warfarin, Tautomers (figure: warfarin tautomers related by prototropic tautomerism)
  • 41. Structure Normalization: Warfarin, Tautomers (figure: warfarin tautomers related by prototropic tautomerism)
    http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/representation
  • 42. Structure Normalization: Warfarin, FICuS Identifier
    • all nine prototropic tautomers shown have the same FICuS identifier: D76B88C0354759F1-FICuS
    • http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/ficus
  • 43. Structure Normalization: Warfarin, FICuS Identifier (prototropic and ring-chain tautomerism)
    • twelve forms in three rows of four; the fourth form in each row is a ring-chain tautomer
    • the nine prototropic tautomers: D76B88C0354759F1-FICuS
    • the three ring-chain tautomers: 09BB2FAADA1508A7-FICuS (rows 1 and 2) and 2F505A3FCA434B3C-FICuS (row 3)
    • http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/ficus
  • 44. Structure Normalization: Warfarin, Standard InChIKey (the same twelve forms, prototropic and ring-chain tautomerism; the fourth form in each row is the ring-chain tautomer)
      row 1: QTXVAVXCBMYBJW-UHFFFAOYSA-N, VWSXIGYSLWNCBN-VAWYXSNFSA-N, GRAAPKVUSREWIL-UHFFFAOYSA-N, LSCYDZJASSKSMJ-UHFFFAOYSA-N
      row 2: FQEPJUOLUDFINX-UHFFFAOYSA-N, UCKRWKACBKRIKB-VAWYXSNFSA-N, NNLYDNMZCAHUOV-UHFFFAOYSA-N, XGIOTBZTMHLTRL-UHFFFAOYSA-N
      row 3: PJVWKTKQMONHTI-UHFFFAOYSA-N, FVSFCRPKSVCTBA-VAWYXSNFSA-N, BBOSKMPTDUUMKL-UHFFFAOYSA-N, QUJJIKXCACZKKD-UHFFFAOYSA-N
    http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/stdinchikey
  • 45. Structure Normalization: Warfarin, InChIKey calculated with the options W0 RECMET NEWPS SPXYZ SAsXYZ Fb Fnud KET 15T (the same twelve forms, prototropic and ring-chain tautomerism)
      row 1: SAYISSDYYDIVTP-UHFFFAOYNA-N, SAYISSDYYDIVTP-UHFFFAOYNA-N, PMOPDASZKFXBOL-UHFFFAOYNA-N, LSCYDZJASSKSMJ-UHFFFAOYNA-N
      row 2: SAYISSDYYDIVTP-UHFFFAOYNA-N, SAYISSDYYDIVTP-UHFFFAOYNA-N, PMOPDASZKFXBOL-UHFFFAOYNA-N, FQOKLKCGRHFANU-UHFFFAOYNA-N
      row 3: SAYISSDYYDIVTP-UHFFFAOYNA-N, SAYISSDYYDIVTP-UHFFFAOYNA-N, PMOPDASZKFXBOL-UHFFFAOYNA-N, FQOKLKCGRHFANU-UHFFFAOYNA-N
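The comparison in slides 43 to 45 can be made quantitative by counting how many distinct identifiers each scheme assigns to the same twelve warfarin forms. The identifiers below are copied from the slides, row by row; the dictionary layout and the function name `distinct_counts` are just an illustration.

```python
from typing import Dict, List

FICUS_ROW = ["D76B88C0354759F1-FICuS"] * 3
KET_ROW = ["SAYISSDYYDIVTP-UHFFFAOYNA-N"] * 2 + ["PMOPDASZKFXBOL-UHFFFAOYNA-N"]

# Twelve warfarin forms, three rows of four; the fourth form is a ring-chain tautomer.
WARFARIN_IDS: Dict[str, List[str]] = {
    "FICuS": FICUS_ROW + ["09BB2FAADA1508A7-FICuS"]
    + FICUS_ROW + ["09BB2FAADA1508A7-FICuS"]
    + FICUS_ROW + ["2F505A3FCA434B3C-FICuS"],
    "Standard InChIKey": [
        "QTXVAVXCBMYBJW-UHFFFAOYSA-N", "VWSXIGYSLWNCBN-VAWYXSNFSA-N",
        "GRAAPKVUSREWIL-UHFFFAOYSA-N", "LSCYDZJASSKSMJ-UHFFFAOYSA-N",
        "FQEPJUOLUDFINX-UHFFFAOYSA-N", "UCKRWKACBKRIKB-VAWYXSNFSA-N",
        "NNLYDNMZCAHUOV-UHFFFAOYSA-N", "XGIOTBZTMHLTRL-UHFFFAOYSA-N",
        "PJVWKTKQMONHTI-UHFFFAOYSA-N", "FVSFCRPKSVCTBA-VAWYXSNFSA-N",
        "BBOSKMPTDUUMKL-UHFFFAOYSA-N", "QUJJIKXCACZKKD-UHFFFAOYSA-N",
    ],
    "InChIKey (KET 15T)": KET_ROW + ["LSCYDZJASSKSMJ-UHFFFAOYNA-N"]
    + KET_ROW + ["FQOKLKCGRHFANU-UHFFFAOYNA-N"]
    + KET_ROW + ["FQOKLKCGRHFANU-UHFFFAOYNA-N"],
}


def distinct_counts(ids_by_scheme: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of different identifiers each scheme assigns to the same set of forms."""
    return {scheme: len(set(ids)) for scheme, ids in ids_by_scheme.items()}


print(distinct_counts(WARFARIN_IDS))
# {'FICuS': 3, 'Standard InChIKey': 12, 'InChIKey (KET 15T)': 4}
```

The Standard InChIKey keeps all twelve forms apart, the KET/15T options merge them into four groups, and FICuS merges all prototropic tautomers into one identifier.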
  • 46. Structure Normalization: Warfarin
    • "normalize" a Standard InChIKey by NCI/CADD's business rules:
      http://cactus.nci.nih.gov/chemical/structure/normalize:QTXVAVXCBMYBJW-UHFFFAOYSA-N/stdinchikey returns InChIKey=FQEPJUOLUDFINX-UHFFFAOYSA-N (MIME type: text/plain)
    (figure: the warfarin form QTXVAVXCBMYBJW-UHFFFAOYSA-N and its normalized form FQEPJUOLUDFINX-UHFFFAOYSA-N)
  • 47. Structure Normalization: Chemical Operators
    • available operators: add_hydrogens, remove_hydrogens, normalize, ficts, ficus, uuuuu, scaffold_sequence, nostereo, stereoisomers, tautomers
    • example: http://cactus.nci.nih.gov/chemical/structure/scaffold_sequence:FQEPJUOLUDFINX-UHFFFAOYSA-N/stdinchikey returns XVYBSGQBRUYLNK-UHFFFAOYSA-N, BQLSCAPEANVCOG-UHFFFAOYSA-N, MERGMNQXULKBCH-UHFFFAOYSA-N
    • Schuffenhauer et al., J. Chem. Inf. Model. 2007, 47, 47-58
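Operators are written as a prefix `operator:identifier` in the identifier part of the URL. A short Python sketch (hypothetical helper `operator_url`) that builds such requests and rejects names not in the operator list above. Whether CIR accepts operators in other positions, or nested operators, is not covered by the slides and is not attempted here.

```python
from urllib.parse import quote

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"

# Operator names as listed on the slide.
OPERATORS = {
    "add_hydrogens", "remove_hydrogens", "normalize", "ficts", "ficus", "uuuuu",
    "scaffold_sequence", "nostereo", "stereoisomers", "tautomers",
}


def operator_url(operator: str, identifier: str, representation: str) -> str:
    """CIR URL that applies a chemical operator to the identifier before conversion."""
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator: {operator}")
    return f"{CIR_BASE}/{operator}:{quote(identifier)}/{representation}"
```

`operator_url("normalize", "QTXVAVXCBMYBJW-UHFFFAOYSA-N", "stdinchikey")` gives the normalization request from the previous slide, and `operator_url("tautomers", "warfarin", "ficus")` the tautomer request used for the warfarin figures.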
  • 48. Soon: Chemical File Resolver (CFR)
  • 49. Chemical File Resolver (CFR): chemical file → HTTP Post → CFR → HTTP Get → chemical file
    • allows conversion of many chemical file formats into other formats or other representations
    • will have a programmatic URL API & an HTML Web interface
    • URL-izes all elements of the original file, i.e., provides access by URL to each record, each field, and any metadata (size, record count, etc.) of the posted file
    • release: Q2/2012 (hopefully)
  • 50. Chemical File Resolver (CFR)
    • HTTP: post a file (e.g., with curl); CFR replies with an MD5 hash key:
        curl -F upload=@/your/local/file.sdf http://cactus.nci.nih.gov/chemical/file
        d85b396ed6ced6348a5b402eb8fcfe8b
    • accepted formats:
      • chemical file formats: alc, cdxml, cerius, charmm, cif, cml, jme, maestro, mol, mol2, mrv, pdb, sdf, sdf3000, sln, smiles, xyz, …
      • text files with a list of identifiers …
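For programs that cannot call curl, the upload can be reproduced with the standard library: `curl -F upload=@file` sends a multipart/form-data POST with a form field named `upload`. The sketch below builds that request by hand. The endpoint is the one on the slide; CFR was announced as an upcoming service, so whether it is reachable is not assumed here. The generic part content type `application/octet-stream` and the helper names are assumptions of this sketch.

```python
import uuid
from pathlib import Path
from urllib.request import Request, urlopen

CFR_URL = "http://cactus.nci.nih.gov/chemical/file"  # endpoint as given on the slide


def build_upload_request(path: str, field: str = "upload") -> Request:
    """Build the multipart/form-data POST that `curl -F upload=@file` would send."""
    file_path = Path(path)
    boundary = uuid.uuid4().hex  # random boundary, practically never found in the file
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{file_path.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return Request(
        CFR_URL,
        data=head + file_path.read_bytes() + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def upload(path: str) -> str:
    """Post the file and return CFR's reply (per the slide, an MD5 hash key)."""
    with urlopen(build_upload_request(path), timeout=60) as resp:
        return resp.read().decode("utf-8").strip()
```

Separating request construction from sending keeps the multipart encoding testable without network access.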
  • 51. Chemical File Resolver (CFR)
    • post a plain text file, e.g.:
        ethanol
        aspirin
        InChI=1S/C4H10O/c1-3-5-4-2/h3-4H2,1-2H3
        CCOCC
        InChIKey=RCINICONZNJXQF-MZXODVADSA-N
        InChIKey=QTXVAVXCBMYBJW-UHFFFAOYSA-N
        204255-11-8
        tautomers:guanine
        ChemSpider_ID=1234
        Pubchem_SID=456
    • after posting a file, CFR replies with an MD5 hash sum:
        curl -F upload=@/your/local/file.sdf http://cactus.nci.nih.gov/TEST/chemical/file
        d85b396ed6ced6348a5b402eb8fcfe8b
    • accepted formats:
      • chemical file formats: alc, cdxml, cerius, charmm, cif, cml, jme, maestro, mol, mol2, mrv, pdb, sdf, sdf3000, sln, smiles, xyz, …
      • text files with a list of identifiers
  • 52. Chemical File Resolver (CFR)
    • request a new file format using the obtained MD5 hash key (e.g., d85b396ed6ced6348a5b402eb8fcfe8b):
        curl "http://cactus.nci.nih.gov/TEST/chemical/file/{key}?format={sdf, smi, pdb, cml, …}"
  • 53. Chemical File Resolver (CFR)
    • request records 2 and 5 as SMILES strings (key, e.g., d85b396ed6ced6348a5b402eb8fcfe8b); the URL is quoted because an unquoted & would send curl to the background and cut off the query:
        curl "http://cactus.nci.nih.gov/TEST/chemical/file/{key}?records=2,5&format=smiles"
  • 54. Chemical File Resolver (CFR)
    • get field names:
        curl http://cactus.nci.nih.gov/TEST/chemical/file/{key}/fields
    • get a specific field value from record n:
        curl http://cactus.nci.nih.gov/TEST/chemical/file/{key}/n/{field_name}
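The GET requests on slides 52 to 54 all start from the key returned by the upload. A minimal Python sketch of URL builders for them, following the paths and query parameters exactly as the slides show them on the TEST endpoint; the function names are hypothetical, and since CFR was a planned service, the URLs are only as current as the slides.

```python
from typing import Dict, Iterable, Optional
from urllib.parse import quote, urlencode

CFR_BASE = "http://cactus.nci.nih.gov/TEST/chemical/file"  # test endpoint from the slides


def converted_file_url(key: str, fmt: str, records: Optional[Iterable[int]] = None) -> str:
    """URL for the posted file (or selected records) in another format."""
    params: Dict[str, str] = {}
    if records is not None:
        params["records"] = ",".join(str(n) for n in records)
    params["format"] = fmt
    # safe="," keeps "records=2,5" in the same form as on the slide
    return f"{CFR_BASE}/{key}?{urlencode(params, safe=',')}"


def fields_url(key: str) -> str:
    """URL listing the field names of the posted file."""
    return f"{CFR_BASE}/{key}/fields"


def field_value_url(key: str, record: int, field_name: str) -> str:
    """URL for one field value of record number `record`."""
    return f"{CFR_BASE}/{key}/{record}/{quote(field_name, safe='')}"
```

With the key from the slides, `converted_file_url("d85b396ed6ced6348a5b402eb8fcfe8b", "smiles", records=[2, 5])` ends in `?records=2,5&format=smiles`, matching slide 53.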
  • 55. Chemical Structure Web API, layers from top to bottom:
    • external web services; NCI/CADD web services (Chemical Identifier Resolver, Chemical File Resolver)
    • http
    • Chemical Structure Web API
    • CACTVS, OPSIN, other software packages, NCI/CADD Chemical Structure Database (CSDB)
  • 56. IUPAC InChI/InChIKey Resolver
    • (hopefully) there will be many resolvers from providers with different backgrounds:
      • publishers
      • commercial databases
      • free sources and databases: ChemSpider, PubChem, ChEBI, …
    • InChI/InChIKey is the perfect tool to interlink the resolvers
    • ChemSpider, PubChem and NCI/CADD are working on a test protocol for a federated InChI/InChIKey resolver
  • 57. IUPAC InChI/InChIKey Resolver: clients query the IUPAC Root Resolver, which links Resolver 1, Resolver 2, and Resolver 3 (CIR); Resolver 3 in turn links Resolvers 3.1, 3.2, and 3.3
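The federated protocol was still being worked out, so the following is only a hypothetical illustration of the delegation idea in the diagram, not the IUPAC protocol: each resolver is modelled as a function from an InChIKey to an answer (or `None`), and a root resolver asks its resolvers in turn and returns the first answer. Skipping a resolver on `OSError` is a design choice of this sketch; `urllib`'s `HTTPError` and `URLError` are subclasses of `OSError`.

```python
from typing import Callable, Iterable, Optional

# A resolver maps an InChIKey to some answer (e.g. SMILES), or None if unknown.
Resolver = Callable[[str], Optional[str]]


def federated_resolve(inchikey: str, resolvers: Iterable[Resolver]) -> Optional[str]:
    """Ask each resolver in turn and return the first non-None answer."""
    for resolver in resolvers:
        try:
            answer = resolver(inchikey)
        except OSError:  # one provider is unreachable: try the next one
            continue
        if answer is not None:
            return answer
    return None
```

The hierarchy on the slide falls out naturally: a "Resolver 3" can itself be `lambda key: federated_resolve(key, [resolver_3_1, resolver_3_2, resolver_3_3])` and be passed to the root resolver like any other resolver.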
  • 60. Acknowledgments: The InChI Team; NCI/CADD Team: Igor Filippov, Marc Nicklaus; University of Cambridge, UK: Daniel Lowe; Xemistry GmbH, Germany: Wolf-Dietrich Ihlenfeldt; University College Cork, Ireland: Noel O'Boyle; all database providers; ChemNavigator: Scott Hutton, Tad Hurst
  • 61. Acknowledgments, Software: CACTVS; Python Web Framework; ChemWriter; Python SQL Library; JavaScript library (Peter Ertl, Novartis); Fulltext Search Engine

Editor's Notes

  1. … but you can also do things independently of InChI; this is the general scheme. Almost every identifier or representation can be converted to any other representation.
  2. If I take these rules and any of these input structures here