InChI/InChIKey vs. NCI/CADD Structure Identifiers: A Comparison
Markus Sitzmann
Computer-Aided Drug Design Group (NCI/CADD), Laboratory of Medicinal Chemistry, NCI-Frederick, NIH, DHHS
The Adaption and Use of the IUPAC InChI/InChIKey. NCI/CADD Identifiers (FICTS, FICuS, uuuuu) and Std. InChI/InChIKey in the Chemical Structure Lookup Service: 74 million structure records, 46 million unique structures.
NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures. Example hashcode: 9850FD9F9E2B4E25 [structure drawing]
Variants of the example structure (9850FD9F9E2B4E25): charged form, tautomers, isotope, salt, stereoisomers, "errors". Hashcodes shown for the variants: A3DAE0788050DDE4, 3ECEF579D7DF025A, E92E4BA2869F3611, 8A7AD1EB498CC76A, 6C16DE2351F9FF50, 8F7A1DE5A733F0E0, 60525E1AF41497B6, B2FDA68AEDA06DB9 [structure drawings]
NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures. input structure (MDL Molfile, MDL SDF, SMILES, ChemDraw cdx, PDB) → structure normalization → parent structure (MDL SDF, SMILES, database) → hashcode calculation (E_HASHISY) → NCI/CADD Identifier
NCI/CADD Structure Identifiers: Structure Normalization. Each feature is treated as either sensitive or un-sensitive:
Fragments: sensitive, or un-sensitive (keep only largest organic fragment)
Isotopes: sensitive, or un-sensitive (ignore isotope labels)
Charges: sensitive, or un-sensitive (uncharge)
Tautomers: sensitive, or un-sensitive (find canonical tautomer)
Stereochemistry: sensitive, or un-sensitive (discard stereo information)
[example structure drawings]
FICTS identifier: representation of the exact drawing. Sensitive to Fragments (F), Isotopes (I), Charges (C), Tautomers (T), and Stereochemistry (S). [example structures compared with = and ≠]
FICuS identifier: comes closest to how a chemist perceives a compound. Sensitive to Fragments (F), Isotopes (I), Charges (C), and Stereochemistry (S); un-sensitive (u) to Tautomers. [example structures compared with = and ≠]
uuuuu identifier: closely related forms of the same compound. Un-sensitive (u) to all five features; all example structures compare as equal (=).
NCI/CADD Structure Identifier: Structure Normalization, starting from the input structure:
correct structure: add hydrogen atoms; correct functional groups; correct metal atom bonds
normalize (n) or discard (d) stereo information
define canonical tautomer
discard isotope labels
get largest fragment & uncharge: delete complex center; get largest organic fragment; delete radical center; uncharge structure
define canonical resonance form/protonation state
parent structures: uuuuu, uuuuS, uuuTu, uuuTS, FICuu, FICuS, FICTS, FICTu
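The five-letter tags can be read position by position: a letter means the identifier is sensitive to that feature, and `u` means it is un-sensitive. The following is a minimal sketch of that reading; the function name `tag_sensitivity` and the constant names are hypothetical and not part of CACTVS or any NCI/CADD tool.

```python
# Feature order encoded by the five tag positions, as named on the slides.
FEATURES = ("Fragments", "Isotopes", "Charges", "Tautomers", "Stereochemistry")
LETTERS = "FICTS"

# The eight parent-structure tags listed on the slide.
PARENT_TAGS = ["uuuuu", "uuuuS", "uuuTu", "uuuTS", "FICuu", "FICuS", "FICTS", "FICTu"]


def tag_sensitivity(tag):
    """Map a tag such as 'FICuS' to {feature name: True if sensitive}."""
    if len(tag) != len(LETTERS):
        raise ValueError(f"expected a 5-character tag, got {tag!r}")
    result = {}
    for feature, letter, char in zip(FEATURES, LETTERS, tag):
        if char == letter:
            result[feature] = True    # feature is kept in the parent structure
        elif char == "u":
            result[feature] = False   # feature is normalized away
        else:
            raise ValueError(f"unexpected character {char!r} in tag {tag!r}")
    return result


if __name__ == "__main__":
    for tag in PARENT_TAGS:
        sensitive = [name for name, on in tag_sensitivity(tag).items() if on]
        print(tag, "sensitive to:", ", ".join(sensitive) or "nothing")
```

Note that the sketch accepts any per-position combination, whereas the eight tags on the slide always switch F, I, and C together.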
NCI/CADD Structure Identifier format: <CACTVS hashcode (E_HASHISY)>-<tag>-<version>-<checksum>
Examples: 9850FD9F9E2B4E25-FICTS-01-57, 9850FD9F9E2B4E25-FICuS-01-78, 9850FD9F9E2B4E25-uuuuu-01-27 [structure drawing]
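The four-part layout of the identifier lends itself to a small parser. The sketch below assumes, based only on the three examples shown, a 16-digit hexadecimal hashcode, a five-letter tag, and two-digit version and checksum fields. The slides do not describe how the checksum is computed, so it is extracted but not verified. `StructureIdentifier` and `parse_identifier` are hypothetical names.

```python
import re
from typing import NamedTuple


class StructureIdentifier(NamedTuple):
    hashcode: str   # CACTVS E_HASHISY hashcode (hex digits)
    tag: str        # e.g. FICTS, FICuS, uuuuu
    version: str
    checksum: str   # not validated here: the algorithm is not given in the talk


# Field widths are an assumption inferred from the examples on the slide.
_PATTERN = re.compile(r"^([0-9A-F]{16})-([A-Za-z]{5})-(\d{2})-(\d{2})$")


def parse_identifier(text):
    """Split '<hashcode>-<tag>-<version>-<checksum>' into its four fields."""
    compact = "".join(text.split())  # tolerate stray whitespace around hyphens
    match = _PATTERN.match(compact)
    if match is None:
        raise ValueError(f"not a recognizable identifier: {text!r}")
    return StructureIdentifier(*match.groups())


if __name__ == "__main__":
    print(parse_identifier("9850FD9F9E2B4E25-FICuS-01-78"))
```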
FICTS identifiers for the example structure and its variants (charged form, tautomers, isotope, salt, stereoisomers, "errors"): A3DAE0788050DDE4-FICTS, E5F83F10C5DB080A-FICTS, B2FDA68AEDA06DB9-FICTS, 9850FD9F9E2B4E25-FICTS, E5F83F10C5DB080A-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS, 6C16DE2351F9FF50-FICTS, 9850FD9F9E2B4E25-FICTS [structure drawings]
FICuS identifiers for the same set: A3DAE0788050DDE4-FICuS, E5F83F10C5DB080A-FICuS, B2FDA68AEDA06DB9-FICuS, 9850FD9F9E2B4E25-FICuS, E5F83F10C5DB080A-FICuS, E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-FICuS [structure drawings]
uuuuu identifiers for the same set: 9850FD9F9E2B4E25-uuuuu for eight of the nine entries; one entry is labeled 9850FD9F9E2B4E25-FICuS on the slide [structure drawings]
Std. InChIKeys for the same set: HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-CDYZYAPPSA-N, HNDVDQJCIGZPNO-RXMQYKEDSA-N, HNDVDQJCIGZPNO-YFKPBYRVSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, UHPNKBYGGMJTIM-UHFFFAOYSA-M, UHPNKBYGGMJTIM-UHFFFAOYSA-M [structure drawings]
Structure Normalization: Tautomers. canonical tautomer? [structure drawings]
Structure Normalization: Tautomers. Guanine, 15 tautomer drawings with their FICTS identifiers: A6199E68A788F2F5-FICTS, 959B273B619C709F-FICTS, 61248C4A7D045A47-FICTS, 675R4FCC50F45026-FICTS, 0B345B47F6625113-FICTS, 181CA9BCE3EF47F4-FICTS, 1AD375920BE60DAD-FICTS, 67196F0B20B1D934-FICTS, BCCDA7D0CDACF120-FICTS, CE8F480C11DBFC4F-FICTS, D46A1E6500B06AB6-FICTS, D979CF9770AC0BA5-FICTS, 56FFE8B5619FB01-FICTS, F802E527EC5C61BF-FICTS, EF060DA9D97091DE-FICTS. A single FICuS identifier (BCCDA7D0CDACF120-FICuS) and a single Std. InChIKey (UYTPUPDQBNUYGX-UHFFFAOYSA-N) are shown for guanine. [structure drawings]
Structure Normalization: Tautomerism & Stereochemistry. methyl propenyl ketone: Z isomer, E isomer, and their tautomers [structure drawings]
InChI/InChIKey - NCI/CADD Identifier comparison: Tautomerism & Stereochemistry. methyl propenyl ketone (Z isomer, E isomer, tautomers) [structure drawings]
76D03F08ACDF6C0C-FICuS: FICuS disregards stereochemistry on double bonds if the double bond is not located during tautomer generation.
Std. InChI and InChIKey strings shown:
InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3/b4-3+   LABTWGUMFABVFG-ONEGZZNKSA-N
InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3/b4-3-   LABTWGUMFABVFG-ARJAWSKDSA-N
InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3   LABTWGUMFABVFG-UHFFFAOYSA-N
InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4,6H,1H2,2H3/b5-4-   LYGWZVOQSCPYDG-PLNGDYQASA-N
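The InChIKeys for methyl propenyl ketone illustrate the comparison used later in the talk: keys that share the first block (14 characters) describe the same connectivity, while the second block differs with stereochemistry. The short sketch below groups the four keys from this slide by first block. The helper names are hypothetical, and the 14-10-1 block layout is that of the standard InChIKey.

```python
from collections import defaultdict


def split_inchikey(key):
    """Return the three hyphen-separated blocks of an InChIKey."""
    blocks = "".join(key.split()).split("-")  # drop stray whitespace first
    if [len(b) for b in blocks] != [14, 10, 1]:
        raise ValueError(f"unexpected InChIKey layout: {key!r}")
    return tuple(blocks)


def group_by_first_block(keys):
    """Group InChIKeys that share the same connectivity (first) block."""
    groups = defaultdict(list)
    for key in keys:
        groups[split_inchikey(key)[0]].append(key)
    return dict(groups)


if __name__ == "__main__":
    # The four keys shown on the slide for methyl propenyl ketone.
    keys = [
        "LABTWGUMFABVFG-ONEGZZNKSA-N",
        "LABTWGUMFABVFG-ARJAWSKDSA-N",
        "LABTWGUMFABVFG-UHFFFAOYSA-N",
        "LYGWZVOQSCPYDG-PLNGDYQASA-N",
    ]
    for first_block, members in group_by_first_block(keys).items():
        print(first_block, len(members))
```

The three ketone keys fall into one group and the tautomer into another, which is the case where FICuS (one identifier for all four drawings) and the Std. InChIKey disagree.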
InChI/InChIKey - NCI/CADD Identifier comparison: Tautomerism & Stereochemistry. FICTS "sees" four different structures for methyl propenyl ketone and its tautomers: 821D8C17ACE5040E-FICTS, 6EB4AA2BAA11965F-FICTS, 1677645190718885-FICTS, 76D03F08ACDF6C0C-FICTS [structure drawings]
Structure Normalization: Charges in Resonance Systems. canonical resonance structure? Hashcodes shown: F3A27F03AE77A722, F3A27F03AE77A722, 62FADCB01F197FC9, 2E011EE4519F7920. uncharge ≠ uncharge: problem! (different protonation states) [structure drawings]
Structure Normalization: Charges in Resonance Systems. shifting of charges: 5 rules; recombination of charges: 5 rules; separation of charges: 4 rules [structure drawings]
Structure Normalization: Charges in Resonance Systems. münchnones (no plausible unpolarized resonance structure can be drawn). Steps shown: 1.2 shift, 1.2 recombination, 1.2 recombination, separation (pentavalent N atom), 1.3 shift, 1.3 shift, 1.3 recombination, 1.3 shift, 1.3 shift, 1.3 shift, 1.3 shift. Identifiers shown: IUYUGWCTOLFFCL-UHFFFAOYSA-N, F68AC07DE0D3379F-FICuS [structure drawings]
InChI/InChIKey - NCI/CADD Identifier comparison. "Chemical Structure Lookup Service" database: 74 million structure records (~46 million unique structures). Sources: PubChem ~47%, ChemNavigator iResearch Library ~43%, others ~10%.
InChI/InChIKey - NCI/CADD Identifier comparison: Unique Structure Counts
Successful calculation of:
Standard InChI/InChIKey: 73.8 million records
NCI/CADD Structure Identifiers: 73.7 million records
Unique counts:
Standard InChI/InChIKey: 48,027,940
FICTS Identifier: 48,023,835
FICuS Identifier: 46,715,521
Standard InChIKey (first block): 43,055,589
uuuuu Identifier: 41,671,010
Standard InChI/InChIKeys were calculated by stdinchi-1 (Linux i-386 executable) from the original SD file records.
InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison. The original structure record set (74.2 million) is compared through two derived sets: the FICuS compound set (46.7 million unique) and the Standard InChI/InChIKey set calculated by stdinchi-1 (73.8 million, 48.0 million unique). Question 1: are there conflicts between the two sets? Question 2: does the Standard InChI/InChIKey calculated by CACTVS from the FICuS compound structure give the same InChI/InChIKey?
Detailed Comparison (1): no conflicts between Std. InChI/InChIKey and FICuS (structure records, million records)
all structure records: 73.7
FICuS linked to a single InChI/InChIKey: 62.3 (84.5%)
both linked to a single structure record: 34.4 (46.9%)
both linked to multiple structure records: 27.9 (38.0%)
Detailed Comparison (1): conflicts between Std. InChI/InChIKey and FICuS (structure records, million records)
all structure records: 73.7
FICuS is linked to multiple InChI/InChIKeys or vice versa: 10.4
one FICuS is linked to multiple InChI/InChIKeys: 3.6
one InChI/InChIKey is linked to multiple FICuS: 6.8
percentages shown alongside these values: 4.6%, 9.3%, 84.5%
number of InChIKeys first block: 0.9 (1.2%) and 2.3 (3.1%)
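The conflict categories above can be derived from a table holding one (FICuS, InChIKey) pair per structure record. The following is a minimal sketch of such a classification, not the procedure used for the talk: the function name and the toy identifiers are hypothetical, and a record falling into both conflict categories is counted separately as "both" because the slides do not say how such records were assigned.

```python
from collections import Counter, defaultdict


def classify_records(records):
    """Count structure records by how their FICuS and InChIKey are linked.

    `records` is an iterable of (ficus, inchikey) pairs, one per record.
    """
    records = list(records)
    keys_per_ficus = defaultdict(set)
    ficus_per_key = defaultdict(set)
    for ficus, inchikey in records:
        keys_per_ficus[ficus].add(inchikey)
        ficus_per_key[inchikey].add(ficus)

    counts = Counter()
    for ficus, inchikey in records:
        many_keys = len(keys_per_ficus[ficus]) > 1    # FICuS -> several InChIKeys
        many_ficus = len(ficus_per_key[inchikey]) > 1  # InChIKey -> several FICuS
        if many_keys and many_ficus:
            counts["both"] += 1
        elif many_keys:
            counts["one FICuS, multiple InChIKeys"] += 1
        elif many_ficus:
            counts["one InChIKey, multiple FICuS"] += 1
        else:
            counts["no conflict"] += 1
    return counts


if __name__ == "__main__":
    # Toy data with made-up identifiers, for illustration only.
    toy = [("F1", "K1"), ("F1", "K1"), ("F2", "K2"),
           ("F2", "K3"), ("F3", "K4"), ("F4", "K4")]
    for label, n in classify_records(toy).items():
        print(f"{label}: {n}")
```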
Detailed Comparison (2): same InChI/InChIKey after normalization? (million records)
Identifier: FICuS | FICTS | uuuuu
all compounds (unique structures): 46.7 | 48.0 | 41.6
compounds with InChI changes: 6.4 (13.7%) | 3.8 (7.9%) | 11.9 (28.6%)
all structure records: 73.7
structure records with InChI changes: 9.3 (12.7%) | 4.6 (6.2%) | 21.9 (29.7%)
vs. InChIKey first block: 3.2 (7.6%), 6.3 (8.4%)
Detailed Comparison: compound classification, occurrence in the FICuS set vs. occurrence in the FICuS subset with InChI changes.
Categories: (formal) tautomer count > 1; (formal) tautomer count > 3; (formal) tautomer count > 10; full stereo; contains metal atoms; metal complexes; salt; has resonance charges; inorganic.
[bar chart; values as listed in the transcript: 14.5%, 18.5%, 28.9%, 16.9%, 34.5%, 52.1%, 18.6%, 52.1%, 33.9%, 56.4%, 25.4%, 5.5%, 25.7%, 0.8%, 0.2%, 1.0%, 0.2%, 0.1%]
InChI/InChIKey - NCI/CADD Identifier comparison. Example: ChemBlock A3422/0145215 [structure drawing]
FICuS: 12 different structure records are linked to this structure.
Std. InChI/InChIKey (stdinchi-1): calculates 3 different strings/keys for these 12 structure records (all have the same connectivity layer/first block).
All 3 of these StdInChI/InChIKeys differ from the StdInChI/InChIKey calculated after FICuS normalization (including the connectivity layer/first block).
InChI/InChIKey - NCI/CADD Identifier comparison. ChemBlock A3422/0145215: Z and E isomers; tautomer; tautomeric interconversion? (shown twice, once between S and R stereoisomers) [structure drawings]
InChI/InChIKey - NCI/CADD Identifier comparison. How many structures? Records shown: ZINC04685909, ChemBlock A3422/0145215, ChemNavigator 47748165, NIST MS-Lib 1967005690, ChemNavigator 34903393, ChemNavigator 65635274 [structure drawings]
InChI/InChIKey - NCI/CADD Identifier comparison. How many structures? InChIKey A, InChIKey B, and InChIKey C share the same connectivity layer/block; FICuS uses a single parent structure. [structure drawings]
InChI/InChIKey - NCI/CADD Identifier comparison. Dithiazinine: original structure, best representation, and the structures as perceived by InChI and by FICuS, with E/Z double-bond labels [structure drawings]
The Adaption and Use of the IUPAC InChI/InChIKey. NCI/CADD Identifiers (FICTS, FICuS, uuuuu) and Std. InChI/InChIKey in the Chemical Structure Lookup Service (http://cactus.nci.nih.gov/lookup): 74 million structure records, 46 million unique structures.
Web Service: Chemical Structure REST Service (beta)
URL scheme: http://cactus.nci.nih.gov/chemical/structure/{identifier}/{method}
Examples:
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/names
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/ficus
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/stdinchi
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/image
http://cactus.nci.nih.gov/chemical/structure/ethanol/stdinchikey
http://cactus.nci.nih.gov/chemical/structure/64-17-5/stdinchikey
The service returns plain text or a GIF image; if the structure identifier is not resolvable, it returns an HTTP 404 status code.
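The URL scheme can be wrapped in a few lines of client code. This is a minimal sketch using only the Python standard library and the base URL and methods quoted on the slide; the helper names `structure_url` and `resolve` are hypothetical, percent-encoding of the identifier is an assumption, and the service was labeled beta at the time of the talk, so its current behavior may differ.

```python
import urllib.error
import urllib.request
from urllib.parse import quote

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def structure_url(identifier, method):
    """Build {base}/{identifier}/{method}, percent-encoding both parts."""
    # '=' is kept literal so identifiers such as 'InChIKey=...' stay readable.
    return f"{BASE_URL}/{quote(identifier, safe='=')}/{quote(method, safe='')}"


def resolve(identifier, method, timeout=10.0):
    """Return the plain-text response, or None on HTTP 404 (not resolvable).

    Only meaningful for text methods; 'image' returns GIF data instead.
    """
    try:
        with urllib.request.urlopen(structure_url(identifier, method), timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


if __name__ == "__main__":
    print(structure_url("ethanol", "stdinchikey"))
    print(structure_url("InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "smiles"))
```

Returning `None` for a 404 keeps "identifier not resolvable" distinct from network or server errors, which still raise.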
Acknowledgments
ChemNavigator: Scott Hutton, Tad Hurst
CADD Group, LMC, NCI: Marc Nicklaus, Igor V. Filippov
CACTVS, Xemistry GmbH: Wolf-Dietrich Ihlenfeldt
Thanks to all database providers
Our web site: http://cactus.nci.nih.gov

Slide deck title: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

5th Meeting on U.S. Government Chemical Databases and Open Chemistry Talk
5th Meeting on U.S. Government Chemical Databases and Open Chemistry Talk5th Meeting on U.S. Government Chemical Databases and Open Chemistry Talk
5th Meeting on U.S. Government Chemical Databases and Open Chemistry TalkMarkus Sitzmann
 
ACS San Francisco 2010 CINF Talk
ACS San Francisco 2010 CINF TalkACS San Francisco 2010 CINF Talk
ACS San Francisco 2010 CINF TalkMarkus Sitzmann
 
Increasingly Accurate Representation of Biochemistry (v2)
Increasingly Accurate Representation of Biochemistry (v2)Increasingly Accurate Representation of Biochemistry (v2)
Increasingly Accurate Representation of Biochemistry (v2)Michel Dumontier
 
Chemistry Resource FS1:15
Chemistry Resource FS1:15Chemistry Resource FS1:15
Chemistry Resource FS1:15Krystal Huffer
 
Finding Transition States Algorithmically for Automatic Reaction Mechanism Ge...
Finding Transition States Algorithmically for Automatic Reaction Mechanism Ge...Finding Transition States Algorithmically for Automatic Reaction Mechanism Ge...
Finding Transition States Algorithmically for Automatic Reaction Mechanism Ge...Richard West
 
Using the structured product labeling format to index versatile chemical data
Using the structured product labeling format to index versatile chemical dataUsing the structured product labeling format to index versatile chemical data
Using the structured product labeling format to index versatile chemical dataValery Tkachenko
 
2015 bioinformatics score_matrices_wim_vancriekinge
2015 bioinformatics score_matrices_wim_vancriekinge2015 bioinformatics score_matrices_wim_vancriekinge
2015 bioinformatics score_matrices_wim_vancriekingeProf. Wim Van Criekinge
 
2016 bioinformatics i_score_matrices_wim_vancriekinge
2016 bioinformatics i_score_matrices_wim_vancriekinge2016 bioinformatics i_score_matrices_wim_vancriekinge
2016 bioinformatics i_score_matrices_wim_vancriekingeProf. Wim Van Criekinge
 
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...AIST
 
Nucleoside libray e-conference VRX-Harry
Nucleoside libray e-conference VRX-HarryNucleoside libray e-conference VRX-Harry
Nucleoside libray e-conference VRX-HarryHarry An
 
Question #1Rank the following alkenes in order of MOST to LEAS.docx
Question #1Rank the following alkenes in order of MOST to LEAS.docxQuestion #1Rank the following alkenes in order of MOST to LEAS.docx
Question #1Rank the following alkenes in order of MOST to LEAS.docxmakdul
 
Computer-Assisted Structure Elucidation (CloudMet 2017)
Computer-Assisted Structure Elucidation (CloudMet 2017)Computer-Assisted Structure Elucidation (CloudMet 2017)
Computer-Assisted Structure Elucidation (CloudMet 2017)Christoph Steinbeck
 
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...Ken Karapetyan
 

Ähnlich wie ACS Salt Lake City 2009 CINF Talk (InChI Symposium) (20)

ICCS9 2011 Talk
ICCS9 2011 TalkICCS9 2011 Talk
ICCS9 2011 Talk
 
5th Meeting on U.S. Government Chemical Databases and Open Chemistry Talk
5th Meeting on U.S. Government Chemical Databases and Open Chemistry Talk5th Meeting on U.S. Government Chemical Databases and Open Chemistry Talk
5th Meeting on U.S. Government Chemical Databases and Open Chemistry Talk
 
ACS San Francisco 2010 CINF Talk
ACS San Francisco 2010 CINF TalkACS San Francisco 2010 CINF Talk
ACS San Francisco 2010 CINF Talk
 
Increasingly Accurate Representation of Biochemistry (v2)
Increasingly Accurate Representation of Biochemistry (v2)Increasingly Accurate Representation of Biochemistry (v2)
Increasingly Accurate Representation of Biochemistry (v2)
 
Bioinformatica t3-scoringmatrices v2014
Bioinformatica t3-scoringmatrices v2014Bioinformatica t3-scoringmatrices v2014
Bioinformatica t3-scoringmatrices v2014
 
Seton2007
Seton2007Seton2007
Seton2007
 
Chemistry Resource FS1:15
Chemistry Resource FS1:15Chemistry Resource FS1:15
Chemistry Resource FS1:15
 
Finding Transition States Algorithmically for Automatic Reaction Mechanism Ge...
Finding Transition States Algorithmically for Automatic Reaction Mechanism Ge...Finding Transition States Algorithmically for Automatic Reaction Mechanism Ge...
Finding Transition States Algorithmically for Automatic Reaction Mechanism Ge...
 
Using the structured product labeling format to index versatile chemical data
Using the structured product labeling format to index versatile chemical dataUsing the structured product labeling format to index versatile chemical data
Using the structured product labeling format to index versatile chemical data
 
2015 bioinformatics score_matrices_wim_vancriekinge
2015 bioinformatics score_matrices_wim_vancriekinge2015 bioinformatics score_matrices_wim_vancriekinge
2015 bioinformatics score_matrices_wim_vancriekinge
 
2016 bioinformatics i_score_matrices_wim_vancriekinge
2016 bioinformatics i_score_matrices_wim_vancriekinge2016 bioinformatics i_score_matrices_wim_vancriekinge
2016 bioinformatics i_score_matrices_wim_vancriekinge
 
RJM-Certificates
RJM-CertificatesRJM-Certificates
RJM-Certificates
 
Advanced Computational Drug Design
Advanced Computational Drug DesignAdvanced Computational Drug Design
Advanced Computational Drug Design
 
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
Valeri Labunets - Fast multiparametric wavelet transforms and packets for ima...
 
Nucleoside libray e-conference VRX-Harry
Nucleoside libray e-conference VRX-HarryNucleoside libray e-conference VRX-Harry
Nucleoside libray e-conference VRX-Harry
 
Arom fold
Arom foldArom fold
Arom fold
 
Computational Chemistry Robots
Computational Chemistry RobotsComputational Chemistry Robots
Computational Chemistry Robots
 
Question #1Rank the following alkenes in order of MOST to LEAS.docx
Question #1Rank the following alkenes in order of MOST to LEAS.docxQuestion #1Rank the following alkenes in order of MOST to LEAS.docx
Question #1Rank the following alkenes in order of MOST to LEAS.docx
 
Computer-Assisted Structure Elucidation (CloudMet 2017)
Computer-Assisted Structure Elucidation (CloudMet 2017)Computer-Assisted Structure Elucidation (CloudMet 2017)
Computer-Assisted Structure Elucidation (CloudMet 2017)
 
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
Standardization and Generation of Parents for Open PHACTS Chemical Registry S...
 

Kürzlich hochgeladen

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Kürzlich hochgeladen (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

  • 1. InChI/InChIKey vs. NCI/CADD Structure Identifiers: A comparison Markus Sitzmann Computer-Aided Drug Design Group (NCI/CADD), Laboratory of Medicinal Chemistry, NCI-Frederick, NIH, DHHS
  • 2. The Adaption and Use of the IUPAC InChI/InChIKey NCI/CADD Identifiers InChI/InChIKey Chemical Structure Lookup Service FICTS FICuS uuuuu Std. InChI/InChIKey 74 million structure records – 46 million unique structures
  • 3.
  • 4. Hashcodes for the example structure and its variants (charged form, tautomers, isotope, stereoisomers, salt, “errors”): A3DAE0788050DDE4, 3ECEF579D7DF025A, E92E4BA2869F3611, 8A7AD1EB498CC76A, 6C16DE2351F9FF50, 9850FD9F9E2B4E25, 8F7A1DE5A733F0E0, 60525E1AF41497B6, B2FDA68AEDA06DB9. [structure drawings omitted]
  • 5. NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures. Input structure (MDL Molfile, MDL SDF, SMILES, ChemDraw cdx, PDB) → structure normalization → parent structure (MDL SDF, SMILES, database) → hashcode calculation (E_HASHISY) → NCI/CADD Identifier.
  • 6.
  • 7. NCI/CADD Structure Identifiers: Structure Normalization. Each of five features (Fragments, Isotopes, Charges, Tautomers, Stereochemistry) is treated as either sensitive or un-sensitive. [structure drawings omitted]
  • 8. NCI/CADD Structure Identifiers: Structure Normalization. FICTS identifier: representation of the exact drawing. Sensitive to Fragments (F), Isotopes (I), Charges (C), Tautomers (T), and Stereochemistry (S). [structure drawings omitted]
  • 9. NCI/CADD Structure Identifiers: Structure Normalization. FICuS identifier: comes closest to how a chemist perceives a compound. Sensitive to Fragments (F), Isotopes (I), Charges (C), and Stereochemistry (S); un-sensitive to Tautomers (u). [structure drawings omitted]
  • 10. NCI/CADD Structure Identifier: Structure Normalization. uuuuu identifier: closely related forms of the same compound. Un-sensitive (u) to Fragments, Isotopes, Charges, Tautomers, and Stereochemistry. [structure drawings omitted]
  • 11. NCI/CADD Structure Identifier: Structure Normalization. Steps shown between the input structure and the parent structures: correct structure (add hydrogen atoms, correct functional groups, correct metal atom bonds); get largest fragment & uncharge (delete complex center, get largest organic fragment, delete radical center, uncharge structure); define canonical resonance form/protonation state; define canonical tautomer; normalize or discard stereo information; discard isotope labels. Parent structures: uuuuu, uuuuS, uuuTu, uuuTS, FICuu, FICuS, FICTS, FICTu.
  • 12. NCI/CADD Structure Identifier. Format: <CACTVS hashcode (E_HASHISY)>-<tag>-<version>-<checksum>. Examples: 9850FD9F9E2B4E25-FICTS-01-57, 9850FD9F9E2B4E25-FICuS-01-78, 9850FD9F9E2B4E25-uuuuu-01-27. [structure drawing omitted]
  • 13. FICTS identifiers for the example structure and its variants (charged form, tautomers, isotope, salt, stereoisomers, “errors”): A3DAE0788050DDE4-FICTS, E5F83F10C5DB080A-FICTS, B2FDA68AEDA06DB9-FICTS, 9850FD9F9E2B4E25-FICTS, E5F83F10C5DB080A-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS, 6C16DE2351F9FF50-FICTS, 9850FD9F9E2B4E25-FICTS. [structure drawings omitted]
  • 14. FICuS identifiers for the example structure and its variants (charged form, tautomers, isotope, salt, stereoisomers, “errors”): A3DAE0788050DDE4-FICuS, E5F83F10C5DB080A-FICuS, B2FDA68AEDA06DB9-FICuS, 9850FD9F9E2B4E25-FICuS, E5F83F10C5DB080A-FICuS, E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-FICuS. [structure drawings omitted]
  • 15. uuuuu identifiers for the example structure and its variants (charged form, tautomers, isotope, stereoisomers, salt, “errors”): 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu. [structure drawings omitted]
  • 16. Std. InChIKeys for the example structure and its variants (charged form, tautomers, isotope, stereoisomers, salt, “errors”): HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-CDYZYAPPSA-N, HNDVDQJCIGZPNO-RXMQYKEDSA-N, HNDVDQJCIGZPNO-YFKPBYRVSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, HNDVDQJCIGZPNO-UHFFFAOYSA-N, UHPNKBYGGMJTIM-UHFFFAOYSA-M, UHPNKBYGGMJTIM-UHFFFAOYSA-M. [structure drawings omitted]
  • 17. Structure Normalization Tautomers canonical tautomer ? O O OH O O OH O O O
  • 18.
  • 19.
  • 20. Structure Normalization: Tautomers. Guanine (UYTPUPDQBNUYGX-UHFFFAOYSA-N). FICTS identifiers of the drawn tautomers: A6199E68A788F2F5-FICTS, 959B273B619C709F-FICTS, 61248C4A7D045A47-FICTS, 675R4FCC50F45026-FICTS, 0B345B47F6625113-FICTS, 181CA9BCE3EF47F4-FICTS, 1AD375920BE60DAD-FICTS, 67196F0B20B1D934-FICTS, BCCDA7D0CDACF120-FICTS, CE8F480C11DBFC4F-FICTS, D46A1E6500B06AB6-FICTS, D979CF9770AC0BA5-FICTS, 56FFE8B5619FB01-FICTS, F802E527EC5C61BF-FICTS, EF060DA9D97091DE-FICTS. FICuS identifier: BCCDA7D0CDACF120-FICuS. [structure drawings omitted]
  • 21. Tautomerism & Stereochemistry methyl propenyl ketone Structure Normalization O Z O E
  • 22. tautomer tautomer methyl propenyl ketone Structure Normalization Tautomerism & Stereochemistry O Z O E O H
  • 23. InChI/InChIKey - NCI/CADD Identifier comparison: Tautomerism & Stereochemistry. Methyl propenyl ketone (Z and E forms and tautomers): 76D03F08ACDF6C0C-FICuS. FICuS disregards stereochemistry on double bonds if the double bond is not located during tautomer generation. [structure drawings omitted]
  • 24. InChI/InChIKey - NCI/CADD Identifier comparison: Tautomerism & Stereochemistry. Methyl propenyl ketone (Z and E forms and tautomers): 76D03F08ACDF6C0C-FICuS. FICuS disregards stereochemistry on double bonds if the double bond is not located during tautomer generation. Std. InChI/InChIKey pairs: InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3/b4-3+ (LABTWGUMFABVFG-ONEGZZNKSA-N); InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4,6H,1H2,2H3/b5-4- (LYGWZVOQSCPYDG-PLNGDYQASA-N); InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3/b4-3- (LABTWGUMFABVFG-ARJAWSKDSA-N); InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3 (LABTWGUMFABVFG-UHFFFAOYSA-N). [structure drawings omitted]
  • 25. InChI/InChIKey - NCI/CADD Identifier comparison: Tautomerism & Stereochemistry. Methyl propenyl ketone (Z and E forms and tautomers): FICTS “sees” four different structures: 821D8C17ACE5040E-FICTS, 6EB4AA2BAA11965F-FICTS, 1677645190718885-FICTS, 76D03F08ACDF6C0C-FICTS. [structure drawings omitted]
  • 26. Charges in Resonance Systems Structure Normalization F3A27F03AE77A722 F3A27F03AE77A722 62FADCB01F197FC9 canonical resonance structure? uncharge ≠ uncharge problem! 2E011EE4519F7920 different protonation states N N H N N H H N N H N N H H
  • 27.
  • 28. Structure Normalization (no plausible unpolarized resonance structure can be drawn) münchnones: 1.2 shift 1.2 recombination 1.2 recombination separation (pentavalent N atom) 1.3 shift 1.3 shift 1.3 recombination 1.3 shift 1.3 shift 1.3 shift 1.3 shift Charges in Resonance Systems IUYUGWCTOLFFCL-UHFFFAOYSA-N F68AC07DE0D3379F -FICuS N O O N O O N O O N O O N O O N O O N O O N O O
  • 29.
  • 30.
  • 31. InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison. Original structure record set (74.2 million); FICuS compound set (46.7 million unique); Standard InChI/InChIKey set calculated by stdinchi-1 (73.8 million, 48.0 million unique).
  • 32. InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison. Original structure record set (74.2 million); FICuS compound set (46.7 million unique); Standard InChI/InChIKey set calculated by stdinchi-1 (73.8 million, 48.0 million unique). (1) Conflicts?
  • 33. InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison. Original structure record set (74.2 million); FICuS compound set (46.7 million unique); Standard InChI/InChIKey set calculated by stdinchi-1 (73.8 million, 48.0 million unique). (1) Conflicts? (2) Standard InChI/InChIKey calculated by CACTVS from the FICuS compound structure: same InChI/InChIKey?
  • 34. InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison (1). No conflicts between Std. InChI/InChIKey and FICuS. Structure records (million records): all structure records 73.7; FICuS linked to a single InChI/InChIKey 62.3 (84.5%); both linked to a single structure record 34.4 (46.9%); both linked to multiple structure records 27.9 (38.0%).
  • 35. conflicts between Std. InChI/InChIKey and FICuS Detailed Comparison InChI/InChIKey - NCI/CADD Identifier comparison structure records (million records) all structure records FICuS is linked to multiple InChI/InChIKeys or vice versa one FICuS is linked to multiple InChI/InChIKeys one InChI/InChIKey is linked to multiple FICuS 10.4 3.6 6.8 (4.6%) (9.3%) (84.5%) 73.7 1
  • 36. conflicts between Std. InChI/InChIKey and FICuS Detailed Comparison InChI/InChIKey - NCI/CADD Identifier comparison structure records (million records) all structure records FICuS is linked to multiple InChI/InChIKeys or vice versa one FICuS is linked to multiple InChI/InChIKeys one InChI/InChIKey is linked to multiple FICuS 10.4 3.6 6.8 (4.6%) (9.3%) (84.5%) 73.7 number of InChIKeys first block 0.9 number of InChIKeys first block 2.3 (1.2%) (3.1%) 1
  • 37. Detailed Comparison FICuS FICTS uuuuu 46.7 48.0 41.6 6.4 (13.7%) 3.8 (7.9%) 11.9 (28.6%) compounds (unique structures) (million records) all compounds 73.7 9.3 4.6 (29.7%) 21.9 (6.2%) (12.7%) structure records (million records) all records InChI/InChIKey - NCI/CADD Identifier comparison same InChI/InChIKey? InChI changes InChI changes 2
  • 38. Detailed Comparison FICuS FICTS uuuuu 46.7 48.0 41.6 6.4 (13.7%) 3.8 (7.9%) 11.9 (28.6%) compounds (unique structures) (million records) all compounds structure records (million records) all records InChI/InChIKey - NCI/CADD Identifier comparison 3.2 6.3 (7.6%) (8.4%) vs. InChIKey first block InChI changes InChI changes same InChI/InChIKey? 73.7 9.3 4.6 (29.7%) 21.9 (6.2%) (12.7%) 2
  • 39. (formal) tautomer count > 1 (formal) tautomer count > 3 (formal) tautomer count > 10 full stereo contains metal atoms metal complexes salt has resonance charges inorganic compound classification 14.5% 18.5% 28.9% 16.9% 34.5% 52.1% 18.6% 52.1% 33.9% 56.4% 25.4% 5.5% 25.7% 0.8% 0.2% 1.0% 0.2% 0.1% Detailed Comparison InChI/InChIKey - NCI/CADD Identifier comparison occurrence in FICuS set occurrence in FICuS subset ( InChI changes )
  • 40. InChI/InChIKey - NCI/CADD Identifier comparison. Example: ChemBlock A3422/0145215. FICuS: 12 different structure records are linked to this structure. Std. InChI/InChIKey (stdinchi-1): calculates 3 different strings/keys for these 12 structure records (all have the same connectivity layer/first block). All 3 of these StdInChI/InChIKeys differ from the StdInChI/InChIKey calculated after FICuS normalization (including the connectivity layer/first block). [structure drawing omitted]
  • 41. H N O N N H O O N O N O O N H Z E InChI/InChIKey - NCI/CADD Identifier comparison ChemBlock A3422/0145215 H N O N N H O O
  • 42. H N O N N H O O N O N O O N H Z E tautomer: InChI/InChIKey - NCI/CADD Identifier comparison H N O N N H O O ChemBlock A3422/0145215 N O N N H O O
  • 43. H N O N N H O O N O N O O N H Z E tautomer: tautomeric interconversion? InChI/InChIKey - NCI/CADD Identifier comparison ChemBlock A3422/0145215 H N O N O O N H H N O N N H O O N O N N H O O
  • 44. H N O N N H O O N O N O O N H Z E tautomer: tautomeric interconversion? tautomeric interconversion? S R InChI/InChIKey - NCI/CADD Identifier comparison ChemBlock A3422/0145215 H N O N O O N H H N O N N H O O N O N N H O O N O N N H O O N O N N H O O
  • 45. H N O N N H O O N O N O O N H Z E tautomer: tautomeric interconversion? tautomeric interconversion? InChI/InChIKey - NCI/CADD Identifier comparison ChemBlock A3422/0145215 S R H N O N O O N H H N O N N H O O N O N N H O O N O N N H O O N O N N H O O
  • 46. H N O N N H O O N O N O O N H Z E tautomer: tautomeric interconversion? tautomeric interconversion? S R InChI/InChIKey - NCI/CADD Identifier comparison ChemBlock A3422/0145215 N O N N H O O How many structures? ZINC04685909 ChemBlock A3422/0145215 ChemNavigator 47748165 NIST MS-Lib 1967005690 ChemNavigator 34903393 ChemNavigator 65635274 H N O N O O N H H N O N N H O O N O N N H O O N O N N H O O
  • 47. H N O N N H O O N O N O O N H Z E tautomer: tautomeric interconversion? tautomeric interconversion? S R InChI/InChIKey - NCI/CADD Identifier comparison ChemBlock A3422/0145215 N O N N H O O How many structures? InChIKey A InChIKey B InChIKey C same connectivity layer/block FICuS parent structure H N O N O O N H H N O N N H O O N O N N H O O N O N N H O O
  • 48. Dithiazinine InChI/InChIKey - NCI/CADD Identifier comparison S N S N I original structure
  • 49. Dithiazinine InChI/InChIKey - NCI/CADD Identifier comparison S N S N I best representation S N S N I original structure
  • 50. Dithiazinine InChI/InChIKey - NCI/CADD Identifier comparison S N S N I S N S N H I H H H H H S N S N I H H H best representation InChI FICuS Z E E Z E S N S N I original structure
  • 51. The Adaption and Use of the IUPAC InChI/InChIKey NCI/CADD Identifiers InChI/InChIKey FICTS FICuS uuuuu Std. InChI/InChIKey 74 million structure records – 46 million unique structures http://cactus.nci.nih.gov/lookup Chemical Structure Lookup Service
  • 52. Web Service: Chemical Structure REST Service (beta). URL scheme: http://cactus.nci.nih.gov/chemical/structure/{identifier}/{method}. Examples: http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles, http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/names, http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/ficus, http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/stdinchi, http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/image, http://cactus.nci.nih.gov/chemical/structure/ethanol/stdinchikey, http://cactus.nci.nih.gov/chemical/structure/64-17-5/stdinchikey. Returns plain text or a GIF image; if the structure identifier is not resolvable, the service returns an HTTP 404 status code.
  • 53. Acknowledgments. ChemNavigator: Scott Hutton, Tad Hurst. CADD Group, LMC, NCI: Marc Nicklaus, Igor V. Filippov. CACTVS, Xemistry GmbH: Wolf-Dietrich Ihlenfeldt. Thanks to all database providers. Our web site: http://cactus.nci.nih.gov
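
The identifier layout on slide 12 (hashcode, tag, version, checksum) and the {identifier}/{method} URL scheme on slide 52 can be sketched in a few lines of Python. This is only an illustration, not the service's own code: the names `parse_identifier` and `build_url` are hypothetical, and the regular expression assumes a 16-digit uppercase hex hashcode, one of the eight tags listed on slide 11, and two-digit version and checksum fields, as the slide examples suggest. The checksum is split out but not verified, because the slides do not describe how it is computed.

```python
import re
from urllib.parse import quote

# Base URL taken from the URL scheme on slide 52.
BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"

# Assumed pattern (inferred from the slide examples, not a published grammar):
# 16 uppercase hex digits, one of the eight tags from slide 11,
# then a two-digit version and a two-digit checksum.
IDENTIFIER_RE = re.compile(
    r"^(?P<hashcode>[0-9A-F]{16})-(?P<tag>(?:FIC|uuu)[Tu][Su])"
    r"-(?P<version>\d{2})-(?P<checksum>\d{2})$"
)


def parse_identifier(text):
    """Split a full NCI/CADD identifier string into its four named parts."""
    match = IDENTIFIER_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a full NCI/CADD identifier: {text!r}")
    return match.groupdict()


def build_url(identifier, method):
    """Build a request URL following the {identifier}/{method} scheme."""
    # Percent-encode anything unsafe in a path segment,
    # but keep '=' so that "InChIKey=..." stays readable.
    return f"{BASE_URL}/{quote(identifier, safe='=')}/{quote(method, safe='')}"


if __name__ == "__main__":
    print(parse_identifier("9850FD9F9E2B4E25-FICTS-01-57"))
    print(build_url("ethanol", "stdinchikey"))
```

A caller would fetch the resulting URL with any HTTP client and treat a 404 status as "identifier not resolvable", as slide 52 states.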