Implementing ISO 11238 standard compliance with ChemAxon tools
Roger Sayle
NextMove Software, Cambridge, UK
ChemAxon User Group Meeting 2014, Budapest, Hungary, Wednesday 21st May 2014
What is ISO 11238?
• ISO standard 11238 entitled “Health Informatics –
Identification of medicinal products – Data elements
and structures for the unique identification and
exchange of regulated information on substances”.
• Defines a framework for uniquely identifying and
exchanging compounds of pharmaceutical interest.
• The framework serves a similar role to CAS registry
numbers, PubChem CID or InChI-Key, assigning
unique identifiers to substances.
Meet the (IDMP) family
• 11238 is one of a suite of 5 related standards, all for
“unique identification and exchange of …”
– 11238 “… regulated information on substances”.
– 11239 “… dose forms, units, administration, etc.”.
– 11240 “… units of measurement”.
– 11615 “… regulated medicinal product information”.
– 11616 “… regulated pharmaceutical product information”.
Why is 11238 important?
• EU regulation 520/2012 on “pharmacovigilance”
requires countries, regulatory authorities and
pharma to adopt the 5 IDMP standards (articles 25
and 26) by 1st July 2016 (article 40).
• Executive summary: It’s the law!
How it works
• Code Assignment (Authority): Name/Identifier, Connection Table and Properties (Significant Text) go in, a Unique Code comes out.
• Code Look-up (Authority): a Unique Code goes in, Name/Identifier, Connection Table and Properties (Significant Text) come out.
Likely implementation
• Same Code Assignment (Authority) and Code Look-up (Authority) flow, exchanging Name/Identifier, Connection Table, Properties (Significant Text) and Unique Code.
• Diagram annotations: FDA UNII, FDA SRS Search, XML, INN/USAN/CID, FDA/NCATS GInAS, MOL2000/SMILES/InChI, Protein/NA Sequence, ISO11238 Groups 1-4.
Current status
• The standard has been ratified and its use has been
written into EU law (EU Reg. 520/2012).
• Framework requires use of non-semantic, random,
fixed length unique identifiers, that include an
internal integrity check.
• The standard also details constraints on uniqueness.
• Exact implementation details yet to be determined
(to appear in a future “Implementation Guide”).
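A minimal sketch of what a "non-semantic, random, fixed length" identifier with an internal integrity check could look like. Everything here is an assumption made for illustration: the class name, the 36-character alphabet, the width of 9 random characters plus 1 check character, and the weighted-sum check are hypothetical and are not taken from ISO 11238 or from the FDA UNII specification.

```java
import java.security.SecureRandom;

/** Hypothetical identifier scheme, for illustration only. */
final class RandomSubstanceId {
    private static final String ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final int BODY_LENGTH = 9; // assumed: 9 random characters + 1 check character

    /** Position-weighted sum of character values, modulo 36, mapped back into the alphabet. */
    static char checkChar(String body) {
        int sum = 0;
        for (int i = 0; i < body.length(); i++) {
            int value = ALPHABET.indexOf(body.charAt(i));
            if (value < 0) {
                throw new IllegalArgumentException("bad character: " + body.charAt(i));
            }
            sum += (i + 1) * value;
        }
        return ALPHABET.charAt(sum % ALPHABET.length());
    }

    /** Draws a random body (no meaning, no sequence) and appends its check character. */
    static String newId(SecureRandom rng) {
        StringBuilder sb = new StringBuilder(BODY_LENGTH + 1);
        for (int i = 0; i < BODY_LENGTH; i++) {
            sb.append(ALPHABET.charAt(rng.nextInt(ALPHABET.length())));
        }
        sb.append(checkChar(sb.toString()));
        return sb.toString();
    }

    /** True when the identifier has the fixed width and its check character matches. */
    static boolean isValid(String id) {
        if (id == null || id.length() != BODY_LENGTH + 1) {
            return false;
        }
        for (int i = 0; i < id.length(); i++) {
            if (ALPHABET.indexOf(id.charAt(i)) < 0) {
                return false;
            }
        }
        return checkChar(id.substring(0, BODY_LENGTH)) == id.charAt(BODY_LENGTH);
    }

    public static void main(String[] args) {
        String id = newId(new SecureRandom());
        System.out.println(id + " valid=" + isValid(id));
    }
}
```

A real authority would also have to guarantee uniqueness, for example by rejecting a freshly drawn code that is already assigned; that step is outside this sketch.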
What will the future look like?
• ISO11238 compliant identifiers will be very similar to
the FDA’s UNII (UNique Ingredient Identifier).
• The fixed width non-semantic identifier requirement
rules out the use of plain SMILES, InChI, V2000 Mol
file and similar encodings.
• The random requirement rules out plain CAS registry
numbers, PubChem CIDs and ChEMBL IDs (which use
sequential or monotonic number assignment).
• Alternatively, InChI keys or similar hashes (with [CRC]
checks) of connection tables+text may be possible.
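The last bullet can be illustrated with a short sketch: hash the connection table together with the significant text, keep a fixed number of hex digits, and append a CRC of those digits as the integrity check. The class name, the SHA-256/CRC-32 choice, the 16-digit width and the "-" separator are assumptions for illustration; this is neither the InChIKey algorithm nor anything mandated by the standard.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

/** Hypothetical hash-based identifier: 16 hex digits of SHA-256 plus a CRC-32 suffix. */
final class HashedSubstanceId {

    static String idFor(String connectionTable, String significantText) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            sha.update(connectionTable.getBytes(StandardCharsets.UTF_8));
            sha.update((byte) 0); // separator, so ("ab","c") and ("a","bc") hash differently
            sha.update(significantText.getBytes(StandardCharsets.UTF_8));
            byte[] digest = sha.digest();
            StringBuilder body = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                body.append(String.format("%02X", digest[i] & 0xFF));
            }
            return body + "-" + crcHex(body.toString());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    /** CRC-32 of the ASCII text, rendered as 8 upper-case hex digits. */
    static String crcHex(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.US_ASCII));
        return String.format("%08X", crc.getValue());
    }

    /** Checks the fixed layout (16 digits, '-', 8 digits) and recomputes the CRC. */
    static boolean isIntact(String id) {
        return id.length() == 25
                && id.charAt(16) == '-'
                && crcHex(id.substring(0, 16)).equals(id.substring(17));
    }

    public static void main(String[] args) {
        String id = idFor("C1=CC=CC=C1", "benzene");
        System.out.println(id + " intact=" + isIntact(id));
    }
}
```

Unlike a random code, a hash is reproducible from its inputs, so the result depends entirely on the connection table being canonical before hashing.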
What’s available now
• ISO charges for access to official standards documents
(which is why 5 IDMP standards is more profitable
than one), about 158 CHF ($177 USD) from ISO for
11238 [between $120 and $340 online].
• However, as with many ISO standards, late drafts of
ISO 11238 are freely available on the internet.
• Caution: Many of the technical examples (all XML)
were removed from the final standard and are due to
appear in the upcoming “Implementation Guide(s)”.
Example requirement
• §3.4 “Naming of substances” states “at least one
substance name or company code shall be associated
with each substance”.
• For the envisioned workflows this typically assumes
an INN or USAN name has already been assigned.
• One way to guarantee the existence of a suitable
substance name for investigational compounds is to
use IUPAC naming software (such as ChemAxon’s)
during submission to the unique coding authority.
• Plug: ChemAxon s2n coverage is state-of-the-art.
The devil is in the details
• One of the interesting cheminformatics challenges
of working with the published ISO standard and
the examples from the draft annex is the typography.
• The document has been typeset by editors with
expertise outside the field of cheminformatics who
have inadvertently changed whitespace without
appreciating the impact this has on chemistry tools.
Final ISO11238 standard Annex A
• §A.2.3 SMILES uses the example “C1 = CC = CC = C1”
where the spurious spaces create problems for
SMILES readers.
• §A.2.4 InChI both strips the “InChI=” prefix and again
suffers from spaces “1/C6H6 /c1-2-4-6-5-3-1/h1-6H”.
– Interestingly this is an old InChI not a standard InChI.
• §A.2.2 Molfile fails to mention that V2000 mol files
use fixed-width columns and blank lines; as a result
the example given in the text [next slide] can’t easily be
read.
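The SMILES and InChI problems in the first two bullets are simple enough to undo with plain string clean-up. The sketch below is an illustration under stated assumptions, not the method used in the talk: the class and method names are hypothetical, and removing every space is only safe when the whole input is known to be a single SMILES or InChI string (in a normal SMILES file, a space ends the SMILES and starts the title).

```java
/** Hypothetical helpers for the Annex A examples. */
final class AnnexACleanup {

    /** Removes all whitespace; assumes the entire input is one SMILES string. */
    static String stripSpacesFromSmiles(String s) {
        return s.replaceAll("\\s+", "");
    }

    /** Removes all whitespace and restores the "InChI=" prefix when it is missing. */
    static String repairInchi(String s) {
        String compact = s.replaceAll("\\s+", "");
        return compact.startsWith("InChI=") ? compact : "InChI=" + compact;
    }

    public static void main(String[] args) {
        System.out.println(stripSpacesFromSmiles("C1 = CC = CC = C1"));      // C1=CC=CC=C1
        System.out.println(repairInchi("1/C6H6 /c1-2-4-6-5-3-1/h1-6H"));     // InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H
    }
}
```

The mol file case in the third bullet cannot be handled this way, because there the position of each space and line break carries meaning.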
Annex A: example.mol
ACD/Labs0812062058
6 6 0 0 0 0 0 0 0 0 1 V2000
1.9050 −0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.9050 −2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7531 −0.1282 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7531 −2.7882 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
−0.3987 −0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
−0.3987 −2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 0 0 0 0
3 1 2 0 0 0 0
4 2 2 0 0 0 0
5 3 1 0 0 0 0
6 4 1 0 0 0 0
6 5 2 0 0 0 0
M END
$$$$
• Missing blank lines
• Incorrectly aligned columns
Benefit of the doubt?
• These unintentional typographical errors in the
normative text may perhaps be the result of poor
fonts, with the exception of “InChI=”.
• Alas, the content of the original Annex B from the
draft indicates these issues were more widespread
and may arise from ignorance of cheminformatics
file formats.
§B.2.2 InChI in XML Example
<STRUCTURAL_REPRESENTATION_TYPE>INCHI</STRUCTURAL_REPRESENTATION_TYPE>
<STRUCTURAL_REPRESENTATION>1S/C2H5NO2.AL.CLH.2H2O.ZR/C3-1-
2(4)5;;;;;/H1,3H2,(H,4,5);;1H;2*1H2;/Q;+3;;;;+4/P-
2</STRUCTURAL_REPRESENTATION>
• Missing “InChI=”
• Standard and Non-Standard InChI?
• Converted to upper case
• Indentation
• Spurious spaces
• Line breaks
§B.2.4 V2000 Mol File in XML Example
<STRUCTURAL_REPRESENTATION_TYPE>MOL</STRUCTURAL_REPRESENTATION_TYPE>
<STRUCTURAL_REPRESENTATION>30 29 0 0 0 0 0 0 0 0999 V2000 9.9563 -7.3055 0.0000 Y
1 1 0 0 0 0 0 0 0 0 0 0 15.0355 -4.8847 0.0000 * 0 0 0 0 0 0 0 0 0 0 0 0 13.3609 -
8.0134 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 13.8867 -9.9869 0.0000 O 0 5 0 0 0 0 0 0 0 0 0
0 6.4178 -6.8678 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 5.8872 -4.8955 0.0000 O 0 5 0 0 0 0
0 0 0 0 0 0 6.7218 -5.7285 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 13.0541 -9.1519 0.0000 C
0 0 0 0 0 0 0 0 0 0 0 0 13.3408 -6.8634 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 13.8599 -
4.8881 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 13.0301 -5.7260 0.0000 C 0 0 0 0 0 0 0 0 0 0 0
0 5.9099 -9.9441 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0 6.4492 -7.9743 0.0000 O 0 0 0 0 0 0
0 0 0 0 0 0 6.7482 -9.1149 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.8605 -5.4221 0.0000 C 0
0 0 0 0 0 0 0 0 0 0 0 11.8897 -5.4263 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 11.9147 -9.4555
0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.8855 -9.4263 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
7.6897 -8.0305 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.6897 -6.8513 0.0000 C 0 0 0 0 0 0 0
0 0 0 0 0 8.7018 -6.2618 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 9.2908 -5.2506 0.0000 C 0 0
0 0 0 0 0 0 0 0 0 0 10.4700 -5.2524 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 11.0577 -6.2664
0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 12.0761 -6.8427 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
12.0891 -8.0218 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 8.7257 -8.5952 0.0000 N 0 0 0 0 0 0
0 0 0 0 0 0 11.0839 -8.6223 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 10.4848 -9.6275 0.0000
C 0 0 0 0 0 0 0 0 0 0 0 0 9.3057 -9.6139 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 10 2 1 0 0 0 0
8 3 2 0 0 0 0 25 24 1 0 0 0 0 8 4 1 0 0 0 0 27 18 1 0 0 0 0 7 5 2 0 0 0 0 26 28 1 0 0 0 0
7 6 1 0 0 0 0 19 27 1 0 0 0 0 15 7 1 0 0 0 0 20 21 1 0 0 0 0 17 8 1 0 0 0 0 30 27 1 0 0 0
0 11 9 2 0 0 0 0 30 29 1 0 0 0 0 11 10 1 0 0 0 0 20 19 1 0 0 0 0 16 11 1 0 0 0 0 22 21 1
0 0 0 0 14 12 1 0 0 0 0 23 24 1 0 0 0 0 14 13 2 0 0 0 0 18 14 1 0 0 0 0 26 25 1 0 0 0 0
21 15 1 0 0 0 0 29 28 1 0 0 0 0 24 16 1 0 0 0 0 23 22 1 0 0 0 0 28 17 1 0 0 0 0 M CHG 4
1 3 4 -1 6 -1 12 -1 M ISO 1 1 90 M END </STRUCTURAL_REPRESENTATION>
Where to begin?
All is not lost!
• Back at the 2011 ChemAxon UGM here in Budapest,
Sorel Muresan from AstraZeneca Sweden gave a
presentation on how spelling correction improves
the recall of ChemAxon’s name-to-structure tools.
• The exact same CaffeineFix technology can be
applied to perform aggressive “spelling correction”
on SMILES strings, InChI and V2000 mol files.
• As with IUPAC-like systematic names, these can each
be specified by a formal grammar.
How the algorithm works
• The regular expression describing a V2000 mol file is
compiled into a “finite state machine” with 1333
states.
• The only allowed “corrections” are the deletion of
new lines and the insertion of spaces or new lines,
but only where permitted in the grammar/FSM.
• Depth-first recursion is used to identify a minimal set
of edits to correct the input.
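The idea in the bullets above can be sketched on a toy grammar. This is an illustration only, not the CaffeineFix code: instead of the 1333-state V2000 machine, the hypothetical state machine below accepts lines made of two right-aligned 3-column integer fields followed by a newline. The allowed corrections are the same three (delete a newline, insert a space, insert a newline), and a depth-first search is rerun with a growing edit budget so that the first repair found uses a minimal number of edits.

```java
/** Toy whitespace repair: lines are two 3-column right-aligned integers plus '\n'. */
final class WhitespaceRepair {

    // State = 2 * column + (1 if a digit was already seen in the current field).
    // Column 6 expects the newline; state 0 (start of a line) is the accepting state.
    static int step(int state, char c) {
        int col = state / 2;
        boolean digitSeen = state % 2 == 1;
        if (col == 6) {
            return c == '\n' ? 0 : -1;
        }
        boolean lastColumn = col % 3 == 2;
        if (c == ' ') {
            // Padding may only precede the digits, and a field cannot be all spaces.
            return (!digitSeen && !lastColumn) ? (col + 1) * 2 : -1;
        }
        if (c >= '0' && c <= '9') {
            return (col + 1) * 2 + (lastColumn ? 0 : 1);
        }
        return -1;
    }

    /** Depth-first search; returns the repaired text, or null if the budget is too small. */
    static String search(String in, int i, int state, int budget) {
        if (i == in.length() && state == 0) {
            return "";
        }
        if (i < in.length()) {
            char c = in.charAt(i);
            int next = step(state, c);
            if (next >= 0) { // keep the character as it is
                String rest = search(in, i + 1, next, budget);
                if (rest != null) return c + rest;
            }
            if (budget > 0 && c == '\n') { // correction: delete a newline
                String rest = search(in, i + 1, state, budget - 1);
                if (rest != null) return rest;
            }
        }
        if (budget > 0) { // corrections: insert a space or a newline where the machine allows it
            for (char ins : new char[] {' ', '\n'}) {
                int next = step(state, ins);
                if (next >= 0) {
                    String rest = search(in, i, next, budget - 1);
                    if (rest != null) return ins + rest;
                }
            }
        }
        return null;
    }

    /** Tries budgets 0, 1, 2, ... so the first hit uses the fewest edits. */
    static String repair(String in, int maxEdits) {
        for (int budget = 0; budget <= maxEdits; budget++) {
            String fixed = search(in, 0, 0, budget);
            if (fixed != null) return fixed;
        }
        return null;
    }

    public static void main(String[] args) {
        // Collapsed padding on the first line: three spaces are re-inserted.
        System.out.print(repair("1 2\n 10 11\n", 5));
    }
}
```

Because the grammar is fixed-width, the machine itself determines where missing spaces must go; the search only has to decide which of the few permitted edits to spend.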
§B.2.4 example after correction
30 29 0 0 0 0 0 0 0 0999 V2000
9.9563 -7.3055 0.0000 Y 1 1 0 0 0 0 0 0 0 0 0 0
15.0355 -4.8847 0.0000 * 0 0 0 0 0 0 0 0 0 0 0 0
13.3609 -8.0134 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
13.8867 -9.9869 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0
6.4178 -6.8678 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
5.8872 -4.8955 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0
6.7218 -5.7285 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
13.0541 -9.1519 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
13.3408 -6.8634 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
13.8599 -4.8881 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
...
21 15 1 0 0 0 0
29 28 1 0 0 0 0
24 16 1 0 0 0 0
23 22 1 0 0 0 0
28 17 1 0 0 0 0
M CHG 4 1 3 4 -1 6 -1 12 -1
M ISO 1 1 90
M END
• 3-line header block before the counts line
ChemAxon toolkit implementation
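import chemaxon.formats.MolFormatException;
import chemaxon.formats.MolImporter;
import chemaxon.struc.Molecule;

public final class MolFileReader {  // wrapper class name is arbitrary
    // Try a normal import first; only on failure repair the text and retry.
    public static Molecule molFileToChemaxonMol(String molFileStr)
            throws MolFormatException {
        try {
            return MolImporter.importMol(molFileStr);
        } catch (MolFormatException e) {
            // FixMolFile is the repair class from the forum post linked below.
            String fixed = FixMolFile.fixMolFile(molFileStr);
            if (fixed == null) {
                throw e;  // not repairable: report the original error
            }
            return MolImporter.importMol(fixed);
        }
    }
}
// Java source code available at http://www.chemaxon.com/forum/ftopic1265.html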
Geek of the week
• A particularly tricky corner case concerns Accelrys’
Pipeline Pilot-style V2000 mol files which abbreviate
the columns in the atom block (to save space).
• In these files there’s potential ambiguity where the
first bond line is mistaken as a continuation of the
last (abbreviated) atom line.
• Our solution relies on the atom stereo care field
being zero in non-query mol files vs. the non-zero
values that appear in the first three fields of bonds.
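One possible reading of the last bullet, as a hedged sketch rather than the talk's actual code: when the next three integer fields could either continue an abbreviated atom line or start the bond block, treat them as a bond only if all three are plausible non-zero bond fields (first atom, second atom, bond type). The class name, method name and the exact range checks are assumptions for illustration.

```java
/** Hypothetical disambiguation between an atom-line continuation and the first bond line. */
final class AtomBondBoundary {

    /**
     * Returns true when the three fields look like "atom1 atom2 bondType".
     * In a non-query mol file the trailing atom fields are expected to be zero,
     * so any zero among the three suggests the atom line is simply continuing.
     */
    static boolean looksLikeBondStart(int f1, int f2, int f3, int atomCount) {
        boolean validAtoms = f1 >= 1 && f1 <= atomCount
                && f2 >= 1 && f2 <= atomCount
                && f1 != f2;
        boolean validType = f3 >= 1 && f3 <= 8; // V2000 bond types run from 1 to 8
        return validAtoms && validType;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeBondStart(2, 1, 1, 6)); // true: the first bond of the benzene example
        System.out.println(looksLikeBondStart(0, 0, 0, 6)); // false: trailing zero atom fields
    }
}
```

Query mol files can carry non-zero values in the trailing atom fields, which is why the bullet restricts the argument to non-query files.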
Lest we forget
• A similar “spelling correction” variant that allows
uppercase characters to be mapped to lowercase,
and the prefix “InChI=” to magically appear at the
start of a string can also be used to fix ISO InChIs.
• Alas uppercasing an InChI (or any molecular formula)
is potentially lossy, e.g. “CsN” vs. “CSn”.
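The lossiness is easy to demonstrate by enumerating every way an upper-cased run of letters can be split back into element symbols. The sketch below is illustrative only: the class name is hypothetical, the element table is a deliberately tiny subset of the periodic table, and atom counts are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Enumerates the element-symbol readings of an upper-cased formula fragment. */
final class FormulaCaseAmbiguity {
    // Tiny illustrative subset of the periodic table, enough for the example above.
    private static final Set<String> ELEMENTS = Set.of("C", "H", "N", "O", "S", "Cl", "Cs", "Sn");

    static List<String> readings(String upper) {
        List<String> out = new ArrayList<>();
        collect(upper, 0, new StringBuilder(), out);
        return out;
    }

    private static void collect(String upper, int pos, StringBuilder prefix, List<String> out) {
        if (pos == upper.length()) {
            out.add(prefix.toString());
            return;
        }
        // Try a one-letter symbol first, then a two-letter symbol.
        for (int len = 1; len <= 2 && pos + len <= upper.length(); len++) {
            String symbol = upper.charAt(pos)
                    + upper.substring(pos + 1, pos + len).toLowerCase(Locale.ROOT);
            if (ELEMENTS.contains(symbol)) {
                int mark = prefix.length();
                prefix.append(symbol);
                collect(upper, pos + len, prefix, out);
                prefix.setLength(mark); // backtrack
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(readings("CSN")); // [CSN, CSn, CsN]
        System.out.println(readings("CLN")); // [ClN]
    }
}
```

With this subset "CLN" has a single reading, while "CSN" has three, so an upper-cased formula cannot always be restored without extra information such as the rest of the InChI.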
Before and after InChI example
Before: 1S/C17H21CLN4O/C1-22-12-3-2-4-13(22)8-11(7-12)21-17(23)14-5-10(18)6-15-16(14)20-9-19-15/H5-6,9,11-13H,2-4,7-8H2,1H3,(H,19,20)(H,21,23)
After: InChI=1S/C17H21ClN4O/c1-22-12-3-2-4-13(22)8-11(7-12)21-17(23)14-5-10(18)6-15-16(14)20-9-19-15/h5-6,9,11-13H,2-4,7-8H2,1H3,(H,19,20)(H,21,23)
How common are the ambiguities?
• 1.35 million standard InChIs from ChEMBL
• Uppercase the InChIs, fix them and check
whether the original InChI can be regenerated
• 99.5% roundtrip (6596 discrepancies)
InChI case-insensitive ambiguities
Conclusions
• The Java source code for recovering V2000 mol files
and InChIs from the types of corruption seen in the
ISO 11238 draft has now been contributed to the
ChemAxon forum, allowing Marvin and JChem to
read the examples given in that document.
• Whether this functionality will be required to fully
support the final (pending) “Implementation Guide”
requirements remains to be seen (and voted upon).
• Attention to detail is important in standards writing.
Final words
• ISO 11238 IDs may become as popular as
Chemical Abstracts’ registry numbers.
Acknowledgements
• Daniel Lowe, NextMove Software, Cambridge, UK.
• Richard Bolton, GSK, Stevenage, UK.
• Evan Bolton, NCBI PubChem, Bethesda, MD, USA.
• Dac-Trung Nguyen, NIH NCATS, Rockville, MD, USA.
• Tyler Peryea, NIH NCATS, Rockville, MD, USA.
• Noel Southall, NIH NCATS, Rockville, MD, USA.
• Yulia Borodina, FDA, Silver Spring, MD, USA.
• Lawrence Callahan, FDA, Silver Spring, MD, USA.
• Andrew Marr, Marr Consultancy, Knebworth, UK.
ChemAxon User Group Meeting 2014, Budapest, Hungary, Wednesday 21st May 2014

More Related Content

Similar to EUGM 2014 - Roger Sayle (NextMove Software): Implementing ISO standard 11238 compliance with ChemAxon tools

Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...NextMove Software
 
Permanent Wafer Bonding for Semiconductor: Application Trends & Technology 20...
Permanent Wafer Bonding for Semiconductor: Application Trends & Technology 20...Permanent Wafer Bonding for Semiconductor: Application Trends & Technology 20...
Permanent Wafer Bonding for Semiconductor: Application Trends & Technology 20...Yole Developpement
 
SOA_Case Study_Solution_Overview
SOA_Case Study_Solution_OverviewSOA_Case Study_Solution_Overview
SOA_Case Study_Solution_Overviewsuri86
 
Cross standard and scheme composition - A needed cornerstone for the European...
Cross standard and scheme composition - A needed cornerstone for the European...Cross standard and scheme composition - A needed cornerstone for the European...
Cross standard and scheme composition - A needed cornerstone for the European...Javier Tallón
 
The anatomy of a chemical reaction: Dissection by machine learning algorithms
The anatomy of a chemical reaction: Dissection by machine learning algorithmsThe anatomy of a chemical reaction: Dissection by machine learning algorithms
The anatomy of a chemical reaction: Dissection by machine learning algorithmsAlex Clark
 
Chemoinformatics in Action
Chemoinformatics in ActionChemoinformatics in Action
Chemoinformatics in ActionSSA KPI
 
Interoperability for smart appliances in the IoT world
Interoperability for smart appliances in the IoT worldInteroperability for smart appliances in the IoT world
Interoperability for smart appliances in the IoT worldMonika Solanki
 
IEC 61850 Lessons Learned 2016 04-11
IEC 61850 Lessons Learned 2016 04-11IEC 61850 Lessons Learned 2016 04-11
IEC 61850 Lessons Learned 2016 04-11Kevin Mahoney
 
EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...
EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...
EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...ChemAxon
 
OpenChain @ Bitkom Forum Open Source 2022
OpenChain @ Bitkom Forum Open Source 2022OpenChain @ Bitkom Forum Open Source 2022
OpenChain @ Bitkom Forum Open Source 2022Shane Coughlan
 
plastic by kolomohjjjj amole shérif .pdf
plastic by kolomohjjjj amole shérif .pdfplastic by kolomohjjjj amole shérif .pdf
plastic by kolomohjjjj amole shérif .pdfrhrassanconnect
 
Using the structured product labeling format to index versatile chemical data
Using the structured product labeling format to index versatile chemical dataUsing the structured product labeling format to index versatile chemical data
Using the structured product labeling format to index versatile chemical dataValery Tkachenko
 
UNSPSC Process and Samples
UNSPSC Process and SamplesUNSPSC Process and Samples
UNSPSC Process and SamplesIndra kumar
 
Industry 4.0 - Enabling operational excellence of packaging lines
Industry 4.0 - Enabling operational excellence of packaging linesIndustry 4.0 - Enabling operational excellence of packaging lines
Industry 4.0 - Enabling operational excellence of packaging linesStephane Potier
 
Assembly Root Cause Analysis A Way To Reduce Dimensional Variation In Assemb...
Assembly Root Cause Analysis  A Way To Reduce Dimensional Variation In Assemb...Assembly Root Cause Analysis  A Way To Reduce Dimensional Variation In Assemb...
Assembly Root Cause Analysis A Way To Reduce Dimensional Variation In Assemb...Stephen Faucher
 

Similar to EUGM 2014 - Roger Sayle (NextMove Software): Implementing ISO standard 11238 compliance with ChemAxon tools (20)

Partex Catalogue
Partex Catalogue Partex Catalogue
Partex Catalogue
 
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...
Reading and Writing Molecular File Formats for Data Exchange of Small Molecul...
 
ChemSpider and Traveling the Internet via Chemical Structures Cheminformatics...
ChemSpider and Traveling the Internet via Chemical Structures Cheminformatics...ChemSpider and Traveling the Internet via Chemical Structures Cheminformatics...
ChemSpider and Traveling the Internet via Chemical Structures Cheminformatics...
 
Permanent Wafer Bonding for Semiconductor: Application Trends & Technology 20...
Permanent Wafer Bonding for Semiconductor: Application Trends & Technology 20...Permanent Wafer Bonding for Semiconductor: Application Trends & Technology 20...
Permanent Wafer Bonding for Semiconductor: Application Trends & Technology 20...
 
SOA_Case Study_Solution_Overview
SOA_Case Study_Solution_OverviewSOA_Case Study_Solution_Overview
SOA_Case Study_Solution_Overview
 
Cross standard and scheme composition - A needed cornerstone for the European...
Cross standard and scheme composition - A needed cornerstone for the European...Cross standard and scheme composition - A needed cornerstone for the European...
Cross standard and scheme composition - A needed cornerstone for the European...
 
The anatomy of a chemical reaction: Dissection by machine learning algorithms
The anatomy of a chemical reaction: Dissection by machine learning algorithmsThe anatomy of a chemical reaction: Dissection by machine learning algorithms
The anatomy of a chemical reaction: Dissection by machine learning algorithms
 
Chemoinformatics in Action
Chemoinformatics in ActionChemoinformatics in Action
Chemoinformatics in Action
 
Interoperability for smart appliances in the IoT world
Interoperability for smart appliances in the IoT worldInteroperability for smart appliances in the IoT world
Interoperability for smart appliances in the IoT world
 
IEC 61850 Lessons Learned 2016 04-11
IEC 61850 Lessons Learned 2016 04-11IEC 61850 Lessons Learned 2016 04-11
IEC 61850 Lessons Learned 2016 04-11
 
EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...
EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...
EUGM 2014 - Serge P. Parel (Exquiron): Farewell, PipelinePilot : Migrating th...
 
OpenChain @ Bitkom Forum Open Source 2022
OpenChain @ Bitkom Forum Open Source 2022OpenChain @ Bitkom Forum Open Source 2022
OpenChain @ Bitkom Forum Open Source 2022
 
plastic by kolomohjjjj amole shérif .pdf
plastic by kolomohjjjj amole shérif .pdfplastic by kolomohjjjj amole shérif .pdf
plastic by kolomohjjjj amole shérif .pdf
 
Using the structured product labeling format to index versatile chemical data
Using the structured product labeling format to index versatile chemical dataUsing the structured product labeling format to index versatile chemical data
Using the structured product labeling format to index versatile chemical data
 
UNSPSC Process and Samples
UNSPSC Process and SamplesUNSPSC Process and Samples
UNSPSC Process and Samples
 
Industry 4.0 - Enabling operational excellence of packaging lines
Industry 4.0 - Enabling operational excellence of packaging linesIndustry 4.0 - Enabling operational excellence of packaging lines
Industry 4.0 - Enabling operational excellence of packaging lines
 
Assembly Root Cause Analysis A Way To Reduce Dimensional Variation In Assemb...
Assembly Root Cause Analysis  A Way To Reduce Dimensional Variation In Assemb...Assembly Root Cause Analysis  A Way To Reduce Dimensional Variation In Assemb...
Assembly Root Cause Analysis A Way To Reduce Dimensional Variation In Assemb...
 
Forecasting Steel
Forecasting SteelForecasting Steel
Forecasting Steel
 
Vocabularies and Linked Open Data
Vocabularies and Linked Open DataVocabularies and Linked Open Data
Vocabularies and Linked Open Data
 
Lo c 2011-05-18
Lo c 2011-05-18Lo c 2011-05-18
Lo c 2011-05-18
 

More from ChemAxon

Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?
Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?
Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?ChemAxon
 
Chemaxon EU UGM 2022 | Translating data to predictive models
Chemaxon EU UGM 2022 | Translating data to predictive modelsChemaxon EU UGM 2022 | Translating data to predictive models
Chemaxon EU UGM 2022 | Translating data to predictive modelsChemAxon
 
Translating data to predictive models
Translating data to predictive modelsTranslating data to predictive models
Translating data to predictive modelsChemAxon
 
Efficient biomolecular structural data handling and analysis - Webinar with D...
Efficient biomolecular structural data handling and analysis - Webinar with D...Efficient biomolecular structural data handling and analysis - Webinar with D...
Efficient biomolecular structural data handling and analysis - Webinar with D...ChemAxon
 
Biomolecule structural data management
Biomolecule structural data managementBiomolecule structural data management
Biomolecule structural data managementChemAxon
 
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first releaseCheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first releaseChemAxon
 
Enhanced stereochemistry representation
Enhanced stereochemistry representation Enhanced stereochemistry representation
Enhanced stereochemistry representation ChemAxon
 
Intellectual property (IP) intelligence solutions designed for the way resear...
Intellectual property (IP) intelligence solutions designed for the way resear...Intellectual property (IP) intelligence solutions designed for the way resear...
Intellectual property (IP) intelligence solutions designed for the way resear...ChemAxon
 
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...ChemAxon
 
Patent Data for Artificial Intelligence based Drug Discovery
Patent Data for Artificial Intelligence based Drug DiscoveryPatent Data for Artificial Intelligence based Drug Discovery
Patent Data for Artificial Intelligence based Drug DiscoveryChemAxon
 
Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...
Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...
Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...ChemAxon
 
Research data management on the cloud
Research data management on the cloudResearch data management on the cloud
Research data management on the cloudChemAxon
 
Cheminfo Stories APAC 2020 - Introducing Design Hub & Compound Registration
Cheminfo Stories APAC 2020 - Introducing Design Hub & Compound RegistrationCheminfo Stories APAC 2020 - Introducing Design Hub & Compound Registration
Cheminfo Stories APAC 2020 - Introducing Design Hub & Compound RegistrationChemAxon
 
Cheminfo Stories APAC 2020 - JChem Engines introduction
Cheminfo Stories APAC 2020 - JChem Engines introduction Cheminfo Stories APAC 2020 - JChem Engines introduction
Cheminfo Stories APAC 2020 - JChem Engines introduction ChemAxon
 
Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...
Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...
Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...ChemAxon
 
Cheminfo Stories APAC 2020 -- Markush technology
Cheminfo Stories APAC 2020 -- Markush technology Cheminfo Stories APAC 2020 -- Markush technology
Cheminfo Stories APAC 2020 -- Markush technology ChemAxon
 
JChem Microservices
JChem MicroservicesJChem Microservices
JChem MicroservicesChemAxon
 
Migration from joc to jpc or choral
Migration from joc to jpc or choralMigration from joc to jpc or choral
Migration from joc to jpc or choralChemAxon
 
ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5
ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5
ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5ChemAxon
 
Chemicalize Pro - Cheminfo Stories 2020 Day 5
Chemicalize Pro - Cheminfo Stories 2020 Day 5Chemicalize Pro - Cheminfo Stories 2020 Day 5
Chemicalize Pro - Cheminfo Stories 2020 Day 5ChemAxon
 

More from ChemAxon (20)

Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?
Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?
Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?
 
Chemaxon EU UGM 2022 | Translating data to predictive models
Chemaxon EU UGM 2022 | Translating data to predictive modelsChemaxon EU UGM 2022 | Translating data to predictive models
Chemaxon EU UGM 2022 | Translating data to predictive models
 
Translating data to predictive models
Translating data to predictive modelsTranslating data to predictive models
Translating data to predictive models
 
Efficient biomolecular structural data handling and analysis - Webinar with D...
Efficient biomolecular structural data handling and analysis - Webinar with D...Efficient biomolecular structural data handling and analysis - Webinar with D...
Efficient biomolecular structural data handling and analysis - Webinar with D...
 
Biomolecule structural data management
Biomolecule structural data managementBiomolecule structural data management
Biomolecule structural data management
 
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first releaseCheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release
 
Enhanced stereochemistry representation
Enhanced stereochemistry representation Enhanced stereochemistry representation
Enhanced stereochemistry representation
 
Intellectual property (IP) intelligence solutions designed for the way resear...
Intellectual property (IP) intelligence solutions designed for the way resear...Intellectual property (IP) intelligence solutions designed for the way resear...
Intellectual property (IP) intelligence solutions designed for the way resear...
 
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
 
Patent Data for Artificial Intelligence based Drug Discovery
Patent Data for Artificial Intelligence based Drug DiscoveryPatent Data for Artificial Intelligence based Drug Discovery
Patent Data for Artificial Intelligence based Drug Discovery
 
Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...



EUGM 2014 - Roger Sayle (NextMove Software): Implementing ISO standard 11238 compliance with ChemAxon tools

  • 1. Implementing ISO 11238 standard compliance with ChemAxon tools. Roger Sayle, NextMove Software, Cambridge, UK. ChemAxon User Group Meeting 2014, Budapest, Hungary, Wednesday 21st May 2014
  • 2. What is ISO 11238?
      • ISO standard 11238 is entitled “Health Informatics – Identification of medicinal products – Data elements and structures for the unique identification and exchange of regulated information on substances”.
      • It defines a framework for uniquely identifying and exchanging compounds of pharmaceutical interest.
      • The framework serves a similar role to CAS registry numbers, PubChem CIDs or InChIKeys, assigning unique identifiers to substances.
  • 3. Meet the (IDMP) family
      • 11238 is one of a suite of 5 related standards, all for “unique identification and exchange of …”
          – 11238 “… regulated information on substances”.
          – 11239 “… dose forms, units, administration, etc.”.
          – 11240 “… units of measurement”.
          – 11615 “… regulated medicinal product information”.
          – 11616 “… regulated pharmaceutical product information”.
  • 4. Why is ISO 11238 important?
      • EU regulation 520/2012 on “pharmacovigilance” requires countries, regulatory authorities and pharma to adopt the 5 IDMP standards (articles 25 and 26) by 1st July 2016 (article 40).
      • Executive summary: It’s the law!
  • 5. How it works
      • Code Assignment (Authority): Name/Identifier, Connection Table and Properties (Significant Text) in, Unique Code out.
      • Code Look-up (Authority): Unique Code in, Name/Identifier, Connection Table and Properties (Significant Text) out.
  • 6. Likely implementation
      • Same diagram as slide 5: Code Assignment (Authority) and Code Look-up (Authority) linking Name/Identifier, Connection Table and Properties (Significant Text) to a Unique Code.
      • Annotations on the diagram: FDA UNII; FDA SRS Search; FDA UNII XML; INN/USAN/CID; FDA/NCATS GInAS; MOL2000/SMILES/InChI; Protein/NA Sequence; ISO11238 Groups 1-4.
  • 7. Current status
      • The standard has been ratified and its use has been written into EU law (EU Reg. 520/2012).
      • The framework requires non-semantic, random, fixed-length unique identifiers that include an internal integrity check.
      • The standard also details constraints on uniqueness.
      • Exact implementation details are yet to be determined (to appear in a future “Implementation Guide”).
  • 8. What will the future look like?
      • ISO11238 compliant identifiers will be very similar to the FDA’s UNII (UNique Ingredient Identifier).
      • The fixed width non-semantic identifier requirement rules out the use of plain SMILES, InChI, V2000 Mol file and similar encodings.
      • The random requirement rules out plain CAS registry numbers, PubChem CIDs and ChEMBL IDs (which use sequential or monotonic number assignment).
      • Alternatively, InChI keys or similar hashes (with [CRC] checks) of connection tables+text may be possible.
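The last bullet on slide 8 (a hash of connection table plus text, with an integrity check) can be sketched in Java as below. Everything concrete here is an assumption made for illustration and comes neither from ISO 11238 nor from the FDA's UNII scheme: the class name `HashedSubstanceId`, the use of SHA-256, the 9+1 character width, the base-36 alphabet, and the weighted-sum check character (a simple stand-in for a real CRC).

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashedSubstanceId {
    private static final String ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final int BODY_LENGTH = 9; // hypothetical width, not from the standard

    /** Hashes the connection table plus significant text into a fixed-width code. */
    public static String makeId(String connectionTable, String significantText) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            sha.update(connectionTable.getBytes(StandardCharsets.UTF_8));
            sha.update((byte) 0); // separator so ("ab","c") and ("a","bc") hash differently
            sha.update(significantText.getBytes(StandardCharsets.UTF_8));
            BigInteger value = new BigInteger(1, sha.digest());
            BigInteger base = BigInteger.valueOf(ALPHABET.length());
            StringBuilder body = new StringBuilder();
            for (int i = 0; i < BODY_LENGTH; i++) {
                BigInteger[] qr = value.divideAndRemainder(base);
                body.append(ALPHABET.charAt(qr[1].intValue()));
                value = qr[0];
            }
            return body.toString() + checkChar(body);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    /** Position-weighted sum modulo 36: a simple integrity check, not a real CRC. */
    static char checkChar(CharSequence body) {
        int sum = 0;
        for (int i = 0; i < body.length(); i++) {
            sum += (i + 1) * ALPHABET.indexOf(body.charAt(i));
        }
        return ALPHABET.charAt(sum % ALPHABET.length());
    }

    /** True if the code has the expected width and its last character matches the check. */
    public static boolean isValid(String id) {
        if (id == null || id.length() != BODY_LENGTH + 1) return false;
        for (int i = 0; i < id.length(); i++) {
            if (ALPHABET.indexOf(id.charAt(i)) < 0) return false;
        }
        return checkChar(id.substring(0, BODY_LENGTH)) == id.charAt(BODY_LENGTH);
    }
}
```

A hash-derived code of this kind is fixed width and carries no readable chemistry, and successive registrations do not produce sequential values. Whether that satisfies the standard's "random" requirement is for the Implementation Guide to decide.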
  • 9. What’s available now
      • ISO charges for access to official standards documents (which is why 5 IDMP standards are more profitable than one): about 158 CHF ($177 USD) from ISO for 11238 [between $120 and $340 online].
      • However, as with many ISO standards, late drafts of ISO 11238 are freely available on the internet.
      • Caution: Many of the technical examples (all XML) were removed from the final standard and are due to appear in the upcoming “Implementation Guide(s)”.
  • 10. Example requirement
      • §3.4 “Naming of substances” states “at least one substance name or company code shall be associated with each substance”.
      • For the envisioned workflows this typically assumes an INN or USAN name has already been assigned.
      • One way to guarantee the existence of a suitable substance name for investigational compounds is to use IUPAC naming software (such as ChemAxon’s) during submission to the unique coding authority.
      • Plug: ChemAxon s2n coverage is state-of-the-art.
  • 11. The devil is in the details
      • One of the interesting cheminformatics challenges in working with the published ISO standard and the examples from the draft annex is the typography.
      • The document has been typeset by editors with expertise outside the field of cheminformatics, who have inadvertently changed whitespace without appreciating the impact this has on chemistry tools.
  • 12. Final ISO11238 standard Annex A
      • §A.2.3 SMILES uses the example “C1 = CC = CC = C1” where the spurious spaces create problems for SMILES readers.
      • §A.2.4 InChI both strips the “InChI=” prefix and again suffers from spaces “1/C6H6 /c1-2-4-6-5-3-1/h1-6H”.
          – Interestingly this is an old InChI not a standard InChI.
      • §A.2.2 Molfile fails to mention that V2000 mol files use fixed width columns and blank lines. As a result, the example given in the text [next slide] can’t easily be read.
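For the two line notations on slide 12 the repair is simple, because whitespace is never part of a SMILES or InChI string. The Java sketch below assumes the input holds only the structure string (no trailing name or title); the class and method names are hypothetical. It does not turn the old-style "1/" InChI into a standard "1S/" InChI; it only restores the prefix.

```java
public class LineNotationRepair {
    /** Removes all whitespace, which is never part of a SMILES string itself. */
    public static String fixSmiles(String smiles) {
        return smiles.replaceAll("\\s+", "");
    }

    /** Removes whitespace and restores a missing "InChI=" prefix. */
    public static String fixInchi(String inchi) {
        String s = inchi.replaceAll("\\s+", "");
        return s.startsWith("InChI=") ? s : "InChI=" + s;
    }
}
```

Applied to the Annex A examples, `fixSmiles("C1 = CC = CC = C1")` yields `C1=CC=CC=C1` and `fixInchi("1/C6H6 /c1-2-4-6-5-3-1/h1-6H")` yields `InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H`.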
  • 13. Annex A: example.mol (annotations: Missing Blank Lines; Incorrectly aligned columns)
      ACD/Labs0812062058
      6 6 0 0 0 0 0 0 0 0 1 V2000
      1.9050 −0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
      1.9050 −2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
      0.7531 −0.1282 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
      0.7531 −2.7882 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
      −0.3987 −0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
      −0.3987 −2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
      2 1 1 0 0 0 0
      3 1 2 0 0 0 0
      4 2 2 0 0 0 0
      5 3 1 0 0 0 0
      6 4 1 0 0 0 0
      6 5 2 0 0 0 0
      M END
      $$$$
  • 14. Benefit of the doubt?
      • These unintentional typographical errors in the normative text may perhaps be the result of poor fonts, with the exception of “InChI=”.
      • Alas, the content of the original Annex B from the draft indicates these issues were more widespread and may arise from ignorance of cheminformatics file formats.
  • 15. §B.2.2 InChI in XML Example
      <STRUCTURAL_REPRESENTATION_TYPE>INCHI</STRUCTURAL_REPRESENTATION_TYPE>
      <STRUCTURAL_REPRESENTATION>1S/C2H5NO2.AL.CLH.2H2O.ZR/C3-1- 2(4)5;;;;;/H1,3H2,(H,4,5);;1H;2*1H2;/Q;+3;;;;+4/P- 2</STRUCTURAL_REPRESENTATION>
      Annotations: Missing InChI=; Standard and Non-Standard InChI?; Converted to upper case; Indentation; Spurious Spaces; Line Breaks
  • 16. §B.2.4 V2000 Mol File in XML Example <STRUCTURAL_REPRESENTATION_TYPE>MOL</STRUCTURAL_REPRESENTATION_TYPE> <STRUCTURAL_REPRESENTATION>30 29 0 0 0 0 0 0 0 0999 V2000 9.9563 -7.3055 0.0000 Y 1 1 0 0 0 0 0 0 0 0 0 0 15.0355 -4.8847 0.0000 * 0 0 0 0 0 0 0 0 0 0 0 0 13.3609 - 8.0134 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 13.8867 -9.9869 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0 6.4178 -6.8678 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 5.8872 -4.8955 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0 6.7218 -5.7285 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 13.0541 -9.1519 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 13.3408 -6.8634 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 13.8599 - 4.8881 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 13.0301 -5.7260 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 5.9099 -9.9441 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0 6.4492 -7.9743 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 6.7482 -9.1149 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.8605 -5.4221 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 11.8897 -5.4263 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 11.9147 -9.4555 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.8855 -9.4263 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.6897 -8.0305 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.6897 -6.8513 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 8.7018 -6.2618 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 9.2908 -5.2506 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 10.4700 -5.2524 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 11.0577 -6.2664 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 12.0761 -6.8427 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 12.0891 -8.0218 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 8.7257 -8.5952 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 11.0839 -8.6223 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 10.4848 -9.6275 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 9.3057 -9.6139 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 10 2 1 0 0 0 0 8 3 2 0 0 0 0 25 24 1 0 0 0 0 8 4 1 0 0 0 0 27 18 1 0 0 0 0 7 5 2 0 0 0 0 26 28 1 0 0 0 0 7 6 1 0 0 0 0 19 27 1 0 0 0 0 15 7 1 0 0 0 0 20 21 1 0 0 0 0 17 8 1 0 0 0 0 30 27 1 0 0 0 0 11 9 2 0 0 0 0 30 29 1 0 0 0 0 11 10 1 0 0 0 0 20 19 1 0 0 0 0 16 11 1 0 0 0 0 22 21 1 0 0 0 0 14 12 1 0 0 0 0 23 24 1 0 0 0 0 14 13 2 0 0 0 0 18 14 1 0 0 0 0 26 25 1 0 0 0 
0 21 15 1 0 0 0 0 29 28 1 0 0 0 0 24 16 1 0 0 0 0 23 22 1 0 0 0 0 28 17 1 0 0 0 0 M CHG 4 1 3 4 -1 6 -1 12 -1 M ISO 1 1 90 M END </STRUCTURAL_REPRESENTATION> Where to begin?
  • 17. All is not lost!
      • Back at the 2011 ChemAxon UGM here in Budapest, Sorel Muresan from AstraZeneca Sweden gave a presentation on how spelling correction improves the recall of ChemAxon’s name-to-structure tools.
      • The exact same CaffeineFix technology can be applied to perform aggressive “spelling correction” on SMILES strings, InChI and V2000 mol files.
      • As with IUPAC-like systematic names, these can each be specified by a formal grammar.
  • 18. How the algorithm works
      • The regular expression describing V2000 mol files is compiled into a “finite state machine” with 1333 states.
      • The only allowed “corrections” are the deletion of new lines and the insertion of spaces or new lines, but only where permitted in the grammar/FSM.
      • Depth-first recursion is used to identify a minimal set of edits to correct the input.
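The approach on slide 18 can be illustrated on a toy scale. The Java sketch below is not the CaffeineFix code. It assumes a hypothetical 6-state machine for records of the form "dd dd" followed by a newline (instead of the 1333-state V2000 machine), and it obtains a minimal edit set by retrying the depth-first search with an increasing edit budget, which is an assumption about how minimality might be achieved. The allowed edits are the ones named on the slide: delete a newline, or insert a space or newline where the FSM permits.

```java
public class ToyFsmRepair {
    // Toy grammar: zero or more records "dd dd\n" (d = digit).
    // State i means "i characters of the current record have been read".
    private static final int RECORD_LENGTH = 6;

    /** Returns the next state, or -1 if the FSM has no transition on c. */
    static int step(int state, char c) {
        boolean ok;
        switch (state) {
            case 2:  ok = (c == ' ');  break;
            case 5:  ok = (c == '\n'); break;
            default: ok = Character.isDigit(c);
        }
        return ok ? (state + 1) % RECORD_LENGTH : -1;
    }

    /** Tries increasing edit budgets so the first hit uses a minimal number of edits. */
    public static String fix(String input, int maxEdits) {
        for (int budget = 0; budget <= maxEdits; budget++) {
            String fixed = search(input, 0, 0, budget, new StringBuilder());
            if (fixed != null) return fixed;
        }
        return null; // not repairable within maxEdits
    }

    private static String search(String in, int pos, int state, int budget, StringBuilder out) {
        if (pos == in.length() && state == 0) return out.toString();
        int mark = out.length();
        String result;
        if (pos < in.length()) {
            char c = in.charAt(pos);
            int next = step(state, c);
            if (next >= 0) { // accept the character unchanged
                out.append(c);
                result = search(in, pos + 1, next, budget, out);
                if (result != null) return result;
                out.setLength(mark);
            }
            if (c == '\n' && budget > 0) { // edit: delete a newline
                result = search(in, pos + 1, state, budget - 1, out);
                if (result != null) return result;
            }
        }
        if (budget > 0) {
            for (char ins : new char[] {' ', '\n'}) { // edit: insert a space or newline
                int next = step(state, ins);
                if (next >= 0) {
                    out.append(ins);
                    result = search(in, pos, next, budget - 1, out);
                    if (result != null) return result;
                    out.setLength(mark);
                }
            }
        }
        return null;
    }
}
```

For example, `ToyFsmRepair.fix("12\n34\n5678", 5)` repairs the input to two well-formed records, "12 34" and "56 78", using four edits: one deleted newline, two inserted spaces and one inserted newline. Digits are never inserted or deleted, so the chemistry-bearing content is left alone.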
  • 19. §B.2.4 example after correction (annotation: 3 line Header Block before Count Line)
      30 29 0 0 0 0 0 0 0 0999 V2000
      9.9563 -7.3055 0.0000 Y 1 1 0 0 0 0 0 0 0 0 0 0
      15.0355 -4.8847 0.0000 * 0 0 0 0 0 0 0 0 0 0 0 0
      13.3609 -8.0134 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
      13.8867 -9.9869 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0
      6.4178 -6.8678 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
      5.8872 -4.8955 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0
      6.7218 -5.7285 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
      13.0541 -9.1519 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
      13.3408 -6.8634 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
      13.8599 -4.8881 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
      ...
      21 15 1 0 0 0 0
      29 28 1 0 0 0 0
      24 16 1 0 0 0 0
      23 22 1 0 0 0 0
      28 17 1 0 0 0 0
      M CHG 4 1 3 4 -1 6 -1 12 -1
      M ISO 1 1 90
      M END
  • 20. ChemAxon toolkit implementation
      import chemaxon.formats.MolFormatException;
      import chemaxon.formats.MolImporter;
      import chemaxon.struc.Molecule;

      // Try a normal import first; on failure, repair the mol file text and retry.
      public static Molecule molFileToChemaxonMol(String molFileStr) throws MolFormatException {
          try {
              return MolImporter.importMol(molFileStr);
          } catch (MolFormatException e) {
              molFileStr = FixMolFile.fixMolFile(molFileStr);
              if (molFileStr == null) {
                  throw e; // not repairable: rethrow the original error
              }
              return MolImporter.importMol(molFileStr);
          }
      }
      // Java source code available at http://www.chemaxon.com/forum/ftopic1265.html
  • 21. Geek of the week
      • A particularly tricky corner case concerns Accelrys’ Pipeline Pilot-style V2000 mol files, which abbreviate the columns in the atom block (to save space).
      • In these files there’s potential ambiguity where the first bond line is mistaken as a continuation of the last (abbreviated) atom line.
      • Our solution relies on the atom stereo care field being zero in non-query mol files vs. the non-zero values that appear in the first three fields of bonds.
  • 22. Lest we forget
      • A similar “spelling correction” variant that allows uppercase characters to be mapped to lowercase, and the prefix “InChI=” to magically appear at the start of a string, can also be used to fix ISO InChIs.
      • Alas, uppercasing an InChI (or any molecular formula) is potentially lossy, e.g. “CsN” vs. “CSn”.
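The lossiness mentioned on slide 22 is easy to demonstrate by listing every way an uppercased formula can be read back as element symbols. The Java sketch below is illustrative only: the class name is hypothetical, the element set is a small hand-picked subset rather than the full periodic table, and atom counts (digits) are left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FormulaCaseAmbiguity {
    // Small illustrative subset of element symbols, not the full periodic table.
    private static final Set<String> ELEMENTS = new HashSet<>(Arrays.asList(
            "C", "N", "O", "S", "H", "Cl", "Cs", "Sn", "Co", "Si"));

    /** Lists every way to read an all-uppercase letter string as a sequence of element symbols. */
    public static List<String> readings(String upper) {
        List<String> results = new ArrayList<>();
        collect(upper, 0, "", results);
        return results;
    }

    private static void collect(String upper, int pos, String prefix, List<String> results) {
        if (pos == upper.length()) {
            results.add(prefix);
            return;
        }
        // Try a one-letter symbol, then a two-letter symbol with the second letter lowercased.
        String one = upper.substring(pos, pos + 1);
        if (ELEMENTS.contains(one)) {
            collect(upper, pos + 1, prefix + one, results);
        }
        if (pos + 2 <= upper.length()) {
            String two = one + Character.toLowerCase(upper.charAt(pos + 1));
            if (ELEMENTS.contains(two)) {
                collect(upper, pos + 2, prefix + two, results);
            }
        }
    }
}
```

With this element subset, `readings("CSN")` returns three candidates (CSN, CSn and CsN), whereas `readings("CLH")` returns only ClH. Formulae of the second kind can be recovered unambiguously; those of the first kind need extra information from the rest of the InChI.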
  • 23. Before and after InChI example
      • Before: 1S/C17H21CLN4O/C1-22-12-3-2-4-13(22)8-11(7- 12)21-17(23)14-5-10(18)6-15-16(14)20-9-19-15/H5- 6,9,11-13H,2-4,7-8H2,1H3,(H,19,20)(H,21,23)
      • After: InChI=1S/C17H21ClN4O/c1-22-12-3-2-4-13(22)8-11(7-12)21-17(23)14-5-10(18)6-15-16(14)20-9-19-15/h5-6,9,11-13H,2-4,7-8H2,1H3,(H,19,20)(H,21,23)
  • 24. How common are the ambiguities?
      • 1.35 million standard InChIs from ChEMBL.
      • Uppercase the InChIs, fix them and check whether the original InChI can be regenerated.
      • 99.5% roundtrip (6596 discrepancies).
  • 25. InChI case-insensitive ambiguities
  • 26. Conclusions
      • The Java source code for recovering V2000 mol files and InChIs from the types of corruption seen in the ISO 11238 draft has now been contributed to the ChemAxon forum, allowing Marvin and JChem to read the examples given in that document.
      • Whether this functionality will be required to fully support the final (pending) “Implementation Guide” requirements remains to be seen (and voted upon).
      • Attention to detail is important in standards writing.
  • 27. Final words
      • ISO 11238 IDs may become as popular as Chemical Abstracts’ registry numbers.
  • 28. Acknowledgements
      • Daniel Lowe, NextMove Software, Cambridge, UK.
      • Richard Bolton, GSK, Stevenage, UK.
      • Evan Bolton, NCBI PubChem, Bethesda, MD, USA.
      • Dac-Trung Nguyen, NIH NCATS, Rockville, MD, USA.
      • Tyler Peryea, NIH NCATS, Rockville, MD, USA.
      • Noel Southall, NIH NCATS, Rockville, MD, USA.
      • Yulia Borodina, FDA, Silver Spring, MD, USA.
      • Lawrence Callahan, FDA, Silver Spring, MD, USA.
      • Andrew Marr, Marr Consultancy, Knebworth, UK.