Analytical science underpins so many areas of chemistry that it is clearly indispensable. Nuclear magnetic resonance (NMR) and infrared spectroscopy, mass spectrometry, chromatography, and a myriad of other analytical techniques are readily available to scientists today, commonly in open-access walk-up labs. While instrumentation is now compact and highly flexible, and the controlling software is both powerful and easy to use, significant challenges remain in the management and integration of various forms of analytical data and, more importantly, in the exchange of data between scientists. In general, the reporting of data in peer-reviewed journals is limited to electronic supplementary information in the form of PDF files or, occasionally, webpages. Much of the strength of analytical data resides in the ability to store diverse data types in a database and interrogate them later, performing searches based on metadata, spectral features and related chemical structure information. Exporting and converting data from the binary file formats associated with the majority of analytical instrumentation remains a major objective in the field. While file formats such as JCAMP and NetCDF have enabled data exchange for a number of years, the requirement for more advanced formats (such as AnIML and mzML) has continued. This presentation will review existing activities in the development of exchangeable formats and progress in utilizing existing formats for the delivery of reusable analytical data to the community.
Driving needs for analytical data exchange standards and the potential impacts on the chemical sciences
1. Driving needs for analytical data
exchange standards and the potential
impacts on the chemical sciences
Antony Williams
ORCID ID:0000-0002-2668-4821
2. A useful website if we had it…
• All of the “public spectra” from scientific
research articles were available on a website
– NMR, MS, GC/LC-MS, IR, UV-Vis, Raman
• The spectra were NOT pictures but live,
interactive spectral data that can be searched
• The site had programmatic interfaces that
could integrate to instruments for real time
structure identification
3. A useful website if we had it…
• Structural integration with assigned data
(vibrational bands, MS fragments, NMR
assignments (1D and 2D)) would allow for the
construction of predictive models
• And if it all came together we would be able to
consider CASE – Computer-Assisted
Structure Elucidation online!
16. We have pieces…but much to do
• To build the “spectral database” we really
need certain things:
• Adoption of a new community norm: “A
commitment to share spectral data”
• Education around existing standards – “yes
madam, you can already generate JCAMP!”
• “We need a CCDC for spectral data”
18. So why do we need standards?
• Well that’s a dumb question!
• Just in general - think character codes,
HTML, CSV, W3C efforts
• For our domain – the molfile, SDF file, InChI,
CIF files, JCAMP
• There are “standards by adoption” and “open
standards”
25. Standards without adoption
are limited in value
• If the instrument vendors don’t support or
adopt the standards success is limited
• If the scientists don’t know what the standards
are and how to use them then what?
28. Are There Challenges?
• JCAMP is good for a lot of spectral data – IR,
Raman, 1D NMR
• MS data is rarely made available in JCAMP
• A ratified JCAMP 6.0 for 2D data exchange –
would allow third parties to build support
• All other data standards (for NMR at least!)
will take years to catch up
• ASSIGNED JCAMP spectra ARE
already supported!
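To make the JCAMP point concrete, here is a minimal Python sketch of reading an uncompressed JCAMP-DX file of the kind used for IR, Raman or 1D NMR. It is an illustration only, not a full implementation of the standard: it assumes plain (uncompressed) numbers in an `##XYDATA=(X++(Y..Y))` table, ignores the compressed encodings, and does not check the X value at the start of each row. The function name `parse_jcamp_affn` and the sample spectrum are made up for this example.

```python
# Hypothetical sample file with made-up numbers, for illustration only.
SAMPLE = """\
##TITLE=Hypothetical IR spectrum
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM
##YUNITS=TRANSMITTANCE
##XFACTOR=1.0
##YFACTOR=0.01
##FIRSTX=4000
##LASTX=3995
##NPOINTS=6
##XYDATA=(X++(Y..Y))
4000 98 97 95   $$ each row: starting X, then consecutive Y values
3997 90 92 96
##END=
"""


def parse_jcamp_affn(text):
    """Return (header, xs, ys) for an uncompressed XYDATA table.

    Simplifications (assumptions of this sketch): labels are normalised by
    upper-casing and dropping spaces, and X values are rebuilt from
    FIRSTX, LASTX and NPOINTS rather than read from each row.
    """
    header = {}
    raw_y = []
    in_table = False
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].strip()  # "$$" starts a comment
        if not line:
            continue
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            label = label.replace(" ", "").upper()
            if label == "END":
                break
            header[label] = value.strip()
            in_table = label == "XYDATA"
        elif in_table:
            fields = line.split()
            raw_y.extend(float(f) for f in fields[1:])  # skip the row's X

    n = int(header["NPOINTS"])
    if len(raw_y) != n:
        raise ValueError(f"expected {n} points, found {len(raw_y)}")
    first_x = float(header["FIRSTX"])
    last_x = float(header["LASTX"])
    step = (last_x - first_x) / (n - 1) if n > 1 else 0.0
    y_factor = float(header.get("YFACTOR", 1.0))
    xs = [first_x + i * step for i in range(n)]
    ys = [y * y_factor for y in raw_y]
    return header, xs, ys


if __name__ == "__main__":
    header, xs, ys = parse_jcamp_affn(SAMPLE)
    print(header["TITLE"], header["XUNITS"])
    for x, y in zip(xs, ys):
        print(x, y)
```

Because the header is plain text, the same loop exposes the metadata (title, units, data type) that a spectral database would index.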
35. We want to find text spectra?
• We can find and index text spectra: 13C NMR
(CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH,
benzylic methane), 30.77 (CH, benzylic
methane), 66.12 (CH2), 68.49 (CH2), 117.72,
118.19, 120.29, 122.67, 123.37, 125.69, 125.84,
129.03, 130.00, 130.53 (ArCH), 99.42, 123.60,
134.69, 139.23, 147.21, 147.61, 149.41,
152.62, 154.88 (ArC)
• What would be better are spectral figures – and
include assignments where possible!
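As a rough illustration of what indexing a text spectrum can involve, the Python sketch below pulls the nucleus, solvent, frequency and shifts out of the string quoted on this slide. The regular expressions and the function name `parse_text_spectrum` are assumptions made for this example, not the method used in the work presented. It also assumes that a parenthesised note applies to every shift listed since the previous note, which is one plausible reading of groups such as "(ArCH)".

```python
import re

# The text spectrum quoted on the slide, joined into one string.
TEXT = (
    "13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH, benzylic methane), "
    "30.77 (CH, benzylic methane), 66.12 (CH2), 68.49 (CH2), 117.72, "
    "118.19, 120.29, 122.67, 123.37, 125.69, 125.84, "
    "129.03, 130.00, 130.53 (ArCH), 99.42, 123.60, "
    "134.69, 139.23, 147.21, 147.61, 149.41, "
    "152.62, 154.88 (ArC)"
)

# Header such as "13C NMR (CDCl3, 100 MHz)".
HEADER = re.compile(
    r"(?P<nucleus>\d+[A-Z][a-z]?)\s+NMR\s*"
    r"\((?P<solvent>[^,]+),\s*(?P<mhz>\d+(?:\.\d+)?)\s*MHz\)"
)
# A decimal shift, optionally followed by a parenthesised note.
PEAK = re.compile(r"(?P<shift>\d+\.\d+)(?:\s*\((?P<note>[^)]*)\))?")


def parse_text_spectrum(text):
    """Return nucleus, solvent, frequency and (shift, note) pairs."""
    head = HEADER.search(text)
    if head is None:
        raise ValueError("no recognisable NMR header found")
    peaks = []
    pending = []  # shifts seen since the last note
    for match in PEAK.finditer(text[head.end():]):
        pending.append(float(match.group("shift")))
        note = match.group("note")
        if note is not None:
            # Assumption: the note covers the whole pending group.
            peaks.extend((shift, note) for shift in pending)
            pending = []
    peaks.extend((shift, None) for shift in pending)
    return {
        "nucleus": head.group("nucleus"),
        "solvent": head.group("solvent").strip(),
        "mhz": float(head.group("mhz")),
        "peaks": peaks,
    }


if __name__ == "__main__":
    spectrum = parse_text_spectrum(TEXT)
    print(spectrum["nucleus"], spectrum["solvent"], spectrum["mhz"])
    for shift, note in spectrum["peaks"]:
        print(shift, note)
```

A peak list recovered this way has positions and notes only, with no intensities or line shapes, which is why spectral figures and real data files remain the better source.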
38. Developing Proof-of-Concept
• Extract from 1976-2014 USPTO applications
Nucleus     Count
H           975543
C           56536
unknown*    44306
F           9429
P           3241
B           91
Si          62
Sn          22
Se          11
N           8
*unknown – text starts off with “NMR:” and a peak list (no nucleus)
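A tally by nucleus like the one above could be produced along the following lines. This Python sketch is hypothetical and is not the extraction pipeline behind these numbers: the regular expression, the function names and the sample descriptions are all invented for illustration. A description that starts with "NMR" and gives no isotope falls into the "unknown" bucket, mirroring the footnote.

```python
import re
from collections import Counter

# Matches a leading isotope label such as "1H NMR", "13C-NMR", "119Sn NMR".
NUCLEUS = re.compile(r"^\s*\d{1,3}\s*([A-Z][a-z]?)[\s-]*NMR\b")


def classify_nucleus(description):
    """Return the element symbol, or "unknown" if no nucleus is stated."""
    match = NUCLEUS.match(description)
    return match.group(1) if match else "unknown"


def tally_by_nucleus(descriptions):
    """Count descriptions per nucleus."""
    return Counter(classify_nucleus(d) for d in descriptions)


if __name__ == "__main__":
    # Made-up example descriptions, not taken from any patent.
    samples = [
        "1H NMR (400 MHz, CDCl3) δ 7.26 (s, 1H)",
        "13C NMR (CDCl3) δ 77.0",
        "NMR: δ 1.2 (t, 3H)",
        "19F-NMR δ -62.5",
        "1H-NMR (DMSO-d6) δ 2.50",
    ]
    for nucleus, count in tally_by_nucleus(samples).most_common():
        print(nucleus, count)
```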
41. Manual Curation Layer
• ALL SPECTRA SHOULD BE JCAMP
• ChemSpider had manual curation for >8 years
• Users already annotate data on ChemSpider
• These data are intended to go into the
developing RSC Data Repository architecture
• http://link.springer.com/article/10.1007/s10822-014-9784-5
42. What should we be doing?
• Settle on a short-term format – JCAMP-JMOL?
• Convince the instrument vendors to export in
this format
• Push button depositions into “containers” –
ChemSpider, NMRShiftDB, Institutional
Repositories
• Encourage format support in software (read
and write) – Mestre, ACD/Labs, Bruker
TopSpin, etc.
43. Actions
• Support and encourage new and EXISTING
standards
• In the meantime, reawaken and modernize the
JCAMP standard
• Encourage scientists to provide data
• Support those that may have good solutions