Driving needs for analytical data
exchange standards and the potential
impacts on the chemical sciences
Antony Williams
ORCID ID:0000-0002-2668-4821
A useful website if we had it…
• All of the “public spectra” from scientific
research articles were available on a website
– NMR, MS, GC/LC-MS, IR, UV-Vis, Raman
• The spectra were NOT pictures but live,
interactive spectral data that can be searched
• The site had programmatic interfaces that
could integrate to instruments for real time
structure identification
A useful website if we had it…
• Structural integration with assigned data
(vibrational bands, MS fragments, NMR
assignments (1D and 2D)) would allow for the
construction of predictive models
• And if it all came together we would be able to
consider CASE – Computer-Assisted
Structure Elucidation online!
And some of it is done…
NIST Webbook
mzCloud
NMRDB.org
ACD/ILab
MassBank
SDBS
http://sdbs.db.aist.go.jp/sdbs/cgi-bin/direct_frame_top.cgi
ChemSpider
9442 Spectra and growing
http://www.chemspider.com/spectra.aspx
We have pieces…but much to do
• To build the “spectral database” we really
need certain things:
• Adoption of a new community norm: “A
commitment to share spectral data”
• Education around existing standards – “yes
madam, you can already generate JCAMP!”
• “We need a CCDC for spectral data” 
So why do we need standards?
• Well that’s a dumb question!
• Just in general - think character codes,
HTML, CSV, W3C efforts
• For our domain – the molfile, SDF file, InChI,
CIF files, JCAMP
• There are “standards by adoption” and “open
standards”
Mass Spectrometry Formats
https://en.wikipedia.org/wiki/Mass_spectrometry_data_format
Analytical Data Standards
2D NMR
Progress in standards
Standards without adoption
are limited in value
• If the instrument vendors don’t support or
adopt the standards, success is limited
• If the scientists don’t know what the standards
are or how to use them, then what?
Publishers can push us for data
RSC loads Supp. Info data now…
Are There Challenges?
• JCAMP is good for a lot of spectral data – IR,
Raman, 1D NMR
• MS data is rarely made available in JCAMP
• A ratified JCAMP 6.0 for 2D data exchange –
would allow third parties to build support
• All other data standards (for NMR at least!)
will take years to catch up
• ASSIGNED JCAMP spectra ARE already
supported!
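To make the JCAMP point concrete, here is a minimal sketch of reading a JCAMP-DX file in Python. It assumes the simplest case only: whitespace-separated, uncompressed values in the (X++(Y..Y)) form, evenly spaced X values, and one data block per file. The function name parse_jcamp and the SAMPLE spectrum are made up for illustration; compressed (ASDF) encodings, assignments and 2D data are not handled.

```python
def parse_jcamp(text):
    """Read header records and uncompressed (X++(Y..Y)) data from JCAMP-DX text."""
    header = {}
    raw_ys = []
    in_data = False
    for raw in text.splitlines():
        line = raw.split("$$")[0].strip()  # "$$" starts a comment in JCAMP-DX
        if not line:
            continue
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            label = label.strip().upper()
            if label == "END":
                break
            in_data = label == "XYDATA"
            if not in_data:
                header[label] = value.strip()
            continue
        if in_data:
            # Assumption: values are whitespace-separated. The first number on
            # each line is an X checkpoint, the rest are Y values.
            numbers = [float(tok) for tok in line.split()]
            raw_ys.extend(numbers[1:])

    y_factor = float(header.get("YFACTOR", 1))
    ys = [y * y_factor for y in raw_ys]

    # X values are rebuilt from FIRSTX and LASTX, assuming even spacing.
    first_x = float(header["FIRSTX"])
    last_x = float(header["LASTX"])
    n = len(ys)
    step = (last_x - first_x) / (n - 1) if n > 1 else 0.0
    xs = [first_x + i * step for i in range(n)]
    return header, xs, ys


# Hypothetical six-point IR spectrum, for illustration only.
SAMPLE = """##TITLE=demo spectrum
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM
##YUNITS=TRANSMITTANCE
##FIRSTX=4000
##LASTX=3995
##YFACTOR=0.01
##NPOINTS=6
##XYDATA=(X++(Y..Y))
4000 98 97 95
3997 90 85 88
##END=
"""

header, xs, ys = parse_jcamp(SAMPLE)
print(header["DATA TYPE"], list(zip(xs, ys)))
```

Real instrument exports usually use compressed encodings, so an existing reader is the safer choice for production use.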
JCAMP-MOL
Jmol - JSpecView
ChemDoodle Components
And even support for 2D NMR!
A Movie from the Denver meeting
https://www.youtube.com/watch?v=vJbKnu1LT0Y
ESI – Text Spectra
We want to find text spectra?
• We can find and index text spectra: 13C NMR
(CDCl3, 100 MHz): δ = 14.12 (CH3), 30.11 (CH,
benzylic methane), 30.77 (CH, benzylic
methane), 66.12 (CH2), 68.49 (CH2), 117.72,
118.19, 120.29, 122.67, 123.37, 125.69, 125.84,
129.03, 130.00, 130.53 (ArCH), 99.42, 123.60,
134.69, 139.23, 147.21, 147.61, 149.41,
152.62, 154.88 (ArC)
• Better still would be spectral figures, with
assignments included where possible!
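Finding and indexing a text spectrum like the one above can be sketched with two regular expressions. This is an illustration under stated assumptions, not the method used for the work described here: the names HEADER_RE, PEAK_RE and parse_text_spectrum are hypothetical, the example string is an abridged version of the 13C list on this slide, and an assignment in parentheses is attached only to the shift immediately before it (so a trailing group label such as "(ArCH)" is not propagated to the earlier shifts it also covers).

```python
import re

# Header such as "13C NMR (CDCl3, 100 MHz)": mass number, element, solvent, frequency.
HEADER_RE = re.compile(r"(\d+)([A-Z][a-z]?) NMR \(([^,]+), (\d+(?:\.\d+)?) MHz\)")
# A chemical shift, optionally followed by one parenthesised assignment.
PEAK_RE = re.compile(r"(\d+\.\d+)(?:\s*\(([^)]*)\))?")


def parse_text_spectrum(text):
    """Extract nucleus, solvent, frequency and (shift, assignment) pairs."""
    m = HEADER_RE.search(text)
    if m is None:
        raise ValueError("no NMR header found")
    mass, element, solvent, mhz = m.groups()
    body = text[m.end():]
    peaks = [(float(shift), label or None) for shift, label in PEAK_RE.findall(body)]
    return {
        "nucleus": mass + element,
        "solvent": solvent,
        "frequency_mhz": float(mhz),
        "peaks": peaks,
    }


# Abridged from the 13C example on this slide.
TEXT_13C = (
    "13C NMR (CDCl3, 100 MHz): δ = 14.12 (CH3), 66.12 (CH2), 68.49 (CH2), "
    "117.72, 118.19, 130.53 (ArCH)"
)

result = parse_text_spectrum(TEXT_13C)
print(result["nucleus"], result["solvent"], result["peaks"])
```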
Mestrelab Mnova NMR
1H NMR (CDCl3, 400 MHz):
δ = 2.57 (m, 4H, Me, C(5a)H), 4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t,
1H, Jb = 10.8 Hz, C(6)H), 4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz,
C(6)H), 6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)
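A 1H peak list of this kind carries more structure per peak (multiplicity, proton count, coupling constant), and the nested parentheses in labels such as C(5a)H rule out a naive "everything up to the closing bracket" pattern. The sketch below is one hedged way to pull out the leading fields; PEAK_RE and parse_1h_peaks are hypothetical names, only the first J value of each peak is captured, and the atom labels are ignored.

```python
import re

PEAK_RE = re.compile(
    r"(?P<lo>\d+\.\d+)(?:[–-](?P<hi>\d+\.\d+))?"      # shift, or a shift range
    r"\s*\((?P<mult>[a-z]+),\s*(?P<nh>\d+)H"           # multiplicity and proton count
    r"(?:,\s*J\w*\s*=\s*(?P<j>\d+(?:\.\d+)?)\s*Hz)?"   # optional first coupling constant
)


def parse_1h_peaks(text):
    """Return one dict per reported 1H signal."""
    peaks = []
    for m in PEAK_RE.finditer(text):
        peaks.append({
            "shift_ppm": float(m.group("lo")),
            "range_end_ppm": float(m.group("hi")) if m.group("hi") else None,
            "multiplicity": m.group("mult"),
            "protons": int(m.group("nh")),
            "j_hz": float(m.group("j")) if m.group("j") else None,
        })
    return peaks


TEXT_1H = (
    "1H NMR (CDCl3, 400 MHz): δ = 2.57 (m, 4H, Me, C(5a)H), "
    "4.24 (d, 1H, J = 4.8 Hz, C(11b)H), 4.35 (t, 1H, Jb = 10.8 Hz, C(6)H), "
    "4.47 (m, 2H, C(5)H), 4.57 (dd, 1H, J = 2.8 Hz, C(6)H), "
    "6.95 (d, 1H, J = 8.4 Hz, ArH), 7.18–7.94 (m, 11H, ArH)"
)

peaks_1h = parse_1h_peaks(TEXT_1H)
total_protons = sum(p["protons"] for p in peaks_1h)
print(len(peaks_1h), "signals,", total_protons, "protons")
```

Summing the proton counts gives a quick sanity check against the molecular formula when the structure is known.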
Developing Proof-of-Concept
• Extract from 1976–2014 USPTO applications
Nucleus    Count
H          975543
C          56536
unknown*   44306
F          9429
P          3241
B          91
Si         62
Sn         22
Se         11
N          8
* unknown – text starts off with "NMR:" and a peak list (no nucleus given)
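A tally by nucleus like the one above could be produced along the following lines. This is a sketch under assumptions, not the extraction pipeline actually used: classify_nucleus, NUCLEUS_RE and the sample strings are all hypothetical, and a string counts as "unknown" when it starts with "NMR" without a leading isotope label.

```python
import re
from collections import Counter

# Leading isotope label such as "1H NMR", "13C-NMR" or "119Sn NMR".
NUCLEUS_RE = re.compile(r"^\s*(\d+)\s*([A-Z][a-z]?)[\s-]*NMR")


def classify_nucleus(spectrum_text):
    """Return the element symbol, "unknown" for a bare "NMR" prefix, or None."""
    m = NUCLEUS_RE.match(spectrum_text)
    if m:
        return m.group(2)
    if spectrum_text.lstrip().startswith("NMR"):
        return "unknown"
    return None


# Hypothetical snippets standing in for extracted text spectra.
samples = [
    "1H NMR (CDCl3, 400 MHz): δ 7.26 (s, 1H)",
    "13C-NMR (100 MHz): δ 14.1",
    "NMR: δ 1.20 (s, 3H)",
    "1H NMR (DMSO-d6): δ 2.50",
    "119Sn NMR: δ -10.2",
]

counts = Counter(classify_nucleus(s) for s in samples)
print(counts.most_common())
```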
ESI Data also contains figures
“Where is the real data please?”
FIGURE
DATA
Manual Curation Layer
• ALL SPECTRA SHOULD BE JCAMP
• ChemSpider had manual curation for >8 years
• Users already annotate data on ChemSpider
• These data are intended to go into the
developing RSC Data Repository architecture
• http://link.springer.com/article/10.1007/s10822-014-9784-5
What should we be doing?
• Settle on a short-term format – JCAMP-MOL?
• Convince the instrument vendors to export in
this format
• Push button depositions into “containers” –
ChemSpider, NMRShiftDB, Institutional
Repositories
• Encourage format support in software (read
and write) – Mestrelab, ACD/Labs, Bruker
TopSpin, etc.
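The "write" half of format support can be small for simple 1D data. The sketch below emits an uncompressed JCAMP-DX block in the (X++(Y..Y)) form; write_jcamp and its parameters are hypothetical names, the header is a minimal subset of records rather than a complete or validated set, and the caller is assumed to supply evenly spaced X values (the function does not check this). Embedding a structure and assignments, as in JCAMP-MOL, is not covered.

```python
def write_jcamp(title, xs, ys, xunits, yunits, data_type,
                origin="unknown", owner="unknown", per_line=5):
    """Serialise evenly spaced 1D data as uncompressed JCAMP-DX text."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    lines = [
        f"##TITLE={title}",
        "##JCAMP-DX=4.24",
        f"##DATA TYPE={data_type}",
        f"##ORIGIN={origin}",
        f"##OWNER={owner}",
        f"##XUNITS={xunits}",
        f"##YUNITS={yunits}",
        f"##FIRSTX={xs[0]}",
        f"##LASTX={xs[-1]}",
        "##XFACTOR=1",
        "##YFACTOR=1",
        f"##NPOINTS={len(xs)}",
        f"##FIRSTY={ys[0]}",
        "##XYDATA=(X++(Y..Y))",
    ]
    # Each data line starts with the X of its first point, then the Y values.
    for i in range(0, len(xs), per_line):
        row = [xs[i]] + list(ys[i:i + per_line])
        lines.append(" ".join(str(v) for v in row))
    lines.append("##END=")
    return "\n".join(lines) + "\n"


# Hypothetical six-point IR spectrum, for illustration only.
jcamp_text = write_jcamp(
    "demo",
    [4000, 3999, 3998, 3997, 3996, 3995],
    [98, 97, 95, 90, 85, 88],
    xunits="1/CM",
    yunits="TRANSMITTANCE",
    data_type="INFRARED SPECTRUM",
    per_line=3,
)
print(jcamp_text)
```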
Actions
• Support and encourage new and EXISTING
standards
• In the meantime, reawaken and modernize the
JCAMP standard
• Encourage scientists to provide data
• Support those that may have good solutions
JCAMP-MOL
ChAMP – Stuart Chalk
Thank you
Email: tony27587@gmail.com
ORCID: 0000-0002-2668-4821
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams
