AnIML: A New Analytical Data Standard

Associate Professor of Chemistry at University of North Florida
Jun 13, 2016

  1. AnIML: A New Analytical Data Standard Stuart J. Chalk, Department of Chemistry, University of North Florida ACS Meeting Boston 2015
  2.  Data Formats  Goals for Data Handling  Introduction to AnIML  Sections of an AnIML file  AnIML Schemas and Files  AnIML Technique Definitions  Publishing Instrument Data  Referencing Data Elements  Calculations on Data  Future Developments  Conclusion Overview
  3.  Native Data Formats  Proprietary formats  "Metadata" separated from result data  Metadata and data in multiple files  Metadata not available electronically  No way to link metadata with result data  Interchange Data Formats  Available for only a few techniques  ANDI: GC, LC, MS  JCAMP-DX: UV/Vis, IR, NMR, IMS  Fixed order, fixed syntax, immutable formats  Content limitations  Inconsistent implementations Current Data Formats
  4.  Extensible  Easy to add new elements without breaking existing applications  Flexible  Useful for diverse needs: Interchange, Interconversion, Archiving...  Usable & Maintainable  Easy to create, use, adapt, maintain...  Readily available tools  Acceptable  Use standard mechanisms accepted by mainstream computing  Human readable  eXtensible Markup Language Goals for Data Handling
  5.  Extensible Markup Language (XML) specification  Development under ASTM E13.15 ‘AnIML Task Group’  Data standard to: “Develop an analytical data standard that can be used to store data from any analytical instrument” Introduction to AnIML
  6.  JCAMP-DX  ANDI (netCDF)  ThermoML (NIST)  SpectroML  Nguyen, A. D. T., Arslan, A., Travis, J., Smith, M., Schafer, R., & Kramer, G. W. (2004) ‘Molecular Spectrometry Data Interchange Applications for NIST's SpectroML’, JALA 9 (6), 346-354. doi:10.1016/j.jala.2004.09.001  Generalized Analytical Markup Language (GAML)  First official meeting March 23, 2003 @ ASTM Brief History of AnIML
  7.  Broad scope  Different types of data  Size of data sets  Everyone calls a ‘widget’ something different  Need for metadata dictionaries  One size does not fit all  Getting broad community involvement  Domain experts  User communities  What format? Challenges for AnIML
  8.  AnIML XML elements are ‘pigeon holes’ for metadata  Minimal ‘required’ information  If it’s not required you don’t have to include the element  Extensible  Store raw data not processed data (except for FT techniques)  Support for legacy data  Record of changes  Validatable  Signable (digital sense) AnIML Design Philosophy
  9. AnIML Schemas and Files
  10. Sections of an AnIML File
  11. AnIML Technique Definitions
  12. AnIML - Sample
  13. AnIML - Sample
  14. AnIML - Experiment
  15. AnIML - Result
  16.  Data storage format  Not just for spectral data  Access  Data  Metadata  Manipulate using XSLT  Validate  Signable AnIML in an ELN
  17.  AnIML Viewer -> Jmol/JSpecView Publish Supplementary Data
  18.  Conversion of AnIML data to SVG using XSLT Convert to Image File for Publication
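
The slide converts AnIML data to SVG with an XSLT stylesheet, which is not reproduced in this transcript. As a rough illustration of the same idea (turning a data series into an SVG line plot), here is a minimal Python sketch. Python stands in for XSLT, and the function name `series_to_svg`, the canvas size, and the sample values are all assumptions made for this example, not part of AnIML.

```python
import xml.etree.ElementTree as ET


def series_to_svg(xs, ys, width=400, height=200):
    """Render paired x/y values as an SVG polyline (illustrative sketch only)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 points")

    x_min, y_min = min(xs), min(ys)
    # Fall back to 1.0 so a flat series does not divide by zero.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0

    points = []
    for x, y in zip(xs, ys):
        px = (x - x_min) / x_span * width
        # SVG's y axis points down, so flip the scaled value.
        py = height - (y - y_min) / y_span * height
        points.append(f"{px:.1f},{py:.1f}")

    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": str(width),
        "height": str(height),
        "viewBox": f"0 0 {width} {height}",
    })
    ET.SubElement(svg, "polyline", {
        "points": " ".join(points),
        "fill": "none",
        "stroke": "black",
    })
    return ET.tostring(svg, encoding="unicode")


# Made-up example values (wavelength in nm, absorbance).
svg_text = series_to_svg([200, 300, 400], [0.1, 0.9, 0.5])
```

An XSLT version would do the same scaling inside a template that matches the data series elements of the AnIML file.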
  19.  Expose an AnIML file at a URL  Optional: Define a DOI for that URL  Use XPath to reference a specific data point in an AnIML file  //ExperimentStepSet[1]/ExperimentStep[1]/Method[1]/Author[1]/Name[1]  Encode the XPath expression so it can be part of the URL Open Instrument Data
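
The encode-and-resolve steps on this slide can be sketched in Python with only the standard library. The base URL, the `#xpath=` fragment convention, the helper names, and the tiny sample document are assumptions for illustration. Real AnIML files declare an XML namespace and contain much more, both of which this sketch omits.

```python
from urllib.parse import quote, unquote
import xml.etree.ElementTree as ET

XPATH = "//ExperimentStepSet[1]/ExperimentStep[1]/Method[1]/Author[1]/Name[1]"

# Hypothetical, heavily simplified stand-in for an AnIML file (no namespace).
DOC = """<AnIML>
  <ExperimentStepSet>
    <ExperimentStep>
      <Method>
        <Author><Name>A. Chemist</Name></Author>
      </Method>
    </ExperimentStep>
  </ExperimentStepSet>
</AnIML>"""


def build_reference(base_url, xpath):
    """Percent-encode the XPath so it can travel inside a URL.

    The '#xpath=' fragment is an assumed convention for this sketch.
    """
    return f"{base_url}#xpath={quote(xpath, safe='')}"


def resolve(xml_text, encoded_xpath):
    """Decode the XPath and return the text of the first matching element."""
    xpath = unquote(encoded_xpath)
    root = ET.fromstring(xml_text)
    # ElementTree accepts only relative paths, so '//...' becomes './/...'.
    node = root.find("." + xpath)
    return None if node is None else node.text


url = build_reference("https://example.org/data/sample.animl", XPATH)
encoded = url.split("#xpath=", 1)[1]
name = resolve(DOC, encoded)
```

`xml.etree.ElementTree` supports only a subset of XPath. The positional predicates used here are within that subset, but a fuller expression would need a complete XPath engine.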
  20. Part of a Data Management Plan  Federal agencies are mandating data be made available  Long-term archive format for research data  Referenceable if available online  Searchable with XQuery  Publish data processing algorithms (XSLT)  Future-proof data -> conversion to future data formats
  21.  The Healthcare and Life Science (HCLS) Community Profile is a Note from the Semantic Web HCLS Interest Group  Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. This document describes a consensus among participating stakeholders in the Health Care and the Life Sciences domain on the description of datasets using the Resource Description Framework (RDF). This specification meets key functional requirements, reuses existing vocabularies to the extent that it is possible, and addresses elements of data description, versioning, provenance, discovery, exchange, query, and retrieval. Data Descriptions: HCLS Community Profile
  22.  AnIML 1.0 Deliverables  Core Schema - Fundamental framework for AnIML documents  Technique Schema - Fundamental framework for technique definition and extension documents  AnIML Technique Definition Documents (ATDD) - Rules for content of specific technique file  AnIML Naming and Design Rules - Specifies rules about data element structure for interoperability  Standard Practice for AnIML Files - Describes how the specification is supposed to work  How to Create a Technique Definition Document - Guidelines for creating new technique definition documents  Other documents  Draft Requirements Specification for AnIML Version 1.0  Requirements and Goals of the Analytical Information Markup Language AnIML Specification
  23.  Documentation  Core specification  Technique and extension specification  Naming and design rules  Annotated technique definitions (UV/Vis, IR, 1D NMR, MS, Chromatography)  Balloting through ASTM (end of 2015)  Vendor, User, Developer extensions  Semantic extension of AnIML metadata items Future Developments
  24. Conclusion  AnIML is a great solution for storing instrument data  Human readable (UTF-8)  Platform neutral  Archivable  Validatable  AnIML leverages the extensive XML ecosystem of tools  Software engineers know XML
