A Generic Scientific Data Model and Ontology for Representation of Chemical Data

The current movement toward openness and sharing of data is likely to have a profound effect on the speed of scientific research and the complexity of questions we can answer. However, a fundamental problem with currently available datasets (and their metadata) is heterogeneity in terms of implementation, organization, and representation.

To address this issue we have developed a generic scientific data model (SDM) to organize and annotate raw and processed data, and the associated metadata. This paper will present the current status of the SDM, implementation of the SDM in JSON-LD, and the associated scientific data model ontology (SDMO). Example usage of the SDM to store data from a variety of sources will be discussed along with future plans for the work.

  1. A Generic Scientific Data Model and Ontology for Representation of Chemical Data
     Stuart J. Chalk, Department of Chemistry, University of North Florida, schalk@unf.edu
     CINF Paper 171 – 251st ACS Meeting Spring 2016, #ACSCINFDataSummit
  2. Scientific Data Should be Open
     - Simple: openness as the norm, not the exception
     - Data made available, without restriction, so it's useful
     - Mechanisms/tools to make data available
     - Formats to allow others to get the data…
     - …but also so it's easy to use
     - Annotate the data to make it easy to find
     - Community-driven promotion of and action on this issue
  3. Options for Storing Data?
     - Research Notebook
     - Spectral Files (JCAMP-DX, proprietary)
     - Excel Spreadsheets
     - Personal Databases
     - Online Databases
     - PDF Files: No!
     - RDF (Resource Description Framework): Yes!
  4. The Linked Data Platform
     - W3C Recommendation 2015
     - Specification - https://www.w3.org/TR/ldp/
     - Primer - https://www.w3.org/TR/ldp-primer/
     - From: http://www.dataversity.net/introduction-linked-data-platform/
  5. JSON for Linked Data (JSON-LD)
     - Use JavaScript Object Notation (JSON) as a text format for storing data and metadata so it can be converted to RDF
     - Example (can be tried at http://json-ld.org/playground/):

         {
           "@context": {
             "name": "http://schema.org/name",
             "isAlive": "http://example.org/isAlive",
             "age": "http://example.org/age",
             "height": "http://schema.org/height",
             "@base": "http://www.unf.edu/chemistry/stuart_chalk.aspx"
           },
           "@id": "",
           "name": "Stuart Chalk",
           "isAlive": true,
           "age": 49,
           "height": 188.0
         }
  6. JSON for Linked Data (JSON-LD)
     - The same document converted to RDF triples:

         <http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://example.org/age> "49"^^<http://www.w3.org/2001/XMLSchema#integer> .
         <http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://example.org/isAlive> "true"^^<http://www.w3.org/2001/XMLSchema#boolean> .
         <http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://schema.org/height> "188"^^<http://www.w3.org/2001/XMLSchema#integer> .
         <http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://schema.org/name> "Stuart Chalk" .
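
The conversion shown on slides 5 and 6 can be sketched in a few lines of Python. This is a deliberately simplified illustration using only the standard library, not the full JSON-LD to RDF algorithm (a real project would more likely rely on a JSON-LD processor). It assumes a flat document whose `@context` maps every key directly to an IRI, and the names `to_literal` and `to_ntriples` are hypothetical.

```python
import json
from urllib.parse import urljoin

XSD = "http://www.w3.org/2001/XMLSchema#"

# The JSON-LD document from slide 5.
DOC = json.loads("""
{
  "@context": {
    "name": "http://schema.org/name",
    "isAlive": "http://example.org/isAlive",
    "age": "http://example.org/age",
    "height": "http://schema.org/height",
    "@base": "http://www.unf.edu/chemistry/stuart_chalk.aspx"
  },
  "@id": "",
  "name": "Stuart Chalk",
  "isAlive": true,
  "age": 49,
  "height": 188.0
}
""")


def to_literal(value):
    """Serialize a JSON scalar as an N-Triples literal (simplified)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, (int, float)):
        if float(value).is_integer():  # 188.0 has no fractional part
            return f'"{int(value)}"^^<{XSD}integer>'
        # Assumption: repr() stands in for the canonical xsd:double form.
        return f'"{value!r}"^^<{XSD}double>'
    return json.dumps(value)  # plain string literal, quoted and escaped


def to_ntriples(doc):
    """Turn a flat JSON-LD document into sorted N-Triples lines."""
    context = doc["@context"]
    # Resolve the (possibly empty) "@id" against "@base" to get the subject IRI.
    subject = urljoin(context.get("@base", ""), doc.get("@id", ""))
    triples = []
    for key, value in doc.items():
        if key.startswith("@"):  # skip JSON-LD keywords
            continue
        triples.append(f"<{subject}> <{context[key]}> {to_literal(value)} .")
    return sorted(triples)


if __name__ == "__main__":
    print("\n".join(to_ntriples(DOC)))
```

Note that `188.0` is emitted as an `integer` literal, matching slide 6: JSON numbers without a fractional part are treated as integers in the conversion.
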
  7. Store all Scientific Data in RDF?
     - Nice idea, but because anything can be linked to anything else to form a graph of variable structure…
     - …it is difficult to search and hard to maintain
     - OK, use a regular relational database? Rigid schema: not good to try to make data fit the schema…
     - Use a hybrid approach!
     - Encode some structure in RDF using a framework…
     - …and add data to the structured graph in an organized way
  8. What Metadata is Important for Data?
     - Consider the FAIR Principles (http://www.datafairport.org)
     - To be Findable:
       - F1. (meta)data are assigned a globally unique and persistent identifier
       - F2. data are described with rich metadata (defined by R1 below)
       - F3. metadata clearly and explicitly include the identifier of the data it describes
       - F4. (meta)data are registered or indexed in a searchable resource
     - To be Accessible:
       - A1. (meta)data are retrievable by their identifier using a standardized communications protocol
       - A2. metadata are accessible, even when the data are no longer available
     - To be Interoperable:
       - I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
       - I2. (meta)data use vocabularies that follow FAIR principles
       - I3. (meta)data include qualified references to other (meta)data
     - To be Reusable:
       - R1. meta(data) are richly described with a plurality of accurate and relevant attributes
       - R1.1. (meta)data are released with a clear and accessible data usage license
       - R1.2. (meta)data are associated with detailed provenance
       - R1.3. (meta)data meet domain-relevant community standards
  9. What Should a Data Model Represent?
     - Define scope as data obtained from an experiment, a series of experiments, a project
     - Who did the work and where are they?
     - Metadata about the data “packet”
     - The raw data…
     - …its associated metadata (enough to properly contextualize the data)
     - Access rights
     - Published location
  10. General Framework
      - SciData – Scientific Data Model (SDM)
      - Overview – http://stuchalk.github.io/scidata/
      - GitHub Repo – https://github.com/stuchalk/scidata
  11. General Framework - The Context
      - “@context” contains the context definition:
        - References to other context files
        - Namespace abbreviations
        - Default vocabulary, “@vocab”
      - “@id” links a term to an ontology term
      - “@type” states the data type
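
How a context turns short keys into full IRIs can be illustrated with a small sketch. The context below is hypothetical: the `sdo` prefix, the `https://example.org/scidata#` vocabulary, and the `expand_term` helper are assumptions for illustration, not the actual SciData context files, and the logic covers only a fraction of JSON-LD's IRI expansion rules.

```python
# Hypothetical context: one namespace abbreviation, a default vocabulary,
# and one term that links to an ontology IRI and declares a data type.
CONTEXT = {
    "sdo": "http://schema.org/",
    "@vocab": "https://example.org/scidata#",
    "title": {
        "@id": "sdo:name",
        "@type": "http://www.w3.org/2001/XMLSchema#string",
    },
}


def expand_term(term, context):
    """Return the full IRI for a key, following simplified JSON-LD rules."""
    if term.startswith(("http://", "https://")):
        return term  # already an absolute IRI
    definition = context.get(term)
    if isinstance(definition, dict):  # expanded term definition with "@id"
        return expand_term(definition["@id"], context)
    if isinstance(definition, str):  # simple term -> IRI mapping
        return expand_term(definition, context)
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:  # compact IRI such as "sdo:name"
        return expand_term(context[prefix], context) + local
    return context.get("@vocab", "") + term  # fall back to the default vocabulary


if __name__ == "__main__":
    for key in ("title", "sdo:height", "pH"):
        print(key, "->", expand_term(key, CONTEXT))
```

Keys without their own definition, such as `pH` here, fall back to the default vocabulary, which is what lets a document stay compact while every key still resolves to an IRI.
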
  12. Methodology, System, and Dataset
  13. Example Data - pH
  14. Example Data - Literature Value
      - “scope” provides an internal link to an “@id” value
      - Each value of a name-value pair has a default data type that can be overridden by expanding the value to a JSON object and adding “@value” and “@type”
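
The override described on slide 14 corresponds to JSON-LD's expanded value object. A minimal sketch, assuming a hypothetical `as_value_object` helper and a default-type table that is illustrative rather than taken from the SciData context:

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def as_value_object(value):
    """Return a value in expanded {"@value": ..., "@type": ...} form."""
    if isinstance(value, dict) and "@value" in value:
        return value  # explicit override: keep the author's "@type"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        default = XSD + "boolean"
    elif isinstance(value, int):
        default = XSD + "integer"
    elif isinstance(value, float):
        default = XSD + "double"
    else:
        default = XSD + "string"
    return {"@value": value, "@type": default}


# A value written as a plain string gets the default string type...
plain = as_value_object("7.40")
# ...while the expanded form keeps the digits as written but types them as a decimal.
typed = as_value_object({"@value": "7.40", "@type": XSD + "decimal"})

if __name__ == "__main__":
    print(plain)
    print(typed)
```
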
  15. Example Data - NMR Spectrum
      - “dataseries” are JSON arrays of data on one axis
      - Bring them together with “datagroup” and we can represent a spectrum
      - “parameter” is a generic container for data or metadata
  16. Example Data – CC Calculation
      - “datagroup”s are structures to aggregate data at any level
      - “datagroup”s can be nested to any depth
      - “uid” is optional and can be used to uniquely identify any piece of data
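
The nesting of `datagroup` and `dataseries` can be pictured with a plain Python structure and a recursive walk. Only the names `datagroup`, `dataseries`, `parameter`, and `uid` come from the slides; every other key, the sample numbers, and the `iter_dataseries` helper are assumptions for illustration and do not reproduce the actual SciData schema.

```python
def iter_dataseries(group, path=()):
    """Yield (path, series) for every dataseries at any nesting depth."""
    here = path + (group.get("uid", "?"),)  # "uid" is optional
    for series in group.get("dataseries", []):
        yield here, series
    for child in group.get("datagroup", []):  # datagroups may contain datagroups
        yield from iter_dataseries(child, here)


# Hypothetical spectrum: one datagroup per axis, each holding one dataseries.
spectrum = {
    "uid": "spectrum1",
    "datagroup": [
        {
            "uid": "xaxis",
            "dataseries": [
                {
                    "label": "chemical shift",
                    "parameter": {"unit": "ppm", "values": [7.26, 2.17, 0.0]},
                }
            ],
        },
        {
            "uid": "yaxis",
            "dataseries": [
                {
                    "label": "intensity",
                    "parameter": {"unit": "arbitrary", "values": [0.12, 1.0, 0.05]},
                }
            ],
        },
    ],
}

if __name__ == "__main__":
    for path, series in iter_dataseries(spectrum):
        print("/".join(path), series["label"], len(series["parameter"]["values"]))
```

Because the walk recurses on `datagroup`, the same function handles a single spectrum or a deeply nested calculation output without modification.
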
  17. The SDM Ontology
      - SciData Ontology – Scientific Data Model Ontology (SDMO)
      - OWL File – https://github.com/stuchalk/scidata/blob/master/ontology/scidata.owl
  18. Future Work
      - Get community feedback, refine/extend/standardize
      - Generate a large corpus of disparate data in JSON-LD, ingest into a triple store and query (SPARQL)
      - Evaluate inferencing on the triple store data
      - Push adoption through collaboration
      - Run hackathons to build developer implementations
      - Develop an Electronic Laboratory Notebook (ELN) to generate data in JSON-LD
      - Get feedback from the data community, RDA - https://rd-alliance.org/
      - Test using the NDS - http://www.nationaldataservice.org/
  19. Pain Points?
      - Pain Points
      - Challenges
      - Opportunities
      - Normalization
      - Tools to generate metadata automatically
      - User Perspective
      - Gaps in Data
      - Gaps in Ontology Coverage
      Priorities?
      - Gather stakeholders to work on standards
      - Broad knowledge domain representation
      - i-UPAC, RDA Chemistry Research Data IG
      - Data annotation and representation
      - Data exchange (repo <-> repo, user <-> user)
      - Structure representation (chiral centers)
      - Curation infrastructures
      - Domain vocabulary translations
      - Units of measure
  20. Reality Check
      - “to err is human; to forgive, divine” (Alexander Pope)
      - “to err is human; to really screw things up requires a computer” (Paul Ehrlich)
      - “to err is human; all hell will break loose if you don’t provide accurate semantics to a computer” (Stuart Chalk)
  21. Questions?
      - schalk@unf.edu
      - Phone: 904-620-1938
      - Skype: stuartchalk
      - LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
      - ORCID: http://orcid.org/0000-0002-0703-7776
      - ResearcherID: http://www.researcherid.com/rid/D-8577-2013
