A Generic Scientific Data Model and Ontology for Representation of Chemical Data
Stuart J. Chalk, Department of Chemistry
University of North Florida
schalk@unf.edu
CINF Paper 171 – 251st ACS Meeting Spring 2016
#ACSCINFDataSummit
Scientific Data Should be Open
• Simple: openness as the norm, not the exception
• Data made available, without restriction, so it's useful
• Mechanisms/tools to make data available
• Formats that allow others to get the data…
• …but also make it easy to use
• Annotate the data to make it easy to find
• Community-driven promotion of, and action on, this issue
Options for Storing Data?
• Research Notebook
• Spectral Files (JCAMP-DX, proprietary)
• Excel Spreadsheets
• Personal Databases
• Online Databases
• PDF Files: No!
• RDF (Resource Description Framework): Yes!
The Linked Data Platform
• W3C Recommendation 2015
  Specification - https://www.w3.org/TR/ldp/
  Primer - https://www.w3.org/TR/ldp-primer/
From: http://www.dataversity.net/introduction-linked-data-platform/
JSON for Linked Data (JSON-LD)
• Use JavaScript Object Notation (JSON) as a text format for storing data and metadata so it can be converted to RDF
{
  "@context": {
    "name": "http://schema.org/name",
    "isAlive": "http://example.org/isAlive",
    "age": "http://example.org/age",
    "height": "http://schema.org/height",
    "@base": "http://www.unf.edu/chemistry/stuart_chalk.aspx"
  },
  "@id": "",
  "name": "Stuart Chalk",
  "isAlive": true,
  "age": 49,
  "height": 188.0
}
http://json-ld.org/playground/
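For a flat document like this one, the conversion to RDF triples can be sketched in a few lines of standard-library Python. This is only a toy illustration, not a JSON-LD processor: it assumes every key is mapped directly to a full IRI in "@context", that an empty "@id" resolves to "@base", and that all values are scalars. The function names `literal` and `flat_jsonld_to_ntriples` are made up for this sketch.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value):
    """Serialize a JSON scalar as an N-Triples literal (simplified)."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    # JSON-LD treats numbers without a fractional part as xsd:integer.
    if isinstance(value, int) or (isinstance(value, float) and value.is_integer()):
        return f'"{int(value)}"^^<{XSD}integer>'
    if isinstance(value, float):
        # Simplification: real processors emit the canonical xsd:double form.
        return f'"{value!r}"^^<{XSD}double>'
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def flat_jsonld_to_ntriples(doc):
    """Convert a flat JSON-LD object whose context maps terms to full IRIs."""
    context = doc["@context"]
    # An empty "@id" is a relative IRI, so it resolves to the "@base" IRI.
    subject = doc["@id"] or context["@base"]
    triples = [
        f"<{subject}> <{context[key]}> {literal(value)} ."
        for key, value in doc.items()
        if not key.startswith("@")  # skip JSON-LD keywords
    ]
    return sorted(triples)


doc = json.loads("""
{
  "@context": {
    "name": "http://schema.org/name",
    "isAlive": "http://example.org/isAlive",
    "age": "http://example.org/age",
    "height": "http://schema.org/height",
    "@base": "http://www.unf.edu/chemistry/stuart_chalk.aspx"
  },
  "@id": "",
  "name": "Stuart Chalk",
  "isAlive": true,
  "age": 49,
  "height": 188.0
}
""")

triples = flat_jsonld_to_ntriples(doc)
print("\n".join(triples))
```

Note that the JSON number 188.0 has no fractional part, so it is typed as an integer rather than a double. For real data, a full JSON-LD library should be used instead of this sketch.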
JSON for Linked Data (JSON-LD)
<http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://example.org/age> "49"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://example.org/isAlive> "true"^^<http://www.w3.org/2001/XMLSchema#boolean> .
<http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://schema.org/height> "188"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://www.unf.edu/chemistry/stuart_chalk.aspx> <http://schema.org/name> "Stuart Chalk" .
Store all Scientific Data in RDF?
• Nice idea, but because anything can be linked to anything else, the graph has a variable structure…
• …which makes it difficult to search and hard to maintain
• OK, use a regular relational database? Rigid schema: not good to try to make data fit the schema…
• Use a hybrid approach!
• Encode some structure in RDF using a framework…
• …and add data to the structured graph in an organized way
What Metadata is Important for Data?
• Consider the FAIR Principles (http://www.datafairport.org)
• To be Findable:
  • F1. (meta)data are assigned a globally unique and persistent identifier
  • F2. data are described with rich metadata (defined by R1 below)
  • F3. metadata clearly and explicitly include the identifier of the data it describes
  • F4. (meta)data are registered or indexed in a searchable resource
• To be Accessible:
  • A1. (meta)data are retrievable by their identifier using a standardized communications protocol
  • A2. metadata are accessible, even when the data are no longer available
• To be Interoperable:
  • I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
  • I2. (meta)data use vocabularies that follow FAIR principles
  • I3. (meta)data include qualified references to other (meta)data
• To be Reusable:
  • R1. (meta)data are richly described with a plurality of accurate and relevant attributes
  • R1.1. (meta)data are released with a clear and accessible data usage license
  • R1.2. (meta)data are associated with detailed provenance
  • R1.3. (meta)data meet domain-relevant community standards
What Should a Data Model Represent?
• Define scope as data obtained from an experiment, a series of experiments, or a project
• Who did the work and where are they?
• Metadata about the data “packet”
• The raw data…
• …and its associated metadata (enough to properly contextualize the data)
• Access rights
• Published location
General Framework
• SciData – Scientific Data Model (SDM)
• Overview – http://stuchalk.github.io/scidata/
• GitHub Repo – https://github.com/stuchalk/scidata
General Framework - The Context
• “@context” contains the context definition
• Refers to other context files
• Namespace abbreviations
• Default vocabulary “@vocab”
• “@id” links to an ontology term
• “@type” states the data type
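How namespace abbreviations and the default vocabulary turn short names into full IRIs can be sketched as follows. The context shown is hypothetical (the `example.org` IRIs and the terms `unit`, `value`, and `dataset` are invented for illustration and are not taken from the SciData context files), and `expand_term` is a simplified stand-in for the term expansion a real JSON-LD processor performs.

```python
# Hypothetical context: a default vocabulary, two namespace abbreviations,
# and two term definitions that use "@id" and "@type".
context = {
    "@vocab": "http://example.org/scidata#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://example.org/units#",
    "unit": {"@id": "ex:unit", "@type": "@id"},
    "value": {"@id": "http://example.org/scidata#value", "@type": "xsd:float"},
}


def expand_term(term, context):
    """Expand a term or compact IRI against a simple context (simplified)."""
    if term.startswith("@") or "://" in term:
        return term  # keywords and absolute IRIs are left unchanged
    if term in context:
        definition = context[term]
        # A term definition is either an IRI string or an object with "@id".
        iri = definition["@id"] if isinstance(definition, dict) else definition
        return iri if iri == term else expand_term(iri, context)
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context:
        # Namespace abbreviation: replace the prefix with its IRI.
        return expand_term(prefix, context) + suffix
    # Anything else falls back to the default vocabulary.
    return context.get("@vocab", "") + term


for name in ("unit", "xsd:float", "dataset"):
    print(name, "->", expand_term(name, context))
```

Here `unit` resolves through the `ex` abbreviation, `xsd:float` through the `xsd` abbreviation, and the undefined name `dataset` through “@vocab”.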
Methodology, System, and Dataset
Example Data - pH
Example Data - Literature Value
• “scope” provides an internal link to an “@id” value
• Each value of a name-value pair has a default data type that can be overridden by expanding the value to a JSON object and adding “@value” and “@type”
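The override described above can be illustrated with a small sketch. It assumes the default types follow JSON-LD's native mapping (boolean, integer, double, string); the helper name `to_value_object` and the sample value are hypothetical.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def to_value_object(value):
    """Return the expanded {"@value", "@type"} form of a JSON value."""
    if isinstance(value, dict) and "@value" in value:
        return value  # already expanded: the explicit "@type" wins
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        datatype = XSD + "boolean"
    elif isinstance(value, int):
        datatype = XSD + "integer"
    elif isinstance(value, float):
        datatype = XSD + "double"
    else:
        datatype = XSD + "string"
    return {"@value": value, "@type": datatype}


# Default typing: a bare number is treated as a double.
print(to_value_object(4.76))

# Override: keep the value as text but declare it an xsd:decimal.
print(to_value_object({"@value": "4.76", "@type": XSD + "decimal"}))
```

Writing the value as a string with an explicit “@type” is one way to avoid the rounding that can occur when a reported value passes through binary floating point.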
Example Data - NMR Spectrum
• “dataseries” are JSON arrays of data on one axis
• Bring them together with “datagroup” and we can represent a spectrum
• “parameter” is a generic container for data or metadata
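A rough sketch of how per-axis “dataseries” might be combined through a “datagroup” is shown here. Only the names “dataseries”, “datagroup”, and “parameter” come from the slide; the inner keys (`axis`, `unit`, `values`, `property`) and all numbers are invented for illustration and may not match the actual SciData layout.

```python
# Hypothetical structure with made-up numbers.
spectrum = {
    "datagroup": {
        "title": "1H NMR spectrum (illustrative values)",
        "dataseries": [
            {"label": "chemical shift", "axis": "x", "unit": "ppm",
             "values": [7.30, 7.26, 7.22]},
            {"label": "intensity", "axis": "y", "unit": "arbitrary",
             "values": [0.12, 1.00, 0.15]},
        ],
    },
    "parameter": [
        {"property": "observe frequency", "value": 400.0, "unit": "MHz"},
    ],
}


def points(datagroup):
    """Pair up the x and y dataseries of a datagroup into (x, y) tuples."""
    by_axis = {series["axis"]: series["values"] for series in datagroup["dataseries"]}
    xs, ys = by_axis["x"], by_axis["y"]
    if len(xs) != len(ys):
        raise ValueError("dataseries on different axes must have equal length")
    return list(zip(xs, ys))


xy = points(spectrum["datagroup"])
print(xy)
```

Keeping each axis as its own array keeps units and labels attached to the series, while the group supplies the pairing.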
Example Data – CC Calculation
• “datagroup”s are structures to aggregate data at any level
• “datagroup”s can be nested to any depth
• “uid” is optional and can be used to uniquely identify any piece of data
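Because groups can nest arbitrarily, code that consumes such data usually walks the structure recursively. The sketch below builds a lookup table from “uid” to the object carrying it; the nested example, its keys other than “datagroup” and “uid”, and its values are all hypothetical.

```python
# Hypothetical nested result with made-up values.
calculation = {
    "uid": "calc:1",
    "datagroup": [
        {
            "uid": "group:energies",
            "datagroup": [
                {"uid": "energy:1", "property": "total energy", "value": -76.4},
                {"property": "zero-point energy", "value": 0.02},  # no uid: optional
            ],
        },
        {"uid": "group:geometry", "atoms": 3},
    ],
}


def index_by_uid(node, index=None):
    """Recursively collect every object that has a "uid" into a dict."""
    if index is None:
        index = {}
    if isinstance(node, dict):
        uid = node.get("uid")
        if uid is not None:
            if uid in index:
                raise ValueError(f"duplicate uid: {uid}")
            index[uid] = node
        for child in node.values():
            index_by_uid(child, index)
    elif isinstance(node, list):
        for child in node:
            index_by_uid(child, index)
    return index


uids = index_by_uid(calculation)
print(sorted(uids))
print(uids["energy:1"]["value"])
```

Raising on a duplicate is a design choice made here so that each “uid” stays unambiguous within a document.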
The SDM Ontology
• SciData Ontology – Scientific Data Model Ontology (SDMO)
• OWL File – https://github.com/stuchalk/scidata/blob/master/ontology/scidata.owl
Future Work
• Get community feedback, refine/extend/standardize
• Generate a large corpus of disparate data in JSON-LD, ingest it into a triple store, and query it (SPARQL)
• Evaluate inferencing on the triple store data
• Push adoption through collaboration
• Run hackathons to build developer implementations
• Develop an Electronic Laboratory Notebook (ELN) to generate data in JSON-LD
• Get feedback from the data community, RDA - https://rd-alliance.org/
• Test using the NDS - http://www.nationaldataservice.org/
Pain Points?
• Pain Points
• Challenges
• Opportunities
• Normalization
• Tools to generate metadata automatically
• User Perspective
• Gaps in Data
• Gaps in Ontology Coverage
• Gather stakeholders to work on standards
• Broad knowledge domain representation
• IUPAC, RDA Chemistry Research Data IG
• Priorities?
• Data annotation and representation
• Data exchange (repo <-> repo, user <-> user)
• Structure representation (chiral centers)
• Curation infrastructures
• Domain vocabulary translations
• Units of measure
Reality Check
“to err is human; to forgive, divine”
Alexander Pope
“to err is human; to really screw things up requires a computer”
Paul Ehrlich
“to err is human; all hell will break loose if you don’t provide accurate semantics to a computer”
Stuart Chalk
Questions?
• schalk@unf.edu
• Phone: 904-620-1938
• Skype: stuartchalk
• LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
• ORCID: http://orcid.org/0000-0002-0703-7776
• ResearcherID: http://www.researcherid.com/rid/D-8577-2013
