SlideShare a Scribd company logo
1 of 16
Download to read offline
A model for capturing provenance of
assertions about chemical substances
Kody Moodley1
, Amrapali Zaveri1
, Chunlei Wu2
, Michel Dumontier1
1
Institute of Data Science @ Maastricht University
2
Scripps Research Institute
Chemical data resources on the Web
Provenance for scientific assertions
“Acetaminophen inhibits Prostaglandin G/H synthase 1”
Publication
Report Extract
Who? When? Publication? Experiment? Evidence?
"...information about entities, activities, and people involved in producing a piece of data or thing,
which can be used to form assessments about its quality, reliability or trustworthiness" - W3C
Challenges
1. No guidelines on which provenance should be reported and extracted (which attributes are
compulsory, recommended and optional?)
2. Different terminologies (ontologies) are used to capture the same provenance information across
different databases
???
???
Affects data producers, curators & API developers
Publication
ExtractReport
???
Related Work
DAta Tag Suite (DATS)
HCLS Dataset
Descriptions Bioschemas.org
Datasets: Assertions:
SEPIO
PROV-O
Dublin Core
PAV ontology
Limitations
Current provenance models:
● Do not define compulsory, recommended and optional provenance information
● Are “top-down”: propose new terminology for capturing provenance
Requirement level guidelines are needed to improve consistency and compliance in
provenance specification
“Bottom-up” approaches, taking into account how users currently specify provenance, can
help researchers to assist in standardizing its specification
Our Approach
● Which provenance properties are being specified by data publishers?
○ Analyse provenance coverage and propose attributes to consider
● Which ontology terms do they use to capture these properties?
○ Propose ontology terms to use for defining provenance attributes
● Which properties are more frequently used?
○ Propose compulsory, recommended and optional attributes
Datasets
Analyzed provenance usage in:
● Nanopublications
● Wikidata
And provenance advocated in:
● Minimal Information Standards (MISTS) for reporting biomedical studies
Nanopublications
● Extracted 333 properties from 10,803,231 nanopubs (05/04/2018)
● Manually pruned to 37 unique provenance properties
● Most frequently used:
Property Ontology
hasPublicationInfo Nanopubs nschema
created (date) DC terms
references SIO
contributor (author) DC terms
Wikidata
● All 157,522 records of type “Chemical Compound” (extracted 24/09/2018)
● Identified 76 unique relation / metadata types
● Pruned to 37 provenance related elements
● Most frequently used:
Property Ontology
statedin (database) Wikidata
retrieved (date extracted) Wikidata
referenceurl (website) Wikidata
publicationdate Wikidata
MISTS
● 14 reporting guidelines for biomedical studies in FAIRSharing.org
● Read publications to identify 347 required metadata elements for studies
● Pruned to only provenance related elements (44 in total)
● Examples:
○ ethics statement
○ apparatus
○ duration
○ location
Analysis
74 provenance properties naturally fell into 5 dimensions: Coverage of provenance metadata types in Wikidata & Nanopubs
Analysis
Partitioned 74 properties into 23 MUST, 23 SHOULD & 28 OPTIONAL by:
● Frequency of use in Nanopubs and Wikidata
● Advocacy in selected MISTS
● Relative importance in determining veracity of assertion (e.g. more important
to know title of publication than the chapter or page number)
Results https://github.com/MaastrichtU-IDS/biothingsprovenancemodel
{
"@context" : "http://biothings.io/biothingspovenancemodel.jsonld",
"id" : "assertion1",
"type" : "sio:assertion",
"label" : "Acetaminophen inhibits Prostaglandin G/H synthase 1",
"subject" : "drugbank:DB00316",
"predicate" : "sio:inhibits",
"object" : "drugbankb:BE0000017",
"assertedBy" : ”Regina M. Botting",
"coAssertedBy" : "Samir S. Ayoub",
"assertedOn" : "2004/11/23",
"publishedOn" : "2004/11/23",
"wasInferredFrom" : "assertion2",
"publishedIn" : “Prostaglandins, Leukotrienes and Essential Fatty Acids (PLEFA)",
"publicationTitle" : "COX-3 and the mechanism of action of paracetamol/acetaminophen",
}
Examples in JSON-LD & RDF:
JSON-LD context file is also provided in Github
for mapping to ontology terms
Future Work
● Implement model on data sources in BioThings API - MyChem.info
● Evaluate the feasibility of the model on these data sources by:
○ Measuring how much compulsory provenance is actually specified in these sources
● Discuss possible revisions and uptake within the biomedical community
○ Choice of compulsory provenance may differ across subdomains in chemical substance
research
Thank you!
Questions?
https://github.com/MaastrichtU-IDS/biothingsprovenancemodel
kody.moodley@maastrichtuniversity.nl
@MoodleyKody
Biomedical Data
Translator program
Funded by
https://ncats.nih.gov/translator

More Related Content

What's hot

NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for p...
NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for p...NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for p...
NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for p...European School of Oncology
 
Opportunities in chemical structure standardization
Opportunities in chemical structure standardizationOpportunities in chemical structure standardization
Opportunities in chemical structure standardizationValery Tkachenko
 
Implementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSImplementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSValery Tkachenko
 
OSFair2017 Workshop | OmicsDI: Omics discovery index
OSFair2017 Workshop | OmicsDI: Omics discovery indexOSFair2017 Workshop | OmicsDI: Omics discovery index
OSFair2017 Workshop | OmicsDI: Omics discovery indexOpen Science Fair
 
Linked Data for Biopharma
Linked Data for BiopharmaLinked Data for Biopharma
Linked Data for BiopharmaTom Plasterer
 
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)Gregor Hagedorn
 
Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Valery Tkachenko
 
Open Science Data Repository - the platform for materials research
Open Science Data Repository - the platform for materials researchOpen Science Data Repository - the platform for materials research
Open Science Data Repository - the platform for materials researchValery Tkachenko
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsTom Plasterer
 
FAIR data and model management for systems biology.
FAIR data and model management for systems biology.FAIR data and model management for systems biology.
FAIR data and model management for systems biology.FAIRDOM
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeTom Plasterer
 
OSFair2017 Workshop | How FAIR friendly is the FAIRDOM Hub? Exposing metadata...
OSFair2017 Workshop | How FAIR friendly is the FAIRDOM Hub? Exposing metadata...OSFair2017 Workshop | How FAIR friendly is the FAIRDOM Hub? Exposing metadata...
OSFair2017 Workshop | How FAIR friendly is the FAIRDOM Hub? Exposing metadata...Open Science Fair
 
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...Tom Plasterer
 
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)Tom Plasterer
 
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data ResourcesApplication of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data ResourcesPistoia Alliance
 
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...Tom Plasterer
 
FAIR sequencing data repository based on iRODS
FAIR sequencing data repository based on iRODSFAIR sequencing data repository based on iRODS
FAIR sequencing data repository based on iRODSFelipe Gutierrez
 
FAIR Data and Model Management for Systems Biology (and SOPs too!)
FAIR Data and Model Management for Systems Biology(and SOPs too!)FAIR Data and Model Management for Systems Biology(and SOPs too!)
FAIR Data and Model Management for Systems Biology (and SOPs too!)Carole Goble
 
ANDS presentation at AHMEN meeting 6 June 2016
ANDS presentation at AHMEN meeting 6 June 2016ANDS presentation at AHMEN meeting 6 June 2016
ANDS presentation at AHMEN meeting 6 June 2016ARDC
 

What's hot (20)

NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for p...
NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for p...NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for p...
NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for p...
 
Opportunities in chemical structure standardization
Opportunities in chemical structure standardizationOpportunities in chemical structure standardization
Opportunities in chemical structure standardization
 
Implementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSImplementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTS
 
OSFair2017 Workshop | OmicsDI: Omics discovery index
OSFair2017 Workshop | OmicsDI: Omics discovery indexOSFair2017 Workshop | OmicsDI: Omics discovery index
OSFair2017 Workshop | OmicsDI: Omics discovery index
 
Linked Data for Biopharma
Linked Data for BiopharmaLinked Data for Biopharma
Linked Data for Biopharma
 
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
 
Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...
 
Open Science Data Repository - the platform for materials research
Open Science Data Repository - the platform for materials researchOpen Science Data Repository - the platform for materials research
Open Science Data Repository - the platform for materials research
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge Graphs
 
FAIR data and model management for systems biology.
FAIR data and model management for systems biology.FAIR data and model management for systems biology.
FAIR data and model management for systems biology.
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to Practice
 
OSFair2017 Workshop | How FAIR friendly is the FAIRDOM Hub? Exposing metadata...
OSFair2017 Workshop | How FAIR friendly is the FAIRDOM Hub? Exposing metadata...OSFair2017 Workshop | How FAIR friendly is the FAIRDOM Hub? Exposing metadata...
OSFair2017 Workshop | How FAIR friendly is the FAIRDOM Hub? Exposing metadata...
 
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
 
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
 
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data ResourcesApplication of recently developed FAIR metrics to the ELIXIR Core Data Resources
Application of recently developed FAIR metrics to the ELIXIR Core Data Resources
 
IC-SDV 2019: OntoChem
IC-SDV 2019: OntoChemIC-SDV 2019: OntoChem
IC-SDV 2019: OntoChem
 
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
 
FAIR sequencing data repository based on iRODS
FAIR sequencing data repository based on iRODSFAIR sequencing data repository based on iRODS
FAIR sequencing data repository based on iRODS
 
FAIR Data and Model Management for Systems Biology (and SOPs too!)
FAIR Data and Model Management for Systems Biology(and SOPs too!)FAIR Data and Model Management for Systems Biology(and SOPs too!)
FAIR Data and Model Management for Systems Biology (and SOPs too!)
 
ANDS presentation at AHMEN meeting 6 June 2016
ANDS presentation at AHMEN meeting 6 June 2016ANDS presentation at AHMEN meeting 6 June 2016
ANDS presentation at AHMEN meeting 6 June 2016
 

Similar to Model for capturing provenance of chemical assertions

Metadata challenges research and re-usable data - BioSharing, ISA and STATO
Metadata challenges research and re-usable data - BioSharing, ISA and STATOMetadata challenges research and re-usable data - BioSharing, ISA and STATO
Metadata challenges research and re-usable data - BioSharing, ISA and STATOAlejandra Gonzalez-Beltran
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Greg Landrum
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAGopen_phacts
 
Toward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and Present
Toward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and PresentToward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and Present
Toward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and PresentTim Williams
 
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...Andrew McEachran
 
FAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingFAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingMerce Crosas
 
Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...Michel Dumontier
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...GigaScience, BGI Hong Kong
 
Opening up pharmacological space, the OPEN PHACTs api
Opening up pharmacological space, the OPEN PHACTs apiOpening up pharmacological space, the OPEN PHACTs api
Opening up pharmacological space, the OPEN PHACTs apiChris Evelo
 
Open science in RIKEN-KI doctorial course on March 20, 2019
Open science in RIKEN-KI doctorial course on March 20, 2019Open science in RIKEN-KI doctorial course on March 20, 2019
Open science in RIKEN-KI doctorial course on March 20, 2019Takeya Kasukawa
 
Keeping up to date with information retrieval research: Summarized Research i...
Keeping up to date with information retrieval research: Summarized Research i...Keeping up to date with information retrieval research: Summarized Research i...
Keeping up to date with information retrieval research: Summarized Research i...Patrice Chalon
 
Lankade data Vinnova webbinarium
Lankade data Vinnova webbinarium Lankade data Vinnova webbinarium
Lankade data Vinnova webbinarium Kerstin Forsberg
 
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Susanna-Assunta Sansone
 
Open Access Week - Oxford, 20-24 Oct 2014
Open Access Week - Oxford, 20-24 Oct 2014Open Access Week - Oxford, 20-24 Oct 2014
Open Access Week - Oxford, 20-24 Oct 2014Susanna-Assunta Sansone
 
DisGeNET: A discovery platform for the dynamical exploration of human disease...
DisGeNET: A discovery platform for the dynamical exploration of human disease...DisGeNET: A discovery platform for the dynamical exploration of human disease...
DisGeNET: A discovery platform for the dynamical exploration of human disease...Núria Queralt Rosinach
 

Similar to Model for capturing provenance of chemical assertions (20)

Metadata challenges research and re-usable data - BioSharing, ISA and STATO
Metadata challenges research and re-usable data - BioSharing, ISA and STATOMetadata challenges research and re-usable data - BioSharing, ISA and STATO
Metadata challenges research and re-usable data - BioSharing, ISA and STATO
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG
 
Toward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and Present
Toward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and PresentToward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and Present
Toward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and Present
 
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
 
FAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingFAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data Sharing
 
Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...Building a Network of Interoperable and Independently Produced Linked and Ope...
Building a Network of Interoperable and Independently Produced Linked and Ope...
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
 
Opening up pharmacological space, the OPEN PHACTs api
Opening up pharmacological space, the OPEN PHACTs apiOpening up pharmacological space, the OPEN PHACTs api
Opening up pharmacological space, the OPEN PHACTs api
 
Open science in RIKEN-KI doctorial course on March 20, 2019
Open science in RIKEN-KI doctorial course on March 20, 2019Open science in RIKEN-KI doctorial course on March 20, 2019
Open science in RIKEN-KI doctorial course on March 20, 2019
 
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
 
Keeping up to date with information retrieval research: Summarized Research i...
Keeping up to date with information retrieval research: Summarized Research i...Keeping up to date with information retrieval research: Summarized Research i...
Keeping up to date with information retrieval research: Summarized Research i...
 
Lankade data Vinnova webbinarium
Lankade data Vinnova webbinarium Lankade data Vinnova webbinarium
Lankade data Vinnova webbinarium
 
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
 
Preparing Data for Sharing: The FAIR Principles
Preparing Data for Sharing: The FAIR PrinciplesPreparing Data for Sharing: The FAIR Principles
Preparing Data for Sharing: The FAIR Principles
 
Sansone mibbi-intro
Sansone mibbi-introSansone mibbi-intro
Sansone mibbi-intro
 
Open Access Week - Oxford, 20-24 Oct 2014
Open Access Week - Oxford, 20-24 Oct 2014Open Access Week - Oxford, 20-24 Oct 2014
Open Access Week - Oxford, 20-24 Oct 2014
 
Enhance your rese​arch impact through open science
Enhance your rese​arch impact through open scienceEnhance your rese​arch impact through open science
Enhance your rese​arch impact through open science
 
DisGeNET: A discovery platform for the dynamical exploration of human disease...
DisGeNET: A discovery platform for the dynamical exploration of human disease...DisGeNET: A discovery platform for the dynamical exploration of human disease...
DisGeNET: A discovery platform for the dynamical exploration of human disease...
 
NIH BD2K DataMed model, DATS
NIH BD2K DataMed model, DATSNIH BD2K DataMed model, DATS
NIH BD2K DataMed model, DATS
 

Recently uploaded

Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 

Recently uploaded (20)

Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 

Model for capturing provenance of chemical assertions

  • 1. A model for capturing provenance of assertions about chemical substances Kody Moodley1 , Amrapali Zaveri1 , Chunlei Wu2 , Michel Dumontier1 1 Institute of Data Science @ Maastricht University 2 Scripps Research Institute
  • 3. Provenance for scientific assertions “Acetaminophen inhibits Prostaglandin G/H synthase 1” Publication Report Extract Who? When? Publication? Experiment? Evidence? "...information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness" - W3C
  • 4. Challenges 1. No guidelines on which provenance should be reported and extracted (which attributes are compulsory, recommended and optional?) 2. Different terminologies (ontologies) are used to capture the same provenance information across different databases ??? ??? Affects data producers, curators & API developers Publication ExtractReport ???
  • 5. Related Work DAta Tag Suite (DATS) HCLS Dataset Descriptions Bioschemas.org Datasets: Assertions: SEPIO PROV-O Dublin Core PAV ontology
  • 6. Limitations Current provenance models: ● Do not define compulsory, recommended and optional provenance information ● Are “top-down”: propose new terminology for capturing provenance Requirement level guidelines are needed to improve consistency and compliance in provenance specification “Bottom-up” approaches, taking into account how users currently specify provenance, can help researchers to assist in standardizing its specification
  • 7. Our Approach ● Which provenance properties are being specified by data publishers? ○ Analyse provenance coverage and propose attributes to consider ● Which ontology terms do they use to capture these properties? ○ Propose ontology terms to use for defining provenance attributes ● Which properties are more frequently used? ○ Propose compulsory, recommended and optional attributes
  • 8. Datasets Analyzed provenance usage in: ● Nanopublications ● Wikidata And provenance advocated in: ● Minimal Information Standards (MISTS) for reporting biomedical studies
  • 9. Nanopublications ● Extracted 333 properties from 10,803,231 nanopubs (05/04/2018) ● Manually pruned to 37 unique provenance properties ● Most frequently used: Property Ontology hasPublicationInfo Nanopubs nschema created (date) DC terms references SIO contributor (author) DC terms
  • 10. Wikidata ● All 157,522 records of type “Chemical Compound” (extracted 24/09/2018) ● Identified 76 unique relation / metadata types ● Pruned to 37 provenance related elements ● Most frequently used: Property Ontology statedin (database) Wikidata retrieved (date extracted) Wikidata referenceurl (website) Wikidata publicationdate Wikidata
  • 11. MISTS ● 14 reporting guidelines for biomedical studies in FAIRSharing.org ● Read publications to identify 347 required metadata elements for studies ● Pruned to only provenance related elements (44 in total) ● Examples: ○ ethics statement ○ apparatus ○ duration ○ location
  • 12. Analysis 74 provenance properties naturally fell into 5 dimensions: Coverage of provenance metadata types in Wikidata & Nanopubs
  • 13. Analysis Partitioned 74 properties into 23 MUST, 23 SHOULD & 28 OPTIONAL by: ● Frequency of use in Nanopubs and Wikidata ● Advocacy in selected MISTS ● Relative importance in determining veracity of assertion (e.g. more important to know title of publication than the chapter or page number)
  • 14. Results https://github.com/MaastrichtU-IDS/biothingsprovenancemodel { "@context" : "http://biothings.io/biothingspovenancemodel.jsonld", "id" : "assertion1", "type" : "sio:assertion", "label" : "Acetaminophen inhibits Prostaglandin G/H synthase 1", "subject" : "drugbank:DB00316", "predicate" : "sio:inhibits", "object" : "drugbankb:BE0000017", "assertedBy" : ”Regina M. Botting", "coAssertedBy" : "Samir S. Ayoub", "assertedOn" : "2004/11/23", "publishedOn" : "2004/11/23", "wasInferredFrom" : "assertion2", "publishedIn" : “Prostaglandins, Leukotrienes and Essential Fatty Acids (PLEFA)", "publicationTitle" : "COX-3 and the mechanism of action of paracetamol/acetaminophen", } Examples in JSON-LD & RDF: JSON-LD context file is also provided in Github for mapping to ontology terms
  • 15. Future Work ● Implement model on data sources in BioThings API - MyChem.info ● Evaluate the feasibility of the model on these data sources by: ○ Measuring how much compulsory provenance is actually specified in these sources ● Discuss possible revisions and uptake within the biomedical community ○ Choice of compulsory provenance may differ across subdomains in chemical substance research