With the growth of the Semantic Web as a medium for creating, consuming, mashing up and republishing data, our ability to trace any statement(s) back to their origin is becoming ever more important. Several approaches have now been proposed to associate statements with provenance, with multiple applications in data publication, attribution and argumentation. Here, we describe the ovopub, a modular model for data publication that enables encapsulation, aggregation, integrity checking, and selective-source query answering. We describe the ovopub RDF specification, key design patterns and their application in the publication and referral to data in the life sciences.
paper: http://arxiv.org/abs/1305.6800
presented at bio-ontologies 2013: https://sites.google.com/site/bioontologies/home
Ovopub: Modular data publication with minimal provenance
1. Alison Callahan and Michel Dumontier
Carleton University
Ovopubs: Modular data publication with minimal provenance
Dumontier::Bio-ontologies 2013:Ovopubs 1
2. Data publication
• Emerging interest in publishing data on the web
• Microdata formats (RDFa, schema.org) and formal knowledge representation languages (RDF/OWL)
• Efforts to capture credit/provenance of assertions
  - PROV-O, OAG
  - nanopublications (data/statements - Groth, Kuhn)
  - microattributions (gene variation - Patrinos et al.)
  - micropublications (discourse - Clark et al.)
3. Nanopublication
• A nanopublication claims to be the "smallest, unambiguous unit of thought".
• A nanopublication is an RDF graph that links to two/three graphs:
  - A graph containing one or more assertions
  - A graph containing the provenance for the assertion(s)
  - A graph providing information about the nanopublication
(Figure labels: assertion, provenance, publication)
Problems: indirection between an assertion and its provenance; what if no provenance is provided? A nanopub graph cannot fully contain other graphs; reasoning and ease of querying across nested graphs.
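The indirection problem noted above can be illustrated with a minimal sketch in plain Python. It assumes the three-graph layout described on this slide, models named graphs as sets of triples, and uses hypothetical example.org IRIs throughout. The `np:` property names follow the nanopublication schema naming, but the data and the `provenance_of` helper are illustrative assumptions, not part of the original slides.

```python
# Assumption: np: terms follow the nanopublication schema; all EX IRIs are made up.
NP = "http://www.nanopub.org/nschema#"
EX = "http://example.org/"

# Each named graph maps to a set of (subject, predicate, object) triples.
graphs = {
    EX + "np1/head": {
        (EX + "np1", NP + "hasAssertion", EX + "np1/assertion"),
        (EX + "np1", NP + "hasProvenance", EX + "np1/provenance"),
        (EX + "np1", NP + "hasPublicationInfo", EX + "np1/pubinfo"),
    },
    EX + "np1/assertion": {
        (EX + "geneA", EX + "interactsWith", EX + "geneB"),
    },
    EX + "np1/provenance": {
        (EX + "np1/assertion", EX + "derivedFrom", EX + "experiment42"),
    },
    EX + "np1/pubinfo": {
        (EX + "np1", EX + "creator", EX + "alice"),
    },
}


def provenance_of(assertion_graph):
    """Collect provenance triples for an assertion graph via the linking graph."""
    result = set()
    for triples in graphs.values():
        # Hop 1: find the nanopublication that declares this assertion graph.
        for s, p, o in triples:
            if p == NP + "hasAssertion" and o == assertion_graph:
                # Hop 2: follow the same nanopublication to its provenance graph.
                for s2, p2, o2 in triples:
                    if s2 == s and p2 == NP + "hasProvenance":
                        # Hop 3: read the triples stored in that graph.
                        result |= graphs.get(o2, set())
    return result
```

Reaching the provenance of a single assertion takes three hops across separate graphs, and the lookup returns an empty set when no provenance graph was supplied.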
4. An ovopub is an object that contains and links to data and the ovopub's provenance
(Figure labels: data, provenance)
5. An assertion ovopub contains one or more connected statements
This ovopub is good for capturing knowledge in the form of statements
6. An ovopub also links itself to its content
• rdfs:member <uri>
This explicit reification enables transitive closures over graph structures
7. An ovopub contains and links to its own provenance
• dc:creator <uri> (creator)
• dc:created xsd:dateTime (timestamp)
• dc:license <uri> (license)
• rdf:type sio:assertion-ovopub / sio:collection-ovopub (ovopub type)
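To make the structure of slides 5 to 7 concrete, here is a minimal sketch that assembles one assertion ovopub as a set of triples. It assumes each statement gets its own IRI and is reified with the standard `rdf:subject` / `rdf:predicate` / `rdf:object` vocabulary; that choice, the `make_assertion_ovopub` helper, and all `ex:` names are hypothetical and are not taken from the ovopub specification.

```python
from datetime import datetime, timezone


def make_assertion_ovopub(ovopub, statements, creator, license_uri, created):
    """Return the triples of one assertion ovopub as a set of tuples.

    statements maps a (hypothetical) statement IRI to its (s, p, o) triple.
    """
    triples = set()
    for stmt_iri, (s, p, o) in statements.items():
        # The ovopub links itself to its content.
        triples.add((ovopub, "rdfs:member", stmt_iri))
        # Assumed reification of each statement.
        triples.add((stmt_iri, "rdf:type", "rdf:Statement"))
        triples.add((stmt_iri, "rdf:subject", s))
        triples.add((stmt_iri, "rdf:predicate", p))
        triples.add((stmt_iri, "rdf:object", o))
    # Minimal provenance attached directly to the ovopub itself.
    triples.add((ovopub, "rdf:type", "sio:assertion-ovopub"))
    triples.add((ovopub, "dc:creator", creator))
    triples.add((ovopub, "dc:created", '"%s"^^xsd:dateTime' % created.isoformat()))
    triples.add((ovopub, "dc:license", license_uri))
    return triples


example = make_assertion_ovopub(
    ovopub="ex:ovopub1",
    statements={"ex:stmt1": ("ex:geneA", "ex:interactsWith", "ex:geneB")},
    creator="ex:alice",
    license_uri="ex:some-license",
    created=datetime(2013, 7, 20, tzinfo=timezone.utc),
)
```

Because the provenance triples share the ovopub as their subject, data and provenance are reachable from one node without the extra hop through a separate linking graph.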
8. A collection ovopub contains one or more unconnected items
Item types:
- object
- assertion ovopub
- collection ovopub
This ovopub is good for
- encapsulation and redistribution of selected content
- restriction of query execution / results
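The two uses listed above, together with the transitive closure over `rdfs:member` mentioned on slide 6, can be sketched as follows. This is an illustration under stated assumptions: triples are plain Python tuples, the `ex:` identifiers and both helper functions are hypothetical, and a real deployment would more likely express this as a query over an RDF store.

```python
def members_closure(triples, root):
    """All items reachable from root by following rdfs:member links."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        for s, p, o in triples:
            if s == node and p == "rdfs:member" and o not in seen:
                seen.add(o)        # the seen set also protects against cycles
                stack.append(o)
    return seen


def query_within(triples, collection, predicate):
    """Triples with the given predicate whose subject lies inside the collection."""
    scope = members_closure(triples, collection)
    return {t for t in triples if t[0] in scope and t[1] == predicate}


data = {
    ("ex:collection1", "rdf:type", "sio:collection-ovopub"),
    ("ex:collection1", "rdfs:member", "ex:ovopub1"),
    ("ex:ovopub1", "rdf:type", "sio:assertion-ovopub"),
    ("ex:ovopub1", "rdfs:member", "ex:stmt1"),
    ("ex:stmt1", "rdf:object", "ex:geneB"),
    # Not a member of ex:collection1, so it is excluded from restricted queries.
    ("ex:ovopub2", "rdfs:member", "ex:stmt2"),
    ("ex:stmt2", "rdf:object", "ex:geneC"),
}

in_scope = query_within(data, "ex:collection1", "rdf:object")
```

Here `in_scope` contains only the triple about `ex:stmt1`, since `ex:stmt2` is not reachable from the selected collection.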
9. iRefIndex: Ovopub Case Study for Datasets, Records, Assertions
10. Future work
• Actively develop the nanopublication as a community standard for provenance-based data publication
  - Assess the value of directly linking assertion & provenance graphs
  - Generate (revised) nanopublications in Bio2RDF
• Promote nanopublication-based design patterns for:
  - direct/indirect data/discourse assertions
  - aggregation semantics
• Use of nanopublications for scientific research
  - Evidence gathering (HyQue)
To keep a clear link back to the original data, our RDFized datasets maintain the original data provider's record identifiers by using the following URI pattern:
namespace: the preferred short name for a biological dataset.
The registry allows automatic conversion of any alternative namespace to the preferred namespace from the life sciences registry.
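The registry-driven conversion described here could look roughly like the sketch below. It only normalizes the namespace part of a compact `namespace:identifier` string and deliberately does not construct full URIs. The registry entries and the `normalize` function are hypothetical stand-ins for the life sciences registry, not its actual contents or interface.

```python
# Hypothetical registry excerpt: alternative namespace -> preferred namespace.
PREFERRED = {
    "uniprotkb": "uniprot",
    "swissprot": "uniprot",
    "entrezgene": "ncbigene",
}


def normalize(prefixed_id):
    """Rewrite 'namespace:identifier' so it uses the preferred namespace."""
    namespace, sep, identifier = prefixed_id.partition(":")
    if not sep or not identifier:
        raise ValueError("expected 'namespace:identifier', got %r" % prefixed_id)
    ns = namespace.strip().lower()
    # Unknown namespaces pass through unchanged (lowercased only).
    return PREFERRED.get(ns, ns) + ":" + identifier
```

The provider's record identifier is left untouched, so the link back to the original record is preserved while the namespace becomes consistent across datasets.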