Making scientific results available and archiving them is still largely considered the task of classical publishing companies, even though classical forms of publishing centered on printed narrative articles no longer seem well suited to the digital age. In particular, there are currently no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science.
Here we propose to design scientific data publishing as a Web-based bottom-up process, without top-down control of central authorities such as publishing companies. Based on a novel combination of existing concepts and technologies, we present a server network to decentrally store and archive data in the form of nanopublications, an RDF-based format to represent scientific data.
We show how this approach allows researchers to publish, retrieve, verify, and recombine datasets of nanopublications in a reliable and trustworthy manner, and we argue that this architecture could be used for the Semantic Web in general. Evaluation of the current small network shows that this system is efficient and reliable.
Publishing without Publishers: a Decentralized Approach to Dissemination, Retrieval, and Archiving of Data
1. Publishing without Publishers: a
Decentralized Approach to Dissemination,
Retrieval, and Archiving of Data
Tobias Kuhn
http://www.tkuhn.org
@txkuhn
VU University Amsterdam
14th International Semantic Web Conference (ISWC 2015)
Bethlehem, Pennsylvania, USA
15 October 2015
2. Increasing Importance of Scientific Data
https://www.google.com/trends/explore#q=%22data%20science%22
Tobias Kuhn, VU University Amsterdam Publishing without Publishers: a Decentralized Approach ... 2 / 20
3. Scientific Data as Supplemental Material
...
http://www.nature.com/ni/journal/v16/n10/full/ni.3267.html#supplementary-information
4. Scientific Data in Open Repositories
5. Problems with Current Data Publishing Solutions
• No enforcement of interoperable data formats such as RDF
• No enforcement of minimal provenance information
• A given dataset is often only available at one place, and
therefore inaccessible if that website happens to be down
• No guarantee that the data has not been modified/corrupted
• No possibility to access and identify individual data entries or
subsets of datasets
6. (Scientific) Data Publishing
Published data should be:
• Verifiable (Is this really the data I am looking for?)
• Immutable (Can I be sure that it hasn’t been modified?)
• Permanent (Will it be available in 1, 5, 20 years from now?)
• Reliable (Can it be efficiently retrieved whenever needed?)
• Granular (Can I refer to individual data entries?)
• Semantic (Can it be automatically interpreted?)
• Linked (Does it use established identifiers and ontologies?)
• Trustworthy (Can I trust the source?)
7. Requirement: Reliable Low-Level Operations
We need reliable low-level operations to publish data entries and
datasets:
publish <data-entry>
publish <dataset>
... and operations to retrieve data entries and datasets by their
identifiers:
get <data-entry-id>
get <dataset-id>
(like HTTP POST/GET but verifiable, immutable, permanent, reliable, ...)
Approach: Linked Data + Cryptography + Decentralization
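The publish/get operations above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class ToyNetwork, its method names, and the use of a plain SHA-256 hex digest as identifier are hypothetical and are not the mechanism used by the actual system.

```python
import hashlib
from typing import Dict, List


class ToyNetwork:
    """In-memory stand-in for the publishing infrastructure (hypothetical)."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}

    def publish(self, content: bytes) -> str:
        # The identifier is derived from the content, so the same identifier
        # can never point to different content later on.
        entry_id = hashlib.sha256(content).hexdigest()
        self._entries.setdefault(entry_id, content)
        return entry_id

    def publish_dataset(self, entry_ids: List[str]) -> str:
        # Assumption: a dataset is published like any entry, as a list of IDs.
        return self.publish("\n".join(entry_ids).encode("utf-8"))

    def get(self, entry_id: str) -> bytes:
        content = self._entries[entry_id]
        # Verifiable: recompute the identifier from what was retrieved.
        if hashlib.sha256(content).hexdigest() != entry_id:
            raise ValueError("content does not match its identifier")
        return content

    def get_dataset(self, dataset_id: str) -> List[bytes]:
        text = self.get(dataset_id).decode("utf-8")
        entry_ids = [line for line in text.split("\n") if line]
        return [self.get(entry_id) for entry_id in entry_ids]
```

Because identifiers depend only on content, publishing the same entry twice is harmless and any retrieved copy can be checked by the client.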
8. Current Two-Layer Semantic Web Architectures
are not Fully Reliable
plain HTTP requests
and follow-your-nose:
applications (find/query/analyze/use data)
resolvable URIs (provide data)
SPARQL endpoints:
applications (analyze/use data)
SPARQL endpoints (provide/find/query/analyze data)
Linked Data Fragments:
applications (query/analyze/use data)
LDF servers (provide/find/query data)
A single third-party server being down can break the entire
application!
9. Proposed Multi-Layer Architecture
applications (analyze/use data)
decentralized server network (provide data)
10. Proposed Multi-Layer Architecture
applications (analyze/use data)
advanced services (query/analyze data)
core services (find data)
decentralized server network (provide data)
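Why a decentralized bottom layer helps can be sketched as follows: a client asks several servers in turn, so one unreachable server does not break the application. The function names and the way servers are modeled as plain callables are assumptions made for illustration only.

```python
from typing import Callable, Dict, Iterable, Optional

# A "server" is modeled as a function from identifier to content (or None).
Fetcher = Callable[[str], Optional[bytes]]


def get_from_network(entry_id: str, servers: Iterable[Fetcher]) -> bytes:
    """Try each server in turn and return the first successful answer."""
    for fetch in servers:
        try:
            content = fetch(entry_id)
        except ConnectionError:
            continue  # this server is down; try the next one
        if content is not None:
            return content
    raise LookupError(f"no reachable server holds {entry_id}")


def down_server(entry_id: str) -> Optional[bytes]:
    """Simulates a server that cannot be reached."""
    raise ConnectionError("server unreachable")


def make_server(entries: Dict[str, bytes]) -> Fetcher:
    """Simulates a working server backed by a dictionary."""
    return entries.get
```

With verifiable identifiers, the client does not need to trust whichever server happens to answer, since the content can be checked against the identifier.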
11. Nanopublications: Linked Data Containers for
Provenance-Aware Semantic Publishing
[Figure: a nanopublication consisting of three parts: assertion, provenance, and publication info]
http://nanopub.org / @nanopub_org
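The three-part structure can be sketched as a small data type. This is a simplified illustration, not the RDF named-graph representation itself: triples are plain string tuples, and all example.org URIs as well as the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), simplified


@dataclass(frozen=True)
class Nanopublication:
    uri: str
    assertion: Tuple[Triple, ...]         # the scientific claim itself
    provenance: Tuple[Triple, ...]        # where the assertion comes from
    publication_info: Tuple[Triple, ...]  # metadata about the nanopublication

    def parts(self) -> Dict[str, Tuple[Triple, ...]]:
        """Return the three parts by name."""
        return {
            "assertion": self.assertion,
            "provenance": self.provenance,
            "publication_info": self.publication_info,
        }


# Hypothetical example content, for illustration only.
example = Nanopublication(
    uri="http://example.org/np1",
    assertion=(
        ("http://example.org/geneX",
         "http://example.org/isRelatedTo",
         "http://example.org/diseaseY"),
    ),
    provenance=(
        ("http://example.org/np1#assertion",
         "http://www.w3.org/ns/prov#wasDerivedFrom",
         "http://example.org/experiment1"),
    ),
    publication_info=(
        ("http://example.org/np1",
         "http://purl.org/dc/terms/creator",
         "http://example.org/alice"),
    ),
)
```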
12. Trusty URIs: Cryptographic Hash Values for
Verifiable and Immutable Web Identifiers
Nanopublications with Trusty URIs are ...
Verifiable + Immutable + Permanent
http://example.org/r1.RA5AbXdpz5DcaYXCh9l3eI9ruBosiL5XDU3rxBbBaUO70.trig
http://trustyuri.net/
Kuhn, Dumontier. Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data. ESWC 2014.
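The idea of putting a cryptographic hash into the identifier can be sketched as below. This is a simplified illustration and not the Trusty URI algorithm: it hashes raw bytes with SHA-256, omits the two-character prefix seen in the example above (RA), and does no RDF-specific processing. The function names are hypothetical.

```python
import base64
import hashlib


def hash_code(content: bytes) -> str:
    """SHA-256 digest in URL-safe Base64 without padding (43 characters)."""
    digest = hashlib.sha256(content).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def make_hash_uri(base_uri: str, content: bytes) -> str:
    """Append the hash of the content to the base URI."""
    return f"{base_uri}.{hash_code(content)}"


def verify(uri: str, content: bytes) -> bool:
    """Check that the content matches the hash embedded in the URI."""
    # URL-safe Base64 never contains ".", so the last "." starts the hash.
    _, _, code = uri.rpartition(".")
    return code == hash_code(content)
```

Any change to the content changes the hash, so a modified artifact no longer matches its URI. That is what makes such identifiers verifiable and the content behind them effectively immutable.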
13. Decentralized and Reliable Publishing with a
Nanopublication Server Network
[Figure: nanopublications with Trusty URIs and the server network, showing publication, retrieval, and propagation/archiving]
http://npmonitor.inn.ac
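Propagation between servers can be pictured with a toy model in which every server copies what it lacks from the others. This is only a sketch: servers are plain dictionaries, the function name is hypothetical, and the real network propagates via journal pages and packages rather than by comparing full stores.

```python
from typing import Dict, List

ServerStore = Dict[str, bytes]  # identifier -> content


def propagate_once(servers: List[ServerStore]) -> int:
    """Let every server pull what it lacks from every other server.

    Returns the number of copies made in this round.
    """
    copied = 0
    for target in servers:
        for source in servers:
            if source is target:
                continue
            for identifier, content in list(source.items()):
                if identifier not in target:
                    target[identifier] = content
                    copied += 1
    return copied
```

After enough rounds every server holds every item, so content published at one server remains retrievable even if that server later disappears.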
14. Defining Datasets with Nanopublication Indexes
(which are themselves Nanopublications)
[Figure: nanopublication indexes (a) to (f), connected by the relations "appends", "has sub-index", and "has element"]
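How a dataset could be resolved from its index can be sketched as follows. The sketch assumes that the content of an index is its own elements plus those of its sub-indexes and of the index it appends; this reading of the three relations, like the class and function names, is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Index:
    elements: List[str] = field(default_factory=list)     # "has element"
    sub_indexes: List[str] = field(default_factory=list)  # "has sub-index"
    appends: Optional[str] = None                         # "appends"


def collect(index_id: str, indexes: Dict[str, Index]) -> Set[str]:
    """Gather all element identifiers reachable from the given index."""
    seen: Set[str] = set()
    result: Set[str] = set()
    stack = [index_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against visiting an index twice
        seen.add(current)
        index = indexes[current]
        result.update(index.elements)
        stack.extend(index.sub_indexes)
        if index.appends is not None:
            stack.append(index.appends)
    return result
```

Since indexes are themselves nanopublications, the same publish and retrieve operations apply to datasets as to individual entries.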
15. Nanopublication Server Components and API
Nanopublication Server Components:
• Key-value store of nanopublications (trusty URI as the key)
• Journal: list of identifiers of all loaded nanopublications,
subdivided into pages
• List of known peers: the URLs of other servers
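The three components can be sketched as a small class. The names and the default page size of 1000 are assumptions for illustration; they are not taken from the actual server implementation.

```python
from typing import Dict, List


class ToyNanopubServer:
    """Key-value store, paginated journal, and peer list (hypothetical)."""

    def __init__(self, page_size: int = 1000) -> None:
        self.store: Dict[str, str] = {}  # trusty URI -> nanopublication
        self.journal: List[str] = []     # identifiers in load order
        self.peers: List[str] = []       # URLs of other servers
        self.page_size = page_size

    def load(self, trusty_uri: str, nanopub: str) -> None:
        # Each nanopublication is stored and journaled only once.
        if trusty_uri not in self.store:
            self.store[trusty_uri] = nanopub
            self.journal.append(trusty_uri)

    def page_count(self) -> int:
        # Ceiling division without importing math.
        return -(-len(self.journal) // self.page_size)

    def journal_page(self, page_number: int) -> List[str]:
        """Return the identifiers on the given page (pages start at 1)."""
        start = (page_number - 1) * self.page_size
        return self.journal[start:start + self.page_size]
```

Splitting the journal into pages lets a peer fetch only the pages it has not yet seen instead of the whole list.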
Simple REST API (as GET/POST requests):
• Retrieve nanopublication in a format like TriG, TriX, or N-Quads
(depending on content negotiation) for a given trusty URI
• Publish new nanopublication to the network
• Retrieve URLs of other servers in the network
• Retrieve journal page or gzipped package of it (for propagation
of nanopublications to other servers)
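Retrieval with content negotiation can be sketched by building (not sending) an HTTP GET request with an Accept header. The media types listed and the URL layout of server URL followed by the artifact code are assumptions; the function name is hypothetical.

```python
from urllib.request import Request

# Assumed media types for the formats named above.
MEDIA_TYPES = {
    "trig": "application/trig",
    "trix": "application/trix",
    "nquads": "application/n-quads",
}


def build_get_request(server_url: str, artifact_code: str,
                      fmt: str = "trig") -> Request:
    """Build a GET request asking for a nanopublication in a given format."""
    url = server_url.rstrip("/") + "/" + artifact_code
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]}, method="GET")
```

Passing the returned object to urllib.request.urlopen would perform the actual request; the sketch stops before that step so that it runs without network access.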
16. Nanopublication Server Network is
Efficient and Scalable
Our servers can deliver nanopublications about 100 times faster than
when a triple store is used (and need far fewer resources):
[Figure: response time in seconds (axis ticks 0.1, 1, 10, 100) plotted against time from start of test in seconds (0 to 300) and against number of clients accessing the service in parallel (0 to 100), comparing a Virtuoso triple store with SPARQL endpoint and the nanopublication server]
18. Reliable Low-Level Publish/Retrieve Operations!
Operation to publish data:
$ np publish nanopubs.trig
156026 nanopubs published at http://np.inn.ac/
which can also be used to publish dataset definitions (indexes):
$ np publish index.trig
157 nanopubs published at http://np.inn.ac/
Operation to retrieve data entries:
$ np get http://np.inn.ac/RA7Kmmugi8OuCirfe5WKchnJhC3FuhQD
and to retrieve entire datasets:
$ np get -c http://np.inn.ac/RAY_lQruuagCYtAcKAPptkY7EpITw
https://github.com/Nanopublication/nanopub-java
19. Future Work
• Allow servers to specify what kind of nanopublications they store
• Develop Core and Advanced Services on top of the server
network
• Establish best practices for versioning, retractions, reviews, etc.
• Connect it all to the scientific publishing workflow
20. Thank you for your attention!
Questions?
Further information:
• Nanopublications: http://nanopub.org
• Trusty URIs: http://trustyuri.net
• Nanopublication Server Network: http://npmonitor.inn.ac