1. dans.knaw.nl
DANS is an institute of KNAW and NWO
Linked Open Data and DANS
Reinier de Valk
reinier.de.valk@dans.knaw.nl
Vyacheslav Tykhonov
vyacheslav.tykhonov@dans.knaw.nl
NOTaS meeting, The Hague, 15.12.2017
2. LOD | Linked (Open) Data?
• Linked Data (LD) is “a method of publishing structured data so that it
can be interlinked and become more useful through semantic queries” [1]
• Linked Open Data (LOD) is LD that is open, i.e., freely available to use
and republish
• Builds upon standard web technologies, but extends them so that they
can be read by machines
• Semantic web: a web of data that can be processed by machines
[1] https://en.wikipedia.org/wiki/Linked_data
3. LOD | Four principles of LD [2]
• Use uniform resource identifiers (URIs) as names for things
• Use HTTP URIs so that people can look up those names
• When someone looks up a URI, provide useful information, using the
standards (RDF*, SPARQL)
• Include links to other URIs, so that they can discover more things
[2] Berners-Lee, T. (2006) Linked data. https://www.w3.org/DesignIssues/LinkedData.html
4. LOD | Building block: the triple
• The basic building block of LD is the semantic triple (or simply triple)
• A triple is a statement in the form subject-predicate-object, for example:
http://example.name#Bob http://purl.org/vocab/relationship/childOf http://example.name#Alice
http://example.name#Carl http://purl.org/vocab/relationship/childOf http://example.name#Alice
• Triples are stored in triplestores (purpose-built databases) or graph
databases (databases with a more generalised structure)
• These databases can be queried with query languages such as
SPARQL; this is done using a (SPARQL) endpoint
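The triple model and the idea of querying by pattern can be sketched in a few lines of Python. This is only an in-memory illustration, not how a triplestore such as Virtuoso is implemented; the `match` helper is a hypothetical name, and `None` plays the role of a SPARQL variable. The query below corresponds roughly to `SELECT ?child WHERE { ?child <http://purl.org/vocab/relationship/childOf> <http://example.name#Alice> }`.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

CHILD_OF = "http://purl.org/vocab/relationship/childOf"
ALICE = "http://example.name#Alice"

# The two example triples from the slide.
TRIPLES = [
    ("http://example.name#Bob", CHILD_OF, ALICE),
    ("http://example.name#Carl", CHILD_OF, ALICE),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None matches anything."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Who is a child of Alice? Subject is left open, predicate and object are fixed.
children_of_alice = sorted(t[0] for t in match(TRIPLES, p=CHILD_OF, o=ALICE))
print(children_of_alice)
```

A real endpoint would evaluate the same pattern against indexed storage; the point here is only that a query is a triple with some positions left open.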
5. LOD | The LOD cloud (22.08.2017) [3]
[3] http://lod-cloud.net/
6. LOD at DANS | Static LOD
• A LOD graph is living – it keeps evolving
• We archive static snapshots of the graph
• LD is in plain ASCII – no complicated formats needed
• The archived static snapshot can be revived – the README file
accompanying the data describes the procedure
• Examples can be found in EASY, the DANS online long-term archiving system [4]
• use search term “linked data”
• interesting examples: LOD Laundromat; CEDAR RDF database
[4] http://www.easy.dans.knaw.nl/
8. DANS LOD infrastructure
• LOD conversion tool that harvests public metadata from DANS systems
using the OAI-PMH protocol and converts it to the Turtle RDF format
• Virtuoso with SPARQL endpoint to store and query archived triples
(static)
• grlc to build Web APIs using shared SPARQL queries
• Timbuctoo Linked Data storage to keep different versions of
metadata harvested from DANS systems (turn into schema)
• GraphQL endpoint integrated in Timbuctoo to query repository and
evaluate new links
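The harvest-and-convert step can be illustrated with a minimal sketch. It assumes a single OAI-PMH record carrying Dublin Core (oai_dc) metadata and turns each `dc:` element into one triple; the sample record, the base URI and the function names are made up for illustration and are not taken from the DANS conversion tool. Each emitted line is valid Turtle (and also valid N-Triples).

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record, shaped like one <record> in an OAI-PMH ListRecords response.
SAMPLE_RECORD = """\
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header><identifier>oai:example.org:dataset-1</identifier></header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Example dataset</dc:title>
      <dc:creator>Doe, J.</dc:creator>
    </oai_dc:dc>
  </metadata>
</record>
"""


def escape_literal(value):
    """Escape backslashes, quotes and newlines for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_turtle(record_xml, base_uri):
    """Return one Turtle statement per Dublin Core element in the record."""
    root = ET.fromstring(record_xml)
    identifier = root.find(f"{OAI}header/{OAI}identifier").text
    subject = f"<{base_uri}{identifier}>"  # assumed URI scheme for the subject
    lines = []
    for element in root.iter():
        if element.tag.startswith("{" + DC_NS + "}") and element.text:
            local_name = element.tag.split("}", 1)[1]
            literal = escape_literal(element.text.strip())
            lines.append(f'{subject} <{DC_NS}{local_name}> "{literal}" .')
    return lines


turtle_lines = record_to_turtle(SAMPLE_RECORD, "http://example.org/id/")
print("\n".join(turtle_lines))
```

The resulting statements could then be loaded into a triplestore and queried through its SPARQL endpoint.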
9. What is Timbuctoo?
• Timbuctoo is an open source Linked Data repository system developed by Huygens ING
and specialized in handling interpretative and heterogeneous content. Timbuctoo is
specifically designed for academic research in the arts & humanities and is ideally suited
for research institutions, libraries and archives supporting scholars who follow a
hermeneutic methodology.
• Data upload options:
• Excel upload
• CSV upload
• Dataperfect upload
• remote repository upload with ResourceSync
10. Description of pipeline to archive
• Users deposit new datasets, and metadata is updated over time
• Snapshots are taken regularly
• ResourceSync is the only option for getting an updated snapshot into the
LOD cloud without manual interaction
14. What is GraphQL?
• “GraphQL is a data query language developed internally by Facebook in 2012 before
being publicly released in 2015. It provides an alternative to REST and ad hoc webservice
architectures.”
• Wikipedia
• “GraphQL is a query language for your API, and a server-side runtime for executing
queries by using a type system you define for your data. GraphQL isn't tied to any specific
database or storage engine and is instead backed by your existing code and data.”
• GraphQL endpoint provided by Timbuctoo RDF storage allows visual Linked Data
exploration.
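As a rough illustration of what querying a GraphQL endpoint involves, the sketch below builds (but does not send) an HTTP POST request with a JSON body holding the query and its variables, which is the common convention for GraphQL over HTTP. The endpoint URL and the `datasets`, `title` and `location` fields are hypothetical; the actual schema exposed by Timbuctoo depends on the uploaded data.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the GraphQL URL of a real installation.
ENDPOINT = "https://timbuctoo.example.org/graphql"

# Hypothetical query: the field names are assumptions, not Timbuctoo's schema.
QUERY = """
query DatasetTitles($limit: Int) {
  datasets(limit: $limit) {
    title
    location
  }
}
"""


def build_graphql_request(endpoint, query, variables=None):
    """Wrap a GraphQL query in a JSON POST request without sending it."""
    payload = json.dumps({"query": query, "variables": variables or {}})
    return urllib.request.Request(
        endpoint,
        data=payload.encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


request = build_graphql_request(ENDPOINT, QUERY, {"limit": 10})
print(request.get_method(), request.full_url)
print(json.loads(request.data)["variables"])
```

Passing `request` to `urllib.request.urlopen` would send it; unlike a REST call, the client names exactly the fields it wants in the response.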
17. N-Quads U.D. (unified diff)
• RDF data set notations are like snapshots.
• We enrich them…
• What if we need to track changes in the resulting new RDF file?
• How do we know which of these predicates has had a previous value?
• What if we want to add new triples?
• N-Quads itself is an extension of N-Triples; Timbuctoo supports both:
--- easy.nq 2017-12-14 11:18:16.057104790 +0200
+++ empty.nq 2017-12-14 12:08:18.772264550 +0200
@@ -1,35652 +0,0 @@
+<easy:15960> <dc:location> "http://www.gemeentegeschiedenis.nl/gemeentenaam/Slochteren" .
+<easy:15960> <dc:location> "http://www.gemeentegeschiedenis.nl/gemeentenaam/Sloten_NH" .
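How such a diff relates to a snapshot can be sketched as follows: treat the snapshot as a set of quad lines, add every `+` line and drop every `-` line. This is a minimal illustration under simplifying assumptions (hunk positions are ignored, and it is not Timbuctoo's own implementation); the function name and the example diff are made up, reusing the identifiers from the slide.

```python
def apply_nquads_diff(quads, diff_text):
    """Return a new set of quad lines with the diff's additions and removals applied."""
    result = set(quads)
    for line in diff_text.splitlines():
        # Skip file headers and hunk markers. An N-Quads statement starts with
        # "<" or "_:", so a real removal line can never begin with "---".
        if not line.strip() or line.startswith(("---", "+++", "@@")):
            continue
        if line.startswith("+"):
            result.add(line[1:].strip())
        elif line.startswith("-"):
            result.discard(line[1:].strip())
    return result


SLOCHTEREN = '<easy:15960> <dc:location> "http://www.gemeentegeschiedenis.nl/gemeentenaam/Slochteren" .'
SLOTEN = '<easy:15960> <dc:location> "http://www.gemeentegeschiedenis.nl/gemeentenaam/Sloten_NH" .'

# Illustrative diff: one location is withdrawn and another one is asserted.
EXAMPLE_DIFF = "\n".join([
    "--- old.nq",
    "+++ new.nq",
    "@@ -1,1 +1,1 @@",
    "-" + SLOCHTEREN,
    "+" + SLOTEN,
])

updated = apply_nquads_diff({SLOCHTEREN}, EXAMPLE_DIFF)
print(sorted(updated))
```

Because removals are explicit, the diff answers the questions above: a predicate with a `-` line had a previous value, and new triples are simply `+` lines.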