Data on the World Wide Web changes at the speed of light; today’s facts are tomorrow’s history. This makes the ability to look back important: how do facts grow and change over time? It gets even more interesting when we zoom out beyond individual facts: how do answers to questions evolve as data ages? With Linked Data, we are used to querying only the latest version of information, because updating a SPARQL endpoint is easier than maintaining every historical version. With the lightweight Triple Pattern Fragments interface, it becomes very easy for a server to host multiple versions. Using the Memento framework to switch between versions based on a timestamp, your browser can evaluate SPARQL queries over any point in time. We tried this with DBpedia, and so can you!
2. There is a huge amount of interesting information in DBpedia’s history.
What could we learn if we could easily query it?
3. Sustainable querying on fragments.dbpedia.org
Uniform access to DBpedia versions
Rewriting history: applying Memento to Triple Pattern Fragments
Time travelling through DBpedia
Use cases and opportunities
5. Linked Data Fragments: hunting trade-offs between client & server.
[Figure: the axis of interfaces offered by the server, from data dump via DBpedia pages to SPARQL endpoint. Data dump: high client cost, low server cost, high availability, high bandwidth, out-of-date data. SPARQL endpoint: low client cost, high server cost, low availability, low bandwidth, live data.]
6. A Triple Pattern Fragments interface is low-cost and enables clients to query.
[Figure: triple pattern fragments placed on the same axis, between DBpedia pages and SPARQL query results, combining low server cost and high availability with live data.]
7. A Triple Pattern Fragments interface acts as a gateway to an RDF source.
Clients can only ask ?s ?p ?o patterns.
Complex SPARQL queries are decomposed on the client side.
Low server cost, highly cacheable, but higher bandwidth and query time.
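To make the client-side decomposition concrete, here is a minimal sketch, assuming a toy in-memory list of triples stands in for the server. The hypothetical `fragment` function plays the role of a triple pattern fragment (it answers exactly one ?s ?p ?o pattern), and `evaluate_bgp` joins several patterns on the client with a plain nested-loop strategy. The names and the `ex:` data are illustrative only; the actual TPF client uses a more elaborate, paged and streaming algorithm.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]   # terms starting with "?" are variables
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def fragment(data: List[Triple], pattern: Pattern) -> List[Triple]:
    """Server side (hypothetical): all triples matching a single pattern."""
    return [t for t in data
            if all(is_var(p) or p == v for p, v in zip(pattern, t))]


def substitute(pattern: Pattern, binding: Binding) -> Pattern:
    """Replace variables that are already bound by their values."""
    s, p, o = (binding.get(term, term) for term in pattern)
    return (s, p, o)


def extend(binding: Binding, pattern: Pattern, triple: Triple) -> Optional[Binding]:
    """Bind the pattern's remaining variables; None if they conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if new.get(term, value) != value:
                return None
            new[term] = value
    return new


def evaluate_bgp(data: List[Triple], patterns: List[Pattern],
                 binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Client side: nested-loop join, one fragment request per partially bound pattern."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    bound = substitute(patterns[0], binding)
    for triple in fragment(data, bound):
        extended = extend(binding, bound, triple)
        if extended is not None:
            yield from evaluate_bgp(data, patterns[1:], extended)


data = [
    ("ex:alice", "ex:bornIn", "ex:Leipzig"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:bob", "ex:bornIn", "ex:Berlin"),
]
query = [("?person", "ex:bornIn", "ex:Leipzig"), ("?person", "ex:name", "?name")]
print(list(evaluate_bgp(data, query)))
# [{'?person': 'ex:alice', '?name': '"Alice"'}]
```

Each recursive step sends the next pattern with the variables bound so far filled in. That is where the trade-off on this slide comes from: the server only does cheap, cacheable single-pattern lookups, while the client pays with more requests.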
10. Usage has been increasing steadily since the release in October 2014.
[Figure: number of requests (#Requests) from February 2015 to September 2016; labeled values 4,500,000 and 19,239,907.]
11. And the API still has 99.99% availability to this day.
15. Any client can transparently navigate to a prior version.
http://dbpedia.org/page/Joachim_Lambek
16. Any client can transparently navigate to a prior version.
http://dbpedia.mementodepot.org/memento/20090924000000/http://dbpedia.org/page/Joachim_Lambek
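The archived URL above appears to follow a common Memento URI pattern: an archive prefix, a 14-digit timestamp (YYYYMMDDhhmmss), and the original URI appended verbatim. The sketch below builds and parses such URLs. It assumes the pattern holds for every resource in the mementodepot.org archive, which is inferred from this single example rather than documented here; the helper names are hypothetical.

```python
from datetime import datetime, timezone

# Prefix taken from the example URL; assumed to apply to every resource.
ARCHIVE_PREFIX = "http://dbpedia.mementodepot.org/memento/"
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # 14 digits, e.g. 20090924000000


def memento_uri(original: str, when: datetime) -> str:
    """Build an archive URL for `original` at time `when` (expected in UTC)."""
    return f"{ARCHIVE_PREFIX}{when.strftime(TIMESTAMP_FORMAT)}/{original}"


def parse_memento_uri(uri: str) -> tuple[str, datetime]:
    """Split an archive URL back into (original URI, UTC timestamp)."""
    if not uri.startswith(ARCHIVE_PREFIX):
        raise ValueError("not a URL from this archive")
    stamp, original = uri[len(ARCHIVE_PREFIX):].split("/", 1)
    when = datetime.strptime(stamp, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    return original, when


uri = memento_uri("http://dbpedia.org/page/Joachim_Lambek",
                  datetime(2009, 9, 24, tzinfo=timezone.utc))
print(uri)
# http://dbpedia.mementodepot.org/memento/20090924000000/http://dbpedia.org/page/Joachim_Lambek
```

Because the original URI is kept intact after the timestamp, any client that knows the original page can switch to an older version by rewriting one URL, which is what makes the navigation transparent.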
17. No updates since version 3.9 (2013) because of scalability problems.

| | 1.0 |
|---|---|
| Indexing | Custom |
| Indexing time | ~24 hours per version |
| Storage | MongoDB |
| Space | 383 Gb |
| Number of versions | 10 (2.0 through 3.9) |
| Number of triples | ~3 billion |
19. The Triple Pattern Fragments trade-off also pays off for archives.
[Figure: on the axis from data dump via DBpedia pages and triple pattern fragments to SPARQL query results, triple pattern fragments are marked as directly compatible with Memento, queryable for the consumer, and sustainable for the publisher.]
20. Different HDT snapshots are exposed through an LDF server with Memento.
http://fragments.dbpedia.org (v2.0)
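With several snapshots behind one server, something has to decide which snapshot answers a request for a given datetime. This sketch assumes a simple policy: serve the most recent snapshot released at or before the requested datetime, and fall back to the oldest snapshot for earlier dates. The snapshot names and release dates are hypothetical placeholders, not the actual DBpedia release calendar, and the real server's selection logic may differ.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Hypothetical snapshots: (release datetime, snapshot name), sorted by date.
SNAPSHOTS = [
    (datetime(2008, 1, 1, tzinfo=timezone.utc), "dbpedia-A"),
    (datetime(2010, 1, 1, tzinfo=timezone.utc), "dbpedia-B"),
    (datetime(2014, 1, 1, tzinfo=timezone.utc), "dbpedia-C"),
]


def select_snapshot(accept_datetime: datetime) -> str:
    """Return the latest snapshot released at or before accept_datetime.

    Requests older than the first snapshot fall back to the oldest one.
    """
    dates = [released for released, _ in SNAPSHOTS]
    index = bisect_right(dates, accept_datetime) - 1
    return SNAPSHOTS[max(index, 0)][1]


print(select_snapshot(datetime(2012, 6, 1, tzinfo=timezone.utc)))  # dbpedia-B
```

A binary search over the sorted release dates keeps the lookup cheap no matter how many versions the archive grows to.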
21. DBpedia pages are now available through a proxy.
http://dbpedia.org/resource/…
22. Space and time-to-publish decreased significantly.

| | 1.0 | 2.0 |
|---|---|---|
| Indexing | Custom | HDT-CPP |
| Indexing time | ~24 hours per version | ~4 hours per version |
| Storage | MongoDB | HDT binary files |
| Space | 383 Gb | 70 Gb |
| Number of versions | 10 (2.0 through 3.9) | 12 (2.0 through 2015) |
| Number of triples | ~3 billion | ~5 billion |
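The table's numbers can be put in relative terms with a few lines of arithmetic. This only restates the slide's figures: the ratios do not depend on the "Gb" unit, the per-triple comparison uses the approximate triple counts, and the "all versions" totals assume every version takes the listed per-version indexing time.

```python
# Values from the 1.0 vs 2.0 comparison table ("~" values are approximate).
space_v1, space_v2 = 383, 70          # storage, in the table's "Gb" unit
hours_v1, hours_v2 = 24, 4            # indexing time per version
versions_v1, versions_v2 = 10, 12
triples_v1, triples_v2 = 3e9, 5e9     # ~3 and ~5 billion triples

space_saving = 1 - space_v2 / space_v1
speedup = hours_v1 / hours_v2
total_hours_v1 = hours_v1 * versions_v1   # indexing every version of 1.0
total_hours_v2 = hours_v2 * versions_v2   # indexing every version of 2.0
# Storage per billion triples makes archives of different sizes comparable.
per_billion_v1 = space_v1 / (triples_v1 / 1e9)
per_billion_v2 = space_v2 / (triples_v2 / 1e9)

print(f"space saving: {space_saving:.0%}")                                  # 82%
print(f"indexing speedup: {speedup:.0f}x")                                  # 6x
print(f"indexing all versions: {total_hours_v1} h vs {total_hours_v2} h")  # 240 h vs 48 h
print(f"space per billion triples: {per_billion_v1:.0f} vs {per_billion_v2:.0f}")  # 128 vs 14
```

So version 2.0 stores more versions and more triples in roughly a fifth of the space, and indexes them about six times faster per version.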
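23. Preparing the TPF client only required adding an HTTP header.
[Figure: the client's three layers (Query Engine for SPARQL processing, Hypermedia Layer for fragments interaction, HTTP Layer for resource access) and a server hosting DBpedia 3.9 and DBpedia 2015. The client sends a GET with an Accept-Datetime header; the server answers 303 with a Location header, and the redirected request returns 200 with a Content-Location header (CORS).]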
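On the client side, the change amounts to sending an `Accept-Datetime` header, whose value Memento (RFC 7089) expresses as an HTTP-date in GMT. The sketch below only prepares such a request with Python's standard library and does not send it; the fragment URL and the helper name are hypothetical, and following the 303 redirect and reading `Content-Location` would be left to the HTTP client.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def with_accept_datetime(url: str, when: datetime) -> Request:
    """Prepare a GET request asking for the version of `url` valid at `when`.

    `when` should be timezone-aware; it is converted to UTC first.
    """
    # HTTP-date, e.g. "Thu, 24 Sep 2009 00:00:00 GMT".
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(url, headers={"Accept-Datetime": http_date})


# Hypothetical fragment URL; the request is only built here, not sent.
request = with_accept_datetime("http://fragments.example.org/dbpedia",
                               datetime(2009, 9, 24, tzinfo=timezone.utc))
# urllib stores header names via str.capitalize(), hence "Accept-datetime".
print(request.get_header("Accept-datetime"))  # Thu, 24 Sep 2009 00:00:00 GMT
```

Everything above the HTTP layer (hypermedia handling and SPARQL processing) stays unchanged, which is why a single header suffices.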
24. A self-descriptive interface results in a single datetime negotiation.
[Figure: the same client layers and server (DBpedia 3.9 and DBpedia 2015); subsequent requests are plain GETs answered with 200.]
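Because each fragment carries hypermedia controls pointing to other fragments of the same version, the client only needs to negotiate once: after the first redirect, it can keep building fragment URLs from the negotiated location. The sketch below simulates that behavior without a network. The `subject`/`predicate`/`object` query parameters are assumed here as the search form; the class, the `fake_negotiate` stand-in, and the URLs are hypothetical.

```python
from typing import Callable, Optional
from urllib.parse import urlencode

# negotiate(timegate_url, accept_datetime) -> URL of the selected version
Negotiator = Callable[[str, str], str]


class VersionedFragmentsClient:
    """Hypothetical client that performs datetime negotiation only once."""

    def __init__(self, timegate: str, accept_datetime: str, negotiate: Negotiator):
        self.timegate = timegate
        self.accept_datetime = accept_datetime
        self.negotiate = negotiate
        self.version_url: Optional[str] = None
        self.negotiations = 0

    def fragment_url(self, subject: str = "", predicate: str = "", obj: str = "") -> str:
        if self.version_url is None:
            # First request: negotiate (GET + Accept-Datetime, then 303 Location).
            self.version_url = self.negotiate(self.timegate, self.accept_datetime)
            self.negotiations += 1
        # Every later request fills in the negotiated version's search form
        # directly, so it is a plain GET answered with 200.
        query = urlencode({"subject": subject, "predicate": predicate, "object": obj})
        return f"{self.version_url}?{query}"


def fake_negotiate(timegate: str, accept_datetime: str) -> str:
    """Stand-in for the HTTP exchange: always 'redirects' to one version."""
    return timegate.rstrip("/") + "/selected-version"


client = VersionedFragmentsClient("http://fragments.example.org/dbpedia",
                                  "Thu, 24 Sep 2009 00:00:00 GMT", fake_negotiate)
client.fragment_url(predicate="http://xmlns.com/foaf/0.1/name")
client.fragment_url(subject="http://dbpedia.org/resource/Leipzig")
print(client.negotiations)  # 1
```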
26. Querying history and the evolution of facts.
When did a researcher named Hans Fichtner, born in Leipzig, die?
Try it yourself:
bit.ly/hansfichtner
bit.ly/hansfichtner-2012
27. What predicates were added between 2009 and 2014 to describe a person?
Analyze and profile changes in DBpedia.
Try it yourself:
bit.ly/personpredicates-2009
bit.ly/personpredicates-2014
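Once the same query (for instance, all distinct predicates used on persons) has been run against two versions, profiling the change comes down to comparing two sets. The predicate sets below are hypothetical placeholders, not actual DBpedia results; in practice they would come from the two queries linked above.

```python
def diff_predicates(old: set[str], new: set[str]) -> dict[str, set[str]]:
    """Compare the predicates describing persons in two versions."""
    return {
        "added": new - old,      # only used in the newer version
        "removed": old - new,    # no longer used in the newer version
        "kept": old & new,       # used in both versions
    }


# Hypothetical results; not actual DBpedia data.
predicates_2009 = {"ex:name", "ex:birthPlace", "ex:oldProperty"}
predicates_2014 = {"ex:name", "ex:birthPlace", "ex:newProperty"}

changes = diff_predicates(predicates_2009, predicates_2014)
print(sorted(changes["added"]))  # ['ex:newProperty']
```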
28. What works by cubists were known by DBpedia and VIAF in 2009?
Resolve out-of-sync issues between federated sources.
Try it yourself:
bit.ly/workscubists-2009
bit.ly/workscubists
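Memento responses carry a `Memento-Datetime` header stating which version was actually served. When a federated query pins all sources to the same datetime, comparing these headers shows how far apart the returned versions still are; a large gap hints at out-of-sync results. This sketch assumes both sources support Memento; the source names and header values are hypothetical.

```python
from email.utils import parsedate_to_datetime


def version_skew_days(memento_datetimes: dict[str, str]) -> float:
    """Largest gap, in days, between the versions the sources actually returned.

    `memento_datetimes` maps a source name to the Memento-Datetime header
    value of its response (an HTTP-date).
    """
    moments = [parsedate_to_datetime(value) for value in memento_datetimes.values()]
    return (max(moments) - min(moments)).total_seconds() / 86400


# Hypothetical header values for a query pinned to 2009.
observed = {
    "dbpedia": "Thu, 24 Sep 2009 00:00:00 GMT",
    "viaf": "Mon, 01 Jun 2009 00:00:00 GMT",
}
print(version_skew_days(observed))  # 115.0
```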
30. Start digging into DBpedia’s history or host your own Linked Data archive!
Use the archive on: fragments.mementodepot.org, client.linkeddatafragments.org
Software: github.com/LinkedDataFragments, bit.ly/configuring-memento
Documentation and specification: linkeddatafragments.org, mementoweb.org