Querying datasets on the Web with high availability

Slides for my talk at the ISWC2014 conference.
http://ruben.verborgh.org/publications/verborgh_iswc_2014/


  1. Querying datasets on the Web with high availability. Ruben Verborgh, Olaf Hartig, Ben De Meester, Gerald Haesendonck, Laurens De Vocht, Miel Vander Sande, Richard Cyganiak, Pieter Colpaert, Erik Mannens, and Rik Van de Walle
  2. Somebody comes to you with a high-quality Linked Open Data set: “How can people query my Linked Data on the Web?”
  3. “A public SPARQL endpoint gives live querying, but it’s costly and has availability issues.” “Offer a data dump. But it’s not really Web querying: users need to set up an endpoint.” “Publish Linked Data documents. But querying is very slow…”
  4. Querying Linked Data on the Web always involves trade-offs. But have we looked at all possible trade-offs?
  5. Querying Linked Data live on the Web becomes affordable by building simpler servers and more intelligent clients.
  6. We formalized and evaluated client-side SPARQL querying through a triple-pattern interface. The trade-offs are different: low-cost, high-availability servers, more bandwidth and slower queries.
  7. Querying datasets on the Web with high availability: Trade-offs for the Semantic Web; Triple pattern fragment querying; Evaluating the trade-offs
  8. Querying datasets on the Web with high availability: Trade-offs for the Semantic Web; Triple pattern fragment querying; Evaluating the trade-offs
  9. It is not an all-or-nothing world. There is a spectrum of trade-offs. Axes, from one end of the spectrum to the other: out-of-date data / live data; high bandwidth / low bandwidth; high availability / low availability; high client cost / low client cost; low server cost / high server cost. Interfaces offered by the server: data dump, SPARQL endpoint, Linked Data documents.
  10. Linked Data Fragments are a uniform view on Linked Data interfaces. Every Linked Data interface offers specific fragments of a Linked Data set. Interfaces offered by the server: data dump, SPARQL endpoint, Linked Data documents.
  11. Each type of Linked Data Fragment is defined by three characteristics. Data: what triples does it contain? Metadata: what do we know about it? Controls: how to access more data?
  12. Each type of Linked Data Fragment is defined by three characteristics. Data dump: data = all dataset triples; metadata = number of triples, file size; controls = (none).
  13. Each type of Linked Data Fragment is defined by three characteristics. SPARQL query result: data = triples matching the query; metadata = (none); controls = (none).
  14. Each type of Linked Data Fragment is defined by three characteristics. Linked Data document: data = triples about a subject; metadata = creator, maintainer, …; controls = links to other LD documents.
  15. We designed a new trade-off mix with low cost and high availability. Axes, from one end of the spectrum to the other: out-of-date data / live data; high bandwidth / low bandwidth; high availability / low availability; high client cost / low client cost; low server cost / high server cost. Fragment types: data dump, SPARQL query results, Linked Data documents.
  16. A triple pattern fragments interface is low-cost and enables clients to query. Properties: live data, low server cost, high availability. Fragment types: data dump, SPARQL query results, Linked Data documents, triple pattern fragments.
  17. A triple pattern fragments interface is low-cost and enables clients to query. Triple pattern fragment: data = matches of a triple pattern (paged); metadata = total number of matches; controls = access to all other fragments.
  18. Controls (other fragments); metadata (total count); data (first 100).
  19. Triple patterns are not the final answer. No interface ever will be. Triple patterns show how far we can get with simple servers and smart clients. Fragment types: data dump, SPARQL query results, Linked Data documents, triple pattern fragments.
  20. Querying datasets on the Web with high availability: Trade-offs for the Semantic Web; Triple pattern fragment querying; Evaluating the trade-offs
  21. Triple pattern fragment servers enable clients to be intelligent. The HTML representation explains: “you can query by triple pattern” (controls).
  22. Triple pattern fragment servers enable clients to be intelligent. The RDF representation explains: “you can query by triple pattern” (controls). <http://fragments.dbpedia.org/2014/en#dataset> hydra:search [ hydra:template "http://fragments.dbpedia.org/2014/en{?subject,predicate,object}"; hydra:mapping [ hydra:variable "subject"; hydra:property rdf:subject ], [ hydra:variable "predicate"; hydra:property rdf:predicate ], [ hydra:variable "object"; hydra:property rdf:object ] ].
  23. Triple pattern fragment servers enable clients to be intelligent. The HTML representation explains: “this is the number of matches” (metadata).
  24. Triple pattern fragment servers enable clients to be intelligent. The RDF representation explains: “this is the number of matches” (metadata). <#fragment> void:triples 8141.
  25. How can intelligent clients solve SPARQL queries over fragments? Give them a SPARQL query. Give them a URL of any dataset fragment. They look inside the fragment to see how to access the dataset and use the metadata to decide how to plan the query.
  26. Let’s follow the execution of an example SPARQL query. Find names of artists born in Padua, Italy. SELECT ?artist ?name WHERE { ?artist a dbpedia-owl:Artist; rdfs:label ?name; dbpedia-owl:birthPlace dbpedia:Padua. FILTER LANGMATCHES(LANG(?name), "EN") } Fragment: http://fragments.dbpedia.org/2014/en
  27. The client looks inside the fragment to see how to access the dataset. Fragment: http://fragments.dbpedia.org/2014/en <http://fragments.dbpedia.org/2014/en#dataset> hydra:search [ hydra:template "http://fragments.dbpedia.org/2014/en{?subject,predicate,object}"; hydra:mapping [ hydra:variable "subject"; hydra:property rdf:subject ], [ hydra:variable "predicate"; hydra:property rdf:predicate ], [ hydra:variable "object"; hydra:property rdf:object ] ]. “I can query the dataset by triple pattern.”
  28. The client splits the query into the available fragments. SELECT ?artist ?name WHERE { ?artist a dbpedia-owl:Artist; rdfs:label ?name; dbpedia-owl:birthPlace dbpedia:Padua. FILTER LANGMATCHES(LANG(?name), "EN") }
  29. The client gets the fragments and inspects their metadata. ?artist a dbpedia-owl:Artist. (first 100 triples; total count 96,000) ?artist rdfs:label ?name. (first 100 triples; total count 12,000,000) ?artist dbont:birthPlace dbpedia:Padua. (first 100 triples; total count 135)
  30. The metadata enables the client to choose the right starting point. ?artist a dbpedia-owl:Artist. (96,000 matches), with ?artist bound: dbp:Alberto_Benettin a dbont:Artist. ?artist rdfs:label ?name. (12,000,000 matches), with ?artist bound: dbp:Alberto_Benettin rdfs:label ?name. ?artist dbont:birthPlace dbpedia:Padua. (135 matches), including: dbpedia:Alberto_Benettin dbont:birthPlace dbpedia:Padua. dbpedia:Alberto_Bigon dbont:birthPlace dbpedia:Padua.
  31. Clients execute the query in 3 seconds on a highly available, low-cost server. SELECT ?artist ?name WHERE { ?artist a dbpedia-owl:Artist; rdfs:label ?name; dbpedia-owl:birthPlace dbpedia:Padua. FILTER LANGMATCHES(LANG(?name), "EN") } Try it yourself: bit.ly/artistspadua
  32. Querying datasets on the Web with high availability: Trade-offs for the Semantic Web; Triple pattern fragment querying; Evaluating the trade-offs
  33. We evaluated triple pattern fragments for server cost and availability. Berlin SPARQL benchmark on Amazon EC2 machines: 100 million triples; high query diversity (BGP, UNION, FILTER, …).
  34. We evaluated triple pattern fragments for server cost and availability. Berlin SPARQL benchmark on Amazon EC2 machines: 1 server, 1 cache, 1–240 simultaneous clients.
  35. The query throughput is lower, but resilient to high client numbers. [Fig. 3.1: Server performance (log-log plot). Executed SPARQL queries per hour (throughput, q/hr) against number of clients, for Virtuoso 6, Fuseki–tdb, and triple pattern fragments.]
  36. The server traffic is higher, but requests are significantly lighter. [Fig. 3.2: Server network traffic. Data sent by server in MB against number of clients, for Virtuoso 6, Virtuoso 7, Fuseki–tdb, Fuseki–hdt, and triple pattern fragments.]
  37. Caching is significantly more effective, as clients reuse fragments for queries. [Fig. 3.2: Server network traffic. Fig. 3.4: Cache network traffic. Data sent by cache in MB against number of clients.]
  38. The server uses much less CPU, allowing for higher availability. [Fig. 3.3: Query timeouts. Number of timeouts against number of clients. Fig. 3.5: Server processor usage per core. CPU use (%) against number of clients.]
  39. Servers enable clients to be intelligent, so they remain simple and light-weight. [Fig. 3.3: Query timeouts. Number of timeouts against number of clients. Fig. 3.5: Server processor usage per core. CPU use (%) against number of clients.]
  40. Querying datasets on the Web with high availability: Trade-offs for the Semantic Web; Triple pattern fragment querying; Evaluating the trade-offs
  41. Triple pattern fragments allow querying live data with high availability. Properties: live data, low server cost, high availability. Fragment types: data dump, SPARQL query results, Linked Data documents, triple pattern fragments.
  42. Like each Linked Data query method, triple pattern fragments have trade-offs: live data; low-cost and highly available servers; more bandwidth; slower results (but they are streaming).
  43. Experience the trade-offs yourself on the official DBpedia interfaces: DBpedia data dump; DBpedia Linked Data documents; DBpedia SPARQL endpoint; DBpedia triple pattern fragments (fragments.dbpedia.org).
  44. Triple pattern fragments are easy: all software is available as open source. Software: github.com/LinkedDataFragments. Documentation and specification: linkeddatafragments.org.
  45. Querying is a client–server dialogue. Start exploring the entire spectrum. Fragment types: data dump, SPARQL query results, Linked Data documents, triple pattern fragments.
  46. Querying datasets on the Web with high availability. linkeddatafragments.org, @LDFragments
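
The client-side procedure walked through on slides 25 to 30 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' client: the server is simulated by a small in-memory list of triples, the "ex:" resources and all labels are made-up toy data, prefixed names stand in for full IRIs, the FILTER on the label language is left out, and every function name (fragment_url, fetch_fragment, solve) is hypothetical. It shows the two ideas from the slides: filling in the triple pattern template to address a fragment, and using each fragment's total count to decide which pattern to evaluate first.

```python
from urllib.parse import urlencode

BASE = "http://fragments.dbpedia.org/2014/en"  # fragment URL shown on the slides

# Toy stand-in for the dataset behind the server (illustrative data only).
TRIPLES = [
    ("dbpedia:Alberto_Benettin", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:Alberto_Benettin", "dbpedia-owl:birthPlace", "dbpedia:Padua"),
    ("dbpedia:Alberto_Benettin", "rdfs:label", "Alberto Benettin"),
    ("dbpedia:Alberto_Bigon", "dbpedia-owl:birthPlace", "dbpedia:Padua"),
    ("dbpedia:Alberto_Bigon", "rdfs:label", "Alberto Bigon"),
    ("ex:Painter_A", "rdf:type", "dbpedia-owl:Artist"),
    ("ex:Painter_A", "rdfs:label", "Painter A"),
    ("ex:Painter_B", "rdf:type", "dbpedia-owl:Artist"),
    ("ex:Painter_B", "rdfs:label", "Painter B"),
]


def is_var(term):
    """A term is a variable if it starts with '?', as in SPARQL."""
    return term.startswith("?")


def fragment_url(pattern, base=BASE):
    """Fill in the {?subject,predicate,object} template; variables are left out."""
    names = ("subject", "predicate", "object")
    params = {name: term for name, term in zip(names, pattern) if not is_var(term)}
    return base + ("?" + urlencode(params) if params else "")


def fetch_fragment(pattern):
    """Simulate one fragment: matching triples (data) and their total count (metadata)."""
    matches = [
        triple for triple in TRIPLES
        if all(is_var(p) or p == v for p, v in zip(pattern, triple))
    ]
    return matches, len(matches)


def bind(pattern, binding):
    """Replace variables that already have a value in the current binding."""
    return tuple(binding.get(term, term) for term in pattern)


def solve(patterns, binding=None):
    """Yield solution bindings, always starting from the pattern with the fewest matches.

    Simplification: a variable is assumed to occur at most once per pattern.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    bound = [bind(p, binding) for p in patterns]
    counts = [fetch_fragment(p)[1] for p in bound]
    i = counts.index(min(counts))  # the count metadata picks the starting point
    rest = patterns[:i] + patterns[i + 1:]
    matches, _ = fetch_fragment(bound[i])
    for triple in matches:
        extended = dict(binding)
        extended.update({p: v for p, v in zip(bound[i], triple) if is_var(p)})
        yield from solve(rest, extended)


QUERY = [
    ("?artist", "rdf:type", "dbpedia-owl:Artist"),
    ("?artist", "rdfs:label", "?name"),
    ("?artist", "dbpedia-owl:birthPlace", "dbpedia:Padua"),
]

for solution in solve(QUERY):
    print(solution)
```

With this toy data, the birthplace pattern has the fewest matches, so the sketch starts there, binds ?artist to each match in turn, and then checks the remaining two patterns with ?artist filled in, mirroring slide 30. A real client would fetch each fragment over HTTP from the URL produced by fragment_url, read the count from the fragment's metadata, and follow the paging controls instead of scanning a local list.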