Federated SPARQL Query Processing
Over the Web of Data
Muhammad Saleem
Tutorial at ISWC 2015, Bethlehem, USA
Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig,
Germany, 11/10/2015
Agenda
• SPARQL Query Federation Approaches
• SPARQL Query Federation Optimization
– Source Selection
– Data Integration Options
– Join Order Selection
– Join Order Optimization
– Join Implementations
• Performance Metrics and Discussion
SPARQL Query Federation Approaches
• SPARQL Endpoint Federation (SEF)
• Linked Data Federation (LDF)
• Linked Data Fragments Federation (LDFF)
• Distributed Hash Tables (DHTs)
• Hybrid
SPARQL Endpoint Federation Approaches
• Most commonly used approaches
• Make use of SPARQL endpoint URLs
• Fast query execution
• RDF data needs to be exposed via SPARQL
endpoints
• E.g., HiBISCuS, FedX, SPLENDID, ANAPSID, LHD,
TopFed, QUETSAL etc.
Linked Data Federation Approaches
• Data need not be exposed via SPARQL endpoints
• Uses URI lookups at runtime
• Data should follow Linked Data principles
• Slower than SPARQL endpoint federation approaches
• E.g., LDQPS, SIHJoin, WoDQA etc.
Linked Data Fragments Federation
• Federation over Linked Data Fragments
• Will be explained in upcoming session in detail
Query federation on top of Distributed Hash Tables
• Uses DHT indexing to federate SPARQL queries
• Space efficient
• Cannot deal with whole LOD
• E.g., ATLAS
Hybrid
• Federation over SPARQL endpoints and Linked
Data
• Can potentially deal with whole LOD
• E.g., ADERIS-Hybrid (of SEF+LDF)
SPARQL Endpoint Federation
[Figure: architecture of a federation engine over four SPARQL endpoints S1-S4, each exposing RDF data]
• Parsing/Rewriting: rewrite the query and get individual triple patterns
• Source Selection: identify capable sources for individual triple patterns
• Federator Optimizer: generate an optimized sub-query execution plan
• Integrator: integrate sub-query results
• Execute sub-queries against the endpoints
Source Selection
FedBench (LD3): Return for all US presidents their party
membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President . # TP1
?president dbpedia:nationality dbpedia:United_States . # TP2
?president dbpedia:party ?party . # TP3
?x nyt:topicPage ?page . # TP4
?x owl:sameAs ?president . # TP5
}
Data sources (RDF): S1 = DBpedia, S2 = KEGG, S3 = ChEBI, S4 = NYT, S5 = SWDF, S6 = LMDB, S7 = Jamendo, S8 = GeoNames, S9 = DrugBank
Source Selection Algorithm: triple pattern-wise source selection
TP1 = S1
TP2 = S1
TP3 = S1
TP4 = S4
TP5 = S1, S2, S4-S9
Total triple pattern-wise sources selected =
1+1+1+1+8 = 12
Types of Source Selection
• Index-free
– Using SPARQL ASK queries
– No index maintenance required
– Potentially ensures result set completeness
– SPARQL ASK queries can be expensive
– Can use a cache to store recent SPARQL ASK query results
– E.g., FedX
• Index-only
– Only makes use of an index/data summaries
– Less efficient (can greatly overestimate relevant sources) but fast source selection
– Result set completeness is not ensured
– E.g., DARQ, LHD
• Hybrid
– Makes use of index + SPARQL ASK
– Most efficient
– Result set completeness is not ensured
– Can use a cache to store recent SPARQL ASK query results
– E.g., HiBISCuS, ANAPSID, SPLENDID
Index-free Source Selection
Input: SPARQL query Q, set of all data sources D
Output: Triple pattern to relevant data sources map M
M = {};
for each triple pattern ti in SPARQL query Q
  Ri = {}; // set of relevant data sources for triple pattern ti
  for each data source dj in D
    if SPARQL ASK(dj, ti) = true
      Ri = Ri U {dj};
    end if
  end for
  M = M U {(ti, Ri)};
end for
return M
What is the total number of SPARQL ASK requests used?
total number of triple patterns * total number of data sources
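The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not any engine's actual implementation: the remote SPARQL ASK request is replaced by a hypothetical lookup table (`SLIDE_ANSWERS`) filled with the answers from the LD3 example, and `make_counting_ask` is an invented helper that counts how many requests would have been sent.

```python
from typing import Callable, Dict, List, Set


def index_free_source_selection(
    patterns: List[str],
    sources: List[str],
    ask: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Send one ASK per (triple pattern, source) pair; keep sources answering true."""
    return {tp: [src for src in sources if ask(src, tp)] for tp in patterns}


def make_counting_ask(answers: Dict[str, Set[str]]):
    """Hypothetical stand-in for a remote SPARQL ASK: table lookup plus a call counter."""
    counter = {"calls": 0}

    def ask(source: str, pattern: str) -> bool:
        counter["calls"] += 1
        return source in answers[pattern]

    return ask, counter


SOURCES = [f"S{i}" for i in range(1, 10)]  # S1..S9
# ASK answers as reported for the LD3 example (assumed here, not fetched).
SLIDE_ANSWERS = {
    "TP1": {"S1"},
    "TP2": {"S1"},
    "TP3": {"S1"},
    "TP4": {"S4"},
    "TP5": {"S1", "S2", "S4", "S5", "S6", "S7", "S8", "S9"},
}

ask, counter = make_counting_ask(SLIDE_ANSWERS)
selection = index_free_source_selection(list(SLIDE_ANSWERS), SOURCES, ask)
total_selected = sum(len(srcs) for srcs in selection.values())
print(counter["calls"], total_selected)  # 45 12
```

With 5 triple patterns and 9 sources the sketch issues 5 * 9 = 45 ASK calls, matching the formula above.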
Index-free Source Selection
FedBench (LD3): Return for all US presidents their party
membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President . # TP1
?president dbpedia:nationality dbpedia:United_States . # TP2
?president dbpedia:party ?party . # TP3
?x nyt:topicPage ?page . # TP4
?x owl:sameAs ?president . # TP5
}
Data sources (RDF): S1 = DBpedia, S2 = KEGG, S3 = ChEBI, S4 = NYT, S5 = SWDF, S6 = LMDB, S7 = Jamendo, S8 = GeoNames, S9 = DrugBank
Source Selection Algorithm: triple pattern-wise source selection
TP1 = S1
TP2 = S1
TP3 = S1
TP4 = S4
TP5 = S1, S2, S4-S9
Total number of SPARQL ASK requests used = 45
Total triple pattern-wise sources selected = 12
Index-only Source Selection (LHD)
Input: SPARQL query Q, set of all data sources D, data sources index I storing all distinct predicates for
all data sources in D
Output: Triple pattern to relevant data sources map M
M = {};
for each triple pattern ti in SPARQL query Q
  Ri = {}; // set of relevant data sources for triple pattern ti
  p = Pred(ti); // predicate of ti
  if (bound(p))
    Ri = Lookup(I, p); // index lookup for predicate of ti
  else
    Ri = D; // all data sources are relevant
  end if
  M = M U {(ti, Ri)};
end for
return M
Why is it the less efficient approach (i.e., why does it greatly overestimate relevant data sources)?
• Source selection is based only on the predicate of triple patterns
• All data sources are simply selected for triple patterns having unbound predicates
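A minimal Python sketch of the predicate-index lookup described above. The index contents are an assumption: they are filled in by hand so that they reproduce the LD3 example (rdf:type present in all nine sources, owl:sameAs in all but S3), and the names `PREDICATE_INDEX` and `LD3` are hypothetical.

```python
from typing import Dict, List, Tuple

TriplePattern = Tuple[str, str, str]


def is_variable(term: str) -> bool:
    """SPARQL variables start with '?' in this simplified representation."""
    return term.startswith("?")


def index_only_source_selection(
    patterns: Dict[str, TriplePattern],
    sources: List[str],
    index: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Select sources by predicate lookup only; unbound predicates select every source."""
    selection = {}
    for name, (_subject, predicate, _object) in patterns.items():
        if is_variable(predicate):
            selection[name] = list(sources)
        else:
            selection[name] = list(index.get(predicate, []))
    return selection


SOURCES = [f"S{i}" for i in range(1, 10)]
# Hand-written index (assumption) mirroring the LD3 example.
PREDICATE_INDEX = {
    "rdf:type": SOURCES,
    "dbpedia:nationality": ["S1"],
    "dbpedia:party": ["S1"],
    "nyt:topicPage": ["S4"],
    "owl:sameAs": [s for s in SOURCES if s != "S3"],
}
LD3 = {
    "TP1": ("?president", "rdf:type", "dbpedia:President"),
    "TP2": ("?president", "dbpedia:nationality", "dbpedia:United_States"),
    "TP3": ("?president", "dbpedia:party", "?party"),
    "TP4": ("?x", "nyt:topicPage", "?page"),
    "TP5": ("?x", "owl:sameAs", "?president"),
}

selection = index_only_source_selection(LD3, SOURCES, PREDICATE_INDEX)
total_selected = sum(len(srcs) for srcs in selection.values())
print(total_selected)  # 20
```

No ASK request is sent, but TP1 now selects all nine sources because the bound object dbpedia:President is ignored.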
Index-only Source Selection
FedBench (LD3): Return for all US presidents their party
membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President . # TP1
?president dbpedia:nationality dbpedia:United_States . # TP2
?president dbpedia:party ?party . # TP3
?x nyt:topicPage ?page . # TP4
?x owl:sameAs ?president . # TP5
}
Data sources (RDF): S1 = DBpedia, S2 = KEGG, S3 = ChEBI, S4 = NYT, S5 = SWDF, S6 = LMDB, S7 = Jamendo, S8 = GeoNames, S9 = DrugBank
Source Selection Algorithm: triple pattern-wise source selection
TP1 = S1-S9
TP2 = S1
TP3 = S1
TP4 = S4
TP5 = S1, S2, S4-S9
Total number of SPARQL ASK requests used = 0
Total triple pattern-wise sources selected = 20
Hybrid Source Selection
Input: SPARQL query Q, set of all data sources D, data sources index I storing all distinct predicates for all data
sources in D
Output: Triple pattern to relevant data sources map M
M = {};
for each triple pattern ti in SPARQL query Q
  Ri = {}; // set of relevant data sources for triple pattern ti
  s = Subj(ti), p = Pred(ti), o = Obj(ti); // subject, predicate, and object of ti
  if (!bound(p) || bound(s) || bound(o))
    for each data source dj in D
      if SPARQL ASK(dj, ti) = true
        Ri = Ri U {dj};
      end if
    end for
  else
    Ri = Lookup(I, p); // index lookup for predicate of ti
  end if
  M = M U {(ti, Ri)};
end for
return M
What is the total number of SPARQL ASK requests used?
total number of triple patterns with bound subject or bound object
or unbound predicate * total number of data sources
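The hybrid rule above can be sketched in Python as below. As before, this is only an illustration under stated assumptions: the ASK request is a hypothetical table lookup with a call counter, and the index and ASK answers are hand-written to mirror the LD3 example.

```python
from typing import Callable, Dict, List, Tuple

TriplePattern = Tuple[str, str, str]


def is_bound(term: str) -> bool:
    """A term is bound when it is not a '?variable'."""
    return not term.startswith("?")


def hybrid_source_selection(
    patterns: Dict[str, TriplePattern],
    sources: List[str],
    index: Dict[str, List[str]],
    ask: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Use ASK when subject or object is bound or the predicate is unbound; else the index."""
    selection = {}
    for name, (s, p, o) in patterns.items():
        if not is_bound(p) or is_bound(s) or is_bound(o):
            selection[name] = [src for src in sources if ask(src, name)]
        else:
            selection[name] = list(index.get(p, []))
    return selection


SOURCES = [f"S{i}" for i in range(1, 10)]
PREDICATE_INDEX = {  # hand-written index (assumption)
    "dbpedia:party": ["S1"],
    "nyt:topicPage": ["S4"],
    "owl:sameAs": [s for s in SOURCES if s != "S3"],
}
ASK_ANSWERS = {"TP1": {"S1"}, "TP2": {"S1"}}  # assumed ASK results
LD3 = {
    "TP1": ("?president", "rdf:type", "dbpedia:President"),
    "TP2": ("?president", "dbpedia:nationality", "dbpedia:United_States"),
    "TP3": ("?president", "dbpedia:party", "?party"),
    "TP4": ("?x", "nyt:topicPage", "?page"),
    "TP5": ("?x", "owl:sameAs", "?president"),
}

ask_calls = {"n": 0}


def mock_ask(source: str, pattern_name: str) -> bool:
    """Hypothetical stand-in for a remote SPARQL ASK request."""
    ask_calls["n"] += 1
    return source in ASK_ANSWERS[pattern_name]


selection = hybrid_source_selection(LD3, SOURCES, PREDICATE_INDEX, mock_ask)
total_selected = sum(len(srcs) for srcs in selection.values())
print(ask_calls["n"], total_selected)  # 18 12
```

Only TP1 and TP2 have a bound object, so ASK is used for 2 * 9 = 18 requests and the remaining patterns are answered from the index.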
Hybrid Source Selection
FedBench (LD3): Return for all US presidents their party
membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President . # TP1
?president dbpedia:nationality dbpedia:United_States . # TP2
?president dbpedia:party ?party . # TP3
?x nyt:topicPage ?page . # TP4
?x owl:sameAs ?president . # TP5
}
Data sources (RDF): S1 = DBpedia, S2 = KEGG, S3 = ChEBI, S4 = NYT, S5 = SWDF, S6 = LMDB, S7 = Jamendo, S8 = GeoNames, S9 = DrugBank
Source Selection Algorithm: triple pattern-wise source selection
TP1 = S1
TP2 = S1
TP3 = S1
TP4 = S4
TP5 = S1, S2, S4-S9
Total number of SPARQL ASK requests used = 18
Total triple pattern-wise sources selected = 12
Does anything still need to be improved?
Source Selection
• Triple pattern-wise source selection
– Ensures 100% recall
– Can overestimate capable sources
– Can be expensive, e.g., in the total number of SPARQL ASK
requests used
– Performed by FedX, SPLENDID, LHD, DARQ, ADERIS etc.
• Join-aware triple pattern-wise source selection
– Ensures 100% recall
– May select optimal or close-to-optimal capable sources
– Can be expensive, e.g., in the total number of SPARQL ASK
requests used
– Can significantly reduce the query execution time
– Performed by ANAPSID, HiBISCuS
HiBISCuS: Hypergraph-Based Source Selection for
SPARQL Endpoint Federation
• Hybrid source selection
• Join-aware triple pattern-wise source selection
• Makes use of the hypergraph representation of
SPARQL queries
• Makes use of URI authorities
• Makes use of a cache to store recent SPARQL
ASK query results
Motivation
FedBench (LD3): Return for all US presidents their party
membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President . # TP1
?president dbpedia:nationality dbpedia:United_States . # TP2
?president dbpedia:party ?party . # TP3
?x nyt:topicPage ?page . # TP4
?x owl:sameAs ?president . # TP5
}
Data sources (RDF): S1 = DBpedia, S2 = KEGG, S3 = ChEBI, S4 = NYT, S5 = SWDF, S6 = LMDB, S7 = Jamendo, S8 = GeoNames, S9 = DrugBank
Source Selection Algorithm: triple pattern-wise source selection
TP1 = S1
TP2 = S1
TP3 = S1
TP4 = S4
TP5 = S1, S2, S4, S5, S6, S7, S8, S9
Total triple pattern-wise selected sources = 12
Total SPARQL ASK queries: 9*5 = 45
Optimal triple pattern-wise selected sources = 5 (one source per triple pattern)
Problem Statement
• An overestimation of triple pattern-wise source selection can
be expensive
– Resources are wasted
– Query runtime is increased
– Extra traffic is generated
• How do we perform join-aware triple pattern-wise source
selection in a time-efficient way?
HiBISCuS: Key Concept
• Makes use of URI authorities
http://dbpedia.org/ontology/party
Scheme: http, Authority: dbpedia.org, Path: /ontology/party
For URI details: http://tools.ietf.org/html/rfc3986
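As a small illustration of the scheme/authority/path split, the sketch below uses Python's standard `urllib.parse.urlsplit`. The helper names `uri_parts` and `authority_prefix` are hypothetical; the second one returns the `scheme://authority/` form, which is assumed here because the data summaries in this deck store authorities such as <http://dbpedia.org/>.

```python
from typing import Tuple
from urllib.parse import urlsplit


def uri_parts(uri: str) -> Tuple[str, str, str]:
    """Split a URI into (scheme, authority, path) following RFC 3986."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path


def authority_prefix(uri: str) -> str:
    """Return 'scheme://authority/' for a URI (hypothetical helper)."""
    scheme, authority, _path = uri_parts(uri)
    return f"{scheme}://{authority}/"


print(uri_parts("http://dbpedia.org/ontology/party"))
# ('http', 'dbpedia.org', '/ontology/party')
print(authority_prefix("http://dbpedia.org/ontology/party"))
# http://dbpedia.org/
```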
HiBISCuS: SPARQL Query as Hypergraph
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President .
?president dbpedia:nationality dbpedia:United_States .
?president dbpedia:party ?party .
?x nyt:topicPage ?page .
?x owl:sameAs ?president .
}
[Figure: the query drawn as a hypergraph, built up one triple pattern at a time. Vertices: ?president, rdf:type, dbpedia:President, dbpedia:nationality, dbpedia:United_States, dbpedia:party, ?party, ?x, nyt:topicPage, ?page, owl:sameAs. Legend: star, simple, and hybrid vertices; tail of hyperedge.]
HiBISCuS: Data Summaries
[] a ds:Service ;
  ds:endpointUrl <http://dbpedia.org/sparql> ;
  ds:capability [
    ds:predicate dbpedia:party ;
    ds:sbjAuthority <http://dbpedia.org/> ;
    ds:objAuthority <http://dbpedia.org/> ;
  ] ;
  ds:capability [
    ds:predicate rdf:type ;
    ds:sbjAuthority <http://dbpedia.org/> ;
    ds:objAuthority owl:Thing, dbpedia:President ; # we store all distinct classes
  ] ;
  ds:capability [
    ds:predicate dbpedia:postalCode ;
    ds:sbjAuthority <http://dbpedia.org/> ;
    # No objAuthority, as the object value for dbpedia:postalCode is a string
  ] ;
HiBISCuS: Triple Pattern-wise Source Selection
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President .
?president dbpedia:nationality dbpedia:United_States .
?president dbpedia:party ?party .
?x nyt:topicPage ?page .
?x owl:sameAs ?president .
}
[Figure: the query hypergraph, annotated with the sources initially selected for the owl:sameAs triple pattern: DBpedia, KEGG, NYT, SWDF, LMDB, GeoNames, DrugBank, Jamendo]
HiBISCuS: Triple Pattern-wise Source Pruning
[Figure: the same query and hypergraph in successive pruning steps. Each candidate source is annotated with its subject authorities (Sbj. auth.) and object authorities (Obj. auth.); in the last annotated step only NYT remains for the owl:sameAs triple pattern.]
Total triple pattern-wise selected sources = 5
Total SPARQL ASK queries: 0
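The idea behind authority-based pruning can be illustrated with the heavily simplified Python sketch below. It is not the HiBISCuS algorithm itself: it looks at a single join variable (?x), intersects the authorities that candidate sources offer for that variable across the joined triple patterns, and drops sources that share none. All authority strings are hypothetical placeholders, not values from real data summaries.

```python
from typing import Dict, List, Set


def prune_on_join_variable(
    candidates: Dict[str, Dict[str, Set[str]]]
) -> Dict[str, List[str]]:
    """candidates maps pattern -> source -> authorities for the join variable.

    Keep only sources offering an authority common to all joined patterns.
    """
    common = None
    for per_source in candidates.values():
        authorities = set().union(*per_source.values())
        common = authorities if common is None else common & authorities
    common = common or set()
    return {
        pattern: sorted(src for src, auths in per_source.items() if auths & common)
        for pattern, per_source in candidates.items()
    }


# Subject authorities for ?x in TP4 (nyt:topicPage) and TP5 (owl:sameAs).
# Every authority below is an invented placeholder.
CANDIDATES_FOR_X = {
    "TP4": {"NYT": {"data.nytimes.com"}},
    "TP5": {
        "DBpedia": {"dbpedia.org"},
        "KEGG": {"kegg.example.org"},
        "NYT": {"data.nytimes.com"},
        "SWDF": {"swdf.example.org"},
        "LMDB": {"lmdb.example.org"},
        "GeoNames": {"geonames.example.org"},
        "DrugBank": {"drugbank.example.org"},
        "Jamendo": {"jamendo.example.org"},
    },
}

pruned = prune_on_join_variable(CANDIDATES_FOR_X)
print(pruned)  # {'TP4': ['NYT'], 'TP5': ['NYT']}
```

Under these assumed authorities, only NYT can produce bindings for ?x that satisfy both patterns, so TP5 shrinks from eight sources to one without any ASK request.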
Data Integration Options
Complete Local Integration
• Triple patterns are individually and completely
evaluated against every endpoint
• Triple pattern results are locally integrated using
different join techniques, e.g., NLJ, Hash Join etc.
• Less efficient if the query contains common
predicates such as rdf:type and owl:sameAs
• Retrieves a large amount of potentially irrelevant
intermediate results
Iterative Integration
• Evaluate query iteratively pattern by pattern
• Start with a single triple pattern
• Substitute mappings from previous triple pattern
in the subsequent evaluation
• Evaluate the query in nested loop join (NLJ) fashion
• NLJ can cause many remote requests
• Block NLJ minimizes the number of remote requests
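To make the difference in remote requests concrete, the sketch below counts requests for a plain NLJ (one request per intermediate binding) and for a block NLJ (one request per block of bindings). The block size of 15 and the 100 bindings are arbitrary assumptions chosen for illustration.

```python
from typing import Iterator, List, Sequence


def blocks(bindings: Sequence[str], block_size: int) -> Iterator[List[str]]:
    """Yield consecutive blocks of at most block_size bindings."""
    for start in range(0, len(bindings), block_size):
        yield list(bindings[start:start + block_size])


def count_requests(bindings: Sequence[str], block_size: int = 1) -> int:
    """One remote request per block; block_size=1 corresponds to plain NLJ."""
    return sum(1 for _ in blocks(bindings, block_size))


# 100 hypothetical bindings produced by the first triple pattern.
bindings = [f"drug{i}" for i in range(100)]
nlj_requests = count_requests(bindings)
block_nlj_requests = count_requests(bindings, block_size=15)
print(nlj_requests, block_nlj_requests)  # 100 7
```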
Join Order Selection
• Left-deep trees
– Joins take place in a left-to-right sequential order
– Result of the join is used as an outer input for the next join
– Used in FedX, DARQ
• Right-deep trees
– Joins take place in a right-to-left sequential order
– Result of the join is used as an inner input for the next join
• Bushy trees
– Joins take place in sub-trees on both the left and right sides
– Used in ANAPSID
• Dynamic programming
– Used in SPLENDID
Join Order Selection Example
Compute Micronutrients using Drugbank and KEGG
SELECT ?drug ?title WHERE {
?drug drugbank:drugCategory drugbank-cat:micronutrient . # TP1
?drug drugbank:casRegistryNumber ?id . # TP2
?keggDrug rdf:type kegg:Drug . # TP3
?keggDrug bio2rdf:xRef ?id . # TP4
?keggDrug dc:title ?title . # TP5
}
[Figure: three join trees for this query, each with the projection π ?drug, ?title at the root]
• Left-deep tree over TP1, TP2, TP3, TP4, TP5
• Right-deep tree over TP1, TP2, TP3, TP4, TP5
• Bushy tree with the sub-trees (TP1, TP2) and (TP3, TP5, TP4)
Goal: Execute smallest cardinality joins first
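One simple way to pursue this goal is a greedy left-deep ordering: start with the lowest-cardinality triple pattern and repeatedly append the cheapest remaining pattern that shares a variable with what has been joined so far. The sketch below is an illustration only, not the strategy of any particular engine, and the cardinality estimates are invented.

```python
from typing import Dict, List, Set, Tuple

TriplePattern = Tuple[str, str, str]


def variables(tp: TriplePattern) -> Set[str]:
    """Variables are the terms starting with '?'."""
    return {term for term in tp if term.startswith("?")}


def greedy_left_deep_order(
    patterns: Dict[str, TriplePattern], cardinality: Dict[str, int]
) -> List[str]:
    """Order patterns left-deep, preferring joinable patterns with low cardinality."""
    remaining = dict(patterns)
    first = min(remaining, key=cardinality.get)
    order = [first]
    bound = variables(remaining.pop(first))
    while remaining:
        # Prefer patterns that join with the bound variables; else fall back to all.
        joinable = [n for n, tp in remaining.items() if variables(tp) & bound]
        nxt = min(joinable or list(remaining), key=cardinality.get)
        order.append(nxt)
        bound |= variables(remaining.pop(nxt))
    return order


PATTERNS = {
    "TP1": ("?drug", "drugbank:drugCategory", "drugbank-cat:micronutrient"),
    "TP2": ("?drug", "drugbank:casRegistryNumber", "?id"),
    "TP3": ("?keggDrug", "rdf:type", "kegg:Drug"),
    "TP4": ("?keggDrug", "bio2rdf:xRef", "?id"),
    "TP5": ("?keggDrug", "dc:title", "?title"),
}
# Invented cardinality estimates, for illustration only.
CARDINALITY = {"TP1": 50, "TP2": 2000, "TP3": 8000, "TP4": 10000, "TP5": 9000}

order = greedy_left_deep_order(PATTERNS, CARDINALITY)
print(order)  # ['TP1', 'TP2', 'TP4', 'TP3', 'TP5']
```

TP4 is scheduled before the cheaper TP3 because, after TP1 and TP2, it is the only pattern that shares a variable (?id) with the partial result; this avoids a cross product.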
Join Order Optimization
• Exclusive Groups
– Group triple patterns with the same single relevant data source
– Evaluation in a single (remote) sub-query
– Push join to the data source, i.e., endpoint
• Variable count heuristic
– Iteratively determine the join order based on the free variable
count of triple patterns and groups
– Consider "resolved" variable mappings from earlier iterations
• Using Selectivities
– Store distinct predicates, avg. subject selectivities, and avg.
object selectivities for each predicate in the index
– Use the predicate count, avg. subject selectivities, and avg.
object selectivities to estimate the join cardinality
Exclusive Groups
SELECT ?President ?Party ?TopicPage WHERE {
?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates . # @ DBpedia
?President dbpedia:party ?Party . # @ DBpedia
?nytPresident owl:sameAs ?President . # @ DBpedia, NYTimes
?nytPresident nytimes:topicPage ?TopicPage . # @ NYTimes
}
Source selection results are shown as comments; the two triple patterns answered only by DBpedia form an exclusive group.
Advantage:
Delegate joins to the endpoint by forming exclusive groups (i.e., executing the
respective patterns in a single subquery)
Source: http://www.slideshare.net/aschwarte/fedx-for-federated-query-processing-on-linked-data
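A minimal Python sketch of forming exclusive groups from a source selection result. It assumes that a group needs at least two triple patterns whose only relevant source is the same endpoint; this threshold and the function name `exclusive_groups` are choices made for this illustration, not taken from FedX's code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def exclusive_groups(
    selection: Dict[str, List[str]]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group patterns whose single relevant source is identical.

    Returns (groups by source, patterns left to be evaluated individually).
    """
    by_source = defaultdict(list)
    for pattern, sources in selection.items():
        if len(sources) == 1:
            by_source[sources[0]].append(pattern)
    groups = {src: pats for src, pats in by_source.items() if len(pats) > 1}
    grouped = {p for pats in groups.values() for p in pats}
    rest = [p for p in selection if p not in grouped]
    return groups, rest


# Source selection result from the example query above.
SELECTION = {
    "TP1": ["DBpedia"],             # rdf:type
    "TP2": ["DBpedia"],             # dbpedia:party
    "TP3": ["DBpedia", "NYTimes"],  # owl:sameAs
    "TP4": ["NYTimes"],             # nytimes:topicPage
}

groups, rest = exclusive_groups(SELECTION)
print(groups, rest)  # {'DBpedia': ['TP1', 'TP2']} ['TP3', 'TP4']
```

TP1 and TP2 can then be sent to DBpedia as one sub-query, so their join is computed remotely.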
Exclusive Groups Join Order Optimization
1 SPARQL Query
Compute Micronutrients using Drugbank and KEGG
SELECT ?drug ?title WHERE {
?drug drugbank:drugCategory drugbank-cat:micronutrient .
?drug drugbank:casRegistryNumber ?id .
?keggDrug rdf:type kegg:Drug .
?keggDrug bio2rdf:xRef ?id .
?keggDrug dc:title ?title .
}
2 Unoptimized Internal Representation
4x Local Join = 4x NLJ
3 Optimized Internal Representation
Exclusive Group → Remote Join
Source: http://www.slideshare.net/aschwarte/fedx-for-federated-query-processing-on-linked-data
[] a sd:Service ;
sd:endpointUrl <http://localhost:8890/sparql> ;
sd:capability [
sd:predicate diseasome:name ;
sd:totalTriples 147 ; # Total number of triples with predicate sd:predicate
sd:avgSbjSel "0.0068" ; # 1 / number of distinct subjects with predicate sd:predicate
sd:avgObjSel "0.0069" ; # 1 / number of distinct objects with predicate sd:predicate
] ;
sd:capability [
sd:predicate diseasome:chromosomalLocation ;
sd:totalTriples 160 ;
sd:avgSbjSel "0.0062" ;
sd:avgObjSel "0.0072" ;
] .
S1 P O1 .
S1 P O2 .
S2 P O1 .
S3 P O2 .
totalTriples = 4
avgSbjSel(p) = 1/3 (three distinct subjects: S1, S2, S3)
avgObjSel(p) = 1/2 (two distinct objects: O1, O2)
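The three index values for the four example triples can be recomputed with a short Python sketch. The function name and the dictionary layout are illustrative choices, not part of any engine's index format.

```python
def predicate_stats(triples, predicate):
    """Compute totalTriples, avgSbjSel and avgObjSel for one predicate."""
    matching = [(s, o) for s, p, o in triples if p == predicate]
    if not matching:
        raise ValueError(f"no triples with predicate {predicate!r}")
    subjects = {s for s, _ in matching}
    objects = {o for _, o in matching}
    return {
        "totalTriples": len(matching),
        "avgSbjSel": 1 / len(subjects),  # 1 / number of distinct subjects
        "avgObjSel": 1 / len(objects),   # 1 / number of distinct objects
    }


TRIPLES = [
    ("S1", "P", "O1"),
    ("S1", "P", "O2"),
    ("S2", "P", "O1"),
    ("S3", "P", "O2"),
]

stats = predicate_stats(TRIPLES, "P")
print(stats)
```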
Selectivity Based Join Order Optimization
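• Triple pattern cardinality
Let $p = \mathrm{pred}(tp)$ and let $T$ be the total number of triples having predicate $p$.
$$
C(tp) =
\begin{cases}
T & \text{if neither subject nor object is bound} \\
T \times \mathit{avgSbjSel}(p) & \text{if the subject is bound} \\
T \times \mathit{avgObjSel}(p) & \text{if the object is bound}
\end{cases}
$$
• Join cardinality
$$
C(J(tp_1, tp_2)) =
\begin{cases}
C(tp_1) \times C(tp_2) \times \mathit{avgPredJoinSel}(tp_1) \times \mathit{avgPredJoinSel}(tp_2) & \text{if p-p join} \\
C(tp_1) \times C(tp_2) \times \mathit{avgSbjJoinSel}(tp_1) \times \mathit{avgSbjJoinSel}(tp_2) & \text{if s-s join} \\
C(tp_1) \times C(tp_2) \times \mathit{avgSbjJoinSel}(tp_1) \times \mathit{avgObjJoinSel}(tp_2) & \text{if s-o join}
\end{cases}
$$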
How to calculate avgPredJoinSel, avgSbjJoinSel, and avgObjJoinSel?
DARQ selected 0.5 as the avgJoinSel value for all joins
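The two estimates can be written down as a small Python sketch. Several points are assumptions: the class and function names are hypothetical, the case where both subject and object are bound is not covered by the slide (the sketch simply gives the subject precedence), and the default of 0.5 is applied to each per-pattern join selectivity, which is only one possible reading of the DARQ remark. The statistics are the diseasome values from the index excerpt.

```python
from dataclasses import dataclass


@dataclass
class PredicateStats:
    total_triples: int
    avg_sbj_sel: float
    avg_obj_sel: float


def pattern_cardinality(stats, subject_bound=False, object_bound=False):
    """C(tp) for a triple pattern with a bound predicate."""
    if subject_bound:  # assumption: subject takes precedence if both are bound
        return stats.total_triples * stats.avg_sbj_sel
    if object_bound:
        return stats.total_triples * stats.avg_obj_sel
    return float(stats.total_triples)


def join_cardinality(c1, c2, join_sel_1=0.5, join_sel_2=0.5):
    """C(J(tp1, tp2)); the join type decides which selectivities are passed in."""
    return c1 * c2 * join_sel_1 * join_sel_2


name = PredicateStats(total_triples=147, avg_sbj_sel=0.0068, avg_obj_sel=0.0069)
location = PredicateStats(total_triples=160, avg_sbj_sel=0.0062, avg_obj_sel=0.0072)

c_name = pattern_cardinality(name, subject_bound=True)  # about 1 triple
c_location = pattern_cardinality(location)              # all 160 triples
print(c_name, c_location, join_cardinality(c_name, c_location))
```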
Join Implementations
• Bound Joins
– Start with a single triple pattern (lowest cardinality)
– Substitute mappings from previous triple pattern in the
subsequent evaluation
– Bound Joins in NLJ fashion
• Execute bound joins in nested loop join fashion
• Too many remote requests
– Bound Joins in Block NLJ fashion
• Execute bound joins in block nested loop join fashion
• Make use of SPARQL UNION construct
• The number of remote requests is reduced by a factor of the block size
• Other join techniques
– E.g., hash joins
Bound Joins in Block NLJ
SELECT ?President ?Party ?TopicPage WHERE {
?President rdf:type dbpedia:PresidentsOfTheUnitedStates .
?President dbpedia:party ?Party .
?nytPresident owl:sameAs ?President .
?nytPresident nytimes:topicPage ?TopicPage .
}
Assume that the following intermediate results have been computed as input for the last triple pattern:
Block Input
"Barack Obama"
"George W. Bush"
…
Before (NLJ)
SELECT ?TopicPage WHERE { "Barack Obama" nytimes:topicPage ?TopicPage }
SELECT ?TopicPage WHERE { "George W. Bush" nytimes:topicPage ?TopicPage }
…
Now: Evaluation in a single remote request using a SPARQL UNION
construct + local post-processing (SPARQL 1.0)
Source: http://www.slideshare.net/aschwarte/fedx-for-federated-query-processing-on-linked-data
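A block nested loop bound join can be sketched as a query builder that packs one block of bindings into a single UNION query. This is an illustrative Python sketch, not FedX's code: the `ex:p…` subjects are hypothetical placeholders for the bound ?nytPresident values, and suffixing the variable with the branch index (so that local post-processing can map each result row back to its input binding) is an assumption about how the results are told apart.

```python
def union_query(bindings, predicate="nytimes:topicPage"):
    """One remote request for a whole block: one UNION branch per binding."""
    branches = [
        f"{{ {subject} {predicate} ?TopicPage_{i} }}"
        for i, subject in enumerate(bindings)
    ]
    return "SELECT * WHERE {\n  " + "\n  UNION\n  ".join(branches) + "\n}"


def block_queries(bindings, block_size):
    """Split the intermediate results into blocks and build one query each."""
    return [
        union_query(bindings[start:start + block_size])
        for start in range(0, len(bindings), block_size)
    ]


# 40 hypothetical bindings: plain NLJ would need 40 remote requests.
bindings = [f"ex:p{i}" for i in range(40)]
queries = block_queries(bindings, block_size=15)
print(len(queries), "requests instead of", len(bindings))
print(union_query(["ex:a", "ex:b"]))
```

With a block size of 15, the 40 bindings are covered by 3 requests, which is the reduction by the block size mentioned above.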
Parallelization and Pipelining
• Execute sub-queries concurrently on different data
sources
• Multithreaded worker pool to execute the joins
and UNION operators in parallel
• Pipelining approach for intermediate results
• See FedX and LHD implementations
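Concurrent sub-query execution with a worker pool can be sketched with Python's standard library. The `run_subquery` function is a hypothetical stand-in that performs no network I/O; a real engine would send the sub-query to the endpoint there. Pipelining of intermediate results is not shown.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_subquery(source, query):
    """Stand-in for a remote SPARQL request (hypothetical; no network I/O)."""
    time.sleep(0.01)  # simulate network latency
    return [f"{source} answered {query}"]


def execute_concurrently(subqueries, max_workers=4):
    """Submit one task per data source, then collect all results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            source: pool.submit(run_subquery, source, query)
            for source, query in subqueries.items()
        }
        return {source: future.result() for source, future in futures.items()}


results = execute_concurrently({"DBpedia": "Q1", "NYTimes": "Q2"})
print(results)
```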
Performance Metrics and Discussion
Performance Metrics
• Efficient source selection in terms of
– Total triple pattern-wise sources selected
– Total number of SPARQL ASK requests used during source
selection
– Source selection time
• Query execution time
• Result completeness and correctness
• Number of remote requests during query execution
• Index compression ratio (1 − index size / data dump size)
• See https://code.google.com/p/bigrdfbench/
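The index compression ratio is simple arithmetic; a minimal Python sketch with made-up sizes (5 MB index, 1000 MB dump) shows how the value is read.

```python
def index_compression_ratio(index_size, dump_size):
    """1 - index size / data dump size (both in the same unit)."""
    if dump_size <= 0:
        raise ValueError("dump size must be positive")
    return 1 - index_size / dump_size


# Hypothetical sizes in MB, chosen only for illustration.
ratio = index_compression_ratio(index_size=5, dump_size=1000)
print(f"{ratio:.3f}")
```

A value close to 1 means the index is tiny compared with the data it summarizes.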
Evaluation Setup
• Local dedicated network
• Local SPARQL endpoints (One per machine)
• Run each query 10 times and present the average results
• Analyze the results statistically, e.g., with the Wilcoxon signed-rank
test or Student's t-test
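Averaging repeated runs and comparing two engines on the same queries can be sketched with the standard library alone. The runtimes below are invented, three runs are used instead of ten for brevity, and only the paired t statistic is computed (no p-value lookup).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """t statistic of the paired differences between two result series."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    diffs = [x - y for x, y in zip(a, b)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (spread / sqrt(len(diffs)))


# Hypothetical runtimes in seconds for the same three queries.
engine_a = [1.0, 2.0, 3.0]
engine_b = [0.5, 1.0, 2.5]

avg_a, avg_b = mean(engine_a), mean(engine_b)
t = paired_t_statistic(engine_a, engine_b)
print(avg_a, avg_b, t)
```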
SPARQL Query Federation Engines
• FedX
• SPLENDID
• HiBISCuS+FedX
• HiBISCuS+SPLENDID
• ANAPSID
• LHD
• DARQ
• QUETSAL
• TopFed
• DAW
Thanks
saleem@informatik.uni-leipzig.de
AKSW, University of Leipzig, Germany

Federated SPARQL Query Processing ISWC2015 Tutorial

  • 1. Federated SPARQL Query Processing Over the Web of Data Muhammad Saleem Tutorial at ISWC 2015, Bethlehem, USA Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany, 11/10/2015
  • 2. Agenda • SPARQL Query Federation Approaches • SPARQL Query Federation Optimization – Source Selection – Data Integration Options – Join Order Selection – Join Order Optimization – Join Implementations • Performance Metrics and Discussion
  • 3. SPARQL Query Federation Approaches • SPARQL Endpoint Federation (SEF) • Linked Data Federation (LDF) • Linked Data Fragments Federation (LDFF) • Distributed Hash Tables (DHTs) • Hybrid
  • 4. SPARQL Endpoint Federation Approaches • Most commonly used approaches • Make use of SPARQL endpoints URLs • Fast query execution • RDF data needs to be exposed via SPARQL endpoints • E.g., HiBISCus, FedX, SPLENDID, ANAPSID, LHD, TopFed, QUETSAL etc.
  • 5. Linked Data Federation Approaches • Data needs not be exposed via SPARQL endpoints • Uses URI lookups at runtime • Data should follow Linked Data principles • Slower as compared to previous approaches • E.g., LDQPS, SIHJoin, WoDQA etc.
  • 6. Linked Data Fragments Federation • Federation over Linked Data Fragments • Will be explained in upcoming session in detail
  • 7. Query federation on top of Distributed Hash Tables • Uses DHT indexing to federate SPARQL queries • Space efficient • Cannot deal with whole LOD • E.g., ATLAS
  • 8. Hybrid • Federation over SPARQL endpoints and Linked Data • Can potentially deal with whole LOD • E.g., ADERIS-Hybrid (of SEF+LDF)
  • 9. SPARQL Endpoint Federation (architecture diagram over RDF sources S1-S4) • Parsing/Rewriting: rewrite the query and get individual triple patterns • Source Selection: identify capable sources for individual triple patterns • Federator Optimizer: generate an optimized sub-query execution plan • Execute sub-queries against the sources • Integrator: integrate sub-query results
  • 11-15. Source Selection FedBench (LD3): Return for all US presidents their party membership and news pages about them.
      SELECT ?president ?party ?page WHERE {
        ?president rdf:type dbpedia:President .                  # TP1
        ?president dbpedia:nationality dbpedia:United_States .   # TP2
        ?president dbpedia:party ?party .                        # TP3
        ?x nyt:topicPage ?page .                                 # TP4
        ?x owl:sameAs ?president .                               # TP5
      }
    Data sources: S1 = dbpedia, S2 = KEGG, S3 = ChEBI, S4 = NYT, S5 = SWDF, S6 = LMDB, S7 = Jamendo, S8 = Geo Names, S9 = DrugBank.
    Triple pattern-wise source selection: TP1 = S1; TP2 = S1; TP3 = S1; TP4 = S4; TP5 = S1 S2 S4-S9. Total triple pattern-wise sources selected = 1+1+1+1+8 = 12
  • 16. Types of Source Selection • Index-free – Uses SPARQL ASK queries – No index maintenance required – Potentially ensures result set completeness – SPARQL ASK queries can be expensive – Can use a cache to store recent SPARQL ASK query results – E.g., FedX • Index-only – Uses only an index/data summaries – Fast source selection, but less efficient (tends to overestimate the relevant sources) – Result set completeness is not ensured – E.g., DARQ, LHD • Hybrid – Uses index + SPARQL ASK – Most efficient – Result set completeness is not ensured – Can use a cache to store recent SPARQL ASK query results – E.g., HiBISCuS, ANAPSID, SPLENDID
  • 17. Index-free Source Selection
      Input: SPARQL query Q, set of all data sources D
      Output: triple pattern to relevant data sources map M
      for each triple pattern ti in SPARQL query Q
        Ri = {}; // set of relevant data sources for triple pattern ti
        for each data source dj in D
          if SPARQL ASK(dj, ti) = true
            Ri = Ri U {dj};
          end if
        end for
        M = M U {ti -> Ri};
      end for
      return M
    What is the total number of SPARQL ASK requests used? total number of triple patterns * total number of data sources
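The index-free algorithm can be sketched in Python as follows. This is a minimal illustration, not the implementation of any particular engine: the `ask` callback stands in for a remote SPARQL ASK request, and `TOY_DATA`, `toy_ask`, and the `ex:` terms are hypothetical names invented for the example.

```python
from typing import Callable, Dict, List, Tuple

TriplePattern = Tuple[str, str, str]  # (subject, predicate, object); variables start with "?"


def index_free_source_selection(
    patterns: List[TriplePattern],
    sources: List[str],
    ask: Callable[[str, TriplePattern], bool],
) -> Dict[TriplePattern, List[str]]:
    """Send one ASK per (triple pattern, source) pair and keep the sources that answer true."""
    return {tp: [src for src in sources if ask(src, tp)] for tp in patterns}


# Hypothetical in-memory stand-in for two remote endpoints.
TOY_DATA = {
    "S1": {("ex:obama", "ex:party", "ex:democrats")},
    "S2": {("ex:obama_nyt", "ex:topicPage", "ex:page1")},
}
ask_log = []  # records every simulated ASK request


def toy_ask(source: str, tp: TriplePattern) -> bool:
    """Simulated ASK: true if any triple in the source matches the pattern."""
    ask_log.append((source, tp))
    return any(
        all(term.startswith("?") or term == value for term, value in zip(tp, triple))
        for triple in TOY_DATA[source]
    )


patterns = [("?p", "ex:party", "?party"), ("?x", "ex:topicPage", "?page")]
selection = index_free_source_selection(patterns, list(TOY_DATA), toy_ask)
```

With two patterns and two sources the sketch issues 2 * 2 = 4 simulated ASK requests, matching the product given on the slide.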
  • 18-22. Index-free Source Selection FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } Triple pattern-wise source selection over S1-S9 (dbpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, Geo Names, DrugBank): TP1 = S1; TP2 = S1; TP3 = S1; TP4 = S4; TP5 = S1 S2 S4-S9. Total number of SPARQL ASK requests used = 45. Total triple pattern-wise sources selected = 12
  • 23. Index-only Source Selection (LHD)
      Input: SPARQL query Q, set of all data sources D, data source index I storing all distinct predicates for all data sources in D
      Output: triple pattern to relevant data sources map M
      for each triple pattern ti in SPARQL query Q
        Ri = {}; // set of relevant data sources for triple pattern ti
        p = Pred(ti); // predicate of ti
        if bound(p)
          Ri = Lookup(I, p); // index lookup for predicate of ti
        else
          Ri = D; // all data sources are relevant
        end if
        M = M U {ti -> Ri};
      end for
      return M
    Why is it the less efficient approach (i.e., why does it greatly overestimate relevant data sources)? • Source selection is based only on the predicate of each triple pattern • All data sources are simply selected for triple patterns with unbound predicates
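A minimal Python sketch of the predicate-index lookup, under the assumption that the index is a plain dictionary from predicate to source names. The index contents and the `ex:` terms are hypothetical; the example only illustrates why a predicate shared by every source, such as rdf:type, leads to overestimation.

```python
from typing import Dict, List, Tuple

TriplePattern = Tuple[str, str, str]  # variables start with "?"


def index_only_source_selection(
    patterns: List[TriplePattern],
    sources: List[str],
    predicate_index: Dict[str, List[str]],
) -> Dict[TriplePattern, List[str]]:
    """Select sources from the predicate index alone; no ASK requests are sent."""
    selection = {}
    for tp in patterns:
        predicate = tp[1]
        if not predicate.startswith("?"):
            # Bound predicate: every source that contains this predicate is selected.
            selection[tp] = list(predicate_index.get(predicate, []))
        else:
            # Unbound predicate: all sources are considered relevant.
            selection[tp] = list(sources)
    return selection


# Hypothetical index: rdf:type occurs in both sources, ex:party only in S1.
sources = ["S1", "S2"]
predicate_index = {"rdf:type": ["S1", "S2"], "ex:party": ["S1"]}
patterns = [
    ("?p", "rdf:type", "ex:President"),  # only S1 may hold presidents, yet both are selected
    ("?p", "ex:party", "?party"),
    ("?s", "?anyPredicate", "?o"),
]
selection = index_only_source_selection(patterns, sources, predicate_index)
```

The bound object `ex:President` is ignored by the lookup, which is the source of the overestimation the slide describes.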
  • 24-28. Index-only Source Selection FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } Triple pattern-wise source selection over S1-S9 (dbpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, Geo Names, DrugBank): TP1 = S1-S9; TP2 = S1; TP3 = S1; TP4 = S4; TP5 = S1 S2 S4-S9. Total number of SPARQL ASK requests used = 0. Total triple pattern-wise sources selected = 20
  • 29. Hybrid Source Selection
      Input: SPARQL query Q, set of all data sources D, data source index I storing all distinct predicates for all data sources in D
      Output: triple pattern to relevant data sources map M
      for each triple pattern ti in SPARQL query Q
        Ri = {}; // set of relevant data sources for triple pattern ti
        s = Subj(ti), p = Pred(ti), o = Obj(ti); // subject, predicate, and object of ti
        if (!bound(p) || bound(s) || bound(o))
          for each data source dj in D
            if SPARQL ASK(dj, ti) = true
              Ri = Ri U {dj};
            end if
          end for
        else
          Ri = Lookup(I, p); // index lookup for predicate of ti
        end if
        M = M U {ti -> Ri};
      end for
      return M
    What is the total number of SPARQL ASK requests used? total number of triple patterns with bound subject or bound object or unbound predicate * total number of data sources
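The hybrid rule can be sketched in Python as below. The branching condition follows the pseudocode on the slide; the toy data, the dictionary index, and the `toy_ask` stand-in for a remote SPARQL ASK request are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

TriplePattern = Tuple[str, str, str]  # variables start with "?"


def is_bound(term: str) -> bool:
    return not term.startswith("?")


def hybrid_source_selection(
    patterns: List[TriplePattern],
    sources: List[str],
    predicate_index: Dict[str, List[str]],
    ask: Callable[[str, TriplePattern], bool],
) -> Dict[TriplePattern, List[str]]:
    """Use ASK when the index cannot decide precisely, the predicate index otherwise."""
    selection = {}
    for tp in patterns:
        s, p, o = tp
        if not is_bound(p) or is_bound(s) or is_bound(o):
            selection[tp] = [src for src in sources if ask(src, tp)]
        else:
            selection[tp] = list(predicate_index.get(p, []))
    return selection


# Hypothetical in-memory stand-in for two endpoints and their predicate index.
TOY_DATA = {
    "S1": {("ex:obama", "rdf:type", "ex:President"), ("ex:obama", "ex:party", "ex:democrats")},
    "S2": {("ex:aspirin", "rdf:type", "ex:Drug")},
}
predicate_index = {"rdf:type": ["S1", "S2"], "ex:party": ["S1"]}
ask_log = []


def toy_ask(source: str, tp: TriplePattern) -> bool:
    ask_log.append((source, tp))
    return any(
        all(not is_bound(term) or term == value for term, value in zip(tp, triple))
        for triple in TOY_DATA[source]
    )


patterns = [("?p", "rdf:type", "ex:President"), ("?p", "ex:party", "?party")]
selection = hybrid_source_selection(patterns, list(TOY_DATA), predicate_index, toy_ask)
```

Only the first pattern has a bound object, so ASK requests are sent for it alone (one per source), while the second pattern is resolved from the index.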
  • 30-34. Hybrid Source Selection FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } Triple pattern-wise source selection over S1-S9 (dbpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, Geo Names, DrugBank): TP1 = S1; TP2 = S1; TP3 = S1; TP4 = S4; TP5 = S1 S2 S4-S9. Total number of SPARQL ASK requests used = 18. Total triple pattern-wise sources selected = 12. Is there anything that still needs to be improved?
  • 35. Source Selection • Triple pattern-wise source selection – Ensures 100% recall – Can overestimate capable sources – Can be expensive, e.g., in the total number of SPARQL ASK requests used – Performed by FedX, SPLENDID, LHD, DARQ, ADERIS etc. • Join-aware triple pattern-wise source selection – Ensures 100% recall – May select optimal or close-to-optimal capable sources – Can be expensive, e.g., in the total number of SPARQL ASK requests used – Can significantly reduce the query execution time – Performed by ANAPSID, HiBISCuS
  • 36. HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation • Hybrid source selection • Join-aware triple pattern-wise source selection • Makes use of the hypergraph representation of SPARQL queries • Makes use of URI authorities • Makes use of a cache to store recent SPARQL ASK query results
  • 37-44. Motivation FedBench (LD3): Return for all US presidents their party membership and news pages about them. SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } Triple pattern-wise source selection over S1-S9 (dbpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, Geo Names, DrugBank): TP1 = S1; TP2 = S1; TP3 = S1; TP4 = S4; TP5 = S1 S2 S4 S5 S6 S7 S8 S9. Total triple pattern-wise selected sources = 12. Total SPARQL ASK queries: 9*5 = 45. Optimal triple pattern-wise selected sources = 5
  • 45. Problem Statement • An overestimation in triple pattern-wise source selection can be expensive – Resources are wasted – Query runtime is increased – Extra traffic is generated • How do we perform join-aware triple pattern-wise source selection in a time-efficient way?
  • 46. HiBISCuS: Key Concept • Makes use of URI authorities • Example: in http://dbpedia.org/ontology/party the scheme is "http", the authority is "dbpedia.org", and the path is "/ontology/party" • For URI details: http://tools.ietf.org/html/rfc3986
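The three URI components named on the slide can be extracted with Python's standard library. This is only an illustration of the RFC 3986 terminology, not HiBISCuS code; `urlsplit` calls the authority component `netloc`.

```python
from urllib.parse import urlsplit

# Split the slide's example URI into its RFC 3986 components.
parts = urlsplit("http://dbpedia.org/ontology/party")

scheme = parts.scheme      # "http"
authority = parts.netloc   # "dbpedia.org"
path = parts.path          # "/ontology/party"
```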
  • 47-52. HiBISCuS: SPARQL Query as Hypergraph SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } Figure (built up one triple pattern per slide): hypergraph with one hyperedge per triple pattern over the nodes ?president, rdf:type, dbpedia:President, dbpedia:nationality, dbpedia:United_States, dbpedia:party, ?party, ?x, nyt:topicPage, ?page, owl:sameAs. Legend: star, simple, and hybrid nodes; tail of hyperedge
  • 53. HiBISCuS: Data Summaries
      [] a ds:Service ;
        ds:endpointUrl <http://dbpedia.org/sparql> ;
        ds:capability [
          ds:predicate dbpedia:party ;
          ds:sbjAuthority <http://dbpedia.org/> ;
          ds:objAuthority <http://dbpedia.org/> ;
        ] ;
        ds:capability [
          ds:predicate rdf:type ;
          ds:sbjAuthority <http://dbpedia.org/> ;
          ds:objAuthority owl:Thing, dbpedia:President ; # we store all distinct classes
        ] ;
        ds:capability [
          ds:predicate dbpedia:postalCode ;
          ds:sbjAuthority <http://dbpedia.org/> ;
          # no objAuthority, as the object value for dbpedia:postalCode is a string
        ] ;
  • 54. HiBISCuS: Triple Pattern-wise Source Selection SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } Figure: the query hypergraph annotated with the initially selected sources; source labels shown: dbpedia, KEGG, NYT, SWDF, LMDB, Geo, DrgBnk, Jamendo
  • 55-60. HiBISCuS: Triple Pattern-wise Source Pruning SELECT ?president ?party ?page WHERE { ?president rdf:type dbpedia:President . ?president dbpedia:nationality dbpedia:United_States . ?president dbpedia:party ?party . ?x nyt:topicPage ?page . ?x owl:sameAs ?president . } Figure (animation over the query hypergraph): the candidate sources dbpedia, KEGG, NYT, SWDF, DrgBnk, LMDB, Geo, and Jamendo are shown with their subject authorities (Sbj. auth.) and object authorities (Obj. auth.); the final frames show only NYT. Total triple pattern-wise selected sources = 5. Total SPARQL ASK queries: 0
  • 62. Complete Local Integration • Triple patterns are individually and completely evaluated against every endpoint • Triple pattern results are integrated locally using different join techniques, e.g., nested loop join (NLJ), hash join etc. • Less efficient if the query contains common predicates such as rdf:type and owl:sameAs • Retrieves a large amount of potentially irrelevant intermediate results
  • 63. Iterative Integration • Evaluate the query iteratively, pattern by pattern • Start with a single triple pattern • Substitute mappings from the previous triple pattern in the subsequent evaluation • Evaluate the query in NLJ fashion • NLJ can cause many remote requests • Evaluating in block NLJ fashion reduces the number of remote requests
  • 65. Join Order Selection • Left-deep trees – Joins take place in a left-to-right sequential order – Result of the join is used as an outer input for the next join – Used in FedX, DARQ • Right-deep trees – Joins take place in a right-to-left sequential order – Result of the join is used as an inner input for the next join • Bushy trees – Joins take place in sub-trees on both the left and right sides – Used in ANAPSID • Dynamic programming – Used in SPLENDID
  • 66. Join Order Selection Example Compute Micronutrients using Drugbank and KEGG
      SELECT ?drug ?title WHERE {
        ?drug drugbank:drugCategory drugbank-cat:micronutrient .   # TP1
        ?drug drugbank:casRegistryNumber ?id .                     # TP2
        ?keggDrug rdf:type kegg:Drug .                             # TP3
        ?keggDrug bio2rdf:xRef ?id .                               # TP4
        ?keggDrug dc:title ?title .                                # TP5
      }
    Figure: three join trees over TP1-TP5, each with the projection π ?drug, ?title at the root: a left-deep tree, a right-deep tree, and a bushy tree. Goal: execute smallest-cardinality joins first
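One simple way to pursue the stated goal is a greedy left-deep ordering that always picks the cheapest triple pattern connected to what has already been joined. The sketch below is an illustration under assumptions: the cardinalities are made-up numbers, and `greedy_join_order` is a hypothetical helper, not the ordering procedure of any specific engine.

```python
from typing import Dict, List, Set, Tuple


def greedy_join_order(patterns: Dict[str, Tuple[Set[str], int]]) -> List[str]:
    """Order patterns left-deep: smallest estimated cardinality first, preferring
    patterns that share a variable with the patterns already chosen."""
    remaining = dict(patterns)
    order: List[str] = []
    bound_vars: Set[str] = set()
    while remaining:
        connected = [name for name, (variables, _) in remaining.items() if variables & bound_vars]
        candidates = connected or list(remaining)  # fall back if nothing is connected yet
        best = min(candidates, key=lambda name: (remaining[name][1], name))
        order.append(best)
        bound_vars |= remaining.pop(best)[0]
    return order


# Variables per triple pattern of the micronutrient query; cardinalities are invented.
patterns = {
    "TP1": ({"?drug"}, 10),
    "TP2": ({"?drug", "?id"}, 1000),
    "TP3": ({"?keggDrug"}, 8000),
    "TP4": ({"?keggDrug", "?id"}, 5000),
    "TP5": ({"?keggDrug", "?title"}, 8000),
}
order = greedy_join_order(patterns)
```

Requiring a shared variable avoids Cartesian products between unrelated patterns; ties are broken by pattern name so the result is deterministic.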
  • 68. Join Order Optimization • Exclusive Groups – Group triple patterns with the same single relevant data source – Evaluation in a single (remote) sub-query – Push the join to the data source, i.e., the endpoint • Variable-count heuristic – Iteratively determine the join order based on the free-variable count of triple patterns and groups – Consider "resolved" variable mappings from earlier iterations • Using Selectivities – Store distinct predicates, avg. subject selectivities, and avg. object selectivities for each predicate in the index – Use the predicate count, avg. subject selectivities, and avg. object selectivities to estimate the join cardinality
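Forming exclusive groups from a source selection result can be sketched as follows. The function name and the input mapping are hypothetical; the sketch only shows the grouping step, where patterns whose sole relevant source is the same endpoint are collected so they can be sent as one sub-query.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def form_exclusive_groups(
    selection: Dict[str, List[str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group triple patterns that have exactly one relevant source by that source.

    Returns (groups, remaining): groups maps a source to the patterns it alone can
    answer; remaining lists the patterns with several relevant sources.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    remaining: List[str] = []
    for pattern, relevant_sources in selection.items():
        if len(relevant_sources) == 1:
            groups[relevant_sources[0]].append(pattern)
        else:
            remaining.append(pattern)
    return dict(groups), remaining


# Hypothetical source selection result for a four-pattern query.
selection = {
    "TP1": ["DBpedia"],
    "TP2": ["DBpedia"],
    "TP3": ["DBpedia", "NYTimes"],
    "TP4": ["NYTimes"],
}
groups, remaining = form_exclusive_groups(selection)
```

Here TP1 and TP2 can be joined remotely at DBpedia in a single sub-query, while TP3 still has to be evaluated against both sources. A group with a single member, such as TP4, brings no join delegation on its own.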
  • 69. Exclusive Groups
      SELECT ?President ?Party ?TopicPage WHERE {
        ?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates .   # @ DBpedia (exclusive group)
        ?President dbpedia:party ?Party .                                # @ DBpedia (exclusive group)
        ?nytPresident owl:sameAs ?President .                            # @ DBpedia, NYTimes
        ?nytPresident nytimes:topicPage ?TopicPage .                     # @ NYTimes
      }
    Advantage: Delegate joins to the endpoint by forming exclusive groups (i.e., executing the respective patterns in a single sub-query). Source: http://www.slideshare.net/aschwarte/fedx-for-federated-query-processing-on-linked-data
  • 70. Exclusive Groups Join Order Optimization • 1 SPARQL Query: Compute Micronutrients using Drugbank and KEGG SELECT ?drug ?title WHERE { ?drug drugbank:drugCategory drugbank-cat:micronutrient . ?drug drugbank:casRegistryNumber ?id . ?keggDrug rdf:type kegg:Drug . ?keggDrug bio2rdf:xRef ?id . ?keggDrug dc:title ?title . } • 2 Unoptimized Internal Representation: 4x Local Join = 4x NLJ • 3 Optimized Internal Representation: Exclusive Group = Remote Join • Source: http://www.slideshare.net/aschwarte/fedx-for-federated-query-processing-on-linked-data
  • 71. Selectivity Based Join Order Optimization
      [] a sd:Service ;
        sd:endpointUrl <http://localhost:8890/sparql> ;
        sd:capability [
          sd:predicate diseasome:name ;
          sd:totalTriples 147 ;      # total number of triples with this predicate
          sd:avgSbjSel "0.0068" ;    # 1 / number of distinct subjects with this predicate
          sd:avgObjSel "0.0069" ;    # 1 / number of distinct objects with this predicate
        ] ;
        sd:capability [
          sd:predicate diseasome:chromosomalLocation ;
          sd:totalTriples 160 ;
          sd:avgSbjSel "0.0062" ;
          sd:avgObjSel "0.0072" ;
        ] ;
    Example: for the triples S1 P O1 . S1 P O2 . S2 P O1 . S3 P O2 . we get totalTriples = 4, avgSbjSel(P) = 1/3, avgObjSel(P) = 1/2
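The three statistics can be computed from the four example triples as in the following sketch. The function name `predicate_stats` is hypothetical; exact fractions are used so that the values 1/3 and 1/2 from the slide appear without rounding.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def predicate_stats(triples: List[Triple], predicate: str) -> Dict[str, object]:
    """Compute totalTriples, avgSbjSel and avgObjSel for one predicate."""
    matching = [t for t in triples if t[1] == predicate]
    distinct_subjects = {s for s, _, _ in matching}
    distinct_objects = {o for _, _, o in matching}
    return {
        "totalTriples": len(matching),
        "avgSbjSel": Fraction(1, len(distinct_subjects)),  # 1 / distinct subjects
        "avgObjSel": Fraction(1, len(distinct_objects)),   # 1 / distinct objects
    }


# The four example triples from the slide.
triples = [("S1", "P", "O1"), ("S1", "P", "O2"), ("S2", "P", "O1"), ("S3", "P", "O2")]
stats = predicate_stats(triples, "P")
```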
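  • 72. Selectivity Based Join Order Optimization
    • Triple pattern cardinality, where $p = \mathrm{pred}(tp)$ and $T$ is the total number of triples having predicate $p$:
    $$C(tp) = \begin{cases} T & \text{if neither subject nor object is bound} \\ T \times \mathit{avgSbjSel}(p) & \text{if the subject is bound} \\ T \times \mathit{avgObjSel}(p) & \text{if the object is bound} \end{cases}$$
    • Join cardinality:
    $$C(J(tp_1, tp_2)) = \begin{cases} C(tp_1) \times C(tp_2) \times \mathit{avgPredJoinSel}(tp_1) \times \mathit{avgPredJoinSel}(tp_2) & \text{if p-p join} \\ C(tp_1) \times C(tp_2) \times \mathit{avgSbjJoinSel}(tp_1) \times \mathit{avgSbjJoinSel}(tp_2) & \text{if s-s join} \\ C(tp_1) \times C(tp_2) \times \mathit{avgSbjJoinSel}(tp_1) \times \mathit{avgObjJoinSel}(tp_2) & \text{if s-o join} \end{cases}$$
    • How to calculate avgPredJoinSel, avgSbjJoinSel, and avgObjJoinSel? DARQ selected 0.5 as the avgJoinSel value for all joins.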
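The two estimates can be written as small Python functions. This is a sketch under stated assumptions: the case where both subject and object are bound is not covered on the slide, so the subject case is applied first here, and the constant 0.5 is substituted for each join selectivity factor, which is one possible reading of the DARQ remark. The input numbers come from the service description shown earlier.

```python
DEFAULT_JOIN_SEL = 0.5  # constant join selectivity attributed to DARQ on the slide

def triple_pattern_cardinality(total, avg_sbj_sel, avg_obj_sel,
                               subject_bound=False, object_bound=False):
    """Estimate C(tp) from the per-predicate index statistics."""
    if subject_bound:   # assumption: takes precedence if both are bound
        return total * avg_sbj_sel
    if object_bound:
        return total * avg_obj_sel
    return total

def join_cardinality(c1, c2, sel1=DEFAULT_JOIN_SEL, sel2=DEFAULT_JOIN_SEL):
    """Estimate C(J(tp1, tp2)) = C(tp1) * C(tp2) * sel(tp1) * sel(tp2)."""
    return c1 * c2 * sel1 * sel2

# diseasome:name and diseasome:chromosomalLocation, both with unbound
# subject and object
c_name = triple_pattern_cardinality(147, 0.0068, 0.0069)
c_loc = triple_pattern_cardinality(160, 0.0062, 0.0072)
estimate = join_cardinality(c_name, c_loc)
print(c_name, c_loc, estimate)
```

An optimizer would compute such estimates for every candidate pair and join the pair with the smallest estimated cardinality first.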
  • 74. Join Implementations • Bound Joins – Start with a single triple pattern (the one with the lowest cardinality) – Substitute the mappings from the previous triple pattern into the subsequent evaluation – Bound joins in NLJ fashion • Execute bound joins in nested loop join fashion • Too many remote requests – Bound joins in block NLJ fashion • Execute bound joins in block nested loop join fashion • Make use of the SPARQL UNION construct • The number of remote requests is reduced by a factor of the block size • Other join techniques – e.g., hash joins
  • 75. Bound Joins in Block NLJ SELECT ?President ?Party ?TopicPage WHERE { ?President rdf:type dbpedia:PresidentsOfTheUnitedStates . ?President dbpedia:party ?Party . ?nytPresident owl:sameAs ?President . ?nytPresident nytimes:topicPage ?TopicPage . } Assume that the following intermediate results have been computed as input for the last triple pattern. Block input: “Barack Obama”, “George W. Bush”, … Before (NLJ): SELECT ?TopicPage WHERE { “Barack Obama” nytimes:topicPage ?TopicPage } SELECT ?TopicPage WHERE { “George W. Bush” nytimes:topicPage ?TopicPage } … Now: evaluation in a single remote request using a SPARQL UNION construct, plus local post-processing (SPARQL 1.0). Source: http://www.slideshare.net/aschwarte/fedx-for-federated-query-processing-on-linked-data
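The block nested loop rewriting can be sketched as a query builder that packs one block of input bindings into a single UNION query. This is an illustration only: the function names, the block size, and the `<http://example.org/nyt/...>` IRIs are hypothetical stand-ins for the bound ?nytPresident values. Suffixing the output variable per branch is one way to tell, during local post-processing, which input binding produced a given result.

```python
def chunked(items, size):
    """Yield consecutive blocks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def build_block_query(subjects, predicate, var="TopicPage"):
    """Build one SELECT query whose UNION branches each bind one subject.

    Branch i uses the variable ?<var>_<i>, so each result row can be
    mapped back to the i-th input binding of the block.
    """
    branches = [
        f"{{ {subject} {predicate} ?{var}_{i} }}"
        for i, subject in enumerate(subjects)
    ]
    projection = " ".join(f"?{var}_{i}" for i in range(len(subjects)))
    return f"SELECT {projection} WHERE {{ " + " UNION ".join(branches) + " }"

def block_queries(subjects, predicate, block_size):
    """Return one UNION query per block of input bindings."""
    return [build_block_query(block, predicate)
            for block in chunked(subjects, block_size)]

# Hypothetical bindings for ?nytPresident from the previous join step
bindings = [f"<http://example.org/nyt/president{i}>" for i in range(5)]
queries = block_queries(bindings, "nytimes:topicPage", block_size=2)
for q in queries:
    print(q)
```

With five bindings and a block size of two, three remote requests are sent instead of five.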
  • 76. Parallelization and Pipelining • Execute sub-queries concurrently on different data sources • Multithreaded worker pool to execute the joins and UNION operators in parallel • Pipelining approach for intermediate results • See FedX and LHD implementations
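Concurrent sub-query execution can be sketched with a thread pool from the Python standard library. The sketch makes no claim about how FedX or LHD implement it: `run_subquery` is a hypothetical placeholder that returns a dummy row instead of contacting an endpoint, and the endpoint URLs are made up. Pipelining of intermediate results is not shown.

```python
from concurrent.futures import ThreadPoolExecutor

def run_subquery(endpoint, query):
    """Placeholder for a remote call; a real engine would send `query`
    to `endpoint` over HTTP and parse the returned result set."""
    return [{"endpoint": endpoint, "query": query}]

def execute_parallel(subqueries, max_workers=4):
    """Run (endpoint, query) pairs concurrently and merge their results.

    Results are collected in submission order, so the output is
    deterministic even though the requests overlap in time.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_subquery, ep, q) for ep, q in subqueries]
        results = []
        for future in futures:
            results.extend(future.result())
    return results

subqueries = [
    ("http://example.org/dbpedia/sparql", "SELECT ?President WHERE { ... }"),
    ("http://example.org/nytimes/sparql", "SELECT ?TopicPage WHERE { ... }"),
]
rows = execute_parallel(subqueries)
print(rows)
```

Threads suit this workload because the workers spend most of their time waiting on network I/O rather than computing.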
  • 78. Performance Metrics • Efficient source selection in terms of – Total number of triple pattern-wise sources selected – Total number of SPARQL ASK requests used during source selection – Source selection time • Query execution time • Result completeness and correctness • Number of remote requests during query execution • Index compression ratio (1 − index size / data dump size) • See https://code.google.com/p/bigrdfbench/
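The index compression ratio is a one-line computation. The sizes used below are made-up numbers chosen only to show the formula, not measurements from any engine.

```python
def index_compression_ratio(index_size, dump_size):
    """Return 1 - index_size / dump_size (both in the same unit)."""
    if dump_size <= 0:
        raise ValueError("dump_size must be positive")
    return 1 - index_size / dump_size

# Hypothetical sizes in megabytes: a 10 MB index over a 1000 MB data dump
ratio = index_compression_ratio(10, 1000)
print(f"{ratio:.2%}")
```

A ratio close to 1 means the index is small relative to the data it summarizes.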
  • 79. Evaluation Setup • Local dedicated network • Local SPARQL endpoints (one per machine) • Run each query 10 times and report the average results • Analyze the results statistically, e.g., with the Wilcoxon signed-rank test or Student's t-test
  • 80. SPARQL Query Federation Engines • FedX • SPLENDID • HiBISCuS+FedX • HiBISCuS+SPLENDID • ANAPSID • LHD • DARQ • QUETSAL • TopFed • DAW