Assessing	the	performance	of	RDF	Engines:		
Discussing	RDF	Benchmarks	
	
Irini	Fundulaki	
Institute	of	Computer	Science	–	FORTH,	Greece	
Anastasios	Kementsietsidis	
Google	Research,	USA	
6/15/16	 ESWC	2016:	Assessing	the	performance	of	RDF	Engines	-	Discussing	RDF	Benchmarks	 1
Traditional	Web:	Web	of	Documents	
•  single	information	space:	global	filesystem	
•  designed	for	human	consumption	
•  documents	are	the	primary	objects	with	a	loose	structure	
•  URLs	are	the	globally	unique	IDs	and	part	of	the	retrieval	
mechanism	
•  cannot	ask	expressive	queries	
[Figure: HTML documents, viewed in Web browsers and connected by hyperlinks. © Hartig, Cyganiak, Bizer, Hausenblas, Heath, How to Publish Linked Data on the Web]
Going	from	the	Web	of	Documents	to	the		
Web	of	Data	
•  A	global	database	
•  Designed	for	machines	first,	humans	later	
•  Things are primary objects with a well-defined structure
•  Typed	links	between	things	
•  Ability	to	express	structured	queries	
[Figure: things connected by typed links. "Don't link the documents, link the things." © The Web of Linked Data: Tom Heath, An Introduction to Linked Data]
Linking	Open	Datasets	(LOD)	
•  Publish	open	data	as	Linked	Data	on	the	Web	
•  Interlink	entities	between	heterogeneous	data	sources	
Status	of	the	Linked	Open	Data	Cloud,	2007	
Status	of	the	Linked	Open	Data	Cloud,	2011	
Status	of	the	Linked	Open	Data	Cloud,	2014	
[Figure: LOD cloud diagram grouped into the domains Media, Government, Geographic, Publications, User-generated, Life sciences and Cross-domain. RDF as a common data model; more than 31B triples in LOD; 500M external links.]
Linked Data in numbers (2014)
•  State of the LOD Cloud 2014, University of Mannheim
Domain	Datasets	% of all datasets	Access: any method	Access: SPARQL	Access: dump
Government	183	18.05	61 (32.80%)	30.11%	30.65%
Publications	96	9.47	10 (10.58%)	9.62%	3.85%
Life Sciences	83	8.19	19 (21.35%)	20.22%	16.85%
User-generated content	48	4.73	3 (5.45%)	5.45%	1.82%
Cross-domain	41	4.04	4 (9.09%)	4.55%	6.82%
Media	22	2.17	1 (2.70%)	0.00%	2.70%
Geographic	21	2.07	8 (19.51%)	12.20%	12.20%
Social Web	520	51.28	6 (1.16%)	1.16%	0.39%
Total	1014	-	48 (5.89%)	4.54%	3.80%
Proliferation	of	Big	Data	Stores	
Many	(not	a	lot)	RDF	Stores	
The	Question(s)	
•  What problems do I wish to solve?
•  What are the relevant key performance indicators?
•  How do the existing engines behave with respect to these key performance indicators?
Which tool(s) should I use for my data and for my use case?
The	Answer:	Benchmark	your	engines!	
•  A querying benchmark comprises
–  datasets	(synthetic	or	real)	
–  set	of	software	tools	
•  synthetic	data	generators	
•  query	generators		
–  performance	metrics,		and		
–  set	of	clear	execution	rules	
•  Standardized	application	scenario(s)	that	serve	as	a	basis	for	
testing	systems	
•  Must	include	a	clear	set	of	factors	to	be	measured	and	the	
conditions	under	which	the	systems	should	be	measured
•  Benchmarks	exist		
–  To	allow	adequate	measurements	of	systems	
–  To	provide	evaluation	of	engines	for	real	(or	close	to	real)	use	
cases	
•  Provide	help	
–  Designers	and	Developers	to	assess	the	performance	of	their	
tools	
–  Users	to	compare	the	different	available	tools	and	evaluate	
suitability	for	their	needs	
–  Researchers	to	compare	their	work	to	others	
•  Leads	to	improvements:		
–  Vendors	can	improve	their	technology	
–  Researchers	can	address	new	challenges	
–  Current	benchmark	design	can	be	improved	to	cover	new	
necessities	and	application	domains	
Importance	of	Benchmarking
Tutorial	Objective	&	Benefits	
•  Objectives:		
–  Discuss	a	set	of	principles	and	best	practices	for	benchmark	
development	
–  Present	an	overview	of	the	current	work	on	benchmarks	for	
RDF	query	engines	
–  Focus	on	identifying	research	challenges	&	unexplored	
research	directions	
•  Benefits	for	the	audience	
–  Academic:	Obtain	a	solid	background,	discover	new	research	
directions	
–  Practitioner: find out which benchmarks are available, along with their advantages and limitations
Purpose	of	the	Tutorial	
•  Stimulate	discussions	on	the	following	topics:		
1.  How	can	one	come	up	with	the	right	benchmark	that	
accurately	captures	use	cases	of	interest?		
2.  How can a benchmark capture the fact that RDF data originate from a multitude of formats?
! Structured:	relational	and/or	XML	data	to	RDF	
! Unstructured	
3.  How	can	a	benchmark	capture	the	different	data	and	query	
patterns	and	provide	a	consistent	picture	for	system	behavior	
across	different	application	settings?		
4.  How	can	one	select	the	right	benchmark	for	her	system,	data	
and	workload?	
Overview	
•  Introducing	Benchmarks		
•  A	short	discussion	about	Linked	Data		
–  Resource	Description	Framework	(Data	Model)	
–  SPARQL	(Query	Language)	
•  Benchmarking	Principles	&	Choke	Points	
•  Benchmarks	
–  Synthetic	
–  Real	
–  Benchmark	Generators	
•  Sum	up:	what	did	we	learn	today?	
A	short	discussion	about	Linked	Data		
-	Resource	Description	Framework	(Data	Model)	
-	SPARQL	(Query	Language)	
Resource	Description	Framework	(RDF)	
•  W3C	standard	to	represent	Web	data	and	metadata	
•  generic and simple graph-based model
•  information	from	heterogeneous	sources	merges	naturally:	
–  resources	with	the	same	URI	denote	the	same	non-information	
resource	(leading	to	the	Linked	Data	Cloud)	
•  structure	is	added	using	schema	languages	and	is	
represented	as	RDF	triples	
•  Web	browsers	use	URIs	to	retrieve	information	
	
Resource	Description	Framework	(RDF)	
•  An	RDF	triple	is	of	the	form	(s,	p,	o)	where	
–  s	is	the	subject:	the	URI	identifying	the	described	resource	
–  o	is	the	object:	can	either	be	a	simple	literal	value	or	the	URI	of	
another	resource	
–  p	is	the	predicate:	the	URI	indicating	the	relation	between	
subject	and	object	
•  An	RDF	graph	is	a	set	of	triples	
–  Can	be	viewed	as	a	node	and	edge-labeled	directed	graph	
–  It	is	published	in	different	formats	
•  RDF/XML, Turtle, N3, N-Triples, …
(dbpedia:Good_Day_Sunshine,	dbpedia-owl:artist,		dbpedia:The_Beatles)	
Close	to	how	people	see	the	world	(as	a	graph)!
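To make the (s, p, o) model concrete, here is a minimal Python sketch that is not tied to any RDF library. A graph is a plain set of string triples, and the node- and edge-labeled directed-graph view is derived from that set as an adjacency map. The ex: triple and the `adjacency` helper are hypothetical additions for illustration only.

```python
from collections import defaultdict

# An RDF triple is (subject, predicate, object). An RDF graph is a *set* of
# triples, so asserting the same triple twice does not change the graph.
Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("dbpedia:Good_Day_Sunshine", "dbpedia-owl:artist", "dbpedia:The_Beatles"),
    # Hypothetical extra triple, added only to give the graph a second edge.
    ("dbpedia:The_Beatles", "ex:formedIn", "ex:Liverpool"),
}

def adjacency(g: set[Triple]) -> dict[str, list[tuple[str, str]]]:
    """Directed-graph view: subject node -> [(edge label, object node)]."""
    edges: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for s, p, o in g:
        edges[s].append((p, o))
    return dict(edges)
```

Because the graph is a set, merging data from several sources reduces to set union. Triples that share a URI then connect automatically.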
Adding	Semantics	to	RDF	
•  RDF	is	a	generic,	abstract	data	model	for	describing	resources	
in	the	form	of	triples	
•  RDF	does	not	provide	ways	of	defining	classes,	properties,	
constraints		
•  W3C	Standard	Schema	Languages	
–  RDF	Vocabulary	Description	Language	(RDF	Schema	-	
RDFS)	to	define	schema	vocabularies	
–  Web Ontology Language (OWL) to define ontologies
Adding	Semantics	to	RDF	
•  RDF	Vocabularies	are	sets	of	terms	used	to	describe	notions	
in	a	domain	of	interest	
•  A vocabulary term is either a Class or a Property
–  Object properties denote relationships between objects
–  Datatype properties denote attributes of resources
•  RDFS is designed to introduce useful semantics to RDF triples
•  RDFS	Schemas	are	represented	as	RDF	triples	
	
"An RDF Vocabulary is a schema comprising classes, properties and relationships which can be used for describing data and metadata"
RDF	Vocabulary	Description	Language	(RDFS)	
•  Typing:	defining	classes,	properties,	instances	
•  Relationships	between	classes	and	properties:	subsumption	
•  Constraints:	domain	and	range	of	properties	
•  Inference	rules	to	entail	new,	inferred	knowledge	
ID	Subject	Predicate	Object
t1	dbo:Album	rdfs:subClassOf	dbo:MusicalWork
t2	dbo:artist	rdfs:domain	dbo:MusicalWork
t3	dbo:march	rdfs:range	dbo:MusicalWork
t4	dbr:Seven_Seas_Of_Rye	rdf:type	dbo:MusicalWork
t5	dbo:Album	rdf:type	rdfs:Class
RDFS		Inference	
•  Used to entail new information from what is explicitly stated in the dataset
–  Transitive	closure	across	class	and	property	hierarchies	
	
–  Transitive	closure	along	the	type	and	class/property	relations	
•  Two	ways	to	implement	it:	Forward	&	Backward	Reasoning	
–  Forward	Reasoning:	closure	is	computed	at	loading	time	
–  Backward	Reasoning:	closure	is	computed	on	the	fly	when	needed	
R1 (hierarchy transitivity): (P1, rdfs:subPropertyOf, P2), (P2, rdfs:subPropertyOf, P3) ⇒ (P1, rdfs:subPropertyOf, P3)
R1 (hierarchy transitivity): (C1, rdfs:subClassOf, C2), (C2, rdfs:subClassOf, C3) ⇒ (C1, rdfs:subClassOf, C3)
R2: (C1, rdfs:subClassOf, C2), (r1, rdf:type, C1) ⇒ (r1, rdf:type, C2)
R3: (P1, rdfs:subPropertyOf, P2), (r1, P1, r2) ⇒ (r1, P2, r2)
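Forward reasoning, as described above, can be sketched as a naive fixpoint loop in Python. The loop applies the hierarchy-transitivity, type-propagation and property-propagation rules until no new triple appears, so later queries run over the materialized closure. This is an illustrative sketch only: the function name and the ex: data are hypothetical. The nested loop is quadratic per round, whereas real engines use indexed, semi-naive evaluation.

```python
SUB_CLASS = "rdfs:subClassOf"
SUB_PROPERTY = "rdfs:subPropertyOf"
TYPE = "rdf:type"

def forward_closure(triples):
    """Forward reasoning: derive triples until a fixpoint is reached."""
    closure = set(triples)
    while True:
        derived = set()
        for s1, p1, o1 in closure:
            for s2, p2, o2 in closure:
                # Transitivity of the class and property hierarchies.
                if p1 == p2 and p1 in (SUB_CLASS, SUB_PROPERTY) and o1 == s2:
                    derived.add((s1, p1, o2))
                # Type propagation: (C1 subClassOf C2), (r1 type C1) => (r1 type C2).
                if p1 == SUB_CLASS and p2 == TYPE and o2 == s1:
                    derived.add((s2, TYPE, o1))
                # Property propagation: (P1 subPropertyOf P2), (r1 P1 r2) => (r1 P2 r2).
                if p1 == SUB_PROPERTY and p2 == s1:
                    derived.add((s2, o1, o2))
        if derived <= closure:  # nothing new: the closure is complete
            return closure
        closure |= derived
```

The load-time cost buys simple, fast queries. The price is extra storage and re-materialization whenever the schema changes.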
RDFS	Inference	
•  Transitive	closure	along	the	type	and	class/property	relations	
R2: (C1, rdfs:subClassOf, C2), (r1, rdf:type, C1) ⇒ (r1, rdf:type, C2)
ID	Subject	Predicate	Object
t1	dbo:Album	rdfs:subClassOf	dbo:MusicalWork
t2	dbo:artist	rdfs:domain	dbo:MusicalWork
t3	dbo:march	rdfs:range	dbo:MusicalWork
t4	dbr:Seven_Seas_Of_Rye	rdf:type	dbo:MusicalWork
t5	dbo:Album	rdf:type	rdfs:Class
t6	dbo:MusicalWork	rdf:type	rdfs:Class
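For contrast with materialization, backward reasoning can be sketched in Python as answering a single type question at query time. The sketch walks rdfs:subClassOf edges upward from the resource's asserted types. The helper name and the ex: data are hypothetical, and only type propagation along the class hierarchy is covered.

```python
from collections import defaultdict

SUB_CLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def has_type(triples, resource, cls):
    """Backward reasoning: decide (resource rdf:type cls) on the fly,
    without materializing the closure at load time."""
    parents = defaultdict(set)
    frontier = []
    for s, p, o in triples:
        if p == SUB_CLASS:
            parents[s].add(o)
        elif p == TYPE and s == resource:
            frontier.append(o)  # explicitly asserted types of the resource
    visited = set()
    while frontier:
        c = frontier.pop()
        if c == cls:
            return True
        if c not in visited:
            visited.add(c)  # guards against cycles in the class hierarchy
            frontier.extend(parents[c])
    return False
```

Nothing extra is stored, but every query that needs inferred types pays for the traversal.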
SPARQL:	Querying	RDF	Data	
•  SPARQL:	W3C	Standard	Language	for	Querying	Linked	
Data	
•  SPARQL	1.0	(2008)	only	allows	accessing	the	data	(query)	
•  SPARQL	1.1	(2013)	introduces:	
–  Query	Extensions:	aggregates,	sub-queries,	negation,	
expressions	in	the	SELECT	clause,	property	paths,	assignment,	
short	form	for	CONSTRUCT,	expanded	set	of	functions	and	
operators	
–  Updates:		
•  Data	management:		Insert,	Delete,		Delete/Insert	
•  Graph	management:		Create,	Load,	Clear,	Drop,	Copy,	
Move,	Add	
–  Federation extension: Service, values, service variables (informative)
SPARQL	Queries	(1)	
•  The building block is the triple pattern
–  RDF	triple	with	variables	
•  Group	Graph	Patterns		
–  Built	through	inductive	construction	combining	smaller	
patterns	into	more	complex	ones	using	SPARQL	operators	
•  Join	-	similar	to	relational	join	
•  Union	(UNION)	–	similar	to	relational	union	
•  Optional	(OPTIONAL)	operators	on	triple	patterns	–	similar	
to	relational	left	outer	join	(introduces	negation	in	the	
language)	
•  Filtering	conditions	(FILTER)	
•  Patterns	on	Named	Graphs	
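The operators above combine sets of solution mappings (variable → RDF term). A simplified Python sketch of join, union and the left outer join behind OPTIONAL might look like the following. The function names are hypothetical, and FILTER expressions inside OPTIONAL are deliberately left out. Lists stand in for multisets, so duplicates are kept.

```python
Solution = dict[str, str]

def compatible(m1: Solution, m2: Solution) -> bool:
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[v] == value for v, value in m1.items() if v in m2)

def join(left: list[Solution], right: list[Solution]) -> list[Solution]:
    """Join: merge every compatible pair (like a relational natural join)."""
    return [{**m1, **m2} for m1 in left for m2 in right if compatible(m1, m2)]

def union(left: list[Solution], right: list[Solution]) -> list[Solution]:
    """UNION: multiset union, duplicates are kept."""
    return left + right

def left_join(left: list[Solution], right: list[Solution]) -> list[Solution]:
    """OPTIONAL: keep every left mapping, extended where a compatible right one exists."""
    result = []
    for m1 in left:
        extended = [{**m1, **m2} for m2 in right if compatible(m1, m2)]
        result.extend(extended or [m1])
    return result
```

With the foaf:name/foaf:mbox example, `left_join` keeps a person who has no mailbox, whereas `join` drops that person.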
SPARQL	Queries	(2)	
•  Aggregates		
–  specify	expressions	over	groups	of	solutions	
–  As in relational settings, used when the result is computed over a group of solutions rather than a single solution
•  Example:	average	value	of	a	set	of	values,	sum	of	a	set		
–  Aggregates	defined	in	SPARQL	1.1	are	COUNT,	SUM,	MIN,	
MAX,	AVG,	GROUP_CONCAT,	and	SAMPLE.	
–  Solutions	are	grouped	using	the	GROUP	BY	clause	
–  Pruning	at	group	level	is	performed	with	the	HAVING	clause	
•  Additional	Features	
–  duplicate	elimination	(DISTINCT)	
–  ordering	results	(ORDER	BY)	with	an	optional	LIMIT	clause	
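A rough Python equivalent of GROUP BY, COUNT, HAVING, ORDER BY and LIMIT over a list of solutions is sketched below. The function name is hypothetical. Breaking count ties by group key is an added assumption for determinism, since SPARQL leaves the order of ties unspecified.

```python
from collections import defaultdict

def count_per_group(solutions, group_var, min_count=1, limit=None):
    """Approximates:
         SELECT ?g (COUNT(*) AS ?n) WHERE { ... }
         GROUP BY ?g HAVING (COUNT(*) >= min_count)
         ORDER BY DESC(?n) LIMIT limit
    Every solution is assumed to bind group_var."""
    counts = defaultdict(int)
    for solution in solutions:  # GROUP BY: one bucket per value of ?g
        counts[solution[group_var]] += 1
    kept = [(g, n) for g, n in counts.items() if n >= min_count]  # HAVING
    kept.sort(key=lambda gn: (-gn[1], gn[0]))  # ORDER BY DESC(?n), ties by key
    return kept if limit is None else kept[:limit]
```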
SPARQL	Semantics	
•  SPARQL	semantics	based	on	Pattern	Matching	
–  Queries	describe	subgraphs	of	the	queried	graph	
–  SPARQL	graph	patterns		describe	the	subgraphs	to	match	
	
Intuitively	a	triple	pattern	denotes	the	triples	in	an	RDF	
graph	that	are	of	a	specific	form	
TP1 = (?album, dbpedia-owl:artist, dbpedia:The_Beatles) matches all albums of The Beatles
TP2 = (dbpedia:The_Beatles, ?property, ?object) matches all information about The Beatles
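Triple-pattern matching itself can be sketched in a few lines of Python. Each pattern position is either a constant that must be equal or a variable that gets bound, and a repeated variable must bind to the same term everywhere. The helper names and the small sample graph are hypothetical.

```python
def is_variable(term: str) -> bool:
    return term.startswith("?")

def match(pattern, graph):
    """Return one binding (variable -> RDF term) per triple matching the pattern."""
    solutions = []
    for triple in graph:
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if is_variable(p_term):
                # A repeated variable must bind to the same term in every position.
                if binding.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:  # no break: every position matched
            solutions.append(binding)
    return solutions
```

Against a sample graph with two Beatles albums, TP1 yields one ?album binding per album. TP2 yields one (?property, ?object) binding per triple about The Beatles.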
SPARQL	Types	of	Queries	
•  SELECT	returns	ordered	multi-set	of	variable	bindings		
–  Bindings:	mappings	of	variables	to	RDF	terms	in	the	dataset	
–  SQL-Like	Syntax	
•  ASK	checks	whether	a	graph	pattern	has	at	least	one	
solution	-		returns	a	Boolean	value	(true/false)	
•  CONSTRUCT	returns	a	new	RDF	graph		as	specified	by	
the	graph	template	of	the	CONSTRUCT	clause	using	the	
computed	bindings	from	the	query’s	WHERE	clause	
•  DESCRIBE	returns	the	RDF	graph	containing	the	RDF	
data	about	the	requested	resource	
SELECT ?v1 ?v2 … WHERE { GraphPattern }
Querying	RDF	Data	with	SPARQL	(1)	
PREFIX		dc:	<http://purl.org/dc/elements/1.1/>	
SELECT		?title	
WHERE			{	<http://example.org/book/book1>	dc:title	?title	}		
	Simple	SELECT	query	
PREFIX	foaf:				<http://xmlns.com/foaf/0.1/>	
SELECT	?name	?mbox	
WHERE		{		?x	foaf:name	?name	.	?x	foaf:mbox	?mbox	.		}	
JOIN	Query	
PREFIX	foaf:	<http://xmlns.com/foaf/0.1/>	
SELECT	?name	?mbox	
WHERE		{	?x	foaf:name		?name	.	OPTIONAL	{	?x		foaf:mbox		?mbox	}	}	
OPTIONAL	Operator	
PREFIX		dc:		<http://purl.org/dc/elements/1.1/>	
SELECT		?title	
WHERE		{	?x	dc:title	?title	.	FILTER	regex(?title,	"^SPARQL")	}	
REGEX	in	FILTER
Querying	RDF	Data	with	SPARQL	(2)	
PREFIX	foaf:			<http://xmlns.com/foaf/0.1/>	
PREFIX	org:				<http://example.com/ns#>	
CONSTRUCT	{	?x	foaf:name	?name	}	
WHERE		{	?x	org:employeeName	?name	}	
PREFIX	foaf:	<http://xmlns.com/foaf/0.1/>	
ASK					{	?x	foaf:name		"Alice"	}	
"Find the people who live in 'Palo Alto' and have founded or are board members of companies in the software industry. For each such company, find the products that were developed by it, its revenue, and optionally its number of employees."

PREFIX : <http://example.org/>   # hypothetical namespace for the example predicates
SELECT * WHERE
{ ?x :home "Palo Alto" .
{ ?x :founder ?y } UNION { ?x :member ?y }
{
?y :industry "Software" .
?z :developer ?y .
?y :revenue ?n .
OPTIONAL { ?y :employees ?m }
}
}
SPARQL	1.1:	
SPARQL	plus	Aggregates,	Sub-
queries,	Property	paths,	
Negation	and	more!
Storing	and	Querying	RDF	data	
•  Schema	agnostic	
–  triples	are	stored	in	a	large	triple	table	where	the	attributes	are	
(subject,	predicate	and	object)	-	“Monolithic”	triple-stores	
–  But	it	can	get	a	bit	more	efficient	
Triple table:
ID	Subject	Predicate	Object
t1	dbr:Seven_Seas_Of_Rye	rdf:type	dbo:MusicalWork
t2	dbr:Starman_(song)	rdf:type	dbo:MusicalWork
t3	dbr:Seven_Seas_Of_Rye	dbo:artist	dbo:Queen
Dictionary:
id	URI/Literal
1	dbr:Seven_Seas_Of_Rye
2	dbr:Starman_(song)
3	dbo:MusicalWork
4	dbo:Queen
5	dbo:artist
6	rdf:type
Encoded triple table:
Subject	Predicate	Object
1	6	3
2	6	3
1	5	4
RDF-3X	maintains	6	indexes,	namely,	SPO,	SOP,	OSP,	OPS,	PSO,	
POS.	To	avoid	storage	overhead,	indexes	are	compressed!	[NW09]
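The dictionary encoding and the six permutation indexes can be sketched in Python as follows. This is an illustration of the idea, not RDF-3X's implementation, which keeps compressed clustered B+-trees; here every index is just a sorted list of id-triples. Ids are assigned in first-seen order, so they may differ from the ids on the slide. All function names are hypothetical.

```python
from bisect import bisect_left
from itertools import permutations

def encode(triples):
    """Dictionary-encode terms: each distinct URI/literal gets a small integer id."""
    dictionary = {}
    encoded = [tuple(dictionary.setdefault(term, len(dictionary) + 1) for term in t)
               for t in triples]
    return dictionary, encoded

POSITION = {"s": 0, "p": 1, "o": 2}
ORDERS = ["".join(order) for order in permutations("spo")]  # spo, sop, pso, pos, osp, ops

def build_indexes(encoded):
    """One sorted list of id-triples per permutation of (s, p, o)."""
    return {order: sorted(tuple(t[POSITION[c]] for c in order) for t in encoded)
            for order in ORDERS}

def scan(index, prefix):
    """Range scan: all entries whose leading ids equal `prefix`."""
    i = bisect_left(index, prefix)
    hits = []
    while i < len(index) and index[i][:len(prefix)] == prefix:
        hits.append(index[i])
        i += 1
    return hits
```

With every permutation available, any triple pattern with bound positions becomes a prefix range scan on one index. For example, (?s, rdf:type, dbo:MusicalWork) is answered from the POS index.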
Storing	and	Querying	RDF	data	
•  schema	aware:		
–  one	table	is	created	per	property	with	subject	and	object	attributes	(Property	
Tables	[Wilkinson06])	
Subject	 Predicate	 Object	
ID1	 type	 BookType	
ID1	 title	 “XYZ”	
ID1	 author	 “Fox,	Joe”	
ID1	 copyright	 “2001”	
ID2	 type	 CDType	
ID2	 title	 “ABC”	
ID2	 artist	 “Orr,	Tim”	
ID2	 copyright	 “1985”	
ID2	 language	 “French”	
ID3	 type	 BookType	
ID3	 title	 “MNO”	
ID3	 language	 “English”	
ID4	 type	 DVDType	
ID4	 title	 “DEF”	
ID5	 type	 CDType	
ID5	 title	 “GHI”	
ID5	 copyright	 “1995”	
ID6	 type	 BookType	
ID6	 copyright	 “2004”	
Clustered Property Table:
Subject	Type	Title	copyright
ID1	BookType	“XYZ”	“2001”
ID2	CDType	“ABC”	“1985”
ID3	BookType	“MNO”	NULL
ID4	DVDType	“DEF”	NULL
ID5	CDType	“GHI”	“1995”
ID6	BookType	NULL	“2004”
Left-over triples:
Subject	Predicate	Object
ID1	author	“Fox, Joe”
ID2	artist	“Orr, Tim”
ID2	language	“French”
ID3	language	“English”
Property-class Tables:
BookType:
Subject	Title	Author	copyright
ID1	“XYZ”	“Fox, Joe”	“2001”
ID3	“MNO”	NULL	NULL
ID6	NULL	NULL	“2004”
CDType:
Subject	Title	artist	copyright
ID2	“ABC”	“Orr, Tim”	“1985”
ID5	“GHI”	NULL	“1995”
Left-over triples:
Subject	Predicate	Object
ID2	language	“French”
ID3	language	“English”
ID4	type	DVDType
ID4	title	“DEF”
Multi-Value Property Table:
Subject	Object
…	…
…	…
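Building property-class tables can be sketched in Python as a pivot of the triple table. Each chosen class gets one row per subject, with None standing in for NULL, and every triple not covered by a class table goes to a left-over triple table. Several things here are assumptions for illustration: the choice of columns per class, the hypothetical function name, and the premise that each subject has one type and single-valued properties.

```python
SAMPLE = [
    ("ID1", "type", "BookType"), ("ID1", "title", "XYZ"), ("ID1", "author", "Fox, Joe"),
    ("ID1", "copyright", "2001"), ("ID2", "type", "CDType"), ("ID2", "title", "ABC"),
    ("ID2", "artist", "Orr, Tim"), ("ID2", "copyright", "1985"), ("ID2", "language", "French"),
    ("ID3", "type", "BookType"), ("ID3", "title", "MNO"), ("ID3", "language", "English"),
    ("ID4", "type", "DVDType"), ("ID4", "title", "DEF"),
    ("ID5", "type", "CDType"), ("ID5", "title", "GHI"), ("ID5", "copyright", "1995"),
    ("ID6", "type", "BookType"), ("ID6", "copyright", "2004"),
]
# Hypothetical design choice: which properties become columns of which class table.
COLUMNS = {"BookType": ["title", "author", "copyright"],
           "CDType": ["title", "artist", "copyright"]}

def property_class_tables(triples, columns, type_pred="type"):
    """Pivot triples into one table per class plus a left-over triple table."""
    class_of = {s: o for s, p, o in triples if p == type_pred}
    tables = {cls: {} for cls in columns}
    leftover = []
    for s, p, o in triples:
        cls = class_of.get(s)
        if cls in columns and (p == type_pred or p in columns[cls]):
            row = tables[cls].setdefault(s, dict.fromkeys(columns[cls]))  # None = NULL
            if p != type_pred:
                row[p] = o  # assumes single-valued properties; a repeat would overwrite
        else:
            leftover.append((s, p, o))
    return tables, leftover
```

Run on the sample data, this reproduces the BookType/CDType tables and the left-over triples shown above. It also makes the trade-off visible: a query over language, or over DVDs, has to fall back to the left-over table.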
Storing	and	Querying	RDF	data	
•  Vertically	partitioned	RDF	[AMM+07]	
Subject	 Predicate	 Object	
ID1	 type	 BookType	
ID1	 title	 “XYZ”	
ID1	 author	 “Fox,	Joe”	
ID1	 copyright	 “2001”	
ID2	 type	 CDType	
ID2	 title	 “ABC”	
ID2	 artist	 “Orr,	Tim”	
ID2	 copyright	 “1985”	
ID2	 language	 “French”	
ID3	 type	 BookType	
ID3	 title	 “MNO”	
ID3	 language	 “English”	
ID4	 type	 DVDType	
ID4	 title	 “DEF”	
ID5	 type	 CDType	
ID5	 title	 “GHI”	
ID5	 copyright	 “1995”	
ID6	 type	 BookType	
ID6	 copyright	 “2004”	
type:
Subject	Object
ID1	BookType
ID2	CDType
ID3	BookType
ID4	DVDType
ID5	CDType
ID6	BookType
title:
Subject	Object
ID1	“XYZ”
ID2	“ABC”
ID3	“MNO”
ID4	“DEF”
ID5	“GHI”
copyright:
Subject	Object
ID1	“2001”
ID2	“1985”
ID5	“1995”
ID6	“2004”
artist:
Subject	Object
ID2	“Orr, Tim”
author:
Subject	Object
ID1	“Fox, Joe”
language:
Subject	Object
ID2	“French”
ID3	“English”
To get the most out of this particular decomposition, a column-oriented DBMS is recommended.
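Vertical partitioning can be sketched in Python as one subject-sorted, two-column table per predicate. A query touching two predicates then becomes a subject-subject merge join, which is the access pattern column stores handle well. This is a sketch of the idea with hypothetical function names, not the storage engine used in [AMM+07].

```python
from collections import defaultdict

def vertical_partition(triples):
    """One two-column (subject, object) table per predicate, sorted by subject."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return {p: sorted(rows) for p, rows in tables.items()}

def merge_join(left, right):
    """Subject-subject merge join of two subject-sorted tables -> (s, left_o, right_o)."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Collect the run of rows with this subject on each side (multi-valued properties).
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == ls:
                j_end += 1
            out.extend((ls, lo, ro) for _, lo in left[i:i_end] for _, ro in right[j:j_end])
            i, j = i_end, j_end
    return out
```

Joining the title and copyright tables from the example returns only the subjects that have both properties (ID1, ID2, ID5), and no NULLs are ever stored.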
Comparison	of	Storage	Techniques	[BDK+13]		
[Figure: a sample graph stored in three layouts]
Sample graph: Larry Page born “1973”; Larry Page founder Google; Google HQ “MTV”; Google employees 50,000; Google industry Internet, Software, Hardware
•  Triple store: a single (subject, predicate, object) table
–  Columns are overloaded
–  Schema does not change on updates: a new fact such as (Google, developer, Android) is just one more row
•  Type-oriented store: one table per type, e.g., person(born, founder) and company(HQ, employees); the multi-valued industry facts stay in a (subject, predicate, object) table
–  Static mix of overloaded and normal columns
–  Schema might change on updates (the figure shows a new company/released table with rows Google Android and Apple iPhone)
•  Predicate-oriented store: one (subject, object) table per predicate: born, founder, HQ, employees, industry
–  Traditional relational column treatment
–  Schema might change on updates (e.g., a new developer table holding Google Android)
Storing	Linked	Data:	Query	Processing	
•  Schema	Agnostic	
–  algebraic	plan	obtained	for	a	query	involves	a	large	number	of	
self	joins	
–  favorable for queries in which the predicate is a variable
•  Hybrid	Approach	and	Schema-aware	
–  algebraic	plan	contains	operations	over	the	appropriate	
property/class	tables	(more	in	the	spirit	of	existing	relational	
schemas)	
–  saves	many	self-joins	over	triple	tables	
–  if	the	predicate	is	a	variable,	then	one	query	per	property/class	
must	be	expressed	
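The self-join blow-up of the schema-agnostic approach can be sketched in Python by translating a basic graph pattern into SQL over a single triples(s, p, o) table. Each triple pattern becomes another alias of the same table, shared variables become join conditions and constants become selections. The translator, the table name and the demo data are hypothetical, and the translation is a naive one without the optimizations real SPARQL-to-SQL layers apply. The demo runs the generated query on an in-memory SQLite database.

```python
import sqlite3

def bgp_to_sql(patterns, table="triples"):
    """k triple patterns -> k aliases of one table, i.e. k - 1 self-joins."""
    select, where, first_ref = [], [], {}
    for i, pattern in enumerate(patterns):
        for column, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in first_ref:  # shared variable => join condition
                    where.append(f"{ref} = {first_ref[term]}")
                else:
                    first_ref[term] = ref
                    select.append(f"{ref} AS {term[1:]}")
            else:  # constant => selection (single quotes escaped for SQL)
                escaped = term.replace("'", "''")
                where.append(f"{ref} = '{escaped}'")
    tables = ", ".join(f"{table} t{i}" for i in range(len(patterns)))
    sql = f"SELECT {', '.join(select)} FROM {tables}"
    return sql + (" WHERE " + " AND ".join(where) if where else "")

def demo():
    """Run the JOIN query from the SPARQL examples against a tiny triple table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    con.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
        ("_:a", "foaf:name", "Alice"), ("_:a", "foaf:mbox", "mailto:alice@example.org"),
        ("_:b", "foaf:name", "Bob")])
    sql = bgp_to_sql([("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mbox")])
    return con.execute(sql).fetchall()
```

A star-shaped query over one subject with n properties needs n aliases of the triple table, whereas a property/class table can serve the same query with a single scan.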
Purpose	of	an	RDF	Querying	Benchmark 		
•  Test	the	performance	of	RDF	stores	
–  Independently	of	underlying	storage	engine	
–  Independently	of	underlying	logical	and	physical	schema	
–  Independently	of	the	query	actually	executed	in	the	engine		
•  SPARQL	for	native	stores	
•  SQL	(SPARQL	translated	to	SQL)	for	relational	stores	
Overview	
•  Introducing	Benchmarks		
•  A	short	discussion	about	Linked	Data		
–  Resource	Description	Framework	(Data	Model)	
–  SPARQL	(Query	Language)	
•  Benchmarking	Principles	&	Choke	Points	
•  Benchmarks	
–  Synthetic	
–  Real	
–  Benchmark	Generators	
•  Sum	up:	what	did	we	learn	today?	
Benchmarking	Principles	&	Choke	Points	
Why	Benchmarks?	
•  Performance	Evaluation	
–  There is no single recipe for doing it right
–  There are many ways to do it wrong
–  There	are	a	number	of	best	practices	but	no	broadly	
accepted	standard	on	how	to	design	and	develop	a	
benchmark	
•  Questions	asked:	
–  What	data/data	sets	should	we	use?	
–  Which	workload/queries	should	we	consider?	
–  What	to	measure	and	how	to	measure?	
Benchmark	Categories	
•  Micro-benchmarks	
•  Standard	benchmarks	
•  Real-life	applications	
Micro	Benchmarks	
•  Specialized,	stand-alone	piece	of	software	
•  Isolate	one	particular	functionality	of	a	larger	system	
•  In	databases	a	micro	benchmark	tests	a	single	database	
operator		
–  Selection,	Join	(and	all	types	thereof),	Projection,	
Aggregates,	Sub-Queries,	…		
	
Micro	Benchmarks:	Advantages	
•  Very	focused	
–  Test	a	specific	operator	of	the	system	
•  Controllable	data	&	workload	
–  Synthetic	and	Real	Data	sets	
•  Different value ranges, value distributions and correlations (mostly applicable to structured data)
–  Various	data	sizes	to	tackle	scalability	concerns	
•  Queries		
–  Workloads		of	different	complexity	&	size	
•  Complexity:	as	to	the	types	of	query	operators	and	patterns	
•  Size:	as	to	the	number	of	query	operators	involved	
–  Allow	broad	parameter	range(s)		
! Useful	for	detailed,	in-depth	analysis	
! Low	setup	threshold;		
! Easy	to	run
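A micro-benchmark harness for one isolated operator can be sketched in Python. It discards warm-up runs, repeats the measurement, and reports the median and spread rather than a single timing. The hash join under test, the input sizes and all function names are hypothetical examples.

```python
import statistics
import time
from typing import Callable

def micro_benchmark(operation: Callable[[], object], warmup: int = 3,
                    repetitions: int = 10) -> dict[str, float]:
    """Time one operation: skip warm-up runs (caches, lazy initialization),
    then report the median and spread of the measured runs."""
    for _ in range(warmup):
        operation()
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return {"runs": repetitions, "min_s": min(samples),
            "median_s": statistics.median(samples), "max_s": max(samples)}

def hash_join(left, right):
    """The operator under test: an equi-join of (key, value) pairs on key."""
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    return [(k, lv, rv) for k, lv in left for rv in index.get(k, [])]

# Vary the synthetic input size to probe how this single operator scales.
for size in (1_000, 10_000):
    left = [(i % 100, i) for i in range(size)]
    right = [(i, -i) for i in range(100)]
    print(size, micro_benchmark(lambda: hash_join(left, right)))
```

The same harness also illustrates the disadvantages listed next: it says nothing about how the join's cost interacts with the rest of a query plan.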
Micro	Benchmarks:	Disadvantages	
•  Neglect	larger	picture	since	they	do	not	test	the	whole	system	
•  Do not show how the costs of specific operations contribute to the cost of the whole system
•  Do not measure the impact of the micro-benchmarked operation on real-life applications
•  Difficult	to	generalize	the	results		
•  The	results	of	micro-benchmarks	cannot	be	applied	in	a	
straightforward	manner	
•  Micro-benchmarks	do	not	use	standardized	metrics	
Standard	Benchmarks	
•  Relational,	Object	Oriented,	Object	Relational	Database	
Management	Systems	
–  Family	of	TPC	Benchmarks	for	relational	databases	
•  XML, XPath, XQuery
–  MBench, XBench, XMach-1, XMark
•  General	Computing	
–  SPEC	
Standard	Benchmarks:		Advantages	&	Disadvantages	
•  Advantages	
–  Mimic	real-life	scenarios	(respond	to	real	needs)	
•  E.g., TPC is a business-oriented benchmark
–  Publicly	available	
–  Well	defined		
–  Provide	scalable	data	sets	and	workloads		
–  Metrics	are	well	defined		
•  Disadvantages	
–  Outdated	(standardization	is	a	lengthy	process)	
•  XQuery	took	around	7	years	to	become	a	standard	
•  TPC	benchmark	definition	is	still	an	ongoing	process	
–  Very	large	and	complicated	to	run	
–  Limited	dataset	variation	(target	a	specific	type	of	data)	
–  Limited	Workload	(focuses	on	the	application	in	mind)	
–  Systems	are	often	optimized	for	the	benchmark(s)	
•  Management	and	methodological	activities	performed	by	a	
group	of	people	
–  Management:	Organizational	protocols	to	control	the	process	
–  Methodological:	principles,	methods	and	steps	for	benchmark	
creation	
•  Benchmark	Development	
–  Roles	and	bodies:	people/groups	involved	in	the	development	
–  Design	principles:	fundamental	rules	that	direct	the	
development	of	a	benchmark	
–  Development	process:	series	of	steps	to	develop	a	benchmark	
based	on	Choke	Points	
Benchmark	Development	Methodology	
Choke	Points:	the	set	of	technical		
difficulties	that	force	systems	to	improve	their	performance
The	Example	Standard	Benchmark:	TPC	
•  Transaction Processing Performance Council (TPC)
–  non-profit	corporation	focused	on	developing	data-centric	
benchmark	standards	and	disseminating		objective,		verifiable	
performance	data	to	the	industry	
–  goal	is	to	«create,	manage	and	maintain	a	set	of	fair	and		
comprehensive	benchmarks	that	enable	end-users	and	vendors	to	
objectively	evaluate	system	performance	under	well	defined	
consistent	and	comparable	workloads»	[NPM+12]	
Benchmark	Description
TPC-C	Focuses on transactions
TPC-DI	Focuses on ETL processes
TPC-DS	Decision support solutions for, but not limited to, Big Data
TPC-E	On-Line Transaction Processing (OLTP) workload
TPC-H	Decision support benchmark, ad hoc queries and concurrent data modifications
TPC-VMS	Virtual Measurement Single System Specification for running and reporting performance metrics for virtualized databases
TPCx-HS	Measure of hardware, operating system and commercial Apache Hadoop File System API
TPCx-V	Measures the performance of servers running database workloads in virtual machines
Active TPC Benchmarks (2016)
Benchmark	Development	Process	(1)	
•  Design	Principles	[L97]	
Principle	 Comment	
Relevant	 The	benchmark	is	meaningful	for	the	target	domain	
Understandable	 The	benchmark	is	easy	to	understand	and	use	
Good	Metrics	 The	metrics	defined	by	the	benchmark	are	linear,	orthogonal	
and	monotonic	
Scalable	 The	benchmark	is	applicable	to	a	broad	spectrum	of	hardware	
and	software	configurations	
Coverage	 The	benchmark	workload	does	not	oversimplify	the	typical	
environment	
Acceptance	 The	benchmark	is	recognized	as	relevant	by	the	majority	of	
vendors	and	users
Benchmark	Development	Process	(2)	
•  Benchmarking	Metrics	
–  Performance		
–  Price/Performance	
–  Energy/Performance	Metrics:	Energy	metric	to	measure	the	energy	
consumption	of	system	components	
•  TPC	Pricing	specification	
–  Provides	consistent	methodologies	for	computing	the	price	of	the	
benchmarked	system,	licensing	of	software,	maintenance,	…		
Benchmark	 Metrics	
TPC-C	Transaction Rate (tpmC), Price per Transaction ($/tpmC)
TPC-E	Transactions per Second (tpsE)
TPC-H	Composite Query-per-Hour Performance Metric (QphH@Size), Price per Composite Query-per-Hour Performance Metric ($/QphH@Size)
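As a worked illustration of how a composite metric combines measurements, here is a Python sketch of QphH@Size as we recall it from the TPC-H specification. Power@Size is 3600 × SF divided by the geometric mean of the 22 query timings and 2 refresh timings of the power test. Throughput@Size is (S × 22 × 3600 / Ts) × SF for S query streams completing in Ts seconds. QphH@Size is the geometric mean of the two. The sketch omits the specification's further timing and reporting rules, so treat it as an approximation and consult the specification itself for audited results.

```python
import math

def power_at_size(query_secs, refresh_secs, scale_factor):
    """3600 * SF / geometric mean of the 22 query and 2 refresh timings (seconds)."""
    if len(query_secs) != 22 or len(refresh_secs) != 2:
        raise ValueError("expected 22 query timings and 2 refresh timings")
    timings = list(query_secs) + list(refresh_secs)
    geo_mean = math.exp(sum(math.log(t) for t in timings) / len(timings))
    return 3600 * scale_factor / geo_mean

def throughput_at_size(streams, elapsed_secs, scale_factor):
    """(S * 22 * 3600 / Ts) * SF: queries per hour over all streams, scaled by SF."""
    return streams * 22 * 3600 / elapsed_secs * scale_factor

def qphh_at_size(power, throughput):
    """Composite metric: geometric mean of the power and throughput metrics."""
    return math.sqrt(power * throughput)
```

The geometric mean inside Power@Size keeps a single very fast query from dominating the score. The price/performance metric then divides the total system price by QphH@Size.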
Desirable	Attributes	of	a	Benchmark:	
•  “A	good	benchmark	is	written	in	a	high-level	language,	making	it	
portable	across	different	machines;	is	representative	of	some	
programming	style	or	application;	can	be	measured	easily;	has	
wide	distribution	[W90]”	
•  “a	domain	specific	benchmark	must	meet	four	important	criteria:	
relevance,	portability,	simplicity,	scalability	[G93]”	
•  Six	desirable	attributes	for	TPC-C	[L97]:	relevance,	
understandability,	good	metrics,	scalability,	coverage,	acceptance	
•  Five	desirable	attributes	in	Huppler	[H09]:	relevance,	repeatability,	
fairness,	verifiability,	economy	
•  Big	Data	Benchmarking	[1]:	“a	successful	benchmark	should	be	
simple	to	implement	and	execute,	cost	effective,	timely	and	
verifiable”.		
Design	Principles:	Desirable	Attributes	of	a	Benchmark	
•  Relevant/Representative:	based	on	realistic	
use	case	scenarios	and	must	reflect	the	needs	
of	the	use	case	
•  Understandable/Simple:	the	results	and	
workload	are	easily	understandable	by	users	
•  Portable/Fair/Repeatable: the benchmark does not favor any particular system. It must be deterministic and provide a «gold standard»
•  Metrics:	should	be	well	defined	to	be	able	to	
assess	and	compare	the	systems.		
•  Scalable:	datasets	should	be	in	the	order	of	
billions	of	«objects»	
•  Verifiable:	allow	verifiable	results	in	each	
execution	
[Figure: Benchmark Attributes: relevant, representative, understandable, simple, portable, fair, repeatable, metrics, scalable, verifiable]
Design of Benchmark Workload [G93]
•  Design the queries to test specific features of the query language or to test specific data management approaches (micro-benchmarks)
•  Base the query mix on specific requirements of real-world use cases (domain-specific and standard benchmarks)
–  Leads to complex queries that involve many (different) language features
Development	Process:	Choke	Points		
•  A	benchmark	exposes	a	system	to	a	workload	and	should	identify	
the	technical	difficulties	of	the	system	under	test	
•  Choke Points [BNE14] are those technological challenges whose resolution will significantly improve the performance of a product
•  TPC-H: a 20-year-old benchmark (superseded by TPC-DS) that is still influential; it uses business-oriented queries and concurrent data modifications
•  22	queries	capturing	(most	of)	the	aspects	of	relational	query	
processing		
•  [BNE14]	performed	an	analysis	of	the	TPC-H	workload		and	
identified	28	choke	points	grouped	into	6	categories	
Choke	Points	à	la	TPC-H	
•  CP1:	Aggregation	Performance	
–  Ordered	aggregation,	small	group-by	keys,	interesting	orders,	dependent	
group-by	keys	
•  CP2:	Join	Performance 		
–  Large	joins,	sparse	foreign	keys,	rich	join	order	optimization,	late	projection	
•  CP3:	Data	Access	Locality	(materialized	views)	
–  Columnar	locality,	physical	locality	by	key,	detecting	correlation	
•  CP4:	Expression	Calculation	
–  Raw	Expression	Arithmetic,	Complex	Boolean	Expressions	in	Joins	and	
Selections,	String	Matching	Performance	
•  CP5:	Correlated	Sub-queries	
–  Flattening	sub-queries,	moving	predicates	to	a	sub-query,	overlap	between	
outer-	and	sub-query	
•  CP6:	Parallelism	and	Concurrency	
–  Query	plan	parallelization,	workload	management,	result	re-use	
Choke	Points	à	la		RDF	
Choke Point	Description
CP1: JOIN ORDERING
1.  Tests whether the engine can evaluate the trade-off between the time spent finding the best execution plan and the quality of the resulting plan
2.  Tests the ability of the engine to take into account cardinality constraints expressed by different kinds of schema constraints (e.g., functional and inverse functional properties)
CP2: AGGREGATION
Aggregations are implemented with sub-selects in the SPARQL query; the optimizer should recognize the operations in the sub-selects and evaluate them first.
CP3: OPTIONAL & NESTED OPTIONAL CLAUSES
Tests the ability of the optimizer to produce a plan in which the optional triple patterns are evaluated last, since optional clauses do not reduce the size of intermediate results.
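The plan-ordering concerns behind CP1 and CP3 can be illustrated with a toy, rule-based ordering routine. This is a minimal Python sketch, not the strategy of any particular engine: the `TriplePattern` type, the cardinality estimates and the `ub:` patterns are hypothetical, and the heuristic ignores join connectivity. It puts the most selective required patterns first and always defers OPTIONAL patterns to the end, since they cannot shrink intermediate results.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TriplePattern:
    text: str             # illustrative pattern text, e.g. "?x rdf:type ub:Professor"
    est_cardinality: int  # hypothetical estimate, e.g. taken from index statistics
    optional: bool = False


def plan_order(patterns: List[TriplePattern]) -> List[TriplePattern]:
    """Toy heuristic: most selective required patterns first (CP1),
    OPTIONAL patterns last (CP3), because an OPTIONAL never reduces
    the number of intermediate results."""
    required = [p for p in patterns if not p.optional]
    optional = [p for p in patterns if p.optional]
    # list.sort is stable, so ties keep their original order in the query
    required.sort(key=lambda p: p.est_cardinality)
    optional.sort(key=lambda p: p.est_cardinality)
    return required + optional


if __name__ == "__main__":
    query = [
        TriplePattern("?x ub:emailAddress ?e", 50_000, optional=True),
        TriplePattern("?x rdf:type ub:GraduateStudent", 20_000),
        TriplePattern("?x ub:takesCourse <Course0>", 40),
    ]
    for p in plan_order(query):
        print(p.text)
```

Running it prints the `takesCourse` pattern first, then the `rdf:type` pattern, and the OPTIONAL `emailAddress` pattern last. A real optimizer would also weigh join connectivity and the time spent searching for the plan itself (CP1.1).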
Choke	Points	in	RDF	Benchmarks	
Choke Point	Description
CP4: REASONING
Tests the ability of the engine to efficiently handle RDFS and OWL constructs expressed in the schema
CP5: PARALLEL EXECUTION OF UNIONS
Tests the ability of the optimizer to produce plans in which unions are executed in parallel
CP6: FILTERS
Tests the ability of the engine to execute filter expressions as early as possible, eliminating a possibly large number of intermediate results
CP7: ORDERING
Tests the ability of the engine to choose query plan(s) that facilitate the ordering of results
CP8: GEO-SPATIAL PREDICATES
Tests the ability of the system to handle queries over geospatial data
Choke	Points	in	RDF	Benchmarks	
Choke Point	Description
CP9: FULL TEXT	Queries that involve the evaluation of regular expressions on data type property values of resources
CP10: DUPLICATE ELIMINATION
Tests the ability of the system to identify duplicate entries and eliminate them while creating intermediate results
CP11: COMPLEX FILTER CONDITIONS
Tests the ability of the engine to deal efficiently with negation, conjunction and disjunction (e.g., breaking filters into a conjunction of filters and executing them in parallel).
Query	Characteristics	
Characteristics considered: simple filters, complex filters, >= 9 TPs (triple patterns), unbound predicates, negation, OPTIONAL, LIMIT, ORDER BY, DISTINCT, REGEX, UNION, DESCRIBE, CONSTRUCT, ASK
Overview	
•  Introducing	Benchmarks		
•  A	short	discussion	about	Linked	Data		
–  Resource	Description	Framework	(Data	Model)	
–  SPARQL	(Query	Language)	
•  Benchmarking	Principles	&	Choke	Points	
•  Benchmarks	
–  Synthetic	
–  Real	
–  Benchmark	Generators	
•  Sum	up:	what	did	we	learn	today?	
A	Survey	of	RDF	Benchmarks	
Synthetic	Benchmarks	
Real	Benchmarks	
Benchmark	Generators	
Benchmark	Components	
•  Datasets	
•  The	raw	material	of	the	benchmark	against	which	the	workload	
will	be	evaluated	
•  Synthetic	&	Real	Datasets	
!  Synthetic:	Produced	with	a	data	generator	(that	hopefully	
produces	data	with	interesting	characteristics)	
!  Real:	Widely	used	datasets	from	a	domain	of	interest	
•  Query	Workload	
•  Sets	of	queries	and/or	updates	to	evaluate	the	system	with	
•  Metrics	
•  The performance metric(s) that characterize the system's behavior
Synthetic	RDF	Benchmarks	
Lehigh	University	Benchmark	(LUBM)	[GPH05]	
•  Benchmark	intended	to	facilitate	the	evaluation	of	Semantic	
Web	repositories	
•  Widely	adopted	by	the	data	engineering	and	Semantic	Web	
communities	
•  Focuses on evaluating the performance of query optimizers rather than ontology reasoning as in Description Logic (DL) systems
•  Components:		
–  Scalable	Synthetic	data	generator	
–  Ontology	of	moderate	size	and	complexity	
–  Supports	extensional	queries	(i.e.,	queries	that	request	
instances	and	not	only	schema	information)	
–  Proposes	Performance	metrics	
LUBM	Univ-Bench	Ontology	
•  Describes	universities	and	departments	and	related	activities	
•  Expressed in OWL Lite (chosen to take into account the limitations of reasoning systems regarding completeness)
Statistics:		
!  43	Classes	
!  32	Object	Type	Properties	
!  7	Data	Type	Properties	
!  OWL Lite constructs: inverseOf, TransitiveProperty, someValuesFrom, intersectionOf
LUBM	Data	Generation	(1)		
•  Synthetically	produced	extensional	data	that	conform	to	the	
LUBM	Ontology	
•  Data	are	generated	using	the	UBA	(Univ-Bench	Artificial	Data	
Generator)	
•  Random	and	Repeatable	Data	Generation	
•  Minimum unit of data generation: the university, with its departments, employees and courses
•  Instances of classes and properties are randomly produced
•  To make the data more realistic, restrictions are applied:
–  «Minimum	15	and	maximum	25	departments	per	university»	
–  «Undergraduate	student/faculty	ratio	between	8	and	14	
inclusive»	
LUBM	Data	Generation	(2)		
•  Assignment	of	Identifiers	is	done	using	zero-based	indexes	
–  University0,	Department0,	…		
•  Data	generated	by	the	tool	are	repeatable	for	the	universities	
–  User	enters	a	seed	for	the	random	number	generator	
employed	in	the	data	generation	process	
•  Data	created	are	represented	in	OWL	Lite	
•  Configurable serialization and representation model (RDF/XML in .owl files, DAML)
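The generation rules above (zero-based identifiers, a user-supplied seed, 15 to 25 departments per university, an undergraduate/faculty ratio between 8 and 14) can be mimicked in a few lines. This is a minimal Python sketch, not the UBA tool's code: the function name, the output shape and the faculty-per-department range are hypothetical assumptions made only for illustration.

```python
import random
from typing import Dict, List


def generate_universities(num_universities: int, seed: int) -> List[Dict]:
    """Hypothetical UBA-like generator: the same seed yields the same data."""
    rng = random.Random(seed)  # private RNG, so the output depends only on the seed
    universities = []
    for u in range(num_universities):
        departments = []
        for d in range(rng.randint(15, 25)):  # 15..25 departments, inclusive
            faculty = rng.randint(7, 10)      # assumption: range not given in the slides
            ratio = rng.randint(8, 14)        # undergraduate/faculty ratio, inclusive
            departments.append({
                "id": f"Department{d}",       # zero-based, unique within a university
                "faculty": faculty,
                "undergraduates": faculty * ratio,
            })
        universities.append({"id": f"University{u}", "departments": departments})
    return universities
```

Calling `generate_universities(3, seed=0)` twice returns identical data, which is the repeatability property the slides describe; changing the seed changes the data while keeping it within the stated restrictions.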
	
LUBM	Queries	(1)	
•  14	Realistic	Queries	
•  Written	in	SPARQL	1.0		
•  Query	Design	criteria	
–  Input	Size:		
•  proportion	of	the	class	instances	involved	and	entailed	
in	the	query	to	the	total	instances	in	the	dataset	
–  Selectivity:		
•  estimated	proportion	of	the	class	instances	that	satisfy	
the	query	criteria		
•  depends	on	the	input	dataset	size	
LUBM	Queries	(2)		
–  Complexity:		
•  measured	on	the	basis	of	the	number	of	classes	and	
properties	involved	in	the	query		
•  the same query can have a different complexity in different implementations (relational vs. RDF)
–  Hierarchy	information:		
•  class	and	property	hierarchies	are	used	to	obtain	all	
query	answers	
–  Logical	inference:		
•  inference	is	required	to	obtain	all	query	answers	
LUBM	Queries	(3):	Characteristics	
Characteristic	 Q1	 Q2	 Q3	 Q4	 Q5	 Q6	 Q7	 Q8	 Q9	 Q10	 Q11	 Q12	 Q13	 Q14	
Simple	filters	
Complex	filters	
>=	9	TPs	
Unbound	
predicates	
Negation	
OPTIONAL	
LIMIT	
ORDER	BY	
DISTINCT	
REGEX	
UNION	
DESCRIBE	
CONSTRUCT	
ASK	
Simple	SPARQL	SELECT	Queries
LUBM	Queries	(4):	Choke	Points	
#	 CP1	 CP2	 CP3	 CP4	 CP5	 CP6	 CP7	 CP8	 CP9	 CP10	 CP11	
Q1	
Q2	 ✓	
Q3	 ✓	
Q4	 ✓	 ✓	
Q5	 ✓	
Q6	 ✓	
Q7	 ✓	
Q8	 ✓	
Q9	 ✓	
Q10	 ✓	
Q11	 ✓	
Q12	 ✓	 ✓	
Q13	 ✓	
Q14	
Join Ordering: the most complex query contains 5 joins
Reasoning: focus on subClass and subProperty hierarchies
LUBM	Performance	Metrics	(1)			
•  Load	Time:		
–  Time needed to parse, load and reason over a dataset
–  Focuses	on	persistent	stores	
•  Repository	Size:		
–  For	persistent	storage	only		
–  The	size	of	all	files	that	constitute	the	repository	
•  Query	Response	Time:		
–  Average	time	for	executing	a	query	10	times	(warm	run)	
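A warm-run response time of the kind described above can be measured with a small harness. This is a minimal Python sketch: `run_query` is a hypothetical callable wrapping whatever client actually sends the query, and the single warm-up execution is an assumption rather than part of the LUBM specification.

```python
import time
from statistics import mean
from typing import Callable, List


def query_response_time(run_query: Callable[[], object],
                        runs: int = 10, warmup: int = 1) -> float:
    """Average wall-clock time in seconds over `runs` executions.

    The warm-up count is an assumption of this sketch; the timed part
    follows the slide: average of 10 executions on a warm system."""
    for _ in range(warmup):
        run_query()  # results discarded; only warms caches and buffers
    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return mean(timings)
```

`time.perf_counter` is used because it is monotonic and has the highest available resolution for measuring short intervals.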
LUBM	Performance	Metrics	(2)			
•  Query Completeness and Soundness:
–  Completeness: the percentage of the entailed unique answers that the system returns
–  Soundness: the percentage of the returned answers that are actually entailed
•  Combined Metric:
–  Combines query response time with answer completeness and answer soundness
–  Measures	the	trade-off	between	query	response	time	and	
completeness	of	results	
•  See	how	reasoning	affects	query	performance	
–  Provides	an	absolute	ranking	of	systems	
–  But	hides	details!	
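Read as set ratios, completeness resembles recall and soundness resembles precision over unique answers. The following is a minimal Python sketch under that reading. The empty-set conventions are assumptions, and the sketch does not attempt to reproduce LUBM's combined metric.

```python
from typing import Hashable, Set


def completeness(returned: Set[Hashable], entailed: Set[Hashable]) -> float:
    """Fraction of the entailed unique answers that the system returned."""
    if not entailed:
        return 1.0  # assumption: nothing to find counts as complete
    return len(returned & entailed) / len(entailed)


def soundness(returned: Set[Hashable], entailed: Set[Hashable]) -> float:
    """Fraction of the returned unique answers that are actually entailed."""
    if not returned:
        return 1.0  # assumption: an empty answer contains no wrong tuples
    return len(returned & entailed) / len(returned)
```

For example, a system that returns `{a, b, x}` when `{a, b, c, d}` is entailed is 50% complete and about 67% sound.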
SP2Bench	[SHM+09]	
•  Proposes a language-specific benchmark to test the most common SPARQL constructs, operator constellations and RDF access patterns
•  Components:		
–  Scalable	synthetic	data	generator		
•  Creation	of	DBLP	documents	in	RDF	mimicking	key	
characteristics	of	the	original	DBLP	dataset	
•  Produced	datasets	contain	blank	nodes	and	RDF	
containers		
–  Supports	extensional	queries	(i.e.,	queries	that	request	
instances	and	not	schema	information)	
–  Proposes	performance	metrics		
SP2Bench	Schema	DBLP	(1)		
	
	
•  Study	of	DBLP	real	data	
–  Determine the probability distribution of selected attributes per document class, which forms the basis for generating class instances
–  Reveals that only a few attributes are repeated for the same class
<!ELEMENT dblp (article | inproceedings | proceedings | book | incollection |
                phdthesis | mastersthesis | www)* >
<!ENTITY % field
    "author | editor | title | booktitle | pages | year | address | journal |
     volume | number | month | url | ee | cdrom | cite | publisher |
     note | crossref | isbn | series | school | chapter" >
<!ELEMENT article (%field;)* >
<!ELEMENT inproceedings (%field;)* >
Extract of the DBLP DTD (2008)
SP2Bench	Schema	DBLP	(2)		
•  Probability distribution of selected attributes per document class
•  An additional assumption is that attributes are independent
–  The existence of one attribute does not depend on another
•  Bell-shaped Gaussian curves (normal distributions) are used to approximate the input data
•  Studied	the	number	of	class	instances	over	time	and	modeled	
those	with	a	power	law	distribution	
Article	 Inproc.	 Proc.	 Book	 WWW	
author	 0.9895	 0.9970	 0.0001	 0.8937	 0.9973	
cite	 0.0048	 0.0104	 0.0001	 0.0079	 0.0000	
editor	 0.0000	 0.0000	 0.7992	 0.1040	 0.0004	
isbn	 0.0000	 0.0000	 0.8592	 0.9294	 0.0000	
…	 …	 …	 …	 …	 …
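The independence assumption makes attribute generation straightforward: each attribute of a document is drawn with its own probability. The following is a minimal Python sketch that uses only the rows shown in the table. It is not SP2Bench's generator, which also models repeated attributes (such as several authors) and growth over time; the dictionary and the function name are hypothetical.

```python
import random
from typing import Dict, List

# Per-class attribute probabilities copied from the partial table above;
# the elided rows ("…") are omitted, not invented.
ATTRIBUTE_PROBABILITIES: Dict[str, Dict[str, float]] = {
    "Article": {"author": 0.9895, "cite": 0.0048, "editor": 0.0000, "isbn": 0.0000},
    "Inproc.": {"author": 0.9970, "cite": 0.0104, "editor": 0.0000, "isbn": 0.0000},
    "Proc.":   {"author": 0.0001, "cite": 0.0001, "editor": 0.7992, "isbn": 0.8592},
    "Book":    {"author": 0.8937, "cite": 0.0079, "editor": 0.1040, "isbn": 0.9294},
    "WWW":     {"author": 0.9973, "cite": 0.0000, "editor": 0.0004, "isbn": 0.0000},
}


def sample_attributes(doc_class: str, rng: random.Random) -> List[str]:
    """Draw each attribute independently with its per-class probability."""
    probs = ATTRIBUTE_PROBABILITIES[doc_class]
    # rng.random() is in [0, 1), so a probability of 0.0 never fires
    return [attr for attr, p in probs.items() if rng.random() < p]
```

Passing a `random.Random` seeded with a fixed value keeps generation deterministic, in line with the fixed-seed approach SP2Bench describes.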
SP2Bench	Data	Generation		
•  Synthetically	produced	extensional	data	that	conform	to	the	DBLP	
Schema	
•  Existing external vocabularies are used to describe resources in a uniform way
–  FOAF (Friend of a Friend) for persons [FOAF], SWRC (Semantic Web for Research Communities) for scientific publications [SWRC], and DC (Dublin Core) [DC]
•  Blank nodes and RDF containers (rdf:Bag) are introduced to capture all aspects of the RDF data model
•  Data generation follows the distributions approximated by the Gaussian curves
•  The data generator takes as input either the triple count or the year up to which data are generated
–  It always ends up in a consistent state!
•  Random	functions	are	based	on	a	fixed	seed	making	data	generation	
deterministic
SP2Bench	Queries	(1):	Characteristics	
•  17	queries		
–  12	main	queries	and	modifications	thereof	
•  Provided in natural language and in SPARQL 1.0; SQL translations are also available
•  Query design criteria
–  Focus on the SELECT and ASK SPARQL forms
–  Aim at covering the majority of SPARQL constructs (including DISTINCT, ORDER BY, LIMIT, OFFSET)
SP2Bench	Queries	(2):	Characteristics	
Characteristic	 Q1	 Q2	 Q3abc	 Q4	 Q5ab	 Q6	 Q7	 Q8	 Q9	 Q10	 Q11	 Q12abc	
Simple	filters	 ✔	 ✔	 ✔	 ✔	
Complex	filters	 ✔	 ✔	 ✔	
>=	9	TPs	 ✔	 ✔	 ✔	 ✔	 ✔	
Unbound	
predicates	
✔	 ✔	
Negation	 ✔	 ✔	
OPTIONAL	 ✔	 ✔	 ✔	
LIMIT	 ✔	
ORDER	BY	 ✔	 ✔	
DISTINCT	 ✔	 ✔	 ✔	 ✔	 ✔	 ✔	
REGEX	
UNION	 ✔	 ✔	 ✔	
DESCRIBE	
CONSTRUCT	
ASK	 ✔
SP2Bench	Queries	(3):	Choke	Points	
#	 CP1	 CP2	 CP3	 CP4	 CP5	 CP6	 CP7	 CP8	 CP9	 CP10	 CP11	
Q1	 ✓	
Q2	 ✓	 ✓	
Q3	 ✓	
Q4	 ✓	 ✓	 ✓	
Q5	 ✓	 ✓	 ✓	
Q6	 ✓	 ✓	 ✓	 ✓	
Q7	 ✓	 ✓	 ✓	 ✓	
Q8	 ✓	 ✓	 ✓	 ✓	 ✓	
Q9	 ✓	 ✓	
Q10	
Q11	 ✓	
Q12	 ✓	 ✓	 ✓	
Join Ordering: the most complex query contains 8 joins
Filters: the most complex query contains 2 filters
Duplicate Elimination
SP2Bench	Performance	Metrics	
•  Loading	Time:	
–  time	needed	to	parse,	load	and	reason	using	the	tested	system	
for	a	dataset	
–  Focuses	on	persistent	stores	
•  «Per-query»	performance:	
–  Performance	of	each	query	
•  «Global»	performance:	
–  Arithmetic and geometric mean over all queries; the geometric mean is computed as follows:
1.  Assign a penalty of 3600 s to each query that fails
2.  Multiply the execution times of all 17 queries
3.  Compute the 17th root of the result
•  Memory	consumption	
–  High	watermark	of	main	memory	consumption	
–  Average	memory	consumption	of	all	queries		
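The global means described above can be computed as follows. This is a minimal Python sketch: `None` marks a failed query, and applying the 3600 s penalty to the arithmetic mean as well is an assumption of this sketch. The geometric mean is computed in log space, which gives the same n-th root of the product without overflowing or underflowing. All times must be positive.

```python
import math
from typing import List, Optional, Sequence

PENALTY_SECONDS = 3600.0  # penalty for a failed query, as stated in the slide


def _with_penalty(times: Sequence[Optional[float]]) -> List[float]:
    # None marks a failed query (timeout, error, ...)
    return [PENALTY_SECONDS if t is None else t for t in times]


def arithmetic_mean(times: Sequence[Optional[float]]) -> float:
    ts = _with_penalty(times)  # assumption: penalty also applied here
    return sum(ts) / len(ts)


def geometric_mean(times: Sequence[Optional[float]]) -> float:
    """n-th root of the product of the n execution times (n = 17 in SP2Bench),
    computed as exp(mean(log t)) to avoid overflow of the raw product."""
    ts = _with_penalty(times)
    return math.exp(sum(math.log(t) for t in ts) / len(ts))
```

With one query at 4 s and one failure, the geometric mean is the square root of 4 × 3600, i.e. 120 s, while the arithmetic mean is 1802 s. This shows why the geometric mean is less dominated by a single failed query.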
Berlin SPARQL Benchmark (BSBM) [BS09][BSBM]
•  Built around an e-commerce use case
•  The query mix emulates the search and navigation patterns of a user looking for a product of interest
•  Goals
–  Allow the comparison of SPARQL engines across different architectures (relational and/or RDF)
–  Challenge forward- and backward-chaining reasoning engines
–  Focuses	on	an	enterprise	setting	where	multiple	clients	
concurrently	execute	workloads	
–  Measures	SPARQL	query	performance	and	not	(so	much)	
reasoning	
•  Components	
–  Data	generator:	supports	the	creation	of	arbitrarily	large	
datasets	
–  Test	Driver:	executes	sequences	of	SPARQL	queries		
BSBM	Schema	(1)		
•  E-commerce	use	case:	products	are	offered	by	several	vendors	
and	consumers	post	reviews	for	those	products	
Figure: BSBM schema (classes and their properties)
•  Product: rdfs:label, rdfs:comment, rdf:type, bsbm:producer, bsbm:productFeature [9..22], bsbm:productPropertyTextual1, bsbm:productPropertyTextual2, bsbm:productPropertyTextual3, bsbm:productPropertyTextual4 [0..1], bsbm:productPropertyTextual5 [0..1], bsbm:productPropertyNumeric1, bsbm:productPropertyNumeric2, bsbm:productPropertyNumeric3, bsbm:productPropertyNumeric4 [0..1], bsbm:productPropertyNumeric5 [0..1]
•  ProductType: rdfs:label, rdfs:comment, rdf:type, rdfs:subClassOf [0..1]
•  ProductFeature: rdfs:label, rdfs:comment, rdf:type
•  Producer: rdfs:label, rdfs:comment, rdf:type, foaf:homepage, bsbm:country
•  Vendor: rdfs:label, rdfs:comment, rdf:type, foaf:homepage, bsbm:country
•  Offer: bsbm:product, bsbm:vendor, bsbm:price, bsbm:validFrom, bsbm:validTo, bsbm:deliveryDays, bsbm:offerWebpage
•  Review: bsbm:reviewFor, rev:reviewer, bsbm:reviewDate, dc:title, rev:text, bsbm:rating1 [0..1], bsbm:rating2 [0..1], bsbm:rating3 [0..1], bsbm:rating4 [0..1]
•  Person: foaf:name, foaf:mbox_sha1sum, bsbm:country
BSBM	Schema	&	Data	Characteristics	(1)	
•  Every product has a type from a product type hierarchy
•  The product type hierarchy is not fixed: its depth and width depend on the chosen scale factor n (the number of products)
–  Hierarchy depth: $d = 1 + \mathrm{round}(\log_{10} n)/2$
–  Branching factor:
•  root level: $bf_r = 1 + \mathrm{round}(\log_{10} n)$
•  all other levels: 8
•  Product types are assigned a variable number of product features
–  computed between a lower and an upper bound, where i is the level of the product type in the hierarchy:
$\mathit{lowerBound} = \dfrac{35\,i}{d(d+1)/2 - 1}, \qquad \mathit{upperBound} = \dfrac{75\,i}{d(d+1)/2 - 1}$
–  The set of possible features for a given product type is the union of the features of the type and of all its super-types
BSBM	Schema	&	Data	Characteristics	(2)	
•  Products,	Vendors,	Offers	
–  Products that share the same type also share the same set of possible features
–  For a given product, each of its features is chosen from the set of possible features with a hard-coded probability of 25%
–  A normal distribution with mean μ=50 and standard deviation σ=16.6 is used to associate products with producers
–  Vendors are associated with countries following hard-coded distributions
–  The number of offers is 20n; offers are distributed over products following a normal distribution with μ=n/2 and σ=n/4
–  Offers are distributed over vendors following a normal distribution with «fixed parameters» μ=2000 and σ=667
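The offer-to-product assignment above can be sketched as repeated draws from a normal distribution over product indices. This is a minimal Python sketch, not the BSBM generator. The function names are hypothetical, and rejecting draws that fall outside the valid index range is an assumption of this sketch. The slides do not say how out-of-range values are handled.

```python
import math
import random
from typing import List


def pick_product(n_products: int, rng: random.Random) -> int:
    """Draw a product index from N(mu=n/2, sigma=n/4), rejecting draws
    outside [0, n). The rejection step is an assumption of this sketch."""
    mu, sigma = n_products / 2, n_products / 4
    while True:
        idx = math.floor(rng.gauss(mu, sigma))
        if 0 <= idx < n_products:
            return idx


def assign_offers(n_products: int, rng: random.Random) -> List[int]:
    """Return the product index of each of the 20 * n offers."""
    return [pick_product(n_products, rng) for _ in range(20 * n_products)]
```

Because the mean sits at n/2, products in the middle of the index range receive most of the offers. With n = 100, roughly 70% of the 2,000 offers land on products 25 to 74.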
BSBM	Schema	&	Data	Characteristics	(3)	
•  Reviews	
–  Number of reviews: 10 times the scale factor n
–  Data type property values (title and text) contain between 50 and 300 words
–  Up to 4 ratings; each rating is a random integer between 1 and 10
–  Each rating is missing with a hard-coded probability of 10%
–  Distributed over products with a normal distribution that depends on the dataset size, with μ=n/2 and σ=n/4
–  The number of reviews per reviewer follows a normal distribution with μ=20 and σ=6.6
–  Reviews	are	generated	until	all	reviews	are	assigned	a	reviewer	
–  Reviewer	countries	follow	the	same	distribution	as	vendor	
countries	
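The rating rules above translate directly into code. This is a minimal Python sketch covering only the 10n review count and the four optional rating slots. Assigning reviews to products and reviewers is left out, and the function name is hypothetical.

```python
import random
from typing import List, Optional


def generate_review_ratings(n_products: int,
                            rng: random.Random) -> List[List[Optional[int]]]:
    """Ratings for the 10 * n reviews of a BSBM-like dataset (sketch).

    Each review has 4 rating slots (bsbm:rating1..bsbm:rating4); each slot
    holds a random integer in 1..10, or None (missing) with probability 0.1."""
    reviews = []
    for _ in range(10 * n_products):
        ratings = [None if rng.random() < 0.1 else rng.randint(1, 10)
                   for _ in range(4)]
        reviews.append(ratings)
    return reviews
```

On average about 3.6 of the 4 rating slots per review are filled (4 × 0.9).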
BSBM	Data	Generation	(1)		
•  Synthetically	produces	instances	of	class	Product	that	conform	to	
the	BSBM	Schema	
Total #triples	250K	1M	25M	100M
#products	666	2,785	70,812	284,826
#product features	2,860	4,745	23,833	47,884
#product types	55	151	731	2,011
#producers	14	60	1,422	5,618
#vendors	8	34	722	2,854
#offers	13,320	55,700	1,416,240	5,696,520
#reviewers	339	1,432	36,249	146,054
#reviews	6,660	27,850	708,120	2,848,260
Total #instances	23,922	92,757	2,258,129	9,034,027
Indicative number of instances for different dataset sizes
BSBM	Queries	(1)		
•  12	Queries	
•  The query mix emulates the search and navigation patterns of a customer looking for a product
•  BSBM	queries	are	given	in	natural	language,	SPARQL	and	SQL	
Query	 Description	
Q1	 Find	products	for	a	given	set	of	generic	features	
Q2	 Retrieve	basic	information	about	a	specific	product	for	display	purposes	
Q3	 Find	products	having	some	specific	features	and	not	having	one	feature	
Q4		 Find	products	matching	two	different	sets	of	features	
Q5	 Find	products	that	are	similar	to	a	given	product	
Q6	 Find	products	having	a	label	that	contains	a	specific	string	
Q7	 Retrieve	in-depth	information	about	a	product	including	offers	and	reviews	
Q8	Give me recent English-language reviews for a specific product
Q9	 Get	information	about	a	reviewer	
Q10	 Get	cheap	offers	which	fulfill	the	consumer’s	delivery	requirements	
Q11	 Get	all	information	about	an	offer	
Q12	 Export	information	about	an	offer	into	another	schema
Characteristic	 Q1	 Q2	 Q3	 Q4	 Q5	 Q6	 Q7	 Q8	 Q9	 Q10	 Q11	 Q12	
Simple	filters	 ✔	 ✔	 ✔	 ✔	 ✔	 ✔	 ✔	
Complex	filters	 ✔	 ✔	
>	9	TPs	 ✔	 ✔	 ✔	 ✔	 ✔	
Unbound	
predicates	
✔	
	
✔	
	
Negation	 ✔	
	
OPTIONAL	 ✔	 ✔	 ✔	 ✔	
LIMIT	 ✔	 ✔	 ✔	 ✔	 ✔	 ✔	
ORDER	BY	 ✔	 ✔	 ✔	 ✔	 ✔	 ✔	
DISTINCT	 ✔	 ✔	 ✔	
REGEX	 ✔	
UNION	 ✔	 ✔	
DESCRIBE	 ✔	
CONSTRUCT	 ✔	
ASK	
BSBM	Queries	(2):	Characteristics			
11	JOINs,		
3	OPTIONAL	clauses,	
3	Filters,		
1	Unbound	variable	
4	OPTIONAL	clauses
BSBM	Queries	(3):	Choke	Points	
#	 CP1	 CP2	 CP3	 CP4	 CP5	 CP6	 CP7	 CP8	 CP9	 CP10	 CP11	
Q1	 ✔	 ✔	 ✔	 ✔	
Q2	 ✔	
Q3	 ✔	 ✔	 ✔	
Q4	 ✔	 ✔	 ✔	 ✔	
Q5	 ✔	 ✔	 ✔	 ✔	
Q6	 ✔	 ✔	
Q7	 ✔	 ✔	 ✔	
Q8	 ✔	 ✔	 ✔	
Q9	 ✔	
Q10	 ✔	 ✔	 ✔	 ✔	
Q11	 ✔	
Q12	 ✔	
Join Ordering: the most complex query contains 11 joins
Filters: the most complex query contains 3 filters, and the most complex filter contains arithmetic expressions
Result Ordering
BSBM:	Performance	Metrics	
•  Query	Mixes	per	Hour	(QMpH)	
–  Measures the number of complete BSBM query mixes answered per hour by the system under test, for a specific number of clients running concurrently against it
•  Queries	per	Second	(QpS)	
–  Measures	the	number	of	queries	of	a	specific	type	handled	by	
the	system	under	test	in	a	second	
–  Calculated	by	dividing	the	number	of	queries	of	a	specific	type	
within	a	benchmark	run	by	the	total	execution	time	of	those	
queries	
•  Load	Time:	
–  Time	to	load	the	dataset	in	the	RDF	or	relational	repositories		
•  Includes	the	time	to	create	the	appropriate	data	structures	&	
indices	
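Both throughput metrics reduce to simple ratios. This is a minimal Python sketch with hypothetical function names. For QMpH, `completed_mixes` is assumed to count the mixes finished by all concurrent clients together during the measured interval.

```python
from typing import Sequence


def queries_per_second(execution_times: Sequence[float]) -> float:
    """QpS for one query type: executions of that type in the run divided
    by their total execution time in seconds."""
    return len(execution_times) / sum(execution_times)


def query_mixes_per_hour(completed_mixes: int, elapsed_seconds: float) -> float:
    """QMpH: complete query mixes answered, scaled to one hour.
    Assumption: completed_mixes is summed over all concurrent clients."""
    return completed_mixes * 3600.0 / elapsed_seconds
```

For example, three executions of one query type taking 0.5 s, 0.5 s and 1.0 s give 1.5 QpS, and 10 mixes completed in 30 minutes give 20 QMpH.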
Semantic	Publishing	Benchmark	(SPB)	
•  Developed	in	the	context	of	FP7	EU	Project	LDBC	(2012-2015)	
•  LDBC’s	goals:		
–  Develop	querying	benchmarks	that	will	spur	research	&	
industry	progress	in	large-scale	graph	and	RDF	data	
management	
•  scalability,	storage,	indexing	and	query	optimization	
techniques	for	RDF	and	graph	database	solutions	
•  quantitatively	and	qualitatively	assess	different	
solutions	for	RDF	data	integration	
–  Establish an industry-neutral entity, the LDBC foundation, à la the Transaction Processing Performance Council (TPC)
Semantic	Publishing	Benchmark	(SPB)	
•  Industry-motivated	benchmark	
–  The	scenario	involves	a	media	/	publisher	organization	that	
maintains	semantic	metadata	about	its	Journalistic	assets	
•  Components	
–  Scalable	Synthetic	Data	Generator	
•  Creation	of	instances	of	BBC	ontologies	mimicking	
characteristics	of	the	original	real	input	datasets	
–  Supports	extensional	queries	(i.e.,	queries	that	request	
instances	and	not	schema	information)	
–  Workload	simulates	consumption	of	RDF	metadata	
•  Concurrent	read	and	update	queries	
–  Proposes	performance	metrics		
SPB	Design:	Requirements	
•  Storing	and	processing	RDF	data	
–  Storing	and	isolating	data	in	separate	RDF	graphs	
–  Support for the following SPARQL standards:
•  SPARQL	1.1	Protocol,	Query,	Update	
•  Support	for	Schema	Languages	
–  Support	for	RDFS	to	obtain	the	correct	answers	
–  Optional	support	for	the	RL	profile	of	Web	Ontology	Language	
(OWL2	RL)	in	order	to	pass	the	conformance	test	suite	
•  Loading	data	from	RDF	serialization	formats		
–  N-Quads, TriG, Turtle, etc.
SPB	Schema:	BBC	Ontologies	(1)	
•  Core	Ontologies:	7	ontologies	describe	basic	concepts	about		
entities	and	relationships	in	the	domain	of	interest	
–  Basic	Concepts:	Creative	Works,	Places,	Persons,	Provenance	
Information,	Company	Information,	etc.	
	
Figure: BBC core ontology for Creative Works (classes include CreativeWork, NewsItem, BlogPost, Programme, Thing, Person, Place, Event, Organisation, Theme, Audience (International, National), Format (Textual, Video, Interactive, Image, Audio, PictureGallery) and Thumbnail types; properties include cwork:title, cwork:shortTitle, cwork:description, cwork:category, cwork:tag with its sub-properties about and mentions, cwork:audience, cwork:primaryFormat, cwork:dateCreated, cwork:dateModified, cwork:thumbnail, cwork:altText, owl:sameAs)
SPB Schema: BBC Ontologies (2)
•  Domain Ontologies: 3 ontologies describe concepts and properties related to a specific domain
–  sports (competitions, events)
–  politics entities
–  news (concepts with which journalists tag annotations)
•  Statistics
–  74 classes
–  88 data type properties, 28 object type properties
–  60 rdfs:subClassOf (maximum depth 3) and 17 rdfs:subPropertyOf (maximum depth 1) hierarchies
–  105 rdfs:domain and 115 rdfs:range RDFS properties
–  8 owl:oneOf class axioms, 1 owl:TransitiveProperty property
SPB:	Reference	datasets	
•  Collections	of	entities	describing	various	domains	
–  Snapshots	of	the	real	datasets	of	BBC	
•  Football	competitions	and	teams	
•  Formula	One	competitions	and	teams	
•  UK	Parliament	Members	
–  Additional	datasets	
•  GeoNames: places, names and coordinates
•  DBpedia: person data
–  Reference	Dataset	Size:	25M	triples		
SPB	Data	Generation	(1):	Process		
1.  Loader	
–  Ontology	&	Reference	Data	
2.  Data		Generator	
a.  Retrieves	instances	
from	Reference	Datasets	
b.  Generates	Creative	Works	
according	to	pre-defined	
allocations	and	models	
c.  Writes	generated	data	to	
disk	
Figure: SPB Data Generator architecture (components: BBC Ontologies, Reference Datasets, Ontology & Reference Data Set Loader, RDF Repository with SPARQL Endpoint, Creative Works Generator, data generation parameters, generated Creative Works)
SPB	Data	Generation	(2)	
•  Produces	synthetic	data	that	mimic	most	of	the	characteristics	of	real	
world	data	provided	by	BBC	
•  Input:		Core	&	Domain	Ontologies	and	Reference	datasets	
•  Output:		
–  Instances	that	conform	to	BBC	core	ontologies	(class	Creative	Work)	
–  Instances	refer	to	entities	in	the	reference	datasets	using	the	about	&	
mentions	schema	properties	
–  follows	the	(user)	pre-defined	distributions	of	SPB’s	Data	Generator	
Figure: tagged entities over time (01/2012 to 12/2012), showing clustering, correlations and random distribution
SPB	Operational	Phases	
•  Data	Loading	
1.  Initial	loading	of	reference	datasets	
•  BBC datasets enriched with DBpedia person and GeoNames place data
2.  Generation of Creative Works
•  Parallel generation (multi-threaded and multi-process)
3.  Loading of Creative Works in the RDF repository
•  Running the Benchmark
1.  Warm-up phase
2.  Run	the	benchmark	using	the	Test	Driver	
3.  Run	conformance	tests	(OWL2	RL)	[optional]	
Benchmark	Configuration	
•  Data	Generator	
–  Allocation	of	tags	in	Creative	Works		
•  Correlations	of	creative	works	with	important	entities	
(persons,	places,	events)	
•  Clustering	of	Creative	Works	around	major	/	minor	events	
–  Size	of	generated	data	(triples)	
–  Parallel	data	generation	
•  Test	Driver		
–  Distribution	of	queries	in	the	query-mix	
•  editorial	operations	(deletion/addition	of	RDF	triples)	
•  aggregate	operations	(complex	SPARQL	queries)	
–  Number	of	editorial	/	aggregation	agents	
–  Duration	of	Warm-up	and	Benchmark	phases	
–  Each	operational	phase	can	be	enabled	or	disabled	
SPB	Base	Workload	Queries	(2)	
Characteristic	 Q1	 Q2	 Q3	 Q4	 Q5	 Q6	 Q7	 Q8	 Q9	 Q10	 Q11	 Q12	
Simple	filters	 ✔	
Complex	filters	 ✔	 ✔	 ✔	
>	9	TPs	 ✔	 ✔	 ✔	 ✔	
Unbound	
predicates	
✔	
Negation	
OPTIONAL	 ✔	 ✔	 ✔	 ✔	
LIMIT	 ✔	 ✔	 ✔	 ✔		 ✔	 ✔	
ORDER	BY	 ✔	 ✔	 ✔	 ✔	 ✔	 ✔	 ✔	
DISTINCT	 ✔	 ✔	 ✔	 ✔	 ✔	 ✔	 ✔	 ✔	 ✔	
COUNT	 ✔	
REGEX	
UNION	 ✔	 ✔	 ✔	
GROUP	BY	 ✔	
CONSTRUCT	 ✔	 ✔	 ✔	 ✔	 ✔	
Evaluate	(parts	of	the)	query		
on	graphs
SPB	Queries	(1)	
•  Base	and	Advanced	Workloads	
–  Base	Workload:	12	queries	&	update	operations	
–  Advanced	Workload:	24	queries	
•  Workloads	based	on	real	queries	used	by	BBC	journalists	
during	their	editorial	operations	
•  Editorial agents: simulate editorial work performed by journalists:
–  Insert,	Update,	Delete	
•  Aggregation	agents	–	simulate	retrieval	operations	
performed	by	end-users		
SPB	Base	Workload	Queries	(3):	Choke	Points	
#	 CP1	 CP2	 CP3	 CP4	 CP5	 CP6	 CP7	 CP8	 CP9	 CP10	 CP11	
Q1	 ✔	 ✔	 ✔	 ✔	 ✔	
Q2	 ✔	 ✔	 ✔	
Q3	 ✔	 ✔	 ✔	 ✔	 ✔	 ✔	
Q4	 ✔	 ✔	 ✔	 ✔	 ✔	
Q5	 ✔	 ✔	 ✔	 ✔	 ✔	
Q6	 ✔	 ✔	 ✔	 ✔	
Q7	 ✔	 ✔	
Q8	 ✔	 ✔	 ✔	
Q9	 ✔	 ✔	 ✔	
Q10	 ✔	 ✔	 ✔	 ✔	
Q11	 ✔	 ✔	 ✔	 ✔	 ✔	
Q12	 ✔	 ✔	
Reasoning regarding class & property hierarchies
Join Ordering
Ordering & Duplicate Elimination
SPB	Performance	Metrics	
•  SPB Primary Metrics and Query Execution Report (1):
–  Query rate, interactive mix (queries per second)
–  Query rate, analytical mix (queries per second)
–  Update rate (operations per second)
–  Duration of bulk load (in ms)
–  Duration of measurement window (in minutes)
–  # complete analytical mixes (per second)
–  # complete interactive mixes (per second)
–  # complete update operations
•  Query Execution Report (2), per query:
–  Arithmetic mean execution time
–  Minimum execution time
–  90th percentile execution time
–  # executions
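The per-query statistics in the execution report can be derived from the recorded execution times. This is a minimal Python sketch with a hypothetical function name. The 90th percentile uses the nearest-rank method, which is an assumption of this sketch, and the SPB test driver may compute it differently. At least one execution time is required.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def execution_report(times_ms: Sequence[float]) -> Dict[str, float]:
    """Per-query summary: arithmetic mean, minimum, 90th percentile, count.

    The percentile uses the nearest-rank method (an assumption)."""
    ordered = sorted(times_ms)
    rank = math.ceil(len(ordered) * 90 / 100)  # 1-based nearest rank
    return {
        "arithmetic_mean": mean(ordered),
        "minimum": ordered[0],
        "p90": ordered[rank - 1],
        "executions": len(ordered),
    }
```

For ten executions taking 1 ms to 10 ms, the report gives a mean of 5.5 ms, a minimum of 1 ms and a 90th percentile of 9 ms.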
Real	RDF	Benchmarks	
UniProt	[RU09][UniprotKB]	
•  Comprehensive,	high-quality	and	freely	accessible	resource	of	
protein	sequence	and	functional	information	
•  UniProt	Schema	
–  UniProt	Core	Vocabulary,	BIBO	(journals),	ECO	(evidence	
codes),	Dublin	Core	(metadata)	
–  UniProt	Core	Vocabulary:	124	classes,	113	Properties		
•  Dataset	contains	approximately		
–  13	billion	triples	
–  2.5	billion	distinct	subjects	
–  2	billion	distinct	objects	
•  Queries	
–  No	representative	set	of	queries	is	offered.		
–  [NW09]	offers	a	set	of	8	queries	to	test	the	RDF-3X	engine	
	
UniProt	Queries	(1)	[NW09]:	Characteristics	
Characteristic	 Q1	 Q2	 Q3	 Q4	 Q5	 Q6	 Q7	 Q8	
Simple	filters	
Complex	filters	
>	9	TPs	 ✔	 ✔	 ✔	 ✔	 ✔	 ✔	
Unbound	
predicates	
Negation	
OPTIONAL	
LIMIT	
ORDER	BY	
DISTINCT	
REGEX	
UNION	
DESCRIBE	
CONSTRUCT	
ASK	
Join	Ordering	
RDF-3X	aims	at	optimizing	
join	processing	for	RDF	data
UniProt	Queries	(2)	[NW09]:	Choke	Points	
•  Focus on discovering optimal or close-to-optimal join orders
#	 CP1	 CP2	 CP3	 CP4	 CP5	 CP6	 CP7	 CP8	 CP9	 CP10	 CP11	
Q1	 ✔	
Q2	 ✔	
Q3	 ✔	
Q4	 ✔	
Q5	 ✔	
Q6	 ✔	
Q7	 ✔	
Q8	 ✔	
Join Ordering: the most complex query contains 12 joins;
7 queries contain more than 7 joins
YAGO	(Yet	Another	Great	Ontology)[SKW07]	
•  High quality multilingual knowledge base derived from
Wikipedia, WordNet and GeoNames
•  Schema	
–  Wikipedia	Entities,	WordNet	and	GeoNames	Concepts	and	
Relationships:	associates	WordNet	taxonomy	with	Wikipedia	
Category	System		
–  10	million	schema	entities	
•  Dataset	
–  120	million	triples	about	schema	entities	
–  2.625	million	links	to	DBPedia	
•  Queries	
–  No	representative	set	of	queries	is	offered	by	YAGO	
–  [NW10] provides a representative set of 8 queries for RDF-3X
evaluation
YAGO	Queries	(1)	[NW10]:	Characteristics	
•  Simple SELECT queries that focus on join ordering, negation
and duplicate elimination
Characteristic	 A1	 A2	 A3	 B1	 B2	 B3	 C1	 C2	
Simple	filters	 ✔	
Complex	filters	
>	9	TPs	 ✔	
Unbound	
predicates	
Negation	 ✔	 ✔	 ✔	
OPTIONAL	
LIMIT	
ORDER	BY	
DISTINCT	 ✔	 ✔	 ✔	 ✔	 ✔	
REGEX	
UNION	 ✔
YAGO	Queries	(2)	[NW10]:	Choke	Points	
•  Queries focus mostly on discovering optimal or close-to-optimal query
evaluation plans, including negation in filters and duplicate
elimination.
#	 CP1	 CP2	 CP3	 CP4	 CP5	 CP6	 CP7	 CP8	 CP9	 CP10	 CP11	
A1	 ✔	
A2	 ✔	
A3	 ✔	 ✔	 ✔	 ✔	
B1	 ✔	 ✔	 ✔	
B2	 ✔	 ✔	
B3	 ✔	 ✔	 ✔	
C1	 ✔	 ✔	 ✔	 ✔	
C2 ✔ ✔
Join Ordering: the most complex query contains 8 joins;
all queries contain more than 5 joins
Barton	Library	[Barton]	
•  Data	from	the	MIT	Simile	Project	that	develops	tools	for	library	data	
management	
–  contains	records	that	compose	an	RDF-formatted	dump	of	the	MIT	
Libraries	Barton	catalog	
–  converted from raw data stored in an old library format standard
called MARC (Machine-Readable Cataloging).
•  Schema	
–  Common	types	include	Record	and	Item,	the	latter	being	associated	
with	instances	of	type	Person	and	with	instances	of	Description.		
–  Primitive	types	include	Title	and	Date.	
•  Dataset	
–  Approximately	45	million	RDF	triples	
•  Queries	
–  No	representative	queries	provided	with	the	Barton	Library	Dataset	
–  [Abadi07] provides a workload of 7 queries (expressed in SPARQL in [NW10])
Barton	Queries	(1)	[NW10]:	Characteristics	
Characteristic	 Q1	 Q2	 Q3	 Q4	 Q5	 Q6	 Q7	
Simple	filters	 ✔	 ✔	 ✔	 ✔	
Complex	filters	
>	9	TPs	
Unbound	
predicates	
Negation	 ✔	
OPTIONAL	
LIMIT	
ORDER	BY	
DISTINCT	 ✔	 ✔	 ✔	
REGEX	
UNION	 ✔
Barton	Queries	(2)	[NW10]:	Choke	Points	
•  Queries	focus	mostly	on	discovering	optimal	or	close	to	
optimal	query	evaluation	plans,	including	negation	in	filters	
and	duplicate	elimination.	
#	 CP1	 CP2	 CP3	 CP4	 CP5	 CP6	 CP7	 CP8	 CP9	 CP10	 CP11	
Q1	 ✔	
Q2	 ✔	 ✔	
Q3	 ✔	
Q4	 ✔	
Q5	 ✔	 ✔	
Q6	 ✔	 ✔	 ✔	 ✔	
Q7	 ✔	
Join Ordering: the most complex query contains 3 joins
Linked	Sensor	Dataset	[PHS10]	
•  Expressive	descriptions	of	approximately	20,000	weather	
stations	in	the	US	
•  Divided into multiple subsets that reflect weather data for
specific past hurricanes or blizzards (focus on Hurricane
Ike)
•  Schema	
–  Contains information about temperature, precipitation,
pressure, wind speed, humidity
–  Contains	links	to	GeoNames	and	links	to	observations	provided	
by	MesoWest	(meteorological	service	in	the	US)	
•  Dataset		
–  more	than	1	billion	triples	
•  Queries	
–  No	representative	set	of	queries	is	offered.		
WordNet	[WordNet]	
•  Large	lexical	database	of	English,	developed	under	the	
direction	of	George	A.	Miller	(Emeritus).		
•  Schema	
–  Nouns,	verbs,	adjectives	and	adverbs	are	grouped	into	sets	of	
cognitive	synonyms	(synsets),	each	expressing	a	distinct	
concept.		
–  Synsets	are	interlinked	by	means	of	conceptual-semantic	and	
lexical	relations.	The	resulting	network	of	meaningfully	related	
words	and	concepts	can	be	navigated	with	the	browser.	
•  Dataset	
–  Approximately	1.9	million	triples	(300MB).	
•  Queries	
–  No	representative	query	workload	
Publishing	TPC-H	as	RDF	[TPC-H]	
•  Benchmark models decision support systems that examine
large volumes of data, execute queries with a high degree of
complexity, and provide answers to critical business questions
•  Benchmark	provides	a	suite	of	business	oriented	ad-hoc	queries	
and	concurrent	data	modifications	
•  Queries	and	the	data	populating	the	database	have	been	chosen	to	
have	broad	industry-wide	relevance	
•  Use	the	DBGEN	TPC-H	generator	to	generate	a	TPC-H	relational	
dataset		
•  Use the D2R tool or another relational-to-RDF tool to convert the
relational dataset to the equivalent RDF one.
•  TPC	SQL	queries	are	translated	to	equivalent	SPARQL	queries	
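The conversion step can be pictured with a simplified, direct-mapping-style sketch: each row becomes a subject IRI built from the table name and primary key, and each non-NULL column becomes a literal-valued triple. This is not D2R's mapping language; the base IRI is hypothetical, all values become plain literals, and unlike the W3C Direct Mapping it emits foreign keys as literals rather than links.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/tpch/"  # hypothetical base IRI

def escape(value):
    """Escape a value for use inside an N-Triples string literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def row_to_ntriples(table, key_col, row):
    """Map one relational row (a dict of column -> value) to N-Triples lines."""
    subject = f"<{BASE}{table}/{key_col}={row[key_col]}>"
    lines = [f"{subject} <{RDF_TYPE}> <{BASE}{table}> ."]
    for col, value in row.items():
        if value is None:
            continue  # SQL NULLs produce no triple
        lines.append(f'{subject} <{BASE}{table}#{col}> "{escape(value)}" .')
    return lines
```

For example, a row of the TPC-H nation table keyed on n_nationkey yields one rdf:type triple plus one triple per non-NULL column.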
Benchmark	Generators	
DBPedia	SPARQL	Benchmark	(DBSB)	[MLA+14]	
•  Generic	Methodology	for	SPARQL	Benchmark	Creation		
•  Based	on		
–  Flexible	data	generation	that	mimics	an	input	data	source	
–  Query-log	mining		
–  Clustering	of	queries	
–  SPARQL	queries	feature	analysis	
•  Methodology	is	schema	agnostic	
–  Demonstrated	using	DBPedia	KB	
•  Proposed approach applied to various sizes of the DBPedia
Knowledge Base
•  Benchmark	proposes	query	workload	based	on	real	queries	
expressed	against	DBPedia	
DBSB	Data	Generation	(1)	
•  Working	assumptions	
1.  Output dataset should have characteristics similar to those of the
input dataset
•  Number of classes, properties, value distributions,
taxonomic structures (hierarchies)
2.  Varying output dataset sizes
3.  Characteristics such as the in- and out-degree of nodes should be
similar across datasets of varying sizes
4.  Easily	repeatable	data	generation	process	
DBSB	Data	Generation	(2)	
•  Idea	
1.  Large	datasets	produced	by		
•  Duplicating	all	triples	and	changing	their	namespace	
2.  Smaller	datasets	produced	by		
•  Removing	triples	in	a	way	that	would	preserve	the	
properties	of	the	original	graph	
•  Using a seed-based method, based on the assumption that a
representative set of resources is obtained by sampling
across classes
1.  For each selected element in the dataset, its concise
bounded description (CBD) is retrieved and added to the
queue
2.  The process is repeated until the target number of triples is
reached
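The seed-based reduction can be sketched as below, under simplifying assumptions: triples are plain string tuples, class membership comes from `rdf:type` triples, seeds are drawn round-robin across classes, and the CBD is approximated as a resource's outgoing triples plus those of any blank nodes it reaches (the full W3C CBD also covers reifications). Because whole CBDs are added, the result can slightly overshoot the target.

```python
import random
from collections import defaultdict

RDF_TYPE = "rdf:type"  # abbreviated IRI, for this toy representation only

def cbd(graph_by_subject, resource):
    """Simplified concise bounded description: outgoing triples of the
    resource, recursively expanding blank-node objects (prefixed '_:')."""
    out, stack, seen = [], [resource], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for s, p, o in graph_by_subject.get(node, []):
            out.append((s, p, o))
            if o.startswith("_:"):
                stack.append(o)
    return out

def sample_dataset(triples, target, seed=0):
    """Grow a subset of at least `target` triples (or all reachable ones)."""
    rng = random.Random(seed)
    by_subject, by_class = defaultdict(list), defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((s, p, o))
        if p == RDF_TYPE:
            by_class[o].append(s)
    # one shuffled queue of instances per class, consumed round-robin
    queues = [rng.sample(members, len(members)) for members in by_class.values()]
    result = set()
    while len(result) < target and any(queues):
        for q in queues:
            if q and len(result) < target:
                result.update(cbd(by_subject, q.pop()))
    return result
```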
DBSB	Query	Analysis	(1)	
•  Goal	is	to	detect	prototypical	queries	that	were	sent	to	a	
DBPedia	SPARQL	endpoint	using	similarity	measures	
–  String	similarity	and	graph	similarity	
•  Idea:	4-step	query	analysis	and	clustering	approach	
1.  Select	queries	executed	frequently	on	the	input	data	
2.  Strip	common	syntactic	constructs	(namespace,	prefixes)	
3.  Compute	query	similarity	using	string	matching		
4.  Compute	query	clusters	using	a	soft	graph	clustering	
algorithm	
•  Clusters	used	to	devise	the	benchmark	query	generation	
patterns		
DBSB	Query	Analysis	(2)	
•  Query	Selection	
1.  Use DBPedia SPARQL Query log (31.5 million queries in a
3-month period)
2.  Reduce	the	initial	set	of	queries	by	considering	
•  Query	Variations:	use	a	standard	way	to	name	variables	to	
reduce	differences	among	queries	(promoting	query	
constructs	such	as	DISTINCT,	REGEX)	
•  Query	Frequency:	discard	queries	with	low	frequency	since	
they	do	not	contribute	to	the	overall	query	performance	
–  Result:	35,965	queries	
3.  String	Stripping:	remove	all	SPARQL	keywords	and	common	
prefixes	
4.  Similarity	Computation:	compute	the	similarity	of	the	stripped	
queries	
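Query-variation handling and string stripping could look roughly like this: variables are renamed in order of first appearance, then PREFIX declarations and keywords are removed before comparison. The keyword list and regular expressions are illustrative assumptions, not DBSB's actual preprocessing.

```python
import re

# Partial, illustrative keyword list (an assumption, not DBSB's exact list).
KEYWORDS = re.compile(
    r"\b(SELECT|DISTINCT|WHERE|FILTER|OPTIONAL|UNION|LIMIT|OFFSET|ORDER\s+BY|REGEX|LANG|STR)\b",
    re.IGNORECASE,
)
PREFIX_DECL = re.compile(r"PREFIX\s+\w*:\s*<[^>]*>", re.IGNORECASE)
VARIABLE = re.compile(r"\?[A-Za-z_][A-Za-z0-9_]*")

def normalize_variables(query):
    """Rename variables to ?v0, ?v1, ... in order of first appearance.
    Naive: a '?' inside an IRI or string literal would also be rewritten."""
    names = {}
    return VARIABLE.sub(lambda m: names.setdefault(m.group(0), f"?v{len(names)}"), query)

def strip_query(query):
    """Drop PREFIX declarations and SPARQL keywords, then collapse whitespace."""
    q = PREFIX_DECL.sub(" ", query)
    q = KEYWORDS.sub(" ", q)
    return " ".join(q.split())
```

With this normalization, queries that differ only in variable names or prefix declarations collapse to the same string, which helps group them for frequency counting.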
DBSB	Query	Analysis	(3)	
•  Query	Selection	(cont’d)	
4.  Similarity	Computation	
•  To reduce benchmark compilation time, use the LIMES [NS11]
framework
•  Use the Levenshtein string similarity measure with a 0.9 threshold
•  Reduces by 16.6% the number of computations required when
computing the Cartesian product of queries
5.  Clustering	
•  Apply	graph	clustering	to	the	query	similarity	graph	of	(4)	
•  Goal	is	to	identify	similar	groups	of	queries	out	of	which	
prototypical	queries	will	be	generated	
•  Use	BorderFlow	[NS09]	algorithm	that	follows	a	seed-based	
approach	
•  Obtain	12272	clusters,	24%	contain	a	single	query	
•  Select	the	clusters	with	>5	queries	
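A brute-force version of the similarity step is shown below: a normalized Levenshtein similarity (1 − distance / length of the longer string, one common normalization; LIMES may normalize differently) and a similarity graph that keeps pairs at or above the 0.9 threshold. It compares every pair, i.e., exactly the Cartesian product that LIMES is used to prune, so it only suits small query sets. Its edges would be the input to the graph clustering step (BorderFlow is not shown).

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Normalized similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))

def similarity_graph(queries, threshold=0.9):
    """Edges (i, j, sim) between stripped queries whose similarity >= threshold."""
    edges = []
    for i in range(len(queries)):
        for j in range(i + 1, len(queries)):
            s = similarity(queries[i], queries[j])
            if s >= threshold:
                edges.append((i, j, s))
    return edges
```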
DBSB	Query	Generation	(1)	
•  Select	the	most	interesting	SPARQL	queries	
–  Which	are	the	most	frequently	asked	SPARQL	queries	
–  Which	of	those	queries	cover	the	most	SPARQL	features	
•  SPARQL	Features	
–  Overall	number	of	triple	patterns	
•  Test	the	efficiency	of	join	operations	(CP1)	
–  SPARQL	pattern	constructors	(UNION	&	OPTIONAL)	
•  Handle	parallel	execution	of	Unions	(CP5)	
•  Perform	OPTIONALs	as	late	as	possible	in	the	query	plan	(CP3)	
–  Solution	sequences	&	modifiers	(DISTINCT)		
•  Efficiency of duplicate elimination (CP10)
–  Filter conditions and operators (FILTER, LANG, REGEX, STR)
•  Ability of engines to execute filters as early as possible (CP6)
DBSB	Query	Generation	(2)	
•  25	queries	are	selected		
–  For	each	of	the	features,	manually	select	the	part	of	the	query	to	
be	varied	(IRI	or	filter	condition)	
–  Variability	of	query	template(s)	for	the	chosen	values	is	
sufficiently	high	(>=1000	per	query	template)	
Method	ensures	that	
•  Queries executed during the benchmark differ
•  Queries always return non-empty results
Apples	and	Oranges	[DKS+11]	
•  Propose	structuredness	to	characterize	datasets	
–  The level of structuredness of a dataset D with respect to a type (class)
T is determined by how well the instances of T conform to type T
–  If	each	instance	of	T	has	the	properties	defined	in	T,	then	the	dataset	
has	high	structuredness	with	respect	to	T	
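For a single type T, the structuredness measure of [DKS+11] is, as we understand it, the coverage of T in D: the fraction of (instance, property) slots of T that are actually filled.

$$
\mathrm{CV}(T, D) = \frac{\sum_{p \in P(T)} \mathrm{OC}\big(p, I(T, D)\big)}{|P(T)| \cdot |I(T, D)|}
$$

Here P(T) is the set of properties of T, I(T, D) the set of instances of T in D, and OC(p, I(T, D)) the number of those instances that have at least one value for p. A value of 1 means every instance has every property; per-type values are combined into a dataset-level coherence CH by a weighted average. For instance, in the six-person table of the Counting Coins example the occurrence counts (name 6, office 2, ext 3, major 1, GPA 3) sum to 15, giving 15 / (5 · 6) = 0.5.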
[Figure: bar charts of OC(p, I(T, D)), the number of instances of T that have property p, for each property p of T (name, office, ext, major, GPA)]
Less structured dataset
•  all instances have the name attribute
•  ext & GPA properties encountered in 50% of the instances
•  office property found in 20% of the instances
•  major property in 10% of the instances
Highly structured dataset
•  all instances have all attributes
Apples	and	Oranges	[DKS+11]	
•  Structuredness is one of the key considerations when deciding on:
–  appropriate	data	representation	format	(e.g.,	relational	for	
structured	and	XML	for	semi-structured	data)	
–  organization	of	data	(e.g.,	dependency	theory	and	normal	forms	
for	the	relational	model,	and	XML).	
–  data	indexes	(e.g.,	B+-tree	indexes	for	relational	and	numbering	
scheme-based	indexes	for	XML).	
–  data	querying	(e.g.,	using	SQL	for	the	relational	and	XPath/
XQuery	for	XML).	
In	other	words,	structuredness	permeates	
every	aspect	of	data	management
Apples	and	Oranges	[DKS+11]	
[Figure: structuredness (0.0 to 1.0) of synthetic and real datasets, from highly structured (relational-like) to less structured datasets]
Apples	and	Oranges	[DKS+11]	
Some	important	observations:	
•  Since	TPC-H	is	a	relational	
dataset,	it	should	have	high	
structuredness.	
•  There is a difference between
synthetic and real datasets.
•  Synthetic datasets are fairly structured
and relational-like
•  Real datasets cover the whole
spectrum of structuredness.
[Figure: structuredness of datasets]
Existing	RDF	stores	are	tested	and	
compared	against	each	other	with	respect	
to	datasets	that	are	not	representative	of	
most	real	RDF	data.
Apples	and	Oranges	[DKS+11]	
•  Nothing	can	better	represent	data	than	the	data	itself!	
•  Idea:	Turn	every	dataset	into	a	benchmark	
1.  No	need	to	synthetically	generate	values	
•  Use	the	actual	data	values	in	the	dataset	
2.  No	need	to	synthetically	generate	queries.		
•  The queries that are known to run on your data can be
used in the benchmark.
3.  But	we	need	to	cover	the	structuredness	spectrum		
•  to	get	data	as	close	as	possible	to	the	real	world	data	
•  to	see	how	the	systems	perform	when	data	goes	from	
very	structured	to	less	structured	
Counting	Coins	[DKS+11]	
•  Start	with	a	dataset	with	size	S	and	CH	=	0.5	
•  Aim	for	a	dataset	with	size	S’	and	CH’,		
where	S	>	S’	and	CH	>	CH’.	
Process:		
•  Assign a coin to each triple (s, p, o) and compute the
impact on CH of its removal
–  The removal reduces the size by 1.
Example:	Consider	(person1,	ext,	x5304).	Removing	
the	triple	from	D	gives	a	dataset	with	CH(T,	D)	=	0.467.	
Therefore	the	coin(person1,	ext,	x5304)	=	0.5	–	0.467	=	
0.033.	
•  Formulate	(automatically)	an	integer	programming	
problem	whose	solutions	will	tell	us	how	many	coins	to	
remove	to	achieve	the	desired	coherence	CH’	and	size	S’.		
subject	 predicate	 object	
person0	 name	 Eric	
person0	 office	 BA7430	
person0	 ext	 x4401	
person1	 name	 Kenny	
person1	 office		 BA7349	
person1	 office	 BA5439	
person1	 ext	 x5304	
person2	 name	 Kyle	
person2	 ext	 x6281	
person3	 name	 Timmy	
person3	 major	 C.S.	
person3	 GPA	 3.4	
person4	 name	 Stan	
person4	 GPA	 3.8	
person5	 name	 Jimmy	
person5	 GPA	 3.7	
One	of	the	few	occasions	in	life	where	having	
too	many	coins	is	undesirable…
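The coin in the example can be checked with a few lines of code. The sketch assumes a single type T whose instances are exactly the subjects in the triple list, and computes coverage from instance counts per property, so person1's two office triples count only once.

```python
from collections import defaultdict

def coverage(triples, properties):
    """CV(T, D) for a single type: sum of OC(p) over |P(T)| * |I(T, D)|."""
    has = defaultdict(set)  # property -> instances that have it
    instances = set()
    for s, p, _ in triples:
        instances.add(s)
        has[p].add(s)
    oc_sum = sum(len(has[p]) for p in properties)
    return oc_sum / (len(properties) * len(instances))

def coin(triples, properties, triple):
    """Drop in coverage caused by removing one triple."""
    rest = [t for t in triples if t != triple]
    return coverage(triples, properties) - coverage(rest, properties)

PEOPLE = [
    ("person0", "name", "Eric"), ("person0", "office", "BA7430"), ("person0", "ext", "x4401"),
    ("person1", "name", "Kenny"), ("person1", "office", "BA7349"),
    ("person1", "office", "BA5439"), ("person1", "ext", "x5304"),
    ("person2", "name", "Kyle"), ("person2", "ext", "x6281"),
    ("person3", "name", "Timmy"), ("person3", "major", "C.S."), ("person3", "GPA", "3.4"),
    ("person4", "name", "Stan"), ("person4", "GPA", "3.8"),
    ("person5", "name", "Jimmy"), ("person5", "GPA", "3.7"),
]
PROPS = ["name", "office", "ext", "major", "GPA"]

print(round(coin(PEOPLE, PROPS, ("person1", "ext", "x5304")), 3))  # 0.033
```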
Technical	challenges	in	problem	formulation	
•  Compute coins which represent the impact on structuredness
of removing all triples whose subjects are instances of a
type T and whose property is p
–  Therefore, one coin for each type/property combination.
•  Add constraints that set lower and upper bounds on the
number of coins that can be removed so as not to completely
remove a property from a type.
•  Add constraints which guarantee that not all instances of a
type are removed.
•  To deal with multi-valued properties, add constraints that
introduce a relaxation parameter ρ
–  required because the formulation approximates using the average
number of triples per coin.
Waterloo	SPARQL	Diversity	Test	Suite	[AHO+14]	
•  Stress	existing	RDF	engines	to	reveal	a	wider	range	of	query	
requirements	as	established	by	web	applications	
•  Contributions	
–  Definition	of	2	classes	of	query	features	used	to	evaluate	
the	variability	of	workloads	and	datasets		
•  Structural	(e.g.,	number	of	triple	patterns)	
•  	Data-driven	(affect	selectivity	and	result	cardinality)	
–  In-depth	analysis	of	existing	SPARQL	benchmarks	using	
the	structural	and	data-driven	features		
–  WatDiv	Test	Suite	to	stress	existing	RDF	engines	to	reveal	a	
wider	range	of	query	requirements	
WatDiv	Structural	Features	(1)	
1.  Triple	Pattern	Count	
–  Number	of	triple	patterns	in	SPARQL	Graph	Patterns	
2.  Join	Vertex	Count	
–  Number	of	RDF	terms	(IRIs,	literals,	blank	nodes)	and	variables	that	
are	subjects	or	objects	of	multiple	triple	patterns		
3.  Join	Vertex	Degree	
–  The	degree	of	a	join	vertex	v	is	the	number	of	triple	patterns	whose	
subject	or	object	is	v	
SP2Bench	Q5a	
SELECT	DISTINCT	?person	?name	
WHERE	{		
																			?article	rdf:type	bench:Article.	
																			?article	dc:creator	?person.	
																			?inproc	rdf:type	bench:Inproceedings.	
																			?inproc	dc:creator	?person2.	
																			?person	foaf:name	?name.	
																			?person2	foaf:name	?name2	
																			FILTER(?name=?name2)	}	
Triple Count | Join Vertices | Join Vertex Count | Join Vertex Degree
6 | ?article, ?inproc, ?person, ?person2 | 4 | ?article:2, ?inproc:2, ?person:2, ?person2:2
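Triple pattern count, join vertex count and join vertex degree can be computed directly from a BGP. The sketch below treats a BGP as a list of (subject, predicate, object) tuples (no SPARQL parsing), ignores FILTER expressions, and counts a pattern once per vertex even if the vertex is both its subject and object.

```python
from collections import Counter

def structural_features(bgp):
    """Return (triple pattern count, join vertex count, {join vertex: degree}).

    bgp: list of (subject, predicate, object) triple patterns.
    """
    degree = Counter()
    for s, _, o in bgp:
        # a pattern is incident on v if v is its subject or object;
        # the set avoids counting a pattern twice when s == o
        for v in {s, o}:
            degree[v] += 1
    join_vertices = {v: d for v, d in degree.items() if d > 1}
    return len(bgp), len(join_vertices), join_vertices
```

Applied to the six triple patterns of SP2Bench Q5a, it reports 6 triple patterns and 4 join vertices (?article, ?inproc, ?person, ?person2), each of degree 2.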
WatDiv	Structural	Features	(2)	
•  Join	Vertex	Degree	&	Count	provide	a	good	characterization	of	
the	structural	complexity	of	a	query	
–  Number of triple patterns does not properly characterize the
query: two queries with the same number of triple patterns can have
different structures
[Figure: example query shapes: linear query, snowflake query, star query]
WatDiv Structural Features (3)
•  Join Vertex Type
–  Plays an important role in how RDF engines determine
efficient query plans
•  E.g., star queries promote efficient merge joins
•  3 (mutually non-exclusive) types of join vertices
–  Vertex x is of type SS+ if x is the subject of all triple patterns (s,p,o)* incident on x
–  Vertex x is of type OO+ if x is the object of all triple patterns (s,p,o)* incident on x
–  Vertex x is of type SO+ if there exist triple patterns (s,p,o) and (s',p',o')* such that x = s and x = o'
[Figure: examples of join vertex types: ?m of type SS+, ?x of type OO+, ?x of type SO+]
*Triple patterns (s,p,o) incident on x
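A sketch that labels join vertices with the three (possibly overlapping) types, reading the definitions above literally; BGPs are again plain (s, p, o) tuples.

```python
from collections import defaultdict

def join_vertex_types(bgp):
    """Map each join vertex to the set of its types among SS+, OO+, SO+."""
    roles = defaultdict(lambda: [0, 0, 0])  # [incident patterns, as subject, as object]
    for s, _, o in bgp:
        for v in {s, o}:
            roles[v][0] += 1
        roles[s][1] += 1
        roles[o][2] += 1
    types = {}
    for v, (incident, subj, obj) in roles.items():
        if incident < 2:
            continue  # not a join vertex
        kinds = set()
        if subj == incident:
            kinds.add("SS+")  # subject of every incident pattern
        if obj == incident:
            kinds.add("OO+")  # object of every incident pattern
        if subj and obj:
            kinds.add("SO+")  # subject of one pattern, object of another
        types[v] = kinds
    return types
```

For SP2Bench Q5a, ?article and ?inproc come out as SS+, while ?person and ?person2 are SO+, since each is the object of a dc:creator pattern and the subject of a foaf:name pattern.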
WatDiv	Data-driven	Features	(1)	
•  A system's choice of the most efficient query plan depends on
–  (a) the characteristics of the dataset and
–  (b) the query
•  If the system relies on selectivity estimations and result
cardinality, the same query may have different query plans for
datasets of different sizes
•  Different cases:
–  Queries have a diverse mix of result cardinalities
–  Some triple patterns are very selective, others are not
–  All triple patterns are equally selective
WatDiv	Data-driven	Features	(2)	
•  Result	Cardinality	CARD(Ā,G)	
–  the	number	of	solutions	in	the	result	of	the	evaluation	of	a	graph	
pattern	Ā = <A, F> over	graph	G	
•  Filter Triple Pattern Selectivity (f-TP Selectivity) $\mathrm{SEL}^G_F(tp)$
–  the ratio of the number of distinct solution mappings of a triple pattern tp to
the number of triples in graph G
•  Measures	
1.  Result	cardinality	
2.  Mean	&	standard	deviation	of	f-TP	selectivities	of	triple	
patterns	
•  Important	for	distinguishing	queries	whose	triple	patterns	are	
almost	equally	selective	from	queries	with	varying	f-TP	
selectivities	
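The f-TP selectivity and the two summary measures can be sketched over a toy in-memory graph of string triples. Variables start with '?', a repeated variable must bind consistently, and the filter F is modelled as an optional Python predicate over a solution mapping; these representation choices are assumptions for illustration.

```python
from statistics import mean, pstdev

def match(tp, triple):
    """Solution mapping of triple pattern tp on one triple, or None if no match."""
    binding = {}
    for pattern_term, term in zip(tp, triple):
        if pattern_term.startswith("?"):
            # a repeated variable must bind to the same term every time
            if binding.setdefault(pattern_term, term) != term:
                return None
        elif pattern_term != term:
            return None
    return frozenset(binding.items())

def f_tp_selectivity(tp, graph, keep=lambda mapping: True):
    """Distinct (filtered) solution mappings of tp divided by |graph|."""
    mappings = {m for m in (match(tp, t) for t in graph) if m is not None}
    kept = [m for m in mappings if keep(dict(m))]
    return len(kept) / len(graph)

def selectivity_stats(tps, graph):
    """Mean and population standard deviation of f-TP selectivities."""
    values = [f_tp_selectivity(tp, graph) for tp in tps]
    return mean(values), pstdev(values)
```

A low standard deviation signals triple patterns that are almost equally selective; a high one signals a mix of selective and unselective patterns.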
WatDiv	Data-driven	Features	(3)	
•  Result	Cardinality	&	f-TP	selectivity	are	not	sufficient	
–  Some intermediate solution mappings do not make it to the final
result (e.g., due to filters or more restrictive joins)
–  The overall selectivity of a graph pattern can be
determined by a single very selective triple pattern
•  Run-time optimization techniques (e.g., sideways
information passing) can prune such intermediate results early
•  Introduce	2	features	to	capture	above	cases	
1.  BGP-Restricted	f-TP	selectivity		
2.  Join-Restricted	f-TP	selectivity	
WatDiv	Data-Driven	Features	(4)	
•  BGP-Restricted f-TP selectivity $\mathrm{SEL}^G_F(tp \mid \bar{A})$
–  assesses how much a triple pattern contributes to the overall
selectiveness of the query
–  fraction of distinct solution mappings for a triple pattern that
are compatible with some solution mapping in the query result.
•  Join-Restricted f-TP selectivity $\mathrm{SEL}^G_F(tp \mid x)$
–  assesses how much a filtered triple pattern contributes to the
overall selectiveness of the joins that it participates in
–  for a join vertex x and a triple pattern tp incident on x, the
x-restricted f-TP selectivity of tp over graph G is the fraction of
distinct solution mappings of tp that are compatible with a solution
mapping in the result of the sub-query containing all triple patterns
incident on x
WatDiv	Test	Suite	(1)		
•  Components:	Data	Generator	and	Query	Generator	
•  Data	Generator	
–  Allows	users	to	define	their	own	dataset	controlling	
•  Entities	to	include	
•  Topology	of	the	graphs	allowing	one	to	mimic	the	real	types	
of	data	distributions	in	the	Web	
– «well-structuredness»	of	entities	
– probability	of	entity	associations	
– cardinality	of	property	associations	
–  Important: instances of the same entity type do not necessarily have the same
set of attributes, breaking the «relational nature» of previous
RDF benchmarks
WatDiv	Test	Suite	(2)		
•  Query	Template	Generator	
–  User-specified	number	of	templates		
–  User-specified template characteristics
•  Number	of	triple	patterns	
•  Types	of	joins	and	filters	in	the	triple	patterns	
–  Traverses	the	WatDiv	schema	using	a	random	walk	and	
generates	a	set	of	query	templates	
•  Query	Generator	
–  Instantiates	the	query	templates	with	terms	(IRIs,	literals	etc.)	
from	the	RDF	dataset	
–  User-specified	number	of	queries	produced	
WatDiv	Test	Suite	(3)	
•  Query	Template	Generator	
–  Random	Walk	on	an	internal	representation	of	the	schema		
•  Entity	types	in	the	schema	correspond	to	graph	vertices	
•  Relationships	(i.e.,	object	type	properties)	are	graph	edges	
•  Vertices	are	annotated	with	data	type	properties	(i.e.,	
attributes)	
–  Produces a set of Basic Graph Patterns with at most n triple
patterns, with unbound subjects and objects
–  k	uniformly	randomly	selected	subjects/objects	are	replaced	
with	placeholders	
–  Placeholders	are	replaced	with	actual	RDF	terms	randomly	
retrieved	from	the	dataset	
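A simplified sketch of template generation and instantiation: it walks a hypothetical toy schema, emits one triple pattern per traversed edge with fresh variables, and then binds k uniformly chosen variables to random instances of their entity type. Unlike WatDiv's generator, this walk only produces linear (path-shaped) patterns, and it skips the explicit placeholder stage by binding directly.

```python
import random

# Hypothetical toy schema: entity types are vertices, object properties are edges.
SCHEMA = {
    "User": [("follows", "User"), ("likes", "Product")],
    "Product": [("hasGenre", "Genre")],
    "Genre": [],
}

def random_walk_template(schema, start_type, n, rng):
    """Walk the schema graph for at most n steps, one triple pattern per edge."""
    patterns, var_types = [], {"?v0": start_type}
    current, current_type = "?v0", start_type
    for _ in range(n):
        edges = schema[current_type]
        if not edges:
            break  # dead end in the schema graph
        prop, target_type = rng.choice(edges)
        nxt = f"?v{len(var_types)}"
        var_types[nxt] = target_type
        patterns.append((current, prop, nxt))
        current, current_type = nxt, target_type
    return patterns, var_types

def instantiate(patterns, var_types, instances, k, rng):
    """Bind k uniformly chosen variables to random instances of their entity type."""
    chosen = rng.sample(sorted(var_types), min(k, len(var_types)))
    binding = {v: rng.choice(instances[var_types[v]]) for v in chosen}
    return [tuple(binding.get(term, term) for term in tp) for tp in patterns]
```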
Comparison	of	WatDiv	with	other	RDF	Benchmarks	
Copyright	[AHO+14]	
•  Query	Workload	
–  Large	range	of	queries	
•  Mean join vertex degree distributed between 2 and 10
–  Join	Vertex	Types:		
•  18%	of	queries	are	star	joins,	4.4%	in	DBSB	
•  61.3%	of	queries	are	path	queries,	5.4%	in	DBSB
Comparison	of	WatDiv	with	other	RDF	Benchmarks	
Copyright	[AHO+14]	
•  Data-Driven	Features	
–  DBSB	and	BSBM	cover	the	ends	of	the	spectrum	of	mean	Join-Restricted	f-
TP	selectivity	values	
–  WatDiv	covers	the	full	spectrum	of	Restricted	f-TP	selectivity	values	
–  WatDiv	covers	a	lower	range	of	values	for	mean	f-TP	selectivity	when	
compared	to	DBSB	
	
General	Remarks	
•  comparable	to	DBSB		
•  more	diverse	than	LUBM,	SP2Bench	and	BSBM
FEASIBLE	[SNM15]		
•  Proposes	a	feature-based	benchmark	generation	approach	
from	real	queries	
–  Structure-based	
–  Data-driven
•  Approach	is	similar	to	WatDiv	Test	Suite	
•  Novel	sampling	approach	for	queries	based	on	exemplars	and	
medoids	
•  Supports SELECT, ASK, CONSTRUCT and DESCRIBE SPARQL
queries
FEASIBLE	Query	Features	
•  Number	of	Triple	Patterns	
•  Number	of	Join	Vertices		
–  Distinguishing	between	«star»,	«path»	,	«hybrid»	and	«sink»	
vertices	
•  Join	Vertex	Degree	
–  Sum	of	incoming	and	outgoing	edges	of	the	vertex	
•  Triple	Pattern	Selectivity	
–  Ratio	of	triples	that	match	the	triple	pattern	over	all	triples	in	
the	dataset	
[Figure: join vertex kinds: star vertex x, path vertex x, hybrid vertex x, sink vertex x]
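The four vertex kinds can be derived from in- and out-degrees. The rules below are our reading of the figure (star: only outgoing edges; sink: only incoming edges; path: exactly one of each; hybrid: any other mix), with the join vertex degree taken as the sum of incoming and outgoing edges, as stated above.

```python
from collections import Counter

def classify_join_vertices(bgp):
    """Label each join vertex of a BGP as star, sink, path or hybrid.

    bgp: list of (subject, predicate, object) triple patterns; each pattern
    is an edge from its subject to its object.
    """
    out_deg, in_deg = Counter(), Counter()
    for s, _, o in bgp:
        out_deg[s] += 1
        in_deg[o] += 1
    kinds = {}
    for v in set(out_deg) | set(in_deg):
        n_in, n_out = in_deg[v], out_deg[v]
        if n_in + n_out < 2:        # join vertex degree = incoming + outgoing
            continue                # not a join vertex
        if n_in == 0:
            kinds[v] = "star"       # only outgoing edges
        elif n_out == 0:
            kinds[v] = "sink"       # only incoming edges
        elif n_in == 1 and n_out == 1:
            kinds[v] = "path"       # one edge in, one edge out
        else:
            kinds[v] = "hybrid"     # any other mix
    return kinds
```

On SP2Bench Q5a this labels ?article and ?inproc as star vertices and ?person and ?person2 as path vertices.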
FEASIBLE	Benchmark	Generation	
•  3-step	benchmark	generation	
•  Dataset Cleaning
–  Leads	to	practically	reliable	benchmarks	
•  Normalization	of	Feature	Vectors	
–  Query	selection	process	requires	distances	between	queries	to	
be	computed	
–  Normalize	the	query	representations	so	that	all	queries	are	in	a	
unit	hypercube	
•  Query	Selection	
–  Based	on	the	idea	of	exemplars	[NS11]	
FEASIBLE	Benchmark	Generation	
•  Dataset	Cleaning	
–  Remove	erroneous	and	zero-result	queries	from	the	set	of	real	
queries	used	to	generate	the	benchmark	
–  Exclude	all	syntactically	incorrect	queries	
–  Attach 9 SPARQL operators (UNION, DISTINCT, OPTIONAL, ...)
and 7 query features (join vertices, join vertex count, etc.) to
each of the queries
FEASIBLE	Benchmark	Generation	
•  Normalization	of	Feature	Vectors	
–  Queries	are	mapped	to	a	vector	of	length	16	which	stores	the	
query	features	
•  For binary SPARQL clauses (e.g., UNION is either used or not
used), store 1 if the clause is used, else 0
•  All non-binary features are normalized by dividing
their value by the maximum value of that feature in the dataset
•  As a result, all query representations have values between 0
and 1
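A minimal sketch of the normalization step, assuming binary clause flags are already 0/1 and that each non-binary feature is divided by its own maximum over the query set (a feature that is always 0 stays 0).

```python
def normalize_features(vectors, binary_idx):
    """Scale query feature vectors into the unit hypercube.

    vectors: equal-length lists of numeric features, one per query
    binary_idx: indices of 0/1 clause flags (e.g., UNION used or not)
    """
    dims = len(vectors[0])
    maxima = [max(v[i] for v in vectors) for i in range(dims)]
    return [
        [
            float(v[i]) if i in binary_idx                  # flags are already 0 or 1
            else (v[i] / maxima[i] if maxima[i] else 0.0)   # divide by feature maximum
            for i in range(dims)
        ]
        for v in vectors
    ]
```

Putting every dimension on the same [0, 1] scale keeps large-valued features, such as the number of triple patterns, from dominating the distances used during query selection.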
 
Preserving Public Government Information: The End of Term Web Archive
Preserving Public Government Information: The End of Term Web ArchivePreserving Public Government Information: The End of Term Web Archive
Preserving Public Government Information: The End of Term Web Archive
 
The Future of Metadata Management & Making Library Collections Discoverable o...
The Future of Metadata Management & Making Library Collections Discoverable o...The Future of Metadata Management & Making Library Collections Discoverable o...
The Future of Metadata Management & Making Library Collections Discoverable o...
 
Consuming Linked Data by Humans - WWW2010
Consuming Linked Data by Humans - WWW2010Consuming Linked Data by Humans - WWW2010
Consuming Linked Data by Humans - WWW2010
 
NISO/DCMI Webinar: Cooperative Authority Control: The Virtual International A...
NISO/DCMI Webinar: Cooperative Authority Control: The Virtual International A...NISO/DCMI Webinar: Cooperative Authority Control: The Virtual International A...
NISO/DCMI Webinar: Cooperative Authority Control: The Virtual International A...
 
Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?
 

Viewers also liked

NoSQL and Triple Stores
NoSQL and Triple StoresNoSQL and Triple Stores
NoSQL and Triple Storesandyseaborne
 
8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...
8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...
8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...LDBC council
 
Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Coll...
Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Coll...Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Coll...
Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Coll...Paulo Pinheiro
 
Contextual Data Collection for Smart Cities
Contextual Data Collection for Smart CitiesContextual Data Collection for Smart Cities
Contextual Data Collection for Smart CitiesHenrique O. Santos
 
Web sémantique, Web de données, Web 3.0, Linked Data... Quelques repères pour...
Web sémantique, Web de données, Web 3.0, Linked Data... Quelques repères pour...Web sémantique, Web de données, Web 3.0, Linked Data... Quelques repères pour...
Web sémantique, Web de données, Web 3.0, Linked Data... Quelques repères pour...Antidot
 
Semantically enhanced quality assurance in the jurion business use case
Semantically enhanced quality assurance in the jurion  business use caseSemantically enhanced quality assurance in the jurion  business use case
Semantically enhanced quality assurance in the jurion business use caseDimitris Kontokostas
 
RMLEditor: A Graph-based Mapping Editor for Linked Data Mappings
RMLEditor: A Graph-based Mapping Editor for Linked Data MappingsRMLEditor: A Graph-based Mapping Editor for Linked Data Mappings
RMLEditor: A Graph-based Mapping Editor for Linked Data MappingsPieter Heyvaert
 
TEDx Navesink 2015: to be AND not to be - Quantum Intelligence
TEDx Navesink 2015: to be AND not to be - Quantum IntelligenceTEDx Navesink 2015: to be AND not to be - Quantum Intelligence
TEDx Navesink 2015: to be AND not to be - Quantum IntelligenceLora Aroyo
 
CrowdTruth @VU Faculty Colloquium (June 2015)
CrowdTruth @VU Faculty Colloquium (June 2015)CrowdTruth @VU Faculty Colloquium (June 2015)
CrowdTruth @VU Faculty Colloquium (June 2015)Lora Aroyo
 
Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr...
Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr...Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr...
Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr...Lora Aroyo
 
Semantic Support for Complex Ecosystem Research Environments
Semantic Support for Complex Ecosystem Research EnvironmentsSemantic Support for Complex Ecosystem Research Environments
Semantic Support for Complex Ecosystem Research EnvironmentsHenrique O. Santos
 

Viewers also liked (16)

thesis
thesisthesis
thesis
 
Triple Stores
Triple StoresTriple Stores
Triple Stores
 
NoSQL and Triple Stores
NoSQL and Triple StoresNoSQL and Triple Stores
NoSQL and Triple Stores
 
8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...
8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...
8th TUC Meeting - Juan Sequeda (Capsenta). Integrating Data using Graphs and ...
 
cyclades eswc2016
cyclades eswc2016cyclades eswc2016
cyclades eswc2016
 
VOLT - ESWC 2016
VOLT - ESWC 2016VOLT - ESWC 2016
VOLT - ESWC 2016
 
Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Coll...
Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Coll...Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Coll...
Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Coll...
 
Contextual Data Collection for Smart Cities
Contextual Data Collection for Smart CitiesContextual Data Collection for Smart Cities
Contextual Data Collection for Smart Cities
 
Web sémantique, Web de données, Web 3.0, Linked Data... Quelques repères pour...
Web sémantique, Web de données, Web 3.0, Linked Data... Quelques repères pour...Web sémantique, Web de données, Web 3.0, Linked Data... Quelques repères pour...
Web sémantique, Web de données, Web 3.0, Linked Data... Quelques repères pour...
 
Semantically enhanced quality assurance in the jurion business use case
Semantically enhanced quality assurance in the jurion  business use caseSemantically enhanced quality assurance in the jurion  business use case
Semantically enhanced quality assurance in the jurion business use case
 
RMLEditor: A Graph-based Mapping Editor for Linked Data Mappings
RMLEditor: A Graph-based Mapping Editor for Linked Data MappingsRMLEditor: A Graph-based Mapping Editor for Linked Data Mappings
RMLEditor: A Graph-based Mapping Editor for Linked Data Mappings
 
TEDx Navesink 2015: to be AND not to be - Quantum Intelligence
TEDx Navesink 2015: to be AND not to be - Quantum IntelligenceTEDx Navesink 2015: to be AND not to be - Quantum Intelligence
TEDx Navesink 2015: to be AND not to be - Quantum Intelligence
 
CrowdTruth @VU Faculty Colloquium (June 2015)
CrowdTruth @VU Faculty Colloquium (June 2015)CrowdTruth @VU Faculty Colloquium (June 2015)
CrowdTruth @VU Faculty Colloquium (June 2015)
 
Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr...
Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr...Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr...
Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr...
 
Wither OWL
Wither OWLWither OWL
Wither OWL
 
Semantic Support for Complex Ecosystem Research Environments
Semantic Support for Complex Ecosystem Research EnvironmentsSemantic Support for Complex Ecosystem Research Environments
Semantic Support for Complex Ecosystem Research Environments
 

Similar to Assessing the performance of RDF Engines: Discussing RDF Benchmarks

Marc and beyond: 3 Linked Data Choices
 Marc and beyond: 3 Linked Data Choices  Marc and beyond: 3 Linked Data Choices
Marc and beyond: 3 Linked Data Choices Richard Wallis
 
Introduction to APIs and Linked Data
Introduction to APIs and Linked DataIntroduction to APIs and Linked Data
Introduction to APIs and Linked DataAdrian Stevenson
 
Питер Мика "Making the web searchable"
Питер Мика "Making the web searchable"Питер Мика "Making the web searchable"
Питер Мика "Making the web searchable"Yandex
 
Smart Crawler for Efficient Deep-Web Harvesting
Smart Crawler for Efficient Deep-Web HarvestingSmart Crawler for Efficient Deep-Web Harvesting
Smart Crawler for Efficient Deep-Web Harvestingpaperpublications3
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notesAnandh Arumugakan
 
Linked Open Data for Cultural Heritage
Linked Open Data for Cultural HeritageLinked Open Data for Cultural Heritage
Linked Open Data for Cultural HeritageNoreen Whysel
 
Linked Data, Library Users, and the Discovery Tools of the Future
Linked Data, Library Users, and the Discovery Tools of the FutureLinked Data, Library Users, and the Discovery Tools of the Future
Linked Data, Library Users, and the Discovery Tools of the FutureEmily Nimsakont
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and TechniquesBernhard Haslhofer
 
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
Persistent Identifiers and the Web: The Need for an Unambiguous MappingPersistent Identifiers and the Web: The Need for an Unambiguous Mapping
Persistent Identifiers and the Web: The Need for an Unambiguous MappingHerbert Van de Sompel
 
Semantic Search overview at SSSW 2012
Semantic Search overview at SSSW 2012Semantic Search overview at SSSW 2012
Semantic Search overview at SSSW 2012Peter Mika
 
Widening the limits of cognitive reception with online digital library graph ...
Widening the limits of cognitive reception with online digital library graph ...Widening the limits of cognitive reception with online digital library graph ...
Widening the limits of cognitive reception with online digital library graph ...Marton Nemeth
 
Nemeth Marton - Widening the limits of cognitive reception with online digita...
Nemeth Marton - Widening the limits of cognitive reception with online digita...Nemeth Marton - Widening the limits of cognitive reception with online digita...
Nemeth Marton - Widening the limits of cognitive reception with online digita...BOBCATSSS 2017
 
Linked Data: Uses and Users
Linked Data: Uses and UsersLinked Data: Uses and Users
Linked Data: Uses and UsersGretchen Gueguen
 

Similar to Assessing the performance of RDF Engines: Discussing RDF Benchmarks (20)

Marc and beyond: 3 Linked Data Choices
 Marc and beyond: 3 Linked Data Choices  Marc and beyond: 3 Linked Data Choices
Marc and beyond: 3 Linked Data Choices
 
Introduction to APIs and Linked Data
Introduction to APIs and Linked DataIntroduction to APIs and Linked Data
Introduction to APIs and Linked Data
 
Питер Мика "Making the web searchable"
Питер Мика "Making the web searchable"Питер Мика "Making the web searchable"
Питер Мика "Making the web searchable"
 
Linked (Open) Data
Linked (Open) DataLinked (Open) Data
Linked (Open) Data
 
Smart Crawler for Efficient Deep-Web Harvesting
Smart Crawler for Efficient Deep-Web HarvestingSmart Crawler for Efficient Deep-Web Harvesting
Smart Crawler for Efficient Deep-Web Harvesting
 
LKG Editor Dev
LKG Editor DevLKG Editor Dev
LKG Editor Dev
 
Semantic web
Semantic webSemantic web
Semantic web
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
 
Linked Data
Linked DataLinked Data
Linked Data
 
Linked Open Data for Cultural Heritage
Linked Open Data for Cultural HeritageLinked Open Data for Cultural Heritage
Linked Open Data for Cultural Heritage
 
Semantic web
Semantic webSemantic web
Semantic web
 
Linked Data, Library Users, and the Discovery Tools of the Future
Linked Data, Library Users, and the Discovery Tools of the FutureLinked Data, Library Users, and the Discovery Tools of the Future
Linked Data, Library Users, and the Discovery Tools of the Future
 
ODSC and iRODS
ODSC and iRODSODSC and iRODS
ODSC and iRODS
 
Linked Data Basics
Linked Data BasicsLinked Data Basics
Linked Data Basics
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and Techniques
 
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
Persistent Identifiers and the Web: The Need for an Unambiguous MappingPersistent Identifiers and the Web: The Need for an Unambiguous Mapping
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
 
Semantic Search overview at SSSW 2012
Semantic Search overview at SSSW 2012Semantic Search overview at SSSW 2012
Semantic Search overview at SSSW 2012
 
Widening the limits of cognitive reception with online digital library graph ...
Widening the limits of cognitive reception with online digital library graph ...Widening the limits of cognitive reception with online digital library graph ...
Widening the limits of cognitive reception with online digital library graph ...
 
Nemeth Marton - Widening the limits of cognitive reception with online digita...
Nemeth Marton - Widening the limits of cognitive reception with online digita...Nemeth Marton - Widening the limits of cognitive reception with online digita...
Nemeth Marton - Widening the limits of cognitive reception with online digita...
 
Linked Data: Uses and Users
Linked Data: Uses and UsersLinked Data: Uses and Users
Linked Data: Uses and Users
 

More from Holistic Benchmarking of Big Linked Data

EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...Holistic Benchmarking of Big Linked Data
 
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...Holistic Benchmarking of Big Linked Data
 
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...Holistic Benchmarking of Big Linked Data
 
Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
 Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F... Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...Holistic Benchmarking of Big Linked Data
 
Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
Introducing the HOBBIT platform into the Ontology Alignment Evaluation CampaignIntroducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
Introducing the HOBBIT platform into the Ontology Alignment Evaluation CampaignHolistic Benchmarking of Big Linked Data
 

More from Holistic Benchmarking of Big Linked Data (20)

EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
 
Benchmarking Big Linked Data: The case of the HOBBIT Project
Benchmarking Big Linked Data: The case of the HOBBIT ProjectBenchmarking Big Linked Data: The case of the HOBBIT Project
Benchmarking Big Linked Data: The case of the HOBBIT Project
 
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
Assessing Linked Data Versioning Systems: The Semantic Publishing Versioning ...
 
The DEBS Grand Challenge 2018
The DEBS Grand Challenge 2018The DEBS Grand Challenge 2018
The DEBS Grand Challenge 2018
 
Benchmarking of distributed linked data streaming systems
Benchmarking of distributed linked data streaming systemsBenchmarking of distributed linked data streaming systems
Benchmarking of distributed linked data streaming systems
 
SQCFramework: SPARQL Query Containment Benchmarks Generation Framework
SQCFramework: SPARQL Query Containment Benchmarks Generation FrameworkSQCFramework: SPARQL Query Containment Benchmarks Generation Framework
SQCFramework: SPARQL Query Containment Benchmarks Generation Framework
 
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federationLargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation
 
The DEBS Grand Challenge 2017
The DEBS Grand Challenge 2017The DEBS Grand Challenge 2017
The DEBS Grand Challenge 2017
 
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
 
Scalable Link Discovery for Modern Data-Driven Applications (poster)
Scalable Link Discovery for Modern Data-Driven Applications (poster)Scalable Link Discovery for Modern Data-Driven Applications (poster)
Scalable Link Discovery for Modern Data-Driven Applications (poster)
 
An Evaluation of Models for Runtime Approximation in Link Discovery
An Evaluation of Models for Runtime Approximation in Link DiscoveryAn Evaluation of Models for Runtime Approximation in Link Discovery
An Evaluation of Models for Runtime Approximation in Link Discovery
 
Scalable Link Discovery for Modern Data-Driven Applications
Scalable Link Discovery for Modern Data-Driven ApplicationsScalable Link Discovery for Modern Data-Driven Applications
Scalable Link Discovery for Modern Data-Driven Applications
 
Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
 Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F... Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
 
SPgen: A Benchmark Generator for Spatial Link Discovery Tools
SPgen: A Benchmark Generator for Spatial Link Discovery ToolsSPgen: A Benchmark Generator for Spatial Link Discovery Tools
SPgen: A Benchmark Generator for Spatial Link Discovery Tools
 
Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
Introducing the HOBBIT platform into the Ontology Alignment Evaluation CampaignIntroducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
Introducing the HOBBIT platform into the Ontology Alignment Evaluation Campaign
 
OKE2018 Challenge @ ESWC2018
OKE2018 Challenge @ ESWC2018OKE2018 Challenge @ ESWC2018
OKE2018 Challenge @ ESWC2018
 
MOCHA 2018 Challenge @ ESWC2018
MOCHA 2018 Challenge @ ESWC2018MOCHA 2018 Challenge @ ESWC2018
MOCHA 2018 Challenge @ ESWC2018
 
Dynamic planning for link discovery - ESWC 2018
Dynamic planning for link discovery - ESWC 2018Dynamic planning for link discovery - ESWC 2018
Dynamic planning for link discovery - ESWC 2018
 
Hobbit project overview presented at EBDVF 2017
Hobbit project overview presented at EBDVF 2017Hobbit project overview presented at EBDVF 2017
Hobbit project overview presented at EBDVF 2017
 
Leopard ISWC Semantic Web Challenge 2017 (poster)
Leopard ISWC Semantic Web Challenge 2017 (poster)Leopard ISWC Semantic Web Challenge 2017 (poster)
Leopard ISWC Semantic Web Challenge 2017 (poster)
 

Recently uploaded

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesPrabhanshu Chaturvedi
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 

Recently uploaded (20)

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
 
