Assessing the performance of RDF Engines: Discussing RDF Benchmarks

Assessing the performance of RDF Engines:
Discussing RDF Benchmarks

Irini Fundulaki
Institute of Computer Science – FORTH, Greece
Anastasios Kementsietsidis
Google Research, USA
6/15/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 1

Traditional Web: Web of Documents
•  single information space: global ﬁlesystem
•  designed for human consumption
•  documents are the primary objects with a loose structure
•  URLs are the globally unique IDs and part of the retrieval
mechanism
•  cannot ask expressive queries
© Hartig, Cyganiac, Bizer, Hausenblas, Heath
How to Publish Linked Data on the Web
HTML HTML HTML
Web Browsers Web Browsers
hyperlinks hyperlinks

Going from the Web of Documents to the
Web of Data
•  A global database
•  Designed for machines ﬁrst, humans later
•  Things are primary objects with a well deﬁned structure
•  Typed links between things
•  Ability to express structured queries
Thing
Thing
Thing
Thing
Thing
Thing
Don’t link the documents, link the things
typed links typed links
© The Web of Linked Data: Tom Heath,
An Introduction to Linked Data

Linking Open Datasets (LOD)
•  Publish open data as Linked Data on the Web
•  Interlink entities between heterogeneous data sources

Status of the Linked Open Data Cloud, 2007

Media
Government
Geographic
Publications
User-generated
Life sciences
Cross-domain
RDF, a common data model
More than 31B triples in LOD
Links (external): 500M

Linked Data in numbers (2014)
•  State of the LOD Cloud 2014, University of Manheim
Domain Datasets % Any SPARQL Dump
Government 183 18.05 61 (32.80%) 30.11% 30.65%
Publications 96 9.47 10 (10.58%) 9.62% 3.85%
Life Sciences 83 8.19 19 (21.35%) 20.22% 16.85%
User-generated
content
48 4.73 3 (5.4%5) 5.45%

1.82%
Cross-domain 41 4.04 4 (9.09%) 4.55% 6.82%
Media 22 2.17 1 (2.70%) 0.00% 2.70%
Geographic 21 2.07 8 (19.51%) 12.20% 12.20%
Social Web 520 51.28 6 (1.16%%) 1.16% 0.39%
Total 1014 - 48 (5.89%) 4.54% 3.80%
Access Methods

Proliferation of Big Data Stores

Many (not a lot) RDF Stores

The Question(s)
•  Which are the problems that I wish to solve?
•  Which are the relevant key performance indicators?
•  Which is the behavior of the existing engines w.r.t. the key
performance indicators?
Which are the tool(s) that I should
use for my data and
for my use case?

The Answer: Benchmark your engines!
•  Querying Benchmark comprises of
–  datasets (synthetic or real)
–  set of software tools
•  synthetic data generators
•  query generators
–  performance metrics, and
–  set of clear execution rules
•  Standardized application scenario(s) that serve as a basis for
testing systems
•  Must include a clear set of factors to be measured and the
conditions under which the systems should be measured

•  Benchmarks exist
–  To allow adequate measurements of systems
–  To provide evaluation of engines for real (or close to real) use
cases
•  Provide help
–  Designers and Developers to assess the performance of their
tools
–  Users to compare the diﬀerent available tools and evaluate
suitability for their needs
–  Researchers to compare their work to others
•  Leads to improvements:
–  Vendors can improve their technology
–  Researchers can address new challenges
–  Current benchmark design can be improved to cover new
necessities and application domains
Importance of Benchmarking

Tutorial Objective & Benefits
•  Objectives:
–  Discuss a set of principles and best practices for benchmark
development
–  Present an overview of the current work on benchmarks for
RDF query engines
–  Focus on identifying research challenges & unexplored
research directions
•  Benefits for the audience
–  Academic: Obtain a solid background, discover new research
directions
–  Practitioner: find out what are the available benchmarks,
advantages and limitations thereof

Purpose of the Tutorial
•  Stimulate discussions on the following topics:
1.  How can one come up with the right benchmark that
accurately captures use cases of interest?
2.  How can a benchmark capture the fact that RDF data originate
from a multitude of formats
! Structured: relational and/or XML data to RDF
! Unstructured
3.  How can a benchmark capture the diﬀerent data and query
patterns and provide a consistent picture for system behavior
across diﬀerent application settings?
4.  How can one select the right benchmark for her system, data
and workload?

Overview
•  Introducing Benchmarks
•  A short discussion about Linked Data
–  Resource Description Framework (Data Model)
–  SPARQL (Query Language)
•  Benchmarking Principles & Choke Points
•  Benchmarks
–  Synthetic
–  Real
–  Benchmark Generators
•  Sum up: what did we learn today?

A short discussion about Linked Data
- Resource Description Framework (Data Model)
- SPARQL (Query Language)

Resource Description Framework (RDF)
•  W3C standard to represent Web data and metadata
•  generic and simple graph based model
•  information from heterogeneous sources merges naturally:
–  resources with the same URI denote the same non-information
resource (leading to the Linked Data Cloud)
•  structure is added using schema languages and is
represented as RDF triples
•  Web browsers use URIs to retrieve information


Resource Description Framework (RDF)
•  An RDF triple is of the form (s, p, o) where
–  s is the subject: the URI identifying the described resource
–  o is the object: can either be a simple literal value or the URI of
another resource
–  p is the predicate: the URI indicating the relation between
subject and object
•  An RDF graph is a set of triples
–  Can be viewed as a node and edge-labeled directed graph
–  It is published in diﬀerent formats
•  RDF-XML, turtle, n3 triples, …
(dbpedia:Good_Day_Sunshine, dbpedia-owl:artist, dbpedia:The_Beatles)
Close to how people see the world (as a graph)!

Adding Semantics to RDF
•  RDF is a generic, abstract data model for describing resources
in the form of triples
•  RDF does not provide ways of defining classes, properties,
constraints
•  W3C Standard Schema Languages
–  RDF Vocabulary Description Language (RDF Schema -
RDFS) to define schema vocabularies
–  Ontology Web Language (OWL) to define ontologies

Adding Semantics to RDF
•  RDF Vocabularies are sets of terms used to describe notions
in a domain of interest
•  An RDF term is either a Class or a Property
–  Object properties denote relationships between objects
–  Data type properties denote attributes of resources
•  RDFS designed to introduce useful semantics to RDF triples
•  RDFS Schemas are represented as RDF triples

"An RDF Vocabulary is a schema comprising of classes,
properties and relationships which can be used for
describing data and metadata"

RDF Vocabulary Description Language (RDFS)
•  Typing: deﬁning classes, properties, instances
•  Relationships between classes and properties: subsumption
•  Constraints: domain and range of properties
•  Inference rules to entail new, inferred knowledge
Subject Predicate Object
t1 dbo:MusicalWork rdfs:subClassOf dbo:Album
t2 dbo:MusicalWork rdfs:domain dbo:artist
t3 dbo:MusicalWork rdfs:range dbo:march
t4 dbr:Seven_Seas_Of_Rye rdf:type dbo:MusicalWork
t5 dbo:Album rdf:type rdf:Class

RDFS Inference
•  Used to entail new information from the one that is explicitly stated in
the dataset
–  Transitive closure across class and property hierarchies

–  Transitive closure along the type and class/property relations
•  Two ways to implement it: Forward & Backward Reasoning
–  Forward Reasoning: closure is computed at loading time
–  Backward Reasoning: closure is computed on the ﬂy when needed
(P1, rdfs:subPropertyOf, P2), (P2, rdfs:subPropertyOf, P3)
(P1, rdfs:subPropertyOf, P3)
R1:
(C1, rdfs:subClassOf, C2), (C2, rdfs:subClassOf, C3)
(C1, rdfs:subClassOf, C3)
R2:
(C1, rdfs:subClassOf, C2), (r1, rdf:type, C1)
(r1, rdf:type, C2)
R2:
(P1, rdfs:subPropertyOf, P2), (r1, P1, r2)
(r1, P2, r2)
R3:

RDFS Inference
•  Transitive closure along the type and class/property relations
(C1, rdfs:subClassOf, C2), (r1, rdf:type, C1)
(r1, rdf:type, C2)
R2:
t1 dbo:MusicalWork rdfs:subClassOf dbo:Album
t2 dbo:MusicalWork rdfs:domain dbo:artist
t3 dbo:MusicalWork rdfs:range dbo:march
t5 dbo:Album rdf:type rdf:Class
t6 dbo:MusicalWork rdf:type rdf:Class

SPARQL: Querying RDF Data
•  SPARQL: W3C Standard Language for Querying Linked
Data
•  SPARQL 1.0 (2008) only allows accessing the data (query)
•  SPARQL 1.1 (2013) introduces:
–  Query Extensions: aggregates, sub-queries, negation,
expressions in the SELECT clause, property paths, assignment,
short form for CONSTRUCT, expanded set of functions and
operators
–  Updates:
•  Data management: Insert, Delete, Delete/Insert
•  Graph management: Create, Load, Clear, Drop, Copy,
Move, Add
–  Federation extension: Service, values, service variables
(informative) 6/21/16 ESWC 2016: Assessing the performance of RDF Engines - Discussing RDF Benchmarks 25

SPARQL Queries (1)
•  Building Block is the Triple Pattern
–  RDF triple with variables
•  Group Graph Patterns
–  Built through inductive construction combining smaller
patterns into more complex ones using SPARQL operators
•  Join - similar to relational join
•  Union (UNION) – similar to relational union
•  Optional (OPTIONAL) operators on triple patterns – similar
to relational left outer join (introduces negation in the
language)
•  Filtering conditions (FILTER)
•  Patterns on Named Graphs

SPARQL Queries (2)
•  Aggregates
–  specify expressions over groups of solutions
–  As in standard settings used when the result is computed over a
group of solutions rather than a single solution
•  Example: average value of a set of values, sum of a set
–  Aggregates deﬁned in SPARQL 1.1 are COUNT, SUM, MIN,
MAX, AVG, GROUP_CONCAT, and SAMPLE.
–  Solutions are grouped using the GROUP BY clause
–  Pruning at group level is performed with the HAVING clause
•  Additional Features
–  duplicate elimination (DISTINCT)
–  ordering results (ORDER BY) with an optional LIMIT clause

SPARQL Semantics
•  SPARQL semantics based on Pattern Matching
–  Queries describe subgraphs of the queried graph
–  SPARQL graph patterns describe the subgraphs to match

Intuitively a triple pattern denotes the triples in an RDF
graph that are of a speciﬁc form
TP1 = (?album, dbpedia-owl:artist, dbpedia:The_Beatles)
TP2 = (dbpedia_The_Beatles, ?property, ?object )
matches all albums of the Beatles
matches all information about The Beatles

SPARQL Types of Queries
•  SELECT returns ordered multi-set of variable bindings
–  Bindings: mappings of variables to RDF terms in the dataset
–  SQL-Like Syntax
•  ASK checks whether a graph pattern has at least one
solution - returns a Boolean value (true/false)
•  CONSTRUCT returns a new RDF graph as speciﬁed by
the graph template of the CONSTRUCT clause using the
computed bindings from the query’s WHERE clause
•  DESCRIBE returns the RDF graph containing the RDF
data about the requested resource
SELECT ?v1, ?v2, … WHERE GraphPattern

Querying RDF Data with SPARQL (1)
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?title
WHERE { <http://example.org/book/book1> dc:title ?title }
Simple SELECT query
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE { ?x foaf:name ?name . ?x foaf:mbox ?mbox . }
JOIN Query
SELECT ?name ?mbox
WHERE { ?x foaf:name ?name . OPTIONAL { ?x foaf:mbox ?mbox } }
OPTIONAL Operator
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?title
WHERE { ?x dc:title ?title . FILTER regex(?title, "^SPARQL") }
REGEX in FILTER

Querying RDF Data with SPARQL (2)
PREFIX org: <http://example.com/ns#>
CONSTRUCT { ?x foaf:name ?name }
WHERE { ?x org:employeeName ?name }
ASK { ?x foaf:name "Alice" }
“Find the people who live in “Palo Alto” and have founded or are board members of
companies in the software industry. For each such company, ﬁnd the products that
were developed by it, its revenue, and optionally its number of employees.“

SELECT* WHERE
{ ?x home “Palo Alto” .
{ ?x founder ?y } UNION { ?x member ?y }
{
?y industry “Software” .
?z developer ?y .
?y revenue ?n .
OPTIONAL { ?y employees ?m } .
}
}
SPARQL 1.1:
SPARQL plus Aggregates, Sub-
queries, Property paths,
Negation and more!

Storing and Querying RDF data
•  Schema agnostic
–  triples are stored in a large triple table where the attributes are
(subject, predicate and object) - “Monolithic” triple-stores
–  But it can get a bit more eﬃcient
t2 dbr:Starman_(song) rdf:type dbo:MusicalWork
t3 dbr:Seven_Seas_Of_Rye dbo:artist dbo:Queen
id URI/Literal
1 dbr:Seven_Seas_Of_Rye
2 dbr:Starman_(song)
3 dbo:MusicalWork
4 dbo:Queen
5 dbo:artist
6 rdf:type
1 6 3
2 6 3
1 5 4
RDF-3X maintains 6 indexes, namely, SPO, SOP, OSP, OPS, PSO,
POS. To avoid storage overhead, indexes are compressed! [NW09]

•  schema aware:
–  one table is created per property with subject and object attributes (Property
Tables [Wilkinson06])
ID1 type BookType
ID1 title “XYZ”
ID1 author “Fox, Joe”
ID1 copyright “2001”
ID2 type CDType
ID2 title “ABC”
ID2 artist “Orr, Tim”
ID2 language “French”
ID3 type BookType
ID3 title “MNO”
ID3 language “English”
ID4 type DVDType
ID4 title “DEF”
ID5 type CDType
ID5 title “GHI”
ID6 type BookType
Subject Type Title copyright
ID1 BookType “XYZ” “2001”
ID2 CDType “ABC” “1985”
ID3 BookType “MNO” NULL
ID4 DVDType “DEF” NULL
ID5 CDType “GHI” “1995”
ID6 BookType NULL “2004”
Subject Title Author copyright
ID1 “XYZ” “Fox, Joe” “2001”
ID3 “MNO” NULL NULL
ID6 NULL NULL “2004”
Subject Title artist copyright
ID2 “ABC” “Orr, Tim” “1985”
ID5 “GHI” NULL “1985”
ID4 type DVDType
ID4 title “DEF”
Booktype
CDType
Property-class Table
Subject Object
… …
… …
Clustered Property Table
Multi-Value P

•  Vertically partitioned RDF [AMM+07]
ID1 type BookType
ID1 title “XYZ”
ID2 type CDType
ID2 title “ABC”
ID3 type BookType
ID3 title “MNO”
ID4 type DVDType
ID4 title “DEF”
ID5 type CDType
ID5 title “GHI”
ID6 type BookType
Subject Object
ID1 BookType
ID2 CDType
ID3 BookType
ID4 DVDType
ID5 CDType
ID6 BookType
Subject Object
ID1 “XYZ”
ID2 “ABC”
ID3 “MNO”
ID4 “DEF”
ID5 “GHI”
Subject Object
ID1 “2001”
ID2 “1985”
ID5 “1995”
ID6 “2004”
Subject Object
ID2 “Orr, Tim”
Subject Object
ID1 “Fox, Joe”
Subject Object
ID2 “French”
ID3 “English”
type
title
copyright
author
artist
language
To get the most out of this par0cular
decomposi0on, a column-oriented
DBMS is recommended.

Comparison of Storage Techniques [BDK+13]
company released
Google Android
Apple iPhone
subject object
Google Android
Google developer Android
subject predicate object
Larry Page born “1973”
Larry Page founder Google
Google HQ “MTV”
Google employees 50,000
Google industry Internet
Google industry Software
Google industry Hardware
Triple store
person born founder
Larry Page “1973 Google
Type-oriented store
company HQ employees
Google “MTV” 50,000
Google industry Internet
Google industry Software
Google industry Hardware
subject object
Larry Page “1973”
Predicate-oriented store
subject object
Google “MTV”
subject object
Google Internet
Google Software
Google Hardware
subject object
Larry Page Google
subject object
Google 50,000
born
founder
HQ
employees
industry
industtry
Larry Page
“1973”
Google
Internet
Software
Hardware
“MTV”
HQ
50,000
employees
sample graph
Columns are
overloaded
Traditional relational
column treatment
Static mix of overloaded
and normal columns
developer
Schema does not
change on updates
Schema might
change on updates

Storing Linked Data: Query Processing
•  Schema Agnostic
–  algebraic plan obtained for a query involves a large number of
self joins
–  queries are favorable when the predicate is a variable
•  Hybrid Approach and Schema-aware
–  algebraic plan contains operations over the appropriate
property/class tables (more in the spirit of existing relational
schemas)
–  saves many self-joins over triple tables
–  if the predicate is a variable, then one query per property/class
must be expressed

Purpose of an RDF Querying Benchmark
•  Test the performance of RDF stores
–  Independently of underlying storage engine
–  Independently of underlying logical and physical schema
–  Independently of the query actually executed in the engine
•  SPARQL for native stores
•  SQL (SPARQL translated to SQL) for relational stores

Overview
•  Benchmarks
–  Synthetic
–  Real

Benchmarking Principles & Choke Points

Why Benchmarks?
•  Performance Evaluation
–  There is no no single recipe on how to do it right
–  There are many ways how to do it wrong
–  There are a number of best practices but no broadly
accepted standard on how to design and develop a
benchmark
•  Questions asked:
–  What data/data sets should we use?
–  Which workload/queries should we consider?
–  What to measure and how to measure?

Benchmark Categories
•  Micro-benchmarks
•  Standard benchmarks
•  Real-life applications

Micro Benchmarks
•  Specialized, stand-alone piece of software
•  Isolate one particular functionality of a larger system
•  In databases a micro benchmark tests a single database
operator
–  Selection, Join (and all types thereof), Projection,
Aggregates, Sub-Queries, …


Micro Benchmarks: Advantages
•  Very focused
–  Test a specific operator of the system
•  Controllable data & workload
–  Synthetic and Real Data sets
•  Different value ranges and value distribution and correlations
(mostly applicable to structured data)
–  Various data sizes to tackle scalability concerns
•  Queries
–  Workloads of different complexity & size
•  Complexity: as to the types of query operators and patterns
•  Size: as to the number of query operators involved
–  Allow broad parameter range(s)
! Useful for detailed, in-depth analysis
! Low setup threshold;
! Easy to run

Micro Benchmarks: Disadvantages
•  Neglect larger picture since they do not test the whole system
•  Do not consider the flow of costs of specific operations to the
cost of the system
•  Do not measure the impact of micro-benchmark on real-life
applications
•  Difficult to generalize the results
•  The results of micro-benchmarks cannot be applied in a
straightforward manner
•  Micro-benchmarks do not use standardized metrics

Standard Benchmarks
•  Relational, Object Oriented, Object Relational Database
Management Systems
–  Family of TPC Benchmarks for relational databases
•  XML, XPath, XQuery,
–  Mbench, XBench, XMach-1, XMark,
•  General Computing
–  SPEC

Standard Benchmarks: Advantages & Disadvantages
•  Advantages
–  Mimic real-life scenarios (respond to real needs)
•  E.g., TPC is a business oriented benchmark
–  Publicly available
–  Well defined
–  Provide scalable data sets and workloads
–  Metrics are well defined
•  Disadvantages
–  Outdated (standardization is a lengthy process)
•  XQuery took around 7 years to become a standard
•  TPC benchmark definition is still an ongoing process
–  Very large and complicated to run
–  Limited dataset variation (target a specific type of data)
–  Limited Workload (focuses on the application in mind)
–  Systems are often optimized for the benchmark(s)

•  Management and methodological activities performed by a
group of people
–  Management: Organizational protocols to control the process
–  Methodological: principles, methods and steps for benchmark
creation
•  Benchmark Development
–  Roles and bodies: people/groups involved in the development
–  Design principles: fundamental rules that direct the
development of a benchmark
–  Development process: series of steps to develop a benchmark
based on Choke Points
Benchmark Development Methodology
Choke Points: the set of technical
diﬃculties that force systems to improve their performance

The Example Standard Benchmark: TPC
•  Transaction Processing Council (TPC)
–  non-profit corporation focused on developing data-centric
benchmark standards and disseminating objective, verifiable
performance data to the industry
–  goal is to «create, manage and maintain a set of fair and
comprehensive benchmarks that enable end-users and vendors to
objectively evaluate system performance under well defined
consistent and comparable workloads» [NPM+12]
Benchmark Explanation
TPC-C Focuses on transactions.
TPC-DI Focuses on ETL processes
TPC-DS Decision support solutions for, but not limited to, Big Data.
TPC-E On-Line Transaction Processing (OLTP) workload
TPC-H Decision support benchmark, ad hoc queries and concurrent data modifications
TPC-VMS Virtual Measurement Single System Specification for running and reporting performance
metrics for virtualized databases
TPC-xHS measure of hardware, operating system and commercial Apache Hadoop File System API
TPX-xV measure the performance of servers running database workloads in virtual machines.
Active TPC Benchmarks (2016)

Benchmark Development Process (1)
•  Design Principles [L97]
Principle Comment
Relevant The benchmark is meaningful for the target domain
Understandable The benchmark is easy to understand and use
Good Metrics The metrics deﬁned by the benchmark are linear, orthogonal
and monotonic
Scalable The benchmark is applicable to a broad spectrum of hardware
and software conﬁgurations
Coverage The benchmark workload does not oversimplify the typical
environment
Acceptance The benchmark is recognized as relevant by the majority of
vendors and users

Benchmark Development Process (2)
•  Benchmarking Metrics
–  Performance
–  Price/Performance
–  Energy/Performance Metrics: Energy metric to measure the energy
consumption of system components
•  TPC Pricing speciﬁcation
–  Provides consistent methodologies for computing the price of the
benchmarked system, licensing of software, maintenance, …
Benchmark Metrics
TPC-C Transaction Rate(tpmC), Price per Transaction ($/tmpC)
TPC-E Transactions per Second (tpS)
TPC-H Composite Query per Hour Performance Metric (QpH@Size),
Price per Composite Query per Hour Performance Metric ($/
QpH@Size)

Desirable Attributes of a Benchmark:
•  “A good benchmark is written in a high-level language, making it
portable across different machines; is representative of some
programming style or application; can be measured easily; has
wide distribution [W90]”
•  “a domain specific benchmark must meet four important criteria:
relevance, portability, simplicity, scalability [G93]”
•  Six desirable attributes for TPC-C [L97]: relevance,
understandability, good metrics, scalability, coverage, acceptance
•  Five desirable attributes in Huppler [H09]: relevance, repeatability,
fairness, verifiability, economy
•  Big Data Benchmarking [1]: “a successful benchmark should be
simple to implement and execute, cost effective, timely and
verifiable”.

Desirable Attributes of a Benchmark:

Design Principles: Desirable Attributes of a Benchmark
•  Relevant/Representative: based on realistic
use case scenarios and must reflect the needs
of the use case
•  Understandable/Simple: the results and
workload are easily understandable by users
•  Portable/Fair/Repeatable: no system
benefits from the benchmark. Must be
deterministic and provide a «gold standard»
•  Metrics: should be well defined to be able to
assess and compare the systems.
•  Scalable: datasets should be in the order of
billions of «objects»
•  Verifiable: allow verifiable results in each
execution
Benchmark
Attributes
relevant
representative
understandable
simple
portable
fair
repeatable
metrics
scalable
verifiable

Design of Benchmark Workload [Grey93]
•  Design the queries to test specific features of the query
language or to test specific data management approaches
•  Base the query mix on specific requirements of real world
use cases
–  Leads to complex queries that involve many (different)
language features

Micro-benchmarks
Domain specific and standard benchmarks

Development Process: Choke Points
•  A benchmark exposes a system to a workload and should identify
the technical difficulties of the system under test
•  Choke Points [BNE14 ] are those technological challenges whose
resolution will significantly improve the performance of a product
•  TPC-H: a 20 years old benchmark (superseded by TPC-DS) but
still influential using business-oriented queries and concurrent
modifications
•  22 queries capturing (most of) the aspects of relational query
processing
•  [BNE14] performed an analysis of the TPC-H workload and
identified 28 choke points grouped into 6 categories

Choke Points à la TPC-H
•  CP1: Aggregation Performance
–  Ordered aggregation, small group-by keys, interesting orders, dependent
group-by keys
•  CP2: Join Performance
–  Large joins, sparse foreign keys, rich join order optimization, late projection
•  CP3: Data Access Locality (materialized views)
–  Columnar locality, physical locality by key, detecting correlation
•  CP4: Expression Calculation
–  Raw Expression Arithmetic, Complex Boolean Expressions in Joins and
Selections, String Matching Performance
•  CP5: Correlated Sub-queries
–  Flattening sub-queries, moving predicates to a sub-query, overlap between
outer- and sub-query
•  CP6: Parallelism and Concurrency
–  Query plan parallelization, workload management, result re-use

Choke Points à la RDF
Choke Point Description
CP1: JOIN
ORDERING
1.  Tests if the engine can evaluate the trade-offs between
the time spent to find the best execution plan and the
quality of the output plan
2.  Tests the ability of the engine to consider cardinality
constraints expressed by the different kinds of schema
constraints (e.g., functional and inverse functional
properties)
CP2:
AGGREGATION
Aggregations are implemented with the use of sub-selects
in the SPARQL query; the optimizer should recognize the
operations included in the sub-selects and evaluate them
first.
CP3: OPTIONAL &
NESTED
OPTIONAL
CLAUSES
Tests the ability of the optimizer to produce a plan where
the execution of the optional triple patterns is the last to
be performed since optional clauses do not reduce the
size of intermediate results.

Choke Points in RDF Benchmarks
CP4: REASONING
Tests the ability of the engine to handle eﬃciently RDFS
and OWL constructs expressed in the schema
CP5: PARALLEL
EXECUTION OF
UNIONS
Tests the ability of the optimizer to produce plans where
unions are executed in parallel
CP6: FILTERS
Tests the ability of the engines to execute as early as
possible those ﬁlter expressions to eliminate
a possibly large number of intermediate results
CP7: ORDERING
Tests the ability of the engine to choose query
plan(s) that facilitate the ordering of results
CP8: GEO-SPATIAL
PREDICATES
Tests the ability of the system to handle queries for
geospatial data

Choke Points in RDF Benchmarks
CP9: FULL TEXT Queries that involve the evaluation of regular expressions
on data value properties of resources
CP10: DUPLICATE
ELIMINATION
Tests the ability of the system to identify duplicate entries
and eliminate them during the creation of
intermediate results
CP11: COMPLEX
FILTER
CONDITIONS
Tests the ability of the engine to deal with negation,
conjunction and disjunction efficiently (i.e., breaking the
filters into conjunction of filters and execute them in
parallel).

Query Characteristics
Characteristics
Simple ﬁlters
Unbound
predicates
LIMIT REGEX CONSTRUCT
Complex ﬁlters Negation ORDER BY UNION ASK
>= 9 TPs OPTIONAL DISTINCT DESCRIBE

Overview
•  Benchmarks
–  Synthetic
–  Real

A Survey of RDF Benchmarks
Synthetic Benchmarks
Real Benchmarks
Benchmark Generators

Benchmark Components
•  Datasets
•  The raw material of the benchmark against which the workload
will be evaluated
•  Synthetic & Real Datasets
!  Synthetic: Produced with a data generator (that hopefully
produces data with interesting characteristics)
!  Real: Widely used datasets from a domain of interest
•  Query Workload
•  Sets of queries and/or updates to evaluate the system with
•  Metrics
•  The performance metric(s) that determine the systems behavior

Synthetic RDF Benchmarks

Lehigh University Benchmark (LUBM) [GPH05]
•  Benchmark intended to facilitate the evaluation of Semantic
Web repositories
•  Widely adopted by the data engineering and Semantic Web
communities
•  Focuses on evaluating the performance of query optimizers
and not ontology reasoning as in DL systems
•  Components:
–  Scalable Synthetic data generator
–  Ontology of moderate size and complexity
–  Supports extensional queries (i.e., queries that request
instances and not only schema information)
–  Proposes Performance metrics

LUBM Univ-Bench Ontology
•  Describes universities and departments and related activities
•  Expressed in OWL Lite ( took into consideration the
limitations of reasoning systems reg. completeness)
Statistics:
!  43 Classes
!  32 Object Type Properties
!  7 Data Type Properties
!  OWL Lite inverseOf, TransitiveProperty,
someValuesFrom, intersectionOf

LUBM Data Generation (1)
•  Synthetically produced extensional data that conform to the
LUBM Ontology
•  Data are generated using the UBA (Univ-Bench Artiﬁcial Data
Generator)
•  Random and Repeatable Data Generation
•  Minimum unit of data generation: University that has
departments, employees, courses
•  Instances of classes and properties are randomly produced
•  To make data more realistic restrictions are applied:
–  «Minimum 15 and maximum 25 departments per university»
–  «Undergraduate student/faculty ratio between 8 and 14
inclusive»

LUBM Data Generation (2)
•  Assignment of Identifiers is done using zero-based indexes
–  University0, Department0, …
•  Data generated by the tool are repeatable for the universities
–  User enters a seed for the random number generator
employed in the data generation process
•  Data created are represented in OWL Lite
•  Configurable serialization and representation model (RDF/
XML in .owl files, DAML)


LUBM Queries (1)
•  14 Realistic Queries
•  Written in SPARQL 1.0
•  Query Design criteria
–  Input Size:
•  proportion of the class instances involved and entailed
in the query to the total instances in the dataset
–  Selectivity:
•  estimated proportion of the class instances that satisfy
the query criteria
•  depends on the input dataset size

LUBM Queries (2)
–  Complexity:
•  measured on the basis of the number of classes and
properties involved in the query
•  diﬀerent complexity for the same query and for
diﬀerent implementations: relational vs RDF
–  Hierarchy information:
•  class and property hierarchies are used to obtain all
query answers
–  Logical inference:
•  inference is required to obtain all query answers

LUBM Queries (3): Characteristics
Characteristic Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14
Simple ﬁlters
Complex ﬁlters
>= 9 TPs
Unbound
predicates
Negation
OPTIONAL
LIMIT
ORDER BY
DISTINCT
REGEX
UNION
DESCRIBE
CONSTRUCT
ASK
Simple SPARQL SELECT Queries

LUBM Queries (4): Choke Points
# CP1 CP2 CP3 CP4 CP5 CP6 CP7 CP8 CP9 CP10 CP11
Q1
Q2 ✓
Q3 ✓
Q4 ✓ ✓
Q5 ✓
Q6 ✓
Q7 ✓
Q8 ✓
Q9 ✓
Q10 ✓
Q11 ✓
Q12 ✓ ✓
Q13 ✓
Q14
Join Ordering
Most complex query contains 5 joins
Reasoning
Focus on subClass and subProperty
hierarchies

LUBM Performance Metrics (1)
•  Load Time:
–  Time needed to parse, load and reason for a dataset
–  Focuses on persistent stores
•  Repository Size:
–  For persistent storage only
–  The size of all ﬁles that constitute the repository
•  Query Response Time:
–  Average time for executing a query 10 times (warm run)

LUBM Performance Metrics (2)
•  Query Completeness and Soundness:
–  Measures the degree of completeness of a query answer as
the percentage of entailed unique answers
•  Combined Metric:
–  Combines query response time with answer completeness
and answer soundness
–  Measures the trade-oﬀ between query response time and
completeness of results
•  See how reasoning aﬀects query performance
–  Provides an absolute ranking of systems
–  But hides details!

SP2Bench [SHM+09]
•  Proposes a language speciﬁc benchmark to test the most
common SPARQL constructs, operator constellations and
RDF access patterns
•  Components:
–  Scalable synthetic data generator
•  Creation of DBLP documents in RDF mimicking key
characteristics of the original DBLP dataset
•  Produced datasets contain blank nodes and RDF
containers
instances and not schema information)
–  Proposes performance metrics

SP2Bench Schema DBLP (2)
•  Probability distribution for selected attributes per document
classes
•  Additional assumption is that attributes are not dependent
–  Existence of an attribute does not depend on another

•  Use Bell-shaped Gaussian curves to approximate input data
–  Typically used to model normal distributions
•  Studied the number of class instances over time and modeled
those with a power law distribution
Article Inproc. Proc. Book WWW
author 0.9895 0.9970 0.0001 0.8937 0.9973
cite 0.0048 0.0104 0.0001 0.0079 0.0000
editor 0.0000 0.0000 0.7992 0.1040 0.0004
isbn 0.0000 0.0000 0.8592 0.9294 0.0000
… … … … … …

SP2Bench Data Generation
•  Synthetically produced extensional data that conform to the DBLP
Schema
•  Use of existing external vocabularies to describe resources in a uniform
way
–  FOAF (persons) – Friend of A Friend [ FOAF], SWRC - Semantic Web
for Research Communities (scientific publications) [SWRC], DC –
Dublin Core [DC]
•  Introduce blank nodes and RDF containers (rdf:Bag) to capture all aspects
of the RDF data model
•  Data generation takes into account data approximation as reflected in
the Gaussian curves
•  Data generator takes as input either the triple count, or year up to which
the data is generated
–  Always ending up in a consistent state!
•  Random functions are based on a fixed seed making data generation
deterministic

SP2Bench Queries (1): Characteristics
•  17 queries
–  12 main queries and modiﬁcations thereof
•  Provided in natural language, in SPARQL 1.0 and SQL
translations are also available
•  Query design criteria
–  Focus on SELECT and ASK SPARQL forms
–  Aim at covering the majority of SPARQL constructs
(including DISTINCT, ORDER By, LIMIT, OFFSET)

SP2Bench Queries (2): Characteristics
Characteristic Q1 Q2 Q3abc Q4 Q5ab Q6 Q7 Q8 Q9 Q10 Q11 Q12abc
Simple ﬁlters ✔ ✔ ✔ ✔
Complex ﬁlters ✔ ✔ ✔
>= 9 TPs ✔ ✔ ✔ ✔ ✔
Unbound
predicates
✔ ✔
Negation ✔ ✔
OPTIONAL ✔ ✔ ✔
LIMIT ✔
ORDER BY ✔ ✔
DISTINCT ✔ ✔ ✔ ✔ ✔ ✔
REGEX
UNION ✔ ✔ ✔
DESCRIBE
CONSTRUCT
ASK ✔

SP2Bench Queries (3): Choke Points
Q1 ✓
Q2 ✓ ✓
Q3 ✓
Q4 ✓ ✓ ✓
Q5 ✓ ✓ ✓
Q6 ✓ ✓ ✓ ✓
Q7 ✓ ✓ ✓ ✓
Q8 ✓ ✓ ✓ ✓ ✓
Q9 ✓ ✓
Q10
Q11 ✓
Q12 ✓ ✓ ✓
Join Ordering: most
complex query contains 8 joins
Filters: most complex
query contains 2 ﬁlters
Duplicate
Elimination

SP2Bench Performance Metrics
•  Loading Time:
–  time needed to parse, load and reason using the tested system
for a dataset
–  Focuses on persistent stores
•  «Per-query» performance:
–  Performance of each query
•  «Global» performance:
–  List the arithmetic and geometric mean of queries
1.  Multiply the execution time of all 17 queries
2.  Penalize queries that fail with 3600s penalty
3.  Compute the 17th root of the result
•  Memory consumption
–  High watermark of main memory consumption
–  Average memory consumption of all queries

Berlin SPARQL Benchmark (BSBM) [ BS09][BSBM]
•  Built around an e-commerce use case
•  Query mix emulates the search and navigation patterns of a user
looking for a product of interest
•  Goals
–  Allow the comparison of SPARQL engines across diﬀerent
architectures (relational and/or RDF)
–  Challenge forward and backward chain reasoning engines
–  Focuses on an enterprise setting where multiple clients
concurrently execute workloads
–  Measures SPARQL query performance and not (so much)
reasoning
•  Components
–  Data generator: supports the creation of arbitrarily large
datasets
–  Test Driver: executes sequences of SPARQL queries

BSBM Schema (1)
•  E-commerce use case: products are offered by several vendors
and consumers post reviews for those products
9..22
Review

bsbm:reviewFor
rev:reviewer
bsbm:reviewDate
dc:title
rev:text
bsbm:rating1[0..1]
bsbm:rating2[0..1]
bsbm:rating3[0..1]
bsbm:rating4[0..1]
Producer

rdfs:label
rdfs:comment
rdf:type
foaf:homepage
bsbm:country
ProductType

rdfs:label
rdfs:comment
rdf:type
rdfs:subClassOf[1..0]
ProductFeature

rdfs:label
rdfs:comment
rdf:type
Product

rdfs:label
rdfs:comment
rdf:type
bsbm:producer
bsbm:productFeature[9..22]
bsbm:productPropertyTextual1
bsbm:productPropertyTextual4[0..1]
bsbm:productPropertyTextual5[0..1]
bsbm:productPropertyNumeric1
bsbm:productPropertyNumeric4[0..1]
bsbm:productPropertyNumeric5[0..1]

Offer

bsbm:product
bsbm:vendor
bsbm:price
bsbm:validFrom
bsbm:validTo
bsbm:deliveryDays
bsbm:offerWebpage

Person

foaf:name
foaf:mbox_sha1sum
bsbm:country

Vendor

rdfs:label
rdfs:comment
rdf:type
foaf:homepage
bsbm:country

1..89
1
1..*
1..*
1..*
1
2..16
1
4..32
1
280..3730
2..37
1

BSBM Schema & Data Characteristics (1)
•  Every product has a type from a product hierarchy
•  Product Hierarchy is not ﬁxed (depends on the dataset size)
–  It’s depth and width depends on the chosen scale factor
–  Hierarchy depth
–  Branching factor for
•  root level
•  all other levels is 8
•  Product types are assigned a variable number of product
features
–  computed as lowerBound and upperBound with
•  aa
–  Set of possible features for a given product type is the union of
the type and all its “super-types”.
d =1+round(log10(n)) / 2
n
bfr =1+ round(log10(n))
lowerBound = 35*i / (d *(d +1) / 2 −1),upperBound = 75*i / (d *(d +1) / 2 −1)

•  Products, Vendors, Offers
–  Products that share the same type, have also the same set of
features
–  For a given product, its features are chosen from the set of
possible features with a hard-coded probability of 25%
–  Normal distribution with a mean of μ=50 and standard deviation
σ=16.6 is employed to associate products with producers
–  Vendors are associated to countries following hard-coded
distributions
–  Size of offers is n*20 are distributed over products following a
normal distribution with «fixed parameters» μ=n/2 and σ=n/4
–  Offers are distributed over vendors following a normal
distribution with «fixed parameters» μ=2000 and σ=667

•  Reviews
–  10 times the scale factor n
–  Data type property values (title and text) between 50 – 300
words
–  Up to 4 ratings, each rating is a random integer between 1 and
10
–  Each rating is missing with hard-coded probability 10%
–  Distributed over products with a normal distribution depending
on dataset size and following μ=n/2 and σ=n/4
–  Number of reviews per reviewer follows normal distribution
with μ=20 and σ=6.6
–  Reviews are generated until all reviews are assigned a reviewer
–  Reviewer countries follow the same distribution as vendor
countries

BSBM Data Generation (1)
•  Synthetically produces instances of class Product that conform to
the BSBM Schema
Total #triples 250K 1M 2M 100M
#products 666 2,785 70,812 284,826
#product
features
2,860 4,745 23,833 47,884
#product types 55 151 731 2011
#producers 14 60 1422 5,618
#vendors 8 34 722 2,854
#oﬀers 13,320 55,700 1,416,240 5,696,520
#reviewers 339 1432 36,249 146,054
#reviews 6,660 27,850 708,120 2,848,260
Total #instances 23,922 92,757 2,258,129 9,034,027
Indicative number of instances for diﬀerent dataset sizes

BSBM Queries (1)
•  12 Queries
•  Query mix is emulates search and navigation patterns of a customer
looking for a product
•  BSBM queries are given in natural language, SPARQL and SQL
Query Description
Q1 Find products for a given set of generic features
Q2 Retrieve basic information about a specific product for display purposes
Q3 Find products having some specific features and not having one feature
Q4 Find products matching two different sets of features
Q5 Find products that are similar to a given product
Q6 Find products having a label that contains a specific string
Q7 Retrieve in-depth information about a product including offers and reviews
Q8 Give me recent language reviews for a specific product
Q9 Get information about a reviewer
Q10 Get cheap offers which fulfill the consumer’s delivery requirements
Q11 Get all information about an offer
Q12 Export information about an offer into another schema

Characteristic Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12
Simple ﬁlters ✔ ✔ ✔ ✔ ✔ ✔ ✔
Complex ﬁlters ✔ ✔
> 9 TPs ✔ ✔ ✔ ✔ ✔
Unbound
predicates
✔

✔

Negation ✔

OPTIONAL ✔ ✔ ✔ ✔
LIMIT ✔ ✔ ✔ ✔ ✔ ✔
ORDER BY ✔ ✔ ✔ ✔ ✔ ✔
DISTINCT ✔ ✔ ✔
REGEX ✔
UNION ✔ ✔
DESCRIBE ✔
CONSTRUCT ✔
ASK
BSBM Queries (2): Characteristics
11 JOINs,
3 OPTIONAL clauses,
3 Filters,
1 Unbound variable
4 OPTIONAL clauses

BSBM Queries (3): Choke Points
Q1 ✔ ✔ ✔ ✔
Q2 ✔
Q3 ✔ ✔ ✔
Q4 ✔ ✔ ✔ ✔
Q5 ✔ ✔ ✔ ✔
Q6 ✔ ✔
Q7 ✔ ✔ ✔
Q8 ✔ ✔ ✔
Q9 ✔
Q10 ✔ ✔ ✔ ✔
Q11 ✔
Q12 ✔
Join Ordering: most
Filters: most complex query
contains 3 ﬁlters and most
complex ﬁlter contains
arithmetic expressions
Result
Ordering

BSBM: Performance Metrics
•  Query Mixes per Hour (QMpH)
–  Measures the number of complete BSBM query mixes answered
by a system under test and for a specific number of clients
running concurrently against the system under test
•  Queries per Second (QpS)
–  Measures the number of queries of a specific type handled by
the system under test in a second
–  Calculated by dividing the number of queries of a specific type
within a benchmark run by the total execution time of those
queries
•  Load Time:
–  Time to load the dataset in the RDF or relational repositories
•  Includes the time to create the appropriate data structures &
indices

Semantic Publishing Benchmark (SPB)
•  Developed in the context of FP7 EU Project LDBC (2012-2015)
•  LDBC’s goals:
–  Develop querying benchmarks that will spur research &
industry progress in large-scale graph and RDF data
management
•  scalability, storage, indexing and query optimization
techniques for RDF and graph database solutions
•  quantitatively and qualitatively assess diﬀerent
solutions for RDF data integration
–  To establish an industry-neutral entity - LDBC foundation -
à la the Transaction Processing Council (TPC)

Semantic Publishing Benchmark (SPB)
•  Industry-motivated benchmark
–  The scenario involves a media / publisher organization that
maintains semantic metadata about its Journalistic assets
•  Components
–  Scalable Synthetic Data Generator
•  Creation of instances of BBC ontologies mimicking
characteristics of the original real input datasets
instances and not schema information)
–  Workload simulates consumption of RDF metadata
•  Concurrent read and update queries
–  Proposes performance metrics

SPB Design: Requirements
•  Storing and processing RDF data
–  Storing and isolating data in separate RDF graphs
–  Supporting following SPARQL standards :
•  SPARQL 1.1 Protocol, Query, Update
•  Support for Schema Languages
–  Support for RDFS to obtain the correct answers
–  Optional support for the RL proﬁle of Web Ontology Language
(OWL2 RL) in order to pass the conformance test suite
•  Loading data from RDF serialization formats
–  N-Quads, TRIG, Turtle, etc.

SPB Schema: BBC Ontologies (1)
•  Core Ontologies: 7 ontologies describe basic concepts about
entities and relationships in the domain of interest
–  Basic Concepts: Creative Works, Places, Persons, Provenance
Information, Company Information, etc.

Thing CreativeWork
String
cwork:title
owl:Thing owl:sameAs
Theme Organisation
Event PlacePerson Programme
NewsItemBlogPost
cwork:tag
cwork:shortTitle
String
cwork:category
xsd:Any
cwork:description
String
Audience
International Audience National Audience
cwork:audience
cwork:Format
Textual
Format
Video
Format
Interactive Format
Image Format Audio Format
PictureGallery
Format
cwork:primaryFormat
xsd:dateTime
xsd:dateTime
cwork:dateModified
cwork:dateCreated
cwork:Thumbnail
cwork:thumbnail
Thumbnail ThumbnailTypethumbnailType
StandardThumbnail
FixedSize66Thumbnail
CloseUpThumbnail
p
rdfs:subClassOf rdfs:subPropertyOf
rdf:type
tag
about mentions
Stringcwork:altText

Schema BBC Schema (2)
•  Domain Ontologies: 3 ontologies describe concepts and
properties related to a speciﬁc domain
–  sports (competitions, events)
–  politics entities
–  news (concepts that journalists tag annotations with)
•  Statistics
–  74 classes
–  88 data type properties, 28 object type properties
–  60 rdfs:subClassOf (maximum depth 3) , 17 rdfs:subPropertyOf
(maximum depth 1) hierarchies
–  105 rdfs:domain and 115 rdfs:range RDFS properties
–  8 owl:oneOf class axioms, 1 one owl:TransitiveProperty
property.

SPB: Reference datasets
•  Collections of entities describing various domains
–  Snapshots of the real datasets of BBC
•  Football competitions and teams
•  Formula One competitions and teams
•  UK Parliament Members
–  Additional datasets
•  GeoNames - Places, names and coordinates
•  DBPedia – Person data
–  Reference Dataset Size: 25M triples

SPB Data Generation (1): Process
1.  Loader
–  Ontology & Reference Data
2.  Data Generator
a.  Retrieves instances
from Reference Datasets
b.  Generates Creative Works
according to pre-deﬁned
allocations and models
c.  Writes generated data to
disk
RDF Repository
BBC
Ontologies
Reference
Datasets
Ontology &
Reference
Data Set Loader
Creative
Works
Generator
SPARQL Endpoint
SPB Data Generator
Data
generation
parameters
(1) (1)
(2.a)
Generated
CWs
(2.c)
(1)
(2.d)

SPB Data Generation (2)
•  Produces synthetic data that mimic most of the characteristics of real
world data provided by BBC
•  Input: Core & Domain Ontologies and Reference datasets
•  Output:
–  Instances that conform to BBC core ontologies (class Creative Work)
–  Instances refer to entities in the reference datasets using the about &
mentions schema properties
–  follows the (user) pre-deﬁned distributions of SPB’s Data Generator
Tagged entities
01/2012 12/ 2012
clustering
correla1ons
random distribu1on

SPB Operational Phases
•  Data Loading
1.  Initial loading of reference datasets
•  BBC datasets enriched with DBPedia Person and GeoNames
place data
2.  Generation of Creative Works
•  Parallel generation (multi-threaded and multi-process)
3.  Loading of Creative Works in the RDF repository
•  Running the Benchmark
1.  Warm-up phrase
2.  Run the benchmark using the Test Driver
3.  Run conformance tests (OWL2 RL) [optional]

Benchmark Conﬁguration
•  Data Generator
–  Allocation of tags in Creative Works
•  Correlations of creative works with important entities
(persons, places, events)
•  Clustering of Creative Works around major / minor events
–  Size of generated data (triples)
–  Parallel data generation
•  Test Driver
–  Distribution of queries in the query-mix
•  editorial operations (deletion/addition of RDF triples)
•  aggregate operations (complex SPARQL queries)
–  Number of editorial / aggregation agents
–  Duration of Warm-up and Benchmark phases
–  Each operational phase can be enabled or disabled

SPB Base Workload Queries (2)
Characteristic Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12
Simple ﬁlters ✔
Complex ﬁlters ✔ ✔ ✔
> 9 TPs ✔ ✔ ✔ ✔
Unbound
predicates
✔
Negation
OPTIONAL ✔ ✔ ✔ ✔
LIMIT ✔ ✔ ✔ ✔ ✔ ✔
ORDER BY ✔ ✔ ✔ ✔ ✔ ✔ ✔
DISTINCT ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔
COUNT ✔
REGEX
UNION ✔ ✔ ✔
GROYP BY ✔
CONSTRUCT ✔ ✔ ✔ ✔ ✔
Evaluate (parts of the) query
on graphs

SPB Queries (1)
•  Base and Advanced Workloads
–  Base Workload: 12 queries & update operations
–  Advanced Workload: 24 queries
•  Workloads based on real queries used by BBC journalists
during their editorial operations
•  Editorial agents – simulate editorial work performed by
journalists :
–  Insert, Update, Delete
•  Aggregation agents – simulate retrieval operations
performed by end-users

SPB Base Workload Queries (3): Choke Points
Q1 ✔ ✔ ✔ ✔ ✔
Q2 ✔ ✔ ✔
Q3 ✔ ✔ ✔ ✔ ✔ ✔
Q4 ✔ ✔ ✔ ✔ ✔
Q5 ✔ ✔ ✔ ✔ ✔
Q6 ✔ ✔ ✔ ✔
Q7 ✔ ✔
Q8 ✔ ✔ ✔
Q9 ✔ ✔ ✔
Q10 ✔ ✔ ✔ ✔
Q11 ✔ ✔ ✔ ✔ ✔
Q12 ✔ ✔
Reasoning reg. class &
property hierarchies
Join Ordering
Ordering & Duplicate
Elimination

SPB Performance Metrics
•  SPB Primary Metrics
•  Query Execution Report (1)

•  Query Execution Report (2)
Query Rate
Interactive mix
(Queries per second)
Query Rate
Analytical Mix
(Queries per second)
Update Rate
(Operations per
second)
Duration of
Bulk Load
(in ms)
Duration of
Measurement
Window
(in minutes)
# Complete
Analytical
mixes
(per second)
# Complete
Interactive mixes
(per second)
# Complete
Update
Operations

Query Arithmetic Mean
Execution Time
Minimum
Execution Time
90th % Average
Execution Time
# Executions

Real RDF Benchmarks

UniProt [RU09][UniprotKB]
•  Comprehensive, high-quality and freely accessible resource of
protein sequence and functional information
•  UniProt Schema
–  UniProt Core Vocabulary, BIBO (journals), ECO (evidence
codes), Dublin Core (metadata)
–  UniProt Core Vocabulary: 124 classes, 113 Properties
•  Dataset contains approximately
–  13 billion triples
–  2.5 billion distinct subjects
–  2 billion distinct objects
•  Queries
–  No representative set of queries is oﬀered.
–  [NW09] oﬀers a set of 8 queries to test the RDF-3X engine


UniProt Queries (1) [NW09]: Characteristics
Characteristic Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8
Simple ﬁlters
Complex ﬁlters
> 9 TPs ✔ ✔ ✔ ✔ ✔ ✔
Unbound
predicates
Negation
OPTIONAL
LIMIT
ORDER BY
DISTINCT
REGEX
UNION
DESCRIBE
CONSTRUCT
ASK
Join Ordering
RDF-3X aims at optimizing
join processing for RDF data

UniProt Queries (2) [NW09]: Choke Points
•  Focus on discovering optimal or close to optimal join orders
Q1 ✔
Q2 ✔
Q3 ✔
Q4 ✔
Q5 ✔
Q6 ✔
Q7 ✔
Q8 ✔
Join Ordering: most
7 queries contain more than 7 joins

YAGO (Yet Another Great Ontology)[SKW07]
•  High quality multilingual knowledge based derived from
Wikipedia, WordNet and GeoNames
•  Schema
–  Wikipedia Entities, WordNet and GeoNames Concepts and
Relationships: associates WordNet taxonomy with Wikipedia
Category System
–  10 million schema entities
•  Dataset
–  120 million triples about schema entities
–  2.625 million links to DBPedia
•  Queries
–  No representative set of queries is oﬀered by YAGO
–  [NW10] provides a representative set of 8 queries for RDF-3X
Evaluation

YAGO Queries (1) [NW10]: Characteristics
•  Simple SELECT queries that focus on Join ordering, negation
and duplicate elimination
Characteristic A1 A2 A3 B1 B2 B3 C1 C2
Simple ﬁlters ✔
Complex ﬁlters
> 9 TPs ✔
Unbound
predicates
Negation ✔ ✔ ✔
OPTIONAL
LIMIT
ORDER BY
DISTINCT ✔ ✔ ✔ ✔ ✔
REGEX
UNION ✔

YAGO Queries (2) [NW10]: Choke Points
•  Queries focus mostly on discovering optimal or close to query
evaluation plans, including negation in ﬁlters and duplicate
elimination.
• 
A1 ✔
A2 ✔
A3 ✔ ✔ ✔ ✔
B1 ✔ ✔ ✔
B2 ✔ ✔
B3 ✔ ✔ ✔
C1 ✔ ✔ ✔ ✔
C2 ✔ ✔ Join Ordering: most
all queries contain more than 5 joins

Barton Library [Barton]
•  Data from the MIT Simile Project that develops tools for library data
management
–  contains records that compose an RDF-formatted dump of the MIT
Libraries Barton catalog
–  converted from raw data stored in an old library format standard
called MARC (Machine Readable Catalog).
•  Schema
–  Common types include Record and Item, the latter being associated
with instances of type Person and with instances of Description.
–  Primitive types include Title and Date.
•  Dataset
–  Approximately 45 million RDF triples
•  Queries
–  No representative queries provided with the Barton Library Dataset
–  [Abadi07] provides a workload of 7 queries ([NW10] in SPARQL)

Barton Queries (1) [NW10]: Characteristics
Characteristic Q1 Q2 Q3 Q4 Q5 Q6 Q7
Simple ﬁlters ✔ ✔ ✔ ✔
Complex ﬁlters
> 9 TPs
Unbound
predicates
Negation ✔
OPTIONAL
LIMIT
ORDER BY
DISTINCT ✔ ✔ ✔
REGEX
UNION ✔

Barton Queries (2) [NW10]: Choke Points
•  Queries focus mostly on discovering optimal or close to
optimal query evaluation plans, including negation in ﬁlters
and duplicate elimination.
• 
Q1 ✔
Q2 ✔ ✔
Q3 ✔
Q4 ✔
Q5 ✔ ✔
Q6 ✔ ✔ ✔ ✔
Q7 ✔
Join Ordering: most

Linked Sensor Dataset [PHS10]
•  Expressive descriptions of approximately 20,000 weather
stations in the US
•  divided up into multiple subsets, that reflect weather data for
specific hurricanes or blizzards from the past (focus on hurricane
Ike)
•  Schema
–  Contains information about temperature, precipitation,
pressure, wind, speed, humidity
–  Contains links to GeoNames and links to observations provided
by MesoWest (meteorological service in the US)
•  Dataset
–  more than 1 billion triples
•  Queries
–  No representative set of queries is offered.

WordNet [WordNet]
•  Large lexical database of English, developed under the
direction of George A. Miller (Emeritus).
•  Schema
–  Nouns, verbs, adjectives and adverbs are grouped into sets of
cognitive synonyms (synsets), each expressing a distinct
concept.
–  Synsets are interlinked by means of conceptual-semantic and
lexical relations. The resulting network of meaningfully related
words and concepts can be navigated with the browser.
•  Dataset
–  Approximately 1.9 million triples (300MB).
•  Queries
–  No representative query workload

Publishing TPC-H as RDF [TPC-H]
•  Benchmark can be used by decision support systems that
examine
–  large volumes of data, execute queries with a high degree of
complexity, and provide answers to critical business questions
•  Benchmark provides a suite of business oriented ad-hoc queries
and concurrent data modiﬁcations
•  Queries and the data populating the database have been chosen to
have broad industry-wide relevance
•  Use the DBGEN TPC-H generator to generate a TPC-H relational
dataset
•  Use the D2R tool or other relational to RDF tool to convert the
relational dataset to the equivalent RDF one.
•  TPC SQL queries are translated to equivalent SPARQL queries

Benchmark Generators

DBPedia SPARQL Benchmark (DBSB) [MLA+14]
•  Generic Methodology for SPARQL Benchmark Creation
•  Based on
–  Flexible data generation that mimics an input data source
–  Query-log mining
–  Clustering of queries
–  SPARQL queries feature analysis
•  Methodology is schema agnostic
–  Demonstrated using DBPedia KB
•  Proposed approach applied on various sizes of the DBPedia
Knowledge Base
•  Benchmark proposes query workload based on real queries
expressed against DBPedia

DBSB Data Generation (1)
•  Working assumptions
1.  Output dataset should have similar characteristics as
input dataset
•  Number classes, properties, value distributions,
taxonomic structures (hierarchies)
2.  Varying output dataset sizes
3.  Characteristics such as in-, out- degree of nodes in
datasets of varying sizes should be similar
4.  Easily repeatable data generation process

DBSB Data Generation (2)
•  Idea
1.  Large datasets produced by
•  Duplicating all triples and changing their namespace
2.  Smaller datasets produced by
•  Removing triples in a way that would preserve the
properties of the original graph
•  Using a seed based method based on the assumption that a
representative set of resources is obtained by sampling
across classes
1.  For each selected element in the dataset, its concise
bound description (CBD) is retrieved and added in the
queue
2.  Process is repeated until the number of triples is
reached

DBSB Query Analysis (1)
•  Goal is to detect prototypical queries that were sent to a
DBPedia SPARQL endpoint using similarity measures
–  String similarity and graph similarity
•  Idea: 4-step query analysis and clustering approach
1.  Select queries executed frequently on the input data
2.  Strip common syntactic constructs (namespace, preﬁxes)
3.  Compute query similarity using string matching
4.  Compute query clusters using a soft graph clustering
algorithm
•  Clusters used to devise the benchmark query generation
patterns

•  Query Selection
1.  Use DBPedia SPARQL Query log (31.5 million queries in a 3
month period)
2.  Reduce the initial set of queries by considering
•  Query Variations: use a standard way to name variables to
reduce diﬀerences among queries (promoting query
constructs such as DISTINCT, REGEX)
•  Query Frequency: discard queries with low frequency since
they do not contribute to the overall query performance
–  Result: 35,965 queries
3.  String Stripping: remove all SPARQL keywords and common
preﬁxes
4.  Similarity Computation: compute the similarity of the stripped
queries

•  Query Selection (cont’d)
4.  Similarity Computation
•  Reduce the time of benchmark compilation, use LIMES[NS11]
framework
•  Use the Levenshtein string similarity measure, 0.9 threshold
•  Reduce by 16.6% the number of computations required by
computing the Cartesian product of queries
5.  Clustering
•  Apply graph clustering to the query similarity graph of (4)
•  Goal is to identify similar groups of queries out of which
prototypical queries will be generated
•  Use BorderFlow [NS09] algorithm that follows a seed-based
approach
•  Obtain 12272 clusters, 24% contain a single query
•  Select the clusters with >5 queries

DBSB Query Generation (1)
•  Select the most interesting SPARQL queries
–  Which are the most frequently asked SPARQL queries
–  Which of those queries cover the most SPARQL features
•  SPARQL Features
–  Overall number of triple patterns
•  Test the efficiency of join operations (CP1)
–  SPARQL pattern constructors (UNION & OPTIONAL)
•  Handle parallel execution of Unions (CP5)
•  Perform OPTIONALs as late as possible in the query plan (CP3)
–  Solution sequences & modifiers (DISTINCT)
•  Efficiency of duplication elimination ( CP10)
–  Filter conditions and operators (FILTER, LANG, REGEX, STR)
•  Efficiency of engines to execute filters as early as possible (CP6)

DBSB Query Generation (2)
•  25 queries are selected
–  For each of the features, manually select the part of the query to
be varied (IRI or filter condition)
–  Variability of query template(s) for the chosen values is
sufficiently high (>=1000 per query template)
Method ensures that
•  Executed queries during the benchmark differ
•  Always return non empty results

Apples and Oranges [DKS+11]
•  Propose structuredness to characterize datasets
–  The level of structuredness of a dataset D, with respect to a type (class)
T, is determined by how well the instances of T, conform to type T
–  If each instance of T has the properties defined in T, then the dataset
has high structuredness with respect to T
0
1
2
3
4
5
6
name office ext major GPA
OC(p, I(T, D))
OC(p, T) for each property p of T
0
1
2
3
4
5
6
name office ext major GPA
Highly structured dataset
•  all instances have the name attribute
•  ext & GPA properties encountered in
50% of the instances
•  οffice property found in 20% of the instances
•  major property in 10% of the instances
•  all instances have all attributes

•  One of the key considerations while deciding:
–  appropriate data representation format (e.g., relational for
structured and XML for semi-structured data)
–  organization of data (e.g., dependency theory and normal forms
for the relational model, and XML).
–  data indexes (e.g., B+-tree indexes for relational and numbering
scheme-based indexes for XML).
–  data querying (e.g., using SQL for the relational and XPath/
XQuery for XML).
In other words, structuredness permeates
every aspect of data management

0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
Highly structured datasets
(relational like)
Less structured datasets
Synthetic Datasets
Real Datasets

Some important observations:
•  Since TPC-H is a relational
dataset, it should have high
structuredness.
•  There is a diﬀerence between
synthetic and and real datasets.
•  Synthetic are fairly structured
and relational-like
•  Real datasets cover the whole
spectrum of structuredness.
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
Structuredness of datasets
Existing RDF stores are tested and
compared against each other with respect
to datasets that are not representative of
most real RDF data.

•  Nothing can better represent data than the data itself!
•  Idea: Turn every dataset into a benchmark
1.  No need to synthetically generate values
•  Use the actual data values in the dataset
2.  No need to synthetically generate queries.
•  The queries that are known to run in your data can be
used in the benchmark.
3.  But we need to cover the structuredness spectrum
•  to get data as close as possible to the real world data
•  to see how the systems perform when data goes from
very structured to less structured

Counting Coins [DKS+11]
•  Start with a dataset with size S and CH = 0.5
•  Aim for a dataset with size S’ and CH’,
where S > S’ and CH > CH’.
Process:
•  Assign a coin to each triple (s, p, o) and compute the
impact in CH of its removal
–  The removal will impact the size by 1.
Example: Consider (person1, ext, x5304). Removing
the triple from D gives a dataset with CH(T, D) = 0.467.
Therefore the coin(person1, ext, x5304) = 0.5 – 0.467 =
0.033.
•  Formulate (automatically) an integer programming
problem whose solutions will tell us how many coins to
remove to achieve the desired coherence CH’ and size S’.
person0 name Eric
person0 oﬃce BA7430
person0 ext x4401
person1 name Kenny
person1 ext x5304
person2 name Kyle
person2 ext x6281
person3 name Timmy
person3 major C.S.
person3 GPA 3.4
person4 name Stan
person4 GPA 3.8
person5 name Jimmy
person5 GPA 3.7
One of the few occasions in life where having
too many coins is undesirable…

Technical challenges in problem formulation
•  Compute coins which represent the impact on structuredness
of removing all triples with subjects that are instances of a
type T with properties equal to p
–  Therefore one coin for each type/property combination.
•  Add constraints that set lower and upper bounds on the
number of coins that can be removed so as not to completely
remove a property from a type.
•  Add constraints which guarantee that not all instances of a
type are removed.
•  To deal we multi-valued properties, we add constraints that
introduce a relaxation parameter ρ
–  required because of the approximation by using the average
number of triples per coin.

Waterloo SPARQL Diversity Test Suite [AHO+14]
•  Stress existing RDF engines to reveal a wider range of query
requirements as established by web applications
•  Contributions
–  Deﬁnition of 2 classes of query features used to evaluate
the variability of workloads and datasets
•  Structural (e.g., number of triple patterns)
•  Data-driven (aﬀect selectivity and result cardinality)
–  In-depth analysis of existing SPARQL benchmarks using
the structural and data-driven features
–  WatDiv Test Suite to stress existing RDF engines to reveal a
wider range of query requirements

WatDiv Structural Features (1)
1.  Triple Pattern Count
–  Number of triple patterns in SPARQL Graph Patterns
2.  Join Vertex Count
–  Number of RDF terms (IRIs, literals, blank nodes) and variables that
are subjects or objects of multiple triple patterns
3.  Join Vertex Degree
–  The degree of a join vertex v is the number of triple patterns whose
subject or object is v
SP2Bench Q5a
SELECT DISTINCT ?person ?name
WHERE {
?article rdf:type bench:Article.
?article dc:creator ?person.
?inproc rdf:type bench:Inproceedings.
?inproc dc:creator ?person2.
?person foaf:name ?name.
?person2 foaf:name ?name2
FILTER(?name=?name2) }
Triple
Count
Join
Vertices
Join
Vertex
Count
Join Vertex
Degree
6 ?article, ?inproc
?person, ?
person2
10 ?article:2, ?inproc:2
?person:2, ?
person2:2

•  Join Vertex Degree & Count provide a good characterization of
the structural complexity of a query
–  Number of triple patterns does not properly characterize the
query: two queries with the same set of triple patterns can have
diﬀerent structures
?n
?m
?x
?l
C
E
?k
A
?y
?b
?z
?d ?o
Linear query
?c
D
D
?x
?b
B
?z
C
?w
D
?b
E
?w
Snowﬂake query
?y
?b
?x
B
E
A
D
?z
C
?c
Star query
?m
?f
G

•  Join Vertex Type
–  Play an important role in the behavior of RDF engines to determine
eﬃcient query plans
•  E.g., star queries promote eﬃcient merge joins
•  3 (mutually non-exclusive) types of join vertices
–  Vertex x of type SS+
if for all triple patterns (s,p,o)*, x is the subject
–  Vertex x of type OO+
if for all triple patterns (s,p,o)*, x is the object
–  Vertex x of type SO+
if for all triple patterns (s,p,o)*, (s’,p’,o’) x=s & x=o’

?n
?m
?x
?l
C
E
?k
?m type SS+

?x
B
?z
C
?w
?x type OO+

?c
D
D
?x
?b
B
?z
C
?w
?x type SO+

*Triple pa8erns (s,p,o) are incident on x

WatDiv Data-driven Features (1)
•  A system’s choice on the most efficient query plan depends on
–  (a) the characteristics of the dataset and
–  (b) the query
•  If the system relies on selectivity estimations and result
cardinality, the same query will have a different query plan for
dataset(s) of different sizes
•  Different cases:
–  Queries have a diverse mix of result cardinalities
–  Some triple patterns are very selective, others are not
–  All triple patterns are equally selective

•  Result Cardinality CARD(Ā,G)
–  the number of solutions in the result of the evaluation of a graph
pattern Ā = <A, F> over graph G
•  Filter Triple Pattern Selectivity (f-TP Selectivity) SELF
G (tp)
–  the ratio of distinct solution mappings of a triple pattern tp to
the set of triples in graph G
•  Measures
1.  Result cardinality
2.  Mean & standard deviation of f-TP selectivities of triple
patterns
•  Important for distinguishing queries whose triple patterns are
almost equally selective from queries with varying f-TP
selectivities

•  Result Cardinality & f-TP selectivity are not sufficient
–  Intermediate solution mappings will not make it to the final
result (e.g., due to filters or more restrictive joins)
–  The overall selectivity of a graph pattern can be
determined by a single very selective triple pattern
•  Run-time optimization techniques (e.g., side-ways
information passing) to early prune intermediate results
•  Introduce 2 features to capture above cases
1.  BGP-Restricted f-TP selectivity
2.  Join-Restricted f-TP selectivity

WatDiv Data-Driven Features (4)
•  BGP-Restricted f-TP selectivity SELF
G (tp|Ā)
•  assesses how much a triple pattern contributes to the overall
selectiveness of the query
•  fraction of distinct solution mappings for a triple pattern that
are compatible with some solution mapping in the query result.
•  Join-restricted f-TP selectivity SELF
G (tp|x)
•  assesses how much a ﬁltered triple pattern contributes to the
overall selectiveness of the joins that it participates in
•  for x a join vertex and tp a triple pattern incident on x, the x-
restricted f-TP of tp over graph G is the fraction of distinct
solution mappings compatible with a solution mapping in the
query result of the sub-query that contains all triple patterns
incident to x

WatDiv Test Suite (1)
•  Components: Data Generator and Query Generator
•  Data Generator
–  Allows users to deﬁne their own dataset controlling
•  Entities to include
•  Topology of the graphs allowing one to mimic the real types
of data distributions in the Web
– «well-structuredness» of entities
– probability of entity associations
– cardinality of property associations
–  Important: Instances of the same entity do not have the same
set of attributes: breaking the «relational nature» of previous
RDF benchmarks

•  Query Template Generator
–  User-specified number of templates
–  User specified template characteristics
•  Number of triple patterns
•  Types of joins and filters in the triple patterns
–  Traverses the WatDiv schema using a random walk and
generates a set of query templates
•  Query Generator
–  Instantiates the query templates with terms (IRIs, literals etc.)
from the RDF dataset
–  User-specified number of queries produced

•  Query Template Generator
–  Random Walk on an internal representation of the schema
•  Entity types in the schema correspond to graph vertices
•  Relationships (i.e., object type properties) are graph edges
•  Vertices are annotated with data type properties (i.e.,
attributes)
–  Produces a set of Basic Graph Patterns with a maximum n triple
patterns with unbound objects and subjects
–  k uniformly randomly selected subjects/objects are replaced
with placeholders
–  Placeholders are replaced with actual RDF terms randomly
retrieved from the dataset

Comparison of WatDiv with other RDF Benchmarks
Copyright [AHO+14]
•  Query Workload
–  Large range of queries
•  Mean join vertex degree distributed among 2 and 10
–  Join Vertex Types:
•  18% of queries are star joins, 4.4% in DBSB
•  61.3% of queries are path queries, 5.4% in DBSB

Comparison of WatDiv with other RDF Benchmarks
Copyright [AHO+14]
•  Data-Driven Features
–  DBSB and BSBM cover the ends of the spectrum of mean Join-Restricted f-
TP selectivity values
–  WatDiv covers the full spectrum of Restricted f-TP selectivity values
–  WatDiv covers a lower range of values for mean f-TP selectivity when
compared to DBSB

General Remarks
•  comparable to DBSB
•  more diverse than LUBM, SP2Bench and BSBM

FEASIBLE [SNM15]
•  Proposes a feature-based benchmark generation approach
from real queries
–  Structure-based
–  Data-driven based
•  Approach is similar to WatDiv Test Suite
•  Novel sampling approach for queries based on exemplars and
medoids
•  Propose SELECT, ASK, CONSTRUCT and DESCRIBE SPARQL
queries

FEASIBLE Query Features
•  Number of Triple Patterns
•  Number of Join Vertices
–  Distinguishing between «star», «path» , «hybrid» and «sink»
vertices
•  Join Vertex Degree
–  Sum of incoming and outgoing edges of the vertex
•  Triple Pattern Selectivity
–  Ratio of triples that match the triple pattern over all triples in
the dataset
o1
x
o2
p1
p2
x
yp1 p2 z
Star vertex: x Path vertex: x Hybrid vertex: x
o1
x
o2
p1
p2
y
z
Sink vertex: x
x
y
z

FEASIBLE Benchmark Generation
•  3-step benchmark generation
•  Data-set Cleaning
–  Leads to practically reliable benchmarks
•  Normalization of Feature Vectors
–  Query selection process requires distances between queries to
be computed
–  Normalize the query representations so that all queries are in a
unit hypercube
•  Query Selection
–  Based on the idea of exemplars [NS11]

•  Dataset Cleaning
–  Remove erroneous and zero-result queries from the set of real
queries used to generate the benchmark
–  Exclude all syntactically incorrect queries
–  Attach 9 SPARQL operators (UNION, DISTINCT, OPTIONAL, .. )
and 7 query features (join vertices, join vertex count etc.) to
each of the queries

•  Normalization of Feature Vectors
–  Queries are mapped to a vector of length 16 which stores the
query features
•  For binary SPARQL clauses (e.g., UNION is either used or not
used), store value 1. Else store value 0
•  All non-binary feature vectors are normalized by dividing
their value with the overall maximum value in the data set
•  Query representations are associated with values between 1
and 0

Assessing the performance of RDF Engines: Discussing RDF Benchmarks

Assessing the performance of RDF Engines: Discussing RDF Benchmarks

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (16)

Similar to Assessing the performance of RDF Engines: Discussing RDF Benchmarks

Similar to Assessing the performance of RDF Engines: Discussing RDF Benchmarks (20)

More from Holistic Benchmarking of Big Linked Data

More from Holistic Benchmarking of Big Linked Data (20)

Recently uploaded

Recently uploaded (20)

Assessing the performance of RDF Engines: Discussing RDF Benchmarks