ESWC 2016 Tutorial on RDF Benchmarks
(This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688227.)
12. The Answer: Benchmark your engines!
• A Querying Benchmark comprises
– datasets (synthetic or real)
– set of software tools
• synthetic data generators
• query generators
– performance metrics, and
– set of clear execution rules
• Standardized application scenario(s) that serve as a basis for
testing systems
• Must include a clear set of factors to be measured and the
conditions under which the systems should be measured
13. Importance of Benchmarking
• Benchmarks exist
– To allow adequate measurements of systems
– To provide evaluation of engines for real (or close to real) use
cases
• Benchmarks help
– Designers and developers to assess the performance of their tools
– Users to compare the available tools and evaluate their suitability for their needs
– Researchers to compare their work to that of others
• Benchmarks lead to improvements:
– Vendors can improve their technology
– Researchers can address new challenges
– Current benchmark design can be improved to cover new
necessities and application domains
14. Tutorial Objective & Benefits
• Objectives:
– Discuss a set of principles and best practices for benchmark
development
– Present an overview of the current work on benchmarks for
RDF query engines
– Focus on identifying research challenges & unexplored
research directions
• Benefits for the audience
– Academic: Obtain a solid background, discover new research
directions
– Practitioner: find out which benchmarks are available, and the
advantages and limitations thereof
19. Resource Description Framework (RDF)
• An RDF triple is of the form (s, p, o) where
– s is the subject: the URI identifying the described resource
– p is the predicate: the URI indicating the relation between subject and object
– o is the object: either a simple literal value or the URI of another resource
• An RDF graph is a set of triples
– Can be viewed as a node- and edge-labeled directed graph
– It is published in different formats
• RDF/XML, Turtle, N-Triples, N3, …
(dbpedia:Good_Day_Sunshine, dbpedia-owl:artist, dbpedia:The_Beatles)
Close to how people see the world (as a graph)!
23. RDFS Inference
• Used to entail new information from what is explicitly stated in
the dataset
– Transitive closure across class and property hierarchies
– Transitive closure along the type and class/property relations
• Two ways to implement it: Forward & Backward Reasoning
– Forward Reasoning: closure is computed at loading time
– Backward Reasoning: closure is computed on the fly when needed
R1: (P1, rdfs:subPropertyOf, P2), (P2, rdfs:subPropertyOf, P3) → (P1, rdfs:subPropertyOf, P3)
R2: (C1, rdfs:subClassOf, C2), (C2, rdfs:subClassOf, C3) → (C1, rdfs:subClassOf, C3)
R3: (C1, rdfs:subClassOf, C2), (r1, rdf:type, C1) → (r1, rdf:type, C2)
R4: (P1, rdfs:subPropertyOf, P2), (r1, P1, r2) → (r1, P2, r2)
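When the closure is not materialized, the effect of rules R2 and R3 can be emulated at query time with a SPARQL 1.1 property path; a minimal sketch (ex:Publication is a hypothetical class, not from the slides):

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/>
SELECT ?instance WHERE {
  ?class rdfs:subClassOf* ex:Publication .   # transitive closure of the class hierarchy (R2)
  ?instance rdf:type ?class .                # typing step (R3)
}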
25. SPARQL: Querying RDF Data
• SPARQL: W3C Standard Language for Querying Linked
Data
• SPARQL 1.0 (2008) only allows accessing the data (query)
• SPARQL 1.1 (2013) introduces:
– Query Extensions: aggregates, sub-queries, negation,
expressions in the SELECT clause, property paths, assignment,
short form for CONSTRUCT, expanded set of functions and
operators
– Updates:
• Data management: Insert, Delete, Delete/Insert
• Graph management: Create, Load, Clear, Drop, Copy,
Move, Add
– Federation extension: SERVICE, VALUES, service variables (informative)
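To make the update side concrete, a small SPARQL 1.1 DELETE/INSERT sketch (the ex: vocabulary is illustrative, not from the slides):

PREFIX ex: <http://example.org/>
DELETE { ?album ex:status "draft" }      # remove the old value
INSERT { ?album ex:status "published" }  # and write the new one
WHERE  { ?album a ex:Album ; ex:status "draft" }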
26. SPARQL Queries (1)
• Building Block is the Triple Pattern
– RDF triple with variables
• Group Graph Patterns
– Built through inductive construction combining smaller
patterns into more complex ones using SPARQL operators
• Join - similar to relational join
• Union (UNION) – similar to relational union
• Optional (OPTIONAL) operators on triple patterns – similar
to relational left outer join (introduces negation in the
language)
• Filtering conditions (FILTER)
• Patterns on Named Graphs
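A small group graph pattern combining these operators (DBpedia IRIs used for illustration):

PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?work ?label WHERE {
  { ?work dbpedia-owl:artist ?x }          # triple pattern
  UNION                                    # union of two alternatives
  { ?work dbpedia-owl:writer ?x }
  OPTIONAL { ?work rdfs:label ?label }     # left outer join: ?label may stay unbound
  FILTER (?x = <http://dbpedia.org/resource/The_Beatles>)   # filtering condition
}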
27. SPARQL Queries (2)
• Aggregates
– specify expressions over groups of solutions
– Used, as in standard settings, when the result is computed over a
group of solutions rather than a single solution
• Example: average value of a set of values, sum of a set
– Aggregates defined in SPARQL 1.1 are COUNT, SUM, MIN,
MAX, AVG, GROUP_CONCAT, and SAMPLE.
– Solutions are grouped using the GROUP BY clause
– Pruning at group level is performed with the HAVING clause
• Additional Features
– duplicate elimination (DISTINCT)
– ordering results (ORDER BY) with an optional LIMIT clause
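An illustrative aggregate query using these clauses (DBpedia vocabulary for illustration, not from the slides):

PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?artist (COUNT(?album) AS ?albums) WHERE {
  ?album dbpedia-owl:artist ?artist .
}
GROUP BY ?artist              # group solutions per artist
HAVING (COUNT(?album) > 10)   # prune at group level
ORDER BY DESC(?albums)        # order the result
LIMIT 5                       # and keep the top five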
28. SPARQL Semantics
• SPARQL semantics based on Pattern Matching
– Queries describe subgraphs of the queried graph
– SPARQL graph patterns describe the subgraphs to match
Intuitively, a triple pattern denotes the triples in an RDF graph that are of a specific form:
TP1 = (?album, dbpedia-owl:artist, dbpedia:The_Beatles) matches all albums by The Beatles
TP2 = (dbpedia:The_Beatles, ?property, ?object) matches all information about The Beatles
29. SPARQL Types of Queries
• SELECT returns ordered multi-set of variable bindings
– Bindings: mappings of variables to RDF terms in the dataset
– SQL-Like Syntax
• ASK checks whether a graph pattern has at least one
solution - returns a Boolean value (true/false)
• CONSTRUCT returns a new RDF graph as specified by
the graph template of the CONSTRUCT clause using the
computed bindings from the query’s WHERE clause
• DESCRIBE returns the RDF graph containing the RDF
data about the requested resource
SELECT ?v1 ?v2 … WHERE { GraphPattern }
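Sketches of the other query forms, reusing the slide's Beatles example (ex:recorded is a hypothetical property; prefixes shared for brevity):

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX ex: <http://example.org/>

ASK { ?album dbpedia-owl:artist dbpedia:The_Beatles }       # returns true/false

CONSTRUCT { dbpedia:The_Beatles ex:recorded ?album }        # returns a new RDF graph
WHERE     { ?album dbpedia-owl:artist dbpedia:The_Beatles }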
32. Storing and Querying RDF data
• Schema agnostic
– triples are stored in one large triple table whose attributes are
(subject, predicate, object): «monolithic» triple stores
– dictionary encoding of URIs and literals as integer ids makes the
table more compact and the joins cheaper
Triple table:
     Subject                Predicate   Object
t1   dbr:Seven_Seas_Of_Rye  rdf:type    dbo:MusicalWork
t2   dbr:Starman_(song)     rdf:type    dbo:MusicalWork
t3   dbr:Seven_Seas_Of_Rye  dbo:artist  dbo:Queen

Dictionary:
id   URI/Literal
1    dbr:Seven_Seas_Of_Rye
2    dbr:Starman_(song)
3    dbo:MusicalWork
4    dbo:Queen
5    dbo:artist
6    rdf:type

Encoded triple table:
Subject  Predicate  Object
1        6          3
2        6          3
1        5          4
RDF-3X maintains 6 indexes, namely, SPO, SOP, OSP, OPS, PSO,
POS. To avoid storage overhead, indexes are compressed! [NW09]
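On a monolithic triple table every triple pattern is a scan of the same table, so even this small sketch compiles into a self-join on the subject column (query ours, reusing the slide's data):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?song WHERE {
  ?song rdf:type dbo:MusicalWork .   # first scan of the triple table
  ?song dbo:artist ?artist .         # second scan, self-joined on ?song
}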
33. Storing and Querying RDF data
• Schema aware:
– one table is created per property with subject and object attributes (Property
Tables [Wilkinson06])
Triple table:
Subject  Predicate  Object
ID1      type       BookType
ID1      title      "XYZ"
ID1      author     "Fox, Joe"
ID1      copyright  "2001"
ID2      type       CDType
ID2      title      "ABC"
ID2      artist     "Orr, Tim"
ID2      copyright  "1985"
ID2      language   "French"
ID3      type       BookType
ID3      title      "MNO"
ID3      language   "English"
ID4      type       DVDType
ID4      title      "DEF"
ID5      type       CDType
ID5      title      "GHI"
ID5      copyright  "1995"
ID6      type       BookType
ID6      copyright  "2004"

Clustered Property Table (frequent single-valued properties; NULL where a value is missing):
Subject  Type      Title  copyright
ID1      BookType  "XYZ"  "2001"
ID2      CDType    "ABC"  "1985"
ID3      BookType  "MNO"  NULL
ID4      DVDType   "DEF"  NULL
ID5      CDType    "GHI"  "1995"
ID6      BookType  NULL   "2004"

Leftover triple table (multi-valued or infrequent properties):
Subject  Predicate  Object
ID1      author     "Fox, Joe"
ID2      artist     "Orr, Tim"
ID2      language   "French"
ID3      language   "English"

Property-class tables (one table per rdf:type):
BookType:
Subject  Title  Author      copyright
ID1      "XYZ"  "Fox, Joe"  "2001"
ID3      "MNO"  NULL        NULL
ID6      NULL   NULL        "2004"
CDType:
Subject  Title  artist      copyright
ID2      "ABC"  "Orr, Tim"  "1985"
ID5      "GHI"  NULL        "1995"
Leftover triple table:
Subject  Predicate  Object
ID2      language   "French"
ID3      language   "English"
ID4      type       DVDType
ID4      title      "DEF"
34. Storing and Querying RDF data
• Vertically partitioned RDF [AMM+07]
Triple table:
Subject  Predicate  Object
ID1      type       BookType
ID1      title      "XYZ"
ID1      author     "Fox, Joe"
ID1      copyright  "2001"
ID2      type       CDType
ID2      title      "ABC"
ID2      artist     "Orr, Tim"
ID2      copyright  "1985"
ID2      language   "French"
ID3      type       BookType
ID3      title      "MNO"
ID3      language   "English"
ID4      type       DVDType
ID4      title      "DEF"
ID5      type       CDType
ID5      title      "GHI"
ID5      copyright  "1995"
ID6      type       BookType
ID6      copyright  "2004"

One two-column (Subject, Object) table per predicate:
type:
ID1  BookType
ID2  CDType
ID3  BookType
ID4  DVDType
ID5  CDType
ID6  BookType
title:
ID1  "XYZ"
ID2  "ABC"
ID3  "MNO"
ID4  "DEF"
ID5  "GHI"
copyright:
ID1  "2001"
ID2  "1985"
ID5  "1995"
ID6  "2004"
author:
ID1  "Fox, Joe"
artist:
ID2  "Orr, Tim"
language:
ID2  "French"
ID3  "English"

To get the most out of this particular decomposition, a column-oriented DBMS is recommended.
35. Comparison of Storage Techniques [BDK+13]
Sample graph (as triples):
subject      predicate  object
Larry Page   born       "1973"
Larry Page   founder    Google
Google       HQ         "MTV"
Google       employees  50,000
Google       industry   Internet
Google       industry   Software
Google       industry   Hardware

• Triple store: a single (subject, predicate, object) table holding all triples; columns are overloaded, and the schema does not change on updates.
• Type-oriented store: one table per entity type, e.g. person(born, founder) holding (Larry Page, "1973", Google) and company(HQ, employees) holding (Google, "MTV", 50,000), with multi-valued predicates such as industry kept in a residual triple table; a static mix of overloaded and normal columns, and the schema might change on updates.
• Predicate-oriented store: one (subject, object) table per predicate (born, founder, HQ, employees, industry); traditional relational column treatment, and the schema might change on updates.
[The slide's figure also contrasts these layouts with a conventional relational table, e.g. (company, released) with rows (Google, Android) and (Apple, iPhone), versus the binary (subject, object) table of the developer predicate.]
36. Storing Linked Data: Query Processing
• Schema Agnostic
– the algebraic plan obtained for a query involves a large number of
self-joins
– favorable for queries in which the predicate is a variable
• Hybrid and Schema-aware Approaches
– the algebraic plan contains operations over the appropriate
property/class tables (more in the spirit of existing relational
schemas)
– saves many self-joins over triple tables
– if the predicate is a variable, one query per property/class table
must be issued
37. Purpose of an RDF Querying Benchmark
• Test the performance of RDF stores
– Independently of underlying storage engine
– Independently of underlying logical and physical schema
– Independently of the query language actually executed by the engine
• SPARQL for native stores
• SQL (SPARQL translated to SQL) for relational stores
38. Overview
• Introducing Benchmarks
• A short discussion about Linked Data
– Resource Description Framework (Data Model)
– SPARQL (Query Language)
• Benchmarking Principles & Choke Points
• Benchmarks
– Synthetic
– Real
– Benchmark Generators
• Sum up: what did we learn today?
40. Why Benchmarks?
• Performance Evaluation
– There is no single recipe for how to do it right
– There are many ways to do it wrong
– There are a number of best practices but no broadly
accepted standard on how to design and develop a
benchmark
• Questions asked:
– What data/data sets should we use?
– Which workload/queries should we consider?
– What to measure and how to measure?
43. Micro Benchmarks: Advantages
• Very focused
– Test a specific operator of the system
• Controllable data & workload
– Synthetic and Real Data sets
• Different value ranges and value distribution and correlations
(mostly applicable to structured data)
– Various data sizes to tackle scalability concerns
• Queries
– Workloads of different complexity & size
• Complexity: as to the types of query operators and patterns
• Size: as to the number of query operators involved
– Allow broad parameter range(s)
! Useful for detailed, in-depth analysis
! Low setup threshold
! Easy to run
46. Standard Benchmarks: Advantages & Disadvantages
• Advantages
– Mimic real-life scenarios (respond to real needs)
• E.g., TPC is a business oriented benchmark
– Publicly available
– Well defined
– Provide scalable data sets and workloads
– Metrics are well defined
• Disadvantages
– Outdated (standardization is a lengthy process)
• XQuery took around 7 years to become a standard
• TPC benchmark definition is still an ongoing process
– Very large and complicated to run
– Limited dataset variation (target a specific type of data)
– Limited Workload (focuses on the application in mind)
– Systems are often optimized for the benchmark(s)
47. Benchmark Development Methodology
• Management and methodological activities performed by a
group of people
– Management: organizational protocols to control the process
– Methodological: principles, methods and steps for benchmark
creation
• Benchmark Development
– Roles and bodies: the people/groups involved in the development
– Design principles: fundamental rules that direct the
development of a benchmark
– Development process: the series of steps to develop a benchmark,
based on choke points
Choke points: the set of technical difficulties whose resolution
forces systems to improve their performance.
48. The Example Standard Benchmark: TPC
• Transaction Processing Performance Council (TPC)
– non-profit corporation focused on developing data-centric
benchmark standards and disseminating objective, verifiable
performance data to the industry
– its goal is to «create, manage and maintain a set of fair and
comprehensive benchmarks that enable end-users and vendors to
objectively evaluate system performance under well defined,
consistent and comparable workloads» [NPM+12]
Active TPC Benchmarks (2016)
Benchmark  Explanation
TPC-C      Focuses on transactions
TPC-DI     Focuses on ETL processes
TPC-DS     Decision support solutions for, but not limited to, Big Data
TPC-E      On-Line Transaction Processing (OLTP) workload
TPC-H      Decision support benchmark: ad hoc queries and concurrent data modifications
TPC-VMS    Virtual Measurement Single System specification for running and reporting performance metrics for virtualized databases
TPCx-HS    Measures hardware, operating system and commercial Apache Hadoop File System API implementations
TPCx-V     Measures the performance of servers running database workloads in virtual machines
49. Benchmark Development Process (1)
• Design Principles [L97]
Principle       Comment
Relevant        The benchmark is meaningful for the target domain
Understandable  The benchmark is easy to understand and use
Good Metrics    The metrics defined by the benchmark are linear, orthogonal and monotonic
Scalable        The benchmark is applicable to a broad spectrum of hardware and software configurations
Coverage        The benchmark workload does not oversimplify the typical environment
Acceptance      The benchmark is recognized as relevant by the majority of vendors and users
50. Benchmark Development Process (2)
• Benchmarking Metrics
– Performance
– Price/Performance
– Energy/Performance Metrics: Energy metric to measure the energy
consumption of system components
• TPC Pricing specification
– Provides consistent methodologies for computing the price of the
benchmarked system, licensing of software, maintenance, …
Benchmark  Metrics
TPC-C      Transaction rate (tpmC), price per transaction ($/tpmC)
TPC-E      Transactions per second (tpsE)
TPC-H      Composite Queries per Hour (QphH@Size), price per composite query per hour ($/QphH@Size)
53. Design Principles: Desirable Attributes of a Benchmark
• Relevant/Representative: based on realistic
use case scenarios and must reflect the needs
of the use case
• Understandable/Simple: the results and
workload are easily understandable by users
• Portable/Fair/Repeatable: no system
benefits from the benchmark. Must be
deterministic and provide a «gold standard»
• Metrics: should be well defined to be able to
assess and compare the systems.
• Scalable: datasets should be in the order of
billions of «objects»
• Verifiable: allow verifiable results in each
execution
56. Choke Points à la TPC-H
• CP1: Aggregation Performance
– Ordered aggregation, small group-by keys, interesting orders, dependent
group-by keys
• CP2: Join Performance
– Large joins, sparse foreign keys, rich join order optimization, late projection
• CP3: Data Access Locality (materialized views)
– Columnar locality, physical locality by key, detecting correlation
• CP4: Expression Calculation
– Raw Expression Arithmetic, Complex Boolean Expressions in Joins and
Selections, String Matching Performance
• CP5: Correlated Sub-queries
– Flattening sub-queries, moving predicates to a sub-query, overlap between
outer- and sub-query
• CP6: Parallelism and Concurrency
– Query plan parallelization, workload management, result re-use
57. Choke Points à la RDF
CP1: Join Ordering
 1. Tests whether the engine can evaluate the trade-off between the time spent to find the best execution plan and the quality of the output plan
 2. Tests the ability of the engine to consider cardinality constraints expressed by the different kinds of schema constraints (e.g., functional and inverse functional properties)
CP2: Aggregation
 Aggregations are implemented with sub-selects in the SPARQL query; the optimizer should recognize the operations included in the sub-selects and evaluate them first.
CP3: Optional & Nested Optional Clauses
 Tests the ability of the optimizer to produce a plan where the optional triple patterns are executed last, since optional clauses do not reduce the size of intermediate results.
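As an illustration of CP3, a sketch of a query whose OPTIONAL clause should be scheduled last (the ex: vocabulary is ours, purely illustrative):

PREFIX ex: <http://example.org/>
SELECT ?person ?mbox WHERE {
  ?person ex:worksAt ?org .           # required patterns: evaluate first,
  ?org ex:locatedIn ex:Greece .       # they shrink the intermediate result
  OPTIONAL { ?person ex:mbox ?mbox }  # left outer join: never filters rows,
}                                     # so a good optimizer runs it last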
63. Benchmark Components
• Datasets
• The raw material of the benchmark against which the workload
will be evaluated
• Synthetic & Real Datasets
! Synthetic: Produced with a data generator (that hopefully
produces data with interesting characteristics)
! Real: Widely used datasets from a domain of interest
• Query Workload
• Sets of queries and/or updates to evaluate the system with
• Metrics
• The performance metric(s) that characterize the system's behavior
68. LUBM Data Generation (2)
• Assignment of Identifiers is done using zero-based indexes
– University0, Department0, …
• Data generation is repeatable across universities
– The user enters a seed for the random number generator
employed in the data generation process
• Generated data is represented in OWL Lite
• Configurable serialization and representation model (RDF/
XML in .owl files, DAML)
69. LUBM Queries (1)
• 14 Realistic Queries
• Written in SPARQL 1.0
• Query Design criteria
– Input Size:
• proportion of the class instances involved and entailed
in the query to the total instances in the dataset
– Selectivity:
• estimated proportion of the class instances that satisfy
the query criteria
• depends on the input dataset size
73. LUBM Performance Metrics (1)
• Load Time:
– Time needed to parse, load, and reason over a dataset
– Focuses on persistent stores
• Repository Size:
– For persistent storage only
– The size of all files that constitute the repository
• Query Response Time:
– Average time over 10 executions of a query (warm runs)
77. SP2Bench Schema DBLP (2)
• Probability distribution of selected attributes per document class
• Additional assumption: attributes are independent
– The existence of an attribute does not depend on the existence of another
• Use Bell-shaped Gaussian curves to approximate input data
– Typically used to model normal distributions
• Studied the number of class instances over time and modeled
those with a power law distribution
Attribute  Article  Inproc.  Proc.   Book    WWW
author     0.9895   0.9970   0.0001  0.8937  0.9973
cite       0.0048   0.0104   0.0001  0.0079  0.0000
editor     0.0000   0.0000   0.7992  0.1040  0.0004
isbn       0.0000   0.0000   0.8592  0.9294  0.0000
…          …        …        …       …       …
78. SP2Bench Data Generation
• Synthetically produced extensional data that conform to the DBLP
Schema
• Use of existing external vocabularies to describe resources in a uniform
way
– FOAF – Friend of a Friend (persons) [FOAF], SWRC – Semantic Web
for Research Communities (scientific publications) [SWRC], DC –
Dublin Core [DC]
• Introduce blank nodes and RDF containers (rdf:Bag) to capture all aspects
of the RDF data model
• Data generation takes into account data approximation as reflected in
the Gaussian curves
• Data generator takes as input either the triple count, or year up to which
the data is generated
– Always ending up in a consistent state!
• Random functions are based on a fixed seed making data generation
deterministic
79. SP2Bench Queries (1): Characteristics
• 17 queries
– 12 main queries and modifications thereof
• Provided in natural language, in SPARQL 1.0 and SQL
translations are also available
• Query design criteria
– Focus on SELECT and ASK SPARQL forms
– Aim at covering the majority of SPARQL constructs
(including DISTINCT, ORDER BY, LIMIT, OFFSET)
82. SP2Bench Performance Metrics
• Loading Time:
– time needed to parse, load and reason using the tested system
for a dataset
– Focuses on persistent stores
• «Per-query» performance:
– Performance of each query
• «Global» performance:
– Report the arithmetic and the geometric mean of all query runtimes;
the geometric mean is computed as follows:
1. Multiply the execution times of all 17 queries
2. Penalize queries that fail with a 3600 s penalty
3. Take the 17th root of the product
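Written compactly (our rendering of the three steps above):
GeometricMean = (t_1 × t_2 × … × t_17)^(1/17), where t_i is the runtime of query i and t_i = 3600 s if query i fails.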
• Memory consumption
– High watermark of main memory consumption
– Average memory consumption of all queries
84. BSBM Schema (1)
• E-commerce use case: products are offered by several vendors
and consumers post reviews for those products
Classes and their properties (cardinalities in brackets; the slide shows these as a diagram):
– Product: rdfs:label, rdfs:comment, rdf:type, bsbm:producer, bsbm:productFeature [9..22], bsbm:productPropertyTextual1–3, bsbm:productPropertyTextual4–5 [0..1], bsbm:productPropertyNumeric1–3, bsbm:productPropertyNumeric4–5 [0..1]
– ProductType: rdfs:label, rdfs:comment, rdf:type, rdfs:subClassOf [0..1]
– ProductFeature: rdfs:label, rdfs:comment, rdf:type
– Producer: rdfs:label, rdfs:comment, rdf:type, foaf:homepage, bsbm:country
– Vendor: rdfs:label, rdfs:comment, rdf:type, foaf:homepage, bsbm:country
– Offer: bsbm:product, bsbm:vendor, bsbm:price, bsbm:validFrom, bsbm:validTo, bsbm:deliveryDays, bsbm:offerWebpage
– Review: bsbm:reviewFor, rev:reviewer, bsbm:reviewDate, dc:title, rev:text, bsbm:rating1–4 [0..1]
– Person: foaf:name, foaf:mbox_sha1sum, bsbm:country
[The diagram also annotates the associations between these classes with cardinalities, e.g. a product has 9..22 product features.]
85. BSBM Schema & Data Characteristics (1)
• Every product has a type from a product hierarchy
• The product hierarchy is not fixed (it depends on the dataset size)
– Its depth and width depend on the chosen scale factor n
– Hierarchy depth: d = 1 + round(log10(n)) / 2
– Branching factor:
• root level: bf_root = 1 + round(log10(n))
• all other levels: 8
• Product types are assigned a variable number of product features
– the number lies between lowerBound = 35*i / (d*(d+1)/2 − 1) and
upperBound = 75*i / (d*(d+1)/2 − 1), with i the level of the
product type in the hierarchy
– The set of possible features for a given product type is the union of
the features of the type and all its «super-types»
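A quick worked instance of these formulas (our example, not from the slide): for scale factor n = 10,000 we have round(log10(n)) = 4, so the hierarchy depth is d = 1 + 4/2 = 3 and the root branching factor is bf_root = 1 + 4 = 5.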
86. BSBM Schema & Data Characteristics (2)
• Products, Vendors, Offers
– Products that share the same type also share the same set of features
– For a given product, each of its possible features is chosen with a
hard-coded probability of 25%
– A normal distribution with mean μ=50 and standard deviation
σ=16.6 is employed to associate products with producers
– Vendors are associated to countries following hard-coded
distributions
– The number of offers is 20·n; offers are distributed over products
following a normal distribution with «fixed parameters» μ=n/2 and σ=n/4
– Offers are distributed over vendors following a normal
distribution with «fixed parameters» μ=2000 and σ=667
87. BSBM Schema & Data Characteristics (3)
• Reviews
– The number of reviews is 10 times the scale factor n
– Datatype property values (title and text) contain between 50 and
300 words
– Up to 4 ratings, each a random integer between 1 and 10
– Each rating is missing with hard-coded probability 10%
– Distributed over products with a normal distribution depending
on dataset size and following μ=n/2 and σ=n/4
– Number of reviews per reviewer follows normal distribution
with μ=20 and σ=6.6
– Reviews are generated until all reviews are assigned a reviewer
– Reviewer countries follow the same distribution as vendor
countries
89. BSBM Queries (1)
• 12 Queries
• The query mix emulates the search and navigation patterns of a customer
looking for a product
• BSBM queries are given in natural language, SPARQL and SQL
Query Description
Q1 Find products for a given set of generic features
Q2 Retrieve basic information about a specific product for display purposes
Q3 Find products having some specific features and not having one feature
Q4 Find products matching two different sets of features
Q5 Find products that are similar to a given product
Q6 Find products having a label that contains a specific string
Q7 Retrieve in-depth information about a product including offers and reviews
Q8 Give me recent reviews in a given language for a specific product
Q9 Get information about a reviewer
Q10 Get cheap offers which fulfill the consumer’s delivery requirements
Q11 Get all information about an offer
Q12 Export information about an offer into another schema
93. Semantic Publishing Benchmark (SPB)
• Developed in the context of FP7 EU Project LDBC (2012-2015)
• LDBC’s goals:
– Develop querying benchmarks that will spur research &
industry progress in large-scale graph and RDF data
management
• scalability, storage, indexing and query optimization
techniques for RDF and graph database solutions
• quantitatively and qualitatively assess different
solutions for RDF data integration
– To establish an industry-neutral entity, the LDBC foundation,
à la the Transaction Processing Performance Council (TPC)
95. SPB Design: Requirements
• Storing and processing RDF data
– Storing and isolating data in separate RDF graphs
– Supporting the following SPARQL standards:
• SPARQL 1.1 Protocol, Query, Update
• Support for Schema Languages
– Support for RDFS to obtain the correct answers
– Optional support for the RL profile of Web Ontology Language
(OWL2 RL) in order to pass the conformance test suite
• Loading data from RDF serialization formats
– N-Quads, TriG, Turtle, etc.
96. SPB Schema: BBC Ontologies (1)
• Core Ontologies: 7 ontologies describe basic concepts about
entities and relationships in the domain of interest
– Basic Concepts: Creative Works, Places, Persons, Provenance
Information, Company Information, etc.
[Figure: the BBC Creative Works ontology. CreativeWork (subclasses: NewsItem, BlogPost, Programme) has datatype properties cwork:title, cwork:shortTitle, cwork:description and cwork:altText (String), cwork:category (xsd:Any), and cwork:dateCreated / cwork:dateModified (xsd:dateTime). cwork:tag (with sub-properties about and mentions) links creative works to Thing instances such as Person, Place, Organisation, Event and Theme; Things can be owl:sameAs owl:Thing resources. cwork:audience points to Audience (InternationalAudience, NationalAudience); cwork:primaryFormat points to cwork:Format (TextualFormat, VideoFormat, AudioFormat, ImageFormat, InteractiveFormat, PictureGalleryFormat); cwork:thumbnail points to cwork:Thumbnail, refined by thumbnailType (StandardThumbnail, CloseUpThumbnail, FixedSize66/266/466Thumbnail).]
97. SPB Schema: BBC Ontologies (2)
• Domain Ontologies: 3 ontologies describe concepts and
properties related to a specific domain
– sports (competitions, events)
– politics (entities)
– news (concepts that journalists tag annotations with)
• Statistics
– 74 classes
– 88 datatype properties, 28 object properties
– 60 rdfs:subClassOf relations (maximum hierarchy depth 3),
17 rdfs:subPropertyOf relations (maximum depth 1)
– 105 rdfs:domain and 115 rdfs:range axioms
– 8 owl:oneOf class axioms, 1 owl:TransitiveProperty property
99. SPB Data Generation (1): Process
1. Loader
– loads the ontologies & reference data into the repository
2. Data Generator
a. retrieves instances from the reference datasets
b. generates Creative Works according to pre-defined
allocations and models
c. writes the generated data to disk
[Figure: SPB data generator architecture. The Ontology & Reference Data Set Loader loads the BBC ontologies and reference datasets into the RDF repository through its SPARQL endpoint; the Creative Works Generator reads the data generation parameters, retrieves reference instances, and writes the generated Creative Works to disk.]
101. SPB Operational Phases
• Data Loading
1. Initial loading of reference datasets
• BBC datasets enriched with DBPedia Person and GeoNames
place data
2. Generation of Creative Works
• Parallel generation (multi-threaded and multi-process)
3. Loading of Creative Works in the RDF repository
• Running the Benchmark
1. Warm-up phase
2. Run the benchmark using the Test Driver
3. Run conformance tests (OWL2 RL) [optional]
102. Benchmark Configuration
• Data Generator
– Allocation of tags in Creative Works
• Correlations of creative works with important entities
(persons, places, events)
• Clustering of Creative Works around major / minor events
– Size of generated data (triples)
– Parallel data generation
• Test Driver
– Distribution of queries in the query-mix
• editorial operations (deletion/addition of RDF triples)
• aggregate operations (complex SPARQL queries)
– Number of editorial / aggregation agents
– Duration of Warm-up and Benchmark phases
– Each operational phase can be enabled or disabled
104. SPB Queries (1)
• Base and Advanced Workloads
– Base Workload: 12 queries & update operations
– Advanced Workload: 24 queries
• Workloads based on real queries used by BBC journalists
during their editorial operations
• Editorial agents – simulate the editorial work performed by
journalists:
– Insert, Update, Delete
• Aggregation agents – simulate retrieval operations
performed by end-users
111. YAGO (Yet Another Great Ontology)[SKW07]
• High-quality multilingual knowledge base derived from
Wikipedia, WordNet and GeoNames
• Schema
– Wikipedia Entities, WordNet and GeoNames Concepts and
Relationships: associates WordNet taxonomy with Wikipedia
Category System
– 10 million schema entities
• Dataset
– 120 million triples about schema entities
– 2.625 million links to DBPedia
• Queries
– No representative set of queries is offered by YAGO
– [NW10] provides a representative set of 8 queries for RDF-3X
Evaluation
118. WordNet [WordNet]
• Large lexical database of English, developed under the
direction of George A. Miller (Emeritus).
• Schema
– Nouns, verbs, adjectives and adverbs are grouped into sets of
cognitive synonyms (synsets), each expressing a distinct
concept.
– Synsets are interlinked by means of conceptual-semantic and
lexical relations. The resulting network of meaningfully related
words and concepts can be navigated with the browser.
• Dataset
– Approximately 1.9 million triples (300MB).
• Queries
– No representative query workload
121. DBPedia SPARQL Benchmark (DBSB) [MLA+14]
• Generic Methodology for SPARQL Benchmark Creation
• Based on
– Flexible data generation that mimics an input data source
– Query-log mining
– Clustering of queries
– SPARQL queries feature analysis
• Methodology is schema agnostic
– Demonstrated using DBPedia KB
• Proposed approach applied on various sizes of the DBPedia
Knowledge Base
• Benchmark proposes query workload based on real queries
expressed against DBPedia
123. DBSB Data Generation (2)
• Idea
1. Large datasets are produced by
• duplicating all triples and changing their namespace
2. Smaller datasets are produced by
• removing triples in a way that preserves the properties of
the original graph
• using a seed-based method, built on the assumption that a
representative set of resources is obtained by sampling
across classes:
1. For each selected element in the dataset, its Concise
Bounded Description (CBD) is retrieved and added to the
queue
2. The process is repeated until the desired number of triples
is reached
125. DBSB Query Analysis (2)
• Query Selection
1. Use the DBPedia SPARQL query log (31.5 million queries over a
three-month period)
2. Reduce the initial set of queries by considering
• Query Variations: use a standard way to name variables to
reduce differences among queries (promoting query
constructs such as DISTINCT, REGEX)
• Query Frequency: discard queries with low frequency since
they do not contribute to the overall query performance
– Result: 35,965 queries
3. String Stripping: remove all SPARQL keywords and common
prefixes
4. Similarity Computation: compute the similarity of the stripped
queries
126. DBSB Query Analysis (3)
• Query Selection (cont’d)
4. Similarity Computation
• To reduce the time needed for benchmark compilation, use the
LIMES [NS11] framework
• Use the Levenshtein string similarity measure with a 0.9 threshold
• This reduces the number of similarity computations to 16.6% of
the full Cartesian product of queries
5. Clustering
• Apply graph clustering to the query similarity graph of step 4
• Goal: identify groups of similar queries out of which
prototypical queries will be generated
• Use the BorderFlow [NS09] algorithm, which follows a seed-based
approach
• Obtains 12,272 clusters; 24% contain a single query
• Select the clusters with more than 5 queries
127. DBSB Query Generation (1)
• Select the most interesting SPARQL queries
– Which are the most frequently asked SPARQL queries?
– Which of those queries cover the most SPARQL features?
• SPARQL Features
– Overall number of triple patterns
• Test the efficiency of join operations (CP1)
– SPARQL pattern constructors (UNION & OPTIONAL)
• Handle parallel execution of Unions (CP5)
• Perform OPTIONALs as late as possible in the query plan (CP3)
– Solution sequences & modifiers (DISTINCT)
• Efficiency of duplicate elimination (CP10)
– Filter conditions and operators (FILTER, LANG, REGEX, STR)
• Efficiency of engines to execute filters as early as possible (CP6)
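A sketch of a query exercising several of these features at once (illustrative, not one of the DBSB templates; ex: is a hypothetical namespace):

PREFIX ex: <http://example.org/>
SELECT DISTINCT ?s ?label WHERE {                      # DISTINCT: duplicate elimination (CP10)
  { ?s ex:name ?label } UNION { ?s ex:alias ?label }   # UNION branches (CP5)
  OPTIONAL { ?s ex:homepage ?page }                    # OPTIONAL, best evaluated late (CP3)
  FILTER ( LANG(?label) = "en" )                       # filter, best pushed early (CP6)
}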
133. Apples and Oranges [DKS+11]
• Nothing can better represent data than the data itself!
• Idea: Turn every dataset into a benchmark
1. No need to synthetically generate values
• Use the actual data values in the dataset
2. No need to synthetically generate queries.
• Queries that are already known to run on your data can be
used in the benchmark.
3. But we need to cover the structuredness spectrum
• to get data as close as possible to the real world data
• to see how the systems perform when data goes from
very structured to less structured
134. Counting Coins [DKS+11]
• Start with a dataset of size S and coherence CH = 0.5
• Aim for a dataset of size S' and coherence CH',
where S > S' and CH > CH'.
Process:
• Assign a coin to each triple (s, p, o) and compute the
impact on CH of its removal
– Removing the triple decreases the size by 1.
Example: consider the triple (person1, ext, x5304). Removing it
from D gives a dataset with CH(T, D) = 0.467, so
coin(person1, ext, x5304) = 0.5 − 0.467 = 0.033.
• Formulate (automatically) an integer programming
problem whose solutions will tell us how many coins to
remove to achieve the desired coherence CH’ and size S’.
subject predicate object
person0 name Eric
person0 office BA7430
person0 ext x4401
person1 name Kenny
person1 office BA7349
person1 office BA5439
person1 ext x5304
person2 name Kyle
person2 ext x6281
person3 name Timmy
person3 major C.S.
person3 GPA 3.4
person4 name Stan
person4 GPA 3.8
person5 name Jimmy
person5 GPA 3.7
One of the few occasions in life where having
too many coins is undesirable…
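The flavor of that integer program, in our notation (a sketch, not the exact formulation of [DKS+11]): introduce a binary variable x_t per triple t and

minimize | Σ_t coin(t)·x_t − (CH − CH') |
subject to Σ_t x_t = S − S', x_t ∈ {0, 1}

so that deleting exactly the triples with x_t = 1 reaches the target size S' and (approximately) the target coherence CH'.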
139. WatDiv Structural Features (3)
• Join Vertex Type
– Plays an important role in how RDF engines determine
efficient query plans
• E.g., star queries promote efficient merge joins
• 3 (mutually non-exclusive) types of join vertices; the conditions
range over the triple patterns incident on x:
– Vertex x is of type SS+ if x is the subject of every triple pattern
(s, p, o) incident on x
– Vertex x is of type OO+ if x is the object of every triple pattern
(s, p, o) incident on x
– Vertex x is of type SO+ if there are incident triple patterns
(s, p, o) and (s', p', o') with x = s and x = o'
[Figure: three example query graphs, one per vertex type: ?m as an SS+ vertex, ?x as an OO+ vertex, and ?x as an SO+ vertex.]
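For instance, the following sketch (ex: predicates are hypothetical) is a star-shaped BGP in which ?m is an SS+ join vertex:

PREFIX ex: <http://example.org/>
SELECT ?m ?n ?x ?l WHERE {
  ?m ex:p1 ?n .   # ?m is the subject of every
  ?m ex:p2 ?x .   # triple pattern incident on it,
  ?m ex:p3 ?l .   # hence an SS+ vertex
}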
140. WatDiv Data-driven Features (1)
• A system’s choice on the most efficient query plan depends on
– (a) the characteristics of the dataset and
– (b) the query
• If the system relies on selectivity estimations and result
cardinality, the same query will have a different query plan for
dataset(s) of different sizes
• Different cases:
– Queries have a diverse mix of result cardinalities
– Some triple patterns are very selective, others are not
– All triple patterns are equally selective
141. WatDiv Data-driven Features (2)
• Result Cardinality CARD(Ā, G)
– the number of solutions in the result of evaluating a graph
pattern Ā = <A, F> over graph G
• Filtered Triple Pattern Selectivity (f-TP Selectivity) SEL^F_G(tp)
– the ratio of distinct solution mappings of a triple pattern tp to
the number of triples in graph G
• Measures
1. Result cardinality
2. Mean & standard deviation of the f-TP selectivities of the
query's triple patterns
• Important for distinguishing queries whose triple patterns are
almost equally selective from queries with varying f-TP
selectivities
143. WatDiv Data-Driven Features (4)
• BGP-restricted f-TP selectivity SEL^F_G(tp | Ā)
• assesses how much a triple pattern contributes to the overall
selectiveness of the query
• the fraction of the triple pattern's distinct solution mappings that
are compatible with some solution mapping in the query result
• Join-restricted f-TP selectivity SEL^F_G(tp | x)
• assesses how much a filtered triple pattern contributes to the
overall selectiveness of the joins it participates in
• for x a join vertex and tp a triple pattern incident on x, the
x-restricted f-TP selectivity of tp over graph G is the fraction of its
distinct solution mappings that are compatible with a solution
mapping in the result of the sub-query containing all triple patterns
incident to x
144. WatDiv Test Suite (1)
• Components: Data Generator and Query Generator
• Data Generator
– Allows users to define their own dataset controlling
• Entities to include
• Topology of the graphs allowing one to mimic the real types
of data distributions in the Web
– «well-structuredness» of entities
– probability of entity associations
– cardinality of property associations
– Important: instances of the same entity type do not all have the
same set of attributes, breaking the «relational nature» of previous
RDF benchmarks
145. WatDiv Test Suite (2)
• Query Template Generator
– User-specified number of templates
– User specified template characteristics
• Number of triple patterns
• Types of joins and filters in the triple patterns
– Traverses the WatDiv schema using a random walk and
generates a set of query templates
• Query Generator
– Instantiates the query templates with terms (IRIs, literals etc.)
from the RDF dataset
– User-specified number of queries produced
146. WatDiv Test Suite (3)
• Query Template Generator
– Random Walk on an internal representation of the schema
• Entity types in the schema correspond to graph vertices
• Relationships (i.e., object type properties) are graph edges
• Vertices are annotated with data type properties (i.e.,
attributes)
– Produces a set of Basic Graph Patterns with at most n triple
patterns, with unbound subjects and objects
– k uniformly randomly selected subjects/objects are replaced
with placeholders
– Placeholders are replaced with actual RDF terms randomly
retrieved from the dataset
150. FEASIBLE Query Features
• Number of Triple Patterns
• Number of Join Vertices
– Distinguishing between «star», «path» , «hybrid» and «sink»
vertices
• Join Vertex Degree
– Sum of incoming and outgoing edges of the vertex
• Triple Pattern Selectivity
– Ratio of triples that match the triple pattern over all triples in
the dataset
[Figure: the four join vertex shapes. Star vertex x: x is the subject of several triple patterns (x p1 o1, x p2 o2). Path vertex x: x is the object of one pattern and the subject of the next (y p1 x, x p2 z). Hybrid vertex x: a mix of incoming and outgoing edges. Sink vertex x: x is the object of several triple patterns.]
151. FEASIBLE Benchmark Generation
• 3-step benchmark generation
• Data-set Cleaning
– Leads to practically reliable benchmarks
• Normalization of Feature Vectors
– Query selection process requires distances between queries to
be computed
– Normalize the query representations so that all queries are in a
unit hypercube
• Query Selection
– Based on the idea of exemplars [NS11]