The document discusses the evolution of a SPARQL benchmarking framework from version 1.x to 2.x. Version 1.x had several limitations, such as only supporting SPARQL queries and using a hardcoded methodology. Version 2.x addressed these limitations by supporting different types of operations, separating the test methodology, and making the framework more customizable and extensible. Examples are given of how the framework is used internally and how others can further customize it for their needs.
4.
Presentation I gave at this conference in 2012
Slides at http://www.slideshare.net/RobVesse/practical-sparql-benchmarking
Highlighted some issues with SPARQL Benchmarking:
Standard benchmarks all have known deficiencies
Lack of standardized methodology
Best benchmark is the one you run with your data and workload
Introduced the 1.x version of our SPARQL Query Benchmarker tool
Java tool and API for benchmarking
Used a methodology based upon a combination of the BSBM runner and the Revelytix SP2B white paper
Reports various appropriate statistics
Various configuration options to change what exactly is benchmarked, e.g. whether results are fully parsed and counted
5.
The 1.x tool was open sourced shortly after the 2012 conference under a 3-clause BSD License
Available on SourceForge
http://sourceforge.net/projects/sparql-query-bm/files/1.0.0/
Also as Maven artifacts (in Maven Central):
Group ID: net.sf.sparql-query-bm
Artifact IDs:
cmd
core
Latest 1.x Version: 1.1.0
7. The 1.x tool can only benchmark SPARQL queries
SPARQL 1.1 has been standardized since the 1.x version of the tool was written and adds various additional SPARQL features that you may want to test:
SPARQL Updates
SPARQL Graph Store Protocol
Queries are fixed
No parameterization support
Can't pass custom endpoint parameters in
For example enable/disable reasoning
Also no way to test endpoint-specific extensions
e.g. transactions
8.
Requires using HTTP endpoints to access the SPARQL system to be tested
Adds communication overheads to the results
Sometimes this may be desirable
No ability to test SPARQL operations in-memory
i.e. can't test lower level APIs
9. Only supports a single benchmarking methodology
Methodology is hard coded
Can't do things like run a subset of the provided operations on each run
Or repeat an operation within a run
Or retry an operation under specific failure conditions
Configuration of the methodology is tightly coupled to the methodology
Many aspects are actually independent of the methodology
10.
Used a simplistic text-based format
One query file per line
No way to specify additional parameters
No way to assign a friendly name to queries
Each query is simply assigned its filename as its name
11. There is a progress monitoring API but it is limited
E.g. gets called after a query completes but not before it starts
Makes it awkward/impossible to implement some kinds of monitoring
e.g. crash detection, memory usage
12.
In the interests of speed over usability we rolled our own command line argument parser
Means argument parsing is awkward to extend
14.
Earlier this year we found a compelling reason to rewrite the tool and address the various limitations
First 2.x release was made 9th June 2014
Minor bug fix and maintenance releases since
Releases available at:
http://sourceforge.net/projects/sparql-query-bm/files/
Code is now using Git
http://git.code.sf.net/p/sparql-query-bm/git sparql-query-bm-git
Mirrors available on GitHub for those who think that it is the one true source
https://github.com/rvesse/sparql-query-bm
Maven artifacts available through Maven Central as before:
Group ID: net.sf.sparql-query-bm
Artifact IDs: core, cmd and dist
Latest 2.x version: 2.0.1
15. Concept of Queries replaced with the general concept of Operations
Also divorces the definition of an operation from how to run said operation
Makes it easier to change runtime behaviour of operations
20 built-in operations provided
API allows defining and plugging in new operations as desired
http://sparql-query-bm.sourceforge.net/javadoc/latest/core/
16.
Several kinds of query/update:
Fixed
Parameterized
Dataset Size
Variants for both remote endpoints and in-memory datasets
Remote variants have additional NVP (name-value pair) variants
Allows adding custom parameters to the remote request
Accounts for 13 of the built-in operations
17.
One for each graph store protocol operation:
DELETE
GET
HEAD
POST
PUT
Accounts for a further 5 of the built-in operations
18.
Sleep
Do nothing for some period
Useful for simulating quiet periods as part of testing
Mix
Allows grouping a set of operations into a single operation
Lets you compose mixes from other mixes
19.
As already noted, in-memory variants of some operations are now available
These run tests against a Dataset implementation
Part of the Apache Jena ARQ API
Removes SPARQL Protocol and HTTP overhead from testing
Of course, depending on the Dataset implementation there may still be some communication overhead
But this likely uses lower-level native back-end communication protocols instead
20.
Addresses the limitation of hard coded methodology
Separates test running into three components:
Overall runner
Mix runner
Operation runner
Each has its own API and can be customized as desired
Various useful base/abstract implementations provided
Four different test runners are provided:
Benchmark
Smoke
Soak
Stress
21.
Smoke
Runs the mix once and indicates whether it passes/fails
Pass is defined as all operations pass
Soak
Run the mix continuously for some period of time
Test how a system reacts under continuous load
Stress
Run the mix with increasingly high load
Test how a system reacts under increasing load
AbstractRunner provides a basic framework and helper methods to make it easy to add custom runners or customize existing runners
22.
Allows customizing how mixes and individual operations are run
Some alternative implementations built in:
E.g. SamplingOperationMixRunner
Runs a sample of the operations in the mix
May include repeats
E.g. RetryingOperationRunner
Retries an operation if it doesn't succeed
Easy to implement your own
23.
Separates test configuration from the test runner
Interface with all common configuration defined
Endpoints
Timeouts
Progress Listeners
etc
NB - Runners are typically defined such that they restrict their input options to sub-interfaces that add runner-specific configuration, e.g.
Warm-ups for benchmarks
Total runtime for soak testing
Ramp up factor for stress testing
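This options pattern can be sketched as follows; the interface and method names here are invented for illustration and are not the framework's actual ones:

```java
// Hypothetical sketch: common settings live on a base interface,
// runner-specific settings on sub-interfaces, so a runner typed against
// the sub-interface can only accept a suitable configuration.
interface Options {
    String endpoint();
    int timeoutSeconds();
}

interface SoakOptions extends Options {
    long totalRuntimeMinutes(); // soak-specific setting
}

interface StressOptions extends Options {
    double rampUpFactor(); // stress-specific setting
}

public class OptionsSketch {

    // The compiler rejects a plain Options here: soak settings are required.
    static String describeSoak(SoakOptions opts) {
        return opts.endpoint() + " for " + opts.totalRuntimeMinutes() + " minutes";
    }

    public static void main(String[] args) {
        SoakOptions opts = new SoakOptions() {
            public String endpoint() { return "http://localhost:3030/ds/query"; }
            public int timeoutSeconds() { return 300; }
            public long totalRuntimeMinutes() { return 60; }
        };
        System.out.println(describeSoak(opts));
    }
}
```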
24.
Now using TSV as the file format
Still wanted it to be simple enough that someone with zero RDF/SPARQL knowledge can configure it
Each line is a series of parameters separated by a tab character
First parameter is an identifier for the type of the operation
Used to decide how to interpret the remaining parameters
Can define your own mix file format and register a loader for it
Possible to override the loader for a specific operation identifier since this has an API
Means you can do neat tricks like use a mix designed for remote endpoints against an in-memory dataset
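As a hypothetical illustration, a mix file in this format might look like the following; the operation identifiers and file paths are examples rather than the exact identifiers the tool registers (though param-query does appear in the tool's own help output):

```
query	queries/simple-select.rq
param-query	queries/lookup.rq	queries/lookup-params.tsv
update	updates/insert-data.ru
sleep	5000
```

Each column is separated by a tab character, and the first column selects the loader that interprets the remaining ones (e.g. a query file, a parameter file, or a sleep duration in milliseconds).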
26. Now provides notifications before and after operation and mix runs
Improvements to how some of the built-in implementations handle multi-threaded output
Makes it easier to distinguish where errors occurred when running multi-threaded benchmarks
27.
Now based upon the powerful open source Airline library
https://github.com/airlift/airline
Provides a command line interface to each built-in runner
Also provides AbstractCommand with all standard options exposed
Standardized exit codes across all commands
Comprehensive built-in help
Can help you define operation mixes
./operations
./operation --op param-query
These are things we've done (or are currently doing) with the framework that aren't in the open source releases
However the 2.x framework makes these (hopefully) easy to replicate yourself
30.
Many stores have rich REST APIs in addition to their SPARQL APIs
Can be useful to include testing of these in your mixes
Requires implementing two interfaces:
Operation
OperationCallable
Abstract implementations of both are available to give you the boilerplate bits
Internally we have 9 different custom operations defined which test a subset of our REST API:
Database Management
Asynchronous Queries
Import Management
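A custom REST API operation might be shaped roughly as follows. The interface here is a simplified stand-in, and all names and signatures are illustrative rather than the framework's actual Operation/OperationCallable API:

```java
import java.util.concurrent.Callable;

// Simplified stand-in for the framework's Operation extension point
// (hypothetical signature; the real interface differs).
interface Operation {
    String getName();
    Callable<Boolean> createCallable();
}

/** Illustrative custom operation exercising a store's REST API. */
public class RestApiOperation implements Operation {
    private final String url; // e.g. a database management endpoint (illustrative)

    public RestApiOperation(String url) {
        this.url = url;
    }

    @Override
    public String getName() {
        return "rest-list-databases"; // invented operation identifier
    }

    @Override
    public Callable<Boolean> createCallable() {
        // A real implementation would issue an HTTP request to this.url,
        // time it, and report success/failure back to the runner.
        return () -> url != null && url.startsWith("http");
    }

    /** Convenience wrapper: run the callable once, treating errors as failure. */
    public boolean runOnce() {
        try {
            return createCallable().call();
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        RestApiOperation op = new RestApiOperation("http://localhost/api/databases");
        System.out.println(op.getName() + " -> " + op.runOnce());
    }
}
```

In the real framework the abstract base implementations mentioned above would supply most of this boilerplate for you.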
31. One thing we're particularly interested in is how operations affect memory usage
We added custom progress listeners that track and monitor memory usage
Reports on min, max and average memory usage
We also have another progress listener that tracks processes to identify when a test run may have been impacted by other activity on the system
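A minimal sketch of the memory-tracking idea, assuming a listener that samples JVM memory from its after-operation hook; the class and method names are invented for illustration and do not come from the framework's progress listener API:

```java
// Hypothetical helper a custom progress listener could delegate to:
// samples used JVM heap after each operation and reports min/max/average.
public class MemoryUsageTracker {
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private long total = 0;
    private int samples = 0;

    /** Call this from the listener's after-operation hook. */
    public void sample() {
        Runtime rt = Runtime.getRuntime();
        record(rt.totalMemory() - rt.freeMemory());
    }

    // Separated out so the aggregation logic is testable with fixed values.
    void record(long usedBytes) {
        min = Math.min(min, usedBytes);
        max = Math.max(max, usedBytes);
        total += usedBytes;
        samples++;
    }

    public long min() { return min; }
    public long max() { return max; }
    public long average() { return samples == 0 ? 0 : total / samples; }

    public static void main(String[] args) {
        MemoryUsageTracker tracker = new MemoryUsageTracker();
        tracker.sample();
        System.out.printf("used memory: min=%d max=%d avg=%d%n",
                tracker.min(), tracker.max(), tracker.average());
    }
}
```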
32.
public class RetryOnAuthFailureOperationRunner extends RetryingOperationRunner {
    public RetryOnAuthFailureOperationRunner() {
        this(1);
    }

    public RetryOnAuthFailureOperationRunner(int maxRetries) {
        super(maxRetries);
    }

    @Override
    protected <T extends Options> boolean shouldRetry(Runner<T> runner, T options,
            Operation op, OperationRun run) {
        return run.getErrorCategory() == ErrorCategories.AUTHENTICATION;
    }
}
Extends the built-in RetryingOperationRunner
Simply adds a constraint on retries by overriding the shouldRetry() method
34.
Embrace Java 7 features fully
Use ServiceLoader to automatically discover new operations and mix formats
Make it even easier to customize runners
i.e. provide more abstraction of the current implementations