Rob Vesse 
rvesse@yarcdata.com 
@RobVesse
1. Rewind to 2012 
2. Limitations 
3. Evolving the Framework 
4. Examples 
5. Future Work

1. Rewind to 2012
- Presentation I gave at this conference in 2012
  - Slides at http://www.slideshare.net/RobVesse/practical-sparql-benchmarking
- Highlighted some issues with SPARQL benchmarking:
  - Standard benchmarks all have known deficiencies
  - Lack of a standardized methodology
  - The best benchmark is the one you run with your own data and workload
- Introduced the 1.x version of our SPARQL Query Benchmarker tool
  - Java tool and API for benchmarking
  - Used a methodology based upon a combination of the BSBM runner and the Revelytix SP2B white paper
  - Reports various appropriate statistics
  - Various configuration options to change what exactly is benchmarked, e.g. whether results are fully parsed and counted
- The 1.x tool was open sourced shortly after the 2012 conference under a 3-clause BSD License
- Available on SourceForge
  - http://sourceforge.net/projects/sparql-query-bm/files/1.0.0/
- Also as Maven artifacts (in Maven Central):
  - Group ID: net.sf.sparql-query-bm
  - Artifact IDs:
    - cmd
    - core
  - Latest 1.x version: 1.1.0

2. Limitations
- The 1.x tool can only benchmark SPARQL queries
- SPARQL 1.1 has been standardized since the 1.x version of the tool was written and adds various additional SPARQL features that you may want to test:
  - SPARQL Updates
  - SPARQL Graph Store Protocol
- Queries are fixed
  - No parameterization support
- Can't pass custom endpoint parameters in
  - For example, to enable/disable reasoning
- Also no way to test endpoint-specific extensions
  - e.g. transactions
- Requires using HTTP endpoints to access the SPARQL system to be tested
  - Adds communication overheads to the results
  - Sometimes this may be desirable
- No ability to test SPARQL operations in-memory
  - i.e. can't test lower level APIs
- Only supports a single benchmarking methodology
- Methodology is hard coded
  - Can't do things like run a subset of the provided operations on each run
  - Or repeat an operation within a run
  - Or retry an operation under specific failure conditions
- Configuration of the methodology is tightly coupled to the methodology
  - Many aspects are actually independent of the methodology
- Used a simplistic text-based format
  - One query file per line
- No way to specify additional parameters
- No way to assign a friendly name to queries
  - Each query is simply named after its file
- There is a progress monitoring API but it is limited
  - E.g. it gets called after a query completes but not before it starts
- Makes it awkward/impossible to implement some kinds of monitoring
  - e.g. crash detection, memory usage
- In the interests of speed over usability we rolled our own command line arguments parser
  - Means argument parsing is awkward to extend

3. Evolving the Framework
- Earlier this year we found a compelling reason to rewrite the tool and address the various limitations
- First 2.x release was made on 9th June 2014
  - Minor bug fix and maintenance releases since
- Releases available at:
  - http://sourceforge.net/projects/sparql-query-bm/files/
- Code is now using Git
  - http://git.code.sf.net/p/sparql-query-bm/git sparql-query-bm-git
  - Mirrors available on GitHub for those who think that it is the one true source
    - https://github.com/rvesse/sparql-query-bm
- Maven artifacts available through Maven Central as before:
  - Group ID: net.sf.sparql-query-bm
  - Artifact IDs: core, cmd and dist
  - Latest 2.x version: 2.0.1
- Concept of Queries replaced with the general concept of Operations
- Also divorces the definition of an operation from how to run said operation
  - Makes it easier to change the runtime behaviour of operations
- 20 built-in operations provided
- API allows defining and plugging in new operations as desired
  - http://sparql-query-bm.sourceforge.net/javadoc/latest/core/
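
To illustrate the split between defining an operation and running it, here is a minimal, self-contained sketch. All type and method names (SketchOperation, FixedQuerySketch, runOnce) are hypothetical and are not taken from the sparql-query-bm API; the callable is a stand-in that does no network I/O.

```java
import java.util.concurrent.Callable;

// Hypothetical names throughout; this is not the sparql-query-bm API.
interface SketchOperation {
    String name();

    // The definition only knows how to create something runnable.
    Callable<Long> createCallable(String target);
}

final class FixedQuerySketch implements SketchOperation {
    private final String name;
    private final String query;

    FixedQuerySketch(String name, String query) {
        this.name = name;
        this.query = query;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public Callable<Long> createCallable(String target) {
        // A real callable would send the query to the target. This stand-in
        // returns the query length so that the sketch runs anywhere.
        return () -> (long) query.length();
    }
}

final class OperationSketchDemo {
    // How an operation is run lives outside the operation definition.
    static long runOnce(SketchOperation op, String target) {
        try {
            return op.createCallable(target).call();
        } catch (Exception e) {
            throw new IllegalStateException("Operation " + op.name() + " failed", e);
        }
    }

    public static void main(String[] args) {
        SketchOperation op = new FixedQuerySketch("ask", "ASK { ?s ?p ?o }");
        System.out.println(op.name() + " -> " + runOnce(op, "http://localhost:3030/ds/query"));
    }
}
```

Because the runner only sees the callable, swapping in a different way of executing the same definition (for example with retries or timeouts) does not require touching the operation itself.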
- Several kinds of query/update
  - Fixed
  - Parameterized
  - Dataset Size
- Variants for both remote endpoints and in-memory datasets
- Remote variants have additional NVP variants
  - Allows adding custom parameters to the remote request
- Accounts for 13 of the built-in operations
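
A rough sketch of what parameterization and custom request parameters could look like follows. It assumes that parameters are written as ?name variables in the query template and that NVP means name-value pairs appended to the endpoint URL; both are assumptions for illustration, and the class and method names are hypothetical rather than the tool's own.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the sparql-query-bm API.
final class ParamQuerySketch {
    // Substitutes each ?name in the template with the value bound in the row.
    // Purely textual: a real implementation should inject values safely.
    static String bind(String template, Map<String, String> row) {
        String result = template;
        for (Map.Entry<String, String> e : row.entrySet()) {
            // \b stops ?s from also matching the start of ?subject
            String pattern = "\\?" + Pattern.quote(e.getKey()) + "\\b";
            result = result.replaceAll(pattern, Matcher.quoteReplacement(e.getValue()));
        }
        return result;
    }

    // Appends custom name-value pairs to the endpoint URL (needs Java 10+).
    static String withParams(String endpoint, Map<String, String> nvps) {
        StringBuilder sb = new StringBuilder(endpoint);
        char sep = endpoint.contains("?") ? '&' : '?';
        for (Map.Entry<String, String> e : nvps.entrySet()) {
            sb.append(sep)
              .append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String query = bind("SELECT * WHERE { ?s a ?type }",
                Map.of("type", "<http://example.org/Person>"));
        System.out.println(query);
        System.out.println(withParams("http://localhost/sparql", Map.of("reasoning", "true")));
    }
}
```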
- One for each graph store protocol operation:
  - DELETE
  - GET
  - HEAD
  - POST
  - PUT
- Accounts for a further 5 of the built-in operations
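
For readers unfamiliar with the Graph Store Protocol, the sketch below shows how a request target is typically formed using the protocol's indirect graph identification (?graph=... for a named graph, ?default for the default graph). The class and the service URL are hypothetical and say nothing about how the tool builds its requests internally.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not the sparql-query-bm API.
final class GraphStoreUrlSketch {
    // The five HTTP methods covered by the built-in operations.
    enum Method { DELETE, GET, HEAD, POST, PUT }

    // A null graph IRI selects the default graph (needs Java 10+).
    static String target(String service, String graphIri) {
        if (graphIri == null) {
            return service + "?default";
        }
        return service + "?graph=" + URLEncoder.encode(graphIri, StandardCharsets.UTF_8);
    }

    static String describe(Method method, String service, String graphIri) {
        return method + " " + target(service, graphIri);
    }

    public static void main(String[] args) {
        String service = "http://localhost:3030/ds/data"; // assumed example endpoint
        for (Method m : Method.values()) {
            System.out.println(describe(m, service, "http://example.org/g1"));
        }
        System.out.println(describe(Method.GET, service, null));
    }
}
```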
- Sleep
  - Do nothing for some period
  - Useful for simulating quiet periods as part of testing
- Mix
  - Allows grouping a set of operations into a single operation
  - Lets you compose mixes from other mixes
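
Treating a mix as just another operation is essentially the composite pattern. The following sketch shows the idea with hypothetical types (MixItemSketch, LeafSketch, MixSketch); it only collects operation names rather than executing anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration; not the sparql-query-bm API.
interface MixItemSketch {
    // Appends the names of the leaf operations this item would run, in order.
    void collect(List<String> out);
}

final class LeafSketch implements MixItemSketch {
    private final String name;

    LeafSketch(String name) {
        this.name = name;
    }

    @Override
    public void collect(List<String> out) {
        out.add(name);
    }
}

final class MixSketch implements MixItemSketch {
    private final List<MixItemSketch> items;

    MixSketch(MixItemSketch... items) {
        this.items = Arrays.asList(items);
    }

    @Override
    public void collect(List<String> out) {
        // A mix delegates to its children, which may themselves be mixes.
        for (MixItemSketch item : items) {
            item.collect(out);
        }
    }
}

final class MixCompositionDemo {
    static List<String> flatten(MixItemSketch item) {
        List<String> out = new ArrayList<>();
        item.collect(out);
        return out;
    }

    public static void main(String[] args) {
        MixItemSketch inner = new MixSketch(new LeafSketch("load"), new LeafSketch("sleep"));
        MixItemSketch outer = new MixSketch(new LeafSketch("warmup"), inner, new LeafSketch("count"));
        System.out.println(flatten(outer));
    }
}
```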
- As already noted, in-memory variants of some operations are now available
- These run tests against a Dataset implementation
  - Part of the Apache Jena ARQ API
- Removes SPARQL Protocol and HTTP overhead from testing
  - Of course, depending on the Dataset implementation there may still be some communication overhead
  - But this is likely using lower level back end native communications protocols instead
- Addresses the limitation of hard coded methodology
- Separates test running into three components:
  - Overall runner
  - Mix runner
  - Operation runner
- Each has its own API and can be customized as desired
  - Various useful base/abstract implementations provided
- Four different test runners are provided:
  - Benchmark
  - Smoke
  - Soak
  - Stress
- Smoke
  - Runs the mix once and indicates whether it passes/fails
  - Pass is defined as all operations pass
- Soak
  - Runs the mix continuously for some period of time
  - Tests how a system reacts under continuous load
- Stress
  - Runs the mix with increasingly high load
  - Tests how a system reacts under increasing load
- AbstractRunner provides a basic framework and helper methods to make it easy to add custom runners or customize existing runs
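
The three non-benchmark runner styles can be sketched in a few lines each. This is only an illustration under assumptions: operations are modelled as BooleanSupplier (true means pass), the smoke run keeps going after a failure, and stress load is modelled as a thread count multiplied by a ramp factor. None of the names come from the tool's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; not the sparql-query-bm runners.
final class TestRunnerSketch {
    // Smoke: run the mix once; pass only if every operation passes.
    static boolean smoke(List<BooleanSupplier> mix) {
        boolean allPassed = true;
        for (BooleanSupplier op : mix) {
            // Keep going after a failure so every operation is exercised.
            allPassed &= op.getAsBoolean();
        }
        return allPassed;
    }

    // Soak: run the mix at least once, then repeat until the time budget is
    // used up. Returns the number of completed mix runs.
    static int soak(List<BooleanSupplier> mix, long maxMillis) {
        long deadline = System.nanoTime() + maxMillis * 1_000_000L;
        int runs = 0;
        do {
            for (BooleanSupplier op : mix) {
                op.getAsBoolean();
            }
            runs++;
        } while (System.nanoTime() < deadline);
        return runs;
    }

    // Stress: the thread counts to try, growing by rampFactor up to maxThreads.
    static List<Integer> stressLevels(int startThreads, double rampFactor, int maxThreads) {
        if (startThreads < 1 || rampFactor <= 1.0) {
            throw new IllegalArgumentException("need startThreads >= 1 and rampFactor > 1");
        }
        List<Integer> levels = new ArrayList<>();
        for (double t = startThreads; t <= maxThreads; t *= rampFactor) {
            levels.add((int) t);
        }
        return levels;
    }

    public static void main(String[] args) {
        List<BooleanSupplier> mix = List.of(() -> true, () -> true);
        System.out.println("smoke passed: " + smoke(mix));
        System.out.println("soak runs: " + soak(mix, 10));
        System.out.println("stress levels: " + stressLevels(1, 2.0, 16));
    }
}
```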
- Allows customizing how mixes and individual operations are run
- Some alternative implementations built in:
  - E.g. SamplingOperationMixRunner
    - Runs a sample of the operations in the mix
    - May include repeats
  - E.g. RetryingOperationRunner
    - Retries an operation if it doesn't succeed
- Easy to implement your own
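
The two behaviours named above, sampling with optional repeats and retrying on failure, can be sketched as plain helper methods. This is a simplified illustration with hypothetical names, not the logic of SamplingOperationMixRunner or RetryingOperationRunner themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; not the sparql-query-bm implementations.
final class MixRunnerSketch {
    // Picks up to 'size' operations at random. With repeats allowed the same
    // operation may be picked more than once; otherwise each is used at most once.
    static <T> List<T> sample(List<T> ops, int size, boolean allowRepeats, Random rng) {
        List<T> pool = new ArrayList<>(ops);
        List<T> picked = new ArrayList<>();
        while (picked.size() < size && !pool.isEmpty()) {
            int i = rng.nextInt(pool.size());
            picked.add(allowRepeats ? pool.get(i) : pool.remove(i));
        }
        return picked;
    }

    // Runs the operation once, then retries up to maxRetries times on failure.
    static boolean runWithRetries(BooleanSupplier op, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (op.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> ops = List.of("q1", "q2", "q3", "q4");
        System.out.println(sample(ops, 2, false, new Random()));
        System.out.println(sample(ops, 6, true, new Random()));
    }
}
```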
- Separates test configuration from the test runner
- Interface with all common configuration defined
  - Endpoints
  - Timeouts
  - Progress Listeners
  - etc.
- NB: Runners are typically defined such that they restrict their input options to sub-interfaces that add runner-specific configuration, e.g.
  - Warm-ups for benchmarks
  - Total runtime for soak testing
  - Ramp up factor for stress testing
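
One way to picture this arrangement is a common options interface, runner-specific sub-interfaces, and a runner type parameterized by the options it accepts. The sketch below uses invented names and fields (CommonOptionsSketch, warmups, maxRuntimeMinutes) purely for illustration; the real interfaces are described in the project's Javadoc.

```java
// Hypothetical interfaces for illustration; not the sparql-query-bm API.
interface CommonOptionsSketch {
    String queryEndpoint();

    int timeoutSeconds();
}

// Adds configuration that only makes sense for benchmarks.
interface BenchmarkOptionsSketch extends CommonOptionsSketch {
    int warmups();
}

// Adds configuration that only makes sense for soak tests.
interface SoakOptionsSketch extends CommonOptionsSketch {
    int maxRuntimeMinutes();
}

// A runner is typed by the options sub-interface it needs, so the compiler
// rejects, say, soak options being handed to a benchmark runner.
interface OptionsRunnerSketch<T extends CommonOptionsSketch> {
    String describe(T options);
}

final class OptionsDemo {
    static final OptionsRunnerSketch<BenchmarkOptionsSketch> BENCHMARK =
            o -> o.warmups() + " warm-ups against " + o.queryEndpoint()
                    + " (timeout " + o.timeoutSeconds() + "s)";

    static BenchmarkOptionsSketch benchmarkOptions(String endpoint, int timeout, int warmups) {
        return new BenchmarkOptionsSketch() {
            @Override
            public String queryEndpoint() {
                return endpoint;
            }

            @Override
            public int timeoutSeconds() {
                return timeout;
            }

            @Override
            public int warmups() {
                return warmups;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(BENCHMARK.describe(benchmarkOptions("http://localhost/sparql", 30, 5)));
    }
}
```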
- Now using TSV as the file format
  - Still wanted it to be simple enough that someone with zero RDF/SPARQL knowledge can configure it
- Each line is a series of parameters separated by a tab character
- First parameter is an identifier for the type of the operation
  - Used to decide how to interpret the remaining parameters
- Can define your own mix file format and register a loader for it
- Possible to override the loader for a specific operation identifier since this has an API
  - Means you can do neat tricks like use a mix designed for remote endpoints against an in-memory dataset
query 806670-warmup1.rq 806670 Warmup Query 1
query 806670-warmup2.rq 806670 Warmup Query 2
query 806670-nofilter.rq 806670 Query with No Filter
query 806670-filter3.rq 806670 Query with Filter (Variant 3)
param-query 806670-filter3-params.rq instances.tsv Parameterized Query with Filter (Variant 3)
query 806670-filter4.rq 806670 Query with Filter (Variant 4)
query 806670-filter4a.rq 806670 Query with Filter (Variant 4a - Zero Results)
param-query 806670-filter4-params.rq instances.tsv Parameterized Query with Filter (Variant 4)
query 806238-warmup1.rq 806238 Warmup Query 1
query 806238-warmup2.rq 806238 Warmup Query 2
query 806238-comment43.rq 806238 Query (Comment 43)
query 806238-comment43a.rq 806238 Query (Comment 43 - SELECT * sub-query)
query 806238-comment45.rq 806238 Query (Comment 45 - Multiple sub-queries)
query 806238-comment54.rq 806238 Query (Comment 54)
param-update load-full1m.ru graph-names.tsv Load 1M Dataset into named graph
param-query count-loaded.rq graph-names.tsv Count named graph
param-update drop-loaded.ru graph-names.tsv Drop named graph
query count.rq Count quads
checkpoint10 Checkpoint every 10 runs
sleep 180 3 minute sleep
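
The loading scheme described for the mix file (split each line on tabs, use the first field to pick a loader, allow a loader to be overridden per identifier) could be sketched as below. The registry class, the string-returning loaders and the assumed field layout (type, file, name) are hypothetical simplifications, not the tool's actual loader API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry for illustration; not the sparql-query-bm loader API.
final class MixFileSketch {
    // Maps an operation type identifier to a loader for the line's fields.
    private final Map<String, Function<String[], String>> loaders = new HashMap<>();

    // Registering an identifier again replaces the earlier loader.
    void register(String typeId, Function<String[], String> loader) {
        loaders.put(typeId, loader);
    }

    String loadLine(String line) {
        String[] fields = line.split("\t");
        Function<String[], String> loader = loaders.get(fields[0]);
        if (loader == null) {
            throw new IllegalArgumentException("No loader for operation type: " + fields[0]);
        }
        return loader.apply(fields);
    }

    // Assumed layouts: query<TAB>file<TAB>name and sleep<TAB>seconds<TAB>name.
    static MixFileSketch withDefaults() {
        MixFileSketch mix = new MixFileSketch();
        mix.register("query", f -> "remote query " + f[1] + " named '" + f[2] + "'");
        mix.register("sleep", f -> "sleep for " + Integer.parseInt(f[1]) + "s");
        return mix;
    }

    public static void main(String[] args) {
        MixFileSketch mix = withDefaults();
        System.out.println(mix.loadLine("query\tcount.rq\tCount quads"));
        // Override the loader so the same mix line targets an in-memory dataset.
        mix.register("query", f -> "in-memory query " + f[1]);
        System.out.println(mix.loadLine("query\tcount.rq\tCount quads"));
    }
}
```

Overriding the loader rather than editing the mix file is what allows one mix definition to be reused against different kinds of target.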
- Now provides notifications before and after operation and mix runs
- Improvements to how some of the built-in implementations handle multi-threaded output
  - Makes it easier to distinguish where errors occurred when running multi-threaded benchmarks
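
Having a notification before an operation starts is what makes things like crash detection practical: a listener can remember what was in flight. A minimal sketch with a hypothetical listener interface (not the tool's progress listener API) might look like this:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical listener interface; not the sparql-query-bm API.
interface ListenerSketch {
    void beforeOperation(String op);

    void afterOperation(String op, boolean success);
}

// Tracks which operations have started but not yet finished. If the system
// under test dies, whatever is still listed was running at the time.
final class InFlightTracker implements ListenerSketch {
    private final Set<String> inFlight = new LinkedHashSet<>();

    @Override
    public synchronized void beforeOperation(String op) {
        inFlight.add(op);
    }

    @Override
    public synchronized void afterOperation(String op, boolean success) {
        inFlight.remove(op);
    }

    synchronized List<String> stillRunning() {
        return new ArrayList<>(inFlight);
    }

    public static void main(String[] args) {
        InFlightTracker tracker = new InFlightTracker();
        tracker.beforeOperation("q1");
        tracker.beforeOperation("q2");
        tracker.afterOperation("q1", true);
        System.out.println("Still running: " + tracker.stillRunning());
    }
}
```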
- Now based upon the powerful open source Airline library
  - https://github.com/airlift/airline
- Provides a command line interface to each built-in runner
  - Also provides AbstractCommand with all standard options exposed
- Standardized exit codes across all commands
- Comprehensive built-in help
  - Can help you define operation mixes
  - ./operations
  - ./operation --op param-query

4. Examples
- These are things we've done (or are currently doing) with the framework that aren't in the open source releases
- However the 2.x framework makes these (hopefully) easy to replicate yourself
- Many stores have rich REST APIs in addition to their SPARQL APIs
  - Can be useful to include testing of these in your mixes
- Requires implementing two interfaces:
  - Operation
  - OperationCallable
- Abstract implementations of both are available to give you the boilerplate bits
- Internally we have 9 different custom operations defined which test a subset of our REST API:
  - Database Management
  - Asynchronous Queries
  - Import Management
- One thing we're particularly interested in is how operations affect memory usage
  - We added custom progress listeners that track and monitor memory usage
  - Reports on min, max and average memory usage
- We also have another progress listener that tracks processes to identify when a test run may have been impacted by other activity on the system
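
The min/max/average reporting could be approximated with a small accumulator that is fed a memory reading after each operation. The sketch below is an assumption about the general shape only: it samples the heap of the JVM running the test, which is relevant mainly for in-memory datasets, whereas the internal listeners mentioned above are not public and may measure something quite different.

```java
// Hypothetical accumulator for illustration; not the authors' listener.
final class MemoryStatsSketch {
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private long total = 0;
    private int samples = 0;

    // Records one reading, e.g. taken after each operation completes.
    void record(long usedBytes) {
        min = Math.min(min, usedBytes);
        max = Math.max(max, usedBytes);
        total += usedBytes;
        samples++;
    }

    // Samples the heap currently used by this JVM.
    void sampleJvm() {
        Runtime rt = Runtime.getRuntime();
        record(rt.totalMemory() - rt.freeMemory());
    }

    long min() {
        return min;
    }

    long max() {
        return max;
    }

    double average() {
        return samples == 0 ? 0.0 : (double) total / samples;
    }

    public static void main(String[] args) {
        MemoryStatsSketch stats = new MemoryStatsSketch();
        for (int i = 0; i < 3; i++) {
            stats.sampleJvm();
        }
        System.out.printf("min=%d max=%d avg=%.0f bytes%n", stats.min(), stats.max(), stats.average());
    }
}
```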
// Retries an operation only when it failed with an authentication error
public class RetryOnAuthFailureOperationRunner extends RetryingOperationRunner {

    public RetryOnAuthFailureOperationRunner() {
        this(1);
    }

    public RetryOnAuthFailureOperationRunner(int maxRetries) {
        super(maxRetries);
    }

    @Override
    protected <T extends Options> boolean shouldRetry(Runner<T> runner, T options,
            Operation op, OperationRun run) {
        return run.getErrorCategory() == ErrorCategories.AUTHENTICATION;
    }
}
- Extends the built-in RetryingOperationRunner
- Simply adds a constraint on retries by overriding the shouldRetry() method

5. Future Work
- Embrace Java 7 features fully
- Use ServiceLoader to automatically discover new operations and mix formats
- Make it even easier to customize runners
  - i.e. provide more abstraction of the current implementations
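
As a rough idea of what ServiceLoader-based discovery could look like, the sketch below looks up implementations of a hypothetical MixFormatSketch interface. This is planned work in the talk, so nothing here reflects an existing API. Providers would normally be listed in a META-INF/services file named after the interface; without one, as when this snippet is run on its own, nothing is discovered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical service interface; not part of sparql-query-bm.
interface MixFormatSketch {
    String formatName();
}

final class DiscoverySketch {
    // Asks ServiceLoader for every registered implementation on the classpath.
    static List<String> discoverFormats() {
        List<String> names = new ArrayList<>();
        for (MixFormatSketch format : ServiceLoader.load(MixFormatSketch.class)) {
            names.add(format.formatName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Discovered mix formats: " + discoverFormats());
    }
}
```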
Questions? 
rvesse@yarcdata.com 
@RobVesse

Practical SPARQL Benchmarking Revisited

  • 1. 1 Rob Vesse rvesse@yarcdata.com @RobVesse
  • 2. 2 1. Rewind to 2012 2. Limitations 3. Evolving the Framework 4. Examples 5. Future Work
  • 3. 3
  • 4. 4  Presentation I gave at this conference in 2012  Slides at http://www.slideshare.net/RobVesse/practical-sparql-benchmarking  Highlighted some issues with SPARQL Benchmarking:  Standard Benchmarks all have know deficiencies  Lack of standardized methodology  Best benchmark is the one you run with your data and workload  Introduced the 1.x version of our SPARQL Query Benchmarker tool  Java tool and API for benchmarking  Used a methodology based upon combination of the BSBM runner and Revelytix SP2B white paper  Reports various appropriate statistics  Various configuration options to change what exactly is benchmarked e.g. whether results are fully parsed and counted
  • 5. 5  The 1.x tool was open sourced shortly after the 2012 conference under a 3 clause BSD License  Available on SourceForge  http://sourceforge.net/projects/sparql-query-bm/files/1.0.0/  Also as Maven artifacts (in Maven Central):  Group ID: net.sf.sparql-query-bm  Artifact IDs:  cmd  core  Latest 1.x Version: 1.1.0
  • 6. 6
  • 7.  The 1.x tool can only benchmark SPARQL queries  SPARQL 1.1 has been standardized since the 1.x version of the tool was written and adds various additional SPARQL features that you may want to test: 7  SPARQL Updates  SPARQL Graph Store Protocol  Queries are fixed  No parameterization support  Can't pass custom endpoint parameters in  For example enable/disable reasoning  Also no way to test endpoint specific extensions  e.g. transactions
  • 8. 8  Requires using HTTP endpoints to access the SPARQL system to be tested  Adds communication overheads to the results  Sometimes this may be desirable  No ability to test SPARQL operations in-memory  i.e. can't test lower level APIs
  • 9.  Only supports a single benchmarking methodology  Methodology is hard coded  Can't do things like run a subset of the provided operations on each run 9  Or repeat an operation within a run  Or retry an operation under specific failure conditions  Configuration of the methodology is tightly coupled to the methodology  Many aspects are actually independent of the methodology
  • 10. 1 0  Used a simplistic text based format  One query file per line  No way to specify additional parameters  No way to assign a friendly name to queries  Assigns each query the filename
  • 11.  There is a progress monitoring API but it is limited  E.g. Gets called after a query completes but not before it starts  Makes it awkward/impossible to implement some kinds of monitoring 1 1  e.g. crash detection, memory usage
  • 12. 1 2  In the interests of speed over usability we rolled our own command line arguments parser  Means argument parsing is awkward to extend
  • 13. 1 3
  • 14. 1 4  Earlier this year we found a compelling reason to rewrite the tool and address the various limitations  First 2.x release was made 9th June 2014  Minor bug fix and maintenance releases since  Releases available at:  http://sourceforge.net/projects/sparql-query-bm/files/  Code is now using Git  http://git.code.sf.net/p/sparql-query-bm/git sparql-query-bm-git  Mirrors available on GitHub for those who think that it is the one true source  https://github.com/rvesse/sparql-query-bm  Maven artifacts available through Maven Central as before:  Group ID: net.sf.sparql-query-bm  Artifact IDs: core, cmd and dist  Latest 2.x version: 2.0.1
  • 15.  Concept of Queries replaced with the general concept of Operations  Also divorces the definition of an operation with how to run said operation 1 5  Makes it easier to change runtime behaviour of operations  20 built-in operations provided  API allows defining and plugging in new operations as desired  http://sparql-query-bm.sourceforge.net/javadoc/latest/core/
  • 16. 1 6  Several kinds of query/update  Fixed  Parameterized  Dataset Size  Variants for both remote endpoints and in-memory datasets  Remote variants have additional NVP variants  Allows adding custom parameters to the remote request  Accounts for 13 of the built in operations
  • 17. 1 7  One for each graph store protocol operation:  DELETE  GET  HEAD  POST  PUT  Accounts for a further 5 of the built-in operations
  • 18. 1 8  Sleep  Do nothing for some period  Useful for simulating quiet periods as part of testing  Mix  Allow grouping a set of operations into a single operation  Lets you compose mixes from other mixes
  • 19. 1 9  As already noted in-memory variants of some operations are now available  These run tests against a Dataset implementation  Part of Apache Jena ARQ API  Removes SPARQL Protocol and HTTP overhead from testing  Of course depending on Dataset implementation may still be some communication overhead  But this is likely using lower level back end native communications protocols instead
  • 20. 2 0  Addresses the limitation of hard coded methodology  Separates test running into three components:  Overall runner  Mix runner  Operation runner  Each has own API and can be customized as desired  Various useful base/abstract implementations provided  Four different test runners are provided:  Benchmark  Smoke  Soak  Stress
  • 21. 2 1  Smoke  Runs the mix once and indicates whether it passes/fails  Pass is defined as all operations pass  Soak  Run the mix continuously for some period of time  Test how a system reacts under continuous load  Stress  Run the mix with increasingly high load  Test how a system reacts under increasing load  AbstractRunner provides a basic framework and helper method to make it easy to add custom runners or customize existing runs
  • 22. 2 2  Allows customizing how mixes and individual operations are run  Some alternative implementations built in:  E.g. SamplingOperationMixRunner  Runs a sample of the operations in the mix  May include repeats  E.g. RetryingOperationRunner  Retries an operation if it doesn't succeed  Easy to implement your own
  • 23. 2 3  Separates test configuration from the test runner  Interface with all common configuration defined  Endpoints  Timeouts  Progress Listeners  etc  NB - Runners are typically defined such that they restrict their input options to sub-interfaces that add runner specific configuration e.g.  Warm-ups for benchmarks  Total runtime for soak testing  Ramp up factor for stress testing
  • 24. 2 4  Now using TSV as the file format  Still wanted to be simple enough that someone with zero RDF/SPARQL knowledge can configure  Each line is a series of parameters separated by a tab character  First parameter is an identifier for the type of the operation  Used to decide how to interpret the remaining parameters  Can define your own mix file format and register a loader for it  Possible to override the loader for a specific operation identifier since this has an API  Means you can do neat tricks like use a mix designed for remote endpoints against an in-memory dataset
  • 25. query 806670-warmup1.rq 806670 Warmup Query 1
        query 806670-warmup2.rq 806670 Warmup Query 2
        query 806670-nofilter.rq 806670 Query with No Filter
        query 806670-filter3.rq 806670 Query with Filter (Variant 3)
        param-query 806670-filter3-params.rq instances.tsv Parameterized Query with Filter (Variant 3)
        query 806670-filter4.rq 806670 Query with Filter (Variant 4)
        query 806670-filter4a.rq 806670 Query with Filter (Variant 4a - Zero Results)
        param-query 806670-filter4-params.rq instances.tsv Parameterized Query with Filter (Variant 4)
        query 806238-warmup1.rq 806238 Warmup Query 1
        query 806238-warmup2.rq 806238 Warmup Query 2
        query 806238-comment43.rq 806238 Query (Comment 43)
        query 806238-comment43a.rq 806238 Query (Comment 43 - SELECT * sub-query)
        query 806238-comment45.rq 806238 Query (Comment 45 - Multiple sub-queries)
        query 806238-comment54.rq 806238 Query (Comment 54)
        param-update load-full1m.ru graph-names.tsv Load 1M Dataset into named graph
        param-query count-loaded.rq graph-names.tsv Count named graph
        param-update drop-loaded.ru graph-names.tsv Drop named graph
        query count.rq Count quads
        checkpoint10 Checkpoint every 10 runs
        sleep 180 3 minute sleep
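The mix format lends itself to a small registry keyed on the first field of each line. The Java sketch below illustrates that idea under stated assumptions: `MixLoaderSketch` and its methods are hypothetical, operations are reduced to plain description strings, and the field layout (type, then file, then name) is inferred from the example mix rather than taken from the framework.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: the real loader API is not shown in the slides.
final class MixLoaderSketch {

    // Maps an operation type identifier (first TSV field) to the loader that interprets the line.
    private final Map<String, Function<String[], String>> loaders = new HashMap<>();

    /** Registers a loader for an operation type, replacing any loader already registered for it. */
    void register(String type, Function<String[], String> loader) {
        loaders.put(type, loader);
    }

    /** Splits each non-blank line on tabs and hands the fields to the loader for its type. */
    List<String> load(List<String> lines) {
        List<String> operations = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t");
            Function<String[], String> loader = loaders.get(fields[0]);
            if (loader == null) {
                throw new IllegalArgumentException("No loader registered for operation type: " + fields[0]);
            }
            operations.add(loader.apply(fields));
        }
        return operations;
    }
}
```

Because `register` replaces an existing entry, re-registering `query` with a loader that targets an in-memory dataset would let the same mix file run without a remote endpoint, which is the kind of trick the override API makes possible.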
  • 26.  Now provides notifications before and after operation and mix runs
         Improvements to how some of the built-in implementations handle multi-threaded output
             Makes it easier to distinguish where errors occurred when running multi-threaded benchmarks
  • 27.  Now based upon the powerful open source Airline library
             https://github.com/airlift/airline
         Provides a command line interface to each built-in runner
         Also provides AbstractCommand with all standard options exposed
         Standardized exit codes across all commands
         Comprehensive built-in help
             Can help you define operation mixes
             ./operations
             ./operation --op param-query
  • 28.
  • 29.  These are things we've done (or are currently doing) with the framework that aren't in the open source releases
         However the 2.x framework makes these (hopefully) easy to replicate yourself
  • 30.  Many stores have rich REST APIs in addition to their SPARQL APIs
         Can be useful to include testing of these in your mixes
         Requires implementing two interfaces:
             Operation
             OperationCallable
         Abstract implementations of both are available to give you the boilerplate bits
         Internally we have 9 different custom operations defined which test a subset of our REST API:
             Database Management
             Asynchronous Queries
             Import Management
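The two-interface split can be pictured as a description object that creates the unit of work to execute. The Java sketch below is an assumption-heavy illustration: the nested `Operation` interface is a stand-in written for this example, not the framework's `Operation`, and a plain `java.util.concurrent.Callable` plays the role of `OperationCallable`. The REST call itself is replaced by an `IntSupplier` that returns an HTTP status code, so the sketch needs no network.

```java
import java.util.concurrent.Callable;
import java.util.function.IntSupplier;

// Hypothetical sketch: stand-in types, not the framework's Operation/OperationCallable.
final class RestOperationSketch {

    /** Describes an operation and creates the callable that actually performs it. */
    interface Operation {
        String name();
        Callable<Boolean> createCallable();
    }

    /** A custom operation that passes when a REST request returns a 2xx status. */
    static final class RestStatusOperation implements Operation {
        private final String name;
        private final IntSupplier request; // stands in for issuing the HTTP request

        RestStatusOperation(String name, IntSupplier request) {
            this.name = name;
            this.request = request;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public Callable<Boolean> createCallable() {
            return () -> {
                int status = request.getAsInt();
                return status >= 200 && status < 300;
            };
        }
    }

    /** Runs an operation once, treating any exception as a failure. */
    static boolean run(Operation operation) {
        try {
            return operation.createCallable().call();
        } catch (Exception e) {
            return false;
        }
    }
}
```

Keeping the description separate from the callable means the same operation can be handed to many threads, each getting its own unit of work.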
  • 31.  One thing we're particularly interested in is how operations affect memory usage
         We added custom progress listeners that track and monitor memory usage
             Reports on min, max and average memory usage
         We also have another progress listener that tracks processes to identify when a test run may have been impacted by other activity on the system
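A memory-tracking listener of the kind described above can be sketched as a listener that takes a sample on each before/after notification. Everything in this Java example is hypothetical: the `ProgressListener` interface shown is a two-method stand-in, not the framework's listener API, and the default sampler reads the local JVM heap, whereas a listener for a remote store would need to query that system instead.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: the interface below is a stand-in, not the framework's listener API.
final class MemoryListenerSketch {

    /** Minimal stand-in for before/after operation notifications. */
    interface ProgressListener {
        void beforeOperation(String name);
        void afterOperation(String name);
    }

    /** Samples memory usage on every notification and keeps min, max and average. */
    static final class MemoryTracker implements ProgressListener {
        private final LongSupplier usedMemory;
        private long min = Long.MAX_VALUE;
        private long max = Long.MIN_VALUE;
        private long total = 0;
        private int samples = 0;

        /** Uses the supplied sampler, which makes the tracker easy to test. */
        MemoryTracker(LongSupplier usedMemory) {
            this.usedMemory = usedMemory;
        }

        /** Defaults to sampling the heap used by the local JVM. */
        MemoryTracker() {
            this(() -> {
                Runtime runtime = Runtime.getRuntime();
                return runtime.totalMemory() - runtime.freeMemory();
            });
        }

        @Override
        public void beforeOperation(String name) {
            sample();
        }

        @Override
        public void afterOperation(String name) {
            sample();
        }

        private void sample() {
            long used = usedMemory.getAsLong();
            min = Math.min(min, used);
            max = Math.max(max, used);
            total += used;
            samples++;
        }

        long min() { return samples == 0 ? 0 : min; }
        long max() { return samples == 0 ? 0 : max; }
        long average() { return samples == 0 ? 0 : total / samples; }
    }
}
```

Injecting the sampler as a `LongSupplier` is a design choice made for the sketch so that the statistics can be checked with known values.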
  • 32.
        // Retries an operation only when it failed with an authentication error
        public class RetryOnAuthFailureOperationRunner extends RetryingOperationRunner {

            // Default: allow a single retry
            public RetryOnAuthFailureOperationRunner() {
                this(1);
            }

            public RetryOnAuthFailureOperationRunner(int maxRetries) {
                super(maxRetries);
            }

            @Override
            protected <T extends Options> boolean shouldRetry(Runner<T> runner, T options, Operation op, OperationRun run) {
                // Any other error category is not retried
                return run.getErrorCategory() == ErrorCategories.AUTHENTICATION;
            }
        }

         Extends the built-in RetryingOperationRunner
         Simply adds a constraint on retries by overriding the shouldRetry() method
  • 33.
  • 34.  Embrace Java 7 features fully
         Use ServiceLoader to automatically discover new operations and mix formats
         Make it even easier to customize runners
             i.e. provide more abstraction of the current implementations
  • 35. Questions? rvesse@yarcdata.com @RobVesse

Editor's Notes

  1. Ask for a show of hands as to who has used the tool to get an idea of the audience
  2. SPARQL 1.1 standardized 21st March 2013