
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benchmarks

Triplestores are data management systems for storing and querying RDF data. Over recent years, various benchmarks have been proposed to assess the performance of triplestores across different performance measures. However, choosing the most suitable benchmark for evaluating triplestores in practical settings is not a trivial task. This is because triplestores experience varying workloads when deployed in real applications. We address the problem of determining an appropriate benchmark for a given real-life workload by providing a fine-grained comparative analysis of existing triplestore benchmarks. In particular, we analyze the data and queries provided with the existing triplestore benchmarks in addition to several real-world datasets. Furthermore, we measure the correlation between the query execution time and various SPARQL query features and rank those features based on their significance levels. Our experiments reveal several interesting insights about the design of such benchmarks. With this fine-grained evaluation, we aim to support the design and implementation of more diverse benchmarks. Application developers can use our results to analyze their data and queries and to choose a suitable data management system.


  1. HOW REPRESENTATIVE IS A SPARQL BENCHMARK? AN ANALYSIS OF RDF TRIPLESTORE BENCHMARKS
     Muhammad Saleem, Gábor Szárnyas, Felix Conrads, Syed Ahmad Chan Bukhari, Qaiser Mehmood, Axel-Cyrille Ngonga Ngomo
     The Web Conference 2019, San Francisco, May 15th, 2019
  2. MOTIVATION
     • Various RDF triplestores, e.g., Virtuoso, Fuseki, Blazegraph, Stardog, RDF-3X
     • Various triplestore benchmarks, e.g., WatDiv, FEASIBLE, LDBC, BSBM, SP2Bench
     • Varying workloads on triplestores
     • Various important SPARQL query features
     • Which benchmark is the most representative?
     • Which benchmark is most suitable for testing a given triplestore?
     • How do SPARQL query features affect query runtimes?
  3. QUERYING BENCHMARK COMPONENTS
     • Dataset(s)
     • Queries
     • Performance metrics
     • Execution rules
  4. IMPORTANT RDF DATASET FEATURES
     RDF datasets used in a querying benchmark should vary in:
     • Number of triples
     • Number of classes
     • Number of resources
     • Number of properties
     • Number of objects
     • Average properties per class
     • Average instances per class
     • Average in-degree and out-degree
     • Structuredness (coherence)
     • Relationship specialty
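As a rough illustration (not part of the original slides), a few of these dataset features can be computed with Python and rdflib roughly as follows; the file name dataset.nt is a placeholder, and structuredness and relationship specialty are sketched separately under the corresponding slides further below.

    # Hedged sketch: computing a handful of the dataset features listed above
    # with rdflib. "dataset.nt" is a placeholder file name.
    from collections import defaultdict
    from rdflib import Graph, RDF

    g = Graph()
    g.parse("dataset.nt", format="nt")

    num_triples = len(g)
    subjects   = {s for s, _, _ in g}
    properties = {p for _, p, _ in g}
    objects    = {o for _, _, o in g}
    classes    = {c for _, _, c in g.triples((None, RDF.type, None))}

    # instances per class, based on explicit rdf:type assertions
    instances_of = defaultdict(set)
    for s, _, c in g.triples((None, RDF.type, None)):
        instances_of[c].add(s)
    avg_instances_per_class = (
        sum(len(v) for v in instances_of.values()) / len(instances_of)
        if instances_of else 0.0
    )

    avg_out_degree = num_triples / len(subjects)   # triples per distinct subject
    avg_in_degree  = num_triples / len(objects)    # triples per distinct object

    print(num_triples, len(classes), len(properties),
          round(avg_instances_per_class, 2),
          round(avg_in_degree, 2), round(avg_out_degree, 2))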
  5. IMPORTANT SPARQL QUERY FEATURES
     • Number of triple patterns
     • Number of projection variables
     • Number of BGPs (basic graph patterns)
     • Number of join vertices
     • Mean join vertex degree
     • Query result set sizes
     • Mean triple pattern selectivity
     • BGP-restricted triple pattern selectivity
     • Join-restricted triple pattern selectivity
     • Overall diversity score (average coefficient of variation)
     • Join vertex types (star, path, hybrid, sink)
     • SPARQL clauses used (e.g., LIMIT, UNION, OPTIONAL, FILTER)
     SPARQL queries are modeled as directed hypergraphs.
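To make the first few of these features concrete, the sketch below (an assumption-laden illustration, not the tooling used in the paper) parses a query with rdflib's SPARQL algebra and counts triple patterns, BGPs, and projection variables; the selectivity-based features additionally require dataset statistics and are omitted here.

    # Hedged sketch: counting triple patterns, BGPs and projection variables
    # by walking rdflib's SPARQL algebra tree (CompValue nodes behave like dicts).
    from rdflib.plugins.sparql import prepareQuery

    query_str = """
    SELECT DISTINCT ?drug ?name WHERE {
      ?drug a <http://example.org/Drug> .
      ?drug <http://example.org/name> ?name .
      OPTIONAL { ?drug <http://example.org/synonym> ?syn }
    } LIMIT 10
    """
    q = prepareQuery(query_str)

    def walk(node, triples, bgps):
        """Recursively collect triple patterns and BGP nodes from the algebra."""
        if hasattr(node, "name") and node.name == "BGP":
            bgps.append(node)
        if hasattr(node, "keys"):
            for key in node.keys():
                if key == "triples":
                    triples.extend(node[key])
                else:
                    walk(node[key], triples, bgps)
        elif isinstance(node, list):
            for item in node:
                walk(item, triples, bgps)

    triples, bgps = [], []
    walk(q.algebra, triples, bgps)
    projection_vars = q.algebra.get("PV", []) or []

    print("triple patterns:", len(triples))
    print("BGPs:", len(bgps))
    print("projection variables:", len(projection_vars))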
  6. IMPORTANT PERFORMANCE METRICS (1/2)
     Query processing related:
     • Query execution time
     • Query Mixes per Hour (QMpH)
     • Queries per Second (QpS)
     • CPU and memory usage
     • Intermediate results
     • Number of disk/memory swaps
     Result set related:
     • Result set correctness
     • Result set completeness
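For the query-processing metrics, a bare-bones measurement loop over HTTP could look like the following; the endpoint URL and the two queries are placeholders, and real benchmark harnesses additionally handle warm-up runs, timeouts, and parallel clients.

    # Hedged sketch: per-query execution time, Queries per Second (QpS) and
    # Query Mixes per Hour (QMpH) against a SPARQL HTTP endpoint.
    import time
    import requests

    ENDPOINT = "http://localhost:8890/sparql"   # placeholder endpoint URL
    query_mix = [
        "SELECT * WHERE { ?s ?p ?o } LIMIT 100",
        "SELECT (COUNT(*) AS ?c) WHERE { ?s ?p ?o }",
    ]

    runtimes = []
    for q in query_mix:
        start = time.perf_counter()
        r = requests.get(
            ENDPOINT,
            params={"query": q},
            headers={"Accept": "application/sparql-results+json"},
            timeout=180,
        )
        r.raise_for_status()
        runtimes.append(time.perf_counter() - start)

    total = sum(runtimes)
    qps = len(query_mix) / total      # queries executed per second
    qmph = 3600.0 / total             # full query mixes per hour
    print([round(t, 3) for t in runtimes], round(qps, 2), round(qmph, 1))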
  7. IMPORTANT PERFORMANCE METRICS (2/2)
     Data storage related:
     • Data loading time
     • Storage space
     • Index size
     Parallelism with/without updates:
     • Parallel querying agents
     • Parallel data update agents
  8. BENCHMARK SELECTION CRITERIA
     • Targets query runtime performance evaluation of triplestores
     • RDF datasets available
     • SPARQL queries available
     • No reasoning required to obtain complete results
  9. SELECTED BENCHMARKS
     Benchmarks with real data and/or queries:
     • FishMark
     • BioBench
     • FEASIBLE
     • DBpedia SPARQL Benchmark (DBPSB)
     Synthetic benchmarks:
     • Bowlogna
     • TrainBench
     • Berlin SPARQL Benchmark (BSBM)
     • SP2Bench
     • WatDiv
     • Social Network Benchmark (SNB)
     Real-world datasets and queries:
     • DBpedia 3.5.1
     • Semantic Web Dog Food (SWDF)
     • NCBIGene
     • SIDER
     • DrugBank
  10. DATASET ANALYSIS: STRUCTUREDNESS
     • Duan et al. assumption: real datasets are less structured, synthetic datasets are highly structured
     • The dataset structuredness problem is well covered by recent synthetic data generators (e.g., WatDiv, TrainBench)
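The structuredness (coherence) measure of Duan et al. can be approximated roughly as below: for each rdf:type class, coverage is the fraction of (instance, property) cells that are actually filled by instances of that class, and the dataset value is a coverage average weighted by class size. Treat the exact weighting in this sketch as an assumption.

    # Hedged sketch of a Duan-et-al.-style structuredness (coherence) value.
    from collections import defaultdict
    from rdflib import Graph, RDF

    g = Graph()
    g.parse("dataset.nt", format="nt")   # placeholder file name

    props_of_inst = defaultdict(set)     # instance -> properties it sets
    instances_of  = defaultdict(set)     # class    -> instances of that class
    for s, p, o in g:
        if p == RDF.type:
            instances_of[o].add(s)
        else:
            props_of_inst[s].add(p)

    weighted_sum, total_weight = 0.0, 0.0
    for cls, insts in instances_of.items():
        # properties used by at least one instance of the class
        props = set().union(*(props_of_inst[i] for i in insts))
        if not props:
            continue
        filled = sum(len(props_of_inst[i] & props) for i in insts)
        coverage = filled / (len(props) * len(insts))
        weight = len(props) + len(insts)   # assumed weighting by "size" of the class
        weighted_sum += weight * coverage
        total_weight += weight

    coherence = weighted_sum / total_weight if total_weight else 0.0
    print("structuredness (coherence):", round(coherence, 3))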
  11. DATASET ANALYSIS: RELATIONSHIP SPECIALTY
     • Qiao et al. assumption: synthetic datasets have low relationship specialty
     • The low relationship specialty of synthetic datasets still persists in general and needs to be addressed by future synthetic benchmark generation approaches
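One way to make relationship specialty tangible (a loose sketch, not the exact formula of Qiao et al.): for each predicate, look at how many times each subject uses it and summarize the skew of that distribution with kurtosis. The aggregation into a single dataset score, a plain mean here, is an assumption.

    # Hedged sketch: per-predicate "specialty" as the kurtosis of per-subject
    # occurrence counts; predicates whose counts are highly skewed are "special".
    from collections import Counter, defaultdict
    from rdflib import Graph
    from scipy.stats import kurtosis

    g = Graph()
    g.parse("dataset.nt", format="nt")   # placeholder file name

    counts_per_predicate = defaultdict(Counter)   # predicate -> {subject: occurrences}
    for s, p, _ in g:
        counts_per_predicate[p][s] += 1

    specialty = {}
    for p, counts in counts_per_predicate.items():
        values = list(counts.values())
        if len(set(values)) > 1:                  # skip constant distributions
            specialty[p] = kurtosis(values, fisher=False)

    mean_specialty = sum(specialty.values()) / len(specialty) if specialty else 0.0
    print("mean predicate specialty:", round(mean_specialty, 2))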
  12. QUERY ANALYSIS: OVERALL DIVERSITY SCORE
     Benchmark query diversity (high to low): FEASIBLE > BioBench > FishMark > WatDiv > Bowlogna > SP2Bench > BSBM > DBPSB > SNB-BI > SNB-INT > TrainBench
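The overall diversity score used in this comparison is the average coefficient of variation over the query features; a toy computation (the feature values below are made-up placeholders) looks like this:

    # Sketch: overall diversity score = mean coefficient of variation (std/mean)
    # across query features, computed over all queries of a benchmark.
    import numpy as np

    # rows = queries, columns = features (e.g., #triple patterns, #join vertices,
    # result size, mean triple pattern selectivity); values are placeholders
    features = np.array([
        [2, 1,   10, 0.020],
        [5, 3, 1200, 0.300],
        [1, 0,    1, 0.001],
        [8, 4,  500, 0.150],
    ], dtype=float)

    means = features.mean(axis=0)
    stds = features.std(axis=0)
    cv = np.where(means > 0, stds / means, 0.0)   # coefficient of variation per feature
    diversity_score = float(cv.mean())
    print("per-feature CV:", np.round(cv, 2), "overall diversity:", round(diversity_score, 2))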
  13. QUERY ANALYSIS: DISTRIBUTION OF SPARQL CLAUSES AND JOIN VERTEX TYPES
     • Only FEASIBLE and BioBench neither completely miss nor overuse individual features
     • Synthetic benchmarks often fail to include important SPARQL clauses
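A quick way to reproduce the clause-distribution part of this analysis over a set of query strings is keyword matching, as sketched below; a parser-based analysis is more precise, and the query list here is a placeholder.

    # Hedged sketch: counting how many queries use selected SPARQL clauses.
    # Keyword matching is a rough approximation of a parser-based analysis.
    import re
    from collections import Counter

    CLAUSES = ["DISTINCT", "FILTER", "OPTIONAL", "UNION", "LIMIT",
               "ORDER BY", "GROUP BY", "REGEX"]

    def clause_usage(queries):
        usage = Counter()
        for q in queries:
            for clause in CLAUSES:
                pattern = r"\b" + clause.replace(" ", r"\s+") + r"\b"
                if re.search(pattern, q, re.IGNORECASE):
                    usage[clause] += 1
        return usage

    queries = [   # placeholder benchmark queries
        "SELECT DISTINCT ?s WHERE { ?s ?p ?o } LIMIT 10",
        "SELECT ?s WHERE { ?s ?p ?o OPTIONAL { ?s ?q ?v } FILTER(?v > 5) }",
    ]
    print(clause_usage(queries))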
  14. PERFORMANCE METRICS
     BSBM reports results for the largest number of performance metrics among the selected benchmarks
  15. SPEARMAN'S CORRELATION WITH RUNTIMES
     • Highest impact on query runtimes (high to low): PV > JV > TP > Result > JVD > JTPS > TPS > BGPs > LSQ > BTPS
     • The SPARQL query features we selected have a weak correlation with query execution time, suggesting that the query runtime is a complex measure affected by multidimensional SPARQL query features
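The ranking above is based on Spearman's rank correlation between each feature and the observed runtimes; with per-query feature values and runtimes in hand (all numbers below are placeholders), the computation is essentially one call per feature:

    # Sketch: rank features by the absolute Spearman correlation of their
    # values with query runtimes. All numbers are placeholders.
    import numpy as np
    from scipy.stats import spearmanr

    runtimes = np.array([0.05, 1.2, 0.8, 3.4, 0.02, 2.1])   # seconds
    features = {
        "projection variables (PV)": np.array([1, 3, 2, 5, 1, 4]),
        "join vertices (JV)":        np.array([0, 2, 1, 4, 0, 3]),
        "triple patterns (TP)":      np.array([1, 4, 3, 7, 1, 6]),
        "result size":               np.array([10, 900, 300, 5000, 1, 2500]),
    }

    correlations = {}
    for name, values in features.items():
        rho, _pvalue = spearmanr(values, runtimes)
        correlations[name] = rho

    for name, rho in sorted(correlations.items(), key=lambda kv: -abs(kv[1])):
        print(f"{name:28s} rho = {rho:+.2f}")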
  16. EFFECT OF DATASET STRUCTUREDNESS
  17. CONCLUSIONS
     • The dataset structuredness problem is well covered by recent synthetic data generators (e.g., WatDiv, TrainBench)
     • The low relationship specialty of synthetic datasets still persists in general and needs to be addressed by future synthetic benchmark generation approaches
     • The FEASIBLE framework employed on DBpedia generated the most diverse benchmark in our evaluation
     • The SPARQL query features we selected have a weak correlation with query execution time, suggesting that the query runtime is a complex measure affected by multidimensional SPARQL query features
     • Still, the number of projection variables, join vertices, and triple patterns, the result sizes, and the join vertex degree are the top five SPARQL features that most impact the overall query execution time
     • Synthetic benchmarks often fail to include important SPARQL clauses such as DISTINCT, FILTER, OPTIONAL, LIMIT and UNION
     • The dataset structuredness has a direct correlation with the result sizes and execution times of queries and an indirect correlation with the dataset …
