A Machine Learning Approach to
SPARQL Query Performance
Prediction
Rakebul Hasan
Wimmics Research Team
INRIA Sophia Antipolis
France
Context
Slide derived from Andreas Blumauer’s Linked Data slides
• Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs, so that people can
look up those names.
3. When someone looks up a URI,
provide useful information, using the
standards (RDF, SPARQL).
4. Include links to other URIs, so that
they can discover more things.
Context
• W3C Linking Open Data (LOD) Initiative
• An initiative to publish open data as Linked Data
• From 2 billion triples in 2007 to 30 billion triples in 2011
• Accessing Linked Data
– Dereferencing URIs
– SPARQL Endpoints
Context
• Querying Linked Data
– SPARQL Endpoints: SPARQL query service via HTTP
implementing SPARQL Protocol
– 68% of the data sets provide SPARQL Endpoints as
of September 2011
– As of today, 98% of the triples in the LOD cloud are
accessible via SPARQL
• 57,856,463,005 out of 58,882,358,557 triples
http://stats.lod2.eu/
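As a rough illustration of what "SPARQL query service via HTTP" means in practice, the sketch below builds (but does not send) a GET request in the style of the SPARQL Protocol. The endpoint URL, the example query, and the helper name `build_sparql_request` are assumptions made for illustration; they do not come from the slides.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request without sending it."""
    # The query text travels URL-encoded in the "query" parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the JSON serialization of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Assumed endpoint and query, for illustration only.
req = build_sparql_request(
    "http://dbpedia.org/sparql",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 5",
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it against the endpoint; the sketch stops short of that so it runs without network access.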
Context
• Understanding Query Behavior in the context
of Linked Data
– Workload allocation to ensure specific QoS
requirements are met
– Predicting query performance metrics
Query Performance Prediction
• Traditional approaches predict query performance with cost models
based on statistics about the underlying data
• Data statistics are often missing in the Linked Data scenario
– Only 32.20% (95 out of 295) of data sources provide a voiD
description.
• Basic statistics, such as the number of triples, are often not detailed
enough for statistics-based models
– In fact, what makes effective statistics for query cost estimation on RDF is
unclear.
• Challenge
– How to predict query performance without using data statistics?
Understanding performance of
database queries
• Ganapathi et al. predict performance
metrics of database queries prior to query
execution using machine learning.
• Akdere et al. use machine learning to
predict query execution time.
Ganapathi et al.: Predicting multiple metrics for queries: Better decisions enabled by machine learning, ICDE’09
Akdere et al.: Learning-based query performance modeling and prediction, ICDE’12
Predicting Query Performance
• Learn query performance from already
executed queries
• Challenge: how to model SPARQL query
characteristics for machine learning
algorithms (feature extraction)?
Modeling SPARQL Query Execution
• Two types of features
– Algebra features: extracted from the SPARQL algebra
expression of a query
– Graph pattern features: a vector representation of
the query pattern of a query relative to the
training queries
Modeling SPARQL Query Execution
• Algebra features
– Jena API to extract
SPARQL algebra
expressions
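The slide does not list the individual algebra features, so the following is only one plausible sketch: it takes an algebra expression already printed as an s-expression (the form Jena uses when printing algebra) and counts how often each operator occurs, plus the maximum nesting depth. The operator list, the function name `algebra_features`, and the example expression are assumptions for illustration. The tokenizer is deliberately naive and would miscount literals that contain spaces or parentheses.

```python
import re
from collections import Counter

# Assumed operator vocabulary; the real feature set may differ.
OPERATORS = ["bgp", "triple", "join", "leftjoin", "union",
             "filter", "project", "distinct", "order", "slice"]


def algebra_features(expr):
    """Turn an s-expression algebra string into a fixed-length vector."""
    tokens = re.findall(r"\(|\)|[^\s()]+", expr)
    counts = Counter()
    depth = max_depth = 0
    for prev, tok in zip([None] + tokens, tokens):
        if tok == "(":
            depth += 1
            max_depth = max(max_depth, depth)
        elif tok == ")":
            depth -= 1
        elif prev == "(" and tok in OPERATORS:
            # Only the head of a parenthesized form counts as an operator.
            counts[tok] += 1
    # One count per operator, followed by the parenthesis nesting depth.
    return [counts[op] for op in OPERATORS] + [max_depth]


example = ("(project (?name) (filter (> ?age 30) "
           "(bgp (triple ?p foaf:name ?name) (triple ?p foaf:age ?age))))")
vector = algebra_features(example)
print(vector)
```

Every query maps to a vector of the same length, which is what a regression model needs as input.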
• Graph pattern features
– Find landmarks in training
queries by clustering
• K-medoids with
approximate graph edit
distance
– Compute the distance
between each landmark query
and the query under
examination to construct a
graph pattern feature
vector
• Approximate graph edit
distance for distance
computation
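The two steps above can be sketched as follows, assuming a simple alternating k-medoids loop with a deterministic start (the slides do not say which k-medoids variant or initialization was used). The distance function is passed in as a parameter; in the setting described here it would be the approximate graph edit distance between query graphs, while the toy example uses plain numbers so that it runs on its own. The names `k_medoids` and `pattern_features` are hypothetical.

```python
def k_medoids(items, k, dist, max_iter=100):
    """Pick k landmark items; assumes the items are distinct."""
    medoids = list(range(k))  # deterministic start: first k items
    for _ in range(max_iter):
        # Assignment step: attach every item to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, item in enumerate(items):
            nearest = min(medoids, key=lambda m: dist(item, items[m]))
            clusters[nearest].append(i)
        # Update step: per cluster, choose the member with the smallest
        # total distance to the other members.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # only possible with duplicate items
                new_medoids.append(m)
                continue
            new_medoids.append(min(
                members,
                key=lambda c: sum(dist(items[c], items[o]) for o in members),
            ))
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return [items[m] for m in medoids]


def pattern_features(query, landmarks, dist):
    """One feature per landmark: the distance from the query to it."""
    return [dist(query, landmark) for landmark in landmarks]


def toy_dist(a, b):
    # Stand-in for approximate graph edit distance between two queries.
    return abs(a - b)


landmarks = k_medoids([1, 2, 3, 10, 11, 12], k=2, dist=toy_dist)
features = pattern_features(4, landmarks, toy_dist)
print(landmarks, features)
```

The length of the feature vector equals the number of landmarks, so k controls how finely query patterns are distinguished.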
Graph Edit Distance
• Minimum amount of distortion needed to
transform one graph into another
– Bipartite matching based approximated graph edit
distance with
• Previous research shows accurate results with
classification problems
Riesen et al. “A Novel Software Toolkit for Graph Edit Distance Computation”, 9th IAPR-TC-15, GbRPR 2013
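A minimal sketch of the bipartite-matching idea, under loudly stated assumptions: node costs are 0 or 1 depending on label equality, a degree-difference term stands in for local edge costs, and the assignment is solved by brute force, which only suits tiny graphs. A practical implementation would use a polynomial-time assignment solver such as the Hungarian algorithm. The graph encoding and the function names are hypothetical and are not taken from the toolkit cited above.

```python
from itertools import permutations

BIG = 10**6  # finite stand-in for a forbidden assignment


def degree(graph, node):
    """Number of edges touching the node (incoming or outgoing)."""
    return sum(1 for (s, _p, o) in graph["edges"] if node in (s, o))


def cost_matrix(g1, g2):
    """Square (n+m) x (n+m) matrix: substitute, delete, insert, padding."""
    n1, n2 = list(g1["nodes"]), list(g2["nodes"])
    n, m = len(n1), len(n2)
    c = [[0] * (n + m) for _ in range(n + m)]
    for i, u in enumerate(n1):
        for j, v in enumerate(n2):
            # Substitution: label mismatch plus degree difference.
            label_cost = 0 if g1["nodes"][u] == g2["nodes"][v] else 1
            c[i][j] = label_cost + abs(degree(g1, u) - degree(g2, v))
        for k in range(n):
            # Deletion of u is only allowed on the diagonal.
            c[i][m + k] = 1 + degree(g1, u) if k == i else BIG
    for j, v in enumerate(n2):
        for k in range(m):
            # Insertion of v is only allowed on the diagonal.
            c[n + j][k] = 1 + degree(g2, v) if k == j else BIG
    return c  # bottom-right block stays 0 (padding to padding)


def approx_ged(g1, g2):
    """Cheapest assignment over the cost matrix (brute force)."""
    c = cost_matrix(g1, g2)
    size = len(c)
    return min(sum(c[r][p[r]] for r in range(size))
               for p in permutations(range(size)))


# Two toy query graphs: nodes map ids to labels, edges are (s, p, o).
g1 = {"nodes": {"a": "?", "b": "dbo:Person"},
      "edges": {("a", "rdf:type", "b")}}
g2 = {"nodes": {"a": "?", "b": "dbo:Person", "c": "?"},
      "edges": {("a", "rdf:type", "b"), ("a", "foaf:knows", "c")}}
print(approx_ged(g1, g1), approx_ged(g1, g2))
```

Because the assignment considers each node with only its local structure, the result approximates the exact edit distance rather than computing it; that trade-off is what makes the matching formulation tractable.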
Experiments
• 1260 training, 420 validation, and 420 test
queries generated from DBPSB benchmark query
templates
– DBPSB templates cover the most commonly used SPARQL
query features in the queries sent to DBpedia
• DBpedia as RDF data set
• Predicting query execution time
– k-NN regression with k-D tree
– SVM with nu-SVR for regression
DBpedia: http://dbpedia.org/
DBPSB: http://aksw.org/Projects/DBPSB.html
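To make the k-NN regression step concrete, here is a small sketch that predicts an execution time as the mean of the times of the k nearest training queries in feature space. It uses a brute-force neighbour search instead of the k-D tree mentioned above, and the feature vectors and timings are made-up toy values; the function name `knn_predict` is hypothetical. The nu-SVR model is not sketched here.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Mean target value of the k training points closest to the query."""
    # Brute-force search; a k-D tree would speed this up on large sets.
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _x, y in nearest) / len(nearest)


# Toy feature vectors and execution times (illustrative values only).
train_x = [[0, 0], [1, 0], [0, 1], [10, 10]]
train_y = [1.0, 2.0, 3.0, 100.0]
predicted = knn_predict(train_x, train_y, [0, 0], k=3)
print(predicted)
```

In the setup described on this slide, the value of k would be chosen on the validation queries and the final error reported on the test queries.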
Algebra Features
[Figure]
Algebra and graph pattern features
[Figure]
Time Required for Training and
Predictions
[Figure]
Summary
• Understanding SPARQL query behavior in the
Linked Data scenario
– Predicting query performance metrics
• Learn query execution times from already
executed queries
– without using statistics about the underlying RDF
data.
– Modeling (vector representation) SPARQL queries for
machine learning algorithms
• Feature extraction
– Highly accurate predictions for common Linked Data
queries
Future Work on QPP
• Incorporating bandwidth-related features.
• Query optimization for Linked Data applications:
– in place of selectivity estimation for alternative
queries?
• How to accurately predict performance for single
triple patterns
– Alternative query construction for Linked Data
applications: join order optimization, e.g., federated
query processing over Linked Data
• How to generate training queries?
– Next slide
Statistical Analysis of Query Logs
• An approach to systematically generating training queries
Mario Arias, Javier D. Fernández, Miguel A. Martínez-Prieto, Pablo de la Fuente: An Empirical Study of Real-World SPARQL Queries,
1st International Workshop on Usage Analysis and the Web of Data,
co-located with the 20th International World Wide Web Conference (WWW2011)
• Thank you


Editor's notes

  1. Web of Documents: documents were described using HTML and globally identified using URLs, with HTTP as the retrieval mechanism; together, these created a single global data space. Data on the Web: many formats and APIs, proprietary interfaces, and no single global data space, because there were no hyperlinks between data items in different data sources. Web of Data: a single global data space that uses RDF to publish data on the Web, with links between data items in different data sources.
  2. Histograms: it is unclear on what to build histograms for effective estimation.
  3. The graph edit distance between two graphs is the minimum amount of distortion needed to transform one graph to another. The minimum amount of distortion is the sequence of edit operations with minimum cost. The edit operations are deletions, insertions, and substitutions of nodes and edges.
  4. Refining training queries from query logs by considering the statistically significant characteristics. Bootstrapping: starting with an initial set of properties, resources, and literals, and then generating training queries by permutations and combinations of the statistically significant features. Simplifying the pattern features: join features, triple pattern features, and pattern graph features to represent query patterns.
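The bootstrapping idea in note 4 can be sketched as filling the placeholders of a query template with every combination of values from an initial set. The `$`-style placeholder syntax, the example template, the value lists, and the function name `generate_queries` are assumptions for illustration; they are not the DBPSB template format.

```python
from itertools import product
from string import Template


def generate_queries(template, values):
    """Instantiate the template once per combination of placeholder values."""
    names = sorted(values)  # fixed order keeps the output deterministic
    queries = []
    for combo in product(*(values[name] for name in names)):
        queries.append(Template(template).substitute(dict(zip(names, combo))))
    return queries


# Hypothetical template and initial value sets.
template = "SELECT ?s WHERE { ?s a $cls ; $prop ?o }"
values = {"cls": ["dbo:Person", "dbo:Place"], "prop": ["rdfs:label"]}
queries = generate_queries(template, values)
for q in queries:
    print(q)
```

Restricting the value lists to characteristics that are statistically significant in real query logs keeps the generated set representative instead of exhaustive.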