SlideShare ist ein Scribd-Unternehmen logo
1 von 19
Downloaden Sie, um offline zu lesen
Shima Zahmatkesh, Emanuele Della Valle, and Daniele Dell'Aglio
DEIB - Politecnico of Milano
DEBS 2017 – Barcelona, Spain
23 June 2017
Using Rank Aggregation in Continuously
Answering SPARQL Queries on Streaming
and Quasi-static Linked Data
Stream Processing in Nutshell
Stream Processing Engine
ResultsWindows
Stream data Register query once
and execute it
continuously
!2
Web Stream Processing
Web
Results
Join
Windows
Web Streams Linked Data
✓ High Latency
✓ Rate Limits
✓ Loosing Reactiveness
!3
Stream Processing Engine
RDF Stream Processing (RSP) EngineRSPengine
Web
Results
Join
Windows
RDF Streams SPARQL endpoint
!4
Motivation
The cloth brand ACME wants to persuade influential Social
Networks users to post commercial endorsements.
Every minute give me the ID of the users that are mentioned on
Social Network in the last 10 minutes whose number of followers is
greater than 100,000.
!5
REGISTER STREAM <:InfluencersToContact> AS
CONSTRUCT {?user a :influentialUser}
FROM NAMED WINDOW W ON S [RANGE 10m STEP 1m]
WHERE {
WINDOW W {?user :hasMentions ?mentionsNumber}
SERVICE BKG {?user :hasFollowers ?followerCount }
FILTER (?followerCount > 100,000)
}
Problem DefinitionRSPengine
Web
Results
Join
WindowsRDF Streams
Define Refresh
Budget to control
reactiveness
!6
Data become stale
if not refreshed
Correct vs
approximate
answer
SPARQL endpoint
Local
Replica
Problem DefinitionRSPengine
Web
Results
Join
WindowsRDF Streams SPARQL endpoint
!7
Local
Replica
Maintenance
Policy
ACQUA, ACQUA.F Frameworks
!8
WINDOW clause
Stream data
JOIN Proposer Ranker
Maintainer
2
3
1
SERVICE clause
ACQUA: without FILTER
ACQUA.F: with FILTER Clause
E
C
ACQUA:
RND
LRU
WBM
ACQUA.F:
Filter Update Policy
RND.F
LRU.F
WBM.F
Candidate set
Elected set: top γ mappings
of Candidate set
Local Replica
WSJ: Filter out mappings that
are not involved in current
evaluation
Soheila Dehghanzadeh, et al., Approximate Continuous Query Answering over Streams and Dynamic Linked Data Sets, ICWE 2015.
Shima Zahmatkesh, et al., When a FILTER Makes the Dierence in Continuously Answering SPARQL Queries on Streaming and
Quasi-Static Linked Data, ICWE 2016.
Emanuele Della Valle, et al., Taming velocity and variety simultaneously in big data with stream reasoning, DEBS 2016.
Rankers
• LRU
• Use Least-Recently Used (LRU) cache replacement algorithm
• The less recently a mapping have been refreshed in a query, the
higher is its rank.
• Filter Update Policy
• For each mapping in the replica:
• Computes how close is the value associate to the variable of the
mapping to the Filtering Threshold used in Filter clause.
• Arrange mappings in ascending order.
!9
User Last Update Time LRU policy #followers Filter Update
Alice 8 1 120 2
Bob 10 2 30 3
Carol 14 3 95 1
Filtering Threshold = 100
Rank Aggregation
• Fairly take into account the opinions of different
algorithms.
• Combine the ranking lists obtained from different
algorithms by computing aggregated score
!10
User Score
Alice 0.8
Bob 0.7
Carol 0.5
David 0.4
Eve 0.1
User Score
Bob 0.9
David 0.8
Alice 0.7
Eve 0.4
Carol 0.1
User Scoreagg
Bob 0.8
Alice 0.75α = 0.5
T = 0.5*0.8 + 0.5*0.9 = 0.85List 1 List 2
Rank Aggregation
• Fairly take into account the opinions of different
algorithms.
• Combine the ranking lists obtained from different
algorithms by computing aggregated score
!10
User Score
Alice 0.8
Bob 0.7
Carol 0.5
David 0.4
Eve 0.1
User Score
Bob 0.9
David 0.8
Alice 0.7
Eve 0.4
Carol 0.1
α = 0.5
User Scoreagg
Bob 0.8
Alice 0.75
David 0.6
T = 0.5*0.7 + 0.5*0.8 = 0.75List 1 List 2
Experimental Evaluation
• Data Sets
• Streaming data, and realistic background data from real data of
Twitter
• Query
• Contains WINDOW, SERVICE, and FILTER clauses
• Generate correct answer of the query by an Oracle
• KPIs
• Measure diversity of the set generated by the query and correct
answers
• Compute cummulative errors over evaluations
!11
Experimental Results
!12
WorstBest
Performance
Experiment Dimension
Experimental Results
!12
WorstBest
Performance
Experiment Dimension
For low selectivity
WBM is better than
Filter Update Policy
Experimental Results
!12
WorstBest
Performance
Experiment Dimension
For high selectivity
Filter Update Policy is
better than WBM
Experimental Results
!12
WorstBest
Performance
Experiment Dimension
Comparable to
Best Result
Conclusion
• Problem of continuously evaluating queries over data
stream and background data.
• The results of experiments show that proposed policies
have the same accuracy of the best result achieved
without using any assumption.
• They also show that the proposed policies are not
sensitive to the value of alpha used in rank aggregation
formula.
!13
Future works
• Broaden the class of queries
• Multiple filtering
• Filtering condition formulated as a ranking clause
• Pushing the FILTER clause into the SERVICE clause and
considering caching instead of local replica
• Study the effect of different trends in the data
!14
Thank you!

Any Question?
Using Rank Aggregation in Continuously
Answering SPARQL Queries on Streaming and
Quasi-static Linked Data
Shima Zahmatkesh
shima.zahmatkesh@polimi.it
DEIB - Politecnico of Milano
!15

Weitere ähnliche Inhalte

Ähnlich wie Using Rank Aggregation in Continuously Answering SPARQL Queries on Streaming and Quasi-static Linked Data

[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...Emanuel Lacić
 
Predicting query performance and explaining results to assist Linked Data con...
Predicting query performance and explaining results to assist Linked Data con...Predicting query performance and explaining results to assist Linked Data con...
Predicting query performance and explaining results to assist Linked Data con...Rakebul Hasan
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkDatabricks
 
Rules validation - Copy
Rules validation - CopyRules validation - Copy
Rules validation - CopyHicham Berrada
 
Online index recommendations for high dimensional databases using query workl...
Online index recommendations for high dimensional databases using query workl...Online index recommendations for high dimensional databases using query workl...
Online index recommendations for high dimensional databases using query workl...Mumbai Academisc
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...confluent
 
Umm, how did you get that number? Managing Data Integrity throughout the Data...
Umm, how did you get that number? Managing Data Integrity throughout the Data...Umm, how did you get that number? Managing Data Integrity throughout the Data...
Umm, how did you get that number? Managing Data Integrity throughout the Data...John Kinmonth
 
Low Cost Business Intelligence Platform for MongoDB instances using MEAN stack
Low Cost Business Intelligence Platform for MongoDB instances using MEAN stackLow Cost Business Intelligence Platform for MongoDB instances using MEAN stack
Low Cost Business Intelligence Platform for MongoDB instances using MEAN stackAvinash Kaza
 
Couchbase overview033113long
Couchbase overview033113longCouchbase overview033113long
Couchbase overview033113longJeff Harris
 
Couchbase overview033113long
Couchbase overview033113longCouchbase overview033113long
Couchbase overview033113longJeff Harris
 
database management system
database management systemdatabase management system
database management systemNivetha Ganesan
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Spark Summit
 
Petabytes and Nanoseconds
Petabytes and NanosecondsPetabytes and Nanoseconds
Petabytes and NanosecondsRobert Greiner
 
Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...
Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...
Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...Accumulo Summit
 
MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database S...
MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database S...MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database S...
MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database S...Institute of Information Systems (HES-SO)
 
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL ServicesAmazon Web Services
 
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...RTTS
 

Ähnlich wie Using Rank Aggregation in Continuously Answering SPARQL Queries on Streaming and Quasi-static Linked Data (20)

[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
 
Predicting query performance and explaining results to assist Linked Data con...
Predicting query performance and explaining results to assist Linked Data con...Predicting query performance and explaining results to assist Linked Data con...
Predicting query performance and explaining results to assist Linked Data con...
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
 
slides-sd
slides-sdslides-sd
slides-sd
 
Rules validation - Copy
Rules validation - CopyRules validation - Copy
Rules validation - Copy
 
Online index recommendations for high dimensional databases using query workl...
Online index recommendations for high dimensional databases using query workl...Online index recommendations for high dimensional databases using query workl...
Online index recommendations for high dimensional databases using query workl...
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
Umm, how did you get that number? Managing Data Integrity throughout the Data...
Umm, how did you get that number? Managing Data Integrity throughout the Data...Umm, how did you get that number? Managing Data Integrity throughout the Data...
Umm, how did you get that number? Managing Data Integrity throughout the Data...
 
Low Cost Business Intelligence Platform for MongoDB instances using MEAN stack
Low Cost Business Intelligence Platform for MongoDB instances using MEAN stackLow Cost Business Intelligence Platform for MongoDB instances using MEAN stack
Low Cost Business Intelligence Platform for MongoDB instances using MEAN stack
 
Couchbase overview033113long
Couchbase overview033113longCouchbase overview033113long
Couchbase overview033113long
 
Couchbase overview033113long
Couchbase overview033113longCouchbase overview033113long
Couchbase overview033113long
 
database management system
database management systemdatabase management system
database management system
 
NoSQL and Couchbase
NoSQL and CouchbaseNoSQL and Couchbase
NoSQL and Couchbase
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
 
Petabytes and Nanoseconds
Petabytes and NanosecondsPetabytes and Nanoseconds
Petabytes and Nanoseconds
 
Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...
Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...
Accumulo Summit 2016: Accumulo Indexing Strategies for Searching Semantic Net...
 
MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database S...
MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database S...MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database S...
MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database S...
 
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
 
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
 
DataHub
DataHubDataHub
DataHub
 

Kürzlich hochgeladen

Carbohydrates principles of biochemistry
Carbohydrates principles of biochemistryCarbohydrates principles of biochemistry
Carbohydrates principles of biochemistryKomakeTature
 
Guardians and Glitches: Navigating the Duality of Gen AI in AppSec
Guardians and Glitches: Navigating the Duality of Gen AI in AppSecGuardians and Glitches: Navigating the Duality of Gen AI in AppSec
Guardians and Glitches: Navigating the Duality of Gen AI in AppSecTrupti Shiralkar, CISSP
 
Modelling Guide for Timber Structures - FPInnovations
Modelling Guide for Timber Structures - FPInnovationsModelling Guide for Timber Structures - FPInnovations
Modelling Guide for Timber Structures - FPInnovationsYusuf Yıldız
 
cloud computing notes for anna university syllabus
cloud computing notes for anna university syllabuscloud computing notes for anna university syllabus
cloud computing notes for anna university syllabusViolet Violet
 
Transforming Process Safety Management: Challenges, Benefits, and Transition ...
Transforming Process Safety Management: Challenges, Benefits, and Transition ...Transforming Process Safety Management: Challenges, Benefits, and Transition ...
Transforming Process Safety Management: Challenges, Benefits, and Transition ...soginsider
 
Gender Bias in Engineer, Honors 203 Project
Gender Bias in Engineer, Honors 203 ProjectGender Bias in Engineer, Honors 203 Project
Gender Bias in Engineer, Honors 203 Projectreemakb03
 
A Seminar on Electric Vehicle Software Simulation
A Seminar on Electric Vehicle Software SimulationA Seminar on Electric Vehicle Software Simulation
A Seminar on Electric Vehicle Software SimulationMohsinKhanA
 
News web APP using NEWS API for web platform to enhancing user experience
News web APP using NEWS API for web platform to enhancing user experienceNews web APP using NEWS API for web platform to enhancing user experience
News web APP using NEWS API for web platform to enhancing user experienceAkashJha84
 
nvidia AI-gtc 2024 partial slide deck.pptx
nvidia AI-gtc 2024 partial slide deck.pptxnvidia AI-gtc 2024 partial slide deck.pptx
nvidia AI-gtc 2024 partial slide deck.pptxjasonsedano2
 
Summer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdf
Summer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdfSummer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdf
Summer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdfNaveenVerma126
 
Mohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptxMohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptxKISHAN KUMAR
 
CSR Managerial Round Questions and answers.pptx
CSR Managerial Round Questions and answers.pptxCSR Managerial Round Questions and answers.pptx
CSR Managerial Round Questions and answers.pptxssusera0771e
 
me3493 manufacturing technology unit 1 Part A
me3493 manufacturing technology unit 1 Part Ame3493 manufacturing technology unit 1 Part A
me3493 manufacturing technology unit 1 Part Akarthi keyan
 
Strategies of Urban Morphologyfor Improving Outdoor Thermal Comfort and Susta...
Strategies of Urban Morphologyfor Improving Outdoor Thermal Comfort and Susta...Strategies of Urban Morphologyfor Improving Outdoor Thermal Comfort and Susta...
Strategies of Urban Morphologyfor Improving Outdoor Thermal Comfort and Susta...amrabdallah9
 
Tachyon 100G PCB Performance Attributes and Applications
Tachyon 100G PCB Performance Attributes and ApplicationsTachyon 100G PCB Performance Attributes and Applications
Tachyon 100G PCB Performance Attributes and ApplicationsEpec Engineered Technologies
 
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Sean Meyn
 

Kürzlich hochgeladen (20)

Carbohydrates principles of biochemistry
Carbohydrates principles of biochemistryCarbohydrates principles of biochemistry
Carbohydrates principles of biochemistry
 
Guardians and Glitches: Navigating the Duality of Gen AI in AppSec
Guardians and Glitches: Navigating the Duality of Gen AI in AppSecGuardians and Glitches: Navigating the Duality of Gen AI in AppSec
Guardians and Glitches: Navigating the Duality of Gen AI in AppSec
 
Lecture 2 .pptx
Lecture 2                            .pptxLecture 2                            .pptx
Lecture 2 .pptx
 
Modelling Guide for Timber Structures - FPInnovations
Modelling Guide for Timber Structures - FPInnovationsModelling Guide for Timber Structures - FPInnovations
Modelling Guide for Timber Structures - FPInnovations
 
cloud computing notes for anna university syllabus
cloud computing notes for anna university syllabuscloud computing notes for anna university syllabus
cloud computing notes for anna university syllabus
 
Présentation IIRB 2024 Marine Cordonnier.pdf
Présentation IIRB 2024 Marine Cordonnier.pdfPrésentation IIRB 2024 Marine Cordonnier.pdf
Présentation IIRB 2024 Marine Cordonnier.pdf
 
Transforming Process Safety Management: Challenges, Benefits, and Transition ...
Transforming Process Safety Management: Challenges, Benefits, and Transition ...Transforming Process Safety Management: Challenges, Benefits, and Transition ...
Transforming Process Safety Management: Challenges, Benefits, and Transition ...
 
Gender Bias in Engineer, Honors 203 Project
Gender Bias in Engineer, Honors 203 ProjectGender Bias in Engineer, Honors 203 Project
Gender Bias in Engineer, Honors 203 Project
 
A Seminar on Electric Vehicle Software Simulation
A Seminar on Electric Vehicle Software SimulationA Seminar on Electric Vehicle Software Simulation
A Seminar on Electric Vehicle Software Simulation
 
News web APP using NEWS API for web platform to enhancing user experience
News web APP using NEWS API for web platform to enhancing user experienceNews web APP using NEWS API for web platform to enhancing user experience
News web APP using NEWS API for web platform to enhancing user experience
 
nvidia AI-gtc 2024 partial slide deck.pptx
nvidia AI-gtc 2024 partial slide deck.pptxnvidia AI-gtc 2024 partial slide deck.pptx
nvidia AI-gtc 2024 partial slide deck.pptx
 
Summer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdf
Summer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdfSummer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdf
Summer training report on BUILDING CONSTRUCTION for DIPLOMA Students.pdf
 
Lecture 2 .pdf
Lecture 2                           .pdfLecture 2                           .pdf
Lecture 2 .pdf
 
Mohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptxMohs Scale of Hardness, Hardness Scale.pptx
Mohs Scale of Hardness, Hardness Scale.pptx
 
CSR Managerial Round Questions and answers.pptx
CSR Managerial Round Questions and answers.pptxCSR Managerial Round Questions and answers.pptx
CSR Managerial Round Questions and answers.pptx
 
me3493 manufacturing technology unit 1 Part A
me3493 manufacturing technology unit 1 Part Ame3493 manufacturing technology unit 1 Part A
me3493 manufacturing technology unit 1 Part A
 
Strategies of Urban Morphologyfor Improving Outdoor Thermal Comfort and Susta...
Strategies of Urban Morphologyfor Improving Outdoor Thermal Comfort and Susta...Strategies of Urban Morphologyfor Improving Outdoor Thermal Comfort and Susta...
Strategies of Urban Morphologyfor Improving Outdoor Thermal Comfort and Susta...
 
Présentation IIRB 2024 Chloe Dufrane.pdf
Présentation IIRB 2024 Chloe Dufrane.pdfPrésentation IIRB 2024 Chloe Dufrane.pdf
Présentation IIRB 2024 Chloe Dufrane.pdf
 
Tachyon 100G PCB Performance Attributes and Applications
Tachyon 100G PCB Performance Attributes and ApplicationsTachyon 100G PCB Performance Attributes and Applications
Tachyon 100G PCB Performance Attributes and Applications
 
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
Quasi-Stochastic Approximation: Algorithm Design Principles with Applications...
 

Using Rank Aggregation in Continuously Answering SPARQL Queries on Streaming and Quasi-static Linked Data

  • 1. Shima Zahmatkesh, Emanuele Della Valle, and Daniele Dell'Aglio DEIB - Politecnico of Milano DEBS 2017 – Barcelona, Spain 23 June 2017 Using Rank Aggregation in Continuously Answering SPARQL Queries on Streaming and Quasi-static Linked Data
  • 2. Stream Processing in Nutshell Stream Processing Engine ResultsWindows Stream data Register query once and execute it continuously !2
  • 3. Web Stream Processing Web Results Join Windows Web Streams Linked Data ✓ High Latency ✓ Rate Limits ✓ Loosing Reactiveness !3 Stream Processing Engine
  • 4. RDF Stream Processing (RSP) EngineRSPengine Web Results Join Windows RDF Streams SPARQL endpoint !4
  • 5. Motivation The cloth brand ACME wants to persuade influential Social Networks users to post commercial endorsements. Every minute give me the ID of the users that are mentioned on Social Network in the last 10 minutes whose number of followers is greater than 100,000. !5 REGISTER STREAM <:InfluencersToContact> AS CONSTRUCT {?user a :influentialUser} FROM NAMED WINDOW W ON S [RANGE 10m STEP 1m] WHERE { WINDOW W {?user :hasMentions ?mentionsNumber} SERVICE BKG {?user :hasFollowers ?followerCount } FILTER (?followerCount > 100,000) }
  • 6. Problem DefinitionRSPengine Web Results Join WindowsRDF Streams Define Refresh Budget to control reactiveness !6 Data become stale if not refreshed Correct vs approximate answer SPARQL endpoint Local Replica
  • 7. Problem DefinitionRSPengine Web Results Join WindowsRDF Streams SPARQL endpoint !7 Local Replica Maintenance Policy
  • 8. ACQUA, ACQUA.F Frameworks !8 WINDOW clause Stream data JOIN Proposer Ranker Maintainer 2 3 1 SERVICE clause ACQUA: without FILTER ACQUA.F: with FILTER Clause E C ACQUA: RND LRU WBM ACQUA.F: Filter Update Policy RND.F LRU.F WBM.F Candidate set Elected set: top γ mappings of Candidate set Local Replica WSJ: Filter out mappings that are not involved in current evaluation Soheila Dehghanzadeh, et al., Approximate Continuous Query Answering over Streams and Dynamic Linked Data Sets, ICWE 2015. Shima Zahmatkesh, et al., When a FILTER Makes the Dierence in Continuously Answering SPARQL Queries on Streaming and Quasi-Static Linked Data, ICWE 2016. Emanuele Della Valle, et al., Taming velocity and variety simultaneously in big data with stream reasoning, DEBS 2016.
  • 9. Rankers • LRU • Use Least-Recently Used (LRU) cache replacement algorithm • The less recently a mapping have been refreshed in a query, the higher is its rank. • Filter Update Policy • For each mapping in the replica: • Computes how close is the value associate to the variable of the mapping to the Filtering Threshold used in Filter clause. • Arrange mappings in ascending order. !9 User Last Update Time LRU policy #followers Filter Update Alice 8 1 120 2 Bob 10 2 30 3 Carol 14 3 95 1 Filtering Threshold = 100
  • 10. Rank Aggregation • Fairly take into account the opinions of different algorithms. • Combine the ranking lists obtained from different algorithms by computing aggregated score !10 User Score Alice 0.8 Bob 0.7 Carol 0.5 David 0.4 Eve 0.1 User Score Bob 0.9 David 0.8 Alice 0.7 Eve 0.4 Carol 0.1 User Scoreagg Bob 0.8 Alice 0.75α = 0.5 T = 0.5*0.8 + 0.5*0.9 = 0.85List 1 List 2
  • 11. Rank Aggregation • Fairly take into account the opinions of different algorithms. • Combine the ranking lists obtained from different algorithms by computing aggregated score !10 User Score Alice 0.8 Bob 0.7 Carol 0.5 David 0.4 Eve 0.1 User Score Bob 0.9 David 0.8 Alice 0.7 Eve 0.4 Carol 0.1 α = 0.5 User Scoreagg Bob 0.8 Alice 0.75 David 0.6 T = 0.5*0.7 + 0.5*0.8 = 0.75List 1 List 2
  • 12. Experimental Evaluation • Data Sets • Streaming data, and realistic background data from real data of Twitter • Query • Contains WINDOW, SERVICE, and FILTER clauses • Generate correct answer of the query by an Oracle • KPIs • Measure diversity of the set generated by the query and correct answers • Compute cummulative errors over evaluations !11
  • 14. Experimental Results !12 WorstBest Performance Experiment Dimension For low selectivity WBM is better than Filter Update Policy
  • 15. Experimental Results !12 WorstBest Performance Experiment Dimension For high selectivity Filter Update Policy is better than WBM
  • 17. Conclusion • Problem of continuously evaluating queries over data stream and background data. • The results of experiments show that proposed policies have the same accuracy of the best result achieved without using any assumption. • They also show that the proposed policies are not sensitive to the value of alpha used in rank aggregation formula. !13
  • 18. Future works • Broaden the class of queries • Multiple filtering • Filtering condition formulated as a ranking clause • Pushing the FILTER clause into the SERVICE clause and considering caching instead of local replica • Study the effect of different trends in the data !14
  • 19. Thank you!
 Any Question? Using Rank Aggregation in Continuously Answering SPARQL Queries on Streaming and Quasi-static Linked Data Shima Zahmatkesh shima.zahmatkesh@polimi.it DEIB - Politecnico of Milano !15