An Efficient Incremental Indexing Mechanism for Extracting Top-k Representative Queries over Continuous Data Streams
Y.S. Horawalavithana, D.N. Ranasinghe
Adaptive and Reflective Middleware (ARM)
ACM/IFIP/USENIX Middleware
Vancouver, BC, Canada
December 08, 2015
University of Colombo School of Computing,
Sri Lanka
Overview
• Motivation
• Adaptive Diversification
• Incremental Top-k
• Evaluation
• Conclusion
• Future work
Diversity: Top-k representative set
Method to retrieve the Top-k publications from the matching publications
[Figure: representative Top-k. Drawback (without diversity) vs. what we want (with diversity)]
1.Motivation 2.Adaptive Diversification 3.Incremental Top-k 4.Evaluation 5.Conclusion 6.Future Work
Minimum independent-dominating set
Neighborhood: $N(p_i) = \{\, p_j \mid p_i \neq p_j,\ d(p_i, p_j) \leq \alpha \,\}$
[Figure: publications p1 ... p5 in the publication space (neighborhood radius α) and the corresponding graph model with vertices v1 ... v5. Of the four example vertex selections, three are labelled "Independent, dominating" and one "Dominating, not independent".]
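The independence and dominance conditions on this slide can be checked mechanically. The following is a minimal sketch, not the authors' code: publications are modelled as attribute sets, the distance is assumed to be the Jaccard distance (the deck uses it in its analysis), and the function names and the threshold value are hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Jaccard distance between two attribute sets (assumed distance measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def neighborhood(i, pubs, alpha, dist=jaccard_distance):
    """Indices j != i whose distance to publication i is at most alpha."""
    return {j for j in range(len(pubs))
            if j != i and dist(pubs[i], pubs[j]) <= alpha}

def is_independent(selected, pubs, alpha, dist=jaccard_distance):
    """True if no two selected publications are neighbors of each other."""
    return all(dist(pubs[i], pubs[j]) > alpha
               for i, j in combinations(selected, 2))

def is_dominating(selected, pubs, alpha, dist=jaccard_distance):
    """True if every unselected publication has a selected neighbor."""
    chosen = set(selected)
    return all(i in chosen or bool(neighborhood(i, pubs, alpha, dist) & chosen)
               for i in range(len(pubs)))

# Toy data: two tight pairs and one isolated publication.
pubs = [{"a", "b"}, {"a", "b", "c"}, {"x", "y"}, {"x", "y", "z"}, {"q"}]
alpha = 0.5
print(is_independent([0, 2, 4], pubs, alpha), is_dominating([0, 2, 4], pubs, alpha))
print(is_independent([0, 1, 2, 4], pubs, alpha), is_dominating([0, 1, 2, 4], pubs, alpha))
```

In the toy data the selection `[0, 2, 4]` is both independent and dominating, while `[0, 1, 2, 4]` is dominating but not independent, mirroring the two kinds of labels in the figure.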
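NAÏVE Greedy
$\arg\max$ of a score built from $r(p_i)^2$ and $\sum_{p_j \in N(p_i)} r(p_j) \times d(p_i, p_j)$ (the operator combining the two terms is not legible in the source)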
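A greedy selection of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the scoring function is passed in as a parameter because the exact formula is not recoverable here, the example scorer (plain relevance) is hypothetical, and the distance is assumed to be the Jaccard distance over attribute sets.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two attribute sets (assumed distance measure)."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)

def greedy_diverse_set(pubs, alpha, score, dist=jaccard_distance):
    """Greedily build an independent dominating set.

    score(i, neighbors) ranks candidate i given its still-uncovered neighbors.
    Each round picks the best-scoring uncovered publication, then removes it
    and its neighbors from further consideration.
    """
    remaining = set(range(len(pubs)))
    selected = []

    def uncovered_neighbors(i):
        return {j for j in remaining
                if j != i and dist(pubs[i], pubs[j]) <= alpha}

    while remaining:
        # sorted() makes ties resolve to the lowest index.
        best = max(sorted(remaining),
                   key=lambda i: score(i, uncovered_neighbors(i)))
        selected.append(best)
        remaining -= uncovered_neighbors(best) | {best}
    return selected

pubs = [{"a", "b"}, {"a", "b", "c"}, {"x", "y"}, {"x", "y", "z"}, {"q"}]
relevance = [0.9, 0.5, 0.4, 0.8, 0.1]
# Hypothetical scorer: relevance only; any score(i, neighbors) can be plugged in.
result = greedy_diverse_set(pubs, 0.5, lambda i, nbrs: relevance[i])
print(result)  # [0, 3, 4]
```

Because a chosen publication removes its whole neighborhood, the result is independent by construction, and because the loop runs until nothing is uncovered, it is also dominating.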
Handling streaming publications
[Figure: publication space with p1 ... p5 (neighborhood radius α) and graph v1 ... v5; a new publication p6 arrives and adds vertex v6]
Continuity Requirements
1. Durability
An item selected as diverse in the i-th window may remain selected in the (i+1)-th window, provided it has not expired and no other valid item in the (i+1)-th window outcompetes it.
2. Order
The publication stream follows chronological order. Once an item i has been selected, we avoid later selecting as diverse any item j that is at least as old as i.
Adaptive Diversification
[Figure: matching publication stream P1, P2, P3, P4, ..., Pj, Pj+1, ... with the i-th and (i+1)-th windows producing the diverse sets $S_i^*$ and $S_{i+1}^*$; properties shown: independence, dominance, durability, order]
• Straightforward solution:
  • Apply the naïve greedy method at each instance
• Proposed: an incremental index mechanism
  • Avoids the curse of re-calculating the neighborhood
Locality Sensitive Hashing (LSH)
• Simple idea
  • If two points are close together, they will remain close together after a "projection" operation
LSH in Adaptive Diversification:
Publications as categorical data
LSH in Adaptive Diversification:
Characteristic Matrix
LSH in Adaptive Diversification:
Minhashing
• No publications any more!
  • Each publication is represented by a signature
• Technique
  • Randomly permute the rows of the characteristic matrix m times
  • For each publication's column, take the number of the first row, in the permuted order, in which the column has a 1
• Advantage:
  • Reduces the dimensions to a small minhash signature
[Figure: first permutation of the rows of the characteristic matrix]
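The permutation technique can be sketched in a few lines. This is a minimal illustration, assuming the characteristic matrix is a list of rows (attributes) by columns (publications) holding 0/1 values; the function name, the seed parameter and the toy matrix are hypothetical.

```python
import random

def minhash_by_permutation(char_matrix, m, seed=0):
    """Return an m x n_cols signature matrix.

    char_matrix[row][col] is 1 if attribute `row` occurs in publication `col`.
    Each signature entry is the position, in the permuted row order, of the
    first row that has a 1 in that column (None for an all-zero column).
    """
    n_rows = len(char_matrix)
    n_cols = len(char_matrix[0])
    rng = random.Random(seed)
    signatures = []
    for _ in range(m):
        order = list(range(n_rows))
        rng.shuffle(order)  # order[pos] = original row placed at position pos
        sig_row = []
        for col in range(n_cols):
            first = next((pos for pos, row in enumerate(order)
                          if char_matrix[row][col] == 1), None)
            sig_row.append(first)
        signatures.append(sig_row)
    return signatures

# Toy matrix: 5 attributes x 3 publications; columns 0 and 1 are identical.
char_matrix = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
]
signature_matrix = minhash_by_permutation(char_matrix, m=4)
```

Identical columns always receive identical signatures, and the fraction of permutations on which two columns agree estimates their Jaccard similarity.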
LSH in Adaptive Diversification:
Signature Matrix
Fast-minhashing
Select m random hash functions to model the effect of m random permutations.
Mathematically proven only when the number of rows is a prime.
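A sketch of the hash-function variant follows, assuming the common family h(x) = (a·x + c) mod p with a prime modulus p that is at least the number of rows; this family, the function name and the toy matrix are assumptions, not taken from the deck. When p equals the number of rows, each such function is a true permutation of the row indices, which matches the prime-row condition on the slide.

```python
import random

def fast_minhash(char_matrix, m, prime, seed=0):
    """Simulate m row permutations with m hash functions (a*x + c) % prime.

    `prime` is assumed to be prime (not checked here) and >= number of rows.
    Returns an m x n_cols signature matrix; an all-zero column keeps inf.
    """
    n_rows = len(char_matrix)
    n_cols = len(char_matrix[0])
    if prime < n_rows:
        raise ValueError("prime must be at least the number of rows")
    rng = random.Random(seed)
    coeffs = [(rng.randrange(1, prime), rng.randrange(0, prime))
              for _ in range(m)]
    sig = [[float("inf")] * n_cols for _ in range(m)]
    for row in range(n_rows):
        hashes = [(a * row + c) % prime for a, c in coeffs]
        for col in range(n_cols):
            if char_matrix[row][col] == 1:
                for k in range(m):
                    # Keep the smallest hash value seen for this column.
                    if hashes[k] < sig[k][col]:
                        sig[k][col] = hashes[k]
    return sig

# Toy matrix with 5 rows (5 is prime); columns 0 and 1 are identical.
char_matrix = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
]
signature_matrix = fast_minhash(char_matrix, m=6, prime=5)
```

The matrix is scanned once, row by row, so no explicit permutation of the rows is ever materialized.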
LSH in Adaptive Diversification:
LSH Buckets
• Take r-sized signature vectors
  • From the m-sized minhash signature
• Map them into
  • L hash tables
  • Each with an arbitrary number b of buckets
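The bucket mapping can be sketched as below. The parameters r, L and b follow the slide; the choice of consecutive, non-overlapping slices of the signature and the use of Python's built-in `hash` as the bucket function are assumptions made for illustration.

```python
def build_hash_tables(signatures, r, L, b):
    """Map each m-sized signature into L hash tables with b buckets each.

    signatures: dict pub_id -> list of m minhash values, with m >= r * L.
    Table t hashes the r-sized slice sig[t*r:(t+1)*r] (assumed slicing).
    Returns a list of L dicts: bucket_index -> list of pub_ids.
    """
    tables = [dict() for _ in range(L)]
    for pub_id, sig in signatures.items():
        if len(sig) < r * L:
            raise ValueError("signature too short for r * L values")
        for t in range(L):
            chunk = tuple(sig[t * r:(t + 1) * r])
            bucket = hash((t, chunk)) % b
            tables[t].setdefault(bucket, []).append(pub_id)
    return tables

# Hypothetical signatures: A and B agree on the first slice, A and C on the second.
signatures = {
    "A": [1, 2, 3, 4],
    "B": [1, 2, 9, 9],
    "C": [7, 7, 3, 4],
}
tables = build_hash_tables(signatures, r=2, L=2, b=16)
```

Publications that agree on a whole slice always land in the same bucket of that table; publications that disagree may still collide by chance because b is finite.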
LSH in Adaptive Diversification:
Batch-wise Top-k computation
• Bucket "winner": the publication with the highest relevancy score in the bucket
• The winner is dominant and represents its bucket neighborhood
• The Top-k are the "winners" that have a majority of votes
• The k winners are independent
[Figure: publications PA, PB, PC, PD, PE, PF, PG, PH, ... in the i-th window]
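The winner-and-vote step can be sketched as follows, assuming each hash table is a dict from bucket index to the publications in that bucket. The tie-breaking rules (higher relevance first, then identifier order) are assumptions, and the sketch does not itself verify that the returned winners are independent.

```python
from collections import Counter

def batch_top_k(tables, relevance, k):
    """Pick each bucket's winner, count one vote per bucket, return the top k.

    tables: list of dicts, bucket_index -> list of pub_ids.
    relevance: dict pub_id -> relevancy score.
    """
    votes = Counter()
    for table in tables:
        for members in table.values():
            # Winner: highest relevance; ties go to the smaller identifier.
            winner = min(members, key=lambda p: (-relevance[p], p))
            votes[winner] += 1
    ranked = sorted(votes, key=lambda p: (-votes[p], -relevance[p], p))
    return ranked[:k]

# Hypothetical window of three publications indexed in two tables.
tables = [
    {0: ["A", "B"], 1: ["C"]},
    {0: ["A", "C"], 1: ["B"]},
]
relevance = {"A": 0.9, "B": 0.5, "C": 0.7}
print(batch_top_k(tables, relevance, k=2))  # ['A', 'C']
```

Here A wins two buckets, while B and C win one each, so A ranks first and the tie between B and C is broken by relevance.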
LSH in Dynamic Diversification:
Incremental Top-k computation
1. New publication i arrives
2. Update the i-th characteristic vector (Characteristic Matrix)
3. Generate the i-th minhash signature (Signature Matrix)
4. Map the i-th signature into the L hash tables
5. Update the "winner" at the bucket the i-th signature maps into
6. Vote for the Top-k candidate
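The incremental flow can be sketched as a small class that touches only the buckets a new publication maps into. This is a hedged illustration, not the authors' index: the class and method names are hypothetical, the hash family (a·x + c) mod p and the slicing of the signature are assumptions, and window expiry and the durability and order requirements are deliberately left out.

```python
from collections import Counter

class IncrementalIndex:
    """Keeps one winner per (table, bucket) and updates it per arrival."""

    def __init__(self, coeffs, prime, r, L, b):
        if len(coeffs) < r * L:
            raise ValueError("need at least r * L hash functions")
        self.coeffs, self.prime = coeffs, prime
        self.r, self.L, self.b = r, L, b
        self.winners = [dict() for _ in range(L)]  # bucket -> (relevance, pub_id)

    def signature(self, attribute_rows):
        # Minhash of one publication: minimum hash over its attribute rows.
        return [min((a * row + c) % self.prime for row in attribute_rows)
                for a, c in self.coeffs]

    def insert(self, pub_id, attribute_rows, relevance):
        """Index one new publication; return the (table, bucket) pairs touched."""
        sig = self.signature(attribute_rows)
        touched = []
        for t in range(self.L):
            chunk = tuple(sig[t * self.r:(t + 1) * self.r])
            bucket = hash((t, chunk)) % self.b
            current = self.winners[t].get(bucket)
            if current is None or relevance > current[0]:
                self.winners[t][bucket] = (relevance, pub_id)
            touched.append((t, bucket))
        return touched

    def top_k(self, k):
        """Rank current bucket winners by votes, then relevance, then id."""
        votes, best = Counter(), {}
        for table in self.winners:
            for relevance, pub_id in table.values():
                votes[pub_id] += 1
                best[pub_id] = relevance
        ranked = sorted(votes, key=lambda p: (-votes[p], -best[p], p))
        return ranked[:k]

index = IncrementalIndex(coeffs=[(1, 0), (2, 1), (3, 2), (4, 3)],
                         prime=5, r=2, L=2, b=8)
touched_a = index.insert("A", {0, 2}, relevance=0.4)
touched_b = index.insert("B", {0, 2}, relevance=0.9)
```

Since A and B carry the same attribute rows, they map to the same buckets, and the more relevant B replaces A as the winner without any neighborhood being recomputed.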
LSH in Dynamic Diversification:
When new publication F arrives…
• Only buckets $B_{13}$, $B_{23}$, $B_{32}$, $B_{43}$ will vote
• Follow the continuity requirements
  • Durability
  • Order
[Figure: publications PA, PB, PC, PD, PE, PF, PG, PH, ... with the i-th and (i+1)-th windows]

LSH in Adaptive Diversification:
Analysis
For two vectors x, y:
$JD(x, y) = 1 - JSIM(x, y)$, where $JSIM(x, y) = \dfrac{|x \cap y|}{|x \cup y|}$
• For publications x and y:
  $JSIM(x, y) \propto \mathrm{Prob}[H(x) = H(y)]$
• At a particular hash table
  • x and y map into the same bucket: $JSIM(x, y)^b$
  • x and y do not map into the same bucket: $1 - JSIM(x, y)^b$
• Across L hash tables
  • x and y do not map into the same bucket in any table: $(1 - JSIM(x, y)^b)^L$
  • x and y map into the same bucket in at least one table: $1 - (1 - JSIM(x, y)^b)^L$
True near neighbors are unlikely to be unlucky in all the projections.
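A small numeric sketch makes the effect of these expressions concrete. The symbols b and L are used exactly as in the formulas on this slide (the exponent b is taken to be the number of signature values compared per table, which is an interpretation); the example similarities 0.8 and 0.2 and the parameter values are hypothetical.

```python
def same_bucket_one_table(jsim, b):
    """Probability that x and y share a bucket in one table: JSIM^b."""
    return jsim ** b

def share_some_bucket(jsim, b, L):
    """Probability that x and y share a bucket in at least one of L tables."""
    return 1.0 - (1.0 - jsim ** b) ** L

# Compare a similar pair and a dissimilar pair under b = 3, L = 4.
for jsim in (0.8, 0.2):
    print(jsim,
          round(same_bucket_one_table(jsim, 3), 4),
          round(share_some_bucket(jsim, 3, 4), 4))
```

Raising b suppresses collisions between dissimilar publications, while raising L gives similar publications more chances to meet, which is the sense in which true near neighbors are unlikely to be unlucky in every table.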
Evaluation:
Dataset
Publication Stream
Amazon online marketplace data available on 17th – 19th November 2014
• Zipfian subscriptions
• Normalized preferences
$\mathrm{zipf}(k; s, N) = \dfrac{1/k^s}{\sum_{n=1}^{N} (1/n^s)}$
N - number of elements in the distribution
k - rank of the element
s - value of the exponent
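The Zipf probability above translates directly into code. This is a minimal sketch with a hypothetical function name; the parameter names k, s and N follow the legend.

```python
def zipf_pmf(k, s, N):
    """Zipf probability of the element with rank k among N elements."""
    if not 1 <= k <= N:
        raise ValueError("rank k must satisfy 1 <= k <= N")
    normalizer = sum(1.0 / n ** s for n in range(1, N + 1))
    return (1.0 / k ** s) / normalizer

# Probabilities over all ranks sum to 1; lower ranks are more likely.
probabilities = [zipf_pmf(k, s=1.0, N=10) for k in range(1, 11)]
print(round(sum(probabilities), 6))
```

With s = 1 and N = 2 the two ranks receive probabilities 2/3 and 1/3, which is a quick sanity check on the normalization.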
Total number of subscriber views $= \sum_{i=2}^{32} \left( {}^{48}c_i + {}^{42}c_i + {}^{54}c_i + {}^{66}c_i + {}^{57}c_i + {}^{67}c_i \right)$
Terminology
ILSH, BLSH and NAÏVE
[Figure: publication stream P1, P2, ..., P8, ... annotated with "BLSH or NAÏVE" four times and "ILSH" once]
Accuracy:
ILSH vs. NAÏVE
Probability of producing optimal diverse set of results by ILSH under Jaccard similarity threshold (s)
Performance & Efficiency:
ILSH vs. BLSH vs. NAÏVE
log (Top-k matching time) on number of publications with D=500
Conclusions
• Locality Sensitive Hashing (LSH) indexing method
  • Produces a diverse set of results at an average of 70% accuracy relative to the naïve method
  • Reduces the matching time very significantly compared with the NAÏVE method
• Further refined by its incremental version
  • For handling streaming publications
  • Avoids the curse of re-computing neighborhoods
• Top-k to restrict delivery to the top publications
  • Given a window size and delivery method
• The model can produce the best diverse set of personalized results
  • To represent the set of all matching publications at a given instance
Future work
• Explore other suitable use cases for the proposed model and develop prototype applications, e.g.
  • Personalized newspaper for every Facebook user
  • Adaptive resource scheduling in large-scale distributed systems
• Exploit overlap among the diversified results of users who have similar interests
• Develop an LSH-based index for a multi-threaded distributed environment
Q&A
THANK YOU!

Weitere ähnliche Inhalte

Ähnlich wie [ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation

[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...
[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...
[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...Sameera Horawalavithana
 
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdfQuark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdfPo-Chuan Chen
 
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdfQuark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdfPo-Chuan Chen
 
Building graphs to discover information by David Martínez at Big Data Spain 2015
Building graphs to discover information by David Martínez at Big Data Spain 2015Building graphs to discover information by David Martínez at Big Data Spain 2015
Building graphs to discover information by David Martínez at Big Data Spain 2015Big Data Spain
 
Group713_ProgressDraft
Group713_ProgressDraftGroup713_ProgressDraft
Group713_ProgressDraftSarp Uzel
 
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...Sri Ambati
 
ensembles_emptytemplate_v2
ensembles_emptytemplate_v2ensembles_emptytemplate_v2
ensembles_emptytemplate_v2Shrayes Ramesh
 
Scalable Recommendation Algorithms with LSH
Scalable Recommendation Algorithms with LSHScalable Recommendation Algorithms with LSH
Scalable Recommendation Algorithms with LSHMaruf Aytekin
 
Deep learning Unit1 BasicsAllllllll.pptx
Deep learning Unit1 BasicsAllllllll.pptxDeep learning Unit1 BasicsAllllllll.pptx
Deep learning Unit1 BasicsAllllllll.pptxFreefireGarena30
 
The Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore EnvironmentsThe Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore EnvironmentsUniversity of Washington
 
NICE Implementations of Variational Inference
NICE Implementations of Variational Inference NICE Implementations of Variational Inference
NICE Implementations of Variational Inference Natan Katz
 
NICE Research -Variational inference project
NICE Research -Variational inference projectNICE Research -Variational inference project
NICE Research -Variational inference projectNatan Katz
 
Introduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and AlgorithmsIntroduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and AlgorithmsNBER
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...eXascale Infolab
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You NeedDaiki Tanaka
 
The data, they are a-changin’
The data, they are a-changin’The data, they are a-changin’
The data, they are a-changin’ Paolo Missier
 

Ähnlich wie [ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation (20)

[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...
[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...
[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...
 
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdfQuark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
 
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdfQuark: Controllable Text Generation with Reinforced [Un]learning.pdf
Quark: Controllable Text Generation with Reinforced [Un]learning.pdf
 
Building graphs to discover information by David Martínez at Big Data Spain 2015
Building graphs to discover information by David Martínez at Big Data Spain 2015Building graphs to discover information by David Martínez at Big Data Spain 2015
Building graphs to discover information by David Martínez at Big Data Spain 2015
 
Group713_ProgressDraft
Group713_ProgressDraftGroup713_ProgressDraft
Group713_ProgressDraft
 
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
Achieving Algorithmic Transparency with Shapley Additive Explanations (H2O Lo...
 
ensembles_emptytemplate_v2
ensembles_emptytemplate_v2ensembles_emptytemplate_v2
ensembles_emptytemplate_v2
 
Scalable Recommendation Algorithms with LSH
Scalable Recommendation Algorithms with LSHScalable Recommendation Algorithms with LSH
Scalable Recommendation Algorithms with LSH
 
Deep learning Unit1 BasicsAllllllll.pptx
Deep learning Unit1 BasicsAllllllll.pptxDeep learning Unit1 BasicsAllllllll.pptx
Deep learning Unit1 BasicsAllllllll.pptx
 
The Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore EnvironmentsThe Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore Environments
 
NICE Implementations of Variational Inference
NICE Implementations of Variational Inference NICE Implementations of Variational Inference
NICE Implementations of Variational Inference
 
NICE Research -Variational inference project
NICE Research -Variational inference projectNICE Research -Variational inference project
NICE Research -Variational inference project
 
Introduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and AlgorithmsIntroduction to Supervised ML Concepts and Algorithms
Introduction to Supervised ML Concepts and Algorithms
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
 
nber_slides.pdf
nber_slides.pdfnber_slides.pdf
nber_slides.pdf
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need
 
Stack and Hash Table
Stack and Hash TableStack and Hash Table
Stack and Hash Table
 
The data, they are a-changin’
The data, they are a-changin’The data, they are a-changin’
The data, they are a-changin’
 
key.net
key.netkey.net
key.net
 

Mehr von Sameera Horawalavithana

Data-driven Studies on Social Networks: Privacy and Simulation
Data-driven Studies on Social Networks: Privacy and SimulationData-driven Studies on Social Networks: Privacy and Simulation
Data-driven Studies on Social Networks: Privacy and SimulationSameera Horawalavithana
 
Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis
 Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis
Drivers of Polarized Discussions on Twitter during Venezuela Political CrisisSameera Horawalavithana
 
Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
 Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
Twitter Is the Megaphone of Cross-platform Messaging on the White HelmetsSameera Horawalavithana
 
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...Sameera Horawalavithana
 
Mentions of Security Vulnerabilities on Reddit, Twitter and GitHub
Mentions of Security Vulnerabilities on Reddit, Twitter and GitHubMentions of Security Vulnerabilities on Reddit, Twitter and GitHub
Mentions of Security Vulnerabilities on Reddit, Twitter and GitHubSameera Horawalavithana
 
[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...
[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...
[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...Sameera Horawalavithana
 
[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...
[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...
[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...Sameera Horawalavithana
 
Be Elastic: Leapset Innovation session 06-08-2015
Be Elastic: Leapset Innovation session 06-08-2015Be Elastic: Leapset Innovation session 06-08-2015
Be Elastic: Leapset Innovation session 06-08-2015Sameera Horawalavithana
 
[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...
[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...
[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...Sameera Horawalavithana
 
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand StreamingTalk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand StreamingSameera Horawalavithana
 

Mehr von Sameera Horawalavithana (16)

Data-driven Studies on Social Networks: Privacy and Simulation
Data-driven Studies on Social Networks: Privacy and SimulationData-driven Studies on Social Networks: Privacy and Simulation
Data-driven Studies on Social Networks: Privacy and Simulation
 
Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis
 Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis
Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis
 
Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
 Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
 
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
 
Mentions of Security Vulnerabilities on Reddit, Twitter and GitHub
Mentions of Security Vulnerabilities on Reddit, Twitter and GitHubMentions of Security Vulnerabilities on Reddit, Twitter and GitHub
Mentions of Security Vulnerabilities on Reddit, Twitter and GitHub
 
[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...
[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...
[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...
 
[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...
[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...
[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...
 
Duplicate Detection on Hoaxy Dataset
Duplicate Detection on Hoaxy DatasetDuplicate Detection on Hoaxy Dataset
Duplicate Detection on Hoaxy Dataset
 
Dancing with Stream Processing
Dancing with Stream ProcessingDancing with Stream Processing
Dancing with Stream Processing
 
Be Elastic: Leapset Innovation session 06-08-2015
Be Elastic: Leapset Innovation session 06-08-2015Be Elastic: Leapset Innovation session 06-08-2015
Be Elastic: Leapset Innovation session 06-08-2015
 
[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...
[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...
[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...
 
Locality sensitive hashing
Locality sensitive hashingLocality sensitive hashing
Locality sensitive hashing
 
Zipf distribution
Zipf distributionZipf distribution
Zipf distribution
 
Query personalization
Query personalizationQuery personalization
Query personalization
 
Dancing with publish/subscribe
Dancing with publish/subscribeDancing with publish/subscribe
Dancing with publish/subscribe
 
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand StreamingTalk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
 

Kürzlich hochgeladen

RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 

Kürzlich hochgeladen (20)

RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 

[ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation

  • 1. An Efficient incremental indexing mechanism for extracting Top-k representative queries over continuous data streams. Y.S. Horawalavithana, D.N. Ranasinghe (University of Colombo School of Computing, Sri Lanka). Adaptive and Reflective Middleware (ARM), ACM/IFIP/USENIX Middleware, Vancouver, BC, Canada, December 08, 2015.
  • 2. Overview: Motivation; Adaptive Diversification; Incremental Top-k; Evaluation; Conclusion; Future work.
  • 3. 3
  • 4. Diversity: Top-k representative set. A method to retrieve Top-k publications from matching publications. [Figure: representative Top-k; drawback (without diversity) vs. what we want (with diversity).] Section bar: 1. Motivation 2. Adaptive Diversification 3. Incremental Top-k 4. Evaluation 5. Conclusion 6. Future Work
  • 5. Minimum independent-dominating set. Publications p1 to p5 in the publication space map to vertices v1 to v5 in the graph model, with distance threshold α: $\mathrm{Neighborhood}(p_i) = \{\, p_j \mid d(p_i, p_j) \le \alpha,\ p_i \ne p_j \,\}$. [Figure: four example vertex selections; three labeled "independent, dominating" and one labeled "dominating, not independent".]
  • 6. NAÏVE Greedy: $\arg\max_{p_i} \dfrac{r(p_i)^2}{\sum_{p_j \in N(p_i)} r(p_j) \times d(p_i, p_j)}$
  • 7. Handling streaming publications. [Figure: publications p1 to p5 with vertices v1 to v5 and threshold α; a new publication p6 adds vertex v6.] Continuity requirements: (1) Durability: an item selected as diversified in the i-th window may still be selected in the (i+1)-th window if it has not expired and the other valid items in the (i+1)-th window fail to compete with it. (2) Order: the publication stream follows chronological order; we avoid selecting an item j as diverse later when we have already selected an item i that is not older than j.
  • 8. Adaptive Diversification. [Figure: matching publication stream P1, P2, P3, P4, ..., Pj, Pj+1, ...; the i-th window yields the diverse set $S_i^*$ and the (i+1)-th window yields $S_{i+1}^*$.] Requirements: independence, dominance, durability, order. Straightforward solution: apply the naïve greedy method at each instance. Proposal: an incremental index mechanism that avoids the curse of re-calculating neighborhoods.
  • 9. Locality Sensitive Hashing (LSH). Simple idea: if two points are close together, then after a "projection" operation these two points will remain close together.
  • 10. LSH in Adaptive Diversification: Publications as categorical data
  • 11. LSH in Adaptive Diversification: Characteristic Matrix
  • 12. LSH in Adaptive Diversification: Minhashing. Publications are no longer handled directly; a signature represents each one. Technique: randomly permute the rows of the characteristic matrix m times; for each permutation, take the number of the first row, in permuted order, in which the publication's column has a 1. [Figure: first permutation of rows of the characteristic matrix.] Advantage: reduces the dimensions to a small minhash signature.
  • 13. LSH in Adaptive Diversification: Signature Matrix. Fast minhashing: select m random hash functions to model the effect of m random permutations. This is mathematically proved only when the number of rows is a prime.
  • 14. LSH in Adaptive Diversification: LSH Buckets. Take r-sized signature vectors from the m-sized minhash signature and map them into L hash tables, each with an arbitrary number b of buckets.
  • 15. LSH in Adaptive Diversification: Batch-wise Top-k computation. Bucket "winner": the publication with the highest relevancy score in the bucket. The winner is dominant and represents its bucket neighborhood. The Top-k are the "winners" that have a majority of votes; the k winners are independent. [Figure: publications PA to PH in the i-th window.]
  • 16. LSH in Dynamic Diversification: Incremental Top-k computation. For a new publication i: (1) update the i-th characteristic vector in the characteristic matrix; (2) generate the i-th minhash signature in the signature matrix; (3) map the i-th signature into L hash tables; (4) update the "winner" at each bucket the i-th signature maps into; (5) vote for the Top-k candidates.
  • 17. LSH in Dynamic Diversification: when new publication F arrives, only buckets B13, B23, B32 and B43 will vote. The continuity requirements (durability, order) are followed. [Figure: publications PA to PH across the i-th and (i+1)-th windows.]
  • 18. LSH in Adaptive Diversification: Analysis. For two vectors x, y: $JD(x, y) = 1 - JSIM(x, y)$, where $JSIM(x, y) = \dfrac{|x \cap y|}{|x \cup y|}$. For publications x and y: $JSIM(x, y) \propto \mathrm{Prob}[H(x) = H(y)]$. At a particular hash table, x and y map into the same bucket with probability $JSIM(x, y)^b$ and do not map into the same bucket with probability $1 - JSIM(x, y)^b$. Across L hash tables, x and y never map into the same bucket with probability $(1 - JSIM(x, y)^b)^L$, so they share at least one bucket with probability $1 - (1 - JSIM(x, y)^b)^L$. True near neighbors are unlikely to be unlucky in all the projections.
  • 19. Evaluation: Dataset. Amazon online marketplace data available on 17th to 19th November 2014. Components: publication stream, Zipfian subscriptions, normalized preferences. $\mathrm{zipf}(k; s, N) = \dfrac{1/k^s}{\sum_{n=1}^{N} 1/n^s}$, where N is the number of elements in the distribution, k is the rank of the element and s is the value of the exponent. Total number of subscriber views $= \sum_{i=2}^{32} \left( {}^{48}C_i + {}^{42}C_i + {}^{54}C_i + {}^{66}C_i + {}^{57}C_i + {}^{67}C_i \right)$.
  • 20. Terminology: ILSH, BLSH and NAÏVE. [Figure: publication stream P1 to P8 with four spans labeled "BLSH or NAÏVE" and one span labeled "ILSH".]
  • 21. Accuracy: ILSH vs. NAÏVE. [Figure: probability of producing the optimal diverse set of results by ILSH under Jaccard similarity threshold (s).]
  • 22. Performance & Efficiency: ILSH vs. BLSH vs. NAÏVE. [Figure: log(Top-k matching time) against the number of publications, with D=500.]
  • 23. Conclusions. The Locality Sensitive Hashing (LSH) indexing method produces a diverse set of results at 70% average accuracy relative to the naïve method and reduces the matching time very significantly over the NAÏVE method. Its incremental version refines this further for handling streaming publications and avoids the curse of re-computing neighborhoods. Top-k restricts delivery to the top publications. Given a window size and a delivery method, the model can produce the best diverse set of personalized results to represent the set of all matching publications at a given instance.
  • 24. Future work. Explore other suitable use cases for the proposed model and develop prototype applications, e.g. a personalized newspaper for every Facebook user, or adaptive resource scheduling in a large-scale distributed system. Exploit the overlap among the diversified results of users who have similar interests. Develop an LSH-based index over a multi-threaded distributed environment.
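
The minhashing steps on slides 12 and 13 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the hash family h(x) = (a*x + b) mod prime, the function names, the seed and the two sample publications are all assumptions chosen for illustration. With a prime number of rows, each such function is a permutation of the row indices, which is the role the slides assign to the m random hash functions.

```python
import random

def make_hash_functions(m, prime, seed=0):
    """Build m functions h(x) = (a*x + b) % prime (assumed hash family).

    With a in 1..prime-1 and prime rows, each function permutes the row
    indices 0..prime-1, standing in for one random row permutation.
    """
    rng = random.Random(seed)
    params = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(m)]
    return [lambda x, a=a, b=b: (a * x + b) % prime for a, b in params]

def minhash_signature(rows_with_one, hash_functions):
    """For each hash function, keep the smallest hashed index among the
    rows in which the publication's column has a 1."""
    return [min(h(row) for row in rows_with_one) for h in hash_functions]

# Hypothetical publications, each given as the set of row indices holding a 1
# in its column of a characteristic matrix with 101 rows (101 is prime).
pub_a = {0, 2, 3, 7}
pub_b = {0, 2, 3, 8}

hashes = make_hash_functions(m=50, prime=101, seed=42)
sig_a = minhash_signature(pub_a, hashes)
sig_b = minhash_signature(pub_b, hashes)

# The fraction of matching signature entries is a rough estimate of the
# Jaccard similarity of the two publications (3/5 for these sets).
estimate = sum(s == t for s, t in zip(sig_a, sig_b)) / len(sig_a)
print(estimate)
```

Each publication is reduced from a 101-row column to a 50-entry signature; a larger m tightens the estimate at the cost of a longer signature.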
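
Slides 14 and 15 describe mapping signatures into hash tables and electing bucket winners. The sketch below is one possible reading, under stated assumptions: the signature is cut into consecutive r-sized vectors (so L = m // r tables), Python's built-in `hash` on a tuple stands in for the bucket hash, and a publication's votes are counted as the number of buckets it wins. The function names and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def band_keys(signature, r, num_buckets):
    """Cut the signature into consecutive r-sized vectors and return one
    bucket id per hash table (assumption: L = len(signature) // r)."""
    bands = [tuple(signature[i:i + r]) for i in range(0, len(signature) - r + 1, r)]
    return [hash(band) % num_buckets for band in bands]

def top_k_winners(signatures, relevance, r, num_buckets, k):
    """Elect the most relevant publication in every bucket, then rank the
    winners by how many buckets voted for them (ties broken by relevance)."""
    buckets = defaultdict(list)  # (table index, bucket id) -> publication ids
    for pub, sig in signatures.items():
        for table, bucket in enumerate(band_keys(sig, r, num_buckets)):
            buckets[(table, bucket)].append(pub)

    votes = Counter()
    for members in buckets.values():
        winner = max(members, key=lambda p: relevance[p])
        votes[winner] += 1

    ranked = sorted(votes, key=lambda p: (-votes[p], -relevance[p]))
    return ranked[:k]

# Hypothetical window: A and B have identical signatures, C differs.
signatures = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 4], "C": [9, 8, 7, 6]}
relevance = {"A": 0.9, "B": 0.5, "C": 0.7}

top = top_k_winners(signatures, relevance, r=2, num_buckets=1000, k=2)
print(top)
```

Because B always shares its buckets with the more relevant A, B never wins a bucket and is represented by A, which is the dominance behavior slide 15 describes.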
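
The analysis on slide 18 reduces to a single expression that is easy to evaluate. The following sketch computes it directly, using the exponent b and the table count L as they appear on the slide; the function name and the sample values are assumptions for illustration.

```python
def share_bucket_probability(jsim, b, L):
    """Probability that two publications with Jaccard similarity `jsim`
    map into the same bucket in at least one of L hash tables,
    following the slide's expression 1 - (1 - jsim**b)**L."""
    same_bucket_one_table = jsim ** b
    miss_all_tables = (1.0 - same_bucket_one_table) ** L
    return 1.0 - miss_all_tables

# Hypothetical settings: the curve rises steeply as similarity grows.
for jsim in (0.2, 0.5, 0.8):
    print(jsim, round(share_bucket_probability(jsim, b=4, L=10), 4))
```

Raising L increases the chance that near neighbors meet in some table, while raising b makes each table stricter about what counts as near.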
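
The Zipf distribution used for subscriptions on slide 19 can be computed in a few lines. This sketch only evaluates the formula; the function name and the parameter values in the example are assumptions, not the settings used in the evaluation.

```python
def zipf_pmf(k, s, N):
    """Zipf probability of the element with rank k, given exponent s and
    N elements: (1 / k**s) divided by the sum of 1 / n**s for n = 1..N."""
    normalizer = sum(1.0 / n ** s for n in range(1, N + 1))
    return (1.0 / k ** s) / normalizer

# Hypothetical example: probabilities of the 5 ranks for s = 1.
probabilities = [zipf_pmf(k, s=1.0, N=5) for k in range(1, 6)]
print(probabilities)
```

The probabilities decrease with rank and sum to 1, so a few top-ranked elements attract most of the subscriptions.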

Editor's notes

  1. each user gets exposed to more than 1,500 stories each day, but an average user would only get to see about 
  2. Since similar publications tend to map into the same bucket with probability 1 − d, the dominance condition is well served: the "winner", being the most relevant publication in its bucket, covers its neighborhood. Also, two buckets represent two separate neighborhoods, so all "winner" publications are dissimilar from each other by at least distance d. This also satisfies the independence condition.
  3. Discuss the ILSH update cost, which comes from maintaining a large characteristic matrix.
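
The independence and dominance conditions mentioned in note 2 can be checked directly on a small example. The sketch below assumes publications are sets of categorical attributes compared by Jaccard distance, and that two publications are neighbors when their distance is at most a threshold `alpha`; the function names, the threshold and the sample data are hypothetical.

```python
from itertools import combinations

def jaccard_distance(x, y):
    """Jaccard distance between two non-empty attribute sets."""
    return 1.0 - len(x & y) / len(x | y)

def is_independent(selected, items, alpha):
    """True if no two selected publications are neighbors of each other."""
    return all(
        jaccard_distance(items[a], items[b]) > alpha
        for a, b in combinations(selected, 2)
    )

def is_dominating(selected, items, alpha):
    """True if every unselected publication has a selected neighbor."""
    return all(
        any(jaccard_distance(items[p], items[s]) <= alpha for s in selected)
        for p in items
        if p not in selected
    )

# Hypothetical publications: A and B are close, C is far from both.
items = {"A": {1, 2, 3}, "B": {1, 2, 4}, "C": {7, 8, 9}}
alpha = 0.6

print(is_independent(["A", "C"], items, alpha), is_dominating(["A", "C"], items, alpha))
```

Selecting A and C satisfies both conditions here, whereas selecting all three publications is dominating but not independent, and selecting A alone is independent but not dominating.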