test

asapteam

Toward Personalized
Peer-to-Peer Top-k Processing
Xiao BAI
Marin BERTIER
Rachid GUERRAOUI
Anne-Marie KERMARREC

31 March, 2009

Outline
• Background and Motivation
• Personalized Peer-to-Peer Top-k Processing
• Evaluation
• Conclusion

Collaborative tagging and Top-k
• Collaborative tagging systems
  • U(sers) × I(tems) × T(ags)
  • $\mathrm{Tagged}(u, i, t)$: user $u$ annotates item $i$ with tag $t$
• Top-k processing
  • Inverted list per tag, with entries (item $i$, $\mathrm{Score}_{t_j}(i)$)
  • Query $q = \{t_1, \ldots, t_n\}$
  • $\mathrm{Score}(i) = f(\mathrm{Score}_{t_1}(i), \ldots, \mathrm{Score}_{t_n}(i))$
  • The $k$ items with the highest scores are returned as results
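
The data model above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that the per-tag score of an item is the number of users who tagged it with that tag and that the aggregation function f is a plain sum. The names `build_inverted_lists` and `top_k` and the toy data are hypothetical.

```python
from collections import defaultdict

def build_inverted_lists(tagged):
    """Build one inverted list per tag from (user, item, tag) triples.

    Assumption: Score_t(i) = number of users who tagged item i with t.
    Each list is sorted by descending score (ties broken by item id).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for _user, item, tag in set(tagged):
        counts[tag][item] += 1
    return {
        tag: sorted(items.items(), key=lambda e: (-e[1], e[0]))
        for tag, items in counts.items()
    }

def top_k(lists, query, k):
    """Return the k items with the highest aggregated score.

    Assumption: f is a sum over the query tags.
    """
    totals = defaultdict(int)
    for tag in query:
        for item, score in lists.get(tag, []):
            totals[item] += score
    return sorted(totals.items(), key=lambda e: (-e[1], e[0]))[:k]

# Toy data (hypothetical).
tagged = [
    ("alice", "i1", "jazz"),
    ("bob", "i1", "jazz"),
    ("bob", "i2", "jazz"),
    ("carol", "i2", "live"),
    ("alice", "i2", "live"),
]
lists = build_inverted_lists(tagged)
best = top_k(lists, ["jazz", "live"], k=1)
```

This version scans every list completely; it only shows what is being computed, not how to stop early.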

Motivations
• Why collaborative tagging?
  • Flexibility of folksonomy
  • Searching photos, videos, ...
• Why personalization?
  • Variety of user preferences

How to personalize?
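• Network-aware search
  • Interest-based personal network
  • $\mathrm{Network}(u) = \{v_i \mid \mathrm{link}(u, v_i)\}$
  • $\mathrm{link}(u, v)$ iff $\mathrm{Strength}_{\mathrm{link}}(u, v) = |\{i \mid \mathrm{Tagged}(u, i, t) \wedge \mathrm{Tagged}(v, i, t)\}| > \mathrm{threshold}$
• Personalized score
  • $\mathrm{Score}_{t_j}(i, u) = |\mathrm{Network}(u) \cap \{v \mid \mathrm{Tagged}(v, i, t_j)\}|$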
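
The definitions above translate fairly directly into set operations. The sketch below is one possible reading: it assumes the link strength counts the distinct items that both users annotated with a common tag. Function names and the toy data are hypothetical.

```python
def link_strength(tagged, u, v):
    """Number of items that u and v both annotated with the same tag.

    Assumption: an item counts once, however many tags the two users share on it.
    """
    pairs_u = {(i, t) for (w, i, t) in tagged if w == u}
    pairs_v = {(i, t) for (w, i, t) in tagged if w == v}
    return len({i for (i, _t) in pairs_u & pairs_v})

def personal_network(tagged, u, threshold):
    """Network(u): every other user whose link strength with u exceeds the threshold."""
    users = {w for (w, _i, _t) in tagged}
    return {v for v in users if v != u and link_strength(tagged, u, v) > threshold}

def personalized_score(tagged, network, item, tag):
    """Score_t(i, u): number of users in u's network who tagged `item` with `tag`."""
    return len({v for (v, i, t) in tagged if v in network and i == item and t == tag})

# Toy data (hypothetical).
tagged = {
    ("alice", "i1", "jazz"), ("alice", "i2", "live"),
    ("bob", "i1", "jazz"), ("bob", "i2", "live"), ("bob", "i3", "jazz"),
    ("carol", "i1", "jazz"), ("carol", "i3", "rock"),
}
net_alice = personal_network(tagged, "alice", threshold=1)
score = personalized_score(tagged, net_alice, "i1", "jazz")
```

With a threshold of 1, only bob (two shared items) enters alice's network, so carol's annotation of `i1` does not contribute to alice's personalized score.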

Why peer-to-peer?
• Exact
  • Inverted list per (user, tag)
  • Space intensive
• Global UpperBound
  • Inverted list per tag, with entries (item $i$, $\mathrm{UB}(i, t_j)$, $\mathrm{users}(i)$)
  • Time consuming: $\mathrm{Score}_{t_j}(i)$ is computed at query time
• User Clustering
  • Inverted list per (cluster, tag)
  • Tradeoff between Exact and Global UpperBound
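
To make the space argument concrete, the sketch below counts index entries for a per-(user, tag) layout versus a single list per tag on a toy data set. The `tagged` triples, the hand-written `network` map and both function names are illustrative assumptions, not data or code from the slides.

```python
def exact_index_size(tagged, network):
    """Entries needed when every user has their own inverted list per tag.

    Assumption: user u's list for tag t holds one entry per item that
    someone in Network(u) annotated with t.
    """
    entries = set()
    for u, neighbours in network.items():
        for (v, item, tag) in tagged:
            if v in neighbours:
                entries.add((u, tag, item))
    return len(entries)

def per_tag_index_size(tagged):
    """Entries needed with one shared inverted list per tag."""
    return len({(tag, item) for (_v, item, tag) in tagged})

# Toy data (hypothetical): three users, each linked to the other two.
tagged = {
    ("alice", "i1", "jazz"),
    ("bob", "i1", "jazz"),
    ("bob", "i2", "jazz"),
    ("carol", "i2", "live"),
}
network = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
}
exact_entries = exact_index_size(tagged, network)
shared_entries = per_tag_index_size(tagged)
```

Even with three users the per-user layout needs more than twice as many entries as the shared one, and the gap grows with the number of users.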

Scalability & Efficiency
call for
Decentralization

Peer-to-Peer Solution
• Goals
  • Same top-k results as centralized network-aware search
  • Same time consumption as Exact
  • Limited storage per user

Personal network discovery
• Two-layer gossip protocol
  • Top layer: personal network
    • Measure the similarities between users
  • Bottom layer: Random Peer Sampling
    • Keep the overlay connected
    • Feed the top layer with new related peers
• Gossip framework
  • Peer selection
  • Data exchange
  • Data processing
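
The three gossip steps can be sketched as a single simulated round. This is a heavily simplified illustration under stated assumptions: similarity is the number of shared (item, tag) pairs, peers read each other's profiles directly instead of exchanging them over the network, and only the bottom layer initiates exchanges. The `Peer` class, `gossip_round` and all parameters are hypothetical.

```python
import random

class Peer:
    def __init__(self, name, profile, view_size=3, net_size=2):
        self.name = name
        self.profile = set(profile)  # (item, tag) pairs annotated by this user
        self.random_view = set()     # bottom layer: names of random peers
        self.network = {}            # top layer: peer name -> similarity
        self.view_size = view_size
        self.net_size = net_size

def similarity(a, b):
    # Assumption: similarity = number of shared (item, tag) annotations.
    return len(a.profile & b.profile)

def gossip_round(peers, rng, threshold=0):
    for p in peers.values():
        if not p.random_view:
            continue
        # 1. Peer selection: pick a partner from the random view.
        q = peers[rng.choice(sorted(p.random_view))]
        # 2. Data exchange: both sides pool the peers they know about.
        merged = p.random_view | q.random_view | {p.name, q.name}
        for x in (p, q):
            candidates = sorted(merged - {x.name})
            # Bottom layer keeps a bounded random sample of the pool.
            x.random_view = set(rng.sample(candidates, min(x.view_size, len(candidates))))
            # 3. Data processing: the top layer keeps the most similar peers.
            for name in candidates:
                s = similarity(x, peers[name])
                if s > threshold:
                    x.network[name] = s
            best = sorted(x.network, key=lambda n: (-x.network[n], n))[:x.net_size]
            x.network = {n: x.network[n] for n in best}

# Toy overlay (hypothetical): four peers in a ring.
peers = {
    "a": Peer("a", {("i1", "jazz"), ("i2", "live")}),
    "b": Peer("b", {("i1", "jazz"), ("i2", "live"), ("i3", "rock")}),
    "c": Peer("c", {("i3", "rock")}),
    "d": Peer("d", {("i1", "jazz")}),
}
for name, neighbour in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]:
    peers[name].random_view = {neighbour}

rng = random.Random(0)
for _ in range(10):
    gossip_round(peers, rng)
```

The random view keeps supplying new candidates, while the personal network only ever retains peers whose similarity exceeds the threshold.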

Inverted lists & Query processing

• Inverted list per (user, tag)
• Inverted lists built and stored by the concerned user
• $Q = (u, t_1, \ldots, t_n)$
• Query processed locally with NRA

NRA (No Random Access)
• $\mathrm{Score}(i) = \sum_{t_j \in Q} \mathrm{Score}_{t_j}(i)$
• Inverted lists are scanned sequentially in parallel
• Information for each seen item:
  • Score lower bound
  • Score upper bound
• Seen items are sorted on score lower bound
• Termination condition: $\mathrm{lowerbound}(k\text{-th seen item}) \geq \max\{\mathrm{upperbound}(i) \mid i \in \{\text{seen items}\} \setminus \{\text{top-}k\}\}$
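
The scan described above can be sketched as follows. This is an illustrative version, not the authors' code: it assumes each inverted list is already sorted by descending score, and, beyond the condition on the slide, it also checks the bound on items that have not been seen in any list yet (an assumption taken from the usual formulation of NRA). The function name `nra_top_k` and the toy lists are hypothetical.

```python
def nra_top_k(lists, k):
    """Top-k with sequential accesses only.

    `lists` maps each query tag to [(item, score), ...] sorted by
    descending score. Returns (top_k, accesses); reported scores are
    lower bounds and may be partial if the scan stops early.
    """
    tags = list(lists)
    partial = {}  # item -> {tag: score read so far}
    lower = {}
    accesses = 0
    depth = 0
    max_depth = max((len(l) for l in lists.values()), default=0)
    while depth < max_depth:
        # One sequential access per list ("in parallel").
        for t in tags:
            if depth < len(lists[t]):
                item, score = lists[t][depth]
                partial.setdefault(item, {})[t] = score
                accesses += 1
        depth += 1
        # The last score read in a list bounds every entry below it;
        # a fully read list contributes nothing more.
        last = {t: (lists[t][depth - 1][1] if depth < len(lists[t]) else 0.0) for t in tags}
        lower = {i: sum(s.values()) for i, s in partial.items()}
        upper = {
            i: lower[i] + sum(last[t] for t in tags if t not in partial[i])
            for i in partial
        }
        ranked = sorted(partial, key=lambda i: (-lower[i], i))
        top, rest = ranked[:k], ranked[k:]
        if len(top) == k:
            unseen_bound = sum(last.values())  # assumption: also bound unseen items
            if lower[top[-1]] >= max([upper[i] for i in rest] + [unseen_bound]):
                break
    ranked = sorted(partial, key=lambda i: (-lower[i], i))
    return [(i, lower[i]) for i in ranked[:k]], accesses

# Toy lists (hypothetical) for a two-tag query.
lists = {
    "jazz": [("a", 5), ("b", 4), ("c", 1)],
    "live": [("b", 5), ("a", 2), ("d", 1)],
}
results, accesses = nra_top_k(lists, k=1)
```

On this input the scan stops after four of the six entries, which is the kind of saving the number of sequential accesses is meant to capture.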

Personal network optimization
• At most $n$ users per personal network
  • Derived from all users meeting the threshold
• Strategies
  • Random
  • Biased Random: $p_{\mathrm{link}}(u, v) \propto \mathrm{Strength}_{\mathrm{link}}(u, v)$
  • Nearest: highest $\mathrm{Strength}_{\mathrm{link}}(u, v)$
  • Nearest With Enhanced Link Strength
    • $\mathrm{Strength}_{\mathrm{link}}(u, v) = |\{(i, t_j) \mid \mathrm{Tagged}(u, i, t_j) \wedge \mathrm{Tagged}(v, i, t_j)\}|$
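
The selection strategies above can be sketched as small functions over a map from candidate user to link strength. The function names are hypothetical, and the biased variant is one plausible reading of "probability proportional to strength": it draws candidates one at a time without replacement.

```python
import random

def select_random(candidates, n, rng):
    """Random: n candidates drawn uniformly."""
    users = sorted(candidates)
    return set(rng.sample(users, min(n, len(users))))

def select_biased_random(candidates, n, rng):
    """Biased Random: draw without replacement, weighted by link strength."""
    pool = dict(candidates)
    chosen = set()
    while pool and len(chosen) < n:
        users = sorted(pool)
        pick = rng.choices(users, weights=[pool[u] for u in users], k=1)[0]
        chosen.add(pick)
        del pool[pick]
    return chosen

def select_nearest(candidates, n):
    """Nearest: the n candidates with the highest link strength."""
    return set(sorted(candidates, key=lambda v: (-candidates[v], v))[:n])

def enhanced_strength(tagged, u, v):
    """Enhanced link strength: number of shared (item, tag) pairs."""
    pairs_u = {(i, t) for (w, i, t) in tagged if w == u}
    pairs_v = {(i, t) for (w, i, t) in tagged if w == v}
    return len(pairs_u & pairs_v)

# Toy candidates (hypothetical): users already above the threshold.
candidates = {"v1": 5, "v2": 3, "v3": 9}
rng = random.Random(0)
nearest = select_nearest(candidates, 2)
uniform = select_random(candidates, 2, rng)
biased = select_biased_random(candidates, 2, rng)
```

The enhanced strength differs from the basic one only in what it counts: two users who share several tags on the same item get credit for each shared tag.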

Evaluation
• Implementation
  • PeerSim
• Data
  • Real trace from del.icio.us
  • 10,000 users
  • 101,144 items
  • 31,899 tags
  • 9,536,635 tagging actions

Evaluation metrics
• Convergence speed
  • $\mathrm{Speed} = \frac{1}{|U|} \sum_{u \in U} \frac{|\text{Current Network}(u)|}{|\text{Network}(u) \text{ of Centralized Setting}|}$
• Top-k quality: Recall
  • $R_k = \frac{\text{Number of Retrieved Relevant Items}}{\text{Total Number of Relevant Items}}$
• Storage space per user
  • Length of inverted lists
  • Length of neighbors' profiles
• Processing time
  • Number of sequential accesses
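
The first two metrics can be computed as in the sketch below. It follows the formulas as written on the slide; skipping users whose reference network is empty is an added assumption to avoid dividing by zero, and the function names and toy values are hypothetical.

```python
def convergence_speed(current, reference):
    """Average ratio between current and centralized network sizes.

    `current` and `reference` map each user to a set of neighbours.
    Assumption: users with an empty reference network are skipped.
    """
    ratios = [
        len(current.get(u, set())) / len(ref)
        for u, ref in reference.items()
        if ref
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0

def recall_at_k(retrieved, relevant):
    """Fraction of the relevant items that appear in the retrieved top-k."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)

# Toy values (hypothetical).
current = {"u1": {"a"}, "u2": {"a", "b"}}
reference = {"u1": {"a", "b"}, "u2": {"a", "b"}}
speed = convergence_speed(current, reference)
recall = recall_at_k(["x", "y", "z"], {"x", "z", "w", "q"})
```

Here `speed` is 0.75 (u1 has found half of its reference network, u2 all of it) and `recall` is 0.5.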

Storage space

 Inverted lists  Profiles

Conclusion
• Decentralization is the right way to provide scalable personalized top-k processing
• Future work
  • Churn
  • Privacy

Result slides (plots only)
• Convergence speed
• Recall
• Processing time
• Comparison of optimization
• Scalability
