
Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media


Workshop paper from the Modeling Social Media 2013 workshop at the Hypertext 2013 conference, presented in Paris, France, on May 1, 2013.



  1. Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media
     Charalampos Chelmis, Viktor K. Prasanna
     chelmis@usc.edu
     MSM 2013, Paris, France
  2. Overview
     • Introduction
     • Structure of Tripartite Graphs
     • Generative Models of Tripartite Graphs
     • Social Link Classification Schemes
     • Evaluation
     • Conclusion
  3. Introduction
     • Social networking is used for
        − Content organization
        − Content sharing
     • Multiple media types
     • Users' activities
        − Reveal interests and tastes
        − Hidden structure
     • Description of resources
        − Text
        − Tags / hashtags
     • Social annotation
        − Collective characterization of resources
        − Use of synonyms for similar resources
        − Same keywords for different resources
  4. Research Questions
     • How to address issues of synonymy and polysemy?
        − Deal with the explosion of the space size
     • How to discover emergent structure in online tagging systems?
        − Hidden topics
     • How to capture users' latent interests?
        − Which subjects is a user mostly interested in?
        − Which users have similar interests?
     • How to model the process of social generation of annotations?
        − How to capture the semantics of collaboration?
     • Why is this useful?
        − Recommend people
        − Recommend tags / resources
        − Clustering
        − ...
  5. Structure of Tripartite Graphs
     • Set of actors (e.g. users) A = {a1, ..., ak}
     • Set of concepts (e.g. tags) C = {c1, ..., cl}
     • Set of resources (e.g. photos) R = {r1, ..., rm}
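The tripartite structure above can be sketched as a set of (actor, concept, resource) annotation triples, from which the bipartite reductions of the next slide fall out as projections. All names and data below are illustrative toy examples, not from the paper:

```python
from collections import defaultdict

# An annotation is a triple (actor, concept, resource), e.g. (user, tag, photo).
# The tripartite graph is the set of all such triples (toy data).
annotations = [
    ("a1", "c1", "r1"),
    ("a1", "c2", "r1"),
    ("a2", "c1", "r2"),
]

def project_user_concept(triples):
    """Bipartite User-Concept projection: how often each actor used each concept."""
    counts = defaultdict(int)
    for actor, concept, _resource in triples:
        counts[(actor, concept)] += 1
    return dict(counts)

def project_user_resource(triples):
    """Bipartite User-Resource projection: which resources each actor annotated."""
    edges = defaultdict(set)
    for actor, _concept, resource in triples:
        edges[actor].add(resource)
    return dict(edges)
```

Either projection discards one of the three node sets, which is exactly the information loss the User-Concept and User-Resource models trade for simplicity.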
  6. Reducing the Tripartite Graph to Bipartite Structures
     • The User-Concept model
        − Users are modeled based on their tag usage
        − φ denotes the matrix of topic distributions
           · multinomial distribution over N concepts
           · T topics drawn independently
        − θ: the matrix of user-specific mixture weights for these T topics
        − Captures users' latent interests
        − Ignores resources
        − Ignores the social aspect of tagging
     • The User-Resource model
        − Resources become vocabulary terms
        − Tags are ignored
        − Ignores the social aspect of tagging
  7. The User-Resource-Concept Model
     • Topic-based representation
     • Models both resources and users' interests
     • Multiple users may annotate resource r
     • For each tag, a user is chosen uniformly at random
     • Each user is associated with a distribution over latent topics θ
     • A topic is chosen from a distribution over topics specific to that user
     • The tag is generated from the chosen topic
        − φt: probability distribution of tags for topic t
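The generative process on this slide can be sketched as straightforward ancestral sampling. The toy parameters (the θ and φ values, user names, and tag vocabulary) are made up for illustration:

```python
import random

random.seed(0)

users = ["u1", "u2"]
topics = [0, 1]
tags = ["rock", "jazz", "pop"]

# theta[u][t]: user-specific mixture weights over latent topics (toy values)
theta = {"u1": [0.9, 0.1], "u2": [0.2, 0.8]}
# phi[t][w]: per-topic multinomial over tags (toy values)
phi = [[0.7, 0.1, 0.2], [0.1, 0.8, 0.1]]

def generate_tag(annotators):
    """One step of the URC generative process for a resource:
    pick an annotator uniformly at random, draw a topic from that
    user's theta, then draw the tag from the topic's phi."""
    u = random.choice(annotators)
    t = random.choices(topics, weights=theta[u])[0]
    w = random.choices(tags, weights=phi[t])[0]
    return u, w

user, tag = generate_tag(users)
```

In the actual model these parameters would be inferred from the annotation data rather than fixed by hand.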
  8. Recommendation
     • Tag recommendation
        − Automatic annotation enhancement
        − Search improvement
     • Clustering
        − Community detection
        − Organization of resources/tags into categories
     • Navigation and visualization
        − Social browsing
     • Next we focus on recommending people
  9. Social Link Recommendation Using Latent Semantics & Network Structure
     • Classification based on latent interests
        − Measure "taste" distance with respect to the latent topic distributions
        − Pointwise squared distance between the feature vectors of users u and v:
          F(u,v) = [(Θ(1,u) − Θ(1,v))², ..., (Θ(k,u) − Θ(k,v))²]
        − F(u,v) = 0 ⇒ u and v have identical distributions; F(u,v) > 0 ⇒ the distributions diverge
        − Other measures to consider
           · Kullback-Leibler (KL) divergence
           · Cosine similarity
     • Objective: minimize the distance between linked users
     • Focus on topical homophily
        − Ignores network effects
     • Prior work uses network proximity as an indicator of link formation
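The three distance measures mentioned on this slide can be sketched over two users' topic distributions; this is a minimal illustration, not the paper's implementation:

```python
import math

def squared_distance_features(theta_u, theta_v):
    """Pointwise squared distance per latent topic: the feature vector F(u,v)."""
    return [(a - b) ** 2 for a, b in zip(theta_u, theta_v)]

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); assumes q is strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cosine_similarity(p, q):
    """Cosine of the angle between the two topic-distribution vectors."""
    num = sum(a * b for a, b in zip(p, q))
    den = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return num / den
```

All three agree on the boundary case: identical distributions give an all-zero F(u,v), zero KL divergence, and cosine similarity 1.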
  10. Social Link Recommendation Using Latent Semantics & Network Structure
     • Latent topic similarity (cosine over topic distributions):
       σ(u,v) = Σ_t Θ(t,u)Θ(t,v) / (√(Σ_t Θ(t,u)²) · √(Σ_t Θ(t,v)²))
     • Latent topics & local structure
        − CN(u,v) = number of common neighbors between users u and v
           · Simplicity and computational efficiency
        − Feature vector: F(u,v) = [σ(u,v), CN(u,v)]
     • Latent topics & global structure
        − SD(u,v) = shortest distance between users u and v
        − Feature vector: F(u,v) = [σ(u,v), SD(u,v)]
     • Non-separable training set ⇒ inefficient classifiers
     • Aggregation strategy
        − Reduce the number of training samples
        − Produce more efficient classifiers
        − Average latent similarity of all user pairs p with k common neighbors:
          σ_avg(k) = (1 / |{p : k_p = k}|) · Σ_{p : k_p = k} σ(p)
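The aggregation strategy above, averaging the latent similarity of all training pairs that share the same common-neighbor count k, can be sketched as a simple bucketed mean (an illustrative sketch, not the paper's code):

```python
from collections import defaultdict

def aggregate_by_common_neighbors(pairs):
    """Collapse training pairs into one averaged sample per common-neighbor
    count k. `pairs` is a list of (k, sigma) tuples, where sigma is the
    latent topic similarity of one user pair."""
    buckets = defaultdict(list)
    for k, sigma in pairs:
        buckets[k].append(sigma)
    return {k: sum(v) / len(v) for k, v in buckets.items()}
```

The training set shrinks from one sample per user pair to one sample per distinct k, which is what makes the resulting classifiers cheaper to train.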
  11. Experimental Analysis
     • Objectives
        − Ability to uncover subliminal collective knowledge
        − Evaluate performance of "people" recommendation
     • Setting
        − 2.4 GHz Intel Core 2 Duo, 2 GB memory, Windows 7
     • Real-world dataset: Last.fm online music system
        − social relationships
        − tagging information
        − music artist listening information
     • Statistics
        − 1,892 users
        − 25,434 directed user friend relations
        − 17,632 artists (UR model vocabulary size)
        − 92,834 user-listened-artist relations
        − 11,946 unique tags (UC and URC vocabulary size)
        − 186,479 annotations (tuples <user, tag, artist>)
  12. Sample Topics
  13. Predictive Power
     • Evaluate ability to predict tags/resources for new users
        − Perplexity
     • Split dataset into two disjoint sets
        − 90% for training
     • Lower perplexity indicates better generalization
     • URC better overall
        − Exploits more information
     • UC
        − Organizes tags in "clusters"
     • UR
        − Inferior quality due to noise
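Perplexity, the held-out evaluation measure used on this slide, is the exponentiated negative average per-token log-likelihood; lower is better. A minimal sketch:

```python
import math

def perplexity(log_likelihoods, n_tokens):
    """Perplexity = exp(-(sum of held-out token log-likelihoods) / n_tokens).
    Lower values mean the model generalizes better to unseen annotations."""
    return math.exp(-sum(log_likelihoods) / n_tokens)
```

For intuition: a model that assigns every held-out token probability 1/4 has perplexity 4, as if it were guessing uniformly among four options.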
  14. Recommendation of Social Ties
     • Split dataset into two disjoint sets
        − 10%, 25%, 50%, 75% for training, rest for testing
     • Evaluation process
        − Randomly sample 12,716 pairs of users
        − 50% true links, 50% negative samples
        − Compute the similarity of each user pair
        − Sort user pairs in decreasing order of similarity
        − Add links between the users with the highest similarity
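The evaluation process above (score candidate pairs, rank by similarity, link the top of the ranking) can be sketched as follows; the toy pairs and similarity values are invented for illustration:

```python
def evaluate_ranking(candidate_pairs, true_links, similarity):
    """Rank candidate pairs by decreasing similarity and report precision
    among the top-|true_links| predictions."""
    ranked = sorted(candidate_pairs, key=similarity, reverse=True)
    top = ranked[: len(true_links)]
    hits = sum(1 for pair in top if pair in true_links)
    return hits / len(true_links)

# Toy example where similarity perfectly separates true and false links.
true_links = {("u1", "u2")}
pairs = [("u1", "u2"), ("u1", "u3")]
sim = {("u1", "u2"): 0.9, ("u1", "u3"): 0.1}
precision = evaluate_ranking(pairs, true_links, lambda p: sim[p])
```

With a 50/50 split of true links and negative samples, as on this slide, a random ranking would score around 0.5 on this measure.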
  15. Recommendation of Social Ties
     • Latent topics & shortest distance
        − Aggregates all true-link training similarity values into a single point
        − Least effective
     • Ensemble achieves best precision
     • Overfitting for training size > 50%
     • Recall drops as the dataset size increases
     [Figure: precision/recall curves for the Latent Topics & Local Structure, Latent Topics, and Ensemble schemes]
  16. How about High Class Imbalance?
     • In social media, the number of true links << the number of absent links
     • High performance for both classes
        − True negatives are easier to classify correctly
        − Degradation in performance for true positives
     • Reasonable results for practical purposes
     [Figure: performance under class imbalance for the Latent Topics & Local Structure, Latent Topics, and Ensemble schemes]
  17. Comparison to Tag-based Similarity Metrics
     • Baselines
        − Cosine Similarity (CS)
        − Maximal Information Path (MIP)
     • Evaluation criterion
        − Area under the receiver operating characteristic curve (AUC)
     • Baseline AUC
        − Computed over the complete dataset
        − Biases the evaluation in favor of the baselines
        − CS AUC = 0.6087
        − MIP AUC = 0.6256
     • Same evaluation process as before
     • Compute performance lift
        − % change over the best-performing baseline
        − A positive % denotes improvement
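The two quantities on this slide, AUC and performance lift, can be sketched directly. AUC is computed here as the probability that a random positive outranks a random negative (an equivalent pairwise formulation, chosen for brevity; not the paper's code):

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) score pairs where the
    positive outranks the negative (ties count as half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def lift(auc_model, auc_baseline):
    """Performance lift: % change over the best-performing baseline.
    A positive value denotes improvement."""
    return 100.0 * (auc_model - auc_baseline) / auc_baseline
```

For example, a scheme with AUC 0.75 against the MIP baseline's 0.6256 would report a lift of roughly +19.9%.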
  18. Comparison to Tag-based Similarity Metrics
     • Not all schemes can beat the baselines
        − For 10% training data: ≤ 10% AUC loss
        − But significant speedup due to the minimal training dataset
     • The Latent Topics & Local Structure scheme is consistently better
     [Figure: AUC lift vs. training dataset size for the Latent Topics & Local Structure and Latent Topics schemes]
  19. Concluding Remarks
     • Three generative models of tripartite graphs in social tagging systems
     • Modeling of users' interests in a latent space over resources and metadata
     • Limitations
        − Ignore several aspects of the real-world annotation process, such as topic correlation and user interaction
     • Strong performance on the recommendation task
        − Accurate predictors of social ties in conjunction with structural evidence
        − Proposed an aggregation strategy to reduce the number of training samples
     • Future work
        − Incorporate other types of resources
        − Automatically identify the most discriminative latent topics and discard uninformative resources and metadata
  20. Thank you! Questions?
     chelmis@usc.edu