
Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media


Workshop paper from the Modeling Social Media 2013 workshop at the Hypertext 2013 conference, presented in Paris, France, on May 1, 2013.



  1. Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media
     Charalampos Chelmis, Viktor K. Prasanna
     chelmis@usc.edu
     MSM 2013, Paris, France
  2. Overview
     • Introduction
     • Structure of Tripartite Graphs
     • Generative Models of Tripartite Graphs
     • Social Link Classification Schemes
     • Evaluation
     • Conclusion
  3. Introduction
     • Social networking is used for
        − Content organization
        − Content sharing
     • Multiple media types
     • Users' activities
        − Reveal interests and tastes
        − Hidden structure
     • Description of resources
        − Text
        − Tags / hashtags
     • Social annotation
        − Collective characterization of resources
        − Use of synonyms for similar resources
        − Same keywords for different resources
  4. Research Questions
     • How to address issues of synonymy and polysemy?
        − Deal with the explosion of the space size
     • How to discover emergent structure in online tagging systems?
        − Hidden topics
     • How to capture users' latent interests?
        − Which subjects is a user mostly interested in?
        − Which users have similar interests?
     • How to model the process of social generation of annotations?
        − How to capture the semantics of collaboration?
     • Why is this useful?
        − Recommend people
        − Recommend tags / resources
        − Clustering
        − ...
  5. Structure of Tripartite Graphs
     • Set of actors (e.g. users) A = {a1, ..., ak}
     • Set of concepts (e.g. tags) C = {c1, ..., cl}
     • Set of resources (e.g. photos) R = {r1, ..., rm}
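The tripartite structure above can be sketched as a set of (actor, concept, resource) annotation triples, from which the bipartite reductions of the next slide fall out as projections. All names and data below are illustrative toy examples, not from the paper:

```python
from collections import defaultdict

# An annotation is a triple (actor, concept, resource), e.g. (user, tag, photo).
# The tripartite graph is the set of all such triples (toy data).
annotations = [
    ("a1", "c1", "r1"),
    ("a1", "c2", "r1"),
    ("a2", "c1", "r2"),
]

def project_user_concept(triples):
    """Bipartite User-Concept projection: how often each actor used each concept."""
    counts = defaultdict(int)
    for actor, concept, _resource in triples:
        counts[(actor, concept)] += 1
    return dict(counts)

def project_user_resource(triples):
    """Bipartite User-Resource projection: which resources each actor annotated."""
    edges = defaultdict(set)
    for actor, _concept, resource in triples:
        edges[actor].add(resource)
    return dict(edges)
```

Either projection discards one of the three node sets, which is exactly the information loss the User-Concept and User-Resource models trade for simplicity.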
  6. Reducing the Tripartite Graph to Bipartite Structures
     • The User-Concept model
        − Users are modeled based on their tag usage
        − φ denotes the matrix of topic distributions
           · multinomial distribution over N concepts
           · T topics drawn independently
        − θ: the matrix of user-specific mixture weights for these T topics
        − Captures users' latent interests
        − Ignores resources
        − Ignores the social aspect of tagging
     • The User-Resource model
        − Resources become vocabulary terms
        − Tags are ignored
        − Ignores the social aspect of tagging
  7. The User-Resource-Concept Model
     • Topic-based representation
     • Models both resources and users' interests
     • Multiple users may annotate resource r
     • For each tag, a user is chosen uniformly at random
     • Each user is associated with a distribution over latent topics θ
     • A topic is chosen from a distribution over topics specific to that user
     • The tag is generated from the chosen topic
        − φt: probability distribution of tags for topic t
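The generative process on this slide can be sketched as straightforward ancestral sampling. The toy parameters (the θ and φ values, user names, and tag vocabulary) are made up for illustration:

```python
import random

random.seed(0)

users = ["u1", "u2"]
topics = [0, 1]
tags = ["rock", "jazz", "pop"]

# theta[u][t]: user-specific mixture weights over latent topics (toy values)
theta = {"u1": [0.9, 0.1], "u2": [0.2, 0.8]}
# phi[t][w]: per-topic multinomial over tags (toy values)
phi = [[0.7, 0.1, 0.2], [0.1, 0.8, 0.1]]

def generate_tag(annotators):
    """One step of the URC generative process for a resource:
    pick an annotator uniformly at random, draw a topic from that
    user's theta, then draw the tag from the topic's phi."""
    u = random.choice(annotators)
    t = random.choices(topics, weights=theta[u])[0]
    w = random.choices(tags, weights=phi[t])[0]
    return u, w

user, tag = generate_tag(users)
```

In the actual model these parameters would be inferred from the annotation data rather than fixed by hand.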
  8. Recommendation
     • Tag recommendation
        − Automatic annotation enhancement
        − Search improvement
     • Clustering
        − Community detection
        − Organization of resources/tags into categories
     • Navigation and visualization
        − Social browsing
     • Next we focus on recommending people
  9. Social Link Recommendation Using Latent Semantics & Network Structure
     • Classification based on latent interests
        − Measure "taste" distance with respect to the latent topic distributions
        − Pointwise squared distance between the feature vectors of users u and v:
          F(u,v) = [(Θ(1,u) − Θ(1,v))², ..., (Θ(k,u) − Θ(k,v))²]
        − F(u,v) = 0 ⇒ u and v have identical distributions; F(u,v) > 0 ⇒ the distributions diverge
        − Other measures to consider
           · Kullback-Leibler (KL) divergence
           · Cosine similarity
     • Objective: minimize the distance between linked users
     • Focus on topical homophily
        − Ignores network effects
     • Prior work uses network proximity as an indicator of link formation
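The three distance measures mentioned on this slide can be sketched over two users' topic distributions; this is a minimal illustration, not the paper's implementation:

```python
import math

def squared_distance_features(theta_u, theta_v):
    """Pointwise squared distance per latent topic: the feature vector F(u,v)."""
    return [(a - b) ** 2 for a, b in zip(theta_u, theta_v)]

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); assumes q is strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cosine_similarity(p, q):
    """Cosine of the angle between the two topic-distribution vectors."""
    num = sum(a * b for a, b in zip(p, q))
    den = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return num / den
```

All three agree on the boundary case: identical distributions give an all-zero F(u,v), zero KL divergence, and cosine similarity 1.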
  10. Social Link Recommendation Using Latent Semantics & Network Structure
     • Latent topic similarity (cosine over topic distributions):
       σ(u,v) = Σ_t Θ(t,u)Θ(t,v) / (√(Σ_t Θ(t,u)²) · √(Σ_t Θ(t,v)²))
     • Latent topics & local structure
        − CN(u,v) = number of common neighbors between users u and v
           · Simplicity and computational efficiency
        − Feature vector: F(u,v) = [σ(u,v), CN(u,v)]
     • Latent topics & global structure
        − SD(u,v) = shortest distance between users u and v
        − Feature vector: F(u,v) = [σ(u,v), SD(u,v)]
     • Non-separable training set ⇒ inefficient classifiers
     • Aggregation strategy
        − Reduce the number of training samples
        − Produce more efficient classifiers
        − Average latent similarity of all user pairs p with k common neighbors:
          σ_avg(k) = (1 / |{p : k_p = k}|) · Σ_{p : k_p = k} σ(p)
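The aggregation strategy above, averaging the latent similarity of all training pairs that share the same common-neighbor count k, can be sketched as a simple bucketed mean (an illustrative sketch, not the paper's code):

```python
from collections import defaultdict

def aggregate_by_common_neighbors(pairs):
    """Collapse training pairs into one averaged sample per common-neighbor
    count k. `pairs` is a list of (k, sigma) tuples, where sigma is the
    latent topic similarity of one user pair."""
    buckets = defaultdict(list)
    for k, sigma in pairs:
        buckets[k].append(sigma)
    return {k: sum(v) / len(v) for k, v in buckets.items()}
```

The training set shrinks from one sample per user pair to one sample per distinct k, which is what makes the resulting classifiers cheaper to train.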
  11. Experimental Analysis
     • Objectives
        − Ability to uncover subliminal collective knowledge
        − Evaluate performance of "people" recommendation
     • Setting
        − 2.4 GHz Intel Core 2 Duo, 2 GB memory, Windows 7
     • Real-world dataset: Last.fm online music system
        − social relationships
        − tagging information
        − music artist listening information
     • Statistics
        − 1,892 users
        − 25,434 directed user friend relations
        − 17,632 artists (UR model vocabulary size)
        − 92,834 user-listened-artist relations
        − 11,946 unique tags (UC and URC vocabulary size)
        − 186,479 annotations (tuples <user, tag, artist>)
  12. Sample Topics
  13. Predictive Power
     • Evaluate ability to predict tags/resources for new users
        − Perplexity
     • Split dataset into two disjoint sets
        − 90% for training
     • Lower perplexity indicates better generalization
     • URC better overall
        − Exploits more information
     • UC
        − Organizes tags in "clusters"
     • UR
        − Inferior quality due to noise
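Perplexity, the held-out evaluation measure used on this slide, is the exponentiated negative average per-token log-likelihood; lower is better. A minimal sketch:

```python
import math

def perplexity(log_likelihoods, n_tokens):
    """Perplexity = exp(-(sum of held-out token log-likelihoods) / n_tokens).
    Lower values mean the model generalizes better to unseen annotations."""
    return math.exp(-sum(log_likelihoods) / n_tokens)
```

For intuition: a model that assigns every held-out token probability 1/4 has perplexity 4, as if it were guessing uniformly among four options.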
  14. Recommendation of Social Ties
     • Split dataset into two disjoint sets
        − 10%, 25%, 50%, 75% for training, rest for testing
     • Evaluation process
        − Randomly sample 12,716 pairs of users
        − 50% true links, 50% negative samples
        − Compute the similarity of each user pair
        − Sort user pairs in decreasing order of similarity
        − Add links between the users with the highest similarity
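The evaluation process above (score candidate pairs, rank by similarity, link the top of the ranking) can be sketched as follows; the toy pairs and similarity values are invented for illustration:

```python
def evaluate_ranking(candidate_pairs, true_links, similarity):
    """Rank candidate pairs by decreasing similarity and report precision
    among the top-|true_links| predictions."""
    ranked = sorted(candidate_pairs, key=similarity, reverse=True)
    top = ranked[: len(true_links)]
    hits = sum(1 for pair in top if pair in true_links)
    return hits / len(true_links)

# Toy example where similarity perfectly separates true and false links.
true_links = {("u1", "u2")}
pairs = [("u1", "u2"), ("u1", "u3")]
sim = {("u1", "u2"): 0.9, ("u1", "u3"): 0.1}
precision = evaluate_ranking(pairs, true_links, lambda p: sim[p])
```

With a 50/50 split of true links and negative samples, as on this slide, a random ranking would score around 0.5 on this measure.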
  15. Recommendation of Social Ties
     • Latent topics & shortest distance
        − Aggregates all true-link training similarity values into a single point
        − Least effective
     • Ensemble achieves best precision
     • Overfitting for training size > 50%
     • Recall drops as the dataset size increases
     [Figure: precision/recall curves for the Latent Topics & Local Structure, Latent Topics, and Ensemble schemes]
  16. How about High Class Imbalance?
     • In social media, the number of true links << the number of absent links
     • High performance for both classes
        − True negatives are easier to classify correctly
        − Degradation in performance for true positives
     • Reasonable results for practical purposes
     [Figure: performance under class imbalance for the Latent Topics & Local Structure, Latent Topics, and Ensemble schemes]
  17. Comparison to Tag-based Similarity Metrics
     • Baselines
        − Cosine Similarity (CS)
        − Maximal Information Path (MIP)
     • Evaluation criterion
        − Area under the receiver operating characteristic curve (AUC)
     • Baseline AUC
        − Computed over the complete dataset
        − Biases the evaluation in favor of the baselines
        − CS AUC = 0.6087
        − MIP AUC = 0.6256
     • Same evaluation process as before
     • Compute performance lift
        − % change over the best-performing baseline
        − A positive % denotes improvement
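The two quantities on this slide, AUC and performance lift, can be sketched directly. AUC is computed here as the probability that a random positive outranks a random negative (an equivalent pairwise formulation, chosen for brevity; not the paper's code):

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) score pairs where the
    positive outranks the negative (ties count as half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def lift(auc_model, auc_baseline):
    """Performance lift: % change over the best-performing baseline.
    A positive value denotes improvement."""
    return 100.0 * (auc_model - auc_baseline) / auc_baseline
```

For example, a scheme with AUC 0.75 against the MIP baseline's 0.6256 would report a lift of roughly +19.9%.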
  18. Comparison to Tag-based Similarity Metrics
     • Not all schemes can beat the baselines
        − For 10% training data: ≤ 10% AUC loss
        − But significant speedup due to the minimal training dataset
     • The Latent Topics & Local Structure scheme is consistently better
     [Figure: AUC lift vs. training dataset size for the Latent Topics & Local Structure and Latent Topics schemes]
  19. Concluding Remarks
     • Three generative models of tripartite graphs in social tagging systems
     • Modeling of users' interests in a latent space over resources and metadata
     • Limitations
        − Ignore several aspects of the real-world annotation process, such as topic correlation and user interaction
     • Strong performance on the recommendation task
        − Accurate predictors of social ties in conjunction with structural evidence
        − Proposed an aggregation strategy to reduce the number of training samples
     • Future work
        − Incorporate other types of resources
        − Automatically identify the most discriminative latent topics and discard uninformative resources and metadata
  20. Thank you! Questions?
     chelmis@usc.edu