
Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media


Workshop paper from the Modeling Social Media 2013 workshop at the Hypertext 2013 conference, presented in Paris, France, on May 1, 2013.



  1. Exploring Generative Models of Tripartite Graphs for Recommendation in Social Media
     Charalampos Chelmis, Viktor K. Prasanna
     chelmis@usc.edu
     MSM 2013, Paris, France
  2. Overview
     • Introduction
     • Structure of Tripartite Graphs
     • Generative Models of Tripartite Graphs
     • Social Link Classification Schemes
     • Evaluation
     • Conclusion
  3. Introduction
     • Social networking is used for
        − Content organization
        − Content sharing
     • Multiple media types
     • Users' activities
        − Reveal interests and tastes
        − Hidden structure
     • Description of resources
        − Text
        − Tags / hashtags
     • Social annotation
        − Collective characterization of resources
        − Use of synonyms for similar resources
        − Same keywords for different resources
  4. Research Questions
     • How to address issues of synonymy and polysemy?
        − Deal with the explosion of the space size
     • How to discover emergent structure in online tagging systems?
        − Hidden topics
     • How to capture users' latent interests?
        − Which subjects is a user mostly interested in?
        − Which users have similar interests?
     • How to model the process of social generation of annotations?
        − How to capture the semantics of collaboration?
     • Why is this useful?
        − Recommend people
        − Recommend tags / resources
        − Clustering
        − ...
  5. Structure of Tripartite Graphs
     • Set of actors (e.g. users) A = {a1, ..., ak}
     • Set of concepts (e.g. tags) C = {c1, ..., cl}
     • Set of resources (e.g. photos) R = {r1, ..., rm}
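The tripartite structure above can be sketched as a set of (actor, concept, resource) annotation triples, from which the bipartite reductions of the next slide fall out as projections. All names and data below are illustrative toy examples, not from the paper:

```python
from collections import defaultdict

# An annotation is a triple (actor, concept, resource), e.g. (user, tag, photo).
# The tripartite graph is the set of all such triples (toy data).
annotations = [
    ("a1", "c1", "r1"),
    ("a1", "c2", "r1"),
    ("a2", "c1", "r2"),
]

def project_user_concept(triples):
    """Bipartite User-Concept projection: how often each actor used each concept."""
    counts = defaultdict(int)
    for actor, concept, _resource in triples:
        counts[(actor, concept)] += 1
    return dict(counts)

def project_user_resource(triples):
    """Bipartite User-Resource projection: which resources each actor annotated."""
    edges = defaultdict(set)
    for actor, _concept, resource in triples:
        edges[actor].add(resource)
    return dict(edges)
```

Either projection discards one of the three node sets, which is exactly the information loss the User-Concept and User-Resource models trade for simplicity.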
  6. Reducing the Tripartite Graph to Bipartite Structures
     • The User-Concept model
        − Users are modeled based on their tag usage
        − φ denotes the matrix of topic distributions
           · multinomial distribution over N concepts
           · T topics drawn independently
        − θ: the matrix of user-specific mixture weights for these T topics
        − Captures users' latent interests
        − Ignores resources
        − Ignores the social aspect of tagging
     • The User-Resource model
        − Resources become vocabulary terms
        − Tags are ignored
        − Ignores the social aspect of tagging
  7. The User-Resource-Concept Model
     • Topic-based representation
     • Models both resources and users' interests
     • Multiple users may annotate resource r
     • For each tag, a user is chosen uniformly at random
     • Each user is associated with a distribution over latent topics θ
     • A topic is chosen from a distribution over topics specific to that user
     • The tag is generated from the chosen topic
        − φt: probability distribution of tags for topic t
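The generative process on this slide can be sketched as straightforward ancestral sampling. The toy parameters (the θ and φ values, user names, and tag vocabulary) are made up for illustration:

```python
import random

random.seed(0)

users = ["u1", "u2"]
topics = [0, 1]
tags = ["rock", "jazz", "pop"]

# theta[u][t]: user-specific mixture weights over latent topics (toy values)
theta = {"u1": [0.9, 0.1], "u2": [0.2, 0.8]}
# phi[t][w]: per-topic multinomial over tags (toy values)
phi = [[0.7, 0.1, 0.2], [0.1, 0.8, 0.1]]

def generate_tag(annotators):
    """One step of the URC generative process for a resource:
    pick an annotator uniformly at random, draw a topic from that
    user's theta, then draw the tag from the topic's phi."""
    u = random.choice(annotators)
    t = random.choices(topics, weights=theta[u])[0]
    w = random.choices(tags, weights=phi[t])[0]
    return u, w

user, tag = generate_tag(users)
```

In the actual model these parameters would be inferred from the annotation data rather than fixed by hand.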
  8. Recommendation
     • Tag recommendation
        − Automatic annotation enhancement
        − Search improvement
     • Clustering
        − Community detection
        − Organization of resources/tags into categories
     • Navigation and visualization
        − Social browsing
     • Next we focus on recommending people
  9. Social Link Recommendation Using Latent Semantics & Network Structure
     • Classification based on latent interests
        − Measure "taste" distance with respect to the latent topic distributions
        − Pointwise squared distance between the feature vectors of users u and v:
          F(u,v) = [(Θ(1,u) − Θ(1,v))², ..., (Θ(k,u) − Θ(k,v))²]
        − F(u,v) = 0 ⇒ u and v have identical distributions; F(u,v) > 0 ⇒ the distributions diverge
        − Other measures to consider
           · Kullback-Leibler (KL) divergence
           · Cosine similarity
     • Objective: minimize the distance between linked users
     • Focus on topical homophily
        − Ignores network effects
     • Prior work uses network proximity as an indicator of link formation
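The three distance measures mentioned on this slide can be sketched over two users' topic distributions; this is a minimal illustration, not the paper's implementation:

```python
import math

def squared_distance_features(theta_u, theta_v):
    """Pointwise squared distance per latent topic: the feature vector F(u,v)."""
    return [(a - b) ** 2 for a, b in zip(theta_u, theta_v)]

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); assumes q is strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cosine_similarity(p, q):
    """Cosine of the angle between the two topic-distribution vectors."""
    num = sum(a * b for a, b in zip(p, q))
    den = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return num / den
```

All three agree on the boundary case: identical distributions give an all-zero F(u,v), zero KL divergence, and cosine similarity 1.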
  10. Social Link Recommendation Using Latent Semantics & Network Structure
     • Latent topic similarity (cosine over topic distributions):
       σ(u,v) = Σ_t Θ(t,u)Θ(t,v) / (√(Σ_t Θ(t,u)²) · √(Σ_t Θ(t,v)²))
     • Latent topics & local structure
        − CN(u,v) = number of common neighbors between users u and v
           · Simplicity and computational efficiency
        − Feature vector: F(u,v) = [σ(u,v), CN(u,v)]
     • Latent topics & global structure
        − SD(u,v) = shortest distance between users u and v
        − Feature vector: F(u,v) = [σ(u,v), SD(u,v)]
     • Non-separable training set ⇒ inefficient classifiers
     • Aggregation strategy
        − Reduce the number of training samples
        − Produce more efficient classifiers
        − Average latent similarity of all user pairs p with k common neighbors:
          σ_avg(k) = (1 / |{p : k_p = k}|) · Σ_{p : k_p = k} σ(p)
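The aggregation strategy above, averaging the latent similarity of all training pairs that share the same common-neighbor count k, can be sketched as a simple bucketed mean (an illustrative sketch, not the paper's code):

```python
from collections import defaultdict

def aggregate_by_common_neighbors(pairs):
    """Collapse training pairs into one averaged sample per common-neighbor
    count k. `pairs` is a list of (k, sigma) tuples, where sigma is the
    latent topic similarity of one user pair."""
    buckets = defaultdict(list)
    for k, sigma in pairs:
        buckets[k].append(sigma)
    return {k: sum(v) / len(v) for k, v in buckets.items()}
```

The training set shrinks from one sample per user pair to one sample per distinct k, which is what makes the resulting classifiers cheaper to train.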
  11. Experimental Analysis
     • Objectives
        − Ability to uncover subliminal collective knowledge
        − Evaluate performance of "people" recommendation
     • Setting
        − 2.4 GHz Intel Core 2 Duo, 2 GB memory, Windows 7
     • Real-world dataset: Last.fm online music system
        − social relationships
        − tagging information
        − music artist listening information
     • Statistics
        − 1,892 users
        − 25,434 directed user friend relations
        − 17,632 artists (UR model vocabulary size)
        − 92,834 user-listened-artist relations
        − 11,946 unique tags (UC and URC vocabulary size)
        − 186,479 annotations (tuples <user, tag, artist>)
  12. Sample Topics
  13. Predictive Power
     • Evaluate ability to predict tags/resources for new users
        − Perplexity
     • Split dataset into two disjoint sets
        − 90% for training
     • Lower perplexity indicates better generalization
     • URC better overall
        − Exploits more information
     • UC
        − Organizes tags in "clusters"
     • UR
        − Inferior quality due to noise
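Perplexity, the held-out evaluation measure used on this slide, is the exponentiated negative average per-token log-likelihood; lower is better. A minimal sketch:

```python
import math

def perplexity(log_likelihoods, n_tokens):
    """Perplexity = exp(-(sum of held-out token log-likelihoods) / n_tokens).
    Lower values mean the model generalizes better to unseen annotations."""
    return math.exp(-sum(log_likelihoods) / n_tokens)
```

For intuition: a model that assigns every held-out token probability 1/4 has perplexity 4, as if it were guessing uniformly among four options.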
  14. Recommendation of Social Ties
     • Split dataset into two disjoint sets
        − 10%, 25%, 50%, 75% for training, rest for testing
     • Evaluation process
        − Randomly sample 12,716 pairs of users
        − 50% true links, 50% negative samples
        − Compute the similarity of each user pair
        − Sort user pairs in decreasing order of similarity
        − Add links between the users with the highest similarity
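The evaluation process above (score candidate pairs, rank by similarity, link the top of the ranking) can be sketched as follows; the toy pairs and similarity values are invented for illustration:

```python
def evaluate_ranking(candidate_pairs, true_links, similarity):
    """Rank candidate pairs by decreasing similarity and report precision
    among the top-|true_links| predictions."""
    ranked = sorted(candidate_pairs, key=similarity, reverse=True)
    top = ranked[: len(true_links)]
    hits = sum(1 for pair in top if pair in true_links)
    return hits / len(true_links)

# Toy example where similarity perfectly separates true and false links.
true_links = {("u1", "u2")}
pairs = [("u1", "u2"), ("u1", "u3")]
sim = {("u1", "u2"): 0.9, ("u1", "u3"): 0.1}
precision = evaluate_ranking(pairs, true_links, lambda p: sim[p])
```

With a 50/50 split of true links and negative samples, as on this slide, a random ranking would score around 0.5 on this measure.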
  15. Recommendation of Social Ties
     • Latent topics & shortest distance
        − Aggregates all true-link training similarity values into a single point
        − Least effective
     • Ensemble achieves best precision
     • Overfitting for training size > 50%
     • Recall drops as the dataset size increases
     [Figure: precision/recall curves for the Latent Topics & Local Structure, Latent Topics, and Ensemble schemes]
  16. How about High Class Imbalance?
     • In social media, the number of true links << the number of absent links
     • High performance for both classes
        − True negatives are easier to classify correctly
        − Degradation in performance for true positives
     • Reasonable results for practical purposes
     [Figure: performance under class imbalance for the Latent Topics & Local Structure, Latent Topics, and Ensemble schemes]
  17. Comparison to Tag-based Similarity Metrics
     • Baselines
        − Cosine Similarity (CS)
        − Maximal Information Path (MIP)
     • Evaluation criterion
        − Area under the receiver operating characteristic curve (AUC)
     • Baseline AUC
        − Computed over the complete dataset
        − Biases the evaluation in favor of the baselines
        − CS AUC = 0.6087
        − MIP AUC = 0.6256
     • Same evaluation process as before
     • Compute performance lift
        − % change over the best-performing baseline
        − A positive % denotes improvement
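The two quantities on this slide, AUC and performance lift, can be sketched directly. AUC is computed here as the probability that a random positive outranks a random negative (an equivalent pairwise formulation, chosen for brevity; not the paper's code):

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) score pairs where the
    positive outranks the negative (ties count as half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def lift(auc_model, auc_baseline):
    """Performance lift: % change over the best-performing baseline.
    A positive value denotes improvement."""
    return 100.0 * (auc_model - auc_baseline) / auc_baseline
```

For example, a scheme with AUC 0.75 against the MIP baseline's 0.6256 would report a lift of roughly +19.9%.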
  18. Comparison to Tag-based Similarity Metrics
     • Not all schemes can beat the baselines
        − For 10% training data: ≤ 10% AUC loss
        − But significant speedup due to the minimal training dataset
     • The Latent Topics & Local Structure scheme is consistently better
     [Figure: AUC lift vs. training dataset size for the Latent Topics & Local Structure and Latent Topics schemes]
  19. Concluding Remarks
     • Three generative models of tripartite graphs in social tagging systems
     • Modeling of users' interests in a latent space over resources and metadata
     • Limitations
        − Ignore several aspects of the real-world annotation process, such as topic correlation and user interaction
     • Strong performance on the recommendation task
        − Accurate predictors of social ties in conjunction with structural evidence
        − Proposed an aggregation strategy to reduce the number of training samples
     • Future work
        − Incorporate other types of resources
        − Automatically identify the most discriminative latent topics and discard uninformative resources and metadata
  20. Thank you! Questions?
     chelmis@usc.edu