Semantically-Enhanced Recommendation Algorithms

CCIA 2012

Victor Codina (vcodina@lsi.upc.edu)
Departament de Llenguatges i Sistemes Informàtics
Knowledge Engineering and Machine Learning Group

Luigi Ceccaroni (lceccaroni@BDigital.org)
Health Informatics
Personalized Computational Medicine

The value of recommendations
• Netflix: 2/3 of the movies rented are recommended
• Google News: recommendations generate 38% more clickthrough
• Amazon: 35% of sales come from recommendations

All these systems employ the Collaborative Filtering (CF) approach
as a main component


But in most online services the CF approach
does not work so well

Why?

Usually: Lack of Data

Other reasons: lack of context-awareness,
domain-specific particularities


Outline

Cold-start problem and existing solutions

Proposed solution to overcome cold start

Evaluation and results


Outline

Cold-start problem and existing solutions
o Cold-start problem
o Hybrid recommenders




What is the cold-start problem?

• Narrow view
o No ratings at all associated with items or users
• Wider view
o Few ratings associated with items or users

Cold-start scenarios:

                        Users: many ratings    Users: few ratings
Items: many ratings     Normal                 New user
Items: few ratings      New item               New user & item
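
The four scenarios can be told apart from two rating counts. The sketch below is only illustrative: the function name and the cut-off `few` are assumptions, since the slides do not say how many ratings count as "few".

```python
def cold_start_scenario(user_ratings: int, item_ratings: int, few: int = 5) -> str:
    """Map rating counts to one of the four cold-start scenarios.

    `few` is a hypothetical cut-off: counts below it are treated as "few ratings".
    """
    new_user = user_ratings < few
    new_item = item_ratings < few
    if new_user and new_item:
        return "new user & item"
    if new_user:
        return "new user"
    if new_item:
        return "new item"
    return "normal"
```

Setting `few=1` gives the narrow view (no ratings at all); larger values give the wider view.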


Typical solution: hybrid recommender combining
CF with content-based filtering

PAST SOLUTION
Collaborative Filtering + traditional content-based filtering
Limitation: lack of understanding and exploitation of domain semantics

MORE RECENT SOLUTION
Collaborative Filtering + semantically-enhanced content-based filtering
Limitation: the need for domain ontologies describing explicit metadata relations

Cold-start scenarios shown on the slide: new item, new user

Outline

Proposed solution to overcome cold start
o Acquisition of implicit semantics
o Methods for semantics exploitation



Acquisition of implicit domain semantics

• Implicit semantics = semantic similarities among item
attributes extracted from Vector Space Models (VSMs)
• Distributional hypothesis: “words that share similar
contexts share similar meaning”

[Diagram] Attribute-by-context matrix (rows: attributes; columns: contexts,
i.e. items or users; entries w_a,c)
-> matrix transformation (SVD, conditional probabilities)
-> similarity measure (cosine, Jaccard)
-> attribute semantic similarities
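
The last step of this pipeline can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy matrix `W` and the function names are assumptions, the transformation step (SVD or conditional probabilities) is skipped, and Jaccard is computed on the set of contexts where an attribute has a positive weight.

```python
from math import sqrt

def cosine(x, y):
    """Cosine similarity between two attribute context vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = sqrt(sum(a * a for a in x))
    ny = sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

def jaccard(x, y):
    """Jaccard index over the sets of contexts in which each attribute occurs."""
    sx = {c for c, w in enumerate(x) if w > 0}
    sy = {c for c, w in enumerate(y) if w > 0}
    union = sx | sy
    return len(sx & sy) / len(union) if union else 0.0

def similarity_matrix(context_matrix, measure=cosine):
    """context_matrix[a][c] holds w_a,c; returns attribute-by-attribute similarities."""
    return [[measure(row_a, row_b) for row_b in context_matrix]
            for row_a in context_matrix]

# Hypothetical toy matrix: 3 attributes (rows) described over 4 contexts (columns).
W = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
S_cos = similarity_matrix(W)
S_jac = similarity_matrix(W, measure=jaccard)
```

Attributes 0 and 1 share most of their contexts and so come out as similar; attribute 2 shares none and gets similarity 0 with both.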


Semantic similarities are context-dependent

• Item-based
o Similarity is measured in terms of how many items are similarly
described by both attributes
• User-based
o Similarity is measured in terms of how many users are similarly
interested in both attributes

Example: top-5 tags similar to “Sci-Fi”, calculated using cosine
similarity without matrix transformation

User-based              Item-based
Scifi    0.79598457     Scifi     0.48631117
future   0.6889696      aliens    0.42508063
space    0.65459067     dystopia  0.34769687
aliens   0.6110453      space     0.32580933
robots   0.59465224     future    0.27470198
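
The difference between the two contexts can be shown on toy data. Everything in this sketch is hypothetical: the tags, the matrices, and in particular the way the tag-by-user matrix is built (counting how many movies rated by a user carry each tag), which is only one plausible reading of "raw frequencies".

```python
from math import sqrt

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = sqrt(sum(a * a for a in x))
    ny = sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

tags = ["scifi", "space", "romance"]

# item_attr[a][i] = 1 if movie i carries tag a (3 tags x 4 movies).
item_attr = [
    [1, 1, 0, 0],  # scifi
    [0, 1, 1, 0],  # space
    [0, 0, 0, 1],  # romance
]

# rated[i][u] = 1 if user u rated movie i (4 movies x 3 users).
rated = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]

# user_attr[a][u] = number of movies rated by user u that carry tag a.
user_attr = [
    [sum(item_attr[a][i] * rated[i][u] for i in range(len(rated)))
     for u in range(len(rated[0]))]
    for a in range(len(item_attr))
]

item_based = cosine(item_attr[0], item_attr[1])  # contexts are items
user_based = cosine(user_attr[0], user_attr[1])  # contexts are users
```

In this toy case "scifi" and "space" co-occur on only one movie, so the item-based similarity is moderate, while the same users rate movies with both tags, so the user-based similarity is higher.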


Exploitation of implicit semantics in
content-based filtering

[Diagram]
USER MODELING: user ratings r_u,i (u) and item attributes w_i,a
(attribute relevance in [0,1]) -> user modeling technique ->
user interests w_u,a (u) (degree of interest in [-1,1])

PREDICTION GENERATION: user interests (u) and item attributes (i) ->
vector-based matching -> item score

Two points where implicit semantics are exploited:
1. Profile expansion: user interests (u) -> expanded user interests (u)
2. Semantic matching: replaces vector-based matching


Method 1: User profile expansion by constrained
spreading activation

Attribute semantic similarities:

      a1    a2    a3    a4    a5
a1    1     0.5   0.2   0     0.3
a2    0.5   1     0.3   0     0.1
a3    0.2   0.3   1     0.7   0.8
a4    0     0     0.7   1     0
a5    0.3   0.1   0.8   0     1

Similarities can be symmetric or not, depending on the similarity
measure used.

                                   a1     a2    a3     a4   a5
User interests [-1,1]              0      0.5   -0.1   0    0
Expanded user interests [-1,1]     0.25   0.5   0.05   0    0

a2 is the activated node. It spreads to a1 (similarity 0.5, new interest)
and to a3 (similarity 0.3, weight updated).

Method hyper-parameters:
- activation threshold = 0.25
- fan-out threshold = 0.25
- max. expansion levels = 1
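
The slide's numbers can be reproduced with a short sketch of constrained spreading activation. The function name and several details are assumptions not stated on the slide: a node fires when the absolute value of its interest reaches the activation threshold, a link is followed when its similarity reaches the fan-out threshold, the spread amount is interest times similarity, and results are clamped to [-1,1].

```python
def expand_profile(interests, sim, activation=0.25, fan_out=0.25, max_levels=1):
    """Expand user interests over the attribute similarity graph."""
    expanded = list(interests)
    fired = set()
    for _ in range(max_levels):
        # A node fires once, when its interest magnitude reaches the activation threshold.
        frontier = [a for a, w in enumerate(expanded)
                    if abs(w) >= activation and a not in fired]
        if not frontier:
            break
        snapshot = list(expanded)  # spread from the values held at the start of the level
        for a in frontier:
            fired.add(a)
            for b, s in enumerate(sim[a]):
                if b == a or s < fan_out:
                    continue  # fan-out constraint: ignore self links and weak links
                expanded[b] = max(-1.0, min(1.0, expanded[b] + snapshot[a] * s))
    return expanded

# Values taken from the slide.
sim = [
    [1.0, 0.5, 0.2, 0.0, 0.3],
    [0.5, 1.0, 0.3, 0.0, 0.1],
    [0.2, 0.3, 1.0, 0.7, 0.8],
    [0.0, 0.0, 0.7, 1.0, 0.0],
    [0.3, 0.1, 0.8, 0.0, 1.0],
]
interests = [0.0, 0.5, -0.1, 0.0, 0.0]
expanded = [round(w, 6) for w in expand_profile(interests, sim)]
```

Only a2 fires. It adds 0.5 x 0.5 = 0.25 to a1 and 0.5 x 0.3 = 0.15 to a3, and skips a5 because the link weight 0.1 is below the fan-out threshold, which gives the expanded profile shown on the slide.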


Method 2: Prediction generation by pair-wise
semantic matching strategies

Attribute semantic similarities:

      a1    a2    a3    a4    a5
a1    1     0.5   0.2   0     0.3
a2    0.5   1     0.3   0     0.1
a3    0.2   0.3   1     0.7   0.8
a4    0     0     0.7   1     0
a5    0.3   0.1   0.8   0     1

Similarities can be symmetric or not, depending on the similarity
measure used.

                          a1    a2    a3     a4   a5
Item attributes [0,1]     0     0.3   0      0    0.7
User interests [-1,1]     0     0.5   -0.1   0    0

Matched pairs (item attribute - user attribute, similarity):
a2 - a2 (1, direct matching), a2 - a3 (0.3), a5 - a2 (0.1), a5 - a3 (0.8)

Approach                  Result (using the product as aggregation function)
Vector-based matching     0.15
Best-pairs matching       0.15 - 0.056 = 0.094
All-pairs matching        0.15 - 0.056 - 0.009 + 0.035 = 0.12

Method hyper-parameter:
- similarity threshold = 0.05
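
The three matching strategies can be sketched as follows, using the slide's values. The function names are assumptions, as are two details the slide does not spell out: a pair is used when its similarity is strictly above the threshold, and "best pair" means, for each item attribute, the user attribute with non-zero interest that has the highest similarity.

```python
def vector_match(item, user):
    """Direct matching: only identical attributes contribute."""
    return sum(i_w * u_w for i_w, u_w in zip(item, user))

def all_pairs_match(item, user, sim, threshold=0.05):
    """Sum the product item weight x user interest x similarity over all pairs."""
    score = 0.0
    for a, i_w in enumerate(item):
        for b, u_w in enumerate(user):
            if i_w and u_w and sim[a][b] > threshold:
                score += i_w * u_w * sim[a][b]
    return score

def best_pairs_match(item, user, sim, threshold=0.05):
    """For each item attribute, use only its most similar user attribute."""
    score = 0.0
    for a, i_w in enumerate(item):
        if not i_w:
            continue
        candidates = [b for b, u_w in enumerate(user)
                      if u_w and sim[a][b] > threshold]
        if candidates:
            b = max(candidates, key=lambda c: sim[a][c])
            score += i_w * user[b] * sim[a][b]
    return score

# Values taken from the slide.
sim = [
    [1.0, 0.5, 0.2, 0.0, 0.3],
    [0.5, 1.0, 0.3, 0.0, 0.1],
    [0.2, 0.3, 1.0, 0.7, 0.8],
    [0.0, 0.0, 0.7, 1.0, 0.0],
    [0.3, 0.1, 0.8, 0.0, 1.0],
]
item = [0.0, 0.3, 0.0, 0.0, 0.7]
user = [0.0, 0.5, -0.1, 0.0, 0.0]

scores = {
    "vector": vector_match(item, user),
    "best_pairs": best_pairs_match(item, user, sim),
    "all_pairs": all_pairs_match(item, user, sim),
}
```

Up to floating-point rounding, the scores are 0.15, 0.094 and 0.12, matching the slide. Best-pairs keeps a2 - a2 (0.15) and a5 - a3 (-0.056); all-pairs also adds a2 - a3 (-0.009) and a5 - a2 (0.035).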


Outline

Evaluation and results
o MovieLens data set
o Experimental results


Offline experimentation with a MovieLens data
set extended with movie metadata

Data set statistics after pruning unusual
attribute values and movies with few attributes:

Users 2113
Movies 1646
Attributes 4 (Genres, directors, actors and tags)
Attribute values 2886
Ratings per user on avg. 239
Rating density 14%


Evaluation of methods for semantics exploitation

Baseline = Traditional CB using a hybrid user modeling technique
Expansion-CB = CSA-same + User-based + raw frequencies
Matching-CB = Best-pairs-same + User-based + Forbes-Zhu method
BPR-MF = CF based on matrix factorization optimized for ranking


Conclusions

• The cold-start problem can be critical
o Especially in systems with small databases
• Existing solutions have limitations
o Traditional CB cannot solve the new-user scenario
o Semantically-enhanced CB requires domain ontologies to work
• Exploiting implicit semantics can be a good
alternative for overcoming the cold-start problem
o User-based semantics are more effective than item-based semantics
o The best-pairs semantic matching method is more effective than
profile expansion based on spreading activation


Future work

• Experimenting with data sets from different domains
o Million Song data set
• Extending the study of Vector Space Models
o Probabilistic similarity measures (e.g. Kullback-Leibler)
• Applying the same approach to enhance the cold-start
performance of context-aware recommenders
o Implicit semantics of contextual conditions can also be acquired
from user data
o Similarly, pair-wise semantic strategies can be employed to
enhance contextual user modeling

