Context-aware recommender systems (CARS) adapt their recommendations to users’ specific situations. In many recommender systems, particularly those based on collaborative filtering, the contextual constraints may lead to sparsity: fewer matches between the current user context and previous situations. Our earlier work proposed an approach called differential context relaxation (DCR), in which different subsets of contextual features were applied in different components of a recommendation algorithm. In this paper, we expand on our previous work on DCR, proposing a more general approach — differential context weighting (DCW), in which contextual features are weighted. We compare DCR and DCW on two real-world datasets, and DCW demonstrates improved accuracy over DCR with comparable coverage. We also show that particle swarm optimization (PSO) can be used to efficiently determine the weights for DCW.
[UMAP2013] Recommendation with Differential Context Weighting
1. Recommendation with Differential Context Weighting
Yong Zheng
Robin Burke
Bamshad Mobasher
Center for Web Intelligence
DePaul University
Chicago, IL USA
Conference on UMAP
June 12, 2013
2. Overview
• Introduction (RS and Context-aware RS)
• Sparsity of Contexts and Relevant Solutions
• Differential Context Relaxation & Weighting
• Experimental Results
• Conclusion and Future Work
5. Context-aware RS (CARS)
• Traditional RS: Users × Items → Ratings
• Context-aware RS: Users × Items × Contexts → Ratings
Example of Contexts in different domains:
Food: time (lunch, dinner), occasion (business lunch, family dinner)
Movie: time (weekend, weekday), location (home, cinema), etc.
Music: time (morning, evening), activity (study, sports, party), etc.
Book: whether a book is a gift for kids or for one's mother, etc.
Recommendation cannot stand alone from the contexts it is made in.
7. Sparsity of Contexts
• Assumption of context-aware RS: it is better to use preferences made in the same contexts when making predictions in recommender systems.
• The same contexts? What about multiple context dimensions and their sparsity?
An example in the movie domain:
Are there rating profiles in the contexts <Weekday, Home, Sister>?
User Movie Time Location Companion Rating
U1 Titanic Weekend Home Girlfriend 4
U2 Titanic Weekday Home Girlfriend 5
U3 Titanic Weekday Cinema Sister 4
U1 Titanic Weekday Home Sister ?
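To make the sparsity concrete, here is a minimal sketch of exact context matching on the toy table above; the names (ratings, target_context) are illustrative, not from the paper:

```python
# Toy rating profiles from the table above.
ratings = [
    {"user": "U1", "item": "Titanic", "context": ("Weekend", "Home", "Girlfriend"), "rating": 4},
    {"user": "U2", "item": "Titanic", "context": ("Weekday", "Home", "Girlfriend"), "rating": 5},
    {"user": "U3", "item": "Titanic", "context": ("Weekday", "Cinema", "Sister"), "rating": 4},
]

target_context = ("Weekday", "Home", "Sister")

# Exact matching: keep only ratings given in precisely the same context.
matches = [r for r in ratings if r["context"] == target_context]
print(len(matches))  # 0 -> no usable profiles: the sparsity of contexts
```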
8. Relevant Solutions
Context Matching: require exactly the same contexts <Weekday, Home, Sister>?
1. Context Selection: use the influential dimensions only
2. Context Relaxation: use a relaxed set of dimensions, e.g. time
3. Context Weighting: use all dimensions, but measure how similar the contexts are! (to be continued later)
Differences between context selection and context relaxation:
context selection is conducted by surveys or statistics;
context relaxation is optimized directly for prediction performance;
finding the optimal context relaxation/weighting is a learning process!
User Movie Time Location Companion Rating
U1 Titanic Weekend Home Girlfriend 4
U2 Titanic Weekday Home Girlfriend 5
U3 Titanic Weekday Cinema Sister 4
U1 Titanic Weekday Home Sister ?
9. DCR and DCW
• Differential Context Relaxation (DCR)
• Differential Context Weighting (DCW)
• Particle Swarm Optimization as the Optimizer
10. Differential Context Relaxation
Differential Context Relaxation (DCR) is our first attempt to alleviate
the sparsity of contexts, and differential context weighting (DCW) is a
finer-grained improvement over DCR.
• There are two notions in DCR:
The "Differential" part (algorithm decomposition):
separate one algorithm into different functional components;
apply appropriate context constraints to each component;
maximize the global contextual effects together.
The "Relaxation" part (context relaxation):
use a relaxed set of context dimensions instead of all of them.
• References
Y. Zheng, R. Burke, B. Mobasher. "Differential Context Relaxation for Context-aware
Travel Recommendation". In EC-WEB, 2012
Y. Zheng, R. Burke, B. Mobasher. "Optimal Feature Selection for Context-Aware
Recommendation using Differential Relaxation". In RecSys Workshop on CARS, 2012
11. DCR – Algorithm Decomposition
Take User-based Collaborative Filtering (UBCF) for example.
        Pirates of the Caribbean 4   Kung Fu Panda 2   Harry Potter 6   Harry Potter 7
U1      4                            4                 2                2
U2      3                            4                 2                1
U3      2                            2                 4                4
U4      4                            4                 1                ?
Standard process in UBCF (top-K UserKNN, K=1 for example):
1) Find neighbors based on user-user similarity
2) Aggregate the neighbors' contributions
3) Make the final prediction
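As a reference point, a minimal Python sketch of this standard UBCF process, assuming ratings are stored as a plain user -> {item: rating} dict; the function names are illustrative, not the authors' implementation:

```python
import numpy as np

def predict_ubcf(ratings, target_user, target_item, k=1):
    """Plain user-based CF: find the k most similar users who rated the
    item, then aggregate their mean-centered ratings onto the target
    user's baseline (Resnick's formula)."""
    def pearson(u, v):
        common = set(ratings[u]) & set(ratings[v])
        if len(common) < 2:
            return 0.0
        ru = np.array([ratings[u][i] for i in common])
        rv = np.array([ratings[v][i] for i in common])
        if ru.std() == 0 or rv.std() == 0:
            return 0.0
        return float(np.corrcoef(ru, rv)[0, 1])

    # 1) Neighbor selection: users who rated the item, ranked by similarity
    candidates = [v for v in ratings if v != target_user and target_item in ratings[v]]
    neighbors = sorted(candidates, key=lambda v: pearson(target_user, v), reverse=True)[:k]

    # 2) + 3) Aggregate mean-centered contributions onto the user baseline
    base = np.mean(list(ratings[target_user].values()))
    num = sum(pearson(target_user, v) *
              (ratings[v][target_item] - np.mean(list(ratings[v].values())))
              for v in neighbors)
    den = sum(abs(pearson(target_user, v)) for v in neighbors) or 1.0
    return base + num / den
```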
12. DCR – Algorithm Decomposition
Take User-based Collaborative Filtering (UBCF) for example.
The prediction decomposes into four components: 1. neighbor selection, 2. neighbor contribution, 3. user baseline, 4. user similarity.
All components contribute to the final prediction, and we assume that applying an appropriate contextual constraint to each component can leverage the contextual effect in that component,
e.g., use only neighbors who rated in the same contexts.
13. DCR – Context Relaxation
User Movie Time Location Companion Rating
U1 Titanic Weekend Home Girlfriend 4
U2 Titanic Weekday Home Girlfriend 5
U3 Titanic Weekday Cinema Sister 4
U1 Titanic Weekday Home Sister ?
Notion of context relaxation:
• Use {Time, Location, Companion} → 0 records matched!
• Use {Time, Location} → 1 record matched!
• Use {Time} → 2 records matched!
In DCR, we choose an appropriate context relaxation for each component, balancing the number of matched ratings against the best performance and the least noise.
14. DCR – Context Relaxation
The UBCF prediction, decomposed with a per-component contextual constraint (a reconstruction of the annotated formula; components numbered as before: 1. neighbor selection, 2. neighbor contribution, 3. user baseline, 4. user similarity):

P(a, i, c) = \bar{r}_{a,C_3} + \frac{\sum_{u \in N(a, C_1)} sim_{C_4}(a, u) \cdot (r_{u,i,C_2} - \bar{r}_{u,C_3})}{\sum_{u \in N(a, C_1)} |sim_{C_4}(a, u)|}

c is the original context, e.g. <Weekday, Home, Sister>; C1, C2, C3, C4 are the relaxed contexts.
Each relaxation is modeled by a binary vector, e.g. <1, 0, 0> denotes that only the first context dimension is selected.
Take neighbor selection for example: originally, neighbors are the users who rated the same item; DCR further filters those neighbors by the contextual constraint C1, e.g., C1 = <1,0,0> selects Time, so with Time = Weekday a neighbor u must have rated i on a weekday.
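A small sketch of what such a binary relaxation vector does in code, under the toy example above (names are illustrative):

```python
# DCR's context relaxation: a binary vector per component decides which
# context dimensions must match exactly; zeros are relaxed away.
DIMS = ("Time", "Location", "Companion")

def matches_relaxed(rating_ctx, target_ctx, relaxation):
    """relaxation is a binary vector, e.g. (1, 0, 0) keeps only Time."""
    return all(rating_ctx[d] == target_ctx[d]
               for d, keep in enumerate(relaxation) if keep)

target = ("Weekday", "Home", "Sister")
c1 = (1, 0, 0)  # neighbor-selection constraint: match on Time only
# A rating given in <Weekday, Cinema, Sister> now qualifies under C1:
print(matches_relaxed(("Weekday", "Cinema", "Sister"), target, c1))  # True
```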
15. DCR – Drawbacks
1. Context relaxation is still strict, especially when data are sparse.
2. The components are interdependent. For example, neighbor contribution depends on neighbor selection: if neighbors are selected by C1: Location = Cinema, there is no guarantee that those neighbors also have ratings under the contexts required by C2: Time = Weekend.
A finer-grained solution is required: Differential Context Weighting!
16. Differential Context Weighting
User Movie Time Location Companion Rating
U1 Titanic Weekend Home Girlfriend 4
U2 Titanic Weekday Home Girlfriend 5
U3 Titanic Weekday Cinema Sister 4
U1 Titanic Weekday Home Sister ?
Goal: use all context dimensions, but measure the similarity of contexts.
Assumption: the more similar two contexts are, the more useful their ratings are for the prediction calculations.
The similarity of contexts is measured by weighted Jaccard similarity:

J(c, d, σ) = Σ_{k: c_k = d_k} w_k / Σ_k w_k

where c and d are two contexts (the two red rows in the table above), and σ = <w1, w2, w3> is the weighting vector over the three dimensions.
Assuming equal weights w1 = w2 = w3 = 1:
J(c, d, σ) = # of matched dimensions / # of all dimensions = 2/3
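A minimal sketch of the weighted Jaccard measure just defined, reproducing the 2/3 example (the function name is ours, not the paper's):

```python
def weighted_jaccard(c, d, sigma):
    """Weighted Jaccard similarity of contexts c and d: the sum of the
    weights on matching dimensions over the sum of all weights, where
    sigma is the weighting vector <w1, w2, w3>."""
    matched = sum(w for ck, dk, w in zip(c, d, sigma) if ck == dk)
    total = sum(sigma)
    return matched / total if total else 0.0

c = ("Weekday", "Home", "Girlfriend")   # U2's context
d = ("Weekday", "Home", "Sister")       # target context
print(weighted_jaccard(c, d, (1.0, 1.0, 1.0)))  # 2/3 with equal weights
```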
17. Differential Context Weighting
1. The "Differential" part: the components (1. neighbor selection, 2. neighbor contribution, 3. user baseline, 4. user similarity) are all the same as in DCR.
2. The "Context Weighting" part (for each individual component):
σ is the weighting vector;
ϵ is a threshold on the similarity of contexts,
i.e., only records with similar enough (J ≥ ϵ) contexts are included.
3. In the calculations, the similarities of contexts serve as the weights; for example, the neighbor contribution becomes a similarity-weighted average of the neighbor's ratings on the item. The calculation is similar for the other components.
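For instance, a hedged sketch of the neighbor-contribution component under DCW, assuming a neighbor's ratings arrive as (rating, context) pairs; the data layout is illustrative:

```python
def neighbor_contribution(neighbor_ratings, target_ctx, sigma, epsilon):
    """Similarity-weighted average of a neighbor's ratings on the item:
    each rating given under context d counts with weight J(c, d, sigma),
    and only contexts with J >= epsilon are included at all."""
    def j(c, d):  # weighted Jaccard, as defined above
        return sum(w for ck, dk, w in zip(c, d, sigma) if ck == dk) / sum(sigma)

    num = den = 0.0
    for rating, ctx in neighbor_ratings:
        w = j(ctx, target_ctx)
        if w >= epsilon:               # threshold: similar-enough contexts only
            num += w * rating
            den += w
    return num / den if den else None  # None -> no contribution (lowers coverage)
```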
18. Particle Swarm Optimization (PSO)
The remaining work is to find the optimal context relaxation vectors for DCR and the optimal context weighting vectors for DCW. PSO is derived from swarm intelligence, which achieves a goal through the collaborative behavior of simple agents, such as fish schools, bird flocks, and bee swarms.
Why PSO?
1) It is easy to implement as a non-linear optimizer;
2) It has been used in weighted CF before, where it was demonstrated to work better than other non-linear optimizers, e.g. genetic algorithms;
3) Our previous work successfully applied binary PSO (BPSO) to DCR.
19. Particle Swarm Optimization (PSO)
Swarm = a flock of birds
Particle = each bird ≈ one run of the algorithm
Vector = a bird's position in the space ≈ the vectors we need
Goal = the location of the pizza ≈ lower prediction error
So, how does the swarm find the goal?
1. Looking for the pizza: assume a machine can tell each bird its distance to the pizza.
2. Each iteration is an attempt, or a move.
3. Cognitive learning from the particle itself: am I closer to the pizza compared with my "best" locations in my own history?
4. Social learning from the swarm: "Hey, my distance is 1 mile. It is the closest! Follow me!" Then the other birds move toward that spot.
DCR: feature selection, modeled by binary vectors → Binary PSO
DCW: feature weighting, modeled by real-number vectors → standard PSO
How does it work? Take DCR and Binary PSO for example:
assume there are 4 components and 3 contextual dimensions;
thus there are 4 binary vectors, one for each component;
we merge these vectors into a single one of size 3 × 4 = 12;
this single vector is the particle's position vector in the PSO process.
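A minimal PSO sketch under these assumptions: the merged 12-dimensional position vector described above, weights clipped to [0, 1], and a caller-supplied rmse(weights) fitness function (a hypothetical name) that evaluates the recommender with those weights:

```python
import random

def pso(rmse, dim=12, n_particles=3, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Standard PSO: particles move through [0,1]^dim, pulled toward
    their own best position (cognitive) and the swarm's best (social)."""
    pos = [[random.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # each particle's best position
    pbest_err = [rmse(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_err[i])
    gbest, gbest_err = pbest[g][:], pbest_err[g]  # the swarm's best position

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # cognitive
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # social
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            err = rmse(pos[i])
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = pos[i][:], err
                if err < gbest_err:
                    gbest, gbest_err = pos[i][:], err
    return gbest, gbest_err
```

The cognitive term mirrors step 3 above (learning from the particle's own history) and the social term mirrors step 4 (following the swarm's best).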
21. Context-aware Data Sets
                 AIST Food Data                     Movie Data
# of Ratings     6360                               1010
# of Users       212                                69
# of Items       20                                 176
Contexts         real hunger (full/normal/hungry),  time (weekend, weekday),
                 virtual hunger                     location (home, cinema),
                                                    companion (friends, alone, etc.)
Other Features   user gender;                       user gender;
                 food genre, food style, food stuff year of the movie
Density          dense                              sparse
Context-aware data sets are usually difficult to obtain; these two data sets were collected through surveys.
22. Evaluation Protocols
Metrics: root-mean-square error (RMSE) and coverage, which denotes the percentage of predictions for which neighbors can be found.
Our goal: improve RMSE (i.e., fewer errors) within a decent coverage. We allow a decline in coverage, because applying contextual constraints usually lowers coverage (the sparsity of contexts again!).
Baselines:
context-free CF, i.e. the original UBCF;
contextual pre-filtering CF, which applies the contextual constraints only to the neighbor selection component, with none of the other components used in DCR and DCW.
Other settings in DCR & DCW:
K = 10 for UserKNN, evaluated with 5-fold cross-validation;
T = 100 as the maximal iteration limit in the PSO process;
weights are constrained to the range [0, 1];
we use the same similarity threshold ϵ for each component, iterated from 0.0 to 1.0 in 0.1 increments in DCW.
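For clarity, a small sketch of how the two metrics can be computed, assuming predictions is a list of (predicted, actual) pairs with predicted = None when no neighbor is found (an illustrative layout, not the authors' evaluation code):

```python
import math

def rmse_and_coverage(predictions):
    """RMSE over the covered predictions, plus the fraction covered."""
    covered = [(p, a) for p, a in predictions if p is not None]
    coverage = len(covered) / len(predictions)
    if not covered:
        return float("nan"), coverage
    rmse = math.sqrt(sum((p - a) ** 2 for p, a in covered) / len(covered))
    return rmse, coverage
```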
23. Predictive Performances
(Figure: blue bars are RMSE values; red lines are coverage curves.)
Findings:
1) DCW works better than DCR and the two baselines;
2) a significance t-test shows that DCW's improvement is significant on the movie data, whereas DCR was not significant over the two baselines; thus DCW further alleviates the sparsity of contexts and compensates for DCR;
3) DCW also offers better coverage than the baselines!
24. Performances of Optimizer
Running time is measured in seconds; using 3 particles is the best configuration for both data sets here!
Factors influencing the running time:
more particles bring quicker convergence but probably higher cost per iteration;
the number of contextual variables: more contexts, probably slower;
the density of the data set: the denser the data, the more similarity calculations in DCW.
DCW typically costs more than DCR, because it uses all contextual dimensions and the calculation of context similarity is time-consuming, especially for dense data like the Food data.
25. Other Results (Optional)
1. The optimal threshold for the similarity of contexts:
0.6 for the Food data set; 0.1 for the Movie data set.
2. The optimal weighting vectors (e.g. Movie data):
(in the figure, darker cells denote smaller weights; lighter cells denote larger weights)
27. Conclusions
We propose DCW, a finer-grained improvement over DCR;
it further improves predictive accuracy within a decent coverage;
PSO is demonstrated to be an efficient optimizer;
we identified the underlying factors that influence the optimizer's running time.
Stay Tuned
DCR and DCW are general frameworks (together called DCM, i.e. differential context modeling), and they can be applied to any recommendation algorithm that can be decomposed into multiple components.
We have successfully extended their application to item-based collaborative filtering and the Slope One recommender.
References
Y. Zheng, R. Burke, B. Mobasher. "Differential Context Modeling in Collaborative Filtering". In SOCRS-2013, Chicago, IL, USA, 2013.
28. Acknowledgement
Student Travel Support from US NSF (UMAP Platinum Sponsor)
Future Work
Try other context-similarity measures instead of the simple Jaccard one;
introduce semantics into the similarity of contexts to further alleviate the sparsity of contexts, e.g. Rome is closer to Florence than to Paris;
parallelize PSO or run it on MapReduce to speed up the optimizer.
See you later…
The 19th ACM SIGKDD Conference on Knowledge Discovery and
Data Mining (KDD), Chicago, IL USA, Aug 11-14, 2013