1) Diversifying recommendations based on latent features from matrix factorization can decrease choice difficulty and increase choice satisfaction compared to non-diversified recommendations.
2) Two user studies found that recommendations diversified on latent features were perceived as more diverse and attractive by users, with less choice difficulty.
3) For short lists of 5 items, high diversity led to similar choice satisfaction as longer lists of 10 or 20 items, but with less perceived choice difficulty.
Improving user experience in recommender systems
1. Improving user experience in
recommender systems
How latent feature diversification
can decrease choice difficulty and
improve choice satisfaction
Martijn Willemsen
Talk at Institute for Software
technology, Nov 3, 2015
Graz University of Technology
3. Choice Overload in Recommenders
Recommenders reduce information
overload…
But large personalized sets cause choice
overload!
Top-N of all highly ranked items
What should I choose?
These are all very attractive!
4. Choice Overload
Seminal example of choice overload
Satisfaction decreases with larger sets as increased
attractiveness is counteracted by choice difficulty
Large set: more attractive, 3% sales
Small set: less attractive, 30% sales, higher purchase satisfaction
From Iyengar and Lepper (2000)
5. Choice Overload in Recommenders
(Bollen, Knijnenburg, Willemsen & Graus, RecSys 2010)
[Figure: structural equation models comparing Top-20 vs Top-5 and Lin-20 vs Top-5 recommendations. Constructs: perceived recommendation variety, perceived recommendation quality, movie expertise, choice difficulty, choice satisfaction. Legend: Objective System Aspects (OSA), Subjective System Aspects (SSA), Experience (EXP), Personal Characteristics (PC), Interaction (INT)]
[Figure: bar chart of choice satisfaction for Top-5, Top-20 and Lin-20]
6. A solution: diversification
Tradeoff between similarity and diversity (Smyth &
McClave, 2001) in finding relevant items
Diversification remedies high similarity of Top-N lists
But diversification reduces the overall quality (accuracy) of the list
Many studies only look at data not at real users
Simulated users interact more efficiently with a diverse set (Bridge &
Kelly 2006)
Some studies look at how actual users perceive and
evaluate diversity
Ziegler et al. (2005): diversification based on external ontology
Diversity reduced the accuracy of the recommendations
But coverage and satisfaction increased!
7. Goal of the current research
Extend existing work by:
Diversification based on psychological mechanisms
Test the algorithm with real users and measure their perceptions and
experiences with the algorithm
Outline of the talk
Our user-centric evaluation framework
Psychology behind choice overload
Latent Feature diversification algorithm
Two user studies to test our diversification
11. User-Centric Framework
Our framework adds the intermediate construct of perception, which explains
why behavior and experiences change due to our manipulations
12. User-Centric Framework
And adds personal
and situational
characteristics
Relations modeled
using factor analysis
and SEM
Knijnenburg, B.P., Willemsen, M.C., Gantner, Z., Soncu, H., Newell, C. (2012). Explaining
the User Experience of Recommender Systems. User Modeling and User-Adapted
Interaction (UMUAI), vol 22, p. 441-504 http://bit.ly/umuai
14. Psychology behind choice overload
More options provide more benefits in terms of finding the
right option…
…but result in higher costs/effort
Objective effort:
More comparisons required
Cognitive effort:
Increased potential regret
Larger expectations for larger
sets
Many tradeoffs
15. Research on Choice overload
Choice overload is not omnipresent
Meta-analysis (Scheibehenne et al., JCR 2010)
suggests an overall effect size of zero
Choice overload stronger when:
No strong prior preferences
Little difference in attractiveness between items
Prior studies did not control for
the diversity of the item set
Can we reduce choice difficulty and overload by using
personalized diversified item sets?
While controlling for attractiveness…
16. Diversification and attractiveness
Camera:
Suppose Peter thinks
resolution (MP) and Zoom
are equally important
user vector shows preference
direction
Equi-preference line:
Set of equally attractive options
(orthogonal on user vector)
Diversify on the equi-
preference line!
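A minimal sketch of the camera example, assuming a simple weighted-sum utility as in MAUT. The anchor camera, the step sizes and the variable names are hypothetical and only serve to illustrate that options on a line orthogonal to the user vector are equally attractive.

```python
import numpy as np

# Peter weighs resolution (MP) and zoom equally: user vector (1, 1).
user = np.array([1.0, 1.0])

# Hypothetical anchor camera and a direction orthogonal to the user vector.
anchor = np.array([10.0, 10.0])      # (resolution, zoom)
direction = np.array([1.0, -1.0])    # user @ direction == 0

# Cameras spread along the equi-preference line through the anchor.
cameras = [anchor + t * direction for t in (-4, -2, 0, 2, 4)]

# Weighted-sum utility: identical for every camera on the line.
utilities = [float(user @ c) for c in cameras]
print(utilities)
```

Moving along this line trades resolution for zoom without changing the utility, so diversity can be varied while attractiveness stays constant.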
17. Choice Difficulty and Diversity
Larger sets are often more difficult because of the
increased uniformity of these sets (Fasolo et al., 2009;
Reutskaja et al., 2009)
Larger item sets have many
similar options
small inter-product distances
and small tradeoffs
High density!
Choice Difficulty related to
lack of justification
[Figure: scatter plot of Zoom against Resolution (MP), both axes 0 to 20: high density, small tradeoffs]
18. Choice difficulty and trade-offs
As item sets become more diverse (less dense) tradeoff
size increases
Tradeoffs are effortful…
give up one aspect for another
But can be justified very easily!
[Figure: two scatter plots of Zoom against Resolution (MP), both axes 0 to 20. Left: high density, small tradeoffs. Right: low density, large tradeoffs]
19. Double Mediation Model for difficulty
(Scholten and Sherman, JEP:G 2006)
U-shaped relation between diversity and difficulty:
Choosing from uniform set is
hard to justify but has no
difficult tradeoffs
Choosing from a diverse set
encompasses difficult tradeoffs
but is easy to justify
Does this also apply to personalized item sets?
How can we apply this to recommender system
algorithms? What features to diversify on?
[Figure: U-shaped curve of difficulty against diversity, from uniform to diverse]
20. Features in Matrix Factorization
Latent Features as means
of diversification!
“Latent features are Preference
dimensions related to real world
concepts (e.g. Escapist/serious)”
(Koren, Bell and Volinsky,2009)
Users and items described as
vectors of latent features
Parallel to how choice sets are described in
MAUT(multi-attribute utility theory) in consumer
psychology
21. Explaining Matrix Factorization
Map users and items to a joint latent factor space of
dimensionality f
Each item is a vector $q_i$,
each user a vector $p_u$
Predicted rating of user u for item i:
$\hat{r}_{ui} = q_i^T p_u$
How to find the dimensions?
SVD: singular value decomposition
22. Matrix Factorization algorithms
each user a vector pu, each item a vector qi
[Figure: user-by-movie rating matrix (Jack, Dylan, Olivia, Mark by Usual Suspects, Titanic, DieHard, Godfather) with unknown ratings marked "?"]
pu       Dim1    Dim2
Jack      3      -1
Dylan     1.4     .2
Olivia   -2.5    -.8
Mark     -2      -1.5
qi       Usual Suspects   Titanic   DieHard   Godfather
Dim 1    1.6              -1        5         0.2
Dim 2    1                 1        .3        -.2
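As an illustration of how the "?" cells are filled, the sketch below takes the two-dimensional vectors read off this slide and computes the dot product of each user vector with each item vector. The variable names are hypothetical, and the results are raw dot products rather than star ratings (practical models usually add bias terms, which the slide does not show).

```python
import numpy as np

users = ["Jack", "Dylan", "Olivia", "Mark"]
movies = ["Usual Suspects", "Titanic", "DieHard", "Godfather"]

# User vectors p_u (rows) and item vectors q_i (rows), values as on the slide.
P = np.array([[3.0, -1.0], [1.4, 0.2], [-2.5, -0.8], [-2.0, -1.5]])
Q = np.array([[1.6, 1.0], [-1.0, 1.0], [5.0, 0.3], [0.2, -0.2]])

# Predicted score for every user/item pair: scores[u, i] = q_i^T p_u.
scores = P @ Q.T

for name, row in zip(users, scores):
    print(name, dict(zip(movies, np.round(row, 2))))
```

For example, Jack's score for Usual Suspects is 3 × 1.6 + (−1) × 1 = 3.8.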
24. Greedy Diversity Algorithm
10-dimensional MF model
Take personalized top-200
Low: closest to centroid
Greedy algorithm
Select K items with
highest inter-item distance
(using city-block)
Medium:
select maximally diverse from
100 items closest to centroid
High: from all items in top-200
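The steps on this slide can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the slide does not state how the first item is chosen or how "highest inter-item distance" is aggregated, so the sketch seeds with the item closest to the centroid and adds the item with the largest summed city-block distance to the items already selected. The function name and the `pool_size` parameter are hypothetical.

```python
import numpy as np

def greedy_diverse(items, k, pool_size=None):
    """Pick k row indices from `items` (n x d latent feature vectors).

    pool_size restricts candidates to the items closest to the centroid:
    pool_size=k gives the low-diversity list, pool_size=100 the medium
    one, and pool_size=None (all items) the high-diversity one.
    """
    centroid = items.mean(axis=0)
    # City-block (L1) distance of every item to the centroid.
    order = np.argsort(np.abs(items - centroid).sum(axis=1))
    pool = order if pool_size is None else order[:pool_size]

    selected = [int(pool[0])]  # assumption: seed with the most central item
    while len(selected) < k:
        remaining = [int(i) for i in pool if i not in selected]
        # Summed city-block distance to everything selected so far.
        gain = [np.abs(items[i] - items[selected]).sum() for i in remaining]
        selected.append(remaining[int(np.argmax(gain))])
    return selected

# Toy one-dimensional example.
toy = np.array([[0.0], [1.0], [2.0], [10.0], [-10.0]])
print(greedy_diverse(toy, 3))               # spread-out items
print(greedy_diverse(toy, 3, pool_size=3))  # items nearest the centroid
```

With a 10-dimensional model and a personalized top-200, `items` would be a 200 × 10 array of item vectors.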
26. The algorithm
Measure of density: AFSR
Density/tradeoffs on the features: capture how items are distributed
in the feature space.
Average Factor Score Range (AFSR) based on the density metric
used by Fasolo et al. (2009)
X is set of items i
D is number of features
Captures the distribution of items in the feature space and their
tradeoffs better than standard similarity measures
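The formula itself did not survive in the slide text. Reading the name literally, one plausible form is the range (maximum minus minimum score) of the items in X on each feature, averaged over the D features. The sketch below implements that reading; it is an assumption, and the function name `afsr` is hypothetical.

```python
import numpy as np

def afsr(items):
    """Average Factor Score Range of an item set.

    items: array of shape (|X|, D), one row of latent feature scores
    per item. Assumed definition: mean over features of (max - min).
    """
    items = np.asarray(items, dtype=float)
    ranges = items.max(axis=0) - items.min(axis=0)  # one range per feature
    return float(ranges.mean())

# Three items in a 2-feature space: ranges are 3 and 2, so AFSR = 2.5.
example = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]]
print(afsr(example))
```

Under this reading, a dense set with small tradeoffs has a low AFSR and a spread-out set with large tradeoffs a high one.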
27. Selection of initial set
Top-200 was selected as a balanced initial set:
Large differences in AFSR scores for the 3 levels of diversification
High average predicted rating: 4.48
Range in predicted ratings (0.546) lower than error of MF model:
MAE = .656 (RMSE = .854)
Final check: How does attractiveness vary within
and between the sets?
more diverse sets are by nature more likely to capture high-ranked
items…
Set Size   Diversification   Rating   AFSR
5          Low               4.505    0.295
5          Medium            4.529    0.634
5          High              4.561    1.210
20         Low               4.537    0.586
20         Medium            4.558    1.005
20         High              4.604    1.615
28. System characteristics
MF recommender based on MyMedia project
10M MovieLens dataset: movies from 1994
5.6M ratings for 70k users and 5.4k movies
RMSE of 0.854, MAE of 0.656
Movies shown with title and predicted rating:
hovering the mouse over the title reveals additional information:
short synopsis, cast, director and image
29. Study 1a: Check diversification algorithm
Does our diversification affect the subjective experiences
with the recommendations?
Do people perceive the diversity?
Does it affect attractiveness and choice difficulty?
Each participant inspects 3 lists (low, mid and high
diversification), order counterbalanced
No choices made from the list
30. Study 1a: design/procedure
Pre-questionnaire
Personal characteristics
Rating task to train the system (10 ratings)
Assess three lists of recommendations
Within subjects: low / mid / high diversification
Between subjects: number of items (5,10,15,20,25)
After each list we measured:
Perceived Diversity & Attractiveness
Expected Trade-Off Difficulty & Choice Difficulty
31. Study 1a: Participants and Manipulation
checks
97 Participants from an online database
Paid for participation
Mean age: 29 years, 52 females and 45 males
Low, medium and high diversification
differed in the feature score range
Average predicted ratings of the sets were not different!
Diversity   Average AFSR (SE)   Avg. predicted rating (SE)
Low         0.959 (0.015)       4.486 (0.042)
Medium      1.273 (0.016)       4.486 (0.041)
High        1.744 (0.024)       4.527 (0.039)
33. Study 1a: Structural Equation Model
[Figure: standardized scores of perceived attractiveness and diversity for low, mid and high diversification]
[Figure: scale differences in choice difficulty and tradeoff difficulty for low, mid and high diversification]
34. Study 1a: Conclusions & Discussion
Diversifying on latent features
Increases attractiveness/diversity
Reduces trade-off difficulty (high)
Reduces choice difficulty (linearly)
No evidence for U-shaped difficulty model
High diversity does not result in trade-off conflicts
(perhaps due to the nature of the domain/MF?)
No effect of number of items
Small sets benefit as much from diversification
Diversification on MF features seems
promising to increase attractiveness!
[Figure: scale differences in choice difficulty and tradeoff difficulty for low, mid and high diversification]
35. Study 1b: No choice satisfaction
In Study 1a, no actual choice was made
Explains limited effect of number of items
We could not measure choice satisfaction or justification-based
processes
Diversification and list length as two factors in a new
experiment with choice (and choice satisfaction)
Number of items: 5, 10 and 20
Low and high diversity (no medium)
We expect choice difficulty to be more prominent for low
diversity sets
36. Study 1b: design/procedure
Pre-questionnaire
Personal characteristics
Rating task to train the system (10 ratings)
Choose one item from a list of recommendations
Between subjects: 2 levels (low / high diversification)
X 3 lengths (5, 10 or 20 items)
Afterwards we measured:
Perceived Diversity & Attractiveness
Choice Difficulty and Choice satisfaction
87 Participants from an online database
Paid for participation
Mean age: 29 years, 41 females and 46 males
37. Questionnaire-items
Perceived recommendation diversity
5 items, e.g. “The list of movies was varied”
Perceived recommendation attractiveness
5 items, e.g. “The list of recommendations was attractive”
Choice satisfaction
6 items, e.g. “I think I would enjoy watching the chosen movie”
Choice difficulty
5 items, e.g.: “It was easy to select a movie”
39. Study 1b: Structural Equation Model
[Figure: standardized score of perceived diversity by list length (5, 10, 20) for low and high diversity]
[Figure: standardized score of choice satisfaction by list length (5, 10, 20) for low and high diversity]
40. Study 1b: Results
Long list is more difficult (cognitive and objective effort
(hovers)) but also more satisfying in itself
Diversity influences choice satisfaction in important ways
Diversity increases attractiveness and reduces difficulty
These can increase satisfaction (but only for short lists)
Our diversification improved the 5-item list
5 diverse items are as satisfactory as
10 or 20 items and less difficult!
Less effort needed (hovers)
Using latent feature diversification
one does not need long item lists…
[Figure: standardized score of choice satisfaction by list length (5, 10, 20) for low and high diversity]
41. But…
Our studies show that low or high diversification from the
centroid of Top-200 works
However, these sets were optimized for diversity, not for prediction
accuracy
Most item sets thus do not contain the best predicted items
So how does this compare to standard Top-N lists?
Slight modification of our algorithm: diversify starting from the best
predicted item in the top-N set (Top-1) rather than the centroid
Diversification and list length as two factors in a choice
overload experiment
list sizes: 5 and 20
Diversification: none (top 5/20), medium, high
42. Properties of the item sets
Set size   Diversification    Avg. AFSR   Avg. Rank
5          High (α = 1)       1.380       78.284
5          Medium (α = 0.3)   1.096       7.484
5          Top-N (α = 0)      0.774       3.000
20         High (α = 1)       1.793       89.380
20         Medium (α = 0.3)   1.486       17.849
20         Top-N (α = 0)      1.270       10.500
Algorithm balances between max diversity and highest rank
Every iteration: weigh (1-α) highest rank
against highest diversification (α)
α=0: top-N, α=1: max diverse.
(α=0.3 gives good medium diversity)
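One way to sketch this weighting is shown below. The slide does not say how rank and diversification are scaled before they are weighed, so the normalization of both terms to the range 0 to 1 is an assumption, as are the function name and the use of summed city-block distance.

```python
import numpy as np

def alpha_diversify(items, k, alpha):
    """Pick k row indices from `items`, sorted by predicted rating
    (row 0 is the Top-1 item). alpha=0 returns the plain top-k,
    alpha=1 ignores rank and maximizes diversity from Top-1 onward.
    """
    n = len(items)
    selected = [0]  # start from the best predicted item
    while len(selected) < k:
        remaining = [i for i in range(n) if i not in selected]
        # Rank term (assumed scaling): 1 for the top item, 0 for the last.
        rank_score = np.array([1.0 - i / (n - 1) for i in remaining])
        # Diversity term: summed city-block distance to the selected items.
        dist = np.array([np.abs(items[i] - items[selected]).sum()
                         for i in remaining])
        div_score = dist / dist.max() if dist.max() > 0 else np.zeros(len(dist))
        combined = (1 - alpha) * rank_score + alpha * div_score
        selected.append(remaining[int(np.argmax(combined))])
    return selected

# Toy one-dimensional example, rows ordered by predicted rating.
toy = np.array([[0.0], [1.0], [2.0], [10.0], [-11.0]])
print(alpha_diversify(toy, 3, alpha=0.0))  # plain top-3
print(alpha_diversify(toy, 3, alpha=1.0))  # maximally diverse
```

Intermediate values such as α = 0.3 let a lower-ranked item enter the list only when its diversity gain outweighs its rank penalty.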
43. Design/procedure Study 2
159 Participants from an online database
Rating task to train the system (15 ratings)
Choose one item from a list of recommendations
Between subjects: 3 levels of diversification, 2 lengths
Afterwards we measured:
Perceptions: Perceived Diversity & Attractiveness
Experience: Choice Difficulty and Choice satisfaction
Behavior: total views / unique items considered
45. Perceived Diversity & attractiveness
Perceived Diversity increases with
Diversification
Similarly for 5 and 20 items
Perc. Diversity increases attractiveness
Perceived difficulty goes down with
diversification
Perceived attractiveness goes up
with diversification
Diverse 5 item set excels…
Just as satisfying as 20 items
Less difficult to choose from
Less cognitive load…!
[Figure: standardized score of perceived diversity by diversification (none, med, high) for 5 and 20 items]
[Figure: standardized score of choice satisfaction by diversification (none, med, high) for 5 and 20 items]
46. Choice Characteristics
Chosen option (mean and std. err.)
Set        Diversity       List Position   Rating        Rank
5 items    None (top 5)    3.60 (0.27)     4.51 (0.07)   3.60 (0.27)
5 items    Medium          4.41 (0.59)     4.41 (0.07)   14.52 (5.37)
5 items    High            4.19 (0.27)     4.30 (0.07)   77.59 (12.76)
20 items   None (top 20)   10.15 (0.92)    4.45 (0.05)   10.15 (0.92)
20 items   Medium          10.33 (1.18)    4.40 (0.08)   17.7 (2.68)
20 items   High            9.93 (1.07)     4.16 (0.07)   72.22 (11.84)
With higher diversity, no difference in position of chosen option
Resulting in less ‘optimal’ choice in terms of predicted rating
Without a reduction in choice satisfaction!
47. Conclusions
Reducing Choice difficulty and overload
Diversity reduces choice difficulty
Less uniform sets are easier to choose from
Latent feature diversification easy to implement
Diversity can improve choice satisfaction
Even when the diversified list has movies with lower
predicted ratings than standard top-N lists
No need for larger item sets
Offering personalized diversified small item sets might be the key
to help decision makers cope with too much choice!
Psychological theory can inform how to improve the
output of Recommender algorithms
48. What you should take away…
Psychological theory can inform new ways of diversifying
algorithm output or eliciting preferences
But also: working with recommenders and algorithms we could
enhance psychological theory: personalized item sets give control
User-centric evaluation helps to assess the effectiveness
Lot of work… and we need user studies…
But linking subjective to objective measures might help future
studies that cannot do user studies
User-centric framework allows us to understand WHY
particular approaches work or not
Concept of mediation: user perception helps understanding..
This is a problem that has been solved already…
We can cope with information overload by using recommenders
But once we get a large set of good recommendations, we might go from information overload to choice overload
Iyengar and Lepper set up two different types of tasting booths in a big supermarket
More people were attracted to the booth with more jams, but fewer of the people visiting that booth actually bought jam, compared to the booth with fewer jams
For the sake of time, only the standard choice situation, comparing 5 with 20 items
People used a movie recommender, and got either a 5 or 20 item personalized list.
The 20-item list (compared to the 5-item list) was perceived to be more varied and attractive, and therefore more satisfactory, but also more difficult to choose from, reducing the satisfaction. In the end, the 5 and 20 items were just as satisfactory…
Accurate recommendations often come at the cost of offering options that are too similar
We do not search for the highest accuracy but for the underlying mechanisms!
Our ultimate goal is to give users the highest satisfaction with the least difficulty, for this we need to manipulate underlying aspects of the stimulus set.
A deeper look into the diversity / accuracy tradeoff and the advantages of diversification (using the right algorithm and the right Psy assumptions)
If not personalized, it is not guaranteed that the set is equally attractive to all users. Some users will experience difficulty, others might not.
See Chernev
User vector maximum difference in preference
The key to manipulating density is to look at the latent features as they are used in modern recommender systems: these are not just technical tricks to reduce the dimensionality of a set of sparse data; as they derive from preferences, they might be related to meaningful preference dimensions
Try to fill in the blanks
Use PCA like techniques to find mapping
Extend this slide with the idea that there are many movies within these planes between which we can diversify…
Need a set of items that are sufficiently attractive and with not too high variance in attractiveness, but large enough to diversify
We take the top 200 predicted for this person; these are all movies they will like. Predicted ratings are typically between 4 and 5 stars, with a half-star range.
Avg pred. rating Top 100: 4.58, range = .431
3 levels of diversification because of U-shaped model
Manipulate number of items to see if diversity increases with the number of items and perhaps the impact of diversification is stronger for smaller lists
Maximizer not related to any concept in the model
List length did not affect the variables (due to low N?) Discuss this later in study 2
Factor scores are standardized, the numbers reflect how a change in 1 SD of a factor A influences the score on another factor B
Most important: our diversification algorithm is perceived to be more diverse by our participants.
Higher perceived diversity increases the perceived attractiveness (and a PC of expertise: experts perceive the movies also to be more attractive)
Show graph to see the total effects of the manipulations on diversity and attractiveness
How do these perceptions change the experienced difficulty?
Tradeoff difficulty goes down for high div. (rather than up, as the u-shaped model predicted) but is lower for participants with high preference strength
Choice difficulty drops with diversity and with higher attractiveness, but increases with perceived tradeoff difficulty
The net effect is that choice difficulty reduces linearly with diversity (see graph)
Report the items that came out as related to the constructs in the factor analysis
No relation of the personal characteristics to the concepts in the model (expertise, maximizer and strength of preference)
The manipulations affect diversity and difficulty directly.
The precise effects are somewhat harder to see so we use the total effect plots, as list length interacts with diversity
The effect of high diversity is attenuated for larger sets.
Same for attractiveness, which is related to diversity.
Consistent with the existing literature and with Bollen et al. Choice satisfaction increases with attractiveness and diversity but decreases with choice difficulty
Choice difficulty is a function of diversity, attractiveness and list length. Longer lists are harder, and become more difficult if they are attractive, but less difficult for higher diversity. As only the 5-item list is perceived as more diverse, we see that for this list length the difficulty is lower. (show graph)
Choice satisfaction increases with attractiveness, high diversity and list length, but decreases with difficulty.
The 10 and 20 item lists are more satisfying, without much difference between high and low diversity, but for 5 items, we see that the highly diversified 5 items are as satisfactory as the 10 and 20 items (i.e. the high diversity makes this list more attractive and less difficult, offsetting the negative effect of list length)