Presentation of our long paper at the #RecSys2017 conference on Recommender Systems in Como, presented by Alain Starke. It shows how the psychometric Rasch model can enhance user recommendations in the energy domain.
In collaboration with Martijn Willemsen & Chris Snijders - Eindhoven University of Technology.
Starke2017 - Effective User Interface Designs to Increase Energy-efficient Behavior in a Rasch-based Energy Recommender System
1. Effective User Interface Designs
to Increase Energy-efficient Behavior
in a Rasch-based
Energy Recommender System
Alain Starke, Martijn Willemsen, Chris Snijders
Human-Technology Interaction Group,
Eindhoven University of Technology
2. Central question
Can we design a recommender
interface which effectively supports a
user’s energy-saving goals?
3. Most consumers perform simple behavioral
changes and think these are effective
(Attari et al., 2010)
5. How can we recommend from such a
diverse set of energy-saving measures?
6. Regular RecSys approaches, e.g. collaborative filtering,
are prone to reinforcing current behavior
• If we want consumers and users to achieve (energy-
saving) goals, we should not only focus on past
behavior but ‘move forward’ (cf. Ekstrand & Willemsen, 2016)
• We need a model which considers future goals
7. ‘Saving energy’ can be considered
as an ordinal item-response model
(in our case: a Rasch model)
8. Energy-saving measures can be ordered as
increasingly difficult behavioral steps towards
attaining the goal of saving energy
(Kaiser et al., 2010; Urban & Scasny, 2014)
9. These steps reflect willingness & capacity to
save energy: a person’s energy-saving ability
(Kaiser et al., 2010; Urban & Scasny, 2014)
10. We infer behavioral difficulties
based on engagement frequencies
[Figure: persons indicate which measures they perform (input); measures are ordered from easy / popular to difficult / obscure]
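To illustrate how difficulties can follow from engagement frequencies, here is a minimal sketch. It uses a centred log-odds approximation rather than a full Rasch estimation, and it is not the procedure used in the paper. The function name `approx_difficulties` and the toy response matrix are hypothetical.

```python
import math

def approx_difficulties(responses):
    """Approximate item difficulties from binary responses.

    responses: list of per-person lists with 1 (performs measure) or 0.
    Returns centred log-odds values: rarely performed measures score higher.
    This is a rough approximation, not a full Rasch estimation.
    """
    n_persons = len(responses)
    n_items = len(responses[0])
    raw = []
    for j in range(n_items):
        performed = sum(person[j] for person in responses)
        # Smooth the proportion so measures performed by everyone
        # (or by no one) still get a finite value.
        p = (performed + 0.5) / (n_persons + 1.0)
        raw.append(math.log((1.0 - p) / p))
    mean = sum(raw) / n_items
    return [d - mean for d in raw]

# Toy data (hypothetical): 4 persons, 3 measures.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]
difficulties = approx_difficulties(responses)
```

In this toy example the first measure is performed by everyone and gets the lowest difficulty, while the third is performed by one person and gets the highest.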
11. In a similar vein, we infer
energy-saving abilities
[Figure: persons indicate which measures they perform (input); persons are ordered from low ability (performs few) to high ability (performs many)]
13. One’s energy-saving ability is a good starting
point to look for appropriate measures
A person has a 50% probability of performing a measure
with a difficulty equal to his/her ability
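The 50% statement follows from the standard dichotomous Rasch model, in which the probability of performing a measure depends only on the difference between ability and difficulty (both in logits). A minimal sketch, with `rasch_probability` as an illustrative name:

```python
import math

def rasch_probability(ability, difficulty):
    """Probability that a person performs a measure under the Rasch model.

    Both arguments are on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# When difficulty equals ability, the probability is one half.
p_equal = rasch_probability(0.3, 0.3)
# A measure that is harder than the person's ability is less likely.
p_harder = rasch_probability(0.3, 1.3)
```

Measures below a person's ability have a probability above 50%, and measures above it fall below 50%.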
15. This paper
We developed two recommender interfaces for a
Rasch scale of 79 measures, to examine whether
it can effectively support the selection and
adoption of energy-saving measures
16. Two recommender user studies
Study 1:
• Using a Rasch scale, are ability-tailored
recommendations more satisfactory and
effective than non-personalized suggestions?
Study 2:
• How should advice be tailored around a user’s
ability to support energy-efficient behavior?
• Can persuasive interface aspects support this?
20. Research design
• Abilities were estimated using 13 behavioral self-report items
(cf. Bond & Fox, 2006; Starke et al., 2015)
• We compared four different types of advice:
4 between-subject conditions
– Non-personalized, ascending difficulty order (‘Most popular’)
– Non-personalized, descending difficulty order (‘Most difficult’)
– Ability-tailored, ascending difficulty order
– Ability-tailored, descending difficulty order
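One common way to estimate a person's ability from a handful of self-report items is maximum likelihood with known item difficulties. The sketch below assumes a dichotomous Rasch model and uses Newton-Raphson. The slides do not specify the estimation procedure, so the function `estimate_ability` and the three example items are hypothetical.

```python
import math

def estimate_ability(answers, difficulties, iterations=50):
    """Maximum-likelihood ability estimate for known item difficulties.

    answers: list of 0/1 responses; difficulties: matching logit values.
    Assumes a dichotomous Rasch model (an assumption, not from the slides).
    """
    score = sum(answers)
    if score == 0 or score == len(answers):
        # The likelihood has no finite maximum for extreme scores.
        raise ValueError("ML estimate is infinite for all-0 or all-1 answers")
    theta = 0.0
    for _ in range(iterations):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        gradient = score - sum(probs)            # observed minus expected score
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        theta += step
        if abs(step) < 1e-10:
            break
    return theta

# Hypothetical items: the person performs the two easier measures.
item_difficulties = [-1.0, 0.0, 1.0]
theta = estimate_ability([1, 1, 0], item_difficulties)
```

At the estimate, the expected number of performed measures equals the observed number, which is the defining property of the Rasch maximum-likelihood solution.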
21. Dependent measures
Users interacting with the website
• Behavioral difficulty of chosen measures
• Number of chosen measures
• Clicking behavior
Evaluative Survey (7-point Likert scale)
• Perceived System Support
• Choice Satisfaction
• Perceived Effort
Survey sent to users after 4 weeks
• Extent of implementation of chosen measures (4-point
scale)
22. We evaluated the recommender using the
user experience framework
(Knijnenburg & Willemsen, 2015 – Evaluating Recommender Systems with User Experiments)
23. Participants & analysis
• 209 research panel participants used our interface & survey
• 78 participants completed the follow-up survey four weeks later
• Analysis: Structural Equation Modelling, using confirmatory
factor analysis for the user experience aspects
25. Ability-tailored advice was perceived as less
effortful and – in turn – more supportive &
satisfactory
[Path model: Tailored rec’s → Perceived Effort (−.440*); Perceived Effort → Perceived Support (−.767***); Perceived Support → Choice Satisfaction (.746***)]
*** p < 0.001, ** p < 0.01, * p < 0.05.
27. User experience aspects reflected interface behavior
• Users perceiving system support selected more (easy) items
• Behavioral follow-up was higher for easy items
[Path model. Nodes: Tailored rec’s, Perceived Effort, Perceived Support, Choice Satisfaction, Chosen per click, No. of chosen items, Difficulty of chosen items, % Executed items. Path coefficients shown: −.440*, −.767***, .746***, .239***, .196***, .139**, −.113**, −.312**, −.068**]
*** p < 0.001, ** p < 0.01, * p < 0.05.
28. Lessons learned
• Ability-tailored advice was a more effective
approach than simply using the Rasch scale
• Ambiguous results for behavioral follow-up
– Easy (feasible) measures might lead to more energy-
efficient behavior in the long run
– Difficult (novel) measures had a positive effect on
choice satisfaction
29. Study 2: How should advice be tailored to
support energy-efficient choices?
(And can fit scores help to persuade users to
pick more challenging measures?)
31. Web shop interface with three lists (tabs):
‘Base’, ‘Recommended’ and ‘Challenging’
• ‘Recommended’ contains the 15 best-matching measures,
with fit scores ranging from 100% down to 60%
• ‘Base’ measures are easier, ‘Challenging’ measures more difficult
32. 3x2 Between-subject research design
• 3 levels of difficulty, determining contents of the
‘recommended’ list:
– Easy / below ability (~75% probability)
– Ability-tailored (~50% probability)
– Difficult / above ability (~25% probability)
• 2 levels of fit score: they were either shown or not
– The 100% score was consistent with the difficulty condition
– E.g. in the easy condition, measures below a user’s ability
(75%) had a 100% match score
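The three difficulty levels can be read as offsets on the Rasch logit scale: a target probability p corresponds to a difficulty of ability − ln(p / (1 − p)), so about 1.10 logits below ability for 75% and 1.10 logits above it for 25%. The sketch below assumes that reading. The helper names, the toy measures and the nearest-difficulty selection are hypothetical, and the fit-score computation itself is not reproduced.

```python
import math

def target_difficulty(ability, probability):
    """Difficulty at which a person with this ability performs a measure
    with the given probability under the Rasch model."""
    return ability - math.log(probability / (1.0 - probability))

def nearest_measures(difficulties, target, k=15):
    """Names of the k measures whose difficulty lies closest to the target.

    difficulties: dict mapping measure name to difficulty in logits.
    """
    ranked = sorted(difficulties, key=lambda name: abs(difficulties[name] - target))
    return ranked[:k]

ability = 0.0
easy_target = target_difficulty(ability, 0.75)       # below ability
tailored_target = target_difficulty(ability, 0.50)   # equal to ability
hard_target = target_difficulty(ability, 0.25)       # above ability

# Hypothetical measures with made-up difficulties.
measures = {"measure_a": -1.2, "measure_b": 0.1, "measure_c": 1.0}
easy_pick = nearest_measures(measures, easy_target, k=1)
```

With an ability of 0, the easy condition centres on measures near −1.10 logits, the tailored condition on 0, and the difficult condition on +1.10.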
33. Participants, procedure, analysis…
• 288 participants used our interface and completed the survey
• 46 participants reported behavioral follow-up
• Procedure & analysis: Similar to study 1
– We now measure perceived feasibility instead of effort
35. Easy recommendations were perceived as
feasible and, in turn, supportive & satisfactory
SEM Statistics: χ²(140) = 198.693, p < 0.001, CFI = 0.992, TLI = 0.990
[Path model: Rec difficulty → Perceived Feasibility (−.469***)]
*** p < 0.001, ** p < 0.01, * p < 0.05.
36. Easy recommendations were perceived as
feasible and, in turn, supportive & satisfactory
SEM Statistics: χ²(140) = 198.693, p < 0.001, CFI = 0.992, TLI = 0.990
[Path model. Nodes: Rec difficulty, Perceived Feasibility, Perceived Support, Choice Satisfaction. Rec difficulty → Perceived Feasibility (−.469***); further path coefficients shown: .506***, .234***, .221***]
*** p < 0.001, ** p < 0.01, * p < 0.05.
37. • Users who felt supported selected more measures
• Satisfied users showed a higher % of follow-up
[Path model. Nodes: Rec difficulty, Perceived Feasibility, Perceived Support, Choice Satisfaction, No. of chosen items, Difficulty of chosen items, % Executed items. Rec difficulty → Perceived Feasibility (−.469***); further path coefficients shown: .506***, .234***, .221***, .385***, −.113**]
*** p < 0.001, ** p < 0.01, * p < 0.05.
38. Users chose slightly more measures
when presented with easier ones
(Showing fit scores did not really matter)
39. Fit scores boosted satisfaction levels for easy
measures, but backfired for difficult ones
40. Lessons learned
• A satisfactory user interface can lead to the adoption of more
energy-saving measures (within system + after 4 weeks)
• Easy tailored measures seem to be attractive, as they were
perceived as feasible and chosen more often
• Fit scores were merely self-reinforcing; they did not persuade
users to attain ‘more difficult goals’
41. Wrap-up & suggestions
• We presented a novel approach to recommendations,
using an ordinal Rasch scale in a user study, which also
measured behavioral follow-up
• A ‘light personalization’ algorithm had an effect on
behavioral change; there’s room for more
• Rasch-based interfaces can be effective in supporting
actual behavior; perhaps in other domains too?