This presentation examines user satisfaction with intelligent assistants. It proposes three main usage scenarios: controlling a device, web search, and structured search dialogue. It describes a user study designed to measure satisfaction in each scenario and to identify the key factors, such as effort, that determine it. It also aims to characterize "good abandonment" in web search and to examine how query-level satisfaction relates to overall satisfaction in search dialogue. The presentation concludes that effort is a key component of satisfaction across scenarios, that measuring good abandonment requires interaction signals beyond clicks and reformulation, and that session context matters.
Understanding User Satisfaction with Intelligent Assistants
1. Understanding User Satisfaction with Intelligent Assistants
Julia Kiseleva, Kyle Williams, Jiepu Jiang, Ahmed Hassan Awadallah,
Aidan C. Crook, Imed Zitouni, Tasos Anastasakos
Eindhoven University of Technology
Pennsylvania State University
University of Massachusetts Amherst
Microsoft
CHIIR '16, Chapel Hill, NC, USA
2. User's dialogue with Cortana: Task is "Finding a hotel in Chicago"
Q1: how is the weather in Chicago
Q2: how is it this weekend
Q3: find me hotels
Q4: which one of these is the cheapest
Q5: which one of these has at least 4 stars
Q6: find me directions from the Chicago airport to number one
3. User's dialogue with Cortana: Task is "Finding a pharmacy"
Q1: find me a pharmacy nearby
Q2: which of these is highly rated
Q3: show more information about number 2
Q4: how long will it take me to get there
Q5: Thanks
5. Controlling Device
• Call a person
• Send a text message
• Check on-device calendar
• Open an application
• Turn on/off wi-fi
• Play music
13. Search Dialogue
User: "show restaurants near me"
Cortana: "Here are ten restaurants near you"
User: "show the best restaurants near me"
Cortana: "Here are ten restaurants near you that have good reviews"
14. Search Dialogue
User: "show restaurants near me"
Cortana: "Here are ten restaurants near you"
User: "show the best restaurants near me"
Cortana: "Here are ten restaurants near you that have good reviews"
User: "show directions to the second one"
Cortana: "Getting you direction to the Mayuri Indian Cuisine"
15. Research Questions
• RQ1: What are characteristic types of scenarios of use?
• RQ2: How can we measure different aspects of user satisfaction?
• RQ3: What are key factors determining user satisfaction for the different scenarios?
• RQ4: How to characterize abandonment in the web search scenario?
• RQ5: How does query-level satisfaction relate to overall user satisfaction for the search dialogue scenario?
User Study
19. User Study Participants
• 60 participants
• Age: 25.53 +/- 5.42 years
• Gender: Male 75%, Female 25%
• Language: English 55%, Other 45%
• Education: Computer Science 82%, Electrical Engineering 8%, Mathematics 2%, Other 8%
20. User Study Design
• Video instructions (same for all participants)
• Tasks are realistic, mined from Cortana logs:
o Control type of tasks
o Queries where users don't click
o Search dialogue tasks: mostly localization-type queries
22. You are planning a vacation. Pick a place. Check if the weather is good enough for the period you are planning the vacation. Find a hotel that suits you. Find the driving directions to this place.
24. Questionnaire: Controlling Device
• Were you able to complete the task?
o Yes/No
• How satisfied are you with your experience in this task?
o 5-point Likert scale
• How well did Cortana recognize what you said?
o 5-point Likert scale
• Did you put in a lot of effort to complete the task?
o 5-point Likert scale
5 Tasks
20 Minutes
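The four questions above map naturally onto a small record type. The following is a minimal sketch in Python, assuming each Likert rating is coded as an integer from 1 to 5 (the slides do not state the coding or its direction). The class and field names are hypothetical and do not come from the study.

```python
from dataclasses import dataclass

LIKERT_MIN, LIKERT_MAX = 1, 5  # assumed coding of the 5-point scale


@dataclass(frozen=True)
class ControlTaskResponse:
    """One participant's answers for one device-control task (hypothetical schema)."""
    completed: bool    # "Were you able to complete the task?"
    satisfaction: int  # "How satisfied are you with your experience in this task?"
    recognition: int   # "How well did Cortana recognize what you said?"
    effort: int        # "Did you put in a lot of effort to complete the task?"

    def __post_init__(self) -> None:
        # Reject ratings that fall outside the assumed 5-point scale.
        for name in ("satisfaction", "recognition", "effort"):
            value = getattr(self, name)
            if not LIKERT_MIN <= value <= LIKERT_MAX:
                raise ValueError(
                    f"{name} must be in {LIKERT_MIN}..{LIKERT_MAX}, got {value}"
                )
```

Validating at construction time keeps out-of-range ratings from silently entering any later analysis of effort against satisfaction.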
26. Questionnaire: Good Abandonment
• Were you able to complete the task?
o Yes/No
• Where did you find the answer?
o Answer Box, Image, SERP, Visited Website
• Which query led you to finding the answer?
o First, Second, Third, >= Fourth
• How satisfied are you with your experience in this task?
o 5-point Likert scale
• Did you put in a lot of effort to complete the task?
o 5-point Likert scale
5 Tasks
20 Minutes
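One possible way to turn these answers into a label is to treat a task as good abandonment when the participant completed it and found the answer without visiting a website. The sketch below works under that assumption. The rule, the function name, and the enum are illustrative and are not the authors' definition.

```python
from enum import Enum


class AnswerSource(Enum):
    """Answer locations offered by the questionnaire."""
    ANSWER_BOX = "Answer Box"
    IMAGE = "Image"
    SERP = "SERP"
    VISITED_WEBSITE = "Visited Website"


def is_good_abandonment(completed: bool, source: AnswerSource) -> bool:
    """Assumed rule: the task succeeded and no result page had to be opened."""
    return completed and source is not AnswerSource.VISITED_WEBSITE
```

For example, `is_good_abandonment(True, AnswerSource.ANSWER_BOX)` is true under this rule, while a completed task whose answer came from a visited website is not counted.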
28. Questionnaire: Search Dialogue
• Were you able to complete the task?
o Yes/No
• How satisfied are you with your experience in this task?
o If the task has sub-tasks, participants indicate their graded satisfaction, e.g.
o a. How satisfied are you with your experience in finding a hotel?
o b. How satisfied are you with your experience in finding directions?
• How well did Cortana recognize what you said?
o 5-point Likert scale
• Did you put in a lot of effort to complete the task?
o 5-point Likert scale
8 Tasks: 1 simple, 4 with 2 subtasks, 3 with 3 subtasks
30 Minutes
30. Search Dialogue Dataset
• 540 tasks that incorporated 2,040 queries, of which 1,969 were unique
• The average query length is 7.07
• The simple task generated 130 queries in total
• Tasks with 2 context switches generated 685 queries
• Tasks with 3 context switches generated 1,355 queries
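Two ratios follow directly from the totals on this slide. The short sketch below derives them from the reported task and query counts only. The variable names are hypothetical.

```python
# Totals as reported on the slide.
TASKS = 540
QUERIES = 2040
UNIQUE_QUERIES = 1969

# Derived ratios (simple arithmetic on the totals above).
queries_per_task = QUERIES / TASKS
unique_share = UNIQUE_QUERIES / QUERIES

print(f"{queries_per_task:.2f} queries per task")   # 3.78 queries per task
print(f"{unique_share:.1%} of queries are unique")  # 96.5% of queries are unique
```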
35. Search Dialogue Satisfaction
RQ5: How does query-level satisfaction relate to overall user satisfaction for the structured search dialogue scenario?
36. Search Dialogue
User: "show restaurants near me"
Cortana: "Here are ten restaurants near you"
User: "show the best restaurants near me"
Cortana: "Here are ten restaurants near you that have good reviews"
User: "show directions to the second one"
Cortana: "Getting you direction to the Mayuri Indian Cuisine"
[Figure: each step of the dialogue is annotated with "SAT?", and the session as a whole with "Overall SAT?"]
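To make RQ5 concrete: given a satisfaction rating for each query in a session, several simple aggregates can be compared against the overall rating. The sketch below assumes ratings on the 5-point scale coded 1 to 5. The three aggregates (mean, minimum, last query) are illustrative candidates, not the analysis reported by the authors, and the function name is hypothetical.

```python
from statistics import mean


def aggregate_query_sat(query_sat: list[int]) -> dict[str, float]:
    """Summarise the per-query ratings of one session in three candidate ways."""
    if not query_sat:
        raise ValueError("a session needs at least one rated query")
    return {
        "mean": mean(query_sat),  # every step counts equally
        "min": min(query_sat),    # the worst step dominates
        "last": query_sat[-1],    # only the final step matters
    }


# Hypothetical ratings for a three-query session (not study data).
print(aggregate_query_sat([2, 4, 5]))
```

Comparing each aggregate with the participant's overall rating is one way to probe whether every step of the session matters or only the outcome.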
41. User's dialogue with Cortana related to the "stomach ache" problem
Q1: what do you have medicine for the stomach ache
Q2: stomach ache medicine over the counter
Q3: show me the nearest pharmacy
Q4: more information on the second one
Q5: do they have a stool softener
Q6: does Fred Meyer have stool softeners
Combination of scenarios: General Search, Search Dialogue
42. Conclusions (1)
• RQ1: What are characteristic types of scenarios of use?
o We proposed three main types of scenarios
• RQ2: How can we measure different aspects of user satisfaction?
o We designed a series of user studies tailored to the three scenarios
• RQ3: What are key factors determining user satisfaction for the different scenarios?
o Effort is a key component of user satisfaction across the different intelligent assistants scenarios
43. Conclusions (2)
• RQ4: How to characterize abandonment in the web search scenario?
o We concluded that to measure good abandonment we need to investigate the other forms of interaction signals that are not based on clicks or reformulation
• RQ5: How does query-level satisfaction relate to overall user satisfaction for the search dialogue scenario?
o We looked at user satisfaction as "a user journey towards an information goal where each step is important," and showed the importance of session context
44. Questions?
• We proposed three main types of scenarios of use
• We designed a series of user studies tailored to the three scenarios
• Effort is a key component of user satisfaction across the different intelligent assistants scenarios
• We concluded that to measure good abandonment we need to investigate the other forms of interaction signals that are not based on clicks or reformulation
• We looked at user satisfaction as "a user journey towards an information goal where each step is important," and showed the importance of session context on user satisfaction