In the online world, user engagement refers to the quality of the user experience that emphasizes the phenomena associated with wanting to use an application longer and frequently. This talk looks at the role of Big Data in measuring user engagement. It does so through two case studies on using absence time, within sessions and across sessions.
Presentation at "Data-Driven Business Day" at Strata + HW Barcelona 2014.
4. What is user engagement?
“User engagement is a quality of the user
experience that emphasizes the phenomena
associated with wanting to use a technological
resource longer and frequently” (Attfield et al, 2011)
self-report: happy, sad,
enjoyment, …
analytics: click, upload,
read, comment, share …
physiology: gaze, body heat,
mouse movement, …
emotional, cognitive and behavioural connection
that exists, at any point in time and over time, between
a user and a technological resource
4
focus of this talk … Big Data
5. Why is it important to engage users?
§ In today’s wired world, users have enhanced expectations
about their interactions with technology
… resulting in increased competition amongst the
purveyors and designers of interactive systems.
§ In addition to utilitarian factors, such as usability, we must
consider the hedonic and experiential factors of interacting
with technology, such as fun, fulfillment, play, and user
engagement.
(O’Brien, Lalmas & Yom-Tov, 2014)
6. Online sites differ with respect to
their engagement pattern
Games
Users spend
much time per
visit
Search
Users come
frequently and
do not stay long
Social media
Users come
frequently and
stay long
Niche
Users come on
average once
a week e.g. weekly
post
News
Users come
periodically,
e.g. morning and
evening
Service
Users visit site,
when needed,
e.g. to renew
subscription
(Lehmann etal, 2012)
7. Why is it important interpret user
engagement measurement well?
CTR
new version
8. Characteristics of user engagement
Focused attention
Positive Affect
Aesthetics
Endurability
Novelty
Richness and control
Reputation, trust and
expectation
Motivation, interests,
incentives, and benefits
(Attfield etal, 2011)
9. Large scale measurements – analytics
• within-session engagement measures success in attracting user to the
site.
• across-session engagement measured by observing user long-term
value.
within-session measures across-session measures
• Dwell time
• Session duration
• Bounce rate
• Play time (video)
• Click through rate (CTR)
• Number of pages viewed
• Conversion rate
• Number of UCG (comments).
• Time between visits
• …
• Fraction of return visits
• Total view time per month (video)
• Lifetime value (number of actions)
• Number of sessions per unit of time
• Total usage time per unit of time
• Number of friends on site (social
networks)
• Number of UCG (comments)
• Time between visits
• …
loyalty
popularity
activity
10. Two case studies: Absence time
news facebook news news mail news
absence time
within sessions across sessions
12. The context – Online multitasking
Browsing the “old way”
1min 2min 1min 3min
facebook news news news news mail
Dwell time during a visit on news site: 7min on average
news site
Browsing “nowadays”
1min 2min 1min 3min
news facebook news news mail news
Dwell time during a visit on news site: 2.33min on average (1min | 3min | 3min)
13. The dataset … Big Data
› July 2012
› 2.5M users
› 785M page views
› Categorization of the most
frequent accessed sites
• 11 categories (e.g. news), 33
subcategories (e.g. news finance,
news society)
• 760 sites from 70 countries/regions
short sessions: average 3.01 distinct sites visited with revisitation rate 10%
long sessions: average 9.62 distinct sites visited with revisitation rate 22%
14. Time spent at each revisit
0.33
0.28
0.23
mail sites social media sites news (finance) sites news (tech) sites
p-value = 0.09
m = -0.01
p-value = 0.07
m = -0.02
p-value = 0.79
m = 0.00
1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
100%
50%
0%
decreasing attention increasing attention constant attention complex attention
Teleportation Hyperlinking Backpaging
Proportion of total
dwell time on site
Percentage
of navigation type
ith visit on site ith visit on site ith visit on site ith visit on site
Four patterns: decreasing, increasing, constant and complex
Backpaging increasingly used at each revisit
Metrics:
1. AttShift: 1 (-1) when time spent at each revisit increases (decreases);
0 when no pattern
2. AttRange: 0 when time spent at each revisit is constant
15. Time spent between each visit
• 50% of sites are revisited after
less than 1min
interruption of a task?
• There are revisits after a long
break
returning to a site to perform a
new task?
1.00
0.75
0.50
0.25
0.00
news (finance)
news (tech)
social media
mail
10ï2 10ï1 100 101 102
Cumulative probability
Absence time [min]
2nd
1st 3rd
… visit
absence time
Metric:
1. CumAct: increases with time spent between visits
17. One task during a session
C2: 108 sites
auctions, front page,
shopping, dating
0.75
0.25
-0.25
-0.75
C1: 172 sites
mail, maps, news,
Visitdt [min] CumActdt,3 AttShiftdt,4 AttRangedt,4 Sessiondt [min]
news (soc.)
0.75
0.25
-0.25
-0.75
§ High dwell time per visit and during
entire session
§ Users return to continue a task
(short absence time)
§ C1: attention is shifting to another
site
§ C2: attention is shifting slowly
towards the site
18. Several tasks during a session
C4: 74 sites
front page, search,
download
C3: 156 sites
auctions, search,
front page, shopping
0.75
0.25
-0.25
-0.75
0.75
0.25
-0.25
-0.75
§ Users perform several tasks during
a session
§ No simple activity pattern
§ C3: Dwell time per visit is low, but
dwell time per session is high
Visitdt [min] CumActdt,3 AttShiftdt,4 AttRangedt,4 Sessiondt [min]
19. Sites with low activity
C5: 166 sites
service, download,
blogging, news (soc.)
0.75
0.25
-0.25
-0.75
§ Users do not spend a lot of time on
these sites
§ Time between visits is short
§ Attention is shifting towards the site
Visitdt [min] CumActdt,3 AttShiftdt,4 AttRangedt,4 Sessiondt [min]
20. Browsing behavior can differ between
sites of the same category
C2: 108 sites
auctions, front page,
shopping, dating
0.75
0.25
-0.25
-0.75
C3: 156 sites
auctions, search,
front page, shopping
0.75
0.25
-0.25
-0.75
§ C2: users visit site once to perform
their task
§ C3: users visit site several times to
perform task(s)
Visitdt [min] CumActdt,3 AttShiftdt,4 AttRangedt,4 Sessiondt [min]
21. The message – absence time within
sessions
Online multitasking happens
How it affects a site depends
on the site
Towards a taxonomy of online multitasking
One main continuous task versus user loyalty
New metrics that account for time spent between visits and at each visit
Applications: story-focused reading & network of sites
(Lehmann etal, 2013)
25. Absence time and survival analysis
Users (%) who read story 2 but did not come back after 10 hours
story 1
story 2
story 3
story 4
story 5
story 6
story 7
story 8
story 9
SURVIVE
0 5 10 15 20
0.0 0.2 0.4 0.6 0.8 1.0
Users (%) who did come back
DIE
DIE = RETURN TO SITE èSHORT ABSENCE TIME
26. The dataset … Big Data
Ranking function on Yahoo Answer Japan
Two-weeks click data on Yahoo Answer Japan: search
One millions users
Six ranking functions
30-minute session boundary
27. Examples of search session metrics
§ Number of clicks
§ Click at given position
§ Time to first click
§ Skipping
§ Abandonment rate
§ Number of query reformulations
28. Absence time and number of clicks on
search result page
survival analysis: high hazard rate (die quickly) = short absence
5 clicks
control = no click
3 clicks
29. Absence time – search experience
search session metrics absence time
1. No click means a bad user experience
2. Clicking between 3-5 results leads to same user experience
3. Clicking on more than 5 results reflects poorer user experience; users
cannot find what they are looking for
4. Clicking lower in the ranking (2nd, 3rd) suggests more careful choice from
the user (compared to 1st)
5. Clicking at bottom is a sign of low quality overall ranking
6. Users finding their answers quickly (click sooner) return sooner to the
search application
7. Returning to the same search result page is a worse user experience than
reformulating the query
30. The message – absence time across
sessions
short absence is
a sign of loyalty
important indication
of user satisfaction
with search portal
(Dupret & Lalmas, 2013)
Easy to implement and
interpret
Can compare many things in
one go
No need to estimate
baselines
But need lots of data to
account for noise
Applications: direct display &
churn rate & advertising
32. § “If you cannot measure it, you cannot
improve it” – William Thomson (Lord Kelvin)
§ “You cannot control what you cannot
measure” – DeMarco
§ “The way you measure is more important
than what you measure” – Art Gust