Master Minds on Data Science - Maarten de Rijke

Amsterdam Data Science brings together data
science researchers and practitioners from
academia, industry, and government to share
ideas, pursue research, and foster talent
Amsterdam Data Science
Maarten de Rijke
http://amsterdamdatascience.nl

Key facts and figures
- Knowledge leaders in key technology areas: computer vision, knowledge representation, information retrieval, machine learning, security
- Four knowledge institutes, 300+ researchers; application domains: business analytics, health and life science, creative industries, communication

[Figure: map of Amsterdam Data Science partner institutes (VUmc, HvA, CWI, VU, UvA) across Life science, Social analytics, Business analytics, Digital humanities, and Informatics; units labeled A to N]

Personalization & content
recommendation
Maarten de Rijke

Background

http://bakadesuyo.bakadesuyo.netdna-cdn.com/wp-content/uploads/2012/12/20110504181819_books.jpg

http://archive.rusbase.com/media/blogs/apps.png

http://i.huffpost.com/gen/1293353/images/o-TV-facebook.jpg

https://beingjaffa.files.wordpress.com/2013/10/facebook-faces.jpg

?

Recommender systems
- For consumer
  - Find things that are interesting
  - Narrow down the set of choices
  - Discover new things
  - Entertainment
  - …
- For producer
  - Personalized service for the customer
  - Increase trust and customer loyalty
  - Increase sales, click-through rates, conversion, etc.
  - Opportunities for promotion, persuasion
  - Obtain more knowledge about customers
  - …

https://en.wikipedia.org/wiki/Recommender_system

Recommender systems
- Given
  - a user, items
- return
  - a ranked list of items (that are most interesting, relevant, entertaining, useful, …)
- How is this different from a regular search engine?

The user is
the query

So many criteria

So many ways of ranking items that are useful, interesting, relevant, …
- Collaborative filtering
  - Build a model from a user's past behavior (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. Use the model to predict items (or ratings for items) that the user may be interested in
  - Example
    - Last.fm creates a “station” of recommended songs by observing which bands and individual tracks the user has listened to on a regular basis and comparing those against the listening behavior of other users
    - Last.fm will play tracks that do not appear in the user's library but are often played by other users with similar interests
  - Cold start problem (new users, new items)
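
The collaborative filtering idea above can be sketched in a few lines. This is a minimal user-based variant, not Last.fm's actual method: the `plays` data, the track names, and the functions `cosine` and `recommend` are all hypothetical and chosen only for illustration.

```python
from math import sqrt

# Hypothetical play counts: user -> {track: count}. Illustrative data only.
plays = {
    "ann": {"track_a": 5, "track_b": 3, "track_c": 4},
    "bob": {"track_a": 4, "track_b": 3, "track_d": 5},
    "cat": {"track_c": 2, "track_e": 5},
}

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user, data, k=3):
    """Rank tracks the user has not played by similarity-weighted counts of other users."""
    scores = {}
    for other, items in data.items():
        if other == user:
            continue
        sim = cosine(data[user], items)
        for item, count in items.items():
            if item not in data[user]:
                scores[item] = scores.get(item, 0.0) + sim * count
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("ann", plays))
```

A user with an empty history has similarity 0 to everyone, so every candidate scores 0 and the ranking carries no information. That is the cold start problem in miniature.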

- Content-based: use a set of characteristics of an item to recommend additional items with similar properties…
  - Pandora uses the properties of a song or artist (a subset of the 400 attributes provided by the Music Genome Project) to seed a “station” that plays music with similar properties
  - User feedback is used to refine the station's results, deemphasizing certain attributes when a user “dislikes” a particular song and emphasizing other attributes when a user “likes” a song
  - Needs some sort of content descriptors
  - Cold start problem (new users)
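
A rough sketch of the content-based approach, assuming each song is described by a small numeric attribute vector. The three attributes, the `songs` catalogue, and the like/dislike profile update are assumptions made for illustration; they are not Pandora's data or algorithm.

```python
from math import sqrt

# Hypothetical attribute vectors (say: tempo, acoustic, vocals). Not Music Genome data.
songs = {
    "song_a": [1.0, 0.0, 1.0],
    "song_b": [0.9, 0.1, 0.8],
    "song_c": [0.0, 1.0, 0.0],
    "song_d": [0.1, 0.9, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_profile(liked, disliked, catalog):
    """Add attribute vectors of liked songs and subtract those of disliked songs."""
    dims = len(next(iter(catalog.values())))
    profile = [0.0] * dims
    for name in liked:
        profile = [p + x for p, x in zip(profile, catalog[name])]
    for name in disliked:
        profile = [p - x for p, x in zip(profile, catalog[name])]
    return profile

def recommend(liked, disliked, catalog):
    """Rank songs without feedback by similarity to the user's attribute profile."""
    profile = build_profile(liked, disliked, catalog)
    seen = set(liked) | set(disliked)
    candidates = [s for s in catalog if s not in seen]
    return sorted(candidates, key=lambda s: cosine(profile, catalog[s]), reverse=True)

print(recommend(["song_a"], ["song_c"], songs))
```

New items are not a problem here as long as they come with attribute vectors; a new user with no likes or dislikes still has an all-zero profile.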

- Knowledge-based: use a set of constraints expressed by the user
  - I only want black cars
  - No cold-start problem
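
The "black cars" example can be read as a plain constraint filter. The sketch below assumes a hypothetical catalogue with `model`, `color`, and `price` fields and a hypothetical `recommend` helper; because it only needs the stated constraints, it works for a user with no history at all.

```python
# Hypothetical catalogue; the field names are illustrative assumptions.
cars = [
    {"model": "car_1", "color": "black", "price": 18000},
    {"model": "car_2", "color": "red", "price": 15000},
    {"model": "car_3", "color": "black", "price": 25000},
]

def recommend(catalog, constraints, sort_key="price"):
    """Keep items that satisfy every constraint, then order them by sort_key."""
    matches = [item for item in catalog
               if all(check(item) for check in constraints)]
    return sorted(matches, key=lambda item: item[sort_key])

def only_black(car):
    """User constraint: 'I only want black cars'."""
    return car["color"] == "black"

for car in recommend(cars, [only_black]):
    print(car["model"], car["price"])
```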

- Probabilistic methods
  - Given a user/item rating matrix, determine the probability that a user will like a new item (based on past observations)
- Matrix factorization methods
- Principal component analysis
- Popularity
- Trends
- Deep learning
- …
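
Of the methods listed, matrix factorization is easy to illustrate. The sketch below is one common formulation (regularized squared error minimized with stochastic gradient descent), assumed here for illustration rather than taken from the slides; the toy `ratings`, the hyperparameters, and the function names are hypothetical.

```python
import random

# Hypothetical observed ratings as (user index, item index, rating) triples.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]

def factorize(ratings, n_users, n_items, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Fit factors so that dot(P[u], Q[i]) approximates each observed rating."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating of user u for item i."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def mse(P, Q, ratings):
    """Mean squared error over the observed ratings."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)

P, Q = factorize(ratings, n_users=3, n_items=3)
print("training MSE:", mse(P, Q, ratings))
print("predicted rating for user 0, item 2:", predict(P, Q, 0, 2))
```

The prediction for an unobserved pair such as user 0 and item 2 is what a recommender would use to rank items the user has not rated.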

Learning to rank

Recommendations are ranked lists
- We don’t care so much about the ratings as about the order in which recommended results are presented
- Automatically learn to rank
  - Learn an individual ranking of items
  - Learn to combine multiple rankings

Recommendations are ranked lists
- Learning to rank
  - Pointwise
    - Learn a score per item
  - Pairwise
    - Learn pairwise classifications
  - Listwise
    - Directly optimize a whole result page
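
To make the pairwise variant concrete, here is a minimal sketch that learns a linear scoring function from preference pairs using a logistic loss on score differences. The two features, the training pairs, and the function names are assumptions for illustration; pointwise and listwise learners would differ in the loss, not in the final sorting step.

```python
from math import exp

# Hypothetical training pairs: (features of preferred item, features of the other item).
# Assumed features: [popularity, match with the user's interests].
pairs = [
    ([0.2, 0.9], [0.9, 0.1]),
    ([0.5, 0.8], [0.6, 0.2]),
]

def train_pairwise(pairs, n_features, epochs=200, lr=0.1):
    """Learn weights w so that the preferred item of each pair gets the higher score."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # Gradient of log(1 + exp(-margin)) with respect to w is -scale * diff.
            scale = 1.0 / (1.0 + exp(margin))
            w = [wi + lr * scale * di for wi, di in zip(w, diff)]
    return w

def rank(items, w):
    """Sort item names by learned score, best first."""
    def score(name):
        return sum(wi * xi for wi, xi in zip(w, items[name]))
    return sorted(items, key=score, reverse=True)

w = train_pairwise(pairs, n_features=2)
items = {"x": [0.9, 0.1], "y": [0.1, 0.9], "z": [0.5, 0.5]}
print(rank(items, w))
```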

Illustration: listwise learning

Increasingly: learning from implicit
feedback

Bandits
- In many domains items are constantly new (e.g., news recommendation, computational advertising)
- Contextual bandits are online learning algorithms that work by “exploring” the user preference space and “exploiting” the resulting models in serving recommendations
- Active research in finding optimal strategies for ranking items
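
The explore/exploit trade-off can be illustrated with a deliberately simplified sketch. Note the swap: this is an epsilon-greedy bandit without context features, not a contextual bandit, and the click probabilities in `simulate` are made up so that the loop has something to learn from.

```python
import random

class EpsilonGreedy:
    """Epsilon-greedy bandit over a set of items (no context features)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in arms}
        self.values = {arm: 0.0 for arm in arms}  # running mean reward per item

    def add_arm(self, arm):
        """New items can be added at any time and start with no observations."""
        self.counts.setdefault(arm, 0)
        self.values.setdefault(arm, 0.0)

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))  # explore
        return max(self.values, key=self.values.get)   # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def simulate(click_prob, rounds=5000, seed=1):
    """Serve one item per round and learn from simulated clicks (hypothetical rates)."""
    env = random.Random(seed + 1)
    bandit = EpsilonGreedy(click_prob, epsilon=0.1, seed=seed)
    for _ in range(rounds):
        arm = bandit.select()
        reward = 1.0 if env.random() < click_prob[arm] else 0.0
        bandit.update(arm, reward)
    return bandit

bandit = simulate({"item_a": 0.05, "item_b": 0.5, "item_c": 0.1})
print(bandit.counts)
```

A contextual variant would condition the value estimates on user or item features instead of keeping a single mean per item.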

Online experimentation

Run experiments online, all of the time
- Train offline up to a certain performance level
- Take your recommendation system online
- Let it learn from mostly implicit feedback from its users
- Two methods: A/B testing vs. interleaving
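
Where A/B testing shows each user the output of one system, interleaving mixes two rankings into a single list and credits clicks to the system that contributed the clicked item. The sketch below follows the team-draft idea; it is one of several interleaving methods, and the function names and document ids are hypothetical.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length, seed=None):
    """Build an interleaved list and remember which ranker contributed each item."""
    rng = random.Random(seed)
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    interleaved, team = [], {}
    while len(interleaved) < length:
        # The ranker with fewer picks goes next; ties are broken by a coin flip.
        if picks["A"] != picks["B"]:
            side = "A" if picks["A"] < picks["B"] else "B"
        else:
            side = rng.choice(["A", "B"])
        candidate = next((d for d in rankings[side] if d not in team), None)
        if candidate is None:
            # This ranker has nothing left to offer; fall back to the other one.
            side = "B" if side == "A" else "A"
            candidate = next((d for d in rankings[side] if d not in team), None)
            if candidate is None:
                break  # both rankings are exhausted
        interleaved.append(candidate)
        team[candidate] = side
        picks[side] += 1
    return interleaved, team

def credit(team, clicked):
    """Count clicks per ranker; the ranker with more clicks wins this impression."""
    wins = {"A": 0, "B": 0}
    for doc in clicked:
        if doc in team:
            wins[team[doc]] += 1
    return wins

shown, team = team_draft_interleave(["d1", "d2", "d3"], ["d2", "d4", "d1"], length=4, seed=0)
print(shown, credit(team, clicked=["d4"]))
```

Aggregating the per-impression winners over many users gives a preference between the two rankers.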

Responsible data science

Pitfalls?

Pitfalls?
- Questions about the outcomes of your online experiments
  - How can we achieve high classification accuracy while eliminating discriminatory biases? What are meaningful formal fairness properties?
  - How can we design expressive yet easily interpretable recommenders?
  - Can we ensure that a recommender remains accurate even if the statistical signal it relies on is exposed to public scrutiny?
  - Are there practical methods to test existing recommenders for compliance with a policy?
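
As one possible example of a formal fairness property, the sketch below measures demographic parity: the difference in how often each group receives a positive decision (for instance, being shown a given recommendation). The choice of property, the toy data, and the function names are assumptions for illustration; the slides do not commit to a specific definition.

```python
def positive_rate(decisions, groups, group):
    """Share of positive decisions (1) among members of one group."""
    selected = [d for d, g in zip(decisions, groups) if g == group]
    return sum(selected) / len(selected) if selected else 0.0

def demographic_parity_gap(decisions, groups):
    """Largest difference in positive-decision rate between any two groups."""
    rates = {g: positive_rate(decisions, groups, g) for g in set(groups)}
    return max(rates.values()) - min(rates.values()), rates

def complies(decisions, groups, max_gap):
    """Policy check: the gap must not exceed a threshold chosen by the policy owner."""
    gap, _ = demographic_parity_gap(decisions, groups)
    return gap <= max_gap

# Hypothetical logged decisions (1 = item was recommended) and group labels.
decisions = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["x", "x", "x", "x", "y", "y", "y", "y"]
gap, rates = demographic_parity_gap(decisions, groups)
print(gap, rates, complies(decisions, groups, max_gap=0.2))
```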

Pitfalls?
- Questions about the fact that you are running online experiments with real users
  - Should we be doing this?
  - Facebook emotion contagion study
  - This is not about privacy

Wrap-up

80% of tablet and smartphone owners use their device while watching TV.
(Nielsen, 2011; Razorfish, 2011; Google, 2012)
Case & his friends watching Superbowl XLVII, by Scott K. Macklin (mcdm.uw.edu)

Content?
38% of mobile multitaskers access content that is related to the TV program
(Razorfish, 2011; Nielsen, 2011)

Invitation
- MediaNow project
- Narrative Search
  - The search engine result page tells a story (instead of giving 10 blue links)
- Looking for media professionals and their search behavior

How well will this work?
- Over 80% of Netflix consumption comes from recommendations
- Should we aim for 100%?

The next frontier: digital immortality

Research supported by
