1. Amsterdam Data Science brings together data
science researchers and practitioners from
academia, industry, and government to share
ideas, pursue research, and foster talent
Amsterdam Data Science
Maarten de Rijke
http://amsterdamdatascience.nl
2. Key facts and figures
Knowledge leaders in key technology areas: computer vision, knowledge representation, information retrieval, machine learning, security
Four knowledge institutes, 300+ researchers; application domains: business analytics, health and life science, creative industries, communication
4. [Figure: Amsterdam Data Science research groups (labeled A to N) at VUmc, HvA, CWI, VU, and UvA, arranged around Informatics and the application areas Life science, Social analytics, Business analytics, and Digital humanities]
8. Personalization and content recommendation
Amsterdam
DATA SCIENCE
8
http://bakadesuyo.bakadesuyo.netdna-cdn.com/wp-content/uploads/2012/12/20110504181819_books.jpg
9. Personalization and content recommendation
http://archive.rusbase.com/media/blogs/apps.png
10. Personalization and content recommendation
http://i.huffpost.com/gen/1293353/images/o-TV-facebook.jpg
11. Personalization and content recommendation
https://beingjaffa.files.wordpress.com/2013/10/facebook-faces.jpg
14. Personalization and content recommendation
Recommender systems
• For consumer
  • Find things that are interesting
  • Narrow down the set of choices
  • Discover new things
  • Entertainment
  • …
• For producer
  • Personalized service for the customer
  • Increase trust and customer loyalty
  • Increase sales, click-through rates, conversion, etc.
  • Opportunities for promotion, persuasion
  • Obtain more knowledge about customers
  • …
15. Personalization and content recommendation
https://en.wikipedia.org/wiki/Recommender_system
16. Personalization and content recommendation
Recommender systems
• Given
  • a user, items
• return
  • a ranked list of items (that are most interesting, relevant, entertaining, useful, …)
• How is this different from a regular search engine?
19. Personalization and content recommendation
So many ways of ranking items that are
useful, interesting, relevant, …
• Collaborative filtering
  • Build a model from a user's past behavior (items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. Use the model to predict items (or ratings for items) that the user may be interested in
  • Example
    • Last.fm creates a “station” of recommended songs by observing which bands and individual tracks the user has listened to on a regular basis and comparing those against the listening behavior of other users
    • Last.fm will play tracks that do not appear in the user's library but are often played by other users with similar interests
  • Cold start problem (new users, new items)
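The collaborative filtering idea above can be sketched in a few lines. This is a minimal user-based variant that weights other users' play counts by cosine similarity; it is not a description of how Last.fm works. The `plays` data, the user and track names, and the `recommend` helper are all hypothetical.

```python
from math import sqrt

# Hypothetical play counts: user -> {track: number of plays}.
plays = {
    "ann": {"track_a": 5, "track_b": 3, "track_c": 4},
    "bob": {"track_a": 4, "track_b": 2, "track_d": 5},
    "cat": {"track_e": 5, "track_f": 4},
}

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user, data, k=3):
    """Rank tracks the user has not played, weighted by neighbour similarity."""
    scores = {}
    for other, items in data.items():
        if other == user:
            continue
        sim = cosine(data[user], items)
        if sim <= 0:
            continue  # users with nothing in common contribute no evidence
        for item, count in items.items():
            if item not in data[user]:
                scores[item] = scores.get(item, 0.0) + sim * count
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("ann", plays))
```

A user with an empty history has zero similarity to everyone, so `recommend` returns an empty list for them. That is the cold start problem in miniature.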
20. Personalization and content recommendation
So many ways of ranking items that are
useful, interesting, relevant, …
• Content-based: utilize a series of characteristics of an item in order to recommend additional items with similar properties…
  • Pandora uses the properties of a song or artist (a subset of the 400 attributes provided by the Music Genome Project) in order to seed a “station” that plays music with similar properties
  • User feedback is used to refine the station's results, de-emphasizing certain attributes when a user “dislikes” a particular song and emphasizing other attributes when a user “likes” a song
  • Needs some sort of content descriptors
  • Cold start problem (new users)
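A content-based station of this kind can be sketched as follows. The three-attribute vectors, the song names, and the `feedback` update rule are assumptions chosen for illustration; they are not Pandora's attributes or algorithm.

```python
from math import sqrt

# Hypothetical attribute vectors per song; values are illustrative only.
songs = {
    "seed":   [0.9, 0.1, 0.5],
    "song_x": [0.8, 0.2, 0.4],
    "song_y": [0.1, 0.9, 0.6],
    "song_z": [0.7, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two dense attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(profile, catalog, exclude=()):
    """Order catalog songs by similarity to the station profile."""
    candidates = [s for s in catalog if s not in exclude]
    return sorted(candidates, key=lambda s: cosine(profile, catalog[s]), reverse=True)

def feedback(profile, attrs, liked, rate=0.5):
    """Move the profile toward a liked song and away from a disliked one."""
    sign = 1.0 if liked else -1.0
    return [max(0.0, p + sign * rate * a) for p, a in zip(profile, attrs)]

profile = list(songs["seed"])                      # the seed song starts the station
before = rank(profile, songs, exclude={"seed"})
profile = feedback(profile, songs["song_x"], liked=False)
after = rank(profile, songs, exclude={"seed", "song_x"})
print(before, after)
```

New items are not a problem here as long as they come with attribute vectors; a new user still needs a seed or some feedback before the profile means anything.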
21. Personalization and content recommendation
So many ways of ranking items that are
useful, interesting, relevant, …
• Knowledge-based: utilize a series of constraints expressed by the user
  • “I only want black cars”
  • No cold-start problem
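In its simplest form, a knowledge-based recommender is a filter over the catalog followed by a sort. The sketch below assumes a hypothetical `cars` catalog and expresses each user constraint as a predicate function.

```python
# Hypothetical catalog; field names and values are illustrative only.
cars = [
    {"model": "hatchback", "color": "black", "price": 18000},
    {"model": "sedan", "color": "red", "price": 21000},
    {"model": "wagon", "color": "black", "price": 25000},
    {"model": "coupe", "color": "blue", "price": 30000},
]

def recommend(items, constraints, sort_key):
    """Keep items that satisfy every constraint, then order them by sort_key."""
    matches = [item for item in items if all(check(item) for check in constraints)]
    return sorted(matches, key=sort_key)

# "I only want black cars", cheapest first.
only_black = [lambda car: car["color"] == "black"]
result = recommend(cars, only_black, sort_key=lambda car: car["price"])
print([car["model"] for car in result])
```

No interaction history is needed, which is why there is no cold-start problem: the constraints come directly from the user.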
22. Personalization and content recommendation
So many ways of ranking items that are
useful, interesting, relevant, …
• Probabilistic methods
  • Given a user/item rating matrix, determine the probability that the user will like a new item (based on past observations)
• Matrix factorization methods
• Principal component analysis
• Popularity
• Trends
• Deep learning
• …
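Of the methods listed, matrix factorization is easy to sketch: approximate the sparse user/item rating matrix by the product of two low-rank factor matrices, fitted with stochastic gradient descent on the observed ratings only. The toy `ratings`, the hyperparameters, and the function names below are assumptions for illustration.

```python
import random
from math import sqrt

# Hypothetical observed ratings as (user index, item index, rating on a 1-5 scale).
ratings = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 0, 1), (3, 3, 4),
    (4, 1, 1), (4, 2, 5), (4, 3, 4),
]

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Fit user factors P and item factors Q by SGD with L2 regularization."""
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating is the dot product of user and item factors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

def rmse(P, Q, ratings):
    """Root mean squared error on the observed ratings."""
    return sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))

P, Q = factorize(ratings, n_users=5, n_items=4)
print("training RMSE:", round(rmse(P, Q, ratings), 3))
print("predicted rating for user 1, item 1:", round(predict(P, Q, 1, 1), 2))
```

The second print shows the point of the exercise: user 1 never rated item 1, but the learned factors still yield a score that can be used for ranking.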
24. Personalization and content recommendation
Recommendations are ranked lists
• We don’t care so much about the ratings as about the order in which recommended results are presented
• Automatically learn to rank
  • Learn an individual ranking of items
  • Learn to combine multiple rankings
25. Personalization and content recommendation
Recommendations are ranked lists
• Learning to rank
  • Pointwise
    • Learn a score per item
  • Pairwise
    • Learn pairwise classifications
  • Listwise
    • Directly optimize a whole result page
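The pairwise approach can be illustrated with a small perceptron trained on feature differences: whenever a preferred item does not outscore the other item in its pair, the weights are nudged toward the difference of their feature vectors. The two features, the items, and the preference pairs are hypothetical, and the perceptron is just one simple choice of pairwise classifier.

```python
# Hypothetical items with two features each, e.g. (popularity, topical match).
features = {
    "a": [0.9, 0.1],
    "b": [0.2, 0.8],
    "c": [0.5, 0.5],
    "d": [0.1, 0.2],
}

# Hypothetical observed preferences as (preferred item, less preferred item).
preferences = [("b", "c"), ("c", "a"), ("a", "d")]

def score(w, x):
    """Linear scoring function shared by training and ranking."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise(pairs, feats, n_features, epochs=50, lr=0.1):
    """Perceptron on feature differences: fix every misordered pair."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(feats[better], feats[worse])]
            if score(w, diff) <= 0:  # pair is misordered (or tied)
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

w = train_pairwise(preferences, features, n_features=2)
ranking = sorted(features, key=lambda item: score(w, features[item]), reverse=True)
print(ranking)
```

A pointwise learner would instead regress a score per item, and a listwise learner would optimize a metric over the whole ranked page; only the loss changes, the final step of sorting by score stays the same.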
29. Personalization and content recommendation
Bandits
• In many domains items are constantly new (e.g., news recommendation, computational advertising)
• Contextual bandits are online learning algorithms that work by “exploring” the user preference space and “exploiting” the resulting models in serving recommendations
• Active research in finding optimal strategies for ranking items
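The explore/exploit trade-off can be sketched with a tabular epsilon-greedy bandit that keeps one reward estimate per (context, arm) pair. This is a deliberately simple stand-in for the contextual bandit algorithms used in practice; the contexts, arms, click probabilities, and class name are hypothetical.

```python
import random

class EpsilonGreedyContextualBandit:
    """Keeps a running mean reward per (context, arm) pair."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {}  # (context, arm) -> number of times shown
        self.values = {}  # (context, arm) -> mean observed reward

    def best(self, context):
        """Exploit: the arm with the highest estimated reward in this context."""
        return max(self.arms, key=lambda a: self.values.get((context, a), 0.0))

    def select(self, context):
        """Explore with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return self.best(context)

    def update(self, context, arm, reward):
        """Incrementally update the mean reward for (context, arm)."""
        key = (context, arm)
        n = self.counts.get(key, 0) + 1
        old = self.values.get(key, 0.0)
        self.counts[key] = n
        self.values[key] = old + (reward - old) / n

# Simulated users: hypothetical click probabilities per (context, arm).
click_prob = {
    ("morning", "news"): 0.8, ("morning", "sports"): 0.2,
    ("evening", "news"): 0.2, ("evening", "sports"): 0.8,
}
env = random.Random(1)
bandit = EpsilonGreedyContextualBandit(arms=["news", "sports"])
for _ in range(4000):
    context = env.choice(["morning", "evening"])
    arm = bandit.select(context)
    reward = 1.0 if env.random() < click_prob[(context, arm)] else 0.0
    bandit.update(context, arm, reward)

print(bandit.best("morning"), bandit.best("evening"))
```

Because a fraction of traffic is always spent exploring, a newly added arm gets shown without any historical data, which is what makes bandits attractive when items are constantly new.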
31. Personalization and content recommendation
Run experiments online, all of the time
• Train offline up to a certain performance level
• Take your recommendation system online
• Let it learn from mostly implicit feedback from its users
• Two methods: A/B testing vs. interleaving
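Where A/B testing shows each user the output of one system, interleaving mixes the rankings of two systems into a single list and credits each click to the system that contributed the clicked item. The sketch below uses team-draft interleaving, one common variant; the rankings, document names, and clicks are hypothetical.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length, rng):
    """Build one list from two rankings and remember which team picked each item."""
    rankings = {"A": ranking_a, "B": ranking_b}
    count = {"A": 0, "B": 0}
    interleaved, teams = [], {}
    while len(interleaved) < length:
        # The team with fewer picks goes next; ties are broken by a coin flip.
        if count["A"] < count["B"]:
            team = "A"
        elif count["B"] < count["A"]:
            team = "B"
        else:
            team = rng.choice(["A", "B"])
        pick = next((d for d in rankings[team] if d not in teams), None)
        if pick is None:
            # This team has nothing new to offer; let the other team pick.
            team = "B" if team == "A" else "A"
            pick = next((d for d in rankings[team] if d not in teams), None)
            if pick is None:
                break  # both rankings are exhausted
        teams[pick] = team
        count[team] += 1
        interleaved.append(pick)
    return interleaved, teams

def credit(teams, clicks):
    """Count clicks per team; the team with more clicks wins the comparison."""
    wins = {"A": 0, "B": 0}
    for doc in clicks:
        if doc in teams:
            wins[teams[doc]] += 1
    return wins

rng = random.Random(0)
ranking_a = ["d1", "d2", "d3", "d4"]
ranking_b = ["d3", "d1", "d5", "d2"]
shown, teams = team_draft_interleave(ranking_a, ranking_b, length=4, rng=rng)
wins = credit(teams, clicks=[shown[0]])
print(shown, teams, wins)
```

Aggregating `wins` over many impressions yields a preference between the two systems. Each user acts as their own control, so interleaving typically needs less traffic than an A/B test to detect a difference.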
32. Personalization and content recommendation
Run experiments online, all of the time
33. Personalization and content recommendation
Run experiments online, all of the time
34. Personalization and content recommendation
Run experiments online, all of the time
35. Personalization and content recommendation
Run experiments online, all of the time
36. Personalization and content recommendation
Run experiments online, all of the time
39. Personalization and content recommendation
Pitfalls?
• Questions about the outcomes of your online experiments
  • How can we achieve high classification accuracy while eliminating discriminatory biases? What are meaningful formal fairness properties?
  • How can we design expressive yet easily interpretable recommenders?
  • Can we ensure that a recommender remains accurate even if the statistical signal it relies on is exposed to public scrutiny?
  • Are there practical methods to test existing recommenders for compliance with a policy?
40. Personalization and content recommendation
Pitfalls?
• Questions about the fact that you are running online experiments with real users
  • Should we be doing this?
  • Facebook emotion contagion study
  • This is not about privacy
42. 80% of tablet and smartphone owners use their device while watching TV.
(Nielsen, 2011; Razorfish, 2011; Google, 2012)
Case & his friends watching Superbowl XLVII, by Scott K. Macklin (mcdm.uw.edu)
43. Content?
38% of mobile multitaskers access content that is related to the TV program
(Razorfish, 2011; Nielsen, 2011)
45. Personalization and content recommendation
Invitation
• MediaNow project
• Narrative Search
  • The search engine result page tells a story (instead of giving 10 blue links)
• Looking for media professionals and their search behavior
46. Personalization and content recommendation
How well will this work?
• Over 80% of Netflix consumption comes from recommendations
• Should we aim for 100%?