7. Telefonica is a fast-growing Telecom

              1989                       2000                             2008
Clients       About 12 million           About 68 million                 About 260 million
              subscribers                customers                        customers
Services      Basic telephone and        Wireline and mobile voice,       Integrated ICT solutions
              data services              data and Internet services       for all customers
Geographies   Operations in Spain        Operations in 16 countries       Operations in 25 countries
Staff         About 71,000               About 149,000                    About 257,000
              professionals              professionals                    professionals
Finances      Rev: 4,273 M€              Rev: 28,485 M€                   Rev: 57,946 M€
              EPS(1): 0.45 €             EPS(1): 0.67 €                   EPS: 1.63 €

(1) EPS: Earnings per share
8. Currently among the largest in the world
Telco sector worldwide ranking by market cap (US$ bn)
Source: Bloomberg, 06/12/09
9. Telefonica R&D (TID) is the Research and Development Unit of the Telefónica Group
MISSION: “To contribute to the improvement of the Telefónica Group’s competitiveness through technological innovation”
- Founded in 1988
- Largest private R&D center in Spain
- More than 1,100 professionals
- Five centers in Spain and two in Latin America
In 2008 Telefónica was the first Spanish company by R&D investment and the third in the EU
[Chart: innovation investment. Applied research: 61 M€; R&D (products / services / processes development): 594 M€; Technological innovation: 4,384 M€]
10. Internet Scientific Areas
Content Distribution and P2P:
- Next generation Managed P2P-TV
- Future Internet: Content Networking
- Large Scale Delay Tolerant Bulk Distribution
- Network Transparency
Wireless and Mobile Systems:
- Wireless bundling
- Device2Device Content Distribution
- Mobile data analysis
Social Networks:
- Information Propagation
- Social Search Engines
- Infrastructure for Social-based cloud computing
11. Multimedia Scientific Areas
Multimedia Core:
- Multimedia Data Analysis, Search & Retrieval
- Video, Audio, Image, Music, Text, Sensor Data
- Understanding, Summarization, Visualization
Mobile and Ubicomp:
- Context Awareness
- Urban Computing
- Mobile Multimedia & Search
- Wearable Systems, Physiological Monitoring
HCC:
- Multimodal User Interfaces
- Expression, Gesture, Emotion Recognition
- Personalization & Recommendation
- Super Telepresence
12. Data Mining & User Modeling
Areas
SOCIAL NETWORK ANALYSIS & BUSINESS INTELLIGENCE
- Analytical CRM
- Trend-spotting, service propagation & churn
- Social Graph Analysis (construction, dynamics)
USER MODELING
- Application to new services (technology for development)
- Cognitive, socio-cultural, and contextual modeling
- Behavioral user modeling (service-use patterns)
DATA MINING
- Integration of statistical & knowledge-based techniques
- Stream mining
- Large scale & distributed machine learning
13. Index
Now seriously,
this is where the index should go!
15. The Age of Search has come
to an end
... long live the Age of Recommendation!
Chris Anderson in “The Long Tail”
“We are leaving the age of information and entering the age
of recommendation”
CNN Money, “The race to create a 'smart' Google”:
“The Web, they say, is leaving the era of search and entering
one of discovery. What's the difference? Search is what you
do when you're looking for something. Discovery is when
something wonderful that you didn't know existed, or didn't
know how to ask for, finds you.”
17. The value of
recommendations
Netflix: 2/3 of the movies rented are
recommended
Google News: recommendations generate
38% more clickthrough
Amazon: 35% of sales come from recommendations
Choicestream: 28% of people would buy more music if they found what they liked
18. The “Recommender problem”
Estimate a utility function that can automatically predict how much a user will like an item that is unknown to her. Based on:
- Past behavior
- Relations to other users
- Item similarity
- Context
- ...
19. The “Recommender problem”
Let C be the (typically large) set of all users and let S be the set of all possible items that can be recommended (e.g. books, movies, or restaurants).
Let u be a utility function that measures the usefulness of item s to user c, i.e., u : C × S → R, where R is a totally ordered set. Then, for each user c ∈ C, we want to choose the item s’ ∈ S that maximizes u.
The utility of an item is usually represented by a rating, but it can also be an arbitrary function, including a profit function.
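Spelled out, the selection step in this definition is just a maximization over the item set (same notation as above):

```latex
\forall c \in C:\quad s'_c \;=\; \operatorname*{arg\,max}_{s \in S} \, u(c, s)
```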
20. Approaches to Recommendation
Collaborative Filtering: recommend items based only on users’ past behavior
- User-based: find users similar to me and recommend what they liked
- Item-based: find items similar to those that I have previously liked (see the sketch below)
Content-based: recommend based on features inherent to the items
Social recommendations (trust-based)
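A minimal sketch of the item-based variant described above, assuming a dense user-item matrix with NaN for missing ratings and cosine similarity between item columns; the names and parameters are illustrative, not the setup used later in the talk:

```python
import numpy as np

def item_based_predict(R, user, item, k=20):
    """Predict R[user, item] from the user's ratings of the k most similar items.

    R: (num_users x num_items) array with np.nan marking missing ratings.
    """
    rated = np.where(~np.isnan(R[user]))[0]       # items this user has already rated
    target = np.nan_to_num(R[:, item])            # rating column of the target item

    sims = []
    for j in rated:
        if j == item:
            continue
        col = np.nan_to_num(R[:, j])
        denom = np.linalg.norm(target) * np.linalg.norm(col)
        sims.append((target @ col / denom if denom else 0.0, j))

    top = sorted(sims, reverse=True)[:k]          # k most similar already-rated items
    num = sum(sim * R[user, j] for sim, j in top)
    den = sum(abs(sim) for sim, _ in top)
    return num / den if den else np.nanmean(R[user])
```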
22. The Netflix Prize
500K users × 17K movie titles = 100M ratings = $1M (if you “only” improve the existing system by 10%: from 0.95 to 0.85 RMSE)
49K contestants on 40K teams from 184 countries
41K valid submissions from 5K teams; 64 submissions per day
Winning approach uses hundreds of predictors from several teams
Is this general?
Why did it take so long?
23. What works
It depends on the domain and the particular problem
However, in the general case the best isolated approach has (currently) been shown to be CF
- Item-based CF is in general more efficient and performs better, but mixing CF approaches can improve results
- Other approaches can be hybridized to improve results in specific cases (cold-start problem...)
What matters:
- Data preprocessing: outlier removal, denoising, removal of global effects (e.g. individual user’s average)
- “Smart” dimensionality reduction using MF techniques such as SVD (see the sketch below)
- Combining classifiers
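A toy illustration of the last two “what matters” points (user-mean removal as a global effect, then a truncated SVD); a real system would train a proper matrix-factorization model rather than zero-filling missing entries:

```python
import numpy as np

def svd_predict(R, rank=10):
    """Remove each user's mean (a 'global effect'), zero-fill the gaps, keep the
    top-`rank` singular components, and add the means back: a dense matrix of
    predicted ratings for every (user, item) pair."""
    user_mean = np.nanmean(R, axis=1, keepdims=True)
    X = np.nan_to_num(R - user_mean)              # centered ratings, missing -> 0
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    R_hat = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]
    return R_hat + user_mean
```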
24. I like it... I like it not
Evaluating User Ratings Noise in
Recommender Systems
Xavier Amatriain (@xamat), Josep M. Pujol, Nuria Oliver
Telefonica Research
28. Natural Noise Limits our User Model ...and Our Prediction Accuracy
[Cartoon: “DID YOU HEAR WHAT I LIKE??!!”]
29. The Magic Barrier
Magic Barrier = limit on prediction accuracy due to noise in the original data
Natural Noise = involuntary noise introduced by users when giving feedback
Due to (a) mistakes, and (b) lack of resolution in the personal rating scale (e.g. on a 1-to-5 scale, a 2 may mean the same as a 3 for some users and some items)
Magic Barrier >= Natural Noise Threshold
We cannot predict with less error than the resolution of the original data
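One way to make the last statement concrete (this formalization is an added assumption, not something stated on the slide): if each observed rating is a “true” preference plus zero-mean natural noise that is independent of whatever the predictor sees, then no predictor evaluated against the observed ratings can have an RMSE below the noise level:

```latex
r_{cs} = t_{cs} + \varepsilon_{cs}, \qquad
\mathbb{E}[\varepsilon_{cs}] = 0,\; \operatorname{Var}(\varepsilon_{cs}) = \sigma^2
\;\;\Longrightarrow\;\;
\mathrm{RMSE} = \sqrt{\mathbb{E}\big[(\hat r_{cs} - r_{cs})^2\big]} \;\ge\; \sigma
```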
30. Our related research questions
Q1. Are users inconsistent when providing
explicit feedback to Recommender Systems via
the common Rating procedure?
Q2. How large is the prediction error due to
these inconsistencies?
Q3. What factors affect user inconsistencies?
31. Experimental Setup (I)
Test-retest procedure: at least 3 trials are needed to separate:
- Reliability: how much you can trust the instrument you are using (i.e. the ratings)
  r = (r12 · r23) / r13
- Stability: drift in user opinion
  s12 = r13 / r23 ;  s23 = r13 / r12 ;  s13 = r13² / (r12 · r23)
Users rated movies in 3 trials:
Trial 1 <-> 24 h <-> Trial 2 <-> 15 days <-> Trial 3
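The reliability and stability coefficients above only need the three pairwise correlations between trials; a small sketch, assuming r12, r13, r23 are Pearson correlations computed over the ratings each pair of trials has in common:

```python
import numpy as np

def test_retest(t1, t2, t3):
    """Reliability and stability from three rating trials (aligned 1-D arrays of
    the ratings users gave to the same movies in trials 1, 2 and 3)."""
    r12 = np.corrcoef(t1, t2)[0, 1]
    r13 = np.corrcoef(t1, t3)[0, 1]
    r23 = np.corrcoef(t2, t3)[0, 1]
    reliability = r12 * r23 / r13
    stability = {"s12": r13 / r23,
                 "s23": r13 / r12,
                 "s13": r13 ** 2 / (r12 * r23)}
    return reliability, stability
```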
32. Experimental Setup (II)
100 movies selected from the Netflix dataset using stratified random sampling on popularity
Ratings on a 1-to-5 star scale
Special “not seen” symbol
Trials 1 and 3 = random order; trial 2 = ordered by popularity
118 participants
34. Comparison to Netflix Data
The distribution of the number of ratings per movie is very similar to Netflix, but the average rating is lower (users are not voluntarily choosing what to rate)
35. Test-retest Reliability and Stability
Overall reliability = 0.924 (good reliabilities are expected to be > 0.9)
Removing mild ratings yields higher reliabilities, while removing extreme ratings yields lower ones
Stabilities: s12 = 0.973, s23 = 0.977, and s13 = 0.951
Stabilities might also be accounting for a “learning effect” (note s12 < s23)
36. Users are Inconsistent
● What is the probability of an inconsistency given an original rating
37. Users are Inconsistent
Mild ratings are noisier
● What is the percentage of inconsistencies given an original rating
38. Users are Inconsistent
Negative ratings are noisier
● What is the percentage of inconsistencies given an original rating
39. Prediction Accuracy
Trials   #Ti    #Tj    #(Ti∩Tj)  #(Ti∪Tj)  RMSE(∩)  RMSE(∪)
T1, T2   2185   1961   1838      2308      0.573    0.707
T1, T3   2185   1909   1774      2320      0.637    0.765
T2, T3   1969   1909   1730      2140      0.557    0.694
● Pairwise RMSE between trials, considering the intersection and the union of both rating sets
40. Prediction Accuracy
Max error in the trials that are most distant in time (T1, T3)
● Pairwise RMSE between trials, considering the intersection and the union of both rating sets (same table as above)
41. Prediction Accuracy
Significantly less error when the 2nd trial is involved
● Pairwise RMSE between trials, considering the intersection and the union of both rating sets (same table as above)
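The intersection-based RMSE in the table is simply the RMSE over the (user, movie) pairs present in both trials; a minimal sketch (the union-based column additionally needs a convention for pairs rated in only one trial, which the slides do not spell out):

```python
import math

def rmse_between_trials(trial_i, trial_j):
    """RMSE over the (user, movie) pairs that were rated in both trials.

    trial_i, trial_j: dicts mapping (user, movie) -> rating.
    """
    common = trial_i.keys() & trial_j.keys()
    if not common:
        return float("nan")
    se = sum((trial_i[k] - trial_j[k]) ** 2 for k in common)
    return math.sqrt(se / len(common))
```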
42. Algorithm Robustness to NN
Alg. / Trial      T1       T2       T3       Tworst/Tbest
User Average      1.2011   1.1469   1.1945   4.7%
Item Average      1.0555   1.0361   1.0776   4%
User-based kNN    0.9990   0.9640   1.0171   5.5%
Item-based kNN    1.0429   1.0031   1.0417   4%
SVD               1.0244   0.9861   1.0285   4.3%
● RMSE for different recommendation algorithms when predicting each of the trials
43. Algorithm Robustness to NN
Trial 2 is consistently the least noisy
● RMSE for different recommendation algorithms when predicting each of the trials (same table as above)
44. Algorithm Robustness to NN (2)
Training-Testing dataset   T1-T2    T1-T3    T2-T3
User Average               1.1585   1.2095   1.2036
Movie Average              1.0305   1.0648   1.0637
User-based kNN             0.9693   1.0143   1.0184
Item-based kNN             1.0009   1.0406   1.0590
SVD                        0.9741   1.0491   1.0118
● RMSE for different recommendation algorithms when predicting ratings in one trial (testing) from ratings in another (training)
45. Algorithm Robustness to NN (2)
Noise is minimized when we predict Trial 2
● RMSE for different recommendation algorithms when predicting ratings in one trial (testing) from ratings in another (training) (same table as above)
46. Let's recap
Users are inconsistent
Inconsistencies can depend on many things
including how the items are presented
Inconsistencies produce natural noise
Natural noise reduces our prediction accuracy
independently of the algorithm
47. Item order effect
R1 is the trial with the most inconsistencies
R3 has fewer, except when excluding “not seen” ratings (the learning effect improves “not seen” discrimination)
R2 minimizes inconsistencies because of its ordering (reducing the “contrast effect”)
48. User Rating Speed Effect
Evaluation time decreases as the survey progresses in R1 and R3 (users losing attention but also learning)
In R2, evaluation time decreases until users reach the segment of “popular” movies
Rating speed is not correlated with inconsistencies
50. Different proposals
To deal with noise in user feedback we have so far proposed 3 different approaches:
1. Denoise user feedback by using a re-rating approach (RecSys ’09)
2. Instead of regular users, take feedback from experts, whom we expect to be less noisy (SIGIR ’09)
3. Combine ensembles of datasets to identify which works better for each user (IJCAI ’09)
51. Rate it Again
Increasing Recommendation Accuracy by User re-Rating
Xavier Amatriain (with J.M. Pujol, N. Tintarev, N. Oliver)
Telefonica Research
52. Rate it again
By asking users to rate items again we can remove noise from the dataset
Improvements of up to 14% in accuracy!
Because we don’t want all users to re-rate all items, we design ways to do partial denoising:
- Data-dependent: only denoise extreme ratings
- User-dependent: detect “noisy” users
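As a purely hypothetical illustration of the data-dependent strategy (only extreme ratings get re-rated), one could replace a disagreeing extreme rating with its re-rating; the keep-the-re-rating rule is an assumption for illustration only, and the actual algorithm with its fairness conditions is described on the next slides:

```python
def denoise_extreme(original, rerating, extremes=(1, 5)):
    """Hypothetical data-dependent denoising: only extreme ratings are re-rated,
    and when the two ratings disagree we keep the re-rating.  This rule is an
    illustration, not the algorithm from the RecSys '09 paper.

    original, rerating: dicts mapping (user, item) -> rating.
    """
    denoised = dict(original)
    for key, r in original.items():
        if r in extremes and key in rerating and rerating[key] != r:
            denoised[key] = rerating[key]
    return denoised
```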
53. Algorithm
Given a rating dataset where (some) items have been re-rated, two fairness conditions:
1. The algorithm should remove as few ratings as possible (i.e. only when there is some certainty that the rating is only adding noise)
2. The algorithm should not make up new ratings but decide which of the existing ones are valid
54. Algorithm
One-source re-rating case: given the following milding function:
[equation omitted]
56. Denoise outliers
● Improvement in RMSE when doing one-source denoising, as a function of the percentage of denoised ratings and users: selecting only noisy users and extreme ratings
57. The Wisdom of the Few
A Collaborative Filtering Approach Based on
Expert Opinions from the Web
Xavier Amatriain (@xamat), Josep M. Pujol, Nuria Oliver
Telefonica Research (Barcelona)
Neal Lathia
UCL (London)
58. Crowds are not always wise
Collaborative filtering is the preferred approach
for Recommender Systems
Recommendations are drawn from your past
behavior and that of similar users in the system
Standard CF approach:
Find your Neighbors from the set of other users
Recommend things that your Neighbors liked and you
have not “seen”
Problem: predictions are based on a large
dataset that is sparse and noisy
59. Overview of the Approach
Expert = an individual whom we can trust to have produced thoughtful, consistent and reliable evaluations (ratings) of items in a given domain
Expert-based Collaborative Filtering: find neighbors from a reduced set of experts instead of regular users
1. Identify domain experts with reliable ratings
2. For each user, compute “expert neighbors”
3. Compute recommendations similarly to standard kNN CF (see the sketch below)
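A minimal sketch of steps 2 and 3, assuming a fixed expert rating matrix and Pearson correlation on co-rated items as the user-expert similarity; the similarity measure, neighborhood size and fallback are assumptions here, not details from the paper:

```python
import numpy as np

def expert_cf_predict(expert_R, user_ratings, item, k=10):
    """Predict a user's rating for `item` from the k most similar experts.

    expert_R: (num_experts x num_items) array, np.nan where an expert did not rate.
    user_ratings: 1-D array of the target user's ratings, np.nan where unrated.
    """
    num_experts = expert_R.shape[0]
    sims = np.zeros(num_experts)
    for e in range(num_experts):
        both = ~np.isnan(expert_R[e]) & ~np.isnan(user_ratings)
        if both.sum() >= 2:
            c = np.corrcoef(expert_R[e, both], user_ratings[both])[0, 1]
            sims[e] = 0.0 if np.isnan(c) else c

    candidates = np.where(~np.isnan(expert_R[:, item]))[0]   # experts who rated the item
    neighbors = candidates[np.argsort(-sims[candidates])][:k]

    den = np.abs(sims[neighbors]).sum()
    if den == 0:
        return float(np.nanmean(expert_R[:, item]))          # fall back to the experts' average
    return float(sims[neighbors] @ expert_R[neighbors, item] / den)
```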
60. Advantages of the Approach
Noise: experts introduce less natural noise
Malicious ratings: the dataset can be monitored to avoid shilling
Data sparsity: a reduced set of domain experts can be motivated to rate items
Cold-start problem: experts rate items as soon as they are available
Scalability: the dataset is several orders of magnitude smaller
Privacy: recommendations can be computed locally
61. Mining the Web for Expert Ratings
Collections of expert ratings can be obtained almost directly from the web: we crawled the Rotten Tomatoes movie-critics mash-up
Only those critics (169) with more than 250 ratings in the Netflix dataset were used
62. Dataset Analysis: Summary
Experts...
- are much less sparse
- rate movies all over the rating scale instead of being biased towards rating only “good” movies (different incentives)
- but they seem to consistently agree on the good movies
- have a lower overall standard deviation per movie: they tend to agree more than regular users
- tend to deviate less from their personal average rating
63. Evaluation Procedure
Use the 169 experts to predict ratings from 10,000 users sampled from the Netflix dataset
Prediction MAE using an 80-20 holdout procedure (5-fold cross-validation)
Top-N precision by classifying items as “recommendable” given a threshold
Results show Expert CF behaves similarly to standard CF
But... we have a user study backing up the approach
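A sketch of the MAE part of this protocol, assuming a per-user 80-20 split and an arbitrary predict function standing in for whichever CF model is being evaluated:

```python
import random

def holdout_mae(user_ratings, predict, test_fraction=0.2, seed=0):
    """Mean absolute error on a per-user 80-20 holdout split.

    user_ratings: dict user -> dict item -> rating.
    predict: function (user, item, train_ratings) -> predicted rating.
    """
    rng = random.Random(seed)
    errors = []
    for user, ratings in user_ratings.items():
        items = list(ratings)
        rng.shuffle(items)
        n_test = max(1, int(len(items) * test_fraction))
        test, train = items[:n_test], items[n_test:]
        train_ratings = {i: ratings[i] for i in train}
        errors.extend(abs(predict(user, i, train_ratings) - ratings[i]) for i in test)
    return sum(errors) / len(errors)
```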
64. User Study
57 participants, only 14.5 ratings/participant
50% of the users consider Expert-based CF to be good or very good
Expert-based CF is the only algorithm with an average rating over 3 (on a 0-4 scale)
65. Current Work
Music recommendations
(using metacritics.com),
mobile geo-located
recommendations...
66. Adaptive Data Sources
Collaborative Filtering With Adaptive
Information Sources
(ITWP @ IJCAI)
With Neal Lathia
UCL (London)
67. Adaptive data sources
[Diagram: possible information sources: similarity (like-minded users?), trust (friends?), reputation (experts?), user modeling]
68. Adaptive Data Sources
Given: a simple, un-tuned kNN predictor and multiple information sources
A problem: users are subjective; accuracy varies with the source
A promise: optimal classification of users to their best source produces incredibly accurate predictions
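The “promise” amounts to an oracle that assigns each user to whichever information source predicts that user best; as a minimal sketch, assuming we already have one error score per (user, source):

```python
def best_source_per_user(errors):
    """Assign each user to the information source with the lowest prediction error.

    errors: dict user -> dict source_name -> error (e.g. RMSE of a kNN predictor
    built from that source's data).
    """
    return {user: min(by_source, key=by_source.get)
            for user, by_source in errors.items()}

# best_source_per_user({"u1": {"similarity": 0.95, "trust": 0.78, "experts": 0.83}})
# -> {"u1": "trust"}
```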
70. Conclusions
For many applications such as Recommender Systems (but also Search, Advertising, and even Networks), understanding data and users is vital
Algorithms can only be as good as the data they use as input
The importance of user/data mining is going to be a growing trend in many areas in the coming years