2. We will talk about
◦ Netflix Prize
◦ Major challenges
◦ Definitions of subjects and problems
◦ Recommendation methods
◦ Mahout
◦ CNTV 5+ VIP Recommendation
3. We will not talk about
◦ Architecture of a recommender system
◦ How to make it robust and scalable
4. Netflix Prize
◦ Netflix, Inc. is an American provider of on-demand Internet
streaming media and flat-rate DVD-by-mail rental.
◦ 60% of the DVDs rented from Netflix are selected based on
personalized recommendations.
5. Netflix Prize
◦ In October 2006, Netflix released a dataset containing
approximately 100 million anonymous movie ratings and
challenged researchers and practitioners to develop recommender
systems that could beat the accuracy of the company's own
recommendation system, Cinematch.
◦ On 21 September 2009, the grand prize of $1,000,000 was
awarded to a team whose system improved on Cinematch's
accuracy by 10%.
6. Major challenges
◦ Data sparsity – huge datasets; ratings are unevenly distributed.
◦ Scalability – huge datasets; incremental updates are needed.
◦ Cold start – newly arrived users have no rating history.
◦ Diversity vs. accuracy – don't recommend me what everyone already knows.
◦ Vulnerability to attacks – wherever there is a ranking, someone will try to game it.
◦ The value of time – people like different things at different times.
◦ Evaluation of recommendations – how do we tell which recommendation method is better?
◦ User interface – an optimized presentation that makes users willing to accept our recommendations.
7. Evaluation Metrics for Recommendation
◦ The training set E^T
-- The training set is treated as known information.
◦ The probe set E^P
-- No information from the probe set is allowed to
be used for recommendation.
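As a sketch of this split (the class name, triple layout, and hold-out fraction are illustrative assumptions, not from the slides), each rating can be randomly assigned to either E^T or E^P:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch: randomly hold out a fraction of the ratings as the
// probe set E^P; the remainder forms the training set E^T.
public class TrainProbeSplit {
    // Each rating is a {userId, itemId, rating} triple.
    public static void split(List<long[]> ratings, double probeFraction, long seed,
                             List<long[]> train, List<long[]> probe) {
        Random rnd = new Random(seed);
        for (long[] r : ratings) {
            // Each rating lands in exactly one of the two sets.
            (rnd.nextDouble() < probeFraction ? probe : train).add(r);
        }
    }

    public static void main(String[] args) {
        List<long[]> ratings = new ArrayList<>();
        for (long k = 0; k < 1000; k++) ratings.add(new long[]{k, k % 50, 3});
        List<long[]> train = new ArrayList<>(), probe = new ArrayList<>();
        split(ratings, 0.2, 42L, train, probe);
        System.out.println(train.size() + " training, " + probe.size() + " probe");
    }
}
```

Fixing the random seed makes the split reproducible, so different recommenders can be compared on the same probe set.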
8. Evaluation Metrics for Recommendation
◦ Accuracy Metrics
◦ Mean Absolute Error (MAE)
◦ Root Mean Squared Error (RMSE)
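Both metrics compare predicted ratings against the held-out actual ratings; a minimal sketch in plain Java (not Mahout code; the array-based signature is an illustrative assumption):

```java
public class AccuracyMetrics {
    // MAE: mean of the absolute differences |predicted - actual|.
    public static double mae(double[] predicted, double[] actual) {
        double sum = 0;
        for (int k = 0; k < predicted.length; k++) {
            sum += Math.abs(predicted[k] - actual[k]);
        }
        return sum / predicted.length;
    }

    // RMSE: square root of the mean squared error;
    // penalizes large errors more heavily than MAE does.
    public static double rmse(double[] predicted, double[] actual) {
        double sum = 0;
        for (int k = 0; k < predicted.length; k++) {
            double d = predicted[k] - actual[k];
            sum += d * d;
        }
        return Math.sqrt(sum / predicted.length);
    }

    public static void main(String[] args) {
        double[] predicted = {3.5, 4.0, 2.0};
        double[] actual    = {3.0, 4.0, 4.0};
        System.out.println("MAE  = " + mae(predicted, actual));
        System.out.println("RMSE = " + rmse(predicted, actual));
    }
}
```

RMSE was the metric used in the Netflix Prize; because of the squaring, one prediction that is off by 2 hurts RMSE more than four predictions each off by 0.5.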
10. Evaluation Metrics for Recommendation
◦ Precision is the proportion of top recommendations
that are good.
◦ Recall is the proportion of good recommendations that
appear in top recommendations.
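The two definitions above differ only in the denominator; a minimal sketch in plain Java (the list/set signature is an illustrative assumption):

```java
import java.util.*;

public class TopNMetrics {
    // Precision: fraction of the top-N recommendations that are relevant.
    public static double precision(List<Long> topN, Set<Long> relevant) {
        long hits = topN.stream().filter(relevant::contains).count();
        return (double) hits / topN.size();
    }

    // Recall: fraction of all relevant items that appear in the top-N list.
    public static double recall(List<Long> topN, Set<Long> relevant) {
        long hits = topN.stream().filter(relevant::contains).count();
        return (double) hits / relevant.size();
    }

    public static void main(String[] args) {
        List<Long> topN = Arrays.asList(1L, 2L, 3L, 4L);
        Set<Long> relevant = new HashSet<>(Arrays.asList(2L, 3L, 9L));
        System.out.println("precision = " + precision(topN, relevant)); // 2 of 4
        System.out.println("recall    = " + recall(topN, relevant));    // 2 of 3
    }
}
```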
12. Classifications of recommender systems
◦ Content-based recommendations
◦ Collaborative recommendations
◦ Memory-based collaborative filtering
◦ Standard similarity-based methods
◦ methods employing social filtering
◦ Model-based collaborative filtering
◦ dimensionality reduction methods
◦ diffusion-based methods
◦ Hybrid approaches
13. Similarity-based methods
◦ User-based recommender
for every other user w
  compute a similarity s between u and w
retain the top users, ranked by similarity, as a neighborhood n
for every item i that some user in n has a preference for,
    but that u has no preference for yet
  for every other user v in n that has a preference for i
    compute a similarity s between u and v
    incorporate v's preference for i, weighted by s, into a running average
return the top items, ranked by weighted average
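The "running average" in the inner loop is a similarity-weighted average of the neighbors' ratings. A minimal sketch in plain Java (the map-based data structures are assumptions for illustration, not Mahout's API):

```java
import java.util.*;

public class UserBasedSketch {
    // Estimate user u's preference for item i as the similarity-weighted
    // average of the neighborhood's ratings for i.
    public static double estimate(long u, long i,
                                  Map<Long, Map<Long, Double>> ratings,   // user -> (item -> rating)
                                  Map<Long, Double> similarityToU,        // neighbor -> s(u, neighbor)
                                  Set<Long> neighborhood) {
        double weighted = 0, totalWeight = 0;
        for (long v : neighborhood) {
            Double s = similarityToU.get(v);
            Double r = ratings.getOrDefault(v, Map.of()).get(i);
            if (s == null || r == null) continue;  // v has no preference for i
            weighted += s * r;                     // v's rating, weighted by similarity
            totalWeight += Math.abs(s);
        }
        return totalWeight == 0 ? Double.NaN : weighted / totalWeight;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> ratings = new HashMap<>();
        ratings.put(2L, Map.of(10L, 4.0));
        ratings.put(3L, Map.of(10L, 2.0));
        Map<Long, Double> sim = Map.of(2L, 0.9, 3L, 0.5);
        // (0.9 * 4.0 + 0.5 * 2.0) / (0.9 + 0.5)
        System.out.println(estimate(1L, 10L, ratings, sim, Set.of(2L, 3L)));
    }
}
```

Running this estimate for every candidate item i and sorting by the result yields the top-N recommendation list.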
14. Similarity-based methods
◦ User-based recommender
DataModel model = new FileDataModel(new File("intro.csv"));
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood =
new NearestNUserNeighborhood(100, similarity, model);
Recommender recommender =
new GenericUserBasedRecommender(model, neighborhood,
similarity);
15. Similarity-based methods
◦ User-based recommender
• Data model, implemented via DataModel
• User-user similarity metric, implemented via UserSimilarity
• User neighborhood definition, implemented via UserNeighborhood
• Recommender engine, implemented via a Recommender (here,
GenericUserBasedRecommender)
16. Similarity-based methods
◦ Item-based recommender
for every item i that u has no preference for yet
  for every item j that u has a preference for
    compute a similarity s between i and j
    add u's preference for j, weighted by s, to a running average
return the top items, ranked by weighted average
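The item-based score mirrors the user-based one, but the weights are item-item similarities against the items u has already rated. A minimal sketch in plain Java (map-based signatures are illustrative assumptions, not Mahout's API):

```java
import java.util.*;

public class ItemBasedSketch {
    // Estimate u's preference for item i as the similarity-weighted average
    // of u's own ratings of items similar to i.
    public static double estimate(long i,
                                  Map<Long, Double> userRatings,      // j -> u's rating of j
                                  Map<Long, Double> similarityToI) {  // j -> s(i, j)
        double weighted = 0, totalWeight = 0;
        for (Map.Entry<Long, Double> e : userRatings.entrySet()) {
            Double s = similarityToI.get(e.getKey());
            if (s == null) continue;               // no similarity known for this pair
            weighted += s * e.getValue();          // u's rating of j, weighted by s(i, j)
            totalWeight += Math.abs(s);
        }
        return totalWeight == 0 ? Double.NaN : weighted / totalWeight;
    }

    public static void main(String[] args) {
        Map<Long, Double> userRatings = Map.of(1L, 4.0, 2L, 2.0);
        Map<Long, Double> simToI = Map.of(1L, 0.8, 2L, 0.2);
        // (0.8 * 4.0 + 0.2 * 2.0) / (0.8 + 0.2)
        System.out.println(estimate(10L, userRatings, simToI));
    }
}
```

Because item-item similarities change more slowly than user-user similarities, they can be precomputed, which is one reason item-based recommenders scale better.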
17. Similarity-based methods
◦ Item-based recommender
DataModel model = new FileDataModel(new File("intro.csv"));
ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
Recommender recommender =
new GenericItemBasedRecommender(model, similarity);
24. Architecture of NeuRecommendation
[Diagram] A request for recommendation (from IMS etc.) goes to a
Dispatcher, which dispatches it using round robin to one of several
Recommender instances; a Data Feeder fetches users' preferences.