Recommender system

hellojinjie
2013-06-19

We will talk about
◦ Netflix Prize
◦ Major challenges
◦ Definitions of subjects and problems
◦ Recommendation methods
◦ Mahout
◦ CNTV 5+ VIP Recommendation

We will not talk about
◦ Architecture of a recommender system
◦ How to make it robust and scalable

Netflix Prize
◦ Netflix, Inc. is an American provider of on-demand Internet streaming media and flat-rate DVD-by-mail.
◦ 60% of DVDs rented by Netflix are selected based on personalized recommendations.

Netflix Prize
◦ In October 2006, Netflix released a dataset containing approximately 100 million anonymous movie ratings and challenged researchers and practitioners to develop recommender systems that could beat the accuracy of the company's recommendation system, Cinematch.
◦ On 21 September 2009, the grand prize of $1,000,000 was awarded to a team that outperformed Cinematch's accuracy by 10%.

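Major challenges
◦ Data sparsity – the data set is huge, and ratings are unevenly distributed.
◦ Scalability – the data set is huge, and it must be updated incrementally.
◦ Cold start – newly arrived users have no history.
◦ Diversity vs. accuracy – don't recommend to me what everybody already knows.
◦ Vulnerability to attacks – wherever there is a ranking, someone will try to game it.
◦ The value of time – people like different things at different times.
◦ Evaluation of recommendations – which recommendation methods are better and which are worse.
◦ User interface – a well-designed presentation makes users more willing to accept our recommendations.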

Evaluation Metrics for Recommendation
◦ The training set E^T
-- The training set is treated as known information.
◦ The probe set E^P
-- No information from the probe set may be used for recommendation.
◦ Accuracy metrics
-- Mean Absolute Error (MAE)
-- Root Mean Squared Error (RMSE)
-- Precision is the proportion of top recommendations that are good.
-- Recall is the proportion of good recommendations that appear in the top recommendations.
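
The four accuracy metrics listed above can be sketched in plain Java as follows. This is a minimal illustration, not code from Mahout: the class name `AccuracyMetrics` and its method names are hypothetical. It assumes the predicted and actual ratings for the probe set are given as parallel arrays, and that "good" items are given as a set of item IDs.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper class; names are illustrative, not part of Mahout.
final class AccuracyMetrics {

    /** Mean Absolute Error: the average of |predicted - actual| over the probe set. */
    static double mae(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int k = 0; k < actual.length; k++) {
            sum += Math.abs(predicted[k] - actual[k]);
        }
        return sum / actual.length;
    }

    /** Root Mean Squared Error: the square root of the average squared error. */
    static double rmse(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int k = 0; k < actual.length; k++) {
            double diff = predicted[k] - actual[k];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    /** Precision: the share of the top recommendations that are good. */
    static double precision(List<String> topN, Set<String> good) {
        if (topN.isEmpty()) {
            return 0.0;
        }
        return (double) hits(topN, good) / topN.size();
    }

    /** Recall: the share of the good items that appear in the top recommendations. */
    static double recall(List<String> topN, Set<String> good) {
        if (good.isEmpty()) {
            return 0.0;
        }
        return (double) hits(topN, good) / good.size();
    }

    // Counts how many recommended items are in the good set.
    private static long hits(List<String> topN, Set<String> good) {
        return topN.stream().filter(good::contains).count();
    }
}
```

MAE and RMSE judge predicted rating values, while precision and recall judge only which items make it into the top list, so the two pairs can rank the same recommender differently.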

Classifications of recommender systems
◦ Content-based recommendations
◦ Collaborative recommendations
    ◦ Memory-based collaborative filtering
        ◦ Standard similarity-based methods
        ◦ Methods employing social filtering
    ◦ Model-based collaborative filtering
        ◦ Dimensionality reduction methods
        ◦ Diffusion-based methods
◦ Hybrid approaches

Similarity-based methods
◦ User-based recommender
for every other user w
    compute a similarity s between u and w
    retain the top users, ranked by similarity, as a neighborhood n
for every item i that some user in n has a preference for,
        but that u has no preference for yet
    for every other user v in n that has a preference for i
        compute a similarity s between u and v
        incorporate v's preference for i, weighted by s, into a running average
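
The pseudocode above can be sketched in plain Java without Mahout. This is a simplified illustration under stated assumptions, not Mahout's implementation: the class `UserBasedSketch` and its methods are hypothetical, preferences are held in memory as a user → item → rating map, similarity is the Pearson correlation over co-rated items, and users with non-positive similarity are dropped from the neighborhood.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative, not part of Mahout.
final class UserBasedSketch {

    /** Pearson correlation over the items both users rated; 0 when undefined. */
    static double pearson(Map<String, Double> a, Map<String, Double> b) {
        List<String> common = new ArrayList<>();
        for (String item : a.keySet()) {
            if (b.containsKey(item)) {
                common.add(item);
            }
        }
        int n = common.size();
        if (n < 2) {
            return 0.0;
        }
        double meanA = 0.0, meanB = 0.0;
        for (String item : common) {
            meanA += a.get(item);
            meanB += b.get(item);
        }
        meanA /= n;
        meanB /= n;
        double cov = 0.0, varA = 0.0, varB = 0.0;
        for (String item : common) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0.0 || varB == 0.0) {
            return 0.0;
        }
        return cov / Math.sqrt(varA * varB);
    }

    /** Estimates u's preference for each unseen item as a similarity-weighted average. */
    static Map<String, Double> estimate(String u, Map<String, Map<String, Double>> prefs,
                                        int neighborhoodSize) {
        Map<String, Double> mine = prefs.get(u);

        // Step 1: score every other user and keep the most similar ones.
        Map<String, Double> sims = new HashMap<>();
        for (String w : prefs.keySet()) {
            if (!w.equals(u)) {
                double s = pearson(mine, prefs.get(w));
                if (s > 0.0) { // assumption: ignore dissimilar users
                    sims.put(w, s);
                }
            }
        }
        List<String> neighbors = new ArrayList<>(sims.keySet());
        neighbors.sort((x, y) -> Double.compare(sims.get(y), sims.get(x)));
        if (neighbors.size() > neighborhoodSize) {
            neighbors = neighbors.subList(0, neighborhoodSize);
        }

        // Step 2: accumulate weighted preferences for items u has not rated.
        Map<String, Double> weightedSum = new HashMap<>();
        Map<String, Double> weightTotal = new HashMap<>();
        for (String v : neighbors) {
            double s = sims.get(v);
            for (Map.Entry<String, Double> e : prefs.get(v).entrySet()) {
                if (mine.containsKey(e.getKey())) {
                    continue;
                }
                weightedSum.merge(e.getKey(), s * e.getValue(), Double::sum);
                weightTotal.merge(e.getKey(), s, Double::sum);
            }
        }
        Map<String, Double> estimates = new HashMap<>();
        for (String item : weightedSum.keySet()) {
            estimates.put(item, weightedSum.get(item) / weightTotal.get(item));
        }
        return estimates;
    }
}
```

Sorting the returned estimates in descending order and taking the first few entries gives the recommendation list. The sketch reuses the similarities from step 1 rather than recomputing them in the inner loop, which does not change the result.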

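// Load the preference data from a CSV file.
DataModel model = new FileDataModel(new File("intro.csv"));
// Score user-user similarity with the Pearson correlation.
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
// Keep the 100 most similar users as the neighborhood.
UserNeighborhood neighborhood =
    new NearestNUserNeighborhood(100, similarity, model);
// Combine the pieces into a user-based recommender.
Recommender recommender =
    new GenericUserBasedRecommender(model, neighborhood, similarity);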

• Data model, implemented via DataModel
• User-user similarity metric, implemented via UserSimilarity
• User neighborhood definition, implemented via UserNeighborhood
• Recommender engine, implemented via a Recommender (here, GenericUserBasedRecommender)

◦ Item-based recommender
for every item i that u has no preference for yet
    for every item j that u has a preference for
        compute a similarity s between i and j
        add u's preference for j, weighted by s, to a running average
return the top items, ranked by weighted average
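
A plain-Java sketch of the item-based loop above follows. It is a hedged illustration, not Mahout's code: the class `ItemBasedSketch` and its methods are hypothetical, ratings are assumed to be non-negative, and the similarity is a cosine over the items' rating columns (missing ratings count as 0), chosen here only for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; names are illustrative, not part of Mahout.
final class ItemBasedSketch {

    /** Cosine similarity between the rating columns of items i and j. */
    static double cosine(String i, String j, Map<String, Map<String, Double>> prefs) {
        double dot = 0.0, normI = 0.0, normJ = 0.0;
        for (Map<String, Double> row : prefs.values()) {
            double ri = row.getOrDefault(i, 0.0);
            double rj = row.getOrDefault(j, 0.0);
            dot += ri * rj;
            normI += ri * ri;
            normJ += rj * rj;
        }
        if (normI == 0.0 || normJ == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normI * normJ);
    }

    /** Returns up to howMany unseen items for u, ranked by weighted average. */
    static List<String> recommend(String u, Map<String, Map<String, Double>> prefs,
                                  int howMany) {
        Map<String, Double> mine = prefs.get(u);

        // Collect every item any user has rated.
        Set<String> allItems = new TreeSet<>();
        for (Map<String, Double> row : prefs.values()) {
            allItems.addAll(row.keySet());
        }

        Map<String, Double> scores = new HashMap<>();
        for (String i : allItems) {
            if (mine.containsKey(i)) {
                continue; // u already has a preference for i
            }
            double weightedSum = 0.0, weightTotal = 0.0;
            for (Map.Entry<String, Double> e : mine.entrySet()) {
                double s = cosine(i, e.getKey(), prefs);
                weightedSum += s * e.getValue(); // u's preference for j, weighted by s
                weightTotal += s;
            }
            if (weightTotal > 0.0) {
                scores.put(i, weightedSum / weightTotal);
            }
        }

        // Rank candidate items by their weighted average, highest first.
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return ranked.subList(0, Math.min(howMany, ranked.size()));
    }
}
```

Unlike the user-based loop, no neighborhood is built here: every item the user has rated contributes, in proportion to its similarity to the candidate item.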

◦ Item-based recommender
DataModel model = new FileDataModel(new File("intro.csv"));
ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
Recommender recommender =
    new GenericItemBasedRecommender(model, similarity);

Summary of available recommender implementations in Mahout

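CNTV 5+ VIP Recommendation
passport_260676's preferences
(1st half 11:00) 9-Atlético Madrid-Radamel Falcao scores a goal lfp 3.0
(Q1 08:59) 6-EAST-LeBron James scores on a dunk nba 5.0
MV: "Set Off Now" (performed by Jike Junyi) nba 3.0
(Q2 11:00) 24-EAST-Paul George scores on a dunk nba 5.0
userBasedBooleanPref
(Q4 00:47) 32-WEST-Blake Griffin scores on a dunk nba 20.860504
(Q2 02:33) 32-WEST-Blake Griffin dunks off a pass from 24-WEST-Kobe Bryant nba 17.332127
wings nba 9.839406
(1st half 22:00) 7-Real Madrid-Cristiano Ronaldo scores an own goal lfp 8.962188
Tony Parker shows off his Chinese live nba 7.3381634
Evans shows off his creativity again: mid-air hand switch + leap over a poster nba 7.2042103
(2nd half 58:00) 10-Barcelona-Messi scores a goal lfp 7.201148
Singer NE-YO leads the East All-Stars' entrance with song and dance nba 7.151176
Ross recreates the dunk of the century + pays tribute to Vince Carter nba 7.0464416
(Q3 01:25) 34-Nuggets-JaVale McGee scores on a dunk nba 6.4302483

http://172.16.0.237:10008/recommend/userID/260676/howMany/10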

References
1. Sean Owen, Mahout in Action
2. Linyuan Lv, Recommender Systems

Architecture of NeuRecommendation
[Diagram] Requests for recommendation arrive from IMS etc. at the Dispatcher, which dispatches each request to one of the Recommender instances using round robin. A Data Feeder fetches users' preferences.

Architecture of NeuRecommendation: Recommender
[Diagram] Components: RPC, Mahout, Data Store.
1. Serve recommendation request
2. Fetch users' preferences
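
The round-robin dispatch named in the architecture can be sketched as below. This is only an illustration of the technique, not the NeuRecommendation code: the class `RoundRobinDispatcher` and its `pick` method are hypothetical, and the backends are represented by any type `T` (for example, host names of the Recommender instances).

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not taken from the NeuRecommendation code base.
final class RoundRobinDispatcher<T> {
    private final List<T> backends;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinDispatcher(List<T> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        this.backends = List.copyOf(backends);
    }

    /** Returns the backend that should serve the next request. */
    T pick() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), backends.size());
        return backends.get(index);
    }
}
```

With two backends, successive calls to `pick()` alternate between them, so each Recommender receives about the same number of requests regardless of how long each request takes.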
