Recommender System

hellojinjie
2013-06-19
We will talk about
◦ Netflix Prize
◦ Major challenges
◦ Definitions of subjects and problems
◦ Recommendation methods
◦ Mahout
◦ CNTV 5+ VIP Recommendation
We will not talk about
◦ Architecture of a recommender system
◦ How to make it robust and scalable
Netflix Prize
◦ Netflix, Inc. is an American provider of on-demand Internet streaming media and flat-rate DVD-by-mail.
◦ 60% of DVDs rented by Netflix are selected based on personalized recommendations.
Netflix Prize
◦ In October 2006, Netflix released a dataset containing approximately 100 million anonymous movie ratings and challenged researchers and practitioners to develop recommender systems that could beat the accuracy of the company's recommendation system, Cinematch.
◦ On 21 September 2009, the grand prize of $1,000,000 was awarded to a team that outperformed Cinematch's accuracy by 10%.
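Major challenges
◦ Data sparsity – the data set is huge; ratings are unevenly distributed.
◦ Scalability – the data set is huge; updates have to be incremental.
◦ Cold start – newly arrived users.
◦ Diversity vs. accuracy – don't recommend me what everybody already knows.
◦ Vulnerability to attacks – wherever there is a ranking list, someone will try to game it.
◦ The value of time – people like different things at different times.
◦ Evaluation of recommendations – which recommendation method is better and which is worse.
◦ User interface – an optimized presentation that makes users happy to accept our recommendations.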
Evaluation Metrics for Recommendation
◦ The training set E^T
-- the training set is treated as known information.
◦ The probe set E^P
-- no information from the probe set is allowed to be used for recommendation.
Evaluation Metrics for Recommendation
◦ Accuracy Metrics
◦ Mean Absolute Error (MAE)

◦ Root Mean Squared Error (RMSE)
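A minimal sketch of the two accuracy metrics, assuming the predicted and the actual probe-set ratings are available as parallel arrays. The class and method names (`AccuracyMetrics`, `mae`, `rmse`) are illustrative and are not part of Mahout.

```java
public class AccuracyMetrics {

    /** Mean Absolute Error: the average of |predicted - actual| over the probe set. */
    public static double mae(double[] predicted, double[] actual) {
        checkInput(predicted, actual);
        double sum = 0.0;
        for (int k = 0; k < predicted.length; k++) {
            sum += Math.abs(predicted[k] - actual[k]);
        }
        return sum / predicted.length;
    }

    /** Root Mean Squared Error: the square root of the average squared difference. */
    public static double rmse(double[] predicted, double[] actual) {
        checkInput(predicted, actual);
        double sum = 0.0;
        for (int k = 0; k < predicted.length; k++) {
            double diff = predicted[k] - actual[k];
            sum += diff * diff;
        }
        return Math.sqrt(sum / predicted.length);
    }

    private static void checkInput(double[] a, double[] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }

    public static void main(String[] args) {
        // Hypothetical ratings, made up for illustration only.
        double[] predicted = {4.0, 3.0, 5.0, 2.0};
        double[] actual = {5.0, 3.0, 4.0, 4.0};
        System.out.println("MAE  = " + mae(predicted, actual));
        System.out.println("RMSE = " + rmse(predicted, actual));
    }
}
```

Because RMSE squares each difference before averaging, a single large error raises it more than it raises MAE.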
Evaluation Metrics for Recommendation
◦ Precision is the proportion of top recommendations
that are good.
◦ Recall is the proportion of good recommendations that
appear in top recommendations.
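The two definitions can be sketched as follows, assuming the top recommendations are a list of distinct item IDs and the "good" items are a set of item IDs taken from the probe set. The names `PrecisionRecall`, `precision` and `recall` are illustrative only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrecisionRecall {

    /** Proportion of the top recommendations that are good. */
    public static double precision(List<Long> topRecommendations, Set<Long> goodItems) {
        if (topRecommendations.isEmpty()) {
            return 0.0;
        }
        return (double) hits(topRecommendations, goodItems) / topRecommendations.size();
    }

    /** Proportion of the good items that appear in the top recommendations. */
    public static double recall(List<Long> topRecommendations, Set<Long> goodItems) {
        if (goodItems.isEmpty()) {
            return 0.0;
        }
        return (double) hits(topRecommendations, goodItems) / goodItems.size();
    }

    /** Number of recommended items that are good (the list is assumed to hold no duplicates). */
    private static int hits(List<Long> topRecommendations, Set<Long> goodItems) {
        int count = 0;
        for (Long item : topRecommendations) {
            if (goodItems.contains(item)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical item IDs, made up for illustration only.
        List<Long> top = Arrays.asList(1L, 2L, 3L, 4L);
        Set<Long> good = new HashSet<>(Arrays.asList(2L, 4L, 5L, 6L, 7L));
        System.out.println("precision = " + precision(top, good)); // 2 hits out of 4 recommended
        System.out.println("recall    = " + recall(top, good));    // 2 hits out of 5 good items
    }
}
```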
Classifications of recommender systems
◦ Content-based recommendations
◦ Collaborative recommendations
    ◦ Memory-based collaborative filtering
        ◦ Standard similarity-based methods
        ◦ Methods employing social filtering
    ◦ Model-based collaborative filtering
        ◦ Dimensionality reduction methods
        ◦ Diffusion-based methods
◦ Hybrid approaches
Similarity-based methods
◦ User-based recommender
for every other user w
    compute a similarity s between u and w
retain the top users, ranked by similarity, as a neighborhood n
for every item i that some user in n has a preference for,
        but that u has no preference for yet
    for every other user v in n that has a preference for i
        compute a similarity s between u and v
        incorporate v's preference for i, weighted by s, into a running average
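The pseudocode above can be sketched in plain Java without Mahout. This is an illustration under stated assumptions, not Mahout's implementation: preferences live in a nested map (user ID, then item ID, then rating), the similarity is a cosine over the rating vectors (the pseudocode does not prescribe one), and similarities are computed once and reused rather than recomputed in the inner loop. All names and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UserBasedSketch {

    /** Cosine similarity between two users' rating vectors (missing ratings count as 0). */
    public static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double r : b.values()) {
            normB += r * r;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Estimated preferences of user u for items u has not rated, from the n most similar users. */
    public static Map<Long, Double> estimate(Map<Long, Map<Long, Double>> prefs, long u, int n) {
        Map<Long, Double> own = prefs.get(u);
        // for every other user w: compute a similarity s between u and w
        Map<Long, Double> sims = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> e : prefs.entrySet()) {
            if (e.getKey() != u) {
                sims.put(e.getKey(), similarity(own, e.getValue()));
            }
        }
        // retain the top users, ranked by similarity, as a neighborhood
        List<Long> neighborhood = new ArrayList<>(sims.keySet());
        neighborhood.sort((x, y) -> Double.compare(sims.get(y), sims.get(x)));
        if (neighborhood.size() > n) {
            neighborhood = neighborhood.subList(0, n);
        }
        // for every item some neighbor has rated but u has not: similarity-weighted average
        Map<Long, Double> weightedSum = new HashMap<>();
        Map<Long, Double> simSum = new HashMap<>();
        for (long v : neighborhood) {
            double s = sims.get(v);
            for (Map.Entry<Long, Double> p : prefs.get(v).entrySet()) {
                if (own.containsKey(p.getKey())) {
                    continue; // u already has a preference for this item
                }
                weightedSum.merge(p.getKey(), s * p.getValue(), Double::sum);
                simSum.merge(p.getKey(), s, Double::sum);
            }
        }
        Map<Long, Double> estimates = new HashMap<>();
        for (Long item : weightedSum.keySet()) {
            if (simSum.get(item) > 0.0) {
                estimates.put(item, weightedSum.get(item) / simSum.get(item));
            }
        }
        return estimates;
    }

    /** Builds one user's ratings from alternating (itemID, rating) values. */
    private static Map<Long, Double> row(double... kv) {
        Map<Long, Double> r = new HashMap<>();
        for (int k = 0; k < kv.length; k += 2) {
            r.put((long) kv[k], kv[k + 1]);
        }
        return r;
    }

    /** Hypothetical preferences, made up for illustration only. */
    public static Map<Long, Map<Long, Double>> sampleData() {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        prefs.put(1L, row(101, 5, 102, 3));
        prefs.put(2L, row(101, 5, 102, 3, 103, 4));
        prefs.put(3L, row(101, 1, 102, 5, 103, 1));
        prefs.put(4L, row(104, 5));
        return prefs;
    }

    public static void main(String[] args) {
        // With a neighborhood of 2, user 4 (no overlap with user 1) is left out,
        // so item 104 receives no estimate.
        System.out.println(estimate(sampleData(), 1L, 2));
    }
}
```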
Similarity-based methods
◦ User-based recommender
// Load user,item,preference triples from a CSV file
DataModel model = new FileDataModel(new File("intro.csv"));
// Pearson correlation between users' ratings
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
// Neighborhood: the 100 users most similar to the target user
UserNeighborhood neighborhood =
    new NearestNUserNeighborhood(100, similarity, model);
Recommender recommender =
    new GenericUserBasedRecommender(model, neighborhood, similarity);
Similarity-based methods
◦ User-based recommender
• Data model, implemented via DataModel
• User-user similarity metric, implemented via UserSimilarity
• User neighborhood definition, implemented via UserNeighborhood
• Recommender engine, implemented via a Recommender (here, GenericUserBasedRecommender)
Similarity-based methods
◦ Item-based recommender
for every item i that u has no preference for yet
    for every item j that u has a preference for
        compute a similarity s between i and j
        add u's preference for j, weighted by s, to a running average
return the top items, ranked by weighted average
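A plain-Java sketch of the item-based loop above, again without Mahout. The assumptions: preferences are a nested map (user ID, then item ID, then rating) and item-item similarity is a cosine over the items' rating columns, which the pseudocode leaves open. The class name, method names and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ItemBasedSketch {

    /** Cosine similarity between the rating columns of items i and j (missing ratings count as 0). */
    public static double similarity(Map<Long, Map<Long, Double>> prefs, long i, long j) {
        double dot = 0.0, normI = 0.0, normJ = 0.0;
        for (Map<Long, Double> row : prefs.values()) {
            double ri = row.getOrDefault(i, 0.0);
            double rj = row.getOrDefault(j, 0.0);
            dot += ri * rj;
            normI += ri * ri;
            normJ += rj * rj;
        }
        return (normI == 0.0 || normJ == 0.0) ? 0.0 : dot / Math.sqrt(normI * normJ);
    }

    /** Returns up to howMany item IDs that u has not rated, ranked by estimated preference. */
    public static List<Long> recommend(Map<Long, Map<Long, Double>> prefs, long u, int howMany) {
        Map<Long, Double> own = prefs.get(u);
        Set<Long> allItems = new HashSet<>();
        for (Map<Long, Double> row : prefs.values()) {
            allItems.addAll(row.keySet());
        }
        Map<Long, Double> estimates = new HashMap<>();
        // for every item i that u has no preference for yet
        for (long i : allItems) {
            if (own.containsKey(i)) {
                continue;
            }
            double weighted = 0.0, total = 0.0;
            // for every item j that u has a preference for
            for (Map.Entry<Long, Double> e : own.entrySet()) {
                double s = similarity(prefs, i, e.getKey());
                weighted += s * e.getValue(); // u's preference for j, weighted by s
                total += s;
            }
            if (total > 0.0) {
                estimates.put(i, weighted / total);
            }
        }
        // return the top items, ranked by weighted average
        List<Long> ranked = new ArrayList<>(estimates.keySet());
        ranked.sort((x, y) -> Double.compare(estimates.get(y), estimates.get(x)));
        return ranked.size() > howMany ? ranked.subList(0, howMany) : ranked;
    }

    /** Builds one user's ratings from alternating (itemID, rating) values. */
    private static Map<Long, Double> row(double... kv) {
        Map<Long, Double> r = new HashMap<>();
        for (int k = 0; k < kv.length; k += 2) {
            r.put((long) kv[k], kv[k + 1]);
        }
        return r;
    }

    /** Hypothetical preferences, made up for illustration only. */
    public static Map<Long, Map<Long, Double>> sampleData() {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        prefs.put(1L, row(101, 5, 102, 1));
        prefs.put(2L, row(101, 5, 102, 1, 103, 5));
        prefs.put(3L, row(102, 5, 104, 5));
        prefs.put(4L, row(101, 4, 103, 4));
        return prefs;
    }

    public static void main(String[] args) {
        // User 1 likes item 101, which is co-rated with 103, so 103 ranks ahead of 104.
        System.out.println(recommend(sampleData(), 1L, 2));
    }
}
```

Unlike the user-based variant, no neighborhood is needed: the loop only compares items, and item-item similarities can be precomputed because they do not depend on the target user.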
Similarity-based methods
◦ Item-based recommender
DataModel model = new FileDataModel(new File("intro.csv"));
// Pearson correlation between items' ratings
ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
// An item-based recommender needs no neighborhood, only an item similarity
Recommender recommender =
    new GenericItemBasedRecommender(model, similarity);
Summary of available recommender implementations in Mahout
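CNTV 5+ VIP Recommendation
passport_260676's preferences
(First half 11:00) 9-Atlético Madrid-Radamel Falcao scores a goal lfp 3.0
(1st quarter 08:59) 6-EAST-LeBron James scores on a dunk nba 5.0
MV-即刻出发 (sung by Jike Junyi) nba 3.0
(2nd quarter 11:00) 24-EAST-Paul George scores on a dunk nba 5.0
userBasedBooleanPref
(4th quarter 00:47) 32-WEST-Blake Griffin scores on a dunk nba 20.860504
(2nd quarter 02:33) 32-WEST-Blake Griffin takes a pass from 24-WEST-Kobe Bryant and dunks nba 17.332127
wings nba 9.839406
(First half 22:00) 7-Real Madrid-Cristiano Ronaldo scores an own goal lfp 8.962188
Tony Parker shows off his Chinese live nba 7.3381634
Evans gets creative again: mid-air hand switch + leap over a poster nba 7.2042103
(Second half 58:00) 10-Barcelona-Messi scores a goal lfp 7.201148
Singer NE-YO performs a song-and-dance number and leads the East All-Stars onto the court nba 7.151176
Ross replicates the dunk of the century + pays tribute to Vince Carter nba 7.0464416
(3rd quarter 01:25) 34-Nuggets-JaVale McGee scores on a dunk nba 6.4302483

http://172.16.0.237:10008/recommend/userID/260676/howMany/10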
References
1. Sean Owen, Mahout in Action
2. Linyuan Lv, Recommender Systems
Architecture of NeuRecommendation
[Diagram. Components: IMS etc., Dispatcher, Recommender (two instances), Data Feeder. Labels: "Request for recommendation"; "Dispatch request using round robin"; "Fetching users' preferences".]
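The round-robin dispatching named in the diagram can be sketched as below. This is a generic illustration, not the NeuRecommendation code: the class `RoundRobinDispatcher`, its `pick` method and the backend names are hypothetical, and a real dispatcher would forward the request to the chosen Recommender rather than just return it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinDispatcher<T> {

    private final List<T> backends;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinDispatcher(List<T> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        this.backends = new ArrayList<>(backends); // defensive copy
    }

    /** Returns the backends in turn; safe to call from several threads. */
    public T pick() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), backends.size());
        return backends.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical recommender instances, named for illustration only.
        RoundRobinDispatcher<String> dispatcher =
                new RoundRobinDispatcher<>(Arrays.asList("recommender-1", "recommender-2"));
        for (int request = 0; request < 4; request++) {
            System.out.println("request " + request + " -> " + dispatcher.pick());
        }
    }
}
```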
Architecture of NeuRecommendation
[Diagram. Components of a Recommender: RPC, Data Store, Mahout. Labels: 1. Serve recommendation request; 2. Fetch users' preferences.]