Exploring content recommendation
Felipe Besson
@fmbesson
March, 2013
“A lot of times, people don't know what they
want until you show it to them.”
Steve Jobs
“We don't make money when we sell things;
we make money when we help customers
make purchase decisions.”
Jeff Bezos, Amazon
Why is recommendation important?
Apache Mahout: an Apache project to build scalable machine learning libraries
● Focused on large data sets
● Adaptation of standard machine learning algorithms
● Runs on Apache Hadoop (map/reduce paradigm) … or on a non-Hadoop node
Who is using Mahout?
Source: https://cwiki.apache.org/MAHOUT/powered-by-mahout.html
Supported core algorithms
● Classification
● Clustering
● Recommendation
● Pattern Mining
● Regression
● Dimension Reduction
● Evolutionary Algorithms
● Vector Similarity
Mahout Recommender
Collaborative filtering
● People often get the best recommendations from someone with similar taste
● People tend to like things that are similar to other things they like
● There are patterns in people's likes and dislikes
John      Bob
movie1    movie1
movie2    movie2
movie4    movie42
movie5
Will Bob like movie4 and movie5?
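The John/Bob question can be sketched in a few lines of Python. This is only an illustration of the collaborative filtering idea, not Mahout's implementation: the Jaccard overlap used as the similarity, the function names, and the reading of the slide (John likes movie1, movie2, movie4, movie5; Bob likes movie1, movie2, movie42) are all assumptions.

```python
def jaccard(a, b):
    """Overlap between two sets of liked items (0.0 = nothing shared, 1.0 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(target, likes):
    """Score items the target has not seen by the similarity of the users who like them."""
    scores = {}
    for other, items in likes.items():
        if other == target:
            continue
        sim = jaccard(likes[target], items)
        # Only items the target does not already have are candidates.
        for item in items - likes[target]:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Assumed reading of the slide's example.
likes = {
    "John": {"movie1", "movie2", "movie4", "movie5"},
    "Bob": {"movie1", "movie2", "movie42"},
}

print(recommend("Bob", likes))
```

Because John and Bob share movie1 and movie2, John's remaining movies (movie4 and movie5) become the candidates for Bob.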
Mahout Recommender
Available recommenders
● Item based
● User based
Execution modes
● Taste: online but not distributed
● Hadoop: offline (batch) but distributed
Parameters
● Many coefficients to calculate user and item similarity and neighborhood
● Data model abstractions
Mahout Recommender (Hadoop)
Input
user_id, item_id, preference_value (optional)
1, 23, 0.9
1, 15, 0.5
1, 89, 0.1
2, 11, 0.3
2, 15, 0.2
9, 10, 0.5
9, 99, 0.9
9, 11, 0.1
8, 11, 0.5
...
Output
user_id: [recommended_item, score; …]
1: [10, 0.93; 11, 0.84; … ]
2: [23, 0.72; 17, 0.60; … ]
8: [121, 0.98; 23, 0.78; … ]
17: [12, 0.89; 32, 0.56; … ]
42: [129, 0.92; 98, 0.45; … ]
...
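A small Python sketch of reading the input rows and printing results in the layout shown on this slide. The function names are hypothetical, the default of 1.0 for a missing preference value is an assumption, and the output formatter mirrors the slide rather than the exact files the Hadoop job writes.

```python
def parse_preferences(lines):
    """Turn 'user_id, item_id[, preference_value]' rows into {user: {item: value}}."""
    prefs = {}
    for line in lines:
        line = line.strip()
        if not line or line == "...":
            continue  # skip blanks and the elision marker from the slide
        fields = [f.strip() for f in line.split(",")]
        user, item = int(fields[0]), int(fields[1])
        # Assumption: a row without a preference value counts as 1.0.
        value = float(fields[2]) if len(fields) > 2 else 1.0
        prefs.setdefault(user, {})[item] = value
    return prefs


def format_recommendations(user, recs):
    """Render (item, score) pairs as 'user: [item, score; item, score]'."""
    body = "; ".join(f"{item}, {score:.2f}" for item, score in recs)
    return f"{user}: [{body}]"


rows = ["1, 23, 0.9", "1, 15, 0.5", "2, 11", "..."]
print(parse_preferences(rows))
print(format_recommendations(1, [(10, 0.93), (11, 0.84)]))
```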
1st try!
Movie recommendation
Netflix data set (http://www.netflixprize.com/)
● # of user tastes: 2,817,131
● # of movies: 17,770
● # of users: 472,891
Environment and performance
● Hadoop pseudo-distributed
● Computer
  ● Intel® Core™ i5-3317U CPU @ 1.70GHz × 4
  ● 6 GB RAM
● Total time: ~16 minutes
How to run?
1. Copy the input file to HDFS (Hadoop Distributed File System)
hadoop fs -put qualifying.txt /netflix/input/data.txt
2. Run the recommender
hadoop jar core/target/mahout-core-0.8-SNAPSHOT-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=/netflix/input/data.txt \
  -Dmapred.output.dir=/netflix/output \
  --numRecommendations 10 \
  --similarityClassname SIMILARITY_LOGLIKELIHOOD
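The SIMILARITY_LOGLIKELIHOOD option selects a similarity based on the log-likelihood ratio. As a rough sketch of that statistic (following Dunning's entropy formulation over a 2×2 co-occurrence table), the Python below may help. The function names, the count layout, and the mapping of the ratio into the range [0, 1) are illustrative assumptions, not Mahout's source code.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR for a 2x2 table of co-occurrence counts.

    k11: users who liked both items     k12: liked item A only
    k21: liked item B only              k22: liked neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Guard against tiny negative values from floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(k11, k12, k21, k22):
    """Assumed mapping of the ratio into [0, 1): larger means more similar."""
    return 1.0 - 1.0 / (1.0 + log_likelihood_ratio(k11, k12, k21, k22))


print(llr_similarity(10, 0, 0, 10))  # items always liked together
print(llr_similarity(1, 1, 1, 1))    # no association between the items
```

A ratio near zero means the two items co-occur about as often as chance would predict; a large ratio means the co-occurrence is unlikely to be accidental.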
Results
Recommender analyzer
https://github.com/besson/recommender_analyzer
http://rec-analyzer.herokuapp.com/
Results
References
Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman. Mahout in Action. Manning Publications, 2011.
Thanks
Felipe Besson
@fmbesson
