Building Recommendation Engine. Keeyong Han, Jan 2013

Building a large-scale recommendation engine

A quick survey of recommendation strategies and an introduction to Mahout


  1. Building Recommendation Engine. Keeyong Han, Jan 2013
  2. Table of Contents
     1. What is Recommendation?
     2. Different Recommendation Strategies
     3. Introduction to Hadoop/Mahout
     4. Building Recommendation Engine with Hadoop/Mahout
     5. How to use Mahout
     6. Q&A
  3. What is Recommendation?
  4. Definition of Recommendation Engine
     • "A recommendation system provides information or items that are likely to be of interest to a user, in an automated fashion" - Alpa Jain from Twitter
     • "Serve the right item to users in an automated fashion to optimize long-term business objectives" - Deepak Agarwal from Yahoo
  5. Examples
     • Related Product (Amazon)
     • Movie Recommendation (Netflix)
     • News Contents (Yahoo)
     • Online Dating (eHarmony)
     • Search Autocomplete (Google)
     • Connection Recommendation (LinkedIn)
     • Song Recommendation (Pandora)
     • Walmart – (Physical) Store Layout
  6. Why Recommendation?
     • A way for users to find content of interest (from large selections) with less effort.
        o A natural path to personalization
        o Serendipity factor
     • For companies, a good way to introduce new and unknown content
  7. Different Recommendation Strategies: Item vs. User
  8. Item-based recommendation (1)
     1. Content-based Item Recommendation.
        o Using metadata from each item, compute the similarity between items.
           i. Description, price, category and so on
           ii. Normalize these into a feature vector (numeric values)
              - You can think of it as a point in N-dimensional space.
           iii. Compute the distances between vectors.
              - Euclidean Distance Score
              - Cosine Similarity Score
              - Pearson Correlation Score
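The three scores named on this slide can be sketched in plain Python. This is a minimal illustration, not code from the deck: the item names and feature values are hypothetical, and mapping the Euclidean distance to a score with 1 / (1 + distance) is one common convention assumed here.

```python
import math


def euclidean_score(a, b):
    """Similarity in (0, 1]; 1.0 means identical vectors (assumed 1/(1+d) mapping)."""
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)


def cosine_score(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_score(a, b):
    """Pearson correlation between two feature vectors of equal length."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Hypothetical items, already normalized into numeric feature vectors.
shirt = [0.2, 0.5, 0.9]
blouse = [0.25, 0.5, 0.8]
boots = [0.9, 0.1, 0.3]

for name, other in (("blouse", blouse), ("boots", boots)):
    print(name, euclidean_score(shirt, other), cosine_score(shirt, other),
          pearson_score(shirt, other))
```

With these made-up vectors the blouse scores closer to the shirt than the boots do under the Euclidean and cosine scores; which score works best depends on how the features were normalized.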
  9. Item-based recommendation (2)
     2. Collaborative Filtering.
        o Leverage users’ collective intelligence
           - Similar users tend to like similar items
           - Amazon’s product recommendation is a well-known example
        o Covered in more detail later
  10. User-based recommendation
     • First group users into different clusters
        o Represent users as feature vectors
           - Information about users: geo-location, gender, age, …
           - Items users liked or rated
        o K-nearest neighbors (KNN) is commonly used
     • From each cluster, find representative items
        o Some kind of graph traversal
        o Highest rated items
        o Most liked items
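A rough sketch of the user-based idea follows. As a simplification it treats a user's k nearest neighbors as that user's "cluster" and picks the most liked items among them. The user names, feature values, and likes are hypothetical.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two user feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(target, features, k):
    """Return the k users whose feature vectors are closest to the target's."""
    others = [u for u in features if u != target]
    others.sort(key=lambda u: distance(features[target], features[u]))
    return others[:k]


def most_liked_items(neighbors, likes, exclude, top_n):
    """Count likes among the neighbors, skipping items the target already likes."""
    counts = Counter(
        item for u in neighbors for item in likes[u] if item not in exclude
    )
    return [item for item, _ in counts.most_common(top_n)]


# Hypothetical users as normalized feature vectors, e.g. (age, location bucket).
features = {
    "ann": [0.10, 0.20],
    "bob": [0.15, 0.25],
    "cat": [0.20, 0.10],
    "dan": [0.90, 0.80],
}
likes = {
    "ann": {"a"},
    "bob": {"a", "b", "c"},
    "cat": {"b", "d"},
    "dan": {"e"},
}

neighbors = nearest_neighbors("ann", features, k=2)
print(neighbors)
print(most_liked_items(neighbors, likes, exclude=likes["ann"], top_n=1))
```

A production system would more likely precompute clusters offline rather than search neighbors per request; the sketch only shows the shape of the computation.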
  11. Challenges of Recommendation Engine
     • Cold Start
        o For new users and/or items, there is no information to leverage.
     • Sparse Data
        o Most users review or purchase only a few items, so the data is sparse.
     • Scalability Issue
        o The bigger the data gets, the more computation is needed.
  12. Introduction to Hadoop/Mahout
  13. What is Hadoop?
     • An open source distributed computation and storage platform modeled after the Google File System and the MapReduce framework
     • A good fit for large-scale offline batch processing, but not for realtime processing
     • Widely used in many companies
  14. What is Mahout?
     • An open source machine learning library written in Java.
        1. Standalone
        2. MapReduce.
           o Supports large-scale offline batch processing.
     • Covers the following
        o Recommendation/Collaborative Filtering.
        o Classification: Supervised Learning.
        o Clustering: Unsupervised Learning.
  15. Building Recommendation Engine with Hadoop/Mahout
  16. Typical Architecture
     [Diagram: pipeline stages, in order]
     • Data Collection: web server logs, MySQL tables, ... (explicit feedback and implicit feedback)
     • Input Data Pre-processing (ETL, Filtering, …)
     • Recommendation Data Building (Mahout)
     • Output Data Post-processing (Re-ordering)
     • Load Final Data To Serving Layer
     • Recommendation Serving Layer: MySQL, NoSQL, Solr/ElasticSearch, ...
     [Diagram label: Hadoop]
  17. Use Case: Polyvore – Item Page
     [Screenshot labels: Item in question; Content Based Recommendation; Collaborative Filtering]
  18. Use Case: Polyvore – Home Page
     [Screenshot label: Personalized Recommendation]
  19. People who liked this also like ...
     • This is based on "Collaborative Filtering"
     • Construct a co-occurrence matrix (item similarity matrix) S[NxN]
        o Increment S[i,j] and S[j,i] if item i and item j are liked by the same user
        o Repeat this for all users and all of their liked items
     • For item k, recommend the items that co-occur with it most often (from column k or row k).
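The co-occurrence construction above can be sketched as follows. This is an in-memory illustration only (Mahout does the same counting as MapReduce jobs); the users and items are hypothetical, and ties are broken alphabetically as an arbitrary choice.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(likes_by_user):
    """S[i][j] = number of users who liked both item i and item j."""
    s = defaultdict(lambda: defaultdict(int))
    for items in likes_by_user.values():
        # Each unordered pair of a user's liked items increments both cells.
        for i, j in combinations(sorted(set(items)), 2):
            s[i][j] += 1
            s[j][i] += 1
    return s


def also_liked(s, item, top_n):
    """Items that co-occur most often with `item` (row k of S)."""
    row = s.get(item, {})
    ranked = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical like data: user -> liked items.
likes_by_user = {
    "u1": ["dress", "heels", "bag"],
    "u2": ["dress", "heels"],
    "u3": ["dress", "bag", "scarf"],
    "u4": ["heels", "scarf"],
}

s = build_cooccurrence(likes_by_user)
print(also_liked(s, "dress", top_n=2))
```

The nested dictionary keeps only non-zero cells, which matters because S is typically very sparse for real catalogs.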
  20. Personalized Recommendation
     • This is based on "Collaborative Filtering"
     • Extension of the previous topic
     • Computation-wise, it is a matrix multiplication
        a. First, build a similarity matrix (S) for items
        b. Next, build a preference vector (P) for the user
        c. Next, multiply the two from a and b: R = S x P
        d. Lastly, sort the elements of the resulting vector R
  21. Polyvore Example
     • Assumption:
        o N items and M users. Users can only like (no rating)
     • Create the item similarity matrix S (NxN)
        o This is used for the recommendations on the Item page
     • Create the user preference vector P (1xN)
        o Set P(i) to 1 for every item i liked by the user in question, and to 0 otherwise
     • Multiply S by P
        o S is symmetric, so multiplying from either side gives the same scores
        o Sort the resulting elements by score
        o This is the personalized item recommendation
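The multiplication R = S x P can be sketched with plain lists. The item names and counts are hypothetical, and dropping items the user already likes from the result is an assumption added here, not something the slides state.

```python
def recommend(items, s, liked, top_n):
    """Score every item as R = S x P and return the best unseen ones."""
    n = len(items)
    # P(i) = 1 if the user liked item i, else 0 (like-only data, no ratings).
    p = [1 if item in liked else 0 for item in items]
    # R(i) = sum over j of S[i][j] * P(j)
    r = [sum(s[i][j] * p[j] for j in range(n)) for i in range(n)]
    # Assumption: items the user already likes are not recommended again.
    candidates = [(items[i], r[i]) for i in range(n) if items[i] not in liked]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:top_n]


# Hypothetical 4-item catalog with a symmetric co-occurrence matrix.
items = ["bag", "dress", "heels", "scarf"]
s = [
    [0, 2, 1, 1],  # bag
    [2, 0, 2, 1],  # dress
    [1, 2, 0, 1],  # heels
    [1, 1, 1, 0],  # scarf
]

print(recommend(items, s, liked={"dress"}, top_n=2))
print(recommend(items, s, liked={"dress", "scarf"}, top_n=2))
```

For a single liked item the result is just that item's row of S, which is why the slides call this an extension of "people who liked this also like".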
  22. How to use Mahout?
     • ItemSimilarityJob class
        o Main class to compute the co-occurrence matrix.
     • RecommenderJob class
        o Main class to generate personalized recommendations.

         hadoop jar mahout-core-0.8-job.jar \
           org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
           -Dmapred.input.dir=input/user-item-rating.txt \
           -Dmapred.output.dir=output \
           --usersFile input/users.txt \
           --booleanData \
           --similarityClassname SIMILARITY_COOCCURRENCE \
           --minPrefsPerUser 2 \
           --maxPrefsPerUser 50000

     This runs a total of 10 MapReduce jobs to generate the final recommendations for users.
  23. How to use Mahout? (Contd)
     • Input File: user-item-rating.txt
        o userID,itemID[,rating] per line.
     • How to compute similarity between items
        o The --similarityClassname parameter determines the measure:
           - CooccurrenceCountSimilarity
           - LogLikelihoodSimilarity
           - TanimotoCoefficientSimilarity
           - CityBlockSimilarity
           - CosineSimilarity
           - PearsonCorrelationSimilarity
           - EuclideanDistanceSimilarity
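To make the input format and one of the listed measures concrete, here is a small sketch that reads userID,itemID[,rating] lines and computes the Tanimoto coefficient (shared users divided by all users who liked either item). It is not Mahout's implementation, and the sample IDs are made up.

```python
from collections import defaultdict


def read_preferences(lines):
    """Parse 'userID,itemID[,rating]' lines into item -> set of users."""
    users_by_item = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(",")
        user_id, item_id = fields[0], fields[1]
        # The optional rating in fields[2] is ignored: like-only (boolean) data.
        users_by_item[item_id].add(user_id)
    return users_by_item


def tanimoto(users_a, users_b):
    """|A intersect B| / |A union B| over the users who liked each item."""
    union = len(users_a | users_b)
    if union == 0:
        return 0.0
    return len(users_a & users_b) / union


# Hypothetical contents of user-item-rating.txt.
sample = """1,101
1,102
2,101
2,102
2,103
3,101
"""

prefs = read_preferences(sample.splitlines())
print(tanimoto(prefs["101"], prefs["102"]))
print(tanimoto(prefs["101"], prefs["103"]))
```

Unlike the raw co-occurrence count, this measure is normalized, so very popular items do not dominate simply because many users liked them.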
  24. How to use Mahout? (Contd)
     • Final Output
        o UserID [(ItemID,Score),(ItemID,Score),...]
        o ...
     • Load this from HDFS to a serving layer
        o Relational Database
        o Search Engine
        o NoSQL
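Loading the output into a serving layer might look like the sketch below. It assumes each output line has the form userID, a tab, then [itemID:score,itemID:score,...]; verify this against your own job output and adjust the parser if it differs. SQLite stands in for the relational database, and the table name recs is hypothetical.

```python
import sqlite3


def parse_line(line):
    """Parse 'userID<TAB>[itemID:score,...]' (assumed format) into (user, pairs)."""
    user_id, _, payload = line.strip().partition("\t")
    pairs = []
    for chunk in payload.strip("[]").split(","):
        if not chunk:
            continue
        item_id, score = chunk.split(":")
        pairs.append((item_id, float(score)))
    return user_id, pairs


def load(lines, conn):
    """Insert one row per (user, item, score) into a hypothetical 'recs' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recs (user_id TEXT, item_id TEXT, score REAL)"
    )
    for line in lines:
        user_id, pairs = parse_line(line)
        conn.executemany(
            "INSERT INTO recs VALUES (?, ?, ?)",
            [(user_id, item_id, score) for item_id, score in pairs],
        )
    conn.commit()


def top_items(conn, user_id, n):
    """Serve the n highest-scored items for a user."""
    rows = conn.execute(
        "SELECT item_id FROM recs WHERE user_id = ? ORDER BY score DESC LIMIT ?",
        (user_id, n),
    )
    return [row[0] for row in rows]


# Hypothetical output lines, as if copied from HDFS to local disk.
output_lines = ["1\t[103:4.5,104:3.0]", "2\t[101:2.0]"]

conn = sqlite3.connect(":memory:")
load(output_lines, conn)
print(top_items(conn, "1", n=2))
```

The same parse step would feed a search engine or NoSQL store; only the load function changes.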
  25. Lessons
     • Need to understand the business domain
        o This takes time and effort
     • Garbage In, Garbage Out
        o Filtering is very important
     • Start with a simple approach
        o And then improve gradually
     • Having an automated pipeline is very important
        o It makes more experiments possible with less effort
        o Remember you will have to do lots of experiments
        o But it is hard and takes time to build
  26. Next stage of recommendation?
     • Need realtime & scalable recommendation technology.
     • Recommendation As A Service.
        o www.myrrix.com
  27. Q&A: keeyonghan@hotmail.com
