Usman Sharif

RECOMMENDATION SYSTEMS
Why recommendation systems?

• Provide a better experience to your users.
• Understand the behavior and patterns of users.
• Create an opportunity to re-engage inactive users.
• Boost sales.
• Often better than a search feature alone.
How some companies are using Recommendation Systems - Amazon
How some companies are using Recommendation Systems - Gmail
A simple recommendation system

• Consider the following scenario:
   • A library has books and members.
   • Members can borrow books.
   • The library wants to build a recommender system to recommend books to its members.
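Scoring Matrices
User-book matrix (X = the user has read the book):
         Book 1   Book 2   Book 3   Book 4
User 1   X                 X
User 2   X
User 3            X                 X
User 4   X                 X        X
User 5   X        X

Book-book matrix (each cell counts the users who have read both books; the diagonal counts each book’s readers):
         Book 1   Book 2   Book 3   Book 4
Book 1   4        1        2        1
Book 2   1        2        0        1
Book 3   2        0        2        1
Book 4   1        1        1        2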
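The second matrix can be derived from the first by counting, for every pair of books, the users who have read both. The sketch below is one minimal way to do that in Python; the names `reads`, `cooccurrence` and `rank_for` are illustrative and not from the slides, and ties are broken by book number, which is an assumption.

```python
# Borrowing history from the user-book matrix: user -> set of books read.
reads = {
    "User 1": {1, 3},
    "User 2": {1},
    "User 3": {2, 4},
    "User 4": {1, 3, 4},
    "User 5": {1, 2},
}
books = [1, 2, 3, 4]


def cooccurrence(reads, books):
    """Count, for each pair of books, how many users have read both."""
    counts = {a: {b: 0 for b in books} for a in books}
    for read in reads.values():
        for a in read:
            for b in read:
                counts[a][b] += 1  # a == b fills the diagonal (reader count)
    return counts


def rank_for(book, counts):
    """Other books sorted by co-occurrence with `book`, highest first.

    Python's sort is stable, so tied books keep ascending book order.
    """
    others = [b for b in counts[book] if b != book]
    return sorted(others, key=lambda b: counts[book][b], reverse=True)


scores = cooccurrence(reads, books)
print(scores[1])            # row for Book 1
print(rank_for(1, scores))  # books to suggest after Book 1
```

With the data above, `rank_for(1, scores)` yields `[3, 2, 4]`, which is the row-wise descending order of the book-book matrix.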
Using the scoring matrices

• Recommend the other books in descending order of their score in the row of the book already read:
• If a user has read Book 1, recommend Books 3, 2, 4.
• If a user has read Book 2, recommend Books 1, 4, 3.
• If a user has read Book 3, recommend Books 1, 4, 2.
• If a user has read Book 4, recommend Books 1, 2, 3.
Advantages

• Very simple to understand and implement.
• Works well when a single user activity (one book read) is enough to drive further recommendations.
Disadvantages

• Cannot work for a new user with no history.
• In a real-world scenario with thousands of books and thousands of members, most cells are bound to be zero (a sparse matrix).
• Does not consider more than one item at a time.
Another Try
• Our Books records might look like this:
BookId  Title                    Genre          Writer              Language
1       The Great Gatsby         Classic        F Scott Fitzgerald  English
2       Nine Stories             Short Stories  J D Salinger        English
3       The Sun Also Rises       Classic        Ernest Hemingway    English
4       The Hunger Games         Action         Suzanne Collins     English
5       The Ambler Warning       Thriller       Robert Ludlum       English
6       The Catcher in the Rye   Classic        J D Salinger        English
7       To Kill a Mockingbird    Classic        Harper Lee          English
Create an Item Similarity Matrix
            Book 1     Book 2      Book 3     Book 4      Book 5     Book 6      Book 7
Book 1      3          1           2          1           1          2           2
Book 2      1          3           1          1           1          2           1
Book 3      2          1           3          1           1          2           2
Book 4      1          1           1          3           1          1           1
Book 5      1          1           1          1           3          1           1
Book 6      2          2           2          1           1          3           2
Book 7      2          1           2          1           1          2           3
• This would always be a square (n x n) matrix.
• Each cell holds the number of attributes the two books share (Genre, Writer, Language); unique attributes such as BookId and Title are excluded.
• In general any measure for similarity can be used here.
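The attribute-count similarity described above can be sketched in a few lines of Python. This is only an illustration under the assumption that Genre, Writer and Language are the compared attributes; the names `similarity`, `matrix` and `most_similar` are hypothetical.

```python
# Book attributes from the Books table: (Genre, Writer, Language).
books = {
    1: ("Classic", "F Scott Fitzgerald", "English"),
    2: ("Short Stories", "J D Salinger", "English"),
    3: ("Classic", "Ernest Hemingway", "English"),
    4: ("Action", "Suzanne Collins", "English"),
    5: ("Thriller", "Robert Ludlum", "English"),
    6: ("Classic", "J D Salinger", "English"),
    7: ("Classic", "Harper Lee", "English"),
}


def similarity(a, b):
    """Number of attributes that books `a` and `b` have in common."""
    return sum(x == y for x, y in zip(books[a], books[b]))


# Square (n x n) matrix; the diagonal is the attribute count (3).
matrix = {a: {b: similarity(a, b) for b in books} for a in books}


def most_similar(book):
    """All other books that share the highest similarity with `book`."""
    best = max(v for b, v in matrix[book].items() if b != book)
    return [b for b, v in matrix[book].items() if b != book and v == best]


print(matrix[6])
print(most_similar(3))
```

Any other similarity measure could replace `similarity` without changing the rest of the sketch.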
To Recommend

• Look at what a user has previously read.
• Use the values from the similarity matrix to recommend books based on how similar they are to a book the user has already read.
Advantages

• Recommendations can be pre-computed for a very large item base.
• Fast lookups can be built to serve recommendations.
• For example, if a user is viewing the page for Book 3, you may want to recommend Books 1, 6 and 7.
• Works for new or non-registered users.
Disadvantage

• Does not consider the user’s history; instead, it looks at a collective trend.
Another Approach - The Users

• Our Users records might look like this:
 UserId     Gender    Age        Location
 1          Male      34         Pakistan
 2          Female    28         Pakistan
 3          Male      38         India
 4          Male      32         India
 5          Female    21         Pakistan
 6          Female    24         Pakistan
The User Borrowing
  UserId   BookId
  1        3
  1        7
  2        2
  3        1
  3        5
  3        7
  4        6
  4        7
  5        2
  6        4
  6        6
  6        7
Transforming User Borrowing
             User 1     User 2       User 3   User 4   User 5   User 6
   Book 1                            X
   Book 2               X                              X
   Book 3    X
   Book 4                                                       X
   Book 5                            X
   Book 6                                     X                 X
   Book 7    X                       X        X                 X


• Issue: too many zero values (30 of the 42 cells are empty).
• Any solutions?
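To make the sparsity concrete, the borrowing records can be pivoted into the book-by-user grid and the empty cells counted. This is a small sketch with illustrative variable names; it assumes user and book IDs are consecutive integers starting at 1, as in the tables.

```python
# (UserId, BookId) pairs from the User Borrowing table.
borrowings = [(1, 3), (1, 7), (2, 2), (3, 1), (3, 5), (3, 7),
              (4, 6), (4, 7), (5, 2), (6, 4), (6, 6), (6, 7)]
n_users, n_books = 6, 7

# matrix[book][user] is True when that user borrowed that book.
matrix = [[False] * n_users for _ in range(n_books)]
for user_id, book_id in borrowings:
    matrix[book_id - 1][user_id - 1] = True

empty = sum(not cell for row in matrix for cell in row)
total = n_users * n_books
print(f"{empty}/{total} cells are empty")
```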
Transform the Users Records

• Treat Age as a discrete column with ranges like {0-10, 11-20, 21-30, 31-40, …} so that we can create some partitions like this:
  PartitionId   Gender   AgeGroup   Location
  1             Male     31-40      Pakistan
  2             Female   21-30      Pakistan
  3             Male     31-40      India
Recreate User Borrowing using Partition Information
• Fewer zero-valued cells (11/21 compared to 30/42 previously).
• Far fewer columns than we previously had!
• The notation has been changed from ‘X’ to a count.
           Partition 1   Partition 2   Partition 3
  Book 1                               1
  Book 2                 2
  Book 3   1
  Book 4                 1
  Book 5                               1
  Book 6                 1             1
  Book 7   1             1             2
To Recommend

• See which partition a user belongs to.
• Look at the column of that partition and sort the books in descending order of their
  frequency count.
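The partitioning and lookup steps can be sketched as follows. This is a minimal illustration, not a definitive implementation: `age_group`, `partition_of` and `recommend` are hypothetical names, a partition is taken to be the (Gender, AgeGroup, Location) triple, and ties in the count are left in first-seen order.

```python
from collections import Counter, defaultdict

# Users table: UserId -> (Gender, Age, Location).
users = {
    1: ("Male", 34, "Pakistan"),
    2: ("Female", 28, "Pakistan"),
    3: ("Male", 38, "India"),
    4: ("Male", 32, "India"),
    5: ("Female", 21, "Pakistan"),
    6: ("Female", 24, "Pakistan"),
}
# (UserId, BookId) pairs from the User Borrowing table.
borrowings = [(1, 3), (1, 7), (2, 2), (3, 1), (3, 5), (3, 7),
              (4, 6), (4, 7), (5, 2), (6, 4), (6, 6), (6, 7)]


def age_group(age):
    """Bucket an age into the ranges 0-10, 11-20, 21-30, 31-40, ..."""
    if age <= 10:
        return "0-10"
    low = (age - 1) // 10 * 10 + 1
    return f"{low}-{low + 9}"


def partition_of(user_id):
    """Partition key for a user: (Gender, AgeGroup, Location)."""
    gender, age, location = users[user_id]
    return (gender, age_group(age), location)


# counts[partition][book] = number of borrowings within that partition.
counts = defaultdict(Counter)
for user_id, book_id in borrowings:
    counts[partition_of(user_id)][book_id] += 1


def recommend(user_id):
    """Books borrowed in the user's partition, most frequent first."""
    return [book for book, _ in counts[partition_of(user_id)].most_common()]


print(recommend(2))  # a user in the Female / 21-30 / Pakistan partition
```

In practice one would probably also filter out books the user has already borrowed; the slides do not specify this, so the sketch leaves it out.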
Advantages

• Continues to improve over time.
• More partitions can be added over time.
• Instead of using one collective score, the technique partitions the user base into ‘similar’ users.
• The technique can easily be extended to the item side: rather than having books as rows, we can have book clusters.
Disadvantages

• Needs some seed data to start.
• Requires some transformations.
• Can become very complex as the number of users/items grows.
Evaluating Performance (Metrics)
• Almost any Information Retrieval metric can be used.
• Three interesting ones:
   • Accuracy
   • Coverage
   • Normalized Distance Based Performance Measure (NDPM)
Accuracy
• Takes into account the order in which recommendations are shown to users and how they responded to them.
• For rank position = 1:
   • Acc(1) = (number of positive responses with rank ≤ 1) / (total recommendations with rank ≤ 1)
   • Therefore, Acc(1) = 1 / 3 = 33.33%
• Similarly, Acc(2) = 2 / 6 = 33.33%
UserId     BookId    Rank       Response
1          3         1          Yes
1          2         2          No
2          7         1          No
2          5         2          Yes
3          3         1          No
3          7         2          No
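The accuracy figures above can be reproduced with a short Python sketch. The function name `accuracy_at` and the tuple layout are illustrative choices, not part of the slides.

```python
# (UserId, BookId, Rank, Response) rows; True stands for "Yes".
rows = [
    (1, 3, 1, True),
    (1, 2, 2, False),
    (2, 7, 1, False),
    (2, 5, 2, True),
    (3, 3, 1, False),
    (3, 7, 2, False),
]


def accuracy_at(k, rows):
    """Share of positive responses among recommendations with rank <= k."""
    shown = [row for row in rows if row[2] <= k]
    positive = sum(1 for row in shown if row[3])
    return positive / len(shown)


print(f"Acc(1) = {accuracy_at(1, rows):.2%}")
print(f"Acc(2) = {accuracy_at(2, rows):.2%}")
```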
Coverage
• Shows the share of all items that appear in the recommendations across all users.
• For rank position = 1:
   • Cov(1) = (unique items in recommendations with rank ≤ 1) / (total items)
   • Therefore, Cov(1) = 2 / 7 = 28.57%
• Similarly, Cov(2) = 4 / 7 = 57.14%
UserId     BookId   Rank      Response
1          3        1         Yes
1          2        2         No
2          7        1         No
2          5        2         Yes
3          3        1         No
3          7        2         No
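Coverage can be computed the same way, by collecting the distinct books recommended up to a given rank. The sketch below assumes a catalogue of 7 books, as in the Books table; `coverage_at` is an illustrative name.

```python
# (UserId, BookId, Rank, Response) rows; True stands for "Yes".
rows = [
    (1, 3, 1, True),
    (1, 2, 2, False),
    (2, 7, 1, False),
    (2, 5, 2, True),
    (3, 3, 1, False),
    (3, 7, 2, False),
]
TOTAL_ITEMS = 7  # size of the book catalogue


def coverage_at(k, rows, total_items):
    """Share of the catalogue that appears in recommendations with rank <= k."""
    items = {book for _, book, rank, _ in rows if rank <= k}
    return len(items) / total_items


print(f"Cov(1) = {coverage_at(1, rows, TOTAL_ITEMS):.2%}")
print(f"Cov(2) = {coverage_at(2, rows, TOTAL_ITEMS):.2%}")
```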
Normalized Distance Based Performance Measure (NDPM)
• Assesses the quality of a recommendation system, taking into account the order in which items are shown.
• NDPM = (C- + 0.5 x C+) / Cu
• C- is the number of recommended item pairs, taken in rank order, where the user responded (No, Yes).
• C+ is the number of recommended item pairs where the user responded (Yes, No).
• Cu is the number of all item pairs where the user’s two responses differ.
• In our example, computed per user:
   • User 1: C- = 2, C+ = 2 and Cu = 4 => NDPM(1) = (2 + 0.5 x 2) / 4 = 75%
   • User 2: C- = 0, C+ = 1 and Cu = 1 => NDPM(2) = (0 + 0.5 x 1) / 1 = 50%
   • Overall: NDPM = (0.75 + 0.5) / 2 = 62.5%
UserId   BookId   Rank   Response
1        3        1      Yes
1        2        2      No
1        7        3      No
1        5        4      Yes
2        3        1      Yes
2        7        2      No
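The per-user calculation can be sketched in Python as below. It follows the formula exactly as written on this slide, which weights (Yes, No) pairs by 0.5; other formulations of NDPM weight pairs differently, so treat this as an illustration of the slide's arithmetic only. The name `ndpm` and the choice to return `None` when no pair differs are assumptions.

```python
from itertools import combinations

# Responses per user, listed in rank order; True stands for "Yes".
responses = {
    1: [True, False, False, True],
    2: [True, False],
}


def ndpm(resp):
    """(C- + 0.5 * C+) / Cu over all rank-ordered pairs of one user's responses."""
    c_minus = c_plus = 0
    for earlier, later in combinations(resp, 2):
        if earlier and not later:
            c_plus += 1   # (Yes, No): the liked item was ranked higher
        elif later and not earlier:
            c_minus += 1  # (No, Yes): the liked item was ranked lower
    c_u = c_minus + c_plus  # pairs where the two responses differ
    if c_u == 0:
        return None  # undefined when the user gave identical responses
    return (c_minus + 0.5 * c_plus) / c_u


per_user = {user: ndpm(resp) for user, resp in responses.items()}
overall = sum(per_user.values()) / len(per_user)
print(per_user, overall)
```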
How to improve results

• Keep a list of the recommendations each user has already seen, and don’t show them again for some time.
• Give users a way to say what they’re looking for.
• Alternatively, infer this from their searches.
Some standard algorithms
• Item Hierarchy
   • You bought a printer, so you will also need ink.
• Attribute-based recommendations
   • You like reading classics written by Salinger, so you might like “Catcher in the Rye”.
• Collaborative Filtering – User-User Similarity
   • People like you who read “The Hunger Games” also read “The Ambler Warning”.
• Collaborative Filtering – Item-Item Similarity
   • You like “Catcher in the Rye” so you will like “Nine Stories”.
• Social + Interest Graph Based
   • Your friends like “The Great Gatsby” so you will like “The Great Gatsby” too.
• Model Based
   • Training SVM, LDA, SVD for implicit features.
Some Tools

• Apache Mahout (Java)
• Crab (Python)
• Easyrec (RESTful API)
Questions?
Thank you!

            www.usman-sharif.com
                  @sharif_usman
