SlideShare ist ein Scribd-Unternehmen logo
1 von 53
Downloaden Sie, um offline zu lesen
Recommendations
  in Production
     Alex MacCaw
Netflix Prize
Amazon.com
Facebook
Last.fm
StumbleUpon
Google Suggest
iTunes
Rotten Tomatoes
Yelp
Google Search
Chicken or Egg
• Google Reader
• IMDB
Acts As
Recommendable
Types of
   recommendations

• Content Based
• User Based
• Item Based
Programming Collective Intelligence
Has Many Through Relationship
User            Has Many Through
                                              Book

  Has Many                              Has Many




             UserBooks
              Can have score (rating)
User

class User < ActiveRecord::Base
  has_many :user_books
  has_many :books, :through => :user_books
  acts_as_recommendable :books, :through => :user_books
end
Gives you


User#similar_users
User#recommended_books
Book#similar_books
The algorithms
• Manhattan Distance
• Euclidean distance
• Cosine
• Pearson correlation coefficient
• Jaccard
• Levenshtein
How does it work?
Strategy

• Map data into Euclidean Space
• Calculate similarity
• Use similarities to recommend
The Black   John Tucker
          Knight      Must Die
James       4            5

Jonah       3            2

George      5            3

 Alex       4            2
5.00

                   3.75
The Black Knight
                   2.50

                   1.25

                     0
                          0   1.25   2.50   3.75   5.00



                      John Tucker Must Die
5.00

                   3.75
The Black Knight
                   2.50

                   1.25

                     0
                          0   1.25   2.50   3.75   5.00



                      John Tucker Must Die
item id
{
                        user id
    1 => {
       1 => 1.0,
       2 => 0.0,                  score
       ...
    },
    ...
}
[[1, 0.5554], [2, 0.888], [3, 0.8843], ...]
Problem 1

It was far too slow to calculate on the fly
                 (obvious)
SELECT   * FROM quot;usersquot;   WHERE (quot;usersquot;.quot;idquot; = 2)
SELECT   * FROM quot;booksquot;
SELECT   * FROM quot;usersquot;
SELECT   quot;user_booksquot;.*   FROM quot;user_booksquot; WHERE (quot;user_booksquot;.user_id IN (1,2,3,4,5,6,7,8,9,10))
SELECT   * FROM quot;booksquot;   WHERE (quot;booksquot;.quot;idquot; IN (11,6,12,7,13,8,14,9,15,1,2,19,20,3,10,4,5))
SELECT   * FROM quot;booksquot;   WHERE (quot;booksquot;.quot;idquot; IN (20,3,19,6))




                                              All books             All user_books
Solution
      Cache the dataset
        Build offline


rake recommendations:build
SELECT   *   FROM   quot;user_booksquot; WHERE (quot;user_booksquot;.user_id = 2)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 5)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 4)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 8)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 7)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 2)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 1)
Problem 2

Fetching the dataset took too
 long since it was so massive
Solution


Split up the cache by item
Rails.cache.write(
   quot;aar_books_1quot;, scores
 )
Problem 3

The dataset was so big it
     crashed Ruby!
Solution


Get rid of ActiveRecord

Only deal with integers
items = options[:on_class].connection.select_values(
  quot;SELECT id from #{options[:on_class].table_name}quot;
  ).collect(&:to_i)
Problem 4


It still crashed Ruby!
{
    1 => {
       1 => 1.0,
       2 => 0.0,
       ...
    },
    ...
}
Solution

Remove unnecessary
 cruft from dataset
{
    1 => {
       1 => 1.0,
       ...
    },
    ...
}
Problem 5

It was too slow
Solution


Re-write the slow bits in C
Details

• RubyInline
• Implemented Pearson
• Monkey patched original Ruby methods
• Very fast
InlineC = Module.new do
    inline do |builder|
      builder.c '
      #include <math.h>
      #include quot;ruby.hquot;
      double c_sim_pearson(VALUE items) {



Ruby Object
InlineC = Module.new do
        inline do |builder|
          builder.c '
          #include <math.h>
          #include quot;ruby.hquot;
          double c_sim_pearson(VALUE items) {




No Floats :(
Hash Lookup

if (!st_lookup(RHASH(prefs1)->tbl, items_a[i], &prefs1_item_ob)) {
  prefs1_item = 0.0;
} else {
  prefs1_item = NUM2DBL(prefs1_item_ob);
}
Conversion

return num / den;
Design Designs

• Not too many relationships
• Not to many ‘items’
• Similarity matrix for items, not users
Changing data
Scaling Even Further


• K Means clustering
• Split cluster by category
Adding ratings
ActiveRecord::Schema.define(:version => 1) do
  create_table quot;booksquot;, :force => true do |t|
    t.string   quot;namequot;
    t.datetime quot;created_atquot;
    t.datetime quot;updated_atquot;
  end

 create_table quot;user_booksquot;, :force => true do |t|
   t.integer quot;user_idquot;,    :null => false
   t.integer quot;book_idquot;,    :null => false
   t.integer quot;ratingquot;,     :default => 0
 end

  create_table   quot;usersquot;, :force => true do |t|
    t.string     quot;namequot;
    t.datetime   quot;created_atquot;
    t.datetime   quot;updated_atquot;
  end
end
class User < ActiveRecord::Base
  has_many :user_books
  has_many :books, :through => :user_books
  acts_as_recommendable :books, :through => :user_books, :score => :rating
end
That’s it
Improvements?


• Better API
• Perform calculations over a cluster (EC2)
  using Map/Nanite
class AARN < Nanite::Actor
  expose :sim_pearson

  def sim_pearson(item1, item2)
    Optimizations.c_sim_pearson(item1, item2)
  end
end
Questions?
              http://eribium.org/blog

           twitter : maccman
       email/jabber: maccman@gmail.com



http://github.com/maccman/acts_as_recommendable
             http://rubyurl.com/kUpk

Weitere ähnliche Inhalte

Ähnlich wie Acts As Recommendable

Relaxing With CouchDB
Relaxing With CouchDBRelaxing With CouchDB
Relaxing With CouchDBleinweber
 
development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...
development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...
development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...ActsAsCon
 
Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)MongoSF
 
Rocky Mountain URISA Talk (June 2008)
Rocky Mountain URISA Talk (June 2008)Rocky Mountain URISA Talk (June 2008)
Rocky Mountain URISA Talk (June 2008)Dave Bouwman
 
Performance Optimization of Rails Applications
Performance Optimization of Rails ApplicationsPerformance Optimization of Rails Applications
Performance Optimization of Rails ApplicationsSerge Smetana
 
Symfony & Javascript. Combining the best of two worlds
Symfony & Javascript. Combining the best of two worldsSymfony & Javascript. Combining the best of two worlds
Symfony & Javascript. Combining the best of two worldsIgnacio Martín
 
Automated Frontend Testing
Automated Frontend TestingAutomated Frontend Testing
Automated Frontend TestingNeil Crosby
 
Automated Testing with Ruby
Automated Testing with RubyAutomated Testing with Ruby
Automated Testing with RubyKeith Pitty
 
Solving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with RailsSolving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with Railsfreelancing_god
 
Esoteric, Obfuscated, Artistic Programming in Ruby
Esoteric, Obfuscated, Artistic Programming in RubyEsoteric, Obfuscated, Artistic Programming in Ruby
Esoteric, Obfuscated, Artistic Programming in Rubymametter
 
Python fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanPython fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanWei-Yuan Chang
 
JSLT: JSON querying and transformation
JSLT: JSON querying and transformationJSLT: JSON querying and transformation
JSLT: JSON querying and transformationLars Marius Garshol
 
Duplicates everywhere (Kiev)
Duplicates everywhere (Kiev)Duplicates everywhere (Kiev)
Duplicates everywhere (Kiev)Alexey Grigorev
 
Building unit tests correctly
Building unit tests correctlyBuilding unit tests correctly
Building unit tests correctlyDror Helper
 
Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012Michael Peacock
 

Ähnlich wie Acts As Recommendable (20)

Relaxing With CouchDB
Relaxing With CouchDBRelaxing With CouchDB
Relaxing With CouchDB
 
development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...
development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...
development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...
 
Sphinx on Rails
Sphinx on RailsSphinx on Rails
Sphinx on Rails
 
Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)
 
Rocky Mountain URISA Talk (June 2008)
Rocky Mountain URISA Talk (June 2008)Rocky Mountain URISA Talk (June 2008)
Rocky Mountain URISA Talk (June 2008)
 
Performance Optimization of Rails Applications
Performance Optimization of Rails ApplicationsPerformance Optimization of Rails Applications
Performance Optimization of Rails Applications
 
Symfony & Javascript. Combining the best of two worlds
Symfony & Javascript. Combining the best of two worldsSymfony & Javascript. Combining the best of two worlds
Symfony & Javascript. Combining the best of two worlds
 
Cooking with Chef
Cooking with ChefCooking with Chef
Cooking with Chef
 
Automated Frontend Testing
Automated Frontend TestingAutomated Frontend Testing
Automated Frontend Testing
 
Automated Testing with Ruby
Automated Testing with RubyAutomated Testing with Ruby
Automated Testing with Ruby
 
Solving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with RailsSolving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with Rails
 
Esoteric, Obfuscated, Artistic Programming in Ruby
Esoteric, Obfuscated, Artistic Programming in RubyEsoteric, Obfuscated, Artistic Programming in Ruby
Esoteric, Obfuscated, Artistic Programming in Ruby
 
Python fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanPython fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuan
 
Couch db and_the_web
Couch db and_the_webCouch db and_the_web
Couch db and_the_web
 
JSLT: JSON querying and transformation
JSLT: JSON querying and transformationJSLT: JSON querying and transformation
JSLT: JSON querying and transformation
 
Cucumber
CucumberCucumber
Cucumber
 
MongoDB
MongoDBMongoDB
MongoDB
 
Duplicates everywhere (Kiev)
Duplicates everywhere (Kiev)Duplicates everywhere (Kiev)
Duplicates everywhere (Kiev)
 
Building unit tests correctly
Building unit tests correctlyBuilding unit tests correctly
Building unit tests correctly
 
Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012
 

Kürzlich hochgeladen

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

Acts As Recommendable