BUILDING A
PREDICTIVE MODEL
AN EXAMPLE OF A PRODUCT
RECOMMENDATION ENGINE

Alex Lin
Senior Architect
Intelligent Mining
alin@intelligentmining.com
Outline
  Predictive modeling methodology
  k-Nearest Neighbor (kNN) algorithm
  Singular value decomposition (SVD) method for dimensionality reduction
  Using a synthetic data set to test and improve your model
  Experiment and results
The Business Problem
  Design a product recommender solution that will increase revenue.
How Do We Increase Revenue?

  Increase Revenue
      Increase Conversion
      Increase Avg. Order Value
          Increase Unit Price
          Increase Units / Order
Example
  Is this recommendation effective?

  (Figure: a sample recommendation judged against two levers:
   Increase Unit Price and Increase Units / Order)
What am I going to do?
Predictive Model Framework

  Data --> Features --> ML Algorithm --> Prediction Output

  What data? | What features? | Which algorithm? | Cross-sell & up-sell
  recommendation
What Data to Use?
  Explicit data
      Ratings
      Comments
  Implicit data
      Order history / Return history
      Cart events
      Page views
      Click-thru
      Search log
  In today's talk we only use Order history and Cart events
Predictive Model

  Data --> Features --> ML Algorithm --> Prediction Output

  Order History, Cart Events | What features? | Which algorithm? |
  Cross-sell & up-sell recommendation
What Features to Use?
  We know that a given product tends to get purchased by customers with
   similar tastes or needs.
  Use user engagement data to describe a product.

  (Figure: item 17's user engagement vector over users 1..n; entries
   such as 1 and .25 mark each user's level of engagement with the item)
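The engagement vector above can be built directly from raw event logs. A minimal sketch in C, assuming (as the deck's matrices of 1s and .25s suggest, though the exact scheme is not stated) that an order contributes weight 1.0 and a cart event 0.25, and that the strongest signal per user wins:

```c
#include <stddef.h>

/* Event types observed for (item, user) pairs. */
enum event_type { EVT_ORDER, EVT_CART };

/* Hypothetical weights: the deck's matrices show entries of 1 and .25,
   consistent with order = 1.0 and cart event = 0.25, but the exact
   weighting scheme is an assumption here. */
static double event_weight(enum event_type t) {
    return (t == EVT_ORDER) ? 1.0 : 0.25;
}

/* Fill one item's user engagement vector: for each event on this item,
   keep the strongest signal seen for that user. */
void build_engagement_vector(double *vec, size_t n_users,
                             const size_t *event_user,
                             const enum event_type *event_kind,
                             size_t n_events) {
    for (size_t u = 0; u < n_users; u++) vec[u] = 0.0;
    for (size_t e = 0; e < n_events; e++) {
        double w = event_weight(event_kind[e]);
        if (w > vec[event_user[e]]) vec[event_user[e]] = w;
    }
}
```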
Data Representation / Features
  When we merge every item's user engagement vector, we get an m x n
   item-user matrix (items as rows, users as columns):

  (Figure: sparse item-user matrix; e.g. item 1 has entries 1, .25, 1,
   .25 scattered across the user columns, item 2 has a single .25, and
   most entries are empty)
Data Normalization
  Ensure the magnitudes of the entries in the dataset matrix are
   appropriate

  (Figure: the item-user matrix before and after normalization; raw
   entries of 1 become values such as .5, .9, .92, .49, … once the
   column average is removed)

  Remove column average – so frequent buyers don't dominate the model
Data Normalization
  Different engagement data points (Order / Cart / Page View) should
   have different weights
  Common normalization strategies:
      Remove column average
      Remove row average
      Remove global mean
      Z-score
      Fill-in the null values
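The first strategy, removing the column average, can be sketched in a few lines of C. This is an illustrative sketch, not the deck's code; it assumes the average is taken over observed (nonzero) entries only, which the slides do not specify:

```c
#include <stddef.h>

#define N_ITEMS 3
#define N_USERS 4

/* Subtract each user column's average from that column's observed
   (nonzero) entries, so frequent buyers don't dominate the model.
   Averaging over observed entries only is an assumption. */
void remove_column_average(double m[N_ITEMS][N_USERS]) {
    for (size_t u = 0; u < N_USERS; u++) {
        double sum = 0.0;
        size_t cnt = 0;
        for (size_t i = 0; i < N_ITEMS; i++)
            if (m[i][u] != 0.0) { sum += m[i][u]; cnt++; }
        if (cnt == 0) continue;           /* empty column: nothing to do */
        double avg = sum / (double)cnt;
        for (size_t i = 0; i < N_ITEMS; i++)
            if (m[i][u] != 0.0) m[i][u] -= avg;
    }
}
```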
Predictive Model

  Data --> Features --> ML Algorithm --> Prediction Output

  Order History, Cart Events | User engagement vector + Data
  Normalization | Which algorithm? | Cross-sell & up-sell recommendation
Which Algorithm?
  How do we find the items that have similar user engagement data?

  (Figure: item-user matrix; items 17 and 18 have engagement vectors
   that overlap on several of the same users)

  We can find the items that have similar user engagement vectors with
   the kNN algorithm
k-Nearest Neighbor (kNN)
  Find the k items that have the most similar user engagement vectors

  (Figure: item-user matrix; item 4's engagement vector is compared
   against every other item's vector)

  Nearest Neighbors of Item 4 = [2,3,1]
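Once a similarity score has been computed for the target item against every other item, the neighbor list above is just a top-k selection. A minimal sketch (illustrative only; assumes similarities are precomputed and self-similarity is masked out, e.g. set to -1):

```c
#include <stddef.h>

/* Pick the k items most similar to a target item, given one
   precomputed similarity score per item.  Simple O(k*m) selection,
   fine for illustration; assumes m <= 1024 and k <= m. */
void top_k_neighbors(const double *sim, size_t m, size_t k, size_t *out) {
    unsigned char taken[1024] = {0};   /* marks items already selected */
    for (size_t r = 0; r < k; r++) {
        double best = -2.0;
        size_t best_i = 0;
        for (size_t i = 0; i < m; i++)
            if (!taken[i] && sim[i] > best) { best = sim[i]; best_i = i; }
        taken[best_i] = 1;
        out[r] = best_i;               /* neighbors in descending order */
    }
}
```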
Similarity Measure for kNN

  Example rows (users 1..n):
    item 2:  1         .5              1
    item 4:  1    .5        1     1

  Jaccard coefficient:
    sim(a,b) = |a ∩ b| / (|a| + |b| − |a ∩ b|)
             = (1+1) / ((1+1+1) + (1+1+1+1) − (1+1))

  Cosine similarity:
    sim(a,b) = cos(a,b) = a·b / (‖a‖ * ‖b‖)
             = (1*1 + 0.5*1)
               / (sqrt(1^2 + 0.5^2 + 1^2) * sqrt(1^2 + 0.5^2 + 1^2 + 1^2))

  Pearson correlation:
    corr(a,b) = Σi (r_ai − r̄_a)(r_bi − r̄_b)
                / ( sqrt(Σi (r_ai − r̄_a)^2) * sqrt(Σi (r_bi − r̄_b)^2) )
              = (m Σ a_i b_i − Σ a_i Σ b_i)
                / ( sqrt(m Σ a_i^2 − (Σ a_i)^2) * sqrt(m Σ b_i^2 − (Σ b_i)^2) )
              = (match_cols * DotProd(a,b) − sum(a) * sum(b))
                / ( sqrt(match_cols * sum(a^2) − (sum(a))^2)
                  * sqrt(match_cols * sum(b^2) − (sum(b))^2) )
k-Nearest Neighbor (kNN)

  (Figure: items plotted in feature space, with cosine similarity as the
   similarity measure; with k=5, Nearest Neighbors(8) = [9,6,3,1,2])
Predictive Model
  Ver. 1: kNN

  Data --> Features --> ML Algorithm --> Prediction Output

  Order History, Cart Events | User engagement vector + Data
  Normalization | k-Nearest Neighbor (kNN) | Cross-sell & up-sell
  recommendation
Cosine Similarity – Code fragment

 long i_cnt = 100000;       // number of items: 100K
 long u_cnt = 2000000;      // number of users: 2M
 double data[i_cnt][u_cnt]; // 100K x 2M dataset matrix (in reality, it needs to be malloc allocation)
 double norm[i_cnt];
 long i, j, f;
 double dot_product;

 // assume data matrix is loaded
 ……
 // calculate vector norm for each user engagement vector
 for (i = 0; i < i_cnt; i++) {
     norm[i] = 0;
     for (f = 0; f < u_cnt; f++) {
         norm[i] += data[i][f] * data[i][f];
     }
     norm[i] = sqrt(norm[i]);
 }

 // cosine similarity calculation
 for (i = 0; i < i_cnt; i++) {         // loop thru 100K items
     for (j = 0; j < i_cnt; j++) {     // loop thru 100K items
         dot_product = 0;
         for (f = 0; f < u_cnt; f++) { // loop thru entire user space (2M)
             dot_product += data[i][f] * data[j][f];
         }
         printf("%d %d %lf\n", i, j, dot_product / (norm[i] * norm[j]));
     }
 }
 // find the Top K nearest neighbors here
 …….

 1. 100K rows x 100K rows x 2M features --> scalability problem.
    Remedies: kd-tree, locality-sensitive hashing, MapReduce/Hadoop,
    multicore/threading, stream processors
 2. data[i] is high-dimensional and sparse, so similarity measures are
    not reliable --> accuracy problem

 This leads us to SVD dimensionality reduction!
Singular Value Decomposition (SVD)

  A = U × S × V^T

    A:   m x n matrix (items x users)
    U:   m x r matrix
    S:   r x r matrix
    V^T: r x n matrix

  Truncate to rank k < r:  A_k = U_k × S_k × V_k^T

  Low-rank approx. item profile:      U_k * S_k
  Low-rank approx. user profile:      S_k * V_k^T
  Low-rank approx. item-user matrix:  U_k * S_k * S_k * V_k^T
Reduced SVD

  A_k = U_k × S_k × V_k^T

    A_k:   100K x 2M matrix
    U_k:   100K x 3 matrix
    S_k:   3 x 3 matrix   | 7 0 0 |
                          | 0 3 0 |   (descending singular values)
                          | 0 0 1 |
    V_k^T: 3 x 2M matrix
    rank = 3

  Low-rank approx. item profile: U_k * S_k
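Because S_k is diagonal, the item profile U_k * S_k is just a per-column scaling of U_k. A tiny illustrative sketch (the 2-item U and the diagonal (7, 3, 1) are made-up example values, not from the real dataset):

```c
#include <stddef.h>

#define RANK 3

/* Item profile in the reduced space:
   (U_k * S_k)[i][f] = U[i][f] * s[f],
   since multiplying by the diagonal S_k only scales each column.
   U is stored row-major as m x RANK. */
void item_profile(const double *U, size_t m, const double s[RANK],
                  double *out) {
    for (size_t i = 0; i < m; i++)
        for (size_t f = 0; f < RANK; f++)
            out[i * RANK + f] = U[i * RANK + f] * s[f];
}
```

Neighborhood formation then runs over these m x 3 profiles instead of the m x 2M raw matrix.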
SVD Factor Interpretation
  Singular values plot (rank = 512)

  (Figure: singular values of S in descending order, e.g. diag(7, 3, 1);
   the left of the plot holds the more significant latent factors, the
   right holds noise and other less significant factors)
SVD Dimensionality Reduction

  (Figure: the item profile U_k * S_k — items by latent factors —
   replaces the original items-by-users matrix; the rank axis runs from
   3 up to 10 and beyond)

  Need to find the optimal low rank!
Missing values
  Difference between "0" and "unknown"
  Missing values do NOT appear randomly.
  Value = (Preference Factors) + (Availability) – (Purchased elsewhere)
   – (Navigation inefficiency) – etc.
  Approx. Value = (Preference Factors) +/- (Noise)
  Modeling missing values correctly will help us make good
   recommendations, especially when working with an extremely sparse
   data set
Singular Value Decomposition (SVD)
  Use SVD to reduce dimensionality, so neighborhood formation happens in
   the reduced user space
  SVD helps the model find the low-rank approx. dataset matrix, while
   retaining the critical latent factors and ignoring noise.
  The optimal low rank needs to be tuned
  SVD is computationally expensive

  SVD Libraries:
      Matlab: [U, S, V] = svds(A, 256);
      SVDPACKC http://www.netlib.org/svdpack/
      SVDLIBC http://tedlab.mit.edu/~dr/SVDLIBC/
      GHAPACK http://www.dcs.shef.ac.uk/~genevieve/ml.html
Predictive Model
  Ver. 2: SVD + kNN

  Data --> Features --> ML Algorithm --> Prediction Output

  Order History, Cart Events | User engagement vector + Data
  Normalization + SVD | k-Nearest Neighbor (kNN) in reduced space |
  Cross-sell & up-sell recommendation
Synthetic Data Set
  Why do we use a synthetic data set?
  So we can test our new model in a controlled environment
Synthetic Data Set
  16-latent-factor synthetic e-commerce data set
      Dimension: 1,000 (items) by 20,000 (users)
      16 user preference factors
      16 item property factors (non-negative)
      Txn Set: n = 55,360, sparsity = 99.72%
      Txn+Cart Set: n = 192,985, sparsity = 99.03%
      Download: http://www.IntelligentMining.com/dataset/

      user_id   item_id   type
      10        42        0.25
      10        997       0.25
      10        950       0.25
      11        836       0.25
      11        225       1
Synthetic Data Set
  Item property factors (1K x 16 matrix) × User preference factors
   (16 x 20K matrix) = Purchase likelihood scores (1K x 20K matrix)

  (Figure: item 3's property row (a, b, c, …) times user 2's preference
   column (x, y, z, …) gives entry X32 of the score matrix)

  X32 = (a, b, c) . (x, y, z) = a * x + b * y + c * z
  X32 = Likelihood of Item 3 being purchased by User 2
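The score entry above is a plain dot product. A minimal C sketch (the factor values in the test are made-up illustrations; the slide's example uses 3 factors for clarity, while the real set uses 16):

```c
#include <stddef.h>

/* Purchase likelihood score: dot product of an item's property factors
   with a user's preference factors, as in
   X32 = (a, b, c) . (x, y, z) = a*x + b*y + c*z. */
double likelihood(const double *item_factors, const double *user_factors,
                  size_t n_factors) {
    double x = 0.0;
    for (size_t f = 0; f < n_factors; f++)
        x += item_factors[f] * user_factors[f];
    return x;
}
```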
Synthetic Data Set
  (Figure: generating one user's purchases)
      Sort the user's items by purchase likelihood score
      Based on the distribution, pre-determine the # of items purchased
       by the user (here, # of items = 2)
      From the top, select and skip certain items to create data
       sparsity
      Result: User 1 purchased Item 4 and Item 1
Experiment Setup
  Each model (Random / kNN / SVD+kNN) will generate top-20
   recommendations for each item.
  Compare model output to the actual top 20 provided by the synthetic
   data set
  Evaluation Metrics:
      Precision %: overlap of the top 20 between model output and actual
       (higher is better)

       Precision = |{Found_Top20_items} ∩ {Actual_Top20_items}|
                   / |{Found_Top20_items}|

      Quality metric: average of the actual ranking in the model output
       (lower is better); example rank lists:
       1, 2, 30, 47, 50, 21  vs.  1, 2, 368, 62, 900, 510
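The precision metric above reduces to counting the overlap of two top-K lists. A minimal sketch in C (illustrative; the test uses k=4 instead of the deck's 20 to keep the example short):

```c
#include <stddef.h>

/* Precision: fraction of the model's top-k items that also appear in
   the actual top-k list from the synthetic data set. */
double precision_at_k(const int *found, const int *actual, size_t k) {
    size_t hits = 0;
    for (size_t i = 0; i < k; i++)
        for (size_t j = 0; j < k; j++)
            if (found[i] == actual[j]) { hits++; break; }
    return (double)hits / (double)k;
}
```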
Experimental Result
  kNN vs. Random (Control)

  (Charts: Precision % (higher is better) and Quality (lower is better))
Experimental Result
  Precision % of SVD+kNN

  (Chart: Precision % (higher is better) vs. SVD rank, showing
   improvement)
Experimental Result
  Quality of SVD+kNN

  (Chart: Quality (lower is better) vs. SVD rank, showing improvement)
Experimental Result
  The effect of using Cart data

  (Charts: Precision % (higher is better) and Quality (lower is better)
   vs. SVD rank)
Outline
  Predictive modeling methodology
  k-Nearest Neighbor (kNN) algorithm
  Singular value decomposition (SVD) method for dimensionality reduction
  Using a synthetic data set to test and improve your model
  Experiment and results
References
    J.S. Breese, D. Heckerman and C. Kadie, "Empirical Analysis of
     Predictive Algorithms for Collaborative Filtering," in Proceedings
     of the Fourteenth Conference on Uncertainty in Artificial
     Intelligence (UAI 1998), 1998.
    B. Sarwar, G. Karypis, J. Konstan and J. Riedl, "Item-based
     collaborative filtering recommendation algorithms," in Proceedings
     of the Tenth International Conference on the World Wide Web
     (WWW 10), pp. 285-295, 2001.
    B. Sarwar, G. Karypis, J. Konstan and J. Riedl, "Application of
     Dimensionality Reduction in Recommender System – A Case Study," in
     ACM WebKDD 2000 Web Mining for E-Commerce Workshop, 2000.
    Apache Lucene Mahout http://lucene.apache.org/mahout/
    Cofi: A Java-Based Collaborative Filtering Library
     http://www.nongnu.org/cofi/
Thank you
  Any questions or comments?

More Related Content

What's hot

Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringViet-Trung TRAN
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender SystemsLior Rokach
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender systemStanley Wang
 
Recommendation system
Recommendation systemRecommendation system
Recommendation systemAkshat Thakar
 
Movie recommendation system using collaborative filtering system
Movie recommendation system using collaborative filtering system Movie recommendation system using collaborative filtering system
Movie recommendation system using collaborative filtering system Mauryasuraj98
 
Introduction to Recommendation Systems
Introduction to Recommendation SystemsIntroduction to Recommendation Systems
Introduction to Recommendation SystemsTrieu Nguyen
 
Movie Recommendation engine
Movie Recommendation engineMovie Recommendation engine
Movie Recommendation engineJayesh Lahori
 
An introduction to Recommender Systems
An introduction to Recommender SystemsAn introduction to Recommender Systems
An introduction to Recommender SystemsDavid Zibriczky
 
Recommendation system
Recommendation system Recommendation system
Recommendation system Vikrant Arya
 
How to build a recommender system?
How to build a recommender system?How to build a recommender system?
How to build a recommender system?blueace
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation enginesGeorgian Micsa
 
[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systemsFalitokiniaina Rabearison
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithmsnextlib
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender SystemsT212
 
Introduction to Recommendation System
Introduction to Recommendation SystemIntroduction to Recommendation System
Introduction to Recommendation SystemMinha Hwang
 
Collaborative Filtering using KNN
Collaborative Filtering using KNNCollaborative Filtering using KNN
Collaborative Filtering using KNNŞeyda Hatipoğlu
 
Boston ML - Architecting Recommender Systems
Boston ML - Architecting Recommender SystemsBoston ML - Architecting Recommender Systems
Boston ML - Architecting Recommender SystemsJames Kirk
 
Matrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsMatrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsLei Guo
 

What's hot (20)

Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Movie recommendation system using collaborative filtering system
Movie recommendation system using collaborative filtering system Movie recommendation system using collaborative filtering system
Movie recommendation system using collaborative filtering system
 
Introduction to Recommendation Systems
Introduction to Recommendation SystemsIntroduction to Recommendation Systems
Introduction to Recommendation Systems
 
Movie Recommendation engine
Movie Recommendation engineMovie Recommendation engine
Movie Recommendation engine
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
An introduction to Recommender Systems
An introduction to Recommender SystemsAn introduction to Recommender Systems
An introduction to Recommender Systems
 
Recommendation system
Recommendation system Recommendation system
Recommendation system
 
How to build a recommender system?
How to build a recommender system?How to build a recommender system?
How to build a recommender system?
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation engines
 
[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems[Final]collaborative filtering and recommender systems
[Final]collaborative filtering and recommender systems
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Introduction to Recommendation System
Introduction to Recommendation SystemIntroduction to Recommendation System
Introduction to Recommendation System
 
Collaborative Filtering using KNN
Collaborative Filtering using KNNCollaborative Filtering using KNN
Collaborative Filtering using KNN
 
Boston ML - Architecting Recommender Systems
Boston ML - Architecting Recommender SystemsBoston ML - Architecting Recommender Systems
Boston ML - Architecting Recommender Systems
 
Matrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsMatrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender Systems
 

Viewers also liked

Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemMilind Gokhale
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Xavier Amatriain
 
Jeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYC
Jeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYCJeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYC
Jeremy Schiff, Senior Manager, Data Science, OpenTable at MLconf NYCMLconf
 
A Data Scientist in the Music Industry
A Data Scientist in the Music IndustryA Data Scientist in the Music Industry
A Data Scientist in the Music IndustryData Science London
 
Past present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry PerspectivePast present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry PerspectiveXavier Amatriain
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Xavier Amatriain
 
How to Build a Recommendation Engine on Spark
How to Build a Recommendation Engine on SparkHow to Build a Recommendation Engine on Spark
How to Build a Recommendation Engine on Spark (Caserta)
RecSys 2015: Large-scale real-time product recommendation at Criteo (Romain Lerallut)
Collaborative Filtering with Spark (Chris Johnson)
Intro to Factorization Machines (Pavel Kalaidin)
Lecture 6: LU factorization & determinants - sections 2-5, 2-7, 3-1 and 3-2 (njit-ronbrown)
Neighbor methods vs matrix factorization - case studies of real-life recommen... (Domonkos Tikk)
Matrix factorization (rubyyc)
Numerical Methods Tutorial - Part 2 (faradars)
Nonnegative Matrix Factorization (Tatsuya Yokota)
Factorization Machines with libFM (Liangjie Hong)
Matrix Factorization Technique for Recommender Systems (Aladejubelo Oluwashina)
Collaborative Filtering and Recommender Systems (Navisro Analytics)
Introduction to Matrix Factorization Methods Collaborative Filtering (DKALab)
 


Similar to Building a Recommendation Engine - An example of a product recommendation engine

2013 01-23 when analytics projects go wrong (Julien Coquet)
1440 track 2 boire_using our laptop (Rising Media, Inc.)
Black_Friday_Sales_Trushita (Trushita Redij)
Best practices for building and deploying predictive models over big data pre... (Kun Le)
Agile Workshop: Agile Metrics (Siddhi)
Scikit Learn: Data Normalization Techniques That Work (Damian R. Mingle, MBA)
Know How to Create and Visualize a Decision Tree with Python.pdf (Data Science Council of America)
Citizen Data Science Training using KNIME (Ali Raza Anjum)
Visualizing Model Selection with Scikit-Yellowbrick: An Introduction to Devel... (Benjamin Bengfort)
Fast Parallel Similarity Calculations with FPGA Hardware (TigerGraph)
IBM Cognos 10 Framework Manager Metadata Modeling: Tips and Tricks (Senturus)
Deep-Dive: Predicting Customer Behavior with Apigee Insights (Apigee | Google Cloud)
Amazon SageMaker Built-in Machine Learning Algorithms (Level 400) (Amazon Web Services)
Strata London - Deep Learning 05-2015 (Turi, Inc.)
Key projects in AI, ML and Generative AI (Vijayananda Mohire)
Monday, 03 January 2022: lecture 143 of the #تواصل_تطوير initiative... (Egyptian Engineers Association)
An introduction to Machine Learning (butest)
 


More from NYC Predictive Analytics

Graph Based Machine Learning with Applications to Media Analytics
The caret Package: A Unified Interface for Predictive Models
Intro to Classification: Logistic Regression & SVM
Introduction to R Package Recommendation System Competition
R package Recommendation Engine
Optimization: A Framework for Predictive Analytics
An Introduction to Multilevel Regression Modeling for Prediction
How OMGPOP Uses Predictive Analytics to Drive Change
Introduction to Probabilistic Latent Semantic Analysis
Recommendation Engine Demystified
 

 

 


Building a Recommendation Engine - An example of a product recommendation engine

  • 1. BUILDING A PREDICTIVE MODEL AN EXAMPLE OF A PRODUCT RECOMMENDATION ENGINE Alex Lin Senior Architect Intelligent Mining alin@intelligentmining.com
  • 2. Outline   Predictive modeling methodology   k-Nearest Neighbor (kNN) algorithm   Singular value decomposition (SVD) method for dimensionality reduction   Using a synthetic data set to test and improve your model   Experiment and results 2
  • 3. The Business Problem   Design a product recommender solution that will increase revenue. $$ 3
  • 4. How Do We Increase Revenue? Increase Conversion Increase Revenue Increase Unit Price Increase Avg. Order Value Increase Units / Order 4
  • 5. Example   Is this recommendation effective? Increase Unit Price Increase Units / Order 5
  • 6. What am I going to do? 6
  • 7. Predictive Model   Framework ML Prediction Data Features Algorithm Output What data? What feature? Which Algorithm ? Cross-sell & Up-sell Recommendation 7
  • 8. What Data to Use?   Explicit data   Ratings   Comments   Implicit data   Order history / Return history   Cart events   Page views   Click-thru   Search log   In today’s talk we only use Order history and Cart events 8
  • 9. Predictive Model ML Prediction Data Features Algorithm Output Order History What feature? Which Algorithm ? Cross-sell & Up-sell Cart Events Recommendation 9
  • 10. What Features to Use?   We know that a given product tends to get purchased by customers with similar tastes or needs.   Use user engagement data to describe a product. users 1 2 3 4 5 6 7 8 9 10 … n item 17 1 .25 .25 1 .25 user engagement vector 10
  • 11. Data Representation / Features   When we merge every item’s user engagement vector, we get an m x n item-user matrix users 1 2 3 4 5 6 7 8 9 10 … n 1 1 .25 1 .25 2 .25 items 3 1 .25 1 4 .25 1 .25 1 1 1 … m 11
  • 12. Data Normalization   Ensure the magnitudes of the entries in the dataset matrix are appropriate users 1 2 3 4 5 6 7 8 9 10 … n 1 1 .5 1 .9 1 .92 1 .49 2 1 .79 items 3 1 .67 1 .46 1 .73 4 1 .39 1 .82 1 .76 1 .69 1 1 … … .52 .8 m   Remove column average – so frequent buyers don’t dominate the model 12
  • 13. Data Normalization   Different engagement data points (Order / Cart / Page View) should have different weights   Common normalization strategies:   Remove column average   Remove row average   Remove global mean   Z-score   Fill in the null values 13
  • 14. Predictive Model ML Prediction Data Features Algorithm Output Order History User engagement Which Algorithm ? Cross-sell & Up-sell Cart Events vector Recommendation Data Normalization 14
  • 15. Which Algorithm?   How do we find the items that have similar user engagement data? users 1 2 3 4 5 6 7 8 9 10 … n 1 1 .25 1 1 2 1 items 17 1 1 1 .25 .25 18 1 .25 1 1 1 1 … .25 m   We can find the items that have similar user engagement vectors with the kNN algorithm 15
  • 16. k-Nearest Neighbor (kNN)   Find the k items that have the most similar user engagement vectors users 1 2 3 4 5 6 7 8 9 10 … n 1 .5 1 1 1 2 1 .5 1 items 3 1 1 1 1 4 1 .5 1 1 .5 1 … m 1 .5   Nearest Neighbors of Item 4 = [2,3,1] 16
  • 17. Similarity Measure for kNN users 1 2 3 4 5 6 7 8 9 10 … n items 2 1 .5 1 4 1 .5 1 1
  Jaccard coefficient: sim(a,b) = (1+1) / [(1+1+1) + (1+1+1+1) − (1+1)]
  Cosine similarity: sim(a,b) = cos(a,b) = a·b / (|a| * |b|) = (1*1 + 0.5*1) / [sqrt(1^2 + 0.5^2 + 1^2) * sqrt(1^2 + 1^2 + 1^2 + 1^2)]
  Pearson correlation: corr(a,b) = Σ(r_ai − r̄_a)(r_bi − r̄_b) / [sqrt(Σ(r_ai − r̄_a)^2) * sqrt(Σ(r_bi − r̄_b)^2)] = (m Σ a_i b_i − Σ a_i Σ b_i) / [sqrt(m Σ a_i^2 − (Σ a_i)^2) * sqrt(m Σ b_i^2 − (Σ b_i)^2)] = (match_cols * Dotprod(a,b) − sum(a) * sum(b)) / [sqrt(match_cols * sum(a^2) − (sum(a))^2) * sqrt(match_cols * sum(b^2) − (sum(b))^2)] 17
  • 18. k-Nearest Neighbor (kNN) Item feature space Similarity Measure (cosine similarity) 7 9 8 2 1 4 6 5 3 kNN k=5 Nearest Neighbors(8) = [9,6,3,1,2] 18
  • 19. Predictive Model   Ver. 1: kNN ML Prediction Data Features Algorithm Output Order History User engagement k-Nearest Neighbor Cross-sell & Up-sell Cart Events vector (kNN) Recommendation Data Normalization 19
  • 20. Cosine Similarity – Code fragment
long i_cnt = 100000;        // number of items 100K
long u_cnt = 2000000;       // number of users 2M
double data[i_cnt][u_cnt];  // 100K by 2M dataset matrix (in reality, it needs to be malloc allocation)
double norm[i_cnt];
// assume data matrix is loaded ......
// calculate vector norm for each user engagement vector
for (i = 0; i < i_cnt; i++) {
    norm[i] = 0;
    for (f = 0; f < u_cnt; f++) { norm[i] += data[i][f] * data[i][f]; }
    norm[i] = sqrt(norm[i]);
}
// cosine similarity calculation
for (i = 0; i < i_cnt; i++) {       // loop thru 100K items
    for (j = 0; j < i_cnt; j++) {   // loop thru 100K items
        dot_product = 0;
        for (f = 0; f < u_cnt; f++) {  // loop thru entire user space 2M
            dot_product += data[i][f] * data[j][f];
        }
        printf("%d %d %lf\n", i, j, dot_product / (norm[i] * norm[j]));
    }
}
// find the Top K nearest neighbors here .......
Slide notes: 1. 100K rows x 100K rows x 2M features --> scalability problem (kd-tree, locality-sensitive hashing, MapReduce/Hadoop, multicore/threading, stream processors). 2. data[i] is high-dimensional and sparse, so similarity measures are not reliable --> accuracy problem. This leads us to the SVD dimensionality reduction! 20
  • 21. Singular Value Decomposition (SVD) A = U × S × VT, where A is an m x n (items x users) matrix, U an m x r matrix, S an r x r matrix, and VT an r x n matrix; truncate to rank = k, k < r: Ak = Uk × Sk × VkT   Low rank approx. item profile is Uk * Sk   Low rank approx. user profile is Sk * VkT   Low rank approx. item-user matrix is (Uk * Sk) * (Sk * VkT) 21
  • 22. Reduced SVD Ak = Uk × Sk × VkT, where Ak is a 100K x 2M matrix, Uk a 100K x 3 matrix, Sk a 3 x 3 diagonal matrix of descending singular values (e.g. diag(7, 3, 1)), and VkT a 3 x 2M matrix; rank = 3   Low rank approx. item profile is Uk * Sk 22
  • 23. SVD Factor Interpretation S (3 x 3 matrix, e.g. diag(7, 3, 1), descending singular values)   Singular values plot (rank=512): the more significant latent factors come first; the tail holds the less significant factors, noise and others 23
  • 24. SVD Dimensionality Reduction Uk * Sk <----- latent factors -----> (items x rank matrix, in place of items x # of users)   Need to find the most optimal low rank !! 24
  • 25. Missing values   Difference between “0” and “unknown”   Missing values do NOT appear randomly.   Value = (Preference Factors) + (Availability) – (Purchased elsewhere) – (Navigation inefficiency) – etc.   Approx. Value = (Preference Factors) +/- (Noise)   Modeling missing values correctly will help us make good recommendations, especially when working with an extremely sparse data set 25
• 26. Singular Value Decomposition (SVD)

  - Use SVD to reduce dimensionality, so neighborhood formation happens in the reduced user space.
  - SVD helps the model find a low-rank approximation of the dataset matrix, retaining the critical latent factors while ignoring noise.
  - The optimal low rank needs to be tuned.
  - SVD is computationally expensive.
  - SVD libraries:
    - Matlab: [U, S, V] = svds(A, 256);
    - SVDPACKC: http://www.netlib.org/svdpack/
    - SVDLIBC: http://tedlab.mit.edu/~dr/SVDLIBC/
    - GHAPACK: http://www.dcs.shef.ac.uk/~genevieve/ml.html
• 27. Predictive Model – Ver. 2: SVD + kNN

  Data: Order history, Cart events
  Features: User engagement vector (data normalization, SVD)
  ML Algorithm: k-Nearest Neighbors (kNN) in reduced space
  Prediction Output: Cross-sell & up-sell recommendations
• 28. Synthetic Data Set

  - Why do we use a synthetic data set?
  - So we can test our new model in a controlled environment.
• 29. Synthetic Data Set

  - 16-latent-factor synthetic e-commerce data set
  - Dimension: 1,000 (items) by 20,000 (users)
  - 16 user preference factors
  - 16 item property factors (non-negative)
  - Txn set: n = 55,360, sparsity = 99.72%
  - Txn+Cart set: n = 192,985, sparsity = 99.03%
  - Download: http://www.IntelligentMining.com/dataset/

  Sample rows:
    user_id  item_id  type
    10       42       0.25
    10       997      0.25
    10       950      0.25
    11       836      0.25
    11       225      1
• 30. Synthetic Data Set

  Purchase-likelihood score matrix (1K x 20K) = item property factors (1K x 16) × user preference factors (16 x 20K)

  Example with 3 factors: if item 3's property factors are (a, b, c) and user 2's preference factors are (x, y, z), then
    X32 = (a, b, c) · (x, y, z) = a*x + b*y + c*z
  X32 = likelihood of item 3 being purchased by user 2
• 31. Synthetic Data Set

  - Based on the distribution, pre-determine the number of items purchased by each user.
  - Sort the user's items by purchase-likelihood score.
  - From the top of the ranked list, select and skip certain items to create data sparsity.
  - Example: with # of items = 2, User 1 purchased Item 4 and Item 1.
• 32. Experiment Setup

  - Each model (Random / kNN / SVD+kNN) generates top-20 recommendations for each item.
  - Compare the model output to the actual top 20 provided by the synthetic data set.
  - Evaluation metrics:
    - Precision %: overlap of the top 20 between model output and actual (higher is better)
      Precision = |{Found_Top20_items} ∩ {Actual_Top20_items}| / |{Found_Top20_items}|
    - Quality: average of the actual rankings of the model's recommendations (lower is better)
      e.g. actual ranks (1, 2, 30, 47, 50, 21) score better than (1, 2, 368, 62, 900, 510)
• 33. Experimental Result – kNN vs. Random (Control)

  [Figures: Precision % (higher is better) and Quality (lower is better), kNN vs. the random baseline]
• 34. Experimental Result – Precision % of SVD+kNN

  [Figure: Recall % improvement (higher is better) vs. SVD rank]
• 35. Experimental Result – Quality of SVD+kNN

  [Figure: Quality improvement (lower is better) vs. SVD rank]
• 36. Experimental Result – The effect of using Cart data

  [Figure: Precision % (higher is better) vs. SVD rank]
• 37. Experimental Result – The effect of using Cart data

  [Figure: Quality (lower is better) vs. SVD rank]
• 38. Outline

  - Predictive modeling methodology
  - k-Nearest Neighbor (kNN) algorithm
  - Singular value decomposition (SVD) method for dimensionality reduction
  - Using a synthetic data set to test and improve your model
  - Experiment and results
• 39. References

  - J. S. Breese, D. Heckerman, and C. Kadie, "Empirical Analysis of Predictive Algorithms for Collaborative Filtering," in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI 1998), 1998.
  - B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Item-Based Collaborative Filtering Recommendation Algorithms," in Proceedings of the Tenth International Conference on the World Wide Web (WWW 10), pp. 285-295, 2001.
  - B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Application of Dimensionality Reduction in Recommender System: A Case Study," in ACM WebKDD 2000 Web Mining for E-Commerce Workshop, 2000.
  - Apache Lucene Mahout: http://lucene.apache.org/mahout/
  - Cofi: A Java-Based Collaborative Filtering Library: http://www.nongnu.org/cofi/
• 40. Thank you

  Any questions or comments?