SlideShare ist ein Scribd-Unternehmen logo
1 von 20
ENHANCING MATRIX FACTORIZATION
THROUGH INITIALIZATION FOR
IMPLICIT FEEDBACK DATABASES


                  Balázs Hidasi
                 Domonkos Tikk

                    Gravity R&D Ltd.
     Budapest University of Technology and Economics
OUTLINE
 Matrix factoriztion
 Initialization concept

 Methods
     Naive
     SimFactor

 Results
 Discussion




                           2/19
MATRIX FACTORIZATION
 Collaborative Filtering
 One of the most common approaches

 Approximates the rating matrix as product of low-
  rank matrices
                Items


                                           Q
        Users




                R        ≈     P




                                                      3/19
MATRIX FACTORIZATION
 Initialize P and Q with small random numbers
 Teach P and Q
     Alternating Least Squares
     Gradient Descent
     Etc.

   Transforms the data to a feature space
     Separately for users and items
     Noise reduction
     Compression
     Generalization

                                                 4/19
IMPLICIT FEEDBACK
 No ratings
 User-item interactions (events)

 Much noisier
     Presence of an event  might not be positive feedback
     Absence of an event  does not mean negative
      feedback
     No negative feedback is available!

 More common problem
 MF for implicit feedback
       Less accurate results due to noise
       Mostly ALS is used                                    5/19
       Scalability problems (rating matrix is dense)
CONCEPT
   Good MF model
     The feature vectors of similar entities are similar
     If data is too noisy  similar entities won’t be similar by
      their features
   Start MF from a „good” point
       Feature vector similarities are OK
   Data is more than just events
       Metadata
           Info about items/users
       Contextual data
           In what context did the event occured
       Can we incorporate those to help implicit MF?               6/19
NAIVE APPROACH
   Describe items using any data we have (detailed
    later)
       Long, sparse vectors for item description
   Compress these vectors to dense feature vectors
     PCA, MLP, MF, …
     Length of desired vectors = Number of features in MF

   Use these features as starting points




                                                             7/19
NAIVE APPROACH
 Compression and also noise reduction
 Does not really care about similarities

 But often feature similarities are not that bad

 If MF is used
           Half of the results is thrown out


                   Descriptors
                                         features   Descriptor features
                                    ≈
    Items




                                           Item


             Description of items

                                                                          8/19
SIMFACTOR ALGORITHM
 Try to preserve similarities better
 Starting from an MF of item description
                  Descriptors

                                                        Descriptors features




                                             features
            Description of items
                                        ≈
    Items




                                               Item
                    (D)


    Similarities of items: DD’




                                                                      Description of items
           Some metrics require transformation on D




                                                                              (D’)
                            Item
                                            Description of items
                         similarities
                             (S)
                                        =           (D)                                      9/19
SIMFACTOR ALGORITHM




                                                         Descriptors features
                   features
   Item                           Descriptors features                            Item
               ≈     Item

                      (X)
similarities                              (Y’)                                  features
    (S)                                                                            (X’)




                                                                 (Y)
   Similarity approximation

                                       features

                      Item                                    Item
                                         Item



                                   ≈
                                          (X)

                   similarities                   Y’Y       features
                       (S)                                     (X’)


   Y’Y  KxK symmetric                                                                    10/19
       Eigendecomposition
SIMFACTOR ALGORITHM

Y’Y     =      U          λ        U’

   λ diagonal  λ = SQRT(λ) * SQRT(λ)
                   features


   Item                                                      Item
               ≈
                     Item



                                        SQRT   SQRT
                      (X)


similarities                  U          (λ)    (λ)   U’   features
    (S)                                                       (X’)

 X*U*SQRT(λ) = (SQRT(λ)*U’*X’)’=F
 F is MxK matrix

 S F * F’  F used for initialization

   Item
similarities   ≈                  F’                                  11/19
                     F




    (S)
CREATING THE DESCRIPTION MATRIX
   „Any” data about the entity
       Vector-space reprezentation
   For Items:
     Metadata vector (title, category, description, etc)
     Event vector (who bought the item)
     Context-state vector (in which context state was it
      bought)
     Context-event (in which context state who bought it)

   For Users:
       All above except metadata
 Currently: Choose one source for D matrix
                                                             12/19
 Context used: seasonality
EXPERIMENTS: SIMILARITY PRESERVATION
   Real life dataset: online grocery shopping events
            SimFactor RMSE improvement over naive in similarity
                             approximation

     52.36%
                                                                                             48.70%




                                                          26.22%


                                          16.86%
                                                                           13.39%                              12.38%
                       10.81%




Item context state User context state   Item context-   User context-   Item event data   User event data   Item metadata
                                            event          event
                                                                                                                            13/19
   SimFactor approximates similarities better
EXPERIMENTS: INITIALIZATION
 Using different description matrices
 And both naive and SimFactor initialization

 Baseline: random init

 Evaluation metric: recall@50




                                                14/19
EXPERIMENTS: GROCERY DB
 Up to 6% improvement
 Best methods use SimFactor and user context data

                              Top5 methods on Grocery DB
         5.71%


                              4.88%

                                                   4.30%
                                                                       4.12%              4.04%




                                                                                                          15/19
    User context state   User context state   User context event   User event data   User context event
      (SimFactor)             (Naive)            (SimFactor)        (SimFactor)           (Naive)
EXPERIMENTS: „IMPLICITIZED” MOVIELENS
 Keeping 5 star ratings  implicit events
 Up to 10% improvement

 Best methods use SimFactor and item context data
                            Top5 methods on MovieLens DB

          10%




                              9.17%                9.17%                9.17%                9.17%




                                                                                                             16/19
    Item context state   User context state   Item context event   Item context event   Item context state
       (SimFactor)         (SimFactor)           (SimFactor)             (Naive)             (Naive)
DISCUSSION OF RESULTS
 SimFactor yields better results than naive
 Context information yields better results than other
  descriptions
 Context information separates well between entities
       Grocery: User context
         People’s routines
         Different types of shoppings in different times

       MovieLens: Item context
           Different types of movies watched on different hours
   Context-based similarity

                                                                   17/19
WHY CONTEXT?
   Grocery example
       Correlation between context states by users low
                   ITEM                                        USER
        Mon Tue Wed Thu Fri Sat Sun                 Mon Tue Wed Thu Fri Sat Sun
Mon      1,00 0,79 0,79 0,78 0,76 0,70 0,74   Mon    1,00 0,36 0,34 0,34 0,35 0,29 0,19
Tue      0,79 1,00 0,79 0,78 0,76 0,69 0,73   Tue    0,36 1,00 0,34 0,34 0,33 0,29 0,19
Wed      0,79 0,79 1,00 0,79 0,76 0,70 0,74   Wed    0,34 0,34 1,00 0,36 0,35 0,27 0,17
Thu      0,78 0,78 0,79 1,00 0,76 0,71 0,74   Thu    0,34 0,34 0,36 1,00 0,39 0,30 0,16
Fri      0,76 0,76 0,76 0,76 1,00 0,71 0,72   Fri    0,35 0,33 0,35 0,39 1,00 0,32 0,16
Sat      0,70 0,69 0,70 0,71 0,71 1,00 0,71   Sat    0,29 0,29 0,27 0,30 0,32 1,00 0,33
Sun      0,74 0,73 0,74 0,74 0,72 0,71 1,00   Sun    0,19 0,19 0,17 0,16 0,16 0,33 1,00
   Why can context aware algorithms be efficient?
     Different recommendations in different context states
                                                                                     18/19
     Context differentiates well between entities
            Easier subtasks
CONCLUSION & FUTURE WORK
 SimFactor  Similarity preserving compression
 Similarity based MF initialization:
     Description matrix from any data
     Apply SimFactor
     Use output as initial features for MF

 Context differentitates between entities well
 Future work:
     Mixed description matrix (multiple data sources)
     Multiple description matrix
     Using different context information
     Using different similarity metrics                 19/19
THANKS FOR YOUR ATTENTION!

Questions?

Weitere ähnliche Inhalte

Mehr von Domonkos Tikk

Lessons learnt at building recommendation services at industry scale
Lessons learnt at building recommendation services at industry scaleLessons learnt at building recommendation services at industry scale
Lessons learnt at building recommendation services at industry scaleDomonkos Tikk
 
Recommenders on video sharing portals - business and algorithmic aspects
Recommenders on video sharing portals - business and algorithmic aspectsRecommenders on video sharing portals - business and algorithmic aspects
Recommenders on video sharing portals - business and algorithmic aspectsDomonkos Tikk
 
Neighbor methods vs matrix factorization - case studies of real-life recommen...
Neighbor methods vs matrix factorization - case studies of real-life recommen...Neighbor methods vs matrix factorization - case studies of real-life recommen...
Neighbor methods vs matrix factorization - case studies of real-life recommen...Domonkos Tikk
 
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&D
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&DChallenges Encountered by Scaling Up Recommendation Services at Gravity R&D
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&DDomonkos Tikk
 
General factorization framework for context-aware recommendations
General factorization framework for context-aware recommendationsGeneral factorization framework for context-aware recommendations
General factorization framework for context-aware recommendationsDomonkos Tikk
 
Tartalomgazdagítás (content enrichment)
Tartalomgazdagítás (content enrichment) Tartalomgazdagítás (content enrichment)
Tartalomgazdagítás (content enrichment) Domonkos Tikk
 
Idomaar crowd rec_reference_fw
Idomaar crowd rec_reference_fwIdomaar crowd rec_reference_fw
Idomaar crowd rec_reference_fwDomonkos Tikk
 
Big Data in Online Classifieds
Big Data in Online ClassifiedsBig Data in Online Classifieds
Big Data in Online ClassifiedsDomonkos Tikk
 
Context-aware similarities within the factorization framework - presented at ...
Context-aware similarities within the factorization framework - presented at ...Context-aware similarities within the factorization framework - presented at ...
Context-aware similarities within the factorization framework - presented at ...Domonkos Tikk
 
Recommender Systems Evaluation: A 3D Benchmark - presented at RUE 2012 worksh...
Recommender Systems Evaluation: A 3D Benchmark - presented at RUE 2012 worksh...Recommender Systems Evaluation: A 3D Benchmark - presented at RUE 2012 worksh...
Recommender Systems Evaluation: A 3D Benchmark - presented at RUE 2012 worksh...Domonkos Tikk
 
From a toolkit of recommendation algorithms into a real business: the Gravity...
From a toolkit of recommendation algorithms into a real business: the Gravity...From a toolkit of recommendation algorithms into a real business: the Gravity...
From a toolkit of recommendation algorithms into a real business: the Gravity...Domonkos Tikk
 

Mehr von Domonkos Tikk (11)

Lessons learnt at building recommendation services at industry scale
Lessons learnt at building recommendation services at industry scaleLessons learnt at building recommendation services at industry scale
Lessons learnt at building recommendation services at industry scale
 
Recommenders on video sharing portals - business and algorithmic aspects
Recommenders on video sharing portals - business and algorithmic aspectsRecommenders on video sharing portals - business and algorithmic aspects
Recommenders on video sharing portals - business and algorithmic aspects
 
Neighbor methods vs matrix factorization - case studies of real-life recommen...
Neighbor methods vs matrix factorization - case studies of real-life recommen...Neighbor methods vs matrix factorization - case studies of real-life recommen...
Neighbor methods vs matrix factorization - case studies of real-life recommen...
 
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&D
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&DChallenges Encountered by Scaling Up Recommendation Services at Gravity R&D
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&D
 
General factorization framework for context-aware recommendations
General factorization framework for context-aware recommendationsGeneral factorization framework for context-aware recommendations
General factorization framework for context-aware recommendations
 
Tartalomgazdagítás (content enrichment)
Tartalomgazdagítás (content enrichment) Tartalomgazdagítás (content enrichment)
Tartalomgazdagítás (content enrichment)
 
Idomaar crowd rec_reference_fw
Idomaar crowd rec_reference_fwIdomaar crowd rec_reference_fw
Idomaar crowd rec_reference_fw
 
Big Data in Online Classifieds
Big Data in Online ClassifiedsBig Data in Online Classifieds
Big Data in Online Classifieds
 
Context-aware similarities within the factorization framework - presented at ...
Context-aware similarities within the factorization framework - presented at ...Context-aware similarities within the factorization framework - presented at ...
Context-aware similarities within the factorization framework - presented at ...
 
Recommender Systems Evaluation: A 3D Benchmark - presented at RUE 2012 worksh...
Recommender Systems Evaluation: A 3D Benchmark - presented at RUE 2012 worksh...Recommender Systems Evaluation: A 3D Benchmark - presented at RUE 2012 worksh...
Recommender Systems Evaluation: A 3D Benchmark - presented at RUE 2012 worksh...
 
From a toolkit of recommendation algorithms into a real business: the Gravity...
From a toolkit of recommendation algorithms into a real business: the Gravity...From a toolkit of recommendation algorithms into a real business: the Gravity...
From a toolkit of recommendation algorithms into a real business: the Gravity...
 

Kürzlich hochgeladen

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Slides from CARR 2012 WS - Enhancing Matrix Factorization Through Initialization for Implicit Feedback Databases

  • 1. ENHANCING MATRIX FACTORIZATION THROUGH INITIALIZATION FOR IMPLICIT FEEDBACK DATABASES Balázs Hidasi Domonkos Tikk Gravity R&D Ltd. Budapest University of Technology and Economics
  • 2. OUTLINE  Matrix factoriztion  Initialization concept  Methods  Naive  SimFactor  Results  Discussion 2/19
  • 3. MATRIX FACTORIZATION  Collaborative Filtering  One of the most common approaches  Approximates the rating matrix as product of low- rank matrices Items Q Users R ≈ P 3/19
  • 4. MATRIX FACTORIZATION  Initialize P and Q with small random numbers  Teach P and Q  Alternating Least Squares  Gradient Descent  Etc.  Transforms the data to a feature space  Separately for users and items  Noise reduction  Compression  Generalization 4/19
  • 5. IMPLICIT FEEDBACK  No ratings  User-item interactions (events)  Much noisier  Presence of an event  might not be positive feedback  Absence of an event  does not mean negative feedback  No negative feedback is available!  More common problem  MF for implicit feedback  Less accurate results due to noise  Mostly ALS is used 5/19  Scalability problems (rating matrix is dense)
  • 6. CONCEPT  Good MF model  The feature vectors of similar entities are similar  If data is too noisy  similar entities won’t be similar by their features  Start MF from a „good” point  Feature vector similarities are OK  Data is more than just events  Metadata  Info about items/users  Contextual data  In what context did the event occured  Can we incorporate those to help implicit MF? 6/19
  • 7. NAIVE APPROACH  Describe items using any data we have (detailed later)  Long, sparse vectors for item description  Compress these vectors to dense feature vectors  PCA, MLP, MF, …  Length of desired vectors = Number of features in MF  Use these features as starting points 7/19
  • 8. NAIVE APPROACH  Compression and also noise reduction  Does not really care about similarities  But often feature similarities are not that bad  If MF is used  Half of the results is thrown out Descriptors features Descriptor features ≈ Items Item Description of items 8/19
  • 9. SIMFACTOR ALGORITHM  Try to preserve similarities better  Starting from an MF of item description Descriptors Descriptors features features Description of items ≈ Items Item (D)  Similarities of items: DD’ Description of items  Some metrics require transformation on D (D’) Item Description of items similarities (S) = (D) 9/19
  • 10. SIMFACTOR ALGORITHM Descriptors features features Item Descriptors features Item ≈ Item (X) similarities (Y’) features (S) (X’) (Y)  Similarity approximation features Item Item Item ≈ (X) similarities Y’Y features (S) (X’)  Y’Y  KxK symmetric 10/19  Eigendecomposition
  • 11. SIMFACTOR ALGORITHM Y’Y = U λ U’  λ diagonal  λ = SQRT(λ) * SQRT(λ) features Item Item ≈ Item SQRT SQRT (X) similarities U (λ) (λ) U’ features (S) (X’)  X*U*SQRT(λ) = (SQRT(λ)*U’*X’)’=F  F is MxK matrix  S F * F’  F used for initialization Item similarities ≈ F’ 11/19 F (S)
  • 12. CREATING THE DESCRIPTION MATRIX  „Any” data about the entity  Vector-space reprezentation  For Items:  Metadata vector (title, category, description, etc)  Event vector (who bought the item)  Context-state vector (in which context state was it bought)  Context-event (in which context state who bought it)  For Users:  All above except metadata  Currently: Choose one source for D matrix 12/19  Context used: seasonality
  • 13. EXPERIMENTS: SIMILARITY PRESERVATION  Real life dataset: online grocery shopping events SimFactor RMSE improvement over naive in similarity approximation 52.36% 48.70% 26.22% 16.86% 13.39% 12.38% 10.81% Item context state User context state Item context- User context- Item event data User event data Item metadata event event 13/19  SimFactor approximates similarities better
  • 14. EXPERIMENTS: INITIALIZATION  Using different description matrices  And both naive and SimFactor initialization  Baseline: random init  Evaluation metric: recall@50 14/19
  • 15. EXPERIMENTS: GROCERY DB  Up to 6% improvement  Best methods use SimFactor and user context data Top5 methods on Grocery DB 5.71% 4.88% 4.30% 4.12% 4.04% 15/19 User context state User context state User context event User event data User context event (SimFactor) (Naive) (SimFactor) (SimFactor) (Naive)
  • 16. EXPERIMENTS: „IMPLICITIZED” MOVIELENS  Keeping 5 star ratings  implicit events  Up to 10% improvement  Best methods use SimFactor and item context data Top5 methods on MovieLens DB 10% 9.17% 9.17% 9.17% 9.17% 16/19 Item context state User context state Item context event Item context event Item context state (SimFactor) (SimFactor) (SimFactor) (Naive) (Naive)
  • 17. DISCUSSION OF RESULTS  SimFactor yields better results than naive  Context information yields better results than other descriptions  Context information separates well between entities  Grocery: User context  People’s routines  Different types of shoppings in different times  MovieLens: Item context  Different types of movies watched on different hours  Context-based similarity 17/19
  • 18. WHY CONTEXT?  Grocery example  Correlation between context states by users low ITEM USER Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Sat Sun Mon 1,00 0,79 0,79 0,78 0,76 0,70 0,74 Mon 1,00 0,36 0,34 0,34 0,35 0,29 0,19 Tue 0,79 1,00 0,79 0,78 0,76 0,69 0,73 Tue 0,36 1,00 0,34 0,34 0,33 0,29 0,19 Wed 0,79 0,79 1,00 0,79 0,76 0,70 0,74 Wed 0,34 0,34 1,00 0,36 0,35 0,27 0,17 Thu 0,78 0,78 0,79 1,00 0,76 0,71 0,74 Thu 0,34 0,34 0,36 1,00 0,39 0,30 0,16 Fri 0,76 0,76 0,76 0,76 1,00 0,71 0,72 Fri 0,35 0,33 0,35 0,39 1,00 0,32 0,16 Sat 0,70 0,69 0,70 0,71 0,71 1,00 0,71 Sat 0,29 0,29 0,27 0,30 0,32 1,00 0,33 Sun 0,74 0,73 0,74 0,74 0,72 0,71 1,00 Sun 0,19 0,19 0,17 0,16 0,16 0,33 1,00  Why can context aware algorithms be efficient?  Different recommendations in different context states 18/19  Context differentiates well between entities  Easier subtasks
  • 19. CONCLUSION & FUTURE WORK  SimFactor  Similarity preserving compression  Similarity based MF initialization:  Description matrix from any data  Apply SimFactor  Use output as initial features for MF  Context differentitates between entities well  Future work:  Mixed description matrix (multiple data sources)  Multiple description matrix  Using different context information  Using different similarity metrics 19/19
  • 20. THANKS FOR YOUR ATTENTION! Questions?