
# Matrix Factorization

This is material for the lecture course "Data Engineering" at Shizuoka University in 2019.


1. Matrix Factorization: Beyond Simple Collaborative Filtering. Yusuke Yamamoto, Lecturer, Faculty of Informatics, yusuke_yamamoto@acm.org. Data Engineering (Recommender Systems 3), 2019.11.11.
2. Problems on Simple Collaborative Filtering
3. User-based Collaborative Filtering: predicts a target user's rating for an item based on the rating tendency of similar users.

   $$predict(a, i) = \bar{r}_a + \frac{\sum_{u \in N} sim(a, u) \cdot (r_{u,i} - \bar{r}_u)}{\sum_{u \in N} sim(a, u)}$$

   | Similar users | Item5 | sim  | Average rating |
   |---------------|-------|------|----------------|
   | Alice         | ?     | 1    | 4              |
   | User1         | 3     | 0.85 | 2.4            |
   | User2         | 5     | 0.71 | 3.8            |
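The user-based prediction formula above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the lecture; the function name is made up, while the similarity, rating, and mean values come from the slide's table.

```python
def predict_user_based(target_mean, neighbors):
    """Illustrative user-based CF predictor (not from the lecture).

    target_mean: the target user's average rating.
    neighbors:   list of (similarity, neighbor_rating_for_item, neighbor_mean).
    Returns the target user's mean plus the similarity-weighted,
    mean-centered neighbor ratings, as in the formula on the slide.
    """
    num = sum(s * (r - m) for s, r, m in neighbors)
    den = sum(s for s, _, _ in neighbors)
    return target_mean if den == 0 else target_mean + num / den

# Alice's predicted score for Item5 using the slide's numbers:
# Alice's mean is 4; User1 (sim 0.85, rating 3, mean 2.4),
# User2 (sim 0.71, rating 5, mean 3.8).
pred_alice = predict_user_based(4.0, [(0.85, 3, 2.4), (0.71, 5, 3.8)])
```

The prediction lands slightly above Alice's own average, because both similar users rated Item5 above their respective averages.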
4. Idea about Item-based Collaborative Filtering: predicts unknown scores based on the target user's rating tendency for similar items.

   $$predict(u, i) = \frac{\sum_{j \in R_u} sim(i, j) \cdot r_{u,j}}{\sum_{j \in R_u} sim(i, j)}$$

   where $R_u$ is the set of items rated by user $u$.

   | User  | Item1 | Item2 | Item3 | Item4 | Item5 |
   |-------|-------|-------|-------|-------|-------|
   | Alice | 5     | 3     | 4     | 4     | ?     |
   | User1 | 3     | 1     | 2     | 3     | 3     |
   | User2 | 4     | 3     | 4     | 3     | 5     |
   | User3 | 3     | 3     | 1     | 5     | 4     |
   | User4 | 1     | 5     | 5     | 2     | 1     |
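A matching sketch for the item-based formula. Alice's ratings come from the slide's table, but the similarity values to Item5 are made-up placeholders, since the slide only marks which items are "similar" without giving numbers.

```python
def predict_item_based(user_ratings, item_sims):
    """Illustrative item-based CF predictor (not from the lecture).

    user_ratings: {item: the target user's rating for that item}
    item_sims:    {item: similarity between that item and the target item}
    Returns the similarity-weighted average of the user's own ratings
    for items similar to the target item.
    """
    shared = [j for j in user_ratings if j in item_sims]
    den = sum(item_sims[j] for j in shared)
    if den == 0:
        return None
    return sum(item_sims[j] * user_ratings[j] for j in shared) / den

# Alice's ratings from the slide; similarities to Item5 are assumed values.
alice = {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4}
sims_to_item5 = {"Item1": 0.9, "Item4": 0.8}  # hypothetical similarities
pred_alice_item5 = predict_item_based(alice, sims_to_item5)
```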
5. Problems on CF approaches (1/3): real data is quite sparse! Even on large e-commerce sites, there are few intersections between user vectors (and item vectors). Image reference: https://rafalab.github.io/dsbook/recommendation-systems.html
6. Problems on CF approaches (2/3): the curse of dimensionality.
   - In high-dimensional space, it is difficult to handle similarity.
   - Usually, item/user vectors have quite high dimensionality (because the rating matrix is quite large).
7. Problems on CF approaches (3/3): high computational cost.
   - The rating matrix is used directly every time the system finds similar users/items and makes predictions.
   - CF approaches therefore do not scale for most real-world scenarios.

   (Figure: for each user, the whole rating matrix is re-processed to compute similar users and suggested items.)
8. Recent approaches for recommender systems: the model-based approach.
   - Based on offline pre-processing.
   - At run-time, only the learned model is used for rating prediction.
   - Learned models can be updated.

   (Figure: the rating matrix is turned into a learned model offline; suggested items for a user are computed online from the model.)
9. Memory-based approach vs. model-based approach.
   - Memory-based approach: user-based CF, item-based CF.
   - Model-based approach: matrix factorization, association rule mining, probabilistic models, other ML techniques.
10. Matrix Factorization
11. Basic idea 1 (1/2): assumes that latent factors (which cannot be observed) exist in users/items. Alice's preferences ("I don't like horror… Sci-Fis often move me. I love humane and dramatic movies!") can be summarized by latent factors.

    | User  | God father | Terminator | Money game | Titanic | Back to the future | … | X-men |
    |-------|------------|------------|------------|---------|--------------------|---|-------|
    | Alice | 5          | 1          | 4          | 4       | 3                  | … | 2     |

    | User  | Horror | Sci-Fi | Humanity | Drama | … |
    |-------|--------|--------|----------|-------|---|
    | Alice | 0.3    | 2.1    | 6.1      | 4.7   | … |
12. Basic idea 1 (2/2): assumes that latent factors (which cannot be observed) also exist in items. Image from Amazon.com.

    | Item       | Horror | Sci-Fi | Humanity | Drama | … |
    |------------|--------|--------|----------|-------|---|
    | God father | 2.3    | 0.4    | 4.9      | 5.7   | … |
13. Basic idea 2: assumes that rating scores derive from the latent factors of users and items. For example, Alice's rating of 5 for "God father" is modeled as the product of Alice's latent factor vector (0.3, 2.1, 6.1, 4.7, …) and the movie's latent factor vector (2.3, 0.4, 4.9, 5.7, …).
14. Summary of matrix factorization.
    - The rating matrix R (m users × n items, with unknown scores) can be decomposed into latent factors of users and items: R ≈ P × Qᵀ, where P is the latent user matrix (m users × k latent factors) and Qᵀ is the latent item matrix (k latent factors × n items).
    - The dimension of the latent factor vectors is much less than the number of users and items (k ≪ m, n).
15. Prediction using matrix factorization: if we obtain the latent user/item matrices, we can predict unknown scores by multiplying the two latent matrices, R* = P × Qᵀ. But how do we obtain the latent user/item matrices?
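The prediction step itself is just a matrix product. A tiny NumPy sketch with made-up latent matrices (the numbers are illustrative, not from the slides):

```python
import numpy as np

# Made-up latent matrices: 3 users x 2 factors, 4 items x 2 factors.
P = np.array([[1.3, 0.2],
              [0.7, 1.1],
              [0.4, 0.9]])      # latent user matrix (m x k)
Q = np.array([[0.2, 3.1],
              [1.1, 0.7],
              [0.9, 1.0],
              [0.3, 2.0]])      # latent item matrix (n x k)

R_star = P @ Q.T                # predicted rating matrix (m x n)
# Every cell of R_star is now filled, including scores that were
# unknown in the original rating matrix.
```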
16. SVD for recommender systems. SVD (singular value decomposition):
    - A famous linear algebra technique for matrix decomposition: X = U Σ Vᵀ, where X is an m × n matrix, U is an m × m unitary matrix, V is an n × n unitary matrix, and Σ is a rectangular diagonal matrix (its diagonal values are called singular values).
    - It is often used for dimensionality reduction.
    - SVD delivers essentially the same result as PCA does.
17. Example of SVD, X = U Σ Vᵀ:

    ```
    X:
     1 3 3 3 0
     2 4 2 2 4
     1 3 3 5 1
     4 5 2 3 3
     1 1 5 2 1

    U:
     -0.369 -0.325  0.282  0.343  0.749
     -0.459  0.448 -0.339 -0.584  0.364
     -0.468 -0.359  0.544 -0.459 -0.381
     -0.563  0.491  0.070  0.562 -0.348
     -0.341 -0.569 -0.710  0.118 -0.202

    Σ:
     13.368  0      0      0      0
      0      4.708  0      0      0
      0      0      2.792  0      0
      0      0      0      1.586  0
      0      0      0      0      0.904

    V:
     -0.325  0.341 -0.101  0.682 -0.550
     -0.562  0.345  0.273  0.153  0.684
     -0.468 -0.642 -0.577  0.125  0.141
     -0.504 -0.327  0.601 -0.324 -0.416
     -0.324  0.496 -0.470 -0.626 -0.189
    ```
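The decomposition on this slide can be reproduced with NumPy. This is a sketch rather than the lecture's code; note that `np.linalg.svd` returns Vᵀ rather than V, and the signs of individual columns may differ from the slide (SVD is only unique up to sign).

```python
import numpy as np

# The 5x5 example matrix from the slide.
X = np.array([[1, 3, 3, 3, 0],
              [2, 4, 2, 2, 4],
              [1, 3, 3, 5, 1],
              [4, 5, 2, 3, 3],
              [1, 1, 5, 2, 1]], dtype=float)

U, s, Vt = np.linalg.svd(X)     # s holds the singular values, descending
X_rec = U @ np.diag(s) @ Vt     # multiplying back reconstructs X
```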
18. Important features of SVD (1/4): we can approximate a given matrix by focusing on the largest singular values and the vectors of U and V which correspond to those singular values. In the example, the largest singular values are 13.368 and 4.708.
19. Important features of SVD (2/4): ignore the unimportant values, i.e. the small singular values and their corresponding vectors.
20. Important features of SVD (3/4): keeping only the two largest singular values yields U2 (the first two columns of U), Σ2 (the top-left 2 × 2 block of Σ), and V2 (the first two columns of V).
21. Important features of SVD (4/4): multiplying the truncated factors approximates the original matrix, U2 Σ2 V2ᵀ ≈ X:

    ```
    1.08 2.24 3.29 2.98 0.84        1 3 3 3 0
    2.72 4.17 1.52 2.41 3.04        2 4 2 2 4
    1.46 2.93 4.02 3.71 1.19   ≈    1 3 3 5 1
    3.24 5.03 2.04 3.04 3.59        4 5 2 3 3
    0.57 1.64 3.86 3.18 0.15        1 1 5 2 1
    ```
22. Apply SVD for recommender systems (1/2): convert the rating table to a matrix, with unknown scores regarded as zero.

    | User  | Item1 | Item2 | Item3 | Item4 | Item5 |
    |-------|-------|-------|-------|-------|-------|
    | Alice | 1     | 3     | 3     | 3     | ?     |
    | User1 | 2     | 4     | 2     | 2     | 4     |
    | User2 | 1     | 3     | 3     | 5     | 1     |
    | User3 | 4     | 5     | 2     | 3     | 3     |
    | User4 | 1     | 1     | 5     | 2     | 1     |

    Alice's "?" is regarded as zero.
23. Apply SVD for recommender systems (2/2):
    - Step 1: run SVD, X = U Σ Vᵀ.
    - Step 2: focus on the important features, keeping U2, Σ2, V2.
    - Step 3: multiply the three matrices, U2 Σ2 V2ᵀ.
    - Step 4: check the values which were zero before running SVD (in the resulting matrix from slide 21, Alice's Item5 score becomes 0.84).
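The four steps above can be sketched end to end with NumPy (illustrative code, not the lecture's; `k = 2` matches the slides' choice of keeping two singular values):

```python
import numpy as np

# Step 0 (slide 22): the rating matrix with Alice's unknown Item5
# score regarded as zero.
R = np.array([[1, 3, 3, 3, 0],   # Alice; the trailing 0 is the unknown score
              [2, 4, 2, 2, 4],
              [1, 3, 3, 5, 1],
              [4, 5, 2, 3, 3],
              [1, 1, 5, 2, 1]], dtype=float)

k = 2                                              # singular values to keep
U, s, Vt = np.linalg.svd(R)                        # Step 1: run SVD
R_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # Steps 2-3: truncate, multiply

alice_item5 = R_approx[0, 4]                       # Step 4: read the formerly-zero cell
```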
24. Problems on SVD.
    - Predicted values are often negative: SVD does not take the rating score range into account.
    - Zero replacement decreases prediction quality: SVD analyzes the relations between all data in the matrix, and the meaning of "zero" is different from that of "unknown".
25. Bad example of SVD-based recommendation: a small Ghibli-movie rating matrix (5 users × 4 Ghibli items, mostly unknown) is zero-filled, decomposed by SVD into U2, Σ2, V2, and multiplied back; the reconstructed matrix contains meaningless values such as -0.26 for some unrated cells. Example from: http://smrmkt.hatenablog.jp/entry/2014/08/23/211555
26. Netflix Prize (2006-2010): Netflix held an open competition to advance collaborative filtering algorithms and to seek the best algorithm. Image ref: http://blogs.itmedia.co.jp/saito/2009/09/httpjournalmyco.html
27. Simon Funk's Matrix Factorization (2006): without using SVD (with zero replacement), Funk's method learns the matrices P and Q from the observed values in R only, by minimizing

    $$\min_{P, Q} \sum_{(u,i) \in K} \left( r_{u,i} - \boldsymbol{p}_u^\top \boldsymbol{q}_i \right)^2 + \lambda \left( \lVert \boldsymbol{p}_u \rVert^2 + \lVert \boldsymbol{q}_i \rVert^2 \right)$$

    where K is the set of (user, item) pairs with observed ratings, $\boldsymbol{p}_u$ is user u's latent vector, and $\boldsymbol{q}_i$ is item i's latent vector.
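A minimal SGD sketch of this objective (hyperparameters, variable names, and the tiny data set are illustrative, not from the lecture). Only observed (user, item, rating) triples are visited, so no zero replacement is needed:

```python
import numpy as np

def funk_mf(observed, m, n, k=2, lr=0.02, lam=0.01, epochs=3000, seed=0):
    """Learn P (m x k) and Q (n x k) by SGD on the regularized
    squared error over observed ratings only (Funk-style MF sketch)."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((m, k))
    Q = 0.1 * rng.standard_normal((n, k))
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - P[u] @ Q[i]                  # prediction error
            pu = P[u].copy()                       # update both from old values
            P[u] += lr * (err * Q[i] - lam * P[u])
            Q[i] += lr * (err * pu - lam * Q[i])
    return P, Q

# Tiny made-up example: 3 users x 3 items, 6 known ratings.
obs = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 4)]
P, Q = funk_mf(obs, m=3, n=3)
R_star = P @ Q.T   # full predicted matrix, including the unknown cells
```

After training, `R_star` reproduces the observed ratings closely (up to the regularization shrinkage) and also fills in the cells that were never rated.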
28. Various approaches have been developed since:
    - 2003: scalable models (Amazon's item-based CF).
    - 2006-2009: Netflix Prize; the rise of matrix factorization like Simon Funk's method.
    - 2010: factorization machines (generalized matrix factorization for dealing with various factors).
    - 2013: deep learning (deep factorization machines; Content2Vec to get content embeddings).

    Ref: https://www.slideshare.net/databricks/deep-learning-for-recommender-systems-with-nick-pentreath
29. Challenges for recommender systems
30. Remaining challenges.
    - Cold start problem: how to recommend new items? What to recommend to new users?
    - Serendipity: recommending only favorable items cannot expand user interests; how to recommend unexpected and surprising items?
    - Providing explanations: recommendation algorithms just decide what to recommend; how can systems persuade users to purchase recommended items?
    - Reviewers' trust: how to remove spam reviewers? How to find high-quality reviewers?
32. Cold start problem: CF approaches don't work for new items/users, because new items/users give no clues for predicting unknown scores (CF cannot find neighbor users/items for them).

    | User            | Item1 | Item2 | Item3 | Item4 | Item5 (new item) |
    |-----------------|-------|-------|-------|-------|------------------|
    | Kate (new user) |       |       |       |       |                  |
    | User1           | 3     | 1     | 2     | 3     |                  |
    | User2           | 4     | 3     | 4     | 3     |                  |
    | User3           | 3     | 3     | 1     | 5     |                  |
    | User4           | 1     | 5     | 5     | 2     |                  |