4. Recommendation System
Answers the question:
What do I want next?
Very consumer-driven.
Must provide good results, or a user may
not trust the system in the future.
5. Collaborative Filtering
Base a user's recommendations on:
The user's own past history.
The history of like-minded users.
View the data as a product × user matrix.
Find a “neighborhood” of similar users
for that user.
Return the top-N recommendations from that
neighborhood, as sketched below.
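A minimal C++ sketch of that pipeline (all names are illustrative; the similarity measure is left abstract until the next slide):

#include <algorithm>
#include <vector>

using Ratings = std::vector<double>;  // one slot per product, 0 = unrated

// Any similarity measure (Pearson, cosine, ...); defined in the next sketch.
double similarity(const Ratings& a, const Ratings& b);

// Top-N products for `user`: score each product the user has not rated
// by the similarity-weighted ratings of the k most similar users.
std::vector<int> topN(const std::vector<Ratings>& users, int user, int k, int n)
{
    // 1. Rank all other users by similarity to form the neighborhood.
    std::vector<std::pair<double, int>> neighbors;
    for (int u = 0; u < (int)users.size(); ++u)
        if (u != user)
            neighbors.push_back({similarity(users[user], users[u]), u});
    std::sort(neighbors.rbegin(), neighbors.rend());
    if ((int)neighbors.size() > k) neighbors.resize(k);

    // 2. Score the user's unrated products from the neighborhood's ratings.
    std::vector<std::pair<double, int>> scores;
    for (int p = 0; p < (int)users[user].size(); ++p) {
        if (users[user][p] != 0) continue;  // already rated
        double s = 0;
        for (const auto& nb : neighbors) s += nb.first * users[nb.second][p];
        scores.push_back({s, p});
    }

    // 3. Return the N best-scoring products.
    std::sort(scores.rbegin(), scores.rend());
    if ((int)scores.size() > n) scores.resize(n);
    std::vector<int> result;
    for (const auto& sc : scores) result.push_back(sc.second);
    return result;
}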
6. Early Approaches
Goldberg et al. (1992), Using
collaborative filtering to weave an
information tapestry.
Konstan et al. (1997), Applying
collaborative filtering to Usenet news.
Use Pearson correlation or cosine similarity
as the measure of similarity to form
neighborhoods, as in the sketch below.
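A minimal Pearson implementation matching the `similarity` declaration above (a sketch; it correlates only the products both users have rated):

#include <cmath>
#include <vector>

// Pearson correlation over co-rated products (0 = unrated).
double similarity(const std::vector<double>& a, const std::vector<double>& b)
{
    double n = 0, sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
    for (size_t p = 0; p < a.size(); ++p) {
        if (a[p] == 0 || b[p] == 0) continue;   // skip non-overlapping products
        ++n;
        sumA += a[p];         sumB += b[p];
        sumAA += a[p] * a[p]; sumBB += b[p] * b[p];
        sumAB += a[p] * b[p];
    }
    if (n == 0) return 0;                 // no overlap: no correlation (sparsity!)
    double cov  = sumAB - sumA * sumB / n;
    double varA = sumAA - sumA * sumA / n;
    double varB = sumBB - sumB * sumB / n;
    if (varA <= 0 || varB <= 0) return 0; // a user rated everything the same
    return cov / std::sqrt(varA * varB);
}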
10. Early CF Challenges
Sparsity - No correlation between
users can be found, so coverage is
reduced.
Scalability - Nearest-neighbor
algorithms' computation time grows with
the number of products and users.
Synonymy - Similar products with
different names are never correlated.
17. Dimensionality Reduction
Latent Semantic Indexing (LSI)
An algorithm from the IR community (late
'80s to early '90s).
Addresses the problems of synonymy,
polysemy, sparsity, and scalability for
large datasets.
Reduces the dimensionality of a dataset
and captures the latent relationships.
Easily maps to CF!
18. Framing LSI for CF
Products X Users matrix instead of Terms X
Documents.
Netflix Dataset
480,189 users, 17,770 movies, only ~100 milion ratings.
17,770 X 480,189 matrix that is 99% sparse!
About 8.5 billion potential ratings.
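Checking those figures:
17,770 movies × 480,189 users = 8,532,958,530 ≈ 8.5 billion potential ratings
100,000,000 / 8,532,958,530 ≈ 1.2% of entries filled, i.e. ~99% empty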
19. SVD - The math behind LSI
Singular Value Decomposition
Any M × N matrix A of rank r can be
decomposed as:
A = U Σ V^T
U is an M × M orthogonal matrix.
V is an N × N orthogonal matrix.
Σ is an M × N diagonal matrix whose first r diagonal
entries are the nonzero singular values of A:
σ1 ≥ σ2 ≥ ... ≥ σr > σr+1 = ... = σn = 0
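A quick numerical check of the decomposition, sketched with the Eigen C++ library (not part of the original slides):

#include <iostream>
#include <Eigen/Dense>

int main()
{
    // A random 5 x 3 matrix standing in for a tiny ratings matrix.
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(5, 3);

    // Thin SVD: A = U * Sigma * V^T.
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
    Eigen::VectorXd sigma = svd.singularValues();  // sorted, largest first

    // Reconstruction error should be ~0 (up to floating-point noise).
    Eigen::MatrixXd R = svd.matrixU() * sigma.asDiagonal() * svd.matrixV().transpose();
    std::cout << "singular values: " << sigma.transpose() << "\n"
              << "reconstruction error: " << (A - R).norm() << "\n";
}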
20. Related to eigenvalue
decomposition (PCA)
U is the orthonormal eigenspace of
AA^T. It spans the “column space”; its
columns are the left singular vectors.
V is the orthonormal eigenspace of
A^TA. It spans the “row space”; its
columns are the right singular vectors.
The singular values are the square roots of
the eigenvalues of A^TA (equivalently, AA^T).
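That relationship is easy to verify numerically (again a sketch with Eigen):

#include <iostream>
#include <Eigen/Dense>

int main()
{
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(5, 3);

    // Singular values of A, squared...
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(A);
    Eigen::VectorXd sigma = svd.singularValues();

    // ...match the eigenvalues of A^T A (which come back in ascending order).
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> eig(A.transpose() * A);
    std::cout << "sigma^2:            " << sigma.cwiseProduct(sigma).transpose() << "\n"
              << "eigenvalues of AtA: " << eig.eigenvalues().reverse().transpose() << "\n";
}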
21. Reducing Dimensionality
A_k = U_k Σ_k V_k^T
Keep only the k largest singular values and
the corresponding singular vectors.
A_k is the closest rank-k approximation to A:
it minimizes the Frobenius norm ||A − A_k||_F
over all rank-k matrices.
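A sketch of the truncation with Eigen, showing the Frobenius error shrink as k grows:

#include <iostream>
#include <Eigen/Dense>

// Best rank-k approximation: keep the k largest singular values/vectors.
Eigen::MatrixXd truncate(const Eigen::MatrixXd& A, int k)
{
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
    return svd.matrixU().leftCols(k)
         * svd.singularValues().head(k).asDiagonal()
         * svd.matrixV().leftCols(k).transpose();
}

int main()
{
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(6, 4);
    for (int k = 1; k <= 4; ++k)  // error reaches ~0 at full rank
        std::cout << "k=" << k << "  ||A - A_k||_F = "
                  << (A - truncate(A, k)).norm() << "\n";
}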
22. Making Recommendations
Cosine similarity - a common way to find the neighborhood:
cos(i, j) = (i · j) / (||i||_2 ∗ ||j||_2)
Base recommendations on that neighborhood
and its users.
Can also predict a customer's rating of a product with
a simple dot product if the singular values are folded
into the singular vectors:
CP_prod = C_avg + U_k S_k^(1/2)(c) · S_k^(1/2) V_k^T(p)
where (c) selects the customer's row and (p) the
product's column.
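A sketch of that prediction with Eigen, assuming the matrix rows are customers and the columns are products (as in Sarwar et al.); the names are hypothetical, and a real system would first fill the sparse matrix with defaults:

#include <Eigen/Dense>

struct Model {
    Eigen::MatrixXd custFactors;  // U_k * S_k^(1/2): one row per customer
    Eigen::MatrixXd prodFactors;  // S_k^(1/2) * V_k^T: one column per product
    Eigen::VectorXd custAvg;      // C_avg: each customer's average rating
};

Model build(const Eigen::MatrixXd& A, int k)
{
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
    Eigen::VectorXd sqrtS = svd.singularValues().head(k).cwiseSqrt();
    return { svd.matrixU().leftCols(k) * sqrtS.asDiagonal(),
             sqrtS.asDiagonal() * svd.matrixV().leftCols(k).transpose(),
             A.rowwise().mean() };
}

// CP_prod = C_avg + U_k S_k^(1/2)(c) . S_k^(1/2) V_k^T(p)
double predict(const Model& m, int c, int p)
{
    return m.custAvg(c) + m.custFactors.row(c).dot(m.prodFactors.col(p));
}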
23. Challenges with SVD
Scalability - Once again, compute
time grows with the number of users
and products. O(m^3)
Offline stage.
Online stage.
Even doing the SVD computation offline
is not possible for large datasets.
Other methods are needed.
26. GHA for SVD
Gorrell (2006),GHA for Incremental SVD in
NLP
Based off of Sanger’s (1989) GHA for eigen
decomposition.
a
∆ci b
= ci · b(x − ∑ a a
(a · c j )c j )
j<i
b
∆ci a
= ci · a(b − ∑ b b
(b · c j )c j )
j<i
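A plain-C++ sketch of one such update for the first few singular-vector pairs (the names and the explicit learning rate are illustrative):

#include <vector>

using Vec = std::vector<double>;

double dot(const Vec& x, const Vec& y)
{
    double s = 0;
    for (size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

// One observation: a row vector `a` and column vector `b` of the matrix.
// ca[i], cb[i] hold the current estimates of the i-th singular vector pair.
void ghaUpdate(std::vector<Vec>& ca, std::vector<Vec>& cb,
               const Vec& a, const Vec& b, double lrate)
{
    for (size_t i = 0; i < ca.size(); ++i) {
        // Residuals: remove the components already explained by c_j, j < i.
        Vec ra = a, rb = b;
        for (size_t j = 0; j < i; ++j) {
            double pa = dot(a, ca[j]), pb = dot(b, cb[j]);
            for (size_t t = 0; t < ra.size(); ++t) ra[t] -= pa * ca[j][t];
            for (size_t t = 0; t < rb.size(); ++t) rb[t] -= pb * cb[j][t];
        }
        // delta c_i^a = (c_i^b . b)(a - sum), and symmetrically for c_i^b.
        double ga = dot(cb[i], b), gb = dot(ca[i], a);
        for (size_t t = 0; t < ra.size(); ++t) ca[i][t] += lrate * ga * ra[t];
        for (size_t t = 0; t < rb.size(); ++t) cb[i][t] += lrate * gb * rb[t];
    }
}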
27. GHA extended by Funk
void train(int user, int movie, real rating)
{
    // One gradient step on a single known rating, for the feature
    // currently being trained; lrate is the learning rate.
    real err = lrate * (rating - predictRating(movie, user));
    real uv = userValue[user];  // cache so both updates see the old value
    userValue[user]   += err * movieValue[movie];
    movieValue[movie] += err * uv;
}
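A hypothetical driver continuing the slide's code: Funk trains one feature at a time, sweeping every known rating repeatedly before moving on (initFeature and foldIn are assumed helpers, not from the slides):

struct KnownRating { int user, movie; real value; };

void trainAll(const std::vector<KnownRating>& ratings,
              int numFeatures, int epochs)
{
    for (int f = 0; f < numFeatures; ++f) {   // one feature at a time
        initFeature(f);                       // assumed: reset userValue/movieValue
        for (int e = 0; e < epochs; ++e)      // sweep all known ratings
            for (const KnownRating& r : ratings)
                train(r.user, r.movie, r.value);
        foldIn(f);                            // assumed: bake feature f into predictRating
    }
}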
29. Summary
SVD provides an elegant and automatic
recommendation system that has the
potential to scale.
There are many different algorithms that
calculate, or at least approximate, the SVD;
they can be used in the offline stage for
websites that need CF.
Every dataset is different and requires
experimentation to get the best results.