This presentation covers the theory and basics of recommendation systems, with details on collaborative and content-based filtering.
In this presentation we discuss only personalised recommendations, as opposed to group-based or cluster-based ones. The approaches covered are:
1. Collaborative Filtering
2. Content Based Recommendations
3. Knowledge Based Recommendations
4. Hybrid
Personalised recommendation involves a user, from whom we can derive profile and usage data, and a recommender system that outputs a list of items, each with a score stating the user's affinity for that item.
In collaborative filtering the result is the same type of list, stating the affinity of a user for each item. The input also includes community data, from which we can derive similar usage behaviours and make recommendations based on them.
Here the input is a matrix of ratings given by different users and the output is a predicted rating for a new item. If we treat each user's ratings as coordinates in Euclidean space, we can calculate the "distance" between any two users. We can then predict the current user's rating from the rating given by the most similar user, or from some combination of the ratings given by the n most similar users.
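The idea above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the `ratings` data, the user names, and the helper names `euclidean_distance` and `predict` are all hypothetical, and averaging the n nearest users' ratings is just one possible "combination".

```python
import math

# Hypothetical ratings matrix: user -> {item: rating on a 1-5 scale}.
# "alice" has not rated item5 yet; that is the rating we want to predict.
ratings = {
    "alice": {"item1": 5, "item2": 3, "item3": 4, "item4": 4},
    "user1": {"item1": 3, "item2": 1, "item3": 2, "item4": 3, "item5": 3},
    "user2": {"item1": 4, "item2": 3, "item3": 4, "item4": 3, "item5": 5},
    "user3": {"item1": 3, "item2": 3, "item3": 1, "item4": 5, "item5": 4},
    "user4": {"item1": 1, "item2": 5, "item3": 5, "item4": 2, "item5": 1},
}

def euclidean_distance(a, b):
    """Euclidean distance between two users over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return math.inf  # no overlap: treat the users as infinitely far apart
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in common))

def predict(user, item, n=2):
    """Average the ratings for `item` given by the n nearest users who rated it."""
    candidates = [
        (euclidean_distance(ratings[user], other), other[item])
        for name, other in ratings.items()
        if name != user and item in other
    ]
    if not candidates:
        return None  # nobody else has rated this item
    nearest = sorted(candidates)[:n]
    return sum(rating for _, rating in nearest) / len(nearest)

print(predict("alice", "item5", n=1))  # rating of the single nearest user: 5
print(predict("alice", "item5", n=2))  # mean over the two nearest users: 4.0
```

With n=1 the prediction simply copies the closest user's rating; a larger n smooths out the quirks of a single neighbour.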
In collaborative filtering we are agnostic to what the items actually are and make recommendations based solely on the behaviour of other users.
A hybrid recommender system takes a mixed approach that draws on all the other approaches. It leverages all available inputs, which may include user data, product features, additional information and even community data.
When using a collaborative filtering approach, we need a way to obtain ratings for user-item pairs. There are three ways to do that:
1. Explicit rating: a system that lets users assign ratings to items directly
   Number of stars
   Likes or dislikes
   E.g. YouTube, Uber, etc.
2. Implicit rating: a system that derives ratings from user behaviour
   Number of visits to a page
   Number of recharges
   Number of purchases of an item
   E.g. Amazon, YouTube, etc.
3. Hybrid: a combination of explicit and implicit ratings
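As an illustration of implicit ratings, the sketch below turns one user's interaction counts into ratings on a 1-5 scale. The scaling rule (proportional to the user's most-used item, with a floor of 1) is an assumption chosen for simplicity, and the function name and the sample counts are hypothetical.

```python
def implicit_ratings(counts, scale=5):
    """Map one user's per-item interaction counts to ratings in 1..scale."""
    top = max(counts.values(), default=0)
    if top <= 0:
        return {}  # no recorded interactions, so nothing to rate
    # The most-used item gets `scale`; the others are scaled proportionally,
    # with a floor of 1 because any recorded interaction signals some interest.
    return {item: max(1, round(scale * n / top)) for item, n in counts.items()}

# Hypothetical purchase counts for a single user.
purchases = {"data_pack": 10, "talktime": 4, "sms_pack": 1}
print(implicit_ratings(purchases))
# {'data_pack': 5, 'talktime': 2, 'sms_pack': 1}
```

Items the user never interacted with are simply absent from the result; they are the ones a recommender would later predict ratings for.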
Once you have a way to calculate ratings, the next step is to decide how to measure distance (or, conversely, similarity) between users. There are many ways to calculate the similarity between two points mathematically. One of the most common is the Pearson correlation.
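A minimal sketch of the Pearson correlation between two users follows. It assumes each user is a dictionary of item ratings and computes the means over the co-rated items only; other variants use each user's overall mean instead. The function name `pearson` and the sample ratings are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two users' ratings over the items both rated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0  # not enough overlap to measure a correlation
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    covariance = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    spread_a = math.sqrt(sum((a[i] - mean_a) ** 2 for i in common))
    spread_b = math.sqrt(sum((b[i] - mean_b) ** 2 for i in common))
    if spread_a == 0 or spread_b == 0:
        return 0.0  # a user who rates everything the same has no variation
    return covariance / (spread_a * spread_b)

alice = {"item1": 5, "item2": 3, "item3": 4, "item4": 4}
user1 = {"item1": 3, "item2": 1, "item3": 2, "item4": 3}
print(round(pearson(alice, user1), 2))  # 0.85
```

The result ranges from -1 (opposite tastes) to +1 (identical tastes). Unlike plain Euclidean distance, it is not affected by one user rating consistently higher or lower than another.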
Vipul Rajan
Lead Developer
Whiteklay
Vipul holds a keen interest in machine learning and recommender
systems. He has worked extensively with Apache Spark, leveraging the
platform to provide optimal solutions to a variety of problems across a
wide range of use cases.
Editor's notes
Recommender System Basics
RS help to match users with items
In a content-based approach we consider the attributes of the items as well. Instead of calculating the distance between two users, we directly calculate the distance between a user and each item.
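One way to picture this is to place users and items in the same attribute space. The sketch below assumes each item is a vector of attribute weights and builds the user's profile as the average of the items they liked; it then ranks the remaining items by cosine similarity, which is swapped in here as the closeness measure. The item names, attribute columns, and function names are hypothetical.

```python
import math

# Hypothetical item attribute vectors: (action, comedy, romance).
items = {
    "film_a": [1.0, 0.0, 0.0],
    "film_b": [0.0, 1.0, 1.0],
    "film_c": [1.0, 1.0, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def build_profile(liked):
    """User profile: the element-wise average of the liked items' vectors."""
    vectors = [items[name] for name in liked]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def rank(liked):
    """Rank the items the user has not seen, closest to the profile first."""
    profile = build_profile(liked)
    unseen = [name for name in items if name not in liked]
    return sorted(unseen, key=lambda name: cosine(profile, items[name]), reverse=True)

print(rank(["film_a"]))  # ['film_c', 'film_b']
```

No other users appear anywhere in this calculation, which is the key difference from collaborative filtering.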
In a knowledge-based approach we also supply some additional data along with the item and user features. For example, a sweater might be the closest item by distance to a particular user, but if we have the additional knowledge that it is summer, it makes sense not to recommend a sweater to that user.
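The sweater example can be sketched as a rule applied on top of an already ranked list. This is only an illustration under assumed names: `filter_by_season`, the `seasons_for_item` mapping, and the sample items are hypothetical, and a real knowledge-based system would typically hold many such rules.

```python
def filter_by_season(ranked_items, seasons_for_item, current_season):
    """Drop ranked items that are known to be unsuitable for the current season."""
    kept = []
    for item in ranked_items:
        seasons = seasons_for_item.get(item)
        # Items with no seasonal knowledge attached are always kept.
        if seasons is None or current_season in seasons:
            kept.append(item)
    return kept

# Hypothetical output of a distance-based recommender, closest item first.
ranked = ["sweater", "t-shirt", "sunglasses"]
seasons_for_item = {"sweater": {"autumn", "winter"}, "sunglasses": {"summer"}}

print(filter_by_season(ranked, seasons_for_item, "summer"))
# ['t-shirt', 'sunglasses']
```

The original ranking order is preserved; the extra knowledge only removes candidates.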
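In collaborative filtering, instead of finding similar users you can reverse the whole process and take an item-based approach: compute the similarity between items from the ratings users have given them, and recommend items similar to the ones the user has already rated highly.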
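A minimal sketch of the item-based variant: the user-by-item ratings are transposed so that each item is described by the ratings it received, and items are then compared with each other. Cosine similarity is used here as one possible measure; the data and function names are hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "u1": {"x": 5, "y": 5, "z": 1},
    "u2": {"x": 4, "y": 4, "z": 2},
    "u3": {"x": 1, "y": 1, "z": 5},
}

def transpose(user_ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    by_item = {}
    for user, rated in user_ratings.items():
        for item, rating in rated.items():
            by_item.setdefault(item, {})[user] = rating
    return by_item

def cosine(a, b):
    """Cosine similarity of two items over the users who rated both."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(a[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar_item(by_item, target):
    """Return the item whose ratings look most like those of `target`."""
    others = [item for item in by_item if item != target]
    return max(others, key=lambda item: cosine(by_item[target], by_item[item]))

by_item = transpose(ratings)
print(most_similar_item(by_item, "x"))  # y
```

Item-to-item similarities tend to change more slowly than user-to-user ones, so they can often be computed ahead of time and reused.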
For a content-based approach, you have to find a way to compute item attributes.
We will look at TF-IDF as an example:
TF: term frequency, which measures how often a term appears in a document (its density)
IDF: inverse document frequency, which aims to reduce the weight of terms that appear in all documents
Given a keyword i and a document j:
TF(i, j): term frequency of keyword i in document j
IDF(i): inverse document frequency, calculated as IDF(i) = log(N / n(i))
N: number of all recommendable documents
n(i): number of documents from N in which keyword i appears
TF-IDF is calculated as: TF-IDF(i, j) = TF(i, j) * IDF(i)
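The formulas above translate almost directly into Python. This sketch assumes each document is already split into a list of lowercase tokens, takes TF as the share of a document's tokens that match the keyword, and uses the natural logarithm; the function names and the three sample documents are hypothetical.

```python
import math

def tf(term, doc):
    """TF(i, j): fraction of the tokens in document j that are keyword i."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """IDF(i) = log(N / n(i)), with N documents and n(i) containing keyword i."""
    containing = sum(1 for doc in docs if term in doc)
    if containing == 0:
        return 0.0  # keyword appears nowhere; avoid dividing by zero
    return math.log(len(docs) / containing)

def tf_idf(term, doc, docs):
    """TF-IDF(i, j) = TF(i, j) * IDF(i)."""
    return tf(term, doc) * idf(term, docs)

# Hypothetical tokenised documents.
docs = [
    ["the", "spark", "engine"],
    ["the", "recommender", "engine"],
    ["the", "recommender", "system", "the"],
]

print(tf_idf("the", docs[0], docs))              # 0.0, it appears in every document
print(round(tf_idf("spark", docs[0], docs), 4))  # 0.3662, it is unique to this document
```

A keyword that appears in every document gets an IDF of log(1) = 0, so it contributes nothing to the item's attributes, while a keyword unique to one document is weighted most heavily.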