In recent years, the field of Machine Learning and the tooling that supports its developer community have improved enormously. Even so, implementing a recommender system remains hard.
That is why at Crossing Minds we decided to create a series of four meetups on how to implement a recommender system end to end:
Part 1 – The Right Dataset
Part 2 – Model Training
Part 3 – Model Evaluation
Part 4 – Real-Time Deployment
This first meetup is about building the right dataset and doing all the preprocessing needed to train different models. We will cover explicit vs. implicit feedback, dataset analysis, likes/dislikes vs. ratings, user and item features, normalization, and similarities.
1. Recommender Systems from A to Z – The Right Dataset
2. Recommender Systems from A to Z
Part 1: The Right Dataset
Part 2: Model Training
Part 3: Model Evaluation
Part 4: Real-Time Deployment
4. 1. Having the Right Data
Explicit vs Implicit
Likes/Dislikes vs Ratings
2. Rating Dataset Analysis
Density
Connectivity
3. Item Features and User Features
Unsupervised Learning
Supervised Learning
4. Data Preprocessing
Unsupervised Dimensionality Reduction
Supervised Dimensionality Reduction
6. Having the Right Data – Explicit & Implicit Feedback
Explicit Feedback
Implicit Feedback
7. Having the Right Data – Explicit & Implicit Feedback
Explicit Feedback
● Expresses the preferences themselves
● Clean data (aligned with your goal)
● Costly to collect
Implicit Feedback
● Offers a level of confidence on user preferences
● Very easy to collect a lot of it
● Dangerous to interpret
10. Having the Right Data – Implicit vs Like/Dislike vs Ratings
Explicit Feedback
● Classification (e.g. like/dislike/skip)
● Regression (e.g. star ratings) => best data to compute absolute predictions of taste
● Ranking (e.g. pairwise comparisons) => best data to compute top-k recommendations
Implicit Feedback
● With implicit negative feedback (e.g. watch-time or play-time of media, like/skip actions)
● Without implicit negative feedback (e.g. likes only, search history, purchase history)
=> test evaluation requires bias correction, and model selection is very hard
Take-Home
The data you have affects how you train your models!
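Since implicit signals only offer a level of confidence, a common trick (popularized by implicit-feedback matrix factorization) is to split each signal into a binary preference and a confidence weight. A minimal numpy sketch, with hypothetical watch-time data and an illustrative `alpha`:

```python
import numpy as np

# Hypothetical implicit-feedback log: fraction of each video watched.
# Rows = users, columns = items; 0 means "never played" (no explicit negative).
watch_fraction = np.array([
    [0.9, 0.0, 0.1],
    [0.0, 0.5, 0.8],
])

# Binarize into preferences p, and keep the raw signal as a confidence
# weight c = 1 + alpha * r, so "watched 90%" counts far more during
# training than "watched 10%".
alpha = 40.0
preference = (watch_fraction > 0).astype(float)
confidence = 1.0 + alpha * watch_fraction

print(preference)
print(confidence)
```

The choice of `alpha` is a tuning knob, not a rule; the point is that the raw signal becomes a weight, never a label.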
13. Context
“Context is any information that can be used to characterize the situation of an entity” – Anind K. Dey, 2001
Representative Context
Fully observable and static
Interactive Context
Non-fully observable and dynamic
15. Context – Model
Rating Dataset
Instead of tuples (user, item, rating), we consider (user, item, context, rating).
Model
For similarity-based models (user-user or item-item), we need to modify how we compute the similarity to take context into account.
For matrix-factorization models (user-item), we need to add a dimension and use tensor factorization instead, which is much more challenging.
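One lightweight way to make an existing similarity-based model context-aware, short of full tensor factorization, is "item splitting": treat each (item, context) pair as a distinct pseudo-item and reuse the usual user–item machinery unchanged. A sketch with hypothetical tuples:

```python
import numpy as np

# Hypothetical 4-tuples: (user, item, context, rating).
ratings = [
    ("u1", "i1", "weekday", 4.0),
    ("u1", "i1", "weekend", 2.0),
    ("u2", "i1", "weekday", 5.0),
    ("u2", "i2", "weekday", 3.0),
]

# Item splitting: each (item, context) pair becomes its own column,
# so a plain user x pseudo-item matrix absorbs the context dimension.
pseudo_items = sorted({(i, c) for _, i, c, _ in ratings})
users = sorted({u for u, _, _, _ in ratings})
R = np.zeros((len(users), len(pseudo_items)))
for u, i, c, r in ratings:
    R[users.index(u), pseudo_items.index((i, c))] = r

print(pseudo_items)
print(R)
```

The trade-off: the pseudo-item matrix gets sparser as contexts multiply, which is exactly the density problem discussed next.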
19. Rating Dataset Analysis – Density & Connectivity
General Principle in Collaborative Filtering
The ability to learn anything about a user or an item is driven by its degree in the graph.
The ability to recommend an item to a user is driven by how connected they are in the graph.
Density and Sparsity
Density of a graph with users, items and ratings = |ratings| / (|users| × |items|) (typically in [0.001–0.01])
Connectivity
No information is learnt from a user or an item with degree one.
Example: if one user has 100 ratings on items that each have only one rating, we can remove all these items, the user, and its 100 ratings from the dataset.
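The density formula and the degree-one pruning rule can be checked on a toy rating graph (one pruning pass shown; in practice you repeat until degrees stabilize, since removing items changes user degrees):

```python
import numpy as np

# Toy rating graph: rows = users, columns = items, non-zero = a rating exists.
R = np.array([
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
])

n_users, n_items = R.shape
n_ratings = np.count_nonzero(R)
density = n_ratings / (n_users * n_items)   # |ratings| / (|users| * |items|)
print(f"density = {density:.3f}")

# Degree-1 nodes carry no collaborative signal: an item rated once, or a
# user with a single rating, can be pruned without losing learnable structure.
item_degree = np.count_nonzero(R, axis=0)
user_degree = np.count_nonzero(R, axis=1)
R_pruned = R[np.ix_(user_degree > 1, item_degree > 1)]
print(R_pruned.shape)
```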
31. Items & Users Features
1. Quantitative Features
2. Knowledge Graph
3. Deep Content Extraction
32. Items & Users Features – Quantitative Features
Discrete
● number of episodes in a TV show
● number of purchases made by a user
Continuous
● price of an item
● age of a user
● movie budget
● release date
33. Items & Users Features – Quantitative Features
For similarity-based models (user-user or item-item):
concatenate the rating matrices and the features, and use the same similarity metric (e.g. dot product)
C = Cost, Y = Year, D = Duration
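A minimal sketch of this concatenation, with made-up ratings and C/Y/D features (standardized first so the feature block does not dominate the dot product):

```python
import numpy as np

# Item-item similarity when items have both ratings and quantitative features.
# Rows = items. Ratings block: one column per user; feature block: cost,
# year, duration.
ratings = np.array([
    [5.0, 0.0, 3.0],
    [4.0, 1.0, 3.0],
])
features = np.array([
    [20.0, 1999.0, 120.0],
    [25.0, 2001.0, 110.0],
])
# Standardize each feature column to zero mean / unit variance.
features = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-9)

X = np.hstack([ratings, features])   # concatenate the two blocks
sim = X @ X.T                        # same similarity metric (dot product)
print(sim)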
34. Items & Users Features – Quantitative Features
For embedding-based models (user-item):
compute the embeddings on the rating matrices only, then concatenate the embeddings with the features
C = Cost, Y = Year, D = Duration
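For the embedding-based route, a sketch using a truncated SVD as the embedding step (a stand-in for whatever factorization you actually train), with a single hypothetical standardized feature:

```python
import numpy as np

# Factor the rating matrix first, then append quantitative features
# to the learned item embeddings.
R = np.array([               # item x user rating matrix (toy)
    [5.0, 0.0, 3.0],
    [4.0, 1.0, 3.0],
    [0.0, 2.0, 1.0],
])
features = np.array([[1.2], [0.8], [2.0]])   # e.g. standardized cost

# Item embeddings from a rank-2 truncated SVD of R.
d = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
item_emb = U[:, :d] * s[:d]                  # (n_items, d)

# Final item representation = [embedding | features].
item_repr = np.hstack([item_emb, features])
print(item_repr.shape)  # (3, 3)
```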
35. Items & Users Features – Knowledge Graph
1. Quantitative Features
2. Knowledge Graph
3. Deep Content Extraction
36. Items & Users Features – Knowledge Graph
One-to-many (Categorical)
● type of item
● author of a book
● gender of user
Many-to-many (Ontological)
● tags/labels/genres of an item
● all actors of a movie
● selected preferences of user
37. Items & Users Features – Knowledge Graph
For similarity-based models (item-item, user-user):
concatenate the rating matrices and the knowledge graph seen as a sparse matrix, and use the same similarity metric (e.g. dot product)
D = Drama, A = Action, R = Romance
38. Items & Users Features – Knowledge Graph
For embedding-based models (user-item):
we first need to convert the graph-based item features into dense vectors (dimensionality reduction), and then concatenate these vectors to the embeddings
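A sketch of that conversion, using a truncated SVD as the dimension-reduction step (the role TruncatedSVD plays in scikit-learn); the genre columns are illustrative:

```python
import numpy as np

# Graph-based item features (genres) as a one-hot matrix:
# columns = [Drama, Action, Romance]. In practice this would be sparse.
G = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
], dtype=float)

# Reduce the one-hot columns to dense d-dimensional vectors that can
# be concatenated to the item embeddings.
d = 2
U, s, Vt = np.linalg.svd(G, full_matrices=False)
dense = U[:, :d] * s[:d]   # (n_items, d) dense item-feature vectors
print(dense.shape)  # (4, 2)
```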
39. Items & Users Features – Deep Content Extraction
1. Quantitative Features
2. Knowledge Graph
3. Deep Content Extraction
40. Items & Users Features – Deep Content Extraction
An item is more than its available metadata.
Encode information from:
● Images (CNN)
● Text (NLP)
● Audio (LSTM)
[Figure: example movie synopses with genres predicted from the text, e.g. a documentary about the co-production of a children’s television program in Bangladesh, Kosovo, and South Africa → Documentary, History; the synopsis of young Babar, King of the Elephants → Comedy, Adventure, Family, Animation]
41. Items & Users Features – Deep Content Extraction – Images
Pre-trained Convolutional Neural Networks are widely available:
● ResNet50
● VGG16
● AlexNet
42. Items & Users Features – Deep Content Extraction – Text
Pre-trained NLP models are widely available:
● Word2vec, GloVe, FastText
● SkipThought
● Universal Sentence Encoder
● ELMo
Note: complex pre-trained models like Bi-LSTMs do not work well across domains
44. Data Preprocessing
Goal
Given a (sparse) matrix of item features I (n-items, n-entities), find the best matrix W (n-entities, d) so that IW is a dense matrix (n-items, d) that can be concatenated to the item embeddings.
Unsupervised vs Supervised
We say “supervised dimensionality reduction” when we use the ratings.
Supervised works better if the items with ratings are aligned with the items with features.
Unsupervised works better if you have many more items with features than items with ratings.
46. Data Preprocessing – PCA
PCA (Principal Component Analysis) is a well-known technique for feature extraction.
PCA projects the data into a new feature space with fewer dimensions than the original one, while retaining the most relevant information.
Feature space of dimension 3 → feature space of dimension 2
● PCA reduces the dimension of the input data by keeping the dimensions with the highest variance.
● PCA can also be applied to sparse data.
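A from-scratch PCA sketch on synthetic 3-D data whose third axis carries almost no variance, projecting down to 2 dimensions:

```python
import numpy as np

# Minimal PCA: project 3-D points onto their top-2 principal components
# (the directions of highest variance).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] *= 0.01                      # third axis is nearly constant

Xc = X - X.mean(axis=0)              # PCA requires centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X2 = Xc @ Vt[:2].T                   # feature space of dimension 2

# The retained variance should be almost all of the total.
explained = (s[:2] ** 2).sum() / (s ** 2).sum()
print(f"variance retained: {explained:.4f}")
```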
47. Data Preprocessing – Unsupervised Random Projection
Random Projection (RP) is another technique for dimensionality reduction.
We multiply I by a random matrix T, and verify that the distance between any two points is preserved after the transformation, within a certain error.
Advantages
● RP is computationally more efficient than PCA
● It is useful in very high-dimensional scenarios
Disadvantage
● PCA is the optimal linear projection from a space of dimension d to a space of dimension d’ (d >= d’)
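The distance-preservation claim (the Johnson–Lindenstrauss property) is easy to check empirically; a sketch with a Gaussian random matrix and made-up sizes:

```python
import numpy as np

# Unsupervised random projection: multiply the data by a random matrix T
# and check that pairwise distances survive within a small error.
rng = np.random.default_rng(42)
n, d, d_new = 50, 2000, 1000
I = rng.normal(size=(n, d))

# Gaussian random matrix, scaled so expected norms are preserved.
T = rng.normal(size=(d, d_new)) / np.sqrt(d_new)
P = I @ T                            # projected points

# Compare one pairwise distance before and after the projection.
before = np.linalg.norm(I[0] - I[1])
after = np.linalg.norm(P[0] - P[1])
print(f"relative error: {abs(after - before) / before:.3f}")
```

Note that `T` is generated once with no training at all, which is why RP is so much cheaper than PCA at high dimension.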
48. Data Preprocessing – Unsupervised Deep Learning
Graph Embedding Algorithms
● Node2vec
● DeepWalk
● LINE
Not often used, so there are no robust tools; they all live on GitHub in Python/C++.
Theoretical Remarks
They actually converge to a matrix factorization of a Laplacian-like normalization of the graph, but may be more flexible and memory-friendly.
50. Data Preprocessing – Supervised Linear Dimensionality Reduction
Given R (n-users, n-items) sparse and I (n-items, n-entities) sparse, find the best matrix W (n-entities, d) to learn R with a linear model:
✓ Works for both dense and sparse features
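The slide's linear model did not survive extraction, so the following is only one plausible instantiation, not necessarily the talk's exact formulation: least-squares fit a linear map from item features to ratings, then truncate it by SVD to get a rank-d W. All sizes are made up.

```python
import numpy as np

# Supervised linear dimensionality reduction, sketched: find W
# (n_entities, d) such that the dense features I @ W are predictive of R.
rng = np.random.default_rng(1)
n_users, n_items, n_entities, d = 30, 20, 50, 4
I = rng.random((n_items, n_entities))        # item features (dense toy stand-in)
R = rng.random((n_users, n_items))           # rating matrix

# Step 1: fit the full linear map R.T ~ I @ M by least squares.
M, *_ = np.linalg.lstsq(I, R.T, rcond=None)  # (n_entities, n_users)
# Step 2: keep only the top-d directions of that map.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
W = U[:, :d] * s[:d]                         # (n_entities, d)
print((I @ W).shape)  # (20, 4)
```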
51. Data Preprocessing – Deep Learning Dimensionality Reduction
Directly add the Knowledge Graph as part of the training data (not pre-processing anymore)
Learn embeddings for user, item, user-entities, item-entities together
54. The Right Dataset – Summary
Data > Pre-processing > Model
● The rating graph needs to be as dense and connected as possible
● Explicit feedback is better than implicit feedback, if you can get it
● The type of the ratings (binary vs continuous) affects how you train your models
● Having negative feedback is important
● Context adds information
● User features and item features add information, but require heavy pre-processing