Recommender Systems from A to Z
Part 1: The Right Dataset
Part 2: Model Training
Part 3: Model Evaluation
Part 4: Real-Time Deployment
1. Introduction
Train/Valid split, underfitting and overfitting
Learning Curve in recommendation engines
2. Evaluation functions
Basic metrics for recommender engines (Precision, Recall, TPR, TNR...)
Regression, Classification, Ranking metrics
3. Loss functions
Optimization problems and loss function properties
Regression, Classification, Ranking losses
4. Practical recommendations
Regularization, HP optimization, Embeddings evaluations.
Introduction
Previous Meetup Recap: Recommendation Engine Types
Recommendation engine
● Content-based
● Collaborative-filtering
○ Memory-based: Item-Item, User-User
○ Model-based: User-Item
● Hybrid engine
Model | When? | Problem definition | Solution strategies
Content-based | Item cold start | | Least Squares, Deep Learning
Item-Item | n_users >> n_items | | Affinity Matrix
User-User | n_users << n_items | | KNN, Affinity Matrix
User-Item | Better performance | | Matrix Factorization, Deep Learning
Previous Meetup Recap: Recommendation Engine Models
Model | Global Avg | User Avg | Item-Item | Linear | Linear + Reg | Matrix Fact | Deep Learning
Domains | Baseline | Baseline | users >> items | Known “I” | Known “I” | Unknown “I” | Extra datasets
Model Complexity | Trivial | Trivial | Simple | Linear | Linear | Linear | Non-linear
Time Complexity | + | + | +++ | ++++ | ++++ | ++++ | ++
Overfit/Underfit | Underfit | Underfit | May Underfit | May Overfit | May Perform Bad | May Overfit | Can Overfit
Hyper-Params | 0 | 0 | 0 | 1 | 2 | 2–3 | many
Implementation | Numpy | Numpy | Numpy | Numpy | Numpy | LightFM, Spark | NNet libraries
Optimization Problem – Matrix Factorization Example
Optimization problem (definitions)
● R: sparse matrix of ratings with m users and n items (the available dataset); it runs from the ratings of user #1 to the rating of user #m for item #n
● U: dense matrix of user embeddings, one embedding per user, from user #1 to user #m (unknown)
● I: dense matrix of item embeddings, one embedding per item, from item #1 to item #n (unknown)
Optimization Problem – Matrix Factorization Example
Our goal is to find U and I such that the difference between each data point in R and the product of the corresponding user and item embeddings is minimal.
1. What type of data do we have?
2. What properties are we looking for in our outputs?
- Exact rating vs like/dislike vs ranking predictions
3. How are we going to solve the problem?
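The goal above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: embeddings are stored as rows of U and I, the embedding size `d` is arbitrary, and a boolean `mask` (a name introduced here, not in the slides) marks which entries of R are observed.

```python
import numpy as np

def observed_squared_error(R, mask, U, I):
    """Sum of squared differences between observed ratings and U @ I.T."""
    pred = U @ I.T              # (m, n) matrix of predicted ratings
    diff = (R - pred) * mask    # missing entries contribute nothing
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(0)
m, n, d = 4, 5, 2               # users, items, embedding size (assumed values)
U = rng.normal(size=(m, d))     # one row per user
I = rng.normal(size=(n, d))     # one row per item
R = U @ I.T                     # toy ratings generated by the model itself
mask = rng.random((m, n)) < 0.5 # pretend only about half the ratings are observed

perfect = observed_squared_error(R, mask, U, I)
```

Because the toy ratings were generated from U and I, the error is zero here; with real data the optimizer has to search for the U and I that make it small.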
Ask the Right Questions
(1) What type of data do we have?
(2) What properties are we looking for in our outputs? → EVALUATION FUNCTIONS
(3) How are we going to solve the problem? → LOSS FUNCTIONS
(4) Which hyper-parameters of my model are the best? → RANDOM SEARCH, GP
(5) Which model is the best? → COMPARE METRICS
ML FOR RECOMMENDATION
Business decisions
Technical decisions
Objective Types (from data point of view)
Classification
● click/no-click
● like/dislike/missing
● estimated probability of like (e.g. watch time)
Regression
● absolute rating (e.g. from 1/5 to 5/5)
● number of interactions
Ranking
● estimated order of preference (e.g. watch time)
● pairwise comparisons
Unsupervised
● clustering of items
● clustering of users
Choosing the Right Objective (from business point of view)
Absolute Predictions vs Relative Predictions
Does only the order of the predictions matter?
→ RANKING LOSS FUNCTION
Sensitivity vs Specificity
Is a false positive worse than a false negative?
→ CLASSIFICATION LOSS FUNCTION
Skewness
Is misclassifying an all-star favorite worse than misclassifying a casual like?
→ LOSS FUNCTION THAT PENALIZES ERRORS IN ALL-STAR RATINGS MORE
Cross Validation – In Traditional Machine Learning
[Figure: cross-validation with 4 folds; the fold order rotates from one split to the next]
Cross Validation – In Recommendation Engines
Split such that every user is present in both train and valid
Stronger: split such that every user has an 80/20 train/valid split
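The stronger per-user split could look roughly like the sketch below. The function name `per_user_split` and the rule of keeping single-interaction users entirely in train are assumptions made for illustration; the slides do not prescribe an implementation.

```python
import numpy as np

def per_user_split(user_ids, valid_fraction=0.2, seed=0):
    """Return a boolean array: True where the interaction goes to validation."""
    rng = np.random.default_rng(seed)
    user_ids = np.asarray(user_ids)
    is_valid = np.zeros(len(user_ids), dtype=bool)
    for user in np.unique(user_ids):
        idx = np.flatnonzero(user_ids == user)
        if len(idx) < 2:
            continue  # assumption: a lone interaction stays in train
        n_valid = max(1, int(round(valid_fraction * len(idx))))
        n_valid = min(n_valid, len(idx) - 1)  # keep at least one in train
        is_valid[rng.choice(idx, size=n_valid, replace=False)] = True
    return is_valid

# Toy interaction log: user 0 has 10 ratings, user 1 has 5, user 2 has 1.
user_ids = [0] * 10 + [1] * 5 + [2] * 1
is_valid = per_user_split(user_ids)
```

Every user with at least two interactions ends up in both sets, which is the property the slide asks for.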
Underfitting and Overfitting
● Underfitting: model fails to learn relations in data
● Good fit: model is a good fit for the data
● Overfitting: model fails to generalize to new samples
[Figure: three fits of increasing complexity, from underfitting to overfitting, checked against validation samples]
[Figure: learning curve; x-axis: epoch, y-axis: loss function or metric]
Mini-Batch Gradient Descent
for epoch in n_epochs:
● shuffle the batches
● for batch in n_batches:
○ compute the predictions for the batch
○ compute the error for the batch
○ compute the gradient for the batch
○ update the parameters of the model
● plot error vs epoch
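The loop above could be sketched as follows for a matrix factorization model. This is a minimal illustration, not the authors' implementation: the squared-error loss, the learning rate, the embedding size, and the function name `train_mf` are all assumptions, and the per-epoch error is collected in a list instead of being plotted.

```python
import numpy as np

def train_mf(users, items, ratings, n_users, n_items, dim=8,
             lr=0.02, n_epochs=200, batch_size=4, seed=0):
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.normal(size=(n_users, dim))   # user embeddings
    I = 0.1 * rng.normal(size=(n_items, dim))   # item embeddings
    history = []
    for epoch in range(n_epochs):
        order = rng.permutation(len(ratings))            # shuffle the samples
        for start in range(0, len(order), batch_size):
            b = order[start:start + batch_size]
            u, i, r = users[b], items[b], ratings[b]
            pred = np.sum(U[u] * I[i], axis=1)           # predictions for the batch
            err = pred - r                               # error for the batch
            grad_U = err[:, None] * I[i]                 # gradient of 0.5 * err**2
            grad_I = err[:, None] * U[u]
            np.subtract.at(U, u, lr * grad_U)            # update; accumulates repeated indices
            np.subtract.at(I, i, lr * grad_I)
        full_pred = np.sum(U[users] * I[items], axis=1)
        history.append(float(np.mean((full_pred - ratings) ** 2)))  # error vs epoch
    return U, I, history

# Toy data: 3 users, 3 items, 6 observed ratings.
users = np.array([0, 0, 1, 1, 2, 2])
items = np.array([0, 1, 1, 2, 0, 2])
ratings = np.array([5.0, 3.0, 4.0, 1.0, 2.0, 5.0])
U, I, history = train_mf(users, items, ratings, n_users=3, n_items=3)
```

Plotting `history` against the epoch index gives the training learning curve; computing the same quantity on held-out ratings gives the validation curve.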
Underfitting and Overfitting
A very simple way of checking underfitting
● Compare the distribution of the ground truth Y with the distribution of the model predictions (predicted Y)
● If the model always predicts the same value, it is underfitting
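One way to run this check numerically is to compare the mean and standard deviation of the predictions with those of the ground truth. The helper name `prediction_spread` and the idea of reporting a ratio of standard deviations are illustrative assumptions.

```python
import numpy as np

def prediction_spread(y_true, y_pred):
    """Summarize how much the predictions vary compared with the ground truth."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    true_std = float(np.std(y_true))
    pred_std = float(np.std(y_pred))
    return {
        "true_mean": float(np.mean(y_true)),
        "pred_mean": float(np.mean(y_pred)),
        "true_std": true_std,
        "pred_std": pred_std,
        # A ratio near 0 means the model predicts almost the same value everywhere.
        "std_ratio": pred_std / true_std if true_std > 0 else float("nan"),
    }

stats = prediction_spread([1, 2, 3, 4, 5], [3, 3, 3, 3, 3])
```

A constant predictor matches the mean of the data but has no spread, so its `std_ratio` is zero.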
Evaluation Functions
What do we want to evaluate?
Classification
● True Positive Rate (TPR)
● True Negative Rate (TNR)
● Precision
● F-measure
Regression
● Mean Square Error (MSE)
Ranking
● Recall@K
● Precision@K
● CG, DCG, nDCG
Ranking/Classification metrics
● AUC
Some common evaluation functions
Regression
Mean Square Error (MSE)
● Easy to compute
● Linear gradient
● Can also be used as loss function
Mean Absolute Error (MAE)
● Easy to compute
● Easy to interpret
● Discontinuous gradient (undefined at zero error)
● Less convenient as a loss function for gradient-based training
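Both regression metrics fit in a few lines. The sketch below uses made-up ratings purely to show the arithmetic.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error: average of squared differences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

y_true = [5, 3, 4]   # illustrative ratings
y_pred = [4, 3, 2]   # errors are 1, 0 and 2
mse_value = mse(y_true, y_pred)   # (1 + 0 + 4) / 3
mae_value = mae(y_true, y_pred)   # (1 + 0 + 2) / 3
```

MSE weights the error of 2 four times as much as the error of 1, while MAE weights it only twice as much, which is why MAE is easier to read in rating units.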
Classification – Precision vs Recall
TS = Toy Story
KP = Kung Fu Panda
TD = How to Train Your Dragon
A = Annabelle
Items: TS1, TS2, TS3, TS4, KP1, KP2, KP3, A1, A2 (7 liked by the user, 2 disliked)
Model 1 recommends 5 items, all of them liked: Recall = 5/7, Precision = 5/5 = 1
Model 2 recommends all 9 items: Recall = 7/7 = 1, Precision = 7/9
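The numbers on this slide can be reproduced with set operations. The sketch assumes the seven Toy Story and Kung Fu Panda items are the likes and the two Annabelle items are the dislikes, and it picks five liked items for Model 1 arbitrarily; the slide only fixes the counts.

```python
def precision_recall(recommended, liked):
    """Precision and recall of a set of recommendations against a set of likes."""
    recommended, liked = set(recommended), set(liked)
    hits = len(recommended & liked)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(liked) if liked else 0.0
    return precision, recall

liked = {"TS1", "TS2", "TS3", "TS4", "KP1", "KP2", "KP3"}   # assumed likes
all_items = liked | {"A1", "A2"}                            # assumed dislikes: A1, A2

model_1 = {"TS1", "TS2", "TS3", "KP1", "KP2"}   # hypothetical pick of 5 liked items
model_2 = all_items                              # recommends everything

p1, r1 = precision_recall(model_1, liked)   # precision 5/5, recall 5/7
p2, r2 = precision_recall(model_2, liked)   # precision 7/9, recall 7/7
```

Model 1 is cautious and never wrong but misses two likes; Model 2 misses nothing but pays for it with two bad recommendations.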
Classification 1/2
True Positive Rate (a.k.a TPR, Recall, Sensitivity)
● Easy to understand
● Useful for likes/dislikes datasets
● Measure of global bias of a model
● 0 <= TPR <=1 (higher is better)
True Negative Rate (a.k.a TNR, Selectivity, Specificity)
● Easy to understand
● Useful for likes/dislikes datasets
● Measure of global bias of a model
● 0 <= TNR <=1 (higher is better)
Classification 2/2
Precision
● Easy to understand
● Useful for likes/dislikes datasets
● Measure quality of recommendation
● 0 <= Precision <=1 (higher is better)
F-measure
● Balance precision and recall
● Not good for recommendation, because it doesn’t take True Negatives into account
● 0 <= F-measure <= 1 (higher is better)
Ranking 1/3
Recall@K
● Count the positive items among the top K items predicted for each user
● Divide that number by the number of positive items for each user
● A perfect score is 1 if the user has K or fewer positive items and they all appear in the predicted top K
● Independent of the exact values of the predictions, only their relative rank matters
Movie | True rating | Prediction | Predicted rank
Toy Story 1 | 1.0 | 0.9 | top 1
Toy Story 2 | 0.9 | 0.7 | top 2
Kung Fu Panda 1 | 0.7 | 0.1 |
Kung Fu Panda 2 | 0.6 | -0.1 |
Annabelle 1 | -0.2 | 0.4 | top 3
K = 3
TOP K = {TS1, TS2, A1}
TOP K Positive = {TS1, TS2} = 2
Total Positive = 4
Recall@K = 2 / 4
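The worked example can be checked with a short function. It assumes, as the example does, that an item counts as positive when its true rating is above 0; the `threshold` parameter is introduced here only to make that assumption explicit.

```python
import numpy as np

def recall_at_k(true_ratings, predictions, k, threshold=0.0):
    """Fraction of a user's positive items that appear in the predicted top k."""
    true_ratings = np.asarray(true_ratings, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    top_k = np.argsort(-predictions)[:k]     # indices of the k highest predictions
    positives = true_ratings > threshold     # assumption: positive means rating > 0
    n_pos = int(positives.sum())
    if n_pos == 0:
        return float("nan")                  # undefined without positive items
    return float(positives[top_k].sum() / n_pos)

# Rows: TS1, TS2, KP1, KP2, A1
true_ratings = [1.0, 0.9, 0.7, 0.6, -0.2]
predictions = [0.9, 0.7, 0.1, -0.1, 0.4]
recall_3 = recall_at_k(true_ratings, predictions, k=3)   # 2 / 4
```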
Ranking 1/3
Recall@K
● In math terms:
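A formula consistent with the bullets of the previous slide is sketched below. The notation is an assumption made for this sketch: \(\mathrm{Top}_K(u)\) is the set of the K items with the highest predictions for user u, and \(P(u)\) is the set of positive items of user u.

```latex
\[
  \mathrm{Recall@}K(u) \;=\; \frac{\left| \mathrm{Top}_K(u) \cap P(u) \right|}{\left| P(u) \right|}
\]
```

In the worked example, \(|\mathrm{Top}_3(u) \cap P(u)| = 2\) and \(|P(u)| = 4\), which gives 2/4.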
Ranking 2/3
Precision@K
● Count the positive items among the top K items predicted for each user
● Divide that number by K for each user
● A perfect score is 1 if the user has K or more positive items and the top K only contains positives
● Independent of the exact values of the predictions, only their relative rank matters
Movie | True rating | Prediction | Predicted rank
Toy Story 1 | 1.0 | 0.9 | top 1
Toy Story 2 | 0.9 | 0.7 | top 2
Kung Fu Panda 1 | 0.7 | 0.1 |
Kung Fu Panda 2 | 0.6 | -0.1 |
Annabelle 1 | -0.2 | 0.4 | top 3
K = 3
TOP K = {TS1, TS2, A1}
TOP K Positive = {TS1, TS2} = 2
Precision@K = 2 / 3
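The same table gives Precision@K when the count of positives in the top K is divided by K instead of by the number of positives. As before, treating a true rating above 0 as positive is an assumption taken from the worked example.

```python
import numpy as np

def precision_at_k(true_ratings, predictions, k, threshold=0.0):
    """Fraction of the predicted top k items that are positive for the user."""
    true_ratings = np.asarray(true_ratings, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    top_k = np.argsort(-predictions)[:k]     # indices of the k highest predictions
    positives = true_ratings > threshold     # assumption: positive means rating > 0
    return float(positives[top_k].sum() / k)

# Rows: TS1, TS2, KP1, KP2, A1
true_ratings = [1.0, 0.9, 0.7, 0.6, -0.2]
predictions = [0.9, 0.7, 0.1, -0.1, 0.4]
precision_3 = precision_at_k(true_ratings, predictions, k=3)   # 2 / 3
```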
Ranking 2/3
Precision@K
● In math terms:
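A formula consistent with the bullets of the previous slide is sketched below, with assumed notation: \(\mathrm{Top}_K(u)\) is the set of the K items with the highest predictions for user u, and \(P(u)\) is the set of positive items of user u.

```latex
\[
  \mathrm{Precision@}K(u) \;=\; \frac{\left| \mathrm{Top}_K(u) \cap P(u) \right|}{K}
\]
```

In the worked example, \(|\mathrm{Top}_3(u) \cap P(u)| = 2\) and \(K = 3\), which gives 2/3.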
Ranking 3/3
CG, DCG, and nDCG
● CG: Sum the true ratings of the top K items predicted for each user
● DCG: Weight by position in the top K; nDCG: Normalize to [0, 1]
● A perfect nDCG is 1 if the ranking of the predictions is the same as the ranking of the true ratings
● The bigger the score the better
Movie | True rating | Prediction | Predicted rank
Toy Story 1 | 1.0 | 0.9 | top 1
Toy Story 2 | 0.9 | 0.7 | top 2
Kung Fu Panda 1 | 0.7 | 0.1 |
Kung Fu Panda 2 | 0.6 | -0.1 |
Annabelle 1 | -0.2 | 0.4 | top 3
K = 3
TOP K = {TS1, TS2, A1}
CG = 1.0 + 0.9 - 0.2 = 1.7
DCG = 1.0/1 + 0.9/2 - 0.2/3 ≈ 1.38
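The three scores can be computed together, as in the sketch below. It follows the 1/position weighting of the worked example by default; the `log_discount` flag, a name introduced here, switches to the 1/log2(position + 1) weighting that is also commonly used. Normalizing by the DCG of the ideal ordering is the usual definition of nDCG and is assumed here.

```python
import numpy as np

def cg_dcg_ndcg(true_ratings, predictions, k, log_discount=False):
    """Return (CG, DCG, nDCG) for the top k predicted items of one user."""
    true_ratings = np.asarray(true_ratings, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    k = min(k, len(true_ratings))
    positions = np.arange(1, k + 1)
    # Slide convention: divide by the position; common alternative: log2(position + 1).
    discount = np.log2(positions + 1) if log_discount else positions.astype(float)
    top_k = np.argsort(-predictions)[:k]
    gains = true_ratings[top_k]                  # true ratings in predicted order
    cg = gains.sum()
    dcg = (gains / discount).sum()
    ideal = np.sort(true_ratings)[::-1][:k]      # best possible ordering
    idcg = (ideal / discount).sum()
    ndcg = dcg / idcg if idcg > 0 else float("nan")
    return float(cg), float(dcg), float(ndcg)

# Rows: TS1, TS2, KP1, KP2, A1
true_ratings = [1.0, 0.9, 0.7, 0.6, -0.2]
predictions = [0.9, 0.7, 0.1, -0.1, 0.4]
cg, dcg, ndcg = cg_dcg_ndcg(true_ratings, predictions, k=3)
```

Note that the [0, 1] range of nDCG relies on non-negative true ratings; with a negative rating such as Annabelle's, a bad enough ranking could push the score below zero.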
Hybrid Ranking/Classification
AUC
● Vary positive prediction threshold (not just 0)
● Compute TPR and FPR for all possible positive thresholds
● Build Receiver Operating Characteristic (ROC) curve
● Integrate Area Under the ROC Curve (AUC)
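Instead of sweeping thresholds and integrating the ROC curve, AUC can also be computed from its equivalent pairwise reading: the fraction of (positive, negative) pairs in which the positive item gets the higher score, with ties counted as one half. The sketch below uses that shortcut on the five movies of the ranking examples, assuming a true rating above 0 means positive.

```python
import numpy as np

def auc_pairwise(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")                      # undefined without both classes
    diff = pos[:, None] - neg[None, :]           # all positive-vs-negative pairs
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

# Rows: TS1, TS2, KP1, KP2, A1 (A1 is the only negative)
labels = [1, 1, 1, 1, 0]
scores = [0.9, 0.7, 0.1, -0.1, 0.4]
auc = auc_pairwise(labels, scores)   # TS1 and TS2 beat A1, KP1 and KP2 do not
```

This quadratic-time version is fine for a single user's items; for large datasets a sort-based implementation from an established library is usually preferable.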
Loss functions
Loss Functions vs Evaluation Functions
Evaluation Metrics
● Expensive to evaluate
● Often not smooth
● Often not even differentiable
Loss Functions
● Smooth approximations of your evaluation metric
● Well suited for SGD
Loss Functions: How are we going to solve the problem?
Classification loss
● Logistic
● Cross Entropy
● Kullback-Leibler Divergence
Regression loss
● Mean Square Error (MSE)
Ranking loss
● WARP
● BPR
Some common loss functions
Optimization Problems – Basic Formulation with RMSE
Goal: find U and I s.t. the difference between each data point in R and the product of the corresponding user and item embeddings is minimal
Optimization Problems – General Formulation
Goal: find U and I s.t. the loss function J is minimized.
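The two goals can be written out as follows. The notation is an assumption made for this sketch: \(\Omega\) is the set of observed (user, item) pairs in R, \(U_u\) is the embedding of user u, \(I_i\) is the embedding of item i, and \(\ell\) is a per-rating loss.

```latex
% Basic formulation: minimize the RMSE over the observed ratings
\[
  \min_{U,\, I} \; \sqrt{ \frac{1}{|\Omega|} \sum_{(u,i) \in \Omega} \left( R_{ui} - U_u \cdot I_i \right)^2 }
\]
% General formulation: minimize a loss J built from a per-rating loss \ell
\[
  \min_{U,\, I} \; J(U, I) \;=\; \sum_{(u,i) \in \Omega} \ell\!\left( R_{ui},\; U_u \cdot I_i \right)
\]
```

Choosing \(\ell(r, \hat r) = (r - \hat r)^2\) in the general form gives the same minimizers as the RMSE form, since the square root and the constant factor do not change where the minimum lies.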
Convex vs Non-Convex Optimization
[Figure: a convex loss surface next to a non-convex one]
Loss Functions – Regression
Mean Square Error
● Typically used loss function for regression. It’s a smooth function. It’s easy to understand.
Regularized Mean Square Error
● Mean square error plus regularization to avoid overfitting.
Loss Functions – Classification
Logistic
● Typically used loss function for classification. Smooth gradient around zero and steep for large errors.
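A minimal sketch of the logistic loss, assuming labels encoded as -1 (dislike) and +1 (like) and a real-valued model score; the slides do not fix an encoding, so this is one common convention.

```python
import numpy as np

def logistic_loss(y, score):
    """Mean of log(1 + exp(-y * score)) for labels y in {-1, +1}."""
    y = np.asarray(y, dtype=float)
    score = np.asarray(score, dtype=float)
    # logaddexp(0, x) computes log(1 + exp(x)) without overflow for large x.
    return float(np.mean(np.logaddexp(0.0, -y * score)))

undecided = logistic_loss([1, -1], [0.0, 0.0])       # log(2) for each sample
confident_right = logistic_loss([1], [10.0])         # close to 0
confident_wrong = logistic_loss([1], [-10.0])        # close to 10
```

The loss grows roughly linearly with the size of a confident mistake and flattens toward zero for confident correct predictions.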
Loss Functions – Ranking
Weighted Approximate-Rank Pairwise (WARP)
● Approximates DCG-like evaluation metrics
● Smooth and tractable computation
Bayesian Personalised Ranking (BPR)
● Approximates AUC
● Smooth and tractable computation
● Requires binary comparisons (good for binary comparison feedback)
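The core of BPR is a pairwise term: for a user, an item with positive feedback should score higher than an item without. The sketch below shows only that term, -log(sigmoid(score_pos - score_neg)), and leaves out the regularization and the sampling of pairs; the function name is an assumption.

```python
import numpy as np

def bpr_loss(pos_scores, neg_scores):
    """Mean of -log(sigmoid(pos - neg)) over (positive, negative) score pairs."""
    diff = np.asarray(pos_scores, dtype=float) - np.asarray(neg_scores, dtype=float)
    # -log(sigmoid(x)) equals log(1 + exp(-x)), computed stably with logaddexp.
    return float(np.mean(np.logaddexp(0.0, -diff)))

tie = bpr_loss([0.0], [0.0])            # log(2): the pair is not separated
good_order = bpr_loss([5.0], [0.0])     # positive item ranked above the negative
bad_order = bpr_loss([0.0], [5.0])      # positive item ranked below the negative
```

Only the difference between the two scores matters, which is why this loss fits binary comparison feedback.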
Practical Recommendations
Practical Recommendations
(1) Always compute baseline metrics: COMPARE WITH GLOBAL MODELS, IT’S EASY
(2) Always analyze underfitting vs overfitting: IF OVERFITTING, USE REGULARIZATION
(3) Always do hyperparameter optimization: GRID SEARCH OR GAUSSIAN PROCESS
(4) Always compute multiple metrics for your models: TPR, TNR, PRECISION, ETC.
(5) Always analyze the clustering properties of the items/users: ITEM/ITEM SIMILARITIES
(6) Always ask for feedback from end users: EVERYTHING IS ABOUT USER TASTE
(1) Always compute baseline metrics
Model | Global Avg | User Avg | Item-Item | Linear | Linear + Reg | Matrix Fact | Deep Learning
Domains | Baseline | Baseline | users >> items | Known “I” | Known “I” | Unknown “I” | Extra datasets
Model Complexity | Trivial | Trivial | Simple | Linear | Linear | Linear | Non-linear
Time Complexity | + | + | +++ | ++++ | ++++ | ++++ | ++
Overfit/Underfit | Underfit | Underfit | May Underfit | May Overfit | May Perform Bad | May Overfit | Can Overfit
Hyper-Params | 0 | 0 | 0 | 1 | 2 | 2–3 | many
Implementation | Numpy | Numpy | Numpy | Numpy | Numpy | LightFM, Spark | NNet libraries
(2) Always analyze underfitting vs overfitting
Model-based
● Dropout
● Bagging
Loss-based regularization
● L1 norm: best approximation of sparsity-inducing norm
● L2 norm: very smooth, easy to optimize
Data Augmentation
● Negative Sampling
(3) Always do hyperparameter optimization
Grid Search
Brute force over all the combinations of the parameters
Exponential cost: for 20 parameters, to get only 10 evaluations each, you need 10^20 complete runs
Random Search
Uniformly sample combinations of the parameters
Very easy to implement, very useful in practice
Gaussian Process Optimization
Meta-learning of the validation error given hyper-parameters
Solve exploration/exploitation tradeoff
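Random search is short enough to sketch in full. Everything named here is illustrative: `toy_validation_error` is a hypothetical stand-in for training a model and measuring its validation error, and the two hyperparameters `reg` and `lr` with their ranges are made up.

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Sample each hyperparameter uniformly and keep the best (lowest) score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_validation_error(params):
    # Hypothetical stand-in: a real objective would train and validate a model.
    return (params["reg"] - 0.3) ** 2 + (params["lr"] - 0.1) ** 2

space = {"reg": (0.0, 1.0), "lr": (0.0, 0.5)}   # assumed search ranges
best_params, best_score = random_search(toy_validation_error, space, n_trials=200)
```

Unlike grid search, the number of runs is chosen directly and does not grow exponentially with the number of hyperparameters. For a metric to maximize, the objective can return its negative.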
(3) Always do hyperparameter optimization
[Figure: hyperparameter optimization results, one panel for a metric to minimize and one for a metric to maximize]
(4) Always compute multiple metrics for your models
(5) Always analyze the clustering properties of the items/users
Item embeddings
● In general, we combine item embeddings with: FEATURES | IMAGE EMBS | NLP EMBS
● After getting the embeddings, we always compute Top-K similarities for well-known items
● We use the item embeddings to create clusters and analyze how good they are
(6) Always ask end users for feedback
RECOMMENDATION IS ALL ABOUT USERS’ TASTE
ASK THEM FOR FEEDBACK!!
Conclusions
Losses and metrics summary table
Name Category loss eval batch-SGD support implicit Comments
MSE Regr ✓ ✓ ✓ ✓ linear gradient
MAE Regr ✓ ✓ easy to interpret
Logistic / XE / KL Classif ✓ ✓ ✓ ✓ flexible truth
Exponential Classif ✓ ✓ exploding gradient
Recall (global) Classif ✓ ✓ ✓ requires negative
Precision (global) Classif ✓ ✓ ✓ requires negative
F-measure (global) Classif ✓ ✓ ✓ requires negative
MRR Ranking ✓ considers only 1 item
nDCG Ranking ✓ requires rank
WARP Ranking ✓ for nDCG, p@k, r@k
AUC Hybrid ✓ ✓ ✓ requires negative
BPR Hybrid ✓ ✓ for AUC
Recall@k Hybrid ✓ requires ≤k positives
Precision@k Hybrid ✓ requires ≥k positives
Questions
Thank YOU!
Negative Sampling
Problem
● Unary feedback: the best model will always predict “1” for each user and item.
● In general:
○ your model is used in real life to predict (user, item) outside sparse dataset.
○ can’t train on the full (#users x #items) dense matrix.
Negative Sampling Solution
● unary→binary (e.g. click/missing) binary→ternary (e.g. like/dislike/missing)
● sample strategy matters a lot (i.e. how to split train and valid)
● how many negative samples matters a lot
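A very simple sampling strategy is to draw, for each observed positive, some items the user has not interacted with and label them as negatives. The sketch below assumes uniform sampling over unseen items and one negative per positive; as the slide stresses, both choices matter and other strategies exist.

```python
import numpy as np

def sample_negatives(positive_pairs, n_items, n_neg_per_pos=1, seed=0):
    """For each (user, item) positive, sample unseen items for that user as negatives."""
    rng = np.random.default_rng(seed)
    seen = set(positive_pairs)
    negatives = []
    for user, _ in positive_pairs:
        unseen = [i for i in range(n_items) if (user, i) not in seen]
        if not unseen:
            continue  # the user interacted with every item
        size = min(n_neg_per_pos, len(unseen))
        picks = rng.choice(unseen, size=size, replace=False)
        negatives.extend((user, int(i)) for i in picks)
    return negatives

positives = [(0, 0), (0, 1), (1, 2)]          # toy unary feedback: (user, item)
negatives = sample_negatives(positives, n_items=4)
```

The same user can receive the same negative twice across different positives in this sketch; deduplicating is a design choice left open here.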
Negative Sampling
Split negative feedback in the same proportion
Underfitting and Overfitting – Take Home
(1) For cross-validation, split the data such that almost all users are in both training and validation
(2) Use negative sampling to avoid overfitting in your models
(3) Always use learning curves to get more insights about underfitting vs overfitting
(4) Compute mean and variance of your predictions to get insights about underfitting vs overfitting
Loss Functions – Classification
Logistic
● Equivalent to cross-entropy between the truth and the predicted probability (for 2-classes model)
● Equivalent to Kullback-Leibler divergence between the truth and the predicted probability
● Often used for deep-learning based recommendation engines
● Smooth gradient around zero and steep for large errors

Weitere ähnliche Inhalte

Was ist angesagt?

[Phd Thesis Defense] CHAMELEON: A Deep Learning Meta-Architecture for News Re...
[Phd Thesis Defense] CHAMELEON: A Deep Learning Meta-Architecture for News Re...[Phd Thesis Defense] CHAMELEON: A Deep Learning Meta-Architecture for News Re...
[Phd Thesis Defense] CHAMELEON: A Deep Learning Meta-Architecture for News Re...Gabriel Moreira
 
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...Balázs Hidasi
 
Big Practical Recommendations with Alternating Least Squares
Big Practical Recommendations with Alternating Least SquaresBig Practical Recommendations with Alternating Least Squares
Big Practical Recommendations with Alternating Least SquaresData Science London
 
Introduction to Recommendation Systems
Introduction to Recommendation SystemsIntroduction to Recommendation Systems
Introduction to Recommendation SystemsTrieu Nguyen
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleXavier Amatriain
 
Matrix factorization
Matrix factorizationMatrix factorization
Matrix factorizationLuis Serrano
 
Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...
Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...
Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...Abhimanyu Lad
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsJaya Kawale
 
Customer attrition and churn modeling
Customer attrition and churn modelingCustomer attrition and churn modeling
Customer attrition and churn modelingMariya Korsakova
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated RecommendationsHarald Steck
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsJustin Basilico
 
Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011Ernesto Mislej
 
K- Nearest Neighbor Approach
K- Nearest Neighbor ApproachK- Nearest Neighbor Approach
K- Nearest Neighbor ApproachKumud Arora
 
An introduction to Recommender Systems
An introduction to Recommender SystemsAn introduction to Recommender Systems
An introduction to Recommender SystemsDavid Zibriczky
 
Collaborative Filtering using KNN
Collaborative Filtering using KNNCollaborative Filtering using KNN
Collaborative Filtering using KNNŞeyda Hatipoğlu
 
DC02. Interpretation of predictions
DC02. Interpretation of predictionsDC02. Interpretation of predictions
DC02. Interpretation of predictionsAnton Kulesh
 
Few shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learningFew shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learningﺁﺻﻒ ﻋﻠﯽ ﻣﯿﺮ
 
Recommendation system
Recommendation systemRecommendation system
Recommendation systemAkshat Thakar
 
Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Alexandros Karatzoglou
 

Was ist angesagt? (20)

[Phd Thesis Defense] CHAMELEON: A Deep Learning Meta-Architecture for News Re...
[Phd Thesis Defense] CHAMELEON: A Deep Learning Meta-Architecture for News Re...[Phd Thesis Defense] CHAMELEON: A Deep Learning Meta-Architecture for News Re...
[Phd Thesis Defense] CHAMELEON: A Deep Learning Meta-Architecture for News Re...
 
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
 
Big Practical Recommendations with Alternating Least Squares
Big Practical Recommendations with Alternating Least SquaresBig Practical Recommendations with Alternating Least Squares
Big Practical Recommendations with Alternating Least Squares
 
Introduction to Recommendation Systems
Introduction to Recommendation SystemsIntroduction to Recommendation Systems
Introduction to Recommendation Systems
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
 
Matrix factorization
Matrix factorizationMatrix factorization
Matrix factorization
 
Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...
Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...
Fast, Lenient, and Accurate – Building Personalized Instant Search Experience...
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
 
Customer attrition and churn modeling
Customer attrition and churn modelingCustomer attrition and churn modeling
Customer attrition and churn modeling
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated Recommendations
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender Systems
 
Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
K- Nearest Neighbor Approach
K- Nearest Neighbor ApproachK- Nearest Neighbor Approach
K- Nearest Neighbor Approach
 
An introduction to Recommender Systems
An introduction to Recommender SystemsAn introduction to Recommender Systems
An introduction to Recommender Systems
 
Collaborative Filtering using KNN
Collaborative Filtering using KNNCollaborative Filtering using KNN
Collaborative Filtering using KNN
 
DC02. Interpretation of predictions
DC02. Interpretation of predictionsDC02. Interpretation of predictions
DC02. Interpretation of predictions
 
Few shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learningFew shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learning
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial
 

Ähnlich wie Recommender Systems from A to Z – Model Evaluation

Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Md. Main Uddin Rony
 
Recommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetRecommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetCrossing Minds
 
Evaluation metrics for binary classification - the ultimate guide
Evaluation metrics for binary classification - the ultimate guideEvaluation metrics for binary classification - the ultimate guide
Evaluation metrics for binary classification - the ultimate guideneptune.ml
 
Yelp Dataset Challenge
Yelp Dataset ChallengeYelp Dataset Challenge
Yelp Dataset ChallengeShrijit Pillai
 
Machine Learning and Deep Learning 4 dummies
Machine Learning and Deep Learning 4 dummies Machine Learning and Deep Learning 4 dummies
Machine Learning and Deep Learning 4 dummies Dori Waldman
 
Machine learning4dummies
Machine learning4dummiesMachine learning4dummies
Machine learning4dummiesMichael Winer
 
Context-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewContext-aware Recommendation: A Quick View
Context-aware Recommendation: A Quick ViewYONG ZHENG
 
EvaluationMetrics.pptx
EvaluationMetrics.pptxEvaluationMetrics.pptx
EvaluationMetrics.pptxshuchismitjha2
 
Big & Personal: the data and the models behind Netflix recommendations by Xa...
 Big & Personal: the data and the models behind Netflix recommendations by Xa... Big & Personal: the data and the models behind Netflix recommendations by Xa...
Big & Personal: the data and the models behind Netflix recommendations by Xa...BigMine
 
Lecture 3.1_ Logistic Regression.pptx
Lecture 3.1_ Logistic Regression.pptxLecture 3.1_ Logistic Regression.pptx
Lecture 3.1_ Logistic Regression.pptxajondaree
 
Marketing Research Ppt
Marketing Research PptMarketing Research Ppt
Marketing Research PptVivek Sharma
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectiveXavier Amatriain
 
BIG2016- Lessons Learned from building real-life user-focused Big Data systems
BIG2016- Lessons Learned from building real-life user-focused Big Data systemsBIG2016- Lessons Learned from building real-life user-focused Big Data systems
BIG2016- Lessons Learned from building real-life user-focused Big Data systemsXavier Amatriain
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning SystemsXavier Amatriain
 
Gigabyte scale Amazon Product Reviews Sentiment Analysis Challenge: A scalabl...
Gigabyte scale Amazon Product Reviews Sentiment Analysis Challenge: A scalabl...Gigabyte scale Amazon Product Reviews Sentiment Analysis Challenge: A scalabl...
Gigabyte scale Amazon Product Reviews Sentiment Analysis Challenge: A scalabl...ArunkumarAkkineni1
 
How ml can improve purchase conversions
How ml can improve purchase conversionsHow ml can improve purchase conversions
How ml can improve purchase conversionsSudeep Shukla
 
Evaluation of multilabel multi class classification
Evaluation of multilabel multi class classificationEvaluation of multilabel multi class classification
Evaluation of multilabel multi class classificationSridhar Nomula
 

Ähnlich wie Recommender Systems from A to Z – Model Evaluation (20)

Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
Recommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetRecommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right Dataset
 
Mr4 ms10
Mr4 ms10Mr4 ms10
Mr4 ms10
 
Recommender Systems from A to Z – Model Evaluation

  • 2. Recommender Systems from A to Z Part 1: The Right Dataset Part 2: Model Training Part 3: Model Evaluation Part 4: Real-Time Deployment
  • 4. 1. Introduction: train/valid split, underfitting and overfitting; learning curves in recommendation engines. 2. Evaluation functions: basic metrics for recommender engines (Precision, Recall, TPR, TNR...); regression, classification, and ranking metrics. 3. Loss functions: optimization problems and loss function properties; regression, classification, and ranking losses. 4. Practical recommendations: regularization, hyperparameter optimization, embedding evaluation.
  • 6. Previous Meetup Recap: Recommendation Engine Types
      Recommendation engine: Content-based; Collaborative-filtering (Memory-based: Item-Item, User-User; Model-based: User-Item); Hybrid engine
      Model | When? | Problem definition | Solution strategies
      Content-based | Item cold start | | Least squares, deep learning
      Item-Item | n_users >> n_items | | Affinity matrix
      User-User | n_users << n_items | | KNN, affinity matrix
      User-Item | Better performance | | Matrix factorization, deep learning
  • 7. Previous Meetup Recap: Recommendation Engine Models
      | Global Avg | User Avg | Item-Item | Linear | Linear + Reg | Matrix Fact | Deep Learning
      Domains | Baseline | Baseline | users >> items | Known “I” | Known “I” | Unknown “I” | Extra datasets
      Model Complexity | Trivial | Trivial | Simple | Linear | Linear | Linear | Non-linear
      Time Complexity | + | + | +++ | ++++ | ++++ | ++++ | ++
      Overfit/Underfit | Underfit | Underfit | May Underfit | May Overfit | May Perform Bad | May Overfit | Can Overfit
      Hyper-Params | 0 | 0 | 0 | 1 | 2 | 2–3 | many
      Implementation | Numpy | Numpy | Numpy | Numpy | Numpy | LightFM, Spark | NNet libraries
  • 8. Optimization Problem – Matrix Factorization Example (or R)
  • 9. Optimization Problem – Matrix Factorization Example Optimization problem (definitions) Sparse matrix of ratings with m users and n items Dense matrix of users embeddings Dense matrix of items embeddings
  • 10. Optimization Problem – Matrix Factorization Example Optimization problem (definitions) Ratings of User #1 Embedding of User #1 Embedding of Item #1 Sparse matrix of ratings with m users and n items Dense matrix of users embeddings Dense matrix of items embeddings Ratings of User #m To Item #n
  • 11. Optimization Problem – Matrix Factorization Example Optimization problem (definitions) AVAILABLE DATASET ? ? Sparse matrix of ratings with m users and n items Dense matrix of users embeddings Dense matrix of items embeddings
  • 12. Optimization Problem – Matrix Factorization Example: Our goal is to find U and I such that the difference between each datapoint in R and the product of the corresponding user and item embeddings is minimal.
  • 13. Optimization Problem – Matrix Factorization Example: Our goal is to find U and I such that the difference between each datapoint in R and the product of the corresponding user and item embeddings is minimal. 1. What type of data do we have? 2. What properties are we looking for in our outputs? Exact rating vs like/dislike vs ranking predictions. 3. How are we going to solve the problem?
  • 14. Ask the Right Questions (1) What type of data do we have? (2) What properties are we looking for in our outputs? (3) How are we going to solve the problem? (4) Which hyper-parameters of my model are the best? (5) Which model is the best? Labels: Business decisions; Technical decisions
  • 15. Ask the Right Questions (1) What type of data do we have? (2) What properties are we looking for in our outputs? (3) How are we going to solve the problem? (4) Which hyper-parameters of my model are the best? (5) Which model is the best? Annotations: EVALUATION FUNCTIONS; LOSS FUNCTIONS; RANDOM SEARCH, GP; COMPARE METRICS; ML FOR RECOMMENDATION. Labels: Business decisions; Technical decisions
  • 16. Objective Types (from the data point of view). Classification ● click/no-click ● like/dislike/missing ● estimated probability of like (e.g. watch time) Regression ● absolute rating (e.g. from 1/5 to 5/5) ● number of interactions Ranking ● estimated order of preference (e.g. watch time) ● pairwise comparisons Unsupervised ● clustering of items ● clustering of users
  • 17. Choosing the Right Objective (from the business point of view). Absolute Predictions vs Relative Predictions: Does only the order of the predictions matter? Sensitivity vs Specificity: Is a false positive worse than a false negative? Skewness: Is misclassifying an all-star favorite worse than misclassifying a casual like?
  • 18. Choosing the Right Objective (from the business point of view). Absolute Predictions vs Relative Predictions: Does only the order of the predictions matter? Sensitivity vs Specificity: Is a false positive worse than a false negative? Skewness: Is misclassifying an all-star favorite worse than misclassifying a casual like? Annotations: LOSS FUNCTION THAT PENALIZES ERRORS IN ALL-STAR RATINGS MORE; RANKING LOSS FUNCTION; CLASSIFICATION LOSS FUNCTION
  • 19. Cross Validation – In Traditional Machine Learning [Figure: four data blocks rotated across four rounds: 1 2 3 4; 4 1 2 3; 3 4 1 2; 2 3 4 1]
  • 20. Cross Validation – In Recommendation Engines Dataset
  • 21. Cross Validation – In Recommendation Engines: Split the dataset such that every user is present in both train and valid. Stronger: split such that every user has an 80/20 train/valid split.
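The per-user split described on this slide can be sketched as follows. This is a minimal illustration, not the presenters' code: the function name `split_per_user`, the `(user, item, rating)` tuple layout, and the rule of holding out at least one interaction per user are assumptions made for the example.

```python
import random
from collections import defaultdict

def split_per_user(interactions, valid_fraction=0.2, seed=0):
    """Split (user, item, rating) tuples so each user with >= 2 rows is in both sets."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)

    train, valid = [], []
    for user in sorted(by_user):
        rows = by_user[user]
        if len(rows) < 2:
            # A single interaction cannot be split; keep it for training.
            train.extend(rows)
            continue
        rng.shuffle(rows)
        # Hold out roughly valid_fraction of the rows, at least one, never all.
        n_valid = min(len(rows) - 1, max(1, round(len(rows) * valid_fraction)))
        valid.extend(rows[:n_valid])
        train.extend(rows[n_valid:])
    return train, valid

# Toy data (hypothetical): 3 users with 10 interactions each.
interactions = [(u, i, 1.0) for u in range(3) for i in range(10)]
train, valid = split_per_user(interactions)
```

Users with a single interaction stay in the training set here, which is why the take-home slide speaks of "almost all" users being in both sets.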
  • 23. Underfitting and Overfitting [Figure: three panels of increasing model complexity, each shown with new samples: "Model fails to learn relations in data"; "Model is a good fit for the data"; "Model fails to generalize"]
  • 25. Underfitting and Overfitting [Figure labels: Validation Sample; + Complex; Underfitting; Overfitting]
  • 26. Underfitting and Overfitting: Mini-Batch Gradient Descent [Figure: loss function or metric vs epoch]
      for each epoch in 1..n_epochs:
        ● shuffle the batches
        ● for each batch:
          ○ compute the predictions for the batch
          ○ compute the error for the batch
          ○ compute the gradient for the batch
          ○ update the parameters of the model
        ● plot error vs epoch
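The loop on this slide can be made concrete for the matrix factorization example. The sketch below assumes a squared-error loss, synthetic low-rank ratings, and hand-picked values for `lr`, `batch_size`, and the embedding size; none of these come from the slides. Instead of plotting, it stores the per-epoch training error in `history`, which is the series a learning curve would show.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 20, 15, 4

# Synthetic low-rank ratings observed on random (user, item) pairs (assumption).
true_U = rng.normal(size=(n_users, dim))
true_I = rng.normal(size=(n_items, dim))
users = rng.integers(0, n_users, size=200)
items = rng.integers(0, n_items, size=200)
ratings = np.sum(true_U[users] * true_I[items], axis=1)

# Model parameters: small random user and item embeddings.
U = 0.1 * rng.normal(size=(n_users, dim))
I = 0.1 * rng.normal(size=(n_items, dim))
lr, n_epochs, batch_size = 0.02, 100, 20
history = []

for epoch in range(n_epochs):
    order = rng.permutation(len(ratings))           # shuffle the samples
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        u, i, r = users[idx], items[idx], ratings[idx]
        pred = np.sum(U[u] * I[i], axis=1)           # predictions for the batch
        err = pred - r                               # error for the batch
        grad_U = err[:, None] * I[i]                 # gradient of 0.5 * err**2
        grad_I = err[:, None] * U[u]
        np.add.at(U, u, -lr * grad_U)                # update; sums repeated indices
        np.add.at(I, i, -lr * grad_I)
    full_pred = np.sum(U[users] * I[items], axis=1)
    history.append(float(np.mean((full_pred - ratings) ** 2)))  # error vs epoch
```

Tracking the same quantity on a held-out validation set alongside `history` is what separates underfitting (both curves stay high) from overfitting (training error falls while validation error rises).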
  • 27. Underfitting and Overfitting: A very simple way of checking for underfitting is to plot the model predictions (predicted Y) against the ground truth Y. If the model always predicts the same value, it is underfitting.
  • 29. What do we want to evaluate? Some common evaluation functions. Classification ● True Positive Rate (TPR) ● True Negative Rate (TNR) ● Precision ● F-measure Regression ● Mean Square Error (MSE) Ranking ● Recall@K ● Precision@K ● CG, DCG, nDCG Ranking/Classification metrics ● AUC
  • 30. Regression. Mean Square Error (MSE) ● Easy to compute ● Linear gradient ● Can also be used as a loss function Mean Absolute Error (MAE) ● Easy to compute ● Easy to interpret ● Discontinuous gradient (not differentiable at zero error) ● Less convenient as a loss function for gradient-based training
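Both regression metrics are one-liners in NumPy. The ratings below are made-up values used only to show how the two metrics react to the same errors.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error: average of squared differences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

# Hypothetical ratings on a 1-5 scale; errors are 1, 0, -2, 0.
y_true = [5.0, 3.0, 1.0, 4.0]
y_pred = [4.0, 3.0, 3.0, 4.0]
print(mse(y_true, y_pred))  # 1.25
print(mae(y_true, y_pred))  # 0.75
```

The single error of size 2 contributes 4 to the MSE sum but only 2 to the MAE sum, which is why MSE is more sensitive to large mistakes.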
  • 31. Classification – Precision vs Recall. Legend: TS = Toy Story, KP = Kung Fu Panda, TD = How to Train Your Dragon, A = Annabelle. [Figure: for Model 1 and Model 2, the items TS1 TS2 TS3 KP1 KP2 TS4 KP3 A1 A2 are marked as user's likes, user's dislikes, and model recommendations]
  • 32. Classification – Precision vs Recall. Legend: TS = Toy Story, KP = Kung Fu Panda, TD = How to Train Your Dragon, A = Annabelle. [Figure: for Model 1 and Model 2, the items TS1 TS2 TS3 KP1 KP2 TS4 KP3 A1 A2 are marked as user's likes, user's dislikes, and model recommendations] Model 1: Recall = 5/7, Precision = 5/5 = 1. Model 2: Recall = 7/7 = 1, Precision = 7/9.
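The numbers on this slide can be reproduced with plain set arithmetic. The transcript does not say which five items Model 1 recommends, so the set `model_1` below is an assumption chosen to match the stated scores; the likes and dislikes follow the slide (seven liked Toy Story and Kung Fu Panda titles, two disliked Annabelle titles).

```python
def precision_recall(recommended, liked):
    """Precision = hits / recommended items, recall = hits / liked items."""
    recommended, liked = set(recommended), set(liked)
    hits = len(recommended & liked)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(liked) if liked else 0.0
    return precision, recall

likes = {"TS1", "TS2", "TS3", "TS4", "KP1", "KP2", "KP3"}
dislikes = {"A1", "A2"}

model_1 = {"TS1", "TS2", "TS3", "KP1", "KP2"}  # assumed: 5 liked items only
model_2 = likes | dislikes                      # recommends all 9 items

p1, r1 = precision_recall(model_1, likes)  # precision 5/5, recall 5/7
p2, r2 = precision_recall(model_2, likes)  # precision 7/9, recall 7/7
```

Model 1 is conservative (everything it recommends is liked, but it misses two likes), while Model 2 recovers every like at the cost of recommending both disliked items.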
  • 33. Classification 1/2 True Positive Rate (a.k.a TPR, Recall, Sensitivity) ● Easy to understand ● Useful for likes/dislikes datasets ● Measure of global bias of a model ● 0 <= TPR <=1 (higher is better) True Negative Rate (a.k.a TNR, Selectivity, Specificity) ● Easy to understand ● Useful for likes/dislikes datasets ● Measure of global bias of a model ● 0 <= TNR <=1 (higher is better)
  • 34. Classification 2/2. Precision ● Easy to understand ● Useful for likes/dislikes datasets ● Measures the quality of the recommendations ● 0 <= Precision <= 1 (higher is better) F-measure ● Balances precision and recall ● Not good for recommendation, because it doesn't take True Negatives into account ● 0 <= F-measure <= 1 (higher is better)
  • 35. Ranking 1/3: Recall@K ● Count the positive items among the top K items predicted for each user ● Divide that number by the number of positive items for each user ● A perfect score is 1 if the user has K or fewer positive items and they all appear in the predicted top K ● Independent of the exact values of the predictions; only their relative rank matters. Example (movie: true rating, prediction): Toy Story 1: 1.0, 0.9; Toy Story 2: 0.9, 0.7; Kung Fu Panda 1: 0.7, 0.1; Kung Fu Panda 2: 0.6, -0.1; Annabelle 1: -0.2, 0.4. K = 3. Top K = ? Top-K positives = ? Total positives = ? Recall@K = ?
  • 36. Ranking 1/3: Recall@K ● Count the positive items among the top K items predicted for each user ● Divide that number by the number of positive items for each user ● A perfect score is 1 if the user has K or fewer positive items and they all appear in the predicted top K ● Independent of the exact values of the predictions; only their relative rank matters. Example (movie: true rating, prediction): Toy Story 1: 1.0, 0.9; Toy Story 2: 0.9, 0.7; Kung Fu Panda 1: 0.7, 0.1; Kung Fu Panda 2: 0.6, -0.1; Annabelle 1: -0.2, 0.4. K = 3. Top K = {TS1, TS2, A1}. Top-K positives = {TS1, TS2} = 2. Total positives = 4. Recall@K = 2 / 4.
  • 38. Ranking 2/3: Precision@K ● Count the positive items among the top K items predicted for each user ● Divide that number by K for each user ● A perfect score is 1 if the user has K or more positive items and the top K only contains positives ● Independent of the exact values of the predictions; only their relative rank matters. Example (movie: true rating, prediction): Toy Story 1: 1.0, 0.9; Toy Story 2: 0.9, 0.7; Kung Fu Panda 1: 0.7, 0.1; Kung Fu Panda 2: 0.6, -0.1; Annabelle 1: -0.2, 0.4. K = 3. Top K = ? Top-K positives = ? Precision@K = ?
  • 39. Ranking 2/3: Precision@K ● Count the positive items among the top K items predicted for each user ● Divide that number by K for each user ● A perfect score is 1 if the user has K or more positive items and the top K only contains positives ● Independent of the exact values of the predictions; only their relative rank matters. Example (movie: true rating, prediction): Toy Story 1: 1.0, 0.9; Toy Story 2: 0.9, 0.7; Kung Fu Panda 1: 0.7, 0.1; Kung Fu Panda 2: 0.6, -0.1; Annabelle 1: -0.2, 0.4. K = 3. Top K = {TS1, TS2, A1}. Top-K positives = {TS1, TS2} = 2. Precision@K = 2 / 3.
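The worked example for the top-K metrics can be checked with a few lines of Python. The sketch assumes that an item counts as positive when its true rating is above 0, which matches the four positives counted on the slides; the function and variable names are illustrative.

```python
def top_k(predictions, k):
    """Items sorted by predicted score, highest first, truncated to k."""
    return sorted(predictions, key=predictions.get, reverse=True)[:k]

def precision_recall_at_k(true_ratings, predictions, k):
    """Return (Precision@K, Recall@K) for a single user."""
    positives = {item for item, r in true_ratings.items() if r > 0}  # assumed threshold
    hits = sum(1 for item in top_k(predictions, k) if item in positives)
    recall = hits / len(positives) if positives else 0.0
    return hits / k, recall

true_ratings = {"TS1": 1.0, "TS2": 0.9, "KP1": 0.7, "KP2": 0.6, "A1": -0.2}
predictions = {"TS1": 0.9, "TS2": 0.7, "KP1": 0.1, "KP2": -0.1, "A1": 0.4}

print(top_k(predictions, 3))  # ['TS1', 'TS2', 'A1']
precision_at_3, recall_at_3 = precision_recall_at_k(true_ratings, predictions, 3)
# precision_at_3 = 2/3, recall_at_3 = 2/4
```

In practice both values are computed per user and then averaged over users.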
  • 41. Ranking 3/3: CG, DCG, and nDCG ● CG: Sum the true ratings of the top K items predicted for each user ● DCG: Weight by position in the top K; nDCG: Normalize to [0, 1] ● A perfect nDCG is 1 if the ranking of the predictions is the same as the ranking of the true ratings ● The bigger the score the better. Example (movie: true rating, prediction): Toy Story 1: 1.0, 0.9; Toy Story 2: 0.9, 0.7; Kung Fu Panda 1: 0.7, 0.1; Kung Fu Panda 2: 0.6, -0.1; Annabelle 1: -0.2, 0.4. K = 3. Top K = ? CG = ? DCG = ?
  • 42. Ranking 3/3: CG, DCG, and nDCG ● CG: Sum the true ratings of the top K items predicted for each user ● DCG: Weight by position in the top K; nDCG: Normalize to [0, 1] ● A perfect nDCG is 1 if the ranking of the predictions is the same as the ranking of the true ratings ● The bigger the score the better. Example (movie: true rating, prediction): Toy Story 1: 1.0, 0.9; Toy Story 2: 0.9, 0.7; Kung Fu Panda 1: 0.7, 0.1; Kung Fu Panda 2: 0.6, -0.1; Annabelle 1: -0.2, 0.4. K = 3. Top K = {TS1, TS2, A1}. CG = 1.0 + 0.9 - 0.2. DCG = 1/1 + 0.9/2 - 0.2/3.
  • 43. Ranking 3/3: CG, DCG, and nDCG ● CG: Sum the true ratings of the top K items predicted for each user ● DCG: Weight by position in the top K; nDCG: Normalize to [0, 1] ● A perfect nDCG is 1 if the ranking of the predictions is the same as the ranking of the true ratings
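The CG and DCG values from the worked example can be reproduced as below. The sketch follows the slide's 1/position discount (many references use a 1/log2(position + 1) discount instead) and normalizes by the DCG of the ideal ordering, which is the usual definition of nDCG; the function names are illustrative.

```python
def ranked(scores):
    """Item names sorted by score, highest first."""
    return sorted(scores, key=scores.get, reverse=True)

def cg(true_ratings, ranking, k):
    """Cumulative gain: sum of true ratings over the top k of the ranking."""
    return sum(true_ratings[item] for item in ranking[:k])

def dcg(true_ratings, ranking, k):
    """Discounted cumulative gain with the slide's 1/position weights."""
    return sum(true_ratings[item] / pos
               for pos, item in enumerate(ranking[:k], start=1))

def ndcg(true_ratings, predictions, k):
    """DCG of the predicted ranking divided by DCG of the ideal ranking."""
    ideal = dcg(true_ratings, ranked(true_ratings), k)
    return dcg(true_ratings, ranked(predictions), k) / ideal

true_ratings = {"TS1": 1.0, "TS2": 0.9, "KP1": 0.7, "KP2": 0.6, "A1": -0.2}
predictions = {"TS1": 0.9, "TS2": 0.7, "KP1": 0.1, "KP2": -0.1, "A1": 0.4}

cg_3 = cg(true_ratings, ranked(predictions), 3)     # 1.0 + 0.9 - 0.2
dcg_3 = dcg(true_ratings, ranked(predictions), 3)   # 1/1 + 0.9/2 - 0.2/3
ndcg_3 = ndcg(true_ratings, predictions, 3)
```

With negative true ratings, as for Annabelle 1, the ratio is not guaranteed to stay inside [0, 1] in general; it does for this example.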
  • 44. Hybrid Ranking/Classification AUC ● Vary positive prediction threshold (not just 0) ● Compute TPR and FPR for all possible positive thresholds ● Build Receiver Operating Characteristic (ROC) curve ● Integrate Area Under the ROC Curve (AUC)
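The area under the ROC curve equals the probability that a randomly chosen positive item is scored above a randomly chosen negative one, so for small examples it can be computed by comparing all (positive, negative) pairs instead of sweeping thresholds. The sketch below uses that equivalence on the movie example from the ranking slides, with true rating > 0 taken as the positive label (an assumption).

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# TS1, TS2, KP1, KP2 are positives; A1 is the only negative.
labels = [1, 1, 1, 1, 0]
scores = [0.9, 0.7, 0.1, -0.1, 0.4]
print(auc(labels, scores))  # 0.5: only TS1 and TS2 are scored above A1
```

The pairwise form costs O(#positives x #negatives), so for large datasets a sort-based or library implementation is the practical choice.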
  • 46. Loss Functions vs Evaluation Functions. Evaluation Metrics ● Expensive to evaluate ● Often not smooth ● Often not even differentiable Loss Functions ● Smooth approximations of your evaluation metric ● Well suited for SGD
  • 47. Loss Functions: How are we going to solve the problem? Some common loss functions. Classification loss ● Logistic ● Cross Entropy ● Kullback-Leibler Divergence Regression loss ● Mean Square Error (MSE) Ranking loss ● WARP ● BPR
  • 48. Optimization Problems – Basic Formulation with RMSE. Goal: find U and I s.t. the difference between each datapoint in R and the product of the corresponding user and item embeddings is minimal.
  • 49. Optimization Problems – General Formulation Goal: find U and I s.t. the loss function J is minimized. (or R)
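The formulas on these two slides did not survive in the transcript. One common way to write them is sketched below; the notation is an assumption: \Omega denotes the set of observed (user, item) pairs in R, and U_u and I_i are the embedding rows of user u and item i.

```latex
% Basic formulation: minimize the squared error over the observed ratings
\min_{U,\, I} \; \sum_{(u,i) \in \Omega} \left( R_{ui} - U_u I_i^{\top} \right)^2

% General formulation: replace the squared error with any loss function J
\min_{U,\, I} \; \sum_{(u,i) \in \Omega} J\!\left( R_{ui},\; U_u I_i^{\top} \right)
```

Minimizing the sum of squared errors also minimizes the RMSE, since the RMSE is a monotonic function of that sum.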
  • 50. Convex vs Non-Convex Optimization Convex Non-convex
  • 53. Loss Functions – Regression. Mean Square Error ● Typically used loss function for regression. It's a smooth function. It's easy to understand. Regularized Mean Square Error ● Mean square error plus regularization to avoid overfitting.
  • 54. Loss Functions – Classification. Logistic ● Typically used loss function for classification. Smooth gradient around zero and steep for large errors.
  • 55. Loss Functions – Classification Logistic
  • 56. Loss Functions – Ranking Weighted Approximate-Rank Pairwise (WARP) ● Approximates DCG-like evaluation metrics ● Smooth and tractable computation Bayesian Personalised Ranking (BPR) ● Approximates AUC ● Smooth and tractable computation ● Requires binary comparisons (good for binary comparison feedback)
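As an illustration of a pairwise ranking loss, the sketch below implements the core BPR term, -log(sigmoid(s_pos - s_neg)), averaged over (positive, negative) score pairs. It leaves out the regularization term and the sampling of pairs, and it does not cover WARP; the function name and the example scores are assumptions.

```python
import numpy as np

def bpr_loss(pos_scores, neg_scores):
    """Mean of -log(sigmoid(pos - neg)) over paired scores."""
    diff = np.asarray(pos_scores, dtype=float) - np.asarray(neg_scores, dtype=float)
    # -log(sigmoid(d)) == log(1 + exp(-d)); logaddexp keeps it numerically stable.
    return float(np.mean(np.logaddexp(0.0, -diff)))

# The loss is small when positives are scored above negatives, large otherwise.
good = bpr_loss([2.0, 1.5], [0.0, -1.0])
bad = bpr_loss([0.0, -1.0], [2.0, 1.5])
```

Each pair contributes a smooth penalty for ranking the negative item above the positive one, which is why the loss acts as a differentiable surrogate for AUC.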
  • 58. Practical Recommendations (1) Always compute baseline metrics (2) Always analyze underfitting vs overfitting (3) Always do hyperparameter optimization (4) Always compute multiple metrics for your models (5) Always analyze the clustering properties of the items/users (6) Always ask end users for feedback
  • 59. Practical Recommendations (1) Always compute baseline metrics (2) Always analyze underfitting vs overfitting (3) Always do hyperparameter optimization (4) Always compute multiple metrics for your models (5) Always analyze the clustering properties of the items/users (6) Always ask end users for feedback. Annotations: COMPARE WITH GLOBAL MODELS; IT’S EASY; IF OVERFITTING, USE REGULARIZATION; GRID SEARCH OR GAUSSIAN PROCESS; TPR, TNR, PRECISION, ETC.; ITEM/ITEM SIMILARITIES; EVERYTHING IS ABOUT USER TASTE
  • 60. (1) Always compute baseline metrics
      | Global Avg | User Avg | Item-Item | Linear | Linear + Reg | Matrix Fact | Deep Learning
      Domains | Baseline | Baseline | users >> items | Known “I” | Known “I” | Unknown “I” | Extra datasets
      Model Complexity | Trivial | Trivial | Simple | Linear | Linear | Linear | Non-linear
      Time Complexity | + | + | +++ | ++++ | ++++ | ++++ | ++
      Overfit/Underfit | Underfit | Underfit | May Underfit | May Overfit | May Perform Bad | May Overfit | Can Overfit
      Hyper-Params | 0 | 0 | 0 | 1 | 2 | 2–3 | many
      Implementation | Numpy | Numpy | Numpy | Numpy | Numpy | LightFM, Spark | NNet libraries
  • 61. (2) Always analyze underfitting vs overfitting. Model-based ● Dropout ● Bagging Loss-based regularization ● L1 norm: best approximation of a sparsity-inducing norm ● L2 norm: very smooth, easy to optimize Data Augmentation ● Negative Sampling
  • 62. (3) Always do hyperparameter optimization. Grid Search: brute force over all the combinations of the parameters. Exponential cost: with 20 parameters and only 10 candidate values each, you need 10^20 complete runs. Random Search: uniformly sample combinations of the parameters. Very easy to implement, very useful in practice. Gaussian Process Optimization: meta-learning of the validation error given the hyper-parameters. Solves the exploration/exploitation tradeoff.
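Random search needs only a sampler and a loop. In the sketch below, `validation_error` is a hypothetical stand-in for training a model and measuring its validation metric, and the search ranges are invented for illustration. Learning rate and regularization are sampled log-uniformly, a common choice when a parameter spans several orders of magnitude.

```python
import math
import random

def validation_error(lr, reg, dim):
    """Hypothetical stand-in for 'train the model, return the validation error'."""
    return ((math.log10(lr) + 2) ** 2
            + (math.log10(reg) + 3) ** 2
            + (dim - 32) ** 2 / 1000)

def random_search(n_trials, seed=0):
    """Sample hyper-parameter combinations at random and keep the best one."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "lr": 10 ** rng.uniform(-4, 0),            # log-uniform in [1e-4, 1]
            "reg": 10 ** rng.uniform(-6, 0),           # log-uniform in [1e-6, 1]
            "dim": rng.choice([8, 16, 32, 64, 128]),   # embedding size
        }
        error = validation_error(**params)
        if best is None or error < best[0]:
            best = (error, params)
    return best

best_error, best_params = random_search(n_trials=50)
```

Unlike grid search, the number of trials is set directly and does not grow with the number of hyper-parameters.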
  • 63. (3) Always do hyperparameter optimization Metric to minimize Metric to maximize
  • 64. (4) Always compute multiple metrics for your models
  • 65. (5) Always analyze the clustering properties of the items/users. Item embeddings ● In general, we combine item embeddings with: FEATURES | IMAGE EMBS | NLP EMBS ● After getting the embeddings, we always compute Top-K similarities for well-known items ● We use the item embeddings to create clusters and analyze how good they are
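A Top-K similarity check on item embeddings can be done with cosine similarity. The embeddings below are made-up 2-dimensional vectors used only to show the mechanics; a real check would use the learned item matrix and a handful of well-known items.

```python
import numpy as np

def top_k_similar(embeddings, item_index, k):
    """Indices of the k items most cosine-similar to the given item."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[item_index]
    sims[item_index] = -np.inf          # exclude the query item itself
    return np.argsort(-sims)[:k].tolist()

# Hypothetical embeddings: the two Toy Story items point in similar directions.
names = ["Toy Story 1", "Toy Story 2", "Kung Fu Panda 1", "Annabelle 1"]
embeddings = np.array([[1.0, 0.1],
                       [0.9, 0.2],
                       [0.7, 0.6],
                       [-0.8, 0.5]])

neighbours = [names[j] for j in top_k_similar(embeddings, 0, k=2)]
print(neighbours)  # ['Toy Story 2', 'Kung Fu Panda 1']
```

If the neighbours of a well-known item look unrelated to a human reviewer, that is a warning sign even when the offline metrics look good.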
  • 66. (6) Always ask end users for feedback. RECOMMENDATION IS ALL ABOUT USER TASTE. ASK THEM FOR FEEDBACK!!
  • 68. Losses and metrics summary table
      Columns: Name, Category, loss, eval, batch-SGD, support implicit, Comments
      MSE, Regr, ✓ ✓ ✓ ✓, linear gradient
      MAE, Regr, ✓ ✓, easy to interpret
      Logistic / XE / KL, Classif, ✓ ✓ ✓ ✓, flexible truth
      Exponential, Classif, ✓ ✓, exploding gradient
      Recall (global), Classif, ✓ ✓ ✓, requires negative
      Precision (global), Classif, ✓ ✓ ✓, requires negative
      F-measure (global), Classif, ✓ ✓ ✓, requires negative
      MRR, Ranking, ✓, considers only 1 item
      nDCG, Ranking, ✓, requires rank
      WARP, Ranking, ✓, for nDCG, p@k, r@k
      AUC, Hybrid, ✓ ✓ ✓, requires negative
      BPR, Hybrid, ✓ ✓, for AUC
      Recall@k, Hybrid, ✓, requires ≤k positives
      Precision@k, Hybrid, ✓, requires ≥k positives
  • 71. Negative Sampling. Problem ● Unary feedback: the best model will always predict “1” for each user and item. ● In general: ○ your model is used in real life to predict (user, item) pairs outside the sparse dataset ○ you can’t train on the full (#users x #items) dense matrix. Solution ● unary→binary (e.g. click/missing), binary→ternary (e.g. like/dislike/missing) ● the sampling strategy matters a lot (i.e. how to split train and valid) ● the number of negative samples matters a lot
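One simple way to turn unary feedback into binary feedback is to draw, for every observed interaction, a few items the user has not interacted with and label them 0. The sketch below uses uniform sampling over unseen items; the function name, the `n_neg_per_pos` parameter, and the choice of a uniform sampler are assumptions, and, as the slide stresses, other strategies can behave very differently.

```python
import random

def sample_negatives(positives, n_items, n_neg_per_pos, seed=0):
    """Return (user, item, label) triples: observed pairs get 1, sampled unseen pairs get 0."""
    rng = random.Random(seed)
    seen = {}
    for user, item in positives:
        seen.setdefault(user, set()).add(item)

    samples = [(user, item, 1) for user, item in positives]
    for user, _ in positives:
        # Candidate negatives: items with no recorded interaction for this user.
        candidates = [j for j in range(n_items) if j not in seen[user]]
        k = min(n_neg_per_pos, len(candidates))
        samples.extend((user, j, 0) for j in rng.sample(candidates, k))
    return samples

# Toy unary feedback (hypothetical): (user, item) click pairs over 5 items.
positives = [(0, 1), (0, 2), (1, 0)]
samples = sample_negatives(positives, n_items=5, n_neg_per_pos=2)
```

A missing interaction is not necessarily a dislike, so the sampled zeros are noisy labels. The same negative may be drawn more than once for a user, once per positive it is paired with.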
  • 73. Negative Sampling Split negative feedback in the same proportion
  • 74. Underfitting and Overfitting – Take Home (1) For cross-validation, split the data such that almost all users are in both training and validation (2) Use negative sampling to avoid overfitting in your models (3) Always use learning curves to get more insights about underfitting vs overfitting (4) Compute the mean and variance of your predictions to get insights about underfitting vs overfitting
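Point (4) can be turned into a quick diagnostic: compare the spread of the predictions with the spread of the ground truth. The helper below is an illustrative sketch; the name `prediction_spread` and the idea of reporting a variance ratio are assumptions, not something the slides prescribe.

```python
import numpy as np

def prediction_spread(y_true, y_pred):
    """Summarize prediction mean/variance relative to the ground truth."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    true_var = float(np.var(y_true))
    pred_var = float(np.var(y_pred))
    return {
        "pred_mean": float(np.mean(y_pred)),
        "pred_var": pred_var,
        "true_var": true_var,
        # A ratio near 0 means the model predicts almost the same value everywhere.
        "var_ratio": pred_var / true_var if true_var > 0 else float("nan"),
    }

# Hypothetical ratings and a model that always predicts the global average.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
constant_model = [3.0] * 5
report = prediction_spread(y_true, constant_model)
```

A constant predictor gives a variance ratio of 0, the underfitting signature described on the earlier slide about predictions that are always the same.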
  • 75. Loss Functions – Classification: Logistic ● Equivalent to cross-entropy between the truth and the predicted probability (for a 2-class model) ● Equivalent to Kullback-Leibler divergence between the truth and the predicted probability ● Often used for deep-learning based recommendation engines ● Smooth gradient around zero and steep for large errors
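The stated equivalence between the logistic loss and binary cross-entropy can be checked numerically. The sketch assumes the usual conventions: labels in {-1, +1} for the logistic form log(1 + exp(-y * s)), labels in {0, 1} for cross-entropy, and a sigmoid to map a score s to a probability. The scores and labels are made-up values.

```python
import numpy as np

def logistic_loss(y_pm, score):
    """log(1 + exp(-y * s)) for labels y in {-1, +1}."""
    return np.logaddexp(0.0, -y_pm * score)

def binary_cross_entropy(y01, prob):
    """-(y log p + (1 - y) log(1 - p)) for labels y in {0, 1}."""
    return -(y01 * np.log(prob) + (1.0 - y01) * np.log(1.0 - prob))

scores = np.array([-2.0, -0.5, 0.0, 1.5])   # raw model scores (hypothetical)
y01 = np.array([0.0, 1.0, 1.0, 1.0])        # like = 1, dislike = 0
probs = 1.0 / (1.0 + np.exp(-scores))       # sigmoid turns scores into probabilities

same = np.allclose(logistic_loss(2.0 * y01 - 1.0, scores),
                   binary_cross_entropy(y01, probs))
```

With hard 0/1 labels the entropy of the truth is zero, so the KL divergence between truth and prediction reduces to the same cross-entropy value.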