Recommender Systems from A to Z
Part 1: The Right Dataset
Part 2: Model Training
Part 3: Model Evaluation
Part 4: Real-Time Deployment
1. Introduction
Optimization problem, linear regression and Stochastic Gradient Descent (SGD)
2. Baseline models
Global average, user average and item-item models
3. Basic linear models
Least Squares (LS)
Regularized Least Squares (RLS)
4. Matrix factorization
Matrix Factorization, analytical solution and numerical solution
5. Non-linear models
Basic and complex Deep Learning models
Introduction
Model training – Introduction
Explicit vs Implicit feedback
Explicit feedback
(users’ ratings)
Implicit feedback
(users’ clicks)
 | Explicit feedback | Implicit feedback
Example domains | Movies, TV shows, music | Marketplaces, businesses
Example data types | Like/dislike, stars | Clicks, play time, purchases
Complexity | Clean, costly, easy to interpret | Dirty, cheap, difficult to interpret
Model training – Introduction
Recommendation engine types
Recommendation engine
  Content-based
  Collaborative-filtering
    Memory-based
      Item-Item
      User-User
    Model-based
      User-Item
  Hybrid engine
Model | When? | Solution strategies
Content-based | Item cold start | Least Squares, Deep Learning
Item-Item | n_users >> n_items | Affinity Matrix
User-User | n_users << n_items | KNN, Affinity Matrix
User-Item | Better performance | Matrix Factorization, Deep Learning
Model training – Introduction - Optimization
Optimization problem (definitions)
● R: sparse matrix of ratings with m users and n items (the available dataset)
● U: dense matrix of users embeddings (unknown)
● I: dense matrix of items embeddings (unknown)
● The rows of R go from the ratings of User #1 to the ratings of User #m; the last entry is the rating of User #m to Item #n
● Each user and each item has its own embedding (e.g. embedding of User #1, embedding of Item #1)
Model training – Introduction - Optimization
Optimization problem (basic formulation with RMSE)
Our goal is to find U and I such that the difference between each data point in R and the product of the corresponding user and item embeddings is minimal.
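The objective can be sketched in a few lines of NumPy. This is only an illustration under assumptions that the slides do not state: U is stored as an (m, k) array, I as a (k, n) array, and a boolean `mask` (a hypothetical name) marks which entries of R were actually observed.

```python
import numpy as np

def masked_rmse(R, mask, U, I):
    """RMSE between the observed ratings in R and the predictions U @ I."""
    pred = U @ I                       # (m, n) matrix of predicted ratings
    err = (R - pred)[mask]             # keep only the observed entries
    return np.sqrt(np.mean(err ** 2))

rng = np.random.default_rng(0)
m, n, k = 4, 5, 2                      # users, items, embedding size (toy values)
U_true = rng.normal(size=(m, k))
I_true = rng.normal(size=(k, n))
R = U_true @ I_true                    # synthetic ratings built from known factors
mask = rng.random((m, n)) < 0.6        # pretend only part of the ratings is observed
mask[0, 0] = True                      # make sure at least one rating is observed

print(masked_rmse(R, mask, U_true, I_true))             # the true factors fit exactly
print(masked_rmse(R, mask, np.zeros((m, k)), I_true))   # all-zero users fit badly
```

Training then amounts to searching for the U and I that make this number as small as possible.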
Model training – Introduction - Optimization
Optimization problem (more complex formulation)
Content-based
Content-based with Regularization
● The data term uses the available data; the regularization term is there to avoid overfitting
Take home
● In content-based models we already know I (items features)
● We can find a linear solution to this problem using Least Squares
Model training – Introduction - Optimization
Optimization problem (more complex formulation)
Collaborative-filtering
Collaborative-filtering with Regularization
● The data term uses the available data; the regularization term is there to avoid overfitting
Take home
● In collaborative-filtering we want to find U and I (users and items embeddings)
● We can find a linear solution to this problem using Matrix Factorization and SGD
Model training – Introduction - Optimization
How to analytically solve an optimization problem?
Let’s start with a simple optimization problem: linear regression without regularization.
With m > n, we want to find W such that the squared error between the predictions and the targets is minimal.
Add column of ones
to support w0
Scalar numbers
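A minimal sketch of the analytical solution, assuming the usual setup of m samples and n features with an input matrix `X` and scalar targets `y` (both names are assumptions, the slide formulas did not survive extraction). The weights come from the normal equations, solved without forming an inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3                                   # m samples, n features, m > n
X = rng.normal(size=(m, n))
w_true = np.array([2.0, -1.0, 0.5])
w0_true = 4.0                                  # intercept
y = X @ w_true + w0_true                       # noise-free scalar targets

X1 = np.hstack([np.ones((m, 1)), X])           # add a column of ones to support w0
# Normal equations: (X1^T X1) W = X1^T y
W = np.linalg.solve(X1.T @ X1, X1.T @ y)
print(W)                                       # intercept first, then the n weights
```

Because the targets are noise-free here, the recovered W matches the intercept and weights used to generate the data.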
Model training – Introduction - Optimization
How to numerically solve an optimization problem?
Gradient descent: start with random values for W and move in the opposite direction of the gradient of the cost J(w).
Stochastic gradient descent: estimate the gradient by taking just one sample.
Model training – Introduction - Optimization
Gradient Descent algorithm
for epoch in n_epochs:
● compute the predictions for all the samples
● compute the error between truth and predictions
● compute the gradient using all the samples
● update the parameters of the model
Stochastic Gradient Descent algorithm
for epoch in n_epochs:
● shuffle the samples
● for sample in n_samples:
○ compute the predictions for the sample
○ compute the error between truth and predictions
○ compute the gradient using the sample
○ update the parameters of the model
Mini-Batch Gradient Descent algorithm
for epoch in n_epochs:
● shuffle the batches
● for batch in n_batches:
○ compute the predictions for the batch
○ compute the error for the batch
○ compute the gradient for the batch
○ update the parameters of the model
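The three loops above differ only in how many samples feed each update, so one function can cover them all. The sketch below is an illustration on linear regression with a mean squared error cost; the function name, learning rate and epoch count are assumptions chosen for this toy problem.

```python
import numpy as np

def minibatch_gd(X, y, batch_size, n_epochs=500, lr=0.02, seed=0):
    """Minimise the mean squared error of X @ w - y with mini-batch updates."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = rng.normal(size=n_features)                  # random starting point
    for _ in range(n_epochs):
        order = rng.permutation(n_samples)           # shuffle the samples
        for start in range(0, n_samples, batch_size):
            batch = order[start:start + batch_size]
            err = X[batch] @ w - y[batch]            # predictions minus truth
            grad = 2.0 * X[batch].T @ err / batch.size
            w -= lr * grad                           # step against the gradient
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true                                       # noise-free targets

w_gd = minibatch_gd(X, y, batch_size=100)            # all samples: Gradient Descent
w_sgd = minibatch_gd(X, y, batch_size=1)             # one sample: Stochastic Gradient Descent
w_mb = minibatch_gd(X, y, batch_size=10)             # in between: Mini-Batch Gradient Descent
print(w_gd, w_sgd, w_mb)
```

Setting `batch_size` to the dataset size gives one update per epoch, while `batch_size=1` gives one update per sample.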
Model training – Introduction - Optimization
Gradient Descent comparison
 | Gradient Descent | Stochastic Gradient Descent | Mini-Batch Gradient Descent
Speed | Very fast (vectorized) | Slow (computed sample by sample) | Fast (vectorized)
Memory | O(dataset) | O(1) | O(batch)
Convergence | Needs more epochs | Needs fewer epochs | Midpoint between GD and SGD
Gradient stability | Smooth updates in params | Noisy updates in params | Midpoint between GD and SGD
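Model training – Introduction - Optimization
A Problem with Implicit Feedback
With datasets that contain only unary positive feedback (e.g. click history), every observed rating has the same value, so a model that predicts that value for every user and item already fits the data perfectly.
Negative Sampling
Common fix: add random users and items, drawn from a uniform distribution, to the dataset with r=0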
Model training – Introduction - Optimization
Negative Sampling
Common fix: add random users and items with rating=0
● Expresses the items that are unknown to the users
● Acts as a regularizer
● Also works for explicit feedback
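A minimal sketch of uniform negative sampling on a click log stored as (user, item) index arrays. The function name and arguments are hypothetical. As a simplification, sampled pairs are not checked against the existing positives, so a few negatives may collide with real clicks.

```python
import numpy as np

def add_negative_samples(rows, cols, n_users, n_items, n_negatives, seed=0):
    """Append uniformly sampled (user, item) pairs with rating 0 to a click log."""
    rng = np.random.default_rng(seed)
    neg_rows = rng.integers(0, n_users, size=n_negatives)   # uniform random users
    neg_cols = rng.integers(0, n_items, size=n_negatives)   # uniform random items
    all_rows = np.concatenate([rows, neg_rows])
    all_cols = np.concatenate([cols, neg_cols])
    # observed clicks get rating 1, sampled pairs get rating 0
    ratings = np.concatenate([np.ones(rows.size), np.zeros(n_negatives)])
    return all_rows, all_cols, ratings

click_rows = np.array([0, 0, 1, 2])      # users who clicked
click_cols = np.array([1, 2, 0, 2])      # items they clicked on
r, c, v = add_negative_samples(click_rows, click_cols,
                               n_users=3, n_items=4, n_negatives=6)
print(np.column_stack([r, c, v]))
```

The number of negatives per positive is a design choice that the slides do not fix.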
Baseline models
Model Training – Baseline models
Introduction
● Before starting to train models, always compute a baseline
● Baselines are very useful to debug more complex models
● As a general rule:
○ Very basic models can’t capture all the details on the training data and tend to underfit
○ Very complex models capture every detail on the training data and tend to overfit
● Note: throughout this presentation we use RMSE to compare model performance
Model Training – Baseline models
Global Average
Average = 3.64
Prediction = 3.64 for every rating
RMSE = sqrt(mean((2 - 3.64)^2, (1 - 3.64)^2, …))
RMSE = sqrt(4.13)
Model Training – Baseline models
Global average - Numpy code
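import numpy as np
from scipy.sparse import csr_matrix
# (user, item, rating) triplets of a 6 x 6 toy ratings matrix
rows = np.array([0,0,0,1,1,2,2,2,2,3,3,3,4,4,5,5,5])
cols = np.array([0,1,5,3,5,0,1,2,4,0,3,5,0,2,1,3,4])
data = np.array([2,5,4,1,5,2,4,5,4,4,5,1,5,2,1,4,2])
ratings = csr_matrix((data, (rows, cols)), shape=(6, 6))
# random 80/20 train/validation split over the observed ratings
idx = np.random.permutation(data.size)
idx_train = idx[:int(idx.size * 0.8)]
idx_valid = idx[int(idx.size * 0.8):]
# predict the training average for every validation rating
global_avg = data[idx_train].mean()
# RMSE takes the mean of the squared errors, not their sum
rmse = np.sqrt(((data[idx_valid] - global_avg) ** 2).mean())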
Model Training – Baseline models
User average
Average u1 = 4.50
Average u2 = 5.00
Average u3 = 3.67
Average u4 = 2.50
Average u5 = 5.00
Average u6 = 2.50
Prediction = the average of the user who gave the rating
RMSE = sqrt(mean((2 - 4.5)^2, (1 - 5.0)^2, …))
RMSE = sqrt(6.15)
Model Training – Baseline models
User average - Numpy code
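import numpy as np
from scipy.sparse import csr_matrix
rows = np.array([0,0,0,1,1,2,2,2,2,3,3,3,4,4,5,5,5])
cols = np.array([0,1,5,3,5,0,1,2,4,0,3,5,0,2,1,3,4])
data = np.array([2,5,4,1,5,2,4,5,4,4,5,1,5,2,1,4,2])
# random 80/20 train/validation split over the observed ratings
idx = np.random.permutation(data.size)
idx_train = idx[:int(idx.size * 0.8)]
idx_valid = idx[int(idx.size * 0.8):]
ratings_train = csr_matrix((data[idx_train], (rows[idx_train], cols[idx_train])), shape=(6, 6))
# number and sum of the training ratings of each user
count_per_row = (ratings_train > 0).sum(axis=1).A1
sum_per_row = ratings_train.sum(axis=1).A1.astype('float32')
# users without any training rating fall back to the global training average
global_avg = data[idx_train].mean()
user_avg = np.where(count_per_row > 0, sum_per_row / np.maximum(count_per_row, 1), global_avg)
# data[idx_valid] and rows[idx_valid] share the same order, so ratings and users stay aligned
# (the entries of a csr_matrix are re-sorted and would not match rows[idx_valid])
rmse = np.sqrt(((data[idx_valid] - user_avg[rows[idx_valid]]) ** 2).mean())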
Model Training – Baseline models
Item-Item
Basic linear models
Model Training – Basic linear models
Content Based - Standard Least Squares model
● Goal: very basic linear model
● Data: the matrix of items features I (may be sparse)
● Pre-processing: use PCA to reduce the dimension of I
● Solve: $\min_U \lVert R - U I \rVert_F^2$
● Solution is Least Squares: $U^\top = (I I^\top)^{-1} I R^\top$
Never compute the inverse!
(1) Use numpy:
numpy.linalg.solve(I @ I.T, I @ R.T)
(2) Use Cholesky decomposition:
(I @ I.T) is a positive definite matrix when the rows of I are linearly independent!
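Both options can be sketched as follows. The shapes are assumptions: I is a (k, n) array of item features with one column per item, and R is a dense (m, n) ratings matrix, which ignores missing ratings for the sake of the illustration. The Cholesky route uses `scipy.linalg.cho_factor` and `cho_solve`.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
m, n, k = 8, 30, 4                       # users, items, item features (toy sizes)
I = rng.normal(size=(k, n))              # known item features, one column per item
U_true = rng.normal(size=(m, k))
R = U_true @ I                           # dense, noise-free ratings for the demo

A = I @ I.T                              # (k, k) Gram matrix of the item features
B = I @ R.T                              # (k, m) right-hand side

U_solve = np.linalg.solve(A, B).T        # (1) generic linear solver
U_chol = cho_solve(cho_factor(A), B).T   # (2) Cholesky, needs A to be positive definite

print(np.abs(U_solve - U_true).max(), np.abs(U_chol - U_true).max())
```

Both routes solve the same linear system; Cholesky exploits the symmetry of `I @ I.T` and is typically the cheaper of the two.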
Model Training – Basic linear models
Content Based - Regularized Least Squares model
● Goal: avoid overfitting
● Method: Tikhonov Regularization (a.k.a Ridge Regression)
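● Solve: $\min_U \lVert R - U I \rVert_F^2 + \lambda \lVert U \rVert_F^2$
● Solution is Regularized Least Squares: $U^\top = (I I^\top + \lambda\,\mathrm{Id})^{-1} I R^\top$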
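A minimal sketch of the regularized solution, under the same assumed shapes as before (I is (k, n), R is a dense (m, n) matrix). The function name and the parameter `lam` for the regularization strength are hypothetical.

```python
import numpy as np

def ridge_user_embeddings(R, I, lam):
    """Least squares for U with known item features I and an L2 penalty on U."""
    k = I.shape[0]
    A = I @ I.T + lam * np.eye(k)        # regularized Gram matrix
    return np.linalg.solve(A, I @ R.T).T

rng = np.random.default_rng(0)
m, n, k = 8, 30, 4
I = rng.normal(size=(k, n))
U_true = rng.normal(size=(m, k))
R = U_true @ I

U_plain = ridge_user_embeddings(R, I, lam=0.0)     # plain Least Squares
U_ridge = ridge_user_embeddings(R, I, lam=10.0)    # embeddings shrunk towards zero
print(np.linalg.norm(U_plain), np.linalg.norm(U_ridge))
```

A larger `lam` shrinks the embeddings more strongly; the value is a hyper-parameter to tune on validation data.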
Matrix factorization
Model Training – Matrix Factorization
Matrix Factorization
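● If we don’t have I, we need Matrix Factorization techniques to find a linear solution to our problem.
● Now both U and I are unknown, and we want to solve the following optimization problem: $\min_{U, I} \sum_{(u,i)\ \text{observed}} (R_{ui} - U_u I_i)^2$, where $U_u$ is row u of U and $I_i$ is column i of I
Solutions
● Analytical: SVD
● Numerical: ALS, SGD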
Model Training – Matrix Factorization
Matrix Factorization - Graphical interpretation
Model Training – Matrix Factorization
Analytical solution - Singular Value Decomposition (SVD)
● Optimal Solution
● Closed Form, readily available in scikit-learn
● O(n^3) algorithm, does not scale
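As an illustration, the sketch below factorizes a small dense matrix with `numpy.linalg.svd` rather than scikit-learn, and keeps the top k components. Two assumptions are made: R has no missing entries (SVD needs a fully specified matrix), and the singular values are folded into the user embeddings, which is one convention among several.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 8, 2
R = rng.normal(size=(m, k)) @ rng.normal(size=(k, n))   # dense rank-k "ratings"

P, s, Qt = np.linalg.svd(R, full_matrices=False)
U = P[:, :k] * s[:k]           # user embeddings, singular values folded in
I = Qt[:k, :]                  # item embeddings

print(np.abs(U @ I - R).max()) # reconstruction error of the rank-k factorization
```

Since this toy R has rank k exactly, the truncated factorization reproduces it up to floating-point error.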
Model Training – Matrix Factorization
Numerical solution - Alternating Least Square (ALS)
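Initialize: U and I with random values
Iterate: fix I and solve a least squares problem for U, then fix U and solve a least squares problem for I
● Solving least squares is easy
● Scales to big datasets
● Distributed implementations are available (e.g. on Spark)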
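A minimal single-machine sketch of ALS on the observed entries only. The boolean `mask`, the small L2 penalty `lam` and the iteration count are assumptions added to keep every per-user and per-item system well-posed; they are not taken from the slides.

```python
import numpy as np

def als(R, mask, k, lam=0.1, n_iters=50, seed=0):
    """Alternate regularized least squares solves for U (m, k) and I (k, n)."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.normal(scale=0.1, size=(m, k))       # random initialization
    I = rng.normal(scale=0.1, size=(k, n))
    reg = lam * np.eye(k)
    for _ in range(n_iters):
        for u in range(m):                       # fix I, solve for each user
            Iu = I[:, mask[u]]                   # embeddings of the items rated by u
            U[u] = np.linalg.solve(Iu @ Iu.T + reg, Iu @ R[u, mask[u]])
        for i in range(n):                       # fix U, solve for each item
            Ui = U[mask[:, i]]                   # embeddings of the users who rated i
            I[:, i] = np.linalg.solve(Ui.T @ Ui + reg, Ui.T @ R[mask[:, i], i])
    return U, I

rng = np.random.default_rng(1)
m, n, k = 20, 15, 3
R = rng.normal(size=(m, k)) @ rng.normal(size=(k, n))   # synthetic low-rank ratings
mask = rng.random((m, n)) < 0.7                          # about 70% observed

U, I = als(R, mask, k)
rmse = np.sqrt(np.mean((R - U @ I)[mask] ** 2))
baseline = np.sqrt(np.mean(R[mask] ** 2))                # error of predicting 0 everywhere
print(rmse, baseline)
```

Each inner solve is independent of the others, which is what makes the method easy to distribute.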
Model Training – Matrix Factorization
Numerical solution - Stochastic Gradient Descent (SGD)
We are using SGD -> One sample each time
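A minimal sketch of matrix factorization trained with SGD, one rating at a time, on the toy ratings used for the baselines. The learning rate, the L2 penalty `lam`, the embedding size and the function name are assumptions for illustration.

```python
import numpy as np

def sgd_mf(rows, cols, vals, m, n, k, lr=0.02, lam=0.01, n_epochs=100, seed=0):
    """Fit U (m, k) and I (k, n) so that U[u] @ I[:, i] approximates each rating."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(m, k))
    I = rng.normal(scale=0.1, size=(k, n))
    for _ in range(n_epochs):
        for t in rng.permutation(vals.size):      # shuffle, then one sample each time
            u, i = rows[t], cols[t]
            err = vals[t] - U[u] @ I[:, i]        # truth minus prediction
            u_old = U[u].copy()                   # keep the old value for the item update
            U[u] += lr * (err * I[:, i] - lam * U[u])
            I[:, i] += lr * (err * u_old - lam * I[:, i])
    return U, I

rows = np.array([0,0,0,1,1,2,2,2,2,3,3,3,4,4,5,5,5])
cols = np.array([0,1,5,3,5,0,1,2,4,0,3,5,0,2,1,3,4])
data = np.array([2,5,4,1,5,2,4,5,4,4,5,1,5,2,1,4,2], dtype=float)

U, I = sgd_mf(rows, cols, data, m=6, n=6, k=2)
pred = np.sum(U[rows] * I[:, cols].T, axis=1)     # predictions for the observed ratings
rmse_train = np.sqrt(np.mean((data - pred) ** 2))
print(rmse_train)
```

The error printed here is a training error; a held-out split, as in the baseline code, is needed to compare models fairly.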
Model Training – Matrix Factorization
100 epochs
Non-linear models
Model Training – Non-linear models
Simple Deep Learning model for collaborative filtering
Model Training – Complex Deep Learning problem
More complex Deep Learning model for collaborative filtering
Model Training – Complex Deep Learning problem
Training with Deep Learning
● Use Deep Learning Framework (e.g. PyTorch, TensorFlow)
● ...or at least Analytical Gradient Libraries (e.g. Theano, Chainer)
● Acceleration Heuristics (e.g. AdaGrad, Nesterov, RMSProp, Adam, NAdam)
● DropOut / BatchNorm
● Watch out for Sparse Momentum Updates! Most Deep Learning frameworks don’t support them
● Hyper-parameter Optimization and Architecture Search (e.g. Gaussian Processes)
Conclusions
Model Training – Conclusions
Conclusions
 | Global Avg | User Avg | Item-Item | Linear | Linear + Reg | Matrix Fact | Deep Learning
Domains | Baseline | Baseline | users >> items | Known “I” | Known “I” | Unknown “I” | Extra datasets
Model Complexity | Trivial | Trivial | Simple | Linear | Linear | Linear | Non-linear
Time Complexity | + | + | +++ | ++++ | ++++ | ++++ | ++
Overfit/Underfit | Underfit | Underfit | May Underfit | May Overfit | May Perform Bad | May Overfit | Can Overfit
Hyper-Params | 0 | 0 | 0 | 1 | 2 | 2–3 | many
Implementation | Numpy | Numpy | Numpy | Numpy | Numpy | LightFM, Spark | NNet libraries
Model Training – Conclusions
Take home
● Always start with the simplest, stupidest models
● Spend time on simple interpretable models to debug your codebase and clean your data
● Gradually increase the complexity of your models
● Add more regularization as soon as a complex model performs worse than a simpler model
Questions
Thank YOU!

GBSN - Biochemistry (Unit 1)
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 

Recommender Systems from A to Z – Model Training

  • 2. Recommender Systems from A to Z Part 1: The Right Dataset Part 2: Model Training Part 3: Model Evaluation Part 4: Real-Time Deployment
  • 4. Part 2: Model Training
      1. Introduction: optimization problem, linear regression and Stochastic Gradient Descent (SGD)
      2. Baseline models: global average, user average and item-item models
      3. Basic linear models: Least Squares (LS), Regularized Least Squares (RLS)
      4. Matrix factorization: Matrix Factorization, analytical solution and numerical solution
      5. Non-linear models: basic and complex Deep Learning models
  • 7. Model training – Introduction: Explicit vs Implicit feedback
      Explicit feedback = users’ ratings; implicit feedback = users’ clicks.
                        | Explicit feedback                | Implicit feedback
      Example domains   | Movies, TV shows, Music          | Marketplaces, Businesses
      Example data type | Like/Dislike, Stars              | Clicks, Play-time, Purchases
      Complexity        | Clean, Costly, Easy to interpret | Dirty, Cheap, Difficult to interpret
  • 9. Model training – Introduction: Recommendation engine types
      Recommendation engine
          Content-based
          Collaborative-filtering
              Memory-based: Item-Item, User-User
              Model-based: User-Item
          Hybrid engine
      Model         | When?              | Linear problem definition | Solution strategies
      Content-based | Item cold start    | (formula on slide)        | Least Squares, Deep Learning
      Item-Item     | n_users >> n_items | (formula on slide)        | Affinity Matrix
      User-User     | n_users << n_items | (formula on slide)        | KNN, Affinity Matrix
      User-Item     | Better performance | (formula on slide)        | Matrix Factorization, Deep Learning
  • 14. Model training – Introduction - Optimization: Optimization problem (definitions)
      ● R: sparse matrix of ratings with m users and n items (the available dataset), from the ratings of User #1 down to the rating of User #m to Item #n
      ● U: dense matrix of users embeddings (unknown), starting with the embedding of User #1
      ● I: dense matrix of items embeddings (unknown), starting with the embedding of Item #1
  • 15. Model training – Introduction - Optimization: Optimization problem (basic formulation with RMSE). Our goal is to find U and I such that the difference between each datapoint in R and the product of the corresponding user and item embeddings is minimal.
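One common way to write this objective is sketched below. The notation is an assumption introduced here, not taken from the slide: K is the set of observed (user, item) pairs, r_ui is the rating of user u for item i, and u_u and i_i are the corresponding embeddings taken from U and I.

```latex
\min_{U,\,I} \; \sqrt{ \frac{1}{|K|} \sum_{(u,i) \in K}
    \bigl( r_{ui} - \mathbf{u}_u \cdot \mathbf{i}_i \bigr)^2 }
```

Only the observed entries of R enter the sum, which is why a sparse R is enough to fit dense U and I.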
  • 18. Model training – Introduction - Optimization: Optimization problem (more complex formulation)
      Content-based, and content-based with regularization: the error term is computed on the available data, and the regularization term is there to avoid overfitting.
      Take home
      ● In content-based models we already know I (items features)
      ● We can find a linear solution to this problem using Least Squares
  • 21. Model training – Introduction - Optimization: Optimization problem (more complex formulation)
      Collaborative-filtering, and collaborative-filtering with regularization: the error term is computed on the available data, and the regularization term is there to avoid overfitting.
      Take home
      ● In collaborative-filtering we want to find U and I (users and items embeddings)
      ● We can find a linear solution to this problem using Matrix Factorization and SGD
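A hedged sketch of what a regularized collaborative-filtering objective typically looks like is given below. The set K of observed (user, item) pairs, the weight λ and the squared Frobenius-norm penalty are assumptions chosen for illustration; the slide may use a different penalty.

```latex
\min_{U,\,I} \; \sum_{(u,i) \in K} \bigl( r_{ui} - \mathbf{u}_u \cdot \mathbf{i}_i \bigr)^2
\;+\; \lambda \bigl( \lVert U \rVert_F^2 + \lVert I \rVert_F^2 \bigr)
```

The first term is the part computed on the available data and the second is the regularization. In the content-based case I is known, so the minimization runs over U only and the penalty on I drops out.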
  • 23. Model training – Introduction - Optimization: How to analytically solve an optimization problem? Let’s start with a simple optimization problem: linear regression without regularization, with m > n. We want to find the weights W that minimize the squared error between predictions and targets. Slide annotations: add a column of ones to support w0; scalar numbers.
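A minimal sketch of the analytical route, under stated assumptions: the data is synthetic and noise-free, and the names `X`, `X1`, `y` and `w_true` are hypothetical. It adds the column of ones, solves the normal equations with `np.linalg.solve`, and compares the result with NumPy's own least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, noise-free data (illustrative): m samples, n features, with m > n.
m, n = 50, 3
X = rng.normal(size=(m, n))
w_true = np.array([0.5, 2.0, -1.0, 3.0])  # w_true[0] plays the role of w0

# Add a column of ones so that w0 (the intercept) is learned like any other weight.
X1 = np.hstack([np.ones((m, 1)), X])
y = X1 @ w_true

# Normal equations: (X1^T X1) w = X1^T y, solved without forming an inverse.
w_normal = np.linalg.solve(X1.T @ X1, X1.T @ y)

# Reference solution from NumPy's least-squares routine.
w_lstsq, *_ = np.linalg.lstsq(X1, y, rcond=None)
```

Because the targets are noise-free, both solutions should recover `w_true` up to floating-point error.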
  • 25. Model training – Introduction - Optimization: How to numerically solve an optimization problem? Gradient descent: start with random values for W and move in the opposite direction of the gradient of J(w). The stochastic variant estimates the gradient by taking just one sample.
  • 26. Model training – Introduction - Optimization
      Gradient Descent algorithm
        for epoch in n_epochs:
          ● compute the predictions for all the samples
          ● compute the error between truth and predictions
          ● compute the gradient using all the samples
          ● update the parameters of the model
      Stochastic Gradient Descent algorithm
        for epoch in n_epochs:
          ● shuffle the samples
          ● for sample in n_samples:
              ○ compute the predictions for the sample
              ○ compute the error between truth and predictions
              ○ compute the gradient using the sample
              ○ update the parameters of the model
      Mini-Batch Gradient Descent algorithm
        for epoch in n_epochs:
          ● shuffle the batches
          ● for batch in n_batches:
              ○ compute the predictions for the batch
              ○ compute the error for the batch
              ○ compute the gradient for the batch
              ○ update the parameters of the model
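The three loops above differ only in how many samples feed each update, so they can be sketched with a single function. This is an illustrative sketch on a synthetic linear-regression problem; `train`, `gradient`, the learning rate and the data are all assumptions, not code from the slides.

```python
import numpy as np

def gradient(w, X, y):
    """Gradient of the mean squared error J(w) = mean((X @ w - y) ** 2)."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def train(X, y, batch_size, lr=0.05, n_epochs=200, seed=0):
    """batch_size = len(y): gradient descent; 1: SGD; otherwise: mini-batch."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])          # start from random values
    for _ in range(n_epochs):
        order = rng.permutation(len(y))      # shuffle the samples every epoch
        for start in range(0, len(y), batch_size):
            batch = order[start:start + batch_size]
            w -= lr * gradient(w, X[batch], y[batch])  # step against the gradient
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                               # noise-free targets

w_gd = train(X, y, batch_size=len(y))        # one update per epoch
w_sgd = train(X, y, batch_size=1)            # one update per sample
w_mb = train(X, y, batch_size=32)            # one update per batch of 32
```

With the same number of epochs, the SGD run performs 200 times more updates than the full-batch run, which is the trade-off between per-step cost and number of steps.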
  • 27. Model training – Introduction - Optimization: Gradient Descent comparison
                         | Gradient Descent         | Stochastic Gradient Descent     | Mini-Batch Gradient Descent
      Gradient speed     | Very fast (vectorized)   | Slow (compute sample by sample) | Fast (vectorized)
      Memory             | O(dataset)               | O(1)                            | O(batch)
      Convergence        | Needs more epochs        | Needs fewer epochs              | Middle point between GD and SGD
      Gradient stability | Smooth updates in params | Noisy updates in params         | Middle point between GD and SGD
  • 29. Model training – Introduction - Optimization: A Problem with Implicit Feedback. Datasets with only unary positive feedback (e.g. clicks history) contain no negative examples. Negative Sampling is the common fix: add random users and items with r=0, drawn from a uniform distribution, to the dataset.
  • 30. Model training – Introduction - Optimization: Negative Sampling
      Common fix: add random users and items with rating=0
      ● Expresses “unknown items” from users
      ● Acts as a regularizer
      ● Also works for explicit feedback
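A minimal sketch of uniform negative sampling on a positive-only dataset. The function name `add_negative_samples`, the toy clicks and the number of negatives are hypothetical. As a simplifying assumption, sampled pairs are not checked against the existing positives, so a sampled pair may occasionally coincide with a real click.

```python
import numpy as np

def add_negative_samples(users, items, n_users, n_items, n_negatives, seed=0):
    """Append uniformly sampled (user, item) pairs with rating 0 to positive-only data."""
    rng = np.random.default_rng(seed)
    neg_users = rng.integers(0, n_users, size=n_negatives)   # uniform over users
    neg_items = rng.integers(0, n_items, size=n_negatives)   # uniform over items
    all_users = np.concatenate([users, neg_users])
    all_items = np.concatenate([items, neg_items])
    ratings = np.concatenate([np.ones(len(users)), np.zeros(n_negatives)])
    return all_users, all_items, ratings

# Clicks history: every observed interaction is a positive (r = 1).
click_users = np.array([0, 0, 1, 2, 2, 3])
click_items = np.array([1, 4, 0, 2, 3, 4])

u, i, r = add_negative_samples(click_users, click_items,
                               n_users=4, n_items=5, n_negatives=6)
```

The ratio of negatives to positives (1:1 here) is a hyper-parameter in practice.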
  • 32. Model Training – Baseline models: Introduction
      ● Before starting to train models, always compute a baseline
      ● Baselines are very useful to debug more complex models
      ● As a general rule:
          ○ Very basic models can’t capture all the details in the training data and tend to underfit
          ○ Very complex models capture every detail in the training data and tend to overfit
      ● Note: throughout this presentation we use RMSE to compare model performance
  • 33. Model Training – Baseline models: Global Average
      Average = 3.64
      Prediction: 3.64 3.64 3.64 3.64 3.64 3.64
      RMSE = sqrt((2 - 3.64)^2 + (1 - 3.64)^2 + …) = sqrt(4.13)
  • 34. Model Training – Baseline models: Global average - Numpy code
      import numpy as np
      from scipy.sparse import csr_matrix

      # 17 ratings of a 6 x 6 toy matrix, stored as (row, col, value) triplets
      rows = np.array([0,0,0,1,1,2,2,2,2,3,3,3,4,4,5,5,5])
      cols = np.array([0,1,5,3,5,0,1,2,4,0,3,5,0,2,1,3,4])
      data = np.array([2,5,4,1,5,2,4,5,4,4,5,1,5,2,1,4,2])
      ratings = csr_matrix((data, (rows, cols)), shape=(6, 6))

      # Random 80/20 train/validation split over the ratings
      idx = np.random.permutation(data.size)
      idx_train = idx[0:int(idx.size * 0.8)]
      idx_valid = idx[int(idx.size * 0.8):]

      # Predict the training mean everywhere; RMSE takes the mean (not the sum) of squared errors
      global_avg = data[idx_train].mean()
      rmse = np.sqrt(((data[idx_valid] - global_avg) ** 2).mean())
  • 35. Model Training – Baseline models: User average
      Average u1 = 4.50, u2 = 5.00, u3 = 3.67, u4 = 2.50, u5 = 5.00, u6 = 2.50
      Prediction: 4.50 5.00 3.67 2.50 5.00 2.50
      RMSE = sqrt((2 - 4.5)^2 + (1 - 5.0)^2 ...) = sqrt(6.15)
  • 36. Model Training – Baseline models: User average - Numpy code
      import numpy as np
      from scipy.sparse import csr_matrix

      rows = np.array([0,0,0,1,1,2,2,2,2,3,3,3,4,4,5,5,5])
      cols = np.array([0,1,5,3,5,0,1,2,4,0,3,5,0,2,1,3,4])
      data = np.array([2,5,4,1,5,2,4,5,4,4,5,1,5,2,1,4,2])

      # Random 80/20 train/validation split over the ratings
      idx = np.random.permutation(data.size)
      idx_train = idx[0:int(idx.size * 0.8)]
      idx_valid = idx[int(idx.size * 0.8):]
      ratings_train = csr_matrix((data[idx_train], (rows[idx_train], cols[idx_train])), shape=(6, 6))

      # Per-user mean over the training ratings (all ratings are > 0, so "> 0" counts them)
      count_per_row = (ratings_train > 0).sum(axis=1).A1
      sum_per_row = ratings_train.sum(axis=1).A1.astype('float32')
      # Users without any training rating fall back to the global training average
      user_avg = np.where(count_per_row > 0, sum_per_row / np.maximum(count_per_row, 1), data[idx_train].mean())

      # Index truth and predictions with the same idx_valid so that they stay aligned
      rmse = np.sqrt(((data[idx_valid] - user_avg[rows[idx_valid]]) ** 2).mean())
  • 37. Model Training – Baseline models Item-Item
  • 40. Model Training – Basic linear models: Content Based - Standard Least Squares model
      ● Goal: very basic linear model
      ● Data: the matrix of items features I (may be sparse)
      ● Pre-processing: use PCA to reduce the dimension of I
      ● Solve: find U that minimizes the squared error between R and U·I
      ● Solution is Least Squares: U.T = inverse(I·I.T)·I·R.T
      Never compute the inverse!
      (1) Use numpy: numpy.linalg.solve(I @ I.T, I @ R.T)
      (2) Use Cholesky decomposition: (I @ I.T) is a symmetric positive definite matrix when I has full row rank
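The two recommendations above can be sketched with NumPy alone. The shapes are an assumption read off the `solve` call: I is (k, n) with one column per item, R is (m, n), and the solve returns U transposed. The data is synthetic and dense, and `U_true` is a hypothetical ground truth used only to check the result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: m users, n items, k item features (k < n).
m, n, k = 5, 40, 3
I = rng.normal(size=(k, n))        # known item features, one column per item
U_true = rng.normal(size=(m, k))   # hypothetical ground-truth user embeddings
R = U_true @ I                     # dense, noise-free ratings for the sketch

A = I @ I.T                        # (k, k) Gram matrix
B = I @ R.T                        # (k, m) right-hand side

# (1) Solve the normal equations directly, never forming the inverse.
U_solve = np.linalg.solve(A, B).T

# (2) Cholesky route: A = L @ L.T, then two solves with the triangular factors.
#     np.linalg.solve is a general solver and does not exploit the triangular
#     structure; it is used here only to keep the sketch NumPy-only.
L = np.linalg.cholesky(A)
Y = np.linalg.solve(L, B)
U_chol = np.linalg.solve(L.T, Y).T
```

In practice `scipy.linalg.cho_factor` and `scipy.linalg.cho_solve` do the same thing while taking advantage of the triangular factors.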
  • 41. Model Training – Basic linear models: Content Based - Regularized Least Squares model
      ● Goal: avoid overfitting
      ● Method: Tikhonov Regularization (a.k.a. Ridge Regression)
      ● Solve: the least-squares problem with an added penalty on the size of U
      ● Solution is Regularized Least Squares
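A short derivation of the ridge solution in the notation of these slides, offered as a sketch. Assumptions introduced here: R is (m, n), U is (m, k), I is (k, n), λ > 0 is the regularization weight, and 𝟙_k denotes the k × k identity matrix (written that way to avoid a clash with the item matrix I).

```latex
J(U) = \lVert R - U I \rVert_F^2 + \lambda \lVert U \rVert_F^2

\nabla_U J = -2\,(R - U I)\, I^{\top} + 2 \lambda U = 0

U \bigl( I I^{\top} + \lambda \mathbb{1}_k \bigr) = R\, I^{\top}
\quad\Longrightarrow\quad
U^{\top} = \bigl( I I^{\top} + \lambda \mathbb{1}_k \bigr)^{-1} I R^{\top}
```

Setting λ = 0 recovers the standard Least Squares solution, and for λ > 0 the matrix in parentheses is always positive definite, so the Cholesky route applies even when I does not have full row rank.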
  • 43. Model Training – Matrix Factorization: Matrix Factorization
      ● If we don’t have I, we need Matrix Factorization techniques to find a linear solution to our problem.
      ● The optimization problem now has both U and I as unknowns.
      Solutions:
      ● Analytical: SVD
      ● Numerical: ALS, SGD
  • 44. Model Training – Matrix Factorization: Matrix Factorization - Graphical interpretation (figures, slides 44 to 46)
  • 47. Model Training – Matrix Factorization: Analytical solution - Singular Value Decomposition (SVD)
      ● Optimal Solution
      ● Closed Form, readily available in scikit-learn
      ● O(n^3) algorithm, does not scale
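A minimal sketch of the SVD route, using `numpy.linalg.svd` rather than scikit-learn to stay dependency-light. The sketch assumes a dense, fully observed toy matrix of exact rank k; with real sparse ratings the missing entries would first have to be filled in somehow, which is outside this example. Splitting the singular values evenly between the two factors is one convention among several.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense toy ratings matrix of exact rank k (illustrative): m users, n items.
m, n, k = 6, 8, 2
R = rng.normal(size=(m, k)) @ rng.normal(size=(k, n))

# Thin SVD: R = P @ diag(s) @ Qt, singular values sorted in decreasing order.
P, s, Qt = np.linalg.svd(R, full_matrices=False)

# Keep the k largest singular values and split them between both factors.
U = P[:, :k] * np.sqrt(s[:k])            # (m, k) user embeddings
I = np.sqrt(s[:k])[:, None] * Qt[:k]     # (k, n) item embeddings

R_hat = U @ I                            # best rank-k approximation of R
```

Because R has exact rank k here, `R_hat` reproduces R up to floating-point error; on a full-rank matrix it would be the closest rank-k matrix in the Frobenius norm.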
  • 48. Model Training – Matrix Factorization: Numerical solution - Alternating Least Squares (ALS)
      Initialize U and I, then iterate: fix one of the two matrices and solve a least-squares problem for the other.
      ● Solving least squares is easy
      ● Scales to big datasets
      ● Distributed implementations are available (e.g. on Spark)
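The alternation can be sketched as follows. This is an illustrative implementation, not the one from the slides: the function names `als` and `rmse`, the boolean `mask` of observed entries, the ridge term `lam` and the toy data are all assumptions.

```python
import numpy as np

def als(R, mask, k=2, lam=0.1, n_iters=20, seed=0):
    """Factorize R ~ U @ I using only the entries where mask is True."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.normal(scale=0.1, size=(m, k))
    I = rng.normal(scale=0.1, size=(k, n))
    reg = lam * np.eye(k)
    for _ in range(n_iters):
        # Fix I, solve one small ridge problem per user.
        for u in range(m):
            Iu = I[:, mask[u]]                       # features of items rated by u
            U[u] = np.linalg.solve(Iu @ Iu.T + reg, Iu @ R[u, mask[u]])
        # Fix U, solve one small ridge problem per item.
        for i in range(n):
            Ui = U[mask[:, i]]                       # embeddings of users who rated i
            I[:, i] = np.linalg.solve(Ui.T @ Ui + reg, Ui.T @ R[mask[:, i], i])
    return U, I

def rmse(R, mask, U, I):
    """RMSE over the observed entries only."""
    return np.sqrt(np.mean((R - U @ I)[mask] ** 2))

rng = np.random.default_rng(1)
R = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))   # rank-2 toy ratings
mask = rng.random(R.shape) < 0.6                           # ~60% of entries observed

U0, I0 = als(R, mask, n_iters=0)    # random initialization only
U, I = als(R, mask, n_iters=20)
```

Each per-user and per-item solve is independent of the others, which is what makes distributed implementations straightforward.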
  • 49. Model Training – Matrix Factorization: Numerical solution - Stochastic Gradient Descent (SGD). We are using SGD, so the parameters are updated with one sample at a time.
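A hedged sketch of matrix factorization trained with SGD, one rating per update, on the 17 toy ratings from the baseline slides. The function name `sgd_mf`, the learning rate, the regularization weight and the embedding size are assumptions chosen for illustration.

```python
import numpy as np

def sgd_mf(users, items, ratings, n_users, n_items, k=2,
           lr=0.03, lam=0.01, n_epochs=100, seed=0):
    """Learn U (n_users, k) and I (k, n_items) one rating at a time."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_users, k))
    I = rng.normal(scale=0.1, size=(k, n_items))
    for _ in range(n_epochs):
        for idx in rng.permutation(len(ratings)):   # shuffle, then one sample each time
            u, i = users[idx], items[idx]
            err = ratings[idx] - U[u] @ I[:, i]     # prediction error for this sample
            grad_u = -err * I[:, i] + lam * U[u]    # gradient w.r.t. the user embedding
            grad_i = -err * U[u] + lam * I[:, i]    # gradient w.r.t. the item embedding
            U[u] -= lr * grad_u
            I[:, i] -= lr * grad_i
    return U, I

# The 17 ratings of the 6 x 6 toy matrix used in the baseline slides.
rows = np.array([0,0,0,1,1,2,2,2,2,3,3,3,4,4,5,5,5])
cols = np.array([0,1,5,3,5,0,1,2,4,0,3,5,0,2,1,3,4])
data = np.array([2,5,4,1,5,2,4,5,4,4,5,1,5,2,1,4,2], dtype=float)

U, I = sgd_mf(rows, cols, data, n_users=6, n_items=6)

# Training RMSE of the factorization versus the global-average baseline.
pred = np.sum(U[rows] * I[:, cols].T, axis=1)
train_rmse = np.sqrt(np.mean((data - pred) ** 2))
baseline_rmse = np.sqrt(np.mean((data - data.mean()) ** 2))
```

Both numbers are training errors on a tiny dataset, so a low `train_rmse` says nothing about generalization; a held-out split as in the baseline code is needed for that.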
  • 50. Model Training – Matrix Factorization 100 epochs
  • 52. Model Training – Non-linear models Simple Deep Learning model for collaborative filtering
  • 54. Model Training – Basic Deep Learning model Simple Deep Learning model for collaborative filtering
  • 55. Model Training – Complex Deep Learning problem More complex Deep Learning model for collaborative filtering
  • 56. Model Training – Complex Deep Learning problem: Training with Deep Learning
      ● Use a Deep Learning framework (e.g. PyTorch, TensorFlow)
      ● ...or at least an analytical gradient library (e.g. Theano, Chainer)
      ● Acceleration heuristics (e.g. AdaGrad, Nesterov, RMSProp, Adam, NAdam)
      ● DropOut / BatchNorm
      ● Watch out for sparse momentum updates! Most Deep Learning frameworks don’t support them
      ● Hyper-parameter optimization and architecture search (e.g. Gaussian Processes)
  • 58. Model Training – Conclusions: Conclusions
                       | Global Avg | User Avg | Item-Item      | Linear      | Linear + Reg    | Matrix Fact    | Deep Learning
      Domains          | Baseline   | Baseline | users >> items | Known “I”   | Known “I”       | Unknown “I”    | Extra datasets
      Model Complexity | Trivial    | Trivial  | Simple         | Linear      | Linear          | Linear         | Non-linear
      Time Complexity  | +          | +        | +++            | ++++        | ++++            | ++++           | ++
      Overfit/Underfit | Underfit   | Underfit | May Underfit   | May Overfit | May Perform Bad | May Overfit    | Can Overfit
      Hyper-Params     | 0          | 0        | 0              | 1           | 2               | 2–3            | many
      Implementation   | Numpy      | Numpy    | Numpy          | Numpy       | Numpy           | LightFM, Spark | NNet libraries
  • 59. Model Training – Conclusions: Take home
      ● Always start with the simplest, stupidest models
      ● Spend time on simple interpretable models to debug your codebase and clean your data
      ● Gradually increase the complexity of your models
      ● Add more regularization as soon as a complex model performs worse than a simpler model