Recommender Systems from A to Z
Part 1: The Right Dataset
Part 2: Model Training
Part 3: Model Evaluation
Part 4: Real-Time Deployment
1. Having the Right Data
Explicit vs Implicit
Likes/Dislikes vs Ratings
2. Rating Dataset Analysis
Density
Connectivity
3. Item Features and User Features
Unsupervised Learning
Supervised Learning
4. Data Preprocessing
Unsupervised Dimensionality Reduction
Supervised Dimensionality Reduction
Having the Right Data – Explicit & Implicit Feedback
Explicit Feedback
● Directly expresses the user's preferences
● Clean data (aligned with your goal)
● Costly to collect
Implicit Feedback
● Indicates user preferences only with some level of confidence
● Easy to collect in large volumes
● Risky to interpret
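Implicit feedback signals a preference only with some level of confidence. Implicit-feedback matrix factorization (Hu, Koren and Volinsky, 2008) uses a common encoding for this. Each raw count r is split into a binary preference p = 1 if r > 0 else 0, and a confidence weight c = 1 + α·r. The sketch below assumes play counts as the implicit signal and α = 40. Both choices are illustrative; these slides do not prescribe them.

```python
from typing import Dict, Tuple

# Hypothetical implicit signal: (user, item) -> play count.
PlayCounts = Dict[Tuple[str, str], float]

def to_preference_confidence(counts: PlayCounts, alpha: float = 40.0):
    """Split raw implicit counts into a binary preference and a confidence weight.

    p_ui = 1 if the user interacted with the item at all, else 0
    c_ui = 1 + alpha * r_ui   (more interactions -> more trust in p_ui)
    """
    prefs, confs = {}, {}
    for key, r in counts.items():
        prefs[key] = 1 if r > 0 else 0
        confs[key] = 1.0 + alpha * r
    return prefs, confs

counts = {("alice", "song_a"): 3, ("alice", "song_b"): 0, ("bob", "song_a"): 1}
prefs, confs = to_preference_confidence(counts)
```

Pairs that were never observed are usually treated as p = 0 with the minimum confidence c = 1. Without negative feedback, this is exactly what makes implicit data hard to interpret.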
Having the Right Data – Implicit vs Like/Dislike vs Ratings
Explicit Feedback
● Classification (e.g. like/dislike/skip)
● Regression (e.g. star ratings) => best data to compute absolute prediction of taste
● Ranking (e.g. pairwise comparison) => best data to compute top-k recommendations
Implicit Feedback
● With Implicit Negative Feedback (e.g. watch-time or play-time of media, like/skip action)
● Without Implicit Negative Feedback (e.g. like only, search history, purchase history)
=> test evaluation is biased and model selection is very hard
Take-Home
The data you have affects how you train your models!
Having the Right Data – Netflix
Context
“Context is any information that can be used to characterize the situation of an entity.” (Anind K. Dey, 2001)
Representative Context
Fully observable and static
Interactive Context
Partially observable and dynamic
Explicit Context Acquisition
Context – Model
Rating Dataset
Instead of (user, item, rating) tuples, we consider (user, item, context, rating) tuples.
Model
For similarity-based models (user-user or item-item), we need to modify how we compute the similarity to take context into account.
For matrix-factorization models (user-item), we need to add a dimension and use tensor factorization instead, which is much more challenging.
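The sketch below makes the tensor-factorization idea concrete with a minimal CP (CANDECOMP/PARAFAC) style model, trained with SGD on (user, item, context, rating) tuples. User, item and context each get a latent vector. The prediction is a global mean plus the sum, over factors, of the element-wise product of the three vectors. The global-mean term, factor count, learning rate, regularization and toy data are all illustrative assumptions. Production context-aware models are usually more elaborate.

```python
import random
from typing import Dict, List, Tuple

Rating = Tuple[str, str, str, float]  # (user, item, context, rating)

def train_cp(ratings: List[Rating], n_factors: int = 4, lr: float = 0.02,
             reg: float = 0.01, epochs: int = 500, seed: int = 0):
    """Fit r ~ mu + sum_f U[u][f] * V[i][f] * C[x][f] by SGD (CP-style tensor factorization)."""
    rng = random.Random(seed)
    mu = sum(t[3] for t in ratings) / len(ratings)  # global mean rating
    U: Dict[str, List[float]] = {}
    V: Dict[str, List[float]] = {}
    C: Dict[str, List[float]] = {}
    for u, i, x, _ in ratings:
        for table, key in ((U, u), (V, i), (C, x)):
            if key not in table:
                table[key] = [rng.gauss(0.0, 0.3) for _ in range(n_factors)]

    def predict(u: str, i: str, x: str) -> float:
        return mu + sum(U[u][f] * V[i][f] * C[x][f] for f in range(n_factors))

    def mse() -> float:
        return sum((r - predict(u, i, x)) ** 2 for u, i, x, r in ratings) / len(ratings)

    initial_mse = mse()
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit observations in a new random order each epoch
        for u, i, x, r in data:
            err = r - predict(u, i, x)
            for f in range(n_factors):
                uf, vf, cf = U[u][f], V[i][f], C[x][f]
                # Gradient step on the squared error, using pre-update values of all three factors.
                U[u][f] += lr * (err * vf * cf - reg * uf)
                V[i][f] += lr * (err * uf * cf - reg * vf)
                C[x][f] += lr * (err * uf * vf - reg * cf)
    return predict, initial_mse, mse()

ratings = [
    ("alice", "comedy_1", "weekend", 5.0), ("alice", "comedy_1", "weekday", 2.0),
    ("alice", "drama_1", "weekend", 2.0), ("alice", "drama_1", "weekday", 4.0),
    ("bob", "comedy_1", "weekend", 4.0), ("bob", "drama_1", "weekday", 5.0),
    ("bob", "drama_1", "weekend", 1.0),
]
predict, before, after = train_cp(ratings)
```

Once trained, `predict("bob", "comedy_1", "weekday")` scores a (user, item, context) combination that never appears in the training tuples.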
Rating Dataset Analysis
Rating Dataset Analysis – From Matrix to Graph
Rating Dataset Analysis – Density & Connectivity
General Principle in Collaborative Filtering
The ability to learn anything about a user or an item is driven by its degree in the graph.
The ability to recommend an item to a user is driven by how connected they are in the graph.
Density and Sparsity
Density of a rating graph = n-ratings / (n-users × n-items), typically in [0.001, 0.01]
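Density here is the fraction of possible (user, item) pairs that carry a rating: the number of ratings divided by n-users × n-items. The minimal sketch below assumes ratings arrive as (user, item) pairs and counts each repeated pair once.

```python
from typing import Iterable, Tuple

def rating_density(ratings: Iterable[Tuple[str, str]]) -> float:
    """Density = number of distinct (user, item) edges / (n_users * n_items)."""
    edges = set(ratings)
    users = {u for u, _ in edges}
    items = {i for _, i in edges}
    return len(edges) / (len(users) * len(items))

ratings = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u3", "i3")]
density = rating_density(ratings)  # 4 ratings / (3 users * 3 items)
```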
Connectivity
Nothing can be learned from a user or an item with degree one.
Example: if one user has 100 ratings, all on items that have only that single rating, we can remove all these items, the user and its 100 ratings from the dataset.
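Removing degree-one nodes can push other nodes below the threshold, so the pruning has to be repeated until the graph stops changing. The result is the k-core of the bipartite rating graph. The minimal sketch below assumes ratings are (user, item) pairs and k = 2. It reproduces the example above, plus a cascading case with the hypothetical user u3.

```python
from collections import Counter
from typing import List, Tuple

def prune_min_degree(ratings: List[Tuple[str, str]], k: int = 2) -> List[Tuple[str, str]]:
    """Repeatedly drop ratings whose user or item has fewer than k ratings.

    Removing an item can lower a user's degree below k (and vice versa), so we
    iterate until nothing changes; the result is the k-core of the rating graph.
    """
    edges = list(set(ratings))
    while True:
        user_deg = Counter(u for u, _ in edges)
        item_deg = Counter(i for _, i in edges)
        kept = [(u, i) for u, i in edges if user_deg[u] >= k and item_deg[i] >= k]
        if len(kept) == len(edges):
            return kept
        edges = kept

# "heavy" rates 100 items nobody else rated; u3's only other item is rated by nobody else.
ratings = [("heavy", f"solo_{n}") for n in range(100)]
ratings += [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i2")]
ratings += [("u3", "i1"), ("u3", "i3")]
core = prune_min_degree(ratings)
```

Dropping ("u3", "i3") in the first pass leaves u3 with degree one, so u3 disappears in the second pass.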
Rating Dataset Analysis – Sub-graph of minimal degree 2
Rating Dataset Analysis – Use Case: Airbnb
Item & User Features
Reminder: Collaborative Filtering – User-Item
Reminder: Embedding Based Model
[Figure: user and item embeddings]
Reminder: Collaborative Filtering – Item-Item
Reminder: Similarity Based Model
[Figure: example item-item similarity matrix]
Items & Users Features
1. Quantitative Features
2. Knowledge Graph
3. Deep Content Extraction
Items & Users Features – Quantitative Features
Discrete
● number of episodes in TV shows
● number of purchases made by a user
Continuous
● price of item
● age of user
● movie budget
● release date
Items & Users Features – Quantitative Features
For similarity-based models (user-user or item-item):
concatenate the rating matrix with the feature matrix, and use the same similarity metric (e.g. dot product)
C = Cost, Y = Year, D = Duration
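The sketch below shows the concatenation for item-item similarity. It assumes each item is represented by its column of ratings (0 = unrated) followed by its quantitative features: cost, year and duration. Raw features such as a release year dwarf 1-5 ratings, so the sketch min-max scales each feature and applies an optional weight before concatenating. The scaling, the weight and the use of cosine similarity (a normalized dot product) are assumptions, not something the slide specifies.

```python
import math
from typing import Dict, List

def scale_features(features: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Min-max scale each feature column to [0, 1] across all items."""
    n = len(next(iter(features.values())))
    lo = [min(f[j] for f in features.values()) for j in range(n)]
    hi = [max(f[j] for f in features.values()) for j in range(n)]
    return {item: [(f[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j in range(n)]
            for item, f in features.items()}

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def item_vectors(ratings: Dict[str, List[float]], features: Dict[str, List[float]],
                 feature_weight: float = 1.0) -> Dict[str, List[float]]:
    """Concatenate each item's rating column with its scaled, weighted features."""
    scaled = scale_features(features)
    return {item: ratings[item] + [feature_weight * x for x in scaled[item]] for item in ratings}

# Hypothetical data: rating columns over 3 users, features = [cost, year, duration].
ratings = {"m1": [5, 0, 4], "m2": [4, 0, 5], "m3": [0, 5, 0]}
features = {"m1": [100.0, 1999, 120], "m2": [90.0, 2001, 110], "m3": [10.0, 2015, 90]}
vecs = item_vectors(ratings, features)
```

`feature_weight` controls how much the features count relative to the ratings. It is a tuning knob, not part of the original method.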
Items & Users Features – Quantitative Features
For embedding-based models (user-item):
compute embeddings from the rating matrix only, then concatenate the embeddings with the features
C = Cost, Y = Year, D = Duration
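In the embedding-based variant, the item embedding is learned from ratings alone and the features are appended afterwards. The minimal sketch below assumes the embeddings have already been computed; the values are made up. It also assumes the features are standardized (z-scored) so they sit on a scale comparable to the embedding coordinates. Standardization is a choice made here for illustration.

```python
import statistics
from typing import Dict, List

def standardize(features: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Z-score each feature column across items (zero mean, unit population std)."""
    n = len(next(iter(features.values())))
    cols = [[f[j] for f in features.values()] for j in range(n)]
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return {k: [(f[j] - means[j]) / stds[j] for j in range(n)] for k, f in features.items()}

def augment(embeddings: Dict[str, List[float]], features: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Append standardized features to each rating-only item embedding."""
    z = standardize(features)
    return {item: embeddings[item] + z[item] for item in embeddings}

# Hypothetical embeddings learned from ratings only; features = [cost, year, duration].
embeddings = {"m1": [0.12, -0.40], "m2": [0.10, -0.35], "m3": [-0.50, 0.22]}
features = {"m1": [100.0, 1999.0, 120.0], "m2": [90.0, 2001.0, 110.0], "m3": [10.0, 2015.0, 90.0]}
augmented = augment(embeddings, features)
```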
Items & Users Features – Knowledge Graph
One-to-many (Categorical)
● type of item
● author of a book
● gender of user
Many-to-many (Ontological)
● tags/labels/genres of an item
● all actors of a movie
● selected preferences of user
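Both kinds of relation can be flattened into a sparse item × entity matrix. A one-to-many attribute such as the author becomes a one-hot block; a many-to-many attribute such as genres becomes a multi-hot block. The minimal sketch below uses a set of active column indices per item as the sparse representation. The attribute names and books are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def encode_entities(items: Dict[str, Dict[str, object]]) -> Tuple[List[str], Dict[str, Set[int]]]:
    """Map each (attribute, value) pair to a column index; return the column names
    and, per item, the set of active columns (a sparse 0/1 row)."""
    columns: Dict[str, int] = {}
    rows: Dict[str, Set[int]] = {}
    for item, attrs in items.items():
        active = set()
        for attr, value in attrs.items():
            # One-to-many attributes hold a single value; many-to-many hold a list.
            values = value if isinstance(value, (list, tuple, set)) else [value]
            for v in values:
                name = f"{attr}={v}"
                if name not in columns:
                    columns[name] = len(columns)
                active.add(columns[name])
        rows[item] = active
    names = sorted(columns, key=columns.get)
    return names, rows

books = {
    "b1": {"author": "Austen", "genres": ["Romance", "Drama"]},
    "b2": {"author": "Austen", "genres": ["Romance"]},
    "b3": {"author": "Tolkien", "genres": ["Action", "Drama"]},
}
names, rows = encode_entities(books)
```

With 0/1 rows, the dot product between two items is simply the number of entities they share, e.g. `len(rows["b1"] & rows["b2"])`.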
Items & Users Features – Knowledge Graph
For similarity-based models (item-item, user-user):
concatenate the rating matrix with the knowledge graph, represented as a sparse matrix, and use the same similarity metric (e.g. dot product)
D = Drama, A = Action, R = Romance
Items & Users Features – Knowledge Graph
For embedding-based models (user-item):
We first need to convert the graph-based item features into dense vectors (dimensionality reduction), and then concatenate these vectors with the embeddings.
Items & Users Features – Deep Content Extraction
An item is more than its available metadata.
Encode information from:
● Images (CNN)
● Text Information (NLP)
● Audio (LSTM)
[Figure: genre prediction from plot summaries (Input → Prediction). Example inputs: “A documentary which examines the creation and co-production of the popular children’s television program in three developing countries: Bangladesh, Kosovo, and South Africa.” and “In his spectacular film debut, young Babar, King of the Elephants, must save his homeland from certain destruction by Rataxes and his band of invading rhinos.” Genre sets shown: Documentary, History; Comedy, Adventure, Family, Animation; Adventure, War, Documentary, Music.]
Items & Users Features – Deep Content Extraction – Images
Pre-trained Convolutional Neural Networks are widely available
● ResNet50
● VGG16
● AlexNet
Items & Users Features – Deep Content Extraction – Text
Pre-trained NLP models are widely available
● Word2vec, GloVe, FastText
● Skip-Thought
● Universal Sentence Encoder
● ELMo
Note: complex pre-trained models such as Bi-LSTMs do not transfer well across domains
Data Preprocessing
Goal
Given a (sparse) item-feature matrix I of shape (n-items, n-entities), find the best matrix W of shape (n-entities, d) such that IW is a dense (n-items, d) matrix that can be concatenated with the item embeddings.
Unsupervised vs Supervised
We call it “supervised dimensionality reduction” when the ratings are used to learn W.
Supervised reduction works better when the items with ratings largely overlap with the items with features.
Unsupervised reduction works better when you have many more items with features than items with ratings.
Data Preprocessing
1. Unsupervised Dimensionality Reduction
2. Supervised Dimensionality Reduction
Data Preprocessing – PCA
PCA (Principal Component Analysis) is a well-known feature extraction technique.
PCA projects the data into a new feature space with fewer dimensions than the original one, while retaining the most relevant information.
[Figure: projection from a feature space of dimension 3 to a feature space of dimension 2]
● PCA reduces the dimension of the input data by keeping the directions of highest variance.
● PCA can also be applied to sparse data.
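PCA can be sketched in pure Python with power iteration on the covariance matrix, deflating after each component. This is for illustration only. On a real item × entity matrix you would use a library SVD routine. For sparse input, that is typically a truncated SVD without centering, such as scikit-learn's `TruncatedSVD`, because centering destroys sparsity. The toy 3-D points are hypothetical and lie almost on a line, so one component should capture nearly all the variance.

```python
import math
import random
from typing import List

Matrix = List[List[float]]

def pca(X: Matrix, k: int, iters: int = 200, seed: int = 0):
    """Project rows of X onto the top-k principal components (power iteration + deflation)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]  # center the data
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components, variances = [], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient = variance along the found direction.
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        variances.append(lam)
        # Deflate: remove this direction so the next pass finds the next component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in Xc]
    return projected, components, variances

X = [[1, 2, 0.1], [2, 4, -0.1], [3, 6, 0.2], [4, 8, -0.2], [5, 10, 0.0]]
projected, components, variances = pca(X, k=2)
```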
Data Preprocessing – Unsupervised Random Projection
Random Projection (RP) is another dimensionality reduction technique.
We multiply I by a random matrix T; with high probability, pairwise distances between points are preserved after the transformation up to a bounded error.
Advantage
● RP is computationally more efficient than PCA
● It is useful in very high-dimensional settings
Disadvantage
● Unlike RP, PCA is the optimal linear projection from a space of dimension d to a space of dimension d′ (d ≥ d′)
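The sketch below draws T with independent Gaussian entries scaled by 1/√d, so squared distances are preserved in expectation. The Johnson-Lindenstrauss lemma bounds the distortion with high probability. The dimensions, sparsity level and seeds are hypothetical choices for illustration.

```python
import math
import random
from typing import List

def random_projection(X: List[List[float]], d: int, seed: int = 0) -> List[List[float]]:
    """Project rows of X (n x D) to d dims with a Gaussian matrix T (D x d) scaled by 1/sqrt(d)."""
    rng = random.Random(seed)
    D = len(X[0])
    T = [[rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(d)] for _ in range(D)]
    return [[sum(row[a] * T[a][b] for a in range(D)) for b in range(d)] for row in X]

def dist(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two sparse, high-dimensional item rows (hypothetical 0/1 entity indicators).
D = 1000
rng = random.Random(1)
X = [[1.0 if rng.random() < 0.05 else 0.0 for _ in range(D)] for _ in range(2)]
Y = random_projection(X, d=200)
ratio = dist(Y[0], Y[1]) / dist(X[0], X[1])  # close to 1 if distances are preserved
```

T does not depend on the data, which is why RP is cheap: no covariance matrix or decomposition is computed.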
Data Preprocessing – Unsupervised Deep Learning
Graph Embedding Algorithms
● node2vec
● DeepWalk
● LINE
They are not widely used, so robust tooling is lacking; implementations are mostly research code on GitHub in Python/C++.
Theoretical Remarks
They actually converge to a matrix factorization of a Laplacian-like normalization of the graph, but they may be more flexible and memory-friendly.
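DeepWalk first turns the graph into "sentences" of node IDs using truncated random walks. It then feeds those walks to a skip-gram (word2vec) model to learn node embeddings. The sketch below covers only the walk-generation step, with uniform transitions; node2vec replaces these with biased walks controlled by its p and q parameters. The walk count, walk length and toy item-entity graph are assumptions.

```python
import random
from typing import Dict, List

Graph = Dict[str, List[str]]  # undirected graph as adjacency lists

def random_walks(graph: Graph, walks_per_node: int = 2, walk_length: int = 5,
                 seed: int = 0) -> List[List[str]]:
    """Generate truncated uniform random walks: the 'corpus' DeepWalk feeds to skip-gram."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)  # start walks from nodes in a random order each round
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length and graph[walk[-1]]:
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks

# Hypothetical item-entity graph.
graph = {
    "movie_1": ["Drama", "Romance"], "movie_2": ["Drama", "Action"],
    "Drama": ["movie_1", "movie_2"], "Romance": ["movie_1"], "Action": ["movie_2"],
}
walks = random_walks(graph)
```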
Data Preprocessing – Supervised Linear Dimensionality Reduction
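Given a sparse rating matrix R of shape (n-users, n-items) and a sparse item-feature matrix I of shape (n-items, n-entities), find the best matrix W of shape (n-entities, d) to learn R with a linear model, e.g. R ≈ U (IW)ᵀ with U a user matrix of shape (n-users, d):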
✓ Works for both dense and sparse features
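The minimal sketch below illustrates supervised reduction. It assumes the linear model R ≈ U (IW)ᵀ: each rating is the dot product of a learned user vector with the item's projected features I[i]·W. Both are fitted by SGD with L2 regularization on the observed ratings only. The model form, hyperparameters and toy data are assumptions made here; the exact objective is not specified on the slide.

```python
import random
from typing import Dict, List, Set, Tuple

def train_feature_projection(ratings: List[Tuple[str, str, float]],
                             item_entities: Dict[str, Set[int]], n_entities: int,
                             d: int = 3, lr: float = 0.02, reg: float = 0.01,
                             epochs: int = 500, seed: int = 0):
    """Learn W (n_entities x d) and user vectors U so that r_ui ~ U[u] . (I[i] W).

    I[i] is a sparse 0/1 row, so I[i] W is just the sum of W's rows for item i's entities.
    """
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(n_entities)]
    U = {u: [rng.gauss(0.0, 0.1) for _ in range(d)] for u, _, _ in ratings}

    def item_vec(i: str) -> List[float]:
        return [sum(W[e][k] for e in item_entities[i]) for k in range(d)]

    def predict(u: str, i: str) -> float:
        return sum(a * b for a, b in zip(U[u], item_vec(i)))

    def mse() -> float:
        return sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings)

    before = mse()
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            q = item_vec(i)
            err = r - sum(a * b for a, b in zip(U[u], q))
            for k in range(d):
                pu = U[u][k]
                U[u][k] += lr * (err * q[k] - reg * pu)
                for e in item_entities[i]:
                    # Every entity of item i adds W[e] to its vector, so all get the same gradient.
                    W[e][k] += lr * (err * pu - reg * W[e][k])
    return W, item_vec, before, mse()

# Hypothetical data: entity 0 = Drama, 1 = Action, 2 = Romance; m5 has no ratings.
item_entities = {"m1": {0, 2}, "m2": {1}, "m3": {0}, "m4": {1, 2}, "m5": {0, 1}}
ratings = [("alice", "m1", 5.0), ("alice", "m2", 1.0), ("alice", "m3", 4.0),
           ("bob", "m2", 5.0), ("bob", "m4", 4.0), ("bob", "m1", 2.0)]
W, item_vec, before, after = train_feature_projection(ratings, item_entities, n_entities=3)
```

Item vectors come from W rather than from per-item parameters. As a result, an item that has features but no ratings (m5 here) still gets a dense vector.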
Data Preprocessing – Deep Learning Dimensionality Reduction
Directly add the knowledge graph to the training data (no longer a pre-processing step)
Learn embeddings for users, items, user entities and item entities jointly
Take-Home Message
The Right Dataset – Summary
Data > Pre-processing > Model
● The rating graph needs to be as dense and connected as possible
● Prefer explicit feedback over implicit feedback when you can collect it
● The type of the ratings (binary vs continuous) affects how you train models
● Having negative feedback is important
● Context adds information
● User features and item features add information, but require heavy pre-processing
Thank YOU

Weitere ähnliche Inhalte

Was ist angesagt?

Recommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model TrainingRecommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model TrainingCrossing Minds
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation enginesGeorgian Micsa
 
Matrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsMatrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsLei Guo
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed BanditsDongmin Lee
 
Talk@rmit 09112017
Talk@rmit 09112017Talk@rmit 09112017
Talk@rmit 09112017Shuai Zhang
 
K-Folds Cross Validation Method
K-Folds Cross Validation MethodK-Folds Cross Validation Method
K-Folds Cross Validation MethodSHUBHAM GUPTA
 
Recommendation Systems Basics
Recommendation Systems BasicsRecommendation Systems Basics
Recommendation Systems BasicsJarin Tasnim Khan
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Xavier Amatriain
 
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...Balázs Hidasi
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classificationSung Yub Kim
 
Recommendation system
Recommendation system Recommendation system
Recommendation system Vikrant Arya
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringViet-Trung TRAN
 
Recommendation system
Recommendation systemRecommendation system
Recommendation systemDing Li
 
KNN - Classification Model (Step by Step)
KNN - Classification Model (Step by Step)KNN - Classification Model (Step by Step)
KNN - Classification Model (Step by Step)Manish nath choudhary
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learningbutest
 
Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Alexandros Karatzoglou
 

Was ist angesagt? (20)

Recommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model TrainingRecommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model Training
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation engines
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recommender system
Recommender systemRecommender system
Recommender system
 
Temporal based Recommendation System
Temporal based Recommendation SystemTemporal based Recommendation System
Temporal based Recommendation System
 
Matrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsMatrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender Systems
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed Bandits
 
Talk@rmit 09112017
Talk@rmit 09112017Talk@rmit 09112017
Talk@rmit 09112017
 
K-Folds Cross Validation Method
K-Folds Cross Validation MethodK-Folds Cross Validation Method
K-Folds Cross Validation Method
 
Recommendation Systems Basics
Recommendation Systems BasicsRecommendation Systems Basics
Recommendation Systems Basics
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
 
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classification
 
Recommendation system
Recommendation system Recommendation system
Recommendation system
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
KNN - Classification Model (Step by Step)
KNN - Classification Model (Step by Step)KNN - Classification Model (Step by Step)
KNN - Classification Model (Step by Step)
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learning
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial
 

Ähnlich wie Recommender Systems from A to Z – The Right Dataset

Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation SystemsRobin Reni
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithmsnextlib
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Sudeep Das, Ph.D.
 
Big data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial UsecasesBig data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial UsecasesArvind Rapaka
 
Recsys 2018 overview and highlights
Recsys 2018 overview and highlightsRecsys 2018 overview and highlights
Recsys 2018 overview and highlightsSandra Garcia
 
Movie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial IntelligenceMovie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial IntelligenceHarivamshi D
 
Download
DownloadDownload
Downloadbutest
 
Download
DownloadDownload
Downloadbutest
 
Artificial intelligence and IoT
Artificial intelligence and IoTArtificial intelligence and IoT
Artificial intelligence and IoTVeselin Pizurica
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringChangsung Moon
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with sparkModern Data Stack France
 
Deep Reinforcement Learning based Recommendation with Explicit User-ItemInter...
Deep Reinforcement Learning based Recommendation with Explicit User-ItemInter...Deep Reinforcement Learning based Recommendation with Explicit User-ItemInter...
Deep Reinforcement Learning based Recommendation with Explicit User-ItemInter...Kishor Datta Gupta
 
IRIS.TV Talks Future of Video Personalization at Cross Campus LA
IRIS.TV Talks Future of Video Personalization at Cross Campus LAIRIS.TV Talks Future of Video Personalization at Cross Campus LA
IRIS.TV Talks Future of Video Personalization at Cross Campus LAIRIS.TV
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataAbhishek M Shivalingaiah
 
PyCon Balkans 2018 // Recommender systems - collaborative filtering and dimen...
PyCon Balkans 2018 // Recommender systems - collaborative filtering and dimen...PyCon Balkans 2018 // Recommender systems - collaborative filtering and dimen...
PyCon Balkans 2018 // Recommender systems - collaborative filtering and dimen...Mladen Jovanovic
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemMilind Gokhale
 

Ähnlich wie Recommender Systems from A to Z – The Right Dataset (20)

kdd2015
kdd2015kdd2015
kdd2015
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation Systems
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
 
Big data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial UsecasesBig data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial Usecases
 
Recsys 2018 overview and highlights
Recsys 2018 overview and highlightsRecsys 2018 overview and highlights
Recsys 2018 overview and highlights
 
Movie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial IntelligenceMovie recommendation Engine using Artificial Intelligence
Movie recommendation Engine using Artificial Intelligence
 
Download
DownloadDownload
Download
 
Download
DownloadDownload
Download
 
Artificial intelligence and IoT
Artificial intelligence and IoTArtificial intelligence and IoT
Artificial intelligence and IoT
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative Filtering
 
Hadoop France meetup Feb2016 : recommendations with spark
Hadoop France meetup  Feb2016 : recommendations with sparkHadoop France meetup  Feb2016 : recommendations with spark
Hadoop France meetup Feb2016 : recommendations with spark
 
Deep Reinforcement Learning based Recommendation with Explicit User-ItemInter...
Deep Reinforcement Learning based Recommendation with Explicit User-ItemInter...Deep Reinforcement Learning based Recommendation with Explicit User-ItemInter...
Deep Reinforcement Learning based Recommendation with Explicit User-ItemInter...
 
IRIS.TV Talks Future of Video Personalization at Cross Campus LA
IRIS.TV Talks Future of Video Personalization at Cross Campus LAIRIS.TV Talks Future of Video Personalization at Cross Campus LA
IRIS.TV Talks Future of Video Personalization at Cross Campus LA
 
Entity2rec recsys
Entity2rec recsysEntity2rec recsys
Entity2rec recsys
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big Data
 
PyCon Balkans 2018 // Recommender systems - collaborative filtering and dimen...
PyCon Balkans 2018 // Recommender systems - collaborative filtering and dimen...PyCon Balkans 2018 // Recommender systems - collaborative filtering and dimen...
PyCon Balkans 2018 // Recommender systems - collaborative filtering and dimen...
 
Dssg talk CNN intro
Dssg talk CNN introDssg talk CNN intro
Dssg talk CNN intro
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
 

Kürzlich hochgeladen

Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptxAlMamun560346
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...Lokesh Kothari
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Monika Rani
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...chandars293
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICEayushi9330
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 

Kürzlich hochgeladen (20)

Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...

Recommender Systems from A to Z – The Right Dataset

  • 2. Recommender Systems from A to Z Part 1: The Right Dataset Part 2: Model Training Part 3: Model Evaluation Part 4: Real-Time Deployment
  • 4. 1. Having the Right Data: Explicit vs Implicit; Likes/Dislikes vs Ratings 2. Rating Dataset Analysis: Density; Connectivity 3. Item Features and User Features: Unsupervised Learning; Supervised Learning 4. Data Preprocessing: Unsupervised Dimensionality Reduction; Supervised Dimensionality Reduction
  • 6. Having the Right Data – Explicit & Implicit Feedback Explicit Feedback Implicit Feedback
  • 7. Having the Right Data – Explicit & Implicit Feedback Explicit Feedback ● Expresses the preference directly ● Clean data (aligned with your goal) ● Costly to collect Implicit Feedback ● Offers a level of confidence in user preferences ● Very easy to collect in large volumes ● Dangerous to interpret
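Slide 7 says implicit feedback offers a level of confidence in user preferences. One widely used way to make that concrete is the weighting scheme of Hu, Koren and Volinsky (2008) for implicit-feedback matrix factorization. Under that scheme the binary preference is p = 1 if the raw count r > 0, and the confidence is c = 1 + alpha * r. The sketch below is a minimal illustration that assumes play counts as the implicit signal. The function name, the `alpha` values and the toy counts are illustrative assumptions.

```python
import numpy as np


def implicit_to_preference_confidence(counts, alpha=40.0):
    """Map raw implicit signals (e.g. play counts) to
    - a binary preference: p_ui = 1 if r_ui > 0 else 0
    - a confidence weight: c_ui = 1 + alpha * r_ui
    following the Hu-Koren-Volinsky weighting scheme (alpha is a tuning knob)."""
    counts = np.asarray(counts, dtype=float)
    preference = (counts > 0).astype(float)
    confidence = 1.0 + alpha * counts
    return preference, confidence


# Hypothetical play counts: 2 users x 3 items (0 = never played).
plays = np.array([[0, 3, 1],
                  [5, 0, 0]])
p, c = implicit_to_preference_confidence(plays, alpha=2.0)
print(p)
print(c)
```

A zero count is not read as a dislike. It becomes a 0 preference held with the lowest confidence, which is one way of living with the "dangerous to interpret" caveat.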
  • 8. Having the Right Data – Implicit vs Like/Dislike vs Ratings Explicit Feedback ● Classification (e.g. like/dislike/skip) ● Regression (e.g. star ratings) ● Ranking (e.g. pairwise comparison) Implicit Feedback ● With Implicit Negative Feedback (e.g. watch-time or play-time of media, like/skip action) ● Without Implicit Negative Feedback (e.g. like only, search history, purchase history)
  • 9. Having the Right Data – Implicit vs Like/Dislike vs Ratings Explicit Feedback ● Classification (e.g. like/dislike/skip) ● Regression (e.g. star ratings) => best data for predicting absolute taste ● Ranking (e.g. pairwise comparison) => best data for computing top-k recommendations Implicit Feedback ● With Implicit Negative Feedback (e.g. watch-time or play-time of media, like/skip action) ● Without Implicit Negative Feedback (e.g. like only, search history, purchase history) => test evaluation is biased and model selection is very hard
  • 10. Having the Right Data – Implicit vs Like/Dislike vs Ratings Explicit Feedback ● Classification (e.g. like/dislike/skip) ● Regression (e.g. star ratings) => best data for predicting absolute taste ● Ranking (e.g. pairwise comparison) => best data for computing top-k recommendations Implicit Feedback ● With Implicit Negative Feedback (e.g. watch-time or play-time of media, like/skip action) ● Without Implicit Negative Feedback (e.g. like only, search history, purchase history) => test evaluation is biased and model selection is very hard Take-home: the data you have affects how you train your models!
  • 11. Having the Right Data – Netflix
  • 12. Context “Context is any information that can be used to characterize the situation of an entity” - Anind K. Dey 2001
  • 13. Context “Context is any information that can be used to characterize the situation of an entity” - Anind K. Dey 2001 Representative context: fully observable and static. Interactive context: not fully observable and dynamic.
  • 15. Context – Model Rating dataset: instead of tuples (user, item, rating), we consider (user, item, context, rating). Model: for similarity-based models (user-user or item-item), we need to modify how we compute the similarity to take context into account. For matrix-factorization models (user-item), we need to add a dimension and use tensor factorization instead, which is much more challenging.
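Slide 15 notes that adding context turns matrix factorization into tensor factorization. One standard choice is a CP (PARAFAC) factorization. It predicts a rating as a three-way inner product of user, item and context factors. The sketch below trains such a model with plain SGD on (user, item, context, rating) tuples. It is an illustration under assumptions and not the deck's own method. The function names, hyperparameters and toy data are hypothetical.

```python
import numpy as np


def cp_predict(U, V, C, u, i, c):
    # CP (PARAFAC) model: rating = sum over factors of user * item * context.
    return float(np.sum(U[u] * V[i] * C[c]))


def train_cp(quads, n_users, n_items, n_contexts, k=4, lr=0.02, reg=0.01,
             epochs=500, seed=0):
    """Fit a CP tensor factorization by SGD (hypothetical helper).
    quads: list of (user, item, context, rating) tuples with integer ids."""
    rng = np.random.default_rng(seed)
    U = 0.3 * rng.standard_normal((n_users, k))
    V = 0.3 * rng.standard_normal((n_items, k))
    C = 0.3 * rng.standard_normal((n_contexts, k))
    for _ in range(epochs):
        for u, i, c, r in quads:
            err = r - cp_predict(U, V, C, u, i, c)
            # Descent directions for squared error plus L2 penalty, all
            # computed from current values before any row is updated.
            gu = err * V[i] * C[c] - reg * U[u]
            gv = err * U[u] * C[c] - reg * V[i]
            gc = err * U[u] * V[i] - reg * C[c]
            U[u] += lr * gu
            V[i] += lr * gv
            C[c] += lr * gc
    return U, V, C


# Hypothetical data: 2 users, 2 items, 2 contexts (e.g. 0 = weekday, 1 = weekend).
quads = [(0, 0, 0, 5), (0, 1, 1, 3), (1, 0, 1, 4), (1, 1, 0, 1), (0, 0, 1, 4)]
U, V, C = train_cp(quads, n_users=2, n_items=2, n_contexts=2)
print(cp_predict(U, V, C, 0, 0, 0))
```

The extra factor matrix C is what makes this harder than matrix factorization. Each context value needs enough ratings to be learned, so the tensor is sparser than the original matrix.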
  • 17. Rating Dataset Analysis – From Matrix to Graph
  • 19. Rating Dataset Analysis – Density & Connectivity General principle in collaborative filtering: the ability to learn anything about a user or an item is driven by its degree in the graph. The ability to recommend an item to a user is driven by how connected they are in the graph. Density and sparsity: the density of a graph with users, items and ratings is $\text{density} = \frac{n_{\text{ratings}}}{n_{\text{users}} \cdot n_{\text{items}}}$ (typically in [0.001, 0.01]). Connectivity: nothing is learnt from a user or an item with degree one. Example: if one user has 100 ratings on items that each have only one rating, we can remove all these items, the user and its 100 ratings from the dataset.
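The density on slide 19 can be computed directly from the rating tuples. The sketch below is a minimal version. It assumes ratings arrive as (user, item, rating) tuples and that users or items with no ratings do not appear in the data. The function name is hypothetical.

```python
def rating_density(ratings):
    """Density = n_ratings / (n_users * n_items) for ratings given as
    (user, item, rating) tuples; duplicate (user, item) pairs count once."""
    pairs = {(u, i) for u, i, _ in ratings}
    users = {u for u, _ in pairs}
    items = {i for _, i in pairs}
    return len(pairs) / (len(users) * len(items))


# Hypothetical toy ratings: 3 users, 3 items, 4 ratings.
ratings = [("alice", "m1", 5), ("alice", "m2", 3),
           ("bob", "m1", 4), ("carol", "m3", 2)]
print(rating_density(ratings))  # 4 / (3 * 3), about 0.444
```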
  • 20. Rating Dataset Analysis – Sub-graph of minimal degree 2
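Slides 20 to 24 illustrate pruning the rating graph down to a sub-graph in which every user and every item has at least two ratings. The sketch below does this by repeatedly dropping users and items below the degree threshold until nothing changes, which yields the k-core of the bipartite user-item graph. The function name and toy data are hypothetical. The repeated passes matter: removing items rated only once can push a user below the threshold, as in the 100-ratings example on slide 19.

```python
from collections import Counter


def min_degree_subgraph(ratings, k=2):
    """Return the ratings of the largest sub-graph in which every user and
    every item has at least k ratings (the k-core of the bipartite graph)."""
    edges = {(u, i): r for u, i, r in ratings}
    while True:
        user_deg = Counter(u for u, _ in edges)
        item_deg = Counter(i for _, i in edges)
        # Drop every rating attached to a user or item below the threshold.
        kept = {(u, i): r for (u, i), r in edges.items()
                if user_deg[u] >= k and item_deg[i] >= k}
        if len(kept) == len(edges):  # nothing removed: fixed point reached
            return [(u, i, r) for (u, i), r in kept.items()]
        edges = kept


# Hypothetical data: u3 rated c, d, e, which nobody else rated. Once those
# items go, u3 keeps a single rating and is removed in the next pass.
ratings = [("u1", "a", 5), ("u1", "b", 3), ("u2", "a", 4), ("u2", "b", 2),
           ("u3", "a", 1), ("u3", "c", 5), ("u3", "d", 4), ("u3", "e", 3)]
print(min_degree_subgraph(ratings))
```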
  • 25. Rating Dataset Analysis - Use Case: Airbnb
  • 26. Item & User Features
  • 28. Reminder: Embedding-Based Model (figure: User and Item embedding vectors)
  • 30. Reminder: Similarity-Based Model (figure: example similarity matrix)
  • 31. Items & Users Features 1. Quantitative Features 2. Knowledge Graph 3. Deep Content Extraction
  • 32. Items & Users Features – Quantitative Features Discrete ● number of episodes in a TV show ● number of purchases made by a user Continuous ● price of an item ● age of a user ● movie budget ● release date
  • 33. Items & Users Features – Quantitative Features For similarity-based models (user-user or item-item): concatenate the rating matrix and the features, and use the same similarity metric (e.g. dot product). Figure legend: C = Cost, Y = Year, D = Duration
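Here is a minimal sketch of the similarity-based approach on slide 33. It assumes NumPy arrays, 0 for missing ratings, and item-item similarity. The standardization step and the `feature_weight` parameter are assumptions added for this sketch. Raw cost, year and duration live on very different scales and would otherwise dominate the dot product. The function name and feature values are hypothetical.

```python
import numpy as np


def item_similarity_with_features(R, F, feature_weight=1.0):
    """Item-item dot-product similarity on [rating column | item features].

    R: (n_users, n_items) ratings, 0 = missing.
    F: (n_items, n_features) quantitative item features (e.g. C, Y, D).
    """
    R = np.asarray(R, dtype=float)
    F = np.asarray(F, dtype=float)
    # Standardize each feature column (assumption) so cost does not swamp year.
    std = F.std(axis=0)
    std[std == 0] = 1.0
    F_scaled = (F - F.mean(axis=0)) / std
    X = np.hstack([R.T, feature_weight * F_scaled])  # one row per item
    return X @ X.T


R = np.array([[5, 0, 3],
              [4, 1, 0]])           # 2 users x 3 items
F = np.array([[100.0, 1990, 120],   # cost, year, duration (made up)
              [20.0, 2010, 90],
              [90.0, 1995, 110]])
S = item_similarity_with_features(R, F)
```

With `feature_weight=0.0`, the result reduces to the plain rating-based similarity `R.T @ R`. That makes it easy to measure how much the features change the neighbourhoods.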
  • 34. Items & Users Features – Quantitative Features For embedding-based models (user-item): compute embeddings from the rating matrix only, then concatenate the embeddings with the features. Figure legend: C = Cost, Y = Year, D = Duration
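For the embedding-based variant on slide 34, the rating matrix is factorized first and the features are appended afterwards. The sketch below uses a plain truncated SVD of the rating matrix as a stand-in for whatever factorization model is actually used. This is an assumption: SVD treats missing ratings as observed zeros, which a real recommender model would not do. The function name and data are hypothetical.

```python
import numpy as np


def item_embeddings_with_features(R, F, d=2):
    """Learn item embeddings from the rating matrix only, then append features.

    A truncated SVD stands in for the real factorization model here; note that
    it treats missing ratings (zeros) as observed zeros.
    """
    R = np.asarray(R, dtype=float)
    _, s, Vt = np.linalg.svd(R, full_matrices=False)
    item_emb = Vt[:d].T * np.sqrt(s[:d])   # (n_items, d)
    return np.hstack([item_emb, np.asarray(F, dtype=float)])


R = np.array([[5, 0, 3],
              [4, 1, 0]])           # 2 users x 3 items
F = np.array([[100.0, 1990, 120],   # cost, year, duration (made up)
              [20.0, 2010, 90],
              [90.0, 1995, 110]])
E = item_embeddings_with_features(R, F, d=2)   # shape (3, 2 + 3)
```

In practice the raw features would usually be rescaled before concatenation, so that they sit on a scale comparable to the embeddings.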
  • 35. Items & Users Features – Knowledge Graph 1. Quantitative Features 2. Knowledge Graph 3. Deep Content Extraction
  • 36. Items & Users Features – Knowledge Graph One-to-many (Categorical) ● type of item ● author of a book ● gender of user Many-to-many (Ontological) ● tags/labels/genres of an item ● all actors of a movie ● selected preferences of user
  • 37. Items & Users Features – Knowledge Graph For similarity-based models (item-item, user-user): concatenate the rating matrix and the knowledge graph seen as a sparse matrix, and use the same similarity metric (e.g. dot product). Figure legend: D = Drama, A = Action, R = Romance
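The knowledge-graph variant on slide 37 works the same way once the many-to-many relation is encoded as a binary item-by-tag matrix. The sketch below uses dense NumPy arrays for readability. For a real catalog, a sparse matrix type such as those in `scipy.sparse` would be the natural choice. The genre vocabulary mirrors the slide's legend. The ratings and the helper name are made up.

```python
import numpy as np


def multi_hot(item_tags, vocabulary):
    """Encode a many-to-many item-tag relation as a binary (n_items, n_tags) matrix."""
    index = {tag: j for j, tag in enumerate(vocabulary)}
    M = np.zeros((len(item_tags), len(vocabulary)))
    for row, tags in enumerate(item_tags):
        for tag in tags:
            M[row, index[tag]] = 1.0
    return M


genres = ["Drama", "Action", "Romance"]                         # D, A, R
item_tags = [["Drama", "Romance"], ["Action"], ["Drama", "Action"]]
G = multi_hot(item_tags, genres)                                # (3 items, 3 genres)

R = np.array([[5, 0, 3],
              [4, 1, 0]], dtype=float)                          # 2 users x 3 items
X = np.hstack([R.T, G])   # each item: its rating column followed by its genres
S = X @ X.T               # dot-product item-item similarity
```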
  • 38. Items & Users Features – Knowledge Graph For embedding-based models (user-item): we first need to convert the graph-based item features into dense vectors (dimensionality reduction), and then concatenate these vectors to the embeddings.
  • 39. Items & Users Features – Deep Content Extraction 1. Quantitative Features 2. Knowledge Graph 3. Deep Content Extraction
  • 40. Items & Users Features – Deep Content Extraction An item is more than its available metadata. Encode information from: ● Images (CNN) ● Text (NLP) ● Audio (LSTM) Example (figure): genre prediction from plot summaries. Inputs: “A documentary which examines the creation and co-production of the popular children’s television program in three developing countries: Bangladesh, Kosovo, and South Africa.” and “In his spectacular film debut, young Babar, King of the Elephants must save his homeland from certain destruction by Rataxes and his band of invading rhinos.” Genre sets shown in the figure: Comedy, Adventure, Family, Animation; Documentary, History; Comedy, Adventure, Family, Animation; Adventure, War, Documentary, Music
  • 41. Items & Users Features – Deep Content Extraction – Images Pre-trained convolutional neural networks are widely available: ● ResNet50 ● VGG16 ● AlexNet
  • 42. Items & Users Features – Deep Content Extraction – Text Pre-trained NLP models are widely available: ● Word2vec, GloVe, FastText ● SkipThought ● Universal Sentence Encoders ● ELMo Note: complex pre-trained models like Bi-LSTMs do not transfer well across domains
  • 44. Data Preprocessing Goal: given a (sparse) matrix of item features I (n-items, n-entities), find the best matrix W (n-entities, d) so that IW is a dense matrix (n-items, d) that can be concatenated to the item embeddings. Unsupervised vs supervised: we call it “supervised dimensionality reduction” when the ratings are used. Supervised works better if the items with ratings are aligned with the items with features. Unsupervised works better if you have many more items with features than items with ratings.
  • 45. Data Preprocessing 1. Unsupervised Dimensionality Reduction 2. Supervised Dimensionality Reduction
  • 46. Data Preprocessing – PCA PCA (Principal Component Analysis) is a well-known feature-extraction technique. PCA projects the data into a new feature space with fewer dimensions than the original one while retaining the most relevant information. Figure: a feature space of dimension 3 projected onto a feature space of dimension 2. ● PCA reduces the dimension of the input data by keeping the directions of highest variance. ● PCA can also be applied to sparse data.
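Here is a minimal PCA sketch that uses the SVD of the centered data and assumes a dense NumPy array. The function name and toy data are hypothetical. In the toy example the third coordinate is the sum of the first two, so two components retain essentially all the variance. This mirrors the 3-to-2 dimension picture on slide 46.

```python
import numpy as np


def pca(X, d):
    """Project X (n_samples, n_features) onto its top-d principal components.

    Returns the projected data, the (n_features, d) projection matrix W and
    the fraction of the total variance retained.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                          # PCA works on centered data
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:d].T                                     # directions of highest variance
    retained = (s[:d] ** 2).sum() / (s ** 2).sum()
    return Xc @ W, W, retained


# Hypothetical 3-D data whose third coordinate is the sum of the first two,
# so the points actually lie on a 2-D plane.
rng = np.random.default_rng(0)
xy = rng.standard_normal((50, 2))
X = np.column_stack([xy, xy.sum(axis=1)])
Z, W, retained = pca(X, d=2)
print(Z.shape, round(retained, 6))
```

For a sparse item-feature matrix, explicit centering would destroy sparsity. Scikit-learn's `TruncatedSVD` skips centering and is a common alternative in that case.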
  • 47. Data Preprocessing – Unsupervised Random Projection Random Projection (RP) is another dimensionality-reduction technique. We multiply I by a random matrix T; with high probability, the distance between any two points is preserved after the transformation within a certain error. Advantages ● RP is computationally more efficient than PCA ● It is useful in very high-dimensional scenarios Disadvantage ● RP is not optimal: PCA is the optimal linear projection from a space of dimension d to a space of dimension d' (d ≥ d')
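Below is a minimal sketch of a Gaussian random projection, one common way to build the random matrix T. Entries of T are drawn i.i.d. from N(0, 1/d), so squared distances are preserved in expectation. The Johnson-Lindenstrauss lemma bounds the distortion with high probability when d is large enough. The function name, seeds and sizes are illustrative assumptions.

```python
import numpy as np


def random_projection(X, d, seed=0):
    """Gaussian random projection of X (n_samples, n_features) to d dimensions.

    T has i.i.d. N(0, 1/d) entries, so squared distances are preserved in
    expectation; by the Johnson-Lindenstrauss lemma the distortion is small
    with high probability when d is large enough.
    """
    rng = np.random.default_rng(seed)
    T = rng.normal(0.0, 1.0 / np.sqrt(d), size=(X.shape[1], d))
    return X @ T


# Check distance preservation on random high-dimensional data.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 5000))
Y = random_projection(X, d=1000)
ratios = [np.linalg.norm(Y[a] - Y[b]) / np.linalg.norm(X[a] - X[b])
          for a in range(20) for b in range(a + 1, 20)]
print(min(ratios), max(ratios))   # both close to 1
```

Unlike PCA, T never looks at the data. That is why RP is cheap, and also why it cannot be optimal for a given dataset.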
  • 48. Data Preprocessing – Unsupervised Deep Learning Graph embeddings Algorithms ● Node2vec ● DeepWalk ● LINE These are not widely used, so robust tooling is lacking; implementations are mostly on GitHub in Python/C++. Theoretical remarks: they implicitly converge to a matrix factorization of a Laplacian-like normalization of the graph, but they can be more flexible and memory-friendly.
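DeepWalk samples uniform random walks on the graph. It then trains a skip-gram (word2vec) model on the walks as if they were sentences. The sketch below covers only the walk-sampling stage, on a hypothetical item-genre graph. The skip-gram stage would typically be delegated to an existing word2vec implementation, for example gensim's `Word2Vec` with `sg=1`. The function name and parameters are assumptions.

```python
import random


def random_walks(adjacency, walks_per_node=10, walk_length=8, seed=0):
    """Walk-sampling stage of DeepWalk: uniform random walks from every node.

    adjacency: dict mapping each node to a list of its neighbours.
    Returns a list of walks (lists of nodes) to be used as skip-gram 'sentences'.
    """
    rng = random.Random(seed)
    nodes = list(adjacency)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)                 # visit start nodes in random order
        for start in nodes:
            walk = [start]
            # Stop early only if the current node has no neighbours.
            while len(walk) < walk_length and adjacency[walk[-1]]:
                walk.append(rng.choice(adjacency[walk[-1]]))
            walks.append(walk)
    return walks


# Hypothetical item-genre graph (undirected, so edges are listed both ways).
graph = {
    "movie1": ["Drama"],
    "movie2": ["Drama", "Action"],
    "Drama": ["movie1", "movie2"],
    "Action": ["movie2"],
}
walks = random_walks(graph)
print(walks[0])
```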
  • 49. Data Preprocessing 1. Unsupervised Dimensionality Reduction 2. Supervised Dimensionality Reduction
  • 50. Data Preprocessing – Supervised Linear Dimensionality Reduction Given a sparse R (n-users, n-items) and a sparse I (n-items, n-entities), find the best matrix W (n-entities, d) to learn R with a linear model: ✓ Works for both dense and sparse features
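The transcript of slide 50 does not give the exact form of the linear model. The sketch below assumes one plausible form, and that form is our assumption, not the deck's equation. Each user gets a free vector P[u], each item is represented by I @ W, and a rating is predicted as their dot product. The model is fitted by full-batch gradient descent on the observed entries with L2 regularization. The function name, hyperparameters and toy data are hypothetical.

```python
import numpy as np


def supervised_linear_reduction(R, mask, I, d=2, lr=0.005, reg=0.01,
                                steps=3000, seed=0):
    """Assumed model: learn W (n_entities, d) such that, with free user vectors P,
        R[u, i] ~= P[u] . (I @ W)[i]   for every observed (u, i),
    by full-batch gradient descent on squared error with L2 regularization.

    R, mask: (n_users, n_items); mask is 1 where a rating is observed.
    I: (n_items, n_entities) item-feature matrix (dense here; a scipy.sparse
    matrix also supports the two products used below).
    """
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((R.shape[0], d))
    W = 0.1 * rng.standard_normal((I.shape[1], d))
    for _ in range(steps):
        Q = I @ W                          # dense item vectors (n_items, d)
        E = mask * (P @ Q.T - R)           # residuals on observed ratings only
        grad_P = 2 * E @ Q + 2 * reg * P
        grad_W = 2 * (I.T @ (E.T @ P)) + 2 * reg * W
        P -= lr * grad_P
        W -= lr * grad_W
    return P, W


# Hypothetical data: 4 users x 3 items (0 = missing), 3 items x 4 entities.
R = np.array([[5, 3, 0], [4, 0, 1], [0, 1, 5], [1, 0, 4]], dtype=float)
mask = (R > 0).astype(float)
I = np.array([[1, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1]], dtype=float)
P, W = supervised_linear_reduction(R, mask, I, d=2)
item_vectors = I @ W   # dense (n_items, d) vectors to concatenate to embeddings
```

Once W is learned, any item that has entity features but no ratings still gets a dense vector I @ W. This is the point of doing the reduction in a supervised way.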
  • 51. Data Preprocessing – Deep Learning Dimensionality Reduction Directly add the Knowledge Graph as part of the training data (not pre-processing anymore) Learn embeddings for user, item, user-entities, item-entities together
  • 54. The Right Dataset – Summary Data > Pre-processing > Model ● The rating graph needs to be as dense and connected as possible ● Prefer explicit feedback over implicit feedback when you can collect it ● The type of the ratings (binary vs continuous) affects how you train models ● Having negative feedback is important ● Context adds information ● User features and item features add information, but require heavy pre-processing