Transfer Learning for Collective Link Prediction in Multiple Heterogeneous Domains
Cao et al., ICML 2010
Presented by Danushka Bollegala.
 Predict links (relations) between entities
 Recommend items for users (MovieLens, Amazon)
 Recommend users for users (social recommendation)
 Similarity search (suggest similar web pages)
 Query suggestion (suggest related queries by other users)
 Collective Link Prediction (CLP)
 Perform multiple prediction tasks for the same set of users simultaneously
▪ Predict/recommend multiple item types (books and movies)
 Pros
 Prediction tasks might not be independent; one task can benefit from another (books vs. movies vs. food)
 Less affected by data sparseness (the cold-start problem)
Transfer Learning + Collective Link Prediction (this paper)
 Gaussian Process Regression (GPR) (PRML Sec. 6.4)
 Link prediction = matrix factorization
 Probabilistic Principal Component Analysis (PPCA) (Tipping & Bishop, 1999; PRML Chapter 12)
 Probabilistic non-linear matrix factorization (Lawrence & Urtasun, ICML 2009)
 Task similarity matrix, T
 Link matrix X (x_{i,j} is the rating given by user i to item j)
 x_{i,j} is modeled by f(u_i, v_j, ε)
 f: link function
 u_i: latent representation of user i
 v_j: latent representation of item j
 ε: noise term
 Generalized matrix approximation (reconstructed below)
 Assumption: the noise E is Gaussian, E ~ N(0, σ²I)
 Use Y = f^{-1}(X)
 Then Y follows a multivariate Gaussian distribution.
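The model equations on this slide were images in the original deck; a reconstruction consistent with the slide's notation (and with the matrix-factorization view of Lawrence & Urtasun, 2009) is:

```latex
% Generalized matrix approximation: ratings are a link function applied
% to a low-rank latent product plus Gaussian noise.
X = f\!\left(UV^{\top} + E\right), \qquad E \sim \mathcal{N}(0,\, \sigma^{2} I)

% Inverting the link function recovers a matrix that is multivariate
% Gaussian given the latent factors:
Y = f^{-1}(X) = UV^{\top} + E
```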
Revision (PRML Section 6.4)
 We can view a function as an infinite-dimensional vector
 f: (f(x_1), f(x_2), ...)^T
 Each point in the domain is mapped by f to a dimension of the vector
 In machine learning we must find functions (e.g. linear predictors) that map input values to their corresponding output values
 We must also avoid over-fitting
 This can be visualized as sampling from a distribution over functions with certain properties
 Preference bias (cf. restriction bias)
 Linear regression model
 We get different output functions y for different weight vectors w.
 Let us impose a Gaussian prior over w
 Training dataset: {(x_1, y_1), ..., (x_N, y_N)}
 Targets: y = (y_1, ..., y_N)^T
 Design matrix (the standard formulas are reconstructed below)
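The equations on this slide were images; the standard PRML Sec. 6.4 quantities they refer to are:

```latex
% Linear model in a feature space, with a Gaussian prior on the weights:
y(\mathbf{x}) = \mathbf{w}^{\top} \boldsymbol{\phi}(\mathbf{x}), \qquad
p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0},\, \alpha^{-1} I)

% Design matrix: one row per training input, one column per basis function.
\Phi_{nk} = \phi_{k}(\mathbf{x}_{n}), \qquad \mathbf{y} = \Phi \mathbf{w}
```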
 When we impose a Gaussian prior over the weight vector, the target vector y is also Gaussian.
 K: kernel matrix (Gram matrix)
 k: kernel function
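A reconstruction of the marginal distribution the slide shows (PRML eqs. 6.53-6.54):

```latex
% y = \Phi w is a linear map of a Gaussian, hence itself Gaussian:
p(\mathbf{y}) = \mathcal{N}(\mathbf{y} \mid \mathbf{0},\, K), \qquad
K = \alpha^{-1} \Phi \Phi^{\top}

% Elements of the Gram matrix are kernel evaluations:
K_{nm} = k(\mathbf{x}_{n}, \mathbf{x}_{m})
       = \alpha^{-1} \boldsymbol{\phi}(\mathbf{x}_{n})^{\top} \boldsymbol{\phi}(\mathbf{x}_{m})
```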
 A Gaussian process is defined as a probability distribution over functions y(x) such that the set of values y(x_1), ..., y(x_N) evaluated at an arbitrary set of points x_1, ..., x_N jointly have a Gaussian distribution.
 p(y(x_1), ..., y(x_N)) is Gaussian.
 Often the mean is set to zero
 Non-informative prior
 Then the kernel function fully defines the GP.
 Gaussian kernel
 Exponential kernel
(Both kernel formulas are reconstructed below.)
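The kernel formulas were images in the deck; the standard forms (cf. PRML Sec. 6.4) are:

```latex
% Gaussian (squared-exponential) kernel:
k(\mathbf{x}, \mathbf{x}') = \exp\!\left( -\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}}{2\sigma^{2}} \right)

% Exponential (Ornstein--Uhlenbeck) kernel:
k(\mathbf{x}, \mathbf{x}') = \exp\!\left( -\theta \lVert \mathbf{x} - \mathbf{x}' \rVert \right)
```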
 Predict outputs with noise
[Figure: GP regression with noisy observations; inputs x, function y, noise ε, targets t]
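A reconstruction of the noisy-GP predictive equations this slide relies on (PRML eqs. 6.66-6.67):

```latex
% Noise model and covariance of the observed targets:
t_{n} = y(\mathbf{x}_{n}) + \varepsilon_{n}, \qquad
C_{N} = K + \beta^{-1} I

% Predictive mean and variance at a new input x_{N+1}, where
% k_n = k(x_n, x_{N+1}) and c = k(x_{N+1}, x_{N+1}) + \beta^{-1}:
m(\mathbf{x}_{N+1}) = \mathbf{k}^{\top} C_{N}^{-1} \mathbf{t}, \qquad
\sigma^{2}(\mathbf{x}_{N+1}) = c - \mathbf{k}^{\top} C_{N}^{-1} \mathbf{k}
```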
 PMF can be seen as a Gaussian process with latent variables (GP-LVM) [Lawrence & Urtasun, ICML 2009]
 Generalized matrix approximation model
 Y = f^{-1}(X) follows a multivariate Gaussian distribution
 A Gaussian prior is set on U
 Probabilistic PCA model by Tipping & Bishop (1999)
 Non-linear version
 Mapping back to X
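The diagram on this slide was an image; a hedged sketch of the marginalization it depicts, following the GP-LVM construction (the Gaussian prior on U integrated out):

```latex
% With user rows u_i ~ N(0, I) marginalized, each row of Y = UV^T + E is
% Gaussian with a covariance that is a linear kernel in the item factors V:
p(Y \mid V) = \prod_{i=1}^{n}
  \mathcal{N}\!\left( \mathbf{y}_{i,:} \mid \mathbf{0},\; V V^{\top} + \sigma^{2} I \right)

% The non-linear version replaces the linear kernel VV^T with a general
% kernel matrix K(V), i.e. a GP-LVM.
```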
 GP model for each task
 A single model for all tasks
 Known as the Kronecker product of two matrices (e.g., numpy.kron(a, b)); a short numpy sketch follows
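A minimal numpy sketch of how a task-similarity matrix T can be coupled with a base kernel K via the Kronecker product to form a joint multi-task covariance; the matrix values here are hypothetical:

```python
import numpy as np

# Hypothetical task-similarity matrix T (2 tasks) and base kernel K
# (3 latent user representations); the numbers are made up.
T = np.array([[1.0, 0.8],
              [0.8, 1.0]])
K = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.5],
              [0.1, 0.5, 1.0]])

# np.kron(T, K) couples every pair of tasks with every pair of users,
# yielding the (m*n) x (m*n) covariance of the joint multi-task GP.
K_joint = np.kron(T, K)
print(K_joint.shape)  # (6, 6)
```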
 Each task might have a different rating distribution.
 c, α, b are parameters of the link function that must be estimated from the data.
 The constraint α > 0 can be relaxed if we have no prior knowledge about whether the rating distribution is negatively skewed.
 Similar to GPR prediction
 Predicting y = g(x)
 Predicting x
 Compute the likelihood of the dataset
 Use stochastic gradient descent (SGD) for optimization (a sketch follows below)
 Non-convex optimization
 Sensitive to initial conditions
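Because the objective is non-convex and sensitive to initialization, a common remedy is SGD with multiple random restarts. A hypothetical sketch; neg_log_lik and grad stand in for the model-specific negative log-likelihood and its per-example gradient, which the paper derives:

```python
import numpy as np

def fit_with_restarts(neg_log_lik, grad, data, dim,
                      restarts=5, lr=1e-2, epochs=100, seed=0):
    """SGD over a non-convex objective, keeping the best of several
    random initialisations (neg_log_lik and grad are placeholders)."""
    rng = np.random.default_rng(seed)
    best_params, best_nll = None, np.inf
    for _ in range(restarts):
        params = 0.1 * rng.standard_normal(dim)   # random initialisation
        for _ in range(epochs):
            rng.shuffle(data)                      # reshuffle each epoch
            for example in data:                   # one stochastic update
                params -= lr * grad(params, example)
        nll = neg_log_lik(params, data)
        if nll < best_nll:                         # keep the best restart
            best_params, best_nll = params, nll
    return best_params
```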
 Setting
 Use each dataset and predict multiple item types
 Datasets
 MovieLens
▪ 100,000 ratings on a 1-5 scale, 943 users, 1,682 movies, 5 popular genres
 Book-Crossing
▪ 56,148 ratings on a 1-10 scale, 28,503 users, 9,909 books, 4 most general Amazon book categories
 Douban
▪ A social network-based recommendation service
▪ 10,000 users, 200,000 items
▪ Movies, books, music
 Evaluation measure
 Mean Absolute Error (MAE): the mean of |x̂_{i,j} − x_{i,j}| over held-out ratings
 Baselines
 I-GP: independent link prediction using a GP
 CMF: collective matrix factorization
▪ non-GP, classical NMF
 M-GP: joint link prediction using a multi-relational GP
▪ does not consider the similarity between tasks
 Proposed method = CLP-GP
[Results tables] Note: (1) smaller values are better; (2) (+)/(−) indicates with/without the link function.
 Romance and Drama are very similar
 Action and Comedy are very dissimilar
 Elegant model and well-written paper
 Few parameters (latent space dimensionality k) need to be specified
 All other parameters can be learnt
 Applicable to a wide range of tasks
 Cons:
 Computational complexity
▪ Predictions require inverting the kernel matrix (O(N³) for an N×N kernel)
▪ SGD updates might not converge
▪ The problem is non-convex...