SlideShare ist ein Scribd-Unternehmen logo
1 von 82
Downloaden Sie, um offline zu lesen
Factorization Models & Polynomial Regression Factorization Machines Applications Summary 
Factorization Machines 
Steen Rendle 
Current aliation: Google Inc. 
Work was done at University of Konstanz 
MLConf, November 14, 2014 
Steen Rendle 1 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Outline 
Factorization Models  Polynomial Regression 
Factorization Models 
Linear/ Polynomial Regression 
Comparison 
Factorization Machines 
Applications 
Summary 
Steen Rendle 2 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Matrix Factorization 
Example for data: Matrix Factorization: 
Movie 
TI NH SW ST ... 
5 3 1 ? ... 
? ? 4 5 ... 
1 ? 5 ? ... 
... ... ... ... ... 
A 
B 
C 
... 
User 
^ Y := W Ht ; W 2 RjUjk ;H 2 RjIjk 
k is the rank of the reconstruction. 
Steen Rendle 3 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Matrix Factorization 
Example for data: Matrix Factorization: 
Movie 
TI NH SW ST ... 
5 3 1 ? ... 
? ? 4 5 ... 
1 ? 5 ? ... 
... ... ... ... ... 
A 
B 
C 
... 
User 
^ Y := W Ht ; W 2 RjUjk ;H 2 RjIjk 
^y(u; i) = ^yu;i = 
Xk 
f =1 
wu;f hi ;f = hwu; hi i 
k is the rank of the reconstruction. 
Steen Rendle 3 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Matrix Factorization  Extensions 
Example for data: Examples for models: 
Movie 
TI NH SW ST ... 
5 3 1 ? ... 
? ? 4 5 ... 
1 ? 5 ? ... 
... ... ... ... ... 
A 
B 
C 
... 
User 
^yMF(u; i ) := 
Xk 
f =1 
vu;f vi ;f = hvu; vi i 
Steen Rendle 4 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Matrix Factorization  Extensions 
Example for data: Examples for models: 
Movie 
TI NH SW ST ... 
5 3 1 ? ... 
? ? 4 5 ... 
1 ? 5 ? ... 
... ... ... ... ... 
A 
B 
C 
... 
User 
^yMF(u; i ) := 
Xk 
f =1 
vu;f vi ;f = hvu; vi i 
^ySVD++(u; i) := 
* 
vu + 
X 
j2N(u) 
vj ; vi 
+ 
^yFact-KNN(u; i ) := 
1 
jR(u)j 
X 
j2R(u) 
ru;j hvi ; vj i 
Steen Rendle 4 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Matrix Factorization  Extensions 
Example for data: Examples for models: 
Movie 
TI NH SW ST ... 
5 3 1 ? ... 
? ? 4 5 ... 
1 ? 5 ? ... 
... ... ... ... ... 
A 
B 
C 
... 
User 
^yMF(u; i ) := 
Xk 
f =1 
vu;f vi ;f = hvu; vi i 
^ySVD++(u; i) := 
* 
vu + 
X 
j2N(u) 
vj ; vi 
+ 
^yFact-KNN(u; i ) := 
1 
jR(u)j 
X 
j2R(u) 
ru;j hvi ; vj i 
Rating 
Matrix 
time 
^ytimeSVD(u; i ; t) := hvu + vu;t ; vi i 
^ytimeTF(u; i ; t) := 
Xk 
f =1 
vu;f vi ;f vt;f 
: : : 
Steen Rendle 4 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Tensor Factorization 
Example for data: Examples for models: 
Triples of Subject, Predicate, Object 
^yPARAFAC(s; p; o) := 
Xk 
f =1 
vs;f vp;f vo;f 
^yPITF(s; p; o) := hvs ; vpi + hvs ; voi + hvp; voi 
: : : 
Steen Rendle 5 / 53 
[illustration from Drumond et al. 2012]
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Sequential Factorization Models 
Example for data: Examples for models: 
Bt Bt­3 
b 
b a b 
a c 
User 1 ? 
c 
e 
c c 
a 
? 
d c e e ? 
? 
User 2 
User 3 
User 4 
Bt­2 
Bt­1 
a 
^yFMC(u; i ; t) := 
X 
l2Bt1 
hvi ; vl i 
^yFPMC(u; i ; t) := hvu; vi i + 
X 
l2Bt1 
hvi ; vl i 
: : : 
Steen Rendle 6 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Factorization Models: Discussion 
I Advantages 
I Can estimate interactions between two (or more) variables even if 
the cross is not observed. 
I E.g. user  movie, current product  next product, user  query  
url, : : : 
Steen Rendle 7 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Factorization Models: Discussion 
I Advantages 
I Can estimate interactions between two (or more) variables even if 
the cross is not observed. 
I E.g. user  movie, current product  next product, user  query  
url, : : : 
I Downsides 
I Factorization models are usually build speci
cally for each problem. 
I Learning algorithms and implementations are tailored to individual 
models. 
Steen Rendle 7 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Outline 
Factorization Models  Polynomial Regression 
Factorization Models 
Linear/ Polynomial Regression 
Comparison 
Factorization Machines 
Applications 
Summary 
Steen Rendle 8 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Data and Variable Representation 
Many standard ML approaches work with real valued feature vectors as 
input. It allows to represent, e.g.: 
I any number of variables 
I categorical domains by using dummy indicator variables 
I numerical domains 
I set-categorical domains by using dummy indicator variables 
Using this representation allows to apply a wide variety of standard 
models (e.g. linear regression, SVM, etc.). 
Steen Rendle 9 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Linear Regression 
I Let x 2 Rp be an input vector with p predictor variables. 
I Model equation: 
^y(x) := w0 + 
Xp 
i=1 
wi xi 
I Model parameters: 
w0 2 R; w 2 Rp 
O(p) model parameters. 
Steen Rendle 10 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Polynomial Regression 
I Let x 2 Rp be an input vector with p predictor variables. 
I Model equation (degree 2): 
^y(x) := w0 + 
Xp 
i=1 
wi xi + 
Xp 
i=1 
Xp 
ji 
wi ;j xi xj 
I Model parameters: 
w0 2 R; w 2 Rp; W 2 Rpp 
O(p2) model parameters. 
Steen Rendle 11 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Outline 
Factorization Models  Polynomial Regression 
Factorization Models 
Linear/ Polynomial Regression 
Comparison 
Factorization Machines 
Applications 
Summary 
Steen Rendle 12 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Representation: Matrix/ Tensor vs. Feature Vectors 
Matrix/ Tensor data can be represented by feature vectors: 
Movie 
TI NH SW ST ... 
5 3 1 ? ... 
? ? 4 5 ... 
1 ? 5 ? ... 
... ... ... ... ... 
A 
B 
C 
... 
User 
Steen Rendle 13 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Representation: Matrix/ Tensor vs. Feature Vectors 
Matrix/ Tensor data can be represented by feature vectors: 
Movie 
TI NH SW ST ... 
5 3 1 ? ... 
? ? 4 5 ... 
1 ? 5 ? ... 
... ... ... ... ... 
A 
B 
C 
... 
User 
, 
# User Movie Rating 
1 Alice Titanic 5 
2 Alice Notting Hill 3 
3 Alice Star Wars 1 
4 Bob Star Wars 4 
5 Bob Star Trek 5 
6 Charlie Titanic 1 
7 Charlie Star Wars 5 
. . . . . . . . . . . . 
Steen Rendle 13 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Representation: Matrix/ Tensor vs. Feature Vectors 
Matrix/ Tensor data can be represented by feature vectors: 
# User Movie Rating 
1 Alice Titanic 5 
2 Alice Notting Hill 3 
3 Alice Star Wars 1 
4 Bob Star Wars 4 
5 Bob Star Trek 5 
6 Charlie Titanic 1 
7 Charlie Star Wars 5 
. . . . . . . . . . . . 
) 
1 0 0 ... 
1 0 0 ... 
x(3) 1 0 0 ... 0 0 1 0 ... 
0 1 0 ... 
0 1 0 ... 
0 0 1 ... 
1 
0 
0 
0 
1 
0 
1 
0 
0 
0 
0 
0 
1 
0 
0 
0 
0 
0 
1 
0 
... 
... 
... 
... 
... 
0 0 1 ... 0 0 1 0 ... 
A B C ... TI NH SW ST ... 
x(1) 
x(2) 
x(4) 
x(5) 
x(6) 
x(7) 
Feature vector x 
User Movie 
Target y 
5 
3 
1 y(3) 
4 
5 
1 
5 
y(1) 
y(2) 
y(4) 
y(5) 
y(6) 
y(7) 
Steen Rendle 13 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Application to Sparse Feature Vectors 
1 0 0 ... 
1 0 0 ... 
x(3) 1 0 0 ... 0 0 1 0 ... 
0 1 0 ... 
0 1 0 ... 
0 0 1 ... 
1 
0 
0 
0 
1 
0 
1 
0 
0 
0 
0 
0 
1 
0 
0 
0 
0 
0 
1 
0 
... 
... 
... 
... 
... 
0 0 1 ... 0 0 1 0 ... 
A B C ... TI NH SW ST ... 
x(1) 
x(2) 
x(4) 
x(5) 
x(6) 
x(7) 
Feature vector x 
User Movie 
Target y 
5 
3 
1 y(3) 
4 
5 
1 
5 
y(1) 
y(2) 
y(4) 
y(5) 
y(6) 
y(7) 
Applying regression models to this data leads to: 
Steen Rendle 14 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Application to Sparse Feature Vectors 
1 0 0 ... 
1 0 0 ... 
x(3) 1 0 0 ... 0 0 1 0 ... 
0 1 0 ... 
0 1 0 ... 
0 0 1 ... 
1 
0 
0 
0 
1 
0 
1 
0 
0 
0 
0 
0 
1 
0 
0 
0 
0 
0 
1 
0 
... 
... 
... 
... 
... 
0 0 1 ... 0 0 1 0 ... 
A B C ... TI NH SW ST ... 
x(1) 
x(2) 
x(4) 
x(5) 
x(6) 
x(7) 
Feature vector x 
User Movie 
Target y 
5 
3 
1 y(3) 
4 
5 
1 
5 
y(1) 
y(2) 
y(4) 
y(5) 
y(6) 
y(7) 
Applying regression models to this data leads to: 
Linear regression: ^y(x) = w0 + wu + wi 
Steen Rendle 14 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Application to Sparse Feature Vectors 
1 0 0 ... 
1 0 0 ... 
x(3) 1 0 0 ... 0 0 1 0 ... 
0 1 0 ... 
0 1 0 ... 
0 0 1 ... 
1 
0 
0 
0 
1 
0 
1 
0 
0 
0 
0 
0 
1 
0 
0 
0 
0 
0 
1 
0 
... 
... 
... 
... 
... 
0 0 1 ... 0 0 1 0 ... 
A B C ... TI NH SW ST ... 
x(1) 
x(2) 
x(4) 
x(5) 
x(6) 
x(7) 
Feature vector x 
User Movie 
Target y 
5 
3 
1 y(3) 
4 
5 
1 
5 
y(1) 
y(2) 
y(4) 
y(5) 
y(6) 
y(7) 
Applying regression models to this data leads to: 
Linear regression: ^y(x) = w0 + wu + wi 
Polynomial regression: ^y(x) = w0 + wu + wi + wu;i 
Steen Rendle 14 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Application to Sparse Feature Vectors 
1 0 0 ... 
1 0 0 ... 
x(3) 1 0 0 ... 0 0 1 0 ... 
0 1 0 ... 
0 1 0 ... 
0 0 1 ... 
1 
0 
0 
0 
1 
0 
1 
0 
0 
0 
0 
0 
1 
0 
0 
0 
0 
0 
1 
0 
... 
... 
... 
... 
... 
0 0 1 ... 0 0 1 0 ... 
A B C ... TI NH SW ST ... 
x(1) 
x(2) 
x(4) 
x(5) 
x(6) 
x(7) 
Feature vector x 
User Movie 
Target y 
5 
3 
1 y(3) 
4 
5 
1 
5 
y(1) 
y(2) 
y(4) 
y(5) 
y(6) 
y(7) 
Applying regression models to this data leads to: 
Linear regression: ^y(x) = w0 + wu + wi 
Polynomial regression: ^y(x) = w0 + wu + wi + wu;i 
Matrix factorization: ^y(u; i) = hwu; hi i 
Steen Rendle 14 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Application to Sparse Feature Vectors 
For the data of the example: 
I Linear regression has no user-item interaction. 
Steen Rendle 15 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Application to Sparse Feature Vectors 
For the data of the example: 
I Linear regression has no user-item interaction. 
I ) Linear regression is not expressive enough. 
Steen Rendle 15 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Application to Sparse Feature Vectors 
For the data of the example: 
I Linear regression has no user-item interaction. 
I ) Linear regression is not expressive enough. 
I Polynomial regression includes pairwise interactions but cannot 
estimate them from the data. 
Steen Rendle 15 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Application to Sparse Feature Vectors 
For the data of the example: 
I Linear regression has no user-item interaction. 
I ) Linear regression is not expressive enough. 
I Polynomial regression includes pairwise interactions but cannot 
estimate them from the data. 
I n  p2: number of cases is much smaller than number of model 
parameters. 
Steen Rendle 15 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Application to Sparse Feature Vectors 
For the data of the example: 
I Linear regression has no user-item interaction. 
I ) Linear regression is not expressive enough. 
I Polynomial regression includes pairwise interactions but cannot 
estimate them from the data. 
I n  p2: number of cases is much smaller than number of model 
parameters. 
I Max.-likelihood estimator for a pairwise eect is: 
wi ;j = 
( 
y  w0  wi  wu; if (i ; j ; y) 2 S: 
not de
ned; else 
Steen Rendle 15 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Application to Sparse Feature Vectors 
For the data of the example: 
I Linear regression has no user-item interaction. 
I ) Linear regression is not expressive enough. 
I Polynomial regression includes pairwise interactions but cannot 
estimate them from the data. 
I n  p2: number of cases is much smaller than number of model 
parameters. 
I Max.-likelihood estimator for a pairwise eect is: 
wi ;j = 
( 
y  w0  wi  wu; if (i ; j ; y) 2 S: 
not de
ned; else 
I Polynomial regression cannot generalize to any unobserved pairwise 
eect. 
Steen Rendle 15 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Outline 
Factorization Models  Polynomial Regression 
Factorization Machines 
Model 
Examples 
Properties 
Learning 
libFM Software 
Applications 
Summary 
Steen Rendle 16 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Factorization Machine (FM) 
I Let x 2 Rp be an input vector with p predictor variables. 
I Model equation (degree 2): 
^y(x) := w0 + 
Xp 
i=1 
wi xi + 
Xp 
i=1 
Xp 
ji 
hvi ; vj i xi xj 
I Model parameters: 
w0 2 R; w 2 Rp; V 2 Rpk 
Steen Rendle 17 / 53 
[Rendle 2010, Rendle 2012]
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Factorization Machine (FM) 
I Let x 2 Rp be an input vector with p predictor variables. 
I Model equation (degree 2): 
^y(x) := w0 + 
Xp 
i=1 
wi xi + 
Xp 
i=1 
Xp 
ji 
hvi ; vj i xi xj 
I Model parameters: 
w0 2 R; w 2 Rp; V 2 Rpk 
Compared to Polynomial regression: 
I Model equation (degree 2): 
^y(x) := w0 + 
Xp 
i=1 
wi xi + 
Xp 
i=1 
Xp 
ji 
wi ;j xi xj 
I Model parameters: 
w0 2 R; w 2 Rp; W 2 Rpp 
Steen Rendle 17 / 53 
[Rendle 2010, Rendle 2012]
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Factorization Machine (FM) 
I Let x 2 Rp be an input vector with p predictor variables. 
I Model equation (degree 2): 
^y(x) := w0 + 
Xp 
i=1 
wi xi + 
Xp 
i=1 
Xp 
ji 
hvi ; vj i xi xj 
I Model parameters: 
w0 2 R; w 2 Rp; V 2 Rpk 
Steen Rendle 17 / 53 
[Rendle 2010, Rendle 2012]
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Factorization Machine (FM) 
I Let x 2 Rp be an input vector with p predictor variables. 
I Model equation (degree 3): 
^y(x) := w0 + 
Xp 
i=1 
wi xi + 
Xp 
i=1 
Xp 
ji 
hvi ; vj i xi xj 
+ 
Xp 
i=1 
Xp 
ji 
Xp 
lj 
Xk 
f =1 
v(3) 
i ;f v(3) 
j ;f v(3) 
l ;f xi xj xl 
I Model parameters: 
w0 2 R; w 2 Rp; V 2 Rpk ; V(3) 2 Rpk 
Steen Rendle 17 / 53 
[Rendle 2010, Rendle 2012]
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Factorization Machines: Discussion 
I FMs work with real valued input. 
I FMs include variable interactions like polynomial regression. 
I Model parameters for interactions are factorized. 
I Number of model parameters is O(k p) (instead of O(p2) for poly. 
regr.). 
Steen Rendle 18 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Outline 
Factorization Models  Polynomial Regression 
Factorization Machines 
Model 
Examples 
Properties 
Learning 
libFM Software 
Applications 
Summary 
Steen Rendle 19 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Matrix Factorization and Factorization Machines 
Two categorical variables encoded with real valued predictor variables: 
1 0 0 ... 
1 0 0 ... 
x(3) 1 0 0 ... 0 0 1 0 ... 
0 1 0 ... 
0 1 0 ... 
0 0 1 ... 
1 
0 
0 
0 
1 
0 
1 
0 
0 
0 
0 
0 
1 
0 
0 
0 
0 
0 
1 
0 
... 
... 
... 
... 
... 
0 0 1 ... 0 0 1 0 ... 
A B C ... TI NH SW ST ... 
x(1) 
x(2) 
x(4) 
x(5) 
x(6) 
x(7) 
Feature vector x 
User Movie 
With this data, the FM is identical to MF with biases1: 
^y(x) = w0 + wu + wi + hvu; vi i | {z } 
MF 
1libFM, k = 128, MCMC inference, Net
ix RMSE=0.8937 
Steen Rendle 20 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
RDF-Triple Prediction with Factorization Machines 
Three categorical variables encoded with real valued predictor variables: 
1 0 0 ... 
1 0 0 ... 
x(3) 1 0 0 ... 0 0 1 0 ... 0 0 0 1 ... 
0 1 0 ... 
0 1 0 ... 
0 0 1 ... 
1 
0 
0 
0 
1 
0 
1 
0 
0 
0 
0 
0 
1 
0 
0 
0 
0 
0 
1 
0 
... 
... 
... 
... 
... 
0 0 1 ... 0 0 1 0 ... 
S1 S2 S3 ... P1 P2 P3 P4 ... 
x(1) 
x(2) 
x(4) 
x(5) 
x(6) 
x(7) 
Feature vector x 
1 
0 
0 
0 
1 
0 
1 
0 
0 
0 
0 
0 
1 
1 
0 
0 
0 
0 
0 
0 
... 
... 
... 
... 
... 
0 0 0 1 ... 
O1 O2 O3 O4 ... 
Subject Predicate Object 
With this data, the FM is equivalent to the PITF model: 
^y(x) := w0 + ws + wp + wo + hvs ; vpi + hvs ; voi + hvp; voi 
[PITF: Rendle et al. 2010, WSDM Best Student Paper, ECML 2009 Best DC Award] 
Steen Rendle 21 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Time with Factorization Machines 
Two categorical variables and time as linear predictor: 
1 0 0 ... 
1 0 0 ... 
x(3) 1 0 0 ... 0 0 1 0 ... 
0 1 0 ... 
0 1 0 ... 
0 0 1 ... 
1 
0 
0 
0 
1 
0 
1 
0 
0 
0 
0 
0 
1 
0 
0 
0 
0 
0 
1 
0 
... 
... 
... 
... 
... 
0 0 1 ... 0 0 1 0 ... 
A B C ... TI NH SW ST ... 
x(1) 
x(2) 
x(4) 
x(5) 
x(6) 
x(7) 
Feature vector x 
User Movie 
0.2 
0.6 
0.61 
0.3 
0.5 
0.1 
0.8 
Time 
The FM model would correspond to: 
^y(x) := w0 + wi + wu + t wtime + hvu; vi i + t hvu; vtimei + t hvi ; vtimei 
Steen Rendle 22 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Time with Factorization Machines 
Two categorical variables and time discretized in bins (b(t)): 
1 0 0 ... 
1 0 0 ... 
x(3) 1 0 0 ... 0 0 1 0 ... 
0 1 0 ... 
0 1 0 ... 
0 0 1 ... 
1 
0 
0 
0 
1 
0 
1 
0 
0 
0 
0 
0 
1 
0 
0 
0 
0 
0 
1 
0 
... 
... 
... 
... 
... 
0 0 1 ... 0 0 1 0 ... 
A B C ... TI NH SW ST ... 
x(1) 
x(2) 
x(4) 
x(5) 
x(6) 
x(7) 
Feature vector x 
User Movie 
1 
0 
0 
1 
0 
1 
0 
0 
1 
1 
0 
1 
0 
0 
Time 
0 
0 
0 
0 
0 
0 
1 
T1 T2 T3 
The FM model would correspond to:2 
^y(x) := w0 + wi + wu + wb(t) + hvu; vi i + hvu; vb(t)i + hvi ; vb(t)i 
2libFM, k = 128, MCMC inference, Net
ix RMSE=0.8873 
Steen Rendle 22 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
SVD++ 
1 0 0 ... 
1 0 0 ... 
x(3) 1 0 0 ... 0 0 1 0 ... 0.3 0.3 0.3 0 ... 
0 1 0 ... 
0 1 0 ... 
0 0 1 ... 
1 
0 
0 
0 
1 
0 
1 
0 
0 
0 
0 
0 
1 
0 
0 
0 
0 
0 
1 
0 
... 
... 
... 
... 
... 
0 0 1 ... 0 0 1 0 ... 
A B C ... TI NH SW ST ... 
x(1) 
x(2) 
x(4) 
x(5) 
x(6) 
x(7) 
Feature vector x 
0.3 
0.3 
0 
0 
0.5 
0.3 
0.3 
0 
0 
0 
0.3 
0.3 
0.5 
0.5 
0.5 
0 
0 
0.5 
0.5 
0 
... 
... 
... 
... 
... 
0.5 0 0.5 0 ... 
TI NH SW ST ... 
User Movie Other Movies rated 
With this data, the FM3 is identical to: 
^y(x) = 
SVD++ z }| { 
w0 + wu + wi + hvu; vi i + 
1 p 
jNuj 
X 
l2Nu 
hvi ; vl i 
+ 
1 p 
jNuj 
X 
l2Nu 
0 
@wl + hvu; vl i + 
1 p 
jNuj 
X 
l 02Nu ;l 0l 
hvl ; v0 
l i 
1 
A 
3libFM, k = 128, MCMC inference, Net
ix RMSE=0.8865 
Steen Rendle 23 / 53 
[Koren, 2008]
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Factorizing Personalized Markov Chains (FPMC) 
Two categorical variables (u,i ), one set categorical (Bt1): 
1 0 0 ... 
1 0 0 ... 
x(3) 1 0 0 ... 0 0 1 0 ... 0.5 0.5 0 0 ... 
0 1 0 ... 
0 1 0 ... 
0 0 1 ... 
1 
0 
0 
0 
1 
0 
1 
0 
0 
0 
0 
0 
1 
0 
0 
0 
0 
0 
1 
0 
... 
... 
... 
... 
... 
0 0 1 ... 0 0 1 0 ... 
u1 u2 u3 ... A B C D ... 
x(1) 
x(2) 
x(4) 
x(5) 
x(6) 
x(7) 
Feature vector x 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
1 
0 
0 
0 
0 
0 
0 
... 
... 
... 
... 
... 
1 0 0 0 ... 
User Product 
A B C D ... 
Last Basket 
Sequential Baskets 
u1 A,B C 
u2 C D 
u3 A C 
FM is equivalent to 
^y(x) := w0 + wu + wi + 
1 
jBt1j 
X 
j2Bt1 
wj + hvu; vi i + 
1 
jBt1j 
X 
j2Bt1 
hvi ; vj i + ::: 
Steen Rendle 24 / 53 
[Rendle et al. 2010, WWW Best Paper]
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Outline 
Factorization Models  Polynomial Regression 
Factorization Machines 
Model 
Examples 
Properties 
Learning 
libFM Software 
Applications 
Summary 
Steen Rendle 25 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Computation Complexity 
Factorization Machine model equation: 
^y(x) := w0 + 
Xp 
i=1 
wi xi + 
Xp 
i=1 
Xp 
ji 
hvi ; vj i xi xj 
I Trivial computation: O(p2 k) 
Steen Rendle 26 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Computation Complexity 
Factorization Machine model equation: 
^y(x) := w0 + 
Xp 
i=1 
wi xi + 
Xp 
i=1 
Xp 
ji 
hvi ; vj i xi xj 
I Trivial computation: O(p2 k) 
I Ecient computation can be done in: O(p k) 
Steen Rendle 26 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Computation Complexity 
Factorization Machine model equation: 
^y(x) := w0 + 
Xp 
i=1 
wi xi + 
Xp 
i=1 
Xp 
ji 
hvi ; vj i xi xj 
I Trivial computation: O(p2 k) 
I Ecient computation can be done in: O(p k) 
I Making use of many zeros in x even in: O(Nz (x) k), where Nz (x) is 
the number of non-zero elements in vector x. 
Steen Rendle 26 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Ecient Computation 
The model equation of an FM can be computed in O(p k). 
Steen Rendle 27 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Ecient Computation 
The model equation of an FM can be computed in O(p k). 
Proof: 
^y(x) := w0 + 
Xp 
i=1 
wi xi + 
Xp 
i=1 
Xp 
ji 
hvi ; vj i xi xj 
= w0 + 
Xp 
i=1 
wi xi + 
1 
2 
Xk 
f =1 
2 
4 
  Xp 
i=1 
xi vi ;f 
!2 
 
Xp 
i=1 
(xi vi ;f )2 
3 
5 
Steen Rendle 27 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Ecient Computation 
The model equation of an FM can be computed in O(p k). 
Proof: 
^y(x) := w0 + 
Xp 
i=1 
wi xi + 
Xp 
i=1 
Xp 
ji 
hvi ; vj i xi xj 
= w0 + 
Xp 
i=1 
wi xi + 
1 
2 
Xk 
f =1 
2 
4 
  Xp 
i=1 
xi vi ;f 
!2 
 
Xp 
i=1 
(xi vi ;f )2 
3 
5 
I In the sums over i , only non-zero xi elements have to be summed up 
) O(Nz (x) k). 
I (The complexity of polynomial regression is O(Nz (x)2).) 
Steen Rendle 27 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Multilinearity 
FMs are multilinear: 
8 2  = fw0;w1; : : : ;wp; v1;1; : : : ; vp;kg : ^y(x; ) = h()(x)  + g()(x) 
where g() and h() do not depend on the value of . 
Steen Rendle 28 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Multilinearity 
FMs are multilinear: 
8 2  = fw0;w1; : : : ;wp; v1;1; : : : ; vp;kg : ^y(x; ) = h()(x)  + g()(x) 
where g() and h() do not depend on the value of . 
E.g. for second order eects ( = vl ;f ): 
^y(x; vl;f ) := 
g(vl;f )(x) 
z }| { 
w0 + 
Xp 
i=1 
wi xi + 
Xp 
i=1 
Xp 
j=i+1 
Xk 
f 0=1 
(f 06=f )_(l62fi ;jg) 
vi ;f 0 vj;f 0 xi xj 
+ vl;f xl 
X 
i=1;i6=l 
vi ;f xi 
| {z } 
h(vl;f )(x) 
Steen Rendle 28 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Outline 
Factorization Models  Polynomial Regression 
Factorization Machines 
Model 
Examples 
Properties 
Learning 
libFM Software 
Applications 
Summary 
Steen Rendle 29 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Learning 
Using these properties, learning algorithms can be developed: 
I L2-regularized regression and classi
cation: 
I Stochastic gradient descent [Rendle, 2010] 
I Alternating least squares/ Coordinate Descent [Rendle et al., 2011, 
Rendle 2012] 
I Markov Chain Monte Carlo (for Bayesian FMs) [Freudenthaler et al. 
2011, Rendle 2012] 
I L2-regularized ranking: 
I Stochastic gradient descent [Rendle, 2010] 
All the proposed learning algorithms have a runtime of O(k Nz (X) i ), 
where i is the number of iterations and Nz (X) the number of non-zero 
elements in the design matrix X. 
Steen Rendle 30 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Stochastic Gradient Descent (SGD) 
I For each training case (x; y) 2 S, SGD updates the FM model 
parameter  using: 
0 =    
 
(^y(x)  y)h()(x) + () 
 
I  is the learning rate / step size. 
I () is the regularization value of the parameter . 
I SGD can easily be applied to other loss functions. 
Steen Rendle 31 / 53 
[Rendle, 2010]
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Coordinate Descent (CD) 
I CD updates each FM model parameter  using: 
0 = 
P 
(x;y)2S 
 
y  g()(x) 
 
h()(x) 
P 
(x;y)2S h2 
()(x) + () 
I Using caches of intermediate results, the runtime for updating all 
model parameters is O(k Nz (X)). 
I CD can be extended to classi
cation [Rendle, 2012]. 
Steen Rendle 32 / 53 
[Rendle et al., 2011]
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Gibbs Sampling (MCMC) 
I Gibbs sampling with a block for each FM model parameter : 
jS; n fg  N 
  
 
P 
(x;y)2S 
 
y  g()(x) 
 
h()(x) 
 
P 
(x;y)2S h2 
()(x) + () 
; 
1 
 
P 
(x;y)2S h2 
()(x) + () 
! 
I Mean is the same as for CD ) computational complexity is also 
O(k Nz (X)). 
I MCMC can be extended to classi
cation using link functions. 
Steen Rendle 33 / 53 
[Freudenthaler et al. 2011, Rendle 2012]
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Learning Regularization Values 
 v ,v 
yi 
w ,w 
wj 
w0 
xij 
i=1,...,n 
v j 
 
j=1,...,p 
 , 0 ,0 
yi 
wj 
w0 
w 
xij 
i=1,...,n 
v j 
 
j=1,...,p 
w0 ,w0 
0 ,0 
w0 ,w0 
w  v v 
Standard FM with priors. Two level FM with hyperpriors. 
Steen Rendle 34 / 53 
[Freudenthaler et al., 2011]
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Outline 
Factorization Models  Polynomial Regression 
Factorization Machines 
Model 
Examples 
Properties 
Learning 
libFM Software 
Applications 
Summary 
Steen Rendle 35 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
libFM Software 
libFM is an implementation of FMs 
I Model: second-order FMs 
I Learning/ inference: SGD, ALS, MCMC 
I Classi
cation and regression 
I Uses the same data format as LIBSVM, LIBLINEAR [Lin et. al], 
SVMlight [Joachims]. 
I Supports variable grouping. 
I Open source: GPLv3. 
Steen Rendle 36 / 53 
[http://www.libfm.org/]
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Outline 
Factorization Models  Polynomial Regression 
Factorization Machines 
Applications 
Recommender Systems 
Link Prediction in Social Networks 
Clickthrough Prediction 
Personalized Ranking 
Student Performance Prediction 
Kaggle Competitions 
Summary 
Steen Rendle 37 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
(Context-aware) Rating Prediction 
I Main variables: 
I User ID (categorical) 
I Item ID (categorical) 
I Additional variables: 
I time 
I mood 
I user pro
le 
I item meta data 
I . . . 
I Examples: Net
ix prize, Movielens, KDDCup 2011 
+ 
♪ + + 
Song 
User Time Mood 
Steen Rendle 38 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Net
ix Prize 
Netflix Prize: Prediction Error 
Public Leaderboard 
RMS Error 
0.86 0.87 0.88 0.89 0.90 
user, movie 
user, movie, day 
user, movie, impl. 
user, movie, 
day, impl. 
SGD Matrix 
Factorization 
user, movie, 
day, impl., 
freq, lin. day 
$1M Prize 
I k = 128 factors, 512 MCMC samples (no burnin phase, initialization 
from random) 
I MCMC inference (no hyperparameters (learning rate, regularization) 
to specify) 
Steen Rendle 39 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Net
ix Prize 
Method (Name) Ref. Learning Method k Quiz RMSE 
Models using user ID and item ID 
Probabilistic Matrix Factorization [14, 13] Batch GD 40 *0.9170 
Probabilistic Matrix Factorization [14, 13] Batch GD 150 0.9211 
Matrix Factorization [6] Variational Bayes 30 *0.9141 
Matchbox [15] Variational Bayes 50 *0.9100 
ALS-MF [7] ALS 100 0.9079 
ALS-MF [7] ALS 1000 *0.9018 
SVD/ MF [3] SGD 100 0.9025 
SVD/ MF [3] SGD 200 *0.9009 
Bayesian Probablistic Matrix Factorization 
[13] MCMC 150 0.8965 
(BPMF) 
Bayesian Probablistic Matrix Factorization 
(BPMF) 
[13] MCMC 300 *0.8954 
FM, pred. var: user ID, movie ID - MCMC 128 0.8937 
Models using implicit feedback 
Probabilistic Matrix Factorization with Cons- 
traints 
[14] Batch GD 30 *0.9016 
SVD++ [3] SGD 100 0.8924 
SVD++ [3] SGD 200 *0.8911 
BSRM/F [18] MCMC 100 0.8926 
BSRM/F [18] MCMC 400 *0.8874 
FM, pred. var: user ID, movie ID, impl. - MCMC 128 0.8865 
Steen Rendle 40 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Net
ix Prize 
Method (Name) Ref. Learning Method k Quiz RMSE 
Models using time information 
Bayesian Probabilistic Tensor Factorization 
[17] MCMC 30 *0.9044 
(BPTF) 
FM, pred. var: user ID, movie ID, day - MCMC 128 0.8873 
Models using time and implicit feedback 
timeSVD++ [5] SGD 100 0.8805 
timeSVD++ [5] SGD 200 *0.8799 
FM, pred. var: user ID, movie ID, day, impl. - MCMC 128 0.8809 
FM, pred. var: user ID, movie ID, day, impl. - MCMC 256 0.8794 
Assorted models 
BRISMF/UM NB corrected [16] SGD 1000 *0.8904 
BMFSI plus side information [8] MCMC 100 *0.8875 
timeSVD++ plus frequencies [4] SGD 200 0.8777 
timeSVD++ plus frequencies [4] SGD 2000 *0.8762 
FM, pred. var: user ID, movie ID, day, impl., 
- MCMC 128 0.8779 
freq., lin. day 
FM, pred. var: user ID, movie ID, day, impl., 
freq., lin. day 
- MCMC 256 0.8771 
Steen Rendle 40 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Outline 
Factorization Models  Polynomial Regression 
Factorization Machines 
Applications 
Recommender Systems 
Link Prediction in Social Networks 
Clickthrough Prediction 
Personalized Ranking 
Student Performance Prediction 
Kaggle Competitions 
Summary 
Steen Rendle 41 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Link Prediction in Social Networks 
I Main variables: 
I Actor A ID 
I Actor B ID 
I Additional variables: 
I pro
les 
I actions 
I . . . 
+ 
Actor A Actor B 
Steen Rendle 42 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
KDDCup 2012: Track 1 
KDDCup 2012 Track 1: Prediction Quality 
Public Leaderboard Private Leaderboard 
Mean Average Precision @3 
0.32 0.34 0.36 0.38 0.40 0.42 
none 
gender, age, ... 
keywords 
friends 
all 
none 
gender, age, ... 
keywords 
friends 
all 
Top 1 
Top 5 
Top 10 
Top 100 
I k = 22 factors, 512 MCMC samples (no burnin phase, initialization 
from random) 
I MCMC inference (no hyperparameters (learning rate, regularization) 
to specify) 
Steen Rendle 43 / 53 
[Awarded 2nd place (out of 658 teams)]
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Outline 
Factorization Models  Polynomial Regression 
Factorization Machines 
Applications 
Recommender Systems 
Link Prediction in Social Networks 
Clickthrough Prediction 
Personalized Ranking 
Student Performance Prediction 
Kaggle Competitions 
Summary 
Steen Rendle 44 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Clickthrough Prediction 
I Main variables: 
I User ID 
I Query ID 
I Ad/ Link ID 
I Additional variables: 
I query tokens 
I user pro
le 
I . . . 
+ 
keyword... + 
Link 1 
Link 2 
Link 3 
User Query Ad/ Link 
Steen Rendle 45 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
KDDCup 2012: Track 2 
Model Inference wAUC (public) wAUC (private) 
ID-based model (k = 0) SGD 0.78050 0.78086 
Attribute-based model (k = 8) MCMC 0.77409 0.77555 
Mixed model (k = 8) SGD 0.79011 0.79321 
Final ensemble n/a 0.79857 0.80178 
Ensemble 
I Rank positions (not predicted clickthrough rates) are used. 
I The MCMC attribute-based model and dierent variations of the 
. SGD models are included. 
Steen Rendle 46 / 53 
[Awarded 3rd place (out of 171 teams)]
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
Outline 
Factorization Models  Polynomial Regression 
Factorization Machines 
Applications 
Recommender Systems 
Link Prediction in Social Networks 
Clickthrough Prediction 
Personalized Ranking 
Student Performance Prediction 
Kaggle Competitions 
Summary 
Steen Rendle 47 / 53
Factorization Models  Polynomial Regression Factorization Machines Applications Summary 
ECML/PKDD Discovery Challenge 2013 
I Problem: Recommend given names. 
I Main variables: 
I User ID 
I Name ID 
I Additional variables: 
I session info 
I string representation for each name 
I . . . 
I FM approach won 1st place (online track) and 2nd (oine track). 
Steen Rendle 48 / 53

Weitere ähnliche Inhalte

Was ist angesagt?

Computer Graphics Practical
Computer Graphics PracticalComputer Graphics Practical
Computer Graphics Practical
Neha Sharma
 
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines
Dongseo University
 
ICML2013読み会 Large-Scale Learning with Less RAM via Randomization
ICML2013読み会 Large-Scale Learning with Less RAM via RandomizationICML2013読み会 Large-Scale Learning with Less RAM via Randomization
ICML2013読み会 Large-Scale Learning with Less RAM via Randomization
Hidekazu Oiwa
 

Was ist angesagt? (20)

Cg lab cse-v (1) (1)
Cg lab cse-v (1) (1)Cg lab cse-v (1) (1)
Cg lab cse-v (1) (1)
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
CS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and CullingCS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and Culling
 
Cgm Lab Manual
Cgm Lab ManualCgm Lab Manual
Cgm Lab Manual
 
Matlab Feature Extraction Using Segmentation And Edge Detection
Matlab Feature Extraction Using Segmentation And Edge DetectionMatlab Feature Extraction Using Segmentation And Edge Detection
Matlab Feature Extraction Using Segmentation And Edge Detection
 
Computer graphics
Computer graphics Computer graphics
Computer graphics
 
Computer Graphics Practical
Computer Graphics PracticalComputer Graphics Practical
Computer Graphics Practical
 
NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph Convolution
 
Shadow Volumes on Programmable Graphics Hardware
Shadow Volumes on Programmable Graphics HardwareShadow Volumes on Programmable Graphics Hardware
Shadow Volumes on Programmable Graphics Hardware
 
Auto-encoding variational bayes
Auto-encoding variational bayesAuto-encoding variational bayes
Auto-encoding variational bayes
 
Backpropagation
BackpropagationBackpropagation
Backpropagation
 
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its authorKaggle Winning Solution Xgboost algorithm -- Let us learn from its author
Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author
 
Svm vs ls svm
Svm vs ls svmSvm vs ls svm
Svm vs ls svm
 
Geometry Shader-based Bump Mapping Setup
Geometry Shader-based Bump Mapping SetupGeometry Shader-based Bump Mapping Setup
Geometry Shader-based Bump Mapping Setup
 
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines
2013-1 Machine Learning Lecture 05 - Andrew Moore - Support Vector Machines
 
Graph Kernels for Chemical Informatics
Graph Kernels for Chemical InformaticsGraph Kernels for Chemical Informatics
Graph Kernels for Chemical Informatics
 
ICML2013読み会 Large-Scale Learning with Less RAM via Randomization
ICML2013読み会 Large-Scale Learning with Less RAM via RandomizationICML2013読み会 Large-Scale Learning with Less RAM via Randomization
ICML2013読み会 Large-Scale Learning with Less RAM via Randomization
 
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen BoydH2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd
 
10_rnn.pdf
10_rnn.pdf10_rnn.pdf
10_rnn.pdf
 
Graph convolutional networks in apache spark
Graph convolutional networks in apache sparkGraph convolutional networks in apache spark
Graph convolutional networks in apache spark
 

Andere mochten auch

Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
MLconf
 

Andere mochten auch (18)

Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
 
Codes and standards
Codes and standardsCodes and standards
Codes and standards
 
Artificial Intelligence and the Data Center
Artificial Intelligence and the Data CenterArtificial Intelligence and the Data Center
Artificial Intelligence and the Data Center
 
AmebaPico 裏側の技術やAWSの利用について
AmebaPico 裏側の技術やAWSの利用についてAmebaPico 裏側の技術やAWSの利用について
AmebaPico 裏側の技術やAWSの利用について
 
芸能人推薦のしくみ
芸能人推薦のしくみ芸能人推薦のしくみ
芸能人推薦のしくみ
 
H2O World - Ensembles with Erin LeDell
H2O World - Ensembles with Erin LeDellH2O World - Ensembles with Erin LeDell
H2O World - Ensembles with Erin LeDell
 
TensorFlow in Context
TensorFlow in ContextTensorFlow in Context
TensorFlow in Context
 
Stacked Ensembles in H2O
Stacked Ensembles in H2OStacked Ensembles in H2O
Stacked Ensembles in H2O
 
Steffen Rendle, Research Scientist, Google at MLconf SF
Steffen Rendle, Research Scientist, Google at MLconf SFSteffen Rendle, Research Scientist, Google at MLconf SF
Steffen Rendle, Research Scientist, Google at MLconf SF
 
Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence
Autonomous Vehicles: the Intersection of Robotics and Artificial IntelligenceAutonomous Vehicles: the Intersection of Robotics and Artificial Intelligence
Autonomous Vehicles: the Intersection of Robotics and Artificial Intelligence
 
Matrix Factorizationを使った評価予測
Matrix Factorizationを使った評価予測Matrix Factorizationを使った評価予測
Matrix Factorizationを使った評価予測
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
 
「はじめてでもわかる RandomForest 入門-集団学習による分類・予測 -」 -第7回データマイニング+WEB勉強会@東京
「はじめてでもわかる RandomForest 入門-集団学習による分類・予測 -」 -第7回データマイニング+WEB勉強会@東京「はじめてでもわかる RandomForest 入門-集団学習による分類・予測 -」 -第7回データマイニング+WEB勉強会@東京
「はじめてでもわかる RandomForest 入門-集団学習による分類・予測 -」 -第7回データマイニング+WEB勉強会@東京
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
Understanding Feature Space in Machine Learning
Understanding Feature Space in Machine LearningUnderstanding Feature Space in Machine Learning
Understanding Feature Space in Machine Learning
 
Top 10 Interesting Facts About Solar System
Top 10 Interesting Facts About Solar SystemTop 10 Interesting Facts About Solar System
Top 10 Interesting Facts About Solar System
 
Hands-on Deep Learning in Python
Hands-on Deep Learning in PythonHands-on Deep Learning in Python
Hands-on Deep Learning in Python
 

Ähnlich wie Steffen Rendle, Research Scientist, Google at MLconf SF

PF_IDETC_2012_Souma
PF_IDETC_2012_SoumaPF_IDETC_2012_Souma
PF_IDETC_2012_Souma
MDO_Lab
 
An_expected_improvement_criterion_for_the_global_optimization_of_a_noisy_comp...
An_expected_improvement_criterion_for_the_global_optimization_of_a_noisy_comp...An_expected_improvement_criterion_for_the_global_optimization_of_a_noisy_comp...
An_expected_improvement_criterion_for_the_global_optimization_of_a_noisy_comp...
Kanika Anand
 
The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...
Thanh Hieu
 
PPT Image Analysis(IRDE, DRDO)
PPT Image Analysis(IRDE, DRDO)PPT Image Analysis(IRDE, DRDO)
PPT Image Analysis(IRDE, DRDO)
Nidhi Gopal
 
Projection methods for stochastic structural dynamics
Projection methods for stochastic structural dynamicsProjection methods for stochastic structural dynamics
Projection methods for stochastic structural dynamics
University of Glasgow
 
Applying Model Checking Approach with Floating Point Arithmetic for Verificat...
Applying Model Checking Approach with Floating Point Arithmetic for Verificat...Applying Model Checking Approach with Floating Point Arithmetic for Verificat...
Applying Model Checking Approach with Floating Point Arithmetic for Verificat...
Sergey Staroletov
 
PF_MAO_2010_Souam
PF_MAO_2010_SouamPF_MAO_2010_Souam
PF_MAO_2010_Souam
MDO_Lab
 

Ähnlich wie Steffen Rendle, Research Scientist, Google at MLconf SF (20)

PF_IDETC_2012_Souma
PF_IDETC_2012_SoumaPF_IDETC_2012_Souma
PF_IDETC_2012_Souma
 
A Multiple Kernel Learning Based Fusion Framework for Real-Time Multi-View Ac...
A Multiple Kernel Learning Based Fusion Framework for Real-Time Multi-View Ac...A Multiple Kernel Learning Based Fusion Framework for Real-Time Multi-View Ac...
A Multiple Kernel Learning Based Fusion Framework for Real-Time Multi-View Ac...
 
CP3_SDM_2010_Souma
CP3_SDM_2010_SoumaCP3_SDM_2010_Souma
CP3_SDM_2010_Souma
 
Применение машинного обучения для навигации и управления роботами
Применение машинного обучения для навигации и управления роботамиПрименение машинного обучения для навигации и управления роботами
Применение машинного обучения для навигации и управления роботами
 
An_expected_improvement_criterion_for_the_global_optimization_of_a_noisy_comp...
An_expected_improvement_criterion_for_the_global_optimization_of_a_noisy_comp...An_expected_improvement_criterion_for_the_global_optimization_of_a_noisy_comp...
An_expected_improvement_criterion_for_the_global_optimization_of_a_noisy_comp...
 
GDRR Opening Workshop - Variance Reduction for Reliability Assessment with St...
GDRR Opening Workshop - Variance Reduction for Reliability Assessment with St...GDRR Opening Workshop - Variance Reduction for Reliability Assessment with St...
GDRR Opening Workshop - Variance Reduction for Reliability Assessment with St...
 
FMI output gap
FMI output gapFMI output gap
FMI output gap
 
Wp13105
Wp13105Wp13105
Wp13105
 
The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...
 
Compact 3 d_resist_models_general
Compact 3 d_resist_models_generalCompact 3 d_resist_models_general
Compact 3 d_resist_models_general
 
PPT Image Analysis(IRDE, DRDO)
PPT Image Analysis(IRDE, DRDO)PPT Image Analysis(IRDE, DRDO)
PPT Image Analysis(IRDE, DRDO)
 
A walk through the intersection between machine learning and mechanistic mode...
A walk through the intersection between machine learning and mechanistic mode...A walk through the intersection between machine learning and mechanistic mode...
A walk through the intersection between machine learning and mechanistic mode...
 
Phd Defense 2007
Phd Defense 2007Phd Defense 2007
Phd Defense 2007
 
Comprehensive Product Platform Planning (CP3) - Souma - AIAA/SDM2010
Comprehensive Product Platform Planning (CP3) - Souma - AIAA/SDM2010Comprehensive Product Platform Planning (CP3) - Souma - AIAA/SDM2010
Comprehensive Product Platform Planning (CP3) - Souma - AIAA/SDM2010
 
Prpagation of Error Bounds Across reduction interfaces
Prpagation of Error Bounds Across reduction interfacesPrpagation of Error Bounds Across reduction interfaces
Prpagation of Error Bounds Across reduction interfaces
 
AbdoSummerANS_mod3
AbdoSummerANS_mod3AbdoSummerANS_mod3
AbdoSummerANS_mod3
 
Projection methods for stochastic structural dynamics
Projection methods for stochastic structural dynamicsProjection methods for stochastic structural dynamics
Projection methods for stochastic structural dynamics
 
Applying Model Checking Approach with Floating Point Arithmetic for Verificat...
Applying Model Checking Approach with Floating Point Arithmetic for Verificat...Applying Model Checking Approach with Floating Point Arithmetic for Verificat...
Applying Model Checking Approach with Floating Point Arithmetic for Verificat...
 
PF_MAO_2010_Souam
PF_MAO_2010_SouamPF_MAO_2010_Souam
PF_MAO_2010_Souam
 
Development of Multi-Level ROM
Development of Multi-Level ROMDevelopment of Multi-Level ROM
Development of Multi-Level ROM
 

Mehr von MLconf

Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
MLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
MLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
MLconf
 

Mehr von MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 

Steffen Rendle, Research Scientist, Google at MLconf SF

  • 1. Factorization Models & Polynomial Regression Factorization Machines Applications Summary Factorization Machines Steen Rendle Current aliation: Google Inc. Work was done at University of Konstanz MLConf, November 14, 2014 Steen Rendle 1 / 53
  • 2. Factorization Models Polynomial Regression Factorization Machines Applications Summary Outline Factorization Models Polynomial Regression Factorization Models Linear/ Polynomial Regression Comparison Factorization Machines Applications Summary Steen Rendle 2 / 53
  • 3. Factorization Models Polynomial Regression Factorization Machines Applications Summary Matrix Factorization Example for data: Matrix Factorization: Movie TI NH SW ST ... 5 3 1 ? ... ? ? 4 5 ... 1 ? 5 ? ... ... ... ... ... ... A B C ... User ^ Y := W Ht ; W 2 RjUjk ;H 2 RjIjk k is the rank of the reconstruction. Steen Rendle 3 / 53
  • 4. Factorization Models Polynomial Regression Factorization Machines Applications Summary Matrix Factorization Example for data: Matrix Factorization: Movie TI NH SW ST ... 5 3 1 ? ... ? ? 4 5 ... 1 ? 5 ? ... ... ... ... ... ... A B C ... User ^ Y := W Ht ; W 2 RjUjk ;H 2 RjIjk ^y(u; i) = ^yu;i = Xk f =1 wu;f hi ;f = hwu; hi i k is the rank of the reconstruction. Steen Rendle 3 / 53
  • 5. Factorization Models Polynomial Regression Factorization Machines Applications Summary Matrix Factorization Extensions Example for data: Examples for models: Movie TI NH SW ST ... 5 3 1 ? ... ? ? 4 5 ... 1 ? 5 ? ... ... ... ... ... ... A B C ... User ^yMF(u; i ) := Xk f =1 vu;f vi ;f = hvu; vi i Steen Rendle 4 / 53
  • 6. Factorization Models Polynomial Regression Factorization Machines Applications Summary Matrix Factorization Extensions Example for data: Examples for models: Movie TI NH SW ST ... 5 3 1 ? ... ? ? 4 5 ... 1 ? 5 ? ... ... ... ... ... ... A B C ... User ^yMF(u; i ) := Xk f =1 vu;f vi ;f = hvu; vi i ^ySVD++(u; i) := * vu + X j2N(u) vj ; vi + ^yFact-KNN(u; i ) := 1 jR(u)j X j2R(u) ru;j hvi ; vj i Steen Rendle 4 / 53
  • 7. Factorization Models Polynomial Regression Factorization Machines Applications Summary Matrix Factorization Extensions Example for data: Examples for models: Movie TI NH SW ST ... 5 3 1 ? ... ? ? 4 5 ... 1 ? 5 ? ... ... ... ... ... ... A B C ... User ^yMF(u; i ) := Xk f =1 vu;f vi ;f = hvu; vi i ^ySVD++(u; i) := * vu + X j2N(u) vj ; vi + ^yFact-KNN(u; i ) := 1 jR(u)j X j2R(u) ru;j hvi ; vj i Rating Matrix time ^ytimeSVD(u; i ; t) := hvu + vu;t ; vi i ^ytimeTF(u; i ; t) := Xk f =1 vu;f vi ;f vt;f : : : Steen Rendle 4 / 53
  • 8. Factorization Models Polynomial Regression Factorization Machines Applications Summary Tensor Factorization Example for data: Examples for models: Triples of Subject, Predicate, Object ^yPARAFAC(s; p; o) := Xk f =1 vs;f vp;f vo;f ^yPITF(s; p; o) := hvs ; vpi + hvs ; voi + hvp; voi : : : Steen Rendle 5 / 53 [illustration from Drumond et al. 2012]
  • 9. Factorization Models Polynomial Regression Factorization Machines Applications Summary Sequential Factorization Models Example for data: Examples for models: Bt Bt­3 b b a b a c User 1 ? c e c c a ? d c e e ? ? User 2 User 3 User 4 Bt­2 Bt­1 a ^yFMC(u; i ; t) := X l2Bt1 hvi ; vl i ^yFPMC(u; i ; t) := hvu; vi i + X l2Bt1 hvi ; vl i : : : Steen Rendle 6 / 53
  • 10. Factorization Models Polynomial Regression Factorization Machines Applications Summary Factorization Models: Discussion I Advantages I Can estimate interactions between two (or more) variables even if the cross is not observed. I E.g. user movie, current product next product, user query url, : : : Steen Rendle 7 / 53
  • 11. Factorization Models Polynomial Regression Factorization Machines Applications Summary Factorization Models: Discussion I Advantages I Can estimate interactions between two (or more) variables even if the cross is not observed. I E.g. user movie, current product next product, user query url, : : : I Downsides I Factorization models are usually build speci
  • 12. cally for each problem. I Learning algorithms and implementations are tailored to individual models. Steen Rendle 7 / 53
  • 13. Factorization Models Polynomial Regression Factorization Machines Applications Summary Outline Factorization Models Polynomial Regression Factorization Models Linear/ Polynomial Regression Comparison Factorization Machines Applications Summary Steen Rendle 8 / 53
  • 14. Factorization Models Polynomial Regression Factorization Machines Applications Summary Data and Variable Representation Many standard ML approaches work with real valued feature vectors as input. It allows to represent, e.g.: I any number of variables I categorical domains by using dummy indicator variables I numerical domains I set-categorical domains by using dummy indicator variables Using this representation allows to apply a wide variety of standard models (e.g. linear regression, SVM, etc.). Steen Rendle 9 / 53
  • 15. Factorization Models Polynomial Regression Factorization Machines Applications Summary Linear Regression I Let x 2 Rp be an input vector with p predictor variables. I Model equation: ^y(x) := w0 + Xp i=1 wi xi I Model parameters: w0 2 R; w 2 Rp O(p) model parameters. Steen Rendle 10 / 53
  • 16. Factorization Models Polynomial Regression Factorization Machines Applications Summary Polynomial Regression I Let x 2 Rp be an input vector with p predictor variables. I Model equation (degree 2): ^y(x) := w0 + Xp i=1 wi xi + Xp i=1 Xp ji wi ;j xi xj I Model parameters: w0 2 R; w 2 Rp; W 2 Rpp O(p2) model parameters. Steen Rendle 11 / 53
  • 17. Factorization Models Polynomial Regression Factorization Machines Applications Summary Outline Factorization Models Polynomial Regression Factorization Models Linear/ Polynomial Regression Comparison Factorization Machines Applications Summary Steen Rendle 12 / 53
  • 18. Factorization Models Polynomial Regression Factorization Machines Applications Summary Representation: Matrix/ Tensor vs. Feature Vectors Matrix/ Tensor data can be represented by feature vectors: Movie TI NH SW ST ... 5 3 1 ? ... ? ? 4 5 ... 1 ? 5 ? ... ... ... ... ... ... A B C ... User Steen Rendle 13 / 53
  • 19. Factorization Models Polynomial Regression Factorization Machines Applications Summary Representation: Matrix/ Tensor vs. Feature Vectors Matrix/ Tensor data can be represented by feature vectors: Movie TI NH SW ST ... 5 3 1 ? ... ? ? 4 5 ... 1 ? 5 ? ... ... ... ... ... ... A B C ... User , # User Movie Rating 1 Alice Titanic 5 2 Alice Notting Hill 3 3 Alice Star Wars 1 4 Bob Star Wars 4 5 Bob Star Trek 5 6 Charlie Titanic 1 7 Charlie Star Wars 5 . . . . . . . . . . . . Steen Rendle 13 / 53
  • 20. Factorization Models Polynomial Regression Factorization Machines Applications Summary Representation: Matrix/ Tensor vs. Feature Vectors Matrix/ Tensor data can be represented by feature vectors: # User Movie Rating 1 Alice Titanic 5 2 Alice Notting Hill 3 3 Alice Star Wars 1 4 Bob Star Wars 4 5 Bob Star Trek 5 6 Charlie Titanic 1 7 Charlie Star Wars 5 . . . . . . . . . . . . ) 1 0 0 ... 1 0 0 ... x(3) 1 0 0 ... 0 0 1 0 ... 0 1 0 ... 0 1 0 ... 0 0 1 ... 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 ... ... ... ... ... 0 0 1 ... 0 0 1 0 ... A B C ... TI NH SW ST ... x(1) x(2) x(4) x(5) x(6) x(7) Feature vector x User Movie Target y 5 3 1 y(3) 4 5 1 5 y(1) y(2) y(4) y(5) y(6) y(7) Steen Rendle 13 / 53
  • 21. Factorization Models Polynomial Regression Factorization Machines Applications Summary Application to Sparse Feature Vectors 1 0 0 ... 1 0 0 ... x(3) 1 0 0 ... 0 0 1 0 ... 0 1 0 ... 0 1 0 ... 0 0 1 ... 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 ... ... ... ... ... 0 0 1 ... 0 0 1 0 ... A B C ... TI NH SW ST ... x(1) x(2) x(4) x(5) x(6) x(7) Feature vector x User Movie Target y 5 3 1 y(3) 4 5 1 5 y(1) y(2) y(4) y(5) y(6) y(7) Applying regression models to this data leads to: Steen Rendle 14 / 53
  • 22. Factorization Models Polynomial Regression Factorization Machines Applications Summary Application to Sparse Feature Vectors 1 0 0 ... 1 0 0 ... x(3) 1 0 0 ... 0 0 1 0 ... 0 1 0 ... 0 1 0 ... 0 0 1 ... 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 ... ... ... ... ... 0 0 1 ... 0 0 1 0 ... A B C ... TI NH SW ST ... x(1) x(2) x(4) x(5) x(6) x(7) Feature vector x User Movie Target y 5 3 1 y(3) 4 5 1 5 y(1) y(2) y(4) y(5) y(6) y(7) Applying regression models to this data leads to: Linear regression: ^y(x) = w0 + wu + wi Steen Rendle 14 / 53
  • 23. Factorization Models Polynomial Regression Factorization Machines Applications Summary Application to Sparse Feature Vectors 1 0 0 ... 1 0 0 ... x(3) 1 0 0 ... 0 0 1 0 ... 0 1 0 ... 0 1 0 ... 0 0 1 ... 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 ... ... ... ... ... 0 0 1 ... 0 0 1 0 ... A B C ... TI NH SW ST ... x(1) x(2) x(4) x(5) x(6) x(7) Feature vector x User Movie Target y 5 3 1 y(3) 4 5 1 5 y(1) y(2) y(4) y(5) y(6) y(7) Applying regression models to this data leads to: Linear regression: ^y(x) = w0 + wu + wi Polynomial regression: ^y(x) = w0 + wu + wi + wu;i Steen Rendle 14 / 53
  • 24. Factorization Models Polynomial Regression Factorization Machines Applications Summary Application to Sparse Feature Vectors 1 0 0 ... 1 0 0 ... x(3) 1 0 0 ... 0 0 1 0 ... 0 1 0 ... 0 1 0 ... 0 0 1 ... 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 ... ... ... ... ... 0 0 1 ... 0 0 1 0 ... A B C ... TI NH SW ST ... x(1) x(2) x(4) x(5) x(6) x(7) Feature vector x User Movie Target y 5 3 1 y(3) 4 5 1 5 y(1) y(2) y(4) y(5) y(6) y(7) Applying regression models to this data leads to: Linear regression: ^y(x) = w0 + wu + wi Polynomial regression: ^y(x) = w0 + wu + wi + wu;i Matrix factorization: ^y(u; i) = hwu; hi i Steen Rendle 14 / 53
  • 25. Factorization Models Polynomial Regression Factorization Machines Applications Summary Application to Sparse Feature Vectors For the data of the example: I Linear regression has no user-item interaction. Steen Rendle 15 / 53
  • 26. Factorization Models Polynomial Regression Factorization Machines Applications Summary Application to Sparse Feature Vectors For the data of the example: I Linear regression has no user-item interaction. I ) Linear regression is not expressive enough. Steen Rendle 15 / 53
  • 27. Factorization Models Polynomial Regression Factorization Machines Applications Summary Application to Sparse Feature Vectors For the data of the example: I Linear regression has no user-item interaction. I ) Linear regression is not expressive enough. I Polynomial regression includes pairwise interactions but cannot estimate them from the data. Steen Rendle 15 / 53
  • 28. Factorization Models Polynomial Regression Factorization Machines Applications Summary Application to Sparse Feature Vectors For the data of the example: I Linear regression has no user-item interaction. I ) Linear regression is not expressive enough. I Polynomial regression includes pairwise interactions but cannot estimate them from the data. I n p2: number of cases is much smaller than number of model parameters. Steen Rendle 15 / 53
  • 29. Factorization Models Polynomial Regression Factorization Machines Applications Summary Application to Sparse Feature Vectors For the data of the example: I Linear regression has no user-item interaction. I ) Linear regression is not expressive enough. I Polynomial regression includes pairwise interactions but cannot estimate them from the data. I n p2: number of cases is much smaller than number of model parameters. I Max.-likelihood estimator for a pairwise eect is: wi ;j = ( y w0 wi wu; if (i ; j ; y) 2 S: not de
  • 30. ned; else Steen Rendle 15 / 53
  • 31. Factorization Models Polynomial Regression Factorization Machines Applications Summary Application to Sparse Feature Vectors For the data of the example: I Linear regression has no user-item interaction. I ) Linear regression is not expressive enough. I Polynomial regression includes pairwise interactions but cannot estimate them from the data. I n p2: number of cases is much smaller than number of model parameters. I Max.-likelihood estimator for a pairwise eect is: wi ;j = ( y w0 wi wu; if (i ; j ; y) 2 S: not de
  • 32. ned; else I Polynomial regression cannot generalize to any unobserved pairwise eect. Steen Rendle 15 / 53
  • 33. Factorization Models Polynomial Regression Factorization Machines Applications Summary Outline Factorization Models Polynomial Regression Factorization Machines Model Examples Properties Learning libFM Software Applications Summary Steen Rendle 16 / 53
  • 34. Factorization Models Polynomial Regression Factorization Machines Applications Summary Factorization Machine (FM) I Let x 2 Rp be an input vector with p predictor variables. I Model equation (degree 2): ^y(x) := w0 + Xp i=1 wi xi + Xp i=1 Xp ji hvi ; vj i xi xj I Model parameters: w0 2 R; w 2 Rp; V 2 Rpk Steen Rendle 17 / 53 [Rendle 2010, Rendle 2012]
  • 35. Factorization Models Polynomial Regression Factorization Machines Applications Summary Factorization Machine (FM) I Let x 2 Rp be an input vector with p predictor variables. I Model equation (degree 2): ^y(x) := w0 + Xp i=1 wi xi + Xp i=1 Xp ji hvi ; vj i xi xj I Model parameters: w0 2 R; w 2 Rp; V 2 Rpk Compared to Polynomial regression: I Model equation (degree 2): ^y(x) := w0 + Xp i=1 wi xi + Xp i=1 Xp ji wi ;j xi xj I Model parameters: w0 2 R; w 2 Rp; W 2 Rpp Steen Rendle 17 / 53 [Rendle 2010, Rendle 2012]
  • 36. Factorization Models Polynomial Regression Factorization Machines Applications Summary Factorization Machine (FM) I Let x 2 Rp be an input vector with p predictor variables. I Model equation (degree 2): ^y(x) := w0 + Xp i=1 wi xi + Xp i=1 Xp ji hvi ; vj i xi xj I Model parameters: w0 2 R; w 2 Rp; V 2 Rpk Steen Rendle 17 / 53 [Rendle 2010, Rendle 2012]
  • 37. Factorization Models Polynomial Regression Factorization Machines Applications Summary Factorization Machine (FM) I Let x 2 Rp be an input vector with p predictor variables. I Model equation (degree 3): ^y(x) := w0 + Xp i=1 wi xi + Xp i=1 Xp ji hvi ; vj i xi xj + Xp i=1 Xp ji Xp lj Xk f =1 v(3) i ;f v(3) j ;f v(3) l ;f xi xj xl I Model parameters: w0 2 R; w 2 Rp; V 2 Rpk ; V(3) 2 Rpk Steen Rendle 17 / 53 [Rendle 2010, Rendle 2012]
  • 38. Factorization Models Polynomial Regression Factorization Machines Applications Summary Factorization Machines: Discussion I FMs work with real valued input. I FMs include variable interactions like polynomial regression. I Model parameters for interactions are factorized. I Number of model parameters is O(k p) (instead of O(p2) for poly. regr.). Steen Rendle 18 / 53
  • 39. Factorization Models Polynomial Regression Factorization Machines Applications Summary Outline Factorization Models Polynomial Regression Factorization Machines Model Examples Properties Learning libFM Software Applications Summary Steen Rendle 19 / 53
  • 40. Factorization Models Polynomial Regression Factorization Machines Applications Summary Matrix Factorization and Factorization Machines Two categorical variables encoded with real valued predictor variables: 1 0 0 ... 1 0 0 ... x(3) 1 0 0 ... 0 0 1 0 ... 0 1 0 ... 0 1 0 ... 0 0 1 ... 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 ... ... ... ... ... 0 0 1 ... 0 0 1 0 ... A B C ... TI NH SW ST ... x(1) x(2) x(4) x(5) x(6) x(7) Feature vector x User Movie With this data, the FM is identical to MF with biases1: ^y(x) = w0 + wu + wi + hvu; vi i | {z } MF 1libFM, k = 128, MCMC inference, Net ix RMSE=0.8937 Steen Rendle 20 / 53
  • 41. Factorization Models Polynomial Regression Factorization Machines Applications Summary RDF-Triple Prediction with Factorization Machines Three categorical variables encoded with real valued predictor variables: 1 0 0 ... 1 0 0 ... x(3) 1 0 0 ... 0 0 1 0 ... 0 0 0 1 ... 0 1 0 ... 0 1 0 ... 0 0 1 ... 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 ... ... ... ... ... 0 0 1 ... 0 0 1 0 ... S1 S2 S3 ... P1 P2 P3 P4 ... x(1) x(2) x(4) x(5) x(6) x(7) Feature vector x 1 0 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 ... ... ... ... ... 0 0 0 1 ... O1 O2 O3 O4 ... Subject Predicate Object With this data, the FM is equivalent to the PITF model: ^y(x) := w0 + ws + wp + wo + hvs ; vpi + hvs ; voi + hvp; voi [PITF: Rendle et al. 2010, WSDM Best Student Paper, ECML 2009 Best DC Award] Steen Rendle 21 / 53
  • 42. Factorization Models Polynomial Regression Factorization Machines Applications Summary Time with Factorization Machines Two categorical variables and time as linear predictor: 1 0 0 ... 1 0 0 ... x(3) 1 0 0 ... 0 0 1 0 ... 0 1 0 ... 0 1 0 ... 0 0 1 ... 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 ... ... ... ... ... 0 0 1 ... 0 0 1 0 ... A B C ... TI NH SW ST ... x(1) x(2) x(4) x(5) x(6) x(7) Feature vector x User Movie 0.2 0.6 0.61 0.3 0.5 0.1 0.8 Time The FM model would correspond to: ^y(x) := w0 + wi + wu + t wtime + hvu; vi i + t hvu; vtimei + t hvi ; vtimei Steen Rendle 22 / 53
  • 43. Factorization Models Polynomial Regression Factorization Machines Applications Summary Time with Factorization Machines Two categorical variables and time discretized in bins (b(t)): 1 0 0 ... 1 0 0 ... x(3) 1 0 0 ... 0 0 1 0 ... 0 1 0 ... 0 1 0 ... 0 0 1 ... 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 ... ... ... ... ... 0 0 1 ... 0 0 1 0 ... A B C ... TI NH SW ST ... x(1) x(2) x(4) x(5) x(6) x(7) Feature vector x User Movie 1 0 0 1 0 1 0 0 1 1 0 1 0 0 Time 0 0 0 0 0 0 1 T1 T2 T3 The FM model would correspond to:2 ^y(x) := w0 + wi + wu + wb(t) + hvu; vi i + hvu; vb(t)i + hvi ; vb(t)i 2libFM, k = 128, MCMC inference, Net ix RMSE=0.8873 Steen Rendle 22 / 53
  • 44. Factorization Models Polynomial Regression Factorization Machines Applications Summary SVD++ 1 0 0 ... 1 0 0 ... x(3) 1 0 0 ... 0 0 1 0 ... 0.3 0.3 0.3 0 ... 0 1 0 ... 0 1 0 ... 0 0 1 ... 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 ... ... ... ... ... 0 0 1 ... 0 0 1 0 ... A B C ... TI NH SW ST ... x(1) x(2) x(4) x(5) x(6) x(7) Feature vector x 0.3 0.3 0 0 0.5 0.3 0.3 0 0 0 0.3 0.3 0.5 0.5 0.5 0 0 0.5 0.5 0 ... ... ... ... ... 0.5 0 0.5 0 ... TI NH SW ST ... User Movie Other Movies rated With this data, the FM3 is identical to: ^y(x) = SVD++ z }| { w0 + wu + wi + hvu; vi i + 1 p jNuj X l2Nu hvi ; vl i + 1 p jNuj X l2Nu 0 @wl + hvu; vl i + 1 p jNuj X l 02Nu ;l 0l hvl ; v0 l i 1 A 3libFM, k = 128, MCMC inference, Net ix RMSE=0.8865 Steen Rendle 23 / 53 [Koren, 2008]
  • 45. Factorization Models Polynomial Regression Factorization Machines Applications Summary Factorizing Personalized Markov Chains (FPMC) Two categorical variables (u,i ), one set categorical (Bt1): 1 0 0 ... 1 0 0 ... x(3) 1 0 0 ... 0 0 1 0 ... 0.5 0.5 0 0 ... 0 1 0 ... 0 1 0 ... 0 0 1 ... 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 ... ... ... ... ... 0 0 1 ... 0 0 1 0 ... u1 u2 u3 ... A B C D ... x(1) x(2) x(4) x(5) x(6) x(7) Feature vector x 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 ... ... ... ... ... 1 0 0 0 ... User Product A B C D ... Last Basket Sequential Baskets u1 A,B C u2 C D u3 A C FM is equivalent to ^y(x) := w0 + wu + wi + 1 jBt1j X j2Bt1 wj + hvu; vi i + 1 jBt1j X j2Bt1 hvi ; vj i + ::: Steen Rendle 24 / 53 [Rendle et al. 2010, WWW Best Paper]
  • 46. Factorization Models Polynomial Regression Factorization Machines Applications Summary Outline Factorization Models Polynomial Regression Factorization Machines Model Examples Properties Learning libFM Software Applications Summary Steen Rendle 25 / 53
  • 47. Factorization Models Polynomial Regression Factorization Machines Applications Summary Computation Complexity Factorization Machine model equation: ^y(x) := w0 + Xp i=1 wi xi + Xp i=1 Xp ji hvi ; vj i xi xj I Trivial computation: O(p2 k) Steen Rendle 26 / 53
  • 48. Factorization Models Polynomial Regression Factorization Machines Applications Summary Computation Complexity Factorization Machine model equation: ^y(x) := w0 + Xp i=1 wi xi + Xp i=1 Xp ji hvi ; vj i xi xj I Trivial computation: O(p2 k) I Ecient computation can be done in: O(p k) Steen Rendle 26 / 53
  • 49. Factorization Models Polynomial Regression Factorization Machines Applications Summary Computation Complexity Factorization Machine model equation: ^y(x) := w0 + Xp i=1 wi xi + Xp i=1 Xp ji hvi ; vj i xi xj I Trivial computation: O(p2 k) I Ecient computation can be done in: O(p k) I Making use of many zeros in x even in: O(Nz (x) k), where Nz (x) is the number of non-zero elements in vector x. Steen Rendle 26 / 53
  • 50. Factorization Models Polynomial Regression Factorization Machines Applications Summary Ecient Computation The model equation of an FM can be computed in O(p k). Steen Rendle 27 / 53
  • 51. Factorization Models Polynomial Regression Factorization Machines Applications Summary Ecient Computation The model equation of an FM can be computed in O(p k). Proof: ^y(x) := w0 + Xp i=1 wi xi + Xp i=1 Xp ji hvi ; vj i xi xj = w0 + Xp i=1 wi xi + 1 2 Xk f =1 2 4 Xp i=1 xi vi ;f !2 Xp i=1 (xi vi ;f )2 3 5 Steen Rendle 27 / 53
  • 52. Factorization Models Polynomial Regression Factorization Machines Applications Summary Ecient Computation The model equation of an FM can be computed in O(p k). Proof: ^y(x) := w0 + Xp i=1 wi xi + Xp i=1 Xp ji hvi ; vj i xi xj = w0 + Xp i=1 wi xi + 1 2 Xk f =1 2 4 Xp i=1 xi vi ;f !2 Xp i=1 (xi vi ;f )2 3 5 I In the sums over i , only non-zero xi elements have to be summed up ) O(Nz (x) k). I (The complexity of polynomial regression is O(Nz (x)2).) Steen Rendle 27 / 53
  • 53. Factorization Models Polynomial Regression Factorization Machines Applications Summary Multilinearity FMs are multilinear: 8 2 = fw0;w1; : : : ;wp; v1;1; : : : ; vp;kg : ^y(x; ) = h()(x) + g()(x) where g() and h() do not depend on the value of . Steen Rendle 28 / 53
  • 54. Factorization Models Polynomial Regression Factorization Machines Applications Summary Multilinearity FMs are multilinear: 8 2 = fw0;w1; : : : ;wp; v1;1; : : : ; vp;kg : ^y(x; ) = h()(x) + g()(x) where g() and h() do not depend on the value of . E.g. for second order eects ( = vl ;f ): ^y(x; vl;f ) := g(vl;f )(x) z }| { w0 + Xp i=1 wi xi + Xp i=1 Xp j=i+1 Xk f 0=1 (f 06=f )_(l62fi ;jg) vi ;f 0 vj;f 0 xi xj + vl;f xl X i=1;i6=l vi ;f xi | {z } h(vl;f )(x) Steen Rendle 28 / 53
  • 55. Factorization Models Polynomial Regression Factorization Machines Applications Summary Outline Factorization Models Polynomial Regression Factorization Machines Model Examples Properties Learning libFM Software Applications Summary Steen Rendle 29 / 53
  • 56. Factorization Models Polynomial Regression Factorization Machines Applications Summary Learning Using these properties, learning algorithms can be developed: I L2-regularized regression and classi
  • 57. cation: I Stochastic gradient descent [Rendle, 2010] I Alternating least squares/ Coordinate Descent [Rendle et al., 2011, Rendle 2012] I Markov Chain Monte Carlo (for Bayesian FMs) [Freudenthaler et al. 2011, Rendle 2012] I L2-regularized ranking: I Stochastic gradient descent [Rendle, 2010] All the proposed learning algorithms have a runtime of O(k Nz (X) i ), where i is the number of iterations and Nz (X) the number of non-zero elements in the design matrix X. Steen Rendle 30 / 53
  • 58. Factorization Models Polynomial Regression Factorization Machines Applications Summary Stochastic Gradient Descent (SGD) I For each training case (x; y) 2 S, SGD updates the FM model parameter using: 0 = (^y(x) y)h()(x) + () I is the learning rate / step size. I () is the regularization value of the parameter . I SGD can easily be applied to other loss functions. Steen Rendle 31 / 53 [Rendle, 2010]
  • 59. Factorization Models Polynomial Regression Factorization Machines Applications Summary Coordinate Descent (CD) I CD updates each FM model parameter using: 0 = P (x;y)2S y g()(x) h()(x) P (x;y)2S h2 ()(x) + () I Using caches of intermediate results, the runtime for updating all model parameters is O(k Nz (X)). I CD can be extended to classi
  • 60. cation [Rendle, 2012]. Steen Rendle 32 / 53 [Rendle et al., 2011]
  • 61. Factorization Models Polynomial Regression Factorization Machines Applications Summary Gibbs Sampling (MCMC) I Gibbs sampling with a block for each FM model parameter : jS; n fg N P (x;y)2S y g()(x) h()(x) P (x;y)2S h2 ()(x) + () ; 1 P (x;y)2S h2 ()(x) + () ! I Mean is the same as for CD ) computational complexity is also O(k Nz (X)). I MCMC can be extended to classi
  • 62. cation using link functions. Steen Rendle 33 / 53 [Freudenthaler et al. 2011, Rendle 2012]
  • 63. Factorization Models Polynomial Regression Factorization Machines Applications Summary Learning Regularization Values  v ,v yi w ,w wj w0 xij i=1,...,n v j  j=1,...,p  , 0 ,0 yi wj w0 w xij i=1,...,n v j  j=1,...,p w0 ,w0 0 ,0 w0 ,w0 w  v v Standard FM with priors. Two level FM with hyperpriors. Steen Rendle 34 / 53 [Freudenthaler et al., 2011]
  • 64. Factorization Models Polynomial Regression Factorization Machines Applications Summary Outline Factorization Models Polynomial Regression Factorization Machines Model Examples Properties Learning libFM Software Applications Summary Steen Rendle 35 / 53
  • 65. Factorization Models Polynomial Regression Factorization Machines Applications Summary libFM Software libFM is an implementation of FMs I Model: second-order FMs I Learning/ inference: SGD, ALS, MCMC I Classi
  • 66. cation and regression I Uses the same data format as LIBSVM, LIBLINEAR [Lin et. al], SVMlight [Joachims]. I Supports variable grouping. I Open source: GPLv3. Steen Rendle 36 / 53 [http://www.libfm.org/]
  • 67. Factorization Models Polynomial Regression Factorization Machines Applications Summary Outline Factorization Models Polynomial Regression Factorization Machines Applications Recommender Systems Link Prediction in Social Networks Clickthrough Prediction Personalized Ranking Student Performance Prediction Kaggle Competitions Summary Steen Rendle 37 / 53
  • 68. Factorization Models Polynomial Regression Factorization Machines Applications Summary (Context-aware) Rating Prediction I Main variables: I User ID (categorical) I Item ID (categorical) I Additional variables: I time I mood I user pro
  • 69. le I item meta data I . . . I Examples: Net ix prize, Movielens, KDDCup 2011 + ♪ + + Song User Time Mood Steen Rendle 38 / 53
  • 70. Factorization Models Polynomial Regression Factorization Machines Applications Summary Net ix Prize Netflix Prize: Prediction Error Public Leaderboard RMS Error 0.86 0.87 0.88 0.89 0.90 user, movie user, movie, day user, movie, impl. user, movie, day, impl. SGD Matrix Factorization user, movie, day, impl., freq, lin. day $1M Prize I k = 128 factors, 512 MCMC samples (no burnin phase, initialization from random) I MCMC inference (no hyperparameters (learning rate, regularization) to specify) Steen Rendle 39 / 53
  • 71. Factorization Models Polynomial Regression Factorization Machines Applications Summary Net ix Prize Method (Name) Ref. Learning Method k Quiz RMSE Models using user ID and item ID Probabilistic Matrix Factorization [14, 13] Batch GD 40 *0.9170 Probabilistic Matrix Factorization [14, 13] Batch GD 150 0.9211 Matrix Factorization [6] Variational Bayes 30 *0.9141 Matchbox [15] Variational Bayes 50 *0.9100 ALS-MF [7] ALS 100 0.9079 ALS-MF [7] ALS 1000 *0.9018 SVD/ MF [3] SGD 100 0.9025 SVD/ MF [3] SGD 200 *0.9009 Bayesian Probablistic Matrix Factorization [13] MCMC 150 0.8965 (BPMF) Bayesian Probablistic Matrix Factorization (BPMF) [13] MCMC 300 *0.8954 FM, pred. var: user ID, movie ID - MCMC 128 0.8937 Models using implicit feedback Probabilistic Matrix Factorization with Cons- traints [14] Batch GD 30 *0.9016 SVD++ [3] SGD 100 0.8924 SVD++ [3] SGD 200 *0.8911 BSRM/F [18] MCMC 100 0.8926 BSRM/F [18] MCMC 400 *0.8874 FM, pred. var: user ID, movie ID, impl. - MCMC 128 0.8865 Steen Rendle 40 / 53
  • 72. Factorization Models Polynomial Regression Factorization Machines Applications Summary Net ix Prize Method (Name) Ref. Learning Method k Quiz RMSE Models using time information Bayesian Probabilistic Tensor Factorization [17] MCMC 30 *0.9044 (BPTF) FM, pred. var: user ID, movie ID, day - MCMC 128 0.8873 Models using time and implicit feedback timeSVD++ [5] SGD 100 0.8805 timeSVD++ [5] SGD 200 *0.8799 FM, pred. var: user ID, movie ID, day, impl. - MCMC 128 0.8809 FM, pred. var: user ID, movie ID, day, impl. - MCMC 256 0.8794 Assorted models BRISMF/UM NB corrected [16] SGD 1000 *0.8904 BMFSI plus side information [8] MCMC 100 *0.8875 timeSVD++ plus frequencies [4] SGD 200 0.8777 timeSVD++ plus frequencies [4] SGD 2000 *0.8762 FM, pred. var: user ID, movie ID, day, impl., - MCMC 128 0.8779 freq., lin. day FM, pred. var: user ID, movie ID, day, impl., freq., lin. day - MCMC 256 0.8771 Steen Rendle 40 / 53
  • 73. Factorization Models Polynomial Regression Factorization Machines Applications Summary Outline Factorization Models Polynomial Regression Factorization Machines Applications Recommender Systems Link Prediction in Social Networks Clickthrough Prediction Personalized Ranking Student Performance Prediction Kaggle Competitions Summary Steen Rendle 41 / 53
  • 74. Factorization Models Polynomial Regression Factorization Machines Applications Summary Link Prediction in Social Networks I Main variables: I Actor A ID I Actor B ID I Additional variables: I pro
  • 75. les I actions I . . . + Actor A Actor B Steen Rendle 42 / 53
  • 76. Factorization Models Polynomial Regression Factorization Machines Applications Summary KDDCup 2012: Track 1 KDDCup 2012 Track 1: Prediction Quality Public Leaderboard Private Leaderboard Mean Average Precision @3 0.32 0.34 0.36 0.38 0.40 0.42 none gender, age, ... keywords friends all none gender, age, ... keywords friends all Top 1 Top 5 Top 10 Top 100 I k = 22 factors, 512 MCMC samples (no burnin phase, initialization from random) I MCMC inference (no hyperparameters (learning rate, regularization) to specify) Steen Rendle 43 / 53 [Awarded 2nd place (out of 658 teams)]
  • 77. Factorization Models Polynomial Regression Factorization Machines Applications Summary Outline Factorization Models Polynomial Regression Factorization Machines Applications Recommender Systems Link Prediction in Social Networks Clickthrough Prediction Personalized Ranking Student Performance Prediction Kaggle Competitions Summary Steen Rendle 44 / 53
  • 78. Factorization Models Polynomial Regression Factorization Machines Applications Summary Clickthrough Prediction I Main variables: I User ID I Query ID I Ad/ Link ID I Additional variables: I query tokens I user pro
  • 79. le I . . . + keyword... + Link 1 Link 2 Link 3 User Query Ad/ Link Steen Rendle 45 / 53
  • 80. Factorization Models Polynomial Regression Factorization Machines Applications Summary KDDCup 2012: Track 2 Model Inference wAUC (public) wAUC (private) ID-based model (k = 0) SGD 0.78050 0.78086 Attribute-based model (k = 8) MCMC 0.77409 0.77555 Mixed model (k = 8) SGD 0.79011 0.79321 Final ensemble n/a 0.79857 0.80178 Ensemble I Rank positions (not predicted clickthrough rates) are used. I The MCMC attribute-based model and dierent variations of the . SGD models are included. Steen Rendle 46 / 53 [Awarded 3rd place (out of 171 teams)]
  • 81. Factorization Models Polynomial Regression Factorization Machines Applications Summary Outline Factorization Models Polynomial Regression Factorization Machines Applications Recommender Systems Link Prediction in Social Networks Clickthrough Prediction Personalized Ranking Student Performance Prediction Kaggle Competitions Summary Steen Rendle 47 / 53
  • 82. Factorization Models Polynomial Regression Factorization Machines Applications Summary ECML/PKDD Discovery Challenge 2013 I Problem: Recommend given names. I Main variables: I User ID I Name ID I Additional variables: I session info I string representation for each name I . . . I FM approach won 1st place (online track) and 2nd (oine track). Steen Rendle 48 / 53
  • 83. Factorization Models Polynomial Regression Factorization Machines Applications Summary Outline Factorization Models Polynomial Regression Factorization Machines Applications Recommender Systems Link Prediction in Social Networks Clickthrough Prediction Personalized Ranking Student Performance Prediction Kaggle Competitions Summary Steen Rendle 49 / 53
  • 84. Factorization Models Polynomial Regression Factorization Machines Applications Summary Student Performance Prediction I Main variables: I Student ID I Question ID I Additional variables: I question hierarchy I sequence of questions I skills required I . . . I Examples: KDDCup 2010, Grockit Challenge4 (FM placed 1st/241) + ? Student Question 4http://www.kaggle.com/c/WhatDoYouKnow Steen Rendle 50 / 53
  • 85. Factorization Models Polynomial Regression Factorization Machines Applications Summary Outline Factorization Models Polynomial Regression Factorization Machines Applications Recommender Systems Link Prediction in Social Networks Clickthrough Prediction Personalized Ranking Student Performance Prediction Kaggle Competitions Summary Steen Rendle 51 / 53
  • 86. Factorization Models Polynomial Regression Factorization Machines Applications Summary Kaggle Competitions FMs have been successfully applied to several Kaggle competitions: I Criteon Display Advertising Challenge: 1st place (team '3 idiots'). I Blue Book for Bulldozers: 1st place (team 'Leustagos Titericz'). I EMI Music Data Science Hackathon: 2nd place (team 'lns'). Steen Rendle 52 / 53
  • 87. Factorization Models Polynomial Regression Factorization Machines Applications Summary Summary I Factorization machines combine linear/polynomial regression with factorization models. I Feature interactions are learned with a low rank representation. I Estimation of unobserved interactions is possible. I Factorization machines can be computed eciently and have high prediction quality. Steen Rendle 53 / 53
  • 88. Factorization Models Polynomial Regression Factorization Machines Applications Summary L. Drumond, S. Rendle, and L. Schmidt-Thieme. Predicting rdf triples in incomplete knowledge bases with tensor factorization. In Proceedings of the 27th Annual ACM Symposium on Applied Computing, SAC '12, pages 326{331, New York, NY, USA, 2012. ACM. C. Freudenthaler, L. Schmidt-Thieme, and S. Rendle. Bayesian factorization machines. In NIPS workshop on Sparse Representation and Low-rank Approximation, 2011. Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative
  • 89. ltering model. In KDD '08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 426{434, New York, NY, USA, 2008. ACM. Y. Koren. The bellkor solution to the net ix grand prize. 2009. Steen Rendle 53 / 53
  • 90. Factorization Models Polynomial Regression Factorization Machines Applications Summary Y. Koren. Collaborative
  • 91. ltering with temporal dynamics. In KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 447{456, New York, NY, USA, 2009. ACM. Y. J. Lim and Y. W. Teh. Variational Bayesian approach to movie rating prediction. In Proceedings of KDD Cup and Workshop, 2007. I. Pilaszy, D. Zibriczky, and D. Tikk. Fast als-based matrix factorization for explicit and implicit feedback datasets. In RecSys '10: Proceedings of the fourth ACM conference on Recommender systems, pages 71{78, New York, NY, USA, 2010. ACM. I. Porteous, A. Asuncion, and M. Welling. Bayesian matrix factorization with side information and dirichlet process mixtures. In Proceedings of the Twenty-Fourth AAAI Conference on Arti
  • 92. cial Intelligence, AAAI 2010, pages 563{568, 2010. Steen Rendle 53 / 53
  • 93. Factorization Models Polynomial Regression Factorization Machines Applications Summary S. Rendle. Factorization machines. In Proceedings of the 2010 IEEE International Conference on Data Mining, ICDM '10, pages 995{1000, Washington, DC, USA, 2010. IEEE Computer Society. S. Rendle. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol., 3(3):57:1{57:22, May 2012. S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme. Factorizing personalized markov chains for next-basket recommendation. In WWW '10: Proceedings of the 19th international conference on World wide web, pages 811{820, New York, NY, USA, 2010. ACM. S. Rendle, Z. Gantner, C. Freudenthaler, and L. Schmidt-Thieme. Fast context-aware recommendations with factorization machines. In Proceedings of the 34th ACM SIGIR Conference on Reasearch and Development in Information Retrieval. ACM, 2011. R. Salakhutdinov and A. Mnih. Steen Rendle 53 / 53
  • 94. Factorization Models Polynomial Regression Factorization Machines Applications Summary Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th international conference on Machine learning, ICML '08, pages 880{887, New York, NY, USA, 2008. ACM. R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1257{1264, Cambridge, MA, 2008. MIT Press. D. H. Stern, R. Herbrich, and T. Graepel. Matchbox: large scale online bayesian recommendations. In Proceedings of the 18th international conference on World wide web, WWW '09, pages 111{120, New York, NY, USA, 2009. ACM. G. Takacs, I. Pilaszy, B. Nemeth, and D. Tikk. Scalable collaborative
  • 95. ltering approaches for large recommender systems. J. Mach. Learn. Res., 10:623{656, June 2009. Steen Rendle 53 / 53
  • 96. Factorization Models Polynomial Regression Factorization Machines Applications Summary L. Xiong, X. Chen, T.-K. Huang, J. Schneider, and J. G. Carbonell. Temporal collaborative
  • 97. ltering with bayesian probabilistic tensor factorization. In Proceedings of the SIAM International Conference on Data Mining, pages 211{222. SIAM, 2010. S. Zhu, K. Yu, and Y. Gong. Stochastic relational models for large-scale dyadic data using MCMC. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1993{2000, 2009. Steen Rendle 53 / 53