SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings
1. SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings
Dr. Matthew Rowe
School of Computing and Communications
@mrowebot | m.rowe@lancaster.ac.uk
International Conference on Web Intelligence 2014
Warsaw, Poland
3. Latent Factor Models: Factor Consistency Problem
[Figure: a toy 3x3 user-item ratings matrix factorised into F latent factors, re-learnt at successive time points, with unknown alignment between the factor spaces]
• Cannot "accurately" align latent factors between models trained at different times
• Cannot tell how users' tastes have evolved
• F = #factors (set a priori)
4. Solution: Semantic Categories
[Figure: the same toy ratings matrix, with each item i mapped to a <URI> and its set of {<SKOS_CATEGORY>} entries]
• c = dimensionality of the category space
• Model the user's preference for category c at time s
5. Semantic Alignment of Datasets
For each movie item:
1. SPARQL query for candidate URIs from the movie's title
2. Get the semantic categories of each candidate
3. Disambiguate based on the movie's year
Output: {(ItemID, <URI>)}
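The candidate-lookup step can be sketched as a query builder (a minimal illustration; the query shape, variable names, and label-matching strategy are assumptions, not the paper's exact query):

```python
def candidate_uri_query(title: str) -> str:
    """Build a SPARQL query returning candidate DBpedia URIs (and their
    dcterms:subject categories) whose label contains the movie title.
    Hypothetical query shape for illustration only."""
    safe = title.replace('"', '\\"')  # naive escaping for the illustration
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?uri ?category WHERE {{
  ?uri rdfs:label ?label ;
       dcterms:subject ?category .
  FILTER (CONTAINS(LCASE(STR(?label)), LCASE("{safe}")))
}}
"""

query = candidate_uri_query("Alien")
```

The returned candidates would then be filtered by release year, as in the pipeline above.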
[Fig. 1. Distribution of reviews per day across the MovieLens and MovieTweetings datasets. The first dashed blue line indicates the cutoff point for the training set, and the dashed red line indicates the cutoff point for the test set, i.e. every rating after that point is placed in the test set. The validation set contains the ratings between the blue and red dashed lines.]
For the film "Alien", released in 1979, which we shall now use as a running example, the following categories are found:

<http://dbpedia.org/resource/Alien_(film)>
    dcterms:subject category:Alien_(franchise)_films ;
    dcterms:subject category:1979_horror_films .

In this work we use DBpedia URIs, given their relation to Wikipedia pages.
6. Reduced Recommendation Datasets
• Semantic alignment = fewer elements
• Time-ordered datasets split for experiments: 80%/10%/10% for training/validation/testing
We also note that the reduction in the number of ratings is not as great; this suggests two things: (i) mapped items are popular, and thus dominate the ratings; and (ii) obscure items are present within the data.
TABLE I. Statistics of the revised review datasets used for our analysis and experiments. Reductions over the original datasets are shown in parentheses.

Dataset         #Users        #Items          #Ratings
MovieLens       5,390 (-11%)  3,231 (-12.1%)  841,602 (-6.7%)
MovieTweetings  2,357 (-89%)  7,913 (-30.8%)  73,397 (-38.2%)
Total           7,747         11,144          914,999
As Table I suggests, certain more "obscure" movies do not have DBpedia URIs; despite our use of the most recent DBpedia datasets (i.e. version 3.9), coverage is still limited in certain places. The reason for this lack of coverage is largely that an obscure film has no Wikipedia page. For instance, for the MovieLens dataset we fail to map the three movies "Never Met Picasso", "Diebinnen" and "Follow the Bitch": despite these films having IMDb pages, they have no Wikipedia page, and hence no DBpedia entry. For the MovieTweetings dataset we fail to map "Summer Coda" and …
Hipster Dilemma: occurs when obscure movie items cannot be aligned to Semantic Web URIs!
7. Forming Semantic Taste Profiles
1. Split the user's training ratings into 5 lifecycle stages
2. For each stage, derive the user's average rating per semantic category
3. Calculate the probability of the user rating each category highly: the taste profile P^u_s
From this point onwards we reserve characters for set notations, as follows: u denotes users, and i, j denote items. r denotes a known rating value (where r ∈ [1, 5] or r ∈ [1, 10], depending on the dataset), and r̂ denotes a predicted rating value. Ratings are provided as quadruples of the form (u, i, r, t), where t denotes the time of the rating, segmented into training (D_train), validation (D_valid) and test (D_test) sets by the above-mentioned splits. c denotes a semantic category that an item has been mapped to; cats(i) is a convenience function that returns the set of semantic categories of item i.

Taste profiles describe the preferences that a user holds at a point in time for given semantic categories. Understanding how a profile at one point in time relates to a profile at an earlier point in time indicates whether taste evolution has taken place. In prior work by McAuley and Leskovec [5], the assessment of taste evolution in the context of review platforms (e.g. BeerAdvocate and RateBeer) demonstrated the propensity of users' tastes to change over time.
From these definitions we then derived the discrete probability distribution of the user rating the category favourably as follows, defining the set $C^{u,s}_{train}$ as containing all unique categories of items rated by u in stage s:

$$\Pr(c \mid D^{u,s}_{train}) = \frac{avrating(D^{u,s,c}_{train})}{\sum_{c' \in C^{u,s}_{train}} avrating(D^{u,s,c'}_{train})} \qquad (4)$$
When implementing this approach, we only consider the categories that item URIs are directly mapped to; that is, only those categories that are connected to the URI by the dcterms:subject predicate. Prior work by Ostuni et al. [8] performed a mapping where grandparent categories were mapped to URIs; however, we chose the parent categories in this instance to open up the possibility of other mappings in the future, i.e. via linked data node vertex kernels.
B. User Taste Evolution: From Prior Taste Profiles

We now turn to looking at the evolution of users' tastes over time in order to understand how their preferences change. Given our use of probability distributions to model the lifecycle-stage-specific taste profile of each user, we can apply information-theoretic measures based on information entropy. One such measure is conditional entropy, which enables one to assess the user's ratings distribution per semantic category within the allotted time window (provided by the lifecycle stage of the user, as this denotes a closed interval, i.e. $s = [t, t'],\ t < t'$).
We formed a discrete probability distribution for category c at time period $s \in S$ (where S is the set of 5 lifecycle stages) by interpolating the user's ratings within the distribution. We first defined two sets, the former ($D^{u,s,c}_{train}$) corresponding to the ratings by u during period/stage s for items from category c, and the latter ($D^{u,s}_{train}$) corresponding to ratings by u during s, hence $D^{u,s,c}_{train} \subseteq D^{u,s}_{train}$; these sets are formed as follows:

$$D^{u,s,c}_{train} = \{(u, i, r, t) : (u, i, r, t) \in D_{train},\ t \in s,\ c \in cats(i)\} \qquad (1)$$

$$D^{u,s}_{train} = \{(u, i, r, t) : (u, i, r, t) \in D_{train},\ t \in s\} \qquad (2)$$
We then defined the function avrating to derive the average rating value from all rating quadruples in a given set:

$$avrating(D^{u,s}_{train}) = \frac{1}{|D^{u,s}_{train}|} \sum_{(u,i,r,t) \in D^{u,s}_{train}} r \qquad (3)$$
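The stage-specific sets and Eqs. 1-4 can be sketched as follows (a minimal illustration with made-up quadruples; the `cats` dictionary is a stand-in for the item-to-category mapping):

```python
from collections import defaultdict

# Toy training quadruples (user, item, rating, time) and item categories.
D_train = [("u1", "i1", 5, 1), ("u1", "i2", 3, 2), ("u1", "i3", 4, 2)]
cats = {"i1": {"horror"}, "i2": {"sci-fi"}, "i3": {"horror", "sci-fi"}}

def avrating(quads):
    """Eq. 3: mean rating over a set of rating quadruples."""
    return sum(r for (_, _, r, _) in quads) / len(quads)

def stage_profile(D, user, stage_times):
    """Eqs. 1, 2 and 4: per-category rating distribution for one stage."""
    D_us = [q for q in D if q[0] == user and q[3] in stage_times]   # Eq. 2
    per_cat = defaultdict(list)
    for q in D_us:
        for c in cats[q[1]]:
            per_cat[c].append(q)                                    # Eq. 1
    av = {c: avrating(qs) for c, qs in per_cat.items()}
    z = sum(av.values())
    return {c: v / z for c, v in av.items()}                        # Eq. 4

P_u1 = stage_profile(D_train, "u1", stage_times={1, 2})
```

Here `P_u1` sums to 1 over the categories rated in the stage, as Eq. 4 requires.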
C. User Taste Evolution: From Global Tastes

Our second measure uses transfer entropy to assess whether the local taste at step s has been influenced by the local and global tastes at the previous stage (s−1).
8. Taste Evolution from Taste Profiles
[Fig. 2. Conditional entropy between consecutive lifecycle stages (e.g. H(P_2 | P_3)) across the datasets, together with the bounds of the 95% confidence interval for the derived means. (a) MovieLens; (b) MovieTweetings.]
…users who posted ratings within the time interval of stage s. Now, assume that we have a random variable that describes the local categories that have been reviewed at the current stage ($Y_s$), a random variable of local categories at the previous stage ($Y_{s-1}$), and a third random variable of global categories at the previous stage ($X_{s-1}$); we then define the transfer entropy of one lifecycle stage to another as follows [11]:

$$T_{X \to Y} = H(Y_s \mid Y_{s-1}) - H(Y_s \mid Y_{s-1}, X_{s-1}) \qquad (6)$$
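The two measures can be sketched on discrete distributions in pure Python (the joint distributions here are toy dictionaries, not the paper's data):

```python
from math import log2

def cond_entropy(joint):
    """H(Y|X) from a joint distribution {(x, y): p}."""
    px = {}
    for (x, _), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    return -sum(p * log2(p / px[x]) for (x, _), p in joint.items() if p > 0)

def transfer_entropy(joint_yyx):
    """Eq. 6: T_{X->Y} = H(Y_s | Y_{s-1}) - H(Y_s | Y_{s-1}, X_{s-1}),
    given a joint distribution {(y_prev, x_prev, y_now): p}."""
    # Marginalise out x_prev to get the joint of (y_prev, y_now).
    joint_yy = {}
    for (yp, xp, y), p in joint_yyx.items():
        joint_yy[(yp, y)] = joint_yy.get((yp, y), 0.0) + p
    h_y_given_yprev = cond_entropy(joint_yy)
    # Condition on the pair (y_prev, x_prev) jointly.
    h_y_given_both = cond_entropy(
        {((yp, xp), y): p for (yp, xp, y), p in joint_yyx.items()})
    return h_y_given_yprev - h_y_given_both
```

When the global categories carry no extra information about the current stage, the transfer entropy is zero, which is the intuition behind the "global influence" reading of Fig. 3.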
Prior Tastes Comparison
• Computed conditional entropy between consecutive profiles
• Increase: divergence from prior tastes
• Both datasets' users diverge from prior tastes
[Fig. 3. Transfer entropy between consecutive lifecycle stages across the datasets, together with the bounds of the 95% confidence interval for the derived means. (a) MovieLens; (b) MovieTweetings.]
Global Influence
• Computed transfer entropy of how global tastes have influenced users' tastes
• Decrease: global tastes have a stronger influence than prior tastes
• Difference between datasets in global influence's role
9. Putting it all together: SemanticSVD++!
…named SemanticSVD++, an extension of Koren et al.'s earlier SVD++ model [2]. The predictive function of the model is shown in full in Eq. 8; we now explain each component in greater detail.

$$\hat{r}_{ui} = \overbrace{\mu + b_i + b_u}^{\text{Static Biases}} + \overbrace{\alpha_i b_{i,cats(i)} + \alpha_u b_{u,cats(i)}}^{\text{Category Biases}} + \overbrace{q_i^{\top} \Big( p_u + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} y_j + |cats(R(u))|^{-\frac{1}{2}} \sum_{c \in cats(R(u))} z_c \Big)}^{\text{Personalisation Component}} \qquad (8)$$

A. Static Biases
Modified version of SVD++ with:
• User taste evolution captured in semantic category biases
• Semantic personalisation component: latent factor vectors z_c for each of the categories rated by the user
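The predictive function in Eq. 8 can be sketched with numpy (all inputs below are made-up toy values; in the real model they are learnt by SGD):

```python
import numpy as np

def predict(mu, b_i, b_u, b_i_cats, b_u_cats, alpha_i, alpha_u,
            q_i, p_u, y_js, z_cs):
    """Eq. 8: static biases + weighted category biases + personalisation."""
    static = mu + b_i + b_u
    category = alpha_i * b_i_cats + alpha_u * b_u_cats
    implicit = p_u \
        + len(y_js) ** -0.5 * np.sum(y_js, axis=0) \
        + len(z_cs) ** -0.5 * np.sum(z_cs, axis=0)
    return static + category + q_i @ implicit

rng = np.random.default_rng(0)
f = 4  # number of latent factors
r_hat = predict(3.5, 0.1, -0.2, 0.6, 0.7, 0.5, 0.5,
                rng.normal(size=f), rng.normal(size=f),
                rng.normal(size=(3, f)),   # y_j for 3 rated items R(u)
                rng.normal(size=(5, f)))   # z_c for 5 rated categories
```

With all latent vectors at zero, the prediction collapses to the bias terms alone, which makes the role of the personalisation component easy to see.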
10. Incorporating Taste Evolution with Biases
Given that a single item can be linked to many categories on the web of linked data, we take the average across all categories as the bias of the user given the categories of the item:

$$b_{u,cats(i)} = \frac{1}{|cats(i)|} \sum_{c \in cats(i)} \Pr(+ \mid c, u) \qquad (15)$$

Other schemes for calculating the biases towards categories (both item and user) could be used, e.g. choosing the maximum bias; however, we use the average as an initial scheme.
3) Weighting Category Biases: The above category biases are derived as static features within the recommendation model (Eq. 8), mined from the provided training portion; however, each user may be influenced by these factors in different ways when performing their ratings. To this end we included two weights, one for each category bias, defined as $\alpha_i$ and $\alpha_u$ for the item biases to categories and the user biases to categories respectively. As we will explain below, these weights are then learnt during the training phase of inducing the model.
The static biases include the general bias of the given dataset (μ), which is the mean rating score across all ratings in the training segment, together with the item bias ($b_i$) and the user bias ($b_u$).
1) Item Biases Towards Categories: We measured, per category, the proportional change in the probability of the category being rated highly ($Q_s(c)$) over the lifecycle stages, where k is the stage from which a monotonic increase or decrease in the probability of rating category c began:

$$\Delta_c = \frac{1}{4-k} \sum_{s=k}^{4} \frac{Q_{s+1}(c) - Q_s(c)}{Q_s(c)} \qquad (9)$$

From this we then calculated the conditional probability of a given category being rated highly by accounting for the change rate of rating preference for the category as follows:

$$\Pr(+ \mid c) = \overbrace{Q_5(c)}^{\text{Prior Rating}} + \overbrace{\Delta_c Q_5(c)}^{\text{Change Rate}} \qquad (10)$$

By averaging this over all categories for the item i we can calculate the evolving item bias from the provided training segment:

$$b_{i,cats(i)} = \frac{1}{|cats(i)|} \sum_{c \in cats(i)} \Pr(+ \mid c) \qquad (11)$$
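Eqs. 9-11 can be sketched as follows (the stage probabilities `Q[s][c]` are toy values; `k` marks where the monotonic run starts and must be below 4 for the formula's denominator):

```python
def change_rate(Q, c, k):
    """Eq. 9: average proportional change in Q_s(c) over stages k..4 (k < 4)."""
    return sum((Q[s + 1][c] - Q[s][c]) / Q[s][c] for s in range(k, 5)) / (4 - k)

def pr_plus(Q, c, k):
    """Eq. 10: prior rating Q_5(c) plus the change-rate term."""
    return Q[5][c] + change_rate(Q, c, k) * Q[5][c]

def item_bias(Q, item_cats, k):
    """Eq. 11: average Pr(+|c) over the item's categories."""
    return sum(pr_plus(Q, c, k) for c in item_cats) / len(item_cats)

# Toy stage probabilities Q[s][c] for stages 1..5: "horror" grows steadily.
Q = {s: {"horror": 0.1 * s} for s in range(1, 6)}
bias = item_bias(Q, {"horror"}, k=1)
```

A growing category probability yields a positive change rate, inflating the item bias beyond the final-stage prior, which is the intended "evolving bias" behaviour.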
2) User Biases Towards Categories: In the previous section, we induced per-user discrete probability distributions that captured the probability of the user u rating a given category c highly during lifecycle stage s: $P^u_s(c)$. Given that users' tastes evolve, our goal is to estimate the probability of the user rating an item highly given its categories, by capturing how the user's preferences for each category have changed in the past (decaying or growing). To capture the development of a user's preference for a category we derived the average change rate ($\Delta^u_c$) over the k lifecycle periods coming before the final lifecycle stage in the training set. The parameter k is the number of stages back in the training segment from which either a monotonic increase or decrease in the probability of rating category c began.

We also computed the transfer entropy for each user over time, modelling this as a global influence factor ($\phi^u$). We derive this as follows, based on measuring the proportional change in transfer entropy starting from the lifecycle period k that produced a monotonic increase or decrease in transfer entropy:

$$\phi^u = \frac{1}{4-k} \sum_{s=k}^{4} \frac{T^{s+1|s}_{Q \to P} - T^{s|s-1}_{Q \to P}}{T^{s|s-1}_{Q \to P}} \qquad (13)$$

By combining the average change rate ($\Delta^u_c$) of the user highly rating a given category c with the global influence factor ($\phi^u$), we then derived the conditional probability of a user rating a given category highly as follows, where $P^u_5$ denotes the taste profile of the user observed for the final lifecycle stage (5):

$$\Pr(+ \mid c, u) = \overbrace{P^u_5(c)}^{\text{Prior Rating}} + \overbrace{\Delta^u_c P^u_5(c)}^{\text{Change Rate}} + \overbrace{\phi^u Q_5(c)}^{\text{Global Influence}} \qquad (14)$$

Slide annotations: $Q_5(c)$ is the global category rating probability; $\phi^u$ is the user's average change in transfer entropy.
C. Personalisation Component

The personalisation component of the model builds on the existing SVD++ model [2]. The modified model has four latent factor vectors: $q_i$ denotes the f latent factors associated with item i; $p_u$ denotes the f latent factors associated with user u; $y_j$ denotes the f latent factors for item j from the set of rated items by user u, R(u); and we have defined $z_c \in \mathbb{R}^f$, which captures the latent factor vector for a given semantic category c. We denote this additional component as the category factors.
• General category biases: $b_{i,cats(i)}$
• User biases to categories: $b_{u,cats(i)}$
11. Evaluation Setup
• Tested three models (trained using Stochastic Gradient Descent):
  • SVD++ (baseline)
  • SB-SVD++: SVD++ with semantic category biases
  • S-SVD++: SB-SVD++ with the personalisation component
• Tuned hyperparameters over the validation splits
• Model testing: trained models with tuned hyperparameters using both the training and validation splits, then applied them to the held-out final 10% of reviews
• Evaluation measure: Root Mean Square Error (RMSE)
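The evaluation measure can be computed as follows (the standard RMSE definition; the values in the usage line are illustrative, not the paper's results):

```python
from math import sqrt

def rmse(pairs):
    """Root Mean Square Error over (predicted, actual) rating pairs."""
    return sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

error = rmse([(4.0, 4.0), (3.0, 5.0)])
```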
12. Evaluation Results
• Significantly outperformed the SVD++ baseline
• MovieLens: the full model (S-SVD++) produces significantly superior performance
• MovieTweetings: marginal difference between SB-SVD++ and S-SVD++

TABLE III. Root Mean Square Error (RMSE) of the three models across the two datasets. Each dataset's best model is marked with *, with the p-value from the Mann-Whitney test against the next best model.

Model     MovieLens           MovieTweetings
SVD++     1.520               0.969
SB-SVD++  1.517               0.963
S-SVD++   1.513* (p < 0.001)  0.963* (p < 0.1)
13. Conclusions
• Semantic taste profiles can track users' tastes:
  • Overcomes the factor consistency problem
  • Enables modelling of global taste influence
  • SemanticSVD++ boosts recommendation performance
• Semantic categories are limited, however:
  • Hipster dilemma
  • Cold-start categories
14. Cold-start Categories
[Figure: an item linked via dcterms:subject to categories dbpedia:c1-dbpedia:c5, where dbpedia:c4 and dbpedia:c5 are categories unrated by the user despite 5* and 4* ratings on related items]
Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++. M. Rowe. To appear in the proceedings of the International Semantic Web Conference. Trentino, Italy. (2014)