1. Exploring Deep Space: Learning Personalized Ranking in a Semantic Space
Jeroen Vuurens - Martha Larson - Arjen de Vries
https://arxiv.org/pdf/1608.00276v2
[Figure: movies such as Star Wars IV, Terminator 2, and The Matrix positioned in a semantic space and scored by a ranking function f(x)]
3. Semantic spaces
Consistent encoding of relations [2,3]
[2] T. Mikolov and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 2013.
[3] T. Mikolov, W.-T. Yih, and G. Zweig. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of HLT-NAACL, 2013.
7. Ranking items
[Figure: a user's rated movies (Star Wars IV #5, Terminator 2 #4, The Matrix #5) and nearby unrated movies (Star Wars V and VI, Men in Black, Jurassic Park, Back to the Future, Raiders of the Lost Ark) ranked by f(x)]
11. Implementation: learning vectors
ParagraphVector-DBOW [4]: a 1-hot movieId input layer, a hidden layer holding the item embedding, and a userId_rating output layer trained with Hierarchical Softmax.
[4] Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. In Proceedings of ICML, 2014.
12. Implementation: learning vectors
ParagraphVector-DBOW [4] with Hierarchical Softmax; example for the movie Star Wars: (user 3, rating 4) => user3_high.
[4] Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. In Proceedings of ICML, 2014.
13. Implementation: learning vectors
PV-DBOW [4], content-based: a 1-hot movieId input layer and a wordInImdbReview output layer trained with Hierarchical Softmax.
[4] Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. In Proceedings of ICML, 2014.
18. Evaluation

System          Recall@10   sig. over*
Popularity      0.053
BPRMF (1)       0.079       4
UserKNN (2)     0.087       4
WRMF (3)        0.089       4
DS-CF-500       0.144       1,2,3,4,5
DS-CF-1k        0.151       1,2,3,4,5
DS-CB-10k (4)
DS-VSM (5)
* all p < 0.001

DS-CF:
• item vectors learned from user ratings
• marginally reduces dimensionality
• significantly more effective than the other models
19. Evaluation

System          Recall@10   sig. over*
Popularity      0.053
BPRMF (1)       0.079       4
UserKNN (2)     0.087       4
WRMF (3)        0.089       4
DS-CF-500       0.144       1,2,3,4,5
DS-CF-1k        0.151       1,2,3,4,5
DS-CB-10k (4)   0.075
DS-VSM (5)
* all p < 0.001

DS-CB:
• item vectors learned from IMDB user reviews
• requires high dimensionality
• potentially useful for novel items?
20. Evaluation

System          Recall@10   sig. over*
Popularity      0.053
BPRMF (1)       0.079       4
UserKNN (2)     0.087       4
WRMF (3)        0.089       4
DS-CF-500       0.144       1,2,3,4,5
DS-CF-1k        0.151       1,2,3,4,5
DS-CB-10k (4)   0.075
DS-VSM (5)      0.119       1,2,3,4
* all p < 0.001

DS-VSM:
• user ratings used as item vector
• ranks items according to a hyperplane that optimally ranks the user's past ratings
22. Conclusion
• Semantic item vectors encode substitutability
• Rank items according to a hyperplane, tuned to a user's most recent N ratings
23. Conclusion
• Semantic item vectors encode substitutability
• Rank items according to a hyperplane, tuned to a user's most recent N ratings
• Semantic space generalizes over the similarities between items
24. Conclusion
• Semantic item vectors encode substitutability
• Rank items according to a hyperplane, tuned to a user's most recent N ratings
• Semantic space generalizes preferences
• Proposed pairwise L2R architecture allows the use of high-dimensional latent vectors
25. Questions?
[1] W. Lowe. Towards a theory of semantic space. In Proceedings of CogSci, 2001.
[2] T. Mikolov and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 2013.
[3] T. Mikolov, W.-T. Yih, and G. Zweig. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of HLT-NAACL, 2013.

Paper: https://arxiv.org/abs/1608.00276
In this work, we looked at the potential benefit
of representing items in a high-dimensional semantic space.
In such a space, the proximity between items reflects their substitutability,
and we show how that can be used as the basis for recommending items to users.
Studies that use word embeddings have shown
that when learning embeddings for words based on the context they appear in,
not only are substitutes likely to be positioned in close proximity,
but semantic similarities between words often end up being encoded in a consistent way,
as can be seen for word pairs with a difference in gender.
And for tasks such as analogical reasoning,
it has been shown that the composition over elementary semantic relations
can be used to find missing words,
which shows the compositionality of semantic spaces.
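To make this concrete, here is a minimal toy sketch of such analogical composition (the 2-D vectors are hand-set for illustration, not learned embeddings, and the axis meanings are our assumption):

```python
import numpy as np

# Hand-set toy vectors: dimension 0 ~ "royalty", dimension 1 ~ "gender".
vec = {"king":  np.array([1.0,  1.0]),
       "queen": np.array([1.0, -1.0]),
       "man":   np.array([0.0,  1.0]),
       "woman": np.array([0.0, -1.0])}

def cosine(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Analogical reasoning: king - man + woman should land closest to queen.
target = vec["king"] - vec["man"] + vec["woman"]
print(max(vec, key=lambda w: cosine(vec[w], target)))  # -> queen
```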
We expect that when using a similar learning process to learn semantic vectors for items,
concepts that are useful to describe the differences between groups of items are also consistently encoded.
For example, in the movie domain,
some useful encoded concepts may bear similarities to movie genres,
or to whether a movie is considered exciting or scary,
or contains strong language.
Now, let us look at an example.
In this normalized 2-dimensional semantic space,
we positioned the 20-most popular movies in Movielens
according to their substitutability, which we inferred from their user ratings.
Within this distribution, some useful concepts such as movie genres can be identified,
but also, very coarsely, we see that less suspenseful movies are positioned near the bottom and suspenseful movies nearer the top,
and, for instance, only movies from the '90s appear on the left-hand side.
However, the low dimensionality in this example causes friction: some items cannot be positioned ideally.
For recommendations, many concepts are potentially useful to describe the interest of groups of users in groups of movies.
Consider for instance that some users may strongly favor a specific actor.
To improve this potential, raising the dimensionality of the space would allow more semantic patterns to be encoded effectively and independently.
Suppose that we have learned a semantic space that encodes the substitutability between movies by useful concepts:
how can we use this representation to recommend items to a specific user?
We propose to learn a function
that optimally ranks the items in the collection based on the user's past ratings,
thereby finding a mixture of semantic concepts that describes the user's interest,
and thus also ranking the unrated items based on their expected preference.
In this work, we use a hyperplane as this function
and rank the items by their signed distance to the hyperplane.
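As a minimal sketch of this scoring rule (our own illustration; item_vecs is a hypothetical mapping from movieId to semantic vector):

```python
import numpy as np

def rank_by_hyperplane(w, item_vecs):
    """Rank items by their signed distance to the hyperplane
    with normal vector w."""
    norm = np.linalg.norm(w) + 1e-9
    return sorted(item_vecs,
                  key=lambda m: (w @ item_vecs[m]) / norm,
                  reverse=True)
```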
Why hyperplanes?
Let’s say for instance there is some vector that encodes how scary a movie is.
Within the user population,
we probably find users that like scary movies,
users that dislike scary movies,
but, perhaps more importantly, we can also find users that are indifferent to whether a movie contains scary elements or not.
Then, users that like or dislike scary movies
can have a hyperplane orthogonal to the direction in which scariness is encoded in the space,
while a user that is indifferent to scary elements can have a hyperplane parallel to the scariness vector.
Most importantly, therefore, in this semantic space it does not matter if two movies that are equally preferred by the targeted user are separated along concepts that the user is indifferent to.
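A toy numerical illustration of this point (hand-set 2-D vectors; the axis meanings are our assumption):

```python
import numpy as np

# Dimension 0 encodes "scariness", dimension 1 some other concept.
scary_movie  = np.array([0.9, 0.5])
gentle_movie = np.array([-0.9, 0.5])

w_dislikes_scary = np.array([-1.0, 0.0])  # hyperplane orthogonal to scariness axis
w_indifferent    = np.array([0.0, 1.0])   # hyperplane parallel to scariness axis

print(w_dislikes_scary @ scary_movie, w_dislikes_scary @ gentle_movie)  # -0.9 vs 0.9
print(w_indifferent @ scary_movie, w_indifferent @ gentle_movie)        # 0.5 == 0.5
```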
So then we get to the implementation part.
For learning item vectors, we use the distributed bag of words variant of the Paragraph Vector that was proposed by Le and Mikolov.
In this architecture the input layer is a so-called 1-hot vector,
that contains a node for every item in the collection.
When learning an item-user-rating triple,
only the input node that corresponds to the item is set to 1 and all others are set to 0.
The corresponding column in the lower weight matrix contains the item embedding and is then copied to the hidden layer.
For the output layer, we preprocessed the data by first converting the ratings to High or Low depending on whether the rating is greater than or equal to the user's average rating.
Then we converted each user-rating into a single compound word
that consists of the user_id and whether the rating was labelled high or low.
This vocabulary for the output layer is turned into a Huffman tree to learn a hierarchical softmax.
We learn the ratings one at a time, in random order.
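A minimal sketch of this step using gensim's Doc2Vec (our choice of library; the talk describes a custom implementation, and the toy rating triples here are hypothetical):

```python
from collections import defaultdict
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical (userId, movieId, rating) triples.
ratings = [("u1", "m1", 5.0), ("u1", "m2", 2.0),
           ("u2", "m1", 4.0), ("u2", "m2", 4.0)]

# Label each rating high/low relative to the user's mean rating.
per_user = defaultdict(list)
for u, m, r in ratings:
    per_user[u].append(r)
mean = {u: sum(rs) / len(rs) for u, rs in per_user.items()}

# Each movie becomes a "document" whose words are userId_high/low tokens.
words = defaultdict(list)
for u, m, r in ratings:
    words[m].append(f"{u}_{'high' if r >= mean[u] else 'low'}")
docs = [TaggedDocument(words=ws, tags=[m]) for m, ws in words.items()]

# PV-DBOW (dm=0) with hierarchical softmax (hs=1), as in the talk.
model = Doc2Vec(docs, dm=0, hs=1, negative=0, vector_size=500,
                min_count=1, epochs=20)
item_vec = model.dv["m1"]  # learned semantic item vector (gensim >= 4.0)
```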
Interestingly, we can also use the exact same architecture to learn the substitutability between items based on text descriptions.
For this experiment we used all IMDB user reviews for the movies in Movielens,
and then we learn embeddings for a movie by predicting the words that appear in its reviews.
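Under the same assumptions as the sketch above, only the documents change for this content-based variant (toy review tokens shown; the real experiment uses all IMDB review words and, per the DS-CB-10k result, up to 10,000 dimensions):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy review tokens per movie; in the experiment these are all IMDB
# user-review words for each Movielens movie.
reviews = {"m1": "a thrilling space opera with iconic battles".split(),
           "m2": "a slow but touching courtroom drama".split()}
docs = [TaggedDocument(words=ws, tags=[m]) for m, ws in reviews.items()]
model = Doc2Vec(docs, dm=0, hs=1, negative=0, vector_size=100,
                min_count=1, epochs=20)
```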
Then, for the second step, once we have learned the semantic space:
to recommend movies to a specific user, we learn optimal hyperplane coefficients
using a custom architecture that resembles pairwise learning to rank.
We iteratively learn over pairs of movies that have received different ratings,
and unrated items are considered to have a rating of 0.
We insert the vector of the lower-rated movie in a and the vector of the higher-rated movie in b.
The weight matrix contains the hyperplane coefficients,
therefore the hidden nodes will obtain the signed distances to the hyperplane.
In this architecture g will obtain a value close to zero
when the items are ranked correctly
and a value of 1 when ranked in reverse order.
Therefore, g directly gives the gradient used for updating.
During learning, we only update the weight matrix,
simply adding the gradient times b and subtracting the gradient times a,
scaled by a learning rate.
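A minimal numpy reconstruction of this update (our reading of the description: modeling g as a sigmoid is an assumption consistent with the values described, the learning rate and epoch count are placeholders, and theta_r anticipates the parameter discussed next):

```python
import numpy as np

def learn_hyperplane(item_vecs, rated, dim, theta_r=5, lr=0.0025, epochs=10):
    """rated: list of (movieId, rating) pairs for one user, oldest first.
    Unrated items can be sampled into `rated` with rating 0, as mentioned
    above; only the theta_r most recent ratings are used (see below)."""
    recent = rated[-theta_r:]
    w = np.zeros(dim)
    for _ in range(epochs):
        for m_a, r_a in recent:
            for m_b, r_b in recent:
                if r_a >= r_b:          # need a lower- and a higher-rated item
                    continue
                a, b = item_vecs[m_a], item_vecs[m_b]
                # g ~ 0 when ranked correctly (w.b > w.a), ~ 1 when reversed
                g = 1.0 / (1.0 + np.exp(w @ b - w @ a))
                w += lr * g * (b - a)   # add g*b, subtract g*a, scaled by lr
    return w
```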
The paper discusses three parameters that are used to control learning.
The most interesting one, theta_R,
controls the number of most recently rated items by the user that are used for learning.
The rationale here is that if a user’s preference changes over time,
the best recommendations are more like the most recently preferred items
than items that the user preferred long ago.
As we will see, limiting a user’s past preferences
has a large impact on the effectiveness.
For the evaluation, we used Movielens 1M.
Since our system uses the N most recently rated items by the user, the evaluation requires an online setting.
We split the data so that the test part covers 2% of the ratings and only contains the most recent ratings by users.
We only measured effectiveness over ratings of at least 4 out of a maximum of 5; in other words, the movies that the user really likes.
We compare Recall@10 against the MyMediaLite implementations of BPRMF, WRMF and UserKNN.
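For reference, a minimal sketch of the Recall@10 computation as we use it (our own formulation):

```python
def recall_at_10(recommended, relevant):
    """recommended: ranked list of movieIds not yet seen by the user;
    relevant: set of held-out test movieIds the user rated 4 or higher."""
    if not relevant:
        return None
    return sum(m in relevant for m in recommended[:10]) / len(relevant)
```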
Then we move on to the results.
At the top is the popularity baseline: recommending the 10 most popular items the user has not seen.
The next three baselines are the existing collaborative filtering baselines we compared against.
Our first variant is the deep space collaborative filtering variant,
which uses item vectors that are learned from the user-item ratings,
and this significantly and greatly outperforms the other models.
Next is our deep-space content-based variant,
which uses semantic vectors learned from the words in IMDB user reviews. No collaborative filtering data is used.
This variant performs better than the popularity baseline
but less than the existing collaborative filtering baselines,
but it is still interesting to see that we can estimate substitutability between movies from reviews.
This could potentially help to improve the recommendation of new or rarely rated items, provided that we have sufficient text to learn substitutability.
To evaluate the effectiveness of the ranking architecture without using a semantic space,
we included a variant in which we represented every movie by a vector over their user-ratings, every user being a dimension.
And then generated recommendations by learning a hyperplane as described.
This significantly outperformed the existing collaborative filtering baselines we compared against.
This shows that the improvement of our collaborative filtering variant is not just the result of learning semantic vectors,
but also of the way we learn a hyperplane to rank the items.
We analyzed how changing the hyperparameters changes the effectiveness;
these are the two most interesting parameters.
On the left-hand side is the dimensionality of the semantic space.
We see that the proposed approach underperforms below 300 dimensions
and maxes out at about 1000 dimensions on the Movielens collection.
On the right-hand side is how many of the most recently rated items of the targeted user are used to find an optimal hyperplane.
When using only the 5 most recently rated items,
the recommendations are far more effective than when using more history.
In summary:
In this work we represented items in a semantic space in which the proximity between items reflects their substitutability.
Using such a space,
we recommend items
by optimally ranking a user’s past preferences according to a hyperplane,
which in the process also ranks the unrated items according to their expected preference.
Our experiments show a significant improvement over existing baselines on Movielens.
So the pending question is why does this work so well?
The first step of learning semantic representations
provides a generalization over item-similarities
that is useful to generalize beyond specific items when recommending.
Hence the improvement that we observe for our collaborative filtering variant over the VSM variant that uses vectors over user ratings.
But ongoing experiments indicate that it does not matter much
how you learn a semantic space,
for instance, when using the high-dimensional latent item vectors learned by BPRMF
in the proposed hyperplane ranking method
the results are almost as good as with Paragraph2Vec.
Therefore, the real improvement seems to come from the pairwise learning-to-rank architecture, which can handle higher dimensionality than existing algorithms.
This increase in dimensionality
can be used for more extensive encoding of concepts
that are potentially useful for recommending items to a user.
And if two items, such as king and woman in this example,
differ by multiple concepts,
the vector between them approximates the composition over these concepts.