Ph.D. Defense - Enhanced Vector Space Models for Content-based Recommender Systems
1. Università degli Studi di Bari ‘Aldo Moro’
Ph.D. Program in Computer Science (Dottorato di Ricerca in Informatica) - Cycle XXIV
Enhanced Vector Space Models for Content-based Recommender Systems
Cataldo Musto, Ph.D. Candidate
Supervisor: prof. Giovanni Semeraro
08.06.12
2. what will we talk about
in the next 40 minutes?
Cataldo Musto - Enhanced Vector Space Models for Content-based Recommender Systems - Ph.D. defense - University of Bari Aldo Moro, Italy - 08.06.12
3. life is all
a matter of
decisions
5. decision-making
is actually challenging
8. we need to hold as much knowledge as possible
9. Leibniz
“In things which are
absolutely indifferent
there can be no
choice and consequently
no option or will.”
10. information age
knowledge is spread through the Web
11. social media
changed the rules for information
management and knowledge acquisition
12. exponential
growth of
the available
information
14. it is physiologically
impossible
to follow the information flow
in real time
15. how much information?
16. we daily interact
with
393 bits
of information
per second
17. human brain
can absorb
126 bits
of information
per second
18. we can handle 126 bits of information
we deal with 393 bits of information
ratio: more than 3x
(Source: Adrian C. Ott, The 24-Hour Customer)
consequence: Information Overload
19. Information Overload
24. paradox of choice
(Barry Schwartz, TED talk “Why more is less”)
25. Buridan’s ass paradox
Two alternatives. The ass cannot decide. It starves.
26. Is information overload actually unbearable?
27. “It is not information
overload. It is
filter failure”
Clay Shirky
talk @Web2.0 Expo
28. Solution
we need to improve the
techniques for filtering
information
29. Information Filtering (IF)
“To expose users only to the information that is
relevant to them, thus avoiding information overload.”
to filter.
as kids do when they
play with sand.
30. IF applications
Example: Recommender System
Relevant items (movies, news, books, etc.) are pushed to the
user according to her needs.
31. Recommender Systems are an effective way
to face the Information Overload problem
32. example
Amazon.com
Recommendations
33. Information Retrieval (IR)
“Finding relevant pieces of information in a collection
of (usually unstructured) data”
34. IR applications
Example: Search Engines
Relevant documents are returned to the user
according to her query.
35. IR vs. IF
• IR and IF represent two closely related research areas
• Same goal: to optimize and ease access to (unstructured) data sources
• “Two sides of the same coin” (*)
(*) N. Belkin, W. Croft: “Information Filtering and Information Retrieval: Two sides of the same coin”, Communications of the ACM, Volume 35, Issue 12, pp. 29-38, 1992
36. IR vs IF: differences
• Small differences
• Representation of user needs
• Query in IR, user profile in IF
• Convergence between IR and IF
• Personalized Search!
38. Ph.D. dissertation
Research Question
Is it possible to exploit the convergence
between IR and IF to introduce a
recommendation framework
based on IR techniques?
39. outline.
40. outline (1/2)
• recommender systems
• content-based recommender systems
(CBRS)
• vector space models
• VSM for CBRS
• strengths and weaknesses
41. outline (2/2)
• eVSM: enhanced vector space models
• semantics in VSMs
• dimensionality reduction in VSMs
• modeling negation in VSMs
• applications and experimental evaluation
• movie recommendation
• Philips TV-guides personalization
42. recommender systems.
43. definition
Recommender Systems have the goal of guiding the users in a personalized way to interesting or useful objects in a large space of possible options.
Burke, 2002 (*)
(*) Robin D. Burke: Hybrid Recommender Systems: Survey and Experiments. UMUAI, volume 12, issue 4, 331-370 (2002)
44. suggestions
• Examples
• books or news to read
• music to listen to
• movies worth watching
• restaurants, etc.
45. Some maths (1/2)
• Let
• U set of users
• I set of items
• Given
• user u ∈ U
• item i ∈ I
46. Some maths (2/2)
• A recommender system should
predict how relevant item i is
for user u by defining a scoring
function
• f: U×I→[0,1] = scoring
function
• The items with the highest
value of f are labeled as
relevant and returned to
the user
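The scoring-and-ranking idea can be sketched in a few lines of Python. This is only an illustration: the function `recommend`, the toy lookup table `toy_scores` and the user/item identifiers are hypothetical names, not part of the thesis.

```python
from typing import Callable, Iterable, List, Tuple


def recommend(
    user: str,
    items: Iterable[str],
    f: Callable[[str, str], float],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score every item for the user with f and return the top_n best."""
    scored = [(item, f(user, item)) for item in items]
    # Highest score first; items with the highest f are labeled as relevant.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical scoring function f: U x I -> [0, 1], backed by a toy table.
toy_scores = {("alice", "m1"): 0.9, ("alice", "m2"): 0.2, ("alice", "m3"): 0.7}


def toy_f(user: str, item: str) -> float:
    return toy_scores.get((user, item), 0.0)


top = recommend("alice", ["m1", "m2", "m3"], toy_f, top_n=2)
```

Any real system would replace `toy_f` with a learned or computed scoring function; the ranking step stays the same.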
47. classes of RSs
• In the literature, many approaches for building RSs have been introduced.
• Collaborative Recommender Systems
• Content-based Recommender Systems
• Knowledge-based Recommender Systems
• Demographic-based Recommender Systems
• Social Recommender Systems
• Hybrid Recommender Systems
48. classes of RSs
• In the literature, many approaches for building RSs have been introduced.
• Collaborative Recommender Systems
• Content-based Recommender Systems (FOCUS)
• Knowledge-based Recommender Systems
• Demographic-based Recommender Systems
• Social Recommender Systems
• Hybrid Recommender Systems
49. content-based recommenders
Suggest items similar to those liked in the past by the user
50. content-based recommenders
key concepts
• Each item has to be described through a set of
textual features
• Movie plots, content of news, book summaries, etc.
51. content-based recommenders
key concepts
• User profile contains the features that often occur in the
items the user liked
• A profile of a user interested in basketball will contain
keywords related to it (example: basketball teams, players or
competitions)
52. content-based recommenders
key concepts
• Recommendations are provided by calculating the
overlap between the features stored in the user
profile and those that occur in the item.
• The bigger the overlap, the higher the relevance
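A minimal sketch of the overlap idea, assuming features are plain keyword sets. The normalization by item size is one possible choice made here for illustration, and the keyword sets are invented.

```python
def overlap_score(profile_features: set, item_features: set) -> float:
    """Fraction of the item's features that also occur in the user profile."""
    if not item_features:
        return 0.0
    return len(profile_features & item_features) / len(item_features)


# Hypothetical profile of a user interested in basketball.
profile = {"basketball", "nba", "playoffs", "lakers"}
item_a = {"nba", "playoffs", "finals"}   # shares 2 of its 3 features
item_b = {"election", "parliament"}      # shares none

score_a = overlap_score(profile, item_a)
score_b = overlap_score(profile, item_b)
```

The item with the larger overlap (`item_a`) would be ranked as more relevant.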
53. content-based recommenders
example: news recommendations
Items: ♥ ♥
User Profile: the user is interested in news articles about sports, football, cycling, etc.
54. content-based recommenders
example: news recommendations
Items | Recommendations
♥ ♥
55. content-based recommenders
example: news recommendations
Items | Recommendations
♥ X ♥
57. main building block
vector space model
the most adopted IR model (*)
(*) Gerard Salton: A Vector Space Model for Automatic Indexing, Communications of the ACM, vol. 18, no. 11, pages 613–620, 1975
58. vector space model (VSM)
• Given a set of n features (vocabulary)
• $f = \{f_1, f_2, \dots, f_n\}$
• Given a set of M items
• Each document (item) is represented as a point in an n-dimensional vector space
• $I = (w_{f_1}, \dots, w_{f_n})$, where $w_{f_i}$ is the weight of feature $f_i$ in the item
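As an illustration of the representation above, the sketch below maps tokenized items to weight vectors over a shared vocabulary. Raw term frequency is used as the weight purely as an assumption; the slides do not fix a weighting scheme here, and the sample items are invented.

```python
from collections import Counter
from typing import Dict, List


def to_vector(tokens: List[str], vocabulary: List[str]) -> List[int]:
    """Represent an item as one weight per vocabulary feature (term frequency)."""
    counts = Counter(tokens)
    return [counts[feature] for feature in vocabulary]


# Hypothetical items, already tokenized.
items: Dict[str, List[str]] = {
    "d1": "football match football goal".split(),
    "d2": "election vote".split(),
}

# The vocabulary f1..fn is the sorted set of all features seen in the items.
vocabulary = sorted({token for tokens in items.values() for token in tokens})
vectors = {name: to_vector(tokens, vocabulary) for name, tokens in items.items()}
```

Each item becomes a point in an n-dimensional space, with n equal to the vocabulary size (5 in this toy case).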
59. VSM representation
football news
sports news
politics news
politics news
60. research question
Is it possible to exploit VSM
for a recommendation scenario?
61. VSM for CBRS
how to adapt it?
• In VSM each item is represented as a vector
• User profile needs a vector space representation as well
• How?
• For example, by combining vectors of the items (documents)
the user liked in the past
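One simple way to combine item vectors is a component-wise sum, sketched below. The sum is an assumption chosen for illustration (the slide only says "combining"), and the two liked-item vectors are made up.

```python
from typing import List, Sequence


def build_profile(liked_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Combine the vectors of liked items into one profile vector by summing them."""
    return [sum(weights) for weights in zip(*liked_vectors)]


# Hypothetical vectors of two items the user liked (same vocabulary, same length).
liked = [
    [0, 2, 1, 1, 0],
    [0, 1, 0, 2, 0],
]
profile = build_profile(liked)
```

The profile lives in the same vector space as the items, so it can be compared with any item vector directly.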
62. VSM representation
user profile
football news
sports news
politics news
politics news
63. VSM representation
Recommendation task seen as similarity calculation between vectors
user profile
football news
sports news
politics news
politics news
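A sketch of the similarity step, assuming cosine similarity as the measure (a common choice in VSM, not stated on this slide). The three dimensions and all vector values are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either one is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical dimensions: (football, sports, politics).
profile = [3.0, 1.0, 0.0]
items = {
    "football news": [2.0, 1.0, 0.0],
    "sports news": [1.0, 2.0, 0.0],
    "politics news": [0.0, 0.0, 3.0],
}

# Rank items by decreasing similarity to the user profile.
ranking = sorted(items, key=lambda name: cosine(profile, items[name]), reverse=True)
```

Items whose vectors point in roughly the same direction as the profile end up at the top of the ranking.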
64. VSM representation
recommender system suggests football and sports news
user profile
football news
sports news
politics news
politics news
65. Can this model be improved?
Yes.
66. VSM weaknesses
• Modeling Negation
• VSM does not model negative evidence
• The vector space representation only depends on the features that occur in the document; no assumptions are made about the features that don’t occur
• What a specific user dislikes is not considered
67. VSM weaknesses
• High Dimensionality
• As the number of
documents grows, the
number of features
grows as well
• Large vector spaces are
difficult to manage
68. VSM weaknesses
• Language issues
• Does not manage the latent semantics of documents
• String matching-based approach
• A CBRS based on VSM cannot understand the information it manages
apple?
69. VSM weaknesses
• Language issues
• Representation is language-dependent
• A user profile built in one language cannot be exploited to provide recommendations of items described in another language
• It would be good to receive (e.g.) recommendations about news written by English newspapers even if I expressed my interest only in Italian news articles!
70. How to tackle these issues?
71. a novel recommendation framework based on VSM
eVSM
enhanced Vector Space Model (*)
(*) Cataldo Musto: Enhanced Vector Space
Models for Content-based Recommender
Systems, RECSYS 2010, pages 361-364
72. eVSM
goals
• To introduce a CBRS based on VSM
• To tackle the representation issues of VSM
• No Semantics
• High Dimensionality
• No modeling of Negative Information
• Language-dependent recommendations
73. a novel recommendation framework based on VSM
eVSM
step 1: modeling semantics
step 2: dimensionality reduction
step 3: modeling negation
step 4: building user profiles
step 5: providing suggestions
74. how to improve the semantic modeling in VSMs?
distributional models
(Firth, 1957)
Firth, J.R. A synopsis of linguistic theory
1930-1955. In Studies in Linguistic Analysis,
pp. 1-32, 1957.
75. distributional models
“meaning
is its use”
L. Wittgenstein
76. distributional models
insight
by analyzing a large corpus of textual data it is possible
to infer information about the usage (about the meaning)
of the terms.
78. Distributional Models
term/context matrix
c1 c2 c3 c4 c5 c6 c7 c8 c9
t1 ✔ ✔ ✔ ✔
t2 ✔ ✔ ✔ ✔
t3 ✔ ✔ ✔
t4 ✔ ✔ ✔ ✔
79. distributional models
• Key: the definition of ‘context’
• Different granularities are possible
• Document
• Paragraph
• Sentence
• Sliding window of words
81. distributional models
beer vs. glass: good overlap
c1 c2 c3 c4 c5 c6 c7 c8 c9
t1 ✔ ✔ ✔ ✔
t2 ✔ ✔ ✔ ✔
t3 ✔ ✔ ✔
t4 ✔ ✔ ✔ ✔
82. distributional models
beer vs. spoon: no overlap
c1 c2 c3 c4 c5 c6 c7 c8 c9
t1 ✔ ✔ ✔ ✔
t2 ✔ ✔ ✔ ✔
t3 ✔ ✔ ✔
t4 ✔ ✔ ✔ ✔
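The term/context idea can be tried on a toy corpus. In this sketch each sentence is one context and two terms are compared by counting the contexts they share; the four sentences are invented for illustration and do not reproduce the matrix shown on the slide.

```python
from typing import List, Set

# Hypothetical corpus: each sentence is treated as one context (c1..c4).
contexts: List[str] = [
    "beer glass pub",
    "wine glass dinner",
    "beer glass bottle",
    "spoon soup dinner",
]


def context_set(term: str) -> Set[int]:
    """Indices of the contexts in which the term occurs (one row of the matrix)."""
    return {idx for idx, text in enumerate(contexts) if term in text.split()}


def overlap(term_a: str, term_b: str) -> int:
    """Number of contexts shared by the two terms."""
    return len(context_set(term_a) & context_set(term_b))


beer_glass = overlap("beer", "glass")
beer_spoon = overlap("beer", "spoon")
```

Terms used in similar contexts (beer, glass) share rows of check marks; terms that never co-occur (beer, spoon) share none.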
83. distributional models
recap
models for representing terms/documents in large vector spaces
light semantics
it is simple to calculate similarities between words
but the high dimensionality problem gets even worse!
84. a novel recommendation framework based on VSM
eVSM
step 1: modeling semantics
step 2: dimensionality reduction
step 3: modeling negation
step 4: building user profiles
step 5: providing suggestions
85. Random Indexing
(Sahlgren, 2005)
Sahlgren, M. An Introduction to Random Indexing.
Proceedings of the Methods and Applications of
Semantic Indexing Workshop, TKE 2005.
86. dimensionality reduction
random indexing
• Strengths
• Incremental approach
• Based on the distributional hypothesis
• Builds a small-scale semantic vector space representation
87. random indexing
• Input
• n-dimensional term-document matrix
• Output
• k-dimensional term-context matrix
• k << n
• Approximation built upon distributional hypothesis
• Based on contexts, but much more compact!
88. random indexing
dimensionality reduction
term/document matrix (terms t1 ... t5 × documents d1 ... dn) → term/context matrix (terms t1 ... t5 × contexts c1 ... ck), with n >> k
89. random indexing
dimensionality reduction
term/document matrix (terms t1 ... t5 × documents d1 ... dn) → term/context matrix (terms t1 ... t5 × contexts c1 ... ck), with n >> k
k is a simple parameter of the model
90. random indexing
dimensionality reduction
term/document matrix (terms t1 ... t5 × documents d1 ... dn) → term/context matrix (terms t1 ... t5 × contexts c1 ... ck), with n >> k
the smaller k, the higher the efficiency and the greater the loss of information
91. random indexing
some literature
• Roots
• Sparse distributed representations (Kanerva, 1988)
• Studies about Random Projection
• State of the art applications
• Clustering text documents (Kohonen, 2000)
• Image data compression (Bingham, 2001)
• Information Retrieval (Basile, 2010)
• Collaborative filtering (Ciesielczyk, 2010)
• Never exploited for CBRS.
92. How to obtain the smaller
k-dimensional representation?
93. random indexing
algorithm
• (1) Definition of the context.
• Document? Paragraph? Sentence? Word?
• (2) Each ‘context’ is assigned a context vector.
• Dimension of the vector = k
• Allowed values = {-1, 0, 1}
• Constraint: the number of non-zero elements has to be much smaller than k
• Values distributed in a random way
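Step (2) can be sketched as follows. The function name, the choice of an equal number of +1 and -1 entries, and the values k=100 and non_zero=4 are assumptions made for the example; the slides only require sparse random vectors over {-1, 0, 1}.

```python
import random
from typing import List


def random_context_vector(k: int, non_zero: int, rng: random.Random) -> List[int]:
    """Sparse ternary vector of length k with non_zero entries, half +1 and half -1."""
    if non_zero % 2 != 0 or non_zero > k:
        raise ValueError("non_zero must be even and at most k")
    vector = [0] * k
    # Pick distinct random positions for the non-zero entries.
    positions = rng.sample(range(k), non_zero)
    for n, position in enumerate(positions):
        vector[position] = 1 if n < non_zero // 2 else -1
    return vector


rng = random.Random(42)  # fixed seed only to make the example repeatable
context_vector = random_context_vector(k=100, non_zero=4, rng=rng)
```

Because very few entries are non-zero, two such vectors rarely share a non-zero position, which is what makes them nearly orthogonal.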
95. random indexing
algorithm
• (3) The vector space representation of a term
t is obtained by combining the random vectors of
the contexts it occurs in.
rc1 = (0, 0, -1, 1, 0, 0, 0, 0)
rc2 = (1, 0, 0, 0, 0, 0, 0, -1)
rc3 = (0, 0, 0, 0, 0, -1, 1, 0)
rc4 = (-1, 1, 0, 0, 0, 0, 0, 0)
rc5 = (0, 0, 0, -1, 1, 0, 0, 0)
t1 ∈ {c1, c2}
96. random indexing
algorithm
• (3) The vector space representation of a term
t is obtained by combining the random vectors of
the contexts it occurs in.
rc1 = (0, 0, -1, 1, 0, 0, 0, 0)
rc2 = (1, 0, 0, 0, 0, 0, 0, -1)
rc3 = (0, 0, 0, 0, 0, -1, 1, 0)
rc4 = (-1, 1, 0, 0, 0, 0, 0, 0)
rc5 = (0, 0, 0, -1, 1, 0, 0, 0)
t1 ∈ {c1, c2}
t1 = rc1 + rc2 = (0, 0, -1, 1, 0, 0, 0, 0) + (1, 0, 0, 0, 0, 0, 0, -1) = (1, 0, -1, 1, 0, 0, 0, -1)
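The worked example on this slide can be checked with a short script. The context vectors are the ones listed on the slide (with the second entry of rc4 read as 1); the helper name `term_vector` is hypothetical.

```python
from typing import Dict, List, Sequence

# Random context vectors as listed on the slide (k = 8).
context_vectors: Dict[str, List[int]] = {
    "c1": [0, 0, -1, 1, 0, 0, 0, 0],
    "c2": [1, 0, 0, 0, 0, 0, 0, -1],
    "c3": [0, 0, 0, 0, 0, -1, 1, 0],
    "c4": [-1, 1, 0, 0, 0, 0, 0, 0],  # second entry taken to be 1
    "c5": [0, 0, 0, -1, 1, 0, 0, 0],
}


def term_vector(context_ids: Sequence[str]) -> List[int]:
    """Sum the random vectors of the contexts in which the term occurs."""
    return [sum(column) for column in zip(*(context_vectors[c] for c in context_ids))]


# t1 occurs in contexts c1 and c2.
t1 = term_vector(["c1", "c2"])
```

The result matches the vector shown on the slide for t1.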
97. random indexing
algorithm
• (3) The vector space representation of a term t is
obtained by combining the random vectors of the
contexts it occurs in.
• (4) The vector space representation of a document
d is obtained by combining the vector space representation
of the terms that occur in the document.
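Step (4) follows the same pattern one level up. In this sketch a document vector is the sum of the vectors of its terms, counting repeated terms each time and skipping unknown ones; both choices, like the toy term vectors, are assumptions for illustration.

```python
from typing import Dict, List, Sequence


def document_vector(
    tokens: Sequence[str], term_vectors: Dict[str, List[int]], k: int
) -> List[int]:
    """Sum the k-dimensional vectors of the terms that occur in the document."""
    doc = [0] * k
    for token in tokens:
        vector = term_vectors.get(token)
        if vector is None:
            continue  # terms without a vector are ignored
        doc = [d + v for d, v in zip(doc, vector)]
    return doc


# Hypothetical term vectors (k = 4), as step (3) would produce them.
term_vectors = {
    "football": [1, 0, -1, 1],
    "goal": [0, 1, 0, -1],
    "match": [1, 0, 0, 0],
}
doc = document_vector(["football", "goal", "football", "unknown"], term_vectors, k=4)
```

Terms and documents end up in the same k-dimensional space, so the same similarity measure applies to both.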
98. random indexing
algorithm
• (3) The vector space representation of a term t is
obtained by combining the random vectors of the
contexts it occurs in.
output:
WORDSPACE
• (4) The vector space representation of a document
d is obtained by combining the vector space representation
of the terms that occur in the document.
99. random indexing
algorithm
• (3) The vector space representation of a term t is
obtained by combining the random vectors of the
contexts it occurs in.
• (4) The vector space representation of a document
d is obtained by combining the vector space representation
of the terms that occur in the document.
output:
DOCSPACE
100. random indexing
WordSpace: terms t1 ... t5 × contexts c1 ... ck (comparison between terms)
DocSpace: documents d1 ... d5 × contexts c1 ... ck (comparison between documents)
Uniform Representation
101. Dimensionality reduction is built upon a set
of random vectors
Does it sound weird?
102. random indexing
theoretical basis
• Johnson-Lindenstrauss Lemma (*)
• Distances between points are approximately preserved.
• Constraint: orthogonal vectors
• Random Indexing vectors are nearly orthogonal.
• The loss of information depends on the parameter k
(*) Johnson, W. and Lindenstrauss, J. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 1984
103. random indexing
johnson-lindenstrauss lemma
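The formula on this slide did not survive extraction. For reference, a standard statement of the lemma is given below; the symbols (X, m, ε, A) are the usual textbook ones and are not taken from the slide.

```latex
% Johnson-Lindenstrauss lemma (standard formulation)
Let $0 < \varepsilon < 1$ and let $X$ be a set of $m$ points in $\mathbb{R}^n$.
Then there exists a linear map $A : \mathbb{R}^n \to \mathbb{R}^k$ with
$k = O(\varepsilon^{-2} \log m)$ such that, for all $u, v \in X$,
\[
  (1 - \varepsilon)\,\lVert u - v \rVert^2
  \;\le\; \lVert A u - A v \rVert^2
  \;\le\; (1 + \varepsilon)\,\lVert u - v \rVert^2 .
\]
```

In words: pairwise distances are preserved up to a factor of $1 \pm \varepsilon$, and the required dimension $k$ depends on the number of points and on the tolerated distortion, not on the original dimension $n$.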
104. a novel recommendation framework based on VSM
eVSM
step 1: modeling semantics
step 2: dimensionality reduction
step 3: modeling negation
step 4: building user profiles
step 5: providing suggestions
105. quantum negation
(Widdows, 2007)
106. negation in VSMs
state of the art
• State-of-the-art approaches: poor theoretical background
• Post-retrieval filtering, Rocchio Algorithm (Rocchio,
1971)
• Widdows proposed a different point of view
• Negation viewed as a form of orthogonality between vectors
• Vision inherited from Quantum Logic
107. negation in VSMs
Quantum Negation
• Some theory
• Given vector a and vector b
• Through quantum negation it is possible to define a vector a not b (a ∧¬b)
• Projection of vector a onto the subspace orthogonal to the one generated by vector b
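For a single negated vector b, this projection reduces to subtracting from a its component along b, that is a - ((a·b) / (b·b)) b. The sketch below implements that formula; the function names and the two sample vectors are chosen here for illustration.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def negate(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Return 'a NOT b': a projected onto the subspace orthogonal to b."""
    bb = dot(b, b)
    if bb == 0:
        return list(a)  # nothing to remove when b is the zero vector
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]


a = [1.0, 1.0]
b = [0.0, 2.0]
a_not_b = negate(a, b)
```

The resulting vector has zero dot product with b, so it carries none of the information modeled by b.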
108. quantum negation
application to CBRS
• Vector A models positive feedback
• Information about what a user likes
• Vector B models negative feedback
• Information about what a user does not like
• Vector A not B combines both information sources
109. eVSM
building blocks - recap
• Distributional Models
• Light semantic modeling
• Random Indexing
(Sahlgren, 2005)
• Incremental technique for
dimensionality reduction
• Quantum Negation
(Widdows, 2007)
• Negation operator based
on Quantum Logic
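As a rough illustration of the Random Indexing building block recapped above, the sketch below assigns each term a sparse random vector with a few +1/-1 entries and represents a document as the sum of its term vectors. The vector size, the number of non-zero entries, the seed and the function names are all assumptions made for the example; the exact pipeline used in eVSM may differ.

```python
import random

def random_index_vector(k, nonzero, rng):
    """Sparse ternary vector: `nonzero` entries set to +1 or -1, the rest 0."""
    vec = [0] * k
    for pos in rng.sample(range(k), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec

def build_doc_vectors(docs, k=8, nonzero=2, seed=42):
    """Map each document to a k-dimensional vector, one document at a time."""
    rng = random.Random(seed)
    term_vectors = {}
    doc_vectors = {}
    for doc_id, text in docs.items():
        acc = [0] * k
        for term in text.lower().split():
            # New terms get a fresh random vector; known terms are reused.
            if term not in term_vectors:
                term_vectors[term] = random_index_vector(k, nonzero, rng)
            acc = [x + y for x, y in zip(acc, term_vectors[term])]
        doc_vectors[doc_id] = acc
    return term_vectors, doc_vectors

terms, docs = build_doc_vectors({"d1": "beer wine", "d2": "beer dog"})
print(len(docs["d1"]))  # 8
```

The technique is incremental because a new document only adds vectors for unseen terms; nothing already computed has to be rebuilt.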
110. eVSM
building blocks - recap
• A content-based recommendation
framework needs to:
• Represent items
• Build user profiles
• Provide suggestions
• Random Indexing and
Quantum Negation provide a
novel representation model.
111. a novel recommendation framework based on VSM
eVSM
step 1: modeling semantics
step 2: dimensionality reduction
step 3: modeling negation
step 4: building user profiles
step 5: providing suggestions
112. eVSM
building user profiles
• Represent profiles in eVSM
• Vector space representation
• Obtained by combining the
vectors of the items the
user liked
• How?
• Four different profiling models
113. User Profiles
Random Indexing-based (RI)
Items Rating Threshold
VSM representation of RI-based profile for user u
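The formula on this slide did not survive extraction, so the following is only a sketch of what an RI-based profile could look like: the sum of the vectors of the items whose rating reaches a threshold. The "at or above" comparison, the data and the function name are assumptions.

```python
def ri_profile(item_vectors, ratings, threshold):
    """Sum the vectors of the items rated at or above the threshold (assumed rule)."""
    dims = len(next(iter(item_vectors.values())))
    profile = [0.0] * dims
    for item, rating in ratings.items():
        if rating >= threshold:
            profile = [p + x for p, x in zip(profile, item_vectors[item])]
    return profile

# Toy data: three movies in a 2-dimensional space, ratings on a 1-5 scale.
items = {"m1": [1.0, 0.0], "m2": [0.0, 1.0], "m3": [1.0, 1.0]}
ratings = {"m1": 5, "m2": 2, "m3": 4}
print(ri_profile(items, ratings, threshold=4))  # [2.0, 1.0]
```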
114. User Profiles
Quantum Negation-based (QN)
Positive User Profile Vector
Negative User Profile Vector
VSM representation of QN-based profile for user u
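A possible reading of the QN-based profile, given that the slide formula is not preserved: build a positive vector from liked items and a negative vector from disliked items, then keep the part of the positive vector that is orthogonal to the negative one. The threshold rule and all names below are assumptions.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def sum_vectors(vectors, dims):
    total = [0.0] * dims
    for vec in vectors:
        total = [t + x for t, x in zip(total, vec)]
    return total

def qn_profile(item_vectors, ratings, threshold):
    """Positive profile NOT negative profile (assumed construction)."""
    dims = len(next(iter(item_vectors.values())))
    positive = sum_vectors(
        (item_vectors[i] for i, r in ratings.items() if r >= threshold), dims)
    negative = sum_vectors(
        (item_vectors[i] for i, r in ratings.items() if r < threshold), dims)
    norm = dot(negative, negative)
    if norm == 0:
        # No negative evidence: the profile is just the positive vector.
        return positive
    scale = dot(positive, negative) / norm
    return [p - scale * n for p, n in zip(positive, negative)]

items = {"m1": [1.0, 0.0], "m2": [0.0, 1.0], "m3": [1.0, 1.0]}
ratings = {"m1": 5, "m2": 2, "m3": 4}
print(qn_profile(items, ratings, threshold=4))  # [2.0, 0.0]
```

In the toy data the disliked movie m2 lies entirely on the second axis, so the second component of the profile is removed.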
115. User Profiles
Weighted Random Indexing-based (w-RI)
Items Rating Threshold
Higher weight given to the documents with higher rating
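One way to give higher weight to higher-rated documents is to scale each liked item vector by its rating divided by the maximum rating. This weighting formula is an assumption chosen for illustration; the slide does not state the exact scheme.

```python
def w_ri_profile(item_vectors, ratings, threshold, max_rating):
    """Weighted sum of liked item vectors; weight = rating / max_rating (assumed)."""
    dims = len(next(iter(item_vectors.values())))
    profile = [0.0] * dims
    for item, rating in ratings.items():
        if rating >= threshold:
            weight = rating / max_rating
            profile = [p + weight * x for p, x in zip(profile, item_vectors[item])]
    return profile

items = {"m1": [1.0, 0.0], "m2": [0.0, 1.0], "m3": [1.0, 1.0]}
ratings = {"m1": 5, "m2": 2, "m3": 4}
# m1 contributes with weight 1.0, m3 with weight 0.8, m2 is below the threshold.
print(w_ri_profile(items, ratings, threshold=4, max_rating=5))
```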
116. User Profiles
Weighted Quantum Negation-based (w-QN)
Positive User Profile Vector
Negative User Profile Vector
VSM representation of wQN-based profile for user u
117. a novel recommendation framework based on VSM
eVSM
step 1: modeling semantics
step 2: dimensionality reduction
step 3: modeling negation
step 4: building user profiles
step 5: providing suggestions
118. eVSM
providing suggestions - monolingual scenario
[Figure: DocSpace matrix with columns c1 ... ck; rows d1 ... d4 are item vectors and p is the user profile]
All the items are vectors in a DocSpace
119. eVSM
providing suggestions - monolingual scenario
[Figure: DocSpace matrix with columns c1 ... ck; rows d1 ... d4 are item vectors and p is the user profile]
profile is a vector in a DocSpace
120. eVSM
providing suggestions - monolingual scenario
[Figure: DocSpace matrix with columns c1 ... ck; rows d1 ... d4 are item vectors and p is the user profile]
Similarity calculation between p and each item
121. Some maths (1/2)
• Let
• U set of users
• I set of items
• Given
• active user u ∈ U
122. Some maths (2/2)
• For each pair (u, i_j)
• For both user u and item i a vector
space representation is provided
• u = (f_u1, f_u2, ..., f_un)
• i = (f_i1, f_i2, ..., f_in)
• Calculate sim(u, i_j)
• Cosine similarity
• Rank the items i_j in descending
order of similarity
• Return the top-k elements
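The procedure above can be sketched as follows: compute the cosine similarity between the profile vector and every item vector, sort in descending order, and keep the first k items. The item identifiers and the function names are hypothetical.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend(profile, item_vectors, k):
    """Return the k item ids most similar to the profile, best first."""
    scored = [(cosine(profile, vec), item) for item, vec in item_vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:k]]

profile = [1.0, 0.0]
items = {"i1": [1.0, 0.0], "i2": [0.0, 1.0], "i3": [1.0, 1.0]}
print(recommend(profile, items, k=2))  # ['i1', 'i3']
```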
123. Similarity-based
recommendations
Relevance of an
item seen as a
form of
similarity
The most
similar items are
returned to the
target user
124. What about multilanguage
recommendations?
125. eVSM
providing suggestions - multilingual scenario
• eVSM for multilingual recommendations
• Assumption
• The distribution of the terms is (almost) language-independent
drink / bere
beer / birra
glass / bicchiere
126. eVSM
providing suggestions - multilingual scenario
• eVSM for multilingual recommendations
• Assumption
• The distribution of the terms is (almost) language-independent
• The position of the concept of beer in a WordSpace
will always be the same, regardless of the language!
127. (english) WordSpace
beer
wine
spoon
dog
128. (italian) WordSpace
birra
vino
cucchiaio
cane
Relationships between terms stay the same,
regardless of the language!
129. eVSM
providing suggestions - multilingual scenario
[Figure: parallel DocSpaces for L1 (italian) and L2 (english), built upon the same set of random vectors; each has columns c1 ... ck and rows d1 ... d5]
130. eVSM
providing suggestions - multilingual scenario
[Figure: parallel DocSpaces for L1 and L2, built upon the same set of random vectors; columns c1 ... ck, item rows d1 ... d5; the profile p_L1 is a row of the L1 DocSpace]
user profile in L1 (italian)
131. eVSM
providing suggestions - multilingual scenario
[Figure: parallel DocSpaces for L1 and L2, built upon the same set of random vectors; columns c1 ... ck, item rows d1 ... d4; the profile p_L1 appears in both DocSpaces]
We can project the user profile into the
DocSpace of English items
132. eVSM
providing suggestions - multilingual scenario
[Figure: parallel DocSpaces for L1 and L2, built upon the same set of random vectors; columns c1 ... ck, item rows d1 ... d4; the profile p_L1 appears in both DocSpaces]
similarity computations of italian profile with english items
to build multilingual recommendations
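To make the idea of parallel spaces built upon the same random vectors more concrete, the toy sketch below assumes aligned contexts (for example, the same movie described in both languages) that share one random vector each, and builds each term vector as the sum of the random vectors of the contexts it occurs in. This construction, the tiny corpora and all names are assumptions for illustration only.

```python
import random

def context_vectors(context_ids, k, nonzero, seed):
    """One sparse random vector per context id, shared by every language."""
    rng = random.Random(seed)
    vectors = {}
    for cid in context_ids:
        vec = [0] * k
        for pos in rng.sample(range(k), nonzero):
            vec[pos] = rng.choice((-1, 1))
        vectors[cid] = vec
    return vectors

def word_space(corpus, ctx):
    """Term vector = sum of the random vectors of the contexts it occurs in."""
    k = len(next(iter(ctx.values())))
    space = {}
    for cid, text in corpus.items():
        for term in set(text.lower().split()):
            current = space.setdefault(term, [0] * k)
            space[term] = [x + y for x, y in zip(current, ctx[cid])]
    return space

# Aligned toy corpora: context c1 and c2 describe the same thing in each language.
english = {"c1": "drink beer glass", "c2": "dog"}
italian = {"c1": "bere birra bicchiere", "c2": "cane"}

shared = context_vectors(["c1", "c2"], k=16, nonzero=4, seed=7)
en_space = word_space(english, shared)
it_space = word_space(italian, shared)

# Terms with the same distribution over contexts land on the same point.
print(en_space["beer"] == it_space["birra"])  # True
```

Because both spaces have the same k dimensions with the same meaning, a profile built in one language can be compared directly with items of the other.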
133. Multilingual recommendations
come at no cost,
thanks to the distributional hypothesis.
134. experimental evaluation
applications
135. evaluation of eVSM
• selected experiments
• movie recommendation
• monolingual scenario
• Cataldo Musto, Giovanni Semeraro, Pasquale Lops, Marco de Gemmis: Random Indexing and Negative User Preferences for Enhancing Content-Based Recommender Systems. EC-Web 2011, 270-281
• multilingual scenario
• Cataldo Musto, Fedelucio Narducci, Pierpaolo Basile, Pasquale Lops, Marco de Gemmis, Giovanni Semeraro: Cross-Language Information Filtering: Word Sense Disambiguation vs. Distributional Models. AI*IA 2011
• epg personalization
• Cataldo Musto, Fedelucio Narducci, Pasquale Lops, Giovanni Semeraro, Marco de Gemmis, Mauro Barbieri, Jan H. M. Korst, Verus Pronk, Ramon Clout: Enhanced Semantic TV-Show Representation for Personalized Electronic Program Guides. UMAP 2012 (to be presented)
136. movie recommendation
‘in vitro’ experiments
• Goal: to provide users with recommendations about movies
worth watching.
• Subset of 100k MovieLens dataset + Wikipedia content
• Monolingual and Multilingual settings
137. monolingual experiment
parameter tuning
• Size of context vectors
• k = 50, 100, 200, 400
• 99% reduction of DocSpace
• original size: 25k
• Profiling models
• RI, w-RI, QN, w-QN
• Weighted vs. Unweighted
• With negation vs. without negation
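As a quick check of the reduction figure, the arithmetic below assumes that "25k" means an original space of 25,000 dimensions and that the reduction is measured as the share of dimensions removed. Under that reading the four settings give reductions between roughly 98.4% and 99.8%, in line with the rounded 99% on the slide.

```python
def reduction_percent(k, original_size):
    """Percentage of dimensions removed when going from original_size to k."""
    return 100.0 * (1.0 - k / original_size)

ORIGINAL_SIZE = 25_000  # assumption: "25k" read as 25,000 dimensions

for k in (50, 100, 200, 400):
    print(f"k={k}: {reduction_percent(k, ORIGINAL_SIZE):.1f}% smaller")
```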
138. experimental design
experiments
• Experiment 1
• Do the weighting scheme and the
introduction of a negation operator
improve the predictive accuracy of the recommendation
models?
• Experiment 2
• How does the model perform with respect to other
state-of-the-art approaches?
139. experiment 1
size=100 - Movielens dataset
[Bar chart: P@1, P@3, P@5 and P@10 for the RI, WRI, QN and WQN profiles; y-axis from 84 to 87; bar values between 84.78 and 86.69]
Weighted vs Unweighted: improvement under 0.2%
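For reference, P@k (precision at k) is commonly computed as the fraction of the top k recommended items that are relevant to the user. The sketch below uses that standard definition; how relevance was decided in the experiments (for example, via a rating threshold) is not stated here, so the relevant set is simply taken as given.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the first k ranked items that belong to the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

ranked = ["a", "b", "c", "d"]   # hypothetical recommendation list, best first
relevant = {"a", "c"}           # hypothetical items the user actually liked
print(precision_at_k(ranked, relevant, 1))  # 1.0
print(precision_at_k(ranked, relevant, 4))  # 0.5
```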
141. experiment 1
size=100 - Movielens dataset
[Bar chart: P@1, P@3, P@5 and P@10 for the RI, WRI, QN and WQN profiles; y-axis from 84 to 87; bar values between 84.78 and 86.69; annotation "Peak: +0.52"]
However, differences are not statistically significant
142. experiment 1
size=400 - Movielens dataset
[Bar chart: P@1, P@3, P@5 and P@10 for the RI, WRI, QN and WQN profiles; y-axis from 84 to 87; bar values between 84.86 and 86.01]
Negation vs No-negation: improvement under 0.5%
143. experiment 1
size=100 - Movielens dataset
[Bar chart: P@1, P@3, P@5 and P@10 for the RI, WRI, QN and WQN profiles; y-axis from 84 to 87; bar values between 84.78 and 86.69; annotation "Gap: +1.08"]
Some exceptions: P@1 and P@3, comparing W-RI vs. W-QN
144. experiment 1
size=100 - Movielens dataset
[Bar chart: P@1, P@3, P@5 and P@10 for the RI, WRI, QN and WQN profiles; y-axis from 84 to 87; bar values between 84.78 and 86.69; annotation "Gap: +0.77"]
The use of the negation operator significantly improves the accuracy.
145. experiment 1
size=100 - Movielens dataset
[Bar chart: P@1, P@3, P@5 and P@10 for the RI, WRI, QN and WQN profiles; y-axis from 84 to 87; bar values between 84.78 and 86.69; annotation "Gap: +1.08"]
Peaks in P@1 and P@3 are statistically significant
146. experiment 1
size=100 - Movielens dataset
[Bar chart: P@1, P@3, P@5 and P@10 for the RI, WRI, QN and WQN profiles; y-axis from 84 to 87; bar values between 84.78 and 86.69]
Generally speaking, the W-QN configuration outperforms the others.
147. experiment 1
size=100 - Movielens dataset
[Bar chart: P@1, P@3, P@5 and P@10 for the RI, WRI, QN and WQN profiles; y-axis from 84 to 87; bar values between 84.78 and 86.69; annotation "Gap: +1.4%"]
The combined use of weighting and negation significantly improves the accuracy
148. experiment 1
impact of negation operator and weighting scheme
context vectors - size
50 100 200 400
P@1 ✔ ✔ ✔
P@3 ✔ ✔ ✔
P@5
P@10 ✔
✔ = statistical significance
149. experiment 1
impact of negation operator and weighting scheme
context vectors - size
50 100 200 400
P@1 ✔ ✔ ✔
P@3 ✔ ✔ ✔
P@5
P@10 ✔
The combined use of weighting and negation significantly improves the accuracy
150. experiment 2
size=400 - Movielens dataset
[Bar chart: P@1, P@3, P@5 and P@10 for eVSM, VSM, LSI and Bayes; y-axis from 84 to 87; bar values between 84.43 and 86.01]
Gap always around 1%