Basic introduction to recommender systems + Implementing a content-based recommender system by leveraging knowledge encoded into Linked Open Data datasets
1. TOOLS AND TECHNIQUES
“FEEDING RECOMMENDER SYSTEMS
WITH LINKED OPEN DATA”
Tommaso Di Noia
t.dinoia@poliba.it
Politecnico di Bari
ITALY
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
2. Recommender Systems
Input Data:
A set of users U={u1, …, uM}
A set of items X={x1, …, xN}
The rating matrix R=[ru,i]
Problem Definition:
Given user u and target item i
Predict the rating ru,i
A definition
Recommender Systems (RSs) are software tools and techniques
providing suggestions for items to be of use to a user.
[F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.]
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
3. The rating matrix
5 1 2 4 3 ??
2 4 5 3 5 2
4 3 2 4 1 3
3 5 1 5 2 4
4 4 5 3 5 2
TheMatrix
Titanic
Iloveshopping
Argo
LoveActually
Thehangover
Tommaso
Enrico
Sean
Natasha
Valentina
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
4. The rating matrix
(in the real world)
5 4 3 ??
2 4 5 5
3 4 3
3 5 5 2
4 4 5 5 2
TheMatrix
Titanic
Iloveshopping
Argo
LoveActually
Thehangover
Tommaso
Enrico
Sean
Natasha
Valentina
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
5. Collaborative Recommender Systems
Collaborative RSs recommend items to a user by identifying
other users with a similar profile
Recommender
System
User profile
Users
Item7
Item15
Item11
…
Top-N Recommendations
Item1, 5
Item2, 1
Item5, 4
Item10, 5
….
….
Item1, 4
Item2, 2
Item5, 5
Item10, 3
….
Item1, 4
Item2, 2
Item5, 5
Item10, 3
….
Item1, 4
Item2, 2
Item5, 5
Item10, 3
….
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
6. Content-based Recommender Systems
Recommender
System
User profile
Item7
Item15
Item11
…
Top-N Recommendations
Item1, 5
Item2, 1
Item5, 4
Item10, 5
….
Items
Item1
Item2
Item100
Item’s
descriptions
….
CB-RSs recommend items to a user based on their description
and on the profile of the user’s interests
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
7. Knowledge-based Recommender Systems
Recommender
System
User profile
Item7
Item15
Item11
…
Top-N Recommendations
Item1, 5
Item2, 1
Item5, 4
Item10, 5
….
Items
Item1
Item2
Item100Item’s
descriptions
….
KB-RSs recommend items to a user based on their description
and domain knowledge encoded in a knowledge base
Knowledge-base
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
10. Content-Based Recommender Systems
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
P. Lops, M. de Gemmis, G. Semeraro. Content-based recommender Systems: State of the Art and Trends. In: P. Kantor, F. Ricci, L. Rokach, B. Shapira,
editors, Recommender Systems Hankbook: A complete Guide for Research Scientists & Practitioners
11. Content-Based Recommender Systems
• Items are described in terms of
attributes/features
• A finite set of values is associated to each
feature
• Items representation is a (Boolean) vector
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
12. Content-Based Recommender Systems
• Compute similarity between items
𝑠𝑖𝑚 𝑥𝑖, 𝑥𝑗 =
|𝑥𝑖 ∩ 𝑥𝑗|
|𝑥𝑖 ∪ 𝑥𝑗|
Jaccard similarity
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
13. Content-Based Recommender Systems
• Compute similarity between items
𝑠𝑖𝑚 𝑥𝑖, 𝑥𝑗 =
𝑥𝑖 ⋅ 𝑥𝑗
𝑥𝑖 ∗ |𝑥𝑗|
Cosine similarity
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
14. Content-Based Recommender Systems
• Compute similarity between items
𝑠𝑖𝑚 𝑥𝑖, 𝑥𝑗 =
𝑥𝑖 ⋅ 𝑥𝑗
𝑥𝑖 ∗ |𝑥𝑗|
Cosine similarity and TF-IDF
𝑇𝐹 𝑣, 𝑥 = # 𝑣 𝑎𝑝𝑝𝑒𝑎𝑟𝑠 𝑖𝑛 𝑡ℎ𝑒 𝑑𝑒𝑠𝑐𝑟𝑖𝑝𝑡𝑖𝑜𝑛 𝑜𝑓 𝑥
𝐼𝐷𝐹 𝑣 = log
|𝑋|
#𝑣 𝑎𝑝𝑝𝑒𝑎𝑟𝑠 𝑖𝑛 𝑋
𝑥 = (𝑇𝐹 𝑣1, 𝑥 ∗ 𝐼𝐷𝐹 𝑣1 , … , 𝑇𝐹 𝑣 𝑛, 𝑥 ∗ 𝐼𝐷𝐹 𝑣 𝑛 )
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
15. Content-Based Recommender Systems
Rate prediction
𝑟 𝑢𝑖, 𝑥′ =
𝑠𝑖𝑚 𝑥, 𝑥′ ∗ 𝑟𝑥,𝑢 𝑖𝑥∈𝑋 𝑢 𝑖
𝑠𝑖𝑚 𝑥, 𝑥′𝑥∈𝑋 𝑢 𝑖
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
16. Content-Based Recommender Systems
• Nearest neighbors
– Given a set of items representing the user profile,
select their most similar items which are not in
the user profile
– Predict the rate only for the N nearest neighbors
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
17. Content-Based Recommender Systems
• Nearest neighbors with kNN
𝑥𝑖,1
𝑥𝑖,2
𝑥𝑖,3
𝑥𝑖,4
𝑥𝑖,5
𝑥𝑖,6
k = 5
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
19. Need of domain knowledge!
We need rich descriptions of the items!
No suggestion is available if the analyzed content does not contain enough
information to discriminate items the user might like from items the user
might not like.*
(*) P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. In: P. Kantor, F. Ricci, L. Rokach and B. Shapira,
editors, Recommender Systems Handbook: A Complete Guide for Research Scientists & Practitioners
The quality of CB recommendations are correlated with the quality of the
features that are explicitly associated with the items.
Main CB RSs Drawback: Limited Content Analysis
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
20. A solution based on Linked Data
Use Linked Data to mitigate
the limited content analysis
issue
• Plenty of structured data
available
• No Content Analyzer
required
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
21. FRED: Transforming Natural Language text
to RDF/OWL graphs
input
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Courtesy of Valentina Presutti
22. SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
FRED: RDF/OWL graph output
Resolution and linking
Vocabulary
alignment
Frames/events
Taxonomy
induction
Semantic roles
Co-reference
Custom namespace
Types
Induced types
(sense tagging)
22
Courtesy of Valentina Presutti
23. Linked Data as structured information
source for items descriptions
Rich items descriptions
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
24. Select the domain(s) of your RS
SELECT count(?i) AS ?num ?c
WHERE {
?i a ?c .
FILTER(regex(?c, "^http://dbpedia.org/ontology")) .
}
ORDER BY DESC(?num)
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
25. Tìpalo: automatic typing
of DBpedia entities
Input: natural language definition of an entity
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Courtesy of Valentina Presutti
26. 26
Linking to WN supersenses
DBpedia resource
Extracted type
Disambiguated sense
Inferred superclass
Linked resource
Disambiguated sense
A chaise longue is an upholstered sofa in
the shape of a chair that is long enough
to support the legs.
A chaise longue (English /ˌʃeɪz ˈlɔːŋ/;[1] French
pronunciation: [ʃɛzlɔ̃ŋɡ(ə)], "long chair") is an
upholstered sofa in the shape of a chair that is long
enough to support the legs.
Linking to DOLCE
http://en.wikipedia.org/wiki/Chaise_longue
Rich taxonomical
data!
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Courtesy of Valentina Presutti
27. Select the sub-graph you are interested in
select ?m ?p ?o1
where{
?m ?p ?o1 .
?m dcterms:subject ?o.
dbpedia:The_Matrix dcterms:subject ?o.
?m a dbpedia-owl:Film.
filter(regex(?p,"^http://dbpedia.org/ontology/")).
}
order by ?m
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
28. Enriching Annotations with Resources
From the Web
Watson: find ontologies on the Semantic Web
for enriching the representation of your data
Sindice: find other entities for enriching
your knowledge base
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
SameAs.org: find other entities to link to
from your knowledge base
Courtesy of Valentina Presutti
29. Computing similarity in LOD datasets
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
30. Vector Space Model for LOD
Righteous Kill
starring
director
subject/broader
genre
Heat
RobertDeNiro
JohnAvnet
Serialkillerfilms
Drama
AlPacino
BrianDennehy
Heistfilms
Crimefilms
starring
RobertDeNiro
AlPacino
BrianDennehy
Righteous Kill
Heat
… …
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
31. Vector Space Model for LOD
Righteous Kill
STARRING
Al Pacino
(v1)
Robert
De Niro
(v2)
Brian
Dennehy
(v3)
Righteous
Kill (m1)
X X X
Heat (m2) X X
Heat
Righteous Kill (x1) wv1,x1 wv2,x1 wv3,x1
Heat (x2) wv1,x2 wv2,x2 0
𝑤 𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜,𝐻𝑒𝑎𝑡 = 𝑡𝑓𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜,𝐻𝑒𝑎𝑡 ∗ 𝑖𝑑𝑓𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
32. Vector Space Model for LOD
Righteous Kill
STARRING
Al Pacino
(v1)
Robert
De Niro
(v2)
Brian
Dennehy
(v3)
Righteous
Kill (m1)
X X X
Heat (m2) X X
Heat
Righteous Kill (x1) wv1,x1 wv2,x1 wv3,x1
Heat (x2) wv1,x2 wv2,x2 0
𝑤 𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜,𝐻𝑒𝑎𝑡 = 𝑡𝑓𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜,𝐻𝑒𝑎𝑡 ∗ 𝑖𝑑𝑓𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜
𝑡𝑓 ∈ {0,1}
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
34. A LOD-based CB RS
Given a user profile defined as:
We can predict the rating using a Nearest Neighbor Classifier wherein the similarity
measure is a linear combination of local property similarities
𝑿 𝒖 𝒊
= { < 𝒙𝒊, 𝒓 𝒙 𝒊,𝒖 𝒊
> }
𝑟 𝑢𝑖, 𝑥′ =
𝛼 𝑝 ∗ 𝑠𝑖𝑚 𝑝(𝑥𝑖, 𝑥′)𝑝
𝑃
∗ 𝑟𝑥 𝑖,𝑢 𝑖<𝑥 𝑖,𝑟 𝑥 𝑖,𝑢 𝑖
> ∈ 𝑋 𝑢 𝑖
|𝑋 𝑢 𝑖
|
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
35. We’ve just scratched the surface…
Missing in this presentation (not a complete list)
• Learning 𝛼 𝑝
• Explanation
– Quite straight with a content-based approach
• Hybrid approaches
– Mix a content based approach with a collaborative one
– Collaborative similarity as a further vector space (property)
of the content based approach?
– Exploit the graph-based nature of both the rating matrix
and the RDF graph
• …
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
36. Many thanks to Vito Claudio Ostuni and Roberto Mirizzi
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web