Recommender Systems
in the Linked Data era
ROBERTO MIRIZZI, PHD
roberto.mirizzi@gmail.com
Outline
What is a Recommender System?
◦ A definition
◦ Types
What is Linked Data?
◦ LOD
◦ DBpedia
Some Recommender Systems (RS):
◦ A content-based RS (memory-based)
◦ A mobile content-based RS (memory-based)
◦ A content-based RS (model-based)
◦ A hybrid RS (model-based)
What is a Recommender System?
Recommender Systems in the Linked Data Era – HP Labs, Palo Alto, CA
7/12/2013
What is a Recommender System?
Recommender Systems (RSs) are software tools and techniques providing suggestions for items
to be of use to a user.
[F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.]
Input Data:
A set of users $U = \{u_1, \dots, u_M\}$
A set of items $I = \{i_1, \dots, i_N\}$
The preference matrix $R = [r_{u,i}]$
Problem Definition:
Given user $u$ and target item $i$
Predict the preference $r_{u,i}$
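The problem definition above can be sketched with a toy preference matrix. The users, items and ratings here are made up for illustration; a missing entry stands for a preference that the recommender has to predict.

```python
# Hypothetical toy data: users, items and a sparse preference matrix R.
users = ["u1", "u2", "u3"]
items = ["i1", "i2", "i3", "i4"]

# Known preferences R[u][i]; a missing key means "unknown, to be predicted".
R = {
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i1": 4, "i3": 1},
    "u3": {"i2": 2, "i4": 5},
}


def unknown_pairs(users, items, R):
    """Return every (user, item) pair whose preference is missing from R."""
    return [(u, i) for u in users for i in items if i not in R.get(u, {})]


pairs = unknown_pairs(users, items, R)
```

Each pair in `pairs` is one instance of the prediction task: given user u and target item i, estimate the missing value.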
Recommender Systems: types
Content-based (CB): recommendations are based on the assumption that if a user liked a set of items with particular features in the past, they will likely prefer items with similar characteristics (e.g., features such as animation, fairytale, ogre, castle).
Collaborative filtering (CF): recommendations are based on the assumption that users with similar histories are more likely to have similar tastes and needs.
Hybrid: a combination of the two approaches above.
What is Linked Data?
What is Linked Data?
A collection of interrelated datasets on the Web
Principles:
1. Use HTTP URIs to identify things
2. Leverage standards such as RDF and SPARQL to provide information about things
3. Link related things by relationships
[http://linkeddata.org/]
DBpedia: a Nucleus for a Web of Open Data
http://dbpedia.org
DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.
DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data.
[Auer et al., DBpedia: A Nucleus for a Web of Open Data. ISWC+ASWC 2007]
[Bizer et al., DBpedia - A crystallization point for the Web of Data. Journal of Web Semantics, 2009]
Querying DBpedia: SPARQL
DBpedia exposes a SPARQL endpoint (http://dbpedia.org/sparql) for querying the dataset. Results can be returned in several formats (e.g., JSON, XML, N-Triples).
SPARQL is an RDF query language. Its queries consist of triple patterns, conjunctions, disjunctions and optional patterns.
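As a minimal sketch of how such a query might be prepared from Python, the snippet below builds a request for the endpoint named above and parses a result document in the SPARQL JSON results format. The film URI passed in and the `dbo:starring` / `dbo:director` property names are assumptions for illustration and may not match the live dataset; the helper names `build_request` and `values_of` are hypothetical. Nothing is sent over the network here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named on the slide


def build_request(film_uri):
    """Build (but do not send) a SPARQL request for a film's actors and director.

    The query uses one triple pattern plus one OPTIONAL pattern.
    Property names are assumptions, not verified against the dataset.
    """
    query = (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?actor ?director WHERE {\n"
        f"  <{film_uri}> dbo:starring ?actor .\n"
        f"  OPTIONAL {{ <{film_uri}> dbo:director ?director . }}\n"
        "}"
    )
    url = ENDPOINT + "?" + urlencode({"query": query})
    # Ask for JSON results through standard HTTP content negotiation.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def values_of(results_json, var):
    """Extract the values bound to `var` from a SPARQL JSON results document."""
    doc = json.loads(results_json)
    return [b[var]["value"] for b in doc["results"]["bindings"] if var in b]
```

To actually run the query, the returned request could be passed to `urllib.request.urlopen` and the response body handed to `values_of`.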
A graph of knowledge
Why don’t we use all this information to foster recommender systems?
[Figure: knowledge graph with the nodes Ocean’s Eleven, Ocean’s Twelve, George Clooney, Brad Pitt, Steven Soderbergh, Catherine Zeta-Jones, 2000s crime films, American criminal comedy films, Crime films, and Crime.]
[Figure: the same graph, extended with a user's "likes" edges.]
A content-based RS (memory-based)
The good old Vector Space Model
[http://en.wikipedia.org/wiki/File:Vector_space_model.jpg]
The Vector Space Model is an algebraic model for representing both text documents and queries as vectors of index-term weights $w_{t,d}$, which are positive and non-binary.
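$$\vec{v}_d = [w_{1,d}, w_{2,d}, \dots, w_{N,d}]^T$$
$$w_{t,d} = tf_{t,d} \cdot idf_t$$
$$tf_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}}$$
$$idf_t = \log \frac{|D|}{|\{d' \in D : t \in d'\}|}$$
$$sim(d_j, q) = \frac{\vec{d}_j \cdot \vec{q}}{\|\vec{d}_j\| \, \|\vec{q}\|} = \frac{\sum_{i=1}^{N} w_{i,j} \, w_{i,q}}{\sqrt{\sum_{i=1}^{N} w_{i,j}^2} \, \sqrt{\sum_{i=1}^{N} w_{i,q}^2}}$$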
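The Vector Space Model formulas (term frequency, inverse document frequency, and cosine similarity) can be sketched in plain Python as follows. The three toy documents are invented for illustration, and sparse vectors are stored as dictionaries from term to weight.

```python
import math
from collections import Counter


def tf(doc):
    """Term frequency: count of each term divided by the document length."""
    counts = Counter(doc)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


def idf(docs):
    """Inverse document frequency: log(|D| / number of docs containing t)."""
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n_docs / df[t]) for t in df}


def tfidf_vector(doc, idf_weights):
    """Sparse TF-IDF vector of a document as a term -> weight dictionary."""
    return {t: f * idf_weights.get(t, 0.0) for t, f in tf(doc).items()}


def cosine(v, q):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * q.get(t, 0.0) for t, w in v.items())
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_v * norm_q) if norm_v and norm_q else 0.0


# Hypothetical toy corpus of three "documents" described by index terms.
docs = [
    ["crime", "heist", "casino"],
    ["crime", "heist", "europe"],
    ["fairytale", "ogre", "castle"],
]
weights = idf(docs)
vecs = [tfidf_vector(d, weights) for d in docs]
```

With this corpus, the first two documents share terms and get a similarity strictly between 0 and 1, while the third shares nothing with them and gets 0.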
Semantic Vector Space Model (i)
[Figure: graph around Ocean’s Eleven and Ocean’s Twelve. Nodes: George Clooney, Brad Pitt, Catherine Zeta-Jones, Steven Soderbergh, 2000s crime films, Crime films, American criminal…, Crime. Edge labels: starring, director, subject/broader, genre.]
Each item is expressed as a tensor in a multi-dimensional space where each dimension corresponds to a specific property of the considered datasets (e.g., starring, subject/broader, director, genre, …)
Semantic Vector Space Model (ii)
[Figure: the STARRING slice of the tensor. Actors: George Clooney [gc] (38 movies), Catherine Z. Jones [czj] (22 movies), Brad Pitt [bp] (35 movies). Movies: Ocean’s Eleven [o11] (13 actors), Ocean’s Twelve [o12] (15 actors).]
Each cell holds a TF-IDF weight:
$$w_{actor_x, movie_y} = tf_{actor_x, movie_y} \cdot idf_{actor_x}$$

starring              | George Clooney [gc] | Catherine Z. Jones [czj] | Brad Pitt [bp]
Ocean’s Eleven [o11]  | $w_{gc,o11}$        | $w_{czj,o11}$            | $w_{bp,o11}$
Ocean’s Twelve [o12]  | $w_{gc,o12}$        | $w_{czj,o12}$            | $w_{bp,o12}$
We can now compute the normalized scalar product (cosine similarity) between the two vectors to get their similarity…
Semantic Vector Space Model (iii)
$$sim^{starring}(o12, o11) = \frac{w_{gc,o12} \, w_{gc,o11} + w_{czj,o12} \, w_{czj,o11} + w_{bp,o12} \, w_{bp,o11}}{\sqrt{w_{gc,o12}^2 + w_{czj,o12}^2 + w_{bp,o12}^2} \cdot \sqrt{w_{gc,o11}^2 + w_{czj,o11}^2 + w_{bp,o11}^2}}$$
…and then combine all the similarities for each property:
$$sim(o12, o11) = \alpha_{starring} \cdot sim^{starring}(o12, o11) + \alpha_{director} \cdot sim^{director}(o12, o11) + \alpha_{subject} \cdot sim^{subject}(o12, o11) + \dots$$
We will soon see how to compute the $\alpha_p$ coefficients.
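The two steps above (one cosine similarity per property, then a weighted sum over properties) could look roughly like this. All weights and coefficients below are invented numbers, not values from DBpedia or from the cited work, and the dictionary layout is just one possible representation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical per-property TF-IDF weights for the two movies.
o11 = {
    "starring": {"gc": 0.4, "bp": 0.5},
    "director": {"ss": 1.0},
    "subject": {"2cf": 0.3, "cf": 0.2},
}
o12 = {
    "starring": {"gc": 0.4, "czj": 0.7, "bp": 0.5},
    "director": {"ss": 1.0},
    "subject": {"2cf": 0.3, "acc": 0.6},
}

# Hypothetical property coefficients (chosen here to sum to 1).
alpha = {"starring": 0.5, "director": 0.2, "subject": 0.3}


def combined_similarity(item_a, item_b, alpha):
    """Weighted sum of the per-property cosine similarities."""
    return sum(
        a_p * cosine(item_a.get(p, {}), item_b.get(p, {}))
        for p, a_p in alpha.items()
    )


overall = combined_similarity(o12, o11, alpha)
```

Keeping one vector per property, rather than one flat vector, is what lets each property receive its own coefficient.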
Ready for our first Content-based RS
Given a user profile, defined as:
$$profile(u) = \{\langle m_j, r_j \rangle \mid r_j = 1 \text{ if } u \text{ likes } m_j,\; r_j = -1 \text{ otherwise}\}$$
or as:
$$profile(u) = \{\langle m_j, r_j \rangle \mid r_j\ \dots\}$$
We predict the rating using a Nearest Neighbor Classifier (memory-based) where the similarity measure is a linear combination of local similarities:
$$r(u, m_i) = \frac{\sum_{m_j \in profile(u)} r_j \cdot \dfrac{\sum_p \alpha_p \cdot sim^p(m_j, m_i)}{P}}{|profile(u)|}$$
where $P$ is the number of properties considered.
[Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, Markus Zanker. Linked Open Data to support
Content-based Recommender Systems. 8th International Conference on Semantic Systems (I-SEMANTICS 2012) – best paper]
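A minimal sketch of the memory-based prediction, assuming the per-property similarities are already available through a lookup function. The similarity table, the coefficients and the profile below are made up for illustration; `predict_rating` is a hypothetical helper name.

```python
def predict_rating(profile, target, sim, alpha):
    """Predict r(u, m_i) from a profile of (item, rating) pairs.

    sim(p, m_j, m_i) returns the similarity of two items on property p;
    alpha maps each property to its coefficient.
    """
    if not profile:
        return 0.0
    n_properties = len(alpha)  # P in the formula
    total = 0.0
    for m_j, r_j in profile:
        weighted = sum(a_p * sim(p, m_j, target) for p, a_p in alpha.items())
        total += r_j * weighted / n_properties
    return total / len(profile)


# Hypothetical precomputed per-property similarities to the target "o12".
SIM = {
    ("starring", "o11", "o12"): 0.8,
    ("director", "o11", "o12"): 1.0,
}


def lookup(p, a, b):
    """Similarity lookup; pairs missing from the table count as 0."""
    return SIM.get((p, a, b), 0.0)


alpha = {"starring": 1.0, "director": 1.0}
profile = [("o11", 1), ("shrek", -1)]  # liked o11, disliked shrek
prediction = predict_rating(profile, "o12", lookup, alpha)
```

With binary ratings in {1, -1} and similarities in [0, 1], the prediction stays within [-1, 1] as long as the coefficients do not exceed 1.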
How do we compute the $\alpha_p$ coefficients?
We need to identify the best possible values for the coefficients $\alpha_p$, i.e., the weights associated with each property. There are several ways to do that.
Depending on the nature of the user ratings (Likert or binary), we can treat rating prediction as a regression problem (linear regression) or as a classification problem (logistic regression), and minimize a loss function $J(\alpha)$.
In the former case we can minimize the least-squares loss function, and in the latter case the cross-entropy loss function. In both cases we can use gradient descent:
$$\alpha_p \leftarrow \alpha_p - \eta \, \frac{\partial J(\alpha)}{\partial \alpha_p}$$
where $\eta$ is the learning rate.
Another possible approach is to use a genetic algorithm to minimize a non-smooth loss function, such as the number of misclassification errors.
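For the regression case, gradient descent on a least-squares loss can be sketched as below. This is a simplified setup chosen for illustration: each training row holds the per-property similarities for one example, the prediction is their weighted sum, and the learning rate, epoch count and data are all assumed values.

```python
def fit_alpha(X, y, lr=0.5, epochs=500):
    """Batch gradient descent for J = 1/(2n) * sum((X @ alpha - y)^2)."""
    n, n_props = len(X), len(X[0])
    alpha = [0.0] * n_props
    for _ in range(epochs):
        # Residual of the current linear prediction for each example.
        errors = [
            sum(a * s for a, s in zip(alpha, row)) - target
            for row, target in zip(X, y)
        ]
        # Partial derivative of J with respect to each coefficient.
        grad = [
            sum(e * row[p] for e, row in zip(errors, X)) / n
            for p in range(n_props)
        ]
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
    return alpha


# Hypothetical training data generated from the coefficients (0.7, 0.3).
X = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
y = [0.7, 0.3, 0.5, 0.38]
alpha = fit_alpha(X, y)
```

Because the toy targets are an exact linear function of the inputs, the fitted coefficients converge to the generating values; on real ratings the fit would only be approximate.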
A mobile content-based RS (memory-based)
Let’s go Mobile
(e.g., recommend movies in theaters)
[Vito Claudio Ostuni, Giosia Gentile, Tommaso Di Noia, Roberto Mirizzi, Davide Romito, Eugenio Di Sciascio. Mobile Movie Recommendations with Linked Data. Human-Computer Interaction & Knowledge Discovery @ CD-ARES’13 (HCI-KDD 2013)]
This time the user profile is context-dependent and is defined as:
$$profile(u, cmp) = \{\langle m_j, r_j \rangle \mid r_j = 1 \text{ if } u \text{ likes } m_j \text{ with companion } cmp,\; r_j = -1 \text{ otherwise}\}$$
The prediction combines two parts, contextual pre-filtering and contextual post-filtering:
$$r(u, m_i, cmp) = \beta_{preFilter} \cdot r_{preFilter}(u, m_i, cmp) + \beta_{postFilter} \cdot r_{postFilter}(u)$$
$$r_{preFilter}(u, m_i, cmp) = \frac{\sum_{m_j \in profile(u, cmp)} r_j \cdot sim(m_j, m_i)}{|profile(u, cmp)|}$$
$$r_{postFilter}(u) = \frac{h + c + cl + ar + ap}{5}$$
where:
h (hierarchy): 1 if the theater is in the same city, 0 otherwise
c (cluster): 1 if the theater is a multiplex, 0 otherwise
cl (co-location): 1 if the theater is close to other POIs, 0 otherwise
ar (association-rule): 1 if the ticket price is known, 0 otherwise
ap (anchor-point proximity): 1 if the theater is close to the user's home or office, 0 otherwise
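The two-part prediction can be sketched as follows. The equal weighting of the pre- and post-filtering scores, the similarity table and the companion-specific profile are assumptions made for the example; the slide's own weighting symbols did not survive extraction.

```python
def post_filter_score(h, c, cl, ar, ap):
    """Average of the five binary theater criteria."""
    return (h + c + cl + ar + ap) / 5


def pre_filter_score(profile, target, sim):
    """Similarity-weighted average rating over a companion-specific profile."""
    if not profile:
        return 0.0
    return sum(r_j * sim(m_j, target) for m_j, r_j in profile) / len(profile)


def contextual_rating(pre, post, w_pre=0.5, w_post=0.5):
    """Linear combination of the two scores (weights are assumed values)."""
    return w_pre * pre + w_post * post


# Hypothetical item-to-item similarities to the target movie "o12".
SIM = {("o11", "o12"): 0.8, ("shrek", "o12"): 0.1}


def lookup(a, b):
    """Similarity lookup; pairs missing from the table count as 0."""
    return SIM.get((a, b), 0.0)


# Hypothetical profile of user u when going out with friends.
profile_with_friends = [("o11", 1), ("shrek", -1)]

pre = pre_filter_score(profile_with_friends, "o12", lookup)
post = post_filter_score(h=1, c=0, cl=1, ar=1, ap=0)
score = contextual_rating(pre, post)
```

Pre-filtering selects which ratings count (only those given with the same companion), while post-filtering adjusts the result using properties of the theater.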
A content-based RS (model-based)
Time for a Model-based CB-RS
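[Figure: feature matrix with columns grouped by property. starring: George Clooney [gc], Catherine Z. Jones [czj], Brad Pitt [bp]. director: Steven Soderbergh [ss]. subject: 2000s crime films [2cf], Crime films [cf], American criminal comedy [acc]. Rows: Ocean’s Eleven [o11], Ocean’s Twelve [o12].]

                      | gc           | czj           | bp           | ss           | 2cf           | cf           | acc
Ocean’s Eleven [o11]  | $w_{gc,o11}$ | $w_{czj,o11}$ | $w_{bp,o11}$ | $w_{ss,o11}$ | $w_{2cf,o11}$ | $w_{cf,o11}$ | $w_{acc,o11}$
Ocean’s Twelve [o12]  | $w_{gc,o12}$ | $w_{czj,o12}$ | $w_{bp,o12}$ | $w_{ss,o12}$ | $w_{2cf,o12}$ | $w_{cf,o12}$ | $w_{acc,o12}$

This time each item is represented by a feature vector, where each feature corresponds to a property value.
The user profile is defined as:
$$profile(u) = \{\langle m_j, r_j \rangle \mid r_j = 1 \text{ if } u \text{ likes } m_j,\; r_j = -1 \text{ otherwise}\}$$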
[Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito. Exploiting the Web of Data in
Model-based Recommender Systems. 6th ACM Conference on Recommender Systems (RecSys 2012)]
Training the system with an SVM classifier
[https://en.wikipedia.org/wiki/File:Svm_max_sep_hyperplane_with_margin.png]
Support Vector Machines (SVMs) are known to work well for text classification. Our problem of learning the user profile has a lot in common with it, such as the sparsity of the feature vectors and the high dimensionality of the input space.
Main advantages:
1. Feature selection is often not needed (SVMs are robust to over-fitting and scale well)
2. No need to tune per-property coefficients as in the memory-based approach
We then fit a logistic model to the SVM output to obtain a ranked list of items.
A hybrid RS (model-based)
Let’s continue with a Hybrid RS
[Vito Claudio Ostuni, Tommaso Di Noia, Eugenio Di Sciascio, Roberto Mirizzi. Top-N Recommendations from
Implicit Feedback leveraging Linked Open Data. 7th ACM Conference on Recommender Systems (RecSys 2013)]
We want to recommend items i to user u, exploiting both the LOD knowledge base and other users’ interactions.
The ultimate goal of this recommender system is to rank in the top-N positions the items likely to be relevant for the user, in the presence of implicit feedback.
Given the nature of the problem, the user profile is defined as:
$$profile(u) = \{\, i \mid i \text{ is relevant for } u \,\}$$
Path-based features
We define $x_{ui} \in \mathbb{R}^D$ as the feature vector encoding all the interactions between user u and item i. Each component of this vector represents the relevance score between u and i with respect to a particular feature, and is defined as:
$$x_{ui}(j) = \frac{\#path_{ui}(j)}{\sum_{d=1}^{D} \#path_{ui}(d)}$$
The paths can be content-based, collaborative or hybrid.
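The normalization in the path-based feature definition can be sketched as below. The path-type labels and the list of paths found between the user and the item are hypothetical; extracting the paths from the graph is assumed to have been done elsewhere.

```python
from collections import Counter

# Hypothetical vocabulary of D = 3 path types linking a user to an item.
PATH_TYPES = [
    "user-item-actor-item",      # content-based path
    "user-item-user-item",       # collaborative path
    "user-item-user-item-actor-item",  # hybrid path
]


def path_features(paths, path_types=PATH_TYPES):
    """Return x_ui: the share of each path type among all paths from u to i."""
    counts = Counter(paths)
    total = sum(counts[t] for t in path_types)
    if total == 0:
        return [0.0] * len(path_types)
    return [counts[t] / total for t in path_types]


# Hypothetical paths found between user u and item i.
paths_ui = [
    "user-item-actor-item",
    "user-item-actor-item",
    "user-item-user-item",
    "user-item-user-item-actor-item",
]
x_ui = path_features(paths_ui)
```

Each component is a fraction of the total path count, so the components of a non-empty vector sum to 1.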
Learning the ranking function
To predict the ranking and form the top-N recommendation lists, we treat the task as a learning-to-rank problem and adopt a point-wise approach.
In particular, we use a combination of Random Forests and Gradient Boosted Regression Trees (GBRT).
Thank you!

Weitere ähnliche Inhalte

Was ist angesagt?

Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
Julien PLU
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Andre Freitas
 
Towards Linked Ontologies and Data on the Semantic Web
Towards Linked Ontologies and Data on the Semantic WebTowards Linked Ontologies and Data on the Semantic Web
Towards Linked Ontologies and Data on the Semantic Web
Jie Bao
 
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Trey Grainger
 

Was ist angesagt? (20)

Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown Bag
 
[系列活動] 文字探勘者的入門心法
[系列活動] 文字探勘者的入門心法[系列活動] 文字探勘者的入門心法
[系列活動] 文字探勘者的入門心法
 
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...
 
Ontology development in protégé-آنتولوژی در پروتوغه
Ontology development in protégé-آنتولوژی در پروتوغهOntology development in protégé-آنتولوژی در پروتوغه
Ontology development in protégé-آنتولوژی در پروتوغه
 
Automatic Metadata Generation using Associative Networks
Automatic Metadata Generation using Associative NetworksAutomatic Metadata Generation using Associative Networks
Automatic Metadata Generation using Associative Networks
 
R, Data Wrangling & Predicting NFL with Elo like Nate SIlver & 538
R, Data Wrangling & Predicting NFL with Elo like Nate SIlver & 538R, Data Wrangling & Predicting NFL with Elo like Nate SIlver & 538
R, Data Wrangling & Predicting NFL with Elo like Nate SIlver & 538
 
On Entities and Evaluation
On Entities and EvaluationOn Entities and Evaluation
On Entities and Evaluation
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
 
Personalised Search for the Social Semantic Web
Personalised Search for the Social Semantic WebPersonalised Search for the Social Semantic Web
Personalised Search for the Social Semantic Web
 
Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...
 
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
 
Towards Linked Ontologies and Data on the Semantic Web
Towards Linked Ontologies and Data on the Semantic WebTowards Linked Ontologies and Data on the Semantic Web
Towards Linked Ontologies and Data on the Semantic Web
 
Relation Extraction
Relation ExtractionRelation Extraction
Relation Extraction
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
 
How to Build a Semantic Search System
How to Build a Semantic Search SystemHow to Build a Semantic Search System
How to Build a Semantic Search System
 
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
[DSC x TAAI 2016] 林守德 / 人工智慧與機器學習在推薦系統上的應用
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent EngineLeveraging Lucene/Solr as a Knowledge Graph and Intent Engine
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine
 
Enhance discovery Solr and Mahout
Enhance discovery Solr and MahoutEnhance discovery Solr and Mahout
Enhance discovery Solr and Mahout
 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engine
 
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
 

Ähnlich wie Recommender Systems in the Linked Data era

Recommendation Engine Powered by Hadoop
Recommendation Engine Powered by HadoopRecommendation Engine Powered by Hadoop
Recommendation Engine Powered by Hadoop
Pranab Ghosh
 

Ähnlich wie Recommender Systems in the Linked Data era (20)

R programming for psychometrics
R programming for psychometricsR programming for psychometrics
R programming for psychometrics
 
useR 2014 jskim
useR 2014 jskimuseR 2014 jskim
useR 2014 jskim
 
Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics Environment
 
Recommendation Engine Powered by Hadoop
Recommendation Engine Powered by HadoopRecommendation Engine Powered by Hadoop
Recommendation Engine Powered by Hadoop
 
Recommendation Engine Powered by Hadoop - Pranab Ghosh
Recommendation Engine Powered by Hadoop - Pranab GhoshRecommendation Engine Powered by Hadoop - Pranab Ghosh
Recommendation Engine Powered by Hadoop - Pranab Ghosh
 
inteSearch: An Intelligent Linked Data Information Access Framework
inteSearch: An Intelligent Linked Data Information Access FrameworkinteSearch: An Intelligent Linked Data Information Access Framework
inteSearch: An Intelligent Linked Data Information Access Framework
 
Data Science as a Career and Intro to R
Data Science as a Career and Intro to RData Science as a Career and Intro to R
Data Science as a Career and Intro to R
 
R tutorial
R tutorialR tutorial
R tutorial
 
A Look into the Apache OODT Ecosystem
A Look into the Apache OODT EcosystemA Look into the Apache OODT Ecosystem
A Look into the Apache OODT Ecosystem
 
Distributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark MeetupDistributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark Meetup
 
An introduction to R is a document useful
An introduction to R is a document usefulAn introduction to R is a document useful
An introduction to R is a document useful
 
New Directions in Metadata
New Directions in MetadataNew Directions in Metadata
New Directions in Metadata
 
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Explanations in Dialogue Systems through Uncertain RDF Knowledge BasesExplanations in Dialogue Systems through Uncertain RDF Knowledge Bases
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
Data Structures for Robotic Learning
Data Structures for Robotic LearningData Structures for Robotic Learning
Data Structures for Robotic Learning
 
Introduction to data analysis using R
Introduction to data analysis using RIntroduction to data analysis using R
Introduction to data analysis using R
 
Intelligent Methods in Models of Text Information Retrieval: Implications for...
Intelligent Methods in Models of Text Information Retrieval: Implications for...Intelligent Methods in Models of Text Information Retrieval: Implications for...
Intelligent Methods in Models of Text Information Retrieval: Implications for...
 
OpenML DALI
OpenML DALIOpenML DALI
OpenML DALI
 
Exploring the Semantic Web
Exploring the Semantic WebExploring the Semantic Web
Exploring the Semantic Web
 
Resume
ResumeResume
Resume
 

Mehr von Roku

Ranking the Linked Data: the case of DBpedia - ICWE 2010
Ranking the Linked Data: the case of DBpedia - ICWE 2010Ranking the Linked Data: the case of DBpedia - ICWE 2010
Ranking the Linked Data: the case of DBpedia - ICWE 2010
Roku
 

Mehr von Roku (7)

Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…
Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…
Linked Open Data to Support Content-based Recommender Systems - I-SEMANTIC…
 
Movie Recommendation with DBpedia - IIR 2012
Movie Recommendation with DBpedia - IIR 2012Movie Recommendation with DBpedia - IIR 2012
Movie Recommendation with DBpedia - IIR 2012
 
From Exploratory Search to Web Search and back - PIKM 2010
From Exploratory Search to Web Search and back - PIKM 2010From Exploratory Search to Web Search and back - PIKM 2010
From Exploratory Search to Web Search and back - PIKM 2010
 
Ranking the Linked Data: the case of DBpedia - ICWE 2010
Ranking the Linked Data: the case of DBpedia - ICWE 2010Ranking the Linked Data: the case of DBpedia - ICWE 2010
Ranking the Linked Data: the case of DBpedia - ICWE 2010
 
Semantic Tags Generation and Retrieval for Online Advertising - CIKM 2010
Semantic Tags Generation and Retrieval for Online Advertising - CIKM 2010Semantic Tags Generation and Retrieval for Online Advertising - CIKM 2010
Semantic Tags Generation and Retrieval for Online Advertising - CIKM 2010
 
A Semantic Web enabled System for Résumé Composition and Publication - SWIM 09
A Semantic Web enabled System for Résumé Composition and Publication - SWIM 09A Semantic Web enabled System for Résumé Composition and Publication - SWIM 09
A Semantic Web enabled System for Résumé Composition and Publication - SWIM 09
 
Un sistema web-based per la gestione, la classificazione ed il recupero effic...
Un sistema web-based per la gestione, la classificazione ed il recupero effic...Un sistema web-based per la gestione, la classificazione ed il recupero effic...
Un sistema web-based per la gestione, la classificazione ed il recupero effic...
 

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Recommender Systems in the Linked Data era

  • 1. Recommender Systems in the Linked Data era ROBERTO MIRIZZI, PHD roberto.mirizzi@gmail.com
  • 2. Outline What is a Recommender System? ◦ A definition ◦ Types What is Linked Data? ◦ LOD ◦ DBpedia Some Recommender Systems (RS): ◦ A content-based RS (memory-based) ◦ A mobile content-based RS (memory-based) ◦ A content-based RS (model-based) ◦ A hybrid RS (model-based)
  • 3. What is a Recommender System?
  • 4. Recommender Systems in the Linked Data Era – HP Labs, Palo Alto, CA 7/12/2013 What is a Recommender System? Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user. [F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.] Input Data: A set of users U = {u1, …, uM} A set of items I = {i1, …, iN} The preference matrix R = [ru,i] Problem Definition: Given user u and target item i Predict the preference ru,i ? ?
  • 5. Recommender Systems in the Linked Data Era – HP Labs, Palo Alto, CA 7/12/2013 Content-based (CB): recommendations are based on the assumption that if in the past a user liked a set of items with particular features, they will likely go for items having similar characteristics Recommender Systems: types animation fairytale ogre castle Collaborative-filtering (CF): recommendations are based on the assumption that users having similar history are more likely to have similar tastes/needs Hybrid: it’s not too hard to guess what they are 
  • 7. Recommender Systems in the Linked Data Era – HP Labs, Palo Alto, CA 7/12/2013 What is Linked Data? A collection of interrelated datasets on the Web Principles: 1. Use HTTP URIs to identify things 2. Leverage standards such as RDF and SPARQL to provide information about things 3. Link related things by relationships [http://linkeddata.org/]
  • 8. Recommender Systems in the Linked Data Era – HP Labs, Palo Alto, CA 7/12/2013 What is Linked Data? A collection of interrelated datasets on the Web Principles: 1. Use HTTP URIs to identify things 2. Leverage standards such as RDF and SPARQL to provide information about things 3. Link related things by relationships [http://linkeddata.org/]
  • 9. Recommender Systems in the Linked Data Era – HP Labs, Palo Alto, CA 7/12/2013 foaf:page DBpedia: a Nucleus for a Web of Open Data http://dbpedia.org DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data. [Auer et al., DBpedia: A Nucleus for a Web of Open Data. ISWC+ASWC 2007] [Bizer et el., A crystallization point for the Web of Data. Journal Web Semantics, 2009]
  • 10. Recommender Systems in the Linked Data Era – HP Labs, Palo Alto, CA 7/12/2013 Querying DBpedia: SPARQL DBpedia exposes a SPARQL endpoint (http://dbpedia.org/sparql) to query the dataset. Results can be provided in several formats (e.g., JSON, XML, NTriples, etc.) SPARQL is an RDF query language. Its queries consist of triple patterns, conjunctions, disjunctions and optional patterns
  • 11. Recommender Systems in the Linked Data Era – HP Labs, Palo Alto, CA 7/12/2013 A graph of knowledge Why don’t we use all this information to foster recommender systems? Ocean’s Eleven George Clooney Brad Pitt Ocean’s Twelve Steven Soderbergh Catherine Zeta- Jones 2000s crime films American criminal comedy films Crime films Crime
  • 12. A graph of knowledge. [Figure: the graph of slide 11 (Ocean’s Eleven, Ocean’s Twelve, their actors, director and categories), with two additional edges labelled “likes”.] Why don’t we use all this information to foster recommender systems?
  • 13. A content-based RS (memory-based)
  • 14. The good old Vector Space Model [http://en.wikipedia.org/wiki/File:Vector_space_model.jpg]. The Vector Space Model is an algebraic model that represents both text documents and queries as vectors of index-term weights $w_{t,d}$, which are positive and non-binary: $\vec{v}_d = [w_{1,d}, w_{2,d}, \ldots, w_{N,d}]^T$, with $w_{t,d} = tf_{t,d} \cdot idf_t$, $tf_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}}$ and $idf_t = \log \frac{|D|}{|\{d' \in D \mid t \in d'\}|}$. The similarity between a document $d_j$ and a query $q$ is the cosine of the angle between their vectors: $sim(d_j, q) = \frac{\vec{d}_j \cdot \vec{q}}{\|\vec{d}_j\| \, \|\vec{q}\|} = \frac{\sum_{i=1}^{N} w_{i,j} \, w_{i,q}}{\sqrt{\sum_{i=1}^{N} w_{i,j}^2} \, \sqrt{\sum_{i=1}^{N} w_{i,q}^2}}$
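The TF-IDF weighting and cosine similarity of this slide can be sketched in a few lines. The three toy documents and the function names are made up for illustration; the formulas are the ones on the slide (term frequency normalised by document length, logarithmic inverse document frequency, cosine between sparse vectors).

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """docs: dict name -> list of terms. Returns dict name -> {term: weight}."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for terms in docs.values() for term in set(terms))
    vectors = {}
    for name, terms in docs.items():
        counts = Counter(terms)
        total = sum(counts.values())
        vectors[name] = {
            t: (n / total) * math.log(n_docs / df[t]) for t, n in counts.items()
        }
    return vectors

def cosine(v, q):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * q.get(t, 0.0) for t, w in v.items())
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_v == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_v * norm_q)

# Toy corpus (hypothetical): d1 and d2 share two terms, d3 shares none.
docs = {
    "d1": ["ogre", "castle", "fairytale"],
    "d2": ["ogre", "castle", "animation"],
    "d3": ["heist", "casino", "crime"],
}
vecs = tf_idf_vectors(docs)
sim_12 = cosine(vecs["d1"], vecs["d2"])
sim_13 = cosine(vecs["d1"], vecs["d3"])
```

Here `sim_12` is positive because d1 and d2 share terms, while `sim_13` is zero because d1 and d3 have no term in common.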
  • 15. Semantic Vector Space Model (i). [Figure: Ocean’s Eleven and Ocean’s Twelve linked to George Clooney, Brad Pitt, Catherine Zeta-Jones, Steven Soderbergh, 2000s crime films, Crime films, American criminal… and Crime through the properties starring, director, subject/broader and genre; a second panel shows Ocean’s Eleven and Ocean’s Twelve for the starring property.] Each item is expressed as a tensor in a multi-dimensional space where each dimension corresponds to a specific property of the considered datasets (e.g., starring, subject/broader, director, genre, …).
  • 16. Semantic Vector Space Model (ii). [Table: STARRING matrix with rows Ocean’s Eleven [o11] (13 actors) and Ocean’s Twelve [o12] (15 actors), and columns George Clooney [gc] (38 movies), Catherine Z. Jones [czj] (22 movies) and Brad Pitt [bp] (35 movies).] For the starring property, each cell holds a TF-IDF weight $w_{actor_x, movie_y} = tf_{actor_x, movie_y} \cdot idf_{actor_x}$, so Ocean’s Eleven is described by the weights $w_{gc,o_{11}}, w_{czj,o_{11}}, w_{bp,o_{11}}$ and Ocean’s Twelve by $w_{gc,o_{12}}, w_{czj,o_{12}}, w_{bp,o_{12}}$. We can now compute the scalar product between the two vectors to get their similarity…
  • 17. Semantic Vector Space Model (iii). $sim^{starring}(o_{12}, o_{11}) = \frac{w_{gc,o_{12}} \cdot w_{gc,o_{11}} + w_{czj,o_{12}} \cdot w_{czj,o_{11}} + w_{bp,o_{12}} \cdot w_{bp,o_{11}}}{\sqrt{w_{gc,o_{12}}^2 + w_{czj,o_{12}}^2 + w_{bp,o_{12}}^2} \cdot \sqrt{w_{gc,o_{11}}^2 + w_{czj,o_{11}}^2 + w_{bp,o_{11}}^2}}$ …and then combine all the similarities for each property: $sim(o_{12}, o_{11}) = \alpha_{starring} \cdot sim^{starring}(o_{12}, o_{11}) + \alpha_{director} \cdot sim^{director}(o_{12}, o_{11}) + \alpha_{subject} \cdot sim^{subject}(o_{12}, o_{11})$. Soon we will see how to compute the $\alpha_p$ coefficients.
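A minimal sketch of the two steps on this slide: a cosine similarity computed separately for each property, followed by a weighted sum over properties. The numeric weights, the per-property coefficients in `alpha` and the function names are made up for illustration and are not taken from the slides.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse weight vectors (dicts)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def combined_similarity(item_a, item_b, weights, alpha):
    """Weighted sum over properties of the per-property cosine similarities."""
    return sum(
        alpha[p] * cosine(per_item.get(item_a, {}), per_item.get(item_b, {}))
        for p, per_item in weights.items()
    )

# Hypothetical TF-IDF weights: property -> item -> resource -> weight.
weights = {
    "starring": {
        "o11": {"gc": 0.5, "bp": 0.4},
        "o12": {"gc": 0.5, "bp": 0.4, "czj": 0.7},
    },
    "director": {
        "o11": {"ss": 0.9},
        "o12": {"ss": 0.9},
    },
}
alpha = {"starring": 0.6, "director": 0.4}  # assumed per-property coefficients

sim_starring = cosine(weights["starring"]["o12"], weights["starring"]["o11"])
sim_director = cosine(weights["director"]["o12"], weights["director"]["o11"])
sim_total = combined_similarity("o12", "o11", weights, alpha)
```

Keeping one sparse vector per property, rather than one long vector per item, is what lets each property carry its own coefficient.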
  • 18. Ready for our first Content-based RS. Given a user profile, defined as $profile(u) = \{\langle m_j, r_j \rangle \mid r_j = 1 \text{ if } u \text{ likes } m_j,\ r_j = -1 \text{ otherwise}\}$, or as $profile(u) = \{\langle m_j, r_j \rangle \mid r_j \ldots\}$ for non-binary (e.g., Likert) ratings, we predict the rating using a Nearest Neighbor Classifier (memory-based), where the similarity measure is a linear combination of local similarities: $r(u, m_i) = \frac{\sum_{m_j \in profile(u)} r_j \cdot \frac{\sum_p \alpha_p \cdot sim^p(m_j, m_i)}{P}}{|profile(u)|}$, with $P$ the number of properties. [Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, Markus Zanker. Linked Open Data to support Content-based Recommender Systems. 8th International Conference on Semantic Systems (I-SEMANTICS 2012) – best paper]
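The prediction rule can be sketched as follows, assuming the per-property similarities have already been computed. The profile, the similarity values, the coefficients in `alpha` and the item names are hypothetical; the computation averages, over the items in the profile, each rating multiplied by the property-weighted similarity to the target item, divided by the number of properties.

```python
def predict_rating(profile, target, sims, alpha):
    """Nearest-neighbour style prediction for one target item.

    profile: dict item -> rating r_j (+1 liked, -1 otherwise)
    sims:    dict property -> dict (profile_item, target_item) -> similarity
    alpha:   dict property -> coefficient (assumed given here)
    """
    if not profile:
        return 0.0
    n_props = len(alpha)
    total = 0.0
    for m_j, r_j in profile.items():
        combined = sum(
            alpha[p] * sims[p].get((m_j, target), 0.0) for p in alpha
        ) / n_props
        total += r_j * combined
    return total / len(profile)

# Hypothetical data: the user liked o11 and did not like other_movie.
profile = {"o11": 1, "other_movie": -1}
sims = {
    "starring": {("o11", "o12"): 0.7, ("other_movie", "o12"): 0.0},
    "director": {("o11", "o12"): 1.0, ("other_movie", "o12"): 0.0},
}
alpha = {"starring": 0.6, "director": 0.4}

score = predict_rating(profile, "o12", sims, alpha)
```

With these numbers only o11 contributes: (0.6 · 0.7 + 0.4 · 1.0) / 2 = 0.41, and averaging over the two profile items gives a score of about 0.205.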
  • 19. How do we compute the $\alpha_p$ coefficients? We need to identify the best possible values for the coefficients $\alpha_p$, that is, the weights associated with each property. There are many ways to do that. Depending on the nature of the user ratings (Likert or binary), we can treat rating prediction as a regression problem (linear regression) or as a classification problem (logistic regression), and minimize a loss function $J(\alpha)$. In the former case we can minimize the least-squares loss, and in the latter case the cross-entropy loss. In both cases we can use gradient descent: $\alpha_p \leftarrow \alpha_p - \eta \, \frac{\partial J}{\partial \alpha_p}$, where $\eta$ is the learning rate. Another possible approach is to use a genetic algorithm to minimize a non-smooth loss function, such as the number of misclassification errors.
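A minimal sketch of the classification variant: batch gradient descent on the cross-entropy loss of a logistic model whose inputs are per-property similarities. The training rows, the 0/1 label encoding, the learning rate `eta`, the number of steps and the absence of a bias term are all assumptions chosen to keep the example small; they are not the settings used by the authors.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(alpha, X, y):
    """Mean cross-entropy of the logistic model sigmoid(alpha . x)."""
    eps = 1e-12  # guards against log(0)
    loss = 0.0
    for x, t in zip(X, y):
        p = sigmoid(sum(a * v for a, v in zip(alpha, x)))
        loss -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return loss / len(X)

def fit_alpha(X, y, eta=0.5, steps=200):
    """Batch gradient descent: alpha_p <- alpha_p - eta * dJ/dalpha_p."""
    alpha = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(alpha)
        for x, t in zip(X, y):
            err = sigmoid(sum(a * v for a, v in zip(alpha, x))) - t
            for p, v in enumerate(x):
                grad[p] += err * v / len(X)
        alpha = [a - eta * g for a, g in zip(alpha, grad)]
    return alpha

# Hypothetical training rows: [similarity on property 0, similarity on property 1]
# and whether the user liked the item (1) or not (0).
X = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, 0, 0]

initial_loss = cross_entropy([0.0, 0.0], X, y)
alpha = fit_alpha(X, y)
final_loss = cross_entropy(alpha, X, y)
```

In this toy data the first property separates liked from disliked items, so its coefficient ends up larger than the second one. For Likert ratings the same loop applies with a squared-error loss and the model output left unsquashed.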
  • 20. A mobile content-based RS (memory-based)
  • 21. Let’s go Mobile (e.g., recommend movies in theaters). This time the user profile is context-dependent and is defined as: $profile(u, cmp) = \{\langle m_j, r_j \rangle \mid r_j = 1 \text{ if } u \text{ likes } m_j \text{ with companion } cmp,\ r_j = -1 \text{ otherwise}\}$. The prediction combines two parts, contextual pre-filtering and contextual post-filtering: $r(u, m_i, cmp) = \alpha_{preFilter} \cdot r_{preFilter}(u, m_i, cmp) + \alpha_{postFilter} \cdot r_{postFilter}(u)$, with $r_{preFilter}(u, m_i, cmp) = \frac{\sum_{m_j \in profile(u, cmp)} r_j \cdot sim(m_j, m_i)}{|profile(u, cmp)|}$ and $r_{postFilter}(u) = \frac{h + c + cl + ar + ap}{5}$, where: h (hierarchy): 1 if the theater is in the same city, 0 otherwise; c (cluster): 1 if the theater is a multiplex, 0 otherwise; cl (co-location): 1 if the theater is close to other POIs, 0 otherwise; ar (association-rule): 1 if the ticket price is known, 0 otherwise; ap (anchor-point proximity): 1 if the theater is close to the user’s home or office, 0 otherwise. [Vito Claudio Ostuni, Giosia Gentile, Tommaso Di Noia, Roberto Mirizzi, Davide Romito, Eugenio Di Sciascio. Mobile Movie Recommendations with Linked Data. Human-Computer Interaction & Knowledge Discovery @ CD-ARES’13 (HCI-KDD 2013)]
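A minimal sketch of the two-part prediction: a similarity-weighted average over the companion-specific profile (pre-filtering) combined with the average of the five binary theater criteria (post-filtering). The mixing weights `w_pre` and `w_post`, the similarity values and the item names are hypothetical; the slide does not give the values of the combination weights.

```python
def pre_filter_score(profile_cmp, target, sim):
    """Average of r_j * sim(m_j, target) over the companion-specific profile."""
    if not profile_cmp:
        return 0.0
    total = sum(
        r_j * sim.get((m_j, target), 0.0) for m_j, r_j in profile_cmp.items()
    )
    return total / len(profile_cmp)

def post_filter_score(h, c, cl, ar, ap):
    """Fraction of the five binary theater criteria that hold."""
    return (h + c + cl + ar + ap) / 5

def predict(profile_cmp, target, sim, theater_flags, w_pre=0.5, w_post=0.5):
    """Weighted combination of both parts (weights are assumed values)."""
    return (
        w_pre * pre_filter_score(profile_cmp, target, sim)
        + w_post * post_filter_score(**theater_flags)
    )

# Hypothetical profile of the user for one companion type.
profile_cmp = {"o11": 1, "other_movie": -1}
sim = {("o11", "o12"): 0.8, ("other_movie", "o12"): 0.2}

# Same city, multiplex, not near other POIs, price known, not near home/office.
theater_flags = {"h": 1, "c": 1, "cl": 0, "ar": 1, "ap": 0}

pre = pre_filter_score(profile_cmp, "o12", sim)
post = post_filter_score(**theater_flags)
score = predict(profile_cmp, "o12", sim, theater_flags)
```

The pre-filter part depends on the movie and the companion, while the post-filter part depends only on where the movie is showing, so the same movie can rank differently across theaters.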
  • 22. A content-based RS (model-based)
  • 23. Time for a Model-based CB-RS. [Figure: Ocean’s Eleven [o11] and Ocean’s Twelve [o12] described by the weights of their property values: starring (George Clooney [gc], Catherine Z. Jones [czj], Brad Pitt [bp]), director (Steven Soderbergh [ss]) and subject (2000s crime films [2cf], Crime films [cf], American criminal comedy [acc]).] This time each item is represented by a feature vector, where each feature corresponds to a property value: Ocean’s Eleven has components $w_{gc,o_{11}}, w_{czj,o_{11}}, w_{bp,o_{11}}, w_{ss,o_{11}}, w_{2cf,o_{11}}, w_{cf,o_{11}}, w_{acc,o_{11}}$, and Ocean’s Twelve has $w_{gc,o_{12}}, w_{czj,o_{12}}, w_{bp,o_{12}}, w_{ss,o_{12}}, w_{2cf,o_{12}}, w_{cf,o_{12}}, w_{acc,o_{12}}$. The user profile is defined as: $profile(u) = \{\langle m_j, r_j \rangle \mid r_j = 1 \text{ if } u \text{ likes } m_j,\ r_j = -1 \text{ otherwise}\}$ [Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito. Exploiting the Web of Data in Model-based Recommender Systems. 6th ACM Conference on Recommender Systems (RecSys 2012)]
  • 24. Training the system with an SVM classifier [https://en.wikipedia.org/wiki/File:Svm_max_sep_hyperplane_with_margin.png]. Support Vector Machines (SVMs) are known to work well for text classification. Learning the user profile has a lot in common with that task, such as the sparsity of the feature vectors and the high dimensionality of the input space. Main advantages: 1. Feature selection is often not needed (SVMs are robust to over-fitting and scale well). 2. There is no need to tune per-property weights as in the memory-based approach. We then fit a logistic model to the SVM output to obtain a ranked list of items.
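A rough, dependency-free sketch of the pipeline described here: a linear SVM trained on item feature vectors, followed by a logistic model fitted on the SVM scores so that candidates can be ranked by an estimated probability. It is a simplified stand-in, not the authors' implementation: the SVM is trained by plain batch subgradient descent on the hinge loss, the logistic fit is a simplified form of Platt scaling, and the binary features, hyperparameters and candidate names are made up.

```python
import math

def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularised hinge loss; y in {-1, +1}."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0
        for x, t in zip(X, y):
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # only margin violators contribute to the subgradient
                gw = [g - t * xi / len(X) for g, xi in zip(gw, x)]
                gb -= t / len(X)
        w = [wi - eta * g for wi, g in zip(w, gw)]
        b -= eta * gb
    return w, b

def decision(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def fit_sigmoid(scores, y, eta=0.1, epochs=500):
    """Fit P(like | s) = 1 / (1 + exp(-(a*s + c))) by gradient descent on log loss."""
    a, c = 1.0, 0.0
    for _ in range(epochs):
        grad_a = grad_c = 0.0
        for s, t in zip(scores, y):
            err = 1.0 / (1.0 + math.exp(-(a * s + c))) - (1 if t > 0 else 0)
            grad_a += err * s / len(scores)
            grad_c += err / len(scores)
        a, c = a - eta * grad_a, c - eta * grad_c
    return a, c

# Hypothetical binary features: [gc, bp, czj, ss, acc, animation].
X = [[1, 1, 0, 1, 1, 0], [1, 0, 1, 1, 1, 0], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 1]]
y = [1, 1, -1, -1]  # +1 liked, -1 otherwise

w, b = train_linear_svm(X, y)
a, c = fit_sigmoid([decision(w, b, x) for x in X], y)

def probability(x):
    return 1.0 / (1.0 + math.exp(-(a * decision(w, b, x) + c)))

candidates = {"candidate_a": [1, 1, 1, 1, 1, 0], "candidate_b": [0, 0, 0, 0, 0, 1]}
ranked = sorted(candidates, key=lambda name: probability(candidates[name]), reverse=True)
```

In practice an off-the-shelf SVM implementation with built-in probability calibration would replace both hand-written training loops, and the features would be the TF-IDF weights of the property values rather than 0/1 flags.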
  • 25. A hybrid RS (model-based)
  • 26. Let’s continue with a Hybrid RS. We want to recommend items i to user u, exploiting both the LOD knowledge base and other users’ interactions. The ultimate goal of this recommender system is to place in the top-N positions the items most likely to be relevant for the user, in the presence of implicit feedback. Given the nature of the problem, the user profile is defined as: $profile(u) = \{i \mid i \text{ is relevant for } u\}$ [Vito Claudio Ostuni, Tommaso Di Noia, Eugenio Di Sciascio, Roberto Mirizzi. Top-N Recommendations from Implicit Feedback leveraging Linked Open Data. 7th ACM Conference on Recommender Systems (RecSys 2013)]
  • 27. Path-based features. We define $x_{ui} \in \mathbb{R}^D$ as the feature vector encoding all the interactions between user u and item i. Each component of this vector represents the relevance score between u and i with respect to a particular feature (path type) j, and is defined as: $x_{ui}(j) = \frac{\#path_{ui}(j)}{\sum_{d=1}^{D} \#path_{ui}(d)}$. The paths can be content-based, collaborative or hybrid.
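The normalisation in this formula can be sketched as follows, assuming the number of paths of each type between a user and an item has already been counted on the graph. The path-type labels and the counts are hypothetical; only the division of each count by the total over all D path types comes from the slide.

```python
def path_features(path_counts, path_types):
    """Turn raw path counts into the normalised feature vector x_ui.

    path_counts: dict path type -> number of paths of that type between u and i
    path_types:  ordered list of the D path types (fixes the vector layout)
    """
    total = sum(path_counts.get(j, 0) for j in path_types)
    if total == 0:
        return [0.0] * len(path_types)  # u and i are not connected at all
    return [path_counts.get(j, 0) / total for j in path_types]

# Hypothetical path types: one content-based, one collaborative, one hybrid.
path_types = ["content:starring", "collaborative:likes", "hybrid:likes+subject"]

# Hypothetical counts for one (user, item) pair.
counts = {"content:starring": 3, "collaborative:likes": 1}

x_ui = path_features(counts, path_types)
```

Because the components sum to 1 whenever at least one path exists, the vector describes how the connections between u and i are distributed over path types, not how many there are.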
  • 28. Learning the ranking function. To predict the ranking and form the top-N recommendation lists, we treat the task as a learning-to-rank problem and adopt a point-wise approach. In particular, we use a combination of Random Forests and Gradient Boosted Regression Trees (GBRT).