ACM Recommender Systems 2010
                       Barcelona, Spain




 Enhanced Vector Space
Models for Content-based
 Recommender Systems
          Cataldo Musto - cataldomusto@di.uniba.it

 University of Bari “Aldo Moro” (Italy), SWAP Research Group
              ACM Recsys 2010 Doctoral Symposium
                           26.09.10
outline                                                                                                                         2/30

               •      Motivations
                    •   Goals
                    •   Analysis of Vector Space Models


               •      Enhanced Vector Space Models
                    •    Random Indexing-based model
                    •    Semantic Vectors-based model


               •      Experimental Evaluation
                    •    Open Issues
                    •    Future Work


Cataldo Musto, Enhanced Vector Space Models for Content-based Recommender Systems - ACM RecSys 2010 Doctoral Symposium - Barcelona, Spain - 26.09.10
vector space model                                                                                                            3/30




               [Figure: items 1, 2, ..., n plotted as points in the vector space]

vector space model                                                                                                            4/30

               •      Introduced by Salton in 1975
                    •      Given a set of documents and given N features describing the
                           documents the VSM builds an N-dimensional Vector Space
                    •      Each item is represented as a point in the Vector Space
                    •      Application: Information Retrieval
                          •     Query: point in the Vector Space
                          •     Assumption: the nearest documents in the Vector Space
                                are the most relevant ones
                          •     Cosine Similarity to compute the similarity between
                                query and documents
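
A minimal sketch of the retrieval step on this slide, assuming documents and the query are already given as plain lists of feature weights. The function names `cosine` and `rank` and the toy vectors are illustrative and do not come from the talk.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return document ids sorted from most to least similar to the query."""
    return sorted(docs, key=lambda doc_id: cosine(query, docs[doc_id]), reverse=True)

# Toy 3-feature space (hypothetical weights).
docs = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.5, 0.5, 0.0],
    "d3": [0.0, 0.0, 1.0],
}
query = [1.0, 0.2, 0.0]
ranking = rank(query, docs)  # d1 is closest to the query, d3 is orthogonal to it
```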


idea                                                                                                                          5/30


                    •      To investigate the impact of Vector Space Models in the
                           area of Information Filtering
                          •     “Information Filtering & Information Retrieval: two sides of the same
                                coin?”, Belkin & Croft, 1992

                               •      Strong Analogies
                                    •      Documents to be retrieved vs. Items to be filtered
                                    •      Query vs. User Profiles
                                    •      Both IF and IR can share the same weighting
                                           techniques (TF/IDF) and similarity measures
                                           (Cosine similarity)
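
As an illustration of the shared weighting technique, here is a small TF/IDF sketch. It assumes one common variant (raw term frequency times `log(N / df)`); the slides do not say which variant is used, and the function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs maps an id to a list of tokens; returns id -> {term: tf * log(N / df)}."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    weights = {}
    for doc_id, tokens in docs.items():
        term_freq = Counter(tokens)
        weights[doc_id] = {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
    return weights

docs = {"d1": ["alien", "space", "alien"], "d2": ["space", "opera"]}
weights = tf_idf(docs)
# "space" occurs in every document, so its weight is 0; "alien" is specific to d1.
```

The same function could weight either item descriptions (IR documents) or the texts behind a user profile (IF), which is the analogy the slide draws.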


vsm analysis                                                                                                                  6/30



                    • Strong Points
                     • State-of-the-art model for the IR
                                community
                          • Clean and Solid formalism
                          • Simple computations between
                                objects in a VSM


vsm analysis (2)                                                                                                              7/30



                    •      Weak Points
                          •     High Dimensionality
                               •      Needs NLP operations (stopword elimination, stemming and so on) to limit the number of features
                          •     Not incremental
                               •      The whole Vector Space has to be generated from scratch
                                      whenever a new item is added to the repository
                          •     Does not manage the latent semantics of documents
                               •      Any permutation of the terms in a document has the same
                                      VSM representation!
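
The last weak point can be checked in a few lines. This sketch assumes raw term counts as feature weights; `bag_of_words` is an illustrative helper, not part of the talk.

```python
from collections import Counter

def bag_of_words(text):
    """Map a text to its term counts, ignoring word order."""
    return Counter(text.lower().split())

a = bag_of_words("the dog bites the man")
b = bag_of_words("the man bites the dog")
same = (a == b)  # the two sentences mean different things but share one representation
```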



goals                                                                                                                         8/30

                    • To introduce tools and techniques able
                           to overcome these drawbacks
                          • Random Indexing
                           • Dimensionality reduction technique
                           • Sahlgren, 2005
                          • Semantic Vectors
                           • Java open-source package
                           • Widdows, 2007
random indexing                                                                                                               9/30

                    •      Random Indexing (RI) is an incremental and
                           effective technique for dimensionality reduction
                          •     Introduced by Sahlgren in 2005


                    •      Based on the so-called “Distributional
                           Hypothesis”
                          •     “Words that occur in the same context tend to
                                have similar meanings”
                          •     “Meaning is its use” (Wittgenstein)

how does it work?                                                                                                         10/30

          •       Random Indexing reduces
                  the m-dimensional term/doc
                  matrix to a new
                  k-dimensional matrix


         •      How?
              •      By multiplying the original matrix
                     with a random one, built in an
                     incremental way
                   •      formally: $A_{n,m} R_{m,k} = B_{n,k}$
                   •      $k \ll m$
              •      After projection, the distance
                     between points in the vector space
                     is approximately preserved
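
The projection can be sketched with plain nested lists. This is only an illustration under stated assumptions: the random matrix is filled with a few +1/-1 entries per row, and the sizes, the seed and the helper names `random_ternary_matrix` and `project` are hypothetical.

```python
import random

def random_ternary_matrix(m, k, nonzeros, seed=0):
    """Build an m x k matrix whose rows hold `nonzeros` entries equal to +1 or -1, the rest 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(m):
        row = [0] * k
        for pos in rng.sample(range(k), nonzeros):
            row[pos] = rng.choice((-1, 1))
        rows.append(row)
    return rows

def project(A, R):
    """Compute B = A R, where A is n x m and R is m x k."""
    k = len(R[0])
    B = []
    for a_row in A:
        b_row = [0.0] * k
        for j, weight in enumerate(a_row):
            if weight:  # skip the zero entries of the sparse input row
                for c in range(k):
                    b_row[c] += weight * R[j][c]
        B.append(b_row)
    return B

# Toy sizes: n = 2 rows, m = 6 original dimensions, k = 4 reduced dimensions.
A = [[1, 0, 2, 0, 0, 1],
     [0, 1, 0, 3, 0, 0]]
R = random_ternary_matrix(m=6, k=4, nonzeros=2)
B = project(A, R)
```

Because each row of B depends only on the matching row of A, a new row can be projected on its own without recomputing the others, which is what makes the scheme incremental.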

random matrix                                                                                                              11/30

                     •      How is the random matrix built?
                     •      The whole process is based on the concept of
                            “context”
                           •     Given a term, its “context” is the set of other
                                 words it co-occurs with


                     •      The matrix is built in an iterative and incremental way
                           •     The vector representing each document depends on the
                                 terms that occur in it
                           •     The vector representing each term depends on its context
                                 (the other terms it co-occurs with)


item representation                                                                                                        12/30


               •      A context vector is assigned to each term. This
                      vector has a fixed dimension (k) and contains only
                      values in {-1, 0, 1}. Values are distributed randomly,
                      and the number of non-zero elements is much smaller than k.
               •      The Vector Space representation of a term is obtained
                      by summing the context vectors of the terms it
                      co-occurs with.
               •      The Vector Space representation of a document
                      (item) is obtained by summing the context vectors of
                      the terms that occur in it
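
The three bullets above can be sketched as follows. Two points are assumptions made for the example: terms are taken to co-occur when they appear in the same document (the slides do not fix the context window), and the dimension, the number of non-zero entries and every function name are hypothetical.

```python
import random
from collections import defaultdict

def make_context_vector(k, nonzeros, rng):
    """Random vector of length k with `nonzeros` entries set to +1 or -1."""
    vec = [0] * k
    for pos in rng.sample(range(k), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec

def add_into(target, vec):
    """Accumulate vec into target in place."""
    for i, x in enumerate(vec):
        target[i] += x

def build_space(docs, k=8, nonzeros=2, seed=0):
    rng = random.Random(seed)
    context = {}                               # term -> random context vector
    term_vecs = defaultdict(lambda: [0] * k)   # term -> sum of co-occurring context vectors
    doc_vecs = {}                              # document -> sum of its terms' context vectors
    for doc_id, text in docs.items():
        terms = text.lower().split()
        for term in terms:
            if term not in context:
                context[term] = make_context_vector(k, nonzeros, rng)
        doc_vec = [0] * k
        for term in terms:
            add_into(doc_vec, context[term])
        doc_vecs[doc_id] = doc_vec
        # Assumption: every other term position in the same document is context.
        for i, term in enumerate(terms):
            for j, other in enumerate(terms):
                if i != j:
                    add_into(term_vecs[term], context[other])
    return context, dict(term_vecs), doc_vecs

docs = {"d1": "space alien invasion", "d2": "alien spaceship"}
context, term_vecs, doc_vecs = build_space(docs)
```

Documents are processed one at a time, so adding a new item only updates the vectors of the terms it contains.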


...summing up                                                                                                              13/30

               •      Random Indexing
                    •      Dimensionality reduction technique
                    •      Similar to LSA
                          •     Incremental
                               •      Tremendous saving of computational resources
                    •      Manages the semantics of documents
                          •     The position of a document (item) in the vector space
                                depends on the position of the terms that occur in the
                                document
                          •     The position of a term depends on the position of the
                                other terms it co-occurs with!


recommendation models                                                                                                      14/30


                          • We developed two different
                                recommendation models
                               • Both based on vector space built
                                      through Random Indexing
                                    • Random Indexing-based
                                           model (RI)
                                    • Semantic Vectors-based
                                           model (SV)

profile representation                                                                                                      15/30

               •      What about the user profiles?
                    •      Assumption
                          •     The information coming from documents (items) that
                                the user liked in the past could be a reliable source of
                                information for building user profiles
                    •      The Vector Space representation of a user profile is obtained
                           by combining the context vectors of all the documents that the
                           user liked in the past.


                    •      Definition of RI-based and SV-based models
                          •     The difference lies in the way they exploit the vector space to
                                build user profiles


RI-based approach                                                                                                          16/30




               [Formula image: VSM representation of the RI-based profile for user u; annotated parts: Documents, Rate, Threshold]
RI-based approach                                                                                                          17/30


               •      The simplest user profile
                    •      Combines the information coming from
                           previously liked documents in a uniform
                           way
                          • Different ratings are not managed!
                          • Definition of a weighted
                                counterpart, called W-RI
                               • Weighted Random Indexing
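
A hedged sketch of the two profiles follows. The exact formulas were images on the slides and are not recoverable here, so two choices are assumptions: a document counts as liked when its rating is at or above the threshold, and the weighted variant scales each document vector by its rating. The function names are illustrative.

```python
def ri_profile(doc_vecs, ratings, threshold):
    """Uniform profile: sum the vectors of documents rated at or above the threshold."""
    k = len(next(iter(doc_vecs.values())))
    profile = [0.0] * k
    for doc_id, rating in ratings.items():
        if rating >= threshold:
            for i, x in enumerate(doc_vecs[doc_id]):
                profile[i] += x
    return profile

def wri_profile(doc_vecs, ratings, threshold):
    """Weighted profile: same selection, each vector scaled by its rating (assumed weighting)."""
    k = len(next(iter(doc_vecs.values())))
    profile = [0.0] * k
    for doc_id, rating in ratings.items():
        if rating >= threshold:
            for i, x in enumerate(doc_vecs[doc_id]):
                profile[i] += rating * x
    return profile

# Toy data: three 2-dimensional document vectors and one user's ratings.
doc_vecs = {"a": [1, 0], "b": [0, 1], "c": [1, 1]}
ratings = {"a": 5, "b": 4, "c": 2}
uniform = ri_profile(doc_vecs, ratings, threshold=4)
weighted = wri_profile(doc_vecs, ratings, threshold=4)
```

With these toy ratings the uniform profile treats documents a and b alike, while the weighted one leans towards a, the higher-rated document.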
wRI-based approach                                                                                                         18/30




               [Formula image: VSM representation of the wRI-based profile for user u; annotated parts: Documents, Rate, Threshold]
wRI-based approach                                                                                                         19/30

                     •     Both models inherit a classical problem
                           of VSM
                          •      User profiles modeled only according
                                 to positive preferences
                          •      In classical text classifiers (Naive Bayes, SVM,
                                 etc.) both positive and negative preferences
                                 are modeled


                          •      Definition of Semantic Vectors (SV)
                                 based model to tackle this problem

semantic vectors                                                                                                           20/30

                    •      Open-source package written in Java
                          •     Implements a Random Indexing-based approach
                                for document indexing


                          •     Integrates a negation operator based on
                                quantum mechanics
                               •      Queries such as “A NOT B” are allowed!
                               •      Projection of vector A onto the subspace orthogonal to
                                      the one generated by vector B
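
For a single vector B, the projection described above amounts to subtracting from A its component along B. The sketch below shows that arithmetic only; it is not the API of the Semantic Vectors package, and `dot` and `negate` are illustrative names.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def negate(a, b):
    """Return "a NOT b": the part of a orthogonal to b, i.e. a - (a.b / b.b) b."""
    bb = dot(b, b)
    if bb == 0:
        return list(a)  # nothing to remove when b is the zero vector
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]

a = [3.0, 4.0]
b = [1.0, 0.0]
a_not_b = negate(a, b)  # [0.0, 4.0]: the component of a along b is removed
```

The result has zero inner product with b, so its cosine similarity with anything lying purely along b is zero.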

SV-based approach                                                                                                          21/30

  [Formula image: VSM representation of the SV-based profile for user u; annotated parts: Positive User Profile Vector, Negative User Profile Vector]

wSV-based approach                                                                                                         22/30

  [Formula image: VSM representation of the wSV-based profile for user u; annotated parts: Positive User Profile Vector, Negative User Profile Vector]

recommendation step                                                                                                        23/30

               •      Given a user profile u and a set of items, we can assume that the most
                      relevant items for u are the nearest ones in the vector space
                    •      RI and wRI: submission of a query based on the user profile vector
                    •      SV and wSV: submission of a query based on the positive (p+) and negative (p-) profile vectors
                          •     Returns the items with as many features as possible from p+ and as
                                few features as possible from p-


               •      Cosine Similarity to rank the items
               •      Items whose similarity is under a certain threshold are labeled as non-relevant
                      and filtered out
               •      Recommendation of the items with the highest similarity w.r.t. the user profile
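
The ranking and filtering steps can be sketched as below, assuming the query vector has already been built from the user profile. The threshold value, the `top_n` cutoff and the function names are hypothetical parameters chosen for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend(query, item_vecs, threshold, top_n):
    """Score items against the query, drop those below the threshold, return the best top_n ids."""
    scored = [(cosine(query, vec), item_id) for item_id, vec in item_vecs.items()]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in kept[:top_n]]

# Toy item space and profile-based query (hypothetical values).
item_vecs = {"i1": [1.0, 0.0], "i2": [0.6, 0.8], "i3": [-1.0, 0.0]}
query = [1.0, 0.0]
loose = recommend(query, item_vecs, threshold=0.5, top_n=2)
strict = recommend(query, item_vecs, threshold=0.7, top_n=2)
```

Raising the threshold from 0.5 to 0.7 drops i2, whose similarity to the query is 0.6, and leaves only i1.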


experimental evaluation                                                                                                       24/30


                    •      100k Movielens Dataset
                          •     Content-based information crawled from
                                Wikipedia
                               •      Movies without a Wikipedia entry were deleted
                          •     613 users, 520 items, 40k ratings
                          •     5-fold cross validation
                          •     Average Precision @1, @3, @5, @7, @10
                          •     NLP processing: stopword elimination
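
The slides do not define the metric precisely. One common reading, assumed here, is precision at cutoff k averaged over users; the sketch below follows that reading, and the function names and toy data are illustrative.

```python
def precision_at_k(ranked_items, relevant, k):
    """Fraction of the first k recommended items that are relevant."""
    top = ranked_items[:k]
    if not top:
        return 0.0
    return sum(1 for item in top if item in relevant) / len(top)

def mean_precision_at_k(rankings, relevant_by_user, k):
    """Average precision@k over all users that have a ranking."""
    scores = [precision_at_k(rankings[user], relevant_by_user[user], k) for user in rankings]
    return sum(scores) / len(scores)

# Toy rankings for two users and the items each one actually liked.
rankings = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
relevant_by_user = {"u1": {"a", "c"}, "u2": {"e"}}
score_at_1 = mean_precision_at_k(rankings, relevant_by_user, k=1)
score_at_3 = mean_precision_at_k(rankings, relevant_by_user, k=3)
```

In a 5-fold setting this score would be computed on each held-out fold and then averaged across the folds.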


experimental design                                                                                                           25/30

                          • Experiment 1
                           • Does the weighting schema
                                      improve the predictive accuracy of the
                                      recommendation models?
                          • Experiment 2
                           • Does the introduction of a negation
                                      operator improve the predictive
                                      accuracy of the recommendation models?

results - experiment 1                                                                                                     26/30

            [Figure: two charts of AVP@1, AVP@5 and AVP@10; left panel RI vs. W-RI, right panel SV vs. W-SV]


            •      Our weighting model (even in this naive form) improves the
                   predictive accuracy of both RI-based and SV-based models
results - experiment 2                                                                                                     27/30

            [Figure: two charts of AVP@1, AVP@5 and AVP@10; left panel RI vs. SV, right panel W-RI vs. W-SV]


            •      The integration of a negation operator based on quantum mechanics
                   improves the predictive accuracy of both the unweighted (RI vs. SV) and the weighted (W-RI vs. W-SV) models
results                                                                                                                    28/30
                                           RI            W-RI               SV               W-SV                   Bayes
          Av-Precision@1                  85.93          86.33             85.97             86.78                   86.39
          Av-Precision@3                  85.78          85.97             86.19             86.33                   85.97
          Av-Precision@5                  85.75          86.10             85.99             86.16                   85.83
          Av-Precision@7                  85.61          85.92             85.88             85.95                   85.77
         Av-Precision@10                  85.45          85.76             85.76             85.83                   85.75

                 • W-SV improves the Average Precision at every
                        cutoff with respect to the Naive Bayes approach
                        (currently implemented in our recommender
                        system); SV and W-RI match or exceed it from
                        @3 onwards, while RI stays below it
conclusions                                                                                                                   29/30

                    •      Investigation of the impact of enhanced VSM in
                           the area of content-based recommender systems
                          •     Use of Random Indexing for dimensionality
                                reduction
                          •     Definition of RI and SV-based models
                          •     Encouraging experimental results
                               •      First results improve the predictive accuracy
                                      obtained by classical content-based filtering
                                      techniques (e.g. Bayes)

open issues & future work                                                                                                     30/30

                    •      Work-in-progress
                          •     Experimental Evaluation on a classical TF/IDF-based VSM
                    •      Open Issues
                          •     Looking for a state-of-the-art dataset for the evaluation of content-
                                based recommendation models
                    •      Future Work
                          •     Comparison of the predictive accuracy with different NLP steps
                                (stemming, entity recognition, POS-tagging and so on)
                          •     Integration of Social Media (Facebook, Twitter, LinkedIn) for building
                                accurate user profiles by skipping the training step
                          •     Integration of Linked Data-based representation (by exploiting
                                DBPedia data) to exploit explicit relationships between concepts


http://www.di.uniba.it/~swap/

                                discussion




        Cataldo Musto - cataldomusto@di.uniba.it

     University of Bari (Italy), SWAP Research Group
           ACM Recsys 2010 Doctoral Symposium

A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
 
How data science works and how can customers help
How data science works and how can customers helpHow data science works and how can customers help
How data science works and how can customers help
 
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data StreamsNovel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
 
Twin tide skopje2012_june5
Twin tide skopje2012_june5Twin tide skopje2012_june5
Twin tide skopje2012_june5
 
Grid is Dead ? Nimrod on the Cloud
Grid is Dead ? Nimrod on the CloudGrid is Dead ? Nimrod on the Cloud
Grid is Dead ? Nimrod on the Cloud
 
PointNet
PointNetPointNet
PointNet
 
ONTOLOGY BASED DATA ACCESS
ONTOLOGY BASED DATA ACCESSONTOLOGY BASED DATA ACCESS
ONTOLOGY BASED DATA ACCESS
 
Object recognition with cortex like mechanisms pami-07
Object recognition with cortex like mechanisms pami-07Object recognition with cortex like mechanisms pami-07
Object recognition with cortex like mechanisms pami-07
 

Mehr von Cataldo Musto

Semantic Holistic User Modeling for Personalized Access to Digital Content an...
Semantic Holistic User Modeling for Personalized Access to Digital Content an...Semantic Holistic User Modeling for Personalized Access to Digital Content an...
Semantic Holistic User Modeling for Personalized Access to Digital Content an...
Cataldo Musto
 

Mehr von Cataldo Musto (20)

MyrrorBot: a Digital Assistant Based on Holistic User Models for Personalize...
MyrrorBot: a Digital Assistant Based on Holistic User Models forPersonalize...MyrrorBot: a Digital Assistant Based on Holistic User Models forPersonalize...
MyrrorBot: a Digital Assistant Based on Holistic User Models for Personalize...
 
Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation
Fairness and Popularity Bias in Recommender Systems: an Empirical EvaluationFairness and Popularity Bias in Recommender Systems: an Empirical Evaluation
Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation
 
Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...
Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...
Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...
 
Exploring the Effects of Natural Language Justifications in Food Recommender ...
Exploring the Effects of Natural Language Justifications in Food Recommender ...Exploring the Effects of Natural Language Justifications in Food Recommender ...
Exploring the Effects of Natural Language Justifications in Food Recommender ...
 
Exploiting Distributional Semantics Models for Natural Language Context-aware...
Exploiting Distributional Semantics Models for Natural Language Context-aware...Exploiting Distributional Semantics Models for Natural Language Context-aware...
Exploiting Distributional Semantics Models for Natural Language Context-aware...
 
Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...
Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...
Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...
 
Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...
Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...
Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...
 
Hybrid Semantics aware Recommendations Exploiting Knowledge Graph Embeddings
Hybrid Semantics aware Recommendations Exploiting Knowledge Graph EmbeddingsHybrid Semantics aware Recommendations Exploiting Knowledge Graph Embeddings
Hybrid Semantics aware Recommendations Exploiting Knowledge Graph Embeddings
 
Natural Language Justifications for Recommender Systems Exploiting Text Summa...
Natural Language Justifications for Recommender Systems Exploiting Text Summa...Natural Language Justifications for Recommender Systems Exploiting Text Summa...
Natural Language Justifications for Recommender Systems Exploiting Text Summa...
 
L'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA Risponde
L'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA RispondeL'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA Risponde
L'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA Risponde
 
Explanation Strategies - Advances in Content-based Recommender System
Explanation Strategies - Advances in Content-based Recommender SystemExplanation Strategies - Advances in Content-based Recommender System
Explanation Strategies - Advances in Content-based Recommender System
 
Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...
Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...
Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...
 
ExpLOD: un framework per la generazione di spiegazioni per recommender system...
ExpLOD: un framework per la generazione di spiegazioni per recommender system...ExpLOD: un framework per la generazione di spiegazioni per recommender system...
ExpLOD: un framework per la generazione di spiegazioni per recommender system...
 
Myrror: una piattaforma per Holistic User Modeling e Quantified Self
Myrror: una piattaforma per Holistic User Modeling e Quantified SelfMyrror: una piattaforma per Holistic User Modeling e Quantified Self
Myrror: una piattaforma per Holistic User Modeling e Quantified Self
 
Semantic Holistic User Modeling for Personalized Access to Digital Content an...
Semantic Holistic User Modeling for Personalized Access to Digital Content an...Semantic Holistic User Modeling for Personalized Access to Digital Content an...
Semantic Holistic User Modeling for Personalized Access to Digital Content an...
 
Holistic User Modeling for Personalized Services in Smart Cities
Holistic User Modeling for Personalized Services in Smart CitiesHolistic User Modeling for Personalized Services in Smart Cities
Holistic User Modeling for Personalized Services in Smart Cities
 
A Framework for Holistic User Modeling Merging Heterogeneous Digital Footprints
A Framework for Holistic User Modeling Merging Heterogeneous Digital FootprintsA Framework for Holistic User Modeling Merging Heterogeneous Digital Footprints
A Framework for Holistic User Modeling Merging Heterogeneous Digital Footprints
 
eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?
eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?
eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?
 
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
 
Il Linguaggio dell'Odio sui Social Network
Il Linguaggio dell'Odio sui Social NetworkIl Linguaggio dell'Odio sui Social Network
Il Linguaggio dell'Odio sui Social Network
 

Kürzlich hochgeladen

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Kürzlich hochgeladen (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Enhanced Vector Space Models for Content-based Recommender Systems

  • 1. Enhanced Vector Space Models for Content-based Recommender Systems
      • ACM Recommender Systems 2010, Barcelona, Spain
      • Cataldo Musto - cataldomusto@di.uniba.it
      • University of Bari “Aldo Moro” (Italy), SWAP Research Group
      • ACM RecSys 2010 Doctoral Symposium, 26.09.10
  • 2. outline
      • Motivations
          • Goals
          • Analysis of Vector Space Models
      • Enhanced Vector Space Models
          • Random Indexing-based model
          • Semantic Vectors-based model
      • Experimental Evaluation
          • Open Issues
          • Future Work
      Cataldo Musto, Enhanced Vector Space Models for Content-based Recommender Systems - ACM RecSys 2010 Doctoral Symposium - Barcelona, Spain - 26.09.10
  • 3. vector space model
      [Figure: item 1, item 2, ..., item n shown in the vector space]
  • 4. vector space model
      • Introduced by Salton in 1975
      • Given a set of documents and N features describing them, the VSM builds an N-dimensional vector space
      • Each item is represented as a point in the vector space
      • Application: Information Retrieval
          • Query: a point in the vector space
          • Assumption: the documents nearest to the query in the vector space are the most relevant ones
          • Cosine similarity is used to compute the similarity between query and documents
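The retrieval idea on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the three-feature space, the document names and the weights are made up, and `cosine_similarity` is a helper written for this example rather than code from the talk.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical documents in a 3-feature space (weights invented for illustration).
documents = {
    "doc_a": [1.0, 0.0, 2.0],
    "doc_b": [0.0, 3.0, 0.0],
    "doc_c": [2.0, 1.0, 0.0],
}
query = [1.0, 0.0, 2.0]

# Rank documents from most to least similar to the query.
ranking = sorted(
    documents,
    key=lambda name: cosine_similarity(query, documents[name]),
    reverse=True,
)
```

Because cosine similarity depends only on the angle between vectors, a long and a short document with the same proportions of features receive the same score.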
  • 5. idea
      • To investigate the impact of Vector Space Models in the area of Information Filtering
      • “Information Filtering & Information Retrieval: two sides of the same coin?”, Belkin & Croft, 1992
      • Strong analogies
          • Documents to be retrieved vs. items to be filtered
          • Query vs. user profiles
          • Both IF and IR can share the same weighting techniques (TF/IDF) and similarity measures (cosine similarity)
  • 6. vsm analysis
      • Strong Points
          • State-of-the-art model for the IR community
          • Clean and solid formalism
          • Simplicity of calculations between objects in a VSM
  • 7. vsm analysis (2)
      • Weak Points
          • High dimensionality
              • NLP operations (stopword elimination, stemming and so on)
          • Not incremental
              • The whole vector space has to be generated from scratch whenever a new item is added to the repository
          • Does not manage the latent semantics of documents
              • Any permutation of the terms in a document has the same VSM representation!
  • 8. goals
      • To introduce tools and techniques able to overcome these drawbacks
      • Random Indexing
          • Dimensionality reduction technique
          • Sahlgren, 2005
      • Semantic Vectors
          • Java open-source package
          • Widdows, 2007
  • 9. random indexing
      • Random Indexing (RI) is an incremental and effective technique for dimensionality reduction
      • Introduced by Sahlgren in 2005
      • Based on the so-called “Distributional Hypothesis”
          • “Words that occur in the same context tend to have similar meanings”
          • “Meaning is its use” (Wittgenstein)
  • 10. how does it work?
      • Random Indexing reduces the m-dimensional term/doc matrix to a new k-dimensional matrix
      • How?
          • By multiplying the original matrix by a random one, built in an incremental way
          • Formally: A_{n,m} R_{m,k} = B_{n,k}
          • k << m
      • After projection, the distances between points in the vector space are approximately preserved
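The projection A R = B can be illustrated with a small standard-library sketch. The assumptions are loud ones: the random matrix here has Gaussian entries scaled by 1/sqrt(k), which is a common textbook choice for random projection and not necessarily the matrix used in the talk, and the sizes m = 1000 and k = 200 are arbitrary.

```python
import math
import random


def random_projection_matrix(m, k, seed=0):
    """m x k matrix with N(0, 1/k) entries (an assumed, textbook construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(m)]


def project(A, R):
    """Return B = A R for A (n x m) and R (m x k), both stored as lists of rows."""
    m, k = len(R), len(R[0])
    return [[sum(row[i] * R[i][j] for i in range(m)) for j in range(k)] for row in A]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


m, k = 1000, 200                                        # k << m
data_rng = random.Random(1)
A = [[data_rng.random() for _ in range(m)] for _ in range(3)]   # 3 made-up items
R = random_projection_matrix(m, k)
B = project(A, R)

# Ratio of a pairwise distance after and before projection; it should be close to 1.
ratio = euclidean(B[0], B[1]) / euclidean(A[0], A[1])
```

The ratio is only close to 1, not equal to it: random projection preserves distances approximately, and the approximation tightens as k grows.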
  • 11. random matrix
      • How is the random matrix built?
      • The whole process is based on the concept of “context”
          • Given a term, its “context” is the set of other words it co-occurs with
      • The matrix is built in an iterative and incremental way
          • The vector representing each document depends on the terms that occur in it
          • The vector representing each term depends on its context (the other terms it co-occurs with)
  • 12. item representation
      • A context vector is assigned to each term. This vector has a fixed dimension (k) and can contain only values in {-1, 0, 1}. Values are distributed randomly, but the number of non-zero elements is much smaller than the number of zeros.
      • The vector space representation of a term is obtained by summing the context vectors of the terms it co-occurs with.
      • The vector space representation of a document (item) is obtained by summing the context vectors of the terms that occur in it.
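The construction on this slide can be sketched as follows. Several details are assumptions made for the example: the dimension k = 50, four non-zero entries per random vector, the two toy documents, and treating the whole document as the co-occurrence context. The function names are hypothetical and do not come from the Semantic Vectors package.

```python
import random


def random_ternary_vector(k, nonzeros, rng):
    """Sparse vector of length k: `nonzeros` entries are +1 or -1, all others 0."""
    vector = [0] * k
    for position in rng.sample(range(k), nonzeros):
        vector[position] = rng.choice((-1, 1))
    return vector


def add_into(target, source):
    """Element-wise in-place addition of `source` into `target`."""
    for i, value in enumerate(source):
        target[i] += value


def build_spaces(docs, k=50, nonzeros=4, seed=0):
    """Return (random vectors per term, term vectors, document vectors)."""
    rng = random.Random(seed)
    random_vectors = {}
    for tokens in docs.values():
        for term in tokens:
            if term not in random_vectors:
                random_vectors[term] = random_ternary_vector(k, nonzeros, rng)

    term_vectors = {term: [0] * k for term in random_vectors}
    doc_vectors = {}
    for name, tokens in docs.items():
        # Document vector: sum of the random vectors of the terms it contains.
        doc_vector = [0] * k
        for term in tokens:
            add_into(doc_vector, random_vectors[term])
        doc_vectors[name] = doc_vector
        # Term vector: sum of the random vectors of the co-occurring terms
        # (assumption: the context window is the whole document).
        distinct = set(tokens)
        for term in distinct:
            for other in distinct:
                if other != term:
                    add_into(term_vectors[term], random_vectors[other])
    return random_vectors, term_vectors, doc_vectors


docs = {
    "m1": ["space", "alien", "ship"],
    "m2": ["alien", "ship", "crew"],
}
random_vectors, term_vectors, doc_vectors = build_spaces(docs)
```

Since every vector is built by addition only, a new document can be folded in by summing the random vectors of its terms, without recomputing the existing vectors.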
  • 13. ...summing up
      • Random Indexing
          • Dimensionality reduction technique
              • Similar to LSA
          • Incremental
              • Tremendous saving of computational resources
          • Manages the semantics of documents
              • The position of a document (item) in the vector space depends on the position of the terms that occur in the document
              • The position of a term depends on the position of the other terms it co-occurs with!
  • 14. recommendation models
      • We developed two different recommendation models
      • Both are based on a vector space built through Random Indexing
          • Random Indexing-based model (RI)
          • Semantic Vectors-based model (SV)
  • 15. profile representation
      • What about the user profiles?
      • Assumption
          • The information coming from documents (items) that the user liked in the past could be a reliable source of information for building user profiles
          • The vector space representation of a user profile is obtained by combining the context vectors of all the documents that the user liked in the past
      • Definition of RI-based and SV-based models
          • The difference lies in the way they exploit the vector space to build user profiles
  • 16. RI-based approach
      [Formula not preserved in the transcript; labels: documents, rate, threshold. Caption: VSM representation of RI-based profile for user u]
  • 17. RI-based approach
      • The simplest user profile
      • Combines the information coming from previously liked documents in a uniform way
      • Different ratings are not managed!
      • Definition of a weighted counterpart, called W-RI
          • Weighted Random Indexing
  • 18. wRI-based approach
      [Formula not preserved in the transcript; labels: documents, rate, threshold. Caption: VSM representation of wRI-based profile for user u]
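The profile formulas on the RI and wRI slides did not survive in the transcript, so the following Python sketch is a guess at their general shape rather than the author's definition. It assumes that a document counts as liked when its rating reaches a threshold, that the RI profile is the plain sum of the liked document vectors, and that the wRI profile scales each vector by rating / max_rating. The function names, the weighting and the toy data are all hypothetical.

```python
def ri_profile(ratings, doc_vectors, threshold):
    """Uniform sum of the vectors of documents rated at or above `threshold`."""
    dimension = len(next(iter(doc_vectors.values())))
    profile = [0.0] * dimension
    for doc, rating in ratings.items():
        if rating >= threshold:
            for i, value in enumerate(doc_vectors[doc]):
                profile[i] += value
    return profile


def wri_profile(ratings, doc_vectors, threshold, max_rating=5):
    """Like ri_profile, but each vector is weighted by rating / max_rating (assumed scheme)."""
    dimension = len(next(iter(doc_vectors.values())))
    profile = [0.0] * dimension
    for doc, rating in ratings.items():
        if rating >= threshold:
            weight = rating / max_rating
            for i, value in enumerate(doc_vectors[doc]):
                profile[i] += weight * value
    return profile


# Made-up 2-dimensional document vectors and ratings on a 1-5 scale.
doc_vectors = {"d1": [1.0, 0.0], "d2": [0.0, 2.0], "d3": [4.0, 4.0]}
ratings = {"d1": 5, "d2": 4, "d3": 1}

uniform = ri_profile(ratings, doc_vectors, threshold=3)
weighted = wri_profile(ratings, doc_vectors, threshold=3)
```

In this toy case d3 falls below the threshold and contributes to neither profile, while the weighted variant lets the 5-star document count more than the 4-star one.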
  • 19. wRI-based approach
      • Both models inherit a classical problem of VSM
          • User profiles are modeled only according to positive preferences
          • In classical text classifiers (Naive Bayes, SVM, etc.) both positive and negative preferences are modeled
      • Definition of a Semantic Vectors (SV) based model to tackle this problem
  • 20. semantic vectors
      • Open-source package written in Java
      • Implements a Random Indexing-based approach for document indexing
      • Integrates a negation operator based on quantum mechanics
          • Queries such as “A not B” are allowed!
          • Projection of vector A onto the subspace orthogonal to the one generated by vector B
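The projection described on this slide can be written as A NOT B = A - ((A·B) / (B·B)) B. The Python sketch below implements that formula directly; it is an illustration of the operator's geometry with made-up vectors, not a call into the Semantic Vectors Java package.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def negate(a, b):
    """Return `a NOT b`: the component of `a` orthogonal to `b`."""
    b_squared = dot(b, b)
    if b_squared == 0:
        return list(a)          # nothing to remove when b is the zero vector
    scale = dot(a, b) / b_squared
    return [x - scale * y for x, y in zip(a, b)]


a = [3.0, 4.0]
b = [1.0, 0.0]
a_not_b = negate(a, b)          # removes the part of `a` that points along `b`
```

The result has zero dot product with b, so any item that is similar to the result is similar to A for reasons unrelated to B.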
  • 21. SV-based approach
      [Formulas not preserved in the transcript; labels: positive user profile vector, negative user profile vector. Caption: VSM representation of SV-based profile for user u]
  • 22. wSV-based approach
      [Formulas not preserved in the transcript; labels: positive user profile vector, negative user profile vector. Caption: VSM representation of wSV-based profile for user u]
  • 23. recommendation step
      • Given a user profile u and a set of items, we can suppose that the most relevant items for u are the nearest ones in the vector space
      • RI and wRI: submission of a query based on [expression not preserved in the transcript]
      • SV and wSV: submission of a query based on [expression not preserved in the transcript]
          • Returns the items with as many features as possible from p+ and as few features as possible from p-
      • Cosine similarity is used to rank the items
          • Items whose similarity is under a certain threshold are labeled as non-relevant and filtered out
          • The items with the highest similarity with respect to the profile that combines the liked documents are recommended
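A possible reading of this step is sketched below in Python. The query expressions are missing from the transcript, so the sketch assumes that the RI query is the positive profile p+ and that the SV query is "p+ NOT p-" built with orthogonal negation; both are assumptions, as are the threshold, the toy vectors and the function names.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norms = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norms if norms else 0.0


def negate(a, b):
    """`a NOT b`: component of `a` orthogonal to `b`."""
    b_squared = dot(b, b)
    if b_squared == 0:
        return list(a)
    scale = dot(a, b) / b_squared
    return [x - scale * y for x, y in zip(a, b)]


def recommend(query, items, threshold, top_n):
    """Drop items below `threshold`, then return the `top_n` most similar item names."""
    scored = [(name, cosine(query, vector)) for name, vector in items.items()]
    relevant = [(name, score) for name, score in scored if score >= threshold]
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in relevant[:top_n]]


# Made-up profile and item vectors.
p_pos = [1.0, 1.0, 0.0]          # built from liked documents
p_neg = [0.0, 1.0, 0.0]          # built from disliked documents
items = {
    "i1": [1.0, 0.0, 0.0],
    "i2": [0.0, 1.0, 0.0],
    "i3": [1.0, 1.0, 0.0],
}

ri_recommendations = recommend(p_pos, items, threshold=0.5, top_n=1)
sv_recommendations = recommend(negate(p_pos, p_neg), items, threshold=0.5, top_n=2)
```

With the negated query, the item that matches only the disliked feature (i2) scores zero and is filtered out, while the positive-only query still treats it as moderately relevant.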
  • 24. experimental evaluation
      • MovieLens 100k dataset
      • Content-based information crawled from Wikipedia
          • Movies without a Wikipedia entry were deleted
          • 613 users, 520 items, 40k ratings
      • 5-fold cross validation
      • Average Precision @1, @3, @5, @7, @10
      • NLP processing: stopword elimination
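The slide does not define its metric precisely. The Python sketch below assumes that "Average Precision @n" means precision over the top n recommended items, averaged over users; that reading, the function names and the toy rankings are assumptions made for illustration.

```python
def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the first k ranked items that are relevant (denominator is k)."""
    if k <= 0:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / k


def mean_precision_at_k(rankings, relevant, k):
    """Average of precision_at_k over all users present in `rankings`."""
    if not rankings:
        return 0.0
    total = sum(
        precision_at_k(ranked, relevant.get(user, set()), k)
        for user, ranked in rankings.items()
    )
    return total / len(rankings)


# Made-up rankings and relevance judgments for two users.
rankings = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y"]}
relevant = {"u1": {"a", "c"}, "u2": {"y"}}

p_at_1 = mean_precision_at_k(rankings, relevant, 1)
p_at_2 = mean_precision_at_k(rankings, relevant, 2)
```

Under 5-fold cross validation, the same computation would be repeated on each held-out fold and the five values averaged.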
  • 25. experimental design
      • Experiment 1
          • Does the weighting schema improve the predictive accuracy of the recommendation models?
      • Experiment 2
          • Does the introduction of a negation operator improve the predictive accuracy of the recommendation models?
  • 26. results - experiment 1
      [Two bar charts: RI vs. W-RI and SV vs. W-SV; x-axis: AVP@1, AVP@5, AVP@10; y-axis ticks between 85 and 87]
      • Our weighting model (even in this naive form) improves the predictive accuracy of both RI-based and SV-based models
  • 27. results - experiment 2
      [Two bar charts: RI vs. SV and W-RI vs. W-SV; x-axis: AVP@1, AVP@5, AVP@10; y-axis ticks between 85 and 87]
      • The integration of a negation operator based on quantum mechanics improves the predictive accuracy of both RI-based and SV-based models
  • 28. results

                          RI      W-RI    SV      W-SV    Bayes
      Av-Precision@1      85.93   86.33   85.97   86.78   86.39
      Av-Precision@3      85.78   85.97   86.19   86.33   85.97
      Av-Precision@5      85.75   86.10   85.99   86.16   85.83
      Av-Precision@7      85.61   85.92   85.88   85.95   85.77
      Av-Precision@10     85.45   85.76   85.76   85.83   85.75

      • SV and RI improve the Average Precision with respect to the Naive Bayes approach (currently implemented in our recommender system)
  • 29. conclusions
      • Investigation of the impact of enhanced VSM in the area of content-based recommender systems
          • Use of Random Indexing for dimensionality reduction
          • Definition of RI and SV-based models
      • Encouraging experimental results
          • First results improve the predictive accuracy obtained by classical content-based filtering techniques (e.g. Bayes)
  • 30. open issues & future work
      • Work in progress
          • Experimental evaluation on a classical TF/IDF-based VSM
      • Open Issues
          • Looking for a state-of-the-art dataset for the evaluation of content-based recommendation models
      • Future Work
          • Comparison of the predictive accuracy with different NLP steps (stemming, entity recognition, POS-tagging and so on)
          • Integration of Social Media (Facebook, Twitter, LinkedIn) for building accurate user profiles by skipping the training step
          • Integration of Linked Data-based representation (by exploiting DBPedia data) to exploit explicit relationships between concepts
  • 31. discussion
      • http://www.di.uniba.it/~swap/
      • Cataldo Musto - cataldomusto@di.uniba.it
      • University of Bari (Italy), SWAP Research Group
      • ACM RecSys 2010 Doctoral Symposium
