Mathematical methods of
Tensor Factorization
applied to Recommender Systems
Giuseppe Ricci, PhD Student in Computer Science
University of Study of Bari “A. Moro”
Advances in DataBases and Information Systems
PhD Consortium, Genoa, 1 September 2013
Semantic
Web
Access and
Personalization
research group
http://www.di.uniba.it/~swap
Dipartimento
di Informatica
Information Overload & Recommender Systems
On the Internet today, an overabundance of information can be
accessed, making it difficult for users to process and
evaluate options and make appropriate choices.
Recommender Systems (RS) are information filtering
techniques which play an important role in e-commerce,
advertising, e-mail filtering, etc.
What do RS do exactly?
① Predict how much you may like a certain product/service
② Compose a list of N best items for you
③ Compose a list of N best users for a certain product/service
④ Explain why these items are recommended to you
⑤ Adjust the prediction and recommendation based on your
feedback (ratings) and that of other people
user-item matrix (rows: users U1-U4 and active user A; columns: items I1-I9; blank cells are unrated):
U1: 1, 5, 4 | U2: 4, 2, 5 | U3: 4, 5 | U4: 5, 2, 4 | A: 1, 3, 1, 3, 1, 4, 5, 8
Matrix Factorization
Matrix Factorization (MF) techniques fall into the class of
collaborative filtering (CF) methods → latent factor
models: similarity between users and items is induced
by some factors hidden in the data
Latent factor models build a matrix of users and items in which
each element is associated with a vector of characteristics
MF techniques represent users and items by vectors of
features derived from the ratings given by users for the items
they have seen or tried
Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for
recommender systems. IEEE Computer, 42(8):30-37, 2009.
Matrix Factorization
U set of users, D set of items, R rating matrix.
MF aims to factorize R into two matrices P and Q such that their
product approximates R:
P row: strength of the association between a user and the k latent
features.
Q column: strength of the association between an item and the
latent features.
Once these vectors are discovered, recommendations are
calculated using the expression of the predicted rating r̂_ij
An MF technique used in the literature: Singular Value Decomposition (SVD):
• popularized by Simon Funk during the Netflix Prize
• has the objective of reducing the dimensionality, i.e. the rank,
of the user-item matrix
• captures latent relationships between users and items
R ≈ P × Q^T = R̂,   r̂_ij = p_i^T q_j
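The factorization just described can be sketched in a few lines of NumPy: a stochastic-gradient loop in the spirit of the Funk-style approach (the function name, step sizes and the toy matrix below are illustrative assumptions, not the exact method of the deck).

```python
import numpy as np

def factorize(R, k=2, steps=5000, alpha=0.002, lam=0.02, seed=0):
    """Learn P (|U| x k) and Q (|D| x k) so that p_i . q_j approximates
    every known rating; 0 marks a missing cell and is never fitted."""
    rng = np.random.default_rng(seed)
    P = rng.random((R.shape[0], k))
    Q = rng.random((R.shape[1], k))
    rows, cols = np.nonzero(R)                        # known ratings only
    for _ in range(steps):
        for i, j in zip(rows, cols):
            e = R[i, j] - P[i] @ Q[j]                 # error on a known cell
            P[i] += alpha * (e * Q[j] - lam * P[i])   # regularized SGD step
            Q[j] += alpha * (e * P[i] - lam * Q[j])
    return P, Q

R = np.array([[1, 0, 5, 4],
              [4, 2, 0, 5],
              [0, 4, 5, 0],
              [5, 2, 4, 0]], dtype=float)
P, Q = factorize(R)
R_hat = P @ Q.T   # predictions for every cell, including the missing ones
```

The originally empty cells of R_hat are the system's guesses for unrated items, which is exactly what the recommendation step ranks.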
SVD
Different SVD algorithms have been used in the RS literature:
• in [15], the authors use a small SVD obtained by retaining only
k << r singular values and discarding the other entries;
• in [11], the authors propose an algorithm to perform SVD on large
matrices, focusing the study on the parameters that affect the
convergence speed;
• in [9], Koren presents an approach oriented to factor models,
which projects users and items into the same latent space where
some measures for comparison are defined. He proposes several
versions of SVD with the objective of achieving better
recommendations as well as good scalability
[15] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Incremental singular value
decomposition algorithms for highly scalable recommender systems.
[11] Miklos Kurucz, Andras A. Benczur, and Balazs Torma. Methods for large scale SVD with missing
values.
[9] Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model.
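The rank-k truncation these works build on can be computed directly with np.linalg.svd; the matrix below is a toy, fully observed example (real rating matrices need their missing cells imputed first, e.g. with item means, before a plain SVD can be applied).

```python
import numpy as np

R = np.array([[1., 3., 5., 4.],
              [4., 2., 3., 5.],
              [2., 4., 5., 1.],
              [5., 2., 4., 2.]])

U, s, Vt = np.linalg.svd(R, full_matrices=False)   # s sorted descending
k = 2
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]           # keep k singular values
```

By the Eckart-Young theorem, R_k is the best rank-k approximation of R in the least-squares (Frobenius) sense, which is the property truncated-SVD recommenders rely on.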
Limitation of MF Techniques
They take into account only the standard profile of users and items. This does
not allow the integration of further information such as context
Contextual information (the place where the user sees the movie, the device, the
company...) cannot be managed with simple user-item matrices
Family with
children
At the cinema with
friends or colleagues
Tensors & Tensor Factorization
[6] R.A. Harshman. Foundations of the PARAFAC Procedure: Models and Conditions
for an "explanatory" Multi-modal Factor Analysis, volume 1 (16) of Working papers in
phonetics. University of California at Los Angeles, 1970.
[12] Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle. A multilinear singular
value decomposition. SIAM J. Matrix Anal. Appl, 21:1253-1278, 2000.
Tensors are higher-dimensional arrays of numbers that might be exploited in
order to include additional contextual information in the recommendation
process.
The techniques that generalize the MF can also be applied to tensors.
Two particular Tensor Factorizations (TF) can be considered to be higher-
order extensions of matrix singular value decomposition:
• PARallel FACtor analysis [6] or CANonical DECOMPosition
(PARAFAC/CANDECOMP), which decomposes a tensor as a sum of
rank-one tensors;
• High Order Singular Value Decomposition [12] (HOSVD), which is
a higher-order form of Principal Component Analysis (PCA)
HOSVD is the most widely adopted TF technique.
HOSVD is a generalization of the matrix SVD: it decomposes the
initial tensor into N matrices (N is the order of the tensor) and a “small
tensor” (the core).
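A minimal sketch of that construction, assuming the standard HOSVD recipe: one factor matrix per mode from the SVD of the corresponding unfolding, then the core tensor by mode-wise projection (function names are illustrative).

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: mode-n fibers of X become the columns of a matrix."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_mult(X, M, mode):
    """Mode-n product: multiply matrix M into X along the given mode."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(X, mode, 0), axes=1), 0, mode)

def hosvd(X):
    """Untruncated HOSVD: one factor matrix per mode plus the core tensor."""
    Us = [np.linalg.svd(unfold(X, n), full_matrices=False)[0]
          for n in range(X.ndim)]
    core = X
    for n, U in enumerate(Us):
        core = mode_mult(core, U.T, n)
    return core, Us

X = np.arange(24, dtype=float).reshape(2, 3, 4)   # toy 3rd-order tensor
core, Us = hosvd(X)
X_hat = core
for n, U in enumerate(Us):                         # exact reconstruction
    X_hat = mode_mult(X_hat, U, n)
```

Truncating the columns of each factor matrix before building the core is what gives the per-mode dimensionality reduction discussed later in the deck.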
Examples of HOSVD in RS:
• Multiverse recommendation [7]: TF is applied to manage data for users,
movies, user ratings and contextual information such as age, day of the
week, companion;
• Tensor factorization for tag recommendation [13]: for a social tagging
system, users' data, items and tags are stored in a 3rd order tensor
which is factorized, with the aim of discovering latent factors that bind
the user-item, user-tag and tag-item associations;
[7] Alexandros Karatzoglou, Xavier Amatriain, Linas Baltrunas, and Nuria Oliver.
Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative
filtering.
[13] Steffen Rendle, Leandro Balby Marinho, Alexandros Nanopoulos, and Lars Schmidt-Thieme.
Learning optimal ranking with tensor factorization for tag recommendation. In KDD, pages 727-736,
2009.
HOSVD & RS 2/2
• CubeSVD [17]: a system for personalized web search that aims to
discover the hidden relationships between users, queries and web pages.
Data are collected in a 3rd order tensor that is decomposed.
[17] Jian-Tao Sun, Hua-Jun Zeng, Huan Liu, Yuchang Lu, and Zheng Chen. Cubesvd: a novel approach to
personalized web search. In Proceedings of the 14th international conference on World Wide Web, WWW '05, pages
382-390, New York, NY, USA, 2005. ACM.
HOSVD: advantages & disadvantages
Advantages:
• the ability of taking into account more dimensions
simultaneously
• better data modeling than standard SVD, dimensionality reduction
can be performed not only in one dimension but also separately for
each dimension
Disadvantages:
• it is not an optimal tensor decomposition in the
sense of least squares data fitting: for matrices, truncating the SVD to
the first n singular values yields the best rank-n approximation of a
given matrix, but no analogous guarantee holds for HOSVD
• high computational cost
• cannot deal with missing values → they are treated as 0
PARAFAC
The PARAFAC model of a 3-dimensional array is given by 3 loading
matrices A, B and C with typical elements a_if, b_jf and c_kf.
The PARAFAC model is defined by:
x̂_ijk = Σ_{f=1}^{F} a_if b_jf c_kf
F: number of rank-one components.
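The model equation above is easy to state in executable form; the sizes and loading matrices below are arbitrary illustrative values.

```python
import numpy as np

I, J, K, F = 4, 3, 2, 2
rng = np.random.default_rng(0)
A, B, C = rng.random((I, F)), rng.random((J, F)), rng.random((K, F))

# The PARAFAC model in one line: x_hat[i,j,k] = sum_f A[i,f] B[j,f] C[k,f]
X_hat = np.einsum('if,jf,kf->ijk', A, B, C)

# Equivalently, a sum of F rank-one tensors (outer products of the columns)
X_rank1 = sum(np.einsum('i,j,k->ijk', A[:, f], B[:, f], C[:, f])
              for f in range(F))
```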
PARAFAC advantages:
• an alternative to HOSVD
• greater simplicity
• linear computation time compared to HOSVD
• does not collapse the data, but retains its natural 3-dimensional
structure
• components are unique, up to permutation and scaling, under mild
conditions
PARAFAC, RS and not only 1/2
In Tfmap: optimizing map for top-n context-aware recommendation [16], a
3-dimensional tensor (users, items and context types) is factorized with
PARAFAC.
The dimensions are associated with the 3 factor matrices and used to calculate
the user preference for item i under context type k.
Problem: PARAFAC & Missing Data
Solution: CP-WOPT algorithm
[16] Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic, and Nuria Oliver. Tfmap:
optimizing map for top-n context-aware recommendation. In Proceedings of the 35th international ACM SIGIR
conference on Research and development in information retrieval, SIGIR '12, pages 155-164, New York, NY, USA,
2012. ACM.
PARAFAC, RS and not only 2/2
In Scalable tensor factorizations with missing data [1], PARAFAC meets
missing data: the CP-WOPT (CP Weighted OPTimization) algorithm uses
first-order optimization to solve a weighted least squares objective function.
Extensive numerical experiments on simulated data sets show that CP-WOPT
can successfully factor tensors with noise and up to 70% missing data.
CP-WOPT is significantly faster and more accurate than the best published
methods in the literature.
[1] Evrim Acar, Daniel M. Dunlavy, Tamara G. Kolda, and Morten Mørup. Scalable tensor factorizations with
missing data. In SDM10: Proceedings of the 2010 SIAM International Conference on Data Mining, pages 701-
712, Philadelphia, April 2010. SIAM.
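The idea behind CP-WOPT can be sketched as follows: a binary mask W keeps only the observed cells in the least-squares objective, and the factor matrices are updated from its gradient. This is a naive fixed-step loop under assumed sizes and learning rate; the actual algorithm in [1] hands the same gradient to a proper first-order optimizer.

```python
import numpy as np

def masked_loss(X, W, A, B, C):
    """Weighted least squares: cells with W == 0 (missing) are ignored."""
    X_hat = np.einsum('if,jf,kf->ijk', A, B, C)
    return 0.5 * np.sum((W * (X - X_hat)) ** 2)

def grad_step(X, W, A, B, C, lr=0.01):
    """One gradient step on all three factor matrices."""
    E = W * (np.einsum('if,jf,kf->ijk', A, B, C) - X)   # masked residual
    gA = np.einsum('ijk,jf,kf->if', E, B, C)
    gB = np.einsum('ijk,if,kf->jf', E, A, C)
    gC = np.einsum('ijk,if,jf->kf', E, A, B)
    return A - lr * gA, B - lr * gB, C - lr * gC

rng = np.random.default_rng(1)
F = 2
A0, B0, C0 = rng.random((4, F)), rng.random((3, F)), rng.random((2, F))
X = np.einsum('if,jf,kf->ijk', A0, B0, C0)         # noiseless ground truth
W = (rng.random(X.shape) > 0.3).astype(float)      # ~70% of cells observed
A, B, C = rng.random((4, F)), rng.random((3, F)), rng.random((2, F))
loss0 = masked_loss(X, W, A, B, C)
for _ in range(1000):
    A, B, C = grad_step(X, W, A, B, C)
```

Because missing cells never enter the residual, the factors are fitted only to known data, rather than to zeros standing in for unknown ratings.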
CP-WOPT adaptation: Preliminary Experiments 1/3
CP-WOPT algorithm adapted to RS:
• takes missing values into account → the algorithm is suitable for very
sparse user-item matrices
• computes a weighted factorization that models only known
values, rather than simply using 0 for missing data
• main goals:
• good reconstruction of missing values
• consideration of contextual information → more precise
recommendations.
Preliminary user study: users rated some movies (not all) under
contextual factors
• 7 real users
• 11 movies from the Movielens 100k dataset
• contextual factors: whether they prefer to see the movie
– at home or at the cinema;
– with friends or with their partner;
– with or without family.
CP-WOPT adaptation: Preliminary Experiments 2/3
Main Goal: good reconstruction of missing values with CP-WOPT
adapted
Ratings range: 1 to 5
Rating coding:
• 1-2: strong-modest preference for the 1st option
• 3: neutrality;
• 4-5: modest-strong preference for the 2nd option
Metrics:
accuracy (acc): % of known values correctly reconstructed
acc = 100 − (errors / known values) × 100
coverage (cov): % of non-zero values returned
cov = 100 − (errors / unknown values) × 100
Results (105 maximum iterations):
acc = 94.4%
cov = 91.7%
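The two metrics can be computed as follows. Since the slides do not spell out the error criterion, this sketch assumes a known value counts as an error when its rounded reconstruction differs from the original, and an unknown cell is "returned" when its rounded prediction is non-zero; the sample arrays are illustrative.

```python
import numpy as np

def accuracy(known, predicted):
    """acc: % of known values whose rounded reconstruction matches."""
    errors = np.sum(np.rint(predicted) != known)
    return 100 - (errors / known.size) * 100

def coverage(predicted_unknown):
    """cov: % of originally-unknown cells with a non-zero returned value."""
    return np.count_nonzero(np.rint(predicted_unknown)) / predicted_unknown.size * 100

known = np.array([1, 3, 1, 3, 1, 4, 5])
pred = np.array([1.1, 2.9, 1.4, 3.2, 1.0, 4.6, 5.0])
acc = accuracy(known, pred)                 # one miss (4.6 -> 5) out of seven
cov = coverage(np.array([0.2, 3.1, 4.0]))   # 0.2 rounds to zero
```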
Other qualitative results:
the experiment showed that it is possible to express, through the n-
dimensional factorization, not only recommendations for the single
user, but also more specific suggestions about the consumption of an
item.
CP-WOPT adaptation: Preliminary Experiments 3/3
In Vitro: Preliminary Experiment
Main Goal: test CP-WOPT adapted on RS for more precise
recommendations
Adapted version of CP-WOPT → subset (with a significant number of ratings)
of the Movielens 100k dataset.
Ratings given by users who have a profession are stored in a 3rd order
tensor.
Input: tensor of dimensions 100 users × 150 movies × 21 occupations (the
contextual factor)
Results:
acc = 92.09%
cov = 99.96%
MAE = 0.60
RMSE = 0.93
in line with results reported in literature
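For reference, MAE and RMSE as quoted above are the standard prediction-error metrics, computed over held-out known ratings (values below are illustrative):

```python
import numpy as np

true_r = np.array([4.0, 3.0, 5.0, 2.0])     # held-out known ratings
pred_r = np.array([3.5, 3.4, 4.2, 2.6])     # reconstructed values

mae = np.mean(np.abs(true_r - pred_r))           # mean absolute error
rmse = np.sqrt(np.mean((true_r - pred_r) ** 2))  # root mean squared error
```

RMSE penalizes large individual errors more heavily than MAE, which is why both are usually reported together.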
Ongoing and Future Work
• Extend the evaluation of our version of CP-WOPT to tensors
of high dimensionality (Movielens dataset)
• investigate methods to assess whether and which contextual
factors (occupation, company) influence the users' preferences
• user segmentation
• plan to test our approach in other domains such as news
recommendation or Electronic Program Guides
Thanks for your attention!!
Dott. Giuseppe Ricci
PhD Student in Computer Science
Department of Computer Science
4 floor LACAM Lab., SWAP Room
Phone: +39-080-5442298
E-mail: giuseppe.ricci@uniba.it

Weitere ähnliche Inhalte

Was ist angesagt?

INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
dannyijwest
 
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATIONIMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
adeij1
 
View the Microsoft Word document.doc
View the Microsoft Word document.docView the Microsoft Word document.doc
View the Microsoft Word document.doc
butest
 
11.hybrid ga svm for efficient feature selection in e-mail classification
11.hybrid ga svm for efficient feature selection in e-mail classification11.hybrid ga svm for efficient feature selection in e-mail classification
11.hybrid ga svm for efficient feature selection in e-mail classification
Alexander Decker
 

Was ist angesagt? (18)

Controlling informative features for improved accuracy and faster predictions...
Controlling informative features for improved accuracy and faster predictions...Controlling informative features for improved accuracy and faster predictions...
Controlling informative features for improved accuracy and faster predictions...
 
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKINGINTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING
 
Abstract
AbstractAbstract
Abstract
 
Efficient Filtering Algorithms for Location- Aware Publish/subscribe
Efficient Filtering Algorithms for Location- Aware Publish/subscribeEfficient Filtering Algorithms for Location- Aware Publish/subscribe
Efficient Filtering Algorithms for Location- Aware Publish/subscribe
 
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATIONIMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
IMPROVED SENTIMENT ANALYSIS USING A CUSTOMIZED DISTILBERT NLP CONFIGURATION
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
View the Microsoft Word document.doc
View the Microsoft Word document.docView the Microsoft Word document.doc
View the Microsoft Word document.doc
 
Sherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type deteSherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type dete
 
Sources of errors in distributed development projects implications for colla...
Sources of errors in distributed development projects implications for colla...Sources of errors in distributed development projects implications for colla...
Sources of errors in distributed development projects implications for colla...
 
Datamining intro-iep
Datamining intro-iepDatamining intro-iep
Datamining intro-iep
 
Multidirectional Product Support System for Decision Making In Textile Indust...
Multidirectional Product Support System for Decision Making In Textile Indust...Multidirectional Product Support System for Decision Making In Textile Indust...
Multidirectional Product Support System for Decision Making In Textile Indust...
 
On the benefit of logic-based machine learning to learn pairwise comparisons
On the benefit of logic-based machine learning to learn pairwise comparisonsOn the benefit of logic-based machine learning to learn pairwise comparisons
On the benefit of logic-based machine learning to learn pairwise comparisons
 
Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...
Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...
Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...
 
Df32676679
Df32676679Df32676679
Df32676679
 
Dwd mdatamining intro-iep
Dwd mdatamining intro-iepDwd mdatamining intro-iep
Dwd mdatamining intro-iep
 
Applications: Prediction
Applications: PredictionApplications: Prediction
Applications: Prediction
 
11.hybrid ga svm for efficient feature selection in e-mail classification
11.hybrid ga svm for efficient feature selection in e-mail classification11.hybrid ga svm for efficient feature selection in e-mail classification
11.hybrid ga svm for efficient feature selection in e-mail classification
 
Hybrid ga svm for efficient feature selection in e-mail classification
Hybrid ga svm for efficient feature selection in e-mail classificationHybrid ga svm for efficient feature selection in e-mail classification
Hybrid ga svm for efficient feature selection in e-mail classification
 

Andere mochten auch (9)

Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender Systems
 
Rodriguez_DRT_Abstract_Beamer
Rodriguez_DRT_Abstract_BeamerRodriguez_DRT_Abstract_Beamer
Rodriguez_DRT_Abstract_Beamer
 
Dimensionality reduction
Dimensionality reductionDimensionality reduction
Dimensionality reduction
 
Inode explanation
Inode explanationInode explanation
Inode explanation
 
How inodes Work
How inodes WorkHow inodes Work
How inodes Work
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Matrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsMatrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender Systems
 
Recommendation system
Recommendation system Recommendation system
Recommendation system
 
Build Features, Not Apps
Build Features, Not AppsBuild Features, Not Apps
Build Features, Not Apps
 

Ähnlich wie PhD Consortium ADBIS presetation.

Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
IJEACS
 
Semantically Enriched Knowledge Extraction With Data Mining
Semantically Enriched Knowledge Extraction With Data MiningSemantically Enriched Knowledge Extraction With Data Mining
Semantically Enriched Knowledge Extraction With Data Mining
Editor IJCATR
 
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Editor IJAIEM
 

Ähnlich wie PhD Consortium ADBIS presetation. (20)

factorization methods
factorization methodsfactorization methods
factorization methods
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
KIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdfKIT-601 Lecture Notes-UNIT-2.pdf
KIT-601 Lecture Notes-UNIT-2.pdf
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
 
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsSurvey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
 
Semantically Enriched Knowledge Extraction With Data Mining
Semantically Enriched Knowledge Extraction With Data MiningSemantically Enriched Knowledge Extraction With Data Mining
Semantically Enriched Knowledge Extraction With Data Mining
 
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...NS-CUK Seminar: H.B.Kim,  Review on "metapath2vec: Scalable representation le...
NS-CUK Seminar: H.B.Kim, Review on "metapath2vec: Scalable representation le...
 
Model Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep LearningModel Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep Learning
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown Bag
 
Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)
 
Ijsws14 423 (1)-paper-17-normalization of data in (1)
Ijsws14 423 (1)-paper-17-normalization of data in (1)Ijsws14 423 (1)-paper-17-normalization of data in (1)
Ijsws14 423 (1)-paper-17-normalization of data in (1)
 
Imtiaz khan data_science_analytics
Imtiaz khan data_science_analyticsImtiaz khan data_science_analytics
Imtiaz khan data_science_analytics
 
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...
 
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac...
 
Comparative Analysis of RMSE and MAP Metrices for Evaluating CNN and LSTM Mod...
Comparative Analysis of RMSE and MAP Metrices for Evaluating CNN and LSTM Mod...Comparative Analysis of RMSE and MAP Metrices for Evaluating CNN and LSTM Mod...
Comparative Analysis of RMSE and MAP Metrices for Evaluating CNN and LSTM Mod...
 
Poster
PosterPoster
Poster
 
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio...
IRJET -  	  A Novel Approach for Software Defect Prediction based on Dimensio...IRJET -  	  A Novel Approach for Software Defect Prediction based on Dimensio...
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio...
 
useR 2014 jskim
useR 2014 jskimuseR 2014 jskim
useR 2014 jskim
 

Kürzlich hochgeladen

Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
David Celestin
 
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Hung Le
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
amilabibi1
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
Kayode Fayemi
 
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
ZurliaSoop
 

Kürzlich hochgeladen (17)

lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.
 
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdfSOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
 
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatment
 
Zone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptxZone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptx
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio III
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Bailey
 
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
Digital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalDigital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of Drupal
 
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait Cityin kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
 
ICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdfICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdf
 
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
 
Introduction to Artificial intelligence.
Introduction to Artificial intelligence.Introduction to Artificial intelligence.
Introduction to Artificial intelligence.
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
 

PhD Consortium ADBIS presetation.

  • 1. Mathematical methods of Tensor Factorization applied to Recommender Systems Giuseppe Ricci, PhD Student in Computer Science University of Study of Bari “A. Moro” Advances in DataBases and Information Systems PhD Consortium, Genoa, 01 Septembre 2013 Semantic Web Access and Personalization research group http://www.di.uniba.it/~swap Dipartimento di Informatica
  • 2. Information Overload & Recommender Systems On internet today, an overabundance of information can be accessed, making it difficult for users to process and evaluate options and make appropriate choices. Recommender Systems (RS) are techniques for information filtering which play an important role in e- commerce, advertising, e-mail filtering, etc.
  • 3. What do RS do exactly? ① Predict how much you may like a certain product/service ② Compose a list of N best items for you ③ Compose a list of N best users for a certain product/service ④ Explain why these items are recommended to you ⑤ Adjust the prediction and recommendation based on your feedback (ratings) and other people I1 I2 I3 I4 I5 I6 I7 I8 I9 U1 1 5 4 U2 4 2 5 U3 4 5 U4 5 2 4 A 1 3 1 3 1 4 5 8 user-item matrix
  • 4. Matrix Factorization Matrix Factorization (MF) techniques fall in the class of collaborative filtering (CF) methods  latent factor models: similarity between users and items is induced by some factors hidden in the data Latent factor models build a matrix of users and items and each element is associated with a vector of characteristics MF techniques represent users and items by vectors of features derived from ratings given by users for the items seen or tried Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30-37, 2009.
  • 5. Matrix Factorization U set of users, D set of items, R rating matrix. MF aims to factorize R into two matrices P and Q such that their product approximates R: P row: strength of the association between user and k latent features. Q column: strength of the association between an item and the latent features. Once these vectors are discovered, recommendations are calculated using the expression of A MF used in literature: Singular Value Decomposition (SVD): • introduced by Simon Funk in the NetFlix Prize • has the objective of reducing the dimensionality, i. e. the rank, of the user-item matrix • capture latent relationships between users and items T T ij i jR P Q r p q ijr
  • 6. SVD Different SVD algorithms were used in RS literature: • in [15], the authors uses a small SVD obtained retaining only k << r singular values by discarding other entries; • in [11], the authors propose an algorithm to perform SVD on large matrices, by focusing the study on parameters that affect the convergence speed; • in [9], Koren presents an approach oriented on factor models which projected users and items in the same latent space where some measures for comparison are defined. He propose several versions of SVD with the objective of having better recommendations as well as good scalability [15] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Incremental singular value decomposition algorithms for highly scalable recommender systems. [11] Miklos Kurucz, Andras A. Benczur, and Balazs Torma. Methods for large scale SVD with missing values. [9] Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model.
• 7. Limitation of MF Techniques
They take into account only the standard profile of users and items. This does not allow the integration of further information such as context.
Contextual information (the place where the user sees the movie, the device, the company...) cannot be managed with simple user-item matrices.
Family with children / At the cinema with friends or colleagues
• 8. Tensors & Tensor Factorization
Tensors are higher-dimensional arrays of numbers that might be exploited to include additional contextual information in the recommendation process. The techniques that generalize MF can also be applied to tensors.
Two particular Tensor Factorizations (TF) can be considered higher-order extensions of matrix singular value decomposition:
• PARallel FACtor analysis [6] or CANonical DECOMPosition (PARAFAC/CANDECOMP), which decomposes a tensor as a sum of rank-one tensors;
• Higher Order Singular Value Decomposition [12] (HOSVD), which is a higher-order form of Principal Component Analysis (PCA).
[6] R.A. Harshman. Foundations of the PARAFAC Procedure: Models and Conditions for an "explanatory" Multi-modal Factor Analysis, volume 1 (16) of Working papers in phonetics. University of California at Los Angeles, 1970.
[12] Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle. A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl, 21:1253-1278, 2000.
• 9. HOSVD & RS 1/2
HOSVD is the most widely adopted TF technique. HOSVD is a generalization of matrix SVD: it decomposes the initial tensor into N matrices (N is the order of the tensor) and a "small" core tensor.
Examples of HOSVD in RS:
• Multiverse recommendation [7]: TF is applied to manage data about users, movies, user ratings and contextual information such as age, day of the week, companion;
• Tensor factorization for tag recommendation [13]: in a social tagging system, data about users, items and tags are stored in a 3rd order tensor which is factorized; the aim is discovering latent factors which bind the user-item, user-tag and tag-item associations.
[7] Alexandros Karatzoglou, Xavier Amatriain, Linas Baltrunas, and Nuria Oliver. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering.
[13] Steffen Rendle, Leandro Balby Marinho, Alexandros Nanopoulos, and Lars Schmidt-Thieme. Learning optimal ranking with tensor factorization for tag recommendation. In KDD, pages 727-736, 2009.
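The decomposition just described (one factor matrix per mode plus a small core tensor) can be sketched compactly: the factor matrix of mode n is given by the leading left singular vectors of the mode-n unfolding. The toy tensor and ranks below are illustrative assumptions:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: the chosen mode becomes the rows of a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_mult(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def hosvd(T, ranks):
    # One factor matrix per mode: leading left singular vectors of each unfolding.
    U = [np.linalg.svd(unfold(T, n), full_matrices=False)[0][:, :r]
         for n, r in enumerate(ranks)]
    # Core ("small") tensor: project T onto the factor matrices.
    S = T
    for n, Un in enumerate(U):
        S = mode_mult(S, Un.T, n)
    return S, U

# Toy 3rd order tensor (e.g. users x items x contexts).
T = np.arange(24, dtype=float).reshape(2, 3, 4)
S, U = hosvd(T, ranks=(2, 2, 2))

# Reconstruction: multiply the core back by each factor matrix.
T_hat = S
for n, Un in enumerate(U):
    T_hat = mode_mult(T_hat, Un, n)
```

Truncating the per-mode ranks performs the "dimensionality reduction separately for each dimension" mentioned on the next slides.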
• 10. HOSVD & RS 2/2
• CubeSVD [17]: a system for personalized web search that discovers the hidden relationships between users, queries and web pages. Data are collected in a 3rd order tensor that is then decomposed.
[17] Jian-Tao Sun, Hua-Jun Zeng, Huan Liu, Yuchang Lu, and Zheng Chen. Cubesvd: a novel approach to personalized web search. In Proceedings of the 14th international conference on World Wide Web, WWW '05, pages 382-390, New York, NY, USA, 2005. ACM.
• 11. HOSVD: advantages & disadvantages
Advantages:
• the ability to take more dimensions into account simultaneously
• better data modeling than standard SVD: dimensionality reduction can be performed not only in one dimension but also separately for each dimension
Disadvantages:
• it is not an optimal tensor decomposition in the sense of least-squares data fitting: in SVD, truncating to the first n singular values yields the best rank-n approximation of a given matrix, while no such guarantee holds for the truncated HOSVD
• high computational cost
• it cannot deal with missing values → they are treated as 0
• 12. PARAFAC
The PARAFAC model of a 3-dimensional array is given by 3 loading matrices A, B and C with typical elements a_if, b_jf and c_kf. The PARAFAC model is defined by:
x̂_ijk = Σ_{f=1}^{F} a_if · b_jf · c_kf
F: number of rank-one components.
PARAFAC advantages:
• an alternative to HOSVD
• greater simplicity
• linear computation time compared to HOSVD
• it does not collapse the data, but retains its natural 3-dimensional structure
• components are unique, up to permutation and scaling, under mild conditions
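The PARAFAC model above (a sum of F rank-one tensors) maps directly to a single einsum; the factor shapes here are illustrative assumptions:

```python
import numpy as np

def parafac_reconstruct(A, B, C):
    """x_hat[i,j,k] = sum over f of A[i,f] * B[j,f] * C[k,f]."""
    return np.einsum('if,jf,kf->ijk', A, B, C)

F = 2                          # number of rank-one components
rng = np.random.default_rng(1)
A = rng.random((4, F))         # e.g. users    x factors
B = rng.random((5, F))         # e.g. items    x factors
C = rng.random((3, F))         # e.g. contexts x factors

X_hat = parafac_reconstruct(A, B, C)   # shape (4, 5, 3)
```

Unlike HOSVD there is no core tensor: each component f is the outer product of one column from each loading matrix, which is what makes the model simple and (under mild conditions) unique.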
• 13. PARAFAC, RS and not only 1/2
In Tfmap: optimizing map for top-n context-aware recommendation [16], a 3-dimensional tensor (users, items and context types) is factorized with PARAFAC. The dimensions are associated with the 3 factor matrices and used to calculate the preference of a user for item i under context type k.
Problem: PARAFAC & missing data
Solution: the CP-WOPT algorithm
[16] Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic, and Nuria Oliver. Tfmap: optimizing map for top-n context-aware recommendation. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, SIGIR '12, pages 155-164, New York, NY, USA, 2012. ACM.
• 14. PARAFAC, RS and not only 2/2
In Scalable tensor factorizations with missing data [1], PARAFAC meets missing data: the CP-WOPT (CP Weighted OPTimization) algorithm uses 1st-order optimization to solve a weighted least-squares objective function. Using extensive numerical experiments on simulated data sets, the authors show that CP-WOPT can successfully factor tensors with noise and up to 70% missing data. CP-WOPT is significantly faster and more accurate than the best previously published method in the literature.
[1] Evrim Acar, Daniel M. Dunlavy, Tamara G. Kolda, and Morten Mørup. Scalable tensor factorizations with missing data. In SDM10: Proceedings of the 2010 SIAM International Conference on Data Mining, pages 701-712, Philadelphia, April 2010. SIAM.
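The core of the CP-WOPT idea can be sketched as a weighted least-squares objective in which a 0/1 weight tensor W masks the missing entries, together with its gradients for each factor matrix. This is a simplified gradient-descent illustration under assumed sizes and step size, not the authors' optimized 1st-order implementation:

```python
import numpy as np

def cp_wopt_loss_grad(X, W, A, B, C):
    """Weighted least-squares loss and gradients; W is 1 on observed cells, 0 elsewhere."""
    X_hat = np.einsum('if,jf,kf->ijk', A, B, C)
    R = W * (X - X_hat)                        # residual on observed cells only
    loss = 0.5 * np.sum(R ** 2)
    # Gradients of the weighted objective w.r.t. each loading matrix.
    gA = -np.einsum('ijk,jf,kf->if', R, B, C)
    gB = -np.einsum('ijk,if,kf->jf', R, A, C)
    gC = -np.einsum('ijk,if,jf->kf', R, A, B)
    return loss, gA, gB, gC

# Synthetic rank-2 tensor with roughly 30% of the entries "missing".
rng = np.random.default_rng(0)
A0, B0, C0 = (rng.random((d, 2)) for d in (4, 5, 3))
X = np.einsum('if,jf,kf->ijk', A0, B0, C0)
W = (rng.random(X.shape) > 0.3).astype(float)

# Plain gradient descent over the observed entries only.
A, B, C = (rng.random((d, 2)) for d in (4, 5, 3))
losses = []
for _ in range(500):
    loss, gA, gB, gC = cp_wopt_loss_grad(X, W, A, B, C)
    A -= 0.02 * gA
    B -= 0.02 * gB
    C -= 0.02 * gC
    losses.append(loss)
```

Because the residual is masked by W, the missing cells contribute nothing to the loss, so they are genuinely modeled as unknown rather than as zeros.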
• 15. CP-WOPT adaptation: Preliminary Experiments 1/3
The CP-WOPT algorithm adapted to RS:
• takes missing values into account → the algorithm is suitable for very sparse user-item matrices
• computes a weighted factorization that models only the known values, rather than simply employing 0 values for missing data
• main goals:
– good reconstruction of missing values
– consideration of contextual information → more precise recommendations
Preliminary user study: users rated some movies (not all) under contextual factors:
– 7 real users
– 11 movies from the Movielens 100k dataset
– contextual factors: whether they prefer to see the movie
– at home or at the cinema;
– with friends or with the partner;
– with or without family.
• 16. CP-WOPT adaptation: Preliminary Experiments 2/3
Main goal: good reconstruction of missing values with the adapted CP-WOPT
Ratings range: 1 to 5
Rating coding:
• 1-2: strong-modest preference for the 1st option
• 3: neutrality
• 4-5: modest-strong preference for the 2nd option
Metrics:
• accuracy (acc), % of known values correctly reconstructed: acc = 100 · (known values − errors) / known values
• coverage (cov), % of non-zero values returned: cov = 100 · (unknown values − errors) / unknown values
Results: 105 maximum iterations, acc = 94.4%, cov = 91.7%
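The two metrics translate directly into code. This is a hedged reading of the slide: the tolerance used to decide when a known value counts as "correctly reconstructed" is an assumption, since the slide does not specify it:

```python
import numpy as np

def acc_cov(R_true, R_pred, tol=0.5):
    """acc: % of known (non-zero) values reconstructed within tol.
    cov: % of unknown (zero) cells for which a non-zero prediction is returned.
    The tolerance tol is an assumed threshold, not taken from the slides."""
    known = R_true != 0
    correct = np.abs(R_true[known] - R_pred[known]) <= tol
    acc = 100.0 * correct.sum() / known.sum()
    missing = ~known
    cov = 100.0 * (R_pred[missing] != 0).sum() / missing.sum()
    return acc, cov

# Tiny example: one known value reconstructed well, one badly,
# one missing cell covered by a prediction, one left at zero.
R_true = np.array([[5., 0.], [0., 3.]])
R_pred = np.array([[4.8, 2.0], [0.0, 1.0]])
acc, cov = acc_cov(R_true, R_pred)   # both 50% on this toy matrix
```

Accuracy is measured only on the held-out known values, while coverage measures how many of the truly unknown cells receive any prediction at all.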
• 17. CP-WOPT adaptation: Preliminary Experiments 3/3
Other quality results: the experiment showed that it is possible to express, through the n-dimensional factorization, not only recommendations for the single user, but also more specific suggestions about the context in which an item is consumed.
• 18. In Vitro: Preliminary Experiment
Main goal: test the adapted CP-WOPT on RS for more precise recommendations.
The adapted version of CP-WOPT is applied to a subset (with a significant number of ratings) of the Movielens 100k dataset. Ratings given by users who have a profession are stored in a 3rd order tensor.
Input: a tensor of dimensions 100 users × 150 movies × 21 occupations (the contextual factor)
Results: acc = 92.09%, cov = 99.96%, MAE = 0.60, RMSE = 0.93, in line with results reported in the literature
• 19. Ongoing and Future Work
• extend the evaluation of our version of CP-WOPT on tensors of high dimensionality (Movielens dataset)
• investigate methods to assess whether and which contextual factors (occupation, company) influence the users' preferences
• user segmentation
• plan to test our approach in other domains, such as news recommendation or Electronic Program Guides
• 20. Thanks for your attention!!
Dott. Giuseppe Ricci, PhD Student in Computer Science
Department of Computer Science, 4th floor, LACAM Lab., SWAP Room
Phone: +39-080-5442298
E-mail: giuseppe.ricci@uniba.it