Collaborative Filtering Recommender
Based on Co-occurrence Matrix
Author: Marjan Sterjev
Today we encounter recommendations everywhere. Social networks and Internet portals
recommend products, movies, books, events, news and more. Recommendations are the essence of
marketing campaigns, and they are neither passive nor agnostic of user behavior: many recommendation
systems out there suggest items to users based on the users' preferences as well as their past history
(purchasing, viewing, rating etc.).
Collaborative Filtering is one of the most widely used item recommendation approaches. It is not based on
the item's content (genre, author, keywords and so on for a movie or book). Rather, it is based on the
user's past history of buying, viewing and rating items.
Let's say we have users and movies. Each user in the system has watched one or more movies in the
past. In this article we will assume that if a user has watched a particular movie, he likes and prefers
that movie. User-to-movie preferences can then be represented as a matrix model where each row
represents a particular user and each column represents a particular movie. A value of 1 in a matrix cell
means that the user has watched and prefers that movie.
In R we can generate a sample user movie preference matrix as follows:
> set.seed(1)
> no_users <- 15
> no_movies <- 10
> user_ids <- apply(array(1:no_users),1,function(i)paste(c('u',i),collapse='_'))
> movie_ids <- apply(array(1:no_movies),1,function(i)paste(c('m',i),collapse='_'))
> m_user_movie <- matrix(
sample(x=c(0,1),size=no_users*no_movies,replace=TRUE,prob=c(0.7,0.3)),
ncol=no_movies,byrow=TRUE,dimnames=list(user_ids,movie_ids))
The generated matrix for 15 users and 10 movies is:
m_1 m_2 m_3 m_4 m_5 m_6 m_7 m_8 m_9 m_10
u_1 0 0 0 1 0 1 1 0 0 0
u_2 0 0 0 0 1 0 1 1 0 1
u_3 1 0 0 0 0 0 0 0 1 0
u_4 0 0 0 0 1 0 1 0 1 0
u_5 1 0 1 0 0 1 0 0 1 0
u_6 0 1 0 0 0 0 0 0 0 0
u_7 1 0 0 0 0 0 0 1 0 1
u_8 0 1 0 0 0 1 1 0 1 1
u_9 0 1 0 0 1 0 1 0 0 0
u_10 0 0 0 1 1 1 0 0 1 0
u_11 0 0 0 1 0 0 0 0 1 0
u_12 1 1 0 0 0 0 1 0 0 0
u_13 1 0 0 0 1 0 0 0 0 0
u_14 0 0 0 0 1 0 0 0 1 0
u_15 0 0 0 0 1 0 0 1 0 1
So user 'u_1' has already watched movies 'm_4', 'm_6' and 'm_7', user 'u_2' has already watched
movies 'm_5', 'm_7', 'm_8' and 'm_10', and so on.
An interesting question about the matrix model above is: how many times have two movies appeared
together in the user preference vectors? For example, we can see that movies 'm_1' and 'm_2'
appear together only once, in the movie preference vector of user 'u_12'. However, movies 'm_5' and
'm_7' appear together three times, in the user movie preference vectors of the users 'u_2', 'u_4' and
'u_9'.
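Each such count is just the dot product of the two corresponding movie columns, so any pair can be
checked directly against the matrix above (a quick sketch):
> sum(m_user_movie[,'m_1'] * m_user_movie[,'m_2'])
[1] 1
> sum(m_user_movie[,'m_5'] * m_user_movie[,'m_7'])
[1] 3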
We can obtain the full co-occurrence matrix by transposing the user movie preference matrix and
multiplying the transposed matrix by the original one.
> co_movie <- t(m_user_movie) %*% m_user_movie
m_1 m_2 m_3 m_4 m_5 m_6 m_7 m_8 m_9 m_10
m_1 5 1 1 0 1 1 1 1 2 1
m_2 1 4 0 0 1 1 3 0 1 1
m_3 1 0 1 0 0 1 0 0 1 0
m_4 0 0 0 3 1 2 1 0 2 0
m_5 1 1 0 1 7 1 3 2 3 2
m_6 1 1 1 2 1 4 2 0 3 1
m_7 1 3 0 1 3 2 6 1 2 2
m_8 1 0 0 0 2 0 1 3 0 3
m_9 2 1 1 2 3 3 2 0 7 1
m_10 1 1 0 0 2 1 2 3 1 4
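As a side note, base R's crossprod() computes the same product t(X) %*% X in a single call and is
usually somewhat faster; an equivalent one-liner:
> co_movie <- crossprod(m_user_movie)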
The diagonal elements of the co-occurrence matrix shall be set to zero because we are not interested in
how many times a movie appears with itself in the user preference vectors.
> diag(co_movie) <- 0
The final co-occurrence matrix is:
m_1 m_2 m_3 m_4 m_5 m_6 m_7 m_8 m_9 m_10
m_1 0 1 1 0 1 1 1 1 2 1
m_2 1 0 0 0 1 1 3 0 1 1
m_3 1 0 0 0 0 1 0 0 1 0
m_4 0 0 0 0 1 2 1 0 2 0
m_5 1 1 0 1 0 1 3 2 3 2
m_6 1 1 1 2 1 0 2 0 3 1
m_7 1 3 0 1 3 2 0 1 2 2
m_8 1 0 0 0 2 0 1 0 0 3
m_9 2 1 1 2 3 3 2 0 0 1
m_10 1 1 0 0 2 1 2 3 1 0
Recommendations for a particular user can be generated by calculating the matrix product between the co-
occurrence matrix and that user's movie preference vector. I will explain the idea behind this
product below. For example, user 'u_1' has the following movie preference vector:
> m_user_movie[1,]
m_1 m_2 m_3 m_4 m_5 m_6 m_7 m_8 m_9 m_10
0 0 0 1 0 1 1 0 0 0
The recommendation vector can be calculated as:
> co_movie %*% m_user_movie[1,]
[,1]
m_1 2
m_2 4
m_3 1
m_4 3
m_5 5
m_6 4
m_7 3
m_8 1
m_9 7
m_10 3
What are the weighting numbers obtained in the recommendation vector? For each movie, the weight is
the number of times that movie appeared together with the movies contained in the user preference
vector. In the example above, the weight for movie 'm_1' is 2 because it appeared together once with
'm_6' and once with 'm_7'; the weight for movie 'm_2' is 4 because it appeared together once with
'm_6' and three times with 'm_7'; the weight for movie 'm_9' is 7 because it appeared together twice
with 'm_4', three times with 'm_6' and twice with 'm_7'.
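In other words, each weight is the sum of the co-occurrence entries between that movie and the user's
preferred movies 'm_4', 'm_6' and 'm_7'. For example, the weight for 'm_9' can be verified directly:
> sum(co_movie['m_9', c('m_4','m_6','m_7')])
[1] 7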
In the movie recommendation vector we shall set to zero all movies already preferred by the user:
> co_movie %*% m_user_movie[1,] * (1-m_user_movie[1,])
[,1]
m_1 2
m_2 4
m_3 1
m_4 0
m_5 5
m_6 0
m_7 0
m_8 1
m_9 7
m_10 3
The movie with the maximum weight in the recommendation vector is the one that shall be
recommended to the user. In our example, user 'u_1' will be recommended movie 'm_9', which has
the highest weight of 7.
The steps explained above can be wrapped into a recommendation function:
> recommend <- function(co_movie, a_user_movie){
    # weight each movie by its co-occurrences with the user's preferred movies
    a_rec <- co_movie %*% a_user_movie
    # zero out movies the user has already watched
    a_rec <- a_rec * (1 - a_user_movie)
    # pick the movie with the highest weight
    max_movie_idx <- which.max(a_rec)
    movie_id <- paste(c('m', max_movie_idx), collapse='_')
    return(movie_id)
  }
> recommend(co_movie, m_user_movie[1,])
[1] "m_9"
> recommend(co_movie, m_user_movie[15,])
[1] "m_7"
Sparse Matrices and Vectors
Matrices and vectors where most of the elements are zero are called sparse. There are many approaches
for representing sparse structures in memory and for implementing linear algebra operations on them,
such as the product of a matrix and a sparse vector.
Each user movie preference vector in the example above is a sparse vector: zeros appear in these
vectors with probability 0.7 and ones with probability 0.3.
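One off-the-shelf option in R is the Matrix package, which stores only the non-zero entries and supports
the usual linear algebra operators. A minimal sketch, assuming the Matrix package is installed (the rest
of this section shows the idea behind such sparse optimizations by hand):
> library(Matrix)
> m_sparse <- Matrix(m_user_movie, sparse=TRUE)   # sparse user movie preference matrix
> co_sparse <- crossprod(m_sparse)                # sparse co-occurrence matrix
> diag(co_sparse) <- 0
> co_sparse %*% m_sparse[1,]                      # same recommendation weights as before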
The weight for the movie 'm_1' in the recommendation vector above is obtained the following way:
0*0 + 1*0 + 1*0 + 0*1 + 1*0 + 1*1 + 1*1 + 1*0 + 2*0 + 1*0 = 1*1 + 1*1 = 2
The weight for the movie 'm_2' in the recommendation vector above is obtained as:
1*0 + 0*0 + 0*0 + 0*1 + 1*0 + 1*1 + 3*1 + 0*0 + 1*0 + 1*0 = 1*1 + 3*1 = 4
The total number of operations for the whole recommendation vector is 10 * (10 multiplications + 9
additions) = 190. Most of these 190 operations are not needed. There is a simple trick that can greatly
reduce the number of operations involved. First we need to find the positions in the user movie
preference vector that contain 1's. For user 'u_1' these positions are:
> which(m_user_movie[1,]==1)
m_4 m_6 m_7
4 6 7
The recommendation vector can be obtained by summing the 4th, 6th and 7th columns of the co-
occurrence matrix:
> co_movie[,4]+co_movie[,6]+co_movie[,7]
m_1 m_2 m_3 m_4 m_5 m_6 m_7 m_8 m_9 m_10
2 4 1 3 5 4 3 1 7 3
The result is the same as the result obtained with the basic matrix product rule. However, the number
of operations is reduced to only 20 additions.
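A quick sanity check that the two computations agree:
> all(rowSums(co_movie[, which(m_user_movie[1,]==1)]) == co_movie %*% m_user_movie[1,])
[1] TRUE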
This approach can be expressed in a more compact form, resulting in the following recommender function:
> recommend_1 <- function(co_movie, a_user_movie){
    # sum only the co-occurrence columns of the user's preferred movies;
    # drop=FALSE keeps a matrix even when a single column is selected
    a_rec <- rowSums(co_movie[, which(a_user_movie > 0), drop=FALSE])
    # zero out movies the user has already watched
    a_rec <- a_rec * (1 - a_user_movie)
    # pick the movie with the highest weight
    max_movie_idx <- which.max(a_rec)
    movie_id <- paste(c('m', max_movie_idx), collapse='_')
    return(movie_id)
  }
> recommend_1(co_movie,m_user_movie[1,])
[1] "m_9"
This kind of linear algebra optimization is essential when dealing with Big Data. In reality, systems
contain thousands or millions of items, the matrices are very sparse, and the optimization presented in
this article will drive you toward a feasible system and good performance.