ICMLA'11, Honolulu, Hawaii




COLLABORATIVE
FILTERING WITH CCAM
Presenter: Meng-Lun Wu
Authors: Meng-Lun Wu, Chia-Hui Chang, and Rei-Zhe Liu
Date: 2011/12/21




Outline
• Introduction
• Related Work
• Preliminary
• Collaborative Filtering with CCAM
• Experiment
• Conclusion




Introduction (1/2)
• In any recommender system, the number of ratings already
 obtained is usually very small compared to the number of
 ratings that need to be predicted.

• Dimensionality reduction methods are one possible solution,
 since they can alleviate data sparsity.

• Clustering is typically the simplest such method to apply to
 recommender systems: it yields a compact model and avoids
 the sparsity problem.




Introduction (2/2)
• In recent years, co-clustering based on information theory has
 attracted growing attention.

• We have extended an information-theoretic co-clustering
 algorithm to augmented data matrices; the result is called
 Co-Clustering with Augmented data Matrix (CCAM).

• In this paper, we consider how to alleviate the sparsity problem
 and achieve precise predictions through Collaborative Filtering
 with CCAM.




Related Work
• Information-theoretic co-clustering
   • Dhillon et al. (2003) derived a co-clustering algorithm from
     information theory that optimizes an objective function based on the
     loss of mutual information between clustered random variables.

• Matrix factorization co-clustering
  • Chen et al. (2008) linearly combined user-based CF, item-based CF,
    and matrix factorization results to predict ratings, relying on
    orthogonal nonnegative matrix tri-factorization (ONMTF).

  • Li et al. (2009) presented a cross-domain collaborative filtering
    method that co-clusters movie information via ONMTF and
    reconstructs knowledge for recommending books and movies.




Preliminary (1/2)
• Suppose we are given a click matrix R over a user set
 U={u1, u2, …, unu} and an ad set A={a1, a2, …, ana}.
    • nu and na denote the number of users and ads, respectively.



• Memory-based CF methods inevitably run into data sparsity
 before they can find similar neighbors.
    • Dhillon et al. (2003) proposed a co-clustering algorithm that
      monotonically decreases the information loss of tabular data to form a
      compact model.
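
As an illustration of the setup above, the following minimal sketch turns a click-count matrix R into a joint distribution p(U, A) and its marginals. The toy counts and the function names are assumptions made for this example; they do not come from the paper's data.

```python
def joint_distribution(clicks):
    """Normalize a user-by-ad click-count matrix into p(u, a)."""
    total = sum(sum(row) for row in clicks)
    if total == 0:
        raise ValueError("click matrix contains no clicks")
    return [[count / total for count in row] for row in clicks]


def marginals(p):
    """Return (p(u), p(a)) for a joint distribution given as a list of rows."""
    p_u = [sum(row) for row in p]
    p_a = [sum(col) for col in zip(*p)]
    return p_u, p_a


# Hypothetical toy data: 3 users (rows) x 4 ads (columns).
R = [
    [2, 0, 1, 0],
    [0, 3, 0, 0],
    [1, 0, 0, 3],
]
p_ua = joint_distribution(R)
p_u, p_a = marginals(p_ua)
```

Most entries of a real click matrix are zero, which is the sparsity that the co-clustering step is meant to compress.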




Preliminary (2/2)
• Assume U and A are random variables with a joint probability
 distribution p(U, A) and marginal distributions p(U) and p(A). The
 mutual information I(U; A) is defined as

 I(U; A) = \sum_{u \in U} \sum_{a \in A} p(u, a) \log \frac{p(u, a)}{p(u)\, p(a)}

• Suppose there are G1 user clusters CU={cu(1), cu(2), …, cu(G1)} and G2
 ad clusters CA={ca(1), ca(2), …, ca(G2)}. To judge the quality of
 a co-clustering, we define the loss in mutual information as

 I(U; A) - I(CU; CA)


• PROPOSITION 1. The following properties have also been stated and
 proven:
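
The quantities on this slide can be checked numerically. The sketch below computes I(U; A) and the loss I(U; A) - I(CU; CA) for a toy joint distribution. It also evaluates the identity shown by Dhillon et al. (2003), namely that this loss equals KL(p || q) with q(u, a) = p(cu, ca) p(u|cu) p(a|ca). That this identity is one of the properties meant by Proposition 1 is an assumption. The toy matrix, the cluster labels, and all function names are illustrative only.

```python
import math


def marginals(p):
    """Row and column marginals of a joint distribution (list of rows)."""
    return [sum(row) for row in p], [sum(col) for col in zip(*p)]


def mutual_information(p):
    """I(X; Y) in nats; zero cells contribute nothing."""
    p_x, p_y = marginals(p)
    return sum(
        v * math.log(v / (p_x[i] * p_y[j]))
        for i, row in enumerate(p)
        for j, v in enumerate(row)
        if v > 0
    )


def cluster_joint(p, row_labels, col_labels, g1, g2):
    """p(cu, ca): sum p(u, a) over the members of each co-cluster."""
    q = [[0.0] * g2 for _ in range(g1)]
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            q[row_labels[i]][col_labels[j]] += v
    return q


def mutual_information_loss(p, row_labels, col_labels, g1, g2):
    """I(U; A) - I(CU; CA) for one co-clustering."""
    clustered = cluster_joint(p, row_labels, col_labels, g1, g2)
    return mutual_information(p) - mutual_information(clustered)


def kl_to_approximation(p, row_labels, col_labels, g1, g2):
    """KL(p || q) with q(u, a) = p(cu, ca) p(u | cu) p(a | ca)."""
    p_u, p_a = marginals(p)
    p_c = cluster_joint(p, row_labels, col_labels, g1, g2)
    p_cu, p_ca = marginals(p_c)
    total = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            if v > 0:
                cu, ca = row_labels[i], col_labels[j]
                q = p_c[cu][ca] * (p_u[i] / p_cu[cu]) * (p_a[j] / p_ca[ca])
                total += v * math.log(v / q)
    return total


# Hypothetical toy joint distribution: 3 users x 4 ads.
p_ua = [
    [0.2, 0.0, 0.1, 0.0],
    [0.0, 0.3, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.3],
]
user_labels = [0, 1, 0]   # G1 = 2 user clusters
ad_labels = [0, 1, 0, 0]  # G2 = 2 ad clusters
loss = mutual_information_loss(p_ua, user_labels, ad_labels, 2, 2)
kl = kl_to_approximation(p_ua, user_labels, ad_labels, 2, 2)
```

The loss is never negative, and it is zero when every user and every ad keeps its own cluster.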


Co-Clustering with Augmented data Matrix, CCAM (1/4)
• When Dhillon et al. (2003) first proposed minimizing the loss in
 mutual information, the method was designed for and applied to a
 single data table.
     • In many cases, however, related tables exist alongside the main data set and
       may provide useful information.


• Co-Clustering with Augmented data Matrix (CCAM) simultaneously
 updates the co-clusters across the main matrix and its augmented
 matrices to reduce the information loss.

• Two further sets, the feature set F={f1, f2, …, fnf} and the
 profile set P={p1, p2, …, pnp}, provide additional information about
 ads and users and form the augmented matrices.
  • nf and np denote the number of features and profiles, respectively.


Co-Clustering with Augmented data Matrix, CCAM (2/4)
• PROPOSITION 2. The properties extend to the case where p(A, F)
 and p(U, P) are also considered.

    • These extensions were also stated and proven.


• DEFINITION 1. The optimal co-clustering (CU, CA) that we seek
 minimizes
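
One way to read Definition 1 is as a weighted sum of three mutual-information losses. That reading is an assumption based on the weights λ and φ that appear in the parameter-tuning slides, and it is not confirmed by this slide. The sketch below uses the loss on p(U, A), plus λ times the loss on p(A, F) with only ads clustered, plus φ times the loss on p(U, P) with only users clustered. The toy distributions and function names are hypothetical.

```python
import math


def mutual_information(p):
    """I(X; Y) in nats for a joint distribution given as a list of rows."""
    p_x = [sum(row) for row in p]
    p_y = [sum(col) for col in zip(*p)]
    return sum(
        v * math.log(v / (p_x[i] * p_y[j]))
        for i, row in enumerate(p)
        for j, v in enumerate(row)
        if v > 0
    )


def merge_rows(p, labels, n_clusters):
    """Sum the rows of p that share a cluster label."""
    merged = [[0.0] * len(p[0]) for _ in range(n_clusters)]
    for label, row in zip(labels, p):
        for j, v in enumerate(row):
            merged[label][j] += v
    return merged


def transpose(p):
    return [list(col) for col in zip(*p)]


def weighted_loss(p_ua, p_af, p_up, user_labels, ad_labels, g1, g2, lam, phi):
    """Assumed objective: click-matrix loss plus weighted augmented losses."""
    # Click matrix: merge users (rows), then ads (columns).
    by_user = merge_rows(p_ua, user_labels, g1)
    p_cu_ca = transpose(merge_rows(transpose(by_user), ad_labels, g2))
    loss_ua = mutual_information(p_ua) - mutual_information(p_cu_ca)
    # Ad-feature matrix: only the ad side is clustered (assumption).
    loss_af = mutual_information(p_af) - mutual_information(
        merge_rows(p_af, ad_labels, g2))
    # User-profile matrix: only the user side is clustered (assumption).
    loss_up = mutual_information(p_up) - mutual_information(
        merge_rows(p_up, user_labels, g1))
    return loss_ua + lam * loss_af + phi * loss_up


# Hypothetical toy distributions; each sums to 1.
p_ua = [[0.2, 0.0, 0.1, 0.0], [0.0, 0.3, 0.0, 0.0], [0.1, 0.0, 0.0, 0.3]]
p_af = [[0.1, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.3]]  # 4 ads x 2 features
p_up = [[0.2, 0.1], [0.1, 0.3], [0.3, 0.0]]              # 3 users x 2 profiles
user_labels, ad_labels = [0, 1, 0], [0, 1, 0, 0]

objective = weighted_loss(p_ua, p_af, p_up, user_labels, ad_labels,
                          2, 2, lam=0.2, phi=0.1)
baseline = weighted_loss(p_ua, p_af, p_up, user_labels, ad_labels,
                         2, 2, lam=0.0, phi=0.0)
```

With λ = φ = 0 the expression reduces to the single-table loss, so the weights control how strongly the augmented matrices influence the co-clusters.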


Co-Clustering with Augmented data Matrix, CCAM (3/4)

Algorithm 1: Co-Clustering with Augmented data Matrix algorithm


Collaborative filtering with CCAM (1/5)


Collaborative filtering with CCAM (2/5)
• DEFINITION 3. Because CCAM is built on KL divergence, its
 distance metrics take a similar form.
    • We define the distance between each user and each user cluster, and between
      each ad and each ad cluster.




• Note that the ad cluster prototype and the user cluster prototype of
 CCAM are taken to be
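
A KL-style distance between a user and a user cluster can be sketched as follows. The exact formulas of Definition 3 are not reproduced here, so the choices below are assumptions: the cluster prototype is the normalized sum of its members' rows, p(A | cu), the distance is KL(p(A | u) || p(A | cu)), and a small eps guards against zero prototype entries. The same pattern would apply to ads and ad clusters on the transposed matrix.

```python
import math


def normalize(row):
    """Scale a non-negative row so that it sums to 1."""
    total = sum(row)
    return [v / total for v in row]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps (an assumption) avoids division by zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def cluster_prototype(p_ua, members):
    """Assumed prototype p(A | cu): normalized sum of the member users' rows."""
    n_ads = len(p_ua[0])
    return normalize([sum(p_ua[u][j] for u in members) for j in range(n_ads)])


def nearest_user_cluster(p_ua, user, clusters):
    """Index of the cluster whose prototype is closest in KL divergence."""
    p_a_given_u = normalize(p_ua[user])
    distances = [
        kl_divergence(p_a_given_u, cluster_prototype(p_ua, members))
        for members in clusters
    ]
    return min(range(len(clusters)), key=distances.__getitem__)


# Hypothetical toy joint distribution: 3 users x 4 ads.
p_ua = [
    [0.2, 0.0, 0.1, 0.0],
    [0.0, 0.3, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.3],
]
clusters = [[0, 2], [1]]  # user indices per cluster
assigned = nearest_user_cluster(p_ua, 0, clusters)
```

KL divergence is asymmetric, so the order of the arguments matters; here the user's distribution is the first argument.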




Collaborative filtering with CCAM (3/5)




Collaborative filtering with CCAM (4/5)




Collaborative filtering with CCAM (5/5)




Data set
• The data set used in the experiments was obtained from a financial
 social website, Ad$Mart, and covers 2009/09/01 to 2010/03/31.

• For each test user, 15 observed click rates (Given15) are provided
 to find nearest neighbors, and the remaining click rates are used for
 evaluation.

• To ensure that each test user has clicked at least 15 ads, only users
 with more than 20 clicked ads and ads with more than 10 clicked
 user-ad pairs are retained.
  • User-Ad: The pre-processing click data comes from 1786 users and 520 ads. After
    preprocessing, we turn it into a joint probability distribution over users and ads, and also
    into a click-rate matrix scaled from 1 to 5.
  • Ad-Feature: An advertisement feature data set comprising 37 statistics for 530 ads.
  • User-Profile: A questionnaire data set provided by 520 users on 24 survey questions.
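
The Given15 protocol described above can be sketched as a per-user split. How the 15 observed click rates are chosen is not stated, so random sampling with a fixed seed is an assumption here, as are the function names and the toy click rates.

```python
import random


def eligible_users(click_rates, min_clicked_ads=20):
    """Keep users with more than min_clicked_ads clicked ads."""
    return {
        user: rates
        for user, rates in click_rates.items()
        if len(rates) > min_clicked_ads
    }


def given_n_split(user_rates, n_given=15, seed=0):
    """Split one test user's {ad: rate} dict into observed and held-out parts.

    The observed ads are drawn at random (an assumption).
    """
    if len(user_rates) <= n_given:
        raise ValueError("user needs more than n_given clicked ads")
    rng = random.Random(seed)
    ads = sorted(user_rates)
    observed_ads = set(rng.sample(ads, n_given))
    observed = {ad: user_rates[ad] for ad in ads if ad in observed_ads}
    held_out = {ad: user_rates[ad] for ad in ads if ad not in observed_ads}
    return observed, held_out


# Hypothetical test user with 22 clicked ads and click rates on the 1-5 scale.
toy_user = {f"ad{i}": 1 + i % 5 for i in range(22)}
observed, held_out = given_n_split(toy_user)
```

The observed part is what the neighbor search sees; the held-out part supplies the targets for the error metric.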




Evaluation methodology (1/2)




Evaluation methodology (2/2)




K1 and K2 tuning based on k-NN




G1 and G2 tuning based on k-Means
• We also have to determine which value of G1 yields a good MAE.
    • To avoid tuning too many parameters, we simply fix G2=10 and
      K1 = K2 = 5.
    • We observe how k-Means responds to different values of G1
      (7, 15, 30, 60) and keep the best one to apply to the other
      algorithms.
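
The tuning loop above amounts to scoring each candidate G1 by MAE and keeping the lowest. A minimal sketch follows. The `evaluate` callback stands in for "cluster with k-Means at this G1, predict, and measure MAE" and is hypothetical; the scoring function in the usage lines is synthetic and unrelated to the paper's measurements.

```python
def mean_absolute_error(predicted, actual):
    """MAE: average absolute difference between predictions and targets."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def best_setting(candidates, evaluate):
    """Return (setting, mae) for the candidate with the lowest MAE.

    evaluate is a hypothetical callback mapping a setting to its MAE.
    """
    mae, setting = min((evaluate(c), c) for c in candidates)
    return setting, mae


# Candidate values of G1 from the slide; the scorer below is synthetic.
g1_candidates = [7, 15, 30, 60]
example_mae = mean_absolute_error([1, 2, 3], [1, 3, 5])
chosen, score = best_setting(g1_candidates, lambda g1: abs(g1 - 30))
```

Fixing G2 and K1 = K2 while sweeping only G1 keeps the search one-dimensional, at the cost of possibly missing a better joint setting.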




Parameter tuning with CCAM (1/2)
• To evaluate the co-clustering result, we run a classification
 algorithm (Weka J48) on the user data and measure the F-measure
 under 10-fold cross-validation; the ad data is handled the same way.
    • The clustering result of the user data (user-ad matrix and user-profile
      matrix) serves as the target labels for evaluating the user clustering; the
      same applies to the ad data (ad-user matrix and ad-feature matrix).

    • To examine the effectiveness of co-clustering, we reduce the columns of the
      user-ad matrix to a smaller user-ad cluster matrix. The reduced data is then
      added to the user data for classification; the ad data is treated likewise.

[Figure: User data | User-ad cluster | Clustering result of user-ad and user-profile]
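
The column reduction described above can be sketched as follows. Summing the columns that belong to the same ad cluster is an assumption (averaging would be another option), and the helper names and toy values are hypothetical. The classifier step with Weka J48 is not reproduced.

```python
def reduce_columns(matrix, col_labels, n_clusters):
    """Collapse ad columns into ad-cluster columns by summing (assumption)."""
    reduced = []
    for row in matrix:
        merged = [0.0] * n_clusters
        for value, label in zip(row, col_labels):
            merged[label] += value
        reduced.append(merged)
    return reduced


def append_features(base_rows, extra_rows):
    """Concatenate each user's existing features with the reduced columns."""
    return [list(base) + list(extra) for base, extra in zip(base_rows, extra_rows)]


# Hypothetical toy data: 2 users x 4 ads, ads grouped into 2 clusters.
user_ad = [[1, 0, 2, 0], [0, 3, 0, 1]]
ad_labels = [0, 1, 0, 1]
user_ad_cluster = reduce_columns(user_ad, ad_labels, 2)

# Hypothetical user-profile answers, extended with the reduced columns.
user_profile = [[4, 2], [1, 5]]
user_data = append_features(user_profile, user_ad_cluster)
```

The user cluster labels would then serve as the classification target for these rows.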




Parameter tuning with CCAM (2/2)




• With G1=60, the best setting is λ=0.2, φ=0.1.

• We therefore use these optimal CCAM parameters in the next section
 to compare CCAM with the other algorithms.




Results
• Table 3 compares the model-based approaches.

• Table 4 compares the hybrid-model approaches under the
 previous parameter settings.




Conclusion
• In this paper, we applied the rating framework of Chen et al. to
 evaluate the performance of hybrid CF with various model constructions.

• To give a fair comparison, we first tuned each individual approach
 for its best performance.

• We then compared four algorithms: CCAM, ITCC, k-Means, and
 k-NN. On the MAE metric, CCAM outperformed the other three
 algorithms.

• In the future, for a more thorough discussion, we will evaluate
 our algorithm on other real-world data sets,
     • such as MovieLens, EachMovie, and Book-Crossing, which contain users'
       movie and book ratings.




THANK YOU FOR
LISTENING.
Q&A

Weitere ähnliche Inhalte

Was ist angesagt?

Q-learning vertical handover scheme in two-tier LTE-A networks
Q-learning vertical handover scheme in two-tier LTE-A networks Q-learning vertical handover scheme in two-tier LTE-A networks
Q-learning vertical handover scheme in two-tier LTE-A networks IJECEIAES
 
Traffic models and estimation
Traffic models and estimation Traffic models and estimation
Traffic models and estimation Mina Yonan
 
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONSA METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONSVLSICS Design
 
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGA HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGcscpconf
 
Real-time PMU Data Recovery Application Based on Singular Value Decomposition
Real-time PMU Data Recovery Application Based on Singular Value DecompositionReal-time PMU Data Recovery Application Based on Singular Value Decomposition
Real-time PMU Data Recovery Application Based on Singular Value DecompositionPower System Operation
 
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...Sunny Kr
 
A multiperiod set covering location model for dynamic redeployment of ambulances
A multiperiod set covering location model for dynamic redeployment of ambulancesA multiperiod set covering location model for dynamic redeployment of ambulances
A multiperiod set covering location model for dynamic redeployment of ambulancesHari Rajagopalan
 
Bayesian Co clustering
Bayesian Co clusteringBayesian Co clustering
Bayesian Co clusteringlau
 
A bi objective workflow application
A bi objective workflow applicationA bi objective workflow application
A bi objective workflow applicationIJITE
 
IRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature DescriptorIRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature DescriptorIRJET Journal
 
TOPOLOGY AWARE LOAD BALANCING FOR GRIDS
TOPOLOGY AWARE LOAD BALANCING FOR GRIDS TOPOLOGY AWARE LOAD BALANCING FOR GRIDS
TOPOLOGY AWARE LOAD BALANCING FOR GRIDS ijgca
 

Was ist angesagt? (13)

Q-learning vertical handover scheme in two-tier LTE-A networks
Q-learning vertical handover scheme in two-tier LTE-A networks Q-learning vertical handover scheme in two-tier LTE-A networks
Q-learning vertical handover scheme in two-tier LTE-A networks
 
Traffic models and estimation
Traffic models and estimation Traffic models and estimation
Traffic models and estimation
 
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONSA METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONS
 
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGA HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
 
1 s2.0-s0142061515005086-main
1 s2.0-s0142061515005086-main1 s2.0-s0142061515005086-main
1 s2.0-s0142061515005086-main
 
Real-time PMU Data Recovery Application Based on Singular Value Decomposition
Real-time PMU Data Recovery Application Based on Singular Value DecompositionReal-time PMU Data Recovery Application Based on Singular Value Decomposition
Real-time PMU Data Recovery Application Based on Singular Value Decomposition
 
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
 
A multiperiod set covering location model for dynamic redeployment of ambulances
A multiperiod set covering location model for dynamic redeployment of ambulancesA multiperiod set covering location model for dynamic redeployment of ambulances
A multiperiod set covering location model for dynamic redeployment of ambulances
 
Bayesian Co clustering
Bayesian Co clusteringBayesian Co clustering
Bayesian Co clustering
 
A bi objective workflow application
A bi objective workflow applicationA bi objective workflow application
A bi objective workflow application
 
Learning global pooling operators in deep neural networks for image retrieval...
Learning global pooling operators in deep neural networks for image retrieval...Learning global pooling operators in deep neural networks for image retrieval...
Learning global pooling operators in deep neural networks for image retrieval...
 
IRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature DescriptorIRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature Descriptor
 
TOPOLOGY AWARE LOAD BALANCING FOR GRIDS
TOPOLOGY AWARE LOAD BALANCING FOR GRIDS TOPOLOGY AWARE LOAD BALANCING FOR GRIDS
TOPOLOGY AWARE LOAD BALANCING FOR GRIDS
 

Andere mochten auch

地震知識
地震知識地震知識
地震知識AllenWu
 
Co-clustering with augmented data
Co-clustering with augmented dataCo-clustering with augmented data
Co-clustering with augmented dataAllenWu
 
DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
DSTree: A Tree Structure for the Mining of Frequent Sets from Data StreamsDSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
DSTree: A Tree Structure for the Mining of Frequent Sets from Data StreamsAllenWu
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringAllenWu
 
Ch4.mapreduce algorithm design
Ch4.mapreduce algorithm designCh4.mapreduce algorithm design
Ch4.mapreduce algorithm designAllenWu
 
Collaborative filtering using orthogonal nonnegative matrix
Collaborative filtering using orthogonal nonnegative matrixCollaborative filtering using orthogonal nonnegative matrix
Collaborative filtering using orthogonal nonnegative matrixAllenWu
 
Co clustering by-block_value_decomposition
Co clustering by-block_value_decompositionCo clustering by-block_value_decomposition
Co clustering by-block_value_decompositionAllenWu
 
Mind the Gap: Another look at the problem of the semantic gap in image retrieval
Mind the Gap: Another look at the problem of the semantic gap in image retrievalMind the Gap: Another look at the problem of the semantic gap in image retrieval
Mind the Gap: Another look at the problem of the semantic gap in image retrievalJonathon Hare
 
Semantics In Digital Photos A Contenxtual Analysis
Semantics In Digital Photos A Contenxtual AnalysisSemantics In Digital Photos A Contenxtual Analysis
Semantics In Digital Photos A Contenxtual AnalysisAllenWu
 
2013 11 01(fast_grbf-nmf)_for_share
2013 11 01(fast_grbf-nmf)_for_share2013 11 01(fast_grbf-nmf)_for_share
2013 11 01(fast_grbf-nmf)_for_shareTatsuya Yokota
 
Tutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsTutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsYONG ZHENG
 

Andere mochten auch (11)

地震知識
地震知識地震知識
地震知識
 
Co-clustering with augmented data
Co-clustering with augmented dataCo-clustering with augmented data
Co-clustering with augmented data
 
DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
DSTree: A Tree Structure for the Mining of Frequent Sets from Data StreamsDSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
Ch4.mapreduce algorithm design
Ch4.mapreduce algorithm designCh4.mapreduce algorithm design
Ch4.mapreduce algorithm design
 
Collaborative filtering using orthogonal nonnegative matrix
Collaborative filtering using orthogonal nonnegative matrixCollaborative filtering using orthogonal nonnegative matrix
Collaborative filtering using orthogonal nonnegative matrix
 
Co clustering by-block_value_decomposition
Co clustering by-block_value_decompositionCo clustering by-block_value_decomposition
Co clustering by-block_value_decomposition
 
Mind the Gap: Another look at the problem of the semantic gap in image retrieval
Mind the Gap: Another look at the problem of the semantic gap in image retrievalMind the Gap: Another look at the problem of the semantic gap in image retrieval
Mind the Gap: Another look at the problem of the semantic gap in image retrieval
 
Semantics In Digital Photos A Contenxtual Analysis
Semantics In Digital Photos A Contenxtual AnalysisSemantics In Digital Photos A Contenxtual Analysis
Semantics In Digital Photos A Contenxtual Analysis
 
2013 11 01(fast_grbf-nmf)_for_share
2013 11 01(fast_grbf-nmf)_for_share2013 11 01(fast_grbf-nmf)_for_share
2013 11 01(fast_grbf-nmf)_for_share
 
Tutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsTutorial: Context In Recommender Systems
Tutorial: Context In Recommender Systems
 

Ähnlich wie Collaborative filtering with CCAM

[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
[CIKM 2014] Deviation-Based Contextual SLIM RecommendersYONG ZHENG
 
IRJET- Boosting Response Aware Model-Based Collaborative Filtering
IRJET- Boosting Response Aware Model-Based Collaborative FilteringIRJET- Boosting Response Aware Model-Based Collaborative Filtering
IRJET- Boosting Response Aware Model-Based Collaborative FilteringIRJET Journal
 
Incremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringIncremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringAllen Wu
 
Collaborative Filtering Survey
Collaborative Filtering SurveyCollaborative Filtering Survey
Collaborative Filtering Surveymobilizer1000
 
Mining Large Streams of User Data for PersonalizedRecommenda.docx
Mining Large Streams of User Data for PersonalizedRecommenda.docxMining Large Streams of User Data for PersonalizedRecommenda.docx
Mining Large Streams of User Data for PersonalizedRecommenda.docxARIV4
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentShaleen Kumar Gupta
 
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent Itemsets
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent ItemsetsA Novel Collaborative Filtering Algorithm by Bit Mining Frequent Itemsets
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent ItemsetsLoc Nguyen
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET Journal
 
20 26 jan17 walter latex
20 26 jan17 walter latex20 26 jan17 walter latex
20 26 jan17 walter latexIAESIJEECS
 
Advances In Collaborative Filtering
Advances In Collaborative FilteringAdvances In Collaborative Filtering
Advances In Collaborative FilteringScott Donald
 
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...Daniel Valcarce
 
[RecSys 2014] Deviation-Based and Similarity-Based Contextual SLIM Recommenda...
[RecSys 2014] Deviation-Based and Similarity-Based Contextual SLIM Recommenda...[RecSys 2014] Deviation-Based and Similarity-Based Contextual SLIM Recommenda...
[RecSys 2014] Deviation-Based and Similarity-Based Contextual SLIM Recommenda...YONG ZHENG
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Alexander Decker
 
reference paper.pdf
reference paper.pdfreference paper.pdf
reference paper.pdfMayuRana1
 
Data mining projects topics for java and dot net
Data mining projects topics for java and dot netData mining projects topics for java and dot net
Data mining projects topics for java and dot netredpel dot com
 
MIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_BhatiaMIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_BhatiaRahul Bhatia
 
Clonal Selection Algorithm Parallelization with MPJExpress
Clonal Selection Algorithm Parallelization with MPJExpressClonal Selection Algorithm Parallelization with MPJExpress
Clonal Selection Algorithm Parallelization with MPJExpressAyi Purbasari
 

Ähnlich wie Collaborative filtering with CCAM (20)

[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
 
IRJET- Boosting Response Aware Model-Based Collaborative Filtering
IRJET- Boosting Response Aware Model-Based Collaborative FilteringIRJET- Boosting Response Aware Model-Based Collaborative Filtering
IRJET- Boosting Response Aware Model-Based Collaborative Filtering
 
Incremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clusteringIncremental collaborative filtering via evolutionary co clustering
Incremental collaborative filtering via evolutionary co clustering
 
Collaborative Filtering Survey
Collaborative Filtering SurveyCollaborative Filtering Survey
Collaborative Filtering Survey
 
Mining Large Streams of User Data for PersonalizedRecommenda.docx
Mining Large Streams of User Data for PersonalizedRecommenda.docxMining Large Streams of User Data for PersonalizedRecommenda.docx
Mining Large Streams of User Data for PersonalizedRecommenda.docx
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
JEDM_RR_JF_Final
JEDM_RR_JF_FinalJEDM_RR_JF_Final
JEDM_RR_JF_Final
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate Descent
 
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent Itemsets
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent ItemsetsA Novel Collaborative Filtering Algorithm by Bit Mining Frequent Itemsets
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent Itemsets
 
Distributed DBMS - Unit 6 - Query Processing
Distributed DBMS - Unit 6 - Query ProcessingDistributed DBMS - Unit 6 - Query Processing
Distributed DBMS - Unit 6 - Query Processing
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
 
20 26 jan17 walter latex
20 26 jan17 walter latex20 26 jan17 walter latex
20 26 jan17 walter latex
 
Advances In Collaborative Filtering
Advances In Collaborative FilteringAdvances In Collaborative Filtering
Advances In Collaborative Filtering
 
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
 
[RecSys 2014] Deviation-Based and Similarity-Based Contextual SLIM Recommenda...
[RecSys 2014] Deviation-Based and Similarity-Based Contextual SLIM Recommenda...[RecSys 2014] Deviation-Based and Similarity-Based Contextual SLIM Recommenda...
[RecSys 2014] Deviation-Based and Similarity-Based Contextual SLIM Recommenda...
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)
 
reference paper.pdf
reference paper.pdfreference paper.pdf
reference paper.pdf
 
Data mining projects topics for java and dot net
Data mining projects topics for java and dot netData mining projects topics for java and dot net
Data mining projects topics for java and dot net
 
MIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_BhatiaMIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_Bhatia
 
Clonal Selection Algorithm Parallelization with MPJExpress
Clonal Selection Algorithm Parallelization with MPJExpressClonal Selection Algorithm Parallelization with MPJExpress
Clonal Selection Algorithm Parallelization with MPJExpress
 

Kürzlich hochgeladen

Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxMichelleTuguinay1
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptxmary850239
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptxJonalynLegaspi2
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 

Collaborative filtering with CCAM

  • 1. ICMLA'11, Honolulu, Hawaii. COLLABORATIVE FILTERING WITH CCAM. Presenter: Meng-Lun Wu. Authors: Meng-Lun Wu, Chia-Hui Chang and Rei-Zhe Liu. Date: 2011/12/21
  • 2. Outline • Introduction • Related Work • Preliminary • Collaborative Filtering with CCAM • Experiment • Conclusion
  • 3. Introduction (1/2) • In any recommender system, the number of ratings already obtained is usually very small compared with the number of ratings that need to be predicted. • Dimensionality reduction methods are one possible solution, since they can alleviate data sparsity. • Clustering is typically the simplest such method to apply to recommender systems: it yields a compact model and mitigates the sparsity problem.
  • 4. Introduction (2/2) • In recent years, co-clustering based on information theory has attracted increasing attention. • We have extended an information-theoretic co-clustering algorithm to augmented data matrices; the result is called Co-Clustering with Augmented data Matrix (CCAM). • In this paper, we consider how to alleviate the sparsity problem and achieve precise predictions through Collaborative Filtering with CCAM.
  • 5. Related Work • Information theoretical co-clustering • Dhillon et al. (2003) developed a co-clustering algorithm from information theory that optimizes an objective function based on the loss of mutual information between clustered random variables. • Matrix factorization co-clustering • Chen et al. (2008) linearly combined user-based and item-based CF methods with matrix factorization results, relying on ONMTF, to predict ratings. • Li et al. (2009) presented a novel cross-domain collaborative filtering method that co-clusters movie information via ONMTF and reconstructs the knowledge for recommending books and movies.
  • 6. Preliminary (1/2) • Suppose that we are given a click information matrix R over a user set U = {u1, u2, …, unu} and an ad set A = {a1, a2, …, ana}. • nu and na denote the number of users and ads, respectively. • Memory-based CF methods inevitably run into sparsity in the data they need before similar neighbors can be found. • Dhillon et al. (2003) proposed a co-clustering algorithm that monotonically decreases the information loss of tabular data to form a compact model.
  • 7. Preliminary (2/2) • Assume U and A are random variables over the user and ad sets, with joint probability distribution p(U, A) and marginal distributions p(U) and p(A). The mutual information is defined as I(U; A) = Σ_u Σ_a p(u, a) log [p(u, a) / (p(u) p(a))]. • Suppose there are G1 user clusters CU = {cu(1), cu(2), …, cu(G1)} and G2 ad clusters CA = {ca(1), ca(2), …, ca(G2)}. To judge the quality of a co-clustering, we define the loss in mutual information as I(U; A) - I(CU; CA). • PROPOSITION 1. There are also properties that are declared and proven, they are
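The two quantities on the Preliminary slides can be illustrated with a minimal sketch. It assumes the joint distribution p(U, A) is stored as a list of rows and that cluster assignments are plain integer label lists; the function names and the toy data are hypothetical and are not taken from the slides.

```python
import math

def mutual_information(joint):
    """Return I(X; Y) in nats for a joint distribution given as a list of rows."""
    row_marginal = [sum(row) for row in joint]
    col_marginal = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero cells contribute nothing to the sum
                total += p * math.log(p / (row_marginal[i] * col_marginal[j]))
    return total

def cluster_joint(joint, row_labels, col_labels):
    """Aggregate p(u, a) into p(cu, ca) by summing over cluster members."""
    n_rows = max(row_labels) + 1
    n_cols = max(col_labels) + 1
    clustered = [[0.0] * n_cols for _ in range(n_rows)]
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            clustered[row_labels[i]][col_labels[j]] += p
    return clustered

def mutual_information_loss(joint, row_labels, col_labels):
    """Loss in mutual information: I(U; A) - I(CU; CA)."""
    clustered = cluster_joint(joint, row_labels, col_labels)
    return mutual_information(joint) - mutual_information(clustered)

# Toy data (made up): 4 users x 4 ads with two clean blocks.
p_ua = [
    [0.125, 0.125, 0.0, 0.0],
    [0.125, 0.125, 0.0, 0.0],
    [0.0, 0.0, 0.125, 0.125],
    [0.0, 0.0, 0.125, 0.125],
]
good = mutual_information_loss(p_ua, [0, 0, 1, 1], [0, 0, 1, 1])
bad = mutual_information_loss(p_ua, [0, 1, 0, 1], [0, 0, 1, 1])
```

On this toy matrix, a co-clustering that follows the block structure loses essentially no mutual information, while one that mixes the two user blocks loses all of it (about log 2 nats).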
  • 8. Co-Clustering with Augmented data Matrix, CCAM (1/4) • When Dhillon et al. (2003) first proposed the optimization problem of loss in mutual information, it was designed for and applied to a single data table. • In many cases, however, related tables exist besides the main data set and may provide useful information. • Co-Clustering with Augmented data Matrix (CCAM) simultaneously adjusts the co-clusters over multiple augmented data matrices to reduce the information loss. • Two further sets, the feature set F = {f1, f2, …, fnf} and the profile set P = {p1, p2, …, pnp}, carry additional information about ads and users and form the augmented matrices, where nf and np denote the number of features and profiles, respectively.
  • 9. Co-Clustering with Augmented data Matrix, CCAM (2/4) • PROPOSITION 2. Additional properties are recognized when p(A, F) and p(U, P) are considered; these were also stated and proven. • DEFINITION 1. The optimal co-clustering (CU, CA) that we seek would minimize
  • 10. Co-Clustering with Augmented data Matrix, CCAM (3/4)
  • 11. Algorithm 1: Co-Clustering with Augmented data Matrix algorithm
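The formula in DEFINITION 1 did not survive in this transcript, so the following is only a sketch of one plausible reading: the loss in mutual information on the user-ad matrix plus the losses on the two augmented matrices, weighted by the parameters λ and φ that the tuning slides mention. The weighted-sum form, the choice to leave features and profiles unclustered, and all function names are assumptions rather than the authors' stated objective.

```python
import math

def mutual_information(joint):
    """I(X; Y) in nats for a joint distribution stored as a list of rows."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    return sum(
        p * math.log(p / (rows[i] * cols[j]))
        for i, r in enumerate(joint)
        for j, p in enumerate(r)
        if p > 0
    )

def aggregate(joint, row_labels, col_labels):
    """Sum joint probabilities over the members of each row/column cluster."""
    out = [[0.0] * (max(col_labels) + 1) for _ in range(max(row_labels) + 1)]
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            out[row_labels[i]][col_labels[j]] += p
    return out

def info_loss(joint, row_labels, col_labels):
    """Mutual information lost when rows and columns are merged into clusters."""
    return mutual_information(joint) - mutual_information(
        aggregate(joint, row_labels, col_labels)
    )

def ccam_objective(p_ua, p_af, p_up, user_labels, ad_labels, lam, phi):
    """Assumed objective: user-ad loss + lam * ad-feature loss + phi * user-profile loss."""
    # Assumption: features and profiles stay unclustered (identity labels).
    feature_ids = list(range(len(p_af[0])))
    profile_ids = list(range(len(p_up[0])))
    return (
        info_loss(p_ua, user_labels, ad_labels)
        + lam * info_loss(p_af, ad_labels, feature_ids)
        + phi * info_loss(p_up, user_labels, profile_ids)
    )

# Toy data (made up): 4 users, 4 ads, 2 ad features, 2 profile answers.
p_ua = [
    [0.125, 0.125, 0.0, 0.0],
    [0.125, 0.125, 0.0, 0.0],
    [0.0, 0.0, 0.125, 0.125],
    [0.0, 0.0, 0.125, 0.125],
]
p_af = [[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]]
p_up = [[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]]

aligned = ccam_objective(p_ua, p_af, p_up, [0, 0, 1, 1], [0, 0, 1, 1], 0.2, 0.1)
mixed_ads = ccam_objective(p_ua, p_af, p_up, [0, 0, 1, 1], [0, 1, 0, 1], 0.2, 0.1)
```

Under this reading, an ad clustering that ignores the feature matrix is penalized twice: once through the user-ad term and once, scaled by λ, through the ad-feature term.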
  • 12. Collaborative filtering with CCAM (1/5)
  • 13. Collaborative filtering with CCAM (2/5) • DEFINITION 3. Since CCAM is built on KL-divergence, its distance metrics take a similar form. • Here we define the distance between each user and each user cluster, and between each ad and each ad cluster. • Note that the ad cluster prototype and user cluster prototype of CCAM would be regarded as
  • 14. Collaborative filtering with CCAM (3/5)
  • 15. Collaborative filtering with CCAM (4/5)
  • 16. Collaborative filtering with CCAM (5/5)
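The distance formulas of DEFINITION 3 are missing from this transcript. As a rough illustration of a KL-divergence-based distance between a user and a user cluster, the sketch below compares the user's click distribution over ads with a cluster prototype. The prototype used here (the normalized sum of the members' rows) is a simplifying assumption and is not necessarily the prototype CCAM defines; the smoothing constant `eps` and all names are likewise hypothetical.

```python
import math

def normalize(row):
    """Scale a non-negative row so it sums to 1 (all-zero rows stay zero)."""
    s = sum(row)
    if s == 0:
        return [0.0] * len(row)
    return [x / s for x in row]

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) in nats; q is floored at eps to keep the value finite."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def cluster_prototypes(joint, labels, n_clusters):
    """Assumed prototype: normalized sum of the rows assigned to each cluster."""
    sums = [[0.0] * len(joint[0]) for _ in range(n_clusters)]
    for row, c in zip(joint, labels):
        for j, p in enumerate(row):
            sums[c][j] += p
    return [normalize(s) for s in sums]

def nearest_cluster(row, prototypes):
    """Return (index of the closest prototype, list of all distances)."""
    p = normalize(row)
    dists = [kl_divergence(p, q) for q in prototypes]
    return min(range(len(dists)), key=dists.__getitem__), dists

# Toy data (made up): two user clusters with disjoint ad interests.
p_ua = [
    [0.125, 0.125, 0.0, 0.0],
    [0.125, 0.125, 0.0, 0.0],
    [0.0, 0.0, 0.125, 0.125],
    [0.0, 0.0, 0.125, 0.125],
]
prototypes = cluster_prototypes(p_ua, [0, 0, 1, 1], 2)
index, distances = nearest_cluster([3, 1, 0, 0], prototypes)
```

The same functions apply to ads by passing the transposed matrix, so a new ad can be matched to its nearest ad cluster in the same way.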
  • 17. Data set • The data set used in the experiments was obtained from a financial social website, Ad$Mart, and covers 2009/09/01 to 2010/03/31. • For each test user, 15 observed click rates (Given15) are provided to find nearest neighbors, and the remaining click rates are used for evaluation. • To ensure that each test user has clicked at least 15 ads, only users with more than 20 clicked ads and ads with more than 10 clicked user-ad pairs are retained. • User-Ad: The preprocessed click data covers 1786 users and 520 ads. After preprocessing, we convert it into a joint probability distribution over users and ads, and also into a click rate matrix scaled from 1 to 5. • Ad-Feature: An advertisement feature data set comprising 37 statistics for 530 ads. • User-Profile: A questionnaire data set provided by 520 users on 24 survey questions.
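The preprocessing described on the data set slide can be sketched as follows. The slide does not say whether the two filters are applied once or repeated until stable, nor how the Given15 ads are chosen, so the single-pass filter and the seeded random split below are assumptions, as are the dictionary layout and the function names.

```python
import random

def filter_clicks(clicks, min_ads_per_user=20, min_users_per_ad=10):
    """Keep pairs whose user clicked more than min_ads_per_user ads and whose
    ad was clicked by more than min_users_per_ad users (single pass, assumed).

    clicks maps (user, ad) to a click count.
    """
    ads_per_user = {}
    users_per_ad = {}
    for (user, ad), count in clicks.items():
        if count > 0:
            ads_per_user[user] = ads_per_user.get(user, 0) + 1
            users_per_ad[ad] = users_per_ad.get(ad, 0) + 1
    return {
        (user, ad): count
        for (user, ad), count in clicks.items()
        if count > 0
        and ads_per_user[user] > min_ads_per_user
        and users_per_ad[ad] > min_users_per_ad
    }

def to_joint_distribution(clicks):
    """Normalize click counts so that all (user, ad) cells sum to 1."""
    total = sum(clicks.values())
    return {pair: count / total for pair, count in clicks.items()}

def given_n_split(user_rates, n=15, seed=0):
    """Split one test user's click rates into n observed and the rest held out."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    observed_ads = set(rng.sample(sorted(user_rates), n))
    observed = {ad: r for ad, r in user_rates.items() if ad in observed_ads}
    held_out = {ad: r for ad, r in user_rates.items() if ad not in observed_ads}
    return observed, held_out

# Toy data (made up), with thresholds lowered so the filter has an effect.
toy_clicks = {("u1", "a1"): 2, ("u1", "a2"): 1, ("u2", "a1"): 3, ("u3", "a3"): 1}
kept = filter_clicks(toy_clicks, min_ads_per_user=1, min_users_per_ad=1)
joint = to_joint_distribution(kept)
observed, held_out = given_n_split({ad: 1 + ad % 5 for ad in range(20)}, n=15)
```

`given_n_split` raises `ValueError` when a user has fewer than `n` rated ads, which is the situation the filtering thresholds on the slide are meant to rule out.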
  • 18. Evaluation methodology (1/2)
  • 19. Evaluation methodology (2/2)
  • 20.  and  tuning based on k-NN
  • 21. G1 and G2 tuning based on k-Means • We also have to determine which value of G1 yields a good MAE. • To avoid tuning too many parameters, we simply fix G2 = 10 and K1 = K2 = 5. • We then observe how k-Means responds to different values of G1 (7, 15, 30, 60) and keep the best one for use with the other algorithms.
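The selection step on this slide amounts to scoring each candidate G1 by MAE and keeping the lowest. The sketch below uses the standard definition of mean absolute error; the `evaluate` callback stands in for a full train-and-predict run with k-Means and is hypothetical, since the slides do not show that pipeline.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def best_setting(candidates, evaluate):
    """Return (candidate with the lowest score, all scores keyed by candidate).

    evaluate is a caller-supplied function mapping a candidate (for example a
    value of G1) to its MAE on held-out click rates.
    """
    scores = {c: evaluate(c) for c in candidates}
    best = min(scores, key=scores.get)
    return best, scores
```

A call such as `best_setting([7, 15, 30, 60], evaluate)` would then reproduce the G1 sweep described above, with `evaluate` supplied by whichever clustering and prediction code is in use.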
  • 22. Parameter tuning with CCAM (1/2) • To evaluate the co-clustering result, we apply a classification algorithm (Weka J48) to the user data and measure the F-measure under 10-fold cross-validation; the ad data is treated in the same way. • We use the clustering result of the user data (user-ad matrix and user-profile matrix) as the target labels for evaluating user clustering, and likewise for the ad data (ad-user matrix and ad-feature matrix). • To examine the effectiveness of co-clustering, we reduce the columns of the user-ad matrix to a smaller user-ad cluster matrix. The reduced data is then added to the user data for classification, and likewise for the ad data. • [Diagram labels: clustering result; user-ad cluster; user data of user-ad and user-profile]
  • 23. Parameter tuning with CCAM (2/2) • We find that when G1 = 60, the best setting is λ = 0.2, φ = 0.1. • We therefore apply these optimal CCAM parameters in the next section to compare with the other algorithms.
  • 24. Results • Table 3 compares the model-based approaches. • Table 4 compares the hybrid model approaches under the previous parameter settings.
  • 25. Conclusion • In this paper, we applied Chen’s rating framework to evaluate the performance of hybrid CF with various model constructions. • To give a fair comparison, we started by tuning each individual approach for its best performance. • We compared four algorithms: CCAM, ITCC, k-Means and k-NN. The MAE metric shows that CCAM outperformed the other three. • In the future, for a more thorough discussion, we will evaluate our algorithm on other real-world data sets, such as MovieLens, EachMovie and Book-Crossing, which contain users' movie and book rating data.
  • 26. THANK YOU FOR LISTENING. Q&A