4. 1-1. My Research Domain
• Evaluating recommendation algorithms by ABM (Agent-Based Modeling)
– Recommendation approaches:
• Rule-based approach
• Content-based approach
• Collaborative Filtering (CF)
• Bayesian network
– Why CF?
• It is the approach most widely used on real websites
– Why ABM?
• With ABM, algorithms can be optimized for the market environment
Summer Seminar 2008 @Susukakedai http://umekoumeda.net/
5. 1-2. What's CF? (1/3)
• Have you used Amazon.com?
6. 1-3. What's CF? (2/3)
Collaborative Filtering (CF) algorithms are commonly used on e-commerce websites.
[Figure: a product recommendation on an e-commerce site]
7. 1-4. What's CF? (3/3)
[Figure: Prof. Deguchi's and Prof. Kizima's book lists contain the same books, so they have similar preferences. Based on people similar to him, CF recommends to Prof. Kizima the remaining books on Prof. Deguchi's list.]
8. 1-5. Contribution of this paper
• Problems of the basic CF algorithm
– Basic CF: Nearest Neighbor
– Scalability (performance)
• High scalability: even with many users, the system can recommend to them quickly
– Accuracy (quality)
• High accuracy: even when the data are sparse, the system can recommend items a user may like
• In this paper, the authors propose a new algorithm
– Item-based CF
– Both performance and quality can be improved
9. 1-6. Collaborative Filtering Process
Input data → CF algorithm → output interface
• Input: the user–item matrix
– U = {u1, u2, ..., um}: the set of users
– I = {i1, i2, ..., in}: the set of items
– Iui ⊆ I: the items that user ui has rated
– ai,j: the rating of item ij by user ui
• Output 1 — Prediction: Pa,j, the predicted degree of likeness of item ij by user ua
• Output 2 — Top-N recommendation: a list of N items that the user will like the most, Ir ⊂ I, with Ir ∩ Iua = ∅ (only items the user has not yet rated)
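The process above can be sketched with a small user–item matrix. The ratings, the `top_n` helper, and the prediction values are illustrative, not taken from the paper; the point is only the data shapes: a ratings matrix, the set Iui, and a top-N list disjoint from it.

```python
import numpy as np

# User-item rating matrix: rows are users u1..u4, columns items i1..i5.
# 0 means "not rated".
R = np.array([
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 0],
    [1, 1, 0, 5, 4],
    [0, 1, 5, 4, 0],
])

def rated_items(R, u):
    """I_ui: indices of the items user u has rated."""
    return set(np.nonzero(R[u])[0])

def top_n(predictions, rated, n):
    """Top-N recommendation: the n unrated items with the highest
    predicted rating, so the result is disjoint from I_ua."""
    candidates = [(p, i) for i, p in enumerate(predictions) if i not in rated]
    return [i for p, i in sorted(candidates, reverse=True)[:n]]

# Hypothetical predictions P_{a,j} for user u1 (index 0):
preds = [0.0, 0.0, 2.5, 0.0, 3.8]
print(top_n(preds, rated_items(R, 0), 2))  # → [4, 2]
```

Only items u1 never rated (i3 and i5, indices 2 and 4) can appear in the list, matching Ir ∩ Iua = ∅.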
10. 1-7. Variations of the CF Algorithm
• Memory-based approach (Nearest Neighbor)
– Procedure:
1. On-line, the system defines a set of users known as neighbors
2. The system produces a prediction or top-N recommendation
• Model-based approach
– Procedure:
1. Off-line, the system builds a model of user ratings
2. Using the model, the system produces a prediction or top-N recommendation
– How is the model built? e.g. Bayesian networks, clustering
11. 1-8. What's on-line and off-line?
• Off-line computation: performed automatically at a suitable interval
• On-line computation: performed quickly whenever a user uses the system
• Example (Google):
– Off-line: crawling, indexing, ranking
– On-line: when you input a query, the search engine outputs the result
12. 1-9. The problems of the basic CF
Weaknesses of Nearest Neighbor:
• Accuracy — sparsity of the user–item matrix: many users may have purchased well under 1% of all items, so the accuracy of the Nearest Neighbor algorithm may be poor
• Scalability — with millions of users and items, the Nearest Neighbor algorithm may suffer serious scalability problems
→ We need new CF algorithms.
14. 2-1. Overview of Item-based CF
• Off-line computation — item similarity computation:
– Si,j: the similarity between items ii and ij, computed from the user–item rating matrix (Ru,i)
• On-line computation — prediction computation:
– Pu,i: the degree of likeness of item i by user u, computed from the item–item similarities S
15. 2-2. Item Similarity Computation
• Cosine-based similarity
• Correlation-based similarity
• Adjusted cosine similarity
– Corrects for the difference in rating scale between different users
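A minimal sketch of the three measures, using NumPy. The formulas follow the standard definitions (plain cosine over the item columns, Pearson correlation over co-rating users, and cosine after subtracting each user's mean rating); the matrix and function names are illustrative, and 0 is assumed to mean "not rated".

```python
import numpy as np

def cosine_sim(R, i, j):
    """Cosine-based similarity between item columns i and j."""
    a, b = R[:, i], R[:, j]
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def correlation_sim(R, i, j):
    """Correlation-based (Pearson) similarity over co-rating users."""
    mask = (R[:, i] > 0) & (R[:, j] > 0)
    return np.corrcoef(R[mask, i], R[mask, j])[0, 1]

def adjusted_cosine_sim(R, i, j):
    """Adjusted cosine: subtract each user's mean rating first,
    which corrects for differences in rating scale between users."""
    mask = (R[:, i] > 0) & (R[:, j] > 0)          # users who rated both
    means = np.nanmean(np.where(R > 0, R, np.nan), axis=1)
    a = R[mask, i] - means[mask]
    b = R[mask, j] - means[mask]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

R = np.array([[5, 3, 4],
              [4, 2, 5],
              [1, 5, 2]], dtype=float)
print(round(cosine_sim(R, 0, 2), 3))  # → 0.966
```

The adjusted version can differ sharply from plain cosine: a user who rates everything high contributes little after their mean is removed.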
16. 2-3. Prediction Computation
• Weighted sum
– N: the set of items most similar to item i
– |N|: the neighborhood size
– The sum of the absolute similarities acts as the normalization coefficient
• Regression
– Ru,n is estimated by a regression model
– Ri: the target item's ratings (explanatory variable)
– Rn: the similar item's ratings (response variable)
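The weighted-sum rule can be sketched as follows. The similarity and rating values are illustrative, and the dictionary-based representation is a simplification of the paper's matrix form; the denominator is the normalization coefficient mentioned above.

```python
def predict_weighted_sum(user_ratings, item_sims, neighborhood_size):
    """P_{u,i}: weighted sum of the user's ratings on the items most
    similar to the target item, normalized by the total |similarity|."""
    # Keep the |N| most similar items that the user has actually rated.
    neighbors = sorted(
        ((s, n) for n, s in item_sims.items() if n in user_ratings),
        reverse=True,
    )[:neighborhood_size]
    num = sum(s * user_ratings[n] for s, n in neighbors)
    den = sum(abs(s) for s, n in neighbors)   # normalization coefficient
    return num / den if den else 0.0

# Similarities between the target item and rated items (illustrative):
sims = {"i1": 0.9, "i2": 0.5, "i3": 0.1}
ratings = {"i1": 5, "i2": 3, "i3": 1}
print(predict_weighted_sum(ratings, sims, 2))
```

With |N| = 2 only i1 and i2 are used, giving (0.9·5 + 0.5·3) / (0.9 + 0.5) ≈ 4.29, so the prediction stays on the original rating scale.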
17. 2-4. Time Complexity (1/2)
The time complexity of Nearest Neighbor (all on-line):
• User similarity computation
– Computing one user–user similarity, the recommender system scans n ratings → O(n)
– The system must compute m × m user–user similarities → O(m²n)
• Prediction computation
– Computing one Pi,j value, the system scans m user–user similarities → O(m)
• Total: O(m²n) + O(m)
18. 2-4. Time Complexity (2/2)
The time complexity of Item-based CF is better than Nearest Neighbor's:
• Off-line — item similarity computation
– Item–item similarity is static, as opposed to user–user similarity, so it is possible to precompute the item–item similarities (= the model)
• On-line — prediction computation
– Computing one Pi,j value, the system scans n item–item similarities → O(n)
• Total on-line cost: O(n)
20. 3-1. Experimental Procedure
1. Data dividing: the data set of (user, item, rating) tuples is divided into a train portion (used for parameter learning) and a test portion (used for evaluation)
2. Fixing the optimal values of the parameters:
• Similarity algorithm
• Train/test ratio (x): the sparsity level of the data
• Neighborhood size
3. Full experiment: to evaluate Item-based CF, the following are measured:
• Performance
• Quality
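Step 1 (data dividing) can be sketched like this; the split is random at ratio x, with x = 0.8 as in the paper's reported optimum, and the rating tuples are illustrative.

```python
import random

def split_ratings(ratings, x, seed=0):
    """Divide (user, item, rating) tuples into train/test at ratio x."""
    rng = random.Random(seed)   # fixed seed for a reproducible split
    shuffled = ratings[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * x)
    return shuffled[:cut], shuffled[cut:]

ratings = [("u1", "i2", 3), ("u2", "i1", 2), ("u6", "i3", 3),
           ("u3", "i1", 4), ("u4", "i2", 5)]
train, test = split_ratings(ratings, x=0.8)
print(len(train), len(test))  # → 4 1
```

Lower x means a sparser training matrix, which is why x doubles as the sparsity-level parameter in step 2.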
21. 3-2. Data Sets
• Data Sets
– Data from the website "MovieLens"
– MovieLens is a web-based recommender system
– Hundreds of users visit MovieLens to rate and receive recommendations for movies
– The data set was converted into a user–item matrix (943 users × 1682 items)
22. 3-3. Evaluation Metrics
• To evaluate the quality of a recommender system, we use MAE as the evaluation metric.
• MAE: Mean Absolute Error
– MAE = (Σi |pi − qi|) / N over the N ratings in the test set
– pi: the predicted rating for item i (predicted from the train data)
– qi: the true rating for item i (from the test data)
– The lower the MAE, the more accurately the recommendation engine predicts user ratings.
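MAE follows directly from the definition above; the predicted and true ratings below are illustrative values, not results from the paper.

```python
def mae(predicted, actual):
    """Mean Absolute Error: the average of |p_i - q_i| over the test set.
    Lower is better."""
    assert len(predicted) == len(actual)
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(predicted)

p = [4.2, 3.0, 1.5, 5.0]   # predicted ratings (from the train data)
q = [4,   3,   2,   4]     # true ratings (from the test data)
print(mae(p, q))
```

Here the errors are 0.2, 0.0, 0.5, and 1.0, so the MAE is 1.7 / 4 = 0.425 on the rating scale itself, which makes MAE easy to interpret.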
24. 4-1. Optimal Values of the Parameters (1/2)
• Item-similarity algorithm: adjusted cosine gives the best quality
• Train/test ratio: x = 0.8 is the optimum value
25. 4-1. Optimal Values of the Parameters (2/2)
• Considering both trends, the optimal choice of neighborhood size is 30
• In the full experiment, the basic parameters are therefore as follows:
– Similarity algorithm: adjusted cosine
– Train/test ratio: 0.8
– Neighborhood size: 30
26. 4-2. Quality
• Item-based CF (weighted sum) outperforms Nearest Neighbor
• Item-based CF (regression) outperforms the other two cases at low values of x and at small neighborhood sizes
27. 4-3. Performance (1/2)
• Model size:
– Full model: at item-similarity computation, all item–item similarities (1682 × 1682) are computed
– Model size = 200: only 200 × 200 item–item similarities are computed
• If the model size is small, is good quality maintained (as other model-based approaches maintain it)?
– If it is, on-line performance is higher than in the full-model case
• Result: with a model size of 100–200, it is possible to obtain reasonably good prediction quality
→ Even without using all item–item similarities, prediction accuracy does not drop and performance improves.
29. 5. Conclusion
• Quality
– Item-based CF provides better prediction quality than Nearest Neighbor algorithms
• Independent of neighborhood size and train/test ratio
– The improvement in quality is not large
• Performance
– The item-similarity computation can be precomputed
• Item–item similarity is static
– High on-line performance
– It is possible to retain only a small subset of items and still obtain good prediction quality and high performance