4. 1-1. My Research Domain
• Evaluating recommendation algorithms by ABM (Agent-Based Modeling)
– Recommendation approaches:
• Rule-based approach
• Content-based approach
• Collaborative Filtering (CF)
• Bayesian network
– Why CF?
• It is the approach most widely used on real websites
– Why ABM?
• With ABM, algorithms can be optimized for the market environment
Summer Seminar 2008 @Susukakedai http://umekoumeda.net/
5. 1-2. What's CF? (1/3)
• Have you used Amazon.com?
6. 1-3. What's CF? (2/3)
Collaborative Filtering (CF) algorithms are commonly used on e-commerce websites.
[Figure: a product recommendation on an e-commerce site]
7. 1-4. What's CF? (3/3)
[Figure: Prof. Deguchi's and Prof. Kizima's book lists contain the same books, so they have similar preferences. Based on people similar to him, CF recommends to Prof. Kizima the remaining books on Prof. Deguchi's list.]
8. 1-5. Contribution of this paper
• Problems of the basic CF algorithm
– Basic CF: Nearest Neighbor
– Scalability (performance)
• High scalability: even with many users, the system can recommend to them quickly
– Accuracy (quality)
• High accuracy: even when the data are sparse, the system can recommend items a user may like
• In this paper, the authors propose a new algorithm
– Item-based CF
– Both performance and quality can be improved
9. 1-6. Collaborative Filtering Process
Input data → CF algorithm → output interface
• Input: the user–item matrix
– U = {u1, u2, ..., um}: the set of users
– I = {i1, i2, ..., in}: the set of items
– Iui ⊆ I: the items that user ui has rated
– ai,j: the rating of item ij by user ui
• Output 1 — Prediction: Pa,j, the predicted degree of likeness of item ij by user ua
• Output 2 — Top-N recommendation: a list of N items that the user will like the most, Ir ⊂ I, with Ir ∩ Iua = ∅ (only items the user has not yet rated)
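The process above can be sketched with a small user–item matrix. The ratings, the `top_n` helper, and the prediction values are illustrative, not taken from the paper; the point is only the data shapes: a ratings matrix, the set Iui, and a top-N list disjoint from it.

```python
import numpy as np

# User-item rating matrix: rows are users u1..u4, columns items i1..i5.
# 0 means "not rated".
R = np.array([
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 0],
    [1, 1, 0, 5, 4],
    [0, 1, 5, 4, 0],
])

def rated_items(R, u):
    """I_ui: indices of the items user u has rated."""
    return set(np.nonzero(R[u])[0])

def top_n(predictions, rated, n):
    """Top-N recommendation: the n unrated items with the highest
    predicted rating, so the result is disjoint from I_ua."""
    candidates = [(p, i) for i, p in enumerate(predictions) if i not in rated]
    return [i for p, i in sorted(candidates, reverse=True)[:n]]

# Hypothetical predictions P_{a,j} for user u1 (index 0):
preds = [0.0, 0.0, 2.5, 0.0, 3.8]
print(top_n(preds, rated_items(R, 0), 2))  # → [4, 2]
```

Only items u1 never rated (i3 and i5, indices 2 and 4) can appear in the list, matching Ir ∩ Iua = ∅.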
10. 1-7. Variations of the CF Algorithm
• Memory-based approach (Nearest Neighbor)
– Procedure:
1. On-line, the system defines a set of users known as neighbors
2. The system produces a prediction or top-N recommendation
• Model-based approach
– Procedure:
1. Off-line, the system builds a model of user ratings
2. Using the model, the system produces a prediction or top-N recommendation
– How is the model built? e.g. Bayesian networks, clustering
11. 1-8. What's on-line and off-line?
• Off-line computation: performed automatically at a suitable interval
• On-line computation: performed quickly whenever a user uses the system
• Example (Google):
– Off-line: crawling, indexing, ranking
– On-line: when you input a query, the search engine outputs the result
12. 1-9. The problems of the basic CF
Weaknesses of Nearest Neighbor:
• Accuracy — sparsity of the user–item matrix: many users may have purchased well under 1% of all items, so the accuracy of the Nearest Neighbor algorithm may be poor
• Scalability — with millions of users and items, the Nearest Neighbor algorithm may suffer serious scalability problems
→ We need new CF algorithms.
14. 2-1. Overview of Item-based CF
• Off-line computation — item similarity computation:
– Si,j: the similarity between items ii and ij, computed from the user–item rating matrix (Ru,i)
• On-line computation — prediction computation:
– Pu,i: the degree of likeness of item i by user u, computed from the item–item similarities S
15. 2-2. Item Similarity Computation
• Cosine-based similarity
• Correlation-based similarity
• Adjusted cosine similarity
– Corrects for the difference in rating scale between different users
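A minimal sketch of the three measures, using NumPy. The formulas follow the standard definitions (plain cosine over the item columns, Pearson correlation over co-rating users, and cosine after subtracting each user's mean rating); the matrix and function names are illustrative, and 0 is assumed to mean "not rated".

```python
import numpy as np

def cosine_sim(R, i, j):
    """Cosine-based similarity between item columns i and j."""
    a, b = R[:, i], R[:, j]
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def correlation_sim(R, i, j):
    """Correlation-based (Pearson) similarity over co-rating users."""
    mask = (R[:, i] > 0) & (R[:, j] > 0)
    return np.corrcoef(R[mask, i], R[mask, j])[0, 1]

def adjusted_cosine_sim(R, i, j):
    """Adjusted cosine: subtract each user's mean rating first,
    which corrects for differences in rating scale between users."""
    mask = (R[:, i] > 0) & (R[:, j] > 0)          # users who rated both
    means = np.nanmean(np.where(R > 0, R, np.nan), axis=1)
    a = R[mask, i] - means[mask]
    b = R[mask, j] - means[mask]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

R = np.array([[5, 3, 4],
              [4, 2, 5],
              [1, 5, 2]], dtype=float)
print(round(cosine_sim(R, 0, 2), 3))  # → 0.966
```

The adjusted version can differ sharply from plain cosine: a user who rates everything high contributes little after their mean is removed.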
16. 2-3. Prediction Computation
• Weighted sum
– N: the set of items most similar to item i
– |N|: the neighborhood size
– The sum of the absolute similarities acts as the normalization coefficient
• Regression
– Ru,n is estimated by a regression model
– Ri: the target item's ratings (explanatory variable)
– Rn: the similar item's ratings (response variable)
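The weighted-sum rule can be sketched as follows. The similarity and rating values are illustrative, and the dictionary-based representation is a simplification of the paper's matrix form; the denominator is the normalization coefficient mentioned above.

```python
def predict_weighted_sum(user_ratings, item_sims, neighborhood_size):
    """P_{u,i}: weighted sum of the user's ratings on the items most
    similar to the target item, normalized by the total |similarity|."""
    # Keep the |N| most similar items that the user has actually rated.
    neighbors = sorted(
        ((s, n) for n, s in item_sims.items() if n in user_ratings),
        reverse=True,
    )[:neighborhood_size]
    num = sum(s * user_ratings[n] for s, n in neighbors)
    den = sum(abs(s) for s, n in neighbors)   # normalization coefficient
    return num / den if den else 0.0

# Similarities between the target item and rated items (illustrative):
sims = {"i1": 0.9, "i2": 0.5, "i3": 0.1}
ratings = {"i1": 5, "i2": 3, "i3": 1}
print(predict_weighted_sum(ratings, sims, 2))
```

With |N| = 2 only i1 and i2 are used, giving (0.9·5 + 0.5·3) / (0.9 + 0.5) ≈ 4.29, so the prediction stays on the original rating scale.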
17. 2-4. Time Complexity (1/2)
The time complexity of Nearest Neighbor (all on-line):
• User similarity computation
– Computing one user–user similarity, the recommender system scans n ratings → O(n)
– The system must compute m × m user–user similarities → O(m²n)
• Prediction computation
– Computing one Pi,j value, the system scans m user–user similarities → O(m)
• Total: O(m²n) + O(m)
18. 2-4. Time Complexity (2/2)
The time complexity of Item-based CF is better than Nearest Neighbor's:
• Off-line — item similarity computation
– Item–item similarity is static, as opposed to user–user similarity, so it is possible to precompute the item–item similarities (= the model)
• On-line — prediction computation
– Computing one Pi,j value, the system scans n item–item similarities → O(n)
• Total on-line cost: O(n)
20. 3-1. Experimental Procedure
1. Data dividing: the data set of (user, item, rating) tuples is divided into a train portion (used for parameter learning) and a test portion (used for evaluation)
2. Fixing the optimal values of the parameters:
• Similarity algorithm
• Train/test ratio (x): the sparsity level of the data
• Neighborhood size
3. Full experiment: to evaluate Item-based CF, the following are measured:
• Performance
• Quality
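Step 1 (data dividing) can be sketched like this; the split is random at ratio x, with x = 0.8 as in the paper's reported optimum, and the rating tuples are illustrative.

```python
import random

def split_ratings(ratings, x, seed=0):
    """Divide (user, item, rating) tuples into train/test at ratio x."""
    rng = random.Random(seed)   # fixed seed for a reproducible split
    shuffled = ratings[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * x)
    return shuffled[:cut], shuffled[cut:]

ratings = [("u1", "i2", 3), ("u2", "i1", 2), ("u6", "i3", 3),
           ("u3", "i1", 4), ("u4", "i2", 5)]
train, test = split_ratings(ratings, x=0.8)
print(len(train), len(test))  # → 4 1
```

Lower x means a sparser training matrix, which is why x doubles as the sparsity-level parameter in step 2.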
21. 3-2. Data Sets
• Data Sets
– Data from the website "MovieLens"
– MovieLens is a web-based recommender system
– Hundreds of users visit MovieLens to rate and receive recommendations for movies
– The data set was converted into a user–item matrix (943 users × 1682 items)
22. 3-3. Evaluation Metrics
• To evaluate the quality of a recommender system, we use MAE as the evaluation metric.
• MAE: Mean Absolute Error
– MAE = (Σi |pi − qi|) / N over the N ratings in the test set
– pi: the predicted rating for item i (predicted from the train data)
– qi: the true rating for item i (from the test data)
– The lower the MAE, the more accurately the recommendation engine predicts user ratings.
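MAE follows directly from the definition above; the predicted and true ratings below are illustrative values, not results from the paper.

```python
def mae(predicted, actual):
    """Mean Absolute Error: the average of |p_i - q_i| over the test set.
    Lower is better."""
    assert len(predicted) == len(actual)
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(predicted)

p = [4.2, 3.0, 1.5, 5.0]   # predicted ratings (from the train data)
q = [4,   3,   2,   4]     # true ratings (from the test data)
print(mae(p, q))
```

Here the errors are 0.2, 0.0, 0.5, and 1.0, so the MAE is 1.7 / 4 = 0.425 on the rating scale itself, which makes MAE easy to interpret.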
24. 4-1. Optimal Values of the Parameters (1/2)
• Item-similarity algorithm: adjusted cosine gives the best quality
• Train/test ratio: x = 0.8 is the optimum value
25. 4-1. Optimal Values of the Parameters (2/2)
• Considering both trends, the optimal choice of neighborhood size is 30
• In the full experiment, the basic parameters are therefore as follows:
– Similarity algorithm: adjusted cosine
– Train/test ratio: 0.8
– Neighborhood size: 30
26. 4-2. Quality
• Item-based CF (weighted sum) outperforms Nearest Neighbor
• Item-based CF (regression) outperforms the other two cases at low values of x and at small neighborhood sizes
27. 4-3. Performance (1/2)
• Model size:
– Full model: at item-similarity computation, all item–item similarities (1682 × 1682) are computed
– Model size = 200: only 200 × 200 item–item similarities are computed
• If the model size is small, is good quality maintained (as other model-based approaches maintain it)?
– If it is, on-line performance is higher than in the full-model case
• Result: with a model size of 100–200, it is possible to obtain reasonably good prediction quality
→ Even without using all item–item similarities, prediction accuracy does not drop and performance improves.
29. 5. Conclusion
• Quality
– Item-based CF provides better prediction quality than Nearest Neighbor algorithms
• Independent of neighborhood size and train/test ratio
– The improvement in quality is not large
• Performance
– The item-similarity computation can be precomputed
• Item–item similarity is static
– High on-line performance
– It is possible to retain only a small subset of items and still obtain good prediction quality and high performance