A presentation about scientific recommender systems from the seminar phase of the project group PUSHPIN at the University of Paderborn
http://pgpushpin.wordpress.com/
5. Recommender Systems
Recommender Systems
u : C × S → R
C - set of all users
S - set of all items
R - totally ordered set, which describes the usefulness of the
items to the respective user
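As a minimal sketch of this definition, the utility function can be modelled as a partial table of known (user, item) utilities plus an estimator for the missing entries. The user names, item names, values and the `item_mean` estimator below are hypothetical and only serve as an illustration; R is assumed to be the real numbers.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical known utilities u(c, s); R is taken to be the real numbers.
ratings: Dict[Tuple[str, str], float] = {
    ("alice", "paper_a"): 4.0,
    ("alice", "paper_b"): 2.0,
    ("bob", "paper_a"): 5.0,
}


def item_mean(user: str, item: str) -> float:
    """Placeholder estimator: mean known utility of the item over all users."""
    values = [r for (_, s), r in ratings.items() if s == item]
    return sum(values) / len(values) if values else 0.0


def recommend(
    user: str,
    items: Iterable[str],
    estimate: Callable[[str, str], float],
) -> Optional[str]:
    """Return the item not yet rated by `user` with the highest estimated utility."""
    candidates = [s for s in items if (user, s) not in ratings]
    if not candidates:
        return None
    return max(candidates, key=lambda s: estimate(user, s))
```

Any of the estimation strategies from the following slides (content-based, collaborative, hybrid) could be plugged in as `estimate`.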
Scientific Recommender Systems 5
6. Categories of Recommender Systems
Categories of Recommender Systems
content-based: items are recommended that are similar to
items the user liked in the past
collaborative: items are recommended that were liked by
people similar to the user (similar tastes/preferences)
hybrid: a combination of content-based and collaborative
recommendation approaches
7. Categories of Recommender Systems
Content-based Recommender Systems
utility u(c, s) of an item s for user c is estimated from the
utilities u(c, s_i) that user c assigned to already rated items
s_i ∈ S that are similar to item s
similarity between items is calculated according to their
attributes
user and item profiles
common problems
limited content analysis
overspecialization
new user problem
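The estimation described on this slide can be sketched as a similarity-weighted average of the user's own ratings. This is one possible reading, not the only one: item profiles are assumed to be keyword-weight dictionaries, cosine similarity is assumed as the attribute similarity, and all names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Profile = Dict[str, float]  # attribute (e.g. keyword) -> weight


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity between two sparse attribute vectors."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def content_based_utility(
    target: Profile,
    rated: List[Tuple[Profile, float]],
) -> Optional[float]:
    """Estimate u(c, s) from the items s_i that user c already rated.

    `rated` holds (item profile, rating) pairs of a single user.
    Returns None when no rated item is similar to the target,
    which is what happens for a new user without ratings.
    """
    weighted_sum = 0.0
    total_similarity = 0.0
    for profile, rating in rated:
        similarity = cosine(target, profile)
        weighted_sum += similarity * rating
        total_similarity += similarity
    if total_similarity <= 0.0:
        return None
    return weighted_sum / total_similarity
```

The `None` result for an empty rating history illustrates the new user problem, and the fact that only items resembling past ratings can score well illustrates overspecialization.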
8. Categories of Recommender Systems
Content-based Recommender: TF-IDF
N - total number of documents in the system
keyword k_i appears in n_i of the documents
f_{i,j} denotes the number of times a certain keyword k_i appears
in a document d_j
Term Frequency
TF_{i,j} = \frac{f_{i,j}}{\max_z f_{z,j}}
the maximum in the denominator is taken over the frequencies f_{z,j}
of all keywords k_z that appear in document d_j
Inverse Document Frequency
for a keyword k_i: IDF_i = \log \frac{N}{n_i}
TF-IDF
w_{i,j} = TF_{i,j} \times IDF_i
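The three formulas can be put together in a short sketch. It assumes that documents are already tokenized into keyword lists and that the logarithm is the natural logarithm (the slides do not fix the base); the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Compute the weights w_{i,j} = TF_{i,j} * IDF_i for every document.

    TF_{i,j} = f_{i,j} / max_z f_{z,j}   (frequency normalized per document)
    IDF_i    = log(N / n_i)              (natural logarithm, an assumption)
    """
    n_docs = len(documents)  # N

    # n_i: number of documents in which keyword k_i appears at least once
    doc_freq: Counter = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    weights: List[Dict[str, float]] = []
    for doc in documents:
        counts = Counter(doc)  # f_{i,j} for this document d_j
        max_count = max(counts.values()) if counts else 0
        weights.append(
            {
                keyword: (freq / max_count) * math.log(n_docs / doc_freq[keyword])
                for keyword, freq in counts.items()
            }
        )
    return weights
```

A keyword that occurs in every document gets IDF_i = log(1) = 0, so it does not contribute to the document profile, regardless of how often it occurs.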
12. Categories of Recommender Systems
Collaborative Recommender Systems
utility u(c, s) of an item s is estimated with the help of the
utilities u(c_i, s) assigned by users c_i ∈ C that are similar to
user c.
common problems
new user/item problem
cold start
sparsity
scalability
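A minimal sketch of this estimation, assuming ratings are stored as a dictionary of dictionaries and using a simple, hypothetical `agreement` measure as the user similarity (any other similarity measure could be substituted):

```python
from typing import Callable, Dict, Optional

Prefs = Dict[str, float]  # item -> rating of one user


def agreement(a: Prefs, b: Prefs) -> float:
    """Hypothetical similarity: 1 / (1 + mean absolute rating difference)
    over the items both users rated; 0 if they share no items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    mean_diff = sum(abs(a[s] - b[s]) for s in common) / len(common)
    return 1.0 / (1.0 + mean_diff)


def predict_user_based(
    target: str,
    item: str,
    ratings: Dict[str, Prefs],
    similarity: Callable[[Prefs, Prefs], float] = agreement,
) -> Optional[float]:
    """Estimate u(c, s) as the similarity-weighted mean of the ratings
    u(c_i, s) given by other users c_i who rated the item."""
    target_prefs = ratings.get(target, {})
    weighted_sum = 0.0
    total_similarity = 0.0
    for other, other_prefs in ratings.items():
        if other == target or item not in other_prefs:
            continue
        sim = similarity(target_prefs, other_prefs)
        if sim <= 0.0:
            continue
        weighted_sum += sim * other_prefs[item]
        total_similarity += sim
    if total_similarity == 0.0:
        return None  # no similar user rated the item
    return weighted_sum / total_similarity
```

The `None` case makes the listed problems concrete: a new item has no ratings, a new user has no similar users, and with sparse data few users share rated items. The loop over all users per prediction also hints at the scalability problem.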
13. Categories of Recommender Systems
Collaborative Recommender: Apache Mahout (1)
provides a "toolbox" to create collaborative recommender
systems
input
user (long), item (long), preference (double)
1, 111, 2.5
data model
input from different file formats, database
increase performance with specific data structures
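To illustrate the input format, the following sketch reads such user, item, preference records into a nested dictionary. It is not Mahout's API (Mahout is a Java library with its own data model classes); the function name and the in-memory layout are assumptions made for illustration.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable


def load_preferences(lines: Iterable[str]) -> Dict[int, Dict[int, float]]:
    """Parse 'user, item, preference' records into user -> {item: preference}.

    User and item IDs are integers (long in Java), preferences are floats
    (double in Java). Blank lines are skipped.
    """
    prefs: Dict[int, Dict[int, float]] = defaultdict(dict)
    for row in csv.reader(lines):
        if not row:
            continue
        user, item, value = (field.strip() for field in row[:3])
        prefs[int(user)][int(item)] = float(value)
    return dict(prefs)
```

For example, `load_preferences(["1, 111, 2.5"])` yields a mapping in which user 1 has the preference 2.5 for item 111. Grouping the preferences per user mirrors the idea of using a dedicated data structure instead of a flat list of records.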
14. Categories of Recommender Systems
Collaborative Recommender: Apache Mahout (2)
user-based recommender
item-based recommender
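The difference between the two variants is which objects are compared: a user-based recommender compares users by their rating vectors, an item-based recommender compares items by the users who rated them. The sketch below shows the item-based variant in plain Python; it does not use Mahout, and the function names and the choice of cosine similarity are assumptions.

```python
import math
from typing import Dict, Optional

Ratings = Dict[int, Dict[int, float]]  # user -> {item: rating}


def item_vectors(ratings: Ratings) -> Dict[int, Dict[int, float]]:
    """Invert user -> {item: rating} into item -> {user: rating}."""
    vectors: Dict[int, Dict[int, float]] = {}
    for user, prefs in ratings.items():
        for item, value in prefs.items():
            vectors.setdefault(item, {})[user] = value
    return vectors


def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_item_based(user: int, item: int, ratings: Ratings) -> Optional[float]:
    """Estimate the user's rating for `item` from the user's own ratings
    of other items, weighted by item-to-item similarity."""
    vectors = item_vectors(ratings)
    target = vectors.get(item, {})
    weighted_sum = 0.0
    total_similarity = 0.0
    for other_item, value in ratings.get(user, {}).items():
        if other_item == item:
            continue
        sim = cosine(target, vectors[other_item])
        if sim > 0.0:
            weighted_sum += sim * value
            total_similarity += sim
    return weighted_sum / total_similarity if total_similarity > 0.0 else None
```

Item-to-item similarities depend only on the rating data, so in practice they are often precomputed; here they are recomputed on every call to keep the sketch short.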
16. Categories of Recommender Systems
Collaborative Recommender: Apache Mahout (3)
similarity measures
Pearson correlation (equivalent to cosine similarity on mean-centered data)
Euclidean distance
Spearman correlation
log-likelihood
...
slope-one recommender
other experimental recommender implementations
e.g. cluster-based
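Two of the listed similarity measures can be sketched in plain Python as follows. These are textbook formulations computed over the items both users rated, not Mahout's implementations; in particular, mapping the Euclidean distance d to a similarity via 1 / (1 + d) is one common choice and an assumption here.

```python
import math
from typing import Dict, List

Prefs = Dict[int, float]  # item -> rating of one user


def _common_items(a: Prefs, b: Prefs) -> List[int]:
    """Items rated by both users."""
    return sorted(set(a) & set(b))


def pearson(a: Prefs, b: Prefs) -> float:
    """Pearson correlation over co-rated items, in [-1, 1].

    Returns 0.0 when fewer than two items overlap or when one user's
    co-rated ratings have no variance (correlation undefined)."""
    items = _common_items(a, b)
    n = len(items)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in items) / n
    mean_b = sum(b[i] for i in items) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in items)
    var_a = sum((a[i] - mean_a) ** 2 for i in items)
    var_b = sum((b[i] - mean_b) ** 2 for i in items)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def euclidean_similarity(a: Prefs, b: Prefs) -> float:
    """Similarity in (0, 1] derived from the Euclidean distance d over
    co-rated items as 1 / (1 + d); 0.0 if there is no overlap."""
    items = _common_items(a, b)
    if not items:
        return 0.0
    distance = math.sqrt(sum((a[i] - b[i]) ** 2 for i in items))
    return 1.0 / (1.0 + distance)
```

Pearson correlation ignores differences in rating scale between users (a user who rates everything twice as high still correlates perfectly), whereas the distance-based measure penalizes them.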
17. Categories of Recommender Systems
Hybrid Recommender Systems
combination of content-based and collaborative methods
separate content-based and collaborative recommender
systems whose results are combined afterwards, e.g. by a
weighted sum or a voting scheme
collaborative recommender system with some added aspects of
content-based methods
content-based recommender system with some added aspects
of collaborative methods
a single recommender system which unifies content-based and
collaborative methods from the beginning
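The first variant, two separate recommenders whose results are combined, can be sketched as a weighted sum of both estimates. The function name, the default weight and the fallback behaviour when one estimate is missing are assumptions for illustration.

```python
from typing import Optional


def weighted_hybrid(
    content_score: Optional[float],
    collaborative_score: Optional[float],
    weight: float = 0.5,
) -> Optional[float]:
    """Combine two utility estimates for the same (user, item) pair.

    `weight` is the share given to the content-based estimate. If one
    recommender cannot produce an estimate (None), the other is used alone.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if content_score is None:
        return collaborative_score
    if collaborative_score is None:
        return content_score
    return weight * content_score + (1.0 - weight) * collaborative_score
```

Falling back to the remaining estimate shows why hybrids are attractive: the content-based part can still score a new item that has no ratings yet, while the collaborative part can counteract overspecialization.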
18. Categories of Recommender Systems
Hybrid Recommender: SciPlore
SciPlore Overview
20. Conclusion
Summary
utility function
categories of recommender systems
content-based
collaborative
hybrid
implementation with Apache Mahout
possible visualizations
21. Conclusion
Questions?
22. References
References
Apache Mahout: Scalable machine learning and data mining.
http://mahout.apache.org/ - accessed on 6th January 2012
SciPlore: Exploring Science. http://www.sciplore.org -
accessed on 6th January 2012
G. Adomavicius and A. Tuzhilin. Toward the next generation of
recommender systems: a survey of the state-of-the-art and
possible extensions. IEEE Transactions on Knowledge and
Data Engineering, 17(6):734-749, 2005
B. Gipp, J. Beel and C. Hentschel. Scienstein: A research paper
recommender system, volume 301, pages 309-315. IEEE, 2009
S. Owen, R. Anil, T. Dunning and E. Friedman.
Mahout in Action, 2011