I was invited to be the keynote speaker at a special track on Recommendation, Data Sharing and Research Practices in Science 2.0 at the I-KNOW 2011 conference (http://i-know.tugraz.at/) on 2011/09/07.
The talk presents the challenges involved in crowdsourcing the world's largest research catalogue and then building a recommendation service on top of it that scales to serve millions of users.
4. Mendeley provides tools to help users... collaborate with one another, organise their research, and discover new research.
10. Summary
➔ what is mendeley?
➔ crowdsourcing on a large scale
➔ recommendations on a large scale
➔ data for you
11. Mendeley works like Last.fm:
1) Install “Audioscrobbler”
2) Listen to music
3) Last.fm builds your music profile and recommends music you could also like, and it's the world's largest open music database!
12. Mendeley ↔ Last.fm
music libraries ↔ research libraries
artists ↔ researchers
songs ↔ papers
genres ↔ disciplines
Mendeley is the world's largest crowdsourced research catalogue!
(Screenshot taken from www.mendeley.com on 04/09/11)
21. Recommendation through collaborative filtering
Article in library or not (i.e. binary input)
Various similarity metrics (e.g. co-occurrence, log-likelihood, Tanimoto)
Test: 10-fold cross validation, 50,000 user libraries, 16 months ago
Results: <0.025 precision at 10
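The similarity metrics named above all operate on the same binary "article in library or not" signal. A minimal sketch of co-occurrence and Tanimoto similarity over hypothetical toy libraries (not Mendeley's actual implementation):

```python
# Each user library is a set of article IDs: binary "in library or not".
# Toy data for illustration only.
libraries = {
    "user_a": {"p1", "p2", "p3"},
    "user_b": {"p2", "p3", "p4"},
    "user_c": {"p1", "p3"},
}

def users_with(article, libs):
    """Set of users whose library contains the given article."""
    return {u for u, lib in libs.items() if article in lib}

def cooccurrence(a, b, libs):
    """Number of users who have both articles in their library."""
    return len(users_with(a, libs) & users_with(b, libs))

def tanimoto(a, b, libs):
    """Tanimoto (Jaccard) coefficient over the two articles' reader sets."""
    ua, ub = users_with(a, libs), users_with(b, libs)
    union = len(ua | ub)
    return len(ua & ub) / union if union else 0.0

print(cooccurrence("p1", "p3", libraries))  # 2 (user_a and user_c)
print(tanimoto("p2", "p3", libraries))      # 2/3
```

Articles with high pairwise similarity to the articles already in a user's library become recommendation candidates; log-likelihood works the same way but corrects for articles that are popular everywhere.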
22. Recommendation through collaborative filtering
Article in library or not (i.e. binary input)
Various similarity metrics (e.g. co-occurrence, log-likelihood, Tanimoto)
Test: 10-fold cross validation, 50,000 user libraries, 10 months ago (i.e. + 6 months)
Results: ~0.1 precision at 10
23. Recommendation through collaborative filtering
Article in library or not (i.e. binary input)
Various similarity metrics (e.g. co-occurrence, log-likelihood, Tanimoto)
Test: release to a subset of users, 10 months ago (i.e. + 6 months)
Results: ~0.4 precision at 10
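The precision-at-10 figures on these slides are the fraction of the top 10 recommended articles that the user actually holds (or later adds) in the held-out part of their library. A minimal sketch with hypothetical recommendation lists:

```python
def precision_at_k(recommended, held_out, k=10):
    """Fraction of the top-k recommendations found in the held-out set."""
    top_k = recommended[:k]
    hits = sum(1 for article in top_k if article in held_out)
    return hits / k

# Hypothetical example: of the top 10 recommendations, 2 turn out to be
# articles in the user's held-out library.
recs = [f"p{i}" for i in range(1, 11)]   # p1 .. p10
held_out = {"p3", "p7", "p42"}
print(precision_at_k(recs, held_out))    # 0.2
```

In a 10-fold setup this is averaged over users and folds; the jump from ~0.1 offline to ~0.4 in the live release suggests offline held-out libraries understate which recommendations users actually find relevant.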
25. Article Recommendation: System Requirements
1 million users!
Generate personal article recommendations (i.e. “here are some articles that may interest you”)
Update recommendations every 24 hours
How to scale up?
27. Test: 10-fold cross validation, 50,000 user libraries
So, results comparable to the non-distributed recommender. Completely distributed, so it can easily run on EC2 within 24 hours...
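"Completely distributed" works here because co-occurrence counting is embarrassingly parallel: each library can be processed independently and the partial counts summed. A map/reduce-style sketch under that assumption (toy data, not Mendeley's production pipeline):

```python
from collections import Counter
from itertools import combinations

def map_library(library):
    """Mapper: emit ((a, b), 1) for every article pair in one user library."""
    for a, b in combinations(sorted(library), 2):
        yield (a, b), 1

def reduce_counts(pair_streams):
    """Reducer: sum pair counts across all mapper outputs."""
    totals = Counter()
    for stream in pair_streams:
        for pair, count in stream:
            totals[pair] += count
    return totals

# Toy libraries; in production each mapper would handle a shard of libraries
# on a separate machine (e.g. an EC2 node) and reducers would merge shards.
libraries = [{"p1", "p2", "p3"}, {"p2", "p3"}, {"p1", "p3"}]
counts = reduce_counts(map_library(lib) for lib in libraries)
print(counts[("p2", "p3")])  # 2
```

Because mappers never share state, adding machines divides the wall-clock time, which is what makes the 24-hour update cycle from slide 25 feasible.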
28. Article Recommendation Precision Across User Library Sizes (using co-occurrence)
[Chart: precision at 10 articles vs. number of articles in user library]
How will real users react?
29. Summary
➔ what is mendeley?
➔ crowdsourcing on a large scale
➔ recommendations on a large scale
➔ data for you
30. Public Data
User libraries: 50,000 libraries, 4,848,724 articles, 3,652,285 unique articles
Library readership and library stars
Obtain from: http://dev.mendeley.com/datachallenge