Mendeley: Recommendation Systems for Academic Literature
2,148 views
I gave this talk to an MSc class about Semantic Technologies at the Technical University of Graz (TUG) on 2012/01/12.

It presents what recommendation systems are and how they are often used before delving into how they are used at Mendeley. Real-world results from Mendeley’s article recommendation system are also presented.

The work presented here has been partially funded by the European Commission as part of the TEAM IAPP project (grant no. 251514) within the FP7 People Programme (Marie Curie).

Published in: Technology, Education
  1. Mendeley: Recommendation Systems for Academic Literature. Kris Jack, PhD, Data Mining Team Lead
  2. "All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer's [...]. But a lot of the state of knowledge of the human race is sitting in the scientists' computers, and is currently not shared […] We need to get it unlocked so we can tackle those huge problems."
  3. Overview: ➔ what's a recommender and what does it look like? ➔ what's Mendeley? ➔ the secrets behind recommenders ➔ recommenders @ Mendeley
  4. What's a recommender and what does it look like?
  5. What's a recommender? Definition: A recommendation system (recommender) is a subclass of information filtering system that aims to predict a user's interest in items.
  6. Recommendation Systems in the Wild
  7. Recommendation vs. Search: ➔ search is a pull strategy ➔ recommendation is a push strategy
  8. Recommendation vs. Search: search is like following a path...
  9. Recommendation vs. Search: recommendation is like being on a roller coaster... A different sense of control.
  10. What's Mendeley?
  11. What is Mendeley? ...a large data technology startup company ...and it's on a mission to change the way that research is done!
  12. Mendeley works like Last.fm: 1) install "Audioscrobbler" 2) listen to music 3) Last.fm builds your music profile and recommends you music you could also like... and it's the world's biggest open music database.
  13. The Last.fm → Mendeley analogy: music libraries → research libraries; artists → researchers; songs → papers; genres → disciplines.
  14. Mendeley provides tools to help users... ...organise their research
  15. Mendeley provides tools to help users... ...organise their research ...collaborate with one another
  16. US National Academy of Engineering "Grand Challenges": climate change, sustainable food supplies, artificial intelligence, clean energy, clean water, terrorist violence, pandemic diseases, tools of scientific discovery.
  17. Mendeley provides tools to help users... ...organise their research ...collaborate with one another ...discover new research
  18. Mendeley provides tools to help users... ...organise their research ...collaborate with one another ...discover new research
  19. 1.4 million+ users; the 20 largest userbases: University of Cambridge, Stanford University, MIT, University of Michigan, Harvard University, University of Oxford, Sao Paulo University, Imperial College London, University of Edinburgh, Cornell University, University of California at Berkeley, RWTH Aachen, Columbia University, Georgia Tech, University of Wisconsin, UC San Diego, University of California at LA, University of Florida, University of North Carolina
  20. [Chart: real-time data on 28m unique papers in Mendeley after 16 months, vs. ~50m in Thomson Reuters' Web of Knowledge (dating from 1934).]
  21. The secrets behind recommenders. Q1/2: How can a tool generate recommendations? Q2/2: How can you measure the tool's performance?
  22. Q1/2: How can a tool generate recommendations? Content-based Filtering: find items with similar characteristics (e.g. title, discipline) to what the user previously liked; techniques include TF-IDF, BM25, Bayesian classifiers, decision trees, artificial neural networks; quickly absorbs new items (overcomes the cold start problem) and can make good recommendations from very few examples. Collaborative Filtering: find items that users who are similar to you also liked (wisdom of the crowds); techniques include user-based and item-based variations and matrix factorisation; no need to understand item characteristics, and tends to give more novel recommendations. Hybrid tools too...
  23. Q2/2: How can you measure the tool's performance? ➔ Cross validation with hold-outs: get yourself a good ground truth; hide a fraction of your data from the system; try to predict the hidden fraction from the remaining data; calculate precision and recall. ➔ Let users decide: set up evaluations with real users (experimental); track tool usage by users.
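The hold-out evaluation on the slide above can be sketched in a few lines of Python. This is a toy illustration rather than Mendeley's actual pipeline; the article ids and the `precision_recall_at_k` helper are invented for the example. The idea: hide part of a user's library, generate a ranked recommendation list from the rest, then score the list against the hidden items.

```python
def precision_recall_at_k(recommended, held_out, k=10):
    """Score a ranked recommendation list against hidden (held-out) items."""
    hits = sum(1 for item in recommended[:k] if item in held_out)
    precision = hits / k
    recall = hits / len(held_out) if held_out else 0.0
    return precision, recall

# Toy example: 3 of the top-10 recommendations were in the hidden fraction.
recs = ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9", "a10"]
hidden = {"a2", "a5", "a9", "a42"}
p, r = precision_recall_at_k(recs, hidden, k=10)
# p == 0.3 (3 hits out of 10 shown), r == 0.75 (3 of 4 hidden items recovered)
```

In a k-fold setup this scoring would simply be repeated over each fold's held-out fraction and averaged, which is how numbers like "precision at 10" on the later slides are produced.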
  24. Recommenders @ Mendeley. 1) Related Research: given 1 research article, find other related articles. 2) Personalised Recommendations: given a user's profile (e.g. interests), find new articles of interest to them.
  25. Use Case 1: Related Research. Strategy: content-based approach (tf-idf with Lucene implementation); search for articles with the same metadata (e.g. title, tags). Evaluation: cross-validation with hold-outs on a ground truth data set.
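The production system uses Lucene's tf-idf scoring; as a minimal pure-Python sketch of the same idea (all article ids and metadata terms below are invented), each article is reduced to a bag of metadata terms, weighted by tf-idf, and candidate articles are ranked by cosine similarity:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: tf-idf weight} dict per doc."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy corpus: each "article" is its bag of metadata terms (title words + tags).
articles = {
    "A": ["recommender", "systems", "collaborative", "filtering"],
    "B": ["recommender", "systems", "content", "tfidf"],
    "C": ["protein", "folding", "molecular", "dynamics"],
}
ids = list(articles)
vecs = dict(zip(ids, tfidf_vectors(list(articles.values()))))

# Related research for article A: rank the other articles by cosine similarity.
ranked = sorted((i for i in ids if i != "A"),
                key=lambda i: cosine(vecs["A"], vecs[i]), reverse=True)
# ranked == ["B", "C"]: B shares metadata terms with A, C shares none.
```

The slide's field experiments amount to choosing which metadata terms (tags, title words, abstract words, ...) go into those bags before scoring.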
  26. Use Case 1: Related Research. [Chart: tf-idf precision @ 5 per metadata field (tag, abstract, mesh-term, title, general-keyword, author, keyword) when the field is available.] Result 1: tags are the most informative field for finding related research.
  27. Use Case 1: Related Research. [Chart: tf-idf precision @ 5 for field combinations when fields are available, including abstract+author+general-keyword+tag+title, vs. single fields.] Result 2: tags outperform combinations of fields.
  28. How does Mendeley use recommendation technologies? 2/2 Personalised Recommendations: given a user's profile (e.g. interests), find new articles of interest to them.
  29. Use Case 2: Personalised Recommendations. Strategy: collaborative filtering (item-based with Apache Mahout); recommend articles to researchers that would interest them. Evaluation: cross-validation with hold-outs on a ground truth data set.
  30. Use Case 2: Personalised Recommendations. Strategy: collaborative filtering (item-based with Apache Mahout); recommend articles to researchers that would interest them. Evaluation: cross-validation with hold-outs on a ground truth data set.
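In production the item-based collaborative filtering ran on Apache Mahout; the following is a toy pure-Python sketch of the idea under two simplifying assumptions (both mine, not the slides'): a user's "rating" of an article is boolean (it is in their library or not), and item-item similarity is the Jaccard overlap of the two articles' user sets. All user and article ids are invented.

```python
from collections import defaultdict

# Toy data: each user's library as a set of article ids (boolean preferences).
libraries = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "d"},
    "u3": {"b", "c", "d"},
    "u4": {"a", "c"},
}

# Invert to item -> set of users who hold that article.
users_of = defaultdict(set)
for user, lib in libraries.items():
    for art in lib:
        users_of[art].add(user)

def jaccard(x, y):
    """Item-item similarity: overlap of the two articles' user sets."""
    return len(users_of[x] & users_of[y]) / len(users_of[x] | users_of[y])

def recommend(user, k=10):
    lib = libraries[user]
    candidates = set(users_of) - lib
    # Score each unseen article by its summed similarity to the user's library.
    scores = {c: sum(jaccard(c, owned) for owned in lib) for c in candidates}
    return sorted(scores, key=scores.get, reverse=True)[:k]

recs = recommend("u4")
# recs == ["b", "d"]: u4 owns {a, c}, and "b" co-occurs with both of them.
```

Mahout's item-based recommender follows the same shape at scale: precompute an item-item similarity matrix over all libraries, then score each candidate against the items the user already has.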
  31. Input: user libraries. Output: recommend 10 articles to each user.
  32. Test: 10-fold cross validation, 50,000 user libraries, 16 months ago. Results: <0.025 precision at 10.
  33. Test: 10-fold cross validation, 50,000 user libraries, 10 months ago (i.e. + 6 months). Results: ~0.1 precision at 10.
  34. Test: release to a subset of users, 10 months ago (i.e. + 6 months). Results: ~0.4 precision at 10.
  35. [Chart: article recommendation acceptance rate (i.e. accept/reject clicks) vs. number of months live.]
  36. [Chart: precision at 10 articles by library size (number of articles in user library).]
  37. Test: 10-fold cross validation, 50,000 user libraries. Results comparable to the non-distributed recommender; completely distributed, so it can easily run on EC2 within 24 hours...
  38. Conclusions Summary: ➔ Recommendations can be complementary to search ➔ They can help users to discover interesting items ➔ They can exploit item metadata (content-based) ➔ They can exploit the wisdom of the crowds (CF)
  39. Conclusions Summary: ➔ Crowd-sourced metadata can have a powerful informative value (e.g. article tags) ➔ Sometimes you need to let data grow ➔ Evaluations under lab conditions don't always predict real-world results well ➔ Recommenders don't just have to be about making money... remember where we started?
  40. "All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer's [...]. But a lot of the state of knowledge of the human race is sitting in the scientists' computers, and is currently not shared […] We need to get it unlocked so we can tackle those huge problems."
  41. www.mendeley.com
