Building Recommender Systems for Scholarly Information

The depth and breadth of research now being published is overwhelming for an individual researcher to keep track of, let alone consume. Recommender systems have been developed to make it easier for researchers to discover relevant content. However, these have predominantly taken the form of item-to-item recommendations using citation network features or text similarity features. This paper details how the Mendeley Suggest recommender system has been designed and developed. We show how implicit user feedback (based on activity data from the reference manager) and collaborative filtering (CF) are used to generate the recommendations for Mendeley Suggest. Because collaborative filtering suffers from the cold start problem (the inability to serve recommendations to new users), we developed additional recommendation methods based on user-defined attributes, such as discipline and research interests. Our offline evaluation shows that, where possible, recommendations based on collaborative filtering perform best, followed by recommendations based on recent activity. However, for cold users (for whom collaborative filtering was not possible), recommendations based on discipline performed best. Additionally, when we segmented users by career stage, we found that among senior academics, content-based recommendations from recent activity had comparable performance to collaborative filtering. This justifies our approach of developing a variety of recommendation methods in order to serve a range of users across the academic spectrum.

Published in: Science

  1. Mendeley | Building recommender systems for scholarly information. Authors: Maya Hristakeva, Daniel Kershaw, Marco Rossetti*, Petr Knoth^, Benjamin Pettit, Saùl Vargas, Kris Jack. Presented by Daniel Kershaw, 10th February 2017. (* Currently working at Trainline. ^ Currently working at the Open University.)
  2. Mendeley / Mendeley Suggest
     • Make it easier for users to discover relevant content
     • Utilize collective intelligence for article discovery
     • Citations are slow to propagate
     • Citations lag behind user reading patterns
  3. Challenges
     • For the user, the recommendations need to be novel, relevant, familiar, serendipitous, and well explained
     • How to deal with cold and warm users
     • How to deal with large data sets
  5. Types of Recommendations (ordered from most to least personalized)
     • Implicit: serves recommendations based on user libraries
     • Recent Activity: based on recent additions to a user's library
     • Research Interests: based on user-generated tags
     • Discipline: based on the user's self-identified discipline
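  6. Implicit: user-based nearest neighbor collaborative filtering
     • Assumption: users who have read the same articles in the past will read the same articles in the future
     • Identify similar users using cosine similarity, where $L_1$ and $L_2$ are the libraries of users $u_1$ and $u_2$:
       $$\cos(u_1, u_2) = \frac{L_1 \cdot L_2}{\lVert L_1 \rVert \times \lVert L_2 \rVert}$$
     • The score of document $d$ for user $u$ is then a sum across the inverted neighborhood:
       $$r_d^u = \sum_{u' \in sim(U, u)} \begin{cases} \cos(u, u'), & \text{if } d \in lib(u') \setminus lib(u) \\ 0, & \text{otherwise} \end{cases}$$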
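The cosine similarity and neighborhood sum on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Mendeley implementation: it assumes libraries are plain sets of document IDs (so the dot product reduces to the size of the intersection), it uses an ordinary top-k neighborhood rather than the inverted neighborhood mentioned on the slide, and the names `cosine`, `recommend`, `k`, and `top_n` are hypothetical.

```python
import math
from collections import defaultdict


def cosine(lib_a, lib_b):
    """Cosine similarity between two libraries treated as binary vectors."""
    if not lib_a or not lib_b:
        return 0.0
    return len(lib_a & lib_b) / math.sqrt(len(lib_a) * len(lib_b))


def recommend(user, libraries, k=2, top_n=5):
    """Score unseen documents by summing the similarity of the neighbors who hold them."""
    target = libraries[user]
    sims = [
        (cosine(target, lib), other)
        for other, lib in libraries.items()
        if other != user
    ]
    # Keep the k most similar users that share at least one document.
    neighbors = sorted((s for s in sims if s[0] > 0), reverse=True)[:k]
    scores = defaultdict(float)
    for sim, other in neighbors:
        # Only documents in lib(u') but not in lib(u) contribute to the score.
        for doc in libraries[other] - target:
            scores[doc] += sim
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


libraries = {
    "alice": {"d1", "d2", "d3"},
    "bob": {"d1", "d2", "d4"},
    "carol": {"d2", "d3", "d5"},
    "dave": {"d9"},
}
recs = recommend("alice", libraries)
```

In this toy data, `dave` shares nothing with `alice`, so only `bob` and `carol` contribute to the scores.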
  7. Recent Activity
     • Use the last article added to a user's library or the last article read
     • Fundamentally item-to-item recommendations
     • Performed by comparing the content of articles through TF-IDF vectors:
       $$r_a(q, y) = sim(q, y) \times \bigl(1 + \log(popularity(y, \text{`global'}))\bigr)$$
     • The score is modified by the log of the global popularity, as a proxy for the quality of the article
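As a rough sketch of the scoring on this slide, the snippet below computes TF-IDF cosine similarity between a query article and candidates, then multiplies by one plus the log of a popularity count. Several details are assumptions made for illustration: whitespace tokenization, a smoothed IDF, the natural logarithm, popularity expressed as a reader count of at least 1, and the function names themselves.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) from a dict of doc_id -> text."""
    tokens = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokens)
    doc_freq = Counter(term for words in tokens.values() for term in set(words))
    return {
        doc_id: {
            # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in Counter(words).items()
        }
        for doc_id, words in tokens.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def rank_similar(query_id, docs, popularity):
    """Rank candidates by sim(q, y) * (1 + log(popularity(y))); popularity must be >= 1."""
    vectors = tfidf_vectors(docs)
    scored = [
        (
            doc_id,
            cosine(vectors[query_id], vectors[doc_id])
            * (1 + math.log(popularity[doc_id])),
        )
        for doc_id in docs
        if doc_id != query_id
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = {
    "q": "collaborative filtering for research papers",
    "a": "collaborative filtering with implicit feedback",
    "b": "collaborative filtering with implicit feedback",
    "c": "protein folding dynamics",
}
popularity = {"a": 1, "b": 100, "c": 1000}  # hypothetical reader counts
ranked = rank_similar("q", docs, popularity)
```

Documents `a` and `b` have identical text, so the popularity factor alone decides their order, while `c` stays at zero however popular it is because it shares no terms with the query.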
  8. Research Interests
     • Use user-defined tags to form a search query
     • Queries articles stored in Elasticsearch, limited to globally popular documents
     • Top N documents are served as recommendations
     • More tailored to users
     • Not all users have filled in interests
     • Sometimes research interests are mini abstracts
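One way such a query could look is sketched below as an Elasticsearch query body built in Python. The slides do not show the real query, index mapping, or popularity cut-off, so everything specific here is an assumption: the field names `title`, `abstract`, and `reader_count`, the `min_readers` threshold, and the helper name `build_interest_query`. The snippet only builds the request body and does not talk to a cluster.

```python
def build_interest_query(interests, top_n=10, min_readers=50):
    """Build a bool query: match the interest tags, keep only popular documents."""
    return {
        "size": top_n,  # serve the top N hits as recommendations
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": " ".join(interests),
                            "fields": ["title", "abstract"],  # hypothetical fields
                        }
                    }
                ],
                "filter": [
                    # Hypothetical popularity cut-off on a reader-count field.
                    {"range": {"reader_count": {"gte": min_readers}}}
                ],
            }
        },
    }


body = build_interest_query(["recommender systems", "information retrieval"], top_n=5)
```

Putting the popularity condition in `filter` rather than `must` keeps it from influencing the relevance score, so ranking is driven by the text match alone.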
  9. Discipline
     • Users choose a discipline from a list of 30 categories (e.g. engineering, arts & humanities)
     • Popularity: rank each document in our catalogue according to the number of unique users from that discipline who have it in their libraries:
       $$pop(d, U_g) = \lvert \{ u : u \in U_g,\ d \in lib(u) \} \rvert$$
     • Trending: rank each document in a discipline based on the rate of growth in popularity across consecutive weeks:
       $$T_d^g = \{ pop(d, U_g, \tau) - pop(d, U_g, \tau - 1) : \tau = 0 \dots n \}$$
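The popularity and trending counts can be illustrated with a small sketch. It assumes library additions are available as `(user, document, week)` tuples and that popularity at week τ counts every unique reader in the discipline up to and including that week; the data layout and the function names `pop` and `trending` are hypothetical.

```python
from collections import Counter, defaultdict


def pop(additions, discipline_users, week):
    """Unique users in the discipline holding each document by the end of `week`."""
    readers = defaultdict(set)
    for user, doc, added_week in additions:
        if user in discipline_users and added_week <= week:
            readers[doc].add(user)
    return Counter({doc: len(users) for doc, users in readers.items()})


def trending(additions, discipline_users, week):
    """Growth in popularity between week - 1 and week for each document."""
    now = pop(additions, discipline_users, week)
    before = pop(additions, discipline_users, week - 1)
    # Counter returns 0 for documents nobody held in the earlier week.
    return {doc: now[doc] - before[doc] for doc in now}


additions = [
    ("u1", "d1", 0), ("u2", "d1", 0), ("u3", "d1", 1),
    ("u1", "d2", 1), ("u2", "d2", 1), ("u3", "d2", 1),
    ("u9", "d2", 1),  # u9 belongs to another discipline and is ignored
]
engineering = {"u1", "u2", "u3"}
popular_now = pop(additions, engineering, week=1)
growth = trending(additions, engineering, week=1)
```

Here both documents end week 1 with three readers, but `d2` gained all three in that week, so it ranks above `d1` on the trending measure.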
  10. Evaluation
      • Task: predict what users are going to add to their library
      • Split Mendeley library additions on a time boundary (T)
      • Warm users appear in both the training and test sets (≈ 200,000 users)
      • Cold users appear only in the test set (≈ 50,000 users)
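A time-boundary split of this kind might be sketched as follows. The tuple layout `(user, document, timestamp)` and the function name `split_by_time` are assumptions for illustration; the slides do not say whether the boundary itself falls in the training or the test set, and this sketch places it in the test set.

```python
def split_by_time(additions, boundary):
    """Split library additions at a time boundary and label warm and cold users."""
    train = [row for row in additions if row[2] < boundary]
    test = [row for row in additions if row[2] >= boundary]
    train_users = {user for user, _, _ in train}
    test_users = {user for user, _, _ in test}
    warm = train_users & test_users  # activity on both sides of the boundary
    cold = test_users - train_users  # first seen only after the boundary
    return train, test, warm, cold


additions = [
    ("u1", "d1", 1), ("u1", "d2", 5),
    ("u2", "d3", 2),
    ("u3", "d4", 6),
]
train, test, warm, cold = split_by_time(additions, boundary=4)
```

User `u2` has no additions after the boundary, so there is nothing to predict for them and they fall into neither group.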
  11. Metrics
      $$\mathrm{precision@}n = \frac{tp}{tp + fp} \qquad \mathrm{recall@}n = \frac{tp}{tp + fn} \qquad F_1\mathrm{@}n = \frac{2 \times p@n \times r@n}{p@n + r@n}$$
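For a single user, these three metrics can be computed as in the sketch below. It assumes the recommendation list contains no duplicates and that "relevant" means the documents the user actually added after the time boundary; the function name is hypothetical.

```python
def precision_recall_f1_at_n(recommended, relevant, n):
    """Precision, recall and F1 over the top-n recommendations for one user."""
    top_n = recommended[:n]
    tp = len(set(top_n) & relevant)  # recommended and actually added
    fp = len(top_n) - tp             # recommended but not added
    fn = len(relevant) - tp          # added but not recommended
    precision = tp / (tp + fp) if top_n else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


recommended = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d2", "d5", "d9", "d10"}
p, r, f1 = precision_recall_f1_at_n(recommended, relevant, n=5)
```

With two hits out of five recommendations and four relevant documents, precision is 2/5 and recall is 2/4.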
  12. Cold Recommendations
  13. Warm Recommendations
  14. User Segmentation
      • Unpublished: undergraduates and new postgraduates
      • Postgraduate: has published 1 or 2 articles
      • Postdoc: published during their PhD and postdoc
      • Lecturer: extensively published across a number of fields
      • Professor: prolific author with many collaborations
  15. User Segmentation Results
  16. Practicalities
      • Technical implementation: Spark, Hadoop, Mahout, Elasticsearch
      • Freshness of content: dithering is applied to give the appearance of fresh content to the end user:
        $$newscore = \log(rank) + N(0, \log \varepsilon), \quad \varepsilon = \frac{\Delta rank}{rank}$$
      • Content quality:
          • Users add anything to their library
          • Pre-filtering removes articles with titles containing 'content' or 'TOC'
          • Completeness of metadata is checked
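The dithering formula on this slide can be sketched as follows: each item's log rank is perturbed with Gaussian noise and the list is re-sorted. The slide does not say whether log ε is the standard deviation or the variance of the noise; this sketch treats it as the standard deviation and takes ε as a tunable parameter of at least 1. The function name `dither` and the `seed` argument are hypothetical.

```python
import math
import random


def dither(ranked_items, epsilon=1.5, seed=None):
    """Re-order a ranked list by adding Gaussian noise to the log of each rank."""
    if epsilon < 1.0:
        raise ValueError("epsilon must be >= 1 so that log(epsilon) is non-negative")
    rng = random.Random(seed)
    sigma = math.log(epsilon)  # assumption: log(epsilon) is the standard deviation
    noisy = [
        (math.log(rank) + rng.gauss(0.0, sigma), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    # A lower noisy score means a better position in the served list.
    return [item for _, item in sorted(noisy, key=lambda pair: pair[0])]


items = list("abcdefghij")
shuffled = dither(items, epsilon=2.0, seed=7)
unchanged = dither(items, epsilon=1.0)
```

Because the noise is added to the log of the rank, the gap between neighboring scores shrinks further down the list, so the same noise swaps items near the bottom much more often than items near the top. With ε = 1 the noise vanishes and the original order is returned.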
  17. Future Directions: Learning to Rank
      • By mining user interaction with the implicit feedback recommender, learn an optimal ranking based on a comparison of item features and user features, e.g. content vectors
      • Aggregate the different recommender systems into one list, with the mixture of recommenders personalized to each user
  18. We are hiring data scientists & engineers! http://bit.ly/MendeleyDataScienceJob