The depth and breadth of research now being published is overwhelming for an individual researcher to keep track of, let alone consume. Recommender systems have been developed to make it easier for researchers to discover relevant content. However, these have predominantly taken the form of item-to-item recommendations using citation network features or text similarity features. This paper details how the Mendeley Suggest recommender system has been designed and developed. We show how implicit user feedback (based on activity data from the reference manager) and collaborative filtering (CF) are used to generate the recommendations for Mendeley Suggest. Because collaborative filtering suffers from the cold start problem (the inability to serve recommendations to new users), we developed additional recommendation methods based on user-defined attributes, such as discipline and research interests. Our offline evaluation shows that where possible, recommendations based on collaborative filtering perform best, followed by recommendations based on recent activity. However, for cold users (for whom collaborative filtering was not possible) recommendations based on discipline performed best. Additionally, when we segmented users by career stage, we found that among senior academics, content-based recommendations from recent activity had comparable performance to collaborative filtering. This justifies our approach of developing a variety of recommendation methods, in order to serve a range of users across the academic spectrum.
Building Recommender Systems for Scholarly Information
1. Mendeley |
Building recommender systems for scholarly information
Maya Hristakeva, Daniel Kershaw, Marco Rossetti*, Petr Knoth^, Benjamin Pettit, Saúl Vargas, Kris Jack
Presented by: Daniel Kershaw
10th February 2017
* Currently working at Trainline
^ Currently working at the Open University
2. Mendeley | 2
Mendeley / Mendeley Suggest
• Make it easier for users to discover relevant content
• Utilize collective intelligence for article discovery
• Citations are slow to propagate
• Citations lag behind user reading patterns
3. Mendeley |
Challenges
• For the user, the recommendations need to be:
  • Novel
  • Relevant
  • Familiar
  • Serendipitous
  • Well explained
• How to deal with cold and warm users
• How to deal with large data sets
5. Mendeley |
Types of Recommendations (ordered from most to least personalized)
• Implicit – serves recommendations based on the user's library
• Recent Activity – based on recent additions to the user's library
• Research Interests – based on user-generated tags
• Discipline – based on the user's self-identified discipline
6. Mendeley |
Implicit – user-based nearest neighbor collaborative filtering
Users who have read the same documents in the past will read the same documents in the future.
Identify similar users using cosine similarity between their libraries:
$\cos(u_1, u_2) = \frac{|lib(u_1) \cap lib(u_2)|}{\sqrt{|lib(u_1)|}\,\sqrt{|lib(u_2)|}}$
The score of a document d for user u is then a sum across the inverted neighborhood:
$r_d^u = \sum_{u' \in sim(U,u)} \begin{cases} \cos(u, u') & \text{if } d \in lib(u') \setminus lib(u) \\ 0 & \text{otherwise} \end{cases}$
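A minimal sketch of this user-based nearest-neighbor scoring, assuming libraries are represented as sets of document ids. The `libraries` dictionary and the neighborhood size `k` are illustrative names, not the production implementation.

```python
# Sketch: user-based nearest-neighbor CF over binary library vectors.
from collections import defaultdict
from math import sqrt

def cosine(lib_a, lib_b):
    """Cosine similarity between two binary library vectors (sets of doc ids)."""
    if not lib_a or not lib_b:
        return 0.0
    return len(lib_a & lib_b) / (sqrt(len(lib_a)) * sqrt(len(lib_b)))

def recommend(user, libraries, k=50, n=10):
    """Score documents from the user's k nearest neighbors,
    excluding documents already in the user's library."""
    target = libraries[user]
    neighbours = sorted(
        ((cosine(target, lib), other) for other, lib in libraries.items() if other != user),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, other in neighbours:
        for doc in libraries[other] - target:   # only unseen documents
            scores[doc] += sim                  # r_d^u = sum of neighbor similarities
    return sorted(scores, key=scores.get, reverse=True)[:n]

libraries = {
    "u1": {"d1", "d2", "d3"},
    "u2": {"d2", "d3", "d4"},
    "u3": {"d1", "d5"},
}
print(recommend("u1", libraries))
```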
7. Mendeley |
Recent Activity
• Use the last article added to a user's library, or the last article read
• Fundamentally item-to-item recommendations
• Performed by comparing the content of articles through TF-IDF vectors:
  $r_a(q, y) = sim(q, y) \times (1 + \log(popularity(y, \text{global})))$
• The similarity score is modified by the log of the article's global popularity, as a proxy for its quality
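A minimal sketch of this scoring, assuming the TF-IDF vectors are built with scikit-learn; the candidate corpus and readership counts below are illustrative.

```python
# Sketch: TF-IDF similarity between the query article and candidates,
# boosted by log global popularity.
import math
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

candidates = ["deep learning for nlp", "graph based citation analysis", "topic models for text"]
popularity = [120, 45, 300]          # global readership counts (illustrative)
query = "neural networks for natural language processing"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(candidates)
query_vector = vectorizer.transform([query])

sims = cosine_similarity(query_vector, doc_vectors)[0]
# r_a(q, y) = sim(q, y) * (1 + log(popularity(y, global)))
scores = [s * (1 + math.log(p)) for s, p in zip(sims, popularity)]
ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
print(ranked)
```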
8. Mendeley |
Research Interests
• Use user-defined tags to form a search query
• Query articles stored in Elasticsearch, limited to globally popular documents
• Top-N documents are served as recommendations
• More tailored to users
• Not all users have filled in their research interests
• Sometimes research interests are mini abstracts
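A minimal sketch of how interest tags could be turned into such a query. The field names ("title", "abstract", "readers") and the popularity filter are assumptions for illustration, not Mendeley's actual schema; the resulting body would be sent to Elasticsearch's search API.

```python
# Sketch: build an Elasticsearch query from a user's research-interest tags,
# restricted to globally popular documents.
def interests_to_query(interests, min_readers=50, size=10):
    return {
        "size": size,
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": " ".join(interests),
                        "fields": ["title", "abstract"],
                    }
                },
                "filter": {
                    "range": {"readers": {"gte": min_readers}}  # keep popular docs only
                },
            }
        },
    }

print(interests_to_query(["recommender systems", "machine learning"]))
```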
9. Mendeley |
Discipline
• Users choose a discipline from a list of 30 categories (e.g. engineering, arts & humanities)
• Popularity – rank each document in the catalogue according to the number of unique users from that discipline who have it in their libraries:
  $pop(d, U_g) = |\{u : u \in U_g, d \in lib(u)\}|$
• Trending – rank each document in a discipline based on the rate of growth in popularity across consecutive weeks:
  $T_d^g = \{pop(d, U_g, \tau) - pop(d, U_g, \tau - 1) : \tau = 0 \ldots n\}$
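A minimal sketch of both discipline-based scorers, assuming per-user libraries (sets of doc ids), a user-to-discipline mapping, and weekly popularity snapshots; all names are illustrative.

```python
# Sketch: discipline popularity and week-over-week trending.
from collections import Counter

def popularity(libraries, discipline_of, discipline):
    """pop(d, U_g): number of unique users in discipline g with d in their library."""
    counts = Counter()
    for user, lib in libraries.items():
        if discipline_of.get(user) == discipline:
            counts.update(lib)          # lib is a set, so each user counts once per doc
    return counts

def trending(prev_week, curr_week):
    """Rank documents by growth in discipline popularity between consecutive weeks."""
    docs = set(prev_week) | set(curr_week)
    return sorted(docs, key=lambda d: curr_week.get(d, 0) - prev_week.get(d, 0), reverse=True)

libraries = {"u1": {"d1", "d2"}, "u2": {"d1"}, "u3": {"d3"}}
discipline_of = {"u1": "engineering", "u2": "engineering", "u3": "arts"}
print(popularity(libraries, discipline_of, "engineering").most_common(5))
```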
10. Mendeley |
Evaluation
Task: predict what users are going to add to their library.
Split Mendeley library additions on a time boundary (T).
Warm users appear in both the training and test sets (≈ 200,000 users).
Cold users appear only in the test set (≈ 50,000 users).
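A minimal sketch of this time-based split, assuming library additions are (user, document, timestamp) records; the data and helper names are illustrative.

```python
# Sketch: split library additions on a time boundary T into training data,
# warm-user test data, and cold-user test data.
from datetime import datetime

def time_split(additions, boundary):
    train = [(u, d) for u, d, t in additions if t < boundary]
    test = [(u, d) for u, d, t in additions if t >= boundary]
    train_users = {u for u, _ in train}
    warm_test = [(u, d) for u, d in test if u in train_users]      # seen before T
    cold_test = [(u, d) for u, d in test if u not in train_users]  # only after T
    return train, warm_test, cold_test

additions = [
    ("u1", "d1", datetime(2016, 5, 1)),
    ("u1", "d2", datetime(2016, 9, 1)),
    ("u2", "d3", datetime(2016, 10, 1)),
]
train, warm_test, cold_test = time_split(additions, datetime(2016, 7, 1))
```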
14. Mendeley |
User Segmentation
• Unpublished – undergraduates and new postgraduates
• Postgraduate – published 1 or 2 articles
• Postdoc – published during their PhD and postdoc
• Lecturer – extensively published across a number of fields
• Professor – prolific author with many collaborations
16. Mendeley |
Practicalities
Technical implementation
• Spark, Hadoop, Mahout, Elasticsearch
Freshness of content
• Dithering is applied to give the appearance of fresh content to the end user:
  $newscore = \log(rank) + \mathcal{N}(0, \log(\varepsilon)), \quad \varepsilon = \frac{\Delta rank}{rank}$
Content quality
• Users can add anything to their library
• Pre-filtering removes articles with titles containing 'content' or 'TOC'
• Completeness of metadata is checked
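A minimal sketch of the dithering step described above, using log(ε) as the Gaussian noise scale; the ε value and item list are illustrative.

```python
# Sketch: perturb log(rank) with Gaussian noise so successive impressions
# of the same recommendation list look fresh.
import math
import random

def dither(ranked_items, epsilon=1.5):
    """Re-order a ranked list by newscore = log(rank) + N(0, log(epsilon))."""
    noisy = [
        (math.log(rank) + random.gauss(0, math.log(epsilon)), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    return [item for _, item in sorted(noisy)]

print(dither(["d1", "d2", "d3", "d4", "d5"]))
```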
17. Mendeley |
Future Directions - Learning to Rank
By mining user interactions with the implicit-feedback recommender, learn an optimal ranking based on a comparison of item features and user features (e.g. content vectors).
Aggregate the different recommender systems into one list, with the mixture of recommenders personalized to each user.
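One possible reading of this direction is a pairwise learning-to-rank model over combined user and item feature vectors. The sketch below uses a BPR-style update as an illustrative choice; the feature construction, dimensions, and training loop are assumptions, not the authors' method.

```python
# Sketch: pairwise (BPR-style) update that pushes items the user interacted with
# above items they skipped, scored by a linear model over user ++ item features.
import numpy as np

def bpr_update(w, x_pos, x_neg, lr=0.01):
    """One gradient step on -log(sigmoid(w·(x_pos - x_neg)))."""
    diff = x_pos - x_neg
    sigmoid = 1.0 / (1.0 + np.exp(w @ diff))
    return w + lr * sigmoid * diff

rng = np.random.default_rng(0)
w = np.zeros(8)
for _ in range(100):
    user_vec = rng.normal(size=4)                  # illustrative user content vector
    pos_item, neg_item = rng.normal(size=4), rng.normal(size=4)
    x_pos = np.concatenate([user_vec, pos_item])   # user features ++ clicked item features
    x_neg = np.concatenate([user_vec, neg_item])   # user features ++ skipped item features
    w = bpr_update(w, x_pos, x_neg)
```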
It should be noted that this does not take into account the different publication patterns across disciplines; it only applies a generic classification. Each metric is applied to warm users in each of the five persona classes.
Postdocs and lecturers have a higher recall for recency. This could be because more senior researchers explore a focused topic and add a succession of related papers, whereas less experienced researchers may be exploring the field and require a broader range of recommendations, as delivered by the CF system.