The depth and breadth of research now being published is overwhelming for an individual researcher to keep track of, let alone consume. Recommender systems have been developed to make it easier for researchers to discover relevant content. However, these have predominantly taken the form of item-to-item recommendations using citation network features or text similarity features. This paper details how the Mendeley Suggest recommender system has been designed and developed. We show how implicit user feedback (based on activity data from the reference manager) and collaborative filtering (CF) are used to generate the recommendations for Mendeley Suggest. Because collaborative filtering suffers from the cold start problem (the inability to serve recommendations to new users), we developed additional recommendation methods based on user-defined attributes, such as discipline and research interests. Our offline evaluation shows that where possible, recommendations based on collaborative filtering perform best, followed by recommendations based on recent activity. However, for cold users (for whom collaborative filtering was not possible) recommendations based on discipline performed best. Additionally, when we segmented users by career stage, we found that among senior academics, content-based recommendations from recent activity had comparable performance to collaborative filtering. This justifies our approach of developing a variety of recommendation methods, in order to serve a range of users across the academic spectrum.
Building Recommender Systems for Scholarly Information
1. Mendeley |
Building recommender systems for scholarly information
Maya Hristakeva, Daniel Kershaw, Marco Rossetti*, Petr Knoth^, Benjamin Pettit, Saúl Vargas, Kris Jack
Presented by: Daniel Kershaw
10th February 2017
* Currently working at Trainline
^ Currently working at the Open University
2. Mendeley | 2
Mendeley / Mendeley Suggest
• Make it easier for users to discover relevant content
• Utilize collective intelligence for article discovery
• Citations are slow to propagate
• Citations lag behind user reading patterns
3. Mendeley |
Challenges
• For the user, the recommendations need to be:
  • Novel
  • Relevant
  • Familiar
  • Serendipitous
  • Well explained
• How to deal with cold and warm users
• How to deal with large data sets
5. Mendeley |
Types of Recommendations (ordered from most to least personalized)
• Implicit – serves recommendations based on the user's library
• Recent Activity – based on recent additions to the user's library
• Research Interests – based on user-generated tags
• Discipline – based on the user's self-identified discipline
6. Mendeley |
Implicit – user-based nearest neighbor collaborative filtering
Users who have read the same documents in the past will read the same documents in the future.
Identify similar users using cosine similarity between their libraries:
$\cos(u_1, u_2) = \frac{|lib(u_1) \cap lib(u_2)|}{\sqrt{|lib(u_1)|}\,\sqrt{|lib(u_2)|}}$
The score of a document d for user u is then a sum across the inverted neighborhood:
$r_d^u = \sum_{u' \in sim(U,u)} \begin{cases} \cos(u, u') & \text{if } d \in lib(u') \setminus lib(u) \\ 0 & \text{otherwise} \end{cases}$
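A minimal sketch of this user-based nearest-neighbor scoring, assuming libraries are represented as sets of document ids. The `libraries` dictionary and the neighborhood size `k` are illustrative names, not the production implementation.

```python
# Sketch: user-based nearest-neighbor CF over binary library vectors.
from collections import defaultdict
from math import sqrt

def cosine(lib_a, lib_b):
    """Cosine similarity between two binary library vectors (sets of doc ids)."""
    if not lib_a or not lib_b:
        return 0.0
    return len(lib_a & lib_b) / (sqrt(len(lib_a)) * sqrt(len(lib_b)))

def recommend(user, libraries, k=50, n=10):
    """Score documents from the user's k nearest neighbors,
    excluding documents already in the user's library."""
    target = libraries[user]
    neighbours = sorted(
        ((cosine(target, lib), other) for other, lib in libraries.items() if other != user),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, other in neighbours:
        for doc in libraries[other] - target:   # only unseen documents
            scores[doc] += sim                  # r_d^u = sum of neighbor similarities
    return sorted(scores, key=scores.get, reverse=True)[:n]

libraries = {
    "u1": {"d1", "d2", "d3"},
    "u2": {"d2", "d3", "d4"},
    "u3": {"d1", "d5"},
}
print(recommend("u1", libraries))
```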
7. Mendeley |
Recent Activity
• Use the last article added to a user's library, or the last article read
• Fundamentally item-to-item recommendations
• Performed by comparing the content of articles through TF-IDF vectors:
  $r_a(q, y) = sim(q, y) \times (1 + \log(popularity(y, \text{global})))$
• The similarity score is modified by the log of the article's global popularity, as a proxy for its quality
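A minimal sketch of this scoring, assuming the TF-IDF vectors are built with scikit-learn; the candidate corpus and readership counts below are illustrative.

```python
# Sketch: TF-IDF similarity between the query article and candidates,
# boosted by log global popularity.
import math
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

candidates = ["deep learning for nlp", "graph based citation analysis", "topic models for text"]
popularity = [120, 45, 300]          # global readership counts (illustrative)
query = "neural networks for natural language processing"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(candidates)
query_vector = vectorizer.transform([query])

sims = cosine_similarity(query_vector, doc_vectors)[0]
# r_a(q, y) = sim(q, y) * (1 + log(popularity(y, global)))
scores = [s * (1 + math.log(p)) for s, p in zip(sims, popularity)]
ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
print(ranked)
```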
8. Mendeley |
Research Interests
• Use user-defined tags to form a search query
• Query articles stored in Elasticsearch, limited to globally popular documents
• Top-N documents are served as recommendations
• More tailored to users
• Not all users have filled in their research interests
• Sometimes research interests are mini abstracts
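A minimal sketch of how interest tags could be turned into such a query. The field names ("title", "abstract", "readers") and the popularity filter are assumptions for illustration, not Mendeley's actual schema; the resulting body would be sent to Elasticsearch's search API.

```python
# Sketch: build an Elasticsearch query from a user's research-interest tags,
# restricted to globally popular documents.
def interests_to_query(interests, min_readers=50, size=10):
    return {
        "size": size,
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": " ".join(interests),
                        "fields": ["title", "abstract"],
                    }
                },
                "filter": {
                    "range": {"readers": {"gte": min_readers}}  # keep popular docs only
                },
            }
        },
    }

print(interests_to_query(["recommender systems", "machine learning"]))
```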
9. Mendeley |
Discipline
• Users choose a discipline from a list of 30 categories (e.g. engineering, arts & humanities)
• Popularity – rank each document in the catalogue according to the number of unique users from that discipline who have it in their libraries:
  $pop(d, U_g) = |\{u : u \in U_g, d \in lib(u)\}|$
• Trending – rank each document in a discipline based on the rate of growth in popularity across consecutive weeks:
  $T_d^g = \{pop(d, U_g, \tau) - pop(d, U_g, \tau - 1) : \tau = 0 \ldots n\}$
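A minimal sketch of both discipline-based scorers, assuming per-user libraries (sets of doc ids), a user-to-discipline mapping, and weekly popularity snapshots; all names are illustrative.

```python
# Sketch: discipline popularity and week-over-week trending.
from collections import Counter

def popularity(libraries, discipline_of, discipline):
    """pop(d, U_g): number of unique users in discipline g with d in their library."""
    counts = Counter()
    for user, lib in libraries.items():
        if discipline_of.get(user) == discipline:
            counts.update(lib)          # lib is a set, so each user counts once per doc
    return counts

def trending(prev_week, curr_week):
    """Rank documents by growth in discipline popularity between consecutive weeks."""
    docs = set(prev_week) | set(curr_week)
    return sorted(docs, key=lambda d: curr_week.get(d, 0) - prev_week.get(d, 0), reverse=True)

libraries = {"u1": {"d1", "d2"}, "u2": {"d1"}, "u3": {"d3"}}
discipline_of = {"u1": "engineering", "u2": "engineering", "u3": "arts"}
print(popularity(libraries, discipline_of, "engineering").most_common(5))
```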
10. Mendeley |
Evaluation
Task: predict what users are going to add to their library.
Split Mendeley library additions on a time boundary (T).
Warm users appear in both the training and test sets (≈ 200,000 users).
Cold users appear only in the test set (≈ 50,000 users).
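A minimal sketch of this time-based split, assuming library additions are (user, document, timestamp) records; the data and helper names are illustrative.

```python
# Sketch: split library additions on a time boundary T into training data,
# warm-user test data, and cold-user test data.
from datetime import datetime

def time_split(additions, boundary):
    train = [(u, d) for u, d, t in additions if t < boundary]
    test = [(u, d) for u, d, t in additions if t >= boundary]
    train_users = {u for u, _ in train}
    warm_test = [(u, d) for u, d in test if u in train_users]      # seen before T
    cold_test = [(u, d) for u, d in test if u not in train_users]  # only after T
    return train, warm_test, cold_test

additions = [
    ("u1", "d1", datetime(2016, 5, 1)),
    ("u1", "d2", datetime(2016, 9, 1)),
    ("u2", "d3", datetime(2016, 10, 1)),
]
train, warm_test, cold_test = time_split(additions, datetime(2016, 7, 1))
```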
14. Mendeley |
User Segmentation
• Unpublished – undergraduates and new postgraduates
• Postgraduate – published 1 or 2 articles
• Postdoc – published during their PhD and postdoc
• Lecturer – extensively published across a number of fields
• Professor – prolific author with many collaborations
16. Mendeley |
Practicalities
Technical implementation
• Spark, Hadoop, Mahout, Elasticsearch
Freshness of content
• Dithering is applied to give the appearance of fresh content to the end user:
  $newscore = \log(rank) + \mathcal{N}(0, \log(\varepsilon)), \quad \varepsilon = \frac{\Delta rank}{rank}$
Content quality
• Users can add anything to their library
• Pre-filtering removes articles with titles containing 'content' or 'TOC'
• Completeness of metadata is checked
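A minimal sketch of the dithering step described above, using log(ε) as the Gaussian noise scale; the ε value and item list are illustrative.

```python
# Sketch: perturb log(rank) with Gaussian noise so successive impressions
# of the same recommendation list look fresh.
import math
import random

def dither(ranked_items, epsilon=1.5):
    """Re-order a ranked list by newscore = log(rank) + N(0, log(epsilon))."""
    noisy = [
        (math.log(rank) + random.gauss(0, math.log(epsilon)), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    return [item for _, item in sorted(noisy)]

print(dither(["d1", "d2", "d3", "d4", "d5"]))
```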
17. Mendeley |
Future Directions - Learning to Rank
By mining user interactions with the implicit-feedback recommender, learn an optimal ranking based on a comparison of item features and user features (e.g. content vectors).
Aggregate the different recommender systems into one list, with the mixture of recommenders personalized to each user.
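One possible reading of this direction is a pairwise learning-to-rank model over combined user and item feature vectors. The sketch below uses a BPR-style update as an illustrative choice; the feature construction, dimensions, and training loop are assumptions, not the authors' method.

```python
# Sketch: pairwise (BPR-style) update that pushes items the user interacted with
# above items they skipped, scored by a linear model over user ++ item features.
import numpy as np

def bpr_update(w, x_pos, x_neg, lr=0.01):
    """One gradient step on -log(sigmoid(w·(x_pos - x_neg)))."""
    diff = x_pos - x_neg
    sigmoid = 1.0 / (1.0 + np.exp(w @ diff))
    return w + lr * sigmoid * diff

rng = np.random.default_rng(0)
w = np.zeros(8)
for _ in range(100):
    user_vec = rng.normal(size=4)                  # illustrative user content vector
    pos_item, neg_item = rng.normal(size=4), rng.normal(size=4)
    x_pos = np.concatenate([user_vec, pos_item])   # user features ++ clicked item features
    x_neg = np.concatenate([user_vec, neg_item])   # user features ++ skipped item features
    w = bpr_update(w, x_pos, x_neg)
```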
It should be noted that this does not take into account the different publication patterns across disciplines; it only applies a generic classification. Each metric is applied to warm users in each of the five persona classes.
Postdocs and lecturers have a higher recall for recency. This could be because more senior researchers explore a focused topic and add a succession of related papers, whereas less experienced researchers may be exploring the field and require a broader range of recommendations, as delivered by the CF system.