Mendeley |
Building recommender systems for scholarly information
Maya Hristakeva, Daniel Kershaw, Marco Rossetti*, Petr Knoth^,
Benjamin Pettit, Saúl Vargas, Kris Jack
Daniel Kershaw
10th February 2017
* Currently working at Trainline
^ Currently working at the Open University
Mendeley / Mendeley Suggest
• Make it easier for users to discover relevant content
• Utilize collective intelligence for article discovery
• Citations are slow to propagate
• Citations lag behind user reading patterns
Challenges
• For the user, the recommendations need to be:
  • Novel
  • Relevant
  • Familiar
  • Serendipitous
  • Well explained
• How to deal with cold and warm users
• How to deal with large data sets
Types of Recommendations
• Implicit – serves recommendations based on user libraries
• Recent Activity – based on recent additions to a user's library
• Research Interests – based on user-generated tags
• Discipline – based on the user's self-identified discipline
The four types are ordered from most personalized (Implicit) to least personalized (Discipline).
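Implicit – user-based nearest neighbor collaborative filtering
Users who have read the same articles in the past will read the same articles in the future.
Identify similar users using cosine similarity between their libraries $L_1$ and $L_2$:
$$\cos(u_1, u_2) = \frac{L_1 \cdot L_2}{\lVert L_1 \rVert \times \lVert L_2 \rVert}$$
The score of document $d$ for user $u$ is then a sum across the inverted neighborhood:
$$r_d^u = \sum_{u' \in \mathrm{sim}(U, u)} \begin{cases} \cos(u, u'), & \text{if } d \in \mathrm{lib}(u') \setminus \mathrm{lib}(u) \\ 0, & \text{otherwise} \end{cases}$$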
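The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the production system: libraries are assumed to be plain sets of document IDs (so the cosine reduces to overlap divided by the geometric mean of library sizes), the neighborhood is simplified to the `k` most similar users rather than a true inverted neighborhood, and the names `cosine`, `recommend`, `k` and `n` are hypothetical.

```python
import math
from collections import defaultdict


def cosine(lib_a: set, lib_b: set) -> float:
    """Cosine similarity between two libraries treated as binary vectors."""
    if not lib_a or not lib_b:
        return 0.0
    return len(lib_a & lib_b) / math.sqrt(len(lib_a) * len(lib_b))


def recommend(user: str, libraries: dict, k: int = 2, n: int = 5) -> list:
    """Score unseen documents by summing the similarities of the k nearest users."""
    target = libraries[user]
    sims = [(cosine(target, lib), other)
            for other, lib in libraries.items() if other != user]
    # Assumption: keep only the k most similar users with non-zero similarity.
    neighbours = sorted((s for s in sims if s[0] > 0), reverse=True)[:k]

    scores = defaultdict(float)
    for sim, other in neighbours:
        for doc in libraries[other] - target:  # d in lib(u') \ lib(u)
            scores[doc] += sim
    # Highest score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:n]


# Toy data, invented for illustration only.
libraries = {
    "alice": {"d1", "d2", "d3"},
    "bob": {"d1", "d2", "d4"},
    "carol": {"d2", "d3", "d4", "d5"},
    "dave": {"d9"},
}
recommendations = recommend("alice", libraries)
```

In this toy data, `d4` is held by both of alice's neighbours and `d5` by only one, so `d4` accumulates the larger score.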
Recent Activity
• Use the last article added to a user's library or the last article read
• Fundamentally item-to-item recommendations
• Performed by comparing the content of articles through TF-IDF vectors
$$r_a(q, y) = \mathrm{sim}(q, y) \times \bigl(1 + \log(\mathrm{popularity}(y, \text{`global'}))\bigr)$$
• Score modified by the log of the global popularity, as a proxy for the quality of the article
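A rough sketch of the recent-activity score, using only the Python standard library. Several details are assumptions because the slide does not specify them: whitespace tokenisation, the IDF variant `log(1 + N/df)`, the natural logarithm, and popularity counts of at least 1. The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: dict) -> dict:
    """Build sparse TF-IDF vectors from whitespace-tokenised text."""
    tokens = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    df = Counter(term for words in tokens.values() for term in set(words))
    n_docs = len(docs)
    return {
        doc_id: {t: tf * math.log(1 + n_docs / df[t])
                 for t, tf in Counter(words).items()}
        for doc_id, words in tokens.items()
    }


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def recent_activity_scores(query_id: str, docs: dict, popularity: dict) -> list:
    """r_a(q, y) = sim(q, y) * (1 + log(popularity(y))) for every y != q."""
    vectors = tfidf_vectors(docs)
    scores = {}
    for doc_id, vec in vectors.items():
        if doc_id == query_id:
            continue
        sim = cosine(vectors[query_id], vec)
        scores[doc_id] = sim * (1 + math.log(popularity[doc_id]))
    return sorted(scores.items(), key=lambda item: -item[1])


# Toy data, invented for illustration only.
docs = {
    "q": "neural recommender systems",
    "a": "recommender systems for papers",
    "b": "recommender systems for papers",
    "c": "protein folding dynamics",
}
popularity = {"a": 10, "b": 1000, "c": 5000}
ranked = recent_activity_scores("q", docs, popularity)
```

Documents `a` and `b` have identical text, so only the popularity term separates them; `c` is very popular but shares no terms with the query, so its score stays at zero.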
Research Interests
• Use user-defined tags to form a search query
• Query articles stored in Elasticsearch, limited to globally popular documents
• Top N documents are served as recommendations
• More tailored to users
• Not all users have filled in interests
• Sometimes research interests are mini abstracts
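One way such a query might be assembled is sketched below as a plain Python dictionary in the Elasticsearch query DSL. The field names `abstract` and `reader_count`, the reader threshold used to stand in for "globally popular", and the helper name `build_interest_query` are all assumptions for illustration; the slides do not describe the actual index schema.

```python
def build_interest_query(tags, top_n=10, min_readers=100):
    """Build an Elasticsearch request body from a user's research-interest tags."""
    return {
        "size": top_n,  # top N documents are served as recommendations
        "query": {
            "bool": {
                # Full-text match of the joined tags against a hypothetical field.
                "must": [{"match": {"abstract": " ".join(tags)}}],
                # Assumption: popularity is enforced with a reader-count floor.
                "filter": [{"range": {"reader_count": {"gte": min_readers}}}],
            }
        },
    }


body = build_interest_query(["recommender systems", "information retrieval"], top_n=5)
```

The resulting dictionary would be sent as the body of a search request; the client call is left out because it depends on the client version in use.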
Discipline
• Users choose a discipline from a list of 30 categories (e.g. engineering, arts & humanities)
• Popularity – rank each document in our catalogue according to the number of unique users from that discipline who have it in their libraries
$$\mathrm{pop}(d, U_g) = \left|\{\, u : u \in U_g,\ d \in \mathrm{lib}(u) \,\}\right|$$
• Trending – rank each document in a discipline based on the rate of growth in popularity across consecutive weeks
$$T_d^g = \{\, \mathrm{pop}(d, U_g, \tau) - \mathrm{pop}(d, U_g, \tau - 1) : \tau = 0 \ldots n \,\}$$
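The two discipline-level signals can be sketched as follows. The data layout is an assumption: each weekly snapshot is a dictionary mapping user IDs to sets of document IDs, and differences start at the second snapshot because the first has no predecessor. The function names are hypothetical.

```python
from collections import Counter


def popularity(discipline_users, libraries):
    """pop(d, U_g): number of distinct users in U_g who hold each document d."""
    counts = Counter()
    for user in discipline_users:
        counts.update(libraries.get(user, set()))
    return counts


def trending(weekly_libraries, discipline_users):
    """Week-on-week change in popularity for each document in the discipline."""
    weekly = [popularity(discipline_users, libs) for libs in weekly_libraries]
    docs = set().union(*weekly)
    return {
        doc: [cur[doc] - prev[doc] for prev, cur in zip(weekly, weekly[1:])]
        for doc in docs
    }


# Toy data, invented for illustration only; u4 is outside the discipline.
engineers = {"u1", "u2", "u3"}
week0 = {"u1": {"d1"}, "u2": {"d1"}, "u3": set(), "u4": {"d2"}}
week1 = {"u1": {"d1", "d2"}, "u2": {"d1", "d2"}, "u3": {"d2"}, "u4": {"d2"}}

pop_now = popularity(engineers, week1)
trend = trending([week0, week1], engineers)
```

Here `d1` is stable while `d2` gains three readers within the discipline in one week, so `d2` would rank first on the trending list even though both are similarly popular.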
Evaluation
Predicting what users are going to add to their library.
Split Mendeley library additions on a time boundary (T).
Warm users appear in both the training and test sets (≈ 200,000 users).
Cold users appear only in the test data (≈ 50,000 users).
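The split described above might look like the following sketch. It assumes library additions arrive as `(user, document, timestamp)` tuples with comparable timestamps, and that additions strictly before the boundary go to training; both the event format and the name `temporal_split` are hypothetical.

```python
from collections import defaultdict


def temporal_split(events, boundary):
    """Split (user, doc, timestamp) events into train/test on a time boundary."""
    train, test = defaultdict(set), defaultdict(set)
    for user, doc, ts in events:
        (train if ts < boundary else test)[user].add(doc)
    warm = {u for u in test if u in train}  # history on both sides of T
    cold = set(test) - warm                 # first seen after T
    return dict(train), dict(test), warm, cold


# Toy data, invented for illustration only.
events = [
    ("u1", "d1", 1),
    ("u1", "d2", 5),
    ("u2", "d3", 6),
    ("u3", "d4", 2),
]
train, test, warm, cold = temporal_split(events, boundary=4)
```

A user such as `u3`, who has training data but adds nothing after the boundary, is neither warm nor cold because there is nothing to evaluate against.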
Metrics
$$\mathrm{precision@}n = \frac{tp}{tp + fp}$$
$$\mathrm{recall@}n = \frac{tp}{tp + fn}$$
$$F_1\mathrm{@}n = 2 \times \frac{p\mathrm{@}n \times r\mathrm{@}n}{p\mathrm{@}n + r\mathrm{@}n}$$
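As a small worked example, the three metrics can be computed per user as below. The sketch assumes the recommendation list contains no duplicates and that the relevant set is the documents the user actually added after the time boundary; the function name is hypothetical.

```python
def precision_recall_f1_at_n(recommended, relevant, n):
    """Return (precision@n, recall@n, F1@n) for one user."""
    top_n = recommended[:n]
    tp = len(set(top_n) & relevant)  # recommended and actually added
    fp = len(top_n) - tp             # recommended but not added
    fn = len(relevant) - tp          # added but not recommended in the top n
    precision = tp / (tp + fp) if top_n else 0.0
    recall = tp / (tp + fn) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
recommended = ["d1", "d2", "d3", "d4"]
relevant = {"d2", "d4", "d7"}
p, r, f1 = precision_recall_f1_at_n(recommended, relevant, n=3)
```

With `n = 3`, only `d2` is a hit, so precision and recall are both 1/3 and so is their harmonic mean.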
Cold Recommendations
Warm Recommendations
User Segmentation
• Unpublished – undergraduates and new postgrads
• Postgraduate – published 1 or 2 articles
• Postdoc – published during their PhD and postdoc
• Lecturer – extensively published across a number of fields
• Professor – prolific author with many collaborations
User Segmentation Results
Practicalities
Technical implementation
• Spark, Hadoop, Mahout, Elasticsearch
Freshness of Content
• Dithering is applied to give the appearance of fresh content to the end user
$$\mathit{newscore} = \log(\mathit{rank}) + N(0, \log \varepsilon), \qquad \varepsilon = \frac{\Delta \mathit{rank}}{\mathit{rank}}$$
Content Quality
• Users can add anything to their library
• Pre-filtering removes articles with titles containing 'content' or 'TOC'
• Completeness of metadata is checked
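The dithering formula can be sketched as a re-ranking step. One detail is an assumption: the slide does not say whether log ε is the standard deviation or the variance of the noise, and this sketch treats it as the standard deviation. The function name `dither`, the default `epsilon` and the `seed` parameter are hypothetical.

```python
import math
import random


def dither(ranked_items, epsilon=1.5, seed=None):
    """Re-rank items by log(rank) plus Gaussian noise scaled by log(epsilon)."""
    if epsilon < 1:
        raise ValueError("epsilon must be >= 1 so that log(epsilon) >= 0")
    rng = random.Random(seed)
    sigma = math.log(epsilon)  # assumption: used as the standard deviation
    noisy = [
        (math.log(rank) + rng.gauss(0.0, sigma), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    # Lower noisy score means a better position in the new list.
    return [item for _, item in sorted(noisy, key=lambda pair: pair[0])]


items = ["a", "b", "c", "d", "e"]
unchanged = dither(items, epsilon=1.0)        # no noise, original order kept
shuffled = dither(items, epsilon=2.0, seed=7)  # same items, perturbed order
```

Because the noise is added to the logarithm of the rank, items near the top move little while items further down are shuffled more freely, which is what makes the list look fresh without burying the best results.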
Future Directions - Learning to Rank
By mining user interaction with the implicit feedback recommender, learn an optimal ranking based on a comparison of item features and user features, e.g. content vectors.
Aggregate the different recommender systems into one list, with the mixture of recommenders personalized to each user.
http://bit.ly/MendeleyDataScienceJob
WE ARE HIRING DATA SCIENTISTS & ENGINEERS!

Editor's notes

  1. It should be noted that this does not take into account the different publication patterns across disciplines; only a generic classification is applied. Each metric is applied to warm users in each of the five persona classes.
  2. Postdocs and lecturers have a higher recall for recency. This could be due to more senior researchers exploring a focused topic and adding a succession of related papers, whereas less experienced researchers may be exploring the field and require a broader range of recommendations, as delivered by the CF system.