Mendeley: Recommendation Systems for Academic Literature

Kris Jack, PhD
Data Mining Team Lead
“All the time we are very
conscious of the huge challenges
that human society has now –
curing cancer, understanding
the brain for Alzheimer’s [...].

But a lot of the state of knowledge
of the human race is sitting in the
scientists’ computers, and is
currently not shared […] We need
to get it unlocked so we can tackle
those huge problems.”
Overview

➔
    what's a recommender and what does it look like?

➔
    what's Mendeley?

➔
    the secrets behind recommenders

➔
    recommenders @ Mendeley
What's a recommender and what does it look like?
What's a recommender?



Definition:

     A recommendation system
     (recommender) is a subclass of
     information filtering system that
     aims to predict a user's interest
     in items.
Recommendation Systems in the Wild
Recommendation Vs. Search




➔
    search is a pull strategy
      vs.
➔
    recommendation is a push strategy
Recommendation Vs. Search

search is like
 following a path...
Recommendation Vs. Search

 recommendation is
  like being on a roller
  coaster...




A different
sense of
control
What's Mendeley?
What is Mendeley?


...a large data technology
startup company

...and it's on a mission to change the way that research is done!
Mendeley          Last.fm

Last.fm works like this:

1) Install “Audioscrobbler”
2) Listen to music
3) Last.fm builds your music profile and recommends you
   music you also could like... and it’s the world’s biggest
   open music database
Last.fm                          Mendeley


music libraries                  research libraries


artists                          researchers


songs                            papers


genres                           disciplines
Mendeley provides tools to help users...


...organise
their research
Mendeley provides tools to help users...
                 ...collaborate with
                     one another
...organise
their research
US National Academy of Engineering “Grand Challenges”:



    Climate change
    Sustainable food supplies
    Artificial Intelligence
    Clean energy
    Clean water
    Terrorist violence
    Pandemic diseases
    Tools of scientific discovery
Mendeley provides tools to help users...
                 ...collaborate with
                     one another
...organise                            ...discover new
their research                                research
1.4 million+ users; the 20 largest userbases:
    University of Cambridge
    Stanford University
    MIT
    University of Michigan
    Harvard University
    University of Oxford
    Sao Paulo University
    Imperial College London
    University of Edinburgh
    Cornell University
    University of California at Berkeley
    RWTH Aachen
    Columbia University
    Georgia Tech
    University of Wisconsin
    UC San Diego
    University of California at LA
    University of Florida
    University of North Carolina
Real-time data on 28m unique papers:

[Figure: comparison of Thomson Reuters’ Web of Knowledge (dating from 1934) with Mendeley after 16 months; surviving value label: 50m]
The secrets behind
       recommenders



Q1/2: How can a tool generate recommendations?

Q2/2: How can you measure the tool's performance?
Q1/2: How can a tool generate recommendations?


Content-based Filtering                   Collaborative Filtering
Find items with similar                   Find items that users who are
characteristics (e.g. title,              similar to you also liked (wisdom
discipline) to what the user              of the crowds)
previously liked


TF-IDF, BM25, Bayesian                    User-based and item-based
classifiers, decision trees, artificial   variations, matrix factorisation
neural networks

Quickly absorbs new items                 No need to understand item
(overcomes cold start problem)            characteristics

Can make good recommendations             Tends to give more novel
from very few examples                    recommendations

                                                   Hybrid tools too...
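
To make the collaborative filtering column concrete, here is a minimal sketch of the user-based variation. It is an illustration only, not Mendeley's implementation: the toy libraries, the function names and the choice of Jaccard overlap as the user similarity are all assumptions.

```python
from collections import Counter

def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of liked items, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_user_based(target: str, libraries: dict, top_n: int = 3) -> list:
    """Score unseen items by the summed similarity of the users who liked them."""
    seen = libraries[target]
    scores = Counter()
    for user, items in libraries.items():
        if user == target:
            continue
        sim = jaccard(seen, items)
        if sim == 0.0:
            continue  # users with nothing in common contribute nothing
        for item in items - seen:
            scores[item] += sim
    # Sort by score (descending), then by item id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical toy data: user -> set of paper ids in their library.
libraries = {
    "alice": {"p1", "p2", "p3"},
    "bob": {"p1", "p2", "p4"},
    "carol": {"p2", "p3", "p5"},
    "dave": {"p9"},
}
print(recommend_user_based("alice", libraries))  # ['p4', 'p5']
```

Note that the sketch never looks at titles, tags or any other item characteristics, which is the property the table credits to collaborative filtering.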
Q2/2: How can you measure the tool's performance?



 ➔
      Cross validation with hold outs
     ➔
        get yourself a good ground truth
     ➔
        hide a fraction of your data from the system
     ➔
        try to predict the hidden fraction from the
       remaining data
     ➔
        calculate precision and recall

 ➔
      Let users decide
     ➔
       set up evaluations with real users (experimental)
     ➔
       track tool usage by users
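
The hold-out steps above can be sketched as follows. This is a minimal illustration under stated assumptions: `hold_out` and `precision_recall_at_k` are hypothetical helper names, the ground truth is taken to be a user's own library, and the ranked list stands in for whatever the recommender under test returns.

```python
import random

def hold_out(library: set, fraction: float, seed: int = 0) -> tuple:
    """Split one library into (visible, hidden); `fraction` is the share hidden."""
    rng = random.Random(seed)
    items = sorted(library)  # sort first so the shuffle is reproducible
    rng.shuffle(items)
    n_hidden = max(1, int(len(items) * fraction))
    return set(items[n_hidden:]), set(items[:n_hidden])

def precision_recall_at_k(recommended: list, hidden: set, k: int) -> tuple:
    """Precision: hits / k. Recall: hits / number of hidden items."""
    hits = sum(1 for item in recommended[:k] if item in hidden)
    precision = hits / k if k else 0.0
    recall = hits / len(hidden) if hidden else 0.0
    return precision, recall

# Hide half of a toy library, then score a made-up ranked list against it.
visible, hidden = hold_out({"p1", "p2", "p3", "p4"}, fraction=0.5)
recommended = ["p1", "p9", "p3", "p8", "p7"]  # hypothetical recommender output
print(precision_recall_at_k(recommended, hidden, k=5))
```

Precision at k rewards a short list that is mostly right, while recall rewards recovering as much of the hidden fraction as possible, so the two are usually reported together.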
Recommenders
   @ Mendeley



            1) Related Research
            ●
              given 1 research article
            ●
              find other related articles



            2) Personalised Recommendations
            ●
              given a user's profile (e.g. interests)
            ●
              find new articles of interest to them
Use Case 1: Related Research




    Strategy

       content-based approach (TF-IDF with a Lucene implementation)
       search for articles with the same metadata (e.g. title, tags)



    Evaluation

       cross-validation with hold outs on a ground truth data set
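
As a rough illustration of the content-based strategy, the sketch below ranks articles by TF-IDF cosine similarity over a single metadata field (tags). It is a plain-Python stand-in, not Lucene and not Lucene's scoring formula; the article ids, tags and function names are invented for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs: dict) -> dict:
    """Map each doc id to a {term: tf * idf} vector, with idf = log(N / df)."""
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    vectors = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        vectors[doc_id] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return vectors

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def related(doc_id: str, docs: dict, top_n: int = 5) -> list:
    """Return up to top_n other docs with non-zero similarity, best first."""
    vectors = tfidf_vectors(docs)
    query = vectors[doc_id]
    scored = [(cosine(query, vec), other)
              for other, vec in vectors.items() if other != doc_id]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:top_n] if score > 0]

# Hypothetical articles described only by their tags.
articles = {
    "a1": ["recommender", "collaborative-filtering"],
    "a2": ["recommender", "evaluation"],
    "a3": ["genomics", "sequencing"],
    "a4": ["collaborative-filtering", "matrix-factorisation"],
}
print(related("a1", articles))
```

Swapping the tag lists for title or abstract tokens gives the per-field comparison that the evaluation on the following slides reports.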
Use Case 1: Related Research
tf-idf Precision per Field when Field is Available

Q2/2 What are our results?

[Figure: precision @ 5 (y-axis, 0 to 0.5) by metadata field (x-axis): tag, abstract, mesh-term, title, general-keyword, author, keyword]

Results                 1) tags are the most informative field for finding related research
Use Case 1: Related Research
tf-idf Precision for Field Combos when Field is Available

[Figure: precision @ 5 (y-axis, 0 to 0.5) by metadata field(s) (x-axis): tag, bestCombo, abstract, mesh-term, title, general-keyword, author, keyword; bestCombo = abstract+author+general-keyword+tag+title]

Results                   2) tags outperform combinations of fields
How does Mendeley use recommendation technologies?
2/2 Personalised Recommendations


                2) Personalised Recommendations
                ●
                  given a user's profile (e.g. interests)
                ●
                  find new articles of interest to them
Use Case 2: Personalised Recommendations




   Strategy

      collaborative filtering (item-based with Apache Mahout)
      recommend articles to researchers that would interest them



   Evaluation

      cross-validation with hold outs on a ground truth data set
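
The item-based idea can be sketched in a few lines. This is not Mahout's API and not necessarily the similarity measure used at Mendeley: it assumes raw co-occurrence counts as the item-item similarity, and the libraries and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(libraries: dict) -> dict:
    """Count how often each pair of papers appears in the same library."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in libraries.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend_item_based(user: str, libraries: dict, top_n: int = 10) -> list:
    """Score each unseen paper by its co-occurrence with papers the user owns."""
    counts = cooccurrence(libraries)
    owned = libraries[user]
    scores = defaultdict(int)
    for item in owned:
        for other, count in counts[item].items():
            if other not in owned:
                scores[other] += count
    # Sort by score (descending), then by paper id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical user libraries: user -> set of paper ids.
libraries = {
    "u1": {"p1", "p2"},
    "u2": {"p1", "p2", "p3"},
    "u3": {"p2", "p3", "p4"},
    "u4": {"p1", "p3"},
}
print(recommend_item_based("u1", libraries))  # ['p3', 'p4']
```

Because the item-item counts depend only on the libraries, they can be computed once offline and reused for every user, which is one reason the item-based variation suits batch processing.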
Input:
User libraries




                 Output:
                 Recommend 10
                 articles to each user
Test:
                         10-fold cross validation
                         50,000 user libraries

                              16 months ago




Results:
<0.025 precision at 10
Test:
                       10-fold cross validation
                       50,000 user libraries
                            10 months ago
                            (i.e. + 6 months)




Results:
~0.1 precision at 10
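
One plausible reading of the 10-fold cross-validation tests above is sketched below: each user's library is split into folds, one fold is hidden at a time, and precision at k is averaged over all hidden folds. The slides do not spell out the exact protocol, so the fold construction, the `most_popular` baseline and all names here are assumptions.

```python
import random
from collections import Counter

def make_folds(items: set, n_folds: int, seed: int = 0) -> list:
    """Partition items into n_folds disjoint sets (some may be empty)."""
    rng = random.Random(seed)
    shuffled = sorted(items)
    rng.shuffle(shuffled)
    return [set(shuffled[i::n_folds]) for i in range(n_folds)]

def most_popular(visible: dict, user: str, k: int) -> list:
    """Hypothetical baseline: the k most common papers the user does not have."""
    counts = Counter(p for lib in visible.values() for p in lib)
    owned = visible[user]
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked if p not in owned][:k]

def cross_validated_precision(libraries: dict, recommend, k: int = 10,
                              n_folds: int = 10) -> float:
    """Average precision at k over every (user, hidden fold) pair."""
    scores = []
    for user, library in libraries.items():
        for hidden in make_folds(library, n_folds):
            if not hidden:
                continue  # library smaller than n_folds
            visible = dict(libraries)
            visible[user] = library - hidden  # hide this fold from the system
            recs = recommend(visible, user, k)
            hits = sum(1 for p in recs if p in hidden)
            scores.append(hits / k)
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical toy libraries; real tests would use far larger data.
toy = {
    "u1": {"p1", "p2", "p3", "p4"},
    "u2": {"p1", "p2", "p5"},
    "u3": {"p2", "p3", "p4", "p5"},
}
print(cross_validated_precision(toy, most_popular, k=2, n_folds=2))
```

Any recommender with the same `(visible, user, k)` signature can be passed in place of the baseline, so the same harness can compare successive versions of a system.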
Test:
                       Release to a subset of
                       users
                            10 months ago
                            (i.e. + 6 months)




Results:
~0.4 precision at 10
Article Recommendation Acceptance Rates
[Figure: acceptance rate (i.e. accept/reject clicks) against number of months live]
Precision by Library Size
[Figure: precision at 10 articles against number of articles in user library]
Test:
                                       10-fold cross validation
                                       50,000 user libraries




So, results comparable to non-distributed recommender

Completely distributed, so can easily run on EC2 within 24 hours...
Conclusions
 Summary

➔
    Recommendations can be complementary to search

➔
    They can help users to discover interesting items

➔
    They can exploit item metadata (content-based)

➔
    They can exploit the 'wisdom of the crowds' (CF)
Conclusions
 Summary

➔
  Crowd-sourced metadata can have a powerful
informative value (e.g. article tags)

➔
    Sometimes you need to let data grow

➔
  Evaluations under lab conditions don't always
predict real world results well

➔
 Recommenders don't just have to be about making
money … remember where we started...?
“All the time we are very
conscious of the huge challenges
that human society has now –
curing cancer, understanding
the brain for Alzheimer’s [...].


But a lot of the state of knowledge
of the human race is sitting in the
scientists’ computers, and is
currently not shared […] We need
to get it unlocked so we can tackle
those huge problems.”
www.mendeley.com

Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

Mendeley: Recommendation Systems for Academic Literature

  • 1. Mendeley: Recommendation Systems for Academic Literature Kris Jack, PhD Data Mining Team Lead
  • 2. “All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer’s [...]. But a lot of the state of knowledge of the human race is sitting in the scientists’ computers, and is currently not shared [...] We need to get it unlocked so we can tackle those huge problems.”
  • 3. Overview ➔ what's a recommender and what does it look like? ➔ what's Mendeley? ➔ the secrets behind recommenders ➔ recommenders @ Mendeley
  • 4. What's a recommender and what does it look like?
  • 5. What's a recommender? Definition: A recommendation system (recommender) is a subclass of information filtering system that aims to predict a user's interest in items.
  • 7. Recommendation Vs. Search ➔ search is a pull strategy vs. ➔ recommendation is a push strategy
  • 8. Recommendation Vs. Search search is like following a path...
  • 9. Recommendation Vs. Search recommendation is like being on a roller coaster... A different sense of control
  • 11. What is Mendeley? ...a large data technology startup company ...and it's on a mission to change the way that research is done!
  • 12. Mendeley Last.fm. Last.fm works like this: 1) Install “Audioscrobbler” 2) Listen to music 3) Last.fm builds your music profile and recommends music you might also like... and it’s the world’s biggest open music database
  • 13. Mendeley Last.fm (Last.fm ↔ Mendeley): music libraries ↔ research libraries; artists ↔ researchers; songs ↔ papers; genres ↔ disciplines
  • 14. Mendeley provides tools to help users... ...organise their research
  • 15. Mendeley provides tools to help users... ...collaborate with one another ...organise their research
  • 16. US National Academy of Engineering “Grand Challenges”: Climate change; Sustainable food supplies; Clean energy; Artificial Intelligence; Clean water; Terrorist violence; Pandemic diseases; Tools of scientific discovery
  • 17. Mendeley provides tools to help users... ...collaborate with one another ...organise their research ...discover new research
  • 20. 1.4 million+ users; the 20 largest userbases: University of Cambridge, Stanford University, MIT, University of Michigan, Harvard University, University of Oxford, Sao Paulo University, Imperial College London, University of Edinburgh, Cornell University, University of California at Berkeley, RWTH Aachen, Columbia University, Georgia Tech, University of Wisconsin, UC San Diego, University of California at LA, University of Florida, University of North Carolina
  • 21. Real-time data on 28m unique papers. [Chart comparing Thomson Reuters’ Web of Knowledge (dating from 1934) with Mendeley after 16 months; the value 50m appears on the chart]
  • 22. The secrets behind recommenders Q1/2: How can a tool generate recommendations? Q2/2: How can you measure the tool's performance?
  • 23. Q1/2: How can a tool generate recommendations?
      Content-based Filtering: find items with similar characteristics (e.g. title, discipline) to what the user previously liked. Techniques: TF-IDF, BM25, Bayesian classifiers, decision trees, artificial neural networks. Quickly absorbs new items (overcomes cold start problem). Can make good recommendations from very few examples.
      Collaborative Filtering: find items that users who are similar to you also liked (wisdom of the crowds). Techniques: user-based and item-based variations, matrix factorisation. No need to understand item characteristics. Tends to give more novel recommendations.
      Hybrid tools too...
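The content-based column above can be illustrated with a small sketch. This is plain Python rather than any particular library: it weights whitespace-separated terms by TF-IDF and ranks documents by cosine similarity to a query document. The function names (`tfidf_vectors`, `cosine`, `most_similar`) and the toy titles are assumptions made for illustration, not part of the talk.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_index, docs):
    """Indices of the other documents, most similar to the query first."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return [i for score, i in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Hypothetical article titles, used only to exercise the sketch.
titles = [
    "collaborative filtering recommender systems",
    "matrix factorisation for recommender systems",
    "protein folding in yeast",
]
ranking = most_similar(0, titles)
```

Because a new article only needs its own text to get a vector, a scheme like this can place it in the ranking immediately, which is the cold start advantage named on the slide.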
  • 24. Q2/2: How can you measure the tool's performance?
      ➔ Cross validation with hold outs: get yourself a good ground truth; hide a fraction of your data from the system; try to predict the hidden fraction from the remaining data; calculate precision and recall
      ➔ Let users decide: set up evaluations with real users (experimental); track tool usage by users
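The hold-out procedure above can be sketched in a few lines of Python. The helper names (`hold_out`, `precision_recall`), the fixed seed and the example item labels are assumptions for illustration; the talk does not say how the split was implemented.

```python
import random


def hold_out(items, fraction, seed=0):
    """Split items into (visible, hidden), hiding roughly `fraction` of them."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_hidden = max(1, int(len(shuffled) * fraction))
    return shuffled[n_hidden:], shuffled[:n_hidden]


def precision_recall(recommended, hidden):
    """Precision and recall of a recommendation list against the hidden items."""
    hits = len(set(recommended) & set(hidden))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(hidden) if hidden else 0.0
    return precision, recall


# Hide 20% of a hypothetical ten-article library.
visible, hidden = hold_out(range(10), 0.2)

# Score a hypothetical list of four recommendations against three hidden items.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "x"])
```

Restricting `recommended` to its first k entries before scoring gives precision at k, the measure reported in the results slides.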
  • 25. Recommenders @ Mendeley 1) Related Research ● given 1 research article ● find other related articles 2) Personalised Recommendations ● given a user's profile (e.g. interests) ● find new articles of interest to them
  • 27. Use Case 1: Related Research. Strategy: content-based approach (tf-idf with Lucene implementation); search for articles with same metadata (e.g. title, tags). Evaluation: cross-validation with hold outs on a ground truth data set
  • 29. Use Case 1: Related Research. Q2/2: What are our results? [Chart: tf-idf Precision per Field when Field is Available; y-axis: precision @ 5; x-axis: metadata field (tag, abstract, mesh-term, title, general-keyword, author, keyword)] Results 1) tags are the most informative field for finding related research
  • 30. Use Case 1: Related Research. [Chart: tf-idf Precision for Field Combos when Field is Available; y-axis: precision @ 5; x-axis: metadata field(s) (tag, bestCombo, abstract, mesh-term, title, general-keyword, author, keyword); bestCombo = abstract+author+general-keyword+tag+title] Results 2) tags outperform combinations of fields
  • 31. How does Mendeley use recommendation technologies? 2/2 Personalised Recommendations ● given a user's profile (e.g. interests) ● find new articles of interest to them
  • 33. Use Case 2: Personalised Recommendations. Strategy: collaborative filtering (item-based with Apache Mahout); recommend articles to researchers that would interest them. Evaluation: cross-validation with hold outs on a ground truth data set
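To make the item-based idea concrete, here is a minimal Python sketch. It is not Apache Mahout and makes no claim about Mendeley's configuration: it assumes binary "article is in the user's library" data, uses cosine similarity between articles, and scores unseen articles by summed similarity. The names `item_similarities` and `recommend` and the toy libraries are hypothetical.

```python
import math
from collections import defaultdict


def item_similarities(libraries):
    """Cosine similarity between articles, based on which users hold them."""
    owners = defaultdict(set)
    for user, articles in libraries.items():
        for article in articles:
            owners[article].add(user)
    sims = defaultdict(dict)
    for a in owners:
        for b in owners:
            if a == b:
                continue
            shared = len(owners[a] & owners[b])
            if shared:
                sims[a][b] = shared / math.sqrt(len(owners[a]) * len(owners[b]))
    return sims


def recommend(user, libraries, n=10):
    """Score unseen articles by summed similarity to the user's own articles."""
    sims = item_similarities(libraries)
    library = set(libraries[user])
    scores = defaultdict(float)
    for article in library:
        for other, sim in sims[article].items():
            if other not in library:
                scores[other] += sim
    # Highest score first; ties broken by article id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:n]]


# Hypothetical user libraries (user -> article ids).
libraries = {
    "ann": ["p1", "p2"],
    "bob": ["p1", "p2", "p3"],
    "cat": ["p2", "p3"],
    "dan": ["p4"],
}
suggestions = recommend("ann", libraries)
```

The all-pairs loop is quadratic in the number of articles, so it only suits toy data; at the scale of tens of thousands of libraries the similarity computation would need to be distributed.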
  • 36. Input: User libraries Output: Recommend 10 articles to each user
  • 37. Test: 10-fold cross validation, 50,000 user libraries. 16 months ago. Results: <0.025 precision at 10
  • 38. Test: 10-fold cross validation, 50,000 user libraries. 10 months ago (i.e. + 6 months). Results: ~0.1 precision at 10
  • 39. Test: Release to a subset of users. 10 months ago (i.e. + 6 months). Results: ~0.4 precision at 10
  • 40. [Chart: Article Recommendation Acceptance Rates; y-axis: acceptance rate (i.e. accept/reject clicks); x-axis: number of months live]
  • 41. [Chart: Precision by Library Size; y-axis: precision at 10 articles; x-axis: number of articles in user library]
  • 42. Test: 10-fold cross validation, 50,000 user libraries. Completely distributed, so can easily run on EC2 within 24 hours... So, results comparable to non-distributed recommender
  • 44. Conclusions Summary ➔ Recommendations can be complementary to search ➔ They can help users to discover interesting items ➔ They can exploit item metadata (content-based) ➔ They can exploit the 'wisdom of the crowds' (CF)
  • 45. Conclusions Summary ➔ Crowd-sourced metadata can have a powerful informative value (e.g. article tags) ➔ Sometimes you need to let data grow ➔ Evaluations under lab conditions don't always predict real world results well ➔ Recommenders don't just have to be about making money … remember where we started...?
  • 46. “All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer’s [...]. But a lot of the state of knowledge of the human race is sitting in the scientists’ computers, and is currently not shared [...] We need to get it unlocked so we can tackle those huge problems.”