I was invited to be the keynote speaker at a special track on Recommendation, Data Sharing and Research Practices in Science 2.0 at the I-KNOW 2011 conference (http://i-know.tugraz.at/) on 2011/09/07.
The talk presents the challenges involved in crowdsourcing the world's largest research catalogue and then building a recommendation service on top of it that scales to serve millions of users.
4. Mendeley provides tools to help users... collaborate with one another, organise their research, and discover new research.
10. Summary
➔ what is mendeley?
➔ crowdsourcing on a large scale
➔ recommendations on a large scale
➔ data for you
11. Mendeley works like Last.fm:
1) Install “Audioscrobbler”
2) Listen to music
3) Last.fm builds your music profile and recommends music you could also like, and it's the world's largest open music database!
12. Mendeley ↔ Last.fm
music libraries ↔ research libraries
artists ↔ researchers
songs ↔ papers
genres ↔ disciplines
Mendeley is the world's largest crowdsourced research catalogue!
(Screenshot taken from www.mendeley.com on 04/09/11)
21. Recommendation through collaborative filtering
Article in library or not (i.e. binary input)
Various similarity metrics (e.g. co-occurrence, log-likelihood, Tanimoto)
Test: 10-fold cross validation, 50,000 user libraries, 16 months ago
Results: <0.025 precision at 10
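The similarity metrics named above all operate on the same binary "article in library or not" signal. A minimal sketch of co-occurrence and Tanimoto similarity over hypothetical toy libraries (not Mendeley's actual implementation):

```python
# Each user library is a set of article IDs: binary "in library or not".
# Toy data for illustration only.
libraries = {
    "user_a": {"p1", "p2", "p3"},
    "user_b": {"p2", "p3", "p4"},
    "user_c": {"p1", "p3"},
}

def users_with(article, libs):
    """Set of users whose library contains the given article."""
    return {u for u, lib in libs.items() if article in lib}

def cooccurrence(a, b, libs):
    """Number of users who have both articles in their library."""
    return len(users_with(a, libs) & users_with(b, libs))

def tanimoto(a, b, libs):
    """Tanimoto (Jaccard) coefficient over the two articles' reader sets."""
    ua, ub = users_with(a, libs), users_with(b, libs)
    union = len(ua | ub)
    return len(ua & ub) / union if union else 0.0

print(cooccurrence("p1", "p3", libraries))  # 2 (user_a and user_c)
print(tanimoto("p2", "p3", libraries))      # 2/3
```

Articles with high pairwise similarity to the articles already in a user's library become recommendation candidates; log-likelihood works the same way but corrects for articles that are popular everywhere.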
22. Recommendation through collaborative filtering
Article in library or not (i.e. binary input)
Various similarity metrics (e.g. co-occurrence, log-likelihood, Tanimoto)
Test: 10-fold cross validation, 50,000 user libraries, 10 months ago (i.e. + 6 months)
Results: ~0.1 precision at 10
23. Recommendation through collaborative filtering
Article in library or not (i.e. binary input)
Various similarity metrics (e.g. co-occurrence, log-likelihood, Tanimoto)
Test: release to a subset of users, 10 months ago (i.e. + 6 months)
Results: ~0.4 precision at 10
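The precision-at-10 figures on these slides are the fraction of the top 10 recommended articles that the user actually holds (or later adds) in the held-out part of their library. A minimal sketch with hypothetical recommendation lists:

```python
def precision_at_k(recommended, held_out, k=10):
    """Fraction of the top-k recommendations found in the held-out set."""
    top_k = recommended[:k]
    hits = sum(1 for article in top_k if article in held_out)
    return hits / k

# Hypothetical example: of the top 10 recommendations, 2 turn out to be
# articles in the user's held-out library.
recs = [f"p{i}" for i in range(1, 11)]   # p1 .. p10
held_out = {"p3", "p7", "p42"}
print(precision_at_k(recs, held_out))    # 0.2
```

In a 10-fold setup this is averaged over users and folds; the jump from ~0.1 offline to ~0.4 in the live release suggests offline held-out libraries understate which recommendations users actually find relevant.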
25. Article Recommendation: System Requirements
1 million users!
Generate personal article recommendations (i.e. “here are some articles that may interest you”)
Update recommendations every 24 hours
How to scale up?
27. Test: 10-fold cross validation, 50,000 user libraries
So, results comparable to the non-distributed recommender. Completely distributed, so it can easily run on EC2 within 24 hours...
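"Completely distributed" works here because co-occurrence counting is embarrassingly parallel: each library can be processed independently and the partial counts summed. A map/reduce-style sketch under that assumption (toy data, not Mendeley's production pipeline):

```python
from collections import Counter
from itertools import combinations

def map_library(library):
    """Mapper: emit ((a, b), 1) for every article pair in one user library."""
    for a, b in combinations(sorted(library), 2):
        yield (a, b), 1

def reduce_counts(pair_streams):
    """Reducer: sum pair counts across all mapper outputs."""
    totals = Counter()
    for stream in pair_streams:
        for pair, count in stream:
            totals[pair] += count
    return totals

# Toy libraries; in production each mapper would handle a shard of libraries
# on a separate machine (e.g. an EC2 node) and reducers would merge shards.
libraries = [{"p1", "p2", "p3"}, {"p2", "p3"}, {"p1", "p3"}]
counts = reduce_counts(map_library(lib) for lib in libraries)
print(counts[("p2", "p3")])  # 2
```

Because mappers never share state, adding machines divides the wall-clock time, which is what makes the 24-hour update cycle from slide 25 feasible.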
28. Article Recommendation Precision Across User Library Sizes (using co-occurrence)
[Chart: precision at 10 articles vs. number of articles in user library]
How will real users react?
29. Summary
➔ what is mendeley?
➔ crowdsourcing on a large scale
➔ recommendations on a large scale
➔ data for you
30. Public Data
User libraries: 50,000 libraries, 4,848,724 articles, 3,652,285 unique articles
Library readership and library stars
Obtain from: http://dev.mendeley.com/datachallenge