Mendeley’s Research Catalogue:
building it, opening it up and
making it even more useful for researchers
Kris Jack, PhD
Chief Data Scientist, @_krisjack
Outline
1. What's Mendeley?
2. Under the Bonnet
3. Opening up Data
4. Working with Academia
5. Conclusions
What's Mendeley?
Mendeley's not just a reference manager
→ Mendeley is a platform that connects researchers, research data and apps
[Diagram: Mendeley Open API; research catalogue]
Mendeley provides tools to help users...
...organise their research
→ Reference management
→ Cite-as-you-write
→ Full-text article search
→ Digitalised annotations
...collaborate with one another
→ Professional research groups
→ Social network
→ Annotation sharing
...discover new research
→ Explore crowdsourced research catalogue
→ Document statistics
→ Personalised article recommendations
→ Related research
→ Research contact suggestions
Our community from a data perspective
• Social network (>2.4M users)
• Research catalogue (~85M unique articles)
• Research groups (~240K groups)
• Personal libraries (>425M articles)
• Logging a massive set of usage data
Under the Bonnet
Lots of features to build & support
→ Reference management
→ Cite-as-you-write
→ Full-text article search
→ Digitalised annotations
→ Professional research groups
→ Social network
→ Annotation sharing
→ Explore crowdsourced research catalogue
→ Document statistics
→ Personalised article recommendations
→ Related research
→ Research contact suggestions
[Diagram: features; research catalogue (~30M unique articles); personal libraries (>100M articles); crowdsourcing (deduplication, metadata aggregation, statistics)]
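The crowdsourcing step (deduplication, metadata aggregation, statistics) can be illustrated with a minimal sketch. It assumes, purely for illustration, that library entries are dicts with hypothetical user, title, year and doi fields, that duplicates are matched on a normalised title plus year, and that each catalogue field takes the majority value across the copies. Mendeley's actual matching and aggregation logic is not described in these slides.

```python
import re
from collections import Counter, defaultdict


def dedup_key(entry):
    """Hypothetical matching key: lower-cased title stripped of punctuation, plus year."""
    title = re.sub(r"[^a-z0-9]+", " ", entry["title"].lower()).strip()
    return (title, entry.get("year"))


def build_catalogue(library_entries):
    """Merge copies of the same article from many personal libraries into one record.

    Each field is chosen by majority vote over the copies that have it,
    and readership is the number of distinct users holding a copy.
    """
    groups = defaultdict(list)
    for entry in library_entries:
        groups[dedup_key(entry)].append(entry)

    catalogue = []
    for copies in groups.values():
        record = {}
        for field in ("title", "year", "doi"):
            votes = Counter(c[field] for c in copies if c.get(field))
            if votes:
                record[field] = votes.most_common(1)[0][0]
        record["readers"] = len({c["user"] for c in copies})
        catalogue.append(record)
    return catalogue


# Invented example data: three users hold the same paper with slightly different metadata.
entries = [
    {"user": "u1", "title": "Deep Learning", "year": 2015, "doi": "10.1038/nature14539"},
    {"user": "u2", "title": "deep learning.", "year": 2015, "doi": "10.1038/nature14539"},
    {"user": "u3", "title": "Deep Learning", "year": 2015},
    {"user": "u1", "title": "Another Paper", "year": 2012},
]
catalogue = build_catalogue(entries)
```

Majority voting is only one possible aggregation rule; a production system would also need fuzzier matching than an exact normalised title.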
The curse of success
•  More articles came
•  More users came
•  Keeping catalogue data fresh was a burden
•  Algorithms relied on global counts
•  Iterating over MySQL tables was slow
•  Needed to shard tables to grow catalogue
•  In short, our backend system didn’t scale
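To see why sharding tables and relying on global counts pull against each other, consider a hypothetical hash-sharded reader-count table. The shard count, the crc32-based routing and the in-memory dicts standing in for MySQL shards are all assumptions for illustration: a write touches a single shard, but any global statistic has to visit every shard and merge the partial results.

```python
import zlib

NUM_SHARDS = 4  # hypothetical number of shards

# Each dict stands in for one database shard holding doc_id -> reader count.
shards = [{} for _ in range(NUM_SHARDS)]


def shard_for(doc_id):
    """Route a key with a stable hash (crc32) so it always lands on the same shard."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS


def add_reader(doc_id):
    """A write only touches the one shard that owns the key."""
    shard = shards[shard_for(doc_id)]
    shard[doc_id] = shard.get(doc_id, 0) + 1


def global_reader_total():
    """A global count must fan out to every shard and sum the partial results."""
    return sum(sum(shard.values()) for shard in shards)


for doc in ["d1", "d2", "d1", "d3"]:
    add_reader(doc)
print(global_reader_total())  # 4
```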
Please try again later
~0.5 million users; the 20 largest user bases:
University of Cambridge
Stanford University
MIT
University of Michigan
Harvard University
University of Oxford
Sao Paulo University
Imperial College London
University of Edinburgh
Cornell University
University of California at Berkeley
RWTH Aachen
Columbia University
Georgia Tech
University of Wisconsin
UC San Diego
University of California at LA
University of Florida
University of North Carolina
~30M research articles
The system started to become slow.
How long did it take to generate our daily readership statistics?
23 hours!
We had serious needs
•  Build a catalogue based on billions of articles
•  Support many features that rely on the catalogue
   •  Statistics
   •  Search
   •  Recommendations
   •  Sharing
•  Data
   •  Freshness
   •  Consistency
•  Business context
   •  Agile development (rapid prototyping)
   •  Cost effective
   •  Going viral
   •  Technical debt stacking up
Enter Hadoop
What is Hadoop?
The Apache Hadoop Project develops open-source software for reliable, scalable, distributed computing
www.hadoop.apache.org
Hadoop
•  Designed to operate on a cluster of computers
   •  From one to thousands of machines
   •  Commodity hardware (low-cost units)
   •  Each node offers local computation and storage
•  Provides a framework for working with big data (beyond petabytes)
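As a rough illustration of the MapReduce programming model that Hadoop distributes across a cluster, here is a single-process sketch that computes readership counts. The tab-separated "user_id, document_id" log format and the function names are hypothetical; on Hadoop, many mappers and reducers would run in parallel on different nodes and the framework itself would perform the shuffle.

```python
from collections import defaultdict
from itertools import chain


def map_phase(log_line):
    """Emit (document_id, user_id) for a hypothetical 'user_id<TAB>document_id' line."""
    user_id, doc_id = log_line.split("\t")
    yield doc_id, user_id


def shuffle(pairs):
    """Group mapped values by key, as Hadoop does between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(doc_id, user_ids):
    """Readership = number of distinct users who added the document."""
    return doc_id, len(set(user_ids))


def readership_counts(log_lines):
    mapped = chain.from_iterable(map_phase(line) for line in log_lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


logs = ["u1\td1", "u2\td1", "u1\td2", "u2\td1"]
print(readership_counts(logs))  # {'d1': 2, 'd2': 1}
```

Because each reducer only sees the values for its own keys, the work splits naturally across nodes. That is the property that lets a job like daily readership statistics scale out instead of iterating over one large table.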
New tech stack for backend
[Diagram: features; research catalogue (~30M unique articles); personal libraries (>100M articles); crowdsourcing (deduplication, metadata aggregation, statistics)]
23-hour computations now took 15 minutes
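For scale, going from 23 hours to 15 minutes is roughly a 92-fold speed-up:

```latex
\frac{23 \times 60\,\text{min}}{15\,\text{min}} = \frac{1380}{15} = 92
```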
Recommended reading: Mendeley Suggest
Generating recommendations through matrix multiplication
This is item-based recommendation, because similarity is computed between items, not users
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
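The matrix-multiplication idea behind item-based recommendation can be sketched in plain Python. This is a simplified illustration, not Mahout's RecommenderJob: it uses raw co-occurrence counts (the item-by-item matrix A^T A, where A is a binary user-by-item matrix) as the similarity, then scores a user's unseen items by multiplying that user's library row by the matrix. Mahout supports other similarity measures, and the toy matrix below is made up.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def recommend(a, user, k=2):
    """Top-k unseen items for `user`, scored by row `user` of A . (A^T A)."""
    cooccurrence = matmul(transpose(a), a)       # items x items: users sharing both items
    scores = matmul([a[user]], cooccurrence)[0]  # one score per item
    unseen = [i for i, owned in enumerate(a[user]) if not owned]
    return sorted(unseen, key=lambda i: scores[i], reverse=True)[:k]


# Rows are users, columns are articles; 1 means the article is in that user's library.
A = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
]
print(recommend(A, user=3))  # [1, 2]
print(recommend(A, user=0))  # [2, 3]
```

Similarity lives in the item-by-item matrix, so it can be computed once for all users. This is why the approach maps well onto batch MapReduce jobs.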
Running on Amazon's Elastic MapReduce
On-demand use and easy to cost
Mahout's Performance
[Chart: normalised Amazon hours (y-axis, 0 to 7K) against number of good recommendations out of 10 (x-axis, 0 to 2.5), divided into Costly & Bad, Costly & Good, Cheap & Bad and Cheap & Good quadrants]
•  Original item-based: 6.5K hours, 1.5 good recommendations/10
•  Customised item-based: 2.4K hours, 1.5 good recommendations/10 (-4.1K hours, 63% cheaper)
•  Original user-based: 1K hours, 2.5 good recommendations/10 (-1.4K hours, 58% cheaper than customised item-based; +1 good recommendation, 67% better)
•  Customised user-based: 0.3K hours, 2.5 good recommendations/10 (-0.7K hours, 70% cheaper than original user-based)
•  Overall, from original item-based to customised user-based: -6.2K hours (95% cheaper) and +1 good recommendation (67% better)
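The savings and gains quoted on the chart are relative changes, |after - before| / before. A small helper reproduces them from the plotted values (thousands of hours written out in full):

```python
def relative_change(before, after):
    """Return (absolute change, change as a rounded percentage of `before`)."""
    delta = after - before
    return delta, round(100 * abs(delta) / before)


# (normalised Amazon hours, good recommendations out of 10), as read off the chart
orig_item = (6500, 1.5)
cust_item = (2400, 1.5)
orig_user = (1000, 2.5)
cust_user = (300, 2.5)

print(relative_change(orig_item[0], cust_item[0]))  # (-4100, 63)
print(relative_change(cust_item[0], orig_user[0]))  # (-1400, 58)
print(relative_change(orig_user[0], cust_user[0]))  # (-700, 70)
print(relative_change(orig_item[0], cust_user[0]))  # (-6200, 95)
print(relative_change(orig_item[1], cust_user[1]))  # (1.0, 67)
```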
Disclaimer: these advantages have costs
•  Migrating to a new system (data consistency)
•  Setup costs
   •  Learn black magic to configure
   •  Hardware for cluster
•  Administrative costs
   •  High learning curve to administer Hadoop
   •  Still an immature technology
   •  You may need to debug the source code
•  Developing against Mahout
   •  Still needs lots of love
Big data backend
[Diagram: features; research catalogue (~30M unique articles); personal libraries (>100M articles); crowdsourcing (deduplication, metadata aggregation, statistics)]
Opening up Data
Our community from a data perspective
• Social network (>2.4M users)
• Research catalogue (~85M unique articles)
• Research groups (~240K groups)
• Personal libraries (>425M articles)
• Logging a massive set of usage data
Challenge: Build an application with our data and make science more open.
PLoS/Mendeley's Binary Battle
More details at http://dev.mendeley.com/api-binary-battle/
Challenge: Build an offline system for scientific recommendations with our API and the DataTEL data set
ScienceRec Challenge 2012
More details at http://2012.recsyschallenge.com/tracks/sciencerec/
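A common way to score an offline recommender is to hold out part of each user's library and count how many held-out items appear in the top-k recommendations. This is the same kind of "good recommendations out of 10" measure plotted on the Mahout chart earlier. The sketch below assumes that protocol; the actual ScienceRec evaluation rules may differ, and the data is invented.

```python
def good_at_k(recommended, held_out, k=10):
    """Count how many of the top-k recommended items are in the user's held-out set."""
    return sum(1 for item in recommended[:k] if item in held_out)


def mean_good_at_k(recs_by_user, held_out_by_user, k=10):
    """Average good-recommendations-at-k over every user who has held-out items."""
    users = list(held_out_by_user)
    total = sum(good_at_k(recs_by_user.get(u, []), held_out_by_user[u], k) for u in users)
    return total / len(users)


recs = {
    "alice": [f"d{i}" for i in range(1, 11)],  # d1 .. d10
    "bob": ["d5", "d6"],
}
held_out = {"alice": {"d3", "d7", "d42"}, "bob": {"d6"}}
print(mean_good_at_k(recs, held_out))  # 1.5
```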
Challenge: Metadata Extraction Challenge
The Next Challenge…?
Working with Academia
We have a history of academic collaborations
Duration     Project
2009-2011    MAKIN’IT
2010-2014    TEAM
2010-2011    DURA
2012         CSL Editor
2012-2014    CODE
2012-2014    ERASM
2013-2015    EEXCESS
Demo
CSL Editor
http://editor.citationstyles.org/
Demo
CODE Mendeley Desktop
http://code-research.eu/results
Demo
Mendeley Labs
http://labs.mendeley.com/
Want to collaborate?
Conclusions
è  Mendeley is far more than a reference manager – it‘s
a platform that connects researchers, data and apps
è  Starting small is good, but be prepared for the cost of
scaling up
è  We‘re opening up our data for you to build apps on
our platform
è  We‘re always keen to collaborate with academic
groups
 
From Syllables to Syntax: Investigating Staged Linguistic Development through...
From Syllables to Syntax: Investigating Staged Linguistic Development through...From Syllables to Syntax: Investigating Staged Linguistic Development through...
From Syllables to Syntax: Investigating Staged Linguistic Development through...Kris Jack
 
A Collaborative Tool for the Computational Modelling of Child Language Acquis...
A Collaborative Tool for the Computational Modelling of Child Language Acquis...A Collaborative Tool for the Computational Modelling of Child Language Acquis...
A Collaborative Tool for the Computational Modelling of Child Language Acquis...Kris Jack
 
Mendeley, putting data into the hands of researchers
Mendeley, putting data into the hands of researchersMendeley, putting data into the hands of researchers
Mendeley, putting data into the hands of researchersKris Jack
 
Mendeley: Recommendation Systems for Academic Literature
Mendeley: Recommendation Systems for Academic LiteratureMendeley: Recommendation Systems for Academic Literature
Mendeley: Recommendation Systems for Academic LiteratureKris Jack
 
Recommendation Engines for Scientific Literature
Recommendation Engines for Scientific LiteratureRecommendation Engines for Scientific Literature
Recommendation Engines for Scientific LiteratureKris Jack
 

Mehr von Kris Jack (14)

Machine Learning @ Mendeley
Machine Learning @ MendeleyMachine Learning @ Mendeley
Machine Learning @ Mendeley
 
Mendeley Suggest: What will you read next?
Mendeley Suggest: What will you read next?Mendeley Suggest: What will you read next?
Mendeley Suggest: What will you read next?
 
Mendeley Suggest: Engineering a Personalised Article Recommender System
Mendeley Suggest: Engineering a Personalised Article Recommender SystemMendeley Suggest: Engineering a Personalised Article Recommender System
Mendeley Suggest: Engineering a Personalised Article Recommender System
 
Mendeley's Data and Perspectives on Data Challenges
Mendeley's Data and Perspectives on Data ChallengesMendeley's Data and Perspectives on Data Challenges
Mendeley's Data and Perspectives on Data Challenges
 
Scientific Article Recommendation with Mahout
Scientific Article Recommendation with MahoutScientific Article Recommendation with Mahout
Scientific Article Recommendation with Mahout
 
Mahout Becomes a Researcher: Large Scale Recommendations at Mendeley
Mahout Becomes a Researcher: Large Scale Recommendations at MendeleyMahout Becomes a Researcher: Large Scale Recommendations at Mendeley
Mahout Becomes a Researcher: Large Scale Recommendations at Mendeley
 
improving explicit preference entry by visualising data similarities
improving explicit preference entry by visualising data similaritiesimproving explicit preference entry by visualising data similarities
improving explicit preference entry by visualising data similarities
 
Etude de la pertinence de critères de recherche en recherche d'informations s...
Etude de la pertinence de critères de recherche en recherche d'informations s...Etude de la pertinence de critères de recherche en recherche d'informations s...
Etude de la pertinence de critères de recherche en recherche d'informations s...
 
A Computational Model of Staged Language Acquisition
A Computational Model of Staged Language AcquisitionA Computational Model of Staged Language Acquisition
A Computational Model of Staged Language Acquisition
 
From Syllables to Syntax: Investigating Staged Linguistic Development through...
From Syllables to Syntax: Investigating Staged Linguistic Development through...From Syllables to Syntax: Investigating Staged Linguistic Development through...
From Syllables to Syntax: Investigating Staged Linguistic Development through...
 
A Collaborative Tool for the Computational Modelling of Child Language Acquis...
A Collaborative Tool for the Computational Modelling of Child Language Acquis...A Collaborative Tool for the Computational Modelling of Child Language Acquis...
A Collaborative Tool for the Computational Modelling of Child Language Acquis...
 
Mendeley, putting data into the hands of researchers
Mendeley, putting data into the hands of researchersMendeley, putting data into the hands of researchers
Mendeley, putting data into the hands of researchers
 
Mendeley: Recommendation Systems for Academic Literature
Mendeley: Recommendation Systems for Academic LiteratureMendeley: Recommendation Systems for Academic Literature
Mendeley: Recommendation Systems for Academic Literature
 
Recommendation Engines for Scientific Literature
Recommendation Engines for Scientific LiteratureRecommendation Engines for Scientific Literature
Recommendation Engines for Scientific Literature
 

Kürzlich hochgeladen

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Making Mendeley's Research Catalogue More Useful

  • 1. Mendeley’s Research Catalogue: building it, opening it up and making it even more useful for researchers. Kris Jack, PhD, Chief Data Scientist, @_krisjack
  • 2. Outline: 1. What's Mendeley? 2. Under the Bonnet 3. Opening up Data 4. Working with Academia 5. Conclusions
  • 4. Mendeley's not just a reference manager
  • 5. → Mendeley is a platform that connects researchers, research data and apps. Mendeley Open API
  • 6. Mendeley Open API, research catalogue. → Mendeley is a platform that connects researchers, research data and apps
  • 7. Mendeley provides tools to help users... organise their research: → Reference management → Cite-as-you-write → Full-text article search → Digitalised annotations
  • 8. Mendeley provides tools to help users... organise their research; ...collaborate with one another: → Professional research groups → Social network → Annotation sharing
  • 9. Mendeley provides tools to help users... organise their research; ...collaborate with one another; ...discover new research: → Explore crowdsourced research catalogue → Document statistics → Personalised article recommendations → Related research → Research contact suggestions
  • 10-11. Mendeley provides tools to help users... organise their research; ...collaborate with one another; ...discover new research
  • 12. Our community from a data perspective: social network (>2.4M users), research catalogue (~85M unique articles), research groups (~240K groups), personal libraries (>425M articles). Logging a massive set of usage data
  • 14-16. Lots of features to build & support: → Reference management → Cite-as-you-write → Full-text article search → Digitalised annotations → Professional research groups → Social network → Annotation sharing → Explore crowdsourced research catalogue → Document statistics → Personalised article recommendations → Related research → Research contact suggestions
  • 17. Lots of features to build & support features
  • 18. Lots of features to build & support features Research catalogue (~30M unique articles) Personal libraries (>100M articles)
  • 19. Lots of features to build & support features Research catalogue (~30M unique articles) Personal libraries (>100M articles) Crowdsourcing (deduplication, metadata aggregation, statistics)
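The slides do not describe how the crowdsourcing step (deduplication, metadata aggregation, statistics) was implemented. The sketch below shows one way the idea could work under stated assumptions, and it is not Mendeley's method. It uses a naive matching key (a case-, accent- and punctuation-insensitive title plus the year) and resolves conflicting metadata by majority vote. The field names `user`, `title`, `year` and `doi` are hypothetical, as are the sample records.

```python
import re
import unicodedata
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def dedup_key(title: str, year: Optional[int]) -> Tuple[str, Optional[int]]:
    """Hypothetical matching key: accent-, case- and punctuation-insensitive title plus year."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return text, year


def build_catalogue(entries: List[Dict]) -> List[Dict]:
    """Merge per-user library entries into one record per (assumed) unique article."""
    groups: Dict[Tuple[str, Optional[int]], List[Dict]] = defaultdict(list)
    for entry in entries:
        groups[dedup_key(entry["title"], entry.get("year"))].append(entry)

    catalogue = []
    for (_, year), dupes in groups.items():
        # Metadata aggregation: majority vote over the duplicate records.
        title = Counter(e["title"] for e in dupes).most_common(1)[0][0]
        dois = Counter(e["doi"] for e in dupes if e.get("doi"))
        catalogue.append({
            "title": title,
            "year": year,
            "doi": dois.most_common(1)[0][0] if dois else None,
            # Statistic: number of distinct users who hold this article.
            "readers": len({e["user"] for e in dupes}),
        })
    return catalogue


# Hypothetical sample: three users hold the same article under slightly different titles.
entries = [
    {"user": "u1", "title": "Crowdsourced Metadata", "year": 2015, "doi": "10.1234/example.1"},
    {"user": "u2", "title": "crowdsourced metadata.", "year": 2015},
    {"user": "u3", "title": "Crowdsourced Metadata", "year": 2015, "doi": "10.1234/example.1"},
    {"user": "u1", "title": "An Unrelated Paper", "year": 2012},
]
catalogue = build_catalogue(entries)
```

A title-plus-year key is deliberately simple. Real deduplication at catalogue scale usually needs fuzzier matching, because a key like this misses typos and over-merges generic titles.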
  • 24. The curse of success •  More articles came •  More users came •  Keeping catalogue data fresh was a burden •  Algorithms relied on global counts •  Iterating over MySQL tables was slow •  Needed to shard tables to grow catalogue •  In short, our backend system didn’t scale
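Two points on this slide interact: algorithms relied on global counts, and tables had to be sharded to grow the catalogue. Once the data is sharded, every global count must touch every shard. The sketch below illustrates this, assuming a hash-based routing scheme. The slides do not say how Mendeley actually sharded its MySQL tables, and `shard_for` and `global_count` are hypothetical helpers.

```python
import hashlib
from typing import Callable, Dict, List


def shard_for(article_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the article ID (hypothetical routing scheme).

    hashlib is used instead of the built-in hash(), which is randomised per process
    for strings and would therefore route the same ID differently across runs.
    """
    digest = hashlib.sha256(article_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def global_count(shards: List[List[Dict]], predicate: Callable[[Dict], bool]) -> int:
    """A 'global count' has to scan every shard and sum the partial counts."""
    return sum(sum(1 for row in shard if predicate(row)) for shard in shards)


# Hypothetical data: 100 articles spread over 4 shards.
shards: List[List[Dict]] = [[] for _ in range(4)]
for i in range(100):
    article_id = f"article-{i}"
    shards[shard_for(article_id, 4)].append({"id": article_id, "year": 2000 + i % 10})

print(global_count(shards, lambda row: row["year"] == 2005))  # 10
```

Single-row lookups stay cheap because the hash identifies one shard. Counts over the whole catalogue, however, turn into a fan-out over all shards, which becomes a burden as the catalogue grows.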
  • 26-28. ~0.5 million users; the 20 largest user bases: University of Cambridge, Stanford University, MIT, University of Michigan, Harvard University, University of Oxford, Sao Paulo University, Imperial College London, University of Edinburgh, Cornell University, University of California at Berkeley, RWTH Aachen, Columbia University, Georgia Tech, University of Wisconsin, UC San Diego, University of California at LA, University of Florida, University of North Carolina. ~30M research articles. The system started to become slow. How long did it take to generate our daily readership statistics? 23 hours!
  • 29. We had serious needs: • Build a catalogue based on billions of articles • Support many features that rely on the catalogue (statistics, search, recommendations, sharing) • Data (freshness, consistency) • Business context (agile development/rapid prototyping, cost effective, going viral, technical debt stacking up)
  • 30. Enter Hadoop. What is Hadoop? "The Apache Hadoop Project develops open-source software for reliable, scalable, distributed computing." www.hadoop.apache.org
  • 31. Hadoop • Designed to operate on a cluster of computers (from one to thousands of nodes) • Commodity hardware (low-cost units) • Each node offers local computation and storage • Provides a framework for working with big data (beyond petabytes)
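Hadoop's MapReduce model suits a job like the daily readership statistics: each mapper emits a key/value pair per article in a user's library, the framework groups the pairs by key, and each reducer sums one group. The sketch below simulates the three phases in a single Python process. It is not Hadoop's Java API. In a real cluster, mappers and reducers run in parallel across nodes, and the shuffle happens over the network. The input shape `(user_id, [article_id, ...])` is an assumption.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(library_record: Tuple[str, List[str]]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (article_id, 1) once for each article in one user's library."""
    _user_id, article_ids = library_record
    for article_id in set(article_ids):  # a user counts as one reader per article
        yield article_id, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between the map and reduce phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(article_id: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: sum the partial counts for one article."""
    return article_id, sum(counts)


def readership_counts(library_records: Iterable[Tuple[str, List[str]]]) -> Dict[str, int]:
    pairs = chain.from_iterable(map_phase(r) for r in library_records)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Each mapper only needs its own input split and each reducer only needs one key's values, so the work can be spread over many commodity nodes. This kind of parallelism is what allows a long-running sequential computation to shrink substantially.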
  • 32-33. New tech stack for backend features: research catalogue (~30M unique articles), personal libraries (>100M articles), crowdsourcing (deduplication, metadata aggregation, statistics). 23-hour computations now took 15 minutes
  • 34. New tech stack for backend features Research catalogue (~30M unique articles) Personal libraries (>100M articles) Crowdsourcing (deduplication, metadata aggregation, statistics) recommended reading
  • 37. Generating recommendations through matrix multiplication. These are item-based recommendations, because similarity is computed between items, not users (org.apache.mahout.cf.taste.hadoop.item.RecommenderJob)
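The idea of "recommendations through matrix multiplication" can be shown in miniature: build an item-item matrix and multiply it by a user's preference vector. The sketch below uses raw co-occurrence counts (the number of users whose libraries contain both items) as the similarity and treats libraries as binary preferences. It is not Mahout's implementation. Mahout's RecommenderJob runs the equivalent steps as a chain of Hadoop jobs and can be configured with other similarity measures. All names and data here are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List


def cooccurrence(libraries: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Item-item matrix: entry [a][b] counts the users whose libraries hold both a and b."""
    co: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for items in libraries.values():
        for a, b in combinations(sorted(set(items)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(user: str, libraries: Dict[str, List[str]],
              co: Dict[str, Dict[str, int]], top_n: int = 10) -> List[str]:
    """Multiply the (symmetric) co-occurrence matrix by the user's binary preference vector.

    Only items the user owns have non-zero vector entries, so the product
    reduces to summing the matrix rows of those items.
    """
    owned = set(libraries[user])
    scores: Dict[str, int] = defaultdict(int)
    for item in owned:
        for other, count in co.get(item, {}).items():
            if other not in owned:  # don't recommend articles already in the library
                scores[other] += count
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


libs = {"u1": ["a", "b"], "u2": ["a", "b", "c"], "u3": ["b", "c"], "u4": ["a", "d"]}
co = cooccurrence(libs)
```

Here `recommend("u1", libs, co)` scores article `c` from both of u1's articles and `d` only from `a`, so `c` ranks first. Because the similarity is computed between items, the matrix can be built once and reused for every user.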
  • 38. Running on Amazon's Elastic MapReduce: on-demand use, and the cost is easy to work out
  • 39-47. Mahout's Performance (chart: normalised Amazon hours, 0 to 7K, against number of good recommendations out of 10, 0 to 3; quadrants: Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good)
      Orig. item-based: 6.5K hours, 1.5 good recommendations
      Cust. item-based: 2.4K hours, 1.5 (-4.1K hours, 63%)
      Orig. user-based: 1K hours, 2.5 (-1.4K hours, 58%; +1 good recommendation, 67%)
      Cust. user-based: 0.3K hours, 2.5 (-0.7K hours, 70%)
      Overall, Orig. item-based to Cust. user-based: -6.2K hours (95%), +1 good recommendation (67%)
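The percentages on the Mahout's Performance slides are relative changes: each saving is divided by the cost of the configuration it improves on. The short check below recomputes them from the plotted values. The dictionary and helper names are illustrative only.

```python
# Values from the "Mahout's Performance" slides:
# (normalised Amazon hours, good recommendations out of 10)
RUNS = {
    "Orig. item-based": (6500, 1.5),
    "Cust. item-based": (2400, 1.5),
    "Orig. user-based": (1000, 2.5),
    "Cust. user-based": (300, 2.5),
}


def hours_saved(before: str, after: str):
    """Absolute hours saved and the saving as a rounded percentage of the 'before' cost."""
    h_before, h_after = RUNS[before][0], RUNS[after][0]
    saved = h_before - h_after
    return saved, round(100 * saved / h_before)


def quality_gain(before: str, after: str):
    """Extra good recommendations and the relative gain as a rounded percentage."""
    q_before, q_after = RUNS[before][1], RUNS[after][1]
    return q_after - q_before, round(100 * (q_after - q_before) / q_before)


print(hours_saved("Orig. item-based", "Cust. item-based"))  # (4100, 63)
print(hours_saved("Cust. item-based", "Orig. user-based"))  # (1400, 58)
print(hours_saved("Orig. user-based", "Cust. user-based"))  # (700, 70)
print(hours_saved("Orig. item-based", "Cust. user-based"))  # (6200, 95)
print(quality_gain("Cust. item-based", "Orig. user-based"))  # (1.0, 67)
```

The overall 95% saving comes from comparing the endpoints directly, (6500 - 300) / 6500, not from adding the step-wise percentages.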
  • 48. Disclaimer: these advantages have costs •  Migrating to a new system (data consistency) •  Setup costs •  Learn black magic to configure •  Hardware for cluster •  Administrative costs •  High learning curve to administrate Hadoop •  Still an immature technology •  You may need to debug the source code •  Developing against Mahout •  Still needs lots of love
  • 49. Big data backend features Research catalogue (~30M unique articles) Personal libraries (>100M articles) Crowdsourcing (deduplication, metadata aggregation, statistics)
  • 55. PLoS/Mendeley's Binary Battle. Challenge: build an application with our data and make science more open. More details at http://dev.mendeley.com/api-binary-battle/
  • 57-58. ScienceRec Challenge 2012. Challenge: build an offline system for scientific recommendations with our API and the DataTEL data set. More details at http://2012.recsyschallenge.com/tracks/sciencerec/
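An offline recommendation system is usually evaluated by hiding part of each user's library, recommending from the rest, and counting how many hidden items reappear in the top results. This is close in spirit to the "good recommendations out of 10" axis on the Mahout's Performance slides. The sketch below assumes precision@10 and a random 20% hold-out per user. Both choices are assumptions: the ScienceRec challenge's official protocol and metric may differ, and the function names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Set

# A recommender takes the training libraries and a user ID and returns a ranked list.
Recommender = Callable[[Dict[str, List[str]], str], List[str]]


def precision_at_k(recommended: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k slots filled with held-out (relevant) items."""
    return sum(1 for item in recommended[:k] if item in relevant) / k


def holdout_evaluate(libraries: Dict[str, List[str]], recommender: Recommender,
                     holdout_frac: float = 0.2, k: int = 10, seed: int = 0) -> float:
    """Hide a random fraction of each library, recommend from the rest, and
    average precision@k over the users who had at least one item hidden."""
    rng = random.Random(seed)
    train: Dict[str, List[str]] = {}
    test: Dict[str, Set[str]] = {}
    for user, items in libraries.items():
        shuffled = sorted(set(items))
        rng.shuffle(shuffled)
        n_hidden = int(len(shuffled) * holdout_frac)
        test[user] = set(shuffled[:n_hidden])
        train[user] = shuffled[n_hidden:]
    scores = [precision_at_k(recommender(train, u), test[u], k)
              for u in libraries if test[u]]
    return sum(scores) / len(scores) if scores else 0.0
```

Fixing the random seed keeps the split reproducible, so two recommenders can be compared on exactly the same hidden items.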
  • 59. The Next Challenge…? Metadata Extraction Challenge
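Metadata extraction covers many subtasks, such as recovering titles, authors and identifiers from article text. The sketch below shows only one small, well-defined piece: pulling DOI-like strings out of raw text with a regular expression. The pattern is a simplified assumption ("10." + a 4 to 9 digit registrant code + "/" + a suffix). Some valid DOIs contain characters it does not cover, and the helper name is hypothetical.

```python
import re
from typing import List

# Simplified, assumed DOI pattern: "10." + registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(text: str) -> List[str]:
    """Return the distinct DOI-like strings in text, in order of first appearance."""
    found: List[str] = []
    for match in DOI_PATTERN.finditer(text):
        doi = match.group(0).rstrip(".,;")  # drop sentence punctuation caught at the end
        if doi not in found:
            found.append(doi)
    return found
```

Stripping trailing punctuation trades a small risk (a DOI that genuinely ends in "." would be truncated) for far fewer false matches at sentence ends.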
  • 61. We have a history of academic collaborations:
      Duration     Project
      2009-2011    MAKIN’IT
      2010-2014    TEAM
      2010-2011    DURA
      2012         CSL Editor
      2012-2014    CODE
      2012-2014    ERASM
      2013-2015    EEXCESS
  • 65. Want to collaborate?
  • 67. Conclusions → Mendeley is far more than a reference manager: it's a platform that connects researchers, data and apps → Starting small is good, but be prepared for the cost of scaling up → We're opening up our data for you to build apps on our platform → We're always keen to collaborate with academic groups. Kris Jack, PhD, Chief Data Scientist, @_krisjack