Amazon Web Services at Mendeley
Dan Harvey
Data Architect



twitter: @danharvey
dan.harvey@mendeley.com
Overview
• What do we do?
• System design
• AWS details
• Future plans
• Summary
Mendeley helps researchers work smarter
1) Install Mendeley Desktop
2) Manage your research papers
 –Automatic data extraction
 –External database integration
 –Automatic bibliography generation
 –Tagging and annotation
3) Mendeley aggregates research data in the cloud
By doing this, Mendeley makes science more
collaborative and transparent
Mendeley in numbers
• 1 million users

• 130 million research articles
 –40 million unique

• 14 million unique files uploaded
 –13 TB in total
System Overview
[Diagram: web servers handling syncing and browsing; MySQL databases, docs and usage logs; data services on MapReduce, HBase and HDFS; Amazon Web Services (S3, EC2, EMR)]
File Storage
• Sync to and from clients
 –Backed by S3

• How to render 13 TB of PDFs?
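The slide does not say how files were keyed in S3. One scheme that fits the numbers (the slides count unique files: 14 million of them, 13 TB in total, so about 0.93 MB per file on average) is content addressing: name each object after a hash of its bytes, so a paper uploaded by many users is stored once and clients can sync by comparing key sets. The sketch below assumes that scheme; `ContentAddressedStore`, `keys_to_fetch` and the `files/` prefix are hypothetical, and a dict stands in for the S3 bucket.

```python
import hashlib
from typing import Dict, Set


class ContentAddressedStore:
    """Hypothetical file store that keys each object by the SHA-1 of its bytes.

    A dict stands in for the S3 bucket; with S3 the same keys would be
    used for PUT and GET requests.
    """

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    @staticmethod
    def key_for(data: bytes) -> str:
        return "files/" + hashlib.sha1(data).hexdigest()

    def upload(self, data: bytes) -> str:
        """Store data unless an identical file already exists; return its key."""
        key = self.key_for(data)
        if key not in self._objects:
            self._objects[key] = data
        return key

    def download(self, key: str) -> bytes:
        return self._objects[key]

    def unique_files(self) -> int:
        return len(self._objects)

    def total_bytes(self) -> int:
        return sum(len(blob) for blob in self._objects.values())


def keys_to_fetch(local_keys: Set[str], remote_keys: Set[str]) -> Set[str]:
    """Keys a client is missing and must download to catch up with the server."""
    return remote_keys - local_keys
```

Two users uploading the same PDF get the same key back, so only the first upload costs storage.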
PDF Previews
• Elastic Beanstalk
• Java servlet
 –Load & render
 –Store into S3
• Quick to prototype
 –Fast iterations
 –No infrastructure to set up
 –Developers in control
 –No upfront hardware cost
• No dependency on the rest of our infrastructure
[Image: © Elastic Beanstalk, Matt Wood, AWS, 2011]
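The preview service was a Java servlet on Elastic Beanstalk and its code is not shown. The load, render, store loop it describes can be sketched in a few lines, here in Python for brevity. `load_pdf` and `render_page` are hypothetical hooks (a real renderer would turn the PDF page into an image), the key layout under `previews/` is invented, and a dict stands in for S3. Checking storage before rendering means each page is rendered at most once.

```python
from typing import Callable, Dict


def preview_key(doc_key: str, page: int, width: int) -> str:
    """Hypothetical S3 key layout for one rendered page preview."""
    return f"previews/{doc_key}/p{page}-w{width}.png"


class PreviewService:
    """Render-once preview cache: load the PDF, render a page, store the image.

    load_pdf and render_page are hypothetical hooks; a dict stands in for
    the S3 bucket that holds finished previews.
    """

    def __init__(self, load_pdf: Callable[[str], bytes],
                 render_page: Callable[[bytes, int, int], bytes]) -> None:
        self.load_pdf = load_pdf
        self.render_page = render_page
        self.bucket: Dict[str, bytes] = {}
        self.renders = 0  # how many times a page was actually rendered

    def get_preview(self, doc_key: str, page: int, width: int = 600) -> bytes:
        key = preview_key(doc_key, page, width)
        cached = self.bucket.get(key)
        if cached is not None:
            return cached                             # already stored
        pdf = self.load_pdf(doc_key)                  # load
        image = self.render_page(pdf, page, width)    # render
        self.bucket[key] = image                      # store for next time
        self.renders += 1
        return image
```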
Adapt to take advantage
• Improve delivery
 –CloudFront
 –Faster worldwide

• Reworking for cost savings
 –SQS
 –Spot instances
 –Render when it’s cheapest!
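Putting rendering behind a queue is what makes "render when it's cheapest" possible: jobs wait until capacity is cheap instead of being processed on arrival. A minimal sketch, assuming a price check and a price ceiling (both hypothetical here); Python's `queue.Queue` stands in for SQS and `drain_when_cheap` is an illustrative name. With real SQS a worker would delete each message only after its render succeeds, so a job interrupted by losing a spot instance becomes visible again and is retried.

```python
import queue
from typing import Callable


def drain_when_cheap(jobs: "queue.Queue[str]",
                     spot_price: Callable[[], float],
                     max_price: float,
                     render: Callable[[str], None],
                     batch_size: int = 10) -> int:
    """Process up to batch_size queued render jobs, but only if capacity is cheap.

    Returns the number of jobs processed. If the current price is above
    max_price nothing is taken off the queue, so the jobs wait for a
    cheaper run (as messages would wait in SQS).
    """
    if spot_price() > max_price:
        return 0
    done = 0
    while done < batch_size:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            break  # queue drained before the batch was full
        render(job)
        jobs.task_done()
        done += 1
    return done
```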
Article Search
• 40 million papers
• Gives a 40 GB index in Solr

• Variable load
 –Twofold variance in traffic over a week

• Moved to EC2
 –Elastic Load Balancer
 –Auto-scale instances
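A rough way to see why auto scaling pays off with a twofold weekly swing: if each instance serves a fixed number of requests per second, the fleet needed at the quiet point of the week is about half the fleet needed at the peak. The capacity figure and group limits below are made-up assumptions, and this is a simplified target calculation, not the policy EC2 Auto Scaling actually ran for Mendeley.

```python
import math


def desired_instances(requests_per_sec: float,
                      capacity_per_instance: float,
                      min_instances: int = 2,
                      max_instances: int = 20) -> int:
    """Instances needed for the current load, clamped to the group's limits."""
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))
```

With a hypothetical 50 requests/s per instance, a week ranging from 200 to 400 requests/s needs 4 instances at the trough and 8 at the peak; a fixed fleet would have to run 8 all week.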
Solr Instance Layout
• Master
 –Single instance
 –Matched to indexing load
 –Backed by EBS

• Slaves
 –HTTP sync to master
 –Pre-built AMI images
 –EC2 auto scaling
[Diagram: one Solr master replicating to three Solr slaves behind an Elastic Load Balancer]
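Solr's built-in replication works by slaves polling the master over HTTP and copying the index when the master has a newer version; the load balancer then spreads queries across the slaves. The toy model below mirrors that flow in plain Python so the roles are easy to see. `SolrMaster`, `SolrSlave` and `round_robin` are hypothetical stand-ins, not the Solr API; copying a dict replaces Solr's transfer of changed index files, and round robin is a simplification of how the load balancer picks a slave.

```python
import itertools
from typing import Dict, Iterator, List


class SolrMaster:
    """Stand-in for the single indexing master."""

    def __init__(self) -> None:
        self.version = 0
        self.docs: Dict[str, str] = {}

    def index(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text
        self.version += 1  # each commit produces a new index version


class SolrSlave:
    """Stand-in for a read-only replica that polls the master."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.version = 0
        self.docs: Dict[str, str] = {}

    def poll(self, master: SolrMaster) -> bool:
        """Copy the master's index if ours is stale; return True if we copied."""
        if self.version == master.version:
            return False
        self.docs = dict(master.docs)
        self.version = master.version
        return True

    def search(self, term: str) -> List[str]:
        return sorted(d for d, text in self.docs.items() if term in text)


def round_robin(slaves: List[SolrSlave]) -> Iterator[SolrSlave]:
    """Cycle queries across replicas."""
    return itertools.cycle(slaves)
```

Because slaves only read, adding one from a pre-built AMI just means booting it and letting its first poll pull the current index.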
Desktop Client
• Client Downloads
 –From S3
 –Adding CloudFront


• Crash Reports
 –Stack traces into S3
 –Analytic reports on top
 –More focused bug fixing
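Once stack traces land in S3, one simple analytic report is a count of crashes per signature, so the most common failure gets fixed first. A sketch, assuming a hypothetical plain-text trace format (error line first, then one frame per line) and grouping on the error plus the top three frames; `signature` and `top_crashes` are illustrative names.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def signature(stack_trace: str, frames: int = 3) -> str:
    """Group key for a crash: the error line plus the top few frames.

    Assumes a hypothetical format where the first non-empty line names the
    error and each following non-empty line is one stack frame.
    """
    lines = [ln.strip() for ln in stack_trace.strip().splitlines() if ln.strip()]
    return " | ".join(lines[: 1 + frames])


def top_crashes(traces: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Most frequent crash signatures, most common first."""
    return Counter(signature(t) for t in traces).most_common(n)
```

Grouping on only the top frames lets traces that differ deeper in the stack still count as the same bug.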
The future
• Aim to buy no more hardware

• More Java on Elastic Beanstalk
• SQS - replace queues

• EMR - log analysis
• SimpleDB & S3 for data stores
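The slides plan to run log analysis on EMR. As an illustration, here is a count of actions in the usage logs written as a mapper and reducer in the Hadoop Streaming style, plus a local runner that does the sort-based shuffle in-process. The tab-separated log format (timestamp, user id, action) is an assumption. On EMR the mapper and reducer would instead read lines from stdin and write tab-separated key/value lines to stdout.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (action, 1) per log line; malformed lines are skipped.

    Assumes a hypothetical tab-separated format: timestamp, user id, action.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 3:
            yield parts[2], 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Sum counts per action; input must be sorted by key, as after the shuffle."""
    for action, group in groupby(pairs, key=lambda kv: kv[0]):
        yield action, sum(count for _, count in group)


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle (sort) and reduce in-process, for testing."""
    return dict(reducer(sorted(mapper(lines))))
```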
Problems Faced
• Accounting usage
 –Mix of users on account
 –Start early with this!
 –IAM helps

• Orchestration
 –CloudFormation
 –Elastic Beanstalk
 –Finding we need more
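The "start early" advice is easier to follow if every unit of usage can be attributed to an owner from day one. Assuming usage records can be labelled with an owning team (the slides credit IAM with helping; the record format and the `cost_by_owner` name here are hypothetical), the rollup is a simple group-by, and the "unattributed" total shows how much spend nobody has claimed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def cost_by_owner(line_items: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Total cost per owner; items with no owner land in 'unattributed'.

    Each item is a hypothetical dict with 'owner' and 'cost' keys.
    """
    totals: Dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("owner") or "unattributed"
        totals[str(owner)] += float(item["cost"])
    return dict(totals)
```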
Summary
• Not all or nothing

• Focus on your problem
       not “Undifferentiated heavy lifting”
                                  - Werner Vogels


• Learn the building blocks provided
• Modular system design helps
Mendeley Binary Battle
• $10,001 prize + $1,000 in AWS vouchers
• Collaboration with PLoS
• Prizes for the best use of the API

• Judging panel includes
 –Werner Vogels
 –Tim O'Reilly
We’re hiring
     http://mendeley.com/careers/

             or chat to me after

• Lead Mobile Developer, iOS
• Web Developer, PHP/MySQL
• Software Engineer, Java
