Mendeley Suggest: Engineering a Personalised Article Recommender System

Kris Jack, PhD
Chief Data Scientist
https://twitter.com/_krisjack
Overview
➔ What's Mendeley?
➔ What's Mendeley Suggest?
➔ Computation Layer
➔ Serving Layer
    ➔ Architecture
    ➔ Technologies
    ➔ Deployment
➔ Conclusions
What's Mendeley?
➔ Mendeley is a platform that connects researchers, research data and apps
    ➔ [Diagram label: Mendeley Open API]
➔ Startup company with ~20 R&D engineers
What's Mendeley Suggest?
Use Case
➔ Good researchers stay on top of their game
➔ Difficult, given the amount of research being produced
➔ There must be a technology that can help
➔ Help researchers by recommending relevant research
Mendeley Suggest
Computation Layer
Running on Amazon's Elastic MapReduce
➔ On-demand use and easy to cost
Computation Layer: Mahout's Performance (1.5M Users, 50M Articles)
[Chart, built up over several slides. Y-axis: Normalised Amazon Hours (0 to 7K). X-axis: No. Good Recommendations/10 (0.5 to 3). Quadrants: Costly & Bad (top left), Costly & Good (top right), Cheap & Bad (bottom left), Cheap & Good (bottom right).]
➔ Orig. item-based: 6.5K hours, 1.5 good recommendations/10
➔ Cust. item-based: 2.4K hours, 1.5 good recommendations/10
    ➔ -4.1K hours (63%) vs. orig. item-based
    ➔ Partitioners, MR allocation
➔ Orig. user-based: 1K hours, 2.5 good recommendations/10
    ➔ -1.4K hours (58%) and +1 good recommendation (67%) vs. cust. item-based
➔ Cust. user-based: 0.3K hours, 2.5 good recommendations/10
    ➔ -0.7K hours (70%) vs. orig. user-based
➔ Overall, orig. item-based to cust. user-based: -6.2K hours (95%), +1 good recommendation (67%)
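The deltas and percentages quoted on the performance chart can be reproduced from the four labelled points. The sketch below is only a sanity check of that arithmetic: the dictionary keys and function names are made up for illustration, and the pairing of each delta with a "before" and "after" point is an inference from the numbers, not something the slides state.

```python
# Chart points as (normalised Amazon hours, good recommendations out of 10),
# copied from the chart labels. The key names are hypothetical.
points = {
    "orig_item": (6500, 1.5),
    "cust_item": (2400, 1.5),
    "orig_user": (1000, 2.5),
    "cust_user": (300, 2.5),
}


def pct_change(before, after):
    """Relative change from before to after, as a whole-number percentage."""
    return round(100 * (after - before) / before)


def compare(a, b):
    """Return (hours delta, hours % change, quality delta, quality % change)."""
    (h0, q0), (h1, q1) = points[a], points[b]
    return h1 - h0, pct_change(h0, h1), q1 - q0, pct_change(q0, q1)


for a, b in [
    ("orig_item", "cust_item"),
    ("cust_item", "orig_user"),
    ("orig_user", "cust_user"),
    ("orig_item", "cust_user"),
]:
    print(a, "->", b, compare(a, b))
```

Under this reading, every figure on the chart (63%, 58%, 70%, 95% fewer hours, and a 67% gain in good recommendations) follows from the labelled points.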
Mahout as the Computation Layer
➔ Out of the box, didn't work so well for us
➔ Needed to understand Hadoop better
➔ Contributed patch back to community (user-user)
➔ Next step, the serving layer...
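The "(user-user)" patch presumably refers to user-based collaborative filtering, the variant that scored best on the chart. As a rough illustration of that idea only, here is a toy single-machine sketch. It is not Mahout's implementation and not Mendeley's job: the data, the Jaccard similarity and the function names are all assumptions chosen for brevity.

```python
from collections import defaultdict

# Toy user libraries (user -> set of article IDs); hypothetical data.
libraries = {
    "alice": {"a1", "a2", "a3"},
    "bob": {"a2", "a3", "a4"},
    "carol": {"a5"},
}


def jaccard(x, y):
    """Overlap between two libraries: size of intersection over size of union."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def recommend(user, libraries, k=2, n=10):
    """Score unseen articles by the similarity of the k nearest users who hold them."""
    mine = libraries[user]
    neighbours = sorted(
        ((jaccard(mine, lib), other) for other, lib in libraries.items() if other != user),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, other in neighbours:
        for article in libraries[other] - mine:
            scores[article] += sim
    # Highest score first; ties broken by article ID for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, score in ranked if score > 0][:n]


print(recommend("alice", libraries))
```

An item-based recommender would instead compare articles by the users who hold them. At the scale quoted in the deck (1.5M users, 50M articles), the same logic has to be expressed as distributed jobs, which is where the cost differences arise.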
Serving Layer
Architecture
[Diagram labels: User Libraries, Cascading, Mendeley Hadoop Cluster (Computation Layer)]
Architecture
[Diagram labels: User Libraries, Map Reduce, Mendeley Hadoop Cluster (Computation Layer); within AWS: DynamoDB and three Elastic Beanstalk boxes (Serving Layer)]
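One plausible reading of the architecture diagram is that the computation layer precomputes recommendations in batch and loads them into DynamoDB, so the serving layer only does a key lookup per request. The diagram text does not confirm the direction of data flow, so treat the sketch below as an assumption. The `KeyValueStore` class is an in-memory stand-in with a made-up interface, not the DynamoDB API.

```python
class KeyValueStore:
    """In-memory stand-in for the DynamoDB table (hypothetical interface)."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key, default=None):
        return self._items.get(key, default)


def publish_batch(store, batch_output):
    """Computation layer side: load one batch run's output, one item per user."""
    for user_id, article_ids in batch_output.items():
        store.put(user_id, list(article_ids))


def serve(store, user_id, limit=10):
    """Serving layer side: a single key lookup, no computation at request time."""
    return store.get(user_id, [])[:limit]


store = KeyValueStore()
publish_batch(store, {"u1": ["a7", "a3", "a9"], "u2": ["a1"]})
print(serve(store, "u1", limit=2))
print(serve(store, "unknown"))
```

Splitting the system this way keeps request latency independent of how expensive the batch computation is, at the price of recommendations being only as fresh as the last batch run.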
Technologies
➔ Spring dependency injection framework
    ➔ Context-wide integration testing is easy, including pre-loading of test data
    ➔ Allows other Spring features (cache, security, messaging)
➔ Spring MVC 3.2.M1
    ➔ Annotated controllers, type conversion 'for free'
    ➔ Asynchronous Servlet 3.0 supports thread 'parking'
➔ AlternatorDB
    ➔ In-memory DynamoDB implementation for testing
Technologies
[Class diagram: Recommendation<K>; LongRecommendation, UuidRecommendation; GroupRecommendation, PersonRecommendation, DocumentRecommendation]
➔ Build once, employ in several use cases
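The class names above suggest a generic base type parameterised by key type, with concrete recommendation types layered on top. The original is presumably Java; the sketch below mirrors the idea in Python's typing module. The `score` field, the `top_n` helper, and above all the choice of parent for each leaf class are assumptions, since the surviving diagram text does not show the inheritance edges.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar
from uuid import UUID

K = TypeVar("K")


@dataclass
class Recommendation(Generic[K]):
    """A recommended item identified by a key of type K, with a relevance score."""

    item_id: K
    score: float  # hypothetical field, not shown in the diagram


class LongRecommendation(Recommendation[int]):
    """Recommendation keyed by an integer ID."""


class UuidRecommendation(Recommendation[UUID]):
    """Recommendation keyed by a UUID."""


# Assumed parent classes: the diagram text does not say which key type each uses.
class GroupRecommendation(LongRecommendation):
    """Recommended group."""


class PersonRecommendation(LongRecommendation):
    """Recommended person."""


class DocumentRecommendation(UuidRecommendation):
    """Recommended document."""


def top_n(recs, n):
    """Shared ranking logic, written once for every Recommendation subtype."""
    return sorted(recs, key=lambda r: r.score, reverse=True)[:n]


docs = [DocumentRecommendation(UUID(int=1), 0.2), DocumentRecommendation(UUID(int=2), 0.9)]
print([r.item_id.int for r in top_n(docs, 1)])
```

Code such as `top_n` that depends only on the base type works unchanged for groups, people and documents, which is one way to read "build once, employ in several use cases".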
Deployment
➔ AWS ElasticBeanstalk
    ➔ Managed, auto-scaling, health-checking .war container
➔ Jenkins continuous integration (CI) server
➔ Maven build tool (useful dependency management)
➔ beanstalk-maven-plugin (push a button to deploy)
    ➔ Deploys to ElasticBeanstalk
    ➔ Replaces existing application version if required
    ➔ 'Zero downtime' updates (tested at ~300ms)
    ➔ Triggered by Jenkins
Putting it all together... $$$
➔ Real-time article recommendations for 2 million users
➔ 20 requests per second
➔ $65.84/month
    ➔ $34.24 ElasticBeanstalk
    ➔ $28.17 DynamoDB
    ➔ $2.76 bandwidth
➔ $30 to update the computation layer periodically
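The figures above can be turned into unit costs with a few lines of arithmetic. One thing to note: the three itemised lines sum to $65.17, which is $0.67 below the quoted $65.84, and the slide does not say what accounts for the difference. The per-request figure below additionally assumes that 20 requests per second is sustained over a 30-day month, which the slide does not state.

```python
from decimal import Decimal

# Figures copied from the slide.
line_items = {
    "ElasticBeanstalk": Decimal("34.24"),
    "DynamoDB": Decimal("28.17"),
    "bandwidth": Decimal("2.76"),
}
quoted_total = Decimal("65.84")
users = 2_000_000
requests_per_second = 20

itemised_total = sum(line_items.values(), Decimal("0"))
unexplained = quoted_total - itemised_total

# Assumption: 20 requests/s sustained over a 30-day month.
requests_per_month = requests_per_second * 60 * 60 * 24 * 30
cost_per_million_requests = quoted_total / requests_per_month * 1_000_000
cost_per_user = quoted_total / users

print("itemised total:", itemised_total, "unexplained:", unexplained)
print("requests/month:", requests_per_month)
print("USD per million requests:", round(cost_per_million_requests, 2))
print("USD per user per month:", cost_per_user)
```

Under those assumptions, serving costs roughly $1.27 per million requests, before the separate $30 for each periodic update of the computation layer.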
Conclusions
➔ Mendeley Suggest is a personalised article recommender
➔ Built by small team for big data
➔ Uses Mahout as computation layer
    ➔ Needs some love out of the box
➔ Serves from AWS
    ➔ Reduces maintenance costs and is reliable
➔ Intend to release Mendeley Suggest to all users this year
We're Hiring!
➔ Data Scientist
    ➔ apply recommender technologies to Mendeley's data
    ➔ work on improving the quality of Mendeley's research catalogue
    ➔ starting in first quarter of 2013
    ➔ 6 month secondment in KNOW Center, TU Graz, Austria as part of the EC FP7 TEAM project (http://team-project.tugraz.at/)
➔ http://www.mendeley.com/careers/
www.mendeley.com
