Personalisation and
      Recommendations using Drupal
• Keywords:
  –   Personalisation
  –   Recommendations
  –   Scalable machine learning
  –   Predictions
  –   Similarity
  –   Data Mining
  –   Big Data
  –   Trend Spotting
  –   Clustering
                     Drupal Developer Days Barcelona
                               2012.06.16
Kendra Initiative
• Mission
  – Foster an Open Distributed Marketplace for Digital
    Media
• EU funded
  – P2P-Next
     • http://www.p2p-next.org
  – SARACEN = Socially Aware, collaboRative, scAlable
    Coding mEdia distributioN
     • http://www.saracen-p2p.eu
Deliverables
• Kendra Signpost
   – Metadata interoperability, mapping and transformation
• Smart Filters
   – Portable preferences and filters
• Kendra Social, Kendra Hub
   – Social networking management tools
• Standards work
   – OpenSocial extension
   – Social API – see Abstracting Social Networking functionality in
     Drupal sprint
• Kendra Match
   – Searching and recommendation

Components
•   Drupal Recommender API module
•   Recommender helper modules
•   async_command module
•   Apache Mahout or cloud service
•   Hadoop cluster (optional)




Industry Examples
•   Amazon
•   Netflix
•   Spotify, Pandora
•   Facebook, LinkedIn
•   OKCupid
•   iTunes: Genius; App Store - not so much


Machine learning
• Collaborative Filtering
  – AKA recommender engines
• Clustering
• Classification




Collaborative Filtering
• Input: preference data
• Output: predictions
• Preference = <uid1, (nid1 or uid2), w1>
  – w1 = signed integer representing the weight of the
    uid1-nid1 or uid1-uid2 correlation (affinity)
• Prediction = <uid1, (nid1 or uid2), w2>
  – w2 = float representing the predicted strength of the
    uid1-nid1 or uid1-uid2 correlation
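
The two record shapes above can be sketched in Python as follows. This is only an illustration: the class and field names (`Preference`, `Prediction`, `target_id`, `score`) are assumptions made for the example and are not taken from the Recommender API schema, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preference:
    """Input record: how strongly a user is tied to a node or to another user."""
    uid: int        # the user expressing the preference (uid1)
    target_id: int  # a node id (nid1) or another user's id (uid2)
    weight: int     # w1: signed integer affinity


@dataclass(frozen=True)
class Prediction:
    """Output record: the recommender's estimate for a user/target pair."""
    uid: int
    target_id: int
    score: float    # w2: predicted strength of the correlation


# Made-up sample data: user 1 likes node 101 and dislikes node 102.
preferences = [
    Preference(uid=1, target_id=101, weight=5),
    Preference(uid=1, target_id=102, weight=-1),
    Preference(uid=2, target_id=101, weight=4),
]

# A recommender would emit records like this one (the score is invented).
prediction = Prediction(uid=2, target_id=102, score=-0.8)
```

Keeping the input weight an integer and the output score a float mirrors the distinction the slide draws between observed affinity and computed strength.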

Enter Mahout
• Apache Mahout is a scalable machine learning
  library that supports large data sets.
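• Became a top-level Apache project in spring 2010
  (started in 2008 as a Lucene subproject)
• Grew out of the Apache Lucene project (also the
  basis for Apache Solr)
• Absorbed the Taste collaborative filtering project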



Use Cases
•   Recommendation mining
•   Clustering
•   Classification
•   Frequent itemset mining




Out-of-box algorithms
•   Recommendation
     –   User-based recommender
     –   Item-based recommender
     –   Slope-One recommender
     –   Distributed Item-Based Collaborative Filtering
     –   Collaborative Filtering using parallel matrix factorisation
•   Clustering
     –   Canopy Clustering
     –   K-Means Clustering
     –   Fuzzy K-Means
     –   Mean Shift Clustering
     –   Dirichlet Process Clustering
     –   Latent Dirichlet Allocation
     –   Spectral Clustering
     –   Minhash Clustering
•   Classification
     – Naive Bayes algorithm



Hadoop
• Distributes computation across a cluster of machines
  (not the same thing as the clustering algorithms above)
• Not trivial to set up
• Not yet implemented in Recommender API
  (issue #1206840)




Recommender API
• Drupal 7 (alpha) & 6 (beta)
• Can run either on same server as Apache web
  server or on a remote server
• Java helper program (was PHP)
• Uses JDBC and Java Persistence API (JPA)
• Drupal helper modules



Recommender API helper modules
•   Browsing History Recommender
•   OG Similar groups module
•   Ubercart Products Recommender
•   Fivestar Recommender
•   Points Voting Recommender
•   Flag Recommender


Asynchronous operation
• async_command module
  – Talks to Mahout
  – Typically run via cron
• Results are stored directly in the Drupal database
  – Recommender tables
  – Via JDBC




Hosting Solutions
• Self-hosted: all-in-one (web server, database
  server, recommender server) - has its pros and
  cons
• Recommender API Cloud Service - looking for
  beta testers
• Amazon Elastic MapReduce (EMR)



Installing Mahout
• Prerequisites:
  – Dedicated VM if possible
  – Linux, Mac OS X Leopard 10.5.6 or later, Windows
    (Cygwin)
  – Java JDK 1.6
  – Maven 2.0.11 or higher (maven.apache.org)




Installing Mahout
• Building
  – Follow the instructions at
    https://cwiki.apache.org/MAHOUT/buildingmahout.html
• Use Maven to build the examples




Installing Mahout
• Testing: GroupLens data sets
  – On a single 2GHz server:
     • 100K ratings (1000 users, 1700 items): 9 minutes
     • 1M ratings (6000 users, 4000 items): 12 hours
     • 10M ratings (72,000 users, 10,000 items): fuggedaboutit
  – Using 6 concurrent 2GHz processing units:
     • 100K ratings (1000 users, 1700 items): 2 minutes
     • 1M ratings (6000 users, 4000 items): 2 hours
     • 10M ratings (72,000 users, 10,000 items): 11 days 20 hours
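
As a quick worked check on the timings above, the sketch below computes the speedup and parallel efficiency implied by the two runs that were measured in both configurations. The function name is an assumption; the inputs are the figures from the slide.

```python
def speedup_and_efficiency(serial_minutes, parallel_minutes, units):
    """Return (speedup, efficiency) where efficiency = speedup / number of units."""
    speedup = serial_minutes / parallel_minutes
    return speedup, speedup / units


# (single-server minutes, six-unit minutes), taken from the figures above.
runs = {
    "100K ratings": (9, 2),
    "1M ratings": (12 * 60, 2 * 60),
}
results = {
    name: speedup_and_efficiency(serial, parallel, 6)
    for name, (serial, parallel) in runs.items()
}

# Growth in run time on six units when going from 1M to 10M ratings.
ten_million_hours = 11 * 24 + 20          # 11 days 20 hours = 284 hours
growth_1m_to_10m = ten_million_hours / 2  # relative to the 2-hour 1M run
```

On these numbers the small data set gets a 4.5x speedup (75% efficiency), the 1M set scales linearly at 6x, and ten times more ratings cost 142 times more run time on the same hardware, which suggests the cost grows much faster than linearly with the number of ratings.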


Installing Recommender API
• See http://drupal.org/node/1207634
• Configuration
  – sites/all/modules/async_command/config.properties
    should match settings.php
• Download and enable async_command
• Check
  /admin/config/search/recommender/admin
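
One way to catch a mismatch between the two configuration files is a small consistency check like the sketch below. It is an assumption-heavy illustration: the property keys (`example.username` and so on) are hypothetical placeholders, not the real keys in config.properties, and the Drupal side is represented by a plain dict holding the `username`, `password` and `database` values from settings.php.

```python
def parse_properties(text):
    """Parse simple `key = value` lines, skipping blank lines and `#` comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def mismatches(props, drupal_db, key_map):
    """Return the property keys whose values differ from the Drupal settings."""
    return sorted(
        prop_key
        for prop_key, drupal_key in key_map.items()
        if props.get(prop_key) != drupal_db.get(drupal_key)
    )


# Hypothetical property keys mapped to Drupal 7 database settings keys.
KEY_MAP = {
    "example.username": "username",
    "example.password": "password",
    "example.database": "database",
}

sample_properties = """
# hypothetical keys, for illustration only
example.username = drupal
example.password = secret
example.database = drupal7
"""

# Values as they would appear in settings.php (made-up sample).
drupal_db = {"username": "drupal", "password": "secret", "database": "drupal_live"}

differing = mismatches(parse_properties(sample_properties), drupal_db, KEY_MAP)
```

Here `differing` flags the database name, which is the kind of silent mismatch that would leave the helper program writing to the wrong place.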


Usage
• Making recommendations
  – User-user
  – User-item
  – Item-item
• Predictions/similarity scores feed back into Drupal
• Blocks
• Views
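
As an illustration of the item-item case, the sketch below derives cosine similarities between items from (uid, nid, weight) preference triples. It is a plain-Python approximation of the idea, not the computation Mahout or the Recommender API performs, and the function name and sample data are assumptions.

```python
import math
from collections import defaultdict


def item_similarities(preferences):
    """Cosine similarity between items, from (uid, nid, weight) triples.

    Returns {(nid_a, nid_b): similarity} for pairs with a non-zero dot
    product, with nid_a < nid_b.
    """
    vectors = defaultdict(dict)  # nid -> {uid: weight}
    for uid, nid, weight in preferences:
        vectors[nid][uid] = weight

    sims = {}
    items = sorted(vectors)
    for index, a in enumerate(items):
        for b in items[index + 1:]:
            shared_users = vectors[a].keys() & vectors[b].keys()
            dot = sum(vectors[a][u] * vectors[b][u] for u in shared_users)
            norm_a = math.sqrt(sum(w * w for w in vectors[a].values()))
            norm_b = math.sqrt(sum(w * w for w in vectors[b].values()))
            if dot and norm_a and norm_b:
                sims[(a, b)] = dot / (norm_a * norm_b)
    return sims


# Users 1 and 2 both like nodes 10 and 11; user 3 only likes node 12.
prefs = [(1, 10, 1), (1, 11, 1), (2, 10, 1), (2, 11, 1), (3, 12, 1)]
sims = item_similarities(prefs)
```

Swapping the roles of uid and nid in the triples gives the user-user case; a user-item prediction can then be built by combining a user's own weights with these similarities.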

Case study: Data Mining and
      Recommendations in SARACEN
• SARACEN: http://www.saracen-p2p.eu/
• Feedback loop to measure subjective quality of
  the recommendations
  – Limited set of data, small user base
  – API provides an initial set of recommended videos
  – User can then watch a recommended video
  – User’s actions are incorporated into their implicit
    profile, which feeds back to the Recommender API
  – Recommender API generates new predictions based
    on the complete set of implicit profile metadata
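
The feedback loop above can be simulated in a few lines. The sketch assumes a deliberately naive recommender (count what users with overlapping viewing history watched, fall back to catalogue order for a new user); the function name, user names and video ids are all made up and do not reflect the SARACEN implementation.

```python
from collections import Counter


def recommend(profiles, user, catalogue, limit=2):
    """Rank unseen videos by how many overlapping users watched them."""
    seen = profiles.get(user, set())
    scores = Counter()
    for other, watched in profiles.items():
        if other == user or not (seen & watched):
            continue  # only learn from users who share at least one video
        for video in watched - seen:
            scores[video] += 1

    # Highest score first; ties broken by id so the output is deterministic.
    ranked = sorted(scores, key=lambda video: (-scores[video], video))
    # Cold start: pad with unseen catalogue items in catalogue order.
    ranked += [v for v in catalogue if v not in seen and v not in ranked]
    return ranked[:limit]


catalogue = ["v1", "v2", "v3", "v4"]
profiles = {"ann": {"v1", "v2"}, "bob": {"v1", "v3"}, "cat": set()}

first = recommend(profiles, "cat", catalogue)   # initial set for a new user
profiles["cat"].add(first[0])                   # the user watches a recommended video
second = recommend(profiles, "cat", catalogue)  # new predictions from the updated profile
```

After one watched video the recommendations shift from the catalogue default to items watched by users with a shared history, which is the loop the case study measures.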

SARACEN: Prototype




Recommender data sources
• Explicit data
   – SARACEN account data, including location and language
   – Linked accounts and profiles
        • e.g. Facebook user profile, “likes”, connections, metadata
• Implicit data
   –   Activity history recorded during the user’s sessions
   –   Searches
   –   Shared content
   –   Viewed content
   –   Albums (media containers)
   –   Content ratings
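
Implicit activity has to be turned into preference weights before a recommender can use it. The sketch below shows one possible way to do that; the action names and the weights in `ACTION_WEIGHTS` are hypothetical values picked for the example, not figures from the SARACEN project.

```python
from collections import defaultdict

# Hypothetical weight per activity type; negative values express dislike.
ACTION_WEIGHTS = {
    "view": 1,
    "search_click": 1,
    "add_to_album": 2,
    "share": 3,
    "rate_up": 4,
    "rate_down": -4,
}


def build_preferences(events):
    """Collapse (uid, nid, action) events into sorted (uid, nid, weight) triples."""
    totals = defaultdict(int)
    for uid, nid, action in events:
        # Unknown actions contribute nothing rather than raising an error.
        totals[(uid, nid)] += ACTION_WEIGHTS.get(action, 0)
    return sorted((uid, nid, w) for (uid, nid), w in totals.items() if w != 0)


events = [
    (1, 10, "view"),
    (1, 10, "share"),
    (1, 11, "rate_down"),
    (2, 10, "view"),
    (2, 12, "unknown"),
]
prefs = build_preferences(events)
```

The result is a list of signed-integer triples, the same shape as the preference input described for collaborative filtering.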
Scalability
• Don’t need Hadoop if
  – Number of users is orders of magnitude larger
    than the number of items
  – Users browse anonymously most of the time
  – Few users log in and need personalised
    recommendations
  – Item churn rate is relatively low
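
A rough way to see why the first condition matters: a pairwise similarity model grows with the square of whatever it compares. The sketch below counts pairs for made-up site sizes; reading the slide as "precompute item-item similarities on one machine" is an interpretation, not something the slide states.

```python
def pair_count(n):
    """Number of distinct unordered pairs among n things."""
    return n * (n - 1) // 2


# Made-up sizes for a site with far more users than items.
users, items = 1_000_000, 5_000

item_pairs = pair_count(items)  # 12,497,500 item-item similarities
user_pairs = pair_count(users)  # 499,999,500,000 user-user similarities
```

With these sizes the item-item model is tens of thousands of times smaller than the user-user one, and with low item churn it rarely needs recomputing.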



Worth Considering
• Decreased Transparency
• Decreased Serendipity
• Sleep deprivation




Resources: Recommender API
• http://drupal.org/project/recommender
• http://recommenderapi.com/cloud
• https://cwiki.apache.org/confluence/display/MAHOUT




Resources: Mahout
• http://mahout.apache.org/
• Mahout in Action
  – http://www.manning.com/owen/
  – ISBN 9781935182689.
• The Optimality of Naive Bayes, Harry Zhang.
• http://aws.amazon.com/elasticmapreduce/



Acknowledgements
• Socially Aware, collaboRative, scAlable Coding
  mEdia distributioN (SARACEN)
  – http://www.saracen-p2p.eu
  – Funded within the European Union’s Seventh
    Framework Programme (FP7/2007-2013) under
    grant agreement 248474




Questions/comments…
• Kendra Initiative
   – @kendra
   – http://www.kendra.org.uk
   – https://github.com/kendrainitiative
• Klokie Grossfeld
   – @klokie
   – klokie@kendra.org.uk
   – http://www.linkedin.com/in/klokie
• Daniel Harris
   – @dahacouk
   – daniel@kendra.org.uk


Weitere ähnliche Inhalte

Ähnlich wie Personalisation and Recommendations using Drupal and Apache Mahout

Scalable Machine Learning with Hadoop
Scalable Machine Learning with HadoopScalable Machine Learning with Hadoop
Scalable Machine Learning with HadoopGrant Ingersoll
 
Reco4J @ London Meetup (June 26th)
Reco4J @ London Meetup (June 26th)Reco4J @ London Meetup (June 26th)
Reco4J @ London Meetup (June 26th)Alessandro Negro
 
Introduction to drupal
 Introduction to drupal Introduction to drupal
Introduction to drupalRachit Gupta
 
Drupal and the semantic web - SemTechBiz 2012
Drupal and the semantic web - SemTechBiz 2012Drupal and the semantic web - SemTechBiz 2012
Drupal and the semantic web - SemTechBiz 2012scorlosquet
 
Getting Started with Drupal - Handouts
Getting Started with Drupal - HandoutsGetting Started with Drupal - Handouts
Getting Started with Drupal - HandoutsRachel Vacek
 
Reco4J @ Munich Meetup (April 18th)
Reco4J @ Munich Meetup (April 18th)Reco4J @ Munich Meetup (April 18th)
Reco4J @ Munich Meetup (April 18th)Alessandro Negro
 
Demystifying Decoupled Drupal for Developers & Content Authors
Demystifying Decoupled Drupal for Developers & Content AuthorsDemystifying Decoupled Drupal for Developers & Content Authors
Demystifying Decoupled Drupal for Developers & Content AuthorsRachel Wandishin
 
Play Architecture, Implementation, Shiny Objects, and a Proposal
Play Architecture, Implementation, Shiny Objects, and a ProposalPlay Architecture, Implementation, Shiny Objects, and a Proposal
Play Architecture, Implementation, Shiny Objects, and a ProposalMike Slinn
 
Hadoop_Architect__eVenkat
Hadoop_Architect__eVenkatHadoop_Architect__eVenkat
Hadoop_Architect__eVenkatVenkat Krishnan
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkSlim Baltagi
 
Drupal and the Semantic Web - ESIP Webinar
Drupal and the Semantic Web - ESIP WebinarDrupal and the Semantic Web - ESIP Webinar
Drupal and the Semantic Web - ESIP Webinarscorlosquet
 
Hadoop on OpenStack - Sahara @DevNation 2014
Hadoop on OpenStack - Sahara @DevNation 2014Hadoop on OpenStack - Sahara @DevNation 2014
Hadoop on OpenStack - Sahara @DevNation 2014spinningmatt
 
Provisioning Big Data Platform using Cloudbreak & Ambari
Provisioning Big Data Platform using Cloudbreak & AmbariProvisioning Big Data Platform using Cloudbreak & Ambari
Provisioning Big Data Platform using Cloudbreak & AmbariDataWorks Summit/Hadoop Summit
 
Drupal for programmers
Drupal for programmersDrupal for programmers
Drupal for programmersMichael Shahov
 
Linked Data Publishing with Drupal (SWIB13 workshop)
Linked Data Publishing with Drupal (SWIB13 workshop)Linked Data Publishing with Drupal (SWIB13 workshop)
Linked Data Publishing with Drupal (SWIB13 workshop)Joachim Neubert
 
SLQ vs NOSQL - friends or foes
SLQ vs NOSQL - friends or foes SLQ vs NOSQL - friends or foes
SLQ vs NOSQL - friends or foes Pedro Gomes
 

Ähnlich wie Personalisation and Recommendations using Drupal and Apache Mahout (20)

Scalable Machine Learning with Hadoop
Scalable Machine Learning with HadoopScalable Machine Learning with Hadoop
Scalable Machine Learning with Hadoop
 
Reco4J @ London Meetup (June 26th)
Reco4J @ London Meetup (June 26th)Reco4J @ London Meetup (June 26th)
Reco4J @ London Meetup (June 26th)
 
Introduction to drupal
 Introduction to drupal Introduction to drupal
Introduction to drupal
 
Drupal and the semantic web - SemTechBiz 2012
Drupal and the semantic web - SemTechBiz 2012Drupal and the semantic web - SemTechBiz 2012
Drupal and the semantic web - SemTechBiz 2012
 
MahoutNew
MahoutNewMahoutNew
MahoutNew
 
Getting Started with Drupal - Handouts
Getting Started with Drupal - HandoutsGetting Started with Drupal - Handouts
Getting Started with Drupal - Handouts
 
Reco4J @ Munich Meetup (April 18th)
Reco4J @ Munich Meetup (April 18th)Reco4J @ Munich Meetup (April 18th)
Reco4J @ Munich Meetup (April 18th)
 
Demystifying Decoupled Drupal for Developers & Content Authors
Demystifying Decoupled Drupal for Developers & Content AuthorsDemystifying Decoupled Drupal for Developers & Content Authors
Demystifying Decoupled Drupal for Developers & Content Authors
 
Play Architecture, Implementation, Shiny Objects, and a Proposal
Play Architecture, Implementation, Shiny Objects, and a ProposalPlay Architecture, Implementation, Shiny Objects, and a Proposal
Play Architecture, Implementation, Shiny Objects, and a Proposal
 
Hadoop_Architect__eVenkat
Hadoop_Architect__eVenkatHadoop_Architect__eVenkat
Hadoop_Architect__eVenkat
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
 
Drupal and the Semantic Web - ESIP Webinar
Drupal and the Semantic Web - ESIP WebinarDrupal and the Semantic Web - ESIP Webinar
Drupal and the Semantic Web - ESIP Webinar
 
Resume_VipinKP
Resume_VipinKPResume_VipinKP
Resume_VipinKP
 
Hadoop on OpenStack - Sahara @DevNation 2014
Hadoop on OpenStack - Sahara @DevNation 2014Hadoop on OpenStack - Sahara @DevNation 2014
Hadoop on OpenStack - Sahara @DevNation 2014
 
Provisioning Big Data Platform using Cloudbreak & Ambari
Provisioning Big Data Platform using Cloudbreak & AmbariProvisioning Big Data Platform using Cloudbreak & Ambari
Provisioning Big Data Platform using Cloudbreak & Ambari
 
Cassandra eu
Cassandra euCassandra eu
Cassandra eu
 
Drupal for programmers
Drupal for programmersDrupal for programmers
Drupal for programmers
 
Linked Data Publishing with Drupal (SWIB13 workshop)
Linked Data Publishing with Drupal (SWIB13 workshop)Linked Data Publishing with Drupal (SWIB13 workshop)
Linked Data Publishing with Drupal (SWIB13 workshop)
 
SLQ vs NOSQL - friends or foes
SLQ vs NOSQL - friends or foes SLQ vs NOSQL - friends or foes
SLQ vs NOSQL - friends or foes
 
Apache drill
Apache drillApache drill
Apache drill
 

Kürzlich hochgeladen

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Kürzlich hochgeladen (20)

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Personalisation and Recommendations using Drupal and Apache Mahout

  • 1. Personalisation and Recommendations using Drupal • Keywords: – Personalisation – Recommendations – Scalable machine learning – Predictions – Similarity – Data Mining – Big Data – Trend Spotting – Clustering Drupal Developer Days Barcelona 2012.06.16
  • 2. Kendra Initiative • Mission – Foster an Open Distributed Marketplace for Digital Media • EU funded – P2P-Next • http://www.p2p-next.org – SARACEN = Socially Aware, collaboRative, scAlable Coding mEdia distributioN • http://www.saracen-p2p.eu Drupal Developer Days Barcelona 2012.06.16
  • 3. Deliverables • Kendra Signpost – Metadata interoperability, mapping and transformation • Smart Filters – Portable preferences and filters • Kendra Social, Kendra Hub – Social networking management tools • Standards work – OpenSocial extension – Social API – see Abstracting Social Networking functionality in Drupal sprint • Kendra Match – Searching and recommendation Drupal Developer Days Barcelona 2012.06.16
  • 4. Components • Drupal Recommender API module • Recommender helper modules • async_command module • Apache Mahout or cloud service • Hadoop cluster (optional) Drupal Developer Days Barcelona 2012.06.16
  • 5. Industry Examples • Amazon • Netflix • Spotify, Pandora • Facebook, LinkedIn • OKCupid • iTunes: Genius; app store - not so much Drupal Developer Days Barcelona 2012.06.16
  • 6. Machine learning • Collaborative Filtering – AKA recommender engines • Clustering • Classification Drupal Developer Days Barcelona 2012.06.16
  • 7. Collaborative Filtering • Input: preference data • Output: predictions • Preference = <uid1, (nid1 or uid2), w1> – w1 = signed integer representing weight of uid1- nid1 or uid1-uid2 correlation (affinity) • Prediction = <uid1, (nid1or uid2), w2> – w2 = float representing strength of uid1-nid1 or uid1-uid2 correlation Drupal Developer Days Barcelona 2012.06.16
  • 8. Enter Mahout • Apache Mahout is a scalable machine learning library that supports large data sets. • Launched Spring 2010 • Grew from the Apache Lucene project (basis for Apache Solr) • Merged with Taste project Drupal Developer Days Barcelona 2012.06.16
  • 9. Use Cases • Recommendation mining • Clustering • Classification • Frequent itemset mining Drupal Developer Days Barcelona 2012.06.16
  • 10. Out-of-box algorithms • Recommendation – User-based recommender – Item-based recommender – Slope-One recommender – Distributed Item-Based Collaborative Filtering – Collaborative Filtering using parallel matrix factorisation • Clustering – Canopy Clustering – K-Means Clustering – Fuzzy K-Means – Mean Shift Clustering – Dirichlet Process Clustering – Latent Dirichlet Allocation – Spectral Clustering – Minhash Clustering • Model combination – Naive Bayes algorithm Drupal Developer Days Barcelona 2012.06.16
  • 11. Hadoop • Provides clustering capabilities • Not trivial to set up • Not yet implemented in Recommender API (issue #1206840) Drupal Developer Days Barcelona 2012.06.16
  • 12. Recommender API • Drupal 7 (alpha) & 6 (beta) • Can run either on same server as Apache web server or on a remote server • Java helper program (was PHP) • Uses JDBC and Java Persistence API (JPA) • Drupal helper modules Drupal Developer Days Barcelona 2012.06.16
  • 13. Recommender API helper modules • Browsing History Recommender • OG Similar groups module • Ubercart Products Recommender • Fivestar Recommender • Points Voting Recommender • Flag Recommender Drupal Developer Days Barcelona 2012.06.16
  • 14. Asynchronous operation • Async_command module – Talks to Mahout – Typically run via cron • Results are stored directly in Drupal db – Recommender tables – Via JDBC Drupal Developer Days Barcelona 2012.06.16
  • 15. Hosting Solutions • Self-hosted: all-in-one (web server, database server, recommender server) - has its pro’s & cons • Recommender API Cloud Service - looking for beta testers • Amazon Elastic MapReduce (EMR) Drupal Developer Days Barcelona 2012.06.16
  • 16. Installing Mahout • Prerequisites: – Dedicated VM if possible – Linux, Mac OSX Leopard 10.5.6 or later, Windows (Cygwin) – Java JDK 1.6 – Maven 2.0.11 or higher (maven.apache.org) Drupal Developer Days Barcelona 2012.06.16
  • 17. Installing Mahout • Building – Follow instructions – https://cwiki.apache.org/MAHOUT/buildingmaho ut.html • Use maven to build examples Drupal Developer Days Barcelona 2012.06.16
  • 18. Installing Mahout • Testing: Grouplens – On a single 2GHz server: • 100K ratings (1000 users, 1700 items): 9 minutes. 1M ratings (6000 users, 4000 items): 12 hours. 10M ratings (72,000 users, 10,000 items): fuggedaboutit – Using 6 concurrent 2GHz processing units: • 100K ratings (1000 users, 1700 items): 2 minutes. 1M ratings (6000 users, 4000 items): 2 hours. 10M ratings (72,000 users, 10,000 items): 11 days 20 hours. Drupal Developer Days Barcelona 2012.06.16
  • 19. Installing Recommender API • See http://drupal.org/node/1207634 • Configuration – sites/all/modules/async_command/config.propert ies should match settings.php • Download and enable async_command • Check /admin/config/search/recommender/admin Drupal Developer Days Barcelona 2012.06.16
  • 20. Usage • Making recommendations – User-user – User-item – Item-item • Predictions/similarity feeds back into Drupal • Blocks • Views Drupal Developer Days Barcelona 2012.06.16
  • 21. Case study: Data Mining and Recommendations in SARACEN • SARACEN: http://www.saracen-p2p.eu/ • Feedback loop to measure subjective quality of the recommendations – Limited set of data, small user base – API provides an initial set of recommended videos – User can then watch a recommended video – User’s actions are incorporated into their implicit profile, feeds back to the recommender API – Recommender API generates new predictions based on the complete set of implicit profile metadata Drupal Developer Days Barcelona 2012.06.16
  • 22. SARACEN: Prototype Drupal Developer Days Barcelona 2012.06.16
  • 23. Recommender data sources • Explicit data – SARACEN account data, including location and language – Linked accounts and profiles • e.g. Facebook user profile, “likes”, connections, metadata • Implicit data – Activity history recorded during the user’s sessions – Searches – Shared content – Viewed content – Albums (media containers) – Content ratings Drupal Developer Days Barcelona 2012.06.16
  • 24. Scalability • Don’t need Hadoop if – Number of users is orders of magnitude larger than the number of items – Users browse anonymously most of the time – Few users log in and need personalised recommendations – Item churn rate is relatively low Drupal Developer Days Barcelona 2012.06.16
  • 25. Worth Considering
    • Decreased Transparency
    • Decreased Serendipity
    • Sleep deprivation
  • 26. Resources: Recommender API
    • http://drupal.org/project/recommender
    • http://recommenderapi.com/cloud
    • https://cwiki.apache.org/confluence/display/MAHOUT
  • 27. Resources: Mahout
    • http://mahout.apache.org/
    • Mahout in Action
      – http://www.manning.com/owen/
      – ISBN 9781935182689
    • The Optimality of Naive Bayes, Harry Zhang
    • http://aws.amazon.com/elasticmapreduce/
  • 28. Acknowledgements
    • Socially Aware, collaboRative, scAlable Coding mEdia distributioN (SARACEN)
      – http://www.saracen-p2p.eu
      – Funded within the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement 248474
  • 29. Questions/comments…
    • Kendra Initiative
      – @kendra
      – http://www.kendra.org.uk
      – https://github.com/kendrainitiative
    • Klokie Grossfeld
      – @klokie
      – klokie@kendra.org.uk
      – http://www.linkedin.com/in/klokie
    • Daniel Harris
      – @dahacouk
      – daniel@kendra.org.uk

Editor's notes

  1. Scalable machine learning
     Keywords: Recommendations, Personalisation, Big Data, Data Mining, Trend Spotting, Predictions, Clustering
     Audience: developers, experimenters. How many have already installed or played with Mahout? Recommender API? Built their own solutions?
     Architecture overview: Drupal + Recommender API + Apache Mahout or cloud service; optionally run Mahout on a Hadoop cluster
     Asynchronous, using Mahout (Java) for the heavy lifting; early Recommender API was in PHP, but PHP sucks for computationally intensive or asynchronous tasks
  3. Amazon; Netflix; Netflix Prize; Spotify, Pandora; Facebook, LinkedIn; OKCupid; iTunes Genius (app store not so much); many more.
     As Amazon and others have demonstrated, recommenders can have concrete commercial value by enabling smart cross-selling opportunities. One firm reports that recommending products to users can drive an 8 to 12 percent increase in sales.
  4. Recommendation mining: aggregate a user’s behavior and use it to find other items they might like.
     Clustering: take documents and group them by topic.
     Classification: learn from existing categorised documents what documents of a specific category look like, and assign unlabelled documents to the (hopefully) correct category.
     Frequent itemset mining: take a set of item groups (terms in a query session, shopping cart content) and identify which individual items usually appear together.
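As a small illustration of the frequent itemset mining described in this note, the sketch below counts item pairs across groups and keeps those that reach a minimum support. It is a simplified pair-counting example in Python, not Mahout's implementation; the baskets, the function name `frequent_pairs`, and the support threshold are made up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical item groups, e.g. shopping cart contents.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
]

def frequent_pairs(groups, min_support: int) -> dict:
    """Count item pairs and keep those seen in at least min_support groups."""
    counts = Counter()
    for group in groups:
        # Sorting gives each pair one canonical (alphabetical) ordering.
        for pair in combinations(sorted(group), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

print(frequent_pairs(baskets, min_support=2))
```

Counting every pair directly is only practical for small inputs; dedicated algorithms exist to prune the search for larger itemsets.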
  5. Provides clustering capabilities.
     Not trivial to set up.
     See issue #1206840 re: Recommender API support for Hadoop. Mahout actually supports Hadoop clusters, so potentially the Recommender API can use Hadoop too for really large computational tasks. However, I’m not sure if Hadoop is really needed because the current implementation is already quite fast.

  6. http://drupal.org/project/recommender - Drupal 7 (alpha) & 6 (beta)
     • A Java program that uses Apache Mahout to do the recommendation computation
       – The Java program can run either on the local Drupal server or on a remote computer with better CPU/RAM capacity
       – Uses JDBC and Java Persistence API (JPA) to directly access the required Drupal database tables on most JDBC-compliant databases
       – An earlier version was done in PHP, but the current design is much more scalable
     • A Drupal module (recommender)
       – Lets users issue commands to the Java program through the Drupal interface
       – The Java program then picks up those commands and executes them
     • Drupal integration modules
       – All the nitty-gritty communication between Drupal and the Java program is handled by Recommender API
       – Helper modules just use Recommender API to calculate the recommendations
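The division of labour in this note (the web side records a command, a separate worker picks it up and executes it) can be sketched as a simple database-backed queue. The Python below is an illustration under stated assumptions, not the async_command schema or the real Java worker: the `command_queue` table, its columns, and the function names are hypothetical, and an in-memory SQLite database stands in for the shared Drupal database.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the shared Drupal database
db.execute("""CREATE TABLE command_queue (
    id INTEGER PRIMARY KEY, command TEXT, status TEXT DEFAULT 'pending')""")

def enqueue(command: str) -> int:
    """What the web side does: record a command and return immediately."""
    cur = db.execute("INSERT INTO command_queue (command) VALUES (?)", (command,))
    db.commit()
    return cur.lastrowid

def run_pending(handlers: dict) -> list:
    """What the worker does: pick up pending commands and execute them."""
    done = []
    rows = db.execute(
        "SELECT id, command FROM command_queue WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    for row_id, command in rows:
        handlers[command]()  # run the handler registered for this command
        db.execute("UPDATE command_queue SET status = 'done' WHERE id = ?", (row_id,))
        done.append(command)
    db.commit()
    return done

log = []
enqueue("compute_recommendations")
print(run_pending({"compute_recommendations": lambda: log.append("computed")}))
```

Because the web request only writes a row, a long-running computation never blocks a page load; the worker can run on the same server or on a different machine with access to the database.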
  7. A feedback loop can be used to measure subjective quality of the recommendations:
     • API provides an initial set of recommended items based on predictions using a limited set of data
     • User is able to watch an item from the set of recommended items, or add it to their boxes for later viewing
     • User’s actions are incorporated into their implicit profile, which feeds back to the recommender API
     • Recommender API generates new predictions based on the complete set of implicit profile metadata
  8. The output of the classifier models will be fed into the recommender models, but not vice versa, to prevent the creation of feedback loops in the modelling process. The final recommendation and classifier outputs will then be fed back into the implicit data triple store, where they may be relayed to users for predictions and similarity.
     All the classifiers and recommenders, and the model combiners, will run concurrently and asynchronously, and, if necessary, in parallel on different nodes in the Kendra API environment. This method is preferred to the generation of recommendations and classifications on demand, because the relevant algorithms tend to produce results in batches for multiple users, as opposed to individual results one at a time.
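The notes do not say how the model combiners merge the outputs of the individual models. As one simple possibility, the Python sketch below takes a weighted average of each model's item scores; the model names, scores, weights, and the `combine` function are all hypothetical.

```python
def combine(scores_by_model: dict, weights: dict) -> dict:
    """Weighted average of each model's item scores (one simple combiner)."""
    totals, weight_sums = {}, {}
    for model, scores in scores_by_model.items():
        w = weights.get(model, 0.0)  # models without a weight are ignored
        for item, score in scores.items():
            totals[item] = totals.get(item, 0.0) + w * score
            weight_sums[item] = weight_sums.get(item, 0.0) + w
    # Average only over the models that actually scored the item.
    return {item: totals[item] / weight_sums[item]
            for item in totals if weight_sums[item] > 0}

# Hypothetical outputs of two recommender models for one user.
scores = {
    "item_based": {"v1": 0.8, "v2": 0.4},
    "user_based": {"v1": 0.6, "v3": 1.0},
}
weights = {"item_based": 3.0, "user_based": 1.0}

combined = combine(scores, weights)
ranked = sorted(combined.items(), key=lambda kv: -kv[1])
print([item for item, _ in ranked])
```

Averaging only over the models that scored an item means an item seen by a single model keeps that model's score; a stricter combiner could instead treat a missing score as zero.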
  9. Processing
     Recommendations are computed every 2 minutes during the initial implementation, using the Linux cron daemon.
     Rationale
     This system has been chosen for a number of reasons:
     • The overall multi-model and combiner system represents the state of the art in recommendation systems, and is well proven in other applications with similar problems.
     • In spite of its apparent ad hoc approach, the model-combiner approach is known to be highly robust, and is thus a safe choice for the engineering goals of the project.
     • Since it is impossible to know in advance of actual testing which classifiers will be successful, a model-combiner-based approach provides an objective means to select which algorithms should be used in the final system. Hand tuning is minimised, making results more objective and at the same time reducing project effort.
     • This approach allows work on the project to progress incrementally, with the ability to generate partial results at an early stage in the development process, thereby increasing the probability of a successful project outcome.
     • At the same time, this approach allows Kendra to take a novel research direction in producing a novel recommender algorithm, without detracting from the engineering goal of providing a working recommendation system for the project.
     • The overall framework will then allow the assessment of the effectiveness of this recommender relative to the effectiveness of existing algorithms, in an objective manner.
  10. “Deploying a massively scalable recommender system with Apache Mahout” focuses on use cases different from SARACEN, but still useful.
      Use cases for Hadoop:
      • Number of users is orders of magnitude larger than the number of items
      • Users browse anonymously most of the time
      • Few users log in and need personalised recommendations
      • Your item churn rate is relatively low: items are available for weeks or months, and it’s OK to have a waiting time of half a day or more until new items are included in the recommendations
      I.e. most e-commerce sites and many video portals.
  11. Decreased transparency: how are my previous choices influencing what I see?
      Serendipity: random recommendations will, by definition, not receive as many clicks, but may add to the system’s value.
      Sleep deprivation: if you’re in charge of setting up and maintaining a Hadoop cluster.