SlideShare ist ein Scribd-Unternehmen logo
1 von 25
Downloaden Sie, um offline zu lesen
Crab
                A Python Framework for Building
                    Recommendation Engines
                       Scipy 2011, Austin TX


Marcel Caraciolo Ricardo Caspirro             Bruno Melo
   @marcelcaraciolo       @ricardocaspirro        @brunomelo

                                                               1
What is Crab ?

 A python framework for building recommendation engines
A Scikit module for collaborative, content and hybrid filtering
       Mahout Alternative for Python Developers :D
             Open-Source under the BSD license


             https://github.com/muricoca/crab




                                                                 2
When started ?

It began one year ago
Community-driven, 4 members
Since April,2011 the open-source labs Muriçoca incorporated it
Since April,2011 rewritting it as Scikit




                https://github.com/muricoca/
                                                             3
Knowing Scikits
Scikits are Scipy Toolkits - independent and projects hosted
                under a common namespace.


                       Scikits Image
                     Scikits MlabWrap
                     Scikits AudioLab
                      Scikit Learn
                             ....

           http://scikits.appspot.com/scikits



                                                               4
Knowing Scikits

                        Scikit-Learn

    Machine Learning Algorithms + scientific Python packages
                (Numpy, Scipy and Matplotlib)

           http://scikit-learn.sourceforge.net/


Our goal: Incorporate the Crab as Scikit and incorporate
           some parts of them at Scikit-learn


                                                              5
Why Recommendations ?
The world is an over-crowded place
 !"#$%&'()$*+$,-$&.#'/0'&%)#)$1(,0#




                                      6
Why Recommendations
     * +,&-.$/).#&0#/"1.#$%234(".#                   ?
       $/)#5(&6 7&.2.#"$4,#)$8
                   We are overloaded
     * 93((3&/.#&0#:&'3".;#5&&<.#
         $/)#:-.34#2%$4<.#&/(3/"
Thousands of news articles and blog posts each day
       * =/#>$/&3;#?#@A#+B#4,$//"(.;#
          2,&-.$/).#&0#7%&6%$:.#
 Millions of movies, books and music tracks online
          "$4,#)$8
          Several Places, Offers and Events

     * =/#C"1#D&%<;#."'"%$(#
  Even Friends sometimes we are overloaded !

         2,&-.$/).#&0#$)#:"..$6".#
         ."/2#2&#-.#7"%#)$8




                                                         7
Why Recommendations ?
We really need and consume only a few of them!

   “A lot of times, people don’t know what
   they want until you show it to them.”
                                         Steve Jobs

  “We are leaving the Information age, and
  entering into the Recommendation age.”
                      Chris Anderson, from book Long Tail



                                                            8
Why Recommendations ?
Can Google help ?
  Yes, but only when we really know what we are looking for
           But, what’s does it mean by “interesting” ?
Can Facebook help ?
  Yes, I tend to find my friends’ stuffs interesting
   What if i had only few friends and what they like do not always
                             attract me ?
Can experts help ?
  Yes, but it won’t scale well.
    But it is what they like, not me! Exactly same advice!


                                                                     9
Why Recommendations ?
         Recommendation Systems
Systems designed to recommend to me something I may like




                                                           10
Why Recommendations ?
     !"#$%&"'$"'(')*#*+,)
     Recommendation Systems

      -+*#)+.               -#/')             0#)1#




                                    !
2'              23&4"+')1               5,6           7),*%'"&863


                      Graph Representation




                                                                    11
The current Crab

Collaborative Filtering algorithms
  User-Based, Item-Based and Slope One

Evaluation of the Recommender Algorithms
 Precision, Recall, F1-Score, RMSE




                           Precision-Recall Charts

                                                     12
The current Crab




   Precision-Recall Charts

                             13
The current Crab




                   14
The current Crab




Using REST APIs to deploy the recommender
          django-piston, django-rest, django-tastypie




                                                        15
Crab is already in production

  Brazilian Social Network called Atepassar.com
        Educational network with more than 60.000 students and 3000 video-classes




     Running on Python
    + Numpy + Scipy and
          Django


Backend for Recommendations
MongoDB - mongoengine

   Daily Recommendations
    with Explanations



                                                                                    16
Evaluating your recommender
 Crab implements the most used recommender metrics.
     Precision, Recall, F1-Score, RMSE



     Using matplotlib
     for a plotter utility

 Implement new metrics

Simulations support maybe (??)




                                                  17
Evaluating your recommender
All you have to do is implement your Evaluator




                                                 18
Distributing the recommendation computations


Use Hadoop and Map-Reduce intensively
  Investigating the Yelp mrjob framework   https://github.com/pfig/mrjob



Develop the Netflix and novel standard-of-the-art used
   Matrix Factorization, Singular Value Decomposition (SVD), Boltzman machines



The most commonly used is Slope One technique.



                                                                                 19
Why migrate ?
Old Crab running only using Pure Python
     Recommendations demand heavy maths calculations and lots of processing

Compatible with Numpy and Scipy libraries
   High Standard and popular scientific libraries optimized for scientific calculations in Python

Scikits projects are amazing!
    Active Communities, Scientific Conferences and updated projects (e.g. scikit-learn)

Turn the Crab framework visible for the community
 Join the scientific researchers and machine learning developers around the Globe coding with
                                 Python to help us in this project


                              Be Fast and Furious

                                                                                                  20
Why migrate ?



Numpy optimized with PyPy

     2.x - 48.x Faster



  http://morepypy.blogspot.com/2011/05/numpy-in-pypy-status-and-roadmap.html




                                                                               21
How are we working ?
            Sprints, Online Discussions and Issues




https://github.com/muricoca/crab/wiki/UpcomingEvents

                                                       22
Future Releases
        Planned Release 0.1
   Collaborative Filtering Algorithms working, sample datasets to load and test


        Planned Release 0.11
       Evaluation of Recommendation Algorithms and Database Models support


        Planned Release 0.12
   Recommendation as Services with REST APIs




....



                                                                                  23
Join us!

1. Read our Wiki Page
    https://github.com/muricoca/crab/wiki/Developer-Resources

2. Check out our current sprints and open issues
    https://github.com/muricoca/crab/issues

3. Forks, Pull Requests mandatory
4. Join us at irc.freenode.net #muricoca or at our
                     discussion list
                http://groups.google.com/group/scikit-crab




                                                                24
Crab
              A Python Framework for Building
                  Recommendation Engines

           https://github.com/muricoca/crab

Marcel Caraciolo Ricardo Caspirro                            Bruno Melo
   @marcelcaraciolo           @ricardocaspirro                 @brunomelo

                      {marcel, ricardo,bruno}@muricoca.com

                                                                            25

Weitere ähnliche Inhalte

Andere mochten auch

Imerior Crab Meat
Imerior Crab MeatImerior Crab Meat
Imerior Crab Meat
SMART WEB
 
Priyasha Rocky Shores
Priyasha Rocky ShoresPriyasha Rocky Shores
Priyasha Rocky Shores
Shaun Wood
 
Seed production mudcrab
Seed production mudcrabSeed production mudcrab
Seed production mudcrab
hoabienHP
 
Overview Breeding And Seed Production
Overview Breeding And Seed ProductionOverview Breeding And Seed Production
Overview Breeding And Seed Production
Ridzaludin
 
Crab Power Point
Crab Power PointCrab Power Point
Crab Power Point
KerriNor
 
香港六合彩
香港六合彩香港六合彩
香港六合彩
wejia
 
Describing exercise
Describing exerciseDescribing exercise
Describing exercise
Sussan Roo
 

Andere mochten auch (19)

Imerior Crab Meat
Imerior Crab MeatImerior Crab Meat
Imerior Crab Meat
 
Sistemas de Recomendação e Inteligência Coletiva
Sistemas de Recomendação e Inteligência ColetivaSistemas de Recomendação e Inteligência Coletiva
Sistemas de Recomendação e Inteligência Coletiva
 
Priyasha Rocky Shores
Priyasha Rocky ShoresPriyasha Rocky Shores
Priyasha Rocky Shores
 
Lobster and crab fisheries in INDIA
Lobster and crab fisheries in INDIALobster and crab fisheries in INDIA
Lobster and crab fisheries in INDIA
 
Crab Bank Project
Crab Bank ProjectCrab Bank Project
Crab Bank Project
 
Mud crab farming tarang shah
Mud crab farming tarang shahMud crab farming tarang shah
Mud crab farming tarang shah
 
Brood stock management and larval rearing of mud crab scylla serrata-Gayatri ...
Brood stock management and larval rearing of mud crab scylla serrata-Gayatri ...Brood stock management and larval rearing of mud crab scylla serrata-Gayatri ...
Brood stock management and larval rearing of mud crab scylla serrata-Gayatri ...
 
Seed production mudcrab
Seed production mudcrabSeed production mudcrab
Seed production mudcrab
 
Mud crab farming in India
Mud crab farming in IndiaMud crab farming in India
Mud crab farming in India
 
Mud crab
Mud crabMud crab
Mud crab
 
Overview Breeding And Seed Production
Overview Breeding And Seed ProductionOverview Breeding And Seed Production
Overview Breeding And Seed Production
 
Crab Power Point
Crab Power PointCrab Power Point
Crab Power Point
 
香港六合彩
香港六合彩香港六合彩
香港六合彩
 
Lcu14 wrap up meeting. Summary of Core Develoment teams achievements
Lcu14 wrap up meeting. Summary of Core Develoment teams achievementsLcu14 wrap up meeting. Summary of Core Develoment teams achievements
Lcu14 wrap up meeting. Summary of Core Develoment teams achievements
 
Working away from the office: Benefits and drawbacks
Working away from the office: Benefits and drawbacksWorking away from the office: Benefits and drawbacks
Working away from the office: Benefits and drawbacks
 
Advancing Reinaldo Gonsalves’ Model of Global Economic Insertion
Advancing Reinaldo Gonsalves’ Model of Global Economic InsertionAdvancing Reinaldo Gonsalves’ Model of Global Economic Insertion
Advancing Reinaldo Gonsalves’ Model of Global Economic Insertion
 
Presentatie 2 Maart
Presentatie 2 MaartPresentatie 2 Maart
Presentatie 2 Maart
 
Describing exercise
Describing exerciseDescribing exercise
Describing exercise
 
Scmad Chapter09
Scmad Chapter09Scmad Chapter09
Scmad Chapter09
 

Ähnlich wie Introduction to Crab - Python Framework for Building Recommender Systems

GitOps Core Concepts & Ways of Structuring Your Repos
GitOps Core Concepts & Ways of Structuring Your ReposGitOps Core Concepts & Ways of Structuring Your Repos
GitOps Core Concepts & Ways of Structuring Your Repos
Weaveworks
 
Importance of Developers to HE in the UK
Importance of Developers to HE in the UKImportance of Developers to HE in the UK
Importance of Developers to HE in the UK
Paul Walk
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
Simplilearn
 

Ähnlich wie Introduction to Crab - Python Framework for Building Recommender Systems (20)

Python on Science ? Yes, We can.
Python on Science ?   Yes, We can.Python on Science ?   Yes, We can.
Python on Science ? Yes, We can.
 
Singularity Registry HPC
Singularity Registry HPCSingularity Registry HPC
Singularity Registry HPC
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019
 
Walter api
Walter apiWalter api
Walter api
 
Intro to Python Data Analysis in Wakari
Intro to Python Data Analysis in WakariIntro to Python Data Analysis in Wakari
Intro to Python Data Analysis in Wakari
 
GitOps Core Concepts & Ways of Structuring Your Repos
GitOps Core Concepts & Ways of Structuring Your ReposGitOps Core Concepts & Ways of Structuring Your Repos
GitOps Core Concepts & Ways of Structuring Your Repos
 
A Year of Pyxley: My First Open Source Adventure
A Year of Pyxley: My First Open Source AdventureA Year of Pyxley: My First Open Source Adventure
A Year of Pyxley: My First Open Source Adventure
 
Introduction to python
Introduction to pythonIntroduction to python
Introduction to python
 
We are Digital Puppets
We are Digital PuppetsWe are Digital Puppets
We are Digital Puppets
 
PyData NYC by Akira Shibata
PyData NYC by Akira ShibataPyData NYC by Akira Shibata
PyData NYC by Akira Shibata
 
Qcon beijing 2010
Qcon beijing 2010Qcon beijing 2010
Qcon beijing 2010
 
Importance of Developers to HE in the UK
Importance of Developers to HE in the UKImportance of Developers to HE in the UK
Importance of Developers to HE in the UK
 
The quality of the python ecosystem - and how we can protect it!
The quality of the python ecosystem - and how we can protect it!The quality of the python ecosystem - and how we can protect it!
The quality of the python ecosystem - and how we can protect it!
 
Release management with NuGet/Chocolatey/JIRA
Release management with NuGet/Chocolatey/JIRARelease management with NuGet/Chocolatey/JIRA
Release management with NuGet/Chocolatey/JIRA
 
JustEnoughDevOpsForDataScientists
JustEnoughDevOpsForDataScientistsJustEnoughDevOpsForDataScientists
JustEnoughDevOpsForDataScientists
 
A Whirlwind Tour Of Python
A Whirlwind Tour Of PythonA Whirlwind Tour Of Python
A Whirlwind Tour Of Python
 
Collaborations in the Extreme: 
The rise of open code development in the scie...
Collaborations in the Extreme: 
The rise of open code development in the scie...Collaborations in the Extreme: 
The rise of open code development in the scie...
Collaborations in the Extreme: 
The rise of open code development in the scie...
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
 
Six Principles of Software Design to Empower Scientists
Six Principles of Software Design to Empower ScientistsSix Principles of Software Design to Empower Scientists
Six Principles of Software Design to Empower Scientists
 
Why should I learn python
Why should I learn pythonWhy should I learn python
Why should I learn python
 

Mehr von Marcel Caraciolo

Mehr von Marcel Caraciolo (20)

Como interpretar seu próprio genoma com Python
Como interpretar seu próprio genoma com PythonComo interpretar seu próprio genoma com Python
Como interpretar seu próprio genoma com Python
 
Joblib: Lightweight pipelining for parallel jobs (v2)
Joblib:  Lightweight pipelining for parallel jobs (v2)Joblib:  Lightweight pipelining for parallel jobs (v2)
Joblib: Lightweight pipelining for parallel jobs (v2)
 
Construindo softwares de bioinformática para análises clínicas : Desafios e...
Construindo softwares  de bioinformática  para análises clínicas : Desafios e...Construindo softwares  de bioinformática  para análises clínicas : Desafios e...
Construindo softwares de bioinformática para análises clínicas : Desafios e...
 
Como Python ajudou a automatizar o nosso laboratório v.2
Como Python ajudou a automatizar o nosso laboratório v.2Como Python ajudou a automatizar o nosso laboratório v.2
Como Python ajudou a automatizar o nosso laboratório v.2
 
Como Python pode ajudar na automação do seu laboratório
Como Python pode ajudar na automação do  seu laboratórioComo Python pode ajudar na automação do  seu laboratório
Como Python pode ajudar na automação do seu laboratório
 
Oficina Python: Hackeando a Web com Python 3
Oficina Python: Hackeando a Web com Python 3Oficina Python: Hackeando a Web com Python 3
Oficina Python: Hackeando a Web com Python 3
 
Opensource - Como começar e dá dinheiro ?
Opensource - Como começar e dá dinheiro ?Opensource - Como começar e dá dinheiro ?
Opensource - Como começar e dá dinheiro ?
 
Big Data com Python
Big Data com PythonBig Data com Python
Big Data com Python
 
Benchy, python framework for performance benchmarking of Python Scripts
Benchy, python framework for performance benchmarking  of Python ScriptsBenchy, python framework for performance benchmarking  of Python Scripts
Benchy, python framework for performance benchmarking of Python Scripts
 
Python e 10 motivos por que devo conhece-la ?
Python e 10 motivos por que devo conhece-la ?Python e 10 motivos por que devo conhece-la ?
Python e 10 motivos por que devo conhece-la ?
 
GeoMapper, Python Script for Visualizing Data on Social Networks with Geo-loc...
GeoMapper, Python Script for Visualizing Data on Social Networks with Geo-loc...GeoMapper, Python Script for Visualizing Data on Social Networks with Geo-loc...
GeoMapper, Python Script for Visualizing Data on Social Networks with Geo-loc...
 
Benchy: Lightweight framework for Performance Benchmarks
Benchy: Lightweight framework for Performance Benchmarks Benchy: Lightweight framework for Performance Benchmarks
Benchy: Lightweight framework for Performance Benchmarks
 
Python, A pílula Azul da programação
Python, A pílula Azul da programaçãoPython, A pílula Azul da programação
Python, A pílula Azul da programação
 
Construindo Soluções Científicas com Big Data & MapReduce
Construindo Soluções Científicas com Big Data & MapReduceConstruindo Soluções Científicas com Big Data & MapReduce
Construindo Soluções Científicas com Big Data & MapReduce
 
Como Python está mudando a forma de aprendizagem à distância no Brasil
Como Python está mudando a forma de aprendizagem à distância no BrasilComo Python está mudando a forma de aprendizagem à distância no Brasil
Como Python está mudando a forma de aprendizagem à distância no Brasil
 
Novas Tendências para a Educação a Distância: Como reinventar a educação ?
Novas Tendências para a Educação a Distância: Como reinventar a educação ?Novas Tendências para a Educação a Distância: Como reinventar a educação ?
Novas Tendências para a Educação a Distância: Como reinventar a educação ?
 
Aula WebCrawlers com Regex - PyCursos
Aula WebCrawlers com Regex - PyCursosAula WebCrawlers com Regex - PyCursos
Aula WebCrawlers com Regex - PyCursos
 
Arquivos Zip com Python - Aula PyCursos
Arquivos Zip com Python - Aula PyCursosArquivos Zip com Python - Aula PyCursos
Arquivos Zip com Python - Aula PyCursos
 
PyFoursquare: Python Library for Foursquare
PyFoursquare: Python Library for FoursquarePyFoursquare: Python Library for Foursquare
PyFoursquare: Python Library for Foursquare
 
Sistemas de Recomendação: Como funciona e Onde Se aplica?
Sistemas de Recomendação: Como funciona e Onde Se aplica?Sistemas de Recomendação: Como funciona e Onde Se aplica?
Sistemas de Recomendação: Como funciona e Onde Se aplica?
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Introduction to Crab - Python Framework for Building Recommender Systems

  • 1. Crab A Python Framework for Building Recommendation Engines Scipy 2011, Austin TX Marcel Caraciolo Ricardo Caspirro Bruno Melo @marcelcaraciolo @ricardocaspirro @brunomelo 1
  • 2. What is Crab ? A python framework for building recommendation engines A Scikit module for collaborative, content and hybrid filtering Mahout Alternative for Python Developers :D Open-Source under the BSD license https://github.com/muricoca/crab 2
  • 3. When started ? It began one year ago Community-driven, 4 members Since April,2011 the open-source labs Muriçoca incorporated it Since April,2011 rewritting it as Scikit https://github.com/muricoca/ 3
  • 4. Knowing Scikits Scikits are Scipy Toolkits - independent and projects hosted under a common namespace. Scikits Image Scikits MlabWrap Scikits AudioLab Scikit Learn .... http://scikits.appspot.com/scikits 4
  • 5. Knowing Scikits Scikit-Learn Machine Learning Algorithms + scientific Python packages (Numpy, Scipy and Matplotlib) http://scikit-learn.sourceforge.net/ Our goal: Incorporate the Crab as Scikit and incorporate some parts of them at Scikit-learn 5
  • 6. Why Recommendations ? The world is an over-crowded place !"#$%&'()$*+$,-$&.#'/0'&%)#)$1(,0# 6
  • 7. Why Recommendations * +,&-.$/).#&0#/"1.#$%234(".# ? $/)#5(&6 7&.2.#"$4,#)$8 We are overloaded * 93((3&/.#&0#:&'3".;#5&&<.# $/)#:-.34#2%$4<.#&/(3/" Thousands of news articles and blog posts each day * =/#>$/&3;#?#@A#+B#4,$//"(.;# 2,&-.$/).#&0#7%&6%$:.# Millions of movies, books and music tracks online "$4,#)$8 Several Places, Offers and Events * =/#C"1#D&%<;#."'"%$(# Even Friends sometimes we are overloaded ! 2,&-.$/).#&0#$)#:"..$6".# ."/2#2&#-.#7"%#)$8 7
  • 8. Why Recommendations ? We really need and consume only a few of them! “A lot of times, people don’t know what they want until you show it to them.” Steve Jobs “We are leaving the Information age, and entering into the Recommendation age.” Chris Anderson, from book Long Tail 8
  • 9. Why Recommendations ? Can Google help ? Yes, but only when we really know what we are looking for But, what’s does it mean by “interesting” ? Can Facebook help ? Yes, I tend to find my friends’ stuffs interesting What if i had only few friends and what they like do not always attract me ? Can experts help ? Yes, but it won’t scale well. But it is what they like, not me! Exactly same advice! 9
  • 10. Why Recommendations ? Recommendation Systems Systems designed to recommend to me something I may like 10
  • 11. Why Recommendations ? !"#$%&"'$"'(')*#*+,) Recommendation Systems -+*#)+. -#/') 0#)1# ! 2' 23&4"+')1 5,6 7),*%'"&863 Graph Representation 11
  • 12. The current Crab Collaborative Filtering algorithms User-Based, Item-Based and Slope One Evaluation of the Recommender Algorithms Precision, Recall, F1-Score, RMSE Precision-Recall Charts 12
  • 13. The current Crab Precision-Recall Charts 13
  • 15. The current Crab Using REST APIs to deploy the recommender django-piston, django-rest, django-tastypie 15
  • 16. Crab is already in production Brazilian Social Network called Atepassar.com Educational network with more than 60.000 students and 3000 video-classes Running on Python + Numpy + Scipy and Django Backend for Recommendations MongoDB - mongoengine Daily Recommendations with Explanations 16
  • 17. Evaluating your recommender Crab implements the most used recommender metrics. Precision, Recall, F1-Score, RMSE Using matplotlib for a plotter utility Implement new metrics Simulations support maybe (??) 17
  • 18. Evaluating your recommender All you have to do is implement your Evaluator 18
  • 19. Distributing the recommendation computations Use Hadoop and Map-Reduce intensively Investigating the Yelp mrjob framework https://github.com/pfig/mrjob Develop the Netflix and novel standard-of-the-art used Matrix Factorization, Singular Value Decomposition (SVD), Boltzman machines The most commonly used is Slope One technique. 19
  • 20. Why migrate ? Old Crab running only using Pure Python Recommendations demand heavy maths calculations and lots of processing Compatible with Numpy and Scipy libraries High Standard and popular scientific libraries optimized for scientific calculations in Python Scikits projects are amazing! Active Communities, Scientific Conferences and updated projects (e.g. scikit-learn) Turn the Crab framework visible for the community Join the scientific researchers and machine learning developers around the Globe coding with Python to help us in this project Be Fast and Furious 20
  • 21. Why migrate ? Numpy optimized with PyPy 2.x - 48.x Faster http://morepypy.blogspot.com/2011/05/numpy-in-pypy-status-and-roadmap.html 21
  • 22. How are we working ? Sprints, Online Discussions and Issues https://github.com/muricoca/crab/wiki/UpcomingEvents 22
  • 23. Future Releases Planned Release 0.1 Collaborative Filtering Algorithms working, sample datasets to load and test Planned Release 0.11 Evaluation of Recommendation Algorithms and Database Models support Planned Release 0.12 Recommendation as Services with REST APIs .... 23
  • 24. Join us! 1. Read our Wiki Page https://github.com/muricoca/crab/wiki/Developer-Resources 2. Check out our current sprints and open issues https://github.com/muricoca/crab/issues 3. Forks, Pull Requests mandatory 4. Join us at irc.freenode.net #muricoca or at our discussion list http://groups.google.com/group/scikit-crab 24
  • 25. Crab A Python Framework for Building Recommendation Engines https://github.com/muricoca/crab Marcel Caraciolo Ricardo Caspirro Bruno Melo @marcelcaraciolo @ricardocaspirro @brunomelo {marcel, ricardo,bruno}@muricoca.com 25