Practical Machine Learning
  A Tutorial on Apache Mahout


               Biju B
         NLP R&D Division
         365Media Pvt. Ltd.
         bijub@365Media.in

             FOSSMEET NITC,
                 Calicut


          4-6 February 2011




   Biju B & Jaganadh G   Practical Machine Learning
nlp r d $ whoweare




     Working in Natural Language Processing (NLP), Machine Learning, and
     Data Mining
     Passionate about Free and Open Source software :-)
     Teaching Python in free time, blogging at
     http://jaganadhg.freeflux.net/blog, and contributing to
     OpenStreetMap
     Working for 365Media Pvt. Ltd., Coimbatore, India
     Twitter handles: @jaganadhg, @bijub




Machine Learning




  Machine Learning
  Machine learning is a subfield of artificial intelligence (AI) concerned with
  algorithms that allow computers to learn.

      This talk is not meant as an introduction to machine learning
      Don't expect any mathy equations here




Machine Learning and Our Life



     Do you think machine learning has any impact on our lives?
     Yes
     In our day-to-day lives we use many tools powered by machine
     learning
     Recommendation Engines
     Clustering
     Classification, Spam Filtering
     Sentiment Analysis
     Fraud Detection




Mahout



  Mahout
  Open source project of the Apache Software Foundation
  The goal of the project is to build scalable machine learning libraries




Mahout




  Mahout
  Mahout: a person who drives an elephant ;-)
  The name comes from the project’s use of Apache Hadoop, whose logo is an elephant.




Why a new library ?



  There are more than 30 Java libraries and tools available for machine
  learning:
  Weka, Mallet, Classifier4J, RapidMiner, ...
      Processing large amounts of data is not an easy task
      Machine learning tools are expected to produce results quickly
      If the data set is too large, it is hard to process on a single
      machine, even a powerful one
      Mahout is scalable: its core algorithms are implemented on top of
      Apache Hadoop using the map/reduce paradigm
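
  The map/reduce idea mentioned above can be illustrated with a toy,
  single-process sketch. This is not Mahout or Hadoop code: the function
  names (map_phase, shuffle, reduce_phase, cooccurrence) and the sample
  data are hypothetical, chosen only to show how a job is split into a
  map step that emits key/value pairs and a reduce step that combines
  the values for each key. Here the job counts how often two items were
  liked by the same user.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(record):
    """Emit ((item_a, item_b), 1) for every pair of items one user liked."""
    _user, items = record
    for pair in combinations(sorted(set(items)), 2):
        yield pair, 1


def shuffle(mapped):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def cooccurrence(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = (kv for record in records for kv in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical toy data: (user, items the user liked).
records = [
    ("alice", ["book", "film", "song"]),
    ("bob", ["book", "film"]),
    ("carol", ["film", "song"]),
]
counts = cooccurrence(records)
print(counts)
```

  Because each call to map_phase looks at one record and each call to
  reduce_phase looks at one key, both steps could in principle run on
  many machines at once, which is the property a Hadoop-based library
  relies on.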




Algorithms in Apache Mahout



     Collaborative Filtering
     User- and item-based recommenders
     K-Means, Fuzzy K-Means clustering
     Mean Shift clustering
     Dirichlet process clustering
     Latent Dirichlet Allocation
     Singular value decomposition
     Parallel Frequent Pattern mining
     Complementary Naive Bayes classifier
     Random forest (decision-tree-based) classifier




Recommendation




    Filter information based on user preferences
    Search a large set of people to find a smaller set whose tastes are
    similar to yours
    e.g. Amazon’s book recommendations, Netflix’s movie
    recommendations
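
    The idea of finding people with similar tastes can be sketched as
    a small user-based recommender. This is a minimal illustration in
    plain Python, not Mahout's API: the functions cosine and recommend,
    the parameters k and n, and the ratings data are all hypothetical.
    It assumes cosine similarity between rating vectors and a
    similarity-weighted average of the neighbours' ratings; other
    similarity measures would work as well.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(ratings, user, k=2, n=3):
    """Suggest up to n unseen items using the k most similar users."""
    others = [(cosine(ratings[user], r), name)
              for name, r in ratings.items() if name != user]
    neighbours = sorted(others, reverse=True)[:k]

    scores, weights = {}, {}
    for sim, name in neighbours:
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in ratings[name].items():
            if item in ratings[user]:
                continue  # only recommend items the user has not rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim

    # Similarity-weighted average rating per candidate item, best first.
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:n]]


# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob": {"A": 5, "B": 4, "C": 5},
    "carol": {"A": 1, "B": 2, "D": 3},
    "dave": {"D": 4, "E": 5},
}
print(recommend(ratings, "alice"))
```

    On a single machine this scans every other user for each request,
    which is the step that becomes expensive as the number of users
    grows.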




Document Classification




     Classify documents based on their content
     e.g. spam filtering, priority inbox
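
     As an illustration of content-based classification, here is a
     minimal spam filter sketch. It assumes a plain multinomial Naive
     Bayes model with add-one smoothing and whitespace tokenisation; it
     is not Mahout's Complementary Naive Bayes implementation, and the
     class name NaiveBayes, its methods, and the training sentences are
     hypothetical.

```python
from collections import Counter, defaultdict
from math import log


class NaiveBayes:
    """Tiny multinomial Naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of documents
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in text.lower().split():
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data.
model = NaiveBayes()
model.train("win cheap money now", "spam")
model.train("cheap pills win prize", "spam")
model.train("meeting agenda for monday", "ham")
model.train("project report and agenda", "ham")

print(model.classify("win money prize"))
print(model.classify("monday project meeting"))
```

     Working in log space avoids multiplying many small probabilities,
     and the add-one smoothing keeps a word unseen for one label from
     forcing that label's score to minus infinity.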




Demo


       Building recommendation engines with Mahout
       Document Classification with Mahout




Reference


     Mahout in Action, by Sean Owen and Robin Anil, published by
     Manning Publications.
     Taming Text, by Grant Ingersoll and Tom Morton, published by
     Manning Publications.
     Introducing Apache Mahout, by Grant Ingersoll: an introduction to
     Apache Mahout focused on clustering, classification and
     collaborative filtering.
     https://www.ibm.com/developerworks/java/library/j-mahout/index.html
     Programming Collective Intelligence: Building Smart Web 2.0
     Applications
     http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325




Useful Resources




     Apache Mahout site: http://mahout.apache.org/
     Apache Mahout mailing list: user@mahout.apache.org
     The code used for the Mahout demo is available at
     http://bitbucket.org/jaganadhg/blog/src/tip/bck9/java/
     20 Newsgroups data set:
     http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz




Questions ??




Acknowledgments



  Thanks to :
      Manning Publications for a review copy of the book "Mahout in
      Action"
      Apache Mahout mailing list members
      Ted Dunning and Robin Anil for suggestions
      @chelakkandupoda for review and criticism
      Mukundhanchari, R&D Director, 365Media Pvt. Ltd., for support and
      encouragement




Finally



