Apache Mahout:
Scalable Machine Learning Library

Anastasiia Kornilova
What is Machine Learning?

“Machine learning - branch of artificial
intelligence, concerns the construction
and study of systems that can learn from
data”
Typical Use Cases
● Recommend products/friends …
● Classify content into predefined groups
● Computer vision
● Sentiment analysis/opinion mining
● Find patterns in user behavior/actions
● Identify key topics/summarize text
● Detect anomalies/fraud
● Ranking search results
● Speech and handwriting recognition
● Natural language processing
ML Algorithms (subset):
● Supervised learning
  – Linear regression
  – Logistic regression
  – Support Vector Machines
  – Random Forests
● Unsupervised learning
  – Clustering
  – Blind signal separation
  – Hidden Markov models
● Semi-supervised
Many ML libraries, frameworks
and tools:
● Weka
● Python Scikit
● Pylearn/Pylearn2
● Theano
● Orange
● SSBrain :)
● More can be found here: http://mloss.org/software/
Typical Workflow
● Get data
● Prepare data
● Choose algorithm(s)
● Run your algorithm(s)
● Validate results
Every ML algorithm deals with:
1. Data
2. Computation over this data
Scalability strategies:
● “Bigger” computer
● More cores
● GPU computing
● Parallel computing, MapReduce
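
As a rough illustration of the MapReduce strategy in the list above, the sketch below computes a mean by letting each data partition emit a partial (sum, count) pair and then merging the pairs. It is plain Python, not Hadoop or Mahout code; the function names and the sample partitions are made up for illustration.

```python
from functools import reduce

def map_partition(values):
    """Map step: summarize one data partition as a (sum, count) pair."""
    return (sum(values), len(values))

def reduce_pairs(a, b):
    """Reduce step: merge two partial (sum, count) pairs."""
    return (a[0] + b[0], a[1] + b[1])

def distributed_mean(partitions):
    """Combine per-partition summaries into a global mean."""
    total, count = reduce(reduce_pairs, map(map_partition, partitions), (0.0, 0))
    return total / count

# Hypothetical data split across three "nodes".
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
mean = distributed_mean(partitions)
print(mean)
```

Because each partition is summarized independently, the map step could in principle run on separate machines, with only the small (sum, count) pairs sent over the network.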
What is Mahout?
● Scalable ML library built on Hadoop, written in Java
● Driven by Ng et al.'s paper “Map-Reduce for Machine Learning on Multicore”
● Started as a Lucene sub-project. Became an Apache top-level project (TLP) in April 2010
● 25 July 2013 - Apache Mahout 0.8 released
● Taste recommender framework by Sean Owen was added in 2008
Who uses Mahout?
When do you need Mahout?

Data Size                        Task                         Tools
Lines, Sample Data               Analysis and visualization   Whiteboard, bash, ...
KBs – low MBs, Prototype Data    Analysis and visualization   Octave, R, bash, ...
MBs – low GBs, Online Data       Storage                      Databases (MySQL, PostgreSQL), ...
                                 Analysis                     NumPy, SciPy, BLAS, Weka
                                 Visualization                Protovis, D3, ...
GBs – TBs – PBs, Big Data        Storage                      HDFS, HBase, Cassandra, ...
                                 Analysis                     Mahout, Hive, Pig, ...

Table from Varad Meru
Advantages
● Community
● Documentation and examples
● Scalability
● Apache license
● Well tested
● Built on existing production-quality libraries
Requirements
● Java 1.6.x or greater
● Maven 3.x to build the source code
● Hadoop 0.20.0 or greater
Core themes
● Recommender engines (collaborative filtering)
● Clustering
● Classification
Algorithms
● User and Item based recommenders
● Matrix factorization based recommenders
● K-Means, Fuzzy K-Means clustering
● Latent Dirichlet Allocation
● Singular value decomposition
● Logistic regression based classifier
● Complementary Naive Bayes classifier
● Random forest decision tree based classifier
Recommender engine
Personalization level
● Generic / Non-Personalized: everyone receives the same recommendations
● Demographic: matches a target group
● Ephemeral: matches current activity
● Persistent: matches long-term interests
Content based
● User Ratings x Item Attributes => Model
● Model applied to new items via attributes
● Alternative: knowledge-based (Item attributes form model of item space)
● Example: Personalized news feeds
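
One way to read "User Ratings x Item Attributes => Model" is sketched below: the user's ratings weight each rated item's attribute vector, the weighted vectors are summed into a profile, and new items are scored against that profile through their attributes. This is a minimal plain-Python sketch under that assumption, not Mahout code; the article names, attributes, and ratings are invented.

```python
def build_profile(ratings, item_attrs):
    """Sum each rated item's attribute vector, weighted by the user's rating."""
    profile = {}
    for item, rating in ratings.items():
        for attr, weight in item_attrs[item].items():
            profile[attr] = profile.get(attr, 0.0) + rating * weight
    return profile

def score(profile, attrs):
    """Dot product of the user profile with one item's attributes."""
    return sum(profile.get(attr, 0.0) * weight for attr, weight in attrs.items())

# Hypothetical news articles described by topic attributes.
item_attrs = {
    "article1": {"sports": 1.0},
    "article2": {"politics": 1.0},
}
ratings = {"article1": 5.0, "article2": 1.0}
profile = build_profile(ratings, item_attrs)

# New, unrated articles are ranked purely through their attributes.
new_items = {
    "article4": {"sports": 1.0},
    "article5": {"politics": 1.0},
}
ranked = sorted(new_items, key=lambda i: score(profile, new_items[i]), reverse=True)
print(ranked)
```

Since scoring depends only on attributes, a brand-new article can be recommended before anyone has rated it.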
Table of ratings
Ratings
● Explicit (Rating, Review, Vote, Like)
● Implicit (Click, Purchase, Follow)
Item-Item
● For every item I
● Select N similar items
● Recommend these N items to users who interact with item I
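
The item-item steps above can be sketched in plain Python as follows. This is an illustration only, not Mahout's implementation: it assumes boolean "user has this item" data and uses the Tanimoto coefficient as the similarity, and all names and data are hypothetical.

```python
from collections import defaultdict

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of user IDs."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def similar_items(item_users, n):
    """For every item, keep the n most similar other items."""
    result = {}
    for item, users in item_users.items():
        scored = [(tanimoto(users, other_users), other)
                  for other, other_users in item_users.items() if other != item]
        # Highest similarity first; ties broken by item name for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[item] = [other for sim, other in scored[:n] if sim > 0]
    return result

def recommend(user, user_items, neighbors):
    """Recommend items similar to the user's items, minus those already seen."""
    seen = user_items[user]
    recs = []
    for item in sorted(seen):
        for other in neighbors[item]:
            if other not in seen and other not in recs:
                recs.append(other)
    return recs

# Hypothetical interaction data: user -> set of items.
user_items = {
    "alice": {"A", "B"},
    "bob": {"A", "B", "C"},
    "carol": {"B", "C"},
    "dave": {"D"},
}
# Invert to item -> set of users.
item_users = defaultdict(set)
for user, items in user_items.items():
    for item in items:
        item_users[item].add(user)

neighbors = similar_items(item_users, n=2)
recs = recommend("alice", user_items, neighbors)
print(recs)
```

The item neighborhoods depend only on the item data, so they can be precomputed offline and reused for every user.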
User-User
● For every user
● Find n most similar users
● Aggregate the preferences of those similar users
● Generate recommended items
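
The user-user steps above can be sketched the same way. This plain-Python version is an assumption-laden illustration rather than Mahout's implementation: it uses cosine similarity and a similarity-weighted average of the neighbors' ratings, and the users and ratings are invented.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    dot = sum(u[i] * v[i] for i in u if i in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(user, ratings, n=2, top=3):
    target = ratings[user]
    # Find the n most similar users.
    others = [(cosine(target, r), name) for name, r in ratings.items() if name != user]
    others.sort(key=lambda pair: (-pair[0], pair[1]))
    neighbors = others[:n]
    # Aggregate the neighbors' ratings for unseen items, weighted by similarity.
    totals, weights = {}, {}
    for sim, name in neighbors:
        if sim <= 0:
            continue
        for item, rating in ratings[name].items():
            if item in target:
                continue
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    # Rank unseen items by their estimated rating.
    estimates = {item: totals[item] / weights[item] for item in totals}
    return sorted(estimates, key=lambda i: (-estimates[i], i))[:top]

# Hypothetical explicit ratings: user -> {item: rating}.
ratings = {
    "alice": {"A": 5.0, "B": 4.0},
    "bob": {"A": 5.0, "B": 4.0, "C": 5.0},
    "carol": {"A": 1.0, "C": 2.0},
    "dave": {"D": 4.0},
}
recs = recommend("alice", ratings, n=2)
print(recs)
```

Here "dave" shares no items with "alice", so he falls outside the neighborhood and his item "D" is never considered.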
Similarity metrics
● Pearson Correlation
● Tanimoto
● Cosine similarity
● Euclidean distance
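
For reference, the textbook definitions of the four metrics listed above are sketched below in plain Python. These are not Mahout's classes. The sketch assumes two users rated the same items, so their ratings form equal-length lists, while Tanimoto works on sets of items. The sample vectors are invented.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length rating lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y) if std_x and std_y else 0.0

def cosine(x, y):
    """Cosine of the angle between two rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

def euclidean_distance(x, y):
    """Straight-line distance; smaller means more similar."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def tanimoto(a, b):
    """Size of the intersection over size of the union of two item sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

u = [5.0, 3.0, 4.0]
v = [4.0, 2.0, 3.0]  # same taste as u, but rated one point lower throughout
print(pearson(u, v), cosine(u, v), euclidean_distance(u, v))
print(tanimoto({"A", "B"}, {"B", "C"}))
```

Pearson ignores the constant one-point offset between `u` and `v`, whereas Euclidean distance does not. Tanimoto is the natural choice when only boolean preferences are available.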
Sparse matrix
Parameters
● DataModel – FileDataModel, MySQLJDBCDataModel, PostgreSQLJDBCDataModel, MongoDBDataModel, CassandraDataModel
● UserSimilarity – Pearson Correlation, Tanimoto, Log-Likelihood, Euclidean Distance, Cosine Similarity
● ItemSimilarity – Pearson Correlation, Tanimoto, Log-Likelihood, Euclidean Distance, Cosine Similarity
● UserNeighborhood – Nearest N-User Neighborhood, Threshold User Neighborhood
Code example
Evaluation
● Average absolute difference
● RMSE
● Precision and recall
● Precision is the proportion of top results that are relevant, for some definition of relevant.
● Recall is the proportion of all relevant results included in the top results.
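
The evaluation measures above can be computed as in the following plain-Python sketch. It is not Mahout's evaluator API. It assumes the predicted and actual ratings are aligned lists for held-out items, and that "relevant" is simply a given set of items; all data values are invented.

```python
from math import sqrt

def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error; penalizes large misses more than MAE does."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall computed over the top-k recommended items."""
    top = recommended[:k]
    hits = sum(1 for item in top if item in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical held-out ratings and the model's predictions for them.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 4.0]
mae = mean_absolute_error(predicted, actual)
error = rmse(predicted, actual)

# Hypothetical ranked recommendations and the set of truly relevant items.
precision, recall = precision_recall_at_k(["A", "B", "C", "D"], {"A", "C", "E"}, k=2)
print(mae, error, precision, recall)
```

In this example one of the top two results is relevant, so precision is 1/2, and one of the three relevant items was retrieved, so recall is 1/3.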
Clustering
Mahout Clustering Algorithms
● K-Means - runs on Hadoop
● Fuzzy K-means - runs on Hadoop
● Latent Dirichlet Allocation - runs on Hadoop
● Canopy clustering - runs on Hadoop
● Minhash clustering - runs on Hadoop
● kMeans++ streaming clustering - documentation missing
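
To make the first algorithm in the list above concrete, here is a single-machine sketch of K-Means (Lloyd's algorithm) in plain Python. It is not Mahout's Hadoop implementation and does not show how the work is distributed. The points and the fixed starting centroids are invented so that the run is deterministic.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate point assignment and centroid update."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Two visibly separate groups of 2-D points (hypothetical data).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

Each assignment step treats every point independently and each update step is a per-cluster sum and count, which is the kind of structure that lends itself to a map step followed by a reduce step.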
Classification
Mahout Classification Algorithms
● Logistic regression (SGD) - model parameter selection can be done in Hadoop
● Naive Bayes - training runs on Hadoop
● Random Forests - training is done in Hadoop
● Hidden Markov Models - training is done in Map-Reduce
Resources
● Mahout in Action
● Apache Mahout Cookbook
● Introduction to Apache Mahout
● http://mahout.apache.org/
Q&A
