Mahout in Action
          Part 1


    Yasmine M. Gaber
      28 February 2013
Agenda

    Meet Apache Mahout

    Part 1: Recommendation

    Part 2: Clustering

    Part 3: Classification
Meet Apache Mahout

    It is an open source machine learning library
    from Apache

    It is scalable

    It is a Java library

    It can be used with Hadoop to deal with
    large-scale data
Famous Engines

  Recommender engines:

 Amazon.com

 Netflix

 Dating sites like Líbímseti

 Social networking sites like Facebook

  Clustering engines:

 Google News

 Search engines like Clusty

  Classification engines:

 Email spam filters

 Google’s Picasa

 Optical character recognition software

 Apple’s Genius feature in iTunes
Recommendations
Recommender Input

    A preference consists of a user ID, an item ID,
    and a value expressing the user’s preference for the item

    Input is a .csv file, one preference per line
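
A minimal sketch of reading one line of such a file, assuming the column order userID,itemID,value and no header row. PrefLine and parse are hypothetical names used for illustration; they are not Mahout classes.

```java
// Hypothetical helper, not part of Mahout: holds one parsed preference.
final class PrefLine {
    final long userId;
    final long itemId;
    final float value;

    PrefLine(long userId, long itemId, float value) {
        this.userId = userId;
        this.itemId = itemId;
        this.value = value;
    }

    // Assumes exactly three comma-separated fields: userID,itemID,value.
    static PrefLine parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected userID,itemID,value: " + line);
        }
        return new PrefLine(Long.parseLong(parts[0].trim()),
                            Long.parseLong(parts[1].trim()),
                            Float.parseFloat(parts[2].trim()));
    }
}
```

For example, PrefLine.parse("123,456,3.0") yields user 123, item 456 and value 3.0.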
Create Recommender
Recommender Evaluation

    Average absolute difference vs root-mean-square error
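
The two scores can be sketched as follows, assuming the estimated and actual preference values are given as parallel arrays. EvalMetrics is a hypothetical name and this is not Mahout's evaluator code. For both scores, lower is better.

```java
// Hypothetical helper for illustration; not a Mahout class.
final class EvalMetrics {
    // Mean of |estimated - actual| over all test preferences.
    static double averageAbsoluteDifference(double[] estimated, double[] actual) {
        double sum = 0.0;
        for (int k = 0; k < actual.length; k++) {
            sum += Math.abs(estimated[k] - actual[k]);
        }
        return sum / actual.length;
    }

    // Square root of the mean squared difference.
    // Squaring penalizes large misses more heavily than small ones.
    static double rootMeanSquare(double[] estimated, double[] actual) {
        double sum = 0.0;
        for (int k = 0; k < actual.length; k++) {
            double d = estimated[k] - actual[k];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }
}
```

With estimates {3.5, 2.0, 5.0} against actual values {3.0, 4.0, 5.0}, the single large miss on the second item raises the root-mean-square score above the average absolute difference.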
Mahout RecommenderEvaluator
Precision and Recall
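
Precision is the share of recommended items that are relevant; recall is the share of relevant items that were recommended. A minimal sketch, assuming the relevant items for a user are already known as a set. IrStats is a hypothetical name and not a Mahout class.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not a Mahout class.
final class IrStats {
    // Fraction of the recommended items that are relevant.
    static double precision(List<Long> recommended, Set<Long> relevant) {
        if (recommended.isEmpty()) return 0.0;
        return (double) hits(recommended, relevant) / recommended.size();
    }

    // Fraction of the relevant items that were recommended.
    static double recall(List<Long> recommended, Set<Long> relevant) {
        if (relevant.isEmpty()) return 0.0;
        return (double) hits(recommended, relevant) / relevant.size();
    }

    // Number of recommended items that appear in the relevant set.
    private static int hits(List<Long> recommended, Set<Long> relevant) {
        int n = 0;
        for (Long item : recommended) {
            if (relevant.contains(item)) n++;
        }
        return n;
    }
}
```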
RecommenderIRStatsEvaluator
Representing Recommender Data

    Preference object
    −   new GenericPreference(123, 456, 3.0f)

    Preference Array
Representing Recommender Data

    Preference Array





    FastByIDMap and FastIDSet
In-memory DataModels

    GenericDataModel


    File-based data


    Refreshable components


    Database-based data
Coping without preference values
User-based Recommender

    The algorithm

for every item i that u has no preference for yet
  for every other user v that has a preference for i
    compute a similarity s between u and v
    incorporate v's preference for i, weighted by s, into a
    running average
return the top items, ranked by weighted average
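
A minimal sketch of this loop in plain Java, assuming preferences are held as a map of userID to (itemID to value) and the similarity is supplied by the caller. UserBasedSketch is a hypothetical name; this is not Mahout's implementation. The loops run over users first and then over their items, which produces the same weighted averages. Skipping non-positive similarities is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

// Hypothetical illustration of the user-based algorithm; not Mahout code.
final class UserBasedSketch {
    // prefs maps userID -> (itemID -> preference value).
    // Returns itemID -> estimated preference, for items u has no preference for.
    static Map<Long, Double> estimate(long u,
                                      Map<Long, Map<Long, Double>> prefs,
                                      ToDoubleBiFunction<Long, Long> similarity) {
        Map<Long, Double> weightedSum = new HashMap<>();
        Map<Long, Double> weightTotal = new HashMap<>();
        Map<Long, Double> own = prefs.getOrDefault(u, new HashMap<>());
        for (Map.Entry<Long, Map<Long, Double>> other : prefs.entrySet()) {
            long v = other.getKey();
            if (v == u) continue;
            double s = similarity.applyAsDouble(u, v);
            if (s <= 0.0) continue; // assumption: ignore non-positive similarities
            for (Map.Entry<Long, Double> p : other.getValue().entrySet()) {
                long i = p.getKey();
                if (own.containsKey(i)) continue; // u already has a preference for i
                weightedSum.merge(i, s * p.getValue(), Double::sum);
                weightTotal.merge(i, s, Double::sum);
            }
        }
        // Divide each weighted sum by the total weight to get the average.
        Map<Long, Double> estimates = new HashMap<>();
        for (Map.Entry<Long, Double> e : weightedSum.entrySet()) {
            estimates.put(e.getKey(), e.getValue() / weightTotal.get(e.getKey()));
        }
        return estimates;
    }
}
```

Sorting the returned estimates in descending order gives the top items.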
Recommender Components

    Data model, implemented via DataModel


    User-user similarity metric, implemented via
    UserSimilarity


    User neighborhood definition, implemented via
    UserNeighborhood


    Recommender engine, implemented via a
    Recommender (here,
    GenericUserBasedRecommender)
User Neighborhoods

    Fixed-size neighborhoods





    Threshold-based neighborhood
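
The two neighborhood definitions can be sketched as follows, assuming the similarity of every other user to the target user has already been computed into a map. NeighborhoodSketch and its method names are hypothetical, and treating the threshold as inclusive is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not Mahout's UserNeighborhood implementations.
final class NeighborhoodSketch {
    // Fixed-size: the n users with the highest similarity to the target user.
    static List<Long> nearestN(Map<Long, Double> similarityToTarget, int n) {
        List<Map.Entry<Long, Double>> entries =
                new ArrayList<>(similarityToTarget.entrySet());
        // Sort by similarity, highest first.
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<Long> result = new ArrayList<>();
        for (int k = 0; k < Math.min(n, entries.size()); k++) {
            result.add(entries.get(k).getKey());
        }
        return result;
    }

    // Threshold-based: every user whose similarity is at least the threshold.
    static List<Long> atLeast(Map<Long, Double> similarityToTarget, double threshold) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarityToTarget.entrySet()) {
            if (e.getValue() >= threshold) result.add(e.getKey());
        }
        return result;
    }
}
```

A fixed size always returns up to n neighbors even if they are only weakly similar, while a threshold may return few or none when similarities are low.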
Similarity Metrics

    Pearson correlation–based similarity
    −   It is a number between –1 and 1 that measures
        the tendency of two series of numbers, paired up
        one-to-one, to move together
    −   Problems:
        
            It doesn’t take into account the number of items in
            which two users’ preferences overlap, which is probably
            a weakness in the context of recommender engines.
        
            If two users overlap on only one item, no correlation can
            be computed because of how the computation is
            defined
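
A minimal sketch of the computation, assuming x[k] and y[k] hold the two users' preferences for the k-th item they have both rated. PearsonSketch is a hypothetical name and this is not Mahout's PearsonCorrelationSimilarity code. With a single shared item both deviations from the mean are zero, so the denominator is zero and the sketch returns NaN.

```java
// Hypothetical illustration of the Pearson correlation; not Mahout code.
final class PearsonSketch {
    // Returns a value in [-1, 1], or NaN when the correlation is undefined.
    static double correlation(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int k = 0; k < n; k++) {
            meanX += x[k];
            meanY += y[k];
        }
        meanX /= n;
        meanY /= n;
        double sumXY = 0.0, sumXX = 0.0, sumYY = 0.0;
        for (int k = 0; k < n; k++) {
            double dx = x[k] - meanX;
            double dy = y[k] - meanY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }
        double denominator = Math.sqrt(sumXX) * Math.sqrt(sumYY);
        return denominator == 0.0 ? Double.NaN : sumXY / denominator;
    }
}
```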
Similarity Metrics

    Euclidean distance similarity
    −   1 / (1 + Euclidean distance)

    Cosine measure similarity
    −   between –1 and 1

    Tanimoto coefficient similarity
    −   The ratio of the size of the intersection to the
        size of the union of two users’ preferred items
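
The three metrics can be sketched as follows, assuming the arrays hold two users' preference values for the items both have rated, and the sets hold the item IDs each user has expressed a preference for. OtherSimilarities is a hypothetical name. The code follows the formulas as stated on the slide; the library's own implementations may normalize differently.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the slide's formulas; not Mahout code.
final class OtherSimilarities {
    // 1 / (1 + d), where d is the Euclidean distance over co-rated items.
    static double euclidean(double[] x, double[] y) {
        double sumSq = 0.0;
        for (int k = 0; k < x.length; k++) {
            double d = x[k] - y[k];
            sumSq += d * d;
        }
        return 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    // Cosine of the angle between the two preference vectors.
    static double cosine(double[] x, double[] y) {
        double dot = 0.0, normX = 0.0, normY = 0.0;
        for (int k = 0; k < x.length; k++) {
            dot += x[k] * y[k];
            normX += x[k] * x[k];
            normY += y[k] * y[k];
        }
        return dot / (Math.sqrt(normX) * Math.sqrt(normY));
    }

    // Size of the intersection divided by the size of the union.
    // Preference values are ignored; only the item IDs matter.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }
}
```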
Item-based recommendation

    The algorithm

for every item i that u has no preference for yet
  for every item j that u has a preference for
    compute a similarity s between i and j
    add u's preference for j, weighted by s, to a running average
return the top items, ranked by weighted average
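
A minimal sketch of the inner loop for one candidate item i, assuming user u's preferences are given as a map of itemID to value and the item-item similarity is supplied by the caller. ItemBasedSketch is a hypothetical name; this is not Mahout's GenericItemBasedRecommender code. Skipping non-positive similarities is an assumption of the sketch.

```java
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

// Hypothetical illustration of the item-based algorithm; not Mahout code.
final class ItemBasedSketch {
    // Estimates u's preference for item i from u's own preferences,
    // weighting each by the similarity between i and the rated item j.
    static double estimate(long i,
                           Map<Long, Double> prefsOfU,
                           ToDoubleBiFunction<Long, Long> itemSimilarity) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (Map.Entry<Long, Double> p : prefsOfU.entrySet()) {
            double s = itemSimilarity.applyAsDouble(i, p.getKey());
            if (s <= 0.0) continue; // assumption: ignore non-positive similarities
            weightedSum += s * p.getValue();
            weightTotal += s;
        }
        // NaN signals that no rated item was similar enough to estimate from.
        return weightTotal == 0.0 ? Double.NaN : weightedSum / weightTotal;
    }
}
```

Calling estimate for every item u has not rated and sorting the results gives the top items.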
GenericItemBasedRecommender
Slope-one recommender

    The algorithm

for every item i the user u expresses no preference for
  for every item j that user u expresses a preference for
    find the average preference difference between j and i
    add this diff to u's preference value for j
    add this to a running average
return the top items, ranked by these averages
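
A minimal sketch of an unweighted variant, assuming preferences are held as a map of userID to (itemID to value). SlopeOneSketch is a hypothetical name and this is not Mahout's implementation, which may weight the differences. Here the difference for a pair (i, j) is the average of (preference for i minus preference for j) over users who rated both.

```java
import java.util.Map;

// Hypothetical illustration of unweighted slope-one; not Mahout code.
final class SlopeOneSketch {
    // Average of (pref for i - pref for j) over users who rated both items.
    // Returns NaN when no user rated both.
    static double averageDiff(long i, long j, Map<Long, Map<Long, Double>> prefs) {
        double sum = 0.0;
        int count = 0;
        for (Map<Long, Double> p : prefs.values()) {
            Double pi = p.get(i);
            Double pj = p.get(j);
            if (pi != null && pj != null) {
                sum += pi - pj;
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    // Estimate of u's preference for i: the mean of (u's pref for j + diff(i, j))
    // over the items j that u has rated. Assumes u has not rated i.
    static double estimate(long u, long i, Map<Long, Map<Long, Double>> prefs) {
        double sum = 0.0;
        int count = 0;
        for (Map.Entry<Long, Double> e : prefs.get(u).entrySet()) {
            double diff = averageDiff(i, e.getKey(), prefs);
            if (Double.isNaN(diff)) continue; // no co-ratings for this pair
            sum += e.getValue() + diff;
            count++;
        }
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

Because the differences depend only on item pairs, they can be precomputed once and reused for every user.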
Taking Recommender to Production
User-based recommenders
Thank You



               Contact at:
Email: Yasmine.Gaber@espace.com.eg
Twitter: Twitter.com/yasmine_mohamed
